Sunday, September 25, 2016

Using and abusing the GNU assembler for ATtiny programming...


One of the big troubles with programming AVRs on Linux, as you might have gathered from previous posts, is that you have to use the GNU compiler family and the AVR-libc library, along with your favorite editor and avrdude and/or the ArduinoISP to finish the job.

ALL this is tolerable, even comfortable, if you are a Linux geek and a command line junkie.  At least if you are writing in C; you can even use the Arduino IDE and libraries, if your target μc is supported in it.  Or you can do it the 'hard way', with straight AVR-libc, gcc, and the command line. Not too painful, as I've covered previously....

GNU 'as' docs are sh****

Once you decide you want to start doing some straight assembly, though...not so much fun. I have spent the last week just reading The GNU Assembler documentation and various forum posts, looking for 'best practice' or 'just plain works' code examples.  Because of the vagaries of FSF 'free software' license policies (which apply to the GNU compiler docs, fortunately or unfortunately as Alice might say), the support for avr-as is stark, at best, and there is VERY little code to examine out in the wild.

The attitude seems to be....you REALLY should be programming in C, or at worst using the (awful) inline assembler.  REAL assembler is just there in case someone likes to torture themselves with AVR-libc's comprehensive but extremely obscure symbol/macro naming conventions and complex header file hierarchy.

Atmel's attitude toward support of other assemblers is worse, as is to be expected.  They would rather you use AVR Studio and avrasm, which has entirely different directive syntax.  Atmel does assert a minimum of support for the GNU AVR toolchain, and even has some documentation on using it...but it is even sketchier than the GNU docs, outright wrong about certain directives, and completely at sea on code and data sectioning among other things.

Even though I despise coding under Windows, I LIKE avrasm and Atmel's take on assembly.  I have ALWAYS liked it.  The syntax was VERY well documented, it is very clear where everything goes, and when, and why wouldn't it be, since Atmel had a lot to do with developing it? Unlike AVR-libc, Atmel took a more incremental approach, assuming that anyone doing assembly would be more likely to have a 'bag-of-tricks' set of macros for common tasks (like timed delays) and would generally write software toward a specific subset of the AVR family (in my case the tiny26, tinyx4, tinyx5, and tinyx61 devices) rather than everything.

The Atmel/avrasm device 'include' files are particularly clearly put together; it occurred to me right away that, with a bit of search and replace, I could convert them to avr-as syntax and get the best of both worlds.  Or the best of THREE worlds, since I wouldn't have to rely on the AVR-libc header files either, which are really oriented toward C rather than AVR assembler anyway.

And the last hurdle, therefore...


AVR-libc header files are gibberish...

The latter is only a personal opinion, but....I do NOT plan on using assembler for the big ATmega's and the Arduinos.  Those devices have a LOT of memory and peripheral support, so why not be comfortable and do it all in C?   For the kind of tasks it makes sense to implement on an Arduino, we can sit back and just include the libraries we need and hope the magic works, not worrying too much about what's going on in there unless we run into serious problems.

But there's a good reason that Arduino C and the Arduino IDE exist in the first place; bare-metal C programming on an AVR is not for the faint of heart.  Despite the abundance of excellent code examples out there and the well-intentioned effort by the maintainers to create a device-agnostic environment, even the simplest tasks (such as the 'demo.c' program in the AVR-libc docs) require a knowledge of the hardware that few beginners are likely to have.

With assembly, using the AVR-libc environment is even more painful.  In assembly, you can't ignore the device-compatibility gibberish in all those files.  The gyrations needed to get one set of header files to work with the dozens of different AVR's, even in C, require a LOT of esoteric macro witchcraft that I may understand...eventually.  In the next year or so.  Or maybe when I retire.


But...we aren't talking about ATmegas, anyway, and why not?  The AVR ATtinys might not be able to do a hardware multiply in four clock cycles, but do we really need that to start with?  The whole idea behind using a microcontroller for these tasks is that they are more space- and resource- and power-efficient than a large TTL or analog network, in addition to being more flexible, programming-wise.  But going too far with this, making an MCU do everything, can defeat the purpose of using them in the first place.  With 32K of memory there is little incentive (beyond the limitations of the hardware itself) to code efficiently.  More, there are only so many pins and so many counters and so many processor cycles available on an AVR.  Why not develop a sensible multiprocessor control system, with good communications between sub-systems as a priority?

In other words...why not use ATmegas or small Linux platforms like the Pi to do the supervisory tasks that require space, flexibility and horsepower, and the small devices like the excellent stable of ATtinys to do the low-end sensor and motor control gruntwork?  C is perfectly fine for the big picture baton-waving...but assembly might be a very handy tool to have if we want the best results out at the ragged edge of reality, where every clock cycle and bit counts.

If we want all this, and Linux too, we need a simple set of programming tools that work fairly seamlessly with the Arduino and AVR-libc environments, but don't inherit the narrow device support of the former or the one-size-fits-all limitations of the latter.  We want a set of device-centric headers, some powerful macros, and a functional code base that will allow us to easily write EXACTLY to the device and the purpose we need.  And we want as much automation as we can devise so that we CAN concentrate on those goals.

In short:

1.  Device-specific include files, not generic ones.
2.  Familiarity with the minimum subset of assembler directives that the GNU assembler furnishes to get the job done, so that our code is clear and easy to adapt.
3.  A set of rules and scripts that allows us to deploy our code quickly.

Firstly...

To get started, I needed to convert a subset of the Atmel '<device>def.inc' files to a format avr-as likes.   Newer versions of AVR Studio/avrasm now use an XML format for these, but I found a set of older files here.

With some chopping, cutting, and search-and-replace, I was in business.  Below, ferinstance, are some snips from my massaged 'tn26def.inc' include file:
-----------------
... (other stuff)
...
.equ    PINA,            0x19
.equ    PORTB,           0x18
.equ    DDRB,            0x17
.equ    PINB,            0x16
.equ    USIDR,           0x0f
.equ    USISR,           0x0e
.equ    USICR,           0x0d
.equ    ACSR,            0x08
.equ    ADMUX,           0x07
.equ    ADCSRA,          0x06
.equ    ADCH,            0x05
.equ    ADCL,            0x04


; ***** BIT DEFINITIONS **************************************************

; ***** AD_CONVERTER *****************
; ADMUX - The ADC multiplexer Selection Register
.equ    MUX0,            0    ; Analog Channel and Gain Selection Bits
.equ    MUX1,            1    ; Analog Channel and Gain Selection Bits
.equ    MUX2,            2    ; Analog Channel and Gain Selection Bits
.equ    MUX3,            3    ; Analog Channel and Gain Selection Bits
.equ    MUX4,            4    ; Analog Channel and Gain Selection Bits
.equ    ADLAR,           5    ; Left Adjust Result
.equ    REFS0,           6    ; Reference Selection Bit 0
.equ    REFS1,           7    ; Reference Selection Bit 1

...
... (and so on...)
...
; ***** CPU REGISTER DEFINITIONS *****************************************
.equ    XH,        r27
.equ    XL,        r26
.equ    YH,        r29
.equ    YL,        r28
.equ    ZH,        r31
.equ    ZL,        r30

; ***** DATA MEMORY DECLARATIONS *****************************************
.equ    FLASHEND,         0x03ff    ; Note: Word address
.equ    IOEND,            0x003f
.equ    SRAM_START,       0x0060
.equ    SRAM_SIZE,        128
.equ    RAMEND,           0x00df
.equ    XRAMEND,          0x0000
.equ    E2END,            0x007f
.equ    EEPROMEND,        0x007f
.equ    EEADRBITS,        7


-------- 

All relatively clear, as far as mud goes. In Atmel's assembler, the '.equ' directive uses an '=' as a delimiter rather than a comma, so a simple search-and-replace fixed 95% of the file; I removed all the # (C-style) directives and fixed some tabs for cosmetic reasons. All of the dozen files I needed for the ATtinys took me a total of an hour to fix up for avr-as.

So at this point, I could at least have all the shortcut symbols I would need to get at the I/O ports and registers, plus some handy constants. So what's next?

Of the dozens of assembler directives that the GNU docs describe (sort of), you really only need a small subset:

.equ         - to define symbolic constants
.include     - to include other files
.section     - to switch to a section other than the code area, like...
.section .bss - to reserve space in SRAM
.text        - to define the code area
.global      - to make sure an external program can see a label
.macro/.endm - to define fancy code shortcuts
.if/.endif   - more fancy stuff in macros, conditional assembly, and....

...and a gaggle of data description things like:

.byte, .word, .quad, .ascii, .asciz, .octa, .space

...for laying down data (or, with '.space', just reserving room for it).  See below for a working example; a simple tiny26 program that uses a macro to generate an inline millisecond delay ('msec_delay'), and reads an array of bytes from the tail end of program memory to sequentially blink an LED in a set pattern:

 The blink26.S program
;
; blink26.S
;
; ** LOAD IO DEFS FOR THE RIGHT DEVICE
;
.include "avrinc/tn26def.inc"
;
;    ** GET ME SOME HANDY MACROS **
;
.include "avrinc/macro.inc"
;
;  *** DEFINES HERE ***
;
.equ F_CPU, 8000  ; clock in kHz (8 MHz), used by msec_delay in 'macro.inc'
.equ OUTPIN, PA7  ; for t26
.equ IO_OS, 0x20  ; for lds/sts with i/o ports, e.g 'lds r2, (IO_OS + TCNT0)'
;
;  *** SRAM RESERVES HERE ***
;
.section .bss
;  FOR example:
;<datalabel>: .space 1  ; to reserve 1 byte in SRAM for <datalabel>
;
;  *** RESETS, IVECTORS, AND MAIN PROGRAM ***
;
.text
.global reset   ; don't know if '.global' needed but...
reset:
    rjmp main;  *** tn26 interrupt vector names, just fer handy ***
    reti        ; 0001 INT0_vect
    reti        ; 0002 PCI0_vect
    reti        ; 0003 TIMER1_OC1A_vect
    reti        ; 0004 TIMER1_OC1B_vect   
    reti        ; 0005 TIMER1_OVF_vect
    reti        ; 0006 TIMER0_OVF_vect
    reti        ; 0007 USI_START_vect
    reti        ; 0008 USI_OVF_vect
    reti        ; 0009 EERDY_vect                EEPROM Ready
    reti        ; 000A ACOMP_vect             Analog Comparator
    reti        ; 000B ADC_vect                 ADC Conversion Complete   
;
;
.global main
main:
    setupsp r16, RAMEND  ; use the macro to set up the stack
    sbi DDRA, OUTPIN        ; set the DDRA bit for 'OUTPIN' to make that pin an OUTPUT
    ldi ZL, lo8(array)
    ldi ZH, hi8(array)
blink:
    sbi PORTA, OUTPIN        ; then drive it HIGH
    lpm r20, Z
b2:
    msec_delay 1, r17, r18, r19  ; 1 ms wait * r20 times
    dec r20
    brne b2
    cbi PORTA, OUTPIN        ; then LOW...
    lpm r20, Z
b3:
    msec_delay 1, r17, r18, r19  ; 1 ms wait * r20 times
    dec r20
    brne b3
    adiw Z, 1
    lpm r20, Z
    tst r20
    breq end
    rjmp blink;
;
end:
    rjmp end;
;  *** MAIN PROGRAM END ***
;

; *** SUBROUTINES OR INTERRUPT SERVICE ROUTINES HERE ***
;
; (don' have none o' that at the moment)
;
; *** PROGRAM MEMORY DATA HERE ***
;
array: .byte 10,250,20,240,30,230,40,220,50,210,60,200,70,190,80,180,90,170,100,160,110,150,120,140,130,0
;
;  end of blink26.S


--------------------

Everything you really need is right there, unless you are going to start moving code blocks around and deal with a boot loader; if you need to load something into EEPROM you could either hand-code a hex file that would work, or...more when I know more.

Among other things, note the very tail-end of the program, where the 'array:' label is used to mark off a list of byte values.  Nothing special is done with directives (other than .byte to make it clear what size and type the data are) to reserve this area; it's just slapped right to the end of the program after the last of the code.

You can even use '.org <addr>' before the label to put the data at a different address, but....why not just leave it where it is?

A few minor caveats with putting data in program memory....among other things you can only use the Z register (r31:r30 jammed together as a 16-bit register) to access it indirectly, using the LPM instructions.  LPM also has a very limited number of addressing modes, depending on the device, and takes three cycles to get the data to the register.  Use the 'lo8()' and 'hi8()' built-in macros to carve the 16-bit program address into something you can load into the two halves (ZL and ZH) of the Z register.

A more interesting problem...program memory is organized 'word-wise', in other words 16 bits at a time, and every instruction has to start on a word boundary.  The data itself can sit at any byte address (LPM takes a byte address in Z), but if you have an odd number of bytes in a chunk of data, anything assembled after it ends up on an odd address; you may need to add a '.space 1' directive (or a '.balign 2') after it to line things back up...avr-as will complain if you don't.
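
As a minimal sketch of the padding trick (the labels 'pattern' and 'more_code' are made up for illustration, and it assumes more code follows the data): the '.balign 2' directive pads the location counter up to the next even byte address, which does the same job as a hand-counted '.space 1' without the counting.

```asm
; hypothetical example: three bytes of data followed by more code
pattern:
    .byte 10, 20, 0     ; an odd number of bytes...
    .balign 2           ; ...so pad up to the next word (2-byte) boundary
more_code:
    ret                 ; instructions must start on an even byte address
```

If the data is the very last thing in the program, as in blink26.S, there is nothing after it to misalign and the padding is optional.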

----------------


Here is a snip from my 'macro' file, just so you get a taste of how those work:

;
; define SPL if only SP is defined
;
.ifndef SPL
.equ SPL, SP
.endif
;
; setup stack pointer
;
.macro setupsp r,x
ldi \r, \x % 256
out SPL, \r
.if \x > 256
ldi \r, \x / 256
out SPH, \r
.endif
.endm
;
;
;
;***
;    8-bit hex to ascii-coded hex (subroutine)
;
;    use 4 registers above r15!!
;
.macro hex2ascii_m in, tmp, o1, o2
hex2ascii:
    mov \tmp, \in
    andi \tmp, 0x0f    ; get bottom nybble
    ldi \o1, 48
    add \o1, \tmp
    cpi \o1, 58
    brlo h2ach1
    subi \o1, (-7)
h2ach1:
    mov \tmp, \in
    swap \tmp
    andi \tmp, 0x0f    ; get top nybble
    ldi \o2, 48
    add \o2, \tmp
    cpi \o2, 58
    brlo h2ach2
    subi \o2, (-7)
h2ach2:
    ret
.endm
;
;
; ***
; *** inline millisec delay loop
; ***
; *** max is about 10000 at 8 MHz (the third register tops out at 255)
; *** !!and be sure to define F_CPU in main program like below!!
;
;.equ F_CPU, <CPU MHz * 1000>  (e.g. for 8 MHz use 8000)
;
; NOTE: label '1' is necessary if you are going to use this macro more than
;   once in a given program.  The '1b' branch is shorthand for 'branch to the
;   '1' label BEFORE this instruction'.  '1f' would mean to go to the '1' label
;   FOLLOWING the branch.
;
.macro msec_delay delay, ra, rb, rc
    ldi \ra, ((((\delay * F_CPU - 1) % 327680) % 1280) / 5)
    ldi \rb, (((\delay * F_CPU - 1) % 327680) / 1280)
    ldi \rc, ((\delay * F_CPU - 1) / 327680)
1:
    subi \ra, 0x01
    sbci \rb, 0x00
    sbci \rc, 0x00
    brne 1b         ; see note above
    nop
.endm

 ----------------------------
 There are a few different things going on here...

'as' macros are actually quite sophisticated, and I've used them above in two different ways.  If you just want to define a subroutine:

;    8-bit hex to ascii-coded hex (subroutine)
;
;    use 4 registers above r15!!
;
.macro hex2ascii_m in, tmp, o1, o2
hex2ascii:
    mov \tmp, \in
    andi \tmp, 0x0f    ; get bottom nybble
    ldi \o1, 48
    add \o1, \tmp

...

...shows the way.  First, notice that this macro has four parameters; and it is pretty obvious from the way the 'mov' instruction is used above that 'tmp' and 'in' refer to two of the AVR registers.  And it should also be pretty obvious that to get at a parameter inside the macro you need to put a backslash in front of it, e.g. '\tmp'.
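
For the record, here is a sketch of how a subroutine-style macro like this gets used. The register choices r16 to r19 are arbitrary, and it assumes the stack pointer is already set up so that 'rcall' and 'ret' work. Expand the macro exactly once, outside the flow of the main code, then call the 'hex2ascii' label it defines:

```asm
; somewhere in the main code (stack already set up):
    ldi r16, 0x3c       ; byte to convert
    rcall hex2ascii     ; comes back with r18 = 'C' (0x43), r19 = '3' (0x33)
; ...

; down with the other subroutines, expand the macro ONCE:
    hex2ascii_m r16, r17, r18, r19  ; in, tmp, o1 (low nybble), o2 (high nybble)
```

Expanding it a second time would define the 'hex2ascii:' label twice, and the assembler will object to that.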

But you can also pass constants, and use C-style expressions to calculate new ones:
;
.macro msec_delay delay, ra, rb, rc
    ldi \ra, ((((\delay * F_CPU - 1) % 327680) % 1280) / 5)
    ldi \rb, (((\delay * F_CPU - 1) % 327680) / 1280)
    ldi \rc, ((\delay * F_CPU - 1) / 327680)


'delay' is passed in, and then a BUNCH of math is done to figure out how to set the three registers used for the cascading countdown.
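
To make that math concrete, here is a hand-worked sketch for 'msec_delay 1, r17, r18, r19' with F_CPU set to 8000, as in blink26.S. The divide-by-5 is there because one pass through the countdown loop (subi, sbci, sbci, and a taken brne) costs 5 clock cycles, so 1280 is 5 * 256 and 327680 is 5 * 65536. The values below are my own arithmetic, not assembler output, so check them against a disassembly if in doubt.

```asm
; hand expansion of:  msec_delay 1, r17, r18, r19   (F_CPU = 8000)
;
;   cycles wanted:  1 * 8000 - 1        = 7999
;   loop passes:    7999 / 5            = 1599   (5 cycles per pass)
;   1599 as bytes:  0 * 65536 + 6 * 256 + 63
;
    ldi r17, 63     ; low byte:  ((7999 % 327680) % 1280) / 5
    ldi r18, 6      ; mid byte:  (7999 % 327680) / 1280
    ldi r19, 0      ; high byte: 7999 / 327680
```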

'msec_delay' generates inline code, rather than a subroutine...and therefore you may notice some weirdness farther down:

... (continued from above)
...
    ldi \rc, ((\delay * F_CPU - 1) / 327680)
1:
    subi \ra, 0x01
    sbci \rb, 0x00
    sbci \rc, 0x00
    brne 1b         ; see note above
    nop
.endm


How the hell does the branch get to the '1' label if it refers to it as '1b'?  What the hell does THAT mean?  Hmmm...

Okay.  We are going to use this 'msec_delay' more than once, presumably, in a fairly complicated program, so when this macro gets expanded by the assembler...it AIN'T going to like seeing multiple copies of an ordinary named label.  SOOO, the rule is that a digits-only label (a 'local label' in GNU-speak) can be defined as many times as you like, in a macro or anywhere else, and you reference either the 'next' or the 'previous' instance of it; behind the scenes the assembler gives each instance its own unique internal name and resolves every branch or jump to the nearest one in the direction you asked for.  Therefore...

brne 1b

..means 'branch, if not equal, to the nearest '1:' label BEFORE this location' (thus the 'b').  If you need to branch or jump FORWARD, you would say 'brne 1f', ferinstance.
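
A quick sketch of both directions in one place (a made-up fragment, reusing the PORTA and OUTPIN names from blink26.S; r16 and r17 are arbitrary choices):

```asm
    ldi r17, 10
1:                      ; first '1:'
    dec r17
    brne 1b             ; backward: loops to the first '1:' until r17 hits zero
    tst r16
    breq 1f             ; forward: skips to the second '1:' if r16 is zero
    sbi PORTA, OUTPIN
1:                      ; second '1:', same name, no clash
    rjmp 1b             ; backward again: now the nearest '1:' is the second one
```

The same 'b' reference lands on different labels depending on where it sits, which is exactly what lets the macro be expanded over and over.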

And what about conditional things?  Try this out:
 ; setup stack pointer
;
.macro setupsp r,x
ldi \r, \x % 256
out SPL, \r
.if \x > 256
ldi \r, \x / 256
out SPH, \r
.endif
.endm
;


Almost self explanatory.  The macro is intended to set up the stack pointer.  It is named 'setupsp' and has two parameters, 'r' (which should be a register), and 'x', which looks to be used as a constant.  This line:

 .if \x > 256

...looks an awful lot like it is CHECKING that constant to see if it is just so large.  If RAMEND is plugged into 'x', you are probably guessing that we are trying to see if the SRAM on the device is big enough that the stack pointer needs 16 bits instead of just 8...and that is what this does.  If RAMEND is larger than 256, then two extra instructions are generated to init SPH as well as SPL, with the high byte of RAMEND.  (Strictly, the test should be '> 255', since a value of exactly 256 also needs the high byte; it makes no difference for the tiny26's RAMEND of 0x00df.)
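
Here is roughly what the two cases expand to, worked out by hand with the expressions already evaluated. The first uses the tiny26's RAMEND of 0x00df from the include file above; the second assumes a hypothetical device whose RAMEND is 0x015f, purely to show the other branch of the '.if'.

```asm
; setupsp r16, 0x00df   (tiny26: the .if is false, so SPL only)
    ldi r16, 0xdf       ; 0x00df % 256
    out SPL, r16

; setupsp r16, 0x015f   (assumed larger device: the .if is true)
    ldi r16, 0x5f       ; 0x015f % 256
    out SPL, r16
    ldi r16, 0x01       ; 0x015f / 256
    out SPH, r16
```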

One of the few things that the GNU docs DO include is a small handful of macro examples, so those pages are worth a perusal; and the many variations of '.if' are also worth a look.

Make?  Ugh?  Ahh....

Lastly...it would be nice to have an automated way to compile/assemble our programs, so we aren't stuck memorizing a lot of stupid and easy-to-screw-up compiler options...how about a nice Makefile?
--------------------

P=test
MCU=attiny26
BUPATH=./back
TS=`date +%H%M%S-%Y`
ifdef VER
    TS:=$(VER)
endif
ifdef UC
    MCU:=$(UC)
endif
ifdef PRG
    P:=$(PRG)
endif
CFLAGS=-g -mmcu=$(MCU) -Os
AFLAGS=-g -mmcu=$(MCU)
LDLIBS=-I/home/zenas/mylib/
CC=avr-gcc
OC=avr-objcopy
AS=avr-as
ASL=avr-ld
DIS=avr-objdump

$(P).ao: $(P).S
    cp $(P).S $(BUPATH)/$(P)_$(TS).S; $(AS) $(AFLAGS) $(LDLIBS) $(P).S -o $(P).ao

$(P).elf: $(P).ao
    $(ASL) -o $(P).elf $(P).ao

$(P).o: $(P).c
    cp $(P).c $(BUPATH)/$(P)_$(TS).c; $(CC) $(CFLAGS) $(LDLIBS) $(P).c -o $(P).o

$(P).ahex: $(P).elf
    $(OC) -j .text -j .data -O ihex $(P).elf $(P).ahex

$(P).hex: $(P).o
    $(OC) -j .text -j .data -O ihex $(P).o $(P).hex

$(P)-eeprom.hex: $(P).o
    $(OC) -j .eeprom --change-section-lma .eeprom=0 -O ihex $(P).o $(P)-eeprom.hex

dump: $(P).o
    $(DIS) -h -S $(P).o > $(P).lst
   
dumpa: $(P).elf
    $(DIS) -h -S $(P).elf > $(P).lst

ahex: $(P).ahex

hex: $(P).hex

eeprom: $(P)-eeprom.hex

touchc:
    touch *.c

toucha:
    touch *.S

clean:
    rm -f *.o *.ao *.ahex *.hex *.elf


-------------------- 

Just plug in PRG, UC, and VER variables on the command line if you want to use this file generically; by default it compiles/assembles 'test.S' or 'test.c' for the ATtiny26.

Note that there is an extra 'linker' stage (invoked with avr-ld) needed to get the final uploadable code!  One of my early mistakes took a very long time to fix: I was generating both the final '.ahex' file (to burn to the device) and the disassembly '.lst' file from the output object '.ao' file from avr-as, NOT the '.elf' file that the linker produces....while all of the symbolic constants and macros unrolled right, none of the jumps or branches or labels did anything useful.


The 'make dumpa' routine is particularly handy (once you are dumping the right thing anyway); you can see exactly what the assembler has done after all the macros and constants have been resolved, and where everything will end up when you upload to your device.  It will save you an immense amount of headache if you can grit your teeth and familiarize yourself with the disassembled product of avr-gcc and avr-as.  Among the bonuses: you can discover some very handy code shortcuts by poaching the output of avr-gcc...just write up something simple in C, let avr-gcc have it, then use the 'make dump' routine and see what the final product looks like.  avr-objdump very helpfully includes the C code segment used to generate each piece of object code, and as long as the task isn't too complicated you can tease out the jewels.


Incidentally, the 'hex' and 'ahex' routines back up the .c/.S file before it's compiled, shoving a time stamp on it in case you want to go back and see what worked before...the backed-up files end up in the 'back' folder by default.


The makefile works with both avr-libc 'C' files and assembler 'S' files:
make clean           --- gets everything ready
make hex PRG=cblah   --- compiles, links, and generates cblah.hex from cblah.c
make ahex PRG=ablah  --- assembles, links, and generates ablah.ahex from ablah.S
make dumpa PRG=ablah --- creates a disassembly listing (ablah.lst) of ablah.elf
make dump PRG=cblah  --- ditto for the object file of cblah.c

..and so on.  I still upload stuff to the device by hand with 'avrdude':

avrdude -p t26 -P /dev/ttyACM0 -c arduino -b 19200 -U flash:w:ablah.ahex

...but that could probably go in there too, eventually.