Introduction To Microcontrollers Pages 101 130

The document discusses the software development cycle, emphasizing the importance of testing and debugging after code compilation, particularly in safety-critical applications. It outlines strategies for testing, including bottom-up and top-down approaches, and highlights the significance of good documentation in identifying bugs. Additionally, it introduces assembly language programming, explaining its relevance and various addressing modes used in assembly language commands.

Uploaded by

Qazi Sharifullah
Copyright: © All Rights Reserved
4.1 Development Cycle

First of all, you should be aware of the fact that after you have developed a compilable piece of
code, the work is not done yet. You might even say that it has just begun. What comes next is the very
important and often time-consuming task of testing and debugging the software, which makes up a
large portion of the overall development cycle. Testing is performed with the aim to check whether
the tested system meets its specification. Detected deviations from the specification may result in
debugging the program code (if its cause was an implementation error), but may even instigate a
complete redesign of the project in case of a design flaw.
It is immediately apparent that testing is important, even more so in safety-critical applications.
However, it is also a fact that barring the use of formal verification at all stages (including a formally
proven specification!) in conjunction with either automatic code generation or exhaustive testing,
testing and debugging does not guarantee the absence of bugs from your software. It only (hopefully)
removes bugs that show up in the tests, preferably without introducing any new bugs in the process.
The higher the test coverage, the more bugs are found. On the other hand, the longer the testing and
debugging phase, the longer the time-to-market, which has a direct impact on the financial gain that
can be expected from the product. Figure 4.3 roughly sketches the relationship between debugging
time and the percentage of errors remaining in the code.

Figure 4.3: Relationship of debugging time and percentage of errors still in the code. (Plot not reproduced; vertical axis: remaining % bugs in code, 0 to 100; horizontal axis: time.)

As you can see, in the initial stages of the testing phase, a lot of bugs are found and removed in
a short amount of time. After these easy-to-find bugs have been removed, however, it grows more
and more difficult to find and eliminate the remaining errors in the code. Since 80/20 rules are very
popular, there is one for the debugging process as well: The final 20% of the bugs cost 80% of the
money spent on debugging. In the light of these figures, it is only natural for companies to enforce a
limit on the time spent for debugging, which in turn influences the percentage of bugs remaining in
the system. This limit depends on the target field of application, with safety-critical systems putting
the highest demands on the testing and debugging process (using formal verification methods and
automatic testing).

Testing and debugging is not just done on the final product, but should be performed in the early
stages of implementation as well. As we have seen, the sooner a bug is caught the better. In
consequence, modular design is important, because it facilitates testing. Testing concerns should also
be considered during and incorporated into the design of the product. Both bottom-up and top-down
testing are feasible strategies. In both cases, the application (which is on the highest level, on the top)
is broken into modules, which are again composed of sub-modules and so on. In bottom-up testing,
the components on the lowest level, which are not broken down any further, are tested first. After that,
the module which is formed by them is tested, and so on until the final integration test of the whole
application, which tests the interworking of the modules. In top-down testing, the sub-modules of a
module are emulated by so-called stubs, which are dummy implementations with the sole purpose
of providing adequate behavior to allow testing the module. Testing then starts at the top and moves
down until the lowest level is reached. The top-down strategy has the advantage that the application
itself can be tested at an early stage. Since a design error on a high level most likely affects the levels
below and can even instigate a complete redesign, finding such errors early saves a lot of time.
However, this approach requires the implementation of stubs and does not remove the need to do additional
integration tests after the sub-modules become available. The bottom-up strategy does not need stubs,
but high-level modules can only be tested after all sub-modules are available and tested. Note that the
usage of stubs allows any module, on any level, to be tested at an early stage. So a hybrid approach
could be implemented, testing low-level modules as soon as they are finished, while in the meantime
testing crucial high-level modules with stubs.
Note that in any of the strategies, it is not sufficient to test the modules stand-alone. Integration
tests must be performed to see if the modules correctly work together, and if any sub-module is
changed, it and all modules affected by the change must be tested as well.
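
To make the idea of a stub concrete, here is a minimal C sketch. All names (read_temperature_fn, fan_should_run, stub_hot, stub_cold) and the threshold of 60 are hypothetical and invented purely for illustration. The high-level module receives its sub-module through a function pointer, so a dummy implementation can stand in for the real one during top-down testing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Interface of the (hypothetical) low-level sub-module: returns a
   temperature in degrees Celsius. */
typedef int (*read_temperature_fn)(void);

/* High-level module under test. It depends on the sub-module only
   through the function pointer, so a stub can be passed in. */
bool fan_should_run(read_temperature_fn read_temperature)
{
    assert(read_temperature != NULL);   /* precondition */
    return read_temperature() > 60;     /* illustrative threshold */
}

/* Stubs: dummy implementations that just return canned values. */
int stub_hot(void)  { return 85; }
int stub_cold(void) { return 20; }
```

Calling fan_should_run(stub_hot) and fan_should_run(stub_cold) exercises both branches of the high-level module before any real sensor code exists. An integration test with the real sub-module is still needed later.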

Finally, do not underestimate the value of good code documentation for avoiding and also finding
bugs. Good documentation of the code forces the software engineer to think about what he or she
is doing, about expectations placed upon the hardware and software. Not only does this help the
software engineer focus on what needs to be done, it also facilitates debugging because the initial
expectations can be compared to the real outcome step by step until the error is found.
4.2 Programming
4.2.1 Assembly Language Programming
This section gives a brief introduction to some of the concepts of assembly language programming
which most students will be unfamiliar with. In the following, we will encounter many of the concepts
of Section 2.1 again. This text is by no means exhaustive – for further information, you will have to
consult the datasheets and manuals provided for the microcontroller and Assembler of your choice.

Why Assembly Language?


In times of object-oriented programming and nth generation languages, even bringing the assembly
language up is often viewed as anachronistic. After all, everything you can do in assembly, you can do
in <insert the programming language of your choice, henceforth referred to as PL>. PL is just slower
and less memory efficient (and some people would probably even argue with that), but at the same
time much safer and a lot more convenient. For a few € more, you can buy better hardware with
enough raw computing power and a big enough memory to offset what little advantage Assembler
might have in terms of program speed and size. There seems to be no real reason to concern yourself
with a programming language as crude as assembly. Sure, somebody has to write the first compiler
or interpreter for a high level language, but that need not be you, really. There will always be some
people crazy enough to volunteer for that, let them do it.
Well, it is not quite as simple as that. Embedded systems applications in particular tend to have high
production counts, so a few € per item could translate into a few million € more in total production
costs. It might pay off to bother with assembly after all, if it means you can fit your code into a 2 €
MCU with 8 KB SRAM and 8 MHz instead of one with 256 KB / 50 MHz at 10 €.
Of course, cost is not the only issue. True, your competition might save a lot in production by
using assembly, but since PL is so much more convenient, your development time will be shorter, so
you can beat them to market. Doesn’t that count for something? Well, it does, but not as much as
one might think. Unfortunately, firmware development tends to be closer to the final deadlines than,
say, hardware development or market research. So, if the product is not on time, that often seems to
be due to the software part. However, delays due to problems in hardware development, hardware-
software interaction, marketing, or management may all help to push firmware development right into
and beyond the deadline. Also, you might use a high level language on a high-end controller for the
proof-of-concept prototype and switch to assembly language on a low-end controller for production,
overlapping firmware development with other activities. In short: Firmware development is but one
part of product development, and reducing firmware development time by 50% does not even remotely
reduce time to market by 50%.
And even if you can afford to develop your firmware in PL – once you work close enough to
hardware, you will find that you often need to verify what your compiler makes of your source. For
that, a basic familiarity with assembly language is required.

What is Assembly Language?


You are, of course, aware that any program code written in a high-level language like C++ needs to
be translated into machine code before it can be executed by a processor.
Ultimately, a program in machine language is just a sequence of numbers. If represented in
the right base (in most cases base 2 does the trick), the internal structure of a command is usually
discernible. For example, there is a command in the AVR instruction set [1] to copy the content of one
register (the source) into another register (destination). In binary, it looks like this:

001011sdddddssss

A ‘d’ represents one binary digit of the destination register number, an ‘s’ one digit of the source
register number (as usual, the most significant bit is to the left). So, if we wanted to copy the contents
of R4 into R3, we would have to write:

0010110000110100

Obviously, binary numbers are rather unwieldy – writing a program in machine language would
be tedious. This is where assembly language comes into the picture: Instead of using the actual
numbers, meaningful names are assigned to each command. In AVR assembly language, the above
would read

MOV R3, R4

which is a lot easier to remember (in fact, the command names are referred to as mnemonics, because
their main purpose is to help us memorize the commands).
The CPU, however, does not ‘understand’ mnemonics, so a program written in assembly language
needs to be translated into machine language before it can be executed by the CPU. This translation
is done with a program aptly named Assembler. In its most basic form, an Assembler merely replaces
command mnemonics with the corresponding number (though usually, it does quite a lot more than
that, as we will see shortly).
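
As a rough illustration of what this translation step amounts to for a single command, the following C sketch packs the register numbers of MOV into the bit pattern 001011sdddddssss given above. The function name encode_mov is made up for this example; a real Assembler handles far more than one mnemonic.

```c
#include <assert.h>
#include <stdint.h>

/* Encode "MOV Rd, Rr" following the bit pattern 001011sdddddssss,
   where d is the destination and s the source register number. */
uint16_t encode_mov(unsigned d, unsigned s)
{
    assert(d < 32 && s < 32);                 /* R0..R31 only */
    return (uint16_t)(0x2C00u                 /* 0010 11.. .... .... */
                      | ((s & 0x10u) << 5)    /* top source bit -> bit 9 */
                      | (d << 4)              /* destination -> bits 8..4 */
                      | (s & 0x0Fu));         /* low source bits -> bits 3..0 */
}
```

For MOV R3, R4 the call encode_mov(3, 4) yields 0x2C34, which is the binary number 0010110000110100 from the example.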

Addressing Modes
Now, if we want to load some value into a register, we could use the above command – but only if that
same value is already available in another register, which will usually not be the case. Obviously, we
need some means to load an arbitrary constant into a register. We would need a command quite similar
to the one above – copy a value into some register –, but the source would have to be a numerical
constant instead of another register. The difference between the commands would be in the so-called
‘addressing mode’: There are different ways to specify the operands of commands.

Register Addressing
What we have seen in the example above is called register addressing. If we use register addressing
for the source, it means we use the value contained in the given register. If we use it for the destination,
it means we want to store the value into the given register.

Immediate Addressing
Now, if we need to load a constant value into a register, the destination is still specified in register
addressing mode, but this time the source is a constant value. This addressing mode is called immediate
addressing. The corresponding AVR command would be:

LDI Rx, <8-bit value>

[1] In the remainder of this section, we will use the AVR instruction set for our examples where possible.

Note carefully that x, the target register number, has to be in the range 16-31. The same goes for
all assembly instructions involving immediate addressing.
So, to load the hexadecimal value 50₁₆ (or 0x50 in C-syntax) into register R16, one would use

LDI R16, 0x50
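
The register restriction becomes easier to see when the command is encoded by hand. The sketch below assumes the LDI bit layout 1110 KKKK dddd KKKK from the AVR instruction set manual (this layout is not given in the text above): with eight bits spent on the constant, only four bits remain for the register, and these are taken to mean R16 to R31. The function name encode_ldi is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Encode "LDI Rd, K" assuming the layout 1110 KKKK dddd KKKK.
   Only four bits are available for the register, so the encoded
   value is (register number - 16). */
uint16_t encode_ldi(unsigned reg, uint8_t value)
{
    assert(reg >= 16 && reg <= 31);           /* R16..R31 only */
    unsigned d = reg - 16;
    return (uint16_t)(0xE000u
                      | ((value & 0xF0u) << 4)  /* high nibble of K -> bits 11..8 */
                      | (d << 4)                /* register -> bits 7..4 */
                      | (value & 0x0Fu));       /* low nibble of K -> bits 3..0 */
}
```

Under that assumption, LDI R16, 0x50 corresponds to encode_ldi(16, 0x50), which gives 0xE500.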

Direct Addressing
Register and immediate are the two most basic addressing modes, but there are a lot more. Think of
data memory — neither of the above addressing modes can be used to load values from data memory.
The addressing mode we need for that is called direct addressing:

LDS R1, <16-bit address>

So, to load the value from data memory address 0x2000 into register R1, we would write:

LDS R1, 0x2000

Indirect Addressing
Now assume we want to operate on a sequence of bytes in data memory – say, we need to compute
a bit-wise exclusive or over all bytes from address 0x3001 up to and including 0x3080 as a simple
checksum. Yes, we could use the above addressing mode, but with the source address being a static
part of the command, we would need 0x80 = 128₁₀ assembly commands. If we were able to change
the source address of the command, we could do the same in a simple loop. This is where indirect
addressing comes in handy. An indirect load command would look like this:

LDS R1, (0x2000)

The parentheses around the address indicate that this command does not just load the byte from address
0x2000 directly into R1. Rather, it reads the bytes from 0x2000 and 0x2001 and combines them to
form the actual address from which the register is loaded. So, assuming 0x2000 contains 0x00 and
0x2001 contains 0x30 (and assuming the CPU is a little endian machine, which the AVR is), the
effective address would be 0x3000. Therefore, the above command would load the value from address
0x3000 into R1 – despite the fact that the address 0x3000 does not actually appear in the command
itself.
It is clear how this addressing mode could be useful in our checksum example: We could write a
loop containing a command which computes the exclusive or of R1 with (0x2000). If we increment
the 16-bit address stored at 0x2000/0x2001 by one each time and exit the loop after 128 (0x80) iterations,
in effect we compute our checksum over the specified range of bytes.
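
The checksum loop can be modelled in C as follows. This is only a sketch of the hypothetical memory indirect addressing mode: data memory is represented by a plain 64 KiB byte array, and the helper names (read_pointer, write_pointer, xor_checksum_indirect) are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read the little-endian 16-bit address stored at mem[ptr_addr] and
   mem[ptr_addr + 1]. */
static uint16_t read_pointer(const uint8_t *mem, uint16_t ptr_addr)
{
    return (uint16_t)(mem[ptr_addr] | (mem[(uint16_t)(ptr_addr + 1)] << 8));
}

/* Store a 16-bit address at ptr_addr, low byte first. */
static void write_pointer(uint8_t *mem, uint16_t ptr_addr, uint16_t value)
{
    mem[ptr_addr] = (uint8_t)(value & 0xFF);
    mem[(uint16_t)(ptr_addr + 1)] = (uint8_t)(value >> 8);
}

/* XOR checksum over `count` bytes, addressed indirectly through the
   address word at ptr_addr. mem must model the full 64 KiB space. */
uint8_t xor_checksum_indirect(uint8_t *mem, uint16_t ptr_addr, unsigned count)
{
    assert(mem != NULL);
    uint8_t r1 = 0;
    for (unsigned i = 0; i < count; i++) {
        uint16_t effective = read_pointer(mem, ptr_addr);   /* (ptr_addr) */
        r1 ^= mem[effective];                               /* indirect load + XOR */
        write_pointer(mem, ptr_addr, (uint16_t)(effective + 1));
    }
    return r1;
}
```

With the address word at 0x2000/0x2001 initialized to 0x3001, a call with count 128 covers the bytes from 0x3001 up to and including 0x3080.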

Indirect Addressing With Auto-Increment/Decrement


However, we don’t even need to do the incrementing ourselves. Loops operating on a continuous
sequence of bytes are so common that there are dedicated addressing modes for that, namely indirect
addressing with auto-increment or -decrement. An indirect load with auto-increment would look
something like this:
LDS R1, (0x2000)+

That command does the same as the above: Say address 0x2000 contains 0xff and 0x2001
contains 0x30, then the command loads the value from address 0x30ff into R1. However, after that,
it increments the two-byte word stored at address 0x2000, so that 0x2000 now contains 0x00 and
0x2001 contains 0x31 (giving 0x3100 – the address right after 0x30ff).
So, the above command really does two things: First an indirect load from (0x2000), and second
an automatic increment. However, that order is arbitrary, and it might also make sense to first do
the auto-increment, and then do the indirect load. The prefixes pre and post indicate which order is
actually used: The above example would be indirect addressing with post-increment; pre-decrement
would mean that the content of 0x2000/0x2001 is first decremented and then used to create the
effective address.
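
The difference between the two orders can be written down in a few lines of C. In this sketch the 16-bit address is kept in an ordinary C variable instead of in memory at 0x2000/0x2001, which is a simplification; the function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Post-increment: use the address first, then advance it. */
uint8_t load_post_increment(const uint8_t *mem, uint16_t *pointer)
{
    assert(mem != NULL && pointer != NULL);
    uint8_t value = mem[*pointer];
    *pointer = (uint16_t)(*pointer + 1);
    return value;
}

/* Pre-decrement: step the address back first, then use it. */
uint8_t load_pre_decrement(const uint8_t *mem, uint16_t *pointer)
{
    assert(mem != NULL && pointer != NULL);
    *pointer = (uint16_t)(*pointer - 1);
    return mem[*pointer];
}
```

Starting with the address 0x30ff, a post-increment load reads the byte at 0x30ff and leaves 0x3100 behind; a pre-decrement load issued right after that steps back to 0x30ff and reads the same byte again.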

Load/store Architectures
Did you notice we said that an indirect load command would look like this? That is
because the AVR CPU core we use as an example does not offer memory indirect addressing.
It is a RISC CPU core with a load/store architecture. This means that only load/store
commands access the data memory, whereas generic arithmetic/logic commands only operate
on registers.
So, to compute the sum of two values in data memory, first each value must be loaded into
a register using a dedicated load command, because the add command is only available
with register addressing.
Furthermore, load/store architectures usually do not reference data memory twice in one
command. That is why there is no memory indirect addressing at all in the AVR instruction
set: The CPU would first need to access memory to read the effective address, and a second
time to load the byte from there.
Yes, this is inconvenient, but it is also efficient: CISC architectures usually offer most
addressing modes for all commands, but due to increased hardware complexity, they
require several machine cycles to compute effective addresses and transfer the actual data.
RISC CPUs, on the other hand, limit memory accesses to load/store instructions, and even
then to only one memory access. In consequence, they need several commands to achieve
what a CISC architecture does in one. However, due to the aforementioned restrictions,
the CPU design is much more ‘streamlined’, and there are more resources for, say, a large
register set, ultimately increasing performance.

Memory/Register Indirect Addressing


The above examples are actually a special variant of indirect addressing, namely memory indirect.
There is also register indirect addressing. There, the effective address is not taken from a given
memory address, but rather from a register or a pair of registers. An example of that in AVR assembly:

LD R1, X

Now, you probably expected to see a register (or a pair of registers) in the source operand, and a
pair of parentheses as well. Instead, it says just ‘X’. Well, according to the official register nomenclature
for the AVR architecture, X, Y, and Z are 16-bit registers comprised of the 8-bit registers R27/R26
(X), R29/R28 (Y), and R31/R30 (Z). E.g., R27 contains the high byte of 16-bit register X, and R26
contains the low byte. In addition to that, these three registers are used as indirect address registers.
This means that, when used to address data memory, the indirect addressing is implicitly assumed.
So, the above command actually means something like this:

LD R1, (R27:R26)

Register indirect addressing works pretty much like memory indirect addressing: If R27 contains
0x30 and R26 contains 0x00, the above command loads the byte from data memory address 0x3000
into R1. Of course, auto-increment/decrement can work with register indirect addressing, too, though
not all combinations may be implemented. The AVR offers post-increment and pre-decrement, but
not post-decrement and pre-increment.
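
A small C model may help to see how the register pair forms the effective address. It is a sketch under simplifying assumptions: the register file and data memory are two separate arrays (the mapping of the working registers into data memory is ignored), and the names x_register and ld_x are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine R27 (high byte) and R26 (low byte) into the 16-bit X register. */
uint16_t x_register(const uint8_t regs[32])
{
    return (uint16_t)((regs[27] << 8) | regs[26]);
}

/* Model of "LD Rd, X": load regs[d] from the data address held in X. */
void ld_x(uint8_t regs[32], const uint8_t *data_mem, unsigned d)
{
    assert(d < 32 && data_mem != NULL);
    regs[d] = data_mem[x_register(regs)];
}
```

With R27 = 0x30 and R26 = 0x00, ld_x(regs, data_mem, 1) copies the byte at data address 0x3000 into R1, as described above.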

Indirect Addressing With Displacement


There are still more variants of indirect addressing: Often, you need to keep an array of records with
identical structure in memory – e.g., a list of students’ grades. For each student, we store the
matriculation number (three bytes), the achieved points (one byte), and the grade (one byte). Now, a
student record can be referenced by its base address. However, you would have to modify the address
in order to access the different components of the data structure – e.g., to access the points of a record
given its base address 0x2000, you would have to add 0x03. If you then need to access the grade, you
would have to add 0x01 to that, or 0x04 to the base address. To get the last byte of the matriculation
number, you’d have to subtract 0x02 if your current address points to the grade, or 0x01 if it points
to the achieved points, or add 0x02 if you’re still at the base address. With all those manipulations of the
address, the code becomes hard to read and maintain.
To remedy that, there is a variant of indirect addressing just for this sort of access, namely indirect
addressing with displacement. It works just like indirect addressing, but also features a fixed offset
(the displacement). On the AVR, it would look like this:

LDD R1, Y+displacement

Remember that the above command actually means

LDD R1, (R29:R28+displacement)

So, if the base address of a student record is in the Y-register, and you need the points in register
R1 and the grade in R2, the code could look like this:

LDD R1, Y+3
LDD R2, Y+4

Note that you do not need to manipulate the address at all, because the offset within the record is
given as the displacement.
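
In C terms, the displacement plays the role of a field offset within a record. The sketch below models the student record from the text; the constant names and the function ldd are invented for illustration, and the upper limit of 63 for the displacement is taken from the AVR instruction set manual rather than from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout of one student record as described in the text (5 bytes). */
enum {
    OFFSET_MATRICULATION = 0,   /* three bytes */
    OFFSET_POINTS        = 3,   /* one byte */
    OFFSET_GRADE         = 4,   /* one byte */
    RECORD_SIZE          = 5
};

/* Model of "LDD Rd, Y+q": the base address in Y stays untouched, the
   displacement q selects the field. */
uint8_t ldd(const uint8_t *data_mem, uint16_t y, unsigned q)
{
    assert(data_mem != NULL && q <= 63);   /* assumed displacement range 0..63 */
    return data_mem[(uint16_t)(y + q)];
}
```

With the base address 0x2000 in Y, ldd(data_mem, 0x2000, OFFSET_POINTS) and ldd(data_mem, 0x2000, OFFSET_GRADE) read both fields without ever changing the base address.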
PC-relative Addressing
Addressing modes are also relevant for jump instructions – after all, the target address of a jump
must be specified just like the address from which we want to load data. So, we could use direct
addressing, specifying the absolute target address in the command. However, that would take two
bytes for the address. In the case of the AVR, including the 2-byte command itself we would need
four bytes for each jump instruction. Now, looking at assembly language programs, one can observe
that jump instructions rarely reach far; since most of them are used for program loops, the target is
usually within a few dozen commands from the jump instruction itself. Obviously, one could save
memory space by telling the CPU to jump ahead or back from the current program address instead
of anywhere within the whole program memory space. Considering that we regularly jump small
distances, we can use a displacement that is considerably smaller than what would be needed for
absolute addressing. The so-called PC-relative addressing mode does this:

RJMP <relative offset from current PC>

In the AVR’s case, the relative offset is 12 bits wide and is contained within the command’s 16
bits. Interpreted as a signed number, this allows for a relative jump range from -2048 to +2047
program words. In contrast, a long jump uses 22 bits
for absolute addressing of the target address. With that, the long jump can cover the entire program
memory, but at the price of two extra bytes.
How exactly does the jump work? Usually, when a command is executed, the program counter
(PC) is incremented to point to the next command. With an RJMP, however, the given offset is added
to the (already incremented) PC. So, consider the following situation

Addr Opcode
0x0000 ...
0x0001 ...
0x0002 ...
0x0003 RJMP -4 ; jump back to address 0x0000
0x0004 ...

While the RJMP command at program address 0x0003 is decoded, the PC is incremented
concurrently (in order to increase performance, the AVR even prefetches the next sequential command while
the current command is executed). Once the CPU knows that it is supposed to jump, the PC already
points to the next address 0x0004. To that, the offset -4 is added, which gives address 0x0000.
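
The address arithmetic of the example can be checked with a one-line C function. This is just a sketch; the name rjmp_target is made up, and addresses are program word addresses as in the listing above.

```c
#include <assert.h>
#include <stdint.h>

/* Target of a relative jump located at word address `addr` with signed
   offset k: the PC has already advanced to addr + 1 when k is added. */
uint16_t rjmp_target(uint16_t addr, int k)
{
    assert(k >= -2048 && k <= 2047);       /* 12-bit signed offset */
    return (uint16_t)(addr + 1 + k);
}
```

For the RJMP -4 at address 0x0003, rjmp_target(0x0003, -4) gives 0x0000. An offset of 0 would simply continue with the next command.
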
Of course, computing address offsets becomes tedious once the target is more than a few
commands away. Luckily, the Assembler does that for you, but you still need to specify where to jump.
For that, the so-called labels are used:

Label Addr Opcode


LOOP: 0x0000 ...
0x0001 ...
0x0002 ...
0x0003 RJMP LOOP
0x0004 ...
A label is a special mark we can set up at each command or address. If we need to jump to the
command at address 0x0000, we put a unique label there, like ‘LOOP’ in the above example. In the
jump command, we just give the label instead of the offset. This way, the PC-relative addressing
appears like absolute addressing. However, you need to be aware that this is just because the Assembler
does the offset computation for you, and that the actual addressing mode is still PC-relative. That
distinction becomes important once a jump goes very far: The offset is usually too small to cover
the full program memory address range. After all, saving the extra 2 bytes for the full address is the
advantage of PC-relative jumps, so we only have a rather limited range for the offset. If the jump
exceeds this, you need a long jump with absolute addressing.
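
What the Assembler does behind the scenes for a jump to a label can be sketched in C as follows. The function name assemble_rjmp is invented; the opcode layout 1100 kkkk kkkk kkkk is taken from the AVR instruction set manual, not from the text, and the range check assumes a 12-bit signed offset without any wrap-around of the program memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Turn the label's word address into an offset relative to the
   incremented PC and pack it into the opcode 1100 kkkk kkkk kkkk.
   Returns false if the target is out of range. */
bool assemble_rjmp(uint32_t jump_addr, uint32_t label_addr, uint16_t *opcode)
{
    assert(opcode != NULL);
    int32_t k = (int32_t)label_addr - ((int32_t)jump_addr + 1);
    if (k < -2048 || k > 2047) {
        return false;               /* a long jump would be needed */
    }
    *opcode = (uint16_t)(0xC000u | ((uint32_t)k & 0x0FFFu));
    return true;
}
```

For the RJMP LOOP at address 0x0003 with LOOP at 0x0000, the computed offset is -4, which under the assumed layout gives the opcode 0xCFFC. A label 0x1000 words away is rejected, which is the case where a long jump with absolute addressing is needed.
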
In the above, we said that saving memory is one of the advantages of PC-relative addressing. The
second is that code which avoids absolute program memory addressing and instead uses PC-relative
addressing exclusively is independent of its absolute location within program memory space – the
code is implicitly relocatable: Since the target of each jump is given as an offset from the current
location, it does not make a difference where in the memory space the code actually is. If it uses
absolute addressing on the program memory, it must either be loaded at a specific address, or all
absolute references to program memory need to be adjusted – a process which is called code relocation.
That used to be an advantage, but is not really important anymore, since modern Assemblers do all
relocating for you.

Pseudo-Opcodes

You will often need some values in data memory which are initialized at the start of your program.
The question is: how do you get them into data memory in the first place? Of course, you could load
them into a register with immediate addressing and write them into data memory. That would require
two machine commands for each byte, which is a bit of a waste. Ideally, you would need a way to
directly pre-load the data memory, without involving the CPU.

.byte and .word


For things like that, an Assembler offers so-called pseudo-opcodes. These are operation codes which
do not correspond to an actual machine command, but still directly generate output [2]. To initialize
bytes in data memory, the following pseudo-opcodes can be used:

twoBytes:
.byte 0x01, 0x02
andOneWord:
.word 0x0403

This pre-loads data memory with the byte sequence 0x01, 0x02, 0x03, and 0x04. The byte 0x01
is at address twoBytes, 0x02 at address twoBytes+1, and the word 0x0403 at address andOneWord
(which is twoBytes+2).
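
The byte order of the word is worth a closer look: 0x0403 ends up in memory as 0x03 followed by 0x04. The following C sketch models the two pseudo-opcodes for a little-endian target; the function names emit_byte and emit_word are invented for illustration and are not part of any real Assembler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Append one byte at the current position (model of ".byte"). */
size_t emit_byte(uint8_t *image, size_t pos, uint8_t value)
{
    assert(image != NULL);
    image[pos] = value;
    return pos + 1;
}

/* Append a 16-bit word, low byte first (model of ".word" on a
   little-endian target such as the AVR). */
size_t emit_word(uint8_t *image, size_t pos, uint16_t value)
{
    pos = emit_byte(image, pos, (uint8_t)(value & 0xFF));
    return emit_byte(image, pos, (uint8_t)(value >> 8));
}
```

Emitting the bytes 0x01 and 0x02 followed by the word 0x0403 fills four consecutive bytes with 0x01, 0x02, 0x03, 0x04.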

[2] Actually, it seems to be quite common to refer to pseudo-opcodes as directives (a directive controls the Assembler’s internal state, but does not directly generate output).

.ascii and .asciz


You can even use text strings:

string1:
.ascii "Hello, world!"
string2:
.asciz "Hello!"

The .ascii pseudo-opcode puts the sequence of ASCII codes for the given characters in memory. The .asciz
pseudo-opcode does the same, but adds a zero as a string terminator.

.space
To initialize a sequence of bytes with the same value, .space can be used:

buffer1:
.space 10, 0x80

This would fill 10 bytes of memory with 0x80, and the label ‘buffer1’ would contain the address
of the first of those ten bytes. If the fill value is omitted, 0x00 is assumed.

Assembler Directives
As we have seen, the main purpose of an Assembler is to translate opcodes (assembly mnemonics)
into machine language. In addition to that, it also does a lot of the mundane tasks of low-level
programming for you, e.g., computing address offsets or relocating code. However, for many of these
tasks, we need to specify information beyond that which is in the assembly opcodes themselves. This
information is provided through so-called directives: Statements which do not directly produce binary
output, but instead change the Assembler’s internal state. For example, a directive is used to specify
the actual location of code in memory.

.org
We learned from the above that the Assembler converts assembly and pseudo opcodes into a binary
file which contains machine language statements or raw binary data. For that, the Assembler uses
a so-called location counter: Each time it encounters an assembly or pseudo opcode, it produces
the appropriate machine instructions or data bytes and puts them in the output file under the address
contained in the location counter. The location counter is continually advanced to point to the next
free address.
However, we definitely need control over the specific location where our code will go. Just take
the AVR’s data memory space: The 32 working registers are mapped into the first 32 (0x20) bytes,
and the various I/O-registers are mapped at addresses 0x0020–0x005f. We would not want our ASCII
strings to overwrite any of those. Rather, if we put a string in data memory, we would want to specify
an address greater than or equal to 0x0060.
That is accomplished with the .org-directive, which sets the location counter to a specific
address: a new ORiGin.

.org 0x0060
.ascii "Hello, world!"
In the example above, the first character of the string is put at address 0x0060, the next at 0x0061,
and so on. The .org-directive can of course be used multiple times:

.org 0x0060
.ascii "Hello, world!"
.org 0x0070
.ascii "Hello, world!"

Here, the first string starts at 0x0060, and the second one at 0x0070.
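
The interplay of location counter and .org can be imitated with a toy model in C. This is a sketch only: the type toy_assembler and the functions directive_org and directive_ascii are invented for this example, and the output image is limited to 256 bytes to keep things small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy assembler state: an output image plus the location counter. */
typedef struct {
    uint8_t  image[256];
    uint16_t location;
} toy_assembler;

/* Model of ".org": move the location counter to a new origin. */
void directive_org(toy_assembler *as, uint16_t address)
{
    assert(as != NULL);
    as->location = address;
}

/* Model of ".ascii": copy the characters (no terminator) and advance
   the location counter past them. */
void directive_ascii(toy_assembler *as, const char *text)
{
    assert(as != NULL && text != NULL);
    size_t length = strlen(text);
    assert(as->location + length <= sizeof as->image);
    memcpy(&as->image[as->location], text, length);
    as->location = (uint16_t)(as->location + length);
}
```

After directive_org(&as, 0x0060) and directive_ascii(&as, "Hello, world!"), the 13 characters occupy 0x0060 to 0x006c and the location counter points to 0x006d, so a second .org 0x0070 leaves a small gap before the next string.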

.section
The address alone, however, is only sufficient to specify a memory location if you are dealing with a
unified address space containing both code and data (a von Neumann architecture). The AVR is a
Harvard architecture, which means that data and program memories each have their own address space.
In addition to that, the AVR has EEPROM memory, also with its own address space. This means that
the address 0x0060 could be in any of the three memory spaces. Obviously, we need another directive
to declare which memory space we are currently using, namely .section <section name>.
ASCII strings should of course go into data memory:

.section .data
.org 0x0000
.ascii "Hello, world!"

The first line makes .data the active section. Notice how we specify 0x0000 as the address,
while before we said that addresses below 0x0060 are not available. That’s because the Assembler
avr-as has been specifically tailored to the AVR architecture. Since the first 0x60 bytes in the SRAM
are occupied by working and I/O registers, an implicit offset of 0x0060 is added to all addresses in
the .data section.
On the AVR, the .data section specifies the SRAM. The FLASH (program) memory is referred
to as the .text section, and the EEPROM memory would be the .eeprom section. Note that each
of these sections has its own location counter:

.section .data
.org 0x0010
.byte 0x01
.section .text
.org 0x0080
LDI R16, 0x02
.section .data
.byte 0x03

Here, the byte 0x01 ends up at SRAM address 0x0070 (0x0010 plus implicit offset 0x0060), and
the byte 0x03 at SRAM address 0x0071, despite the .org directive in between. This is because the
.data section has its own location counter, which is unaffected by any .org directive issued while
some other section is active.
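
The behaviour of the separate location counters can be reproduced with a small C model. It is a sketch with invented names (section_state, directive_section, section_org, emit), it only tracks addresses and produces no output, and the addresses it returns are relative to the section, so the implicit SRAM offset of 0x0060 described above is not included.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { SECTION_TEXT, SECTION_DATA, SECTION_EEPROM, SECTION_COUNT } section_id;

/* One location counter per section; `active` selects which one the
   following directives and opcodes use. */
typedef struct {
    uint16_t   location[SECTION_COUNT];
    section_id active;
} section_state;

/* Model of ".section": switch the active section. */
void directive_section(section_state *s, section_id id)
{
    assert(s != NULL && id < SECTION_COUNT);
    s->active = id;
}

/* Model of ".org": affects only the active section's counter. */
void section_org(section_state *s, uint16_t address)
{
    assert(s != NULL);
    s->location[s->active] = address;
}

/* Place `size` bytes in the active section and return the address
   (within that section) where they start. */
uint16_t emit(section_state *s, uint16_t size)
{
    assert(s != NULL);
    uint16_t placed_at = s->location[s->active];
    s->location[s->active] = (uint16_t)(placed_at + size);
    return placed_at;
}
```

Replaying the example, the first byte lands at .data address 0x0010, the two-byte LDI at .text address 0x0080, and the second byte at .data address 0x0011, unaffected by the .org issued while .text was active.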

.equ
Up to now, we referred to registers by their actual names R0–R31. To make an assembly program
more readable, it would make sense to be able to assign registers meaningful names. That can be
accomplished with the .equ directive:

.equ loopCounter, R16
LDI loopCounter, 10

.equ is short for equivalent: Here, we tell the Assembler that it should treat the name
‘loopCounter’ as equivalent to R16. (We use R16 rather than a lower register because LDI
only accepts target registers in the range 16-31.) For C/C++ programmers: .equ works just like
the C preprocessor keyword #define.

Status Flags
If you are familiar with high level languages, but have never programmed in assembly language, status
flags might be a new concept for you. Apart from the working registers, program counter, and the
stack pointer, a CPU has a special register containing various status flags. This register is usually
called the status register or condition code register.

Carry Flag
A status flag is a bit which indicates whether the CPU is in a particular internal state or not. At this
point, we will look at the arithmetic status flags. You know that the CPU contains an arithmetic-logic
unit (ALU), which executes arithmetic and logic operations. For example, let’s look at what can
happen during an add operation:

0x10
+0x20
-----
0x30

In this case, everything is in order. 0x10 + 0x20 makes 0x30. Now try a different addition:

0x70
+0x90
-----
0x100

Adding 0x70 and 0x90, we get a result of 0x100. However, the working registers are all just eight
bits wide. This means that the most significant bit in the result is lost, and the target register will
contain 0x00 after the add operation. Obviously, that would not be acceptable. What to do? Well,
we clearly need to know whether the result of the addition was too large to fit in the target register.
For that – you guessed it – a status flag is used, namely the carry flag. This flag indicates that the last
operation (the add in our case) yielded a result which was too large to fit in the target register – in our
case, a ninth bit was set, which needs to be ‘carried over’ to the next digit. That way, no information
is lost, and the result is correct if we consider the state of the carry flag.
The carry flag enables us to add numbers that are too wide for our working registers, say 0x170 +
0x290:

    0x70
  + 0x90
  ------
  0x0100   ; low bytes: result 0x00, carry = 1
+ 0x01     ; high byte of the first operand
+ 0x02     ; high byte of the second operand
--------
  0x0400

Here, we first add the two least significant bytes, which results in 0x00 and sets the carry flag
to 1. Then, we add the two most significant bytes plus the carry flag. So, we really have two add
operations: One which adds just the two operands, and one which also adds the current carry flag. A
program to compute the above addition would look like this:

LDI R17, 0x01
LDI R16, 0x70 ; R17:R16 = 0x0170
LDI R19, 0x02
LDI R18, 0x90 ; R19:R18 = 0x0290
ADD R16, R18 ; add low bytes (without carry flag)
ADC R17, R19 ; add high bytes (with carry flag)

As you can see, the AVR even offers two different add operations: ADD will just add two register
contents, while ADC (ADd with Carry) adds the carry flag, too.
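
For readers who prefer to experiment on a PC, the same two-step addition can be modelled in a few lines of Python. This is merely a sketch of the arithmetic; the function names add8 and add16 are made up for this example and do not correspond to anything on the AVR.

```python
def add8(a, b, carry_in=0):
    """Add two bytes plus an optional carry; return (result byte, carry out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8   # the ninth bit is the carry

def add16(x, y):
    """Add two 16-bit values byte by byte, the way ADD followed by ADC does."""
    low, carry = add8(x & 0xFF, y & 0xFF)        # like ADD: no carry in
    high, carry = add8(x >> 8, y >> 8, carry)    # like ADC: carry from low bytes
    return (high << 8) | low, carry
```

Calling add8(0x70, 0x90) yields the byte 0x00 together with a carry of 1, and add16(0x0170, 0x0290) combines both steps into 0x0400.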
Now, what about subtraction? Again, there is nothing to it, as long as we don’t exceed the width
of our registers:

0x10
-0x40
-----
0xd0 = -0x30

Why is 0xd0 equal to -0x30? Well, to represent negative numbers, the so-called two’s complement
is used. Starting out with the corresponding positive value, say 0x30, we first create the one’s
complement by inverting each bit: 0x30 = 0b00110000 → 0b11001111 = 0xcf. To that, we add 1 to
arrive at the two’s complement: 0xcf + 1 = 0b11001111 + 1 = 0b11010000 = 0xd0.
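
The two steps (invert, then add 1) are easy to try out in Python. The helper names twos_complement and as_signed below are chosen for this sketch only; as_signed simply interprets a byte as a signed number, so that the result can be checked.

```python
def twos_complement(value, bits=8):
    """Negate a value in two's complement: invert every bit, then add 1."""
    mask = (1 << bits) - 1
    ones_complement = ~value & mask       # 0x30 -> 0xcf
    return (ones_complement + 1) & mask   # 0xcf -> 0xd0

def as_signed(byte):
    """Interpret an 8-bit pattern as a signed (two's complement) number."""
    return byte - 0x100 if byte & 0x80 else byte
```

Here twos_complement(0x30) gives 0xd0, and as_signed(0xd0) gives -0x30 again.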

Negative Flag
Note that with two’s complement, the range of positive numbers which can be represented in a byte
is restricted to numbers ≤ 127. This also means that any number which has the most significant bit
set is negative – the msb in effect becomes a sign bit. Actually, there is the so-called negative flag in
the status register, which reflects the state of the msb in the result of an operation and thus indicates
that it would be a negative number if interpreted as two’s complement.
Now, what’s up with the +1? Why not just use one’s complement? Well, in one’s complement,
there are two representations for the number 0: +0 (0b00000000) and -0 (0b11111111). Mathematically,
however, they are of course the same. Two’s complement remedies that: 0b11111111 is -1, and
0 is just 0b00000000. This also means that we can subtract two numbers by making the second one
negative and then just adding them: To compute 0x50 - 0x30, we first compute the two’s complement
of 0x30, which is 0xd0. Then, we add that to 0x50:

0b01010000 = 0x50
+0b11010000 = 0xd0
-----------------------
0b(1)00100000 = 0x(1)20

If we ignore the carry flag, the result is just what we expected: 0x50 - 0x30 = 0x20. This would
not work in one’s complement.
But wait: why should we ignore the carry flag? Well, we don’t actually ignore it – we just interpret
it as a borrow flag, which is kind of an inverse carry flag. If the borrow flag is 0 (carry flag is 1) after
a subtraction, there was no need to borrow anything. If it is 1 (carry flag is 0), then the second number
was larger than the first, and we had to borrow a higher bit. Consequently, when subtracting the next
higher byte, we need to subtract the borrow flag – if it was 1 (carry flag 0), we subtract just that 1,
because this is what was borrowed. If it was 0, nothing was borrowed, so we don’t subtract anything.
The AVR even negates the carry flag automatically after a subtraction, so in this context, the carry
flag actually is the borrow flag.
Let’s try this with 0x220 - 0x170 (in decimal: 544 − 368 = 176):

0x20
+ -0x70 = 0x90
----------------
0xb0 ; carry = 0 -> borrow = 1

0x02
+ -0x01 = 0xff
- 0x01 ; borrow flag from low byte
--------------
0x100 ; carry = 1 -> borrow = 0
--------------
0x00
------
0x00b0

The carry (the msb of the second addition’s result 0x100) is inverted to 0 for the borrow flag,
because no borrow was necessary. This makes the high byte 0x00, which together with the low byte
0xb0 gives the result as 0x00b0 = 176 (decimal).
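
The same computation can be sketched in Python. The functions sub8 and sub16 are hypothetical helpers for this illustration; they return the borrow directly, which is what the AVR’s carry flag holds after a subtraction. The two steps are comparable to the AVR’s SUB and SBC (subtract with carry) instructions.

```python
def sub8(a, b, borrow_in=0):
    """Subtract b and an optional borrow from a; return (result byte, borrow out)."""
    difference = a - b - borrow_in
    return difference & 0xFF, 1 if difference < 0 else 0

def sub16(x, y):
    """Subtract two 16-bit values byte by byte, low byte first."""
    low, borrow = sub8(x & 0xFF, y & 0xFF)       # low bytes, no borrow in
    high, borrow = sub8(x >> 8, y >> 8, borrow)  # high bytes minus the borrow
    return (high << 8) | low, borrow
```

With these, sub8(0x20, 0x70) returns 0xb0 with a borrow of 1, and sub16(0x0220, 0x0170) returns 0x00b0 with no borrow left over.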

Overflow Flag
The fact that a byte can now represent both positive and negative integers introduces a new problem:
With unsigned integers, an overflow was no problem, because it could be handled with the carry flag.
With signed integers, it is not that simple. Consider the following addition:

0x60
+ 0x20
------
0x80

If we interpret the numbers as unsigned, everything is in order: 96 + 32 = 128 (decimal). However, if
the numbers are supposed to be signed, we have a problem: In two’s complement, 0x80 is not 128,
but rather -128. Despite the carry flag not being set, an overflow did actually occur, namely from bit
6 into bit 7, the latter of which is supposed to be the sign bit. To indicate two’s complement overflows,
which result in an incorrect sign bit, the overflow flag is used.
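
A common way to detect such an overflow is to check whether both operands have the same sign while the result has a different one. The following Python sketch models this for an 8-bit addition; the function name add8_flags and the dictionary of flags are assumptions made for this example, not a description of the AVR hardware.

```python
def add8_flags(a, b):
    """Add two bytes; return the result byte and the flags C, V and N."""
    total = a + b
    result = total & 0xFF
    carry = total >> 8
    # Signed overflow: the sign bit of the result differs from the sign bit
    # of both operands (which therefore must have had the same sign).
    overflow = 1 if (a ^ result) & (b ^ result) & 0x80 else 0
    negative = result >> 7
    return result, {"C": carry, "V": overflow, "N": negative}
```

For 0x60 + 0x20 the model reports no carry, but both the overflow and the negative flag; for 0x70 + 0x90 it reports a carry, but no signed overflow (112 + (-112) = 0 is correct in two’s complement).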

Zero Flag
Another very important flag is the zero flag: This flag is set whenever the result of an operation is
zero. Now, why have a dedicated flag to indicate that the result of, say, a subtraction is zero? After
all, most of the time it is irrelevant whether the result was zero or not. If one must absolutely know
that, why not just compare it to zero afterwards?
Well, that is just it: There are no actual comparisons in assembly language. If you need to compare
the content of two registers, you cannot just write ‘if (Rx == Ry)’ like you would in C. What you can
do is use those registers in an operation and then look at the status flags. So, to find out whether
the two are equal, you subtract them and look at – yes, the zero flag. If after the subtraction the zero
flag is set, it means that the values were equal. For subtractions used as comparisons, the AVR offers
a special instruction called CP (ComPare). This is actually a SUB, but the result is discarded, so two
registers’ contents can be compared without overwriting one of them in the process.
Of course, just having a flag set when the two values were equal is not enough. We obviously need
a way to change the execution path of our program depending on the zero flag. For that, conditional
branches are used. For each flag, there are usually two branch instructions: One which executes the
branch if the flag is set and skips it if not, and one which does the reverse. In the case of the zero flag,
these two conditional branches are BREQ (BRanch if EQual) and BRNE (BRanch if Not Equal). So, a
comparison would look like this:

Label Opcode
---------------
CP R1, R2 ; computes R1 - R2
BREQ theyWereEqual
theyWereNOTEqual:
...
...
...
JMP restOfTheProgram
theyWereEqual:
...
...
...
restOfTheProgram:
...
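
The control flow of this listing can be mimicked in Python. The sketch below is only a model: cp stands for a compare that computes the difference, keeps nothing but the flags, and leaves both operands untouched, while compare_and_branch plays the role of the BREQ. Both names are invented for this example.

```python
def cp(rd, rr):
    """Model of a compare: subtract, discard the result, return only the flags."""
    difference = (rd - rr) & 0xFF
    return {"Z": 1 if difference == 0 else 0,   # zero flag: values were equal
            "C": 1 if rd < rr else 0}           # carry flag acts as borrow

def compare_and_branch(r1, r2):
    """Return the label at which execution would continue after CP and BREQ."""
    flags = cp(r1, r2)
    if flags["Z"]:
        return "theyWereEqual"       # BREQ taken
    return "theyWereNOTEqual"        # BREQ not taken, fall through
```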

Subroutines and the Stack


In addition to jump/branch instructions, assembly language also offers subroutine calls, which are
somewhat similar to procedure or function calls in higher level languages:

Label Opcode
...
CALL mySubroutine
...
...
...

mySubroutine:
...
...
...
RET
...
In the above example, the CALL instruction causes the CPU to jump to the label mySubroutine
and execute the instructions there, until it encounters a RET instruction. At that point, it returns to the
line immediately after the CALL.
Of course, in order to correctly return from a subroutine, the CPU must store the return address
somewhere. It could use a special register for that, but what about nested subroutines? After all, we
might call a subroutine from within a subroutine. Sure, we could use multiple return address registers,
but their number would limit our maximum call depth.
An elegant solution for that problem is the so-called stack. Instead of using internal registers to
store a number of return addresses, we store them in data memory and use a special register, the stack
pointer, to point to the current return address.
So, each time the CPU encounters a call instruction, it puts the current address (actually, the
address of the next instruction) onto the stack. The RET instruction causes the CPU to get the most
recent return address from the stack. Now what addressing mode would it use on these accesses? It
seems that register indirect would make sense. After all, the stack pointer is a register, which points
to the current position in the stack. If we need to put something on the stack, we take the contents
of the stack pointer and use it as an address under which we store our data. However, we also need
to advance the stack pointer, so when the next CALL follows, we can put the corresponding return
address onto the stack as well. Obviously, this is a case for register indirect with automatic increment
or decrement.
Which of these should it be, increment or decrement? Well, consider the memory usage of a
program: The working registers will usually not be sufficient to store all the data your program needs.
So you will store some or rather most of it in data memory. You are of course free to write anywhere,
but you will probably start at the lowest addresses. It’s the same with high-level languages: In C, the
heap (where dynamically allocated memory is stored) grows from lower to higher addresses. So, it would
make sense to put the stack somewhere else. Now, assume we have, say, 0x100 bytes of data memory.
We could use the lower 0x80 bytes for general data storage, and allocate the higher 0x80 bytes for the
stack. This would mean that initially, the stack pointer points to address 0x80 and the stack grows upwards.
There is one catch: We may need a lot of memory for data storage, but maybe very little space on
the stack. That would be a waste, because our data can only grow up to 0x80, where it would begin
to overwrite the stack. We could of course move the stack to a higher address, but we would need to
know beforehand how much stack space we are going to need – after all, if we move the stack too
high up, we could run out of stack space, which would be just as fatal.
The solution to this is simple, but effective: We locate the stack right at the top of our data memory
and let it grow downwards. So, if we consume data memory from lowest address up, while the stack
grows from highest address downwards, sharing of memory between data and stack will automatically
work as long as the total amount of available memory is not exceeded.3
This makes register indirect with auto-decrement the logical addressing mode for stack access.
Whether it is pre- or post-decrement depends on the architecture. The AVR uses post-decrement, but
the Motorola HC12, for example, uses pre-decrement. The difference is merely in the initial value
for the stack pointer. With post-decrement, it is initialized to the last valid address, whereas with
pre-decrement, it would be the last valid address + 1.
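
A small Python model may help to see how a post-decrement stack behaves. It assumes 0x100 bytes of data memory as in the example above; the class name Stack and its attributes are made up for this sketch.

```python
class Stack:
    """Downward-growing stack with a post-decrement stack pointer."""

    def __init__(self, memory_size=0x100):
        self.memory = [0] * memory_size
        self.sp = memory_size - 1          # initialized to the last valid address

    def push(self, value):
        self.memory[self.sp] = value & 0xFF   # store at the current position ...
        self.sp -= 1                          # ... then decrement (post-decrement)

    def pop(self):
        self.sp += 1                          # increment first ...
        return self.memory[self.sp]           # ... then read the value
```

After two calls to push, the stack pointer of this model has moved from 0xFF down to 0xFD, and two calls to pop return the values in reverse order and bring it back to 0xFF.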

Interrupts
Interrupts are a special type of subroutine. An interrupt is called automatically when a particular
condition becomes true – for example, when an internal timer/counter overflows, or at the rising edge
of some signal. This event is asynchronous to the program execution, so an interrupt routine can get
called at any point in your program. Obviously, it is mandatory that interrupt routines do not leave
the state of the CPU or data in memory changed in a way which may have an unintentional influence
on program execution at any point. One common cause of problems is the status register:

...
myLoop:
...
...
DEC R16
BRNE myLoop
...

In the above example, an interrupt routine might be executed between the decrement of R16 and
the conditional branch. However, the branch instruction is based on the state of the zero flag as set by
the DEC instruction. Now, if the interrupt routine changes that flag without restoring it, the branch
will be erroneous. Of course, the probability that an interrupt occurs right at that point in the program
is very low – which is actually not a good thing, as it makes that bug not only extremely difficult to
track down, but also highly elusive during testing.

Push, Pop
So the CPU uses the stack to remember return addresses for subroutine calls and interrupts, but
of course, you can use the stack as well. Consider the case where you call a subroutine: At this
point, many of the working registers will be in use. The subroutine will, of course, also need to use
some registers. This means that you may need to save some registers before you call the subroutine.
Considering that the subroutine might, in turn, call another subroutine, or that you might even have
recursive subroutine calls, avoiding register contention would become a major issue.
This is where the stack comes in handy: When you write a subroutine, you make sure that all
registers you are going to use are saved at the beginning and restored at the end of the subroutine.
This is best accomplished by putting them on the stack:

Label   Opcode
        ...
mySubroutine:
        PUSH R16
        PUSH R17
        LDI R16, 0x10
        LDI R17, 0x40
        ...
        ...
        ...
        POP R17
        POP R16
        RET
        ...

3 Of course, for any serious or even critical application it is still mandatory to formally verify
that the available memory will be sufficient for all possible executions.

The subroutine uses two working registers, R16 and R17. Right at the start, both registers’ contents
are put on the stack. After that, you are free to use them. At the end, you load the values back
from the stack, restoring the registers’ contents to what they were when the subroutine was called.
This also works if the subroutine is called recursively: Each time, the current content of the
registers is saved on and restored from the stack. It is of course critical that the recursion is bounded
so the stack does not overrun.
Note that the stack is a LIFO (last in, first out) storage: We first PUSH R16 onto the stack,
post-decrementing the stack pointer. Then we PUSH R17, and again the stack pointer is decremented. At
this point, the value on top of the stack is the content of R17. So, when we restore the values, we
first POP R17, which increments the stack pointer and then reads the value. The next POP instruction
also increments the stack pointer, which now points to the content of R16.
While the stack is really neat as a place to temporarily save register contents, one must always be
extremely careful with it. What happens, for example, if at the end of the subroutine, we forget to
POP one of the registers, say R16? Well, of course, the content of R16 will not be restored to what
it was before the subroutine was called. That’s bad, but unfortunately, the problems don’t stop there:
Remember how the CPU uses the stack to store the return address for each CALL instruction? If
we forget to POP R16, that one byte remains on the stack. Upon the RET instruction, the CPU goes
ahead and reads the return address (two bytes) from the stack, which it stored there during the CALL
instruction. Due to our missing POP instruction, however, instead of reading first the low byte and
then the high byte of the return address, it reads the content of R16 and the low byte of the return
address. The two bytes are then assembled to form what should be the return address, but of course
isn’t. The CPU now tries to jump back to the instruction right after the most recent CALL, but instead,
it jumps to an address which is basically random, the low byte being the original content of R16, and
the high byte being the low byte of the actual return address.
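
The effect of the missing POP can be reproduced with a toy model in Python, where a plain list serves as the LIFO. The function name run_subroutine and its parameters are hypothetical, and the byte order of the return address on the stack is an assumption that simply follows the description above.

```python
def run_subroutine(return_address, r16, r17, forget_pop_r16=False):
    """Return the address the CPU jumps to on RET."""
    stack = []
    # CALL: put the return address on the stack (low byte ends up on top,
    # following the description in the text)
    stack.append(return_address >> 8)
    stack.append(return_address & 0xFF)
    stack.append(r16)            # PUSH R16
    stack.append(r17)            # PUSH R17
    # ... body of the subroutine ...
    stack.pop()                  # POP R17
    if not forget_pop_r16:
        stack.pop()              # POP R16
    # RET: read two bytes and assemble them into an address
    low = stack.pop()
    high = stack.pop()
    return (high << 8) | low
```

With a return address of 0x1234 and R16 containing 0xAB, the correct version returns to 0x1234, whereas the version without POP R16 returns to 0x34AB.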
Another sure recipe for disaster is forgetting to initialize the stack pointer in the first place. Upon
reset, the stack pointer is initialized to 0x0000. You need to set the stack pointer to the highest data
memory address, otherwise your program will most likely crash at the first subroutine call or interrupt.

An Example Program
Here’s a little demo program to compute a checksum over eight bytes in FLASH memory:

; include register definitions for ATmega16,
; but do not put into LST-file
.NOLIST
.INCLUDE "m16def.inc"
.LIST

.equ temp, 0x10
.equ loopCounter, 0x11
.equ checkSum, 0x12

.section .text
.global Main
.org 0x0000

Reset:
        rjmp Main ; this is the reset vector

Main:
        ; initialize stack pointer
        ldi temp, lo8(RAMEND)
        out SPL, temp
        ldi temp, hi8(RAMEND)
        out SPH, temp

        ; initialize Z to point at Data
        ldi ZL, lo8(Data)
        ldi ZH, hi8(Data)

        ; we need to loop 7 times
        ldi loopCounter, 0x07

        ; load first data byte
        lpm checkSum, Z+

ComputeChecksum:
        lpm temp, Z+
        eor checkSum, temp
        dec loopCounter
        brne ComputeChecksum

Infinite_loop:
        rjmp Infinite_loop

Data:
        .byte 0x01, 0x02, 0x03, 0x04
        .byte 0x05, 0x06, 0x07, 0x08

Now, let’s take a closer look:

; include register definitions for ATmega16,
; but do not put into LST-file
.NOLIST
.INCLUDE "m16def.inc"
.LIST

As in other languages, it is common to put definitions in separate files to be included in your
source files. The file we include here, m16def.inc, is provided by the manufacturer of the AVR, and
it contains definitions of register names (via the .equ directive) which conform to the datasheet. So,
when you include that file, you can refer to the General Interrupt Control Register by its short name
GICR, rather than just its address 0x3b. Also, some values specific to the ATmega16 are defined, for
example RAMEND, which is the highest address of the ATmega16’s SRAM.
The .NOLIST and .LIST directives control whether the output of the Assembler is to be included
in the so-called list file. A list file is optionally generated by the Assembler, and it contains the
source lines along with the machine code the Assembler produced. Here, we don’t want the entire
m16def.inc file to be included in the list file.

.global Main

This is just to demonstrate how labels are made public to other files. Normally, the scope of a
label is the file, so if your program consists of two files, and you define a label ‘myLoop’ in one of
them, it will only be visible in the other file if you explicitly make it known to the linker by declaring
it global.

.section .text

The purpose of that line should be clear by now: We activate the .text section, which is where
our program code should go.

.org 0x0000

This directive is redundant, as the location counter of the .text section is initialized to 0x0000
anyway. However, it is good practice to clearly mark the starting address with a .org directive.

Reset:
rjmp Main ; this is the reset vector

As explained in the datasheet in Section ‘Interrupts’, the AVR expects the interrupt vector table at
the beginning of the FLASH memory. Each interrupt source is assigned a vector, which is a particular
address the CPU jumps to when the interrupt occurs. The vector for the External Interrupt 0 (INT0),
for example, is at word address 0x0002 (which is byte address 0x0004). This means that if INT0 is
enabled and the interrupt occurs, the CPU jumps to word address 0x0002. Normally, there will be a
jump instruction to the actual interrupt routine.
The first interrupt vector is for the reset routine. This vector is called upon reset, in particular after
power-on. So, we will usually put a jump to the start of our program at that address, as shown
above. However, if we do not need any other interrupt vectors, it is of course possible to just put the
program at address 0x0000.

Main:
; initialize stack pointer
ldi temp, lo8(RAMEND)
out SPL, temp
ldi temp, hi8(RAMEND)
out SPH, temp

This example program does not actually need the stack, as it uses neither interrupts or subroutines
nor any PUSH/POP instructions. Still, since forgetting to initialize the stack is a pretty popular
mistake among beginners, we do it here anyway. It’s simple, really: the m16def.inc include file defines
both the stack pointer register (as low and high byte, SPL and SPH), and the highest address in
the ATmega16’s SRAM (RAMEND, which is defined as 0x045f). hi8() and lo8() are used to get the
high and low byte of this 16-bit address and assign them to the stack pointer high and low byte.

; initialize Z to point at Data
ldi ZL, lo8(Data)
ldi ZH, hi8(Data)

The bytes over which we need to compute our checksum are located in the FLASH right behind
the program code, at label ‘Data’. The LPM instruction, which reads bytes from the FLASH, uses
the Z register for register indirect addressing. Therefore, we initialize the Z register with the address
marked by the ‘Data’ label.

; we need to loop 7 times
ldi loopCounter, 0x07

; load first data byte
lpm checkSum, Z+

We need to repeat the loop seven times – the first of our eight bytes is loaded into the checkSum
register (using post-increment), which then is EORed with the remaining seven bytes consecutively.

ComputeChecksum:
lpm temp, Z+
eor checkSum, temp
dec loopCounter
brne ComputeChecksum

The label marks the start of the loop. First, the next byte is loaded into a temporary register (since
the EOR only accepts register addressing), then it is EORed into the checkSum register. The loop
counter is decremented, and unless it has become zero, the BRNE jumps back to the start of the
loop. Note that LPM uses register indirect with post-increment as the addressing mode for the source
operand, so we do not need to increment the address pointer Z manually.
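
To check what the loop should produce, the same steps can be replayed in Python. This is just a sketch for verification on a PC; the variable names mirror the register names of the assembly program, and the index z stands in for the Z pointer.

```python
DATA = [0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08]

def xor_checksum(data):
    """Replay the checksum loop of the assembly program step by step."""
    check_sum = data[0]            # lpm checkSum, Z+
    z = 1                          # Z now points at the second byte
    loop_counter = len(data) - 1   # ldi loopCounter, 0x07 for eight bytes
    while loop_counter != 0:
        temp = data[z]             # lpm temp, Z+
        z += 1
        check_sum ^= temp          # eor checkSum, temp
        loop_counter -= 1          # dec loopCounter
    return check_sum               # brne fell through: loop is done
```

For the eight bytes 0x01 to 0x08, the exclusive-or of all bytes is 0x08, which is the value the checkSum register should hold when the program reaches the infinite loop.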

Infinite_loop:
rjmp Infinite_loop

At this point, our program is finished. However, the CPU keeps working, of course. It would
continue from here on, trying to execute whatever garbage may be left in the FLASH. For the sake of
simplicity, in this example we just append an infinite loop. Any real program would put the AVR into
an appropriate sleep mode as long as there is nothing to do.

Data:
.byte 0x01, 0x02, 0x03, 0x04
.byte 0x05, 0x06, 0x07, 0x08

Finally, this is the data that we operate on. Technically, this should of course go into the .data
section. However, the SRAM is not accessible to the programmer (the download tool), so we can only
allocate space there, but we cannot have it automatically initialized. If we need initialized data in
the .data section (SRAM), we would still need to put it in the .text section (FLASH) and then copy it
into the .data section at the start of our program.
This concludes our short introduction to assembly language. For more information, refer to the
‘AVR Instruction Set Manual’ and the ‘ATmega16 Datasheet’ as well as the GNU Assembler Manual.
4.3. DOWNLOAD 117

4.3 Download
After a program has been compiled and linked, you need to download the executable to the
microcontroller. On the host side, downloading is generally done via the serial or parallel port. On the
microcontroller’s side, one or more programming interfaces are available. The big questions are how
host and target are connected, and how the microcontroller knows when to take over a new program
and where to put it.
But before we take a closer look at how a program gets into the controller, let us first consider
what we want to download in the first place. When you write a program and compile it, the compiler
will generate one binary file. This file contains the different segments, like the text segment
with the program code, several data segments, and possibly an EEPROM segment containing EEPROM
data. If all your controller’s memory types are accessible through one common address space
(see Section 2.2), you can simply download this binary. The linker will have made sure that the
segment addresses correspond to the start addresses of the different memory types, ensuring that the
program ends up in the program memory, variables in RAM, and the EEPROM data in the EEPROM
memory. If your controller has different address spaces, however, it may be necessary to extract the
different blocks (program code, EEPROM, possibly RAM data) from the binary and download them
separately. For example, the ATmega16 has a Harvard architecture with separate Flash, RAM, and
EEPROM memory address spaces. Of these, only Flash and EEPROM are externally accessible, and
it is necessary to program these two separately. So in the case of the ATmega16, you would extract
both the program code and the EEPROM data from the single binary generated by the compiler, and
download these files separately. RAM cannot be programmed at all, so if initialized variables are
used, their values are stored in program memory by the compiler and copied into RAM by the startup
code.

4.3.1 Programming Interfaces


Microcontrollers have at least one, but often several programming interfaces. These interfaces may
be normal communication interfaces that are used for programming as well, like the SPI, special
interfaces just used for programming, like the parallel programming interface of the Atmel ATmega16,
or debug interfaces (JTAG, BDM) used for programming.
In any case, there is a certain programming protocol that has to be followed. As an example,
let’s consider programming the ATmega16 over the SPI interface: Here, you need to pull the RESET
pin to low and then transmit a special “Programming Enable” instruction (0xAC53XXXX, where X
means don’t care) to commence programming. While transmitting the third byte of the instruction,
the second byte is echoed back to acknowledge programming mode. If it does not echo back correctly,
you need to give the RESET line a positive pulse after the fourth byte and then try again. After
programming mode has been entered, further instructions like “Chip Erase”, “Read Program Memory”,
or “Write EEPROM Memory” are available. To end the programming session, just release the
RESET line to commence normal program execution. Similar protocols must be followed with other
programming interfaces.
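
The handshake described above could be sketched on the host side as follows. This is a rough illustration in Python, not a real programmer: the callbacks spi_transfer and pulse_reset are hypothetical and would have to be provided by whatever drives the programming adapter, the number of attempts is an arbitrary choice, and the class FakeTarget merely stands in for a chip so that the sketch can be tried without hardware.

```python
PROGRAMMING_ENABLE = (0xAC, 0x53, 0x00, 0x00)  # last two bytes are "don't care"

def enter_programming_mode(spi_transfer, pulse_reset, max_attempts=3):
    """Try to enter programming mode; RESET is assumed to be held low already.

    spi_transfer(byte) sends one byte and returns the byte received meanwhile;
    pulse_reset() gives the RESET line a positive pulse. Both are hypothetical.
    """
    for _ in range(max_attempts):
        received = [spi_transfer(byte) for byte in PROGRAMMING_ENABLE]
        if received[2] == PROGRAMMING_ENABLE[1]:   # 0x53 echoed with third byte
            return True
        pulse_reset()                              # out of sync: pulse, try again
    return False

class FakeTarget:
    """Stand-in for a chip that is out of sync for the first `failures` attempts."""

    def __init__(self, failures=0):
        self.failures = failures
        self.previous = 0x00
        self.pulses = 0

    def transfer(self, byte):
        # When in sync, echo the byte received during the previous transfer.
        reply = self.previous if self.failures == 0 else 0xFF
        self.previous = byte
        return reply

    def pulse(self):
        self.pulses += 1
        self.failures = max(0, self.failures - 1)
        self.previous = 0x00
```

With FakeTarget(failures=1), the first attempt fails, RESET is pulsed once, and the second attempt succeeds.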
Obviously, connecting such an interface to the serial port of the PC requires special software, the
programmer, as well as special hardware, the programming adapter. For the programming adapter,
you may require at least some logic to translate the PC signals to the voltage of the microcontroller and
vice versa. More elaborate hardware may also contain additional logic to implement the programming
protocol; for example, JTAG adapters contain a small controller for that purpose.
As far as the programmer is concerned, it normally needs to access the pins of the PC’s serial
port directly to implement the programming protocol. Simple serial transmission using the standard
UART protocol is only possible if there is external hardware to implement the programming protocol
itself.
The same is true for using the PC’s parallel port. Note that if the programming interface requires
more than two wires, you can only use USB if the programming adapter is capable of implementing
the programming protocol. If it is not, then a simple USB to RS-232 converter will not work, as you
need more than just the RX and TX pins of the serial interface.

4.3.2 Bootloader
An alternative to using the programming interface every time you want to change your application
program is to use a bootloader. This is a piece of software already residing in the controller’s memory
that takes over new user programs and installs them in the controller. In that case, programming can
be done for example via the UART interface of the controller, so there may not be any need for more
than a simple (or no) programming adapter.
The important thing here is how control is transferred from the bootloader to the user program and
vice versa. After all, if you want to program something, you need control to lie with the bootloader.
At all other times, the controller should execute your program (and the bootloader should not interfere
with program execution). This problem can be solved if the bootloader is executed directly after the
reset. The bootloader simply checks on an external pin whether the user wants to program something,
and if not, it transfers control to the user application. If the pin, which could be connected to a
jumper on the board, indicates that a new program will be downloaded, then the bootloader enters
programming mode, in which it accepts the new program from the PC and stores it in the program
memory of the controller. After the download has completed, the bootloader transfers control to the
application program.
When using a bootloader and normal RS-232 communication, the download protocol is only
determined by the bootloader. The programmer on the host does not have to access any port pins
and need not even know any particulars about the programming interfaces of the target controller.
Furthermore, additional features like integrity checks by the bootloader can be implemented. On the
negative side, the bootloader takes up memory space in the controller, so it should be small. Secondly,
if anything happens to the bootloader, either through an accidental overwrite by the application (some
controllers have a special bootloader section which cannot be overwritten by application code) or
through a bit flip, then the bootloader has to be reprogrammed the hard way through the controller’s
normal programming interface. Finally, not all microcontrollers allow resident code to overwrite
the program memory.

4.3.3 File Formats


Apart from considerations about programming protocols and interfaces, there is the question of which
file format to use for downloading the program. Obviously, the final download into the memory of
the controller should be binary, storing the sequence of opcodes (see Section 2.1.2) in the program
memory. However, it makes sense to use an extended file format for programming which also contains
information about the size of the program, its intended location, and a checksum to ensure integrity.
The programmer (or bootloader) translates this format into the binary form required to program the
memory. Therefore, it depends on the programmer which object file format should be used.
Two ASCII file formats are widely used for this purpose, the Hex format from Intel and the S-Record
format from Motorola. The advantage of using an ASCII file format is that it allows you to view
the file with a text editor.

Intel’s Hex Format

A hex file [Int88] consists of a series of lines (records) in a file. Each record is made up of six fields:

    Field      #chars   Description
  1 Mark       1        a simple colon, ’:’
  2 Length     2        number of bytes in the data field
  3 Offset     4        the address (2 bytes) at which data should be programmed
  4 Type       2        record type (00, 01, or 02)
  5 Data       0-2k     0 to k bytes; this contains the opcodes
  6 Checksum   2        chosen so that the sum of the bytes in fields 2-5 plus the checksum is zero (modulo 256)

Note that since this is an ASCII encoding, each byte (in hexadecimal) requires two characters. For
example, a byte with value 255 would be written as “FF”.
The format can be used for 8-, 16- and 32-bit microprocessors. It distinguishes between several
different record types, not all of which are available for all architectures:

  Type  Description                      Architecture
  '00'  data record                      8-, 16-, 32-bit
  '01'  end of file record               8-, 16-, 32-bit
  '02'  extended segment address record  16-, 32-bit
  '03'  start segment address record     16-, 32-bit
  '04'  extended linear address record   32-bit
  '05'  start linear address record      32-bit

Consider the following example (taken from an ATmega16 assembly program):

:100000000CC00F9300E000000000000000000A9503
:10001000D1F70F910A95A9F708950FE50DBF04E0F8
:100020000EBF00E005BB0FEF04BB11E015BB00E005
:0E003000E8DFE7DFE6DF8894111FF0F3F7CF7B
:00000001FF

The first line has a data length of 0x10 = 16 bytes, programming should start at address 0x0000,
and the type of the record is 00 (data). After that follow 16 bytes of data, starting with 0x0C. The
ATmega16 has a 16-bit opcode and is a little-endian machine, so the first opcode is 0xC00C (0x0C at
byte address 0x0000, 0xC0 at byte address 0x0001), which translates to an rjmp with a relative offset
of 0x0C, that is, a jump to word address 0x0D, in this case the start of the main program. The last
byte in the record, 0x03, is the checksum, which you get by summing up all bytes from the length
field 0x10 through the last data byte 0x95 (that makes 0x02FD) and computing the two’s
complement of the lowest byte (-0xFD = 0x03). The following three records are also data records.
The last line is the end-of-file record.
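
The checksum rule can be tried out on a PC with a few lines of C. The following is only a minimal sketch: the names hex_digit and ihex_record_ok are invented for this illustration, and the function checks only the format and the checksum of a single record, not the meaning of its type or data.

```c
#include <stddef.h>
#include <string.h>

/* Convert one hex digit to its value, or return -1 if it is not a hex digit. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper: returns 1 if rec is a well-formed Intel hex record
 * with a correct checksum, 0 otherwise.  rec must not contain a newline. */
int ihex_record_ok(const char *rec)
{
    size_t len = strlen(rec);
    unsigned sum = 0;
    size_t i;

    /* shortest record: ':' + length + offset + type + checksum = 11 chars */
    if (len < 11 || rec[0] != ':' || (len - 1) % 2 != 0)
        return 0;

    /* add up all bytes after the colon, including the checksum byte */
    for (i = 1; i < len; i += 2) {
        int hi = hex_digit(rec[i]);
        int lo = hex_digit(rec[i + 1]);
        if (hi < 0 || lo < 0)
            return 0;
        sum += (unsigned)(hi * 16 + lo);
    }

    /* the length field must match the number of data bytes present */
    if ((size_t)(hex_digit(rec[1]) * 16 + hex_digit(rec[2])) != (len - 11) / 2)
        return 0;

    /* fields 2-5 plus the checksum must sum to zero modulo 256 */
    return (sum & 0xFF) == 0;
}
```

A programmer or bootloader would typically run such a check on every record before writing anything to memory, and reject the whole file if a single record fails.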

Motorola’s S-Record Format


The second popular file format is the S-record file format, which again consists of a sequence of lines
called records. Each record is made up of the following fields:

     Field       #chars  Description
  1  Start Mark  1       the letter 'S'
  2  Type        1       record type (0, 1, or 9)
  3  Length      2       number of bytes to follow
  4  Address     4       the address (2 bytes) at which the data should be programmed
  5  Data        0-2k    0 to k bytes; this contains the opcodes
  6  Checksum    2       chosen so that the lowest byte of the sum of fields 3-5 plus the checksum is 0xFF

The format can be used for 8-, 16- and 32-bit microprocessors. However, only the types 0, 1, and 9
are important for 8-bit architectures (giving the file format the alternative name S19 file format):

  Type  Description
  0     header
  1     data
  9     end of file

Formats S2 (24-bit addressing) and S3 (32-bit addressing) with additional record types 2, 3, 5, 7, 8
are also available.

Consider the following example (taken from the same ATmega16 assembly program as the hex
format example above):

S00C000064656D6F2E7372656373
S11300000CC00F9300E000000000000000000A95FF
S1130010D1F70F910A95A9F708950FE50DBF04E0F4
S11300200EBF00E005BB0FEF04BB11E015BB00E001
S1110030E8DFE7DFE6DF8894111FF0F3F7CF77
S9030000FC

Looking again at the first line, we see a header record (type 0). Its length field is 0x0C = 12
bytes, it has a start address of 0x0000 (which is not important, since this line is ignored anyway),
and it contains the file name (in our case demo.srec) as data. The last byte 0x73 is the checksum,
which is computed by summing up the bytes from 0x0C to 0x63 (that makes 0x038C) and computing the
one’s complement of the lowest byte (~0x8C = 0x73). The next line is the first data record and
contains the same data as the first Intel hex record. The last line is the end-of-file record.
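
To illustrate the one's complement rule from the generating side, the following sketch computes the checksum byte for a record with a 2-byte address (types 0, 1, and 9). The function name srec_checksum and its parameters are assumptions made for this example, not part of any particular tool.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: checksum of an S-record with a 2-byte address.
 * data_len is the number of data bytes (may be 0, then data is not read). */
uint8_t srec_checksum(uint16_t address, const uint8_t *data, size_t data_len)
{
    /* length field = 2 address bytes + data bytes + 1 checksum byte */
    unsigned sum = (unsigned)(data_len + 3);
    size_t i;

    sum += address >> 8;      /* high address byte */
    sum += address & 0xFF;    /* low address byte  */
    for (i = 0; i < data_len; i++)
        sum += data[i];

    /* one's complement of the lowest byte of the sum */
    return (uint8_t)(~sum & 0xFF);
}
```

Applied to the header record above (address 0x0000, data "demo.srec"), the function yields 0x73; for the end-of-file record without data it yields 0xFC.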

4.4 Debugging
Of course it is possible to develop and debug embedded applications without any special development
and debugging tools – you only need a way to download the program to the microcontroller. In
the beginnings of microcontroller software development, which means the 1970s and early 1980s, this
often was the method of choice: Debugging tools were rare, tools for new architectures often
nonexistent. In consequence, the program was often developed on paper, burned into an EPROM, and
then tested on the target hardware. Debugging, unavoidable in complex applications, was either done
with external measurement equipment like logic analyzers, or realized through more or less creative
use of the output elements on the target. For example, targets generally contained some LEDs for
status output, which were used for debug output during the debugging phase. Through them, the
software engineer visualized the program flow, indicating if and in which order the program reached
certain memory addresses.
Since programming an EPROM took a lot of time, so-called ROM or EPROM emulators
were employed; these consisted of a RAM of the same size, which used some additional logic
to simulate the behavior of the ROM or EPROM in the target hardware, but was at the same time
externally accessible to facilitate programming. With these emulators, program and data could be
directly downloaded from a host PC to the target hardware, much as we nowadays program a Flash
memory. Such ROM emulators saved a lot of time, but did not facilitate the debugging process itself.
Still, it was possible to debug applications this way, even though it took a lot of time and patience.
However, since at least the former tends to be in short supply in any commercial project, efforts were
made to facilitate the debugging process early on. Even so, the techniques used in the early
years of embedded systems programming are still important in situations where no debugging envi-
ronment is available (either because an exotic controller is being used or because the controller is still
too new to be supported by a tool chain). It is also often the case that people who know how to debug
without tools are better at debugging (with or without tools) than people who have only learned to
debug in elaborate debug environments. Therefore, we will first give you an overview of techniques
useful when no debugger is available, before we shift our concentration to the various debugging tools
available today.
Before we move on to the different debugging tools, let us consider what it is we need from
a debugger. Any state-of-the-art debugger will offer breakpoints, that is, it will allow the user to
define points in the code where program execution should stop and control should be transferred to the
debugger. Related to that is the single-stepping feature, which simply executes the code instruction
by instruction. When control is with the debugger, the user generally wants to get information about
the state of the program. At the top of the list is the examination and modification of variable contents,
followed by information about the function call history and the parameters with which functions were
called. So any debugging tool worth its salt should be able to offer these features to the user. When
developing for embedded and real-time systems, the timing behavior of the program and its interaction
with the hardware become issues as well. So ideally, useful debuggers should also support the user in
this regard.

4.4.1 No Debugger
Before you start your project, be aware that the less elaborate your tools, the better your program
structure must be. As we have already mentioned, it is always important to write modular code, to
design good and well-defined interfaces between modules, and to write good program comments.
These things become vital if you plan to debug without tools. Also, try to avoid side-effects and do
not strive to be “clever”. Instead, strive for clear and easy-to-understand code. And, very importantly,
plan your testing strategy before you start programming.
Now despite all your efforts, even the most perfectly designed and implemented program will
probably have some bugs. You notice a bug by conducting a test and detecting a divergence between
the expected behavior and the observed one. Naturally, you want to find out what went wrong and fix
it. Even though elaborate debugging tools are nowadays available for all popular architectures, you
may still occasionally be forced to work on a system that has no such support. However, as long as
the system has some output mechanisms, not all is lost. Depending on what your target system has to
offer, you have several options:

LEDs

LEDs can be used to display information about the application’s state. Items that are useful for
debugging include the contents of registers and memory, the current location of the stack pointer, the
function call history, function parameters, and whether certain sections of code are reached.
For example, LEDs can easily be used to trace program execution. To do this, you switch on
different LEDs at different locations in your code. For example, if you have 4 LEDs, you could
check whether 4 specific and independent locations in your code are reached by simply turning on the
associated LED at each of these points. The technique is very useful for verifying whether ISRs are
called, or whether some conditional part of the code gets executed.
You can also use LEDs to implement (conditional) breakpoints and display the ID of the break-
point reached. For example, if you define a macro

#define CHECK(c,n) do {                                     \
    if (!(c)) {        /* if condition not true          */ \
      OUT_LED = (n);   /* -> display breakpoint number   */ \
      for (;;) ;       /* -> halt program                */ \
    }                                                       \
  } while (0)

you can use it in your code to verify conditions. Thus, the code

CHECK (1==1,1);
CHECK (1>2,2);
CHECK (2*2>4,3);

will display the binary value 2 on the LEDs and then halt the program. If you have 4 LEDs available,
you can implement 15 such breakpoints (all LEDs off indicates that no breakpoint is active).
Of course, LEDs can also be used to display memory contents like registers or variables. Depend-
ing on the number of LEDs available, you may have to split up the data and display it sequentially
(e.g., display first the high nibble, then pause, then display the low nibble of a byte). The same goes
for the stack pointer value (to check whether you have a stack overflow), or the stack contents, which
can be used to trace back the function call history (all return addresses are on the stack) and the pa-
rameters with which a function was called (also stored on the stack). If you have a numeric display,
you can even display data in a more convenient form as hex or even BCD numbers. But be aware that
a numeric multi-digit display requires more sophisticated control.
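
The nibble-by-nibble display of a byte could look roughly like the sketch below. It rests on several assumptions: OUT_LED is taken to drive four LEDs on bits 0-3 and is declared here as an ordinary variable so that the code also compiles on a PC, and debug_pause is a hypothetical busy-wait whose length would have to be tuned to the clock of the target.

```c
#include <stdint.h>

/* Assumption: on the target, OUT_LED would be the port register driving four
 * LEDs on bits 0-3.  Here it is a plain variable so the sketch runs anywhere. */
volatile uint8_t OUT_LED;

/* Hypothetical pause routine: a crude busy-wait loop. */
static void debug_pause(void)
{
    volatile uint32_t i;
    for (i = 0; i < 100000UL; i++)
        ;
}

/* Show one byte on four LEDs: high nibble first, then the low nibble. */
void led_show_byte(uint8_t value)
{
    OUT_LED = (uint8_t)(value >> 4);     /* high nibble                    */
    debug_pause();
    OUT_LED = 0;                         /* dark phase separates the parts */
    debug_pause();
    OUT_LED = (uint8_t)(value & 0x0F);   /* low nibble                     */
    debug_pause();
}
```

The dark phase between the two halves makes it visible that a value such as 0x55 consists of two identical nibbles. Note that a nibble of 0 cannot be told apart from the dark phase, so a real implementation might reserve a separate LED as a marker.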

Switches & Buttons


Buttons can be used to implement single stepping through the code. To achieve single-stepping, you
just need to implement a loop that waits until a button is pressed. Your single-step macro could look
similar to this:

#define STEP() do {                                                        \
    while (IN_BTN & (1<<BTN1)) ;   /* wait for BTN1 pressed (active low) */ \
    while (~IN_BTN & (1<<BTN1)) ;  /* wait for BTN1 release              */ \
  } while (0)

It is important that you wait not only until a button is pressed, but also until the button is released
again (and you possibly have to debounce the button as well). Otherwise, you could run through
several consecutive steps before the button is released. The single-step macro can be combined with
the breakpoint macro to allow stepping from one breakpoint to the next.
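
Debouncing can be done in several ways; one common approach, sketched here under stated assumptions, is to accept a new button state only after it has been read unchanged for a number of consecutive samples. The type debounce_t, the function debounce_update, and the threshold DEBOUNCE_SAMPLES are names chosen for this illustration.

```c
#include <stdint.h>

#define DEBOUNCE_SAMPLES 4   /* assumed threshold; depends on sample rate and switch */

typedef struct {
    uint8_t state;   /* debounced state: 1 = pressed, 0 = released       */
    uint8_t count;   /* consecutive raw samples that disagree with state */
} debounce_t;

/* Feed one raw sample (nonzero = pressed); returns the debounced state. */
uint8_t debounce_update(debounce_t *d, uint8_t raw)
{
    raw = (raw != 0);
    if (raw == d->state) {
        d->count = 0;                 /* bounce or no change: start over */
    } else if (++d->count >= DEBOUNCE_SAMPLES) {
        d->state = raw;               /* stable long enough: accept it   */
        d->count = 0;
    }
    return d->state;
}
```

A debounced single step would then call debounce_update in a loop with the current pin value, with a short delay between samples, first until it returns 1 and then until it returns 0 again.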
You can also implement a break mechanism if your button is connected to an input pin that can
generate an interrupt (preferably an NMI). Now if your program hangs, you can press the button and
its ISR gets called. In the ISR, you can output the return address from the stack to find out where your
program was stuck.
Switches can also be very useful for debugging. For instance, you can implement a rudimentary
stimulus generator: Just reroute the input routine to read from the switches instead of its normal port,
then use the switches to test different stimuli and see how your program reacts.
You can also use switches to control program flow: Override branches in the program flow with
switch states to manually direct your program.

UART
If you have a free serial connection, you have more or less hit the jackpot. You can set up a serial
connection with your PC, allowing you to transmit any amount of data you like in a human-readable
form. It will also free the target hardware from debug-related tasks. In addition to simple monitoring,
you can also enable the user to interact with the target software, even to change the contents of
variables. If you so desire, you can build your own personal ROM monitor (see Section 4.4.2).
However, the more elaborate your debug software, the more effort you have to invest to get it right.
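
As a starting point for such debug output, the sketch below prints a named 8-bit value in hexadecimal. All names (dbg_putc, dbg_puts, dbg_put_hex8, dbg_trace8) are hypothetical. To keep the example runnable on a PC, dbg_putc appends to a buffer; on a target it would instead wait for the transmitter to become ready and write the character to the UART data register.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the UART: characters are collected in a buffer. */
char   dbg_buf[128];
size_t dbg_len;

/* Hypothetical output hook; replace the body with a UART write on the target. */
void dbg_putc(char c)
{
    if (dbg_len < sizeof dbg_buf - 1) {
        dbg_buf[dbg_len++] = c;
        dbg_buf[dbg_len] = '\0';
    }
}

void dbg_puts(const char *s)
{
    while (*s)
        dbg_putc(*s++);
}

/* Print one byte as two hexadecimal digits. */
void dbg_put_hex8(uint8_t v)
{
    static const char digits[] = "0123456789ABCDEF";
    dbg_putc(digits[v >> 4]);
    dbg_putc(digits[v & 0x0F]);
}

/* Print "name=0xHH" followed by a newline. */
void dbg_trace8(const char *name, uint8_t value)
{
    dbg_puts(name);
    dbg_puts("=0x");
    dbg_put_hex8(value);
    dbg_putc('\n');
}
```

Hand-written conversion routines like dbg_put_hex8 are often preferred over printf on small controllers, since a full printf implementation can take up a considerable part of the program memory.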

All techniques mentioned above can help you a lot if you have no other tools available. However, they
do come with some strings attached. First of all, the I/O features you use must be available. If they
are normally used by the application itself, you must make sure that this does not interfere with your
debug actions. Second, these techniques require you to instrument your code (i.e., put your debug
code into the application code), so they interfere with the timing behavior of the application. Hence,
these mechanisms are unsuited to debug time-sensitive areas.
As a concluding remark, let us state something that should be obvious to you anyway: You need
to test and debug your debugging code, before you can use it to debug your program. If your debug
code is faulty, this can cost you more time than you could expect to save by using it.

4.4.2 ROM Monitor


Since it is tedious to instrument the program code manually, better ways to debug were soon
developed. The ROM monitor is a piece of software running on the target controller that can be seen as
a rudimentary operating system. In its simplest form, it uses a numeric display and a hex keypad
to let the user debug interactively. After a reset, control lies with the monitor, which can set
breakpoints, display and modify memory contents, or single-step through the code. To implement
breakpoints, the monitor replaces the instruction at the breakpoint address with a jump to the monitor
code, which then allows the user to check the contents of registers and variables. To resume program
execution, the monitor simply restores the original instruction and transfers control back to the application.
Since such software breakpoints require that the program memory can be written by the controller it-
self, which is not supported by all controller architectures, some microcontrollers also offer hardware
breakpoints. Here, the microcontroller itself will interrupt program execution and transfer control
back to the monitor when such a breakpoint is reached.
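
The bookkeeping behind a software breakpoint can be sketched in a few lines. The example is a simulation under stated assumptions: prog_mem is an array standing in for writable program memory, TRAP_OPCODE is an invented placeholder for the jump into the monitor, and bp_set and bp_clear are hypothetical names.

```c
#include <stdint.h>

#define PROG_WORDS   16
#define TRAP_OPCODE  0xFFFFu     /* invented "jump to monitor" instruction */

uint16_t prog_mem[PROG_WORDS];   /* stands in for writable program memory */

typedef struct {
    uint16_t addr;     /* word address of the breakpoint       */
    uint16_t saved;    /* original instruction at that address */
    int      active;   /* nonzero while the trap is patched in */
} breakpoint_t;

/* Patch the trap into the program; returns 0 on success, -1 on error. */
int bp_set(breakpoint_t *bp, uint16_t addr)
{
    if (addr >= PROG_WORDS || bp->active)
        return -1;
    bp->addr       = addr;
    bp->saved      = prog_mem[addr];   /* remember the original instruction */
    prog_mem[addr] = TRAP_OPCODE;
    bp->active     = 1;
    return 0;
}

/* Restore the original instruction so that execution can resume at bp->addr. */
void bp_clear(breakpoint_t *bp)
{
    if (bp->active) {
        prog_mem[bp->addr] = bp->saved;
        bp->active = 0;
    }
}
```

A real monitor additionally has to save and restore the register contents when the trap is taken, and it must re-insert the trap after the original instruction has been executed if the breakpoint is to remain active.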
So you see, a ROM monitor already meets many of our requirements for a suitable debugger.
However, its interface still leaves room for improvement. Therefore, it became common to add a
serial interface to the system and use the host PC to control the ROM monitor. This opened the door
for nice integrated debug interfaces, making the ROM monitor a very useful debugging tool that has
maintained its popularity until today. Instead of the serial interface, modern debug monitors may use
parallel interfaces or Ethernet. Most monitors also support program download. Note that the term
ROM monitor stems from a time when this program was indeed in (EP)ROM where it could not be
accidentally overwritten. With current architectures, it may also be in EEPROM/Flash or even in
RAM.
Of course, the ROM monitor, although commonly used, has its drawbacks. First of all, it takes
up some of the target controller’s memory. Second, unless the target controller provides hardware
breakpoints, the application program must be located in RAM and must be writable by the controller
itself, which cannot be taken for granted in a Harvard architecture. Furthermore, the monitor requires an interface
all to itself. Finally, in architectures where the monitor program is stored in writable memory and
where the application can overwrite program memory, the monitor may be erroneously overwritten
by the program, in which case it cannot be used to locate the bug.

4.4.3 Instruction Set Simulator


Since it is a lot more comfortable to develop on the PC than it is to work on the target hardware,
instruction set simulators (ISS) were developed to allow the execution of target software on the host
PC. The ISS accurately simulates the target controller down to the number of clock cycles required
to execute different instructions. Note that this does not mean that the simulator executes the appli-
cation program in the same time as the target controller – after all, the PC is much faster than the
microcontroller. But if instruction A takes 1 cycle on the target controller and instruction B takes 2
cycles, then a cycle-accurate simulator will keep this relationship intact. In addition to the processor
core, the simulator also accurately simulates the other modules of the microcontroller, like digital I/O
and timers. The ISS hence allows the software engineer to execute the application code in an envi-
ronment that maintains the timing behavior of the target microcontroller. Furthermore, the simulator
provides all the debug features typically found in modern debuggers, ranging from single-stepping to
memory manipulation. It also allows the user to watch processor-internal registers to better help track
problems.
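
The idea of cycle accuracy can be illustrated with a toy simulator. The instruction set below is invented for this sketch and does not correspond to any real controller; it merely shows how a simulator can charge each instruction its own cycle cost while executing the program at whatever speed the host PC allows.

```c
#include <stdint.h>

/* Toy instruction set, invented for this sketch. */
enum { OP_NOP, OP_INC, OP_JMP, OP_HALT };

typedef struct { uint8_t op; uint8_t arg; } insn_t;

typedef struct {
    uint8_t  pc;       /* program counter (index into the program)  */
    uint8_t  acc;      /* accumulator                               */
    uint32_t cycles;   /* simulated clock cycles consumed so far    */
} cpu_t;

/* Assumed cycle cost per opcode: a jump takes 2 cycles, everything else 1. */
static const uint8_t cycle_cost[] = { 1, 1, 2, 1 };

/* Execute one instruction; returns 0 on HALT or on an unknown opcode. */
int iss_step(cpu_t *cpu, const insn_t *prog)
{
    insn_t in = prog[cpu->pc];

    if (in.op > OP_HALT)
        return 0;
    cpu->cycles += cycle_cost[in.op];

    switch (in.op) {
    case OP_INC:  cpu->acc++; cpu->pc++;  break;
    case OP_JMP:  cpu->pc = in.arg;       break;
    case OP_HALT: return 0;
    default:      cpu->pc++;              break;   /* OP_NOP */
    }
    return 1;
}

/* Run until HALT and return the number of simulated cycles. */
uint32_t iss_run(cpu_t *cpu, const insn_t *prog)
{
    while (iss_step(cpu, prog))
        ;
    return cpu->cycles;
}
```

Since the cycle counter is part of the simulated state, a debugger built on top of such a simulator can report how many target cycles elapsed between two breakpoints, independent of the speed of the host.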
Although it has many advantages, the ISS is not the last word on the subject. It is indeed very
