8086 Assembly
Programming
What is in a Computer?
The field of Computer Architecture is about the
fundamental structure of computer systems
What are the components?
How are they interconnected?
How fast does the system operate?
What is the power consumption?
How much does it all cost?
What architecture leads to the “best” trade-offs?
The conceptual model for computer architecture that
has been in use since 1945 is the von Neumann
architecture
Instructions?
Whenever somebody builds a CPU they first define
what instructions the CPU will know how to decode
and execute
This is called the Instruction Set Architecture (ISA)
The ISA for a Pentium is different from the ISA for a
PowerPC for instance
The ISA is described in (lengthy) documentation
that describes everything that one can do with the
CPU
Every instruction lasts some number of clock cycles
Instructions
Instructions are encoded in binary machine code
E.g.: 01000110101101 may mean “perform an addition of two
registers and store the results in another register”
The CPU is built using gates (or, and, etc.) which
themselves use transistors
These gates implement instruction decoding
Based on the bits of the instruction code, several signals are
sent to different electronic components, which in turn perform
useful tasks
Typically, an instruction consists of two parts
The opcode: what the instruction computes
The operands: the input to the computation
opcode operands
0 1 0 0 0 1 1 0 1 0 1 1 0 1
Assembly language
It’s really difficult for humans to read/remember
binary instruction encodings
We will see that typically one would use hexadecimal
encoding, but still
Therefore it is typical to use a set of mnemonics,
which form the assembly language
It is often said that the CPU understands assembly
language
This is not technically true, as the CPU understands
machine code, which we, as humans, choose to
represent using assembly language
An assembler transforms assembly code into
machine code
Assembly Language
It used to be that all computer programmers did all
day was to write assembly code
This was difficult for many reasons
Difficult to read
Very difficult to debug
Different from one computer to another!
The use of assembly language for all programming
prevented the (sustainable) development of large
software projects involving many programmers
This is the main motivation for the development of
high-level languages
FORTRAN, Cobol, C, etc.
Why Assembly?
It's difficult
Error prone
Hard to debug
Takes a lot of time to
develop
Why Assembly?
However:
Assembly can be fast: carefully hand-tuned code can
outperform the code a compiler produces.
Assembly is closer to the machine level than
any other language because the instructions of
assembly language map (nearly) 1-1 to machine
instructions.
Assembly code can be much smaller than the code
a compiler produces.
In Assembly, we can do a lot of things that we
can't do in any higher level language, such as
playing with processor flags, etc.
High-level Languages
The first successful high-level language was FORTRAN
Developed by IBM in 1954 to run on their 704 series
Used for scientific computing
The introduction of FORTRAN led people to believe that there would
never be bugs again because it made programming so easy!
But high-level languages led to larger and more complex software
systems, hence leading to bugs
Another early programming language was COBOL
Developed in 1960, strongly supported by DoD
Used for business applications
In the early 60s IBM had a simple marketing strategy
On the IBM 7090 you used FORTRAN to do science
On the IBM 7080 you used COBOL to do business
Many high-level languages have been developed since then, and
they are what most programmers use
Fascinating history
High-Level Languages
Having high-level languages is good, but CPUs do not
understand them
Therefore, there needs to be a translation from a high-level
language to machine code
There are two ways to run a high-level language on a CPU
that only understands machine code:
Interpretation: An interpreter is a program that reads in high-
level code and simulates a computer that understands high-
level code
Compilation: A compiler is a program that reads in high-level
code and produces equivalent machine code, which can then
be executed on the CPU at a later time
Some languages are interpreted, some are compiled, some
are both or hybrid
The Big (Simplified) Picture
[Figure: high-level code (a C fragment) goes through the COMPILER to become assembly code (instructions such as sll, add, lw, slt, beq); the ASSEMBLER turns assembly code into binary machine code, which runs on the CPU (program counter, registers, ALU, Control Unit)]
The Big (Simplified) Picture
[Figure: high-level code goes through the COMPILER to become assembly code; hand-written assembly code is a second input to the ASSEMBLER; the ASSEMBLER produces binary machine code, which runs on the CPU (program counter, registers, ALU, Control Unit)]
What we do in this class:
[Figure: the compilation picture again: high-level code, COMPILER, assembly code, hand-written assembly code, ASSEMBLER, machine code, and the CPU (program counter, registers, ALU, Control Unit)]
Performance : Bubble Sort
Example
Processors Prior to 8086
(1971) 4004 – First microprocessor made by the Intel
Corporation. Allowed computer intelligence to
be put into small devices like cell phones, key
chains, calculators, etc.
(1972) 8008 – Twice as powerful as the 4004,
and was used in the Mark-8. The Mark-8 was one of
the first personal computers.
Processors Prior to 8086(cont.)
(1974) 8080 – Slight improvement on the 8008
with a more complex instruction set. Began to be
mass-produced for personal computers. Last
processor update before the 8086.
Intel’s 8086 and 8088
(1978) 8086/8088 – Biggest improvement over
the earlier 8-bit processors: the 8086 is a 16-bit
processor. Laid the groundwork for
the x86 architecture. x86 is
still used in the newer Pentium models today.
The 8088 processor was selected by IBM to
be placed in the “IBM PC”, which was their
most popular product. This skyrocketed Intel’s
stature as a company, which was honored by
being named a Fortune 500 company.
Processors After 8086/8088
(1982-89) 286/386/486 – Became able to
run multiple programs at one time and to run
point-and-click operating systems.
(1993-2001) Pentium 1-4 – Much faster
speeds allowed multimedia elements like voice,
sounds, and graphics to run much more smoothly and
faster.
The 80x86 Architecture
To learn assembly programming we need to pick a
processor family with a given ISA (Instruction Set
Architecture)
We will pick the Intel 80x86 ISA (x86 for short)
The most common today in existing computers
For instance in my laptop
We could have picked others
Old ones: Sparc, VAX
Current ones: PowerPC, Itanium, MIPS
Some courses in some curricula subject students to
two or even more ISAs, but in this course we’ll just
focus on one in more depth
Organization of 8088/8086
[Figure: block diagram of the 8088/8086, summarized below]
Execution Unit (EU)
General purpose registers: AX (AH, AL), BX (BH, BL), CX (CH, CL), DX (DH, DL)
Other registers: SP, BP, SI, DI
ALU, Flag register, EU control
Data bus (16 bits)
Bus Interface Unit (BIU)
Segment registers: CS, DS, SS, ES
IP
ALU, Instruction Queue, Bus control
Address bus (20 bits), Data bus (16 bits), External bus
Organization of 8088/8086
Intel 8088 facts
The 20-bit address bus allows accessing
1 M memory locations
16-bit internal data bus and 8-bit
external data bus. Thus, it needs
two read (or write) operations to
read (or write) a 16-bit datum
Byte addressable and byte-swapping
Word: 5A2F
18001 5A High byte of word
18000 2F Low byte of word
Memory locations
[Figure: 8088 signal classification: VDD (5V), GND, CLK, 20-bit address, 8-bit data, control signals to the 8088, control signals from the 8088]
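The byte-swapping rule can be tried out with a small model. This is only a sketch in Python (used here as a modeling language, not 8086 code); `store_word` and `load_word` are hypothetical helper names, and memory is assumed to be a plain dictionary from address to byte.

```python
def store_word(memory, address, word):
    """Store a 16-bit word the way the 8088 does: low byte at the lower address."""
    memory[address] = word & 0xFF             # low byte of word
    memory[address + 1] = (word >> 8) & 0xFF  # high byte of word


def load_word(memory, address):
    """Rebuild the 16-bit word from its two bytes."""
    return memory[address] | (memory[address + 1] << 8)


memory = {}
store_word(memory, 0x18000, 0x5A2F)
print(f"18000: {memory[0x18000]:02X}")  # low byte
print(f"18001: {memory[0x18001]:02X}")  # high byte
print(f"word : {load_word(memory, 0x18000):04X}")
```

With an 8-bit external data bus, each of the two bytes would take its own bus operation.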
The 8086 Registers
To write assembly code for an ISA you must know
the names of the registers
Because registers are places in which you put data to
perform computation and in which you find the result of the
computation (think of them as variables for now)
The registers are identified by binary numbers, but
assembly languages give them “easy-to-remember” names
The 8086 offered 16-bit registers
Four general purpose 16-bit registers
AX
BX
CX
DX
General purpose registers
AX, BX, CX, and DX: They can be
assigned any value you want.
AX (accumulator register). Most
arithmetic operations are done with AX.
BX (base register). Used to do array
operations. BX is often combined with index
registers like SI or DI to address memory.
CX (counter register). Used as a loop
counter.
DX (data register). Used for storing data values.
The 8086 Registers
AX BX CX DX
AH AL BH BL CH CL DH DL
Each of the 16-bit registers consists of 8 “low bits”
and 8 “high bits”
Low: least significant
High: most significant
The ISA makes it possible to refer to the low or high
bits individually
AH, AL
BH, BL
CH, CL
DH, DL
The 8086 Registers
AX BX CX DX
AH AL BH BL CH CL DH DL
The xH and xL registers can be used as 1-
byte register to store 1-byte quantities
Important: both are “tied” to the 16-bit register
Changing the value of AX will change the values
of AH and AL
Changing the value of AH or AL will change the
value of AX
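How AX, AH, and AL are "tied" together can be modeled in a few lines. The sketch below is Python, not 8086 code, and the class name `Reg16` and its attributes `x`, `h`, `l` are assumptions made up for this illustration.

```python
class Reg16:
    """Toy model of a 16-bit register (e.g. AX) with its high and low bytes."""

    def __init__(self, value=0):
        self.x = value & 0xFFFF  # the full 16-bit register, e.g. AX

    @property
    def h(self):
        """High byte, e.g. AH."""
        return (self.x >> 8) & 0xFF

    @h.setter
    def h(self, byte):
        self.x = ((byte & 0xFF) << 8) | (self.x & 0x00FF)

    @property
    def l(self):
        """Low byte, e.g. AL."""
        return self.x & 0xFF

    @l.setter
    def l(self, byte):
        self.x = (self.x & 0xFF00) | (byte & 0xFF)


ax = Reg16(0x1234)
print(f"AH={ax.h:02X} AL={ax.l:02X}")  # both halves come from AX
ax.l = 0xFF                            # changing AL changes AX
print(f"AX={ax.x:04X}")
```

There is only one 16-bit storage cell in the model; the halves are just views of it, which is why a change to either side is visible on the other.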
Index registers
SI and DI: Usually used to process arrays or
strings:
SI (source index) conventionally points to the
source array
DI (destination index) conventionally points to
the destination array.
These are basically general-purpose registers
But by convention they are often used as “pointers”,
i.e., they contain addresses
And they cannot be decomposed into High and Low 1-
byte registers
Segment registers
CS, DS, ES, and SS:
CS (code segment register). Points to the
segment of the running program. We may NOT
modify CS directly.
DS (data segment register). Points to the
segment of the data used by the running
program. You can point this to anywhere you
want as long as it contains the desired data.
ES (extra segment register). Usually used with
DI as a destination pointer. The pairs DS:SI
and ES:DI are commonly used for string
operations.
SS (stack segment register). Points to stack
segment.
Pointer registers
BP, SP, and IP:
BP (base pointer) used to access parameters
and local variables on the stack.
SP (stack pointer) points to the
top of the stack.
IP (instruction pointer) holds the offset of
the next instruction of the running program. It
is always coupled with CS and it is NOT
directly modifiable. So, the pair CS:IP is a
pointer to the next instruction
of the running program. You can NOT load
CS or IP directly; they change through jumps, calls, and returns.
Extended register
386 processors introduced extended
registers.
Most of the registers, except the segment
registers, are extended to 32 bits.
So, we have extended registers EAX,
EBX, ECX, and so on.
AX is only the low 16 bits (bits 0 to 15) of
EAX.
There is NO special direct access to the
upper 16 bits (bits 16 to 31) of an extended
register.
The 8086 Registers
The 16-bit Instruction Pointer (IP) register:
Points to the next instruction to execute
Typically not handled directly when writing assembly code
The 16-bit FLAGS register
Information is stored in individual bits of the FLAGS
register
Whenever an instruction is executed and produces a
result, it may modify some bit(s) of the FLAGS register
Example: Z (or ZF) denotes one bit of the FLAGS register,
which is set to 1 if the previously executed instruction
produced 0, or 0 otherwise
We’ll see many uses of the FLAGS register
Flag register
The flag register is a 16-bit register that contains processor
status.
It holds values that programmers may
need to access, for example to detect whether the
last arithmetic operation produced a zero result or an overflow.
Intel doesn't provide direct access to it; rather it is
accessed via the stack (via POPF and PUSHF).
You can access each flag by using a bitwise
AND operation since each status is
represented by just 1 bit.
Flag register cont.
C carry flag is set to 1 whenever the last
arithmetic operation, such as an addition or
subtraction, produced a carry or borrow; otherwise 0.
P parity flag is set to 1 if the low byte of the last
result contains an even number of 1 bits.
A auxiliary flag is used in Binary Coded Decimal
(BCD) operations.
Z zero flag is used to detect whether the last operation
produced a zero result.
S sign flag is used to detect whether the last operation
produced a negative result. It is set to 1 if the highest bit
(bit 7 in bytes or bit 15 in words) of the last result
is 1.
Flag register cont.
T trap flag used by debuggers to turn on the step-by-step
feature.
I interrupt flag used to enable or disable interrupts. If
the bit is set (= 1), then interrupts are enabled,
otherwise disabled. The default is on.
D direction flag used for the direction of string operations. If
the bit is set, then all string operations are done backward.
Otherwise, forward. The default is forward (= 0).
O overflow flag used to detect whether the last
arithmetic operation overflowed. If the bit
is set, then an overflow has occurred.
Flag Register
Flag register contains information reflecting the current status of a
microprocessor. It also contains information which controls the
operation of the microprocessor.
Flag: OF DF IF TF SF ZF AF PF CF
Bit:  11 10  9  8  7  6  4  2  0   (the register spans bits 15 down to 0)
Control Flags
IF: Interrupt enable flag
DF: Direction flag
TF: Trap flag
Status Flags
CF: Carry flag
PF: Parity flag
AF: Auxiliary carry flag
ZF: Zero flag
SF: Sign flag
OF: Overflow flag
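Testing a single flag with a bitwise AND can be sketched as follows. This is a Python model rather than 8086 code; it assumes the standard 8086 bit positions (CF=0, PF=2, AF=4, ZF=6, SF=7, TF=8, IF=9, DF=10, OF=11), and `FLAG_BITS` and `flag_is_set` are hypothetical names. The sample value 0246H is an arbitrary example, not taken from the slides.

```python
# Bit position of each flag inside the 16-bit FLAGS register
FLAG_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7,
    "TF": 8, "IF": 9, "DF": 10, "OF": 11,
}


def flag_is_set(flags, name):
    """Isolate one flag with a bitwise AND against a one-bit mask."""
    mask = 1 << FLAG_BITS[name]
    return (flags & mask) != 0


flags = 0x0246  # example value, as if it had been copied out with PUSHF
for name in ("ZF", "CF", "IF", "PF"):
    print(name, flag_is_set(flags, name))
```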
The 8086 Registers
AH AL = AX
BH BL = BX
CH CL = CX
DH DL = DX
SI
DI
BP
SP
IP
FLAGS
CS
DS
SS
ES
Each register is 16 bits wide
[Figure: the registers drawn next to the ALU and the Control Unit]
Addresses in Memory
We mentioned several registers that are used for
holding addresses of memory locations
Segments:
CS, DS, SS, ES
Pointers:
SI, DI: indices (typically used for pointers)
SP: Stack pointer
BP: (Stack) Base pointer
Let’s look at the structure of the address space
Code, Data, Stack
Although we’ll discuss these at length later,
let’s just accept for now that the address
space has three regions: code, data, and stack
A program constantly references all three
regions
Therefore, the program constantly references
bytes in three different segments
For now let’s assume that each region is fully
contained in a single segment, which is in fact
not always the case
CS: points to the beginning of the code
segment
DS: points to the beginning of the data
segment
SS: points to the beginning of the stack
segment
[Figure: the address space divided into code, data, and stack regions]
Address Space
In the 8086 processor, a program is limited to referencing an
address space of size 1MB, that is 2^20 bytes
Therefore, addresses are 20 bits long!
A d-bit long address makes it possible to reference 2^d different “things”
Example:
2-bit addresses
00, 01, 10, 11
4 “things”
3-bit addresses
000, 001, 010, 011, 100, 101, 110, 111
8 “things”
In our case, these things are “bytes”
One cannot address anything smaller than a byte
Therefore, a 20-bit address makes it possible to address 2^20
individual bytes, or 1MB
Address Space
One says that a running program has a 1MB
address space
And the program needs to use 20-bit
addresses to reference memory content
Instructions, data, etc.
Problem: registers are only 16 bits long! How
can they hold a 20-bit address???
The solution: split addresses in two pieces:
The selector
The offset
For 20-bit Addresses
selector offset
4 bits 16 bits
On the 8086 the offset is 16 bits long
And therefore the selector is 4 bits long
We have 2^4 = 16 different segments
Each segment is 2^16 bytes = 64KB
For a total of 1MB of memory, which is what the
8086 used
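The split of a 20-bit address into a 4-bit selector and a 16-bit offset can be sketched like this. It is a Python model of the simplified view on this slide only; `split_address` and `join_address` are hypothetical names.

```python
def split_address(address):
    """Split a 20-bit address into (4-bit selector, 16-bit offset)."""
    selector = (address >> 16) & 0xF  # top 4 bits
    offset = address & 0xFFFF         # low 16 bits
    return selector, offset


def join_address(selector, offset):
    """Put the two pieces back together into a 20-bit address."""
    return ((selector & 0xF) << 16) | (offset & 0xFFFF)


selector, offset = split_address(0x5FFE0)
print(f"selector={selector:X} offset={offset:04X}")
print(f"segments={2**4} bytes per segment={2**16} total={2**4 * 2**16}")
```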
For 20-bit Addresses
address = selector (4 bits) followed by offset (16 bits)
We have 16 segments
We have 64KB per segment
We have 1MB of memory
[Figure: the 1MB memory drawn as 16 consecutive segments, one for each 4-bit selector value from 0000 to 1111]
Memory Segmentation
A segment is a 64KB block of memory starting from any 16-byte
boundary
For example: 00000, 00010, 00020, 20000, 8CE90, and E0840 are all valid
segment addresses
The requirement of starting from 16-byte boundary is due to the 4-bit
left shifting
Segment registers in BIU (each 16 bits wide, bit 15 down to bit 0)
CS Code Segment
DS Data Segment
SS Stack Segment
ES Extra Segment
Memory Address Calculation
Segment addresses must be stored
in segment registers
Offset is derived from the combination
of pointer registers, the Instruction
Pointer (IP), and immediate values
Memory address = Segment address * 10H + Offset
Examples
CS = 348AH, IP = 4214H: instruction address = 348A0H + 4214H = 38AB4H
SS = 5000H, SP = FFE0H: stack address = 50000H + FFE0H = 5FFE0H
DS = 1234H, DI = 0022H: data address = 12340H + 0022H = 12362H
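The calculation can be checked with a short model. This is a sketch in Python, not 8086 code; `physical_address` is a hypothetical helper name, and the result is masked to 20 bits on the assumption that addresses past 1MB wrap around.

```python
def physical_address(segment, offset):
    """Shift the 16-bit segment left by 4 bits (multiply by 10H), then add the offset."""
    return ((segment << 4) + offset) & 0xFFFFF  # keep 20 bits


examples = [
    ("CS:IP", 0x348A, 0x4214),
    ("SS:SP", 0x5000, 0xFFE0),
    ("DS:DI", 0x1234, 0x0022),
]
for name, segment, offset in examples:
    print(f"{name} {segment:04X}:{offset:04X} -> {physical_address(segment, offset):05X}")
```

The 4-bit left shift is also why a segment can only start on a 16-byte boundary.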
Fetching Instructions
Where to fetch the next instruction?
CS = 1234H, IP = 0012H
The 8088 fetches the next instruction (MOV AL, 0) from memory address 12340H + 0012H = 12352H
Update IP
- After an instruction is fetched, Register IP is updated as follows:
IP = IP + Length of the fetched instruction
- For example: the length of MOV AL, 0 is 2 bytes. After fetching this instruction,
the IP is updated to 0014H
Accessing Data Memory
There are a number of methods to generate the memory address when
accessing data memory. These methods are referred to as
Addressing Modes
Examples:
- Direct addressing: MOV AL, [0300H]
(assume DS = 1234H)
Memory address = 12340H + 0300H = 12640H
- Register indirect addressing: MOV AL, [SI]
(assume DS = 1234H, SI = 0310H)
Memory address = 12340H + 0310H = 12650H
In-class Exercise
Consider the byte at address 13DDE within a
64K segment defined by selector value
10DE. What is its offset?
13DDE = 10DE * 16 + offset (16 in decimal, that is 10H)
offset = 13DDE - 10DE0
offset = 2FFE (a 16-bit quantity)
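The same subtraction can be written as a small function. This is a Python sketch, not 8086 code; `offset_in_segment` is a hypothetical name, and the range check assumes a 64K segment as stated in the exercise.

```python
def offset_in_segment(physical, selector):
    """Return the 16-bit offset of a physical address within the given segment."""
    base = selector << 4  # selector * 16: the segment's starting address
    offset = physical - base
    if not 0 <= offset <= 0xFFFF:
        raise ValueError("address is not inside this 64K segment")
    return offset


print(f"{offset_in_segment(0x13DDE, 0x10DE):04X}")
```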
Addressing Modes
Where Are the Operands?
Operands required by an operation can be specified
in a variety of ways
A few basic ways are:
operand in a register
register addressing mode
operand in the instruction itself
immediate addressing mode
operand in memory
variety of addressing modes
direct and indirect addressing modes
operand at an I/O port
Simple IN and OUT commands
Register Addressing
Operand is in an internal register
Examples
mov EAX,EBX ; 32-bit copy
mov BX,CX ; 16-bit copy
mov AL,CL ; 8-bit copy
The mov instruction
mov destination,source
copies data from source to destination
Register Addressing
Operands of the instruction are the names of internal registers
The processor gets data from the register locations specified by
instruction operands
For example: move the value of register BL to register AL
MOV AL, BL
If AX = 1000H and BX = A080H, after the execution of MOV AL, BL
what are the new values of AX and BX?
In immediate and register addressing modes, the processor does not access memory.
Thus, the execution of such instructions is fast.
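One way to check an answer to the MOV AL, BL question is to model the copy on plain integers. This is a Python sketch, not 8086 code, and `mov_al_bl` is a hypothetical helper name.

```python
def mov_al_bl(ax, bx):
    """Model MOV AL, BL: copy the low byte of BX into the low byte of AX."""
    bl = bx & 0x00FF          # source: low byte of BX
    ax = (ax & 0xFF00) | bl   # AH is untouched, AL is replaced
    return ax, bx             # the source register BX does not change


ax, bx = mov_al_bl(0x1000, 0xA080)
print(f"AX={ax:04X} BX={bx:04X}")
```

Only AL changes, so the upper byte of AX keeps its old value.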
Immediate Addressing Mode
Data is part of the instruction
Operand is located in the code segment along with the
instruction
Typically used to specify a constant
Example
mov AL,75
This instruction uses register addressing mode
for destination and immediate addressing mode
for the source
Direct Addressing Mode
Data is in the data segment
Need a logical address to access data
Two components: segment:offset
Various addressing modes to specify the offset component
offset part is called effective address
The offset is specified directly as part of instruction
We write assembly language programs using memory labels
(e.g., declared using DB, DW, LABEL,...)
Assembler computes the offset value for the label
Uses symbol table to compute the offset of a label
Direct Addressing Mode
Assembler builds a symbol table so we can refer to the
allocated storage space by the associated label
Example
.DATA                                  name      offset
value    DW 0                          value     0
sum      DD 0                          sum       2
marks    DW 10 DUP (?)                 marks     6
message  DB 'The grade is:',0          message   26
char1    DB ?                          char1     40
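The offsets in the symbol table follow from the sizes of the declarations that come before each label. The sketch below models that bookkeeping in Python; it is not how any particular assembler is implemented, and `build_symbol_table`, `SIZES`, and the (name, directive, count) tuples are assumptions made for the illustration.

```python
# Bytes allocated per element by each define directive
SIZES = {"DB": 1, "DW": 2, "DD": 4, "DQ": 8, "DT": 10}


def build_symbol_table(declarations):
    """Assign each label the running offset from the start of the data segment."""
    table = {}
    offset = 0
    for name, directive, count in declarations:
        table[name] = offset
        offset += SIZES[directive] * count
    return table


declarations = [
    ("value", "DW", 1),
    ("sum", "DD", 1),
    ("marks", "DW", 10),    # 10 DUP (?)
    ("message", "DB", 14),  # 13 characters plus the trailing 0
    ("char1", "DB", 1),
]
for name, offset in build_symbol_table(declarations).items():
    print(f"{name:8} {offset}")
```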
Direct Addressing Mode
Examples
mov AL,char1
Assembler replaces char1 by its effective address (i.e., its
offset value from the symbol table)
mov marks,56
marks is declared as
marks DW 10 DUP (0)
Since the assembler replaces marks by its effective address,
this instruction refers to the first element of marks
In C, it is equivalent to
marks[0] = 56
Direct Addressing Example
Memory location = DS * 10H + Displacement
- Example: assume DS = 1000H, AX = 1234H
MOV [7000H], AX
DS * 10H: 10000H
+ Disp: 7000H
Memory location: 17000H
AL (34H) is stored at 17000H and AH (12H) is stored at 17001H
Direct Addressing Mode
Problem with direct addressing
Useful only to specify simple variables
Causes serious problems in addressing data types such as
arrays
As an example, consider adding elements of an array
Direct addressing does not facilitate using a loop structure
to iterate through the array
We have to write an instruction to add each element of the
array
Indirect addressing mode remedies this problem
Register Indirect Addressing
One of the registers BX, BP, SI, DI appears in the instruction operand
field. Its value is used as the memory displacement value.
For Example: MOV DL, [SI]
Memory address is calculated as follows:
Memory address = {DS or SS} * 10H + {BX, SI, DI, or BP}
If BX, SI, or DI appears in the instruction operand field, segment register DS
is used in address calculation
If BP appears in the instruction operand field, segment register SS is used in
address calculation
Register Indirect Addressing
Example 1: assume DS = 0800H, SI = 2000H
MOV DL, [SI]
DS * 10H: 08000H
+ SI: 2000H
Memory address: 0A000H
Memory location 0A000H holds 12H, so DL = 12H after the instruction
Example 2: assume SS = 0800H, BP = 2000H, DL = 7
MOV [BP], DL
Register Indirect Addressing
Using indirect addressing mode, we can
process arrays using loops
Example: Summing array elements
Load the starting address (i.e., offset) of the
array into BX
Loop for each element in the array
Get the value using the offset in BX
Use indirect addressing
Add the value to the running total
Update the offset in BX to point to the next element
of the array
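The loop described above can be modeled as follows. This is a Python sketch of the idea, not 8086 code: the variable `bx` plays the role of the BX register, memory is a `bytearray` standing in for the data segment, and the array contents, its offset 6, and the helper names are all made up for the example.

```python
def read_word(memory, offset):
    """Read a 16-bit word stored low byte first."""
    return memory[offset] | (memory[offset + 1] << 8)


def sum_words(memory, start_offset, count):
    """Sum `count` words, using bx as a pointer like register indirect addressing."""
    bx = start_offset  # load the starting offset of the array into BX
    total = 0
    for _ in range(count):
        total = (total + read_word(memory, bx)) & 0xFFFF  # add [BX] to the running total
        bx += 2        # point BX at the next word
    return total


data_segment = bytearray(32)
table_offset = 6
for i, value in enumerate([10, 20, 30, 40]):
    data_segment[table_offset + 2 * i] = value & 0xFF
    data_segment[table_offset + 2 * i + 1] = value >> 8
print(sum_words(data_segment, table_offset, 4))
```

The same loop body serves every element because the address lives in a register, which is exactly what direct addressing cannot offer.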
Register Indirect Addressing
Loading offset value into a register
Suppose we want to load BX with the offset value of
table1
We cannot write
mov BX,table1
Two ways of loading offset value
Using OFFSET assembler directive
Executed only at the assembly time
Using lea instruction
This is a processor instruction
Executed at run time
Register Indirect Addressing
Loading offset value into a register
(cont’d)
Using OFFSET assembler directive
The previous example can be written as
mov BX,OFFSET table1
Using lea (load effective address) instruction
The format of lea instruction is
lea register,source
The previous example can be written as
lea BX,table1
Register Indirect Addressing
Loading offset value into a register
(cont’d)
Which one to use -- OFFSET or lea?
Use OFFSET if possible
OFFSET incurs only one-time overhead (at assembly time)
lea incurs run time overhead (every time you run the program)
May have to use lea in some instances
When the needed data is available at run time only
An index passed as a parameter to a procedure
We can write
lea BX,table1[SI]
to load BX with the address of an element of table1 whose
index is in SI register
We cannot use the OFFSET directive in this case
Ambiguous Indirect Operands
Consider the following instructions:
mov [EBX], 100
add [ESI], 20
inc [EDI]
Where EBX, ESI, and EDI contain memory addresses
The size of the memory operand is not clear to the
assembler
EBX, ESI, and EDI can be pointers to BYTE, WORD, or DWORD
Solution: use PTR operator to clarify the operand size
mov BYTE PTR [EBX], 100 ; BYTE operand in memory
add WORD PTR [ESI], 20 ; WORD operand in memory
inc DWORD PTR [EDI] ; DWORD operand in memory
Based Addressing
The operand field of the instruction contains a base register (BX or BP)
and an 8-bit (or 16-bit) constant (displacement)
For Example: MOV AX, [BX+4]
Calculate memory address
Memory address = {DS or SS} * 10H + {BX or BP} + Displacement
If BX appears in the instruction operand field, segment register DS
is used in address calculation
If BP appears in the instruction operand field, segment register SS
is used in address calculation
What is the difference between register indirect addressing and based addressing?
Based Addressing
Example 1: assume DS = 0100H, BX = 0600H
MOV AX, [BX+4]
DS * 10H: 01000H
+ BX: 0600H
+ Disp.: 0004H
Memory address: 01604H
Memory holds B0H at 01604H and C0H at 01605H, so AL = B0H and AH = C0H
Example 2: assume SS = 0A00H, BP = 0012H, CH = ABH
MOV [BP-7], CH
Indexed Addressing
The operand field of the instruction contains an index register (SI or DI)
and an 8-bit (or 16-bit) constant (displacement)
For Example: MOV [DI-8], BL
Calculate memory address
Memory address = DS * 10H + {SI or DI} + Displacement
Example: assume DS = 0200H, DI = 0030H, BL = 17H
MOV [DI-8], BL
DS * 10H: 02000H
+ DI: 0030H
- Disp.: 0008H
Memory address: 02028H
BL (17H) is stored at memory location 02028H
Based Indexed Addressing
The operand field of the instruction contains a base register (BX or BP)
and an index register
For Example: MOV [BP] [SI], AH
or MOV [BP+SI], AH
Calculate memory address
Memory address = {DS or SS} * 10H + {BX or BP} + {SI or DI}
If BX appears in the instruction operand field, segment register DS
is used in address calculation
If BP appears in the instruction operand field, segment register SS
is used in address calculation
Based Indexed Addressing
Example 1: assume SS = 2000H, BP = 4000H, SI = 0800H, AH = 07H
MOV [BP] [SI], AH
SS * 10H: 20000H
+ BP: 4000H
+ SI: 0800H
Memory address: 24800H
AH (07H) is stored at memory location 24800H
Example 2: assume DS = 0B00H, BX = 0112H, DI = 0003H, CH = ABH
MOV [BX+DI], CH
Based Indexed with Displacement Addressing
The operand field of the instruction contains a base register (BX or BP),
an index register, and a displacement
For Example: MOV CL, [BX+DI+2080H]
Calculate memory address
Memory address = {DS or SS} * 10H + {BX or BP} + {SI or DI} + Disp.
If BX appears in the instruction operand field, segment register DS
is used in address calculation
If BP appears in the instruction operand field, segment register SS
is used in address calculation
Based Indexed with Displacement Addressing
Example 1: assume DS = 0300H, BX = 1000H, DI = 0010H
MOV CL, [BX+DI+2080H]
DS * 10H: 03000H
+ BX: 1000H
+ DI: 0010H
+ Disp.: 2080H
Memory address: 06090H
Memory location 06090H holds 20H, so CL = 20H after the instruction
Example 2: assume SS = 1100H, BP = 0110H, SI = 000AH, CH = ABH
MOV [BP+SI+0010H], CH
Summary of Addressing Modes
Assembler converts a variable name into a
constant offset (called also a displacement)
For indirect addressing, a base/index
register contains an address/index
CPU computes the effective
address of a memory operand
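All of the memory addressing modes reduce to one formula: segment * 10H + (base + index + displacement). The sketch below models it in Python, not 8086 code. `memory_address` is a hypothetical name, the caller is assumed to pick the right segment register (DS, or SS when BP is the base), and the effective address is wrapped to 16 bits.

```python
def memory_address(segment, base=0, index=0, displacement=0):
    """Compute the 20-bit address of a memory operand.

    base is BX or BP, index is SI or DI; leave an unused part at 0.
    """
    effective = (base + index + displacement) & 0xFFFF  # 16-bit effective address
    return ((segment << 4) + effective) & 0xFFFFF


# Worked examples from the preceding slides
print(f"{memory_address(0x0800, index=0x2000):05X}")                  # MOV DL, [SI]
print(f"{memory_address(0x0100, base=0x0600, displacement=4):05X}")   # MOV AX, [BX+4]
print(f"{memory_address(0x0200, index=0x0030, displacement=-8):05X}") # MOV [DI-8], BL
print(f"{memory_address(0x2000, base=0x4000, index=0x0800):05X}")     # MOV [BP][SI], AH
print(f"{memory_address(0x0300, 0x1000, 0x0010, 0x2080):05X}")        # MOV CL, [BX+DI+2080H]
```

The second example on each slide can be checked the same way by plugging in its register values.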
Variables in
Assembly
Variables in Assembly
Note:
The assembly language is NOT
case-sensitive.
A comment in assembly begins with
a semicolon (;). Everything after a
semicolon until the end of the line is
ignored.
Data Allocation
Variable declaration in a high-level language such as C
char response
int value
float total
double average_value
specifies
Amount of storage required (1 byte, 2 bytes, …)
Label to identify the storage allocated (response, value, …)
Interpretation of the bits stored (signed, floating point, …)
Bit pattern 1000 1101 1011 1001 is interpreted as
-29,255 as a signed number
36,281 as an unsigned number
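The two readings of the same bit pattern can be reproduced with a short Python sketch (a model, not 8086 code; `as_unsigned16` and `as_signed16` are hypothetical names, and the signed reading assumes 16-bit two's complement).

```python
def as_unsigned16(bits):
    """Read 16 bits as an unsigned number."""
    return bits & 0xFFFF


def as_signed16(bits):
    """Read the same 16 bits as a two's complement signed number."""
    value = bits & 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


pattern = 0b1000_1101_1011_1001
print(as_unsigned16(pattern))
print(as_signed16(pattern))
```

The stored bits are identical in both cases; only the interpretation chosen by the program differs.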
Data Allocation (cont’d)
In assembly language, we use the define directive
Define directive can be used
To reserve storage space
To label the storage space
To initialize
But no interpretation is attached to the bits stored
Interpretation is up to the program code
Define directive goes into the .DATA part of the assembly
language program
Define directive format
[var-name] D? init-value [,init-value],...
Variables Declaration
Our ideal syntax (TASM based) looks like this:
.MODEL SMALL
.STACK 200
.DATA
; data definitions using DB, DW, DD, etc. come here
.CODE
START: MOV AX , @DATA ; Initialize DS
MOV DS , AX ;
...
; Return to DOS
MOV AX , 4C00H
INT 21H
END START
Data Allocation (cont’d)
Five define directives
DB Define Byte ;allocates 1 byte
DW Define Word ;allocates 2 bytes
DD Define Doubleword ;allocates 4 bytes
DQ Define Quadword ;allocates 8 bytes
DT Define Ten bytes ;allocates 10 bytes
Examples
sorted DB 'y'
response DB ? ;no initialization
value DW 25159
Data Allocation (cont’d)
Multiple definitions can be abbreviated
Example
message DB 'B'
DB 'y'
DB 'e'
DB 0DH
DB 0AH
can be written as
message DB 'B','y','e',0DH,0AH
More compactly as
message DB 'Bye',0DH,0AH
Data Allocation (cont’d)
Multiple definitions can be cumbersome to initialize data
structures such as arrays
Example
To declare and initialize an integer array of 8 elements
marks DW 0,0,0,0,0,0,0,0
What if we want to declare and initialize to zero an array
of 200 elements?
There is a better way of doing this than repeating zero 200 times
in the above statement
Assembler provides a directive to do this (DUP directive)
Data Allocation (cont’d)
Multiple initializations
The DUP assembler directive allows multiple initializations to
the same value
Previous marks array can be compactly declared as
marks DW 8 DUP (0)
Examples
table1 DW 10 DUP (?) ;10 words, uninitialized
message DB 3 DUP ('Bye!') ;12 bytes, initialized
; as Bye!Bye!Bye!
Name1 DB 30 DUP ('?') ;30 bytes, each
; initialized to ?
Data Allocation (cont’d)
The DUP directive may also be nested
Example
stars DB 4 DUP(3 DUP ('*'),2 DUP ('?'),5 DUP ('!'))
Reserves 40-bytes space and initializes it as
***??!!!!!***??!!!!!***??!!!!!***??!!!!!
Example
matrix DW 10 DUP (5 DUP (0))
defines a 10X5 matrix and initializes its elements to 0
This declaration can also be done by
matrix DW 50 DUP (0)
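What a nested DUP expands to can be modeled with a tiny helper. This is a Python sketch of the expansion only, not assembler syntax; `dup` is a hypothetical function that repeats a sequence of items and flattens nested repetitions.

```python
def dup(count, *items):
    """Model `count DUP (items)`: repeat the items, flattening nested dup() results."""
    flat = []
    for item in items:
        if isinstance(item, list):
            flat.extend(item)  # result of a nested DUP
        else:
            flat.append(item)
    return flat * count


stars = dup(4, dup(3, "*"), dup(2, "?"), dup(5, "!"))
print(len(stars), "".join(stars))

matrix = dup(10, dup(5, 0))
print(len(matrix), matrix == dup(50, 0))
```

The second result shows why the 10 by 5 matrix and the flat 50-element declaration reserve the same storage.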
Data Allocation (cont’d)
Correspondence to C Data Types
Directive   C data type
DB          char
DW          int, unsigned
DD          float, long
DQ          double
DT          internal intermediate float value
Variables Declaration
Variable Limits and Negative Values
Declaration Acronym Length Limit
db define byte 1 byte 0-255
dw define word 2 bytes 0-65535
dd define double 4 bytes 0-4294967295
You can also assign negative values to these variables.
The assembler stores them in 2's complement form, so the
same bit pattern can be read as signed or as unsigned.
For example: if you assign -1 to a db variable, the
assembler stores FFh, which reads as 255 when the byte
is interpreted as unsigned.
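A short Python sketch of the 2's complement encoding described above. The function names encode and as_signed are hypothetical helpers, chosen only for this illustration.

```python
def encode(value, bits):
    """Bit pattern the assembler stores for value in a field of `bits` bits."""
    return value & ((1 << bits) - 1)

def as_signed(pattern, bits):
    """Read a stored bit pattern back as a signed (2's complement) number."""
    if pattern & (1 << (bits - 1)):      # top bit set: negative
        return pattern - (1 << bits)
    return pattern

for value in (-1, -128, 127):
    pattern = encode(value, 8)
    print(value, "is stored as", format(pattern, "02X") + "h",
          "unsigned reading:", pattern,
          "signed reading:", as_signed(pattern, 8))
```

The same byte FFh is 255 or -1 depending only on how the program chooses to read it.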
Defining BYTE
Each of the following defines a single byte of storage:
value1 DB 'A' ; character constant
value2 DB 0 ; smallest unsigned byte
value3 DB 255 ; largest unsigned byte
value4 DB -128 ; smallest signed byte
value5 DB +127 ; largest signed byte
value6 DB ? ; uninitialized byte
Resulting memory layout:
Name     Contents   Physical Address
value1   41H        80002
value2   00H        80003
value3   FFH        80004
value4   80H        80005
value5   7FH        80006
value6   ?          80007
A variable name is a data label that implies an offset (an address).
Defining Bytes
Examples that use multiple initializers:
list1 DB 10,20,30,40
list2 DB 10,20,30,40
      DB 50,60,70,80
      DB 81,82,83,84
list3 DB ?,32,41h,00100010b
list4 DB 0Ah,20h,'A',22h
Resulting memory layout:
Name    Contents                                 Physical Addresses
list1   10, 20, 30, 40                           80002-80005
list2   10, 20, 30, 40, 50, 60, 70, 80,
        81, 82, 83, 84                           80006-80011
list3   ?, 32, 41H, 22H                          80012-80015
list4   0AH, 20H, 41H, 22H                       80016-80019
Defining Strings (1 of 3)
A string is implemented as an array of characters
For convenience, it is usually enclosed in quotation marks
It usually has a terminator at the end (a null byte, or '$' as in the examples below)
Examples:
str1 DB "Enter your name", '$'
str2 DB 'Error: halting program', '$'
str3 DB 'A','E','I','O','U'
greeting DB "Welcome to the Encryption Demo program "
         DB "created by someone.", '$'
Resulting memory layout (one character per byte):
str1 occupies 80002-80011: the 15 characters of "Enter your name", then '$' at 80011
str2 starts at 80012 with the characters of "Error: halting program"
Defining Strings (2 of 3)
To continue a single string across multiple
lines, end each line with a comma:
menu DB "Checking Account",0dh,0ah,0dh,0ah,
"1. Create a new account",0dh,0ah,
"2. Open an existing account",0dh,0ah,
"3. Credit the account",0dh,0ah,
"4. Debit the account",0dh,0ah,
"5. Exit",0ah,0ah,
"Choice> ", '$'
Defining Strings (3 of 3)
End-of-line character sequence:
0Dh = carriage return
0Ah = line feed
str1 DB "Enter your name: ",0Dh,0Ah
DB "Enter your address: ",’$’
newLine DB 0Dh,0Ah, ’$’
Idea: Define all strings used by your program in the same
area of the data segment.
Using the DUP Operator
Use DUP to allocate (create space for) an array or
string.
Counter and argument must be constants or constant
expressions
var1 DB 5 DUP(0) ; 5 bytes, all equal to zero
var2 DB 4 DUP(?) ; 4 bytes, uninitialized
var3 DB 4 DUP("STACK") ; 20 bytes: "STACKSTACKSTACKSTACK"
var4 DB 10,3 DUP(0),20
For the following definitions
var1 DB 5 DUP(0)
var2 DB 4 DUP(?)
var3 DB 2 DUP("STACK")
var4 DB 10,3 DUP(0),20
the resulting memory layout is:
Name   Contents                        Physical Addresses
var1   0, 0, 0, 0, 0                   80002-80006
var2   ?, ?, ?, ?                      80007-8000A
var3   S, T, A, C, K, S, T, A, C, K    8000B-80014
var4   10, 0, 0, 0, 20                 80015-80019
Defining DW
Define storage for 16-bit integers
or double characters
single value or multiple values
word1 DW 1234H ; a 16-bit value
word2 DW -1 ; negative value, stored as FFFFh
word3 DW ? ; uninitialized
word4 DW "AB" ; double characters
myList DW 1,2,3,4,5 ; array of words
array DW 5 DUP(?) ; uninitialized array
Resulting memory layout (bytes listed from the lowest address up):
Name     Contents                        Physical Addresses
word1    34 12                           80002-80003
word2    FF FF                           80004-80005
word3    ? ?                             80006-80007
word4    B A                             80008-80009
myList   01 00 02 00 03 00 04 00 05 00   8000A-80013
array    ? (10 uninitialized bytes)      80014-8001D
Defining DD
Storage definitions for signed and unsigned 32-bit
integers:
val1 DD 12345678h ; unsigned
val2 DD -1 ; signed
val3 DD 20 DUP(?) ; unsigned array
val4 DD -3,-2,-1,0,1 ; signed array
Resulting memory layout (bytes listed from the lowest address up):
Name                  Contents      Physical Addresses
val1                  78 56 34 12   80002-80005
val2                  FF FF FF FF   80006-80009
val3[0] (at val3)     ? ? ? ?       8000A-8000D
val3[1] (at val3+4)   ? ? ? ?       8000E-80011
val3[2] (at val3+8)   ? ? ? ?       80012-80015
val3[3] (at val3+12)  ? ? ? ?       80016-80019
Defining DQ, DT
Storage definitions for quadwords, tenbyte values,
and real numbers:
quad1 DQ 1234567812345678h
val1 DT 1000000000123456789Ah
Little Endian Order
All data types larger than a byte store their individual
bytes in reverse order. The least significant byte occurs
at the first (lowest) memory address.
Example:
val1 DD 12345678h
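The byte order can be checked with Python's struct module, where the "<" prefix asks for little endian packing. This is only an illustration of the memory image; the variable names are chosen for this sketch.

```python
import struct

# val1 DD 12345678h, as the four bytes that end up in memory
val1_bytes = struct.pack("<I", 0x12345678)
print(" ".join(format(b, "02X") for b in val1_bytes))

# Reading the four bytes back in the same order recovers the value
(val1,) = struct.unpack("<I", val1_bytes)
print(format(val1, "08X"))
```

The least significant byte (78) sits at the lowest address and the most significant byte (12) at the highest.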
EQU Directive
Define a symbol as either an integer or
text expression.
Cannot be redefined
PI EQU <3.1416>
pressKey EQU <"Press any key to continue...",0>
.data
prompt DB pressKey
Moving Around Values
If you need to do some calculations or commands
involving the variables you'll have to load the variable
values to the registers.
The syntax of the mov command is mov a, b, which
means: assign b to a
[Figure: main memory holding Var1 and Var2, and registers Reg 1 and Reg 2; values travel between them with]
mov ax, [var2]
mov [var1],ax
Caveats in MOVs
You CANNOT use mov [var1], [var2].
In other words, the mov command cannot transfer
values between two variables directly. So how do
we get around this? Go through a register.
Suppose both var1 and var2 are word
variables. We can use any word registers (AX,
BX, CX, DX, and so on) to do the transfer.
Suppose we use AX.
Thus, mov [var1], [var2] must be transformed into:
mov ax, [var2]
mov [var1],ax
Moving Around Values example
jmp start
our_var dw 10
start:
mov bx, [our_var]
mov cx, bx
mov [our_var], cx
mov ax, 4c00h
int 21h
end
The square brackets [ ] are to distinguish the variable from its
address.
Moving Around Values cont.
When we deal with byte variables (i.e. db), we need to use
byte registers (e.g. AL, AH, BL, BH, and so on) to do our
bidding.
AX, BX, CX, DX, and so on are word registers.
You can use double-word registers, which are available on 80386
processors or better (use p386n instead of p286n to enable
double-word registers).
The double-word registers include EAX, EBX, ECX, EDX, and
so on.
We can assign constants to variables with the mov instruction:
mov [word ptr our_var], 1
Notice that the word ptr modifier must be used when you assign
constants to variables. Since our_var is a word variable, we
need to use the word ptr modifier. Likewise, a byte variable uses
the byte ptr modifier and a double-word variable uses dword ptr.
Moving Around Values
example
Notice the way that Intel processors
store a word value.
The least significant byte is stored first
(at the lower address), then the most significant byte.
Moving Around Values cont.
Recall that variables in assembly are
treated as addresses.
AX <= 0502h
Moving Around Values cont.
Double-word variables are also stored
similarly (i.e. bottom-up, flipped like the word
variables)
my_var dd 1234BABEh
Impacts on Registers
Recall that the word register AX consists of
AH and AL.
Modifying either AH or AL will modify the
contents of AX.
Likewise, modifying AX modifies
AH and AL.
Question Marks on Variables
If you are not certain about the default
value of a variable you can give a
question mark ("?") instead. For
example:
another_var dw ?
String Variables
You can define strings variables in
assembly. It is as follows:
message db "Hello World!$"
String variables are required to be stored
as db variables. The string is then
surrounded by quotes, either single or
double, up to you.
String Variables
Why do we have to end our string with a dollar sign
("$")?
Well, some of the old DOS services require us to
do so. However, some systems may require
you to end it with a zero byte (ASCII code 0) instead:
message db "Hello World!",0
Each character of the string is converted to its
corresponding ASCII code.
Another thing to remember about string variables is
that the string's ASCII codes are NOT flipped, as the
bytes of word and double-word variables are.
Size of Data
Remember that labels merely declare an
address in the data segment, and do not
specify any size
Size of data is inferred based on the source
or destination register
mov eax, [L] ; loads 32 bits
mov al, [L] ; loads 8 bits
mov [L], eax ; stores 32 bits
mov [L], ax ; stores 16 bits
This is why it’s really important to know the
names of the x86 registers
Size Reduction
Sometimes one needs to decrease the data size
For instance, you have a 4-byte integer, but you need to use it as a
2-byte integer for something
We simply use the registers: when moving a quantity from an X-bit
register to a Y-bit register (Y < X), the highest (X-Y) bits are simply
removed
Example:
mov ax, [L] ; loads 16 bits in ax
mov bl, al ; takes the lower 8 bits of ax and puts them in bl
mov requires both operands to have the same size, so the low byte of ax
is named explicitly as al (“mov bl, ax” is rejected by the assembler)
[Figure: al is the low byte of ax; it is copied into bl]
Size Reduction
Of course, when doing a size reduction, one loses
information
So the “conversion” may not work
Example:
mov ax, 000A2h ; ax = 162 decimal
mov bl, al ; bl = 162 decimal
Decimal 162 is encodable on 8 bits
Example:
mov ax, 00101h ; ax = 257 decimal
mov bl, al ; bl = 1 decimal
Decimal 257 is not encodable on 8 bits
Size Reduction and Sign
Consider a 2-byte quantity: FFF4
If we interpret this quantity as unsigned it is decimal 65,524
Remember that the computer does not know whether the
content of registers/memory corresponds to signed or unsigned
quantities
Once again it’s the responsibility of the programmer to do the
right thing
In this case size reduction “does not work”, meaning that
reduction to a 1-byte quantity will not be interpreted as
decimal 65,524, but instead as decimal 244 (F4h)
If instead FFF4 is a signed quantity (using 2’s complement),
then it corresponds to -000C (000B + 1), that is to decimal -12
In this case, size reduction works!
Size Reduction and Sign
This does not mean that size reduction always
works for signed quantities
For instance, consider FF32h, which is a negative
number equal to -00CEh, that is, decimal -206
A size reduction into a 1-byte quantity leads to 32h,
which is decimal +50!
Note that -206 is not encodable on 1 byte
The range of signed 1-byte quantities is between decimal
-128 and decimal +127
So, size reduction may work or not work for signed
or unsigned quantities!
Two Rules to Remember
For unsigned numbers: size reduction works if all removed
bits are 0
0 0 0 0 0 0 0 0 X X X X X X X X
X X X X X X X X
For signed numbers: size reduction works if all removed bits
are all 0’s or all removed bits are all 1’s, AND if the highest bit
not removed is equal to the removed bits
This highest remaining bit is the new sign bit, and
thus must be the same as the original sign bit
1 1 1 1 1 1 1 1 1 X X X X X X X
1 X X X X X X X
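The two rules can be written as small checks on a 16-bit value being reduced to 8 bits. This is a Python sketch for verifying the examples from the previous slides; the function names are made up for illustration.

```python
def fits_unsigned(value16):
    """Unsigned rule: all removed (upper 8) bits must be 0."""
    return (value16 >> 8) == 0

def fits_signed(value16):
    """Signed rule: removed bits all 0 or all 1, and equal to the new sign bit."""
    removed = value16 >> 8            # the 8 bits that are dropped
    new_sign = (value16 >> 7) & 1     # highest bit that is kept
    return (removed == 0x00 and new_sign == 0) or \
           (removed == 0xFF and new_sign == 1)

for v in (0x00A2, 0x0101, 0xFFF4, 0xFF32):
    print(format(v, "04X"), "unsigned ok:", fits_unsigned(v),
          "signed ok:", fits_signed(v))
```

Note that 00A2h passes the unsigned rule but fails the signed one: the kept byte A2h has its top bit set, so it would read as a negative number.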
Size Increase
Size increase for unsigned quantities is
simple: just add 0s to the left of it
Size increase for signed quantities requires
sign extension: the sign bit must be
extended, that is, replicated
Consider the signed 1-byte number 5A. This
is a positive number (decimal 90), and so its
2-byte version would be 005A
Consider the signed 1-byte number 8A. This
is a negative number (decimal -118), and so
its 2-byte version would be FF8A
Unsigned size increase
Say we want to size increase an unsigned 1-
byte number to be a 2-byte unsigned number
This can be done in a few easy steps, for
instance:
Put the 1-byte number into al
Set all bits of ah to 0
Access the number as ax
Example
mov al, 0EDh
mov ah, 0
mov ..., ax
Unsigned size increase
How about increasing the size of a 2-byte quantity to 4 byte?
This cannot be done in the same manner because there is no
way to access the 16 highest bits of register eax separately!
[Figure: EAX, whose low 16 bits are AX = AH:AL]
Therefore, there is an instruction called movzx (Zero eXtend),
which takes two operands:
Destination: 16- or 32-bit register
Source: 8- or 16-bit register, or 1 byte of memory, or a
word of memory
The destination must be larger than the source!
Using movzx
movzx eax, ax ; extends ax into eax
movzx eax, al ; extends al into eax
movzx ax, al ; extends al into ax
movzx ebx, ax ; extends ax into ebx
movzx ebx, [L] ; gives a “size not specified”
error
movzx ebx, byte [L] ; extends 1-byte value
at address L into ebx
movzx eax, word [L]; extends 2-byte value
at address L into eax
Signed Size Increase
There is no way to use mov or movzx instructions to increase the
size of signed numbers, because of the needed sign extension
Four “old” conversion instructions with implicit operands
CBW (Convert Byte to Word): Sign extends AL into AX
CWD (Convert Word to Double): Sign extends AX into DX:AX
DX contains high bits, AX contains low bits
a left-over instruction from the time of the 8086 that had no 32-bit registers
CWDE (Convert Word to Double word Extended): Sign extends AX into
EAX
CDQ (Convert Double word to Quad word): Sign extends EAX into
EDX:EAX (implicit operands)
EDX contains high bits, EAX contains low bits
This is really a 64-bit quantity (and we have no 64-bit register)
The much more popular MOVSX instruction
Works just like MOVZX, but does sign extension
CBW equiv. to MOVSX ax, al
CWDE equiv. to MOVSX eax, ax
Example
mov al, 0A7h ; as a programmer, I view this
; as an unsigned, 1-byte quantity
; (decimal 167)
mov bl, 0A7h ; as a programmer, I view this
; as a signed 1-byte
; quantity (decimal -89)
movzx eax, al ; extend to a 4-byte value
; (000000A7)
movsx ebx, bl ; extend to a 4-byte value
; (FFFFFFA7)
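The effect of the two extensions can be modelled in Python. This is a sketch of the bit manipulation only, not of the instruction encodings; the function names mirror the mnemonics but are ordinary helpers written for this example.

```python
def movzx(value, src_bits):
    """Zero extension: keep the source bits, upper bits become 0."""
    return value & ((1 << src_bits) - 1)

def movsx(value, src_bits, dst_bits):
    """Sign extension: replicate the source sign bit into the upper bits."""
    value &= (1 << src_bits) - 1
    if value & (1 << (src_bits - 1)):                     # sign bit set
        value |= ((1 << dst_bits) - 1) ^ ((1 << src_bits) - 1)
    return value

print(format(movzx(0xA7, 8), "08X"))        # unsigned view of A7h
print(format(movsx(0xA7, 8, 32), "08X"))    # signed view of A7h
print(format(movsx(0x5A, 8, 16), "04X"))    # positive byte
print(format(movsx(0x8A, 8, 16), "04X"))    # negative byte
```

The same starting byte A7h gives two different 32-bit values, which is why the programmer has to pick the instruction that matches the intended interpretation.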
Data Transfer Instructions
We will look at three instructions
mov (move)
Actually copy
xchg (exchange)
Exchanges two operands
xlat (translate)
Translates byte values using a translation table
Other data transfer instructions such as
movsx (move sign extended)
movzx (move zero extended)
Data Transfer Instructions (cont’d)
The mov instruction
The format is
mov destination,source
Copies the value from source to destination
source is not altered as a result of copying
Both operands should be of same size
source and destination cannot both be in memory
Most Pentium instructions do not allow both operands to be
located in memory
Pentium provides special instructions to facilitate memory-
to-memory block copying of data
Data Transfer Instructions (cont’d)
The mov instruction
Five types of operand combinations are
allowed:
Instruction type Example
mov register,register mov DX,CX
mov register,immediate mov BL,100
mov register,memory mov BX,count
mov memory,register mov count,SI
mov memory,immediate mov count,23
The operand combinations are valid for all
instructions that require two operands
Data Transfer Instructions (cont’d)
Source Operand      Destination Operand
                    General    Segment    Memory     Constant
                    Register   Register   Location
General Register    Yes        Yes        Yes        No
Segment Register    Yes        No         Yes        No
Memory Location     Yes        Yes        No         No
Constant            Yes        No         Yes        No
Data Transfer Instructions (cont’d)
Ambiguous moves: PTR directive
For the following data definitions
.DATA
table1 DW 20 DUP (0)
status DB 7 DUP (1)
the last two of the following mov instructions are ambiguous
mov BX,OFFSET table1
mov SI,OFFSET status
mov [BX],100
mov [SI],100
Not clear whether the assembler should use byte or word
equivalent of 100
Data Transfer Instructions (cont’d)
Ambiguous moves: PTR directive
The PTR assembler directive can be used to clarify
The last two mov instructions can be written as
mov WORD PTR [BX],100
mov BYTE PTR [SI],100
WORD and BYTE are called type specifiers
We can also use the following type specifiers:
DWORD for doubleword values
QWORD for quadword values
TWORD for ten byte values
Data Transfer Instructions (cont’d)
The xchg instruction
The syntax is
xchg operand1,operand2
Exchanges the values of operand1 and operand2
Examples
xchg EAX,EDX
xchg response,CL
xchg total,DX
Without the xchg instruction, we need a temporary
register to exchange values using only the mov
instruction
Data Transfer Instructions (cont’d)
The xchg instruction
The xchg instruction is useful for conversion of 16-bit
data between little endian and big endian forms
Example:
xchg AL,AH
converts the data in AX into the other endian form
Pentium provides bswap instruction to do similar
conversion on 32-bit data
bswap 32-bit register
bswap works only on data located in a 32-bit register
Data Transfer Instructions (cont’d)
The xlat instruction
The xlat instruction translates bytes
The format is
xlatb
To use xlat instruction
BX should be loaded with the starting address of the translation table
AL must contain an index in to the table
Index value starts at zero
The instruction reads the byte at this index in the translation table and
stores this value in AL
The index value in AL is lost
Translation table can have at most 256 entries (due to AL)
The contents of the byte that is AL bytes from the start of the
translation table pointed to by DS:BX is copied into AL, i.e.,
the effect of XLAT is equivalent to the invalid statement:
MOV AL , [BX + AL]
Data Transfer Instructions (cont’d)
The xlat instruction
Example: Encrypting digits
Input digits: 0 1 2 3 4 5 6 7 8 9
Encrypted digits: 4 6 9 5 0 3 1 8 7 2
.DATA
xlat_table DB '4695031872'
...
.CODE
mov BX,OFFSET xlat_table
GetCh AL
sub AL,'0' ; converts input character to index
xlatb ; AL = encrypted digit character
PutCh AL
...
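The table lookup done by xlatb can be mirrored in Python to check the encryption table. This is a sketch: xlatb and encrypt_digit are hypothetical helper names, and the GetCh/PutCh input and output steps are replaced by a plain function argument and return value.

```python
xlat_table = b"4695031872"   # same bytes as xlat_table DB '4695031872'

def xlatb(table, al):
    """Model of xlatb: AL = byte at [table + AL]."""
    return table[al]

def encrypt_digit(ch):
    al = ord(ch) - ord("0")      # sub AL,'0' : character to index
    al = xlatb(xlat_table, al)   # xlatb      : index to encrypted character
    return chr(al)

print("".join(encrypt_digit(c) for c in "0123456789"))
```

Because the index lives in AL, the table can never have more than 256 entries, as noted on the previous slide.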
Arithmetic Instructions
Addition and Subtraction
Two instructions used for additions and subtractions: add and sub
Both instructions can be used on a pair of signed numbers or on a
pair of unsigned numbers
One of the big advantages of 2’s complement storage
No mixing of signed and unsigned numbers
IMPORTANT: The CPU does not know whether numbers stored in
registers are signed or unsigned!
You, the programmer, must keep your own interpretation of the number
consistent throughout your program
The CPU will happily add whatever registers together using binary
addition
These two instructions each may set some bits of the FLAG
register:
The carry bit
The overflow bit
The zero bit (=1 if the result is equal to zero)
The sign bit (=1 if the result is negative)
The Magic of 2’s Complement
I have two 1-byte values, A3 and 17, and I add them together:
A3 + 17 = BA
If my interpretation of the numbers is unsigned:
A3h = 163d
17h = 23d
BAh = 186d
and indeed, 163d + 23d = 186d
If my interpretation of the numbers is signed:
A3h = -93d
17h = 23d
BAh = -70d
and indeed, -93d + 23d = -70d
So, as long as I stick to my interpretation, the binary addition does
the right thing!!
Same thing for the subtraction
This is why the computer does not need to know whether register
contents are signed or unsigned
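The A3 + 17 example can be replayed in Python: one 8-bit binary addition, then two readings of the same bytes. The helper as_signed8 is a name chosen for this sketch.

```python
def as_signed8(byte):
    """Read an 8-bit pattern as a 2's complement number."""
    return byte - 256 if byte >= 0x80 else byte

a, b = 0xA3, 0x17
result = (a + b) & 0xFF          # what the 8-bit adder produces
print(format(result, "02X"))

# Unsigned reading of operands and result
print(a, "+", b, "=", result)
# Signed reading of the very same bits
print(as_signed8(a), "+", as_signed8(b), "=", as_signed8(result))
```

The adder runs once; only the interpretation applied afterwards differs.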
Overflow???
Generally speaking, overflow occurs when the result of an arithmetic
operation generates a result that’s “out of range”
This happens because a register has a limited number of bits, which
means that our interpretation of a number comes with a valid range
For instance
adding 1-byte unsigned quantity 240d to 1-byte unsigned quantity 100d
will lead to an overflow because 340d > 255d
subtracting 1-byte unsigned quantity 240d from 1-byte unsigned quantity
100d will lead to an overflow because -140d < 0d
adding 1-byte signed quantity 100d to 1-byte signed quantity 120d will
lead to an overflow because 220d > 127d
etc.
Question: how do we detect overflow in a program?
Important otherwise we could be working with bogus numbers
It depends on whether numbers are signed or unsigned...
Overflow for Unsigned Operations
There is an overflow with an unsigned operation
(i.e., on unsigned quantities) if the carry bit is set
If the carry bit is set, that means we’d need a larger
quantity to hold the result
This also works for subtractions (instead of a carry, we
have a “borrow”, but it’s still set in the carry bit)
1-byte Example (all in hex):
FF + 02 Carry is set (result would be 101h)
255 + 2 > 255
01 - 02 Carry is set (result cannot be negative)
1-2<0
8A - 0F Carry is not set (result is 7Bh)
138 - 15 = 123
Overflow for Signed Operations
There is an overflow with a signed operation (i.e., on
signed quantities) if the overflow bit is set
This bit is set when the sign of the result does not agree
with the signs of the operands, which would be annoying
for the programmer to check by hand
1-byte Example (all in hex, same as before):
FF + 02 Overflow is not set (result is 01h)
-1 + 2 = +1
01 - 02 Overflow is not set (result is FFh)
1 - 2 = -1
8A - 0F Overflow is set (true result would be below -128)
8A is negative, and is equal to -76h = -118d
-118 - 15 < -128, and thus cannot be represented as a 1-byte
signed quantity
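The carry and overflow rules for 1-byte operations can be modelled as follows. This is a Python sketch of the flag logic under the usual textbook definitions; add8 and sub8 are hypothetical helper names, and each returns (result, carry, overflow).

```python
def add8(a, b):
    total = a + b
    result = total & 0xFF
    carry = total > 0xFF                                  # unsigned overflow
    # Signed overflow: both operands have a sign that differs from the result's
    overflow = ((a ^ result) & (b ^ result) & 0x80) != 0
    return result, carry, overflow

def sub8(a, b):
    result = (a - b) & 0xFF
    carry = a < b                                         # a borrow was needed
    # Signed overflow: operands differ in sign and the result's sign differs from a's
    overflow = ((a ^ b) & (a ^ result) & 0x80) != 0
    return result, carry, overflow

for name, (r, cf, of) in (("FF + 02", add8(0xFF, 0x02)),
                          ("01 - 02", sub8(0x01, 0x02)),
                          ("8A - 0F", sub8(0x8A, 0x0F))):
    print(name, "=", format(r, "02X"), "CF =", int(cf), "OF =", int(of))
```

The three calls reproduce the examples from the last two slides: the carry bit answers the unsigned question, the overflow bit the signed one.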
In-Class Exercise
Which of these operations set the Carry bit to 1? (presumably
we care because we think of these as unsigned operations)
0F12 + F212 (2-byte quantities)
00E3 + F74F (2-byte quantities)
F1 - FA (1-byte quantities)
FB12 - A3AA (2-byte quantities)
A314 - B010 (2-byte quantities)
Which of these operations set the Overflow bit to 1?
(presumably we care because we think of these as signed
operations)
00E3 + FF4F (2-byte quantities)
F1 - 7A (1-byte quantities)
In-Class Exercise
Which of these operations set the Carry bit to 1?
0F12
+ F212
= 10124 Carry bit is set
00E3
+ F74F
= F832 Carry bit is not set
F1 - FA: F1 < FA Carry bit is set
FB12 - A3AA: FB12 > A3AA Carry bit is not set
A314 - B010: A314 < B010 Carry bit is set
In-Class Exercise
Which of these operations set the Overflow bit to 1?
00E3 + FF4F
00E3 > 0, equal to decimal +227
FF4F < 0, 2’s complement = 00B0+1 = B1, equal to decimal -177
+227 - 177 = 50
2-byte signed numbers are in [-32,768, +32,767]
Overflow bit is not set
F1 - 7A
F1 < 0, 2’s complement = 0E+1 = 0F, equal to decimal -15
7A > 0, equal to 122
-15 - 122 = -137
1-byte signed numbers are in [-128,+127]
Overflow bit is set
Overflow is your Responsibility
The processor merely computes bits and puts
them into the destination location as if
everything were fine, and it’s your
responsibility to check the overflow!
Let’s look at two examples
An unsigned arithmetic example
A signed arithmetic example
Note that we will see later how to “check” the
Carry bit and the Overflow bit in the FLAGS
register
Unsigned Overflow
On web site as
ics312_overflow_unsigned.asm
mov al, 0F0h ; al = F0h
mov bl, 0A3h ; bl = A3h
add al, bl ; al = al + bl
movzx eax, al ; increase size for printing
call print_int ; print al as an integer
As a programmer we decided to do some computation with unsigned values
We put value F0h in al (unsigned F0h is decimal 240)
We put value A3h in bl (unsigned A3h is decimal 163)
We add them together
The “true” result should be decimal 240+163 = 403, which cannot be encoded on 8
bits (the result must be <= 255)
But the processor just goes ahead: F0 + A3 = 193h, and then drops the leftmost bits
to truncate to a 1-byte value to get 93h!
Therefore, when we call print_int, we print the decimal value 00000093, that is: 147!
This is obviously wrong, and we can tell (or will be able to shortly) because the carry
bit is in fact set to 1
Note that this is all correct if we assume signed values and replace movzx by movsx,
but then our initial interpretation of the two values is different
Signed Overflow
On web site as
ics312_overflow_signed.asm
mov al, 09Ah ; al = 9Ah
mov bl, 073h ; bl = 73h
sub al, bl ; al = al - bl
movsx eax, al ; increase size for printing
call print_int ; print al as an integer
As a programmer we decided to do some computation with signed values
We put value 9Ah in al (signed 9Ah is decimal -102)
We put value 73h in bl (signed 73h is decimal +115)
We subtract bl from al
The “true” result should be decimal -102 - 115 = -217, which cannot be encoded on 8
bits (should be >= -128)
But the processor just goes ahead: 9A - 73 = 27h
Therefore, when we call print_int, we print the decimal value 00000027, that is: 39!
This is obviously wrong, and we can tell (or will be able to shortly) because the
overflow bit is in fact set to 1
Note that this is all correct if we assume unsigned values and replace movsx by
movzx, but then our initial interpretation of the two values is different
In-Class Exercise
mov al, 0E1h
mov bl, 0A2h
add al, bl
movzx eax, al
call print_int
movsx eax, bl
call print_int
What does this program print?
In-Class Exercise
mov al, 0E1h ; AL = E1
mov bl, 0A2h ; BL = A2
add al, bl ; E1 + A2 = 183, truncated to 8 bits: AL = 83
movzx eax, al ; EAX = 00 00 00 83
call print_int ; prints out: 131
movsx eax, bl ; EAX = FF FF FF A2
call print_int ; prints out: -94
FFFFFFA2 is a negative number
2’s complement: (0000005D+1) = 5E
= decimal 94
Addition and Subtraction
You may actually add or subtract variables
with constants. But don't forget to add the
word ptr or dword ptr as appropriate.
If the result of an addition overflows, the
carry flag is set to 1, otherwise it is 0.
Similarly, if the result of subtraction
requires a borrow, then the carry flag is
also set to 1, otherwise it is 0.
Addition and Subtraction
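Suppose you'd like to add 32-bit integers
using 16-bit registers.
Intel processors have a special instruction
called adc (add with carry): add the low words
with add, then add the high words with adc,
which also adds in the carry flag.
For subtraction, there is a similar
instruction called sbb (subtract with borrow).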
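The add/adc pairing can be sketched in Python by splitting each 32-bit number into two 16-bit halves. The function name and the variable names are assumptions made for this illustration.

```python
def add32_with_16bit_halves(x, y):
    """Add two 32-bit values using only 16-bit additions."""
    low = (x & 0xFFFF) + (y & 0xFFFF)        # add : low words
    carry = low >> 16                        # carry flag after the add
    high = (x >> 16) + (y >> 16) + carry     # adc : high words plus carry
    return ((high & 0xFFFF) << 16) | (low & 0xFFFF)

x, y = 0x0001FFFF, 0x00000001
print(format(add32_with_16bit_halves(x, y), "08X"))
```

Without the carry term the high word would stay at 0001 and the result would be wrong by 65536.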
Multiplication and Division
Multiplication and division always assume AX
as the place holder.
If there is an overflow in multiplication, the
overflow flag will be set.
Note: mul and div treat all numbers as
unsigned. If you have negative values, you'll
need to replace them with imul and idiv
respectively.
Multiplication
There are two instructions to perform
multiplications
Multiplying unsigned numbers: mul
Multiplying signed numbers: imul
Why do we need two different instructions?
Consider the multiplication of FF by FF
If we assume unsigned quantities, this is 255*255
= 65025 = FE01h
If we assume signed quantities, this is -1 * -1 = 1
= 0001h
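The FF by FF example can be checked with a Python model of the 8-bit forms of the two instructions, where the 16-bit product lands in AX. The names mul8, imul8 and as_signed8 are helpers written for this sketch.

```python
def as_signed8(byte):
    return byte - 256 if byte >= 0x80 else byte

def mul8(al, src):
    """Unsigned 8-bit multiply: AX = AL * src."""
    return (al * src) & 0xFFFF

def imul8(al, src):
    """Signed 8-bit multiply: AX = AL * src, operands read as 2's complement."""
    return (as_signed8(al) * as_signed8(src)) & 0xFFFF

print(format(mul8(0xFF, 0xFF), "04X"))    # 255 * 255
print(format(imul8(0xFF, 0xFF), "04X"))   # -1 * -1
```

Unlike add and sub, the two interpretations give different result bits, so the CPU needs to be told which one is meant.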
The mul Instruction
The result of a multiplication can need up to twice as many bits
as its operands
Multiplications just lead to much bigger numbers than additions
At most the result will be twice the size of the operands (255 *
255 = 65,025, which is encodable on 2 bytes)
The oldest form of multiplication is the “mul” instruction, which
produces a result twice the size of its operand
mul <register or memory reference>
If the operand is a byte, then it is multiplied by AL and the result
is stored in (16-bit) AX
If the operand is 16-bit, it is multiplied by AX and stored in (32-
bit) DX:AX
There used to be no 32-bit registers
If the operand is 32-bit, it is multiplied by EAX and the result is
stored in (64-bit) EDX:EAX
We don’t have 64-bit registers on a 32-bit architecture
The imul instruction
Imul, which is used for signed numbers has three formats:
imul src
imul dst, src1
imul dst, src1, src2
The different combinations are shown in Table 2.2 in the text
book
This table uses the typical way in which one specifies
operands:
reg16: a 16-bit register
reg32: a 32-bit register
immed8: an 8-bit immediate operand (i.e., a number)
mem16: a word of memory
etc.
Let’s look at the table
The imul instruction
dst      src1        src2      action
         reg/mem8              AX = AL * src1
         reg/mem16             DX:AX = AX * src1
         reg/mem32             EDX:EAX = EAX * src1
reg16    reg/mem16             dst *= src1
reg32    reg/mem32             dst *= src1
reg16    immed8                dst *= immed8
reg32    immed8                dst *= immed8
reg16    immed16               dst *= immed16
reg32    immed32               dst *= immed32
reg16    reg/mem16   immed8    dst = src1*src2
reg32    reg/mem32   immed8    dst = src1*src2
reg16    reg/mem16   immed16   dst = src1*src2
reg32    reg/mem32   immed32   dst = src1*src2
The first three forms (no dst) will not overflow (although the overflow bit may be set)
Division
Two instructions:
div for unsigned quantities
idiv for signed quantities
They perform integer division
e.g.,: 19 / 4 produces quotient = 4 remainder = 3
Only one format for both:
div/idiv src
If src is an 8-bit quantity:
AX is divided by src
quotient stored into AL
remainder stored into AH
If src is a 16-bit quantity:
DX:AX is divided by src
quotient stored into AX
remainder stored into DX
Division
If src is a 32-bit quantity:
EDX:EAX is divided by src
quotient stored into EAX
remainder stored into EDX
Warning: it’s very common for programmers
to forget to initialize DX or EDX before the
division
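The warning can be made concrete with a Python model of the 16-bit unsigned div, where the dividend is DX:AX. This is a sketch; div16 is a hypothetical helper, and the exception stands in for the divide error the processor raises when the quotient does not fit.

```python
def div16(dx, ax, src):
    """Model of `div src` with a 16-bit src: returns (AX, DX) = (quotient, remainder)."""
    dividend = (dx << 16) | ax
    quotient, remainder = divmod(dividend, src)
    if quotient > 0xFFFF:
        raise OverflowError("quotient does not fit in AX (divide error)")
    return quotient, remainder

print(div16(0, 19, 4))   # DX cleared first: 19 / 4
print(div16(1, 19, 4))   # DX left over from earlier code: a different dividend
```

With a stale DX the instruction still runs, but it divides 65555 instead of 19, so the quotient is silently wrong.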
Negation
There is a convenient instruction to negate an
operand: neg
It simply computes the 2’s complement of a
quantity
Works on 8-bit, 16-bit, or 32-bit quantities
either in registers or in memory
We’ll ignore the content of Section 2.1.5 in
the textbook
Increment and Decrement
Often, we'd like to increment something
by 1 or decrement something by 1.
You can use add x, 1 or sub x, 1 if you'd like to,
but Intel x86 assembly has special instructions
for this.
Instead of add x, 1 we use inc x. These compute
the same result.
Likewise, instead of sub x, 1 you can use
dec x.
Beware that neither inc nor dec sets
the carry flag as add and sub do.
Tip
The arithmetic operations can have special
properties.
For example: add x, x is actually equal to
multiplying x by 2.
Similarly, sub x, x is actually setting x to 0.
On the 8086 processor, these forms are faster
than doing mul or doing mov x, 0. Moreover,
their code size is smaller.
Bitwise Operation
Why bit operations
Assembly languages all provide ways to
manipulate bits
Some of the coolest “tricks” in assembly rely
on bit operations
Only a few instructions can do a lot very quickly
using judicious bit operations
Let’s look at some of the common operations,
starting with shifts
logical shifts
arithmetic shifts
rotate shifts
Shift Operations
A shift moves the bits around in some data
A shift can be toward the left (i.e., toward the
most significant bits), or toward the right (i.e.,
toward the least significant bits)
There are two kinds of shifts:
Logical Shifts
Arithmetic Shifts
Logical Shifts
The simplest shifts: bits disappear at one end
and zeros appear at the other
original byte 1 0 1 1 0 1 0 1
left log. shift 0 1 1 0 1 0 1 0
left log. shift 1 1 0 1 0 1 0 0
left log. shift 1 0 1 0 1 0 0 0
right log. shift 0 1 0 1 0 1 0 0
right log. shift 0 0 1 0 1 0 1 0
right log. shift 0 0 0 1 0 1 0 1
Logical Shift Instructions
Two instructions: shl and shr
For each you can specify by how many bits you want to do
the shift
Either by just passing a constant to the instruction
Or by using whatever is stored in the CL register
After the instruction executes, the carry flag (CF) contains the
(last) bit that was shifted out
Example:
mov al, 0C6h ; al = 1100 0110
shl al, 1 ; al = 1000 1100 (8Ch) CF=1
shr al, 1 ; al = 0100 0110 (46h) CF=0
shl al, 3 ; al = 0011 0000 (30h) CF=0
mov cl, 2
shr al, cl ; al = 0000 1100 (0Ch) CF=0
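The example above can be replayed with a Python model of 8-bit shl and shr that shifts one bit at a time and records the last bit shifted out. The names shl8 and shr8 are chosen for this sketch; a count of at least 1 is assumed.

```python
def shl8(value, count):
    """Logical left shift of an 8-bit value; returns (result, CF)."""
    cf = None
    for _ in range(count):
        cf = (value >> 7) & 1            # bit leaving on the left
        value = (value << 1) & 0xFF      # zero enters on the right
    return value, cf

def shr8(value, count):
    """Logical right shift of an 8-bit value; returns (result, CF)."""
    cf = None
    for _ in range(count):
        cf = value & 1                   # bit leaving on the right
        value >>= 1                      # zero enters on the left
    return value, cf

al = 0xC6
for op, count in ((shl8, 1), (shr8, 1), (shl8, 3), (shr8, 2)):
    al, cf = op(al, count)
    print(op.__name__, count, "-> al =", format(al, "02X") + "h", "CF =", cf)
```

For a multi-bit shift only the last bit shifted out survives in CF, which is why shl al, 3 on 46h leaves CF = 0 even though a 1 was shifted out along the way.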
Shifts and Numbers
The main use for shifts: quickly multiply and divide by powers of 2
In decimal
multiplying 0013 by 10 amounts to doing one left shift to 0130
multiplying by 100 amounts to doing two left shifts to 1300
In binary
multiplying 00101 by 2 amounts to doing a left shift to 01010
multiplying by 4 amounts to doing two left shifts to 10100
If numbers are too large, then we’d need more bits and multiplication
doesn’t produce valid results
e.g., 10000000 (128d) cannot be left-shifted to obtain 256 using 8-bit values
Similarly, dividing by powers of two amounts to doing right shifts:
right shifting 10010 (18d) leads to 01001 (9d)
Note that when dividing odd numbers by two we “lose bits”, which amounts
to rounding to the lower integer quotient
Consider number 10011 (19d)
Right shift: 01001 (9d - rounded below)
Right shift: 00100 (4d - rounded below)
Shifts and Unsigned Numbers
Using the shifts works only for unsigned numbers
When numbers are signed, the shifts do not handle
the sign bits correctly and cannot be interpreted as
multiplying/dividing by powers of 2 anymore
Example: Consider the 1-byte number FE
If Unsigned:
FE = 254d = 11111110b
right shift: 01111111b = 7Fh = 127d (which is 254/2)
If Signed:
FE = - 2d = 11111110b
right shift: 01111111b = 7Fh = +127d (which is NOT -2/2)
Arithmetic Shifts
Since the logical shifts do not work for signed
numbers, we have another kind of shifts called
arithmetic shifts
Left shift: sal
This instruction works just like shl
As long as the sign bit is not changed by the shift, the
result will be correct (i.e., will be multiplied by 2)
Right shift: sar
This instruction does NOT shift the sign bit: the new bits
entering on the left are copies of the sign bit
Both shifts store the last bit out in the carry flag
Arithmetic Shift Example
If signed numbers, then the operations below are correct
multiplications / divisions of 1-byte quantities
mov al, 0C3h ; al = 1100 0011 (-61d)
sal al, 1 ; al = 1000 0110 (86h = -122d)
sar al, 3 ; al = 1111 0000 (F0h = -16d)
; (note that this is not an exact division as we
; lose bits on the right!)
The following is not a correct multiplication by 16!
sal al, 4 ; al = 0000 0000 (0d, which can’t be right)
One should use the imul instruction instead (but unfortunately the form of imul
that takes an immediate operand doesn’t work on 1-byte quantities):
movsx ax, al ; sign extension
imul ax, 16 ; result in ax
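To make the difference between sar and shr concrete, here is a small C model of sar on a 1-byte value. It is a sketch under assumptions: the name sar8 is hypothetical, the count is assumed to be between 1 and 7, and the sign fill is built by hand because right-shifting a negative signed value in C is implementation-defined.

```c
#include <stdint.h>

/* Hypothetical model of "sar reg8, count": the bits entering on the left
   are copies of the sign bit. Assumes 1 <= count <= 7. */
uint8_t sar8(uint8_t value, unsigned count, unsigned *cf)
{
    uint8_t sign_fill = (value & 0x80u) ? (uint8_t)(0xFFu << (8 - count)) : 0;
    *cf = (value >> (count - 1)) & 1u;            /* last bit shifted out */
    return (uint8_t)((value >> count) | sign_fill);
}
```

With value 86h (-122d) and a count of 3 this yields F0h (-16d), matching the sar step in the example above; a non-negative value such as 46h simply halves to 23h.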
In-Class Exercise
Consider the following instructions
mov ax, 0F471h
sar ax, 3
shl ax, 7
sar ax, 10
At each step give the content of register ax
(in hex) and the value of CF
In-Class Exercise
Consider the following instructions
mov ax, 0F471h ; ax = F471 CF=0 (mov leaves CF unchanged; assumed initially 0)
sar ax, 3 ; ax = FE8E CF=0
shl ax, 7 ; ax = 4700 CF=1
sar ax, 10 ; ax = 0011 CF=1
Rotate Shifts
There are more esoteric shift instructions
rol and ror: circular left and right shifts
bits shifted out on one end are shifted in the other
end
rcl and rcr: carry flag rotates
the source (e.g., a 16-bit register) and the carry
flag are rotated as one quantity (e.g., as a 17-bit
quantity)
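The circular shifts rol and ror can be sketched in C as follows. The function names are hypothetical, and the sketch reduces the count modulo 16, which is an assumption of this model rather than a statement about how every x86 processor treats large counts.

```c
#include <stdint.h>

/* Hypothetical model of "rol reg16, count": bits leaving on the left
   re-enter on the right. */
uint16_t rol16(uint16_t value, unsigned count)
{
    count &= 15u;                       /* model assumption: count mod 16 */
    if (count == 0)
        return value;
    return (uint16_t)(((uint32_t)value << count) | (value >> (16 - count)));
}

/* Hypothetical model of "ror reg16, count": bits leaving on the right
   re-enter on the left. */
uint16_t ror16(uint16_t value, unsigned count)
{
    count &= 15u;
    if (count == 0)
        return value;
    return (uint16_t)((value >> count) | ((uint32_t)value << (16 - count)));
}
```

Unlike a logical shift, no bit is lost: rotating left and then right by the same count gives back the original value.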
Example Using Shifts
Say you want to count the number of bits that are
equal to 1 in register EAX
One easy way to do this is to use shifts
Shift 32 times
Each time the carry flag contains the last shifted bit
If the carry flag is 1, then increment a counter, otherwise
do not increment a counter
When you’re done the counter contains the number of 1’s
Let’s write this in x86 assembly
Example Using Shifts
; Counting 1 bits in EAX
mov bl, 0 ; bl is the number of 1 bits
mov cl, 32 ; cl is the loop counter
loop_start:
shl eax, 1 ; left shift
jnc not_one ; if carry != 1, jump to not_one
inc bl ; increment the number of 1 bits
not_one:
dec cl ; decrement the loop counter
jnz loop_start ; if more iterations goto loop_start
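The same counting loop can be written in C, with the bit that falls off the left end playing the role of the carry flag. This is a sketch; the function name is made up for illustration.

```c
#include <stdint.h>

/* Counts the 1 bits of x by shifting left 32 times and looking at the
   bit that each shift pushes out (what shl would leave in CF). */
unsigned count_one_bits(uint32_t x)
{
    unsigned count = 0;                     /* plays the role of bl */
    for (int i = 0; i < 32; i++) {          /* plays the role of cl */
        unsigned carry = (x >> 31) & 1u;    /* bit that shl moves into CF */
        x <<= 1;
        count += carry;                     /* "inc bl" only when CF = 1 */
    }
    return count;
}
```

As in the assembly version, the input value is consumed by the shifts: after 32 iterations x is zero.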
Boolean Bitwise Operations
There are assembly bitwise instructions for all standard
boolean operations: AND, OR, XOR, and NOT
Bits are computed individually
Examples:
1 0 1 1 0 0 AND 1 1 0 1 1 0 = 1 0 0 1 0 0
1 1 0 0 0 1 OR 0 1 1 0 1 1 = 1 1 1 0 1 1
1 1 0 0 0 1 XOR 0 1 1 0 1 1 = 1 0 1 0 1 0
NOT 1 1 0 0 0 1 = 0 0 1 1 1 0
Boolean Bitwise Instructions
mov ax, 0C123h
and ax, 82F6h ; ax = C123 AND 82F6 = 8022
or ax, 0E34Fh ; ax = 8022 OR E34F = E36F
xor ax, 36E9h ; ax = E36F XOR 36E9 = D586
not ax ; ax = NOT D586 = 2A79
The test Instruction
The test instruction performs an AND, but does not
store the result
It only sets the FLAG bits
Just like cmp does a subtraction but never stores its result
Note that all boolean bitwise instructions set the
FLAGS bits, except for not, which doesn’t
Example:
mov al, 0FFh
test al, 00h ; 0FFh AND 00h = 0, so ZF is set
jz foo ; branches
mov al, 0FFh
not al ; al = 00h, but not leaves the flags untouched
jz foo ; does not branch (assuming ZF was not already set)
Uses of Bitwise operations
Bitwise operations are very useful to modify individual bits
within data
This is done via “bit masks”, that is constant (immediate)
quantities with carefully chosen bits
Example:
Say you want to turn on bit 3 of a 2-byte value (counting from the
right, with bit 0 being the least significant bit)
An easy way to do this is to OR the value with
0000000000001000, which is 8 in decimal
Say the value is stored in ax
You can simply execute the command:
or ax, 8 ; turns on bit 3 in ax
Easy to generalize
To turn on bits: use OR (with appropriate 1’s in the bit mask)
To turn off bits: use AND (with appropriate 0’s in the bit mask)
To flip bits: use XOR (with appropriate 1’s in the bit mask)
Bit Mask Operations Examples
mov eax, 04F346BA2h
or ax, 0F000h ; turns on 4 leftmost bits of ax
; eax = 4F34FBA2
xor eax, 000400000h ; inverts bit 22 of EAX
; eax = 4F74FBA2
xor ax, 0FFFFh ; 1’s complement of ax
; eax = 4F74045D
Turning on a specific bit
Say you want to turn on a specific bit in some data,
but that you don’t know which one before you run
the program
the index of the bit to turn on is contained in a register
we need to build the bit mask “on the fly”
Assuming that the index of the bit is initially in bl,
and that we wish to turn on a bit in eax
mov cl, bl ; must have the bit index in cl
mov ebx, 1 ; create a number 0...01
shl ebx, cl ; shift it left cl times
or eax, ebx ; turn on the desired bit using
; ebx as a mask!
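The three mask recipes (OR to turn on, AND to turn off, XOR to flip) with a mask built at run time look like this in C. The helper names are hypothetical and the bit index is assumed to be in the range 0..31.

```c
#include <stdint.h>

/* Hypothetical helpers; index is assumed to be between 0 and 31. */

/* OR with a mask holding a single 1 turns that bit on. */
uint32_t turn_on_bit(uint32_t value, unsigned index)
{
    return value | ((uint32_t)1 << index);
}

/* AND with a mask holding a single 0 turns that bit off. */
uint32_t turn_off_bit(uint32_t value, unsigned index)
{
    return value & ~((uint32_t)1 << index);
}

/* XOR with a mask holding a single 1 flips that bit. */
uint32_t flip_bit(uint32_t value, unsigned index)
{
    return value ^ ((uint32_t)1 << index);
}
```

The expression (uint32_t)1 << index is the C counterpart of loading 1 into ebx and shifting it left by cl.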
An odd xor
One often sees the following instruction:
xor eax, eax ; eax = 0
This is a simple way to set eax to 0
It is useful because its machine code is smaller than that of,
for instance, mov eax, 0
Therefore one saves a few bytes in the text segment and while
the program runs a few bytes less will need to be loaded
from memory, saving perhaps a few cycles
Lesson: One could do everything with operations that look like
those of high-level languages, but the good assembly
programmer (and the good compiler) will use bit operations to
save memory and/or time
Branching & Loop
Instructions
Control Structures
So far we have seen instructions to
Move data back and forth between memory and registers
Do some data conversion
Perform arithmetic operation on that data
Now we’re going to learn about control structures, that is
instructions that modify the order in which instructions are executed
i.e., we do not necessarily execute the next instruction
High-level programming languages provide control structures
for loops, while loops, if-then-else statements, etc.
Assembly language provides much more basic control structures
Mostly it provides a goto!
A really infamous instruction, that causes horrendous “spaghetti code”
Luckily, high-level control structures can be cleanly translated into
assembly code
Therefore, one can write non-spaghetti assembly! (sort of)
Comparisons
Control structures essentially decide which
instruction should be executed next based on
comparisons of data items
In assembly, the result of a comparison is stored in
the bits of the FLAGS register
The basic comparison instruction is cmp
cmp subtracts one operand from another, and sets
the bits of FLAGS accordingly, but the result of the
subtraction is not stored anywhere
Other arithmetic instructions also set bits of FLAGS
(add, sub, mul, etc.)
Unsigned Integers
When you use unsigned integers the bits in the FLAGS
register (also called “flags”) that are important are:
ZF: The Zero Flag (set to 1 if result is 0)
CF: The Carry Flag
During an arithmetic operation, used to detect overflow or to do
clever arithmetic since it may denote a carry or a borrow
Consider: cmp a, b (which computes a-b)
If a = b: ZF is set, CF is not set
If a < b: ZF is not set, CF is set (borrow)
If you were computing the difference for real, this would mean an
error!
If a > b: ZF is not set, CF is not set
Therefore, by looking at ZF and CF you can determine the
result of the comparison!
We’ll see how we “look” at the flags shortly
Signed Integers
For signed integers you should care about
three flags:
ZF: zero flag
OF: overflow flag (set to 1 if the result overflows
or underflows)
SF: sign flag (set to 1 if the result is negative)
Consider: cmp a, b (which computes a-b)
If a = b: ZF is set, OF is not set, SF is not set
If a < b: ZF is not set, and SF ≠ OF
If a > b: ZF is not set, and SF = OF
Therefore, by looking at ZF, SF, and OF you can
determine the result of the comparison!
Signed Integers: SF and OF???
Why do we have this odd relationship between SF
and OF?
Consider two signed integers a and b, and
remember that we compute (a-b)
If a < b
If there is no overflow, then (a-b) is a negative number!
If there is overflow, then (a-b) is (erroneously) a positive
number
Therefore, in both cases SF ≠ OF
If a > b
If there is no overflow, the (correct) result is positive
If there is an overflow, the (incorrect) result is negative
Therefore, in both cases SF = OF
Signed Integers: SF and OF???
Example: a = 80h (-128d), b = 23h (+35d) (a < b)
a - b = a + (-b) = 80h + DDh = 15Dh
dropping the 1, we get 5Dh (+93d), which is erroneously positive!
So, SF=0 and OF=1
Example: a = F3h (-13d), b = 23h (+35d)(a < b)
a - b = a + (-b) = F3h + DDh = 1D0h; dropping the 1, we get D0h (-48d)
D0h is negative and we have no overflow (in range)
So, SF=1 and OF=0
Example: a = F3h (-13d), b = 82h (-126d) (a > b)
a - b = a + (-b) = F3h + 7Eh = 171h
dropping the 1, we get 71h (+113d), which is positive and we have no
overflow
So, SF=0 and OF=0
Example: a = 70h (112d), b = D8h (-40d)(a > b)
a - b = a + (-b) = 70h + 28h = 98h, which is erroneously negative
So, SF=1 and OF=1
In-Class Exercise
What are the ZF, CF, SF, and OF flags for
“cmp a,b” for the following values
a = 0F3h and b = 019h
a = 074h and b = 082h
a = 0A3h and b = 071h
In-Class Exercise
a = 0F3h and b = 019h
ZF = 0
CF? (thinking of numbers as unsigned)
a - b = 0F3h - 019h = something that’s still >0
CF=0
SF? (thinking of numbers as signed)
a + (-b) = F3h + E7h = 1DAh, drop the 1
DAh is negative
SF = 1
OF? (thinking of numbers as signed)
a is negative, b is positive, DA is negative, we’re good
OF = 0
In-Class Exercise
a = 074h and b = 082h
ZF = 0
CF? (thinking of numbers as unsigned)
a - b = 074h - 082h = something that’s <0
CF=1
SF? (thinking of numbers as signed)
a + (-b) = 74h + 7Eh = F2h
F2h is negative
SF = 1
OF? (thinking of numbers as signed)
a is positive, b is negative, F2 is erroneously negative
OF = 1
In-Class Exercise
a = 0A3h and b = 071h
ZF = 0
CF? (thinking of numbers as unsigned)
a - b = 0A3h - 71h = something that’s >0
CF=0
SF? (thinking of numbers as signed)
a + (-b) = A3h + 8Fh = 132h, drop the 1
32h is positive
SF = 0
OF? (thinking of numbers as signed)
a is negative, b is positive, 32h is erroneously positive
OF = 1
The FLAGS register
It is very important to remember that many
instructions change the bits of the FLAGS
register
So you should “act” on flag values
immediately, and not expect them to remain
unchanged inside FLAGS
or you can save them by-hand for later use
perhaps
Summary
cmp a,b          ZF  CF  OF  SF
unsigned  a=b    1   0
unsigned  a<b    0   1
unsigned  a>b    0   0
signed    a=b    1       0   0
signed    a<b    0       v   !v
signed    a>b    0       v   v
(v stands for either 0 or 1, and !v for its complement)
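The flag rules for cmp can be checked with a small C model on 1-byte operands. This is a sketch: the struct and function names are invented for illustration, and only the four flags discussed here are modeled.

```c
#include <stdint.h>

struct flags { unsigned zf, cf, sf, of; };

/* Hypothetical model of "cmp a, b" on 1-byte operands. */
struct flags cmp8(uint8_t a, uint8_t b)
{
    struct flags f;
    uint8_t result = (uint8_t)(a - b);    /* a - b, truncated to 8 bits */
    f.zf = (result == 0);                 /* result is zero */
    f.cf = (a < b);                       /* unsigned borrow */
    f.sf = (result >> 7) & 1u;            /* sign bit of the result */
    /* Signed overflow: a and b have different signs and the sign of
       the result differs from the sign of a. */
    f.of = (unsigned)(((a ^ b) & (a ^ result)) >> 7) & 1u;
    return f;
}
```

For instance, cmp8(0x74, 0x82) gives ZF=0, CF=1, SF=1, OF=1: as unsigned values 74h < 82h (CF set), while as signed values 116 > -126 (SF = OF).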
Branch Instructions
A “branch” is basically a “goto” that says:
instead of executing the next instruction, go
execute that other one
Two types of branches
Unconditional (often called a “jump”)
always branches
Conditional
branches only when some condition is true
The JMP Instruction
JMP allows you to “jump” to a code label
Example:
...
add eax, ebx
jmp here
sub al, bl ; these two instructions will
movsx ax, al ; never be executed!
here:
call print_int
...
The JMP Instruction
The ability to jump to a label in the assembly code is convenient
In machine code there is no such thing as a label: only addresses
So one would constantly have to compute addresses by hand
e.g., “jump to the instruction +4319 bytes from here in the source code”
e.g., “jump to the instruction -18 bytes from here in the source code”
This is what programmers way back when used to do by hand, using
signed displacements in bytes
The displacements are added to the EIP register (program counter)
There are three versions of the JMP instruction in machine code:
Short jump: Can only jump to an instruction that is within 128 bytes in
memory of the jump instruction (1-byte displacement)
Near jump: 4-byte displacement (any location in the code segment)
Far jump: very rare jump to another code segment
We won’t use this at all
The JMP Instruction
A short jump:
jmp label
or jmp short label
A near jump:
jmp near label
Why do we even have this?
Remember that instructions are encoded in binary
To jump one needs to encode the number of bytes to add/subtract to the
program counter
If this number is large, we need many bits to encode it
If this number is small, we want to use few bits so that our program
takes less space in memory
i.e., the encoding of a short jmp instruction takes fewer bits than the
encoding of a near jmp instruction (3 bytes less)
In a code that has 100,000 near jumps, if you can replace 50% of them
by short jumps, you save ~150KB (in the size of the executable)
Conditional Branches
There is a large set of conditional branch
instructions
The simple ones just branch (or not)
depending on the value of one of the flags:
ZF, OF, SF, CF, PF
PF: Parity Flag
Set to 0 if the number of bits set to 1 in the lower 8 bits
of the “result” is odd, to 1 otherwise
Simple Conditional Branches
JZ branches if ZF is set
JNZ branches if ZF is unset
JO branches if OF is set
JNO branches if OF is unset
JS branches if SF is set
JNS branches if SF is unset
JC branches if CF is set
JNC branches if CF is unset
JP branches if PF is set
JNP branches if PF is unset
Example
Consider the following C-like code
if (EAX == 0)
EBX = 1;
else
EBX = 2;
Here it is in x86 assembler
cmp eax, 0 ; do the comparison
jz thenblock ; if = 0, then goto thenblock
mov ebx, 2 ; else clause
jmp next ; jump over the then clause
thenblock:
mov ebx, 1 ; then clause
next:
Could use jnz and be the other way around
Another Example
Say we have the following C code (let us assume that EAX is
signed)
if (EAX >= 5)
EBX = 1;
else
EBX = 2;
This is much less straightforward
Let’s go back to our table for signed numbers
After executing cmp eax, 5 (signed comparison):
cmp a,b   ZF  OF  SF
a=b       1   0   0
a<b       0   v   !v
a>b       0   v   v
Therefore: if (OF = SF) then a >= b
Another Example
a>=b if (OF = SF)
Skeleton program
cmp eax, 5 ; comparison
???? ; testing relevant flags
thenblock:
mov ebx, 1 ; “then” block
jmp end
elseblock:
mov ebx, 2 ; “else” block
end:
Another Example
a>=b if (OF = SF)
Program:
cmp eax, 5 ; do the comparison
jo oset ; if OF = 1 goto oset
js elseblock ; (OF=0) and (SF = 1) goto elseblock
jmp thenblock ; (OF=0) and (SF=0) goto thenblock
oset:
jns elseblock ; (OF=1) and (SF = 0) goto elseblock
jmp thenblock ; (OF=1) and (SF=1) goto thenblock
thenblock:
mov ebx, 1
jmp end
elseblock:
mov ebx, 2
end:
Let’s check that it works
Another Example
cmp eax, 5 ; do the comparison
jo oset ; if OF = 1 goto oset
js elseblock ; (OF=0) and (SF = 1) goto elseblock
jmp thenblock ; (OF=0) and (SF=0) goto thenblock
oset:
jns elseblock ; (OF=1) and (SF = 0) goto elseblock
jmp thenblock ; (OF=1) and (SF=1) goto thenblock
; (unneeded instruction, we can just “fall through”)
thenblock:
mov ebx, 1
jmp end
elseblock:
mov ebx, 2
end:
A bit too hard?
One can play tricks by putting the else block
before the then block
The previous two examples are really
awkward, and it’s very easy to introduce bugs
Consequently, x86 assembly provides other
branch instructions to make our life much
easier :)
Let’s look at these instructions
More branches
After cmp x, y:
branches if   signed instruction   unsigned instruction
x = y         JE                   JE
x != y        JNE                  JNE
x < y         JL, JNGE             JB, JNAE
x <= y        JLE, JNG             JBE, JNA
x > y         JG, JNLE             JA, JNBE
x >= y        JGE, JNL             JAE, JNB
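Each of these branches is just a fixed test on the flags left by cmp. The C sketch below spells out four of them for 1-byte operands; all names are hypothetical and the flags are recomputed from the operands rather than read from a real FLAGS register.

```c
#include <stdint.h>

/* Flags that "cmp x, y" would produce on 1-byte operands (modeled). */
static unsigned flag_zf(uint8_t x, uint8_t y) { return x == y; }
static unsigned flag_cf(uint8_t x, uint8_t y) { return x < y; }
static unsigned flag_sf(uint8_t x, uint8_t y) { return ((uint8_t)(x - y) >> 7) & 1u; }
static unsigned flag_of(uint8_t x, uint8_t y)
{
    uint8_t r = (uint8_t)(x - y);
    return (unsigned)(((x ^ y) & (x ^ r)) >> 7) & 1u;
}

/* Signed branches look at SF, OF and ZF. */
int takes_jl(uint8_t x, uint8_t y) { return flag_sf(x, y) != flag_of(x, y); }
int takes_jg(uint8_t x, uint8_t y) { return !flag_zf(x, y) && flag_sf(x, y) == flag_of(x, y); }

/* Unsigned branches look at CF and ZF. */
int takes_jb(uint8_t x, uint8_t y) { return flag_cf(x, y) != 0; }
int takes_ja(uint8_t x, uint8_t y) { return !flag_cf(x, y) && !flag_zf(x, y); }
```

The same bytes can give opposite answers: with x = FEh and y = 01h, jl is taken (-2 < 1 as signed numbers) while ja is also taken (254 > 1 as unsigned numbers).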
Redoing our Example
if (EAX >= 5)
EBX = 1;
else
EBX = 2;
cmp eax, 5
jge thenblock
mov ebx, 2
jmp end
thenblock:
mov ebx, 1
end:
Translating high-level structures
We are used to using high-level structures
rather than just branches
Therefore, it’s useful to know how to translate
these structures in assembly, so that we can
just use the same patterns as when writing,
say, C code
A compiler does such translations
Let’s start with a high-level control structure
we just talked about: if-then-else
If-then-Else
A generic if-then-else construct:
if (condition) then
then_block
else
else_block;
Translation into x86 assembly:
; instructions to set flags (e.g., cmp ...)
jxx else_block ; select xx so that it branches if condition==false
; code for the then block
jmp endif
else_block:
; code for the else block
endif:
No Else?
A generic if-then construct (no else):
if (condition) then
then_block
Translation into x86 assembly:
; instructions to set flags (e.g., cmp ...)
jxx endif ; select xx so that it branches if condition==false
; code for the then block
endif:
For Loops
Let’s translate the following loop:
sum = 0;
for (i = 0; i <= 10; i++)
sum += i
Translation
mov eax, 0 ; eax is sum
mov ebx, 0 ; ebx is i
loop_start:
cmp ebx, 10 ; compare i and 10
jg loop_end ; if (i > 10) goto loop_end
add eax, ebx ; sum += i
inc ebx ; i++
jmp loop_start ; goto loop_start
loop_end:
The loop instruction
It turns out that, for convenience, the x86
assembly provides instructions to do loops!
The instruction is called loop
It is called as: loop <label>
and does
Decrement ecx (ecx has to be the loop index)
If (ecx != 0), branches to the label
Let’s try to do the loop in our previous
example
For Loops
Let’s translate the following loop:
sum = 0;
for (i = 0; i <= 10; i++)
sum += i
The x86 loop instruction requires that
The loop index be stored in ecx
The loop index be decremented
The loop exits when the loop index is equal to zero
Given this, we really have to think of this loop in reverse
sum = 0
for (i = 10; i > 0; i--)
sum += i
This loop is equivalent to the previous one, but now it can be
directly translated to assembly using the loop instruction
Using the loop Instruction
Here is our “reversed” loop
sum = 0
for (i = 10; i > 0; i--)
sum += i
And the translation
mov eax, 0 ; eax is sum
mov ecx, 10 ; ecx is i
loop_start:
add eax, ecx ; sum += i
loop loop_start ; if i > 0, go to loop_start
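In C terms, a body followed by the loop instruction behaves like a do-while that decrements the counter and tests it against zero. A minimal sketch, with a hypothetical function name and the assumption that the starting count is at least 1:

```c
#include <stdint.h>

/* Mirrors "add eax, ecx / loop loop_start": the body runs first, then the
   counter is decremented and tested. Assumes n >= 1: starting from 0, the
   counter would wrap around instead of skipping the loop. */
uint32_t sum_down_from(uint32_t n)
{
    uint32_t sum = 0;       /* eax */
    uint32_t i = n;         /* ecx */
    do {
        sum += i;           /* add eax, ecx */
    } while (--i != 0);     /* loop loop_start */
    return sum;
}
```

Calling sum_down_from(10) adds 10 + 9 + ... + 1, the same total as the original counting-up loop since adding i = 0 contributes nothing.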
While Loops
A generic while loop
while (condition) {
body
}
Translated as:
while:
; instructions to set flags (e.g., cmp...)
jxx end_while ; branches if condition=false
; body of loop
jmp while
end_while:
Do While Loops
A generic do while loop
do {
body
} while (condition)
Translated as:
do:
; body of loop
; instructions to set flags (e.g., cmp...)
jxx do ; branches if condition=true
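C still has goto, so the while pattern can be written in C in exactly the shape of the assembly translation. This is an illustrative sketch with invented names, not how one would normally write C; it assumes n is small enough that i never wraps around.

```c
/* while (i <= n) { sum += i; i++; } written with explicit jumps,
   following the assembly pattern for while loops. */
unsigned sum_with_gotos(unsigned n)
{
    unsigned sum = 0;
    unsigned i = 0;
while_start:
    if (i > n)                /* jxx end_while: branch if the condition is false */
        goto end_while;
    sum += i;                 /* body of the loop */
    i++;
    goto while_start;         /* jmp while */
end_while:
    return sum;
}
```

The test is inverted (i > n rather than i <= n) because the branch has to leave the loop when the condition no longer holds.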
Find average of two numbers
.model small
.stack 100
.data
No1 DB 63H ; First number storage
No2 DB 2EH ; Second number storage
Avg DB ? ; Average of two numbers
.code
START:
MOV AX,@data ; [ Initialises
MOV DS,AX ; data segment ]
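MOV AH,00H ; Clear AH so it can receive the carry
MOV AL,No1 ; Get first number in AL
ADD AL,No2 ; Add second to it
ADC AH,00H ; Put carry in AH
SHR AX,1 ; Divide the unsigned sum by 2
MOV Avg,AL ; Copy result to memory
MOV AH,4CH ; [ Return
INT 21H ; to DOS ]
END START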
Find sum of numbers in the array
.model small
.data
ARRAY DB 12H,24H,26H,63H,25H,86H,2FH,33H,10H,35H
SUM DW 0
.code
START:
MOV AX,@data ; [ Initialise
MOV DS,AX ; data segment ]
MOV CL,10 ; Initialise counter
XOR DI,DI ; Initialise pointer
LEA BX,ARRAY ; Initialise array base pointer
BACK:
MOV AL,[BX+DI] ; Get the number
MOV AH,00H ; Make higher byte 00h
ADD SUM,AX ; SUM = SUM + number
INC DI ; Increment pointer
DEC CL ; Decrement counter
JNZ BACK ; if not 0 go to back
MOV AH,4CH
INT 21H
END START
Find maximum number in the array
.model small
.stack 100
.data
ARRAY DB 63H,32H,45H,75H,12H,42H,09H,14H,56H,38H ; Array of ten numbers
MAX DB 0 ; Maximum number
.code
START:
MOV AX,@data ; [ Initialises
MOV DS,AX ; data segment ]
XOR DI,DI ; Initialise pointer
MOV CL,10 ; Initialise counter
LEA BX,ARRAY ; Initialise base pointer for array
MOV AL,MAX ; Get maximum number
BACK: CMP AL,[BX+DI] ; Compare number with maximum
JNC SKIP ; jump if no carry, if CF is 0 ( CF is set if there is a borrow )
MOV DL,[BX+DI] ; [ If number > MAX
MOV AL,DL ; MAX = number ]
SKIP: INC DI ; Increment pointer
DEC CL ; Decrement counter
JNZ BACK ; IF count = 0 stop; otherwise go BACK
MOV MAX,AL ; Store maximum number
END START
Separate even and odd numbers in array
.model small
.STACK 100
.data
ARRAY DB 12H,23H,26H,63H,25H,86H,2FH,33H,10H,35H
ARR_ODD DB 10 DUP (?)
ARR_EVEN DB 10 DUP (?)
.code
START:
MOV AX,@data ; [ Initialise
MOV DS,AX ; data segment ]
MOV CL,10 ; Initialise counter
XOR DI,DI ; Initialise odd_pointer
XOR SI,SI ; Initialise even_pointer
LEA BP,ARRAY ; Initialise array base_pointer
BACK:
MOV AL,DS:[BP] ; Get the number
TEST AL,01H ; Check the LSB without changing AL
JZ NEXT ; If LSB = 0 the number is even, go to next
LEA BX,ARR_ODD ; [ Otherwise initialise pointer to odd array
MOV [BX+DI],AL ; and save number in odd array ]
INC DI ; Increment odd_pointer
JMP SKIP
NEXT:
LEA BX,ARR_EVEN ; [ Initialise pointer to even array
MOV [BX+SI],AL ; and save number in even array ]
INC SI ; Increment even_pointer
SKIP:
INC BP ; Increment array base_pointer
DEC CL ; Decrement counter
JNZ BACK ; if not 0 go to back
END START
Computing Prime Numbers
The book has an example of an assembly
program that computes prime numbers
Let’s look at it in detail
Principle:
Try possible prime numbers in increasing order
starting at 5
Skip even numbers
Test whether the possible prime number (the
“guess”) is divisible by any number other than 1
and itself
If yes, then it’s not a prime, otherwise, it is
Computing Primes in C
unsigned int guess;
unsigned int factor;
unsigned int limit;
printf("Find primes up to: ");
scanf("%u",&limit);
printf("2\n3\n"); // prints the first 2 obvious primes
guess = 5; // we start the guess at 5
while (guess <= limit) {
factor = 3; // look for a possible factor
// we only look at factors < sqrt(guess)
while ( factor*factor < guess && guess % factor != 0 )
factor += 2;
if ( guess % factor != 0 ) // we never found a factor
printf("%u\n",guess);
guess += 2; // skip even numbers
}
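The listing above is a fragment. As a sketch, the same search can be wrapped in a self-contained function that stores the primes instead of printing them. The function name, the output array and its capacity parameter are assumptions made for this illustration; unlike the fragment, it only reports 2 and 3 when they are within the limit, and it assumes a limit small enough that factor*factor and guess + 2 do not overflow.

```c
#include <stddef.h>

/* Hypothetical wrapper around the prime search: stores the primes <= limit
   in out (capacity cap) and returns how many were stored. */
size_t primes_up_to(unsigned limit, unsigned *out, size_t cap)
{
    size_t n = 0;
    if (limit >= 2 && n < cap) out[n++] = 2;    /* the two obvious primes */
    if (limit >= 3 && n < cap) out[n++] = 3;
    for (unsigned guess = 5; guess <= limit && n < cap; guess += 2) {
        unsigned factor = 3;                    /* look for a possible factor */
        while (factor * factor < guess && guess % factor != 0)
            factor += 2;
        if (guess % factor != 0)                /* we never found a factor */
            out[n++] = guess;
    }
    return n;
}
```

Note how the final test guess % factor != 0 also rejects perfect squares such as 9 and 25, for which the inner loop stops because factor*factor is no longer smaller than guess.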
Computing Primes in Assembly
// bss segment
unsigned int guess;
unsigned int factor;
unsigned int limit;
// data segment (message) and easy text segment
printf("Find primes up to: ");
scanf("%u",&limit);
printf("2\n3\n"); // prints the first 2 obvious primes
guess = 5; // we start the guess at 5
// more difficult text segment
while (guess <= limit) {
factor = 3; // look for a possible factor
// we only look at factors < sqrt(guess)
while ( factor*factor < guess && guess % factor != 0 )
factor += 2;
if ( guess % factor != 0 ) // we never found a factor
printf("%u\n",guess);
guess += 2; // skip even numbers
}
Computing Primes in Assembly
// bss segment
unsigned int guess;
unsigned int factor;
unsigned int limit;
// data segment (message) and easy text segment
printf("Find primes up to: ");
scanf("%u",&limit);
printf("2\n3\n"); // prints the first 2 obvious primes
guess = 5; // we start the guess at 5
%include "asm_io.inc"
segment .data
Message db "Find primes up to: ", 0
segment .bss
Limit resd 1 ; 4-byte int
Guess resd 1 ; 4-byte int
segment .text
global asm_main
asm_main:
enter 0, 0
pusha
mov eax, Message ; print the message
call print_string
call read_int ; read Limit
mov [Limit], eax
mov eax, 2 ; print "2\n"
call print_int
call print_nl
mov eax, 3 ; print "3\n"
call print_int
call print_nl
mov dword [Guess], 5 ; Guess = 5
Computing Primes in Assembly
while (guess <= limit) {
...
}
The numbers are unsigned, hence the unsigned branch jnbe
while_limit:
mov eax, [Guess]
cmp eax, [Limit] ; compare Guess and Limit
jnbe end_while_limit ; If !(Guess <= Limit) Goto end_while_limit
... ; body of the loop goes here
jmp while_limit
end_while_limit:
popa ; clean up
mov eax, 0 ; clean up
leave ; clean up
ret ; clean up
Computing Primes in Assembly
factor = 3; // look for a possible factor
// we only look at factors < sqrt(guess)
while ( factor*factor < guess && guess % factor != 0 )
factor += 2;
if ( guess % factor != 0 ) // we never found a factor
printf("%u\n",guess);
guess += 2; // skip even numbers
mov ebx, 3 ; ebx is factor
while_factor:
mov eax, ebx ; eax = factor
mul eax ; edx:eax = factor * factor
cmp edx, 0 ; if edx != 0, then we’re too big
jne end_while_factor ; factor too big
cmp eax, [Guess] ; compare factor*factor and guess
jnb end_while_factor ; if !< goto end_while_factor (factor too big)
mov edx, 0 ; edx = 0 (don’t forget to initialize edx)
mov eax, [Guess] ; eax = [Guess]
div ebx ; divide edx:eax by factor
cmp edx, 0 ; compare the remainder with 0
je end_while_factor ; if == 0 goto end_while_factor
add ebx, 2 ; factor += 2
jmp while_factor ; loop back
end_while_factor:
mov edx, 0 ; recompute guess % factor
mov eax, [Guess]
div ebx
cmp edx, 0 ; if ( guess % factor != 0 )
je endif ; a factor was found: not a prime
mov eax, [Guess] ; print guess
call print_int ; print guess
call print_nl ; print guess
endif:
add dword [Guess], 2 ; guess += 2
We don’t choose eax for factor because eax is used by a lot of
functions/routines
Stacks
Why Stack?
There are several reasons why we need
stacks:
To save register values if we ran out of
registers.
To pass parameters to subroutines
To make space for local variables in
subroutines
To preserve original register values if we
change them in a subroutine
To fetch processor flag status
Stack operations
last in first out (LIFO)
Stack operations are mainly done by two
instructions: push and pop.
The instruction push puts a value
onto the stack, while pop takes it back off.
The syntax is like this:
push x
pop x
Usually the operand x is a 16-bit
register.
You cannot push an 8-bit register by itself: push the
16-bit register that contains it (e.g., push ax to save al),
since the processor always pushes a 16-bit value.
Memory Layout
Register CS by default points to the segment where
the code resides. DS points to the
data segment. ES usually points to the
data segment too. SS points to the stack
segment. In "tiny" mode, CS, DS, ES, and SS all
point to the same segment, which means
code, data, and stack reside in the
same region.
How can we manage this?
The stack is addressed not only through the SS register but
also through the SP register.
So, the pair SS:SP points to the top of the stack.
Initially, SP is set to the very bottom of the
segment in "tiny" mode, at address 0FFFEh (not
0FFFFh, that's the bottom end of the segment).
Each time we push something onto the stack, SP
is decremented by 2. If we pop
something, SP is incremented by 2.
Our code and our data, on the other hand, start at offset
100h. So code and data sit at the low end of the segment,
and the stack grows down from the high end.
Application
Other Uses
Can we push a constant? On the 8086, NO. On the 80286 or
above, YES. So push 1 is treated
as pushing a 16-bit value. No need to specify word ptr
and so on.
A more useful application of push and pop is to push the
flags and then pop them into a register. That way, we can
examine the flag contents directly. Look at the
following code:
pushf
pop ax
Now we can examine the flag values in register
AX. The net effect is the same as assigning the
flags to AX. Of course, the mov instruction cannot
do this.
Likewise, you can set the flag values using push ax
then popf.
Subroutines
Subprograms
Subprograms (functions, procedures,
methods) are key to making programs easier
to read and write (code reuse)
We are going to see how to define and call
subprograms in assembly
Useful to write large(r) assembly programs
More importantly, will allow us to understand how
subprograms work in higher-level languages
What is a subprogram?
A subprogram is a piece of code that starts at
some address in the text segment
The program can jump to that address to
“call” the subprogram
When the subprogram is done executing it
jumps back to the instruction after the call
The subprogram can take parameters
Let’s see how we can implement this using
only what we’ve seen so far in the course
Example Subprogram
Say we want to write a subprogram that computes
some numerical function of two operands and
“returns” the result
e.g., because we need to compute that function often
We will write the program so that when it is called,
the first operand is in eax and the second in ebx,
and when it returns the result is in eax
This is a convention that we make, and that should be
documented in the code
Calling the program can then be done via a simple
jmp
Let’s look at the code
“By hand” subprogram
...
mov eax, 12 ; first operand = 12
mov ebx, 14 ; second operand = 14
jmp func ; “call” the function
ret:
...
...
func:
add eax, ebx ; do something with eax and ebx
; put result in eax
jmp ret ; “return” to the instruction
; after the call
Why isn’t this really a valid implementation of a subprogram?
Multiple Calls?
Typically we want to call a function from multiple
places in a program
The problem with the previous code is that the
function always returns to a single label!
...
jmp func ; “call” the function
ret1:
...
jmp func ; “call” the function
ret2:
...
func:
...
jmp ??? ; where do we return???
A Better Function Call
To fix our previous example, we simply need
to remember the place where the function
should return!
This can be done by storing the address of
the instruction after the call in a register, say,
register ecx
The code for the function then can just return
to whatever instruction ecx points to
Again, this is a convention that we decide as a
programmer and that we must remember
A Better Function Call
...
mov ecx, ret1 ; store the return address
jmp func ; “call” the function
ret1:
...
mov ecx, ret2 ; store the return address
jmp func ; “call” the function
ret2:
...
func:
...
jmp ecx ; return
All Good, but ...
So at this point, we can do any function call
We just need to decide on convention about which registers
hold
input parameters
return value
return address
The problem is that this gets very cumbersome
It requires a bunch of “ret” labels
the return address can be computed numerically as “$ + x”, where x
is the number of bytes from the current instruction to the instruction
after the “jmp func” instruction, which is very awkward
It forces the programmer to constantly keep track of registers
and be careful to save and restore important values
Solution:
A stack
Two new instructions: CALL and RET
The Stack
A stack is a Last-In-First-Out data structure
Provides two operations
Push: puts something on the stack
Pop: removes something from the stack
Defined by the address of the “element” at the top of the stack
Push: puts the element on top of the stack and moves the stack
pointer (on the x86, ESP decreases)
Pop: gets the element from the top of the stack and moves the
stack pointer back (on the x86, ESP increases)
Our stack only allows pushing/popping of elements that are double
words (4-byte elements)
Not “quite” true, but a much safer approach
The Stack and the ESP Register
Initially the stack is empty and the ESP register has some value
Pushing an element:
Decrease ESP by 4
Write 4 bytes at address ESP
Examples:
push eax
push dword 42
Popping an element:
Get the value from the top of the stack into a register
Increase ESP by 4
Examples:
pop eax
pop ebx
Accessing an element:
Read the 4 bytes at address ESP
Example:
mov eax, [esp]
Example Stack Instructions
Assuming that ESP=00001000h
push dword 1 ; ESP = 00000FFCh
push dword 2 ; ESP = 00000FF8h
push dword 3 ; ESP = 00000FF4h
pop eax ; EAX = 3
pop ebx ; EBX = 2
pop ecx ; ECX = 1
Memory after the three pushes (each value is stored little endian;
addresses listed from highest to lowest):
00001000h
00000FFFh 0
00000FFEh 0
00000FFDh 0
00000FFCh 1
00000FFBh 0
00000FFAh 0
00000FF9h 0
00000FF8h 2
00000FF7h 0
00000FF6h 0
00000FF5h 0
00000FF4h 3
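The push and pop steps can be simulated in C with a byte array standing in for memory. This is a sketch under assumptions: the struct machine type, the 4 KB memory size and the function names are all invented for the illustration, and there is no check for stack overflow or underflow.

```c
#include <stdint.h>

/* Hypothetical model: 4 KB of memory and a stack pointer. The stack
   grows toward lower addresses, as on the x86. */
struct machine {
    uint8_t  memory[0x1000];
    uint32_t esp;
};

void push_dword(struct machine *m, uint32_t value)
{
    m->esp -= 4;                                  /* decrease ESP by 4 */
    for (int i = 0; i < 4; i++)                   /* write 4 bytes, little endian */
        m->memory[m->esp + i] = (uint8_t)(value >> (8 * i));
}

uint32_t pop_dword(struct machine *m)
{
    uint32_t value = 0;
    for (int i = 0; i < 4; i++)                   /* read the 4 bytes at ESP */
        value |= (uint32_t)m->memory[m->esp + i] << (8 * i);
    m->esp += 4;                                  /* increase ESP by 4 */
    return value;
}
```

Starting with esp = 1000h and pushing 1, 2, 3 leaves esp at 0FF4h; three pops then return 3, 2, 1 and bring esp back to 1000h.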
The ESP Register
The ESP register always contains the
address of the element at the top of the stack
Do not use it for anything else!
Its value is typically updated by calls to push
and pop
Sometimes we’ll update it by hand
See this in a few slides
PUSHA and POPA
One use of the stack is to save/restore register values
For instance, say your program uses eax and calls a function written
by somebody else
You have no idea (or don’t care to know) whether that function uses
eax also
If it does, your eax will be corrupted
One easy solution:
push eax onto the stack
call the function
pop eax to restore its value
The x86 offers two convenient instructions
PUSHA: pushes EAX, ECX, EDX, EBX, the original ESP, EBP, ESI, and EDI
onto the stack
POPA: restores them (the saved ESP value is discarded) and pops the stack
It’s now simple to say “save all my registers” and “restore my
registers”
The CALL and RET Instructions
One of the annoying things with our previous
subprogram was that we had to manage the return
address
In our example we stored it into the ECX register
Two convenient instructions can do this for us
CALL:
Puts the address of the next instruction on the stack
Unconditionally jumps to a label (calling a function)
RET:
Pops the stack and gets the return address
Unconditionally jumps to that address (returning from a
function)
Without CALL and RET
...
mov ecx, ret1 ; store the return address
jmp func ; “call” the function
ret1:
...
mov ecx, ret2 ; store the return address
jmp func ; “call” the function
ret2:
...
func:
...
jmp ecx ; return
With CALL and RET
...
call func ; call the function
...
call func ; call the function
...
func:
...
ret ; return
Nested Calls
The use of the stack enables nested calls
Return addresses are popped in the reverse order in which
they were pushed (Last-In-First-Out)
Warning: one must be extremely careful to pop
everything that’s pushed on the stack inside a
function
Example of erroneous use of the stack:
func:
mov eax, 12 ;
push eax ; put eax on the stack
ret ; pop eax and interpret
; it as a return address!!
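For contrast, a corrected version might look like the sketch below (NASM syntax, 32-bit mode assumed). The only change is that the pushed value is popped before RET, so the return address is back on top of the stack when RET executes.

```nasm
func:
        mov     eax, 12
        push    eax             ; put EAX on the stack
        ; ... work that may overwrite EAX ...
        pop     eax             ; remove what we pushed: the return address
                                ; is on top of the stack again
        ret                     ; pops the return address and jumps to it
```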
Activation Records
The stack is useful to store and retrieve return
addresses, transparently managed via the CALL and
RET instructions
But it’s much more useful than this
In general, when calling a function, one puts all kinds of
useful information on the stack
When the function returns, this information is popped off
the stack and the function’s caller can safely resume
execution
The set of “useful information” is typically called an
activation record (or a “stack frame”)
One very important component of an activation record is
the parameters passed to the function
Another is the return address, as we’ve already seen
Subprogram Conventions
Typically, one uses a consistent calling convention, so that there is a
generic way to call a subprogram
Of course compilers use calling conventions
The compiler, when generating assembly code, must follow a standard
process to generate assembly corresponding to function calls and
returns
Some languages specify which calling convention should be used
What we describe in all that follows is mostly the convention used
by the C language
i.e., C compilers should use this convention when generating assembly
code from C code
A Simple Activation Record
To call a function you have to follow the following steps:
Push the parameters onto the stack
Execute the CALL instruction, which pushes the return address
onto the stack
Warning: In the C calling convention parameters are
pushed onto the stack in reverse order!
Say the function is f(a,b,c)
c is pushed onto the stack first
b is pushed onto the stack second
a is pushed onto the stack third
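A call to f(a,b,c) under this convention could be sketched as follows (NASM syntax, 32-bit mode assumed). The variables var_a, var_b, var_c and the external function f are hypothetical names used only for illustration.

```nasm
section .data
var_a   dd      10              ; hypothetical value for a
var_b   dd      20              ; hypothetical value for b
var_c   dd      30              ; hypothetical value for c

section .text
extern  f                       ; hypothetical C-convention function f(a, b, c)

call_f:
        push    dword [var_c]   ; c is pushed first
        push    dword [var_b]   ; b is pushed second
        push    dword [var_a]   ; a is pushed third, so it ends up nearest the top
        call    f               ; pushes the return address and jumps to f
        add     esp, 12         ; remove the three 4-byte parameters
        ret
```

On entry to f, the return address is at [ESP], a at [ESP+4], b at [ESP+8], and c at [ESP+12], so the first parameter is always at the same offset regardless of how many parameters follow.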
A Simple Activation Record
Say you want to call a function with 2 32-bit parameters
If parameters are < 32 bits, they need to be converted to 32-bit
values
After the call, the stack looks like this:
ESP+8 2nd parameter
ESP+4 1st parameter
ESP return address
These three entries form the activation record; the stack grows downward, toward lower addresses
ESP and EBP
There is one problem with referencing parameters
using ESP, as in [ESP+8]
If the subprogram uses the stack for something else,
ESP will be modified!
So at some point in the program, the 2nd parameter
should be accessed as [ESP+8]
And at some other point, it may be accessed as [ESP+12],
[ESP+16], etc., depending on how the stack grows
So the convention is to use the EBP register to save
the value of ESP as soon as the subprogram starts
Afterwards, the 2nd parameter is always accessed
as [EBP+8] and the 1st parameter is always
accessed as [EBP+4]
ESP and EBP
Stack as it is when the subprogram begins:
ESP+8 2nd parameter
ESP+4 1st parameter
ESP return address
After setting EBP = ESP:
EBP+8 2nd parameter
EBP+4 1st parameter
EBP = ESP return address
After further use of the stack:
ESP+16 EBP+8 2nd parameter
ESP+12 EBP+4 1st parameter
ESP+8 EBP return address
ESP+4 stuff
ESP stuff
Parameters are still referred to as [EBP+4] and [EBP+8]
ESP and EBP
So far so good, but the caller may have been using EBP!
Typically to access its own parameters
So the convention is to first save the value of EBP onto the
stack and then set EBP = ESP, as soon as the subprogram starts
So, the stack right before the subprogram body truly begins is:
ESP+12 2nd parameter
ESP+8 1st parameter
ESP+4 return address
EBP = ESP old value of EBP
Parameter accesses:
1st parameter: [EBP+8]
2nd parameter: [EBP+12]
At the end of the subprogram, the value of EBP is popped
and restored with a simple POP instruction
Subprogram Skeleton
func:
push ebp ; save original EBP
mov ebp, esp ; set EBP = ESP
... ; subprogram code
pop ebp ; restore original EBP
ret ; returns
Returning from a Subprogram
After the subprogram returns, one must “clean up” the stack
The stack has on it:
The return address
The parameters
The old EBP value
The old EBP value must be popped in the subprogram (at the end)
The return address is removed by the RET instruction
You don’t see the POP, but it’s there
So the only thing that must be removed from the stack are the
parameters
The C convention specifies that the caller code must do this
Other languages specify that the callee must do it
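In fact, having the subprogram (i.e., the callee) do it is a little bit
more compact: the clean-up code appears once, in the subprogram, instead of after every call
So one may wonder why C opts for the other approach
It’s all because of varargs: when a function accepts a variable number of
arguments, only the caller knows how many parameters it pushed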
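The two clean-up styles can be compared in the sketch below (NASM syntax, 32-bit mode assumed). The function names are hypothetical. The callee-cleans variant relies on the `ret imm16` form of RET, which pops the return address and then adds the immediate to ESP.

```nasm
section .text

caller:
        ; C convention: the caller removes the parameters
        push    dword 2
        push    dword 1
        call    func_caller_cleans
        add     esp, 8          ; 2 parameters * 4 bytes

        ; Alternative: the callee removes the parameters
        push    dword 2
        push    dword 1
        call    func_callee_cleans   ; nothing to clean up afterwards
        ret

func_caller_cleans:
        push    ebp
        mov     ebp, esp
        mov     eax, [ebp+8]    ; 1st parameter
        add     eax, [ebp+12]   ; 2nd parameter
        pop     ebp
        ret                     ; pops only the return address

func_callee_cleans:
        push    ebp
        mov     ebp, esp
        mov     eax, [ebp+8]    ; 1st parameter
        add     eax, [ebp+12]   ; 2nd parameter
        pop     ebp
        ret     8               ; pops the return address, then adds 8 to ESP
```

The constant in `ret 8` is fixed when the subprogram is assembled, which is why this style does not fit functions that take a variable number of arguments.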
Using the Parameters
Inside the code of the subprogram, parameters can
be simply accessed via indirection from the stack
pointer
In our previous example, at subprogram entry (before EBP or anything else is pushed):
mov eax, [ESP + 4] ; put 1st parameter into eax
mov ebx, [ESP + 8] ; put 2nd parameter into ebx
Typically the subprogram does not pop the
parameters off the stack before using them
It would be annoying to have to pop the return address
first, and then push it back
It’s convenient to have the parameters always stored in
memory as opposed to being careful to constantly
preserve them in registers
Example: Calling a Subprogram
Caller:
push dword 2 ; second parameter
push dword 1 ; first parameter
call func ; call the function
add esp, 8 ; pop the two arguments
Note that to pop the two arguments we merely add 8 to the
stack pointer ESP
Since we do not care to get the values of the arguments at this
point, it’s quicker than to call pop twice!
For the case with one argument, calling pop may be better
The two arguments stay there in memory but will be
overwritten next time a function is called or next time the stack
is used
Return Values?
Often, one wants a subprogram to return a value
e.g., a function that computes some number
There are several ways to do this
One way is to pass as a parameter the address of a zone of
memory in which some result should be written
As in: void foo(int *x); foo(&a);
This is not a true return value
The other way is a true return value
As in: int foo();
The C convention is that a 32-bit integer or pointer return value is stored in
EAX when the function returns
It’s the responsibility of the caller to save the EAX value before
the call (if needed) and to restore it later
In some of our previous examples, we just didn’t use EAX to hold
anything important so that this issue never arose
e.g., when calling read_int(), read_char(), etc.
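Both ways of returning a result can be sketched as follows (NASM syntax, 32-bit mode assumed). The functions add_two and store_sum are hypothetical examples, and EDX is used as a scratch register on the assumption that the caller does not need its value preserved.

```nasm
section .text

; int add_two(int x, int y): true return value, delivered in EAX
add_two:
        push    ebp
        mov     ebp, esp
        mov     eax, [ebp+8]    ; x
        add     eax, [ebp+12]   ; x + y; EAX now holds the return value
        pop     ebp
        ret

; void store_sum(int x, int y, int *result): writes through a pointer
store_sum:
        push    ebp
        mov     ebp, esp
        mov     eax, [ebp+8]    ; x
        add     eax, [ebp+12]   ; x + y
        mov     edx, [ebp+16]   ; address passed by the caller (scratch use of EDX)
        mov     [edx], eax      ; *result = x + y
        pop     ebp
        ret
```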
Saving Registers in Subprograms
Just saving EBP
func:
push ebp ; save original EBP
mov ebp, esp ; set EBP = ESP
... ; subprogram code
mov eax, ... ; set return value
pop ebp ; restore original EBP
ret ; returns
Saving Registers in Subprograms
Saving EBX and ECX in addition to EBP
func:
push ebp ; save original EBP
mov ebp, esp ; set EBP = ESP
push ebx ; save EBX
push ecx ; save ECX
... ; subprogram code
mov eax, ... ; set return value
pop ecx ; restore ECX
pop ebx ; restore EBX
pop ebp ; restore ebp
ret ; returns
Saving Registers in Subprograms
Saving “all” registers using PUSHA and POPA
func:
push ebp ; save original EBP
mov ebp, esp ; set EBP = ESP
pusha ; save all (including new EBP)
... ; subprogram code
mov eax, ... ; set return value
popa ; restore all (including new EBP)
pop ebp ; restore original ebp
ret ; returns
Problem? POPA overwrites the return value that’s stored in eax!
Local Variables in Subprograms
In all the examples we have seen so far, the subprograms
were able to do their work using only registers
But sometimes, a subprogram’s needs are beyond the set of
available registers and some data must be kept in memory
Just think of all subprograms you wrote that used more than 6
local variables (EAX, EBX, ECX, EDX, ESI, EDI)
One possibility could be to declare a small .bss segment for
each subprogram, to reserve memory space for all local
variables
Drawback #1: memory waste
This reserved memory consumes memory space for the entire
duration of the execution even if the subprogram is only active
for a tiny fraction of the execution time
Drawback #2: subprograms are not reentrant
A recursive call would overwrite the variables of the invocation that is still in progress
Local variables on the stack
Since activation records on the stack are used to
store relevant information pertaining to a
subprogram, why not use it for storing the
subprogram local variables?
The standard approach is to store local variables
right after the saved EBP value on the stack
This is simply done by subtracting some amount from
ESP
The local variables are then accessed as [EBP - 4],
[EBP - 8], etc.
Let’s see this on an example
Local Variable Examples
Say we have a subprogram that takes 2 parameters, uses 3
local variables, and doesn’t return any value
The code of the subprogram is as follows:
func:
push ebp ; save old EBP value
mov ebp, esp ; set EBP
sub esp, 12 ; reserve space for 3 local variables (3 * 4 bytes)
; subprogram body
mov esp, ebp ; deallocate local variables
pop ebp ; restore old EBP value
ret
Let’s look at the content of the stack when the subprogram
body begins
Local Variables Example
Stack layout inside the body of the subprogram:
EBP+12 2nd parameter
EBP+8 1st parameter
EBP+4 return address
EBP saved EBP
EBP-4 1st local var
EBP-8 2nd local var
EBP-12 3rd local var
Inside the body of the subprogram, parameters are referenced as:
[EBP+12]: 2nd parameter
[EBP+8]: 1st parameter
Inside the body of the subprogram, local variables are referenced as:
[EBP-4]: 1st local variable
[EBP-8]: 2nd local variable
[EBP-12]: 3rd local variable
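To make the offsets concrete, here is a sketch of a subprogram body that uses all three local slots (NASM syntax, 32-bit mode assumed). What the body computes is invented for illustration: it copies the two parameters into the first two locals and stores their sum in the third.

```nasm
func:
        push    ebp             ; save old EBP value
        mov     ebp, esp        ; set EBP
        sub     esp, 12         ; reserve 3 dwords for local variables

        mov     eax, [ebp+8]    ; 1st parameter
        mov     [ebp-4], eax    ; 1st local = 1st parameter
        mov     eax, [ebp+12]   ; 2nd parameter
        mov     [ebp-8], eax    ; 2nd local = 2nd parameter
        mov     eax, [ebp-4]
        add     eax, [ebp-8]
        mov     [ebp-12], eax   ; 3rd local = 1st local + 2nd local

        mov     esp, ebp        ; deallocate local variables
        pop     ebp             ; restore old EBP value
        ret
```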
Functions example
Let's make a subroutine to calculate
1+2+...+n.
Document a subroutine
It is a good habit to document a
subroutine. At least give a comment
above it.
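One possible version of such a subroutine, with a documenting comment above it, is sketched below (NASM syntax, 32-bit mode, using the stack-based calling convention described earlier). The name sum_to_n and the choice to treat n as unsigned are assumptions made for this example.

```nasm
; sum_to_n: computes 1 + 2 + ... + n
;   Input:   n, an unsigned 32-bit parameter on the stack ([EBP+8])
;   Output:  EAX = the sum (0 if n is 0)
;   Changes: EAX only; ECX is saved and restored
sum_to_n:
        push    ebp
        mov     ebp, esp
        push    ecx             ; save ECX
        mov     ecx, [ebp+8]    ; ECX = n (used as a down-counter)
        xor     eax, eax        ; sum = 0
.next:
        test    ecx, ecx
        jz      .done           ; stop when the counter reaches 0
        add     eax, ecx        ; sum += counter
        dec     ecx
        jmp     .next
.done:
        pop     ecx             ; restore ECX
        pop     ebp
        ret
```

A caller would use it as `push dword 10`, `call sum_to_n`, `add esp, 4`, after which EAX holds 55.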
Routine Placement
Conclusion
When programming one always faces trade-offs between
program readability and program performance
Choices must be made based on the task at hand
With by-hand assembly programming, the programmer can
make fine-tuned decisions for these trade-offs
e.g., for a particular function I decide to not save all registers
because I _know_ that it won’t corrupt them, thus saving a bit of
time
e.g., I know that I can reuse some register value that was
modified in a subprogram to do some clever optimization
Some of these optimizations can only be done by a human
who understands what the program does
Some of these optimizations can sometimes be done by a
compiler that generates assembly code from a program
written in some high-level language
Arrays
Array Revisited
To refresh our memory, declaring a ten-
byte array is like this:
To load the first element of the array into
register al is like this:
Accessing the second, the third, and the
fourth element is like this:
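The three steps above might look like the following sketch. It assumes NASM syntax and a 16-bit DOS .COM program (hence `org 100h`); the array name arr and its contents are placeholders.

```nasm
        bits    16
        org     100h            ; assumed: DOS .COM program

start:
        mov     al, [arr]       ; first element  (offset 0)
        mov     al, [arr+1]     ; second element (offset 1)
        mov     al, [arr+2]     ; third element  (offset 2)
        mov     al, [arr+3]     ; fourth element (offset 3)

        mov     ax, 4C00h       ; INT 21h, service 4Ch: exit to DOS
        int     21h

arr     db      1, 2, 3, 4, 5, 6, 7, 8, 9, 10   ; ten-byte array (placeholder values)
```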
Access Array through a
loop
Reverse array example
Note:
BX is nicknamed the 'base register',
SI as 'source index' and
DI as 'destination index'.
XCHG
The XCHG instruction swaps the contents of its two operands
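As an illustration, an in-place array reversal using SI, DI, and XCHG could be sketched like this (NASM syntax, 16-bit DOS .COM program assumed; the array name, contents, and the LEN constant are placeholders). SI walks forward from the first element, DI walks backward from the last, and the two bytes they point to are swapped until the pointers meet.

```nasm
        bits    16
        org     100h            ; assumed: DOS .COM program

LEN     equ     10              ; number of elements in arr

start:
        mov     si, arr             ; SI -> first element
        mov     di, arr + LEN - 1   ; DI -> last element
.swap:
        cmp     si, di
        jae     .done           ; pointers met or crossed: finished
        mov     al, [si]        ; AL = front element
        xchg    al, [di]        ; AL <-> back element
        mov     [si], al        ; front slot gets the old back element
        inc     si
        dec     di
        jmp     .swap
.done:
        mov     ax, 4C00h       ; INT 21h, service 4Ch: exit to DOS
        int     21h

arr     db      1, 2, 3, 4, 5, 6, 7, 8, 9, 10
```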
Interrupt
Essentials
Introduction to Interrupt
Interrupts can be seen as a number of
functions.
These functions make programming
much easier: instead of writing code to
print a character, you can simply call the
interrupt and it will do everything for you.
There are also interrupt functions that
work with disk drives and other hardware.
We call such functions software
interrupts.
Interrupts can also be triggered by
hardware; these are called hardware
interrupts. Currently we are interested
in software interrupts only.
Introduction to Interrupt
To make a software interrupt, use the INT instruction, which has a
very simple syntax:
INT value
where value is a number from 0 to 255 (0 to 0FFh);
generally we will use hexadecimal numbers.
You may think that there are only 256 functions, but that is not
correct.
Each interrupt may have sub-functions.
To specify a sub-function, the AH register should be set before
calling the interrupt.
Each interrupt may have up to 256 sub-functions (so we get 256
* 256 = 65536 functions).
Introduction to Interrupt
cont.
The interrupt number alone is not enough.
An interrupt behaves differently depending on which
service number is requested.
Service numbers are usually placed in AH.
Sub-service numbers are usually placed in AL.
This interrupt mechanism is pretty much like a
phone number.
Input and Output to
Screen
Output to Screen
After the start label we invoke
interrupt number 21h, service 09h.
Interrupt 21h is reserved for operating
system calls.
Looking up service 09h of interrupt 21h
in an interrupt list shows that it displays a string ending with “$”
To insert a new line simply change the
message declaration into:
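A complete program along these lines might look like the sketch below. It assumes NASM syntax and a 16-bit DOS .COM program; the label msg and the message text are placeholders. The bytes 13 and 10 (carriage return and line feed) before the “$” produce the new line.

```nasm
        bits    16
        org     100h            ; assumed: DOS .COM program

start:
        mov     dx, msg         ; DS:DX -> string terminated by '$'
        mov     ah, 09h         ; service 09h: display string
        int     21h

        mov     ax, 4C00h       ; INT 21h, service 4Ch: exit to DOS
        int     21h

msg     db      "Hello, world!", 13, 10, "$"   ; 13, 10 = carriage return, line feed
```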
Input from keyboard
Interrupt 21h, service 0Ah offers a means
to input from the keyboard. The interrupt
lists say:
Input from keyboard
example
Buffer
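A sketch of buffered input with service 0Ah follows (NASM syntax, 16-bit DOS .COM program assumed; the buffer size of 20 is an arbitrary choice). The buffer has a fixed layout: the first byte tells DOS the maximum number of characters to accept, the second byte is filled in by DOS with the number actually typed, and the characters themselves follow.

```nasm
        bits    16
        org     100h            ; assumed: DOS .COM program

start:
        mov     dx, buffer      ; DS:DX -> input buffer
        mov     ah, 0Ah         ; service 0Ah: buffered keyboard input
        int     21h

        ; On return, [buffer+1] holds the number of characters typed
        ; (not counting the final carriage return), and the characters
        ; themselves start at buffer+2.
        mov     cl, [buffer+1]  ; CL = length of the input

        mov     ax, 4C00h       ; INT 21h, service 4Ch: exit to DOS
        int     21h

buffer  db      20              ; byte 0: maximum characters to read, including the CR
        db      0               ; byte 1: filled in by DOS with the count actually read
        times 20 db 0           ; bytes 2 and up: the characters, ended by a carriage return
```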
Output: A Better Version
One way to cope with the “$” issue (service 09h
cannot print a “$”, since it marks the end of the string)
is to output characters one by one using a loop.
The loop terminates when the character
being read is 0.
ASCII code 0 is the null character (NUL),
which is commonly used to terminate
strings.
Interrupt 21h, service 06h is used to print
one character on screen
[bx] means bx is treated as a pointer instead
of a value
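Such a loop could be sketched as follows (NASM syntax, 16-bit DOS .COM program assumed; the message text is a placeholder chosen to contain a “$”). With service 06h the character to print goes in DL; the value 0FFh in DL selects input instead, so it cannot be printed this way.

```nasm
        bits    16
        org     100h            ; assumed: DOS .COM program

start:
        mov     bx, msg         ; BX points at the first character
.next:
        mov     dl, [bx]        ; [bx]: the byte that BX points to
        cmp     dl, 0
        je      .done           ; a zero byte ends the string
        mov     ah, 06h         ; service 06h: direct console output of DL
        int     21h
        inc     bx              ; advance to the next character
        jmp     .next
.done:
        mov     ax, 4C00h       ; INT 21h, service 4Ch: exit to DOS
        int     21h

msg     db      "Price: $5", 13, 10, 0   ; zero-terminated, so '$' can be printed
```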
Input one Character
Number to String
The output routines we discussed so far
are intended only for outputting strings.
How can we output numbers?
We have to convert the numbers to
string first.
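One common way to do the conversion is repeated division by 10, which yields the digits from least significant to most significant. The sketch below (NASM syntax, 16-bit DOS .COM program assumed) writes the digits backward into a small buffer that ends with “$” and then prints it with service 09h. The value 1234 and the buffer names are placeholders, and the routine handles unsigned 16-bit numbers only.

```nasm
        bits    16
        org     100h            ; assumed: DOS .COM program

start:
        mov     ax, 1234        ; example value to print (placeholder)
        mov     bx, buf_end     ; BX -> the '$' terminator; digits are written backward
        mov     cx, 10          ; divisor
.digit:
        xor     dx, dx          ; clear DX so that DX:AX = AX
        div     cx              ; AX = quotient, DX = remainder (0..9)
        add     dl, '0'         ; remainder -> ASCII digit
        dec     bx
        mov     [bx], dl        ; store the digit in front of the previous ones
        test    ax, ax
        jnz     .digit          ; repeat until the quotient is 0

        mov     dx, bx          ; DS:DX -> first digit
        mov     ah, 09h         ; service 09h: display string ending in '$'
        int     21h

        mov     ax, 4C00h       ; INT 21h, service 4Ch: exit to DOS
        int     21h

buf:    times 5 db 0            ; room for up to 5 digits (maximum 65535)
buf_end db      "$"
```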
Screen features
Setting the cursor
INT 10H , service 02H
MOV AH, 02H ; request set cursor
MOV BH, 00 ; page number 0
MOV DH, 08 ; row 8
MOV DL, 15 ; column 15
INT 10H ; call interrupt
Clearing the screen
INT 10H , service 06H
AL: # of lines to scroll, 00 for full screen
BH: color
MOV AX, 0600H ; request clear screen, full screen
MOV BH, 71h ; white BG (7), Blue text (1)
MOV CX, 0000h ; upper left row:column
MOV DX, 184Fh ; lower right row:column
INT 10H ; call interrupt
Procedure for GREEN BACKGROUND AND WHITE
TEXT
SETSCREEN PROC NEAR
MOV AH,06H
MOV AL,00
MOV BH,2FH ;GREEN BACKGROUND AND WHITE TEXT
MOV CX,0000H
MOV DX,184FH
INT 10H
RET
SETSCREEN ENDP
Screen display & Keyboard
Input
INT 21H , service 09H: displays a string
that ends with $ (or 24h)
See Fig 8-1: displaying ASCII character set
INT 21H , service 0AH: accepts
data from the keyboard into a buffer
INT 21H , service 02H: displays a single
character