
8086 Assemblyprogramming

The document discusses computer architecture and low-level programming. It explains that computer architecture involves understanding a system's components, performance, cost and optimal design tradeoffs. It then describes the Von Neumann architecture model and how CPUs are designed with instruction sets that decode machine code instructions. Assembly language acts as mnemonics for binary machine code instructions. While assembly is difficult for humans, it allows for optimal performance and low-level hardware control not possible in high-level languages. Programs are typically compiled from high-level languages to assembly then machine code for a CPU to execute.

Uploaded by

Motasim Shahin

8086 Assembly

Programming
What is in a Computer?
 The field of Computer Architecture is about the
fundamental structure of computer systems
 What are the components?
 How are they interconnected?
 How fast does the system operate?
 What is the power consumption?
 How much does it all cost?
 What architecture leads to the “best” trade-offs?
 The conceptual model for computer architecture that
has remained in effect since 1945 is the von Neumann
architecture
Instructions?
 Whenever somebody builds a CPU they first define
what instructions the CPU will know how to decode
and execute
 This is called the Instruction Set Architecture (ISA)
 The ISA for a Pentium is different from the ISA for a
PowerPC for instance
 The ISA is described in a (lengthy) documentation
that describes everything that one can do with the
CPU
 Every instruction lasts some number of clock cycles
Instructions
 Instructions are encoded in binary machine code
 E.g.: 01000110101101 may mean “perform an addition of two
registers and store the results in another register”
 The CPU is built using gates (or, and, etc.) which
themselves use transistors
 These gates implement instruction decoding
 Based on the bits of the instruction code, several signals are
sent to different electronic components, which in turn perform
useful tasks
 Typically, an instruction consists of two parts
 The opcode: what the instruction computes
 The operands: the input to the computation

opcode operands
0 1 0 0 0 1 1 0 1 0 1 1 0 1
Assembly language
 It’s really difficult for humans to read/remember
binary instruction encodings
 We will see that typically one would use hexadecimal
encoding, but still
 Therefore it is typical to use a set of mnemonics,
which form the assembly language
 It is often said that the CPU understands assembly
language
 This is not technically true, as the CPU understands
machine code, which we, as humans, choose to
represent using assembly language
 An assembler transforms assembly code into
machine code
Assembly Language
 It used to be that all computer programmers did all
day was to write assembly code
 This was difficult for many reasons
 Difficult to read
 Very difficult to debug
 Different from one computer to another!
 The use of assembly language for all programming
prevented the (sustainable) development of large
software projects involving many programmers
 This is the main motivation for the development of
high-level languages
 FORTRAN, Cobol, C, etc.
Why Assembly?
It's difficult
Error prone
Hard to debug
Takes a lot of time to
develop
Why Assembly?
However:
 Hand-written assembly can be very fast: a careful
programmer can sometimes beat the code a compiler produces.
 Assembly is a lot closer to machine level than
any other language because the instructions of
assembly language map (almost) one-to-one to machine
instructions.
 Hand-written assembly code can be smaller than
the code a compiler produces.
 In Assembly, we can do a lot of things that we
can't do in most higher-level languages, such as
manipulating processor flags directly.
High-level Languages
 The first successful high-level language was FORTRAN
 Developed by IBM in 1954 to run on the IBM 704
 Used for scientific computing
 The introduction of FORTRAN led people to believe that there would
never be bugs again because it made programming so easy!
 But high-level languages led to larger and more complex software
systems, hence leading to bugs
 Another early programming language was COBOL
 Developed in 1960, strongly supported by DoD
 Used for business applications
 In the early 60s IBM had a simple marketing strategy
 On the IBM 7090 you used FORTRAN to do science
 On the IBM 7080 you used COBOL to do business
 Many high-level languages have been developed since then, and
they are what most programmers use
 Fascinating history
High-Level Languages
 Having high-level languages is good, but CPUs do not
understand them
 Therefore, there needs to be a translation from a high-level
language to machine code
 There are two ways to run a high-level language on a CPU
that only understands machine code:
 Interpretation: An interpreter is a program that reads in high-
level code and simulates a computer that understands high-
level code
 Compilation: A compiler is a program that reads in high-level
code and produces equivalent machine code, which can then
be executed on the CPU at a later time
 Some languages are interpreted, some are compiled, some
are both or hybrid
The Big (Simplified) Picture
[Figure: high-level code (a C fragment) is translated by the COMPILER into assembly code (MIPS-style instructions such as sll, add, lw, slt, beq); the ASSEMBLER translates the assembly code into binary machine code, which runs on the CPU (program counter, registers, ALU, control unit).]
The Big (Simplified) Picture
[Figure: the same picture with one addition: hand-written assembly code can also be fed to the ASSEMBLER, alongside the assembly code produced by the COMPILER from high-level code. The resulting machine code runs on the CPU (program counter, registers, ALU, control unit).]
What we do in this class:
[Figure: the same picture once more, showing hand-written assembly code going through the ASSEMBLER to become machine code that runs on the CPU.]
Performance: Bubble Sort Example
Processors Prior to 8086
 (1971) 4004 – First processor made by the Intel
Corporation. Allowed computer intelligence to
be put into small devices like cell phones, key
chains, calculators, etc.

 (1972) 8008 – Twice as powerful as the 4004,


but was used in the Mark-8. Mark-8 was one of
first personal computers.
Processors Prior to 8086(cont.)
 (1974) 8080 – Slight improvement on the 8008
with a more complex instruction set. Started to
mass produce for personal computers. Last
processor update before 8086.
Intel’s 8086 and 8088
 (1978) 8086/8088 – Biggest improvement of
the 8-bit processors. Laid the groundwork for
the X86 architecture in processors. X86 is
still used in the newer Pentium models today.
The 8088 processor was selected by IBM to
be placed in the “IBM PC” which was their
most popular product. Skyrocketed Intel’s
stature as a company and was honored by
being named a Fortune 500 company.
Processors After 8086/8088
 (1982-89) 286/386/486 – Started being able to
run multiple programs at one time and point and
click operating systems.

 (1993-2001) Pentium’s 1-4 – Much faster


speeds allowed multimedia elements like voice,
sounds, and graphics to run much clearer and
faster.
The 80x86 Architecture
 To learn assembly programming we need to pick a
processor family with a given ISA (Instruction Set
Architecture)
 We will pick the Intel 80x86 ISA (x86 for short)
 The most common today in existing computers
 For instance in my laptop
 We could have picked others
 Old ones: Sparc, VAX
 Current ones: PowerPC, Itanium, MIPS
 Some courses in some curricula subject students to
two or even more ISAs, but in this course we’ll just
focus on one in more depth
Organization of 8088/8086
[Figure: block diagram of the 8088/8086.
Execution Unit (EU): general purpose registers AH/AL, BH/BL, CH/CL, DH/DL; SP, BP, SI, DI; ALU; flag register; EU control; internal data bus (16 bits).
Bus Interface Unit (BIU): segment registers CS, DS, SS, ES; IP; address ALU; instruction queue; bus control; address bus (20 bits); data bus (16 bits); external bus.]
Organization of 8088/8086
 Intel 8088 facts
 The 20-bit address bus allows accessing
1 M memory locations
 16-bit internal data bus and 8-bit
external data bus. Thus, it needs
two read (or write) operations to
read (or write) a 16-bit datum
 Byte addressable and byte-swapping: the low byte of a word
is stored at the lower address
Word: 5A2F
Memory locations:
18001 5A High byte of word
18000 2F Low byte of word
[Figure: 8088 signal classification: VDD (5V), GND, CLK, 20-bit address, 8-bit data, control signals to the 8088, control signals from the 8088.]
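The byte ordering described above can be modelled in a few lines of Python. This is only a sketch of the storage rule, not 8086 code; the helper names `store_word` and `load_word` and the dictionary used as memory are assumptions made for illustration.

```python
def store_word(memory, address, word):
    """Store a 16-bit word with the low byte at the lower address."""
    memory[address] = word & 0xFF             # low byte
    memory[address + 1] = (word >> 8) & 0xFF  # high byte

def load_word(memory, address):
    """Rebuild the 16-bit word from its two bytes."""
    return memory[address] | (memory[address + 1] << 8)

memory = {}  # sparse model of memory: address -> byte
store_word(memory, 0x18000, 0x5A2F)
print(f"{memory[0x18000]:02X} {memory[0x18001]:02X}")  # 2F 5A
print(f"{load_word(memory, 0x18000):04X}")             # 5A2F
```

On the 8088, each of the two byte accesses is a separate bus operation because the external data bus is only 8 bits wide.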
The 8086 Registers
 To write assembly code for an ISA you must know
the name of registers
 Because registers are places in which you put data to
perform computation and in which you find the result of the
computation (think of them as variables for now)
 The registers are identified by binary numbers, but
assembly languages give them “easy-to-remember” names
 The 8086 offered 16-bit registers
 Four general purpose 16-bit registers
 AX
 BX
 CX
 DX
General purpose registers
 AX, BX, CX, and DX: They can be
assigned any value you want.
 AX (accumulator register). Most
arithmetic operations are done with AX.
 BX (base register). Used for array
operations. BX usually holds a base address and is combined with
index registers like SI or DI to address memory.
 CX (counter register). Used for counting
purposes, such as loop counters.
 DX (data register). Used for storing data values.
The 8086 Registers
AX BX CX DX

AH AL BH BL CH CL DH DL

 Each of the 16-bit registers consists of 8 “low bits”
and 8 “high bits”
 Low: least significant
 High: most significant
 The ISA makes it possible to refer to the low or high
bits individually
 AH, AL
 BH, BL
 CH, CL
 DH, DL
The 8086 Registers
AX BX CX DX

AH AL BH BL CH CL DH DL

 The xH and xL registers can be used as 1-byte
registers to store 1-byte quantities
 Important: both are “tied” to the 16-bit register
 Changing the value of AX will change the values
of AH and AL
 Changing the value of AH or AL will change the
value of AX
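The way AH and AL are "tied" to AX can be illustrated with a small Python model. This is a sketch under the assumption that a register is just a 16-bit integer; the class name `Reg16` and its attributes `x`, `h`, and `l` are hypothetical and only mirror the AX/AH/AL naming.

```python
class Reg16:
    """A 16-bit register whose high and low bytes are views of one value."""

    def __init__(self, value=0):
        self.x = value & 0xFFFF  # the full 16-bit register (e.g. AX)

    @property
    def h(self):  # high byte (e.g. AH)
        return self.x >> 8

    @h.setter
    def h(self, value):
        self.x = ((value & 0xFF) << 8) | (self.x & 0x00FF)

    @property
    def l(self):  # low byte (e.g. AL)
        return self.x & 0xFF

    @l.setter
    def l(self, value):
        self.x = (self.x & 0xFF00) | (value & 0xFF)

ax = Reg16(0x1234)
ax.l = 0xFF                      # like MOV AL, 0FFH
print(f"{ax.x:04X}")             # 12FF: AX changed because AL changed
ax.x = 0xABCD                    # like MOV AX, 0ABCDH
print(f"{ax.h:02X} {ax.l:02X}")  # AB CD: AH and AL changed with AX
```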
Index registers
 SI and DI: Usually used to process arrays or
strings:
 SI (source index) by convention points to the
source array
 DI (destination index) by convention points to
the destination array.
 These are basically general-purpose registers
 But by convention they are often used as “pointers”,
i.e., they contain addresses
 And they cannot be decomposed into High and Low 1-byte
registers
Segment registers
 CS, DS, ES, and SS:
 CS (code segment register). Points to the
segment of the running program. We may NOT
modify CS directly; it changes through far jumps, calls, and returns.
 DS (data segment register). Points to the
segment of the data used by the running
program. You can point this anywhere you
want as long as it contains the desired data.
 ES (extra segment register). Usually used together with
DI as a pointer. The pairs DS:SI
and ES:DI are commonly used for string
operations.
 SS (stack segment register). Points to the stack
segment.
Pointer registers
 BP, SP, and IP:
 BP (base pointer) is used to address the stack
space that holds parameters and local variables.
 SP (stack pointer) points to the
current top of the stack.
 IP (instruction pointer) holds the offset of the
next instruction of the running program. It
is always coupled with CS, so the pair CS:IP is a
pointer to the next instruction to execute.
IP can NOT be used as an instruction operand;
CS:IP changes through jumps, calls, and returns.
Extended register
 386 processors introduced extended
registers.
 Most of the registers, except the segment
registers, are extended to 32 bits.
 So, we have extended registers EAX,
EBX, ECX, and so on.
 AX is only the low 16 bits (bits 0 to 15) of
EAX.
 There is NO special direct access to the
upper 16 bits (bits 16 to 31) of an extended
register.
The 8086 Registers
 The 16-bit Instruction Pointer (IP) register:
 Points to the next instruction to execute
 Typically not handled directly when writing assembly code
 The 16-bit FLAGS register
 Information is stored in individual bits of the FLAGS
register
 Whenever an instruction is executed and produces a
result, it may modify some bit(s) of the FLAGS register
 Example: Z (or ZF) denotes one bit of the FLAGS register,
which is set to 1 if the previously executed instruction
produced 0, or 0 otherwise
 We’ll see many uses of the FLAGS register
Flag register
 FLAGS is a 16-bit register that contains the processor
status.
 It holds values that programmers may
need to check, such as whether the
last arithmetic operation produced a zero result or an overflow.
 Intel doesn't provide a way to move the whole register directly; rather it is
accessed via the stack (PUSHF and POPF)
 You can examine each flag by using a bitwise
AND operation, since each status is
represented by just 1 bit.
Flag register cont.
 C carry flag is set to 1 whenever the last
arithmetic operation, such as an addition or
subtraction, produced a carry or borrow; otherwise it is 0.
 P parity flag is set to 1 if the low byte of the last
result contains an even number of 1 bits.
 A auxiliary carry flag is set when there is a carry or borrow
between bit 3 and bit 4. It is used in Binary Coded Decimal
(BCD) operations.
 Z zero flag is used to detect whether the last flag-affecting
operation produced a zero result.
 S sign flag is used to detect whether the last operation
produced a negative result. It is set to 1 if the highest bit
(bit 7 in bytes or bit 15 in words) of the last result
is 1.
Flag register cont.
 T trap flag is used by debuggers to turn on the step-by-step
(single-step) feature.
 I interrupt flag is used to enable or disable interrupts. If
the bit is set (= 1), then interrupts are enabled,
otherwise they are disabled. The default is on.
 D direction flag is used for the direction of string operations. If
the bit is set, then all string operations are done backward.
Otherwise, forward. The default is forward (= 0).
 O overflow flag is used to detect whether the result of the last
arithmetic operation has overflowed. If the bit
is set, then there has been an overflow.
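To make the flag descriptions concrete, here is a small Python model of how the status flags could be derived for an 8-bit addition. It is a sketch of the rules stated above, not a description of the hardware; the function name `add8_flags` is an assumption for illustration.

```python
def add8_flags(a, b):
    """Add two 8-bit values and return (result, status flags)."""
    full = a + b
    result = full & 0xFF
    flags = {
        "CF": int(full > 0xFF),                              # carry out of bit 7
        "PF": int(bin(result).count("1") % 2 == 0),          # even number of 1 bits
        "AF": int((a & 0x0F) + (b & 0x0F) > 0x0F),           # carry from bit 3 to bit 4
        "ZF": int(result == 0),                              # result is zero
        "SF": result >> 7,                                   # copy of the highest bit
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0) # signed overflow
    }
    return result, flags

print(add8_flags(0x7F, 0x01))  # signed overflow: 127 + 1 does not fit in 8 signed bits
print(add8_flags(0xFF, 0x01))  # unsigned carry: result wraps to 0
```

The overflow test checks whether both operands have a sign different from the result's sign, which can only happen when two operands of the same sign produce a result of the opposite sign.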
Flag Register
 Flag register contains information reflecting the current status of a
microprocessor. It also contains information which controls the
operation of the microprocessor.
Bit:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
Flag:  -  -  -  - OF DF IF TF SF ZF  - AF  - PF  - CF

 Control Flags
IF: Interrupt enable flag
DF: Direction flag
TF: Trap flag

 Status Flags
CF: Carry flag
PF: Parity flag
AF: Auxiliary carry flag
ZF: Zero flag
SF: Sign flag
OF: Overflow flag
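The bit layout above is what makes the "bitwise AND" access work. The following Python sketch tests individual bits of a FLAGS value using the positions from the layout; the helper names and the sample value `0x0244` are made up for illustration.

```python
# Bit positions taken from the FLAGS layout
FLAG_BITS = {"CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7,
             "TF": 8, "IF": 9, "DF": 10, "OF": 11}

def flag(flags_word, name):
    """Return 1 if the named flag is set in the 16-bit FLAGS value."""
    mask = 1 << FLAG_BITS[name]
    return 1 if flags_word & mask else 0

def flags_set(flags_word):
    """List the names of all flags that are set."""
    return [name for name in FLAG_BITS if flag(flags_word, name)]

sample = 0x0244            # hypothetical value, e.g. as obtained via PUSHF
print(flags_set(sample))   # ['PF', 'ZF', 'IF']
```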
The 8086 Registers
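General purpose (16 bits each):
AH AL = AX
BH BL = BX
CH CL = CX
DH DL = DX

Index: SI, DI

Pointers: BP, SP, IP

Status: FLAGS

Segments: CS, DS, SS, ES

[Figure: the registers drawn as 16-bit boxes next to the ALU and the Control Unit.]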
Addresses in Memory
 We mentioned several registers that are used for
holding addresses of memory locations
 Segments:
 CS, DS, SS, ES
 Pointers:
 SI, DI: indices (typically used for pointers)
 SP: Stack pointer
 BP: (Stack) Base pointer

 Let’s look at the structure of the address space


Code, Data, Stack
 Although we’ll discuss these at length later,
let’s just accept for now that the address
space has three regions: code, data, and stack
 A program constantly references all three
regions
 Therefore, the program constantly references
bytes in three different segments
 For now let’s assume that each region is fully
contained in a single segment, which is in fact
not always the case
 CS: points to the beginning of the code
segment
 DS: points to the beginning of the data
segment
 SS: points to the beginning of the stack
segment
Address Space
 In the 8086 processor, a program is limited to referencing an
address space of size 1MB, that is 2^20 bytes
 Therefore, addresses are 20 bits long!
 A d-bit long address can reference 2^d different “things”
 Example:
 2-bit addresses
 00, 01, 10, 11
 4 “things”
 3-bit addresses
 000, 001, 010, 011, 100, 101, 110, 111
 8 “things”
 In our case, these things are “bytes”
 One cannot address anything smaller than a byte
 Therefore, a 20-bit address makes it possible to address 2^20
individual bytes, or 1MB
Address Space

 One says that a running program has a 1MB
address space
 And the program needs to use 20-bit
addresses to reference memory content
 Instructions, data, etc.
 Problem: registers are 16 bits long! How
can they hold a 20-bit address?
 The solution: split addresses in two pieces:
 The selector
 The offset
For 20-bit Addresses

selector offset

4 bits 16 bits

 On the 8086 the offset is 16 bits long
 And therefore, in this simplified view, the selector is 4 bits
 We have 2^4 = 16 different segments
 Each segment is 2^16 bytes = 64KB
 For a total of 1MB of memory, which is what the
8086 used
For 20-bit Addresses
[Figure: a 20-bit address split into a 4-bit selector and a 16-bit offset. The 1MB memory is drawn as 16 segments of 64KB each, labelled by the leading address bits 0000… through 1111….]
Memory Segmentation
 A segment is a 64KB block of memory starting from any 16-byte
boundary
 For example: 00000, 00010, 00020, 20000, 8CE90, and E0840 are all valid
segment addresses
 The requirement of starting from 16-byte boundary is due to the 4-bit
left shifting

 Segment registers in BIU (each 16 bits wide, bits 15 to 0)
CS Code Segment
DS Data Segment
SS Stack Segment
ES Extra Segment
Memory Address Calculation

 Segment addresses must be stored
in segment registers
 Offset is derived from the combination
of pointer registers, the Instruction
Pointer (IP), and immediate values
 Memory address = Segment address with 0000 appended (4-bit left shift) + Offset

 Examples

CS                    3 4 8 A 0
IP                  +   4 2 1 4
Instruction address   3 8 A B 4

SS                    5 0 0 0 0
SP                  +   F F E 0
Stack address         5 F F E 0

DS                    1 2 3 4 0
DI                  +   0 0 2 2
Data address          1 2 3 6 2
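The calculation can be checked with a few lines of Python. This is a sketch of the arithmetic only; the function name `physical_address` is an assumption, and the final mask models the fact that the result is limited to 20 bits.

```python
def physical_address(segment, offset):
    """Shift the 16-bit segment left by 4 bits and add the 16-bit offset."""
    return ((segment << 4) + offset) & 0xFFFFF  # keep 20 bits

# The three worked examples from the slide
print(f"{physical_address(0x348A, 0x4214):05X}")  # 38AB4 (CS:IP)
print(f"{physical_address(0x5000, 0xFFE0):05X}")  # 5FFE0 (SS:SP)
print(f"{physical_address(0x1234, 0x0022):05X}")  # 12362 (DS:DI)
```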
Fetching Instructions
 Where to fetch the next instruction?
CS = 1234H, IP = 0012H
Memory address = 12340H + 0012H = 12352H, which holds MOV AL, 0

 Update IP
- After an instruction is fetched, register IP is updated as follows:

IP = IP + Length of the fetched instruction

- For example: the length of MOV AL, 0 is 2 bytes. After fetching this instruction,
the IP is updated to 0014
Accessing Data Memory
 There are a number of methods to generate the memory address when
accessing data memory. These methods are referred to as
Addressing Modes
 Examples:
- Direct addressing: MOV AL, [0300H]

DS               1 2 3 4 0   (assume DS=1234H)
               +   0 3 0 0
Memory address   1 2 6 4 0

- Register indirect addressing: MOV AL, [SI]

DS               1 2 3 4 0   (assume DS=1234H)
               +   0 3 1 0   (assume SI=0310H)
Memory address   1 2 6 5 0
In-class Exercise

 Consider the byte at address 13DDE within a
64K segment defined by selector value
10DE. What is its offset?

 13DDE = 10DE * 16 (decimal) + offset
 offset = 13DDE - 10DE0
 offset = 2FFE (a 16-bit quantity)
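The exercise can be verified in Python. The sketch below (with the hypothetical helper `offset_in_segment`) also shows that the same byte has a different offset when viewed from a different segment, since segments may start at any 16-byte boundary and therefore overlap.

```python
def offset_in_segment(physical, selector):
    """Offset of a physical address inside the 64KB segment of a selector."""
    base = selector << 4  # selector * 16
    offset = physical - base
    if not 0 <= offset <= 0xFFFF:
        raise ValueError("address is outside this 64KB segment")
    return offset

print(f"{offset_in_segment(0x13DDE, 0x10DE):04X}")  # 2FFE, as in the exercise
print(f"{offset_in_segment(0x13DDE, 0x13DD):04X}")  # 000E, same byte, other segment
```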
Addressing Modes
Where Are the Operands?
 Operands required by an operation can be specified
in a variety of ways
 A few basic ways are:
 operand in a register
 register addressing mode
 operand in the instruction itself
 immediate addressing mode
 operand in memory
 variety of addressing modes
 direct and indirect addressing modes
 operand at an I/O port
 Simple IN and OUT commands
Register Addressing

 Operand is in an internal register


Examples
mov EAX,EBX ; 32-bit copy
mov BX,CX ; 16-bit copy
mov AL,CL ; 8-bit copy

 The mov instruction


mov destination,source
copies data from source to destination
Register Addressing
 Operands of the instruction are the names of internal registers
 The processor gets data from the register locations specified by
instruction operands
For example: move the value of register BL to register AL

MOV AL, BL

 If AX = 1000H and BX = A080H, after the execution of MOV AL, BL,
what are the new values of AX and BX?

In immediate and register addressing modes, the processor does not access memory.
Thus, the execution of such instructions is fast.
Immediate Addressing Mode

Data is part of the instruction


 Operand is located in the code segment along with the
instruction
 Typically used to specify a constant
Example
mov AL,75
 This instruction uses register addressing mode
for destination and immediate addressing mode
for the source
Direct Addressing Mode
Data is in the data segment
 Need a logical address to access data
 Two components: segment:offset
 Various addressing modes to specify the offset component
 offset part is called effective address

 The offset is specified directly as part of instruction


 We write assembly language programs using memory labels
(e.g., declared using DB, DW, LABEL,...)
 Assembler computes the offset value for the label
 Uses symbol table to compute the offset of a label
Direct Addressing Mode

 Assembler builds a symbol table so we can refer to the
allocated storage space by the associated label
Example
.DATA                                  name      offset
value    DW  0                         value     0
sum      DD  0                         sum       2
marks    DW  10 DUP (?)                marks     6
message  DB  'The grade is:',0         message   26
char1    DB  ?                         char1     40
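The offsets in the symbol table follow from the sizes of the preceding definitions. The Python sketch below recomputes them; the tuple format `(name, directive, count)` and the function name `build_symbol_table` are assumptions made for illustration, not how any particular assembler stores its table.

```python
# Bytes allocated per item by each define directive
SIZES = {"DB": 1, "DW": 2, "DD": 4, "DQ": 8, "DT": 10}

# (label, directive, number of items) for the .DATA example
declarations = [
    ("value", "DW", 1),
    ("sum", "DD", 1),
    ("marks", "DW", 10),                          # 10 DUP (?)
    ("message", "DB", len("The grade is:") + 1),  # string plus the 0 byte
    ("char1", "DB", 1),
]

def build_symbol_table(decls):
    """Assign each label the running offset from the start of the segment."""
    table, offset = {}, 0
    for name, directive, count in decls:
        table[name] = offset
        offset += SIZES[directive] * count
    return table

print(build_symbol_table(declarations))
# {'value': 0, 'sum': 2, 'marks': 6, 'message': 26, 'char1': 40}
```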
Direct Addressing Mode
Examples
mov AL,char1
 Assembler replaces char1 by its effective address (i.e., its
offset value from the symbol table)
mov marks,56
 marks is declared as
marks DW 10 DUP (0)
 Since the assembler replaces marks by its effective address,
this instruction refers to the first element of marks
 In C, it is equivalent to
marks[0] = 56
Direct Addressing Example

DS × 10H + Displacement = Memory location

- Example: assume DS = 1000H, AX = 1234H

MOV [7000H], AX

DS:      1 0 0 0 _
+ Disp:    7 0 0 0
         1 7 0 0 0

AH = 12, AL = 34
Memory after the move: 17000H holds 34 (AL), 17001H holds 12 (AH)
Direct Addressing Mode

 Problem with direct addressing


 Useful only to specify simple variables
 Causes serious problems in addressing data types such as
arrays
 As an example, consider adding elements of an array
 Direct addressing does not facilitate using a loop structure
to iterate through the array
 We have to write an instruction to add each element of the
array
 Indirect addressing mode remedies this problem
Register Indirect Addressing
 One of the registers BX, BP, SI, DI appears in the instruction operand
field. Its value is used as the memory displacement value.
For example: MOV DL, [SI]

 Memory address is calculated as follows:

{DS or SS} × 10H + {BX, SI, DI, or BP} = Memory address

 If BX, SI, or DI appears in the instruction operand field, segment register DS
is used in address calculation
 If BP appears in the instruction operand field, segment register SS is used in
address calculation
Register Indirect Addressing
 Example 1: assume DS = 0800H, SI=2000H

MOV DL, [SI]

DS:     0 8 0 0 _
+ SI:     2 0 0 0
        0 A 0 0 0

Memory location 0A000H holds 12, so DL = 12 after the move

 Example 2: assume SS = 0800H, BP=2000H, DL = 7

MOV [BP], DL
Register Indirect Addressing
Using indirect addressing mode, we can
process arrays using loops
Example: Summing array elements
 Load the starting address (i.e., offset) of the
array into BX
 Loop for each element in the array
 Get the value using the offset in BX
 Use indirect addressing
 Add the value to the running total
 Update the offset in BX to point to the next element
of the array
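The loop outlined above can be simulated in Python to show why a register that holds an offset makes iteration possible. This is a model of the idea, not 8086 code: the variable `bx` stands in for the BX register, `memory` for the data segment, and the array contents and its offset `0x10` are made up for the example.

```python
def sum_words(memory, start_offset, count):
    """Sum `count` 16-bit words, stepping an offset as BX would be stepped."""
    bx = start_offset  # load the starting offset of the array
    total = 0
    for _ in range(count):
        # read the word at the offset held in bx (indirect addressing)
        total += int.from_bytes(memory[bx:bx + 2], "little")
        bx += 2        # a word is 2 bytes, so advance to the next element
    return total

# Hypothetical data segment holding a 4-element word array at offset 10H
memory = bytearray(64)
table_offset = 0x10
for i, value in enumerate([10, 20, 30, 40]):
    memory[table_offset + 2 * i: table_offset + 2 * i + 2] = value.to_bytes(2, "little")

print(sum_words(memory, table_offset, 4))  # 100
```

With direct addressing, each of the four reads would need its own instruction with a fixed offset; here one read is reused because the offset is data that the loop can change.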
Register Indirect Addressing
Loading offset value into a register

 Suppose we want to load BX with the offset value of
table1
 We cannot write
mov BX,table1
because this would load the contents of table1, not its offset
 Two ways of loading offset value
 Using OFFSET assembler directive
 Executed only at the assembly time
 Using lea instruction
 This is a processor instruction
 Executed at run time
Register Indirect Addressing
Loading offset value into a register
(cont’d)

 Using OFFSET assembler directive


 The previous example can be written as
mov BX,OFFSET table1

 Using lea (load effective address) instruction


 The format of lea instruction is
lea register,source
 The previous example can be written as
lea BX,table1
Register Indirect Addressing
Loading offset value into a register
(cont’d)
Which one to use -- OFFSET or lea?
 Use OFFSET if possible
 OFFSET incurs only one-time overhead (at assembly time)
 lea incurs run time overhead (every time you run the program)
 May have to use lea in some instances
 When the needed data is available at run time only
 An index passed as a parameter to a procedure
 We can write
lea BX,table1[SI]
to load BX with the address of an element of table1 whose
index is in SI register
 We cannot use the OFFSET directive in this case
Ambiguous Indirect Operands
 Consider the following instructions:
mov [EBX], 100
add [ESI], 20
inc [EDI]
 Where EBX, ESI, and EDI contain memory addresses
 The size of the memory operand is not clear to the
assembler
 EBX, ESI, and EDI can be pointers to BYTE, WORD, or DWORD

 Solution: use PTR operator to clarify the operand size


mov BYTE PTR [EBX], 100 ; BYTE operand in memory
add WORD PTR [ESI], 20 ; WORD operand in memory
inc DWORD PTR [EDI] ; DWORD operand in memory
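Why the operand size matters can be seen by modelling what each size writes to memory. In this Python sketch (the `store` helper and the 0xFF fill pattern are assumptions for illustration), the same value 100 overwrites 1, 2, or 4 bytes depending on the size chosen.

```python
def store(memory, address, value, size):
    """Write `value` into `size` bytes of memory, low byte first."""
    memory[address:address + size] = value.to_bytes(size, "little")

for name, size in (("BYTE", 1), ("WORD", 2), ("DWORD", 4)):
    memory = bytearray([0xFF] * 4)  # fill pattern, so untouched bytes are visible
    store(memory, 0, 100, size)     # models: mov <size> PTR [EBX], 100
    print(f"{name:5} {memory.hex(' ').upper()}")
# BYTE  64 FF FF FF
# WORD  64 00 FF FF
# DWORD 64 00 00 00
```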
Based Addressing
 The operand field of the instruction contains a base register (BX or BP)
and an 8-bit (or 16-bit) constant (displacement)
For example: MOV AX, [BX+4]

 Calculate memory address

{DS or SS} × 10H + {BX or BP} + Displacement = Memory address

 If BX appears in the instruction operand field, segment register DS
is used in address calculation
 If BP appears in the instruction operand field, segment register SS
is used in address calculation

What’s the difference between register indirect addressing and based addressing?
Based Addressing
 Example 1: assume DS = 0100H, BX=0600H

MOV AX, [BX+4]

DS:       0 1 0 0 _
+ BX:       0 6 0 0
+ Disp.:    0 0 0 4
          0 1 6 0 4

Memory: 01604H holds B0, 01605H holds C0, so AH = C0 and AL = B0 after the move

 Example 2: assume SS = 0A00H, BP=0012H, CH = ABH

MOV [BP-7], CH
Indexed Addressing
 The operand field of the instruction contains an index register (SI or DI)
and an 8-bit (or 16-bit) constant (displacement)
For example: MOV [DI-8], BL
 Calculate memory address

DS × 10H + {SI or DI} + Displacement = Memory address

 Example: assume DS = 0200H, DI=0030H, BL = 17H
MOV [DI-8], BL

DS:       0 2 0 0 _
+ DI:       0 0 3 0
- Disp.:    0 0 0 8
          0 2 0 2 8

Memory location 02028H holds 17 after the move
Based Indexed Addressing
 The operand field of the instruction contains a base register (BX or BP)
and an index register
For example: MOV [BP] [SI], AH
or MOV [BP+SI], AH

 Calculate memory address

{DS or SS} × 10H + {BX or BP} + {SI or DI} = Memory address

 If BX appears in the instruction operand field, segment register DS
is used in address calculation
 If BP appears in the instruction operand field, segment register SS
is used in address calculation
Based Indexed Addressing
 Example 1: assume SS = 2000H, BP=4000H, SI=0800H, AH=07H

MOV [BP] [SI], AH

SS:      2 0 0 0 _
+ BP:      4 0 0 0
+ SI:      0 8 0 0
         2 4 8 0 0

Memory location 24800H holds 07 after the move

 Example 2: assume DS = 0B00H, BX=0112H, DI = 0003H, CH=ABH

MOV [BX+DI], CH
Based Indexed with Displacement Addressing
 The operand field of the instruction contains a base register (BX or BP),
an index register, and a displacement

For example: MOV CL, [BX+DI+2080H]

 Calculate memory address

{DS or SS} × 10H + {BX or BP} + {SI or DI} + Disp. = Memory address

 If BX appears in the instruction operand field, segment register DS
is used in address calculation
 If BP appears in the instruction operand field, segment register SS
is used in address calculation
Based Indexed with Displacement Addressing
 Example 1: assume DS = 0300H, BX=1000H, DI=0010H

MOV CL, [BX+DI+2080H]

DS:       0 3 0 0 _
+ BX:       1 0 0 0
+ DI:       0 0 1 0
+ Disp.:    2 0 8 0
          0 6 0 9 0

Memory location 06090H holds 20, so CL = 20 after the move

 Example 2: assume SS = 1100H, BP=0110H, SI = 000AH, CH=ABH

MOV [BP+SI+0010H], CH
Summary of Addressing Modes
Assembler converts a variable name into a
constant offset (also called a displacement)

For indirect addressing, a base/index
register contains an address/index

CPU computes the effective
address of a memory operand
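All of the memory addressing modes above share one pattern: an effective address built from an optional base register, an optional index register, and an optional displacement, added to a segment shifted left by 4 bits. The Python sketch below models that pattern and reproduces the first worked example of each mode. The function names and the dictionary of register values are assumptions for illustration; the default-segment rule (SS when BP is the base, DS otherwise) is the one stated on the slides.

```python
def effective_address(regs, base=None, index=None, disp=0):
    """16-bit offset: base register + index register + displacement."""
    ea = disp
    if base:
        ea += regs[base]
    if index:
        ea += regs[index]
    return ea & 0xFFFF  # offsets wrap at 16 bits

def memory_address(regs, base=None, index=None, disp=0):
    """20-bit address using the default segment for the chosen base register."""
    segment = "SS" if base == "BP" else "DS"
    ea = effective_address(regs, base, index, disp)
    return ((regs[segment] << 4) + ea) & 0xFFFFF

# Based: MOV AX, [BX+4]
print(f"{memory_address({'DS': 0x0100, 'BX': 0x0600}, base='BX', disp=4):05X}")   # 01604
# Indexed: MOV [DI-8], BL
print(f"{memory_address({'DS': 0x0200, 'DI': 0x0030}, index='DI', disp=-8):05X}") # 02028
# Based indexed: MOV [BP+SI], AH
print(f"{memory_address({'SS': 0x2000, 'BP': 0x4000, 'SI': 0x0800}, base='BP', index='SI'):05X}")  # 24800
# Based indexed with displacement: MOV CL, [BX+DI+2080H]
print(f"{memory_address({'DS': 0x0300, 'BX': 0x1000, 'DI': 0x0010}, base='BX', index='DI', disp=0x2080):05X}")  # 06090
```

Direct addressing is the case with no base and no index, and register indirect addressing is the case with a single register and no displacement.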
Variables in
Assembly
Variables in Assembly
 Note:
 The assembly language is NOT
case-sensitive.
 A comment in assembly begins with
a semicolon (;). Everything after a
semicolon until the end of the line is
ignored.
Data Allocation
 Variable declaration in a high-level language such as C
char response
int value
float total
double average_value
specifies
 Amount of storage required (1 byte, 2 bytes, …)
 Label to identify the storage allocated (response, value, …)
 Interpretation of the bits stored (signed, floating point, …)
 Bit pattern 1000 1101 1011 1001 is interpreted as
 -29,255 as a signed number
 36,281 as an unsigned number
Data Allocation (cont’d)
 In assembly language, we use the define directive
 Define directive can be used
 To reserve storage space
 To label the storage space
 To initialize
 But no interpretation is attached to the bits stored
 Interpretation is up to the program code

 Define directive goes into the .DATA part of the assembly language program
 Define directive format
[var-name] D? init-value [,init-value],...
Variables Declaration
 Our ideal syntax (TASM based) looks like this:
.MODEL SMALL
.STACK 200
.DATA
; data definitions using DB, DW, DD, etc. come here
.CODE
START: MOV AX, @DATA ; Initialize DS
MOV DS, AX
...
; Return to DOS
MOV AX, 4C00H
INT 21H
END START
Data Allocation (cont’d)

 Five define directives


DB Define Byte ;allocates 1 byte
DW Define Word ;allocates 2 bytes
DD Define Doubleword ;allocates 4 bytes
DQ Define Quadword ;allocates 8 bytes
DT Define Ten bytes ;allocates 10 bytes
Examples
sorted DB 'y'
response DB ? ;no initialization
value DW 25159
Data Allocation (cont’d)

 Multiple definitions can be abbreviated


Example
message DB 'B'
DB 'y'
DB 'e'
DB 0DH
DB 0AH
can be written as
message DB 'B','y','e',0DH,0AH
 More compactly as
message DB 'Bye',0DH,0AH
Data Allocation (cont’d)
 Multiple definitions can be cumbersome to initialize data
structures such as arrays
Example
To declare and initialize an integer array of 8 elements
marks DW 0,0,0,0,0,0,0,0
 What if we want to declare and initialize to zero an array
of 200 elements?
 There is a better way of doing this than repeating zero 200 times
in the above statement
 Assembler provides a directive to do this (DUP directive)
Data Allocation (cont’d)
 Multiple initializations
 The DUP assembler directive allows multiple initializations to
the same value
 Previous marks array can be compactly declared as
marks DW 8 DUP (0)
Examples
table1 DW 10 DUP (?) ;10 words, uninitialized
message DB 3 DUP ('Bye!') ;12 bytes, initialized
; as Bye!Bye!Bye!
Name1 DB 30 DUP ('?') ;30 bytes, each
; initialized to ?
Data Allocation (cont’d)
 The DUP directive may also be nested
Example
stars DB 4 DUP(3 DUP ('*'),2 DUP ('?'),5 DUP ('!'))
Reserves 40-bytes space and initializes it as
***??!!!!!***??!!!!!***??!!!!!***??!!!!!
Example
matrix DW 10 DUP (5 DUP (0))
defines a 10X5 matrix and initializes its elements to 0
This declaration can also be done by
matrix DW 50 DUP (0)
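The byte counts quoted for the nested DUP examples can be reproduced with a small model. The Python sketch below is an illustration only; the helper name `dup` is an assumption, and it imitates how the assembler expands `count DUP (items...)` for byte data rather than showing any real assembler API.

```python
def dup(count, *items):
    """Imitate `count DUP (items...)` for DB data: repeat the byte sequence."""
    return b"".join(items) * count


# stars DB 4 DUP(3 DUP ('*'), 2 DUP ('?'), 5 DUP ('!'))
stars = dup(4, dup(3, b"*"), dup(2, b"?"), dup(5, b"!"))
print(len(stars), stars.decode())

# matrix DW 10 DUP (5 DUP (0)) and matrix DW 50 DUP (0) define the same data:
# 50 words, i.e. 100 bytes because each DW element occupies 2 bytes.
word_zero = b"\x00\x00"
nested = dup(10, dup(5, word_zero))
flat = dup(50, word_zero)
print(len(nested), nested == flat)
```

The first line of output shows the 40-byte pattern from the slide; the second confirms that the nested and flat matrix declarations occupy the same 100 bytes.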
Data Allocation (cont’d)
Correspondence to C Data Types

Directive   C data type
DB          char
DW          int, unsigned
DD          float, long
DQ          double
DT          internal intermediate float value
Variables Declaration
 Variable Limits and Negative Values
  Declaration Acronym Length Limit
db define byte 1 byte 0-255

dw define word 2 bytes 0-65535

dd define double 4 bytes 0-4294967295

 You can also initialize variables with negative values.
The assembler stores them in 2’s complement form, so the same
bits can be read back as an unsigned value. For example: if you
assign -1 to a db variable, the assembler stores FFh, which is
255 when interpreted as an unsigned integer.
Defining BYTE
Each of the following defines a single byte of storage:
value1 DB 'A'   ; character constant
value2 DB 0     ; smallest unsigned byte
value3 DB 255   ; largest unsigned byte
value4 DB -128  ; smallest signed byte
value5 DB +127  ; largest signed byte
value6 DB ?     ; uninitialized byte
Memory layout (the figure places value1 at physical address 80002):
Name     Content   Physical Address
value1   41H       80002
value2   0         80003
value3   FFH       80004
value4   80H       80005
value5   7FH       80006
value6   ?         80007
A variable name is a data label that implies an offset (an address).
Defining Bytes
Examples that use multiple initializers:
list1 DB 10,20,30,40
list2 DB 10,20,30,40
      DB 50,60,70,80
      DB 81,82,83,84
list3 DB ?,32,41h,00100010b
list4 DB 0Ah,20h,'A',22h
Memory layout (the figure places list1 at physical address 80002):
Name    Contents                           Physical Addresses
list1   10, 20, 30, 40                     80002-80005
list2   10, 20, 30, 40, 50, 60, 70, 80,    80006-80011
        81, 82, 83, 84
list3   ?, 32, 41H, 22H                    80012-80015
list4   0AH, 20H, 41H, 22H                 80016-80019
Defining Strings (1 of 3)
 A string is implemented as an array of characters
 For convenience, it is usually enclosed in quotation marks
 It usually has a null byte at the end (the DOS examples here end with '$' instead)
 Examples:
str1 DB "Enter your name", '$'
str2 DB 'Error: halting program', '$'
str3 DB 'A','E','I','O','U'
greeting DB "Welcome to the Encryption Demo program "
         DB "created by someone.", '$'
Memory layout (the figure places str1 at physical address 80002; one character per byte, in order):
Name   Contents                                   Physical Addresses
str1   "Enter your name"                          80002-80010
       '$'                                        80011
str2   "Error: halting program", then '$'         80012 onward
Defining Strings (2 of 3)

 To continue a single string across multiple lines, end each line with a comma:

menu DB "Checking Account",0dh,0ah,0dh,0ah,
"1. Create a new account",0dh,0ah,
"2. Open an existing account",0dh,0ah,
"3. Credit the account",0dh,0ah,
"4. Debit the account",0dh,0ah,
"5. Exit",0ah,0ah,
"Choice> ", '$'
Defining Strings (3 of 3)

 End-of-line character sequence:


 0Dh = carriage return
 0Ah = line feed
str1 DB "Enter your name: ",0Dh,0Ah
DB "Enter your address: ",'$'

newLine DB 0Dh,0Ah, '$'

Idea: Define all strings used by your program in the same area of the data segment.
Using the DUP Operator
 Use DUP to allocate (create space for) an array or
string.
 Counter and argument must be constants or constant
expressions

var1 DB 5 DUP(0) ; 5 bytes, all equal to zero
var2 DB 4 DUP(?) ; 4 bytes, uninitialized
var3 DB 4 DUP("STACK") ; 20 bytes: "STACKSTACKSTACKSTACK"
var4 DB 10,3 DUP(0),20 ; 5 bytes: 10, 0, 0, 0, 20
var1 DB 5 DUP(0)
var2 DB 4 DUP(?)
var3 DB 2 DUP("STACK")
var4 DB 10,3 DUP(0),20
Memory layout (the figure places var1 at physical address 80002):
Name   Contents                        Physical Addresses
var1   0, 0, 0, 0, 0                   80002-80006
var2   ?, ?, ?, ?                      80007-8000A
var3   S, T, A, C, K, S, T, A, C, K    8000B-80014
var4   10, 0, 0, 0, 20                 80015-80019
Defining DW
 Define storage for 16-bit integers
 or double characters
 single value or multiple values

word1 DW 1234H ; 16-bit value, stored as 34H then 12H
word2 DW -1 ; stored as FFFFH (2's complement)
word3 DW ? ; uninitialized
word4 DW "AB" ; double characters
myList DW 1,2,3,4,5 ; array of words
array DW 5 DUP(?) ; uninitialized array
word1 DW 1234H
word2 DW -1
word3 DW ?
word4 DW "AB"
myList DW 1,2,3,4,5
array DW 5 DUP(?)
Memory layout (the figure places word1 at physical address 80002; low byte first):
Name     Contents                             Physical Addresses
word1    34, 12                               80002-80003
word2    FF, FF                               80004-80005
word3    ?, ?                                 80006-80007
word4    'B', 'A'                             80008-80009
myList   01 00, 02 00, 03 00, 04 00, 05 00    8000A-80013
array    10 uninitialized bytes               80014-8001D
Defining DD
Storage definitions for signed and unsigned 32-bit
integers:

val1 DD 12345678h ; unsigned
val2 DD -1 ; signed
val3 DD 20 DUP(?) ; unsigned array
val4 DD -3,-2,-1,0,1 ; signed array
val1 DD 12345678h
val2 DD -1
val3 DD 20 DUP(?)
val4 DD -3,-2,-1,0,1
Memory layout (the figure places val1 at physical address 80002; low byte first):
Name                Contents          Physical Addresses
val1                78, 56, 34, 12    80002-80005
val2                FF, FF, FF, FF    80006-80009
val3[0] (val3)      ?, ?, ?, ?        8000A-8000D
val3[1] (val3+4)    ?, ?, ?, ?        8000E-80011
val3[2] (val3+8)    ?, ?, ?, ?        80012-80015
val3[3] (val3+12)   ?, ?, ?, ?        80016-80019
Defining DQ, DT

Storage definitions for quadwords, tenbyte values, and real numbers:

quad1 DQ 1234567812345678h
val1 DT 1000000000123456789Ah
Little Endian Order
 All data types larger than a byte store their individual
bytes in reverse order. The least significant byte occurs
at the first (lowest) memory address.

 Example:
val1 DD 12345678h
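To see the byte order concretely, the following Python sketch lists the bytes of a value in the order they would appear at increasing memory addresses. It is an illustration only; the helper name `little_endian_bytes` is an assumption, and it relies on Python's built-in `int.to_bytes`.

```python
def little_endian_bytes(value, size):
    """Return the bytes of `value` as hex strings, lowest memory address first."""
    return [f"{b:02X}" for b in value.to_bytes(size, "little")]


# val1 DD 12345678h occupies four consecutive bytes
print(little_endian_bytes(0x12345678, 4))
# A word such as 1234h is stored the same way, low byte first
print(little_endian_bytes(0x1234, 2))
```

The first call lists 78, 56, 34, 12, which is the layout shown for val1 in the DD memory map; the second gives 34, 12.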
EQU Directive
 Define a symbol as either an integer or
text expression.
 Cannot be redefined
PI EQU <3.1416>
pressKey EQU <"Press any key to continue...",0>
.data
prompt DB pressKey
Moving Around Values
 If you need to do some calculations or commands
involving the variables you'll have to load the variable
values to the registers.
 The syntax of the mov command is mov a, b, which
means assign b to a
mov ax, [var2] ; main memory (var2) → register
mov [var1], ax ; register → main memory (var1)
Caveats in MOVs
 You CANNOT use mov [var1], [var2].
 In other words, the mov command cannot transfer
values between two variables directly. So, how can
we get around this? Use a register.
 Suppose both var1 and var2 are word
variables. We can use any word registers (AX,
BX, CX, DX, and so on) to do the transfer.
Suppose we use AX.
 Thus, mov [var1], [var2] must be transformed into:
mov ax, [var2]
mov [var1],ax
Moving Around Values example

jmp start
our_var dw 10
start:
; The square brackets [ ] are to distinguish the variable from its address.
mov bx, [our_var]
mov cx, bx
mov [our_var], cx
mov ax, 4c00h
int 21h
end
Moving Around Values cont.
 When we deal with byte variables (i.e. db), we need to use
byte registers (e.g. AL, AH, BL, BH, and so on) to do our
bidding.
 AX, BX, CX, DX, and so on are word registers.
 You can use double-word registers, which are available in 80386
processors or better (use p386n instead of p286n to enable
double-word registers).
 The double-word registers include EAX, EBX, ECX, EDX, and
so on.
 We can assign constants to variables with the mov instruction:
mov [word ptr our_var], 1
Notice that the word ptr modifier must be used when you assign
constants to variables. Since our_var is a word variable, we
need to use the word ptr modifier. Likewise, a byte variable uses
the byte ptr modifier and a double-word variable uses dword ptr.
Moving Around Values
example

Notice the way that Intel processors store a word value:
the least significant byte is stored first,
then the most significant byte.
Moving Around Values cont.
 Recall that variables in assembly are
treated as addresses.

AX <= 0502h
Moving Around Values cont.

 Double-word variables are also stored similarly (i.e. bottom-up, flipped like the word
variables)
my_var dd 1234BABEh ; stored in memory as BE BA 34 12
Impacts on Registers
 Recall that the word register AX consists of
AH and AL.
 Modifying either AH or AL will modify the
contents of AX.
 Likewise, modifying AX will modify
AH and AL.
Question Marks on Variables
 If a variable does not need an initial
value, you can give a question mark ("?")
instead, which leaves it uninitialized. For
example:
another_var dw ?
String Variables
 You can define strings variables in
assembly. It is as follows:
message db "Hello World!$"
String variables must be stored as db
variables. The string is surrounded by
quotes, either single or double.
String Variables
Why do we have to end our string with a dollar sign
("$")?
 Well, some of the old DOS services require us to
do so. However, some systems may require
you to end it with a zero byte (ASCII code 0) instead:

message db "Hello World!",0

 Each character of the string is converted to its
corresponding ASCII code.
 Another thing to remember about string variables is
that the string's ASCII codes are NOT flipped as they
usually are in normal variables.
Size of Data
 Remember that labels merely declare an
address in the data segment, and do not
specify any size
 Size of data is inferred based on the source
or destination register
 mov eax, [L] ; loads 32 bits
 mov al, [L] ; loads 8 bits
 mov [L], eax ; stores 32 bits
 mov [L], ax ; stores 16 bits
 This is why it’s really important to know the
names of the x86 registers
Size Reduction
 Sometimes one needs to decrease the data size
 For instance, you have a 4-byte integer, but you need to use it as a
2-byte integer for something
 We simply use the registers: when moving a quantity from an X-bit
register to a Y-bit register (Y < X), the highest (X-Y) bits are simply
removed
 Example:
 mov ax, [L] ; loads 16 bits in ax
 mov bl, al ; takes the lower 8 bits of ax and puts them in bl
 Note that “mov bl, ax” is not a valid instruction because the operand
sizes differ; the low byte of ax is accessed as al
Size Reduction
 Of course, when doing a size reduction, one loses
information
 So the “conversion” may not work
 Example:
 mov ax, 000A2h ; ax = 162 decimal
 mov bl, al ; bl = 162 decimal
 Decimal 162 is encodable on 8 bits
 Example:
 mov ax, 00101h ; ax = 257 decimal
 mov bl, al ; bl = 1 decimal
 Decimal 257 is not encodable on 8 bits
Size Reduction and Sign
 Consider a 2-byte quantity: FFF4
 If we interpret this quantity as unsigned it is decimal 65,524
 Remember that the computer does not know whether the
content of registers/memory corresponds to signed or unsigned
quantities
 Once again it’s the responsibility of the programmer to do the
right thing
 In this case size reduction “does not work”, meaning that
reduction to a 1-byte quantity will not be interpreted as
decimal 65,524, but instead as decimal 244 (F4h)
 If instead FFF4 is a signed quantity (using 2’s complement),
then it corresponds to -000C (000B + 1), that is to decimal -12
 In this case, size reduction works!
Size Reduction and Sign
 This does not mean that size reduction always
works for signed quantities
 For instance, consider FF32h, which is a negative
number equal to -00CEh, that is, decimal -206
 A size reduction into a 1-byte quantity leads to 32h,
which is decimal +50!
 Note that -206 is not encodable on 1 byte
 The range of signed 1-byte quantities is between decimal
-128 and decimal +127
 So, size reduction may work or not work for signed
or unsigned quantities!
Two Rules to Remember
 For unsigned numbers: size reduction works if all removed
bits are 0

0 0 0 0 0 0 0 0 X X X X X X X X

X X X X X X X X
 For signed numbers: size reduction works if the removed bits are
all 0’s or all 1’s, AND if the highest bit not removed is equal to
the removed bits
 This highest remaining bit is the new sign bit, and
thus must be the same as the original sign bit
1 1 1 1 1 1 1 1 1 X X X X X X X

1 X X X X X X X
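Both rules can be tested against the examples from the preceding slides. The Python sketch below is an illustration, not part of the original material; the helper names (`mask`, `to_signed`, `reduce_size`, `reduction_works`) are assumptions. Instead of inspecting the removed bits, it checks the equivalent condition that the truncated value still has the same numeric interpretation as the original.

```python
def mask(bits):
    """Bit mask with the lowest `bits` bits set."""
    return (1 << bits) - 1


def to_signed(value, bits):
    """Interpret the lowest `bits` bits of value as a 2's complement number."""
    value &= mask(bits)
    return value - (1 << bits) if value >> (bits - 1) else value


def reduce_size(value, to_bits):
    """Size reduction: keep only the lowest `to_bits` bits."""
    return value & mask(to_bits)


def reduction_works(value, from_bits, to_bits, signed):
    """True if the reduced value has the same meaning as the original."""
    small = reduce_size(value, to_bits)
    if signed:
        return to_signed(small, to_bits) == to_signed(value, from_bits)
    return small == (value & mask(from_bits))


for value in (0x00A2, 0x0101, 0xFFF4, 0xFF32):
    print(f"{value:04X}h -> {reduce_size(value, 8):02X}h",
          "unsigned ok:", reduction_works(value, 16, 8, signed=False),
          "signed ok:", reduction_works(value, 16, 8, signed=True))
```

Note that 00A2h reduces correctly as an unsigned number (162) but not as a signed one, because A2h read as a signed byte is negative.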
Size Increase
 Size increase for unsigned quantities is
simple: just add 0s to the left of it
 Size increase for signed quantities requires
sign extension: the sign bit must be
extended, that is, replicated
 Consider the signed 1-byte number 5A. This
is a positive number (decimal 90), and so its
2-byte version would be 005A
 Consider the signed 1-byte number 8A. This
is a negative number (decimal -118), and so
its 2-byte version would be FF8A
Unsigned size increase
 Say we want to size increase an unsigned 1-
byte number to be a 2-byte unsigned number
 This can be done in a few easy steps, for
instance:
 Put the 1-byte number into al
 Set all bits of ah to 0
 Access the number as ax
 Example
 mov al, 0EDh
 mov ah, 0
 mov ..., ax
Unsigned size increase
 How about increasing the size of a 2-byte quantity to 4 bytes?
 This cannot be done in the same manner because there is no
way to access the 16 highest bits of register eax separately!
[Figure: register EAX; its low 16 bits are AX, which is split into AH and AL]

 Therefore, there is an instruction called movzx (Zero eXtend), which takes two operands:
 Destination: 16- or 32-bit register
 Source: 8- or 16-bit register, or 1 byte of memory, or a
word of memory
 The destination must be larger than the source!
Using movzx
 movzx eax, ax ; extends ax into eax
 movzx eax, al ; extends al into eax
 movzx ax, al ; extends al into ax
 movzx ebx, ax ; extends ax into ebx
 movzx ebx, [L] ; gives a “size not specified” error
 movzx ebx, byte [L] ; extends 1-byte value at address L into ebx
 movzx eax, word [L] ; extends 2-byte value at address L into eax
Signed Size Increase
 There is no way to use mov or movzx instructions to increase the
size of signed numbers, because of the needed sign extension
 Four “old” conversion instructions with implicit operands
 CBW (Convert Byte to Word): Sign extends AL into AX
 CWD (Convert Word to Double): Sign extends AX into DX:AX
 DX contains high bits, AX contains low bits
 a left-over instruction from the time of the 8086 that had no 32-bit registers
 CWDE (Convert Word to Double word Extended): Sign extends AX into
EAX
 CDQ (Convert Double word to Quad word): Sign extends EAX into
EDX:EAX (implicit operands)
 EDX contains high bits, EAX contains low bits
 This is really a 64-bit quantity (and we have no 64-bit register)
 The much more popular MOVSX instruction
 Works just like MOVZX, but does sign extension
 CBW equiv. to MOVSX ax, al
 CWDE equiv. to MOVSX eax, ax
Example
mov al, 0A7h ; as a programmer, I view this
; as an unsigned, 1-byte quantity
; (decimal 167)
mov bl, 0A7h ; as a programmer, I view this
; as a signed 1-byte
; quantity (decimal -89)

movzx eax, al ; extend to a 4-byte value
; (000000A7)
movsx ebx, bl ; extend to a 4-byte value
; (FFFFFFA7)
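The difference between zero extension and sign extension can be modelled in a few lines. This Python sketch is an illustration only; the function names `movzx` and `movsx` are borrowed from the instructions for readability and are assumptions, not an emulation of the full instructions.

```python
def mask(bits):
    """Bit mask with the lowest `bits` bits set."""
    return (1 << bits) - 1


def movzx(value, from_bits):
    """Zero extension: the new high bits are all 0."""
    return value & mask(from_bits)


def movsx(value, from_bits, to_bits):
    """Sign extension: the new high bits are copies of the sign bit."""
    value &= mask(from_bits)
    if value >> (from_bits - 1):                 # sign bit is 1
        value |= mask(to_bits) ^ mask(from_bits)  # fill the added bits with 1s
    return value


print(f"{movzx(0xA7, 8):08X}")       # unsigned view of A7h extended to 32 bits
print(f"{movsx(0xA7, 8, 32):08X}")   # signed view of A7h extended to 32 bits
print(f"{movsx(0x5A, 8, 16):04X}")   # positive byte 5Ah extended to 16 bits
print(f"{movsx(0x8A, 8, 16):04X}")   # negative byte 8Ah extended to 16 bits
```

The four results are 000000A7, FFFFFFA7, 005A and FF8A, matching the values given on the slides.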
Data Transfer Instructions
 We will look at three instructions
 mov (move)
 Actually copy
 xchg (exchange)
 Exchanges two operands
 xlat (translate)
 Translates byte values using a translation table
 Other data transfer instructions such as
movsx (move sign extended)
movzx (move zero extended)
Data Transfer Instructions (cont’d)

The mov instruction


 The format is
mov destination,source
 Copies the value from source to destination
 source is not altered as a result of copying
 Both operands should be of same size
 source and destination cannot both be in memory
 Most Pentium instructions do not allow both operands to be
located in memory
 Pentium provides special instructions to facilitate memory-
to-memory block copying of data
Data Transfer Instructions (cont’d)

The mov instruction


 Five types of operand combinations are
allowed:
Instruction type Example
mov register,register mov DX,CX
mov register,immediate mov BL,100
mov register,memory mov BX,count
mov memory,register mov count,SI
mov memory,immediate mov count,23

 The operand combinations are valid for all


instructions that require two operands
Data Transfer Instructions (cont’d)
Allowed mov operand combinations (rows: source operand; columns: destination operand)

Source \ Destination   General Register   Segment Register   Memory Location   Constant
General Register       Yes                Yes                Yes               No
Segment Register       Yes                No                 Yes               No
Memory Location        Yes                Yes                No                No
Constant               Yes                No                 Yes               No

Data Transfer Instructions (cont’d)

Ambiguous moves: PTR directive


 For the following data definitions
.DATA
table1 DW 20 DUP (0)
status DB 7 DUP (1)
the last two mov instructions are ambiguous
mov BX,OFFSET table1
mov SI,OFFSET status
mov [BX],100
mov [SI],100
 Not clear whether the assembler should use byte or word
equivalent of 100
Data Transfer Instructions (cont’d)

Ambiguous moves: PTR directive


 The PTR assembler directive can be used to clarify
 The last two mov instructions can be written as
mov WORD PTR [BX],100
mov BYTE PTR [SI],100
 WORD and BYTE are called type specifiers
 We can also use the following type specifiers:
DWORD for doubleword values
QWORD for quadword values
TWORD for ten byte values
Data Transfer Instructions (cont’d)
The xchg instruction
 The syntax is
xchg operand1,operand2
Exchanges the values of operand1 and operand2
Examples
xchg EAX,EDX
xchg response,CL
xchg total,DX
 Without the xchg instruction, we need a temporary
register to exchange values using only the mov
instruction
Data Transfer Instructions (cont’d)

The xchg instruction


 The xchg instruction is useful for conversion of 16-bit
data between little endian and big endian forms
 Example:
xchg AL,AH
converts the data in AX into the other endian form
 Pentium provides bswap instruction to do similar
conversion on 32-bit data
bswap 32-bit register
 bswap works only on data located in a 32-bit register
Data Transfer Instructions (cont’d)

The xlat instruction


 The xlat instruction translates bytes
 The format is
xlatb
 To use xlat instruction
 BX should be loaded with the starting address of the translation table
 AL must contain an index in to the table
 Index value starts at zero
 The instruction reads the byte at this index in the translation table and
stores this value in AL
 The index value in AL is lost
 Translation table can have at most 256 entries (due to AL)
 The contents of the byte that is AL bytes from the start of the
translation table pointed to by DS:BX is copied into AL, i.e.,
the effect of XLAT is equivalent to the invalid statement:
MOV AL , [BX + AL]
Data Transfer Instructions (cont’d)

The xlat instruction


Example: Encrypting digits
Input digits: 0 1 2 3 4 5 6 7 8 9
Encrypted digits: 4 6 9 5 0 3 1 8 7 2
.DATA
xlat_table DB '4695031872'
...
.CODE
mov BX,OFFSET xlat_table
GetCh AL
sub AL,'0' ; converts input character to index
xlatb ; AL = encrypted digit character
PutCh AL
...
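The table lookup performed by xlat in this example can be mirrored in a high-level sketch. The Python code below is an illustration only; `xlat` and `encrypt_digits` are hypothetical helper names, and the table contents are the ones from the slide.

```python
XLAT_TABLE = b"4695031872"  # same contents as xlat_table on the slide


def xlat(al, table):
    """Model of xlat: AL is replaced by the table byte at index AL."""
    return table[al & 0xFF]


def encrypt_digits(text):
    """Encrypt a string of decimal digits one character at a time."""
    out = []
    for ch in text:
        al = ord(ch) - ord("0")      # sub AL,'0' : character to index
        al = xlat(al, XLAT_TABLE)    # xlatb      : index to encrypted character
        out.append(chr(al))
    return "".join(out)


print(encrypt_digits("0123456789"))
```

Encrypting the digits 0 to 9 in order reproduces the table itself, 4695031872, which matches the input/encrypted mapping listed on the slide.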
Arithmetic Instructions
Addition and Subtraction
 Two instructions used for additions and subtractions: add and sub
 Both instructions can be used on a pair of signed numbers or on a
pair of unsigned numbers
 One of the big advantages of 2’s complement storage
 No mixing of signed and unsigned numbers
 IMPORTANT: The CPU does not know whether numbers stored in
registers are signed or unsigned!
 You, the programmer, must keep your own interpretation of the number
consistent throughout your program
 The CPU will happily add whatever registers together using binary
addition
 These two instructions each may set some bits of the FLAGS
register:
 The carry bit
 The overflow bit
 The zero bit (=1 if the result is equal to zero)
 The sign bit (=1 if the result is negative)
The Magic of 2’s Complement
 I have two 1-byte values, A3 and 17, and I add them together:
A3 + 17 = BA
 If my interpretation of the numbers is unsigned:
 A3h = 163d
 17h = 23d
 BAh = 186d
 and indeed, 163d + 23d = 186d
 If my interpretation of the numbers is signed:
 A3h = -93d
 17h = 23d
 BAh = -70d
 and indeed, -93d + 23d = -70d
 So, as long as I stick to my interpretation, the binary addition does
the right thing!!
 Same thing for the subtraction
 This is why the computer does not need to know whether register
contents are signed or unsigned
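The A3 + 17 example can be replayed to confirm that one binary addition serves both interpretations. This Python sketch is an illustration only; the helper name `as_signed8` is an assumption.

```python
def as_signed8(byte):
    """Interpret an 8-bit pattern as a 2's complement signed number."""
    return byte - 0x100 if byte & 0x80 else byte


a, b = 0xA3, 0x17
total = (a + b) & 0xFF   # the 8-bit result produced by binary addition

print(f"{a:02X} + {b:02X} = {total:02X}")
print("unsigned:", a, "+", b, "=", total)
print("signed:  ", as_signed8(a), "+", as_signed8(b), "=", as_signed8(total))
```

The same bit pattern BAh reads as 186 under the unsigned view and as -70 under the signed view, and both sums are arithmetically correct.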
Overflow???
 Generally speaking, overflow occurs when the result of an arithmetic
operation generates a result that’s “out of range”
 This happens because a register has a limited number of bits, which
means that our interpretation of a number comes with a valid range
 For instance
 adding 1-byte unsigned quantity 240d to 1-byte unsigned quantity 100d
will lead to an overflow because 340d > 255d
 subtracting 1-byte unsigned quantity 240d from 1-byte unsigned quantity
100d will lead to an overflow because -140d < 0d
 adding 1-byte signed quantity 100d to 1-byte signed quantity 120d will
lead to an overflow because 220d > 127d
 etc.
 Question: how do we detect overflow in a program?
 Important otherwise we could be working with bogus numbers
 It depends on whether numbers are signed or unsigned...
Overflow for Unsigned Operations
 There is an overflow with an unsigned operation
(i.e., on unsigned quantities) if the carry bit is set
 If the carry bit is set, that means we’d need a larger
quantity to hold the result
 This also works for subtractions (instead of a carry, we
have a “borrow”, but it’s still set in the carry bit)
 1-byte Example (all in hex):
 FF + 02 Carry is set (result would be 101h)
 255 + 2 > 255
 01 - 02 Carry is set (result cannot be negative)
 1-2<0
 8A - 0F Carry is not set (result is 7Bh)
 138 - 15 = 123
Overflow for Signed Operations
 There is an overflow with a signed operation (i.e., on
signed quantities) if the overflow bit is set
 This bit is set when the sign of the result does not agree
with the signs of the operands, which would be annoying
for the programmer to check by hand
 1-byte Example (all in hex, same as before):
 FF + 02 Overflow is not set (result is 01h)
 -1 + 2 = +1
 01 - 02 Overflow is not set (result is FFh)
 1 - 2 = -1
 8A - 0F Overflow is set (result would be < 80h)
 8A is negative, and is equal to -76h = -118d
 -118 - 15 < -128, and thus cannot be represented as a 1-byte
signed quantity
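The carry and overflow results in the 1-byte examples above can be reproduced with a small model of 8-bit add and sub. The Python sketch below is an illustration, not the processor's actual circuitry; the function names `add8` and `sub8` are assumptions, and the overflow tests use a common sign-comparison formulation.

```python
def add8(a, b):
    """8-bit add: return (result, carry, overflow)."""
    total = a + b
    result = total & 0xFF
    carry = total > 0xFF
    # Overflow: both operands have the same sign, and the result's sign differs.
    overflow = ((a ^ result) & (b ^ result) & 0x80) != 0
    return result, carry, overflow


def sub8(a, b):
    """8-bit sub (a - b): return (result, carry, overflow)."""
    result = (a - b) & 0xFF
    carry = a < b   # a borrow is reported in the carry bit
    # Overflow: operands have different signs, and the result's sign differs from a's.
    overflow = ((a ^ b) & (a ^ result) & 0x80) != 0
    return result, carry, overflow


for name, op, a, b in (("FF + 02", add8, 0xFF, 0x02),
                       ("01 - 02", sub8, 0x01, 0x02),
                       ("8A - 0F", sub8, 0x8A, 0x0F)):
    result, carry, overflow = op(a, b)
    print(f"{name}: result={result:02X}h carry={int(carry)} overflow={int(overflow)}")
```

With these definitions, FF + 02 and 01 - 02 set only the carry bit, while 8A - 0F sets only the overflow bit, in line with the two slides.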
In-Class Exercise
 Which of these operations set the Carry bit to 1? (presumably
we care because we think of these as unsigned operations)
 0F12 + F212 (2-byte quantities)
 00E3 + F74F (2-byte quantities)
 F1 - FA (1-byte quantities)
 FB12 - A3AA (2-byte quantities)
 A314 - B010 (2-byte quantities)
 Which of these operations set the Overflow bit to 1?
(presumably we care because we think of these as signed
operations)
 00E3 + FF4F (2-byte quantities)
 F1 - 7A (1-byte quantities)
In-Class Exercise
 Which of these operations set the Carry bit to 1?
 0F12 + F212 = 10124: Carry bit is set
 00E3 + F74F = F832: Carry bit is not set
 F1 - FA: F1 < FA, Carry bit is set
 FB12 - A3AA: FB12 > A3AA, Carry bit is not set
 A314 - B010: A314 < B010, Carry bit is set
In-Class Exercise
 Which of these operations set the Overflow bit to 1?
 00E3 + FF4F
 00E3 > 0, equal to decimal +227
 FF4F < 0, 2’s complement = 00B0+1 = 00B1, equal to decimal -177
 +227 - 177 = 50
 2-byte signed numbers are in [-32,768, +32,767]
 Overflow bit is not set
 F1 - 7A
 F1 < 0, 2’s complement = 0E+1 = 0F, equal to decimal -15
 7A > 0, equal to 122
 -15 - 122 = -137
 1-byte signed numbers are in [-128,+127]
 Overflow bit is set
Overflow is your Responsibility
 The processor merely computes bits and puts
them into the destination location as if
everything were fine, and it’s your
responsibility to check the overflow!
 Let’s look at two examples
 An unsigned arithmetic example
 A signed arithmetic example
 Note that we will see later how to “check” the
Carry bit and the Overflow bit in the FLAGS
register
Unsigned Overflow
On web site as
ics312_overflow_unsigned.asm

mov al, 0F0h ; al = F0h


mov bl, 0A3h ; bl = A3h
add al, bl ; al = al + bl
movzx eax, al ; increase size for printing
call print_int ; print al as an integer

 As a programmer we decided to do some computation with unsigned values


 We put value F0h in al (unsigned F0h is decimal 240)
 We put value A3h in bl (unsigned A3h is decimal 163)
 We add them together
 The “true” result should be decimal 240+163 = 403, which cannot be encoded on 8
bits (should be <= 255)
 But the processor just goes ahead: F0 + A3 = 193h, and then drops the leftmost bits
to truncate to a 1-byte value to get 93h!
 Therefore, when we call print_int, we print the decimal value 00000093, that is: 147!
 This is obviously wrong, and we can tell (or will be able to shortly) because the carry
bit is in fact set to 1
 Note that this is all correct if we assume signed values and replace movzx by movsx,
but then our initial interpretation of the two values is different
Signed Overflow
On web site as
ics312_overflow_signed.asm

mov al, 09Ah ; al = 9Ah


mov bl, 073h ; bl = 73h
sub al, bl ; al = al - bl
movsx eax, al ; increase size for printing
call print_int ; print al as an integer

 As a programmer we decided to do some computation with signed values


 We put value 9Ah in al (signed 9Ah is decimal -102)
 We put value 73h in bl (signed 73h is decimal +115)
 We subtract bl from al
 The “true” result should be decimal -102 - 115 = -217, which cannot be encoded on 8
bits (should be >= -128)
 But the processor just goes ahead: 9A - 73 = 27h
 Therefore, when we call print_int, we print the decimal value 00000027, that is: 39!
 This is obviously wrong, and we can tell (or will be able to shortly) because the
overflow bit is in fact set to 1
 Note that this is all correct if we assume unsigned values and replace movsx by
movzx, but then our initial interpretation of the two values is different
In-Class Exercise
mov al, 0E1h
mov bl, 0A2h
add al, bl
movzx eax, al
call print_int
movsx eax, bl
call print_int

 What does this program print?


In-Class Exercise
mov al, 0E1h ; AL = E1
mov bl, 0A2h ; BL = A2
add al, bl ; E1 + A2 = 183h, truncated to 8 bits: AL = 83
movzx eax, al ; EAX = 00 00 00 83
call print_int ; prints out: 131
movsx eax, bl ; EAX = FF FF FF A2
call print_int ; prints out: -94

FFFFFFA2 is a negative number
2’s complement: (0000005D+1) = 5E
= decimal 94
Addition and Subtraction
 You may actually add or subtract variables
with constants. But don't forget to add the
word ptr or dword ptr as appropriate.

 If the result of an addition overflows, the
carry flag is set to 1, otherwise it is 0.
 Similarly, if the result of subtraction
requires a borrow, then the carry flag is
also set to 1, otherwise it is 0.
Addition and Subtraction
 Suppose you'd like to add 32-bit integers
with 16-bit registers.
 Intel processors have a special instruction
called adc (add with carry).
 For the subtraction, we have a similar
instruction called sbb (subtract with borrow).
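The idea behind adc can be sketched as follows: add the low words with add, then add the high words together with the carry produced by the first addition. The Python code below is an illustration under that assumption; `add16` and `add32_with_16bit_registers` are hypothetical helper names.

```python
def add16(a, b, carry_in=0):
    """16-bit addition with optional carry-in: return (result, carry_out)."""
    total = (a & 0xFFFF) + (b & 0xFFFF) + carry_in
    return total & 0xFFFF, total >> 16


def add32_with_16bit_registers(x, y):
    """Add two 32-bit values using only 16-bit additions."""
    low, carry = add16(x & 0xFFFF, y & 0xFFFF)     # add: low words
    high, carry = add16(x >> 16, y >> 16, carry)   # adc: high words + carry
    return (high << 16) | low, carry


result, carry = add32_with_16bit_registers(0x0001FFFF, 0x00000001)
print(f"{result:08X} carry={carry}")
```

Here the low-word addition FFFF + 0001 wraps to 0000 and produces a carry, which the second step adds into the high word to give 00020000. Subtraction with sbb follows the same pattern with a borrow.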
Multiplication and Division
 Multiplication and division always assume AX
as the place holder.

 If there is an overflow in multiplication, the
overflow flag will be set.
 Note: mul and div treat every number as
positive (unsigned). If you have negative values, you'll
need to use imul and idiv instead,
respectively.
Multiplication
 There are two instructions to perform
multiplications
 Multiplying unsigned numbers: mul
 Multiplying signed numbers: imul
 Why do we need two different instructions?
 Consider the multiplication of FF by FF
 If we assume unsigned quantities, this is 255*255
= 65025 = FE01h
 If we assume signed quantities, this is -1 * -1 = 1
= 0001h
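The FF by FF example can be checked numerically. The Python sketch below models only the 8-bit forms (AL times an 8-bit operand, result in AX); the names `mul8` and `imul8` are assumptions chosen to mirror the instructions.

```python
def to_signed(value, bits):
    """Interpret the lowest `bits` bits of value as a 2's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mul8(al, src):
    """Unsigned 8-bit multiply: returns the 16-bit result placed in AX."""
    return ((al & 0xFF) * (src & 0xFF)) & 0xFFFF


def imul8(al, src):
    """Signed 8-bit multiply: returns the 16-bit result placed in AX."""
    return (to_signed(al, 8) * to_signed(src, 8)) & 0xFFFF


print(f"mul  FF * FF -> AX = {mul8(0xFF, 0xFF):04X}h")
print(f"imul FF * FF -> AX = {imul8(0xFF, 0xFF):04X}h")
```

The same operand bits give FE01h (65025) for the unsigned multiply and 0001h for the signed one, which is why two separate instructions are needed.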
The mul Instruction
 The result of a multiplication can need up to twice as many
bits as the operands
 Multiplications just lead to much bigger numbers than additions
 At most the result will be twice the size of the operands (255 *
255 = 65,025, which is encodable on 2 bytes)
 The oldest form of multiplication is the “mul” instruction, which
produces a result twice the size of its operand
mul <register or memory reference>
 If the operand is a byte, then it is multiplied by AL and the result
is stored in (16-bit) AX
 If the operand is 16-bit, it is multiplied by AX and stored in (32-
bit) DX:AX
 There used to be no 32-bit registers
 If the operand is 32-bit, it is multiplied by EAX and the result is
stored in (64-bit) EDX:EAX
 We don’t have 64-bit registers on a 32-bit architecture
The imul instruction
 Imul, which is used for signed numbers has three formats:
imul src
imul dst, src1
imul dst, src1, src2
 The different combinations are shown in Table 2.2 in the text
book
 This table uses the typical way in which one specifies
operands:
 reg16: a 16-bit register
 reg32: a 32-bit register
 immed8: an 8-bit immediate operand (i.e., a number)
 mem16: a word of memory
 etc.
 Let’s look at the table
The imul instruction
dst     src1        src2      action
        reg/mem8              AX = AL * src1 (*)
        reg/mem16             DX:AX = AX * src1 (*)
        reg/mem32             EDX:EAX = EAX * src1 (*)
reg16   reg/mem16             dst *= src1
reg32   reg/mem32             dst *= src1
reg16   immed8                dst *= immed8
reg32   immed8                dst *= immed8
reg16   immed16               dst *= immed16
reg32   immed32               dst *= immed32
reg16   reg/mem16   immed8    dst = src1*src2
reg32   reg/mem32   immed8    dst = src1*src2
reg16   reg/mem16   immed16   dst = src1*src2
reg32   reg/mem32   immed32   dst = src1*src2
(*) Will not overflow (although the overflow bit may be set)
Division
 Two instructions:
 div for unsigned quantities
 idiv for signed quantities
 They perform integer division
 e.g.,: 19 / 4 produces quotient = 4 remainder = 3
 Only one format for both:
div/idiv src
 If src is an 8-bit quantity:
 AX is divided by src
 quotient stored into AL
 remainder stored into AH
 If src is a 16-bit quantity:
 DX:AX is divided by src
 quotient stored into AX
 remainder stored into DX
Division

 If src is a 32-bit quantity:


 EDX:EAX is divided by src
 quotient stored into EAX
 remainder stored into EDX
 Warning: it’s very common for programmers
to forget to initialize DX or EDX (the upper half
of the dividend) before the division
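To see why a stale DX matters, here is a minimal C sketch of the 16-bit div. The names div16 and div_result_t are hypothetical, and the model ignores the divide error that a real processor raises when src is 0 or the quotient does not fit in AX.

```c
#include <stdint.h>

/* Hypothetical model of "div src" with a 16-bit src:
   the dividend is the 32-bit quantity DX:AX.
   Assumes src != 0 and a quotient that fits in 16 bits. */
typedef struct { uint16_t quotient; uint16_t remainder; } div_result_t;

div_result_t div16(uint16_t dx, uint16_t ax, uint16_t src)
{
    uint32_t dividend = ((uint32_t)dx << 16) | ax; /* DX is the high word */
    div_result_t r;
    r.quotient  = (uint16_t)(dividend / src);      /* stored into AX */
    r.remainder = (uint16_t)(dividend % src);      /* stored into DX */
    return r;
}
```

With DX = 0 and AX = 19, dividing by 4 gives quotient 4 and remainder 3 as expected. With a leftover DX = 1 the dividend becomes 65555, so the quotient is 16388 instead.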
Negation
 There is a convenient instruction to negate an operand: neg
 It simply computes the 2’s complement of a
quantity
 Works on 8-bit, 16-bit, or 32-bit quantities
 either in registers or in memory
 We’ll ignore the content of Section 2.1.5 in the textbook
Increment and Decrement
 Often, we'd like to increment something
by 1 or decrement something by 1.
 You can use add x, 1 or sub x, 1 if you'd like to,
but Intel x86 assembly has special instructions
for this.
 Instead of add x, 1 we use inc x. These are
equivalent (apart from the carry flag, see below).
 Likewise, for subtraction, dec x can be used in
place of sub x, 1.
 Beware that, unlike add and sub, neither inc nor
dec modifies the carry flag.
Tip
 The arithmetic operations can have special
properties.
 For example: add x, x is actually equal to
multiplying x by 2.
 Similarly, sub x, x is actually setting x to 0.
 On the 8086 processor, this arithmetic is faster
than doing mul or doing mov x, 0. Moreover,
its code size is smaller.
Bitwise Operation
Why bit operations
 Assembly languages all provide ways to
manipulate bits
 Some of the coolest “tricks” in assembly rely
on bit operations
 Only a few instructions can do a lot very quickly
using judicious bit operations
 Let’s look at some of the common operations,
starting with shifts
 logical shifts
 arithmetic shifts
 rotate shifts
Shift Operations
 A shift moves the bits around in some data
 A shift can be toward the left (i.e., toward the
most significant bits), or toward the right (i.e.,
toward the least significant bits)
 There are two kinds of shifts:
 Logical Shifts
 Arithmetic Shifts
Logical Shifts
 The simplest shifts: bits disappear at one end
and zeros appear at the other
original byte 1 0 1 1 0 1 0 1
left log. shift 0 1 1 0 1 0 1 0
left log. shift 1 1 0 1 0 1 0 0
left log. shift 1 0 1 0 1 0 0 0

right log. shift 0 1 0 1 0 1 0 0
right log. shift 0 0 1 0 1 0 1 0
right log. shift 0 0 0 1 0 1 0 1
Logical Shift Instructions
 Two instructions: shl and shr
 For each you can specify by how many bits you want to do
the shift
 Either by just passing a constant to the instruction
 Or by using whatever is stored in the CL register
 After the instruction executes, the carry flag (CF) contains the
(last) bit that was shifted out
 Example:
mov al, 0C6h ; al = 1100 0110
shl al, 1 ; al = 1000 1100 (8Ch) CF=1
shr al, 1 ; al = 0100 0110 (46h) CF=0
shl al, 3 ; al = 0011 0000 (30h) CF=0
mov cl, 2
shr al, cl ; al = 0000 1100 (0Ch) CF=0
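The example above can be replayed with a small C model of the two instructions. The helper names shl8 and shr8 are invented for this sketch, and the model only handles shift counts from 1 to 8.

```c
#include <stdint.h>

/* Hypothetical 8-bit models of shl and shr; count must be in 1..8. */
uint8_t shl8(uint8_t value, unsigned count, int *cf)
{
    *cf = (value >> (8 - count)) & 1;   /* last bit shifted out on the left */
    return (uint8_t)(value << count);   /* zeros enter on the right */
}

uint8_t shr8(uint8_t value, unsigned count, int *cf)
{
    *cf = (value >> (count - 1)) & 1;   /* last bit shifted out on the right */
    return (uint8_t)(value >> count);   /* zeros enter on the left */
}
```

Calling shl8(0xC6, 1, &cf) yields 8Ch with cf = 1, matching the first shift in the example.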
Shifts and Numbers
 The main use for shifts: quickly multiply and divide by powers of 2
 In decimal
 multiplying 0013 by 10 amounts to doing one left shift to 0130
 multiplying by 100 amounts to doing two left shifts to 1300
 In binary
 multiplying 00101 by 2 amounts to doing a left shift to 01010
 multiplying by 4 amounts to doing two left shifts to 10100
 If numbers are too large, then we’d need more bits and multiplication
doesn’t produce valid results
 e.g., 10000000 (128d) cannot be left-shifted to obtain 256 using 8-bit values
 Similarly, dividing by powers of two amounts to doing right shifts:
 right shifting 10010 (18d) leads to 01001 (9d)
 Note that when dividing odd numbers by two we “lose bits”, which amounts
to rounding down to the lower integer quotient
 Consider number 10011 (19d)
 Right shift: 01001 (9d - rounded down)
 Right shift: 00100 (4d - rounded down)
Shifts and Unsigned Numbers
 Using the shifts works only for unsigned numbers
 When numbers are signed, the shifts do not handle
the sign bits correctly and cannot be interpreted as
multiplying/dividing by powers of 2 anymore
 Example: Consider the 1-byte number FE
 If Unsigned:
 FE = 254d = 11111110b
 right shift: 01111111b = 7Fh = 127d (which is 254/2)
 If Signed:
 FE = - 2d = 11111110b
 right shift: 01111111b = 7Fh = +127d (which is NOT -2/2)
Arithmetic Shifts
 Since the logical shifts do not work for signed
numbers, we have another kind of shifts called
arithmetic shifts
 Left shift: sal
 This instruction works just like shl
 As long as the sign bit is not changed by the shift, the
result will be correct (i.e., will be multiplied by 2)
 Right shift: sar
 This instruction does NOT shift the sign bit: the new bits
entering on the left are copies of the sign bit
 Both shifts store the last bit out in the carry flag
Arithmetic Shift Example
 If signed numbers, then the operations below are correct
multiplications / divisions of 1-byte quantities

mov al, 0C3h ; al = 1100 0011 (-61d)


sal al, 1 ; al = 1000 0110 (86h = -122d)
sar al, 3 ; al = 1111 0000 (F0h = -16d)
; (note that this is not an exact division as we
; lose bits on the right!)
 The following is not a correct multiplication by 16!
sal al, 4 ; al = 0000 0000 (0d, which can’t be right)

 One should use the imul instruction instead (but unfortunately imul
doesn’t work on 1-byte quantities):
movsx ax, al ; sign extension
imul ax, 16 ; result in ax
In-Class Exercise

 Consider the following instructions
mov ax, 0F471h
sar ax, 3
shl ax, 7
sar ax, 10
 At each step give the content of register ax
(in hex) and the value of CF
In-Class Exercise
 Consider the following instructions
mov ax, 0F471h
F471 CF=0
sar ax, 3
FE8E CF=0
shl ax, 7
4700 CF=1
sar ax, 10
0011 CF=1
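The answers above can be double-checked with a short C model. The helpers sar16 and shl16 are hypothetical names for this sketch, restricted to shift counts from 1 to 15.

```c
#include <stdint.h>

/* Hypothetical 16-bit models; count must be in 1..15. */
uint16_t sar16(uint16_t value, unsigned count, int *cf)
{
    /* copies of the sign bit fill the vacated bits on the left */
    uint16_t sign_fill = (value & 0x8000u)
                       ? (uint16_t)(0xFFFFu << (16 - count))
                       : 0;
    *cf = (value >> (count - 1)) & 1;   /* last bit shifted out */
    return (uint16_t)((value >> count) | sign_fill);
}

uint16_t shl16(uint16_t value, unsigned count, int *cf)
{
    *cf = (value >> (16 - count)) & 1;  /* last bit shifted out */
    return (uint16_t)(value << count);
}
```

Feeding F471h through sar 3, shl 7, and sar 10 in this model reproduces FE8Eh, 4700h, and 0011h with the carry values listed above.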
Rotate Shifts
 There are more esoteric shift instructions
 rol and ror: circular left and right shifts
 bits shifted out on one end are shifted in the other
end
 rcl and rcr: carry flag rotates
 the source (e.g., a 16-bit register) and the carry
flag are rotated as one quantity (e.g., as a 17-bit
quantity)
Example Using Shifts
 Say you want to count the number of bits that are
equal to 1 in register EAX
 One easy way to do this is to use shifts
 Shift 32 times
 Each time the carry flag contains the last shifted bit
 If the carry flag is 1, then increment a counter, otherwise
do not increment a counter
 When you’re done the counter contains the number of 1’s
 Let’s write this in x86 assembly
Example Using Shifts

; Counting 1 bits in EAX


mov bl, 0 ; bl is the number of 1 bits
mov cl, 32 ; cl is the loop counter
loop_start:
shl eax, 1 ; left shift
jnc not_one ; if carry != 1, jump to not_one
inc bl ; increment the number of 1 bits
not_one:
dec cl ; decrement the loop counter
jnz loop_start ; if more iterations goto loop_start
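For comparison, the same shift-and-count idea can be written in C. The function name count_ones is an assumption of this sketch, and the carry flag is modeled by reading the top bit before each shift.

```c
#include <stdint.h>

/* Counts the 1 bits of a 32-bit value the same way the loop above does. */
unsigned count_ones(uint32_t eax)
{
    unsigned count = 0;                    /* plays the role of bl */
    for (int i = 32; i != 0; i--) {        /* plays the role of cl */
        unsigned carry = (eax >> 31) & 1u; /* bit that shl moves into CF */
        eax <<= 1;                         /* shl eax, 1 */
        if (carry)
            count++;                       /* inc bl */
    }
    return count;
}
```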
Boolean Bitwise Operations
 There are assembly bitwise instructions for all standard
boolean operations: AND, OR, XOR, and NOT
 Bits are computed individually
 Examples:

1 0 1 1 0 0 AND 1 1 0 1 1 0 = 1 0 0 1 0 0
1 1 0 0 0 1 OR 0 1 1 0 1 1 = 1 1 1 0 1 1
1 1 0 0 0 1 XOR 0 1 1 0 1 1 = 1 0 1 0 1 0
NOT 1 1 0 0 0 1 = 0 0 1 1 1 0
Boolean Bitwise Instructions

mov ax, 0C123h

and ax, 82F6h ; ax = C123 AND 82F6 = 8022

or ax, 0E34Fh ; ax = 8022 OR E34F = E36F

xor ax, 36E9h ; ax = E36F XOR 36E9 = D586

not ax ; ax = NOT D586 = 2A79


The test Instruction
 The test instruction performs an AND, but does not
store the result
 It only sets the FLAG bits
 Just like cmp does a subtraction but never stores its result
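 Note that all boolean bitwise instructions set the
FLAG bits, except for the not operation, which leaves them unchanged
 Example (assuming ZF was 0 beforehand):
mov al, 0FFh
test al, 00h ; al AND 00h = 0, so ZF = 1
jz foo ; branches

mov al, 0FFh
not al ; al = 00h, but not leaves ZF untouched
jz foo ; does not branch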
Uses of Bitwise operations
 Bitwise operations are very useful to modify individual bits
within data
 This is done via “bit masks”, that is constant (immediate)
quantities with carefully chosen bits
 Example:
 Say you want to turn on bit 3 of a 2-byte value (counting from the
right, with bit 0 being the least significant bit)
 An easy way to do this is to OR the value with
0000000000001000, which is 8 in decimal
 Say the value is stored in ax
 You can simply execute the command:
or ax, 8 ; turns on bit 3 in ax
 Easy to generalize
 To turn on bits: use OR (with appropriate 1’s in the bit mask)
 To turn off bits: use AND (with appropriate 0’s in the bit mask)
 To flip bits: use XOR (with appropriate 1’s in the bit mask)
Bit Mask Operations Examples

mov eax, 04F346BA2h


or ax, 0F000h ; turns on 4 leftmost bits of ax
; eax = 4F34FBA2
xor eax, 000400000h ; inverts bit 22 of EAX
; eax = 4F74FBA2
xor ax, 0FFFFh ; 1’s complement of ax
; eax = 4F74045D
Turning on a specific bit
 Say you want to turn on a specific bit in some data,
but that you don’t know which one before you run
the program
 the index of the bit to turn on is contained in a register
 we need to build the bit mask “on the fly”
 Assuming that the index of the bit is initially in bl,
and that we wish to turn on a bit in eax
mov cl, bl ; must have the bit index in cl
mov ebx, 1 ; create a number 0...01
shl ebx, cl ; shift it left cl times
or eax, ebx ; turn on the desired bit using
; ebx as a mask!
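The same mask-building idea carries over directly to C. In this sketch the function names set_bit, clear_bit, and flip_bit are made up for illustration, and the bit index is assumed to be between 0 and 31.

```c
#include <stdint.h>

/* index must be in 0..31 for all three helpers */
uint32_t set_bit(uint32_t value, unsigned index)
{
    uint32_t mask = (uint32_t)1 << index; /* mov ebx, 1 / shl ebx, cl */
    return value | mask;                  /* or eax, ebx */
}

uint32_t clear_bit(uint32_t value, unsigned index)
{
    uint32_t mask = (uint32_t)1 << index;
    return value & ~mask;                 /* AND with a 0 in the chosen bit */
}

uint32_t flip_bit(uint32_t value, unsigned index)
{
    uint32_t mask = (uint32_t)1 << index;
    return value ^ mask;                  /* XOR with a 1 in the chosen bit */
}
```

For example, flip_bit(0x4F34FBA2, 22) gives 0x4F74FBA2, the same result as the xor eax, 000400000h shown earlier.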
An odd xor
 One often sees the following instruction:
xor eax, eax ; eax = 0
 This is a simple way to set eax to 0
 It is useful because its machine code is smaller than that of,
for instance, mov eax, 0
 Therefore one saves a few bytes in the text segment, and while
the program runs a few bytes fewer need to be loaded
from memory, saving perhaps a few cycles
 Lesson: One could do everything with operations that look like
those of high-level languages, but the good assembly
programmer (and the good compiler) will use bit operations to
save memory and/or time
Branching & Loop
Instructions
Control Structures
 So far we have seen instructions to
 Move data back and forth between memory and registers
 Do some data conversion
 Perform arithmetic operation on that data
 Now we’re going to learn about control structures, that is
instructions that modify the order in which instructions are executed
 i.e., we do not necessarily execute the next instruction
 High-level programming languages provide control structures
 for loops, while loops, if-then-else statements, etc.
 Assembly language provides much more basic control structures
 Mostly it provides a goto!
 A really infamous instruction, that causes horrendous “spaghetti code”
 Luckily, high-level control structures can be cleanly translated into
assembly code
 Therefore, one can write non-spaghetti assembly! (sort of)
Comparisons
 Control structures essentially decide which
instruction should be executed next based on
comparisons of data items
 In assembly, the result of a comparison is stored in
the bits of the FLAGS register
 The basic comparison instruction is cmp
 cmp subtracts one operand from another, and sets
the bits of FLAGS accordingly, but the result of the
subtraction is not stored anywhere
 Other arithmetic instructions also set bits of FLAGS
(add, sub, mul, etc.)
Unsigned Integers
 When you use unsigned integers the bits in the FLAGS
register (also called “flags”) that are important are:
 ZF: The Zero Flag (set to 1 if result is 0)
 CF: The Carry Flag
 During an arithmetic operation, used to detect overflow or to do
clever arithmetic since it may denote a carry or a borrow
 Consider: cmp a, b (which computes a-b)
 If a = b: ZF is set, CF is not set
 If a < b: ZF is not set, CF is set (borrow)
 If you were computing the difference for real, this would mean an
error!
 If a > b: ZF is not set, CF is not set
 Therefore, by looking at ZF and CF you can determine the
result of the comparison!
 We’ll see how we “look” at the flags shortly
Signed Integers
 For signed integers you should care about
three flags:
 ZF: zero flag
 OF: overflow flag (set to 1 if the result overflows or underflows)
 SF: sign flag (set to 1 if the result is negative)
 Consider: cmp a, b (which computes a-b)
 If a = b: ZF is set, OF is not set, SF is not set
 If a < b: ZF is not set, and SF ≠ OF
 If a > b: ZF is not set, and SF = OF
 Therefore, by looking at ZF, SF, and OF you can
determine the result of the comparison!
Signed Integers: SF and OF???
 Why do we have this odd relationship between SF
and OF?
 Consider two signed integers a and b, and
remember that we compute (a-b)
 If a < b
 If there is no overflow, then (a-b) is a negative number!
 If there is overflow, then (a-b) is (erroneously) a positive
number
 Therefore, in both cases SF ≠ OF
 If a > b
 If there is no overflow, the (correct) result is positive
 If there is an overflow, the (incorrect) result is negative
 Therefore, in both cases SF = OF
Signed Integers: SF and OF???
 Example: a = 80h (-128d), b = 23h (+35d) (a < b)
 a - b = a + (-b) = 80h + DDh = 15Dh
 dropping the 1, we get 5Dh (+93d), which is erroneously positive!
 So, SF=0 and OF=1
 Example: a = F3h (-13d), b = 23h (+35d)(a < b)
 a - b = a + (-b) = F3h + DDh = 1D0h
 dropping the 1, we get D0h (-48d), which is negative and we have no overflow (in range)
 So, SF=1 and OF=0
 Example: a = F3h (-13d), b = 82h (-126d) (a > b)
 a - b = a + (-b) = F3h + 7Eh = 171h
 dropping the 1, we get 71h (+113d), which is positive and we have no
overflow
 So, SF=0 and OF=0
 Example: a = 70h (112d), b = D8h (-40d)(a > b)
 a - b = a + (-b) = 70h + 28h = 98h, which is erroneously negative
 So, SF=1 and OF=1
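The four examples above can be reproduced with a small C model of an 8-bit cmp. The names cmp8 and cmp_flags_t are hypothetical, and the overflow test used here (the operands have different signs and the result's sign differs from a's) is one common way to express signed overflow for a subtraction.

```c
#include <stdint.h>

typedef struct { int zf, cf, sf, of; } cmp_flags_t;

/* Hypothetical model of "cmp a, b" on 8-bit operands (computes a - b). */
cmp_flags_t cmp8(uint8_t a, uint8_t b)
{
    uint8_t diff = (uint8_t)(a - b);    /* result, truncated to 8 bits */
    cmp_flags_t f;
    f.zf = (diff == 0);
    f.cf = (a < b);                     /* unsigned borrow */
    f.sf = (diff >> 7) & 1;             /* sign bit of the result */
    /* signed overflow: a and b differ in sign, and the
       result's sign differs from the sign of a */
    f.of = (((a ^ b) & (a ^ diff)) >> 7) & 1;
    return f;
}
```

For a = 80h and b = 23h this model gives SF = 0 and OF = 1, and for a = 70h and b = D8h it gives SF = 1 and OF = 1, in line with the first and last examples.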
In-Class Exercise

 What are the ZF, CF, SF, and OF flags for
“cmp a,b” for the following values
 a = 0F3h and b = 019h
 a = 074h and b = 082h
 a = 0A3h and b = 071h
In-Class Exercise
 a = 0F3h and b = 019h
 ZF = 0
 CF? (thinking of numbers as unsigned)
 a - b = 0F3h - 019h = something that’s still >0
 CF=0
 SF? (thinking of numbers as signed)
 a + (-b) = F3h + E7h = 1DAh, drop the 1
 DAh is negative
 SF = 1
 OF? (thinking of numbers as signed)
 a is negative, b is positive, DA is negative, we’re good
 OF = 0
In-Class Exercise
 a = 074h and b = 082h
 ZF = 0
 CF? (thinking of numbers as unsigned)
 a - b = 074h - 082h = something that’s <0
 CF=1
 SF? (thinking of numbers as signed)
 a + (-b) = 74h + 7Eh = F2h
 F2h is negative
 SF = 1
 OF? (thinking of numbers as signed)
 a is positive, b is negative, F2 is erroneously negative
 OF = 1
In-Class Exercise
 a = 0A3h and b = 071h
 ZF = 0
 CF? (thinking of numbers as unsigned)
 a - b = 0A3h - 71h = something that’s >0
 CF=0
 SF? (thinking of numbers as signed)
 a + (-b) = A3h + 8Fh = 132h, drop the 1
 32h is positive
 SF = 0
 OF? (thinking of numbers as signed)
 a is negative, b is positive, 32 is erroneously positive
 OF = 1
The FLAGS register

 It is very important to remember that many
instructions change the bits of the FLAGS
register
 So you should “act” on flag values
immediately, and not expect them to remain
unchanged inside FLAGS
 or you can save them by-hand for later use
perhaps
Summary

            cmp a,b   ZF   CF   OF   SF
unsigned    a=b       1    0    -    -
unsigned    a<b       0    1    -    -
unsigned    a>b       0    0    -    -
signed      a=b       1    -    0    0
signed      a<b       0    -    v    !v
signed      a>b       0    -    v    v
Branch Instructions

 A “branch” is basically a “goto” that says:
instead of executing the next instruction, go
execute that other one
 Two types of branches
 Unconditional (often called a “jump”)
 always branches
 Conditional
 branches only when some condition is true
The JMP Instruction
 JMP allows you to “jump” to a code label
 Example:

...
add eax, ebx
jmp here
sub al, bl ; never executed!
movsx ax, al ; never executed!
here:
call print_int
...
The JMP Instruction
 The ability to jump to a label in the assembly code is convenient
 In machine code there is no such thing as a label: only addresses
 So one would constantly have to compute addresses by hand
 e.g., “jump to the instruction +4319 bytes from here in the source code”
 e.g., “jump to the instruction -18 bytes from here in the source code”
 This is what programmers way back when used to do by hand, using
signed displacements in bytes
 The displacements are added to the EIP register (program counter)
 There are three versions of the JMP instruction in machine code:
 Short jump: Can only jump to an instruction that is within 128 bytes in
memory of the jump instruction (1-byte displacement)
 Near jump: 4-byte displacement (any location in the code segment)
 Far jump: very rare jump to another code segment
 We won’t use this at all
The JMP Instruction
 A short jump:
jmp label
or jmp short label
 A near jump:
jmp near label
 Why do we even have this?
 Remember that instructions are encoded in binary
 To jump one needs to encode the number of bytes to add/subtract to the
program counter
 If this number is large, we need many bits to encode it
 If this number is small, we want to use few bits so that our program
takes less space in memory
 i.e., the encoding of a short jmp instruction takes fewer bits than the
encoding of a near jmp instruction (3 bytes less)
 In a code that has 100,000 near jumps, if you can replace 50% of them
by short jumps, you save ~150KB (in the size of the executable)
Conditional Branches

 There is a large set of conditional branch
instructions
 The simple ones just branch (or not)
depending on the value of one of the flags:
 ZF, OF, SF, CF, PF
 PF: Parity Flag
 Set to 0 if the number of bits set to 1 in the lower 8-bit
of the “result” is odd, to 1 otherwise
Simple Conditional Branches

JZ branches if ZF is set
JNZ branches if ZF is unset
JO branches if OF is set
JNO branches if OF is unset
JS branches if SF is set
JNS branches if SF is unset
JC branches if CF is set
JNC branches if CF is unset
JP branches if PF is set
JNP branches if PF is unset
Example
 Consider the following C-like code
if (EAX == 0)
EBX = 1;
else
EBX = 2;
 Here it is in x86 assembler
cmp eax, 0 ; do the comparison
jz thenblock ; if = 0, then goto thenblock
mov ebx, 2 ; else clause
jmp next ; jump over the then clause
thenblock:
mov ebx, 1 ; then clause
next:
 Could use jnz and be the other way around
Another Example
 Say we have the following C code (let us assume that EAX is
signed)
if (EAX >= 5)
EBX = 1;
else
EBX = 2;
 This is much less straightforward
 Let’s go back to our table for signed numbers

After executing cmp eax, 5 (a = eax, b = 5), the signed rows of the table are:
cmp a,b   ZF   OF   SF
a=b       1    0    0
a<b       0    v    !v
a>b       0    v    v
 if (OF = SF) then a >= b
Another Example
 a>=b if (OF = SF)
 Skeleton program
cmp eax, 5 ; Comparison
???? ; Testing relevant flags
thenblock:
mov ebx, 1 ; “Then” block
jmp end
elseblock:
mov ebx, 2 ; “Else” block
end:
Another Example
 a>=b if (OF = SF)
 Program:
cmp eax, 5 ; do the comparison
jo oset ; if OF = 1 goto oset
js elseblock ; (OF=0) and (SF = 1) goto elseblock
jmp thenblock ; (OF=0) and (SF=0) goto thenblock
oset:
jns elseblock ; (OF=1) and (SF = 0) goto elseblock
jmp thenblock ; (OF=1) and (SF=1) goto thenblock
thenblock:
mov ebx, 1
jmp end
elseblock:
mov ebx, 2
end:
 Let’s check that it works
Another Example
cmp eax, 5 ; do the comparison
jo oset ; if OF = 1 goto oset
js elseblock ; (OF=0) and (SF = 1) goto elseblock
jmp thenblock ; (OF=0) and (SF=0) goto thenblock
oset:
jns elseblock ; (OF=1) and (SF = 0) goto elseblock
jmp thenblock ; (OF=1) and (SF=1) goto thenblock
; (unneeded instruction: we can just “fall through”)
thenblock:
mov ebx, 1
jmp end
elseblock:
mov ebx, 2
end:
A bit too hard?
 One can play tricks by putting the else block
before the then block
 The previous two examples are really
awkward, and it’s very easy to introduce bugs
 Consequently, x86 assembly provides other
branch instructions to make our life much
easier :)
 Let’s look at these instructions
More branches
cmp x, y
signed                         unsigned
Instruction   branches if      Instruction   branches if
JE            x = y            JE            x = y
JNE           x != y           JNE           x != y
JL, JNGE      x < y            JB, JNAE      x < y
JLE, JNG      x <= y           JBE, JNA      x <= y
JG, JNLE      x > y            JA, JNBE      x > y
JGE, JNL      x >= y           JAE, JNB      x >= y
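Each of these branches is just a test on the flags left by cmp x, y. The following C sketch spells the conditions out; the type branch_flags_t and the lowercase function names are assumptions made for illustration.

```c
#include <stdbool.h>

/* Flags as left by "cmp x, y" (hypothetical container type). */
typedef struct { bool zf, cf, sf, of; } branch_flags_t;

bool je(branch_flags_t f)  { return f.zf; }                   /* x == y */
/* signed comparisons use SF and OF */
bool jl(branch_flags_t f)  { return f.sf != f.of; }           /* x <  y */
bool jle(branch_flags_t f) { return f.zf || f.sf != f.of; }   /* x <= y */
bool jg(branch_flags_t f)  { return !f.zf && f.sf == f.of; }  /* x >  y */
bool jge(branch_flags_t f) { return f.sf == f.of; }           /* x >= y */
/* unsigned comparisons use CF */
bool jb(branch_flags_t f)  { return f.cf; }                   /* x <  y */
bool jbe(branch_flags_t f) { return f.cf || f.zf; }           /* x <= y */
bool ja(branch_flags_t f)  { return !f.cf && !f.zf; }         /* x >  y */
bool jae(branch_flags_t f) { return !f.cf; }                  /* x >= y */
```

As an example, cmp 0FEh, 01h leaves ZF=0, CF=0, SF=1, OF=0: jl is taken (-2 < 1 as signed values) while ja is also taken (254 > 1 as unsigned values).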
Redoing our Example
if (EAX >= 5)
EBX = 1;
else
EBX = 2;

cmp eax, 5
jge thenblock
mov ebx, 2
jmp end
thenblock:
mov ebx, 1
end:
Translating high-level structures

 We are used to using high-level structures
rather than just branches
 Therefore, it’s useful to know how to translate
these structures into assembly, so that we can
just use the same patterns as when writing,
say, C code
 A compiler does such translations
 Let’s start with a high-level control structure
we just talked about: if-then-else
If-then-Else
 A generic if-then-else construct:
if (condition) then
then_block
else
else_block;
 Translation into x86 assembly:

; instructions to set flags (e.g., cmp ...)
jxx else_block ; select xx so that it branches if condition==false
; code for the then block
jmp endif
else_block:
; code for the else block
endif:
No Else?
 A generic if-then construct (no else):
if (condition) then
then_block

 Translation into x86 assembly:


; instructions to set flags (e.g., cmp ...)
jxx endif ; select xx so that it branches if condition==false
; code for the then block
endif:
For Loops
 Let’s translate the following loop:
sum = 0;
for (i = 0; i <= 10; i++)
sum += i
 Translation
mov eax, 0 ; eax is sum
mov ebx, 0 ; ebx is i
loop_start:
cmp ebx, 10 ; compare i and 10
jg loop_end ; if (i > 10) goto loop_end
add eax, ebx ; sum += i
inc ebx ; i++
jmp loop_start ; goto loop_start
loop_end:
The loop instruction
 It turns out that, for convenience, the x86
assembly provides instructions to do loops!
 The instruction is called loop
 It is called as: loop <label>
 and does
 Decrement ecx (ecx has to be the loop index)
 If (ecx != 0), branches to the label
 Let’s try to do the loop in our previous
example
For Loops
 Let’s translate the following loop:
sum = 0;
for (i = 0; i <= 10; i++)
sum += i
 The x86 loop instruction requires that
 The loop index be stored in ecx
 The loop index be decremented
 The loop exits when the loop index is equal to zero
 Given this, we really have to think of this loop in reverse
sum = 0
for (i = 10; i > 0; i--)
sum += i
 This loop is equivalent to the previous one, but now it can be
directly translated to assembly using the loop instruction
Using the loop Instruction
 Here is our “reversed” loop
sum = 0
for (i = 10; i > 0; i--)
sum += i
 And the translation
mov eax, 0 ; eax is sum
mov ecx, 10 ; ecx is i
loop_start:
add eax, ecx ; sum += i
loop loop_start ; if i > 0, go to loop_start
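The behavior of the loop instruction in this translation can be mimicked in C with a do/while. This is a sketch under the assumption that the starting count is at least 1; the function name sum_with_loop is invented here.

```c
#include <stdint.h>

/* Mirrors the assembly above; n must be >= 1.
   With n == 0 the counter would wrap around, just as ecx would. */
uint32_t sum_with_loop(uint32_t n)
{
    uint32_t eax = 0;       /* sum */
    uint32_t ecx = n;       /* loop index */
    do {
        eax += ecx;         /* add eax, ecx */
        ecx--;              /* loop: first decrement ecx ... */
    } while (ecx != 0);     /* ... then branch back while ecx != 0 */
    return eax;
}
```

Note that the test comes after the body, so the body always runs at least once; sum_with_loop(10) adds 10 + 9 + ... + 1.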
While Loops
 A generic while loop
while (condition) {
body
}
 Translated as:
while:
; instructions to set flags (e.g., cmp...)
jxx end_while ; branches if condition=false
; body of loop
jmp while
end_while:
Do While Loops
 A generic do while loop
do {
body
} while (condition)
 Translated as:
do:
; body of loop
; instructions to set flags (e.g., cmp...)
jxx do ; branches if condition=true
Find average of two numbers
.model small
.stack 100
.data
No1 DB 63H ; First number storage
No2 DB 2EH ; Second number storage
Avg DB ? ; Average of two numbers
.code
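START:
MOV AX,@data ; [ Initialises
MOV DS,AX ; data segment ]
MOV AH,00H ; Clear AH so it can receive the carry
MOV AL,NO1 ; Get first number in AL
ADD AL,NO2 ; Add second to it
ADC AH,00H ; Put carry in AH
SHR AX,1 ; Divide the (unsigned) sum by 2
MOV Avg,AL ; Copy result to memory
MOV AH,4CH ; [ Return
INT 21H ; to DOS ]
END START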
Find sum of numbers in the array
.model small
.data
ARRAY DB 12H,24H,26H,63H,25H,86H,2FH,33H,10H,35H
SUM DW 0
.code
START:
MOV AX,@data ; [ Initialise
MOV DS,AX ; data segment ]
MOV CL,10 ; Initialise counter
XOR DI,DI ; Initialise pointer
LEA BX,ARRAY ; Initialise array base pointer
BACK:
MOV AL,[BX+DI] ; Get the number
MOV AH,00H ; Make higher byte 00h
ADD SUM,AX ; SUM = SUM + number
INC DI ; Increment pointer
DEC CL ; Decrement counter
JNZ BACK ; if not 0 go to back
MOV AH,4CH
INT 21H
END START
Find maximum number in the array
.model small
.stack 100
.data
ARRAY DB 63H,32H,45H,75H,12H,42H,09H,14H,56H,38H ; Array of ten numbers
MAX DB 0 ; Maximum number
.code
START:
MOV AX,@data ; [ Initialises
MOV DS,AX ; data segment ]
XOR DI,DI ; Initialise pointer
MOV CL,10 ; Initialise counter
LEA BX,ARRAY ; Initialise base pointer for array
MOV AL,MAX ; Get maximum number
BACK: CMP AL,[BX+DI] ; Compare number with maximum
JNC SKIP ; jump if no carry, if CF is 0 ( CF is set if there is a borrow )
MOV DL,[BX+DI] ; [ If number > MAX
MOV AL,DL ; MAX = number ]
SKIP: INC DI ; Increment pointer
DEC CL ; Decrement counter
JNZ BACK ; IF count = 0 stop; otherwise go BACK
MOV MAX,AL ; Store maximum number
END START
Separate even and odd numbers in array
.model small
.STACK 100
.data
ARRAY DB 12H,23H,26H,63H,25H,86H,2FH,33H,10H,35H
ARR_ODD DB 10 DUP (?)
ARR_EVEN DB 10 DUP (?)
.code
START:
MOV AX,@data ; [ Initialise
MOV DS,AX ; data segment ]
MOV CL,10 ; Initialise counter
XOR DI,DI ; Initialise odd_pointer
XOR SI,SI ; Initialise even_pointer
LEA BP,ARRAY ; Initialise array base_pointer
BACK:
MOV AL,DS:[BP] ; Get the number
TEST AL,01H ; Check the LSB without changing AL
JZ NEXT ; If LSB = 0 (even) go to NEXT
LEA BX,ARR_ODD ; [ Otherwise point to the odd array
MOV [BX+DI],AL ; and save number in odd array ]
INC DI ; Increment odd_pointer
JMP SKIP
NEXT:
LEA BX,ARR_EVEN ; [ Point to the even array
MOV [BX+SI],AL ; and save number in even array ]
INC SI ; Increment even_pointer

SKIP:
INC BP ; Increment array base_pointer
DEC CL ; Decrement counter
JNZ BACK ; if not 0 go to back
END START
Computing Prime Numbers
 The book has an example of an assembly
program that computes prime numbers
 Let’s look at it in detail
 Principle:
 Try possible prime numbers in increasing order
starting at 5
 Skip even numbers
 Test whether the possible prime number (the
“guess”) is divisible by any number other than 1
and itself
 If yes, then it’s not a prime, otherwise, it is
Computing Primes in C
unsigned int guess;
unsigned int factor;
unsigned int limit;
printf("Find primes up to: ");
scanf("%u",&limit);
printf("2\n3\n"); // prints the first 2 obvious primes
guess = 5; // we start the guess at 5
while (guess <= limit) {
factor = 3; // look for a possible factor
// we only look at factors < sqrt(guess)
while ( factor*factor < guess && guess % factor != 0 )
factor += 2;
if ( guess % factor != 0 ) // we never found a factor
printf("%u\n",guess);
guess += 2; // skip even numbers
}
Computing Primes in Assembly
// bss segment
unsigned int guess;
unsigned int factor;
unsigned int limit;
// data segment (message)
printf("Find primes up to: ");
scanf("%u",&limit);
// easy text segment
printf("2\n3\n"); // prints the first 2 obvious primes
guess = 5; // we start the guess at 5
// more difficult text segment
while (guess <= limit) {
factor = 3; // look for a possible factor
// we only look at factors < sqrt(guess)
while ( factor*factor < guess && guess % factor != 0 )
factor += 2;
if ( guess % factor != 0 ) // we never found a factor
printf("%u\n",guess);
guess += 2; // skip even numbers
}
Computing Primes in Assembly
// bss segment
unsigned int guess;
unsigned int factor;
unsigned int limit;
// data segment (message)
printf("Find primes up to: ");
scanf("%u",&limit);
// easy text segment
printf("2\n3\n"); // prints the first 2 obvious primes
guess = 5; // we start the guess at 5

%include "asm_io.inc"
segment .data
Message db "Find primes up to: ", 0
segment .bss
Limit resd 1 ; 4-byte int
Guess resd 1 ; 4-byte int
segment .text
global asm_main
asm_main:
enter 0, 0
pusha
mov eax, Message ; print the message
call print_string
call read_int ; read Limit
mov [Limit], eax
mov eax, 2 ; print "2\n"
call print_int
call print_nl
mov eax, 3 ; print "3\n"
call print_int
call print_nl
mov dword [Guess], 5 ; Guess = 5
Computing Primes in Assembly
while (guess <= limit) { // unsigned numbers
...
}
while_limit:
mov eax, [Guess]
cmp eax, [Limit] ; compare Guess and Limit
jnbe end_while_limit ; If !(Guess <= Limit) Goto end_while_limit

... ; body of the loop goes here

jmp while_limit
end_while_limit:

popa ; clean up
mov eax, 0 ; clean up
leave ; clean up
ret ; clean up
Computing Primes in Assembly
factor = 3; // look for a possible factor
// we only look at factors < sqrt(guess)
while ( factor*factor < guess && guess % factor != 0 )
factor += 2;
if ( guess % factor != 0 ) // we never found a factor
printf("%u\n",guess);
guess += 2; // skip even numbers

mov ebx, 3 ; ebx is factor
while_factor:
mov eax, ebx ; eax = factor
mul eax ; edx:eax = factor * factor
cmp edx, 0 ; if edx != 0, then we’re too big
jne end_while_factor ; factor too big (ZF = 0)
cmp eax, [Guess] ; compare factor*factor and guess
jnb end_while_factor ; if !(factor*factor < guess) leave the loop
mov edx, 0 ; edx = 0 (don’t forget to initialize edx)
mov eax, [Guess] ; eax = [Guess]
div ebx ; divide edx:eax by factor
cmp edx, 0 ; compare the remainder with 0
je end_while_factor ; if == 0, a factor was found (ZF = 1)
add ebx, 2 ; factor += 2
jmp while_factor ; loop back
end_while_factor:
je endif ; ZF = 1: guess is divisible by factor, not a prime
mov eax, [Guess] ; print guess
call print_int ; print guess
call print_nl ; print guess
endif:
add dword [Guess], 2 ; guess += 2
 We don’t choose eax for factor because eax is used by a lot of
functions/routines
Stacks
Why Stack?
There are several reasons why we need
stacks:
 To save register values if we run out of
registers.
 To pass parameters to subroutines
 To make space for local variables in
subroutines
 To preserve original register values if we
change them in a subroutine
 To fetch processor flag status
Stack operations
 last in first out (LIFO)
 Stack operations are mainly done by two
instructions: push and pop.
 The instruction push puts a value
onto the stack, while pop takes it back off.
 The syntax is like this:
push x
pop x
 Usually the operand x is a 16-bit
register.
 8-bit registers cannot be pushed on their own;
push the 16-bit register that contains them
(e.g., push ax to save al).
Memory Layout
 You should know that register CS by
default points to the segment where
the code resides. DS points to the
data segment, and ES usually points to the
data segment too. SS points to the stack
segment. In a "tiny" model program, CS, DS,
ES, and SS all point to the same segment,
which means that code, data, and stack
reside in the same region.
How can we manage this?
 The stack is pointed to not only by the SS register, but
also by the SP register.
 So, the pair SS:SP points to the top of the stack.
Initially, SP is set to the very bottom of the
segment in "tiny" mode, at address 0FFFEh (not
0FFFFh, which is the last byte of the segment).
 Each time we push something onto the stack,
SP is decremented by 2. If we pop
something, SP is incremented by 2.
 Our code and our data, on the other hand, start at offset
100h. So, the layout looks something like this:
Application
Other Uses
 Can we push a constant? On the 8086, no. On the 80186/80286
and later, yes. push 1 is then treated
as pushing a 16-bit value. There is no need to specify word ptr.
 A more useful application of push and pop is to push
the flags and then pop them into a register. That way, we can
examine the flag contents directly. Look at the
following code:
pushf
pop ax
 Now we can examine the flag values in register
AX. The net effect is the same as assigning the flags
to AX, which the mov instruction cannot
do.
 Likewise, you can set the flag values using push ax
followed by popf.
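The way push and pop move SP, as described in the stack slides above, can be sketched with a toy C model of one 64 KB segment. Everything here (stack_model_t, stack_init, push16, pop16) is a hypothetical illustration, not a description of how any emulator is implemented.

```c
#include <stdint.h>

typedef struct {
    uint8_t  mem[0x10000];  /* one 64 KB segment */
    uint16_t sp;            /* offset of the top of the stack */
} stack_model_t;

/* SP starts at 0FFFEh, as in the "tiny" layout described above. */
void stack_init(stack_model_t *s) { s->sp = 0xFFFE; }

void push16(stack_model_t *s, uint16_t value)
{
    s->sp = (uint16_t)(s->sp - 2);                         /* SP goes down by 2 */
    s->mem[s->sp] = (uint8_t)(value & 0xFF);               /* low byte first */
    s->mem[(uint16_t)(s->sp + 1)] = (uint8_t)(value >> 8); /* then high byte */
}

uint16_t pop16(stack_model_t *s)
{
    uint16_t value = (uint16_t)(s->mem[s->sp]
                   | (s->mem[(uint16_t)(s->sp + 1)] << 8));
    s->sp = (uint16_t)(s->sp + 2);                         /* SP goes back up by 2 */
    return value;
}
```

Pushing two values and popping them returns them in reverse order (LIFO), and SP ends up back at 0FFFEh.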
Subroutines
Subprograms

 Subprograms (functions, procedures,
methods) are key to making programs easier
to read and write (code reuse)
 We are going to see how to define and call
subprograms in assembly
 Useful to write large(r) assembly programs
 More importantly, will allow us to understand how
subprograms work in higher-level languages
What is a subprogram?
 A subprogram is a piece of code that starts at
some address in the text segment
 The program can jump to that address to
“call” the subprogram
 When the subprogram is done executing it
jumps back to the instruction after the call
 The subprogram can take parameters
 Let’s see how we can implement this using
only what we’ve seen so far in the course
Example Subprogram
 Say we want to write a subprogram that computes
some numerical function of two operands and
“returns” the result
 e.g., because we need to compute that function often
 We will write the program so that when it is called,
the first operand is in eax and the second in ebx,
and when it returns the result is in eax
 This is a convention that we make, and that should be
documented in the code
 Calling the program can then be done via a simple
jmp
 Let’s look at the code
“By hand” subprogram
...
mov eax, 12 ; first operand = 12
mov ebx, 14 ; second operand = 14
jmp func ; “call” the function
ret1:
...
...
func:
add eax, ebx ; do something with eax and ebx
; put result in eax
jmp ret1 ; “return” to the instruction
; after the call
 Why isn’t this really a valid implementation of a subprogram?
Multiple Calls?
 Typically we want to call a function from multiple
places in a program
 The problem with the previous code is that the
function always returns to a single label!
...
jmp func ; “call” the function
ret1:
...
jmp func ; “call” the function
ret2:
...
func:
...
jmp ??? ; where do we return???
A Better Function Call

 To fix our previous example, we simply need
to remember the place where the function
should return!
 This can be done by storing the address of
the instruction after the call in a register, say,
register ecx
 The code for the function then can just return
to whatever instruction ecx points to
 Again, this is a convention that we decide as a
programmer and that we must remember
A Better Function Call
...
mov ecx, ret1 ; store the return address
jmp func ; “call” the function
ret1:
...
mov ecx, ret2 ; store the return address
jmp func ; “call” the function
ret2:
...
func:
...
jmp ecx ; return
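The fragment above can be filled out into a complete program. This is only a sketch under assumptions that are not in the slides: NASM syntax, a 32-bit Linux target, and the process exit status used as a way to observe the result.

```nasm
; Assumption: NASM syntax, 32-bit Linux
;   nasm -f elf32 byhand.asm && ld -m elf_i386 byhand.o -o byhand
        global  _start
        section .text

_start:
        mov     eax, 12         ; first operand
        mov     ebx, 14         ; second operand
        mov     ecx, ret1       ; remember where to come back to
        jmp     func            ; first "call": EAX = 12 + 14 = 26
ret1:
        mov     ebx, 4          ; new second operand
        mov     ecx, ret2       ; a different return address this time
        jmp     func            ; second "call": EAX = 26 + 4 = 30
ret2:
        mov     ebx, eax        ; exit status = result (30)
        mov     eax, 1          ; Linux sys_exit
        int     0x80

func:
        add     eax, ebx        ; result in EAX
        jmp     ecx             ; return to the address saved in ECX
```

Running it and then checking the exit status (for example with `echo $?`) should show 30. Note that ECX is now unavailable to both the caller and the function, which is one of the costs of this convention.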
All Good, but ...
 So at this point, we can do any function call
 We just need to decide on convention about which registers
hold
 input parameters
 return value
 return address
 The problem is that this gets very cumbersome
 It requires a bunch of “ret” labels
 The return address can also be computed numerically as “$ + x”, where x
is the number of bytes from the current instruction to the instruction
after “jmp func”, which is very awkward
 It forces the programmer to constantly keep track of registers
and be careful to save and restore important values
 Solution:
 A stack
 Two new instructions: CALL and RET
The Stack
 A stack is a Last-In-First-Out data structure
 Provides two operations
 Push: puts something on the stack
 Pop: removes something from the stack
 Defined by the address of the “element” at the top of the stack
 Push: puts the element on top of the stack and moves the stack
pointer to this new top
 Pop: gets the element from the top of the stack and moves the
stack pointer back to the element below
 Our stack only allows pushing/popping of elements that are double
words (4-byte elements)
 Not “quite” true, but a much safer approach
The Stack and the ESP Register
 Initially the stack is empty and the ESP register has some value
 Pushing an element:
 Decrease ESP by 4
 Write 4 bytes at address ESP
 Examples:
 push eax
 push dword 42
 Popping an element:
 Get the value from the top of the stack into a register
 Increase ESP by 4
 Examples:
 pop eax
 pop ebx
 Accessing an element:
 Read the 4 bytes at address ESP
 Example:
 mov eax, [esp]
Example Stack Instructions
 Assuming that ESP = 00001000h
push dword 1 ; ESP = 00000FFCh
push dword 2 ; ESP = 00000FF8h
push dword 3 ; ESP = 00000FF4h
pop eax ; EAX = 3
pop ebx ; EBX = 2
pop ecx ; ECX = 1
 Memory after the three pushes (addresses increase upward; each
double word is stored little endian):
Address      Byte
00001000h    (initial ESP)
00000FFFh    0
00000FFEh    0
00000FFDh    0
00000FFCh    1
00000FFBh    0
00000FFAh    0
00000FF9h    0
00000FF8h    2
00000FF7h    0
00000FF6h    0
00000FF5h    0
00000FF4h    3
The ESP Register

 The ESP register always contains the
address of the element at the top of the stack
 Do not use it for anything else!
 Its value is typically updated by the push
and pop instructions
 Sometimes we’ll update it by hand
 See this in a few slides
PUSHA and POPA
 One use of the stack is to save/restore register values
 For instance, say your program uses eax and calls a function written
by somebody else
 You have no idea (or don’t care to know) whether that function uses
eax also
 If it does, your eax will be corrupted
 One easy solution:
 push eax onto the stack
 call the function
 pop eax to restore its value
 The x86 offers two convenient instructions
 PUSHA: pushes EAX, ECX, EDX, EBX, the original ESP, EBP, ESI, and
EDI onto the stack, in that order
 POPA: pops them back in reverse order (the saved ESP value is discarded)
 It’s now simple to say “save all my registers” and “restore my
registers”
The CALL and RET Instructions
 One of the annoying things with our previous
subprogram was that we had to manage the return
address
 In our example we stored it into the ECX register
 Two convenient instructions can do this for us
 CALL:
 Puts the address of the next instruction on the stack
 Unconditionally jumps to a label (calling a function)
 RET:
 Pops the stack and gets the return address
 Unconditionally jumps to that address (returning from a
function)
Without CALL and RET
...
mov ecx, ret1 ; store the return address
jmp func ; “call” the function
ret1:
...
mov ecx, ret2 ; store the return address
jmp func ; “call” the function
ret2:
...
func:
...
jmp ecx ; return
With CALL and RET
...
call func ; call the function
...
call func ; call the function
...
func:
...
ret ; return
Nested Calls
 The use of the stack enables nested calls
 Return addresses are popped in the reverse order in which
they were pushed (Last-In-First-Out)
 Warning: one must be extremely careful to pop
everything that’s pushed on the stack inside a
function
 Example of erroneous use of the stack:
func:
mov eax, 12 ;
push eax ; put eax on the stack
ret ; pops the pushed value (12) and jumps
; to it as if it were a return address!!
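For contrast, here is a sketch of a correct nested call in which everything pushed inside a function is popped before its RET. The program framing (NASM syntax, 32-bit Linux, exit status as the visible result) and the names `outer` and `inner` are assumptions made for illustration.

```nasm
; Assumption: NASM syntax, 32-bit Linux
;   nasm -f elf32 nested.asm && ld -m elf_i386 nested.o -o nested
        global  _start
        section .text

_start:
        mov     eax, 5
        call    outer           ; pushes return address #1
        mov     ebx, eax        ; exit status = 12
        mov     eax, 1          ; Linux sys_exit
        int     0x80

outer:
        push    eax             ; temporarily save 5 on the stack
        call    inner           ; pushes return address #2 above the saved 5
        pop     ebx             ; EBX = 5; only return address #1 is left
        add     eax, ebx        ; 7 + 5 = 12
        ret                     ; pops return address #1

inner:
        mov     eax, 7
        ret                     ; pops return address #2
```

The return addresses come off the stack in the reverse of the order in which they went on, and `outer` can return safely only because its `push` is matched by a `pop`.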
Activation Records
 The stack is useful to store and retrieve return
addresses, transparently managed via the CALL and
RET instructions
 But it’s much more useful than this
 In general, when calling a function, one puts all kinds of
useful information on the stack
 When the function returns, this information is popped off
the stack and the function’s caller can safely resume
execution
 The set of “useful information” is typically called an
activation record (or a “stack frame”)
 One very important component of an activation record is
the parameters passed to the function
 Another is the return address, as we’ve already seen
Subprogram Conventions
 Typically, one uses a consistent calling convention, so that there is a
generic way to call a subprogram
 Of course compilers use calling conventions
 The compiler, when generating assembly code, must follow a standard
process to generate assembly corresponding to function calls and
returns
 Some languages specify which calling convention should be used
 What we describe in all that follows is mostly the convention used
by the C language
 i.e., C compilers should use this convention when generating assembly
code from C code
A Simple Activation Record
 To call a function you have to follow the following steps:
 Push the parameters onto the stack
 Execute the CALL instruction, which pushes the return address
onto the stack

 Warning: In the C calling convention parameters are
pushed onto the stack in reverse order!
 Say the function is f(a,b,c)
 c is pushed onto the stack first
 b is pushed onto the stack second
 a is pushed onto the stack third
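A small sketch of this ordering for f(a,b,c) with a = 1, b = 2, c = 3. The body of `f` is hypothetical (it packs the three values into the decimal number 123 so the order is visible), and the NASM/32-bit Linux framing is an assumption.

```nasm
; Assumption: NASM syntax, 32-bit Linux
;   nasm -f elf32 order.asm && ld -m elf_i386 order.o -o order
        global  _start
        section .text

_start:
        push    dword 3         ; c is pushed first
        push    dword 2         ; b is pushed second
        push    dword 1         ; a is pushed last, right below the return address
        call    f
        add     esp, 12         ; caller removes the 3 parameters (3 * 4 bytes)
        mov     ebx, eax        ; exit status = 123
        mov     eax, 1          ; Linux sys_exit
        int     0x80

; f(a, b, c): hypothetical body, returns a*100 + b*10 + c in EAX
f:
        mov     eax, [esp+4]    ; a (closest to the return address)
        imul    eax, 10
        add     eax, [esp+8]    ; b
        imul    eax, 10
        add     eax, [esp+12]   ; c (pushed first, so farthest away)
        ret
```

Because of the reverse push order, the first parameter always sits at the same offset from the return address, no matter how many parameters follow it.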
A Simple Activation Record
 Say you want to call a function with 2 32-bit parameters
 If parameters are < 32 bits, they need to be converted to 32-bit
values
 After the call, the stack looks like this:

ESP+8    2nd parameter
ESP+4    1st parameter
ESP      return address
 These three entries form the activation record; the stack grows
toward lower addresses (downward in this picture)
ESP and EBP
 There is one problem with referencing parameters
using ESP, as in [ESP+8]
 If the subprogram uses the stack for something else,
ESP will be modified!
 So at some point in the program, the 2nd parameter
should be accessed as [ESP+8]
 And at some other point, it may be accessed as [ESP+12],
[ESP+16], etc., depending on how the stack grows
 So the convention is to use the EBP register to save
the value of ESP as soon as the subprogram starts
 Afterwards, the 2nd parameter is always accessed
as [EBP+8] and the 1st parameter is always
accessed as [EBP+4]
ESP and EBP
 Stack as it is when the subprogram begins
ESP+8    2nd parameter
ESP+4    1st parameter
ESP      return address
 After setting EBP = ESP
EBP+8        2nd parameter
EBP+4        1st parameter
EBP = ESP    return address
 Further use of the stack
ESP+16    EBP+8    2nd parameter
ESP+12    EBP+4    1st parameter
ESP+8     EBP      return address
ESP+4              stuff
ESP                stuff
 Parameters are still referred to as [EBP+4] and [EBP+8]
ESP and EBP
 So far so good, but the caller may have been using EBP!
 Typically to access its own parameters
 So the convention is to first save the value of EBP onto the
stack and then set EBP = ESP, as soon as the subprogram starts
 So, the stack right before the subprogram truly begins is:
ESP+12       2nd parameter
ESP+8        1st parameter
ESP+4        return address
EBP = ESP    old value of EBP
 Parameter accesses:
 1st parameter: [EBP+8]
 2nd parameter: [EBP+12]
 At the end of the subprogram, the old value of EBP is
restored with a simple POP instruction
Subprogram Skeleton

func:
push ebp ; save original EBP
mov ebp, esp ; set EBP = ESP

... ; subprogram code

pop ebp ; restore original EBP


ret ; returns
Returning from a Subprogram
 After the subprogram returns, one must “clean up” the stack
 The stack has on it:
 The return address
 The parameters
 The old EBP value
 The old EBP value must be popped in the subprogram (at the end)
 The return address is removed by the RET instruction
 You don’t see the POP, but it’s there
 So the only thing that must be removed from the stack are the
parameters
 The C convention specifies that the caller code must do this
 Other languages specify that the callee must do it
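 In fact, it’s a little bit more efficient to have the subprogram
(i.e., the callee) do it, since x86 offers a RET variant that also
removes a given number of bytes of parameters
 So one may wonder why C opts for the slower approach
 Turns out, it’s all because of varargs: with a variable number of
arguments, only the caller knows how many parameters were pushed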
Using the Parameters
 Inside the code of the subprogram, parameters can
be simply accessed via indirection from the stack
pointer
 In our previous example:
 mov eax, [ESP + 4] ; put 1st parameter into eax
 mov ebx, [ESP + 8] ; put 2nd parameter into ebx
 Typically the subprogram does not pop the
parameters off the stack before using them
 It would be annoying to have to pop the return address
first, and then push it back
 It’s convenient to have the parameters always stored in
memory as opposed to being careful to constantly
preserve them in registers
Example: Calling a Subprogram
Caller:
push dword 2 ; second parameter
push dword 1 ; first parameter
call func ; call the function
add esp, 8 ; pop the two arguments

 Note that to pop the two arguments we merely add 8 to the
stack pointer ESP
 Since we do not care to get the values of the arguments at this
point, it’s quicker than executing pop twice!
 For the case with one argument, using pop may be better
 The two arguments stay there in memory but will be
overwritten next time a function is called or next time the stack
is used
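To go with the caller above, a callee that follows the EBP convention might look like the sketch below. The body of `func` is hypothetical (it returns 10 times the first parameter plus the second), and the NASM/32-bit Linux framing is an assumption.

```nasm
; Assumption: NASM syntax, 32-bit Linux
;   nasm -f elf32 call.asm && ld -m elf_i386 call.o -o call
        global  _start
        section .text

_start:
        push    dword 2         ; second parameter
        push    dword 1         ; first parameter
        call    func            ; call the function
        add     esp, 8          ; pop the two arguments
        mov     ebx, eax        ; exit status = 12
        mov     eax, 1          ; Linux sys_exit
        int     0x80

func:
        push    ebp             ; save original EBP
        mov     ebp, esp        ; set EBP = ESP
        mov     eax, [ebp+8]    ; 1st parameter (1)
        imul    eax, 10
        add     eax, [ebp+12]   ; 2nd parameter (2): EAX = 12
        pop     ebp             ; restore original EBP
        ret                     ; the parameters are still on the stack here
```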
Return Values?
 Often, one wants a subprogram to return a value
 e.g., a function that computes some number
 There are several ways to do this
 One way is to pass as a parameter the address of a zone of
memory in which some result should be written
 As in: void foo(int *x); foo(&a);
 This is not a true return value
 Another way is a true return value
 As in: int foo();
 The C convention is that the return value (an integer or pointer
that fits in 32 bits) is stored in EAX when the function returns
 It’s the responsibility of the caller to save the EAX value before
the call (if needed) and to restore it later
 In some of our previous examples, we just didn’t use EAX to hold
anything important so that this issue never arose
 e.g., when calling read_int(), read_char(), etc.
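Both styles can be sketched side by side. The functions `foo` and `bar`, the variable `a`, and the constants 42 and 8 are made up for illustration, and the NASM/32-bit Linux framing is an assumption.

```nasm
; Assumption: NASM syntax, 32-bit Linux
;   nasm -f elf32 retval.asm && ld -m elf_i386 retval.o -o retval
        global  _start

        section .bss
a:      resd    1               ; int a;

        section .text
_start:
        push    dword a         ; pass the address of a, as in foo(&a)
        call    foo
        add     esp, 4          ; remove the parameter
        call    bar             ; EAX = 8, as in x = bar()
        add     eax, [a]        ; 8 + 42 = 50
        mov     ebx, eax        ; exit status = 50
        mov     eax, 1          ; Linux sys_exit
        int     0x80

; void foo(int *x): writes its result through the pointer
foo:
        push    ebp
        mov     ebp, esp
        mov     eax, [ebp+8]    ; EAX = x (an address)
        mov     dword [eax], 42 ; *x = 42
        pop     ebp
        ret

; int bar(): returns its result in EAX
bar:
        mov     eax, 8
        ret
```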
Saving Registers in Subprograms
 Just saving EBP
func:
push ebp ; save original EBP
mov ebp, esp ; set EBP = ESP

... ; subprogram code

mov eax, ... ; set return value

pop ebp ; restore original EBP


ret ; returns
Saving Registers in Subprograms
 Saving EBX and ECX in addition to EBP

func:
push ebp ; save original EBP
mov ebp, esp ; set EBP = ESP
push ebx ; save EBX
push ecx ; save ECX

... ; subprogram code

mov eax, ... ; set return value

pop ecx ; restore ECX


pop ebx ; restore EBX
pop ebp ; restore ebp
ret ; returns
Saving Registers in Subprograms
 Saving “all” registers using PUSHA and POPA
func:
push ebp ; save original EBP
mov ebp, esp ; set EBP = ESP
pusha ; save all (including new EBP)
... ; subprogram code
mov eax, ... ; set return value
popa ; restore all (including new EBP)
pop ebp ; restore original ebp
ret ; returns
 Problem? POPA overwrites the return value that’s stored in eax!
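One possible workaround, not shown in the slides, is to write the return value into the stack slot where PUSHA saved EAX, so that POPA itself loads it. The sketch below assumes NASM syntax in 32-bit mode, where PUSHA stores EAX first, leaving its copy at [ESP+28] immediately after the PUSHA; it also assumes nothing else has been left on the stack by the time of the store.

```nasm
; Assumption: NASM syntax, 32-bit Linux
;   nasm -f elf32 pusha.asm && ld -m elf_i386 pusha.o -o pusha
        global  _start
        section .text

_start:
        mov     ebx, 7          ; a value the caller wants preserved
        call    func            ; EAX = 42, EBX still 7
        add     eax, ebx        ; 42 + 7 = 49
        mov     ebx, eax        ; exit status = 49
        mov     eax, 1          ; Linux sys_exit
        int     0x80

func:
        push    ebp             ; save original EBP
        mov     ebp, esp        ; set EBP = ESP
        pusha                   ; saved order: EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI

        mov     ebx, 99         ; subprogram code freely clobbers registers
        mov     ecx, 99
        mov     eax, 42         ; return value

        mov     [esp+28], eax   ; overwrite the saved copy of EAX
        popa                    ; restores everything; EAX gets 42
        pop     ebp             ; restore original EBP
        ret
```

A simpler alternative is to save only the registers the subprogram actually modifies, as in the previous slide.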
Local Variables in Subprograms
 In all the examples we have seen so far, the subprograms
were able to do their work using only registers
 But sometimes, a subprogram’s needs are beyond the set of
available registers and some data must be kept in memory
 Just think of all subprograms you wrote that used more than 6
local variables (EAX, EBX, ECX, EDX, ESI, EDI)
 One possibility could be to declare a small .bss segment for
each subprogram, to reserve memory space for all local
variables
 Drawback #1: memory waste
 This reserved memory consumes memory space for the entire
duration of the execution even if the subprogram is only active
for a tiny fraction of the execution time
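 Drawback #2: subprograms are not reentrant (a recursive or nested
invocation would overwrite the variables of the invocation still in progress)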
Local variables on the stack
 Since activation records on the stack are used to
store relevant information pertaining to a
subprogram, why not use it for storing the
subprogram local variables?
 The standard approach is to store local variables
right after the saved EBP value on the stack
 This is simply done by subtracting some amount from
ESP
 The local variables are then accessed as [EBP - 4],
[EBP - 8], etc.
 Let’s see this in an example
Local Variable Examples
 Say we have a subprogram that takes 2 parameters, uses 3
local variables, and doesn’t return any value
 The code of the subprogram is as follows:
func:
push ebp ; save old EBP value
mov ebp, esp ; set EBP
sub esp, 12 ; allocate space for 3 local variables
; subprogram body
mov esp, ebp ; deallocate local variables
pop ebp ; restore old EBP value
ret

 Let’s look at the content of the stack when the subprogram
body begins
Local Variables Example
 Inside the body of the subprogram, parameters are referenced as:
 [EBP+12]: 2nd parameter
 [EBP+8]: 1st parameter
 Inside the body of the subprogram, local variables are referenced as:
 [EBP-4]: 1st local variable
 [EBP-8]: 2nd local variable
 [EBP-12]: 3rd local variable
 Stack layout:
EBP+12    2nd parameter
EBP+8     1st parameter
EBP+4     return address
EBP       saved EBP
EBP-4     1st local var
EBP-8     2nd local var
EBP-12    3rd local var
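A runnable sketch of this layout follows. Unlike the subprogram described above, this hypothetical `func` returns a value (the sum of its parameters) so that the effect can be observed; what it stores in each local variable is also made up. NASM syntax and a 32-bit Linux target are assumed.

```nasm
; Assumption: NASM syntax, 32-bit Linux
;   nasm -f elf32 locals.asm && ld -m elf_i386 locals.o -o locals
        global  _start
        section .text

_start:
        push    dword 4         ; 2nd parameter
        push    dword 3         ; 1st parameter
        call    func
        add     esp, 8          ; remove the parameters
        mov     ebx, eax        ; exit status = 7
        mov     eax, 1          ; Linux sys_exit
        int     0x80

func:
        push    ebp             ; save old EBP value
        mov     ebp, esp        ; set EBP
        sub     esp, 12         ; allocate space for 3 local variables

        mov     eax, [ebp+8]
        mov     [ebp-4], eax    ; 1st local = 1st parameter
        mov     eax, [ebp+12]
        mov     [ebp-8], eax    ; 2nd local = 2nd parameter
        mov     eax, [ebp-4]
        add     eax, [ebp-8]
        mov     [ebp-12], eax   ; 3rd local = sum of the other two
        mov     eax, [ebp-12]   ; return the 3rd local in EAX

        mov     esp, ebp        ; deallocate local variables
        pop     ebp             ; restore old EBP value
        ret
```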
Functions example
 Let's make a subroutine to calculate
1+2+...+n.
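The slide's own code is not reproduced here, so the following is only one way such a subroutine could look. It assumes NASM syntax, a 32-bit Linux target, and the stack-based convention described earlier; the name `sum_to_n` is made up.

```nasm
; Assumption: NASM syntax, 32-bit Linux
;   nasm -f elf32 sum.asm && ld -m elf_i386 sum.o -o sum
        global  _start
        section .text

_start:
        push    dword 10        ; n = 10
        call    sum_to_n
        add     esp, 4          ; remove the parameter
        mov     ebx, eax        ; exit status = 55
        mov     eax, 1          ; Linux sys_exit
        int     0x80

; sum_to_n(n)
;   Input:   n (unsigned) at [EBP+8]
;   Output:  EAX = 1 + 2 + ... + n  (0 when n = 0)
;   Saves:   ECX and EBP
sum_to_n:
        push    ebp
        mov     ebp, esp
        push    ecx             ; save ECX, used as the counter
        mov     ecx, [ebp+8]    ; ECX = n
        xor     eax, eax        ; running sum = 0
.next:
        test    ecx, ecx
        jz      .done           ; counter reached 0
        add     eax, ecx        ; sum += counter
        dec     ecx
        jmp     .next
.done:
        pop     ecx             ; restore ECX
        pop     ebp
        ret
```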
Document a subroutine
 It is a good habit to document a
subroutine. At least give a comment
above it.
Routine Placement
Conclusion
 When programming one always faces trade-offs between
program readability and program performance
 Choices must be made based on the task at hand
 With by-hand assembly programming, the programmer can
make fine-tuned decisions for these trade-offs
 e.g., for a particular function I decide to not save all registers
because I _know_ that it won’t corrupt them, thus saving a bit of
time
 e.g., I know that I can reuse some register value that was
modified in a subprogram to do some clever optimization
 Some of these optimizations can only be done by a human
who understands what the program does
 Some of these optimizations can sometimes be done by a
compiler that generates assembly code from a program
written in some high-level language
Arrays
Array Revisited
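 As a refresher, a ten-byte array is declared with a single
data directive that reserves ten consecutive bytes under one label.
 The first element is loaded into register al by reading
the byte at the address of that label.
 The second, the third, and the fourth elements are at
offsets 1, 2, and 3 from the label.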
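A sketch of these three steps, assuming NASM syntax and a DOS .COM program; the array name `arr` and its contents are made up for illustration.

```nasm
; Assumption: NASM syntax, built as a DOS .COM file
;   nasm -f bin array.asm -o array.com
        org 100h

start:
        mov     al, [arr]       ; first element  (1)
        mov     al, [arr+1]     ; second element (2)
        mov     al, [arr+2]     ; third element  (3)
        mov     al, [arr+3]     ; fourth element (4)

        mov     ah, 4Ch         ; DOS service 4Ch: terminate, exit code in AL (4)
        int     21h

arr:    db 1, 2, 3, 4, 5, 6, 7, 8, 9, 10   ; a ten-byte array
```

Because each element is one byte, the offset of an element is simply its index. For a word array the offsets would be 0, 2, 4, and so on.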
Access Array through a
loop
Reverse array example

Note:
BX is nicknamed the 'base register',
SI the 'source index', and
DI the 'destination index'.
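One way to reverse an array in place with these registers is sketched below: SI walks forward from the first element, DI walks backward from the last, and the two elements are swapped until the pointers meet. The slide's own code is not reproduced here; the array contents, the constant `LEN`, and the NASM/.COM framing are assumptions.

```nasm
; Assumption: NASM syntax, built as a DOS .COM file
;   nasm -f bin reverse.asm -o reverse.com
        org 100h

LEN     equ 10                  ; number of elements in arr

start:
        mov     si, arr             ; SI -> first element
        mov     di, arr + LEN - 1   ; DI -> last element
.swap:
        cmp     si, di
        jae     .done               ; pointers met or crossed: finished
        mov     al, [si]            ; AL = front element
        xchg    al, [di]            ; back slot gets front element, AL gets back element
        mov     [si], al            ; front slot gets back element
        inc     si
        dec     di
        jmp     .swap
.done:
        mov     al, [arr]           ; first element is now 10
        mov     ah, 4Ch             ; DOS service 4Ch: terminate, exit code in AL
        int     21h

arr:    db 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
```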
XCHG
 The XCHG instruction swaps the contents of its two operands
(two registers, or a register and a memory location)
Interrupt
Essentials
Introduction to Interrupt
 Interrupts can be seen as a collection of
ready-made functions.
 These functions make programming
much easier: instead of writing code to
print a character, you can simply call the
interrupt and it will do everything for you.
 There are also interrupt functions that
work with disk drives and other hardware.
We call such functions software
interrupts.
 Interrupts are also triggered by
hardware; these are called hardware
interrupts. For now we are interested
in software interrupts only.
Introduction to Interrupt
 To make a software interrupt there is an INT instruction, which has a
very simple syntax:
INT value
 Here value can be a number from 0 to 255 (or 0 to 0FFh);
generally we will use hexadecimal numbers.
 You may think that there are only 256 functions, but that is not
correct.
 Each interrupt may have sub-functions.
 To specify a sub-function, the AH register should be set before
calling the interrupt.
 Each interrupt may have up to 256 sub-functions (so we get 256
* 256 = 65536 functions).
Introduction to Interrupt
cont.
 The interrupt number alone is not enough.
 An interrupt behaves differently depending on which
service number is called.
 The service number is usually placed in AH.
 The sub-service number is usually placed in AL.
 This interrupt mechanism is pretty much like a
phone number.
Input and Output to
Screen
Output to Screen
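 After the start label we invoke
interrupt number 21h, service 09h.
 Interrupt 21h is reserved for operating
system (DOS) calls.
 When you look up service 09h of
interrupt 21h in an interrupt list, you find
that it prints the string whose address is in
DS:DX, up to (but not including) the
terminating “$”.
 To insert a new line, append the carriage
return and line feed characters (13, 10) to the
message declaration, before the “$”.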
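A sketch of such a program with a new line at the end of the message. The message text is made up, and NASM syntax with a .COM layout is assumed (in a .COM program DS already points at the program's own segment).

```nasm
; Assumption: NASM syntax, built as a DOS .COM file
;   nasm -f bin hello.asm -o hello.com
        org 100h

start:
        mov     dx, msg         ; DS:DX -> "$"-terminated string
        mov     ah, 09h         ; service 09h: display string
        int     21h

        mov     ax, 4C00h       ; service 4Ch: terminate with exit code 0
        int     21h

msg:    db "Hello, world!", 13, 10, "$"   ; 13, 10 = carriage return, line feed
```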
Input from keyboard
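 Interrupt 21h service 0Ah offers a means
to input a line from the keyboard. According to
the interrupt lists, DS:DX must point to a buffer
whose first byte holds the maximum number of
characters to read (including the final carriage
return); DOS stores the number of characters
actually read in the second byte and the
characters themselves from the third byte on.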
Input from keyboard
example

Buffer
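A sketch of reading a line and echoing it back. The buffer size of 20 and the label names are arbitrary choices, and NASM syntax with a .COM layout is assumed. The first buffer byte is the maximum length (including the final carriage return), the second byte is filled in by DOS with the number of characters typed, and the text starts at the third byte.

```nasm
; Assumption: NASM syntax, built as a DOS .COM file
;   nasm -f bin input.asm -o input.com
        org 100h

start:
        mov     dx, buffer      ; DS:DX -> input buffer
        mov     ah, 0Ah         ; service 0Ah: buffered keyboard input
        int     21h

        xor     bx, bx
        mov     bl, [buffer+1]  ; BX = number of characters typed (without CR)
        mov     byte [buffer+2+bx], '$'   ; replace the CR with a "$" terminator

        mov     dx, crlf        ; move to a new line
        mov     ah, 09h
        int     21h
        mov     dx, buffer+2    ; echo the text that was typed
        mov     ah, 09h
        int     21h

        mov     ax, 4C00h       ; terminate with exit code 0
        int     21h

crlf:   db 13, 10, "$"
buffer: db 20                   ; maximum characters to read, CR included
        db 0                    ; filled in by DOS: characters actually read
        times 20 db 0           ; the characters, followed by CR
```

If the typed text itself contains a "$", service 09h stops echoing at that point.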
Output: A Better Version
 One way to cope with the “$” issue is to
output characters one by one using
a loop.
 The loop terminates when the character
being read is 0.
 ASCII code 0 is the NUL character (not the
digit “0”), and it is commonly used to
terminate strings.
 Interrupt 21h, service 06h, is used to print
one character (passed in DL) on screen.
 [bx] means bx is treated as a pointer instead
of a value.
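The loop might be sketched as follows, assuming NASM syntax and a .COM layout; the message is made up and deliberately contains a "$" to show that it now prints like any other character.

```nasm
; Assumption: NASM syntax, built as a DOS .COM file
;   nasm -f bin print0.asm -o print0.com
        org 100h

start:
        mov     bx, msg         ; BX points at the current character
.next:
        mov     dl, [bx]        ; DL = character at the address held in BX
        cmp     dl, 0
        je      .done           ; NUL terminator reached
        mov     ah, 06h         ; service 06h: output the character in DL
        int     21h
        inc     bx              ; advance to the next character
        jmp     .next
.done:
        mov     ax, 4C00h       ; terminate with exit code 0
        int     21h

msg:    db "Price: $5", 13, 10, 0   ; zero-terminated, so "$" is ordinary text
```

Service 06h treats DL = 0FFh as a request for input rather than output, so this loop cannot print character 0FFh; service 02h has no such special case.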
Input one Character
Number to String
 The output routines we discussed so far
are intended only for outputting strings.
 How can we output numbers?
 We have to convert the number to a
string of digits first.
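One common approach, sketched here under assumptions (NASM syntax, a .COM layout, an unsigned 16-bit number, and a made-up routine name `print_num`), is to divide repeatedly by 10. The remainders are the digits in reverse order, so they are pushed on the stack and then popped to print them most significant digit first.

```nasm
; Assumption: NASM syntax, built as a DOS .COM file
;   nasm -f bin number.asm -o number.com
        org 100h

start:
        mov     ax, 1234
        call    print_num       ; prints the digits 1, 2, 3, 4
        mov     ax, 4C00h       ; terminate with exit code 0
        int     21h

; print_num: print the unsigned 16-bit value in AX as decimal
;   Clobbers AX, BX, CX, DX
print_num:
        mov     bx, 10          ; divisor
        xor     cx, cx          ; CX = number of digits pushed
.divide:
        xor     dx, dx          ; dividend is DX:AX, so clear the high word
        div     bx              ; AX = quotient, DX = remainder (0..9)
        push    dx              ; save the digit (least significant first)
        inc     cx
        test    ax, ax
        jnz     .divide         ; repeat until the quotient is 0
.print:
        pop     dx              ; digits come back most significant first
        add     dl, '0'         ; convert 0..9 to its ASCII character
        mov     ah, 02h         ; service 02h: display the character in DL
        int     21h
        loop    .print          ; CX digits in total
        ret
```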
Screen features
Setting the cursor

 INT 10H, service 02H
MOV AH, 02H ; request set cursor
MOV BH, 00 ; page number 0
MOV DH, 08 ; row 8
MOV DL, 15 ; column 15
INT 10H ; call interrupt
Clearing the screen

 INT 10H, service 06H
AL: number of lines to scroll, 00 for full screen
BH: color attribute
MOV AX, 0600H ; request clear screen, full screen
MOV BH, 71h ; white BG (7), Blue text (1)
MOV CX, 0000h ; upper left row:column
MOV DX, 184Fh ; lower right row:column
INT 10H ; call interrupt
Procedure for GREEN BACKGROUND AND WHITE
TEXT
SETSCREEN PROC NEAR
MOV AH,06H
MOV AL,00
MOV BH,2FH ;GREEN BACKGROUND AND WHITE TEXT
MOV CX,0000H
MOV DX,184FH
INT 10H
RET
SETSCREEN ENDP
Screen display & Keyboard
Input
 INT 21H, service 09H: displays a string
ending with $ (or 24h)
See Fig 8-1: displaying ASCII character set
 INT 21H, service 0AH: accepts
data from the keyboard into a buffer
 INT 21H, service 02H: displays a single
character
