KEMBAR78
Lecture 04 Assembly II | PDF | Central Processing Unit | Java Virtual Machine
0% found this document useful (0 votes)
23 views49 pages

Lecture 04 Assembly II

Uploaded by

zB H
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
23 views49 pages

Lecture 04 Assembly II

Uploaded by

zB H
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 49

Lecture 4

•Instructions: Language of the Computer (II)

1
Procedures
◼ int f1 (int i, int j, int k, int g)
{ :::: callee
return 1;
}

int f2 (int s1, int s2)


{ caller
::::::
add $3, $4, $3
i = f1 (3,4,5, 6);
add $2, $3, $3
::::
}
◼ How to pass parameters & results?
◼ How to preserve caller register values?
◼ How to alter control? (I.e. go to callee, return from callee) 2
Procedures
◼ How to pass parameters & results
❑ $a0-$a3: four argument registers. What if # of parameters is
larger than 4? – push to the stack
❑ $v0-$v1: two value registers in which to return values
◼ How to preserve caller register values?
❑ Use stack
❑ Caller saved register
❑ Callee saved register
◼ How to switch control?
❑ How to go to the callee
◼ Jal procedure_address (jump and link)
❑ Store the the return address (PC +4 ) at $ra
❑ set PC = procedure_addres
❑ How to return from the callee
❑ Callee exectues jr $ra
3
Procedure Call Stack (Frame)
::::::::::::::: Higher memory address

$fp
argument registers
Procedure Frame
return registers
Stack grows
Callee saved registers

Local variables

$sp Lower memory address

• Frame pointer points to the first word of the procedure frame


4
Procedure Call Stack (Frame)

$fp → $fp →

$sp → $sp →
$fp →

$sp →

Before the procedure call during the procedure call after the procedure call

5
Procedure Calling Convention

◼ Calling Procedure
❑ Step-1: pass the argument
❑ Step-2: save caller-saved registers
❑ Step-3: Execute a jal instruction

::::::::
foo1 () li $a0, 4 # passing argument
{ :::::::: sw $t3, 4($sp) # save $t3
i= i+1; jal foo
x=foo(4); lw $t3, 4($sp) # restore $t3
i=x+i add $t3, $v0, $t3
}
::::::
6
Procedure Calling Convention
◼ Called Procedure subi $sp, $sp, 32
❑ Step-1: establish stack frame sw $ra, 20($sp)
sw $fp, 16($sp)
◼ subi $sp, $sp <frame-size> addi $fp, $sp, 28
❑ Step-2: saved callee saved registers :::::
◼ $ra, $fp,$s0-$s7 :::::
:::::
❑ Step-3: establish frame pointer
◼ Add $fp, $sp, <frame-size>-4 addi $v0, $zero, 1
lw $fp, 16($sp)
◼ On return from a call lw $ra, 20($sp)
❑ Step-1: put returned values in addi $sp, $sp,32
jr $ra
register $v0, $v1.
❑ Step-2: restore callee-saved registers
❑ Step-3: pop the stack
❑ Step-4: return: jr $ra 7
Preserved Registers
0 zero constant 0 16 s0 callee saves
1 at reserved for assembler ...
2 v0 expression evaluation & 23 s7
3 v1 function results 24 t8 temporary (cont’d)
4 a0 arguments 25 t9
5 a1 26 k0 reserved for OS kernel
6 a2 27 k1
7 a3 28 gp Pointer to global area
8 t0 temporary: caller saves 29 sp Stack pointer
... 30 fp frame pointer
15 t7 31 ra Return Address (HW)

8
Nested Procedures
fact:
int fact (int n)
addi $sp, $sp, -8
{
sw $ra, 4($sp) # save $ra
slti $t0, $a0, 1 # n< 1?
if (n <1) return 1;
beq $t0, $zero, L1 else return (n x fact(n-1));
addi $v0,$zero,1 # return 1 }
addi $sp, $sp, 8 # fix up the stack pointer & return
jr $ra
L1: sw $a0, 0($sp) # save argument $a0
addi $a0,$a0,-1 # n = n-1
jal fact # jal(n-1)
lw $a0, 0($sp) # restore argument $a0
mul $v0, $a0, $v0 # return n x fact(n-1)
lw $ra, 4($sp) # restore $ra
addi $sp, $sp, 8 # restore stack pointer
jr $ra # return to the caller 9
Allocating Space for New Data on the Heap
$sp 7fff fffc hex Stack

◼ Heap
❑ The segment for dynamic data
structures, e.g. linked lists

Dynamic data
$gp 1000 8000 hex
Static data
1000 0000 hex
Text
pc 0040 0000 hex
Reserved

0 10
Representation of Characters
◼ ASCII (American Standard Code for Information
Interchange)
❑ Uses 8 bits to represent a character
❑ MIPS provides instructions to move bytes:
lb $t0, 0($sp) #Read byte from source
sb $t0, 0($gp) #Write byte to destination
◼ Unicode (Universal Encoding)
❑ Uses 16 bits to represent a character
❑ Used in Java
❑ MIPS provides instructions to move 16 bits:
lh $t0, 0($sp) #Read halfword from source
sh $t0, 0($gp) #Write halfword to destination

11
void strcpy (char x[ ], char y[ ]) {
String Copy Procedure in C int i;
i = 0;
while ((x[i] = y[i]) != ‘\0’)
i+=1;
strcpy: }
addi $sp, $sp, -4 # adjust stack for 1 more item
sw $s0, 0($sp) # save $s0
add $s0, $zero, $zero #i=0
L1: add $t1, $s0, $a1 # address of y[i] in $t1
lb $t2, 0($t1) # t2 = y[i]
add $t3, $s0, $a0 # address of x[i] in $t3
sb $t2, 0($t3) # x[i] = y[i]
beq $t2, $zero, L2 # if y[i]==0, go to L2
addi $s0, $s0, 1 # i= i+1
j L1 # go to L1
L2: lw $s0, 0($sp) # y[i] ==0; end of string, restore old
# $s0
addi $sp, $sp, 4 #pop 1 word off stack
jr $ra #return
12
MIPS Addressing Mode

◼ Addressing mode:
❑ How to indicate where the data is
◼ 5 addressing modes in MIPS
❑ Immediate addressing
❑ Register addressing
❑ Base or displacement addressing
❑ PC-relative addressing
❑ Pseudo-direct addressing

13
MIPS Addressing Mode (1)

◼ Immediate addressing

op rs rt Immediate

The operand is a constant


Example: addi $2, $3, 4 within the instruction itself

14
MIPS Addressing Mode (2)

◼ Register addressing

op rs rt op … funct

The operand is a register

Register

Example : add $r1, $r2, $r3


15
MIPS Addressing Mode (3)

◼ Base addressing

op rs rt Address

+ Byte Halfword Word

Register

The operand is at MEM

Example : lw $2, 100($3) 16


How to Get the Base Address in the Base Register

Method 1.

.data # define prog. data section


xyz: .word 1 # some data here
… # possibly some other data
.text # define the program code
… # lines of code
lw $5,xyz # loads contents of xyz in r5

◼ the assembler generates an instruction of the form:


lw $5, offset($gp) # gp is register 28, the global pointer

Note : .data, .word, .text are assembler directives

17
How to Get the Base Address in the Base Register
(cont.)

Method 2.
la $6,xyz #r6 contains the address of xyz
lw $5,0($6) #r5 contains the contents of xyz

Method 3. If the address is a constant


li, if less than ±32K
lui and ori, otherwise.

18
MIPS Addressing Mode (4)

◼ PC-relative addressing

op rs rt Address

+ Word

PC

Example : beq $2, $3, 100 19


MIPS Addressing Mode (5)

◼ Pseudodirect addressing

op Address

: Word

PC

Example : j 100
20
Parallelism and Instructions: Synchronization

◼ Danger of data race:


❑ Two tasks are executed in parallel
❑ The two tasks cooperate with each other
◼ One task writes new values that the other task must read
❑ Without any control → reader may get wrong data!
◼ Synchronization
❑ Guarantee the reader must read the data after the writer
has written the data
❑ How?
◼ Sequence of operations on the shared data
◼ Only one task can access the shared data at one time
◼ Commonly seen synchronization techniques:
❑ Lock and unlock
❑ Mutual exclusion
❑ Atomic execution
21
C program
Starting A Program
Compiler
Transforms the C program into an assembly language program.

Assembly language program

Assembler

Object: Machine language module Object: Library routine (machine language)

Linker

Executable: Machine language program

Loader

Memory22
Assembler
◼ Assembler
❑ The assembler turns the assembly language program
into an object file.
❑ Symbol table: A table that matches names of labels
to the addresses of the memory words that
instruction occupy.

23
Object file
header

Name Procedure A

Text
100hex
size

Data
20hex
size

Text segment Address Instruction

0 lw $a0, 0($gp)
lw $a0, x
4 jal 0 jal B
… …

Data segment O (X)

… …

Relocation Instruction Dependen


Address
information Type cy

0 lw X

4 jal B

Symbol table Label Address

X —

B —
24
Assembler (cont.)

◼ Psudoinstruction: a common variation of


assembly language instructions often treated as
if it were an instruction in its own right.
❑ move $t0, $t1 -> add $t0, $zero, $t1
❑ blt -> slt & bne

25
Linker (Link editor)
◼ Linker takes all the independently assembled
machine language programs and “stitches”
them together to produce an executable file
that can be run on a computer.
◼ There are three steps for the linker:
1. Place code and data modules symbolically in
memory.
2. Determine the addresses of data and instruction
labels.
3. Patch both the internal and external references.

26
Object file header

Name Procedure A

Text size 100hex

Data size 20hex

Text segment Address Instruction

0 lw $a0, 0($gp)

4 jal 0 Executable
… …
file header
Data segment O (X) Text size 300hex
… …
Data size 50hex
Relocation Instruction
Address Dependency Text segment Address Instruction
information Type

0 lw X 0040 0000hex lw $a0, 8000hex ($gp)


4 jal B
0040 0004hex jal 40 0100hex
Symbol table Label Address
… …
X —

B — 0040 0100hex sw $a1, 8020hex ($gp)

Object file header


0040 0104hex jal 40 0000hex
Name Procedure B … …
Text size 200hex
Data segment Address
Data size 30hex

Text segment Address Instruction


1000 0000hex (X)
0 sw $a1, 0($gp) … …
4 jal 0
1000 0020hex (Y)
… …

Data segment O (Y)


… …
… …

Relocation information Address Instruction Type Dependency

0 sw Y lw $a0, y
4 jal A
jal A
Symbol table Label Address

Y —

A —
27
Loader
◼ Read the executables file header to determine the size of the text
and data segments
◼ Creates an address space large enough for the text and data
◼ Copies the instructions and data from the executable file into
memory
◼ Copies the parameters (if any) to the main program onto the
stack
◼ Initializes the machine registers and sets the stack pointer the
first free location
◼ Jump to a start-up routine
main();

_start_up:
lw a0, offset($sp) ## load arguments
jal main;
exit 28
Dynamically Linked Libraries (DLL)
◼ Disadvantages with traditional statically linked
library
❑ Library updates
❑ Loading the whole library even if all of the library is
not used
◼ Dynamically linked library
❑ The libraries are not linked and loaded until the
program is run.
❑ Lazy procedure linkage
◼ Each routine is linked only after it is called.

29
Dynamic linked library via lazy procedure
linkage

30
Starting a Java Program

Just In Time Compiler


(JIT): A compiler that
operates at runtime,
translating the interpreted Java bytecode:
code segments into the Instruction from an
native code of the compiler. instruction set
designed to interpret
Java Virtual Machine Java programs.
(JVM): The program that
interprets Java bytecodes
31
How Compilers Optimize
Dependencies Function
Language dependent; Front end per Transform language to
machine independent common intermediate
language
form
Intermediate
representation
Somewhat language For example, loop
High-level transformations and
dependent; largely
optimizations procedure inlining
machine independent
(also called procedure
integration)

Small language Including global and


Global local optimizations +
dependencies; machine
optimizer register allocation
dependencies slight
(e.g., register
counts/types)
Detailed instruction
selection and machine-
Highly machine dependent optimizations;
dependent; language
Code generator
may include or be 32
independent followed by assembler
Compiler Optimization Summary
Optimization name Explanation GL
High level At or near the source level; processor independent
Procedure integration Replace procedure call by procedure body O3
Local Within straight-line code
Common subexpression Replace two instances of the same computation by single copy O1
elimination Replace all instances of a variable that is assigned a constant
Constant propagation with the constant O1
Rearrange expression tree to minimize resource needed for
Stack height reduction expression evaluation O1
Global Across a branch
Global common subexpression Same as local, but this version crosses branches O2
elimination
Copy propagation Replace all instances of a variable A that has been assigned X O2
(i.e., A=X) with X
Code motion Remove code from a loop that computes same value each O2
iteration of the loop
Induction variable elimination Simplify/eliminate array addressing calculations within loops O2
Processor dependent Depends on processor knowledge
Strength reduction Example: replace multiply by a constant with shifts O1
Pipeline scheduling Reorder instructions to improve pipeline performance O1
33
Branch offset optimization Choose the shortest branch displacement that reaches target O1
Common subexpression elimination:
Example: x [i]= x [i] +4
# x[i] + 4 # x[i] + 4
li R100,x li R100,x
lw R101,i lw R101,i
mult R102,R101,4 mult R102,R101,4
add R103,R100,R102 add R103,R100,R102
lw R104,0(R103) lw R104,0(R103)
# value of x[i] is in R104 # value of x[i] is in R104
add R105,R104,4 add R105,R104,4
# x[i] = # x[i] =
li R106,x sw R105,0(R103)
lw R107,i
mult R108,R107,4
add R109,R106,R107
sw R105,0(R109)

34
IA 32: Alternative Architectures
◼ 1978: The Intel 8086 is announced (16 bit architecture)
◼ 1980: The 8087 floating point coprocessor is added
◼ 1982: The 80286 increases address space to 24 bits, +instructions
◼ 1985: The 80386 extends to 32 bits, new addressing modes
◼ 1989-1995: The 80486, Pentium, Pentium Pro add only 4 new instructions
(mostly designed for higher performance)
◼ 1997: 57 new “MMX” instructions are added
◼ 1999: The Pentium III added another 70 instructions (SSE)
◼ 2001: Another 144 instructions (SSE2)
◼ 2003: AMD extends the architecture to increase address space to 64 bits,
widens all registers to 64 bits and other changes (AMD64)
◼ 2004: Extended Memory 64 Technology (EM64T) and adds
more media extensions (SSE3)

Note: SSE : Streaming SIMD Extension


35
IA-32 Registers
Name Use
31 0

EAX GPR 0

ECX GPR 1

EDX GPR 2

EBX GPR 3

ESP GPR 4

EBP GPR 5

ESI GPR 6

EDI GPR 7

CS Code segment pointer

SS Stack segment pointer (top of stack)

DS Data segment pointer 0

ES Data segment pointer 1

FS Data segment pointer 2

GS Data segment pointer 3

EIP Instruction pointer (PC)

EFLAGS Condition codes

IA-32 has only 8 general purpose register vs. 32 in MIPS


36
IA-32 Typical Instructions
◼ Four major types of integer instructions:
❑ Data movement including move, push, pop

❑ Arithmetic and logical (destination register or memory)

❑ Control flow (use of condition codes / flags )

❑ String instructions, including string move and string compare

1. IA-32: Two-operand operation vs. MIPS: three-operand operation


2. IA-32: Register-memory vs. MIPS: register-register 37
IA-32 Addressing Mode

• IA-32 has more addressing modes than MIPS

38
IA-32 instruction Formats
a. JE EIP + displacement
4 4 8

JE Condi- Displacement
tion

b. CALL
8 32

CALL Offset

c. MOV EBX, [EDI + 45]


6 1 1 8 8
r/m
MOV d w Displacement
Postbyte

d. PUSH ESI
5 3

PUSH Reg

e. ADD EAX, #6765


4 3 1 32

ADD Reg w Immediate

f. TEST EDX, #42


7 1 8 32

TEST w Postbyte Immediate

IA-32 variable-length encoding vs. MIPS fixed-length encoding 39


Complex Instruction Set Computer
(CISC)
•Design goal
•simplify compilation of high-level languages
•optimize code size
•Variable format, 2 and 3 operand instruction
• Rich set of addressing modes (apply to any operand)
• Rich set of operations
• Rich set of data types
• Condition codes
• Examples: Vax, Intel
• Problem: increase hardware design complexity!

40
Reduced Instruction Set Architecture
(RISC)
◼ Instruction set simplicity leads to a faster
machine
❑ efficient pipelining 32-bit fixed format instruction (3
formats)
◼ 3-operand, reg-reg arithmetic instruction
◼ Supporting very few addressing modes for
load/store
❑ displacement
❑ immediate

◼ Simple branch conditions


see: SPARC, MIPS, MC88100, AMD2900, i960, i860
PARisc, DEC Alpha, Clipper, ARM
CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3 41
Effectiveness of RISC

◼ CPU time = Instruction count x CPI x Clock cycle


time
❑ CISC - smaller instruction count but larger CPI
❑ RISC - larger instruction count but smaller CPI
❑ Performance ?

How to evaluate performance? Next time.

42
Popular ISAs

◼ For PCs and ware-house scale computers


❑ CISC machines
❑ E.g. Intel processors
◼ For mobile devices
❑ RISC machines and similar to MIPS
❑ Most of them are ARM-based CPUs
◼ ARM: Advanced RISC Machine
◼ The first CPU company that gained lots success in selling IPs
(Intellectual Properties) of CPUs instead of selling CPUs
themselves
◼ Sold to Softbank in 2016

43
Why is ARM architecture so popular?
◼ Chip vendors build their own chip set without
designing CPU from scratch
❑ Qualcomm, Broadcomm, MediaTek etc.
❑ Qualcomm & Broadcomm are good at IC designs in
communications
◼ Challenges of using ARM
❑ Very expensive

45
Snapdragon 845
Alternative to ARM – RISC-V

◼ RISC-V
❑ Also based on MIPS
❑ An open source ISA (Instruction Set Architecture)
◼ Vendors do not have to pay for the ISA
❑ Defines its own instruction set
◼ Very similar to MIPS
❑ You can design your own CPU based on RISC-V
❑ https://riscv.org/
❑ Originally for academic, later for starting the business
◼ https://www.sifive.com/about

46
Next Week

◼ Datapath
❑ The HW built for executing the MIPS ISA

48
Array vs. Pointer
Clear1(int array[ ], int size)
{
int I;
for (i=0, i< size; i+= 1)
array[i] = 0;
}

move $t0, $zero # i =0


Loop1 : sll $t1, $t0, 2 #I*4
add $t2, $a0, $t1 # t2 = address of array[i]
sw $zero, 0($t2) # array [i] = 0
addi $t0, $t0, 1 # i = i +1
slt $t3, $t0, $a1 # compare i and size
bne $t3, $zero, loop1
49
Array vs. Pointer (cont.)

Clear 2(int *array, int size)


{
int *p,
for (p = &array[0]; p < &array[size]; p = p+1)
*p = 0;
}

move $t0, $a0 # p = &array[0]


sll $t1, $a1, 2 # t1 = size x 4
add $t2, $a0, $t1 # t2 = &array[size]
Loop2: sw $zero, 0($t0) # memory[p] = 0
addi $t0, $t0, 4 # p= p+4
slt $t3, $t0, $t2 # compare p, & array[size]
bne $t3, $zero, Loop2
50
Array vs. Pointer (cont.)
Array
move $t0, $zero # i =0
Loop1 : sll $t1, $t0, 2 #I*4
add $t2, $a0, $t1 # t2 = address of array[i]
sw $zero, 0($t2) # array [i] = 0 # of Instruction per iteration
addi $t0, $t0, 1 # i = i +1 =6
slt $t3, $t0, $a1 # compare i and size
bne $t3, $zero, loop1

Pointer

move $t0, $a0 # p = &array[0]


sll $t1, $a1, 2 # t1 = size x 4
# of Instruction per iteration
add $t2, $a0, $t1 # t2 = &array[size] =4
Loop2: sw $zero, 0($t0) # memory[p] = 0
addi $t0, $t0, 4 # p= p+4
slt $t3, $t0, $t2 # compare p, & array[size]
bne $t3, $zero, Loop2

51

You might also like