Code Generation:
Final Phase of compilation
Input : Intermediate code,
output machine code(MC) or assembly code
optional
SP Front end IC Code IC MC/ALL
Code
Optimization generator
ST
Target program:characteristics/reqmts
must preserve Semantic meaning of source program
High Quality(error free, no ambiguity)
Effective use of available resource of target Program
It must be efficient(speed/fast)
Quick code generation
Tasks of Code generator: Primary task are
Instruction selection:
Instruction X=Y+Z A=B+C
D=A+E
Machine Code LD R0,Y LD R0,B
LD R1,Z LD R1,C
ADD R0,R0,R1 ADD R0,R0,R1
ST X,R0 ST A,R0
LD R1,E
ADD R0,R0,R1
ST D,R0
Register allocation and assignment
Instruction Ordering
One of the most important criteria of code generator is correct code generation
Issues in code generation:
Correctness of code may be dependent on
Input to the code generator
Target program
Operating System
Memory management
Instruction selection
Register allocation
Evaluation Order
Input to the code generator :
Following assumptions are made
Front end has scanned ,parsed & translated the source program into reasonably detailed intermediate
code
Type checking , type conversion and any semantic errors if any have already translated
ST is able to provide runtime address of the data object
Intermediate representation may be
three address code,postfix ,
virtual machine repressentation(byte code),
stack machine code,
Graphical representation(Syntax tree & DAG)
Target Program:
Instruction set architecture of the target machine has significant impact on the code generator that produces
high quality machine code
Common target machine architectrure are RISC,CISC & stack based
RISC: These machines are characterized by
Many registers
Three address instructions
Simple addressing modes
Relatively simple instruction set architecture
CISC: These machines are characterized by
Fewer registers
Two address instruction
Variety of addressing modes
Several register classes
Variable length instructions
Stack based :
Here operations are done by pushing operands onto a stack and then performing operations on
operands at the top of the stack
To achieve high performance top of stack is typically kept in registers
Not popular bcoz lots of swaps & copy instructions are required
Output of code generator : Is target program ,that may be m/c or ALL
Absolute machine language:It can be placed in fixed location of memory and immediately executed
Relocatable Machine Language:
Subprograms to be compiled seperately.
Set of relocatable object modules can be linked together, & loaded for execution by a linker
Assembly level language:Is simpler ,much easier & closer to machine instruction
Assumptions made :
Simple RISC machine ,CISC like addressing modes(few)
For simplicity target language is assembly language
Instruction selection:
Code generator has to map intermediate representation program into code sequences that can be
executed by target machine
Complexity of mapping is dependent on factors such as
Level of intermediate represenation
Nature of instruction set architecture
Desired quality of generated code
Level of intermediate represenation:
If intermediate representation is of high level
Code generator may end up with poor quality code
That may need further optimization
If intermediate representation is of low level
Code generator end up with mor efficient code sequences
Nature of instruction set architecture:
Uniformity & completeness of instruction set are important factors
If target machine doesn’t support each data type in uniform manner then each exception to the
general rule has to be handled in a special way
Some machines have floating point registers for handling floating point numbers
Quality of generated code is determined by speed & size
three address instruction Generated target code may be High level target code that
supports increment
a=a+1 LD R0,a INC a
ADD R0,R0,#1
ST a,R0
Information about instruction cost is essential to generate good target code sequence
For TAC x=y+z,following instruction sequence is generated
LD R0,y // y R0
ADD R0,R0,Z //R=R0+z
ST x,R0 // x R0
These may often produce redundant loads & stores
a=b+c
d=a+e
LD R0,b // Ro=e
ADD R0,R0,c // R0=R0+c
ST a,R0 //a=R0 if a is not subsequently used it is again redundant
LD R0,a // R0=a redundant
ADD R0,R0,e //R0=R0+e
ST d,R0 // d=R0
Register Allocation:
Major issue in code generation is deciding which values to hold in what registers
Instructions involving register operands are shorter & faster
Memory operands larger & comparitively slow
Efficient utilization of registers is very important since they are limited
Usage of register is divided into two subproblems
Register Allocation:Here we select set of variables that will reside in register at a point in a program
Register assignment:During which we pick the specific register that a variable will reside in
Ex certain machine instruction like multiply & divide use even odd register pair
Ex M x,y where x is multiplicand & is in even register of even/odd register pair
Product occupies entire even odd register pair
Dx,y where x is dividend & is in even register of even/odd register pair
Quotient in odd ,remainder in even
Ex
Source code(TAC) Target code
t=a+b LD R1,a
t=t*c ADD R1,b
t=t/d MUL R0,c //result entire R0&R1
DIV R0,d //dividend is in R0
ST R1,t
t=a+b LD R0,a
t=t*c ADD R0,b
t=t/d MUL R0,c
SRDA R0,32 //shift right double arithmetic,shifts the dividend into
DIV R0,d
ST R1,t
Evaluation Oder:
It affects the efficiency of target code.
Some compilation requires fewer registers to hold intermediate results than others
Picking a best order in general case is difficult
Initially we shall avoid the problem by generating code for TA stmt in the order they have been
generated by ICG
Target Language:
Target machine & its instruction set is a prerequisite for designing a good code generator
Here we shall use assembly language code for simple computer that is representative of many
machine
Simple Target Machine Model:
It is TA machine with Load,store,computation,jump operations & conditional jumps
Byte addressable machine with ‘n’ general purpose register(R0 to Rn-1)
Most
Very limited set of instructions
Assume all operands are integers
Labels may precede a instruction
Most instruction consists of operator followed by a target followed by list of source operands
Assume target machine has variety of addressing mode`
We assume following kinds of instructions are available
Load operations: Store operations Computations Conditional jumps
assignment assignment x=R Op dst,Scr1,scr2,scr3
destination=address
LD dest,addr St x,R Sub R1,R2,R3 B cond R,L,
LD R,x BLTZ R1 ,L
LD R1,R2
Unconditional jumps
BR L
useful for pointers
Ex. LD R1,100(R2) //R1=content(100+content(R2))
Indirect Addressing:
LD R1,*100(R2) // R1=content(content(100+content(R2)))
Immediate Addressing:
Content is prefixed by #
LD R1,#100 //loads 100 into R1
ADD R1,R1,#100 //R1= (R1)+100
TAC :
x=y-z
can be implemented by machine instructions
LD R1,y
LD R2,z
SUB R1,R1,R2
ST x,R1
Suppose a is an array with elements of size 8 & elements indexing starts at 0
For TAC b=a[i]
LD R1,i //R1=i
MUL R1,R1,8 //R1=R1*8
LD R2,a(R1) //R2=contents(a+contents(R1))
ST b,R2 //b=R2
TAC a[j]=c is implemented as
LD R1,c //R1=c
LD R2,j //R2=j
MUL R2,R2,8 //R2=R2*8
ST a(R2),R1 //contents(a+contents(R2))=R1
TAC x=*p is implemented as
LD R1,p //R1=p
LD R2,0(R1) // R2=Contents(0+contents(R1))
ST x,R2 //x=R2
TAC *p=y is implemented as
LD R1,p //R1=p
LD R2,y //R2=y
St 0(R1),R2 //contents(0+contents(R1))=R2
For TAC if x<y goto L ,machine code is
LD R1,x // R1=x
LD R2,y //R2=y
SUB R1,R1,R2 //R1=R1-R2
BLTZ R1,M //if R1 < 0 jump to M
Here M is the label of the first machine instruction generated from the TAC instruction that has label L
Program & Instruction Cost:
We associate a cost with compiling & running program
Here for simplicity we assume cost of an instruction is 1+cost associated with
the addressing modes of the operands.cost corresponds to length in words of
the instruction.
Addressing modes involving register have 0 cost
Those involving memory location or constant have additional cost of 1.(bcoz
such operands have to be stored in the words following the instruction)
op source, destination
Where, op is used as an op-code and source and destination are used as a
data field.
The source and destination of an instruction can be specified by the
combination of registers and memory location with address modes.
MODE FORM ADDRESS ADDED COST
absolute M M 1
register R R 0
indexed c(R) C+ 1
contents(R)
indirect *R contents(R) 0
register
indirect *c(R) contents(c+ 1
indexed contents(R))
literal #c c 1
o Here, cost 1 means that it occupies only one word of memory.
o Each instruction has a cost of 1 plus added costs for the source and
destination.
o Instruction cost = 1 + cost of source and destination.
Example:
1. Move register to Register R0 → R1
LD R0, R1
cost = 1+0+0
2. Move register to memory R0 → M
LD R0, M
cost = 1+0+1 (since address of memory location M is in word following the i
nstruction)
3. Indirect indexed mode:
LD R1,* 100(R2)
cost = 1+0+1 (since one word for operand 2 and one for instruction)
4. Literal Mode:
MOV #1, R0
cost = 1+1 +0= 2 (one word for constant 1 and one for instruction)
Instruction calculation Total cost of
instruction set
MOV R1,R0 1+0+0 1
MOV R1,M 1+0+1 2
SUB 5(R0),*10(R1) 1+1+1 3
MOV a,R0 1+1+0 6
ADD b,R0 1+1+0
MOV R0,c 1+0+1
MOV *R1,*R0 1+0+0 2
ADD *R2,*R0 1+0+0
MOV b,R0 1+1+0 6
ADD c,R0 1+1+0
MOV R0,a 1+0+1
MOV b,a 1+1+1 6
ADD c, a 1+1+1