Intermediate code generation in Compiler Design

Intermediate Code Generation
• The final phase of the compiler front-end
• Goal: translate the program into a format
expected by the compiler back-end
• In typical compilers: followed by intermediate
code optimization and machine code
generation

Why use an intermediate representation?
• It makes optimization easier: write optimization methods only
for the intermediate representation
• The intermediate representation can be directly interpreted
• SPARC (Scalable Processor Architecture)
• MIPS (Microprocessor without Interlocked Pipelined Stages)

Why Intermediate Representation?

How to choose the intermediate representation?
• It should be easy to translate the source language
to the intermediate representation
• It should be easy to translate the intermediate
representation to the machine code
• The intermediate representation should be
suitable for optimization
• It should be neither too high level nor too low
level
• Single compiler can have more than one
intermediate representation

Common Intermediate representations
• General forms of intermediate representations (IR):
– Graphical IR (i.e. parse tree, abstract syntax trees, DAG..)
– Linear IR (i.e. non-graphical)
– Three Address Code (TAC): instructions of the form “result
= op1 operator op2”
– Static single assignment (SSA) form: each variable is
assigned once.
Y = 1
Y = 2
X = Y
Y1 = 1
Y2 = 2
X1 = Y2

Example IR in programming languages
• Java bytecode (executed on the Java Virtual Machine)
• C is used in several compilers as an intermediate
representation (e.g. Lisp, Haskell, Cython. . . )
• Microsoft’s Common Intermediate Language (CIL)
• GNU Compiler Collection (GCC) uses abstract syntax trees

Position of Intermediate code generator
• Intermediate code is the interface between front end
and back end in a compiler
Parser
Static
Checker
Intermediate Code
Generator
Code
Generator
Front end Back end

Abstract Syntax Tree vs. Concrete Syntax (Parse) Tree

Variants of syntax trees - DAG
• Syntax tree is used to crate a DAG instead of tree for Expressions.
• A directed acyclic graph (DAG) is an AST with a unique node for each value.
• It can easily show the common sub-expressions and then use that knowledge during code
generation.
• Common sub-expressions has more than one parent. Ex. a and b-c
• Example: a+a*(b-c)+(b-c)*d
• Node a and (b-c) are unique nodes that values are using in two different context.
+
+ *
*
-
b c
a
d

DAG’s – using Array
• Algorithm
– Search the array for a node m with label op, left child l and right child r
– If there is such a node, return the value number m
– If not, create in the array a new node n with label op, left child l, and right child
r and return its value n.
– The search for m can be made more efficient by using k lists and using a hash
function to determine which lists to check.
=
+
1
i
id entry for i
num 10
+ 0 1
= 0 1
i := i + 1
Array of Records
0
1
2
3
4
10
Value-number method for constructing a node in a DAG
Input: label operator, left child, right child
Output: op, l, r

SDD for creating DAG’s
1) E -> E1+T
2) E -> E1-T
3) E -> T
4) T -> (E)
5) T -> id
6) T -> num
Grammar Productions Semantic Rules
{E.node= new mknode(‘+’, E1.node,T.node)}
{E.node= new mknode(‘-’, E1.node,T.node)}
{E.node = T.node}
{T.node = E.node}
{T.node = new mkleaf(id, id.entry)}
{T.node = new mkleaf(num, num.val)}
Example:
1) p1=mkleaf(id, entry-a)
2) P2=mkleaf(id, entry-a)=p1
3) p3=mkleaf(id, entry-b)
4) p4=mkleaf(id, entry-c)
5) p5=mknode(‘-’,p3,p4)
6) p6=mknode(‘*’,p1,p5)
7) p7=mknode(‘+’,p1,p6)
8) p8=mkleaf(id,entry-b)=p3
9) p9=mkleaf(id,entry-c)=p4
10) p10=mknode(‘-’,p3,p4)=p5
11) p11=mkleaf(id,entry-d)
12) p12=mknode(‘*’,p5,p11)
13) p13=mknode(‘+’,p7,p12)

Example
• To work out a+(b-c)+(b-c)
– Construct Syntax tree, DAG
P4
400

Exercise
• To construct syntax tree, DAG, array of records for the following
expression a := b*(-c)+b*(-c)
• Consider the SDD to produce syntax trees for assignment statements

Three address code: Addresses
• Three-address code is built from two concepts:
– addresses and instructions.
• Instruction is the statement or operation
– At most one operator on the right side of an instruction.
– 3-address code form:
x = y op z
• An address can be
– Identifier: source variable program name or pointer to the Symbol Table name.
– constant: Constants in the program.

Forms or types of three address instructions
• Assignment Statements ----- x := y op z
• Assignment instructions ----- x := op y
• Copy statements ----- x := y
• Unconditional jump ----- goto L
• Conditional jump ----- if x relop y goto L [relop are <, =, >= , etc.]
• Procedure calls: 3 address code generated for call of the procedure y=p(x1,x2,…,xn)
param x1
param x2,
…
param xn
y = call p, n
• Indexed assignments ------ x := y[i] and x[i] := y
• Address and pointer assignments ------ x := &y and x := *y and *x := y

Ex: Write Three Address Code for the block of statements
int a;
int b;
a = 5 + 2 * b;
Solution:
t0 = 5;
t1 = 2 * b;
a = t0 + t1;

Ex: Write Three Address Code for the if -else
if (A < B)
{
t = 1
}
else
{
t = 0
}
Solution-
(1) if (A < B) goto (4)
(2) t=0
(3) goto (5)
(4) t = 1
(5)

Ex: Write Three Address Code for the if-else
if (A < B) && (C < D)
{
t = 1
}
else
{
t = 0
}
Solution-
(1) if (A < B) goto (3)
(2) goto (4)
(3) if (C < D) goto (6)
(4) t = 0
(5) goto (7)
(6) t = 1
(7)

Ex: Write Three Address Code for the while statements
a=3; b=4; i=0;
while(i<n){
a= b+1;
a=a*a;
i++;
}
c=a;
Solution:
a=3;
b=4;
i=0;
L1:
T1=i<n;
if T1 goto L2;
goto L3;
L2:
T2=b+1;
a=T2;
T3=a*a;
a=T3
i++;
goto L1;
L3:
c=a;

Ex: Write Three Address Code for the switch case
switch (ch)
{
case 1 : c = a + b;
break;
case 2 : c = a – b;
break;
}
Solution-
if ch = 1 goto L1
if ch = 2 goto L2
L1:
T1 = a + b
c = T1
goto Last
L2:
T1 = a – b
c = T2
goto Last
Last:

Instructions in Three Address Code

Choice of allowable operators
• It is an important issue in the design of an intermediate form
• A small operator set is easier to implement
• Restricted instruction set may force front end to generate long
sequences of statements for some source language operations
• The optimizer and code generator may then have to work harder if
good code is to be generated
• Close enough to machine instructions to simplify code
generation

Example
do
i = i+1;
while (a[i*8] < v);
L: t1 = i + 1
i = t1
t2 = i * 8
t3 = a[t2]
if t3 < v goto L
Symbolic labels
100: t1 = i + 1
101: i = t1
102: t2 = i * 8
103: t3 = a[t2]
104: if t3 < v goto 100
Position numbers

Syntax tree vs. DAG vs. Three address code
• AST is the procedure’s parse tree with the nodes for most non-terminal symbols
removed.
• Directed Acyclic Graph is an AST with a unique node for each value.
• Three address code is a sequence of statements of the general form x := y op z
• In a TAC there is at most one operator at the right side of an instruction.
• Example:
t1 = b – c
t2 = a * t1
t3 = a + t2
t4 = t1 * d
t5 = t3 + t4
a+a*(b-c)+(b-c)*d
+
+ *
*
-
b c
a
d
AST DAG TAC

Data structures for three address
codes
• Implementations of Three-Address statements
– Quadruples
• Has four fields: op, arg1, arg2 and result
– Triples
• Temporaries are not used and instead references to
instructions are made
– Indirect triples
• In addition to triples we use a list of pointers to triples

Example
• a = b * uminus c + b * uminus c
(or)
a = b * (-c) + b * (-c)
t1 = uminus c
t2 = b * t1
t3 = uminus c
t4 = b * t3
t5 = t2 + t4
a = t5
Three address code
uminus
*
uminus c t3
*
+
=
c t1
b t2
t1
b t4
t3
t2 t5
t4
t5 a
arg1 result
arg2
op
Quadruples
uminus
*
uminus c
*
+
=
c
b (0)
b (2)
(1) (3)
a
arg1 arg2
op
Triples
(4)
0
1
2
3
4
5
uminus
*
uminus c
*
+
=
c
b (0)
b (2)
(1) (3)
a
arg1 arg2
op
Indirect Triples
(4)
0
1
2
3
4
5
(0)
(1)
(2)
(3)
(4)
(5)
op
35
36
37
38
39
40

Compare Quadruples, Triples &
Indirect Triples
• When instructions are moving around during optimizations:
quadruples are better than triples.
– Quadruple uses temporary variables
• Indirect triples solve this problem

Syntax tree vs. Triples vs. 3AC

More triple representation
• x[i]:=y
• x:=y[i] //Exercise
op arg1 arg2
(0) []= x i
(1) assign (0) y
op arg1 arg2
(0) []= y i
(1) Assign x (0)

Exercise
• To give 3 Address code representations for
a+a*(b-c)+(b-c)*d
– Quadruples?
– Triples?
– Indirect Triples?

Types and Declarations
• Type checking: Ensures the types of operands matches that is expected
by its context (operator).
– E.g. mod operation needs integer operands
• Determine the storage needed
• Calculate the address of an array reference
• Insert explicit type conversion
• When declarations are together, a single offset on the stack pointer
suffices.
• int x, y, z; fun1(); fun2();
• Otherwise, the translator needs to keep track of the current offset.
• int x; fun1(); int y, z; fun2();

Storage layout
• From the type, we can determine amount of
storage at run time.
• At compile time, we will use this amount to
assign its name a relative address.
• Type and relative address are saved in the
symbol table entry of the name.
• Data with length determined only at run time
saves a pointer in the symbol table.

Type Systems Design
• Design is based on syntactic constructs in the
language, notion of types and the rules for
assigning the types to language constructs.
• E.g.
– In arithmetic operation such as addition,
subtraction, multiplication and division, if both
operands are integers then result is also integer.

Type Expressions
• The type of a language construct is denoted by type expression.
• It is either a basic type or formed by applying an operator (type
constructor) to other type expressions.

Equivalence of Type Expression
• Two types are structurally equivalent iff one of the following
conditions is true.
• They are the same basic type.
• They are formed by applying the same constructor to structurally
equivalent types.
• One is a type name that denotes the other.
• int a[2][3] is not equivalent to int b[3][2];
• int a is not equivalent to char b[4];
• struct {int, char} is not equivalent to struct {char, int};
• int * is not equivalent to void *.

Type checking rules for expressions
• Basic type expressions: Boolean, char, integer, float
• Eliteral {E.type = char }
• E num {E.type = integer}
• E id {E.type = lookup(id.entry) }

Type checking Expressions (cont..d)
• Special Basic type expressions: type_error (raise the error
during type checking) and void.
E E1 mod E2 { E.type = { if E1.type = = integer and
E2.type = = integer then
integer
else
type_error }
}
• Type name is a type expression
Ea {E.name = a}

• A type expression can be formed by applying the array type
constructor to a number and a type expression
Constructors include:
• Arrays: If T is a type expression then array(I,T) is a type expressions
denotes array with type T and index set I.
– int a[2][3] is array of 2 arrays of 3 integers.
– In functional style: array(2, array(3, int))
EE1[E2] {E.type = {if E2.type = = integer and
E1.type = array(I, T) then
T
else
type_error

• Product: If s and t are type expressions, then their Cartesian product s*t
is a type expression
• Records: A record is a data structure with named field. It applied to a
tuple formed from field names and field types.
• E.g.
type row = record
{
address: integer;
lexeme : array[1..15] of caht
}
var table: array[1…101] of row;
It declares the type name row denotes the type expression record ( (address
x integer) x (lexeme x array(1..15, char)) )
The variable table is array of records of this type.

• Assignment Statements: E may be Arithmetic, Logical, Relational
expression
Sid = E { S.type = {if id.type = E.type then
void
else
type_error} }
• If Statements:
Sif E then S1 {S.type = { if E.type = Boolean then
S1.type
else
type_error } }

• While Statements:
Swhile E do S1 {S.type = { if E.type = Boolean then
S1.type
else
type_error
}
}
• Pointers: If T is a type expression, then pointer(T) is a type expression
(i.e. pointer to an object of type T).
E*E1 {E.type = {if E.type = ptr(T) then
T
else
type_error } }

• Functions: It maps a domain type D to a range type R. The type of such
function is denoted by type expression DR.
Mapping: T DR {T.type = D.type  R.type }
Function call: E E1 (E2) {E.type = {if E2.type = T1 and
E1.type = T1 T2 then
T2
else
type_error }}

Type checking rules for coercions
• Implicit type conversions (by Compiler) and Explicit type
conversions (by programmer)
EE1 op E2 {E.type = {if E1.type = integer and
E2.type = integer then integer
else if E1.type = integer and
E2.type = float then float
else if E1.type = float and
E2.type = integer then float
else if E1.type = float and
E2.type = float then float
else type_error

Exercise
• Write productions and semantic rules for
computing types and finding their widths in
bytes.

Apply SDD – To find size or width of an array

Intermediate Representation for Array Expression

Type Checking
• Type expressions are checked for
– Correct code
– Security aspects
– Efficient code generation

Reference
• A.V. Aho, M.S. Lam, R. Sethi, J. D. Ullman,
Compilers Principles, Techniques and Tools,
Pearson Edition, 2013.
P. Kuppusamy - Lexical Analyzer

Intermediate code generation in Compiler Design

In this document

More Related Content

What's hot

Similar to Intermediate code generation in Compiler Design

More from Kuppusamy P

Recently uploaded

Intermediate code generation in Compiler Design