This document discusses compiler architecture and intermediate code generation. It begins by describing the typical phases of a compiler: parsing, static checking, and code generation. It then discusses intermediate code, which ties the front end and back end phases together and is language and machine independent. Various forms of intermediate code are described, including trees, postfix notation, and triple/quadruple intermediate code. The rest of the document focuses on triple/quadruple code, including how it represents expressions, statements, addressing of arrays, and the translation process from source code to triple/quadruple intermediate code.
Intermediate Code (IC)
• The IC generator converts the given program in a source language
into an equivalent program in an intermediate language.
• Ties the front and back ends together
• Language and machine neutral
• Many forms
• Level depends on how it is being processed
• More than one intermediate language may be used by
a compiler
• The intermediate language can take many different forms, and the
designer of the compiler decides which one to use.
– syntax trees can be used as an intermediate language.
– postfix notation can be used as an intermediate language.
– three-address code (quadruples) can be used as an
intermediate language
• we will use quadruples to discuss intermediate code
generation
• quadruples are close to machine instructions, but they are
not actual machine instructions.
– some programming languages have well-defined
intermediate languages.
• Java – Java Virtual Machine
• Prolog – Warren Abstract Machine
• In fact, there are byte-code emulators to execute instructions
in these intermediate languages.
Intermediate Language Types
• Graphical IRs:
– Abstract Syntax trees
– Directed Acyclic Graphs (DAGs)
– Control Flow Graphs
• Linear IRs:
– Stack based (postfix)
– Three address code (quadruples)
Graphical IRs
• Abstract Syntax Trees (AST) – retain the essential
structure of the parse tree, eliminating unneeded
nodes.
• Directed Acyclic Graphs (DAG) – compacted AST that
avoids duplication, with a smaller footprint as well
• Control flow graphs (CFG) – explicitly model control
flow
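ASTs and DAGs:
a := b * -c + b * -c
[Figure: AST (left) and DAG (right) for the assignment above. The AST contains two separate subtrees for b * -c, each applying unary minus to c; the DAG contains a single b * -c node that serves as both operands of +.]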
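One common way to obtain the DAG rather than the AST is to look up each node before creating it and reuse an existing node with the same operator and children. The following is a minimal sketch of that idea; the names DagBuilder, node and build are hypothetical and chosen only for this illustration.

```python
class DagBuilder:
    """Creates expression nodes, reusing an existing node whenever the
    same (op, children) combination has already been built."""

    def __init__(self):
        self.nodes = []   # node id -> (op, child ids or leaf name)
        self.index = {}   # (op, child ids or leaf name) -> node id

    def node(self, op, *children):
        key = (op, *children)
        if key not in self.index:          # first occurrence: create it
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]             # otherwise share the old node


def build(builder):
    """Build a := b * -c + b * -c bottom-up, term by term as written."""
    left = builder.node("*", builder.node("id", "b"),
                        builder.node("uminus", builder.node("id", "c")))
    right = builder.node("*", builder.node("id", "b"),
                         builder.node("uminus", builder.node("id", "c")))
    total = builder.node("+", left, right)
    root = builder.node(":=", builder.node("id", "a"), total)
    return root, left, right


dag = DagBuilder()
root, left, right = build(dag)
print(len(dag.nodes))    # 7 distinct nodes; the AST would need 11
print(left == right)     # True: both operands of + are the same node
```

Dropping the lookup (always appending a new node) would give the AST instead, which is why the DAG is often described as a compacted AST.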
Three-Address Code
• A three-address code instruction has the form:
x := y op z
where x, y and z are names, constants or compiler-generated
temporaries; op is any operator.
• We may also use the following notation for three-address
code (it looks like a machine code instruction):
op y,z,x
apply operator op to y and z, and store the result in x.
• We use the term “three-address code” because each
statement usually contains three addresses (two for
operands, one for the result).
Linearized Representation of DAG/AST
• Source Code
– a = b * -c + b * -c
• Three address code
• Tree Representation
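As an illustration of how a tree is linearized, the sketch below walks a nested-tuple expression tree in post-order and emits one three-address instruction per interior node. The tuple encoding and the function name linearize are assumptions made for this example, not part of the slides.

```python
from itertools import count


def linearize(tree):
    """Return (place, instructions) for an expression tree.

    Leaves are strings (names); interior nodes are tuples of the form
    (op, child) or (op, left, right).
    """
    code = []
    temps = count(1)

    def walk(node):
        if isinstance(node, str):            # a name is its own address
            return node
        op, *kids = node
        places = [walk(k) for k in kids]     # visit children first
        t = f"t{next(temps)}"                # fresh compiler temporary
        if len(places) == 1:
            code.append(f"{t} := {op} {places[0]}")
        else:
            code.append(f"{t} := {places[0]} {op} {places[1]}")
        return t

    return walk(tree), code


term = ("*", "b", ("minus", "c"))
place, code = linearize(("+", term, term))
code.append(f"a := {place}")
print("\n".join(code))
```

Because the tree repeats the subtree for b * -c, the walk emits it twice (t1, t2 and again t3, t4). Starting from a DAG, the shared node would be emitted once and the sum would read t2 + t2.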
Three-Address Statements
Binary Operator: op y,z,result or
result := y op z
where op is a binary arithmetic or logical operator. This binary
operator is applied to y and z, and the result of the operation is
stored in result.
Ex:
add a,b,c
gt a,b,c
addr a,b,c
addi a,b,c
Unary Operator: op y,,result or
result := op y
where op is a unary arithmetic or logical operator. This unary
operator is applied to y, and the result of the operation is stored
in result.
Ex: uminus a,,c
Three-Address Code
• Two concepts
– Address
– Instruction
• Address
– Name: source-program names may appear as addresses
– Constant: different types of constants
– Compiler-generated temporary: a fresh name created by the compiler to hold an intermediate result
Three-Address Instruction
Assignment Type 1: x := y op z
op is a binary arithmetic or logical operation
x, y and z are addresses
Assignment Type 2: x := op z
op is a unary arithmetic or logical operation
x and z are addresses
Copy Instruction: x := y
x and y are addresses and x is assigned the value of y
Three-Address Instructions
Unconditional Jump: goto L
We jump to the three-address instruction with the label L, and
execution continues from that instruction.
Ex: goto L1 // jump to L1
jmp 7 // jump to statement 7
Conditional Jump 1: if x goto L and ifFalse x goto L
We jump to the three-address instruction with the label L if x is TRUE
or FALSE, respectively. Otherwise, the following three-address
instruction in sequence is executed next.
Conditional Jump 2: if x relop y goto L
We jump to the three-address instruction with the label L if the result of x
relop y is true, and execution continues from that instruction. If the result is
false, execution continues from the instruction following this conditional
jump.
Three-Address Statements (cont.)
Indexed Assignments:
x := y[i]
sets x to the value in location i memory units beyond location y
y[i] := x
sets contents of the location i memory units beyond location y to
the value of x
Address and Pointer Assignments:
x := &y
sets the r-value of x to l-value of y
x := *y where y is a pointer whose r-value is a location
sets the r-value of x equal to the contents of that location
*x := y
sets the r-value of the object pointed to by x to the r-value of y
Three-address code example
do i = i+1; while (a[i] < v);

(A) Symbolic Labels          (B) Position Numbers
L: t1 = i + 1                100: t1 = i + 1
   i = t1                    101: i = t1
   t2 = i * 8                102: t2 = i * 8
   t3 = a[t2]                103: t3 = a[t2]
   if t3 < v goto L          104: if t3 < v goto 100
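The step from form (A) to form (B) can be sketched as a two-pass rewrite: first record the position of every labelled instruction, then replace each jump target with that position. The function name resolve_labels, the (label, text) input encoding, and the starting position 100 are assumptions made for this example.

```python
def resolve_labels(lines, start=100):
    """Replace symbolic labels with position numbers.

    Each input line is a pair (label_or_None, instruction_text).
    """
    # Pass 1: record the position of every labelled instruction.
    positions = {label: start + n
                 for n, (label, _) in enumerate(lines)
                 if label is not None}

    # Pass 2: rewrite 'goto <label>' targets and number each line.
    out = []
    for n, (_, text) in enumerate(lines):
        head, sep, target = text.rpartition("goto ")
        if sep and target in positions:
            text = f"{head}goto {positions[target]}"
        out.append(f"{start + n}: {text}")
    return out


symbolic = [
    ("L", "t1 = i + 1"),
    (None, "i = t1"),
    (None, "t2 = i * 8"),
    (None, "t3 = a[t2]"),
    (None, "if t3 < v goto L"),
]
for line in resolve_labels(symbolic):
    print(line)
```

Two passes are used so that a jump to a label defined later in the code (a forward jump) can be resolved as well.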
Representing 3-Address Statements
Each instruction is stored as a quadruple with the fields: op, arg1, arg2, result
• x = minus y
– Does not use arg2
• x = y
– Op is =
• param a1
– Uses neither arg2 nor result
• Conditional/Unconditional jumps
– Put the target label in result
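The field conventions above can be illustrated with a small record type. This is only a sketch: the names Quad and show are hypothetical, and the operator name used for the conditional jump ("if<") is an assumption, since the slides do not fix one.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    """One three-address instruction stored as four fields."""
    op: str
    arg1: Optional[str] = None
    arg2: Optional[str] = None
    result: Optional[str] = None


def show(q):
    """Render a quad in the 'op arg1,arg2,result' notation."""
    fields = [f or "" for f in (q.arg1, q.arg2, q.result)]
    return f"{q.op} " + ",".join(fields)


quads = [
    Quad("minus", "y", None, "x"),     # x = minus y : arg2 is unused
    Quad("=", "y", None, "x"),         # x = y       : op is =
    Quad("param", "a1"),               # param a1    : no arg2, no result
    Quad("goto", None, None, "L1"),    # goto L1     : target in result
    Quad("if<", "t3", "v", "L"),       # if t3 < v goto L (op name assumed)
]
for q in quads:
    print(show(q))
```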
Incremental Translation
• The ‘code’ attribute can be a long string
• Instead of building up ‘E.code’
– We arrange to generate new three-address instructions
– ‘code’ attribute is not used
– ‘gen’ method is used instead of ‘IR’
• ‘gen’ constructs a three address instruction and appends
it to the sequence of instructions generated so far
Syntax-Directed Translation into Three-Address Code
S → id := E     {gen(ID.svalue ‘:=’ E.place)}
E → E1 + E2     {E.place := NewTemp();
                 gen(E.place ‘:=’ E1.place ‘+’ E2.place)}
E → E1 * E2     {E.place := NewTemp();
                 gen(E.place ‘:=’ E1.place ‘*’ E2.place)}
E → - E1        {E.place := NewTemp();
                 gen(E.place ‘:=’ ‘minus’ E1.place)}
E → ( E1 )      {E.place := E1.place;}
E → id          {E.place := ID.svalue;}
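To make the role of gen and NewTemp concrete, here is a minimal sketch of a translator that emits instructions while parsing, without building any 'code' attribute. It is not the scheme's only possible implementation: the class name Translator and its methods are hypothetical, and because the grammar above is left-recursive and ambiguous, the sketch assumes the usual precedence layering (unary minus binds tightest, then *, then +) in a recursive-descent parser.

```python
import re


class Translator:
    """Recursive-descent translator for statements of the form id := E."""

    def __init__(self, text):
        self.tokens = re.findall(r"[A-Za-z_]\w*|:=|[-+*()]", text)
        self.pos = 0
        self.code = []       # instructions generated so far
        self.ntemps = 0

    def new_temp(self):
        self.ntemps += 1
        return f"t{self.ntemps}"

    def gen(self, *parts):
        # Append one instruction to the sequence generated so far.
        self.code.append(" ".join(parts))

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def statement(self):                 # S -> id := E
        name = self.eat()
        self.eat(":=")
        self.gen(name, ":=", self.expr())

    def expr(self):                      # E -> E1 + E2
        place = self.term()
        while self.peek() == "+":
            self.eat("+")
            right = self.term()
            t = self.new_temp()
            self.gen(t, ":=", place, "+", right)
            place = t
        return place

    def term(self):                      # E -> E1 * E2
        place = self.factor()
        while self.peek() == "*":
            self.eat("*")
            right = self.factor()
            t = self.new_temp()
            self.gen(t, ":=", place, "*", right)
            place = t
        return place

    def factor(self):
        tok = self.eat()
        if tok == "-":                   # E -> - E1
            operand = self.factor()
            t = self.new_temp()
            self.gen(t, ":=", "minus", operand)
            return t
        if tok == "(":                   # E -> ( E1 )
            place = self.expr()
            self.eat(")")
            return place
        if not re.fullmatch(r"[A-Za-z_]\w*", tok):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok                       # E -> id : place is the name


tr = Translator("a := b * -c + b * -c")
tr.statement()
print("\n".join(tr.code))
```

Each parsing method returns only the place (a name or temporary) of its subexpression; the instructions themselves accumulate in one shared list, which is the point of the incremental approach.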
Addressing Array Elements
• Elements of arrays can be accessed quickly if the elements are
stored in a block of consecutive locations.
A one-dimensional array A:
[Figure: array A laid out in consecutive memory locations, marking baseA, low, i and width.]
baseA is the address of the first location of the array A,
width is the width of each array element,
low is the index of the first array element.
Addressing Array Elements (cont.)
The location of A[i] is
baseA + (i-low)*width
which can be rewritten as
i*width + (baseA - low*width)
where i*width must be computed at run time and (baseA - low*width) can be computed at compile time.
• So, the location of A[i] can be computed at run time by
evaluating the formula i*width+c, where c is (baseA - low*width),
which is evaluated at compile time.
• The intermediate code generator should produce the code to
evaluate this formula i*width+c (one multiplication and
one addition operation).
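The split between compile-time and run-time work can be checked with a few lines of arithmetic. In this sketch the function names and the example array (A[3..10] of 4-byte elements at address 1000) are made up for illustration.

```python
def compile_time_constant(base, low, width):
    """c = baseA - low*width, known as soon as the array is declared."""
    return base - low * width


def element_address(i, width, c):
    """Run-time part: one multiplication and one addition."""
    return i * width + c


# Hypothetical array A[3..10] of 4-byte elements stored at address 1000.
base, low, width = 1000, 3, 4
c = compile_time_constant(base, low, width)

# The rewritten formula agrees with baseA + (i - low) * width.
for i in range(low, 11):
    assert element_address(i, width, c) == base + (i - low) * width

print(c, element_address(5, width, c))   # 988 1008
```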
Two-Dimensional Arrays
• A two-dimensional array can be stored in
– either row-major (row-by-row) or
– column-major (column-by-column) order.
• Most programming languages use the row-major method.
• Row-major representation of a two-dimensional array:
row 1 | row 2 | ... | row n
Two-Dimensional Arrays (cont.)
• The location of A[i1,i2] is
baseA + ((i1-low1)*n2 + i2-low2)*width
baseA is the location of the array A.
low1 is the index of the first row
low2 is the index of the first column
n2 is the number of elements in each row
width is the width of each array element
• Again, this formula can be rewritten as
((i1*n2)+i2)*width + (baseA - ((low1*n2)+low2)*width)
where ((i1*n2)+i2)*width must be computed at run time and (baseA - ((low1*n2)+low2)*width) can be computed at compile time.
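The rewriting step is plain algebra: expand the product and group the terms that do not depend on i1 or i2. Written out in LaTeX, with the same symbols as the row-major formula for A[i1,i2], the constant term multiplies low1 by n2 (the row length):

```latex
\begin{aligned}
&\mathit{base}_A + \bigl((i_1 - \mathit{low}_1)\,n_2 + i_2 - \mathit{low}_2\bigr)\,\mathit{width} \\
&\quad= \mathit{base}_A + (i_1 n_2 + i_2)\,\mathit{width}
        - (\mathit{low}_1 n_2 + \mathit{low}_2)\,\mathit{width} \\
&\quad= \underbrace{(i_1 n_2 + i_2)\,\mathit{width}}_{\text{run time}}
      + \underbrace{\mathit{base}_A - (\mathit{low}_1 n_2 + \mathit{low}_2)\,\mathit{width}}_{\text{compile time}}
\end{aligned}
```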
Multi-Dimensional Arrays
• In general, the location of A[i1,i2,...,ik] is
(( ... ((i1*n2)+i2) ...)*nk+ik)*width + (baseA -
((...((low1*n2)+low2)...)*nk+lowk)*width)
• So, the intermediate code generator should produce the code
to evaluate the following formula (to find the location of
A[i1,i2,...,ik]):
(( ... ((i1*n2)+i2) ...)*nk+ik)*width + c
where c is the compile-time constant baseA - ((...((low1*n2)+low2)...)*nk+lowk)*width.
• To evaluate the (( ... ((i1*n2)+i2) ...)*nk+ik portion of this formula,
we can use the recurrence equation:
e1 = i1
em = e(m-1) * nm + im   (for m = 2, ..., k)
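The recurrence translates directly into a loop. In the sketch below, the function names and the example array (A[1..2, 1..3, 1..4] of 8-byte elements at address 0) are assumptions for illustration; the same recurrence is applied to the lower bounds to obtain the compile-time constant c.

```python
def linear_index(idx, n):
    """Evaluate e1 = i1, em = e(m-1) * nm + im.

    idx = [i1, ..., ik] and n = [n1, ..., nk]; n1 is never used.
    """
    e = idx[0]                             # e1 = i1
    for im, nm in zip(idx[1:], n[1:]):
        e = e * nm + im                    # em = e(m-1) * nm + im
    return e


def location(idx, low, n, base, width):
    # Compile-time constant: the same recurrence applied to the lower bounds.
    c = base - linear_index(low, n) * width
    # Run-time part: recurrence on the actual indices, times width, plus c.
    return linear_index(idx, n) * width + c


# Hypothetical 3-D array A[1..2, 1..3, 1..4] of 8-byte elements at address 0.
n, low = [2, 3, 4], [1, 1, 1]
print(location([1, 1, 1], low, n, 0, 8))   # 0   (first element)
print(location([2, 3, 4], low, n, 0, 8))   # 184 (last of 24 elements)
```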
Translation of Array Elements
• One dimensional
base + i * w
w: width of each array element
• Two dimensional
base + i1 * w1 + i2 * w2
w1: width of a row
w2: width of an element in a row
• k dimensional (generalized)
base + i1 * w1 + i2 * w2 + … + ik * wk
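In this form each wj is the width of one "slice" at dimension j, so it can be derived from the dimension sizes and the element width. The sketch below assumes indices start at 0 (so no lower-bound correction is needed); the function names widths and address are hypothetical.

```python
def widths(dims, elem_width):
    """Return [w1, ..., wk] for an array with dimension sizes dims.

    wk is the element width, and each earlier wj is n(j+1) * w(j+1).
    """
    w = [elem_width]
    for n in reversed(dims[1:]):
        w.insert(0, n * w[0])
    return w


def address(base, idx, w):
    """base + i1*w1 + i2*w2 + ... + ik*wk (indices assumed to start at 0)."""
    return base + sum(i * wj for i, wj in zip(idx, w))


# Hypothetical 2 x 3 array of 4-byte integers at address 0.
w = widths([2, 3], 4)
print(w)                       # [12, 4]: a row is 12 bytes, an element 4
print(address(0, [1, 2], w))   # 20
```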
Translation of Array References
• Need to relate the address calculation formulas to a
grammar for array references
• Let the nonterminal L generate an array name followed by a
sequence of index expressions:
L → L [E] | id [E]
• Nonterminal L has three synthesized attributes
– L.addr denotes a temporary used to compute the offset
for the array reference by summing the terms ij * wj
– L.array is a pointer to the symbol-table entry for the array name
• L.array.base is used to determine the actual l-value of the array reference
– L.type is the type of the sub-array generated by L
• L.type.width gives the width of the type
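A minimal sketch of how these attributes could drive code generation follows. It assumes zero-based indices and that the index expressions have already been evaluated into places; the type classes Basic and Array and the function translate_ref are hypothetical names for this illustration, and the loop plays the role of the two productions applied left to right.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Basic:
    width: int                       # e.g. 4 for a 4-byte integer


@dataclass
class Array:
    size: int                        # number of elements
    elem: object                     # element type: Basic or Array

    @property
    def width(self):
        return self.size * self.elem.width


def translate_ref(name, array_type, index_places):
    """Emit three-address code for name[E1][E2]...; return (code, place)."""
    code, temps = [], count(1)

    def new_temp():
        return f"t{next(temps)}"

    # L -> id [E]: L.type is the element type, L.addr = E.place * width
    l_type = array_type.elem
    l_addr = new_temp()
    code.append(f"{l_addr} := {index_places[0]} * {l_type.width}")

    # L -> L1 [E]: add the next term ij * wj to the running offset
    for place in index_places[1:]:
        l_type = l_type.elem
        t = new_temp()
        code.append(f"{t} := {place} * {l_type.width}")
        new_addr = new_temp()
        code.append(f"{new_addr} := {l_addr} + {t}")
        l_addr = new_addr

    result = new_temp()
    code.append(f"{result} := {name}[{l_addr}]")   # indexed copy x := y[i]
    return code, result


# Hypothetical declaration: a is a 2 x 3 array of 4-byte integers.
a_type = Array(2, Array(3, Basic(4)))
code, place = translate_ref("a", a_type, ["i", "j"])
print("\n".join(code))
```

For a[i][j] this produces i * 12 (the width of a row), j * 4 (the width of an element), their sum, and finally an indexed copy using that offset.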