The document discusses intermediate code generation in compilers. It begins by explaining that intermediate code generation is the final phase of the compiler front end and that its goal is to translate the program into the format expected by the back end. Common intermediate representations include three-address code and static single assignment (SSA) form. The document then discusses why intermediate representations are used, how to choose an appropriate representation, and common kinds of representations such as graphical IRs and linear IRs.
Intermediate Code Generation
• The final phase of the compiler front end
• Goal: translate the program into the format expected by the compiler back end
• In typical compilers, it is followed by intermediate code optimization and machine code generation
Why use an intermediate representation?
• It makes optimization easier: optimization methods are written only once, for the intermediate representation
• The intermediate representation can be directly interpreted
• It makes retargeting easier: the same intermediate representation can be translated to different target architectures, e.g.
  - SPARC (Scalable Processor Architecture)
  - MIPS (Microprocessor without Interlocked Pipelined Stages)
How to choose the intermediate representation?
• It should be easy to translate the source language to the intermediate representation
• It should be easy to translate the intermediate representation to machine code
• The intermediate representation should be suitable for optimization
• It should be neither too high-level nor too low-level
• A single compiler can have more than one intermediate representation
Common intermediate representations
• General forms of intermediate representations (IR):
  - Graphical IR (e.g. parse trees, abstract syntax trees, DAGs)
  - Linear IR (i.e. non-graphical)
  - Three-address code (TAC): instructions of the form "result = op1 operator op2"
  - Static single assignment (SSA) form: each variable is assigned exactly once. For example:
Original    SSA form
Y = 1       Y1 = 1
Y = 2       Y2 = 2
X = Y       X1 = Y2
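For straight-line code (no branches), the SSA renaming shown above comes down to two rules. Each assignment creates a fresh numbered version of its target, and each use refers to the latest version. The following is a minimal Python sketch under that assumption. The function `to_ssa` and the (target, operands) encoding are hypothetical. Code with branches also needs phi-functions at join points, and this sketch does not handle them.

```python
def to_ssa(code):
    """Rename straight-line assignments into SSA form.

    code: list of (target, operands) pairs; operands are variable names,
    constants or operator symbols. Branches and phi-functions are not handled.
    """
    version = {}   # variable -> number of its latest definition
    renamed_code = []
    for target, operands in code:
        # Uses are renamed first, so they refer to the version defined earlier.
        uses = [f"{x}{version[x]}" if x in version else x for x in operands]
        # Each definition creates a fresh version of the target variable.
        version[target] = version.get(target, 0) + 1
        renamed_code.append((f"{target}{version[target]}", uses))
    return renamed_code


program = [("Y", ["1"]), ("Y", ["2"]), ("X", ["Y"])]
for target, operands in to_ssa(program):
    print(target, "=", " ".join(operands))
# Y1 = 1
# Y2 = 2
# X1 = Y2
```

Uses are renamed before the target's version is bumped. As a result, an instruction such as `x = x + 1` becomes `x2 = x1 + 1`.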
Example IRs in programming languages
• Java bytecode (executed on the Java Virtual Machine)
• C is used as an intermediate representation in several compilers (e.g. for Lisp, Haskell, Cython, ...)
• Microsoft's Common Intermediate Language (CIL)
• The GNU Compiler Collection (GCC) uses abstract syntax trees
Position of the intermediate code generator
• Intermediate code is the interface between the front end and the back end of a compiler
Parser → Static Checker → Intermediate Code Generator → Code Generator
(Front end: parser, static checker and intermediate code generator; back end: code generator)
Variants of syntax trees: DAG
• For expressions, a DAG can be built instead of a syntax tree.
• A directed acyclic graph (DAG) is an AST with a unique node for each value.
• It exposes common subexpressions, so the compiler can use that knowledge during code generation.
• A node for a common subexpression has more than one parent, e.g. a and b-c
• Example: a+a*(b-c)+(b-c)*d
• The nodes for a and (b-c) are created only once, and their values are used in two different contexts.
[Figure: DAG for a+a*(b-c)+(b-c)*d, in which the nodes for a and b-c are shared]
DAGs using an array
• Algorithm
  - Search the array for a node m with label op, left child l and right child r
  - If there is such a node, return the value number m
  - If not, create in the array a new node n with label op, left child l and right child r, and return its value number n
  - The search for m can be made more efficient by using k lists and a hash function to determine which list to check
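[Figure: DAG for i := i + 10: the root = has children i and +, and + has children i and 10 (the node for i is shared)]
Array of records for i := i + 10:
0: id   entry for i
1: num  10
2: +    0  1
3: =    0  2
Value-number method for constructing a node in a DAG
Input: label op, left child l, right child r
Output: the value number of a node in the array with signature <op, l, r>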
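The value-number method can be sketched in Python as follows. The sketch assumes that a Python dict stands in for the slide's k lists plus a hash function, since it hashes each node signature directly. The class `ValueNumberDAG` is hypothetical. Leaves also store the variable name rather than a real symbol-table pointer.

```python
class ValueNumberDAG:
    """DAG stored as an array of records, with a hash table for the search step."""

    def __init__(self):
        self.nodes = []   # array of records; the index of a record is its value number
        self.table = {}   # signature (label, ...) -> value number

    def node(self, *signature):
        """Return the value number of the node with this signature, creating it if needed.

        Interior nodes have signature (op, l, r), where l and r are value numbers;
        leaves have signature (label, value), e.g. ("id", "i") or ("num", 10).
        """
        if signature in self.table:          # an identical node already exists: reuse it
            return self.table[signature]
        self.nodes.append(signature)         # otherwise create a new record n
        n = len(self.nodes) - 1
        self.table[signature] = n
        return n


dag = ValueNumberDAG()
i = dag.node("id", "i")
ten = dag.node("num", 10)
plus = dag.node("+", dag.node("id", "i"), ten)   # the second lookup of i returns node 0
dag.node("=", i, plus)
for n, record in enumerate(dag.nodes):
    print(n, record)
# 0 ('id', 'i')
# 1 ('num', 10)
# 2 ('+', 0, 1)
# 3 ('=', 0, 2)
```

Because identical signatures map to the same value number, the DAG sharing arises automatically. No separate pass is needed to find common subexpressions.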
Exercise
• Construct the syntax tree, the DAG and the array of records for the expression a := b*(-c) + b*(-c)
• Use the SDD that produces syntax trees for assignment statements
Three-address code: addresses
• Three-address code is built from two concepts: addresses and instructions.
• An instruction is a statement or operation
  - At most one operator on the right side of an instruction
  - Three-address form: x = y op z
• An address can be
  - A name: a source program variable name, or a pointer to its symbol-table entry
  - A constant: a constant in the program
  - A compiler-generated temporary, e.g. t1, t2
Forms of three-address instructions
• Assignment with a binary operator ----- x := y op z
• Assignment with a unary operator ----- x := op y
• Copy statements ----- x := y
• Unconditional jump ----- goto L
• Conditional jump ----- if x relop y goto L [relop is <, =, >=, etc.]
• Procedure calls: the three-address code generated for the call y = p(x1, x2, ..., xn) is
param x1
param x2
...
param xn
y = call p, n
• Indexed assignments ----- x := y[i] and x[i] := y
• Address and pointer assignments ----- x := &y, x := *y and *x := y
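The assignment, copy and jump forms above have simple run-time meanings. A tiny interpreter makes those meanings concrete. The following is a minimal Python sketch, assuming a hypothetical tuple encoding for each instruction and a small assumed operator set. `run`, `BINARY` and `UNARY` are illustrative names. Indexed, pointer and call instructions are left out.

```python
import operator

# Operators the sketch understands (an assumed, minimal set).
BINARY = {"+": operator.add, "-": operator.sub, "*": operator.mul,
          "<": operator.lt, "<=": operator.le, "=": operator.eq,
          ">": operator.gt, ">=": operator.ge, "!=": operator.ne}
UNARY = {"uminus": operator.neg}


def run(code, env):
    """Execute a list of three-address instructions, updating env (variable -> value)."""
    labels = {ins[1]: pc for pc, ins in enumerate(code) if ins[0] == "label"}

    def value(a):                      # an address is a variable name or a constant
        return env[a] if isinstance(a, str) else a

    pc = 0
    while pc < len(code):
        kind, *args = code[pc]
        pc += 1
        if kind == "binop":            # x := y op z
            x, y, op, z = args
            env[x] = BINARY[op](value(y), value(z))
        elif kind == "unop":           # x := op y
            x, op, y = args
            env[x] = UNARY[op](value(y))
        elif kind == "copy":           # x := y
            x, y = args
            env[x] = value(y)
        elif kind == "goto":           # goto L
            pc = labels[args[0]]
        elif kind == "if":             # if x relop y goto L
            x, relop, y, target = args
            if BINARY[relop](value(x), value(y)):
                pc = labels[target]
        # a "label" instruction does nothing when executed
    return env


# if (A < B) t = 1 else t = 0, using symbolic labels
if_else = [("if", "A", "<", "B", "L1"),
           ("copy", "t", 0),
           ("goto", "L2"),
           ("label", "L1"),
           ("copy", "t", 1),
           ("label", "L2")]
print(run(if_else, {"A": 1, "B": 2})["t"])   # 1
print(run(if_else, {"A": 3, "B": 2})["t"])   # 0
```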
Ex: Write three-address code for the block of statements
int a;
int b;
a = 5 + 2 * b;
Solution:
t0 = 5;
t1 = 2 * b;
a = t0 + t1;
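Code like this can be produced mechanically by a post-order walk of the expression tree. The walk emits one instruction per operator node into a fresh temporary. The following is a minimal Python sketch, assuming expressions are nested `(op, left, right)` tuples. `gen_tac` is a hypothetical helper. The slide's solution first copies 5 into `t0`, whereas this sketch uses the constant 5 directly as an address, which is also valid three-address code.

```python
import itertools


def gen_tac(tree):
    """Post-order walk of an expression tree; returns (code, address of the result).

    tree is a name or constant (str or int) or a tuple (op, left, right).
    """
    code = []
    temps = (f"t{n}" for n in itertools.count(1))   # fresh temporaries t1, t2, ...

    def walk(node):
        if not isinstance(node, tuple):     # leaf: the name or constant is its own address
            return str(node)
        op, left, right = node
        l, r = walk(left), walk(right)      # operands first (post-order)
        t = next(temps)
        code.append(f"{t} = {l} {op} {r}")
        return t

    return code, walk(tree)


code, result = gen_tac(("+", 5, ("*", 2, "b")))
code.append(f"a = {result}")
print("\n".join(code))
# t1 = 2 * b
# t2 = 5 + t1
# a = t2
```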
Ex: Write three-address code for the if-else statement
if (A < B)
{
t = 1
}
else
{
t = 0
}
Solution:
(1) if (A < B) goto (4)
(2) t=0
(3) goto (5)
(4) t = 1
(5)
Ex: Write three-address code for the if-else statement
if ((A < B) && (C < D))
{
t = 1
}
else
{
t = 0
}
Solution:
(1) if (A < B) goto (3)
(2) goto (4)
(3) if (C < D) goto (6)
(4) t = 0
(5) goto (7)
(6) t = 1
(7)
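The numbering in this solution follows a fixed pattern. Each condition except the last jumps to the next test when true and to the else part when false. The last test jumps to the then part when true and falls through into the else part. The following is a minimal Python sketch of that pattern. It assumes the then and else parts are single instructions, and `and_if_else` is a hypothetical helper. With one condition, it reproduces the plain if-else solution from the previous example.

```python
def and_if_else(conditions, then_stmt, else_stmt, start=1):
    """Numbered jumping code for: if (c1 && ... && cn) then_stmt else else_stmt."""
    n = len(conditions)
    else_pos = start + 2 * (n - 1) + 1      # first instruction after all the tests
    then_pos = else_pos + 2                 # skip else_stmt and its "goto end"
    end_pos = then_pos + 1
    code = []
    pos = start
    for cond in conditions[:-1]:
        code.append((pos, f"if {cond} goto ({pos + 2})"))   # true: go to the next test
        code.append((pos + 1, f"goto ({else_pos})"))        # false: go to the else part
        pos += 2
    # Last test: true jumps to the then part, false falls through to the else part.
    code.append((pos, f"if {conditions[-1]} goto ({then_pos})"))
    code += [(else_pos, else_stmt),
             (else_pos + 1, f"goto ({end_pos})"),
             (then_pos, then_stmt),
             (end_pos, "")]
    return code


for pos, text in and_if_else(["(A < B)", "(C < D)"], "t = 1", "t = 0"):
    print(f"({pos}) {text}".rstrip())
```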
Ex: Write three-address code for the while statement
a=3; b=4; i=0;
while(i<n){
a= b+1;
a=a*a;
i++;
}
c=a;
Solution:
a=3;
b=4;
i=0;
L1:
T1=i<n;
if T1 goto L2;
goto L3;
L2:
T2=b+1;
a=T2;
T3=a*a;
a=T3;
i=i+1;
goto L1;
L3:
c=a;
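The loop translation follows a fixed template. It places a label at the loop head, evaluates the condition into a temporary and jumps into the body when it is true. Otherwise it jumps past the loop, and the body ends with a jump back to the head. The following is a minimal Python sketch of that template. It assumes the body has already been translated into the instruction strings shown in the solution. `translate_while` and `counter` are hypothetical helpers.

```python
import itertools


def counter(prefix):
    """Return a function that produces fresh names prefix1, prefix2, ..."""
    numbers = itertools.count(1)
    return lambda: f"{prefix}{next(numbers)}"


def translate_while(cond, body, new_label, new_temp):
    """Three-address code for: while (x relop y) { body }.

    cond is a tuple (x, relop, y); body is a list of already-translated instructions.
    """
    begin, true_label, after = new_label(), new_label(), new_label()
    t = new_temp()
    x, relop, y = cond
    return ([f"{begin}:",
             f"{t} = {x} {relop} {y}",      # evaluate the condition
             f"if {t} goto {true_label}",   # true: run the body
             f"goto {after}",               # false: leave the loop
             f"{true_label}:"]
            + body
            + [f"goto {begin}",             # re-test the condition
               f"{after}:"])


body = ["T2 = b + 1", "a = T2", "T3 = a * a", "a = T3", "i = i + 1"]
code = (["a = 3", "b = 4", "i = 0"]
        + translate_while(("i", "<", "n"), body, counter("L"), counter("T"))
        + ["c = a"])
print("\n".join(code))
```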
Ex: Write three-address code for the switch statement
switch (ch)
{
case 1 : c = a + b;
break;
case 2 : c = a - b;
break;
}
Solution:
if ch = 1 goto L1
if ch = 2 goto L2
goto Last
L1:
T1 = a + b
c = T1
goto Last
L2:
T2 = a - b
c = T2
goto Last
Last:
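The same layout can be generated for any number of cases, provided each case ends in `break`. The generator emits one conditional jump per case and a jump to the exit when no case matches. After that come the case bodies, each followed by a jump to the exit. The following is a minimal Python sketch under those assumptions. `translate_switch` and `counter` are hypothetical helpers. The final `goto Last` just before `Last:` is redundant, but keeping it makes every case body uniform.

```python
import itertools


def counter(prefix):
    """Return a function that produces fresh names prefix1, prefix2, ..."""
    numbers = itertools.count(1)
    return lambda: f"{prefix}{next(numbers)}"


def translate_switch(selector, cases, new_label, exit_label="Last"):
    """Three-address code for a switch in which every case ends with break.

    cases is a list of (constant, translated_body) pairs.
    """
    case_labels = [new_label() for _ in cases]
    # Test sequence: one conditional jump per case.
    code = [f"if {selector} = {value} goto {label}"
            for (value, _), label in zip(cases, case_labels)]
    code.append(f"goto {exit_label}")                         # no case matched
    for (_, body), label in zip(cases, case_labels):
        code += [f"{label}:", *body, f"goto {exit_label}"]   # break
    code.append(f"{exit_label}:")
    return code


cases = [(1, ["T1 = a + b", "c = T1"]), (2, ["T2 = a - b", "c = T2"])]
print("\n".join(translate_switch("ch", cases, counter("L"))))
```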
Choice of allowable operators
• It is an important issue in the design of an intermediate form
• A small operator set is easier to implement
• A restricted instruction set may force the front end to generate long sequences of statements for some source-language operations
• The optimizer and code generator may then have to work harder if good code is to be generated
• The operators should be close enough to machine instructions to simplify code generation
Example
do
i = i+1;
while(a[i*8] < v);
With symbolic labels:
L: t1 = i + 1
i = t1
t2 = i * 8
t3 = a[t2]
if t3 < v goto L
With position numbers:
100: t1 = i + 1
101: i = t1
102: t2 = i * 8
103: t3 = a[t2]
104: if t3 < v goto 100
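Turning symbolic labels into position numbers takes two passes over the code. The first pass records the position of every labelled instruction. The second rewrites each `goto label`, because a forward jump names a label that has not been seen yet. The following is a minimal Python sketch of this, where `resolve_labels` is a hypothetical helper and instructions are assumed to be plain strings. One-pass translators solve the same problem with backpatching instead.

```python
import re


def resolve_labels(code, start=100):
    """Replace symbolic labels by position numbers.

    code is a list of (label or None, instruction). Pass 1 records where each
    label is; pass 2 rewrites every "goto label" with that position.
    """
    positions = {label: start + k for k, (label, _) in enumerate(code) if label}

    def to_position(match):
        return f"goto {positions[match.group(1)]}"

    return [(start + k, re.sub(r"goto (\w+)", to_position, text))
            for k, (_, text) in enumerate(code)]


loop = [("L", "t1 = i + 1"), (None, "i = t1"), (None, "t2 = i * 8"),
        (None, "t3 = a[t2]"), (None, "if t3 < v goto L")]
for position, text in resolve_labels(loop):
    print(f"{position}: {text}")
# 100: t1 = i + 1
# 101: i = t1
# 102: t2 = i * 8
# 103: t3 = a[t2]
# 104: if t3 < v goto 100
```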
Syntax tree vs. DAG vs. three-address code
• An AST is the procedure's parse tree with the nodes for most nonterminal symbols removed.
• A directed acyclic graph (DAG) is an AST with a unique node for each value.
• Three-address code is a sequence of statements of the general form x := y op z
• In TAC there is at most one operator on the right side of an instruction.
• Example:
t1 = b - c
t2 = a * t1
t3 = a + t2
t4 = t1 * d
t5 = t3 + t4
[Figure: AST, DAG and TAC for a+a*(b-c)+(b-c)*d]
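The TAC in this example corresponds to the DAG rather than the tree: `b - c` is computed once into t1 and reused. The following is a minimal Python sketch that gets this effect by memoizing subexpressions during a post-order walk. Looking up structurally identical subtrees in a dict is a form of value numbering. `dag_tac` is a hypothetical helper, and the reuse is only valid because no variable is reassigned within the expression.

```python
def dag_tac(tree):
    """TAC from an expression tree in which each distinct subexpression (DAG node)
    gets one temporary, so common subexpressions are computed only once."""
    code, memo = [], {}          # memo: subexpression -> temporary holding its value

    def walk(node):
        if not isinstance(node, tuple):
            return node                       # leaf: a name is its own address
        if node in memo:                      # shared DAG node: reuse its temporary
            return memo[node]
        op, left, right = node
        l, r = walk(left), walk(right)
        t = f"t{len(memo) + 1}"
        code.append(f"{t} = {l} {op} {r}")
        memo[node] = t
        return t

    walk(tree)
    return code


b_minus_c = ("-", "b", "c")
tree = ("+", ("+", "a", ("*", "a", b_minus_c)), ("*", b_minus_c, "d"))
print("\n".join(dag_tac(tree)))
# t1 = b - c
# t2 = a * t1
# t3 = a + t2
# t4 = t1 * d
# t5 = t3 + t4
```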
Data structures for three-address code
• Implementations of three-address statements
  - Quadruples: four fields, op, arg1, arg2 and result
  - Triples: temporaries are not used; instead, an operand refers to the instruction that computes its value
  - Indirect triples: in addition to the triples, a list of pointers to triples is kept
Example
• a = b * uminus c + b * uminus c
  (or)
  a = b * (-c) + b * (-c)
Three-address code:
t1 = uminus c
t2 = b * t1
t3 = uminus c
t4 = b * t3
t5 = t2 + t4
a = t5
Quadruples:
     op      arg1  arg2  result
(0)  uminus  c           t1
(1)  *       b     t1    t2
(2)  uminus  c           t3
(3)  *       b     t3    t4
(4)  +       t2    t4    t5
(5)  =       t5          a
Triples:
     op      arg1  arg2
(0)  uminus  c
(1)  *       b     (0)
(2)  uminus  c
(3)  *       b     (2)
(4)  +       (1)   (3)
(5)  =       a     (4)
Indirect triples (the triples are the same as in the Triples table; the instruction list gives the execution order):
     instruction
35   (0)
36   (1)
37   (2)
38   (3)
39   (4)
40   (5)
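The three tables are mechanically related. A triple is a quadruple with the result field dropped, and every use of a temporary is replaced by the position of the instruction that computes it. The following is a minimal Python sketch of that conversion, assuming every non-copy quadruple defines a temporary. `to_triples` is a hypothetical helper, and the `("=", result, value)` layout follows the slide's triple for `a = t5`. The indirect-triple part only shows that the execution order is a separate list.

```python
def to_triples(quads):
    """Convert quadruples (op, arg1, arg2, result) into triples (op, arg1, arg2).

    Assumes every non-copy quadruple defines a temporary. A use of that
    temporary becomes a reference "(k)" to the triple k that computes it;
    a copy "result = arg1" becomes ("=", result, arg1).
    """
    triples, defined_at = [], {}          # defined_at: temporary -> triple index

    def ref(arg):
        return f"({defined_at[arg]})" if arg in defined_at else arg

    for op, arg1, arg2, result in quads:
        if op == "=":
            triples.append(("=", result, ref(arg1)))
        else:
            args = (ref(arg1), ref(arg2))     # resolve uses before recording the definition
            defined_at[result] = len(triples)
            triples.append((op, *args))
    return triples


quads = [("uminus", "c", None, "t1"), ("*", "b", "t1", "t2"),
         ("uminus", "c", None, "t3"), ("*", "b", "t3", "t4"),
         ("+", "t2", "t4", "t5"), ("=", "t5", None, "a")]
triples = to_triples(quads)

# Indirect triples: the execution order is a separate list of triple indices,
# shown here at positions 35-40. An optimizer reorders only this list; the
# "(k)" references inside the triples stay valid.
instruction = list(range(len(triples)))
for position, k in enumerate(instruction, start=35):
    op, arg1, arg2 = triples[k]
    print(position, f"({k})", op, arg1, arg2 or "")
```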
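Compare quadruples, triples and indirect triples
• When instructions are moved around during optimization, quadruples are better than triples
  - Quadruples use temporary variables, so moving an instruction does not change how other instructions refer to its result
  - Triples refer to a result by the position of the instruction that computes it, so moving an instruction requires changing every reference to it
• Indirect triples solve this problem: the optimizer reorders the list of pointers, and the triples themselves are unchanged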
More triple representations
• x[i] := y
     op      arg1  arg2
(0)  []=     x     i
(1)  assign  (0)   y
• x := y[i]   // Exercise
     op      arg1  arg2
(0)  =[]     y     i
(1)  assign  x     (0)
Exercise
• Give three-address code representations for a+a*(b-c)+(b-c)*d as:
  - Quadruples
  - Triples
  - Indirect triples
Types and Declarations
• Type checking: ensures that the types of the operands match the types expected by their context (the operator).
  - E.g. the mod operator needs integer operands
• Types are also used to:
  - Determine the storage needed
  - Calculate the address of an array reference
  - Insert explicit type conversions
• When declarations are grouped together, a single offset on the stack pointer suffices.
  - int x, y, z; fun1(); fun2();
• Otherwise, the translator needs to keep track of the current offset.
  - int x; fun1(); int y, z; fun2();
Storage layout
• From the type of a name, we can determine the amount of storage the name will need at run time.
• At compile time, this amount is used to assign the name a relative address.
• The type and relative address are saved in the symbol-table entry of the name.
• For data whose length is determined only at run time, the symbol table saves a pointer to the data.
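Assigning relative addresses amounts to keeping a running offset. Each declared name gets the current offset, and the offset then advances by the width of the name's type. The next offset is returned so that a later group of declarations, such as `int y, z;` after a call, can continue from it. The following is a minimal Python sketch, assuming hypothetical widths (char 1, int 4, float 8 bytes), a tuple form `("array", n, element)` for arrays, and no alignment padding. `width` and `layout` are illustrative names.

```python
# Assumed widths in bytes; real values depend on the target machine.
WIDTH = {"char": 1, "int": 4, "float": 8}


def width(type_):
    """Storage needed for a type: basic types from WIDTH, arrays as ("array", n, element)."""
    if isinstance(type_, tuple):
        _, n, element = type_
        return n * width(element)
    return WIDTH[type_]


def layout(declarations, offset=0):
    """Give each declared name a relative address, starting at offset.

    Returns the symbol-table entries {name: (type, relative address)} and the
    next free offset.
    """
    symbols = {}
    for name, type_ in declarations:
        symbols[name] = (type_, offset)     # type and relative address
        offset += width(type_)
    return symbols, offset


# int x; fun1(); int y, z;  (the offset is carried across the call)
first, offset = layout([("x", "int")])
second, offset = layout([("y", "int"), ("z", "int")], offset)
print(first, second, offset)
```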
Type Systems Design
• The design is based on the syntactic constructs of the language, the notion of types, and the rules for assigning types to language constructs.
• E.g.
  - In arithmetic operations such as addition, subtraction, multiplication and division, if both operands are integers, then the result is also an integer.
Type Expressions
• The type of a language construct is denoted by a type expression.
• A type expression is either a basic type or is formed by applying an operator, called a type constructor, to other type expressions.
Equivalence of Type Expressions
• Two types are structurally equivalent iff one of the following conditions is true:
  - They are the same basic type.
  - They are formed by applying the same constructor to structurally equivalent types.
  - One is a type name that denotes the other.
• Examples:
  - int a[2][3] is not equivalent to int b[3][2];
  - int a is not equivalent to char b[4];
  - struct {int, char} is not equivalent to struct {char, int};
  - int * is not equivalent to void *.
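The three conditions translate directly into a recursive check. The following is a minimal Python sketch under an assumed encoding. Basic types are strings, constructed types are tuples whose first element names the constructor, and a dict maps type names to the types they denote. `equivalent` is a hypothetical helper, and a struct is modelled as a `product` of its field types. Recursive type names are not handled, because they would make the name resolution loop forever.

```python
def equivalent(s, t, names=None):
    """Structural equivalence of type expressions.

    Basic types are strings such as "int"; constructed types are tuples whose
    first element is the constructor, e.g. ("array", 2, "int"), ("pointer", "int"),
    ("product", "int", "char"). names maps type names to what they denote.
    """
    names = names or {}

    def resolve(x):                           # a type name denotes its definition
        while isinstance(x, str) and x in names:
            x = names[x]
        return x

    s, t = resolve(s), resolve(t)
    if isinstance(s, str) or isinstance(t, str):
        return s == t                         # same basic type
    if s[0] != t[0] or len(s) != len(t):
        return False                          # different constructors
    # Same constructor: arguments must be equivalent (types) or equal (e.g. sizes).
    return all(equivalent(a, b, names) if isinstance(a, (str, tuple)) else a == b
               for a, b in zip(s[1:], t[1:]))


print(equivalent(("array", 2, ("array", 3, "int")),
                 ("array", 3, ("array", 2, "int"))))          # False
print(equivalent(("pointer", "int"), ("pointer", "void")))   # False
print(equivalent("row", ("array", 15, "char"), {"row": ("array", 15, "char")}))  # True
```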
Type checking rules for expressions
• Basic type expressions: boolean, char, integer, float
• E → literal { E.type = char }
• E → num { E.type = integer }
• E → id { E.type = lookup(id.entry) }
Type checking expressions (cont'd)
• Special basic type expressions: type_error (signals an error during type checking) and void.
E → E1 mod E2 { E.type = if E1.type = integer and E2.type = integer then integer else type_error }
• A type name is a type expression
E → a { E.name = a }
Type checking expressions (cont'd)
• A type expression can be formed by applying the array type constructor to a number and a type expression.
Constructors include:
• Arrays: if T is a type expression, then array(I, T) is a type expression denoting an array with elements of type T and index set I.
  - int a[2][3] is an array of 2 arrays of 3 integers.
  - In functional style: array(2, array(3, int))
E → E1[E2] { E.type = if E2.type = integer and E1.type = array(I, T) then T else type_error }
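The expression rules so far can be read as a recursive function that computes E.type bottom-up. The following is a minimal Python sketch of the literal, num, id, mod and indexing rules. It assumes a hypothetical tuple encoding for expressions and a dict as the symbol table, and the name `check` is illustrative. Like the rules, it returns `type_error` instead of raising, so errors propagate upward.

```python
def check(e, symbols):
    """Type of expression e, following the literal, num, id, mod and index rules.

    e is ("literal", c), ("num", n), ("id", name), ("mod", e1, e2) or
    ("index", e1, e2); array types are ("array", I, T).
    """
    kind = e[0]
    if kind == "literal":
        return "char"
    if kind == "num":
        return "integer"
    if kind == "id":
        return symbols[e[1]]                       # lookup(id.entry)
    if kind == "mod":
        t1, t2 = check(e[1], symbols), check(e[2], symbols)
        return "integer" if t1 == t2 == "integer" else "type_error"
    if kind == "index":
        t1, t2 = check(e[1], symbols), check(e[2], symbols)
        if t2 == "integer" and isinstance(t1, tuple) and t1[0] == "array":
            return t1[2]                           # element type T of array(I, T)
        return "type_error"
    raise ValueError(f"unknown expression kind: {kind}")


symbols = {"a": ("array", 2, ("array", 3, "integer")), "i": "integer", "x": "float"}
print(check(("index", ("index", ("id", "a"), ("id", "i")), ("num", 1)), symbols))  # integer
print(check(("mod", ("id", "x"), ("num", 2)), symbols))                           # type_error
```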
Type checking expressions (cont'd)
• Product: if s and t are type expressions, then their Cartesian product s × t is a type expression.
• Records: a record is a data structure with named fields. The record type constructor is applied to a tuple formed from field names and field types.
• E.g.
type row = record
{
address: integer;
lexeme : array[1..15] of char
}
var table: array[1..101] of row;
This declares the type name row to denote the type expression record((address × integer) × (lexeme × array(1..15, char))).
The variable table is an array of records of this type.
Type checking expressions (cont'd)
• Assignment statements: E may be an arithmetic, logical or relational expression
S → id = E { S.type = if id.type = E.type then void else type_error }
• If statements:
S → if E then S1 { S.type = if E.type = boolean then S1.type else type_error }
Type checking expressions (cont'd)
• While statements:
S → while E do S1 { S.type = if E.type = boolean then S1.type else type_error }
• Pointers: if T is a type expression, then pointer(T) is a type expression (a pointer to an object of type T).
E → *E1 { E.type = if E1.type = pointer(T) then T else type_error }
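The statement rules combine with the expression rules in the same bottom-up way. An assignment is `void` when both sides have the same type, and `if` and `while` require a boolean condition. The following is a minimal Python sketch of the assignment, if, while and pointer-dereference rules. It assumes a hypothetical tuple encoding. The rule that `<` on two integers yields boolean is an added assumption, since it does not appear on these slides. `expr_type` and `stmt_type` are illustrative names.

```python
def expr_type(e, symbols):
    """Types for a few expression forms; pointer types are ("pointer", T)."""
    kind = e[0]
    if kind == "num":
        return "integer"
    if kind == "id":
        return symbols[e[1]]
    if kind == "<":                                # assumed rule: integer < integer is boolean
        t1, t2 = expr_type(e[1], symbols), expr_type(e[2], symbols)
        return "boolean" if t1 == t2 == "integer" else "type_error"
    if kind == "deref":                            # E -> *E1
        t = expr_type(e[1], symbols)
        return t[1] if isinstance(t, tuple) and t[0] == "pointer" else "type_error"
    raise ValueError(f"unknown expression kind: {kind}")


def stmt_type(s, symbols):
    """void for a well-typed statement, type_error otherwise."""
    kind = s[0]
    if kind == "assign":                           # S -> id = E
        _, name, e = s
        return "void" if symbols[name] == expr_type(e, symbols) else "type_error"
    if kind in ("if", "while"):                    # S -> if E then S1 | while E do S1
        _, e, body = s
        if expr_type(e, symbols) == "boolean":
            return stmt_type(body, symbols)
        return "type_error"
    raise ValueError(f"unknown statement kind: {kind}")


symbols = {"i": "integer", "n": "integer", "p": ("pointer", "integer"), "b": "boolean"}
loop = ("while", ("<", ("id", "i"), ("id", "n")), ("assign", "i", ("deref", ("id", "p"))))
print(stmt_type(loop, symbols))   # void
```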
Type checking expressions (cont'd)
• Functions: a function maps a domain type D to a range type R. The type of such a function is denoted by the type expression D → R.
Mapping: T → D '→' R { T.type = D.type → R.type }
Function call: E → E1 ( E2 ) { E.type = if E2.type = T1 and E1.type = T1 → T2 then T2 else type_error }
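The function-call rule needs only one comparison: the argument type must match the function's domain type. The following is a minimal Python sketch, assuming a function type D → R is encoded as the hypothetical tuple `("fn", D, R)`. A multi-argument function uses a `product` domain, and `call_type` is an illustrative name. Python's `==` on these tuples acts as structural equivalence when no type names are involved.

```python
def call_type(fn_type, arg_type):
    """E -> E1(E2): if E1 has type T1 -> T2 and E2 has type T1, the call has type T2."""
    if isinstance(fn_type, tuple) and fn_type[0] == "fn" and fn_type[1] == arg_type:
        return fn_type[2]
    return "type_error"


# f : integer × char -> boolean
f = ("fn", ("product", "integer", "char"), "boolean")
print(call_type(f, ("product", "integer", "char")))   # boolean
print(call_type(f, "integer"))                        # type_error
```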
Type checking rules for coercions
• Implicit type conversions (by the compiler) and explicit type conversions (by the programmer)
E → E1 op E2 { E.type = if E1.type = integer and E2.type = integer then integer
else if E1.type = integer and E2.type = float then float
else if E1.type = float and E2.type = integer then float
else if E1.type = float and E2.type = float then float
else type_error }
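Beyond computing E.type, a translator that applies these coercion rules must also emit the implicit conversion as an explicit instruction. The following is a minimal Python sketch of both steps. It assumes the conversion is written as a `t = (float) x` instruction, a notation chosen here for illustration. `arith_type` and `gen_binary` are hypothetical helpers.

```python
def arith_type(t1, t2):
    """Result type of E1 op E2 under the slide's coercion rules (integer widens to float)."""
    if t1 == t2 == "integer":
        return "integer"
    if {t1, t2} <= {"integer", "float"}:
        return "float"
    return "type_error"


def gen_binary(op, a1, t1, a2, t2, new_temp):
    """Emit TAC for a1 op a2, inserting an explicit (float) conversion where needed."""
    result_type = arith_type(t1, t2)
    if result_type == "type_error":
        raise TypeError(f"operands of {op} have types {t1} and {t2}")
    code = []

    def widen(addr, type_):
        if type_ == result_type:
            return addr
        t = new_temp()
        code.append(f"{t} = (float) {addr}")   # integer operand used as a float
        return t

    a1, a2 = widen(a1, t1), widen(a2, t2)
    result = new_temp()
    code.append(f"{result} = {a1} {op} {a2}")
    return code, result, result_type


temps = iter(["t1", "t2", "t3"])
print(gen_binary("*", "2", "integer", "x", "float", lambda: next(temps)))
```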
Type Checking
• Type expressions are checked for
  - Correct code
  - Security aspects
  - Efficient code generation
P. Kuppusamy - Lexical Analyzer