Intermediate code generation
Intermediate Code Generation
• The final phase of the compiler front-end
• Goal: translate the program into a format
expected by the compiler back-end
• In typical compilers: followed by intermediate
code optimization and machine code
generation
Why use an intermediate representation?
• It makes optimization easier: optimization methods are written only once, for the intermediate representation
• The intermediate representation can be directly interpreted
• It makes retargeting easier: one front end can be combined with back ends for different target machines, e.g.
– SPARC (Scalable Processor Architecture)
– MIPS (Microprocessor without Interlocked Pipelined Stages)
How to choose the intermediate representation?
• It should be easy to translate the source language
to the intermediate representation
• It should be easy to translate the intermediate
representation to the machine code
• The intermediate representation should be
suitable for optimization
• It should be neither too high level nor too low
level
• A single compiler can use more than one
intermediate representation
Common intermediate representations
• General forms of intermediate representations (IR):
– Graphical IR (e.g. parse trees, abstract syntax trees, DAGs)
– Linear IR (i.e. non-graphical), for example:
– Three-address code (TAC): instructions of the form "result = op1 operator op2"
– Static single assignment (SSA) form: each variable is assigned exactly once. For example,
Y = 1
Y = 2
X = Y
becomes, in SSA form,
Y1 = 1
Y2 = 2
X1 = Y2
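The renaming in the SSA example can be sketched for straight-line code: every assignment creates a new numbered version of its target, and every use refers to the latest version. This is a minimal illustration under the assumption that there are no branches (so no phi-functions are needed); the function name `to_ssa` and the (target, operands) input format are hypothetical.

```python
def to_ssa(code):
    """Rename variables in straight-line code so that each is assigned once.

    code is a list of (target, operands) pairs, where operands is a list of
    variable names or constants. Straight-line code only: with no branches,
    no phi-functions are needed.
    """
    version = {}  # variable name -> number of its latest version
    renamed = []
    for target, operands in code:
        # A use refers to the latest version of the variable; constants and
        # names never assigned in this code are left unchanged.
        uses = [f"{op}{version[op]}" if op in version else op for op in operands]
        # Each assignment creates a new version of its target.
        version[target] = version.get(target, 0) + 1
        renamed.append((f"{target}{version[target]}", uses))
    return renamed


for target, uses in to_ssa([("Y", ["1"]), ("Y", ["2"]), ("X", ["Y"])]):
    print(target, "=", " ".join(uses))
```

Uses are renamed before the target's version is bumped, so an assignment such as x = x + 1 correctly reads the previous version.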
Example IR in programming languages
• Java bytecode (executed on the Java Virtual Machine)
• C is used as an intermediate representation by compilers
for several languages (e.g. Lisp, Haskell, Cython. . . )
• Microsoft’s Common Intermediate Language (CIL)
• GNU Compiler Collection (GCC) uses abstract syntax trees
Position of Intermediate code generator
• Intermediate code is the interface between front end
and back end in a compiler
Parser → Static checker → Intermediate code generator → Code generator
(Front end: parser, static checker and intermediate code generator; back end: code generator)
Abstract Syntax Tree vs. Concrete Syntax (Parse) Tree
Variants of syntax trees - DAG
• For expressions, a DAG can be built instead of a syntax tree.
• A directed acyclic graph (DAG) is an AST with a unique node for each value.
• It exposes common subexpressions, and that knowledge can then be used during code
generation.
• A common subexpression has more than one parent, e.g. a and b-c in the example below.
• Example: a+a*(b-c)+(b-c)*d
• The nodes for a and (b-c) each appear only once, although each value is used in two different contexts.
DAG (figure): the root + has children +(a, *(a, n)) and *(n, d), where n = -(b, c); the leaves a, b, c and d each appear once, and n is shared by both * nodes.
DAGs using an array
• Algorithm
– Search the array for a node m with label op, left child l and right child r
– If there is such a node, return the value number m
– If not, create in the array a new node n with label op, left child l and right child r,
and return its value number n
– The search for m can be made more efficient by using k lists (buckets) and a hash
function to determine which list to check.
Array of records for i := i + 10 (each node is referred to by its index, its value number):
0   id    entry for i
1   num   10
2   +     0     1
3   =     0     2
Value-number method for constructing a node in a DAG
Input: label op, left child l, right child r
Output: the value number of a node in the array with signature <op, l, r>
Data structures: an array of records, searched directly or through a hash table on <op, l, r>
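The value-number method can be sketched with a Python dict playing the role of the hash table (instead of hand-built k buckets) and a list as the array of records. As an assumption for illustration, the leaf for i stores the name "i" rather than a pointer to its symbol-table entry; the class and method names are hypothetical. The example builds the records for i := i + 10 (the num record in the example holds 10).

```python
class ValueNumberTable:
    """Array of records for DAG nodes, plus a hash table on node signatures.

    A signature is (label, left, right) for an interior node, or
    (label, value) for a leaf; the value number of a node is its index.
    """

    def __init__(self):
        self.records = []  # the array of records
        self.index = {}    # hash table: signature -> value number

    def node(self, *signature):
        # Search: if a node with this signature exists, return its value number.
        if signature in self.index:
            return self.index[signature]
        # Otherwise create a new record and return its new value number.
        n = len(self.records)
        self.records.append(signature)
        self.index[signature] = n
        return n


table = ValueNumberTable()
i = table.node("id", "i")  # leaf; stands in for the symbol-table entry of i
ten = table.node("num", 10)
plus = table.node("+", i, ten)
table.node("=", i, plus)
for n, record in enumerate(table.records):
    print(n, *record)
```

Asking for the leaf i a second time returns value number 0 instead of adding a record, which is exactly how the DAG shares nodes.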
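SDD for creating DAGs
Grammar production     Semantic rule
1) E → E1 + T          { E.node = mknode('+', E1.node, T.node) }
2) E → E1 - T          { E.node = mknode('-', E1.node, T.node) }
3) E → T               { E.node = T.node }
4) T → ( E )           { T.node = E.node }
5) T → id              { T.node = mkleaf(id, id.entry) }
6) T → num             { T.node = mkleaf(num, num.val) }
When constructing a DAG, mkleaf and mknode first look for an existing node with the same label and children; if one exists, it is returned instead of creating a new node.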
Example: steps for constructing the DAG of a+a*(b-c)+(b-c)*d
1) p1 = mkleaf(id, entry-a)
2) p2 = mkleaf(id, entry-a) = p1
3) p3 = mkleaf(id, entry-b)
4) p4 = mkleaf(id, entry-c)
5) p5 = mknode('-', p3, p4)
6) p6 = mknode('*', p1, p5)
7) p7 = mknode('+', p1, p6)
8) p8 = mkleaf(id, entry-b) = p3
9) p9 = mkleaf(id, entry-c) = p4
10) p10 = mknode('-', p3, p4) = p5
11) p11 = mkleaf(id, entry-d)
12) p12 = mknode('*', p5, p11)
13) p13 = mknode('+', p7, p12)
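The steps above can be reproduced with a small recursive-descent parser whose actions call mkleaf and mknode, where both functions return an existing node when the same signature is requested again. The SDD's grammar has no production for '*', so the parser assumes an extra level T → T1 * F | F (with F → ( E ) | id | num); the class `Dag`, the function `build` and the tokenizer are hypothetical helpers for this sketch.

```python
import re


class Dag:
    """mkleaf/mknode that return an existing node when the signature repeats."""

    def __init__(self):
        self.ids = {}   # signature -> node number (1, 2, ...)
        self.calls = 0  # number of mkleaf/mknode calls made

    def _node(self, signature):
        self.calls += 1
        if signature not in self.ids:
            self.ids[signature] = len(self.ids) + 1
        return self.ids[signature]

    def mkleaf(self, kind, value):
        return self._node((kind, value))

    def mknode(self, op, left, right):
        return self._node((op, left, right))


def build(text, dag):
    """Parse text with the SDD's grammar and return the root node number."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[-+*()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr():
        # E -> E1 + T | E1 - T | T, with the left recursion written as a loop
        node = term()
        while peek() in ("+", "-"):
            op = take()
            node = dag.mknode(op, node, term())
        return node

    def term():
        # Assumed extra production for '*': T -> T1 * F | F
        node = factor()
        while peek() == "*":
            take()
            node = dag.mknode("*", node, factor())
        return node

    def factor():
        # F -> ( E ) | id | num
        tok = take()
        if tok == "(":
            node = expr()
            take()  # the closing ')'
            return node
        if tok.isdigit():
            return dag.mkleaf("num", int(tok))
        return dag.mkleaf("id", tok)

    return expr()


dag = Dag()
root = build("a+a*(b-c)+(b-c)*d", dag)
print(dag.calls, "calls,", len(dag.ids), "DAG nodes")
```

The 13 calls correspond to steps 1 to 13; only 9 distinct nodes are created because steps 2, 8, 9 and 10 return existing nodes.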
Example
• Work out a+(b-c)+(b-c)
– Construct the syntax tree and the DAG
Exercise
• Construct the syntax tree, the DAG and the array of records for the
expression a := b*(-c)+b*(-c)
• Use an SDD that produces syntax trees for assignment statements
Three-address code: addresses
• Three-address code is built from two concepts:
– addresses and instructions.
• An instruction is a statement or operation
– There is at most one operator on the right side of an instruction.
– Three-address form:
x = y op z
• An address can be
– Name: a source-program name; in an implementation, a pointer to its symbol-table entry.
– Constant: a constant in the program.
– Compiler-generated temporary: e.g. t1, t2, created as needed.
Forms or types of three-address instructions
• Assignment with a binary operator ----- x := y op z
• Assignment with a unary operator ----- x := op y
• Copy statements ----- x := y
• Unconditional jump ----- goto L
• Conditional jump ----- if x relop y goto L [relop is a relational operator: <, =, >=, etc.]
• Procedure calls: the three-address code generated for the call y = p(x1, x2, …, xn) is
param x1
param x2
…
param xn
y = call p, n
• Indexed assignments ------ x := y[i] and x[i] := y
• Address and pointer assignments ------ x := &y, x := *y and *x := y
Ex: Write Three Address Code for the block of statements
int a;
int b;
a = 5 + 2 * b;
Solution:
t0 = 5;
t1 = 2 * b;
a = t0 + t1;
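A post-order walk over the expression tree produces this kind of code: each operator node gets a fresh temporary after its operands have been evaluated. This sketch assumes a tuple-based tree and hypothetical function names; because constants are addresses, it uses 5 directly instead of copying it into t0 as the solution above does.

```python
from itertools import count


def gen_expr(node, code, temps):
    """Append TAC computing node to code; return the address holding its value.

    A node is a name (str), a constant (int), or a tuple (op, left, right).
    """
    if not isinstance(node, tuple):
        return str(node)  # names and constants can be used as addresses directly
    op, left, right = node
    a1 = gen_expr(left, code, temps)
    a2 = gen_expr(right, code, temps)
    t = f"t{next(temps)}"  # fresh compiler-generated temporary
    code.append(f"{t} = {a1} {op} {a2}")
    return t


def gen_assign(target, expr):
    code = []
    addr = gen_expr(expr, code, count(1))
    code.append(f"{target} = {addr}")
    return code


# a = 5 + 2 * b
for instr in gen_assign("a", ("+", 5, ("*", 2, "b"))):
    print(instr)
```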
Ex: Write Three Address Code for the if-else
if (A < B)
{
t = 1
}
else
{
t = 0
}
Solution-
(1) if (A < B) goto (4)
(2) t=0
(3) goto (5)
(4) t = 1
(5)
Ex: Write Three Address Code for the if-else with &&
if ((A < B) && (C < D))
{
t = 1
}
else
{
t = 0
}
Solution-
(1) if (A < B) goto (3)
(2) goto (4)
(3) if (C < D) goto (6)
(4) t = 0
(5) goto (7)
(6) t = 1
(7)
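The solution above is short-circuit (jumping) code written by hand: if A < B is false, C < D is never tested. A recursive generator can produce such code for any nesting of && and ||, passing down the labels to jump to when the condition is true or false. This is a sketch with hypothetical names and a hypothetical label scheme; unlike the hand-written answer it places the then part first and does not remove redundant jumps, so it emits one more goto.

```python
from itertools import count


def jump(cond, true_label, false_label, code, labels):
    """Emit jumping code: control reaches true_label if cond holds, else false_label.

    cond is either a relational test such as "A < B" or a tuple
    ("&&", c1, c2) / ("||", c1, c2).
    """
    if isinstance(cond, str):
        code.append(f"if {cond} goto {true_label}")
        code.append(f"goto {false_label}")
        return
    op, c1, c2 = cond
    mid = f"L{next(labels)}"  # where the test of c2 starts
    if op == "&&":
        jump(c1, mid, false_label, code, labels)  # c1 false: whole test false
    elif op == "||":
        jump(c1, true_label, mid, code, labels)   # c1 true: whole test true
    else:
        raise ValueError(f"unknown operator {op!r}")
    code.append(f"{mid}:")
    jump(c2, true_label, false_label, code, labels)


def gen_if_else(cond, then_stmt, else_stmt):
    labels = count(1)
    l_true, l_false, l_next = f"L{next(labels)}", f"L{next(labels)}", f"L{next(labels)}"
    code = []
    jump(cond, l_true, l_false, code, labels)
    code += [f"{l_true}:", then_stmt, f"goto {l_next}",
             f"{l_false}:", else_stmt, f"{l_next}:"]
    return code


for instr in gen_if_else(("&&", "A < B", "C < D"), "t = 1", "t = 0"):
    print(instr)
```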
Ex: Write Three Address Code for the while statements
a=3; b=4; i=0;
while(i<n){
a= b+1;
a=a*a;
i++;
}
c=a;
Solution:
a=3;
b=4;
i=0;
L1:
T1=i<n;
if T1 goto L2;
goto L3;
L2:
T2=b+1;
a=T2;
T3=a*a;
a=T3;
i=i+1;
goto L1;
L3:
c=a;
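One way to check a hand-written translation is to execute it. The sketch below is a tiny interpreter for the three-address forms used in the while-loop solution, run on that loop with i++ written as the three-address instruction i = i + 1 and tokens separated by spaces. The interpreter, its string format and the chosen value of n are assumptions for illustration only.

```python
import operator

# Binary operators the interpreter understands.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "<": operator.lt}


def run_tac(program, env):
    """Execute TAC given as strings with space-separated tokens.

    Supported forms: "L:", "x = y op z", "x = y", "if x goto L", "goto L".
    env maps variable names to values and is updated in place.
    """
    # Pass 1: map each label to the index of its line.
    labels = {line[:-1]: i for i, line in enumerate(program) if line.endswith(":")}

    def val(tok):
        return int(tok) if tok.lstrip("-").isdigit() else env[tok]

    pc = 0
    while pc < len(program):
        parts = program[pc].split()
        pc += 1
        if parts[0].endswith(":"):
            continue                           # a label does nothing
        if parts[0] == "goto":
            pc = labels[parts[1]]
        elif parts[0] == "if":                 # if x goto L
            if val(parts[1]):
                pc = labels[parts[3]]
        elif len(parts) == 5:                  # x = y op z
            env[parts[0]] = OPS[parts[3]](val(parts[2]), val(parts[4]))
        else:                                  # x = y
            env[parts[0]] = val(parts[2])
    return env


program = [
    "a = 3", "b = 4", "i = 0",
    "L1:",
    "T1 = i < n",
    "if T1 goto L2",
    "goto L3",
    "L2:",
    "T2 = b + 1",
    "a = T2",
    "T3 = a * a",
    "a = T3",
    "i = i + 1",
    "goto L1",
    "L3:",
    "c = a",
]
print(run_tac(program, {"n": 3})["c"])
```

With n = 0 the body never runs and c keeps the initial value 3; with any positive n each iteration sets a to (4 + 1) * (4 + 1).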
Ex: Write Three Address Code for the switch case
switch (ch)
{
case 1 : c = a + b;
break;
case 2 : c = a - b;
break;
}
Solution-
if ch = 1 goto L1
if ch = 2 goto L2
goto Last
L1:
T1 = a + b
c = T1
goto Last
L2:
T2 = a - b
c = T2
goto Last
Last:
Instructions in Three Address Code
Choice of allowable operators
• The choice of operators is an important issue in the design of an intermediate form
• A small operator set is easier to implement
• However, a restricted instruction set may force the front end to generate long
sequences of statements for some source-language operations
• The optimizer and code generator may then have to work harder if
good code is to be generated
• Operators close to machine instructions make code
generation simpler
Example
do
i = i+1;
while (a[i*8] < v);
With symbolic labels:
L: t1 = i + 1
i = t1
t2 = i * 8
t3 = a[t2]
if t3 < v goto L
With position numbers:
100: t1 = i + 1
101: i = t1
102: t2 = i * 8
103: t3 = a[t2]
104: if t3 < v goto 100
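Converting symbolic labels into position numbers takes two passes: first record the position of every labeled instruction, then rewrite each jump target. This sketch assumes each instruction is a (label, text) pair with the jump target as the last word after goto, and numbers instructions from 100 as in the example; the function name is hypothetical.

```python
def resolve_labels(code, start=100):
    """Replace symbolic labels with position numbers.

    code is a list of (label, instruction) pairs; label is None for
    unlabeled instructions. A label names the position of its instruction.
    """
    # Pass 1: record the position of every labeled instruction.
    positions = {}
    for offset, (label, _) in enumerate(code):
        if label is not None:
            positions[label] = start + offset
    # Pass 2: rewrite jump targets (the last word after "goto").
    result = []
    for offset, (_, instr) in enumerate(code):
        words = instr.split()
        if "goto" in words and words[-1] in positions:
            words[-1] = str(positions[words[-1]])
        result.append(f"{start + offset}: {' '.join(words)}")
    return result


code = [
    ("L", "t1 = i + 1"),
    (None, "i = t1"),
    (None, "t2 = i * 8"),
    (None, "t3 = a[t2]"),
    (None, "if t3 < v goto L"),
]
for line in resolve_labels(code):
    print(line)
```

Two passes are needed because a jump may target a label defined later in the code; a one-pass translator would instead leave the target blank and fill it in afterwards (backpatching).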
Syntax tree vs. DAG vs. Three address code
• AST is the procedure’s parse tree with the nodes for most non-terminal symbols
removed.
• Directed Acyclic Graph is an AST with a unique node for each value.
• Three address code is a sequence of statements of the general form x := y op z
• In a TAC there is at most one operator at the right side of an instruction.
• Example: three-address code for a+a*(b-c)+(b-c)*d
t1 = b - c
t2 = a * t1
t3 = a + t2
t4 = t1 * d
t5 = t3 + t4
Figure: AST, DAG and TAC for a+a*(b-c)+(b-c)*d
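The TAC above can be generated from the DAG by giving each node a temporary the first time it is visited and reusing that temporary afterwards, which is why b - c is computed only once (t1). In this sketch, assumed for illustration, nodes are nested Python tuples, so structurally equal subtrees compare equal and a dict keyed on the node acts as the "unique node for each value"; the function name is hypothetical.

```python
from itertools import count


def dag_to_tac(root):
    """Generate TAC from an expression DAG, evaluating each shared node once.

    Nodes are nested tuples (op, left, right) or leaf strings.
    """
    temps = count(1)
    addr = {}  # node -> temporary that already holds its value
    code = []

    def visit(node):
        if isinstance(node, str):
            return node
        if node in addr:  # common subexpression: reuse its temporary
            return addr[node]
        op, left, right = node
        a1, a2 = visit(left), visit(right)
        t = f"t{next(temps)}"
        code.append(f"{t} = {a1} {op} {a2}")
        addr[node] = t
        return t

    visit(root)
    return code


b_minus_c = ("-", "b", "c")
expr = ("+", ("+", "a", ("*", "a", b_minus_c)), ("*", b_minus_c, "d"))
for line in dag_to_tac(expr):
    print(line)
```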
Data structures for three-address code
• Implementations of three-address statements
– Quadruples
• Four fields: op, arg1, arg2 and result
– Triples
• Three fields: op, arg1 and arg2. No temporaries are used; a result is referred to by the position of the instruction that computes it
– Indirect triples
• Triples plus a separate list of pointers to triples, which gives the execution order
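The relation between quadruples and triples can be made concrete by converting one into the other: the result field is dropped, and each later use of a temporary becomes a reference (k) to the triple that computed it. This sketch assumes every temporary is assigned exactly once and that "=" quadruples copy into program variables; the function name is hypothetical, and None marks an empty field.

```python
def quads_to_triples(quads):
    """Convert quadruples (op, arg1, arg2, result) into triples (op, arg1, arg2).

    Assumes every temporary is assigned exactly once and that "=" quadruples
    copy into program variables (which keep their names).
    """
    position = {}  # temporary -> position of the triple that computes it
    triples = []

    def ref(a):
        # A use of a temporary becomes a reference to the defining triple.
        return f"({position[a]})" if a in position else a

    for op, arg1, arg2, result in quads:
        if op == "=":
            triples.append(("=", result, ref(arg1)))  # target named in arg1
        else:
            triples.append((op, ref(arg1), ref(arg2)))
            position[result] = len(triples) - 1
    return triples


# a = b * uminus c + b * uminus c
quads = [
    ("uminus", "c", None, "t1"),
    ("*", "b", "t1", "t2"),
    ("uminus", "c", None, "t3"),
    ("*", "b", "t3", "t4"),
    ("+", "t2", "t4", "t5"),
    ("=", "t5", None, "a"),
]
for k, triple in enumerate(quads_to_triples(quads)):
    print(k, triple)
```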
Example
• a = b * uminus c + b * uminus c
(or)
a = b * (-c) + b * (-c)
Three address code:
t1 = uminus c
t2 = b * t1
t3 = uminus c
t4 = b * t3
t5 = t2 + t4
a = t5
Quadruples:
     op       arg1   arg2   result
0    uminus   c             t1
1    *        b      t1     t2
2    uminus   c             t3
3    *        b      t3     t4
4    +        t2     t4     t5
5    =        t5            a
Triples:
     op       arg1   arg2
0    uminus   c
1    *        b      (0)
2    uminus   c
3    *        b      (2)
4    +        (1)    (3)
5    =        a      (4)
Indirect triples: the same triples (0) to (5), plus an instruction list of pointers to them:
35   (0)
36   (1)
37   (2)
38   (3)
39   (4)
40   (5)
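Compare Quadruples, Triples &
Indirect Triples
• When instructions are moved around during optimization,
quadruples are better than triples.
– A quadruple names its result with a temporary variable, so instructions that use the result are unaffected when it moves; a triple's result is referred to by its position, so moving a triple requires changing every reference to it.
• Indirect triples solve this problem: the optimizer reorders the list of pointers, and the triples themselves do not move.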
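The difference can be demonstrated by expanding each triple back into the expression it computes. In this sketch (the helper `expand` and the particular reordering are assumptions for illustration), reordering the pointer list of the indirect triples leaves every expansion unchanged, while physically moving the same triples shifts their positions so that the unchanged (k) references point at different instructions.

```python
# Triples for a = b * (-c) + b * (-c); "(k)" refers to the triple at position k.
triples = [
    ("uminus", "c", None),
    ("*", "b", "(0)"),
    ("uminus", "c", None),
    ("*", "b", "(2)"),
    ("+", "(1)", "(3)"),
    ("=", "a", "(4)"),
]


def expand(k, triples):
    """Rebuild the expression computed by triple k by following (i) references."""
    op, arg1, arg2 = triples[k]

    def arg(a):
        if isinstance(a, str) and a.startswith("("):
            return "(" + expand(int(a[1:-1]), triples) + ")"
        return a

    if op == "uminus":
        return f"-{arg(arg1)}"
    return f"{arg(arg1)} {op} {arg(arg2)}"


# Indirect triples: the optimizer reorders only the list of pointers (here it
# moves triple 2 to the front). The triples stay put, so every "(k)" still
# names the same computation.
instruction_list = [2, 0, 1, 3, 4, 5]
print([expand(k, triples) for k in instruction_list])

# Plain triples: physically moving triple 2 to the front shifts positions,
# so the unchanged "(k)" references now point at different instructions.
moved = [triples[k] for k in instruction_list]
print(expand(4, triples))
print(expand(4, moved))
```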
Syntax tree vs. Triples vs. 3AC
More triple representations
• x[i] := y
op arg1 arg2
(0) []= x i
(1) assign (0) y
• x := y[i] (exercise)
op arg1 arg2
(0) =[] y i
(1) assign x (0)
Exercise
• Give three-address code representations of
a+a*(b-c)+(b-c)*d as
– Quadruples?
– Triples?
– Indirect Triples?
Types and Declarations
• Type checking: ensures that the types of the operands match the types expected
by their context (the operator).
– E.g. the mod operation needs integer operands
• Type information is also used to:
– Determine the storage needed
– Calculate the address of an array reference
– Insert explicit type conversions
• When all declarations appear together, a single offset on the stack pointer
suffices.
– e.g. int x, y, z; fun1(); fun2();
• Otherwise, the translator needs to keep track of the current offset.
– e.g. int x; fun1(); int y, z; fun2();
Storage layout
• From the type of a name, we can determine the amount of
storage the name will need at run time.
• At compile time, this amount is used to
assign the name a relative address.
• The type and relative address are saved in the
symbol-table entry of the name.
• For data whose length is known only at run time,
a fixed amount of storage is reserved for a pointer to the data.
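Assigning relative addresses amounts to keeping a running offset: each declared name gets the current offset, which then advances by the width of the name's type. The widths below (char 1, int 4, float 8 bytes) are assumptions for illustration, alignment padding is ignored, and the array type is written as the tuple ("array", n, element_type); the function names are hypothetical.

```python
# Widths in bytes are assumptions for illustration; a real target defines its own.
WIDTH = {"char": 1, "int": 4, "float": 8}


def width(t):
    """Width of a type expression: a basic type name or ("array", n, element_type)."""
    if isinstance(t, str):
        return WIDTH[t]
    _, n, element = t
    return n * width(element)


def layout(declarations, offset=0):
    """Assign relative addresses to declared names, in declaration order.

    Returns the symbol table (name -> (type, relative address)) and the
    next free offset. Alignment padding is ignored in this sketch.
    """
    symtab = {}
    for name, t in declarations:
        symtab[name] = (t, offset)  # type and relative address
        offset += width(t)
    return symtab, offset


# int x; float y; int a[2][3];
symtab, size = layout([
    ("x", "int"),
    ("y", "float"),
    ("a", ("array", 2, ("array", 3, "int"))),
])
for name, (t, addr) in symtab.items():
    print(name, addr)
print("total", size)
```

The returned offset is the "current offset" a translator would carry forward when declarations are interleaved with other code.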
Type Systems Design
• Design is based on the syntactic constructs in the
language, the notion of types, and the rules for
assigning types to language constructs.
• E.g.
– In arithmetic operations such as addition,
subtraction, multiplication and division, if both
operands are integers, the result is also an integer.
Type Expressions
• The type of a language construct is denoted by a type expression.
• It is either a basic type or formed by applying an operator (type
constructor) to other type expressions.
Equivalence of Type Expressions
• Two types are structurally equivalent if and only if one of the following
conditions is true:
– They are the same basic type.
– They are formed by applying the same constructor to structurally
equivalent types.
– One is a type name that denotes the other.
• Examples of types that are not equivalent:
– int a[2][3] and int b[3][2]
– int a and char b[4]
– struct {int, char} and struct {char, int}
– int * and void *
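The three conditions translate directly into a recursive check. In this sketch, assumed for illustration, type expressions are basic type names, type names bound in a dictionary, or tuples (constructor, arguments...) such as ("array", 2, "int"), ("pointer", "int") or ("struct", "int", "char"); array sizes are plain integers compared for equality. Recursive types (a name that eventually refers to itself) are not handled.

```python
def equivalent(s, t, names):
    """Structural equivalence of type expressions.

    names maps type names to the type expressions they denote.
    """
    # A type name denotes the type expression it names.
    while isinstance(s, str) and s in names:
        s = names[s]
    while isinstance(t, str) and t in names:
        t = names[t]
    if isinstance(s, str) or isinstance(t, str):
        return s == t  # same basic type
    if s[0] != t[0] or len(s) != len(t):
        return False   # different constructors
    # Same constructor: the arguments must be pairwise equivalent.
    for x, y in zip(s[1:], t[1:]):
        if isinstance(x, int) or isinstance(y, int):
            if x != y:  # e.g. array sizes
                return False
        elif not equivalent(x, y, names):
            return False
    return True


int_2x3 = ("array", 2, ("array", 3, "int"))
int_3x2 = ("array", 3, ("array", 2, "int"))
print(equivalent(int_2x3, int_3x2, {}))
print(equivalent(("struct", "int", "char"), ("struct", "char", "int"), {}))
print(equivalent("row", ("array", 3, "int"), {"row": ("array", 3, "int")}))
```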
Type checking rules for expressions
• Basic type expressions: Boolean, char, integer, float
• E → literal { E.type = char }
• E → num { E.type = integer }
• E → id { E.type = lookup(id.entry) }
Type checking Expressions (cont'd)
• Special basic types: type_error (signals an error detected
during type checking) and void (denotes the absence of a value, e.g. the type of a statement).
E → E1 mod E2 { E.type = if E1.type == integer and
E2.type == integer
then integer
else type_error }
• A type name is a type expression
E → a { E.name = a }
Type checking Expressions (cont'd)
• A type expression can be formed by applying the array type
constructor to a number and a type expression
Constructors include:
• Arrays: if T is a type expression, then array(I, T) is a type expression
denoting an array with elements of type T and index set I.
– int a[2][3] is an array of 2 arrays of 3 integers.
– In functional style: array(2, array(3, int))
E → E1 [ E2 ] { E.type = if E2.type == integer and
E1.type == array(I, T)
then T
else type_error }
Type checking Expressions (cont'd)
• Product: if s and t are type expressions, then their Cartesian product s × t
is a type expression
• Records: a record is a data structure with named fields. The record type constructor is applied to a
tuple formed from field names and field types.
• E.g.
type row = record
{
address: integer;
lexeme : array[1..15] of char
}
var table: array[1..101] of row;
This declares that the type name row denotes the type expression record((address
× integer) × (lexeme × array(1..15, char))).
The variable table is an array of records of this type.
Type checking Expressions (cont'd)
• Assignment statements: E may be an arithmetic, logical or relational
expression
S → id = E { S.type = if id.type == E.type
then void
else type_error }
• If statements:
S → if E then S1 { S.type = if E.type == Boolean
then S1.type
else type_error }
Type checking Expressions (cont'd)
• While statements:
S → while E do S1 { S.type = if E.type == Boolean
then S1.type
else type_error }
• Pointers: if T is a type expression, then pointer(T) is a type expression
(i.e. a pointer to an object of type T).
E → *E1 { E.type = if E1.type == pointer(T)
then T
else type_error }
Type checking Expressions (cont'd)
• Functions: a function maps a domain type D to a range type R. The type of such a
function is denoted by the type expression D → R.
Mapping: T → D '→' R { T.type = D.type → R.type }
Function call: E → E1 ( E2 ) { E.type = if E2.type == T1 and
E1.type == T1 → T2
then T2
else type_error }
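The expression rules above can be collected into one recursive checker. This is a sketch under assumed encodings: expressions and types are tuples, a plain dictionary stands in for lookup(id.entry) (an undeclared name yields type_error), and type expressions are compared with ordinary equality, which matches structural equivalence only when no type names are involved.

```python
def check(e, symtab):
    """Return the type of expression e following the rules above.

    Expressions are tuples: ("literal", c), ("num", n), ("id", name),
    ("mod", e1, e2), ("index", e1, e2), ("deref", e1), ("call", e1, e2).
    Types are "char", "integer", ... or tuples ("array", size, T),
    ("pointer", T), ("fn", D, R).
    """
    kind = e[0]
    if kind == "literal":
        return "char"
    if kind == "num":
        return "integer"
    if kind == "id":
        return symtab.get(e[1], "type_error")  # lookup(id.entry)
    if kind == "mod":
        t1, t2 = check(e[1], symtab), check(e[2], symtab)
        return "integer" if t1 == t2 == "integer" else "type_error"
    if kind == "index":
        t1, t2 = check(e[1], symtab), check(e[2], symtab)
        if t2 == "integer" and isinstance(t1, tuple) and t1[0] == "array":
            return t1[2]  # element type T of array(I, T)
        return "type_error"
    if kind == "deref":
        t1 = check(e[1], symtab)
        if isinstance(t1, tuple) and t1[0] == "pointer":
            return t1[1]
        return "type_error"
    if kind == "call":
        t1, t2 = check(e[1], symtab), check(e[2], symtab)
        if isinstance(t1, tuple) and t1[0] == "fn" and t1[1] == t2:
            return t1[2]  # range type R of D -> R
        return "type_error"
    raise ValueError(f"unknown expression kind {kind!r}")


symtab = {
    "i": "integer",
    "a": ("array", 2, ("array", 3, "integer")),
    "p": ("pointer", "integer"),
    "f": ("fn", "integer", "char"),
}
print(check(("index", ("index", ("id", "a"), ("id", "i")), ("num", 1)), symtab))
print(check(("mod", ("id", "i"), ("literal", "x")), symtab))
print(check(("call", ("id", "f"), ("deref", ("id", "p"))), symtab))
```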
Type checking rules for coercions
• Implicit type conversions (inserted by the compiler) and explicit type
conversions (written by the programmer)
E → E1 op E2 { E.type = if E1.type == integer and
E2.type == integer then integer
else if E1.type == integer and
E2.type == float then float
else if E1.type == float and
E2.type == integer then float
else if E1.type == float and
E2.type == float then float
else type_error }
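The coercion rule decides the result type; a translator must also insert the conversion explicitly in the intermediate code. This sketch assumes only integer and float operand types and writes the conversion instruction as "(float) x"; the function names and the instruction spelling are illustrative assumptions, not a fixed notation.

```python
from itertools import count


def arith_result(t1, t2):
    """Result type of E1 op E2 under the coercion rules above."""
    if t1 == t2 == "integer":
        return "integer"
    if t1 in ("integer", "float") and t2 in ("integer", "float"):
        return "float"  # at least one operand is float
    return "type_error"


def widen(addr, from_type, to_type, code, temps):
    """Insert an explicit integer-to-float conversion when the types differ."""
    if from_type == to_type:
        return addr
    t = f"t{next(temps)}"
    code.append(f"{t} = (float) {addr}")
    return t


def gen_binary(op, a1, t1, a2, t2, code, temps):
    """Emit TAC for a1 op a2 and return (result address, result type)."""
    result = arith_result(t1, t2)
    if result == "type_error":
        raise TypeError(f"cannot apply {op} to {t1} and {t2}")
    a1 = widen(a1, t1, result, code, temps)
    a2 = widen(a2, t2, result, code, temps)
    t = f"t{next(temps)}"
    code.append(f"{t} = {a1} {op} {a2}")
    return t, result


# 2 * 3.14: the integer operand is converted before the multiplication.
code, temps = [], count(1)
addr, ty = gen_binary("*", "2", "integer", "3.14", "float", code, temps)
print("\n".join(code))
print(addr, ty)
```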
Exercise
• Write productions and semantic rules for
computing types and finding their widths in
bytes.
Apply SDD – To find size or width of an array
Intermediate Representation for Array Expression
Type Checking
• Type expressions are checked for
– Correct code
– Security aspects
– Efficient code generation
P. Kuppusamy - Lexical Analyzer
