Intermediate-Code Generation
Ms.Hiba Elsayed 2014
In the analysis-synthesis model of a compiler, the front end analyzes a source
program and creates an intermediate representation, from which the back end
generates target code. Ideally, details of the source language are confined to the
front end, and details of the target machine to the back end. With a suitably
defined intermediate representation, a compiler for language i and machine j
can then be built by combining the front end for language i with the back end for
machine j. This approach to creating a suite of compilers can save a considerable
amount of effort: m × n compilers can be built by writing just m front ends and n
back ends.
Logical structure of a compiler front end
The choice or design of an intermediate representation varies from compiler to
compiler. An intermediate representation may either be an actual language or it
may consist of internal data structures that are shared by phases of the compiler. C
is a programming language, yet it is often used as an intermediate form because it
is flexible, it compiles into efficient machine code, and its compilers are widely
available. The original C++ compiler consisted of a front end that generated C,
treating a C compiler as a back end.
Intermediate Code
“Abstract” code generated from AST
• Simplicity and Portability
– Machine independent code.
– Enables common optimizations on intermediate code.
– Machine-dependent code optimizations postponed to last phase.
Intermediate Code Generation
Translate from abstract-syntax trees to intermediate code.
• Generate a low-level intermediate representation with two properties:
– It should be easy to produce.
– It should be easy to translate into the target machine.
• One popular intermediate code is three-address code. In three-address code:
– Each statement contains at most 3 operands and, in addition to “:=”
(assignment), at most one operator.
– It is an “easy” and “universal” format that can be translated into most
assembly languages.
Some of the basic operations in the source program, and how they change in
assembly language:
Intermediate Forms
• Stack machine code:
Code for a “postfix” stack machine.
• Two address code:
Code of the form “add r1, r2”
• Three address code:
Code of the form “add src1, src2, dest”
Quadruples and Triples: Representations for three-address code.
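As a rough illustration of the first form listed above, the following sketch emits postfix code for a stack machine from a small expression tree. The tuple-based tree encoding and the mnemonics push, add, sub and mul are assumptions made for this example, not part of the slides.

```python
# Expression trees are nested tuples (op, left, right); leaves are variable names.
# The mnemonics below are hypothetical names for a generic stack machine.
MNEMONICS = {"+": "add", "-": "sub", "*": "mul"}

def stack_code(node):
    """Return postfix stack-machine instructions for an expression tree."""
    if isinstance(node, str):
        return [f"push {node}"]          # a leaf pushes its value
    op, left, right = node
    # Postfix order: code for both operands first, then the operator.
    return stack_code(left) + stack_code(right) + [MNEMONICS[op]]

code = stack_code(("+", "x", ("*", "y", "z")))   # x + y * z
print("\n".join(code))
```

The operator always follows the code for its operands, which is why no explicit operand names are needed on add or mul.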
Three-Address Code
In three-address code, there is at most one operator on the right side of an instruction;
that is, no built-up arithmetic expressions are permitted. Thus a source-language
expression like x+y*z might be translated into the sequence of three-address instructions
t1 = y * z
t2 = x + t1
where t1 and t2 are compiler-generated temporary names. This unraveling of
multi-operator arithmetic expressions and of nested flow-of-control statements makes
three-address code desirable for target-code generation and optimization. The use of
names for the intermediate values computed by a program allows three-address code to be
rearranged easily.
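The unraveling described above can be sketched as a short recursive walk over an expression tree. This is a minimal sketch, assuming a tuple-based tree encoding and a helper named gen_tac, both invented for illustration.

```python
import itertools

def gen_tac(node, code, temps):
    """Append three-address instructions for `node` to `code`.

    Returns the name (variable or temporary) that holds the node's value.
    Trees are nested tuples (op, left, right); leaves are variable names.
    """
    if isinstance(node, str):
        return node                       # a variable needs no instruction
    op, left, right = node
    a = gen_tac(left, code, temps)        # operands are evaluated first
    b = gen_tac(right, code, temps)
    t = f"t{next(temps)}"                 # fresh compiler-generated temporary
    code.append(f"{t} = {a} {op} {b}")    # exactly one operator per instruction
    return t

code = []
result = gen_tac(("+", "x", ("*", "y", "z")), code, itertools.count(1))
print("\n".join(code))
```

Each interior node of the tree yields one instruction and one temporary, so x+y*z produces two instructions.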
Quadruples
The description of three-address instructions specifies the components of each
type of instruction, but it does not specify the representation of these
instructions in a data structure. In a compiler, these instructions can be
implemented as objects or as records with fields for the operator and the
operands. Three such representations are called "quadruples," "triples," and
"indirect triples."
A quadruple (or just "quad") has four fields, which we call op, arg1, arg2, and
result. The op field contains an internal code for the operator. For instance, the
three-address instruction x = y + x is represented by placing + in op, y in arg1, x in
arg2, and x in result. The following are some exceptions to this rule:
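One possible way to hold quadruples as records is sketched below. The class name Quad and the helper show are assumptions for this example; only the four field names come from the text.

```python
from typing import NamedTuple, Optional

class Quad(NamedTuple):
    """A quadruple: one three-address instruction stored as a record."""
    op: str
    arg1: Optional[str]
    arg2: Optional[str]
    result: Optional[str]

def show(q):
    """Render a binary-operator quadruple back as a three-address instruction."""
    return f"{q.result} = {q.arg1} {q.op} {q.arg2}"

# x = y + x from the text: + in op, y in arg1, x in arg2, x in result.
example = Quad(op="+", arg1="y", arg2="x", result="x")

# The expression x + y * z, with explicit temporaries t1 and t2.
quads = [Quad("*", "y", "z", "t1"), Quad("+", "x", "t1", "t2")]
for q in quads:
    print(show(q))
```

Because every result has an explicit name, the records can be reordered without rewriting references between them.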
Triples
A triple has only three fields, which we call op, arg1 and arg2. Note that the result
field of a quadruple is used primarily for temporary names. Using triples, we refer to
the result of an operation x op y by its position, rather than by an explicit temporary
name. Thus, instead of the temporary t1, a triple representation would refer to position (0).
Parenthesized numbers represent pointers into the triple structure itself.
The DAG and triple representations of expressions are equivalent. The
equivalence ends with expressions, since syntax-tree variants and three-address
code represent control flow quite differently.
Representation of three-address code with implicit destination argument.
Example: a := a + b * -c;
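A minimal sketch of how triples for the example a := a + b * -c might be produced follows. The tuple-based tree encoding, the operator name minus for unary negation, and the use of strings such as "(0)" to stand for positions are assumptions made for this illustration.

```python
def gen_triples(node, triples):
    """Append triples for `node`; return a reference to the node's value.

    Leaves are variable names. Interior nodes are tuples (op, arg) or
    (op, left, right). A result is referred to by its position, e.g. "(0)".
    """
    if isinstance(node, str):
        return node
    op, *args = node
    refs = [gen_triples(arg, triples) for arg in args]
    refs += [None] * (2 - len(refs))          # unary operators leave arg2 empty
    triples.append((op, refs[0], refs[1]))
    return f"({len(triples) - 1})"            # position of the triple just added

triples = []
rhs = gen_triples(("+", "a", ("*", "b", ("minus", "c"))), triples)
triples.append((":=", "a", rhs))              # the assignment a := ...

for position, (op, arg1, arg2) in enumerate(triples):
    print(f"({position})", op, arg1, arg2)
```

No temporary names appear anywhere: each operand is either a program variable or a parenthesized position in the list.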
Triples Examples:
Representations of a + a * (b - c) + (b - c) * d
Quadruples Examples:
Static Single-Assignment Form
Static single-assignment form (SSA) is an intermediate representation that facilitates
certain code optimizations. Two distinctive aspects distinguish SSA from
three-address code. The first is that all assignments in SSA are to variables with
distinct names; hence the term static single-assignment. The figure below shows the same
intermediate program in three-address code and in static single-assignment form.
Note that subscripts distinguish each definition of variables p and q in the SSA
representation.
Figure: Intermediate program in three-address code and SSA
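The renaming idea for straight-line code can be sketched as follows. The input program is a hypothetical one that redefines p and q several times; it is an assumption for this example and not necessarily the program in the figure. The function to_ssa and the tuple encoding are likewise illustrative.

```python
def to_ssa(instrs):
    """Rename straight-line three-address code into SSA form.

    `instrs` is a list of (dest, arg1, op, arg2) tuples. Each definition of a
    variable gets a new version number; uses refer to the latest version.
    Variables never assigned in the program (inputs) keep their plain names.
    """
    version = {}
    out = []
    for dest, a, op, b in instrs:
        # Rename the uses before recording the new definition of dest.
        a2, b2 = [f"{v}{version[v]}" if v in version else v for v in (a, b)]
        version[dest] = version.get(dest, 0) + 1
        out.append(f"{dest}{version[dest]} = {a2} {op} {b2}")
    return out

# Hypothetical program in which p and q are each assigned more than once.
prog = [
    ("p", "a", "+", "b"),
    ("q", "p", "-", "c"),
    ("p", "q", "*", "d"),
    ("p", "e", "-", "p"),
    ("q", "p", "+", "q"),
]
ssa = to_ssa(prog)
print("\n".join(ssa))
```

This sketch handles only straight-line code; programs with branches need the second aspect of SSA described next.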
The same variable may be defined in two different control-flow paths in a
program. For example, the source program
has two control-flow paths in which the variable x gets defined. If we use different
names for x in the true part and the false part of the conditional statement, then
which name should we use in the assignment y = x * a? Here is where the second
distinctive aspect of SSA comes into play. SSA uses a notational convention called
the φ-function to combine the two definitions of x:
Here, φ(x1, x2) has the value x1 if the control flow passes through the true
part of the conditional and the value x2 if the control flow passes through the
false part. That is to say, the φ-function returns the value of its argument that
corresponds to the control-flow path that was taken to get to the assignment
statement containing the φ-function.
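The behaviour of a φ-function can be simulated in a few lines. The conditional below (x is set to -1 in the true part and to 1 in the false part) is a hypothetical program chosen for illustration, and the names run, env and pred are assumptions of this sketch.

```python
def run(flag, a):
    """Simulate an SSA program of the assumed shape:

        if flag:  x1 = -1
        else:     x2 = 1
        x3 = phi(x1, x2)
        y  = x3 * a
    """
    env = {}
    if flag:
        env["x1"] = -1          # definition of x in the true part
        pred = "true"           # remember which path was taken
    else:
        env["x2"] = 1           # definition of x in the false part
        pred = "false"

    # phi(x1, x2): select the argument that corresponds to the path taken.
    phi_args = {"true": "x1", "false": "x2"}
    env["x3"] = env[phi_args[pred]]

    env["y"] = env["x3"] * a
    return env["y"]

print(run(True, 5), run(False, 5))
```

Only one of x1 and x2 is ever defined in a given run, which is why the φ-function must consult the path rather than compute on both arguments.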
Generating 3-address code
Generation of Postfix Code for Boolean Expressions
Code Generation for Statements
Conditional Statements
