Intermediate Code Generation
• Intermediate codes are machine independent codes, but they are close to
  machine instructions.
• The given program in a source language is converted to an    equivalent
  program in an intermediate language by the intermediate code generator.
• Intermediate language can be many different languages, and the designer
  of the compiler decides this intermediate language.
   – syntax trees can be used as an intermediate language.
   – postfix notation can be used as an intermediate language.
   – three-address code (Quadraples) can be used as an intermediate language
       • we will use quadraples to discuss intermediate code generation
       • quadraples are close to machine instructions, but they are not actual machine instructions.
   – some programming languages have well defined intermediate languages.
       • java – java virtual machine
       • prolog – warren abstract machine
       • In fact, there are byte-code emulators to execute instructions in these intermediate languages.
                                             CS416 Compiler Design                                         1
            Three-Address Code (Quadraples)
• A quadraple is:
      x := y op z
  where x, y and z are names, constants or compiler-generated
  temporaries; op is any operator.
• But we may also the following notation for quadraples (much better
  notation because it looks like a machine code instruction)
       op y,z,x
  apply operator op to y and z, and store the result in x.
• We use the term “three-address code” because each statement usually
  contains three addresses (two for operands, one for the result).
                               CS416 Compiler Design                    2
                     Three-Address Statements
Binary Operator:          op y,z,result or result := y op z
  where op is a binary arithmetic or logical operator. This binary operator is applied to
  y and z, and the result of the operation is stored in result.
  Ex:           add a,b,c
                gt      a,b,c
                addr a,b,c
                addi a,b,c
Unary Operator:           op y,,result or result := op y
  where op is a unary arithmetic or logical operator. This unary operator is applied to y,
  and the result of the operation is stored in result.
  Ex:            uminus          a,,c
                 not             a,,c
                 inttoreal a,,c
                                      CS416 Compiler Design                                 3
              Three-Address Statements (cont.)
Move Operator:           mov y,,result or result := y
  where the content of y is copied into result.
  Ex:           mov      a,,c
                movi a,,c
                movr a,,c
Unconditional Jumps: jmp ,,L or goto L
  We will jump to the three-address code with the label L, and the execution continues
  from that statement.
  Ex:            jmp ,,L1         // jump to L1
                 jmp ,,7          // jump to the statement 7
                                    CS416 Compiler Design                                4
                 Three-Address Statements (cont.)
Conditional Jumps: jmprelop y,z,L or if y relop z goto L
  We will jump to the three-address code with the label L if the result of y relop z is
  true, and the execution continues from that statement. If the result is false, the execution
  continues from the statement following this conditional jump statement.
  Ex:            jmpgt       y,z,L1        // jump to L1 if y>z
                 jmpgte y,z,L1             // jump to L1 if y>=z
                 jmpe        y,z,L1        // jump to L1 if y==z
                 jmpne       y,z,L1        // jump to L1 if y!=z
  Our relational operator can also be a unary operator.
                 jmpnz       y,,L1          // jump to L1 if y is not zero
                 jmpz        y,,L1          // jump to L1 if y is zero
                 jmpt        y,,L1          // jump to L1 if y is true
                 jmpf        y,,L1          // jump to L1 if y is false
                                        CS416 Compiler Design                               5
              Three-Address Statements (cont.)
Procedure Parameters:   param x,, or param x
Procedure Calls:   call p,n, or call p,n
  where x is an actual parameter, we invoke the procedure p with n parameters.
  Ex: param x1,,
  param x2,,
   p(x1,...,xn)
  param xn,,
  call    p,n,
  f(x+1,y)             add      x,1,t1
  param t1,,
  param y,,
  call f,2,
                                    CS416 Compiler Design                        6
          Three-Address Statements (cont.)
Indexed Assignments:
      move y[i],,x     or      x := y[i]
      move x,,y[i]     or      y[i] := x
Address and Pointer Assignments:
      moveaddr y,,x or x := &y
      movecont y,,x or x := *y
                            CS416 Compiler Design   7
    Syntax-Directed Translation into Three-Address
                        Code
S → id := E       S.code = E.code || gen(‘mov’ E.place ‘,,’ id.place)
E → E1 + E2       E.place = newtemp();
   E.code = E1.code || E2.code || gen(‘add’ E1.place ‘,’ E2.place ‘,’ E.place)
E → E1 * E2       E.place = newtemp();
   E.code = E1.code || E2.code || gen(‘mult’ E1.place ‘,’ E2.place ‘,’ E.place)
E → - E1          E.place = newtemp();
   E.code = E1.code || gen(‘uminus’ E1.place ‘,,’ E.place)
E → ( E1 )        E.place = E1.place;
   E.code = E1.code
E → id E.place = id.place;
   E.code = null
                                            CS416 Compiler Design                 8
                 Syntax-Directed Translation (cont.)
S → while E do S1           S.begin = newlabel();
   S.after = newlabel();
   S.code = gen(S.begin “:”) || E.code ||
   gen(‘jmpf’ E.place ‘,,’ S.after) || S1.code ||
   gen(‘jmp’ ‘,,’ S.begin) ||
   gen(S.after ‘:”)
S → if E then S1 else S2  S.else = newlabel();
   S.after = newlabel();
   S.code = E.code ||
   gen(‘jmpf’ E.place ‘,,’ S.else) || S1.code ||
   gen(‘jmp’ ‘,,’ S.after) ||
   gen(S.else ‘:”) || S2.code ||
   gen(S.after ‘:”)
                                              CS416 Compiler Design   9
Translation Scheme to Produce Three-Address Code
S → id := E         { p= lookup(id.name);
     if (p is not nil) then emit(‘mov’ E.place ‘,,’ p)
     else error(“undefined-variable”) }
E → E1 + E2         { E.place = newtemp();
     emit(‘add’ E1.place ‘,’ E2.place ‘,’ E.place) }
E → E1 * E2        { E.place = newtemp();
     emit(‘mult’ E1.place ‘,’ E2.place ‘,’ E.place) }
E → - E1           { E.place = newtemp();
     emit(‘uminus’ E1.place ‘,,’ E.place) }
E → ( E1 )         { E.place = E1.place; }
E → id { p= lookup(id.name);
     if (p is not nil) then E.place = id.place
     else error(“undefined-variable”) }
                                             CS416 Compiler Design   10
                      Translation Scheme with Locations
S → id := { E.inloc = S.inloc } E
   { p = lookup(id.name);
      if (p is not nil) then { emit(E.outloc ‘mov’ E.place ‘,,’ p); S.outloc=E.outloc+1 }
      else { error(“undefined-variable”); S.outloc=E.outloc } }
E → { E1.inloc = E.inloc } E1 + { E2.inloc = E1.outloc } E2
     { E.place = newtemp(); emit(E2.outloc ‘add’ E1.place ‘,’ E2.place ‘,’ E.place); E.outloc=E2.outloc+1 }
E → { E1.inloc = E.inloc } E1 + { E2.inloc = E1.outloc } E2
     { E.place = newtemp(); emit(E2.outloc ‘mult’ E1.place ‘,’ E2.place ‘,’ E.place); E.outloc=E2.outloc+1 }
E → - { E1.inloc = E.inloc } E1
     { E.place = newtemp(); emit(E1.outloc ‘uminus’ E1.place ‘,,’ E.place); E.outloc=E1.outloc+1 }
E → ( E1 ) { E.place = E1.place; E.outloc=E1.outloc+1 }
E → id { E.outloc = E.inloc; p= lookup(id.name);
         if (p is not nil) then E.place = id.place
         else error(“undefined-variable”) }
                                                        CS416 Compiler Design                                  11
                                    Boolean Expressions
E → { E1.inloc = E.inloc } E1 and { E2.inloc = E1.outloc } E2
     { E.place = newtemp(); emit(E2.outloc ‘and’ E1.place ‘,’ E2.place ‘,’ E.place); E.outloc=E2.outloc+1 }
E → { E1.inloc = E.inloc } E1 or { E2.inloc = E1.outloc } E2
     { E.place = newtemp(); emit(E2.outloc ‘and’ E1.place ‘,’ E2.place ‘,’ E.place); E.outloc=E2.outloc+1 }
E → not { E1.inloc = E.inloc } E1
     { E.place = newtemp(); emit(E1.outloc ‘not’ E1.place ‘,,’ E.place); E.outloc=E1.outloc+1 }
E → { E1.inloc = E.inloc } E1 relop { E2.inloc = E1.outloc } E2
     { E.place = newtemp();
       emit(E2.outloc relop.code E1.place ‘,’ E2.place ‘,’ E.place); E.outloc=E2.outloc+1 }
                                                   CS416 Compiler Design                                      12
                              Translation Scheme(cont.)
S → while { E.inloc = S.inloc } E do
    { emit(E.outloc ‘jmpf’ E.place ‘,,’ ‘NOTKNOWN’);
      S1.inloc=E.outloc+1; } S1
    { emit(S1.outloc ‘jmp’ ‘,,’ S.inloc);
      S.outloc=S1.outloc+1;
      backpatch(E.outloc,S.outloc); }
S → if { E.inloc = S.inloc } E then
    { emit(E.outloc ‘jmpf’ E.place ‘,,’ ‘NOTKNOWN’);
       S1.inloc=E.outloc+1; } S1 else
    { emit(S1.outloc ‘jmp’ ‘,,’ ‘NOTKNOWN’);
      S2.inloc=S1.outloc+1;
      backpatch(E.outloc,S2.inloc); } S2
    { S.outloc=S2.outloc;
      backpatch(S1.outloc,S.outloc); }
                                            CS416 Compiler Design   13
                    Three Address Codes - Example
x:=1;                                           01:        mov    1,,x
y:=x+10;                                        02:        add    x,10,t1
while (x<y) {                                  03:        mov    t1,,y
    x:=x+1;                                     04:        lt     x,y,t2
    if (x%2==1) then y:=y+1;                    05:        jmpf   t2,,17
    else y:=y-2;                                06:        add    x,1,t3
}                                               07:        mov    t3,,x
                                                08:        mod    x,2,t4
                                                09:        eq     t4,1,t5
                                                10:        jmpf   t5,,14
                                                11:        add    y,1,t6
                                                12:        mov    t6,,y
                                                13:        jmp    ,,16
                                                14:        sub    y,2,t7
                                                15:        mov    t7,,y
                                                16:        jmp    ,,4
                                                17:
                                   CS416 Compiler Design                    14
                                          Arrays
•   Elements of arrays can be accessed quickly if the elements are stored in a block of
    consecutive locations.
A one-dimensional array A:
                …                    …
baseA low                   i                               width
baseA is the address of the first location of the array A,
width is the width of each array element.
low is the index of the first array element
location of A[i]  baseA+(i-low)*width
                                         CS416 Compiler Design                            15
                          Arrays (cont.)
baseA+(i-low)*width
can be re-written as   i*width + (baseA-low*width)
should be computed at run-time                can be computed at compile-time
• So, the location of A[i] can be computed at the run-time by evaluating
  the formula i*width+c where c is (baseA-low*width) which is evaluated
  at compile-time.
• Intermediate code generator should produce the code to evaluate this
  formula i*width+c (one multiplication and one addition operation).
                                 CS416 Compiler Design                          16
                  Two-Dimensional Arrays
• A two-dimensional array can be stored in
   – either row-major (row-by-row) or
   – column-major (column-by-column).
• Most of the programming languages use row-major method.
• Row-major representation of a two-dimensional array:
       row1     row2            rown
                              CS416 Compiler Design         17
                 Two-Dimensional Arrays (cont.)
• The location of A[i1,i2] is
   baseA+ ((i1-low1)*n2+i2-low2)*width
   baseA is the location of the array A.
   low1 is the index of the first row
   low2 is the index of the first column
   n2 is the number of elements in each row
   width is the width of each array element
• Again, this formula can be re-written as
   ((i1*n2)+i2)*width + (baseA-((low1*n1)+low2)*width)
should be computed at run-time   can be computed at compile-time
                                     CS416 Compiler Design         18
                       Multi-Dimensional Arrays
• In general, the location of A[i1,i2,...,ik] is
   (( ... ((i1*n2)+i2) ...)*nk+ik)*width + (baseA-((...((low1*n1)+low2)...)*nk+lowk)*width)
• So, the intermediate code generator should produce the codes to evaluate
  the following formula (to find the location of A[i1,i2,...,ik]) :
   (( ... ((i1*n2)+i2) ...)*nk+ik)*width + c
• To evaluate the (( ... ((i1*n2)+i2) ...)*nk+ik portion of this formula, we can
  use the recurrence equation:
   e1 = i1
   em = em-1 * nm + im
                                        CS416 Compiler Design                                 19
               Translation Scheme for Arrays
• If we use the following grammar to calculate addresses of array
  elements, we need inherited attributes.
       L → id | id [ Elist ]
       Elist → Elist , E | E
• Instead of this grammar, we will use the following grammar to calculate
  addresses of array elements so that we do not need inherited attributes
  (we will use only synthesized attributes).
       L → id | Elist ]
       Elist → Elist , E | id [ E
                                    CS416 Compiler Design              20
             Translation Scheme for Arrays (cont.)
S → L := E      { if (L.offset is null) emit(‘mov’ E.place ‘,,’ L.place)
                  else emit(‘mov’ E.place ‘,,’ L.place ‘[‘ L.offset ‘]’) }
E → E1 + E2     { E.place = newtemp();
                 emit(‘add’ E1.place ‘,’ E2.place ‘,’ E.place) }
E → ( E1 )      { E.place = E1.place; }
E →L            { if (L.offset is null) E.place = L.place)
                  else { E.place = newtemp();
                         emit(‘mov’ L.place ‘[‘ L.offset ‘]’ ‘,,’ E.place) } }
                                  CS416 Compiler Design                          21
             Translation Scheme for Arrays (cont.)
L → id { L.place = id.place; L.offset = null; }
L → Elist ]
  { L.place = newtemp(); L.offset = newtemp();
    emit(‘mov’ c(Elist.array) ‘,,’ L.place);
    emit(‘mult’ Elist.place ‘,’ width(Elist.array) ‘,’ L.offset) }
Elist → Elist1 , E
   { Elist.array = Elist1.array ; Elist.place = newtemp(); Elist.ndim = Elist1.ndim + 1;
     emit(‘mult’ Elist1.place ‘,’ limit(Elist.array,Elist.ndim) ‘,’ Elist.place);
     emit(‘add’ Elist.place ‘,’ E.place ‘,’ Elist.place); }
Elist → id [ E
   {Elist.array = id.place ; Elist.place = E.place; Elist.ndim = 1; }
                                         CS416 Compiler Design                             22
     Translation Scheme for Arrays – Example1
• A one-dimensional double array A : 5..100
   n1=95 width=8 (double) low1=5
• Intermediate codes corresponding to x := A[y]
  mov     c,,t1           // where c=baseA-(5)*8
  mult    y,8,t2
  mov     t1[t2],,t3
  mov     t3,,x
                            CS416 Compiler Design   23
     Translation Scheme for Arrays – Example2
• A two-dimensional int array A : 1..10x1..20
   n1=10 n2=20 width=4 (integers) low1=1 low2=1
• Intermediate codes corresponding to x := A[y,z]
  mult   y,20,t1
  add    t1,z,t1
  mov    c,,t2           // where c=baseA-(1*20+1)*4
  mult   t1,4,t3
  mov    t2[t3],,t4
  mov    t4,,x
                           CS416 Compiler Design       24
      Translation Scheme for Arrays – Example3
• A three-dimensional int array A : 0..9x0..19x0..29
   n1=10 n2=20 n3=30 width=4 (integers) low1=0 low2=0 low3=0
• Intermediate codes corresponding to             x := A[w,y,z]
  mult     w,20,t1
  add      t1,y,t1
  mult     t1,30,t2
  add      t2,z,t2
  mov      c,,t3 // where c=baseA-((0*20+0)*30+0)*4
  mult     t2,4,t4
  mov      t3[t4],,t5
  mov      t5,,x
                               CS416 Compiler Design              25
                            Declarations
P →M D
M →€          { offset=0 }
D →D ; D
D → id : T { enter(id.name,T.type,offset); offset=offset+T.width }
T → int{ T.type=int; T.width=4 }
T → real      { T.type=real; T.width=8 }
T → array[num] of T1 { T.type=array(num.val,T1.type);
                        T.width=num.val*T1.width }
T → ↑ T1       { T.type=pointer(T1.type); T.width=4 }
where enter crates a symbol table entry with given values.
                                CS416 Compiler Design                26
                       Nested Procedure Declarations
•   For each procedure we should create a symbol table.
mktable(previous) – create a new symbol table where previous is the parent symbol
  table of this new symbol table
enter(symtable,name,type,offset) – create a new entry for a variable in the given
   symbol table.
enterproc(symtable,name,newsymbtable) – create a new entry for the procedure in
   the symbol table of its parent.
addwidth(symtable,width) – puts the total width of all entries in the symbol table
   into the header of that table.
•   We will have two stacks:
     – tblptr – to hold the pointers to the symbol tables
     – offset – to hold the current offsets in the symbol tables in tblptr stack.
                                                CS416 Compiler Design                27
               Nested Procedure Declarations
P → M D { addwidth(top(tblptr),top(offset)); pop(tblptr); pop(offset) }
M →€        { t=mktable(nil); push(t,tblptr); push(0,offset) }
D →D ; D
D → proc id N D ; S
          { t=top(tblptr); addwidth(t,top(offset));
            pop(tblptr); pop(offset);
            enterproc(top(tblptr),id.name,t) }
D → id : T { enter(top(tblptr),id.name,T.type,top(offset));
              top(offset)=top(offset)+T.width }
N→ €        { t=mktable(top(tblptr)); push(t,tblptr); push(0,offset) }
                                 CS416 Compiler Design                    28