1
A SIMPLE SYNTAX-
DIRECTED TRANSLATOR
Chapter 2
2
Building a Simple Compiler
• Building our compiler involves:
– Defining the syntax of a programming language
– Develop a source code parser: for our compiler
we will use predictive parsing
– Implementing syntax directed translation to
generate intermediate code
3
Syntax Definition
• Context-free grammar is a 4-tuple with
– A set of tokens (terminal symbols)
– A set of nonterminals
– A set of productions
– A designated start symbol
4
Example Grammar
Context-free grammar for simple expressions:
G = <{list,digit}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, list>
with productions P =
list list + digit
list list - digit
list digit
digit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
5
Derivation
• Given a CF grammar we can determine the
set of all strings (sequences of tokens)
generated by the grammar using derivation
– We begin with the start symbol
– In each step, we replace one nonterminal in the
current sentential form with one of the right-
hand sides of a production for that nonterminal
6
Derivation for the Example
Grammar
list
list + digit
list - digit + digit
digit - digit + digit
9 - digit + digit
9 - 5 + digit
9-5+2
This is an example leftmost derivation, because we replaced
the leftmost nonterminal (underlined) in each step.
7
Derivation for the Example
Rightmost Grammar
Likewise, a rightmost derivation replaces the rightmost
nonterminal in each step
list
P= digit - list
list digit + list digit - digit + list
digit - digit + digit
list digit - list
digit - digit + 2
list digit digit - 5 + 2
digit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 9-5+2
8
Parse Trees
• The root of the tree is labeled by the start symbol
• Each leaf of the tree is labeled by a terminal
(=token) or
• Each interior node is labeled by a nonterminal
• If A X1 X2 … Xn is a production, then node A has
immediate children X1, X2, …, Xn where Xi is a
(non)terminal or ( denotes the empty string)
9
Parse Tree for the Example
Grammar
Parse tree of the string 9-5+2 using grammar G
list
list digit
list digit
digit
The sequence of
9 - 5 + 2 leafs is called the
yield of the parse tree
The Two Derivations for x – 2 * y
Rule Sentential Form Rule Sentential Form
— Expr — Expr
1 Expr Op Expr 1 Expr Op Expr
3 <id,x> Op Expr 3 Expr Op <id,y>
5 <id,x> – Expr 6 Expr * <id,y>
1 <id,x> – Expr Op Expr 1 Expr Op Expr * <id,y>
2 <id,x> – <num,2> Op Expr 2 Expr Op <num,2> * <id,y>
6 <id,x> – <num,2> * Expr 5 Expr – <num,2> * <id,y>
3 <id,x> – <num,2> * <id,y> 3 <id,x> – <num,2> * <id,y>
Leftmost derivation Rightmost derivation
In both cases, Expr * id – num * id
• The two derivations produce different parse trees
• The parse trees imply different evaluation orders!
Derivations and Parse Trees
G
Leftmost derivation
Rule Sentential Form
— Expr E
1 Expr Op Expr
3 <id,x> Op Expr
5 <id,x> – Expr E Op E
1 <id,x> – Expr Op Expr
2 <id,x> – <num,2> Op Expr
6 <id,x> – <num,2> * Expr x – Op E
E
3 <id,x> – <num,2> * <id,y>
This evaluates as x – ( 2 * y ) 2 y
*
Derivations and Parse Trees
G
Rightmost derivation
Rule Sentential Form
— Expr E
1 Expr Op Expr
3 Expr Op <id,y>
6 Expr * <id,y> E Op E
1 Expr Op Expr * <id,y>
2 Expr Op <num,2> * <id,y>
5 Expr – <num,2> * <id,y> E Op E * y
3 <id,x> – <num,2> * <id,y>
x – 2
This evaluates as ( x – 2 ) * y
13
Ambiguity
Consider the following context-free grammar:
G = <{string}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, string>
with production P =
string string + string | string - string | 0 | 1 | … | 9
This grammar is ambiguous, because more than one parse tree
represents the string 9-5+2
14
Ambiguity (cont’d)
string string
string string string
string string string string string
9 - 5 + 2 9 - 5 + 2
15
Associativity of Operators
Left-associative operators have left-recursive productions
left left + term | term
String a+b+c has the same meaning as (a+b)+c
Right-associative operators have right-recursive productions
right term = right | term
String a=b=c has the same meaning as a=(b=c)
Operators on the same line have the same associativity and
precedence:
left-associative: + -
left-associative: */
16
Precedence of Operators
Operators with higher precedence “bind more tightly”
expr expr + term | expr – term | term
term term * factor | term / factor | factor
factor number | ( expr )
number 0|1|2|…….|9
String 2+3*5 has the same meaning as 2+(3*5)
expr
expr term
term term factor
factor factor number
number number
2 + 3 * 5
17
Syntax of Statements
stmt id := expr
| if expr then stmt
| if expr then stmt else stmt
| while expr do stmt
| begin opt_stmts end
opt_stmts stmt ; opt_stmts
|
18
Syntax-Directed Translation
• Uses a CF grammar to specify the syntactic
structure of the language
• AND associates a set of attributes with the
terminals and nonterminals of the grammar
• AND associates with each production a set of
semantic rules to compute values of attributes
• A parse tree is traversed and semantic rules
applied: after the tree traversal(s) are completed,
the attribute values on the nonterminals contain
the translated form of the input
19
Synthesized and Inherited
Attributes
• An attribute is said to be …
– synthesized if its value at a parse-tree node is
determined from the attribute values at the children of
the node
– Suppose a node N in a parse tree is labeled by the
grammar symbol X . We write X.a to denote the value
of attribute a of X at that node.
– inherited if its value at a parse-tree node is determined
by the parent (by enforcing the parent’s semantic rules)
20
Example Attribute Grammar
Syntax-directed definition for infix to postfix translation
String concat operator
Production Semantic Rule
expr expr1 + term expr.t := expr1.t // term.t // “+”
expr expr1 - term expr.t := expr1.t // term.t // “-”
expr term expr.t := term.t
term 0 term.t := “0”
term 1 term.t := “1”
… …
term 9 term.t := “9”
21
Example Annotated Parse Tree
expr.t = “95-2+”
expr.t = “95-” term.t = “2”
expr.t = “9” term.t = “5”
term.t = “9”
9 - 5 + 2
22
Depth-First Traversals
procedure visit(n : node);
begin
for each child m of n, from left to right do
visit(m);
evaluate semantic rules at node n
end
23
Depth-First Traversals (Example)
expr.t = “95-2+”
expr.t = “95-” term.t = “2”
expr.t = “9” term.t = “5”
term.t = “9”
9 - 5 + 2 Note: all attributes are
of the synthesized type
24
Translation Schemes
• A translation scheme is a CF grammar embedded
with semantic actions
• When drawing a parse tree for a translation scheme,
we indicate an action by constructing an extra child
for it, connected by a dashed line to the node that
corresponds to the head of the production.
rest + term { print(“+”) } rest
rest
Embedded
semantic action
+ term { print(“+”) } rest
25
Example Translation Scheme
expr expr + term { print(“+”) }
expr expr - term { print(“-”) }
expr term
term 0 { print(“0”) }
term 1 { print(“1”) }
… …
term 9 { print(“9”) }
26
Example Translation Scheme
(cont’d)
expr
{ print(“+”) }
expr + term
{ print(“2”) }
{ print(“-”) }
- term 2
expr
{ print(“5”) }
term 5
{ print(“9”) }
9
Translates 9-5+2 into postfix 95-2+
27
Parsing
• Parsing = process of determining if a string of
tokens can be generated by a grammar
• For any CF grammar there is a parser that takes at
most O(n3) time to parse a string of n tokens
• Top-down parsing “constructs” a parse tree from
root to leaves
• Bottom-up parsing “constructs” a parse tree from
leaves to root
28
Predictive Parsing
• Recursive descent parsing is a top-down parsing
method
– Each nonterminal has one (recursive) procedure that is
responsible for parsing the nonterminal’s syntactic
category of input tokens
– When a nonterminal has multiple productions, each
production is implemented in a branch of a selection
statement based on input look-ahead information
• Predictive parsing is a special form of recursive
descent parsing where we use one lookahead
token to unambiguously determine the parse
operations
29
Example Predictive Parser
(Grammar)
type simple
| ^ id
| array [ simple ] of type
simple integer
| char
| num dotdot num
30
Example Predictive Parser
(Execution Step 1)
Check lookahead
type()
and call match
match(‘array’)
Input: array [ num dotdot num ] of integer
lookahead
31
Example Predictive Parser
(Execution Step 2)
type()
match(‘array’) match(‘[’)
Input: array [ num dotdot num ] of integer
lookahead
32
Example Predictive Parser
(Execution Step 3)
type()
match(‘array’) match(‘[’) simple()
match(‘num’)
Input: array [ num dotdot num ] of integer
lookahead
33
Example Predictive Parser
(Execution Step 4)
type()
match(‘array’) match(‘[’) simple()
match(‘num’) match(‘dotdot’)
Input: array [ num dotdot num ] of integer
lookahead
34
Example Predictive Parser
(Execution Step 5)
type()
match(‘array’) match(‘[’) simple()
match(‘num’) match(‘dotdot’) match(‘num’)
Input: array [ num dotdot num ] of integer
lookahead
35
Example Predictive Parser
(Execution Step 6)
type()
match(‘array’) match(‘[’) simple() match(‘]’)
match(‘num’) match(‘dotdot’) match(‘num’)
Input: array [ num dotdot num ] of integer
lookahead
36
Example Predictive Parser
(Execution Step 7)
type()
match(‘array’) match(‘[’) simple() match(‘]’) match(‘of’)
match(‘num’) match(‘dotdot’) match(‘num’)
Input: array [ num dotdot num ] of integer
lookahead
37
Example Predictive Parser
(Execution Step 8)
type()
match(‘array’) match(‘[’) simple() match(‘]’) match(‘of’) type()
match(‘num’) match(‘dotdot’) match(‘num’) simple()
match(‘integer’)
Input: array [ num dotdot num ] of integer
lookahead
38
Adding a Lexical Analyzer
• Typical tasks of the lexical analyzer:
– Remove white space and comments
– Encode constants as tokens
– Recognize keywords
– Recognize identifiers and store identifier names
in a global symbol table
39
The Lexical Analyzer “lexer”
Lexical analyzer
y := 31 + 28*x
lexan()
<id, “y”> <assign, > <num, 31> <‘+’, > <num, 28> <‘*’, > <id, “x”>
token
(lookahead)
tokenval Parser
(token attribute) parse()