A Simple
Syntax Directed
Translation
Chapter 2
Faryal Shamsi
Lecturer Computer Science
Sukkur IBA University
Recap. Lecture
1
Structure of
Compiler
Symbol Table
Analysis
• The analysis phase of a compiler breaks up a source program into
constituent pieces and produces an internal representation for it,
called intermediate code.
• Lexical Analysis,
• Syntax and Semantics Analysis,
• Intermediate Code Generation.
Syntax Directed Translation
• Focuses on: The front-end compiler
A Syntax Directed Translation
Tools and Techniques
Recap: Lexical Analyzer
• A lexical analyzer allows a translator to handle multi character
constructs like identifiers , which are written as sequences of
characters, but are treated as units called tokens during syntax
analysis;
• for example, in the expression count + 1, the identifier count is
treated as a unit. The lexical analyzer allows numbers , identifiers,
and "white spaces" ( blanks , tabs, and newlines) to appear
within expressions.
Syntax Vs Semantics
• The syntax of a programming language describes the proper structure of its
programs, while the semantics of the language defines –
• what its programs mean;
• what each program does when it executes.
• For specifying syntax, we present a widely used notation, called –
• Context-free grammars(CFG)
• or BNF (for Backus-Naur Form)
• For specifying semantics , we shall therefore use
• Syntax Directed Definition and
• Attribute Grammar
Context Free Grammar
• A set of terminal symbols, may be referred to as “tokens”.
• A set of non-terminals, may be called “syntactic variables”.
• A designation of one of the non-terminals as the start symbol.
• A set of productions, where each production consists of a
1. Non-terminal, called the head or left-side of the production,
2. an arrow, and
3. a sequence of terminals and/or non-terminals , called the body or right
side of the production.
CFG Example
• lists of digits separated by plus or minus signs .
• OR
• The string of zero terminals, written as є, is called the empty string.
• A grammar for a subset of Java statements
Suppose,
Syntax CFG
• Using the variable expr to denote an expression and the variable stmt to
denote a statement , this structuring rule can be expressed as:
• Such a rule is called a production.
• In a production, lexical elements like the keyword if and the parentheses are
called terminals.
• Variables like expr and stmt represent sequences of terminals and are called
non-terminals.
Parse Trees
• Parsing is the problem of
• taking a start symbol and
• figuring out how to derive a string of terminal from the it following the grammar
rules
• And if it cannot be derived from the start symbol of the grammar , then
reporting syntax errors within the string.
Parse Trees
• A parse tree pictorially shows how the start symbol of a grammar derives a
string in the language.
• If non-terminal A has a production A XYZ, then a parse tree may have an
interior node labeled A with three children labeled X, Y, and Z, from left to
right:
Parse Trees
Parse Trees
• The process of finding a parse tree for a given string of terminals using a
predefined CFG, is called parsing.
• Such as – 9+5-2
• Can you parse it?
Parse tree Example
• The derivation of 9-5+2
Input:
for ( ; expr ; expr )
Ambiguity
• A grammar can have more than one parse trees generating a given string of
terminals.
• Such a grammar is said to be ambiguous.
• Since a string with more than one parse trees usually has more than one
meaning,
• we need to design unambiguous grammars for compiling applications, or to use
ambiguous grammars with additional rules to resolve the ambiguities .
Consider a CFG -
E E+E
E E*E
E id
Can you parse id+id*id ?
Parsing id+id*id E
E+E |E*E | id
E E
E E E E
+ *
E E E E id
id * +
id id id id
Precedence of Operators
• Precedence? 9+5*2
• *, / should have higher precedence
• -, + should have lower precedence
Precedence of Operators
Resolving Precedence Problem
E E+E | E*E | id
Above statement can be
EE+T
TT*F
F id
Resolving Precedence Problem
Can you identify some problem here?
EE+T
TT*F
F id
Resolving Precedence Problem
EE+T
TT*F
F id
Try parsing:
• id+id*id*id
• id+id*id+id
Resolving Precedence Problem
EE+T
TT*F
F id
• Convert into
EE+T|T
TT*F|F
F id
Resolving Precedence Problem
E E+T | E-T | T
T T*F | T/F | F
F id | E
Associativity of Operators
• Associativity of 5? 9-5+2
• +,-,*,/ are mostly considered as left associative
• Special operators like ^ used for exponential is mostly right
associative
• = and its descendants are right associative in C
• a=b=c should be treated as a=(b=c)
Removing Ambiguity
• E E+E | id
• E E+id | id
• Also called left associative or left recursive
• E id+E | id
• Also called right associative or right recursive
Associativity of Operators
• A well written grammar must have well defined associativity of
operators
• Left Associativity
• Right Associativity
Associativity of Operators
4.3.4 – Left Factoring
• A left recursive grammar is not useful for predictive parsing
• Left factoring is a grammar transformation that is useful for producing
a grammar suitable for predictive, or top-down, parsing.
• When the choice between two alternative A-productions is not clear,
we may be able to rewrite the productions to defer the decision until
enough of the input has been seen that we can make the right choice.
4.3.4 – Left Factoring
• A left recursive grammar – • A left factored grammar –
AAα A β A’
Aβ A’ α A’ | ε
4.3.4 – Left Factoring
• A left recursive grammar –
• A left factored grammar –
AAα
Aβ A β A’
A’ α A’ | ε
EE+T
ET
4.3.4 – Left Factoring
• A left recursive grammar –
• A left factored grammar –
AAα
Aβ A β A’
A’ α A’ | ε
EE+T
ET E T E’
E’ +T E’ | ε
Solve
• S (L) | i • S Sa | Sb | c
L L,S | S
• S Aa | Bb
•SS0S1S A Ac | Ad | B
S 01 B Ba | Bc
4.3.4 – Left Factoring
• Left factoring is a grammar transformation that is useful for producing
a grammar suitable for predictive, or top-down, parsing.
• For example, if we have the two productions of a non deterministic
grammar –
stmt if expr then stmt else stmt
stmt if expr then stmt
4.3.4 – Left Factoring
• A left factored grammar –
• A non deterministic grammar –
4.3.4 – Left Factoring
• A left recursive grammar – • A left factored grammar –
This grammar is still ambiguous
Sometimes, you need to think out of
the box!
Infix VS Postfix Notation
• Infix expressions are readable and solvable by humans. We can easily
distinguish the order of operators, and also can use the parenthesis to
solve that part first during solving mathematical expressions.
• The computer cannot differentiate the operators and parenthesis
easily, that's why postfix conversion is needed.
Postfix Notation
Postfix Notation (Example)
• E = E1 op E2 postfix will be E = É1 É2 op
• (9-5)+2 postfix will be 95-2+
• Parentheses are not needed in postfix conversion
• 9-5+2 postfix will be 952-+
• 952+-3*
• Scan from left, first we face + sign, solve 52+ will be 7
• Now 97-3*, first we face – sign, solve 97- will be 2
• Now 23* solve it as 6
Convert into Postfix
•A*B+C/D
Convert into Postfix
• A*B+C/D
• A * (B + C) / D
Convert into Postfix
• A*B+C/D
• A * (B + C) / D
• A * (B + C / D)
Convert into Postfix
•A*B+C/D AB*CD/+
• A * (B + C) / D ABC+*D/
• A * (B + C / D) ABCD/+*
Intermediate Code Generation
• Two forms of intermediate code.
• One form, called abstract syntax trees (AST) or simply syntax trees, represents
the hierarchical syntactic structure of the source program.
• The other one is the parser, produces a syntax tree , that is further translated
into three-address code.
• Some compilers combine parsing and intermediate-code generation into one
component.
Intermediate Representation
• A common intermediate representation is a sequence of "three-address"
instructions.
• This form of intermediate code takes its name from instructions of the form x
= y op z ,
• where op is a binary operator,
• y and z the are addresses for the operands, and
• x is the address for the result of the operation.
• A three-address instruction carries out at most one operation,
• typically a computation, a comparison, or a branch.
a) AST – Abstract Syntax Tree
b) Intermediate Code
do i = i + 1; while ( a [i] > v) ;
Consider the following
The 3 address code