Chapter 3
Syntax Analysis
Chapter – 3 : Syntax Analysis 1 Bahir Dar Institute of Technology
Contents (Session-1)
Introduction
Context-free grammar
Derivation
Parse Tree
Ambiguity
• Resolving Ambiguity
Immediate & Indirect Left Recursion
• Eliminating Immediate & Indirect Left Recursion
Left Factoring
Context-Free Grammars versus
Regular Expressions
Chapter – 3 : Syntax Analysis 2 Bahir Dar Institute of Technology
Introduction
Syntax analysis is the second phase of the compiler.
The parser takes the token produced by lexical analysis and builds
the syntax tree (parse tree).
The syntax tree can be easily constructed from Context-Free
Grammar.
The parser reports syntax errors in an intelligible/understandable
fashion and recovers from commonly occurring errors to continue
processing the remainder of the program.
The process of syntax analysis is performed using syntax
analyzer/parser.
The goal of the parser is to
determine the syntactic
validity of a source string
is valid , a tree is built for
use by
Chapter – 3 the subsequent
: Syntax Analysis 3 Bahir Dar Institute of Technology
Role of the syntax analyzer/Parser:
It verifies the structure generated by the tokens based on the
grammar
Parser builds the parse tree.
Parser Performs context free syntax analysis.
Parser helps to construct intermediate code.
Parser produces appropriate error messages.
Parser attempts to correct/recover few errors.
Some Issues related to parser :
Parser cannot detect errors such as:
1. Variable re-declaration
2. Variable initialization before use.
3. Data type mismatch for an operation.
The above issues are handled by Semantic Analysis phase.
Chapter – 3 : Syntax Analysis 4 Bahir Dar Institute of Technology
Some Issues related to parser:
Syntax error handling :
Programs can contain errors at many different levels. For example :
1. Lexical, such as misspelling a keyword.
2. Syntactic, such as an arithmetic expression with unbalanced
parentheses.
3. Semantic, such as an operator applied to an incompatible
operand.
4. Logical, such as an infinitely recursive call.
Functions of error handler :
1. It should report the presence of errors clearly and
accurately.
2. It should recover from each error quickly enough to be able to
detect subsequent errors.
3. It should not significantly slow down the processing of correct
programs.
Chapter – 3 : Syntax Analysis 5 Bahir Dar Institute of Technology
Error recovery strategies:
The different strategies that a parse uses to recover from a
syntactic error are:
Panic mode recovery: On discovering an error enywhere in the
statement, the parser discards rest of the statement by not
processing input from erroneous input to delimiters, such as
semicolon or end. It has the advantage of simplicity and does not
go into an infinite loop. When multiple errors in the same
statement are rare, this method is quite useful.
Phrase level recovery: On discovering an error, the parser performs
local correction on the remaining input that allows it to continue.
Example: Insert a missing semicolon, replacing comma with
semicolon or delete an extraneous semicolon etc.
Chapter – 3 : Syntax Analysis 6 Bahir Dar Institute of Technology
Error recovery strategies:
Error productions: The parser is constructed using augmented
grammar with error productions for some common errors that
may occur. The augmented grammar contains productions that
generates erroneous constructs when this error encountered.
Global correction: Given an incorrect input statement x and
grammar G, certain algorithms can be used to find a parse tree for
some closest error free statement y. This may allow the parser to
make minimal changes in the source code . However, these
methods are in general too costly in terms of time and space.
Chapter – 3 : Syntax Analysis 7 Bahir Dar Institute of Technology
Types of parsers for grammars:
Universal parsers
Universal parsing methods such as the Cocke-Younger-Kasami
algorithm and Earley's algorithm can parse any grammar.
These general methods are too inefficient to use in
production.
This method is not commonly used in compilers.
Top-down parsers Top-down methods build parse trees from the
top (root) to the bottom (leaves).
Bottom-up parsers Bottom-up methods start from the leaves and
work their way up to the root.
Chapter – 3 : Syntax Analysis 8 Bahir Dar Institute of Technology
Context Free Grammars(CFG)
CFG is used to specify the structure of legal
programs.
• The design of the grammar is an initial
phase of the design of a programming
language.
Formally a CFG G = (V, Ʃ,S,P), where:
• V = non-terminals, are variables that denote sets of (sub)strings
occurring in the language. These impose a structure on the
grammar.
• Ʃ is the set of terminal symbols in the grammar
(i.e., the set of tokens returned by the scanner)
• S is the start/goal symbol, a distinguished non-terminal in V
denoting the entire set of strings in L(G).
• P is a finite set of productions specifying how terminals and
non-
terminals can be combined to form strings in the language.
Chapter –Each production
3 : Syntax Analysis must have 9a single non-terminal
Bahir onTechnology
Dar Institute of its left hand
Context Free Grammars(CFG)…
Example (G1):
E E+E E–E | E*E | E/E | -
| E (E) E
E id
• Where
• Vt = {+, -, *, / (,), id}, Vn = {E}
• S = {E}
• Production are shown above
• Sometimes can be replaced by ::=
CFG is more expressive than RE - Every language that can be
described by regular expressions can also be described by a
CFG
• L = {𝑎𝑛𝑏𝑛 | 𝑛 >= 1}, is an example language that can be expressed by
CFG but not by RE
Context-free grammar is sufficient to describe most programming
languages.
Chapter – 3 : Syntax Analysis 10 Bahir Dar Institute of Technology
Conventions
1.These symbols are terminals :
A. Lowercase letters early in the alphabet, such as a, b, c.
B. Operator symbols such as +, *, and so on .
C. Punctuation symbols such as parentheses , comma, and so
on.
D. The digits 0, 1, ... ,9 .
E. Boldface strings such as id or if, each of which represents a
single terminal symbol.
2.These symbols are non-terminals:
i. Uppercase letters early in the alphabet, such as A, B, C.
ii. The letter S, which, when it appears, is usually the start symbol.
iii. Lowercase, italic names such as expr or stmt.
iv. Uppercase letters may be used to represent non-terminals for the
constructs.
11
For example:- non terminals for expressions, terms, and
Contd.
4. Lowercase letters late in the alphabet , chiefly u, v, ... ,z ,
represent (possibly empty) strings of terminals.
5. Lowercase Greek letters ,,, for example, represent (possibly
empty) strings of grammar symbols.
Thus, a generic production can be written as A , where A is the
head and the body.
6. A set of productions A 1, A 2, A 3,..., A k with a common
head A (call them A-productions), may be written A 1|A 2|A
3|...|A k. • The notational
conventions tell us
Call 1, 2, 3,...,k the alternatives for A that E,T, and F are
non-terminals, with
7. Unless stated otherwise, the head of the first E production is the start
the start symbol.
• The remaining
symbol.
symbols are
Example:- E E+ T|E-T|T terminals
12
T T*F|T/F|F
Derivation
A sequence of replacements of non-terminal symbols to obtain
strings/sentences is called a derivation
• If we have a grammar E E+E then we can replace E
by E+E
• In general a derivation step is A if there
is a production rule A in a grammar
• where and are arbitrary strings of terminal and non-
terminal symbols
Derivation of a string should start from a production with start
symbol in the left
S is a sentential form (terminals & non-terminals Mixed)
is a sentence if it contains only terminal symbols
Chapter – 3 : Syntax Analysis 13 Bahir Dar Institute of Technology
Derivation…
The derivations are classified into two types based on the order of
replacement of production. They are:
Leftmost derivation (LMD): If the leftmost non-terminal is
replaced by its production in derivation, then it called leftmost
derivation.
Rightmost derivation (RMD): If the rightmost non-terminal is
replaced by its production in derivation, then it called rightmost
derivation.
Egg. given grammar G : E → E+E | E*E | ( E ) | - E | id
LMD for - ( id + id )
RMD for - ( id + id )
Chapter – 3 : Syntax Analysis 14 Bahir Dar Institute of Technology
Parse Tree
Parsing is the process of analyzing a continuous stream of input in order
to determine its grammatical structure with respect to a given formal
grammar.
A parse tree is a graphical representation of a derivation. It is convenient
to see how strings are derived from the start symbol. The start symbol of
the derivation becomes the root of the parse tree.
• Inner nodes of a parse tree are non-terminal symbols.
• The leaves of a parse tree are terminal symbols.
E -E E E E
-(E) -(E+E)
- E - E - E
( E ) ( E )
E E E + E
- E - E
-(id+E) -(id+id)
( E ) ( E )
E + E E + E
id id id
Chapter – 3 : Syntax Analysis 15 Bahir Dar Institute of Technology
Ambiguity
An ambiguous grammar is one that produces more than
one LMD or more than one RMD for the same sentence.
E E+E
id+E E E*E
id+E*E E+E*E
id+E*E
id+id*E
id+id*E
id+id*id id+id*id
E
E
E * E
E + E
id * E + E id
E E
id id
id id
Chapter – 3 : Syntax Analysis 16 Bahir Dar Institute of Technology
Ambiguity
For the most parsers, the grammar must be unambiguous.
If a grammar unambiguous grammar then there are
unique selection of the parse tree for a sentence
• We should eliminate the ambiguity in the grammar during
the design phase of the compiler.
An unambiguous grammar should be written to eliminate the
ambiguity.
• We have to prefer one of the parse trees of a sentence
(generated by an ambiguous grammar) to disambiguate
that grammar to restrict to this choice.
Chapter – 3 : Syntax Analysis 17 Bahir Dar Institute of Technology
Left Recursion
A grammar is left recursive if it has a
non-terminal A such that there is a derivation.
A A for some string
Top-down parsing techniques cannot handle left-
recursive grammars.
So, we have to convert our left-recursive grammar into
an equivalent grammar which is not left-recursive.
Two types of left-recursion
• immediate left-recursion - appear in a single step of the derivation (),
• Indirect left-recursion - appear in more than one step of the derivation.
Chapter – 3 : Syntax Analysis 18 Bahir Dar Institute of Technology
Eliminating left recursion
A A | where does not start with A
eliminate left recursion
A A’
A’→ A’ |
• In general,
A A1|…|A m| 1 |... | n where 1... n do not start with A
A 1A’|…. | nA’
A’→ 1 A’ | …. | nA’ |
Eliminating left recursion…
Example
E→E+T | T
T→T*F | F
F→(E) | id
After eliminating the left-recursion the grammar
becomes,
E → TE’
E’→+TE’|ε
T → FT’
T’→*FT’|ε
F→ (E) |id
Left factoring
When a non-terminal has two or more productions
whose right-hand sides start with the same
grammar symbols, the grammar is not LL(1) and
cannot be used for predictive parsing
A predictive parser (a top-down parser without
backtracking) insists that the grammar must be left-
factored.
In general : A αβ1 | αβ2 , where α-is a non empty and
the first symbol of β1 and β2.
21
Left factoring…
When processing α we do not know whether to expand
A to αβ1 or to αβ2, but if we re-write the grammar as
follows:
A αA’
A’ β1 | β2 so, we can immediately expand A to
αA’.
Example: given the following grammar:
S iEtS | iEtSeS | a
Eb
Left factored, this grammar becomes:
S iEtSS’ | a
S’ eS | ε
Eb
22
Left factoring…
The following stmt if expr then stmt else stmt
grammar: | if expr then stmt
Cannot be parsed by a predictive parser that looks
one element ahead.
But the grammar stmt if expr then stmt stmt’
can be re-written: stmt‘ else stmt |
Where is the empty string.
Rewriting a grammar to eliminate multiple productions
starting with the same token is called left factoring.
23
Left factoring…
Example 1 Example 2
A abB|aB|cdg|cdeB|cdfB A ad|a|ab|abc|b
A a A’ |cd g|cd eB|cd fB A a A’ |b
A’ bB|B A’ d|Ɛ|b |bc
A a A’ |cd A’‘ A a A’ |b
A’ bB|B A’ d| Ɛ| b A’‘
A’‘ g|eB|fB A’‘ Ɛ|c
Context-Free Grammars versus Regular Expressions
Every regular language is a context-free language, but not vice-
versa.
Example: The grammar for regular expression (a|b)*abb
Describe the same language, the set of strings of a's and b's
ending in abb. So we can easily describe these languages either by
finite Automata or PDA.
On the other hand, the language L ={anbn | n ≥1} with an equal
number of a's and b's is a prototypical example of a language that
can be described by a grammar but not by a regular expression.
We can say that "finite automata cannot count" meaning that a
finite automaton cannot accept a language like {anbn | n ≥ 1} that
would require it to keep count of the number of a's before it sees
the b’s.
Chapter – 3 : Syntax Analysis 25 Bahir Dar Institute of Technology
Context-Free Grammars versus Regular Expressions
The general comparison of Regular Expressions vs. Context-
Free Grammars:
Chapter – 3 : Syntax Analysis 26 Bahir Dar Institute of Technology
Parsin
Top Down Parsing g
Recursive-Descent Parsing
Predictive Parser
• Recursive Predictive Parsing
• Non-Recursive Predictive Parsing
LL(1) Parser – Parser Actions
Constructing LL(1) - Parsing Tables
Computing FIRST and FOLLOW functions
LL(1) Grammars
Properties of LL(1) Grammars
Chapter – 3 : Syntax Analysis 27 Bahir Dar Institute of Technology
Parsing methods
Parsing
Top down parsing Bottom up parsing (Shift reduce)
Recursive descent
Involves Back tracking Operator precedence
predictive parsing
Parsing without LR parsing
backtracking
SLR
Recursive
predictive
Non-Recursive CLR
predictive
Or LL(1) LALR
Chapter – 3 : Syntax Analysis 28 Bahir Dar Institute of Technology
Top Down Parsing
Top-down parsing involves constructing a parse tree for the input
string, starting from the root
• Basically, top-down parsing can be viewed as finding a
leftmost
derivation (LMD) for an input string.
How it works? Start with the tree of one node labeled with the start symbol
and repeat the following steps until the fringe of the parse tree matches the
input string
• 1. At a node labeled A, select a production with A on its LHS and
for
each symbol on its RHS, construct the appropriate child
• 2. When a terminal is added to the fringe that doesn't match the
input string, backtrack
• 3. Find the next node to be expanded
Chapter – 3 : Syntax Analysis 29 Bahir Dar Institute of Technology
Top Down Parsing…
Two types of top-down parsing
• Recursive-Descent Parsing
• Backtracking is needed (If a choice of a production rule does
not work, we backtrack to try other alternatives.)
• It is a general parsing technique, but not widely used
because it is not efficient
• Predictive Parsing
• no backtracking and hence efficient
• needs a special form of grammars (LL(1) grammars).
• Two types
– Recursive Predictive Parsing is a special form of
Recursive Descent Parsing without backtracking.
– Non-Recursive (Table Driven) Predictive Parser is also
known as LL(1) parser.
Chapter – 3 : Syntax Analysis 30 Bahir Dar Institute of Technology
Recursive-Descent Parsing
As the name indicates, recursive descent uses recursive functions
to implement predictive parsing.
It tries to find the left-most derivation.
Backtracking is needed
Example
S aBc a B c
B bc
|
b
input: abc
Chapter – 3 : Syntax Analysis 31 Bahir Dar Institute of Technology
Recursive-Descent Parsing
A left-recursive grammar can cause a recursive-descent
parser to go into an infinite loop.
Hence, elimination of left-recursion must be done before
parsing.
Consider the grammar for arithmetic expressions
Example
After eliminating the left-recursion the grammar becomes,
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ |
ε F → (E) |
id
Chapter – 3 : Syntax Analysis 32 Bahir Dar Institute of Technology
Recursive-Descent Parsing
Then write the recursive procedure for grammar as follows:
Procedur begin
Remember a
grammar
e E() If
begin input_symbol=
T( ); ’*’ then
EPRIME( ADVANCE( );
); F( ); TPRIME( );
end end
Procedure Procedur
Procedure e F( )
EPRIME( ) begin
begin If input-
If symbol=
input_symbo ’id’ then
l=’+’ then ADVANC
ADVAN E( );
CE( ); else if input-
T( ); symbol=’(‘
EPRIM
Chapter – 3 : Syntax Analysis 33 then of Technology
Bahir Dar Institute
Predictive Parser
• Predictive parsers always build the syntax tree
from the root down to the leaves and are hence
also called (deterministic) top-down parsers
Predictive Parsing can be recursive or non-
recursive
• In recursive predictive parsing, each non-
terminal corresponds to a procedure/function.
Chapter – 3 : Syntax Analysis 34 Bahir Dar Institute of Technology
Recursive Predictive Parsing…
When to apply -productions?
A aA | bB |
• If all other productions fail, we should apply an -
production.
• For example, if the current token is not a or b, we may
apply the -production.
• Most correct choice: We should apply an -production for
a non-terminal A when the current token is in the follow
set of A (which terminals can follow A in the sentential
forms).
Chapter – 3 : Syntax Analysis 35 Bahir Dar Institute of Technology
Recursive Predictive Parsing…
Chapter – 3 : Syntax Analysis 36 Bahir Dar Institute of Technology
Non-Recursive Predictive Parsing
A non-recursive predictive parser can be built by
maintaining a stack explicitly, rather than implicitly via
recursive calls
Non-Recursive predictive parsing is a table-driven top-down
parser.
Model of a table-driven predictive
parser
Chapter – 3 : Syntax Analysis 37 Bahir Dar Institute of Technology
Non-Recursive Predictive Parsing…
Input buffer
• our string to be parsed. We will assume that its end is marked with a special
symbol $.
Output
• a production rule representing a step of the derivation sequence (left-most
derivation) of the string in the input buffer.
Stack
• contains the grammar symbols
• at the bottom of the stack, there is a special end marker symbol $.
• initially the stack contains only the symbol $ and the starting symbol S.
• when the stack is emptied (i.e. only $ left in the stack), the parsing is
completed.
Parsing table
• a two-dimensional array M[A,a]
• each row is a non-terminal symbol
• each column is a terminal symbol or the special symbol $
• each entry holds a production rule.
Chapter – 3 : Syntax Analysis 38 Bahir Dar Institute of Technology
Non-Recursive Predictive /LL(1) Parser – Parser Actions
The symbol at the top of the stack (say X) and the current symbol
in the input string (say a) determine the parser action.
There are four possible parser actions.
1. If X and a are $ parser halts (successful completion)
2. If X and a are the same terminal symbol (different from $)
parser pops X from the stack, and moves the next symbol in the input
buffer.
3. If X is a non-terminal
parser looks at the parsing table entry M[X,a]. If M[X,a] holds a
production rule XY1Y2...Yk, it pops X from the stack and
pushes Yk,Yk- 1,...,Y1 into the stack. The parser also outputs the
production rule XY1Y2...Yk to represent a step of the derivation.
4. none of the above error
• all empty entries in the parsing table are errors.
• If X is a terminal symbol different from a, this is also an error
case.
Chapter – 3 : Syntax Analysis 39 Bahir Dar Institute of Technology
Non-Recursive Predictive /LL(1) Parser – Parser Actions
General algorithm of LL (1) parser:
METHOD: Initially, the parser is in a configuration with w$ in the
input buffer and the start symbol S of G on top of the stack, above $.
The following procedure uses the predictive parsing table M to
produce a predictive parse for the input.
Chapter – 3 : Syntax Analysis 40 Bahir Dar Institute of Technology
LL(1) Parser – Example1
S aBa a b $ LL(1) Parsing
B bB | S S aBa Table
B B B bB Assume current
string is: abba
stack input output We will see how to
$S abba$ S aBa construct parsing
$aBa abba$ table next
$aB bba$ B bB
$aBb bba$
$aB ba$ B bB
$aBb ba$
$aB a$ B
$a a$
$ $ accept, successful completion
Chapter – 3 : Syntax Analysis 41 Bahir Dar Institute of Technology
LL(1) Parser – Example2
E TE’
E’ +TE’ |
E is start symbol
T FT’
T’ *FT’ |
id
+ * ( ) $
F (E) | id
E E TE ’ E TE’
E’ E’ +TE’ E’ E’
T T FT’ T FT’
T’ T’ T’ *FT’ T’ T’
F F id F (E)
Chapter – 3 : Syntax Analysis 42 Bahir Dar Institute of Technology
LL(1) Parser – Example2…
Parse the in put id+id stack input output
Notice: the parsing is $E id+id$ E TE’
done by
lo parse id+id$ T FT’
table $E’ T’F id+id$ F id
$ E’ T’id id+id$
$ E’ T’ +id$ T’
$ E’ +id$ E’ +TE’
$ E’ T+ +id$
$ E’ T id$ T FT’
$ E’ T’ F id$ F id
$ E’ T’id id$
$ E’ T’ $ T’
$ E’ $ E’
$ $ accept
Chapter – 3 : Syntax Analysis 43 Bahir Dar Institute of Technology
non-recursive predictive /LL(1) Parsing steps
Steps to be involved in non-recursive predictive /LL(1)
Parsing Method:
1. Stack is pushed with $.
2. Construction of parsing table T.
1. Computation of FIRST set.
2. Computation of FOLLOW set.
3. Making entries into the parsing table.
3. Parsing by the help of parsing routine.
Step 1: pushing $ to Stack and ready to start parsing
process.
Chapter – 3 : Syntax Analysis 44 Bahir Dar Institute of Technology
Step 2: Constructing LL(1) Parsing Tables
Two functions are used in the construction of LL(1) parsing tables:
FIRST and FOLLOW.
These can provide (with their sets) the actual position of any
terminal in the derivation.
FIRST() is a set of the terminal symbols which occur as
first symbols in strings derived from where is any string
of grammar symbols.
• if derives to , then is also in FIRST() .
FOLLOW(A) is the set of the terminals which occur immediately
after the non-terminal A in the strings derived from the starting
symbol.
• a terminal a is in FOLLOW(A) if S Aa
Chapter – 3 : Syntax Analysis 45 Bahir Dar Institute of Technology
Compute FIRST for a String X
1. If X is a terminal symbol, then FIRST(X)={X}
2. If X is , then FIRST(X)={}
3. If X is a non-terminal symbol and X
is a production rule, thenadd in FIRST(X).
4. If X is a non-terminal symbol and X Y1Y2..Yn
is a production rule, then terminals in FIRST(Y1)
except will be in FIRST(X).Further more
• if a terminal a in FIRST(Yi) and is in all FIRST(Yj)
for j=1,...,i-1, then a is in FIRST(X).
• if is in all FIRST(Yj) for j=1,...,n, then is in
FIRST(X).
Chapter – 3 : Syntax Analysis 46 Bahir Dar Institute of Technology
Compute FIRST for a String X…
Example
E TE’
E’ +TE’ |
T FT’
T’ *FT’| FIRST(E’) = {+, }
F (E) | id FIRST(E) =
{(,id}
From Rule 1
FIRST(id) = {id} Others
From Rule 2 FIRST(TE’) = {(,id}
FIRST() = {} FIRST(+TE’ ) = {+}
From Rule 3 and 4 FIRST(FT’) = {(,id}
First(F) = {(, id} FIRST(*FT’) = {*}
First(T’) = {*, } FIRST((E)) = {(}
FIRST(T) =
{(,id}
Chapter – 3 : Syntax Analysis 47 Bahir Dar Institute of Technology
Compute FOLLOW (for non-terminals)
1. $ is in FOLLOW(S), if S is the start symbol
2. Look at the occurrence of a non‐terminal on the RHS of a
production which is followed by something
• if A B is a production rule, then everything in FIRST() except is
FOLLOW(B)
3. Look at B on the RHS that is not followed by anything
• If ( A B is a production rule ) or ( A B is a production
rule and
is in FIRST() ), then everything in FOLLOW(A) is in FOLLOW(B).
We apply these rules until nothing more can be added
to any follow set.
Chapter – 3 : Syntax Analysis 48 Bahir Dar Institute of Technology
Compute FOLLOW (for non-terminals)
Example
i. E TE’
ii. E’ +TE’ |
iii. T FT’
iv. T’ *FT’ |
v. F (E) | id
FOLLOW(E) = { $, ) }, because
• From first rule Follow (E) contains $
• From Rule 2 Follow(E) is first()), from the production F (E)
FOLLOW(E’) = { $, ) } …. Rule 3
FOLLOW(T) = { +, ), $ }
• From Rule 2 + is in FOLLOW(T)
• From Rule 3 Everything in Follow(E) is in Follow(T) since First(E’)
contains
FOLLOW(F) = {+, *, ), $ } …same reasoning as
above
FOLLOW(T’)
Chapter = { Analysis
– 3 : Syntax +, ), $ } ….Rule3 49 Bahir Dar Institute of Technology
Constructing LL(1) Parsing Table -- Algorithm
For each production rule A of a grammar G
1. for each terminal a in FIRST()
add A to M[A,a]
2. If in FIRST()
for each terminal a in FOLLOW(A) add A to M[A,a]
3. If in FIRST() and $ in FOLLOW(A)
add A to M[A,$]
All other undefined entries of the parsing table are error entries.
Chapter – 3 : Syntax Analysis 50 Bahir Dar Institute of Technology
Constructing LL(1) Parsing Table -- Example
The grammmar: Step 2: compute FIRST and FOLLOW
E→E+T | T First( ) : Follow( ):
T→T*F | F FIRST(E) = { ( , id} FOLLOW(E) = { $, ) }
F→(E) | FIRST(E’) ={+ , ε } FOLLOW(E’) = { $, ) }
id FIRST(T) = { ( , id} FOLLOW(T) = { +, $, ) }
Step 1: FIRST(T’) = {*, ε } FOLLOW(T’) = { +, $, ) }
Remove FIRST(F) = { ( , id } FOLLOW(F) = {+, * , $ , ) }
left
recursion Step 3: constructing predictive parsing tree
E→TE’
E’→+TE’ | ϵ
T→FT’
T’→*FT’ |
ϵ F→(E) |
id
How it works example: M[A, a]
Chapter – 3 : Syntax Analysis 51 Bahir Dar Institute of Technology
Stack implementation to parse string id+id*id$ using the above parsing
Chapter – 3 : Syntax Analysis 52 Bahir Dar Institute of Technology
Constructing LL(1) Parsing Table -- Example
Example 2: Consider this following grammar:
S → iEtS | iEtSeS | a
E→b
After eliminating left factoring, we have
S → iEtSS’ | a
S’→ eS | ε
E→b
To construct a parsing table, we need FIRST() and FOLLOW() for all the non
terminals.
FIRST(S) = { i, a }
FIRST(S’) = {e, ε }
FIRST(E) = { b}
FOLLOW(S) = { $ ,e }
FOLLOW(S’) = { $ ,e }
FOLLOW(E) = {t}
And the Parsing table is:
Chapter – 3 : Syntax Analysis 53 Bahir Dar Institute of Technology
LL(1) Grammars
A grammar whose parsing table has no multiple-defined entries is said
to be LL(1) grammar.
• First L refers input scanned from left, the second L refers left-
most derivation and 1 refers one input symbol used as a look-
head symbol do determine parser action input scanned from left to
right
Chapter – 3 : Syntax Analysis 54 Bahir Dar Institute of Technology
A Grammar which is not LL(1)
The parsing table of a grammar may contain more than one
production rule.
• In this case, we say that it is not a LL(1) grammar.
SiCtSE |
a E e S|
Cb
a b e i t $
FIRST(iCtSE) = {i}
FIRST(a) = {a} S Sa S iCtSE
FIRST(eS) = {e}
E EeS E
FIRST() = {}
FIRST(b) = {b} E
FOLLOW(S) = { $,e } C Cb
FOLLOW(E) = { $,e }
FOLLOW(C) = { t } two production rules for M[E,e]
Problem ambiguity
Chapter – 3 : Syntax Analysis 55 Bahir Dar Institute of Technology
A Grammar which is not LL(1)
What do we have to do if the resulting parsing table contains
multiply defined entries?
• Eliminate left recursion in the grammar, if it is not eliminated
• A A |
any terminal that appears in FIRST() also
appears FIRST(A) because A .
If is , any terminal that appears in FIRST() also
appears in FIRST(A) and FOLLOW(A).
• Left factor the grammar, if it is not left factored.
• A grammar is not left factored, it cannot be a LL(1) grammar:
A 1 | 2
any terminal that appears in FIRST(1) also appears
in FIRST( 2).
• If its (new grammar’s) parsing table still contains multiply defined entries,
that grammar is ambiguous or it is inherently not a LL(1) grammar.
• An ambiguous grammar cannot be a LL(1) grammar.
Chapter – 3 : Syntax Analysis 56 Bahir Dar Institute of Technology
Error Recovery in Predictive Parsing
An error may occur in the predictive parsing (LL(1) parsing)
• if the terminal symbol on the top of stack does not match with
the current input symbol.
• if the top of stack is a non-terminal A, the current input symbol
is a, the parsing table entry M[A,a] is empty.
What should the parser do in an error case?
• The parser should be able to give an error message (as much as
possible meaningful error message).
• It should recover from that error case, and it should be able to
continue the parsing with the rest of the input.
Chapter – 3 : Syntax Analysis 57 Bahir Dar Institute of Technology
Contents (Session-2)
Bottom Up Parsing
Handle Pruning
Implementation of A Shift-Reduce Parser
LR Parsers
LR Parsing Algorithm
Actions of an LR-Parser
Constructing SLR Parsing Tables
SLR(1) Grammar
Error Recovery in LR Parsing
Chapter – 3 : Syntax Analysis 58 Bahir Dar Institute of Technology
Bottom-Up Parsing
A bottom-up parser creates the parse tree of the given
input starting from leaves towards the root.
• A bottom-up parser tries to find the RMD of the given input in the
reverse order.
Bottom-up parsing is also known as shift-reduce parsing
because its two main actions are shift and reduce.
• At each shift action, the current symbol in the input string is
pushed to a stack.
• At each reduction step, the symbols at the top of the stack will be
replaced by the non-terminal at the left side of that production.
• Accept: Successful completion of parsing.
• Error: Parser discovers a syntax error, and calls an error recovery
routine.
Chapter – 3 : Syntax Analysis 59 Bahir Dar Institute of Technology
Bottom-Up Parsing…
A shift-reduce parser tries to reduce the given input string
into the starting symbol.
a string the starting symbol
reduced to
At each reduction step, a substring of the input matching to
the right side of a production rule is replaced by the non-
terminal at the left side of that production rule.
If the substring is chosen correctly, the right most derivation
of that string is created in the reverse order.
Rightmost Derivation: S
rm
Shift-Reduce Parser finds: ... S
rm rm
Chapter – 3 : Syntax Analysis 60 Bahir Dar Institute of Technology
Shift-Reduce Parsing -- Example
S aABb input string: aaabb
A aA | a aaAbb
B bB |b aAbb reduction
aABb
S
S aABb aAbb aaAbb aaabb
Right Sentential Forms
How do we know which substring to be replaced at
each reduction step?
Chapter – 3 : Syntax Analysis 61 Bahir Dar Institute of Technology
Handle & Handle pruning
Handle: A “handle” of a string is a substring of the string that matches the right
side of a production, and whose reduction to the non terminal of the production
is one step along the reverse of rightmost derivation.
Handle pruning: The process of discovering a handle and it to
reducing appropriate left hand side non terminal is known as handle
pruning. EE+E
EE*E String: id1+id2*id3
Eid
Right sentential form Handle Production
Rightmost Derivation
id1+id2*id3 id1 Eid
E
E+id2*id3 id2 Eid
E+E
E+E*id3 id3 Eid
E+E*E
E+E*E E*E EE*E
E+E*id3 E*E
E+E E+E EE+E
E+id2*id3
E
id1+id2*id3
Chapter – 3 : Syntax Analysis 62 Bahir Dar Institute of Technology
Handle Pruning
A right-most derivation in reverse can be obtained by handle-
pruning.
S 0 1 ... n-1 n=
rm 2
rm rm rm rm
input string
Start from n, find a handle Ann in n, and replace n by
An to get n-1.
Then find a handle An-1n-1 in n-1, and replace n-1 in by An-
1 to
get n-2.
Repeat this, until we reach S.
Chapter – 3 : Syntax Analysis 63 Bahir Dar Institute of Technology
Shift reduce parser
Shift-reduce parsing is a type of bottom-up parsing that attempts to
construct a parse tree for an input string beginning at the leaves (the
bottom) and working up towards the root (the top).
The shift reduce parser performs following basic operations/actions:
1. Shift: Moving of the symbols from input buffer onto the stack, this
action is called shift.
2. Reduce: If handle appears on the top of the stack then reduction of it
by appropriate rule is done. This action is called reduce action.
3. Accept: If stack contains start symbol only and input buffer is empty at
the same time then that action is called accept.
4. Error: A situation in which parser cannot either shift or reduce the
symbols, it cannot even perform accept action then it is called error
action.
Chapter – 3 : Syntax Analysis 64 Bahir Dar Institute of Technology
A Shift-Reduce Parser - example
E E+T | T Right-Most Derivation of id+id*id
T T*F | F E E+T E+T*F E+T*id E+F*id
F (E) | E+id*id T+id*id F+id*id id+id*id
id
Right-Most Sentential Form Reducing Production
id+id*id F id
F+id*id TF
T+id*id E
E+id*id T
E+F*id F id
E+T*id TF
E+T*F F
E+T id
E T T*F
Handles are red and underlined
E in the right-sentential
forms. E+T
Chapter – 3 : Syntax Analysis 65 Bahir Dar Institute of Technology
Implementation of A Shift-Reduce
Initial stack just contains
only the end-marker $
Stack Input Action
& the end of the
$ id+id*id$ shift
input string is marked
$id +id*id$ reduce by F id
by the end-marker $.
$F +id*id$ reduce by T F
$T +id*id$ reduce by E T Parse Tree
$E +id*id$ shift
$E+ id*id$ shift
$E+id *id$ reduce by F
$E+F *id$ id
$E+T *id$ reduce by T F
$E+T* id$ shift
$E+T*id $ shift
$E+T*F $ reduce by F
id
$E+T $
reduce by T
$E $
T*F
reduce by E E+T
Chapter – 3 : Syntax Analysis accept
66 Bahir Dar Institute of Technology
Conflicts in shift-reduce parsing
There are two conflicts that occur in shift-reduce parsing:
Shift-reduce conflict: The parser cannot decide whether to shift or to
reduce.
Example: Consider the input id+id*id generated from the grammar
G:
E->E+E | E*E | id
NB: If a shift-reduce parser cannot be used for a grammar, that grammar is called
non-LR(k) grammar. An ambiguous grammar can never be an LR grammar
Chapter – 3 : Syntax Analysis 67 Bahir Dar Institute of Technology
Conflicts in shift-reduce parsing …
Reduce-reduce conflict: The parser cannot decide which of
several reductions to make.
Example: Consider the input c+c generated from the
grammar:
M → R+R | R+c | R, R→c
Chapter – 3 : Syntax Analysis 68 Bahir Dar Institute of Technology
Shift-Reduce Parsers
The most prevalent type of bottom-up parser today is based on
a concept called LR(k) parsing;
left to right right-most k lookhead (k is omitted it is 1)
CFG
LR-Parsers covers wide range of grammars. LR
LALR
• Simple LR parser (SLR )
SLR
• Look Ahead LR (LALR)
• most general LR parser
(LR )
SLR, LR and LALR work same, only their parsing tables are
different.
Chapter – 3 : Syntax Analysis 69 Bahir Dar Institute of Technology
LR Parsers
LR parsing is attractive because:
• LR parsers can be constructed to recognize virtually all
programming-language constructs for which context-free
grammars can be written.
• LR parsing is most general non-backtracking shift-reduce
parsing, yet it is still efficient.
• The class of grammars that can be parsed using LR methods
is a proper superset of the class of grammars that can be
parsed with predictive parsers.
• LL(1)-Grammars LR(1)-Grammars
• An LR-parser can detect a syntactic error as soon as it is
possible to do so a left-to-right scan of the input.
Drawback of the LR method is that it is too much
work
to construct an LR parser by hand.
• Use tools e.g. yacc
Chapter – 3 : Syntax Analysis 70 Bahir Dar Institute of Technology
An overview of LR Parsing model
input a1 ... ai ... an $
stack
Sm
Xm
LR Parsing Algorithm output
Sm-1
Xm-1
.
.
Action Table Goto Table
S1
terminals and $ non-terminal
X1 s s
t four different each item is
S0 actions will be t a state number
a applied
a
t
e t
S e
LR parser configuration
Behavior of an LR parser describe the complete
state of the parser.
A configuration of an LR parser is a pair:
(S0 X1 S1 X2 S2… Xm Sm , ai ai+1 … an $)
inputs
stack
This configuration represents the right-sentential form
(X1 X2 … Xm , ai ai+1,…, an $)
Xi is the grammar symbol Note: S0 is on the top of the
represented by state Si. stack at the beginning of
72
parsing
Behavior of LR parser
The parser program decides the next step by using:
the top of the stack (Sm),
the input symbol (ai), and
the parsing table which has two parts: ACTION and
GOTO.
then consulting the entry ACTION[Sm , ai] in the parsing
action table
1. If Action[Sm, ai] = shift S, the parser program shifts both
the current input symbol ai and state S on the top of the
stack, entering the configuration
(S0 X1 S1 X2 S2 … Xm Sm ai S, ai+1 … an $)
73
Behavior of LR parser…
2. Action[Sm, ai] = reduce A β: the parser pops the first
2r symbols off the stack, where r = |β| (at this point,
Sm-r will be the state on top of the stack), entering the
configuration,
(S0 X1 S1 X2 S2 … Xm-r Sm-r A S, ai ai+1 … an $)
Then A and S are pushed on top of the stack where
S = goto[Sm-r, A]. The input buffer is not modified.
3. Action[Sm, ai] = accept, parsing is completed.
4. Action[Sm, ai] = error, parsing has discovered an error
and calls an error recovery routine.
74
LR-parsing algorithm
METHOD: Initially, the parser has s0 on its stack, where s0 is the initial state,
and w$ in the input buffer.
Let a be the first symbol of w$;
while(1)
{ /* repeat forever */
let S be the state on top of the stack;
if ( ACTION[S, a] = shift t )
{ push t onto
the stack;
let a be the next input
symbol;
}
else if ( ACTION[S, a] = reduce
Aβ) //reduce previous input
symbol to head
{ pop |β| symbols off the stack;
let state t now be on top of the stack; push
GOTO[t, A] onto the stack; output the
production Aβ;
} else if ( ACTION[S, a] = accept ) break; /*
Chapter –parsing
3 : SyntaxisAnalysis
done */ 75 Bahir Dar Institute of Technology
(SLR) Parsing Tables for Expression Grammar
Action Table Goto Table
Expression state id + * ( ) $ E T F
Grammar 0 s5 s4 1 2 3
1) E 1 s6 acc
E+T 2 r2 s7 r2 r2
3 r4 r4 r4 r4
2) E T
4 s5 s4 8 2 3
3) T 5 r6 r6 r6 r6
T*F 6 s5 s4 9 3
4) T F 7 s5 s4 10
5) F 8 s6 s11
9 r1 s7 r1 r1
(E)
10 r3 r3 r3 r3
6) F id 11 r5 r5 r5 r5
Steps for Constructing SLR Parsing Tables
1. Augment G and produce G' (The purpose of this new starting
production is to indicate to the parser when it should stop parsing
and announce acceptance of the input)
2. Construct the canonical collection of set of items C for
G' (finding closure/LR(0) items)
3. Compute goto(I, X), where, I is set of items and X is grammar
symbol.
4. Construction of DFA using GOTO result and find FOLLOW
set
5. Construct LR(0) parsing table using the DFA result
and FOLLOW set
Chapter – 3 : Syntax Analysis 77 Bahir Dar Institute of Technology
Constructing SLR Parsing Tables – LR(0) Item
An LR parser makes shift-reduce decisions by maintaining
states to keep track of where we are in a parse.
An LR(0) item of a grammar G is a production of G a dot at the
some position of the right side.
• Ex: A aBb Possible LR(0)
Items:
A
A a Bb
. aBb
..
(four different possibility) A aB b
A aBb
Sets of LR(0) items will be the states of action and goto table of
.
the SLR parser.
• i.e. States represent sets of "items.“
A collection of sets of LR(0) items (the canonical LR(0) collection)
is the basis for constructing a deterministic finite automaton that
is used to make parsing decisions.
Such an automaton is called an LR(0) automaton..
Chapter – 3 : Syntax Analysis 78 Bahir Dar Institute of Technology
Constructing SLR Parsing Table:- closure & GOTO operation
closure operation:
If I is a set of items for a grammar G, then closure(I) is the set of
items constructed from I by two rules:
1. Initially, every item in I is added to closure(I).
2. If A → α .Bβ is in closure(I) and B → γ is a production, then add the
item B →
.γ to I , if it is not already there. We apply this rule until no more new
items can be added to closure(I).
Goto operation: Goto(I, X) (I is CLOURE set and X is all Grammar symbol)
Goto(I, X) is defined to be the closure of the set of all items [A→ αX.β] such
that [A→ α .Xβ] is in I.
GOTO (I, X) will be performed based on the following Rules:
1. If A->X.BY where B is a Terminal, including this item only in the
CLOSURE (X) Item.
2. If A->X.BY where B is a Non-Terminal including this item along
with B's CLOSURE (B).
Chapter – 3 : Syntax Analysis 79 Bahir Dar Institute of Technology
Algorithm for construction of SLR parsing Table
Input: An augmented grammar G’
Output: The SLR parsing table functions action and goto for G’
Method:
1. Construct C = {I0, I1, …. In}, collection of sets of LR(0) items for G’.
2. State i is constructed from Ii.. The parsing functions for state i are determined
as follows:
If [A→α∙aβ] is in Ii and goto(Ii,a) = Ij, then set action[i,a] to
“shift j”. Here a must be terminal.
If [A→α∙] is in Ii , then set action[i,a] to “reduce A→α” for
all a
in
FOLLOW(A).
If [S’→S.] is in Ii, then set action[i,$] to “accept”.
If any conflicting actions are generated by the above rules, we say grammar is
not SLR(1).
3. The goto transitions for state i are constructed for all non -terminals A using the
rule : If goto(Ii,A) = Ij, then goto[i,A] = j.
4. All entries not defined by rules (2) and (3) are made “error”
5. The initial
Chapter state Analysis
– 3 : Syntax of the parser is the one
80 constructed
Bahir Darfrom theofsetTechnology
Institute of items
Example for LR(0) parser
Example of LR(0): Let the grammar G1:
Chapter – 3 : Syntax Analysis 81 Bahir Dar Institute of Technology
Example of LR(0) parsing Table
Step 4: Construct a DFA
NB: I1, I4, I5, I6 are called final items. They lead to fill the
‘reduce’/ri action in specific row of action part in a
Chapter – 3 : Syntax Analysis 82 Bahir Dar Institute of Technology
Example of LR(0) parsing Table
Step 5: construct LR(0) parsing table
First level the grammar with integer value i.e
S→AA ---1
A→aA ---2
A→ b ---3
LR(0) parsing table
NB: In the LR(0) construction table whenever any state having final item in
that particular row of action part put Ri completely.
egg. in row 4, put R3 , 3 is a leveled number for production in G
Chapter – 3 : Syntax Analysis 83 Bahir Dar Institute of Technology
Example of LR(0) parsing Table
Step 6: check the parser by implementing using stack for string abb$
Chapter – 3 : Syntax Analysis 84 Bahir Dar Institute of Technology
Example of SLR(1) parsing Table
Example of SLR(1) parsing table: use example the above grammar previously
used in LR(0) example
To construct SLR(1) parser it is the same as to LR(0) parser but the
difference is only construction of parsing table (that is to fill reduce part
(Ri), we must use FOLLOW set.
If the input terminal belongs to FOLLOW set, fill the corresponding
cell else not fill the corresponding cell. Hence apply the difference only.
Find the FOLLOW set of each non terminal.
SAA ---1
AaA ---2
Ab -----3
FOLLOW(S) ={$}, FOLLOW(A) ={$, a, b}
Ab. is final item. Then FOLLOW(A) ={$, a, b}, fill row 4 of action part
with
R3.
SAA. is final item. FOLLOW(S) ={$}, fill row 5.$ cell with R1.
AaA. is final item. FOLLOW(A) ={$, a, b}, fill row 6 of action part with
R2. – 3 : Syntax Analysis
Chapter 85 Bahir Dar Institute of Technology
Example of SLR(1) parsing Table
SLR(1) parsing table
Stack implementation is the same as to LR(0)
Chapter – 3 : Syntax Analysis 86 Bahir Dar Institute of Technology
Example of SLR(1) parsing Table
SLR(1) parsing table
Stack implementation is the same as to LR(0)
Chapter – 3 : Syntax Analysis 87 Bahir Dar Institute of Technology
Constructing SLR parsing tables…
Exercise: Construct the SLR parsing table
for the following grammar and show the
input string
id * id + id is parsed:
EE+T
ET
TT*F
TF
F (E)
F id
Answer:
I = {[E’ .E]}
Closure (I) = {[E’ .E], [E .E + T], [E .T], [T
.T * F], [T .F], [F .(E)], [F .id]}
88
Constructing SLR parsing tables…
I0 = {[E’ .E], [E .E + T], [E .T], [T .T * F],
[T .F], [F .(E)], [F .id]}
I1 = Goto (I0, E) = {[E’ E.], [E E. + T]}
I2 = Goto (I0, T) = {[E T.], [T T. * F]}
I3 = Goto (I0, F) = {[T F.]}
I4 = Goto (I0, () = {[F (.E)], [E .E + T], [E .T],
[T .T * F], [T .F], [F . (E)], [F .id]}
I5 = Goto (I0, id) = {[F id.]}
I6 = Goto (I1, +) = {[E E + . T], [T .T * F], [T .F],
[F .(E)], [F .id]}
89
I7 = Goto (I2, *) = {[T T * . F], [F .(E)],
[F .id]}
I8 = Goto (I4, E) = {[F (E.)], [E E . + T]}
Goto(I4,T)={[ET.], [TT.*F]}=I2;
Goto(I4,F)={[TF.]}=I3;
Goto (I4, () = I4;
Goto (I4, id) = I5;
I9 = Goto (I6, T) = {[E E + T.], [T T . * F]}
Goto (I6, F) = I3;
Goto (I6, () = I4;
Goto (I6, id) = I5;
I10 = Goto (I7, F) = {[T T * F.]}
Goto (I7, () = I4;
Goto (I7, id) = I5;
I11= Goto (I8, )) = {[F (E).]}
Goto (I8, +) = I6;
Goto (I9, *) = I7;
90
LR(0) automation
91
SLR table construction method…
Construct the SLR parsing table for the grammar
G1’
Follow (E) = {+, ), $} Follow (T) = {+, ), $,
*}
Follow (F) = {+, ), $,*}
E’ E
1 EE+T
2 ET
3 TT*F
4 TF
5 F (E)
6 F id
92
Stat action goto
e
id + * ( ) $ E T F
0 S5 S4 1 2 3
1 S6 accep
t
2 R2 S7 R2 R2
3 R4 R4 R4 R4
4 S5 S4 8 2 3
5 R6 R6 R6 R6
6 S5 S4 9 3
7 S5 S4 10
8 S6 S1
1
9 R1 S7 R1 R1
10 R3 R3 R3 R3
11 R5 R5 R5 R5 93
SLR parser…
How a shift/reduce parser parses an input string w = id * id +
id using the parsing table shown above.
3-94
LR parsing: Exercise
Given the following Grammar:
(1) S A
(2) S B
(3) A a A b
(4) A 0
(5) B a B b b
(6) B 1
Construct the SLR parsing table.
Write the action of an LR parse for the following
string
aa1bbbb
95
LALR and CLR parser
NB: LR(0) and SLR(1) used LR(0) items to create a parsing table
but LALR and CLR parsers used LR(1) items in order to construct a
parsing table.
Reading assignment
• LALR parser and
• CLR parser
Chapter – 3 : Syntax Analysis 96 Bahir Dar Institute of Technology