Principles of Compiler Design
Chapter 3
Lecture 2
Top-Down Parsing
1
Top-Down Parsing
• The parse tree is created top to bottom.
• Top-down parser
– Recursive-Descent Parsing
• Backtracking is needed (If a choice of a production rule does not work, we backtrack to try other
alternatives.)
• It is a general parsing technique, but not widely used.
• Not efficient
– Predictive Parsing
• no backtracking
• efficient
• needs a special form of grammars (LL(1) grammars).
• Recursive Predictive Parsing is a special form of Recursive Descent parsing without backtracking.
• Non-Recursive (Table Driven) Predictive Parser is also known as LL(1) parser.
21 Q
Recursive-Descent Parsing (uses Backtracking)
• Backtracking is needed.
• It tries to find the left-most derivation.
S aBc
B bc | b
S S
input: abc
a B c a B c
fails, backtrack
b c b
3
Predictive Parser
a grammar a grammar suitable for predictive
eliminate left parsing (a LL(1) grammar)
left recursion factor no %100 guarantee.
• When re-writing a non-terminal in a derivation step, a predictive parser
can uniquely choose a production rule by just looking the current
symbol in the input string.
A 1 | ... | n input: ... a .......
current token
4
Left Factoring
A predictive parser (a top-down parser without backtracking) insists that the grammar
must be left-factored.
grammar a new equivalent grammar suitable for predictive parsing
stmt → if expr then stmt else stmt | if expr then stmt when we see if, we cannot now
which production rule to choose to re-write stmt in the derivation.
In general,
A → βα1 | βα where α is non-empty and the
first symbols of β1 and β2 (if they have one)are different.
when processing α we cannot know whether expand A to βα1 or A to βα2 But, if
we re-write the grammar as follows
A → αA’
A’ → β1 | β2 so, we can immediately expand A to αA’
5
Predictive Parser (example)
stmt if ...... |
while ...... |
begin ...... |
for .....
• When we are trying to write the non-terminal stmt, if the current token
is if we have to choose first production rule.
• When we are trying to write the non-terminal stmt, we can uniquely
choose the production rule by just looking the current token.
• We eliminate the left recursion in the grammar, and left factor it. But it
may not be suitable for predictive parsing (not LL(1) grammar).
6
Recursive Predictive Parsing
• Each non-terminal corresponds to a procedure.
Ex: A aBb (This is only the production rule for A)
proc A {
- match the current token with a, and move to the next token;
- call ‘B’;
- match the current token with b, and move to the next token;
}
7
Recursive Predictive Parsing (cont.)
A aBb | bAB
proc A {
case of the current token {
‘a’: - match the current token with a, and move to the next token;
- call ‘B’;
- match the current token with b, and move to the next token;
‘b’: - match the current token with b, and move to the next token;
- call ‘A’;
- call ‘B’;
}
}
8
Recursive Predictive Parsing (cont.)
• When to apply -productions.
A aA | bB |
• If all other productions fail, we should apply an -production. For
example, if the current token is not a or b, we may apply the
-production.
• Most correct choice: We should apply an -production for a non-
terminal A when the current token is in the follow set of A (which
terminals can follow A in the sentential forms).
9
Recursive Predictive Parsing (Example)
A aBe | cBd | C
B bB |
Cf
proc C { match the current token with f,
proc A { and move to the next token; }
case of the current token {
a: - match the current token with a,
and move to the next token; proc B {
- call B; case of the current token {
- match the current token with e, b: - match the current token with b,
and move to the next token; and move to the next token;
c: - match the current token with c, - call B
and move to the next token; e,d: do nothing
- call B; }
- match the current token with d, }
and move to the next token;
f: - call C
}
}
follow set of B
first set of C
10
Non-Recursive Predictive Parsing -- LL(1) Parser
• Non-Recursive predictive parsing is a table-driven parser.
• It is a top-down parser.
• It is also known as LL(1) Parser.
input buffer
stack Non-recursive output
Predictive Parser
Parsing Table
11
LL(1) Parser
input buffer
– our string to be parsed. We will assume that its end is marked with a special symbol $.
output
– a production rule representing a step of the derivation sequence (left-most derivation) of the string in the input
buffer.
stack
– contains the grammar symbols
– at the bottom of the stack, there is a special end marker symbol $.
– initially the stack contains only the symbol $ and the starting symbol S. $S initial stack
– when the stack is emptied (ie. only $ left in the stack), the parsing is completed.
parsing table
– a two-dimensional array M[A,a]
– each row is a non-terminal symbol
– each column is a terminal symbol or the special symbol $
– each entry holds a production rule.
12
LL(1) Parser – Parser Actions
• The symbol at the top of the stack (say X) and the current symbol in the input string
(say a) determine the parser action.
• There are four possible parser actions.
1. If X and a are $ parser halts (successful completion)
2. If X and a are the same terminal symbol (different from $)
parser pops X from the stack, and moves the next symbol in the input buffer.
3. If X is a non-terminal
parser looks at the parsing table entry M[X,a]. If M[X,a] holds a production rule
XY1Y2...Yk, it pops X from the stack and pushes Yk,Yk-1,...,Y1 into the stack. The
parser also outputs the production rule XY1Y2...Yk to represent a step of the
derivation.
4. none of the above error
– all empty entries in the parsing table are errors.
– If X is a terminal symbol different from a, this is also an error case.
13
LL(1) Parser – Example1
S aBa a b $ LL(1) Parsing
B bB | S S aBa Table
B B B bB
stack input output
$S abba$ S aBa
$aBa abba$
$aB bba$ B bB
$aBb bba$
$aB ba$ B bB
$aBb ba$
$aB a$ B
$a a$
$ $ accept, successful completion
14
LL(1) Parser – Example1 (cont.)
Outputs: S aBa B bB B bB B
Derivation(left-most): SaBaabBaabbBaabba
S
parse tree
a B a
b B
b B
15
LL(1) Parser – Example2
E TE’
E’ +TE’ |
T FT’
T’ *FT’ |
F (E) | id
id + * ( ) $
E E TE’ E TE’
E’ E’ +TE’ E’ E’
T T FT’ T FT’
T’ T’ T’ *FT’ T’ T’
F F id F (E)
16
LL(1) Parser – Example2
stack input output
$E id+id$ E TE’
$E’T id+id$ T FT’
$E’ T’F id+id$ F id
$ E’ T’id id+id$
$ E ’ T’ +id$ T’
$ E’ +id$ E’ +TE’
$ E’ T+ +id$
$ E’ T id$ T FT’
$ E ’ T’ F id$ F id
$ E’ T’id id$
$ E ’ T’ $ T’
$ E’ $ E’
$ $ accept
17
Constructing LL(1) Parsing Tables
• Two functions are used in the construction of LL(1) parsing tables:
– FIRST FOLLOW
• FIRST() is a set of the terminal symbols which occur as first symbols
in strings derived from where is any string of grammar symbols.
• if derives to , then is also in FIRST() .
• FOLLOW(A) is the set of the terminals which occur immediately after
(follow) the non-terminal A in the strings derived from the starting
symbol.
– a terminal a is in FOLLOW(A) if S * Aa
*
– $ is in FOLLOW(A) if S A
18
Compute FIRST for Any String X
• If X is a terminal symbol FIRST(X)={X}
• If X is a non-terminal symbol and X is a production rule
is in FIRST(X).
• If X is a non-terminal symbol and X Y1Y2..Yn is a production rule
if a terminal a in FIRST(Yi) and is in all FIRST(Yj) for j=1,...,i-1
then a is in FIRST(X).
if is in all FIRST(Yj) for j=1,...,n
then is in FIRST(X).
• If X is FIRST(X)={}
• If X is Y1Y2..Yn
if a terminal a in FIRST(Yi) and is in all FIRST(Yj) for j=1,...,i-1
then a is in FIRST(X).
if is in all FIRST(Yj) for j=1,...,n
then is in FIRST(X).
19
FIRST Example
E TE’
E’ +TE’ |
T FT’
T’ *FT’ |
F (E) | id
FIRST(F) = {(,id} FIRST(TE’) = {(,id}
FIRST(T’) = {*, } FIRST(+TE’ ) = {+}
FIRST(T) = {(,id} FIRST() = {}
FIRST(E’) = {+, } FIRST(FT’) = {(,id}
FIRST(E) = {(,id} FIRST(*FT’) = {*}
FIRST() = {}
FIRST((E)) = {(}
FIRST(id) = {id}
20
Compute FOLLOW (for non-terminals)
• If S is the start symbol $ is in FOLLOW(S)
• if A B is a production rule
everything in FIRST() is FOLLOW(B) except
• If ( A B is a production rule ) or
( A B is a production rule and is in FIRST() )
everything in FOLLOW(A) is in FOLLOW(B).
We apply these rules until nothing more can be added to any follow set.
21
FOLLOW Example
E TE’
E’ +TE’ |
T FT’
T’ *FT’ |
F (E) | id
FOLLOW(E) = { $, ) }
FOLLOW(E’) = { $, ) }
FOLLOW(T) = { +, ), $ }
FOLLOW(T’) = { +, ), $ }
FOLLOW(F) = {+, *, ), $ }
22
Constructing LL(1) Parsing Table -- Algorithm
• for each production rule A of a grammar G
– for each terminal a in FIRST()
add A to M[A,a]
– If in FIRST()
for each terminal a in FOLLOW(A) add A to M[A,a]
– If in FIRST() and $ in FOLLOW(A)
add A to M[A,$]
• All other undefined entries of the parsing table are error entries.
23
Constructing LL(1) Parsing Table -- Example
E TE’ FIRST(TE’)={(,id} E TE’ into M[E,(] and M[E,id]
E’ +TE’ FIRST(+TE’ )={+} E’ +TE’ into M[E’,+]
E’ FIRST()={} none
but since in FIRST()
and FOLLOW(E’)={$,)} E’ into M[E’,$] and M[E’,)]
T FT’ FIRST(FT’)={(,id} T FT’ into M[T,(] and M[T,id]
T’ *FT’ FIRST(*FT’ )={*} T’ *FT’ into M[T’,*]
T’ FIRST()={} none
but since in FIRST()
and FOLLOW(T’)={$,),+} T’ into M[T’,$], M[T’,)] and M[T’,+]
F (E) FIRST((E) )={(} F (E) into M[F,(]
F id FIRST(id)={id} F id into M[F,id]
24
LL(1) Grammars
• A grammar whose parsing table has no multiply-defined entries is said
to be LL(1) grammar.
one input symbol used as a look-head symbol do determine parser action
LL(1) left most derivation
input scanned from left to right
• The parsing table of a grammar may contain more than one production
rule. In this case, we say that it is not a LL(1) grammar.
25
A Grammar which is not LL(1)
SiCtSE | a FOLLOW(S) = { $,e }
EeS | FOLLOW(E) = { $,e }
Cb FOLLOW(C) = { t }
FIRST(iCtSE) = {i}
a b e i t $
FIRST(a) = {a}
S Sa S iCtSE
FIRST(eS) = {e}
E EeS E
FIRST() = {}
E
FIRST(b) = {b}
C Cb
two production rules for M[E,e]
Problem ambiguity
26
A Grammar which is not LL(1) (cont.)
• What do we have to do it if the resulting parsing table contains multiply
defined entries?
– If we didn’t eliminate left recursion, eliminate the left recursion in the grammar.
– If the grammar is not left factored, we have to left factor the grammar.
– If its (new grammar’s) parsing table still contains multiply defined entries, that grammar is
ambiguous or it is inherently not a LL(1) grammar.
• A left recursive grammar cannot be a LL(1) grammar.
– A A |
any terminal that appears in FIRST() also appears FIRST(A) because A .
If is , any terminal that appears in FIRST() also appears in FIRST(A) and FOLLOW(A).
• A grammar is not left factored, it cannot be a LL(1) grammar
• A 1 | 2
any terminal that appears in FIRST(1) also appears in FIRST(2).
• An ambiguous grammar cannot be a LL(1) grammar.
27
Properties of LL(1) Grammars
• A grammar G is LL(1) if and only if the following conditions hold for
two distinctive production rules A and A
1. Both and cannot derive strings starting with same terminals.
2. At most one of and can derive to .
3. If can derive to , then cannot derive to any string starting
with a terminal in FOLLOW(A).
28