Compiler Design
(Unit-II)
 Basic Parsing
    Vibhor Kr. Vishnoi
    Assistant Professor
2
Phases of Compiler
                     3
                            Parser
• Parser is a compiler that is used to break the data into smaller
  elements coming from lexical analysis phase.
• A parser takes input in the form of sequence of tokens and
  produces output in the form of parse tree.
• Parsing is of two types: top down parsing and bottom up
  parsing.
Parser
       Parser
       Parsing
Top Down         Bottom
 Parsing        Up Parsing
                             Parser
• Top down paring
• The top down parsing is known as recursive parsing or
  predictive parsing.
• This parsing is used to construct a parse tree for an input string.
• In the top down parsing, the parsing starts from the start
  symbol and transform it into the input symbol.
                         Parser
• Top down paring
• Parse Tree representation of
  input string "acdb" is as
  follows:
                            Parser
• Bottom up parsing
• Bottom up parsing is also known as shift-reduce parsing.
• Bottom up parsing is used to construct a parse tree for an input
  string.
• In the bottom up parsing, the parsing starts with the input
  symbol and construct the parse tree up to the start symbol by
  tracing out the rightmost derivations of string in reverse.
                      Parser
• Bottom up parsing
• Example
      • E→T
      • T→T*F
      • T → id
      • T →F
      • F → id
Type of Parsing
                           Bottom-Up Parsing
o Bottom-Up Parser : Constructs a parse tree for an input string beginning at the
   leaves(the bottom) and working up towards the root(the top)
o We can think of this process as one of “reducing” a string w to the start symbol
   of a grammar
o Bottom-up parsing is also known as shift-reduce parsing because its two main actions
   are shift and reduce.
          At each shift action, the current symbol in the input string is pushed to a
             stack.
          At each reduction step, the symbols at the top of the stack (this symbol
             sequence is the right side of a production) will replaced by the non-terminal
             at the left side of that production.
                            Handle
• A handle of a string is a substring that matches the right hand side
  of a production, and whose reduction represents one (reverse)
  step in a rightmost derivation from the start symbol.
• A handle of a right-sentential form γ is a pair h = (A → β, p)
  comprising a production A → β and a position p indicating where
  the substring β may be found and replaced by A to produce the
  previous right-sentential form in a rightmost derivation of γ
                         Handle Pruning
o A rightmost derivation in reverse can be obtained by “handle pruning”.
  That is, we start with a string of terminals w that we wish to parse. If ω
  is a sentence of grammar at hand, then ω = γ,where γn is the nth right-
  sentential form of some as yet unknown rightmost derivation
            S = 0  1  2  ...  n-1  n= 
                                                             Input string
                         A Shift-Reduce Parser
E  E+T | T       Right-Most Derivation of id+id*id
T  T*F | F       E  E+T  E+T*F  E+T*id  E+F*id
F  (E) | id          E+id*id  T+id*id  F+id*id  id+id*id
 Right-Most Sentential form      HANDLE             Reducing Production
id+id*id                             id         Fid
F+id*id                              F          TF
T+id*id                              T          ET
E+id*id                              id         Fid
E+F*id                               F          TF
E+T*id                               Id         Fid
E+T*F                               T*F         TT*F
E+T                                 E+T         EE+T
E
          A Stack Implementation of a Shift-Reduce Parser
o There are four possible actions of a shift-parser action:
       1.Shift : The next input symbol is shifted onto the top of the stack.
       2.Reduce: Replace the handle on the top of the stack by the non-terminal.
       3.Accept: Successful completion of parsing.
       4.Error: Parser discovers a syntax error, and calls an error recovery routine.
     Initial stack just contains only the end-marker $.
o   The end of the input string is marked by the end-marker $.
    A Stack Implementation of A Shift-Reduce Parser
  Stack        Input         Action
$         id+id*id$    Shift                   Parse Tree
$id       +id*id$      Reduce by Fid
$F        +id*id$      Reduce by TF               E8
$T        +id*id$      Reduce by ET
$E        +id*id$      Shift                        +
$E+       Id*id$       Shift              E3                T7
$E+id     *id$         Reduce by Fid
$E+F      *id$         Reduce by TF
$E+T      *id$         Shift                                *
                                          T2        T5           F6
$E+T*     id$          Shift
$E+T*id   $            Reduce by Fid
$E+T*F    $            Reduce by TT*F
$E+T      $            Reduce by E E+T   F1        F4           id
$E        $            Accept
                                          id         id
                  Shift reduce parsing
• Example:
• Grammar:
   • S → S+S
   • S → S-S
   • S → (S)
   • S→a
• Input string:
   • a1-(a2+a3)
   •
                  Shift reduce parsing
                        Parsing table:
• Example:
• Grammar:
   • S → S+S
   • S → S-S
   • S → (S)
   • S→a
• Input string:
   • a1-(a2+a3)
   •
           Operator precedence parsing
• Operator precedence grammar is kinds of shift reduce parsing method.
  It is applied to a small class of operator grammars.
• A grammar is said to be operator precedence grammar if it has two
  properties:
    • No R.H.S. of any production has a ∈.
    • No two non-terminals are adjacent.
• Operator precedence can only established between the terminals of the
  grammar. It ignores the non-terminal
          Operator precedence parsing
• There are the three operator precedence relations:
   • a ⋗ b means that terminal "a" has the higher precedence than terminal “b".
   • a ⋖ b means that terminal "a" has the lower precedence than terminal "b".
   • a ≐ b means that the terminal "a" and "b" both have same precedence.
  Operator precedence parsing
E  E+T | E*T|id
          id       +   *    $
 id       >        >   >    >
  +       <        >   <    >
  *       <        >   >    >
  $       <        <   <    <
   Operator precedence parsing
W=id+id*id
                      E
             E    +       E   *       E
             id       id              id
                        LR Parser
• LR parsing is one type of bottom up parsing. It is used to parse
  the large class of grammars.
• In the LR parsing, "L" stands for left-to-right scanning of the
  input and "R" stands for constructing a right most derivation in
  reverse.
• "K" is the number of input symbols of the look ahead used to
  make number of parsing decision.
                        LR Parser
• LR parsing is divided into four parts: LR (0) parsing, SLR
  parsing, CLR parsing and LALR parsing.
                        LR Parser
• LR parsing is divided into four parts: LR (0) parsing, SLR
  parsing, CLR parsing and LALR parsing.
                         LR Parser
• Parsing table is a two dimensional array. It contains two parts:
  Action part and Go To part.
                      LR (0) Parsing
• Various steps involved in the LR (0) Parsing:
   1. For the given input string write a context free grammar.
   2. Check the ambiguity of the grammar.
   3. Add Augment production in the given grammar.
   4. Create Canonical collection of LR (0) items.
   5. Draw a data flow diagram (DFA).
   6. Construct a LR (0) parsing table
                  LR (0) Parsing
• Augment Grammar
• Augmented grammar G` will be generated if we add one more
  production in the given grammar G. It helps the parser to
  identify when to stop the parsing and announce the acceptance
  of the input.
                        LR (0) Parsing
• Augment Grammar
• Example
   • Given grammar
           S → AA
           A → aA | b
   • The Augment grammar G` is represented by
           S`→ S
           S → AA
           A → aA | b
                      LR (0) Parsing
• Canonical Collection of LR(0) items
   • An LR (0) item is a production G with dot at some position on the right
     side of the production.
   • LR(0) items is useful to indicate that how much of the input has been
     scanned up to a given point in the process of parsing.
   • In the LR (0), we place the reduce node in the entire row.
                     LR (0) Parsing
• Canonical Collection of LR(0) items
• Example
    S → AA
    A → aA | b
• Add Augment Production and insert '•' symbol at the first position for
  every production in G
    S` → •S
    S → •AA
    A → •aA
    A → •b
                  LR (0) Parsing
• Canonical Collection of LR(0) items
   S` → •S
   S → •AA
   A → •aA
   A → •b
• I0 State:
• Add Augment production to the I0 State and Compute the
  Closure
  I0 = Closure (S` → •S)
                    LR (0) Parsing
• Canonical Collection of LR(0) items
• I0 State:
• Add Augment production to the I0 State and Compute the
  Closure
  I0 = Closure (S` → •S)
• Add all productions starting with S in to I0 State because "•" is
  followed by the non-terminal. So, the I0 State becomes
• I0 = S` → •S
       S → •AA
                      LR (0) Parsing
• Canonical Collection of LR(0) items
• I0 State:
• Add all productions starting with S in to I0 State because "•" is
  followed by the non-terminal. So, the I0 State becomes
• I0 = S` → •S
       S → •AA
• Add all productions starting with "A" in modified I0 State because "•"
  is followed by the non-terminal. So, the I0 State becomes.
• I0= S` → •S
       S → •AA
       A → •aA
       A → •b
                    LR (0) Parsing
• Canonical Collection of LR(0) items
• I0= S` → •S
      S → •AA
      A → •aA
      A → •b
• I1= Go to (I0, S) = closure (S` → S•) = S` → S•
• Here, the Production is reduced so close the State.
• I1= S` → S•
                      LR (0) Parsing
•   Canonical Collection of LR(0) items
•   I1= S` → S•
•   I2= Go to (I0, A) = closure (S → A•A)
•   Add all productions starting with A in to I2 State because "•" is
    followed by the non-terminal. So, the I2 State becomes
• I2 =S→A•A
        A → •aA
        A → •b
                     LR (0) Parsing
•   Canonical Collection of LR(0) items
•   I3= Go to (I0,a) = Closure (A → a•A)
•   Add productions starting with A in I3.
•   A → a•A
    A → •aA
    A → •b
• I4= Go to (I0, b) = closure (A → b•) = A → b•
• I5= Go to (I2, A) = Closure (S → AA•) = SA → A•
• I6= Go to (I3, A) = Closure (A → aA•) = A → aA•
                     LR (0) Parsing
• Canonical Collection of LR(0) items
• I0= S` → •S                         I1= S` → S•
      S → •AA
      A → •aA
      A → •b
• I2 =S→A•A
      A → •aA
      A → •b
• I3= A → a•A
       A → •aA
       A → •b
• I4= A → b•              I5=S→ AA•           I6= A → aA•
                       LR (0) Parsing
• Drawing DFA:
• The DFA contains the 7 states I0 to I6.
•
                        LR (0) Parsing
• LR(0) Table
• If a state is going to some other state on a terminal then it
  correspond to a shift move.
• If a state is going to some other state on a variable then it
  correspond to go to move.
• If a state contain the final item in the particular row then write the
  reduce node completely.
                LR (0) Parsing
• LR(0) Table
                            LR (0) Parsing
• LR(0) Table
      Explanation:
      I0 on S is going to I1 so write it as 1.
      I0 on A is going to I2 so write it as 2.
      I2 on A is going to I5 so write it as 5.
      I3 on A is going to I6 so write it as 6.
      I0, I2and I3on a are going to I3 so write it as S3 which means that shift 3.
      I0, I2 and I3 on b are going to I4 so write it as S4 which means that shift 4.
      I4, I5 and I6 all states contains the final item because they contain • in the
      right most end. So rate the production as production number.
                         SLR (1) Parsing
• SLR (1) refers to simple LR Parsing. It is same as LR(0) parsing. The
  only difference is in the parsing table. To construct SLR (1) parsing
  table, we use canonical collection of LR (0) item.
• In the SLR (1) parsing, we place the reduce move only in the follow
  of left hand side. Various steps involved in the SLR (1) Parsing:
        1.   For the given input string write a context free grammar
        2.   Check the ambiguity of the grammar
        3.   Add Augment production in the given grammar
        4.   Create Canonical collection of LR (0) items
        5.   Draw a data flow diagram (DFA)
        6.   Construct a SLR (1) parsing table
                      SLR (1) Parsing
• SLR (1) Table Construction
• The steps which use to construct SLR (1) Table is given below:
   • If a state (Ii) is going to some other state (Ij) on a terminal, then it
      corresponds to a shift move in the action part.
   • If a state (Ii) is going to some other state (Ij) on a variable then it
      correspond to go to move in the Go to part.
   • If a state (Ii) contains the final item like A → ab• which has no
      transitions to the next state then the production is known as reduce
      production. For all terminals X in FOLLOW (A), write the reduce entry
      along with their production numbers.
                     SLR (1) Parsing
• Example
• S→E
  E→E+T|T
  T→T*F|F
  F → id
• Add Augment Production and insert '•' symbol at the first position
  for every production in G
• S` → •E
  E → •E + T
  E → •T
  T → •T * F
  T → •F
  F → •id
                  SLR (1) Parsing
• I0= S` → •E
     E → •E + T
     E → •T
     T → •T * F
     T → •F
     F → •id
                      SLR (1) Parsing
• I1= Go to (I0, E) = closure (S` → E•, E → E• + T)
  I2= Go to (I0, T) = closure (E → T•T, T• → * F)
  I3= Go to (I0, F) = Closure ( T → F• ) = T → F•
  I4= Go to (I0, id) = closure ( F → id•) = F → id•
• I5 = E → E +•T
      T → •T * F
      T → •F
      F → •id
• I6 = T → T * •F
        F → •id
• I7= Go to (I5, T) = Closure (E → E + T•) = E → E + T•
  I8= Go to (I6, F) = Closure (T → T * F•) = T → T * F•
SLR (1) Parsing
SLR (1) Parsing
                     CLR (1) Parsing
• CLR refers to canonical LR.
• CLR parsing use the canonical collection of LR (1) items to build
  the CLR (1) parsing table.
• CLR (1) parsing table produces more number of states as
  compare to the SLR (1) parsing.
• In the CLR (1), we place the reduce node only in the lookahead
  symbols.
                       CLR (1) Parsing
• Various steps involved in the CLR (1) Parsing:
   1. For the given input string write a context free grammar
   2. Check the ambiguity of the grammar
   3. Add Augment production in the given grammar
   4. Create Canonical collection of LR (1) items
   5. Draw a data flow diagram (DFA)
   6. Construct a CLR (1) parsing table
                      CLR (1) Parsing
• LR (1) item
• LR (1) item is a collection of LR (0) items and a look ahead
  symbol.
          LR (1) item = LR (0) item + look ahead
• The look ahead is used to determine that where we place the final
  item.
• The look ahead always add $ symbol for the argument production.
                 CLR (1) Parsing
• Example
• CLR ( 1 ) Grammar
   • S → AA
   • A → aA
   • A→b
                      CLR (1) Parsing
• Add Augment Production, insert '•' symbol at the first position for
    every production in G and also add the lookahead.
•   S` → •S, $         1. Augmented production must have $ as a lookahead
                       2. If A α.Bβ, a/b
•   S → •AA, $
                               Then B.γ, First(β) or a/b (if β contains null)
•   A → •aA, a/b
•   A → •b, a/b
                    CLR (1) Parsing
• I0
   • S` → •S, $
   • S → •AA, $
   • A → •aA, a/b
   • A → •b, a/b
• I1= S` → S•, $
• I2= S → A•A, $
       A → •aA, $
       A → •b, $
                    CLR (1) Parsing
I3= A → a•A, a/b
    A → •aA, a/b
    A → •b, a/b
I4= Go to (I0, b) = closure ( A → b•, a/b) = A → b•, a/b
I5= Go to (I2, A) = Closure (S → AA•, $) =S → AA•, $
I6= Go to (I2, a) = Closure (A → a•A, $)
            A → a•A, $
            A → •aA, $
            A → •b, $
                   CLR (1) Parsing
I7= Go to (I2, b) = Closure (A → b•, $) = A → b•, $
I8= Go to (I3, A) = Closure (A → aA•, a/b) = A → aA•, a/b
I9= Go to (I6, A) = Closure (A → aA•, $) = A → aA•, $
CLR (1) Parsing
CLR (1) Parsing
                   LALR (1) Parsing:
• LALR refers to the lookahead LR.
• To construct the LALR (1) parsing table, we use the canonical
  collection of LR (1) items.
• In the LALR (1) parsing, the LR (1) items which have same
  productions but different look ahead are combined to form a
  single set of items
• LALR (1) parsing is same as the CLR (1) parsing, only difference
  in the parsing table.
                LALR (1) Parsing
• Example
• CLR ( 1 ) Grammar
   • S → AA
   • A → aA
   • A→b
                     LALR (1) Parsing
• Add Augment Production, insert '•' symbol at the first position for
    every production in G and also add the lookahead.
•   S` → •S, $
•   S → •AA, $
•   A → •aA, a/b
•   A → •b, a/b
LALR (1) Parsing
LALR (1) Parsing:
LR Parsing:
Type of Parsing
Type of Parsing
                    Non-Recursive Descent
 LL(1) Parsing:
 1st L represents that the scanning of the Input will be done from Left to Right
   manner and
 Second L shows that in this Parsing technique we are going to use Left most
   Derivation Tree. and finally the 1 represents the number of look ahead, means
   how many symbols are you going to see when you want to make a decision.
                  Non-Recursive Descent
 Left Factoring: Left
                   factoring is a grammar transformation that is
  useful for producing a grammar suitable for predictive parsing.
                  Non-Recursive Descent
Left Factoring:
 If a grammar contains two productions of form
        S→ aα and S → aβ
 It is not suitable for top down parsing without backtracking.
  Troubles of this form can sometimes be removed from the
  grammar by a technique called the left factoring.
 In the left factoring, we replace { S→ aα, S→ aβ } by
        { S → aS', S'→ α, S'→ β } cf. S→ a(α|β)
        (Hopefully α and β start with different symbols)
                   Non-Recursive Descent
 Left Factoring
 In the left factoring, we replace { S→ aα, S→ aβ } by
{ S → aS', S'→ α, S'→ β } cf. S→ a(α|β)
        (Hopefully α and β start with different symbols)
 Left factoring for G { S→aSb | c | ab }
S→aS' | c cf. S(=aSb | ab | c = a ( Sb | b) | c ) → a S' | c
S'→Sb | b
                      Non-Recursive Descent
 LL(1) Parsing:
Construction of LL(1) Parsing Table:
To construct the Parsing table, we have two functions:
1: First(): If there is a variable, and from that variable if we try to drive all the strings
then the beginning Terminal Symbol is called the first.
2: Follow(): What is the Terminal Symbol which follow a variable in the process of
derivation.
               Non-Recursive Descent
 Rules for first( ):
   1. If X is terminal, then FIRST(X) is {X}.
   2. If X → ε is a production, then add ε to FIRST(X).
   3. If X is non-terminal and X → aα is a production then add
   a to FIRST(X).
   4. If X is non-terminal and X → Y1 Y2…Yk is a production,
   then place a in FIRST(X) if for some i, a is in FIRST(Yi), and
   ε is in all of FIRST(Y1),…,FIRST(Yi-1); that is, Y1,….Yi-1
   => ε. If ε is in FIRST(Yj) for all j=1,2,..,k, then add ε to
   FIRST(X).
              Non-Recursive Descent
 Rules for follow( ):
   1. If S is a start symbol, then FOLLOW(S) contains $.
   2. If there is a production A → αBβ, then everything in
   FIRST(β) except ε is placed in follow(B).
   3. If there is a production A → αB, or a production A → αBβ
   where FIRST(β) contains ε, then everything in FOLLOW(A)
   is in FOLLOW(B).
               Non-Recursive Descent
Algorithm for construction of predictive parsing table:
Input : Grammar G
Output : Parsing table M
Method :
   1. For each production A → α of the grammar, do steps 2 and 3.
   2. For each terminal a in FIRST(α), add A → α to M[A, a].
   3. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in
   FOLLOW(A). If ε is in FIRST(α) and $ is in FOLLOW(A) , add A → α to
   M[A, $].
   4. Make each undefined entry of M be error.
              Non-Recursive Descent
Example:                 After eliminating left-
Consider the following   recursion the grammar is:
grammar :                E → TE’
E → E+T | T              E’ → +TE’ | ε
T → T*F | F              T → FT’
F → (E) | id             T’ → *FT’ | ε
                         F → (E) | id
            Non-Recursive Descent
Grammar:                First( ) :
E → TE’                 FIRST(E) = { ( , id}
E’ → +TE’ | ε           FIRST(E’) ={+ , ε }
T → FT’                 FIRST(T) = { ( , id}
T’ → *FT’ | ε           FIRST(T’) = {*, ε }
F → (E) | id            FIRST(F) = { ( , id }
            Non-Recursive Descent
Grammar            Follow( ):
E → TE’            FOLLOW(E) = { $, ) }
E’ → +TE’ | ε      FOLLOW(E’) = { $, ) }
T → FT’            FOLLOW(T) = { +, $, ) }
T’ → *FT’ | ε      FOLLOW(T’) = { +, $, ) }
F → (E) | id       FOLLOW(F) = {+, * , $ , ) }
            Non-Recursive Descent
First( ) :              Follow( ):
FIRST(E) = { ( , id}    FOLLOW(E) = { $, ) }
FIRST(E’) ={+ , ε }     FOLLOW(E’) = { $, ) }
FIRST(T) = { ( , id}    FOLLOW(T) = { +, $, ) }
FIRST(T’) = {*, ε }     FOLLOW(T’) = { +, $, ) }
FIRST(F) = { ( , id }   FOLLOW(F) = {+, * , $ , ) }
              Non-Recursive Descent
First( ) :              Follow( ):
FIRST(E) = { ( , id}    FOLLOW(E) = { $, ) }
FIRST(E’) ={+ , ε }     FOLLOW(E’) = { $, ) }
FIRST(T) = { ( , id}    FOLLOW(T) = { +, $, ) }
FIRST(T’) = {*, ε }     FOLLOW(T’) = { +, $, ) }
FIRST(F) = { ( , id }   FOLLOW(F) = {+, * , $ , ) }
Non-Recursive Descent
Non-Recursive Descent
Non-Recursive Descent
Non-Recursive Descent
Non-Recursive Descent
Non-Recursive Descent
                Non-Recursive Descent
LL(1) grammar:
The parsing table entries are single entries. So each location has not more than
one entry. This type of grammar is called LL(1) grammar.