CSE-361:Compiler Design
Parsing : Part-I
Parsing
Parsing During Compilation
               regular
             expressions               errors
               lexical     token                            rest of     intermediate
  source                               parser     parse
                                                                       representation
 program      analyzer     get next                tree   front end
                            token
                                      symbol
                                                      •   Collecting token
                                       table              information
                                                      •   Perform type checking
  • uses a grammar to check structure of tokens       •   Intermediate code
                                                          generation
  • produces a parse tree
  • syntactic errors and recovery
  • recognize correct syntax
  • report errors
Parsers
We categorize the parsers into two groups:
1. Top- Down Parser: The parse tree is created top to bottom, starting from the root
2. Bottom-Up Parser: The parse tree is created bottom to top, starting from the leaves
Both Top-Down and Bottom-Up parsers scan the input from left to right ( One symbol at a time)
Efficient Top-Down and Bottom-Up parsers can be implement only for the sub-classes of context
free grammars
 ▪ LL for top-down parsing
 ▪ LR for bottom-up parsing
Errors in Programs
Error Detection
Adequate Error Reporting is Not a Trivial Task
ERROR RECOVERY
ERROR RECOVERY MAY TRIGGER MORE ERRORS!
ERROR RECOVERY APPROACHES: PANIC MODE
ERROR RECOVERY APPROACHES:
PHRASE-LEVEL RECOVERY
ERROR RECOVERY APPROACHES:
ERROR PRODUCTIONS
ERROR RECOVERY APPROACHES:
GLOBAL CORRECTION
Syntactical Analysis
Each language definition has rules that describe the syntax of well formed programs.
• Format of the rules: context-free grammars
• Why not regular expressions/NFA’s/DFA’s?
▪ Source program constructs have recursive structure:
                                 digits = [0-9]+;
                       expr = digits | “(“ expr “+” expr “)”
 ◦ Finite automata can’t recognize recursive constructs, so cannot ensure expressions
   are well-bracketed: a machine with N states cannot remember parenthesis—nesting
   depth greater than N
 ◦ CFG’s are more powerful, but also more costly to implement
CFG versus Regular Expression
CFG versus Regular Expression
Language: set of strings
String: finite sequence of symbols taken from finite alphabet
Regular expressions and CFG’s both describe languages, but over different
alphabets
CFG versus Regular Expression
CFG’s strictly more expressive than RE’s:
Any language recognizable/generated by a RE can also be recognized/generated
by a CFG, but not vice versa.
                Also known as Backus-Naur Form (BNF, Algol 60)
CONTEXT FREE GRAMMARS (CFG)
RULE ALTERNATIVE NOTATIONS
  Notational Conventions
Terminals
 ◦   Lower-case letters early in the alphabet: a, b, c
 ◦   Operator symbols: +, -
 ◦   Punctuations symbols: parentheses, comma
 ◦   Boldface strings: id or if
Nonterminals:
 ◦ Upper-case letters early in the alphabet: A, B, C
 ◦ The letter S (start symbol)
 ◦ Lower-case italic names: expr or stmt
▪Upper-case letters late in the alphabet, such as X, Y, Z, represent either nonterminals or terminals.
▪Lower-case letters late in the alphabet, such as u, v, …, z, represent strings of terminals.
Notational Conventions
▪Lower-case Greek letters, such as , , , represent strings of grammar symbols.
▪Thus A→  indicates that there is a single nonterminal A on the left side of the production
 and a string of grammar symbols  to the right of the arrow.
▪If A→ 1, A→ 2, …., A→ k are all productions with A on the left, we may write:
▪ A→ 1 | 2 | …. | k
▪Unless otherwise started, the left side of the first production is the start symbol.
                  E → E A E | ( E ) | -E | id
                  A→+|-|*| / |
SUMMARY OF NOTATIONAL CONVENTIONS
Context Free Grammars : A First Look
    Production rules:
    1. assign_stmt → id := expr ;
    2. expr → expr operator term    Derivation: A sequence of grammar rule
                                    applications and substitutions that
    3. expr → term
                                    transform a starting non-term into a
    4. term → id                    sequence of terminals / tokens.
    5. term → real
                                    Terminals: id real integer + - := ;
    6. term → integer
                                    Nonterminals: assign_stmt, expr, operator, term
    7. operator → +                 Start symbol: assign_stmt
    8. operator → -
Example Grammar:
Simple Arithmetic Expressions
 1.   expr → expr op expr
 2.   expr → ( expr )
 3.   expr → - expr         9 Production rules
 4.   expr → id
 5.   op → +
 6.   op → -                Terminals: id + - * /  ( )
 7.   op → *                Nonterminals: expr, op
                            Start symbol: expr
 8.   op → /
 9.   op → 
DERIVATIONS
DERIVATIONS
CFG Terminology
Derivation
Let’s derive: id = id + real – integer ;
                                            production rules:
 assign_stmt
 → id = expr ;                                           =
 → id = expr operator term;
 → id = expr operator term operator term;
 → id = term operator term operator term;
 → id = id operator term operator term;
 → id = id + term operator term;
 → id = id + real operator term;
 → id = id + real - term;
 → id = id + real - integer;
LEFTMOST DERIVATION
RIGHTMOST DERIVATION
PARSE TREE
PARSE TREE
             Input: (id * id) + id
PARSE TREE
Input: (id * id) + id
PARSE TREE
PARSE TREE
AMBIGUOUS GRAMMAR
AMBIGUOUS GRAMMAR
                    Two different parse trees!!
                    Which derivation of the parse tree is correct??
AMBIGUOUS GRAMMAR
AMBIGUOUS GRAMMAR
                    YES
AMBIGUOUS GRAMMAR
PROBLEMS OF AMBIGUOUS GRAMMAR
         4/2+2         4     1
                 This parse tree gives the Right Answer.
                              4/2+2=4
PROBLEMS OF AMBIGUOUS GRAMMAR
    The Ambiguous Grammar does not consider the
            Precedence and Associativity
SOLUTION: REMOVING AMBIGUITY
 Ambiguous Grammar:
 How to Solve Associativity Problem?
 Operators with
 Input: 3 + 2 + 6          Parse Tree -1         Parse Tree -2       ▪ Two different parse trees!!
                                                                     ▪ According to the Grammar both
                                                                       are correct.
 Ambiguous Grammar
                                                                     ▪ But Parse Tree-1 gives WRONG
 1. E →E + E                                                           answer and Parse Tree-2 gives
 2. E → E * E         num(3)                                num(6)
                                                                       RIGHT answer.
 3. E → num
                           num(2)    num(6)   num(3) num(2)      Removing the associativity
                                                                 problem from the Grammar:
✓ Operators with same precedence must be                             1. E → E + num
  resolved by Associativity                                          2. E → E * num
✓ Some operators have left associativity (+, -, *, /)                3. E→ num
  and some operators have right associativity (^)                Still it is an Ambiguous Grammar!!
 Ambiguous Grammar:
 How to Solve Associativity Problem?(2)
  Input: 3 + 2 + 6                            Parse Tree
 Removing the associativity
 problem from the Grammar:
                                                       num(6)
    1. E → E + num
    2. E → E * num
    3. E→ num                                    num(2)
                                     num(3)
Still it is an Ambiguous Grammar!!
 Ambiguous Grammar:
 How to Solve Precedence Problem?
 Input: 3 + 2 * 6        Parse Tree -1        Parse Tree -2       ▪ Two different parse trees!!
                                                                  ▪ According to the Grammar both
                                                                    are correct.
 Ambiguous Grammar
                                                                  ▪ But Parse Tree-1 gives RIGHT
 1. E →E + E                                                        answer and Parse Tree-2 gives
 2. E → E * E        num(3)                              num(6)
                                                                    WRONG answer.
 3. E → num
                        num(2)     num(6)   num(3) num(2)
                                                                   After Conversion to
                                                                   Unambiguous Grammar:
✓ Lower precedence operation rules should be
                                                                     1. E → E + T | T
  declared in the upper level in the Grammar
                                                                     2. T → T * F | F
✓ Higher precedence operation rules should be
                                                                     3. F → num
  declared in the lower level in the Grammar
Ambiguous Grammar:
How to Solve Precedence Problem?(2)
 Input: 3 + 2 * 6         Parse Tree
After Conversion to
                                  T
Unambiguous Grammar:
                         T    T         F
 1. E → E + T | T         F
                              F        num(6)
 2. T → T * F | F      num(3)
 3. F → num                 num(2)
UNAMBIGUOUS GRAMMAR
Reading Materials
 Chapter   -4 of your Text book:
  Compilers:   Principles, Techniques, and Tools
THE END