CSN-352
Syntax Analysis
Grammar and Parsing
Bottom-up Parsing
- A bottom-up parse corresponds to the construction of a parse tree for an input
string beginning at the leaves (the bottom) and working up towards the root (the
top).
Bottom-up Parse for id*id using the expression grammar
- Shift-reduce parser is a type of bottom-up parser, which is built on LR(K)
grammar.
Reducing
- We can think of bottom-up parsing as the process of “reducing" a string w to the start
symbol of the grammar.
- At each reduction step, a specific substring matching the body of a production is replaced
by the nonterminal at the head of that production.
Reduction process -> id*id, F*id, T*id, T*F, T, E
- The goal of bottom-up parsing is therefore to construct
a rightmost derivation in reverse with left to right scan of the input.
- When to reduce?
- Which production rule to reduce?
Handle
- A “handle" is a substring that matches the body of a production, and whose reduction represents one
step along the reverse of a rightmost derivation.
Handle during a parse of id1*id2
- T is not a handle in step-3. Though T matches the
production rule E-> T, but this replacement
(E*id2) doesn’t give righmost derivation from E
(start symbol) for the given string.
Handle Pruning
- A rightmost derivation in reverse can be obtained by “handle pruning."
- We start with a string of terminals w to be parsed. If w belongs to Grammar G, it is the n th
sentential form yn of some unknown right most derivation.
- To reconstruct this derivation in reverse order, we locate the handle ßn in
yn and replace ßn by the head of the relevant production An->ßn to obtain
the previous right-sentential form yn-1.
- Repeat the process for yn-1 until we produce a right-sentential form consisting only S .
- However, how to find handle is not clear?
Shift-Reduce Parsing
- Applicabel for LR class of Grammar.
- No ambiguoous grammar can be LR.
- A type of bottom-up parsing.
- Stack holds the grammar symbols and buffer holds the input.
- $ indicated end of buffer and stack.
- The top of the stack is on the right to show bottom-up parsing unlike top-down parsing.
- Shift: During a left-to-right scan of the input string, the parser shifts zero or more input symbols
onto the stack, until it is ready to reduce a string of grammar symbols on top of the stack.
- Reduce: It then reduces to the head of the appropriate production. The parser repeats this cycle until
it has detected an error or until the stack contains the start symbol and the input is empty
Shift-Reduce Parser: 4 Operations
1. Shift. Shift the next input symbol onto the
top of the stack.
2. Reduce. The right end of the string to be
reduced must be at the top of
the stack. Locate the left end of the string
within the stack and decide
with what nonterminal to replace the string.
3. Accept. Announce successful completion of
parsing.
4. Error. Discover a syntax error and call an
error recovery routine.
Shift-Reduce Conflict
- Knowing the entire stack and also the next k input symbols, parser cannot decide
whether to shift or to reduce (a shift/reduce conflict).
- These grammars are not in LR(K).
- Stack: ....if expr then stmt
- input: else....
Whether to reduce or shift?
- Some adaptation might help to resolve conflict for ambiguous grammar.
LR Vs LL Parser
- LL Parser requires to recognize the use of the production rule by observing the first
k symbols of what its right side derives.
- LR parser requires to recognize the occurrence of the right side of a production in a
right-sentential form, with k input symbols of lookahead.
- LR grammars can describe more languages than LL grammars. Though hard to
derive manually.
- The class of grammars that can be parsed using LR methods is a proper superset of
the class of grammars that can be parsed with predictive or LL methods.
When to Shift or Reduce?
- We keep track of parser states by forming automata of states (sets of items).
Components of automata:
- Item
- LR(0) Item
- Cannonical Collection
- Cannonical Clouser
- GoTo Funciton
Items for LR(0)
- item of a grammar G is a production of G with a dot at some position of the body.
A -> X.Y Z indicates that we have just seen on the input a string derivable from
-X and that we hope next to see a string derivable from Y Z.
4 items for A-> XYZ
A-> ^ has one item A -> .
Cannonical Collection
- Each set of items of interest is formed by taking the closure of a set of kernel items.
- One collection of sets of LR(0) items, called the canonical LR(0) collection.
-
- Canonical LR(0) provides a base to make DFA (LR(0) automaton) to take parsing
decision.
Augmented Grammar: Add production S’-> S, acceptance occurs when parser has
to reduce S’->S.
Closure Operation on a set of Items
GoTo Operation on a State (Closure of set of Items)
GoTo defines a transition in the LR(0) automata for a grammar.
GoTo(I,X) – Gives transition from the state for I under input X.
- If X is immediately to the right of the dot in an item, take closure of the
item after shifting dot to the right of the X.
- Closure is a new state which can be reached from I with the GoTo
operation.
Algorithm to construct LR(0) automaton with C (the cannonical collection of sets of LR(0)
items) from augmented grammar G’
-
Using LR(0) Automata
- SLR(1) parser gave LR(0) automata from the Grammar. It helps in shift-reduce decision.
- The start state of the LR(0) automaton is CLOSURE({[S’-> S]}).
-
- All states are accepting states.
Shift Decision: Shift on next input symbol a if state j has a transition on a.
Reduce Decision: If no transition on a, reduce; the items in state j will tell us which production to
use.
The LR parsing Algorithm Model
Action-GoTo
represents LR
The model remains the same across all LR parsing methods, parsing table
except the parsing table.
Structure of the LR parsing Table
- The parsing table consists of two parts: a parsing-action function ACTION and a
goto function GOTO.
-
- The ACTION function takes as arguments a state i and a terminal a (or $, the input
endmarker). The value of ACTION[i; a] can have one of four forms: shift a to the
stack, reduce B on top the stack to head A, accept the input and finishes parsing,
error in the input and take correct action.
- GOTO[Ii ; A] = Ij , then GOTO also maps a state i and a nonterminal A to state j.
Structure of the LR parsing Table
- The information contained in the finite-state machine can be stored in a table.
-
- States in rows, Action and GoTo in columns.
- The columns labeled by tokens (terminals and $) are called the action columns. The
columns labeled by nonterminals are called goto columns.
Structure of the LR parsing Table: How to populate
- Consider a row labeled by state q.
1. There is an accept action in column $ provided state q contains LR(0) item S′ → S.
-
where S′ is the new start state.
2. For each column labeled by a terminal t, the table contains shift n if there is a
transition from state q to state n that is labeled by terminal t.
3. For each column labeled by a terminal t, the table contains action reduce n if state
q contains LR(0) item N → α ⋅, production N → α is the n, production N → α is the nth production, and t is in
FOLLOW(N).
4. For each column labeled by a nonterminal N, the table contains the state q' so that
there is a transition from state q to state q' labeled by nonterminal N.
LR(0) Automata for
Expression Grammar
n = id