Unit-4 Syntax Analysis:
What is Syntax Analysis?
Syntax analysis (parsing) is the second phase of the compilation process, following lexical
analysis. Its primary goal is to verify the syntactical correctness of the source code. It
takes the tokens generated by the lexical analyzer and attempts to build a Parse
Tree or Abstract Syntax Tree (AST), representing the program’s structure. During this
phase, the syntax analyzer checks whether the input string adheres to the grammatical
rules of the language using context-free grammar. If the syntax is correct, the analyzer
moves forward; otherwise, it reports an error.
The main goal of syntax analysis is to create a parse tree or abstract syntax tree (AST) of
the source code, which is a hierarchical representation of the source code that reflects the
grammatical structure of the program.
Parsing Algorithms Used in Syntax Analysis
 LL parsing: This is a top-down parsing algorithm that starts with the root of the
   parse tree and constructs the tree by successively expanding non-terminals. LL
   parsing is known for its simplicity and ease of implementation.
 LR parsing: This is a bottom-up parsing algorithm that starts with the leaves of the
   parse tree and constructs the tree by successively reducing the right-hand side of a
   production (a handle) to the non-terminal on its left-hand side. LR parsing is more
   powerful than LL parsing and can handle a larger class of grammars.
 LR(1) parsing: This is a variant of LR parsing that uses one symbol of lookahead to
   decide between parsing actions (whether to shift, and which production to reduce by).
 LALR parsing: This is a variant of LR(1) parsing that merges LR(1) states with identical
   cores, giving a parser with the same number of states as an LR(0) (SLR) parser.
Once the parse tree is constructed, the compiler can perform semantic analysis to check
if the source code makes sense and follows the semantics of the programming language.
The parse tree or AST can also be used in the code generation phase of the compiler to
generate intermediate code or machine code.
Features of Syntax Analysis
 Syntax Trees: Syntax analysis creates a syntax tree, which is a hierarchical
   representation of the code’s structure. The tree shows the relationship between the
   various parts of the code, including statements, expressions, and operators.
 Context-Free Grammar: Syntax analysis uses context-free grammar to define the
   syntax of the programming language. A context-free grammar is a formal notation
   used to describe the structure of programming languages.
 Top-Down and Bottom-Up Parsing: Syntax analysis can be performed using two
   main approaches: top-down parsing and bottom-up parsing. Top-down parsing starts
   from the highest level of the syntax tree and works its way down, while bottom-up
   parsing starts from the lowest level and works its way up.
 Error Detection: Syntax analysis is responsible for detecting syntax errors in the
   code. If the code does not conform to the rules of the programming language, the
   parser reports an error and then either halts the compilation process or attempts
   error recovery so that further errors can be reported.
 Intermediate Representation: Syntax analysis produces a representation of the code,
   such as a parse tree or AST, which is used by the subsequent phases of the compiler.
   This representation is more abstract than the source text and easier to work with.
 Optimization: Some compilers perform basic optimizations, such as removing redundant
   code and simplifying expressions, while building the syntax tree, although most
   optimization is done in later phases.
A pushdown automaton (PDA) is the machine model used to design the syntax analysis
phase, because pushdown automata recognize exactly the context-free languages.
The grammar for a language consists of production rules.
Example: Suppose the production rules for the grammar of a language are:
    S -> cAd
    A -> bc|a
    And the input string is “cad”.
Now the parser attempts to construct a syntax tree from this grammar for the given input
string. It applies the production rules as needed to generate the string. To generate
“cad” it proceeds in these steps: (i) start from S; (ii) apply S -> cAd, which matches the
leading ‘c’; (iii) expand A using its first rule, A -> bc; (iv) expand A using its next
rule, A -> a.
In step (iii), the production rule A -> bc is not suitable (the string produced is “cbcd”,
not “cad”), so the parser must backtrack and apply the next production rule available for
A, shown in step (iv); this produces the string “cad”.
Thus, the given input can be produced by the given grammar, so the input is
syntactically correct. But backtracking was needed to get the correct syntax tree, and
backtracking is costly and complex to implement.
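The backtracking procedure above can be sketched in Python. This is a minimal illustration, not a production parser: the grammar is hard-coded as a dictionary (`GRAMMAR`, a name chosen for this sketch), alternatives are tried in the order listed, and `parse` is written as a generator so that every way of deriving a symbol can be revisited when a later match fails. Like any naive backtracking parser, it would recurse forever on a left-recursive rule.

```python
# Grammar from the example above: S -> cAd, A -> bc | a.
# Alternatives are tried in the order listed, as in steps (iii) and (iv).
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["b", "c"], ["a"]],
}


def parse(symbol, text, pos, trace):
    """Yield every input position at which a derivation of `symbol` starting at
    `pos` can end. Yielding all endings (not just the first) is what lets the
    caller backtrack into the next alternative."""
    if symbol not in GRAMMAR:                 # terminal: must match the input
        if text.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    for alternative in GRAMMAR[symbol]:       # non-terminal: try each alternative
        trace.append(f"try {symbol} -> {''.join(alternative)} at position {pos}")
        yield from parse_sequence(alternative, text, pos, trace)


def parse_sequence(symbols, text, pos, trace):
    """Derive the symbols one after another, backtracking on failure."""
    if not symbols:
        yield pos
        return
    for middle in parse(symbols[0], text, pos, trace):
        yield from parse_sequence(symbols[1:], text, middle, trace)


def accepts(text):
    """Return (accepted?, list of attempted expansions) for the input text."""
    trace = []
    accepted = any(end == len(text) for end in parse("S", text, 0, trace))
    return accepted, trace


print(accepts("cad"))
```

For “cad”, the returned trace records A -> bc being tried and abandoned before A -> a succeeds, which mirrors steps (iii) and (iv).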
There is an easier way to solve this, which is discussed in the section “Why FIRST and
FOLLOW in Compiler Design?” below.
Advantages
 Advantages of using syntax analysis in compiler design include:
 Structural validation: Syntax analysis allows the compiler to check if the source code
   follows the grammatical rules of the programming language, which helps to detect
   and report errors in the source code.
 Improved code generation: Syntax analysis can generate a parse tree or abstract
   syntax tree (AST) of the source code, which can be used in the code generation phase
   of the compiler design to generate more efficient and optimized code.
 Easier semantic analysis: Once the parse tree or AST is constructed, the compiler can
   perform semantic analysis more easily, as it can rely on the structural information
   provided by the parse tree or AST.
Disadvantages
 Disadvantages of using syntax analysis in compiler design include:
 Complexity: Parsing is a complex process, and the quality of the parser can greatly
   impact the performance of the resulting code. Implementing a parser for a complex
   programming language can be a challenging task, especially for languages with
   ambiguous grammars.
 Reduced performance: Syntax analysis can add overhead to the compilation process,
   which can reduce the performance of the compiler.
 Limited error recovery: Syntax analysis algorithms may not be able to recover from
   errors in the source code, which can lead to incomplete or incorrect parse trees and
   make it difficult for the compiler to continue the compilation process.
 Inability to handle all languages: Not all languages have formal grammars, and some
   languages may not be easily parseable.
 Overall, syntax analysis is an important stage in the compiler design process, but its
   costs should be balanced against the goals of the compiler being built.
Syntax analysis, also known as parsing, is a crucial stage in the process of compiling a
program. Its primary task is to analyze the structure of the input program and check
whether it conforms to the grammar rules of the programming language. The parser
receives the input program as a series of tokens from the lexical analyzer and constructs
a parse tree or abstract syntax tree (AST) that represents the hierarchical structure of
the program.
Steps in Syntax Analysis Phase
 Tokenization: The input program is divided into a sequence of tokens, which are
   basic building blocks of the programming language, such as identifiers, keywords,
   operators, and literals. Strictly, this step is performed by the lexical analyzer,
   which supplies the tokens to the parser.
 Parsing: The tokens are analyzed according to the grammar rules of the
   programming language, and a parse tree or AST is constructed that represents the
   hierarchical structure of the program.
 Error handling: If the input program contains syntax errors, the syntax analyzer
   detects and reports them to the user, along with an indication of where the error
   occurred.
 Symbol table creation: The syntax analyzer may create or update the symbol table, a
   data structure that stores information about the identifiers used in the program,
   such as their type, scope, and location. In many compilers the lexical and semantic
   analyzers also add information to it.
The syntax analysis phase is essential for the subsequent stages of the compiler, such
as semantic analysis, code generation, and optimization. If the syntax analysis is not
performed correctly, the compiler may generate incorrect code or fail to compile the
program altogether.
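The tokenization, parsing, and error-handling steps can be illustrated together with a small recursive-descent parser for the expression grammar E -> TE’, E’ -> +TE’ | ε, T -> FT’, T’ -> *FT’ | ε, F -> (E) | id that appears in the FIRST and FOLLOW examples later in these notes. This is a sketch under stated assumptions: identifiers are letters, digits, and underscores; the AST is represented as nested tuples; and errors are reported through a hypothetical `ParseError` exception carrying a character position. Symbol-table creation is not shown.

```python
import re

# Token pattern (an assumption for this sketch): identifiers, the four
# punctuation tokens of the grammar, and any other non-space character as an error.
TOKEN = re.compile(r"(?P<id>[A-Za-z_]\w*)|(?P<op>[+*()])|(?P<bad>\S)")


class ParseError(Exception):
    pass


def tokenize(source):
    """Return (kind, text, position) triples, ending with the end marker '$'."""
    tokens = []
    for m in TOKEN.finditer(source):
        if m.lastgroup == "bad":
            raise ParseError(f"unexpected character {m.group()!r} at position {m.start()}")
        kind = "id" if m.lastgroup == "id" else m.group()
        tokens.append((kind, m.group(), m.start()))
    tokens.append(("$", "", len(source)))
    return tokens


class Parser:
    """Recursive-descent parser for E -> TE', E' -> +TE' | ε, T -> FT',
    T' -> *FT' | ε, F -> (E) | id. E' and T' are written as loops, which
    also makes + and * left-associative in the resulting tree."""

    def __init__(self, source):
        self.tokens = tokenize(source)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0]

    def expect(self, kind):
        found, text, where = self.tokens[self.pos]
        if found != kind:
            shown = text or "end of input"
            raise ParseError(f"expected {kind!r} but found {shown!r} at position {where}")
        self.pos += 1
        return text

    def parse(self):
        tree = self.expr()
        self.expect("$")                  # all input must be consumed
        return tree

    def expr(self):                       # E -> T E'
        node = self.term()
        while self.peek() == "+":         # E' -> + T E'  (stop on ε)
            self.pos += 1
            node = ("+", node, self.term())
        return node

    def term(self):                       # T -> F T'
        node = self.factor()
        while self.peek() == "*":         # T' -> * F T'  (stop on ε)
            self.pos += 1
            node = ("*", node, self.factor())
        return node

    def factor(self):                     # F -> ( E ) | id
        if self.peek() == "(":
            self.pos += 1
            node = self.expr()
            self.expect(")")
            return node
        return self.expect("id")


print(Parser("a + b * c").parse())
```

An input such as “a + * b” is rejected with a message naming the unexpected token and its position, which corresponds to the error-handling step above.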
Why FIRST and FOLLOW in Compiler Design?
Why FIRST?
If the parser knew in advance the first character of the string produced when a
production rule is applied, it could compare that character with the current character
(token) in the input string and wisely decide which production rule to apply. Let’s take
the same grammar as in the previous section:
S -> cAd
A -> bc|a
And the input string is “cad”.
Thus, in the example above, if after reading the character ‘c’ in the input string and
applying S -> cAd the parser knew that the next character in the input string is ‘a’, it
would ignore the production rule A -> bc (because ‘b’ is the first character of the string
produced by this production rule, not ‘a’) and directly use the production rule A -> a
(because ‘a’ is the first character of the string produced by this production rule, and it
is the same as the current input character ‘a’). Hence, if the parser knows the first
character of the string that can be obtained by applying a production rule, it can apply
the correct production rule and get the correct syntax tree for the given input string
without backtracking.
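With this first-character information, the same grammar can be parsed without backtracking. A minimal sketch, assuming each alternative of A begins with a terminal so that its FIRST set is just that terminal; the names `FIRST_OF_ALT` and `parse_cad` are illustrative.

```python
# Grammar: S -> cAd, A -> bc | a.
# Every alternative of A starts with a terminal, so the FIRST set of an
# alternative is just its first symbol; map that symbol to the alternative.
ALTERNATIVES_A = [["b", "c"], ["a"]]
FIRST_OF_ALT = {alt[0]: alt for alt in ALTERNATIVES_A}


def parse_cad(text):
    """Predictive parse for S -> cAd. Returns the productions applied,
    or None if the input is not in the language."""
    applied = ["S -> cAd"]
    pos = 0

    def match(ch):
        nonlocal pos
        if pos < len(text) and text[pos] == ch:
            pos += 1
            return True
        return False

    if not match("c"):
        return None
    lookahead = text[pos] if pos < len(text) else None
    alt = FIRST_OF_ALT.get(lookahead)        # one look decides: no backtracking
    if alt is None:
        return None
    applied.append("A -> " + "".join(alt))
    if not all(match(ch) for ch in alt):
        return None
    if not match("d") or pos != len(text):
        return None
    return applied


print(parse_cad("cad"))
```

The production for A is chosen once, from the current input character, so A -> bc is never tried for “cad”.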
Why FOLLOW?
The parser faces one more problem. Let us consider below grammar to understand this
problem.
A -> aBb
B -> c | ε
And suppose the input string is “ab” to parse.
As the first character in the input is a, the parser applies the rule A->aBb.
     A
    /| \
   a B b
Now the parser checks the second character of the input string, which is b, and the
non-terminal to derive is B. But the parser can’t get any string derivable from B that has
b as its first character. However, the grammar does contain a production rule B -> ε; if
that is applied, B will vanish and the parser gets the input “ab”, as shown below.
But the parser can apply it only when it knows that the character that follows B in the
production rule is the same as the current character in the input. In the RHS of
A -> aBb, b follows non-terminal B, i.e. FOLLOW(B) = {b}, and the current input character
read is also b. Hence the parser applies this rule, and it is able to get the string “ab”
from the given grammar.
       A              A
     / | \           / \
    a  B  b    =>   a   b
       |
       ε
So FOLLOW tells the parser when a non-terminal can be made to vanish (by applying its
ε-production) in order to generate the string. The conclusion is that we need to find the
FIRST and FOLLOW sets for a given grammar so that the parser can apply the needed rule
at the correct position. The next sections give formal definitions of FIRST and FOLLOW
and some easy rules to compute these sets.
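The ε decision can be sketched the same way. This minimal example hard-codes FOLLOW(B) = {b} from the text and uses `$` to stand for the end of the input (an assumption matching the end-marker convention of the FOLLOW section); `parse_ab` is a hypothetical name.

```python
# Grammar: A -> aBb, B -> c | ε, with FOLLOW(B) = {b} as derived in the text.
FOLLOW_B = {"b"}


def parse_ab(text):
    """Return the productions applied while parsing text, or None on a syntax error."""
    applied = ["A -> aBb"]
    pos = 0

    def current():
        return text[pos] if pos < len(text) else "$"   # '$' marks the end of input

    if current() != "a":
        return None
    pos += 1
    # Decide how to expand B by looking at one input character.
    if current() == "c":              # 'c' begins the string derived by B -> c
        applied.append("B -> c")
        pos += 1
    elif current() in FOLLOW_B:       # B may vanish only if the next char can follow B
        applied.append("B -> ε")
    else:
        return None
    if current() != "b":
        return None
    pos += 1
    return applied if pos == len(text) else None


print(parse_ab("ab"))
```

For “ab” the parser picks B -> ε because the current character b is in FOLLOW(B), exactly the reasoning described above.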
FIRST Set in Syntax Analysis
FIRST(X) for a grammar symbol X is the set of terminals that begin the strings
derivable from X.
FIRST set is a concept used in syntax analysis, especially in LL and LR parsing
algorithms. It is the set of terminals that can begin the strings derived from a given
grammar symbol.
The FIRST set of a non-terminal A is defined as the set of terminals that can appear as
the first symbol in any string derived from A. If a non-terminal A can derive the empty
string, then the empty string (ε) is also included in the FIRST set of A.
The FIRST set is used to determine which production rule should be used to expand a
non-terminal in an LL parser: if the next symbol in the input stream is in FIRST(α) for
exactly one production A -> α, then A can be safely expanded using that production. LR(1)
and LALR parsers also use FIRST sets when computing lookahead symbols.
It is worth noting that the FIRST set is also used in computing the FOLLOW set, which is
the set of terminals that can appear immediately after a non-terminal in some sentential
form. FOLLOW sets are needed to build LL(1) parsing tables (to decide when an
ε-production applies) and SLR parsing tables (to decide where reduce actions go).
To compute FIRST sets, start with FIRST(a) = { a } for every terminal a, and then
repeatedly add FIRST of the right-hand side of each production (skipping over symbols
that can derive ε, as described in the rules below) to the FIRST set of the non-terminal
on its left-hand side. Repeat this process until no new element can be added to any set.
FIRST set is a fundamental concept in syntax analysis, and it is used in many parsing
algorithms and techniques. Its computation is a
Rules to compute FIRST set:
1. If x is a terminal, then FIRST(x) = { ‘x’ }
2. If x -> ε is a production rule, then add ε to FIRST(x).
3. If X -> Y1 Y2 Y3 … Yn is a production,
   1. Add { FIRST(Y1) – ε } to FIRST(X).
   2. If FIRST(Y1) contains ε, then also add { FIRST(Y2) – ε }; in general, add
      { FIRST(Yi+1) – ε } whenever FIRST(Y1), …, FIRST(Yi) all contain ε.
   3. If FIRST(Yi) contains ε for all i = 1 to n, then add ε to FIRST(X).
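The rules above can be turned into a fixed-point computation: keep applying rules 1 to 3 to every production until no FIRST set changes. The sketch below assumes a grammar encoded as a dictionary from non-terminal to a list of right-hand sides, with `[]` standing for an ε-production, the string "ε" standing for the empty string inside the sets, and primes written as ASCII apostrophes; any symbol that is not a dictionary key is treated as a terminal. `compute_first` is a name chosen for this sketch.

```python
EPS = "ε"


def compute_first(grammar):
    """Fixed-point computation of FIRST for every non-terminal of `grammar`."""
    first = {nt: set() for nt in grammar}

    def first_of_string(symbols):
        result = set()
        for sym in symbols:
            sym_first = first[sym] if sym in grammar else {sym}   # rule 1 for terminals
            result |= sym_first - {EPS}                           # rule 3.1 / 3.2
            if EPS not in sym_first:
                return result
        result.add(EPS)          # rules 2 and 3.3: every symbol (or none) can vanish
        return result

    changed = True
    while changed:               # repeat until no set grows
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs) - first[nt]
                if new:
                    first[nt] |= new
                    changed = True
    return first


EXAMPLE_1 = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
EXAMPLE_2 = {
    "S": [["A", "C", "B"], ["C", "b", "b"], ["B", "a"]],
    "A": [["d", "a"], ["B", "C"]],
    "B": [["g"], []],
    "C": [["h"], []],
}

for nt, s in compute_first(EXAMPLE_2).items():
    print(f"FIRST({nt}) = {sorted(s)}")
```

Running it on the two example grammars below gives the same sets as listed there.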
Example 1:
Production Rules of Grammar
E -> TE’
E’ -> +T E’|ε
T -> F T’
T’ -> *F T’ | ε
F -> (E) | id
FIRST sets
FIRST(E) = FIRST(T) = { ( , id }
FIRST(E’) = { +, ε }
FIRST(T) = FIRST(F) = { ( , id }
FIRST(T’) = { *, ε }
FIRST(F) = { ( , id }
Example 2:
Production Rules of Grammar
S -> ACB | Cbb | Ba
A -> da | BC
B -> g | ε
C -> h | ε
FIRST sets
FIRST(S) = FIRST(ACB) U FIRST(Cbb) U FIRST(Ba)
    = { d, g, h, b, a, ε }
FIRST(A) = { d } U FIRST(BC)
    = { d, g, h, ε }
FIRST(B) = { g , ε }
FIRST(C) = { h , ε }
Notes:
1. The grammar used above is Context-Free Grammar (CFG). Syntax of most
   programming languages can be specified using CFG.
2. CFG is of the form A -> α, where A is a single Non-Terminal, and α is a string (possibly
   empty) of grammar symbols (i.e. Terminals as well as Non-Terminals)
Features of FIRST sets:
Definition: The FIRST set of a nonterminal symbol is the set of terminal symbols that
can appear as the first symbol in a string derived from that nonterminal. In other words,
it is the set of all possible starting symbols for a string derived from that nonterminal.
Calculation: The FIRST set for each nonterminal symbol is calculated by examining
the productions for that symbol and determining which terminal symbols can appear as
the first symbol in a string derived from that production.
Recursive Descent Parsing: The FIRST set is often used in recursive descent parsing,
which is a top-down parsing technique that uses the FIRST set to determine which
production to use at each step of the parsing process.
Choosing Between Alternatives: The FIRST set helps the parser choose between alternative
productions of a nonterminal based on the next input symbol. It does not make an
ambiguous grammar unambiguous; an ambiguous grammar always produces a conflict in an
LL(1) parsing table.
Follow Set: The FOLLOW set is another concept used in syntax analysis that represents
the set of terminals that can appear immediately after a nonterminal symbol in a
derivation. The FOLLOW set is often used in conjunction with the FIRST set to resolve
parsing conflicts and ensure that the parser can correctly identify the structure of the
input code.
Advantages and Disadvantages:
Advantages of using FIRST set in syntax analysis include:
 Improved parsing: FIRST set can be used to determine which production rule should
   be used to expand a non-terminal in an LL or LR parser, which helps to improve the
   accuracy and efficiency of the parsing process.
 Choosing between alternatives: FIRST set can be used to decide which production rule
   should be used when multiple production rules exist for the same non-terminal: if the
   FIRST sets of the alternatives are disjoint, the next input symbol selects exactly one
   of them. (FIRST sets cannot make an ambiguous grammar unambiguous.)
 Simplified error handling: By determining which production rule should be used
   based on the FIRST set, an LL or LR parser can detect errors in the source code more
   quickly and accurately.
Disadvantages of using FIRST set in syntax analysis include:
 Complexity: Computing FIRST set can be a complex process, especially for
   grammars with many non-terminals and production rules.
 Limited applicability: FIRST set is mainly used in LL and LR parsing algorithms,
   and may not be applicable to other types of parsing algorithms.
 Limitations of LL parsing: LL parsing is limited in its ability to handle certain types
   of grammars, such as those with left-recursive rules, which can lead to an infinite
   loop in a recursive-descent parser.
FOLLOW Set in Syntax Analysis
FOLLOW sets in compiler design are used to identify the terminal symbols that can appear
immediately after a non-terminal. Like the FIRST set, the FOLLOW set is used to avoid
backtracking. The difference is that the FOLLOW set matters when a non-terminal on the
right-hand side can vanish (derive Є): it tells the parser when applying the Є-production
is correct, which makes decision-making easier while parsing.
FOLLOW(X) is defined as the set of terminals that can appear immediately to the right of
Non-Terminal X in some sentential form.
Example:
S ->Aa | Ac
A ->b
       S            S
      / \          / \
     A   a        A   c
     |            |
     b            b
Here, FOLLOW (A) = {a, c}
Rules to compute FOLLOW set:
1) FOLLOW(S) = { $ } // where S is the starting Non-Terminal
2) If A -> pBq is a production, where B is a non-terminal and p and q are strings of
  grammar symbols, then everything in FIRST(q) except Є is in FOLLOW(B).
3) If A->pB is a production, then everything in FOLLOW(A) is in FOLLOW(B).
4) If A->pBq is a production and FIRST(q) contains Є,
  then FOLLOW(B) contains { FIRST(q) – Є } U FOLLOW(A)
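Rules 1 to 4 can be applied the same way, repeating until no FOLLOW set grows. This sketch is self-contained, so it repeats a compact FIRST computation; the grammar encoding (a dictionary of right-hand sides, `[]` for an ε-production, "ε" for Є, "$" for the end marker, ASCII apostrophes for primes) is an assumption of the sketch, not notation from these notes.

```python
EPS, END = "ε", "$"


def first_of(symbols, grammar, first):
    """FIRST of a string of grammar symbols, using the current FIRST sets."""
    result = set()
    for sym in symbols:
        f = first[sym] if sym in grammar else {sym}
        result |= f - {EPS}
        if EPS not in f:
            return result
    return result | {EPS}            # every symbol can vanish (or there are none)


def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for rhs in alts:
                new = first_of(rhs, grammar, first) - first[nt]
                if new:
                    first[nt] |= new
                    changed = True
    return first


def follow_sets(grammar, start):
    first = first_sets(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                               # rule 1
    changed = True
    while changed:
        changed = False
        for lhs, alts in grammar.items():
            for rhs in alts:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:               # FOLLOW only for non-terminals
                        continue
                    rest = first_of(rhs[i + 1:], grammar, first)
                    new = rest - {EPS}                   # rule 2
                    if EPS in rest:                      # rules 3 and 4
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


EXAMPLE_1 = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}

for nt, s in follow_sets(EXAMPLE_1, "E").items():
    print(f"FOLLOW({nt}) = {sorted(s)}")
```

On Example 1 it reproduces the FOLLOW sets listed below; Examples 2 and 3 can be checked the same way.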
Example 1:
Production Rules:
E -> TE’
E’ -> +T E’|Є
T -> F T’
T’ -> *F T’ | Є
F -> (E) | id
FIRST set
FIRST(E) = FIRST(T) = { ( , id }
FIRST(E’) = { +, Є }
FIRST(T) = FIRST(F) = { ( , id }
FIRST(T’) = { *, Є }
FIRST(F) = { ( , id }
FOLLOW Set
FOLLOW(E) = { $ , ) } // ')' is there because of the production F -> (E)
FOLLOW(E’) = FOLLOW(E) = { $, ) } // See 1st production rule
FOLLOW(T) = { FIRST(E’) – Є } U FOLLOW(E’) U FOLLOW(E) = { + , $ , ) }
FOLLOW(T’) = FOLLOW(T) = { + , $ , ) }
FOLLOW(F) = { FIRST(T’) – Є } U FOLLOW(T’) U FOLLOW(T) = { *, +, $, ) }
Example 2:
Production Rules:
S -> aBDh
B -> cC
C -> bC | Є
D -> EF
E -> g | Є
F -> f | Є
FIRST set
FIRST(S) = { a }
FIRST(B) = { c }
FIRST(C) = { b , Є }
FIRST(D) = { FIRST(E) – Є } U FIRST(F) = { g, f, Є }
FIRST(E) = { g , Є }
FIRST(F) = { f , Є }
FOLLOW Set
FOLLOW(S) = { $ }
FOLLOW(B) = { FIRST(D) – Є } U FIRST(h) = { g , f , h }
FOLLOW(C) = FOLLOW(B) = { g , f , h }
FOLLOW(D) = FIRST(h) = { h }
FOLLOW(E) = { FIRST(F) – Є } U FOLLOW(D) = { f , h }
FOLLOW(F) = FOLLOW(D) = { h }
Example 3:
Production Rules:
S -> ACB|Cbb|Ba
A -> da|BC
B-> g|Є
C-> h| Є
FIRST set
FIRST(S) = FIRST(ACB) U FIRST(Cbb) U FIRST(Ba) = { d, g, h, Є, b, a }
FIRST(A) = { d } U {FIRST(B)-Є} U FIRST(C) = { d, g, h, Є }
FIRST(B) = { g, Є }
FIRST(C) = { h, Є }
FOLLOW Set
FOLLOW(S) = { $ }
FOLLOW(A) = { h, g, $ }
FOLLOW(B) = { a, $, h, g }
FOLLOW(C) = { b, g, $, h }
Note:
1. Є never appears in a FOLLOW set (Є is the empty string, not a terminal that can follow a non-terminal).
2. $ is called end-marker, which represents the end of the input string, hence used while
   parsing to indicate that the input string has been completely processed.
3. The grammar used above is Context-Free Grammar (CFG). The syntax of a
   programming language can be specified using CFG.
4. CFG is of the form A -> α, where A is a single Non-Terminal, and α is a string (possibly
   empty) of grammar symbols (i.e. Terminals as well as Non-Terminals)
Classification of Context Free Grammars
Context Free Grammars (CFG) can be classified on the basis of the following two
properties:
1) Based on the number of strings it generates.
 If a CFG generates a finite number of strings, then the CFG is Non-Recursive (or the
    grammar is said to be a Non-recursive grammar)
 If a CFG can generate an infinite number of strings, then the grammar is said to
    be a Recursive grammar
During compilation, the parser uses the grammar of the language to make a parse
tree (or derivation tree) out of the source code. The grammar used for parsing should be
unambiguous, so that every valid input has exactly one parse tree.
2) Based on the number of derivation trees.
 If every string generated by the CFG has exactly one derivation tree, then the CFG is
    unambiguous.
 If some string has more than one parse tree (equivalently, more than one leftmost
    derivation, or more than one rightmost derivation), then the CFG is ambiguous.
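Ambiguity can be observed directly by counting parse trees. The sketch below (a hypothetical helper, `count_parse_trees`) counts the distinct parse trees of S for the grammar S -> S + S | S * S | id used in the shift-reduce examples later in these notes, by trying every operator in the input as the root of the tree. It assumes the input is already split into tokens.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Number of distinct parse trees of S for S -> S + S | S * S | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):                       # parse trees of S covering tokens[i:j]
        total = 1 if j - i == 1 and tokens[i] == "id" else 0      # S -> id
        for k in range(i + 1, j - 1):      # tokens[k] is the operator at the root
            if tokens[k] in ("+", "*"):    # S -> S + S or S -> S * S
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees("id + id + id".split()))
```

For id + id + id it finds two trees, (id + id) + id and id + (id + id), so by the definition above this grammar is ambiguous.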
Examples of Recursive and Non-Recursive Grammars
Recursive Grammars
1) S->SaS
  S->b
The language(set of strings) generated by the above grammar is :{b, bab, babab,…},
which is infinite.
2) S-> Aa
  A->Ab|c
The language generated by the above grammar is :{ca, cba, cbba …}, which is infinite.
Note: A recursive context-free grammar that contains no useless rules necessarily
produces an infinite language.
Non-Recursive Grammars
 S->Aa
 A->b|c
The language generated by the above grammar is :{ba, ca}, which is finite.
Types of Recursive Grammars
Based on the nature of the recursion in a recursive grammar, a recursive CFG can be
again divided into the following:
 Left Recursive Grammar (having left Recursion)
 Right Recursive Grammar (having right Recursion)
 General Recursive Grammar(having general Recursion)
Note:
A linear grammar is a context-free grammar that has at most one non-terminal in the
right hand side of each of its productions.
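The recursion properties above, and the linear-grammar condition in the note, can be checked mechanically. This sketch uses a dictionary encoding of grammars as an assumption; it reports a non-terminal as recursive if it can reach itself through the productions, detects only immediate left and right recursion (A -> Aα and A -> αA; indirect left recursion is not detected), and checks that no right-hand side has more than one non-terminal. The function names are illustrative.

```python
def reachable(grammar, start):
    """Non-terminals reachable from `start` in one or more production steps."""
    seen, todo = set(), [start]
    while todo:
        for rhs in grammar[todo.pop()]:
            for sym in rhs:
                if sym in grammar and sym not in seen:
                    seen.add(sym)
                    todo.append(sym)
    return seen


def classify(grammar):
    """Per non-terminal: recursive at all, immediately left- or right-recursive."""
    return {
        nt: {
            "recursive": nt in reachable(grammar, nt),
            "left": any(rhs[:1] == [nt] for rhs in alts),     # A -> A α
            "right": any(rhs[-1:] == [nt] for rhs in alts),   # A -> α A
        }
        for nt, alts in grammar.items()
    }


def is_linear(grammar):
    """At most one non-terminal on the right-hand side of every production."""
    return all(sum(sym in grammar for sym in rhs) <= 1
               for alts in grammar.values() for rhs in alts)


print(classify({"S": [["S", "a", "S"], ["b"]]}))
```

For the recursive grammar S -> SaS | b it reports S as both left- and right-recursive, and for S -> Aa, A -> Ab | c only A is recursive (on the left).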
Parsing | Set 1 (Introduction, Ambiguity and Parsers)
Parsing is performed at the syntax analysis phase, where a stream of tokens is taken as
input from the lexical analyzer and the parser produces the parse tree for the tokens
while checking the stream of tokens for syntax errors.
Role of Parser
In the syntax analysis phase, a compiler verifies whether or not the tokens generated by
the lexical analyzer are grouped according to the syntactic rules of the language. This is
done by a parser. The parser obtains a string of tokens from the lexical analyzer and
verifies that the string can be generated by the grammar for the source language. It
detects and reports any syntax errors and produces a parse tree from which intermediate
code can be generated.
Types of Parsing
The parsing is divided into two types, which are as follows:
1. Top-down Parsing
2. Bottom-up Parsing
Top-Down Parsing
Top-down parsing attempts to build the parse tree from the root node to the leaf nodes.
The top-down parser starts from the start symbol and proceeds to the input string. It
follows the leftmost derivation: in a leftmost derivation, the leftmost non-terminal in
each sentential form is always the one expanded.
1. Recursive descent parsing and predictive parsing are the common forms of top-down
    parsing.
2. A parse tree is built for an input string using top-down parsing.
3. When parsing is done top-down, the start symbol is transformed step by step into the
    input string.
The top-down parsing is further categorized as follows:
1. With Backtracking:
 Brute Force Technique
2. Without Backtracking:
 Recursive Descent Parsing
 Predictive Parsing or Non-Recursive Parsing or LL(1) Parsing or Table-Driven Parsing
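A table-driven LL(1) parser can be sketched with an explicit stack. The table below was filled in by hand for the grammar E -> TE’, E’ -> +TE’ | ε, T -> FT’, T’ -> *FT’ | ε, F -> (E) | id from the FIRST and FOLLOW examples, using the standard LL(1) rule: put A -> α in M[A, a] for every terminal a in FIRST(α), and, when ε is in FIRST(α), also in M[A, b] for every b in FOLLOW(A). Tokens are assumed to be already split (for example `id`, `+`), and the names `TABLE`, `ll1_parse` and `ParseError` are illustrative.

```python
class ParseError(Exception):
    pass


# M[non-terminal, lookahead] -> right-hand side; [] is an ε-production.
TABLE = {
    ("E", "("): ["T", "E'"], ("E", "id"): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "("): ["F", "T'"], ("T", "id"): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "("): ["(", "E", ")"], ("F", "id"): ["id"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_parse(tokens, start="E"):
    """Return the productions of the leftmost derivation, or raise ParseError."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]                     # top of the stack is the end of the list
    pos, derivation = 0, []
    while stack:
        top, look = stack.pop(), tokens[pos]
        if top not in NONTERMINALS:          # terminal or $: must match the lookahead
            if top != look:
                raise ParseError(f"expected {top!r} but found {look!r} at token {pos}")
            pos += 1
            continue
        rhs = TABLE.get((top, look))
        if rhs is None:                      # empty table cell: syntax error
            raise ParseError(f"no rule for {top} with lookahead {look!r} at token {pos}")
        derivation.append(f"{top} -> {' '.join(rhs) or 'ε'}")
        stack.extend(reversed(rhs))          # leftmost RHS symbol ends up on top
    return derivation


for step in ll1_parse(["id", "+", "id", "*", "id"]):
    print(step)
```

Each step pops one stack symbol and consults one table cell, so no backtracking is ever needed; an empty cell is reported as a syntax error.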
Bottom-up Parsing
Bottom-up parsing builds the parse tree from the leaf node to the root node. The bottom-
up parsing will reduce the input string to the start symbol. It traces the rightmost derivation
of the string in reverse. Bottom-up parsers are also known as shift-reduce parsers.
1. Shift-reduce parsing is another name for bottom-up parsing.
2. A parse tree is built for an input string using bottom-up parsing.
3. When parsing from the bottom up, the process begins with the input string and builds
   the parse tree up to the start symbol by reversing the rightmost derivation.
Generally, bottom-up parsing is categorized into the following types:
1. LR parsing/Shift Reduce Parsing: Shift reduce Parsing is a process of parsing a string
to obtain the start symbol of the grammar.
 LR(0)
 SLR(1)
 LALR
 CLR
2. Operator Precedence Parsing: This is a shift-reduce technique for grammars written as
operator grammars. In an operator grammar there must be no null (ε) production, and no
two non-terminals may be adjacent to each other on the right-hand side of a production.
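The two conditions for an operator grammar are easy to check. A minimal sketch, assuming a dictionary grammar encoding (non-terminals are the keys, `[]` is an ε-production); `is_operator_grammar` is a hypothetical helper name.

```python
def is_operator_grammar(grammar):
    """True if no production is an ε-production and no right-hand side
    contains two adjacent non-terminals."""
    for alternatives in grammar.values():
        for rhs in alternatives:
            if not rhs:                                  # ε (null) production
                return False
            for left, right in zip(rhs, rhs[1:]):
                if left in grammar and right in grammar: # two adjacent non-terminals
                    return False
    return True


print(is_operator_grammar({"E": [["E", "+", "E"], ["E", "*", "E"], ["id"]]}))
```

E -> E + E | E * E | id passes both checks, while a rewriting such as E -> EAE, A -> + | * fails because E and A are adjacent.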
Bottom-up or Shift Reduce Parsers
This section discusses bottom-up parsers. Bottom-up parsers (shift-reduce parsers) build
the parse tree from the leaves to the root. Bottom-up parsing can be defined as an
attempt to reduce the input string w to the start symbol of the grammar by tracing out the
rightmost derivation of w in reverse.
Classification of Bottom-up Parsers:
A general shift reduce parsing is LR parsing. The L stands for scanning the input from
left to right and R stands for constructing a rightmost derivation in reverse.
Benefits of LR parsing:
1. Parsers for many programming languages use some variation of an LR parser. It should
   be noted that C++ and Perl are exceptions.
2. LR parsers can be implemented very efficiently.
3. Of all the parsers that scan their symbols from left to right, LR parsers detect
   syntactic errors as soon as possible.
To construct the GOTO graph for LR(0) parsing, we rely on two essential
functions: Closure() and Goto().
Firstly, we introduce the concept of an augmented grammar. In the augmented grammar,
a new start symbol, S’, is added, along with a production S’ -> S. This addition helps the
parser determine when to stop parsing and signal the acceptance of input. For example,
if we have a grammar S -> AA and A -> aA | b, the augmented grammar will be S’ -> S,
S -> AA and A -> aA | b.
Next, we define LR(0) items. An LR(0) item of a grammar G is a production of G with a
dot (.) positioned at some point on the right-hand side. For instance, given the
production S -> ABC, we obtain four LR(0) items: S -> .ABC, S -> A.BC, S -> AB.C
and S -> ABC. (dot at the end). The ε-production A -> ε generates only one item:
A -> . (the dot alone).
By utilizing the Closure() function, we can calculate the closure of a set of LR(0) items.
Whenever an item has the dot immediately before a non-terminal B, the closure adds an
item B -> .γ for every production B -> γ, and repeats until no new items appear. This
step identifies all the items the parser may be in the middle of recognizing.
The Goto() function is employed to construct the transitions between sets of LR(0) items
in the GOTO graph. For a grammar symbol X, it moves the dot one position to the right,
past X, in every item where X follows the dot, and then takes the closure of the result.
This process allows us to navigate through the graph and track the parsing progress.
Augmented Grammar: If G is a grammar with start symbol S then G’, the augmented
grammar for G, is the grammar with new start symbol S’ and a production S’ -> S. The
purpose of this new starting production is to indicate to the parser when it should stop
parsing and announce acceptance of input. Let a grammar be S -> AA, A -> aA | b. The
augmented grammar for the above grammar will be S’ -> S, S -> AA, A -> aA | b.
LR(0) Items: An LR(0) item of a grammar G is a production of G with a dot at some
position in the right side. S -> ABC yields four items: S -> .ABC, S -> A.BC,
S -> AB.C and S -> ABC. (dot at the end). The production A -> ε generates only one
item, A -> . (the dot alone).
Closure Operation: If I is a set of items for a grammar G, then closure(I) is the set of
items constructed from I by the two rules:
1. Initially every item in I is added to closure(I).
2. If A -> α.Bβ is in closure(I) and B -> γ is a production, then add the item B -> .γ to
   closure(I), if it is not already there. We apply this rule until no more items can be
   added to closure(I).
Goto Operation: Goto(I, X) is computed as follows:
1. For every item A -> α.Xβ in I, move the dot past X to get A -> αX.β.
2. Apply closure to the set of items obtained in the first step.
Construction of GOTO graph-
 State I0 – closure of the augmented LR(0) item S’ -> .S
 Starting from I0, apply Goto() with every grammar symbol to find the collection of sets
   of LR(0) items; these sets are the states of a DFA
 Convert the DFA to the LR(0) parsing table
Construction of LR(0) parsing table:
 The action function takes as arguments a state i and a terminal a (or $, the input end
   marker). The value of ACTION[i, a] can have one of four forms:
   1. Shift j, where j is a state.
   2. Reduce A -> β, where A -> β is a production.
   3. Accept
   4. Error
 We extend the GOTO function, defined on sets of items, to states: if GOTO[Ii, A] = Ij,
   then GOTO also maps a state i and a nonterminal A to state j.
Example: Consider the grammar S -> AA, A -> aA | b, with augmented grammar S’ -> S,
S -> AA, A -> aA | b. Numbering the productions (1) S -> AA, (2) A -> aA, (3) A -> b, the
canonical collection has seven states: I0 = { S’ -> .S, S -> .AA, A -> .aA, A -> .b },
I1 = { S’ -> S. }, I2 = { S -> A.A, A -> .aA, A -> .b }, I3 = { A -> a.A, A -> .aA, A -> .b },
I4 = { A -> b. }, I5 = { S -> AA. }, I6 = { A -> aA. }. The LR(0) parsing table for this
GOTO graph is:
           ACTION                  GOTO
State      a       b       $       A     S
I0         S3      S4              2     1
I1                         Accept
I2         S3      S4              5
I3         S3      S4              6
I4         R3      R3      R3
I5         R1      R1      R1
I6         R2      R2      R2
Action part of the table contains all the terminals of the grammar whereas the goto part
contains all the nonterminals. For every state of the goto graph we write all the goto
operations in the table. If goto is applied to a terminal it is written in the action part;
if goto is applied to a nonterminal it is written in the goto part. If on applying goto a
production is reduced (i.e. the dot reaches the end of the production and no further
closure can be applied), it is denoted as Ri, where i is the production number, and if the
production is not reduced (shifted) it is denoted as Sj, where j is the next state. In an
LR(0) parser a reduced production is written under every terminal and $, because LR(0)
uses no lookahead; for example, in I5 S -> AA is reduced, so R1 is written under a, b
and $. (Writing reductions only under the FOLLOW of the left side, here
FOLLOW(S) = { $ }, is the SLR(1) refinement.) If in a state the augmented production
S’ -> S is reduced, Accept is written under the $ symbol.
NOTE: If a state contains both a reduced and a shifted production, or two reduced
productions, it is called a conflict situation and the grammar is not an LR(0) grammar.
NOTE:
1. Two reduced productions in one state – RR (reduce-reduce) conflict.
2. One reduced and one shifted production in one state – SR (shift-reduce) conflict.
If no SR or RR conflict is present in the parsing table, then the grammar is an LR(0)
grammar. The above grammar has no conflict, so it is an LR(0) grammar.
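The whole LR(0) construction (Closure(), Goto(), the canonical collection, the table, and a table-driven driver) fits in a short sketch for the example grammar. Items are encoded as (production number, dot position) pairs, and production and state numbers are chosen so that S -> AA is production 1 (R1) and I5 is S -> AA., as in the text; these encodings and all function names are assumptions of the sketch. Reductions are entered under every terminal and $, as an LR(0) table requires, and any cell that would receive two different entries is recorded as a conflict.

```python
# Augmented grammar: 0: S' -> S   1: S -> AA   2: A -> aA   3: A -> b
GRAMMAR = [("S'", ("S",)), ("S", ("A", "A")), ("A", ("a", "A")), ("A", ("b",))]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
TERMINALS = {sym for _, rhs in GRAMMAR for sym in rhs} - NONTERMINALS


def closure(items):
    """items are (production number, dot position) pairs."""
    result, i = list(items), 0
    while i < len(result):
        prod, dot = result[i]
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:     # dot before B
            for k, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (k, 0) not in result:
                    result.append((k, 0))                   # add B -> .γ
        i += 1
    return result


def goto(items, symbol):
    """Move the dot past `symbol` wherever possible, then take the closure."""
    return closure([(p, d + 1) for p, d in items
                    if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol])


def canonical_collection():
    """States I0, I1, ... and the DFA transitions between them."""
    states, transitions, i = [closure([(0, 0)])], {}, 0
    while i < len(states):
        symbols = []                                        # symbols after the dot
        for p, d in states[i]:
            rhs = GRAMMAR[p][1]
            if d < len(rhs) and rhs[d] not in symbols:
                symbols.append(rhs[d])
        for x in symbols:
            target, existing = goto(states[i], x), [set(s) for s in states]
            if set(target) in existing:
                transitions[(i, x)] = existing.index(set(target))
            else:
                states.append(target)
                transitions[(i, x)] = len(states) - 1
        i += 1
    return states, transitions


def build_table(states, transitions):
    """Return ACTION, GOTO and a list of SR/RR conflicts."""
    action, goto_table, conflicts = {}, {}, []

    def put(key, value):
        if key in action and action[key] != value:
            conflicts.append((key, action[key], value))
        else:
            action[key] = value

    for (i, x), j in transitions.items():
        if x in TERMINALS:
            put((i, x), ("shift", j))
        else:
            goto_table[(i, x)] = j
    for i, items in enumerate(states):
        for p, d in items:
            if d == len(GRAMMAR[p][1]):                     # reduced item
                if p == 0:
                    put((i, "$"), ("accept",))
                else:                                       # LR(0): every terminal and $
                    for t in sorted(TERMINALS | {"$"}):
                        put((i, t), ("reduce", p))
    return action, goto_table, conflicts


def lr0_parse(tokens, action, goto_table):
    """Table-driven LR parse; the stack holds state numbers only."""
    stack, tokens, pos = [0], list(tokens) + ["$"], 0
    while True:
        entry = action.get((stack[-1], tokens[pos]))
        if entry is None:                                   # blank cell: syntax error
            return False
        if entry[0] == "shift":
            stack.append(entry[1])
            pos += 1
        elif entry[0] == "reduce":
            lhs, rhs = GRAMMAR[entry[1]]
            del stack[len(stack) - len(rhs):]               # pop one state per symbol
            stack.append(goto_table[(stack[-1], lhs)])
        else:
            return True                                     # accept


states, transitions = canonical_collection()
action, goto_table, conflicts = build_table(states, transitions)
print(len(states), "states,", len(conflicts), "conflicts")
print(lr0_parse("abab", action, goto_table), lr0_parse("aab", action, goto_table))
```

For this grammar the construction finds 7 states and no conflicts, accepts abab and rejects aab, consistent with the conclusion that the grammar is LR(0).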
Shift Reduce Parser in Compiler
A shift-reduce parser constructs the parse tree in the same manner as bottom-up parsing,
i.e. the parse tree is constructed from the leaves (bottom) to the root (up). A more
general form of the shift-reduce parser is the LR parser.
This parser requires the following data structures:
 An input buffer for storing the input string.
 A stack for storing grammar symbols (and, in an LR parser, states).
Basic Operations –
 Shift: This involves moving symbols from the input buffer onto the stack.
 Reduce: If the handle appears on top of the stack, it is reduced using the
  appropriate production rule, i.e. the RHS of the production rule is popped off the
  stack and the LHS of the production rule is pushed onto the stack.
 Accept: If only the start symbol is present in the stack and the input buffer is empty,
  the parsing action is called accept. When the accept action is obtained, parsing has
  completed successfully.
 Error: This is the situation in which the parser can perform neither a shift action nor
  a reduce action, nor an accept action.
Example 1 – Consider the grammar
             S –> S + S
             S –> S * S
             S –> id
   Perform Shift Reduce parsing for input string “id + id + id”.
Example 2 – Consider the grammar
                 E –> 2E2
                 E –> 3E3
                 E –> 4
Perform Shift Reduce parsing for input string “32423”.
Example 3 – Consider the grammar
               S –> ( L ) | a
               L –> L , S | S
Perform Shift Reduce parsing for input string “( a, ( a, a ) )”.
Stack           Input Buffer     Parsing Action
$               (a,(a,a))$       Shift
$(              a,(a,a))$        Shift
$(a             ,(a,a))$         Reduce S → a
$(S             ,(a,a))$         Reduce L → S
$(L             ,(a,a))$         Shift
$(L,            (a,a))$          Shift
$(L,(           a,a))$           Shift
$(L,(a          ,a))$            Reduce S → a
$(L,(S          ,a))$            Reduce L → S
$(L,(L          ,a))$            Shift
$(L,(L,         a))$             Shift
$(L,(L,a        ))$              Reduce S → a
$(L,(L,S        ))$              Reduce L → L, S
$(L,(L          ))$              Shift
$(L,(L)         )$               Reduce S → (L)
$(L,S           )$               Reduce L → L, S
$(L             )$               Shift
$(L)            $                Reduce S → (L)
$S              $                Accept
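To see the four actions (shift, reduce, accept, error) in code, here is a minimal Python sketch of a naive shift-reduce recognizer. It is an assumption made for illustration, not the method described above: instead of consulting a parsing table, it reduces whenever some right-hand side sits on top of the stack, trying longer right-hand sides first (so L -> L , S wins over L -> S). That heuristic happens to reproduce the trace above row for row and also handles Examples 1 and 2, but it is not a general parsing strategy. The function name `shift_reduce` and the grammar encoding are hypothetical.

```python
def shift_reduce(grammar, start, tokens):
    """Return (accepted, trace); each trace row is (stack, input, action)."""
    # Heuristic (assumption for illustration): try longer right-hand sides
    # first, so that L -> L , S is preferred over L -> S.
    rules = sorted(
        ((lhs, rhs) for lhs, alts in grammar.items() for rhs in alts),
        key=lambda rule: -len(rule[1]),
    )
    stack, buf, trace = ["$"], list(tokens) + ["$"], []
    while True:
        row = ("".join(stack), "".join(buf))
        # Accept: only the start symbol on the stack and the input is used up.
        if stack == ["$", start] and buf == ["$"]:
            trace.append(row + ("Accept",))
            return True, trace
        # Reduce: some right-hand side (the handle) is on top of the stack.
        match = next(
            ((lhs, rhs) for lhs, rhs in rules
             if len(stack) - 1 >= len(rhs) and stack[-len(rhs):] == rhs),
            None,
        )
        if match:
            lhs, rhs = match
            trace.append(row + (f"Reduce {lhs} -> {' '.join(rhs)}",))
            del stack[-len(rhs):]
            stack.append(lhs)
        elif buf != ["$"]:
            # Shift: move the next input symbol onto the stack.
            trace.append(row + ("Shift",))
            stack.append(buf.pop(0))
        else:
            # Error: no shift, reduce or accept is possible.
            trace.append(row + ("Error",))
            return False, trace

G1 = {"S": [["S", "+", "S"], ["S", "*", "S"], ["id"]]}
G2 = {"E": [["2", "E", "2"], ["3", "E", "3"], ["4"]]}
G3 = {"S": [["(", "L", ")"], ["a"]], "L": [["L", ",", "S"], ["S"]]}

for grammar, start, tokens in [
    (G1, "S", ["id", "+", "id", "+", "id"]),
    (G2, "E", list("32423")),
    (G3, "S", list("(a,(a,a))")),
]:
    accepted, trace = shift_reduce(grammar, start, tokens)
    for stack, rest, action in trace:
        print(f"{stack:<12}{rest:>14}   {action}")
    print("accepted" if accepted else "rejected", end="\n\n")
```

Real LR parsers replace the "longest handle" guess with ACTION/GOTO tables, which is what the SLR, CLR and LALR constructions produce.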
Advantages:
• Shift-reduce parsing is efficient and can handle a wide range of context-free
  grammars.
• It can parse a large variety of programming languages and is widely used in practice.
• It is capable of handling both left- and right-recursive grammars, which can be
  important in parsing certain programming languages.
• The parse table generated for shift-reduce parsing is typically small, which makes the
  parser efficient in terms of memory usage.
Disadvantages:
• Shift-reduce parsing has a limited lookahead, which means that it may miss some
  syntax errors that require a larger lookahead.
• It may also generate false-positive shift-reduce conflicts, which can require
  additional manual intervention to resolve.
• Shift-reduce parsers may have difficulty in parsing ambiguous grammars, where
  there are multiple possible parse trees for a given input sequence.
• In some cases, the parse tree generated by shift-reduce parsing may be more complex
  than other parsing techniques.
SLR Parser (with Examples)
An LR parser is an efficient bottom-up syntax analysis technique that can parse a large
class of context-free grammars; this technique is called LR(k) parsing.
L stands for left-to-right scanning of the input
R stands for constructing a rightmost derivation in reverse
k is the number of input symbols of lookahead; when k is omitted, it is assumed to be 1.
Advantages of LR parsing
• LR parsers handle context-free grammars. These grammars describe the structure of
  programming languages: how statements, expressions, and other language constructs
  fit together.
• LR parsers ensure that your code adheres to these rules.
• It is able to detect syntactic errors.
• It is an efficient, non-backtracking shift-reduce parsing method.
Types of LR Parsing Methods
• SLR parser
• LALR parser
• Canonical LR parser
SLR Parser
• SLR stands for Simple LR parser; it builds its table from LR(0) items and FOLLOW sets.
• It is the weakest of the three methods but the easiest to implement.
• A grammar for which an SLR parser can be constructed is called an SLR grammar.
Steps for constructing the SLR parsing table
1. Write the augmented grammar.
2. Find the LR(0) collection of items.
3. Find FOLLOW of the LHS of each production.
4. Define 2 functions: action[list of terminals] and goto[list of non-terminals] in the
   parsing table.
EXAMPLE – Construct the SLR parsing table for the given context-free grammar
S–>AA
A–>aA|b
Solution:
STEP1: Find augmented grammar
The augmented grammar of the given grammar is:-
S’–>.S [0th production]
S–>.AA [1st production]
A–>.aA [2nd production]
A–>.b [3rd production]
STEP2: Find LR(0) collection of items
Below is the figure showing the LR(0) collection of items. We will understand
everything one by one.
The terminals of this grammar are {a,b}.
The non-terminals of this grammar are {S,A}
RULE –
If any non-terminal has ‘ . ‘ preceding it, we have to write all its productions and add ‘ . ‘
at the beginning of each of those productions.
RULE –
From each state to the next state, the ‘ . ‘ shifts one place to the right.
• In the figure, I0 consists of the augmented grammar.
• I0 goes to I1 when the ‘ . ‘ of the 0th production is shifted to the right of S (S’->S.).
  This state is the accept state. S is seen by the compiler.
• I0 goes to I2 when the ‘ . ‘ of the 1st production is shifted right (S->A.A). A is
  seen by the compiler.
• I0 goes to I3 when the ‘ . ‘ of the 2nd production is shifted right (A->a.A). a is
  seen by the compiler.
• I0 goes to I4 when the ‘ . ‘ of the 3rd production is shifted right (A->b.). b is
  seen by the compiler.
• I2 goes to I5 when the ‘ . ‘ of the 1st production is shifted right (S->AA.). A is
  seen by the compiler.
• I2 goes to I4 when the ‘ . ‘ of the 3rd production is shifted right (A->b.). b is seen
  by the compiler.
• I2 goes to I3 when the ‘ . ‘ of the 2nd production is shifted right (A->a.A). a is
  seen by the compiler.
• I3 goes to I4 when the ‘ . ‘ of the 3rd production is shifted right (A->b.). b is
  seen by the compiler.
• I3 goes to I6 when the ‘ . ‘ of the 2nd production is shifted right (A->aA.). A is
  seen by the compiler.
• I3 goes to I3 when the ‘ . ‘ of the 2nd production is shifted right (A->a.A). a is
  seen by the compiler.
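The transitions above can be checked mechanically. The following Python sketch builds the canonical LR(0) collection with a `closure` function (the first RULE) and a `goto` function (the second RULE). It assumes items are encoded as (production index, dot position) pairs and that transitions are explored in the symbol order S, A, a, b, which is what makes the state numbers come out as I0 to I6; the names are illustrative, not taken from any library.

```python
# Augmented grammar from Step 1; an item is (production index, dot position).
GRAMMAR = [
    ("S'", ("S",)),      # 0: S' -> S
    ("S", ("A", "A")),   # 1: S -> AA
    ("A", ("a", "A")),   # 2: A -> aA
    ("A", ("b",)),       # 3: A -> b
]
NONTERMINALS = {"S'", "S", "A"}
SYMBOLS = ["S", "A", "a", "b"]  # exploration order, chosen to match I0..I6

def closure(items):
    """For a '.' before a non-terminal, add its productions with a leading '.'."""
    result = set(items)
    changed = True
    while changed:
        changed = False
        for p, dot in list(result):
            rhs = GRAMMAR[p][1]
            if dot < len(rhs) and rhs[dot] in NONTERMINALS:
                for q, (lhs, _) in enumerate(GRAMMAR):
                    if lhs == rhs[dot] and (q, 0) not in result:
                        result.add((q, 0))
                        changed = True
    return frozenset(result)

def goto(state, symbol):
    """Shift the '.' one place right over `symbol`, then take the closure."""
    moved = {(p, dot + 1) for p, dot in state
             if dot < len(GRAMMAR[p][1]) and GRAMMAR[p][1][dot] == symbol}
    return closure(moved) if moved else None

def canonical_collection():
    states, transitions = [closure({(0, 0)})], {}
    i = 0
    while i < len(states):
        for x in SYMBOLS:
            target = goto(states[i], x)
            if target is None:
                continue
            if target not in states:
                states.append(target)
            transitions[(i, x)] = states.index(target)
        i += 1
    return states, transitions

def show(item):
    lhs, rhs = GRAMMAR[item[0]]
    body = list(rhs)
    body.insert(item[1], ".")
    return f"{lhs}->{''.join(body)}"

states, transitions = canonical_collection()
for n, state in enumerate(states):
    print(f"I{n}:", ", ".join(sorted(show(item) for item in state)))
for (src, x), dst in sorted(transitions.items()):
    print(f"I{src} --{x}--> I{dst}")
```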
STEP3: Find FOLLOW of LHS of production
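FOLLOW(S) = { $ }
FOLLOW(A) = { a, b, $ }
(A is followed by A in S->AA, so FOLLOW(A) contains FIRST(A) = { a, b }; the second A
ends S->AA, so FOLLOW(A) also contains FOLLOW(S) = { $ }.)
To find FOLLOW of non-terminals, please read follow set in syntax analysis.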
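FIRST and FOLLOW can be computed by repeating the defining rules until no set grows any more. The Python sketch below is one common fixed-point formulation, offered as an illustration; it assumes a grammar is a dict from non-terminal to a list of right-hand sides, with [] standing for an ε-production, and the function names are hypothetical. On this grammar it reproduces FOLLOW(S) and FOLLOW(A) above.

```python
EPS = "ε"  # marker for the empty string inside FIRST sets

def first_of_seq(seq, first, nonterminals):
    """FIRST of a sequence of grammar symbols (may contain EPS)."""
    out = set()
    for sym in seq:
        if sym not in nonterminals:    # a terminal starts the string
            out.add(sym)
            return out
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:      # sym cannot vanish, so stop here
            return out
    out.add(EPS)                       # every symbol (or no symbol) can vanish
    return out

def first_follow(grammar, start):
    nts = set(grammar)
    first = {A: set() for A in nts}
    follow = {A: set() for A in nts}
    follow[start].add("$")             # $ follows the start symbol
    changed = True
    while changed:                     # repeat until no set grows
        changed = False
        for A, alternatives in grammar.items():
            for rhs in alternatives:
                f = first_of_seq(rhs, first, nts)
                if not f <= first[A]:
                    first[A] |= f
                    changed = True
                for i, B in enumerate(rhs):
                    if B not in nts:
                        continue
                    # FOLLOW(B) gets FIRST of whatever comes after B ...
                    rest = first_of_seq(rhs[i + 1:], first, nts)
                    new = rest - {EPS}
                    # ... plus FOLLOW(A) if everything after B can vanish.
                    if EPS in rest:
                        new |= follow[A]
                    if not new <= follow[B]:
                        follow[B] |= new
                        changed = True
    return first, follow

GRAMMAR = {"S": [["A", "A"]], "A": [["a", "A"], ["b"]]}
first, follow = first_follow(GRAMMAR, "S")
for nt in ("S", "A"):
    print(f"FOLLOW({nt}) =", sorted(follow[nt]))
```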
STEP 4: Define 2 functions: goto[list of non-terminals] and action[list of terminals] in the
parsing table. Below is the SLR parsing table.
• $ is the end-of-input marker, not a non-terminal; it has its own column in the action
  part, and the accept entry (acc) goes in that column of row 1.
• 0,1,2,3,4,5,6 denote I0,I1,I2,I3,I4,I5,I6.
• I0 gives A in I2, so 2 is added to the A column and row 0.
• I0 gives S in I1, so 1 is added to the S column and row 0.
• Similarly, 5 is written in the A column and row 2, and 6 is written in the A column and row 3.
• I0 gives a in I3, so S3 (shift 3) is added to the a column and row 0.
• I0 gives b in I4, so S4 (shift 4) is added to the b column and row 0.
• Similarly, S3 (shift 3) is added in the a column of rows 2 and 3, and S4 (shift 4) is added
  in the b column of rows 2 and 3.
• I4 is a reduce state as the ‘ . ‘ is at the end. I4 contains the 3rd production of the
  grammar (A->b.). The LHS of this production is A. FOLLOW(A) = a, b, $, so write
  r3 (reduce 3) in the columns of a, b, $ and row 4.
• I5 is a reduce state as the ‘ . ‘ is at the end. I5 contains the 1st production of the
  grammar (S->AA.). The LHS of this production is S.
  FOLLOW(S) = $, so write r1 (reduce 1) in the column of $ and row 5.
• I6 is a reduce state as the ‘ . ‘ is at the end. I6 contains the 2nd production of the
  grammar (A->aA.). The LHS of this production is A.
  FOLLOW(A) = a, b, $, so write r2 (reduce 2) in the columns of a, b, $ and row 6.
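Once the table is filled in, parsing is a simple loop over a stack of state numbers. This Python sketch hard-codes the SLR table described in Step 4 (the dictionary layout and the `lr_parse` name are assumptions) and drives it on an input string: sN pushes state N and consumes a token, rK pops as many states as production K has right-hand-side symbols and then follows goto on its LHS, and acc accepts.

```python
# SLR table from Step 4; productions 1: S->AA, 2: A->aA, 3: A->b.
PRODUCTIONS = {1: ("S", 2), 2: ("A", 2), 3: ("A", 1)}  # (LHS, length of RHS)
ACTION = {
    0: {"a": "s3", "b": "s4"},
    1: {"$": "acc"},
    2: {"a": "s3", "b": "s4"},
    3: {"a": "s3", "b": "s4"},
    4: {"a": "r3", "b": "r3", "$": "r3"},
    5: {"$": "r1"},
    6: {"a": "r2", "b": "r2", "$": "r2"},
}
GOTO = {0: {"S": 1, "A": 2}, 2: {"A": 5}, 3: {"A": 6}}

def lr_parse(tokens):
    """Drive the table; return (accepted, steps) with one step per action."""
    stack, buf, steps = [0], list(tokens) + ["$"], []
    while True:
        act = ACTION.get(stack[-1], {}).get(buf[0])   # blank cell -> None
        steps.append((list(stack), "".join(buf), act or "error"))
        if act is None:
            return False, steps
        if act == "acc":
            return True, steps
        if act.startswith("s"):              # shift: push state, consume token
            stack.append(int(act[1:]))
            buf.pop(0)
        else:                                # reduce by production number
            lhs, length = PRODUCTIONS[int(act[1:])]
            del stack[len(stack) - length:]  # pop one state per RHS symbol
            stack.append(GOTO[stack[-1]][lhs])

accepted, steps = lr_parse("abb")
for stack, rest, act in steps:
    print(f"{str(stack):<14}{rest:>6}   {act}")
print("accepted" if accepted else "rejected")
```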
APPLICATIONS GALORE:
• Compiler
• Data Validation
• Natural Language Processing (NLP)
• Protocol Parsing
CLR Parser (with Examples)
LR parsers :
An LR parser is an efficient bottom-up syntax analysis technique that can parse a large
class of context-free grammars; this technique is called LR(k) parsing.
L stands for the left to right scanning
R stands for rightmost derivation in reverse
k stands for no. of input symbols of lookahead
Advantages of LR parsing :
• It recognizes virtually all programming language constructs for which a CFG can be
  written.
• It is able to detect syntactic errors.
• It is an efficient, non-backtracking shift-reduce parsing method.
Types of LR parsing methods :
1. SLR
2. CLR
3. LALR
CLR Parser :
CLR stands for canonical LR parser. It is a more powerful LR parser. It makes
use of lookahead symbols. This method uses a large set of items called LR(1) items. The
main difference between LR(0) and LR(1) items is that LR(1) items can carry more
information in a state, which rules out useless reductions. This extra
information is incorporated into the state by the lookahead symbol. The general syntax
becomes [A->α.B, a]
where A->α.B is the production and a is a terminal or the right end marker $.
LR(1) items = LR(0) items + lookahead
How to add lookahead with the production?
CASE 1 –
A->α.BC, a
Suppose this is the 0th production. Now, since ‘ . ‘ precedes B, we have to write B’s
productions as well.
B->.D [1st production]
Suppose this is B’s production. Its lookahead is found by looking at the previous
production, i.e. the 0th production: we take FIRST of whatever comes after B, and that is
the lookahead of the 1st production. Here, in the 0th production, C comes after B.
Assume FIRST(C)=d; then the 1st production becomes
B->.D, d
(If C can derive ε, the lookahead a of the 0th production is added as well.)
CASE 2 –
Now if the 0th production was like this,
A->α.B, a
Here, we can see there’s nothing after B. So the lookahead of the 0th production will be the
lookahead of the 1st production, i.e.
B->.D, a
CASE 3 –
Assume a production A->a|b
A->.a, $ [0th production]
A->.b, $ [1st production]
Here, the 1st production is another alternative of the same non-terminal as the previous
production, so its lookahead will be the same as that of the previous production.
These are the rules of lookahead.
Steps for constructing CLR parsing table :
1. Write the augmented grammar.
2. Find the LR(1) collection of items.
3. Define 2 functions: action[list of terminals] and goto[list of non-terminals] in the
   CLR parsing table.
EXAMPLE
Construct a CLR parsing table for the given context-free grammar
S-->AA
A-->aA|b
Solution :
STEP 1 – Find augmented grammar
The augmented grammar of the given grammar is:-
S'-->.S ,$ [0th production]
S-->.AA ,$ [1st production]
A-->.aA ,a|b [2nd production]
A-->.b ,a|b [3rd production]
Let’s apply the rule of lookahead to the above productions
• The initial lookahead is always $.
• Now, the 1st production came into existence because of the ‘ . ‘ before ‘S’ in the 0th
  production. There is nothing after ‘S’, so the lookahead of the 0th production will be the
  lookahead of the 1st production, i.e. S–>.AA ,$
• Now, the 2nd production came into existence because of the ‘ . ‘ before ‘A’ in the 1st
  production. After ‘A’, there’s ‘A’, and FIRST(A) = {a, b}.
  Therefore, the lookahead for the 2nd production becomes a|b.
• Now, the 3rd production is another alternative of A, like the 2nd production, so the
  lookahead will be the same.
STEP 2 – Find LR(1) collection of items
Below is the figure showing the LR(1) collection of items. We will understand
everything one by one.
The terminals of this grammar are {a,b}
The non-terminals of this grammar are {S,A}
RULE-
1. If any non-terminal has ‘ . ‘ preceding it, we have to write all its productions and add ‘ . ‘
   at the beginning of each of those productions.
2. From each state to the next state, the ‘ . ‘ shifts one place to the right.
3. All the rules of lookahead apply here.
• In the figure, I0 consists of the augmented grammar.
• I0 goes to I1 when the ‘ . ‘ of the 0th production is shifted to the right of S (S’->S.).
  This state is the accept state. S is seen by the compiler. Since I1 is a part of the 0th
  production, the lookahead is the same, i.e. $.
• I0 goes to I2 when the ‘ . ‘ of the 1st production is shifted right (S->A.A). A is
  seen by the compiler. Since I2 is a part of the 1st production, the lookahead is the
  same, i.e. $.
• I0 goes to I3 when the ‘ . ‘ of the 2nd production is shifted right (A->a.A). a is
  seen by the compiler. Since I3 is a part of the 2nd production, the lookahead is the
  same, i.e. a|b.
• I0 goes to I4 when the ‘ . ‘ of the 3rd production is shifted right (A->b.). b is
  seen by the compiler. Since I4 is a part of the 3rd production, the lookahead is the
  same, i.e. a|b.
• I2 goes to I5 when the ‘ . ‘ of the 1st production is shifted right (S->AA.). A is
  seen by the compiler. Since I5 is a part of the 1st production, the lookahead is the
  same, i.e. $.
• I2 goes to I6 when the ‘ . ‘ of the 2nd production is shifted right (A->a.A). a is
  seen by the compiler. Since I6 is a part of the 2nd production, the lookahead is the
  same, i.e. $.
• I2 goes to I7 when the ‘ . ‘ of the 3rd production is shifted right (A->b.). b is seen
  by the compiler. Since I7 is a part of the 3rd production, the lookahead is the same,
  i.e. $.
• I3 goes to I3 when the ‘ . ‘ of the 2nd production is shifted right (A->a.A). a is
  seen by the compiler. Since I3 is a part of the 2nd production, the lookahead is the
  same, i.e. a|b.
• I3 goes to I4 when the ‘ . ‘ of the 3rd production is shifted right (A->b.). b is
  seen by the compiler. Since I4 is a part of the 3rd production, the lookahead is the
  same, i.e. a|b.
• I3 goes to I8 when the ‘ . ‘ of the 2nd production is shifted right (A->aA.). A is
  seen by the compiler. Since I8 is a part of the 2nd production, the lookahead is the
  same, i.e. a|b.
• I6 goes to I9 when the ‘ . ‘ of the 2nd production is shifted right (A->aA.). A is
  seen by the compiler. Since I9 is a part of the 2nd production, the lookahead is the
  same, i.e. $.
• I6 goes to I6 when the ‘ . ‘ of the 2nd production is shifted right (A->a.A). a is
  seen by the compiler. Since I6 is a part of the 2nd production, the lookahead is the
  same, i.e. $.
• I6 goes to I7 when the ‘ . ‘ of the 3rd production is shifted right (A->b.). b is
  seen by the compiler. Since I7 is a part of the 3rd production, the lookahead is the
  same, i.e. $.
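The LR(1) collection can be generated in the same way as an LR(0) collection, with lookaheads propagated by CASE 1 and CASE 2 of the lookahead rules. This Python sketch is an illustration under stated assumptions: items are (production index, dot position, lookahead) triples, FIRST(A) = {a, b} is hard-coded, and the `lookaheads` helper is only valid for grammars without ε-productions, which is enough for this example. Exploring symbols in the order S, A, a, b gives the same numbering I0 to I9 as the figure.

```python
# Augmented grammar; an LR(1) item is (production index, dot position, lookahead).
GRAMMAR = [
    ("S'", ("S",)),      # 0: S' -> S
    ("S", ("A", "A")),   # 1: S -> AA
    ("A", ("a", "A")),   # 2: A -> aA
    ("A", ("b",)),       # 3: A -> b
]
NONTERMINALS = {"S'", "S", "A"}
FIRST = {"S": {"a", "b"}, "A": {"a", "b"}}  # precomputed for this grammar
SYMBOLS = ["S", "A", "a", "b"]              # exploration order, matches I0..I9

def lookaheads(beta, la):
    """FIRST(beta la); valid only because this grammar has no ε-productions."""
    if not beta:                 # CASE 2: nothing after B, inherit la
        return {la}
    head = beta[0]               # CASE 1: FIRST of the symbol after B
    return FIRST[head] if head in NONTERMINALS else {head}

def closure(items):
    result, work = set(items), list(items)
    while work:
        p, dot, la = work.pop()
        rhs = GRAMMAR[p][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for b in lookaheads(rhs[dot + 1:], la):
                for q, (lhs, _) in enumerate(GRAMMAR):
                    if lhs == rhs[dot] and (q, 0, b) not in result:
                        result.add((q, 0, b))
                        work.append((q, 0, b))
    return frozenset(result)

def goto(state, x):
    # The lookahead is carried along unchanged when the '.' moves.
    moved = {(p, d + 1, la) for p, d, la in state
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == x}
    return closure(moved) if moved else None

states, transitions = [closure({(0, 0, "$")})], {}
i = 0
while i < len(states):
    for x in SYMBOLS:
        target = goto(states[i], x)
        if target is not None:
            if target not in states:
                states.append(target)
            transitions[(i, x)] = states.index(target)
    i += 1

def show(p, d, la):
    lhs, rhs = GRAMMAR[p]
    return f"[{lhs}->{''.join(rhs[:d])}.{''.join(rhs[d:])}, {la}]"

for n, state in enumerate(states):
    print(f"I{n}:", " ".join(sorted(show(*item) for item in state)))
```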
STEP 3 – Define 2 functions: action[list of terminals] and goto[list of non-terminals] in
the parsing table. Below is the CLR parsing table.
• $ is the end-of-input marker, not a non-terminal; it has its own column in the action
  part, and the accept entry (acc) goes in that column of row 1.
• 0,1,2,3,4,5,6,7,8,9 denote I0,I1,I2,I3,I4,I5,I6,I7,I8,I9.
• I0 gives A in I2, so 2 is added to the A column and row 0.
• I0 gives S in I1, so 1 is added to the S column and row 0.
• Similarly, 5 is written in the A column and row 2, 8 is written in the A column and
  row 3, and 9 is written in the A column and row 6.
• I0 gives a in I3, so S3 (shift 3) is added to the a column and row 0.
• I0 gives b in I4, so S4 (shift 4) is added to the b column and row 0.
• Similarly, S6 (shift 6) is added in the ‘a’ column of rows 2 and 6, S7 (shift 7) is added
  in the b column of rows 2 and 6, S3 (shift 3) is added in the ‘a’ column of row 3, and
  S4 (shift 4) is added in the b column of row 3.
• I4 is a reduce state as the ‘ . ‘ is at the end. I4 contains the 3rd production of the
  grammar, so write r3 (reduce 3) in its lookahead columns. The lookaheads of I4 are a
  and b, so write r3 in the a and b columns.
• I5 is a reduce state as the ‘ . ‘ is at the end. I5 contains the 1st production of the
  grammar, so write r1 (reduce 1) in its lookahead columns. The lookahead of I5 is $, so
  write r1 in the $ column.
• Similarly, write r3 in the $ column of row 7, r2 in the a and b columns of row 8, and r2
  in the $ column of row 9.
Construction of LL(1) Parsing Table
A top-down parser builds the parse tree from the top down, starting with the start non-
terminal. There are two types of Top-Down Parsers:
1. Top-Down Parser with Backtracking
2. Top-Down Parsers without Backtracking
Top-Down Parsers without backtracking can further be divided into two parts: Recursive
Descent and Non-Recursive Descent.
In this article, we are going to discuss Non-Recursive Descent, which is also known as the
LL(1) Parser.
LL(1) Parsing: Here the first L means that the input is scanned from left to right, and
the second L means that this parsing technique produces a leftmost derivation. Finally,
the 1 is the number of lookahead symbols, i.e. how many input symbols the parser looks
at when it has to make a decision.
Essential conditions to check first are as follows:
1. The grammar is free from left recursion.
2. The grammar should not be ambiguous.
3. The grammar has to be left-factored so that it is a deterministic grammar.
These conditions are necessary but not sufficient for a grammar to be LL(1).
Algorithm to construct LL(1) Parsing Table:
Step 1: First check all the essential conditions mentioned above and go to step 2.
Step 2: Calculate First() and Follow() for all non-terminals.
1. First(): If we derive all possible strings from a variable, the terminal symbols that
   can begin those strings form its First set.
2. Follow(): The terminal symbols that can immediately follow a variable in some
   derivation form its Follow set.
Step 3: For each production A –> α (read "A derives α"):
1. Find First(α) and, for each terminal in First(α), make the entry A –> α in the table.
2. If First(α) contains ε (epsilon), then find Follow(A) and, for each terminal in
   Follow(A), make the entry A –> α in the table (for a null production, A –> ε).
3. If First(α) contains ε and Follow(A) contains $, then make the entry A –> ε in the
   table under $.
To construct the parsing table, we use these two functions, First() and Follow().
In the table, rows will contain the Non-Terminals and the columns will contain the
Terminal Symbols. All the Null Productions of the grammar go under the Follow
elements, and the remaining productions lie under the elements of the First set.
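Step 3 translates almost directly into code. The Python sketch below is an illustration under stated assumptions: the grammar is a dict from non-terminal to a list of right-hand sides, [] is the ε-production, E’ and T’ are written with an ASCII apostrophe, and all function names are hypothetical. It computes First and Follow by fixed-point iteration, then places each production in the cells given by rules 1 to 3. Each cell keeps a list, so a cell with more than one production (a conflict) stays visible instead of being overwritten.

```python
EPS = "ε"  # marks the empty string in First sets

def first_of_seq(seq, first, nts):
    """First of a sequence of grammar symbols."""
    out = set()
    for sym in seq:
        if sym not in nts:             # terminal: it begins the string
            out.add(sym)
            return out
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:      # sym cannot vanish, stop here
            return out
    out.add(EPS)                       # the whole sequence can vanish
    return out

def first_follow(grammar, start):
    """Fixed-point computation of First() and Follow() for every non-terminal."""
    nts = set(grammar)
    first = {A: set() for A in nts}
    follow = {A: set() for A in nts}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for A, alternatives in grammar.items():
            for rhs in alternatives:
                f = first_of_seq(rhs, first, nts)
                if not f <= first[A]:
                    first[A] |= f
                    changed = True
                for i, B in enumerate(rhs):
                    if B in nts:
                        rest = first_of_seq(rhs[i + 1:], first, nts)
                        new = (rest - {EPS}) | (follow[A] if EPS in rest else set())
                        if not new <= follow[B]:
                            follow[B] |= new
                            changed = True
    return first, follow

def ll1_table(grammar, start):
    """Map (non-terminal, terminal) to the list of right-hand sides placed there."""
    first, follow = first_follow(grammar, start)
    nts = set(grammar)
    table = {}
    for A, alternatives in grammar.items():
        for rhs in alternatives:
            f = first_of_seq(rhs, first, nts)
            cells = f - {EPS}                # rule 1: terminals in First(α)
            if EPS in f:                     # rules 2 and 3: Follow(A), incl. $
                cells |= follow[A]
            for t in cells:
                table.setdefault((A, t), []).append(rhs)
    return table

def conflicts(table):
    """Cells holding more than one production mean the grammar is not LL(1)."""
    return {cell: rhss for cell, rhss in table.items() if len(rhss) > 1}

EXPR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],    # [] is the ε-production
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["id"], ["(", "E", ")"]],
}
table = ll1_table(EXPR, "E")
for (A, t), rhss in sorted(table.items()):
    print(f"M[{A}, {t}] =", " | ".join(f"{A} -> {' '.join(r) or EPS}" for r in rhss))
print("conflicts:", conflicts(table) or "none")
```

Running `ll1_table` on the grammars of Examples 2 and 3 reports the doubly filled cells M[S, a] and M[L’, )] respectively.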
Now, let’s understand with an example.
Example 1: Consider the Grammar:
E --> TE'
E' --> +TE' | ε
T --> FT'
T' --> *FT' | ε
F --> id | (E)
*ε denotes epsilon
   Step 1: The grammar satisfies all properties in step 1.
Step 2: Calculate first() and follow().
Find their First and Follow sets:
Production           First         Follow
E  –> TE’            { id, ( }     { $, ) }
E’ –> +TE’ / ε       { +, ε }      { $, ) }
T  –> FT’            { id, ( }     { +, $, ) }
T’ –> *FT’ / ε       { *, ε }      { +, $, ) }
F  –> id / (E)       { id, ( }     { *, +, $, ) }
Step 3: Make a parser table.
Now, the LL(1) Parsing Table is:
      id            +             *             (             )             $
E     E –> TE’                                  E –> TE’
E’                  E’ –> +TE’                                E’ –> ε       E’ –> ε
T     T –> FT’                                  T –> FT’
T’                  T’ –> ε       T’ –> *FT’                  T’ –> ε       T’ –> ε
F     F –> id                                   F –> (E)
As you can see, all the null productions are put under the Follow set of that symbol
and all the remaining productions lie under the First set of that symbol.
Note: Not every grammar is suitable for an LL(1) parsing table. It is possible that one
cell contains more than one production.
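With the table in hand, an LL(1) parser needs only a stack and one lookahead token. This Python sketch hard-codes the table above (the TABLE layout, the ASCII apostrophe in E' and T', and the `ll1_parse` name are assumptions). When the top of the stack is a non-terminal it looks up M[top, lookahead] and pushes the right-hand side in reverse; when it is a terminal it must match the lookahead. It returns the productions applied, which form a leftmost derivation, and raises SyntaxError on a blank cell or a mismatch.

```python
# LL(1) table from Example 1; a key (X, t) maps to the right-hand side of M[X, t].
# [] is the ε-production.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens, start="E"):
    """Return the productions applied (a leftmost derivation) or raise SyntaxError."""
    stack, buf, pos, used = ["$", start], list(tokens) + ["$"], 0, []
    while stack:
        top, look = stack.pop(), buf[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, look))
            if rhs is None:                       # blank cell: syntax error
                raise SyntaxError(f"no entry M[{top}, {look}]")
            used.append(f"{top} -> {' '.join(rhs) or 'ε'}")
            stack.extend(reversed(rhs))           # leftmost symbol ends up on top
        elif top == look:
            pos += 1                              # terminal (or final $) matched
        else:
            raise SyntaxError(f"expected {top}, found {look}")
    return used

for step in ll1_parse(["id", "+", "id", "*", "id"]):
    print(step)
```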
Example 2: Consider the Grammar
S --> A | a
A --> a
Step 1: The grammar does not satisfy all properties in step 1, as the grammar is
ambiguous. Still, let’s try to make the parser table and see what happens.
Step 2: Calculating first() and follow()
Find their First and Follow sets:
Production      First     Follow
S –> A / a      { a }     { $ }
A –> a          { a }     { $ }
Step 3: Make a parser table.
Parsing Table:
      a                   $
S     S –> A, S –> a
A     A –> a
Here, we can see that there are two productions in the same cell. Hence, this grammar is
not feasible for LL(1) Parser.
Trick – The above grammar is ambiguous, so it does not satisfy the essential
conditions. Hence we can say that this grammar is not feasible for an LL(1) parser even
without making the parse table.
Example 3: Consider the Grammar
S -> (L) | a
L -> SL'
L' -> )SL' | ε
Step1: The grammar satisfies all properties in step 1
Step 2: Calculating first() and follow()
        First        Follow
S       { (, a }     { $, ) }
L       { (, a }     { ) }
L’      { ), ε }     { ) }
Step 3: Making a parser table
Parsing Table:
      (             )                         a             $
S     S -> (L)                                S -> a
L     L -> SL’                                L -> SL’
L’                  L’ -> )SL’, L’ -> ε
Here, we can see that there are two productions in the same cell. Hence, this grammar is
not feasible for an LL(1) parser. Although the grammar satisfies all the essential
conditions in step 1, it is still not feasible for an LL(1) parser. We saw in example 2
that a grammar must satisfy these essential conditions, and in example 3 we saw that
those conditions are not sufficient for a grammar to be LL(1).
Advantages of Construction of LL(1) Parsing Table:
1.Deterministic Parsing: LL(1) parsing tables give a deterministic parsing process:
for a given input program and grammar, the next action is uniquely determined by the
current non-terminal on the stack and the lookahead token. This deterministic nature
simplifies the parsing algorithm and makes the parser's behavior unambiguous and
predictable.
2.Efficiency: LL(1) parsing tables allow efficient parsing of programming languages.
Once the parsing table is built, the parser decides the next parsing action by directly
indexing the table, which is a constant-time lookup. This efficiency is particularly
useful for large programs and can significantly reduce the time needed for parsing.
3.Predictive Parsing: LL(1) parsing tables support predictive parsing, where the
parsing action is determined solely by the current non-terminal and the lookahead
token, without any need for backtracking or guessing. This predictive nature makes the
LL(1) parsing algorithm straightforward to implement and reason about. It also
contributes to better error handling and recovery during parsing.
4.Error Detection: Constructing an LL(1) parsing table enables errors to be detected
efficiently. By examining the entries in the parsing table, one can identify conflicts,
such as multiple entries for the same non-terminal and lookahead combination. These
conflicts indicate ambiguities or errors in the grammar definition, allowing issues to
be detected and resolved early.
5.Non-Left Recursion: LL(1) parsing tables require the elimination of left recursion
from the grammar. Although left recursion is common in grammars, eliminating it yields
a more structured grammar. Constructing an LL(1) parsing table encourages the use of
non-left-recursive productions, which leads to clearer and more efficient parsing
algorithms.
6.Readability and Maintainability: LL(1) parsing tables are generally easy to
understand and maintain. The parsing table represents the whole parsing algorithm in a
tabular format, with clear mappings between non-terminal symbols, lookahead tokens, and
parsing actions. This tabular representation improves the readability of the parsing
algorithm and simplifies changes to the grammar, making it more maintainable over the
long term.
7.Language Design: Building an LL(1) parsing table plays a significant role in the
design and development of programming languages. LL(1) grammars are often preferred
because of their simplicity and predictability. By ensuring that a grammar is LL(1) and
building the corresponding parsing table, language designers can shape the syntax and
define the expected behavior of the language more effectively.
LALR Parser (with Examples)
   LALR Parser :
   LALR stands for lookahead LR parser. It is more powerful than the SLR parser and can
   handle large classes of grammars, although it is slightly less powerful than the CLR
   parser. The CLR parsing table is quite large compared to other parsing tables, and
   LALR reduces the size of this table. LALR works similarly to CLR. The only difference
   is that it combines similar states of the CLR parsing table (states with the same
   LR(0) items that differ only in their lookaheads) into one single state.
   The general syntax becomes [A->α.B, a]
   where A->α.B is the production and a is a terminal or the right end marker $.
   LR(1) items = LR(0) items + lookahead
How to add lookahead with the production?
CASE 1 –
A->α.BC, a
Suppose this is the 0th production. Now, since ‘ . ‘ precedes B, we have to write B’s
productions as well.
B->.D [1st production]
Suppose this is B’s production. Its lookahead is found by looking at the previous
production, i.e. the 0th production: we take FIRST of whatever comes after B, and that is
the lookahead of the 1st production. Here, in the 0th production, C comes after B.
Assume FIRST(C)=d; then the 1st production becomes
B->.D, d
(If C can derive ε, the lookahead a of the 0th production is added as well.)
CASE 2 –
Now if the 0th production was like this,
A->α.B, a
Here, we can see there’s nothing after B. So the lookahead of the 0th production will be the
lookahead of the 1st production, i.e.
B->.D, a
CASE 3 –
Assume a production A->a|b
A->.a, $ [0th production]
A->.b, $ [1st production]
Here, the 1st production is another alternative of the same non-terminal as the previous
production, so its lookahead will be the same as that of the previous production.
Steps for constructing the LALR parsing table :
1. Write the augmented grammar.
2. Find the LR(1) collection of items.
3. Define 2 functions: action[list of terminals] and goto[list of non-terminals] in the
   LALR parsing table.
EXAMPLE
Construct the LALR parsing table for the given context-free grammar
S-->AA
A-->aA|b
Solution:
    STEP1- Find augmented grammar
    The augmented grammar of the given grammar is:-
S'-->.S ,$ [0th production]
S-->.AA ,$ [1st production]
A-->.aA ,a|b [2nd production]
A-->.b ,a|b [3rd production]
Let’s apply the rule of lookahead to the above productions.
• The initial lookahead is always $.
• Now, the 1st production came into existence because of the ‘ . ‘ before ‘S’ in the 0th
  production. There is nothing after ‘S’, so the lookahead of the 0th production will be the
  lookahead of the 1st production, i.e. S–>.AA ,$
• Now, the 2nd production came into existence because of the ‘ . ‘ before ‘A’ in the 1st
  production. After ‘A’, there’s ‘A’, and FIRST(A) = {a, b}. Therefore, the lookahead of the
  2nd production becomes a|b.
• Now, the 3rd production is another alternative of A, like the 2nd production, so the
  lookahead will be the same.
    STEP2 – Find LR(1) collection of items
    Below is the figure showing the LR(1) collection of items. We will understand
    everything one by one.
• The terminals of this grammar are {a,b}
• The non-terminals of this grammar are {S,A}
RULES –
1. If any non-terminal has ‘ . ‘ preceding it, we have to write all its productions and add ‘ . ‘
   at the beginning of each of those productions.
2. From each state to the next state, the ‘ . ‘ shifts one place to the right.
• In the figure, I0 consists of the augmented grammar.
• I0 goes to I1 when the ‘ . ‘ of the 0th production is shifted to the right of S (S’->S.).
  This state is the accept state. S is seen by the compiler. Since I1 is a part of the 0th
  production, the lookahead is the same, i.e. $.
• I0 goes to I2 when the ‘ . ‘ of the 1st production is shifted right (S->A.A). A is
  seen by the compiler. Since I2 is a part of the 1st production, the lookahead is the
  same, i.e. $.
• I0 goes to I3 when the ‘ . ‘ of the 2nd production is shifted right (A->a.A). a is
  seen by the compiler. Since I3 is a part of the 2nd production, the lookahead is the
  same, i.e. a|b.
• I0 goes to I4 when the ‘ . ‘ of the 3rd production is shifted right (A->b.). b is seen
  by the compiler. Since I4 is a part of the 3rd production, the lookahead is the same,
  i.e. a|b.
• I2 goes to I5 when the ‘ . ‘ of the 1st production is shifted right (S->AA.). A is
  seen by the compiler. Since I5 is a part of the 1st production, the lookahead is the
  same, i.e. $.
• I2 goes to I6 when the ‘ . ‘ of the 2nd production is shifted right (A->a.A). a is
  seen by the compiler. Since I6 is a part of the 2nd production, the lookahead is the
  same, i.e. $.
• I2 goes to I7 when the ‘ . ‘ of the 3rd production is shifted right (A->b.). b is seen
  by the compiler. Since I7 is a part of the 3rd production, the lookahead is the same,
  i.e. $.
• I3 goes to I3 when the ‘ . ‘ of the 2nd production is shifted right (A->a.A). a is
  seen by the compiler. Since I3 is a part of the 2nd production, the lookahead is the
  same, i.e. a|b.
• I3 goes to I4 when the ‘ . ‘ of the 3rd production is shifted right (A->b.). b is seen
  by the compiler. Since I4 is a part of the 3rd production, the lookahead is the same,
  i.e. a|b.
• I3 goes to I8 when the ‘ . ‘ of the 2nd production is shifted right (A->aA.). A is
  seen by the compiler. Since I8 is a part of the 2nd production, the lookahead is the
  same, i.e. a|b.
• I6 goes to I9 when the ‘ . ‘ of the 2nd production is shifted right (A->aA.). A is
  seen by the compiler. Since I9 is a part of the 2nd production, the lookahead is the
  same, i.e. $.
• I6 goes to I6 when the ‘ . ‘ of the 2nd production is shifted right (A->a.A). a is
  seen by the compiler. Since I6 is a part of the 2nd production, the lookahead is the
  same, i.e. $.
• I6 goes to I7 when the ‘ . ‘ of the 3rd production is shifted right (A->b.). b is
  seen by the compiler. Since I7 is a part of the 3rd production, the lookahead is the
  same, i.e. $.
STEP 3 –
Define 2 functions: action[list of terminals] and goto[list of non-terminals] in the
parsing table. Below is the CLR parsing table.
Once we make a CLR parsing table, we can easily make an LALR parsing table from it.
In the step 2 diagram, we can see that
• I3 and I6 are similar except for their lookaheads.
• I4 and I7 are similar except for their lookaheads.
• I8 and I9 are similar except for their lookaheads.
In LALR parsing table construction, we merge these similar states.
• Wherever there is 3 or 6, make it 36 (combined form).
• Wherever there is 4 or 7, make it 47 (combined form).
• Wherever there is 8 or 9, make it 89 (combined form).
  Below is the LALR parsing table.
Now we have to remove the unwanted rows.
• As we can see, the 36 row has the same data twice, so we delete 1 row.
• We combine the two 47 rows into one by combining their values in a single 47 row.
• We combine the two 89 rows into one by combining their values in a single 89 row.
  The final LALR table looks like the below.
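The merging step can also be expressed in code. This Python sketch transcribes, from this example, the LR(0) cores of the CLR states (their items without lookaheads) and the CLR ACTION/GOTO entries, with r3 under $ in row 7 because I7 is [A->b., $]. It then groups states with identical cores, renames them 36, 47 and 89, and merges their rows. The dictionary layout and the `merge_lalr` name are assumptions; any cell that receives two different actions is reported as a conflict, and for this grammar there are none.

```python
# LR(0) cores of the CLR states I0..I9 (items without their lookaheads).
CORES = {
    0: {"S'->.S", "S->.AA", "A->.aA", "A->.b"},
    1: {"S'->S."},
    2: {"S->A.A", "A->.aA", "A->.b"},
    3: {"A->a.A", "A->.aA", "A->.b"},
    4: {"A->b."},
    5: {"S->AA."},
    6: {"A->a.A", "A->.aA", "A->.b"},
    7: {"A->b."},
    8: {"A->aA."},
    9: {"A->aA."},
}
# CLR table: productions 1: S->AA, 2: A->aA, 3: A->b.
ACTION = {
    0: {"a": "s3", "b": "s4"}, 1: {"$": "acc"},
    2: {"a": "s6", "b": "s7"}, 3: {"a": "s3", "b": "s4"},
    4: {"a": "r3", "b": "r3"}, 5: {"$": "r1"},
    6: {"a": "s6", "b": "s7"}, 7: {"$": "r3"},
    8: {"a": "r2", "b": "r2"}, 9: {"$": "r2"},
}
GOTO = {0: {"S": 1, "A": 2}, 2: {"A": 5}, 3: {"A": 8}, 6: {"A": 9}}

def merge_lalr(cores, action, goto):
    # Group CLR states whose LR(0) cores are identical, e.g. 3 and 6 -> "36".
    groups = {}
    for state, core in cores.items():
        groups.setdefault(frozenset(core), []).append(state)
    name = {}
    for members in groups.values():
        label = "".join(str(s) for s in sorted(members))
        for s in members:
            name[s] = label

    def rename(act):
        # Shift targets are renamed; reduce and accept entries stay as they are.
        return "s" + name[int(act[1:])] if act.startswith("s") else act

    new_action, new_goto, clashes = {}, {}, []
    for state in sorted(cores):
        row = new_action.setdefault(name[state], {})
        for term, act in action.get(state, {}).items():
            act = rename(act)
            if term in row and row[term] != act:
                clashes.append((name[state], term, row[term], act))
            row[term] = act
        for nt, target in goto.get(state, {}).items():
            new_goto.setdefault(name[state], {})[nt] = name[target]
    return new_action, new_goto, clashes

new_action, new_goto, clashes = merge_lalr(CORES, ACTION, GOTO)
for state, row in new_action.items():
    print(state, row, new_goto.get(state, {}))
print("conflicts:", clashes or "none")
```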