KEMBAR78
UNIT-2: Parsing | PDF | Parsing | Metalogic
0% found this document useful (0 votes)
78 views18 pages

UNIT-2: Parsing

Parsing, or syntactic analysis, is the process of analyzing a sequence of tokens to determine the grammatical structure of a program, resulting in a parse tree that aids in further processing. There are two main types of parsing: top-down and bottom-up, each with its own methods and classifications, including recursive descent and operator precedence parsing. The document outlines the roles of parsers, their types, and the conditions for LL(1) grammar, along with examples and advantages and disadvantages of different parsing techniques.

Uploaded by

c3421338
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
78 views18 pages

UNIT-2: Parsing

Parsing, or syntactic analysis, is the process of analyzing a sequence of tokens to determine the grammatical structure of a program, resulting in a parse tree that aids in further processing. There are two main types of parsing: top-down and bottom-up, each with its own methods and classifications, including recursive descent and operator precedence parsing. The document outlines the roles of parsers, their types, and the conditions for LL(1) grammar, along with examples and advantages and disadvantages of different parsing techniques.

Uploaded by

c3421338
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 18

UNIT-2

Parsing, also known as syntactic analysis is the process of analyzing a sequence


of tokens to determine the grammatical structure of a program. It takes the stream of
tokens, which are generated by a lexical analyzer or tokenizer, and organizes them into a
parse tree or syntax tree.
The parse tree visually represents how the tokens fit together according to the rules of
the language’s syntax. This tree structure is crucial for understanding the program’s
structure and helps in the next stages of processing, such as code generation or
execution. Additionally, parsing ensures that the sequence of tokens follows the
syntactic rules of the programming language, making the program valid and ready for
further analysis or execution.

What is the Role of Parser?

A parser performs syntactic and semantic analysis of source code, converting it into an
intermediate representation while detecting and handling errors.
1. Context-free syntax analysis: The parser checks if the structure of the code follows
the basic rules of the programming language (like grammar rules). It looks at how
words and symbols are arranged.
2. Guides context-sensitive analysis: It helps with deeper checks that depend on the
meaning of the code, like making sure variables are used correctly. For example, it
ensures that a variable used in a mathematical operation, like x + 2, is a number and
not text.
3. Constructs an intermediate representation: The parser creates a simpler version of
your code that’s easier for the computer to understand and work with.
4. Produces meaningful error messages: If there’s something wrong in your code, the
parser tries to explain the problem clearly so you can fix it.
5. Attempts error correction: Sometimes, the parser tries to fix small mistakes in your
code so it can keep working without breaking completely.

Types of Parsing
The parsing is divided into two types, which are as follows:
 Top-down Parsing
 Bottom-up Parsing

Top-Down Parsing
Top-down parsing is a method of building a parse tree from the start symbol (root)
down to the leaves (end symbols). The parser begins with the highest-level rule and
works its way down, trying to match the input string step by step.
 Process: The parser starts with the start symbol and looks for rules that can help it
rewrite this symbol. It keeps breaking down the symbols (non-terminals) into
smaller parts until it matches the input string.
 Leftmost Derivation: In top-down parsing, the parser always chooses the leftmost
non-terminal to expand first, following what is called leftmost derivation. This means
the parser works on the left side of the string before moving to the right.
 Other Names: Top-down parsing is sometimes called recursive parsing or predictive
parsing. It is called recursive because it often uses recursive functions to process the
symbols.
Top-down parsing is useful for simple languages and is often easier to implement.
However, it can have trouble with more complex or ambiguous grammars.
Top-down parsers can be classified into two types based on whether they use
backtracking or not:
1. Top-down Parsing with Backtracking
In this approach, the parser tries different possibilities when it encounters a choice, If
one possibility doesn’t work (i.e., it doesn’t match the input string), the parser
backtracks to the previous decision point and tries another possibility.
Example: If the parser chooses a rule to expand a non-terminal, and it doesn’t work, it
will go back, undo the choice, and try a different rule.
Advantage: It can handle grammars where there are multiple possible ways to expand a
non-terminal.
Disadvantage: Backtracking can be slow and inefficient because the parser might have
to try many possibilities before finding the correct one.

2. Top-down Parsing without Backtracking


In this approach, the parser does not backtrack. It tries to find a match with the input
using only the first choice it makes, If it doesn’t match the input, it fails immediately
instead of going back to try another option.
Example: The parser will always stick with its first decision and will not reconsider
other rules once it starts parsing.
Advantage: It is faster because it doesn’t waste time going back to previous steps.
Disadvantage: It can only handle simpler grammars that don’t require trying multiple
choices.

Recursive descent parser –

Recursive descent parser is a top-down parser that processes input based on a set of
recursive functions, where each function corresponds to a grammar rule. It parses the
input from left to right, constructing a parse tree by matching the grammar’s production
rules. This parser is simple to implement and is suitable for LL(1) grammars, where
decisions can be made based on a single lookahead token. While straightforward,
recursive descent parsers struggle with left-recursive grammars and may require
grammar transformations to handle such cases effectively.
A Predictive Parser is a special case of Recursive Descent Parser, where no Back Tracking
is required.
By carefully writing a grammar, means eliminating left recursion and left factoring from
it, the resulting grammar will be a grammar that can be parsed by a recursive descent
parser.
By carefully writing a grammar means eliminating left recursion and left factoring from
it, the resulting grammar will be a grammar that can be parsed by a recursive descent
parser.

Example:
Before removing left recursion After removing left recursion

E –> T E’
E –> E + T | T E’ –> + T E’ | e
T –> T * F | F T –> F T’
F –> ( E ) | id T’ –> * F T’ | e
F –> ( E ) | id

Important points about recursive descent parsers


1. Top-Down Parsing: It starts from the start symbol and recursively applies grammar
rules to break down the input.
2. One Function per Non-Terminal: Each grammar rule has a corresponding function
in the parser, making the implementation straightforward.
3. Uses Recursion: The parser calls functions within themselves to process different
parts of the input, matching the recursive nature of grammar rules.
4. Works Best with LL(1) Grammars: It’s most effective for grammars that can be
parsed with a single token lookahead, typically simple, non-left-recursive grammars.
5. Easy to Implement: The structure is easy to follow and implement, making it a good
choice for small compilers or interpreters.
6. Error Handling: It can detect syntax errors and report them, making it useful for
debugging input strings.

What is LL(1) Parsing?

Here the 1st L represents that the scanning of the Input will be done from the Left to
Right manner and the second L shows that in this parsing technique, we are going to use
the Left most Derivation Tree. And finally, the 1 represents the number of look-ahead,
which means how many symbols you will see when you want to make a decision.
Conditions for an LL(1) Grammar
To construct a working LL(1) parsing table, a grammar must satisfy these conditions:
 No Left Recursion: Avoid recursive definitions like A -> A + b.
 Unambiguous Grammar: Ensure each string can be derived in only one way.
 Left Factoring: Make the grammar deterministic, so the parser can proceed without
guessing.

Algorithm to Construct LL(1) Parsing Table

Step 1: First check all the essential conditions mentioned above and go to step 2.

Step 2: Calculate First() and Follow() for all non-terminals.


1. First(): If there is a variable, and from that variable, if we try to drive all the strings
then the beginning Terminal Symbol is called the First.
2. Follow(): What is the Terminal Symbol which follows a variable in the process of
derivation.
Step 3: For each production A –> α. (A tends to alpha)
1. Find First(α) and for each terminal in First( α), make entry A –> α in the table.
2. If First(α) contains ε (epsilon) as terminal, then find the Follow(A) and for each
terminal in Follow(A), make entry A –> ε in the table.
3. If the First(α) contains ε and Follow(A) contains $ as terminal, then make entry A –>
ε in the table for the $.
To construct the parsing table, we have two functions:
In the table, rows will contain the Non-Terminals and the column will contain the
Terminal Symbols. All the Null Productions of the Grammars will go under the Follow
elements and the remaining productions will lie under the elements of the First set.

Example 1: Consider the Grammar:


E --> TE'
E' --> +TE' | ε
T --> FT'
T' --> *FT' | ε
F --> id | (E)

*ε denotes epsilon
Step 1: The grammar satisfies all properties in step 1.
Step 2: Calculate first() and follow().
Find their First and Follow sets:
First Follow

E –> TE’ { id, ( } { $, ) }

E’ –> +TE’/ { +, ε } { $, ) }
First Follow

T –> FT’ { id, ( } { +, $, ) }

T’ –> *FT’/
{ *, ε } { +, $, ) }
ε

F –> id/(E) { id, ( } { *, +, $, ) }

Step 3: Make a parser table.


Now, the LL(1) Parsing Table is:

id + * ( ) $

E E –> TE’ E –> TE’

E’ –> E’ –>
E’ E’ –> +TE’
ε ε

T T –> FT’ T –> FT’

T’ –> T’ –>
T’ T’ –> ε T’ –> *FT’
ε ε

F F –> id F –> (E)

As you can see that all the null productions are put under the Follow set of that symbol
and all the remaining productions lie under the First of that symbol.

Bottom-Up Parsing:
Bottom-up parsing is a method of building a parse tree starting from the leaf nodes (the
input symbols) and working towards the root node (the start symbol). The goal is to
reduce the input string step by step until we reach the start symbol, which represents
the entire language.
 Process: The parser begins with the input symbols and looks for patterns that can be
reduced to non-terminals based on the grammar rules. It keeps reducing parts of the
string until it forms the start symbol.
 Rightmost Derivation in Reverse: In bottom-up parsing, the parser traces the
rightmost derivation of the string but works backwards, starting from the input
string and moving towards the start symbol.
 Shift-Reduce Parsing: Bottom-up parsers are often called shift-reduce parsers
because they shift (move symbols) and reduce (apply rules to replace symbols) to
build the parse tree.
Bottom-up parsing is efficient for handling more complex grammars and is commonly
used in compilers. However, it can be more challenging to implement compared to top-
down parsing.

Generally, bottom-up parsing is categorized into the following types:

Difference Between Bottom-Up and Top-Down Parser


Feature Top-down Parsing Bottom-up Parsing

Builds tree from leaves to


Direction Builds tree from root to leaves.
root.

Uses rightmost derivation in


Derivation Uses leftmost derivation.
reverse.

Can be slower, especially with More efficient for complex


Efficiency
backtracking. grammars.

Example
Recursive descent, LL parser. Shift-reduce, LR parser.
Parsers

Bottom-up parsing- is a type of syntax analysis method where the parser starts
from the input symbols (tokens) and attempts to reduce them to the start symbol of the
grammar (usually denoted as S). The process involves applying production rules in
reverse, starting from the leaves of the parse tree and working upwards toward the root.
Here’s how it works:
Start with tokens: The parser begins with the terminal symbols (the input tokens),
which are the leaves of the parse tree.

Shift and reduce: The parser repeatedly applies two actions:


 Shift: The next token is pushed onto a stack.
 Reduce: A sequence of symbols on the stack is replaced by a non-terminal according
to the production rules of the grammar. This step is called “reduction,” where the
parser replaces the right-hand side of a production with the left-hand side non-
terminal.
Repeat until root: The process of shifting and reducing continues until the entire input
is reduced to the start symbol, indicating the sentence has been successfully parsed.

Example:
1. E → T
2. T → T * F
3. T → id
4. F → T
5. F → id

input string: “id * id”

Example:

Consider the following simple grammar:


1. S→A+B
2. A→3
3. B→5
Now, let’s parse the string “3 + 5”:
 First, we start with the input string: “3 + 5”.
 We look for parts of the string that match a production rule.
 We see that “3” matches A, so we replace it with A (so now we have A+5).
 Next, “5” matches B, so we replace it with B (now we have A+B).
 Finally, A+B matches the production S→A+B, so we replace A+B with S.
Now, we’ve reduced the entire input to the start symbol S, meaning the input has been
successfully parsed.

Classification of Bottom-up Parsers

A bottom-up parser is often referred to as a shift-reduce parser.

A shift-reduce parser has just four canonical actions:


 shift — next input symbol is shifted onto the top of the stack.
 reduce — right end of handle is on top of stack; locate left end of handle within the
stack; pop handle off stack and push appropriate non-terminal LHS.
 accept — terminate parsing and signal success.
 error — call an error recovery routine.

Operator Precedence Parsing-


Operator precedence parsing is a type of bottom-up parsing used to parse expressions
based on operator precedence relations. It is suitable for grammars where operators
have clear precedence and associativity, such as arithmetic expressions.

Operator Precedence Relations-

Operator precedence parsers rely on three relations between terminal symbols


(operators) to determine the parsing action:
 Less than ( <· ) → Operator has lower precedence than the next.
 Greater than ( ·> ) → Operator has higher precedence than the next.
 Equal to ( = ) → Operators have the same precedence (e.g., parentheses matching).
These relations help in deciding when to shift or reduce during parsing.

Defining Precedence Relations-

The precedence relations are defined using the following rules-

 Rule-01: If precedence of b is higher than precedence of a, then we define a < b


 If precedence of b is same as precedence of a, then we define a = b
 If precedence of b is lower than precedence of a, then we define a > b

Rule-02:

 An identifier is always given the higher precedence than any other symbol.
 $ symbol is always given the lowest precedence.

Rule-03:

 If two operators have the same precedence, then we go by checking their


associativity.

Parsing A Given String-

The given input string is parsed using the following steps-


Step-01:

Insert the following-

 $ symbol at the beginning and ending of the input string.


 Precedence operator between every two symbols of the string by referring the
operator precedence table.

Step-02:

 Start scanning the string from LHS in the forward direction until > symbol is
encountered.
 Keep a pointer on that location.

Step-03:

 Start scanning the string from RHS in the backward direction until < symbol is
encountered.
 Keep a pointer on that location.

Step-04:

 Everything that lies in the middle of < and > forms the handle.
 Replace the handle with the head of the respective production.

Step-05:

Keep repeating the cycle from Step-02 to Step-04 until the start symbol is reached.

Advantages-

The advantages of operator precedence parsing are-

 The implementation is very easy and simple.


 The parser is quite powerful for expressions in programming languages.

Disadvantages-

The disadvantages of operator precedence parsing are-

 The handling of tokens known to have two different precedence becomes difficult.
 Only small class of grammars can be parsed using this parser.

Operator Precedence Table-

A table defining precedence relationships among operators is required for the parser to
function. Example:
Operator + * ( ) $

+ ·> <· <· ·> ·>

* ·> ·> <· ·> ·>

( <· <· <· = error

) ·> ·> error ·> ·>

$ <· <· <· error accept

 $ represents the end of input.


 Shift occurs when the relation is <· (lower precedence).
 Reduce occurs when the relation is ·> (higher precedence).

LR Parsing-
LR Parser is a type of bottom-up parsing. It is used to parse context-free grammar,
primarily used by compilers. The LR parser reads the input from left to right and produces
the rightmost derivation.
The LR parsing can be further divided into four types-
The LR parser is denoted as LR(k), where the letter “L” refers to the left to right scanning of
the input, the letter “R” refers to the rightmost derivation in reverse, and “k” refers to the
number of unconsumed “look ahead” input symbols that are used in making parser
decisions. The letter k is usually one and is frequently omitted. A context-free grammar is
called LR(k) if the LR(k) parser exists.

LR Algorithm
The LR (Left-to-Right) algorithm, commonly referred to in the context of parsing, is a
technique used to analyze a sequence of input symbols to determine its grammatical
structure with respect to a given formal grammar. It is widely used in the construction of
compilers and interpreters.
1. Initialization: Start with an empty stack and place a special end-of-input marker ($)
at the bottom. Place the start symbol of the grammar on top of the stack.
2. Read Input: Read the input string from left to right, symbol by symbol.
3. Shift and Reduce Operations: Shift, move the current input symbol to the stack and
advance to the next input symbol. Reduce, if the stack matches the right-hand side of
a production rule, replace it with the left-hand side non-terminal.
4. Parser Actions: Use a parsing table to decide whether to shift, reduce, accept, or
error. Accept, if the stack contains the start symbol and the input is fully consumed,
the string is successfully parsed. Error, if no valid action exists for the current stack
and input symbol, report a syntax error.
5. Termination: The algorithm terminates when the input is successfully parsed (accept
state) or a syntax error is encountered.

Block Diagram
Below described is the block diagram of the LR parser.
The input buffer takes one symbol at a time from left to right a1, a2,..., am,$.
The stack contains the combination of the state symbol and the current input symbol.
The parsing table is a two-dimensional array containing the Action and GoTo tables.

SLR Parser
 LR parser is also called as SLR parser
 it is weakest of the three methods but easier to implement
 a grammar for which SLR parser can be constructed is called SLR grammar
Steps for constructing the SLR parsing table
1. Writing augmented grammar
2. LR(0) collection of items to be found
3. Find FOLLOW of LHS of production
4. Defining 2 functions: goto[list of terminals] and action[list of non-terminals] in the
parsing table

EXAMPLE – Construct LR parsing table for the given context-free grammar


S–>AA
A–>aA|b

Solution:
STEP1: Find augmented grammar
The augmented grammar of the given grammar is:-
S’–>.S [0th production]
S–>.AA [1st production]
A–>.aA [2nd production]
A–>.b [3rd production]

STEP2: Find LR(0) collection of items

Below is the figure showing the LR(0) collection of items.

The terminals of this grammar are {a,b}.


The non-terminals of this grammar are {S,A}
RULE –
If any non-terminal has ‘ . ‘ preceding it, we have to write all its production and add ‘ . ‘
preceding each of its production.
RULE –
from each state to the next state, the ‘ . ‘ shifts to one place to the right.
 In the figure, I0 consists of augmented grammar.
 Io goes to I1 when ‘ . ‘ of 0th production is shifted towards the right of S(S’->S.). this
state is the accepted state. S is seen by the compiler.
 Io goes to I2 when ‘ . ‘ of 1st production is shifted towards right (S->A.A) . A is seen
by the compiler
 I0 goes to I3 when ‘ . ‘ of the 2nd production is shifted towards right (A->a.A) . a is
seen by the compiler.
 I0 goes to I4 when ‘ . ‘ of the 3rd production is shifted towards right (A->b.) . b is
seen by the compiler.
 I2 goes to I5 when ‘ . ‘ of 1st production is shifted towards right (S->AA.) . A is seen
by the compiler
 I2 goes to I4 when ‘ . ‘ of 3rd production is shifted towards right (A->b.) . b is seen by
the compiler.
 I2 goes to I3 when ‘ . ‘ of the 2nd production is shifted towards right (A->a.A) . a is
seen by the compiler.
 I3 goes to I4 when ‘ . ‘ of the 3rd production is shifted towards right (A->b.) . b is
seen by the compiler.
 I3 goes to I6 when ‘ . ‘ of 2nd production is shifted towards the right (A->aA.) . A is
seen by the compiler
 I3 goes to I3 when ‘ . ‘ of the 2nd production is shifted towards right (A->a.A) . a is
seen by the compiler.

STEP3: Find FOLLOW of LHS of production

FOLLOW(S)=$
FOLLOW(A)=a,b,$

STEP 4: Defining 2 functions:goto[list of non-terminals] and action[list of terminals] in


the parsing table. Below is the SLR parsing table.
 $ is by default a nonterminal that takes accepting state.
 0,1,2,3,4,5,6 denotes I0,I1,I2,I3,I4,I5,I6
 I0 gives A in I2, so 2 is added to the A column and 0 rows.
 I0 gives S in I1,so 1 is added to the S column and 1 row.
 similarly 5 is written in A column and 2 row, 6 is written in A column and 3 row.
 I0 gives a in I3 .so S3(shift 3) is added to a column and 0 row.
 I0 gives b in I4 .so S4(shift 4) is added to the b column and 0 row.
 Similarly, S3(shift 3) is added on a column and 2,3 row ,S4(shift 4) is added on b
column and 2,3 rows.
 I4 is reduced state as ‘ . ‘ is at the end. I4 is the 3rd production of grammar(A–
>.b).LHS of this production is A. FOLLOW(A)=a,b,$ . write r3(reduced 3) in the
columns of a,b,$ and 4th row
 I5 is reduced state as ‘ . ‘ is at the end. I5 is the 1st production of grammar(S–>.AA).
LHS of this production is S.
FOLLOW(S)=$ . write r1(reduced 1) in the column of $ and 5th row
 I6 is a reduced state as ‘ . ‘ is at the end. I6 is the 2nd production of grammar( A–
>.aA). The LHS of this production is A.
FOLLOW(A)=a,b,$ . write r2(reduced 2) in the columns of a,b,$ and 6th row

Youtube Links-

https://youtu.be/WTxdKQmsfho?si=pB0qV4s-htUIwlVq

https://youtu.be/4NWgdNquqEo?si=-D9gizQgi6l-o1rU

https://youtu.be/qTJR9Jw_i2I?si=Uy1vOEiWMJZYDhHG

You might also like