KEMBAR78
Chapter 04 Lexical and Syntax Analysis | PDF | Parsing | Linguistics
0% found this document useful (0 votes)
15 views43 pages

Chapter 04 Lexical and Syntax Analysis

Chapter 4 discusses lexical and syntax analysis, explaining the role of lexical analyzers in identifying lexemes and tokens for syntax analyzers. It covers derivations, including leftmost and rightmost derivations, and introduces parsing techniques such as top-down and bottom-up parsing. The chapter also details the shift-reduce parsing method, illustrating how strings are processed to derive the start symbol.

Uploaded by

nomanur rahman
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
15 views43 pages

Chapter 04 Lexical and Syntax Analysis

Chapter 4 discusses lexical and syntax analysis, explaining the role of lexical analyzers in identifying lexemes and tokens for syntax analyzers. It covers derivations, including leftmost and rightmost derivations, and introduces parsing techniques such as top-down and bottom-up parsing. The chapter also details the shift-reduce parsing method, illustrating how strings are processed to derive the start symbol.

Uploaded by

nomanur rahman
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 43

CHAPTER - 04

LEXICAL AND SYNTAX ANALYZER

Spring 2018
Lexical Analysis
A lexical analyzer is essentially a pattern matcher.
2

A pattern matcher attempts to find a substring of a


given string of characters that matches a given

character pattern

A lexical analyzer serves as the front end of a syntax


analyzer.

Lexical analyzers are subprograms that


locate the next lexeme in the input,

determine its associated token code, and


return them to the caller, which is the syntax analyzer


IUBAT CSC 461 May 1, 2025


Lexical-analysis process
Skipping comments and white space
3

Inserts lexemes for user-defined names


into the symbol table

 Detect syntactic errors in tokens

IUBAT CSC 461 May 1, 2025


Leftmost and Rightmost Derivations

S → ABC,
4

A → aA,

A→λ
B → bB,
B→λ
C → cC,
C→λ
 With this grammar, there is a choice of variables to
expand. Here is a sample derivation:
 S ABC aABC aABcC aBcC abBcC
abBc abbBc abbc

IUBAT CSC 461 May 1, 2025


Leftmost Derivations
5

 If we always expanded the leftmost


variable first, we would have a leftmost
derivation:

 S ABC aABC aBC abBC


abbBC abbC abbcC abbc

S → ABC, A → aA, A→
Grammar is -

λ
B → bB, B→λ
C → cC, C→λ
IUBAT CSC 461 May 1, 2025
Rightmost Derivations
6

 Conversely, if we always expanded the


rightmost variable first, we would have
a rightmost derivation:

 S ABC ABcC ABc AbBc


AbbBc Abbc aAbbc abbc

S → ABC, A → aA, A→
Grammar is -

λ
B → bB, B→λ
C → cC, C→λ
IUBAT CSC 461 May 1, 2025
Syntax Analysis
Syntax Analysis is often called parsing
7

 Discovers the derivation of a string


 Two major approaches
Top-down parsing
Bottom-up parsing

IUBAT CSC 461 May 1, 2025


Top-Down Parsing
Start with the root of the parse tree
8

Root of the tree: node labeled with the start symbol


Algorithm:
Repeat until the fringe of the parse tree matches input

string
If a terminal symbol is added that doesn’t match,
backtrack

 Find the next node to be expanded

Leaves of parse tree match input string (success)

All productions exhaustedIUBAT


in backtracking

(failure)
 CSC 461 May 1, 2025
Example
 Expression grammar
9

#
(with precedence) Production rule
1 expr → expr + term
2 | expr - term
3 | term
4 term → term * factor
5 | term /
6 factor
7 | factor
8 factor → number
| identifier

 Input string x – 2 * y
IUBAT CSC 461 May 1, 2025
Example Current position in
the input stream
10

Rule Sentential form Input string


- expr  x - 2 * y expr

1 expr + term  x - 2 * y
3 term + term  x – 2 * y
6 factor + term  x – 2 * y expr + term
8 <id> + term x  – 2 * y
- <id,x> + term x  – 2 * y
term
# Production rule
1 expr → expr + term
2 | expr - term
fact
3 | term
4 term → term * factor
5 | term / factor x
Can’t match next terminal6 | factor
 Problem:

7 factor → number

8 | identifier
 We guessed wrong at step 2
IUBAT CSC 461 May 1, 2025
Backtracking
11

Rule Sentential form Input string


- expr  x - 2 * y
2 expr + term  x - 2 * y
3 term + term  x – 2 * y Undo all these
6 factor + term  x – 2 * y productions
8 <id> + term x  – 2 * y
? <id,x> + term x  – 2 * y

 Rollback productions
 Choose a different production for expr
 Continue
IUBAT CSC 461 May 1, 2025
Retrying
12

Rule Sentential form Input string expr


- expr  x - 2 * y
2 expr - term  x - 2 * y
3 term - term  x – 2 * y expr - term
6 factor - term  x – 2 * y
8 <id> - term x  – 2 * y
 term fact
- <id,x> - term x – 2 * y
6 <id,x> - factor x –  2 * y
7 <id,x> - <num> x – 2  * y fact 2
# Production rule
1 expr → expr + term
2 | expr -
x
3 term
 Problem: 4 | term
5 term → term * factor
 More input to read
6 | term /
 Another cause of backtracking factor
7
| factor
8
IUBAT
factor → numberCSC 461 May 1, 2025
| identifier
# Production rule

1 expr → expr + term

Successful Parse
2 | expr - term
3 | term
4 term → term * factor
5 | term / factor
6 | factor
7 factor → number
13 8 | identifier

Rule Sentential form Input string expr


- expr  x - 2 * y
2 expr - term  x - 2 * y
3 term - term  x – 2 * y expr - term
6 factor - term  x – 2 * y
8 <id> - term x  – 2 * y
term term * fact
- <id,x> - term x –  2 * y
4 <id,x> - term * fact x –  2 * y
6 <id,x> - fact * fact x –  2 * y fact fact y
7 <id,x> - <num> * fact x – 2  * y
- <id,x> - <num,2> * fact x – 2 *  y
x 2
8 <id,x> - <num,2> * <id> x – 2 * y 

 All terminals match – we’re finished


IUBAT CSC 461 May 1, 2025
Bottom-Up Parsing
14

 Also called LR parsing

 L means that tokens are read left to right


 R means that it constructs a rightmost derivation
from right to left, but in backwards.

IUBAT CSC 461 May 1, 2025


Rightmost Derivation backwards
15

 E ----> E + T | T
T ----> T * F | F
F ----> ( E ) | a | b | c

 a + b * c <==
 F + b * c <==
 T + b * c <==
 E + b * c <==
 E + F * c <==
 E + T * c <==
 E + T * F <==
 E + T <==
 E
IUBAT CSC 461 May 1, 2025
The Idea
LR parsing reduces a string to the start
16

symbol by inverting productions:


str = input string of terminals


repeat
 Identify b in str such that A  b is a
production
(i.e., str = a b g)
 Replace b by A in str (i.e., str becomes a A g)

until str = G
IUBAT CSC 461 May 1, 2025
Notation
Split input into two substrings
17

Right substring (a string of terminals) is as yet


unexamined by parser

 Left substring has terminals and non-terminals

 The dividing point is marked by a 


 The  is not part of the string

Initially, all input is unexamined: 


x1x2 . . . xn

IUBAT CSC 461 May 1, 2025


A Bottom-up Parse in Detail
18
(1)
E  int
int + (int) + (int) E  E + (E)

int + ( in ) + ( int )
t
IUBAT CSC 461 May 1, 2025
A Bottom-up Parse in Detail
19
(2)
E  int
int + (int) + (int) E  E + (E)
E + (int) + (int)

int + ( in ) + ( int )
t
IUBAT CSC 461 May 1, 2025
A Bottom-up Parse in Detail
20
(3)
E  int
int + (int) + (int) E  E + (E)
E + (int) + (int)
E + (E) + (int)

E E

int + ( in ) + ( int )
t
IUBAT CSC 461 May 1, 2025
A Bottom-up Parse in Detail
21
(4)
E  int
int + (int) + (int) E  E + (E)
E + (int) + (int)
E + (E) + (int)
E + (int)

E E

int + ( in ) + ( int )
t
IUBAT CSC 461 May 1, 2025
A Bottom-up Parse in Detail
22
(5)
E  int
int + (int) + (int) E  E + (E)
E + (int) + (int)
E + (E) + (int)
E + (int)
E + (E)
E

E E E

int + ( in ) + ( int )
t
IUBAT CSC 461 May 1, 2025
A Bottom-up Parse in Detail
23
(6)
int + (int) + (int) E
E + (int) + (int)
E + (E) + (int)
E + (int)
E + (E)
E E

E E E

E  int int + ( in ) + ( int )


E  E + (E)
t
IUBAT CSC 461 May 1, 2025
Shift-Reduce Parsing
Bottom-up parsing uses only two kinds of actions:
24

Shift and Reduce


Shift: Move  one place to the right


Shifts a terminal to the left string

 E + ( int )  E + (int  )

Reduce: Apply an inverse production at the right


end of the left string

 If E  E + ( E ) is a production, then

 E + (E + ( E )  )  E +(E  )
IUBAT CSC 461 May 1, 2025
Shift-Reduce Example
25
E  int
  int + (int) + (int)$ shift E  E + (E)

int + ( int )+ ( int )

IUBAT CSC 461 May 1, 2025


Shift-Reduce Example
26
E  int
  int + (int) + (int)$ shift E  E + (E)
 int  + (int) + (int)$ red. E  int

int + ( int )+ ( int )

IUBAT CSC 461 May 1, 2025


Shift-Reduce Example
27
E  int
  int + (int) + (int)$ shift E  E + (E)
 int  + (int) + (int)$ red. E  int
 E  + (int) + (int)$ shift 3
times

int + ( int )+ ( int )

IUBAT CSC 461 May 1, 2025


Shift-Reduce Example
28
E  int
  int + (int) + (int)$ shift E  E + (E)
 int  + (int) + (int)$ red. E  int
 E  + (int) + (int)$ shift 3
times
 E + (int  ) + (int)$ red. E  int

int + ( int )+ ( int )

IUBAT CSC 461 May 1, 2025


Shift-Reduce Example
29
E  int
  int + (int) + (int)$ shift E  E + (E)
 int  + (int) + (int)$ red. E  int
 E  + (int) + (int)$ shift 3
times
 E + (int  ) + (int)$ red. E  int
 E + (E  ) + (int)$ shift

E E

int + ( int )+ ( int )

IUBAT CSC 461 May 1, 2025


Shift-Reduce Example
30
E  int
  int + (int) + (int)$ shift E  E + (E)
 int  + (int) + (int)$ red. E  int
 E  + (int) + (int)$ shift 3 times
 E + (int  ) + (int)$ red. E  int
 E + (E  ) + (int)$ shift
 E + (E)  + (int)$ red. E  E +
(E)

E E

int + ( int )+ ( int )

IUBAT CSC 461 May 1, 2025


Shift-Reduce Example
31
E  int
  int + (int) + (int)$ shift E  E + (E)
 int  + (int) + (int)$ red. E  int
 E  + (int) + (int)$ shift 3 times
 E + (int  ) + (int)$ red. E  int
E
 E + (E  ) + (int)$ shift
 E + (E)  + (int)$ red. E  E +
(E)
 E  + (int)$ shift 3 times
E E

int + ( int )+ ( int )

IUBAT CSC 461 May 1, 2025


Shift-Reduce Example
32
E  int
  int + (int) + (int)$ shift E  E + (E)
 int  + (int) + (int)$ red. E  int
 E  + (int) + (int)$ shift 3 times
 E + (int  ) + (int)$ red. E  int
E
 E + (E  ) + (int)$ shift
 E + (E)  + (int)$ red. E  E + (E)

 E  + (int)$ shift 3 times


 E + (int  )$ red. E  int E E

int + ( int )+ ( int )

IUBAT CSC 461 May 1, 2025


Shift-Reduce Example
33
E  int
  int + (int) + (int)$ shift E  E + (E)
 int  + (int) + (int)$ red. E  int
 E  + (int) + (int)$ shift 3 times
 E + (int  ) + (int)$ red. E  int
E
 E + (E  ) + (int)$ shift
 E + (E)  + (int)$ red. E  E +
(E)
 E  + (int)$ shift 3 times
 E + (int  )$ red. E  int E E E
 E + (E  )$ shift

int + ( int )+ ( int )

IUBAT CSC 461 May 1, 2025


Shift-Reduce Example
34
E  int
• int + (int) + (int)$ shift E  E + (E)
•int  + (int) + (int)$ red. E  int
•E  + (int) + (int)$ shift 3 times
•E + (int  ) + (int)$ red. E  int
•E + (E  ) + (int)$ shift
E
•E + (E)  + (int)$ red. E  E +
(E)
•E  + (int)$ shift 3 times
•E + (int  )$ red. E  int
•E + (E  )$ shift
E E E
•E + (E)  $ red. E  E + (E)

int + ( int )+ ( int )

IUBAT CSC 461 May 1, 2025


Shift-Reduce Example
E  int
E  E + (E)
35
E
• int + (int) + (int)$ shift
•int  + (int) + (int)$ red. E  int
•E  + (int) + (int)$ shift 3 times
•E + (int  ) + (int)$ red. E  int E
•E + (E  ) + (int)$ shift
•E + (E)  + (int)$ red. E  E +
(E)
•E  + (int)$ shift 3 times
E E E
•E + (int  )$ red. E  int
•E + (E  )$ shift
•E + (E)  $ red. E  E + int + ( int )+ ( int )
(E)
•E  $ accept IUBAT CSC 461 May 1, 2025
Shift-Reduce
36
Parsing
By using the following grammar Show a complete
Shift Reduce Parse, including Input String and
Action
for the String C = B * (C * (A + B))

<assign> <id> = <expr>


<id>  A | B | C
<expr> <expr> + <term> | <term>
<term> <term> * <factor> | <factor>
<factor>  ( <expr> ) | <id>
IUBAT CSC 461 May 1, 2025
Recursive-descent parser
A recursive-descent parser is so named because it
37

consists of a collection of subprograms, many of


which are recursive, and it produces a parse tree in
top-down order.

This recursion is a reflection of the nature of


programming
languages, which include several different kinds of
nested structures.

For example, statements are often nested in other


statements. Also, parentheses in expressions must be
properly nested.
The syntax of these structures is naturally described
with recursive grammar rules.

EBNF is ideally suited for recursive-descent parsers.


IUBAT CSC 461 May 1, 2025
Recursive-descent parser
38

Consider the following EBNF


description of simple arithmetic
expressions:

<expr> → <term> {(+| -) <term>}

<term> → <factor> {(*| /) <factor>}

<factor> → id | int_constant | (<expr> )

IUBAT CSC 461 May 1, 2025


Recursive-descent parser
39
The recursive-descent subprogram for the first rule in the
previous example grammar, written in C, is
/* expr -- Parses strings in the language generated by the rule:
<expr> -> <term> {(+ | -) <term>} */

Void expr() {
printf("Enter <expr>\n");
/* Parse the first term */
term();
/* As long as the next token is + or -, get
the next token and parse the next term */
While (nextToken == ADD_OP || nextToken == SUB_OP) {
lex();
term();
}
printf("Exit <expr>\n");
} /* End of function expr */

IUBAT CSC 461 May 1, 2025


Recursive-descent parser
40
The parsing subprogram for <term> is similar to that for
<expr>:

/* term--Parses strings in the language generated by the rule:


<term> -> <factor> {(* | /) <factor>) */

Void term() {
printf("Enter <term>\n");
/* Parse the first factor */
factor();
/* As long as the next token is * or /, get the
next token and parse the next factor */
While (nextToken == MULT_OP || nextToken == DIV_OP) {
lex();
factor();
}
printf("Exit <term>\n");
} /* End of function term */
IUBAT CSC 461 May 1, 2025
Recursive-descent parser
41
/* factor-- Parses strings in the Else {
language generated by the rule: If (nextToken == LEFT_PAREN) {
<factor> -> id | int_constant | lex();
( <expr> ) expr();
*/ If (nextToken == RIGHT_PAREN)
Void factor() { lex();
Printf ("Enter <factor>\n"); else
/* Determine which RHS */ error();
If (nextToken == IDENT || } /* End of if (nextToken == ... */
nextToken == INT_LIT) /* It was not an id, an integer literal,
/* Get the next token */ or a left
lex(); parenthesis */
else
/* If the RHS is ( <expr>), call error();
lex to pass over the left } /* End of else */
parenthesis, call expr, and check Printf ("Exit <factor>\n");
for the right parenthesis */ } /* End of function factor */
IUBAT CSC 461 May 1, 2025
Recursive-descent parser
42 Next token is: 10 Next lexeme is 47
Following is the trace of the parse of Enter <term>
the example expression (sum + 47) / Enter <factor>
total, using the parsing functions
expr, term, and factor, and the Next token is: 26 Next lexeme is )
function Lex from Lexical Analyzer Exit <factor>
Exit <term>
Next token is: 25 Next lexeme is ( Exit <expr>
Enter <expr>
Enter <term> Next token is: 24 Next lexeme is /
Enter <factor> Exit <factor>

Next token is: 11 Next lexeme is sum Next token is: 11 Next lexeme is
Enter <expr> total
Enter <term> Enter <factor>
Enter <factor>
Next token is: -1 Next lexeme is
Next token is: 21 Next lexeme is + EOF
Exit <factor> Exit CSC
<factor>
IUBAT 461 May 1, 2025
Exit <term> Exit <term>
Recursive-descent parser
43

Write EBNF rule (if necessary) for the following BNF


Grammar and the Recursive-Descent Subprograms
in C
for these Rules. Also show the Trace of the Parse for
((12 + x ) * y) Input String along with a Parse Tree.

S E
E E + T | E - T | T
T T * P | T / P | P
P  ( E ) | char | number

IUBAT CSC 461 May 1, 2025

You might also like