CHAPTER - 04
LEXICAL AND SYNTAX ANALYZER
Spring 2018
Lexical Analysis
A lexical analyzer is essentially a pattern matcher.
2
A pattern matcher attempts to find a substring of a
given string of characters that matches a given
character pattern
A lexical analyzer serves as the front end of a syntax
analyzer.
Lexical analyzers are subprograms that
locate the next lexeme in the input,
determine its associated token code, and
return them to the caller, which is the syntax analyzer
IUBAT CSC 461 May 1, 2025
Lexical-analysis process
Skipping comments and white space
3
Inserts lexemes for user-defined names
into the symbol table
Detect syntactic errors in tokens
IUBAT CSC 461 May 1, 2025
Leftmost and Rightmost Derivations
S → ABC,
4
A → aA,
A→λ
B → bB,
B→λ
C → cC,
C→λ
With this grammar, there is a choice of variables to
expand. Here is a sample derivation:
S ABC aABC aABcC aBcC abBcC
abBc abbBc abbc
IUBAT CSC 461 May 1, 2025
Leftmost Derivations
5
If we always expanded the leftmost
variable first, we would have a leftmost
derivation:
S ABC aABC aBC abBC
abbBC abbC abbcC abbc
S → ABC, A → aA, A→
Grammar is -
λ
B → bB, B→λ
C → cC, C→λ
IUBAT CSC 461 May 1, 2025
Rightmost Derivations
6
Conversely, if we always expanded the
rightmost variable first, we would have
a rightmost derivation:
S ABC ABcC ABc AbBc
AbbBc Abbc aAbbc abbc
S → ABC, A → aA, A→
Grammar is -
λ
B → bB, B→λ
C → cC, C→λ
IUBAT CSC 461 May 1, 2025
Syntax Analysis
Syntax Analysis is often called parsing
7
Discovers the derivation of a string
Two major approaches
Top-down parsing
Bottom-up parsing
IUBAT CSC 461 May 1, 2025
Top-Down Parsing
Start with the root of the parse tree
8
Root of the tree: node labeled with the start symbol
Algorithm:
Repeat until the fringe of the parse tree matches input
string
If a terminal symbol is added that doesn’t match,
backtrack
Find the next node to be expanded
Leaves of parse tree match input string (success)
All productions exhaustedIUBAT
in backtracking
(failure)
CSC 461 May 1, 2025
Example
Expression grammar
9
#
(with precedence) Production rule
1 expr → expr + term
2 | expr - term
3 | term
4 term → term * factor
5 | term /
6 factor
7 | factor
8 factor → number
| identifier
Input string x – 2 * y
IUBAT CSC 461 May 1, 2025
Example Current position in
the input stream
10
Rule Sentential form Input string
- expr x - 2 * y expr
1 expr + term x - 2 * y
3 term + term x – 2 * y
6 factor + term x – 2 * y expr + term
8 <id> + term x – 2 * y
- <id,x> + term x – 2 * y
term
# Production rule
1 expr → expr + term
2 | expr - term
fact
3 | term
4 term → term * factor
5 | term / factor x
Can’t match next terminal6 | factor
Problem:
7 factor → number
8 | identifier
We guessed wrong at step 2
IUBAT CSC 461 May 1, 2025
Backtracking
11
Rule Sentential form Input string
- expr x - 2 * y
2 expr + term x - 2 * y
3 term + term x – 2 * y Undo all these
6 factor + term x – 2 * y productions
8 <id> + term x – 2 * y
? <id,x> + term x – 2 * y
Rollback productions
Choose a different production for expr
Continue
IUBAT CSC 461 May 1, 2025
Retrying
12
Rule Sentential form Input string expr
- expr x - 2 * y
2 expr - term x - 2 * y
3 term - term x – 2 * y expr - term
6 factor - term x – 2 * y
8 <id> - term x – 2 * y
term fact
- <id,x> - term x – 2 * y
6 <id,x> - factor x – 2 * y
7 <id,x> - <num> x – 2 * y fact 2
# Production rule
1 expr → expr + term
2 | expr -
x
3 term
Problem: 4 | term
5 term → term * factor
More input to read
6 | term /
Another cause of backtracking factor
7
| factor
8
IUBAT
factor → numberCSC 461 May 1, 2025
| identifier
# Production rule
1 expr → expr + term
Successful Parse
2 | expr - term
3 | term
4 term → term * factor
5 | term / factor
6 | factor
7 factor → number
13 8 | identifier
Rule Sentential form Input string expr
- expr x - 2 * y
2 expr - term x - 2 * y
3 term - term x – 2 * y expr - term
6 factor - term x – 2 * y
8 <id> - term x – 2 * y
term term * fact
- <id,x> - term x – 2 * y
4 <id,x> - term * fact x – 2 * y
6 <id,x> - fact * fact x – 2 * y fact fact y
7 <id,x> - <num> * fact x – 2 * y
- <id,x> - <num,2> * fact x – 2 * y
x 2
8 <id,x> - <num,2> * <id> x – 2 * y
All terminals match – we’re finished
IUBAT CSC 461 May 1, 2025
Bottom-Up Parsing
14
Also called LR parsing
L means that tokens are read left to right
R means that it constructs a rightmost derivation
from right to left, but in backwards.
IUBAT CSC 461 May 1, 2025
Rightmost Derivation backwards
15
E ----> E + T | T
T ----> T * F | F
F ----> ( E ) | a | b | c
a + b * c <==
F + b * c <==
T + b * c <==
E + b * c <==
E + F * c <==
E + T * c <==
E + T * F <==
E + T <==
E
IUBAT CSC 461 May 1, 2025
The Idea
LR parsing reduces a string to the start
16
symbol by inverting productions:
str = input string of terminals
repeat
Identify b in str such that A b is a
production
(i.e., str = a b g)
Replace b by A in str (i.e., str becomes a A g)
until str = G
IUBAT CSC 461 May 1, 2025
Notation
Split input into two substrings
17
Right substring (a string of terminals) is as yet
unexamined by parser
Left substring has terminals and non-terminals
The dividing point is marked by a
The is not part of the string
Initially, all input is unexamined:
x1x2 . . . xn
IUBAT CSC 461 May 1, 2025
A Bottom-up Parse in Detail
18
(1)
E int
int + (int) + (int) E E + (E)
int + ( in ) + ( int )
t
IUBAT CSC 461 May 1, 2025
A Bottom-up Parse in Detail
19
(2)
E int
int + (int) + (int) E E + (E)
E + (int) + (int)
int + ( in ) + ( int )
t
IUBAT CSC 461 May 1, 2025
A Bottom-up Parse in Detail
20
(3)
E int
int + (int) + (int) E E + (E)
E + (int) + (int)
E + (E) + (int)
E E
int + ( in ) + ( int )
t
IUBAT CSC 461 May 1, 2025
A Bottom-up Parse in Detail
21
(4)
E int
int + (int) + (int) E E + (E)
E + (int) + (int)
E + (E) + (int)
E + (int)
E E
int + ( in ) + ( int )
t
IUBAT CSC 461 May 1, 2025
A Bottom-up Parse in Detail
22
(5)
E int
int + (int) + (int) E E + (E)
E + (int) + (int)
E + (E) + (int)
E + (int)
E + (E)
E
E E E
int + ( in ) + ( int )
t
IUBAT CSC 461 May 1, 2025
A Bottom-up Parse in Detail
23
(6)
int + (int) + (int) E
E + (int) + (int)
E + (E) + (int)
E + (int)
E + (E)
E E
E E E
E int int + ( in ) + ( int )
E E + (E)
t
IUBAT CSC 461 May 1, 2025
Shift-Reduce Parsing
Bottom-up parsing uses only two kinds of actions:
24
Shift and Reduce
Shift: Move one place to the right
Shifts a terminal to the left string
E + ( int ) E + (int )
Reduce: Apply an inverse production at the right
end of the left string
If E E + ( E ) is a production, then
E + (E + ( E ) ) E +(E )
IUBAT CSC 461 May 1, 2025
Shift-Reduce Example
25
E int
int + (int) + (int)$ shift E E + (E)
int + ( int )+ ( int )
IUBAT CSC 461 May 1, 2025
Shift-Reduce Example
26
E int
int + (int) + (int)$ shift E E + (E)
int + (int) + (int)$ red. E int
int + ( int )+ ( int )
IUBAT CSC 461 May 1, 2025
Shift-Reduce Example
27
E int
int + (int) + (int)$ shift E E + (E)
int + (int) + (int)$ red. E int
E + (int) + (int)$ shift 3
times
int + ( int )+ ( int )
IUBAT CSC 461 May 1, 2025
Shift-Reduce Example
28
E int
int + (int) + (int)$ shift E E + (E)
int + (int) + (int)$ red. E int
E + (int) + (int)$ shift 3
times
E + (int ) + (int)$ red. E int
int + ( int )+ ( int )
IUBAT CSC 461 May 1, 2025
Shift-Reduce Example
29
E int
int + (int) + (int)$ shift E E + (E)
int + (int) + (int)$ red. E int
E + (int) + (int)$ shift 3
times
E + (int ) + (int)$ red. E int
E + (E ) + (int)$ shift
E E
int + ( int )+ ( int )
IUBAT CSC 461 May 1, 2025
Shift-Reduce Example
30
E int
int + (int) + (int)$ shift E E + (E)
int + (int) + (int)$ red. E int
E + (int) + (int)$ shift 3 times
E + (int ) + (int)$ red. E int
E + (E ) + (int)$ shift
E + (E) + (int)$ red. E E +
(E)
E E
int + ( int )+ ( int )
IUBAT CSC 461 May 1, 2025
Shift-Reduce Example
31
E int
int + (int) + (int)$ shift E E + (E)
int + (int) + (int)$ red. E int
E + (int) + (int)$ shift 3 times
E + (int ) + (int)$ red. E int
E
E + (E ) + (int)$ shift
E + (E) + (int)$ red. E E +
(E)
E + (int)$ shift 3 times
E E
int + ( int )+ ( int )
IUBAT CSC 461 May 1, 2025
Shift-Reduce Example
32
E int
int + (int) + (int)$ shift E E + (E)
int + (int) + (int)$ red. E int
E + (int) + (int)$ shift 3 times
E + (int ) + (int)$ red. E int
E
E + (E ) + (int)$ shift
E + (E) + (int)$ red. E E + (E)
E + (int)$ shift 3 times
E + (int )$ red. E int E E
int + ( int )+ ( int )
IUBAT CSC 461 May 1, 2025
Shift-Reduce Example
33
E int
int + (int) + (int)$ shift E E + (E)
int + (int) + (int)$ red. E int
E + (int) + (int)$ shift 3 times
E + (int ) + (int)$ red. E int
E
E + (E ) + (int)$ shift
E + (E) + (int)$ red. E E +
(E)
E + (int)$ shift 3 times
E + (int )$ red. E int E E E
E + (E )$ shift
int + ( int )+ ( int )
IUBAT CSC 461 May 1, 2025
Shift-Reduce Example
34
E int
• int + (int) + (int)$ shift E E + (E)
•int + (int) + (int)$ red. E int
•E + (int) + (int)$ shift 3 times
•E + (int ) + (int)$ red. E int
•E + (E ) + (int)$ shift
E
•E + (E) + (int)$ red. E E +
(E)
•E + (int)$ shift 3 times
•E + (int )$ red. E int
•E + (E )$ shift
E E E
•E + (E) $ red. E E + (E)
int + ( int )+ ( int )
IUBAT CSC 461 May 1, 2025
Shift-Reduce Example
E int
E E + (E)
35
E
• int + (int) + (int)$ shift
•int + (int) + (int)$ red. E int
•E + (int) + (int)$ shift 3 times
•E + (int ) + (int)$ red. E int E
•E + (E ) + (int)$ shift
•E + (E) + (int)$ red. E E +
(E)
•E + (int)$ shift 3 times
E E E
•E + (int )$ red. E int
•E + (E )$ shift
•E + (E) $ red. E E + int + ( int )+ ( int )
(E)
•E $ accept IUBAT CSC 461 May 1, 2025
Shift-Reduce
36
Parsing
By using the following grammar Show a complete
Shift Reduce Parse, including Input String and
Action
for the String C = B * (C * (A + B))
<assign> <id> = <expr>
<id> A | B | C
<expr> <expr> + <term> | <term>
<term> <term> * <factor> | <factor>
<factor> ( <expr> ) | <id>
IUBAT CSC 461 May 1, 2025
Recursive-descent parser
A recursive-descent parser is so named because it
37
consists of a collection of subprograms, many of
which are recursive, and it produces a parse tree in
top-down order.
This recursion is a reflection of the nature of
programming
languages, which include several different kinds of
nested structures.
For example, statements are often nested in other
statements. Also, parentheses in expressions must be
properly nested.
The syntax of these structures is naturally described
with recursive grammar rules.
EBNF is ideally suited for recursive-descent parsers.
IUBAT CSC 461 May 1, 2025
Recursive-descent parser
38
Consider the following EBNF
description of simple arithmetic
expressions:
<expr> → <term> {(+| -) <term>}
<term> → <factor> {(*| /) <factor>}
<factor> → id | int_constant | (<expr> )
IUBAT CSC 461 May 1, 2025
Recursive-descent parser
39
The recursive-descent subprogram for the first rule in the
previous example grammar, written in C, is
/* expr -- Parses strings in the language generated by the rule:
<expr> -> <term> {(+ | -) <term>} */
Void expr() {
printf("Enter <expr>\n");
/* Parse the first term */
term();
/* As long as the next token is + or -, get
the next token and parse the next term */
While (nextToken == ADD_OP || nextToken == SUB_OP) {
lex();
term();
}
printf("Exit <expr>\n");
} /* End of function expr */
IUBAT CSC 461 May 1, 2025
Recursive-descent parser
40
The parsing subprogram for <term> is similar to that for
<expr>:
/* term--Parses strings in the language generated by the rule:
<term> -> <factor> {(* | /) <factor>) */
Void term() {
printf("Enter <term>\n");
/* Parse the first factor */
factor();
/* As long as the next token is * or /, get the
next token and parse the next factor */
While (nextToken == MULT_OP || nextToken == DIV_OP) {
lex();
factor();
}
printf("Exit <term>\n");
} /* End of function term */
IUBAT CSC 461 May 1, 2025
Recursive-descent parser
41
/* factor-- Parses strings in the Else {
language generated by the rule: If (nextToken == LEFT_PAREN) {
<factor> -> id | int_constant | lex();
( <expr> ) expr();
*/ If (nextToken == RIGHT_PAREN)
Void factor() { lex();
Printf ("Enter <factor>\n"); else
/* Determine which RHS */ error();
If (nextToken == IDENT || } /* End of if (nextToken == ... */
nextToken == INT_LIT) /* It was not an id, an integer literal,
/* Get the next token */ or a left
lex(); parenthesis */
else
/* If the RHS is ( <expr>), call error();
lex to pass over the left } /* End of else */
parenthesis, call expr, and check Printf ("Exit <factor>\n");
for the right parenthesis */ } /* End of function factor */
IUBAT CSC 461 May 1, 2025
Recursive-descent parser
42 Next token is: 10 Next lexeme is 47
Following is the trace of the parse of Enter <term>
the example expression (sum + 47) / Enter <factor>
total, using the parsing functions
expr, term, and factor, and the Next token is: 26 Next lexeme is )
function Lex from Lexical Analyzer Exit <factor>
Exit <term>
Next token is: 25 Next lexeme is ( Exit <expr>
Enter <expr>
Enter <term> Next token is: 24 Next lexeme is /
Enter <factor> Exit <factor>
Next token is: 11 Next lexeme is sum Next token is: 11 Next lexeme is
Enter <expr> total
Enter <term> Enter <factor>
Enter <factor>
Next token is: -1 Next lexeme is
Next token is: 21 Next lexeme is + EOF
Exit <factor> Exit CSC
<factor>
IUBAT 461 May 1, 2025
Exit <term> Exit <term>
Recursive-descent parser
43
Write EBNF rule (if necessary) for the following BNF
Grammar and the Recursive-Descent Subprograms
in C
for these Rules. Also show the Trace of the Parse for
((12 + x ) * y) Input String along with a Parse Tree.
S E
E E + T | E - T | T
T T * P | T / P | P
P ( E ) | char | number
IUBAT CSC 461 May 1, 2025