Chapter-2 Compiler Design

Compiler design

Uploaded by

elsayendale643


Ambo University Woliso Campus

Computer Science Department

Compiler Design

Chapter – two
Lexical analysis

1
Outline
 Introduction
 Interaction of Lexical Analyzer with Parser
 Token, pattern, lexeme
 Specification of patterns using regular expressions
 Regular expressions
 Regular expressions for tokens

 NFA and DFA

 Conversion from RE to NFA to DFA…
 Lex Scanner Generator
 Creating a Lexical Analyzer with Lex
 Regular Expressions in Lex
 Lex specifications and examples

2
Introduction
❑ The role of the lexical analyzer is:
• to read a sequence of characters from the source program,
• group them into lexemes, and
• produce as output a token for each lexeme in the source program.

❑ The scanner can also perform the following secondary tasks:
• stripping out blanks, tabs, new lines
• stripping out comments
• keeping track of line numbers (for error reporting)

3
Interaction of the Lexical Analyzer
with the Parser

[Figure: Source program -> (next char / get next char) -> lexical analyzer -> (next token / get next token) -> syntax analyzer. Both analyzers use the symbol table, which contains a record for each identifier.]

token: smallest meaningful sequence of characters of interest in source program
4
Token, pattern, lexeme
 A token is a sequence of characters from the source
program having a collective meaning.
 A token is a classification of lexical units.
- For example: id and num
 Lexemes are the specific character strings that make
up a token.
– For example: abc and 123
• Patterns are rules describing the set of lexemes belonging to a token.
– For example: “a letter followed by letters and digits”
• Patterns are usually specified using regular expressions, for example:
[a-zA-Z][a-zA-Z0-9]*
Example: printf("Total = %d\n", score);

5
Token, pattern, lexeme…
• Example: The following table shows some tokens and their lexemes in Pascal (a high-level, case-insensitive programming language).

Token   Some lexemes                  Pattern
begin   begin, Begin, BEGIN, beGin…   "begin" in small or capital letters
if      if, IF, iF, If                "if" in small or capital letters
ident   Distance, F1, x, Dist1,…      a letter followed by zero or more letters and/or digits

• In general, in programming languages, the following are tokens:
keywords, operators, identifiers, constants, literals, punctuation symbols…
6
Specification of patterns using
regular expressions

 Regular expressions
 Regular expressions for tokens

7
Regular expression: Definitions

 Represents patterns of strings of characters.

 An alphabet Σ is a finite set of symbols
(characters)
 A string s is a finite sequence of symbols
from Σ
 |s| denotes the length of string s
 ε denotes the empty string, thus |ε| = 0
 A language L is a specific set of strings over
some fixed alphabet Σ
8
Regular expressions…
 A regular expression is one of the following:
Symbol: a basic regular expression consisting of a single
character a, where a is from:
▪ an alphabet Σ of legal characters;
▪ the metacharacter ε; or
▪ the metacharacter ø.
▪ In the first case, L(a) = {a};
▪ in the second case, L(ε) = {ε};
▪ in the third case, L(ø) = { }.
▪ { } contains no string at all.
▪ {ε} contains the single string that consists of no characters.
9
Regular expressions…
• Alternation: an expression of the form r|s, where r and s are regular expressions.
  • In this case, L(r|s) = L(r) ∪ L(s). For example, L(a|b) = {a, b}.

• Concatenation: an expression of the form rs, where r and s are regular expressions.
  • In this case, L(rs) = L(r)L(s). For example, L(ab) = {ab}.

• Repetition: an expression of the form r*, where r is a regular expression.
  • In this case, L(r*) = L(r)*. For example, L(a*) = {ε, a, aa, …}.

10
Regular expression: Language Operations

• Union of L and M
  L ∪ M = {s | s ∈ L or s ∈ M}
• Concatenation of L and M
  LM = {xy | x ∈ L and y ∈ M}
• Exponentiation of L
  L^0 = {ε}; L^i = L^(i-1) L
• Kleene closure of L
  L* = ∪ (i = 0,…,∞) L^i
• Positive closure of L
  L+ = ∪ (i = 1,…,∞) L^i

The following shorthands are often used:
  r+ = rr*
  r* = r+ | ε
  r? = r | ε
11
RE’s: Examples

 L(01) = ?
 L(01|0) = ?
 L(0(1|0)) = ?
 Note order of precedence of operators.

 L(0*) = ?
 L((0|10)*(ε|1)) = ?

12
RE’s: Examples

 L(01) = {01}.
 L(01|0) = {01, 0}.
 L(0(1|0)) = {01, 00}.
 Note order of precedence of operators.

 L(0*) = {ε, 0, 00, 000,… }.

 L((0|10)*(ε|1)) = all strings of 0’s and 1’s
without two consecutive 1’s.

13
RE’s: Examples (more)

1- a|b=?
2- (a|b)a = ?
3- (ab) | ε = ?
4- ((a|b)a)* = ?

 Reverse
1 – Even binary numbers =?
2 – An alphabet consisting of just three alphabetic
characters: Σ = {a, b, c}. Consider the set of all strings
over this alphabet that contains exactly one b.

14
RE’s: Examples (more)

1- a | b = {a,b}
2- (a|b)a = {aa,ba}
3- (ab) | ε ={ab, ε}
4- ((a|b)a)* = {ε, aa,ba,aaaa,baba,....}

 Reverse
1 – Even binary numbers (0|1)*0
2 – An alphabet consisting of just three alphabetic
characters: Σ = {a, b, c}. Consider the set of all strings
over this alphabet that contains exactly one b.
(a|c)*b(a|c)*, which matches, for example: b, abc, abaca, baaaac, ccbaca, cccccb

15
Exercises
 Describe the languages denoted by the following
regular expressions:

1- a(a|b)*a
2- ((ε|a)b*)*
3- (a|b)*a(a|b)(a|b)
4- a*ba*ba*ba*

16
Regular Expressions (Summary)
• Definition: A regular expression over Σ is defined by the following rules:
1. ε, Ø, and a ∈ Σ are regular expressions
2. If α and β are regular expressions, so is αβ
3. If α and β are regular expressions, so is α+β (alternation, written α|β elsewhere in this chapter)
4. If α is a regular expression, so is α*
5. Nothing else is a regular expression unless it follows from (1) to (4)
 Let α be a regular expression, the language
represented by α is denoted by L(α).

17
Regular expressions for tokens

 Regular expressions are used to specify the

patterns of tokens.
 Each pattern matches a set of strings. It falls into
different categories:
 Reserved (Key) words: They are represented by
their fixed sequence of characters,
 Ex. if, while and do....
 If we want to collect all the reserved words into
one definition, we could write it as follows:
Reserved = if | while | do |...
18
Regular expressions for tokens…
 Special symbols: including arithmetic operators,
assignment and equality such as =, :=, +, -, *
• Identifiers: defined to be a sequence of letters and digits beginning with a letter.
• We can express this in terms of regular definitions as follows:
letter = A|B|…|Z|a|b|…|z
digit = 0|1|…|9
or
letter= [a-zA-Z]
digit = [0-9]
identifiers = letter(letter|digit)*
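The regular definition above can be checked by hand with a few lines of C. This is only a sketch: `is_identifier` is a hypothetical helper name, and it assumes the default "C" locale, where `isalpha` and `isalnum` cover exactly [a-zA-Z] and [a-zA-Z0-9].

```c
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if s matches letter(letter|digit)*, 0 otherwise.
   Hypothetical helper for illustration; assumes the "C" locale. */
int is_identifier(const char *s)
{
    size_t i;

    /* The first character must be a letter. */
    if (s == NULL || !isalpha((unsigned char)s[0]))
        return 0;

    /* Every remaining character must be a letter or a digit. */
    for (i = 1; s[i] != '\0'; i++) {
        if (!isalnum((unsigned char)s[i]))
            return 0;
    }
    return 1;
}
```

For instance, `is_identifier("Dist1")` succeeds while `is_identifier("1abc")` fails, because the first character is a digit.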
19
Regular expressions for tokens…
 Numbers: Numbers can be:
 sequence of digits (natural numbers), or
 decimal numbers, or
 numbers with exponent (indicated by an e or E).
 Example: 2.71E-2 represents the number 0.0271.
 We can write regular definitions for these numbers as
follows:
nat = [0-9]+
signedNat = (+|-)? nat
number = signedNat("." nat)?(E signedNat)?
 Literals or constants: which can include:
 numeric constants such as 42, and
 string literals such as “ hello, world”.
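The regular definitions for nat, signedNat and number can be turned into a small hand-written recognizer, one function per definition. The function names are hypothetical, and the sketch follows the definition as written, so it accepts only an upper-case E in the exponent.

```c
#include <ctype.h>
#include <stddef.h>

/* nat = [0-9]+ : consumes digits starting at s[*i];
   returns 1 if at least one digit was consumed. */
static int scan_nat(const char *s, size_t *i)
{
    size_t start = *i;
    while (isdigit((unsigned char)s[*i]))
        (*i)++;
    return *i > start;
}

/* signedNat = (+|-)? nat */
static int scan_signed_nat(const char *s, size_t *i)
{
    if (s[*i] == '+' || s[*i] == '-')
        (*i)++;
    return scan_nat(s, i);
}

/* number = signedNat ("." nat)? (E signedNat)?
   Returns 1 only if the whole string s is a number. */
int is_number(const char *s)
{
    size_t i = 0;

    if (!scan_signed_nat(s, &i))
        return 0;
    if (s[i] == '.') {               /* optional fraction */
        i++;
        if (!scan_nat(s, &i))
            return 0;
    }
    if (s[i] == 'E') {               /* optional exponent */
        i++;
        if (!scan_signed_nat(s, &i))
            return 0;
    }
    return s[i] == '\0';             /* nothing may follow */
}
```

With this sketch, `is_number("2.71E-2")` succeeds and `is_number("2.")` fails, since a "." must be followed by at least one digit.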
20
Regular expressions for tokens…

❑ relop → < | <= | = | <> | > | >=

 Comments: Ex. /* this is a C comment*/
 Delimiter → newline | blank | tab | comment
 White space = (delimiter )+

21
Example: Divide the following Java program into
appropriate tokens.
public class Dog {
    private String name;
    private String color;

    public Dog(String n, String c) {
        name = n;
        color = c;
    }

    public String getName() { return name; }

    public String getColor() { return color; }

    public void speak() {
        System.out.println("Woof");
    }
}

22
Automata
 Abstract machines
Characteristics
 Input: input values (from an input alphabet ∑) are applied
to the machine

 Output: outputs of the machine

• States: at any instant, the automaton can be in one of several states

• State relation: the next state of the automaton at any instant is determined by the present state and the present input

23
Automata: cont’d

 Types of automata
 Finite State Automata (FSA)
• Deterministic FSA (DFSA)
• Nondeterministic FSA (NFSA)

 Push Down Automata (PDA)

• Deterministic PDA (DPDA)
• Nondeterministic PDA (NPDA)

24
Finite Automata

 Finite State Automaton

Finite Automaton, Finite State Machine, FSA or FSM

 An abstract machine which can be used to

implement regular expressions (etc.).
 Has a finite number of states, and a finite amount
of memory (i.e., the current state).
 Can be represented by directed graphs or
transition tables

25
Finite-state Automata…
[Figure: an FSA over Σ = { a, b, c } with states 0 to 4 and transitions 0 -a-> 1 -b-> 2 -c-> 3 -a-> 4; state 0 is the start state and state 4 is the final state.]

• Representation
– An FSA may also be represented with a state-transition table. The table for the above FSA:

        Input
State   a   b   c
0       1   ∅   ∅
1       ∅   2   ∅
2       ∅   ∅   3
3       4   ∅   ∅
4       ∅   ∅   ∅
26
Design of a Lexical Analyzer/Scanner
Finite Automata
❑ Lex – turns its input program into lexical analyzer.
❑ Finite automata are recognizers; they simply say "yes"
or "no" about each possible input string.
❑ Finite automata come in two flavors:

a) Nondeterministic finite automata (NFA) have no restrictions

on the labels of their edges.
ε, the empty string, is a possible label.
b) Deterministic finite automata (DFA) have, for each state,
and for each symbol of its input alphabet exactly one edge
with that symbol leaving that state.

27
The Whole Scanner Generator Process
Overview
❑ Direct construction of a nondeterministic finite automaton (NFA) to recognize a given regular expression.
  ❑ Easy to build in an algorithmic way
  ❑ Requires ε-transitions to combine regular subexpressions
❑ Construct a deterministic finite automaton (DFA) to simulate the NFA
  ❑ Use a set-of-states construction
❑ Minimize the number of states in the DFA (optional)
❑ Generate the scanner code.
28
Design of a Lexical Analyzer …
 Token ➔ Pattern
 Pattern ➔ Regular Expression
 Regular Expression ➔ NFA
 NFA ➔ DFA
 DFA’s or NFA’s for all tokens ➔ Lexical Analyzer

29
Non-Deterministic Finite Automata
(NFA)
Definition
• An NFA M is a 5-tuple (Σ, S, T, S0, F):
  • a set of input symbols Σ, the input alphabet,
  • a finite set of states S,
  • a transition function T: S × (Σ ∪ {ε}) → 2^S (the set of possible next states),
  • a start state S0 from S, and
  • a set of accepting/final states F ⊆ S.
 The language accepted by M, written L(M), is defined as:
The set of strings of characters c1c2...cn with each ci from
Σ U { ε} such that there exist states s1 in T(s0,c1), s2 in
T(s1,c2), ... , sn in T(sn-1,cn) with sn an element of F.

30
NFA…
• It is a finite automaton that has a choice of edges
• The same symbol can label edges from one state to
several different states.
 An edge may be labeled by ε, the empty
string
• We can have transitions without any input
character consumption.

31
Transition Graph
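• The transition graph for an NFA recognizing the language of the regular expression (a|b)*abb: all strings of a's and b's ending in the particular string abb.

[Figure: start -> 0 -a-> 1 -b-> 2 -b-> 3, with two self-loops on state 0, one labeled a and one labeled b; state 3 is accepting.]

S = {0,1,2,3}
Σ = {a,b}
S0 = 0
F = {3}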
32
Transition Table
• The mapping T of an NFA can be represented in a transition table:

State   Input a   Input b   Input ε
0       {0,1}     {0}       ø
1       ø         {2}       ø
2       ø         {3}       ø
3       ø         ø         ø

T(0,a) = {0,1}
T(0,b) = {0}
T(1,b) = {2}
T(2,b) = {3}

The language defined by an NFA is the set of input strings it accepts, such as (a|b)*abb for the example NFA.
33
Acceptance of input strings by NFA
• An NFA accepts input string x if and only if there is some path in the transition graph from the start state to one of the accepting states whose edge labels spell out x.
• The string aabb is accepted by the NFA, because at least one such path exists:

0 -a-> 0 -a-> 1 -b-> 2 -b-> 3   YES

0 -a-> 0 -a-> 0 -b-> 0 -b-> 0   NO
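Trying every path by hand does not scale, so a program can instead track the set of all states the NFA could be in. The sketch below does this for the example NFA, using the transition table given earlier (T(0,a) = {0,1}, T(0,b) = {0}, T(1,b) = {2}, T(2,b) = {3}). Storing state sets as bit masks and the names `nfa_move` and `nfa_accepts` are illustrative choices, not part of the slides.

```c
#include <stddef.h>

/* Transition table of the example NFA for (a|b)*abb.
   Each entry is a bit set of next states: bit i set means state i. */
static const unsigned nfa_move[4][2] = {
    /* state 0 */ { (1u << 0) | (1u << 1), 1u << 0 },  /* a: {0,1}  b: {0} */
    /* state 1 */ { 0u,                    1u << 2 },  /* a: {}     b: {2} */
    /* state 2 */ { 0u,                    1u << 3 },  /* a: {}     b: {3} */
    /* state 3 */ { 0u,                    0u      }   /* a: {}     b: {}  */
};

/* Returns 1 if some path labeled x leads from state 0 to state 3. */
int nfa_accepts(const char *x)
{
    unsigned current = 1u << 0;          /* start in the set {0} */
    size_t i;
    int s;

    for (i = 0; x[i] != '\0'; i++) {
        unsigned next = 0u;
        int col;

        if (x[i] == 'a') col = 0;
        else if (x[i] == 'b') col = 1;
        else return 0;                   /* symbol outside the alphabet */

        /* Union of the moves of every state in the current set. */
        for (s = 0; s < 4; s++)
            if (current & (1u << s))
                next |= nfa_move[s][col];
        current = next;
    }
    return (current & (1u << 3)) != 0;   /* accept if the set meets F = {3} */
}
```

On input aabb the set evolves as {0}, {0,1}, {0,1}, {0,2}, {0,3}, which contains the accepting state 3.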

34
Another NFA
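[Figure: an NFA whose start state has two ε-transitions: one to a branch that reads a and then loops on a, the other to a branch that reads b and then loops on b.]

An ε-transition is taken without consuming any character from the input.
What does the NFA above accept?

aa*|bb*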
35
Deterministic Finite Automata (DFA)

 A deterministic finite automaton is a special

case of an NFA
 No state has an ε-transition
 For each state S and input symbol a there is at
most one edge labeled a leaving S
 Each entry in the transition table is a single state
 At most one path exists to accept a string
 Simulation algorithm is simple

36
DFSA: Example
S = {S, A, B, C, D}
∑ = {a, b}
S0 = S
F = {C, D}

State   Input   Next state
S       a       A
S       b       B
A       a       A
A       b       C
B       b       B
B       a       C
C       b       D
D       a       D

Check whether the following strings are accepted or not:
• ab
• ba
• bbaba
• aa
• aaabbaaa

37
DFA example
A DFA that accepts (a|b)*abb

38
Simulating a DFA: Algorithm
How to apply a DFA to a string.
INPUT:
 An input string x terminated by an end-of-file character
eof.
 A DFA D with start state So, accepting states F, and
transition function move.
OUTPUT: Answer ''yes" if D accepts x; "no" otherwise
METHOD
• Apply the algorithm on the next slide to the input string x.
• The function move(s, c) gives the state to which there is an edge from state s on input c.
• The function nextchar() returns the next character of the input string x.
39
Simulating a DFA
s = s0;
c = nextchar();
while ( c != eof ) {
    s = move(s, c);
    c = nextchar();
}
if ( s is in F ) return "yes";
else return "no";

[Figure: DFA accepting (a|b)*abb]

Given the input string ababb, this DFA enters the sequence of states 0,1,2,1,2,3 and returns "yes".
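The simulation loop can be made concrete in C. The transition table below is an assumption: the DFA figure did not survive in this text, so the sketch uses the usual four-state DFA for (a|b)*abb, which reproduces the state sequence 0,1,2,1,2,3 quoted above for ababb. The string terminator plays the role of eof, and `move_tbl` and `dfa_accepts` are illustrative names.

```c
#include <stddef.h>

/* Assumed DFA for (a|b)*abb: move_tbl[s][0] is the next state on 'a',
   move_tbl[s][1] the next state on 'b'. State 3 is the accepting state. */
static const int move_tbl[4][2] = {
    /* state 0 */ { 1, 0 },
    /* state 1 */ { 1, 2 },
    /* state 2 */ { 1, 3 },
    /* state 3 */ { 1, 0 }
};

/* Returns 1 ("yes") if the DFA accepts x, 0 ("no") otherwise. */
int dfa_accepts(const char *x)
{
    int s = 0;                           /* s = s0 */
    size_t i;

    for (i = 0; x[i] != '\0'; i++) {     /* while (c != eof) */
        if (x[i] == 'a')
            s = move_tbl[s][0];          /* s = move(s, c) */
        else if (x[i] == 'b')
            s = move_tbl[s][1];
        else
            return 0;                    /* symbol outside the alphabet */
    }
    return s == 3;                       /* is s in F? */
}
```

Unlike the NFA simulation, only one current state is tracked, so each input character costs a single table lookup.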
40
DFA: Exercise

• Construct DFAs for the strings matched by the following definitions:
digit = [0-9]
nat = digit+
signedNat = (+|-)?nat
number = signedNat("."nat)?(E signedNat)?

41
Why do we study RE,NFA,DFA?
 Goal: To scan the given source program
 Process:
 Start with Regular Expression (RE)
 Build a DFA
• How?
– We can build a non-deterministic finite automaton, NFA
(Thompson's construction)
– Convert that to a deterministic one, DFA
(Subset construction)
– Minimize the DFA (optional)
(different algorithms)
 Implement it
 Existing scanner generator: Lex/Flex
42
RE→NFA→DFA→ Minimize DFA states
 Step 1: Come up with a Regular Expression
(a|b)*ab
 Step 2: Use Thompson's construction to create
an NFA for that expression

43
RE→NFA→DFA→ Minimize DFA states
Step 3: Use subset construction to convert the NFA to a DFA

Step 4: Minimize the DFA states
States 0 and 2 behave the same way, so they can be merged.

45
Design of a Lexical Analyzer Generator

Regular Expression → DFA

Two algorithms:
1- Translate a regular expression into an NFA (Thompson’s construction)
2- Translate the NFA into a DFA (subset construction)

46
From regular expression to an NFA
 It is known as Thompson’s construction.

Rules:
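1- For ε or a single symbol a of the alphabet, construct an NFA with a start state and an accepting state joined by a single edge labeled ε or a:

[Figure: start state, one edge labeled a, accepting state]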

47
From regular expression to an NFA…
2- For a composition of regular expression:
 Case 1: Alternation: regular expression(s|r), assume
that NFAs equivalent to r and s have been
constructed.

48
From regular expression to an NFA…
• Case 2: Concatenation: regular expression sr

[Figure: the NFAs for the two subexpressions placed in sequence and joined by an ε-transition]

Case 3: Repetition r*

49
From RE to NFA:Exercises

 Construct NFA for token identifier.

letter(letter|digit)*
 Construct NFA for the following regular
expression:
(a|b)*abb

50
From an NFA to a DFA
(subset construction algorithm)

• Input: NFA N
• Output: DFA D
Both accept the same language (RE).

Rules:
• The start state of D is ε-closure(S0), where S0 is the start state of N.
• The start state of D is initially unmarked.

51
NFA to a DFA…
ε-closure
ε-closure(S’) is a set of states with the following characteristics:
1- S’ ∈ ε-closure(S’): every state is in its own ε-closure
2- if t ∈ ε-closure(S’) and there is an edge labeled ε from t to v, then v ∈ ε-closure(S’)
3- Repeat step 2 until no more states can be added to ε-closure(S’).
E.g.: for the NFA of (a|b)*abb
ε-closure(0) = {0, 1, 2, 4, 7}
ε-closure(1) = {1, 2, 4}
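The three rules amount to a fixed-point computation, sketched below in C. The ε-edges are an assumption: the Thompson NFA for (a|b)*abb is not drawn in this text, so the table uses the common textbook numbering with states 0 to 10, which reproduces both closures listed above. State sets are stored as bit masks, and `eps` and `eps_closure` are illustrative names.

```c
/* eps[s] is the bit set of states reachable from s by one ε-edge.
   Assumed Thompson NFA for (a|b)*abb with states 0..10. */
static const unsigned eps[11] = {
    (1u << 1) | (1u << 7),   /* 0 -> 1, 7 */
    (1u << 2) | (1u << 4),   /* 1 -> 2, 4 */
    0u,                      /* 2: only an edge on a */
    1u << 6,                 /* 3 -> 6 */
    0u,                      /* 4: only an edge on b */
    1u << 6,                 /* 5 -> 6 */
    (1u << 1) | (1u << 7),   /* 6 -> 1, 7 */
    0u, 0u, 0u, 0u           /* 7..10: no ε-edges */
};

/* Returns the ε-closure of a set of states given as a bit mask. */
unsigned eps_closure(unsigned set)
{
    unsigned closure = set;          /* rule 1: each state is in its closure */
    unsigned before;
    int s;

    do {                             /* rule 3: repeat until nothing changes */
        before = closure;
        for (s = 0; s < 11; s++)
            if (closure & (1u << s))
                closure |= eps[s];   /* rule 2: follow ε-edges */
    } while (closure != before);

    return closure;
}
```

Calling `eps_closure(1u << 0)` yields the mask for {0, 1, 2, 4, 7}, matching the first example above.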

52
NFA to a DFA…
Algorithm
While there is an unmarked state
X = { s0, s1, s2,..., sn } of D do
Begin
  Mark X
  For each input symbol ‘a’ do
  Begin
    Let T be the set of states to which there is a transition on ‘a’ from some state si in X.
    Y = ε-closure(T)
    If Y has not been added to the set of states of D then
      add Y as an “unmarked” state of D
    Add a transition from X to Y labeled ‘a’, if not already present
  End
End
53
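NFA for identifier: letter(letter|digit)*

[Figure: Thompson NFA with states 0 to 8 and start state 0. Labeled edges: 0 -letter-> 1, 3 -letter-> 4, 5 -digit-> 6. The remaining edges are ε-transitions that build the alternation (letter|digit) between states 2 and 7 and the star between states 1 and 8.]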

54
NFA to a DFA…
Example: Convert the following NFA into the corresponding DFA: letter(letter|digit)*

DFA states:
A = {0}
B = {1,2,3,5,8}
C = {4,7,2,3,5,8}
D = {6,7,8,2,3,5}

DFA transitions (start state A):
A -letter-> B
B -letter-> C     B -digit-> D
C -letter-> C     C -digit-> D
D -letter-> C     D -digit-> D
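The subset construction can be run mechanically on this NFA. In the sketch below the ε-edges are an assumption (the standard Thompson layout: 1→2, 1→8, 2→3, 2→5, 4→7, 6→7, 7→2, 7→8), chosen because it yields exactly the sets A to D above. The names `closure`, `move_on`, `dstates` and `dtran` are illustrative, and state sets are stored as bit masks.

```c
#define NSTATES 9
#define NSYMS   2            /* symbol 0 = letter, symbol 1 = digit */
#define MAXDFA  16

/* Assumed ε-edges of the Thompson NFA for letter(letter|digit)*. */
static const unsigned eps[NSTATES] = {
    0u, (1u << 2) | (1u << 8), (1u << 3) | (1u << 5), 0u, 1u << 7,
    0u, 1u << 7, (1u << 2) | (1u << 8), 0u
};
/* Labeled edges: 0 -letter-> 1, 3 -letter-> 4, 5 -digit-> 6. */
static const unsigned step[NSTATES][NSYMS] = {
    { 1u << 1, 0u }, { 0u, 0u }, { 0u, 0u }, { 1u << 4, 0u }, { 0u, 0u },
    { 0u, 1u << 6 }, { 0u, 0u }, { 0u, 0u }, { 0u, 0u }
};

unsigned dstates[MAXDFA];    /* NFA state set of each DFA state */
int dtran[MAXDFA][NSYMS];    /* DFA transitions, -1 = none */

static unsigned closure(unsigned set)
{
    unsigned before;
    int s;
    do {
        before = set;
        for (s = 0; s < NSTATES; s++)
            if (set & (1u << s))
                set |= eps[s];
    } while (set != before);
    return set;
}

static unsigned move_on(unsigned set, int sym)
{
    unsigned out = 0u;
    int s;
    for (s = 0; s < NSTATES; s++)
        if (set & (1u << s))
            out |= step[s][sym];
    return out;
}

/* Builds the DFA; returns its number of states, or -1 on overflow. */
int subset_construction(void)
{
    int n = 0, marked = 0, sym, i;

    dstates[n++] = closure(1u << 0);         /* start state of D */
    while (marked < n) {                     /* an unmarked state remains */
        unsigned X = dstates[marked];
        for (sym = 0; sym < NSYMS; sym++) {
            unsigned Y = closure(move_on(X, sym));
            dtran[marked][sym] = -1;
            if (Y == 0u)
                continue;                    /* no move on this symbol */
            for (i = 0; i < n; i++)          /* is Y already a state of D? */
                if (dstates[i] == Y)
                    break;
            if (i == n) {
                if (n == MAXDFA)
                    return -1;
                dstates[n++] = Y;            /* add Y as unmarked */
            }
            dtran[marked][sym] = i;          /* transition X -> Y */
        }
        marked++;                            /* mark X */
    }
    return n;
}
```

Under these assumptions the function creates four DFA states in the order A, B, C, D, with the same transitions as in the example.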

55
Exercise: convert the NFA of (a|b)*abb into a DFA.

56
Other Algorithms

• How to minimize a DFA? (see Dragon Book 3.9, pp. 173)
• How to convert an RE to a DFA directly? (see Dragon Book 3.9.5, pp. 179)

57
The Lexical- Analyzer Generator: Lex
• The first phase of a compiler reads the input source and converts strings in the source to tokens.
 Lex: generates a scanner (lexical analyzer or
lexer) given a specification of the tokens using
REs.
 The input notation for the Lex tool is referred to as
the Lex language and
 The tool itself is the Lex compiler.
 The Lex compiler transforms the input patterns into a
transition diagram and generates code, in a file
called lex.yy.c, that simulates this transition diagram.
58
Lex…

 By using regular expressions, we can specify

patterns to lex that allow it to scan and match
strings in the input.
 Each pattern in lex has an associated action.
 Typically an action returns a token, representing
the matched string, for subsequent use by the
parser.
 It uses patterns that match strings in the input and
converts the strings to tokens.

59
General Compiler Infra-structure
[Figure: Program source (stream of characters) -> Scanner (tokenizer) -> Tokens -> Parser -> Parse tree -> Semantic Routines -> Annotated/decorated tree -> Analysis/Transformations/optimizations -> IR: Intermediate Representation -> Code Generator -> Assembly code, with Symbol and literal Tables alongside the pipeline.]

60
Scanner, Parser, Lex and Yacc

61
Generating a Lexical Analyzer using Lex
Lex is a scanner generator ----- it takes lexical specification as
input, and produces a lexical analyzer written in C.

Lex source program (lex.l) -> Lex compiler -> lex.yy.c

lex.yy.c -> gcc compiler -> a.out

Input stream -> a.out (lexical analyzer) -> Sequence of tokens
62
Lex specification
➢ Program structure
%{
C declarations
%}
...declaration section...
%%
P1 { action1 }
P2 { action2 }
...rule section...
%%
...user defined functions...
 Rules section – regular expression <--> action.
• The actions are C program.
 Declaration section – variables, constants

63
The rules section
%%
[RULES SECTION]

<pattern> { <action to take when matched> }

<pattern> { <action to take when matched> }
…
%%

Patterns are specified by regular expressions.

For example:
%%
[A-Za-z]+ { printf("this is a word"); }
%%

64
Design of a Lexical Analyzer Generator:
RE to NFA to DFA

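[Figure: regular expressions are turned into an NFA by Thompson’s construction; the scanner then runs either the NFA simulation algorithm on that NFA or the DFA simulation algorithm on the DFA built from it.]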
65
Simulating an NFA
❑ INPUT:
❑ An input string x terminated by an end-of-file character eof.
❑ An NFA N with start state S0, accepting states F, and transition function move.
❑ OUTPUT: Answer "yes" if N accepts x; "no" otherwise.
Algorithm
S = ε-closure(S0);
c = nextchar();
while ( c != eof ) {
    S = ε-closure(move(S, c));
    c = nextchar();
}
if ( S ∩ F != ∅ ) return "yes";
else return "no";
66
Combining and simulation of NFAs of a Set of
Regular Expressions: Example 1
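a     {action1}
abb   {action2}
a*b+  {action3}

[Figure: one NFA per pattern: 1 -a-> 2 (accepting, action 1); 3 -a-> 4 -b-> 5 -b-> 6 (accepting, action 2); 7 -b-> 8 with a self-loop on 7 labeled a and a self-loop on 8 labeled b (8 accepting, action 3). The combined NFA adds a new start state 0 with ε-transitions to states 1, 3 and 7.]

Must find the longest prefix match: continue until no further moves are possible.

Input:       a a b a
State sets:  {0,1,3,7} -a-> {2,4,7} -a-> {7} -b-> {8} -a-> none
Matches:     none, action 1, none, action 3

The longest matching prefix is aab, so action 3 is executed.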
67
Simulating NFA…

ε-closure({0}) = {0,1,3,7}
move({0,1,3,7},a) = {2,4,7}
ε-closure({2,4,7}) = {2,4,7}
move({2,4,7},a) = {7}
ε-closure({7}) = {7}
move({7},b) = {8}
ε-closure({8}) = {8}
move({8},a) = ∅

68
Combining and simulation of NFAs of a Set of
Regular Expressions: Example 2
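a     {action1}
abb   {action2}
a*b+  {action3}

[Figure: the same combined NFA as in Example 1: start state 0 with ε-transitions to states 1, 3 and 7; accepting states 2 (action 1), 6 (action 2) and 8 (action 3).]

When two or more accepting states are reached, the first action is executed.

Input:       a b b
State sets:  {0,1,3,7} -a-> {2,4,7} -b-> {5,8} -b-> {6,8}
Matches:     none, action 1, action 3, action 2 and action 3

After abb both state 6 (action 2) and state 8 (action 3) are accepting, so action 2, which is listed first, is executed.
69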
DFA's for Lexical Analyzers
NFA → DFA. Transition table for the DFA:

State   a     b     Token found
0137    247   8     None
247     7     58    a
8       -     8     a*b+
7       7     8     None
58      -     68    a*b+
68      -     8     abb

Example: simulate the above DFA for input abba
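One way to carry out this simulation is to code the transition table directly and remember the last accepting state seen, which gives the longest match. The sketch below assumes the table above. The enum names and `next_token` are illustrative, and error recovery for unmatched input is left out.

```c
#include <stddef.h>

enum { S0137, S247, S8, S7, S58, S68, NSTATES };          /* DFA states */
enum { TOK_NONE, TOK_A, TOK_ABB, TOK_ASTAR_BPLUS };       /* tokens */

/* dtran[s][0]: next state on 'a'; dtran[s][1]: on 'b'; -1: no move. */
static const int dtran[NSTATES][2] = {
    /* 0137 */ { S247, S8  },
    /* 247  */ { S7,   S58 },
    /* 8    */ { -1,   S8  },
    /* 7    */ { S7,   S8  },
    /* 58   */ { -1,   S68 },
    /* 68   */ { -1,   S8  }
};

/* "Token found" column of the table. */
static const int token_of[NSTATES] = {
    TOK_NONE, TOK_A, TOK_ASTAR_BPLUS, TOK_NONE, TOK_ASTAR_BPLUS, TOK_ABB
};

/* Finds the longest prefix of input that matches a pattern.
   Stores its length in *len and returns its token;
   returns TOK_NONE with *len == 0 if no prefix matches. */
int next_token(const char *input, size_t *len)
{
    int s = S0137;
    int best = TOK_NONE;
    size_t i;

    *len = 0;
    for (i = 0; input[i] != '\0'; i++) {
        int col;

        if (input[i] == 'a') col = 0;
        else if (input[i] == 'b') col = 1;
        else break;                      /* symbol outside the alphabet */

        s = dtran[s][col];
        if (s < 0)
            break;                       /* no further move is possible */
        if (token_of[s] != TOK_NONE) {   /* remember the last accepting state */
            best = token_of[s];
            *len = i + 1;
        }
    }
    return best;
}
```

For the input abba the scanner passes through 247, 58 and 68 and then has no move on the final a, so it reports abb with length 3; a second call on the remaining "a" reports the token a.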

70
Lex Regular Expression Basics
. : matches everything except \n
* : matches 0 or more instances of the preceding regular expression
+ : matches 1 or more instances of the preceding regular expression
? : matches 0 or 1 of the preceding regular expression
| : matches the preceding or following regular expression
[xyz] : matches one character: x, y, or z
[^xyz] : matches any character except x, y, and z
() : groups the enclosed regular expression into a new regular expression
"…" : matches everything within the quotes literally
^x : x, but only at beginning of line
x$ : x, but only at end of line
{d} : matches the regular expression defined by d

71
Pattern matching examples

72
Meta-characters

 meta-characters (do not match themselves, because

they are used as a special symbols in reg exps):
()[]{}<>+/,^*|.\"$?-%

 to match a meta-character, prefix with "\"

 to match a backslash, tab or newline, use \\, \t, or \n

73
Lex Regular Expression: Examples

• an integer: 12345

[1-9][0-9]*
• a word: cat
[a-zA-Z]+
• a (possibly) signed integer: 12345 or -12345
[-+]?[1-9][0-9]*
• a floating point number: 1.2345
[0-9]*"."[0-9]+

1. lex will always match the longest (number of

characters) token possible.

2. If two or more possible tokens are of the same

length, then the token with the regular expression
that is defined first in the lex specification is
favored.

76
Lex variables
yyin - of the type FILE*. This points to the current file
being scanned by the lexer.
yyout - Of the type FILE*. This points to the location
where the output of the lexer will be written.
• By default, both yyin and yyout point to standard input
and output.
yytext – variable, a pointer to the matched strings (char
*)
yyleng - Gives the length of the matched pattern.
yylineno - Provides current line number information.

77
Lex functions

yylex() - The function that starts the analysis. It is

automatically generated by Lex.
yywrap() - This function is called when end of file (or input) is encountered. If this function returns 1, the scanning stops.
yymore() - This function tells the lexer to append the next matched text to the current contents of yytext instead of replacing it.
input() – reads the next character from yyin. This is the function invoked by yylex() to read the input.
output(c) – writes the character c to yyout. This is the function invoked by yylex() to write the output.

78
Lex predefined variables

79
Let us run a lex program

80
Lex : programs
 The first example is the shortest possible lex file:
%%
 Input is copied to output, one character at a time.
 The first %% is always required, as there must
always be a rules section.
 However, if we don’t specify any rules, then the
default action is to match everything and copy it to
output.
 Defaults for input and output are stdin and stdout,
respectively.
 Here is the same example, with defaults explicitly
coded:
81
%%
    /* rule section */
    /* match everything except newline */
.   ECHO;
    /* match newline */
\n  ECHO;
%%
/* user definition section */
int yywrap(void) {
    return 1;
}
int main(void) {
    yylex();    /* invokes the lexical analyzer */
    return 0;
}

82
Developing Lexical analyzer using
Lex : Linux (Fedora)
 vi – used to edit lex and yacc source files.
 w – save
 q – quit
 w filename – save as
 wq – save and quit
 q! – exit overriding change

➢ To start , go to application → System tools →

terminal
83
Example 1:Compiling and running C
program
1- vi hello.c
2- i (insert mode)
3- #include <stdio.h>
int main() {
    printf("Hello World ");
    return 0;
}
4- escape
5- :wq
6- gcc hello.c
7- ./a.out
84
How to compile and run LEX programs
test.l → lex.yy.c →gcc →test (scanner)
1. lex test.l
2. gcc lex.yy.c -ll
3. ./a.out<hello.c
➢ Implementation example 1
1. vi lab1.l
2. i → insert mode
3. %%
. ECHO;
\n ECHO;
%%

85
How to compile and run LEX programs...

4. Press esc
5. Press :wq
6. lex lab1.l
7. gcc lex.yy.c -ll
8. ./a.out <hello.c

. Every character except new line

\n new line character
ECHO → display on screen

86
Examples (more)

Example 1:
%%
    /* Match everything except new line */
.   ECHO;
    /* Match new line */
\n  ECHO;
%%
int yywrap(void) {
    return 1;
}
int main(void) {
    yylex();
    return 0;
}

Example 2:
%{
#include <stdio.h>
%}
    /* regular definitions */
digit   [0-9]
letter  [A-Za-z]
id      {letter}({letter}|{digit})*
%%
    /* translation rules */
{digit}+  { printf("number: %s\n", yytext); }
{id}      { printf("ident: %s\n", yytext); }
.         { printf("other: %s\n", yytext); }
%%
int main(void)
{
    yylex();
    return 0;
}

87
Example: Finding the number of identifiers in a given program
digit [0-9]
letter [A-Za-z]
%{
int count=0;
%}
%%
{letter}({letter}|{digit})* count++;
%%
int main(void) {
    yylex();
    printf("The number of identifiers is %4d\n", count);
    return 0;
}

88
Example: Here is a scanner that counts the number of
characters, words, and lines in a file.
%{
int nchar, nword, nline;
%}
%%
\n { nline++; nchar++; }
[^ \t\n]+ { nword++, nchar += yyleng; }
. { nchar++; }
%%
int main(void) {
yylex();
printf("%d\t%d\t%d\n", nchar, nword, nline);
return 0;
}

89
%{
/* definitions of manifest constants
   LT, LE, EQ, NE, GT, GE, IF, THEN, ELSE, ID, NUMBER, RELOP */
%}
    /* regular definitions */
delim    [ \t\n]
ws       {delim}+
letter   [A-Za-z]
digit    [0-9]
id       {letter}({letter}|{digit})*
number   {digit}+(\.{digit}+)?(E[+\-]?{digit}+)?
%%
{ws}     { /* no action and no return */ }
if       { return IF; }      /* return token to parser */
then     { return THEN; }
else     { return ELSE; }
{id}     { yylval = install_id(); return ID; }      /* yylval: token attribute */
{number} { yylval = install_num(); return NUMBER; }
"<"      { yylval = LT; return RELOP; }
"<="     { yylval = LE; return RELOP; }
"="      { yylval = EQ; return RELOP; }
"<>"     { yylval = NE; return RELOP; }
">"      { yylval = GT; return RELOP; }
">="     { yylval = GE; return RELOP; }
%%
/* install yytext as identifier in symbol table (body omitted) */
int install_id() {}
/* body omitted */
int install_num() {}
90
Assignment on Lexical Analyzer

91
Individual

1. Write a program in LEX to count the number of consonants and vowels in given C and C++ source programs.
2. Write a program in LEX to count the number of:
(i) positive and negative integers
(ii) positive and negative fractions
in C and C++ source programs.
3. Write a LEX program to recognize valid C, C++, and Java programs.

92
The MINI Language Introduction
 Assumptions:
 Source code – MINI language
 Target code – Assembly language

 Specifications:
➢ There are no procedures and declarations.
➢ All variables are integer variables, and variables are
declared simply by assigning values to them.
➢ There are only two control statements:
✓ An if – statement and
✓ A repeat statement
➢ Both the control statements may themselves
contain statement sequences.
93
The MINI Language Introduction...
➢ An if – statement has an optional else part and must
be terminated by the key word end.
➢ There are also read and write statements that
perform input/output.
➢ Comments are allowed with curly brackets,
comments cannot be nested.
➢ Expressions in MINI are also limited to Boolean and integer arithmetic expressions.
➢ A Boolean expression consists of a comparison of two arithmetic expressions using either of the two comparison operators < and =.

94
The MINI Language...
➢ An arithmetic expression may involve integer constants, variables, parentheses, and any of the four integer operators +, -, *, and / (integer division).
➢ Boolean expressions may appear only as tests in
control statements – i.e. There are no Boolean
variables, assignment, or I/O.
➢ Here is a sample program in this language for factorial
function.

95
{ sample program
  in MINI language – computes factorials
}
read x; { input an integer }
if 0 < x then { don’t compute if x <= 0 }
    fact := 1;
    repeat
        fact := fact * x;
        x := x - 1
    until x = 0;
    write fact { output factorial of x }
end

96
The MINI Language...
 In addition to the tokens, MINI has the following
lexical conventions:
➢ Comments : are enclosed in curly brackets {...} and
cannot be nested.
➢ White space : consists of blanks, tabs, and
newlines.
➢ The principle of longest substring is followed in recognizing tokens.

97
Design a scanner for MINI language

 In designing a scanner for this language:

1. Start with regular expressions
2. Identify Tokens...
3. Develop and simulate NFA
4. Construct and simulate DFA
5. Write a lex program, to recognize tokens in
MINI language:
• Input: MINI language
• Output: Tokens..,

No ratings yet
74F382 4-Bit Arithmetic Logic Unit: General Description Features
9 pages
Concepts in Programming Languages 1st Edition John C. Mitchell Download
No ratings yet
Concepts in Programming Languages 1st Edition John C. Mitchell Download
47 pages
Econ 151
No ratings yet
Econ 151
4 pages
PLINK3 Operation Ver.0.04.2004.10
No ratings yet
PLINK3 Operation Ver.0.04.2004.10
204 pages
Student's Digital Archive
No ratings yet
Student's Digital Archive
51 pages
Using Database Partitioning With Oracle E-Business Suite (Doc ID 554539.1)
No ratings yet
Using Database Partitioning With Oracle E-Business Suite (Doc ID 554539.1)
39 pages
BIG-IP CGNAT Implementations
No ratings yet
BIG-IP CGNAT Implementations
208 pages
3d Modelling For Virtual Reality: Tutorial #2 - VRML Sliding Door!
No ratings yet
3d Modelling For Virtual Reality: Tutorial #2 - VRML Sliding Door!
12 pages
(Ebook PDF) Business Driven Information Systems 6 Edition by Paige Baltzan Download
100% (2)
(Ebook PDF) Business Driven Information Systems 6 Edition by Paige Baltzan Download
50 pages
Build Your First AI Business in 6 Hours (Ultimate Beginner Guide)
No ratings yet
Build Your First AI Business in 6 Hours (Ultimate Beginner Guide)
10 pages
Chapter 6 - Review Questions
No ratings yet
Chapter 6 - Review Questions
6 pages
Database Backup REPORT - Updated
No ratings yet
Database Backup REPORT - Updated
19 pages
American Eagle Pickup Notification
No ratings yet
American Eagle Pickup Notification
5 pages
Metaverse and Education
No ratings yet
Metaverse and Education
15 pages
Traffic Engineering Answer Key
No ratings yet
Traffic Engineering Answer Key
2 pages
Cain & Abel ARP Poisoning Lab Guide
No ratings yet
Cain & Abel ARP Poisoning Lab Guide
32 pages
ECDIS Passage Planning Guide
No ratings yet
ECDIS Passage Planning Guide
5 pages
Accordion Arduino Mega Code
No ratings yet
Accordion Arduino Mega Code
9 pages
DIY Smart Home with Arduino
100% (1)
DIY Smart Home with Arduino
17 pages
VLSI DFT Training Insights
No ratings yet
VLSI DFT Training Insights
2 pages
Fullstack Development
No ratings yet
Fullstack Development
4 pages
Global CPA/CPI Offers Overview
No ratings yet
Global CPA/CPI Offers Overview
12 pages
Canoga 9145
No ratings yet
Canoga 9145
2 pages
Deepfake's Impact on Trust
No ratings yet
Deepfake's Impact on Trust
7 pages
Handbook of Elliptic Integrals For Engineers and Scientists: Paul F. Byrd - Morris D. Friedman
No ratings yet
Handbook of Elliptic Integrals For Engineers and Scientists: Paul F. Byrd - Morris D. Friedman
4 pages
Product:: Patchpro Rj45 Patch Cord, 10Gx, S/FTP LSZH
No ratings yet
Product:: Patchpro Rj45 Patch Cord, 10Gx, S/FTP LSZH
3 pages
Statistical Process Monitoring Using Advanced Data-Driven and Deep Learning Approaches: Theory and Practical Applications 1st Edition Fouzi Harrou
No ratings yet
Statistical Process Monitoring Using Advanced Data-Driven and Deep Learning Approaches: Theory and Practical Applications 1st Edition Fouzi Harrou
55 pages
Managing Attrition in The Indian Information Technology Industry
No ratings yet
Managing Attrition in The Indian Information Technology Industry
5 pages
Evermotion Archmodels 81 PDF
No ratings yet
Evermotion Archmodels 81 PDF
2 pages
9A04306 Digital Logic Design
No ratings yet
9A04306 Digital Logic Design
4 pages

