ATC-21CS51 Module 1 To 5 Notes

Uploaded by

shravanyadn

(Accredited by NAAC)

Unit of A.SHAMA RAO FOUNDATION

VALACHIL, MANGALURU-574 143


(AFFILIATED TO VISVESVARAYA TECHNOLOGICAL UNIVERSITY, BELAGAVI)

AUTOMATA THEORY & COMPILER DESIGN


21CS51
[2021 SCHEME]

Notes
V SEMESTER

Prepared By

Mr. Athmaranjan K
CONTENTS
AUTOMATA THEORY & COMPILER DESIGN 21CS51

Module 1
• Introduction to Automata Theory: Central Concepts of Automata Theory, Deterministic Finite Automata (DFA) and Non-Deterministic Finite Automata (NFA), ε-NFA, NFA to DFA Conversion, Minimization of DFA.
• Introduction to Compiler Design: Language Processors, Phases of Compilers.

Module 2
• Regular Expressions and Languages: Regular Expressions, Finite Automata and Regular Expressions, Proving Languages Not to Be Regular.
• Lexical Analysis Phase of Compiler Design: Role of the Lexical Analyzer, Input Buffering, Specification of Tokens, Recognition of Tokens.

Module 3
• Context-Free Grammars: Definition and Designing CFGs, Derivations Using a Grammar, Parse Trees, Ambiguity and Elimination of Ambiguity, Elimination of Left Recursion, Left Factoring.
• Syntax Analysis Phase of Compilers, Part 1: Role of the Parser, Top-Down Parsing.

Module 4
• Pushdown Automata: Definition of the Pushdown Automaton, The Languages of a PDA.
• Syntax Analysis Phase of Compilers, Part 2: Bottom-Up Parsing, Introduction to LR Parsing: SLR, More Powerful LR Parsers.

Module 5
• Introduction to Turing Machines: Problems that Computers Cannot Solve, The Turing Machine, Problems, Programming Techniques for Turing Machines, Extensions to the Basic Turing Machine.
• Undecidability: A Language That Is Not Recursively Enumerable, An Undecidable Problem That Is RE.
• Other Phases of Compilers: Syntax-Directed Translation (Syntax-Directed Definitions, Evaluation Orders for SDDs); Intermediate-Code Generation (Variants of Syntax Trees, Three-Address Code); Code Generation (Issues in the Design of a Code Generator).
------------------------------------------------------------------------------------------------------------------------------------------------
Department of Information Science & Engineering, SIT Mangalore.
Module 1
---------------------------------------------------------------------------------------------------------------------
Introduction to Automata Theory:
• Central Concepts of Automata Theory
• Deterministic Finite Automata (DFA)
• Non-Deterministic Finite Automata (NFA)
• Epsilon-NFA
• NFA to DFA Conversion
• Minimization of DFA
Introduction to Compiler Design:
• Language Processors
• Phases of Compilers
--------------------------------------------------------------------------------------------------------------------
Textbooks:
1. John E. Hopcroft, Rajeev Motwani, Jeffrey D. Ullman, "Introduction to Automata Theory, Languages and Computation", Third Edition, Pearson.
2. Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman, "Compilers: Principles, Techniques and Tools", Second Edition, Pearson.
Textbook 1:
• Chapter 1 – 1.5
• Chapter 2 – 2.2, 2.3, 2.5
• Chapter 4 – 4.4
Textbook 2:
• Chapter 1 – 1.1 and 1.2
Automata Theory & Compiler Design 21CS51 Module 1

INTRODUCTION TO AUTOMATA THEORY


Computation is any type of calculation or use of computing technology in information processing. Computation is a process evoked when a computational model acts on its inputs under the control of an algorithm to produce its results.
The study of computation is paramount to the discipline of computer science. What are the capabilities and limitations of computers? We seek mathematically precise answers:
Complexity Theory: Some problems are easy (for example, sorting) and others are hard (for example, scheduling). What makes some problems computationally hard and others easy?
Computability Theory: Which problems are solvable by computers and which are not?
Both Complexity Theory and Computability Theory require a precise definition of a computer. Automata Theory deals with the definitions and properties of mathematical models of computation. The theory of computation begins with a computational model; a computational model may be accurate in some ways, but not in others.
Introduction to Finite Automata:
In the 1930s, before there were computers, Alan Turing studied an abstract machine that had all the capabilities of today's computers, at least as far as what they could compute. His main goal was to describe precisely the boundary between what a computing machine could do and what it could not do. His conclusions apply not only to his abstract Turing machines but also to today's real machines.
In the 1940s and 1950s, simpler kinds of machines, which today we call "finite automata", were studied by a number of researchers. These automata, originally proposed to model brain function, turned out to be extremely useful for other purposes. These theoretical developments bear directly on what computer scientists do today. Concepts such as finite automata, regular expressions and formal grammars are used in the design and construction of important kinds of software, including parts of compilers. Other concepts, such as the Turing machine, help us understand what we can expect from our software.
Automata theory is the study of abstract computing devices, or machines, which carry out the steps of an algorithm automatically, much as a computer does. In this subject we use only abstract machines to execute our programs. Here "abstract" means a theoretical concept considered without reference to any specific physical machine.
Finite automata can compute only very primitive functions; therefore, they are not an adequate model of general computation. In addition, a finite-state machine's inability to generalize computations limits its power.
ATHMARANJAN K Department of ISE, SIT MANGALURU Page 2
Imagine a modern CPU. Every bit in a machine can only be in one of two states (0 or 1), so there is a finite number of possible states. In addition, considering the parts of a computer that a CPU interacts with, there is a finite number of possible inputs from the computer's mouse, keyboard, hard disk, slot cards, etc. As a result, one can conclude that a CPU can be modeled as a finite-state machine.
The Turing machine can be thought of as a finite automaton, or control unit, equipped with infinite storage (memory). Its "memory" consists of an infinite one-dimensional array of cells. Turing's machine is essentially an abstract model of modern-day computer execution and storage, developed in order to provide a precise mathematical definition of an algorithm or mechanical procedure.
Why Study the Theory of Computation:
Why do we need to study Automata Theory (Theory of Computation)?
The theory of computation (TOC) lays a strong foundation for many abstract areas of computer science. TOC teaches the elementary ways in which a computer can be made to think. Implementations come and go: today's programmers can't easily read code from 50 years ago, and programmers from the early days could never have imagined what a program of today would look like. In the face of that kind of change, it is important to study the mathematical properties of problems, and of algorithms for solving problems, that depend neither on the details of today's technology nor on the programming fashions of any period. It is desirable to know which problems can be solved algorithmically and which cannot; this is one of the main objectives of the theory of computation.
• TOC provides a set of abstract structures that are useful for solving certain classes of problems. These abstract structures can be implemented on whatever hardware/software platform is available.
• Using these abstract structures, the design effort required for the actual model can be determined.
• Using TOC, problems are analyzed by finding the fundamental properties of the problems themselves, such as:
1. Is there any computational solution to the problem? If not, is there a restricted but useful variation of the problem for which a solution does exist?
2. If a solution exists, can it be implemented using some fixed amount of memory?
3. If a solution exists, how efficient is it? More specifically, how do its time and space requirements grow as the size of the problem grows?
4. Are there groups of problems that are equivalent in the sense that if there is an efficient solution to one member of the group, there is an efficient solution to all the others?
• TOC plays an important role in compiler design, switching theory, and the design and analysis of digital circuits.

Applications of Automata Theory

i. Software for designing and checking the behavior of digital circuits.
ii. Software for verifying systems of all types that have a finite number of distinct states.
iii. Structural representations such as grammars and regular expressions are very useful in designing the software for the lexical analyzer of a typical compiler.
iv. Software for identifying words, phrases and other patterns in large bodies of text, such as collections of web pages.
v. Computational biology: DNA and proteins are strings.
vi. Artificial intelligence and knowledge engineering, game theory and computer graphics.
vii. Automata are essential for the study of the limits of computation:
a. What can a computer do at all? (Decidability)
b. What can a computer do efficiently? (Intractability)

Abstract Machine:
An abstract machine, or abstract computer, is a conceptual or theoretical model of a computer hardware or software system that does not physically exist. Such machines are hypothetical computers: they have the commonly encountered hardware features and concepts but avoid most of the details found in real computers.

CENTRAL CONCEPTS OF AUTOMATA THEORY


Languages and Strings:
Define the following terms with respect to automata theory.
Alphabet: An alphabet is a finite, non-empty set of symbols or characters. We use the symbol Σ for an alphabet.
Example: 1. Σ = {0, 1}, the binary alphabet.
2. Σ = {a, b, …, z}, the English lower-case alphabet.
Strings: A string is a finite sequence, possibly empty, of symbols chosen from some alphabet Σ.
Example: ε, 0, 1, 01, 10, 11, 110, 01110, … are strings over the binary alphabet Σ = {0, 1}.
NOTE: 1. Given any alphabet Σ, the shortest string that can be formed from Σ is the empty string, which we write as ε.
2. The set of all possible strings over an alphabet Σ is written as Σ*.
3. The Σ* notation uses the Kleene star operator.
Empty string: The empty string is the string with zero occurrences of symbols or characters. It is denoted by ε and may be chosen from any alphabet whatsoever.
Length of a string: The length of a string 'w' is the number of symbols or characters in 'w'.
It is denoted by |w|.
Example: |ε| = 0
|10010| = 5
NOTE: For any symbol 'c' and string 'w', we define the function $N_c(w)$ to be the number of times that the symbol 'c' occurs in string 'w'. So, for example, $N_a(abbaaa) = 4$.

Concatenation of strings: The concatenation of two strings 's' and 't' is the string formed by appending the string 't' to the string 's'.
It is denoted by s||t or st.
Example: If s = good and t = bye, then st = goodbye.
NOTE: 1. |xy| = |x| + |y|
2. The empty string ε is the identity for concatenation of strings: ∀x (xε = εx = x).
3. Concatenation, as a function defined on strings, is associative: ∀s, t, w ((st)w = s(tw)).

Replication: For each string w and each natural number k, the string $w^k$ is defined as:
$w^0 = \varepsilon$
$w^{k+1} = w^k w$
For example: $a^3 = aaa$
$(bye)^2 = byebye$
$a^2 b^3 = aabbb$
Reversal of a string: The reversal of a string w, denoted by $w^R$, is obtained by writing the symbols of w in reverse order.
Example: $w = 11001$, $w^R = 10011$
NOTE: 1. If |w| = 0, then $w^R = w = \varepsilon$.
2. If |w| ≥ 1, then $\exists a \in \Sigma\ (\exists u \in \Sigma^*\ (w = ua))$, i.e., the last character of w is a, and $w^R = a\,u^R$.
3. If w and x are strings, then $(wx)^R = x^R w^R$.
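Python strings can model these operations directly, with the empty string "" standing for ε. The sketch below is illustrative (the names count, power and reverse are ours); power and reverse deliberately follow the recursive definitions above instead of the shortcuts w * k and w[::-1].

```python
def count(c, w):
    """N_c(w): the number of times symbol c occurs in string w."""
    return sum(1 for symbol in w if symbol == c)

def power(w, k):
    """w^k using w^0 = ε and w^(k+1) = w^k w."""
    result = ""              # w^0 = ε
    for _ in range(k):
        result = result + w  # w^(i+1) = w^i w
    return result

def reverse(w):
    """w^R: ε^R = ε; if w = ua with a the last symbol, then w^R = a u^R."""
    if w == "":
        return ""
    u, a = w[:-1], w[-1]
    return a + reverse(u)

x, y = "good", "bye"
assert len(x + y) == len(x) + len(y)             # |xy| = |x| + |y|
assert x + "" == "" + x == x                     # ε is the identity for concatenation
assert reverse(x + y) == reverse(y) + reverse(x) # (xy)^R = y^R x^R
print(count("a", "abbaaa"), power("bye", 2), reverse("11001"))  # 4 byebye 10011
```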

Relations on Strings
Substring: A string 's' is a substring of a string 't' if 's' occurs contiguously as part of 't'.
Example: aaa is a substring of aaabbbaaa.
aaaaaa is not a substring of aaabbbaaa.
Proper Substring: A string 's' is a proper substring of a string 't' if 's' is a substring of 't' and s ≠ t.
Example: aaabbbaaa is not a proper substring of aaabbbaaa.
NOTE: 1. Every string is a substring (although not a proper substring) of itself.
2. The empty string ε is a substring of every string.

Prefix of a string: A prefix is a string of any number of leading symbols. A string 's' is a prefix of 't' if ∃x ∈ Σ* (t = sx).
Example: The prefixes of abba are: ε, a, ab, abb, abba.
Proper Prefix: A string 's' is a proper prefix of a string 't' if 's' is a prefix of 't' and s ≠ t.
Example: The proper prefixes of abba are: ε, a, ab, abb (abba is not a proper prefix).
NOTE: 1. Every string is a prefix (although not a proper prefix) of itself.
2. The empty string ε is a prefix of every string.
Suffix of a string: A suffix is a string of any number of trailing symbols. A string 's' is a suffix of 't' if ∃x ∈ Σ* (t = xs).
Example: The suffixes of abba are: ε, a, ba, bba, abba.
Proper Suffix: A string 's' is a proper suffix of a string 't' if 's' is a suffix of 't' and s ≠ t.
Example: The proper suffixes of abba are: ε, a, ba, bba (abba is not a proper suffix).
NOTE: 1. Every string is a suffix (although not a proper suffix) of itself.
2. The empty string ε is a suffix of every string.
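The prefix and suffix definitions translate into short list comprehensions. This is a sketch with function names of our own choosing; note that a string of length n has n + 1 prefixes and n + 1 suffixes, counting ε and the string itself.

```python
def prefixes(t):
    """All s with t = s x for some x: the first 0, 1, ..., |t| symbols of t."""
    return [t[:i] for i in range(len(t) + 1)]

def suffixes(t):
    """All s with t = x s for some x: the last 0, 1, ..., |t| symbols of t."""
    return [t[len(t) - i:] for i in range(len(t) + 1)]

def proper(candidates, t):
    """Drop t itself, leaving only the proper prefixes or suffixes."""
    return [s for s in candidates if s != t]

def is_substring(s, t):
    """True if s occurs contiguously in t (ε is a substring of every string)."""
    return s in t

print(prefixes("abba"))                  # ['', 'a', 'ab', 'abb', 'abba']
print(proper(suffixes("abba"), "abba"))  # ['', 'a', 'ba', 'bba']
```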

Language: A language is a (finite or infinite) set of strings chosen from Σ*, where Σ is a particular finite alphabet.
Example: Σ = {a, b}
Σ* = {ε, a, b, aa, ab, ba, bb, aaa, aab, …}
The language containing all strings of a's and b's with an equal number of each is:
L = {ε, ab, ba, aabb, baab, baba, …}
Powers of an alphabet: The k-th power of an alphabet Σ, denoted by $\Sigma^k$, is the set of all strings of length k over Σ.
Example: If Σ = {0, 1}, then $\Sigma^0 = \{\varepsilon\}$, $\Sigma^1 = \{0, 1\}$, $\Sigma^2 = \{00, 01, 10, 11\}$, $\Sigma^3 = \{000, 001, 010, 011, 100, 101, 110, 111\}$, and so on.
The set of languages defined on Σ is P(Σ*), the power set of Σ*, i.e., the set of all subsets of Σ*. If Σ = ∅, then Σ* is {ε} and P(Σ*) is {∅, {ε}}.
NOTE: 1. L = { } = ∅, the empty language, is a language over any alphabet.
2. L = {ε}, the language consisting of only the empty string, is also a language over any alphabet.
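Since $\Sigma^k$ is the k-fold Cartesian product of Σ, Python's itertools.product can enumerate it. Σ* itself is infinite, so the illustrative star_up_to helper (our name) builds only the finite slice Σ^0 ∪ … ∪ Σ^n.

```python
from itertools import product

def power_of_alphabet(sigma, k):
    """Σ^k: all strings of length exactly k over the alphabet sigma."""
    return {"".join(p) for p in product(sorted(sigma), repeat=k)}

def star_up_to(sigma, n):
    """Finite slice of Σ*: the union of Σ^0, Σ^1, ..., Σ^n."""
    result = set()
    for k in range(n + 1):
        result |= power_of_alphabet(sigma, k)
    return result

print(sorted(power_of_alphabet({"0", "1"}, 2)))  # ['00', '01', '10', '11']
print(star_up_to(set(), 3))                      # {''}, i.e. Σ = ∅ gives Σ* = {ε}
```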

Techniques for Defining Languages

We will use a variety of techniques for defining the languages that we wish to consider. Since languages are sets, we can define them using any of the set-defining techniques, for example:
L = {w ∈ Σ* | description of w}
Example 1: The language of strings of a's and b's in which all a's precede all b's can be defined as:
L = {w ∈ (a, b)* | all a's precede all b's in w}
The strings in L include:
L = {ε, a, b, aa, ab, bb, aabbb, …}
The strings aba, ba and abc are not in L.

Rules: The rule for the above language says only that any a's in a string must come before all the b's (if any). If a string has no a's or no b's, nothing can violate the rule, so the strings ε, a, aa and bb trivially satisfy it and are in L.
Example 2: Let L = {x : ∃y ∈ {a, b}* (x = ya)}. Give an English description of the language.
L = {a, aa, ba, aaa, baa, bbaa, …}
Language L contains the strings of a's and b's that end with 'a'.
Note: (a, b)* means all strings that can be formed by concatenating the symbols a and b zero or more times.
Example 3: Let L = {x#y : x, y ∈ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}* and, when x and y are viewed as the decimal representations of natural numbers, square(x) = y}.
The strings 3#9 and 12#144 are in L.

The strings 3#8 and 12#12#12 are not in L.
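A set-builder definition {w ∈ Σ* | description of w} amounts to a membership test. The predicates below are one way to write Examples 1 to 3 in Python (function names ours). For Example 3 we assume that an empty x or y does not count as the decimal representation of a natural number.

```python
import re

def all_as_precede_bs(w):
    """Example 1: w ∈ {a, b}* and no a comes after a b (equivalently, no 'ba')."""
    return set(w) <= {"a", "b"} and "ba" not in w

def ends_with_a(x):
    """Example 2: x = y a for some y ∈ {a, b}*."""
    return set(x) <= {"a", "b"} and x.endswith("a")

def is_square_pair(s):
    """Example 3: s = x#y with x, y decimal digit strings and square(x) = y."""
    match = re.fullmatch(r"([0-9]+)#([0-9]+)", s)  # assumption: x and y non-empty
    if match is None:
        return False
    x, y = match.groups()
    return int(x) ** 2 == int(y)

print([w for w in ["", "a", "ab", "aabbb", "aba", "ba", "abc"] if all_as_precede_bs(w)])
# prints ['', 'a', 'ab', 'aabbb']
```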

Concatenation of Languages: Let L1 and L2 be two languages defined over some alphabet Σ. Then their concatenation, written as L1L2, is:
L1L2 = {w ∈ Σ* : ∃s ∈ L1 (∃t ∈ L2 (w = st))}
Example 1:
Let L1 = {cat, dog, mouse, bird}.
L2 = {bone, food}.
L1L2 = {catbone, catfood, dogbone, dogfood, mousebone, mousefood, birdbone, birdfood}.

NOTE: 1. {ε} is the identity for concatenation: L{ε} = {ε}L = L.
2. ∅ is a zero for concatenation: L∅ = ∅L = ∅.
3. Concatenation defined on languages is associative. So, for all languages L1, L2, L3:
((L1L2)L3 = L1(L2L3))
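For finite languages, the definition of L1L2 is a set comprehension over all pairs (s, t). This sketch (concat is our name) rebuilds Example 1 and spot-checks the three notes.

```python
def concat(L1, L2):
    """L1 L2 = { st : s ∈ L1, t ∈ L2 } for finite languages given as sets."""
    return {s + t for s in L1 for t in L2}

L1 = {"cat", "dog", "mouse", "bird"}
L2 = {"bone", "food"}

print(sorted(concat(L1, L2)))
assert concat(L1, {""}) == concat({""}, L1) == L1                      # {ε} is the identity
assert concat(L1, set()) == concat(set(), L1) == set()                 # ∅ is a zero
assert concat(concat(L1, L2), {"s"}) == concat(L1, concat(L2, {"s"}))  # associativity
```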

FINITE AUTOMATA (FINITE STATE MACHINE)


What is a Finite Automaton? Explain the block diagram of a Finite Automaton.
A finite automaton is an abstract model of a digital computer. It is used to recognize regular languages over an input alphabet Σ. It is also called a Finite State Machine (FSM).
The job of an FA is to accept or reject an input depending on whether the pattern defined by the FA occurs in the input. It has 3 components: i. Input tape ii. Control unit iii. Output

Input tape: It is divided into a number of cells, each of which can hold one symbol.
Control unit: The machine has a finite number of states (q0, q1, q2, q3, q4, …), one of which is the start state (q0). Based on the current input symbol, the state of the machine can change.
Output unit: The output may be accept or reject. When the end of the input is reached, the control unit is either in an accepting state or in a non-accepting state.

Types of Finite Automata:

Different types of finite automata are:
1. Deterministic Finite Automata (DFA)
2. Non-Deterministic Finite Automata (NFA)
3. Non-Deterministic Finite Automata with ε-moves (ε-NFA)

DETERMINISTIC FINITE AUTOMATA (DFA) or DETERMINISTIC FINITE STATE MACHINE (DFSM)
Define DFA or DFSM
A DFA or DFSM is a five-tuple M = (Q, Σ, δ, q0, F), where M is a deterministic machine with:
Q --- Non-empty finite set of states
Σ --- Non-empty finite set of input symbols (the input alphabet)
δ --- Transition function, which maps Q × Σ → Q
q0 ∈ Q --- The start (initial) state
F ⊆ Q --- The set of final (accepting) states

Language accepted by a DFA:

Let M = (Q, Σ, δ, q0, F) be a DFA. A string w is accepted by the machine M if and only if the transitions on w take the start state q0 to some final state in F, i.e., δ*(q0, w) ∈ F.
The language accepted by the DFA is L(M) = {w | w ∈ Σ* and δ*(q0, w) ∈ F}.
Transition diagram:
It is a graphical representation of a DFA: states are drawn as circles, final states as double circles, the start state is marked by an incoming arrow, and each transition is an arc labelled with an input symbol.

Transition Table:
It is a tabular representation of the transition function δ, with one row for each state and one column for each input symbol; the entry in row q and column a is δ(q, a). For the above transition diagram, δ is given in this tabular form.
Extended Transition function of DFA to strings:
The transition function δ(q, a) = p takes two parameters, a state q and an input symbol 'a', and returns the next state 'p' of the machine. To describe the change from state q to state p on an input string 'w', the extended transition function δ* is used. It is defined by induction on the length of the string:
Basis: δ*(q, ε) = q
Induction: δ*(q, xa) = δ(δ*(q, x), a), where x is a string and a is an input symbol.
How DFA processes strings:
What are the moves made by the following DFA while processing the string 01101?

We know that δ*(q, ε) = q, i.e., δ*(q1, ε) = q1. Applying δ*(q, xa) = δ(δ*(q, x), a) one symbol at a time:
δ*(q1, 0) = δ(δ*(q1, ε), 0) = δ(q1, 0) = q1
δ*(q1, 01) = δ(δ*(q1, 0), 1) = δ(q1, 1) = q2
δ*(q1, 011) = δ(δ*(q1, 01), 1) = δ(q2, 1) = q2
δ*(q1, 0110) = δ(δ*(q1, 011), 0) = δ(q2, 0) = q1
δ*(q1, 01101) = δ(δ*(q1, 0110), 1) = δ(q1, 1) = q2
After reading the string w = 01101, the machine is in state q2, which is a final state. So, the string 01101 is accepted by the DFA.
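The trace can be replayed mechanically. Since the transition diagram is not reproduced here, this sketch assumes the DFA has only the two states q1 and q2 (q2 final) and exactly the four moves used in the trace; delta_star follows the inductive step δ*(q, xa) = δ(δ*(q, x), a) that the trace applies.

```python
# Assumed transition table: only the moves that appear in the trace above.
DELTA = {
    ("q1", "0"): "q1",
    ("q1", "1"): "q2",
    ("q2", "0"): "q1",
    ("q2", "1"): "q2",
}
START, FINALS = "q1", {"q2"}

def delta_star(q, w):
    """Extended transition function: δ*(q, ε) = q, δ*(q, xa) = δ(δ*(q, x), a)."""
    if w == "":
        return q
    x, a = w[:-1], w[-1]
    return DELTA[(delta_star(q, x), a)]

def trace(q, w):
    """States δ*(q, p) for every prefix p of w, from ε up to w itself."""
    return [delta_star(q, w[:i]) for i in range(len(w) + 1)]

print(trace(START, "01101"))                 # ['q1', 'q1', 'q2', 'q2', 'q1', 'q2']
print(delta_star(START, "01101") in FINALS)  # True
```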
DFA/DFSM Design Techniques
Pattern recognition problems:
i. Identify the minimum string.
ii. Construct a DFA for the minimum string using Σ.
iii. Identify the transitions that are not yet defined in the minimum-string DFA.
iv. Construct the complete DFA for the given alphabet by referring to the minimum-string DFA.
Draw a DFA or DFSM to accept the language of strings of a's having at least one a.
Answer:
The language contains the minimum string a (a single a).
Note: If the length of the minimum string is 'm', then the minimum-string DFA needs m + 1 states.
So the DFA corresponding to this minimum string is

Identify the transitions in each state which are not defined.

In q1, the transition on input a is not defined.

To reach state q1 from q0 we need one input a. Defining the transition on input a in q1 as a self-loop makes the machine accept aa, and in general any string of one or more a's, all of which are in the language.
i.e., δ(q1, a) = q1
Therefore the DFA for the above problem is given by M = ({q0, q1}, {a}, δ, q0, {q1}).
Draw a DFA or DFSM to accept the language of strings of a's and b's having at least one a.
The language contains the minimum string a (a single a). There is no restriction on b's, so in the minimum case there are none (ε), and in general there can be any number of b's.
So the DFA corresponding to this minimum string is

Identify the transitions in each state which are not defined.

In q0, the transition on input b is not defined; in q1, the transitions on inputs a and b are not defined.
Strings such as ba, bba, bbba, … (any number of b's followed by an a) must be accepted in q1, so the machine must stay in q0 while it reads the leading b's. Take a self-loop in q0 on input b: δ(q0, b) = q0.
To reach q1 from q0 we need one input a. In q1, a followed by any number of a's is in the language, so take a self-loop in q1 on input a:
i.e., δ(q1, a) = q1
There is no restriction on b's, so any number of b's can be accepted in state q1; take a self-loop on input b:
δ(q1, b) = q1

Therefore the DFA for the above problem is given by M = ({q0, q1}, {a, b}, δ, q0, {q1}).

Draw a DFA or DFSM to accept the language of strings of a's and b's having exactly one a.
There is no restriction on the number of b's, but every string in L contains exactly one a.
Therefore the DFA for the above problem is given by M = ({q0, q1}, {a, b}, δ, q0, {q1}).
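The transition diagrams for the two {a, b} problems above are not reproduced, so below is one complete DFA for each, written as a Python dictionary from (state, symbol) to next state. For exactly one a, this sketch adds a trap state qt (our assumption) so that δ is total and a second a has somewhere to go; the loop compares each DFA with the plain-English description on every string of length up to 8.

```python
from itertools import product

def accepts(delta, start, finals, w):
    """Run the DFA on w; delta maps (state, symbol) -> next state."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in finals

def strings(alphabet, max_len):
    """Every string over alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)

# At least one a: q0 = no a read yet, q1 = at least one a read.
AT_LEAST_ONE_A = {("q0", "a"): "q1", ("q0", "b"): "q0",
                  ("q1", "a"): "q1", ("q1", "b"): "q1"}

# Exactly one a: a second a sends the machine to the trap state qt.
EXACTLY_ONE_A = {("q0", "a"): "q1", ("q0", "b"): "q0",
                 ("q1", "a"): "qt", ("q1", "b"): "q1",
                 ("qt", "a"): "qt", ("qt", "b"): "qt"}

for w in strings("ab", 8):
    assert accepts(AT_LEAST_ONE_A, "q0", {"q1"}, w) == (w.count("a") >= 1)
    assert accepts(EXACTLY_ONE_A, "q0", {"q1"}, w) == (w.count("a") == 1)
```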

All strings of a's and b's ending with abb.

OR
Obtain a DFA or DFSM to accept the language L = {wabb | w ∈ (a, b)*}.
The minimum-length string is abb, so the DFA for the minimum string is given by:

Therefore the DFA for the given problem is M = ({q0, q1, q2, qf}, {a, b}, δ, q0, {qf}),
where δ is as shown in the transition diagram:

All strings of a's and b's except those which end with abb (not ending with abb).
The answer is similar to that of the previous problem, except that the non-final states of the previous DFA become final states and the final state becomes a non-final state.
The DFA for the strings not ending with abb is M = ({q0, q1, q2, q3}, {a, b}, δ, q0, {q0, q1, q2}).
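The diagram is not reproduced here, so the table below is one complete DFA consistent with M = ({q0, q1, q2, qf}, {a, b}, δ, q0, {qf}); reading each state as "how much of abb has just been seen" is our reconstruction. The complement is obtained, as described above, by swapping final and non-final states (the sketch keeps the name qf where the text renames it q3).

```python
from itertools import product

def accepts(delta, start, finals, w):
    """Run the DFA on w; delta maps (state, symbol) -> next state."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in finals

def strings(alphabet, max_len):
    """Every string over alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)

# q0: no useful suffix, q1: ends in a, q2: ends in ab, qf: ends in abb.
ENDS_ABB = {("q0", "a"): "q1", ("q0", "b"): "q0",
            ("q1", "a"): "q1", ("q1", "b"): "q2",
            ("q2", "a"): "q1", ("q2", "b"): "qf",
            ("qf", "a"): "q1", ("qf", "b"): "q0"}
STATES = {"q0", "q1", "q2", "qf"}
FINALS = {"qf"}
COMPLEMENT_FINALS = STATES - FINALS  # {q0, q1, q2}: strings not ending with abb

for w in strings("ab", 8):
    assert accepts(ENDS_ABB, "q0", FINALS, w) == w.endswith("abb")
    assert accepts(ENDS_ABB, "q0", COMPLEMENT_FINALS, w) == (not w.endswith("abb"))
```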

Obtain a DFA or DFSM to accept the language L = {xaaby | x, y ∈ (a, b)*}

OR
Obtain a DFA or DFSM to accept the language L of strings of a's and b's with the substring aab.
The minimum string is aab, so the DFA for the minimum string is:

DFA M = ({q0, q1, q2, qf}, {a, b}, δ, q0, {qf}),
where δ is given in the transition diagram.

Obtain a DFA/DFSM to accept the language L of strings of a's and b's except those having the substring aab.
Note: The DFA design procedure is the same as that of the previous problem, except that the final state is made non-final and the non-final states are made final.
DFA M = ({q0, q1, q2, q3}, {a, b}, δ, q0, {q0, q1, q2}),
where δ is as shown in the transition diagram.
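A sketch of one complete DFA for "contains aab" and its complement, with state roles chosen by us since the diagram is not reproduced: q1 means the string ends in a, q2 that it ends in aa, and qf that aab has already occurred. As in the text, the complement simply swaps final and non-final states (here qf plays the role of the text's q3).

```python
from itertools import product

def accepts(delta, start, finals, w):
    """Run the DFA on w; delta maps (state, symbol) -> next state."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in finals

def strings(alphabet, max_len):
    """Every string over alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)

CONTAINS_AAB = {("q0", "a"): "q1", ("q0", "b"): "q0",
                ("q1", "a"): "q2", ("q1", "b"): "q0",
                ("q2", "a"): "q2", ("q2", "b"): "qf",
                ("qf", "a"): "qf", ("qf", "b"): "qf"}  # qf: aab seen, stay forever
FINALS = {"qf"}
COMPLEMENT_FINALS = {"q0", "q1", "q2"}

for w in strings("ab", 8):
    assert accepts(CONTAINS_AAB, "q0", FINALS, w) == ("aab" in w)
    assert accepts(CONTAINS_AAB, "q0", COMPLEMENT_FINALS, w) == ("aab" not in w)
```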

Obtain a DFA/DFSM to accept the language L = {abw | w ∈ (a, b)*}

OR
Obtain a DFA/DFSM to accept the language L of strings of a's and b's starting with ab.
The minimum string is ab, and the DFA corresponding to it is given by:

In state q0 on input b, and in state q1 on input a, the string cannot start with ab, so the machine enters the trap state qt. Once ab has been read, the machine is in state qf, where ab followed by any number of a's and b's is accepted.
DFA M = ({q0, q1, qf, qt}, {a, b}, δ, q0, {qf}),
where δ is as shown in the transition diagram.
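A sketch of the DFA with the four states named above: q0 and q1 check the prefix ab, any mismatch goes to the trap state qt, and qf loops on both symbols. The exact diagram is not reproduced, so treat the table as a reconstruction.

```python
from itertools import product

def accepts(delta, start, finals, w):
    """Run the DFA on w; delta maps (state, symbol) -> next state."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in finals

def strings(alphabet, max_len):
    """Every string over alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)

STARTS_AB = {("q0", "a"): "q1", ("q0", "b"): "qt",
             ("q1", "a"): "qt", ("q1", "b"): "qf",
             ("qf", "a"): "qf", ("qf", "b"): "qf",
             ("qt", "a"): "qt", ("qt", "b"): "qt"}

for w in strings("ab", 8):
    assert accepts(STARTS_AB, "q0", {"qf"}, w) == w.startswith("ab")
```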

Obtain a DFA/DFSM to accept the language of strings of 0's and 1's having exactly three consecutive 0's.
The minimum string is 000, and its DFA is:

DFA M = ({q0, q1, q2, q3}, {0, 1}, δ, q0, {q3}),
where δ is as shown in the transition diagram.

Draw a DFA/DFSM to accept strings of a's and b's such that L = {awa | w ∈ (a+b)*}
OR
Show that the language L = {awa | w ∈ (a+b)*} is regular.
Note: A language is regular if it is accepted by a DFA.
That means if it is possible to design a DFA for the given language L = {awa | w ∈ (a+b)*}, then we say that L is regular.
The minimum string for the language L = {awa} is aa, and its DFA is

DFA M = ({q0, q1, qf}, {a, b}, δ, q0, {qf}),
where δ is as shown in the transition diagram.

Therefore the given language is regular.
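The 5-tuple above lists only q0, q1 and qf; to make δ total, this sketch adds a trap state qt for strings that start with b (our assumption). In the reconstruction, q1 means "starts with a but does not currently end with a" and qf means "starts and ends with a, length at least 2".

```python
from itertools import product

def accepts(delta, start, finals, w):
    """Run the DFA on w; delta maps (state, symbol) -> next state."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in finals

def strings(alphabet, max_len):
    """Every string over alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)

AWA = {("q0", "a"): "q1", ("q0", "b"): "qt",
       ("q1", "a"): "qf", ("q1", "b"): "q1",
       ("qf", "a"): "qf", ("qf", "b"): "q1",
       ("qt", "a"): "qt", ("qt", "b"): "qt"}

for w in strings("ab", 8):
    expected = len(w) >= 2 and w[0] == "a" and w[-1] == "a"
    assert accepts(AWA, "q0", {"qf"}, w) == expected
```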


Draw a DFA/DFSM to accept strings of a's and b's such that L = {w(ab+ba) | w ∈ (a+b)*}
OR
Draw a DFA/DFSM to accept strings of a's and b's ending with ab or ba.
The minimum strings are ab and ba, and the DFA is

DFA M = ({q0, q1, q2, q3, q4}, {a, b}, δ, q0, {q3, q4}),
where δ is as shown in the transition diagram.
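The 5-tuple has five states with q3 and q4 final. In the reconstruction below (state roles are our assumption), q1 and q2 mean "ends in a" and "ends in b" without completing ab or ba, q3 means "ends in ab" and q4 means "ends in ba".

```python
from itertools import product

def accepts(delta, start, finals, w):
    """Run the DFA on w; delta maps (state, symbol) -> next state."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in finals

def strings(alphabet, max_len):
    """Every string over alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)

ENDS_AB_OR_BA = {("q0", "a"): "q1", ("q0", "b"): "q2",
                 ("q1", "a"): "q1", ("q1", "b"): "q3",
                 ("q2", "a"): "q4", ("q2", "b"): "q2",
                 ("q3", "a"): "q4", ("q3", "b"): "q2",
                 ("q4", "a"): "q1", ("q4", "b"): "q3"}

for w in strings("ab", 8):
    expected = w.endswith(("ab", "ba"))
    assert accepts(ENDS_AB_OR_BA, "q0", {"q3", "q4"}, w) == expected
```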

Draw a DFA/DFSM to accept strings of 0's, 1's and 2's beginning with a 0, followed by an odd number of 1's, and ending with a 2.
The minimum string is 012, and its DFA is

DFA M = ({q0, q1, q2, qf, qt}, {0, 1, 2}, δ, q0, {qf}), where qt is the trap state and δ is as shown in the transition diagram.
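A sketch of one complete DFA with the states named in the 5-tuple above. We read the language as a single 0, then an odd number of 1's, then a single 2 (which matches the minimum string 012); every move not listed goes to the trap state qt. The check uses Python's re.fullmatch with the pattern 01(11)*2.

```python
import re
from itertools import product

def accepts(delta, start, finals, w):
    """Run the DFA on w; delta maps (state, symbol) -> next state."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in finals

def strings(alphabet, max_len):
    """Every string over alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)

STATES = ["q0", "q1", "q2", "qf", "qt"]
DELTA = {(q, s): "qt" for q in STATES for s in "012"}  # default: trap state
DELTA.update({("q0", "0"): "q1",   # the leading 0 (zero 1's so far)
              ("q1", "1"): "q2",   # now an odd number of 1's
              ("q2", "1"): "q1",   # back to an even number of 1's
              ("q2", "2"): "qf"})  # closing 2 after an odd number of 1's

for w in strings("012", 7):
    expected = re.fullmatch(r"01(11)*2", w) is not None
    assert accepts(DELTA, "q0", {"qf"}, w) == expected
```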

Draw a DFA/DFSM to accept strings of a's and b's with at most two consecutive b's.
The minimum string is bb, and its DFA is
L = {ε, b, bb, ab, abb, bba, a, baa, aa, …}

DFA M = ({q0, q1, qf, qt}, {a, b}, δ, q0, {q0, q1, qf}), where qt is the trap state and δ is as shown in the transition diagram. Since ε, a and b are in L, q0 and q1 must be final as well as qf:
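State roles below are our reconstruction: q0 means the string so far does not end in b, q1 that it ends in exactly one b, qf that it ends in bb, and qt that bbb has occurred. Because ε, a and b belong to L, every state except qt is treated as final in this sketch.

```python
from itertools import product

def accepts(delta, start, finals, w):
    """Run the DFA on w; delta maps (state, symbol) -> next state."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in finals

def strings(alphabet, max_len):
    """Every string over alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)

AT_MOST_BB = {("q0", "a"): "q0", ("q0", "b"): "q1",
              ("q1", "a"): "q0", ("q1", "b"): "qf",
              ("qf", "a"): "q0", ("qf", "b"): "qt",
              ("qt", "a"): "qt", ("qt", "b"): "qt"}
FINALS = {"q0", "q1", "qf"}

for w in strings("ab", 9):
    assert accepts(AT_MOST_BB, "q0", FINALS, w) == ("bbb" not in w)
```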

Draw a DFA to accept strings of 0's and 1's starting with at least two 0's and ending with at least two 1's.
The minimum string is 0011, and its DFA is

The DFA M = ({q0, q1, q2, q3, qf, qt}, {0, 1}, δ, q0, {qf}), where qt is the trap state and δ is as shown in the transition diagram:
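One complete DFA with the states of the 5-tuple above (roles are our reconstruction): q1 and q2 check the leading 00, after which q2, q3 and qf remember whether the string currently ends in 0, in a single 1, or in 11.

```python
from itertools import product

def accepts(delta, start, finals, w):
    """Run the DFA on w; delta maps (state, symbol) -> next state."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in finals

def strings(alphabet, max_len):
    """Every string over alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)

START00_END11 = {("q0", "0"): "q1", ("q0", "1"): "qt",
                 ("q1", "0"): "q2", ("q1", "1"): "qt",
                 ("q2", "0"): "q2", ("q2", "1"): "q3",   # q2: prefix 00 read, ends in 0
                 ("q3", "0"): "q2", ("q3", "1"): "qf",   # q3: ends in a single 1
                 ("qf", "0"): "q2", ("qf", "1"): "qf",   # qf: ends in 11
                 ("qt", "0"): "qt", ("qt", "1"): "qt"}

for w in strings("01", 9):
    expected = w.startswith("00") and w.endswith("11")
    assert accepts(START00_END11, "q0", {"qf"}, w) == expected
```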

Draw a DFA to accept strings of a's and b's having not more than three a's.
OR
Draw a DFA to accept the language L = {w ∈ (a, b)* : $N_a(w) \le 3$}.
Strings in L: {ε, a, aa, aaa, b, ba, bba, abbb, …}
The DFA M = ({q0, q1, q2, q3, qt}, {a, b}, δ, q0, {q0, q1, q2, q3}), where qt is the trap state and δ is as shown in the transition diagram. Every state except qt is final, since a string with zero, one, two or three a's is in L:
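Here the state index simply counts the a's read so far, and the trap state qt is entered on a fourth a. Since strings with zero to three a's are all in L, this sketch treats every state except qt as final.

```python
from itertools import product

def accepts(delta, start, finals, w):
    """Run the DFA on w; delta maps (state, symbol) -> next state."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in finals

def strings(alphabet, max_len):
    """Every string over alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)

AT_MOST_3A = {}
for i in range(4):
    AT_MOST_3A[(f"q{i}", "b")] = f"q{i}"                          # b's leave the count alone
    AT_MOST_3A[(f"q{i}", "a")] = f"q{i + 1}" if i < 3 else "qt"   # one more a
AT_MOST_3A[("qt", "a")] = AT_MOST_3A[("qt", "b")] = "qt"
FINALS = {"q0", "q1", "q2", "q3"}

for w in strings("ab", 9):
    assert accepts(AT_MOST_3A, "q0", FINALS, w) == (w.count("a") <= 3)
```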

Obtain DFAs to accept the following languages on Σ = {a, b}.

i. The set of all strings that begin with the substring ab, end with the substring ab, or both.
ii. The set of strings with at least one "a" and exactly two "b"s.
OR
L = {w | $N_a(w) \ge 1$ and $N_b(w) = 2$}.
iii. The set of all strings of even length.
OR
L = {w | w ∈ {a, b}* and |w| is even}

i. A DFA to accept the strings of a's and b's starting with ab can be written as shown below:

A DFA to accept the strings of a's and b's ending with ab can be written as shown below:

The two DFAs can be joined to accept the strings of a's and b's beginning with ab, ending with ab, or both, as shown below:
The DFA M = ({q0, q1, q2, q3, q4, q5}, {a, b}, δ, q0, {q2, q5}), where δ is as shown in the transition diagram:
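One complete DFA with six states and final states q2 and q5, as in the 5-tuple above, is sketched below; which situation each state stands for is our reconstruction, since the joined diagram is not reproduced.

```python
from itertools import product

def accepts(delta, start, finals, w):
    """Run the DFA on w; delta maps (state, symbol) -> next state."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in finals

def strings(alphabet, max_len):
    """Every string over alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)

BEGIN_OR_END_AB = {
    ("q0", "a"): "q1", ("q0", "b"): "q3",
    ("q1", "a"): "q4", ("q1", "b"): "q2",   # q1: first symbol was a
    ("q2", "a"): "q2", ("q2", "b"): "q2",   # q2: began with ab, accept anything
    ("q3", "a"): "q4", ("q3", "b"): "q3",   # q3: no ab prefix, ends in b but not ab
    ("q4", "a"): "q4", ("q4", "b"): "q5",   # q4: no ab prefix, ends in a
    ("q5", "a"): "q4", ("q5", "b"): "q3",   # q5: no ab prefix, ends in ab
}

for w in strings("ab", 9):
    expected = w.startswith("ab") or w.endswith("ab")
    assert accepts(BEGIN_OR_END_AB, "q0", {"q2", "q5"}, w) == expected
```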

ii. The set of strings with at least one "a" and exactly two "b"s.
The minimum strings are abb, bab and bba, and the DFA is as shown below:

Therefore the DFA is M = ({q0, q1, q2, q3, q4, q5, qt}, {a, b}, δ, q0, {q3}), where qt is the trap state and δ is as shown in the transition diagram:

iii. The set of all strings of a's and b's of even length.

DFA M = ({q0, q1}, {a, b}, δ, q0, {q0}), where δ is as shown in the transition diagram:
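Part (ii) can be built by tracking two facts at once: whether an a has been seen, and how many b's have been read (0, 1, 2, or too many). The state numbering below is chosen by us so that q3 is the only final state and qt the trap, matching the 5-tuple for (ii); part (iii) is the two-state parity machine.

```python
from itertools import product

def accepts(delta, start, finals, w):
    """Run the DFA on w; delta maps (state, symbol) -> next state."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in finals

def strings(alphabet, max_len):
    """Every string over alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)

# (ii) q0, q1, q2: no a yet with 0, 1, 2 b's; q4, q5, q3: some a with 0, 1, 2 b's.
AB2 = {("q0", "a"): "q4", ("q0", "b"): "q1",
       ("q1", "a"): "q5", ("q1", "b"): "q2",
       ("q2", "a"): "q3", ("q2", "b"): "qt",
       ("q4", "a"): "q4", ("q4", "b"): "q5",
       ("q5", "a"): "q5", ("q5", "b"): "q3",
       ("q3", "a"): "q3", ("q3", "b"): "qt",
       ("qt", "a"): "qt", ("qt", "b"): "qt"}   # qt: three or more b's

# (iii) q0: even length so far, q1: odd length so far.
EVEN = {("q0", "a"): "q1", ("q0", "b"): "q1",
        ("q1", "a"): "q0", ("q1", "b"): "q0"}

for w in strings("ab", 8):
    assert accepts(AB2, "q0", {"q3"}, w) == (w.count("a") >= 1 and w.count("b") == 2)
    assert accepts(EVEN, "q0", {"q0"}, w) == (len(w) % 2 == 0)
```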

Obtain a DFA to accept the language $L = \{(01)^i\, 1^{2j} \mid i \ge 1,\ j \ge 1\}$.

The possible strings are L = {0111, 01010101…11, …}
i.e., L contains the strings of 0's and 1's made of one or more repetitions of 01, followed by an even number (at least two) of 1's.
The DFA M = ({q0, q1, q2, q3, qf, qt}, {0, 1}, δ, q0, {qf}), where δ is as shown in the transition diagram:
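A reconstruction of one complete DFA with the states listed above (state roles are our assumption), checked against the Python regular expression (01)+(11)+ using re.fullmatch. Once the first trailing 1 is read, the machine can never return to the (01) part, which is what the trap transitions on 0 enforce.

```python
import re
from itertools import product

def accepts(delta, start, finals, w):
    """Run the DFA on w; delta maps (state, symbol) -> next state."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in finals

def strings(alphabet, max_len):
    """Every string over alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)

STATES = ["q0", "q1", "q2", "q3", "qf", "qt"]
DELTA = {(q, s): "qt" for q in STATES for s in "01"}  # default: trap state
DELTA.update({("q0", "0"): "q1",   # start of a 01 block
              ("q1", "1"): "q2",   # completed (01)^i with i >= 1
              ("q2", "0"): "q1",   # another 01 block
              ("q2", "1"): "q3",   # odd number of trailing 1's
              ("q3", "1"): "qf",   # even number (>= 2) of trailing 1's
              ("qf", "1"): "q3"})

for w in strings("01", 10):
    expected = re.fullmatch(r"(01)+(11)+", w) is not None
    assert accepts(DELTA, "q0", {"qf"}, w) == expected
```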

Obtain a DFA to accept the set of all strings that begin with 01 and end with 11.
The minimum-length string is 011, and its DFA is:

The DFA M = ({q0, q1, q2, q3, q4, qt}, {0, 1}, δ, q0, {q3}), where qt is the trap state and δ is as shown in the transition diagram:

Obtain a DFA to accept the set of all strings that begin with 01 and end with 10.
The minimum-length string is 010, and its DFA is:

The DFA M = ({q0, q1, q2, q3, q4, qt}, {0, 1}, δ, q0, {q3}), where qt is the trap state and δ is as shown in the transition diagram:
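Both problems share the prefix check for 01; after it, the DFA only has to remember the last one or two symbols. The tables below are one consistent assignment of roles to q2, q3 and q4 (our assumption), with q3 as the only final state in each, as in the two 5-tuples.

```python
from itertools import product

def accepts(delta, start, finals, w):
    """Run the DFA on w; delta maps (state, symbol) -> next state."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in finals

def strings(alphabet, max_len):
    """Every string over alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)

TRAP = {("qt", "0"): "qt", ("qt", "1"): "qt"}
PREFIX_01 = {("q0", "0"): "q1", ("q0", "1"): "qt",
             ("q1", "0"): "qt", ("q1", "1"): "q2"}

# Begins with 01, ends with 11: q2 = ends in 01, q3 = ends in 11, q4 = ends in 0.
ENDS_11 = {**PREFIX_01, **TRAP,
           ("q2", "0"): "q4", ("q2", "1"): "q3",
           ("q3", "0"): "q4", ("q3", "1"): "q3",
           ("q4", "0"): "q4", ("q4", "1"): "q2"}

# Begins with 01, ends with 10: q2 = ends in 1, q3 = ends in 10, q4 = ends in 00.
ENDS_10 = {**PREFIX_01, **TRAP,
           ("q2", "0"): "q3", ("q2", "1"): "q2",
           ("q3", "0"): "q4", ("q3", "1"): "q2",
           ("q4", "0"): "q4", ("q4", "1"): "q2"}

for w in strings("01", 9):
    assert accepts(ENDS_11, "q0", {"q3"}, w) == (w.startswith("01") and w.endswith("11"))
    assert accepts(ENDS_10, "q0", {"q3"}, w) == (w.startswith("01") and w.endswith("10"))
```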

Obtain a DFA to accept the language of strings of 0's and 1's with an odd number of 1's followed by an even number of 0's.

The DFA M = ({q0, q1, q2, q3, qt}, {0, 1}, δ, q0, {q1, q3}), where δ is as shown in the transition diagram:

Obtain a DFA to accept the language L = {w | w is of even length and begins with 01}.
The DFA M = ({q0, q1, q2, q3, qt}, {0, 1}, δ, q0, {q2}), where δ is as shown in the transition diagram:
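A sketch of both machines with the state names of the 5-tuples above (state roles are our reconstruction). For the first language we assume that "followed by an even number of 0's" allows zero 0's, which is why q1 is final, and that no 1 may follow the 0's.

```python
import re
from itertools import product

def accepts(delta, start, finals, w):
    """Run the DFA on w; delta maps (state, symbol) -> next state."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in finals

def strings(alphabet, max_len):
    """Every string over alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)

ODD1_EVEN0 = {("q0", "1"): "q1", ("q0", "0"): "qt",   # q0: even number of 1's
              ("q1", "1"): "q0", ("q1", "0"): "q2",   # q1: odd number of 1's, no 0 yet
              ("q2", "0"): "q3", ("q2", "1"): "qt",   # q2: odd number of 0's so far
              ("q3", "0"): "q2", ("q3", "1"): "qt",   # q3: even number (>= 2) of 0's
              ("qt", "0"): "qt", ("qt", "1"): "qt"}

EVEN_01 = {("q0", "0"): "q1", ("q0", "1"): "qt",
           ("q1", "1"): "q2", ("q1", "0"): "qt",
           ("q2", "0"): "q3", ("q2", "1"): "q3",      # q2: begins with 01, even length
           ("q3", "0"): "q2", ("q3", "1"): "q2",      # q3: begins with 01, odd length
           ("qt", "0"): "qt", ("qt", "1"): "qt"}

for w in strings("01", 9):
    expected = re.fullmatch(r"1(11)*(00)*", w) is not None
    assert accepts(ODD1_EVEN0, "q0", {"q1", "q3"}, w) == expected
    assert accepts(EVEN_01, "q0", {"q2"}, w) == (w.startswith("01") and len(w) % 2 == 0)
```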

Obtain a DFA to accept the language of binary strings that represent odd numbers.
The DFA M = ({ q0, q1}, { 0, 1}, δ, q0, {q1}) where δ is as shown in transition diagram

Obtain a DFA to accept the set of all strings that, when interpreted as binary integers, are odd or even numbers (that is, every non-empty binary string).
The DFA M = ({ q0, q1, q2}, { 0, 1}, δ, q0, {q1,q2}) where δ is as shown in transition diagram

Obtain a DFA for the language L = {w ∈ {0, 1}* : w has odd parity} (an odd number of 1's).

Obtain a DFA for the language L = {w ∈ {a, b}* : no two consecutive characters are the same}.
Answer:
L = { ϵ, a, b, ab, ba, aba, bab,abab,……………}
The DFSM M = ({q0, q1, q2, d}, {a, b}, δ, q0, {q0, q1, q2}) where δ is as shown in the transition diagram and d is the dead (trap) state.


Obtain a DFA for the language L = {w ∈ {a, b}* : every a region in w is of even length}.
Answer:
An a region is a maximal block of consecutive a's. Language L contains the strings of a's and b's in which every such block has even length; the b's may appear anywhere.

L= {ϵ, b, bb, aab, aabb, baa, aabaab, aabaaaabbbbbb……………………………..}

The DFSM M = ({ q0, q1, d}, { a, b}, δ, q0, {q0}) where δ is as shown in transition diagram

Modulo-n problems
Obtain a DFA to accept the language L = { w ∈ {a, b}* : |w| mod 3 = 0 }
Answer:
Division of |w| by 3 leaves one of three remainders: 0, 1 or 2, so the DFA needs three states. The start state q0 is reached without reading any input symbol (length 0), so q0 is the remainder-0 state. Similarly, q1 is the remainder-1 state (length 1) and q2 is the remainder-2 state (length 2). On the next symbol the machine returns to q0 and the same cycle repeats. The final state is q0, since |w| mod 3 = 0 exactly when the machine ends in the remainder-0 state.
The DFA M = ({ q0, q1, q2}, { a, b}, δ, q0, {q0}) where δ is as shown in transition diagram:
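A minimal Python sketch of this modulo-n construction, assuming states are named by their remainder (state i stands for qi). Every symbol moves state i to (i + 1) mod n, whichever symbol it is; the function names are illustrative.

```python
def length_mod_dfa(n, alphabet="ab"):
    # State i means "length read so far ≡ i (mod n)"; every symbol moves i to (i + 1) mod n.
    return {(i, c): (i + 1) % n for i in range(n) for c in alphabet}

def run(delta, w, start=0):
    state = start
    for c in w:
        state = delta[(state, c)]
    return state

delta3 = length_mod_dfa(3)
for w in ["", "ab", "aba", "abba", "bbbbbb"]:
    final = run(delta3, w)
    verdict = "accepted" if final == 0 else "rejected"   # F = {q0} for |w| mod 3 = 0
    print(repr(w), "ends in q%d," % final, verdict)
```

Accepting in states {1, 2} instead of {0} gives the DFA for |w| mod 3 ≠ 0, and length_mod_dfa(5) gives the mod-5 machines.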

Obtain a DFA to accept the language L = { w ∈ {a, b}* : |w| mod 3 ≠ 0 }.
The DFA M = ({ q0, q1, q2}, { a, b}, δ, q0, {q1, q2}) where δ is as shown in transition diagram:


Obtain a DFA to accept the language L = { w ∈ {a, b}* : |w| mod 5 = 0 }.


The possible strings in L ={ ε, ababb, aaaaa, bbbbb, baaab, ………….. }
Possible remainders are 0,1,2,3, 4. So there are 5 states, remainder 0 is the start state and also the
final state.
The DFA M = ({ q0, q1, q2, q3, q4}, { a, b}, δ, q0, {q0}) where δ is as shown in transition diagram

Obtain a DFA to accept the language L = { w ∈ {a, b}* : |w| mod 5 ≠ 0 }
The DFA M = ({ q0, q1, q2, q3, q4}, { a, b}, δ, q0, { q1, q2, q3, q4}) where δ is as shown in transition
diagram

Obtain a DFA to accept the language L = { w ∈ {a, b}* : |w| mod 3 ≥ |w| mod 2 }.
Answer:
Here, division by 3 leaves three remainders 0, 1, 2 and division by 2 leaves two remainders 0, 1.
Let x = |w| mod 3, giving the state set Q1 = {0, 1, 2}, and
let y = |w| mod 2, giving the state set Q2 = {0, 1}.
The states required to design the DFA are obtained by taking the cross product of Q1 and Q2:
Q = Q1 × Q2
Q = {(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)}.
Here (0, 0) is the start state, because the empty string has length 0, and 0 mod 3 = 0 and 0 mod 2 = 0.


Final states: to accept the strings w with |w| mod 3 ≥ |w| mod 2, the final states are the pairs (x, y) with x ≥ y.
So the final states are {(0,0), (1,0), (1,1), (2,0),(2,1) }
The DFA M = ({ q0, q1, q2, q3, q4, q5}, { a, b}, δ, q0, { q0, q1, q2, q4, q5}) where δ is as shown in
transition diagram

Obtain a DFA to accept the language L = { w ∈ {a, b}* : |w| mod 3 ≤ |w| mod 2 }.
Q1 = { 0, 1, 2 }
Q2 = { 0, 1 }.
Q = {( 0, 0), ( 0, 1), (1, 0), (1, 1), (2, 0), (2, 1) }.
Here (0, 0) state is considered as start state.
Final states are the pairs with x ≤ y, i.e., {(0, 0), (0, 1), (1, 1)}
The DFA M = ({ q0, q1, q2, q3, q4, q5}, { a, b}, δ, q0, { q0, q1, q3}) where δ is as shown in transition
diagram:

Obtain a DFA to accept the language L = { w ∈ {a, b}* : |w| mod 3 ≠ |w| mod 2 }.
Answer:
Q = {( 0, 0), ( 0, 1), (1, 0), (1, 1), (2, 0), (2, 1) }.
Here (0, 0) state is considered as start state.
Final states are the pairs with x ≠ y, i.e., {(0, 1), (1, 0), (2, 0), (2, 1)}
The DFA M = ({ q0, q1, q2, q3, q4, q5}, { a, b}, δ, q0, { q2, q3, q4, q5}) where δ is as shown in transition
diagram:
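The three comparison problems above differ only in their final states. The sketch below, a hedged illustration rather than the notes' own method, builds the product states (x, y), selects the final states with a comparison operator, and names the pairs q0 to q5 under the assumption that qk is the state reached after reading k symbols. With that naming it reproduces the final-state sets {q0, q1, q2, q4, q5}, {q0, q1, q3} and {q2, q3, q4, q5} given above.

```python
import operator
from itertools import product

# A state is the pair (x, y) = (|w| mod 3, |w| mod 2); any symbol advances both counters.
STATES = list(product(range(3), range(2)))

def step(state):
    x, y = state
    return ((x + 1) % 3, (y + 1) % 2)

def final_states(cmp):
    return {s for s in STATES if cmp(*s)}

def accepts(w, cmp):
    state = (0, 0)                     # empty string: 0 mod 3 = 0 and 0 mod 2 = 0
    for _ in w:                        # whether the symbol is a or b does not matter
        state = step(state)
    return state in final_states(cmp)

# Name the pairs q0..q5 so that qk is the state reached after k symbols (assumed naming).
names, cur = {}, (0, 0)
for k in range(6):
    names[cur] = "q%d" % k
    cur = step(cur)

for label, cmp in [(">=", operator.ge), ("<=", operator.le), ("!=", operator.ne)]:
    print(label, sorted(names[s] for s in final_states(cmp)))
```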


Obtain a DFA to accept the language L = { w ∈ {a, b}* : Na(w) mod 3 = 2 and Nb(w) mod 2 = 1 }
Answer:
Na(w) mod 3 leaves the remainders 0, 1, 2, and the states corresponding to these remainders can be represented as Q1 = {A0, A1, A2}.
Nb(w) mod 2 leaves the remainders 0, 1, and the corresponding states can be represented as Q2 = {B0, B1}.
The possible states of the DFA are Q = Q1 × Q2:
Q = {(A0, B0), (A0, B1), (A1, B0), (A1, B1), (A2, B0), (A2, B1)}
Here (A0, B0) is the start state. Reading an a advances only the A component and reading a b advances only the B component. Since the language contains the strings of a's and b's with Na(w) mod 3 = 2 and Nb(w) mod 2 = 1, the final state is (A2, B1).
The DFA M = ({ q0, q1, q2, q3, q4, q5}, { a, b}, δ, q0, { q5}) where δ is as shown in transition
diagram:
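The same cross-product idea for counting a's and b's can be sketched as follows, assuming state (i, j) stands for (Ai, Bj). Only the set of final states changes between variants; the function names are illustrative.

```python
from itertools import product

def count_dfa(m, n):
    """Product DFA tracking (Na(w) mod m, Nb(w) mod n); state (i, j) plays the role of (Ai, Bj)."""
    delta = {}
    for i, j in product(range(m), range(n)):
        delta[((i, j), "a")] = ((i + 1) % m, j)   # an a advances only the A component
        delta[((i, j), "b")] = (i, (j + 1) % n)   # a b advances only the B component
    return delta

def accepts(delta, finals, w):
    state = (0, 0)                                # (A0, B0): no a's or b's read yet
    for c in w:
        state = delta[(state, c)]
    return state in finals

d32 = count_dfa(3, 2)
finals_eq = {(2, 1)}                              # Na(w) mod 3 = 2 and Nb(w) mod 2 = 1
finals_ge = {(i, j) for i in range(3) for j in range(2) if i >= 1}  # Na(w) mod 3 >= 1

for w in ["aab", "ababa", "aa", "b", "aaaab"]:
    # cross-check the DFA against the definition of the language
    assert accepts(d32, finals_eq, w) == (w.count("a") % 3 == 2 and w.count("b") % 2 == 1)
    print(w, accepts(d32, finals_eq, w), accepts(d32, finals_ge, w))
```

The even/odd problems i to iv that follow are count_dfa(2, 2) with final state (0, 0), (0, 1), (1, 0) or (1, 1) respectively.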


Obtain a DFA to accept the language L = { w ∈ {a, b}* : Na(w) mod 3 ≥ 1 and Nb(w) mod 2 ≤ 1 }
The answer is the same as that of the previous problem; only the final states differ.
Final states = {(A1, B0), (A1, B1), (A2, B0), (A2, B1)}.
The DFA M = ({q0, q1, q2, q3, q4, q5}, { a, b}, δ, q0, { q1, q2, q4, q5}) where δ is as shown in
transition diagram:

****Obtain a DFA to accept the language containing strings of a's and b's such that:
i. Set of all strings having an even number of a's and an even number of b's.
ii. Set of all strings having an even number of a's and an odd number of b's.
iii. Set of all strings having an odd number of a's and an even number of b's.
iv. Set of all strings having an odd number of a's and an odd number of b's.

i. Set of all strings having an even number of a's and an even number of b's:
OR
Number of a's divisible by 2 and number of b's divisible by 2.
OR
Number of a's a multiple of 2 and number of b's a multiple of 2.
Answer:
An even number of a's means Na(w) mod 2 = 0.
An even number of b's means Nb(w) mod 2 = 0.
So the possible remainders in each case are 0 and 1:
Q1 = {A0, A1}

Q2 = { B0, B1}. Therefore Q = Q1 X Q2 = { (A0, B0) , (A0, B1), (A1, B0), (A1, B1) }
Start state: (A0, B0)
Final state to accept the language containing an even number of a's and an even number of b's is (A0, B0).
So the DFA M = ({ q0, q1, q2, q3 }, { a, b}, δ, q0, { q0}) where δ is as shown in transition diagram:

ii. Set of all strings having an even number of a's and an odd number of b's.
The answer is the same as that of the previous problem; only the final state differs, i.e., (A0, B1).
An even number of a's means Na(w) mod 2 = 0.
An odd number of b's means Nb(w) mod 2 = 1.
DFA M = ({ q0, q1, q2, q3 }, { a, b}, δ, q0, { q2}) where δ is as shown in transition diagram:

iii. Set of all strings having an odd number of a's and an even number of b's.
The answer is the same as that of the previous problem; only the final state differs, i.e., (A1, B0).
An odd number of a's means Na(w) mod 2 = 1.
An even number of b's means Nb(w) mod 2 = 0.
DFA M = ({q0, q1, q2, q3 }, { a, b}, δ, q0, { q1}) where δ is as shown in transition diagram


iv. Set of all strings having an odd number of a's and an odd number of b's.
An odd number of a's means Na(w) mod 2 = 1.
An odd number of b's means Nb(w) mod 2 = 1.
The answer is the same as that of the previous problem; only the final state differs, i.e., (A1, B1).
DFA M = ({ q0, q1, q2, q3 }, { a, b}, δ, q0, { q3}) where δ is as shown in transition diagram:

DIVISIBLE-BY-n PROBLEMS:


Procedure for designing a DFA for strings over Σ, interpreted as binary or decimal numbers, that are divisible by some number n:
1. If the number (binary or decimal) must be divisible by n, the DFA needs n states.
2. The states of the DFA are the possible remainders of division by n.


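3. For example, if the strings are interpreted as binary numbers that must be divisible by 5, then a total of five states is needed to design the DFA.
4. The states of the DFA: the start state q0 represents remainder 0, q1 remainder 1, q2 remainder 2, q3 remainder 3 and q4 remainder 4.
5. The final state of the DFA is the remainder-0 state q0, since every number divisible by n leaves remainder 0.
6. In the remainder-r state, on reading the next digit d the machine moves to the remainder-((k·r + d) mod n) state, where k is the base (k = 2 for binary, k = 10 for decimal), because appending d to a number with value v gives the value k·v + d.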
Design a DFA to accept all binary numbers which are divisible by 3.
OR
Design a DFA to accept all binary integers which are multiple of 3.
Answer:
Division by 3 leaves a remainder of 0, 1 or 2.
Therefore the number of states required to design the DFA for this problem = 3.
Remainder 0 corresponds to state q0, remainder 1 to q1 and remainder 2 to q2.
Final state: q0, since every binary number divisible by 3 leaves remainder 0.

In q0 the machine reads any number of 0's and stays at remainder 0 (0 mod 3 = 0). On input 1 in q0, the machine enters the remainder-1 state q1 (1 mod 3 = 1).
In q1, on input 0 the number becomes 10 (2 mod 3 = 2), so it enters q2.
In q1, on input 1 the number becomes 11 (3 mod 3 = 0), so it enters q0.
In q2, on input 0 the number becomes 100 (4 mod 3 = 1), so it enters q1.
In q2, on input 1 the number becomes 101 (5 mod 3 = 2), so it remains in q2.
The transition function of the above problem:
State   0     1
→*q0    q0    q1
q1      q2    q0
q2      q1    q2
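The remainder rule used above generalises: appending digit d to a number with value v gives base·v + d, so the remainder-r state moves to (base·r + d) mod n. A minimal sketch that builds the transition table this way (the function names are illustrative):

```python
def divisibility_table(n, base=2):
    """table[r][d] = (base*r + d) mod n: appending digit d to a number v gives base*v + d."""
    return [[(base * r + d) % n for d in range(base)] for r in range(n)]

def remainder(table, digits):
    r = 0                                   # q0: the empty prefix has value 0
    for ch in digits:
        r = table[r][int(ch)]
    return r

t3 = divisibility_table(3)
t5 = divisibility_table(5)
for r, row in enumerate(t5):
    print("q%d" % r, " ".join("q%d" % s for s in row))
```

Printing divisibility_table(5) row by row gives q0 q0 q1 / q1 q2 q3 / q2 q4 q0 / q3 q1 q2 / q4 q3 q4, the same table as the shortcut method for divisibility by 5, and divisibility_table(3) matches the table above.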


SHORTCUT METHOD
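1. For any divisible-by-n problem, first write the transition table for the n states and then translate the transition table into a transition diagram. For binary numbers, the table entries are simply q0, q1, …, q(n−1) written row by row, left to right, and repeated cyclically.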
Design a DFA to accept all binary numbers which are divisible by 5. OR which are multiple of 5.
Easy to solve by using short cut method:
Write the transition table for binary number divisible by 5. Total number of states required is 5:
State   0     1
→*q0    q0    q1
q1      q2    q3
q2      q4    q0
q3      q1    q2
q4      q3    q4

Now translate the above TT into Transition diagram:

Design a DFA to accept the set of all strings beginning with a 1 that when interpreted as binary
integer is a multiple of 5. For example 101, 1010, 1111 etc are in the language and 0, 0101, 100, 111,
01111 etc are not.
The answer is the same as that of the previous problem, but the number must always start with a 1.
A binary string that starts with a 0 is never accepted, so on input 0 in the start state the machine enters a trap state. So let us rename the final (remainder-0) state of the previous problem as qf and add a new start state q0; from q0 on input 1 the machine enters state q1, and the rest of the construction is the same as in the previous problem.


Therefore the DFA for the above problem is M = ({q0, q1, q2, q3, q4, qf, qt}, {0, 1}, δ, q0, {qf}), where qt is the trap state and δ is as shown in the transition diagram:

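Design a DFA to accept the set of all strings that, when interpreted in reverse as a binary integer, are divisible by 5. Examples of strings in the language are 0, 10011, 1001100 and 0101.
The answer is obtained from the divisible-by-5 DFA by reversing the direction of every transition arrow, except the arrow marking the start state. This works because q0 is both the start state and the only final state, and after reversal each state still has exactly one outgoing arrow per input symbol.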
Therefore the DFA for the above problem is M =({ q0, q1, q2, q3, q4 }, { 0, 1}, δ, q0, { q0}) where δ is
as shown in transition diagram:
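The reversal trick can be checked mechanically. This sketch assumes the divisible-by-5 DFA δ(r, b) = (2r + b) mod 5, reverses every arrow, confirms that the result is still deterministic, and keeps q0 as both the start and the final state (the function names are illustrative):

```python
def forward_table(n=5):
    # forward DFA: reading bit b in remainder state r leads to (2r + b) mod n
    return {(r, b): (2 * r + b) % n for r in range(n) for b in (0, 1)}

def reverse_table(fwd):
    # turn every arrow r --b--> s into s --b--> r
    rev = {}
    for (r, b), s in fwd.items():
        assert (s, b) not in rev, "reversal is not deterministic"
        rev[(s, b)] = r
    return rev

def accepts(table, w, start=0, finals=(0,)):
    state = start
    for ch in w:
        state = table[(state, int(ch))]
    return state in finals

rev5 = reverse_table(forward_table(5))
for w in ["0", "10011", "1001100", "0101", "1", "011"]:
    # compare with the definition: read w backwards as a binary number
    print(w, accepts(rev5, w), int(w[::-1], 2) % 5 == 0)
```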

Design a DFA to accept all decimal numbers which are divisible by 3.
Answer:
Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} (decimal digits)
Number of states required to design the DFA = 3.
The digits 0, 3, 6 and 9 leave remainder 0 when divided by 3.
Similarly, the digits 1, 4 and 7 leave remainder 1,
and the digits 2, 5 and 8 leave remainder 2.
The remainder-0, remainder-1 and remainder-2 states are q0, q1 and q2 respectively. Since 10 mod 3 = 1, appending a digit d to a number with remainder r gives remainder (10r + d) mod 3 = (r + d) mod 3, so only the remainder class of each digit matters.
DFA is given by the following Transition diagram:
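A small sketch of this DFA, assuming the number is given as a string of decimal digits. The state is the remainder so far, and each digit contributes only its own remainder class (the names are illustrative):

```python
# Group the decimal digits by their remainder modulo 3.
DIGIT_CLASS = {d: int(d) % 3 for d in "0123456789"}

def decimal_mod3(number):
    r = 0                                   # start in q0 (remainder 0)
    for d in number:
        # since 10 ≡ 1 (mod 3), (10r + d) mod 3 = (r + d) mod 3
        r = (r + DIGIT_CLASS[d]) % 3
    return r

for s in ["0", "123", "1001", "999", "12345678"]:
    state = decimal_mod3(s)
    print(s, "ends in q%d," % state, "accepted" if state == 0 else "rejected")
```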

NON-DETERMINISTIC FINITE AUTOMATA (NFA) or Non-Deterministic Finite State Machine (NDFSM)
Why do we need an NFA?
Sometimes it is difficult to construct a DFA directly for a complicated language. In such cases an NFA, which is usually much easier to construct, can be designed first, and an equivalent DFA can then be obtained from it systematically. In this way the difficulty of designing a DFA directly is overcome using an NFA. In practice, however, an NFA is a conceptual model: to implement it, it is normally converted into an equivalent DFA or simulated by keeping track of the set of states it could be in.
Define NFA
An NFA is a five-tuple M = (Q, Σ, δ, q0, F), where M is a non-deterministic machine with
Q --- non-empty finite set of states
Σ --- non-empty finite set of input alphabets (symbols)
δ --- transition function, which maps Q × Σ → 2^Q
q0 ∈ Q is the start (initial) state.
F ⊆ Q is the set of final (accepting) states.
Language accepted by an NFA:
Let M = (Q, Σ, δ, q0, F) be an NFA. A string w is accepted by M if and only if the transitions on w can take the start state q0 to at least one final state,
i.e., δ*(q0, w) contains at least one accepting state.
The language accepted by the NFA is L(M) = { w | w ∈ Σ* and δ*(q0, w) ∩ F ≠ φ }
Obtain an NFA for the following languages:
i. $L = \{ abab^n \text{ or } aba^n \mid n \ge 0 \}$
ii. L = { w(ab + ba) | w ∈ {a, b}* }, i.e., strings ending with ab or ba
iii. L = { abw | w ∈ {a, b}* }, i.e., strings starting with ab
iv. L = { wab | w ∈ {a, b}* }, i.e., strings ending with ab


v. L = { awa | w ∈ {a, b}* }, i.e., strings starting with "a" and ending with "a".

i. $L = \{ abab^n \text{ or } aba^n \mid n \ge 0 \}$

NFA for the above problem is M =({ q0, q1, q2, q3, q4,qf }, { a, b}, δ, q0, { qf,q4}) where δ is as
shown in transition diagram:

ii. L = { w(ab + ba) | w ∈ {a, b}* }


iii. L = { abw | w ∈ {a, b}* }

NFA for the above problem is M = ({q0, q1, q2}, {a, b}, δ, q0, {q2}) where δ is as shown in the transition diagram:

iv. L = { wab | w ∈ {a, b}* }
NFA for the above problem is M = ({q0, q1, q2}, {a, b}, δ, q0, {q2}) where δ is as shown in the transition diagram:

v. L = { awa | w ∈ {a, b}* }

NFA for the above problem is M = ({q0, q1, qf}, {a, b}, δ, q0, {qf}) where δ is as shown in the transition diagram:

Obtain an NFA to accept the language L = { aa*(a + b) } over Σ = {a, b}.

Obtain an NFA which accepts exactly those strings that have the symbol 1 in the second last position
over ∑ = { 0, 1}.


How the input string 01010 is processed by the above NFA?


The NFA starts in the initial state q0 and remains there on reading the first 0. On reading the next symbol 1 it may go either to q0 or to q1, as seen in the third column of the figure below. Now the next symbol 0 is read. We need to consider transitions out of both q0 and q1: q0 goes to itself while q1 moves to q2. When the next symbol 1 is read we again need to consider two transitions, one out of q0 and the other out of q2. q0 can go either to q0 or to q1, whereas q2 has no transition on 0 or 1, so that path dies. With the last input 0, q0 goes to q0 while q1 goes to q2. Since q2 is an accepting state, the NFA accepts 01010.
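The walkthrough above amounts to tracking the set of states the NFA could be in. The sketch below assumes the transitions implied by the walkthrough (q0 loops on 0 and 1, q0 also goes to q1 on 1, and q1 goes to q2 on 0 or 1); it is an illustration, not a reproduction of the figure.

```python
# Transitions implied by the walkthrough (assumed): q2 is accepting and has no outgoing moves.
DELTA = {
    ("q0", "0"): {"q0"}, ("q0", "1"): {"q0", "q1"},
    ("q1", "0"): {"q2"}, ("q1", "1"): {"q2"},
}

def run_nfa(w, start="q0", finals=frozenset({"q2"})):
    current = {start}
    trace = [sorted(current)]
    for ch in w:
        # union of the moves of every state we might be in; a missing entry means φ
        current = set().union(*(DELTA.get((q, ch), set()) for q in current))
        trace.append(sorted(current))
    return bool(current & finals), trace

accepted, trace = run_nfa("01010")
print(accepted)
for step in trace:
    print(step)
```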

What is the difference between NFA and DFA?

DFSM | NFSM
A DFSM makes exactly one move for each input symbol in Σ. | An NFSM has zero or more moves for each input symbol in Σ from every state of the automaton. An NDFSM M may have one or more transitions that are labeled ε.
It is a deterministic finite automaton, where the moves of the automaton can be predicted. | It is a non-deterministic finite automaton, where the moves of the automaton cannot be predicted. An NFSM with ε-transitions can change state without reading input.
It usually needs more states than an equivalent NFA and is more difficult to construct. | It usually needs fewer states than an equivalent DFA and is easier to construct.
A DFA cannot guess about its input. | An NDFSM with ε-transitions enables machine M to guess at the correct path before it actually sees the input.


CONVERSION OF NFA TO DFA USING SUBSET CONSTRUCTION METHOD


Procedure:
• The subset construction method of NFA to DFA conversion starts from an NFA by taking the start state of the NFA as the start state of the DFA; the other components of the DFA are constructed as follows:
• QD (the set of all states of the DFA) is the set of subsets of QN (the set of all states of the NFA), i.e., QD is the power set of QN. If QN has n states, then QD has 2^n states. Often not all of these states are accessible from the start state of the DFA. Inaccessible states can be "thrown away", so the effective number of DFA states may be much smaller than 2^n.
• For each subset S in QD and each input symbol a, δD(S, a) is the union of δN(p, a) over all states p in S.
• FD, the set of final states of the DFA, is the set of subsets of QN that include at least one accepting state of the NFA.
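The complete subset construction, followed by discarding inaccessible states, can be sketched in Python as below. The NFA used is the one converted next (q0 goes to {q0, q1} on a and to {q0} on b, q1 goes to {q2} on b, and q2 is accepting); the function name is illustrative.

```python
from itertools import combinations

def full_subset_construction(states, alphabet, delta, start, finals):
    """Compute δD for all 2^n subsets, then keep only the subsets reachable from {start}."""
    subsets = [frozenset(c) for k in range(len(states) + 1)
               for c in combinations(states, k)]
    table = {}
    for S in subsets:
        for a in alphabet:
            # δD(S, a) = union of δN(p, a) over p in S (a missing entry means φ)
            table[(S, a)] = frozenset().union(*(delta.get((p, a), ()) for p in S))
    # discard inaccessible subsets with a simple graph search from {start}
    start_set = frozenset({start})
    reachable, todo = {start_set}, [start_set]
    while todo:
        S = todo.pop()
        for a in alphabet:
            T = table[(S, a)]
            if T not in reachable:
                reachable.add(T)
                todo.append(T)
    dfa = {(S, a): T for (S, a), T in table.items() if S in reachable}
    dfa_finals = {S for S in reachable if S & finals}
    return len(subsets), reachable, dfa, dfa_finals

delta = {("q0", "a"): {"q0", "q1"}, ("q0", "b"): {"q0"}, ("q1", "b"): {"q2"}}
total, reachable, dfa, dfa_finals = full_subset_construction(
    ["q0", "q1", "q2"], "ab", delta, "q0", frozenset({"q2"}))
print(total, sorted(sorted(S) for S in reachable))
```

Of the 8 subsets, only {q0}, {q0, q1} and {q0, q2} survive, matching the hand computation.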
Convert the following NFA to DFA using subset construction method.

Given NFA contains 3 states: QN = {q0, q1, q2}


Start state of NFA q0 is the start state of DFA.
QD = the subset of QN states = {φ, {q0},{q1},{q2},{q0, q1},{q0,q2},{q1,q2},{q0,q1,q2}}.
Since q0 is the start state of NFA, the start state of DFA = {q0} and start writing the transition
function from q0 state.
δD({q0}, a) = δN(q0, a) = {q0, q1}
δD({q0}, b) = δN(q0, b) = {q0}
δD({q1}, a) = δN(q1, a) = φ
δD({q1}, b) = δN(q1, b) = {q2}
δD({q2}, a) = δN(q2, a) = φ
δD({q2}, b) = δN(q2, b) = φ
δD({q0, q1}, a) = δN(q0, a) ∪ δN(q1, a) = {q0, q1} ∪ φ = {q0, q1}
δD({q0, q1}, b) = δN(q0, b) ∪ δN(q1, b) = {q0} ∪ {q2} = {q0, q2}
δD({q0, q2}, a) = δN(q0, a) ∪ δN(q2, a) = {q0, q1} ∪ φ = {q0, q1}
δD({q0, q2}, b) = δN(q0, b) ∪ δN(q2, b) = {q0} ∪ φ = {q0}
δD({q1, q2}, a) = δN(q1, a) ∪ δN(q2, a) = φ ∪ φ = φ
δD({q1, q2}, b) = δN(q1, b) ∪ δN(q2, b) = {q2} ∪ φ = {q2}

δD({q0, q1, q2}, a) = δN(q0, a) ∪ δN(q1, a) ∪ δN(q2, a) = {q0, q1} ∪ φ ∪ φ = {q0, q1}
δD({q0, q1, q2}, b) = δN(q0, b) ∪ δN(q1, b) ∪ δN(q2, b) = {q0} ∪ {q2} ∪ φ = {q0, q2}
Transition table of DFA:
δ              a           b
φ              φ           φ
→{q0}          {q0, q1}    {q0}
{q1}           φ           {q2}
*{q2}          φ           φ
{q0, q1}       {q0, q1}    {q0, q2}
*{q0, q2}      {q0, q1}    {q0}
*{q1, q2}      φ           {q2}
*{q0, q1, q2}  {q0, q1}    {q0, q2}

From the above table we observe that only {q0}, {q0, q1} and {q0, q2} are reachable from the start state {q0}; all other states are inaccessible. Discarding all the inaccessible states from the above transition table, we get the DFA equivalent to the given NFA:
δ          a           b
→{q0}      {q0, q1}    {q0}
{q0, q1}   {q0, q1}    {q0, q2}
*{q0, q2}  {q0, q1}    {q0}
The final state of the DFA is FD = {{q0, q2}} (since q2 is the final state of the NFA).
The DFA M = ({{q0}, {q0, q1}, {q0, q2}}, {a, b}, δ, {q0}, {{q0, q2}}) where δ is as shown in the transition diagram:


Convert the following NFA to DFA using the subset construction method:

δ     0        1
→p    {p, q}   {p}
q     φ        {r}
*r    {p, r}   {q}
Answer:
Transition table of DFA using the subset construction method:
δ           0           1
φ           φ           φ
→{p}        {p, q}      {p}
{q}         φ           {r}
*{r}        {p, r}      {q}
{p, q}      {p, q}      {p, r}
*{p, r}     {p, q, r}   {p, q}
*{q, r}     {p, r}      {q, r}
*{p, q, r}  {p, q, r}   {p, q, r}

From the above table we observe that only {p}, {p, q}, {p, r} and {p, q, r} are reachable from the start state {p}; all other states are inaccessible. Discarding all the inaccessible states from the above transition table, we get the DFA:
δ           0           1
→{p}        {p, q}      {p}
{p, q}      {p, q}      {p, r}
*{p, r}     {p, q, r}   {p, q}
*{p, q, r}  {p, q, r}   {p, q, r}
The final states of the DFA are FD = {{p, r}, {p, q, r}} (since r is the final state of the NFA).
The DFA M = ({{p}, {p, q}, {p, r}, {p, q, r}}, {0, 1}, δ, {p}, {{p, r}, {p, q, r}}) where δ is as shown in the transition diagram:


LAZY EVALUATION SUBSET CONSTRUCTION METHOD


An alternative method used to convert an NFA to a DFA is the LAZY evaluation subset construction method, which overcomes the problem of the method above. The complete subset construction is slow and time consuming, because it computes transitions for all 2^n subsets even though many of them are not reachable from the start state. The lazy evaluation method speeds up the process by avoiding this extra work.
The start state of the DFA is the same as the start state of the NFA. Write the transitions of the start state for each input symbol, add every new set of states that appears, and repeat the process until no new states are formed. The NFA and the resulting DFA accept the same language.
Procedure:
i. The start state of the DFA is the start state of the NFA. Initially the set of DFA states QD has only one state, the start state.
ii. Identify the transitions of the DFA: for each state S of the DFA and for each input symbol a in Σ, compute the transition as
$\delta_D(S, a) = \bigcup_{p \in S} \delta_N(p, a)$
That is, to compute δD(S, a), we look at all the states p in S, see which states the NFA goes to from p on input a, and take the union of all those states.
iii. Identify the final states of the DFA, FD: these are the sets that include at least one accepting state of the NFA.
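The lazy evaluation procedure is essentially a breadth-first search over sets of NFA states. A minimal sketch, applied to the first NFA converted below (p, q, r with q accepting); an empty result φ is recorded as "no transition" rather than added to QD, a choice made to mirror the worked examples. The function name is illustrative.

```python
from collections import deque

def lazy_subset_construction(alphabet, delta, start, finals):
    """Build only the DFA states reachable from {start} (lazy evaluation)."""
    start_set = frozenset({start})
    q_d = [start_set]                      # QD, in the order the states are discovered
    table = {}
    queue = deque([start_set])
    while queue:
        S = queue.popleft()
        for a in alphabet:
            # δD(S, a) = union of δN(p, a) over all p in S; a missing entry means φ
            T = frozenset().union(*(delta.get((p, a), ()) for p in S))
            table[(S, a)] = T
            if T and T not in q_d:         # φ stays "no transition" and is not added to QD
                q_d.append(T)
                queue.append(T)
    f_d = [S for S in q_d if S & finals]   # FD: sets containing an accepting NFA state
    return q_d, table, f_d

delta = {("p", "0"): {"q"},
         ("q", "0"): {"p"}, ("q", "1"): {"q", "r"},
         ("r", "1"): {"q"}}
q_d, table, f_d = lazy_subset_construction("01", delta, "p", frozenset({"q"}))
print([sorted(S) for S in q_d], [sorted(S) for S in f_d])
```

Running it on the four-state NFA over p, q, r, s later in this section yields 10 DFA states, 8 of them final, as in the hand computation.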

Note: If a particular method is not specified in an NFA-to-DFA conversion problem, always use the LAZY evaluation method.
Convert the following NFA to DFA.

δ 0 1
→p {q} φ
*q {p} {q, r}
r φ {q}

The start state of DFA = {p} (since p is the start state of NFA)
Initially DFA has only one state ie: start state. QD = { {p} }
Find the transitions from {p} on input 0 and 1.
δD( (p), 0 ) = δN(p, 0 ) = {q}, δD( (p), 1 ) = δN(p, 1 ) = φ
Add the new state {q} to QD = { {p}, {q} }


δD( (q), 0 ) = δN(q, 0 ) = {p}, δD( (q), 1 ) = δN(q, 1 ) = {q, r}


Add the new state { q, r} to QD . ie: = { {p}, {q} , { q, r}}
δD( (q, r), 0 ) = δN(q, 0 ) Ṳ δN(r, 0 ) = {p} Ṳφ= {p}
δD( (q, r), 1 ) = δN(q, 1 ) Ṳ δN(r, 1 ) = {q, r} Ṳ{ q}= {q, r}
Finally DFA has QD = {{p}, {q} , { q, r}} and the final state of DFA FD = { {q}, {q, r} } and δ is

δ 0 1
→{p} {q} φ
*{q} {p} {q, r}
*{ q, r} {p} {q, r}

Convert the following NFA to DFA.

δ      a          b
→q0    {q1}       φ
q1     {q1, q2}   {q1}
*q2    {q2}       φ

The equivalent DFA is given by M = ({{q0}, {q1}, {q1, q2}}, {a, b}, δ, {q0}, {{q1, q2}})
Transition table of DFA:
δ           a          b
→{q0}       {q1}       φ
{q1}        {q1, q2}   {q1}
*{q1, q2}   {q1, q2}   {q1}

In DFA φ indicates that there is no transition defined (or it enters to trap state)


Convert the following NFA to DFA:

Answer:
Step 1: The start state of the DFA is the start state of the NFA, i.e., {q0}.
Initially the set of DFA states is QD = {{q0}}.
Write the transition function in state {q0}:
δD({q0}, a) = δN(q0, a) = {q0, q1}
δD({q0}, b) = δN(q0, b) = {q0, q3}
Add the new states {q0, q1} and {q0, q3} to QD, write the transition function for the new states, and repeat the same process until no more new states are obtained.
δD({q0, q1}, a) = δN(q0, a) ∪ δN(q1, a)
= {q0, q1} ∪ φ
= {q0, q1} (already existing state; no need to add it to QD)
δD({q0, q1}, b) = δN(q0, b) ∪ δN(q1, b)
= {q0, q3} ∪ {q2}
= {q0, q2, q3} (new state added to QD)
δD({q0, q3}, a) = δN(q0, a) ∪ δN(q3, a)
= {q0, q1} ∪ {q4}
= {q0, q1, q4} (new state added to QD)
δD({q0, q3}, b) = δN(q0, b) ∪ δN(q3, b)
= {q0, q3} ∪ φ
= {q0, q3} (already existing state)
δD({q0, q2, q3}, a) = δN(q0, a) ∪ δN(q2, a) ∪ δN(q3, a)
= {q0, q1} ∪ φ ∪ {q4}
= {q0, q1, q4} (already existing state)
δD({q0, q2, q3}, b) = δN(q0, b) ∪ δN(q2, b) ∪ δN(q3, b)


= {q0, q3} ∪ φ ∪ φ
= {q0, q3} (already existing state)
δD({q0, q1, q4}, a) = δN(q0, a) ∪ δN(q1, a) ∪ δN(q4, a)
= {q0, q1} ∪ φ ∪ φ
= {q0, q1} (already existing state)
δD({q0, q1, q4}, b) = δN(q0, b) ∪ δN(q1, b) ∪ δN(q4, b)
= {q0, q3} ∪ {q2} ∪ φ
= {q0, q2, q3} (already existing state)
Since there are no more new states, we stop the process; the DFA has 5 states:
QD = {{q0}, {q0, q1}, {q0, q3}, {q0, q2, q3}, {q0, q1, q4}}
Since q2 and q4 are the final states of the NFA, every DFA state containing at least one of them is a final state. Therefore the final states of the DFA are {{q0, q2, q3}, {q0, q1, q4}}.

Convert the following NFA to DFA:

δ 0 1
→p {p, r} {q}

q {r, s} {p}

*r {p, s} {r}

*s {q, r} φ

Start state of DFA = {p}; initially QD = {{p}}.
δD({p}, 0) = δN(p, 0) = {p, r}
δD({p}, 1) = δN(p, 1) = {q}
Add these new states to QD = {{p}, {q}, {p, r}}

δD({q}, 0) = δN(q, 0) = {r, s}
δD({q}, 1) = δN(q, 1) = {p} (already existing state)
δD({p, r}, 0) = δN(p, 0) ∪ δN(r, 0) = {p, r} ∪ {p, s} = {p, r, s}
δD({p, r}, 1) = δN(p, 1) ∪ δN(r, 1) = {q} ∪ {r} = {q, r}
Add the new states {r, s}, {p, r, s} and {q, r} to QD, i.e., QD = {{p}, {q}, {p, r}, {r, s}, {p, r, s}, {q, r}}
δD({r, s}, 0) = δN(r, 0) ∪ δN(s, 0) = {p, s} ∪ {q, r} = {p, q, r, s}
δD({r, s}, 1) = δN(r, 1) ∪ δN(s, 1) = {r} ∪ φ = {r}
δD({p, r, s}, 0) = δN(p, 0) ∪ δN(r, 0) ∪ δN(s, 0) = {p, r} ∪ {p, s} ∪ {q, r} = {p, q, r, s}
δD({p, r, s}, 1) = δN(p, 1) ∪ δN(r, 1) ∪ δN(s, 1) = {q} ∪ {r} ∪ φ = {q, r}
δD({q, r}, 0) = δN(q, 0) ∪ δN(r, 0) = {r, s} ∪ {p, s} = {p, r, s}
δD({q, r}, 1) = δN(q, 1) ∪ δN(r, 1) = {p} ∪ {r} = {p, r}
Add the new states {r} and {p, q, r, s} to QD,
i.e., QD = {{p}, {q}, {p, r}, {r, s}, {p, r, s}, {q, r}, {p, q, r, s}, {r}}
δD({p, q, r, s}, 0) = δN(p, 0) ∪ δN(q, 0) ∪ δN(r, 0) ∪ δN(s, 0) = {p, r} ∪ {r, s} ∪ {p, s} ∪ {q, r} = {p, q, r, s}
δD({p, q, r, s}, 1) = δN(p, 1) ∪ δN(q, 1) ∪ δN(r, 1) ∪ δN(s, 1) = {q} ∪ {p} ∪ {r} ∪ φ = {p, q, r}
δD({r}, 0) = δN(r, 0) = {p, s}
δD({r}, 1) = δN(r, 1) = {r}
Add the new states {p, q, r} and {p, s} to QD:
QD = {{p}, {q}, {p, r}, {r, s}, {p, r, s}, {q, r}, {p, q, r, s}, {r}, {p, q, r}, {p, s}}
δD({p, q, r}, 0) = δN(p, 0) ∪ δN(q, 0) ∪ δN(r, 0) = {p, r} ∪ {r, s} ∪ {p, s} = {p, r, s}
δD({p, q, r}, 1) = δN(p, 1) ∪ δN(q, 1) ∪ δN(r, 1) = {q} ∪ {p} ∪ {r} = {p, q, r}
δD({p, s}, 0) = δN(p, 0) ∪ δN(s, 0) = {p, r} ∪ {q, r} = {p, q, r}
δD({p, s}, 1) = δN(p, 1) ∪ δN(s, 1) = {q} ∪ φ = {q}
Since there are no more new states, the DFA finally has
QD = {{p}, {q}, {p, r}, {r, s}, {p, r, s}, {q, r}, {p, q, r, s}, {r}, {p, q, r}, {p, s}}
where {p} is the start state and the final states (the sets containing r or s) are FD = {{p, r}, {r, s}, {p, r, s}, {q, r}, {p, q, r, s}, {r}, {p, q, r}, {p, s}}.
Σ = {0, 1} and δ is as shown in the transition table.
Σ = { 0,1} δ is as shown in transition table.


DFA
δ              0              1
→{p}           {p, r}         {q}
{q}            {r, s}         {p}
*{r}           {p, s}         {r}
*{p, r}        {p, r, s}      {q, r}
*{r, s}        {p, q, r, s}   {r}
*{q, r}        {p, r, s}      {p, r}
*{p, s}        {p, q, r}      {q}
*{p, r, s}     {p, q, r, s}   {q, r}
*{p, q, r}     {p, r, s}      {p, q, r}
*{p, q, r, s}  {p, q, r, s}   {p, q, r}

EPSILON – NFA (ε- NFA)


In an ε-NFA, the automaton is allowed to make a transition spontaneously, without receiving an input symbol.

Define ε-NFA.

An ε-NFA is a five-tuple E = (Q, Σ, δ, q0, F), where E is a non-deterministic machine with ε-moves and
Q --- non-empty finite set of states
Σ --- non-empty finite set of input alphabets (symbols)
δ --- transition function, which maps Q × (Σ ∪ {ε}) → 2^Q
q0 ∈ Q is the start (initial) state.
F ⊆ Q is the set of final (accepting) states.
Language accepted by an ε-NFA:
Let E = (Q, Σ, δ, q0, F) be an ε-NFA. A string w is accepted by E if and only if the transitions on w, together with any ε-moves, can take the start state q0 to at least one final state,
i.e., δ*(q0, w) contains at least one accepting state, where δ* follows ε-transitions by taking ε-closures.
Epsilon-Closure:
What is epsilon-closure?
Epsilon closure of any state q is the set of all states which are reachable from state q on ε-transitions
only. ε-closure(q) is denoted by ECLOSE(q).
Recursive definition of epsilon-closure:

1. q ∈ ECLOSE(q) for each state q ∈ Q (every state belongs to its own ε-closure).

2. If p ∈ ECLOSE(q) and r ∈ δ(p, ε), then r ∈ ECLOSE(q). For example, if δ(q, ε) = {p} and δ(p, ε) = {r}, then ECLOSE(q) contains {q, p, r}.

Example:

ε-closure(1) = ECLOSE(1) = {1, 2, 3, 4, 6}

ECLOSE (2) = {2, 3, 6}

ECLOSE (3) = {3, 6}

ECLOSE (4) = {4}

ECLOSE (5) = {5, 7}

ECLOSE (6) = {6}

ECLOSE (7) = {7}
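The recursive definition translates directly into a graph search over ε-edges. In this sketch the ε-edges 1→2, 1→4, 2→3, 3→6 and 5→7 are inferred from the closures listed above (the figure is not reproduced), so treat them as an assumption:

```python
def eclose(q, eps):
    """ε-closure of q: q itself plus every state reachable using ε-moves only."""
    closure, stack = {q}, [q]          # basis: q ∈ ECLOSE(q)
    while stack:
        p = stack.pop()
        for r in eps.get(p, ()):       # induction: p ∈ ECLOSE(q) and r ∈ δ(p, ε) ⇒ r ∈ ECLOSE(q)
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

# ε-edges inferred from the listed closures (assumed; the figure is not reproduced)
EPS = {1: {2, 4}, 2: {3}, 3: {6}, 5: {7}}

for q in range(1, 8):
    print(q, sorted(eclose(q, EPS)))
```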

Compute the epsilon closure of each state of the following automaton.

Epsilon Closure (q0) = {q0, q1, q2}
Epsilon Closure (q1) = {q0, q1, q2}
Epsilon Closure (q2) = {q0, q1, q2}
Epsilon Closure (q3) = {q3}


Design ε-NFA for the following languages:

i. The set of strings consisting of zero or more a's followed by zero or more b's followed by zero or more c's.

ii. The set of strings consisting of either 01 repeated one or more times or 010 repeated one or more times, i.e., L = { (01)^n | n ≥ 1 } ∪ { (010)^n | n ≥ 1 }.

iii. The language of decimal numbers.

i.

ii.

iii. A decimal number consists of:
1. An optional + or – sign.
2. A string of digits.
3. A decimal point.
4. Another string of digits. Either this string of digits or the string in step 2 can be empty, but at least one of the two strings of digits must be non-empty.


q1: We have seen the sign, if there is one, but none of the digits or the decimal point.
q2: We have just seen the decimal point, and may or may not have seen prior digits.
q4: We have definitely seen at least one digit, but not the decimal point.
q3: We have seen a decimal point and at least one digit, either before or after the decimal point. We may stay in q3 reading whatever digits there are, and we also have the option of guessing that the string of digits is complete and going spontaneously (on ε) to the accepting state q5.
Design an NDFSM (NFA) for L = {w ∈ {a, b}* : w is made up of an optional a followed by aa followed by zero or more b's}

NDFSM (NFA) M =({ q0, q1, q2, q3}, { a, b}, δ, q0, { q3}) where δ is as shown in transition diagram:

Design an NDFSM (NFA) for L = {w ∈ {a, b}* : w = aba or |w| is even}

NDFSM (NFA) M =({ q0, q1, q2, q3, q4, q5, q6}, { a, b}, δ, q0, { q4, q5}) where δ is as shown in
transition diagram:


Consider the following NFA:

For each of the following strings w, determine whether w ∈ L(M):


a. aabbba.
b. bab.
c. baba.
Answer:

a) aabbba. Yes.
b) bab. No.
c) baba. Yes

CONVERSION FROM ε-NFA TO DFA


Let E = (QE, Σ, δE, q0, FE) be an ε-NFA; then the equivalent DFA D = (QD, Σ, δD, qD, FD) can be constructed using the following steps:
1. Identify the start state of the DFA: the start state qD is ECLOSE(q0), the ε-closure of the start state of the ε-NFA.
2. Obtain all the accessible states of the DFA (QD): the states of the DFA are ε-closed subsets of QE. For each DFA state S and input symbol a, the transition is
$\delta_D(S, a) = \mathrm{ECLOSE}\left( \bigcup_{p \in S} \delta_E(p, a) \right)$
3. Identify the final states: a state S in QD is a final state of the DFA if it contains at least one final state of the ε-NFA, i.e., FD = { S ∈ QD | S ∩ FE ≠ φ }.
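The three steps can be sketched as follows; the transitions are those of the ε-NFA in the next example (p has ε-moves to q and r, and r is the only final state). As in the worked solution, an empty result φ is treated as "no transition"; the function names are illustrative.

```python
from collections import deque

def eclose_set(states, eps):
    """ECLOSE of a set = union of the ε-closures of its members."""
    closure, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for r in eps.get(p, ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)

def enfa_to_dfa(alphabet, delta, eps, start, finals):
    q_start = eclose_set({start}, eps)               # step 1: qD = ECLOSE(q0)
    q_d, table, queue = [q_start], {}, deque([q_start])
    while queue:                                      # step 2: accessible states only
        S = queue.popleft()
        for a in alphabet:
            moved = set().union(*(delta.get((p, a), ()) for p in S))
            T = eclose_set(moved, eps)
            table[(S, a)] = T
            if T and T not in q_d:                    # φ is treated as "no transition"
                q_d.append(T)
                queue.append(T)
    f_d = [S for S in q_d if S & finals]              # step 3: S ∩ FE ≠ φ
    return q_d, table, f_d

EPS = {"p": {"q", "r"}}
DELTA = {("p", "b"): {"q"}, ("p", "c"): {"r"},
         ("q", "a"): {"p"}, ("q", "b"): {"r"}, ("q", "c"): {"p", "q"}}
q_d, table, f_d = enfa_to_dfa("abc", DELTA, EPS, "p", frozenset({"r"}))
print([sorted(S) for S in q_d])
```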


****Convert the following ε-NFA to DFA.


δ      ε         a      b      c
→p     {q, r}    φ      {q}    {r}
q      φ         {p}    {r}    {p, q}
*r     φ         φ      φ      φ

Step1: Compute the ε-closure of each state:


ε-closure (p) = ECLOSE(p) = {p} + {q, r} = {p, q, r}
ECLOSE(q) = {q} + φ = { q }
ECLOSE(r) = {r} + φ = { r }
Step 2: The start state of DFA = ε-closure (start state of ε-NFA)
= ECLOSE (p)
= {p, q, r}
Add this state to QD = { {p, q, r} }
Find the transitions:
δD((p, q, r), a) = ECLOSE[δE(p, a) ∪ δE(q, a) ∪ δE(r, a)]

= ECLOSE[φ ∪ {p} ∪ φ] = ECLOSE(p) = {p, q, r} (Already existing state)

δD((p, q, r), b) = ECLOSE[δE(p, b) ∪ δE(q, b) ∪ δE(r, b)]

= ECLOSE[{q} ∪ {r} ∪ φ] = ECLOSE({q, r}) = ECLOSE(q) ∪ ECLOSE(r)

= {q} ∪ {r} = {q, r} (New state; add this state to QD)

δD((p, q, r), c) = ECLOSE[δE(p, c) ∪ δE(q, c) ∪ δE(r, c)]

= ECLOSE[{r} ∪ {p, q} ∪ φ] = ECLOSE({p, q, r})

= ECLOSE(p) ∪ ECLOSE(q) ∪ ECLOSE(r)


= {p, q, r} ∪ {q} ∪ {r} = {p, q, r} (Already existing state)

Add the new state {q, r} to QD, i.e., QD = { {p, q, r}, {q, r} }


δD((q, r), a) = ECLOSE[δE(q, a) ∪ δE(r, a)] = ECLOSE[{p} ∪ φ]
= ECLOSE(p) = {p, q, r} (Already existing state)
δD((q, r), b) = ECLOSE[δE(q, b) ∪ δE(r, b)] = ECLOSE[{r} ∪ φ]
= ECLOSE(r) = {r} (New state; add this state to QD)

δD((q, r), c) = ECLOSE[δE(q, c) ∪ δE(r, c)] = ECLOSE[{p, q} ∪ φ]


= ECLOSE(p) ∪ ECLOSE(q) = {p, q, r} ∪ {q}
= {p, q, r}
Add the new state {r} to QD, i.e., QD = { {p, q, r}, {q, r}, {r} }
δD((r), a) = ECLOSE(δE(r, a)) = ECLOSE(φ) = φ
δD((r), b) = ECLOSE(δE(r, b)) = ECLOSE(φ) = φ
δD((r), c) = ECLOSE(δE(r, c)) = ECLOSE(φ) = φ
Here φ is the dead (trap) state. It is non-accepting and goes to itself on every input, so it is usually omitted from the transition diagram.
Final states of the DFA: FD = { {p, q, r}, {q, r}, {r} } (each of these sets contains r, the final state of the ε-NFA)
The resultant DFA is given by M = ({ {p, q, r}, {q, r}, {r} }, {a, b, c}, δD, {p, q, r}, { {p, q, r}, {q, r}, {r} }), where δD is as shown in the transition diagram (transitions to the dead state φ omitted):

Convert the following ε-NFA to a DFA by computing the ε-closure of each state.

δ      ε         a      b
→p     {r}       {q}    {p, r}
q      φ         {p}    φ
*r     {p, q}    {r}    {p}

Also give the set of all strings of length 3 or less accepted by the automaton.
ε-closure (p) = { p, q, r}
ε-closure (q) = { q}
ε-closure (r) = { p, q, r}
The start state of the DFA is ε-closure(p), where p is the start state of the ε-NFA.

Start state of DFA= ε-closure (p) = {p, q, r}

Transition function:

δD((p, q, r), a) = ECLOSE[δE(p, a) ∪ δE(q, a) ∪ δE(r, a)]

= ECLOSE[{q} ∪ {p} ∪ {r}] = {q} ∪ {p, q, r} ∪ {p, q, r} = {p, q, r} (Already existing state)


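δD((p, q, r), b) = ECLOSE[δE(p, b) ∪ δE(q, b) ∪ δE(r, b)]

= ECLOSE[{p, r} ∪ φ ∪ {p}] = {p, q, r} ∪ {p, q, r} ∪ {p, q, r} = {p, q, r}


The resultant DFA has the single state {p, q, r}. This state is both the start state and a final state (it contains r), and it goes to itself on a and on b:
M = ({ {p, q, r} }, {a, b}, δD, {p, q, r}, { {p, q, r} }), where δD is as shown in the transition diagram.

Since every string over {a, b} leads M to the accepting state {p, q, r}, M accepts all strings over {a, b}. The set of all strings of length 3 or less accepted by the automaton is given by:
L = {ε, a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, bba, bbb}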
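The answer can be cross-checked by simulating the ε-NFA directly on every string of length 3 or less, without building the DFA first. In this Python sketch the dictionary encoding (with "" standing for ε) is an assumption. The transitions are those of the table in this exercise.

```python
from itertools import product

EPS = ""   # assumption: "" stands for ε
delta = {
    "p": {EPS: {"r"}, "a": {"q"}, "b": {"p", "r"}},
    "q": {"a": {"p"}},
    "r": {EPS: {"p", "q"}, "a": {"r"}, "b": {"p"}},
}
start, finals = "p", {"r"}


def eclose(states):
    """All states reachable from `states` using ε-moves only."""
    closure, stack = set(states), list(states)
    while stack:
        for t in delta[stack.pop()].get(EPS, set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def accepts(w):
    current = eclose({start})
    for sym in w:
        moved = set()
        for s in current:
            moved |= delta[s].get(sym, set())
        current = eclose(moved)          # ε-moves after every symbol
    return bool(current & finals)


accepted = ["".join(letters)
            for n in range(4)
            for letters in product("ab", repeat=n)
            if accepts("".join(letters))]
print(len(accepted), accepted)
```

All 1 + 2 + 4 + 8 = 15 strings over {a, b} of length at most 3 are accepted. This agrees with the single-state DFA above.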
Convert the following ε-NFA to a DFA by computing the ε-closure of each state.

Answer:


Comparison of DFA, NFA and ε-NFA:


Mention the differences between DFA, NFA and ε-NFA.
1. Nature of the automaton
   DFA: Deterministic finite automaton; every move of the automaton can be predicted.
   NFA: Nondeterministic finite automaton; the moves of the automaton cannot be predicted.
   ε-NFA: Nondeterministic finite automaton with ε-transitions; the automaton can change state without reading an input symbol.
2. Transition function
   DFA: δ: Q × Σ → Q
   NFA: δ: Q × Σ → 2^Q
   ε-NFA: δ: Q × (Σ ∪ {ε}) → 2^Q
3. Moves
   DFA: exactly one move for each input symbol in Σ from every state of the automaton.
   NFA: zero or more moves for each input symbol in Σ from every state of the automaton.
   ε-NFA: zero or more moves for each symbol in Σ ∪ {ε} from every state of the automaton.
4. Language accepted
   DFA: L(A) = { w | δ^(q0, w) is in F }
   NFA: L(A) = { w | δ^(q0, w) ∩ F ≠ φ }
   ε-NFA: L(A) = { w | δ^(q0, w) ∩ F ≠ φ }
5. Number of states
   DFA: often needs more states than an equivalent NFA (up to 2^n states for an n-state NFA), and is harder to construct.
   NFA: usually needs fewer states than an equivalent DFA, and is easier to construct.
   ε-NFA: usually needs fewer states than an equivalent DFA, and is easier to construct.
6. Example: transition diagrams of a DFA, an NFA and an ε-NFA for ab(a + b)*.
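Rows 2 to 4 of the comparison can be made concrete with a small Python sketch for ab(a + b)*. The state names q0, q1, q2 and dead are illustrative assumptions, not the states of the notes' diagrams. The DFA table maps each (state, symbol) pair to exactly one state, so it needs a dead state. The NFA table maps pairs to sets of states, and a missing entry simply means φ.

```python
# DFA for ab(a+b)*: δ: Q × Σ → Q, exactly one next state per (state, symbol).
dfa = {
    ("q0", "a"): "q1", ("q0", "b"): "dead",
    ("q1", "a"): "dead", ("q1", "b"): "q2",
    ("q2", "a"): "q2", ("q2", "b"): "q2",
    ("dead", "a"): "dead", ("dead", "b"): "dead",
}

# NFA for the same language: δ: Q × Σ → 2^Q, a set of next states;
# pairs that are absent map to the empty set φ, so no dead state is needed.
nfa = {
    ("q0", "a"): {"q1"},
    ("q1", "b"): {"q2"},
    ("q2", "a"): {"q2"}, ("q2", "b"): {"q2"},
}


def dfa_accepts(w, start="q0", finals=frozenset({"q2"})):
    state = start
    for sym in w:
        state = dfa[(state, sym)]
    return state in finals                   # δ^(q0, w) is in F


def nfa_accepts(w, start="q0", finals=frozenset({"q2"})):
    states = {start}
    for sym in w:
        states = set().union(*(nfa.get((s, sym), set()) for s in states))
    return bool(states & finals)             # δ^(q0, w) ∩ F ≠ φ


for w in ["ab", "abba", "ba", "a", ""]:
    print(w or "ε", dfa_accepts(w), nfa_accepts(w))
```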

MINIMIZATION OF DFA
A language accepted by a finite automaton is called a regular language.
Using the decision properties of regular languages, we can decide whether two automata define the same language. We can also minimize a DFA, that is, find an equivalent DFA with as few states as possible. Minimization of automata is very important in the design of switching circuits: as the number of states of the automaton decreases, the size of the circuit decreases, and hence the cost decreases.


Equivalence of two states:


A DFA defines exactly one language, but many different DFAs can accept the same language. Such DFAs are said to be equivalent. It is desirable to represent a language by an equivalent DFA with fewer states, which improves storage efficiency.
What are distinguishable (not equivalent) and indistinguishable (equivalent) states?
Two states p and q are indistinguishable (equivalent) if, for every input string w, δ^(p, w) is an accepting state exactly when δ^(q, w) is an accepting state. Otherwise p and q are distinguishable: there is at least one string w such that one of δ^(p, w) and δ^(q, w) is accepting and the other is not.
The table-filling algorithm finds which pairs of states are distinguishable and which are indistinguishable.
Minimization of Automata using Table Filling Algorithm:
Procedure:
1. Eliminate all states that are not reachable from the start state.
2. Identify the initial markings: for each pair of states (p, q) such that p is an accepting state and q is a non-accepting state (or vice versa), the pair (p, q) is distinguishable, so mark it with X.
3. Identify the subsequent markings: for each unmarked pair (p, q) and each a ∈ Σ, find δ(p, a) = r and δ(q, a) = s. If the pair (r, s) is already marked as distinguishable, then the pair (p, q) is also distinguishable, so mark (p, q) with X.
Repeat step 3 until a full pass marks no new pair.
4. Obtain the states of the minimized DFA: the pairs still unmarked after step 3 are indistinguishable (equivalent), so each group of mutually equivalent states is merged into one state. Every remaining state that is distinguishable from all others forms a state by itself.
5. Identify the start state of the minimized DFA: the group [p1, p2, …, pn] that contains the start state of the given DFA is the start state of the minimized DFA.


6. Identify the final states of the minimized DFA: every group [p1, p2, …, pn] that contains a final state of the given DFA is a final state of the minimized DFA.
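Steps 2 and 3 of the procedure can be sketched in Python as below. This is an illustrative implementation that assumes unreachable states have already been removed (step 1) and that δ is given as a nested dictionary. The function name table_filling is hypothetical. The example data is the transition table of the second exercise in this section (states A to G, final states D, F and G).

```python
from itertools import combinations


def table_filling(states, alphabet, delta, finals):
    """Return the set of distinguishable pairs, each stored as a frozenset."""
    marked = set()
    # Initial markings: one state of the pair is final, the other is not.
    for p, q in combinations(states, 2):
        if (p in finals) != (q in finals):
            marked.add(frozenset((p, q)))
    # Subsequent markings: repeat until a full pass marks no new pair.
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            for a in alphabet:
                r, s = delta[p][a], delta[q][a]
                if r != s and frozenset((r, s)) in marked:
                    marked.add(pair)
                    changed = True
                    break
    return marked


# Second exercise in this section: final states D, F and G.
delta = {
    "A": {"0": "B", "1": "C"}, "B": {"0": "D", "1": "E"},
    "C": {"0": "F", "1": "G"}, "D": {"0": "D", "1": "E"},
    "E": {"0": "F", "1": "G"}, "F": {"0": "D", "1": "E"},
    "G": {"0": "F", "1": "G"},
}
states = sorted(delta)
marked = table_filling(states, "01", delta, {"D", "F", "G"})
unmarked = [(p, q) for p, q in combinations(states, 2)
            if frozenset((p, q)) not in marked]
print("indistinguishable pairs:", unmarked)
```

It reports (C, E) and (D, F) as the only unmarked pairs, which agrees with the hand-filled table of that exercise.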
i. Draw the table of distinguishable and indistinguishable states for the automata shown below.
ii. Construct minimum state equivalent automata.

Answer:
Step 1: State D is not reachable from the start state (in the transition table, D does not appear as an entry in the column for input 0 or the column for input 1). We can eliminate state D.
Step 2: Identify the initial markings:
B
*C    X    X
E               X
F               X
G               X
H               X
      A    B    *C   E    F    G
Step 3: Identify the subsequent markings:


Finally, the pairs (A, E) and (B, H) remain unmarked, so they are indistinguishable (equivalent) states.
The individual states C, F and G are distinguishable from (not equivalent to) all other states.
Transition table of the minimized DFA:
δ         0         1
→(A,E)    (B,H)     F
(B,H)     G         C
*C        (A,E)     C
F         C         G
G         G         (A,E)

Transition diagram of the minimized DFA:

Draw the table of distinguishable and indistinguishable states for the automata shown below and
hence find the minimum state equivalent automata.


Answer: The Transition table of the given DFA:

δ     0    1
→A    B    C
B     D    E
C     F    G
*D    D    E
E     F    G
*F    D    E
*G    F    G

Identify the initial markings by marking every pair of states in which one state is final and the other is non-final.
B
C
*D    X    X    X
E                    X
*F    X    X    X         X
*G    X    X    X         X
      A    B    C    *D   E    *F

Identify the subsequent markings:


B     X
C     X    X
*D    X    X    X
E     X    X         X
*F    X    X    X         X
*G    X    X    X    X    X    X
      A    B    C    *D   E    *F

Finally, the pairs (C, E) and (D, F) remain unmarked, so they are indistinguishable (equivalent) states.
The remaining individual states A, B and G are distinguishable from (not equivalent to) all other states.
Start state of minimized DFA = A


Final states of minimized DFA = (D, F) and G


The minimized DFA (obtained by referring to the transition table of the given DFA):

δ         0         1
→A        B         (C,E)
B         (D,F)     (C,E)
(C,E)     (D,F)     G
*(D,F)    (D,F)     (C,E)
*G        (D,F)     G

i. Draw the table of distinguishable and indistinguishable states for the automata.
ii. Construct minimum state equivalent automata.

δ 0 1
→A B E
B C F
*C D H
D E H
E F I
*F G B
G H B
H I C
*I A E

Identify the initial markings by marking every pair of states in which one state is final and the other is non-final.

B
*C    X    X
D               X
E               X
*F    X    X         X    X
G               X              X
H               X              X
*I    X    X         X    X         X    X
      A    B    *C   D    E    *F   G    H


Identify the subsequent markings:


B     X
*C    X    X
D          X    X
E     X         X    X
*F    X    X         X    X
G          X    X         X    X
H     X         X    X         X    X
*I    X    X         X    X         X    X
      A    B    *C   D    E    *F   G    H
Finally (A,D), (A,G), (B,E), (B,H), (C,F), (C,I), (D,G), (E,H) and (F,I) are indistinguishable states.
That means states (A,D), (A,G) and (D,G) can be represented as a single state (A,D,G), states
(B,E), (B,H) and (E,H) can be represented as a single state (B,E,H) and states (C,F), (C,I) and (F,I)
can be represented as a single state (C,F,I)
Minimum state equivalent automata:

δ 0 1

→(A,D,G) (B,E,H) (B,E,H)

(B,E,H) (C,F,I) (C,F,I)

*(C,F,I) (A,D,G) (B,E,H)
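As a cross-check, the same result can be obtained with partition refinement (Moore's algorithm). This is a different technique from table filling. It starts with the two blocks {non-final states} and {final states}, then keeps splitting a block whenever two of its states move to different blocks on some input. The Python below is a sketch that uses the transition table of this exercise. The function name is an assumption.

```python
def moore_minimize(states, alphabet, delta, finals):
    """Partition refinement: split blocks until each block is stable."""
    partition = [b for b in (set(states) - finals, set(states) & finals) if b]
    while True:
        block_of = {s: i for i, b in enumerate(partition) for s in b}
        refined = []
        for block in partition:
            groups = {}
            for s in block:
                # states stay together only if they reach the same blocks
                signature = tuple(block_of[delta[s][a]] for a in alphabet)
                groups.setdefault(signature, set()).add(s)
            refined.extend(groups.values())
        if len(refined) == len(partition):   # no block was split: done
            return refined
        partition = refined


delta = {
    "A": {"0": "B", "1": "E"}, "B": {"0": "C", "1": "F"},
    "C": {"0": "D", "1": "H"}, "D": {"0": "E", "1": "H"},
    "E": {"0": "F", "1": "I"}, "F": {"0": "G", "1": "B"},
    "G": {"0": "H", "1": "B"}, "H": {"0": "I", "1": "C"},
    "I": {"0": "A", "1": "E"},
}
blocks = moore_minimize(sorted(delta), "01", delta, {"C", "F", "I"})
print(sorted("".join(sorted(b)) for b in blocks))
```

The three blocks it finds are the states (A,D,G), (B,E,H) and (C,F,I) of the minimum-state automaton above.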

Consider the two DFAs shown below. Using the table-filling algorithm, show that both DFAs accept the same language.


Answer:
Two DFAs accept the same language if and only if their start states are equivalent, when the states of both DFAs are considered together in one table.
Using the table-filling algorithm, we have to prove that A, the start state of the first DFA, is equivalent to C, the start state of the second DFA.
B     X
*C         X
*D         X
E     X         X    X
      *A   B    *C   *D

From the table-filling algorithm we observe that the states (A, C) are indistinguishable (equivalent), so both DFAs accept the same language.
Also (A, D), (B, E) and (C, D) are indistinguishable.
So the minimized automaton contains the states (A, C, D) and (B, E).
Transition table of minimized DFA:

δ              0            1
→*(A, C, D)    (A, C, D)    (B, E)
(B, E)         (A, C, D)    (B, E)
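Equivalence of two DFAs can also be tested by exploring the pairs of states reached simultaneously from the two start states. This is a product construction, not the table-filling method: the DFAs are equivalent exactly when no reachable pair has one accepting and one non-accepting state. In this Python sketch, M1 is the minimized DFA above, with its states renamed ACD and BE. M2 is a hypothetical three-state DFA invented for illustration, because the two DFAs of the figure are not given in tabular form.

```python
from collections import deque


def equivalent(d1, s1, f1, d2, s2, f2, alphabet):
    """Return (True, None) if the DFAs accept the same language, otherwise
    (False, w) where w is a shortest string accepted by exactly one of them."""
    seen = {(s1, s2): ""}
    work = deque([(s1, s2)])
    while work:
        p, q = work.popleft()
        w = seen[(p, q)]
        if (p in f1) != (q in f2):
            return False, w
        for a in alphabet:
            pair = (d1[p][a], d2[q][a])
            if pair not in seen:
                seen[pair] = w + a
                work.append(pair)
    return True, None


# M1: the minimized DFA from the table above ((A, C, D) is start and final).
m1 = {"ACD": {"0": "ACD", "1": "BE"}, "BE": {"0": "ACD", "1": "BE"}}
# M2: hypothetical DFA with accepting states s0 and s1.
m2 = {"s0": {"0": "s1", "1": "s2"},
      "s1": {"0": "s1", "1": "s2"},
      "s2": {"0": "s1", "1": "s2"}}

print(equivalent(m1, "ACD", {"ACD"}, m2, "s0", {"s0", "s1"}, "01"))
print(equivalent(m1, "ACD", {"ACD"}, m2, "s0", {"s0"}, "01"))
```

Because the search is breadth-first, a mismatch is reported together with a shortest distinguishing string. In the second call, where s1 is made non-accepting, that string is 0.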

Minimize the following DFA using Table filling algorithm.

Step 1: Identify the initial markings:


q2
*q3   X    X
q4              X
*q5   X    X         X
      q1   q2   *q3  q4


Step 2: Identify the subsequent markings for each unmarked pair (q1, q2), (q1, q4), (q2, q4) and (q3, q5) by referring to the transition table of the DFA.

δ           0           1
(q1, q2)    (q2, q3)    (q3, q5)
(q1, q4)    (q2, q3)    (q3, q5)
(q2, q4)    (q3, q3)    (q5, q5)
(q3, q5)    (q2, q4)    (q3, q5)

On input 0, (q1, q2) and (q1, q4) both go to the marked pair (q2, q3), so they are marked. (q2, q4) goes to identical states on both inputs, and (q3, q5) goes only to unmarked pairs, so both stay unmarked.
At the end of step 2, the resulting marking table is as shown below:

q2    X
*q3   X    X
q4    X         X
*q5   X    X         X
      q1   q2   *q3  q4

From the above table we observe that the pairs (q2, q4) and (q3, q5) are not marked. So these pairs are indistinguishable (equivalent) states, and q1 is distinguishable from every other state.
The states of the minimized DFA are the groups of equivalent states together with each remaining distinguishable state.
Thus the minimized DFA has 3 states: q1, (q2, q4) and (q3, q5).
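The merging step can be sketched in Python. The transitions below are read off the step 2 table (for example, row (q1, q2) gives δ(q1, 0) = q2 and δ(q2, 0) = q3), so they come from this exercise. q1 is assumed to be the start state, because the DFA's diagram is not given in tabular form. Since equivalent states move to equivalent groups, any one member of a group can stand in for the whole group. The function name quotient is hypothetical.

```python
def quotient(delta, start, finals, groups):
    """Minimized DFA whose states are the given groups of equivalent states."""
    group_of = {s: g for g in groups for s in g}
    # any member of a group gives the same group-level transitions
    new_delta = {g: {a: group_of[t] for a, t in delta[next(iter(g))].items()}
                 for g in groups}
    new_finals = {g for g in groups if g & finals}
    return new_delta, group_of[start], new_finals


# Transitions read off the step 2 table; the start state q1 is an assumption.
delta = {
    "q1": {"0": "q2", "1": "q3"},
    "q2": {"0": "q3", "1": "q5"},
    "q3": {"0": "q2", "1": "q3"},
    "q4": {"0": "q3", "1": "q5"},
    "q5": {"0": "q4", "1": "q5"},
}
groups = [frozenset({"q1"}), frozenset({"q2", "q4"}), frozenset({"q3", "q5"})]
new_delta, new_start, new_finals = quotient(delta, "q1", {"q3", "q5"}, groups)


def label(g):
    return "(" + ", ".join(sorted(g)) + ")"


for g in groups:
    mark = ("→" if g == new_start else "") + ("*" if g in new_finals else "")
    row = new_delta[g]
    print(f"{mark}{label(g)}  0: {label(row['0'])}  1: {label(row['1'])}")
```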


INTRODUCTION TO COMPILER DESIGN


Programming languages are notations for describing computations to people and to machines. All the
software running on all the computers was written in some programming language. But, before a
program can be run, it must be translated into a form in which it can be executed by a computer. The
software systems that do this translation are called compilers.
LANGUAGE PROCESSORS:
A language processor is software that processes a program written in a given source language. Translating a source program into target machine code typically involves:
• Preprocessor
• Compiler
• Assembler
• Linker/Loader
Preprocessor:
A preprocessor produces the input to a compiler. A source program may be divided into modules stored in separate files. The task of collecting the source program is entrusted to the preprocessor, which may also expand shorthands, called macros, into source-language statements.
Compiler
What is a compiler?
A compiler is system software that converts a high-level-language source program into an executable machine-language program.
More generally, a compiler is a program that reads a program in one language (the source language) and translates it into an equivalent program in another language (the target language). If the target program is an executable machine-language program, the user can then call it to process inputs and produce outputs.

Example: Compiler-based programming languages such as C, C++, etc.


INTERPRETER
An interpreter is another common kind of language processor. Instead of producing a target program as a translation, an interpreter appears to directly execute the operations specified in the source program on inputs supplied by the user.

Example: Interpreter-based programming languages such as BASIC. Java combines compilation and interpretation: a Java source program is first compiled into bytecodes, which a virtual machine then interprets.


Assembler
An assembler is system software that translates an assembly-language program into relocatable machine code.
Loader / Linker
The linker links relocatable machine code together with other relocatable object files and library files into the code that actually runs on the machine. The linker resolves external memory addresses, where the code in one file may refer to a location in another file.
The loader loads all the executable object files into memory for execution.
With neat diagram explain language processing system.


Advantages and disadvantages of compiler


Advantages
1. The machine-language target program produced by a compiler is usually much faster than
an interpreter at mapping inputs to outputs.
2. Memory consumption is less.
Disadvantages:
1. Debugging a program and locating errors is more difficult.
Advantages and disadvantages of interpreter
Advantages
1. An interpreter can usually give better error diagnostics than a compiler, because it executes the source program statement by statement.
Disadvantages
1. Execution of the program is slower.
2. Memory consumption is more.
THE STRUCTURE OF A COMPILER
Explain general structure of compiler with neat diagram OR
Explain the various phases of compiler with neat diagram.
A compiler, which maps a source program into an executable target program, has two parts:
1. Analysis: source program to intermediate representation (front end)
2. Synthesis: intermediate representation to target program (back end)
Function of analysis part:
It breaks up the source program into constituent pieces and imposes a grammatical structure on them. It uses this structure to create an intermediate representation of the source program. It detects whether the source program is syntactically and semantically correct; if not, it must provide informative messages so that the user can take corrective action. It also collects information about the source program and stores it in the symbol table. It then passes the intermediate representation of the source program and the symbol-table contents to the synthesis part.
Function of synthesis part:
It constructs the desired target program from the intermediate representation and the information in the symbol table. The synthesis part is the back end of the compiler.


Various phases of a Compiler are:

Analysis part:
i. Lexical analyzer (scanning)
ii. Syntax analyzer (parser)
iii. Semantic analyzer
Synthesis part:
iv. Intermediate code generator
v. Machine-independent code optimizer
vi. Code generator
vii. Machine-dependent code optimizer
Also, the symbol-table manager and the error handler are two independent modules that interact with all phases of compilation.
Block diagram:


LEXICAL ANALYZER
• The first phase of a compiler is called lexical analysis or scanning. The lexical analyzer reads the stream of characters making up the source program and groups the characters into meaningful sequences called lexemes. For each lexeme, the lexical analyzer produces as output a token of the form (token-name, attribute-value), which it passes on to the subsequent phase, syntax analysis.
• In the token, the first component, token-name, is an abstract symbol that is used during syntax analysis. The second component, attribute-value, points to an entry in the symbol table for this token. Information from the symbol-table entry is needed for semantic analysis and code generation.
SYNTAX ANALYZER (PARSER)
• It is the second phase of the compiler. The parser uses the first components of the tokens produced by the lexical analyzer to create a tree-like intermediate representation that depicts the grammatical structure of the token stream. A typical representation is a syntax tree, in which each interior node represents an operation and the children of the node represent the arguments of the operation.
SEMANTIC ANALYZER
• The semantic analyzer uses the syntax tree and the information in the symbol table to check the source program for semantic consistency with the language definition. It also gathers type information and saves it in either the syntax tree or the symbol table, for subsequent use during intermediate-code generation.
INTERMEDIATE CODE GENERATOR
• In the process of translating a source program into target code, a compiler may construct one or more intermediate representations, which can have a variety of forms. Syntax trees are a form of intermediate representation; they are commonly used during syntax and semantic analysis. A low-level or machine-like intermediate representation should be easy to produce and easy to translate into the target machine. Common forms are three-address code, quadruples and triples.


MACHINE-INDEPENDENT CODE-OPTIMIZATION
• This phase attempts to improve the intermediate code so that better target code will result, for example by eliminating unnecessary code. Usually better means faster, but other objectives may be desired, such as shorter code or target code that consumes less power. Optimization is optional: both the machine-independent and the machine-dependent optimizers may be omitted, at the cost of a less efficient target program.
CODE GENERATOR
• The code generator takes as input an intermediate representation of the source program and maps it into the target language. If the target language is machine code, registers or memory locations are selected for each of the variables used by the program. Then the intermediate instructions are translated into sequences of machine instructions that perform the same task. A crucial aspect of code generation is the judicious assignment of registers to hold variables.
Symbol Table
• The symbol table interacts with all phases of compilation. It is a data structure containing a record for each identifier, with fields for the attributes of the identifier. When the lexical analyzer detects an identifier in the source program, the identifier is entered into the symbol table.
Show the translation of the assignment statement position = initial + rate * 60, clearly indicating the output of each phase.
Lexical analyzer phase:
Input to the lexical analyzer phase is position = initial + rate * 60
position is a lexeme that is mapped into the token < id, 1 >, where 1 points to the symbol-table entry for position
The assignment symbol = is a lexeme that is mapped into the token < = >
initial is a lexeme that is mapped into the token < id, 2 >
+ is a lexeme that is mapped into the token < + >
rate is a lexeme that is mapped into the token < id, 3 >
* is a lexeme that is mapped into the token < * >
60 is a lexeme that is mapped into the token < 60 >
The output of the lexical analyzer phase is:


< id, 1 > < = > < id, 2 > < + > <id, 3 > < * > < 60 >
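A lexical analyzer for this statement can be sketched with Python's re module. The token patterns and the tokenize function are assumptions made for this example. Following the notes, identifiers become <id, n>, where n is the position of the identifier in the symbol table, and every other lexeme is its own token.

```python
import re

# Hypothetical token patterns, sufficient for this one example.
TOKEN_SPEC = [
    ("id",     r"[A-Za-z_][A-Za-z0-9_]*"),
    ("number", r"\d+"),
    ("op",     r"[=+\-*/]"),
    ("skip",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source, symtab):
    """Return tokens in the notes' <...> notation; identifiers go into symtab."""
    tokens, pos = [], 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "skip":
            continue                                  # blanks separate lexemes
        if kind == "id":
            if lexeme not in symtab:
                symtab[lexeme] = len(symtab) + 1      # symbol-table index, from 1
            tokens.append(f"<id, {symtab[lexeme]}>")
        else:
            tokens.append(f"<{lexeme}>")              # operators and numbers
    return tokens


symtab = {}
print(" ".join(tokenize("position = initial + rate * 60", symtab)))
print(symtab)
```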
Syntax analyzer phase:

Input to the syntax analyzer phase is: < id, 1 > < = > < id, 2 > < + > <id, 3 > < * > < 60 >
Syntax analysis produces a syntax tree in which operators are interior nodes and their operands are the children of those nodes, following the normal precedence rules (* binds tighter than +).
Output of the syntax analyzer is a syntax tree whose root is =, with children <id, 1> and +. The + node has children <id, 2> and *, and the * node has children <id, 3> and 60.
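A recursive-descent parser that builds this tree can be sketched in Python. The grammar in the docstring and the nested-tuple tree (operator first, then its children) are assumptions of this sketch. To keep the output readable, it works on lexemes rather than on the <id, n> tokens.

```python
import re


def parse_assignment(text):
    """Parse 'id = expr' into a nested-tuple syntax tree.

    Grammar assumed for this sketch (left-associative, * binds tighter than +):
        assign -> id = expr
        expr   -> term { + term }
        term   -> factor { * factor }
        factor -> id | number
    """
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[=+*]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, found {tok!r}")
        pos += 1
        return tok

    def factor():
        tok = advance()
        if not re.fullmatch(r"[A-Za-z_]\w*|\d+", tok):
            raise SyntaxError(f"expected an operand, found {tok!r}")
        return tok

    def term():
        node = factor()
        while peek() == "*":
            advance("*")
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "+":
            advance("+")
            node = ("+", node, term())
        return node

    target = advance()
    advance("=")
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse_assignment("position = initial + rate * 60"))
```

Because term() handles * before expr() sees +, the multiplication ends up deeper in the tree, as the precedence rules require.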

Semantic analyzer phase:


Input to this phase is the syntax tree.

Suppose that position, initial and rate have been declared as floating-point numbers, and that the lexeme 60 by itself forms an integer. The type checker in the semantic analyzer discovers that the operator * is applied to the floating-point number rate and the integer 60, so the integer is converted into a floating-point number. The output of this phase therefore has an extra node for the operator inttofloat, whose child is 60.
Output is the modified version of the syntax tree:

Intermediate code generator:


Input is the syntax tree (output of the semantic phase).
Output is a three-address code sequence of the following form:
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
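The semantic and intermediate-code steps can be sketched in Python on a nested-tuple syntax tree. This sketch makes two assumptions. First, as in the example, every identifier is floating point, so any integer literal used with + or * is wrapped in inttofloat. Second, temporaries are numbered t1, t2, … in the order they are created.

```python
import itertools


def add_inttofloat(node):
    """Semantic step: wrap integer literals used with + or * in inttofloat
    (assumes every identifier is floating point, as in the example)."""
    if isinstance(node, str):
        return node
    op, left, right = node
    children = []
    for child in (add_inttofloat(left), add_inttofloat(right)):
        if op in ("+", "*") and isinstance(child, str) and child.isdigit():
            child = ("inttofloat", child)
        children.append(child)
    return (op, *children)


def gen_tac(tree):
    """Generate three-address code for a tree of the form ('=', target, expr)."""
    code, counter = [], itertools.count(1)

    def emit(node):
        # Returns the address (name, constant or temporary) holding node's value.
        if isinstance(node, str):
            return node
        if node[0] == "inttofloat":
            arg = emit(node[1])
            t = f"t{next(counter)}"
            code.append(f"{t} = inttofloat({arg})")
            return t
        op, left, right = node
        l_addr, r_addr = emit(left), emit(right)
        t = f"t{next(counter)}"
        code.append(f"{t} = {l_addr} {op} {r_addr}")
        return t

    _, target, value = tree
    code.append(f"{target} = {emit(value)}")
    return code


tree = ("=", "id1", ("+", "id2", ("*", "id3", "60")))
for line in gen_tac(add_inttofloat(tree)):
    print(line)
```

A post-order walk emits the operands of each operator before the operator itself. That is why the conversion of 60 is emitted first and the copy into id1 last.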


Code optimizer phase:


The above three-address code sequence is given as input to the code optimizer.
The optimizer can deduce that the conversion of 60 from integer to floating point can be done once and for all at compile time. The inttofloat operation can therefore be eliminated by replacing the integer 60 with the floating-point number 60.0. Moreover, t3 is used only once, to transmit its value to id1, so the optimizer can compute id1 directly.
Output of code optimizer is

t1 = id3 * 60.0
id1 = id2 + t1
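The two improvements described above can be sketched in Python. This is a toy optimizer written for this example only, under three stated assumptions: the three-address code is a list of tuples, temporaries are named t1, t2, …, and each temporary is used exactly once (true of code generated from a syntax tree, as here).

```python
import re


def is_temp(name):
    """Assumption: compiler temporaries are named t1, t2, ..."""
    return re.fullmatch(r"t\d+", name) is not None


def optimize(code):
    """code: list of (dst, op, *args) tuples; op 'copy' means dst = args[0]."""
    subst, out = {}, []
    for dst, op, *args in code:
        args = [subst.get(a, a) for a in args]
        if op == "inttofloat" and args[0].isdigit():
            subst[dst] = str(float(args[0]))     # convert once, at compile time
            continue
        if op == "copy" and out and is_temp(args[0]) and out[-1][0] == args[0]:
            # 't = e; x = t' becomes 'x = e' (the temporary is used only here)
            prev = out.pop()
            out.append((dst, *prev[1:]))
            continue
        out.append((dst, op, *args))
    # Renumber the surviving temporaries t1, t2, ... in order of definition.
    names = {}
    for dst, *_ in out:
        if is_temp(dst):
            names[dst] = f"t{len(names) + 1}"
    lines = []
    for dst, op, *args in out:
        dst, args = names.get(dst, dst), [names.get(a, a) for a in args]
        if op == "copy":
            lines.append(f"{dst} = {args[0]}")
        elif len(args) == 1:
            lines.append(f"{dst} = {op}({args[0]})")
        else:
            lines.append(f"{dst} = {args[0]} {op} {args[1]}")
    return lines


tac = [("t1", "inttofloat", "60"),
       ("t2", "*", "id3", "t1"),
       ("t3", "+", "id2", "t2"),
       ("id1", "copy", "t3")]
for line in optimize(tac):
    print(line)
```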
Code generator phase:
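Here registers or memory locations are selected for each of the variables used by the program.
Output of the code generator phase is shown below. All the identifiers are of floating-point type. The F in each instruction indicates floating-point numbers, the first operand of each instruction specifies the destination, and the # prefix signifies that 60.0 is to be treated as an immediate constant.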
LDF R2, id3
MULF R2, R2, #60.0
LDF R1, id2
ADDF R1, R1, R2
STF id1, R1
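A naive code generator for the optimized three-address code can be sketched as follows. The register-assignment policy is an assumption of this sketch rather than the notes' method: it uses a fresh register per instruction, keeps temporaries in registers, and stores program variables back with STF. Its output matches the sequence above, except that the roles of R1 and R2 are swapped.

```python
import re


def codegen(tac):
    """Naive translation of 'x = y op z' lines into the notes' float instructions."""
    opcode = {"+": "ADDF", "-": "SUBF", "*": "MULF", "/": "DIVF"}
    out, reg_of, next_reg = [], {}, 1
    for line in tac:
        dst, expr = (part.strip() for part in line.split("=", 1))
        y, op, z = expr.split()
        reg = f"R{next_reg}"                      # fresh register for the result
        next_reg += 1
        if y in reg_of:
            src = reg_of[y]                       # y is a temporary already in a register
        else:
            out.append(f"LDF {reg}, {y}")         # load y from memory
            src = reg
        if z in reg_of:
            operand = reg_of[z]
        elif re.fullmatch(r"\d+(\.\d+)?", z):
            operand = f"#{z}"                     # immediate constant
        else:
            operand = f"R{next_reg}"              # load z into another register
            next_reg += 1
            out.append(f"LDF {operand}, {z}")
        out.append(f"{opcode[op]} {reg}, {src}, {operand}")
        if re.fullmatch(r"t\d+", dst):
            reg_of[dst] = reg                     # keep temporaries in registers
        else:
            out.append(f"STF {dst}, {reg}")       # store program variables
    return out


for instr in codegen(["t1 = id3 * 60.0", "id1 = id2 + t1"]):
    print(instr)
```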


SYMBOL TABLE MANAGEMENT


The symbol table records the variable names used in the source program and the attributes of each name, such as its type and its scope. For procedure names, the attributes also include the number and types of the arguments, the method of passing each argument (by value or by reference) and the return type. A symbol table can be implemented either as a linear list or as a hash table.
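A hash-table symbol table can be sketched in Python, whose dict is a hash table. The class, its method names and the attribute fields are hypothetical, chosen for illustration. A stack of dictionaries handles nested scopes, so a lookup finds the innermost declaration of a name.

```python
class SymbolTable:
    """Hash-table symbol table with nested scopes (illustrative sketch)."""

    def __init__(self):
        self.scopes = [{}]                  # the innermost scope is the last dict

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def insert(self, name, **attributes):
        if name in self.scopes[-1]:
            raise KeyError(f"{name!r} already declared in this scope")
        self.scopes[-1][name] = {"name": name,
                                 "scope": len(self.scopes) - 1,
                                 **attributes}

    def lookup(self, name):
        for scope in reversed(self.scopes):  # innermost declaration wins
            if name in scope:
                return scope[name]
        return None


table = SymbolTable()
table.insert("rate", type="float")
table.enter_scope()
table.insert("rate", type="int")
print(table.lookup("rate"))   # the inner declaration
table.exit_scope()
print(table.lookup("rate"))   # the outer declaration again
```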
GROUPING OF PHASES INTO PASSES
The discussion of phases deals with the logical organization of a compiler. In an implementation, activities from several phases may be grouped together into a pass that reads an input file and writes an output file.
For example,
the front-end phases of lexical analysis, syntax analysis and semantic analysis, together with intermediate code generation, might be grouped into one pass. Code optimization might be an optional pass. Then there could be a back-end pass consisting of code generation for a particular target machine. With these collections, we can produce compilers for different source languages for one target machine by combining different front ends with the back end for that target machine. Similarly, we can produce compilers for different target machines by combining a front end with back ends for different target machines.
COMPILER CONSTRUCTION TOOLS
Compiler writers can use modern software development tools such as language editors, debuggers, version managers, profilers, test harnesses, and so on. In addition to these general software development tools, more specialized compiler-construction tools are used.
Explain the commonly used compiler-construction tools.
Some commonly used compiler-construction tools are:
1. Parser generators that automatically produce syntax analyzers from a grammatical description of a programming language, e.g., YACC.
2. Scanner generators that produce lexical analyzers from a regular-expression description of the tokens of a language, e.g., LEX.
3. Syntax-directed translation engines that produce collections of routines for walking a parse tree and generating intermediate code.
4. Code-generator generators that produce a code generator from a collection of rules for translating each operation of the intermediate language into the machine language for a target machine.
5. Data-flow analysis engines that facilitate the gathering of information about how values are transmitted from one part of a program to each other part. Data-flow analysis is a key part of code optimization.
6. Compiler-construction toolkits that provide an integrated set of routines for constructing various phases of a compiler.



Module 2
---------------------------------------------------------------------------------------------------------------------
Regular Expressions and Languages:
• Regular Expressions
• Finite Automata and Regular Expressions
• Proving Languages Not to Be Regular
Lexical Analysis Phase of compiler Design:
• Role of Lexical Analyzer
• Input Buffering
• Specification of Token
• Recognition of Token.

----------------------------------------------------------------------------------------------------------------
Textbooks:
1. John E. Hopcroft, Rajeev Motwani, Jeffrey D. Ullman, "Introduction to Automata Theory, Languages and Computation", Third Edition, Pearson.
2. Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman, "Compilers: Principles, Techniques and Tools", Second Edition, Pearson.
Textbook 1:
• Chapter 3: 3.1, 3.2
• Chapter 4: 4.1
Textbook 2:
• Chapter 3: 3.1 to 3.4


REGULAR EXPRESSIONS AND LANGUAGES:


Introduction
Instead of focusing on the power of a computing device, let's look at the task that we need to perform. Let's consider problems in which our goal is to match finite or repeating patterns.
For example, regular expressions are used as a pattern-description language in:
• Lexical analysis in compilers.
• Filtering email for spam.
• Sorting email into appropriate mailboxes based on sender and/or content words and phrases.
• Searching a complex directory structure by specifying patterns that are known to occur in the file we want.
A regular expression is a pattern-description language used to describe particular patterns of interest. A regular expression provides a concise and flexible means for "matching" strings of text, such as particular characters, words, or patterns of characters.
Example: [ ] is a character class, which matches any one character within the brackets; [^ \t\n] matches any character except space, tab and newline.
Regular expression
What are regular languages?
A language accepted by a finite automaton is called a regular language.
Regular languages can be described using regular expressions. A regular expression is an algebraic notation built from the symbols of the alphabet Σ and the operators +, . and *, where + denotes union, . denotes concatenation and * denotes closure.
Thus regular expressions give an algebraic, structural description of the languages of finite automata. They serve as the input language for many systems that process strings, such as the Lex program and the Unix grep command.
Definition of Regular expression
Define Regular expression.
A regular expression is defined as follows:
φ is a regular expression denoting the empty language.
ε is a regular expression denoting the language {ε}, which contains only the empty string.
For each a ∈ Σ, a is a regular expression denoting the language {a}.


If R is a regular expression denoting the language LR and S is a regular expression denoting the language LS, then:
R + S is a regular expression denoting the language LR ∪ LS.
R.S is a regular expression denoting the language LR.LS.
R* is a regular expression denoting the language LR*.
Only expressions obtained by applying these rules a finite number of times are regular expressions.
Examples of Regular expressions
Regular expression        Meaning
a*                        Strings consisting of any number of a’s (zero or more a’s)
a⁺ (that is, aa*)         Strings consisting of at least one a (one or more a’s)
(a + b)                   Either a or b
(a + b)*                  Strings consisting of any number of a’s and b’s, including ε
(a + b)* ab               Strings of a’s and b’s ending with ab
ab (a + b)*               Strings of a’s and b’s starting with ab
(a + b)* ab (a + b)*      Strings of a’s and b’s containing the substring ab
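The rows of this table can be tried out with Python's re module, whose syntax differs from the notes: union is written | instead of +, and + means "one or more" (so a⁺ is written a+ directly). The to_python_re helper below is an assumption of this sketch. It handles only expressions in which + always means union and ε stands for the empty string.

```python
import re


def to_python_re(expr):
    """Translate the notes' notation (+ = union, ε = empty string) to Python re."""
    return expr.replace(" ", "").replace("+", "|").replace("ε", "")


def matches(expr, w):
    return re.fullmatch(to_python_re(expr), w) is not None


print(to_python_re("(a + b)* ab (a + b)*"))    # (a|b)*ab(a|b)*
print(matches("(a + b)* ab", "bbab"))           # ends with ab
print(matches("ab (a + b)*", "abba"))           # starts with ab
print(matches("(a + b)* ab (a + b)*", "baab"))  # contains ab
print(matches("(a + b)* ab", "aba"))            # does not end with ab
```

re.fullmatch is used instead of re.search because a regular expression denotes a set of whole strings, not substrings.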

Write the regular expressions for the following languages:


a. Strings of a’s and b’s having length 2:
Regular expression = (a + b)(a + b)
b. Strings of a’s and b’s of length ≤ 10:
Regular expression = (ε + a + b)^10
c. Strings of a’s and b’s of even length:
Regular expression = [(a + b)(a + b)]*
d. Strings of a’s and b’s of odd length:
Regular expression = (a + b)[(a + b)(a + b)]*
e. Strings of a’s of even length:
Regular expression = (aa)*
f. Strings of a’s of odd length:
Regular expression = a(aa)*
Strings of a’s and b’s with alternate a’s and b’s.
Alternating a’s and b’s can be obtained by concatenating the string ab zero or more times, i.e., (ab)*. We then add an optional b at the front, i.e., (ε + b), and an optional a at the end, i.e., (ε + a).


The regular expression = (ε +b) (ab)*(ε +a)


Obtain a regular expression to accept the language containing at least one a and one b over Σ = {a, b, c}. OR
Obtain a regular expression to accept the language containing at least one 0 and one 1 over Σ = {0, 1, 2}.
The string should contain at least one a and one b, so the simplest such strings are given by ab + ba.
There is no restriction on c’s, so any number of a’s, b’s and c’s, i.e., (a+b+c)*, can be inserted before, between and after the a and the b.
So the final regular expression = (a+b+c)* a (a+b+c)* b (a+b+c)* + (a+b+c)* b (a+b+c)* a (a+b+c)*
Obtain regular expression to accept the language containing at least 3 consecutive zeros.
Regular expression for string containing 3 consecutive 0’s = 000
The above regular expression can be preceded or followed by any number of 0’s and 1’s, ie:
(0+1)*
Regular expression = (0+1)*000(0+1)*
Obtain regular expression to accept the language containing strings of a’s and b’s ending with b
and has no substring aa.
Regular expression for strings of a’s and b’s ending with b and has no substring aa is nothing but
the string containing any combinations of either b or ab without ε.
Regular expression = ( b + ab) (b +ab)*
Obtain regular expression to accept the language containing strings of a’s and b’s such that $L = \{a^{2n} b^{2m} \mid n, m \ge 0\}$
$a^{2n}$ means an even number of a’s, regular expression = (aa)*
$b^{2m}$ means an even number of b’s, regular expression = (bb)*.
The regular expression for the given language = (aa)* (bb)*
Obtain regular expression to accept the language containing strings of a’s and b’s such that $L = \{a^{2n+1} b^{2m} \mid n, m \ge 0\}$.
$a^{2n+1}$ means an odd number of a’s, regular expression = a(aa)*
$b^{2m}$ means an even number of b’s, regular expression = (bb)*
The regular expression for the given language = a(aa)* (bb)*


Obtain regular expression to accept the language containing strings of a’s and b’s such that $L = \{a^{2n+1} b^{2m+1} \mid n, m \ge 0\}$.
$a^{2n+1}$ means an odd number of a’s, regular expression = a(aa)*
$b^{2m+1}$ means an odd number of b’s, regular expression = b(bb)*
The regular expression for the given language = a(aa)*b(bb)*
Obtain regular expression to accept the language containing strings of 0’s and 1’s with exactly
one 1 and an even number of 0’s.
Regular expression for exactly one 1 = 1
Even number of 0’s = (00)*
The total number of 0’s must be even, so either the 1 is preceded and followed by an even number of 0’s, or the 1 is preceded and followed by an odd number of 0’s.
The regular expression for the given language = (00)* 1 (00)* + 0(00)* 1 0(00)*
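The two-case answer can be spot-checked by brute force. The Python sketch below (PATTERN and in_language are illustrative names; + is written as | in Python's re syntax) compares the expression with the definition on all binary strings up to length 8 and finds no disagreement.

```python
import re
from itertools import product

# (00)*1(00)* + 0(00)*10(00)*, with + written as | in Python's re syntax
PATTERN = re.compile(r"(00)*1(00)*|0(00)*10(00)*")

def in_language(w: str) -> bool:
    """Definition: exactly one 1 and an even number of 0's."""
    return w.count("1") == 1 and w.count("0") % 2 == 0

words = ["".join(p) for n in range(9) for p in product("01", repeat=n)]
mismatches = [w for w in words if bool(PATTERN.fullmatch(w)) != in_language(w)]
print(mismatches)  # []
```

Dropping the second alternative makes strings such as 010 disagree, which is why both cases are needed.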
Obtain regular expression to accept the language containing strings of 0’s and 1’s having no two
consecutive 0’s. OR
Obtain regular expression to accept the language containing strings of 0’s and 1’s with no pair of
consecutive 0’s.
Every 0, except possibly the last symbol of the string, must be followed by 1. There is no restriction on the number of 1’s, so the string is any combination of 1’s and 01’s, i.e. regular expression = (1+01)*
If the string ends with 0, the above regular expression must be extended by appending (0 + ε) at the end.
Regular expression for the given language = (1+01)* (0 + ε )
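One way to check (1+01)*(0 + ε) is to count how many strings of each length it accepts. Strings over {0, 1} with no two consecutive 0’s are counted by the Fibonacci numbers, so the counts should read 1, 2, 3, 5, 8, and so on. This is a sketch in Python's re syntax (+ written as |, ε as an empty alternative); NO_00 and count are illustrative names.

```python
import re
from itertools import product

# (1+01)*(0 + ε): + becomes |, and ε is the empty alternative in (0|)
NO_00 = re.compile(r"(1|01)*(0|)")

def count(n: int) -> int:
    """Number of strings of length n over {0, 1} matched by NO_00."""
    return sum(1 for p in product("01", repeat=n) if NO_00.fullmatch("".join(p)))

print([count(n) for n in range(8)])  # [1, 2, 3, 5, 8, 13, 21, 34]
```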
Obtain regular expression to accept the language containing strings of 0’s and 1’s having no two
consecutive 1’s. OR
Obtain regular expression to accept the language containing strings of 0’s and 1’s with no pair of
consecutive 1’s.
Every 1, except possibly the last symbol of the string, must be followed by 0. There is no restriction on the number of 0’s, so the string is any combination of 0’s and 10’s, i.e. regular expression = (0+10)*
If the string ends with 1, the above regular expression must be extended by appending (1 + ε) at the end.
Regular expression for the given language = (0+10)* (1 + ε )
Obtain regular expression to accept the following languages over Σ = { a, b}.
i. Strings of a’s and b’s with substring aab.

Regular expression = (a+b)* aab(a+b)*


ii. Strings of a’s and b’s such that 4th symbol from right end is b and the 5th symbol from
right end is a.
Here the 4th symbol from the right end is b and the 5th symbol from the right end is a, so the corresponding regular expression = ab(a+b)(a+b)(a+b).
This can be preceded by any string of a’s and b’s.
Therefore the regular expression for the given language = (a+b)*ab(a+b)(a+b)(a+b).
iii. Strings of a’s and b’s such that 10th symbol from right end is b.
The regular expression for the given language = $(a+b)^{*}\, b\, (a+b)^{9}$.
iv. Strings of a’s and b’s whose lengths are multiples of 3.
OR
$L = \{w \in \{a, b\}^{*} \mid |w| \bmod 3 = 0\}$
Length of string w is a multiple of 3, the regular expression = [(a+b) (a+b) (a+b)]*

v. Strings of a’s and b’s whose lengths are multiples of 5.
OR
$L = \{w \in \{a, b\}^{*} \mid |w| \bmod 5 = 0\}$
Length of string w is a multiple of 5,
The regular expression = [(a+b) (a+b) (a+b) (a+b) (a+b)]*
vi. Strings of a’s and b’s with not more than 3 a’s:
Not more than 3 a’s, regular expression = (ε+a) (ε+a) (ε+a).
But there is no restriction on b’s, so we can include b* before, between and after the factors of the above regular expression.
The regular expression for the given language = b*(ε+a) b*(ε+a) b* (ε+a) b*
vii. Obtain the regular expression to accept the words with two or more letters but beginning and ending with the same letter. Σ = {a, b}
The regular expression for a word beginning and ending with the same letter is aa + bb. In between, include any number of a’s and b’s.
Therefore the regular expression = a (a+b)* a + b (a+b)* b
viii. Strings of a’s and b’s whose length is either even or a multiple of 3.
Length a multiple of 3, regular expression = [(a+b) (a+b) (a+b)]*
Length even, regular expression = [(a+b) (a+b)]*
So the regular expression for the given language = [(a+b) (a+b) (a+b)]* + [(a+b) (a+b)]*
ix. Obtain the regular expression to accept the language $L = \{a^n b^m \mid m + n \text{ is even}\}$
Here n represents the number of a’s and m represents the number of b’s.
m + n is even results in two possible cases:
case i. An even number of a’s followed by an even number of b’s.
regular expression : (aa)*(bb)*
case ii. Odd number of a’s followed by odd number of b’s.
regular expression = a(aa)* b(bb)*.
So the regular expression for the given language = (aa)*(bb)* + a(aa)* b(bb)*
x. Obtain the regular expression to accept the language $L = \{a^n b^m \mid n \ge 4 \text{ and } m \le 3\}$.
Here $n \ge 4$ means at least 4 a’s, the regular expression for this = aaaa(a)*
$m \le 3$ means at most 3 b’s, regular expression for this = (ε+b) (ε+b) (ε+b).
So the regular expression for the given language = aaaa(a)* (ε+b) (ε+b) (ε+b).
xi. Obtain the regular expression to accept the language $L = \{a^n b^m c^p \mid n \ge 4,\ m \le 3,\ p \le 2\}$.
Here $n \ge 4$ means at least 4 a’s, the regular expression for this = aaaa(a)*
$m \le 3$ means at most 3 b’s, regular expression for this = (ε+b) (ε+b) (ε+b).
$p \le 2$ means at most 2 c’s, regular expression for this = (ε+c) (ε+c)
So the regular expression for the given language = aaaa(a)* (ε+b) (ε+b) (ε+b) (ε+c) (ε+c).
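xii. All strings of a’s and b’s that do not end with ab.
A string of length 2 or more that does not end with ab must end with aa, ba or bb, which gives (a+b)*(aa + ba + bb). The strings ε, a and b, of length less than 2, also do not end with ab and must be included.
So the regular expression = ε + a + b + (a+b)*(aa + ba + bb)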
xiii. All strings of a’s, b’s and c’s with exactly one a.
The regular expression = (b+c)* a (b+c)*
xiv. All strings of a’s and b’s with at least one occurrence of each symbol in Σ = {a, b}.
At least one occurrence of each symbol means ab + ba as the core; before, between and after these symbols we allow any number of a’s and b’s.
So the regular expression =(a+b)* a (a+b)* b(a+b)* +(a+b)* b(a+b)* a(a+b)*
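For item xii, a brute-force comparison shows why the short strings matter. In this Python sketch (NOTES_RE and FULL_RE are illustrative names, with + written as | and ε as an empty alternative), (a+b)*(aa + ba + bb) on its own misses exactly ε, a and b, while ε + a + b + (a+b)*(aa + ba + bb) agrees with the definition on every string up to length 6.

```python
import re
from itertools import product

NOTES_RE = re.compile(r"(a|b)*(aa|ba|bb)")       # (a+b)*(aa + ba + bb)
FULL_RE = re.compile(r"|a|b|(a|b)*(aa|ba|bb)")   # ε + a + b + (a+b)*(aa + ba + bb)

def in_language(w: str) -> bool:
    """Definition: w does not end with ab."""
    return not w.endswith("ab")

words = ["".join(p) for n in range(7) for p in product("ab", repeat=n)]
missed = [w for w in words if in_language(w) and not NOTES_RE.fullmatch(w)]
print(missed)  # ['', 'a', 'b']
print(all(bool(FULL_RE.fullmatch(w)) == in_language(w) for w in words))  # True
```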


Obtain the regular expression for the language $L = \{a^n b^m \mid m \ge 1,\ n \ge 1,\ nm \ge 3\}$.

Solution:
Case i. Since $nm \ge 3$, if n = 1 then m must be $\ge 3$. The equivalent regular expression is given by: RE = abbb(b)*
Case ii. Since $nm \ge 3$, if m = 1 then n must be $\ge 3$. The equivalent regular expression is given by: RE = aaa(a)*b
Case iii. If $m \ge 2$ and $n \ge 2$ then $nm \ge 4 \ge 3$ always holds, so the equivalent regular expression is given by: RE = aa(a)*bb(b)*
So the final regular expression is obtained by adding (taking the union of) all the above regular expressions.
Regular expression = abbb(b)* + aaa(a)*b + aa(a)*bb(b)*
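Because every candidate string has the form $a^n b^m$, the three-case answer can be checked by looping over n and m directly. A minimal Python sketch, with the expression written in re syntax (PATTERN and first_disagreement are illustrative names):

```python
import re

# abbb(b)* + aaa(a)*b + aa(a)*bb(b)* in Python's re syntax
PATTERN = re.compile(r"abbbb*|aaaa*b|aaa*bbb*")

def first_disagreement(limit: int = 8):
    """Return the first (n, m) where the regex and the definition disagree on a^n b^m, or None."""
    for n in range(limit + 1):
        for m in range(limit + 1):
            w = "a" * n + "b" * m
            expected = n >= 1 and m >= 1 and n * m >= 3
            if bool(PATTERN.fullmatch(w)) != expected:
                return (n, m)
    return None

print(first_disagreement())  # None
```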
Applications of regular expressions:
1. Regular expressions are used in UNIX utilities such as grep, sed and awk.
2. Regular expressions are extensively used in the design of the lexical analyzer phase of a compiler.
3. Regular expressions are used to search for patterns in text.
FINITE AUTOMATA AND REGULAR EXPRESSIONS
1. ****Converting Regular Expressions to Automata:
Prove that every language defined by a regular expression is also defined by a finite automaton.
Proof:
Suppose L = L(R) for a regular expression R. We show that L = L(E) for some ε-NFA E with:
a. Exactly one accepting state.
b. No arcs into the initial state.
c. No arcs out of the accepting state.
The proof is by structural induction on R. The following transition diagrams form the basis of the construction of the automaton.


By definition of regular expressions, if R is a RE and S is a RE then R+S is also a RE, corresponding to the language L(R+S). Its automaton is given by:

Starting at the new start state, we can go to the start state of either the automaton for R or the automaton for S. We then reach the accepting state of one of these automata and follow one of the ε-arcs to the accepting state of the new automaton.
Automaton for R.S is given by:

The start state of the first (R) automaton becomes the start state of the whole, the final state of the second (S) automaton becomes the final state of the whole, and an ε-arc leads from the accepting state of R to the start state of S.
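Automaton for R* is given by:

From the new start state, either follow the arc labeled ε directly to the new final state (this accounts for ε in L(R*)), or go to the start state of the automaton for R, pass through that automaton one or more times (an ε-arc leads from its accepting state back to its start state), and then go to the new final state.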
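The basis and the three inductive cases translate directly into code. The following Python sketch is illustrative (ENFA, symbol, union, concat, star and accepts are hypothetical names, and only the single-symbol basis case is implemented): it builds the ε-NFA bottom-up in the way described above and simulates it with ε-closures, using the expression (0 + 1)* 1 (0 + 1) that is converted by hand below.

```python
from itertools import count, product

EPS = None        # label used for ε-arcs
_fresh = count()  # source of new, never-reused state numbers

class ENFA:
    """ε-NFA with one start state and exactly one accepting state."""
    def __init__(self, start, accept, arcs):
        self.start, self.accept, self.arcs = start, accept, arcs  # arcs: (src, label, dst)

def symbol(a):
    s, f = next(_fresh), next(_fresh)
    return ENFA(s, f, [(s, a, f)])

def union(r, t):
    # new start with ε-arcs into R and S; ε-arcs from both accepting states to a new final state
    s, f = next(_fresh), next(_fresh)
    return ENFA(s, f, r.arcs + t.arcs + [(s, EPS, r.start), (s, EPS, t.start),
                                         (r.accept, EPS, f), (t.accept, EPS, f)])

def concat(r, t):
    # start of R starts the whole, accept of S ends the whole, ε-arc joins them
    return ENFA(r.start, t.accept, r.arcs + t.arcs + [(r.accept, EPS, t.start)])

def star(r):
    # skip R entirely (ε), or run through R one or more times
    s, f = next(_fresh), next(_fresh)
    return ENFA(s, f, r.arcs + [(s, EPS, f), (s, EPS, r.start),
                                (r.accept, EPS, r.start), (r.accept, EPS, f)])

def accepts(m, w):
    """Simulate the ε-NFA on w using ε-closures."""
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for src, label, dst in m.arcs:
                if src == q and label is EPS and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({m.start})
    for a in w:
        current = closure({dst for src, label, dst in m.arcs if src in current and label == a})
    return m.accept in current

# (0 + 1)* 1 (0 + 1): strings whose second-to-last symbol is 1
m = concat(concat(star(union(symbol("0"), symbol("1"))), symbol("1")),
           union(symbol("0"), symbol("1")))
print(accepts(m, "10"), accepts(m, "0110"), accepts(m, "100"))  # True True False

words = ["".join(p) for n in range(7) for p in product("01", repeat=n)]
print(all(accepts(m, w) == (len(w) >= 2 and w[-2] == "1") for w in words))  # True
```

By construction the result has no arcs into its start state and none out of its accepting state, matching properties b and c of the proof.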


Convert the regular expression (0 + 1)* 1 (0 + 1) to an ε-NFA.


The automaton for L = 0 is given by:

The automaton for L = 1 is given by:

The automaton for L = 0+1 is given by:

The automaton for L = (0+1)* is given by:

The automaton for L = (0+1)* 1 is given by:


Finally the ε-NFA for the regular expression: (0+1)*1(0+1) is given by:

Convert the regular expression (01 + 1)* to an ε-NFA.


The automaton for L = 01 is given by:

The automaton for L = 1 is given by:

The automaton for L = 01+1 is given by:

Finally ε- NFA for L = (01+1)* is given by:


Convert the regular expression (01 + 101) to an ε-NFA.


The automaton for L = 01 is given by:

The automaton for L = 101 is given by:

The automaton for L = (01+101) is given by:

Convert the regular expression (0 + 1)*01 to an ε-NFA.

The automaton for L = 01 is given by:

The automaton for L = (0 + 1)* is given by:

Epsilon-NFA for the regular expression (0+ 1)*01 is given by:


Convert the regular expression 0* + 1* + 2* to an ε-NFA.


The automaton for L = 0 is given by:

The automaton for L = 1 is given by:

The automaton for L = 2 is given by:

The automaton for L = 0* is given by:

The automaton for L = 1* is given by:

The automaton for L = 2* is given by:


Epsilon-NFA for the regular expression 0* + 1* + 2* is given by:

CONVERTING DFA’s TO REGULAR EXPRESSIONS USING THE STATE ELIMINATION TECHNIQUE
How do we build a regular expression for an FSM? Instead of limiting the labels on the transitions of an FSM to a single character or ε, we allow entire regular expressions as labels.
• For a given input FSM/FA M, we construct a machine M’ such that M and M’ are equivalent and M’ has only two states, a start state and a single accepting state.
• M’ will also have just one transition, which goes from its start state to its accepting state. The label on that transition is a regular expression that describes all the strings that could have driven the original machine M from its start state to some accepting state.
Algorithm to create a regular expression from an FSM (state elimination):
1. Remove any states from the given FSM M that are unreachable from the start state.
2. If FSM M has no accepting states then halt and return the simple regular expression Ø.


3. If the start state of FSM M is part of a loop (i.e. it has any transitions coming into it), then create a new start state s and connect it to M’s start state via an ε-transition. This new start state s will have no transitions into it.
4. If a FSM M has more than one accepting state or if there is just one but there are any
transitions out of it, create a new accepting state and connect each of M’s accepting states
to it via an ε-transition. Remove the old accepting states from the set of accepting states.
Note that the new accepting state will have no transitions out from it.
5. At this point, if M has only one state, then that state is both the start state and the accepting state and M has no transitions. So L(M) = {ε}. Halt and return the simple regular expression ε.
6. Until only the start state and the accepting state remain do:
6.1 Select any state s of M other than the start state and the accepting state.
6.2 Remove that state s from M.
6.3 Modify the transitions among the remaining states so that M accepts the same strings. The labels on the rewritten transitions may be any regular expression.
7. Return the regular expression that labels the one remaining transition from the start state
to the accepting state
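The algorithm above can be sketched in Python as follows. This is an illustrative implementation, not the author's: transition labels are strings in Python's re syntax (| for union, "" for ε, a missing pair for Ø), the new start and accepting states are always added (steps 3 and 4 applied unconditionally), and unreachable states are simply eliminated along with the rest, which does not change the result. The example uses the three-state DFA from the transition table given later in this module (q1 start, q3 accepting); state_elimination and dfa_accepts are hypothetical names.

```python
import re
from itertools import product

def state_elimination(states, start, finals, delta):
    """Return a regular expression (Python re syntax) for the FA, or None if the language is Ø.

    delta maps (p, q) to the regex label on the arc p -> q; a missing pair means no arc.
    """
    START, ACCEPT = object(), object()    # new start (no arcs in) and new accept (no arcs out)
    edges = dict(delta)
    edges[(START, start)] = ""            # ε-transition into the old start state
    for f in finals:
        edges[(f, ACCEPT)] = ""           # ε-transition out of every old accepting state
    for s in states:                      # step 6: eliminate the original states one by one
        loop = edges.pop((s, s), None)
        star = f"(?:{loop})*" if loop else ""          # no self-loop contributes ε
        ins = [(p, r) for (p, q), r in edges.items() if q == s]
        outs = [(q, r) for (p, q), r in edges.items() if p == s]
        for p, _ in ins:
            del edges[(p, s)]
        for q, _ in outs:
            del edges[(s, q)]
        for p, r_in in ins:               # each path p -> s -> q becomes an arc p -> q
            for q, r_out in outs:
                new = f"(?:{r_in}){star}(?:{r_out})"
                old = edges.get((p, q))
                edges[(p, q)] = new if old is None else f"(?:{old})|{new}"
    return edges.get((START, ACCEPT))     # step 7 (None plays the role of Ø)

delta = {("q1", "q2"): "0", ("q1", "q1"): "1", ("q2", "q2"): "0",
         ("q2", "q3"): "1", ("q3", "q3"): "0", ("q3", "q2"): "1"}
regex = state_elimination(["q1", "q2", "q3"], "q1", {"q3"}, delta)

STEP = {("q1", "0"): "q2", ("q1", "1"): "q1", ("q2", "0"): "q2",
        ("q2", "1"): "q3", ("q3", "0"): "q3", ("q3", "1"): "q2"}

def dfa_accepts(w):
    q = "q1"
    for a in w:
        q = STEP[(q, a)]
    return q == "q3"

words = ["".join(p) for n in range(9) for p in product("01", repeat=n)]
print(all(bool(re.fullmatch(regex, w)) == dfa_accepts(w) for w in words))  # True
```

The generated expression is not simplified, so it is longer than the hand-derived answer 1*00*1(0 + 10*1)*, but it describes the same language.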
Consider the following FSM M: Show a regular expression for L(M).
OR
Obtain the regular expression for the following finite automata using state elimination method.

We can build an equivalent machine M' by eliminating state q2 and replacing it by a transition
from q1 to q3 labeled with the regular expression ab*a.
So M' is:

Regular Expression = ab*a



Obtain the regular expression for the following finite automata using state elimination method.

There is no incoming edge into the initial state and no outgoing edge from the final state, so there are only two states, initial and final.

Regular expression = (a+b+c) or (a U b U c)


Obtain the regular expression for the following finite automata using state elimination method.

There is no incoming edge into the initial state and no outgoing edge from the final state.
After eliminating the state B:

Regular expression = ab
Obtain the regular expression for the following finite automata using state elimination method.

There is no incoming edge into the initial state and no outgoing edge from the final state.
After eliminating the state B:

Regular expression = ab*c


Obtain the regular expression for the following finite automata using state elimination method.

Since the initial state has an incoming edge and the final state has an outgoing edge, we create a new initial state and a new final state, connecting the new initial state to the old initial state through ε and the old final state to the new final state through ε. The old final state becomes a non-final state.

After removing state A:

After removing state B:

Regular expression: 0(10)*


Obtain the regular expression for the following finite automata using state elimination method.


Since there are multiple final states, we have to create a new final state.

After removing states C, D and E:

After removing state B:

Regular Expression: a(b+c+d)


Obtain the regular expression for the following finite automata using state elimination method.

After inserting new start state:


After removing state A:

After removing state B:

Regular expression: b(c +ab)*d

Obtain the regular expression for the following finite automata using state elimination method.

By creating new start and final states:

After removing state B:

After removing state A:


Regular expression: (0+10*1)*


Obtain the regular expression for the following finite automata using state elimination method.

By creating new start and final states:

After removing state q1:

After removing state q2:

After removing state q3:

Regular expression: 1*00*1(0+10*1)*


Obtain the regular expression for the following finite automata using state elimination method.

By creating new start state and final state:

After removing q1 state:

After removing q2 state:

After removing q3 state:



After removing q0 state:

Regular expression: (01+10)*


Consider the following FSM M: Show a regular expression for L(M).
OR
Obtain the regular expression for the following finite automata using state elimination method.

Since start state 1 has incoming transitions, we create a new start state and link that state to state
1 through ε.


Since accepting states 1 and 2 have outgoing transitions, we create a new accepting state and link states 1 and 2 to it through ε. Remove the old accepting states from the set of accepting states (i.e. treat 1 and 2 as non-final states).

Consider the state 3 and remove that state:

Consider the state 2 and remove that state:

Consider the state 1 and remove that state:

Finally only the new start state and the new accepting state remain, joined by a single transition. The label on that transition is the regular expression.
Regular Expression = (ab U aaa* b)* (a U ε )


Consider the following FSM M: Show a regular expression for L(M).

After creating new start and final states:

After removing q2 state:

After removing q1 state:

After removing q0 state:

Regular expression: 0* (ε + 1⁺) = 0* 1*


Consider the following FSM M: Show a regular expression for L(M). OR


Construct regular expression for the following FSM using state elimination method.

By creating new start and final states:

After removing D state:

After removing E state:

After removing A state:


After removing B state:

After removing C state:

Regular expression = (00)*11(11)*

Consider the following FSM M: Show a regular expression for L(M).


OR
Construct regular expression for the following FSM using state elimination method.


By creating a new final state:

After removing q1 state:

After removing q2 state:

After removing q3 state:

Regular expression= 01*01*


Consider the following FSM M: Show a regular expression for L(M). OR


Construct regular expression for the following FSM using state elimination method.

By creating new start and final states:

After removing q0 state:

After removing q1 state:

After removing q2 state:


After removing q3 state:

Regular expression: (0+1)*1(0+1) +(0+1)*1(0+1)(0+1)


CONVERTING DFA’s TO REGULAR EXPRESSIONS USING KLEENE’S THEOREM
The regular expressions constructed by this method describe sets of strings that label certain paths in the DFA’s transition diagram. However, the paths are allowed to pass through only a limited subset of the states. We start with the simplest expressions, which describe paths that are not allowed to pass through any intermediate state (i.e. they consist of a single state or a single arc), and inductively build the expressions that let the paths go through progressively larger sets of states. Finally the paths are allowed to go through any state, and at the end these expressions represent all possible paths.
Let us consider a DFA with n states numbered 1 to n, and use $R_{ij}^{(k)}$ as the name of the regular expression whose language is the set of strings w such that w is the label of a path from state i to state j in the DFA, and that path has no intermediate node whose number is greater than k. Note that the beginning and end points of the path are not intermediate, so there is no constraint that i and/or j be less than or equal to k.
To construct the regular expression $R_{ij}^{(k)}$ we use an inductive definition, starting at k = 0 and finally reaching k = n, the number of states in the DFA.
When k = 0:
The regular expressions describe paths that go through no intermediate state at all. There are 2 kinds of paths that meet this condition:
1. An arc from node (state) i to node j.
2. A path of length 0 that consists of only some node i.


If i ≠ j then only case (1) is possible. We must examine the DFA and find those input symbols a such that there is a transition from state i to state j on symbol a.
a) If there is no such symbol a, then $R_{ij}^{(0)} = \emptyset$.
b) If there is exactly one such symbol a, then $R_{ij}^{(0)} = a$.
c) If there are symbols $a_1, a_2, \ldots, a_k$ that label arcs from state i to state j, then $R_{ij}^{(0)} = a_1 + a_2 + \cdots + a_k$.
If i = j, then the legal paths are the path of length 0 and all loops from i to itself. The path of length 0 is represented by the regular expression ε, so we add ε to the various expressions devised in a) through c) above. That is, in case a) the expression becomes Ø + ε = ε, in case b) it becomes ε + a, and in case c) it becomes $\varepsilon + a_1 + a_2 + \cdots + a_k$.
Suppose there is a path from state i to state j that goes through no state higher than k. There are two possible cases to consider.
1. The path does not go through state k at all. In this case, the label of the path is in the language of $R_{ij}^{(k-1)}$.
2. The path goes through state k at least once. Then we can break the path into several pieces, as suggested in the figure below:

The first piece goes from state i to state k without passing through k, the last piece goes from k to j without passing through k, and all the pieces in the middle go from k to itself without passing through k. When we combine the expressions for the paths of the two types above, we have the expression for the labels of all paths from state i to state j that go through no state higher than k.
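The two cases combine into the recurrence that the worked examples of this section apply as "the formula". The first term covers case 1 and the second term covers case 2 (the piece from i to k, any number of loops at k, and the piece from k to j). If state 1 is the start state and F denotes the set of accepting states (notation introduced here), the regular expression for the whole DFA is the sum of $R_{1j}^{(n)}$ over the accepting states j.

```latex
R_{ij}^{(k)} = R_{ij}^{(k-1)} + R_{ik}^{(k-1)} \left( R_{kk}^{(k-1)} \right)^{*} R_{kj}^{(k-1)},
\qquad
L(A) = L\Big( \sum_{j \in F} R_{1j}^{(n)} \Big)
```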

$R_{ij}^{(0)}$ = regular expressions for the paths that go through no intermediate state at all.
$R_{ij}^{(1)}$ = regular expressions for the paths that can go through state 1 only as an intermediate state.
$R_{ij}^{(2)}$ = regular expressions for the paths that can go through states 1 and 2 only as intermediate states.
$R_{ij}^{(3)}$ = regular expressions for the paths that can go through states 1, 2 and 3 only as intermediate states, and so on.
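The basis and the induction can be turned into a short program. The Python sketch below is illustrative (kleene_regex and dfa_accepts are hypothetical names): states are numbered 1 to n, regular expressions are built as strings in Python's re syntax with | for +, the empty string for ε and None for Ø, and no simplification is attempted, so the output is much longer than a hand-simplified answer but denotes the same language. It is tested on the three-state DFA given by the transition table later in this section (q1, q2, q3 renamed 1, 2, 3).

```python
import re
from itertools import product

def kleene_regex(n, delta, start, finals):
    """Regular expression (Python re syntax) for a DFA with states 1..n, or None for Ø.

    delta maps (state, symbol) -> state. R[i, j] holds R_ij^(k) as a string,
    where None stands for Ø and "" stands for ε.
    """
    R = {}
    for i in range(1, n + 1):                 # basis, k = 0
        for j in range(1, n + 1):
            parts = [a for (p, a), q in delta.items() if p == i and q == j]
            if i == j:
                parts.append("")              # the path of length 0, i.e. ε
            R[i, j] = "|".join(parts) if parts else None
    for k in range(1, n + 1):                 # induction on k
        new = {}
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                via_k = None                  # R_ik (R_kk)* R_kj, or Ø
                if R[i, k] is not None and R[k, j] is not None:
                    loop = f"(?:{R[k, k]})*" if R[k, k] else ""   # Ø* = ε* = ε
                    via_k = f"(?:{R[i, k]}){loop}(?:{R[k, j]})"
                old = R[i, j]
                if old is None:
                    new[i, j] = via_k
                elif via_k is None:
                    new[i, j] = old
                else:
                    new[i, j] = f"(?:{old})|{via_k}"
        R = new
    terms = [f"(?:{R[start, f]})" for f in finals if R[start, f] is not None]
    return "|".join(terms) if terms else None

delta = {(1, "0"): 2, (1, "1"): 1, (2, "0"): 2, (2, "1"): 3, (3, "0"): 3, (3, "1"): 2}
regex = kleene_regex(3, delta, 1, [3])

def dfa_accepts(w):
    q = 1
    for a in w:
        q = delta[q, a]
    return q == 3

words = ["".join(p) for n in range(8) for p in product("01", repeat=n)]
print(all(bool(re.fullmatch(regex, w)) == dfa_accepts(w) for w in words))  # True
```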


NOTE:
Identity rule                          Example
εR = Rε = R                            1ε = ε1 = 1
ØR = RØ = Ø                            1Ø = Ø1 = Ø
ε* = ε
Ø* = ε
Ø + R = R + Ø = R                      Ø + 1 = 1
R + R = R                              1 U 1 = 1
RR* = R*R = R⁺                         00* = 0⁺
(R*)* = R*                             (1*)* = 1*
R*R* = R*
ε + RR* = R*                           ε + 1⁺ = 1*
(P + Q)R = PR + QR
(P + Q)* = (P*Q*)* = (P* + Q*)*
R*(ε + R) = (ε + R)R* = R*
(ε + R)* = R*
ε + R* = R*
(PQ)*P = P(QP)*
R*R + R = R*R = R⁺
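Identities such as these can be spot-checked (not proved) by comparing two expressions on every short string. In this Python sketch, same_language is a hypothetical helper and P and Q are given arbitrary sample values; it confirms (P+Q)* = (P*Q*)* = (P*+Q*)*, shows that (P+Q)* and P*Q* are different, and checks (PQ)*P = P(QP)*.

```python
import re
from itertools import product

def same_language(r1: str, r2: str, alphabet: str = "ab", max_len: int = 6) -> bool:
    """True if the two regexes (Python re syntax) agree on every string up to max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            w = "".join(chars)
            if bool(re.fullmatch(r1, w)) != bool(re.fullmatch(r2, w)):
                return False
    return True

P, Q = "(?:a)", "(?:ba)"    # sample values for P and Q; other expressions can be substituted

print(same_language(f"(?:{P}|{Q})*", f"(?:{P}*{Q}*)*"))     # True:  (P+Q)* = (P*Q*)*
print(same_language(f"(?:{P}|{Q})*", f"(?:{P}*|{Q}*)*"))    # True:  (P+Q)* = (P*+Q*)*
print(same_language(f"(?:{P}|{Q})*", f"{P}*{Q}*"))          # False: (P+Q)* differs from P*Q*
print(same_language(f"(?:{P}{Q})*{P}", f"{P}(?:{Q}{P})*"))  # True:  (PQ)*P = P(QP)*
```

For example, baa is in (P+Q)* (as ba followed by a) but not in P*Q* = a*(ba)*.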

Write the regular expression for the language accepted by the following DFA:

Answer:
When k =0; (passing through no intermediate state), the various regular expressions are:


When k = 1 (passing through state 1 as intermediate state), the various regular expressions are:

Therefore the regular expression corresponding to the language accepted by the DFA is given by $R_{12}^{(2)}$ (state 1 (i) is the start state and state 2 (j) is the final state). By using the formula:

$R_{12}^{(2)} = R_{12}^{(1)} + R_{12}^{(1)} \left(R_{22}^{(1)}\right)^{*} R_{22}^{(1)}$

Regular expression = 1*0 (0 + 1)*


Write the regular expression for the language accepted by the following DFA:

Answer:
Number of states in DFA = 3; ie: k = 3
By renaming the states of DFA:


Regular expressions for paths that can go through a) no state, b) state 1 only and c) states 1 and 2
only.
Therefore the regular expression corresponding to the language accepted by the DFA is given by $R_{13}^{(3)}$ (state 1 (i) is the start state and state 3 (j) is the final state). By using the formula:

Where i = 1, j = 3 and k = 3, we get

$R_{13}^{(3)} = R_{13}^{(2)} + R_{13}^{(2)} \left(R_{33}^{(2)}\right)^{*} R_{33}^{(2)}$
= 1*01 + 1*01 (0 + ε + 11)* (0 + ε + 11)
= 1*01 + 1*01 (0 + ε + 11)⁺
Regular expression = 1*01 (0 + 11)*
Give all the regular expressions Rij(0), Rij(1), Rij(2) and also write the regular expression corresponding to the language accepted by the automaton given below:
δ        0      1
→q1      q2     q1
q2       q2     q3
*q3      q3     q2
Answer:
Number of states in DFA = 3; ie: k =3
By renaming the states of DFA as q1 = 1, q2 = 2, q3 = 3
Transition diagram of DFA:


         Rij(0)     Rij(1)          Rij(2)
R11      ε + 1      1* + ε = 1*     1*
R12      0          1*0             1*00*
R13      Ø          1*Ø = Ø         1*00*1
R21      Ø          Ø               Ø
R22      0 + ε      0 + ε           0*
R23      1          1               0*1
R31      Ø          Ø               Ø
R32      1          1               10*
R33      0 + ε      0 + ε           0 + ε + 10*1
Therefore the regular expression corresponding to the language accepted by the DFA is given by $R_{13}^{(3)}$ (state 1 (i) is the start state and state 3 (j) is the final state). By using the formula:

Where i = 1, j = 3 and k = 3, we get

$R_{13}^{(3)} = R_{13}^{(2)} + R_{13}^{(2)} \left(R_{33}^{(2)}\right)^{*} R_{33}^{(2)}$
= 1*00*1 + 1*00*1 (0 + ε + 10*1)* (0 + ε + 10*1)
= 1*00*1 (0 + 10*1)*
Regular expression = 1*00*1 (0 + 10*1)*
Construct the regular expression for the following FSM using Kleene’s Theorem.


By renaming the states of DFA, we get:

Regular expressions for paths that can go through the intermediate states 1, 2 and 3 only:
         Rij(3)
R11      Ø + ε = ε
R12      b
R13      (a + bb)b*
R14      ab*a + bbb*a
R21      Ø
R22      Ø + ε = ε
R23      bb*
R24      bb*a
R31      Ø
R32      Ø
R33      b*
R34      b*a
R41      Ø
R42      Ø
R43      Ø
R44      Ø + ε = ε

The regular expression corresponding to the language accepted by the DFA is given by $R_{14}^{(4)}$ (state 1 (i) is the start state and state 4 (j) is the final state). By using the formula:

Where i = 1, j = 4 and k = 4, we get

$R_{14}^{(4)} = R_{14}^{(3)} + R_{14}^{(3)} \left(R_{44}^{(3)}\right)^{*} R_{44}^{(3)}$
= (ab*a + bbb*a) + (ab*a + bbb*a) (ε)* ε
We know that ε* = ε and εR = R
= (ab*a + bbb*a) + (ab*a + bbb*a)
Since R + R = R, the regular expression reduces to
= ab*a + bbb*a

REGULAR LANGUAGES AND PROPERTIES OF REGULAR LANGUAGES.


Regular Languages:
The regular languages are the languages accepted by finite automata (DFAs, NFAs and ε-NFAs) and defined by regular expressions.
Example: L = {strings of a’s and b’s ending with abb}
L = {strings with an even number of a’s and an even number of b’s}, etc.
There are many languages which are not regular. We can prove that certain languages are not regular using a powerful tool called the pumping lemma.
Example: $L = \{a^n b^n \mid n \ge 0\}$
L = {strings with an equal number of 0’s and 1’s}, etc.
PROVING LANGUAGES NOT TO BE REGULAR
Pumping Lemma (PL) for Regular Languages:
***Theorem (Statement):

Let L be a regular language. Then there exists a constant n (which depends on L) such that for every string w in L with |w| ≥ n, we can break w into three strings, w = xyz, such that:

1. |y| > 0, i.e. y ≠ ε
2. |xy| ≤ n
3. For all k ≥ 0, the string $xy^{k}z$ is also in L.


Proof: Suppose L = L(A) for some DFA A, and suppose A has n states. Consider any string $w = a_1 a_2 \cdots a_m$ in L of length m, where m ≥ n and each $a_i$ is an input symbol. For i = 0, 1, ..., m let $q_i$ be the state A is in after reading the first i symbols of w. Reading the m input symbols therefore takes A through the m + 1 states $q_0, q_1, \ldots, q_m$, where $q_0$ is the start state and $q_m$ is a final state (because w is in L).

Since m ≥ n, the n + 1 states $q_0, q_1, \ldots, q_n$ cannot all be distinct, because A has only n different states. So by the pigeonhole principle we can find two different integers i and j with 0 ≤ i < j ≤ n such that $q_i = q_j$. Now we can break the string w = xyz as follows:
$x = a_1 a_2 \cdots a_i$
$y = a_{i+1} a_{i+2} \cdots a_j$ (the string that takes A around a loop from $q_i$ back to $q_j = q_i$)
$z = a_{j+1} a_{j+2} \cdots a_m$
The relationships among the strings and states are given in the figure below:

x may be empty in the case that i = 0. Also z may be empty if j = n = m. However, y cannot be empty, since i is strictly less than j; and |xy| = j ≤ n.
Since y takes A from $q_i$ back to the same state, A can go around this loop any number of times, including zero. Thus for any k ≥ 0, $xy^{k}z$ takes A from $q_0$ to the final state $q_m$, so $xy^{k}z$ is also accepted by A; that is, $xy^{k}z$ is in L for all k ≥ 0.
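The proof is constructive: running the DFA on the first n symbols and stopping at the first repeated state gives the split w = xyz. A minimal Python sketch, assuming the DFA is a dictionary mapping (state, symbol) to the next state; pump_split and accepts are illustrative names, and the example DFA (3 states, accepting the strings whose number of a’s is a multiple of 3) is an assumption chosen for the demonstration.

```python
def pump_split(delta, start, w, n):
    """Split w into (x, y, z) as in the pumping-lemma proof for a DFA with n states."""
    first_visit = {start: 0}       # state -> i, the number of symbols read when first seen
    q = start
    for j, a in enumerate(w[:n], start=1):
        q = delta[q, a]
        if q in first_visit:       # q_i = q_j with 0 <= i < j <= n (pigeonhole)
            i = first_visit[q]
            return w[:i], w[i:j], w[j:]
        first_visit[q] = j
    raise ValueError("w is shorter than the number of states")

def accepts(delta, start, finals, w):
    q = start
    for a in w:
        q = delta[q, a]
    return q in finals

# Example DFA: state = (number of a's read) mod 3; state 0 is both start and accepting
delta = {(q, s): (q + 1) % 3 if s == "a" else q for q in range(3) for s in "ab"}
x, y, z = pump_split(delta, 0, "abaab", 3)
print(x, y, z)  # a b aab
print(all(accepts(delta, 0, {0}, x + y * k + z) for k in range(6)))  # True
```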
Applications of Pumping lemma:
1. It is useful to prove certain languages are non-regular.
2. It is possible to check whether a language accepted by FA is finite or infinite.
Show that $L = \{a^n b^n \mid n \ge 0\}$ is not regular.
Assume L is a regular language and let n be the number of states in its FA.
Consider the string $w = a^n b^n$, which is in L. Since |w| = n + n = 2n ≥ n, we can split w into xyz such that |xy| ≤ n and |y| ≥ 1. Because |xy| ≤ n, x and y lie inside the leading block of a’s, for example:
$x = a^{n-1}$, $y = a$, $z = b^n$
where |x| = n - 1 and |y| = 1, so that |xy| = n - 1 + 1 = n ≤ n, which is true. (Any other choice $y = a^l$ with l ≥ 1 leads to the same argument.)
According to the pumping lemma, $xy^{k}z \in L$ for all k ≥ 0.
If k = 0, the string y does not appear, so the number of a’s will be less than the number of b’s, i.e. $xz = a^{n-1} b^n$.
But in every string of L the a’s are followed by the same number of b’s, so $xz \notin L$, which contradicts the assumption that the language is regular.
So the language $L = \{a^n b^n \mid n \ge 0\}$ is not a regular language.
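The pumping lemma only guarantees that some split exists, so the contradiction must hold for every admissible split, not only the one chosen above. For small n this can be checked exhaustively; the Python sketch below (in_L and every_split_fails are illustrative names) tries all splits with |xy| ≤ n and |y| ≥ 1 and confirms that pumping down (k = 0) always leaves L.

```python
def in_L(w: str) -> bool:
    """Membership test for L = { a^n b^n | n >= 0 }."""
    half = len(w) // 2
    return len(w) % 2 == 0 and w == "a" * half + "b" * half

def every_split_fails(n: int) -> bool:
    """For w = a^n b^n, check every split w = xyz with |xy| <= n and |y| >= 1:
    pumping down (k = 0) must give a string xz outside L."""
    w = "a" * n + "b" * n
    for xy_len in range(1, n + 1):
        for x_len in range(xy_len):          # guarantees |y| = xy_len - x_len >= 1
            x, z = w[:x_len], w[xy_len:]
            if in_L(x + z):
                return False
    return True

print(all(every_split_fails(n) for n in range(1, 10)))  # True
```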
Show that $L = \{a^i b^j \mid i > j\}$ is not regular.
Assume L is a regular language and let n be the number of states in its FA.
Consider the string $w = a^{n+1} b^n$, which is in L.
Since |w| = n + 1 + n = 2n + 1 ≥ n, we can split w into xyz such that |xy| ≤ n and |y| ≥ 1. Because |xy| ≤ n, y consists only of a’s, for example:
$x = a^{n-1}$, $y = a$, $z = a b^n$
where |x| = n - 1 and |y| = 1, so that |xy| = n - 1 + 1 = n ≤ n, which is true.
According to the pumping lemma, $xy^{k}z \in L$ for all k ≥ 0.
If k = 0, the string y does not appear, so the string has n a’s followed by n b’s, i.e. $xz = a^n b^n$. (For any $y = a^l$ with l ≥ 1, $xz = a^{n+1-l} b^n$ has at most n a’s.)
But in every string of L the number of a’s must be greater than the number of b’s, so $xz \notin L$, which contradicts the assumption that the language is regular.
So the language $L = \{a^i b^j \mid i > j\}$ is not a regular language.
Show that $L = \{w \mid n_a(w) < n_b(w)\}$ is not regular.
Assume L is a regular language and let n be the number of states in its FA.
Consider the string $w = a^n b^{n+1}$, which is in L.
Since |w| = n + n + 1 = 2n + 1 ≥ n, we can split w into xyz such that |xy| ≤ n and |y| ≥ 1. Because |xy| ≤ n, y consists only of a’s, for example:
$x = a^{n-1}$, $y = a$, $z = b^{n+1}$ (in general $y = a^l$ with l ≥ 1)
According to the pumping lemma, $xy^{k}z \in L$ for all k ≥ 0.
If k = 2, the string y appears twice, so $xy^{2}z = a^{n+l} b^{n+1}$ has at least n + 1 a’s and exactly n + 1 b’s, i.e. $n_a \ge n_b$.
This string is not in L, which contradicts the assumption that the language is regular.
So the language $L = \{w \mid n_a(w) < n_b(w)\}$ is not regular.
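Show that $L = \{w \mid n_a(w) = n_b(w)\}$ is not regular.
We can prove that L is not regular by taking the string $w = a^n b^n$, which is in L; pumping down gives $a^{n-l} b^n$, which has fewer a’s than b’s.
For the full solution, refer to the first problem ($L = \{a^n b^n \mid n \ge 0\}$).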
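Show that $L = \{a^i b^j \mid i \ne j\}$ is not regular.
A direct pumping argument with $w = a^{n+1} b^n$ only succeeds when |y| = 1; if $y = a^l$ with l ≥ 2, every pumped string still has i ≠ j. A cleaner proof uses closure properties: if L were regular, then its complement intersected with the regular language a*b*, which is exactly $\{a^n b^n \mid n \ge 0\}$, would also be regular, since regular languages are closed under complement and intersection. This contradicts the first problem, so L is not regular.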
Show that $L = \{a^n b^m c^{n+m} \mid n, m \ge 0\}$ is not regular.
Suppose L is a regular language. Regular languages are closed under homomorphism, so take the homomorphism h(a) = a, h(b) = a and h(c) = b; then h(L) must also be regular.
Now the language is reduced to $h(L) = \{a^n a^m b^{n+m} \mid n, m \ge 0\}$,
i.e. $h(L) = \{a^{n+m} b^{n+m} \mid n, m \ge 0\}$, which is in the form
$\{a^i b^i \mid i \ge 0\}$.
Let n be the number of states in the FA for h(L) and consider $w = a^n b^n$.
Since |w| = n + n = 2n ≥ n, we can split w into xyz such that |xy| ≤ n and |y| ≥ 1 as
$x = a^{n-1}$, $y = a$, $z = b^n$
where |x| = n - 1 and |y| = 1 so that |xy| = n - 1 + 1 = n ≤ n, which is true.
According to the pumping lemma, $xy^{k}z \in h(L)$ for all k ≥ 0.
If k = 0, the string y does not appear, so the number of a’s will be less than the number of b’s,
i.e. $xz = a^{n-1} b^n$, which is not in h(L).
This is a contradiction, so h(L) is not regular. Therefore the given language $L = \{a^n b^m c^{n+m} \mid n, m \ge 0\}$ is not regular.

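Show that L = {ww | w ∈ (a+b)*} is not regular.
Let L be a regular language and n be the number of states in its FA.
Consider the string w = a^{n} b^{n}; therefore s = ww = a^{n} b^{n} a^{n} b^{n} ∈ L.
Since |s| = n + n + n + n = 4n ≥ n, we can split s into xyz such that |xy| ≤ n and |y| ≥ 1. Because the first n symbols of s are a's, y = a^{j} with 1 ≤ j ≤ n (for example, x = a^{n-1}, y = a, z = b^{n} a^{n} b^{n}).
According to the pumping lemma, xy^{k}z ∈ L for all k ≥ 0.
If k = 0, the string y does not appear, so the number of a's to the left of the first b becomes less than the number of a's after the first block of b's:
xz = a^{n-j} b^{n} a^{n} b^{n}.
If this string were of the form uu, each half would contain the same number of b's, namely n. So the first half would be a^{n-j} b^{n} a^{m} and the second half a^{n-m} b^{n} for some m ≥ 0. These halves are equal only if m = 0 and n - j = n, that is, j = 0, which contradicts j ≥ 1. So xz ∉ L.
This is a contradiction to the assumption that the language is regular.
So the language L = {ww | w ∈ (a+b)*} is not a regular language.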
Show that L = {ww^R | w ∈ (a+b)*} is not regular.
Let L be a regular language and n be the number of states in its FA.
Consider the string w = a^{n} b^{n}; therefore s = ww^R = a^{n} b^{n} b^{n} a^{n} = a^{n} b^{2n} a^{n} ∈ L.
Since |s| = n + n + n + n = 4n ≥ n, we can split s into xyz such that |xy| ≤ n and |y| ≥ 1. Because the first n symbols of s are a's, y = a^{j} with 1 ≤ j ≤ n (for example, x = a^{n-1}, y = a, z = b^{n} b^{n} a^{n}).
According to the pumping lemma, xy^{k}z ∈ L for all k ≥ 0.
If k = 0, the string y does not appear, so the number of a's to the left of the first b becomes less than the number of a's after the last b:
xz = a^{n-j} b^{2n} a^{n}.
Every string of the form ww^R is a palindrome, but xz read backwards is a^{n} b^{2n} a^{n-j}, which differs from xz because j ≥ 1. So xz ∉ L.
This is a contradiction to the assumption that the language is regular.
So the language L = {ww^R | w ∈ (a+b)*} is not a regular language.
Show that L = {a^{m!} | m ≥ 0} is not regular.
Let L be a regular language and n be the number of states in its FA (we may assume n ≥ 2, since adding an unreachable state does not change the language of an FA).
Consider the string w = a^{n!}.
Since |w| = n! ≥ n, we can split w into xyz such that |xy| ≤ n and |y| ≥ 1 as
x = a^{i}
y = a^{j}
z = a^{n!-i-j}
where |x| = i and |y| = j, so that |xy| = i + j ≤ n and |y| = j ≥ 1.
According to the pumping lemma, xy^{k}z ∈ L for all k ≥ 0.
Take k = 2: xy^{2}z = a^{i} (a^{j})^{2} a^{n!-i-j} = a^{n!+j}, which must be in L.
But since 1 ≤ j ≤ n, we have n! < n! + j ≤ n! + n < n! + n·n! = (n+1)!, so n! + j lies strictly between two consecutive factorials and is not a factorial. Hence xy^{2}z ∉ L.
This is a contradiction, so L cannot be regular.
Show that L = {0^{m} | m is prime} is not regular.
Let L be a regular language and n be the number of states in its FA.
Consider the string w = 0^{p}, where p is a prime with p ≥ n + 2 (such a prime exists because there are infinitely many primes).
Since |w| = p ≥ n, we can split w into xyz such that |xy| ≤ n and |y| ≥ 1 as
x = 0^{i}
y = 0^{j}
z = 0^{p-i-j}
where |x| = i and |y| = j, so that |xy| = i + j ≤ n and |y| = j ≥ 1.
According to the pumping lemma, xy^{k}z ∈ L for all k ≥ 0; that is, 0^{p+(k-1)j} ∈ L for all k ≥ 0.
Take k = p - j (note that k ≥ 2). Then |xy^{k}z| = p + (p - j - 1)j = (p - j)(j + 1).
Both factors are at least 2, since p - j ≥ (n + 2) - n = 2 and j + 1 ≥ 2, so this length is not prime and xy^{k}z ∉ L.
This is a contradiction, so L cannot be regular.
Show that L = {0^{m} | m is a perfect square} is not regular.
Let L be a regular language and n be the number of states in its FA (we may assume n ≥ 2, since adding an unreachable state does not change the language of an FA).
Consider the string w = 0^{n^2}.
Since |w| = n^2 ≥ n, we can split w into xyz such that |xy| ≤ n and |y| ≥ 1 as
x = 0^{i}
y = 0^{j}
z = 0^{n^2-i-j}
where |x| = i and |y| = j, so that |xy| = i + j ≤ n and |y| = j ≥ 1.
According to the pumping lemma, xy^{k}z ∈ L for all k ≥ 0.
If k = 0, the string y does not appear: 0^{i} (0^{j})^{0} 0^{n^2-i-j} = 0^{n^2-j}, which must be in L.
But since 1 ≤ j ≤ n, (n - 1)^2 = n^2 - 2n + 1 < n^2 - n ≤ n^2 - j < n^2, so n^2 - j lies strictly between two consecutive perfect squares and is not a perfect square (for example, when j = 1, n^2 - 1 is not a perfect square).
This is a contradiction, so L cannot be regular.
Show that L = {0^{m} | m is a perfect cube} is not regular.
Let L be a regular language and n be the number of states in its FA (we may assume n ≥ 2, since adding an unreachable state does not change the language of an FA).
Consider the string w = 0^{n^3}.
Since |w| = n^3 ≥ n, we can split w into xyz such that |xy| ≤ n and |y| ≥ 1 as
x = 0^{i}
y = 0^{j}
z = 0^{n^3-i-j}
where |x| = i and |y| = j, so that |xy| = i + j ≤ n and |y| = j ≥ 1.
According to the pumping lemma, xy^{k}z ∈ L for all k ≥ 0.
If k = 0, the string y does not appear: 0^{i} (0^{j})^{0} 0^{n^3-i-j} = 0^{n^3-j}, which must be in L.
But since 1 ≤ j ≤ n, (n - 1)^3 = n^3 - 3n^2 + 3n - 1 < n^3 - n ≤ n^3 - j < n^3, so n^3 - j lies strictly between two consecutive perfect cubes and is not a perfect cube (for example, when j = 1, n^3 - 1 is not a perfect cube).
This is a contradiction, so L cannot be regular.
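For the unary languages above only the length of a string matters, so pumping y = 0^{j} changes the length m into m + (k - 1)j. The following C sketch (hypothetical function names; again a finite check for one value of n, not a proof) confirms that for every possible |y| = j some k pushes the length out of the language.

```c
#include <stdbool.h>

/* Membership tests on the LENGTH m of a unary string 0^m. */
bool is_prime(long m) {
    if (m < 2) return false;
    for (long d = 2; d * d <= m; d++)
        if (m % d == 0) return false;
    return true;
}

bool is_square(long m) {
    long r = 0;
    while (r * r < m) r++;
    return r * r == m;
}

bool is_cube(long m) {
    long r = 0;
    while (r * r * r < m) r++;
    return r * r * r == m;
}

bool is_factorial(long m) {
    long f = 1;
    for (long i = 2; f < m; i++) f *= i;   /* f runs through 1!, 2!, 3!, ... */
    return f == m;
}

/* For a unary string of length m and pumping constant n: true if, for every
   |y| = j with 1 <= j <= n (and j <= m), some k in 0..max_k gives a pumped
   length m + (k - 1) * j outside the language. For a unary alphabet |x| does
   not affect the result, so only j needs to be tried. */
bool unary_pumping_fails(long m, long n, long max_k, bool (*in_lang)(long)) {
    for (long j = 1; j <= n && j <= m; j++) {
        bool broken = false;
        for (long k = 0; k <= max_k && !broken; k++)
            broken = !in_lang(m + (k - 1) * j);
        if (!broken) return false;   /* this choice of y survives every k tried */
    }
    return true;
}
```

With n = 5, the strings 0^{25}, 0^{125} and 0^{120} (120 = 5!) all fail already at k = 0. The prime case needs the larger k = p - j used in the proof: with p = 7, trying only k ≤ 1 is not enough, but trying k up to 7 is.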

LEXICAL ANALYSIS PHASE OF COMPILER DESIGN


Lexical Analysis
Lexical analysis reads characters from left to right and groups them into tokens. A simple way to build a lexical analyzer is to construct a diagram that illustrates the structure of the tokens of the source program. We can also produce a lexical analyzer automatically by specifying the lexeme patterns to a lexical-analyzer generator and compiling those patterns into code that functions as a lexical analyzer. This approach makes it easier to modify a lexical analyzer, since we only have to rewrite the affected patterns, not the entire program.
Three general approaches for implementing a lexical analyzer are:
1. Use a lexical-analyzer generator such as LEX, which takes a regular-expression-based specification and provides routines for reading and buffering the input.
2. Write the lexical analyzer in a conventional systems-programming language such as C, using its I/O facilities to read the input.
3. Write the lexical analyzer in assembly language and explicitly manage the reading of input.
Note: The speed of lexical analysis is a concern in compiler design, since this is the only phase that reads the source program character by character.
Discuss the various issues of lexical analysis.
1. The lexical analyzer reads the source program character by character to produce tokens.
2. Normally a lexical analyzer does not return a list of tokens in one shot; it returns a token each time the parser asks for one.
3. Normally the lexical analyzer does not return a comment as a token. It skips the comment and returns the next token (which is not a comment) to the parser.
4. Correlating error messages: It can associate a line number with each error message. In some compilers it makes a copy of the source program with the error messages inserted at the appropriate positions.
5. If the source program uses a macro-preprocessor, the expansion of macros may also be performed by the lexical analyzer.
Role of Lexical Analyzer
Explain the role of lexical analyzer with a block diagram.

• It reads the input characters of the source program, groups them into lexemes, and produces as output a sequence of tokens.
• It interacts with the symbol table (for example, it enters the identifiers it finds).
• The parser calls the lexical analyzer by means of the getNextToken command.
• In response to this command, the lexical analyzer reads characters from its input until it can identify the next lexeme, and produces a token for that lexeme, which is returned to the parser.
• It eliminates comments and white space.
• It correlates error messages with line numbers.
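The interaction described above (the parser repeatedly asking for the next token, with white space and comments skipped and line numbers kept for error messages) can be sketched in C as follows. The token names TK_ID, TK_NUM, TK_OTHER and TK_EOF and the Lexer structure are hypothetical and deliberately minimal: only identifiers, unsigned integers and single-character tokens are recognized, and C-style block comments are skipped.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token names for this sketch only. */
typedef enum { TK_ID, TK_NUM, TK_OTHER, TK_EOF } TokenName;

typedef struct {
    TokenName name;
    const char *lexeme;   /* points into the source text */
    size_t length;        /* number of characters in the lexeme */
    int line;             /* line number, kept for error messages */
} Token;

typedef struct {
    const char *src;      /* whole source program, NUL-terminated */
    size_t pos;           /* next character to be read */
    int line;             /* current line number */
} Lexer;

/* Skips blanks, tabs, newlines and block comments (slash-star ... star-slash),
   counting newlines as it goes. */
static void skip_ws_and_comments(Lexer *lx) {
    for (;;) {
        char c = lx->src[lx->pos];
        if (c == '\n') { lx->line++; lx->pos++; }
        else if (c == ' ' || c == '\t' || c == '\r') lx->pos++;
        else if (c == '/' && lx->src[lx->pos + 1] == '*') {
            lx->pos += 2;
            while (lx->src[lx->pos] != '\0' &&
                   !(lx->src[lx->pos] == '*' && lx->src[lx->pos + 1] == '/')) {
                if (lx->src[lx->pos] == '\n') lx->line++;
                lx->pos++;
            }
            if (lx->src[lx->pos] != '\0') lx->pos += 2;   /* skip the closing star-slash */
        }
        else return;
    }
}

/* Called by the parser each time it needs the next token. */
Token get_next_token(Lexer *lx) {
    skip_ws_and_comments(lx);
    Token t = { TK_EOF, lx->src + lx->pos, 0, lx->line };
    unsigned char c = (unsigned char)lx->src[lx->pos];
    size_t start = lx->pos;
    if (c == '\0') return t;                        /* end of input */
    if (isalpha(c) || c == '_') {                   /* identifier */
        while (isalnum((unsigned char)lx->src[lx->pos]) || lx->src[lx->pos] == '_')
            lx->pos++;
        t.name = TK_ID;
    } else if (isdigit(c)) {                        /* unsigned integer */
        while (isdigit((unsigned char)lx->src[lx->pos]))
            lx->pos++;
        t.name = TK_NUM;
    } else {                                        /* any other single character */
        lx->pos++;
        t.name = TK_OTHER;
    }
    t.length = lx->pos - start;
    return t;
}
```

A parser would create one Lexer over the source text and call get_next_token(&lx) whenever it needs a token; the comment never reaches the parser, and the line field is available for error messages.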


Lexical Analysis versus Parsing:
Why is the analysis phase of a compiler separated into lexical analysis and parsing?
The main reasons are:
1. Simplicity of design: Separation allows us to simplify each task. For example, a parser that had to deal with comments and whitespace as syntactic units would be considerably more complex than one that can assume comments and whitespace have already been removed by the lexical analyzer.
2. Compiler efficiency is improved: A separate lexical analyzer allows us to apply specialized buffering techniques for reading input characters, which can speed up the compiler significantly.
3. Compiler portability is enhanced: Input-device-specific peculiarities can be restricted to the lexical analyzer.
Tokens, Patterns and Lexemes:
Define the following terms with examples:

i. Token.

ii. Pattern.

iii. Lexeme.
Token: A token represents a class or category of input strings. It is a pair consisting of a token name and an optional attribute value.
For example, identifiers, keywords and constants are tokens.
Pattern: A pattern is a rule describing the form that the lexemes of a token may take.
Example: for an identifier, a letter followed by any number of letters or digits, where letter is [A-Za-z].
Lexeme: A lexeme is a sequence of characters in the source program that matches the pattern for a token.
Example: int, a, num, ans etc.
Token representation:
In many programming languages, the following classes cover most or all of the tokens:
i. One token for each keyword; the pattern for a keyword is the same as the keyword itself.
ii. Tokens for the operators, either individually or in classes such as the token comparison.
iii. One token representing all identifiers.
iv. One or more tokens representing constants, such as numbers and literal strings.
v. Tokens for each punctuation symbol, such as left and right parentheses, comma, and
semicolon.
Attributes for tokens
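A token has at most one attribute value, although that value may be a structure that combines several pieces of information. For an identifier, the attribute is a pointer to the symbol-table entry in which the information about the identifier (its lexeme, its type, the location where it was first found, and so on) is kept.
Example: The token names and associated attribute values for the statement E = M * C ** 2 are written below as a sequence of pairs: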
<id, pointer to symbol-table entry for E>
<assign_op>
<id, pointer to symbol-table entry for M>
<mult_op>
<id, pointer to symbol-table entry for C>
<exp_op>
<number, integer value 2>
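One way to represent these (token name, attribute value) pairs in C is a struct with a union attribute, as in this sketch. The enum constants, the one-field SymEntry type and example_tokens are hypothetical; a real symbol-table entry would hold more information per identifier.

```c
#include <string.h>

/* Hypothetical token names for the statement E = M * C ** 2. */
typedef enum { ID, ASSIGN_OP, MULT_OP, EXP_OP, NUMBER } TokName;

/* Minimal symbol-table entry: only the lexeme is stored here. */
typedef struct { char lexeme[32]; } SymEntry;

typedef struct {
    TokName name;
    union {
        SymEntry *entry;   /* ID: pointer to its symbol-table entry */
        int value;         /* NUMBER: the integer value */
    } attr;                /* operator tokens carry no attribute */
} Token;

/* Fills a 3-entry symbol table with E, M, C and writes the 7 tokens of
   E = M * C ** 2 into out; returns the number of tokens. */
int example_tokens(SymEntry table[3], Token out[7]) {
    strcpy(table[0].lexeme, "E");
    strcpy(table[1].lexeme, "M");
    strcpy(table[2].lexeme, "C");
    Token seq[7] = {
        { ID,        { .entry = &table[0] } },
        { ASSIGN_OP, { .entry = NULL } },
        { ID,        { .entry = &table[1] } },
        { MULT_OP,   { .entry = NULL } },
        { ID,        { .entry = &table[2] } },
        { EXP_OP,    { .entry = NULL } },
        { NUMBER,    { .value = 2 } },
    };
    memcpy(out, seq, sizeof seq);
    return 7;
}
```

The union reflects the rule above: each token carries at most one attribute, whose meaning depends on the token name.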
Lexical errors:
1. It is hard for a lexical analyzer to tell, without the aid of other components, that there is a
source-code error. For instance, if the string fi is encountered for the first time in a C
program in the context:
fi ( a == f ( x ) )
A lexical analyzer cannot tell whether fi is a misspelling of the keyword if or an undeclared function identifier. Since fi is a valid lexeme for the token id, the lexical analyzer must return the token id to the parser and let some other phase of the compiler (probably the parser, in this case) handle an error due to transposition of the letters.
2. Suppose a situation arises in which the lexical analyzer is unable to proceed because none of
the patterns for tokens matches any prefix of the remaining input. The simplest recovery
strategy is "panic mode" recovery. We delete successive characters from the remaining input,
until the lexical analyzer can find a well-formed token at the beginning of what input is left.
This recovery technique may confuse the parser, but in an interactive computing environment
it may be quite adequate.

Possible error-recovery actions are:


1. Delete one character from the remaining input.
2. Insert a missing character into the remaining input.
3. Replace a character by another character.
4. Transpose two adjacent characters.
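The four actions above can also be used to check whether an unmatched lexeme is a single transformation away from a keyword; for example, fi becomes if by one transposition. In this C sketch, one_edit_away, suggest_keyword and the keyword list are hypothetical. The sketch tests exactly one delete, insert, replace or transpose.

```c
#include <stdbool.h>
#include <string.h>

/* True if s can be turned into t by exactly one action: delete a character,
   insert a character, replace a character, or transpose two adjacent ones. */
bool one_edit_away(const char *s, const char *t) {
    size_t m = strlen(s), n = strlen(t), i = 0;
    if (m == n) {
        while (i < m && s[i] == t[i]) i++;          /* first mismatch */
        if (i == m) return false;                   /* identical: no edit needed */
        if (strcmp(s + i + 1, t + i + 1) == 0)      /* replace s[i] by t[i] */
            return true;
        return i + 1 < m && s[i] == t[i + 1] && s[i + 1] == t[i] &&
               strcmp(s + i + 2, t + i + 2) == 0;   /* transpose s[i], s[i+1] */
    }
    if (m == n + 1) {                               /* delete one char of s */
        while (i < n && s[i] == t[i]) i++;
        return strcmp(s + i + 1, t + i) == 0;
    }
    if (m + 1 == n) {                               /* insert one char into s */
        while (i < m && s[i] == t[i]) i++;
        return strcmp(s + i, t + i + 1) == 0;
    }
    return false;
}

/* Hypothetical keyword list for this sketch. */
static const char *keywords[] = { "if", "else", "while", "for", "int", "return" };

/* Returns a keyword reachable by one action, or NULL if there is none. */
const char *suggest_keyword(const char *lexeme) {
    for (size_t k = 0; k < sizeof keywords / sizeof keywords[0]; k++)
        if (one_edit_away(lexeme, keywords[k])) return keywords[k];
    return NULL;
}
```

Such a suggestion would typically only feed an error message; as explained above, the lexical analyzer itself still returns id for fi.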
INPUT BUFFERING
Lexical analyzer scans the input string from left to right, one character at a time. It uses two pointers to keep track of the position of the input scanned:
• beginLexeme pointer (bp), which marks the beginning of the current lexeme
• forward pointer (fp), which scans ahead until the end of the lexeme is found
Initially both pointers point to the first character of the input string.

The forward pointer moves ahead to search for the end of the lexeme. For example, if lexemes are separated by blanks, then as soon as the forward pointer encounters a blank space the end of the lexeme is known, and the lexeme (the characters from bp up to, but not including, fp) is identified.

When fp encounters white space, it skips it and moves ahead. Then both fp and bp are set at the beginning of the next token.

What is meant by input buffering?


To recognize tokens, the source program has to be read from the hard disk. Accessing the disk for every character is costly and time-consuming, so special buffering techniques have been developed to reduce the amount of overhead required. Reading the source program into a buffer in this way is called input buffering.
A block of data is first read into a buffer and then scanned by the lexical analyzer. There are two methods used:
1. One buffer
2. Two buffer
One buffer scheme:
Here only one buffer is used to store the input string. The problem with this scheme is that if a lexeme is very long, it crosses the buffer boundary. To scan the remaining part of the lexeme, the buffer has to be refilled, which overwrites the first part of the lexeme, so part of the lexeme may be lost.
Two Buffer scheme:
Why is the two-buffer scheme used in lexical analysis? Explain.
Because of the amount of time taken to process characters and the large number of characters that must be processed during the compilation of a large source program, specialized two-buffer techniques have been developed to reduce the amount of overhead required to process a single input character.
• Here a buffer (array) is divided into two N-character halves, where N is the number of characters in one disk block (e.g., 4096 bytes). If fewer than N characters remain in the input file, a special character, represented by eof, marks the end of the source file; it is different from any possible character of the source program.
• One read command is used to read N characters into a buffer half. Two pointers are maintained: the beginLexeme pointer (beginning of the lexeme) and the forward pointer.
• Initially, both pointers point to the first character of the next lexeme.
• The two halves are reloaded alternately, so when a lexeme crosses into the other half, its beginning is still present in the buffer. This overcomes the problem of the one-buffer scheme: as long as the lexeme plus the lookahead needed to find its end is no longer than N characters, no part of the lexeme is overwritten before it has been recognized.
Sentinels:
In the two-buffer scheme we must check the forward pointer each time it is incremented. Thus we make two tests: one for the end of the buffer, and one to determine which character is read. We can combine these two tests if we place a sentinel character at the end of each buffer half.

How is input buffering in a lexical analyzer implemented with sentinels?


OR
Define sentinels. Write an algorithm for look ahead code with sentinels.

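A sentinel is a special character, inserted at the end of each buffer half, that cannot be part of the source program; eof is used as the sentinel. It is also placed after the last character read when fewer than N characters remain, so an eof found anywhere other than at the end of a half marks the real end of the input.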
Look ahead code:
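The look-ahead code with sentinels can be sketched in C as below. This is an illustration under stated assumptions rather than the textbook's exact code: '\0' stands in for the eof sentinel (so the source text must not contain NUL bytes), N is kept tiny so that the reloads are easy to follow, and the beginLexeme pointer is omitted. Each ordinary character costs only one test; the end-of-half checks run only when a sentinel is read.

```c
#include <stdio.h>

#define N 8            /* size of each half; a real lexer would use one disk block, e.g. 4096 */
#define SENTINEL '\0'  /* assumption: '\0' plays the role of eof and never occurs in the source */

typedef struct {
    char buf[2 * N + 2];   /* [0..N-1] first half, [N] sentinel, [N+1..2N] second half, [2N+1] sentinel */
    char *forward;         /* the forward (lookahead) pointer */
    FILE *in;
} TwoBuffer;

/* Reads up to N characters into the half starting at h and puts a sentinel after them. */
static void reload(TwoBuffer *b, char *h) {
    size_t got = fread(h, 1, N, b->in);
    h[got] = SENTINEL;     /* if got < N, this sentinel marks the real end of input */
}

void tb_init(TwoBuffer *b, FILE *in) {
    b->in = in;
    reload(b, b->buf);     /* fill the first half; the second is filled on demand */
    b->forward = b->buf;
}

/* Returns the next input character, or EOF at the end of input. */
int tb_next(TwoBuffer *b) {
    for (;;) {
        char c = *b->forward++;
        if (c != SENTINEL) return (unsigned char)c;       /* common case: one test */
        if (b->forward == b->buf + N + 1) {               /* sentinel at end of first half */
            reload(b, b->buf + N + 1);
            b->forward = b->buf + N + 1;
        } else if (b->forward == b->buf + 2 * N + 2) {    /* sentinel at end of second half */
            reload(b, b->buf);
            b->forward = b->buf;
        } else {                                          /* sentinel inside a half: end of input */
            b->forward--;                                 /* stay on it for later calls */
            return EOF;
        }
    }
}
```

Reading a 21-character file with N = 8 exercises both reload branches and then the end-of-input branch.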

Operations on Languages:
Give the formal definitions of operations on languages with notations.
In lexical analysis the most important operations on languages are:
i. Union
ii. Concatenation
iii. Star closure
iv. Positive closure.
These operations are formally defined as follows
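In standard notation, for languages L and M the four operations are:

```latex
\begin{aligned}
&\text{Union:} && L \cup M = \{\, s \mid s \in L \text{ or } s \in M \,\}\\
&\text{Concatenation:} && LM = \{\, st \mid s \in L \text{ and } t \in M \,\}\\
&\text{Kleene (star) closure:} && L^{*} = \bigcup_{i=0}^{\infty} L^{i}, \qquad L^{0} = \{\varepsilon\},\; L^{i} = L\,L^{i-1}\\
&\text{Positive closure:} && L^{+} = \bigcup_{i=1}^{\infty} L^{i}
\end{aligned}
```

For example, if L = {a, b} and M = {c}, then L ∪ M = {a, b, c}, LM = {ac, bc}, and L* = {ε, a, b, aa, ab, ba, bb, …}.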

Regular Expressions
• We use regular expressions to describe the tokens of a programming language.
• A regular expression is built up out of simpler regular expressions using a set of defining rules.
• Each regular expression r denotes a language L(r).
• A language denoted by a regular expression is called a regular set.
• The operators *, concatenation and | have precedence from highest to lowest in that order, and all are left-associative.
• If two regular expressions r and s denote the same language, we say r and s are equivalent and write r = s.
Write the definition of regular expression
Regular expressions over an alphabet ∑ are defined as follows:
ε is a regular expression, and L(ε) = {ε}, the set containing only the empty string.
If a is a symbol in ∑, then a is a regular expression, and L(a) = {a}, the set containing the string a.
Suppose r and s are regular expressions denoting the languages L(r) and L(s). Then:
(r|s) is a regular expression denoting L(r) ∪ L(s).
rs is a regular expression denoting L(r)L(s).
(r)* is a regular expression denoting (L(r))*.
The following table shows some regular expressions along with the sets they denote:

Regular expression            Set
a|b                           {a, b}
(a|b)(a|b) or aa|ab|ba|bb     {aa, ab, ba, bb}
a*                            {ε, a, aa, aaa, …}
(a|b)* or (a*b*)*             {ε, a, b, aa, ab, ba, bb, aaa, …}, all strings of a's and b's
a|a*b                         {a, b, ab, aab, aaab, …}

Algebraic properties
r|s = s|r
r|(s|t) = (r|s)|t
(rs)t = r(st)
r(s|t) = rs|rt
(s|t)r = sr|tr
εr = r
rε = r
r* = (r|ε)*
r** = r*

[abc] denotes the regular expression a|b|c
[a-z] denotes the regular expression a|b|…|z
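Regular definition:
• Writing regular expressions for some languages can be difficult, because their regular expressions can be quite complex. In those cases, we may use regular definitions.
• We can give names to regular expressions and use these names as symbols to define other regular expressions. Formally, a regular definition is a sequence of definitions of the form
d1 → r1
d2 → r2
…
dn → rn
where each di is a new symbol, not in ∑ and different from the other d's, and each ri is a regular expression over the symbols ∑ ∪ {d1, d2, …, di-1}.
Define regular definition and write the regular definition for C identifier. (5)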

Regular definition for ‘C’ identifier:

letter → A | B | … | Z | a | b | … | z | _

digit → 0 | 1 | … | 9

id → letter ( letter | digit )*

OR
letter → [A-Za-z_]
digit → [0-9]

id → letter (letter | digit )*
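The regular definition id → letter (letter | digit)* corresponds to a simple two-state recognizer. A minimal C sketch (the function names are hypothetical), assuming the default "C" locale so that isalpha matches exactly [A-Za-z]:

```c
#include <ctype.h>
#include <stdbool.h>

/* letter -> [A-Za-z_]; in the default "C" locale isalpha matches exactly A-Z and a-z */
static bool is_letter(unsigned char c) { return isalpha(c) || c == '_'; }

/* id -> letter ( letter | digit )* */
bool is_identifier(const char *s) {
    if (!is_letter((unsigned char)*s)) return false;   /* must start with a letter */
    for (s++; *s != '\0'; s++)                          /* then letters or digits only */
        if (!is_letter((unsigned char)*s) && !isdigit((unsigned char)*s))
            return false;
    return true;
}
```

The first test corresponds to the edge labelled letter out of the start state, and the loop to the self-loop labelled letter | digit on the accepting state.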

Write the regular definition for an unsigned number


digit → 0 | 1 | … | 9

digits → (digit)+
optionalFraction → . digits | ε
optionalExponent → ( E ( + | - | ε ) digits ) | ε
number → digits optionalFraction optionalExponent
OR
digit → [0-9]

digits → digit+
number → digits (. digits)? (E [+-]? digits)?
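The second form of the definition, number → digits (. digits)? (E [+-]? digits)?, can be checked with a small hand-written recognizer. In this C sketch (hypothetical names), each optional part is tested in turn and the whole string must be consumed; as in the definition, only an upper-case E is accepted.

```c
#include <ctype.h>
#include <stdbool.h>

/* Consumes digit+ starting at *p; returns false if no digit is present. */
static bool digits(const char **p) {
    if (!isdigit((unsigned char)**p)) return false;
    while (isdigit((unsigned char)**p)) (*p)++;
    return true;
}

bool is_unsigned_number(const char *s) {
    if (!digits(&s)) return false;          /* digits */
    if (*s == '.') {                        /* optionalFraction: . digits */
        s++;
        if (!digits(&s)) return false;
    }
    if (*s == 'E') {                        /* optionalExponent: E [+-]? digits */
        s++;
        if (*s == '+' || *s == '-') s++;
        if (!digits(&s)) return false;
    }
    return *s == '\0';                      /* the whole string must be consumed */
}
```

Strings such as 5280, 0.01234, 6.336E4 and 1.89E-4 are accepted, while 1., .5 and 2E are rejected because a required digits part is missing.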

Recognition of Tokens
Our current goal is to perform the lexical analysis needed for the following grammar.


Specification of Token
Regular expressions are used to specify tokens.
Recognition of Token: To recognize tokens there are two steps:
1. Design of Transition Diagram
2. Implementation of Transition Diagram
Transition Diagrams
A transition diagram is similar to a flowchart for (a part of) the lexer. We draw one for each possible token. It shows the decisions that must be made based on the input seen. The two main components are circles representing states (think of them as decision points of the lexer) and arrows representing edges (think of them as the decisions made).
It is fairly clear how to write code corresponding to the transition diagram for relop (relational operators). You look at the first character; if it is <, you look at the next character. If that character is =, you return (relop, LE) to the parser. If instead that character is >, you return (relop, NE). If it is any other character, you return (relop, LT) and adjust the input buffer so that you will read this character again, since it is not part of the current lexeme. If the first character was =, you return (relop, EQ). If the first character was >, you look at the next character: if it is =, you return (relop, GE); otherwise you return (relop, GT) and again retract the input by one character.
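The walk-through above can be written as a small, runnable C function. In this sketch, get_relop and the NOT_RELOP value are hypothetical; the function reports how many characters belong to the lexeme, so "retracting" the lookahead character simply means not counting it.

```c
#include <stddef.h>

/* Attribute values for the token relop (names from the text), plus a failure value. */
typedef enum { LT, LE, EQ, NE, GT, GE, NOT_RELOP } Relop;

/* Recognizes a relational operator at the start of s, following the relop
   transition diagram; *len receives the number of characters consumed. */
Relop get_relop(const char *s, size_t *len) {
    switch (s[0]) {
    case '<':
        if (s[1] == '=') { *len = 2; return LE; }
        if (s[1] == '>') { *len = 2; return NE; }
        *len = 1; return LT;                 /* other character: retract */
    case '=':
        *len = 1; return EQ;
    case '>':
        if (s[1] == '=') { *len = 2; return GE; }
        *len = 1; return GT;                 /* other character: retract */
    default:
        *len = 0; return NOT_RELOP;          /* fail(): try another diagram */
    }
}
```

For example, on the input <=x the function returns LE with length 2, and on <a it returns LT with length 1, leaving a to be read again as the start of the next lexeme.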
Write the transition diagram to recognize the token given below:
i. relop (relational operator)
ii. Identifier and keyword
iii. Unsigned number
iv. Integer constant
v. Whitespace
i. Transition diagram for relop:

ii. Transition diagram for identifier and keyword:

iii. Unsigned number:

iv. Integer constant:

v. Whitespace:
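Whitespace is described using delim, where delim stands for characters such as blank, tab and newline, and any other characters that the language design does not consider to be part of any token. The lexical analyzer recognizes whitespace but does not return it to the parser; it restarts and looks for the next token.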

Recognition of reserved words and identifiers.


Draw the transition diagram for identifiers and keywords. How do you handle reserved words that look like identifiers?

There are two ways we can handle reserved words that look like identifiers:
1. Install the reserved words in the symbol table initially: When we find an identifier, a call to the installID( ) function places it in the symbol table if it is not already there, and returns a pointer to its symbol-table entry. The function getToken( ) examines the symbol-table entry for the lexeme found and returns the token name, which is either id or one of the keyword tokens that was initially installed in the table.
2. Create separate transition diagrams for each keyword: the diagram for a keyword must be tried before the diagram for identifiers, and it must check that the keyword is not followed by a letter or digit (so that, for example, thenextvalue is recognized as an identifier and not as the keyword then followed by nextvalue).
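The first approach (keywords installed in the symbol table before scanning) can be sketched in C as follows. The names install_id, get_token and init_keywords mirror installID( ) and getToken( ) from the text but are hypothetical, as are the token names and the small fixed-size, linear-search table (lexemes are assumed to be shorter than 32 characters).

```c
#include <string.h>

/* Hypothetical token names and a tiny fixed-size symbol table. */
typedef enum { TOK_ID, TOK_IF, TOK_THEN, TOK_ELSE } TokenName;

typedef struct { char lexeme[32]; TokenName token; } Entry;

#define TABLE_SIZE 100
static Entry table[TABLE_SIZE];
static int n_entries = 0;

static int lookup(const char *lexeme) {
    for (int i = 0; i < n_entries; i++)
        if (strcmp(table[i].lexeme, lexeme) == 0) return i;
    return -1;
}

static int insert(const char *lexeme, TokenName tok) {
    if (n_entries == TABLE_SIZE) return -1;                  /* table full */
    strncpy(table[n_entries].lexeme, lexeme, sizeof table[0].lexeme - 1);
    table[n_entries].lexeme[sizeof table[0].lexeme - 1] = '\0';
    table[n_entries].token = tok;
    return n_entries++;
}

/* Reserved words are installed before scanning starts. */
void init_keywords(void) {
    insert("if", TOK_IF);
    insert("then", TOK_THEN);
    insert("else", TOK_ELSE);
}

/* Like installID( ): returns the entry index, adding the lexeme as an id if new. */
int install_id(const char *lexeme) {
    int i = lookup(lexeme);
    return i >= 0 ? i : insert(lexeme, TOK_ID);
}

/* Like getToken( ): the table entry says whether the lexeme is a keyword or an id. */
TokenName get_token(const char *lexeme) {
    int i = install_id(lexeme);
    return i >= 0 ? table[i].token : TOK_ID;
}
```

With this design a single transition diagram handles both identifiers and keywords: the diagram only finds the lexeme, and the pre-installed entries decide which token name is returned.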
Architecture of a transition diagram based lexical analyzer
The idea is that we write a piece of code for each transition diagram. This piece of code contains a case for each state, which typically reads a character and then goes to the next case depending on the character read. nextChar( ) is used to read the next character from the input buffer. The numbers in the circles are the names of the cases. Accepting states often need to take some action and return to the parser. Many of these accepting states (the ones marked with a star *) need to restore one character of input; this is called retract( ) in the code.
What should the code for a particular diagram do if, at some state, the character read is not one of those for which a next state has been defined? That is, what if the character read is not the label of any of the outgoing edges? This means that we have failed to find the token corresponding to this diagram.
The code then calls fail( ). This is not an error case; it simply means that the current input does not match this particular token. So we go to the code section for another diagram, after restoring the input pointer so that the next diagram starts at the point where this failing diagram started. If we have tried all the diagrams, then we have a real failure and need to print an error message and perhaps try to repair the input.
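The fail( ) mechanism can be illustrated by trying several diagram functions in order from the same starting position. In this C sketch (hypothetical names), each diagram returns the number of characters it matched, with 0 playing the role of fail( ); because every diagram starts from the same index, restoring the input pointer is automatic. The order of the array matters, just as the order in which the diagrams are tried does.

```c
#include <ctype.h>
#include <stddef.h>

/* A "diagram" tries to match a token starting at s[i] and returns the number
   of characters matched; returning 0 plays the role of fail( ). */
typedef size_t (*Diagram)(const char *s, size_t i);

static size_t relop_diagram(const char *s, size_t i) {
    if (s[i] == '<') return (s[i + 1] == '=' || s[i + 1] == '>') ? 2 : 1;
    if (s[i] == '>') return (s[i + 1] == '=') ? 2 : 1;
    return (s[i] == '=') ? 1 : 0;
}

static size_t id_diagram(const char *s, size_t i) {
    size_t j = i;
    if (!isalpha((unsigned char)s[j])) return 0;
    while (isalnum((unsigned char)s[j])) j++;
    return j - i;
}

static size_t number_diagram(const char *s, size_t i) {
    size_t j = i;
    while (isdigit((unsigned char)s[j])) j++;
    return j - i;
}

/* Tries each diagram in order from the same start position, so a failing
   diagram leaves the input pointer where the next one must begin.
   Returns the index of the matching diagram, or -1 if all of them fail. */
int next_token(const char *s, size_t start, size_t *len) {
    static const Diagram diagrams[] = { relop_diagram, id_diagram, number_diagram };
    for (size_t d = 0; d < sizeof diagrams / sizeof diagrams[0]; d++) {
        *len = diagrams[d](s, start);
        if (*len > 0) return (int)d;
    }
    return -1;   /* real failure: report a lexical error */
}
```

On an input such as $, all three diagrams fail and the caller gets -1, which corresponds to the real failure described above.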


Construct the transition diagram for relational operators ( =, <, <=, >, >=, and < > ). Write a lexical
analyzer to recognize the above mentioned relational operators.( Write code for START state, one
intermediate state and one final state).
Transition diagram:

Coding part:
TOKEN getRelop( )
{
    /* state and c are assumed to be global; nextChar( ), retract( ) and
       fail( ) are provided by the rest of the lexical analyzer */
    TOKEN retToken = new(RELOP);
    while (1)
    { /* repeat character processing until a return or failure occurs */
        switch (state)
        {
        case 0: c = nextChar( );
                if ( c == '<' ) state = 1;
                else if ( c == '=' ) state = 5;
                else if ( c == '>' ) state = 6;
                else fail( ); /* lexeme is not a relational operator */
                break;
        case 1: c = nextChar( );
                if ( c == '=' ) state = 2;
                else if ( c == '>' ) state = 3;
                else state = 4; /* any other character: the lexeme is < */
                break;
        ……..
        ……..
        case 8: retract( ); /* the character after > is not part of the lexeme */
                retToken.attribute = GT;
                return(retToken);
        }
    }
}

Module 3
---------------------------------------------------------------------------------------------------------------------
Context Free Grammars:
• Definition and designing CFGs
• Derivations Using a Grammar
• Parse Trees
• Ambiguity and Elimination of Ambiguity
• Elimination of Left Recursion
• Left Factoring
Syntax Analysis Phase of Compilers: part-1:
• Role of Parser
• Top-Down Parsing

----------------------------------------------------------------------------------------------------------------
Textbooks:
1. John E. Hopcroft, Rajeev Motwani, Jeffrey D. Ullman, “Introduction to Automata Theory, Languages and Computation”, Third Edition, Pearson.
2. Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman, “Compilers: Principles, Techniques and Tools”, Second Edition, Pearson.
Textbook 1:
• Chapter 5 – 5.1.1 to 5.1.6, 5.2 (5.2.1, 5.2.2) and 5.4
• Chapter 4 – 4.1
Textbook 2:
• Chapter 4 – 4.1, 4.2, 4.3 (4.3.2 to 4.3.4), 4.4

CONTEXT FREE GRAMMAR (CFG)


Regular languages (described by FSMs, regular expressions and regular grammars) are defined using formalisms that offer less power and flexibility than a general-purpose programming language provides. Because these frameworks were restrictive, we were able to describe a large class of useful operations that could be performed on the languages that we defined.
We will begin our discussion of the context-free languages with another restricted formalism, the context-free grammar.
Introduction to Rewrite Systems and Grammars:
A rewrite system is a rule-based system consisting of a collection of rules and an algorithm for applying them.
Each rule has a left-hand side (LHS) and a right-hand side (RHS).
Example:
S → aS
S → bA
A → ɛ
A rewrite system works on a particular set of strings: it tries to match the LHS of a rule against some part of the current string and, when it matches, replaces that part with the RHS. The same core ideas can be used to define rewrite systems that operate on richer data structures (programming languages).
Obtaining a string w by applying the rules of a rewrite system is called a derivation. For example, using the rules above, S ⇒ aS ⇒ abA ⇒ ab.
Grammar
What is a grammar?
A rewrite system that is used to define a language is called a grammar.
If G is a grammar that generates a language L, then the language is written L(G). A grammar works on a set of symbols of two types: non-terminal symbols and terminal symbols.
Non-terminal and Terminal Symbols
What are non-terminal and terminal symbols?
Non-terminal symbols act as working symbols while the grammar is carrying out a derivation. They disappear once the grammar has completely derived a string w of L(G).
Terminal symbols come from the input alphabet ∑. These are the symbols that make up the strings w of L(G).
Every grammar needs one special non-terminal called the start symbol, normally denoted by S.

Context-Free Grammars and Languages


We now define a context-free grammar (or CFG) to be a grammar in which each rule must:
• have a left-hand side that is a single non-terminal, and
• have any sequence (possibly empty) of symbols (non-terminals and/or terminals) on the right-hand side.
For Example: S → aSb
S → ε
T → T
S → aSbbTT
The grammar is called context-free because, using these rules, the decision to replace a non-terminal by some other sequence is made without looking at the context in which the non-terminal occurs. The rules above say that S can be replaced by aSb, ε or aSbbTT, wherever S occurs.
NOTE:
The rule aSa → aTa is not a context-free rule. It says that S can be replaced by T only when it is surrounded by a's. A rule of this type is called context-sensitive because it allows the context to be considered.
Define Context Free Grammar (CFG).
A context-free grammar G is a quadruple (V, T, P, S) where
V – Set of non-terminal symbols
T – Set of terminal symbols
P – Set of production rules, where each production rule is of the form
A → α
where A is a non-terminal and α is in (V ∪ T)*; there is exactly one non-terminal on the left-hand side of each production rule.
S – Start symbol, S ∈ V
Example: S → AaB | Ba | abb
A → bA | a
B → b
Define the term productions in CFG.
In a CFG, the rules that are applied to obtain a grammatically correct sentence are called productions.
• Each production starts with a non-terminal, followed by an arrow, followed by a combination of non-terminals and/or terminals.
Example: S → AaB | Ba | abb | ɛ
DESIGN OF CONTEXT FREE GRAMMAR
A context-free grammar can be written for many formal languages by first constructing the basic building-block grammars for languages such as:
{a^n | n ≥ 0}
{a^n b^n | n ≥ 0}
{a^{n+1} b^n | n ≥ 0}
{a^n b^{n+1} | n ≥ 0}
{a^{2n} b^n | n ≥ 0}
{a^n b^{2n} | n ≥ 0}
Write the CFG for the language L = {a^n | n ≥ 0}.
We can write this grammar directly, or construct a DFA and then convert it into a CFG.
The DFA for the given language is

The transition function is given by δ(S, a) = S.
This transition function can be converted into a CFG as follows:
The state on which the transition is defined (here S) becomes the non-terminal on the left-hand side, the arrow → takes the place of the "=", and the input symbol ‘a’ concatenated with the next state S is written on the right-hand side: S → aS.
For every final state we also include ‘ɛ’ on the right-hand side: S → ɛ.
Therefore the CFG for the language L = {a^n | n ≥ 0} is given by:
S → aS
S → ɛ
Note: The CFG for the language L = {a^n | n ≥ 1} is given by:
S → aS
S → a ; where the minimum string is a, when n = 1
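The conversion rule (δ(A, a) = B gives A → aB, and every final state F gives F → ɛ) is mechanical enough to code directly. In this C sketch, Transition, cfg_size and dfa_to_cfg are hypothetical names, and states are assumed to be single letters that double as non-terminal names.

```c
#include <stdio.h>
#include <string.h>

/* One DFA transition delta(from, symbol) = to. States are single letters that
   double as non-terminal names (an assumption of this sketch). */
typedef struct { char from, symbol, to; } Transition;

/* Number of productions dfa_to_cfg writes: one per transition, one per final state. */
int cfg_size(int nt, const char *finals) { return nt + (int)strlen(finals); }

/* delta(A, a) = B gives A -> aB; every final state F gives F -> ɛ.
   prods must have room for cfg_size(nt, finals) strings. */
int dfa_to_cfg(const Transition *t, int nt, const char *finals, char prods[][16]) {
    int count = 0;
    for (int i = 0; i < nt; i++)
        snprintf(prods[count++], 16, "%c -> %c%c", t[i].from, t[i].symbol, t[i].to);
    for (const char *f = finals; *f != '\0'; f++)
        snprintf(prods[count++], 16, "%c -> ɛ", *f);
    return count;
}
```

For the one-state DFA above this produces S -> aS and S -> ɛ. For a DFA with δ(S, a) = A, δ(A, a) = A and final state A, it produces S -> aA, A -> aA and A -> ɛ, which is another grammar for {a^n | n ≥ 1}.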

Similarly we can write the CFG for other languages as follows:
The CFG for the language L = {a^n b^n | n ≥ 0} is given by:
S → aSb
S → ɛ
Here, for every ‘a’ one ‘b’ has to be generated. This is obtained by following ‘aS’ with one ‘b’. The minimum string, when n = 0, is ɛ.
If L = {a^n b^n | n ≥ 1}, then the minimum string is ‘ab’ instead of ɛ (when n = 1), so the resulting grammar is:
S → aSb
S → ab
The CFG for the language L = {a^{n+1} b^n | n ≥ 0} is given by:
S → aSb
S → a
If n ≥ 1, then the CFG becomes:
S → aSb
S → aab
The CFG for the language L = {a^n b^{n+1} | n ≥ 0} is given by:
S → aSb
S → b
If n ≥ 1, then the CFG becomes:
S → aSb
S → abb
The CFG for the language L = {a^{2n} b^n | n ≥ 0}:
S → aaSb
S → ɛ
Here, for every two ‘a’s one ‘b’ has to be generated. This is obtained by following ‘aaS’ with one ‘b’. The minimum string is ɛ.
If L = {a^{2n} b^n | n ≥ 1}, then the minimum string is ‘aab’ instead of ɛ (when n = 1), so the resulting grammar is:
S → aaSb
S → aab

The CFG for the language L = {a^n b^{2n} | n ≥ 0}:
S → aSbb
S → ɛ
Here, for every ‘a’ two ‘b’s have to be generated. This is obtained by following ‘aS’ with ‘bb’. The minimum string is ɛ.
If n ≥ 1, then the CFG becomes:
S → aSbb
S → abb
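All of the building-block grammars above have the shape S → pre S suf | base, so n applications of the recursive rule followed by the base rule yield pre^n base suf^n. The C sketch below (hypothetical derive function; out must be large enough) makes that derivation explicit.

```c
#include <string.h>

/* Applies S -> pre S suf exactly n times and then S -> base, writing the
   derived terminal string pre^n base suf^n into out (caller provides space). */
void derive(const char *pre, const char *suf, const char *base, int n, char *out) {
    out[0] = '\0';
    for (int i = 0; i < n; i++) strcat(out, pre);   /* pre^n, to the left of S */
    strcat(out, base);                              /* innermost S -> base */
    for (int i = 0; i < n; i++) strcat(out, suf);   /* suf^n, to the right of S */
}
```

For example, derive("a", "b", "", 3, out) gives aaabbb for S → aSb | ɛ, and derive("aa", "b", "", 2, out) gives aaaabb for S → aaSb | ɛ.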
Design of context free grammar for the language represented using regular expression.
i. Obtain CFG for the language L = (a + b)*.
The language consists of any number of a’s and b’s, including ɛ.
S → aS | bS | ɛ
ii. Obtain CFG for the language L = {u ab v | u, v ∈ (a + b)*}
OR
Obtain CFG for the language containing strings of a’s and b’s with substring ‘ab’.
L = {u ab v} can be re-written as:
S → AabA
where the A production represents any number of a’s and b’s and is given by:
A → aA | bA | ɛ
Therefore the resulting grammar is G = (V, T, P, S), where V = {S, A}, T = {a, b}, S is the start symbol, and P is the set of production rules shown below:
S → AabA
A → aA | bA | ɛ
iii. Obtain CFG for the language L = (011 + 1)* 01.
L can be re-written as:
S → A01
where the A production represents any number of 011’s and 1’s, including ɛ:
A → 011A | 1A | ɛ
Therefore the resulting grammar is G = (V, T, P, S), where V = {S, A}, T = {0, 1}, S is the start symbol, and P is the set of production rules shown below:

S → A01
A → 011A | 1A | ɛ
iv. Obtain CFG for the language L = { w| w € (0,1)* with at least one occurrence of „101‟ }.
The regular expression corresponding to the language is L = { w 101 w }

Where A production represents any number of 0‟s and 1‟s and is given by:
A → 0A | 1A | ɛ
Therefore the resulting grammar is G = ( V, T, P, S) where,
V = { S, A }, T = { 0, 1}, S is the start symbol, and P is the production rule is as shown below:
S → A101A
A → 0A | 1A | ɛ
v. Obtain CFG for the language L = { w ab | w ∈ (a, b)* }.
OR
Obtain CFG for the language containing strings of a's and b's ending with 'ab'.
Here A generates any number of a's and b's, and S appends the suffix ab.
The resulting grammar is G = ( V, T, P, S) where,
V = { S, A }, T = { a, b}, S is the start symbol, and P, the set of productions, is as shown below:
S → Aab
A → aA | bA | ɛ
vi. Obtain CFG for the language containing strings of a's and b's ending with 'ab' or 'ba'.
OR
Obtain the context free grammar for the language L = { XY | X ∈ (a, b)* and Y ∈ {ab, ba} }
The regular expression corresponding to the language is w (ab + ba), where w is in (a, b)*. X generates w and Y generates the suffix:
X → aX | bX | ɛ
Y → ab | ba
The resulting grammar is G = ( V, T, P, S) where,
V = { S, X, Y }, T = { a, b}, S is the start symbol, and P, the set of productions, is as shown below:
S → XY
X→ aX | bX | ɛ
Y → ab | ba
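For grammars that come from regular expressions, one quick sanity check is to compare the strings built from the productions with the strings matched by the regular expression. The sketch below does this for S → XY, X → aX | bX | ɛ, Y → ab | ba. It uses Python's `re` module to match (a + b)*(ab + ba). The X part is built directly as an arbitrary string over {a, b}, which is exactly what X derives. The helper names are illustrative.

```python
import re
from itertools import product

# (a + b)*(ab + ba) in Python's regular-expression syntax
PATTERN = re.compile(r"[ab]*(ab|ba)")


def grammar_strings(max_len):
    """Strings of length <= max_len from S -> XY, X -> aX | bX | ɛ, Y -> ab | ba."""
    out = set()
    for n in range(max_len - 1):              # length of the part derived from X
        for x in product("ab", repeat=n):
            for y in ("ab", "ba"):            # the two Y alternatives
                out.add("".join(x) + y)
    return out


def regex_strings(max_len):
    """All strings over {a, b} of length <= max_len matched by PATTERN."""
    out = set()
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            w = "".join(chars)
            if PATTERN.fullmatch(w):
                out.add(w)
    return out


print(grammar_strings(6) == regex_strings(6))  # True
```

This is a bounded comparison (here up to length 6), useful for catching mistakes rather than proving the two descriptions equal.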

Obtain the CFG for the language L = { w | Na(w) = Nb(w), w ∈ (a, b)* }
OR
Obtain the CFG for the language containing strings of a's and b's with equal numbers of a's and b's.
Answer:
A string with equal numbers of a's and b's falls into one of three cases:
i. The empty string ɛ, which has equal numbers of a's and b's.
ii. A string that starts with 'a' and ends with 'b', with a string having equal numbers of a's and b's in between.
iii. A string that starts with 'b' and ends with 'a', with a string having equal numbers of a's and b's in between.
The corresponding productions for these 3 cases can be written as
S→ ɛ
S→ aSb
S→ bSa
Using these productions, strings such as ɛ, ab, ba, ababab, bababa, etc. can be generated.
But strings such as abba and baab, which start and end with the same symbol, cannot be generated from these productions. Such a string can always be split into two non-empty parts, each with equal numbers of a's and b's (for example, abba = ab · ba and baab = ba · ab), so we concatenate two such strings. The corresponding production is S → SS.
The resulting grammar corresponding to the language with equal numbers of a's and b's is G = ( V, T, P, S) where,
V = { S }, T = { a, b}, S is the start symbol, and P, the set of productions, is as shown below:
S→ɛ
S→ aSb
S → bSa
S → SS
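The grammar S → ɛ | aSb | bSa | SS can be turned directly into a memoized recognizer that tries each production in turn. The Python sketch below (the name `derives` is illustrative) compares it with the defining condition Na(w) = Nb(w) on all strings up to length 8. This is a bounded check, not a proof. The S → SS case only tries splits into two non-empty parts, because a split with an empty part derives nothing new.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def derives(w):
    """True if S =>* w for S -> ɛ | aSb | bSa | SS."""
    if w == "":                                             # S -> ɛ
        return True
    if len(w) >= 2 and w[0] == "a" and w[-1] == "b" and derives(w[1:-1]):
        return True                                         # S -> aSb
    if len(w) >= 2 and w[0] == "b" and w[-1] == "a" and derives(w[1:-1]):
        return True                                         # S -> bSa
    # S -> SS: try every split into two non-empty parts
    return any(derives(w[:k]) and derives(w[k:]) for k in range(1, len(w)))


# Compare with the definition Na(w) = Nb(w) on every string up to length 8.
for n in range(9):
    for chars in product("ab", repeat=n):
        w = "".join(chars)
        assert derives(w) == (w.count("a") == w.count("b")), w

print(derives("abba"), derives("aab"))  # True False
```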
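Obtain the CFG for the language L = { w | Na(w) = Nb(w) + 1, w ∈ (a, b)* }
The language contains strings of a's and b's in which the number of a's is one more than the number of b's.
Here we have exactly one extra a, which may appear at the beginning, at the end or in the middle. Any such string can be written as x a y, where x and y each have equal numbers of a's and b's. To find the split, take the first position at which the number of a's read so far exceeds the number of b's by one. The symbol there is an 'a', and both the prefix before it and the suffix after it have equal numbers of a's and b's.
We can write the A production for equal numbers of a's and b's as
A→ ɛ | aAb | bAa | AA
and finally insert one extra 'a' between two A's, i.e.:
S→ AaA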

The resulting grammar corresponding to the language Na(w) = Nb(w) + 1 is G = ( V, T, P, S) where,
V = { S, A }, T = { a, b}, S is the start symbol, and P, the set of productions, is as shown below:
S → AaA
A→ aAb | bAa | AA | ɛ
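Production S → AaA relies on the fact that any string with one more a than b can be split as x a y, where x and y each have equal numbers of a's and b's. This minimal sketch finds that split. The function `split_extra_a` is hypothetical. It keeps a running count of #a - #b and stops the first time the count reaches 1. The count can only rise one step at a time, so the symbol at that point is an 'a' and the prefix before it is balanced. The whole string has count 1, so the suffix is balanced too.

```python
def split_extra_a(w):
    """Split w (with Na(w) = Nb(w) + 1) as x + 'a' + y with x and y balanced,
    matching the production S -> AaA."""
    if w.count("a") != w.count("b") + 1:
        raise ValueError("w must have exactly one more a than b")
    diff = 0                              # running value of #a - #b
    for i, sym in enumerate(w):
        diff += 1 if sym == "a" else -1
        if diff == 1:                     # first time the count reaches 1
            return w[:i], w[i + 1:]
    raise AssertionError("unreachable: the running count must reach 1")


print(split_extra_a("baaab"))  # ('ba', 'ab')
```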
******* Obtain the CFG for the language L = { set of all palindromes over {a, b} }
i. ɛ is a palindrome; i.e.: S → ɛ
ii. a is a palindrome; i.e.: S → a (odd length string)
iii. b is a palindrome; i.e.: S → b (odd length string)
iv. If w is a palindrome, then the strings awa and bwb are also palindromes.
i.e.: if S derives a palindrome, then aSa and bSb also derive palindromes. The corresponding productions are
S → aSa | bSb
The resulting grammar corresponding to the language L = { set of all palindromes } is G = ( V, T, P, S) where,
V = { S }, T = { a, b}, S is the start symbol, and P, the set of productions, is as shown below:
S → aSa | bSb | a | b | ɛ
Obtain the CFG for the language L = {set of all non-palindromes over {a, b}}
A string is a non-palindrome exactly when, after removing matching symbols from both ends (a...a or b...b) as long as possible, the remaining middle part starts and ends with different symbols. This middle part is generated by
A → aBb | bBa
where B generates any number of a's and b's, i.e.: B → aB | bB | ɛ
Finally, the non-palindrome strings are generated by placing the A production inside the palindrome-style productions, i.e.: S → aSa | bSb | A
The resulting grammar corresponding to the language L = { set of all non-palindromes } is G = ( V, T, P, S) where,
V = { S, A, B }, T = { a, b}, S is the start symbol, and P, the set of productions, is as shown below:
S → aSa | bSb | A
A→ aBb | bBa
B→ aB | bB |ɛ
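Both grammars can be read as recognizers. S → aSa | bSb peels off matching outer symbols, and the remaining base case decides the result. The Python sketch below implements the palindrome grammar and the non-palindrome grammar this way; the function names are illustrative. It then checks, for every string over {a, b} up to length 7, that exactly one of the two grammars accepts it.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def palindrome(w):
    """S -> aSa | bSb | a | b | ɛ"""
    if len(w) <= 1:                                   # S -> ɛ | a | b
        return True
    return w[0] == w[-1] and palindrome(w[1:-1])      # S -> aSa | bSb


@lru_cache(maxsize=None)
def non_palindrome(w):
    """S -> aSa | bSb | A,  A -> aBb | bBa,  B -> aB | bB | ɛ"""
    if len(w) < 2:
        return False
    if w[0] != w[-1]:                  # S -> A -> aBb | bBa, B derives any middle
        return True
    return non_palindrome(w[1:-1])     # S -> aSa | bSb


# Every string over {a, b} is accepted by exactly one of the two grammars.
for n in range(8):
    for chars in product("ab", repeat=n):
        w = "".join(chars)
        assert palindrome(w) != non_palindrome(w)
        assert palindrome(w) == (w == w[::-1])

print(non_palindrome("abba"), non_palindrome("abab"))  # False True
```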
Obtain the CFG for the language $L = \{ww^R \mid w \in (a, b)^*\}$
NOTE: $ww^R$ generates the even-length palindrome strings of a's and b's.
That means we can remove the odd-length base cases 'a' and 'b' from the palindrome grammar above.

The resulting grammar corresponding to the language $L = \{ww^R\}$ is G = ( V, T, P, S) where,
V = { S }, T = { a, b}, S is the start symbol, and P, the set of productions, is as shown below:
S → aSa | bSb | ɛ

Obtain the CFG for the language $L = \{w \mid w = w^R,\ w \in (a, b)^*\}$
OR
L = { palindrome strings over {a, b} }
Note: $w = w^R$ means that the string w is equal to its reversal $w^R$, so the strings generated by the language are palindromes (of either even or odd length).
The resulting grammar corresponding to the language $L = \{w \mid w = w^R\}$ is G = ( V, T, P, S) where,
V = { S }, T = { a, b}, S is the start symbol, and P, the set of productions, is as shown below:
S → aSa | bSb | a | b | ɛ
Obtain the CFG for the language containing all positive odd integers up to 999.
Here C generates the last digit, which must be odd, and D generates any digit placed before it.
The resulting grammar corresponding to the language L = { all positive odd integers up to 999 } is G = ( V, T, P, S) where,
V = { S, C, D }, T = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }, S is the start symbol, and P, the set of productions, is as shown below:
S → C | DC | DDC
C → 1|3|5|7|9
D → 0| 1|2|3|4|5|6|7|8|9
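Listing everything that S → C | DC | DDC produces is a useful check on this grammar. In the sketch below, the strings C and D mirror the two productions. D can produce 0, so the grammar also produces strings with leading zeros, such as 07 and 007. As a result, the 555 generated strings denote only 500 distinct values, which are exactly the odd integers 1, 3, ..., 999.

```python
from itertools import product

C = "13579"            # C -> 1 | 3 | 5 | 7 | 9   (last digit, odd)
D = "0123456789"       # D -> 0 | 1 | ... | 9     (any earlier digit)

# S -> C | DC | DDC: one, two or three digits, the last one taken from C.
strings = set(C)
strings |= {d + c for d, c in product(D, C)}
strings |= {d1 + d2 + c for d1, d2, c in product(D, D, C)}

values = {int(s) for s in strings}
print(len(strings), len(values))          # 555 500
print(values == set(range(1, 1000, 2)))   # True
```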
Obtain the context free grammar for the language $L = \{0^m 1^m 2^n \mid m, n \ge 1\}$
Answer:
The language can be written as AB, where A generates $0^m 1^m$ and B generates $2^n$.
By referring to the basic building-block grammar of $a^n b^n$ (n ≥ 1), we obtain the grammar for $0^m 1^m$ (m ≥ 1).
The equivalent A production is:
A → 0A1
A → 01

Here B represents any number of 2's with at least one 2 (n ≥ 1), which is similar to the grammar for $a^n$ (n ≥ 1).
The equivalent B production is:
B → 2B
B→2
So the context free grammar for the language $L = \{0^m 1^m 2^n \mid m, n \ge 1\}$ is G = ( V, T, P, S) where,
V = { S, A, B}, T = { 0, 1, 2}, S is the start symbol, and P, the set of productions, is as shown below:
S → AB
A → 0A1 | 01
B → 2B | 2
Obtain the context free grammar for the language $L = \{a^{2n} b^m \mid m, n \ge 0\}$
Answer:
The number of a's depends only on n and the number of b's depends only on m, so we can re-write the language as AB, where:
A represents 2n a's, and its equivalent production is A → aaA | ɛ
B represents m b's, and its equivalent production is B → bB | ɛ
So the context free grammar for the language $L = \{a^{2n} b^m \mid m, n \ge 0\}$ is G = ( V, T, P, S) where,
V = { S, A, B}, T = { a, b }, S is the start symbol, and P, the set of productions, is as shown below:
S → AB
A → aaA |ɛ
B → bB |ɛ
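Obtain the context free grammar for the language $L = \{0^i 1^j 2^k \mid i = j \text{ or } j = k,\ i, j, k \ge 0\}$
Case 1: when i = j
The given language becomes $0^i 1^i 2^k$, which can be written as AB, where A generates $0^i 1^i$ and B generates $2^k$.
The resultant productions are: A → 0A1 | ɛ and B → 2B | ɛ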


Therefore case 1 results in productions
S→ AB
A→ 0A1| ɛ
B→ 2B| ɛ

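Case 2: when j = k
The given language becomes $0^i 1^j 2^j$, which can be written as CD, where C generates $0^i$ and D generates $1^j 2^j$.
Therefore case 2 results in productions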
S→ CD
C→ 0C | ɛ
D→ 1D2| ɛ
So the context free grammar for the language $L = \{0^i 1^j 2^k \mid i = j \text{ or } j = k,\ i, j, k \ge 0\}$ is G = ( V, T, P, S) where,
V = {S, A, B, C, D}, T = {0, 1, 2}, S is the start symbol, and P, the set of productions, is as shown below:
S→ AB| CD
A→ 0A1| ɛ
B→ 2B| ɛ
C→ 0C | ɛ
D→ 1D2| ɛ
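Here is a bounded check of this union construction. The sketch below builds the strings of S → AB (the i = j case) and S → CD (the j = k case) directly from what each production derives. It uses exponents up to an arbitrary bound N = 4 and compares the result with a brute-force filter on $0^i 1^j 2^k$.

```python
N = 4  # bound on each exponent i, j, k (an arbitrary choice for the check)

# S -> AB:  A -> 0A1 | ɛ gives 0^n 1^n,  B -> 2B | ɛ gives 2^k
case1 = {"0" * n + "1" * n + "2" * k for n in range(N + 1) for k in range(N + 1)}
# S -> CD:  C -> 0C | ɛ gives 0^i,  D -> 1D2 | ɛ gives 1^n 2^n
case2 = {"0" * i + "1" * n + "2" * n for i in range(N + 1) for n in range(N + 1)}
grammar = case1 | case2

# Brute force: every 0^i 1^j 2^k with i = j or j = k.
expected = {
    "0" * i + "1" * j + "2" * k
    for i in range(N + 1) for j in range(N + 1) for k in range(N + 1)
    if i == j or j == k
}
print(grammar == expected)   # True
print("0112" in grammar)     # False: i = 1, j = 2, k = 1
```

Strings with i = j = k, such as 012, come from both S → AB and S → CD, so this grammar is ambiguous. This language is a standard example of an inherently ambiguous language: no unambiguous grammar exists for it.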
Obtain the context free grammar for the language $L = \{0^i 1^j \mid i \ne j,\ i, j \ge 0\}$
Answer:
Here the numbers of 0's and 1's must not be equal, i.e. i ≠ j.
Therefore we have two possible cases:
Case 1: when i > j, i.e. the number of 0's is greater than the number of 1's. That means at least one 0 followed by equal numbers of 0's and 1's. Therefore the corresponding language is
$L_1 = \{0^+ 0^j 1^j \mid j \ge 0\}$, which is generated by AB,
where A → 0A | 0 and B → 0B1 | ɛ

Case 2: when i < j, i.e. the number of 0's is less than the number of 1's. That means equal numbers of 0's and 1's followed by at least one 1. Therefore the corresponding language is
$L_2 = \{0^i 1^i 1^+ \mid i \ge 0\}$, which is generated by BC,

where C → 1C | 1
So the context free grammar for the language $L = \{0^i 1^j \mid i \ne j,\ i, j \ge 0\}$ is G = ( V, T, P, S) where,
V = {S, A, B, C}, T = {0, 1}, S is the start symbol, and P, the set of productions, is as shown below:
S→ AB |BC
A→ 0A| 0
B→ 0B1| ɛ
C→ 1C| 1
Obtain the context free grammar for the language $L = \{a^n b^m \mid n = 2m,\ m \ge 0\}$
Answer:
By substituting n = 2m we have
$L = \{a^{2m} b^m \mid m \ge 0\}$
Here, for every two a's, one b has to be generated. This is obtained by suffixing 'aaS' with one 'b'. The minimum string is ɛ.
So the context free grammar for the language $L = \{a^n b^m \mid n = 2m,\ m \ge 0\}$ is G = ( V, T, P, S) where,
V = {S}, T = {a, b}, S is the start symbol, and P, the set of productions, is as shown below:
S → aaSb
S→ɛ
Obtain the context free grammar for the language $L = \{a^n b^m \mid n \ne 2m,\ n, m \ge 1\}$
Answer:
Here n ≠ 2m means n > 2m or n < 2m, which results in two possible cases of the language L.
Case 1: when n > 2m, we can write n = 2m + k with k ≥ 1, so the language becomes
$L_1 = \{a^k a^{2m} b^m \mid k, m \ge 1\}$. By referring to the basic building-block grammar for $a^{2m} b^m$, the recursive production is:
A → aaAb
When m = 1, the strings are $a^k aab$ (aaab, aaaab, ...). They are generated by A → Eaab, where E → aE | a produces the k ≥ 1 extra a's.
Therefore A → aaAb | Eaab and E → aE | a
Case 2: when n < 2m, i.e. 1 ≤ n ≤ 2m - 1. The minimum string, when n = m = 1, is ab, i.e. B → ab. Every other string is obtained by adding one b on the right together with two a's, one a or no a on the left. Each step increases 2m by 2 and n by at most 2, so n < 2m is preserved, and every pair with 1 ≤ n ≤ 2m - 1 can be reached in this way.
Therefore B → aaBb | aBb | Bb | ab
So the context free grammar for the language $L = \{a^n b^m \mid n \ne 2m,\ n, m \ge 1\}$ is G = ( V, T, P, S) where,
V = {S, A, B, E}, T = {a, b}, S is the start symbol, and P, the set of productions, is as shown below:
S → A | B
A → aaAb | Eaab
E → aE | a
B → aaBb | aBb | Bb | ab

Obtain the context free grammar for the language $L = \{0^i 1^j 2^k \mid i + j = k,\ i, j \ge 0\}$
Answer:
When i + j = k, the given language becomes $L = \{0^i 1^j 2^{i+j}\} = \{0^i 1^j 2^j 2^i\}$.
Note: for this type of language, we treat the middle part $1^j 2^j$ as a substring generated by A, and insert A between the outer matched symbols $0^i \ldots 2^i$ generated by the start production.
The equivalent substring A production is given by: A → 1A2 | ɛ
The start production is S → 0S2 | A; when i = 0, S derives only the middle substring A.
So the context free grammar for the language $L = \{0^i 1^j 2^k \mid i + j = k,\ i, j \ge 0\}$ is G = ( V, T, P, S) where,
V = {S, A}, T = {0, 1, 2}, S is the start symbol, and P, the set of productions, is as shown below:
S→ 0S2| A
A→ 1A2| ɛ
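Obtain the context free grammar for the language $L = \{a^n b^m c^k \mid n + 2m = k,\ n, m \ge 0\}$
When n + 2m = k, the given language becomes $L = \{a^n b^m c^{2m} c^n\}$. The middle part $b^m c^{2m}$ is generated by A → bAcc | ɛ, and the outer part $a^n \ldots c^n$ is generated by S → aSc | A; when n = 0, S derives only A.
So the context free grammar for the language $L = \{a^n b^m c^k \mid n + 2m = k,\ n, m \ge 0\}$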

is G = ( V, T, P, S) where,
V = {S, A}, T = {a, b, c}, S is the start symbol, and P, the set of productions, is as shown below:
S→ aSc| A
A→ bAcc| ɛ
Obtain the context free grammar for the language $L = \{w a^n b^n w^R \mid w \in (0, 1)^*,\ n \ge 0\}$
Answer: we can re-write the language L as $w A w^R$, where A generates $a^n b^n$.
The corresponding A production is given by: A → aAb | ɛ; its minimum value is ɛ, when n = 0.
We insert this substring A production between the $ww^R$ productions represented by S.
The corresponding S production is S → 0S0 | 1S1 | A
Note: in the S production the minimum value is A, which occurs when w = ɛ; i.e. only the middle substring A appears.
So the context free grammar for the language $L = \{w a^n b^n w^R \mid w \in (0, 1)^*,\ n \ge 0\}$ is G = ( V, T, P, S) where,
V = {S, A}, T = {a, b, 0, 1}, S is the start symbol, and P, the set of productions, is as shown below:
S → 0S0 | 1S1 | A
A→ aAb | ɛ

Obtain the context free grammar for the language $L = \{a^n w w^R b^n \mid w \in (0, 1)^*,\ n \ge 2\}$
The language can be re-written as $a^n A b^n$, where A generates $ww^R$.
The corresponding A production is A → 0A0 | 1A1 | ɛ
By inserting this substring A production inside the start production S for $a^n b^n$ (n ≥ 2), the S production is: S → aSb | aaAbb; the minimum string, when n = 2, is aaAbb
So the context free grammar for the language $L = \{a^n w w^R b^n \mid w \in (0, 1)^*,\ n \ge 2\}$ is G = ( V, T, P, S) where,
V = {S, A}, T = {a, b, 0, 1}, S is the start symbol, and P, the set of productions, is as shown below:
S → aSb | aaAbb
A→ 0A0| 1A1 |ɛ

Obtain the context free grammar for the language $L = \{a^n b^n c^i \mid n \ge 0,\ i \ge 1\} \cup \{a^n b^n c^m d^m \mid n, m \ge 0\}$
Answer:
Let S1 generate the first language and S2 the second; both share the A production for $a^n b^n$.
S1 production is: S1 → AB
A → aAb | ɛ
B → cB | c
S2 production is: S2 → AC
C → cCd | ɛ
So the context free grammar for the language $L = \{a^n b^n c^i \mid n \ge 0,\ i \ge 1\} \cup \{a^n b^n c^m d^m \mid n, m \ge 0\}$ is G = ( V, T, P, S) where,
V = {S, S1, S2, A, B, C}, T = {a, b, c, d}, S is the start symbol, and P, the set of productions, is as shown below:
S → S1| S2
S1 → AB
A → aAb |ɛ
B → cB | c
S2 → AC
C → cCd | ɛ

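Obtain the context free grammar for the language L1L2, where $L_1 = \{a^n b^n c^i \mid n \ge 0,\ i \ge 1\}$ and $L_2 = \{0^n 1^{2n} \mid n \ge 0\}$
Answer:
S1 generates L1, S2 generates L2, and the concatenation is obtained with S → S1 S2.
S1 production is: S1 → AB
A → aAb | ɛ
B → cB | c
S2 production is: S2 → 0 S2 11 | ɛ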

So the context free grammar for the language L1L2 is G = ( V, T, P, S) where,
V = {S, S1, S2, A, B}, T = {a, b, c, 0, 1}, S is the start symbol, and P, the set of productions, is as shown below:
S → S1 S2
S1 → AB
A → aAb |ɛ
B → cB | c
S2 → 0 S2 11 | ɛ
******* Obtain the context free grammar for the language $L = \{a^{n+2} b^m \mid n \ge 0,\ m > n\}$
The strings that can be generated are:
n = 0, m = 1, 2, 3, ...: aab, aabb, aabbb, ...           i.e. a · ab · b*
n = 1, m = 2, 3, 4, ...: aaabb, aaabbb, aaabbbb, ...     i.e. a · aabb · b*
n = 2, m = 3, 4, 5, ...: aaaabbb, aaaabbbb, ...          i.e. a · aaabbb · b*
In general: a · $a^n b^n$ · b*, with n ≥ 1
We observe that the above language consists of strings of a's and b's that start with one a, followed by $a^n b^n$ (n ≥ 1), which is in turn followed by any number of b's (b*).
Therefore the given language L can be re-written as $L = \{a \, a^n b^n \, b^k \mid n \ge 1,\ k \ge 0\}$, which is generated by aAB, where
A → aAb | ab
B → bB | ɛ; and the S production is S → aAB
So the context free grammar for the language $L = \{a^{n+2} b^m \mid n \ge 0,\ m > n\}$ is G = ( V, T, P, S) where,
V = {S, A, B}, T = {a, b}, S is the start symbol, and P, the set of productions, is as shown below:
S → aAB
A → aAb |ab
B→ bB |ɛ

******* Obtain the context free grammar for the language $L = \{a^n b^m \mid n \ge 0,\ m > n\}$
The strings that can be generated are:
n = 0, m = 1, 2, 3, ...: b, bb, bbb, ...                 i.e. ɛ · b+
n = 1, m = 2, 3, 4, ...: abb, abbb, abbbb, ...           i.e. ab · b+
n = 2, m = 3, 4, 5, ...: aabbb, aabbbb, aabbbbb, ...     i.e. aabb · b+
In general: $a^n b^n$ · b+, with n ≥ 0
We observe that the above language consists of n a's followed by n b's, which are in turn followed by any number of b's with at least one b:
$L = \{a^n b^n b^+ \mid n \ge 0\}$, which is generated by AB, where
A production is: A → aAb | ɛ
B production is: B → bB | b
So the context free grammar for the language $L = \{a^n b^m \mid n \ge 0,\ m > n\}$ is G = ( V, T, P, S) where,
V = {S, A, B}, T = {a, b}, S is the start symbol, and P, the set of productions, is as shown below:
S → AB
A → aAb | ɛ
B→ bB | b

******* Obtain the context free grammar for the language $L = \{a^n b^{n-3} \mid n \ge 3\}$
Answer:
L = { aaa, aaaab, aaaaabb, aaaaaabbb, ... }
So we can re-write the language as:
$L = \{aaa \, a^n b^n \mid n \ge 0\}$
So the context free grammar for the language $L = \{a^n b^{n-3} \mid n \ge 3\}$ is G = ( V, T, P, S) where,
V = {S, A}, T = {a, b}, S is the start symbol, and P, the set of productions, is as shown below:
S → aaaA
A → aAb | ɛ

MODULO – K PROBLEMS: WRITING CFG BY CONSTRUCTING DFA:


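******* Obtain the context free grammar for the language $L = \{w \in a^* \mid |w| \bmod 3 \ne |w| \bmod 2\}$
Answer:
Here mod 3 results in three remainders (0, 1 and 2) and mod 2 results in two remainders (0 and 1).
The possible states (|w| mod 3, |w| mod 2) are: (0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)
The equivalent DFA (figure not reproduced) moves to the next state on every a. It visits the states in the order (0, 0) → (1, 1) → (2, 0) → (0, 1) → (1, 0) → (2, 1) → (0, 0). The final states are those whose two remainders differ: (2, 0), (0, 1), (1, 0) and (2, 1).
Name these six states S, A, B, C, D and E, in that order. Each transition gives a production X → aY, and each final state also gets X → ɛ. The productions are: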
S → aA
A→ aB
B→ aC | ɛ
C→ aD | ɛ
D→ aE| ɛ
E→ aS |ɛ
So the context free grammar for the language $L = \{w \in a^* \mid |w| \bmod 3 \ne |w| \bmod 2\}$ is G = ( V, T, P, S) where,
V = {S, A, B, C, D, E}, T = {a}, S is the start symbol, and P, the set of productions, is as shown below:
S → aA
A→ aB
B→ aC | ɛ
C→ aD | ɛ
D→ aE| ɛ
E→ aS |ɛ
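This grammar is right-linear, so a derivation is just a run of the DFA. Each production X → aY is a transition from X to Y, and X → ɛ marks X as a final state. The Python sketch below simulates the grammar this way and compares it with the condition |w| mod 3 ≠ |w| mod 2. The dictionaries NEXT and FINAL are an illustrative encoding.

```python
# Productions of the grammar, read as DFA transitions on 'a'.
NEXT = {"S": "A", "A": "B", "B": "C", "C": "D", "D": "E", "E": "S"}
FINAL = {"B", "C", "D", "E"}      # the states that have an ɛ-production


def accepts(w):
    """Run the DFA, i.e. derive w with the right-linear grammar."""
    state = "S"
    for sym in w:
        if sym != "a":             # the alphabet of this language is {a}
            return False
        state = NEXT[state]
    return state in FINAL


for n in range(20):
    assert accepts("a" * n) == (n % 3 != n % 2)

print([n for n in range(13) if accepts("a" * n)])  # [2, 3, 4, 5, 8, 9, 10, 11]
```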

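******* Obtain the context free grammar for the language $L = \{w \in (a, b)^* \mid |w| \bmod 3 \ne |w| \bmod 2\}$
DFA: the same six states as in the previous problem, but each transition is taken on either a or b, since only the length of w matters.
The productions are: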
S → aA | bA
A→ aB |bB
B→ aC | bC| ɛ
C→ aD |bD | ɛ
D→ aE| bE |ɛ
E→ aS |bS| ɛ
So the context free grammar for the language $L = \{w \in (a, b)^* \mid |w| \bmod 3 \ne |w| \bmod 2\}$ is G = ( V, T, P, S) where,
V = {S, A, B, C, D, E}, T = {a, b}, S is the start symbol, and P, the set of productions, is as shown below:
S → aA | bA
A→ aB |bB
B→ aC | bC| ɛ
C→ aD |bD | ɛ
D→ aE| bE |ɛ
E→ aS |bS| ɛ
******* Obtain the context free grammar for the language $L = \{w \mid N_a(w) \bmod 2 = 0,\ w \in (a, b)^*\}$
Answer:
Na(w) mod 2 = 0 means that the string contains an even number of a's and any number of b's. The a's need not be adjacent (for example, aba and babab are in L), so b's can appear anywhere between the a's.
Following the DFA approach, we use two states. S means an even number of a's has been read so far; it is both the start state and the final state. A means an odd number of a's has been read so far. Reading b keeps the current state; reading a switches to the other state.
S production is given by: S → aA | bS | ɛ
A production is given by: A → aS | bA
So the context free grammar for the language $L = \{w \mid N_a(w) \bmod 2 = 0,\ w \in (a, b)^*\}$ is G = ( V, T, P, S) where,
V = {S, A}, T = {a, b}, S is the start symbol, and P, the set of productions, is as shown below:
S → aA | bS | ɛ
A → aS | bA
Write a CFG for the language L of balanced parentheses.
OR
$L = \{w \in \{(, )\}^* \mid \text{the parentheses in } w \text{ are balanced}\}$
So the context free grammar is G = ( V, T, P, S) where,
V = {S}, T = {(, )}, S is the start symbol, and P, the set of productions, is as shown below:
S → (S)
S → SS
S→ ɛ
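In a non-empty balanced string, the first '(' is matched by some ')'. If that match is the last symbol, the string has the form (S). Otherwise it is (S) followed by more balanced text, which is S → SS. The Python sketch below uses this observation to build a parse tree and then lists the sentential forms of a leftmost derivation. The tuple tree encoding and the function names are illustrative assumptions.

```python
def parse(w):
    """Parse tree for a balanced string w under S -> (S) | SS | ɛ.

    Trees are tuples: ("eps",) for S -> ɛ, ("wrap", t) for S -> (S),
    and ("pair", t1, t2) for S -> SS."""
    if w == "":
        return ("eps",)
    depth = 0
    for i, ch in enumerate(w):
        depth += 1 if ch == "(" else -1
        if depth < 0:
            raise ValueError("unbalanced parentheses")
        if depth == 0:                 # w[i] matches the first '('
            break
    else:
        raise ValueError("unbalanced parentheses")
    first = ("wrap", parse(w[1:i]))
    if i == len(w) - 1:
        return first                                  # w = (x)
    return ("pair", first, parse(w[i + 1:]))          # w = (x) y


def leftmost_derivation(tree):
    """Sentential forms of the leftmost derivation that builds tree."""
    form, pending, steps = "S", [tree], ["S"]
    while pending:
        node = pending.pop(0)          # subtree for the leftmost S in form
        pos = form.index("S")
        if node[0] == "eps":
            body, children = "", []
        elif node[0] == "wrap":
            body, children = "(S)", [node[1]]
        else:
            body, children = "SS", [node[1], node[2]]
        form = form[:pos] + body + form[pos + 1:]
        pending = children + pending
        steps.append(form)
    return steps


print(" => ".join(leftmost_derivation(parse("()()"))))
# S => SS => (S)S => ()S => ()(S) => ()()
```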
DERIVATION
Define the following terms:
i. Derivation
ii. Left Most Derivation
iii. Right Most Derivation.
iv. Sentential Form
v. Left Sentential Form
Derivation: The process of obtaining a string of terminals and/or non-terminals from the start symbol by applying production rules one or more times is called derivation.
If a string is obtained by applying only one production, it is called a one-step derivation.
Example: Consider the Productions: S →AB, A→ aAb|ɛ, B →bB|ɛ
S => AB
=> aAbB
=> abB
=> abbB
=> abb

Note: The derivation process ends when one of the following happens.
i. The working string no longer contains any non-terminal symbols (including, as a special case, when the working string is ε), i.e. a string of terminals has been generated.
ii. There are non-terminal symbols in the working string, but none of them matches the left-hand side of any rule in the grammar. For example, if the working string were AaBb, this would happen if the only left-hand side were C.
Left Most Derivation (LMD): In a derivation, if the leftmost variable is replaced at every step, the derivation is said to be leftmost.
Example: E → E+E | E*E | a | b
Let us derive the string a+b*a by applying LMD.
E => E*E
=> E+E*E
=> a+E*E
=> a+b*E
=> a+b*a
Right Most Derivation (RMD): In a derivation, if the rightmost variable is replaced at every step, the derivation is said to be rightmost.
Example: E → E+E | E*E | a | b
Let us derive the string a+b*a by applying RMD.
E => E+E
=> E+E*E
=> E+E*a
=> E+b*a
=> a+b*a
Sentential form: For a context free grammar G, any string w in (V ∪ T)* that can be derived from the start symbol S (S =>* w), i.e. any string that appears at some step of a derivation starting from S, is called a sentential form. A sentential form that contains only terminals is called a sentence.
Two special kinds of sentential forms are:
i. Left sentential form
ii. Right sentential form
Example: S => AB
=> aAbB
=> abB
=> abbB
=> abb
Here each string in the set {S, AB, aAbB, abB, abbB, abb} can be obtained from the start symbol S, so each string in the set is called a sentential form.
Left Sentential form: For a context free grammar G, any string w in (V ∪ T)* that appears at some step of a leftmost derivation from the start symbol is called a left sentential form.
Example: E => E*E
=> E+E*E
=> a+E*E
=> a+b*E
=> a+b*a
Left sentential forms = {E, E*E, E+E*E, a+E*E, a+b*E, a+b*a}
Right Sentential form: For a context free grammar G, any string w in (V ∪ T)* that appears at some step of a rightmost derivation from the start symbol is called a right sentential form.
Example: E => E+E
=> E+E*E
=> E+E*a
=> E+b*a
=> a+b*a
Right sentential forms = {E, E+E, E+E*E, E+E*a, E+b*a, a+b*a}
PARSE TREE: ( DERIVATION TREE)
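What is a parse tree?
The derivation process can be shown in the form of a tree. Such trees are called derivation trees or parse trees.
Example: E → E+E | E*E | a | b
The parse tree for the LMD of the string a+b*a is as shown below:
[Figure not reproduced: the root E has children E, *, E; the left E has children E, +, E with leaves a and b; the right E has the leaf a]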

YIELD OF A TREE
What is the yield of a tree?
The yield of a tree is the string of terminal symbols obtained by reading only the leaves of the tree from left to right, ignoring the ɛ symbols.
Example:
For the above parse tree of a+b*a, the yield of the tree is a + b * a


Branching factor
Define the branching factor of a CFG.
The branching factor of a grammar G is the length (the number of symbols) of the longest right-hand side of any rule in G.
The branching factor of any parse tree generated by G is then less than or equal to the branching factor of G.
NOTE:
1. Every leaf node is labelled with a terminal symbol or ɛ.
2. The root node is labelled with the start symbol S.
3. Every interior node is labelled with some element of V.
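The yield and the branching factor are easy to compute once a parse tree is stored as a data structure. In this sketch a node is a (label, children) pair, which is an illustrative encoding. The tree is the one for the LMD of a+b*a. The grammar E → E+E | E*E | a | b has branching factor 3, the length of E+E and E*E.

```python
def node(label, *children):
    """Build a parse-tree node; a leaf is a node without children."""
    return (label, list(children))


def tree_yield(tree):
    """Read the leaf labels from left to right, skipping ɛ leaves."""
    label, children = tree
    if not children:
        return "" if label == "ɛ" else label
    return "".join(tree_yield(child) for child in children)


def tree_branching(tree):
    """Largest number of children of any node in the tree."""
    _, children = tree
    return max([len(children)] + [tree_branching(child) for child in children])


# Parse tree for the LMD  E => E*E => E+E*E => a+E*E => a+b*E => a+b*a
tree = node("E",
            node("E", node("E", node("a")), node("+"), node("E", node("b"))),
            node("*"),
            node("E", node("a")))

bodies = ["E+E", "E*E", "a", "b"]               # right-hand sides of the grammar
grammar_branching = max(len(body) for body in bodies)

print(tree_yield(tree))                          # a+b*a
print(tree_branching(tree), grammar_branching)   # 3 3
```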

Problem 1:
Consider the following grammar G:
S → aAS |a
A→ SbA |SS |ba
Obtain: i) LMD, ii) RMD, iii) parse tree for LMD and iv) parse tree for RMD for the string 'aabbaa'

[Figures not reproduced: parse tree for RMD; parse tree for LMD]
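Finding a leftmost derivation by hand is a search. The sketch below automates it with breadth-first search over left-sentential forms for the grammar of Problem 1. It assumes single-character symbols and no ɛ-productions (both true for this grammar), so sentential forms never shrink and any form longer than the target can be discarded. Forms whose terminal prefix already disagrees with the target are discarded as well. The function name is illustrative.

```python
from collections import deque

GRAMMAR = {"S": ["aAS", "a"], "A": ["SbA", "SS", "ba"]}


def leftmost_derivation(grammar, start, target):
    """Breadth-first search for a leftmost derivation of target.

    Assumes no ɛ-productions, so a sentential form never gets shorter."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == target:
            steps = []
            while form is not None:
                steps.append(form)
                form = parents[form]
            return steps[::-1]
        pos = next((i for i, s in enumerate(form) if s in grammar), None)
        if pos is None:
            continue                   # a terminal string other than target
        if form[:pos] != target[:pos]:
            continue                   # terminal prefix already disagrees
        for body in grammar[form[pos]]:
            new = form[:pos] + body + form[pos + 1:]
            if len(new) <= len(target) and new not in parents:
                parents[new] = form
                queue.append(new)
    return None


print(" => ".join(leftmost_derivation(GRAMMAR, "S", "aabbaa")))
# S => aAS => aSbAS => aabAS => aabbaS => aabbaa
```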

Problem 2:
Design a grammar for valid expressions over the operators – and /. The arguments of expressions are valid identifiers over the symbols a, b, 0 and 1. Derive the LMD and RMD for the string w = (a11 – b0) / (b00 – a01). Write the parse tree for the LMD.
Answer:
Grammar for valid expression:
E → E – E | E / E | (E) |I
I → a | b | Ia |Ib | I0 |I1

Parse tree for LMD: (figure not reproduced)

Problem 3:
Consider the following grammar G:
E → + EE | * EE | - EE | x | y
Find i) the LMD, ii) the RMD and iii) the parse tree for the string '+*-xyxy'
Answer:
E → + EE | * EE | - EE | x | y
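LMD: E => +EE => +*EEE => +*-EEEE => +*-xEEE => +*-xyEE => +*-xyxE => +*-xyxy
RMD: E => +EE => +Ey => +*EEy => +*Exy => +*-EExy => +*-Eyxy => +*-xyxy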

Parse tree for LMD: (figure not reproduced)
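Every production of this grammar starts with a distinct terminal, so a recursive-descent parser can choose the production by looking at one symbol, and it expands non-terminals in exactly the order of the LMD. The sketch below builds the parse tree for '+*-xyxy' and prints it in fully parenthesized infix form, which shows the shape of the tree. The function names and the tuple tree encoding are illustrative.

```python
def parse_prefix(tokens, pos=0):
    """Recursive descent for E -> +EE | *EE | -EE | x | y.

    Returns (tree, next position). A tree is "x", "y" or a tuple
    (operator, left subtree, right subtree)."""
    if pos >= len(tokens):
        raise ValueError("unexpected end of input")
    tok = tokens[pos]
    if tok in ("x", "y"):                          # E -> x | y
        return tok, pos + 1
    if tok in ("+", "*", "-"):                     # E -> op E E
        left, pos = parse_prefix(tokens, pos + 1)
        right, pos = parse_prefix(tokens, pos)
        return (tok, left, right), pos
    raise ValueError(f"unexpected symbol {tok!r} at position {pos}")


def to_infix(tree):
    """Fully parenthesised infix form, which shows the shape of the tree."""
    if isinstance(tree, str):
        return tree
    op, left, right = tree
    return f"({to_infix(left)} {op} {to_infix(right)})"


text = "+*-xyxy"
tree, end = parse_prefix(text)
if end != len(text):
    raise ValueError("extra symbols after the expression")
print(to_infix(tree))  # (((x - y) * x) + y)
```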

Problem 4:
Show the derivation tree for the string 'aabbbb' with the grammar:
S → AB |ɛ
A → aB
B → Sb
Give a verbal description of the language generated by this grammar.
Answer: Derivation tree (figure not reproduced):

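Verbal Description of the language generated by this grammar:
Substituting A → aB and B → Sb gives S => AB => aBB => aSbB => aSbSb, so the grammar behaves exactly like S → aSbSb | ɛ.
From this grammar we can generate every string $a^n b^{2n}$ (n ≥ 0), such as ɛ, abb, aabbbb, aaabbbbbb, ... It also generates strings such as ababbb: take S => aSbSb with the first S => ɛ and the second S => abb.
Therefore every generated string has exactly twice as many b's as a's. The language contains $\{a^n b^{2n} \mid n \ge 0\}$ but is strictly larger than it.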
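The only productions for A and B are A → aB and B → Sb. So every non-empty derivation from S has the shape S => AB => aBB => aSbB => aSbSb. The memoized recognizer below is an illustrative sketch that uses this shape directly: it tries every way to split the middle of the string as S b S. It accepts ababbb, which is not of the form $a^n b^{2n}$.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def derives(w):
    """True if S =>* w for S -> AB | ɛ, A -> aB, B -> Sb (i.e. S -> aSbSb | ɛ)."""
    if w == "":
        return True
    if w[0] != "a" or w[-1] != "b":
        return False
    body = w[1:-1]                      # must split as  S1 + 'b' + S2
    return any(
        body[k] == "b" and derives(body[:k]) and derives(body[k + 1:])
        for k in range(len(body))
    )


print([w for w in ("abb", "aabbbb", "ababbb", "abbabb") if derives(w)])
# ['abb', 'aabbbb', 'ababbb']
```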
Problem 5:
Consider the following grammar:
E → E+E | E-E
E →E*E | E/E
E → (E)
E →a|b|c
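i. Obtain the LMD for the string (a + b * c)
ii. Obtain the RMD for the string (a + b) * c
Answer:
i. LMD: E => (E) => (E+E) => (a+E) => (a+E*E) => (a+b*E) => (a+b*c)
ii. RMD: E => E*E => E*c => (E)*c => (E+E)*c => (E+b)*c => (a+b)*c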
Problem 6:
Consider the following grammar:
S → AbB
A →aA |ɛ
B → aB | bB |ɛ
Give the LMD, RMD and parse tree for the string aaabab

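LMD: S => AbB => aAbB => aaAbB => aaaAbB => aaabB => aaabaB => aaababB => aaabab
RMD: S => AbB => AbaB => AbabB => Abab => aAbab => aaAbab => aaaAbab => aaabab

Parse tree for LMD: (figure not reproduced)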

Obtain the context free grammar for generating integers and derive the integer 1278 by applying LMD.
The context free grammar corresponding to the language containing the set of integers is G = ( V, T, P, I) where, V = { I, S, N, D }, T = { +, -, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }, I is the start symbol, and P, the set of productions, is as shown below:
I → N | SN
S→+|-|ε
N → D | DN | ND
D → 0 | 1 | 2 | 3 | ……….| 9
LMD for the integer 1278:
I => N
=> ND
=> NDD
=> NDDD
=> DDDD
=> 1DDD
=> 12DD
=> 127D
=> 1278
AMBIGUOUS GRAMMAR
Sometimes a context free grammar may produce more than one parse tree for some (or all) of the strings it generates. When this happens, we say that the grammar is ambiguous. More precisely, a grammar G is ambiguous if there is at least one string in L(G) for which G produces more than one parse tree.
***What is an ambiguous grammar?
A context free grammar G is an ambiguous grammar if and only if there exists at least one string w in L(G) for which G produces two or more different parse trees. Equivalently, w has two or more different LMDs (or two or more different RMDs).
Show how ambiguity in grammars is verified with an example.
Ambiguity in a CFG can be tested by the following rules:
i. Obtain the same string w in L(G) by two different LMDs and construct the parse trees. If the two parse trees are different, then the grammar is ambiguous.
ii. Obtain the same string w in L(G) by two different RMDs and construct the parse trees. If the two parse trees are different, then the grammar is ambiguous.
iii. Obtain a string w by an LMD and the same string w by an RMD, and construct the parse tree for both derivations. If the two parse trees are different, then the grammar is ambiguous.
Show that the following grammar is ambiguous:
S → AB | aaB
A → a | Aa
B→b
Let us take the string w = aab.
This string has two different LMDs, S => AB => AaB => aaB => aab and S => aaB => aab, and hence two parse trees, so the grammar is ambiguous.

Show that the following grammar is ambiguous:


S → SbS
S→ a
Answer:
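Take a string such as w = ababa.
This string has two different LMDs, S => SbS => SbSbS => abSbS => ababS => ababa and S => SbS => abS => abSbS => ababS => ababa, and hence two parse trees, so the grammar is ambiguous.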

Show that the following CFG is an ambiguous grammar:


E → E+E
E →E*E
E →a|b|c
Let us consider the string : a + b + c

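LMD1: E => E+E => E+E+E => a+E+E => a+b+E => a+b+c
LMD2: E => E+E => a+E => a+E+E => a+b+E => a+b+c
[Figures not reproduced: parse tree for LMD1; parse tree for LMD2]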

The grammar is ambiguous, because applying LMD twice gives two different parse trees for the same string.
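You can measure ambiguity by counting parse trees. For E → E+E | E*E | a | b | c, any operator in a substring can be the root of that substring's parse tree, so the number of trees satisfies a simple recurrence over substrings. The sketch below computes that count with memoization. It is an illustration that assumes single-character operands, no spaces and no parentheses. Any count above 1 shows ambiguity, and for k operands the counts grow as the Catalan numbers (1, 2, 5, ...).

```python
from functools import lru_cache


def count_trees(expr):
    """Number of parse trees of expr under E -> E+E | E*E | a | b | c.

    expr is written without spaces, e.g. "a+b*c"."""

    @lru_cache(maxsize=None)
    def count(i, j):
        # number of parse trees rooted at E for the substring expr[i:j]
        if j - i == 1:
            return 1 if expr[i] in "abc" else 0      # E -> a | b | c
        total = 0
        for k in range(i + 1, j - 1):                 # expr[k] is the root operator
            if expr[k] in "+*":                       # E -> E+E | E*E
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(expr))


for e in ("a+b", "a+b+c", "a+b*c", "a+b+c+a"):
    print(e, count_trees(e))
# a+b 1
# a+b+c 2
# a+b*c 2
# a+b+c+a 5
```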
Associativity and Precedence of Operators in CFG:
Example:
E → E+E| E-E
E →E*E
E →a|b|c
Associativity:
Let us consider the string : a + b + c
[Figures not reproduced: parse tree for LMD1; parse tree for LMD2]

The two different parse trees exist because the grammar does not enforce an associativity rule. In the string a + b + c, there is an operator on either side of the operand 'b'. With which operator should 'b' be associated? It can be associated with the operator on its left (left associative) or with the operator on its right (right associative), and the grammar allows both. Since + is normally left associative, the first parse tree, where the leftmost '+' is evaluated first, is the correct one.
How to resolve the associativity rules:
E →E+E
E →a|b|c
Here the grammar does not fix the direction in which the parse tree grows: the tree may grow in either the left direction or the right direction.

The first parse tree grows in the left direction, i.e. it is left associative. The second parse tree grows in the right direction, i.e. it is right associative.
Operators are normally left associative, so we have to restrict the growth of the parse tree in the right direction by modifying the above grammar as:
E →E+I|I
I→ a | b | c
The parse tree corresponding to the string a+b+c (figure not reproduced):

The parse tree grows in the left direction since the grammar is left recursive; therefore + is left associative. Only one parse tree exists for the given string, so the grammar is un-ambiguous.
Note: For the operators to be left associative, the grammar should be left recursive; for the operators to be right associative, the grammar should be right recursive.
Left Recursive grammar: A production in which the leftmost symbol of the body is the same as the non-terminal at the head of the production is called a left recursive production.
Example: E → E + T
Right Recursive grammar: A production in which the rightmost symbol of the body is the same as the non-terminal at the head of the production is called a right recursive production.
Example: E → T + E
Precedence of operators in CFG:
Let us consider the string: a + b * c
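LMD 1 for the string a+b*c: E => E+E => a+E => a+E*E => a+b*E => a+b*c
LMD 2 for the string a+b*c: E => E*E => E+E*E => a+E*E => a+b*E => a+b*c
[Figures not reproduced: parse trees for LMD 1 and LMD 2]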

The first parse tree is valid, because the higher precedence operator '*' is evaluated before '+' (in the first tree '*' appears at the lower level, so it is evaluated first). The second parse tree is not valid, since the subexpression containing '+' is evaluated first. We got two parse trees here because operator precedence is not taken care of.
So if we take care of both the associativity and the precedence of operators in the CFG, then the grammar becomes un-ambiguous.
NOTE:

Normal precedence rule: If we have operators such as +, -, *, / and the exponentiation operator, then the highest precedence operator (exponentiation) is evaluated first.
Next, the higher precedence operators * and / are evaluated. Finally, the lowest precedence operators + and – are evaluated.
Normal associativity rule: Operators are left associative, so the grammar should be left recursive.

How to resolve the operator precedence rules:


E → E+E| E *E
E →a|b|c
If we have two operators of different precedence, the higher precedence operator (*) must be evaluated first. This can be done by re-writing the grammar with the lowest precedence operator at the first level and the highest precedence operator at the next level.
E → E+T ; + is left associative, because the non-terminal to the left of + is the same as the non-terminal on the LHS.
At the first level, expressions containing all the '+'s are generated.
If the expression contains no '+', we have to bypass the production E → E + T.
This can be done by:
E → T
Finally the first level grammar is

E →E+T|T
Similarly, at the second level, we have to generate all the '*'s.
T → T * F ; * is left associative.
If the expression does not contain any '*', we have to bypass the production T → T * F
T → F
Finally the second level grammar is
T →T*F|F
Third level:
F →a|b|c
So the resultant un-ambiguous grammar is:
E →E+T|T
T →T*F|F
F →a|b|c
So the operator closest to the start symbol has the least precedence, and the operator farthest from the start symbol has the highest precedence.
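The layered grammar can be turned into a small recursive-descent parser with one function per non-terminal. A left-recursive production such as E → E+T cannot be coded as a function that calls itself first, because that call would recurse forever. The sketch below therefore writes each left-recursive rule as a loop that builds the tree from left to right, which gives the same left-associative grouping. It prints each parse fully parenthesized. The function names are illustrative.

```python
def parse(text):
    """Parse text with E -> E+T | T, T -> T*F | F, F -> (E) | a | b | c.

    Returns the expression fully parenthesised to show its grouping."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"unexpected {tok!r} at position {pos}")
        pos += 1
        return tok

    def expr():                      # E -> E + T | T, written as a loop
        tree = term()
        while peek() == "+":
            take("+")
            right = term()
            tree = f"({tree} + {right})"
        return tree

    def term():                      # T -> T * F | F, written as a loop
        tree = factor()
        while peek() == "*":
            take("*")
            right = factor()
            tree = f"({tree} * {right})"
        return tree

    def factor():                    # F -> (E) | a | b | c
        if peek() == "(":
            take("(")
            tree = expr()
            take(")")
            return tree
        tok = take()
        if tok not in ("a", "b", "c"):
            raise SyntaxError(f"unexpected {tok!r}")
        return tok

    result = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {peek()!r} at position {pos}")
    return result


print(parse("a + b * c"))    # (a + (b * c))
print(parse("a + b + c"))    # ((a + b) + c)
print(parse("(a + b) * c"))  # ((a + b) * c)
```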
Un-Ambiguous Grammar:
For a grammar to be un-ambiguous, we have to resolve two properties:
i. Associativity of operators: resolved by writing the grammar as left recursive or right recursive, as required.
ii. Precedence of operators: resolved by writing the grammar in different levels.
Is the following grammar ambiguous?
If the grammar is ambiguous, obtain the un-ambiguous grammar assuming normal precedence and associativity.
E →E+E
E →E*E
E →E/E
E →E-E
E → (E ) | a | b| c
Answer:
Let us consider the string: a + b * c

LMD 1 for the string a+b*c:
E ⇒ E+E ⇒ a+E ⇒ a+E*E ⇒ a+b*E ⇒ a+b*c
LMD 2 for the string a+b*c:
E ⇒ E*E ⇒ E+E*E ⇒ a+E*E ⇒ a+b*E ⇒ a+b*c

The string has two different leftmost derivations, and hence two different parse trees, so the above
grammar is ambiguous.
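As an illustration (not part of the original notes), the Python sketch below counts how many parse trees the ambiguous grammar E → E+E | E*E | a | b | c assigns to a string; a count above 1 is exactly what makes the grammar ambiguous. It assumes one-character operands and no blanks, and count_parse_trees is a hypothetical helper name. Because any operator may be used at the root, the count does not depend on which operators appear, which is why this grammar cannot enforce precedence.

```python
from functools import lru_cache


def count_parse_trees(expr: str) -> int:
    """Count the parse trees of `expr` under E -> E+E | E*E | a | b | c.

    Assumes one-character operands and no blanks, e.g. "a+b*c":
    operands are at even positions and operators at odd positions.
    """
    operands, operators = expr[0::2], expr[1::2]
    if (len(expr) % 2 == 0
            or any(ch not in "abc" for ch in operands)
            or any(op not in "+*" for op in operators)):
        raise ValueError("expected a string such as 'a+b*c'")

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        # number of parse trees for an E that derives operands i..j
        if i == j:
            return 1  # E -> a | b | c
        # the root uses E -> E op E, where op can be any operator k in between
        return sum(count(i, k) * count(k + 1, j) for k in range(i, j))

    return count(0, len(operands) - 1)


print(count_parse_trees("a+b*c"))    # 2: two parse trees, so the grammar is ambiguous
print(count_parse_trees("a+b*c+a"))  # 5
```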
The equivalent unambiguous grammar is obtained by making all the operators left associative
and placing +, - at the first level and *, / at the next level.
Equivalent un-ambiguous grammar:
E →E+T|E–T|T
T →T*F|T/F|F
F → ( E) | a | b | c
Is the following grammar ambiguous?
If the grammar is ambiguous, obtain the unambiguous grammar assuming the operators + and - are
left associative and * and / are right associative, with normal precedence.
E →E+E
E →E*E
E →E/E
E →E-E
E → (E ) | a | b| c
The grammar is ambiguous; see the previous answer (the string a + b * c has two leftmost derivations).
Equivalent un-ambiguous grammar:
E →E+T|E–T|T
T →F*T|F/T|F
F → ( E) | a | b | c

Is the following grammar ambiguous?


If the grammar is ambiguous, obtain the unambiguous grammar assuming that + and / are
left associative, * and - are right associative, + and / have the highest precedence and * and -
have the lowest precedence.
E →E+E
E →E*E
E →E/E
E →E-E
E → (E ) | a | b| c
Equivalent un-ambiguous grammar:
E →T-E|T*E|T
T →T+F|T/F|F
F → ( E) | a | b | c

Consider the grammar:


S → aS | aSbS | ɛ
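Is the above grammar ambiguous? Show in particular that the string 'aab' has two:
i. Parse trees
ii. Leftmost derivations
iii. Rightmost derivations.
OR
Define ambiguous grammar. Prove that the following grammar is ambiguous.
S → aS | aSbS | ɛ
A grammar is ambiguous if some string in its language has more than one parse tree, or equivalently
more than one leftmost (or rightmost) derivation.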

LMD 1 for the string 'aab': S ⇒ aS ⇒ aaSbS ⇒ aabS ⇒ aab
LMD 2 for the string 'aab': S ⇒ aSbS ⇒ aaSbS ⇒ aabS ⇒ aab

RMD 1 for the string 'aab': S ⇒ aS ⇒ aaSbS ⇒ aaSb ⇒ aab
RMD 2 for the string 'aab': S ⇒ aSbS ⇒ aSb ⇒ aaSb ⇒ aab

Two Parse trees for LMD:

The above grammar is ambiguous, since the same string 'aab' has two different leftmost derivations,
and hence two parse trees.

Consider the grammar:


S → S +S | S * S | (S) | a
Show that string a + a * a has two
i. Parse trees
ii. Left Most Derivations
Find an unambiguous grammar G' equivalent to G and show that L (G) = L (G') and G' is
unambiguous.
Two different parse trees for the string a+a*a :

Two LMDs:
S ⇒ S+S ⇒ a+S ⇒ a+S*S ⇒ a+a*S ⇒ a+a*a
S ⇒ S*S ⇒ S+S*S ⇒ a+S*S ⇒ a+a*S ⇒ a+a*a

Unambiguous grammar corresponding to the above grammar (G):
G' = ( V', T', P', S ) where V' = { S, T, F }, T' = { +, *, (, ), a }, S is the start symbol and P' is the set
of productions:
S → S + T | T
T → T * F | F
F → ( S ) | a
Consider the string a + a * a (generate this string with G' using an LMD):
S ⇒ S+T ⇒ T+T ⇒ F+T ⇒ a+T ⇒ a+T*F ⇒ a+F*F ⇒ a+a*F ⇒ a+a*a
So the same string is generated by G', and in G' it has only one leftmost derivation (hence one parse tree).

NOTE: Suppose an expression contains the exponentiation operator ^, for example 2 ^ 3 ^ 2.
It is evaluated as 2 ^ (3 ^ 2): 3 ^ 2 is evaluated first, giving 9, and then 2 ^ 9 = 512 is evaluated.
That means the evaluation starts from the right side; therefore ^ is right associative.

For any expression containing the operators +, -, *, / and ^:
^       highest precedence (farthest away from the start symbol)
*, /    next highest precedence (next level)
+, -    least precedence (closest to the start symbol)

Show that the following grammar is ambiguous. Also find the unambiguous grammar equivalent to
it, using the normal precedence and associativity rules.
E → E + E | E - E
E → E * E | E / E
E → E ^ E
E → ( E ) | a | b

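Answer:
The grammar is ambiguous; for example, a + b * a has two leftmost derivations, exactly as shown for a + b * c earlier.
Equivalent unambiguous grammar:
E → E + T | E - T | T
T → T * F | T / F | F
F → G ^ F | G
G → ( E ) | a | b
Here ^ is right associative, because the production F → G ^ F is right recursive.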
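The levels of this grammar map directly onto one function per non-terminal. The Python sketch below is an illustrative evaluator, assuming integer operands in place of a and b (tokenize, Evaluator and evaluate are hypothetical names). The left-recursive levels E and T are run as loops, which groups operators from the left; F → G ^ F is written as right recursion, which makes ^ right associative.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    # integers, the five operators and parentheses; other characters (e.g. blanks) are skipped
    return re.findall(r"\d+|[-+*/^()]", text)


class Evaluator:
    """Evaluates expressions following the levels of the grammar
        E -> E + T | E - T | T
        T -> T * F | T / F | F
        F -> G ^ F | G
        G -> ( E ) | number     (numbers stand in for the operands a, b)
    """

    def __init__(self, text: str) -> None:
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self) -> str:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else "$"

    def take(self) -> str:
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self) -> float:
        # E -> E + T | E - T | T, run as a loop: groups from the left
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self) -> float:
        # T -> T * F | T / F | F, also left associative
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self) -> float:
        # F -> G ^ F | G: the right recursion makes ^ right associative
        base = self.primary()
        if self.peek() == "^":
            self.take()
            return base ** self.factor()
        return base

    def primary(self) -> float:
        # G -> ( E ) | number
        tok = self.take()
        if tok == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok.isdigit():
            return float(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(text: str) -> float:
    ev = Evaluator(text)
    value = ev.expr()
    if ev.peek() != "$":
        raise SyntaxError(f"unexpected token {ev.peek()!r}")
    return value


print(evaluate("2 + 3 * 4"))  # 14.0  (* binds tighter than +)
print(evaluate("8 - 3 - 2"))  # 3.0   ((8 - 3) - 2, left associative)
print(evaluate("2 ^ 3 ^ 2"))  # 512.0 (2 ^ (3 ^ 2), right associative)
```

Writing E → E + T literally as a procedure whose first action is to call itself would never terminate, which is why the loops are used here; this is the same problem addressed under left recursion.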


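Show that the following grammar is ambiguous:
S → SS
S → ( S ) | ɛ over the string w = (()()())

LMD 1: S ⇒ (S) ⇒ (SS) ⇒ (SSS) ⇒ ((S)SS) ⇒ (()SS) ⇒ (()(S)S) ⇒ (()()S) ⇒ (()()(S)) ⇒ (()()())
LMD 2: S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ (()S) ⇒ (()SS) ⇒ (()(S)S) ⇒ (()()S) ⇒ (()()(S)) ⇒ (()()())
The given string has two different leftmost derivations (two parse trees), so the grammar is ambiguous.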

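Show that the following grammar is ambiguous using the string “ibtibtaea”
S → iCtS | iCtSeS | a
C → b
Answer:
String w = ibtibtaea
The given string has two different leftmost derivations, and hence two parse trees:
LMD 1: S ⇒ iCtS ⇒ ibtS ⇒ ibtiCtSeS ⇒ ibtibtSeS ⇒ ibtibtaeS ⇒ ibtibtaea
LMD 2: S ⇒ iCtSeS ⇒ ibtSeS ⇒ ibtiCtSeS ⇒ ibtibtSeS ⇒ ibtibtaeS ⇒ ibtibtaea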

For the grammar
S → SS + | SS * | a, give the leftmost and rightmost derivations for the string aa+a*
LMD: S ⇒ SS* ⇒ SS+S* ⇒ aS+S* ⇒ aa+S* ⇒ aa+a*
RMD: S ⇒ SS* ⇒ Sa* ⇒ SS+a* ⇒ Sa+a* ⇒ aa+a*

Dangling-else grammar:

stmt → if expr then stmt | if expr then stmt else stmt | other
Terminals are the keywords if, then and else.
Non-terminals are expr and stmt.
Here "other" stands for any other statement. According to this grammar, the compound
conditional statement
if E1 then S1 else if E2 then S2 else S3
has the parse tree shown below:

Example: Consider the string if E1 then if E2 then S1 else S2


It has two parse trees, so the above grammar is ambiguous.

In all programming languages with conditional statements of this form, the parse tree in which
else S2 belongs to the inner if E2 then S1 is preferred. The general rule is: match each else with the
closest unmatched then.
Unambiguous grammar for these if-else statements:
stmt → matched_stmt | open_stmt
matched_stmt → if expr then matched_stmt else matched_stmt | other
open_stmt → if expr then stmt | if expr then matched_stmt else open_stmt

Show that the following grammar is ambiguous:


S → iEtS | iEtSeS | a
E → b
Also write an equivalent unambiguous grammar for the same.
Consider the sentential form iEtiEtSeS, which has two parse trees.

Unambiguous grammar :

S → M |U
M → iEtMeM |a
U → iEtS | iEtMeU
E → b

LEFT RECURSION
A production in which the leftmost symbol of the body is same as the non-terminal at the head of
the production is called a left recursive production.
Example: E → E + T
Immediate Left recursive production:
A production of the form A → Aα is called an immediate left recursive production. Consider a
non-terminal A with two productions
A → Aα | β
where α and β are sequences of terminals and non-terminals, and β does not start with A.
Repeated application of this production results in a sequence of α's to the right of A. When A is
finally replaced by β, we have β followed by a sequence of zero or more α's.
Therefore a non-left recursive production for A → Aα | β is given by
A → βA’
A’ → αA’ | ε
Note: In general, we can eliminate any immediate left recursion of the form
A → Aα1 | Aα2 | Aα3 | ... | Aαm | β1 | β2 | β3 | ... | βn
where no βi begins with A, by replacing the A-productions with
A → β1A’ | β2A’ | β3A’ | ... | βnA’
A’ → α1A’ | α2A’ | α3A’ | ... | αmA’ | ε
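The transformation above can be sketched in Python as follows. This is an illustration under stated assumptions: a production body is a list of symbols, [] stands for ε, the new non-terminal is named by appending a prime, and the function names are hypothetical.

```python
from typing import Dict, List

# A production body is a list of symbols; the empty list [] stands for ε.
Alternatives = List[List[str]]


def remove_immediate_left_recursion(head: str, alts: Alternatives) -> Dict[str, Alternatives]:
    """Rewrite A -> A α1 | ... | A αm | β1 | ... | βn  as
         A  -> β1 A' | ... | βn A'
         A' -> α1 A' | ... | αm A' | ε
    The new name A' (head + "'") is assumed to be unused in the grammar."""
    alphas = [alt[1:] for alt in alts if alt and alt[0] == head]
    betas = [alt for alt in alts if not alt or alt[0] != head]
    if not alphas:  # no immediate left recursion: nothing to do
        return {head: alts}
    new = head + "'"
    return {
        head: [beta + [new] for beta in betas],
        new: [alpha + [new] for alpha in alphas] + [[]],  # [] is A' -> ε
    }


def show(rules: Dict[str, Alternatives]) -> None:
    for head, alts in rules.items():
        print(head, "→", " | ".join(" ".join(alt) or "ε" for alt in alts))


show(remove_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
# E → T E'
# E' → + T E' | ε
```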
What is left recursion?
A grammar is left recursive if it has a non-terminal A such that there is a derivation A ⇒+ Aα for
some string α.
Top-down parsing methods cannot handle left-recursive grammars, so a transformation is needed
to eliminate left recursion.
Left recursion that arises only after two or more derivation steps (for example S → Aa, A → Sd,
which gives S ⇒ Aa ⇒ Sda) can be eliminated using the following algorithm.
Algorithm to eliminate left recursion from a grammar having no cycles and no ε-productions:
Write an algorithm to eliminate left recursion from a grammar.
1. Arrange the non-terminals in some order A1, A2, . . . , An
2. for ( each i from 1 to n )
   {
3.     for ( each j from 1 to i - 1 )
       {
4.         replace each production of the form Ai → Aj γ by the
           productions Ai → δ1 γ | δ2 γ | ... | δk γ, where Aj → δ1 | δ2 | ... | δk are all the current
           Aj-productions.
5.     }
6.     eliminate the immediate left recursion among the Ai-productions.
7. }
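A minimal Python sketch of steps 1 to 7, assuming the same representation as before (a body is a list of symbols, [] for ε, primes appended for new non-terminals) and a grammar with no cycles or ε-productions. The function names are hypothetical; the demo uses the grammar S → Aa | b, A → Ac | Sd | a that is worked by hand below.

```python
from typing import Dict, List

# Non-terminal -> list of bodies; a body is a list of symbols ([] would be ε).
Grammar = Dict[str, List[List[str]]]


def remove_immediate(head: str, alts: List[List[str]]) -> Grammar:
    """A -> Aα | β  becomes  A -> βA',  A' -> αA' | ε  (A' assumed unused)."""
    alphas = [alt[1:] for alt in alts if alt and alt[0] == head]
    betas = [alt for alt in alts if not alt or alt[0] != head]
    if not alphas:
        return {head: alts}
    new = head + "'"
    return {head: [b + [new] for b in betas],
            new: [a + [new] for a in alphas] + [[]]}


def remove_left_recursion(grammar: Grammar, order: List[str]) -> Grammar:
    """Steps 1-7; `order` is the chosen arrangement A1, ..., An.
    Assumes the grammar has no cycles and no ε-productions."""
    g = {head: [list(alt) for alt in alts] for head, alts in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            # step 4: replace Ai -> Aj γ by Ai -> δ1 γ | ... | δk γ
            new_alts = []
            for alt in g[ai]:
                if alt and alt[0] == aj:
                    new_alts.extend(delta + alt[1:] for delta in g[aj])
                else:
                    new_alts.append(alt)
            g[ai] = new_alts
        # step 6: eliminate the immediate left recursion among the Ai-productions
        g.update(remove_immediate(ai, g[ai]))
    return g


G = {"S": [["A", "a"], ["b"]],
     "A": [["A", "c"], ["S", "d"], ["a"]]}
for head, alts in remove_left_recursion(G, ["S", "A"]).items():
    print(head, "→", " | ".join(" ".join(alt) or "ε" for alt in alts))
# S → A a | b
# A → b d A' | a A'
# A' → c A' | a d A' | ε
```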
Eliminate left recursion from the following grammar:
E → E+T|T
T → T*F|F
F → ( E ) | id
The above grammar contains two immediate left recursive productions E → E + T | T and
T → T*F|F
The equivalent non-left recursive grammar is given by:
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ |ε
F → (E) | id

Eliminate left recursion from the following grammar:


E → E+T|T
T → id | id [ ] | id [ X ]
X → E,E|E
The given grammar has one immediate left-recursive production, E → E + T | T.
The equivalent non-left recursive grammar is given by:
E → TE’
E’ → +TE’ | ε
T → id | id [ ] | id [ X ]
X → E,E|E

Eliminate left recursion from the following grammar:

S → Aa | b
A → Ac| Sd | a
By applying elimination algorithm,
Arrange the non-terminals as A1 = S and A2 = A
Since there is no immediate left recursion among the S-productions, nothing happens during the
outer loop for i = 1.
For i = 2, we substitute for S in A → Sd to obtain the following A-productions.
A → Ac| Aad | bd | a
Eliminating the immediate left recursion among these A- productions yields the following
grammar
S → Aa | b

A → bdA’| aA’
A’ → cA’| adA’ | ε

Eliminate left recursion from the following grammar:


A → BC | a
B → CA| Ab
C → AB| CC | a
Arrange the non-terminals as A1 = A , A2 = B and A3 = C
Since there is no immediate left recursion among the A-productions, nothing happens during the outer
loop for i = 1, i.e., A → BC | a is unchanged.
For i = 2, we substitute for A in B → Ab to obtain the following B-productions.
B → CA| BCb | ab
Eliminating the immediate left recursion among these B- productions results in a new B
productions as B → CAB’| abB’
B’ → CbB’ |ε
For i =3, we substitute for A in C → AB to obtain the following C productions
C → BCB| CC | aB |a
Again substitute for B in C → BCB production to obtain the C productions as

C → CAB’CB | abB’CB | CC | aB | a
Eliminating the immediate left recursion among these C-productions results in new C
productions as
C → abB’CBC’ | aBC’ | aC’
C’ → AB’CBC’ | CC’ | ε
The equivalent non-left-recursive grammar is given by:
A → BC | a
B → CAB’ | abB’
B’ → CbB’ | ε
C → abB’CBC’ | aBC’ | aC’
C’ → AB’CBC’ | CC’ | ε
Eliminate left recursion from the following grammar.
Lp → no | Op Ls
Op → + | - | *
Ls → Ls Lp | Lp
For i = 1 and 2, nothing happens to the Lp- and Op-productions.
For i = 3,
by removing the immediate left recursion,
Ls → Lp Ls’
Ls’ → Lp Ls’ | ε
The equivalent non-left-recursive grammar is given by:
Lp → no | Op Ls
Op → + | - | *
Ls → Lp Ls’
Ls’ → Lp Ls’ | ε

Eliminate left recursion from the following grammar:

S → aB | aC | Sd | Se
B → bBc| f
C → g
For i = 1, eliminating the immediate left recursion gives the new S-productions
S → aBS’ | aCS’
S’ → dS’ | eS’ | ε
For i = 2, nothing happens to the B-productions, B → bBc | f
For i = 3, nothing happens to the C-productions, C → g
The equivalent non-left-recursive grammar is given by:

S → aBS’ | aCS’
S’ → dS’ | eS’ | ε
B → bBc | f
C → g
LEFT FACTORING (Non-deterministic to Deterministic CFG conversion)
It is a grammar transformation used to make a grammar suitable for predictive (top-down) parsing.
When the choice between two alternative A-productions is not clear, we can rewrite the productions
to defer the decision until enough input has been seen to make the right choice.
A → αβ1 | αβ2 | ... | αβn | γ
By left factoring this grammar, we get
A → αA’ | γ
A’ → β1 | β2 | ... | βn
where γ represents the other alternatives that do not begin with α.
A predictive parser (a top-down parser without backtracking) requires the grammar to be
left factored.
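One left-factoring step can be sketched in Python as below (an illustration; common_prefix and left_factor_once are hypothetical helper names, a body is a list of symbols and [] stands for ε). In practice the step is repeated until no two alternatives of a non-terminal share a prefix. The demo uses S → iEtSeS | iEtS | a.

```python
from typing import Dict, List

Grammar = Dict[str, List[List[str]]]  # a body is a list of symbols; [] is ε


def common_prefix(x: List[str], y: List[str]) -> List[str]:
    prefix: List[str] = []
    for a, b in zip(x, y):
        if a != b:
            break
        prefix.append(a)
    return prefix


def left_factor_once(head: str, alts: List[List[str]]) -> Grammar:
    """One left-factoring step:
         A -> αβ1 | ... | αβn | γ   becomes   A -> αA' | γ,  A' -> β1 | ... | βn
    α is the longest prefix shared by all alternatives that start with the
    same symbol as some other alternative.  A' (head + "'") is assumed unused."""
    for first in alts:
        if not first:
            continue
        group = [alt for alt in alts if alt and alt[0] == first[0]]
        if len(group) < 2:
            continue
        alpha = group[0]
        for alt in group[1:]:
            alpha = common_prefix(alpha, alt)
        new = head + "'"
        others = [alt for alt in alts if alt not in group]  # the γ alternatives
        return {head: [alpha + [new]] + others,
                new: [alt[len(alpha):] for alt in group]}  # [] means β = ε
    return {head: alts}  # no common prefix: already left factored


S = [["i", "E", "t", "S", "e", "S"], ["i", "E", "t", "S"], ["a"]]
for h, bodies in left_factor_once("S", S).items():
    print(h, "→", " | ".join(" ".join(b) or "ε" for b in bodies))
# S → i E t S S' | a
# S' → e S | ε
```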
What is left factoring?
Left factoring is the removal of a common left factor (prefix) that appears in two or more productions
of the same non-terminal.
Example: S → iEtSeS | iEtS | a
E → b

By left factoring the above grammar, we get

S → iEtSS’ | a
S’ → eS | ε
E → b
Perform left factoring for the grammar.
E → E+T|T
T → id | id [ ] | id [ X ]
X → E,E|E
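First, eliminating left recursion gives the equivalent grammar:
E → TE’
E’ → +TE’ | ε
T → id | id [ ] | id [ X ]
X → E , E | E
After left factoring the grammar, we get
E → TE’
E’ → +TE’ | ε
T → id T’
T’ → ε | [ ] | [ X ]
X → E X’
X’ → , E | ε
T’ still has two alternatives beginning with [, so factoring once more gives T’ → [ T’’ | ε and T’’ → ] | X ].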

TOP DOWN PARSER

• Top-down parsing can be viewed as the problem of constructing a parse tree for the input
string, starting from the root (top) and working down towards the leaves (bottom).
• Equivalently, top-down parsing can be viewed as finding a leftmost derivation for an input
string.
• At each step of a top-down parse, the key problem is that of determining the production to
be applied for a non-terminal, say A.
• Once an A-production is chosen, the rest of the parsing process consists of "matching" the
terminal symbols in the production body with the input string.
RECURSIVE-DESCENT PARSING
• Backtracking may be needed (if the chosen production rule does not work, we backtrack and try
other alternatives).
• It is a general parsing technique, but it is not widely used.
• It is not an efficient parsing method.
A left-recursive grammar can cause a recursive-descent parser to go into an infinite loop, so we
have to eliminate left recursion from the grammar first.
Recursive-Descent Parsing Algorithm:
Explain Recursive-Descent Parsing Algorithm.
void A ( )
{
1.    Choose an A-production, A → X1 X2 X3 ... Xk ;
2.    for ( i = 1 to k )
      {
3.        if ( Xi is a non-terminal )
4.            call procedure Xi ( ) ;
5.        else if ( Xi equals the current input symbol a )
6.            advance the input to the next symbol ;
7.        else /* an error has occurred */ ;
      }
}
A recursive-descent parsing program consists of a set of procedures, one for each non-terminal.
Execution begins with the procedure for the start symbol, which halts and announces success if its
procedure body scans the entire input string.
General recursive descent may require backtracking; that is, it may require repeated scans over the
input. However, backtracking is rarely needed to parse programming-language constructs, so
backtracking parsers are not seen frequently.
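The backtracking behaviour can be sketched in Python with generators. This is an illustration rather than the textbook procedure: instead of one procedure per non-terminal, a single routine interprets a grammar table (the dict encoding and the names parse_symbol, parse_sequence and accepts are assumptions). Each call yields every input position at which a match can end, so a failure later in a production body makes the caller resume and try the next alternative.

```python
from typing import Dict, Iterator, List

# Hypothetical encoding: non-terminal -> ordered list of alternatives, each a
# list of symbols.  Any symbol that is not a key is treated as a terminal.
Grammar = Dict[str, List[List[str]]]


def parse_symbol(g: Grammar, sym: str, tokens: List[str], pos: int) -> Iterator[int]:
    """Yield each input position at which a match of `sym` starting at `pos` can end.

    Alternatives are tried in order and results are produced lazily, so when a
    later symbol fails the caller resumes this generator, which then tries the
    next alternative: this is the backtracking step (the input pointer is just
    the `pos` argument, so it is reset automatically)."""
    if sym not in g:  # terminal: compare with the current input symbol
        if pos < len(tokens) and tokens[pos] == sym:
            yield pos + 1  # advance the input pointer
        return
    for alternative in g[sym]:  # A -> X1 X2 ... Xk
        yield from parse_sequence(g, alternative, tokens, pos)


def parse_sequence(g: Grammar, symbols: List[str], tokens: List[str], pos: int) -> Iterator[int]:
    if not symbols:  # X1 ... Xk have all been matched
        yield pos
        return
    for mid in parse_symbol(g, symbols[0], tokens, pos):
        yield from parse_sequence(g, symbols[1:], tokens, mid)


def accepts(g: Grammar, start: str, text: str) -> bool:
    tokens = list(text)  # one character per input symbol
    # success only if some parse of the start symbol reaches the end of the input ($)
    return any(end == len(tokens) for end in parse_symbol(g, start, tokens, 0))


G1 = {"S": [["c", "A", "d"]], "A": [["a", "b"], ["a"]]}
G2 = {"S": [["a", "B", "c"]], "B": [["b", "c"], ["b"]]}
print(accepts(G1, "S", "cad"))  # True: A -> ab fails at d, then A -> a succeeds
print(accepts(G2, "S", "abc"))  # True: B -> bc leaves no c for S, then B -> b succeeds
print(accepts(G1, "S", "cbd"))  # False
```

Like any recursive-descent parser, this sketch loops forever on a left-recursive grammar such as E → E + T, so left recursion must be eliminated first.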
What are the key problems with top-down parsers? Write a recursive-descent parser for the
grammar:
S → cAd
A → ab | a
Key problems:
1. Ambiguous grammar
2. Left-recursive grammar
3. Common prefixes that require left factoring
4. Backtracking
At each step of a top-down parse, the key problem is that of determining the production to be
applied for a non-terminal.
CONSTRUCTION OF RECURSIVE DESCENT PARSER:
Consider the input string w = cad. Begin with a tree consisting of a single node labeled S, with the
input pointer pointing to c, the first symbol of w. S has only one production, so we use it to expand S
and obtain the tree shown below:

The leftmost leaf labeled c, matches the first symbol of input w, so we advance the input pointer to
a, the second symbol of w, and consider the next leftmost leaf labeled A.
Expand A using the first alternative A → a b to obtain the following tree:

Now we have a match for the second input symbol a, with the leftmost leaf labeled a, so we
advance the input pointer to d, third input symbol of w.

Now compare the current input symbol d against the next leaf, labeled b. Since b does not match d,
we report failure and go back to A (backtracking) to see whether there is another alternative for
A that has not been tried but that might produce a match.

Failure and Backtrack


In going back to A, we must reset the input pointer to position 2, the position it had when we first
came to A. (procedure must store the pointer in a local variable).
Now the second alternative for A produces the tree as,

Now the leftmost leaf labeled a matches the current input symbol a (the second symbol of w), so we
advance the pointer to the next input symbol d.

Now the next leaf, d, matches the third input symbol d. When the pointer then reaches $, no leaf
remains to be matched in the tree. Since a parse tree has been produced for the string w, the parser
halts and announces successful completion of parsing.

return success
Write a recursive descent parser for the grammar:
S → aBc
B → bc | b
Input: abc
Begin with a tree consisting of a single node labeled S with input pointer pointing to first input
symbol a.

Since the input a matches with leftmost leaf labeled a, advance the pointer to next input symbol
b.
Expand B using the alternative B → bc

We have a match for the second input symbol b. Moving the pointer, we also find a match for the
third symbol c. Now the pointer is at $, indicating the end of the string, but the tree still has one
more leaf, c, to be matched, so the parse fails.

Failure and Backtrack


Now the parser goes back to B and resets the input pointer to position 2.

With the pointer at position 2, it tries the second alternative B → b and generates the tree:

Now the leaf b matches the 2nd input symbol and the pointer advances; the leaf c matches the 3rd
input symbol. When the pointer then reaches '$', no leaf remains to be matched in the tree. Thus the
parser halts and announces successful completion of parsing.

return success
Show that recursive descent parsing fails for the input string „acdb‟ for the grammar.
S → aAb
A → cd | c

Input string: acdb


Begin with a tree consisting of a single node labeled S with input pointer pointing to first input
symbol a .

The first input symbol a matches the leftmost leaf a, so we advance the pointer to the next input
symbol c.
Now expand A using the second alternative A → c

We have a match for the second input symbol c with the leaf c. Advance the pointer to the next
input symbol d.

Now compare the input symbol d against the next leaf, labeled b. Since b does not match d, we
report failure, go back to A to try another alternative, and reset the pointer to position 2.

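Failure and Backtrack

(The failure shown is that of the first choice, A → c. If the parser then tries the remaining alternative
A → cd, it matches c and d, and the last leaf b matches b, so acdb is accepted after backtracking.)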

PREDICTIVE LL(1) PARSER ( NON RECURSIVE PREDICTIVE PARSER)


Working of Predictive top down parser (LL(1) parser)
The backtracking problem of the recursive-descent parser is solved in predictive parsing. This top-
down parsing algorithm is non-recursive. A predictive parser can predict (detect) which alternative
is the right choice for the expansion of a non-terminal during the parsing of the input string w. To
parse with a predictive top-down parser, we perform the following operations:
1. Write a grammar for the program statements.
2. Eliminate left recursion.
3. Left factor the resulting grammar.
4. Apply the grammar to the predictive top-down parser.
Explain with a neat diagram the model of a table-driven predictive parser.
An LL(1) parser is a table-driven parser that uses a parsing table built for the LL(1) grammar. The first
L stands for scanning the input from left to right, the second L means it produces a leftmost
derivation of the input string, and the number 1 means it uses only one input symbol of lookahead
at each step to predict the parsing action.
Model of LL(1) parser:

The data structures used by LL(1) parsers are:


1. Input buffer: stores the input tokens, followed by the end marker $.
2. Stack: holds the part of the left-sentential form that remains to be matched, with $ at the bottom;
the symbols in the RHS of a production are pushed onto the stack in reverse order, i.e., from right to left.
3. Parsing table: a two-dimensional array with a row for each non-terminal and a column for each
terminal (including $).
The construction of a predictive LL(1) parser is based on two very important functions: FIRST( )
and FOLLOW( ). These two functions allow us to choose which production to apply, based on the
next input symbol.
Steps involved in construction of LL(1) predictive parser:
i. Computation of FIRST ( ) and FOLLOW ( ) function.
ii. Construct the predictive parsing table using FIRST and FOLLOW functions
iii. Parse the input string with the help of predictive parsing table
FIRST() FUNCTION:
Define FIRST ( )
FIRST (α) is the set of terminal symbols that begin the strings derived from α.
If α ⇒* ε, then ε is also in FIRST (α).
Give the rules for constructing the FIRST and FOLLOW sets.
Rules used in computation of FIRST function:
1. If a is a terminal symbol, then FIRST (a) = { a }.
2. If there is a production rule X → ε, then add ε to FIRST (X).
3. If there is a production rule X → Y1 Y2 Y3 ... Yk, then add a to FIRST (X) if, for some i, a is in
FIRST (Yi) and ε is in all of FIRST (Y1), ..., FIRST (Yi-1).
Add ε to FIRST (X) if ε is in all of FIRST (Y1), FIRST (Y2), ..., FIRST (Yk).
In other words, FIRST (X) contains FIRST (Y1) - {ε}; if Y1 does not derive ε nothing more is
added, but if Y1 ⇒* ε we also add FIRST (Y2) - {ε}, and so on. For example, if Y1, ..., Yk-1 all derive ε:
FIRST (X) = (FIRST (Y1) - {ε}) ∪ (FIRST (Y2) - {ε}) ∪ ... ∪ (FIRST (Yk-1) - {ε}) ∪ FIRST (Yk)
4. If there are production rules X → Y1 | Y2 | Y3 | ... | Yk, then
FIRST (X) = FIRST (Y1) ∪ FIRST (Y2) ∪ ... ∪ FIRST (Yk)

FOLLOW ( ) FUNCTION:
FOLLOW (A) is defined as the set of terminal symbols that can appear immediately to the right of A
in some sentential form, i.e.,
FOLLOW (A) = { a | S ⇒* αAaβ }, where α and β are strings of grammar symbols (terminals or
non-terminals).
Rules used in computation of FOLLOW function:
1. For the start symbol S, place '$' in FOLLOW (S).
2. If there is a production A → αBβ, then everything in FIRST (β) except ε is in FOLLOW (B).
3. If there is a production A → αBβ where FIRST (β) contains ε, then everything in FOLLOW (A) is
also in FOLLOW (B), i.e.,
FOLLOW (B) includes (FIRST (β) - {ε}) ∪ FOLLOW (A); and
if there is a production A → αB, then FOLLOW (B) includes FOLLOW (A).
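The FIRST and FOLLOW rules can be applied repeatedly until no set changes. The Python sketch below does this for the transformed expression grammar used in the next example (an illustration; a body is a list of symbols, [] stands for ε, "$" is the end marker, and first_of_seq, compute_first and compute_follow are hypothetical names). Its results agree with the hand computation that follows.

```python
from typing import Dict, List, Set

EPS = "ε"
Grammar = Dict[str, List[List[str]]]  # a body is a list of symbols; [] is ε


def first_of_seq(symbols: List[str], first: Dict[str, Set[str]], g: Grammar) -> Set[str]:
    """FIRST(Y1 Y2 ... Yk) as in rule 3."""
    result: Set[str] = set()
    for sym in symbols:
        f = first[sym] if sym in g else {sym}  # rule 1 for terminals
        result |= f - {EPS}
        if EPS not in f:  # Yi cannot vanish, so stop here
            return result
    result.add(EPS)  # every Yi derives ε (or the body is empty: rule 2)
    return result


def compute_first(g: Grammar) -> Dict[str, Set[str]]:
    first: Dict[str, Set[str]] = {nt: set() for nt in g}
    changed = True
    while changed:  # repeat until no FIRST set grows
        changed = False
        for head, bodies in g.items():
            for body in bodies:
                new = first_of_seq(body, first, g)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


def compute_follow(g: Grammar, start: str, first: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    follow: Dict[str, Set[str]] = {nt: set() for nt in g}
    follow[start].add("$")  # rule 1
    changed = True
    while changed:  # repeat until no FOLLOW set grows
        changed = False
        for head, bodies in g.items():
            for body in bodies:
                for i, b in enumerate(body):
                    if b not in g:
                        continue  # only non-terminals have FOLLOW sets
                    beta_first = first_of_seq(body[i + 1:], first, g)
                    new = beta_first - {EPS}  # rule 2
                    if EPS in beta_first:  # rule 3: β is empty or β ⇒* ε
                        new |= follow[head]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow


G = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
FIRST = compute_first(G)
FOLLOW = compute_follow(G, "E", FIRST)
for nt in G:
    print(nt, "FIRST", sorted(FIRST[nt]), "FOLLOW", sorted(FOLLOW[nt]))
```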
Find FIRST and FOLLOW for the following grammar by eliminating left recursive
productions.
E →E + T |T

T →T * F|F

F → (E) | id
The above grammar contains left-recursive productions, so by eliminating left recursion, grammar
G becomes:
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id
Computation of FIRST set:

FIRST (E) = FIRST (T) i.e., from production E → TE’
FIRST (T) = FIRST (F) i.e., from production T → FT’
FIRST (F) = { (, id } i.e., from production F → (E) | id
Therefore FIRST (E) = FIRST (T) = { (, id }

FIRST (E’) = { +, ε } i.e., from production E’ → +TE’ | ε
FIRST (T’) = { *, ε } i.e., from production T’ → *FT’ | ε
Computation of FOLLOW set:
FOLLOW (E) = { ), $ }
From F → ( E ), E is followed by the terminal symbol ')'. Also, E is the start symbol, so add $ to
FOLLOW (E).
From E → TE’
FOLLOW (E’) = FOLLOW (E) = { ), $ }
From E’ → +TE’ | ε
FOLLOW (E’) includes FOLLOW (E’), which adds nothing new.
Therefore FOLLOW (E’) = { ), $ }

From E → TE’
FOLLOW (T) = FIRST (E’) - { ε } ∪ FOLLOW (E) = { +, ), $ } i.e., by applying the 3rd rule, since β = E’
derives ε
From E’ → +TE’ | ε
FOLLOW (T) = FIRST (E’) - { ε } ∪ FOLLOW (E’) = { +, ), $ }
Therefore FOLLOW (T) = { +, ), $ }

From T → FT’
FOLLOW (T’) = FOLLOW (T) = { +, ), $ }
From T’ → *FT’ | ε
FOLLOW (T’) includes FOLLOW (T’), which adds nothing new.
Therefore FOLLOW (T’) = { +, ), $ }
From T → FT’
FOLLOW (F) = FIRST (T’) - { ε } ∪ FOLLOW (T) = { *, +, ), $ } i.e., by applying the 3rd rule, since
β = T’ derives ε
From T’ → *FT’ | ε
FOLLOW (F) = FIRST (T’) - { ε } ∪ FOLLOW (T’)
Therefore FOLLOW (F) = { *, +, ), $ }

NOTE:
For any non-terminal, the FOLLOW set is computed by selecting the productions in which that
non-terminal appears on the RHS.
Non-terminal   FIRST       FOLLOW
E              { (, id }   { ), $ }
T              { (, id }   { +, ), $ }
F              { (, id }   { *, +, ), $ }
E’             { +, ε }    { ), $ }
T’             { *, ε }    { +, ), $ }

Find FIRST and FOLLOW for the following grammar.


E →E + T |T

T →T * F|F

F → (E) | id

Non-terminal   FIRST       FOLLOW
E              { (, id }   { +, ), $ }
T              { (, id }   { *, +, ), $ }
F              { (, id }   { *, +, ), $ }
FOLLOW (E) contains ) from F → (E), + from E → E + T, and $ because E is the start symbol.
FOLLOW (T) contains * from T → T * F, and everything in FOLLOW (E) from E → E + T | T.
FOLLOW (F) = FOLLOW (T) from T → T * F | F.

Find FIRST and FOLLOW for the following grammar.

Stmt_sequence → Stmt Stmt_seq’
Stmt_seq’ → ; Stmt_sequence | ε
Stmt → s
Non-terminal     FIRST       FOLLOW
Stmt_sequence    { s }       { $ }
Stmt_seq’        { ;, ε }    { $ }
Stmt             { s }       { ;, $ }

Find FIRST and FOLLOW for the following grammar.


S → ,GH;
G → aF
F → bF | ε
H → KL
K → m|ε
L →n|ε
Non-terminal   FIRST          FOLLOW
S              { , }          { $ }
G              { a }          { m, n, ; }
F              { b, ε }       { m, n, ; }
H              { m, n, ε }    { ; }
K              { m, ε }       { n, ; }
L              { n, ε }       { ; }
From S → ,GH; we get FOLLOW (G) = FIRST (H;) = (FIRST (H) - {ε}) ∪ { ; } = { m, n, ; }, because H derives ε.
FOLLOW (K) = FIRST (L) - {ε} ∪ FOLLOW (H) = { n, ; }, since β = L derives ε, so rule 3 applies.

Find FIRST and FOLLOW sets for the following grammar.


S → aABb
A → c|ε
B →d|ε
Non-terminal   FIRST       FOLLOW
S              { a }       { $ }
A              { c, ε }    { d, b }
B              { d, ε }    { b }

From S → aABb, FOLLOW (A) = FIRST (Bb) = (FIRST (B) - {ε}) ∪ { b } = { d, b }, because B derives ε.

Find FIRST and FOLLOW sets for the following grammar.


S → AbS | e | ε
A → a | cAd
Non-terminal symbol FIRST FOLLOW
S { a, c, e, ε } { $}
A { a, c } { b, d }
Find FIRST and FOLLOW sets for the following grammar.
Exp → Exp addop term | term
addop → + | -
term → term mulop factor | factor
mulop → *
factor → ( Exp )
factor → number

Non-terminal symbols are: Exp, addop, term, mulop, factor


Terminal symbols are: +, -, *, '(', ')' and 'number'

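Non-terminal   FIRST            FOLLOW
Exp            { (, number }    { +, -, ), $ }
addop          { +, - }         { (, number }
term           { (, number }    { *, +, -, ), $ }
mulop          { * }            { (, number }
factor         { (, number }    { *, +, -, ), $ }
FOLLOW (term) contains *, because term is followed by mulop in term → term mulop factor, and
FOLLOW (factor) includes FOLLOW (term), because factor is the last symbol of both term-productions.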
LL (1) GRAMMAR:
What is LL(1) grammar?
A grammar G is LL(1) if and only if, whenever A → α | β are two distinct productions of G, the
following conditions hold:
Enlist the conditions required for a grammar to be LL(1).
i. For no terminal a do both α and β derive strings beginning with a.
ii. At most one of α and β can derive the empty string.
iii. If β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A).
Likewise, if α ⇒* ε, then β does not derive any string beginning with a terminal in
FOLLOW(A).
CONSTRUCTION OF PREDICTIVE PARSING TABLE ( LL(1) Parser)
Input: Context free grammar G
Output: Predictive parsing table M.
Algorithm for predictive parsing table (steps involved in the construction of a predictive parser):
For each production A → α of grammar G:
1. For each terminal a in FIRST (α), add A → α to M [ A, a ].
2. If ε is in FIRST (α), then for each terminal b in FOLLOW (A), add A → α to M [ A, b ].
3. If ε is in FIRST (α) and $ is in FOLLOW (A), add A → α to M [ A, $ ] as well.
4. All the remaining entries in the parsing table M are marked as 'SYNTAX ERROR'.
NOTE:
For a grammar to be LL(1), each parsing table entry must uniquely identify a production or signal
an error. That means there must not be any multiple entries in the parsing table.
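The four steps can be sketched in Python as below, using the transformed expression grammar and the FIRST/FOLLOW sets tabulated earlier in these notes (hard-coded here; GRAMMAR, first_of and build_table are hypothetical names, and [] stands for ε). A cell that receives more than one production would show that the grammar is not LL(1); cells that are never filled are the SYNTAX ERROR entries.

```python
from typing import Dict, List, Set, Tuple

EPS = "ε"

# Transformed expression grammar; a body is a list of symbols, [] is ε.
GRAMMAR: Dict[str, List[List[str]]] = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
# FIRST and FOLLOW sets of the non-terminals, copied from the table in these notes.
FIRST: Dict[str, Set[str]] = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}
FOLLOW: Dict[str, Set[str]] = {
    "E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
    "T'": {"+", ")", "$"}, "F": {"*", "+", ")", "$"},
}


def first_of(body: List[str]) -> Set[str]:
    """FIRST of a production body; a terminal is its own FIRST set."""
    result: Set[str] = set()
    for sym in body:
        f = FIRST.get(sym, {sym})
        result |= f - {EPS}
        if EPS not in f:
            return result
    return result | {EPS}


def build_table() -> Dict[Tuple[str, str], List[List[str]]]:
    table: Dict[Tuple[str, str], List[List[str]]] = {}
    for head, bodies in GRAMMAR.items():
        for body in bodies:
            f = first_of(body)
            targets = f - {EPS}              # step 1
            if EPS in f:
                targets |= FOLLOW[head]      # steps 2 and 3 ($ is in FOLLOW)
            for a in targets:
                table.setdefault((head, a), []).append(body)
    return table                             # missing cells = SYNTAX ERROR (step 4)


M = build_table()
conflicts = {cell: bodies for cell, bodies in M.items() if len(bodies) > 1}
print("LL(1)" if not conflicts else f"not LL(1): {conflicts}")
for (head, a), bodies in sorted(M.items()):
    print(f"M[{head}, {a}] = {head} → {' '.join(bodies[0]) or 'ε'}")
```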

Construct the predictive parsing table by making necessary changes to the grammar given
below:

E →E + T |T

T →T * F|F

F → (E) | id
Also check whether the modified grammar is LL(1) grammar or not.
The above grammar contains left-recursive productions, so we eliminate left recursion.

After eliminating left recursion, grammar G becomes:


E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id

By computing FIRST and FOLLOW sets of the above grammar:


Non-terminal   FIRST       FOLLOW
E              { (, id }   { ), $ }
T              { (, id }   { +, ), $ }
F              { (, id }   { *, +, ), $ }
E’             { +, ε }    { ), $ }
T’             { *, ε }    { +, ), $ }

Construction of predictive parsing table:

Non-terminal | id       | +          | *          | (        | )       | $
E            | E → TE’  |            |            | E → TE’  |         |
E’           |          | E’ → +TE’  |            |          | E’ → ε  | E’ → ε
T            | T → FT’  |            |            | T → FT’  |         |
T’           |          | T’ → ε     | T’ → *FT’  |          | T’ → ε  | T’ → ε
F            | F → id   |            |            | F → (E)  |         |

The above modified grammar is LL(1) grammar, since the parsing table entry uniquely identifies a
production or signals an error

Construct the LL(1) parsing table for the grammar given below:
E →E * T |T

T → id + T | id

After eliminating the left-recursive production,


E → T E’
E’ → *T E’ | ε

T → id + T | id

After left factoring:


E → T E’
E’ → *T E’ | ε
T → id T’
T’ → +T | ε

By computing FIRST and FOLLOW sets:

Non-terminal   FIRST       FOLLOW
E              { id }      { $ }
E’             { *, ε }    { $ }
T              { id }      { *, $ }
T’             { +, ε }    { *, $ }
Construction of predictive parsing table:

Non-terminal | id         | +         | *           | $
E            | E → TE’    |           |             |
E’           |            |           | E’ → *T E’  | E’ → ε
T            | T → id T’  |           |             |
T’           |            | T’ → +T   | T’ → ε      | T’ → ε

Consider the grammar given below:

Do necessary modifications and Construct the LL(1) parsing table for the resultant grammar .
By eliminating left-recursive productions:
E → TE’
E’ → ATE’ | ε
A → + | -
T → FT’
T’ → MFT’ | ε
M → *
F → (E) | num

By computing FIRST and FOLLOW sets:

Non-terminal   FIRST          FOLLOW
E              { (, num }     { ), $ }
E’             { +, -, ε }    { ), $ }
T              { (, num }     { +, -, ), $ }
T’             { *, ε }       { +, -, ), $ }
A              { +, - }       { (, num }
M              { * }          { (, num }
F              { (, num }     { *, +, -, ), $ }
Construction of predictive parsing table:

Non-terminal | num      | +          | -          | *          | (        | )       | $
E            | E → TE’  |            |            |            | E → TE’  |         |
E’           |          | E’ → ATE’  | E’ → ATE’  |            |          | E’ → ε  | E’ → ε
T            | T → FT’  |            |            |            | T → FT’  |         |
T’           |          | T’ → ε     | T’ → ε     | T’ → MFT’  |          | T’ → ε  | T’ → ε
A            |          | A → +      | A → -      |            |          |         |
M            |          |            |            | M → *      |          |         |
F            | F → num  |            |            |            | F → (E)  |         |

Construct the LL(1) parsing table for the grammar given below:
S → AaAb | BbBa
A →ε

B →ε

Answer:
Non-terminal   FIRST       FOLLOW
S              { a, b }    { $ }
A              { ε }       { a, b }
B              { ε }       { a, b }
Parsing Table:
Non-terminal | a          | b          | $
S            | S → AaAb   | S → BbBa   |
A            | A → ε      | A → ε      |
B            | B → ε      | B → ε      |
Construct the LL(1) parsing table for the grammar given below:
S →A
A → aB
B → bBC | f
C →g
Non-terminal   FIRST
S              { a }
A              { a }
B              { b, f }
C              { g }
Note: Since the grammar is ε-free, FOLLOW sets are not required to be computed in order to enter
the productions into the parsing table.
Parsing Table:
Non-terminal | a        | b          | f        | g
S            | S → A    |            |          |
A            | A → aB   |            |          |
B            |          | B → bBC    | B → f    |
C            |          |            |          | C → g

Construct the LL(1) parsing table for the grammar given below:
S → aBDh
B → cC
C → bC | ε
D → EF
E →g|ε
F →f|ε

Non-terminal   FIRST          FOLLOW
S              { a }          { $ }
B              { c }          { g, f, h }
C              { b, ε }       { g, f, h }
D              { g, f, ε }    { h }
E              { g, ε }       { f, h }
F              { f, ε }       { h }

Parsing Table:
NT | a          | b       | c       | g       | f       | h       | $
S  | S → aBDh   |         |         |         |         |         |
B  |            |         | B → cC  |         |         |         |
C  |            | C → bC  |         | C → ε   | C → ε   | C → ε   |
D  |            |         |         | D → EF  | D → EF  | D → EF  |
E  |            |         |         | E → g   | E → ε   | E → ε   |
F  |            |         |         |         | F → f   | F → ε   |

NON-RECURSIVE PREDICTIVE PARSING ALGORITHM:

Initially, the parser is in a configuration with the input string w$ in the input buffer and the start symbol S of grammar G on top of the stack, i.e., above the $ symbol.
Steps involved in parsing the input string:
Let X be the symbol on top of the stack, and let a be the current input symbol.
i. If X = a = $, the parser announces successful completion of parsing and halts.
ii. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input symbol.
iii. If X is a non-terminal, the parser consults the parsing table entry TABLE[X, a]. If TABLE[X, a] = X → UVW, the parser replaces X on top of the stack by UVW so that U ends up on top (it pushes W, then V, then U). If TABLE[X, a] is an error entry, the parser calls the error recovery routine.
iv. If X is a terminal and X ≠ a, the parser reports an error and calls the error recovery routine.
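The four steps above translate directly into a loop over an explicit stack. Below is a minimal Python sketch, assuming the parsing table of the expression grammar E → TE′, E′ → +TE′ | ε, T → FT′, T′ → *FT′ | ε, F → (E) | id (E' and T' stand for E′ and T′). The function name `predictive_parse` and the table encoding are hypothetical, and instead of calling an error recovery routine the sketch simply raises `SyntaxError`.

```python
# Non-recursive predictive (LL(1)) parser: an illustrative sketch of steps i to iv.
# Table for E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id.
# An empty list is an ε-body; a missing key is an error entry.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def predictive_parse(tokens, start="E"):
    """Return the productions output by the parser; raise SyntaxError on an error."""
    tokens = tokens + ["$"]
    stack = ["$", start]                  # the end of the list is the top of the stack
    pos = 0
    output = []
    while True:
        X, a = stack[-1], tokens[pos]
        if X == a == "$":                 # step i: successful completion
            return output
        if X not in NONTERMINALS:         # X is a terminal (or $)
            if X != a:                    # step iv: terminal mismatch
                raise SyntaxError(f"expected {X!r}, found {a!r}")
            stack.pop()                   # step ii: pop X, advance the input
            pos += 1
            continue
        body = TABLE.get((X, a))          # step iii: consult TABLE[X, a]
        if body is None:
            raise SyntaxError(f"TABLE[{X}, {a}] is an error entry")
        stack.pop()
        stack.extend(reversed(body))      # push W, V, U so that U ends up on top
        output.append(f"{X} → {' '.join(body) or 'ε'}")


for production in predictive_parse(["id", "+", "id", "*", "id"]):
    print(production)
```

For id + id * id the parser outputs eleven productions, starting with E → T E' and ending with E' → ε; an incomplete input such as id + stops with an error because TABLE[T, $] is empty.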
Given the grammar:
E → E + T | T
T → T * F | F
F → ( E ) | id
i. Make necessary changes to make it suitable for LL(1) parsing.
ii. Construct FIRST and FOLLOW sets.
iii. Construct the predictive parsing table.
iv. Check whether the resultant grammar is LL(1) or not.
v. Show the moves made by the predictive parser on the input id + id * id.
NOTE:
In order to construct any predictive parser, we first have to
a. eliminate left-recursive productions, if any, and
b. left-factor the grammar, if needed.
i. After eliminating left-recursive productions:
E → TE′
E′ → +TE′ | ε
T → FT′
T′ → *FT′ | ε
F → (E) | id


ii. FIRST & FOLLOW sets

| Non-terminal | FIRST | FOLLOW |
| E | { (, id } | { ), $ } |
| T | { (, id } | { +, ), $ } |
| F | { (, id } | { *, +, ), $ } |
| E′ | { +, ε } | { ), $ } |
| T′ | { *, ε } | { +, ), $ } |

iii. Construction of predictive parsing table:

| Non-terminal | id | + | * | ( | ) | $ |
| E | E → TE′ | | | E → TE′ | | |
| E′ | | E′ → +TE′ | | | E′ → ε | E′ → ε |
| T | T → FT′ | | | T → FT′ | | |
| T′ | | T′ → ε | T′ → *FT′ | | T′ → ε | T′ → ε |
| F | F → id | | | F → (E) | | |

iv. The above modified grammar is an LL(1) grammar, since every parsing table entry either identifies a unique production or signals an error (no entry contains more than one production).
v. Moves made by predictive parser on input id + id * id

| MATCHED | STACK | INPUT | ACTION |
| | E$ | id + id * id$ | |
| | TE′$ | id + id * id$ | Output E → TE′ |
| | FT′E′$ | id + id * id$ | Output T → FT′ |
| | idT′E′$ | id + id * id$ | Output F → id |
| id | T′E′$ | + id * id$ | match id |
| id | E′$ | + id * id$ | Output T′ → ε |
| id | +TE′$ | + id * id$ | Output E′ → +TE′ |
| id + | TE′$ | id * id$ | match + |
| id + | FT′E′$ | id * id$ | Output T → FT′ |
| id + | idT′E′$ | id * id$ | Output F → id |
| id + id | T′E′$ | * id$ | match id |
| id + id | *FT′E′$ | * id$ | Output T′ → *FT′ |
| id + id * | FT′E′$ | id$ | match * |
| id + id * | idT′E′$ | id$ | Output F → id |
| id + id * id | T′E′$ | $ | match id |
| id + id * id | E′$ | $ | Output T′ → ε |
| id + id * id | $ | $ | Output E′ → ε |

Given the grammar:


S →(L)|a
L → L, S | S
i. Make necessary changes to make it suitable for LL(1) parsing.
ii. Construct FIRST and FOLLOW sets.
iii. Construct the predictive parsing table.
iv. Check whether the resultant grammar is LL(1) or not.
v. Show the moves made by the predictive parser on the input (a, (a, a))
Answer:
i. After eliminating left recursive productions, the grammar G becomes:
S→(L)|a
L → SL‟
L‟ → , SL‟ | ε
ii. FIRST & FOLLOW sets

| Non-terminal | FIRST | FOLLOW |
| S | { (, a } | { ',', ), $ } |
| L | { (, a } | { ) } |
| L′ | { ',', ε } | { ) } |


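iii. Predictive parsing table:

| Non-terminal | ( | ) | a | , | $ |
| S | S → (L) | | S → a | | |
| L | L → SL′ | | L → SL′ | | |
| L′ | | L′ → ε | | L′ → ,SL′ | |

iv. The above modified grammar is an LL(1) grammar, since every parsing table entry either identifies a unique production or signals an error.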
v. Moves made by predictive parser on input (a, (a,a))

| MATCHED | STACK | INPUT | ACTION |
| | S$ | (a,(a,a))$ | |
| | (L)$ | (a,(a,a))$ | Output S → (L) |
| ( | L)$ | a,(a,a))$ | match ( |
| ( | SL′)$ | a,(a,a))$ | Output L → SL′ |
| ( | aL′)$ | a,(a,a))$ | Output S → a |
| (a | L′)$ | ,(a,a))$ | match a |
| (a | ,SL′)$ | ,(a,a))$ | Output L′ → ,SL′ |
| (a, | SL′)$ | (a,a))$ | match , |
| (a, | (L)L′)$ | (a,a))$ | Output S → (L) |
| (a,( | L)L′)$ | a,a))$ | match ( |
| (a,( | SL′)L′)$ | a,a))$ | Output L → SL′ |
| (a,( | aL′)L′)$ | a,a))$ | Output S → a |
| (a,(a | L′)L′)$ | ,a))$ | match a |
| (a,(a | ,SL′)L′)$ | ,a))$ | Output L′ → ,SL′ |
| (a,(a, | SL′)L′)$ | a))$ | match , |
| (a,(a, | aL′)L′)$ | a))$ | Output S → a |
| (a,(a,a | L′)L′)$ | ))$ | match a |
| (a,(a,a | )L′)$ | ))$ | Output L′ → ε |
| (a,(a,a) | L′)$ | )$ | match ) |
| (a,(a,a) | )$ | )$ | Output L′ → ε |
| (a,(a,a)) | $ | $ | match ) |
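The MATCHED, STACK and INPUT columns of such a trace can be produced by instrumenting the stack loop of the predictive parser. This Python sketch assumes the parsing table above for S → (L) | a, L → SL′, L′ → ,SL′ | ε (L' stands for L′), treats every non-blank character of the input as one token, and uses the hypothetical function name `trace`; it is an illustration, not a general parser.

```python
# Instrumented LL(1) parser (illustrative sketch) that records the
# MATCHED / STACK / INPUT / ACTION columns. Table for S → (L) | a,
# L → SL', L' → ,SL' | ε; each input character is treated as one token.
TABLE = {
    ("S", "("): ["(", "L", ")"], ("S", "a"): ["a"],
    ("L", "("): ["S", "L'"], ("L", "a"): ["S", "L'"],
    ("L'", ","): [",", "S", "L'"], ("L'", ")"): [],
}
NONTERMINALS = {"S", "L", "L'"}


def trace(text, start="S"):
    tokens = list(text.replace(" ", "")) + ["$"]
    stack, pos, matched = ["$", start], 0, ""

    def row(action):
        # The stack is shown top first, as in the notes.
        return (matched, "".join(reversed(stack)), "".join(tokens[pos:]), action)

    rows = [row("")]
    while stack[-1] != "$" or tokens[pos] != "$":
        X, a = stack[-1], tokens[pos]
        if X in NONTERMINALS:
            body = TABLE.get((X, a))
            if body is None:
                raise SyntaxError(f"M[{X}, {a}] is empty")
            stack.pop()
            stack.extend(reversed(body))
            action = f"output {X} → {''.join(body) or 'ε'}"
        elif X == a:
            stack.pop()
            pos += 1
            matched += a
            action = f"match {a}"
        else:
            raise SyntaxError(f"expected {X}, found {a}")
        rows.append(row(action))
    return rows


for r in trace("(a, (a, a))"):
    print("{:<10} {:<12} {:<12} {}".format(*r))
```

The inner `row` helper reads the current values of `matched`, `stack` and `pos` each time it is called, so every row is a snapshot taken right after one move.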


Given the grammar:


S → aABb
A → c |ε
B → d |ε
i. Construct FIRST and FOLLOW sets.
ii. Construct the predictive parsing table.
iii. Show the moves made by the predictive parser on the input acdb.
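| Non-terminal | FIRST | FOLLOW |
| S | { a } | { $ } |
| A | { c, ε } | { d, b } |
| B | { d, ε } | { b } |

Parsing Table:

| Non-terminal | a | b | c | d | $ |
| S | S → aABb | | | | |
| A | | A → ε | A → c | A → ε | |
| B | | B → ε | | B → d | |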

Moves made by predictive parser on input acdb

| MATCHED | STACK | INPUT | ACTION |
| | S$ | acdb$ | |
| | aABb$ | acdb$ | Output S → aABb |
| a | ABb$ | cdb$ | match a |
| a | cBb$ | cdb$ | Output A → c |
| ac | Bb$ | db$ | match c |
| ac | db$ | db$ | Output B → d |
| acd | b$ | b$ | match d |
| acdb | $ | $ | match b |


Write the parsing table for the grammar shown below:

S → iEtSS′ | a
S′ → eS | ε
E → b
Is this grammar LL(1)? Justify your answer.

| Non-terminal | FIRST | FOLLOW |
| S | { i, a } | { e, $ } |
| S′ | { e, ε } | { e, $ } |
| E | { b } | { t } |

Parsing table:

| Non-terminal | a | b | e | i | t | $ |
| S | S → a | | | S → iEtSS′ | | |
| S′ | | | S′ → eS, S′ → ε | | | S′ → ε |
| E | | E → b | | | | |

The above parsing table contains two production rules in M[S′, e], so the given grammar is not an LL(1) grammar.
Here the grammar is ambiguous, and the ambiguity shows up as a choice of which production to use when an e (else) is seen. We can resolve this ambiguity by choosing S′ → eS, which associates each else with the closest previous then.
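The conflict can also be found mechanically by filling the table from the FIRST and FOLLOW sets above. In this Python sketch the FIRST sets of the production bodies are entered by hand (S' stands for S′), the data layout is an assumption, and the final dictionary shows the resolution in favour of S′ → eS.

```python
# Illustrative sketch: fill the LL(1) table for S → iEtSS' | a, S' → eS | ε, E → b
# and report the cells that receive more than one production.
EPS = "ε"
PRODUCTIONS = [("S", "iEtSS'"), ("S", "a"), ("S'", "eS"), ("S'", EPS), ("E", "b")]
FIRST_OF_BODY = {"iEtSS'": {"i"}, "a": {"a"}, "eS": {"e"}, EPS: {EPS}, "b": {"b"}}
FOLLOW = {"S": {"e", "$"}, "S'": {"e", "$"}, "E": {"t"}}

table = {}
for head, body in PRODUCTIONS:
    lookaheads = FIRST_OF_BODY[body] - {EPS}
    if EPS in FIRST_OF_BODY[body]:          # ε-body: enter it under FOLLOW(head)
        lookaheads |= FOLLOW[head]
    for a in sorted(lookaheads):
        table.setdefault((head, a), []).append(body)

conflicts = {cell: bodies for cell, bodies in table.items() if len(bodies) > 1}
print("conflicting cells:", conflicts)

# Resolve the dangling-else ambiguity by keeping only S' → eS in M[S', e].
resolved = dict(table)
resolved[("S'", "e")] = ["eS"]
```

The conflict arises because e is both in FIRST(eS) and in FOLLOW(S′); keeping only S′ → eS makes every cell single-valued, which is how the ambiguity is usually resolved in practice.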

SYNTAX ERROR HANDLING


Common programming errors can occur at many different levels.
1. Lexical errors: include misspellings of identifiers, keywords, or operators.
2. Syntactic errors: include misplaced semicolons or extra or missing braces.
3. Semantic errors: include type mismatches between operators and operands.
4. Logical errors: can be anything from incorrect reasoning on the part of the programmer to the use, in a C program, of the assignment operator = instead of the comparison operator ==.


What should the parser do in an error case?



The parser should be able to give an error message (as meaningful as possible).

The parser should recover from that error and be able to continue parsing the rest of the input.
Error Recovery in Predictive Parsing
An error is detected during predictive (LL(1)) parsing when:
1. The terminal symbol on top of the stack does not match the current input symbol, or
2. The top of the stack is a non-terminal A, the current input symbol is a, and the parsing table entry M[A, a] is empty.
ERROR-RECOVERY STRATEGIES
Explain error recovery strategies used during syntax analysis or by predictive parser.
The various error recovery strategies used during syntax analysis phase of a compiler (By
predictive parser):
1. Panic-Mode Recovery.
2. Phrase-Level Recovery.
3. Error Productions.
4. Global Correction.
Panic-Mode Recovery
On discovering an error, the parser discards input symbols one at a time until one of a designated set of synchronizing tokens is found. Synchronizing tokens are usually delimiters.
Example: a semicolon or }, whose role in the source program is clear and unambiguous.
It often skips a considerable amount of input without checking it for additional errors.
Advantage: it is simple, and it is guaranteed not to go into an infinite loop.
Phrase-Level Error Recovery
A parser may perform local correction on the remaining input, that is, it may replace a prefix of the remaining input by some string that allows the parser to continue.
Each empty entry in the parsing table is filled with a pointer to a specific error routine that handles that error case.
Example: replace a comma by a semicolon, insert a missing semicolon, etc.
• The choice of local correction is left to the compiler designer.
• It is used in several error-repairing compilers, as it can correct any input string.
• Drawback: it has difficulty coping with situations in which the actual error occurred before the point of detection.
Error Productions

If we have a good idea of the common errors that might be encountered, we can augment the grammar with productions that generate the erroneous constructs.

When an error production is used by the parser, we can generate appropriate error diagnostics.

Since it is almost impossible to anticipate all the errors that programmers can make, this method handles only the anticipated errors and is of limited practical use.
Global Correction
• Ideally, we would like a compiler to make as few changes as possible in processing an incorrect input; to do this, the input has to be analyzed globally to find the error.
• Algorithms are used that choose a minimal sequence of changes to obtain a globally least-cost correction.
• This is an expensive method: it is too costly to implement in terms of time and space, so these techniques are currently only of theoretical interest and are not used in practice.
PANIC-MODE ERROR RECOVERY IN LL(1) PARSING
In panic-mode error recovery, we skip input symbols until a synchronizing token is found.
What is the synchronizing token?
All the terminal symbols in the FOLLOW set of a non-terminal can be used as the synchronizing token set for that non-terminal. So, a simple panic-mode error recovery for LL(1) parsing works as follows:

• Place all symbols in FOLLOW(A) into the synchronizing set for non-terminal A, and mark the corresponding empty table entries as Sync.

• If the parser looks up entry M[A, a] and finds Sync, the non-terminal on top of the stack is popped (except for the start symbol) in an attempt to resume parsing.

• If the parser looks up entry M[A, a] and finds that it is blank, the input symbol a is skipped.

• To handle unmatched terminal symbols, the parser pops the unmatched terminal symbol from the stack, issues an error message saying that the unmatched terminal was inserted, and continues parsing.


Panic-Mode Error Recovery: Example

Explain the panic-mode error recovery technique for the following grammar:
S → AbS | e | ε
A → a | cAd

| Non-terminal | FIRST | FOLLOW |
| S | { a, c, e, ε } | { $ } |
| A | { a, c } | { b, d } |

Construction of predictive parsing table:

| Non-terminal | a | b | c | d | e | $ |
| S | S → AbS | | S → AbS | | S → e | S → ε |
| A | A → a | Sync | A → cAd | Sync | | |

Moves made by predictive parser on input: ceadb

| STACK | INPUT | REMARK |
| S$ | ceadb$ | |
| AbS$ | ceadb$ | |
| cAdbS$ | ceadb$ | |
| AdbS$ | eadb$ | Error: M[A, e] is blank, skip e |
| AdbS$ | adb$ | |
| adbS$ | adb$ | |
| dbS$ | db$ | |
| bS$ | b$ | |
| S$ | $ | |
| $ | $ | |

Explain how panic-mode error recovery is used for the following grammar:


E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ |ε
F → (E) | id

Consider the string: )id*+id

| Non-terminal | FIRST | FOLLOW |
| E | { (, id } | { ), $ } |
| T | { (, id } | { +, ), $ } |
| F | { (, id } | { *, +, ), $ } |
| E′ | { +, ε } | { ), $ } |
| T′ | { *, ε } | { +, ), $ } |

Construction of predictive parsing table:

| Non-terminal | id | + | * | ( | ) | $ |
| E | E → TE′ | | | E → TE′ | Sync | Sync |
| E′ | | E′ → +TE′ | | | E′ → ε | E′ → ε |
| T | T → FT′ | Sync | | T → FT′ | Sync | Sync |
| T′ | | T′ → ε | T′ → *FT′ | | T′ → ε | T′ → ε |
| F | F → id | Sync | Sync | F → (E) | Sync | Sync |

Moves made by the predictive parser on the input )id*+id:


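| STACK | INPUT | REMARK |
| E$ | )id*+id$ | Error: M[E, )] = Sync, but E is the start symbol, so skip ) |
| E$ | id*+id$ | |
| TE′$ | id*+id$ | |
| FT′E′$ | id*+id$ | |
| idT′E′$ | id*+id$ | |
| T′E′$ | *+id$ | |
| *FT′E′$ | *+id$ | |
| FT′E′$ | +id$ | Error: M[F, +] = Sync |
| T′E′$ | +id$ | So F has been popped |
| E′$ | +id$ | |
| +TE′$ | +id$ | |
| TE′$ | id$ | |
| FT′E′$ | id$ | |
| idT′E′$ | id$ | |
| T′E′$ | $ | |
| E′$ | $ | |
| $ | $ | |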
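The recovery rules used in this trace can be added to the table-driven parser. The following Python sketch is one possible implementation under stated assumptions: the table with Sync entries is copied from above, a Sync entry with the start symbol alone on the stack is treated like a blank entry (the input symbol is skipped, as in the first row of the trace), and the function name `parse_with_recovery` is hypothetical. It returns the list of recovery actions taken.

```python
# Illustrative sketch of panic-mode recovery in an LL(1) parser.
# Table copied from above; "Sync" marks synchronizing entries, missing keys are blank.
SYNC = "Sync"
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"], ("E", ")"): SYNC, ("E", "$"): SYNC,
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T", "+"): SYNC, ("T", ")"): SYNC, ("T", "$"): SYNC,
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
    ("F", "+"): SYNC, ("F", "*"): SYNC, ("F", ")"): SYNC, ("F", "$"): SYNC,
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def parse_with_recovery(tokens, start="E"):
    """Parse tokens, recovering from errors; return the recovery actions taken."""
    tokens = tokens + ["$"]
    stack, pos, actions = ["$", start], 0, []
    while not (stack[-1] == "$" and tokens[pos] == "$"):
        X, a = stack[-1], tokens[pos]
        if X not in NONTERMINALS:
            if X == a:                          # normal match
                stack.pop()
                pos += 1
            elif X == "$":                      # stack exhausted: discard extra input
                actions.append(f"skip {a}")
                pos += 1
            else:                               # unmatched terminal: pop it ("inserted")
                actions.append(f"insert {X}")
                stack.pop()
            continue
        entry = TABLE.get((X, a))
        if entry is None or entry == SYNC:
            start_alone = X == start and len(stack) == 2
            if a != "$" and (entry is None or start_alone):
                actions.append(f"skip {a}")     # blank entry, or Sync on the start symbol
                pos += 1
            else:
                actions.append(f"pop {X}")      # Sync entry: pop the non-terminal
                stack.pop()
        else:
            stack.pop()
            stack.extend(reversed(entry))
    return actions


print(parse_with_recovery([")", "id", "*", "+", "id"]))   # prints ['skip )', 'pop F']
```

The two recorded actions correspond to the two error rows of the trace: the leading ) is skipped, and F is popped when M[F, +] = Sync. Every recovery step either pops the stack or consumes an input symbol, so the loop cannot run forever on this grammar.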

Phrase-Level Error Recovery

• Each empty entry in the parsing table is filled with a pointer to a special error routine that handles that error case. These error routines may change, insert, or delete input symbols; they may also issue appropriate error messages and pop items from the stack.
Note: We should be careful when we design these error routines, because they may put the parser into an infinite loop.

NOTE:

How to determine whether a context-free grammar is LL(1) or not, without constructing the parsing table:
1. For every production of the form:
A → α1 | α2 | α3 | ……..
If there is no ε in any of these rules, find FIRST(α1), FIRST(α2), FIRST(α3) and so on, and take the intersections of these FIRST() sets pair-wise:
FIRST(αi) ∩ FIRST(αj) = Ø for every pair i ≠ j (no common terms)
If this holds for every non-terminal, the grammar is an LL(1) grammar; otherwise it is not LL(1).

2. For every production of the form:

A → α1 | α2 | α3 | ………. | ε
find FIRST(α1), FIRST(α2), FIRST(α3) and so on, and also find FOLLOW(A) for the ε rule.
Take the intersections of these FIRST() sets and FOLLOW(A) pair-wise:
FIRST(αi) ∩ FIRST(αj) = Ø for every pair i ≠ j, and FIRST(αi) ∩ FOLLOW(A) = Ø for every i
If this holds for every non-terminal, the grammar is an LL(1) grammar; otherwise it is not LL(1).

Example:
Check whether the following grammar is LL(1) or not without constructing parsing table.
1. S → aSa | bS | c
Answer:
FIRST(α1) = FIRST(aSa) = {a}
FIRST(α2) = FIRST(bS) = {b}
FIRST(α3) = FIRST(c) = {c}
FIRST(α1) ∩ FIRST(α2) = {a} ∩ {b} = Ø, FIRST(α1) ∩ FIRST(α3) = {a} ∩ {c} = Ø and FIRST(α2) ∩ FIRST(α3) = {b} ∩ {c} = Ø
Therefore the given grammar is an LL(1) grammar.
2. Check whether the following grammar is LL(1) or not without constructing the parsing table:
S → iCtSS1| bS | a
S1 → eS | ε
C→b
For S production rule:


FIRST(α1) = FIRST(iCtSS1) = {i}
FIRST(α2) = FIRST(bS) = {b}
FIRST(α3) = FIRST(a) = {a}
The pair-wise intersections {i} ∩ {b}, {i} ∩ {a} and {b} ∩ {a} are all Ø.
For S1 production rule:
FIRST(α1) = FIRST(eS) = {e}
Since the rule contains an ε term, we have to find FOLLOW(S1):
FOLLOW(S1) = FOLLOW(S) = {e, $}
(S1 ends the body of S → iCtSS1, so FOLLOW(S1) = FOLLOW(S); S is followed by S1 in that body, so FIRST(S1) − {ε} = {e} is in FOLLOW(S); and $ is in FOLLOW(S) because S is the start symbol.)
FIRST(α1) ∩ FOLLOW(S1) = {e}, which is not equal to Ø.
Therefore the given grammar is not an LL(1) grammar.
(No need to check the C production rule.)
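The pair-wise test can be written as a short function. In this Python sketch the FIRST set of each alternative and the needed FOLLOW sets are supplied by hand from the two worked examples; the function name `is_ll1` and its return format are assumptions. As in rule 2, an alternative whose FIRST set contains ε makes the function also compare against FOLLOW(A).

```python
# Illustrative sketch of the pair-wise LL(1) test (rules 1 and 2 above).
from itertools import combinations

EPS = "ε"


def is_ll1(alternatives_first, follow):
    """alternatives_first maps each non-terminal A to [FIRST(α1), FIRST(α2), ...].

    Use {ε} for an alternative A → ε. Returns (ok, offending non-terminal, common set).
    """
    for head, firsts in alternatives_first.items():
        sets = [f - {EPS} for f in firsts]
        if any(EPS in f for f in firsts):       # A has an ε-alternative: add FOLLOW(A)
            sets.append(follow[head])
        for s1, s2 in combinations(sets, 2):    # every pair of sets
            common = s1 & s2
            if common:
                return False, head, common
    return True, None, set()


# Example 1: S → aSa | bS | c
print(is_ll1({"S": [{"a"}, {"b"}, {"c"}]}, {}))            # (True, None, set())
# Example 2: S → iCtSS1 | bS | a, S1 → eS | ε, C → b
print(is_ll1({"S": [{"i"}, {"b"}, {"a"}], "S1": [{"e"}, {EPS}], "C": [{"b"}]},
             {"S": {"e", "$"}, "S1": {"e", "$"}, "C": {"t"}}))  # (False, 'S1', {'e'})
```

The function stops at the first non-terminal that fails, which mirrors the remark above that the C production rule need not be checked once S1 fails.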



Module 4
---------------------------------------------------------------------------------------------------------------------
Push Down Automata:
• Definition of the Pushdown Automata
• The Languages of a PDA.
Syntax Analysis Phase of Compilers: Part-2
• Bottom-up Parsing,
• Introduction to LR Parsing:
• SLR Parser
• More Powerful LR parsers: LALR

----------------------------------------------------------------------------------------------------------------
Textbooks:
1. John E. Hopcroft, Rajeev Motwani, Jeffrey D. Ullman, "Introduction to Automata Theory, Languages and Computation", Third Edition, Pearson.

2. Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman, "Compilers: Principles, Techniques, and Tools", Second Edition, Pearson.
Textbook 1:

• Chapter 6: 6.1, 6.2

Textbook 2:
• Chapter 4: 4.5, 4.6, 4.7 (up to 4.7.4)

Automata Theory & Compiler Design 21CS51 Module 4

PUSH DOWN AUTOMATA (PDA)


A pushdown automaton (PDA) is a type of automaton that employs a stack. Pushdown automata are used in theories about what can be computed by machines. They are more capable than finite-state machines but less capable than Turing machines.
Context-free languages can be described using context-free grammars, and they also have a type of automaton that defines them. This automaton, called a "pushdown automaton", is an extension of the ε-NFA.
A DFA or NFA is not powerful enough to recognize many context-free languages: it has only a finite memory, so it cannot count and cannot store the input for future reference. A pushdown automaton (PDA) is similar to a finite automaton, except that it has an extra stack. So the definition of a PDA is similar to the definition of an ε-NFA, with slight changes.
Types of PDA:
There are two types of PDA:
1. Deterministic PDA (DPDA)
2. Nondeterministic PDA (NPDA)
A deterministic PDA can recognize all deterministic context-free languages, while a nondeterministic PDA can recognize all context-free languages. Mainly the former are used in parser design.
Explain a PDA with a neat diagram.
***Definition of PDA:
Block diagram of PDA:


A Pushdown automaton has seven components, say P = (Q, Σ, Γ , δ, q0, Z0, F ) where
Q: A finite set of states.

Σ: A finite set of input symbols or alphabets.

Γ: A finite stack alphabet.

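δ: The transition function. It takes as argument a triple δ(q, a, X),

where q is a state in Q,
a is either an input symbol in Σ or a = ε, the empty string, which is assumed not to be an input symbol, and
X is a stack symbol, that is, a member of Γ.
The output of δ is a finite set of pairs (p, γ), where p is the new state and γ is the string of stack symbols that replaces X at the top of the stack.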
q0: The start state.

Z0: Initial stack symbol.

F: Set of accepting states or final states.


A finite-state control reads the input one symbol at a time. The PDA is allowed to observe the symbol at the top of the stack and to base its transition on its current state, the input symbol, and the symbol at the top of the stack. In one transition, the PDA:
1. Consumes the input symbol that it uses in the transition. If ε is used for the input, then no input symbol is consumed.
2. Goes to a new state, which may or may not be the same as the previous state.
3. Replaces the symbol at the top of the stack by any string. The string could be ε, which corresponds to a pop of the stack. It could be the same symbol that appeared at the top of the stack previously (no change to the stack).
A PDA chooses a transition by indexing a table by the input symbol, the current state, and the symbol at the top of the stack. In a deterministic PDA these three parameters completely determine the transition that is chosen; a nondeterministic PDA may have several choices. Finally, the given input string is either accepted (for example, by reaching a final state) or rejected.
A GRAPHICAL NOTATION FOR PDAs
In the transition diagram for a PDA:
a. The nodes correspond to the states of the PDA.
b. An arrow labeled Start indicates the start state, and doubly circled states are accepting states, as for finite automata.


c. The arcs correspond to transitions of the PDA in the following sense. An arc labeled a, X/α
from state q to state p means that δ ( q, a, X ) contains the pair (p, α ). It tells what input is
used, and also gives the old and new tops of the stack.
INSTANTANEOUS DESCRIPTIONS OF A PDA (ID)
The way a PDA processes an input string, going from configuration to configuration in response to input symbols (or ε), can be represented using instantaneous descriptions of the PDA.
Definition of Instantaneous Description (ID)
Let P = (Q, Σ, Γ, δ, q0, Z0, F) be a PDA. An instantaneous description of a PDA is a triple (q, w, γ), where q is the state,
w is the remaining input, and
γ is the stack contents.
Example: if the current configuration of the PDA is (q, aw, Zα), then
q is the current state,
aw is the string to be processed, and
Zα is the current content of the stack, with Z as the topmost symbol on the stack.

(q, aw, Zα) ⊢* (p, w, βα) means that the PDA is in configuration (q, aw, Zα) and, after applying zero or more transitions, enters the new configuration (p, w, βα).

Note: ⊢ denotes a single move of the PDA, and ⊢* denotes zero or more moves.


The Languages of a PDA:
****Discuss the Languages accepted by PDA.
There are two ways in which a PDA can accept a language:
1. Acceptance by final state: if, after consuming the input string, the PDA enters an accepting state, we call this approach acceptance by final state.
2. Acceptance by empty stack: the language is the set of input strings that cause the PDA to empty its stack, starting from the initial ID.
1. Acceptance by Final State:
Let P = (Q, Σ, Γ, δ, q0, Z0, F) be a PDA. Then the language accepted by PDA P by final state is

$$L(P) = \{\, w \mid (q_0, w, Z_0) \vdash^{*} (q, \varepsilon, \alpha) \text{ for some state } q \in F \text{ and any stack string } \alpha \,\}$$

2. Acceptance by Empty Stack:
Let P = (Q, Σ, Γ, δ, q0, Z0, F) be a PDA. Then the language accepted by PDA P by empty stack is

$$N(P) = \{\, w \mid (q_0, w, Z_0) \vdash^{*} (q, \varepsilon, \varepsilon) \text{ for some state } q \,\}$$

That is, N(P) is the set of inputs w that PDA P can consume while, at the same time, emptying its stack.
Design a PDA to accept the language L = { aⁿbⁿ | n ≥ 0 }. Draw the graphical representation of the PDA obtained. Also write the IDs for the string "aaabbb".
Procedure: Since the language contains strings of n a's followed by n b's, the machine can read the n a's in the start state, pushing every scanned input symbol a onto the stack. When the machine encounters the input symbol b, there should be a corresponding a on the stack for each b, so one a is popped for every b. Finally, if there is no more input (ε) and only Z0 is left on the stack, the string scanned has n a's followed by n b's.

PDA to accept L = { aⁿbⁿ | n ≥ 0 } is given by:

P = (Q, Σ, Γ, δ, q0, Z0, F) where δ is given by
δ(q0, a, Z0) = (q0, aZ0)
δ(q0, a, a) = (q0, aa)
δ(q0, b, a) = (q1, ε)
δ(q1, b, a) = (q1, ε)
δ(q1, ε, Z0) = (qf, Z0)
δ(q0, ε, Z0) = (qf, Z0)
Q = { q0, q1, qf }, Σ = { a, b }, Γ = { a, Z0 }, q0 is the start state, Z0 is the initial stack symbol and F = { qf }.
Graphical representation (transition diagram):

Instantaneous Descriptions for the string "aaabbb":

(q0, aaabbb, Z0) ⊢ (q0, aabbb, aZ0) ⊢ (q0, abbb, aaZ0) ⊢ (q0, bbb, aaaZ0) ⊢ (q1, bb, aaZ0) ⊢ (q1, b, aZ0) ⊢ (q1, ε, Z0) ⊢ (qf, ε, Z0)
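The IDs above can be generated by a small simulator that explores IDs breadth-first, following every applicable move. This Python sketch is only an illustration: it writes the stack as a string with the top symbol first, uses Z for Z0 and "" for ε, and the names `DELTA`, `moves` and `accepts_by_final_state` are hypothetical. The search terminates for this machine because it only pushes when it consumes an input symbol.

```python
# Breadth-first simulation of a PDA over instantaneous descriptions (sketch).
# An ID is (state, remaining input, stack); the stack string is written top first,
# "Z" stands for Z0 and "" stands for ε.
from collections import deque

DELTA = {  # (state, input symbol or "", stack top) -> [(new state, string replacing the top)]
    ("q0", "a", "Z"): [("q0", "aZ")],
    ("q0", "a", "a"): [("q0", "aa")],
    ("q0", "b", "a"): [("q1", "")],
    ("q1", "b", "a"): [("q1", "")],
    ("q1", "", "Z"): [("qf", "Z")],
    ("q0", "", "Z"): [("qf", "Z")],
}
FINAL = {"qf"}


def moves(state, rest, stack):
    """Yield every ID reachable from (state, rest, stack) in one move."""
    if not stack:
        return
    top, below = stack[0], stack[1:]
    for p, push in DELTA.get((state, "", top), []):          # ε-moves
        yield p, rest, push + below
    if rest:
        for p, push in DELTA.get((state, rest[0], top), []):  # consume one symbol
            yield p, rest[1:], push + below


def accepts_by_final_state(w, start="q0"):
    """Return the IDs of an accepting run on w, or None if w is rejected."""
    first = (start, w, "Z")
    parent = {first: None}
    queue = deque([first])
    while queue:
        current = queue.popleft()
        state, rest, _ = current
        if rest == "" and state in FINAL:
            path = []
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for nxt in moves(*current):
            if nxt not in parent:
                parent[nxt] = current
                queue.append(nxt)
    return None


for q, rest, stack in accepts_by_final_state("aaabbb"):
    print(f"({q}, {rest or 'ε'}, {stack.replace('Z', 'Z0') or 'ε'})")
```

The printed run matches the IDs listed above. Strings such as aab or abab are rejected: every branch of the search gets stuck before reaching qf with the input fully consumed.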


Design a PDA to accept the language L = { aⁿbⁿ | n ≥ 0 } by the empty stack method.

Note: The procedure remains the same as in the previous problem; only the final transitions change. Once the end of the input string is encountered (ε), the stack should be emptied by popping Z0. Here the final state is irrelevant.
Transition function for the PDA to accept L = { aⁿbⁿ | n ≥ 0 } by empty stack is given by:
δ(q0, a, Z0) = (q0, aZ0)
δ(q0, a, a) = (q0, aa)
δ(q0, b, a) = (q1, ε)
δ(q1, b, a) = (q1, ε)
δ(q1, ε, Z0) = (q2, ε)
δ(q0, ε, Z0) = (q2, ε)
Q = { q0, q1, q2 }, Σ = { a, b }, Γ = { a, Z0 }, q0 is the start state, Z0 is the initial stack symbol and F = { q2 }.
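For the empty-stack version only the acceptance test changes: the input must be fully consumed and the stack must be empty, whatever state the PDA is in. A minimal Python sketch, assuming a stack written as a string with the top symbol first, Z for Z0 and "" for ε; the function name `accepts_by_empty_stack` is hypothetical.

```python
# Sketch of acceptance by empty stack: the input must be consumed and the stack
# emptied; the state reached does not matter. Stack string is written top first,
# "Z" stands for Z0 and "" for ε.
from collections import deque

DELTA = {
    ("q0", "a", "Z"): [("q0", "aZ")],
    ("q0", "a", "a"): [("q0", "aa")],
    ("q0", "b", "a"): [("q1", "")],
    ("q1", "b", "a"): [("q1", "")],
    ("q1", "", "Z"): [("q2", "")],      # pop Z0: the stack becomes empty
    ("q0", "", "Z"): [("q2", "")],      # w = ε
}


def accepts_by_empty_stack(w, start="q0"):
    seen = {(start, w, "Z")}
    queue = deque(seen)
    while queue:
        state, rest, stack = queue.popleft()
        if rest == "" and stack == "":
            return True
        if not stack:                   # no move is possible on an empty stack
            continue
        top, below = stack[0], stack[1:]
        successors = [(p, rest, push + below)
                      for p, push in DELTA.get((state, "", top), [])]
        if rest:
            successors += [(p, rest[1:], push + below)
                           for p, push in DELTA.get((state, rest[0], top), [])]
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


print([w for w in ["", "ab", "aabb", "aab", "abb", "ba"] if accepts_by_empty_stack(w)])
# ['', 'ab', 'aabb']
```

Because q2 has no outgoing transitions and the stack is empty there, any branch that pops Z0 before the input is finished simply dies, which is why premature ε-moves do not cause false acceptances.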
Design a PDA to accept the language L = { aⁿb²ⁿ | n ≥ 0 }. Draw the transition diagram and also write the moves made by the PDA for the string "aabbbb".

Procedure: Since the language contains strings of n a's followed by 2n b's, the machine can read the n a's in the start state. For each input symbol a, push two a's onto the stack. When the machine encounters the input symbol b, there should be a corresponding a on the stack for each b, so one a is popped for every b. Finally, if there is no more input (ε) and only Z0 is left on the stack, the string scanned has n a's followed by 2n b's.

PDA to accept L = { aⁿb²ⁿ | n ≥ 0 } is given by:

P = (Q, Σ, Γ, δ, q0, Z0, F) where δ is given by
δ(q0, a, Z0) = (q0, aaZ0)
δ(q0, a, a) = (q0, aaa)
δ(q0, b, a) = (q1, ε)
δ(q1, b, a) = (q1, ε)
δ(q1, ε, Z0) = (qf, Z0)
δ(q0, ε, Z0) = (qf, Z0)
Q = { q0, q1, qf }, Σ = { a, b }, Γ = { a, Z0 }, q0 is the start state, Z0 is the initial stack symbol and F = { qf }.

Graphical representation (transition diagram):

Moves made by PDA for the string "aabbbb":

(q0, aabbbb, Z0) ⊢ (q0, abbbb, aaZ0) ⊢ (q0, bbbb, aaaaZ0) ⊢ (q1, bbb, aaaZ0) ⊢ (q1, bb, aaZ0) ⊢ (q1, b, aZ0) ⊢ (q1, ε, Z0) ⊢ (qf, ε, Z0)

Design a PDA to accept the language L = { 0²ⁿ1ⁿ | n ≥ 1 }. Draw the transition diagram for the constructed PDA. Also show the moves made by the PDA for the string "000011".
Procedure: Since the language contains strings of 2n 0's followed by n 1's, the machine reads the 2n 0's in the start state q0, pushing every scanned input symbol 0 onto the stack. When it reads a 1 in q0, it changes to state q1 and pops one 0 from the stack. In state q1, without consuming any input (ε), it changes to state q2 and pops another 0 from the stack. In state q2, on reading a 1, it changes to state q1 and pops one 0 from the stack, and this process is repeated, so that every 1 removes two 0's. When there is no more input (ε) in state q2 and only Z0 is left on the stack, the machine changes to the final state qf. This indicates that the string scanned has 2n 0's followed by n 1's.
PDA to accept L = { 0²ⁿ1ⁿ | n ≥ 1 } is given by:
P = (Q, Σ, Γ, δ, q0, Z0, F) where δ is given by
δ(q0, 0, Z0) = (q0, 0Z0)
δ(q0, 0, 0) = (q0, 00)
δ(q0, 1, 0) = (q1, ε)
δ(q1, ε, 0) = (q2, ε)
δ(q2, 1, 0) = (q1, ε)
δ(q2, ε, Z0) = (qf, Z0)
Q = { q0, q1, q2, qf }, Σ = { 0, 1 }, Γ = { 0, Z0 }, q0 is the start state, Z0 is the initial stack symbol and F = { qf }.
Transition diagram:

Moves made by PDA for the string "000011":

(q0, 000011, Z0) ⊢ (q0, 00011, 0Z0) ⊢ (q0, 0011, 00Z0) ⊢ (q0, 011, 000Z0) ⊢ (q0, 11, 0000Z0) ⊢ (q1, 1, 000Z0) ⊢ (q2, 1, 00Z0) ⊢ (q1, ε, 0Z0) ⊢ (q2, ε, Z0) ⊢ (qf, ε, Z0)
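Because this PDA has at most one applicable move in every ID, it can be run with a plain loop instead of a search. The following Python sketch, with the stack written as a string (top first) and Z standing for Z0, replays the moves for "000011"; the function name `run` is hypothetical.

```python
# Minimal sketch: run the PDA for { 0²ⁿ1ⁿ | n ≥ 1 } step by step and record its IDs.
# This machine has at most one applicable move in every ID, so a simple loop
# suffices. "Z" stands for Z0; the stack string is written top first.
DELTA = {
    ("q0", "0", "Z"): ("q0", "0Z"),
    ("q0", "0", "0"): ("q0", "00"),
    ("q0", "1", "0"): ("q1", ""),
    ("q1", "", "0"): ("q2", ""),
    ("q2", "1", "0"): ("q1", ""),
    ("q2", "", "Z"): ("qf", "Z"),
}


def run(w, state="q0", stack="Z"):
    """Return (accepted, list of IDs visited)."""
    ids = [(state, w, stack)]
    while stack:
        top, below = stack[0], stack[1:]
        if (state, "", top) in DELTA:                   # ε-move
            state, push = DELTA[(state, "", top)]
        elif w and (state, w[0], top) in DELTA:         # consume one input symbol
            state, push = DELTA[(state, w[0], top)]
            w = w[1:]
        else:
            break                                       # no move: the PDA halts
        stack = push + below
        ids.append((state, w, stack))
    return state == "qf" and w == "", ids


accepted, ids = run("000011")
for q, rest, stack in ids:
    print(f"({q}, {rest or 'ε'}, {stack.replace('Z', 'Z0')})")
```

The run visits exactly ten IDs, each appearing once; strings such as 0001 or 00011 halt in q2 or q1 without reaching qf.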

Design a PDA to accept the language L = { w | w ∈ (a + b)* and Na(w) = Nb(w) }. Draw the transition diagram for the constructed PDA. Also show the moves made by the PDA for the string "abbaaabb".
Procedure: The first scanned input symbol is either a or b; push that symbol onto the stack. From this point on, if the scanned input symbol and the top-of-stack symbol are the same, push the current input symbol onto the stack. If the input symbol and the top-of-stack symbol are different, pop one symbol from the stack, and repeat the process. Finally, when the end of the string is encountered, if only Z0 is left on the stack, the string w has an equal number of a's and b's; otherwise the numbers of a's and b's are different.

PDA to accept L = { w | w ∈ (a + b)* and Na(w) = Nb(w) } is given by:

P = (Q, Σ, Γ, δ, q0, Z0, F) where δ is given by
δ(q0, a, Z0) = (q0, aZ0)
δ(q0, b, Z0) = (q0, bZ0)
δ(q0, a, a) = (q0, aa)
δ(q0, b, b) = (q0, bb)
δ(q0, a, b) = (q0, ε)
δ(q0, b, a) = (q0, ε)
δ(q0, ε, Z0) = (qf, Z0)
Q = { q0, qf }, Σ = { a, b }, Γ = { a, b, Z0 }, q0 is the start state, Z0 is the initial stack symbol and F = { qf }.

Transition diagram:

Moves made by PDA for the string "abbaaabb":

(q0, abbaaabb, Z0) ⊢ (q0, bbaaabb, aZ0) ⊢ (q0, baaabb, Z0) ⊢ (q0, aaabb, bZ0) ⊢ (q0, aabb, Z0) ⊢ (q0, abb, aZ0) ⊢ (q0, bb, aaZ0) ⊢ (q0, b, aZ0) ⊢ (q0, ε, Z0) ⊢ (qf, ε, Z0)

Design a PDA to accept the language L = { w | w ∈ (a + b)* and Na(w) > Nb(w) }. Draw the transition diagram for the constructed PDA. Also show the moves made by the PDA for the string "baaabbaa".
Note: The procedure remains the same as in the previous problem; only the final transitions change. Once the end of the input string is encountered (ε), the stack should contain at least one a. From this point on, change to state q1 and keep popping the symbol a from the stack until only Z0 is left. Since the input is already consumed, go to the final state and accept the string.
PDA to accept L = { w | w ∈ (a + b)* and Na(w) > Nb(w) } is given by:
P = (Q, Σ, Γ, δ, q0, Z0, F) where δ is given by
δ(q0, a, Z0) = (q0, aZ0)
δ(q0, b, Z0) = (q0, bZ0)
δ(q0, a, a) = (q0, aa)
δ(q0, b, b) = (q0, bb)
δ(q0, a, b) = (q0, ε)
δ(q0, b, a) = (q0, ε)
δ(q0, ε, a) = (q1, ε)
δ(q1, ε, a) = (q1, ε)
δ(q1, ε, Z0) = (qf, Z0)
Q = { q0, q1, qf }, Σ = { a, b }, Γ = { a, b, Z0 }, q0 is the start state, Z0 is the initial stack symbol and F = { qf }.

Transition Diagram:

Moves made by PDA for the string "baaabbaa":

(q0, baaabbaa, Z0) ⊢ (q0, aaabbaa, bZ0) ⊢ (q0, aabbaa, Z0) ⊢ (q0, abbaa, aZ0) ⊢ (q0, bbaa, aaZ0) ⊢ (q0, baa, aZ0) ⊢ (q0, aa, Z0) ⊢ (q0, a, aZ0) ⊢ (q0, ε, aaZ0) ⊢ (q1, ε, aZ0) ⊢ (q1, ε, Z0) ⊢ (qf, ε, Z0)
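Because the ε-moves δ(q0, ε, a) and δ(q1, ε, a) make this PDA nondeterministic, a simulator has to follow all possible moves. The Python sketch below does this breadth-first (stack written top first, Z for Z0, "" for ε; the name `accepts` is hypothetical) and then compares the machine with the definition Na(w) > Nb(w) on every string of length at most 8.

```python
# Breadth-first simulation of the nondeterministic PDA for Na(w) > Nb(w) (sketch).
# Stack string is written top first; "Z" stands for Z0 and "" for ε.
from collections import deque
from itertools import product

DELTA = {
    ("q0", "a", "Z"): [("q0", "aZ")], ("q0", "b", "Z"): [("q0", "bZ")],
    ("q0", "a", "a"): [("q0", "aa")], ("q0", "b", "b"): [("q0", "bb")],
    ("q0", "a", "b"): [("q0", "")],   ("q0", "b", "a"): [("q0", "")],
    ("q0", "", "a"): [("q1", "")],    ("q1", "", "a"): [("q1", "")],   # pop surplus a's
    ("q1", "", "Z"): [("qf", "Z")],
}
FINAL = {"qf"}


def accepts(w, start="q0"):
    """True if some run of the PDA consumes w and ends in a final state."""
    seen = {(start, w, "Z")}
    queue = deque(seen)
    while queue:
        state, rest, stack = queue.popleft()
        if rest == "" and state in FINAL:
            return True
        if not stack:
            continue
        top, below = stack[0], stack[1:]
        successors = [(p, rest, push + below)
                      for p, push in DELTA.get((state, "", top), [])]
        if rest:
            successors += [(p, rest[1:], push + below)
                           for p, push in DELTA.get((state, rest[0], top), [])]
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Compare the PDA with the definition on every string over {a, b} of length ≤ 8.
for n in range(9):
    for letters in product("ab", repeat=n):
        w = "".join(letters)
        assert accepts(w) == (w.count("a") > w.count("b")), w
print("PDA agrees with Na(w) > Nb(w) for all strings of length ≤ 8")
```

An ε-move into q1 taken before the input is finished leads nowhere, since q1 has no transitions that read input; so only the run that moves to q1 at the end of the input can accept.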

Design a PDA to accept the language L = { w | w ∈ (a + b)* and Na(w) < Nb(w) }. Draw the transition diagram for the constructed PDA. Also show the moves made by the PDA for the string "aabbbbab".
Note: The procedure remains the same as in the previous problem; only the final transitions change. Once the end of the input string is encountered (ε), the stack should contain at least one b. From this point on, change to state q1 and keep popping the symbol b from the stack until only Z0 is left. Since the input is already consumed, go to the final state and accept the string.
PDA to accept L = { w | w ∈ (a + b)* and Na(w) < Nb(w) } is given by:
P = (Q, Σ, Γ, δ, q0, Z0, F) where δ is given by
δ(q0, a, Z0) = (q0, aZ0)
δ(q0, b, Z0) = (q0, bZ0)
δ(q0, a, a) = (q0, aa)
δ(q0, b, b) = (q0, bb)
δ(q0, a, b) = (q0, ε)
δ(q0, b, a) = (q0, ε)
δ(q0, ε, b) = (q1, ε)
δ(q1, ε, b) = (q1, ε)
δ(q1, ε, Z0) = (qf, Z0)
Q = { q0, q1, qf }, Σ = { a, b }, Γ = { a, b, Z0 }, q0 is the start state, Z0 is the initial stack symbol and F = { qf }.

Transition diagram:

Moves made by PDA for the string "aabbbbab":

(q0, aabbbbab, Z0) ⊢ (q0, abbbbab, aZ0) ⊢ (q0, bbbbab, aaZ0) ⊢ (q0, bbbab, aZ0) ⊢ (q0, bbab, Z0) ⊢ (q0, bab, bZ0) ⊢ (q0, ab, bbZ0) ⊢ (q0, b, bZ0) ⊢ (q0, ε, bbZ0) ⊢ (q1, ε, bZ0) ⊢ (q1, ε, Z0) ⊢ (qf, ε, Z0)

Design a PDA to accept the language L = { wCwᴿ | w ∈ (a + b)* }. Draw the transition diagram and also write the moves made by the PDA for the string "baaCaab".
Procedure: To check for this palindrome structure, push all scanned input symbols onto the stack until the letter C is encountered. Once we pass the middle symbol C, if the string is of the form wCwᴿ, then for each scanned input symbol there should be a corresponding symbol (the same as the input symbol) on top of the stack, which is popped. Finally, if there is no more input and only Z0 is left on the stack, the given string is of the form wCwᴿ and is accepted by the PDA.
PDA to accept L = { wCwᴿ | w ∈ (a + b)* } is given by:
P = (Q, Σ, Γ, δ, q0, Z0, F) where δ is given by
δ(q0, a, Z0) = (q0, aZ0)
δ(q0, b, Z0) = (q0, bZ0)
δ(q0, a, a) = (q0, aa)
δ(q0, b, b) = (q0, bb)
δ(q0, a, b) = (q0, ab)
δ(q0, b, a) = (q0, ba)
δ(q0, C, a) = (q1, a)
δ(q0, C, b) = (q1, b)
δ(q1, a, a) = (q1, ε)
δ(q1, b, b) = (q1, ε)
δ(q0, C, Z0) = (q1, Z0) ; for w = ε
δ(q1, ε, Z0) = (qf, Z0)
Q = { q0, q1, qf }, Σ = { a, b, C }, Γ = { a, b, Z0 }, q0 is the start state, Z0 is the initial stack symbol and F = { qf }.
Transition diagram:

Moves made by PDA for the string "baaCaab":

(q0, baaCaab, Z0) ⊢ (q0, aaCaab, bZ0) ⊢ (q0, aCaab, abZ0) ⊢ (q0, Caab, aabZ0) ⊢ (q1, aab, aabZ0) ⊢ (q1, ab, abZ0) ⊢ (q1, b, bZ0) ⊢ (q1, ε, Z0) ⊢ (qf, ε, Z0)

Design an NPDA to accept the language L = { wwR | w € ( a+b)* } . Draw the transition diagram and
also write the moves made by PDA for the string “baaaab”.
Procedure: To check for palindrome, let us push all scanned input symbols onto the stack till we
encounter the midpoint. Once we pass the middle string, if the string is palindrome, for each scanned
input symbol, there should be a corresponding symbol (same as input symbol) on the stack. Finally if
there is no input and stack is empty, we say that the given string is palindrome.

PDA to accept L = { wwR | w € ( a+b)* } is given by:


P = ( Q, Σ, Γ , δ, q0, Z0, F ) where δ is given by
δ(q0, a, Z0) = (q0, aZ0)
δ(q0, b, Z0) = (q0, bZ0)
δ(q0, a, a) = (q0, aa)
δ(q0, b, b) = (q0, bb )
δ(q0, a, b) = (q0, ab)
δ(q0, b, a) = (q0, ba)
δ(q0, ε, a) = (q1, a) ; for midpoint of wwR

ATHMARANJAN K, DEPARTMENT OF ISE, SIT MANGALURU. Page 226


Automata Theory & Compiler Design 21CS51 Module 4

δ(q0, ε, b) = (q1, b)
δ(q1, a, a) = (q1, ε)
δ(q1, b, b) = (q1, ε)
δ(q1, ε, Z0) = (qf, Z0)
δ(q0, ε, Z0) = (qf, Z0) ; for w= ε
Q = { q0, q1, qf }, q0 is the start state, Z0 is the initial stack symbol
Σ = { a, b }, Γ = { a, b, Z0 } and F = { qf }
Moves made by the PDA for the string “baaaab”:
(q0, baaaab, Z0) ⊢ (q0, aaaab, bZ0) ⊢ (q0, aaab, abZ0) ⊢ (q0, aab, aabZ0) ⊢ (q1, aab, aabZ0) ⊢ (q1, ab, abZ0)
⊢ (q1, b, bZ0) ⊢ (q1, ε, Z0) ⊢ (qf, ε, Z0)
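Because the midpoint of ww^R has to be guessed, a simulator must explore every choice the NPDA can make. The sketch below does a breadth-first search over instantaneous descriptions (state, unread input, stack) and accepts if any branch reaches qf with the input consumed. As a labelled assumption, "Z" stands for Z0, the leftmost character of the stack string is the top, and "" denotes ε; DELTA and accepts are illustrative names.

```python
from collections import deque

# Nondeterministic transition relation for L = { w w^R | w in (a+b)* }:
# (state, input symbol or "" for epsilon, top) -> set of (next state, replacement for top)
DELTA = {
    ("q0", "a", "Z"): {("q0", "aZ")}, ("q0", "b", "Z"): {("q0", "bZ")},
    ("q0", "a", "a"): {("q0", "aa")}, ("q0", "b", "b"): {("q0", "bb")},
    ("q0", "a", "b"): {("q0", "ab")}, ("q0", "b", "a"): {("q0", "ba")},
    ("q0", "", "a"): {("q1", "a")},   ("q0", "", "b"): {("q1", "b")},  # guess the midpoint
    ("q1", "a", "a"): {("q1", "")},   ("q1", "b", "b"): {("q1", "")},
    ("q1", "", "Z"): {("qf", "Z")},
    ("q0", "", "Z"): {("qf", "Z")},                                    # w = epsilon
}


def accepts(w):
    """Breadth-first search over IDs (state, unread input, stack)."""
    start = ("q0", w, "Z")
    seen, queue = {start}, deque([start])
    while queue:
        state, rest, stack = queue.popleft()
        if state == "qf" and not rest:
            return True                                   # some branch accepts
        top, below = stack[:1], stack[1:]
        moves = []
        if rest:                                          # moves that read one symbol
            moves += [(m, rest[1:]) for m in DELTA.get((state, rest[0], top), ())]
        moves += [(m, rest) for m in DELTA.get((state, "", top), ())]  # epsilon moves
        for (nxt, push), unread in moves:
            config = (nxt, unread, push + below)
            if config not in seen:
                seen.add(config)
                queue.append(config)
    return False


print(accepts("baaaab"), accepts("aba"))  # True False
```

The search always terminates here because the stack grows only when an input symbol is consumed, so there are finitely many reachable IDs.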
Transition diagram:

Design a PDA to accept the language L = { 0^n 1^m 0^n | m, n ≥ 1 }. Draw the transition diagram and also write the moves made by the PDA for the string “0011100”.
Procedure: In the start state q0, the machine reads the n '0's and pushes each scanned '0' onto the stack. When it reads a '1' in q0, it changes to state q1 without altering the stack. In q1 the machine reads the remaining '1's and ignores them. When it reads a '0' in q1, each scanned '0' must be matched by a '0' on the stack, so the machine changes to state q2 and pops one '0'; in q2 it pops one '0' for each further '0' read. Finally, if the input is exhausted and only Z0 remains on the stack, the string w consists of n '0's followed by m '1's followed by n '0's, and it is accepted.
PDA to accept L = { 0^n 1^m 0^n | m, n ≥ 1 } is given by:
P = ( Q, Σ, Γ , δ, q0, Z0, F ) where δ is given by
δ(q0, 0, Z0) = (q0, 0Z0)
δ(q0, 0, 0) = (q0, 00)

δ(q0, 1, 0) = (q1, 0)
δ(q1, 1, 0) = (q1, 0)
δ(q1, 0, 0) = (q2, ε)
δ(q2, 0, 0) = (q2, ε)
δ(q2, ε, Z0) = (qf, Z0)
Q = { q0, q1, q2, qf }, q0 is the start state, Z0 is the initial stack symbol
Σ = { 0, 1 }, Γ = { 0, Z0 } and F = { qf }
Moves made by the PDA for the string “0011100”:
(q0, 0011100, Z0) ⊢ (q0, 011100, 0Z0) ⊢ (q0, 11100, 00Z0) ⊢ (q1, 1100, 00Z0) ⊢ (q1, 100, 00Z0)
⊢ (q1, 00, 00Z0) ⊢ (q2, 0, 0Z0) ⊢ (q2, ε, Z0) ⊢ (qf, ε, Z0)
Transition diagram:

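Design a PDA to accept the language L = { a^i b^j c^k | i + j = k and i, j, k ≥ 1 }.
Procedure: In q0 the machine pushes every 'a' onto the stack; on the first 'b' it moves to q1 and pushes every 'b' as well. Each 'c' then pops one symbol: the first 'c' moves the machine to q2, where it pops the 'b's, and the first 'c' read with an 'a' on top moves it to q3, where it pops the remaining 'a's. If the input is exhausted in q3 and only Z0 remains on the stack, the number of 'c's equals i + j and the string is accepted.
PDA to accept L is given by P = ( Q, Σ, Γ, δ, q0, Z0, F ) where δ is given by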
δ(q0, a, Z0) = (q0, aZ0)
δ(q0, a, a) = (q0, aa)
δ(q0, b, a) = (q1, ba)
δ(q1, b, b) = (q1, bb)

δ(q1, c, b) = (q2, ε)
δ(q2, c, b) = (q2, ε)
δ(q2, c, a) = (q3, ε)
δ(q3, c, a) = (q3, ε)
δ(q3, ε, Z0) = (qf, Z0 )
Q = { q0, q1, q2,q3, qf }, q0 is the start state, Z0 is the initial stack symbol
Σ = { a,b,c}, Γ = {a, b, Z0 } and F = { qf }
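Since the notes give no move sequence for this PDA, a quick simulation helps check the table. This is a minimal Python sketch that runs the transitions above deterministically (the machine never has a choice of moves). Assumptions: "Z" stands for Z0, the leftmost character of the stack string is its top, and "" denotes an ε-move; accepts is a hypothetical helper.

```python
# Transitions of the PDA for L = { a^i b^j c^k | i + j = k, i, j >= 1 }.
DELTA = {
    ("q0", "a", "Z"): ("q0", "aZ"), ("q0", "a", "a"): ("q0", "aa"),
    ("q0", "b", "a"): ("q1", "ba"), ("q1", "b", "b"): ("q1", "bb"),
    ("q1", "c", "b"): ("q2", ""),   ("q2", "c", "b"): ("q2", ""),
    ("q2", "c", "a"): ("q3", ""),   ("q3", "c", "a"): ("q3", ""),
    ("q3", "", "Z"): ("qf", "Z"),
}


def accepts(w):
    state, stack = "q0", "Z"
    while True:
        top = stack[:1]
        if w and (state, w[0], top) in DELTA:   # consume one input symbol
            state, push = DELTA[(state, w[0], top)]
            w = w[1:]
        elif (state, "", top) in DELTA:          # epsilon move
            state, push = DELTA[(state, "", top)]
        else:
            return state == "qf" and not w      # halt: accept by final state
        stack = push + stack[1:]


print(accepts("aabccc"), accepts("abc"))  # True False
```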

Design a PDA to accept the language L = { 0^n 1^m 0^m 1^n | m, n ≥ 1 }. Draw the transition diagram and also write the moves made by the PDA for the string “0011100011”.
Procedure: In the start state q0, the machine reads the n '0's and pushes each scanned '0' onto the stack. When it reads a '1' in q0, it changes to state q1 and pushes that '1' onto the stack. In q1 it reads the remaining '1's, pushing each one. When it reads a '0' in q1, each scanned '0' must be matched by a '1' on the stack, so it changes to state q2 and pops one '1'. In q2 it continues reading '0's, popping one '1' for each. When it reads a '1' in q2, each scanned '1' must be matched by a '0' on the stack, so it changes to state q3 and pops one '0'. In q3 it reads the remaining '1's, popping one '0' for each. Finally, if the input is exhausted in q3 and only Z0 remains on the stack, w consists of n '0's followed by m '1's followed by m '0's followed by n '1's, and it is accepted.

PDA to accept L = { 0^n 1^m 0^m 1^n | m, n ≥ 1 } is given by:


P = ( Q, Σ, Γ , δ, q0, Z0, F ) where δ is given by
δ(q0, 0, Z0) = (q0, 0Z0)
δ(q0, 0, 0) = (q0, 00)
δ(q0, 1, 0) = (q1, 10)
δ(q1, 1, 1) = (q1, 11)

δ(q1, 0, 1) = (q2, ε)
δ(q2, 0, 1) = (q2, ε)
δ(q2, 1, 0) = (q3, ε)
δ(q3, 1, 0) = (q3, ε)
δ(q3, ε, Z0) = (qf, Z0)
Q = { q0, q1, q2,q3, qf }, q0 is the start state, Z0 is the initial stack symbol
Σ = { 0, 1 }, Γ = { 0, 1, Z0 } and F = { qf }
Moves made by the PDA for the string “0011100011”:
(q0, 0011100011, Z0) ⊢ (q0, 011100011, 0Z0) ⊢ (q0, 11100011, 00Z0) ⊢ (q1, 1100011, 100Z0)
⊢ (q1, 100011, 1100Z0) ⊢ (q1, 00011, 11100Z0) ⊢ (q2, 0011, 1100Z0) ⊢ (q2, 011, 100Z0) ⊢ (q2, 11, 00Z0)
⊢ (q3, 1, 0Z0) ⊢ (q3, ε, Z0) ⊢ (qf, ε, Z0)
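The sketch below replays this transition table in Python and records every instantaneous description, so the twelve-step move sequence for “0011100011” can be checked. It assumes "Z" stands for Z0 and that the leftmost character of the stack string is the top; run is an illustrative name, not part of the notes.

```python
# Transitions of the PDA for L = { 0^n 1^m 0^m 1^n | m, n >= 1 }.
DELTA = {
    ("q0", "0", "Z"): ("q0", "0Z"), ("q0", "0", "0"): ("q0", "00"),
    ("q0", "1", "0"): ("q1", "10"), ("q1", "1", "1"): ("q1", "11"),
    ("q1", "0", "1"): ("q2", ""),   ("q2", "0", "1"): ("q2", ""),
    ("q2", "1", "0"): ("q3", ""),   ("q3", "1", "0"): ("q3", ""),
    ("q3", "", "Z"): ("qf", "Z"),
}


def run(w):
    """Return (accepted, trace of IDs) for input w."""
    state, stack = "q0", "Z"
    trace = [(state, w or "ε", stack)]
    while True:
        top = stack[:1]
        if w and (state, w[0], top) in DELTA:   # read one input symbol
            state, push = DELTA[(state, w[0], top)]
            w = w[1:]
        elif (state, "", top) in DELTA:          # epsilon move
            state, push = DELTA[(state, "", top)]
        else:
            break
        stack = push + stack[1:]
        trace.append((state, w or "ε", stack))
    return state == "qf" and not w, trace


accepted, trace = run("0011100011")
print(accepted)  # True
print(" ⊢ ".join(f"({q}, {x}, {s})" for q, x, s in trace))
```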
Graphical representation of PDA (Transition diagram):


BOTTOM - UP PARSING
What is bottom – up parsing?
A bottom-up parser constructs the parse tree for the given input string starting from the leaves and working up towards the root (the start symbol).
In effect, a bottom-up parser traces out a rightmost derivation of the given input in reverse order.
Example:

E → E + T | T
T → T * F | F
F → ( E ) | id
Construct a bottom-up parse tree for the input string id * id

This bottom-up construction of the parse tree corresponds to the rightmost derivation (RMD) of the input string, traced in reverse order.
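For the input id * id, the rightmost derivation is written out below. Reading it from right to left gives the reductions id * id, F * id, T * id, T * F, T, E, which is the order in which a bottom-up parser builds the tree.

```latex
E \underset{rm}{\Rightarrow} T
  \underset{rm}{\Rightarrow} T * F
  \underset{rm}{\Rightarrow} T * \mathrm{id}
  \underset{rm}{\Rightarrow} F * \mathrm{id}
  \underset{rm}{\Rightarrow} \mathrm{id} * \mathrm{id}
```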

REDUCTIONS
What is reduction?
Reduction is the reverse of a step in a derivation: a substring of the input that matches the RHS (body) of a production is replaced by the non-terminal at the LHS (head) of that production.
We can think of bottom-up parsing as the process of “reducing” a string w to the start symbol of the grammar. At each reduction step, a specific substring matching the body of a production is replaced by the non-terminal at the head of that production.
The key decisions during bottom-up parsing are:
i. When to reduce a substring of the input.
ii. Which production to apply, as the parse proceeds.
For the example above, the reductions can be described by the following sequence of strings:
id * id, F * id, T * id, T * F, T, E
The sequence starts with the input id * id.
The first reduction produces F * id by reducing the leftmost id to F, using the production F → id.
The second reduction produces T * id by reducing F to T, using T → F.
Now we have a choice between reducing the string T, which is the body of E → T, and the string consisting of the second id, which is the body of F → id. Rather than reduce T to E, the parser reduces the second id to F, resulting in the string T * F. This string is then reduced to T, and the parse completes with the reduction of T to the start symbol E.

HANDLE
Define handle with an example.
OR
For the following grammar indicate the handle for the right sentential form id1 * id2
A handle of a right-sentential form is a substring that matches the right side of a production and whose reduction to the head of that production represents one step of the rightmost derivation in reverse.
Not every substring that matches the right side of a production rule is a handle.
Example:
Consider the parse of the input id1 * id2 according to the grammar:
E → E + T | T
T → T * F | F
F → ( E ) | id
Reduction sequence: id1 * id2, F * id2, T * id2, T * F, T, E

RIGHT SENTENTIAL FORM    HANDLE    REDUCING PRODUCTION
id1 * id2                id1       F → id
F * id2                  F         T → F
T * id2                  id2       F → id
T * F                    T * F     T → T * F
T                        T         E → T

NOTE: The symbol T is not a handle in the sentential form T * id2. If T were indeed replaced by
E, we would get the string E * id2, which cannot be derived from start symbol E.
If the grammar is unambiguous, then every right-sentential form of the grammar has exactly one
handle.
HANDLE PRUNING:
What is handle pruning? Give a bottom-up parse for the input aaa*a++ using the grammar:
S → SS+ | SS* | a
Bottom-up parsing repeatedly detects the handle of the current right-sentential form and, whenever a handle is detected, performs the corresponding reduction. This is equivalent to performing a rightmost derivation in reverse and is called “handle pruning”.
Bottom-up parse for the input aaa*a++:

The sequence of strings during the reductions is:


aaa*a++, Saa*a++, SSa*a++, SSS*a++, SSa++, SSS++, SS+, S
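For this particular grammar, handle pruning can be sketched as a greedy shift-reduce loop: shift one symbol, then keep reducing while the top of the stack is a, SS+ or SS*. This greedy rule is an assumption that happens to work for this postfix grammar (a is always a complete right side, and + or * only ever closes SS+ or SS*); it is not a general method for finding handles. The sketch records each right-sentential form as the stack contents followed by the unread input; handle_prune is a hypothetical name.

```python
def handle_prune(w):
    """Greedy shift-reduce for S -> SS+ | SS* | a; returns (accepted, right-sentential forms)."""
    stack, forms = [], [w]
    for i, symbol in enumerate(w):
        stack.append(symbol)                     # shift
        unread = w[i + 1:]
        while True:
            if stack[-1:] == ["a"]:              # handle a: reduce by S -> a
                stack[-1:] = ["S"]
            elif stack[-3:] in (["S", "S", "+"], ["S", "S", "*"]):
                stack[-3:] = ["S"]               # handle SS+ or SS*: reduce to S
            else:
                break                            # no handle on top: shift again
            forms.append("".join(stack) + unread)
    return stack == ["S"], forms


accepted, forms = handle_prune("aaa*a++")
print(accepted)          # True
print(", ".join(forms))  # aaa*a++, Saa*a++, SSa*a++, SSS*a++, SSa++, SSS++, SS+, S
```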
Give a bottom-up parse for the string 000111 using the grammar S → 0S1 | 01, and construct the parse tree at each step of the derivation.
The sequence of strings in the reduction process: 000111, 00S11, 0S1, S
Handles during the parse of the string 000111:
Right sentential form    Handle    Reducing production
000111                   01        S → 01
00S11                    0S1       S → 0S1
0S1                      0S1       S → 0S1

Bottom up parse tree:

SHIFT REDUCE PARSER:


A convenient way to implement a bottom up parser is to use a shift reduce technique.

What is shift reduce parser?


A shift-reduce parser is a form of bottom-up parser that tries to reduce the given input string to the start symbol. It uses two data structures: a stack, which holds grammar symbols, and an input buffer, which holds the rest of the string to be parsed.

Working Principle:
• During a left-to-right scan of the input string, the shift-reduce parser keeps shifting input symbols onto the stack until a handle appears on top of the stack.
• When a handle appears on top of the stack, the parser reduces it.
• The parser repeats this shift/reduce cycle until it detects an error, or until the stack contains the start symbol and the input is empty (successful parse).

NOTE: In bottom-up parsing we show the top of the stack on the right, rather than on the left as
we did for top down parsing.
ACTIONS OF SHIFT REDUCE PARSER
Explain with an example, the stack implementation of a shift reduce parser.
List and explain the actions of shift reduce parser.

Shift- reduce parser performs the following actions:


1. Shift: The next input symbol is shifted onto the top of the stack.
2. Reduce: The handle on top of the stack is replaced by the non-terminal at the head of the corresponding production.
3. Accept: Parsing is completed successfully.
4. Error: The parser discovers a syntax error and calls an error recovery routine.
Initially the stack holds only the bottom-of-stack marker $, and the input buffer holds the string w followed by $:
STACK    INPUT
$        w$
The parser announces successful completion of parsing on entering the following configuration:
STACK    INPUT
$S       $

Consider the grammar


E →E +E
E → E*E
E→ (E)
E→ id
Perform shift reduce parsing for the string id1 * id2 * id3

Configuration of shift reduce parser on input id1 *id2 * id3

STACK          INPUT                ACTION
$              id1 * id2 * id3 $    Shift id1
$id1           * id2 * id3 $        Reduce by E → id
$E             * id2 * id3 $        Shift *
$E *           id2 * id3 $          Shift id2
$E * id2       * id3 $              Reduce by E → id
$E * E         * id3 $              Reduce by E → E * E
$E             * id3 $              Shift *
$E *           id3 $                Shift id3
$E * id3       $                    Reduce by E → id
$E * E         $                    Reduce by E → E * E
$E             $                    ACCEPT

For the grammar S → 0S1 | 01, give shift reduce configuration on input string 000111
Shift reduce configuration for 000111
STACK INPUT ACTION
$ 000111 $ Shift 0
$0 00111 $ Shift 0
$00 0111 $ Shift 0
$000 111 $ Shift 1
$0001 11 $ Reduce by S→ 01
$00S 11 $ Shift 1
$00S1 1 $ Reduce by S→ 0S1
$0S 1 $ Shift 1
$0S1 $ Reduce by S→ 0S1
$S $ ACCEPT
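The configurations in this table can be regenerated with a small Python sketch. Its decision rule (reduce whenever 0S1 or 01 is on top of the stack, otherwise shift) is a hand-written assumption that suits this particular grammar; a real shift-reduce parser would take these decisions from a parsing table. shift_reduce is a hypothetical name.

```python
def shift_reduce(w):
    """Shift-reduce parse of w for S -> 0S1 | 01; returns (accepted, rows)."""
    stack, i, rows = [], 0, []
    while True:
        config = ("$" + "".join(stack), w[i:] + "$")   # STACK and INPUT columns
        if stack[-3:] == ["0", "S", "1"]:              # handle 0S1 on top
            rows.append(config + ("Reduce by S -> 0S1",))
            stack[-3:] = ["S"]
        elif stack[-2:] == ["0", "1"]:                 # handle 01 on top
            rows.append(config + ("Reduce by S -> 01",))
            stack[-2:] = ["S"]
        elif i < len(w):                               # no handle yet: shift
            rows.append(config + ("Shift " + w[i],))
            stack.append(w[i])
            i += 1
        elif stack == ["S"]:                           # $S with input $
            rows.append(config + ("Accept",))
            return True, rows
        else:
            rows.append(config + ("Error",))
            return False, rows


ok, rows = shift_reduce("000111")
for stack, inp, action in rows:
    print(f"{stack:<8}{inp:<10}{action}")
```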

Consider the following grammar and parse the given string using a shift-reduce parser.
E → E + T | T
T → T * F | F
F → (E) | id
String: “id + id * id”
Here we follow two rules:
1. If the incoming operator has higher priority than the operator in the stack, perform shift.
2. If the operator in the stack has the same or higher priority than the incoming operator, perform reduce.

Write the context free grammar and perform shift reduce parsing for the input int a, b, c;
Context free grammar for int id, id, id;
S → T L;
T → int
L → L, id | id
Configuration of shift reduce parser on input: int id, id, id;
STACK       INPUT              ACTION
$           int id, id, id;$   Shift int
$int        id, id, id;$       Reduce by T → int
$T          id, id, id;$       Shift id
$T id       , id, id;$         Reduce by L → id
$TL         , id, id;$         Shift ,
$TL,        id, id;$           Shift id
$TL, id     , id;$             Reduce by L → L, id
$TL         , id;$             Shift ,
$TL,        id;$               Shift id
$TL, id     ;$                 Reduce by L → L, id
$TL         ;$                 Shift ;
$TL;        $                  Reduce by S → T L;
$S          $                  Accept
CONFLICTS DURING SHIFT-REDUCE PARSING:
There are some context-free grammars for which shift-reduce parsing cannot be used. For such a grammar, every shift-reduce parser can reach a configuration in which, even knowing the entire stack contents and the next k input symbols, the parser cannot decide whether to shift or to reduce (a shift/reduce conflict), or cannot decide which of several reductions to apply (a reduce/reduce conflict).
Explain the conflicts that may occur during shift reduce parsing
The conflicts that may occur during shift-reduce parsing are:
i. Shift/Reduce
ii. Reduce/Reduce
Shift/Reduce Conflict:
The situation in which the parser cannot decide whether to shift or to reduce is called a shift/reduce conflict.
Example:
Statement → if expr then Statement | if expr then Statement else Statement | other
If a shift-reduce parser is in the configuration:
STACK                           INPUT
$ … if expr then Statement      else … $
then, depending on what follows the else on the input:
• it might be correct to reduce if expr then Statement to Statement, or
• it might be correct to shift else and then look for another Statement to complete the alternative if expr then Statement else Statement.
This shift/reduce conflict can be resolved by shifting else onto the stack, which associates each else with the closest preceding unmatched then.

Reduce/reduce conflict:
The situation in which the parser cannot decide which of several reductions to apply is called a reduce/reduce conflict.
Example:
E → E + id
E → id
Suppose the input string is: id + id
If a shift-reduce parser is in the configuration:
STACK       INPUT
$E + id     $
the parser can reduce id to E, or it can reduce E + id to E. This conflict can be resolved by reducing E + id to E.
The shift-reduce implementation does not tell us anything about the technique used for detecting handles. Depending on the technique used for detecting handles, we get different shift-reduce parsers:
i. Operator precedence parser: uses the precedence relationship between certain pairs of terminals to guide the selection of handles.
ii. LR parser: uses a DFA that recognizes the set of all viable prefixes, reading the stack from bottom to top to determine what handle, if any, is on top of the stack.
LR PARSER
What is LR parser? What is the meaning of L and R in LR grammars?
An LR parser is a shift-reduce parser that uses a DFA to recognize handles. It is based on LR(k) parsing, where L stands for left-to-right scanning of the input, R for constructing a rightmost derivation in reverse, and k for the number of lookahead input symbols used in making parsing decisions.
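Why is LR parsing attractive?
• LR parsers can be constructed to recognize virtually all programming-language constructs for which CFGs can be written.
• The LR parsing method is the most general non-backtracking shift-reduce parsing method known, yet it can be implemented as efficiently as other shift-reduce methods.
• An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-right scan of the input.
• The class of grammars that can be parsed using LR methods is a proper superset of the class of grammars that can be parsed with LL (predictive) methods.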

Drawback of LR methods:
It is too much work to construct an LR parser by hand for a typical programming-language grammar. However, parser generators such as YACC take a CFG as input and automatically produce a parser for that grammar.
Items or LR(0) items
How does a shift-reduce parser know when to shift and when to reduce? For example, with stack contents $T and next input symbol * in the following configuration:
STACK    INPUT
$T       * id $
how does the parser know that T on top of the stack is not a handle, so that the appropriate action is to shift and not to reduce T to E?
An LR parser makes shift-reduce decisions by maintaining states that keep track of where we are in a parse.
Define LR(0) item (Item).
An LR(0) item of a grammar G is a production rule of G with a dot placed at some position of the
right hand side of the rule.
Example: The production A → XYZ of a grammar G yields four LR(0) items:

A → .XYZ

A → X .YZ

A → XY .Z

A → XYZ .
The dot (.) indicates how much of the right hand side of the production is seen at a given point in
the parsing process.

Item A → .XYZ indicates we hope to see a string derivable from XYZ next on the input.

Item A → X.YZ indicates that we have just seen on the input a string derivable from X and that we hope next to see a string derivable from YZ.

Item A → XYZ . indicates that we have seen the body XYZ and that it may be time to reduce
XYZ to A (as a handle)

CANONICAL LR(0) COLLECTION:


The collection of sets of LR(0) items from which the LR(0) automaton is built is called the canonical LR(0) collection.
The states of the DFA are sets of items: each set of items in the canonical collection is one state of the DFA that recognizes viable prefixes.
Construction of the canonical LR(0) item sets for a grammar requires:
i. The augmented grammar
ii. The CLOSURE function
iii. The GOTO function
Augmented grammar:
For any grammar G, the augmented grammar G' is the grammar G with a new start symbol S' and the production S' → S.
The purpose of this new production is to indicate when the parser should stop parsing and announce acceptance of the input. Acceptance occurs when and only when the parser is about to reduce by S' → S.
Example:
Grammar G: Augmented grammar G’:

CLOSURE FUNCTION:
If I is a set of items for grammar G, then CLOSURE(I) is the set of items constructed from I by the
two rules:
i. Initially add every item in I to CLOSURE(I).

ii. If A → α.Bβ is in CLOSURE(I) and B → γ is a production, then add the item B → .γ to CLOSURE(I), if it is not already there. Apply this rule until no more new items can be added to CLOSURE(I).
Example:
Grammar G has:
E → E + T | T
T → T * F | F
F → ( E ) | id
CLOSURE({ E → .E + T }) contains E → .E + T (by rule i).
By rule ii, since the dot precedes E, add the E-productions with the dot at the left end of the body: E → .T (E → .E + T is already present).
Again by rule ii, since the dot precedes T, add the T-productions: T → .T * F and T → .F
Again by rule ii, since the dot precedes F, add the F-productions: F → .(E) and F → .id
Therefore CLOSURE({ E → .E + T }) = { E → .E + T
E → .T
T → .T * F
T → .F
F → .(E)
F → .id }
GOTO FUNCTION:
GOTO(I, X) is the transition from I on the grammar symbol X. To compute it, first identify all the items in I in which the dot precedes X on the right side. Then move the dot in each selected item one position to the right (over X), and take the closure of the resulting set of items.

GOTO ( I, X ) = CLOSURE ( {A → αX.β , such that A → α.Xβ is in I } )

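Example: if I = { E → .E + T
E → .T
T → .T * F
T → .F
F → .(E)
F → .id }
then GOTO(I, T) = CLOSURE({ E → T., T → T. * F })
= { E → T., T → T. * F }
The closure adds nothing here, because in both items the dot is followed either by nothing or by the terminal *.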
What are kernel and non-kernel items?
Kernel Items:
The initial item S' → .S and all items whose dots are not at the left end are called kernel items.
Example: S' → .S
E → T.
T → T. * F
Non-Kernel Items:
All items with their dots at the left end, except for S' → .S, are called non-kernel items.
Example:
E → .E + T
E→.T
T→.T*F
Viable Prefixes:
The prefixes of right sentential forms that can appear on the stack of a shift reduce parser are called
viable prefixes.

Steps in computation of the canonical collection of sets of LR(0) items:


Input: Augmented grammar G'
Output: Canonical collection of sets of LR(0) items, C

C = { CLOSURE({ S' → .S }) };
repeat
    for ( each set of items I in C )
        for ( each grammar symbol X )
            if ( GOTO(I, X) is not empty and not in C )
                add GOTO(I, X) to C;
until no new sets of items are added to C on a round;
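The algorithm translates almost line for line into Python. This sketch assumes the augmented grammar S' → S, S → CC, C → cC | d (the grammar of the next example) and represents an LR(0) item as a pair (production index, dot position). closure, goto and canonical_collection are illustrative names, and the state numbers follow discovery order, which may differ from the numbering used in the notes.

```python
# Augmented grammar as a list of (head, body) pairs; index 0 is S' -> S.
GRAMMAR = [("S'", ("S",)), ("S", ("C", "C")), ("C", ("c", "C")), ("C", ("d",))]
NONTERMINALS = {head for head, _ in GRAMMAR}
SYMBOLS = sorted({sym for _, body in GRAMMAR for sym in body})


def closure(items):
    result = set(items)                  # rule i: every item of I is in CLOSURE(I)
    changed = True
    while changed:                       # rule ii, repeated until nothing new is added
        changed = False
        for p, dot in list(result):
            body = GRAMMAR[p][1]
            if dot < len(body) and body[dot] in NONTERMINALS:
                for q, (head, _) in enumerate(GRAMMAR):
                    if head == body[dot] and (q, 0) not in result:
                        result.add((q, 0))
                        changed = True
    return frozenset(result)


def goto(items, x):
    # Move the dot over X in every item where the dot precedes X, then close.
    moved = {(p, dot + 1) for p, dot in items
             if dot < len(GRAMMAR[p][1]) and GRAMMAR[p][1][dot] == x}
    return closure(moved)


def canonical_collection():
    c = [closure({(0, 0)})]              # C = { CLOSURE({S' -> .S}) }
    added = True
    while added:                         # repeat ... until no new sets are added
        added = False
        for items in list(c):
            for x in SYMBOLS:
                target = goto(items, x)
                if target and target not in c:
                    c.append(target)
                    added = True
    return c


def show(item):
    p, dot = item
    head, body = GRAMMAR[p]
    return f"{head} -> {' '.join(body[:dot])}.{' '.join(body[dot:])}"


for i, items in enumerate(canonical_collection()):
    print(f"I{i}:", ", ".join(sorted(show(it) for it in items)))
```

For this grammar the sketch finds seven sets of items, matching C = { I0, …, I6 }.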

Obtain the canonical collection of sets of valid LR(0) items for the grammar given below:
S → CC
C → cC | d
Answer
Grammar G:
S → CC
C → cC
C→d
Augmented grammar G':
S' → S
S → CC
C → cC
C→d
The canonical collection of sets of LR(0) items is computed as follows:
I0 = CLOSURE({ S' → .S }) = { S' → .S
S → .CC
C → .cC
C → .d }

Find the transitions from I0 on each grammar symbol X:
First identify all the items in I0 in which the dot precedes X on the right side.
When X = S: S' → .S
When X = C: S → .CC
When X = c: C → .cC
When X = d: C → .d
Therefore, from I0 we compute the transitions (GOTO functions) on X = S, C, c and d.
NOTE: To compute each GOTO function, move the dot one position to the right (over X) in the items above and then take the closure of the result.

I1 contains only an item whose dot is already at the right end, so there is no GOTO function (transition) out of I1.

The canonical collection of LR(0) items for the given grammar is C = { I0, I1, I2, I3, I4, I5, I6 }
LR(0) Automaton for the given grammar is:

Construct the DFA of LR(0) items for the grammar:

Stmt_sequence → Stmt_sequence ; stmt | stmt


Stmt → s
Identify the Kernel and non-kernel items in state I4.
Answer:
Augmented grammar:
Stmt_sequence' → Stmt_sequence
Stmt_sequence → Stmt_sequence ; stmt
Stmt_sequence → stmt
Stmt → s
The canonical collection of sets of LR(0) items is computed as follows:
I0 = CLOSURE({ Stmt_sequence' → .Stmt_sequence })
= { Stmt_sequence' → .Stmt_sequence
Stmt_sequence → .Stmt_sequence ; stmt
Stmt_sequence → .stmt
stmt → .s }
GOTO(I0, Stmt_sequence) = CLOSURE({ Stmt_sequence' → Stmt_sequence.
Stmt_sequence → Stmt_sequence. ; stmt })
= { Stmt_sequence' → Stmt_sequence.
Stmt_sequence → Stmt_sequence. ; stmt } ------- I1
GOTO ( I0, stmt) = CLOSURE ( { Stmt_sequence → stmt. } )
= { Stmt_sequence → stmt. } --------------- I2
GOTO ( I0, s) = CLOSURE ( { stmt → s. })
= { stmt → s. } ---------------- I3
GOTO ( I1, ;) = CLOSURE ( { Stmt_sequence → Stmt_sequence ; . stmt })

= { Stmt_sequence → Stmt_sequence ; .stmt
stmt → .s } -------------- I4
GOTO(I4, stmt) = CLOSURE({ Stmt_sequence → Stmt_sequence ; stmt. })
= { Stmt_sequence → Stmt_sequence ; stmt. } --------- I5
GOTO(I4, s) = CLOSURE({ stmt → s. })
= { stmt → s. } ------------------- I3
Canonical collection of LR(0) items C = { I0, I1, I2, I3, I4, I5 }
LR(0) automaton:

Kernel items in state I4 = { Stmt_sequence → Stmt_sequence ; .stmt }
Non-kernel items in state I4 = { stmt → .s }
Given the grammar A → (A) | a, find the LR(0) items and the LR(0) automaton.
Augmented grammar:
A' → A
A → (A)
A→a

The canonical collection of sets of LR(0) items is computed as follows:
I0 = CLOSURE({ A' → .A }) = { A' → .A
A → .(A)
A → .a } ---------- I0
GOTO(I0, A) = CLOSURE({ A' → A. }) = { A' → A. } ---------- I1
GOTO(I0, ( ) = CLOSURE({ A → (.A) }) = { A → (.A)
A → .(A)
A → .a } ---------- I2
GOTO(I0, a) = CLOSURE({ A → a. }) = { A → a. } ---------- I3
GOTO(I2, A) = CLOSURE({ A → (A.) }) = { A → (A.) } ---------- I4
GOTO(I2, ( ) = CLOSURE({ A → (.A) }) = { A → (.A)
A → .(A)
A → .a } ---------- I2
GOTO(I2, a) = CLOSURE({ A → a. }) = { A → a. } ---------- I3
GOTO(I4, ) ) = CLOSURE({ A → (A). }) = { A → (A). } ---------- I5


Canonical collection of LR(0) items C = { I0, I1, I2, I3, I4, I5 }
LR(0) automaton:

Given the grammar below, find the LR(0) items and the LR(0) automaton.
S → AaAb
S → BbBa
A → ε
B → ε
The augmented grammar will be:
S1 → S
S → AaAb
S → BbBa
A → ε
B → ε
The canonical collection of sets of LR(0) items is computed as follows:
I0 = CLOSURE({ S1 → .S }) = { S1 → .S
S → .AaAb
S → .BbBa
A → .
B → . } --------------------- I0

Goto( I0, S ) = Closure ( { S1 → S. } )= { S1 → S. } ------------- I1

Goto( I0, A ) = Closure ( { S →A .aAb } ) = { S →A .aAb } ------------- I2

Goto( I0, B ) = Closure ( { S → B.bBa } ) = { S → B.bBa } ------------- I3

Goto(I2, a) = Closure({ S → Aa.Ab }) = { S → Aa.Ab
A → . } --------------- I4
Goto(I3, b) = Closure({ S → Bb.Ba }) = { S → Bb.Ba
B → . } --------------- I5
Goto( I4, A ) = Closure ( { S →AaA.b } ) = { S →AaA.b } ------------- I6
Goto( I5, B ) = Closure ( { S →BbB.a } ) = { S →BbB.a } ------------- I7
Goto( I6, b ) = Closure ( { S →AaAb. } ) = { S →AaAb. } ------------- I8
Goto( I7, a ) = Closure ( { S →BbBa. } ) = { S →BbBa. } ------------- I9

LR(0) automaton:

Write the canonical collection of sets of LR(0) items for the grammar:
S → L = R | R
L → * R | id
R → L
Augmented grammar will be:
S' → S
S→L=R
S→R
L→*R
L → id
R→L

The canonical collection of sets of LR(0) items is computed as follows:

Construct LR(0) automaton for the grammar given below:


E→E+T
E→T
T→T*F
T→F
F→(E)
F → id
Augmented grammar will be:
E’ → E

E→E+T
E→T

T→T*F
T→F

F→(E)
F → id

The canonical collection of sets of LR(0) items is computed as follows:
I0 = closure({ E' → .E }) = { E' → .E
E → .E + T
E → .T
T → .T * F
T → .F
F → .(E)
F → .id } --------------- I0

LR(0) Automaton:

Use of LR(0) automaton:
The central idea behind Simple LR (SLR) parsing is the construction of the LR(0) automaton from the grammar. The states of this automaton are the sets of items from the canonical LR(0) collection, and the transitions are given by the GOTO function.
How can LR(0) automata help with shift-reduce decisions?
Shift-reduce decisions can be made as follows:
Suppose that the string of grammar symbols on the stack takes the LR(0) automaton from the start state to some state j. Then shift on the next input symbol a if state j has a transition on a. Otherwise, reduce; the items in state j tell us which production to use for the reduction.
Example: id * id
Using the LR(0) automaton above, the following table illustrates the actions of a shift-reduce parser on the input id * id:

Here the STACK holds states. For clarity, a SYMBOLS column shows the grammar symbols corresponding to the states on the stack.
Initially the stack holds only the start state 0 of the automaton; the corresponding symbol is the bottom-of-stack marker $.
At line (1) the next input symbol is id, and state 0 (I0) has a transition on id to state 5 (refer to the LR(0) automaton), so we shift. At line (2), state 5 (symbol id) has been pushed onto the stack. There is no transition from state 5 (I5) on input *, so we reduce. The item in state 5, F → id., is used for the reduction (it is the item with the dot at the right end), so the reduction is by the production F → id. The reduction is implemented by popping the body of the production (id) from the stack and pushing the head of the production (F in this case).
When state 5 is popped, state 0 becomes the top, and we look for its transition on F (the head of the production). State 0 has a transition on F to state 3, so we push state 3 with the corresponding symbol F (line (3)). Each of the remaining moves is determined similarly.
MODEL OF LR PARSER: (Structure of LR parser)

An LR parser consists of the following components:
1. STACK
2. Input buffer
3. LR parsing driver program
4. Parsing table: ACTION and GOTO table
5. Output unit
STACK: holds a sequence of states s0, s1, …, sm, where sm is on top.
The driver program is the same for all LR parsers; only the parsing table changes from one parser to another. The parsing program reads characters from the input buffer one at a time. Where a shift-reduce parser would shift a symbol, an LR parser shifts a state.
General Structure of LR Parsing Table:
Explain the general structure of the LR parsing table.
The LR parsing table consists of two parts:
i. The parsing action function ACTION
ii. The GOTO function.
ACTION Table: It takes as arguments a state i and a terminal symbol a (or the $ end marker). The value of ACTION[i, a] can have one of four forms:
1. Shift j, represented as sj: shift state j onto the stack.
2. Reduce k, represented as rk: reduce by the production numbered k.
3. Accept: the parser accepts the input and finishes parsing. It is represented as 'acc' in the parsing table.
4. Error: represented by a blank entry in the parsing table; the parser has discovered an error in its input and takes some corrective action.
GOTO Table: It simply maps the transitions of the automaton on non-terminals. If GOTO(Ii, A) = Ij, then the GOTO table has the entry j in row i, column A.
Behavior of the LR parser:
Discuss the behavior of LR parser.
The behavior of the LR parser on a given input is determined by the current input symbol ai and the state sm on top of the stack: the parser consults the entry ACTION[sm, ai] in the parsing action table.

i. If ACTION[sm, ai] = shift j (sj), the parser performs a shift operation, pushing the next state j onto the stack.
ii. If ACTION[sm, ai] = reduce k (rk), the parser performs a reduce operation using the production numbered k, say A → β. The reduction is implemented by popping n states from the stack, where n is the number of symbols in the body β. The head A is then represented on the stack by pushing the state GOTO[s, A], where s is the state on top of the stack after the pops.
iii. If ACTION[sm, ai] = accept, parsing is completed.
iv. If ACTION[sm, ai] = error, the parser has discovered an error and calls an error recovery routine.
LR PARSING ALGORITHM:
With neat diagram explain LR parsing algorithm.

Input: An input string w and an LR parsing table with functions ACTION and GOTO for a grammar G.
Output: If w is in L(G), the reduction steps of a bottom-up parse for w; otherwise, an error indication.

Parsing Algorithm:
Initially, the stack holds the initial state s0 and the input buffer holds w$.
let a be the first symbol of w$;
while (1) /* repeat forever */
{
    let s be the state on top of the stack;
    if ( ACTION [ s, a ] = shift t )
    {
        push t onto the stack;
        let a be the next input symbol;
    }
    else if ( ACTION [ s, a ] = reduce A → β )
    {
        pop | β | symbols off the stack;
        let state t now be on top of the stack;
        push GOTO [ t, A ] onto the stack;
        output the production A → β;
    }
    else if ( ACTION [ s, a ] = accept )
        break; /* parsing is done */
    else call error-recovery routine;
}
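A minimal Python sketch of the algorithm above follows. It is an illustration, not the author's code: the encoding of the tables as dictionaries keyed by (state, symbol), with entries ("s", j) for shift, ("r", k) for reduce and ("acc", 0) for accept, and the names lr_parse, ACTION, GOTO and PRODS are assumptions. The sample tables encode the SLR table for A → (A) | a constructed in the next example. Instead of calling an error-recovery routine, the sketch simply raises SyntaxError on a blank entry.

```python
from typing import Dict, List, Tuple

# Assumed encoding: ("s", j) = shift j, ("r", k) = reduce by production k,
# ("acc", 0) = accept; a missing key is a blank (error) entry.
Action = Tuple[str, int]


def lr_parse(tokens: List[str],
             action: Dict[Tuple[int, str], Action],
             goto: Dict[Tuple[int, str], int],
             prods: Dict[int, Tuple[str, List[str]]]) -> List[int]:
    """Run the LR driver; return the numbers of the productions reduced, in order."""
    stack = [0]                 # state stack, starting with state 0
    buf = tokens + ["$"]        # input string w followed by the end marker
    i = 0                       # index of the current input symbol a
    output: List[int] = []
    while True:
        s, a = stack[-1], buf[i]
        entry = action.get((s, a))
        if entry is None:       # blank entry: report instead of recovering
            raise SyntaxError(f"no ACTION entry for state {s} on {a!r}")
        kind, n = entry
        if kind == "s":         # shift: push state n and advance the input
            stack.append(n)
            i += 1
        elif kind == "r":       # reduce by production n: A -> beta
            head, body = prods[n]
            if body:            # pop |beta| states (nothing for an epsilon body)
                del stack[-len(body):]
            stack.append(goto[(stack[-1], head)])   # push GOTO[t, A]
            output.append(n)
        else:                   # accept
            return output


# SLR table for (1) A -> (A), (2) A -> a
PRODS = {1: ("A", ["(", "A", ")"]), 2: ("A", ["a"])}
ACTION = {
    (0, "a"): ("s", 3), (0, "("): ("s", 2), (1, "$"): ("acc", 0),
    (2, "a"): ("s", 3), (2, "("): ("s", 2),
    (3, ")"): ("r", 2), (3, "$"): ("r", 2),
    (4, ")"): ("s", 5),
    (5, ")"): ("r", 1), (5, "$"): ("r", 1),
}
GOTO = {(0, "A"): 1, (2, "A"): 4}

print(lr_parse(list("((a))"), ACTION, GOTO, PRODS))   # [2, 1, 1]
```

The result [2, 1, 1] lists the reductions A → a, A → (A), A → (A) in the order they occur in the parse of ((a)); an input such as "(a" stops at the blank entry ACTION[4, $] with a SyntaxError.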

TYPES OF LR PARSER:
All types of LR parser have the same structure and use the same parsing algorithm; they differ only in how the parsing table is constructed.
As discussed earlier, there are three types of LR parsers that employ the bottom-up method of parsing a string in a given CFG. They are:
i. Simple LR parser (SLR), built from LR(0) items
ii. LR(1) parser or Canonical LR parser (CLR)
iii. Look-ahead LR parser (LALR)


SIMPLE LR PARSER ( SLR )


This LR parser is constructed from LR(0) items and the LR(0) automaton. The SLR parser is also called the
SLR(1) parser; usually we omit the "(1)" after SLR. The SLR parser uses the SLR parsing table.
SLR Parsing Table Construction ( SLR Parsing Table ALGORITHM)
i. Construct the canonical collection of sets of LR(0) items for G' as
C = { I0, I1, I2, …… In }.
ii. The initial state of the parser is the one constructed from I0, the set of items containing
[ S' → .S ].
iii. State 'i' of the parsing table is constructed from Ii. The actions for state 'i' (ACTION table)
for every terminal symbol 'a' are determined as follows:
a. If GOTO [ Ii, a ] = Ij, then set ACTION [ i, a ] = Sj (shift and enter state 'j').
b. For every state Ii in C whose set of LR(0) items contains an item of the form
A → α. (except for S' → S.), set ACTION [ i, b ] = rj for every symbol 'b' in FOLLOW (A), where
'j' is the number of the production A → α (reduce using that production).
iv. Set ACTION [ i, $ ] = accept if Ii contains the item S' → S.
v. The GOTO transitions for state 'i' are constructed for all non-terminals A using the rule:
If GOTO ( Ii, A ) = Ij, then set GOTO [ i, A ] = j in the GOTO table.
vi. The entries which are not defined (blank) in the ACTION and GOTO tables are considered as
Error.
Note:
In the parsing table, ACTION entries are made only for terminal symbols (including $) and GOTO entries
are made only for non-terminals.
If any conflicting actions (multiple entries) are made in the ACTION table, then the grammar is not
SLR(1), and the SLR parser cannot be used for it.
Steps involved in SLR parser construction:
a. Augment the given grammar G to G'.
b. From G' construct the canonical collection C, which contains LR(0) items.
c. Construct the SLR(1) or SLR parsing table with ACTION and GOTO entries, using the SLR
parsing table algorithm given above.
d. Parse the input string using the LR parsing algorithm, referring to the SLR parsing table.
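The table-construction rules above can be sketched in Python as follows. This is a minimal illustration under assumptions: productions are (head, body) pairs with index 0 the augmented production S' → S, an LR(0) item is a (production, dot position) pair, the FOLLOW sets are supplied by hand, and each ACTION cell is kept as a set so that a multiply defined entry (a conflict) stays visible. The names slr_table and conflicts are hypothetical, and states are numbered in the order they are discovered, which may differ from the hand numbering I0, I1, ... used in these notes.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Prod = Tuple[str, List[str]]   # (head, body); an epsilon body is []
Item = Tuple[int, int]         # LR(0) item: (production index, dot position)


def slr_table(prods: List[Prod], follow: Dict[str, Set[str]]):
    """LR(0) item sets plus SLR ACTION/GOTO; prods[0] must be S' -> S."""
    nonterms = {head for head, _ in prods}
    symbols = sorted({x for _, body in prods for x in body})

    def closure(items) -> FrozenSet[Item]:
        result, work = set(items), list(items)
        while work:
            p, d = work.pop()
            body = prods[p][1]
            if d < len(body) and body[d] in nonterms:   # A -> alpha . B beta
                for q, (head, _) in enumerate(prods):
                    if head == body[d] and (q, 0) not in result:
                        result.add((q, 0))              # add B -> . gamma
                        work.append((q, 0))
        return frozenset(result)

    def goto(items, x) -> FrozenSet[Item]:
        return closure({(p, d + 1) for p, d in items
                        if d < len(prods[p][1]) and prods[p][1][d] == x})

    # Step i: canonical collection C = {I0, I1, ...} and its transitions.
    states = [closure({(0, 0)})]
    trans: Dict[Tuple[int, str], int] = {}
    k = 0
    while k < len(states):
        for x in symbols:
            nxt = goto(states[k], x)
            if nxt:
                if nxt not in states:
                    states.append(nxt)
                trans[(k, x)] = states.index(nxt)
        k += 1

    action: Dict[Tuple[int, str], Set[str]] = {}
    goto_table: Dict[Tuple[int, str], int] = {}
    for (i, x), j in trans.items():
        if x in nonterms:
            goto_table[(i, x)] = j                           # rule v
        else:
            action.setdefault((i, x), set()).add(f"S{j}")   # rule iii(a)
    for i, items in enumerate(states):
        for p, d in items:
            head, body = prods[p]
            if d < len(body):
                continue
            if p == 0:
                action.setdefault((i, "$"), set()).add("accept")   # rule iv
            else:
                for b in follow[head]:                              # rule iii(b)
                    action.setdefault((i, b), set()).add(f"r{p}")
    return states, action, goto_table


def conflicts(action):
    """Cells with more than one entry (shift/reduce or reduce/reduce)."""
    return {cell: acts for cell, acts in action.items() if len(acts) > 1}


G1 = [("A'", ["A"]), ("A", ["(", "A", ")"]), ("A", ["a"])]
states, action, _ = slr_table(G1, {"A": {")", "$"}})
print(len(states), conflicts(action))    # 6 {}

G2 = [("S'", ["S"]), ("S", ["A", "S"]), ("S", ["b"]),
      ("A", ["S", "A"]), ("A", ["a"])]
_, action2, _ = slr_table(G2, {"S": {"a", "b", "$"}, "A": {"a", "b"}})
print(len(conflicts(action2)))           # 4
```

For S → AS | b the four conflicting cells correspond to states 5 and 7 of the hand-built table further below (on inputs a and b), although the sketch may number those states differently.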


Given the grammar: A → (A) |a


i. Find LR(0) items.
ii. Construct SLR parsing table.
iii. Write SLR parsing algorithm.
iv. Show the parsing of input string ((a))
i. LR(0) items:
Augmented grammar G': A' → A, A → (A), A → a
I0 = CLOSURE ( { A' → .A } ) = { A' → .A
                                 A → .(A)
                                 A → .a } ---------- I0
GOTO ( I0, A ) = CLOSURE ( { A' → A. } ) = { A' → A. } ---------- I1
GOTO ( I0, ( ) = CLOSURE ( { A → (.A) } ) = { A → (.A)
                                              A → .(A)
                                              A → .a } ---------- I2
GOTO ( I0, a ) = CLOSURE ( { A → a. } ) = { A → a. } ---------- I3
GOTO ( I2, A ) = CLOSURE ( { A → (A.) } ) = { A → (A.) } ---------- I4
GOTO ( I2, ( ) = CLOSURE ( { A → (.A) } ) = I2
GOTO ( I2, a ) = CLOSURE ( { A → a. } ) = I3
GOTO ( I4, ) ) = CLOSURE ( { A → (A). } ) = { A → (A). } ---------- I5

Canonical collection of LR(0) items C = { I0, I1, I2, I3, I4, I5 }


ii. SLR parsing Table.

      | ACTION                            || GOTO
STATE | a      | (      | )      | $      || A
0     | S3     | S2     |        |        || 1
1     |        |        |        | accept ||
2     | S3     | S2     |        |        || 4
3     |        |        | r2     | r2     ||
4     |        |        | S5     |        ||
5     |        |        | r1     | r1     ||
By numbering the productions of the grammar G:
(1) A → (A)
(2) A → a
Here the augmented item A' → A. is present in item set I1, so we make the action entry
[ 1, $ ] = accept.
Shift entries come from the transitions on terminals: GOTO ( I0, a ) = I3 gives [ 0, a ] = S3 and
GOTO ( I0, ( ) = I2 gives [ 0, ( ] = S2; states 2 and 4 are handled in the same way.
Identify the items of the form A → α. (dot at the right end):
Item set I3 contains the item A → a. and I5 contains A → (A).
For the given grammar Follow (A) = { ), $ }
Therefore, in state 3 on input ) make the action table entry [ 3, ) ] = r2 (reduce by the
production A → a), and similarly [ 3, $ ] = r2.
In state 5 on input ) make the entry [ 5, ) ] = r1 (reduce by the production A → (A)), and
similarly [ 5, $ ] = r1.
GOTO table entries:
GOTO ( I0, A ) = I1 gives the GOTO table entry [ 0, A ] = 1
GOTO ( I2, A ) = I4 gives the GOTO table entry [ 2, A ] = 4


iii. SLR Parsing Algorithm:


Note: The parsing algorithm is the same for all types of LR parsers, so the same LR parsing algorithm
is used:
Initially, the stack holds the initial state s0 and the input buffer holds w$.
let a be the first symbol of w$;
while (1) /* repeat forever */
{
    let s be the state on top of the stack;
    if ( ACTION [ s, a ] = shift t )
    {
        push t onto the stack;
        let a be the next input symbol;
    }
    else if ( ACTION [ s, a ] = reduce A → β )
    {
        pop | β | symbols off the stack;
        let state t now be on top of the stack;
        push GOTO [ t, A ] onto the stack;
        output the production A → β;
    }
    else if ( ACTION [ s, a ] = accept )
        break; /* parsing is done */
    else call error-recovery routine;
}


iv. Parsing of input string ((a)):

STACK      | SYMBOLS | INPUT   | ACTION
0          | $       | ((a))$  | Shift state 2
0 2        | $(      | (a))$   | Shift state 2
0 2 2      | $((     | a))$    | Shift state 3
0 2 2 3    | $((a    | ))$     | Reduce by A → a
0 2 2 4    | $((A    | ))$     | Shift state 5
0 2 2 4 5  | $((A)   | )$      | Reduce by A → ( A )
0 2 4      | $(A     | )$      | Shift state 5
0 2 4 5    | $(A)    | $       | Reduce by A → ( A )
0 1        | $A      | $       | Accept

NOTE: In a reduction, pop n states from the stack, where n = number of symbols in the body (RHS) of
the production used for the reduction.

For the given grammar design SLR parsing table.


S → CC
C → cC | d
Answer:
Grammar G:
S → CC
C → cC
C→d

Augmented grammar G':
S' → S
S → CC
C → cC
C → d


The canonical collection of sets of LR(0) items is computed as follows:
I0 = CLOSURE ( { S' → .S } ) = { S' → .S
                                 S → .CC
                                 C → .cC
                                 C → .d }


The canonical collection of LR(0) items for the given grammar is C = { I0, I1, I2, I3, I4, I5, I6 }
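By numbering the productions of G:
(1) S → CC
(2) C → cC
(3) C → d
Follow (S) = { $ }
Follow (C) = { c, d, $ } ($ is included because the second C ends the body of S → CC, so
Follow (S) ⊆ Follow (C))

SLR parsing Table:

      | ACTION                   || GOTO
STATE | c      | d      | $      || S   | C
0     | S3     | S4     |        || 1   | 2
1     |        |        | accept ||     |
2     | S3     | S4     |        ||     | 5
3     | S3     | S4     |        ||     | 6
4     | r3     | r3     | r3     ||     |
5     |        |        | r1     ||     |
6     | r2     | r2     | r2     ||     |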
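FOLLOW sets drive every reduce entry of an SLR table, so a wrong FOLLOW set shows up directly as a missing or extra rN entry. A minimal Python sketch of the usual fixed-point computation of FIRST and FOLLOW is given below; the (head, body) grammar encoding, with [] for an ε-body, and the function name first_follow are assumptions made for illustration. For S → CC it confirms that $ ∈ FOLLOW(C).

```python
from typing import Dict, List, Set, Tuple

Prod = Tuple[str, List[str]]   # (head, body); an epsilon body is []


def first_follow(prods: List[Prod], start: str):
    """Return (FIRST, FOLLOW, nullable) for the non-terminals of the grammar.

    A symbol is treated as a non-terminal if it is the head of some production."""
    nonterms = {head for head, _ in prods}
    first: Dict[str, Set[str]] = {n: set() for n in nonterms}
    nullable: Set[str] = set()

    def first_of(seq: List[str]) -> Tuple[Set[str], bool]:
        """FIRST(seq) without epsilon, and whether seq can derive epsilon."""
        out: Set[str] = set()
        for x in seq:
            if x not in nonterms:
                return out | {x}, False
            out |= first[x]
            if x not in nullable:
                return out, False
        return out, True

    changed = True
    while changed:                          # FIRST sets and nullable non-terminals
        changed = False
        for head, body in prods:
            f, eps = first_of(body)
            if not f <= first[head] or (eps and head not in nullable):
                first[head] |= f
                if eps:
                    nullable.add(head)
                changed = True

    follow: Dict[str, Set[str]] = {n: set() for n in nonterms}
    follow[start].add("$")
    changed = True
    while changed:                          # FOLLOW sets
        changed = False
        for head, body in prods:
            for k, x in enumerate(body):
                if x not in nonterms:
                    continue
                # A -> alpha B beta: add FIRST(beta), and FOLLOW(A) if beta =>* epsilon
                f, eps = first_of(body[k + 1:])
                new = f | (follow[head] if eps else set())
                if not new <= follow[x]:
                    follow[x] |= new
                    changed = True
    return first, follow, nullable


CC = [("S", ["C", "C"]), ("C", ["c", "C"]), ("C", ["d"])]
_, follow, _ = first_follow(CC, "S")
print(sorted(follow["C"]))   # ['$', 'c', 'd']
```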

Given the grammar E → ( E ) | id.


Construct:
i. LR(0) automaton
ii. SLR(1) parsing table
iii. Moves made on the string ( id ).

LR(0) items and automaton:


Augmented grammar G':
E' → E
E → (E)
E → id


I0 = CLOSURE ( { E' → .E } ) = { E' → .E
                                 E → .(E)
                                 E → .id }
Goto [ I0, E ] = { E' → E. } ---------- I1
Goto [ I0, ( ] = { E → (.E)
                   E → .(E)
                   E → .id } ---------- I2
Goto [ I0, id ] = { E → id. } ---------- I3
Goto [ I2, E ] = { E → (E.) } ---------- I4
Goto [ I2, ( ] = I2
Goto [ I2, id ] = I3
Goto [ I4, ) ] = { E → (E). } ---------- I5
LR(0) automaton:


By numbering the productions of G:
(1) E → ( E )
(2) E → id
SLR(1) parsing table:
      | ACTION                            || GOTO
STATE | id     | (      | )      | $      || E
0     | S3     | S2     |        |        || 1
1     |        |        |        | accept ||
2     | S3     | S2     |        |        || 4
3     |        |        | r2     | r2     ||
4     |        |        | S5     |        ||
5     |        |        | r1     | r1     ||

Follow( E ) = { ), $}

Parsing of input string (id)


STACK    | SYMBOLS | INPUT  | ACTION
0        | $       | (id)$  | Shift state 2
0 2      | $(      | id)$   | Shift state 3
0 2 3    | $(id    | )$     | Reduce by E → id
0 2 4    | $(E     | )$     | Shift state 5
0 2 4 5  | $(E)    | $      | Reduce by E → ( E )
0 1      | $E      | $      | Accept

Construct the SLR parse table for the following grammar.


S → AS | b
A → SA | a
Is this grammar SLR(1)?
Augmented grammar G':
S' → S
S → AS
S → b
A → SA
A → a
I0 = CLOSURE ( { S' → .S } ) = { S' → .S
                                 S → .AS
                                 S → .b
                                 A → .SA
                                 A → .a } ---------- I0
Goto [ I0, S ] = { S' → S.
                   A → S.A
                   S → .AS
                   S → .b
                   A → .SA
                   A → .a } ---------- I1
Goto [ I0, A ] = { S → A.S
                   S → .AS
                   S → .b
                   A → .SA
                   A → .a } ---------- I2
Goto [ I0, b ] = { S → b. } ---------- I3
Goto [ I0, a ] = { A → a. } ---------- I4
Goto [ I1, A ] = { A → SA.
                   S → A.S
                   S → .AS
                   S → .b
                   A → .SA
                   A → .a } ---------- I5
Goto [ I1, b ] = I3
Goto [ I1, a ] = I4
Goto [ I1, S ] = { A → S.A
                   S → .AS
                   S → .b
                   A → .SA
                   A → .a } ---------- I6
Goto [ I2, S ] = { S → AS.
                   A → S.A
                   S → .AS
                   S → .b
                   A → .SA
                   A → .a } ---------- I7
Goto [ I2, A ] = I2
Goto [ I2, b ] = I3
Goto [ I2, a ] = I4
Goto [ I5, S ] = I7
Goto [ I5, A ] = I2
Goto [ I5, b ] = I3
Goto [ I5, a ] = I4
Goto [ I6, a ] = I4
Goto [ I6, b ] = I3
Goto [ I6, A ] = I5
Goto [ I6, S ] = I6
Goto [ I7, b ] = I3
Goto [ I7, a ] = I4
Goto [ I7, A ] = I5
Goto [ I7, S ] = I6


SLR Parsing Table:


      | ACTION                   || GOTO
STATE | a      | b      | $      || S   | A
0     | S4     | S3     |        || 1   | 2
1     | S4     | S3     | accept || 6   | 5
2     | S4     | S3     |        || 7   | 2
3     | r2     | r2     | r2     ||     |
4     | r4     | r4     |        ||     |
5     | S4/r3  | S3/r3  |        || 7   | 2
6     | S4     | S3     |        || 6   | 5
7     | S4/r1  | S3/r1  | r1     || 6   | 5
Follow (S) = { a, b, $ }
Follow (A) = { a, b }
By numbering the productions of G:
(1) S → AS
(2) S → b
(3) A → SA
(4) A → a
The above SLR parsing action table contains multiple entries in states 5 and 7 (on inputs a and b),
which are shift/reduce conflicts. So the given grammar is not an SLR(1) grammar.
Construct the SLR parsing table for the grammar:
E→E+T
E→T
T→T*F
T→F
F→(E)
F → id
Augmented grammar will be:
E' → E
E→E+T
E→T
T→T*F
T→F
F→(E)
F → id


The canonical collection of sets of LR(0) items is computed as follows:
I0 = closure ( { E' → .E } ) = { E' → .E
                                 E → .E+T
                                 E → .T
                                 T → .T*F
                                 T → .F
                                 F → .(E)
                                 F → .id } ---------- I0


SLR Parsing Table:


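      | ACTION                                              || GOTO
STATE | id     | (      | )      | +      | *      | $      || E   | T   | F
0     | S5     | S4     |        |        |        |        || 1   | 2   | 3
1     |        |        |        | S6     |        | accept ||     |     |
2     |        |        | r2     | r2     | S7     | r2     ||     |     |
3     |        |        | r4     | r4     | r4     | r4     ||     |     |
4     | S5     | S4     |        |        |        |        || 8   | 2   | 3
5     |        |        | r6     | r6     | r6     | r6     ||     |     |
6     | S5     | S4     |        |        |        |        ||     | 9   | 3
7     | S5     | S4     |        |        |        |        ||     |     | 10
8     |        |        | S11    | S6     |        |        ||     |     |
9     |        |        | r1     | r1     | S7     | r1     ||     |     |
10    |        |        | r3     | r3     | r3     | r3     ||     |     |
11    |        |        | r5     | r5     | r5     | r5     ||     |     |
The productions are numbered in the order listed: (1) E → E+T, (2) E → T, (3) T → T*F, (4) T → F,
(5) F → (E), (6) F → id.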

Show that the following grammar is not SLR(1).


S → AaAb
S → BbBa
A→ε
B→ε
The augmented grammar will be:
S1 → S
S → AaAb
S → BbBa
A→ε
B→ε
The canonical collection of sets of LR(0) items is computed as follows:
I0 = closure ( { S1 → .S } ) = { S1 → .S
                                 S → .AaAb
                                 S → .BbBa
                                 A → .
                                 B → . } ---------- I0
Goto ( I0, S ) = Closure ( { S1 → S. } ) = { S1 → S. } ---------- I1
Goto ( I0, A ) = Closure ( { S → A.aAb } ) = { S → A.aAb } ---------- I2
Goto ( I0, B ) = Closure ( { S → B.bBa } ) = { S → B.bBa } ---------- I3
Goto ( I2, a ) = Closure ( { S → Aa.Ab } ) = { S → Aa.Ab
                                               A → . } ---------- I4
Goto ( I3, b ) = Closure ( { S → Bb.Ba } ) = { S → Bb.Ba
                                               B → . } ---------- I5
Goto ( I4, A ) = Closure ( { S → AaA.b } ) = { S → AaA.b } ---------- I6
Goto ( I5, B ) = Closure ( { S → BbB.a } ) = { S → BbB.a } ---------- I7
Goto ( I6, b ) = Closure ( { S → AaAb. } ) = { S → AaAb. } ---------- I8
Goto ( I7, a ) = Closure ( { S → BbBa. } ) = { S → BbBa. } ---------- I9


SLR Parsing table:


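      | ACTION                   || GOTO
STATE | a      | b      | $      || S   | A   | B
0     | r3/r4  | r3/r4  |        || 1   | 2   | 3
1     |        |        | accept ||     |     |
2     | S4     |        |        ||     |     |
3     |        | S5     |        ||     |     |
4     | r3     | r3     |        ||     | 6   |
5     | r4     | r4     |        ||     |     | 7
6     |        | S8     |        ||     |     |
7     | S9     |        |        ||     |     |
8     |        |        | r1     ||     |     |
9     |        |        | r2     ||     |     |
(1) S → AaAb
(2) S → BbBa
(3) A → ε
(4) B → ε
Follow (S) = { $ }
Follow (A) = { a, b }
Follow (B) = { a, b }
Since the action table contains multiple entries in state 0 (r3/r4 on both a and b, i.e.,
reduce/reduce conflicts), the given grammar is not SLR(1).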
Consider the grammar.
S→L=R|R
L → * R | id
R→L
Verify whether the grammar is SLR(1) or not.
Augmented grammar will be:
S' → S
S → L = R
S → R
L → * R
L → id
R → L
The canonical collection of sets of LR(0) items is computed as follows:

SLR Parsing table:


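      | ACTION                            || GOTO
STATE | id     | *      | =      | $      || S   | L   | R
0     | S5     | S4     |        |        || 1   | 2   | 3
1     |        |        |        | accept ||     |     |
2     |        |        | S6/r5  | r5     ||     |     |
3     |        |        |        | r2     ||     |     |
4     | S5     | S4     |        |        ||     | 8   | 7
5     |        |        | r4     | r4     ||     |     |
6     | S5     | S4     |        |        ||     | 8   | 9
7     |        |        | r3     | r3     ||     |     |
8     |        |        | r5     | r5     ||     |     |
9     |        |        |        | r1     ||     |     |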


(1) S → L = R
(2) S → R
(3) L → * R
(4) L → id
(5) R → L
Follow(S) = { $}
Follow(L) = { =, $ }
Follow(R) = { =, $}
Since there is a multiple entry, i.e., both a shift and a reduce entry in ACTION [ 2, = ], state 2 has a
shift/reduce conflict on input symbol '=', so the given grammar is not SLR(1).
NOTE: The above grammar is not ambiguous. The shift/reduce conflict arises because the SLR parser
is not powerful enough to remember enough left context to decide what action to take on input '='
after seeing a string reducible to L.

Form the ACTION / GOTO table for the following grammar:


S → Aa | bAc | Ba | bBa
A→d
B→d
Justify whether the grammar is LR(0) or not.
Augmented grammar:
S' → S
S → Aa
S → bAc
S → Ba
S → bBa
A→d
B→d

I0 = { S' → .S
       S → .Aa
       S → .bAc
       S → .Ba
       S → .bBa
       A → .d
       B → .d }
Goto (I0, S) = { S' → S. } ---------- I1
Goto (I0, A) = { S → A.a } ---------- I2
Goto (I0, B) = { S → B.a } ---------- I3
Goto (I0, b) = { S → b.Ac
                 S → b.Ba
                 A → .d
                 B → .d } ---------- I4
Goto (I0, d) = { A → d.
                 B → d. } ---------- I5
Goto (I2, a) = { S → Aa. } ---------- I6
Goto (I3, a) = { S → Ba. } ---------- I7
Goto (I4, A) = { S → bA.c } ---------- I8
Goto (I4, B) = { S → bB.a } ---------- I9
Goto (I4, d) = I5
Goto (I8, c) = { S → bAc. } ---------- I10
Goto (I9, a) = { S → bBa. } ---------- I11

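      | ACTION                                     || GOTO
STATE | a      | b      | c      | d      | $      || S   | A   | B
0     |        | S4     |        | S5     |        || 1   | 2   | 3
1     |        |        |        |        | accept ||     |     |
2     | S6     |        |        |        |        ||     |     |
3     | S7     |        |        |        |        ||     |     |
4     |        |        |        | S5     |        ||     | 8   | 9
5     | r5/r6  |        | r5     |        |        ||     |     |
6     |        |        |        |        | r1     ||     |     |
7     |        |        |        |        | r3     ||     |     |
8     |        |        | S10    |        |        ||     |     |
9     | S11    |        |        |        |        ||     |     |
10    |        |        |        |        | r2     ||     |     |
11    |        |        |        |        | r4     ||     |     |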


(1) S → Aa
(2) S → bAc
(3) S → Ba
(4) S → bBa
(5) A→d
(6) B→d
Follow(S) = { $}
Follow(A) = { a, c}
Follow(B) = { a}
The above SLR parsing table contains multiple entries in state 5 on input 'a': Action [ 5, a ] = r5 / r6,
a reduce/reduce conflict. So the given grammar is neither LR(0) nor SLR(1).
Show that the following grammar is SLR(1)
S → SA | A
A→ a
Augmented grammar:
S' → S
S → SA
S → A
A → a
LR(0) items:
I0 : { S' → .S
       S → .SA
       S → .A
       A → .a }
Goto ( I0, S ) = { S' → S.
                   S → S.A
                   A → .a } ---------- I1
Goto ( I0, A ) = { S → A. } ---------- I2
Goto ( I0, a ) = { A → a. } ---------- I3
Goto ( I1, A ) = { S → SA. } ---------- I4
Goto ( I1, a ) = { A → a. } = I3
(1) S → SA
(2) S → A
(3) A → a
Follow (S) = { a, $ }
Follow (A) = { a, $ }
      | ACTION          || GOTO
STATE | a      | $      || S   | A
0     | S3     |        || 1   | 2
1     | S3     | accept ||     | 4
2     | r2     | r2     ||     |
3     | r3     | r3     ||     |
4     | r1     | r1     ||     |

From the above SLR parsing table we observe that each entry uniquely identifies a shift, a reduce or
accept, or signals an error (blank entry). So the given grammar is SLR(1).
Consider the grammar :

E→E+n|n

i. Find LR(0) items

ii. Construct SLR parsing table and parse the input string n + n + n

Augmented grammar:
E' → E
E → E + n
E → n
LR(0) items:
I0 : { E' → .E
       E → .E + n
       E → .n }
Goto ( I0, E ) = { E' → E.
                   E → E. + n } ---------- I1
Goto ( I0, n ) = { E → n. } ---------- I2
Goto ( I1, + ) = { E → E +. n } ---------- I3
Goto ( I3, n ) = { E → E + n. } ---------- I4
(1) E → E + n
(2) E → n
Follow (E) = { +, $ }

SLR Parsing Table:

      | ACTION                   || GOTO
STATE | n      | +      | $      || E
0     | S2     |        |        || 1
1     |        | S3     | accept ||
2     |        | r2     | r2     ||
3     | S4     |        |        ||
4     |        | r1     | r1     ||


Parsing action for the input string n + n + n

STACK    | SYMBOLS | INPUT   | ACTION
0        | $       | n+n+n$  | Shift 2
0 2      | $n      | +n+n$   | Reduce by E → n
0 1      | $E      | +n+n$   | Shift 3
0 1 3    | $E+     | n+n$    | Shift 4
0 1 3 4  | $E+n    | +n$     | Reduce by E → E + n
0 1      | $E      | +n$     | Shift 3
0 1 3    | $E+     | n$      | Shift 4
0 1 3 4  | $E+n    | $       | Reduce by E → E + n
0 1      | $E      | $       | Accept

Drawback of SLR Parser:


1. SLR grammars constitute only a small subset of the context-free grammars, so an SLR parser
succeeds on relatively few of them. That is, the SLR parser is less powerful than the other LR
parsers (LALR and CLR); the power of a parser is measured by the set of grammars on which it
succeeds.
2. When an SLR parser has the body α of a production A → α on top of the stack, it reduces α to A
whenever the next input symbol is in FOLLOW(A). But FOLLOW(A) is computed for the whole grammar,
not for the particular state, so such a reduction may not correspond to any step of a rightmost
derivation in reverse. The result is a spurious conflict, as in the grammar S → L = R | R above.


MORE POWERFUL PARSERS


Powerful LR parsers:
Here we extend the previous LR parsing techniques to use one look-ahead symbol on the
input.
Look-aheads are symbols that the parser uses to 'look ahead' in the input buffer to decide
whether or not a reduction is to be done. That is, we work with items of the form:
[ A → α.Xβ, a ]

• The item [ A → α.Xβ, a ] is called an LR(1) item, because the length of its look-ahead a is one.
• An item without look-ahead is one with a look-ahead of length zero, hence it is an LR(0) item.
In the SLR parsing method, we were working with LR(0) items.
An LR(1) item is comprised of two parts:
• an LR(0) item (the core) and the look-ahead symbol associated with the item.
There are two different methods for LR parsing based on a look-ahead symbol on the input:
i. The canonical LR, LR(1), or just LR parser (CLR): it makes full use of the look-ahead
symbol(s). This method uses a large set of items, called the LR(1) items.
ii. Look-ahead LR (LALR):
a. It is based on the LR(0) sets of items and has many fewer states than typical
parsers based on the LR(1) items (CLR).
b. By introducing look-aheads into the LR(0) items, we can handle many more grammars with
the LALR method than with the SLR method.
c. An LALR parsing table is no bigger than the SLR table: both have the same number of states.
d. It is the most widely used of the LR methods.
LR(1) or Canonical LR Parser (CLR):
• Every state of the LR(1) or CLR parser corresponds to a set of LR(1) items.
• When the parser looks ahead in the input buffer to decide whether or not a reduction is to be
done, the needed information about the terminals is available in the state of the parser itself.
The canonical collection of LR(1) items is obtained by modifying the CLOSURE and GOTO
functions to carry the look-ahead symbols.


CONSTRUCTION OF LR (1) ITEMS: (ALGORITHM FOR LR(1) ITEMS):

Write an algorithm used to compute LR(1) sets of item.


1. Find the augmented grammar G'.
2. Find the initial item set I0 of the LR(1) items by computing CLOSURE ( { [ S' → .S, $ ] } ).
The CLOSURE of a set of items I is computed as follows (every item of I is in CLOSURE(I)):
    repeat
        for ( each item [ A → α.Bβ, a ] in I )
            for ( each production B → γ in G' )
                for ( each terminal b in FIRST (βa) )
                    add [ B → .γ, b ] to set I;
    until no more items are added to I;
    return I;
3. Find the remaining item sets of the LR(1) items using the GOTO function.
GOTO ( I, X ) is computed by collecting every item [ A → αX.β, a ] such that [ A → α.Xβ, a ] is in I,
and taking the closure:

GOTO ( I, X ) = CLOSURE ( { [ A → αX.β, a ] | [ A → α.Xβ, a ] is in I } )
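The CLOSURE and GOTO steps for LR(1) items can be sketched in Python. This is an illustration only, under assumptions: an LR(1) item [A → α.β, a] is encoded as a (production, dot, look-ahead) triple, production 0 is the augmented S' → S, and FIRST(βa) is computed from FIRST sets found by fixed-point iteration, so ε-productions (empty bodies) are handled. The name lr1_collection is hypothetical, and state numbers follow the order of discovery rather than the hand numbering in the examples.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Prod = Tuple[str, List[str]]      # (head, body); an epsilon body is []
Item1 = Tuple[int, int, str]      # [A -> alpha . beta, a] as (production, dot, a)


def lr1_collection(prods: List[Prod]):
    """Canonical collection of LR(1) item sets; prods[0] must be S' -> S."""
    nonterms = {head for head, _ in prods}
    first: Dict[str, Set[str]] = {n: set() for n in nonterms}
    nullable: Set[str] = set()

    def first_seq(seq: List[str]) -> Tuple[Set[str], bool]:
        """FIRST(seq) without epsilon, and whether seq can derive epsilon."""
        out: Set[str] = set()
        for x in seq:
            if x not in nonterms:
                return out | {x}, False
            out |= first[x]
            if x not in nullable:
                return out, False
        return out, True

    changed = True
    while changed:                      # FIRST sets by fixed-point iteration
        changed = False
        for head, body in prods:
            f, eps = first_seq(body)
            if not f <= first[head] or (eps and head not in nullable):
                first[head] |= f
                if eps:
                    nullable.add(head)
                changed = True

    def closure(items) -> FrozenSet[Item1]:
        result, work = set(items), list(items)
        while work:
            p, d, a = work.pop()
            body = prods[p][1]
            if d < len(body) and body[d] in nonterms:      # [A -> alpha . B beta, a]
                f, eps = first_seq(body[d + 1:])
                lookaheads = (f | {a}) if eps else f        # FIRST(beta a)
                for q, (head, _) in enumerate(prods):
                    if head != body[d]:
                        continue
                    for b in lookaheads:                    # add [B -> . gamma, b]
                        if (q, 0, b) not in result:
                            result.add((q, 0, b))
                            work.append((q, 0, b))
        return frozenset(result)

    def goto(items, x) -> FrozenSet[Item1]:
        return closure({(p, d + 1, a) for p, d, a in items
                        if d < len(prods[p][1]) and prods[p][1][d] == x})

    symbols = sorted({x for _, body in prods for x in body})
    states = [closure({(0, 0, "$")})]   # I0 = CLOSURE({[S' -> .S, $]})
    trans: Dict[Tuple[int, str], int] = {}
    k = 0
    while k < len(states):
        for x in symbols:
            nxt = goto(states[k], x)
            if nxt:
                if nxt not in states:
                    states.append(nxt)
                trans[(k, x)] = states.index(nxt)
        k += 1
    return states, trans


CC = [("S'", ["S"]), ("S", ["C", "C"]), ("C", ["c", "C"]), ("C", ["d"])]
states, trans = lr1_collection(CC)
print(len(states))          # 10
print(sorted(states[0]))
# [(0, 0, '$'), (1, 0, '$'), (2, 0, 'c'), (2, 0, 'd'), (3, 0, 'c'), (3, 0, 'd')]
```

The ten sets correspond to I0 ... I9 of the worked example that follows, and the first one is exactly I0 = { [S' → .S, $], [S → .CC, $], [C → .cC, c/d], [C → .d, c/d] }.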

**********Construct LR(1) items for the following grammar:


S → CC
C → cC | d

Augmented grammar:
S' → S
S → CC
C → cC
C → d


LR(1) items:
The initial item set I0 is obtained by computing

CLOSURE ( { S' → .S, $ } )

Here the dot is before the non-terminal S, with β = ε and a = $:
First (βa) = First ($) = { $ }
Add all S productions with the dot at the left end and look-ahead symbol $ to I0.
So far CLOSURE ( { S' → .S, $ } ) contains S' → .S, $ and S → .CC, $.
Again the dot is before the non-terminal C, with β = C and a = $:
First (βa) = First (C$) = First (C) = { c, d }
Add all C productions with the dot at the left end and look-ahead symbols { c, d } to I0:
C → .cC, c
C → .cC, d
C → .d, c
C → .d, d
We use the shorthand notation [ C → .cC, c / d ] for the two items [ C → .cC, c ] and [ C → .cC, d ].
Similarly [ C → .d, c / d ] stands for the two items [ C → .d, c ] and [ C → .d, d ].
I0 : { S' → .S, $
       S → .CC, $
       C → .cC, c / d
       C → .d, c / d }

Goto (I0, S ) = { S' → S. , $ } → I1
Goto (I0, C ) = { S → C.C, $
C → .cC , $ → I2
C → .d , $
}
Goto (I0, c) = { C → c .C , c / d
C → .cC , c / d → I3
C → . d, c/d
}
Goto (I0, d) = { C → d. , c / d } → I4
Goto (I2, C) = { S → CC. , $ } → I5
Goto (I2, c) = { C → c.C , $
C → .cC , $ → I6
C → . d, $ }
Goto (I2, d) = { C → d. , $ } → I7
Goto (I3, C) = { C → c C. , c / d } → I8
Goto (I3, c) = { C → c .C , c / d
C → .cC , c / d → I3
C → . d, c/d
}
Goto (I3, d) = { C → d. , c / d } → I4
Goto (I6, C) = { C → cC. , $ } → I9
Goto (I6, c) = { C → c.C , $
C → .cC , $ → I6
C → . d, $ }
Goto (I6, d) = { C → d. , $ } → I7


For the grammar


A → (A) | a
Construct LR(1) items
Answer:
Augmented grammar:
A' → A
A → (A)
A → a
LR (1) items:
I0 : { A' → .A , $
A → .(A) , $
A → .a, $
}
Goto (I0, A ) = { A' → A. , $ } I1
Goto (I0, ( ) = { A → ( .A) , $
A → .(A) , ) I2
A → .a, )
}
Goto (I0, a) = { A → a. , $ } I3
Goto (I2, A) = { A → (A .) , $ } I4
Goto (I2, ( ) = { A → (.A) , )
A → .(A) , ) I5
A → .a, )
}
Goto (I2, a ) = { A → a. , ) } I6
Goto (I4, ) ) = { A → (A) . , $ } I7
Goto (I5, A) = { A → (A.) , ) } I8
Goto (I5, ( ) = { A → (. A) , ) I5
A → .(A) , )
A → .a, )
}


Goto (I5, a) = { A → a. , ) } I6
Goto (I8, ) ) = { A → (A) . , ) } I9

Given the grammar

S → AA

A → Aa | b

i. Construct sets of LR(1) items.

ii. Draw the GOTO graph

Augmented grammar:
S' → S
S → AA
A → Aa
A → b
LR(1) items:
I0 : { S' → .S, $
       S → .AA, $
       A → .Aa, a / b
       A → .b, a / b }
Goto (I0, S ) = { S' → S. , $ } I1
Goto (I0, A ) = { S → A.A, $
                  A → A.a, a / b
                  A → .Aa, $ / a
                  A → .b, $ / a } I2
Goto (I0, b ) = { A → b. , a / b } I3
Goto (I2, A ) = { S → AA. , $
                  A → A.a, $ / a } I4
Goto (I2, b ) = { A → b. , $ / a } I5
Goto (I2, a ) = { A → Aa. , a / b } I6
Goto (I4, a ) = { A → Aa. , $ / a } I7

GOTO graph:


CONSTRUCTION OF LR(1) or CLR PARSING TABLE:

Write an algorithm used to construct LR(1)or CLR parsing table.

1. Construct the canonical collection of LR(1) item sets C' = { I0, I1, I2, …, In } for the
augmented grammar G'.
2. State 'i' of the parser is constructed from Ii. The parsing action for state 'i', for every terminal
symbol 'a', is determined as follows:
a) If GOTO ( Ii, a ) = Ij, then set Action [ i, a ] = Sj.
b) If Ii contains an item of the form [ A → α., a ] with A ≠ S', then set Action [ i, a ] = rk,
where k is the number of the production A → α.
c) If [ S' → S., $ ] is in Ii, then set Action [ i, $ ] = accept.
3. The GOTO entries for state 'i' are made for all non-terminals A using the rule: if
GOTO ( Ii, A ) = Ij, then GOTO [ i, A ] = j.
4. All entries not defined by rules (2) and (3) are made 'error'.
5. The initial state of the parser is the one constructed from the set of items containing
[ S' → .S, $ ].

NOTE:
If the ACTION part of the canonical LR(1) parsing table has no multiply defined entries, then the
given grammar is called an LR(1) grammar or CLR grammar.
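The table-construction rules above can be sketched in Python, assuming the LR(1) item sets and GOTO transitions are already available. To keep the block self-contained, the sets I0 ... I9 for S → CC, C → cC | d (the same sets computed in the next example) are written out by hand; each item is a (production, dot, look-ahead) triple, and each ACTION cell is a set so that a multiply defined entry would be visible. The names clr_table, expand, STATES and TRANS are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Prod = Tuple[str, List[str]]
Item1 = Tuple[int, int, str]      # (production, dot position, look-ahead)


def clr_table(prods: List[Prod], states: List[FrozenSet[Item1]],
              trans: Dict[Tuple[int, str], int]):
    """Fill ACTION/GOTO from LR(1) item sets; prods[0] is S' -> S."""
    nonterms = {head for head, _ in prods}
    action: Dict[Tuple[int, str], Set[str]] = {}
    goto_table: Dict[Tuple[int, str], int] = {}
    for (i, x), j in trans.items():
        if x in nonterms:
            goto_table[(i, x)] = j                          # rule 3
        else:
            action.setdefault((i, x), set()).add(f"S{j}")  # rule 2(a)
    for i, state in enumerate(states):
        for p, d, a in state:
            if d == len(prods[p][1]):                      # [A -> alpha., a]
                entry = "accept" if p == 0 else f"r{p}"    # rule 2(c) / 2(b)
                action.setdefault((i, a), set()).add(entry)
    return action, goto_table


def expand(*groups) -> FrozenSet[Item1]:
    """expand((p, d, "cd")) gives [p, d, c] and [p, d, d]; this shorthand
    only works for one-character terminals, as in this small example."""
    return frozenset((p, d, a) for p, d, las in groups for a in las)


# (0) S' -> S   (1) S -> CC   (2) C -> cC   (3) C -> d
CC = [("S'", ["S"]), ("S", ["C", "C"]), ("C", ["c", "C"]), ("C", ["d"])]
STATES = [
    expand((0, 0, "$"), (1, 0, "$"), (2, 0, "cd"), (3, 0, "cd")),  # I0
    expand((0, 1, "$")),                                            # I1
    expand((1, 1, "$"), (2, 0, "$"), (3, 0, "$")),                  # I2
    expand((2, 1, "cd"), (2, 0, "cd"), (3, 0, "cd")),               # I3
    expand((3, 1, "cd")),                                           # I4
    expand((1, 2, "$")),                                            # I5
    expand((2, 1, "$"), (2, 0, "$"), (3, 0, "$")),                  # I6
    expand((3, 1, "$")),                                            # I7
    expand((2, 2, "cd")),                                           # I8
    expand((2, 2, "$")),                                            # I9
]
TRANS = {(0, "S"): 1, (0, "C"): 2, (0, "c"): 3, (0, "d"): 4,
         (2, "C"): 5, (2, "c"): 6, (2, "d"): 7,
         (3, "C"): 8, (3, "c"): 3, (3, "d"): 4,
         (6, "C"): 9, (6, "c"): 6, (6, "d"): 7}

action, goto_table = clr_table(CC, STATES, TRANS)
print(sorted(action[(4, "c")]), (4, "$") in action, sorted(action[(7, "$")]))
# ['r3'] False ['r3']
```

The printed line shows the point of the look-aheads: I4 and I7 both contain C → d., but I4 reduces only on c and d while I7 reduces only on $.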
Construct LR(1) items and LR(1) or CLR parsing table for the following grammar:

S → CC
C → cC | d

Augmented grammar:

S' → S

S → CC

C → cC

C → d


LR(1) items:

I0 : { S' → .S, $
       S → .CC, $
       C → .cC, c / d
       C → .d, c / d }

Goto (I0, S ) = { S' → S. , $ } → I1
Goto (I0, C ) = { S → C.C, $
                  C → .cC, $
                  C → .d, $ } → I2
Goto (I0, c ) = { C → c.C, c / d
                  C → .cC, c / d
                  C → .d, c / d } → I3
Goto (I0, d ) = { C → d. , c / d } → I4
Goto (I2, C ) = { S → CC. , $ } → I5
Goto (I2, c ) = { C → c.C, $
                  C → .cC, $
                  C → .d, $ } → I6
Goto (I2, d ) = { C → d. , $ } → I7
Goto (I3, C ) = { C → cC. , c / d } → I8
Goto (I3, c ) = { C → c.C, c / d
                  C → .cC, c / d
                  C → .d, c / d } → I3
Goto (I3, d ) = { C → d. , c / d } → I4
Goto (I6, C ) = { C → cC. , $ } → I9
Goto (I6, c ) = { C → c.C, $
                  C → .cC, $
                  C → .d, $ } → I6
Goto (I6, d ) = { C → d. , $ } → I7

(1) S → CC
(2) C → cC
(3) C → d
Canonical LR Parsing Table:
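      | ACTION                   || GOTO
STATE | c      | d      | $      || S   | C
0     | S3     | S4     |        || 1   | 2
1     |        |        | accept ||     |
2     | S6     | S7     |        ||     | 5
3     | S3     | S4     |        ||     | 8
4     | r3     | r3     |        ||     |
5     |        |        | r1     ||     |
6     | S6     | S7     |        ||     | 9
7     |        |        | r3     ||     |
8     | r2     | r2     |        ||     |
9     |        |        | r2     ||     |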


Given the grammar:


S → L=R
S → R
L → *R
L → id
R → L
i. Construct LR(1) items.
ii. Construct LR(1) canonical parsing table.
iii. Check whether the grammar is LR(1) or CLR

Augmented grammar:
S' → S
S → L=R
S → R
L → *R
L → id
R → L
LR(1) items:
I0 : { S' → .S, $
       S → .L = R, $
       S → .R, $
       L → .*R, = / $
       L → .id, = / $
       R → .L, $ }

Goto (I0, S) = { S' → S. , $ } I1


Goto (I0, L) = { S → L. = R, $
                 R → L. , $ } I2

Goto (I0, R) = { S → R. , $ } I3

Goto (I0, *) = { L → *.R, = / $
                 R → .L, = / $
                 L → .*R, = / $
                 L → .id, = / $ } I4

Goto (I0, id) = { L → id. , = / $ } I5

Goto (I2, = ) = { S → L = .R, $
                  R → .L, $
                  L → .*R, $
                  L → .id, $ } I6

Goto (I4, R) = L → * R. , = / $ I7

Goto (I4, L) = R → L. , = / $ I8

Goto (I4, *) = L → * . R , =/$

R → . L , =/$

L → .* R , =/$ I4
L → . id , =/$

Goto (I4, id) = L → id. , = / $ I5


Goto (I6, R) = S → L = R. , $ I9

Goto (I6, L) = R → L. , $ I10

Goto (I6, *) = L → *. R , $
R → .L , $
L → .* R , $ I11
L → .id , $
Goto (I6, id) = L → id. , $ I12

Goto (I11, R) = L → *R. , $ I13

Goto (I11, L) = R → L. , $ I10

Goto (I11, *) = L → *. R , $
R → .L , $
L → .* R , $ I11
L → .id , $
Goto (I11, id) = L → id. , $ I12

(1) S → L = R
(2) S → R
(3) L → *R
(4) L → id
(5) R → L
CLR Parsing Table:
      | ACTION                            || GOTO
STATE | id     | *      | =      | $      || S   | L   | R
0     | S5     | S4     |        |        || 1   | 2   | 3
1     |        |        |        | accept ||     |     |
2     |        |        | S6     | r5     ||     |     |
3     |        |        |        | r2     ||     |     |
4     | S5     | S4     |        |        ||     | 8   | 7
5     |        |        | r4     | r4     ||     |     |
6     | S12    | S11    |        |        ||     | 10  | 9
7     |        |        | r3     | r3     ||     |     |
8     |        |        | r5     | r5     ||     |     |
9     |        |        |        | r1     ||     |     |
10    |        |        |        | r5     ||     |     |
11    | S12    | S11    |        |        ||     | 10  | 13
12    |        |        |        | r4     ||     |     |
13    |        |        |        | r3     ||     |     |

There are no multiply defined entries in the parsing table, so the grammar is LR(1) (CLR).


Given the grammar:
S → AaAb
S → BbBa
A → ε
B → ε
i. Construct LR(1) items.
ii. Construct LR(1) canonical parsing table.
Augmented grammar:
S' → S
S → AaAb
S → BbBa
A → ε
B → ε
LR(1) items:
I0 : { S' → .S, $
       S → .AaAb, $
       S → .BbBa, $
       A → ., a
       B → ., b }
Goto (I0, S) = { S' → S. , $ } I1
Goto (I0, A) = { S → A.aAb, $ } I2
Goto (I0, B) = { S → B.bBa, $ } I3
Goto (I2, a) = { S → Aa.Ab, $
                 A → ., b } I4
Goto (I3, b) = { S → Bb.Ba, $
                 B → ., a } I5
Goto (I4, A) = { S → AaA.b, $ } I6
Goto (I5, B) = { S → BbB.a, $ } I7
Goto (I6, b) = { S → AaAb. , $ } I8
Goto (I7, a) = { S → BbBa. , $ } I9

1. S → AaAb
2. S → BbBa
3. A → ε
4. B → ε
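LR(1) Parsing table:
      | ACTION                   || GOTO
STATE | a      | b      | $      || S   | A   | B
0     | r3     | r4     |        || 1   | 2   | 3
1     |        |        | accept ||     |     |
2     | S4     |        |        ||     |     |
3     |        | S5     |        ||     |     |
4     |        | r3     |        ||     | 6   |
5     | r4     |        |        ||     |     | 7
6     |        | S8     |        ||     |     |
7     | S9     |        |        ||     |     |
8     |        |        | r1     ||     |     |
9     |        |        | r2     ||     |     |
Since the table has no multiply defined entries, the grammar is LR(1) (CLR), even though its SLR
table has reduce/reduce conflicts in state 0.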


Construct the LR(1) items for the following grammar:


S → E
E → (L)
E→ a
L → EL
Augmented grammar:
S' → S
S → E
E → (L)
E → a
L → EL
LR(1) items:
I0 : { S' → .S, $
       S → .E, $
       E → .(L), $
       E → .a, $ }
Goto (I0, S) = { S' → S. , $ } I1

Goto (I0, E) = S → E. , $ I2
Goto (I0, ( ) = E → (. L), $
L → .EL, ) I3
E → .( L), a /(

E → .a, a /(
Goto (I0, a) = E → a., $ I4
Goto (I3, L) = E → (L.), $ I5
Goto (I3, E) = L → E.L, )
L → .EL, ) I6
E → .( L), a /(

E → .a, a /(


Goto (I3, ( ) = E → (. L), a /(


L → .EL, )
I7
E → .( L), a /(

E → .a, a /(
Goto (I3, a) = E → a., a /( I8

Goto (I5, ) )= E → (L) . , $ I9


Goto (I6, L) = L → EL. , ) I10

Goto (I6, E) = L → E.L, )


L → .EL, ) I6
E → .( L), a /(

E → .a, a /(
Goto (I6, ( ) = E → (. L), a /(
L → .EL, )
I7
E → .( L), a /(

E → .a, a /(
Goto (I6, a) = E → a., a /( I8
Goto (I7, L) = E → (L.), a /( I11
Goto (I7, E) = L → E.L, )
L → .EL, ) I6
E → .( L), a /(

E → .a, a /(
Goto (I7, ( ) = E → (. L), a /(
L → .EL, )
I7
E → .( L), a /(

E → .a, a /(
Goto (I7, a) = E → a., a /( I8

Goto (I11, ) ) = E → (L) . , a /( I12


For the given grammar:


E → E+n
E → n
i. Construct LR(1) items.
ii. CLR parsing table.
iii. Parse the input string n +n +n
Augmented grammar:
E' → E
E → E + n
E → n
LR(1) items:
I0 = CLOSURE ( { E' → .E, $ } ) = { E' → .E, $
                                    E → .E + n, $
                                    E → .n, $
                                    E → .E + n, +
                                    E → .n, + }
i.e. I0 = { E' → .E, $
            E → .E + n, $ / +
            E → .n, $ / + } ---------- I0
Goto (I0, E) = { E' → E. , $
                 E → E. + n, $ / + } I1
Goto (I0, n) = { E → n. , $ / + } I2
Goto (I1, + ) = { E → E +. n, $ / + } I3
Goto (I3, n) = { E → E + n. , $ / + } I4
1. E → E + n
2. E → n
CLR Parsing Table:
      | ACTION                   || GOTO
STATE | n      | +      | $      || E
0     | S2     |        |        || 1
1     |        | S3     | accept ||
2     |        | r2     | r2     ||
3     | S4     |        |        ||
4     |        | r1     | r1     ||


Parsing action of the input string n + n + n:


STACK    | SYMBOLS | INPUT   | ACTION
0        | $       | n+n+n$  | Shift 2
0 2      | $n      | +n+n$   | Reduce by E → n
0 1      | $E      | +n+n$   | Shift 3
0 1 3    | $E+     | n+n$    | Shift 4
0 1 3 4  | $E+n    | +n$     | Reduce by E → E + n
0 1      | $E      | +n$     | Shift 3
0 1 3    | $E+     | n$      | Shift 4
0 1 3 4  | $E+n    | $       | Reduce by E → E + n
0 1      | $E      | $       | Accept

LOOK-AHEAD LR PARSER ( LALR)


Comparing the SLR(1) parser with the CLR(1) parser, we find that the CLR parser is more powerful. But
the CLR parser has a much greater number of states than the SLR parser, so its storage requirement
is also greater. Therefore we can devise a parser that is intermediate between the two: its power
lies between that of SLR and CLR, and its number of states (and hence storage requirement) is the
same as SLR(1)'s. Such a parser, LALR(1), is very useful: since its states correspond to sets of
LR(1) items, the information about the look-aheads is available in the state itself, which makes
it more powerful than the SLR parser.
A state of the LALR parser is obtained by combining those states of the CLR parser that have
identical LR(0) (core) items but different look-aheads. Merging can never create a new shift/reduce
conflict, but it can create a reduce/reduce conflict: even if none of the CLR states being combined
has a reduce/reduce conflict, the merged LALR state may have one. So we may be able to obtain a CLR
parsing table without multiple entries for a grammar, but when we construct the LALR parsing table
for the same grammar, it might have multiple entries.
Construction of LALR parser:
1. Obtain the canonical collection C' of LR(1) item sets.
2. In C', if more than one set of LR(1) items has the same core (identical LR(0) items) with
different look-aheads, combine these sets into their union; this gives the reduced collection C'
of sets of LR(1) items.
3. Construct the parsing table using the item sets in the reduced C'.
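Step 2 above (merging LR(1) sets that have identical cores) can be sketched in Python. This is an illustration under assumptions: the ten LR(1) sets I0 ... I9 of S → CC, C → cC | d (the sets computed in the example below) are written out by hand as (production, dot, look-ahead) triples, and lalr_merge and expand are hypothetical helper names. Merged states are numbered in order of first appearance, so I3/I6, I4/I7 and I8/I9 each become a single state.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Item1 = Tuple[int, int, str]      # (production, dot position, look-ahead)


def lalr_merge(states: List[FrozenSet[Item1]],
               trans: Dict[Tuple[int, str], int]):
    """Merge LR(1) sets that have the same core (the same set of LR(0) items)."""
    core_id: Dict[FrozenSet[Tuple[int, int]], int] = {}
    merged: List[Set[Item1]] = []
    old_to_new: List[int] = []
    for st in states:
        core = frozenset((p, d) for p, d, _ in st)   # drop the look-aheads
        if core not in core_id:
            core_id[core] = len(merged)
            merged.append(set())
        merged[core_id[core]] |= st                   # union of the look-aheads
        old_to_new.append(core_id[core])
    # Sets with identical cores have GOTO targets with identical cores,
    # so every old transition maps onto one transition between merged states.
    new_trans = {(old_to_new[i], x): old_to_new[j] for (i, x), j in trans.items()}
    return [frozenset(s) for s in merged], new_trans, old_to_new


def expand(*groups) -> FrozenSet[Item1]:
    """expand((p, d, "cd")) gives [p, d, c] and [p, d, d] (one-character terminals)."""
    return frozenset((p, d, a) for p, d, las in groups for a in las)


# (0) S' -> S   (1) S -> CC   (2) C -> cC   (3) C -> d
STATES = [
    expand((0, 0, "$"), (1, 0, "$"), (2, 0, "cd"), (3, 0, "cd")),  # I0
    expand((0, 1, "$")),                                            # I1
    expand((1, 1, "$"), (2, 0, "$"), (3, 0, "$")),                  # I2
    expand((2, 1, "cd"), (2, 0, "cd"), (3, 0, "cd")),               # I3
    expand((3, 1, "cd")),                                           # I4
    expand((1, 2, "$")),                                            # I5
    expand((2, 1, "$"), (2, 0, "$"), (3, 0, "$")),                  # I6
    expand((3, 1, "$")),                                            # I7
    expand((2, 2, "cd")),                                           # I8
    expand((2, 2, "$")),                                            # I9
]
TRANS = {(0, "S"): 1, (0, "C"): 2, (0, "c"): 3, (0, "d"): 4,
         (2, "C"): 5, (2, "c"): 6, (2, "d"): 7,
         (3, "C"): 8, (3, "c"): 3, (3, "d"): 4,
         (6, "C"): 9, (6, "c"): 6, (6, "d"): 7}

merged, new_trans, old_to_new = lalr_merge(STATES, TRANS)
print(len(merged), old_to_new)                  # 7 [0, 1, 2, 3, 4, 5, 3, 4, 6, 6]
print(sorted({a for _, _, a in merged[4]}))     # ['$', 'c', 'd']
```

The merged state 4 (from I4 and I7) contains C → d. with look-aheads c, d and $, so it reduces by C → d on all three symbols, which is the same set of reduce entries the SLR table gives for this state.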

Construction of LALR parsing table:

1. Construct the reduced canonical collection C' = { I0, I1, I2, …, In } of sets of LR(1) items for the augmented grammar G' by combining the sets that have identical cores.
2. State i of the parser is constructed from Ii. The parsing actions for state i are determined as follows:
   a) If [ A → α.aβ, b ] is in Ii, a is a terminal and GOTO(Ii, a) = Ij, then set Action[i, a] = Sj.
   b) If [ A → α., a ] is in Ii and A ≠ S', then set Action[i, a] = rk, where k is the number of the production A → α.
   c) If [ S' → S., $ ] is in Ii, then set Action[i, $] = accept.
3. For every non-terminal A, if GOTO(Ii, A) = Ij, then set GOTO[i, A] = j.
4. All entries not defined by rules (2) and (3) are made 'error'.
5. The initial state of the parser is the one constructed from the set of items containing [ S' → .S, $ ].
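Rules 2 and 3 can be applied mechanically once the merged item sets and their transitions are known, and a table cell that receives two different entries is exactly a conflict. A minimal Python sketch, assuming the merged sets and transitions of the S → CC, C → cC | d example worked out below, with productions numbered 1. S → CC, 2. C → cC, 3. C → d; `build_tables`, `TRANS` and the data layout are illustrative.

```python
def items(*specs):
    """Build LR(1) items (lhs, rhs, dot, look-ahead) from (lhs, rhs, dot, "c/d/$") specs."""
    return {(lhs, rhs, dot, la) for lhs, rhs, dot, las in specs for la in las.split("/")}

TERMINALS = {"c", "d", "$"}
PRODS = {("S", "CC"): 1, ("C", "cC"): 2, ("C", "d"): 3}

# Reduced (merged) collection for S' -> S, S -> CC, C -> cC | d.
STATES = {
    "0": items(("S'", "S", 0, "$"), ("S", "CC", 0, "$"), ("C", "cC", 0, "c/d"), ("C", "d", 0, "c/d")),
    "1": items(("S'", "S", 1, "$")),
    "2": items(("S", "CC", 1, "$"), ("C", "cC", 0, "$"), ("C", "d", 0, "$")),
    "36": items(("C", "cC", 1, "c/d/$"), ("C", "cC", 0, "c/d/$"), ("C", "d", 0, "c/d/$")),
    "47": items(("C", "d", 1, "c/d/$")),
    "5": items(("S", "CC", 2, "$")),
    "89": items(("C", "cC", 2, "c/d/$")),
}
# GOTO between merged states, on terminals and non-terminals alike.
TRANS = {("0", "S"): "1", ("0", "C"): "2", ("0", "c"): "36", ("0", "d"): "47",
         ("2", "C"): "5", ("2", "c"): "36", ("2", "d"): "47",
         ("36", "C"): "89", ("36", "c"): "36", ("36", "d"): "47"}

def build_tables(states, trans):
    """Fill ACTION (rule 2) and GOTO (rule 3); report cells that get two entries."""
    action, goto, conflicts = {}, {}, []

    def put(i, a, entry):
        old = action.setdefault((i, a), entry)
        if old != entry:                                  # multiple entries
            conflicts.append((i, a, old, entry))

    for i, state in states.items():
        for lhs, rhs, dot, la in state:
            if dot < len(rhs) and rhs[dot] in TERMINALS:  # rule 2(a): shift
                put(i, rhs[dot], "S" + trans[(i, rhs[dot])])
            elif dot == len(rhs) and lhs != "S'":         # rule 2(b): reduce
                put(i, la, f"r{PRODS[(lhs, rhs)]}")
            elif dot == len(rhs):                         # rule 2(c): accept
                put(i, "$", "accept")
    for (i, sym), j in trans.items():
        if sym not in TERMINALS:                          # rule 3
            goto[(i, sym)] = j
    return action, goto, conflicts

action, goto, conflicts = build_tables(STATES, TRANS)
print(conflicts)   # []
```

An empty conflict list means every cell holds at most one action, which is the test for the grammar being LALR(1).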

Construct LALR(1) parsing table for the following grammar:


S → CC

C → cC | d

Also parse the input string 'ccdd' using the LALR parsing table.

Augmented grammar:

S’ → S
S → CC
C → cC
C → d

LR(1) items:

I0 = { S' → .S, $
       S → .CC, $
       C → .cC, c/d
       C → .d, c/d }

Goto (I0, S) = { S' → S., $ } → I1
Goto (I0, C) = { S → C.C, $
                 C → .cC, $
                 C → .d, $ } → I2
Goto (I0, c) = { C → c.C, c/d
                 C → .cC, c/d
                 C → .d, c/d } → I3
Goto (I0, d) = { C → d., c/d } → I4
Goto (I2, C) = { S → CC., $ } → I5
Goto (I2, c) = { C → c.C, $
                 C → .cC, $
                 C → .d, $ } → I6
Goto (I2, d) = { C → d., $ } → I7
Goto (I3, C) = { C → cC., c/d } → I8
Goto (I3, c) = { C → c.C, c/d
                 C → .cC, c/d
                 C → .d, c/d } → I3
Goto (I3, d) = { C → d., c/d } → I4
Goto (I6, C) = { C → cC., $ } → I9
Goto (I6, c) = { C → c.C, $
                 C → .cC, $
                 C → .d, $ } → I6
Goto (I6, d) = { C → d., $ } → I7

From the above LR(1) items we see that I3 and I6 have identical LR(0) items that differ only in their look-aheads. The same holds for the pair of states I4, I7 and the pair of states I8, I9. Hence we combine I3 with I6, I4 with I7 and I8 with I9 to obtain the reduced collection of LR(1) items shown below:

I0 = { S' → .S, $
       S → .CC, $
       C → .cC, c/d
       C → .d, c/d }
I1 = { S' → S., $ }
I2 = { S → C.C, $
       C → .cC, $
       C → .d, $ }
I36 = { C → c.C, c/d/$
        C → .cC, c/d/$
        C → .d, c/d/$ }
I47 = { C → d., c/d/$ }
I5 = { S → CC., $ }
I89 = { C → cC., c/d/$ }

LALR Parsing Table (productions numbered 1. S → CC, 2. C → cC, 3. C → d):

               ACTION                     GOTO
State      c       d       $          S      C
0          S36     S47                1      2
1                          accept
2          S36     S47                       5
36         S36     S47                       89
47         r3      r3      r3
5                          r1
89         r2      r2      r2

Parsing actions for the input string ccdd:

STACK          SYMBOLS   INPUT    ACTION
0              $         ccdd$    Shift 36
0 36           c         cdd$     Shift 36
0 36 36        cc        dd$      Shift 47
0 36 36 47     ccd       d$       Reduce by C → d
0 36 36 89     ccC       d$       Reduce by C → cC
0 36 89        cC        d$       Reduce by C → cC
0 2            C         d$       Shift 47
0 2 47         Cd        $        Reduce by C → d
0 2 5          CC        $        Reduce by S → CC
0 1            S         $        Accept

Construct the LALR parsing table for the following grammar and check whether the grammar is LALR(1) or not:

S → E
E → (L)
E → a
L → EL

Augmented grammar:
S' → S
S → E
E → (L)
E → a
L → EL
LR(1) items:
I0 = { S' → .S, $
       S → .E, $
       E → .(L), $
       E → .a, $ }
Goto (I0, S) = { S' → S., $ } → I1
Goto (I0, E) = { S → E., $ } → I2
Goto (I0, ( ) = { E → (.L), $
                  L → .EL, )
                  E → .(L), a/(
                  E → .a, a/( } → I3
Goto (I0, a) = { E → a., $ } → I4
Goto (I3, L) = { E → (L.), $ } → I5
Goto (I3, E) = { L → E.L, )
                 L → .EL, )
                 E → .(L), a/(
                 E → .a, a/( } → I6
Goto (I3, ( ) = { E → (.L), a/(
                  L → .EL, )
                  E → .(L), a/(
                  E → .a, a/( } → I7
Goto (I3, a) = { E → a., a/( } → I8
Goto (I5, ) ) = { E → (L)., $ } → I9
Goto (I6, L) = { L → EL., ) } → I10
Goto (I6, E) = I6
Goto (I6, ( ) = I7
Goto (I6, a) = I8
Goto (I7, L) = { E → (L.), a/( } → I11
Goto (I7, E) = I6
Goto (I7, ( ) = I7
Goto (I7, a) = I8
Goto (I11, ) ) = { E → (L)., a/( } → I12

From the above LR(1) items we see that I3 and I7 have identical LR(0) items that differ only in their look-aheads. The same holds for the pairs I4, I8; I5, I11; and I9, I12. Hence we combine I3 with I7, I4 with I8, I5 with I11 and I9 with I12 to obtain the reduced collection of LR(1) items shown below:
I37 = { E → (.L), $/a/(
        L → .EL, )
        E → .(L), a/(
        E → .a, a/( }
I48 = { E → a., $/a/( }
I511 = { E → (L.), $/a/( }
I912 = { E → (L)., $/a/( }

LALR Parsing Table:
                 ACTION                          GOTO
STATE     a       (       )       $          S    E    L
0         S48     S37                        1    2
1                                 accept
2                                 r1
37        S48     S37                             6    511
48        r3      r3              r3
511                       S912
6         S48     S37                             6    10
912       r2      r2              r2
10                        r4

1. S → E
2. E → ( L )
3. E → a
4. L → EL
Every entry of the above LALR parsing table contains at most one action. In particular, state 10 = { L → EL., ) } has no item with the dot before ')', so Action[10, )] is only r4 (the shift on ')' to state 912 belongs to state 511). Hence the grammar is LALR(1).

Module 5
---------------------------------------------------------------------------------------------------------------------
Introduction to Turing Machine:
• Problems that Computers Cannot Solve
• The Turing machine, problems, Programming Techniques for Turing Machine, Extensions to the Basic Turing Machine
Undecidability:
• A Language That Is Not Recursively Enumerable
• An Undecidable Problem That Is RE
Other Phases of Compilers:
• Syntax Directed Translation
• Syntax-Directed Definitions, Evaluation Orders for SDD's
• Intermediate-Code Generation
• Variants of Syntax Trees, Three-Address Code
• Code Generation
• Issues in the Design of a Code Generator
----------------------------------------------------------------------------------------------------------------
Textbooks:
1. John E. Hopcroft, Rajeev Motwani, Jeffrey D. Ullman, "Introduction to Automata Theory, Languages and Computation", Third Edition, Pearson.
2. Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman, "Compilers: Principles, Techniques and Tools", Second Edition, Pearson.
Textbook 1:
• Chapter 8 – 8.1, 8.2, 8.3, 8.4
• Chapter 9 – 9.1, 9.2
Textbook 2:
• Chapter 5 – 5.1, 5.2
• Chapter 6 – 6.1, 6.2
• Chapter 8 – 8.1
Automata Theory & Compiler Design 21CS51 Module 5

TURING MACHINE
Introduction:
In the early 1930s, mathematicians were trying to define effective computation. Alan Turing in 1936, Alonzo Church in 1933, S. C. Kleene in 1935 and Schönfinkel in 1924 gave various models using the concepts of Turing machines, λ-calculus, combinatory logic, Post systems and μ-recursive functions. It is interesting to note that these were formulated well before electro-mechanical/electronic computers were devised. Although these formalisms describing effective computation are dissimilar, they turn out to be equivalent.
Among these formalisms, Turing's formulation is accepted as the model of an algorithm or computation.
Turing machines are useful in several ways. As an automaton, the Turing machine is the most general model. It accepts type-0 languages (the languages generated by unrestricted grammars). It can also be used for computing functions; it turns out to be a mathematical model of partial recursive functions. Turing machines are also used for determining the undecidability of certain languages and for measuring the space and time complexity of problems.
Type-0 Grammar: A grammar in which every production is of the form α → β, where α is a non-empty string of terminals and non-terminals containing at least one non-terminal, and β is any string of terminals and non-terminals.
Type-0 grammars generate the recursively enumerable languages.
Turing’s Thesis:
• Any computation that can be carried out by mechanical means can be performed by some Turing machine.
• The Church-Turing thesis states that any algorithmic procedure that can be carried out by human beings or computers can be carried out by a Turing machine.
• It has been universally accepted by computer scientists that the Turing machine provides an ideal theoretical model of a computer.
A few arguments for accepting this thesis are:
1. Anything that can be done on an existing digital computer can also be done by a Turing machine.
2. No one has yet been able to suggest a problem, solvable by what we consider an algorithm, for which a Turing machine program cannot be written.

Key Idea behind TM:


To formalize computability, Turing assumed that while computing, a person writes symbols on a one-dimensional paper (instead of the two-dimensional paper that is usually used), which can be viewed as a tape divided into cells.
One scans the cells one at a time and usually performs one of three simple operations:
1. Writing a new symbol in the cell currently being scanned
2. Moving to the cell to the left of the present cell
3. Moving to the cell to the right of the present cell
With these observations in mind, Turing proposed his 'computing machine', called the Turing machine.
*******Define Turing machine
A Turing machine M is a 7-tuple (Q, Σ, Γ, δ, q0, B, F) where
• Q is a finite nonempty set of states.
• Γ is a finite nonempty set of tape symbols.
• B ∈ Γ is the blank symbol.
• Σ ⊆ Γ is a nonempty set of input symbols, and B ∉ Σ.
• δ is the transition function, mapping (q, x) onto (q', y, D), where D denotes the direction of movement of the R/W head: D = L or R according as the movement is to the left or right. That is, δ: Q × Γ → Q × Γ × {L, R} (δ may be undefined for some pairs (q, x)).
• q0 ∈ Q is the initial state, and
• F ⊆ Q is the set of final states.
TURING MACHINE MODEL
With neat diagram explain the working principle of a basic Turing machine.
A Turing machine M is the 7-tuple (Q, Σ, Γ, δ, q0, B, F) whose components are as defined above.

The Turing machine model uses an infinite tape as its unlimited memory. The input symbols occupy some of the tape's cells; they are preceded and followed by an infinite number of blank (B) cells. Each cell can store only one symbol. The input to and the output from the finite control are effected by the R/W head, which examines one cell at a time.
A move of the Turing machine is a function of the state of the finite control and the tape symbol scanned. In one move, the TM enters a next state (which may be the same as the current state), writes a symbol in the scanned cell and moves the R/W head one cell.
At each step of computation the machine:
1. Reads/scans the symbol under the R/W head
2. Writes (updates) a symbol in that cell
3. Moves the R/W head one cell LEFT or one cell RIGHT
The finite control is a kind of finite-state machine, which has
• an initial state
• final (accepting) states
• a rejecting state
A computation can either halt and ACCEPT, halt and REJECT, or LOOP (the machine never halts).
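The three outcomes can be observed by simulating a machine directly. The following Python sketch assumes δ is stored as a dictionary from (state, scanned symbol) to (next state, written symbol, move), a single final state named qf, and a step limit; the toy machine `TOY` is hypothetical and not from the notes. Because no simulator can decide in general whether a machine will ever halt, it can only report that no halt happened within the limit.

```python
def run_tm(delta, w, start="q0", final="qf", blank="B", max_steps=10_000):
    """Simulate a deterministic TM on input w.

    Returns "accept" (entered the final state), "reject" (halted because no
    move is defined) or "no halt" (still running after max_steps moves).
    """
    tape = dict(enumerate(w))          # sparse tape, unbounded in both directions
    state, head = start, 0
    for _ in range(max_steps):
        if state == final:
            return "accept"
        key = (state, tape.get(head, blank))
        if key not in delta:
            return "reject"
        state, write, move = delta[key]
        tape[head] = write
        head += 1 if move == "R" else -1
    return "accept" if state == final else "no halt"

# Hypothetical toy machine: accepts strings starting with a, rejects the empty
# string, and moves back and forth forever on strings starting with b.
TOY = {
    ("q0", "a"): ("qf", "a", "R"),
    ("q0", "b"): ("q1", "b", "R"),
    ("q1", "a"): ("q0", "a", "L"),
    ("q1", "b"): ("q0", "b", "L"),
    ("q1", "B"): ("q0", "B", "L"),
}

for w in ["ab", "", "ba"]:
    print(repr(w), run_tm(TOY, w))
```

This prints accept, reject and no halt respectively, one string for each of the three outcomes.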
REPRESENTATION OF TURING MACHINES
We can describe a Turing machine by employing
1. Instantaneous descriptions (ID) using move-relations,
2. Transition table, and
3. Transition diagram (transition graph).
Instantaneous Description (ID):
An ID of a TM is a string αqβ, where q is the current state, αβ is the string of tape symbols (from Γ) currently on the tape, and the R/W head is scanning the first symbol of β. The initial ID is q0w, where q0 is the start state, w is the input and the R/W head points to the leftmost symbol of w. An accepting (final) ID has the form αβqfB, where qf is a final state in F and the R/W head points to the blank character B. A move from ID1 to ID2 is written ID1 ⊢ ID2, and ⊢* denotes zero or more moves.

REPRESENTATION BY TRANSITION DIAGRAM


We give the definition of δ in the form of a diagram called the transition diagram. When there is a directed edge from state qi to qj with label (α, β, γ), it means that
δ(qi, α) = (qj, β, γ).
During the processing of an input string, suppose the TM enters qi and the R/W head scans the present symbol α. As a result, the symbol β is written in the cell under the R/W head, the R/W head moves to the left or right depending on γ, and the new state is qj.

REPRESENTATION BY TRANSITION TABLE


We give the definition of δ in the form of a table called the transition table:
δ(q, a ) = (p, X, R)
We write (p, X, R) under the a -column and in the q-row. So if we get X in the table, it means that X is
written in the current cell where input a resides, R gives the movement of the head towards Right and p
denotes the new state into which the Turing machine enters.
Example: δ(q, a) = (p, X, R)

State     Tape symbols
          a            X      B
→q        (p, X, R)

LANGUAGE ACCEPTABILITY BY TURING MACHINES


1. The TM can halt and accept by entering a final state.
2. The TM can halt and reject; this happens when the transition function is not defined for the current state and scanned symbol.
3. The TM may never halt, i.e., it enters an infinite loop.
Let M = (Q, Σ, Γ, δ, q0, B, F) be a TM. The language accepted by M is defined as L(M) = { w ∈ Σ* | q0w ⊢* αpβ for some p ∈ F and α, β ∈ Γ* }.
Initially the machine is in the start state q0, with the R/W head pointing to the leftmost symbol of the string w. If, after some sequence of moves, the TM enters a final state and halts, we say that the string w is accepted by the TM.
The language accepted by a TM is called a recursively enumerable language or RE language.

TMs that always halt, irrespective of whether they accept or not, are a good model for an algorithm. If an algorithm exists for a given problem, then the problem is decidable; otherwise it is an undecidable problem.
DESIGN OF TURING MACHINES
Basic guidelines for designing a Turing machine:
• The fundamental objective in scanning a symbol by the R/W head is to 'know' what to do in the future.
• The machine must remember the past symbols scanned. The Turing machine can remember this by going to the next unique state.
• The number of states must be minimized. This can be achieved by changing the states only when there is a change in the written symbol or when there is a change in the movement of the R/W head.
********Design a Turing machine to accept the language L = { aⁿbⁿ | n ≥ 1 }. Write the transition diagram, and show the moves made by the TM for the string "aabb".
General Procedure:
Starting from the left end, the machine checks the first input symbol a, changes it to X and moves the R/W head to the right until it sees the leftmost b. When it finds the leftmost b, it replaces it by Y and moves the R/W head to the left. At this point one a has been matched with one b. The same process is repeated until all a's and b's are replaced by X's and Y's respectively. If in the start state there are no more a's (only Y), the machine changes state and checks that there are no more b's. Finally, when the machine reads B, we conclude that the input contains n a's followed by n b's.
δ ( q0, a ) = ( q1, X, R ) ; replace a by X and move right
δ ( q1, a ) = ( q1, a, R ) ; in the right move, skip all a's and Y's
δ ( q1, Y ) = ( q1, Y, R )
δ ( q1, b ) = ( q2, Y, L ) ; replace b by Y and move left
δ ( q2, a ) = ( q2, a, L ) ; in the left move, skip all a's and Y's
δ ( q2, Y ) = ( q2, Y, L )
δ ( q2, X ) = ( q0, X, R ) ; on finding X in q2, move right, go to q0 and repeat the process
After all a's have been replaced by X's and the matching b's by Y's, the machine in state q0 reads Y, which means there are no more a's; we must then check that there are no more b's. For this, the machine changes state to q3, replaces Y by Y and moves right.

δ ( q0, Y ) = ( q3, Y, R )
In state q3 we should see only Y's and no more b's. So as we scan Y's, we replace Y by Y and remain in q3.
δ ( q3, Y ) = ( q3, Y, R )
In state q3, if the machine reads B, there are no more b's, and the input is accepted since it contains n a's followed by n b's.
δ ( q3, B ) = ( qf, B, R )
Answer:
The TM for the language L = { aⁿbⁿ | n ≥ 1 } is given by
M = ({q0, q1, q2, q3, qf}, {a, b}, {a, b, X, Y, B}, δ, q0, B, {qf}) where δ is the transition function
given by:
δ ( q0, a ) = ( q1, X, R )

δ ( q1, a ) = ( q1, a, R )

δ ( q1, Y ) = ( q1, Y, R )
δ ( q1, b ) = ( q2, Y, L )

δ ( q2, a) = ( q2, a, L )

δ ( q2, Y ) = ( q2, Y, L )
δ ( q2, X ) = ( q0, X, R )

δ ( q0, Y ) = ( q3, Y, R )
δ ( q3, Y ) = ( q3, Y, R )
δ ( q3, B ) = ( qf, B, R )
OR
δ is given by the transition table:
         a            b            X            Y            B
→q0      (q1, X, R)   -            -            (q3, Y, R)   -
q1       (q1, a, R)   (q2, Y, L)   -            (q1, Y, R)   -
q2       (q2, a, L)   -            (q0, X, R)   (q2, Y, L)   -
q3       -            -            -            (q3, Y, R)   (qf, B, R)
qf       -            -            -            -            -

Transition diagram:

ID for the string “aabb”:
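The IDs for "aabb" can be generated mechanically from δ. A minimal Python sketch, assuming δ is stored as a dictionary keyed by (state, scanned symbol), and that an ID is printed as α q β with the state name placed just before the scanned cell (blanks outside the used part of the tape are omitted); `show_id` and `trace` are illustrative names.

```python
DELTA = {
    ("q0", "a"): ("q1", "X", "R"),
    ("q1", "a"): ("q1", "a", "R"), ("q1", "Y"): ("q1", "Y", "R"),
    ("q1", "b"): ("q2", "Y", "L"),
    ("q2", "a"): ("q2", "a", "L"), ("q2", "Y"): ("q2", "Y", "L"),
    ("q2", "X"): ("q0", "X", "R"),
    ("q0", "Y"): ("q3", "Y", "R"),
    ("q3", "Y"): ("q3", "Y", "R"),
    ("q3", "B"): ("qf", "B", "R"),
}

def show_id(tape, state, head, blank="B"):
    """Format an ID alpha q beta: the state name sits just before the scanned cell."""
    used = [i for i, s in tape.items() if s != blank] + [head]
    lo, hi = min(used), max(used)
    left = "".join(tape.get(i, blank) for i in range(lo, head))
    right = "".join(tape.get(i, blank) for i in range(head, hi + 1))
    return " ".join(part for part in (left, state, right) if part)

def trace(delta, w, start="q0", final="qf", blank="B", max_steps=10_000):
    """Run the TM on w; return (accepted?, list of IDs visited)."""
    tape = dict(enumerate(w))          # sparse tape: unwritten cells are blank
    state, head, ids = start, 0, []
    for _ in range(max_steps):
        ids.append(show_id(tape, state, head, blank))
        if state == final:
            return True, ids
        key = (state, tape.get(head, blank))
        if key not in delta:
            return False, ids          # no move defined: halt and reject
        state, write, move = delta[key]
        tape[head] = write
        head += 1 if move == "R" else -1
    return False, ids                  # gave up: no halt within max_steps

accepted, ids = trace(DELTA, "aabb")
print(" ⊢ ".join(ids))
```

For "aabb" this prints q0 aabb ⊢ X q1 abb ⊢ Xa q1 bb ⊢ X q2 aYb ⊢ q2 XaYb ⊢ X q0 aYb ⊢ XX q1 Yb ⊢ XXY q1 b ⊢ XX q2 YY ⊢ X q2 XYY ⊢ XX q0 YY ⊢ XXY q3 Y ⊢ XXYY q3 B ⊢ XXYYB qf B, ending in the accepting ID αβqfB described earlier.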

********Design a Turing machine to accept the language L = { aⁿbⁿcⁿ | n ≥ 1 }. Write the transition diagram, and show the moves made by the TM for the string "aabbcc".
General Procedure:
Starting at the left end, the machine checks the first input symbol a, changes it to X and moves the R/W head to the right; when it finds the leftmost b, it replaces it by Y and keeps moving right; when it finds the leftmost c, it replaces it by Z and moves the R/W head to the left. At this point one a has been matched with one b and one c. The same process is repeated until all a's, b's and c's are replaced by X's, Y's and Z's respectively. If in the start state there are no more a's (only Y), the machine changes state and checks that there are no more b's, then changes state again and checks that there are no more c's (only Z). Finally, when the machine reads B, we conclude that the input contains n a's followed by n b's followed by n c's.
Transition function:

δ ( q0, a ) = ( q1, X, R ) ; replace a by X and move right
δ ( q1, a ) = ( q1, a, R ) ; in the right move, skip all a's and Y's
δ ( q1, Y ) = ( q1, Y, R )
δ ( q1, b ) = ( q2, Y, R ) ; replace b by Y and move right
δ ( q2, b ) = ( q2, b, R ) ; in the right move, skip all b's and Z's
δ ( q2, Z ) = ( q2, Z, R )
δ ( q2, c ) = ( q3, Z, L ) ; replace c by Z and move left
δ ( q3, a ) = ( q3, a, L ) ; in the left move, skip all a's, b's, Y's and Z's
δ ( q3, b ) = ( q3, b, L )
δ ( q3, Y ) = ( q3, Y, L )
δ ( q3, Z ) = ( q3, Z, L )
δ ( q3, X ) = ( q0, X, R ) ; on finding X in the left move, repeat the process from q0
After all a's have been replaced by X's, b's by Y's and c's by Z's, the machine in state q0 reads Y, which means there are no more a's; we must then check that there are no more b's and c's. For this, the machine changes state to q4, replaces Y by Y and moves right.
δ ( q0, Y ) = ( q4, Y, R )
In state q4 we should see only Y's and no more b's. So as we scan Y's, we replace Y by Y and remain in q4.
δ ( q4, Y ) = ( q4, Y, R )
In state q4, if the machine reads Z, there are no more b's; now we must check that there are no more c's and only Z's are present. So on scanning the first Z, the machine changes state to q5, replaces Z by Z and moves right.
δ ( q4, Z ) = ( q5, Z, R )
In state q5 only Z's should be present, so as long as the scanned symbol is Z, the machine remains in q5, replaces Z by Z and moves right.
δ ( q5, Z ) = ( q5, Z, R )
Once the blank symbol is encountered, the machine changes state to qf, replaces B by B and moves right; the input is accepted in qf.
δ ( q5, B ) = ( qf, B, R )
Answer:
The TM for the language L = { aⁿbⁿcⁿ | n ≥ 1 } is given by
M = ({q0, q1, q2, q3, q4, q5, qf}, {a, b, c}, {a, b, c, X, Y, Z, B}, δ, q0, B, {qf}) where δ is the transition
function given by:

δ ( q0, a ) = ( q1, X, R )
δ ( q1, a ) = ( q1, a, R )
δ ( q1, Y ) = ( q1, Y, R )
δ ( q1, b ) = ( q2, Y, R )
δ ( q2, b) = ( q2, b, R)
δ ( q2, Z) = ( q2, Z, R)
δ ( q2, c) = ( q3, Z, L)
δ ( q3, a ) = ( q3, a, L)
δ ( q3, b ) = ( q3, b, L)
δ ( q3, Y) = ( q3, Y, L)
δ ( q3, Z ) = ( q3, Z, L)
δ ( q3, X ) = ( q0, X, R)
δ ( q0, Y ) = ( q4, Y, R )
δ ( q4, Y ) = ( q4, Y, R ) .
δ ( q4, Z ) = ( q5, Z, R )
δ ( q5, Z ) = ( q5, Z, R ) .
δ ( q5, B ) = ( qf, B, R ) .
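To check this transition function on more than one string, it can be run in a small simulator. A minimal Python sketch, assuming δ is stored as a dictionary from (state, scanned symbol) to (next state, written symbol, move) and that a string is accepted exactly when the machine enters qf; `run_tm` is an illustrative name.

```python
def run_tm(delta, w, start="q0", final="qf", blank="B", max_steps=10_000):
    """Return True if the TM enters the final state on input w, else False."""
    tape = dict(enumerate(w))          # sparse tape: unwritten cells are blank
    state, head = start, 0
    for _ in range(max_steps):
        if state == final:
            return True
        key = (state, tape.get(head, blank))
        if key not in delta:
            return False               # no move defined: halt and reject
        state, write, move = delta[key]
        tape[head] = write
        head += 1 if move == "R" else -1
    return state == final              # gave up after max_steps moves

DELTA = {
    ("q0", "a"): ("q1", "X", "R"),
    ("q1", "a"): ("q1", "a", "R"), ("q1", "Y"): ("q1", "Y", "R"),
    ("q1", "b"): ("q2", "Y", "R"),
    ("q2", "b"): ("q2", "b", "R"), ("q2", "Z"): ("q2", "Z", "R"),
    ("q2", "c"): ("q3", "Z", "L"),
    ("q3", "a"): ("q3", "a", "L"), ("q3", "b"): ("q3", "b", "L"),
    ("q3", "Y"): ("q3", "Y", "L"), ("q3", "Z"): ("q3", "Z", "L"),
    ("q3", "X"): ("q0", "X", "R"),
    ("q0", "Y"): ("q4", "Y", "R"),
    ("q4", "Y"): ("q4", "Y", "R"),
    ("q4", "Z"): ("q5", "Z", "R"),
    ("q5", "Z"): ("q5", "Z", "R"),
    ("q5", "B"): ("qf", "B", "R"),
}

for w in ["abc", "aabbcc", "aabbc", "abbcc", "abcabc", ""]:
    print(w or "ε", run_tm(DELTA, w))
```

Only abc and aabbcc are accepted; each rejected string stops at a (state, symbol) pair with no move, for example abbcc stops in q4 on the extra b.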
Transition Diagram:

ID for the string w = aabbcc


q0 aabbccB ⊢ X q1 abbccB ⊢ Xa q1 bbccB ⊢ XaY q2 bccB ⊢ XaYb q2 ccB ⊢ XaY q3 bZcB
⊢ Xa q3 YbZcB ⊢ X q3 aYbZcB ⊢ q3 XaYbZcB ⊢ X q0 aYbZcB ⊢ XX q1 YbZcB ⊢ XXY q1 bZcB
⊢ XXYY q2 ZcB ⊢ XXYYZ q2 cB ⊢ XXYY q3 ZZB ⊢ XXY q3 YZZB ⊢ XX q3 YYZZB ⊢ X q3 XYYZZB
⊢ XX q0 YYZZB ⊢ XXY q4 YZZB ⊢ XXYY q4 ZZB ⊢ XXYYZ q5 ZB ⊢ XXYYZZ q5 B ⊢ XXYYZZB qf

********Design a Turing machine to accept the language consisting of all palindromes of 0's and 1's. Write the transition diagram. Also write the moves made by the TM for the string 101.

General Procedure:
Starting at the left end, the machine checks the first input symbol: if it is a 0, it changes it to X; similarly, if it is a 1, it changes it to Y. It then moves the R/W head to the right until it sees a blank (or, in later rounds, an X or Y written earlier). On finding it, the machine moves the R/W head one cell left and checks whether the scanned input symbol matches the one most recently changed. If so, it is also changed correspondingly, and the machine moves back left until it finds the leftmost remaining 0 or 1. This process is continued by moving left and right alternately until all 0's and 1's have been matched.
δ ( q0, 0 ) = ( q1, X, R ) ; in start state q0, replace 0 by X, change state to q1 and move right
δ ( q0, 1 ) = ( q2, Y, R ) ; in start state q0, replace 1 by Y, change state to q2 and move right
δ ( q1, 0 ) = ( q1, 0, R ) ; in state q1 or q2, skip 0's and 1's and move right until B, X or Y is seen
δ ( q1, 1 ) = ( q1, 1, R )
δ ( q2, 0 ) = ( q2, 0, R )
δ ( q2, 1 ) = ( q2, 1, R )
δ ( q1, B ) = ( q3, B, L ) ; in q1 (or q2), on finding B, X or Y, change state to q3 (or q4) and move left
δ ( q1, X ) = ( q3, X, L )
δ ( q1, Y ) = ( q3, Y, L )
δ ( q2, B ) = ( q4, B, L )
δ ( q2, X ) = ( q4, X, L )
δ ( q2, Y ) = ( q4, Y, L )
In state q3 the machine verifies that the symbol read is 0, changes the 0 to an X and goes to state q5; in state q4 it verifies that the symbol read is 1, changes the 1 to a Y and goes to state q5.
δ ( q3, 0 ) = ( q5, X, L )
δ ( q4, 1 ) = ( q5, Y, L )
In state q5 the machine moves left, skipping the 0's and 1's it encounters, until it finds an X or a Y.

δ ( q5, 0) = ( q5, 0, L)
δ ( q5, 1) = ( q5, 1, L)
Now when it finds X or Y, once again machine changes state to q0 and moves right.
δ ( q5, X) = ( q0, X, R)
δ ( q5, Y) = ( q0, Y, R)
Once again there are two possible cases in state q0:
1. If the machine sees a 0 or 1, it repeats the matching cycle just described.
2. If the machine sees X or Y, all 0's have been changed to X's and all 1's to Y's; the input was a palindrome of even length, so the machine should accept. Thus the machine enters state qf and halts.
δ ( q0, X ) = ( qf, X, R )
δ ( q0, Y ) = ( qf, Y, R )
If the machine is in state q3 or q4 and reads an X or a Y instead of a 0 or 1, it concludes that the input was a palindrome of odd length (the unmatched middle symbol has just been changed). In this case it changes to state qf.
δ ( q3, X ) = ( qf, X, R )
δ ( q3, Y ) = ( qf, Y, R )
δ ( q4, X ) = ( qf, X, R )
δ ( q4, Y ) = ( qf, Y, R )

Note: If the machine encounters a 1 in state q3 or a 0 in state q4, the input is not a palindrome, so the machine halts without accepting (no move is defined).
Answer:
The TM for the language consisting of all palindromes of 0's and 1's is given by
M = ({q0, q1, q2, q3, q4, q5, qf}, { 0, 1}, { 0, 1, X, Y, B}, δ, q0, B, {qf}) where δ is the transition
function given by:
δ ( q0, 0 ) = ( q1, X, R )
δ ( q0, 1 ) = ( q2, Y, R )
δ (q0, B) = ( qf, B, R)
δ ( q0, X) = ( qf, X, R)
δ ( q0, Y) = ( qf, Y, R)

δ ( q1, 0 ) = ( q1, 0, R )
δ ( q1, 1 ) = ( q1,1, R )
δ ( q2, 0 ) = ( q2, 0, R )
δ ( q2, 1 ) = ( q2, 1, R )
δ ( q1, B) = ( q3, B, L )
δ ( q1, X) = ( q3, X, L )
δ ( q1, Y) = ( q3, Y, L )
δ ( q2, B) = ( q4, B, L )
δ ( q2, X) = ( q4, X, L )
δ ( q2, Y) = ( q4, Y, L )
δ ( q3, 0) = ( q5, X, L)
δ ( q3, X) = ( qf, X, R)
δ ( q3, Y) = ( qf, Y, R)
δ ( q4, 1) = ( q5, Y, L)
δ ( q4, X) = ( qf, X, R)
δ ( q4, Y) = ( qf, Y, R)
δ ( q5, 0) = ( q5, 0, L)
δ ( q5, 1) = ( q5, 1, L)
δ ( q5, X) = ( q0, X, R)
δ ( q5, Y) = ( q0, Y, R)
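A minimal Python sketch that builds exactly the 25 moves listed above and runs them on a few strings. It assumes δ is stored as a dictionary from (state, scanned symbol) to (next state, written symbol, move) and that acceptance means entering qf; the loops only group moves that share a pattern, and `run_tm` and `PAL` are illustrative names.

```python
def run_tm(delta, w, start="q0", final="qf", blank="B", max_steps=10_000):
    """Return True if the TM enters the final state on input w, else False."""
    tape = dict(enumerate(w))
    state, head = start, 0
    for _ in range(max_steps):
        if state == final:
            return True
        key = (state, tape.get(head, blank))
        if key not in delta:
            return False                      # no move defined: halt and reject
        state, write, move = delta[key]
        tape[head] = write
        head += 1 if move == "R" else -1
    return state == final

PAL = {("q0", "0"): ("q1", "X", "R"), ("q0", "1"): ("q2", "Y", "R")}
for s in "BXY":
    PAL[("q0", s)] = ("qf", s, "R")           # nothing left to match
for q, back in (("q1", "q3"), ("q2", "q4")):
    for s in "01":
        PAL[(q, s)] = (q, s, "R")             # run right over unmatched symbols
    for s in "BXY":
        PAL[(q, s)] = (back, s, "L")          # step back onto the rightmost unmatched one
PAL[("q3", "0")] = ("q5", "X", "L")
PAL[("q4", "1")] = ("q5", "Y", "L")
for q in ("q3", "q4"):
    for s in "XY":
        PAL[(q, s)] = ("qf", s, "R")          # odd length: middle symbol already marked
for s in "01":
    PAL[("q5", s)] = ("q5", s, "L")           # run left back to the last X or Y
for s in "XY":
    PAL[("q5", s)] = ("q0", s, "R")

print(len(PAL))  # 25
for w in ["101", "0110", "", "0", "10", "100"]:
    print(w or "ε", run_tm(PAL, w))
```

Here 101, 0110, the empty string and 0 are accepted, while 10 and 100 halt in q4 on a 0, the "not a palindrome" case in the Note above.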
Transition Diagram:

ID for the string 101:


q0 101B ⊢ Y q2 01B ⊢ Y0 q2 1B ⊢ Y01 q2 B ⊢ Y0 q4 1B ⊢ Y q5 0YB ⊢ q5 Y0YB ⊢ Y q0 0YB
⊢ YX q1 YB ⊢ Y q3 XYB ⊢ YX qf YB

******Design a TM that accepts the language L = {wwᴿ | w ∈ {0, 1}*}. Write its transition diagram. Also show the moves made by the TM for the string 0110.
Note: The answer is the same as that of the previous problem, except that for wwᴿ (palindromes of even length) no transitions are defined from states q3 and q4 on the symbols X and Y.
Answer:
The TM for the language L = {wwᴿ | w ∈ {0, 1}*} is given by
M = ({q0, q1, q2, q3, q4, q5, qf}, {0, 1}, {0, 1, X, Y, B}, δ, q0, B, {qf}) where δ is the transition
function given by:
δ ( q0, 0 ) = ( q1, X, R )
δ ( q0, 1 ) = ( q2, Y, R )
δ (q0, B) = ( qf, B, R)
δ ( q0, X) = ( qf, X, R)
δ ( q0, Y) = ( qf, Y, R)
δ ( q1, 0 ) = ( q1, 0, R )
δ ( q1, 1 ) = ( q1,1, R )
δ ( q2, 0 ) = ( q2, 0, R )
δ ( q2, 1 ) = ( q2, 1, R )
δ ( q1, B) = ( q3, B, L )
δ ( q1, X) = ( q3, X, L )
δ ( q1, Y) = ( q3, Y, L )
δ ( q2, B) = ( q4, B, L )
δ ( q2, X) = ( q4, X, L )
δ ( q2, Y) = ( q4, Y, L )
δ ( q3, 0) = ( q5, X, L)
δ ( q4, 1) = ( q5, Y, L)
δ ( q5, 0) = ( q5, 0, L)
δ ( q5, 1) = ( q5, 1, L)

δ ( q5, X) = ( q0, X, R)
δ ( q5, Y) = ( q0, Y, R)
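A small simulation shows the effect of leaving out the four moves of q3 and q4 on X and Y: odd-length palindromes such as 101 are now rejected. A minimal Python sketch, assuming δ is stored as a dictionary from (state, scanned symbol) to (next state, written symbol, move) and acceptance on entering qf; `run_tm` and `WWR` are illustrative names, and the loops build exactly the 21 moves listed above.

```python
def run_tm(delta, w, start="q0", final="qf", blank="B", max_steps=10_000):
    """Return True if the TM enters the final state on input w, else False."""
    tape = dict(enumerate(w))
    state, head = start, 0
    for _ in range(max_steps):
        if state == final:
            return True
        key = (state, tape.get(head, blank))
        if key not in delta:
            return False                      # no move defined: halt and reject
        state, write, move = delta[key]
        tape[head] = write
        head += 1 if move == "R" else -1
    return state == final

WWR = {("q0", "0"): ("q1", "X", "R"), ("q0", "1"): ("q2", "Y", "R")}
for s in "BXY":
    WWR[("q0", s)] = ("qf", s, "R")           # every symbol matched: even length
for q, back in (("q1", "q3"), ("q2", "q4")):
    for s in "01":
        WWR[(q, s)] = (q, s, "R")
    for s in "BXY":
        WWR[(q, s)] = (back, s, "L")
WWR[("q3", "0")] = ("q5", "X", "L")
WWR[("q4", "1")] = ("q5", "Y", "L")
for s in "01":
    WWR[("q5", s)] = ("q5", s, "L")
for s in "XY":
    WWR[("q5", s)] = ("q0", s, "R")
# No moves for (q3, X), (q3, Y), (q4, X), (q4, Y): reaching one means odd length.

for w in ["0110", "1001", "", "101", "0"]:
    print(w or "ε", run_tm(WWR, w))
```

With these moves 101 now stops in q3 on X and is rejected, while the even-length palindromes are still accepted.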
Transition Diagram:

ID for the string 0110:


q0 0110B ⊢ X q1 110B ⊢ X1 q1 10B ⊢ X11 q1 0B ⊢ X110 q1 B ⊢ X11 q3 0B ⊢ X1 q5 1XB ⊢ X q5 11XB
⊢ q5 X11XB ⊢ X q0 11XB ⊢ XY q2 1XB ⊢ XY1 q2 XB ⊢ XY q4 1XB ⊢ X q5 YYXB ⊢ XY q0 YXB ⊢ XYY qf XB

******Design a TM that accepts the language L = {w | Na(w) = Nb(w), w ∈ {a, b}*}. Write its transition diagram. Also show the moves made by the TM for the string bbabaa.
General Procedure:
Three cases are possible:
1. On encountering B in the start state, the machine directly enters the final state qf.
2. On encountering a in state q0.
3. On encountering b in state q0.
On encountering B in the start state, the machine directly enters the final state qf.
δ ( q0, B ) = ( qf, B, R )
In start state q0, on encountering a, the machine skips the following a's and Y's until it finds b. It then comes back to the next leftmost unprocessed symbol and applies one of the three cases, depending on the next input symbol to be scanned.
In start state q0, on encountering b, the machine skips the following b's and Y's until it finds a. It then comes back to the next leftmost unprocessed symbol and applies one of the three cases, depending on the next input symbol to be scanned.
On encountering a:
δ ( q0, a ) = ( q1, X, R ) ; replace a by X and move right to find b
δ ( q1, a ) = ( q1, a, R ) ; skip all a's and Y's until b is found
δ ( q1, Y ) = ( q1, Y, R )
δ ( q1, b ) = ( q2, Y, L ) ; replace b by Y, move left and find the next leftmost symbol
δ ( q2, a ) = ( q2, a, L ) ; while searching for X, skip any a's and Y's
δ ( q2, Y ) = ( q2, Y, L )
δ ( q2, X ) = ( q0, X, R ) ; on finding X, go to q0 and repeat
On encountering b:
δ ( q0, b ) = ( q3, X, R ) ; replace b by X and move right to find a
δ ( q3, b ) = ( q3, b, R ) ; skip all b's and Y's until a is found
δ ( q3, Y ) = ( q3, Y, R )
δ ( q3, a ) = ( q4, Y, L ) ; replace a by Y, move left and find the next leftmost symbol
δ ( q4, b ) = ( q4, b, L ) ; while searching for X, skip any b's and Y's
δ ( q4, Y ) = ( q4, Y, L )
δ ( q4, X ) = ( q0, X, R ) ; on finding X, go to q0 and repeat
In state q0, if the machine reads Y, the symbols scanned so far contain equal numbers of a's and b's. So it replaces Y by Y, moves the R/W head to the right, remains in q0 and applies one of the three cases again.
δ ( q0, Y ) = ( q0, Y, R )
Finally, when the machine in state q0 reads B (no unmatched input is left), it enters the final state qf and the input is accepted.
Answer:
The TM for the language L = {w | Na(w) = Nb(w), w ∈ {a, b}*} is given by
M = ({q0, q1, q2, q3, q4, qf}, {a, b}, {a, b, X, Y, B}, δ, q0, B, {qf}) where δ is the transition function
given by:

δ ( q0, B) = ( qf, B, R)
δ ( q0, a) = ( q1, X, R)

δ ( q0, Y) = ( q0, Y, R)
δ ( q1, a) = ( q1, a, R)
δ ( q1, Y) = ( q1, Y, R)
δ ( q1, b) = ( q2, Y, L)
δ ( q2, a) = ( q2, a, L)
δ ( q2, Y) = ( q2, Y, L)
δ ( q2, X) = ( q0, X, R)
δ ( q0, b) = ( q3, X, R)
δ ( q3, b) = ( q3, b, R)
δ ( q4, X) = ( q0, X, R)
δ ( q3, a) = ( q4, Y, L)
δ ( q4, b) = ( q4, b, L)
δ ( q4, Y) = ( q4, Y, L)
δ ( q3, Y) = ( q3, Y, R)
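The sixteen moves above can be checked against strings with equal and unequal counts of a's and b's. A minimal Python sketch, assuming δ is stored as a dictionary from (state, scanned symbol) to (next state, written symbol, move) and acceptance on entering qf; `run_tm` and `DELTA` are illustrative names.

```python
def run_tm(delta, w, start="q0", final="qf", blank="B", max_steps=10_000):
    """Return True if the TM enters the final state on input w, else False."""
    tape = dict(enumerate(w))
    state, head = start, 0
    for _ in range(max_steps):
        if state == final:
            return True
        key = (state, tape.get(head, blank))
        if key not in delta:
            return False                      # no move defined: halt and reject
        state, write, move = delta[key]
        tape[head] = write
        head += 1 if move == "R" else -1
    return state == final

DELTA = {
    ("q0", "B"): ("qf", "B", "R"), ("q0", "Y"): ("q0", "Y", "R"),
    # a first: mark it X, find a b to its right, mark that b Y
    ("q0", "a"): ("q1", "X", "R"),
    ("q1", "a"): ("q1", "a", "R"), ("q1", "Y"): ("q1", "Y", "R"),
    ("q1", "b"): ("q2", "Y", "L"),
    ("q2", "a"): ("q2", "a", "L"), ("q2", "Y"): ("q2", "Y", "L"),
    ("q2", "X"): ("q0", "X", "R"),
    # b first: mark it X, find an a to its right, mark that a Y
    ("q0", "b"): ("q3", "X", "R"),
    ("q3", "b"): ("q3", "b", "R"), ("q3", "Y"): ("q3", "Y", "R"),
    ("q3", "a"): ("q4", "Y", "L"),
    ("q4", "b"): ("q4", "b", "L"), ("q4", "Y"): ("q4", "Y", "L"),
    ("q4", "X"): ("q0", "X", "R"),
}

for w in ["bbabaa", "abab", "", "aab", "bba"]:
    print(w or "ε", run_tm(DELTA, w))
```

Strings with an unmatched symbol run off the right end in q1 or q3 and halt there, because neither state has a move on B.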
Transition Diagram:

ID for the string bbabaa:


q0 bbabaaB ⊢ X q3 babaaB ⊢ Xb q3 abaaB ⊢ X q4 bYbaaB ⊢ q4 XbYbaaB ⊢ X q0 bYbaaB
⊢ XX q3 YbaaB ⊢ XXY q3 baaB ⊢ XXYb q3 aaB ⊢ XXY q4 bYaB ⊢ XX q4 YbYaB ⊢ X q4 XYbYaB
⊢ XX q0 YbYaB ⊢ XXY q0 bYaB ⊢ XXYX q3 YaB ⊢ XXYXY q3 aB ⊢ XXYX q4 YYB ⊢ XXY q4 XYYB
⊢ XXYX q0 YYB ⊢ XXYXY q0 YB ⊢ XXYXYY q0 B ⊢ XXYXYYB qf

*********Given a string w, design a Turing machine that generates the string ww, where w ∈ a*.
General Procedure:
1. Replace each symbol in w with X.
2. Find the rightmost X.
3. Replace the rightmost X by the symbol a.
4. Move the R/W head to the right of the rightmost a and replace B by a.
5. Find the rightmost X.
6. Repeat from step 3 until there are no more X's.
In state q0, keep replacing the input symbol a by X and moving the R/W head to the right until B is found.
δ ( q0, a ) = ( q0, X, R )
In state q0, on finding B, replace B by B, change the state to q1 and move the R/W head to the left until X is found.
δ ( q0, B ) = ( q1, B, L )
If a is read in state q1, replace a by a and move left.
δ ( q1, a ) = ( q1, a, L )
On reading X in q1, replace it by a, move right and change the state to q2.
δ ( q1, X ) = ( q2, a, R )
If a is read in state q2, replace a by a and move right until B is found.
δ ( q2, a ) = ( q2, a, R )
In state q2, on finding B, replace B by a, change the state to q1 and move the R/W head to the left until X is found; then repeat the above steps.
δ ( q2, B ) = ( q1, a, L )
Finally, when there are no more X's and the machine in state q1 reads B, it changes the state to qf, replaces B by B and moves right.
δ ( q1, B ) = ( qf, B, R )
The Turing machine that generates the string ww, where w ∈ a*, is given by
M = ({q0, q1, q2, qf}, {a}, {a, X, B}, δ, q0, B, {qf}) where δ is the transition function given by:
δ ( q0, a) = ( q0, X, R)
δ ( q0, B) = ( q1, B, L)
δ ( q1, a) = ( q1, a, L)

δ ( q1, X) = ( q2, a, R)
δ ( q2, a) = ( q2, a, R)
δ ( q2, B) = ( q1, a, L)
δ ( q1, B) = ( qf, B, R)
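As a quick check of the construction, the transitions above can be run on a small simulator. This is a minimal Python sketch; the helper `run_tm`, the dictionary encoding of δ and the step limit are assumptions made for illustration and are not part of the notes.

```python
def run_tm(delta, w, start, finals, blank="B", max_steps=10_000):
    """Run a deterministic TM on input w.

    Returns (accepted, tape), where tape is the non-blank region of the tape.
    The machine halts on reaching a final state or when delta is undefined.
    """
    cells = dict(enumerate(w))          # sparse tape: position -> symbol
    state, head = start, 0
    for _ in range(max_steps):
        key = (state, cells.get(head, blank))
        if state in finals or key not in delta:
            break
        state, cells[head], move = delta[key]
        head += 1 if move == "R" else -1
    else:
        raise RuntimeError("step limit reached; the machine may not halt")
    used = [i for i, s in cells.items() if s != blank]
    tape = "".join(cells.get(i, blank) for i in range(min(used), max(used) + 1)) if used else ""
    return state in finals, tape


# δ of the TM that turns w into ww for w ∈ a*
DELTA_WW = {
    ("q0", "a"): ("q0", "X", "R"),
    ("q0", "B"): ("q1", "B", "L"),
    ("q1", "a"): ("q1", "a", "L"),
    ("q1", "X"): ("q2", "a", "R"),
    ("q2", "a"): ("q2", "a", "R"),
    ("q2", "B"): ("q1", "a", "L"),
    ("q1", "B"): ("qf", "B", "R"),
}

print(run_tm(DELTA_WW, "aaa", "q0", {"qf"}))   # (True, 'aaaaaa')
```

Running it on aaa leaves aaaaaa on the tape, i.e. ww with w = aaa.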
DESCRIPTION OF TURING MACHINES
In the examples discussed so far, the transition function δ was described as a partial function (δ: Q × Γ → Q × Γ × {L, R} is not defined for all (q, x)) by spelling out the current state, the input symbol, the resulting state, the tape symbol replacing the input symbol and the movement of the R/W head to the left or right. We call this a formal description of a TM. Just as a computer has machine language and higher-level languages, a TM can have a higher level of description, called the implementation description. In this case we describe the movement of the head, the symbols stored, etc. in English.
For example, a single instruction like 'move to the right till the end of the input string' requires several moves. A single instruction in the implementation description is equivalent to several moves of a standard TM.
At a still higher level, we can give instructions in English without specifying the states or the transition function. This is called a high-level description. In the following sections we use implementation descriptions or high-level descriptions.
TECHNIQUES FOR TM CONSTRUCTION
The Turing machine which we have discussed so far is called the standard or basic Turing machine. In this section we give some high-level conceptual tools to make the construction of TMs easier.
1. Turing Machine with stationary head
2. Storage in the state
3. Multiple Track Turing Machine
4. Subroutines
TURING MACHINE WITH STATIONARY HEAD
In the definition of a TM we defined δ(q, a) = (p, Y, D), where D = L or R. So the head moves to the left or right after reading an input symbol. Suppose we want to include the option that the head can continue to be in the same cell for some input symbol. Then we define δ(q, a) = (p, Y, S). This means that the TM, on reading the input symbol a, changes the state to p and writes Y in the current cell in place of a, and the R/W head continues to remain in the same cell.
A stationary move can be simulated by the standard TM with two moves: δ(q, a) = (p', Y, R), where p' is a new state, followed by δ(p', X) = (p, X, L) for every X ∈ Γ. The head steps right and then immediately returns to the same cell.
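The two-move simulation of a stationary move can also be expressed as a transformation of the transition table. This is a Python sketch under the assumption that δ is stored as a dictionary; the name `remove_stationary_moves` and the naming of the fresh state p' as ("back", p) are hypothetical.

```python
def remove_stationary_moves(delta, gamma):
    """Rewrite every move (p, Y, 'S') as two standard moves.

    δ(q, a) = (p, Y, S) becomes δ(q, a) = (p', Y, R) and δ(p', X) = (p, X, L)
    for every X in gamma. The fresh state p' is named ("back", p) in this sketch.
    """
    new_delta = {}
    for (q, a), (p, y, move) in delta.items():
        if move == "S":
            back = ("back", p)
            new_delta[(q, a)] = (back, y, "R")        # write Y and step right ...
            for x in gamma:
                new_delta[(back, x)] = (p, x, "L")    # ... then step back, leaving X unchanged
        else:
            new_delta[(q, a)] = (p, y, move)
    return new_delta


converted = remove_stationary_moves({("q", "a"): ("p", "Y", "S")}, gamma=["a", "Y", "B"])
for key, value in converted.items():
    print(key, "->", value)
```

Sharing one p' per target state p is harmless here, because p' does nothing except step back to the original cell.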

STORAGE IN THE STATE


As shown in the model of the Turing machine, a TM has a finite control unit. This finite control can be used to hold a finite amount of information in addition to the state. The state is then written as a pair of elements: the current state and a tape symbol remembered by the finite control.
Example: The transition function δ can be written as follows:
δ([q0, 0], 1) = ([q1, 1], X, R)
This means that if the finite control is in state q0 and stores the symbol 0, and the machine reads the symbol 1, then the machine goes to the next state q1, stores 1 in its finite control, replaces that 1 by X and moves to the right.
Remembering a symbol in the state in this way simplifies the design of the transitions, since the machine does not have to write that symbol on the tape.
The new set of states becomes Q × Γ.
Construct a TM that accepts the language 0 1* + 1 0*
The TM for the language L = {0 1* + 1 0*} is given by
M = ({q0, q1, q2, qf}, {0, 1}, {0, 1, B}, δ, q0, B, {qf}) where δ is the transition function given by:
δ ( q0, 0) = ( q1, 0, R)
δ ( q0, 1) = ( q2, 1, R)
δ ( q1, 1) = ( q1, 1, R)
δ ( q2, 0) = ( q2, 0, R)
δ ( q1, B) = ( qf, B, R)
δ ( q2, B) = ( qf, B, R)

Using storage in the state, the above machine can be described as follows:

State [q0, B] → In the initial state, M is in q0 and has seen only B in its data portion.
In state [q0, B], on seeing 0 as the first symbol of the input string w, M moves right and enters the state [q1, 0].
In state [q0, B], on seeing 1 as the first symbol of the input string w, M moves right and enters the state [q2, 1].
In [q1, 0] → M moves right without changing state for input symbol 1.
In [q2, 1] → M moves right without changing state for input symbol 0.
In state [q1, 0], if its next symbol is B, M enters [qf, B], an accepting state.
In state [q2, 1], if its next symbol is B, M enters [qf, B], an accepting state.
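The storage-in-the-state version of this machine can be sketched in Python by making every state a pair. The tuple encoding of [q, a] and the function `accepts` are assumptions of this sketch; the transitions follow the description above.

```python
# δ for L(01* + 10*) with states written as pairs [q, stored symbol]
DELTA = {
    (("q0", "B"), "0"): (("q1", "0"), "0", "R"),
    (("q0", "B"), "1"): (("q2", "1"), "1", "R"),
    (("q1", "0"), "1"): (("q1", "0"), "1", "R"),
    (("q2", "1"), "0"): (("q2", "1"), "0", "R"),
    (("q1", "0"), "B"): (("qf", "B"), "B", "R"),
    (("q2", "1"), "B"): (("qf", "B"), "B", "R"),
}
ACCEPT = ("qf", "B")


def accepts(w):
    tape = list(w) + ["B"]
    state, head = ("q0", "B"), 0
    while state != ACCEPT:
        key = (state, tape[head])
        if key not in DELTA:
            return False                 # no move defined: halt without accepting
        state, tape[head], _ = DELTA[key]
        head += 1                        # every move of this machine is to the right
        if head == len(tape):
            tape.append("B")
    return True


for w in ["0111", "1000", "0", "010", ""]:
    print(repr(w), accepts(w))
```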

MULTIPLE TRACK TURING MACHINE


Write a short note on Multiple Track Turing machine.
In the standard Turing machine defined earlier, a single tape was used. In a multiple-track TM, the single tape is assumed to be divided into several tracks. Now the tape alphabet is required to consist of k-tuples of tape symbols, k being the number of tracks. Hence the only difference between the standard TM and the TM with multiple tracks is the set of tape symbols.
In the case of the standard Turing machine, tape symbols are elements of Γ; in the case of a TM with k tracks, they are elements of $\Gamma^k$. The moves are defined in a similar way.

Example:

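Here the tape symbols are defined on 3 tracks, i.e. they are elements of $\Gamma^3$; for example, the scanned cell holds [c, a, b]. The move
δ(q, [c, a, b]) = (qnext, [Z, X, Y], R)
replaces c, a, b by Z, X, Y on the three tracks of the scanned cell and moves the head one cell to the right.
The resultant tape structure is as shown below: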
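A multiple-track tape can be modelled by letting every cell hold a tuple. The Python sketch below is only an illustration (the `step` helper and the tuple encoding are assumptions); it applies the move δ(q, [c, a, b]) = (qnext, [Z, X, Y], R).

```python
# Each cell of a 3-track tape holds a 3-tuple (track 1, track 2, track 3).
BLANK3 = ("B", "B", "B")
DELTA = {("q", ("c", "a", "b")): ("qnext", ("Z", "X", "Y"), "R")}


def step(state, tape, head):
    """Apply one move; the only difference from a standard TM is that symbols are tuples."""
    new_state, written, move = DELTA[(state, tape[head])]
    tape[head] = written
    head += 1 if move == "R" else -1
    if head == len(tape):
        tape.append(BLANK3)              # extend the tape with a blank 3-tuple
    return new_state, head


tape = [("c", "a", "b")]
state, head = step("q", tape, 0)
print(state, tape, head)   # qnext [('Z', 'X', 'Y'), ('B', 'B', 'B')] 1
```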

SUBROUTINES
Subroutines are used in computer languages when some task has to be done repeatedly. We can implement this facility for TMs as well.
A TM subroutine is a set of states that performs some predefined task. The TM subroutine has a start state and a state without any moves. This state, which has no moves, serves as the return state and passes control to the state which called the subroutine.
Design a TM which can multiply two positive integers
The input (m, n), where m and n are the given positive integers, is represented by $0^m 1 0^n$. M starts with $0^m 1 0^n$, followed by a delimiter 1, on its tape. At the end of the computation, $0^{mn}$ (mn in unary representation) surrounded by B's is obtained as the output.
General Procedure:
1. $0^m 1 0^n 1$ is placed on the tape and the output will be written after the rightmost 1.
2. The leftmost 0 is erased by replacing 0 by B.
3. A block of n 0's is copied onto the right end.
4. Steps 2 and 3 are repeated m times, and $1 0^n 1 0^{mn}$ is obtained on the tape.
5. The prefix $1 0^n 1$ of $1 0^n 1 0^{mn}$ is erased (by replacing all its 0's and 1's by B), leaving the product mn as the output.
For example, to multiply 2 and 4, the tape initially contains the two unary numbers as follows:

1 is used as the delimiter separating the two numbers.


At the end of step 4 tape structure is as follows:
B B B B B B B 1 0 0 0 0 1 0 0 0 0 0 0 0 0 B B

After multiplication the result stored in tape is as follows:


B B B B B B B B B B B B B 0 0 0 0 0 0 0 0 B B

Here we have to copy the n 0's of the second group to the last group, by replacing n B's by n 0's.

In start state q0, replace the leftmost 0 by B, change the state to q1 and move the R/W head towards the right till we get 1.
δ (q0, 0) = ( q1, B, R)
Now we should copy the n 0's of the second group to the last group.
δ (q1, 0) = ( q1, 0, R)
δ (q1, 1) = ( q2, 1, R)
Now the R/W head is pointing to the first 0 of the second group. (The COPY subroutine starts in state q2.)
In q2, replace 0 by X, change the state to q3 and move the R/W head towards the right till we get B.
δ (q2, 0) = ( q3, X, R)
δ (q3, 0) = ( q3, 0, R)
δ (q3, 1) = ( q3, 1, R)
In q3, when it reads B, replace B by 0, change the state to q4 and move left. (At this point one symbol has been copied from the second group to the last group.)
δ (q3, B) = ( q4, 0, L)
In q4 we should search for the rightmost X.
While moving left in q4, replace 0 by 0 and 1 by 1 till we get X.
δ (q4, 0) = ( q4, 0, L)
δ (q4, 1) = ( q4, 1, L)
When it reads X, change the state to q2, replace X by X and move right.
δ (q4, X) = ( q2, X, R)
When n B's of the last group have been replaced by copies of the n 0's of the second group, the machine in state q2 reads 1; it then changes the state to q5 and moves left.
δ (q2, 1) = ( q5, 1, L)
In state q5, while moving left, replace all X's in the second group by 0's, till we get 1.
δ (q5, X) = ( q5, 0, L)
When the machine reads 1 in q5, replace 1 by 1, move right and change the state to q6.
δ (q5, 1) = ( q6, 1, R)
In state q6 the machine reads 0 (the first symbol of the 2nd group); it leaves the 0 unchanged, moves left and changes the state to q7.
δ (q6, 0) = ( q7, 0, L)
In q7 the machine reads 1, the delimiter between the 1st and 2nd groups. Change the state to q8 and move left, so that the machine enters the 1st group.

δ (q7, 1) = ( q8, 1, L)
In q8 when it reads 0 change state to q9 and move left.
δ (q8, 0) = ( q9, 0, L)
In q9, move the R/W head towards the left over any number of 0's.
δ (q9, 0) = ( q9, 0, L)
In q9 if we encounter B, change the state to q0 and move right.
δ (q9, B) = ( q0, B, R)
But if, in state q8, the machine encounters B instead of 0, it means that the n 0's have been copied from the second group to the last group m times.

Now replace by B's the delimiter 1's which precede and follow the second group, and the 0's of the second group.
δ (q8, B) = ( q10, B, R)
δ (q10, 1) = ( q11, B, R)
δ (q11, 0) = ( q11, B, R)
δ (q11, 1) = ( q12, B, R)
B B B B B B B B B B B B B 0 0 0 0 0 0 0 0 B B

The TM which can multiply two positive integers is given by


M = ({q0, q1, q2, q3, q4, q5, q6, q7, q8, q9, q10, q11, q12}, {0, 1}, {0, 1,X, B}, δ, q0, B, { q12 }) where δ is
the transition function given by:
δ (q0, 0) = ( q1, B, R)
δ (q1, 0) = ( q1, 0, R)
δ (q1, 1) = ( q2, 1, R)
δ (q2, 0) = ( q3, X, R)
δ (q3, 0) = ( q3, 0, R)
δ (q3, 1) = ( q3, 1, R)
δ (q3, B) = ( q4, 0, L)

δ (q4, 0) = ( q4, 0, L)
δ (q4, 1) = ( q4, 1, L)
δ (q4, X) = ( q2, X, R)
δ (q6, 0) = ( q7, 0, L)
δ (q5, X) = ( q5, 0, L)
δ (q5, 1) = ( q6, 1, R)
δ (q2, 1) = ( q5, 1, L)
δ (q7, 1) = ( q8, 1, L)
δ (q8, 0) = ( q9, 0, L)
δ (q9, 0) = ( q9, 0, L)
δ (q9, B) = ( q0, B, R)
δ (q8, B) = ( q10, B, R)
δ (q10, 1) = ( q11, B, R)
δ (q11, 0) = ( q11, B, R)
δ (q11, 1) = ( q12, B, R)
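The complete machine can be traced with a small Python simulator. This sketch assumes the input $0^m 1 0^n 1$ from step 1 of the procedure and a dictionary encoding of δ (both the encoding and the `multiply` helper are assumptions). It uses δ(q6, 0) = (q7, 0, L), i.e. in q6 the machine only steps back over the first 0 of the second group without changing it, as the step-by-step description says.

```python
DELTA_MULT = {
    ("q0", "0"): ("q1", "B", "R"),
    ("q1", "0"): ("q1", "0", "R"),
    ("q1", "1"): ("q2", "1", "R"),
    ("q2", "0"): ("q3", "X", "R"),
    ("q3", "0"): ("q3", "0", "R"),
    ("q3", "1"): ("q3", "1", "R"),
    ("q3", "B"): ("q4", "0", "L"),
    ("q4", "0"): ("q4", "0", "L"),
    ("q4", "1"): ("q4", "1", "L"),
    ("q4", "X"): ("q2", "X", "R"),
    ("q2", "1"): ("q5", "1", "L"),
    ("q5", "X"): ("q5", "0", "L"),
    ("q5", "1"): ("q6", "1", "R"),
    ("q6", "0"): ("q7", "0", "L"),
    ("q7", "1"): ("q8", "1", "L"),
    ("q8", "0"): ("q9", "0", "L"),
    ("q9", "0"): ("q9", "0", "L"),
    ("q9", "B"): ("q0", "B", "R"),
    ("q8", "B"): ("q10", "B", "R"),
    ("q10", "1"): ("q11", "B", "R"),
    ("q11", "0"): ("q11", "B", "R"),
    ("q11", "1"): ("q12", "B", "R"),
}


def multiply(m, n, max_steps=100_000):
    """Run the TM on 0^m 1 0^n 1 and return the non-blank tape contents when it halts in q12."""
    tape = dict(enumerate("0" * m + "1" + "0" * n + "1"))
    state, head = "q0", 0
    for _ in range(max_steps):
        if state == "q12":
            return "".join(tape[i] for i in sorted(tape) if tape[i] != "B")
        state, tape[head], move = DELTA_MULT[(state, tape.get(head, "B"))]
        head += 1 if move == "R" else -1
    raise RuntimeError("step limit reached")


print(multiply(2, 4))   # 00000000
```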
Transition diagram:

Variants of Turing Machines (TM):


Turing machine we have discussed so far has a single tape, δ (q, a) is either a single triple (p, Y, D),
where D = R or L, or is not defined. If we modify the structure of Turing machine , we get variants
of TM such as:
• Multi-tape TM
• Non-deterministic TM
Single tape TM

Multi-track TM: Single tape with multiple tracks.

Multi-Tape TM:
A Turing machine M with more than one tape.

A multi-tape TM has a finite set Q of states, an initial state q0, a subset F of Q called the set of final states, a set Γ of tape symbols, and a new symbol B not in Γ called the blank symbol.
• There are k tapes, each divided into cells. The first tape holds the input string w.
• Initially all the other tapes hold the blank symbol (B).
• Initially the head of the first tape (input tape) is at the left end of the input w.
• All the other heads can be placed at any cell initially.
• δ is a partial transition function from $Q \times \Gamma^k$ into $Q \times \Gamma^k \times \{L, R, S\}^k$, where k is the number of tapes.
• A multi-tape TM is more convenient to design than a single-tape TM, but it is not more powerful: the language accepted by a multi-tape TM is a recursively enumerable language. That means the language accepted by a multi-tape TM is also accepted by a basic or standard TM, so multi-tape TMs and standard TMs are equivalent.
In one move, the multi-tape TM M:
• enters a new state;
• writes a new symbol on each tape in the cell under the head;
• moves each tape head to the left or right or keeps it stationary. The heads move independently: some move to the left, some to the right and the remaining heads do not move.
• The initial ID has the initial state q0, the input string w in the first tape (input tape), and empty strings of B's in the remaining k - 1 tapes.
• An accepting ID has a final state and some strings in each of the k tapes.

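Example: δ(q, a, b, c) = (p, X, Y, Z, L, R, R). In state q, with the three heads scanning a, b and c, M enters state p, writes X, Y and Z on tapes 1, 2 and 3, and moves the first head to the left and the other two heads to the right.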

*****Every language accepted by a multi-tape TM is acceptable by some single-tape TM (that is, the standard TM).

OR

Show that language accepted by Multi-tape TM is recursively enumerable language.

Proof:
Suppose a language L is accepted by a k-tape (multi-tape) TM M. We simulate M with a single-tape TM M1 with 2k tracks. Let us consider the implementation description for k = 2.
We prove this theorem by simulating the working of a 2-tape TM (M) with the working of a single-tape 4-track TM (M1). Assume that the second, fourth, ..., (2k)th tracks hold the contents of the k tapes. The first, third, ..., (2k - 1)th tracks hold an R/W head marker (a symbol, say X) to indicate the position of the respective tape head.

Let us consider the single move of above multi tape TM:


δ (q, a2, b5) = (qnext , 0, 1, L, R)

The R/W head markers (X) on the first and third tracks mark the cells holding the symbols currently scanned by the two heads of M (a2 on tape 1 and b5 on tape 2).
To simulate the above move of the multi-tape TM M, the single-tape TM M1 has to visit the two R/W head markers and store the scanned symbols in its finite control.
The finite control of M1 also holds the current state of M and the moves (transition function) of M.

δ (q, a2, b5)

Now M1 revisits each of the head markers to perform the following operations:
It changes the tape symbol in the corresponding track of M1 according to the move of the 2-tape TM M for the current state of M and the scanned tape symbols. It moves the head markers to the left or right. Finally, M1 records the new state of M in its finite control.

δ (q, a2, b5) = (qnext , 0, 1, L, R)


At the end of this, M1 is ready to implement its next move based on the revised positions of its head
markers and the changed state available in its control. Single tape TM M1 accepts a string w only
when it reaches a state that is recorded as a final state of M in its control at the end of the
processing of w. Hence the proof.
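The 2k-track encoding used in the proof can be sketched in Python. Everything here (the functions `encode` and `simulate_move`, the tape contents a1, a2, ... and b1, b2, ..., and the head positions) is a hypothetical illustration of one simulated move δ(q, a2, b5) = (qnext, 0, 1, L, R).

```python
BLANK, MARK = "B", "X"


def encode(tapes, heads):
    """Pack k tapes into one tape with 2k tracks per cell.

    Track 2j holds the head marker of tape j and track 2j + 1 holds tape j's symbol
    (0-based indices; the notes number the tracks from 1).
    """
    width = max(len(t) for t in tapes)
    cells = []
    for i in range(width):
        cell = []
        for tape, h in zip(tapes, heads):
            cell.append(MARK if i == h else BLANK)
            cell.append(tape[i] if i < len(tape) else BLANK)
        cells.append(cell)
    return cells


def simulate_move(cells, state, delta):
    """Simulate one move of the k-tape TM on the 2k-track tape; returns M's new state."""
    k = len(cells[0]) // 2
    # Sweep 1: find every head marker and keep the scanned symbols in the finite control.
    heads = [next(i for i, c in enumerate(cells) if c[2 * j] == MARK) for j in range(k)]
    scanned = tuple(cells[heads[j]][2 * j + 1] for j in range(k))
    new_state, *rest = delta[(state, *scanned)]
    writes, moves = rest[:k], rest[k:]
    # Sweep 2: revisit every marker, write the new symbol and shift the marker.
    for j in range(k):
        h = heads[j]
        cells[h][2 * j + 1] = writes[j]
        cells[h][2 * j] = BLANK
        new_h = h + (1 if moves[j] == "R" else -1)
        if not 0 <= new_h < len(cells):
            raise IndexError("this sketch assumes the heads stay inside the encoded region")
        cells[new_h][2 * j] = MARK
    return new_state


tapes = [["a1", "a2", "a3", "a4"], ["b1", "b2", "b3", "b4", "b5", "b6"]]
cells = encode(tapes, heads=[1, 4])
state = simulate_move(cells, "q", {("q", "a2", "b5"): ("qnext", "0", "1", "L", "R")})
print(state)                    # qnext
print([c[1] for c in cells])    # ['a1', '0', 'a3', 'a4', 'B', 'B']
print([c[3] for c in cells])    # ['b1', 'b2', 'b3', 'b4', '1', 'b6']
```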
Non-Deterministic TM:
The non-deterministic Turing machine is a kind of TM in which, for a given current state and scanned symbol, the transition rules may specify more than one possible action.
A non-deterministic TM can be formally defined as a 7-tuple M = (Q, Σ, Γ, δ, q0, B, F), where
• Q is a finite nonempty set of states.
• Γ is a finite nonempty set of tape symbols.
• B is the blank symbol.
• Σ is a nonempty set of input symbols; Σ is a subset of Γ and B ∉ Σ.
• δ is the transition function mapping (q, x) onto a set of triples (q', y, D), where D denotes the direction of movement of the R/W head: D = L or R according as the movement is to the left or right. That is, $\delta: Q \times \Gamma \to 2^{Q \times \Gamma \times \{L, R\}}$, the power set of Q × Γ × {L, R}.
• q0 ∈ Q is the initial state, and
• F ⊆ Q is the set of final states.
It is clear from the definition of δ that for each state q and tape symbol X, δ(q, X) is a set of triples, for example
δ(q, X) = {(q1, Y1, R), (q1, Y2, L), (q1, Y3, R), …}

The non-deterministic TM in fact is no more powerful than the deterministic TM. Any language
accepted by non-deterministic TM can be accepted by deterministic TM.
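The usual way to see that a deterministic TM can do the work of a non-deterministic one is to explore the tree of configurations breadth-first, so that no infinite branch blocks the others. The following Python sketch assumes δ is a dictionary from (state, symbol) to a set of triples; the example machine and the configuration budget are hypothetical.

```python
from collections import deque


def ntm_accepts(delta, w, start, finals, blank="B", max_configs=100_000):
    """Deterministic breadth-first search over the configurations of a nondeterministic TM.

    delta maps (state, symbol) to a set of (state, write, move) triples.
    Returns True if some branch reaches a final state, False if every branch
    halts without accepting; raises if the search budget runs out.
    """
    first = (start, 0, tuple(w) or (blank,))
    queue, seen = deque([first]), {first}
    while queue:
        state, head, tape = queue.popleft()
        if state in finals:
            return True
        for new_state, write, move in delta.get((state, tape[head]), ()):
            cells = list(tape)
            cells[head] = write
            new_head = head + (1 if move == "R" else -1)
            if new_head < 0:                    # grow the tape on the left
                cells.insert(0, blank)
                new_head = 0
            elif new_head == len(cells):        # grow the tape on the right
                cells.append(blank)
            config = (new_state, new_head, tuple(cells))
            if config not in seen:
                if len(seen) >= max_configs:
                    raise RuntimeError("search budget exhausted; the NTM may run forever")
                seen.add(config)
                queue.append(config)
    return False


# Hypothetical NTM: accepts strings over {0, 1} containing "11" by guessing where it starts.
DELTA_11 = {
    ("q0", "0"): {("q0", "0", "R")},
    ("q0", "1"): {("q0", "1", "R"), ("q1", "1", "R")},
    ("q1", "1"): {("qf", "1", "R")},
}
print(ntm_accepts(DELTA_11, "0110", "q0", {"qf"}))   # True
print(ntm_accepts(DELTA_11, "0101", "q0", {"qf"}))   # False
```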

THE MODEL OF LINEAR BOUNDED AUTOMATON (LBA)


A linear bounded automaton is a non-deterministic Turing machine which has a single tape whose
length is not infinite but bounded by a linear function of the length of the input string. A linear
function is used to restrict (to bound) the length of the tape. The set of context-sensitive languages
is accepted by this model.

LBA is formally defined as a 9-tuple M = (Q, Σ, Γ, δ, q0, B, ¢, $, F), where
• Q is a finite nonempty set of states.
• Γ is a finite nonempty set of tape symbols.
• B is the blank symbol.
• Σ is a nonempty set of input symbols containing the two special symbols ¢ and $; Σ is a subset of Γ and B ∉ Σ.
• δ is the transition function mapping (q, x) onto a set of triples (q', y, D), where D denotes the direction of movement of the R/W head: D = L or R according as the movement is to the left or right. Since an LBA is non-deterministic, $\delta: Q \times \Gamma \to 2^{Q \times \Gamma \times \{L, R\}}$.
• q0 ∈ Q is the initial state, and
• F ⊆ Q is the set of final states.
¢ is the left end marker, which is entered in the leftmost cell of the input tape and prevents the R/W head from getting off the left end of the tape.
$ is the right end marker, which is entered in the rightmost cell of the input tape and prevents the R/W head from getting off the right end of the tape. The end markers must not appear in any other cell of the input tape, and the R/W head must not print any other symbol over either end marker.

Linear Bounded Automata Model

There are two tapes: one is called the input tape, and the other, working tape.
• On the input tape the head never prints and never moves to the left.
• On the working tape the head can modify the contents in any way, without any restriction.
The set of strings accepted by a nondeterministic LBA is the set of strings generated by context-sensitive grammars, excluding the null string; that is, the context-sensitive languages.
DECIDABILITY
The notion of a recursively enumerable language and a recursive language existed even before the invention of computers. These languages are also defined using Turing machines. Recall how a TM can behave on an input:
• A TM halts when it reaches a final state after reading the entire input string w.
• A TM M halts when M reaches a state q with a current input symbol a to be scanned such that δ(q, a) is undefined.
• There are TMs that never halt on some inputs in either of these ways (they may enter an infinite loop).
So we make a distinction between the languages accepted by a TM that halts on all input strings and a TM that never halts on some input strings. That leads to the property of decidability and un-decidability of a language.

Recursively Enumerable Languages: (RE Languages)


A language L which is a subset of Σ* is a recursively enumerable language if there exists a TM M such that L = T(M). (On a string not in L, M may halt without accepting or enter an infinite loop.)
Recursive Languages:
A language L which is a subset of Σ* is a recursive language if there exists a TM M that satisfies the following two conditions:
i. If the string w is in L, then the TM accepts w and halts.
ii. If the string w is not in L, then the TM eventually halts without reaching an accepting state.
The recursive language definition assures us that the TM always halts. It is clear that every recursive language is a recursively enumerable language.
DECIDABLE LANGUAGES
A language L is said to be a decidable language if L is a recursive language. That is, a problem with two answers, Yes/No, is decidable if the corresponding language is recursive. Decidable problems are also called solvable problems.
Decidability of regular languages:
Is the problem of testing whether a deterministic finite automaton accepts a given input string w decidable?
Show that the language accepted by a DFA is decidable.
Proof: By simulating the working of DFA B on input w in a TM M.
Let $A_{DFA}$ = {(B, w) | B accepts the input string w}.
Let us construct a TM M which takes as input the pair (B, w), where B is represented by the five components of the DFA, Q, Σ, δ, q0, F, and w ∈ Σ*.
The TM M checks whether the input (B, w) is a valid input; if not, M rejects (B, w) and halts.
If the input (B, w) is valid, M writes the initial state q0 and the leftmost input symbol of w. It updates the state using the transition function δ of the DFA and then reads the next input symbol in w.
If the simulation ends in an accepting state for the input w, then M accepts (B, w); otherwise M rejects (B, w). M accepts (B, w) if and only if w is accepted by DFA B, and M always halts because w is finite.
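The decider M from the proof can be sketched in Python. The tuple encoding of B as (Q, Σ, δ, q0, F), the name `dfa_accepts` and the example DFA are assumptions of this sketch.

```python
def dfa_accepts(dfa, w):
    """Decide (B, w) ∈ A_DFA by simulating DFA B on w; always halts after |w| moves."""
    states, sigma, delta, q0, finals = dfa
    # Validity check: reject a malformed encoding of (B, w).
    valid = (
        q0 in states
        and set(finals) <= set(states)
        and all((q, a) in delta and delta[(q, a)] in states for q in states for a in sigma)
        and all(a in sigma for a in w)
    )
    if not valid:
        return False
    state = q0
    for a in w:                          # one move per input symbol
        state = delta[(state, a)]
    return state in finals


# Hypothetical DFA B: binary strings with an even number of 1s.
EVEN_ONES = (
    {"e", "o"},
    {"0", "1"},
    {("e", "0"): "e", ("e", "1"): "o", ("o", "0"): "o", ("o", "1"): "e"},
    "e",
    {"e"},
)
print(dfa_accepts(EVEN_ONES, "1001"))   # True
print(dfa_accepts(EVEN_ONES, "1011"))   # False
```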
Decidability of context free languages:

• The problem of whether a context-free grammar G generates a given input string w is decidable.
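The notes state this fact without a procedure. One standard decider, not described in these notes, is the CYK algorithm, which assumes the grammar has first been converted to Chomsky normal form. A Python sketch with a hypothetical grammar:

```python
def cyk(grammar, start, w):
    """CYK membership test for a grammar in Chomsky normal form.

    grammar maps each variable to a list of bodies; a body is either a terminal
    (a one-character string) or a pair (B, C) of variables. Only non-empty w are
    handled, since a CNF grammar without S -> ε derives no empty string.
    """
    n = len(w)
    if n == 0:
        return False
    # table[i][l - 1] = set of variables that derive the substring w[i:i + l]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(w):
        for var, bodies in grammar.items():
            if ch in bodies:
                table[i][0].add(var)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left, right = table[i][split - 1], table[i + split][length - split - 1]
                for var, bodies in grammar.items():
                    if any(isinstance(b, tuple) and b[0] in left and b[1] in right for b in bodies):
                        table[i][length - 1].add(var)
    return start in table[0][n - 1]


# Hypothetical CNF grammar for {a^n b^n | n >= 1}: S -> AT | AB, T -> SB, A -> a, B -> b
G = {"S": [("A", "T"), ("A", "B")], "T": [("S", "B")], "A": ["a"], "B": ["b"]}
print(cyk(G, "S", "aabb"))   # True
print(cyk(G, "S", "aab"))    # False
```

The table has O(n²) entries and each is filled by trying O(n) split points, so the test always halts, which is exactly what decidability requires.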


Decidability of context sensitive languages:
• The problem of whether a context-sensitive grammar G generates a given input string w is decidable.
UNDECIDABLE LANGUAGES
If a language L is not a recursive language, then it is called an un-decidable language.
If a language L is recursively enumerable, then there exists a TM which semi-decides it (on a given input it either accepts, rejects or loops forever). Every TM has a description of finite length. Hence the number of TMs and the number of RE languages are countably infinite.
$A_{TM}$ = {(M, w) | the TM M accepts w} is un-decidable.
HALTING PROBLEM OF TURING MACHINE
The reduction technique is used to prove the un-decidability of the halting problem of Turing machines.
Problem A is reducible to problem B if a solution to problem B can be used to solve problem A.
For example, if A is the problem of finding some root of $x^4 - 3x^2 + 2 = 0$ and B is the problem of finding some root of $x^2 - 2 = 0$, then A is reducible to B: as $x^2 - 2$ is a factor of $x^4 - 3x^2 + 2$, a root of $x^2 - 2 = 0$ is also a root of $x^4 - 3x^2 + 2 = 0$.
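The reduction in this example rests on a factorization, which can be checked directly:

```latex
x^4 - 3x^2 + 2 = (x^2 - 1)(x^2 - 2),
\qquad
x^2 - 2 = 0 \;\Longrightarrow\; x^4 - 3x^2 + 2 = (x^2 - 1)\cdot 0 = 0 .
```

For instance, $x = \sqrt{2}$ gives $4 - 6 + 2 = 0$.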
If A is reducible to B and B is decidable then A is decidable. If A is reducible to B and A is un-
decidable then B is un-decidable.
The problem of whether a Turing machine M halts on input w is un-decidable.
The output of a TM can be:
Halt: The machine starting at this configuration will halt after a finite number of moves.
No-Halt: The machine starting at this configuration never reaches a halt state, no matter how long it runs. Based on these two observations, the question is: given any functional matrix (transition table), input data tape and initial configuration, is it possible to determine whether the process will ever halt? This is called the halting problem. That means we are asking for a procedure which enables us to solve the halting problem for every (machine, tape) pair. The answer is NO.
That means the halting problem is un-decidable or un-solvable.
Prove that Halting Problem of TM is un-decidable
OR
$HALT_{TM}$ = {(M, w) | the Turing machine M halts on input w} is un-decidable.
Proof: Let us assume that the halting problem ($HALT_{TM}$) is decidable.

Let M1 be the TM which decides whether or not any computation by another TM M will ever halt, when a description of the TM M and an input w are given.
That means the input to M1 is a (machine, tape) pair (M, w).
Then for every input (M, w) to M1: if TM M halts on input w, then M1 halts in what is called an accept halt.
Similarly, if M does not halt on input w, then M1 halts in what is called a reject halt.

Now we construct one more TM M2 which takes an input M.


M2 first copies its input M, so that its tape holds two copies of M, and then gives this tape as the input (M, M) to machine M1. Here M1 is modified so that whenever M1 is supposed to reach an accept halt, M2 loops forever.
The behavior of M2 is as shown below:
M2 loops forever if M halts on input M, and halts if M does not halt on input M.

As M2 is itself a TM, we can give M2 its own description as input; that means we replace M by M2 in the above construction.

Thus M2 halts on input M2 if and only if M2 does not halt on input M2. This is a contradiction. That means a machine M1 which can tell whether any other Turing machine will halt on a particular input does not exist. Hence the halting problem is un-decidable.
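The diagonal step of this proof can be mimicked in Python. This is only a model: a candidate decider is any function `halts(program, input)`, and the string "loops" stands for an infinite loop so that the sketch itself terminates. The names `make_m2` and `decider_fails_on_m2` are hypothetical.

```python
def make_m2(halts):
    """Build M2 from a claimed halting decider halts(program, input) -> bool.

    M2 duplicates its input p and asks about (p, p). The string "loops" stands
    for "M2 enters an infinite loop", so this sketch always terminates.
    """
    def m2(p):
        if halts(p, p):
            return "loops"      # decider says p halts on p, so M2 loops forever
        return "halts"          # decider says p does not halt on p, so M2 halts
    return m2


def decider_fails_on_m2(halts):
    """True when the claimed decider gives the wrong answer for M2 run on M2."""
    m2 = make_m2(halts)
    predicted = halts(m2, m2)
    actual = m2(m2) == "halts"
    return predicted != actual


# Every candidate decider is wrong on (M2, M2); two trivial candidates as examples:
print(decider_fails_on_m2(lambda prog, inp: True))    # True
print(decider_fails_on_m2(lambda prog, inp: False))   # True
```

Whatever answer a candidate gives for (M2, M2), M2 is built to do the opposite, which is the contradiction used in the proof.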
THE POST CORRESPONDENCE PROBLEM
The Post Correspondence Problem (PCP) was first introduced by Emil Post in 1946. Later, the
problem was found to have many applications in the theory of formal languages. The problem over
an alphabet ∑ belongs to a class of yes/no problems and is stated as follows:
Consider two lists of non-empty strings over an alphabet Σ (for example, Σ = {0, 1}):
x = (x1, x2, x3, …, xn)
y = (y1, y2, y3, …, yn)
The PCP is to determine whether or not there exist indices i1, i2, …, im, where 1 ≤ ij ≤ n, such that
$x_{i_1} x_{i_2} \cdots x_{i_m} = y_{i_1} y_{i_2} \cdots y_{i_m}$
The indices ij need not be distinct and m may be greater than n. Also, if there exists a solution to the PCP, there exist infinitely many solutions (repeating a solution sequence gives another solution).
Does the PCP with two lists x = (b, $bab^3$, ba) and y = ($b^3$, ba, a) have a solution?
Answer:
We have to determine whether or not there exists a sequence of substrings of x such that the string
formed by this sequence and the string formed by the sequence of corresponding substrings of y are
identical.
x = (b, $bab^3$, ba) and y = ($b^3$, ba, a)
The required sequence is given by:
i1 = 2, i2 = 1, i3 = 1, i4 = 3, ie: (2, 1, 1, 3) and m = 4
The corresponding strings are:

$x_2 x_1 x_1 x_3 = bab^3 \cdot b \cdot b \cdot ba$ = babbbbbba and $y_2 y_1 y_1 y_3 = ba \cdot b^3 \cdot b^3 \cdot a$ = babbbbbba; thus the PCP has a solution.
Does the PCP with two lists x = (11, 100, 111) and y = (111, 001, 11) have a solution?
The required sequence is given by:
i1 = 1, i2 = 2, i3 = 3, ie: (1, 2, 3) and m = 3
The corresponding strings are:

$x_1 x_2 x_3$ = 11·100·111 = 11100111 = 111·001·11 = $y_1 y_2 y_3$; thus the PCP has a solution.


Prove that the PCP with two lists x = (01, 1, 1), y = ($01^2$, 10, 11) has no solution.
For each i, we have $|x_i| < |y_i|$ (2 < 3, 1 < 2 and 1 < 2). Hence the string generated by any sequence of substrings of x is shorter than the string generated by the sequence of corresponding substrings of y. Therefore, the PCP has no solution.
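Candidate index sequences can be checked mechanically. The Python sketch below (the helper names are assumptions) verifies a given sequence and performs a bounded brute-force search; since the PCP is undecidable in general, a search that finds nothing up to some length proves nothing about longer sequences.

```python
from itertools import product


def is_solution(x, y, seq):
    """Check an index sequence (1-based, as in the notes) for the PCP instance (x, y)."""
    return len(seq) > 0 and "".join(x[i - 1] for i in seq) == "".join(y[i - 1] for i in seq)


def search(x, y, max_len):
    """Try every index sequence of length 1..max_len; return the first solution or None."""
    for m in range(1, max_len + 1):
        for seq in product(range(1, len(x) + 1), repeat=m):
            if is_solution(x, y, seq):
                return seq
    return None


x, y = ("b", "babbb", "ba"), ("bbb", "ba", "a")      # b^3 written out as bbb
print(is_solution(x, y, (2, 1, 1, 3)))               # True
print(search(("11", "100", "111"), ("111", "001", "11"), 3))   # (1, 3)
print(search(("01", "1", "1"), ("011", "10", "11"), 6))        # None
```

For the second pair of lists, the search returns (1, 3), which is shorter than (1, 2, 3): 11·111 = 111·11 = 11111. For the third pair it returns None, as expected from the length argument above.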
Note: If the first substrings used in the PCP are always required to be x1 and y1, then the problem is known as the Modified Post Correspondence Problem.

If L1 and L2 are recursive languages, then show that L1 ∪ L2 is also a recursive language.
OR
Show that the recursive languages are closed under union.
Proof:
Let L1 and L2 be recursive languages.
As L1 and L2 are recursive languages, there exists a TM M1 that accepts L1 and a TM M2 that accepts L2, and both always halt. Now we have to construct a TM M that accepts the language L such that L = L1 ∪ L2.
Construction of TM M is as follows: M first simulates M1 on w; if M1 accepts, M accepts. Otherwise M simulates M2 on w and accepts if and only if M2 accepts.

If the string w ∈ L1 ∪ L2, then either w ∈ L1 or w ∈ L2 or w belongs to both L1 and L2.
That means TM M1 accepts w if w ∈ L1, or TM M2 accepts w if w ∈ L2; both accept w if it belongs to both L1 and L2. Thus the simulated TM M produces the output Accept (Yes).
Similarly, if the string w does not belong to L1 ∪ L2, then w belongs to neither L1 nor L2, so both machines M1 and M2 produce the output Reject. Thus the simulated TM M also produces the output Reject (No).

Thus the TM M accepts the language L = L1 ∪ L2 and always halts, so L1 ∪ L2 is recursive (M produces only two outputs, Y or N).
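The construction can be modelled with ordinary functions standing in for TMs that always halt. This is a sketch; the `Decider` alias, the name `union_decider` and the two example languages are assumptions.

```python
from typing import Callable

Decider = Callable[[str], bool]   # models a TM that always halts: True = Y (accept), False = N (reject)


def union_decider(m1: Decider, m2: Decider) -> Decider:
    """Decider for L1 ∪ L2: run M1 on w; if it rejects, run M2 on w.

    Running them one after the other is safe only because both are guaranteed to halt.
    """
    def m(w: str) -> bool:
        if m1(w):
            return True
        return m2(w)
    return m


# Hypothetical deciders: L1 = strings of even length, L2 = strings starting with "a".
m = union_decider(lambda w: len(w) % 2 == 0, lambda w: w.startswith("a"))
print([m(w) for w in ["ab", "abc", "b", "bc"]])   # [True, True, False, True]
```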
If L1 and L2 are recursively enumerable languages, then show that L1 ∪ L2 is also a recursively enumerable language.
OR
Show that the recursively enumerable languages are closed under union.
Proof:
Let L1 and L2 be recursively enumerable languages.
As L1 and L2 are recursively enumerable languages, there exists a TM M1 that accepts L1 and a TM M2 that accepts L2. Now we have to construct a TM M that accepts the language L such that L = L1 ∪ L2.
Construction of TM M is as follows: M simulates M1 and M2 on w in parallel (for example, on two tapes, one move of each machine at a time) and accepts as soon as either of them accepts.

If the string w ∈ L1 ∪ L2, then either w ∈ L1 or w ∈ L2 or w belongs to both L1 and L2.
That means TM M1 accepts w if w ∈ L1, or TM M2 accepts w if w ∈ L2. Thus the simulated TM M produces the output Accept (Yes). Because M1 and M2 run in parallel, M accepts even if the other machine loops forever on w.
Similarly, if the string w does not belong to L1 ∪ L2, then w belongs to neither L1 nor L2. If both M1 and M2 halt with the output Reject, the simulated TM M also produces the output Reject (No).
However, for a string w that does not belong to L1, M1 may enter a loop forever, and for a string w that does not belong to L2, M2 may enter a loop forever. So if w belongs to neither language and at least one of M1 and M2 loops forever on w, the simulated TM M also loops forever.

Thus the TM M accepts the language L = L1 ∪ L2, so L1 ∪ L2 is recursively enumerable (M produces the outputs Y or N, or loops forever).
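Running M1 and M2 one move at a time is what keeps a loop in one machine from hiding an acceptance by the other. In this Python sketch a TM run is modelled as a generator that yields once per move and returns True (accept) or False (reject). The type aliases, the `outcome` helper and the two example recognizers are assumptions.

```python
from typing import Callable, Generator

Run = Generator[None, None, bool]   # yields once per move; returns True (accept) or False (reject)
Recognizer = Callable[[str], Run]   # models a TM started on input w


def union_recognizer(m1: Recognizer, m2: Recognizer) -> Recognizer:
    """Recognizer for L1 ∪ L2: interleave the moves of M1 and M2 on w."""
    def m(w: str) -> Run:
        running = [m1(w), m2(w)]
        while running:
            for run in list(running):
                try:
                    next(run)                    # one move of this machine
                except StopIteration as halted:
                    if halted.value:             # it accepted w
                        return True
                    running.remove(run)          # it halted without accepting
            yield                                # M has made one round of moves
        return False                             # both halted without accepting
    return m


def outcome(run: Run, budget: int):
    """True/False if the run halts within `budget` rounds, else None (still running)."""
    for _ in range(budget):
        try:
            next(run)
        except StopIteration as halted:
            return halted.value
    return None


# Hypothetical recognizers that loop forever on strings outside their language.
def even_length(w: str) -> Run:
    while len(w) % 2:
        yield
    return True


def starts_with_a(w: str) -> Run:
    while not w.startswith("a"):
        yield
    return True


m = union_recognizer(even_length, starts_with_a)
print(outcome(m("abc"), 100))   # True  (even_length loops, starts_with_a accepts)
print(outcome(m("b"), 100))     # None  (both loop, so M loops too)
```

Running M1 to completion before starting M2 would not work here: on "abc" the first machine never halts.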
Show that complement of a recursive language is also recursive language.
OR

If L is recursive, then there exists a TM M with two outputs, Yes (Accept) or No (Reject). Thus the machine always halts and T(M) = L.

Let us construct a new machine M1 such that L' = T(M1), where L' is the complement of L, with the following steps:

1. Accepting states of M are made non-accepting states of M1, with no transitions from them in M1. That means we have created states in M1 that HALT without accepting.
2. Now create a new accepting state for M1, say qf, with no transitions from qf.
3. If q is a non-accepting state of M and δ(q, x) is not defined, then add a transition from q to qf for M1.
Since M is guaranteed to halt, M1 is also guaranteed to halt. In fact, M1 accepts exactly those strings that M does not accept. Thus M1 accepts L'.

So the complement L' of the recursive language L is also recursive.
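Steps 1 to 3 can be applied mechanically to a transition table. The Python sketch below assumes a dictionary encoding of δ and a small hypothetical machine M that always halts; the names `run` and `complement_tm` are illustrative.

```python
def run(delta, finals, w, start="q0", blank="B", max_steps=10_000):
    """Deterministic TM run: True if it halts in a final state, False if δ is undefined."""
    tape, state, head = dict(enumerate(w)), start, 0
    for _ in range(max_steps):
        if state in finals:
            return True
        key = (state, tape.get(head, blank))
        if key not in delta:
            return False
        state, tape[head], move = delta[key]
        head += 1 if move == "R" else -1
    raise RuntimeError("step limit reached (this construction assumes M always halts)")


def complement_tm(delta, states, finals, gamma, qf="qf"):
    """Steps 1-3: old accepting states get no moves (they halt without accepting),
    and every undefined move from a non-accepting state goes to the new accepting state qf."""
    new_delta = {(q, x): move for (q, x), move in delta.items() if q not in finals}
    for q in set(states) - set(finals):
        for x in gamma:
            if (q, x) not in delta:
                new_delta[(q, x)] = (qf, x, "R")
    return new_delta, {qf}


# Hypothetical M that always halts: accepts strings over {a, b} that start with a.
DELTA = {("q0", "a"): ("qacc", "a", "R")}
delta1, finals1 = complement_tm(DELTA, {"q0", "qacc"}, {"qacc"}, {"a", "b", "B"})
for w in ["", "a", "ab", "b", "ba"]:
    print(repr(w), run(DELTA, {"qacc"}, w), run(delta1, finals1, w))
```

On every input exactly one of M and M1 accepts, because both machines always halt.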


If L and its complement L' are both recursively enumerable, then show that L is recursive.
Let M1 and M2 be two TMs such that the language L is accepted by M1 and the complemented language L' is accepted by M2, i.e. L = T(M1) and L' = T(M2).

We construct a new two-tape TM M by simulating M1 on tape 1 and M2 on tape 2, one move of each machine at a time, as follows:

If the input w belongs to L, then M1 accepts w and we declare that M accepts w.
If the input string w does not belong to L, i.e. w ∈ L', then M2 accepts w and we declare that M halts without accepting (reject). Since every w is in exactly one of L and L', M eventually halts in both cases. By the construction of M it is clear that T(M) = T(M1) = L.
Hence L is recursive.
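The two-tape construction can be modelled as a parallel simulation: alternate one move of M1 and one move of M2 until one of them accepts. In this Python sketch, runs are generators (an assumption of the model), `decider_from` is a hypothetical name and L is an example language.

```python
from typing import Callable, Generator

Run = Generator[None, None, bool]   # yields once per move; returns True when the machine accepts


def decider_from(m1: Callable[[str], Run], m2: Callable[[str], Run]) -> Callable[[str], bool]:
    """M1 recognizes L and M2 recognizes L'; alternate their moves until one accepts.

    Every w is in exactly one of L and L', so exactly one machine eventually accepts
    and the loop below always ends.
    """
    def decide(w: str) -> bool:
        runs = ((m1(w), True), (m2(w), False))
        while True:
            for run, answer in runs:
                try:
                    next(run)                      # one move of M1, then one move of M2
                except StopIteration as halted:
                    if halted.value:               # this machine accepted w
                        return answer
    return decide


# Hypothetical example: L = strings of even length; each recognizer loops outside its language.
def in_L(w: str) -> Run:
    while len(w) % 2:
        yield
    return True


def in_L_complement(w: str) -> Run:
    while len(w) % 2 == 0:
        yield
    return True


decide = decider_from(in_L, in_L_complement)
print(decide("ab"), decide("abc"))   # True False
```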

COMPLEXITY
The efficiency of an algorithm can be decided by measuring the performance of the algorithm. We can measure the performance of an algorithm by computing two factors:
i. Amount of time required by the algorithm to execute
ii. Amount of storage required by the algorithm.
Hence we define two terms: time complexity and space complexity.
Time complexity of an algorithm means the amount of time taken by the algorithm to run. By computing time complexity we come to know whether the algorithm is slow or fast.
Space complexity of an algorithm means the amount of space (memory) taken by the algorithm. By computing space complexity we can analyze whether an algorithm requires more or less space.
To select the best algorithm, we need to check the efficiency of each algorithm. The efficiency can be measured by computing the time complexity of each algorithm. Asymptotic notations such as Ω, Θ and O are shorthand ways to represent time complexity. Using these notations we can give the time complexity as "fastest possible", "slowest possible" or "average" time.
Big Oh Notation: The Big Oh notation, denoted by O, is a method of representing the upper bound of an algorithm's running time. Using Big Oh notation we can give the longest amount of time taken by the algorithm to complete.
Definition of Big Oh notation:
Let f(n) and g(n) be two non-negative functions. If there exist a constant c > 0 and an integer n0
(some value of the input size) such that
f(n) ≤ c * g(n) for all n > n0
then f(n) is Big Oh of g(n), i.e. f(n) = O(g(n)).
Example: Consider f(n) = 2n + 2 and g(n) = n². Find a constant c such that f(n) ≤ c * g(n).
Take c = 1 and try n = 1, 2, 3:
n = 1: f(n) = 4 and g(n) = 1, so f(n) > g(n)
n = 2: f(n) = 6 and g(n) = 4, so f(n) > g(n)
n = 3: f(n) = 8 and g(n) = 9, so f(n) < g(n)


Hence we can conclude that for n > 2 we obtain f(n) < g(n), because n² − (2n + 2) = (n − 1)² − 3 > 0
once n ≥ 3. So, with c = 1 and n0 = 2, 2n + 2 = O(n²).
Thus Big Oh notation gives an upper bound on the running time.
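The trial-and-error search for c and n0 above can be mechanised with a short Python sketch. The helper `bound_holds_from` is a hypothetical name; it only checks values up to a chosen `limit`, so it supports the claim numerically rather than proving it.

```python
def f(n: int) -> int:
    return 2 * n + 2


def g(n: int) -> int:
    return n * n


def bound_holds_from(f, g, c, limit=1000):
    """Smallest n such that f(m) <= c * g(m) for every m in [n, limit].

    Only values up to `limit` are checked, so this supports (but does not
    prove) f(n) = O(g(n)). Returns None if the bound fails at `limit`.
    """
    start = None
    for m in range(1, limit + 1):
        if f(m) <= c * g(m):
            if start is None:
                start = m
        else:
            start = None        # bound broken at m: restart the search after m
    return start


print(bound_holds_from(f, g, c=1))  # 3, i.e. f(n) <= g(n) for every n > 2
print(bound_holds_from(f, g, c=4))  # 1
```

A larger c lets the bound start earlier; Big Oh only needs some pair (c, n0) to exist.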
GROWTH RATE OF FUNCTIONS
When we have two algorithms for the same problem, we may need to compare the
running times of the two algorithms.
Measuring the performance of an algorithm in relation to the input size n is called the order of
growth.
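To make the comparison of growth rates concrete, the Python sketch below finds where the exponential 2ⁿ permanently overtakes the polynomial nᵏ. The function name and the `limit` cut-off are illustrative assumptions; the check is numerical, not a proof.

```python
def exponential_overtakes(k, limit=1000):
    """Smallest N such that 2**n > n**k for every n in [N, limit).

    A numerical illustration that 2**n eventually dominates n**k;
    checking stops at `limit`, so it is not a proof.
    """
    start = None
    for n in range(1, limit):
        if 2 ** n > n ** k:
            if start is None:
                start = n
        else:
            start = None        # polynomial still ahead (or equal) at n
    return start


for k in (2, 3):
    print(k, exponential_overtakes(k))  # prints 2 5, then 3 10
```

For small n the polynomial can be larger (for example 2⁹ = 512 < 9³ = 729), but from the crossover point on, the exponential stays ahead.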


The classes P and NP problems


Problems can be classified into two groups:
1. P-problem: a problem that can be solved in (deterministic) polynomial time.
Examples: searching for an element in a sorted list by binary search, O(log n); sorting n elements, O(n log n).
2. NP-problem: a problem that can be solved in non-deterministic polynomial time.
Examples: the knapsack problem, O(2^(n/2)), and the travelling salesperson problem, O(n!) by brute force.
P problem:
A Turing machine M is said to be of time complexity T(n) if the following holds:
given an input w of length n, M halts after making at most T(n) moves.

Construct the time complexity T(n) for the Turing machine M that accepts the language L = {aⁿbⁿ | n ≥ 1}.
• The TM moves forward and backward over the input string aⁿbⁿ, replacing
the leftmost a by X and the leftmost b by Y. So it requires about 2n moves to match one a
with one b.
• The above step is repeated n times.
• Hence the number of moves for accepting aⁿbⁿ is about (2n)·n = 2n².
For strings not of the form aⁿbⁿ, the TM also halts within O(n²) steps.
Hence T(n) = O(n²).
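The marking procedure above can be simulated in Python to count moves exactly. The state names q0 to q3 and the transitions are one assumed way to realise the X/Y marking TM, not the notes' own table. On aⁿbⁿ this machine makes exactly 2n² + 2n moves (each matching round takes 2n + 1 moves and the final check n moves), which is still O(n²).

```python
def run_anbn_tm(w: str) -> tuple[bool, int]:
    """Simulate a marking TM for {a^n b^n | n >= 1}; return (accepted, moves)."""
    tape = list(w) + ["B"]          # blank appended at the right end
    head, state, moves = 0, "q0", 0
    while True:
        sym = tape[head]
        if state == "q0":           # at leftmost unmatched a, or all a's done
            if sym == "a":
                tape[head] = "X"; head += 1; state = "q1"
            elif sym == "Y":
                head += 1; state = "q3"
            else:
                return False, moves
        elif state == "q1":         # scan right for the leftmost unmatched b
            if sym in ("a", "Y"):
                head += 1
            elif sym == "b":
                tape[head] = "Y"; head -= 1; state = "q2"
            else:
                return False, moves
        elif state == "q2":         # scan left back to the last X
            if sym in ("a", "Y"):
                head -= 1
            elif sym == "X":
                head += 1; state = "q0"
            else:
                return False, moves
        elif state == "q3":         # check that only Y's remain before the blank
            if sym == "Y":
                head += 1
            elif sym == "B":
                return True, moves
            else:
                return False, moves
        moves += 1                  # one TM move was made


print(run_anbn_tm("aabb"))  # (True, 12)
```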


NP Problem

Quantum Computation is the area of study that focuses on developing computer technology
based on the principles of quantum theory.
Quantum Computer:
We know that a bit (a 0 or a 1) is the fundamental concept of classical computation and
information. A classical computer is built from an electronic circuit containing wires and logic
gates. Let us study quantum bits (qubits) and quantum circuits, which are analogous to (classical) bits and
circuits.
A quantum computer maintains a sequence of qubits. A classical bit has two states, 0 and 1. The two
corresponding basis states of a qubit are |0⟩ and |1⟩, written in the ket notation | ⟩. Unlike a bit, a qubit
can be in infinitely many states other than |0⟩ and |1⟩; in general it can be in the state

$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$$

where α and β are complex numbers such that

$$|\alpha|^2 + |\beta|^2 = 1$$


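It is not possible to obtain the quantum state (the values α and β) by observation: measuring the qubit
gives 0 with probability |α|² and 1 with probability |β|². In a classical computer, by contrast, the bits 0
and 1 can be observed directly.
Multiple qubits can be defined.
Example: a two-qubit system has 4 basis states:
|00⟩, |01⟩, |10⟩, |11⟩
A two-qubit quantum state can be any superposition

$$\alpha_{00}|00\rangle + \alpha_{01}|01\rangle + \alpha_{10}|10\rangle + \alpha_{11}|11\rangle, \qquad |\alpha_{00}|^2 + |\alpha_{01}|^2 + |\alpha_{10}|^2 + |\alpha_{11}|^2 = 1$$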
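Qubit for Logical NOT gate

It is possible to define logic gates on qubits. The classical NOT gate changes 0 to 1 and 1 to
0. In the case of the qubit NOT gate, the state

$$\alpha|0\rangle + \beta|1\rangle \quad \text{is changed to} \quad \alpha|1\rangle + \beta|0\rangle$$

The action of the qubit NOT gate can be represented using the matrix

$$X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad X\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} \beta \\ \alpha \end{pmatrix}$$

Thus a quantum computer is a system built from quantum circuits, containing wires and
elementary quantum gates, to carry out manipulation of quantum information.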
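A qubit and the NOT gate can be sketched in plain Python, representing the state α|0⟩ + β|1⟩ as the pair (α, β) of complex amplitudes and the gate as a 2 × 2 matrix. The names `is_normalized`, `apply_gate`, `NOT` and the example amplitudes are illustrative assumptions.

```python
import math

# The NOT gate as a 2x2 matrix acting on the column vector (alpha, beta).
NOT = ((0, 1),
       (1, 0))


def is_normalized(state, tol=1e-9):
    """Check |alpha|^2 + |beta|^2 = 1 up to floating-point tolerance."""
    alpha, beta = state
    return abs(abs(alpha) ** 2 + abs(beta) ** 2 - 1) < tol


def apply_gate(gate, state):
    """Multiply a 2x2 gate matrix by the amplitude vector (alpha, beta)."""
    alpha, beta = state
    return (gate[0][0] * alpha + gate[0][1] * beta,
            gate[1][0] * alpha + gate[1][1] * beta)


# Example state: alpha = 1/sqrt(2), beta = i/sqrt(2)
psi = (1 / math.sqrt(2), 1j / math.sqrt(2))
print(is_normalized(psi))       # True
print(apply_gate(NOT, (1, 0)))  # (0, 1): |0> becomes |1>
```

The gate swaps the two amplitudes, so it maps |0⟩ to |1⟩ like the classical NOT, and it also acts on every superposition in between.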
CHURCH-TURING THESIS
Any algorithm that can be performed on any computing machine can be performed on a Turing
machine as well.
Strong Church-Turing thesis: any algorithmic process can be simulated efficiently by a Turing machine.
• But a challenge to the strong Church-Turing thesis arose from analog computation.
• Certain types of analog computers solved some problems efficiently, whereas these problems had
no known efficient solution on a Turing machine. But when the presence of noise was taken into account, the
power of the analog computers disappeared.


• In the mid-1970s, Robert Solovay and Volker Strassen gave a randomized algorithm for testing
the primality of a number, for which no efficient deterministic algorithm was known at the time. (A deterministic polynomial-time algorithm was later given by Manindra
Agrawal, Neeraj Kayal and Nitin Saxena of IIT Kanpur in 2002.) This led to the
modification of the strong Church-Turing thesis.
Strong Church-Turing Thesis (modified): Any algorithmic process can be simulated efficiently using a
probabilistic Turing machine.


Write a Short Note on the following:


1. ****** MULTI-TRACK TURING MACHINE
In the standard Turing machine a single tape is used. In a multiple-track TM, the single tape is assumed to
be divided into several tracks. The tape alphabet is then required to consist of k-tuples of tape symbols,
k being the number of tracks. Hence the only difference between the standard TM and the TM with
multiple tracks is the set of tape symbols.
The language accepted by a multiple-track TM is also accepted by a single-tape TM.
In the standard Turing machine, tape symbols are elements of Γ; in a TM with
k tracks, they are elements of Γᵏ. The moves are defined in a similar way.
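As a rough Python sketch of one multi-track move, each cell can be stored as a k-tuple (one symbol per track), so a single head reads and writes all tracks of a cell at once. The function name `multitrack_step`, the dictionary form of δ and the sample 3-track move are assumptions for illustration; moving off either end of the list is not handled.

```python
from typing import Dict, List, Tuple

Cell = Tuple[str, ...]          # one symbol per track: an element of Γ^k
Move = Tuple[str, Cell, str]    # (next state, k-tuple to write, direction L/R)


def multitrack_step(tape: List[Cell], head: int, state: str,
                    delta: Dict[Tuple[str, Cell], Move]):
    """One move of a k-track TM: read the whole k-tuple under the single head,
    write a new k-tuple, and move the head once (all tracks move together)."""
    new_state, written, direction = delta[(state, tape[head])]
    tape = list(tape)           # copy so the caller's tape is left unchanged
    tape[head] = written
    head += 1 if direction == "R" else -1
    return tape, head, new_state


# Hypothetical 3-track move: in state q reading [c, a, b], write [Z, X, Y], move right.
delta = {("q", ("c", "a", "b")): ("q_next", ("Z", "X", "Y"), "R")}
tape = [("c", "a", "b"), ("B", "B", "B")]
print(multitrack_step(tape, 0, "q", delta))
# ([('Z', 'X', 'Y'), ('B', 'B', 'B')], 1, 'q_next')
```

Because a cell is just one symbol from the larger alphabet Γᵏ, the machine is an ordinary single-tape TM, which is why the two models accept the same languages.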
Example:

Here the tape symbols are defined on 3 tracks, i.e. they are elements of Γ³. For example, if the symbol under the head is [c, a, b], a move is
δ(q, [c, a, b]) = (q_next, [Z, X, Y], R)
The resultant tape structure is as shown below:


2. ********MULTI-TAPE TURING MACHINE


An extended TM model that has some fixed number of tapes greater than one is called a multi-tape TM.

A multi-tape TM has a finite set Q of states, an initial state q0, a subset F of Q called the set of final
states, a set Γ of tape symbols, and a new symbol B not in Γ called the blank symbol.
• There are k tapes, each divided into cells. The first tape holds the input string w.
• Initially all the other tapes hold the blank symbol B.
• Initially the head of the first tape (input tape) is at the left end of the input w.
• All the other heads can be placed at any cell initially.
• δ is a partial transition function from Q × Γᵏ into Q × Γᵏ × {L, R, S}ᵏ, where k is the
number of tapes.
• A multi-tape TM is more convenient to program than a single-tape TM, but it is not more powerful: the languages
accepted by multi-tape TMs are exactly the recursively enumerable languages. That means every language accepted by a multi-tape
TM is also accepted by a basic (standard) TM, so multi-tape TMs and standard TMs are
equivalent.
In a single move, depending on the current state and the k symbols scanned by the k heads, the multi-tape TM M:
• enters a new state,
• writes a new symbol on each tape in the cell under its head,
• moves each tape head to the left or right, or keeps it stationary. The heads move
independently: some move to the left, some to the right and the remaining heads do not
move.
The initial ID has the initial state q0, the input string w on the first tape (input tape), and only B's
on the remaining k − 1 tapes.
An accepting ID has a final state and some strings on each of the k tapes.

Example: δ(q, a, b, c) = (p, X, Y, Z, L, R, R), i.e. in state q, scanning a, b and c on tapes 1, 2 and 3, M enters state p, writes X, Y and Z, and moves the three heads left, right and right respectively.
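One multi-tape move can be sketched in Python as follows; every head reads, writes and moves independently. The function name, the dictionary encoding of δ and the policy of growing a tape with a blank when a head moves past either end are assumptions for illustration, since the notes do not fix how tape ends are handled.

```python
def multitape_step(tapes, heads, state, delta, blank="B"):
    """One move of a k-tape TM.

    delta maps (state, k-tuple read) -> (new_state, k-tuple written, k-tuple of
    moves), where each move is "L", "R" or "S" (stay).
    """
    read = tuple(tape[h] for tape, h in zip(tapes, heads))
    new_state, writes, moves = delta[(state, read)]
    new_tapes, new_heads = [], []
    for tape, h, sym, mv in zip(tapes, heads, writes, moves):
        tape = list(tape)            # copy: leave the caller's tapes unchanged
        tape[h] = sym
        h += {"L": -1, "R": 1, "S": 0}[mv]
        if h < 0:                    # assumption: grow the tape with a blank on the left
            tape.insert(0, blank)
            h = 0
        elif h == len(tape):         # assumption: grow the tape with a blank on the right
            tape.append(blank)
        new_tapes.append(tape)
        new_heads.append(h)
    return new_tapes, new_heads, new_state


# The example move delta(q, a, b, c) = (p, X, Y, Z, L, R, R) on three one-cell tapes.
delta = {("q", ("a", "b", "c")): ("p", ("X", "Y", "Z"), ("L", "R", "R"))}
print(multitape_step([["a"], ["b"], ["c"]], [0, 0, 0], "q", delta))
# ([['B', 'X'], ['Y', 'B'], ['Z', 'B']], [0, 1, 1], 'p')
```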

3.********* NON-DETERMINISTIC TURING MACHINE


A non-deterministic Turing machine is a TM in which, for a given current state and symbol read, the
transition rules may specify more than one possible action.
A non-deterministic TM can be formally defined as a 7-tuple M = (Q, ∑, Γ, δ, q0, B, F)
where
• Q is a finite nonempty set of states,
• Γ is a finite nonempty set of tape symbols,
• B ∈ Γ is the blank symbol,
• ∑ ⊆ Γ is a nonempty set of input symbols, with B ∉ ∑,
• δ is the transition function mapping (q, X) to a set of triples (q′, Y, D), where D denotes the direction of
movement of the R/W head: D = L or R according as the movement is to the left or right, i.e.
δ : Q × Γ → power set of (Q × Γ × {L, R}),
• q0 ∈ Q is the initial state, and
• F ⊆ Q is the set of final states.
It is clear from the definition of δ that for each state q and tape symbol X, δ(q, X) is a set of triples, for example
δ(q, X) = {(q1, Y1, R), (q2, Y2, L), (q3, Y3, R), …}
A non-deterministic TM is in fact no more powerful than a deterministic TM: any language
accepted by a non-deterministic TM can be accepted by a deterministic TM.
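One standard way a deterministic machine can simulate a non-deterministic TM is breadth-first search over configurations (state, tape, head position). The Python sketch below assumes δ is a dictionary from (state, symbol) to a set of triples, that a branch moving off the left end dies, and that the search gives up after `max_steps` levels; the example machine is hypothetical. The number of configurations explored can grow exponentially with the number of steps, which is where the exponential cost of the simulation comes from.

```python
BLANK = "B"


def ntm_accepts(delta, start, finals, w, max_steps=1000):
    """Deterministic breadth-first simulation of a non-deterministic TM.

    delta maps (state, symbol) -> set of (new_state, write, move), move in {"L", "R"}.
    Returns True if some branch reaches a final state, False if every branch dies
    without accepting, and None if max_steps levels are explored without an answer.
    """
    first = (start, tuple(w) or (BLANK,), 0)
    frontier, seen = [first], {first}
    for _ in range(max_steps):
        if not frontier:
            return False                       # every branch halted without accepting
        next_frontier = []
        for state, tape, head in frontier:
            if state in finals:
                return True                    # some computation path accepts
            for new_state, write, move in delta.get((state, tape[head]), ()):
                cells = list(tape)
                cells[head] = write
                new_head = head + (1 if move == "R" else -1)
                if new_head < 0:
                    continue                   # moved off the left end: branch dies
                if new_head == len(cells):
                    cells.append(BLANK)        # tape grows with blanks on the right
                conf = (new_state, tuple(cells), new_head)
                if conf not in seen:
                    seen.add(conf)
                    next_frontier.append(conf)
        frontier = next_frontier
    return None


# Hypothetical NTM over {0, 1}: on each 1 it guesses "this is the last symbol".
delta = {
    ("q0", "0"): {("q0", "0", "R")},
    ("q0", "1"): {("q0", "1", "R"), ("q1", "1", "R")},
    ("q1", BLANK): {("qf", BLANK, "R")},
}
print(ntm_accepts(delta, "q0", {"qf"}, "0101"))  # True: the string ends in 1
```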


4.******* LINEAR BOUNDED AUTOMATA


A linear bounded automaton (LBA) is a non-deterministic Turing machine which has a single tape whose
length is not infinite but bounded by a linear function of the length of the input string. The set of
context-sensitive languages is accepted by this model.
An LBA is formally defined as a 9-tuple M = (Q, ∑, Γ, δ, q0, B, ¢, $, F) where
• Q is a finite nonempty set of states,
• Γ is a finite nonempty set of tape symbols,
• B ∈ Γ is the blank symbol,
• ∑ ⊆ Γ is a nonempty set of input symbols containing the two special symbols ¢ and $, with B ∉ ∑,
• δ is the transition function mapping (q, X) to a set of triples (q′, Y, D), where D denotes the direction of
movement of the R/W head: D = L or R according as the movement is to the left or right, i.e.
δ : Q × Γ → power set of (Q × Γ × {L, R}),
• q0 ∈ Q is the initial state, and
• F ⊆ Q is the set of final states.
¢ is the left end marker, which is entered in the leftmost cell of the input tape and prevents the
R/W head from getting off the left end of the tape.
$ is the right end marker, which is entered in the rightmost cell of the input tape and prevents
the R/W head from getting off the right end of the tape. Neither end marker may
appear in any other cell of the input tape, and the R/W head must not print any other symbol
over either end marker.
Linear Bounded Automata Model is as shown below:


There are two tapes: one is called the input tape, and the other, working tape.
• On the input tape the head never prints and never moves to the left.
• On the working tape the head can modify the contents in any way, without any restriction.
5. RECURSIVELY ENUMERABLE LANGUAGES
A language L ⊆ ∑* is a recursively enumerable (RE) language if there exists a TM M
such that L = T(M). (On inputs not in L, M may halt without accepting or enter an infinite loop.)
That means the languages accepted by TMs are called RE languages.

Structure of RE languages: the RE languages fall into two classes.
1. Recursive languages: for these there is a TM (an algorithm) that not only recognizes the language but also tells us when it has
decided that the input string is not in the language. Such a TM always halts eventually, regardless
of whether or not it reaches an accepting state.
2. RE languages that are not accepted by any TM with the guarantee
of halting. These languages are accepted in an inconvenient way, i.e.:
i. If the input is in the language, then it is accepted by the TM.
ii. If the input is not in the language, then the TM may run forever, and we can never be
sure that the input won't be accepted eventually.
6. RECURSIVE LANGUAGES
A language L ⊆ ∑* is a recursive language if there exists a TM M that satisfies
the following two conditions:
i. If the string w is in the language, then the TM accepts w and
halts.
ii. If the string w is not in the language, then the TM eventually halts without
reaching an accepting state.

The definition of a recursive language assures us that the TM always halts. It is clear that every recursive language
is also recursively enumerable.
Recursive languages are also called decidable languages. TMs that always halt, irrespective of
whether they accept or not, are a good model for an algorithm. If an algorithm exists to solve a
problem, then the problem is decidable; if the corresponding language is not recursive, the problem is un-decidable.
The existence or non-existence of an algorithm to solve a problem is often more important than the
existence of some TM to solve the problem. Thus dividing languages into decidable and un-decidable
languages is often more important than dividing them into recursively
enumerable languages (those that have some sort of TM) and non-recursively enumerable languages (which have
no TM at all).
The relationship between the classes of languages is as shown below:


7. **********DECIDABLE and UN-DECIDABLE LANGUAGES


A language L is said to be decidable if L is a recursive language. That is, a problem with two answers (Yes/No) is decidable if the corresponding language is
recursive. The class of decidable problems is also called the class of solvable problems.
Decidability of regular languages:
Is the problem of testing whether a deterministic finite automaton accepts a given input string w
decidable? Yes, it is decidable.
• Whether a context-free grammar G generates a given input string w is decidable.
• Whether a context-sensitive grammar G generates a given input string w is decidable.
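Why DFA membership is decidable can be seen from a Python sketch: the simulation makes exactly one transition per input symbol, so it always halts after |w| steps with a Yes/No answer. The dictionary encoding of δ, the dead-state treatment of missing transitions and the example DFA are assumptions for illustration.

```python
def dfa_accepts(delta, start, finals, w):
    """Decide whether a DFA accepts w.

    The loop makes exactly one transition per input symbol, so it always halts
    after len(w) steps. delta maps (state, symbol) -> state; a missing entry is
    treated as a rejecting dead state (an assumption for partial tables).
    """
    state = start
    for symbol in w:
        if (state, symbol) not in delta:
            return False
        state = delta[(state, symbol)]
    return state in finals


# Hypothetical example DFA: strings over {0, 1} with an even number of 1s.
even_ones = {("e", "0"): "e", ("e", "1"): "o",
             ("o", "0"): "o", ("o", "1"): "e"}
print(dfa_accepts(even_ones, "e", {"e"}, "1011"))  # False (three 1s)
```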
Un-decidable language: If a language L is not recursive, then L is called an
un-decidable language.
If a language L is recursively enumerable, then there exists a TM which semi-decides
it (on any input it either accepts, rejects, or loops forever). Every TM has a description of finite length. Hence
the number of TMs, and therefore the number of RE languages, is countably infinite, while the set of all languages over ∑ is uncountable, so some languages are not even RE.
• A_TM = {(M, w) | the TM M accepts w} is un-decidable.
• The halting problem of TMs is un-decidable.
8. ************THE POST CORRESPONDENCE PROBLEM
The un-decidability of many problems about strings can be established with the help of the Post Correspondence Problem
(PCP). The PCP can be defined as follows:
An instance of the PCP consists of two lists of strings over ∑, both lists containing the same number n of strings:
X = (x1, x2, x3, …, xn)
Y = (y1, y2, y3, …, yn)
We say that there exists a Post correspondence solution for the pair (X, Y) if there is a non-empty
sequence of integers i1, i2, …, im, with 1 ≤ ij ≤ n, such that

$$x_{i_1} x_{i_2} \cdots x_{i_m} = y_{i_1} y_{i_2} \cdots y_{i_m}$$

To solve the Post correspondence problem we try all combinations of sequences
i1, i2, …, im; the indices ij need not be distinct and m may be greater than n. (This search need not terminate, because the PCP is un-decidable.)
If such a sequence gives identical concatenations of the x-strings and the y-strings, we say that the PCP has a solution.
If there exists a solution to the PCP, then there exist infinitely many solutions (a solution repeated any number of times is again a solution).
Example:


Does the PCP with two lists x = (b, bab³, ba) and y = (b³, ba, a) have a solution?
We have to determine whether or not there exists a sequence of strings of x such that the string
formed by this sequence and the string formed by the sequence of corresponding strings of y are
identical.
The required sequence is given by:
i1 = 2, i2 = 1, i3 = 1, i4 = 3, i.e. (2, 1, 1, 3) and m = 4:

x2 x1 x1 x3 = bab³ · b · b · ba = bab⁶a
y2 y1 y1 y3 = ba · b³ · b³ · a = bab⁶a
Thus the PCP has a solution.
If the first pair used in the solution is always x1 and y1, then the problem is known as the Modified Post
Correspondence Problem (MPCP).
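The example can be checked with a short Python sketch, together with a length-bounded brute-force search. The names `pcp_check` and `pcp_search` and the `max_len` cut-off are illustrative; because the PCP is un-decidable, no fixed bound works for every instance, so the search only ever gives a partial answer.

```python
from itertools import product


def pcp_check(xs, ys, seq):
    """True if the 1-based index sequence seq is a PCP solution for (xs, ys)."""
    if not seq:
        return False                      # a solution must be non-empty
    top = "".join(xs[i - 1] for i in seq)
    bottom = "".join(ys[i - 1] for i in seq)
    return top == bottom


def pcp_search(xs, ys, max_len):
    """Try every index sequence of length 1..max_len; return the first solution or None."""
    n = len(xs)
    for m in range(1, max_len + 1):
        for seq in product(range(1, n + 1), repeat=m):
            if pcp_check(xs, ys, seq):
                return seq
    return None


xs = ("b", "babbb", "ba")   # x = (b, bab^3, ba)
ys = ("bbb", "ba", "a")     # y = (b^3, ba, a)
print(pcp_check(xs, ys, (2, 1, 1, 3)))  # True: both sides spell bab^6a
print(pcp_search(xs, ys, 4))            # (2, 1, 1, 3)
```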

9. **************HALTING PROBLEM OF TURING MACHINE


The halting problem is a central problem in computability theory. It is the problem of
determining whether a computer program will halt (produce an answer) or loop forever on a given
input. In 1936, Alan Turing proved that the halting problem is un-decidable, and
therefore it cannot be solved by any algorithm.
One can determine the decidability of other computational problems by combining reduction
techniques with the un-solvability of the halting problem. Reduction techniques are also used to prove the
un-decidability of the halting problem of Turing machines.
For example, if A is the problem of finding some root of x⁴ − 3x² + 2 = 0 and B is the problem of
finding some root of x² − 2 = 0, then A is reducible to B: since x² − 2 is a factor of x⁴ − 3x² + 2 = (x² − 1)(x² − 2), a root of
x² − 2 = 0 is also a root of x⁴ − 3x² + 2 = 0.
If A is reducible to B and B is decidable, then A is decidable. If A is reducible to B and A is un-decidable,
then B is un-decidable.
It is not possible to answer the halting problem by simulating the action of the TM on the string w
with a universal TM, because there is no limit on the length of the computation. If M enters an
infinite loop, then no matter how long we wait, we can never be sure that M is in fact in a loop: the
machine may simply be performing a very long computation. What is required is an algorithm that
can determine the correct answer for any M and w by performing some analysis on the machine's
description and the input.
Formally, the halting problem of TMs is stated as: “Given an arbitrary TM M = (Q, ∑, Γ, δ, q0, B, F) and
an input w ∈ ∑*, does M halt on input w?”
Thus the halting problem can be viewed as a language: the collection of all strings of the form
(M, w) such that M is (the code of) a TM and M halts on the input string w. The halting problem of a
TM is un-decidable, i.e. this language is not recursive.
10. P and NP PROBLEMS
Problems can be classified into two groups:
1. P-problem: a problem that can be solved in (deterministic) polynomial time.
Examples: searching for an element in a sorted list by binary search, O(log n); sorting n elements, O(n log n).
2. NP-problem: a problem that can be solved in non-deterministic polynomial time.
Examples: the knapsack problem, O(2^(n/2)), and the travelling salesperson problem, O(n!) by brute force.
P problem:
A Turing machine M is said to be of time complexity T(n) if the following holds:
given an input w of length n, M halts after making at most T(n) moves.
A language L is in class P if there exists some polynomial T(n) such that L = T(M) for some
deterministic TM M of time complexity T(n).
Construct the time complexity T(n) for the Turing machine M that accepts the language L = {aⁿbⁿ | n ≥ 1}.
• The TM moves forward and backward over the input string aⁿbⁿ, replacing
the leftmost a by X and the leftmost b by Y. So it requires about 2n moves to match one a
with one b.
• The above step is repeated n times.
• Hence the number of moves for accepting aⁿbⁿ is about (2n)·n = 2n².
For strings not of the form aⁿbⁿ, the TM also halts within O(n²) steps.
Hence T(n) = O(n²).


NP Problem
A language L is in class NP if there is a non-deterministic TM M and a polynomial time complexity
T(n) such that L = T(M) and M executes at most T(n) moves for every input w of length n.
We have seen that a deterministic TM M1 simulating a non-deterministic TM M exists. If T(n) is the
time complexity of M, then the time complexity of the equivalent deterministic TM M1 is 2^O(T(n)).
11. COMPLEXITY of an ALGORITHM
The efficiency of an algorithm is judged by measuring its performance. We
can measure the performance of an algorithm by computing two factors:
i. Amount of time required by an algorithm to execute
ii. Amount of storage required by an algorithm.
Hence we define two terms: time complexity and space complexity.
Time complexity of an algorithm is the amount of time the algorithm takes to run. By
computing time complexity we come to know whether the algorithm is slow or fast.
Space complexity of an algorithm is the amount of space (memory) the algorithm takes. By
computing space complexity we can analyze whether an algorithm requires more or less space.
To select the best algorithm, we need to check the efficiency of each algorithm, usually by
computing the time complexity of each algorithm. Asymptotic notations such as Ω, Θ and
O are the shorthand ways to represent time complexity. Using these notations we can express
a lower bound (Ω), an upper bound (O) or a tight bound (Θ) on the running time.
Big Oh Notation: The Big Oh notation, denoted by O, is a method of representing the upper bound
of an algorithm's running time. Using Big Oh notation we can state the longest amount of time the
algorithm takes to complete.
When we have two algorithms for the same problem, we may need to compare the
running times of the two algorithms.
Measuring the performance of an algorithm in relation to the input size n is called the order of
growth. In particular, an exponential function grows much faster than any polynomial, however large its
degree: for every fixed k, nᵏ < 2ⁿ for all sufficiently large n.
A Turing machine M is said to be of time complexity T(n) if the following holds:
given an input w of length n, M halts after making at most T(n) moves.


A language L is in class P if there exists some polynomial T(n) such that L = T(M) for some
deterministic TM M of time complexity T(n). In the case of an algorithm, T(n) denotes the running
time for solving a problem with an input of size n.
We have seen that a deterministic TM M1 simulating a non-deterministic TM M exists. If T(n) is the
time complexity of M, then the time complexity of the equivalent deterministic TM M1 is 2^O(T(n)).
It is not known whether the complexity of M1 can be made less than 2^O(T(n)).
12. *********QUANTUM COMPUTER
A quantum computer is a system built from quantum circuits, containing wires and elementary
quantum gates, to carry out manipulation of quantum information.
A classical computer has a memory made up of bits, each 0 or 1. A quantum computer maintains a
sequence of qubits. A classical bit has two states, 0 and 1; the two basis states of a qubit are
|0⟩ and |1⟩, written in the ket notation | ⟩. A qubit can be in infinitely many
states other than |0⟩ and |1⟩; in general it can be in the state

$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$$

where α and β are complex numbers such that

$$|\alpha|^2 + |\beta|^2 = 1$$

Multiple qubits can be defined.
Example: a two-qubit system has 4 basis states:
|00⟩, |01⟩, |10⟩, |11⟩
A two-qubit quantum state can be any superposition

$$\alpha_{00}|00\rangle + \alpha_{01}|01\rangle + \alpha_{10}|10\rangle + \alpha_{11}|11\rangle, \qquad |\alpha_{00}|^2 + |\alpha_{01}|^2 + |\alpha_{10}|^2 + |\alpha_{11}|^2 = 1$$

Qubit for Logical NOT gate


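It is possible to define logic gates on qubits. The classical NOT gate changes 0 to 1 and 1 to
0. In the case of the qubit NOT gate, the state

$$\alpha|0\rangle + \beta|1\rangle \quad \text{is changed to} \quad \alpha|1\rangle + \beta|0\rangle$$

The action of the qubit NOT gate can be represented using the matrix

$$X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad X\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} \beta \\ \alpha \end{pmatrix}$$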

13. *******CHURCH-TURING THESIS


Any algorithm that can be performed on any computing machine can be performed on a Turing
machine as well. The strong Church-Turing thesis states that any algorithmic process can be simulated efficiently by a Turing machine. But a
challenge to the strong Church-Turing thesis arose from analog computation. Certain types of
analog computers solved some problems efficiently, whereas these problems had no known efficient
solution on a Turing machine. But when the presence of noise was taken into account, the power of
the analog computers disappeared. A further challenge came from randomized algorithms: in the mid-1970s Robert Solovay and Volker Strassen gave an efficient randomized primality test at a time when no efficient deterministic test was known (a deterministic polynomial algorithm was given by Manindra
Agrawal, Neeraj Kayal and Nitin Saxena of IIT Kanpur only in 2002). This led to the modification of
the strong Church-Turing thesis as: Any algorithmic process can be simulated efficiently using a
probabilistic Turing machine.
Later, in 1985, David Deutsch tried to build computing devices using quantum mechanics, on the view that
computers are physical objects and computations are physical processes. What computers
can or cannot compute is then determined by the laws of physics alone, and not by pure mathematics. But
it is not known whether Deutsch's universal quantum computer can efficiently simulate any
physical process. In 1994 Peter Shor showed that finding the prime factors of an integer and computing discrete logarithms can be done efficiently on a quantum
computer, whereas no efficient algorithm for these problems is known for Turing machines or classical computers. This suggests that quantum computers may be more efficient than Turing machines and classical computers.


14. UNIVERSAL TURING MACHINE


Structure of Universal Turing Machine:

A universal Turing machine is a Turing machine that can simulate an arbitrary Turing machine on
arbitrary input. It achieves this by reading both the description of
the Turing machine to be simulated and the input to that machine from its own tape.
The language accepted by the universal TM is called the universal language. A single TM can be used as a stored-program computer, taking its program as well as its data
from one or more tapes on which the input is placed. The same idea is implemented in the universal TM. It
is easiest to describe the universal TM U as a multi-tape TM in which the transitions of M are stored initially
on the first tape, along with the string w. A second tape is used to hold the simulated tape of
M, using the same format as for the code of M: tape symbol Xi of M is represented by
0^i, and tape symbols are separated by single 1's. The third tape of U holds the state of M, with
state qi represented by i 0's. The universal TM U accepts the coded pair (M, w) if and only if M
accepts w. If M halts on w without accepting, then U also halts without accepting. And if M loops forever on input w,
then U also loops forever.
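The unary encodings described above can be sketched in Python. The tape and state encodings follow the text; the transition encoding 0^i 1 0^j 1 0^k 1 0^l 1 0^m for δ(qi, Xj) = (qk, Xl, Dm), with D1 = L and D2 = R, is the common textbook convention and is an assumption here, as are all function names.

```python
def unary_blocks(indices):
    """Encode positive integers as blocks of 0s separated by single 1s: i -> 0^i."""
    if any(i < 1 for i in indices):
        raise ValueError("indices must be >= 1 so that every block is non-empty")
    return "1".join("0" * i for i in indices)


def encode_tape(symbol_indices):
    """Second tape of U: tape symbol X_i of M written as 0^i, separated by 1s."""
    return unary_blocks(symbol_indices)


def encode_state(i):
    """Third tape of U: state q_i written as i 0s."""
    return unary_blocks([i])


def encode_transition(i, j, k, l, m):
    """delta(q_i, X_j) = (q_k, X_l, D_m) as 0^i 1 0^j 1 0^k 1 0^l 1 0^m
    (assumed convention: D_1 = L, D_2 = R)."""
    return unary_blocks([i, j, k, l, m])


def decode_blocks(code):
    """Inverse of unary_blocks: recover the list of indices."""
    return [len(block) for block in code.split("1")]


print(encode_tape([1, 2, 3]))            # 01001000
print(encode_transition(1, 2, 3, 1, 2))  # 0100100010100
```

Requiring every index to be at least 1 keeps each block non-empty, so a single 1 always separates two symbols and decoding is unambiguous.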


SYNTAX DIRECTED DEFINITION


It is not possible for a CFG to represent certain properties such as uniqueness of type declarations
or type compatibility in arithmetic operations. In the compilation process these are
features which are beyond the syntax of the programming language. Hence syntax
analysis alone is not sufficient for a program to be compiled; we need something more than syntax
analysis. Semantic analysis is therefore done to handle the issues that are beyond the syntactic
definition.
Specifying the translation of a programming-language construct involves:
• specifying what the construct is;
• specifying the translation rules for the construct.
Translation does not necessarily mean generating either intermediate code or object code.
Translation also involves adding information to the symbol table as well as performing
construct-specific computations.
Example:
If the programming construct is a declaration statement, then the translation adds information about
the construct's type attribute to the symbol table.
If the programming construct is an expression, then the translation generates the code for evaluating the
expression.
Translation of a construct involves manipulating the values of various quantities. For example, for int a,
b, c the compiler needs to extract the type int and add it to the symbol-table records of a, b and c.
This requires that the compiler keep track of the type int as well as the pointers to the symbol-table
records containing a, b and c.
Since the syntactic structure of programming constructs is specified using a CFG, we extend
that CFG by associating sets of attributes with the grammar symbols and sets of semantic rules
with the productions. These extensions allow us to specify the translations.
Overview:
• Grammar symbols are associated with attributes, to associate information with the
programming-language constructs that they represent.
• Values of these attributes are evaluated by the semantic rules associated with the
production rules.


Evaluation of these semantic rules:
• may generate intermediate code,
• may put information into the symbol table,
• may perform type checking,
• may issue error messages,
• may perform other activities; in fact, they may perform almost any activity.
An attribute may hold almost anything:
• a string, a number, a memory location, a complex record.
Attributes for expressions:
• type of value: int, float, double, char, string, …
• type of construct: variable, constant, operation, …
Attributes for constants: values
Attributes for variables: name, scope
Attributes for operations: operands, operator, …
When we associate semantic rules with productions, we use two notations:
• Syntax-Directed Definitions
• Translation Schemes
Syntax-Directed Definitions:
• give high-level specifications for translations,
• hide many implementation details such as the order of evaluation of semantic actions,
• associate a production rule with a set of semantic actions without saying when they
will be evaluated.
Translation Schemes:
• indicate the order of evaluation of the semantic actions associated with a production rule,
• in other words, give a little bit of information about implementation
details.
Syntax directed definition (SDD):
Explain the concept of syntax directed definition with examples (6)
To translate a programming-language construct, the compiler has to keep track of many quantities such
as the type of the construct, the location of the first instruction in the target code, or the number of
instructions generated. A formalism called a syntax directed definition is used for specifying
translations for programming-language constructs.
Definition: A syntax directed definition (SDD) is a context-free grammar together with attributes
and rules. Attributes are associated with grammar symbols and rules are associated with
productions.
Semantic rules set up dependencies between attributes, which can be represented by a dependency
graph.
This dependency graph determines the evaluation order of these semantic rules.
Evaluation of a semantic rule defines the value of an attribute. But a semantic rule may also have
side effects, such as printing a value.
If X is a grammar symbol and a is one of its attributes, then we write X.a to denote the value of a
at a particular parse-tree node labeled X. There are two kinds of attributes for non-terminals:
i. Synthesized attribute (S-attribute): An attribute is said to be synthesized if its
value at a parse-tree node is determined from attribute values at the children of the node.
ii. Inherited attribute: An inherited attribute is one whose value at a parse-tree node is
determined in terms of attributes at the parent and/or siblings of that node.
An attribute can be a string, a number, a type, a memory location or anything else.
Terminals can have synthesized attributes, but not inherited attributes. Attributes for terminals are
lexical values (lexval) supplied by the lexical analyzer.
Example: L → E n
E → E1 + T
E → T
T → F
F → digit
We can write the syntax directed definition for the above grammar by giving every non-terminal
a synthesized attribute val, while the terminal digit has the synthesized attribute lexval:
L.val = E.val
E.val = E1.val + T.val
E.val = T.val
T.val = F.val
F.val = digit.lexval
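The semantic rules above can be evaluated on a parse tree by a post-order traversal, since every attribute is synthesized (children are evaluated before their parent). This Python sketch uses a hypothetical `Node` class and builds the parse tree for 3 + 4 n by hand; an SDD itself does not fix an evaluation order, and post-order is simply one order that respects the dependencies here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical parse-tree node; val holds the synthesized attribute."""
    label: str
    children: List["Node"] = field(default_factory=list)
    lexval: Optional[int] = None   # only for 'digit' leaves (from the lexer)
    val: Optional[int] = None


def evaluate(node: Node) -> Optional[int]:
    """Compute val bottom-up: children first, then this node's semantic rule."""
    for child in node.children:
        evaluate(child)
    kids = [c.label for c in node.children]
    ch = node.children
    if node.label == "F" and kids == ["digit"]:
        node.val = ch[0].lexval                 # F.val = digit.lexval
    elif node.label == "T" and kids == ["F"]:
        node.val = ch[0].val                    # T.val = F.val
    elif node.label == "E" and kids == ["T"]:
        node.val = ch[0].val                    # E.val = T.val
    elif node.label == "E" and kids == ["E", "+", "T"]:
        node.val = ch[0].val + ch[2].val        # E.val = E1.val + T.val
    elif node.label == "L" and kids == ["E", "n"]:
        node.val = ch[0].val                    # L.val = E.val
    return node.val


def term(d: int) -> Node:
    """Build the chain T -> F -> digit for a single digit d."""
    return Node("T", [Node("F", [Node("digit", lexval=d)])])


# Parse tree of "3 + 4 n":  L -> E n,  E -> E1 + T,  E1 -> T
tree = Node("L", [Node("E", [Node("E", [term(3)]), Node("+"), term(4)]), Node("n")])
print(evaluate(tree))  # 7
```

After evaluation every node carries its val, which is exactly the annotated (decorated) parse tree.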


Define inherited and synthesized attribute.


Synthesized attribute: An attribute is said to be synthesized if its value at a parse tree
node is determined from attribute values at the children of the node.
Example: Production rule E → E1 + T
Suppose the non-terminals E and T have a synthesized attribute 'val'. The attribute value E.val
can be computed only after the attribute values at its children, E1.val and T.val, have been
computed.
ie: E.val = E1.val + T.val
Inherited attribute: An inherited attribute is one whose value at a parse tree node is determined in
terms of attributes at the parent and/or siblings of that node.
Example: T → F T'
F → id

Here F and T' are siblings. F and T have a synthesized attribute val, and id has a synthesized
attribute lexval. The non-terminal T' has an inherited attribute inh.
The inherited attribute value of T' is determined from an attribute of its sibling F, namely F.val:
ie: T'.inh = F.val
What do you mean by annotating or decorating the parse tree?
The process of computing the attribute values at the parse tree nodes is called annotating or
decorating the parse tree.
Annotated Parse Tree:
What is an annotated parse tree?
A parse tree showing the values of attributes at each node is called an annotated parse tree.
For the grammar:
L → En
E → E1 + T
E→T
T → T1 * F
T →F
F →(E)
F → digit


i. Obtain the SDD for the above.


ii. Construct the annotated parse tree for the string 3 * 5 + 4n
i. SDD for the given grammar:
Assume that each non-terminal has a single synthesized attribute called val, and that the
terminal digit has a synthesized attribute lexval, an integer value returned by the lexical
analyzer.
Here n is used as an end marker, ie: the expression terminated by n is evaluated (L → En).
PRODUCTION        SEMANTIC RULES
L → En            L.val = E.val
E → E1 + T        E.val = E1.val + T.val
E → T             E.val = T.val
T → T1 * F        T.val = T1.val * F.val
T → F             T.val = F.val
F → (E)           F.val = E.val
F → digit         F.val = digit.lexval
ii. Annotated parse tree for the string 3 * 5 + 4n (children are indented under their parent):
L.val = 19
    E.val = 19
        E.val = 15
            T.val = 15
                T.val = 3
                    F.val = 3
                        digit.lexval = 3
                *
                F.val = 5
                    digit.lexval = 5
        +
        T.val = 4
            F.val = 4
                digit.lexval = 4
    n
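The bottom-up evaluation in this annotated tree can be sketched in Python as a postorder traversal: a node's val is computed only after the vals of all its children. This is an illustrative sketch under stated assumptions:
- The tuple encoding of the parse tree and the function name `eval_val` are hypothetical.
- The tree for 3 * 5 + 4n is built by hand rather than by a parser.
- Terminals that carry no attributes (+, *, n) are left out of the child lists.

```python
# Sketch (hypothetical encoding): interior node = (production, child, ...),
# leaf = ("digit", lexval). Terminals +, *, n carry no attributes, so they are omitted.

def eval_val(node):
    """Return the synthesized attribute val of a parse-tree node (postorder)."""
    label = node[0]
    if label == "digit":
        return node[1]                                  # digit.lexval
    kids = [eval_val(child) for child in node[1:]]      # evaluate children first
    if label == "E -> E1 + T":
        return kids[0] + kids[1]                        # E.val = E1.val + T.val
    if label == "T -> T1 * F":
        return kids[0] * kids[1]                        # T.val = T1.val * F.val
    if label in ("L -> E n", "E -> T", "T -> F", "F -> (E)", "F -> digit"):
        return kids[0]                                  # copy rules, e.g. F.val = digit.lexval
    raise ValueError(f"unknown production: {label}")

# Parse tree for 3 * 5 + 4n
tree = ("L -> E n",
        ("E -> E1 + T",
            ("E -> T",
                ("T -> T1 * F",
                    ("T -> F", ("F -> digit", ("digit", 3))),
                    ("F -> digit", ("digit", 5)))),
            ("T -> F", ("F -> digit", ("digit", 4)))))

print(eval_val(tree))   # 19
```

Because every attribute here is synthesized, any traversal that finishes a node's children before the node itself gives a valid evaluation order.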


Write the syntax-directed definition for a simple desk calculator and give the parse tree and annotated
parse tree for the expressions:
i. (7 - 2) * (8 - 1)n
ii. 5 + 6 * 7;
iii. 4 * (3 + 5) - 7
The context-free grammar for a simple desk calculator is given by:
L → En
E → E1 + T
E → E1 - T
E→T
T → T1 * F
T → T1 / F
T →F
F →(E)
F → digit
Syntax-directed definition for a simple desk calculator:
Assume that each non-terminal has a single synthesized attribute called val, and that the
terminal digit has a synthesized attribute lexval, an integer value returned by the lexical
analyzer.
PRODUCTION        SEMANTIC RULES
L → En            L.val = E.val
E → E1 + T        E.val = E1.val + T.val
E → E1 - T        E.val = E1.val - T.val
E → T             E.val = T.val
T → T1 * F        T.val = T1.val * F.val
T → T1 / F        T.val = T1.val / F.val
T → F             T.val = F.val
F → (E)           F.val = E.val
F → digit         F.val = digit.lexval


Parse tree for the string (7 - 2) * (8 - 1)n

Annotated parse tree for the string (7 - 2) * (8 - 1)n


Parse tree for the expression: 5 + 6 * 7;

Annotated parse tree for the expression: 5 + 6 * 7;


Parse tree for the expression: 4 * (3 + 5) - 7


The SDD to translate a binary integer to decimal is shown below:


Productions       Semantic rules
BN → L            BN.val = L.val
L → L1 B          L.val = 2 × L1.val + B.val
L → B             L.val = B.val
B → 0             B.val = 0
B → 1             B.val = 1

Construct the parse tree and annotated parse tree for the input string 11001.
Parse tree for the input string 11001 (children are indented under their parent):
BN
    L
        L1
            L1
                L1
                    L1
                        B
                            1
                    B
                        1
                B
                    0
            B
                0
        B
            1


Annotated parse tree for the input string 11001:

BN.val = 25
    L.val = 25
        L1.val = 12
            L1.val = 6
                L1.val = 3
                    L1.val = 1
                        B.val = 1
                            1 (lexval)
                    B.val = 1
                        1 (lexval)
                B.val = 0
                    0 (lexval)
            B.val = 0
                0 (lexval)
        B.val = 1
            1 (lexval)
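Here is a minimal Python sketch that checks the rule L.val = 2 × L1.val + B.val (the function name `binary_val` is hypothetical). Because L → L1 B is left-recursive, the leftmost bit lies at the deepest L, so a postorder walk meets the bits from left to right. The loop below simulates that walk instead of building the tree.

```python
def binary_val(bits):
    """Apply the binary-to-decimal SDD to a bit string.

    Returns (BN.val, list of L.val values from the innermost L outward).
    """
    if not bits or any(b not in "01" for b in bits):
        raise ValueError("expected a non-empty string of 0s and 1s")
    val = int(bits[0])            # L -> B : L.val = B.val (leftmost bit, innermost L)
    levels = [val]
    for b in bits[1:]:
        val = 2 * val + int(b)    # L -> L1 B : L.val = 2 * L1.val + B.val
        levels.append(val)
    return val, levels            # BN -> L : BN.val = L.val

print(binary_val("11001"))   # (25, [1, 3, 6, 12, 25])
```

The returned list holds the L.val at each level, matching the annotated tree from the innermost L upward.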

Example for inherited attribute:


1. A → BC
B → D
C → ɛ
D → 1
Annotated parse tree:
A.val = 1
    B.val = 1
        D.val = 1
            1 (lexval)
    C.inh = 1, C.syn = 1
        ɛ
From the above annotated parse tree we observe that A, B and D have a synthesized attribute val.


D.val = 1 (the lexval of the digit 1)
B.val = D.val = 1 (from the definition of a synthesized attribute)
C has two attributes: an inherited attribute C.inh and a synthesized attribute C.syn.
Since C → ɛ gives node C no children, C.syn cannot be computed from below; C must first
obtain its inherited attribute C.inh.
In the production A → BC, B and C are siblings, so the attribute value at node B, ie: B.val = 1, is
inherited by C:
C.inh = B.val
Then, from C → ɛ, the synthesized attribute of C is computed at node C itself:
C.syn = C.inh = 1

2. T' → * F T1'
T' → ɛ
F → digit
Annotated parse tree:
T'.inh = 8, T'.syn = 24
    *
    F.val = 3
        digit.lexval = 3
    T1'.inh = 24, T1'.syn = 24
        ɛ

Assume that T'.inh = 8 and F.val = digit.lexval = 3.


T1'.inh is computed by multiplying the attribute value at the parent node T' by the attribute value at
F:
ie: T1'.inh = T'.inh × F.val = 8 × 3 = 24

T1'.syn = T1'.inh = 24 (by production T' → ɛ, applied at node T1')
T'.syn = T1'.syn = 24


For the given grammar, draw the annotated parse tree for 3 * 5 using the top-down approach. Write the
semantic rule for each step.
T → F T'
T' → * F T1'
T' → ɛ
F → digit
Assume that the non-terminals T and F each have a single synthesized attribute called val, and that
the terminal digit has a synthesized attribute lexval, an integer value returned by the
lexical analyzer. The non-terminal T' has two attributes: an inherited attribute inh and a
synthesized attribute syn.
Annotated parse tree for the input 3 * 5:
T.val = 15
    F.val = 3
        digit.lexval = 3
    T'.inh = 3, T'.syn = 15
        *
        F.val = 5
            digit.lexval = 5
        T1'.inh = 15, T1'.syn = 15
            ɛ
Semantic rules:
Productions       Semantic Rules
T → F T'          T'.inh = F.val
                  T.val = T'.syn
T' → * F T1'      T1'.inh = T'.inh × F.val
                  T'.syn = T1'.syn
T' → ɛ            T'.syn = T'.inh
F → digit         F.val = digit.lexval


Write the annotated parse tree for 3 * 5 + 4n using a grammar suitable for a top-down parser. Write
the semantic rule for each step.
Answer:
For this problem we use the desk calculator grammar without left recursion, because a top-down
parser cannot handle a left-recursive grammar. The equivalent grammar without left recursion is
given by:
E → T E'
E' → + T E1'
E' → ɛ
T → F T'
T' → * F T1'
T' → ɛ
F → digit
Annotated parse tree for the input 3 * 5 + 4n:


Semantic rules:
Productions       Semantic Rules
E → T E'          E'.inh = T.val
                  E.val = E'.syn
E' → + T E1'      E1'.inh = E'.inh + T.val
                  E'.syn = E1'.syn
E' → ɛ            E'.syn = E'.inh
T → F T'          T'.inh = F.val
                  T.val = T'.syn
T' → * F T1'      T1'.inh = T'.inh × F.val
                  T'.syn = T1'.syn
T' → ɛ            T'.syn = T'.inh
F → digit         F.val = digit.lexval
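Each inherited attribute in this SDD depends only on the head's inherited attribute or on a sibling to its left. The rules can therefore be evaluated during a single recursive-descent parse: the procedure for E' or T' receives inh as a parameter and returns syn.

The Python sketch below makes these assumptions:
- Operands are single digits, and blanks are ignored.
- A trailing n is treated as the end marker.
- The class, method and function names are hypothetical.

```python
class DeskCalc:
    """Recursive-descent evaluator for
    E -> T E',  E' -> + T E1' | ɛ,  T -> F T',  T' -> * F T1' | ɛ,  F -> digit.
    """

    def __init__(self, text):
        self.toks = [ch for ch in text if not ch.isspace()]   # one char per token
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def E(self):
        t_val = self.T()
        return self.E_prime(inh=t_val)       # E'.inh = T.val ; E.val = E'.syn

    def E_prime(self, inh):
        if self.peek() == "+":
            self.pos += 1
            t_val = self.T()
            return self.E_prime(inh=inh + t_val)   # E1'.inh = E'.inh + T.val ; E'.syn = E1'.syn
        return inh                            # E' -> ɛ : E'.syn = E'.inh

    def T(self):
        f_val = self.F()
        return self.T_prime(inh=f_val)       # T'.inh = F.val ; T.val = T'.syn

    def T_prime(self, inh):
        if self.peek() == "*":
            self.pos += 1
            f_val = self.F()
            return self.T_prime(inh=inh * f_val)   # T1'.inh = T'.inh * F.val ; T'.syn = T1'.syn
        return inh                            # T' -> ɛ : T'.syn = T'.inh

    def F(self):
        tok = self.peek()
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"digit expected at position {self.pos}")
        self.pos += 1
        return int(tok)                       # F.val = digit.lexval


def calc(text):
    parser = DeskCalc(text)
    value = parser.E()
    if parser.peek() == "n":                  # optional end marker
        parser.pos += 1
    if parser.peek() is not None:
        raise SyntaxError("unexpected input after expression")
    return value

print(calc("3 * 5 + 4n"))   # 19
print(calc("3 * 5"))        # 15
```

Passing inh as an argument and returning syn is one common way to realize an L-attributed SDD in a hand-written top-down parser.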

Write the syntax-directed definition for the simple desk calculator and also draw the annotated
parse tree for the expressions:
i. (3 + 4) * (5 + 6)n
ii. 1 * 2 * 3 * (4 + 5)n
iii. (9 + 8 * (7 + 6) + 5) * 4n
Syntax-directed definition:
PRODUCTION        SEMANTIC RULES
L → En            L.val = E.val
E → E1 + T        E.val = E1.val + T.val
E → T             E.val = T.val
T → T1 * F        T.val = T1.val * F.val
T → F             T.val = F.val
F → (E)           F.val = E.val
F → digit         F.val = digit.lexval


i. Annotated parse tree for the expression (3 + 4) * (5 + 6)n

ii. Annotated parse tree for the expression: 1 * 2 * 3 * (4 + 5 )n


iii. Annotated parse tree for the expression: ( 9 + 8 * ( 7+ 6 ) + 5) * 4n

Give an SDD to process a simple variable declaration in C, and give the annotated parse tree for the
declaration: int a, b, c
Answer:
A simple declaration D consists of a basic type T followed by a list L of identifiers. T can be
int or float. For each identifier on the list, the type is entered into the symbol-table entry for the
identifier. Entries can be updated in any order.
Context-free grammar for a C declaration statement:
D → T L
T → int
T → float
L → L1 , id
L → id


Assume that the non-terminal T has one synthesized attribute, T.type, which is the type in the
declaration D. The non-terminal L has one inherited attribute, which we call L.inh. The L.inh
attribute value is passed down the list of identifiers, so that it can be added to the appropriate
symbol-table entries.
The value of L1.inh at a parse tree node is computed by copying the value of L.inh from the parent
of that node (the head of the production).
Annotated parse tree for the input int a, b, c:
D
    T.type = int
        int (lexval)
    L.inh = int
        L1.inh = int
            L1.inh = int
                id.entry = a
            ,
            id.entry = b
        ,
        id.entry = c
The function addType(id.entry, L.inh) is called to record the type of each identifier in its
symbol-table entry. Its two arguments are:
id.entry, a lexical value that points to a symbol-table object, and
L.inh, the type being assigned to every identifier on the list.
Syntax-Directed Definition (SDD) for type declarations in C:
Productions       Semantic rules
D → T L           L.inh = T.type
T → int           T.type = integer
T → float         T.type = float
L → L1 , id       L1.inh = L.inh
                  addType(id.entry, L.inh)
L → id            addType(id.entry, L.inh)
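This is a minimal Python sketch of the SDD. It assumes the symbol table is a dictionary from identifier name to type, and that the identifier list has already been split out of the input. The names `add_type`, `process_list` and `process_declaration` are hypothetical; add_type stands in for addType. The inherited attribute L.inh becomes a parameter passed down the left-recursive list.

```python
def add_type(symtab, entry, typ):
    """Stand-in for addType(id.entry, L.inh): record the type for the identifier."""
    symtab[entry] = typ

def process_list(symtab, ids, inh):
    """Evaluate L over the identifiers ids, with inherited attribute L.inh = inh."""
    if len(ids) == 1:                        # L -> id : addType(id.entry, L.inh)
        add_type(symtab, ids[0], inh)
        return
    process_list(symtab, ids[:-1], inh)      # L -> L1 , id : L1.inh = L.inh
    add_type(symtab, ids[-1], inh)           #                addType(id.entry, L.inh)

def process_declaration(text):
    """D -> T L : L.inh = T.type"""
    type_word, _, rest = text.strip().partition(" ")
    if type_word not in ("int", "float"):
        raise SyntaxError(f"unknown type {type_word!r}")
    t_type = "integer" if type_word == "int" else "float"   # T.type
    ids = [name.strip() for name in rest.split(",")]
    symtab = {}
    process_list(symtab, ids, inh=t_type)
    return symtab

print(process_declaration("int a, b, c"))
# {'a': 'integer', 'b': 'integer', 'c': 'integer'}
```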


Annotated parse tree for the input float id1, id2, id3

EVALUATION ORDERS FOR SDDs:


Dependency graphs are a useful tool for determining an evaluation order for the attribute instances
in a given parse tree.
An annotated parse tree shows the values of attributes, whereas a dependency graph helps us
determine how those values can be computed.
Dependency Graphs:
What is a dependency graph? Explain.
OR
Describe the methods proposed for evaluating semantic rules.
A dependency graph indicates the flow of information among the attribute instances in a particular
parse tree. It is used to determine the evaluation order of the semantic rules of an SDD.
Edges express constraints implied by the semantic rules. An edge from one attribute instance to
another means that the value of the first is needed to compute the second. A dependency graph depicts
the relationships among inherited and synthesized attributes in a parse tree.
Example: E → E1 + T
Semantic rule: E.val = E1.val + T.val


Here the dotted lines represent the parse tree and the solid lines represent the dependency graph.
NOTE:
If a semantic rule associated with a production p defines the value of an attribute B.c (synthesized
or inherited) in terms of the value of X.a, then the dependency graph has an edge from X.a to B.c.
Example: F.val = digit.lexval
In the dependency graph we draw an edge from digit.lexval to F.val.
Draw the dependency graph for the expression 3 * 5 using the desk calculator grammar suitable for
top-down parsing.
SDD for the grammar:
Productions       Semantic Rules
T → F T'          T'.inh = F.val
                  T.val = T'.syn
T' → * F T1'      T1'.inh = T'.inh × F.val
                  T'.syn = T1'.syn
T' → ɛ            T'.syn = T'.inh
F → digit         F.val = digit.lexval
Dependency graph for the expression 3 * 5:

Explanation:
Here the dotted lines represent the parse tree edges and solid lines represent the edges of the
dependency graph.


The nodes of the dependency graph, numbered 1 through 9, correspond to the attributes
in the annotated parse tree.
Nodes 1 and 2 represent the attribute lexval associated with the two leaves labeled digit.
Nodes 3 and 4 represent the attribute val associated with the two nodes labeled F.
The edges to node 3 from 1 and to node 4 from 2 result from the semantic rule F.val = digit.lexval;
an edge in the dependency graph represents dependence, not equality.
Nodes 5 and 6 represent the inherited attribute T'.inh associated with each of the occurrences of the
non-terminal T'.
The edge to 5 from 3 is due to the rule T'.inh = F.val.
The edges to 6 from 5 (for T'.inh) and from 4 (for F.val) are due to the rule T1'.inh = T'.inh × F.val.
Nodes 7 and 8 represent the synthesized attribute syn associated with the occurrences of T'.
The edge to 7 from 6 is due to the semantic rule T'.syn = T'.inh, applied at the lower T' (production T' → ɛ).
The edge to node 8 from 7 is due to the semantic rule T'.syn = T1'.syn.
The edge to node 9 from 8 is due to the semantic rule T.val = T'.syn.
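Any topological sort of the dependency graph is a valid evaluation order. The sketch below encodes the nine-node graph for 3 * 5 described above (edges 1→3, 2→4, 3→5, 4→6, 5→6, 6→7, 7→8, 8→9). It uses Python's standard `graphlib` module (Python 3.9+) to produce one such order; the attribute labels are only for printing. If the semantic rules ever created a cycle, `graphlib` would raise `CycleError`, which signals that no evaluation order exists.

```python
from graphlib import TopologicalSorter

# Attribute instance for each dependency-graph node (numbering as in the text).
labels = {
    1: "digit.lexval (3)", 2: "digit.lexval (5)",
    3: "F.val (3)",        4: "F.val (5)",
    5: "T'.inh",           6: "T1'.inh",
    7: "T1'.syn",          8: "T'.syn",
    9: "T.val",
}

# node -> set of nodes it depends on (edge X -> Y means X is needed to compute Y)
depends_on = {
    3: {1},      # F.val = digit.lexval
    4: {2},      # F.val = digit.lexval
    5: {3},      # T'.inh = F.val
    6: {5, 4},   # T1'.inh = T'.inh * F.val
    7: {6},      # T1'.syn = T1'.inh   (T1' -> ɛ)
    8: {7},      # T'.syn = T1'.syn
    9: {8},      # T.val = T'.syn
}

order = list(TopologicalSorter(depends_on).static_order())
for n in order:
    print(n, labels[n])
```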
Obtain the syntax-directed definition for simple type declarations:
D → T L
T → int
T → float
L → L1 , id
L → id
Also obtain the dependency graph for the declaration float id1, id2, id3
Solution:
Syntax-Directed Definition (SDD) for type declarations:
Productions       Semantic rules
D → T L           L.inh = T.type
T → int           T.type = integer
T → float         T.type = float
L → L1 , id       L1.inh = L.inh
                  addType(id.entry, L.inh)
L → id            addType(id.entry, L.inh)


Dependency graph for float id1, id2, id3:

Nodes 1, 2 and 3 represent the attribute entry associated with each of the leaves labeled id.
Node 4 represents the attribute T.type, and is where attribute evaluation actually begins. This type is
then passed to nodes 5, 7 and 9, representing L.inh associated with each of the occurrences of the
non-terminal L.
Nodes 6, 8 and 10 are dummy attributes that represent the application of the function addType
to a type and one of these entry values.
Give the SDD for the simple desk calculator and draw the dependency graph for the expression
1 * 2 * 3 * (4 + 5)n
Syntax-directed definition:

PRODUCTION        SEMANTIC RULES
L → En            L.val = E.val
E → E1 + T        E.val = E1.val + T.val
E → T             E.val = T.val
T → T1 * F        T.val = T1.val * F.val
T → F             T.val = F.val
F → (E)           F.val = E.val
F → digit         F.val = digit.lexval


Classes of Syntax-Directed Definitions


1. S-Attributed Definitions
2. L-Attributed Definitions
Describe S-attributed and L-attributed definitions.
S-attributed definitions
A syntax-directed definition is said to be S-attributed if every attribute in the SDD is a synthesized
attribute.
S-attributed definitions can be implemented efficiently: their attributes can be evaluated in any
bottom-up order, such as a postorder traversal of the parse tree, so they fit naturally with
bottom-up (LR) parsing.


Example:
PRODUCTION        SEMANTIC RULES
L → En            L.val = E.val
E → E1 + T        E.val = E1.val + T.val
E → T             E.val = T.val
T → T1 * F        T.val = T1.val * F.val
T → F             T.val = F.val
F → (E)           F.val = E.val
F → digit         F.val = digit.lexval

L-attributed definitions:
A syntax-directed definition is said to be L-attributed if every attribute in the SDD is either
a synthesized attribute or an inherited attribute defined in a restricted fashion.
Specifically, an SDD is L-attributed if each inherited attribute of Xj, where 1 ≤ j ≤ n, on the
right side of a production A → X1 X2 ... Xn depends only on:
• the attributes of the symbols X1, ..., Xj-1 to the left of Xj in the production, and
• the inherited attributes of A.
Every S-attributed definition is L-attributed, since the restrictions apply only to inherited
attributes (not to synthesized attributes).
Example:
Productions       Semantic Rules
T → F T'          T'.inh = F.val
                  T.val = T'.syn
T' → * F T1'      T1'.inh = T'.inh × F.val
                  T'.syn = T1'.syn

The first rule defines the inherited attribute T'.inh = F.val, and F appears to the left of T' in the
production body.
The second rule defines the inherited attribute T1'.inh = T'.inh × F.val, which depends on the
inherited attribute of the head T' and on F.val, where F appears to the left of T1' in the production body.


Semantic Rules with Controlled Side Effects


• Permit incidental side effects that do not constrain attribute evaluation.
• Constrain the allowable evaluation orders, so that the same translation is produced for any
allowable order.
Example: For the production L → En, the semantic rule is print(E.val).
Semantic rules that are executed for their side effects, such as print(E.val), are treated as the
definitions of dummy synthesized attributes associated with the head of the production. The
modified SDD produces the same translation, since the print statement is executed at the end, after
the result has been computed into E.val.
What is an attribute grammar?
An SDD without side effects is called an attribute grammar.
The rules in an attribute grammar define the value of an attribute purely in terms of the values of other
attributes and constants.
Intermediate representation:
An intermediate representation (IR) is a representation of the source program that lies partway
between the source and target languages.
A good IR is fairly independent of the source and target languages, which maximizes
its usefulness in a retargetable compiler.
What are the benefits of intermediate code generation?
Benefits of intermediate code generation:
• It can save a considerable amount of effort:
Eg: for m source languages and n target machines, m × n compilers can be built by writing just m front ends and n back ends.
• Retargeting of code is possible.
• It allows optimization of code.
Logical structure of a compiler's front end:
Static checking includes type checking, which ensures that operators are applied to compatible
operands. It also includes any syntactic checks that remain after parsing.
Example:
Static checking ensures that a break statement in C is enclosed within a while, for or
switch statement; otherwise an error message is issued.


A compiler may construct a sequence of intermediate representations:

High-level representations are close to the source language and low-level representations are
close to the target machine.
Three types of intermediate representation:
1. Syntax Trees
2. Postfix notation
3. Three Address Code
Syntax Tree:
A syntax tree is a condensed form of the parse tree. Nodes in a syntax tree represent
constructs in the source program, and the children of a node represent the meaningful components of a
construct.
Postfix Notation:
Each operator follows its operands.
Example: for (a - b) * (c + d) + (a - b), the postfix representation is ab-cd+*ab-+
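Postfix notation is a postorder traversal of the syntax tree: emit the left operand, then the right operand, then the operator. This is a small Python sketch. It assumes a binary syntax tree encoded as nested (operator, left, right) tuples with identifier strings as leaves; the function name `to_postfix` is hypothetical.

```python
def to_postfix(node):
    """Postorder walk of a binary syntax tree: operands first, then the operator."""
    if isinstance(node, str):          # leaf: an identifier
        return node
    op, left, right = node
    return to_postfix(left) + to_postfix(right) + op

# (a - b) * (c + d) + (a - b)
expr = ("+", ("*", ("-", "a", "b"), ("+", "c", "d")), ("-", "a", "b"))
print(to_postfix(expr))   # ab-cd+*ab-+
```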
Three Address Code:
• It is a sequence of statements of the form x = y op z.
• It has at most one operator on the right side of an instruction.
• No built-up arithmetic expressions are permitted.
• It is a linearized representation of a syntax tree or a DAG
Variants of syntax trees:
• Nodes of syntax tree represent constructs in the source program; the children of a node
represent the meaningful components of a construct.


• A directed acyclic graph (DAG) for an expression identifies the common subexpressions of
the expression (subexpressions that appear more than once).
• A DAG has leaves corresponding to atomic operands and interior nodes corresponding
to operators. A node N in a DAG has more than one parent if N represents a common
subexpression; in a syntax tree, the tree for the common subexpression would be replicated.
DIRECTED ACYCLIC GRAPH (DAG):
What is a DAG? How does it differ from a syntax tree?
A directed acyclic graph (DAG) for an expression identifies the common subexpressions of the
expression (subexpressions that appear more than once).
A DAG gives the compiler important clues regarding the generation of efficient code to evaluate the
expressions.
Comparison between DAG and syntax tree:
• A node N in a DAG has more than one parent if N represents a common subexpression.
• In a syntax tree, the tree for a common subexpression is replicated as many times as the
subexpression appears in the original expression.
• A DAG gives the compiler important clues regarding the generation of efficient code to
evaluate the expressions.
Construction of DAG:
In syntax tree construction, the functions Node( ) and Leaf( ) are called to create a fresh interior
node and a fresh leaf node, respectively. The same functions can be used to construct a DAG, but
before creating a new node or leaf, these functions first check whether an identical node already
exists. If a previously created identical node exists, the existing node is returned; otherwise a new
node is created.
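Here is a minimal Python sketch of this check. It assumes a dictionary keyed by (label, children) as the lookup structure; the text does not prescribe one. The methods `leaf` and `node` mirror Leaf( ) and Node( ) but are otherwise hypothetical. The calls build a + a * (b - c) + (b - c) * d from left to right, and the second b - c returns the node that already exists.

```python
class DAG:
    """Nodes are small integers; identical (label, children) keys are created once."""

    def __init__(self):
        self.nodes = []      # node id -> key
        self.index = {}      # key -> node id

    def _get(self, key):
        if key not in self.index:            # no identical node yet: create one
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]               # otherwise reuse the existing node

    def leaf(self, name):                    # plays the role of Leaf(id, id.entry)
        return self._get(("id", name))

    def node(self, op, left, right):         # plays the role of Node(op, left, right)
        return self._get((op, left, right))

g = DAG()
p1 = g.leaf("a")
p2 = g.leaf("a")               # same node as p1
p3 = g.leaf("b")
p4 = g.leaf("c")
p5 = g.node("-", p3, p4)
p6 = g.node("*", p2, p5)
p7 = g.node("+", p1, p6)
p8 = g.leaf("b")               # same node as p3
p9 = g.leaf("c")               # same node as p4
p10 = g.node("-", p8, p9)      # same node as p5: b - c is shared
p11 = g.leaf("d")
p12 = g.node("*", p10, p11)
p13 = g.node("+", p7, p12)
print(p10 == p5, len(g.nodes))   # True 9
```

A syntax tree for the same expression would have 13 nodes; the DAG has 9 because a, b, c and b - c are shared.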


Develop an SDD to produce a directed acyclic graph for an expression, and draw the DAG for the
expression a + a * (b - c) + (b - c) * d. Show the steps for constructing the same.
SDD to produce a directed acyclic graph for an expression:
E → E1 + T        E.node = new Node('+', E1.node, T.node)
E → E1 - T        E.node = new Node('-', E1.node, T.node)
E → T             E.node = T.node
T → T1 * F        T.node = new Node('*', T1.node, F.node)
T → F             T.node = F.node
F → (E)           F.node = E.node
F → id            F.node = new Leaf(id, id.entry)

DAG:

Steps for constructing the DAG:


Construct DAG for the expression given below:


( ( x + y ) - ( ( x + y ) * ( x - y ) ) ) + ( ( x + y ) * ( x - y ) )
DAG:

Construct DAG for a = b * - c + b *- c

Construct DAG for 2* x + y * (2 * x - y)


VALUE NUMBER METHOD FOR CONSTRUCTING DAG:


Explain the value-number method for constructing the nodes of a DAG.
The nodes of a syntax tree or DAG are stored in an array of records, and each node is referred to by
its array index, called its value number.
Input: Label op, node l, and node r
Output: The value number of a node in the array with signature (op, l, r)
Algorithm: Search the array for a node M with label op, left child l and right child r. If there is
such a node, return the value number of M. If not, create in the array a new node N with label
op, left child l and right child r, and return its value number.
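This algorithm can be sketched in Python with a plain list as the array of records, searched linearly as described; practical compilers usually add a hash table to avoid the linear search. The sketch makes these assumptions:
- Value numbers start at 1, to match the array representations in this section.
- For a leaf, the l field holds the identifier, standing in for its symbol-table entry.
- The function name `value_number` is hypothetical.

```python
def value_number(records, op, l=None, r=None):
    """Return the value number of the node with signature (op, l, r),
    creating a new record only if no identical node exists."""
    signature = (op, l, r)
    for i, rec in enumerate(records, start=1):   # search the array
        if rec == signature:
            return i                              # existing node M
    records.append(signature)                     # new node N
    return len(records)

# a + b + (a + b), with + associating to the left
records = []
a = value_number(records, "id", "a")
b = value_number(records, "id", "b")
s1 = value_number(records, "+", a, b)     # a + b
s2 = value_number(records, "+", a, b)     # (a + b) again: same value number
root = value_number(records, "+", s1, s2)
for i, rec in enumerate(records, start=1):
    print(i, rec)
# 1 ('id', 'a', None)
# 2 ('id', 'b', None)
# 3 ('+', 1, 2)
# 4 ('+', 3, 3)
```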
Example:

Construct the DAG and identify the value number for the sub-expressions of the following
expressions, assuming + associates from left.
i. a + b + ( a + b)
DAG:


Array representation:
1   id   → entry for a
2   id   → entry for b
3   +    1   2
4   +    3   3

ii. a + b + a + b
DAG:

Array representation:
1 id → entry for a
2 id → entry for b
3 + 1 2
4 + 3 1
5 + 4 2

THREE ADDRESS CODE


What is three-address code?
It is a linearized representation of a syntax tree or DAG in which explicit names correspond to the
interior nodes of the graph.
In three-address code, at most one operator is permitted on the right side of an instruction.
Example: for the source-language expression x + y * z, the corresponding three-address code is:
t1 = y * z
t2 = x + t1


where t1 and t2 are compiler-generated temporary names.
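Generating such code amounts to a postorder walk of the syntax tree that allocates a fresh temporary for every interior node. This is a hypothetical Python helper with these assumptions:
- Expressions are encoded as nested (operator, left, right) tuples with names as leaves.
- Temporaries are named t1, t2 and so on.
- It handles only binary operators.

```python
import itertools

def gen_tac(expr):
    """Return (address of the result, list of three-address instructions)."""
    code = []
    counter = itertools.count(1)

    def walk(node):
        if isinstance(node, str):            # a name is already an address
            return node
        op, left, right = node
        l_addr = walk(left)
        r_addr = walk(right)
        temp = f"t{next(counter)}"           # fresh compiler-generated temporary
        code.append(f"{temp} = {l_addr} {op} {r_addr}")
        return temp

    return walk(expr), code

addr, code = gen_tac(("+", "x", ("*", "y", "z")))   # x + y * z
print("\n".join(code))
# t1 = y * z
# t2 = x + t1
```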


Three-address code is built from two concepts:
i. Addresses
ii. Instructions
An address can be one of the following:
Name: For convenience, we allow source-program names to appear as addresses in three-address code. In an
implementation, a source name is replaced by a pointer to its symbol-table entry, where all
information about the name is kept.
Constant: In practice, a compiler must deal with many different types of constants and variables.
Compiler-Generated Temporary Name:
It is useful, especially in optimizing compilers, to create a distinct name each time a temporary
is needed. These temporaries can be combined, if possible, when registers are allocated
to variables.
THREE-ADDRESS INSTRUCTION FORMAT:
A three-address instruction can take the following forms:
List the various three-address instruction (code) formats. Give one example for each.
Three-address instruction forms:
i. Assignment statements:
a) Assignment instructions of the form x = y op z, where op is a binary arithmetic or logical
operation and x, y and z are addresses.
Example: x = y * z
b) Assignment instructions of the form x = op y, where op is a unary operation such as unary
minus, logical negation or a conversion operation (e.g. int to float conversion).
Example: x = - y
c) Copy instructions of the form x = y, where x is assigned the value of y.
Example: x = y
d) Indexed copy instructions of the form x = y[i] and x[i] = y.
e) Address and pointer assignments of the form x = &y, x = *y and *x = y.
ii. Control flow statements:
a) An unconditional jump goto L. The three-address instruction with label L is the next to
be executed.


b) Conditional jumps of the form if x goto L and ifFalse x goto L. These instructions
execute the instruction with label L next if x is true and false, respectively. Otherwise
the following three-address instruction in sequence is executed next, as usual.
c) Conditional jumps such as if x relop y goto L, which apply a relational operator such as
<, ==, >=, etc. to x and y, and execute the instruction with label L next if x stands in relation relop to y.
If not, the three-address instruction following if x relop y goto L is executed next, in sequence.
Example: if x < y goto L1
d) Procedure calls and returns are implemented using the following instructions:
param x for parameters; call p, n and y = call p, n for procedure and function calls,
respectively, where n is the number of actual parameters; and return y, where y represents the
returned value and is optional.
Example:
param x1
param x2
...
param xn
call p, n
Three-address instruction (code) representation:
The three-address instruction formats above specify the components of each type of instruction, but
do not specify how these instructions are represented in a data structure. In a compiler, three-address
instructions can be implemented as records with fields for the operator and the operands.
Three such representations are:
1. Quadruples
2. Triples
3. Indirect triples
Explain in detail the implementation of three-address statements (code).

OR
Explain the following with an example:
i. Quadruples: A quadruple (or quad) has four fields, which we call op, arg1, arg2 and result. The
op field contains an internal code for the operator.
Example: The three-address instruction t1 = x + y can be represented in quadruple form as follows:


     op     arg1   arg2   result
0    +      x      y      t1
1    ...    ...    ...    ...

ii. Triples: A triple has only three fields, which we call op, arg1 and arg2. In triple form, the result of
an operation x op y is referred to by its position rather than by an explicit temporary name.
Example: The three-address instructions
t1 = x + y
t2 = z * t1
can be represented in triple form as follows:
     op     arg1   arg2
0    +      x      y
1    *      z      (0)
...    ...    ...    ...

Here the second three-address instruction contains the temporary name t1. In triple form, t1 is
referred to by its position, ie: (0). The parenthesized numbers represent pointers into the triple
structure itself.
iii. Indirect Triples: Indirect triples consist of a listing of pointers to triples, rather than a listing of
the triples themselves.
With triples, the result of an operation is referred to by its position, so moving an
instruction may require changing all references to that result. This problem does not occur in
the indirect-triple form.
Example:
The three-address instructions t1 = x + y and
t2 = z * t1 can be represented in indirect-triple form as follows:


STATIC SINGLE ASSIGNMENT FORM:


With an example, explain static single-assignment form.
Static single-assignment form (SSA) is an intermediate representation that facilitates certain code
optimizations. It differs from three-address code in two distinctive aspects:
i. All assignments in SSA are to variables with distinct names; hence the term static single assignment.
Example:
Let us consider three-address code of the form:

In SSA this can be represented as:

ii. SSA uses a notational convention called the φ-function to combine two definitions of the
same variable.
Example:
    if (flag)
    {
        x = -1;
    }
    else
        x = 1;
    y = x * a;
The above program has two control paths along which the variable x gets defined.
If we use a different name for x on each path, the program becomes
    if (flag)
    {
        x1 = -1;
    }
    else
        x2 = 1;
    y = x * a;
Now which variable should be used in the assignment y = x * a?
The answer comes from the second aspect of SSA: a notational convention called the
φ-function combines the two definitions of x:
    if (flag)
    {
        x1 = -1;
    }
    else
        x2 = 1;
    x3 = φ(x1, x2);
    y = x3 * a;
Here φ(x1, x2) has the value x1 if control flow passes through the true part of the condition and
the value x2 if it passes through the false part.
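The first aspect of SSA, a distinct name for every assignment, can be sketched in Python for straight-line code. With no branches, no φ-functions are needed: each definition of a variable gets a new subscript, and each use is rewritten to the latest subscript.

This sketch makes these assumptions:
- Instructions are encoded as (target, operand1, op, operand2).
- The input program and the name `to_ssa` are hypothetical.
- Placing φ-functions at control-flow join points, which needs dominance information, is not shown.

```python
def to_ssa(code):
    """Rename straight-line three-address code (x = y op z) into SSA form."""
    version = {}                         # variable -> latest subscript

    def use(name):
        # Operands defined earlier take their current subscript; inputs stay unchanged.
        return f"{name}{version[name]}" if name in version else name

    out = []
    for target, y, op, z in code:
        rhs = f"{use(y)} {op} {use(z)}"  # rename uses before the new definition
        version[target] = version.get(target, 0) + 1
        out.append(f"{target}{version[target]} = {rhs}")
    return out

code = [
    ("p", "a", "+", "b"),
    ("q", "p", "-", "c"),
    ("p", "q", "*", "d"),
    ("p", "e", "-", "p"),
    ("q", "p", "+", "q"),
]
for line in to_ssa(code):
    print(line)
# p1 = a + b
# q1 = p1 - c
# p2 = q1 * d
# p3 = e - p2
# q2 = p3 + q1
```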
Advantages and disadvantages of quadruples, triples and indirect triples:
• The benefit of quadruples over triples can be seen in an optimizing compiler, where
instructions are often moved around.
• With quadruples, if we move an instruction that computes a temporary t, then the
instructions that use t require no change.
• With triples, the result of an operation is referred to by its position, so moving an instruction
may require changing all references to that result. This problem does not occur with indirect
triples.
• With indirect triples, an optimizing compiler can move an instruction by reordering the
instruction list without affecting the triples themselves.

Consider the assignment statement


a = b * - c + b * -c
Write the sequence of
i. Three address code
ii. Its Quadruple representation
iii. Triples
iv. Indirect triples

i. Three Address code:


t1 = minus c
t2 = b * t1
t3 = minus c
t4 = b * t3
t5 = t2 + t4
a = t5

ii. Quadruple representation of the three-address code:


     op      arg1   arg2   result
0    minus   c             t1
1    *       b      t1     t2
2    minus   c             t3
3    *       b      t3     t4
4    +       t2     t4     t5
5    =       t5            a
...


iii. Triples:

     op      arg1   arg2
0    minus   c
1    *       b      (0)
2    minus   c
3    *       b      (2)
4    +       (1)    (3)
5    =       a      (4)
…    …       …      …
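The triple table can be derived mechanically from the quadruple table: the result field is dropped and each use of a temporary is replaced by the position of the triple that computes it. A minimal Python sketch, assuming quadruples are tuples (op, arg1, arg2, result) and that a copy into a program variable has op "="; the function name `quads_to_triples` is hypothetical.

```python
from typing import Dict, List, Optional, Tuple, Union

Arg = Optional[Union[str, int]]
Quad = Tuple[str, Optional[str], Optional[str], str]    # (op, arg1, arg2, result)
Triple = Tuple[str, Arg, Arg]                          # (op, arg1, arg2)


def quads_to_triples(quads: List[Quad]) -> List[Triple]:
    """Convert quadruples to triples.

    The result field disappears: a temporary is replaced, wherever it is
    used, by the position of the triple that computes it.  A copy into a
    program variable (op "=") becomes ("=", variable, value), matching
    the layout of the triple table above.
    """
    position: Dict[str, int] = {}          # temporary -> index of its triple
    triples: List[Triple] = []

    def ref(arg: Optional[str]) -> Arg:
        if arg is None:
            return None
        return position.get(arg, arg)      # temporaries become positions, names stay

    for op, arg1, arg2, result in quads:
        if op == "=":
            triples.append(("=", result, ref(arg1)))
        else:
            triples.append((op, ref(arg1), ref(arg2)))
            position[result] = len(triples) - 1
    return triples


Q = [("minus", "c", None, "t1"), ("*", "b", "t1", "t2"), ("minus", "c", None, "t3"),
     ("*", "b", "t3", "t4"), ("+", "t2", "t4", "t5"), ("=", "t5", None, "a")]
for i, t in enumerate(quads_to_triples(Q)):
    print(i, t)
```

Positions are recorded only for non-copy quadruples, so a later use of a program variable such as a keeps referring to it by name.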
iv. Indirect triples:

     instruction
25   (0)
26   (1)
27   (2)
28   (3)
29   (4)
30   (5)
…    …

     op      arg1   arg2
0    minus   c
1    *       b      (0)
2    minus   c
3    *       b      (2)
4    +       (1)    (3)
5    =       a      (4)
…    …       …      …
Obtain the DAG and three-address code for the expression
(a + b) * (c + d) - (a + b)

DAG: the subexpression a + b is represented by a single, shared node. The root - has two children: the * node, whose children are the + node for a + b and the + node for c + d, and the same + node for a + b.

Three-address code:
t1 = a + b
t2 = c + d
t3 = a + b
t4 = t1 * t2
t5 = t4 - t3
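The three-address code above follows the syntax tree, so a + b is computed twice (t1 and t3). Code generated from the DAG computes the shared node only once. A minimal Python sketch of DAG construction by value numbering (a table keyed by (op, left, right)) is shown below; the tuple encoding and the name `dag_tac` are assumptions, and the sketch does not exploit commutativity (b + a would get its own node).

```python
from typing import Dict, List, Tuple, Union

Expr = Union[str, tuple]       # leaf name or (op, left, right); encoding is an assumption


def dag_tac(expr: Expr) -> List[str]:
    """Build a DAG by value numbering and emit three-address code from it.

    Each distinct (op, left, right) signature gets one node; a repeated
    subexpression is found in the table and its existing temporary is reused.
    """
    table: Dict[Tuple[str, str, str], str] = {}   # node signature -> temporary holding it
    code: List[str] = []

    def node(e: Expr) -> str:
        if isinstance(e, str):
            return e
        op, left, right = e
        key = (op, node(left), node(right))
        if key not in table:                       # new node: compute it once
            t = f"t{len(table) + 1}"
            table[key] = t
            code.append(f"{t} = {key[1]} {op} {key[2]}")
        return table[key]

    node(expr)
    return code


# (a + b) * (c + d) - (a + b)
E = ("-", ("*", ("+", "a", "b"), ("+", "c", "d")), ("+", "a", "b"))
print("\n".join(dag_tac(E)))
```

For this expression the sketch emits t1 = a + b, t2 = c + d, t3 = t1 * t2, t4 = t3 - t1. The same idea explains why the code for a + a * (b - c) + (b - c) * d later in these notes reuses t1 for b - c.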
Translate the arithmetic expression a + -(b + c) into:
i. A syntax tree
ii. Three address code
iii. Quadruples
iv. Triples
v. Indirect-Triples

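i. Syntax tree: the root + has children a and a unary minus node; the minus node has a single child, the + node with children b and c.

ii. Three-address code: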


t1 = b + c
t2 = minus t1
t3 = a + t2


iii. Quadruples:

     op      arg1   arg2   result
0    +       b      c      t1
1    minus   t1            t2
2    +       a      t2     t3
…    …       …      …      …

iv. Triples:

     op      arg1   arg2
0    +       b      c
1    minus   (0)
2    +       a      (1)
…    …       …      …

v. Indirect triples:

     instruction
25   (0)
26   (1)
27   (2)
…    …

     op      arg1   arg2
0    +       b      c
1    minus   (0)
2    +       a      (1)
…    …       …      …

Write the three-address code for the expression:


a + a * (b - c) + (b - c) * d
Three-address code:
t1 = b - c
t2 = a * t1
t3 = a + t2
t4 = t1 * d
t5 = t3 + t4


Write the three-address code for the expression: a = b[ i ] + c [ j ]


Three-address code:
t1 = b[ i ]
t2 = c [ j ]
t3 = t1 + t2
a = t3
CODE GENERATION
The code generator phase takes the intermediate code as input and produces the target code. The output of the intermediate code generator may be given directly to the code generator, or it may first pass through the code optimizer.
ISSUES IN DESIGN OF CODE GENERATION:
**************Explain the issues in the design of a code generator.
The main issues in the design of a code generator are:
i. Intermediate representation (input to the code generator).
ii. Target code.
iii. Memory management.
iv. Instruction selection.
v. Register allocation.
vi. Evaluation order.
Intermediate representation:
The input to the code generator is the intermediate representation (IR) of the source program produced by the front end of the compiler, along with information in the symbol table. The IR may be a linear representation, such as postfix notation, three-address code or quadruples, or a graphical representation, such as a syntax tree or DAG. We assume that type checking has already been done and that the input is free of errors.
Target code:
The target code may be absolute machine code, relocatable machine code or assembly language code. Absolute code can be executed immediately because its addresses are fixed. Relocatable code requires a linker and loader to place the code at an appropriate location and to link the required library functions. If the code generator produces assembly code, an assembler is needed to convert it into machine code before execution. Relocatable code provides a great deal of flexibility, because subprograms can be compiled separately and then linked together.
Memory Management:
• During code generation, names in the symbol table have to be mapped to actual addresses of data objects, and labels in the intermediate code have to be mapped to instruction addresses.
• Mapping names in the source program to addresses of data objects is done cooperatively by the front end and the code generator.
• Local variables are allocated on the stack, in activation records, while global variables are allocated in the static area.
Instruction Selection:
The code generator must map the intermediate-representation program (three-address code) into a sequence of instructions that can be executed by the target machine. The complexity of this mapping is determined by factors such as:
a. The level of the intermediate representation.
b. The nature of the instruction-set architecture.
c. The desired quality of the generated code.
• If the IR is high level, the code generator may translate each IR statement into a sequence of machine instructions using code templates. Such statement-by-statement code generation often produces poor code that needs further optimization.
• If the IR reflects low-level details of the underlying machine, the code generator can use this information to produce more efficient code sequences.
• The instruction set should be complete, in the sense that all operations can be implemented. Often a single operation can be implemented by several different instruction sequences; the code generator should choose the most appropriate one, so that execution time or the use of other machine resources is minimized.
Example: Consider the statements
a = b + c
d = a + e
Statement-by-statement translation would produce:
LD R0, b
ADD R0, R0, c
ST a, R0
LD R0, a
ADD R0, R0, e
ST d, R0
Here the fourth instruction is redundant, since it loads a value that has just been stored, and so is the third if a is not subsequently used. Such redundant instructions should be eliminated.
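Both the naive translation and the removal of the redundant load can be sketched in a few lines of Python. This is a minimal illustration, assuming statements are encoded as (dst, src1, op, src2) tuples, a single register R0, and the LD/ADD/ST syntax of the example; `naive_codegen` and `drop_redundant_loads` are hypothetical helper names. Deleting the third instruction (ST a, R0) would additionally require liveness information about a, which the sketch does not attempt.

```python
from typing import List, Tuple

Stmt = Tuple[str, str, str, str]     # (dst, src1, op, src2) for dst = src1 op src2 (assumed encoding)
OPCODE = {"+": "ADD", "-": "SUB", "*": "MUL"}


def naive_codegen(stmts: List[Stmt]) -> List[str]:
    """Statement-by-statement translation using one register, R0."""
    out: List[str] = []
    for dst, s1, op, s2 in stmts:
        out += [f"LD R0, {s1}", f"{OPCODE[op]} R0, R0, {s2}", f"ST {dst}, R0"]
    return out


def drop_redundant_loads(code: List[str]) -> List[str]:
    """Peephole pass: delete 'LD R, x' that directly follows 'ST x, R'."""
    out: List[str] = []
    for instr in code:
        if out and instr.startswith("LD ") and out[-1].startswith("ST "):
            reg, var = instr[3:].split(", ")
            st_var, st_reg = out[-1][3:].split(", ")
            if reg == st_reg and var == st_var:
                continue                 # the value is already in the register
        out.append(instr)
    return out


naive = naive_codegen([("a", "b", "+", "c"), ("d", "a", "+", "e")])
print("\n".join(drop_redundant_loads(naive)))
```

The pass only compares adjacent instructions, which is safe here because nothing can change R0 or a between the store and the load.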
Example: a = a + 1 would be translated into:
LD R0, a
ADD R0, R0, #1
ST a, R0
If the target machine has an increment (INC) instruction, the three-address statement a = a + 1 may be implemented more efficiently by the single instruction INC a, rather than by the more obvious sequence that loads a into a register, adds 1 to the register, and then stores the result back into a.
Thus instruction cost is also an important issue in the design of a code generator. The cost of an instruction is taken as its execution cost plus the number of memory accesses it makes.
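The choice between INC and the generic load/add/store sequence is a small case of pattern-based instruction selection. A minimal Python sketch, assuming statements are (dst, src1, op, src2) tuples, constants written with a leading # as in the notes, and a hypothetical flag `has_inc` describing the target machine:

```python
from typing import List, Tuple

Stmt = Tuple[str, str, str, str]      # (dst, src1, op, src2): dst = src1 op src2 (assumed encoding)


def select(stmt: Stmt, has_inc: bool) -> List[str]:
    """Choose target instructions for one statement.

    If the machine has INC and the statement has the form x = x + #1,
    a single INC replaces the generic load/add/store template.
    """
    dst, s1, op, s2 = stmt
    if has_inc and op == "+" and s1 == dst and s2 == "#1":
        return [f"INC {dst}"]
    opcode = {"+": "ADD", "-": "SUB", "*": "MUL"}[op]
    return [f"LD R0, {s1}", f"{opcode} R0, R0, {s2}", f"ST {dst}, R0"]


print(select(("a", "a", "+", "#1"), has_inc=True))    # ['INC a']
print(select(("a", "a", "+", "#1"), has_inc=False))
```

A real selector would also match the commuted form a = 1 + a and compare instruction costs rather than hard-coding one preferred pattern.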
Register Allocation:
• Instructions whose operands are in registers execute faster, so the variables whose values are required at a point in the program should be kept in registers where possible.
• In register allocation, we select the set of variables that will reside in registers at each point in the program.
• In register assignment, we pick the specific register in which each such variable will reside.
Consider a hypothetical byte-addressable machine as the target machine. It has n general-purpose registers R0, R1, …, Rn-1. The machine instructions are two-address instructions of the form
op source, destination
Example:
MOV R0, R1 (copies the contents of R0 into R1)
ADD R1, R2 (adds the contents of R1 to R2, leaving the result in R2)
The target machine supports the following addressing modes:
a. Absolute addressing mode: the instruction contains the memory address of the operand.
Example: MOV R0, M, where M is the address of a memory location. MOV R0, M moves the contents of register R0 to memory location M.
b. Register addressing mode: the operands are in registers.
Example: ADD R0, R1
c. Immediate addressing mode: the operand value appears in the instruction itself.
Example: ADD #1, R0 adds the constant 1 to the contents of R0.
d. Indexed addressing mode: of the form C(R), where the address of the operand is C + contents(R).
Example: MOV 4(R0), M stores the value contents(4 + contents(R0)) into memory location M; that is, the source operand is located at address 4 + contents(R0).
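How each addressing mode locates its operand can be made concrete with a tiny Python model of the machine. This is a minimal sketch: the register contents, the memory dictionary, the use of a numeric address (200) in place of the symbolic M, and the helpers `operand_value` and `mov` are all hypothetical.

```python
from typing import Dict

# Hypothetical machine state for the sketch: register file and byte-addressed memory.
regs: Dict[str, int] = {"R0": 100, "R1": 7}
mem: Dict[int, int] = {104: 42, 200: 0}


def operand_value(operand: str) -> int:
    """Return the value an operand denotes under the addressing modes above."""
    if operand.startswith("#"):                 # immediate: the value is in the instruction
        return int(operand[1:])
    if operand in regs:                         # register: the value is in the register
        return regs[operand]
    if "(" in operand:                          # indexed C(R): address = C + contents(R)
        c, reg = operand[:-1].split("(")
        return mem[int(c) + regs[reg]]
    return mem[int(operand)]                    # absolute: the operand is the address M


def mov(src: str, dst_addr: int) -> None:
    """MOV src, M with M an absolute address: mem[M] = value of src."""
    mem[dst_addr] = operand_value(src)


mov("4(R0)", 200)     # MOV 4(R0), M with M = 200: reads address 4 + 100 = 104
print(mem[200])       # 42
```

Only the source side of MOV is modelled generally; the destination is restricted to an absolute address to keep the sketch short.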
Evaluation Order:
The order in which computations are performed can affect the efficiency of the target code. Some computation orders require fewer registers to hold intermediate results than others. However, picking the best order in the general case is a difficult, NP-complete problem.
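Although the general problem is NP-complete, for an expression tree with no shared subexpressions there is a classic polynomial method, Sethi-Ullman labeling with Ershov numbers, that computes how many registers the best evaluation order needs. The Python sketch below assumes every operand must be loaded into a register and uses a hypothetical tuple encoding of the tree; it illustrates that technique rather than a method from these notes.

```python
from typing import Union

Expr = Union[str, tuple]   # leaf name or (op, left, right); encoding is an assumption


def regs_needed(e: Expr) -> int:
    """Ershov number: registers needed to evaluate tree e without storing temporaries.

    A leaf needs 1 register.  For an interior node, evaluate the child that
    needs more registers first; if both children need the same number k,
    one extra register is required to hold the first result (k + 1).
    """
    if isinstance(e, str):
        return 1
    _, left, right = e
    lhs, rhs = regs_needed(left), regs_needed(right)
    return max(lhs, rhs) if lhs != rhs else lhs + 1


# (a + b) * (c + d) - (a + b), treated as a tree (the a + b node is not shared)
E = ("-", ("*", ("+", "a", "b"), ("+", "c", "d")), ("+", "a", "b"))
print(regs_needed(E))      # 3
```

Evaluating the larger subtree first is what keeps the count at max(l, r) when the children differ; evaluating the smaller one first would hold its result in a register while the larger one is computed.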
