
Compiler Design & Finite Automata

The document discusses compiler design and language processing systems. It defines a compiler as a program that translates a program in one language (the source language) into an equivalent program in another language (the target language). An interpreter directly executes the source program without translation. The document compares compilers and interpreters, and describes the different phases, categories, and passes of compilers. It also discusses finite automata, regular expressions, and bootstrapping in compiler design.

Uploaded by

Shubham Dixit


COMPILER DESIGN (KCS-502)

INTRODUCTION TO COMPILER
A compiler is a program that reads a program written in one language (the SOURCE language) and translates it into an equivalent program in another language (the TARGET language). During translation it also reports any errors it detects in the source program.

Source language program → COMPILER → Target language program
(error messages are reported to the user)
INTRODUCTION TO INTERPRETER
An interpreter is another common kind of language processor. It does not produce a target program. Instead, it reads the source program and directly executes the operations it specifies, typically line by line (statement by statement), on the inputs supplied by the user.

Source program + input → INTERPRETER → output
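The following minimal Python sketch makes the contrast concrete. It assumes a toy source language of arithmetic expressions written as nested tuples such as ("-", "a", ("*", "b", "c")). The stack-machine instructions PUSH and APPLY are hypothetical and invented only for this illustration. The interpreter computes a value directly from the source. The compiler first produces a target program, which is then run separately on the input.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def interpret(node, env):
    """Interpreter: walk the source tree and execute it immediately."""
    if isinstance(node, str):          # a variable: look up its input value
        return env[node]
    op, left, right = node
    return OPS[op](interpret(left, env), interpret(right, env))

def compile_expr(node):
    """Compiler: translate the tree once into a target program for a
    hypothetical stack machine, without running it."""
    if isinstance(node, str):
        return [("PUSH", node)]
    op, left, right = node
    return compile_expr(left) + compile_expr(right) + [("APPLY", op)]

def run(program, env):
    """Execute a compiled target program on a given input."""
    stack = []
    for instr, arg in program:
        if instr == "PUSH":
            stack.append(env[arg])
        else:                          # APPLY: pop two operands, push result
            right, left = stack.pop(), stack.pop()
            stack.append(OPS[arg](left, right))
    return stack.pop()

source = ("-", "a", ("*", "b", "c"))
inputs = {"a": 10, "b": 2, "c": 3}
print(interpret(source, inputs))       # 4
target = compile_expr(source)          # translated once, reusable for other inputs
print(run(target, inputs))             # 4
```

The compiled program can be run again on different inputs without repeating the translation. The interpreter repeats its analysis of the source on every run.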
LANGUAGE PROCESSING SYSTEM

Source program
   ↓ PREPROCESSOR
Modified source program
   ↓ COMPILER
Target assembly language program
   ↓ ASSEMBLER
Relocatable machine code
   ↓ LINKER/LOADER (also combines library files and relocatable object files)
Target machine code
DIFFERENCE BETWEEN COMPILER AND INTERPRETER
PHASES OF COMPILER
A compiler works in two parts:
• Analysis of the source program
• Synthesis of the object (target) program
CATEGORIES OF COMPILERS
1. Native Compiler :
Native compiler are compilers that generates code for the
same Platform on which it runs. It converts high language into
computer’s native language. For example Turbo C or GCC
compiler

2. Cross compiler :
A Cross compiler is a compiler that generates executable code
for a platform other than one on which the compiler is
running. For example a compiler that running on Linux/x86
box is building a program which will run on a separate
Arduino/ARM.
Difference between Native and Cross Compiler

NATIVE COMPILER | CROSS COMPILER
Translates programs for the same hardware/platform/machine on which it is running. | Translates programs for a hardware/platform/machine other than the one on which it is running.
Dependent on the host system/machine and OS. | Its output is independent of the host system/machine and OS.
Can generate executable files such as .exe. | Can generate raw code such as .hex files.
Turbo C and GCC are native compilers. | Keil is a cross compiler.
PASSES
A pass is one complete scan of the source program, or of an intermediate form of it, during which one or more phases of the compiler are carried out. Compilers fall into two categories:
• Single-pass compiler (e.g., Pascal)
• Two-pass/multi-pass compiler (e.g., Java)
• Pass 1 is also known as the front end or analysis part, and is platform independent.
• Pass 2 is also known as the back end or synthesis part, and is platform dependent.
SINGLE-PASS COMPILER
In a single-pass (one-pass) compiler, all phases do their work in a single pass over the source program.
Advantage:
• It takes less time to compile.
Disadvantages:
• It works strictly in sequence and cannot go back to handle errors.
• It occupies more memory.
TWO/MULTI-PASS COMPILER
A two-pass/multi-pass compiler processes the source code, or the abstract syntax tree of a program, multiple times. In a two-pass compiler, the phases are divided between pass 1 (front end) and pass 2 (back end).
Advantages:
• It occupies less memory.
• Errors can be detected and removed in each pass.

Disadvantage:
• It takes more time to convert source code into target code.
TWO/MULTI-PASS COMPILER
Splitting the compiler into a front end and a back end helps solve two main problems:
1. Designing compilers for different programming languages for the same machine: each language needs its own front end, but all of them can share one back end.
2. Designing compilers for the same programming language for different machines: one front end is shared, and each machine gets its own back end.
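Here is a minimal Python sketch of this split. All names are hypothetical: the three-address intermediate form, `front_end`, and the two target "machines". One front end produces machine-independent intermediate code. Each back end translates that same code for a different machine. Supporting a new machine therefore means writing only a new back end, and supporting a new language means writing only a new front end.

```python
from itertools import count

OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def front_end(expr):
    """Lower a nested-tuple expression tree such as ("+", "a", ("*", "b", "c"))
    into three-address code: a list of (op, dest, left, right) tuples."""
    code = []
    temps = count(1)

    def lower(node):
        if isinstance(node, str):          # a variable is already an operand
            return node
        op, left, right = node
        l, r = lower(left), lower(right)   # lower operands first (post-order)
        dest = f"t{next(temps)}"
        code.append((op, dest, l, r))
        return dest

    lower(expr)
    return code

def back_end_three_operand(ir):
    """Hypothetical target A: three-operand instructions."""
    return [f"{OPCODES[op]} {d}, {l}, {r}" for op, d, l, r in ir]

def back_end_accumulator(ir):
    """Hypothetical target B: a single-accumulator machine."""
    out = []
    for op, d, l, r in ir:
        out += [f"LOAD {l}", f"{OPCODES[op]} {r}", f"STORE {d}"]
    return out

ir = front_end(("+", "a", ("*", "b", "c")))   # shared by both back ends
print(back_end_three_operand(ir))
print(back_end_accumulator(ir))
```

Both back ends consume the same `ir` list. The front end never needs to know which machine is targeted.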
DIFFERENCE BETWEEN SINGLE-PASS AND MULTI-PASS COMPILERS
PARAMETER | SINGLE PASS | MULTI-PASS
Speed | Fast | Slow
Memory | More | Less
Time | Less | More
Portability | No | Yes
BOOTSTRAPPING
Bootstrapping is widely used in compiler development.
• It is used to produce a self-hosting compiler.
• A self-hosting compiler is a compiler that can compile its own source code.
• The compiler is compiled once, and the resulting compiler is then used to compile everything else, including future versions of itself.
For bootstrapping purposes, a compiler is characterized by three languages:
• the source language S that it compiles,
• the target language T for which it generates code,
• the implementation language I in which it is written.
Notation: $^{S}C^{T}_{I}$ represents a compiler for source S and target T, implemented in I. A T-diagram is also used to depict the same compiler.
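The T-diagram composition rule can be sketched in Python. This is a hypothetical model, not a real tool. A compiler is represented as a record (source, target, impl). Running compiler X through compiler Y is only possible when Y reads X's implementation language. The result has X's source and target but is now written in Y's target language. The language names "L", "C" and "M" (a machine language) are made up for the example.

```python
from typing import NamedTuple

class Compiler(NamedTuple):
    source: str  # S: language the compiler accepts
    target: str  # T: language it generates
    impl: str    # I: language the compiler itself is written in

def translate(program: Compiler, by: Compiler) -> Compiler:
    """Compile the compiler `program` (itself a program written in
    program.impl) with the compiler `by`. The result still translates
    S -> T, but it is now written in by.target."""
    if program.impl != by.source:
        raise ValueError(f"{by} cannot compile code written in {program.impl}")
    return Compiler(program.source, program.target, by.target)

# Hypothetical scenario: a new language L for machine M.
c_compiler = Compiler("C", "M", "M")      # existing C compiler that runs on M
first_l = Compiler("L", "M", "C")         # step 1: an L compiler written in C
stage1 = translate(first_l, c_compiler)   # L -> M, runs on M
self_hosted = Compiler("L", "M", "L")     # step 2: the L compiler rewritten in L
stage2 = translate(self_hosted, stage1)   # built by an L compiler: self-hosting
stage3 = translate(self_hosted, stage2)   # it can now recompile its own source
print(stage1, stage2, stage3, sep="\n")
```

After step 2 the C compiler is no longer needed. Each new version of the L compiler is built by the previous one.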
BOOTSTRAPPING
FINITE AUTOMATON
An automaton with a finite number of states is called a Finite Automaton (FA) or Finite State Machine (FSM).
An automaton can be represented by a 5-tuple (Q, ∑, δ, q0, F), where:
Q is a finite set of states.
∑ is a finite set of symbols, called the alphabet of the automaton.
δ is the transition function.
q0 is the initial state from which any input is processed (q0 ∈ Q).
F is the set of final (accepting) states (F ⊆ Q).
Related Terminologies

• Alphabet
Definition − An alphabet is any finite set of symbols.
Example − ∑ = {a, b, c, d} is an alphabet set where ‘a’, ‘b’, ‘c’, and ‘d’ are symbols.

• String
Definition − A string is a finite sequence of symbols taken from ∑.
Example − ‘cabcad’ is a valid string on the alphabet set ∑ = {a, b, c, d}

• Length of a String
Definition − It is the number of symbols present in a string. (Denoted by |S|).
Examples −
If S = ‘cabcad’, |S|= 6
If |S|= 0, it is called an empty string (Denoted by λ or ε)

• Kleene Star
Definition − The Kleene star, ∑*, is a unary operator on a set of symbols or strings, ∑, that gives the infinite set of all possible strings of all possible lengths over ∑, including λ.
Representation − $\Sigma^* = \Sigma^0 \cup \Sigma^1 \cup \Sigma^2 \cup \cdots$, where $\Sigma^p$ is the set of all possible strings of length p.
Example − If ∑ = {a, b}, ∑* = {λ, a, b, aa, ab, ba, bb, …}

• Kleene Plus (Positive Closure)

Definition − The set ∑+ is the infinite set of all possible strings of all possible lengths over ∑, excluding λ.
Representation − $\Sigma^+ = \Sigma^1 \cup \Sigma^2 \cup \Sigma^3 \cup \cdots = \Sigma^* - \{\lambda\}$
Example − If ∑ = {a, b}, ∑+ = {a, b, aa, ab, ba, bb, …}

• Language
Definition − A language is a subset of ∑* for some alphabet ∑. It can be finite or infinite.
Example − If the language takes all possible strings of length 2 over ∑ = {a, b}, then L = {aa, ab, ba, bb}.
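∑* and ∑+ are infinite, so a program can only list them up to some length. This Python sketch (the function names are illustrative) builds $\Sigma^p$ with `itertools.product` and takes the union of the powers up to a chosen bound. The language of all length-2 strings over {a, b} is simply $\Sigma^2$.

```python
from itertools import product

def sigma_power(sigma, p):
    """All strings of length exactly p over the alphabet sigma."""
    return {"".join(t) for t in product(sorted(sigma), repeat=p)}

def kleene_star(sigma, max_len):
    """Finite part of the Kleene star: the union of powers 0..max_len."""
    result = set()
    for p in range(max_len + 1):
        result |= sigma_power(sigma, p)   # power 0 contributes the empty string
    return result

def kleene_plus(sigma, max_len):
    """Finite part of the positive closure: the star minus the empty string."""
    return kleene_star(sigma, max_len) - {""}

sigma = {"a", "b"}
print(sorted(kleene_star(sigma, 2), key=lambda s: (len(s), s)))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
print(sorted(sigma_power(sigma, 2)))     # the language L of the example
# ['aa', 'ab', 'ba', 'bb']
```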
Finite automata can be classified into two types:
• Deterministic Finite Automaton (DFA)
• Non-deterministic Finite Automaton (NDFA / NFA)

Deterministic Finite Automaton (DFA)

In a DFA, for each state and each input symbol, the next state is uniquely determined. Hence it is called a deterministic automaton. As it has a finite number of states, the machine is called a Deterministic Finite Machine or Deterministic Finite Automaton.
Formal Definition of a DFA
A DFA can be represented by a 5-tuple (Q, ∑, δ, q0, F) where −
Q is a finite set of states.
∑ is a finite set of symbols called the alphabet.
δ is the transition function where δ: Q × ∑ → Q
q0 is the initial state from where any input is processed (q0 ∈ Q).
F is a set of final state/states of Q (F ⊆ Q).
Example
Let a deterministic finite automaton be:
Q = {a, b, c},
∑ = {0, 1},
q0 = a,
F = {c}, and
transition function δ as shown by the following table:
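The slide's transition table is not reproduced in this copy. The Python sketch below therefore uses an assumed table purely for illustration, and it is not claimed to be the slide's table. Simulating a DFA needs only one current state, because δ gives exactly one next state for each (state, symbol) pair.

```python
# ASSUMED transition table for Q = {a, b, c}, ∑ = {0, 1}, q0 = a, F = {c}.
DFA_DELTA = {
    ("a", "0"): "a", ("a", "1"): "b",
    ("b", "0"): "c", ("b", "1"): "a",
    ("c", "0"): "b", ("c", "1"): "c",
}

def dfa_accepts(delta, start, finals, word):
    """Run the DFA on `word` and report whether it ends in a final state."""
    state = start
    for symbol in word:
        if (state, symbol) not in delta:   # symbol outside ∑: reject
            return False
        state = delta[(state, symbol)]     # exactly one next state
    return state in finals

for w in ["", "1", "10", "101", "100"]:
    print(repr(w), dfa_accepts(DFA_DELTA, "a", {"c"}, w))
```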
Non-Deterministic Finite Automaton (NDFA/NFA)
In an NDFA, for a particular state and input symbol, the machine can move to any combination of the states in the machine. In other words, the exact state to which the machine moves cannot be determined. Hence it is called a non-deterministic automaton. As it has a finite number of states, the machine is called a Non-deterministic Finite Machine or Non-deterministic Finite Automaton.

Formal Definition of an NDFA

An NDFA can be represented by a 5-tuple (Q, ∑, δ, q0, F) where:

Q is a finite set of states.
∑ is a finite set of symbols called the alphabet.
δ is the transition function, where $\delta: Q \times \Sigma \to 2^Q$.
(The power set of Q, $2^Q$, is used because in an NDFA a transition from a state can lead to any combination of the states of Q.)
q0 is the initial state from which any input is processed (q0 ∈ Q).
F is the set of final states (F ⊆ Q).
Example
Let a non-deterministic finite automaton be:
Q = {a, b, c}
∑ = {0, 1}
q0 = a
F = {c}
The transition function δ is as shown below:
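As with the DFA example, the slide's table is missing, so this sketch uses an assumed transition table that is not claimed to be the slide's. Because δ returns a set of states, the simulation tracks the set of all states the NFA could currently be in. It accepts if any of them is final.

```python
# ASSUMED transition table: each entry is a set of next states.
NFA_DELTA = {
    ("a", "0"): {"a", "b"}, ("a", "1"): {"b"},
    ("b", "0"): {"c"},      ("b", "1"): {"a", "c"},
    ("c", "0"): {"b", "c"}, ("c", "1"): {"c"},
}

def nfa_accepts(delta, start, finals, word):
    """Track every state the NFA could be in; accept if any is final."""
    current = {start}
    for symbol in word:
        # union of δ(q, symbol) over all current states (missing entry = no move)
        current = set().union(*(delta.get((q, symbol), set()) for q in current))
    return bool(current & finals)

for w in ["", "0", "00", "1", "11"]:
    print(repr(w), nfa_accepts(NFA_DELTA, "a", {"c"}, w))
```

Tracking the set of reachable states is also the idea behind the subset construction that converts an NFA into an equivalent DFA.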
REGULAR EXPRESSION
A regular expression is an important notation for specifying patterns. Each pattern matches a set of strings, so regular expressions serve as names for sets of strings. Programming-language tokens can be described by regular languages.

A regular expression can be defined recursively as follows:

• ε is a regular expression denoting the language that contains only the empty string: $L(\varepsilon) = \{\varepsilon\}$.
• φ is a regular expression denoting the empty language: $L(\varphi) = \{\}$.
• For each symbol x, x is a regular expression denoting $L(x) = \{x\}$.

If X is a regular expression denoting the language L(X) and Y is a regular expression denoting the language L(Y), then:
• X + Y is a regular expression denoting the union: $L(X + Y) = L(X) \cup L(Y)$.
• X . Y is a regular expression denoting the concatenation: $L(X \cdot Y) = L(X)\,L(Y)$.
• R* is a regular expression denoting the Kleene star: $L(R^*) = (L(R))^*$.
RE TO NFA CONVERSION
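The conversion diagrams from the slides are not reproduced here. As a hedged sketch, the standard Thompson construction builds an NFA with ε-moves by following the recursive definition of regular expressions case by case. The tuple encoding of regular expressions and all function names are assumptions made for this illustration.

```python
from itertools import count

# A regular expression is a tree mirroring the recursive definition:
#   ("eps",) for ε, ("empty",) for φ, ("sym", "x") for a symbol x,
#   ("+", X, Y) for union, (".", X, Y) for concatenation, ("*", R) for star.
EPS = None  # label used for ε-moves

def thompson(regex):
    """Build an ε-NFA as (start, accept, transitions), where transitions
    maps each state to a list of (label, next_state) pairs."""
    ids = count()
    trans = {}

    def new_state():
        s = next(ids)
        trans[s] = []
        return s

    def build(node):
        kind = node[0]
        s, f = new_state(), new_state()
        if kind == "eps":
            trans[s].append((EPS, f))
        elif kind == "empty":
            pass                                # no path from s to f
        elif kind == "sym":
            trans[s].append((node[1], f))
        elif kind in ("+", "."):
            s1, f1 = build(node[1])
            s2, f2 = build(node[2])
            if kind == "+":                     # branch into either operand
                trans[s] += [(EPS, s1), (EPS, s2)]
                trans[f1].append((EPS, f))
                trans[f2].append((EPS, f))
            else:                               # run X, then Y
                trans[s].append((EPS, s1))
                trans[f1].append((EPS, s2))
                trans[f2].append((EPS, f))
        elif kind == "*":
            s1, f1 = build(node[1])
            trans[s] += [(EPS, s1), (EPS, f)]   # enter R, or skip it entirely
            trans[f1] += [(EPS, s1), (EPS, f)]  # repeat R, or leave
        else:
            raise ValueError(f"unknown node {node!r}")
        return s, f

    start, accept = build(regex)
    return start, accept, trans

def eps_closure(states, trans):
    """All states reachable from `states` using only ε-moves."""
    stack, seen = list(states), set(states)
    while stack:
        for label, nxt in trans[stack.pop()]:
            if label is EPS and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def nfa_matches(nfa, word):
    start, accept, trans = nfa
    current = eps_closure({start}, trans)
    for ch in word:
        moved = {nxt for s in current for label, nxt in trans[s] if label == ch}
        current = eps_closure(moved, trans)
    return accept in current

# (a + b)* . a : strings over {a, b} that end in a
nfa = thompson((".", ("*", ("+", ("sym", "a"), ("sym", "b"))), ("sym", "a")))
print([w for w in ["a", "ab", "abba", ""] if nfa_matches(nfa, w)])
```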
GRAMMAR
Grammars are used to describe the syntax of a programming language. They specify the structure of expressions and statements.
Definition
A context-free grammar G is defined by a 4-tuple
G = (V, T, P, S)
where
G - Grammar
V - Set of variables (non-terminals)
T - Set of terminals
P - Set of productions
S - Start symbol (S ∈ V)
By convention, terminals are written in lowercase letters and variables in uppercase (capital) letters.
GRAMMAR
According to the Chomsky hierarchy, grammars are divided into 4 types:
TYPES OF GRAMMAR
• Type 0: Unrestricted Grammar
Type-0 grammars include all formal grammars. The languages they generate are recognized by Turing machines and are known as recursively enumerable languages.
• Productions are of the form α → β, where α and β are strings of variables and terminals and α contains at least one variable.
• Type 1: Context-Sensitive Grammar
Type-1 grammars generate the context-sensitive languages. The languages generated by these grammars are recognized by linear bounded automata (LBA).
• In Type 1:
I. The grammar must first be a Type 0 grammar.
II. Productions are of the form α → β with |α| ≤ |β|, i.e., the right-hand side is never shorter than the left-hand side.
• Type 2: Context-Free Grammar

Type-2 grammars generate the context-free languages. The languages generated by these grammars are recognized by pushdown automata.

In Type 2:

1. The grammar must first be Type 1.

2. The left-hand side of each production is a single variable (A → α).
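• Type 3: Regular Grammar
Type-3 grammars generate the regular languages. These are exactly the languages that can be accepted by a finite state automaton.
• Type 3 is the most restricted form of grammar.

• Type 3 productions must have one of the following forms only:
A → aB or A → a (right-linear), or A → Ba or A → a (left-linear), where A and B are variables and a is a terminal. A single grammar uses one of these two forms consistently.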
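The hierarchy can be checked mechanically. The following Python sketch classifies a grammar under several stated assumptions. Every symbol is a single character. Uppercase means variable, following the convention on the GRAMMAR slide. "" stands for ε. Only the right-linear Type 3 form is recognized, so a left-linear grammar would be reported as Type 2. Following the slides' rule that each type must also satisfy the previous one, an ε-production is treated as contracting (Type 0). The function names are illustrative.

```python
def is_variable(symbol):
    return symbol.isupper()          # convention: variables are capitals

def production_type(lhs, rhs):
    """Most restrictive Chomsky type (3, 2, 1 or 0) of one production lhs -> rhs."""
    if not any(is_variable(s) for s in lhs):
        raise ValueError(f"{lhs!r} -> {rhs!r}: left side needs a variable")
    if len(lhs) > len(rhs):
        return 0                     # contracting: violates |α| ≤ |β|
    if len(lhs) > 1:
        return 1                     # several symbols on the left
    right_linear = (len(rhs) == 1 and not is_variable(rhs)) or (
        len(rhs) == 2 and not is_variable(rhs[0]) and is_variable(rhs[1]))
    return 3 if right_linear else 2  # single variable on the left

def grammar_type(productions):
    """A grammar has the type of its least restrictive production."""
    return min(production_type(lhs, rhs) for lhs, rhs in productions)

print(grammar_type([("S", "aS"), ("S", "b")]))        # 3
print(grammar_type([("S", "aSb"), ("S", "ab")]))      # 2
print(grammar_type([("S", "aSBc"), ("CB", "BC")]))    # 1
print(grammar_type([("AB", "a")]))                    # 0
```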
DERIVATION
A derivation is a sequence of production-rule applications that transforms the start symbol into the input string. During parsing, we make two decisions for each sentential form of the input:
• deciding which non-terminal is to be replaced;
• deciding the production rule by which that non-terminal will be replaced.
To decide which non-terminal to replace, there are two options.
• Left-most Derivation
If the sentential form is scanned from left to right and the left-most non-terminal is always the one replaced, the derivation is called a left-most derivation. A sentential form derived by a left-most derivation is called a left-sentential form.
• Right-most Derivation
If the sentential form is scanned from right to left and the right-most non-terminal is always the one replaced, the derivation is called a right-most derivation. A sentential form derived by a right-most derivation is called a right-sentential form.
DERIVATION Example
Production rules:
E → E + E
E → E - E
E → id
Input string: id - id + id

The left-most derivation is:
E ⇒ E + E
  ⇒ E - E + E
  ⇒ id - E + E
  ⇒ id - id + E
  ⇒ id - id + id

The right-most derivation is:
E ⇒ E - E
  ⇒ E - E + E
  ⇒ E - E + id
  ⇒ E - id + id
  ⇒ id - id + id
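The two derivations can be replayed mechanically. In this hedged Python sketch (function and variable names are illustrative), each step names a production's right-hand side. The function applies that right-hand side to the left-most or right-most non-terminal of the current sentential form.

```python
def derive(start, steps, nonterminals, leftmost=True):
    """Replay a derivation. Each step is the right-hand side (a list of
    symbols) that replaces the left-most (or right-most) non-terminal.
    Returns the sequence of sentential forms as strings."""
    form = [start]
    forms = [" ".join(form)]
    for rhs in steps:
        positions = [i for i, sym in enumerate(form) if sym in nonterminals]
        if not positions:
            raise ValueError("no non-terminal left to replace")
        i = positions[0] if leftmost else positions[-1]
        form = form[:i] + rhs + form[i + 1:]
        forms.append(" ".join(form))
    return forms

PLUS, MINUS, ID = ["E", "+", "E"], ["E", "-", "E"], ["id"]

for line in derive("E", [PLUS, MINUS, ID, ID, ID], {"E"}, leftmost=True):
    print(line)
print()
for line in derive("E", [MINUS, PLUS, ID, ID, ID], {"E"}, leftmost=False):
    print(line)
```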
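AMBIGUITY

A grammar G is said to be ambiguous if some string in its language has more than one parse tree (equivalently, more than one left-most derivation, or more than one right-most derivation).

Example
Production rules:
E → E + E
E → E - E
E → id
Input string: id - id + id

Left-most derivation:  E ⇒ E + E ⇒ E - E + E ⇒ id - E + E ⇒ id - id + E ⇒ id - id + id
Right-most derivation: E ⇒ E - E ⇒ E - E + E ⇒ E - E + id ⇒ E - id + id ⇒ id - id + id

The first derivation builds a parse tree that groups the input as (id - id) + id. The second groups it as id - (id + id). Because the same string has two different parse trees, the grammar is ambiguous.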
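Ambiguity can be checked for a particular string by counting its parse trees. For this specific grammar (E → E + E | E - E | id), each parse tree of a token sequence picks one operator as the root and splits the input there. The count is therefore a sum of products over the possible split points. This hedged sketch works for this one grammar only and is not a general ambiguity test; ambiguity of arbitrary context-free grammars is undecidable.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Number of parse trees for `tokens` under E -> E + E | E - E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):                      # trees deriving tokens[i:j] from E
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        for k in range(i + 1, j - 1):     # operator position for E -> E op E
            if tokens[k] in ("+", "-"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees("id - id + id".split()))   # 2, so the grammar is ambiguous
```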
LEX and YACC
LEX generates C code for a lexical analyzer, or scanner.
It uses patterns that match strings in the input and converts those strings to tokens.
Tokens are numerical representations of strings, and they simplify processing.

YACC (Yet Another Compiler Compiler) generates C code for a syntax analyzer, or parser.
YACC uses grammar rules to analyze the tokens produced by Lex and create a syntax tree.
A syntax tree imposes a hierarchical structure on the tokens.
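Lex itself emits C. The sketch below is a hand-written Python stand-in, not Lex output. It shows the matching rule that Lex-generated scanners follow: at each input position take the longest match, and on a tie prefer the rule listed first. The token codes, pattern list and `scan` function are hypothetical.

```python
import re

# Hypothetical numeric token codes (a yacc-based program would share these).
NUMBER, IDENT, IF, RELOP, ASSIGN, PLUS = range(256, 262)

# Rules in priority order, like the rules section of a Lex specification.
RULES = [
    (r"[ \t\n]+", None),                # whitespace: matched but no token
    (r"if", IF),
    (r"[A-Za-z][A-Za-z0-9]*", IDENT),
    (r"[0-9]+", NUMBER),
    (r"==|<=|>=|<|>", RELOP),
    (r"=", ASSIGN),
    (r"\+", PLUS),
]
COMPILED = [(re.compile(p), tok) for p, tok in RULES]

def scan(text):
    """Return a list of (token_code, lexeme) pairs."""
    pos, tokens = 0, []
    while pos < len(text):
        best_len, best_tok = 0, None
        for regex, tok in COMPILED:
            m = regex.match(text, pos)
            # longest match wins; on a tie the earlier rule is kept
            if m and len(m.group()) > best_len:
                best_len, best_tok = len(m.group()), tok
        if best_len == 0:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if best_tok is not None:
            tokens.append((best_tok, text[pos:pos + best_len]))
        pos += best_len
    return tokens

print(scan("if x1 == 10"))   # keyword "if" wins the tie with IDENT
print(scan("iffy"))          # longer IDENT match beats the keyword
```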
LEX and YACC
Input to Lex is divided into three sections, with %% dividing the sections:

... definitions ...    (optional)
%%
... rules ...          (pattern   action code)
%%
... user code ...      (optional)
LEX and YACC
Input to YACC is divided into three sections, with %% dividing the sections:

... definitions ...    (optional)
%%
... rules ...          (CFG productions with action code)
%%
... user code ...      (optional)
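YACC builds LALR(1) parsers from rules like these. The Python sketch below swaps in a different and simpler technique, hand-written recursive descent. Its only purpose is to illustrate grammar rules paired with actions that build a syntax tree. The grammar and the names are assumptions, and this is not what YACC generates.

```python
def parse_expr(tokens):
    """Recursive-descent parser (NOT the LALR method YACC uses) for
         expr   : expr '+' term | term
         term   : term '*' factor | factor
         factor : ID | '(' expr ')'
    Each rule's "action" builds a syntax-tree node as a nested tuple."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return tok

    def expr():
        node = term()
        while peek() == "+":              # left recursion rewritten as a loop
            advance("+")
            node = ("+", node, term())    # action: build a '+' node
        return node

    def term():
        node = factor()
        while peek() == "*":
            advance("*")
            node = ("*", node, factor())  # action: build a '*' node
        return node

    def factor():
        if peek() == "(":
            advance("(")
            node = expr()
            advance(")")
            return node
        tok = advance()
        if not tok.isalnum():
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok                        # action: a leaf for the identifier

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"trailing input at {peek()!r}")
    return tree

print(parse_expr(["A", "+", "B", "*", "C"]))   # ('+', 'A', ('*', 'B', 'C'))
```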
LEX and YACC
To run :

% lex bas.l
% cc lex.yy.c -ll
% a.out
-------
------
-------
%
COMPILER CONSTRUCTION TOOLS…….
•Scanner Generator:- These automatically generate lexical analyzers
normally from a specification based on regular expression.
•Parser Generator:- These produce syntax analyzer, normally from I/P
that is based on a context free grammar.
•Syntax-directed translation engines:- These produce
collection of routines that walk the parse tree.
•Code-generator generators:- Such a tool takes a collection of
rules that defines the translation of each operation of the intermediate language
into the machine language for the target machine.
•Dataflow analysis engines:- Much of the information needed to
perform good code optimization involves “data flow analysis”. The gathering of
information about how values are transmitted from one part of a program to
other part.
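Liveness analysis is one classic data-flow analysis. Real data-flow engines iterate over a control-flow graph until they reach a fixed point. This hedged sketch handles only a single straight-line block, and its tuple format for three-address statements is an assumption. A variable is live just before a statement if the statement reads it, or if it is live afterwards and the statement does not overwrite it.

```python
def live_before_each(block, live_out):
    """Backward liveness analysis over one straight-line block.

    block    -- list of (dest, operand, operand) tuples, e.g. ("t1", "a", "b")
                for t1 = a + b; operands that are not identifiers are constants
    live_out -- variables still needed after the block
    Returns, for each statement, the set of variables live just before it."""
    live = set(live_out)
    result = []
    for dest, *operands in reversed(block):
        live.discard(dest)                                    # killed: redefined here
        live.update(x for x in operands if x.isidentifier())  # used: read here
        result.append(set(live))
    result.reverse()
    return result

block = [
    ("t1", "a", "b"),    # t1 = a + b
    ("t2", "t1", "c"),   # t2 = t1 * c
    ("a", "t2", "1"),    # a  = t2 - 1
]
for stmt, live in zip(block, live_before_each(block, {"a"})):
    print(stmt, sorted(live))
```

An optimizer can use such results, for example, to drop an assignment whose destination is not live afterwards.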
Q1. Generate the tokens and the parse tree for the following:

If (MAX==5) GOTO 100

Q2. Generate the tokens and the parse tree for the following:

While A>B do
A=A+B
Q3. Generate the tokens and the parse tree for the following:

While A>=B & A=2*5 do

A=A*B
