Compiler Design
6th Semester B.Tech. (CSE)
Course Code: CS3008
Dr. Jasaswi Prasad Mohanty
School of Computer Engineering
KIIT Deemed to be University
Bhubaneswar, India
Motivation
Language processing is an important component of programming
Many system programs and application programs require structured input:
• Operating systems (command-line processing)
• Databases (query-language processing)
• Typesetting systems such as LaTeX
Wherever the input has structure, one can apply language-processing techniques.
Why study compilers?
• Compilers use the whole spectrum of language processing technology
CD / Module-I / Jasaswi 2 17 January 2024
What do we expect to achieve by the end of the course?
Knowledge to design, develop, understand, modify/enhance, and
maintain compilers for (even complex!) programming languages.
Confidence to use language processing technology for software
development.
Prerequisites
Familiarity with one high-level programming language such as C, C++,
or Java is required.
Data Structures & Algorithms
Formal Language and Automata
Course Objective
The objectives of the course are
• to introduce the basic theory underlying the different components
and phases of a compiler, such as parsing and code generation;
• to familiarize students with the principles involved in the design
of compilers for modern languages.
Course Outcomes
At the end of this course the student will be able to:
1. Explain the design of a compiler and identify the phases involved in program translation from
source code to executable code along with the intermediate files produced by the phases.
2. Represent language tokens using regular expressions and finite automata to design lexical
analyzer for a language.
3. Identify the similarities and differences among various parsing techniques to design parsers.
4. Understand formal attributed grammars for specifying semantics of programming languages and
design semantic rules to facilitate translation process.
5. Identify the effectiveness of optimization and apply various machine independent and machine
dependent optimization techniques.
6. Learn register allocation and apply target-code generation algorithms used by the compiler.
Syllabus
Module I (Overview of Compiler): Introduction to Compiler, Phases of Compilation, Grouping of Phases.
Module II (Lexical Analysis): Role of Lexical Analyzer, Input Buffering, Specification of Tokens, Finite state machines and their applications to lexical analysis.
Module III (Syntax Analysis): Context Free Grammar; Top-down Parsing: recursive descent parsing, table-driven predictive parsing, LL(1); Bottom-up Parsing: shift-reduce parsing, efficient bottom-up parsers (LR parsers), LR and LALR parsing; error recovery in parsing; handling ambiguous grammars; compiler construction tools.
Module IV (Semantic Analysis): Intermediate forms of source programs: syntax tree and three-address code (quadruples, triples); attributed grammar; syntax-directed translation; evaluation and flow of attributes in a syntax tree; conversion of programming-language constructs into intermediate code forms; symbol table organization; DAG representation.
Module V (Code Optimization): Machine-dependent and machine-independent optimization; Local Optimization: common sub-expression elimination, constant folding, replacing expensive operations; Loop Optimization: basic blocks, flow graphs, inner loops, code motion, induction variables.
Module VI (Code Generation): Design of code generator; machine-dependent code optimization; register allocation and assignment; code generation algorithm.
Books
TB1: Compilers: Principles, Techniques, and Tools by
Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D.
Ullman, Pearson Education, 2009.
RB1: Compiler Construction: Principles and Practice by
K. C. Louden, Brooks/Cole - Thomson Learning.
RB2: Engineering a Compiler by Keith Cooper, Linda
Torczon, ISBN 978-0-12-088478-0, Elsevier.
RB3: Introduction to Compiler Construction with UNIX
by Axel T. Schreiner, H. George Friedman, ISBN
978-0134743967, Prentice-Hall Software Series.
RB4: The Compiler Design Handbook by Y. N. Srikant,
Priti Shankar, ISBN 978-1420043822, Taylor and
Francis.
Online Resources
NPTEL Courses:
• IIT Madras: https://nptel.ac.in/courses/128/106/128106009
• IIT Kharagpur: https://nptel.ac.in/courses/106/105/106105190
• IIT Kanpur: https://nptel.ac.in/courses/106/104/106104123
• IISc Bangalore: https://nptel.ac.in/courses/106/108/106108113
• IIT Kanpur: https://nptel.ac.in/courses/106/104/106104072
Tutorials Point: https://www.tutorialspoint.com/compiler_design
Geeks for geeks: https://www.geeksforgeeks.org/compiler-design-tutorials
Compiler Learning
Isn’t it an old discipline?
• Yes, it is a well-established discipline.
• Its algorithms, methods, and techniques were researched and developed in the
early stages of computer science.
• There are many compilers around, and many tools to generate them
automatically.
So why do we need to learn it?
• You may never write a full compiler.
• But the techniques are useful in many tasks, such as writing an interpreter
for a scripting language, or validation checking for forms.
Activity Calendar
Sl. No. | Activity Type | Date | Marks | CO
1. Class Test-1: 25th Jan - 30th Jan, 5 marks (CO1, CO2)
2. Quiz-1: 14th Feb - 17th Feb, 5 marks (CO2, CO3)
3. Assignment: 17th Feb - 24th Feb, 5 marks (CO1, CO2, CO3)
MID SEM EXAM: 26th Feb - 02nd Mar
4. Class Test-2: 01st Apr - 07th Apr, 5 marks (CO3, CO4)
5. Quiz-2: 20th Apr - 26th Apr, 10 marks (CO3, CO4, CO5, CO6)
END SEM EXAM: 27th Apr - 08th May
MODULE I
Overview of Compiler
Sl. No. Topics
1. Introduction
2. Language Processors
3. Types of Compiler
4. Language Processing Systems
5. Structure of Compiler
6. Phases of Compiler
7. Grouping of phases into passes
8. Compiler Construction Tools
1. Introduction
Programming languages are notations for describing computations to
people and to machines.
Before a program can be run, it first must be translated into a form in
which it can be executed by a computer.
The software systems that do this translation are called compilers.
A compiler is system software that converts a source-language program
into a target-language program.
It validates the input program against the source-language specification,
producing error messages and warnings during the translation process.
Bit of History
How are programming languages implemented? Two major strategies:
• Interpreters (old and much less studied)
• Compilers (very well understood with mathematical foundations)
Some environments provide both an interpreter and a compiler. Lisp, Scheme,
etc. provide
• Interpreter for development
• Compiler for deployment
Java
• Java compiler: Java to interpretable bytecode
• Java JIT: bytecode to executable image
Some early machines and implementations
IBM developed the 704 Data Processing System in 1954. All programming
was done in assembly language. The cost of software development far
exceeded the cost of hardware, and productivity was low.
Speedcoding interpreter: programs ran about 10 times slower than
handwritten assembly code.
John Backus (in 1954): Proposed a program that translated high level
expressions into native machine code. Skepticism all around. Most
people thought it was impossible.
Fortran I project (1954-1957): The first compiler was released.
Fortran I
The first compiler had a huge impact on programming languages
and computer science; the whole new field of compiler design
grew out of it.
By 1958 more than half of all programmers were using Fortran, and
development time was cut in half.
It led to an enormous amount of theoretical work (lexical analysis,
parsing, optimization, structured programming, code generation,
error recovery, etc.).
Modern compilers preserve the basic structure of the Fortran I
compiler.
What are Compilers?
• A compiler is a translator that converts one representation of a program into
another
• Typically from high-level source code to low-level machine code or object code
• Source code is normally optimized for human readability
– Expressive: matches our notion of languages (and applications)
– Redundant, to help avoid programming errors
• Machine code is optimized for hardware
– Redundancy is reduced
– Information about the intent is lost
[Diagram: High-level program → Compiler → Low-level program]
2. Language Processors
Compiler: It is a program that can read a program in one language (source
language) and translate it into an equivalent program in another language (target
language).
If the target program is an executable machine-language program, it can then be
called by the user to process inputs and produce outputs.
[Diagram: Source Program → Compiler → Target Program; then Input → Target Program → Output]
Language Processors – contd…
Interpreter: It is another common kind of language processor.
Instead of producing a target program as a translation, an interpreter appears to
directly execute the operations specified in the source program on inputs
supplied by the user.
The machine-language target program produced by a compiler is usually much
faster than an interpreter at mapping inputs to outputs.
An interpreter, however, can usually give better error diagnostics than a
compiler, because it executes the source program statement by statement.
[Diagram: Source Program and Input → Interpreter → Output]
Compiler vs Interpreter
Parameter-by-parameter comparison:
• Input: The compiler takes the entire program as input. The interpreter takes a single instruction as input.
• Output: The compiler generates intermediate object code. The interpreter does not produce any intermediate object code.
• Working mechanism: Compilation is done before execution. With an interpreter, compilation and execution take place simultaneously.
• Speed: Compiled code runs faster; interpreted code runs slower.
• Memory: The compiler's memory requirement is higher due to the creation of object code. The interpreter requires less memory, as it does not create intermediate object code.
• Errors: The compiler displays errors after compilation, all at the same time. The interpreter displays errors line by line.
• Recompilation: Any change in the source program after compilation requires recompiling the entire code. Any change in the source program during interpretation does not require retranslating the entire code.
Compiler vs Interpreter
Compiler:
1. Steps of programming: (a) program creation; (b) analysis of the language by the compiler, which reports errors (if any); (c) conversion of the source code to machine code (if there are no errors); (d) linking of the various code files into a runnable program; (e) program execution.
2. The compiler saves the machine code on disk.
3. Compiled code runs faster.
4. The compiler generates output in the form of an executable (.exe).
5. Any change in the source program after compilation requires recompiling the entire code.
6. Errors are displayed together after compilation.
Interpreter:
1. Steps of programming: (a) program creation; (b) execution of the source statements one by one.
2. The interpreter does not save any machine code.
3. Interpreted code runs slower.
4. The interpreter does not generate an output file.
5. Any change in the source program during translation does not require retranslating the entire code.
6. Errors are displayed line by line.
Compiler vs Interpreter – contd…
Compiler:
7. The compiler sees the whole code up front, which helps it run the code faster by performing optimization.
8. It does not require the source code for later execution.
9. Execution of the program takes place only after the whole program is compiled.
10. Compilers often take more time to analyze the source code.
11. CPU utilization is higher in the case of a compiler.
12. Compilers are mostly used in production environments.
13. Object code is permanently saved for future use.
14. Compiler-based programming languages: C, C++, C#.
Interpreter:
7. The interpreter works line by line, so optimization is weaker compared to compilers.
8. It requires the source code for later execution.
9. Execution happens after every line is checked or evaluated.
10. Interpreters take less time to analyze the source code.
11. CPU utilization is lower in the case of an interpreter.
12. Interpreters are mostly used in programming and development environments.
13. No object code is saved for future use.
14. Interpreter-based programming languages: Python, Perl, MATLAB.
3. Types of Compiler
1. Native Compiler (Native Code Compiler):
• A native compiler converts source code into target code for the same machine
on which it runs.
• Example: Turbo C, GCC.
[Flow: Source code (Machine A) → Native compiler, run on Machine A → Target code (Machine A)]
2. Cross Compiler:
• A cross compiler converts source code for one machine into target code for
another machine. It is useful when the target platform does not have enough
resources to build the target code.
• Example: arm-cortexa8-linux-gnueabihf-gcc hello.c (Source code in x86 architecture to target code in
arm architecture)
[Flow: Source code (Machine A) → Cross compiler, run on Machine A → Target code (Machine B)]
Types of Compiler – contd…
3. Source to Source Compiler:
• A source-to-source compiler, also known as a transcompiler, converts source
code in one high-level language into target code in another high-level language.
• Example: CoffeeScript, Dart, Haxe, Opal, TypeScript.
[Flow: High-level language → Source-to-source compiler → High-level language]
4. Bytecode Compiler:
• A bytecode compiler converts high-level-language source code into bytecode.
• Example: Java
[Flow: High-level language (Machine A) → Bytecode compiler, run on Machine A → Bytecode (Machine B)]
Types of Compiler – contd…
5. Just-In-Time(JIT) Compiler:
• A JIT compiler translates bytecode into machine code at run time.
• Example: the JVM (Java Virtual Machine) in Java and the CLR (Common Language Runtime) in C#
[Flow: Bytecode → JIT compiler → Machine code]
Note:
Java Language Processor: A Hybrid Compiler
• Java language processors combine compilation and interpretation.
• A Java source program is first compiled into an intermediate form called bytecodes.
• The bytecodes are then interpreted by a virtual machine.
• Advantage: bytecodes compiled on one machine can be interpreted on another
machine, across a network.
• To achieve faster processing of inputs to outputs, some Java compilers, called just-in-time compilers, translate the
bytecodes into machine language immediately before they run the intermediate program to process the input.
[Diagram: Source Program → Translator (bytecode compiler) → Intermediate Program → Virtual Machine (with Input) → Output]
4. Language Processing Systems
We write programs in a High Level Language (HLL), which is very easy
for the user to understand.
These programs are fed into a series of tools and operating system
components to get the desired code that can be used by the machine.
This is called a Language Processing System.
[Pipeline: first.c (source code in HLL) → Preprocessor → first.i (preprocessed code, pure HLL) → Compiler → first.s (assembly code, i.e. a program in assembly-level language) → Assembler → first.o (relocatable machine code / object code) → Linker/Loader, together with library files and other relocatable object files → first.out / first.exe (executable code, i.e. target machine code)]
NOTE:
1. Relocatable Machine Code: a form of machine code that is designed to be
loaded and executed at any memory location.
2. Executable Code: code which works at one specific address in memory.
Language Processing Systems – contd…
1. Pre-processor:
• It is a program which converts HLL to pure HLL (completely machine-
independent: programs written in a pure high-level language can, in theory,
be executed on any computer without modification).
• It takes source code as input and produces modified source code as output.
• It removes preprocessor directives (e.g. #include <stdio.h>) and adds the
respective files; this is File Inclusion.
• It also does macro expansion (e.g. #define PI 3.14159) and operator
conversion (e.g. a++; to a = a + 1;).
• The preprocessor is also known as a macro evaluator.
• The processing is optional: if a language does not support #include or
macros, preprocessing is not required.
Language Processing Systems – contd…
2. Compiler:
• A compiler is a program that reads a program in one language (the source
language) and translates it into an equivalent program in another language
(the target language).
• It converts pure HLL into assembly-level language.
• Assembly-level language is a low-level programming language; it is not in
binary form.
• It also reports errors and warning messages.
Language Processing Systems – contd…
3. Assembler:
• For every platform (H/W + OS) we have an assembler.
• An assembler for one platform will not work on another platform.
• The assembler converts assembly code into relocatable machine code.
4. Loader and Linker:
• This phase converts relocatable code into absolute machine code.
• Linker: links different object files, libraries, and DLL files into a single executable file.
A DLL is a library that contains code and data that can be used by more than one
program at the same time.
• Loader: loads the executable file into memory and executes it.
NOTE:
1. Modern compiler drivers like Turbo C and gcc combine all the phases to convert a source file to
executable machine code.
5. Structure of a Compiler
Basically these are the two phases of a Compiler
• Analysis phase:
It breaks up the source program into constituent pieces and imposes a grammatical structure on
them. It then uses this structure to create an intermediate representation of the source program.
If the analysis part detects that the source program is either syntactically ill-formed or semantically
unsound, it must provide informative messages, so the user can take corrective action.
The analysis part also collects information about the source program and stores it in a data
structure called a symbol table, which is passed along with the intermediate representation to the
synthesis part.
This phase is often called the front end of the compiler.
• Synthesis phase:
The synthesis part constructs the desired target program from the intermediate representation
and the information in the symbol table.
This phase is often called the back end of the compiler.
6. Phases of a Compiler
The process of converting the source code into machine code involves
several phases or stages, which are collectively known as the phases
of a compiler.
The analysis part also collects information about the source program
and stores it in a data structure called a symbol table.
The symbol table, which stores information about the entire source
program, is used by all phases of the compiler.
[Diagram: the analysis phases form the front end; the synthesis phases form the back end.]
Phases of a Compiler – contd…
[Figure: the phases of a compiler, from lexical analysis through code generation, with the symbol table and error handler shared by all phases.]
6.1 Lexical Analysis
The first phase of a compiler is called lexical analysis or scanning.
It takes the output of the pre-processor (which performs file inclusion and macro expansion) as
its input, which is in a pure high-level language.
The lexical analyzer reads the stream of characters making up the source program and groups
the characters into meaningful sequences called lexemes.
For each lexeme, the lexical analyzer produces as output a token of the following form and
passes on to the subsequent phase, syntax analysis:
<token-name, attribute-value>
In the token, the first component token-name is an abstract symbol that is used during syntax
analysis, and the second component attribute-value points to an entry in the symbol table for this
token. Information from the symbol-table entry is needed for semantic analysis and code
generation.
It also detects lexical errors (e.g., erroneous characters) and removes comments and white space.
Generally implemented as Finite Automata.
Example
Suppose a source program contains the following assignment statement:
position = initial + rate * 60
The characters in this assignment could be grouped into the following lexemes and mapped into the
following tokens passed on to the syntax analyzer:
• position is a lexeme that would be mapped into a token <id, 1>, where id is an abstract symbol
standing for identifier and 1 points to the symbol-table entry for position. The symbol-table entry for
an identifier holds information about the identifier, such as its name and type.
• The assignment symbol = is a lexeme that is mapped into the token <=>. Since this token needs
no attribute-value, the second component is omitted.
• initial is a lexeme that is mapped into the token <id, 2>, where 2 points to the symbol-table entry
for initial.
• + is a lexeme that is mapped into the token <+>.
• rate is a lexeme that is mapped into the token <id, 3>, where 3 points to the symbol-table entry for
rate.
• * is a lexeme that is mapped into the token <*>.
• 60 is a lexeme that is mapped into the token <60>.
Example – contd…
Blanks separating the lexemes would be discarded by the lexical analyzer.
After lexical analysis, the representation of the assignment statement position =
initial + rate * 60 as the sequence of tokens are as follows:
<id, 1> <=> <id, 2> <+> <id, 3> <*> <60>
The symbol table then contains entries 1, 2, and 3 for the identifiers position, initial, and rate respectively.
6.2 Syntax Analysis
The second phase of the compiler is syntax analysis or parsing.
The parser uses the first components of the tokens produced by the lexical analyzer to
create a tree-like intermediate representation known as Syntax Tree which shows the
grammatical structure of the token stream.
In the syntax tree each interior node represents an operation and the children of the node
represent the arguments of the operation. This tree shows the order in which the operations
in the assignment are to be performed.
The syntax tree for the token stream <id, 1> <=> <id, 2> <+> <id, 3> <*> <60>, which is the
output of the syntax analyzer, is:

            =
           / \
      <id,1>  +
             / \
        <id,2>  *
               / \
          <id,3> <60>
6.3 Semantic Analysis
The semantic analyzer uses the syntax tree and the information in the symbol table to
check the source program for semantic consistency with the language definition.
It also gathers type information and saves it in either the syntax tree or the symbol table,
for subsequent use during intermediate-code generation.
An important part of semantic analysis is type checking, where the compiler checks that
each operator has matching operands.
• Example: many programming language definitions require an array index to be an
integer; the compiler must report an error if a floating-point number is used to index
an array.
The language specification may permit some type conversions called coercions.
• Example: a binary arithmetic operator may be applied to either a pair of integers or
to a pair of floating-point numbers. If the operator is applied to a floating-point number
and an integer, the compiler may convert or coerce the integer into a floating-point
number.
Example
Suppose that position, initial, and rate have been declared to be floating-point numbers, and
that the lexeme 60 by itself forms an integer.
The type checker in the semantic analyzer discovers that the operator * is applied to a floating-
point number rate and an integer 60. In this case, the integer may be converted into a floating-
point number.
Note that the output of the semantic analyzer has an extra node for the operator inttofloat, which
explicitly converts its integer argument into a floating-point number.
6.4 Intermediate Code Generation
After syntax and semantic analysis of the source program, many compilers generate an explicit
low-level or machine-like intermediate representation, which we can think of as a program for an
abstract machine.
This intermediate representation should have the following two important properties:
• it should be easy to produce
• it should be easy to translate into the target machine.
We consider an intermediate form called three-address code, which consists of a sequence of
assembly-like instructions with three operands per instruction.
Each operand can act like a register. For the running example, the output of the intermediate
code generator is the three-address code sequence:
    t1 = inttofloat(60)
    t2 = id3 * t1
    t3 = id2 + t2
    id1 = t3
CD / Module-I / Jasaswi 38 17 January 2024
Intermediate Code Generation – contd..
NOTE:
1. Each three-address assignment instruction has at most one operator on the right
side. Thus, these instructions fix the order in which operations are to be
performed. The multiplication precedes the addition in the source program.
2. The compiler must generate a temporary name to hold the value computed by a
three-address instruction.
3. Some "three-address instructions", like the first and last in the sequence for
this example, have fewer than three operands.
6.5 Code Optimization
The machine-independent code-optimization phase attempts to improve the
intermediate code so that better target code will result.
Better in the sense the code should be shorter, execute faster, and should
consume less power.
A simple intermediate code generation algorithm followed by code optimization is
a reasonable way to generate good target code.
The optimizer can shorten the sequence produced by the intermediate code
generation phase to:
    t1 = id3 * 60.0
    id1 = id2 + t1
6.6 Code Generation
The code generator takes as input an intermediate representation of the source program and
maps it into the target language.
If the target language is machine code, registers or memory locations are selected for each of the
variables used by the program.
The intermediate instructions are translated into sequences of machine instructions that perform
the same task.
A crucial aspect of code generation is the judicious assignment of registers to hold variables.
The intermediate code generated in the previous phase can be converted to the following
machine instructions:
    LDF  R2, id3
    MULF R2, R2, #60.0
    LDF  R1, id2
    ADDF R1, R1, R2
    STF  id1, R1
Here LDF loads data into a floating-point register, ADDF and MULF perform floating-point
addition and multiplication, and STF stores data from a floating-point register.
Translation of an assignment statement
[Figure: the statement position = initial + rate * 60 traced through every phase, from character stream to target machine code.]
6.7 Symbol-Table Management
The compiler records the variable names used in the source program and collects
information about various attributes of each name.
These attributes may provide information about:
• Variable names: storage, type, scope (where in the program the value may be used).
• Procedure names: number of arguments, types of its arguments, method of passing
each argument (by value or by reference), return type.
The symbol table is a data structure containing a record for each variable name,
with fields for the attributes of the name.
The data structure should be designed to allow the compiler to find the record for
each name quickly and to store or retrieve data from that record quickly.
6.8 Error Handler
Error handling is connected to all the phases of a compiler.
If an error is detected at a particular phase, it must be dealt with so that compilation
can proceed.
Detection and reporting of errors in the source program is a main function of
the compiler.
An error can occur at any phase of compilation.
A good compiler must determine exactly the line number of the program at which
the errors occurred.
Functions of the error handler:
• Error detection
• Error reporting
• Error recovery
6.8.1 Types of Errors
Various errors can occur at different levels of compilation:
• Lexical (scanner) errors:
Illegal or unrecognized characters, mainly caused by typing errors.
Misspellings of identifiers, keywords, or operators.
Unterminated character or string constants: happens whenever the programmer types
something in quotes and forgets the trailing quote.
• Syntactic errors:
These errors are among the most common and are caught by the parser.
A missing semicolon or unbalanced parentheses.
The difficult part is deciding where to continue the syntactic analysis after an error
has been found.
If the parser is not carefully written, or if the error detection and recovery scheme is not
proper, the parser will hit one error, "mess up" after that, and give wrong error
messages throughout the rest of the program.
Types of Errors – contd..
Various error types can occur at different levels of compilation (contd…):
• Semantic errors:
Semantic errors arise from statements that are correct from the syntactic point of
view but make no sense, so no code can be generated to carry out their meaning.
Incompatible value assignments or type mismatches between operator and operand.
• Errors encountered during code optimization (logical errors):
During control-flow analysis, there may exist statements which can never be reached
(unreachable code).
Infinite loops.
• Errors that occur during code generation:
In code generation, the architecture of the computer also plays an important role.
• Errors encountered when the compiler makes symbol-table entries:
There may exist an identifier that has multiple declarations with contradictory attributes.
7.1 Grouping phases into passes
A pass combines parts of one or more phases of the compiler when the
compiler is implemented.
It refers to one traversal of the compiler over the entire program.
A pass reads the instructions of the source program, or the output produced
by the previous pass, makes the transformations specified by its phases, and
writes its output to an intermediate file, which may be read by subsequent
passes.
Advantages of Grouping phases into passes
We can produce compilers for different source languages for one target machine by combining
different front ends with the back end for that target machine. Similarly, we can produce compilers for
different target machines, by combining a front end with back ends for different target machines.
Simplification of Compiler Design: By organizing the compilation process into distinct passes, each
handling a specific aspect of the compilation, the overall complexity of the compiler is reduced. This
modular approach allows each pass to focus on a single task, making the compiler easier to
understand, develop, and maintain.
Facilitates Debugging and Maintenance: When compiler tasks are divided into separate passes, it
becomes easier to locate and fix bugs. This division allows developers to isolate issues within a
specific pass, reducing the time and effort needed to debug and maintain the compiler.
Efficiency in Resource Utilization: Grouping into passes can lead to more efficient use of system
resources. For instance, each pass might require different resources or data structures. By separating
these needs into distinct passes, the compiler can manage memory and processing power more
effectively.
Reusability: The modular nature of passes allows for the reuse of certain components in different
compilers. For example, a pass designed for optimizing arithmetic expressions can be used in multiple
compilers that target different programming languages or hardware architectures.
7.2 Types of Compiler Pass
1. One Pass Compiler:
If all the phases of the compiler are combined or grouped into a single module, it is known as a
single-pass compiler.
A one-pass (single-pass) compiler is a type of compiler that passes through each compilation
unit exactly once.
A single-pass compiler is faster and smaller than a multi-pass compiler.
Problems with One Pass Compiler:
Limited Optimization Capabilities: Limited ability to perform complex optimizations. Since
they process the source code in a single pass, they have less context about the entire
program, which is often necessary for more advanced optimizations that require multiple
views of the program's code.
Less Efficient Memory Management: Since one-pass compilers process the source code
in a linear fashion, they may not manage memory as efficiently as multi-pass compilers.
They might need to keep more information in memory for longer periods, as they cannot
make multiple passes to refine their data structures.
7.2 Types of Compiler Pass
2. Two-Pass compiler or Multi-Pass compiler:
A two-pass (multi-pass) compiler is a type of compiler that processes the source code of
a program multiple times.
The First Pass is referred to as the Front End
It is the analysis part
It is platform independent
The Second Pass is referred to as the Back End
It is the synthesis part
It is platform dependent
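A classic illustration of why a second pass helps is resolving forward references. The toy two-pass "translator" below is a hedged sketch, not a real assembler; its instruction names and label syntax are invented for the example. Pass 1 (the analysis side) records where each label will land, and pass 2 (the synthesis side) generates code using that information, something a strict one-pass translator could not do for a label used before it is defined:

```python
# Toy two-pass translator for a made-up instruction list.
# "JMP end" refers to a label defined only later, so the first
# pass must gather label addresses before code can be emitted.

program = ["JMP end", "ADD", "end:", "HLT"]

def first_pass(lines):
    # Pass 1: traverse the whole program, recording each label's address.
    labels, addr = {}, 0
    for line in lines:
        if line.endswith(":"):
            labels[line[:-1]] = addr   # label marks the next instruction
        else:
            addr += 1
    return labels

def second_pass(lines, labels):
    # Pass 2: traverse again, replacing label operands with addresses.
    code = []
    for line in lines:
        if line.endswith(":"):
            continue
        op, *rest = line.split()
        if rest and rest[0] in labels:
            code.append((op, labels[rest[0]]))
        else:
            code.append((op,))
    return code

labels = first_pass(program)
print(second_pass(program, labels))  # [('JMP', 2), ('ADD',), ('HLT',)]
```

The same division of labour appears in real two-pass compilers: the front end collects whole-program information that the back end then exploits during code generation.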
7.3 One Pass vs Two-pass Compiler
Definition
• One-Pass: A one-pass compiler translates the source code into machine code in a single pass,
analyzing and compiling the code in one step.
• Two-Pass: A two-pass compiler divides the compilation process into two stages or passes. The
first pass involves syntax and semantic analysis, and the second pass generates the machine code.
Speed
• One-Pass: Generally faster, as it processes the code once.
• Two-Pass: Slower compared to one-pass compilers, as it involves two distinct phases.
Memory Usage
• One-Pass: Usually requires less memory, as it does not need to store much intermediate data.
• Two-Pass: Requires more memory to store information from the first pass for use in the second pass.
Error Handling
• One-Pass: Limited error detection capabilities, since it doesn't analyze the entire program context.
• Two-Pass: Better error handling, as it can gather more context about the program in the first pass,
leading to more accurate error detection.
Optimization
• One-Pass: Limited optimization opportunities, as it does not analyze the entire program before
generating code.
• Two-Pass: Offers better optimization, as the first pass can collect information useful for optimizing
code generation in the second pass.
Use Cases
• One-Pass: Suitable for simpler languages or resource-constrained environments.
• Two-Pass: Used for more complex languages and applications where thorough analysis and
optimization are required.
Examples
• One-Pass: Early versions of C compilers.
• Two-Pass: Modern C++, Java, and C# compilers.
8. Compiler Construction Tools
Writing a compiler is a tedious and time-consuming task. There are specialized tools that help in
implementing the various phases of a compiler.
These tools are called compiler construction tools. They are also known as compiler-compilers,
compiler-generators, or translator writing systems.
1. Scanner generators that produce lexical analyzers from a regular-expression description of the tokens of
a language. Example: LEX, Flex (Fast Lexical Analyzer Generator), JFlex for Java
2. Parser generators that automatically produce syntax analyzers from a grammatical description of a
programming language. Example: YACC (Yet Another Compiler Compiler), Bison, Lemon, ANTLR
(ANother Tool for Language Recognition)
3. Syntax-directed translation engines that produce collections of routines for walking a parse tree and
generating intermediate code.
4. Code-generators that produce a code generator from a collection of rules for translating each operation
of the intermediate language into the machine language for a target machine.
5. Data-flow analysis engines that facilitate the gathering of information about how values are transmitted
from one part of a program to every other part. Data-flow analysis is a key part of code optimization.
6. Compiler-construction toolkits that provide an integrated set of routines for constructing various phases
of a compiler. Example: LLVM (Low Level Virtual Machine), Clang, GNU Compiler Collection (GCC)
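The scanner-generator idea (tool 1 above) can be approximated in a few lines with Python's `re` module. The token names and patterns below are assumptions made for illustration; this is the spirit of what LEX/Flex automate, not their actual specification syntax:

```python
import re

# Illustrative regular-expression token specification, in the spirit
# of the input a scanner generator like LEX or Flex consumes.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    # Yield (token_name, lexeme) pairs, skipping whitespace.
    for m in MASTER.finditer(text):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())

print(list(tokenize("x = 42 + y")))
# [('ID', 'x'), ('OP', '='), ('NUMBER', '42'), ('OP', '+'), ('ID', 'y')]
```

A real scanner generator compiles such a specification into an efficient finite automaton rather than trying the alternatives at runtime, but the input (token name plus regular expression) is essentially the same.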
Probable Questions
1) What is the difference between a compiler and an interpreter?
A compiler is a program that can read a program in one language (the source language) and translate it
into an equivalent program in another language (the target language), reporting any errors in the source
program that it detects during the translation process.
An interpreter directly executes the operations specified in the source program on inputs supplied by the user.
2) What are the advantages of (a) a compiler over an interpreter (b) an interpreter over a compiler?
a) The machine-language target program produced by a compiler is usually much faster than an interpreter
at mapping inputs to outputs.
b) An interpreter can usually give better error diagnostics than a compiler, because it executes the source
program statement by statement.
3) What advantages are there to a language-processing system in which the compiler produces assembly
language rather than machine language?
The compiler may produce an assembly-language program as its output, because assembly language is
easier to produce as output and is easier to debug.
4) A compiler that translates a high-level language into another high-level language is called a source-to-
source translator. What advantages are there to using C as a target language for a compiler?
For the C language there are many compilers available that generate code for almost every hardware platform.