COMPILER DESIGN
Chapter 1 - Unit I
Introduction to Compiling: Compilers - Analysis of the source program -
Phases of a compiler - Cousins of the Compiler - Grouping of Phases -
Compiler construction tools.
Why do we need to design compiler?
• Because computer can't understand the source code directly.
• It will understand only object level code.
• Source codes are human readable format but the system cannot understand it.
• So, the compiler is intermediate between human readable format and machine-
readable format.
Why compiler design is important
• Compiler design principles provide an in-depth view of translation and
optimization process.
• Compiler design covers basic translation mechanism and error detection &
recovery.
• It includes lexical, syntax, semantic analysis and intermediate code generation as
front end, and code generation and optimization as back-end.
Overview of Language Processing System
A combination of programs such as preprocessors, compilers, assemblers, loaders, and
links which work together to convert source code written in a high-level language such as
java, c++ , c and python to executable target machine code. This is known as a language
processing system.
Fig: Language Processing System
Step 1: Preprocessor: A preprocessor produce input to compilers. They may perform the
following functions.
a. Macro processing: A preprocessor may allow a user to define macrosthat are short
hands for longer constructs.
b. File inclusion: A preprocessor may include header files into the program text.
c. Rational preprocessor: These preprocessors augment older languageswith more
modern flow-of-control and data structuring facilities.
d. Language Extensions: These preprocessor attempts to add capabilities tothe
language by certain amounts to build-in macro.
Step 2:Compiler: Compiler is a translator program that translates a program written in
(HLL) the source program and translates it into an equivalent program in (MLL) thetarget
program. As an important part of a compiler is error showing to theprogrammer.
Executing a program written in HLL programming language is basically of two parts. The
source program must first be compiled translated into an object program. Then the results
object program is loaded into a memory executed.
The examples of different types of compilers are:
1. Ada compilers 2. ALGOL compilers 3. BASIC compilers
4. C# compilers 5. C compilers 6. C++ compilers
7. COBOL compilers 8. Java compilers 9.Pythonand so on.
Step 3: Assembler: Assembler is a program that takes assembly language as source code
and converts it into the bit format i.e machine language which is understandable by the
computers.
An assembly language which is basically mnemonics like GO, HALT, JUMP, and NOT
code which is translated to the machine language by programming language translator
i.e., Assembler.
Step 4: Interpreter: Interpreter is a program that converts the high level language into the
bit format i.e. machine language. The function of the interpreter and compiler is the same
but the interpreter translates one line at a time and executes it. No object code is produced
so every time when the program has to be run it is to be interpreted first.
Note: Languages such as BASIC, SNOBOL, LISP can be translated using interpreters.
JAVA alsouses interpreter.
Step 5: Linker :A linker is special program that combines the object files, generated by
compiler/assembler, and other pieces of codes to originate an executable file have. exe
extension. In the object file, linker searches and append all libraries needed for execution
of file. It regulates memory space that code from each module will hold. It also merges
two or more separate object programs and establishes link among them. Generally, linkers
are of two types:
1. Linkage Editor
2. Dynamic Linker
Step 6: Loader:The loader is special program that takes input of object code from linker,
loads it to main memory, and prepares this code for execution by computer. Loader
allocates memory space to program. The embedded computer systems don’t have loaders.
In them, code is executed through ROM. Generally, loader has three types of approach:
1. Absolute loading
2. Relocatable loading
3. Dynamic run-time loading
Translators
• A translator is a program that takes as input a program written in one
language and produces as output a program in another language.
• Beside program translation, the translator performs another very important role, the
error-detection. Any violation of the HLL specification would be detected and
reported to the programmers.
Important role of translator is:
• Translating the HLL program input into an equivalent ML program.
• Providing diagnostic messages wherever the programmer violates specification
oftheHLL.
Types of Translators:
• Interpreter (Discussed in Previous Section)
• Preprocessor(Discussed in Previous Section)
• Compiler (Discussed in Previous Section)
Phases of a Compiler
• The compilation process contains the sequence of various phases.
• Each phase takes source program in one representation and produces output in
another representation.
• Each phase takes input from its previous stage.
Fig: Phases of a Compiler
Lexical Analysis:
Lexical analyzer phase is the first phase of compilation process. It takes source code as
input. It reads the source program one character at a time and converts it into meaningful
lexemes. Lexical analyzer represents these lexemes in the form of tokens.
Tokens are the smallest elements or the building blocks used to construct a program.
Tokens are of 6 types, and they are classified as: Identifiers, Keywords, Constants,
Operators, Special Characters and Strings.
Syntax Analysis:
Syntax analysis is the second phase of compilation process. It takes tokens as input and
generates a parse tree as output. In syntax analysis phase, the parser checks that the
expression made by the tokens is syntactically correct or not.
Semantic Analysis:
Semantic analysis is the third phase of compilation process. It checks whether the parse
tree follows the rules of language. Semantic analyzer keeps track of identifiers, their types
and expressions. The output of semantic analysis phase is the annotated syntax tree.
Intermediate Code Generation:
In the intermediate code generation, compiler generates the source code into the
intermediate code. Intermediate code is generated between the high-level language and the
machine language. The intermediate code should be generated in such a way that you can
easily translate it into the target machine code.
Code Optimization:
Code optimization is an optional phase. It is used to improve the intermediate code so that
the output of the program could run faster and take less space. It removes the unnecessary
lines of the code and arranges the sequence of statements in order to speed up the program
execution.
Code Generation:
Code generation is the final stage of the compilation process. It takes the optimized
intermediate code as input and maps it to the target machine language. Code generator
translates the intermediate code into the machine code of the specified computer.
Example for Phases of a Compiler:
Table Management (or) Book-keeping: -
This is the portion to keep the names used by the program and records essential
information about each. The data structure used to record this information called a
“Symbol Table‟.
Error Handling: The tasks of the Error Handling process are to detect each error, report it
to the user, and then make some recover strategy and implement them to handle error.
During this whole process processing time of program should not be slow. An Error is the
blank entries in the symbol table.
• Types or Sources of Error – There are two types of error: run-time and compile-
time error:
• A run-time error is an error which takes place during the execution of a program,
and usually happens because of adverse system parameters or invalid input data.
The lack of sufficient memory to run an application or a memory conflict with
another program and logical error are example of this. Logic errors, occur when
executed code does not produce the expected result. Logic errors are best handled
by meticulous program debugging.
• Compile-time error rises at compile time, before execution of the program.
Syntax error or missing file reference that prevents the program from successfully
compiling is the example of this.
Cousins of the Compiler
The Cousins of the Compiler are:
1) Preprocessor
It converts the HLL (high level language) into pure high level language. It includes all the
header files and also evaluates if any macro is included. It is the optional because if any
language which does not support #include and macro preprocessor is not required.
2) Compiler
a compiler is used to convert high-level programming language code into machine
language code.
3) Assembler
On the other hand, an assembler converts assembly level language code into machine
language code.
4) Linking and loading: It has four functions:
Allocation:
It means get the memory portions from operating system and storing the object data.
Relocation:
It maps the relative address to the physical address and relocating the object code.
Linker:
It combines all the executable object module to pre single executable file.
Loader:
It loads the executable file into permanent storage.
Compiler Constructor Tools
The compiler writer can use some specialized tools that help in implementing various
phases of a compiler. These tools assist in the creation of an entire compiler or its parts.
Some commonly used compiler construction tools include:
Some commonly used compiler-construction tools are:
1. Scanner generators.
2. Parser generators.
3. Syntax-directed translation engines.
4. Automatic code generators.
5. Data-flow analysis engines.
6. Compiler-construction toolkits.
Scanner Generators: Scanner generator takes a regular expression description of the
tokens of a programming language and produces lexical analyzers.
Input: Regular expression description of the tokens of a language.
Output: Lexical analyzers.
Parser Generators: Parser generator takes the grammatical description of a programming
language and produces a syntax analyzer.
Input: Grammatical description of a programming language.
Output: Syntax analyzers.
Syntax-directed Translation Engines: Syntax-directed translation engines produce
collections of routines that walk a parse tree and generates intermediate code.
Input (Parse tree) →Output (Intermediate code)
Automatic Code Generators: Code-generator takes a collection of rules that define the
translation of each operation of the intermediate language into the machine language for a
target machine.
Input (Intermediate code)→Output (Machine code)
Data-flow Analysis Engines: Data-flow analysis engine gathers the information, that is,
the values transmitted from one part of a program to each of the other parts. Data-flow
analysis is a key part of code optimization.
Compiler Construction Toolkits: Compiler construction toolkits provide an integrated
set of routines for construction of phases of compiler.
Grouping of The Phases
• Compiler can be grouped into front and back ends:
• Front end: analysis part of the source program (Language dependent): These
normally include lexical and syntactic analysis, the creation of the symbol table,
semantic analysis, and the generation of intermediate code. It also includes error
handling that goes along with each of these phases.
• Front end of a compiler consists of the phases:
• Lexical analysis.
• Syntax analysis.
• Semantic analysis.
• Intermediate code generation.
• Description:
• Front end comprises of phases which are dependent on the input (source
language) and independent on the target machine (target language).
• It includes lexical and syntactic analysis, symbol table management, semantic
analysis and the generation of intermediate code.
• It also includes error handling at the phases concerned.
• Back end: synthesis part of the source program (machine dependent): It includes
code optimization phase and code generation along with the necessary error
handling and symbol table operations.
• Back end of a compiler contains:
• Code optimization.
• Code generation.
• Description:
• Back end comprises of those phases of the compiler that are dependent on the
target machine and independent on the source language.
• This includes code optimization, code generation.
• In addition to this, it also encompasses error handling and symbol table
management operations.
Passes of a Compiler
• Pass is a complete traversal of the source program.
• Compiler has two passes to traverse the source program.
• Single Pass Compiler and Two Pass Compiler or Multi Pass Compiler.
• Single Pass Compiler:
• One pass compiler reads the code only once and then translates it.
• The one-pass compiler passes only once through the parts of each compilation unit.
• It can translate each part into its final machine program. In the one-pass compiler,
when the line source is processed, it is scanned, and the token is extracted.
• If we combine or group all the phases of compiler design in a single module
known as a single pass compiler.
• Two Pass Compiler or Multi Pass Compiler:
• A multi-pass compiler can process the source code of a program multiple times.
• In multi pass Compiler, we divide phases into two passes as:
• First Pass: is refers as
• (a). Front end
• (b). Analytic part
• (c). Platform independent
• Second Pass: is refers as
• (a). Back end
• (b). Synthesis Part
• (c). Platform Dependent
• Differences between Single Pass and Multi pass Compilers:
Let us see the comparison between One-Pass and Multi-Pass Compiler
One-Pass Compiler Multi-Pass Compiler
It reads the code only once and translates it at a It reads the code multiple times, each time
similar time. changing it into numerous forms.
They are faster. They are "slower." As more number of passes
means more execution time.
Less efficient code optimization and code Better code optimization and code generation.
generation.
The compiler requires large memory. The memory occupied by one pass can be
reused by a subsequent pass; therefore, small
memory is needed by the compiler.
Example − Pascal & C languages use one-pass Example − Modula -2 languages use multi-
compilation. pass compilation.
-------------------------------------------------------------