Compiler Design
A compiler is software that converts a program written in a high-level language (the source language) into a
low-level language (the object, target, or machine language, that is, 0s and 1s).
1. The compiler is a type of translator, which takes a program written in a high-level programming
language as input and translates it into an equivalent program in low-level languages such as machine
language or assembly language.
2. The program written in a high-level language is known as a source program, and the program
converted into a low-level language is known as an object (or target) program.
3. The process of translating the source code into machine code involves several stages, including lexical
analysis, syntax analysis, semantic analysis, code generation, and optimization.
4. The compiler checks the program for many kinds of errors, such as limit and range violations. Compiling
can take noticeable time and a large amount of memory.
5. A compiler is often slower than other system software because it goes through the entire program and
then translates the full program.
6. When a compiler runs on a machine and produces machine code for that same machine, it is called a
self compiler or resident compiler. When a compiler runs on one machine and produces machine code
for a different machine, it is called a cross compiler.
Stages of Compiler Design
Lexical Analysis: The first stage of compiler design is lexical analysis, also known as scanning. In this
stage, the compiler reads the source code character by character and breaks it down into a series of tokens,
such as keywords, identifiers, and operators. These tokens are then passed on to the next stage of the
compilation process.
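The scanning step described above can be sketched in a few lines. This is a minimal illustration, not how any particular compiler does it: the token categories, the keyword list, and the function name tokenize are assumptions made up for the example.

```python
import re

# Hypothetical token categories for a tiny C-like language (illustrative only).
TOKEN_SPEC = [
    ("NUMBER",     r"\d+"),
    ("KEYWORD",    r"\b(?:int|if|else|while|return)\b"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR",   r"[+\-*/=<>]"),
    ("PUNCT",      r"[();{}]"),
    ("SKIP",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Scan the source left to right and return a list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":  # whitespace separates tokens but is not one
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

For the input int x = 42; this sketch yields a keyword, an identifier, an operator, a number, and a punctuation token, in that order.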
Syntax Analysis: The second stage of compiler design is syntax analysis, also known as parsing. In this
stage, the compiler checks the syntax of the source code to ensure that it conforms to the rules of the
programming language. The compiler builds a parse tree, which is a hierarchical representation of the
program’s structure, and uses it to check for syntax errors.
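As a rough sketch of parsing, the function below checks a small arithmetic expression against an assumed grammar and builds a tree of nested tuples. The grammar, the tuple format, and the name parse are illustrative choices, not part of any specific compiler.

```python
import re

def parse(source):
    """Parse an arithmetic expression into a nested-tuple tree, or raise SyntaxError.

    Assumed grammar (illustrative):
        expr   -> term (('+' | '-') term)*
        term   -> factor (('*' | '/') factor)*
        factor -> NUMBER | '(' expr ')'
    """
    tokens = re.findall(r"\d+|\S", source)  # numbers, or any single non-space character
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def factor():
        token = advance()
        if token is not None and token.isdigit():
            return ("num", int(token))
        if token == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {token!r}")

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    tree = expr()
    if peek() is not None:  # leftover input means the text does not fit the grammar
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

Because term is called from inside expr, multiplication binds tighter than addition: parsing 1 + 2 * 3 places the multiplication node below the addition node. An unbalanced input such as (1 + 2 is rejected with a SyntaxError.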
Semantic Analysis: The third stage of compiler design is semantic analysis. In this stage, the compiler
checks the meaning of the source code to ensure that it makes sense. The compiler performs type checking,
which ensures that variables are used correctly and that operations are performed on compatible data types.
The compiler also checks for other semantic errors, such as undeclared variables and incorrect function calls.
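A minimal sketch of these checks, assuming a made-up tree format of nested tuples and a symbol table that maps declared variable names to types. The node kinds, the type names, and the function name check_types are all hypothetical.

```python
def check_types(node, symbols):
    """Return the type of an expression node, or raise on a semantic error.

    `node` is a nested tuple in a hypothetical tree format:
        ("num", 3)            integer literal
        ("str", "hi")         string literal
        ("var", "x")          variable reference
        ("+", left, right)    binary addition
    `symbols` maps declared variable names to their types.
    """
    kind = node[0]
    if kind == "num":
        return "int"
    if kind == "str":
        return "string"
    if kind == "var":
        name = node[1]
        if name not in symbols:
            raise NameError(f"undeclared variable {name!r}")
        return symbols[name]
    if kind == "+":
        left = check_types(node[1], symbols)
        right = check_types(node[2], symbols)
        if left != right:  # operands must have compatible (here: identical) types
            raise TypeError(f"cannot add {left} and {right}")
        return left
    raise ValueError(f"unknown node kind {kind!r}")
```

With x declared as an int, the expression x + 1 checks successfully, while x + "hi" is rejected as a type error and y + 1 is rejected because y was never declared.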
Code Generation: The fourth stage of compiler design is code generation. In this stage, the compiler
translates the parse tree into machine code that can be executed by the computer. The code generated by
the compiler must be efficient and optimized for the target platform.
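Real machine code depends on the target processor, so the sketch below targets a hypothetical stack machine instead. It walks an expression tree of nested tuples, for example ("+", ("num", 1), ("num", 2)), and emits one instruction per node. The instruction names, the tree format, and the use of integer division for DIV are assumptions for illustration.

```python
import operator

# Hypothetical instruction set for a simple stack machine.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}
ACTIONS = {"ADD": operator.add, "SUB": operator.sub,
           "MUL": operator.mul, "DIV": operator.floordiv}  # DIV assumed to be integer division

def generate(node):
    """Emit stack-machine instructions for an expression tree (post-order walk)."""
    if node[0] == "num":
        return [("PUSH", node[1])]
    op, left, right = node
    # Operands are evaluated first, then the operator consumes them from the stack.
    return generate(left) + generate(right) + [(OPCODES[op],)]

def run(code):
    """Execute the generated instructions, to check that they compute the right value."""
    stack = []
    for instruction in code:
        if instruction[0] == "PUSH":
            stack.append(instruction[1])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(ACTIONS[instruction[0]](left, right))
    return stack.pop()
```

For the tree of 1 + 2 * 3, generate emits PUSH 1, PUSH 2, PUSH 3, MUL, ADD, and running those instructions leaves 7 on the stack.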
Optimization: The final stage of compiler design is optimization. In this stage, the compiler analyzes the
generated code and transforms it to improve its performance, using techniques such as constant folding,
loop unrolling, and function inlining. In practice, many compilers also optimize an intermediate
representation before code generation.
A cross compiler runs on a machine ‘A’ and produces code for another machine ‘B’. It is capable of
creating code for a platform other than the one on which it is running.
A source-to-source compiler (also called a transcompiler or transpiler) translates source code written in
one programming language into source code in another programming language.
High-Level Language:
A program written in a high-level language (HLL) such as C may contain preprocessor
directives such as #include or #define. These lines, which begin with #, tell the
preprocessor what to do.
Pre-Processor:
The preprocessor removes all #include directives by inserting the contents of the named
files (file inclusion) and removes all #define directives by substituting their definitions
(macro expansion). It performs file inclusion, augmentation, macro processing, etc.
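File inclusion and macro expansion can be approximated with a short text-processing sketch. This is not how a real C preprocessor is implemented. It handles only #include "name" and object-like #define, and it reads files from an in-memory dictionary. The function name preprocess and the files mapping are hypothetical.

```python
import re

def preprocess(source, files, macros=None):
    """Expand #include "name" and object-like #define directives in `source`.

    `files` is a hypothetical in-memory mapping from file name to contents,
    standing in for the file system. Function-like macros, conditionals,
    and include guards are not handled.
    """
    macros = {} if macros is None else macros
    output = []
    for line in source.splitlines():
        stripped = line.strip()
        include = re.fullmatch(r'#include\s+"([^"]+)"', stripped)
        define = re.fullmatch(r"#define\s+(\w+)(?:\s+(.*))?", stripped)
        if include:
            # File inclusion: splice in the (recursively preprocessed) file contents.
            included = preprocess(files[include.group(1)], files, macros)
            output.extend(included.splitlines())
        elif define:
            # Record the macro; the directive itself produces no output line.
            macros[define.group(1)] = define.group(2) or ""
        else:
            # Macro expansion: replace whole-word occurrences of each macro name.
            for name, body in macros.items():
                line = re.sub(rf"\b{re.escape(name)}\b", lambda _match: body, line)
            output.append(line)
    return "\n".join(output)
```

If a file limits.h defines MAX as 100, a program that includes it and then declares int a[MAX]; comes out of this sketch with no directives left and with MAX replaced by 100.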
Assembly Language:
Assembly language is neither binary nor high level. It is an intermediate form that
combines machine instructions with other useful data needed for
execution.
Assembler:
Every platform (hardware + OS) has its own assembler. It translates assembly
language into machine code. The output of the assembler is called an object file.
Interpreter:
An interpreter converts high-level language into low-level machine language, just like a compiler.
A compiler reads the entire program and translates it as a whole into machine code before the
program is run, whereas an interpreter translates and executes the program one statement at a time.
Interpreted programs are usually slower than compiled ones.
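The difference between whole-program and statement-at-a-time translation can be observed with Python's built-in compile and exec functions. This is only an analogy: compile produces Python bytecode rather than machine code, and the two helper functions are hypothetical names. The line-by-line helper also assumes every statement fits on one line.

```python
def run_whole_program(source, log):
    """Translate the entire program first; run it only if translation succeeds."""
    code = compile(source, "<program>", "exec")  # a syntax error anywhere stops us here
    exec(code, {"log": log})

def run_line_by_line(source, log):
    """Translate and run one statement at a time, stopping at the first error."""
    env = {"log": log}
    for line in source.splitlines():
        exec(compile(line, "<line>", "exec"), env)

# A three-line program whose last line is malformed.
PROGRAM = "log.append('first')\nlog.append('second')\nlog.append(("
```

Running PROGRAM through run_whole_program raises a SyntaxError before any statement executes, so the log stays empty. Running it through run_line_by_line executes the first two statements and fails only when it reaches the third.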
For example, suppose the source program contains #include "stdio.h". The preprocessor replaces this
directive with the contents of that file in its output.
The basic work of a linker is to merge object codes (which have not yet been connected) produced by the
compiler, the assembler, standard library functions, and operating system resources.
The code generated by the compiler, assembler, and linker is generally relocatable: its starting
location is not fixed, so it can be placed anywhere in the computer's memory. The basic task of the
loader is therefore to calculate the exact addresses of these memory locations.
Relocatable Machine Code:
It can be loaded at any address and run. Addresses within the program are expressed in such a way
that they remain valid when the program is moved.
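One common way to make code relocatable is to assemble it as if it started at address 0 and to record which words hold addresses, so that they can be adjusted at load time. The sketch below assumes such a simplified object format. The names relocate, relocation_offsets, and load_address are hypothetical.

```python
def relocate(code, relocation_offsets, load_address):
    """Return a copy of `code` with each address word adjusted for the load address.

    `code` is a list of integer words assembled as if the program started at
    address 0; `relocation_offsets` lists the indexes of the words that hold
    addresses. Both form a hypothetical, simplified object format.
    """
    absolute = list(code)  # leave the relocatable original untouched
    for offset in relocation_offsets:
        absolute[offset] += load_address
    return absolute
```

For example, if the words at indexes 1 and 3 of [10, 4, 20, 0] are addresses and the program is loaded at address 1000, the result is [10, 1004, 20, 1000]. Words that are not addresses are left unchanged.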
Loader/Linker:
The linker combines a variety of object files into a single executable file. The loader then converts
the relocatable code into absolute code, loads it into memory, and tries to run it, resulting in a
running program or an error message (or sometimes both).
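The merging step can be sketched as two passes over the object modules: the first records where every defined symbol ends up in the combined image, and the second replaces each symbol reference with that location. The module format below (a dict with "code" and "defines") and the function name link are assumptions for illustration, far simpler than real object file formats.

```python
def link(objects):
    """Concatenate object modules and resolve symbol references to final offsets.

    Each module is a dict in a hypothetical format:
        "code":    list of words; a string word is a reference to a symbol name
        "defines": mapping from symbol name to its offset inside this module
    """
    symbols = {}
    base = 0
    # Pass 1: assign each module a base offset and record where its symbols end up.
    for obj in objects:
        for name, offset in obj["defines"].items():
            if name in symbols:
                raise ValueError(f"duplicate symbol {name!r}")
            symbols[name] = base + offset
        base += len(obj["code"])
    # Pass 2: copy the code, replacing symbol references with resolved offsets.
    image = []
    for obj in objects:
        for word in obj["code"]:
            if isinstance(word, str):
                if word not in symbols:
                    raise ValueError(f"undefined symbol {word!r}")
                image.append(symbols[word])
            else:
                image.append(word)
    return image
```

Linking a module with code [1, "helper", 2] to a second module with code [7, 8] that defines helper at its offset 1 gives [1, 4, 2, 7, 8], because the second module starts at offset 3 in the combined image. A reference to a symbol that no module defines is reported as an error, which is one source of the error messages mentioned above.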
Types of Compiler
There are mainly three types of compilers.
Single Pass Compilers
Two Pass Compilers
Multipass Compilers
Single Pass Compiler
When all the phases of the compiler are present inside a single module, it is called a single-pass
compiler. It converts source code to machine code in one pass over the program.
Two Pass Compiler
A two-pass compiler translates the program in two passes: the first pass is performed by the front end
and the second by the back end.
Multipass Compiler
When several intermediate codes are created for a program and the syntax tree is processed many times,
the compiler is called a multipass compiler. It breaks the code into smaller programs.
Phases of a Compiler
There are two major phases of compilation (commonly called analysis and synthesis), each of which has many
parts. Each part takes the output of the previous one as its input, and the parts work in a coordinated way.
Constant folding in compiler design is a compiler optimization technique that eliminates expressions of the code
whose value can be computed before executing the code.
1. If the value of an expression can be determined at compile time instead of being computed at run
time, this technique replaces the expression with its value, which makes the code more efficient.
2. In effect, multiple constants are folded together and evaluated at compile time. Constant folding can
be applied to the following data types:
a. Boolean values.
b. Integers, except where the operation would raise an error such as division by zero.
c. Floating-point values, with caution regarding rounding.
Suppose the code contains the statement: a = 500*900+30
The compiler will not generate two instructions (one multiply and one add). Instead, it computes the value
450030 at compile time and replaces the statement with:
a = 450030
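The folding of this statement can be reproduced with a small sketch built on Python's standard ast module (ast.unparse requires Python 3.9 or later). It folds only integer constants and a few operators, and it deliberately leaves division by zero alone. The names ConstantFolder, FOLDABLE, and fold_constants are made up for the example, and production compilers typically fold on their own intermediate representation rather than on source text.

```python
import ast
import operator

# Operators this sketch is willing to fold (an illustrative subset).
FOLDABLE = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
}

class ConstantFolder(ast.NodeTransformer):
    """Replace binary operations on integer constants with their computed value."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first (bottom-up)
        fold = FOLDABLE.get(type(node.op))
        left, right = node.left, node.right
        if (fold is not None
                and isinstance(left, ast.Constant) and type(left.value) is int
                and isinstance(right, ast.Constant) and type(right.value) is int):
            if isinstance(node.op, ast.FloorDiv) and right.value == 0:
                return node  # leave division by zero to fail at run time
            folded = ast.Constant(value=fold(left.value, right.value))
            return ast.copy_location(folded, node)
        return node

def fold_constants(source):
    """Parse `source`, fold constant integer expressions, and return new source text."""
    tree = ConstantFolder().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))
```

Calling fold_constants("a = 500*900+30") returns "a = 450030". An expression that involves a variable, such as x * 2 + 3, is returned unchanged because its value is not known at compile time.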
When is Constant Folding Applied in Compiler Design?
Constant folding is applied:
During the Intermediate Code Generation phase of the compiler, which generates an intermediate
representation of source code.
After other optimizations that generate constant expressions, which can be eliminated by constant
folding.