Compiler Design
Compiler Construction 1
The text book is:
Alfred V. Aho, Ravi Sethi and Jeffry D. Ulman,
Compilers Principles, Techniques and Tools, Addison
Wesley, 2007,
Compiler Construction 2
Language Processing System
Compiler Construction 3
Language Processors.
A translator inputs and then converts a source program
into an object or target program.
Source program is written in a source language
Object program belongs to an object language
A translators could be: Assembler, Compiler,
Interpreter
Assembler:
Assembler
source program object program
(in assembly language) (in machine
language)
Compiler Construction 4
Language Processors.
a compiler is a program that can read a program in one language the source
language - and translate it into an equivalent program in another language - the
target language;
An important role of the compiler is to report any errors in the source program that
it detects during the translation process
If the target program is an executable machine-language program, it can then be
called by the user to process inputs and produce outputs;
Compiler Construction 5
Interpreter
•An interpreter is another common kind of language processor. Instead of producing a
target program as a translation, an interpreter appears to directly execute the operations
specified in the source program on inputs supplied by the user
The machine-language target program produced by a compiler is usually much faster
than an interpreter at mapping inputs to outputs . An interpreter, however, can usually
give better error diagnostics than a compiler, because it executes the source program
statement by statement.
Compiler Construction 6
Definition of Compiler
- Compiler: translates a source program written in a High-
Level Language (HLL) such as Pascal, C++ into
computer’s machine language (Low-Level Language
(LLL)).
* The time of conversion from source program into
object program is called compile time
* The object program is executed at run time
Compiler Construction 7
Overview of Compilers
Compilation Process:
Data
Source Object Executing Results
Compiler
program program Computer
Compile time run time
Compiler Construction 8
Tasks of A Compiler
A compiler must perform two tasks:
- analysis of source program: The analysis part breaks up
the source program into constituent pieces and imposes a
grammatical structure on them. It then uses this structure to
create an intermediate representation of the source program.
- synthesis of its corresponding program: constructs the
desired target program from the intermediate representation
and the information in the symbol table.
The analysis part is often called the front end of the
compiler; the synthesis part is the back end.
Compiler Construction 9
Phases of Compiler
The compilation process is a sequence of various phases.
Each phase takes input from its previous stage, has its own
representation of source program, and feeds its output to the
next phase of the compiler.
Let us understand the phases of a compiler.
Compiler Construction 10
Phases of Compiler
The phases of a compiler can be given as follows:
o • Lexical analysis.
o • Syntax analysis.
o • Semantic analysis.
o • Intermediate code generation.
o • Code optimization.
o • Code generation.
o Code optimization.
Compiler Construction 11
Phases of Compilation Process and Its Output
Error
handler
Compiler phases
Compiler Construction 12
Lexical Analysis (scanner): The first phase of a compiler
Lexical analyzer reads the stream of characters making up the
source program and groups the characters into meaningful
sequences called lexeme
For each lexeme, the lexical analyzer produces a token of the form
that it passes on to the subsequent phase, syntax analysis
(token-name, attribute-value)
Token-name: an abstract symbol is used during syntax analysis
attribute-value: points to an entry in the symbol table for this
token.
Compiler Construction 13
Example: position =initial + rate * 60
1.”position” is a lexeme mapped into a token (id, 1), where id is an abstract symbol
standing for identifier and 1 points to the symbol table entry for position. The
symbol-table entry for an identifier holds information about the identifier, such as its
name and type.
2. = is a lexeme that is mapped into the token (=). Since this token needs no attribute-
value, we have omitted the second component. For notational convenience, the
lexeme itself is used as the name of the abstract symbol.
3. “initial” is a lexeme that is mapped into the token (id, 2), where 2 points to the
symbol-table entry for initial.
4. + is a lexeme that is mapped into the token (+).
5. “rate” is a lexeme mapped into the token (id, 3), where 3 points to the symbol-table
entry for rate.
6. * is a lexeme that is mapped into the token (*) .
7. 60 is a lexeme that is mapped into the token (60)
Blanks separating the lexemes would be discarded by the lexical analyzer.
14
Syntax Analysis (parser) : The second phase of the compiler
The next phase is called the syntax analysis or parsing. It takes the token
produced by lexical analysis as input and generates a parse tree (or syntax
tree).
In this phase, token arrangements are checked against the source code
grammar, i.e. the parser checks if the expression made by the tokens is
syntactically correct.
A typical representation is a syntax tree in which each interior node
represents an operation and the children of the node represent the
arguments of the operation
15
Semantic Analysis: Third phase of the compiler
Semantic analysis checks whether the parse tree constructed follows
the rules of language.
For example, assignment of values is between compatible data types,
and adding string to an integer.
Also, the semantic analyzer keeps track of identifiers, their types and
expressions; whether identifiers are declared before use or not etc.
The semantic analyzer produces an annotated syntax tree as an output.
Compiler Construction 16
Intermediate Code Generation: three-address code
After semantic analysis the compiler generates an intermediate
code of the source code for the target machine.
It represents a program for some abstract machine.
It is in between the high-level language and the machine
language.
This intermediate code should be generated in such a way that
it makes it easier to be translated into the target machine code.
17
Code Optimization: to generate better target code
The next phase does code optimization of the intermediate code.
Optimization can be assumed as something that removes unnecessary
code lines, and arranges the sequence of statements in order to speed up
the program execution without wasting resources (CPU, memory).
The machine-independent code-optimization phase attempts to improve
the intermediate code so that better target code will result.
Usually better means:
faster, shorter code, or target code that consumes less power.
18
Code Generation: takes as input an intermediate representation of
the source program and maps it into the target language
In this phase, the code generator takes the optimized representation of
the intermediate code and maps it to the target machine language.
The code generator translates the intermediate code into a sequence of
(generally) re-locatable machine code.
Sequence of instructions of machine code performs the task as the
intermediate code would do.
19
Symbol-Table Management:
It is a data-structure maintained throughout all the phases
of a compiler.
All the identifier's names along with their types are stored
here.
The symbol table makes it easier for the compiler to
quickly search the identifier record and retrieve it.
The symbol table is also used for scope management.
Whenever an identifier is detected in any of the phases, it
is stored in the symbol table.
20
Example
int a, b; float c; char z;
Symbol name Type Address
A Int 1000
B Int 1002
C Float 1004
Z Char 1008
Compiler Construction 21
Error Handling
• Each phase can encounter errors. After detecting an error,
a phase must handle the error so that compilation can
proceed.
• In lexical analysis, errors occur in separation of tokens.
• In syntax analysis, errors occur during construction of
syntax tree.
• In semantic analysis, errors may occur at the following
cases:(i) When the compiler detects constructs that have
right syntactic structure but no meaning
22
(ii) During type conversion.
In code optimization, errors occur when the result is affected
by the optimization. In code generation, it shows error when
code is missing etc.
Compiler Construction 23
Translation of an assignment
statement
Compiler Construction 24
Parse Tree and Symbol Table
Parse tree defines the program structure; how to combine parts of the
program to produce larger part and so on.
Symbol table provides
- the associations between all occurrences of each
name given in the program.
- It provides a link between each name and it
declaration.
Compiler Construction 25
Grouping of Phases
The phases of a compiler can be grouped as:
Front end
o Front end of a compiler consists of the phases
o • Lexical analysis.
• Syntax analysis.
• Semantic analysis.
Back end
o Back end of a compiler contains
o • Code optimization.
• Code generation.
Compiler Construction 26
Front End
• Front end comprises of phases which are dependent on the input
(source language) and independent on the target machine (target
language).
• It includes lexical and syntactic analysis, symbol table management,
semantic analysis and the generation of intermediate code.
• Code optimization can also be done by the front end.
• It also includes error handling at the phases concerned.
Compiler Construction 27
Back End
• Back end comprises of those phases of the compiler that are
dependent on the target machine and independent on the source
language.
• This includes code optimization, code generation.
• In addition to this, it also encompasses error handling and symbol
table management operations.
Compiler Construction 28
Common Back-end Compiling System
Fortran C/C++ Pascal Cobol
Common IR (e.g. Ucode)
Optimizer
Target Machine Code Gen
Compiler Construction 29
Compiling Passes
Several phases can be implemented as a single pass
consist of reading an input file and writing an output
file.
A typical multi-pass compiler looks like:
First pass: preprocessing, macro expansion
Second pass: syntax-directed translation, IR code
generation
Third pass: optimization
Last pass: target machine code generation 30
Passes
• The phases of compiler can be implemented in a single pass by
marking the primary actions viz. reading of input file and writing to
the output file.
• Several phases of compiler are grouped into one pass in such a
way that the operations in each and every phase are incorporated
during the pass.
• (eg.) Lexical analysis, syntax analysis, semantic analysis and
intermediate code generation might be grouped into one pass. If so,
the token stream after lexical analysis may be translated directly
into intermediate code.
Compiler Construction 31
Reducing the Number of Passes
• Minimizing the number of passes improves the time efficiency as
reading from and writing to intermediate files can be reduced.
• When grouping phases into one pass, the entire program has to be
kept in memory to ensure proper information flow to each phase
because one phase may need information in a different order than the
information produced in previous phase.
The source program or target program differs from its internal
representation. So, the memory for internal form may be larger than
that of input and output.
Compiler Construction 32