Module 1
Introduction to Compilers
Organization of a computer
Categorization of System Software
Text Editors
Compilers
Assemblers
Linkers
Loaders
Macro processor
Compilers
Definition: a program that converts instructions into a
machine-code or lower-level form so that they can be
read and executed by a computer
A compiler is a computer software that transforms
computer code written in one programming language
(the source language) into another programming
language
A compiler is a software program that transforms
high-level source code that is written by a developer in
a high-level programming language into a low level
object code (binary code) in machine language, which
can be understood by the processor. The process of
converting high-level programming into machine
language is known as compilation.
Fig: Structure of a compiler
Fig: Structure of a compiler
Fig: Structure of an interpreter
Difference between compiler and interpreter
INTERPRETER COMPILER
Translates program one statement at a Scans the entire program and translates it
time. as a whole into machine code.
It takes less amount of time to analyze the It takes large amount of time to analyze
source code but the overall execution time the source code but the overall execution
is slower. time is comparatively faster.
Generates intermediate object code which
No intermediate object code is generated,
further requires linking, hence requires
hence are memory efficient.
more memory.
Continues translating the program until the It generates the error message only after
first error is met, in which case it stops. scanning the whole program. Hence
Hence debugging is easy. debugging is comparatively hard.
Programming language like Python, Ruby Programming language like C, C++ use
use interpreters. compilers.
History of Compilers
Corrado Bohm – 1951
Navy Electronics Laboratory
International ALGOL Compiler - NELIAC
Lisp
Forth
Source program
Preprocessor
Modified Source program
Compiler
Target Assembly Program
Assembler
Relocatable Machine Code
Linker/Loader
Target Machine Code
Fig: Language Processing System
Evolution of Programming Languages
Machine language as in sequence of 0’s and 1’s –
1940
◦ Move data
◦ Compare two values
◦ Hard to understand and modify
Mnemonic Assembly Language – 1950
Higher level languages – more user friendly, less
error prone, easy modification
◦ Fortran – Scientific Calculation
◦ Cobol – Business data processing
◦ Lisp – symbolic computation
Generation of Programming Languages
First Generation
◦ Machine languages
Second Generation
◦ Assembly languages
Third Generation
◦ Fortran, Cobol, Lisp, C, C++, C#, Java
Fourth Generation
◦ NOMAD, SQL, Postscript
Fifth Generation
◦ Prolog
Impact on Compilers
Advances in programming language – new
algorithms and representations
Minimize the execution overhead
Compiler development is challenging
Generation of optimal target code
Development of Compilers
Formulation of abstract mathematical model
◦ Problem analysis
◦ Formulation of mathematical abstraction
◦ Solve using mathematical techniques
Preserve the meaning of the program under translation
Models used are: finite-state machines, regular
expressions, context free grammars
◦ Correct, with meaning preserved, improvement in
performance – speed of code execution, efficient in terms
of time and efforts, minimize power consumption and
usable
Code optimization
Correctness of the optimized code
Improve the performance of the program
◦ Speed of program execution
◦ Minimize power consumption
◦ Usable
Reasonable compilation time
◦ Less compilation time and debugging cycle
Manageable engineering effort
◦ Prioritize optimization
Structure of a Compiler
Lexical Analysis
LA or Scanners reads the source program one
character at a time – group of meaningful
sequence of characters – lexeme
Output – token
◦ <token_name, attribute_value>
A token describes a pattern of characters having
same meaning in the source program
Puts information about identifiers into the symbol
table not all attributes
Regular expressions are used to describe tokens
Blank spaces removed
DFA
Syntax Analysis
Parsing / Parser
Produces a syntactic structure of the given
program – syntax tree or parse tree
Syntax tree – terminals and non terminals
Intermediate representation – grammatical
structure
Syntax tree
◦ Interior nodes represent operations
◦ Child nodes represent the arguments of operation
Semantic Analysis
A semantic analyzer checks the source program
for semantic errors and collects the type
information for the code generation
Uses both syntax tree and symbol table
Performs type checking
Determines the meaning of the source string
Applications of Compiler Technology
Implementation of high level programming languages
Optimization for computer architecture
◦ Parallelism
◦ Memory hierarchy
Design of new computer architectures
Program Translations
Software Productivity Tools