SESSION 1
SESAP ZC373 Compiler Design
Dr. Rajashekara Murthy S
B.E., M.Tech., Ph.D.
BITS Pilani rajashekaramurthy@wilp.bits-pilani.ac.in
Pilani|Dubai|Goa|Hyderabad
1
Course Description
This course exposes the students to the principles behind the construction of
compilers. The various stages of the compilation process will be introduced
systematically with the design/implementation alternatives and their associated
costs & complexity.
The practice of compiler construction changes continually. This course emphasizes
the core principles guiding most of the compilers. Construction of scanners using
the theory of regular languages, construction of parsers by LL/LR techniques,
various intermediate representations of the code being translated, optimizations
and the code generation for target architecture are discussed.
Course Objectives
CO1 Introduce various language translation models and the specific scenarios
where they are useful.
CO2 Introduce various stages in compilations and the related issues.
CO3 Introduce formal methods (RE/CFG/DFA / PDA) in specifying and
recognizing language grammars.
CO4 Introduce type systems and implementing simple type checkers
CO5 Introduce various forms of intermediate code and Intermediate code
generation
CO6 Run-time Memory Layout for Languages with Functions / Procedures,
Heap Management
CO7 Introduce issues in code generation and approaches in code generation
and optimizations.
REFERENCE TEXTS
1. Programming Languages – Concepts and Constructs.
By Ravi Sethi, 2nd Edition. Pearson Education Asia.
2. Modern Compiler Implementation in C.
By Andrew Appel. Cambridge University Press.
3. Compilers – Principles, Techniques and Tools
By Alfred V Aho, Ravi Sethi, Jeffrey D Ullman Addison wesley, 2000.
4. Compiler Construction using Flex and Bison
By Anthony A. Aaby, Walla Walla College, Feb 25, 2004
5. Let's Build a Compiler,
By Jack Crenshaw
4
Course Coverage
Consists of SEVEN broad Topics
1. Introduction to Languages & Compiling
2. Lexical Analysis
3. Syntax Analysis
4. Syntax Directed Translation
5. Run Time Environment
6. Intermediate Code Generation
7. Machine Code Generation & Optimization
5
Course Coverage ( in Detail )
1. Introduction to languages & compiling
1. Programming Languages – Why What & How?
2. Compiler – its modules & Phases
3. Overview of each Modules
4. Accessories of a Compiler
2. Lexical Analysis
1. The Role of a Lexical Analyser
2. Input Buffering
3. Specifications of Tokens
4. Recognition of Tokens
5. A Language for Specifying Lex. Analyzers
………. Contd. 6
Course Coverage ( in Detail )
3. Syntax Analysis
1. The Role of a Parser
2. Context Free Grammar & Writing a Grammar
3. Parsing – Top Down, Bottom up & Operator Precedence
4. LR Parsers & Parser Generators
5. Semantic analysis
4. Syntax Directed Translation
1. Syntax Directed Definitions & Construction of Syntax Trees
2. Bottom up Evaluation of S-attributed Definitions
3. L-attributed Definitions
4. Top Down Translation
5. Bottom up Evaluation of Inherited Attributes
………. Contd. 7
Course Coverage ( in Detail )
5. Run Time Environments
1. Source Language Issues
2. Storage Organization & Allocation Strategies
3. Access to Non-local Names
4. Parameter Passing
5. Symbol Tables
6. Language Facilities for Dynamic Storage Allocation
7. Dynamic Storage Allocation Techniques
8. Storage Allocation in Fortran
………. Contd. 8
Course Coverage ( in Detail )
6. Intermediate Code generation
1. Intermediate languages
2. Declarations & Assignment statements
3. Boolean expressions & Case statements
7. Code Generation and Optimization
1. Issues In The Design Of Code Generators
2. The Target Machine & Run Time Storage Management
3. Basic Blocks And Flow Graphs
4. Next-use Information
5. A Simple Code Generator
6. Register Allocation & Assignment
7. The Dag Representation Of Basic Blocks
9
–1–
INTRODUCTION
10
Programming Languages – What ?
1. Are notations used for specifying organizing and reasoning about
computations
2. A set of rules that specify which sequences of symbols constitute
a program and what computation the program describes (A
program being a sequence of symbols that specifies a
computation)
3. An abstraction mechanism that enables a programmer to specify
a computation abstractly, and to let a program (usually called an
assembler, compiler or interpreter) implement the specification
in the detailed form needed for execution on a computer.
11
A Programming Language is defined by:
1. SYNTAX:
Decides whether a sentence in a language is well-formed
The sun is shining √ Shining is the sun. ×
2. SEMANTICS
Determines the meaning, if any, of a syntactically well-formed
sentence
I read a book. √ The house reads a book. ?
3. GRAMMAR
A formal system that provides a generative finite description of
the language
12
Programming Languages – Why ?
• Machine Code is unintelligible
Consists of only 0 and 1
• Assembly Language is “Low level”
Readable but Abstraction level is so low that the description of the problem
becomes convoluted
• Programming Languages provide benefits like:
Readable familiar notations ( Easily understandable)
Machine independence (Portability)
Programming task made easy with
• Separation of data and Instructions (Logic)
• Higher level of intelligence with error checks reducing errors &
• Availability of pre-packaged code - Libraries
13
Programming Languages – Why so many ?
• The W I D E R A N G E
of computational needs has prompted the creation of hundreds
of programming languages.
• Different classes of problems may demand different levels of
abstraction, and
• Different programmers have different ideas on how abstraction
should be done
• Language influences perception and perception creates new
language
New languages have in turn introduced new way of thinking
about programming & improved upon old ones
14
Programming Paradigms
1. Imperative Programming
• Action Oriented programming where computation is viewed as a sequence
of actions
• Most used languages in the industry
• E.g. FORTRAN, COBOL, PL/I , ALGOL PASCAL and C
2. Functional Programming
• Based on the mathematical concept of pure functions
• Computing with expressions that include conditionals & function definitions
• E.g. LISP and its several dialects, ISWIM, CLOS……
………. Contd. 15
Programming Paradigms
4. Object oriented Programming
• A method of structuring programs by identifying real world or
other objects, and coding them as objects containing all the data
and associated executable code
• A new kind of abstraction well suited for event driven systems
• E.g. SIMULA, SMALLTALK, C++, JAVA, C#,
5. Logic Programming
• programs are expressed as formulas in mathematical logic, and
the “compiler” attempts to infer logical consequences of these
formulas in order to solve problems
• Used in Natural language Processing
• E.g. PROLOG
………. Contd. 16
Programming Paradigms
6. Data driven languages
• languages with a preferred data structure and an extensive set of
operations for the preferred structure
• Made it possible to create sophisticated programs that were
otherwise difficult to write in languages such as Fortran
• E. g. APL, SNOBOL, SETL …..
7. Application Specific Programming
• Programming language specifically modeling the abstractions
needed by an Application Domain
• E.g. APT, NC Programming, SQL, ABAP etc…
---------------------------------------
17
Evolution Of Programming Languages
50’s 60’s 70’s 80’s 90’s 00’s
PROLOG Oberon – 2
COBOL PL / I PASCAL MODULA-2 MODULA-3
FORTRAN ALGOL C C ++ JAVA C#
SIMULA SMALLTALK
ALGOL-68 OCAML
LISP ML
MIRANDA HASKEL
SCHEME 18
Choice Of Programming Language (PL)
Depends on various technical factors like:
1. Suitability of the language to the application domain
2. Availability of the compiler on the target platform
AND
External factors like availability of :
3. Skills
4. Support
5. Training
6. User Constraints etc….
19
Source code
The Process of Program Development Preprocessor
Source program
Compiler
Target assembly program
Assembler
Relocatable machine code Library,
Loader/link editor Relocatable
Object Files
Absolute machine code
20
The Compiler
• A program that translates the code written according to a
language specs, to the code executable on the target
platform (well, Almost ! )
21
The Compiler
Does three main tasks:
1. TRANSLATION
2. VALIDATION
3. OPTIMIZATION
Any Other Tasks You can think of ?
22
The Compiler job 1 – TRANSLATION
• Must ensure that Source program and target program are
equivalent.
• Compile time: when the compiler program runs
• Run time: when the target program runs
• The programming language in which the compiler is
implemented is the implementation language
23
The Compiler job 2 – VALIDATION
• Rejects programs that have errors in them
• What sort of error may be detected by a compiler?
1. Static checks occur at compile time.
2. Dynamic checks occur at run time.
• A compiler cannot detect whether a program will
produce the output that the programmer expects.
24
The Compiler job 3 – OPTIMIZATION
• Tries to produce as good a target program as it can
• Criterion for goodness could be
1. How fast the target program runs (most common)
OR
2. Consumption of power and/or memory
• Differences among compilers depends on
• Their source and target languages,
as well as
• on the relative importance of these three jobs
25
The Compiler
• Does 3 Main tasks:
1. TRANSLATION
2. VALIDATION
3. OPTIMIZATION
• These 3 main Tasks can be further expanded to:
Twelve major Activities
or
Twelve Phases of Compilation Process
26
Phases of a Compilation
• Phases describe major activities that happen in a
compilation Process & the sequence in which it happens
• The Twelve major activities are:
1. Lex 7. Canonicalize
2. Parse 8. Instruction Selection
3. Semantic Actions 9. Control Flow Analysis
4. Semantic Analysis 10. Data Flow Analysis
5. Frame layout 11. Register Allocation
6. Translate 12. Code Gen
27
Phases of Compilation ( Defined )
1. Lex Break the source into words or tokens.
2. Parse Analyze the phrase structure of the program
3. Semantic Build a piece of abstract syntax tree
Actions corresponding to each phrase.
4. Semantic Check phrase meanings, use of variables with
Analysis definition, types of expressions
5. Framing Place variables, parameters in a frame
6. Translate Produce Intermediate Representation trees -
language & m/c independent
………. Contd. 28
Phases of Compilation ( Defined )
7. Canonical- Remove side effects and clean-up
-ize conditional branches
8. Instruction Group IR tree into lumps that correspond
Selection to the actions of the Target Machine
9. Control flow Analyze sequence of instructions into flow
analysis graph with all possible control flows
10.Data flow Live-ness analysis calculates the places
Analysis
where variable holds a still-needed value
11.Register Register to hold the variable & temporary
Allocation
values Concept register sharing.
12.Code Gen
Generate Final Machine Code
29
Phases of Compilation ( Input & Outputs)
INPUT PHASE OUTPUT
Source program LEX Tokens
Tokens PARSE Reductions
Reductions Semantic Actions Abstract Syntax
Variables, Types &
Abstract Syntax Semantic Analysis Expressions
Variables, Types &
Expressions Framing Frames
Frames Translate IR Trees
………. Contd. 30
Phases of Compilation ( Input & Outputs)
INPUT PHASE OUTPUT
IR Trees CANONICALIZE Revised IR Trees
Instruction
Revised IR Trees selection Assembly
Control flow
Assembly Analysis Flow Graph
Flow Graph Data Flow Analysis Interference Graph
Interference Graph Register Allocation Registers Assigned
All previous Outputs Code Gen Relocatable Code
31
Structure of a Compiler
• The twelve phases ( Activities) of Compilation are carried
out by:
Different Modules of a Compiler.
• These Modules may need to scan the source code
multiple times ( Known as Passes) to complete all the
twelve phases of compilation
• This, they do in two parts known as:
1. ANALYSIS Part & 2. SYNTHESIS Part
• The structure & functioning of these modules in one or
several pass(es) defines the Architecture of Compiler
32
Architecture of a COMPILER
LEXICAL ANALYSER
SYNTAX ANALYSER
SYMBOL
TABLE SEMANTIC ANALYSER ERROR
MANAGER HANDLER
INT. CODE GEN.
CODE OPTIMISER
CODE GENERATOR
33
Flow of data in Compiler modules
Source program
Intermediate-code Generator
Lexical Analyzer (Scanner)
Non-optimized Intermediate Code
Tokens
Intermediate-code Optimizer
Syntax Analyzer (Parser)
Optimized Intermediate Code
Parse tree
Target-code Generator
Semantic Analyzer
Target machine code
Abstract Syntax Tree w/ Attributes
34
LEXICAL ANALYSER
• Functionality
INPUT – Program text as a sequence of Characters
OUTPUT – Program Text as a sequence of symbols
(Tokens)
position = initial + rate * 60
id1:= id2 + id3 * 60
35
LEXICAL ANALYSER
• Tasks involved are
1. Read the input file
2. Compresses the text by Ignoring spaces (but not inside quotes)
3. Identify “TOKENS” – the language key words & identifiers
• Notation to denote the composition of tokens (regular expressions).
• Internal representation of token definition ( finite state machines ).
• Algorithms for identifying tokens.
4. Count line numbers
5. Report errors about symbols illegal in the language
• Tools available for defining tokens, which automatically
generate a program for identifying them (e.g., LEX)
36
SYNTAX
• The syntax of a programming language describes the
structure of programs without any consideration of their
meaning.
• The syntactic elements of a programming language are
determined by the computation model and pragmatic
concerns
• well developed tools (regular, context-free and attribute
grammars) for the description of the syntax of
programming language
37
SYNTAX ANALYSER
• Functionality
INPUT – Sequence of symbols (tokens)
OUTPUT – Structure of the program ( an abstract syntax
tree)
id1:= id2 + id3 * 60
=
id1 +
id2 *
id3 60 38
SYNTAX ANALYSER
• Checks the syntactical correctness of the program and
generates errors if any discrepancy is found
• Uses Notation to denote the syntax of a program (Bachus
Naur Form, Context-free grammar).
• Internal representation (Pushdown automata, Parse
trees).
• Several Algorithms for syntax analysis
• Tools for generating the syntax analyzer (YACC)
39
SEMANTIC ANALYSER
• Check consistency in types between the different
components of the expression.
• Check correspondence between parameters in procedure
calls.
INPUT – Abstract Syntax tree
= OUTPUT – Abstract tree
decorated with
attributes like Type of sub-
id1 + expressions & Symbol table
Symbol Table id2 *
Position
Initial inttoreal
id3
Rate
60 40
INTERMEDIATE CODE GENERATOR
• Generates intermediate code as a first step in translation
= INPUT – Abstract Syntax tree
OUTPUT – Intermediate Code
id1 +
id2 *
temp1 := inttoreal id3
inttoreal
(60);
temp2 := id3* temp1 60
temp3 := id2 + temp2;
temp4 := id2 +temp3; 41
CODE OPTIMISER
• Optimizes the code in terms of number of instructions
( and hence the execution time).
temp1 := inttoreal
(60);
INPUT – Intermediate code temp2 := id3*temp1
temp3 := id2 + temp2;
OUTPUT – Optimized
intermediate Code
temp4 := id2 +temp3;
id1 = temp4;
temp1 := id3 * 60;
id1 := id2 + temp1;
42
CODE GENERATOR
• Convets the logical code to machine specific Assembly
instructions
INPUT – Optimized temp1 := id3 * 60;
intermediate Code id1 := id2 + temp1;
OUTPUT – Corresponding
Assembly Code
MOVF id3,R2
MULF #60.0,R2
MOVF id2, R1
ADDF R2.R1
MOVF R1, id1 43
Symbol Table & Error Handling
• Symbol Table
• Needs to update in various stages.
• Define variables
• Check type consistency
• Enforce scope rules
• Generate run-time environment
• Error handling
• Important in all stages
• How to generate informative error messages (syntax error at line
17…).
• How to prevent error propagation (variable not declared, expression
not well defined, procedure call not well defined, etc.).
• How to best guess user’s intention?
44
Architecture of a COMPILER
ANALYSIS LEXICAL ANALYSER
Part
(A.k.a Front end)
SYNTAX ANALYSER
SYMBOL
TABLE SEMANTIC ANALYSER ERROR
MANAGER HANDLER
INT. CODE GEN.
SYNTHESIS
Part CODE OPTIMISER
(A.k.a Back end)
CODE GENERATOR
45
Accessories of Compiler
1. Editor
• An editor is a tool used to create and modify source files, which are files of
characters that comprise a program in a language.
2. Preprocessors
• Perform functions like Macro Processing, File Inclusion and Language
Extensions etc..
3. Assemblers
• Some compilers produce Assembly language code
• Instead of relocatable code normally produced
• Useful in development of cross compilers
………. Contd. 46
Accessories of Compiler (Contd.)
3. Linker
• Collects the object files of the components of the program and resolves
external references from one component to another in order to form an
executable file.
4. Loader
• Copies the executable file from the disk into memory, and initializes the
computer before executing the program.
5. Debugger
• A tool to control the execution of a program at the level of individual
statements, so that bugs can be diagnosed.
………. Contd. 47
Compiler & its Accessories in Context
48
Desired Qualities of a Compiler
Based upon as few artifacts
1. Simplicity
Functions should be controlled by
2. Orthogonality
independent mechanisms
3. Abstraction Self contained & well defined from basics
4. Information Hiding the details of data structures and
Hiding implementation
No interference or side effects,; Well
5. Safety protected
49
Desired Qualities of a Compiler
6. Uniformity Set of defined objects to be uniform
w. r. t. whole program
7. Clarity Each element of a set to be regular
8. Expressiveness Computational model to be universal
Extendibility
9. Modularity
In space and time
10. Efficiency
50
Compiler Construction Tools
1. Compiler-compilers
• Compiler Generators
• Translator Writing Systems
2. Parser Generators
3. Scanner Generators
4. Syntax Directed Translation Engines
5. Automatic Code Generators
6. Data Flow Engines
51
Interpreter – A Poor Cousin of Compiler
• Like Compilers
Interpreters also do translation of
source code to Machine code
BUT
Can you see a role
• Unlike Compilers
for the Interpreter
They do this translation:
1. One line at a time and ?
2. Execute that code immediately before taking up next line for
translation
Can not generate Object code that can be linked & run
independently
52
End of Chapter
- 1-
Introduction
53
Any Questions ????
Thank you
54
In Summary
Output = Program ( Input )
Program = Model of a problem domain
Execution of a program = simulation of the problem domain
A program is a collection of definitions of functions and a
computation is function application (evaluation of an expression).
Three basic computational models --Functional, Logic and Imperative
55