3/9/2014
Outline
Lectures
Sunday @ 10:10 AM
Monday @ 11:50 AM
CS 321 - Compilers
Teaching Assistant
Eng. Rania Salama
Textbook
Dr. Nagia M. Ghanem
Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools Second
Edition.
Grading
Sheets and Programming Project 20%
Midterm Exam 20%
Final Exam 60%
2
Programming Languages:
Notations for describing computations to people and
machines
Need to be translated to executable languages
Motivation and Objectives
Language Processors
Compiler:
Program that translates from one programming
language to another programming languages (maybe
machine language).
Structure of a Compiler
3/9/2014
Be able to build a
(programming) language
compiler
for
(simplified)
Interpreters
Study the theory behind compilers
Language theory
Algorithms
Software engineering
Machine architecture
Know how to use compiler construction tools, such as
generators of scanners and parsers
Compilers
learn how to work on a larger software project
.. . develop a new language!
Directly execute the operations implied by the source
program on inputs supplied by the user
Source
Program
Interpreter
Output
Translate a program written in a source language into a
semantically equivalent program written in a target
language
Report any errors in the source program that it detects
during the translation process
Input
Input
Source
Program
Error messages
Compiler
Error messages
Target
Program
Output
3/9/2014
Source Program
Compiler Target program produced by the
compiler is much faster in mapping input to
outputs than an interpreter.
Interpreter better error diagnostics than
compiler as it executes the source program
statement by statement.
Modified source program
2
Target Assembly
program
What about Java language processors?
Pre-Processor
Compiler
Assembler
Relocatable
Machine Code
Loader
Link/Editor
Library,
relocatable object files
10
Target machine code
9
Preprocessors
Ex: Macro Processing
#define in C: does text substitution before compiling
#define X 3
Loader: taking relocatable machine code, altering the
addresses and placing the altered instructions into
memory.
Link-editor: taking many (relocatable) machine code
programs (with cross-references) and produce a
single file.
#define Y A*B+C
Need to keep track of correspondence between variable
names and corresponding addresses in each piece of code.
11
12
3/9/2014
Phases of a Compiler
Source Program
Analysis Part (Front end) breaks up the source program
1
into pieces and imposes a grammatical structure on them. It
then create an intermediate representation of the source
program.
It creates the symbol table.
It should report errors and provide informative messages.
3
Symbol-table
Manager
Semantic Analyzer
Intermediate Code
Generator
Code Optimizer
Code Generator
Target Program
13
14
Phase 2. Parsing or Syntax Analysis
Phase 1. Lexical Analysis
Easiest Analysis - Identify tokens which are the
basic building blocks
identifier
:=__ initial
*_ 60
_______ _+ rate
___
__ _;
position
:=
expression
identifier
All are tokens
initial
Blanks, Line breaks, etc. are scanned out
For previous example,
we would have
Parse Tree:
assignment
statement
For Example:
Position
__________
Syntax Analyzer
Error Handler
4
Synthesis Part (Back end) constructs the desired target
program from the intermediate representation and the
information in the symbol table.
Lexical Analyzer
expression
+
expression
*
expression
expression
identifier
rate
number
60
Nodes of tree are constructed using a grammar for the language
15
16
3/9/2014
Grammar is a Set of Rules Which Govern the Interdependencies & Structure
Among the Tokens
is an
statement
Lexical Analysis - Scans Input, Its Linear Actions Are Not
Recursive
assignment statement, or while
statement, or
Identify Only Individual words that are the the Tokens of the Language
if statement, or ...
assignment statement
is an
expression
is an
identifier := expression ;
Recursion Is Required to Identify Structure of an Expression,
As Indicated in Parse Tree
(expression), or
Verify that the words are Correctly Assembled into sentences
expression + expression, or
expression * expression, or
number, or
17
identifier, or ...
18
Most Important Activity in This Phase:
Find More Complicated Semantic Errors and Support Code Generation
Type Checking - Legality of Operands
Parse Tree Is Augmented With Semantic Actions
:=
:=
position
Many Different Situations:
position
initial
*
rate
60
Real := int + char ;
initial
A[int] := A[real] + int ;
*
rate
while char <> int do
inttoreal
. Etc.
60
Compressed Tree
Conversion Action
19
20
3/9/2014
Phases of a Compiler
Source Program
Symbol Table Creation / Maintenance
Contains info (storage, type, scope, args) on each
Meaningful token, typically identifiers
Data structure created/initialized during lexical analysis
Utilized/updated during later analysis & synthesis
3
Symbol-table
Manager
Lexical Analyzer
Syntax Analyzer
Semantic Analyzer
Error Handler
Error Handling
5
Detection of different errors which correspond to all phases
What kinds of errors are found during the analysis phase?
What happens when an error is found?
Intermediate
Code Generator
Code Optimizer
Code Generator
Target Program
21
22
position := initial + rate * 60
Intermediate Code Generation
lexical analyzer
id1 := id2 + id3 * 60
Abstract Machine Version of Code - Independent of Architecture
syntax analyzer
Easy to Produce and easy to translate into target program
:=
+
id1
id2
Code Optimization
*
id3
60
semantic analyzer
Find More Efficient Ways to Execute Code
Replace Code With More Optimal Statements
:=
Symbol
Table
position ....
Final Code Generation
id1
id2l
*
id3
rate.
Generate Relocatable Machine Dependent Code
23
inttoreal
60
initial .
intermediate code generator
E
r
r
o
r
s
24
3/9/2014
Front End : Analysis + Intermediate Code Generation
Symbol Table
E
r
r
o
r
s
position ....
initial .
intermediate code generator
rate.
temp1 := inttoreal(60)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
vs.
Back End : Code Generation + Optimization
Number of Passes:
3 address code
A pass: requires r/w intermediate files
code optimizer
temp1 := id3 * 60.0
id1 := id2 + temp1
Fewer passes: more efficiency.
However: fewer passes require more sophisticated
memory management and compiler phase interaction.
final code generator
MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1
Front End
Tradeoffs ..
25
Back End
scanning
parsing
sem. analysis
Parser Generators:
code generation
Produce Syntax Analyzers
intermediate
representation
language-dependent
Java
C
Pascal
26
Scanner Generators:
machine-dependent
Produce Lexical Analyzers
Pentium
PowerPC
SPARC
Syntax-directed Translation Engines:
any combination possible
Generate Intermediate Code
Advantages
Disadvantages
better portability
many combinations between front ends
and back ends possible
optimizations are easier on the intermediate
representation than on source code
slower
needs more memory
Automatic Code Generators:
Generate Actual Code
27
28
3/9/2014
In this course, we will explore the front-end of a compiler
in details
Reading - Textbook, Chapter 1
For these languages (Ruby, Perl, Python, one of your
choice) , answer the following questions:
Lexical Analysis
Syntax Analysis
Semantic Analysis
Intermediate Code Generation
Is the language compiled or interpreted?
What is the formal description of identifiers?
What is the syntax of the for loop , if found ?
29
30