Chapter 1
Introduction to Compiler
Construction
Language Processing System
To create an executable target program several programs may
be required
Collecting the source
Program ( modules,
macros, etc.)
Assembly-language
program
2
Introduction
- Most programming is done using a high-level
programming language
- But this language can be very different from
the machine language that the computer can
execute.
3
Compiler
• A compiler translates a program written in a high-level
programming language that is suitable for human
programmers into the low-level machine language that is
required by computers
+
spots and reports obvious programmer mistakes.
4
Running the Target Program
• If the target program is an executable machine-
language program, it can then be called by the user
to process inputs and produce outputs.
5
Interpreter
• An interpreter is another way of implementing a programming
language.
• Interpretation shares many aspects with compiling (Lexing,
parsing and type-checking)
➔ But
Instead of producing a target program as a translation, an
interpreter appears to directly execute the operations specified
in the source program on inputs supplied by the user
6
Compiler vs Interpreter
• An interpreter may need to process the same piece of the
syntax tree (for example, the body of a loop) many times ➔
interpretation is slower than executing a compiled program.
• An interpreter executes the source program statement by
statement ➔ it can usually give better error diagnostics than
a compiler.
7
7
Interpretive Compilers
Why?
A tradeoff between fast(er) compilation and a reasonable runtime
performance.
How?
Use an “intermediate language”
• more high-level than machine code => easier to compile to
• more low-level than source language => easy to implement as an
interpreter
Example: A “Java Development Kit” for machine M
Java->JVM JVM
M M
8
The Phases of a Compiler
Phase Output Sample
Programmer (source code Source string A=B+C;
producer)
Scanner (performs lexical Token string ‘A’, ‘=’, ‘B’, ‘+’, ‘C’, ‘;’
analysis) And symbol table with names
Parser (performs syntax analysis Parse tree or abstract syntax ;
|
based on the grammar of the tree =
programming language) / \
A +
/ \
B C
Semantic analyzer (type Annotated parse tree or
checking, etc) abstract syntax tree
Intermediate code generator Three-address code, quads, or int2fp B t1
RTL + t1 C t2
:= t2 A
Optimizer Three-address code, quads, or int2fp B t1
RTL + t1 #2.3 A
Code generator Assembly code MOVF #2.3,r1
ADDF2 r1,r2
MOVF r2,A
Peephole optimizer Assembly code ADDF2 #2.3,r2
MOVF r2,A
The Grouping of Phases
• Compiler front and back ends:
– Front end: analysis (machine independent)
– Back end: synthesis (machine dependent)
• Compiler passes:
– A collection of phases is done only once (single pass) or
multiple times (multi pass)
• Single pass: usually requires everything to be defined before being
used in source program
• Multi pass: compiler may have to keep entire program
representation in memory
10
Compiler Architecture
In more detail:
Intermediate
Language
Source Target Language
Language Front End – Back End –
language specific machine specific
•Separation of Concerns
•Retargeting
11
Compiler Architecture
Intermediate
Intermediate
Language Language
Scanner Parser Semantic Code
Source Code Target
language
(lexical (syntax Analysis Generator language
Optimizer
analysis) analysis) (IC generator)
tokens Syntactic
structure
Symbol
Table
12
Lexical Analysis - Scanning
Scanner tokens Parser Semantic Code
Source (lexical (syntax Generator
Analysis
language analysis) analysis) (IC generator)
Code
Optimizer
• Tokens described formally
• Breaks input into tokens
Symbol
• Remove white space
Table
13
Input: result = a + b * c / d
• Tokens:
‘result’, ‘=‘, ‘a’, ‘+’, ‘b’, ‘*’, ‘c’, ‘/’, ‘d’
identifiers
operators
14
Static Analysis - Parsing
tokens Syntactic
Scanner Parser structure Semantic Code
Source Target
language
(lexical (syntax Analysis Generator
language
analysis) analysis) (IC generator)
Code
Optimizer
• Syntax described formally
• Tokens organized into syntax tree Symbol
that describes structure
Table
• Error checking (syntax)
15
Input: result = a + b * c / d
Exp ::= Exp ‘+’ Exp Assign
| Exp ‘-’ Exp
| Exp ‘*’ Exp ID ‘=‘ Exp
| Exp ‘/’ Exp
| ID Exp ‘+’ Exp
Assign ::= ID ‘=‘ Exp ID Exp ‘*’ Exp
ID Exp ‘/’ Exp
ID ID
16
Semantic Analysis
Syntactic/semantic
Syntactic structure
Scanner Parser structure Semantic Code
Source Target
language
(lexical (syntax Analysis Generator
language
analysis) analysis) (IC generator)
Syntactic/semantic
structure
Code
Optimizer
• “Meaning”
• Type/Error Checking
• Intermediate Code Generation –
abstract machine Symbol
Table
17
Optimization
Scanner Parser Semantic Code
Source Target
language
(lexical (syntax Analysis Generator
language
analysis) analysis) (IC generator)
Syntactic/semantic
structure
Syntactic/semantic
Code structure
Optimizer
• Improving efficiency (machine
independent)
• Finding optimal code is NP Symbol
Table
18
Code Generation
Syntactic/semantic
structure
Scanner Parser Semantic Code
Source Target
language
(lexical (syntax Analysis Generator language
analysis) analysis) (IC generator)
Syntactic/semantic
Code structure
Optimizer
• IC to real machine code
• Memory management, register
allocation, instruction selection, Symbol
instruction scheduling, … Table
19
Structure of a Compiler
Dividing into tokens (symbols :
variable name, keyword or number)
Producing a tree-structure that
reflects the structure of the
program.
Determining if the program
violates certain consistency requirements (if a
variable is used but not declared, use a boolean
value as a function pointer)
Translating the intermediate language to
assembly language (a textual representation of
machine code) for a specific machine
architecture.
Translating the assembly-language code into
binary representation and addresses of
variables, functions are determined.
20
Translation of an Assignment Statement
21
Issues Driving Compiler Design
• Correctness
• Speed (runtime and compile time)
– Degrees of optimization
– Multiple passes
• Space
• Feedback to user
• Debugging
22
Symbol Table Management
• The symbol table is a data structure containing
– a record for each variable name, with fields for the
attributes of the name (storage allocated for a name, its
type, its scope where in the program its value may be
used), and
– for procedure names (number and types of its arguments,
the method of passing each argument and the type
returned ).
• - The data structure should be designed to allow the
compiler to find the record for each name quickly
and to store or retrieve data from that record quickly.
23
Disciplines involved
• Algorithms
• Languages and machines
• Operating systems
• Computer architectures
24
Important Qualities in a Compiler
1. Correct code
2. Output runs fast
3. Compiler runs fast
4. Compile time proportional to program size
5. Support for separate compilation
6. Good diagnostics for syntax errors
7. Works well with the debugger
8. Good diagnostics for flow anomalies
9. Cross language calls
10.Consistent, predictable optimization 25
A bit of history
• 1952: First compiler (linker/loader) written by Grace
Hopper for A-0 programming language
• 1957: First complete compiler for FORTRAN by John
Backus and team
• 1960: COBOL compilers for multiple architectures
• 1962: First self-hosting compiler for LISP
26
Application of Compiler Technology
• Implementation of High-Level Programming
Languages
– The key ideas behind object orientation are
• 1 . Data abstraction and
• 2. Inheritance of properties,
• Optimizations for Computer Architectures
– Parallelism
– Memory Hierarchies
• Design of New Computer Architectures
– RISC
– CISC
– Specialized Architectures 27