Compiler Design (40-414)
Main Text Book: Compilers: Principles, Techniques & Tools, 2nd ed., Aho, Lam, Sethi, and Ullman, 2007 Evaluation: Midterm Exam Final Exam Assignments Project
35% 35% 10% 20%
Compiler learning
Isnt it an old discipline? Yes, it is a well-established discipline Algorithms, methods and techniques were developed in early stages of computer science There are many compilers around, and many tools to generate them automatically So, why we need to learn it? Although you may never write a full compiler But the techniques we learn is useful in many tasks like:
writing an interpreter for a scripting language, validation checking for forms, and so on
Terminology
Compiler: a program that translates an executable program in a source language (usually high level) into an equivalent executable program in a target language (usually low level) Interpreter: a program that reads an executable program and produces the results of running that program usually, this involves executing the source program in some fashion Our course is mainly about compilers but many of the same issues arise in interpreters
A Compiler
Source Program
Compiler Errors
Target Program
Input
Target Progtam
Output
An Interpreter
Source Program
Interpreter
Output
Input
Translates line by line Executes each translated line immediately Execution is slower because translation is repeated But, usually give better error diagnostics than a compiler
5
A Hybrid Compiler
Source Program
Translator
Errors
Intermediate Program
Input
Virtual Machine
Output
Classifications of Compilers
There are different types of Compilers:
Single Pass
Construction Multiple Pass Absolute (e.g., *.com) Relocateable (e.g., *.exe) Type of produced code
The Many Phases of a Compiler
Source Program 1 Lexical analyzer
Syntax Analyzer
Analyses
3 Semantic Analyzer Intermediate 4 Code Generator
Symbol-table Manager
Error Handler
5 Code Optimizer
Syntheses
6 Code Generator
7 Peephole Optimization
1, 2, 3, 4, 5 : Front-End 6, 7 : Back-End
Target Program
Front-end, Back-end division
Source code IR Machine code
Front end
Back end
errors Front end maps legal code into IR Back end maps IR onto target machine Simplifies retargeting Allows multiple front ends
Front end
Source code Scanner tokens Parser Parse Tree
errors
Scanner: Maps characters into tokens the basic unit of syntax
x = x + y becomes <id, x> = <id, x> + <id, y>
Typical tokens: number, id, +, -, *, /, do, end Eliminate white space (tabs, blanks, comments) A key issue is speed so instead of using a tool like LEX it sometimes needed to write your own scanner
10
Front end
Source code Scanner tokens Parser Parse Tree
errors
Parser: Recognize context-free syntax Guide context-sensitive analysis Construct IR Produce meaningful error messages Attempt error correction There are parser generators like YACC which automates much of the work
11
Front end
Context free grammars are used to represent programming language syntaxes:
<expr> ::= <expr> <op> <term> | <term> <term> ::= <number> | <id> <op> ::= + | -
12
Front end
A parser tries to map a program to the syntactic elements defined in the grammar A parse can be represented by a tree called a parse or syntax tree
13
Front end
A parse tree can be represented more compactly referred to as Abstract Syntax Tree (AST) AST can be used as IR between front end and back end
14
Back end
IR Instruction selection errors
Register Allocation
Machine code
Translate IR into target machine code Choose instructions for each IR operation Decide what to keep in registers at each point
15
Back end
IR
Code Generation Peephole Optimization
Machine code
errors
Produce compact fast code Use available addressing modes
16
Back end
IR
Code Generation Peephole Optimization
Machine code
errors
Limited resources Optimal allocation is difficult
17
The Analysis Task For Compilation
Three Phases:
Lexical Analysis:
Left-to-right Scan to Identify Tokens
token: sequence of chars having a collective meaning
Syntax Analysis:
Grouping of Tokens Into Meaningful Collection
Semantic Analysis:
Checking to ensure Correctness of Components
18
Phase 1. Lexical Analysis
Easiest Analysis - Identify tokens which are the basic building blocks
For Example: Position := initial + rate * 60 ;
_______ __ _____ _ ___ _ __ _
All are tokens
Blanks, Line breaks, etc. are scanned out
19
Phase 2. Syntax Analysis or Parsing
assignment statement identifier :=
For previous example, we would have Parse Tree:
expression +
position
expression
identifier initial
expression * expression expression
identifier rate number 60
Nodes of tree are constructed using a grammar for the language
20
Phase 3. Semantic Analysis
Finds Semantic Errors
:=
position initial rate + * 60 := position initial + * rate inttoreal 60 Syntax Tree Conversion Action
One of the Most Important Activity in This Phase: Type Checking - Legality of Operands
21
Supporting Phases/ Activities for Analysis
Symbol Table Creation / Maintenance Contains Info (storage, type, scope, args) on Each Meaningful Token, Typically Identifiers Data Structure Created / Initialized During Lexical Analysis Utilized / Updated During Later Analysis & Synthesis Error Handling Detection of Different Errors Which Correspond to All Phases What Happens When an Error Is Found?
22
The Synthesis Task For Compilation
Intermediate Code Generation
Abstract Machine Version of Code - Independent of Architecture
Easy to Produce and Do Final, Machine Dependent Code Generation Code Optimization Find More Efficient Ways to Execute Code Replace Code With More Optimal Statements
Final Code Generation
Generate Relocatable Machine Dependent Code
Peephole Optimization With a Very Limited View Improves Produced Final Code
23
Reviewing the Entire Process
position := initial + rate * 60 lexical analyzer id1 := id2 + id3 * 60 syntax analyzer
:= id1 id2 + *
id3
semantic analyzer
60
:=
Symbol Table position .... initial . rate. intermediate code generator
id1 id2
+ *
Errors
id3
inttoreal
60
24
Reviewing the Entire Process
Symbol Table
position ....
initial . rate. intermediate code generator t1 := inttoreal(60) t2 := id3 * t1 temp3 := id2 + t2 id1 := t3 code optimizer t1 := id3 * 60.0 id1 := id2 + t1
3 address code
Errors
final code generator MOVF id3, R2 MULF #60.0, R2 MOVF id2, R1 ADDF R1, R2 MOVF R1, id1
25