MU-MIT
Chapter 01
Introduction to Compiler Design (CD)
Course number: CSE 404 Instructor: Desta Gebre
Email: dhesta26@yahoo.com
Lecture 3, Practice 3
Office: 209
Credit 4 Lab: Aynalem
Pre-Requisite CSE 405
Text book:
Alfred V Aho, Ravi Sethi, Jeffrey D. Ullman,
“Compilers – Principles,Techniques and Tools”
Addison Wesley
1
Course Organization
• Lectures
• Organized in to eight chapters
• Labs
• Up to five reports
• Projects and seminars
• In groups of at most 5 students 1. Introduction
2. Lexical Analysis
3. Syntax Analysis
4. Syntax Directed Translation
5. Run Time Environments
6. Intermediate Code Generation
7. Code Generation
8. Code Optimization
2
Evaluation
• Quiz (10%)
• Lab assignments (up to 20%)
• Project (up to 20%)
• Mid exam (up to 30%)
• Final Exam (up to 40%)
• Total out of 100%
Caution
• Direct copying from other student or from the Internet is not allowed. (Results in 0).
• Each group member must contribute in project and all lab works.
• A group member who does not know what is in a lab report or project will be
given zero. Up to 5 marks will also be deducted from each group member.
• Cheating in exams will not be tolerated.
• Will be reported to discipline committee right away. 3
¿Why sudy Compilers?
• Become a better programmer
• Insight into interaction between languages, compilers, and hardware
• Compiler techniques are everywhere
• Parsing (little languages, interpreters, HTML)
• Database engines, query languages
• Text processing
• Fascinating blend of theory and engineering
• Direct applications of theory to practice
• Parsing, scanning, static analysis
• Resource allocation, “optimization”, etc.
• You might even write a compiler some day!
• Etc
4
Overview and History
Cause
Software for early computers was written in assembly language
The benefits of reusing software on different CPUs started to
become significantly greater than the cost of writing a compiler
The first real compiler
Writing a compiler has been considered as time taking task..
FORTRAN compilers of the late 1950s took18 person-years to build
Nowadays writing some modules of a compiler has
become routine enough that can be automated.
Compiler generators or compiler-compilers
E.g. scanner and parser generators (Lex, Yacc)
5
Compilers
A compiler acts as a translator,
transforming human-oriented programming languages
into computer-oriented machine languages.
Ignore machine-dependent details for programmer
Programming Machine
Language Compiler Language
(Source) (Target)
6
…Cont’d
Compilers may generate three types of code:
Pure Machine Code
Machine instructions set without assuming the existence of any
operating system or library.
Mostly being OS or embedded applications.
Augmented Machine Code
Code with OS routines and runtime support routines.
More often.
Virtual Machine Code
Virtual instructions, can be run on any architecture with a virtual
machine interpreter or a just-in-time compiler
Ex. Java
7
…Cont’d
Another way that compilers differ from one another is
in the format of the target machine code they
generate:
Assembly or other source format
Relocatable binary
Relative address
A linkage step is required
Absolute binary
Absolute address
Can be executed directly
8
Phases of Compilation
Source
Program Tokens Syntactic Semantic
Scanner Parser
(Character Stream) Structure Routines
Intermediate
Representation
Symbol and Optimizer
Attribute
Tables
(Used by all Phases of The Compiler)
Code
Generator
9
Target machine code
Phases of Compilation: Scanner
Source
Program Tokens Syntactic Semantic
Scanner Parser
(Character Stream) Structure Routines
Intermediate
Scanner Representation
The scanner begins the analysis of the source program by
reading the input, character by character, and grouping
Symbol and Optimizer
characters into individual words and symbols (tokens)
Attribute
Tables
RE ( Regular expression )
NFA ( Non-deterministic Finite Automata )
(Used by
DFA ( Deterministic Finite Automata ) all
LEX Phases of
The Compiler) Code
Generator
10
Target machine code
Phases of Compilation: Parser
Source
Program Tokens Syntactic Semantic
Scanner Parser
(Character Stream) Structure Routines
Intermediate
Parser Representation
Given a formal syntax specification (typically as a context-
free grammar [CFG] ), the parser reads tokens and groups
Symbol and Optimizer
them into units as specified by the productions of the CFG
Attribute
being used.
Tables
As syntactic structure is recognized, the parser either calls
corresponding semantic routines directly or builds a syntax
(Used by all
tree.
Phases
CFG ( Context-Free Grammar ) of
The Compiler)
Top Down, Bottom Up parsers Code
Operator Precedence parser Generator
LR, SLR, LALR Parsers
11
YACC
Target machine code
Phases of Compilation: Semantic ..
Source
Program Tokens Syntactic Semantic
Scanner Parser
(Character Stream) Structure Routines
Intermediate
Semantic Routines Representation
Perform two functions
Check the static semantics of each construct
Symbol and
Do the actual translation Optimizer
Attribute
The heart of a compiler
Tables
Syntax Directed Translation
Semantic Processing Techniques
(Used
by all
IR (Intermediate Representation)
Phases of
The Compiler) Code
Generator
12
Target machine code
Phases of Compilation: Optimizer
Source
Program Tokens Syntactic Semantic
Scanner Parser
(Character Stream) Structure Routines
Intermediate
Optimizer Representation
The IR code generated by the semantic routines is
analyzed and transformed into functionally equivalent but
Symbol and Optimizer
improved IR code
Attribute
This phase can be very complex and slow
Tables
Peephole optimization
loop optimization, register allocation
(Used by all
Phases of
Register and Temporary Management
Peephole Optimization The Compiler) Code
Generator
13
Target machine code
Phases of Compilation: Code …
Source
Program Tokens Syntactic Semantic
Scanner Parser
(Character Stream) Structure Routines
Intermediate
Code Generator Representation
Interpretive Code Generation
Generating Code from Tree/Dag
Grammar-Based Code Generator
Optimizer
Code
Generator
14 Target machine code
Phases: Summary
Code Generator
[Intermediate Code Generator]
Non-optimized Intermediate Code
Scanner
[Lexical Analyzer]
Tokens
Code Optimizer
Parser
[Syntax Analyzer]
Optimized Intermediate Code
Parse tree
Code Generator
Semantic Process
[Semantic analyzer] Target machine code
Abstract Syntax Tree w/ Attributes
15
Grouping of Phases
Any compiler must perform two major tasks
Compiler
Analysis Synthesis
Analysis of the source program
Synthesis of a machine-language program
16
Grouping of Phases …cont’d
In analysis phase, an intermediate representation is
created from the given source program.
Lexical Analyzer, Syntax Analyzer and Semantic Analyzer are the parts of
this phase.
In synthesis phase, the equivalent target program is
created from this intermediate representation.
Intermediate Code Generator, Code Generator, and Code Optimizer
are the parts of this phase.
17
Syntax and Semantics of Programming Language
A programming language must include the specification of
syntax (structure) and semantics (meaning).
Syntax typically means the context-free syntax because of
the almost universal use of context-free-grammar (CFGs)
Ex. a = b + c is syntactically legal and b + c = a is illegal
The semantics of a programming language are commonly
divided into two classes:
Static semantics
Semantics rules that can be checked at compiled time.
Ex. The type and number of a function’s arguments
Runtime semantics
Semantics rules that can be checked only at run time
18
Metalanguages
Metalanguage: a language used to define another language
We will use different metalangauges to define the various
components of a programming language so that these
components can be generated automatically
Lexical analysis: regular expressions
Syntax analysis: context free grammars
Semantics analysis: attribute grammars
Intermediate code generation: attribute grammars
Code generation: tree grammars
19
Generators
Automatic Compiler Generator
Compiler
Compiler Compiler
Definition in
(Metalanguage)
Error Message
Automatic tool Generator
Tool
Definition in Compiler Tool
(Metalanguage)
Error Message
20
Computer Architecture and Compiler Design
Compilers should exploit the hardware-specific feature
and computing capability to optimize code.
The problems encountered in modern computing
platforms:
Instruction sets for some popular architectures are highly non-
uniform.
High-level programming language operations are not always
easy to support.
Ex. exceptions, threads, dynamic heap access …
Exploiting architectural features such as cache, distributed
processors and memory
Effective use of a large number of processors
21
Compiler Design Considerations
Debugging Compilers
Designed to aid in the development and debugging of
programs.
Optimizing Compilers
Designed to produce efficient target code
Retargetable Compilers
A compiler whose target architecture can be changed without
its machine-independent components having to be rewritten.
22