Introduction to Compiler
Construction (Lecture 1)
Natural Languages
• What are Natural Languages?
• How do you understand the language?
• If you know multiple languages then how
can you recognize each of them?
• How do you know which sentence is correct
and which one is incorrect?
Programming Languages
• What are programming languages?
• How do you understand the programming
language?
• If you know multiple programming
languages then how can you recognize each
of them?
• How do you know which syntax is correct
and which one is incorrect?
Compilers and Interpreters
• “Compilation”
– Translation of a program written in a source
language into a semantically equivalent
program written in a target language
– It also reports to its users the presence of errors
in the source program
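– C++ is an example of a language that is usually compiled
[Diagram: Source Program → Compiler → Target Program, with the compiler emitting error messages; Input → Target Program → Output]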
Compilers and Interpreters
[Diagram: Source Program and Input → Interpreter → Output, with the interpreter emitting error messages]
• “Interpretation”
– An interpreter is a program that reads an executable
program and produces the results of running that
program, OR
– Instead of producing a target program as a translation,
an interpreter performs the operations implied by the
source program.
– GW-BASIC is an example of an interpreter
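The difference between the two approaches can be sketched on a toy language. The example below is only an illustration: the postfix source language and the function names interpret and compile_to_python are assumptions made for this sketch and do not come from the slides.

```python
# Toy source language (assumed for illustration): postfix arithmetic, e.g. "2 3 + 4 *".

def interpret(source):
    """Interpreter: perform the operations implied by the source directly."""
    stack = []
    for word in source.split():
        if word in ("+", "*"):
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if word == "+" else left * right)
        else:
            stack.append(float(word))
    return stack.pop()

def compile_to_python(source):
    """Compiler: translate the source into an equivalent target program
    (here a Python expression) without running it."""
    stack = []
    for word in source.split():
        if word in ("+", "*"):
            right, left = stack.pop(), stack.pop()
            stack.append(f"({left} {word} {right})")
        else:
            float(word)  # raises ValueError for a malformed number (crude error reporting)
            stack.append(word)
    return stack.pop()

target = compile_to_python("2 3 + 4 *")
print(target)                  # the target program, as text
print(eval(target))            # running the target program
print(interpret("2 3 + 4 *"))  # interpreting the source directly
```

The compiler returns a program that still has to be run, while the interpreter returns the result itself.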
Why study compilers?
• Application of a wide range of theoretical
techniques
– Data Structures
– Theory of Computation
– Algorithms
– Computer Architecture
• Good SW engineering experience
• Better understanding of programming
languages
Features of compilers
• Correctness
– preserve the meaning of the code
• Speed of target code
• Recognize legal and illegal programs
• Speed of compilation
• Good error reporting/handling
• Cooperation with the debugger
• Manage storage of all variables and code
• Support for separate compilation
Introduction to Compiler
Construction (Lecture 2)
Classification of Compilers
1. Single Pass Compilers
2. Two Pass Compilers
3. Multipass Compilers
Single Pass Compiler
• Source code is translated directly into
machine code in a single pass.
– For example, Pascal was designed so that it can be compiled this way
[Diagram: source code → Compiler (Front End) → target code]
Two Pass Compiler
• Uses an intermediate representation (IR)
– Why?
[Diagram: source code → Front End → IR → Back End → target code]
Two pass compiler
• intermediate representation (IR)
• front end maps legal code into IR
• back end maps IR onto target machine
• simplify retargeting
• allows multiple front ends
• multiple passes ⇒ better code
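A minimal sketch of why an IR simplifies retargeting and allows multiple front ends. Everything here is assumed for illustration: the tuple-based IR, the two toy postfix source syntaxes, and the imaginary stack-machine instruction names.

```python
# Hypothetical IR: a list of tuples, ("push", n), ("add",) or ("mul",).

def front_end_symbols(source):
    """Front end for postfix source written with symbols, e.g. '2 3 + 4 *'."""
    ops = {"+": ("add",), "*": ("mul",)}
    return [ops[w] if w in ops else ("push", int(w)) for w in source.split()]

def front_end_words(source):
    """Front end for postfix source written with words, e.g. '2 3 plus 4 times'."""
    ops = {"plus": ("add",), "times": ("mul",)}
    return [ops[w] if w in ops else ("push", int(w)) for w in source.split()]

def back_end_stack_machine(ir):
    """Back end for an imaginary stack machine: one instruction per IR operation."""
    return [" ".join(str(part).upper() for part in op) for op in ir]

def back_end_python(ir):
    """Back end that targets a Python expression string."""
    stack = []
    for op in ir:
        if op[0] == "push":
            stack.append(str(op[1]))
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(f"({left} {'+' if op[0] == 'add' else '*'} {right})")
    return stack.pop()

ir = front_end_symbols("2 3 + 4 *")
print(ir == front_end_words("2 3 plus 4 times"))  # both front ends produce the same IR
print(back_end_stack_machine(ir))
print(back_end_python(ir))
```

With two front ends and two back ends sharing one IR, four source/target combinations are covered by four components instead of four separate compilers.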
© Oscar Nierstrasz
Multipass compiler
• analyzes and changes IR
• goal is to reduce runtime
• must preserve values
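One common example of a pass that analyzes and changes the IR is constant folding. The sketch below assumes a hypothetical IR of nested tuples (operator, left, right) with integers and variable names as leaves; it is not the IR used in the slides.

```python
def fold_constants(node):
    """Rewrite the IR so that operations on two constants are computed at compile time."""
    if not isinstance(node, tuple):
        return node  # leaf: an integer constant or a variable name
    op, left, right = node
    left, right = fold_constants(left), fold_constants(right)
    if isinstance(left, int) and isinstance(right, int):
        return left + right if op == "+" else left * right
    return (op, left, right)

def evaluate(node, env):
    """Reference evaluator, used only to check that the pass preserves values."""
    if isinstance(node, int):
        return node
    if isinstance(node, str):
        return env[node]
    op, left, right = node
    a, b = evaluate(left, env), evaluate(right, env)
    return a + b if op == "+" else a * b

ir = ("+", "x", ("*", 6, 10))
improved = fold_constants(ir)
print(improved)
print(evaluate(ir, {"x": 1}) == evaluate(improved, {"x": 1}))
```

The improved IR does one multiplication fewer at run time and still computes the same value for every x.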
Comparison
• One-pass compilers generally compile faster than
multipass compilers
• A multipass compiler splits the work into small passes,
each of which is easier to get right than one large pass,
and the extra passes allow higher-quality code
Lecture 3
Front end
• recognize legal code
• report errors
• produce IR
• preliminary storage map
• shape code for the back end
Scanner
• Breaks the source code text into small
pieces called tokens.
• It is also known as Lexical Analyzer
Scanner / Lexical Analyser
• map characters to tokens
• character string value for a token is a lexeme
• eliminate white space
x = x + y   becomes   <id,x> = <id,x> + <id,y>
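A scanner of this kind can be sketched with regular expressions. The token classes and the function name scan below are assumptions for this example; each token is paired with its lexeme, and white space is dropped.

```python
import re

# Assumed token classes for this sketch: numbers, identifiers, operators, white space.
TOKEN_SPEC = [
    ("num", r"\d+"),
    ("id", r"[A-Za-z_]\w*"),
    ("op", r"[=+*]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def scan(text):
    """Map a character string to a list of (token, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "skip":  # eliminate white space
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(scan("x = x + y"))
```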
Syntactic Analysis – Parsing
Majid ate the apple
Front end – Analysis – Machine Independent
• The front end consists of those phases that
depend primarily on the source language
and are largely independent of the target
machine.
Parser
• recognize context-free syntax
• guide context-sensitive analysis
• construct IR(s)
• produce meaningful error messages
• attempt error correction
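A recursive-descent parser is one way to recognize context-free syntax and construct a tree. The grammar below (expr → term { + term }, term → factor { * factor }, factor → identifier or number) and the tuple tree format are assumptions chosen for this sketch.

```python
def parse(tokens):
    """Parse a token list such as ['initial', '+', 'rate', '*', '60'] into a nested tuple tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def factor():
        nonlocal pos
        tok = peek()
        if tok is None or tok in ("+", "*"):
            raise SyntaxError(f"expected an operand at token {pos}, found {tok!r}")
        pos += 1
        return tok

    def term():
        nonlocal pos
        node = factor()
        while peek() == "*":
            pos += 1
            node = ("*", node, factor())
        return node

    def expr():
        nonlocal pos
        node = term()
        while peek() == "+":
            pos += 1
            node = ("+", node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r} at token {pos}")
    return tree

print(parse(["initial", "+", "rate", "*", "60"]))
```

Because term is called from expr, multiplication binds more tightly than addition, and malformed input is reported with the position of the offending token.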
BACK END
• Synthesis process
• Machine dependent
• The back end includes those portions of the
compiler that depend on the target machine;
generally, these portions do not depend
on the source language
Back end
• translate IR into target machine code
• choose instructions for each IR operation
• decide what to keep in registers at each point
• ensure conformance with system interfaces
Compiler Structure
• Front end
– Maps legal code into IR
– Recognize legal/illegal programs
• report/handle errors
– Generate IR
– The process can be automated
• Back end
– Translate IR into target code
• instruction selection
• register allocation
• instruction scheduling
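A rough sketch of the back end's job for an expression tree. The LOAD/ADD/MUL mnemonics and the unlimited supply of registers r1, r2, ... are assumptions of this example; a real back end must map values onto a finite register set and may reorder instructions.

```python
def generate(tree):
    """Translate a tuple tree into instructions for an imaginary register machine."""
    code = []
    counter = 0

    def emit(node):
        nonlocal counter
        if isinstance(node, str):  # leaf: load a variable or constant into a fresh register
            counter += 1
            reg = f"r{counter}"
            code.append(f"LOAD {reg}, {node}")
            return reg
        op, left, right = node
        a, b = emit(left), emit(right)
        mnemonic = {"+": "ADD", "*": "MUL"}[op]  # instruction selection
        code.append(f"{mnemonic} {a}, {a}, {b}")  # result overwrites the left register
        return a

    result = emit(tree)
    return code, result

code, result = generate(("+", "initial", ("*", "rate", "60")))
print("\n".join(code))
print("result in", result)
```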
Lecture 4
The Analysis-Synthesis Model
of Compilation
• There are two parts to compilation:
– Analysis determines the operations implied by
the source program which are recorded in a tree
structure
– Synthesis takes the tree structure and translates
the operations therein into the target program
ANALYSIS PROCEDURE
• During analysis, the operations implied by
the source program are determined and
recorded in a hierarchical structure called a
tree.
• Often a special kind of tree called a syntax
tree is used, in which each node represents an
operation and the children of a node
represent the arguments of the operation.
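Phases of analysis for: position = initial + rate * 60
• Lexical Analyzer: character stream → token stream
<id,1> <=> <id,2> <+> <id,3> <*> <60>
• Syntax Analyzer: token stream → syntax tree
= ( <id,1>, + ( <id,2>, * ( <id,3>, 60 ) ) )
• Semantic Analyzer: syntax tree → annotated syntax tree
= ( <id,1>, + ( <id,2>, * ( <id,3>, inttofloat ( 60 ) ) ) )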
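The semantic analysis step, which inserts inttofloat around the integer 60, can be sketched as a small type checker. The tuple tree format and the assumption that position, initial and rate are all declared as float are made up for this example.

```python
# Assumed declarations: all three identifiers are floating-point variables.
SYMBOL_TYPES = {"position": "float", "initial": "float", "rate": "float"}

def check(node):
    """Return (annotated_tree, type), wrapping an int operand in inttofloat
    when the other operand is a float."""
    kind = node[0]
    if kind == "id":
        return node, SYMBOL_TYPES[node[1]]
    if kind == "num":
        return node, ("int" if isinstance(node[1], int) else "float")
    op, left, right = node
    left, left_type = check(left)
    right, right_type = check(right)
    if left_type != right_type:
        if left_type == "int":
            left, left_type = ("inttofloat", left), "float"
        else:
            right, right_type = ("inttofloat", right), "float"
    return (op, left, right), left_type

tree = ("=", ("id", "position"),
        ("+", ("id", "initial"), ("*", ("id", "rate"), ("num", 60))))
annotated, result_type = check(tree)
print(annotated)
print(result_type)
```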
REMEMBER
The front end is responsible for the
analysis process, while the back
end is responsible for synthesis
Other Tools that Use the
Analysis-Synthesis Model
• Editors (syntax highlighting)
• Pretty printers (e.g. Doxygen)
• Static checkers (e.g. Lint and Splint)
• Interpreters
• Text formatters (e.g. TeX and LaTeX)
• Silicon compilers (e.g. VHDL)
• Query interpreters/compilers (Databases)
Structure Editors
• A structure editor takes as input a sequence of
commands to build a source program.
• The structure editor not only performs the text
creation and modification functions of an ordinary
text editor but it also analyzes the program text,
putting an appropriate hierarchical structure on the
source program.
• Thus the structure editor can perform additional
tasks that are useful in the preparation of
programs.
Structure Editors (cont.)
• For example, it can check that the input is
correctly formed and can supply keywords
automatically (e.g. when the user types
while, the editor supplies the matching do
and reminds the user that a conditional must
come between them).
Pretty printers
• A pretty printer analyzes a program and
prints it in such a way that the structure of
the program becomes clearly visible.
• For example, comments may appear in a
special font, and the statements may appear
with an amount of indentation proportional
to the depth of their nesting in the
hierarchical organization of the statements.
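Indentation proportional to nesting depth can be sketched in a few lines. The input format is an assumption of this example: a statement is either a string or a (header, body) pair whose body is a list of statements.

```python
def pretty(statements, depth=0, indent="    "):
    """Return the program as lines indented in proportion to nesting depth."""
    lines = []
    for stmt in statements:
        if isinstance(stmt, str):
            lines.append(indent * depth + stmt)
        else:
            header, body = stmt
            lines.append(indent * depth + header)
            lines.extend(pretty(body, depth + 1, indent))
    return lines

program = [
    "x := 0",
    ("while x < 10 do", [
        "x := x + 1",
        ("if x = 5 then", ["print x"]),
    ]),
]
print("\n".join(pretty(program)))
```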
Static Checkers
• A static checker reads a program, analyzes it, and
attempts to discover potential bugs without
running the program.
• A static checker may detect that parts of the source
program can never be executed, or that a certain
variable might be used before being defined.
• In addition, by employing type-checking
techniques, it can catch logical errors such as
trying to use a real variable as a pointer.
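As a small illustration of detecting code that can never be executed, the sketch below uses Python's standard ast module to analyze Python source without running it. The function name find_unreachable is an assumption, and the check is deliberately simple: it only flags the first statement that directly follows a return, raise, break or continue in the same block.

```python
import ast

def find_unreachable(source):
    """Return the line numbers of statements that directly follow a
    return, raise, break or continue in the same block."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        for field in ("body", "orelse", "finalbody"):
            block = getattr(node, field, None)
            if not isinstance(block, list):
                continue  # e.g. the body of a lambda is an expression, not a block
            for previous, stmt in zip(block, block[1:]):
                if isinstance(previous, (ast.Return, ast.Raise, ast.Break, ast.Continue)):
                    findings.append(stmt.lineno)
    return sorted(findings)

example = "def f(x):\n    return x\n    print('never')\n"
print(find_unreachable(example))
```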
Interpreters
• Instead of producing a target program as a
translation, an interpreter performs the
operations implied by the source program.
• For example, for an assignment statement
an interpreter might build a tree and then
carry out the operations at the nodes as it
“walks” the tree.
position := initial + rate * 60
[Syntax tree: := ( <id,1>, + ( <id,2>, * ( <id,3>, 60 ) ) )]
Interpreters (cont.)
• At the root it would discover it had an assignment to
perform, so it would call a routine to evaluate the
expression on the right, and then store the resulting value
in the location associated with the identifier position.
• At the right child of the root, the routine would discover it
had to compute the sum of two expressions.
• It would call itself recursively to compute the value of the
expression rate * 60.
• It would then add that value to the value of the variable
initial.
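The tree walk described above can be sketched as follows. The tuple tree format, the dictionary standing in for memory, and the sample values of initial and rate are assumptions made for this example.

```python
def evaluate(node, memory):
    """Evaluate an expression node, calling itself recursively on the children."""
    if isinstance(node, (int, float)):
        return node
    if isinstance(node, str):
        return memory[node]  # value stored at the location of an identifier
    op, left, right = node
    if op == "+":
        return evaluate(left, memory) + evaluate(right, memory)
    if op == "*":
        return evaluate(left, memory) * evaluate(right, memory)
    raise ValueError(f"unknown operator {op!r}")

def execute(node, memory):
    """At the root: evaluate the right-hand side, then store the result."""
    op, target, expression = node
    if op != ":=":
        raise ValueError("only assignment statements are supported in this sketch")
    memory[target] = evaluate(expression, memory)

memory = {"initial": 10.0, "rate": 2.0}
execute((":=", "position", ("+", "initial", ("*", "rate", 60))), memory)
print(memory["position"])  # 130.0
```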
Text Formatters
• A text formatter takes input that is a stream
of characters, most of which is text to be
typeset, but some of which includes
commands to indicate paragraphs, figures or
mathematical structures like subscripts and
superscripts.
Silicon compilers
• A silicon compiler has a source language
that is similar or identical to a conventional
programming language.
• However, the variables of the language
represent not locations in memory but
logical signals (0 or 1) or groups of signals
in a switching circuit.
Query interpreters
• A query interpreter translates a predicate
containing relational and Boolean operators
into commands to search a database for
records satisfying that predicate.
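A minimal sketch of interpreting such a predicate. The nested tuple format for predicates, the list of dictionaries standing in for a database, and the sample records are all assumptions for this example; a real query processor would translate the predicate into index lookups or scans rather than a plain filter.

```python
import operator

RELATIONAL = {"=": operator.eq, "<": operator.lt, ">": operator.gt}

def matches(predicate, record):
    """Evaluate a predicate tree of Boolean and relational operators on one record."""
    op = predicate[0]
    if op == "and":
        return all(matches(p, record) for p in predicate[1:])
    if op == "or":
        return any(matches(p, record) for p in predicate[1:])
    if op == "not":
        return not matches(predicate[1], record)
    field, value = predicate[1], predicate[2]
    return RELATIONAL[op](record[field], value)

def search(records, predicate):
    """Return the records that satisfy the predicate."""
    return [record for record in records if matches(predicate, record)]

employees = [
    {"name": "Ali", "dept": "R&D", "salary": 900},
    {"name": "Sara", "dept": "R&D", "salary": 1500},
    {"name": "Omar", "dept": "Sales", "salary": 1200},
]
query = ("and", ("=", "dept", "R&D"), (">", "salary", 1000))
print(search(employees, query))
```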
JIT compilation


Editor's Notes

  • #14 Code improvement - unclear slide
  • #17 preliminary storage map => prepare symbol table? shape code for the back end => same as produce IR?
  • #19 character string value for a token is a lexeme Typical tokens: id, number, do, end … Key issue is speed