Compiler Construction
Lecture#1
Course Outline/Intro,
Why study this course?,
Programming Languages
Compilers, Interpreters.
By: Engr. Muhammad Adnan Malik
Class of BS-CS, NCBA&E, MULTAN.
Kindly join the Google group for updates
No files will be personally given to anyone, kindly find all the related
material on the group.
All quizzes announcements and assignments will be uploaded to the group.
You need a GMAIL account for joining the group.
Provide your email id to class representative on or before Friday 8th
November.
Text Book
Compilers: Principles, Techniques, and Tools By
V. Aho, Ravi Sethi, Jeffrey D. Ullman.
Course Outline
WILL BE MADE AVAILABLE ON GROUP
(HEC OUTLINE)
HEC-OUTLINE
Important
Maintain good attendance, late comers will face loss in sessional
marks.
A single project of at least 20% weightage will be given as a part of
this course in which you will construct a Realtime working
compiler.
Orientation
How do we Human Interact
• The main medium of understanding among humans is language.
• Every country or may be a city or an area have its own language which act
as a medium of interaction among them. For example Chinese, English,
Punjabi etc.
• So hence, when two or more persons who speak different languages, when
try to communicate with each other, needs a translation for understanding.
Translate below
• 你好,今天过得好吗?(Chinese)
• Hi, how are you doing today?
How does computer understand human
• Simply, a computer also has a language which it understand directly.
• So in order to make computer work according to our needs, we need to talk
computer in its own language.
• Here is where translation is involved and we need programming languages
for this purpose.
How does computer understand human
• Language of computer is?
• CPU understands either 1 or 0. It is also known as
binary or machine language.
Brief overview of programming languages
Programming languages
• A computer programming language is set of instruction provided to the
computer to perform different tasks. These languages can either be low or
high level.
Programming languages
PRGRAMMING
LANGUAGES
LOW LEVEL HIGH LEVEL
LANGUAGE LANGUAGES
JAVA, C++, C, PHYTHON,
MACHINE LANGAUGE
MATLAB etc.
Programming languages
• High level languages are very close to human language and are very easy to
understand and write.
• Given below is a code in C++ programming language.
Programming languages
• Here is an important question:
How does a computer translate the higher level language into its own machine
language, because in the end, it is the CPU that will performs all the
operation?
• This translation job is done by another computer program known as
“compiler or interpreter”.
Compiler and interpreter
Compiler and interpreter
Compiler: A compiler is computer software that
transforms or translates computer code written in
one programming language (the source language) into
another computer language (the target language).
Compiler and interpreter
Interpreter: An interpreter is also a translator just like compiler, but it is
slightly different from compiler.
“Compiler reads the source or input program as a whole and then performs
the translation(execution is done by CPU) and check for errors where an
interpreter read the code line by line and then produces output and errors for
each”.
• Any examples for compiler and interpreter?
Compiler and interpreter
• All the modern programming languages we use today are usually based on
compiler where a programmer writes a source code and run it as a whole.
• C++, JAVA etc. are all compiler-based languages.
• Can you give some advantages and disadvantages for both compiler and
interpreter?
• When we need to write more complex program and need to uses it again and
again, we need compiler where interpreter are used for simple programs.
Important
• Does compiler only translate languages, or it can also do other
jobs? Any examples?
• Compiler are also used in other applications such as:
• File conversions (image to pdf)
• Text Conversions (.txt to .doc or .ppt)
Important
• Another important question:
• A compiler is computer program that perform translation for
CPU and then execution is done by CPU, but how do we design it?
• This is the main goal of this course. You must be able to design
your own compiler.
Parts of Compiler
Parts of Compiler
• As we have a sound knowledge of compiler, let us briefly discuss
the parts of compiler and how they work together.
• There are two basic parts of compiler:
• Front End
• Back End
Parts of Compiler
Source code Front IR Machine code
Back
End End
errors
• IR-Intermediate Representation
Parts of Compiler
• It is also sometimes referred as “TWO PASS” compiler as it goes
through two stages.
• It is the basic structure of every compiler.
• A compiler can have multiple stages and is referred as multiple
pass compiler.
Basic Working of Compiler
Basic Working of Compiler
Front End:
• It Recognizes legal (& illegal) programs.
• Report errors in a useful way.
• Produce IR & preliminary storage map.
Back end
• It maps IR into target machine code
• Admits multiple front ends & multiple passes
Parts of Compiler
Source code Front IR Machine code
Back
End End
errors
• IR-Intermediate Representation
Front End
Front End
Front End of compiler is further dividede into two
modules:
• Scanner
• Parser
Scanner
Scanner
Scanner: Its main task is to read the input characters and produce a
sequence of pairs.
Each pair has following two parts:
• Token type
• Word or part of speech
Scanner
• Example
x=x+y
becomes
<id,x>
<assign,=> <id,x>
<id,x>
<op,+> word
<id,y> token type
Parser
Parser
Parser: A parser recognizes context-free syntax and reports errors
and then builds builds IR for source program
Parser
It is a very important part of compiler. This what a Parser do:
• Receives stream of tokens from scanner.
• Generate a context-free grammar or syntax representation of
these tokens.
Parser
• In other words, a scanner is unaware of language syntax, it will
just generate stream of tokens.
• It is the parser that will properly understand the grammar of
source language or code.
CFG
(Context Free Grammar)
CFG
• A CFG is the representation of source code or source language
generated by a parser.
• It represented as context-free syntax or grammar by G=(S,N,T,P)
CFG
• For G=(S,N,T,P) we have:
• S is the start symbol
• N is a set of non-terminal symbols
• T is set of terminal symbols or words
• P is called production
Remember
• Terminal symbols of any language are those that cannot be
replaced.
• Where non-terminal symbols are those that can be
replaced
CFG
• S = goal
• T = { op,assign,number, id, +, -,*}
• N = { goal, expr, term, op,assign}
REMEMBER
1. Terminal symbols are only used by scanner
2. Where parser uses both terminal and non-terminal symbol.
CFG
• All these terms are from terminal symbol.
T = { number, id, +, -}
• So actually scanner generates a stream of “terminal
symbols” in the form of tokens.
• This is because a sacnner is just scanning the input or
source languge and cannot change it.
Example: How Parser Works
• Lets directly consider an example of how a Parser works or
deal with input stream of tokens it receives from a scanner.
• Consider following expression:
X+2-Y
• Now this is how a Parser will translate this expression.
CFG
• expr means expression:
Z=Y+2X
• Id means identifier
Z,X,Y,x,a,b (Normally variables)
• op means operation
+,-,*,/
EXAMPLE
Again Consider Following Example
• Example
x = x + y-2
• Scanner will generate following tokens:
• <id,x>
• <assign,=>
• <op,+>
• <id,x>
• <id,y>
• <op,->
• <number,2>
Example: How a Parser Translate Tokens
• goal
• expr
• expr op term
• expr op y
• expr –y
• expr op term –y
• expr op 2-y
• expr +2-y
• term +2-y
• x+2-y (input expression)
Parser
• A parser gives non terminal representation
of the terminal symbols or tokens it
receives from scanner.
N = { goal, expr, term, op}
Parser: Important
• Z=X+3-2*Y
• X=3*Y+2*Z
• X=2-7*Y+5*Z
• Goal Example: How a Parser Translate Tokens
• expr
• expr op term
• expr op y
• expr *y
• expr op term *y
• expr op (2*y) Z=X+3-2*Y
• expr -(2*y)
• expr op term -(2*y)
• expr op 3-(2*y)
• expr +3-(2*y)
• expr term +3-(2*y)
• expr x+3-(2*y)
• term assign x+3-(2*y)
• term =x+3-(2*y)
• Z=x+3-(2*y) (input expression)
Syntax Tree Generated By
Parser
Parser
• The parser will now generate a syntax tree based on the
above generated translation.
• But remember that output will be in the form of terminal
symbols that were generated by scanner.
Parser
goal
Syntax Tree For
x+2-y expr
expr op term
expr op term – <id,y>
term + <number, 2>
<id,x> 55
Example: How a Parser Translate Tokens
• goal
• expr
• expr op term
• expr op y
• expr –y
• expr op term –y
• expr op 2-y
• expr +2-y
• term +2-y
• x+2-y (input expression)
Abstract Syntax Tree
(AST)
Parser
• Parser contains lot of unneeded information.
• So hence repeated terms are removed and simple abstract
syntax tree is generated.
• This is the concise form of previous generated.
Parser: Important
• AST summarizes grammatical structure
without the details of derivation
Abstract Syntax Tree(AST): Example
+ <id,y>
<id,x> <number,2>
Syntax Tree For
x+2-y
Parser: Important
Home assignment:
• Generte scanner adn then parser ouput for
following expressions.
• Also draw Syntax tree and then AST.
• Z=X+3-2*Y
• X=3*Y+2*Z
• X=2-7*Y+5*Z
Parser: Important
• ASTs are one kind of intermediate representation
(IR)
COMING UP NEXT
• We will look into back end in next lecture.