
Unit 2. The Parts of A Compiler

The document discusses the main parts of a compiler: 1. Lexical analysis breaks the source code into tokens. 2. Syntax analysis groups the tokens into a parse tree based on the language's grammar. 3. Semantic analysis checks for errors and collects type information. 4. Intermediate code generation produces an abstract representation to optimize before code generation.

Uploaded by

Huy Đỗ Quang

Unit 2. The Parts of a Compiler

1
Main parts of a compiler

2
Parts of a compiler
• Lexical analysis: the stream of characters making up the source program is read from left to right and grouped into tokens (sequences of characters having a collective meaning).
• Syntax analysis: the tokens of the source program are grouped into grammatical phrases that the compiler uses to synthesize output.

3
Parts of a compiler
• Semantic analysis: check the source program for semantic errors and gather type information for the subsequent code generation part.
• Intermediate code generation: generate an intermediate representation of the source program, in the form of a program for an abstract machine.

4
Parts of a compiler
• Code optimization: improve the intermediate code so that faster-running code results.
• Code generation: generate the target code, which normally consists of relocatable machine code or assembly code.

5
Translation of a statement

6
Details of the parts of a compiler

Part | Output | Sample
Programmer (source code producer) | Source string | position := inition * rate + 60
Scanner (performs lexical analysis) | Token string, and symbol table with identifiers | 'position', ':=', 'inition', '*', 'rate', '+', '60'
Parser (performs syntax analysis based on the grammar of the programming language) | Parse tree or abstract syntax tree |
Semantic analyzer (type checking, etc.) | Annotated parse tree or abstract syntax tree | Convert integer (60) to real
Intermediate code generator | Three-address code |
Optimizer | Three-address code |
Code generator | Assembly code |

7
The grouping of parts
• Compiler front and back ends:
  - Front end: analysis (machine independent)
  - Back end: synthesis (machine dependent)

• Compiler passes: a group of parts is run over the program only once (single pass) or several times (multi-pass).
  - Single pass: usually requires everything to be defined before it is used in the source program
  - Multi-pass: the compiler may have to keep the entire program representation in memory

8
Part 1: Lexical Analysis
• Scanner: converts the stream of input characters into a stream of tokens, which becomes the input to the following part (parsing)
• Tasks of a scanner:
  - Group characters into tokens (a token is the basic syntactic unit)
  - Categorize the tokens
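
The two scanner tasks above (grouping and categorizing) can be sketched in a few lines of Python. This is only an illustration: the token categories NUMBER, IDENT, ASSIGN and OP are assumptions made for this example, since the slides do not fix a particular set of categories.

```python
import re

# Hypothetical token categories, chosen only for this illustration.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("ASSIGN", r":="),
    ("OP", r"[+\-*/]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Group the characters of `source` into (category, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":  # whitespace separates tokens but is not a token
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


tokens = tokenize("position := inition * rate + 60")
for category, lexeme in tokens:
    print(category, lexeme)
```

Each tuple pairs a category with the characters that were grouped together, which is the token stream handed on to the parser.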

9
Types of tokens

10
Part 2: Parsing
• The process of determining whether a string of tokens can be generated by a grammar
• It is executed by a parser

11
Part 2: Parsing
• Output of a parser:
  - Parse tree (if the input can be parsed)
  - Error message (otherwise)
• If a parse tree is built successfully, the program is grammatically correct
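
As a rough sketch of the two possible parser outputs, the Python function below parses statements such as a = b + c. It assumes a tiny grammar invented for this example (an identifier, "=", then identifiers joined by "+"), and it returns a simplified tree of nested tuples, closer to an abstract syntax tree than to a full parse tree. A SyntaxError plays the role of the error message.

```python
def parse_assignment(tokens):
    """Parse `id = id { + id }` and return a tree, or raise SyntaxError."""
    pos = 0

    def peek():
        # Current token, or None at the end of the input.
        return tokens[pos] if pos < len(tokens) else None

    def expect_identifier():
        nonlocal pos
        token = peek()
        if token is None or not token.isidentifier():
            raise SyntaxError(f"expected an identifier, found {token!r}")
        pos += 1
        return ("id", token)

    def expect(symbol):
        nonlocal pos
        if peek() != symbol:
            raise SyntaxError(f"expected {symbol!r}, found {peek()!r}")
        pos += 1

    target = expect_identifier()
    expect("=")
    expr = expect_identifier()
    while peek() == "+":
        pos += 1
        expr = ("+", expr, expect_identifier())  # left-associative addition
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return ("=", target, expr)


tree = parse_assignment(["a", "=", "b", "+", "c"])
print(tree)
```

Calling the function with a malformed token list, for example ["a", "=", "+"], raises SyntaxError instead of returning a tree.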

12
Parse tree of statement a = b + c

13
Grammars, languages, BNF, syntax diagrams
• The parser takes the tokens produced by the scanner as input and generates a parse tree (or syntax tree). Token arrangements are checked against the grammar of the source language.
• Notations for grammars:
  - BNF (Backus-Naur Form) is a metalanguage used to express the grammars of programming languages
  - Syntax diagrams: pictorial diagrams showing the rules for forming a statement in a programming language and how the components of the statement are related. Syntax diagrams are like directed graphs.

14
BNF
• BNF (and formal grammars) use two types of symbols
• Terminals:
  - Tokens of the language
  - Never appear on the left side of any production
• Nonterminals:
  - Intermediate symbols that express the structures of a language
  - Must appear on the left side of at least one production
  - Are enclosed in <>
• Start symbol:
  - The nonterminal of the first level
  - Appears at the root of the parse tree
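
The rules above can be checked mechanically once a grammar is stored as data. The sketch below uses a hypothetical toy grammar (not taken from the slides) and classifies its symbols: every key of the dictionary is a left side, so the keys are the nonterminals, and every remaining symbol is a terminal.

```python
# Hypothetical toy grammar in BNF, invented for this illustration:
#   <assignment> ::= id = <expr>
#   <expr>       ::= <expr> + id | id
GRAMMAR = {
    "<assignment>": [["id", "=", "<expr>"]],
    "<expr>": [["<expr>", "+", "id"], ["id"]],
}
START_SYMBOL = "<assignment>"

# Nonterminals are exactly the symbols that appear on a left side.
nonterminals = set(GRAMMAR)

# Collect every symbol used on a right side.
all_symbols = {
    symbol
    for alternatives in GRAMMAR.values()
    for right_side in alternatives
    for symbol in right_side
}

# Symbols written in <> are meant to be nonterminals; the rest are terminals.
bracketed = {s for s in all_symbols if s.startswith("<") and s.endswith(">")}
terminals = all_symbols - bracketed

# A bracketed symbol without a production of its own breaks the rule that
# every nonterminal is on the left side of at least one production.
undefined = bracketed - nonterminals

print("terminals:", sorted(terminals))
print("nonterminals:", sorted(nonterminals))
print("undefined:", sorted(undefined))
```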

15

16
Parsing: Concept and Techniques
• Starting from the start symbol, repeatedly apply grammar rules until a string of terminals is generated
• If the parser can convert the start symbol into the input string, the input is syntactically correct
• Otherwise, the input string is not syntactically correct

17
Parsing: Concept and Techniques
• The most important part of a compiler is the grammar
• The grammar includes all structures of a program
• It does not include any other rules

18
Parsing: Concept and Techniques
• The grammar must be unambiguous
• If the grammar is ambiguous, more than one parse tree can be created for the same input
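
To make the ambiguity concrete, the sketch below counts parse trees under a deliberately ambiguous grammar chosen for this example: <expr> ::= <expr> + <expr> | <expr> * <expr> | id. The function name and the grammar are assumptions for illustration, not part of the original slides.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of `tokens` under the ambiguous grammar
    <expr> ::= <expr> + <expr> | <expr> * <expr> | id
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(start, end):
        # Number of ways tokens[start:end] can be derived from <expr>.
        if end - start == 1:
            return 1 if tokens[start] == "id" else 0
        total = 0
        # Try every operator as the root of the tree for this span.
        for i in range(start + 1, end - 1):
            if tokens[i] in ("+", "*"):
                total += count(start, i) * count(i + 1, end)
        return total

    return count(0, len(tokens))


print(count_parse_trees(["id", "+", "id", "*", "id"]))
```

For id + id * id the count is 2: one tree groups the addition first and the other groups the multiplication first, so the grammar alone does not decide what the expression means.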

19
Part 3: Semantic Analysis
• Certain checks are performed to ensure that the components of a program fit together meaningfully
• To generate code, the source program must be syntactically and semantically correct
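
One such check is type checking. The sketch below walks an expression tree for inition * rate + 60 and wraps the integer constant in a conversion node when it meets a real operand. The symbol table contents, the tuple-based tree layout and the node name inttoreal are all assumptions made for this example.

```python
# Hypothetical symbol table: the declared types are assumed for this example.
SYMBOL_TABLE = {"position": "real", "inition": "real", "rate": "real"}


class SemanticError(Exception):
    """Raised when the parts of the program do not fit together."""


def check(node):
    """Return (annotated_node, type) for an expression tree of nested tuples."""
    kind = node[0]
    if kind == "num":
        return node, "integer"
    if kind == "id":
        name = node[1]
        if name not in SYMBOL_TABLE:
            raise SemanticError(f"undeclared identifier {name!r}")
        return node, SYMBOL_TABLE[name]
    if kind in ("+", "*"):
        left, left_type = check(node[1])
        right, right_type = check(node[2])
        if left_type == right_type:
            return (kind, left, right), left_type
        # Mixed integer/real operands: convert the integer side to real.
        if left_type == "integer":
            left = ("inttoreal", left)
        else:
            right = ("inttoreal", right)
        return (kind, left, right), "real"
    raise SemanticError(f"unknown node kind {kind!r}")


expr = ("+", ("*", ("id", "inition"), ("id", "rate")), ("num", 60))
annotated, expr_type = check(expr)
print(annotated, expr_type)
```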

20
Part 4: Intermediate Code Generation
• The intermediate code generator translates the source program into an equivalent program in intermediate code
• Intermediate code that is close to the target code (low-level intermediate code) is suitable for register and memory allocation, instruction selection, etc.
• Such code is good for machine-dependent optimizations.
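
A minimal sketch of an intermediate code generator is given below. It turns an annotated expression tree into three-address instructions, one operator per instruction, using temporaries t1, t2, and so on. The tree layout, the inttoreal node and the textual instruction format are assumptions for this example only.

```python
import itertools


def to_three_address(target, expr):
    """Translate `target := expr` into a list of three-address instructions."""
    code = []
    temp_ids = itertools.count(1)

    def emit(node):
        # Return the name (or constant) that holds the value of `node`.
        kind = node[0]
        if kind in ("id", "num"):
            return str(node[1])
        if kind == "inttoreal":
            operand = emit(node[1])
            temp = f"t{next(temp_ids)}"
            code.append(f"{temp} := inttoreal({operand})")
            return temp
        # Binary operator: evaluate both operands, then combine them.
        left = emit(node[1])
        right = emit(node[2])
        temp = f"t{next(temp_ids)}"
        code.append(f"{temp} := {left} {kind} {right}")
        return temp

    result = emit(expr)
    code.append(f"{target} := {result}")
    return code


expr = ("+", ("*", ("id", "inition"), ("id", "rate")), ("inttoreal", ("num", 60)))
code = to_three_address("position", expr)
print("\n".join(code))
```

Each instruction has at most one operator on its right side, which is what makes the form convenient for a later optimizer or code generator.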
21
Advantages of Intermediate Code

1. It is easy to translate into object code.
2. A code optimizer can be applied before code generation.
3. It decreases time cost.

22
Part 5: Code Generator
• Input: intermediate code of the source program
• Output: object program
  - Assembly code
  - Virtual machine code
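
The input/output relationship above can be illustrated with a deliberately naive generator. It assumes an imaginary machine with a single register R0 and made-up mnemonics (LOAD, ADD, MUL, STORE), and it takes three-address code as hypothetical (dest, op, arg1, arg2) tuples. None of this comes from the slides or from a real instruction set.

```python
# Hypothetical mnemonics for an imaginary one-register machine.
OPCODES = {"+": "ADD", "*": "MUL"}


def generate(three_address_code):
    """Translate (dest, op, arg1, arg2) tuples into assembly-like text lines."""
    assembly = []
    for dest, op, arg1, arg2 in three_address_code:
        assembly.append(f"LOAD  R0, {arg1}")
        if op is not None:  # op is None for a plain copy `dest := arg1`
            assembly.append(f"{OPCODES[op]:<5} R0, {arg2}")
        assembly.append(f"STORE {dest}, R0")
    return assembly


program = [
    ("t1", "*", "inition", "rate"),
    ("position", None, "t1", None),
]
assembly = generate(program)
print("\n".join(assembly))
```

The output stores t1 and immediately loads it again. Avoiding that kind of redundant traffic is the job of register allocation, which this sketch ignores entirely.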

23
Problems

• Input
• Output
• Object machine
• Instruction set
• Register allocation
 Register allocation

24
