CO3005 Chapter 3 Syntax Analysis
CO3005 Chapter 3 Syntax Analysis
Syntax Analysis
Introduction
Context-free Grammar
Challenges in
Grammar Writing
Parser.1
Parser
Overview
MEng. Tran Ngoc
Bao Duy
1 Introduction
Introduction
2 Context-free Grammar Context-free Grammar
Challenges in
Grammar Writing
4 Parser Generation from ANTLR4 Ambiguous Grammar
Common Cases in Parser Rule
Parser.2
Parser
Introduction
Context-free Grammar
Parser Generation
SYNTAX ANALYSIS from ANTLR4
Challenges in
Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule
Parser.3
Parser
An overview of compilation
MEng. Tran Ngoc
Bao Duy
Character stream
Context-free Grammar
Abstract syntax tree (AST) generation Front-end
Abstract syntax tree Parser: How it works
Challenges in
Intermediate code generation Grammar Writing
Intermediate code Ambiguous Grammar
Parser.4
Parser
Roles of Syntax Analysis
MEng. Tran Ngoc
Bao Duy
Definition
Syntax analysis, also known as parsing, is the second phase
of a compiler, following lexical analysis. Its primary role is Introduction
to check whether the sequence of tokens generated by the Context-free Grammar
lexical analyzer forms a valid syntactic structure according Parser: How it works
to the rules of the programming language’s grammar. Parser Generation
from ANTLR4
1 Syntax Validation:
Challenges in
• Ensures that the program’s structure follows the lan- Grammar Writing
Ambiguous Grammar
guage’s grammar rules. Common Cases in Parser Rule
Parser.5
Parser
Syntax Analysis: Example
MEng. Tran Ngoc
Bao Duy
• Input: a + b ∗ c
• Grammar rules (Simplified Context-Free Grammar)
• Output (as parse tree):
Introduction
E Context-free Grammar
Parser Generation
E + T from ANTLR4
Challenges in
Grammar Writing
Ambiguous Grammar
F F id(c)
id(a) id(b)
Parser.6
Parser
Introduction
CONTEXT-FREE
Context-free Grammar
Parser Generation
Challenges in
Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule
Parser.7
Parser
The Limits of Regular Expressions
MEng. Tran Ngoc
Bao Duy
q0 b q1 q2
Common Cases in Parser Rule
start
Parser.8
Parser
The Limits of Regular Expressions
MEng. Tran Ngoc
Bao Duy
q0 b q1 q2
Common Cases in Parser Rule
start
Parser.8
Parser
Symmetry in Programming
MEng. Tran Ngoc
Bao Duy
Parser Generation
if ... then ... else or repeat ... until from ANTLR4
Parser.9
Parser
Symmetry in Programming
MEng. Tran Ngoc
Bao Duy
Parser Generation
if ... then ... else or repeat ... until from ANTLR4
Parser.9
Parser
Context-free grammar
MEng. Tran Ngoc
Bao Duy
A context-free grammar (CFG) is a formal system used to define
programming languages, natural languages, and structures in compu-
tation. CFG generates strings by applying production rules that replace
non-terminal symbols with sequences of symbols.
Introduction
Challenges in
2 A set of non-terminals N: Abstract symbols (e.g., S, X ) Grammar Writing
used in rules. Ambiguous Grammar
Common Cases in Parser Rule
Parser.10
Parser
Context-free grammar: Examples
MEng. Tran Ngoc
Bao Duy
Parser Generation
from ANTLR4
Challenges in
Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule
Parser.11
Parser
Context-free grammar: Examples
MEng. Tran Ngoc
Bao Duy
Parser Generation
from ANTLR4
Challenges in
Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule
Parser.11
Parser
Context-free grammar: Examples
MEng. Tran Ngoc
Bao Duy
Parser Generation
from ANTLR4
The statement, in turn, is often a list enclosed in braces:
Challenges in
Grammar Writing
block_statement −→ { statements } Ambiguous Grammar
Common Cases in Parser Rule
or:
block_statement −→ { }
Parser.11
Parser
Backus-Naur Form (BNF)
MEng. Tran Ngoc
Bao Duy
Backus and Peter Naur, who devised it for the definition Context-free Grammar
Parser Generation
2 Strictly speaking, RE operators (*, +, ?) and meta- from ANTLR4
Parser.12
Parser
Backus-Naur Form (BNF)
MEng. Tran Ngoc
Bao Duy
Backus and Peter Naur, who devised it for the definition Context-free Grammar
Parser Generation
2 Strictly speaking, RE operators (*, +, ?) and meta- from ANTLR4
Parser.12
Parser
Extended Backus-Naur Form (EBNF)
MEng. Tran Ngoc
Bao Duy
Parser.13
Parser
Extended Backus-Naur Form (EBNF)
MEng. Tran Ngoc
Bao Duy
Parser.13
Parser
Introduction
PARSER:
Context-free Grammar
Parser Generation
Challenges in
Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule
Parser.14
Parser
How a Parser Uses CFG
MEng. Tran Ngoc
Bao Duy
3 Match the tokens step by step until the input is fully Parser Generation
from ANTLR4
consumed.
Challenges in
4 If no valid derivation exists, report a syntax error. Grammar Writing
Parser.15
Parser
How a Parser Uses CFG
MEng. Tran Ngoc
Bao Duy
3 Match the tokens step by step until the input is fully Parser Generation
from ANTLR4
consumed.
Challenges in
4 If no valid derivation exists, report a syntax error. Grammar Writing
Parser.15
Parser
Leftmost derivation
MEng. Tran Ngoc
Bao Duy
Definition
Leftmost derivation is a way to generate a string from Introduction
a context-free grammar by expanding the leftmost non- Context-free Grammar
terminal at each step. This method produces a parse tree Parser: How it works
Parser.16
Parser
Leftmost derivation
MEng. Tran Ngoc
Bao Duy
Definition
Leftmost derivation is a way to generate a string from Introduction
a context-free grammar by expanding the leftmost non- Context-free Grammar
terminal at each step. This method produces a parse tree Parser: How it works
Parser.16
Parser
Leftmost derivation: Examples
MEng. Tran Ngoc
Grammar: Bao Duy
Context-free Grammar
Parser Generation
from ANTLR4
Challenges in
Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule
Parser.17
Parser
Leftmost derivation: Examples
MEng. Tran Ngoc
Grammar: Bao Duy
expr + term
Introduction
term term * factor
Context-free Grammar
Challenges in
NUMBER NUMBER 2 Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule
3 5
Parser.18
Parser
Parse Tree as Output
MEng. Tran Ngoc
expr Bao Duy
expr + term
Introduction
term term * factor
Context-free Grammar
Challenges in
NUMBER NUMBER 2 Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule
3 5
Definition
A parse tree is a hierarchical, tree-like structure that represents the
syntactic structure of a string according to a context-free grammar
(CFG). Each node in the tree corresponds to a grammar symbol, and
the edges represent the application of production rules.
Parser.18
Parser
Introduction
Context-free Grammar
Parser Generation
FROM ANTLR4 from ANTLR4
Challenges in
Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule
Parser.19
Parser
ANTLR4: A Quick Recap
MEng. Tran Ngoc
Bao Duy
Recap
ANTLR4 (Another Tool for Language Recognition) is a pow-
Introduction
erful tool for generating parsers from grammar definitions. It can
Context-free Grammar
automatically create lexer, parser, and listener/visitor code for
Parser: How it works
various programming languages
Parser Generation
from ANTLR4
ANTLR4 Workflow for Parser Generation:
Challenges in
Grammar Writing
1 Define the Grammar: uses a .g4 file, includes lexer rules Ambiguous Grammar
(for token recognition) and parser rules (for syntactic struc- Common Cases in Parser Rule
ture).
2 Run the command to generate the parser and lexer code.
3 Use the parser and lexer generated.
Parser.20
Parser
Writing Parser Rules
MEng. Tran Ngoc
Bao Duy
• Repetition as *, +, ?
• Grouping as ()
• Token end-of-file EOF using for starting rule.
Parser.21
Parser
Example Grammar: Arithmetic Language
MEng. Tran Ngoc
Bao Duy
grammar Arithmetic ;
// Parser Rules
expr : expr ’+ ’ term // Addition Introduction
| expr ’ - ’ term // Subtraction Context-free Grammar
| term ; // Single Term Parser: How it works
Parser Generation
term : term ’* ’ factor // Multiplication from ANTLR4
| term ’/ ’ factor // Division Challenges in
| factor ; // Single Factor Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule
factor : ’( ’ expr ’) ’ // Parentheses
| NUMBER ; // Numbers
// Lexer Rules
NUMBER : [0 -9]+; // Integer
WS : [ \ t \ r \ n ]+ -> skip ; // Ignore whitespace
Parser.22
Parser
Introduction
Context-free Grammar
Parser Generation
GRAMMAR WRITING from ANTLR4
Challenges in
Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule
Parser.23
Parser
Ambiguous Grammar in CFGs
MEng. Tran Ngoc
Bao Duy
Definition
A context-free grammar (CFG) is said to be ambiguous if
there exists at least one string that can have more than one Introduction
distinct parse tree or derivation. This means the grammar Context-free Grammar
allows multiple valid interpretations for the same input, lead- Parser: How it works
ing to syntactic ambiguity. Parser Generation
from ANTLR4
• Multiple Parse Trees: A single input string can lead Challenges in
to more than one valid parse tree. Grammar Writing
Ambiguous Grammar
• Parser Conflicts: Ambiguous grammars often cause Common Cases in Parser Rule
Parser.24
Parser
Ambiguous Grammar: Example
MEng. Tran Ngoc
Bao Duy
WS : [ \ t \ r \ n ]+ -> skip ;
Parser.25
Parser
Example: two possible parse trees
expr MEng. Tran Ngoc
Bao Duy
expr * expr
Context-free Grammar
3 5 Parser: How it works
Parse Tree 1: (3 + 5) ∗ 2 Parser Generation
from ANTLR4
Challenges in
expr Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule
expr + expr
3 NUMBER NUMBER
5 2
Parse Tree 2: 3 + (5 ∗ 2)
Parser.26
Parser
Common Patterns in Writing Parser Rules
MEng. Tran Ngoc
Bao Duy
there are five recurring patterns that frequently appear: Parser: How it works
Parser Generation
1 A non-empty list of x. from ANTLR4
Parser.27
Parser
Case 1: A non-empty list of x
MEng. Tran Ngoc
Bao Duy
Example
A program in the C language consists of at least one func-
tion. The non-terminal symbol for a program is program (as
a start symbol), and for a func, it is function. Introduction
Parser.28
Parser
Case 1: A non-empty list of x
MEng. Tran Ngoc
Bao Duy
Example
A program in the C language consists of at least one func-
tion. The non-terminal symbol for a program is program (as
a start symbol), and for a func, it is function. Introduction
• With EBNF:
Parser.28
Parser
Formula 1: A non-empty list of x
MEng. Tran Ngoc
Bao Duy
Parser Generation
xlist : x xlist | x ; from ANTLR4
Challenges in
Grammar Writing
2 With EBNF: Ambiguous Grammar
Common Cases in Parser Rule
xlist : x +;
Parser.29
Parser
Case 2: A null-able list of x
MEng. Tran Ngoc
Bao Duy
Example
The body of a function in C consists of a null-able list of
statements enclosed in a pair of curly braces. The function Introduction
Write the production rule for the function body. Parser Generation
from ANTLR4
Challenges in
Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule
Parser.30
Parser
Case 2: A null-able list of x
MEng. Tran Ngoc
Bao Duy
Example
The body of a function in C consists of a null-able list of
statements enclosed in a pair of curly braces. The function Introduction
Write the production rule for the function body. Parser Generation
from ANTLR4
Challenges in
• With BNF: Grammar Writing
Ambiguous Grammar
func_body : ’{ ’ stmtlist ’} ’; Common Cases in Parser Rule
• With EBNF:
func_body : ’{ ’ stmtlist ’} ’;
stmtlist : stmt *;
Parser.30
Parser
Formula 2: A non-empty list of x
MEng. Tran Ngoc
Bao Duy
Parser Generation
xlist : x xlist | ; from ANTLR4
Challenges in
Grammar Writing
2 With EBNF: Ambiguous Grammar
Common Cases in Parser Rule
xlist : x *;
Parser.31
Parser
Case 3: A non-empty list of x, separated by y
MEng. Tran Ngoc
Bao Duy
Example
A variable declaration starts with a type, which is int or
float, then a comma-separated list of identifiers and ends
with a semicolon. It is known that a variable declaration is Introduction
denoted as vardecl, and an identifier is denoted as ID. Context-free Grammar
Other terminal symbols include: INT (integer type), Parser: How it works
FLOAT (floating-point type), SEMI (semicolon), and CM Parser Generation
from ANTLR4
(comma).
Challenges in
Write the production rule for the variable declaration. Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule
Parser.32
Parser
Case 3: A non-empty list of x, separated by y
MEng. Tran Ngoc
Bao Duy
Example
A variable declaration starts with a type, which is int or
float, then a comma-separated list of identifiers and ends
with a semicolon. It is known that a variable declaration is Introduction
denoted as vardecl, and an identifier is denoted as ID. Context-free Grammar
Other terminal symbols include: INT (integer type), Parser: How it works
FLOAT (floating-point type), SEMI (semicolon), and CM Parser Generation
from ANTLR4
(comma).
Challenges in
Write the production rule for the variable declaration. Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule
• With BNF:
• With EBNF:
Parser Generation
xlist : x y xlist | x ; from ANTLR4
Challenges in
Grammar Writing
2 With EBNF: Ambiguous Grammar
Common Cases in Parser Rule
xlist : x ( y x )*;
Parser.33
Parser
Case 4: A null-able list of x, separated by y
MEng. Tran Ngoc
Bao Duy
Example
The parameter declaration starts with a left round bracket
LB and a null-able semicolon-separated list of param-
eters and ends with a right round bracket RB. It is known Introduction
Context-free Grammar
that a parameter declaration is denoted as paramdecl, and
Parser: How it works
a parameter is denoted as param.
Parser Generation
Write the production rule for the parameter declaration. from ANTLR4
Challenges in
Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule
Parser.34
Parser
Case 4: A null-able list of x, separated by y
MEng. Tran Ngoc
Bao Duy
Example
The parameter declaration starts with a left round bracket
LB and a null-able semicolon-separated list of param-
eters and ends with a right round bracket RB. It is known Introduction
Context-free Grammar
that a parameter declaration is denoted as paramdecl, and
Parser: How it works
a parameter is denoted as param.
Parser Generation
Write the production rule for the parameter declaration. from ANTLR4
Challenges in
• With BNF: Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule
paramdecl : LB paramlist RB ;
paramlist : param SM paramlist | ;
• With EBNF:
paramdecl : LB paramlist RB ;
paramlist : param ( SM param )* | ;
Parser.34
Parser
Case 4: A null-able list of x, separated by y
MEng. Tran Ngoc
Bao Duy
Example
The parameter declaration starts with a left round bracket
LB and a null-able semicolon-separated list of param-
eters and ends with a right round bracket RB. It is known Introduction
Context-free Grammar
that a parameter declaration is denoted as paramdecl, and
Parser: How it works
a parameter is denoted as param.
Parser Generation
Write the production rule for the parameter declaration. from ANTLR4
Challenges in
• With BNF: Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule
paramdecl : LB paramlist RB ;
paramlist : param SM paramlist | ;
• With EBNF:
paramdecl : LB paramlist RB ;
paramlist : param ( SM param )* | ;
Context-free Grammar
that a parameter declaration is denoted as paramdecl, and
Parser: How it works
a parameter is denoted as param.
Parser Generation
Write the production rule for the parameter declaration. from ANTLR4
Challenges in
• With BNF: Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule
paramdecl : LB paramlist RB ;
paramlist : param SM paramlist | ;
• With EBNF:
paramdecl : LB paramlist RB ;
paramlist : param ( SM param )* | ;
Write the production rule for the parameter declaration. Parser Generation
from ANTLR4
Challenges in
• With BNF: Grammar Writing
Ambiguous Grammar
paramdecl : LB paramlist RB ; Common Cases in Parser Rule
paramlist : paramprime | ;
paramprime : param SM paramprime | param ;
• With EBNF:
paramdecl : LB paramlist RB ;
paramlist : param ( SM param )* | ;
or paramlist: (param (SM param)*)?;
Parser.35
Parser
Formula 4: A null-able list of x, separated by y
MEng. Tran Ngoc
Bao Duy
Context-free Grammar
1 With BNF:
Parser: How it works
xlist : xprime | ; Parser Generation
xprime : x y xprime | x ; from ANTLR4
Challenges in
Grammar Writing
2 With EBNF: Ambiguous Grammar
Common Cases in Parser Rule
xlist : x ( y x )* | ;
Parser.36
Parser
Case 5: An infix expression
MEng. Tran Ngoc
Bao Duy
ematical operation, listed in the order they should be performed: Context-free Grammar
Challenges in
• MD – Multiplication and Division (From left to right) Grammar Writing
Ambiguous Grammar
• AS – Addition and Subtraction (From left to right) Common Cases in Parser Rule
Parser.37
Parser
Case 5: An infix expression - Example
MEng. Tran Ngoc
Bao Duy
Parser.38
Parser
Case 5: An infix expression - Example
MEng. Tran Ngoc
Bao Duy
Parser Generation
factor + fact from ANTLR4
Challenges in
ID factor + fact Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule
b ID factor
c ID
Parser.39
Parser
Case 5: An infix expression - Example
MEng. Tran Ngoc
Bao Duy
Parse Tree for b + c + d:
exp
term
Introduction
c ID
Parser.39
Parser
Case 5: An infix expression - Example
MEng. Tran Ngoc
Bao Duy
Parse Tree for b + c + d:
exp
term
Introduction
fact
Context-free Grammar
exp
term
fact Introduction
Context-free Grammar
factor + fact
Parser: How it works
b ID factor Challenges in
Grammar Writing
Ambiguous Grammar
c ID Common Cases in Parser Rule
term = exp
Introduction
fact term
Context-free Grammar
Parser Generation
ID factor + fact from ANTLR4
Challenges in
a ID factor Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule
b ID
Parser.40
Parser
Formula 5: An infix expression
MEng. Tran Ngoc
Bao Duy
Parser.41
Parser
Formula 5: An infix expression
MEng. Tran Ngoc
Bao Duy
Parser.41
Parser
Introduction
Context-free Grammar
THANK YOU.
Parser: How it works
Parser Generation
from ANTLR4
Challenges in
Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule
Parser.42