KEMBAR78
CO3005 Chapter 3 Syntax Analysis | PDF | Parsing | Computer Science
0% found this document useful (0 votes)
18 views62 pages

CO3005 Chapter 3 Syntax Analysis

The document provides an overview of parsers and syntax analysis, detailing the role of context-free grammar in parsing and the process of parser generation using ANTLR4. It discusses challenges in grammar writing, including ambiguous grammar and common cases in parser rules. Additionally, it covers the concepts of leftmost derivation and the limitations of regular expressions in expressing certain patterns.

Uploaded by

dang.nguyen106
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
18 views62 pages

CO3005 Chapter 3 Syntax Analysis

The document provides an overview of parsers and syntax analysis, detailing the role of context-free grammar in parsing and the process of parser generation using ANTLR4. It discusses challenges in grammar writing, including ambiguous grammar and common cases in parser rules. Additionally, it covers the concepts of leftmost derivation and the limitations of regular expressions in expressing certain patterns.

Uploaded by

dang.nguyen106
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 62

Parser

MEng. Tran Ngoc


Bao Duy

Syntax Analysis
Introduction

Context-free Grammar

Parser: How it works


Principles of Programming Languages
Parser Generation
from ANTLR4

Challenges in
Grammar Writing

MEng. Tran Ngoc Bao Duy Ambiguous Grammar


Common Cases in Parser Rule

Department of Computer Science


Faculty of Computer Science and Engineering
Ho Chi Minh University of Technology, VNU-HCM

Parser.1
Parser
Overview
MEng. Tran Ngoc
Bao Duy

1 Introduction

Introduction
2 Context-free Grammar Context-free Grammar

Parser: How it works

3 Parser: How it works Parser Generation


from ANTLR4

Challenges in
Grammar Writing
4 Parser Generation from ANTLR4 Ambiguous Grammar
Common Cases in Parser Rule

5 Challenges in Grammar Writing


Ambiguous Grammar
Common Cases in Parser Rule

Parser.2
Parser

MEng. Tran Ngoc


Bao Duy

Introduction

Context-free Grammar

INTRODUCTION TO Parser: How it works

Parser Generation
SYNTAX ANALYSIS from ANTLR4

Challenges in
Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule

Parser.3
Parser
An overview of compilation
MEng. Tran Ngoc
Bao Duy
Character stream

Lexical analysis (scanner)


Token stream

Syntax analysis (parser)


Parse tree Introduction

Context-free Grammar
Abstract syntax tree (AST) generation Front-end
Abstract syntax tree Parser: How it works

Semantic analysis (Static checking) Parser Generation


(Modified) AST from ANTLR4

Challenges in
Intermediate code generation Grammar Writing
Intermediate code Ambiguous Grammar

Machine-independent code Common Cases in Parser Rule


optimization (optional)
Modified IC
Target code generation Back-end
Target language
Machine-specific end code
optimization (optional)
Modified
target language

Parser.4
Parser
Roles of Syntax Analysis
MEng. Tran Ngoc
Bao Duy

Definition
Syntax analysis, also known as parsing, is the second phase
of a compiler, following lexical analysis. Its primary role is Introduction
to check whether the sequence of tokens generated by the Context-free Grammar
lexical analyzer forms a valid syntactic structure according Parser: How it works
to the rules of the programming language’s grammar. Parser Generation
from ANTLR4
1 Syntax Validation:
Challenges in
• Ensures that the program’s structure follows the lan- Grammar Writing
Ambiguous Grammar
guage’s grammar rules. Common Cases in Parser Rule

• Detects syntax errors and reports them to the developer


with clear error messages.
2 Parse Tree Generation: Constructs a parse tree (or abstract
syntax tree - AST) representing the hierarchical structure of
the program based on grammar rules.

Parser.5
Parser
Syntax Analysis: Example
MEng. Tran Ngoc
Bao Duy
• Input: a + b ∗ c
• Grammar rules (Simplified Context-Free Grammar)
• Output (as parse tree):
Introduction

E Context-free Grammar

Parser: How it works

Parser Generation
E + T from ANTLR4

Challenges in
Grammar Writing
Ambiguous Grammar

T T ∗ F Common Cases in Parser Rule

F F id(c)

id(a) id(b)

Parser.6
Parser

MEng. Tran Ngoc


Bao Duy

Introduction

CONTEXT-FREE
Context-free Grammar

Parser: How it works

Parser Generation

GRAMMAR from ANTLR4

Challenges in
Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule

Parser.7
Parser
The Limits of Regular Expressions
MEng. Tran Ngoc
Bao Duy

Regular expressions (REs) are powerful for pattern match-


ing. But can they express certain symmetric patterns?
Introduction
L = {an bn | n > 0} Context-free Grammar

Parser: How it works


Using a finite automaton (as back-end of regex):
Parser Generation
from ANTLR4
a b Challenges in
Grammar Writing
? Ambiguous Grammar

q0 b q1 q2
Common Cases in Parser Rule

start

Parser.8
Parser
The Limits of Regular Expressions
MEng. Tran Ngoc
Bao Duy

Regular expressions (REs) are powerful for pattern match-


ing. But can they express certain symmetric patterns?
Introduction
L = {an bn | n > 0} Context-free Grammar

Parser: How it works


Using a finite automaton (as back-end of regex):
Parser Generation
from ANTLR4
a b Challenges in
Grammar Writing
? Ambiguous Grammar

q0 b q1 q2
Common Cases in Parser Rule

start

Limitation: The finite automaton cannot count the number


of as to ensure they match the number of bs.

Parser.8
Parser
Symmetry in Programming
MEng. Tran Ngoc
Bao Duy

Symmetry in programming refers to structures where one


part mirrors or matches another, ensuring balance and cor-
rectness. Introduction
• Balanced Parentheses: ((...)) Context-free Grammar

• Control Structures: Parser: How it works

Parser Generation
if ... then ... else or repeat ... until from ANTLR4

• HTML/XML Tags: <div><p>Text</p></div> Challenges in


Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule

Parser.9
Parser
Symmetry in Programming
MEng. Tran Ngoc
Bao Duy

Symmetry in programming refers to structures where one


part mirrors or matches another, ensuring balance and cor-
rectness. Introduction
• Balanced Parentheses: ((...)) Context-free Grammar

• Control Structures: Parser: How it works

Parser Generation
if ... then ... else or repeat ... until from ANTLR4

• HTML/XML Tags: <div><p>Text</p></div> Challenges in


Grammar Writing
Regular expressions cannot describe this kind of structure Ambiguous Grammar
Common Cases in Parser Rule

but they could be:


1 A means of describing this kind of language.
2 A method to detect if a sequence of tokens is valid or
invalid with respect to this kind of language.

Parser.9
Parser
Context-free grammar
MEng. Tran Ngoc
Bao Duy
A context-free grammar (CFG) is a formal system used to define
programming languages, natural languages, and structures in compu-
tation. CFG generates strings by applying production rules that replace
non-terminal symbols with sequences of symbols.
Introduction

Components Context-free Grammar

Parser: How it works


1 A set of terminals T: Basic symbols (e.g., a, b) used to Parser Generation
represent strings. from ANTLR4

Challenges in
2 A set of non-terminals N: Abstract symbols (e.g., S, X ) Grammar Writing
used in rules. Ambiguous Grammar
Common Cases in Parser Rule

3 A start symbol S ∈ N: A special non-terminal where deriva-


tion starts.
4 A set of production P: Rules for replacing non-terminals
with sequences of terminals/non-terminals.
A production p ∈ P is in the form: X → α where X ∈ N
and α is a sequence of symbols in T and/or N.

Parser.10
Parser
Context-free grammar: Examples
MEng. Tran Ngoc
Bao Duy

In C, for example, a while loop consists of the keyword


while followed by a parenthesized Boolean expression and
Introduction
a block of statements:
Context-free Grammar

Parser: How it works

Parser Generation
from ANTLR4

Challenges in
Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule

Parser.11
Parser
Context-free grammar: Examples
MEng. Tran Ngoc
Bao Duy

In C, for example, a while loop consists of the keyword


while followed by a parenthesized Boolean expression and
Introduction
a block of statements:
Context-free Grammar

while_statement −→ while ( expression ) block_statement Parser: How it works

Parser Generation
from ANTLR4

Challenges in
Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule

Parser.11
Parser
Context-free grammar: Examples
MEng. Tran Ngoc
Bao Duy

In C, for example, a while loop consists of the keyword


while followed by a parenthesized Boolean expression and
Introduction
a block of statements:
Context-free Grammar

while_statement −→ while ( expression ) block_statement Parser: How it works

Parser Generation
from ANTLR4
The statement, in turn, is often a list enclosed in braces:
Challenges in
Grammar Writing
block_statement −→ { statements } Ambiguous Grammar
Common Cases in Parser Rule

or:
block_statement −→ { }

Parser.11
Parser
Backus-Naur Form (BNF)
MEng. Tran Ngoc
Bao Duy

1 The notation for context-free grammars is sometimes


called Backus-Naur Form (BNF), in honor of John Introduction

Backus and Peter Naur, who devised it for the definition Context-free Grammar

of the Algol-60 programming language. Parser: How it works

Parser Generation
2 Strictly speaking, RE operators (*, +, ?) and meta- from ANTLR4

level parentheses are not allowed in BNF, but they do Challenges in


Grammar Writing
not change the expressive power of the notation, and Ambiguous Grammar
Common Cases in Parser Rule
are commonly included for convenience.

Parser.12
Parser
Backus-Naur Form (BNF)
MEng. Tran Ngoc
Bao Duy

1 The notation for context-free grammars is sometimes


called Backus-Naur Form (BNF), in honor of John Introduction

Backus and Peter Naur, who devised it for the definition Context-free Grammar

of the Algol-60 programming language. Parser: How it works

Parser Generation
2 Strictly speaking, RE operators (*, +, ?) and meta- from ANTLR4

level parentheses are not allowed in BNF, but they do Challenges in


Grammar Writing
not change the expressive power of the notation, and Ambiguous Grammar
Common Cases in Parser Rule
are commonly included for convenience.
−→ BNF is used to describe how strings in a language can
be derived using recursive rules.

Parser.12
Parser
Extended Backus-Naur Form (EBNF)
MEng. Tran Ngoc
Bao Duy

1 EBNF (Extended Backus-Naur Form) is an exten-


sion of BNF that introduces RE operators to simplify
grammar definitions.
Introduction
2 EBNF allows for more compact, readable, and ex- Context-free Grammar
pressive grammar rules. Parser: How it works

3 It can describe the same set of context-free lan- Parser Generation


from ANTLR4
guages (CFLs) as BNF but in a more succinct way. Challenges in
Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule

Parser.13
Parser
Extended Backus-Naur Form (EBNF)
MEng. Tran Ngoc
Bao Duy

1 EBNF (Extended Backus-Naur Form) is an exten-


sion of BNF that introduces RE operators to simplify
grammar definitions.
Introduction
2 EBNF allows for more compact, readable, and ex- Context-free Grammar
pressive grammar rules. Parser: How it works

3 It can describe the same set of context-free lan- Parser Generation


from ANTLR4
guages (CFLs) as BNF but in a more succinct way. Challenges in
Grammar Writing
Ambiguous Grammar
Comparison of BNF and EBNF Features Common Cases in Parser Rule

Feature BNF EBNF


Repetition Recursive rules *, +, ?
Grouping Multiple non-terminal rules Parentheses ()
Alternatives | Same, inside ()
Optional Elements | with ε Brackets [...]

Parser.13
Parser

MEng. Tran Ngoc


Bao Duy

Introduction

PARSER:
Context-free Grammar

Parser: How it works

Parser Generation

HOW IT WORKS from ANTLR4

Challenges in
Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule

Parser.14
Parser
How a Parser Uses CFG
MEng. Tran Ngoc
Bao Duy

• Input: A sequence of tokens from the lexer.


• Process: Introduction

1 Start with the start symbol. Context-free Grammar

2 Apply production rules to expand non-terminals. Parser: How it works

3 Match the tokens step by step until the input is fully Parser Generation
from ANTLR4
consumed.
Challenges in
4 If no valid derivation exists, report a syntax error. Grammar Writing

• Output: A parse tree if the input is valid or a syntax Ambiguous Grammar


Common Cases in Parser Rule

error if the input violates the grammar.

Parser.15
Parser
How a Parser Uses CFG
MEng. Tran Ngoc
Bao Duy

• Input: A sequence of tokens from the lexer.


• Process: Introduction

1 Start with the start symbol. Context-free Grammar

2 Apply production rules to expand non-terminals. Parser: How it works

3 Match the tokens step by step until the input is fully Parser Generation
from ANTLR4
consumed.
Challenges in
4 If no valid derivation exists, report a syntax error. Grammar Writing

• Output: A parse tree if the input is valid or a syntax Ambiguous Grammar


Common Cases in Parser Rule

error if the input violates the grammar.


⇒ This is a top-down parser uses context-free grammar
to derive the structure of an input string (derivation).

Parser.15
Parser
Leftmost derivation
MEng. Tran Ngoc
Bao Duy

Definition
Leftmost derivation is a way to generate a string from Introduction
a context-free grammar by expanding the leftmost non- Context-free Grammar
terminal at each step. This method produces a parse tree Parser: How it works

by always replacing the leftmost non-terminal first, en- Parser Generation


from ANTLR4
suring a structured and predictable derivation process.
Challenges in
Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule

Parser.16
Parser
Leftmost derivation
MEng. Tran Ngoc
Bao Duy

Definition
Leftmost derivation is a way to generate a string from Introduction
a context-free grammar by expanding the leftmost non- Context-free Grammar
terminal at each step. This method produces a parse tree Parser: How it works

by always replacing the leftmost non-terminal first, en- Parser Generation


from ANTLR4
suring a structured and predictable derivation process.
Challenges in
Grammar Writing
Benefits of Leftmost Derivation: Ambiguous Grammar
Common Cases in Parser Rule
1 Deterministic: Ensures a clear, step-by-step derivation process.
2 Easy to Implement: Forms the basis for top-down parsers.
3 Predictable Tree Structure: Guarantees consistent parse tree
construction from left to right.

Parser.16
Parser
Leftmost derivation: Examples
MEng. Tran Ngoc
Grammar: Bao Duy

1 expr → expr + term | term


2 term → term * factor | factor
3 factor → ( expr ) | NUMBER
Introduction

Context-free Grammar

Parser: How it works

Parser Generation
from ANTLR4

Challenges in
Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule

Parser.17
Parser
Leftmost derivation: Examples
MEng. Tran Ngoc
Grammar: Bao Duy

1 expr → expr + term | term


2 term → term * factor | factor
3 factor → ( expr ) | NUMBER
Introduction
Input: 3 + 5 ∗ 2, derivation (step by Step):
Context-free Grammar
1 Start with the start symbol: expr
Parser: How it works
2 Apply the first rule: expr → expr + term Parser Generation
from ANTLR4
3 Expand the leftmost expr again: expr → term + term
Challenges in
4 Expand the leftmost term: term → factor + term Grammar Writing
Ambiguous Grammar

5 Replace factor with NUMBER: factor → 3 Common Cases in Parser Rule

Now we have: term → 3 + term


6 Expand the next term: term → term * factor
7 Replace term with factor : term → factor * factor
8 Replace factor with NUMBER: factor → 5
Now we have: term → 3 + 5 * factor
9 Replace the last factor : factor → 2
Parser.17
Parser
Parse Tree as Output
MEng. Tran Ngoc
expr Bao Duy

expr + term

Introduction
term term * factor
Context-free Grammar

Parser: How it works


factor factor NUMBER Parser Generation
from ANTLR4

Challenges in
NUMBER NUMBER 2 Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule

3 5

Parser.18
Parser
Parse Tree as Output
MEng. Tran Ngoc
expr Bao Duy

expr + term

Introduction
term term * factor
Context-free Grammar

Parser: How it works


factor factor NUMBER Parser Generation
from ANTLR4

Challenges in
NUMBER NUMBER 2 Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule

3 5
Definition
A parse tree is a hierarchical, tree-like structure that represents the
syntactic structure of a string according to a context-free grammar
(CFG). Each node in the tree corresponds to a grammar symbol, and
the edges represent the application of production rules.
Parser.18
Parser

MEng. Tran Ngoc


Bao Duy

Introduction

Context-free Grammar

PARSER GENERATION Parser: How it works

Parser Generation
FROM ANTLR4 from ANTLR4

Challenges in
Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule

Parser.19
Parser
ANTLR4: A Quick Recap
MEng. Tran Ngoc
Bao Duy

Recap
ANTLR4 (Another Tool for Language Recognition) is a pow-
Introduction
erful tool for generating parsers from grammar definitions. It can
Context-free Grammar
automatically create lexer, parser, and listener/visitor code for
Parser: How it works
various programming languages
Parser Generation
from ANTLR4
ANTLR4 Workflow for Parser Generation:
Challenges in
Grammar Writing
1 Define the Grammar: uses a .g4 file, includes lexer rules Ambiguous Grammar
(for token recognition) and parser rules (for syntactic struc- Common Cases in Parser Rule

ture).
2 Run the command to generate the parser and lexer code.
3 Use the parser and lexer generated.

Parser.20
Parser
Writing Parser Rules
MEng. Tran Ngoc
Bao Duy

Parser rules define how tokens are combined to form valid


structures.
Introduction
• Written in lowercase (by convention).
Context-free Grammar
• Can reference other parser rules or lexer rules. Parser: How it works

Grammar Features: Parser Generation


from ANTLR4
• Production symbol as : Challenges in
Grammar Writing
• Alternatives as | Ambiguous Grammar
Common Cases in Parser Rule

• Repetition as *, +, ?
• Grouping as ()
• Token end-of-file EOF using for starting rule.

Parser.21
Parser
Example Grammar: Arithmetic Language
MEng. Tran Ngoc
Bao Duy

grammar Arithmetic ;

// Parser Rules
expr : expr ’+ ’ term // Addition Introduction
| expr ’ - ’ term // Subtraction Context-free Grammar
| term ; // Single Term Parser: How it works

Parser Generation
term : term ’* ’ factor // Multiplication from ANTLR4
| term ’/ ’ factor // Division Challenges in
| factor ; // Single Factor Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule
factor : ’( ’ expr ’) ’ // Parentheses
| NUMBER ; // Numbers

// Lexer Rules
NUMBER : [0 -9]+; // Integer
WS : [ \ t \ r \ n ]+ -> skip ; // Ignore whitespace

Parser.22
Parser

MEng. Tran Ngoc


Bao Duy

Introduction

Context-free Grammar

CHALLENGES IN Parser: How it works

Parser Generation
GRAMMAR WRITING from ANTLR4

Challenges in
Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule

Parser.23
Parser
Ambiguous Grammar in CFGs
MEng. Tran Ngoc
Bao Duy

Definition
A context-free grammar (CFG) is said to be ambiguous if
there exists at least one string that can have more than one Introduction
distinct parse tree or derivation. This means the grammar Context-free Grammar
allows multiple valid interpretations for the same input, lead- Parser: How it works
ing to syntactic ambiguity. Parser Generation
from ANTLR4
• Multiple Parse Trees: A single input string can lead Challenges in
to more than one valid parse tree. Grammar Writing
Ambiguous Grammar

• Parser Conflicts: Ambiguous grammars often cause Common Cases in Parser Rule

conflicts during parsing.


• Complex Disambiguation: Resolving ambiguity typi-
cally requires modifying the grammar to enforce prece-
dence, associativity, or restructuring rules.

Parser.24
Parser
Ambiguous Grammar: Example
MEng. Tran Ngoc
Bao Duy

Grammar (in ANTLR4):


Introduction
grammar AmbiguousExpr ;
Context-free Grammar

expr : expr ( ’+ ’ | ’ - ’) expr // AddSub Parser: How it works

| expr ( ’* ’ | ’/ ’) expr // MulDiv Parser Generation


from ANTLR4
| ’( ’ expr ’) ’ // Parens
| NUMBER ; // Number Challenges in
Grammar Writing
Ambiguous Grammar

NUMBER : [0 -9]+; Common Cases in Parser Rule

WS : [ \ t \ r \ n ]+ -> skip ;

For an input 3 + 5 ∗ 2, there are two possible parse trees

Parser.25
Parser
Example: two possible parse trees
expr MEng. Tran Ngoc
Bao Duy
expr * expr

expr + expr NUMBER

NUMBER NUMBER 2 Introduction

Context-free Grammar
3 5 Parser: How it works
Parse Tree 1: (3 + 5) ∗ 2 Parser Generation
from ANTLR4

Challenges in
expr Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule
expr + expr

NUMBER expr * expr

3 NUMBER NUMBER

5 2

Parse Tree 2: 3 + (5 ∗ 2)
Parser.26
Parser
Common Patterns in Writing Parser Rules
MEng. Tran Ngoc
Bao Duy

In the process of writing parser rules, the typical ap-


proach is to follow the order in which they appear in the Introduction

language specification. However, during the writing process, Context-free Grammar

there are five recurring patterns that frequently appear: Parser: How it works

Parser Generation
1 A non-empty list of x. from ANTLR4

2 A null-able list of x. Challenges in


Grammar Writing

3 A non-empty list of x, separated by y. Ambiguous Grammar


Common Cases in Parser Rule

4 A null-able list of x, separated by y.


5 An infix expression.

Parser.27
Parser
Case 1: A non-empty list of x
MEng. Tran Ngoc
Bao Duy
Example
A program in the C language consists of at least one func-
tion. The non-terminal symbol for a program is program (as
a start symbol), and for a func, it is function. Introduction

Write the production rule for program. Context-free Grammar

Parser: How it works

Explanation: A program in the C language consists of at Parser Generation


from ANTLR4
least one function ⇒ A program in the C language consists
Challenges in
of a non-empty list of functions. Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule

Parser.28
Parser
Case 1: A non-empty list of x
MEng. Tran Ngoc
Bao Duy
Example
A program in the C language consists of at least one func-
tion. The non-terminal symbol for a program is program (as
a start symbol), and for a func, it is function. Introduction

Write the production rule for program. Context-free Grammar

Parser: How it works

Explanation: A program in the C language consists of at Parser Generation


from ANTLR4
least one function ⇒ A program in the C language consists
Challenges in
of a non-empty list of functions. Grammar Writing
Ambiguous Grammar
• With BNF: Common Cases in Parser Rule

program : func_list EOF ;


func_list : func func_list | func ;

• With EBNF:

program : func_list EOF ;


func_list : func +;

Parser.28
Parser
Formula 1: A non-empty list of x
MEng. Tran Ngoc
Bao Duy

Every case in the format of a non-empty list of x can be Introduction

written using the following formula: Context-free Grammar

1 With BNF: Parser: How it works

Parser Generation
xlist : x xlist | x ; from ANTLR4

Challenges in
Grammar Writing
2 With EBNF: Ambiguous Grammar
Common Cases in Parser Rule
xlist : x +;

Parser.29
Parser
Case 2: A null-able list of x
MEng. Tran Ngoc
Bao Duy

Example
The body of a function in C consists of a null-able list of
statements enclosed in a pair of curly braces. The function Introduction

body is denoted as func_body, and a statement is denoted Context-free Grammar

as stmt. Parser: How it works

Write the production rule for the function body. Parser Generation
from ANTLR4

Challenges in
Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule

Parser.30
Parser
Case 2: A null-able list of x
MEng. Tran Ngoc
Bao Duy

Example
The body of a function in C consists of a null-able list of
statements enclosed in a pair of curly braces. The function Introduction

body is denoted as func_body, and a statement is denoted Context-free Grammar

as stmt. Parser: How it works

Write the production rule for the function body. Parser Generation
from ANTLR4

Challenges in
• With BNF: Grammar Writing
Ambiguous Grammar
func_body : ’{ ’ stmtlist ’} ’; Common Cases in Parser Rule

stmtlist : stmt stmtlist | ;

• With EBNF:

func_body : ’{ ’ stmtlist ’} ’;
stmtlist : stmt *;

Parser.30
Parser
Formula 2: A non-empty list of x
MEng. Tran Ngoc
Bao Duy

Every case in the format of a null-able list of x can be Introduction

written using the following formula: Context-free Grammar

1 With BNF: Parser: How it works

Parser Generation
xlist : x xlist | ; from ANTLR4

Challenges in
Grammar Writing
2 With EBNF: Ambiguous Grammar
Common Cases in Parser Rule
xlist : x *;

Parser.31
Parser
Case 3: A non-empty list of x, separated by y
MEng. Tran Ngoc
Bao Duy
Example
A variable declaration starts with a type, which is int or
float, then a comma-separated list of identifiers and ends
with a semicolon. It is known that a variable declaration is Introduction
denoted as vardecl, and an identifier is denoted as ID. Context-free Grammar
Other terminal symbols include: INT (integer type), Parser: How it works
FLOAT (floating-point type), SEMI (semicolon), and CM Parser Generation
from ANTLR4
(comma).
Challenges in
Write the production rule for the variable declaration. Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule

Parser.32
Parser
Case 3: A non-empty list of x, separated by y
MEng. Tran Ngoc
Bao Duy
Example
A variable declaration starts with a type, which is int or
float, then a comma-separated list of identifiers and ends
with a semicolon. It is known that a variable declaration is Introduction
denoted as vardecl, and an identifier is denoted as ID. Context-free Grammar
Other terminal symbols include: INT (integer type), Parser: How it works
FLOAT (floating-point type), SEMI (semicolon), and CM Parser Generation
from ANTLR4
(comma).
Challenges in
Write the production rule for the variable declaration. Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule
• With BNF:

vardecl : ( INT | FLOAT ) idlist SEMI ;


idlist : ID CM idlist | ID ;

• With EBNF:

vardecl : ( INT | FLOAT ) idlist SEMI ;


idlist : ID ( CM ID )*;
Parser.32
Parser
Formula 3: A non-empty list of x, separated by y
MEng. Tran Ngoc
Bao Duy

Every case in the format of a non-empty list of x, sepa- Introduction

rated by y can be written using the following formula: Context-free Grammar

1 With BNF: Parser: How it works

Parser Generation
xlist : x y xlist | x ; from ANTLR4

Challenges in
Grammar Writing
2 With EBNF: Ambiguous Grammar
Common Cases in Parser Rule
xlist : x ( y x )*;

Parser.33
Parser
Case 4: A null-able list of x, separated by y
MEng. Tran Ngoc
Bao Duy
Example
The parameter declaration starts with a left round bracket
LB and a null-able semicolon-separated list of param-
eters and ends with a right round bracket RB. It is known Introduction

Context-free Grammar
that a parameter declaration is denoted as paramdecl, and
Parser: How it works
a parameter is denoted as param.
Parser Generation
Write the production rule for the parameter declaration. from ANTLR4

Challenges in
Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule

Parser.34
Parser
Case 4: A null-able list of x, separated by y
MEng. Tran Ngoc
Bao Duy
Example
The parameter declaration starts with a left round bracket
LB and a null-able semicolon-separated list of param-
eters and ends with a right round bracket RB. It is known Introduction

Context-free Grammar
that a parameter declaration is denoted as paramdecl, and
Parser: How it works
a parameter is denoted as param.
Parser Generation
Write the production rule for the parameter declaration. from ANTLR4

Challenges in
• With BNF: Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule
paramdecl : LB paramlist RB ;
paramlist : param SM paramlist | ;

• With EBNF:

paramdecl : LB paramlist RB ;
paramlist : param ( SM param )* | ;

Parser.34
Parser
Case 4: A null-able list of x, separated by y
MEng. Tran Ngoc
Bao Duy
Example
The parameter declaration starts with a left round bracket
LB and a null-able semicolon-separated list of param-
eters and ends with a right round bracket RB. It is known Introduction

Context-free Grammar
that a parameter declaration is denoted as paramdecl, and
Parser: How it works
a parameter is denoted as param.
Parser Generation
Write the production rule for the parameter declaration. from ANTLR4

Challenges in
• With BNF: Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule
paramdecl : LB paramlist RB ;
paramlist : param SM paramlist | ;

• With EBNF:

paramdecl : LB paramlist RB ;
paramlist : param ( SM param )* | ;

Is the solution above correct?


Parser.34
Parser
Case 4: A null-able list of x, separated by y
MEng. Tran Ngoc
Bao Duy
Example
The parameter declaration starts with a left round bracket
LB and a null-able semicolon-separated list of param-
eters and ends with a right round bracket RB. It is known Introduction

Context-free Grammar
that a parameter declaration is denoted as paramdecl, and
Parser: How it works
a parameter is denoted as param.
Parser Generation
Write the production rule for the parameter declaration. from ANTLR4

Challenges in
• With BNF: Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule
paramdecl : LB paramlist RB ;
paramlist : param SM paramlist | ;

• With EBNF:

paramdecl : LB paramlist RB ;
paramlist : param ( SM param )* | ;

Is the solution above correct? NO


Parser.34
Parser
Case 4: A null-able list of x, separated by y
MEng. Tran Ngoc
Bao Duy
Example
The parameter declaration starts with a left round bracket
LB and a null-able semicolon-separated list of param-
eters and ends with a right round bracket RB. It is known Introduction

that a parameter declaration is denoted as paramdecl, and Context-free Grammar

a parameter is denoted as param. Parser: How it works

Write the production rule for the parameter declaration. Parser Generation
from ANTLR4

Challenges in
• With BNF: Grammar Writing
Ambiguous Grammar
paramdecl : LB paramlist RB ; Common Cases in Parser Rule

paramlist : paramprime | ;
paramprime : param SM paramprime | param ;

• With EBNF:

paramdecl : LB paramlist RB ;
paramlist : param ( SM param )* | ;
or paramlist: (param (SM param)*)?;
Parser.35
Parser
Formula 4: A null-able list of x, separated by y
MEng. Tran Ngoc
Bao Duy

Every case in the format of a null-able list of x, separated


by y can be written using the following formula: Introduction

Context-free Grammar
1 With BNF:
Parser: How it works
xlist : xprime | ; Parser Generation
xprime : x y xprime | x ; from ANTLR4

Challenges in
Grammar Writing
2 With EBNF: Ambiguous Grammar
Common Cases in Parser Rule
xlist : x ( y x )* | ;

or: xlist: (x (y x)*)?;

Parser.36
Parser
Case 5: An infix expression
MEng. Tran Ngoc
Bao Duy

"Please Excuse My Dear Aunt Sally"(PEMDAS) is a pop-


ular mnemonic used to remember the order of operations in
mathematics. Each word in the phrase represents a specific math- Introduction

ematical operation, listed in the order they should be performed: Context-free Grammar

Parser: How it works


• P – Parentheses (Solve expressions inside parentheses first)
Parser Generation
• E – Exponents (Calculate powers and roots next) from ANTLR4

Challenges in
• MD – Multiplication and Division (From left to right) Grammar Writing
Ambiguous Grammar

• AS – Addition and Subtraction (From left to right) Common Cases in Parser Rule

In programming (as mathematics), infix expressions require ex-


plicit rules for the order of operations (such as PEMDAS), while
prefix and postfix expressions do not.

Parser.37
Parser
Case 5: An infix expression - Example
MEng. Tran Ngoc
Bao Duy

Given a piece of grammar defined in ANTLR as below:


grammar exp ; Introduction
exp : term ASSIGN exp | term ;
Context-free Grammar
term : term EXPONENT fact | fact ;
Parser: How it works
fact : factor RELOP fact | factor ADDOP fact
| factor ; Parser Generation
from ANTLR4
factor : ID ;
Challenges in
ASSIGN : ’= ’ ; Grammar Writing
EXPONENT : ’^ ’ ; Ambiguous Grammar
Common Cases in Parser Rule
ADDOP : ’+ ’ ;
RELOP : ’>’ ;
ID : [a - zA - Z_ ][ a - zA - Z_0 -9]* ;
WS : [ \ t \ r \ n ]+ -> skip ;

Parser.38
Parser
Case 5: An infix expression - Example
MEng. Tran Ngoc
Bao Duy

Parse Tree for b + c + d:


exp
Introduction
term
Context-free Grammar

fact Parser: How it works

Parser Generation
factor + fact from ANTLR4

Challenges in
ID factor + fact Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule
b ID factor

c ID

Parser.39
Parser
Case 5: An infix expression - Example
MEng. Tran Ngoc
Bao Duy
Parse Tree for b + c + d:
exp

term
Introduction

fact Context-free Grammar

Parser: How it works


factor + fact
Parser Generation
from ANTLR4
ID factor + fact
Challenges in
Grammar Writing
b ID factor Ambiguous Grammar
Common Cases in Parser Rule

c ID

In this tree, the rightmost + (between c and d) appears deeper (closer


to the leaves) than the leftmost one. This indicates that the right side
(c + d) is evaluated before the left (b + result).

Parser.39
Parser
Case 5: An infix expression - Example
MEng. Tran Ngoc
Bao Duy
Parse Tree for b + c + d:
exp

term
Introduction
fact
Context-free Grammar

Parser: How it works


factor + fact
Parser Generation
from ANTLR4
ID factor + fact
Challenges in
Grammar Writing
b ID factor
Ambiguous Grammar
Common Cases in Parser Rule
c ID

In this tree, the rightmost + (between c and d) appears deeper (closer


to the leaves) than the leftmost one. This indicates that the right side
(c + d) is evaluated before the left (b + result).
→ The ADDOP is right-associative.
Parser.39
Parser
Case 5: An infix expression - Example
MEng. Tran Ngoc
Parse Tree for b + c + d: Bao Duy

exp

term

fact Introduction

Context-free Grammar
factor + fact
Parser: How it works

ID factor + fact Parser Generation


from ANTLR4

b ID factor Challenges in
Grammar Writing
Ambiguous Grammar
c ID Common Cases in Parser Rule

fact: factor ADDOP fact | factor ;

• This is right-recursive because fact appears on the right-hand


side of the production rule.
• Right recursion leads to right-associative behavior.
Parser.39
Parser
Case 5: An infix expression - Example
MEng. Tran Ngoc
Bao Duy
Parse Tree for a = b + c
exp

term = exp
Introduction
fact term
Context-free Grammar

factor fact Parser: How it works

Parser Generation
ID factor + fact from ANTLR4

Challenges in
a ID factor Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule
b ID

• ASSIGN (=) is higher in the tree, giving it lower precedence.


• The depth of operators in the parse tree directly reflects their
precedence.

Parser.40
Parser
Formula 5: An infix expression
MEng. Tran Ngoc
Bao Duy

1 Operator Precedence: The closer an operator’s pro-


duction rule is to the start production rule, the lower
its precedence. Introduction

2 Operator Associativity: Context-free Grammar

• Left-recursive rules enforce left-associativity. Parser: How it works

• Right-recursive rules enforce right-associativity. Parser Generation


from ANTLR4
• Non-recursive rules enforce non-associativity.
Challenges in
Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule

Parser.41
Parser
Formula 5: An infix expression
MEng. Tran Ngoc
Bao Duy

1 Operator Precedence: The closer an operator’s pro-


duction rule is to the start production rule, the lower
its precedence. Introduction

2 Operator Associativity: Context-free Grammar

• Left-recursive rules enforce left-associativity. Parser: How it works

• Right-recursive rules enforce right-associativity. Parser Generation


from ANTLR4
• Non-recursive rules enforce non-associativity.
Challenges in
Grammar Writing
Ghi nhớ Ambiguous Grammar
Common Cases in Parser Rule

Gần gốc ưu tiên thấp hơn,


Trái thì kết trái, phải thì cũng xuôi.
Không đệ quy đứng lặng thôi,
Giữa dòng toán tử, một nơi vững vàng.

Parser.41
Parser

MEng. Tran Ngoc


Bao Duy

Introduction

Context-free Grammar

THANK YOU.
Parser: How it works

Parser Generation
from ANTLR4

Challenges in
Grammar Writing
Ambiguous Grammar
Common Cases in Parser Rule

Parser.42

You might also like