Chapter 3
Describing Syntax
and Semantics
ISBN 0-321-33025-0
Introduction
• Syntax: the form or structure of the expressions,
statements, and program units
• Semantics: the meaning of the expressions,
statements, and program units
• Pragmatics:
• Syntax and semantics provide a language’s
definition
– Users of a language definition
• Other language designers
• Implementers
• Programmers (the users of the language)
Copyright © 2006 Addison-Wesley. All rights reserved. 1-2
Language Theory
• Chomsky identified four classes of language
Type Characteristics
0 Unrestricted
1 Context-sensitive
2 Context-free
3 Regular
– Types 2 and 3 useful for programming language
specification
• Backus (on ALGOL58 committee) developed
notation for specifying programming
languages
Copyright © 2006 Addison-Wesley. All rights reserved. 1-3
Terminology
• A metalanguage is a language used to
describe another language
• A sentence is a string of characters over
some alphabet
• A language is a set of sentences
– a language is specified by a set of rules
• A lexeme is the lowest level syntactic unit
of a language (e.g., *, sum, begin)
• A token is a category of lexemes (e.g.,
identifier)
Copyright © 2006 Addison-Wesley. All rights reserved. 1-4
Two approaches to Language
Definition
• Recognizers
– Read a string and decide whether it follows the
rules for the language
– Example: syntax analysis part of a compiler
(Chapter 4)
• Generators
– A device that generates sentences of a
language (BNF)
– More useful for specifying the language than
for checking a string
Copyright © 2006 Addison-Wesley. All rights reserved. 1-5
Formal Methods of Describing
Syntax
• Backus-Naur Form and Context-Free
Grammars
– Most widely known method for describing
programming language syntax
– Developed as part of the process for
specifying ALGOL
– Define a class of languages called context-
free languages
• Extended BNF
– Improves readability and writability of BNF
Copyright © 2006 Addison-Wesley. All rights reserved. 1-6
BNF Fundamentals
• Non-terminals: BNF abstractions used to
represent classes of syntactic structures
• Terminals: lexemes and tokens
• Grammar: a collection of rules
– Examples of BNF rules:
<ident_list> → identifier | identifier, <ident_list>
<if_stmt> → if <logic_expr> then <stmt>
Copyright © 2006 Addison-Wesley. All rights reserved. 1-7
BNF Rules
• A rule has a left-hand side (LHS) and a
right-hand side (RHS), and consists of
terminal and nonterminal symbols
• In a context-free grammar, there can
only be one symbol on the LHS
• A grammar is a finite nonempty set of
rules
• An abstraction (or nonterminal symbol)
can have more than one RHS
<stmt> <single_stmt>
| begin <stmt_list> end
Copyright © 2006 Addison-Wesley. All rights reserved. 1-8
Derivations
• BNF is a generative device
– Use a grammar to generate sentences that
belong to the language the grammar describes
• A derivation is a repeated application of
rules, starting with the start symbol and
ending with a sentence (all terminal
symbols)
Copyright © 2006 Addison-Wesley. All rights reserved. 1-9
Examples : Describing Lists
• Syntactic lists are described using
recursion
<ident_list> -> <ident>
| <ident>, <ident_list>
<arg_list> -> <expr>
| <expr>, <arg_list>
<param_list> -> <param>
| param, <param_list>
<stmt_list> -> <statement>
| <statement> <stmt_list>
Copyright © 2006 Addison-Wesley. All rights reserved. 1-10
An Example Grammar
<program> -> <stmts>
<stmts> -> <stmt> | <stmt> ; <stmts>
<stmt> -> <var> = <expr>
<var> -> a | b | c | d
<expr> -> <term> + <term> | <term> - <term>
<term> -> <var> | const
Copyright © 2006 Addison-Wesley. All rights reserved. 1-11
Derivation
• Every string of symbols in the derivation is
a sentential form
• A sentence is a sentential form that has
only terminal symbols
• A leftmost derivation is one in which the
leftmost nonterminal in each sentential
form is the one that is expanded
• A derivation may be neither leftmost nor
rightmost
Copyright © 2006 Addison-Wesley. All rights reserved. 1-12
An Example Derivation
<program> => <stmts> => <stmt>
=> <var> = <expr>
=> a =<expr>
=> a = <term> + <term>
=> a = <var> + <term>
=> a = b + <term>
=> a = b + const
Copyright © 2006 Addison-Wesley. All rights reserved. 1-13
Parse Tree
• A hierarchical representation of a
derivation <program>
<stmts>
<stmt>
<var> = <expr>
a <term> + <term>
<var> const
b
Copyright © 2006 Addison-Wesley. All rights reserved. 1-14
Ambiguity in Grammars
• A grammar is ambiguous if and only if it
generates a sentential form that has
two or more distinct parse trees
Copyright © 2006 Addison-Wesley. All rights reserved. 1-15
An Ambiguous Expression Grammar
<expr> <expr> <op> <expr> | const
<op> / | -
<expr> <expr>
<expr> <op> <expr> <expr> <op> <expr>
<expr> <op> <expr> <expr> <op> <expr>
const - const / const const - const / const
Copyright © 2006 Addison-Wesley. All rights reserved. 1-16
An Unambiguous Expression Grammar
• If we use the parse tree to indicate
precedence levels of the operators, we
cannot have ambiguity
<expr> <expr> - <term> | <term>
<term> <term> / const | const
<expr>
<expr> - <term>
<term> <term> / const
const const
Copyright © 2006 Addison-Wesley. All rights reserved. 1-17
Associativity of Operators
• Operator associativity can also be indicated by a
grammar
<expr> -> <expr> + <expr> | const (ambiguous)
<expr> -> <expr> + const | const (unambiguous)
<expr>
<expr>
<expr> + const
<expr> + const
const
Copyright © 2006 Addison-Wesley. All rights reserved. 1-18
Extended BNF
• Optional parts are placed in brackets [ ]
<proc_call> -> ident [(<expr_list>)]
• Alternative parts of RHSs are placed
inside parentheses and separated via
vertical bars
<term> → <term> (+|-) const
• Repetitions (0 or more) are placed
inside braces { }
<ident> → letter {letter|digit}
Copyright © 2006 Addison-Wesley. All rights reserved. 1-19
BNF and EBNF
• BNF
<expr> <expr> + <term>
| <expr> - <term>
| <term>
<term> <term> * <factor>
| <term> / <factor>
| <factor>
• EBNF
<expr> <term> {(+ | -) <term>}
<term> <factor> {(* | /) <factor>}
Copyright © 2006 Addison-Wesley. All rights reserved. 1-20