Year: 2024-2025
Spring Semester
Natural Language
Processing
Dr. Wafaa Samy
Dr. Hanaa Eissa
Syntax (Part 1)
Lecture (9)
2
Contents
• Syntax
• Context-Free Grammar (CFG)
o Derivation
o Parsing
o Grammatical and Ungrammatical Sentences
3
Syntax
• Study the order of words in a sentence and their
relationships.
o Syntax defines word categories and functions.
• Subject, verb, object is a sequence of functions
that corresponds to a common order in many
European languages including English and French.
• Syntactic Parsing determines the structure of a
sentence.
4
Syntax (Cont.)
• Why should we care about syntax?
• Grammars (and parsing) are key components
in many applications like:
o Grammar checkers
o Question answering
o Information extraction
o Machine translation
5
Context-Free Grammars
• Context-Free Grammars (CFGs).
o Also known as:
Phrase structure grammars.
Backus-Naur Form (BNF).
• CFG consist of: Rules, Terminals, and Non-terminals
1. Terminals:
o We’ll take these to be words (for now).
2. Non-Terminals:
o The constituents in a language.
Like noun phrase, verb phrase, verb, noun, sentence, etc.
3. Rules:
o Rules are equations that consist of a single non-terminal on the left
and any number of terminals and non-terminals on the right.
6
Some NP Rules
• Here are some rules for our noun phrases (NP):
• Together, these three rules describe two kinds of NPs.
1. One that consists of a determiner followed by a nominal.
2. And another that says that proper names are NPs.
• The third rule illustrates two things:
o An explicit disjunction (by using “|”): Two kinds of nominal.
o A recursive definition: Same non-terminal “Nominal” on the
right and left-side of the rule.
7
L0 Grammar
8
Formal Definition
9
Context-Free Grammar
• A context-free grammar consists of:
1. A set of production rules, each of which expresses the ways
that symbols of the language can be grouped and ordered
together, and
2. A lexicon of words and symbols.
10
Derivation and Parsing
• A derivation is a sequence of rules applied to a string
that accounts for that string:
o Covers all the elements in the string.
o Covers only the elements in the string.
11
Example (1): Derivation
• Derivation for the statement:
• Use this grammar to
S ⇒ NP VP
make a derivation for the ⇒ ProperNoun VP
following statement: ⇒ Adel VP
⇒ Adel Verb NP
Adel study Automata with Maged ⇒ Adel study NP
⇒ Adel study NP PP
⇒ Adel study ProperNoun PP
⇒ Adel study Automata PP
⇒ Adel study Automata Preposition NP
⇒ Adel study Automata with NP
⇒ Adel study Automata with ProperNoun
12 ⇒ Adel study Automata with Maged
Derivation
If A → α is a production rule, and β and γ are any strings in the set (Σ U V)*, then
we say that β A γ directly derives β α γ, written as:
βAγ⇒βαγ
• Derivation is a generalization of direct derivation.
13
Language Defined by CFG
• The formal language defined by a CFG is the set of strings
that are derivable from the designated start symbol.
14
Parsing
• Parsing (or Syntactic parsing) is the process of taking
a string and a grammar and returning a (multiple?)
parse tree(s) for that string.
• It is completely analogous to running a finite-state
transducer with a tape.
o It’s just more powerful:
Remember this means that there are languages we
can capture with CFGs that we can’t capture with
finite-state methods.
15
Example (2): Parsing
• Parse tree:
16 Adel study Automata with Maged
Example (3): Parsing
Parse Tree
17
Question (1)
• Which of the following rules is NOT a correct
context-free rule, where non-terminals are
{A, B}, and terminals are {a, b}.
1. A a
2. A a | bA
3. AB a
4. a bA
5. Both 3 and 4
18
Question (1): Solution
• Which of the following rules is NOT a correct
context-free rule, where non-terminals are
{A, B}, and terminals are {a, b}.
1. A a
2. A a | bA
3. AB a
4. a bA
5. Both 3 and 4
19
Question (2)
• Given {a, the, study} ∈ terminals and
{VP, Verb, Det} ∈ non-terminals, which of the
following production rules is NOT a correct
CFG rule:
1. Det → a | the
2. VP → Verb
3. Verb → study
4. a → Det
20
Question (2): Solution
• Given {a, the, study} ∈ terminals and
{VP, Verb, Det} ∈ non-terminals, which of the
following production rules is NOT a correct
CFG rule:
1. Det → a | the
2. VP → Verb
3. Verb → study
4. a → Det
21
Question (3)
• Given {x, y} is a set of terminals and A is a
non-terminal, which of the following words is
correctly generated from the grammar with a
rule: A → x | yA
1. xyyy
2. yxyxyx
3. yyyx
4. xyy
22
Question (3): Solution
• Given {x, y} is a set of terminals and A is a
non-terminal, which of the following words is
correctly generated from the grammar with a
rule: A → x | yA
1. xyyy
2. yxyxyx
3. yyyx
4. xyy
23
Grammatical and Ungrammatical
Sentences
• A CFG defines a formal language ( a set of strings ).
• Sentences (strings of words) that can be derived by a
grammar are in the formal language defined by that
grammar, and are called grammatical sentences.
24
Grammatical and Ungrammatical
Sentences (Cont.)
• Sentences that cannot be derived by a given formal
grammar are not in the language defined by that grammar,
and are referred to as ungrammatical sentences.
ungrammatical sentence grammatical sentence
25