ATCD Unit 2
ATCD Unit 2
Grammar
The set of production rules which are used for generate the strings is call as
Grammar. In Theory of Computation, grammar refers to a formal system that defines how
strings in a language are constructed. It plays a crucial role in determining the syntactic
correctness of languages and forms the foundation for parsing and interpreting programming
languages, natural languages, and other formal systems.
Grammar in Computation
Grammar is a formal system that defines a set of rules for generating valid strings within
a language. It serves as a blueprint for constructing syntactically correct sentences or
meaningful sequences in a formal language.
Grammar is basically composed of two basic elements:
Terminal Symbols: Terminal symbols are those that are the components of the sentences
generated using grammar and are represented using small case letters like a, b, c, etc.
Non-Terminal Symbols: Non-terminal symbols are those symbols that take part in the
generation of the sentence but are not the component of the sentence. Non-Terminal
Symbols are also called Auxiliary Symbols and Variables. These symbols are represented
using a capital letters like A, B, C, etc.
Representation of Grammar
Any Grammar can be represented by 4 tuples - <V, T, P, S>
V - Finite Non-Empty Set of Non-Terminal Symbols.
T - Finite Set of Terminal Symbols.
P - Finite Non-Empty Set of Production Rules.
S - Start Symbol (Symbol from where we start producing our sentences or strings).
Production Rules
A production or production rule in computer science is a rewrite rule specifying a symbol
substitution that can be recursively performed to generate new symbol sequences. It is of the
form α-> β where α is a Non-Terminal Symbol which can be replaced by β which is a string
of Terminal Symbols or Non-Terminal Symbols.
Example 1
Consider Grammar G1 = <V, T, P, S>
T = {a,b} #Set of terminal symbols
1 2 3 4 5
P = { A -> Aa, A -> Ab, A -> a ,A -> b, A -> 𝜺} #Set of all production rules
S = {A} #Start Symbol
As the start symbol S is equivalent to A then we can produce Aa, Ab, a, b, 𝜺 strings. These
strings can further produce strings where A can be replaced by the strings mentioned in the
production rules. Hence this grammar can be used to produce strings of the form (a+b)*.
Derivation of Strings
A->a #using production rule 3
OR
A->Aa #using production rule 1
Aa->ba #using production rule 4
OR
A->Aa #using production rule 1
Aa->AAa #using production rule 1
AAa->bAa #using production rule 4
bAa->ba #using production rule 5
Example 2
Equivalent Grammars
Two grammars are said to be equivalent if they generate the same language. For instance, if
Grammar 1 and Grammar 2 both generate strings of the form (𝑎+𝑏)∗, they are considered
equivalent.
Types of Grammars
There are several types of Grammar. We classify them on the basis mentioned below.
Type of Production Rules: The form and complexity of the production rules define the
grammar type, such as context-free, context-sensitive, regular, or unrestricted grammars.
Number of Derivation Trees: The number of ways a string can be derived from the
grammar. Ambiguous grammars have multiple derivation trees for the same string.
Number of Strings: The size and nature of the language generated by the grammar.
Type 0 Grammar:-
This grammar is also called as phase structured grammar. In this grammar the
Right Hand Side of production are free from any restriction. This is also called as
unrestricted grammar. The language generated by Type 0 grammar is Recursively
Enumerable Language.
Type 1 Grammar:-
It is also called as context sensitive grammar. the production should be of the p form αβ
Such that |α|≤|β| the grammar which contains all type 1 productions such grammar is
called Type 1 grammar or context sensitive grammar.
Language generated by the context sensitive grammar (CSG) is called Context Sensitive
Language(CSL).
Example:-
1. S-->aS,
S-->AB,
S-->Aab
Type 2 Grammar:-
A production of the form αβ , such that β ∈ (V U T)*. Then it is called Type 2
production. If a grammar contains all type 2 productions. That is called as type 2
grammar. This is also called as Context Free Grammar. The language generated by the
Context Free Grammar is called as Context Free Language(CFL)
Example:
S-->aA
S-->BAD
Type 3 Grammar:-
Type-3 grammars (regular grammars) generate the regular languages. Such a grammar
restricts its rule to a single non-terminal on the left-hand side and a right-hand side
consisting of a single terminal, possibly followed or precedes, but not both in the same
grammar by a single non- terminal. A production S→ε is allowed in Type-3 grammar, but
in this case S does not appear on the right hand side of any production.
Example:-
S-->a,
S-->b,
S-->aA,
A-->a
Regular Grammar:
A regular grammar is a formal grammar that describes the regular language.
Where a formal grammar is defined as a set of rules for rewriting the strings, along with
a start symbol from which the rewriting must start.
S ⇢ 00B | 11S
B ⇢ 0B | 1B | 0 | 1
where,
S and B are non-terminals, and
0 and 1 are terminals
S ⇢ B00 | S11
B ⇢ B0 | B1 | 0 | 1
where
S and B are non-terminals, and
0 and 1 are terminals
Context Free Grammar:
A context-free grammar (CFG) is a formal system used to describe a class of languages
known as context-free languages (CFLs). Purpose of context-free grammar is:
To list all strings in a language using a set of rules (production rules).
It extends the capabilities of regular expressions and finite automata.
A CFG (or just a grammar) G is a tuple G = (V, T, P, S) where
V is the (finite) set of variables (or non terminals or syntactic categories). Each variable
represents a language, i.e., a set of strings
T is a finite set of terminals, i.e., the symbols that form the strings of the language being
defined
P is a set of production rules that represent the recursive definition of the language.
S is the start symbol that represents the language being defined. Other variables represent
auxiliary classes of strings that are used to define the language of the start symbol.
A grammar is said to be the Context-free grammar if every production is in the form of:
G -> (V∪T)* , where G ∊ V
V (Variables/Non-terminals): These are symbols that can be replaced using production
rules. They help in defining the structure of the grammar. Typically, non-terminals are
represented by uppercase letters (e.g., S, A, B).
T (Terminals): These are symbols that appear in the final strings of the language and
cannot be replaced further. They are usually represented by lowercase letters (e.g., a, b,
c) or specific symbols.
The left-hand side can only be a Variable, it cannot be a terminal.
But on the right-hand side here it can be a Variable or Terminal or both combination of
Variable and Terminal.
The above equation states that every production which contains any combination of the
'V' variable or 'T' terminal is said to be a context-free grammar.
S⟹aS
S⟹aSS
S⟹aSSS
S⟹aaSS
S⟹aaSSS
S⟹aaaSS
S⟹aaaaS
S⟹aaaaa
Language generated by G is L(G)= { ai | i>=2}.
In the given productions we have production in the form of Sa . So we have single
a ϵ L(G).
hence the language generated by given language G id L(G)={ai | i>=1}.
Tuple representation:
------------------------------
G(V,T,P,S)
V={S}
T={b,a}
P={ Sabb|aSbb}
SS.
Find the grammar for the language L= {an cm dm bn | n, m >=1}
Strings generated by the given language
L={ acdb, accddb, aacdbb, aaaccddbbb……}
Derivation
Derivation is a sequence of production rules. It is used to get the input string through
these production
rules.
We have two options to decide which non-terminal to be placed with production rule.
1. Leftmost Derivation:
In the leftmost derivation, the input is scanned and replaced with the production rule
from left to right.
So in leftmost derivation, we read the input string from left to right.
Example:
Consider the grammar SS+S, SS*S, Sa, Sb.
2. Rightmost Derivation:
In rightmost derivation, the input is scanned and replaced with the production rule
from right to left.
So in rightmost derivation, we read the input string from right to left.
Example:
Consider the grammar SS+S, SS*S, Sa, Sb.
Classification of CFG
Context Free Grammars (CFG) can be classified on the basis of following two
properties:
During Compilation, the parser uses the grammar of the language to make a parse tree(or
derivation tree) out of the source code. The grammar used must be unambiguous. An
ambiguous grammar must not be used for parsing.
For example, for the expression a+b×c , the only valid parse is that multiplication happens
first due to the structure of the grammar. This is an unambiguous grammar because there is
only one way to derive the string.
A A
A - A
A + A
A + A
a A - A
a a a
a a
Parse Tree for the above derivation: Parse Tree for the above derivation:
There exist two left derivations , So the given grammar is ambiguous.
Example 2:
S→0S1∣SS∣ϵ
Answer: It’s ambiguous because at least one string in its language has two distinct parse trees
(equivalently, two distinct leftmost derivations).
Take the string 01.
Example 3:
S→aAB
A→bC | cd
C→cd
B→c | d
Ambiguous grammar: a context-free grammar is ambiguous if there exists at least one string
in its language that has two distinct parse trees (equivalently two distinct leftmost or rightmost
derivations).
Left Recursion: A grammar is said to be Left recursive if and only if it is of the form AAα
such that A is a variable, and α Є (VUT)*.
Example: SS+S
SSa
Elimination of Left Factoring: Consider the A productions which are having the left
factoring as follows
Aα β1| α β2|..... α βn| γ1| γ2 ......| γm
γ 1, γ2 ........ γm doesn’t starts with α
we can eliminate left factoring in the following way
A αA1| γ1| γ2 ......| γm
A1β1|β2.......βm.
Examples:
1. Consider the CFG SaSa|aa|b eliminate left factoring.
Sol: In the given grammar there exist left factoring . a is common left factoring. We can
eliminate left factoring in following way.
SaA1|b
A1 Sa|a
Simplification of Grammars:
Grammar may contain some extra symbols , these will increase the length of the
grammar. Elimination of these unnecessary symbols is called simplification of CFGs.
Simplification of grammars generally includes
a. Elimination of useless symbols.
b. Elimination of Є productions.
c. Elimination of unit productions of the form AB.
a. Elimination of useless symbols: A symbol is useless if it can not derive a terminal
or it is not reachable from start symbol.
Examples:
1. Eliminate useless symbols and productions from the following grammar.
SABa|BC, AaC|BCC, Ca, Bbcc, DE, Ed, Fe
Sol: In the given grammar the non terminals D,E,F are not reachable from the start symbol
‘S’, so we can eliminate them And the simplified grammar is
SABa|BC, AaC|BCC, Ca, Bbcc,
2. Eliminate useless symbols in G.
SAB|CA, SBC|AB, Aa, CaB|b.
Sol: In the given grammar there is no production for B. So we have to eliminate the
productions which contains B.
The simplified grammar is
SCA
Aa
Cb.
b. Elimination of Є productions:
If some CFL contains the word Є then CFG must have a Є-production. However if a
CFG has a Є-production then the CFL doesn’t necessarily contain Є.
Example: SaX
XЄ
CFL={a}
Nullable Variables: In a given Context free grammar a non terminal X is nullable if
1. There is a production XЄ
2. There is a derivation that starts at X and leads to Є
i.e X-----Є
Examples:
1. Eliminate Є-productions from the following grammar G.
SABaC, ABC, Bb|ε, CD|ε, Dd.
Sol: nullable variable are Vn = {B,C,A}
Because B, C are having Є-productions and production A leads to Є.
Now we have to replace the nullable variable with Є.
SABaC|AaC|ABa|aC|a|Aa|Ba
ABC|B|C
CD
Bb
Cd.
2. Eliminate Є-productions from the following grammar G.
SaA, ABB, BaBb|Є
Sol: nullable set Vn= {B}
SaA|a
ABB|B
BaBb|ab.
c. Elimination of unit productions: A production which is of the form AB where A,
B are variables is said to be unit productions.
For each pair of non-terminals A and B such that there is a production AB and
the non-unit productions from B are BS1|S2|...Sn
Where Si Є(TUV)* are strings of terminals and non-terminals then create new
productions as
AS1|S2|...Sn
Do the same for all such pairs A and B simultaneously.
Examples:
1. Eliminate the unit productions in the grammar
SA|bb, AB|b, BS|a
Sol: In the given grammar we have following unit productions
SA, AB, BS.
After eliminating the above unit productions, the required grammar is as follows.
Sb|bb|a
Ab|bb|a
Ba|bb|b
Normal Forms:
If G is a Context free grammar and the production of G satisfy certain properties
then G is said to be in a normal form. There are two types of normal forms.
1. Chomsky Normal Form (CNF)
2. Greibach Normal Form (GNF)
1. Chomsky's Normal Form (CNF)
CNF stands for Chomsky normal form. A CFG(context free grammar) is in
CNF(Chomsky normal form) if all production rules satisfy one of the following conditions:
A non-terminal generating two non-terminals. For example, S → AB.
A non-terminal generating a terminal. For example, S → a.
For example:
1. G1 = {S → AB, S → c, A → a, B → b}
2. G2 = {S → aA, A → a, B → c}
The production rules of Grammar G1 satisfy the rules specified for CNF, so the
grammar G1 is in CNF. However, the production rule of Grammar G2 does not satisfy the
rules specified for CNF as S → aA contains terminal followed by non-terminal. So the
grammar G2 is not in CNF.
Step 2: Eliminate terminals from the RHS of the production if they exist with other non-
terminals or terminals. For example, production S → aA can be decomposed as:
1. S → RA
2. R → a
Step 3: Eliminate RHS with more than two non-terminals. For example, S → ASB can be
decomposed as:
1. S → RS
2. R → AS
Example:
Convert the given CFG to CNF. Consider the given grammar G1:
1. S → a | aA | B
2. A → aBB | ε
3. B → Aa | b
Solution:
Step 1: As grammar G1 contains A → ε null production, its removal from the grammar
yields:
S → a | aA | B
A → aBB
B → Aa | b | a
Now, as grammar G1 contains Unit production S → B, its removal yield:
S → a | aA | Aa | b
A → aBB
B → Aa | b | a
Also remove the unit production S1 → S, its removal from the grammar yields:
S → a | aA | Aa | b
A → aBB
B → Aa | b | a
Step 2: In the production rule S → aA | Aa, A → aBB and B → Aa,
terminal a exists on RHS with non-terminals. So we will replace terminal a with X:
S → a | XA | AX | b
A → XBB
B → AX | b | a
X→a
Step 3: In the production rule A → XBB, RHS has more than two symbols, removing it
from grammar yield:
S → a | XA | AX | b
A → RB
B → AX | b | a
X→a
R → XB
Hence, for the given grammar, this is the required CNF.
For example:
1. G1 = {S → aAB | aB, A → aA| a, B → bB | b}
2. G2 = {S → aAB | aB, A → aA | ε, B → bB | ε}
The production rules of Grammar G1 satisfy the rules specified for GNF, so the
grammar G1 is in GNF. However, the production rule of Grammar G2 does not satisfy the
rules specified for GNF as A → ε and B → ε contains ε(only start symbol can generate ε).
So the grammar G2 is not in GNF.
Step 3: In the grammar, convert the given production rule into GNF form.
If any production rule in the grammar is not in GNF form, convert it.
Example:
1. S → XB | AA
2. A → a | SA
3. B → b
4. X → a
Solution:
As the given grammar G is already in CNF and there is no left recursion, so we can
skip step 1 and step 2 and directly go to step 3.
The production rule A → SA is not in GNF, so we substitute S → XB | AA in the
production rule A → SA as:
1. S → XB | AA
2. A → a | XBA | AAA
3. B → b
4. X → a
The production rule S → XB and A → XBA is not in GNF, so we substitute X → a in the
production rule S → XB and A → XBA as:
1. S → aB | AA
2. A → a | aBA | AAA
3. B → b
4. X → a
Now we will remove left recursion (A → AAA), we get:
1. S → aB | AA
2. A → aC | aBAC
3. C → AAC | ε
4. B → b
5. X → a
Now we will remove null production C → ε, we get:
1. S → aB | AA
2. A → aC | aBAC | a | aBA
3. C → AAC | AA
4. B → b
5. X → a
The production rule S → AA and C → AA is not in GNF, so we substitute A → aC | aBAC
| a | aBA in production rule S → AA and C → AA as:
1. S → aB | aCA | aBACA | aA | aBAA
2. A → aC | aBAC | a | aBA
3. C → AAC
4. C → aCA | aBACA | aA | aBAA
5. B → b
6. X → a
The production rule C → AAC is not in GNF, so we substitute A → aC | aBAC | a | aBA
in production rule C → AAC as:
1. S → aB | aCA | aBACA | aA | aBAA
2. A → aC | aBAC | a | aBA
3. C → aCAC | aBACAC | aAC | aBAAC
4. C → aCA | aBACA | aA | aBAA
5. B → b
6. X → a
Hence, this is the GNF form for the grammar G.