USCSP401 Theory of Computation – Practical
Sr.   Date                                                                  Pa    Sign
No                                     Index                                ge
.                                                                           N
                                                                            o.
1                Practical-1: Write a program for tokenization of given     1-3
                                         input
2               Practical-2: Write a program for generating regular         4-6
                expressions for regular grammar
3               Practical-3: Write a program for generating derivation      7-9
                sequence / language for the given sequence of
                productions
4               Practical-4: Design a Program for creating machine that     10-
                accepts three consecutive one.                              12
5               Practical-5: Design a Program for creating machine that     13-
                accepts the string always ending with 101.                  15
6               Practical-6: Design a program for accepting decimal         16-
                number divisible by 2.                                      18
7               Practical-7: Design a program for creating a machine        19-
                which accepts string having equal no. of 1’s and 0’s.       21
8               Practical-8: Design a program for creating a machine        22-
                which count number of 1’s and 0’s in a given string.        24
9               Practical-9: Design a PDA to accept WCWR where w            25-
                is any string and WR is reverse of that string and C is a   27
                Special symbol.
10              Practical-10: Design a Turing machine that’s accepts        28-
                the following language an b n c n where n>0                 30
                                    *******
                                                       Theory-1
                                                     Tokenization
Tokenization is a fundamental process in natural language processing (NLP) that involves breaking
down a text into smaller units, or tokens. These tokens can be words, phrases, symbols, or any
meaningful chunks, depending on the level of granularity required for analysis. The purpose of
tokenization is to transform raw text into a format that is easier to process, analyze, and understand.
In simpler terms, imagine you have a sentence: "The quick brown fox jumps over the lazy dog."
Tokenization would break this sentence into individual units or tokens: "The," "quick," "brown," "fox,"
"jumps," "over," "the," "lazy", "dog", ".". Each of these words is now a separate token.
Basic Tokenization: The most straightforward tokenization is word tokenization, where the input text is
divided into words based on spaces. Punctuation marks are often treated as separate tokens as well.
For example, the sentence "Hello, how are you?" would be tokenized into: "Hello," ",", "how," "are,"
"you," "?"
Sentence Tokenization: Sometimes, it's necessary to tokenize a larger piece of text into sentences. This
process involves splitting the text into individual sentences.
Consider the paragraph: "Tokenization is crucial. It helps in various NLP tasks." Sentence tokenization
would yield: "Tokenization is crucial." and "It helps in various NLP tasks."
Whitespace Tokenization: Another simple tokenization method involves splitting the text based on
whitespace characters, such as spaces or tabs.
For instance, the string "apple banana cherry" would be tokenized into: "apple," "banana," "cherry."
Tokenization Challenges: Tokenization is not always a straightforward task due to challenges like
handling contractions, abbreviations, or languages with no clear word boundaries.
Consider the word "can't." It is a contraction of "cannot," and a simple tokenizer might treat it as "can"
and "'t." Advanced tokenizers take context into account to handle such cases better.
Tokenization Libraries: Many programming languages offer libraries for tokenization. In Python, the
Natural Language Toolkit (nltk) and spaCy are commonly used. These libraries provide pre-built
functions to tokenize text efficiently.
Use Cases:
Tokenization is a crucial preprocessing step in various NLP applications. Some common use cases
include:
Sentiment analysis: Analyzing the sentiment of a piece of text.
Named entity recognition: Identifying entities like names, locations, and organizations.
Machine translation: Translating text from one language to another.
Text summarization: Summarizing large amounts of text.
Practical-1: Write a program for tokenization of given input
Source code:
Output:
                                                               2 | Page
                       Theory-2
                        Regular
                       Expression
Regular Expressions:
                                    3 | Page
Definition: Regular expressions (regex or regexp) are sequences of characters that define a search pattern.
They are used for pattern matching within strings, allowing for the identification of specific sequences of
characters. Regular expressions are powerful tools for text processing and are widely used in
programming, text editors, and various other applications.
Components:
Literal Characters: Characters match themselves.
Metacharacters: Special characters with a reserved meaning (e.g., '*', '+', '.', '|').
Character Classes: A set of characters enclosed in square brackets (e.g., [abc] matches 'a', 'b', or 'c').
Quantifiers: Indicate the number of occurrences (e.g., '*', '+', '?').
Anchors: Specify the position in the string (e.g., '^' for the start, '$' for the end).
Grouping: Parentheses are used to group expressions and establish the order of operations.
Escape Characters: Backslash () is used to escape a metacharacter, treating it as a literal
character. Example:
The regular expression a*b matches strings like "b", "ab", "aab", etc., where 'a' can occur zero or more
times followed by 'b'.
Regular Grammar:
Definition: A regular grammar is a type of formal grammar that generates regular languages.
Regular languages can be recognized by finite automata, and regular grammars are less expressive
than context-free grammars.
Components:
Terminal Symbols: Basic units of the language.
Non-terminal Symbols: Symbols representing sets of strings.
Production Rules: Define how non-terminals can be replaced by sequences of terminals
and non-terminals.
Start Symbol: Designates the initial non-terminal symbol from which the derivation begins.
Epsilon (ε): Represents the empty string.
Regular Production: A production rule of the form A → xB or A → x, where A and B are non-terminals,
and x is a string of terminals.
Example:
Consider the regular grammar:
S → aA | b
A → aA | ε
This generates strings in the form of b or strings starting with 'a' followed by any number of 'a's.
Ex: b,a,aa,aaa,aaaa
                                                                                                 4 | Page
Practical-2: Write a program for generating regular expressions for regular grammar
Source code:
Output:
                                                                                 5 | Page
                                                  Theory-3
                                                  Language
In the context of formal languages and grammars, a language is a set of strings (sequences of symbols)
that are generated or recognized by a given formal grammar. The formal grammar specifies the rules for
constructing valid strings in the language. In this explanation, The following provides a basic
understanding of languages and how they relate to grammars.
Symbols: Symbols are an entity or individual objects, which can be any letter, alphabet or any picture.
Example: 1, a, b, #
Alphabet (Σ): The alphabet of a language is the set of symbols or characters from which strings in the
language are constructed. For example, in the language of binary strings, the alphabet is {0, 1}.
String: A string is a finite sequence of symbols from the alphabet. For instance, "0101" is a string over the
binary alphabet.
Language (L): A language is a set of strings over a given alphabet. It represents the set of all valid or
accepted strings according to the rules defined by a formal grammar.
A language which is formed over Σ can be Finite or Infinite.
Example: 1
L1 = {Set of string of length 2}
  = {aa, bb, ba, bb}        Finite Language
Example: 2
L2 = {Set of all strings starts with 'a'}
   = {a, aa, aaa, abb, abbb, ababb}         Infinite Language
Formal Grammar: A formal grammar is a set of rules that defines the syntax of a language. It consists of:
Terminal Symbols: The basic symbols of the language.
Non-terminal Symbols: Symbols that can be replaced by sequences of symbols according to the rules.
Production Rules: Rules that define how non-terminal symbols can be replaced by sequences of terminal
and/or non-terminal symbols.
Start Symbol: The initial non-terminal symbol from which the derivation process begins.
Derivation: The process of applying production rules in a formal grammar to generate a string in the
language is called derivation. Starting with the start symbol, successive applications of production rules
lead to the formation of strings in the language.
Generative Power: A formal grammar's generative power refers to its ability to generate or produce
strings in a language. Different types of grammars (regular, context-free, context-sensitive, etc.) have
varying generative powers.
For example, if we have a formal grammar with the following rules:
S -> aS | bA | ε
A -> ε
This grammar generates the language L, which consists of strings of the form a* b. The asterisk (*)
denotes zero or more occurrences of the symbol 'a'. Strings such as "b", "ab", "aab", etc., are part of this
language.
In summary, a language defined by a formal grammar represents a set of valid strings, and the grammar
provides the rules for generating or recognizing those strings. Different types of grammars have different
expressive powers, allowing for the description of various language classes.
                                                                                               6 | Page
Practical-3: Write a program for generating derivation sequence / language for the given sequence of
productions
Source code:
Output:
                                                                                  7 | Page
                                                       Theory-4
                                            Finite Automata
Finite automata are used to recognize patterns.
It takes the string of symbol as input and changes its state accordingly. When the desired symbol is found,
then the transition occurs.
At the time of transition, the automata can either move to the next state or stay in the same state.
Finite automata have two states, Accept state or Reject state. When the input string is processed
successfully, and the automata reached its final state, then it will accept.
Formal Definition of FA:
A finite automaton is a collection of 5-tuple (Q, ∑, δ, q0, F), where:
Q: finite set of states
∑: finite set of the input symbol
q0: initial state
F: final state
δ: Transition function
Finite Automata Model:
Finite automata can be represented by input tape and finite control.
Input tape: It is a linear tape having some number of cells. Each input symbol is placed in each cell.
Finite control: The finite control decides the next state on receiving particular input from input tape. The
tape reader reads the cells one by one from left to right, and at a time only one input symbol is read.
Types of Automata:
There are two types of finite automata:
- DFA(deterministic finite automata)
- NFA(non-deterministic finite automata)
1. DFA
DFA refers to deterministic finite automata. Deterministic refers to the uniqueness of the computation. In
the DFA, the machine goes to one state only for a particular input character. DFA does not accept the
null move.
2. NFA
NFA stands for non-deterministic finite automata. It is used to transmit any number of states for a
particular input. It can accept the null move.
Some important points about DFA and NFA:
- Every DFA is NFA, but NFA is not DFA.
- There can be multiple final states in both NFA and DFA.
- DFA is used in Lexical Analysis in Compiler.
- NFA is more of a theoretical concept.
                                                                                               8 | Page
Practical-4: Design a Program for creating machine that accepts three consecutive one.
Source code:
Output:
                                                                                  9 | Page
                                                       Theory-5
                                                   Finite Automata
Finite automata are used to recognize patterns.
It takes the string of symbol as input and changes its state accordingly. When the desired symbol is found,
then the transition occurs.
At the time of transition, the automata can either move to the next state or stay in the same state.
Finite automata have two states, Accept state or Reject state. When the input string is processed
successfully, and the automata reached its final state, then it will accept.
Formal Definition of FA:
A finite automaton is a collection of 5-tuple (Q, ∑, δ, q0, F), where:
Q: finite set of states
∑: finite set of the input symbol
q0: initial state
F: final state
δ: Transition function
Finite Automata Model:
Finite automata can be represented by input tape and finite control.
Input tape: It is a linear tape having some number of cells. Each input symbol is placed in each cell.
Finite control: The finite control decides the next state on receiving particular input from input tape. The
tape reader reads the cells one by one from left to right, and at a time only one input symbol is read.
Types of Automata:
There are two types of finite automata:
- DFA(deterministic finite automata)
- NFA(non-deterministic finite automata)
1. DFA
DFA refers to deterministic finite automata. Deterministic refers to the uniqueness of the computation. In
the DFA, the machine goes to one state only for a particular input character. DFA does not accept the
null move.
2. NFA
NFA stands for non-deterministic finite automata. It is used to transmit any number of states for a
particular input. It can accept the null move.
Some important points about DFA and NFA:
- Every DFA is NFA, but NFA is not DFA.
- There can be multiple final states in both NFA and DFA.
- DFA is used in Lexical Analysis in Compiler.
- NFA is more of a theoretical concept.
                                                                                               10 | Page
Practical-5: Design a Program for creating machine that accepts the string always ending with 101.
Source code:
Output:
                                                                                  11 | Page
                                                       Theory-6
                                                   Finite Automata
Finite automata are used to recognize patterns.
It takes the string of symbol as input and changes its state accordingly. When the desired symbol is found,
then the transition occurs.
At the time of transition, the automata can either move to the next state or stay in the same state.
Finite automata have two states, Accept state or Reject state. When the input string is processed
successfully, and the automata reached its final state, then it will accept.
Formal Definition of FA:
A finite automaton is a collection of 5-tuple (Q, ∑, δ, q0, F), where:
Q: finite set of states
∑: finite set of the input symbol
q0: initial state
F: final state
δ: Transition function
Finite Automata Model:
Finite automata can be represented by input tape and finite control.
Input tape: It is a linear tape having some number of cells. Each input symbol is placed in each cell.
Finite control: The finite control decides the next state on receiving particular input from input tape. The
tape reader reads the cells one by one from left to right, and at a time only one input symbol is read.
Types of Automata:
There are two types of finite automata:
- DFA(deterministic finite automata)
- NFA(non-deterministic finite automata)
1. DFA
DFA refers to deterministic finite automata. Deterministic refers to the uniqueness of the computation. In
the DFA, the machine goes to one state only for a particular input character. DFA does not accept the
null move.
2. NFA
NFA stands for non-deterministic finite automata. It is used to transmit any number of states for a
particular input. It can accept the null move.
Some important points about DFA and NFA:
- Every DFA is NFA, but NFA is not DFA.
- There can be multiple final states in both NFA and DFA.
- DFA is used in Lexical Analysis in Compiler.
- NFA is more of a theoretical concept.
                                                                                               12 | Page
Practical-6: Design a program for accepting decimal number divisible by 2.
Source code:
Output:
                                                                             13 | Page
                                                       Theory-7
                                                   Finite Automata
Finite automata are used to recognize patterns.
It takes the string of symbol as input and changes its state accordingly. When the desired symbol is found,
then the transition occurs.
At the time of transition, the automata can either move to the next state or stay in the same state.
Finite automata have two states, Accept state or Reject state. When the input string is processed
successfully, and the automata reached its final state, then it will accept.
Formal Definition of FA:
A finite automaton is a collection of 5-tuple (Q, ∑, δ, q0, F), where:
Q: finite set of states
∑: finite set of the input symbol
q0: initial state
F: final state
δ: Transition function
Finite Automata Model:
Finite automata can be represented by input tape and finite control.
Input tape: It is a linear tape having some number of cells. Each input symbol is placed in each cell.
Finite control: The finite control decides the next state on receiving particular input from input tape. The
tape reader reads the cells one by one from left to right, and at a time only one input symbol is read.
Types of Automata:
There are two types of finite automata:
- DFA(deterministic finite automata)
- NFA(non-deterministic finite automata)
1. DFA
DFA refers to deterministic finite automata. Deterministic refers to the uniqueness of the computation. In
the DFA, the machine goes to one state only for a particular input character. DFA does not accept the
null move.
2. NFA
NFA stands for non-deterministic finite automata. It is used to transmit any number of states for a
particular input. It can accept the null move.
Some important points about DFA and NFA:
- Every DFA is NFA, but NFA is not DFA.
- There can be multiple final states in both NFA and DFA.
- DFA is used in Lexical Analysis in Compiler.
- NFA is more of a theoretical concept.
                                                                                               14 | Page
Practical-7: Design a program for creating a machine which accepts string having equal no. of 1’s and 0’s.
Source code:
Output:
                                                                                   15 | Page
                                                       Theory-8
                                                   Finite Automata
Finite automata are used to recognize patterns.
It takes the string of symbol as input and changes its state accordingly. When the desired symbol is found,
then the transition occurs.
At the time of transition, the automata can either move to the next state or stay in the same state.
Finite automata have two states, Accept state or Reject state. When the input string is processed
successfully, and the automata reached its final state, then it will accept.
Formal Definition of FA:
A finite automaton is a collection of 5-tuple (Q, ∑, δ, q0, F), where:
Q: finite set of states
∑: finite set of the input symbol
q0: initial state
F: final state
δ: Transition function
Finite Automata Model:
Finite automata can be represented by input tape and finite control.
Input tape: It is a linear tape having some number of cells. Each input symbol is placed in each cell.
Finite control: The finite control decides the next state on receiving particular input from input tape. The
tape reader reads the cells one by one from left to right, and at a time only one input symbol is read.
Types of Automata:
There are two types of finite automata:
- DFA(deterministic finite automata)
- NFA(non-deterministic finite automata)
1. DFA
DFA refers to deterministic finite automata. Deterministic refers to the uniqueness of the computation. In
the DFA, the machine goes to one state only for a particular input character. DFA does not accept the
null move.
2. NFA
NFA stands for non-deterministic finite automata. It is used to transmit any number of states for a
particular input. It can accept the null move.
Some important points about DFA and NFA:
- Every DFA is NFA, but NFA is not DFA.
- There can be multiple final states in both NFA and DFA.
- DFA is used in Lexical Analysis in Compiler.
- NFA is more of a theoretical concept.
                                                                                               16 | Page
Practical-8: Design a program for creating a machine which count number of 1’s and 0’s in a given string.
rce code:
"
Output:
                                                                                  17 | Page
                                                       Theory-9
                                                   Pushdown automata
Pushdown automata is a way to implement a CFG in the same way we design DFA for a regular grammar.
A DFA can remember a finite amount of information, but a PDA can remember an infinite amount of
information.
Pushdown automata is simply an NFA augmented with an "external stack memory". The addition of stack
is used to provide a last-in-first-out memory management capability to Pushdown automata. Pushdown
automata can store an unbounded amount of information on the stack. It can access a limited amount of
information on the stack. A PDA can push an element onto the top of the stack and pop off an element
from the top of the stack. To read an element into the stack, the top elements must be popped off and are
lost.
A PDA is more powerful than FA. Any language which can be acceptable by FA can also be acceptable
by PDA. PDA also accepts a class of language which even cannot be accepted by FA. Thus PDA is
much more superior to FA.
PDA Components:
Input tape: The input tape is divided in many cells or symbols. The input head is read-only and may only
move from left to right, one symbol at a time.
Finite control: The finite control has some pointer which points the current symbol which is to be read.
Stack: The stack is a structure in which we can push and remove the items from one end only. It has an
infinite size. In PDA, the stack is used to store the items temporarily.
Formal definition of PDA:
The PDA can be defined as a collection of 7 components:
Q: the finite set of states
∑: the input set
Γ: a stack symbol which can be pushed and popped from the stack
q0: the initial state
Z: a start symbol which is in Γ.
F: a set of final states
δ: mapping function which is used for moving from current state to next state.
Instantaneous Description (ID):
ID is an informal notation of how a PDA computes an input string and make a decision that string is
accepted or rejected.
An instantaneous description is a triple (q, w, α) where:
q describes the current state.
w describes the remaining input.
α describes the stack contents, top at the left.
Turnstile Notation:
⊢ sign describes the turnstile notation and represents one move.
⊢* sign describes a sequence of moves.
                                                                                            18 | Page
Practical-9: Design a PDA to accept WCWR where w is any string and WR is reverse of that string and C is a
Special symbol.
Source code:
Output:
                                                                                      19 | Page
                                                      Theory-10
                                                   Turing Machine
Turing machine was invented in 1936 by Alan Turing. It is an accepting device which accepts Recursive
Enumerable Language generated by type 0 grammar.
There are various features of the Turing machine:
It has an external memory which remembers arbitrary long sequence of input.
It has unlimited memory capability.
The model has a facility by which the input at left or right on the tape can be read easily.
The machine can produce a certain output based on its input. Sometimes it may be required that the same
input has to be used to generate the output. So in this machine, the distinction between input and output
has been removed. Thus a common set of alphabets can be used for the Turing machine.
Formal definition of Turing machine
A Turing machine can be defined as a collection of 7 components:
Q: the finite set of states
∑: the finite set of input symbols
T: the tape symbol
q0: the initial state
F: a set of final states
B: a blank symbol used as a end marker for input
δ: a transition or mapping function.
The mapping function shows the mapping from states of finite automata and input symbol on the tape to
the next states, external symbols and the direction for moving the tape head. This is known as a triple or a
program for turing machine.
Ex: (q0, a) → (q1, A, R)
That means in q0 state, if we read symbol 'a' then it will go to state q1, replaced a by X and move ahead
right(R stands for right).
Basic Model of Turing machine
The turning machine can be modelled with the help of the following representation.
1. The input tape is having an infinite number of cells, each cell containing one input symbol and thus
the input string can be placed on tape. The empty tape is filled by blank characters.
2. The finite control and the tape head which is responsible for reading the current input symbol. The
tape head can move to left to right.
3. A finite set of states through which machine has to undergo.
4. Finite set of symbols called external symbols which are used in building the logic of turing machine.
                                                                                               20 | Page
Practical-10: Design a Turing machine that’s accepts the following language aⁿ bⁿ cⁿ where n>0.
Source code:
Output:
                                                                                21 | Page
22 | Page