KEMBAR78
Advanced Algorithms Lab Manual | PDF | Theoretical Computer Science | Computing
0% found this document useful (0 votes)
2K views19 pages

Advanced Algorithms Lab Manual

The document outlines the syllabus for an Advanced Algorithms Laboratory course, including 5 programming assignments on graph search, string matching, and modular linear equation algorithms. Students will design and implement algorithms like Bellman-Ford, Monte Carlo primality testing, naive string matching, KMP, finite automata, and Rabin-Karp string matching. The Bellman-Ford algorithm and its applications in routing are described in detail, with pseudocode provided.

Uploaded by

Manjunath Gs
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
2K views19 pages

Advanced Algorithms Lab Manual

The document outlines the syllabus for an Advanced Algorithms Laboratory course, including 5 programming assignments on graph search, string matching, and modular linear equation algorithms. Students will design and implement algorithms like Bellman-Ford, Monte Carlo primality testing, naive string matching, KMP, finite automata, and Rabin-Karp string matching. The Bellman-Ford algorithm and its applications in routing are described in detail, with pseudocode provided.

Uploaded by

Manjunath Gs
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 19

ACHARYA INSTITUTE OF TECHNOLOGY

Acharya Dr. Sarvepalli Radhakrishnan Road, Soldevanahalli, Bangalore-560107


DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

Syllabus Prescribed

Subject: Advanced Algorithms Laboratory Academic Year: 2014-15


Year: I Total Hours: 42
Name of the Faculty: Manjunath G.S. Code: 14SCS26

COURSE OBJECTIVES
 To implement the graph search algorithms.
 To implement the string matching algorithms.
 To implement the modular linear equation algorithms.

LABORATORY WORK:
Note: The following programs can be executed on Java/C#/any equivalent tool/language by
adapting exception handling technique wherever it is suitable.
1. Design, develop, and write a program to implement the Bellman-Ford algorithm and
determine its performance. Give its applications.
2. Design, develop, and write a program to implement a Monte Carlo algorithm to test the
primality of a given integer and determine its performance.
3. Design, develop, and write a program to solve string matching problem using naive
approach and the KMP algorithm. Compare their performances.
4. Design, develop, and write a program to solve String matching problem using Finite
Automata and determine its performance.
5. Design, develop, and write a program to solve String matching problem using Robin Karp
algorithm and determine its performance.

Course Outcomes:
Upon completion of the course, the students will be able to
 Design and apply graph search algorithms.
 Design and implement string matching algorithms.
 Design modular linear equation algorithms.

Manjunath G.S. Asst. Professor, Dept. of CSE Page 1


ACHARYA INSTITUTE OF TECHNOLOGY
Acharya Dr. Sarvepalli Radhakrishnan Road, Soldevanahalli, Bangalore-560107
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

Program 1. Design, develop, and write a program to implement the Bellman-Ford


algorithm and determine its performance. Give its applications.
The Bellman–Ford algorithm computes single-source shortest paths in a weighted digraph.
For graphs with only non-negative edge weights, the faster Dijkstra's algorithm also solves
the problem. Thus, Bellman–Ford is used primarily for graphs with negative edge weights.
The algorithm is named after its developers, Richard Bellman and Lester Ford, Jr.

If a graph contains a "negative cycle", i.e., a cycle whose edges sum to a negative value, then
walks of arbitrarily low weight can be constructed, i.e., there may be no shortest path.
Bellman-Ford can detect negative cycles and report their existence, but it cannot produce a
correct answer if a negative cycle is reachable from the source.

Bellman–Ford is in its basic structure very similar to Dijkstra's algorithm, but instead of
greedily selecting the minimum-weight node not yet processed to relax, it simply relaxes all
the edges, and does this |V | − 1 times, where |V | is the number of vertices in the graph. The
repetitions allow minimum distances to accurately propagate throughout the graph, since, in
the absence of negative cycles; the shortest path can only visit each node at most once. Unlike
the greedy approach, which depends on certain structural assumptions derived from positive
weights, this straightforward approach extends to the general case.

Bellman–Ford runs in O(|V|·|E|) time, where |V| and |E| are the number of vertices and edges
respectively.

PROOF OF CORRECTNESS
The correctness of the algorithm can be shown by induction. The precise statement shown
by induction is:
Lemma. After i repetitions of for cycle:
If Distance(u) is not infinity, it is equal to the length of some path from s to u;
If there is a path from s to u with at most i edges, then Distance(u) is at most the length of the
shortest path from s to u with at most i edges.

For the base case of induction, consider i=0 and the moment before for cycle is executed for
the first time. Then, for the source vertex, source.distance = 0, which is correct. For other
vertices u, u.distance = infinity, which is also correct because there is no path from source to
u with 0 edges. For the inductive case, first prove the first part. Consider a moment when a
vertex's distance is updated by v.distance := u.distance + uv.weight. By inductive assumption,
u.distance is the length of some path from source to u. Then u.distance + uv.weight is the
length of the path from source to v that follows the path from source to u and then goes to v.
For the second part, consider the shortest path from source to u with at most i edges. Let v be
the last vertex before u on this path. Then, the part of the path from source to v is the shortest
path from source to v with at most i-1 edges. By inductive assumption, v.distance after i−1
cycles is at most the length of this path. Therefore, uv.weight + v.distance is at most the length
of the path from s to u. In the ith cycle, u.distance gets compared with uv.weight + v.distance,
and is set equal to it if uv.weight + v.distance was smaller. Therefore, after i cycles, u.distance

Manjunath G.S. Asst. Professor, Dept. of CSE Page 2


ACHARYA INSTITUTE OF TECHNOLOGY
Acharya Dr. Sarvepalli Radhakrishnan Road, Soldevanahalli, Bangalore-560107
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

is at most the length of the shortest path from source to u that uses at most i edges. If there
are no negative-weight cycles, then every shortest path visits each vertex at most once, so at
step 3 no further improvements can be made. Conversely, suppose no improvement can be
made. Then for any cycle with vertices v[0], ..., v[k−1],
v[i].distance <= v[(i-1) mod k].distance + v[(i-1) mod k]v[i].weight

Summing around the cycle, the v[i].distance terms and the v[i−1 (mod k)] distance terms
cancel, leaving
0 <= sum from 1 to k of v[i-1 (mod k)]v[i].weight
I.e., every cycle has nonnegative weight.

APPLICATIONS IN ROUTING
A distributed variant of the Bellman–Ford algorithm is used in distance-vector routing
protocols, for example the Routing Information Protocol (RIP). The algorithm is distributed
because it involves a number of nodes (routers) within an Autonomous system, a collection
of IP networks typically owned by an ISP.
It consists of the following steps:
 Each node calculates the distances between itself and all other nodes within the AS
and stores this information as a table.
 Each node sends its table to all neighboring nodes.
 When a node receives distance tables from its neighbors, it calculates the shortest
routes to all other nodes and updates its own table to reflect any changes.

DISADVANTAGES OF THE BELLMAN–FORD ALGORITHM


 It does not scale well.
 Changes in network topology are not reflected quickly since updates are spread node-
by-node.

Count to infinity (if link or node failures render a node unreachable from some set of other
nodes, those nodes may spend forever gradually increasing their estimates of the distance to
it, and in the meantime there may be routing loops).

SOURCE CODE

#include<stdio.h>
#include<stdlib.h>

int w[10][10],dist[10],parent[10];
int n,source;
int BellmanFord( );
int Display(int);

Manjunath G.S. Asst. Professor, Dept. of CSE Page 3


ACHARYA INSTITUTE OF TECHNOLOGY
Acharya Dr. Sarvepalli Radhakrishnan Road, Soldevanahalli, Bangalore-560107
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

int main( )
{
int i,j,check;
printf("\n enter the no. of vertices:");
scanf("%d",&n);
printf("\n enter the cost matrix:");
for(i=1;i<=n;i++)
for(j=1;j<=n;j++)
{
if(i!=j)
{
printf("\n enter the weight/cost of the edge w[%d][%d]:",i,j);
scanf("%d",&w[i][j]);
}
}
printf("\n enter the source vertex:");
scanf("%d",&source);
for(i=1;i<=n;i++)
{
dist[i]=999;
parent[i]=999;
}
dist[source]=0;
parent[source]=-1;

check=BellmanFord( );
if(check==0)
{
printf("\n No solution exists because of -ve weight cycle in I/P G!!!");
exit(0);
}
for(i=1;i<=n;i++)
{
if(i!=source)
{ printf("\n (V%d,V%d):%d",source,i,dist[i]);
printf("\n Path is :");
Display(i);
}
}
return 0;
}

Manjunath G.S. Asst. Professor, Dept. of CSE Page 4


ACHARYA INSTITUTE OF TECHNOLOGY
Acharya Dr. Sarvepalli Radhakrishnan Road, Soldevanahalli, Bangalore-560107
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

int Display(int x)
{
if(parent[x]==source)
printf(" %d ==> %d",source,x);
else
{
Display(parent[x]);
printf(" ==> %d",x);
}
return 0;
}
int BellmanFord()
{
int pass,i,j,k,NegativeCycle;
for(pass=1;pass<=n-1;pass++)
{
for(i=1;i<=n;i++)
for(j=1;j<=n;j++)
{
if((dist[i]+w[i][j])<dist[j])
{
parent[j]=i;
dist[j]=dist[i]+w[i][j];
}
}
printf("\n At the end of Pass %d Distance vector is:\n\n",pass);
for(k=0;k<n;printf("%10d",dist[k]),k++);
printf("\n The Parent vector is :\n\n");
for(k=0;k<n;printf("%10d",parent[k++]));
}
NegativeCycle=0;
for(i=1;i<=n;i++)
for(j=1;j<=n;j++)
{
if(dist[j]>(dist[i]+w[i][j]))
{
NegativeCycle=1;
return 0;
}
} return 1;
}

Manjunath G.S. Asst. Professor, Dept. of CSE Page 5


ACHARYA INSTITUTE OF TECHNOLOGY
Acharya Dr. Sarvepalli Radhakrishnan Road, Soldevanahalli, Bangalore-560107
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

Output:
Enter the no. of vertices: 3
Enter the cost matrix:
Enter the weight/cost of edge w [1][2]: 1
Enter the weight/cost of edge w [1][3]: 3
Enter the weight/cost of edge w [2][1]: 2
Enter the weight/cost of edge w [2][3]: 4
Enter the weight/cost of edge w [3][1]: 5
Enter the weight/cost of edge w [3][2]: 2
Enter the Source Vertex: 3
At the end of Pass 1 Distance vector is:
0 5 2
The Parent vector is:
0 3 3
At the end of Pass 2 Distance vector is:
0 4 2
The Parent vector is:
0 2 3
(V3, V1):4
Path is: 3 ==> 2 ==> 1
(V3, V2):2
Path is: 3 ==> 2

Manjunath G.S. Asst. Professor, Dept. of CSE Page 6


ACHARYA INSTITUTE OF TECHNOLOGY
Acharya Dr. Sarvepalli Radhakrishnan Road, Soldevanahalli, Bangalore-560107
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

Program 2. Design, develop, and write a program to implement a Monte Carlo


algorithm to test the primality of a given integer and determine its performance.
In computing, a Monte Carlo algorithm is a randomized algorithm whose running time is
deterministic, but whose output may be incorrect with a certain (typically small) probability.
The related class of Las Vegas algorithms is also randomized, but in a different way: they take
an amount of time that varies randomly, but always produce the correct answer.

A Monte Carlo algorithm can be converted into a Las Vegas algorithm whenever there exists
a procedure to verify that the output produced by the algorithm is indeed correct. If so, then
the resulting Las Vegas algorithm is merely to repeatedly run the Monte Carlo algorithm until
one of the runs produces an output that can be verified to be correct.

For any which does not belong to the language the algorithms that belong to this class will
conclusively say that. For any which belongs to the language these algorithms say that with
at least 0.5 probability. This type of behavior is termed as L is accepted with one-sided error.
The class of languages with polynomial time Monte Carlo recognition algorithms is termed
as L is accepted with one-sided error.

A natural number
1, 2, 3, 4, 5, 6,...
Is called a prime or a prime number if it has exactly two divisors, 1 and the number itself.
Natural numbers greater than 1 that are not prime are called composite.

A primality test is an algorithm for determining whether an input number is prime. Amongst
other fields of mathematics, it is used for cryptography. Unlike integer factorization,
primality tests do not generally give prime factors, only stating whether the input number is
prime or not. As of 2010, factorization is a computationally difficult problem, whereas
primality testing is comparatively easy (its running time is polynomial in the size of the
input). Some primality tests prove that a number is prime, while others like Miller-Rabin
prove that a number is composite. Therefore we might call the latter compositeness tests
instead of primality tests.

SOURCE CODE
#include<stdio.h>
#include<stdlib.h>

int Modular_Exponentiation(int,int,int);
void Binary(int,int*,int*);

int main( )
{
int n,res;
printf("\n enter a no n:");

Manjunath G.S. Asst. Professor, Dept. of CSE Page 7


ACHARYA INSTITUTE OF TECHNOLOGY
Acharya Dr. Sarvepalli Radhakrishnan Road, Soldevanahalli, Bangalore-560107
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

scanf("%d",&n);
if(n<=0)
{
printf("\n pl enter a +ve no!!!\n\n");
exit(0);
}
res=Modular_Exponentiation(2,n-1,n);
//printf("\n Result=%d",res);
if(n==1)
printf("\n %d is prime!!!\n\n",n);
else if((res%n)==1)
printf("\n %d is prime!!!\n",n); // probably!!!!
else
printf("\n %d is not prime!!!\n",n); // for sure
return 0;
}

int Modular_Exponentiation(int a, int b, int n)


{
int c,d,bin[20]={0},k,i;
c=0,d=1;
Binary(b, bin, &k);
for(i=k; i>=0; i--)
{
c=2*c;
d=(d*d)%n;
if(bin[i]==1)
{
c++;
d=(d*a)%n;
}
printf("\n i=%d,bi=%d,c=%d,d=%d",i,bin[i],c,d);
}
return d;
}

void Binary(int b,int *bin,int *k)


{
int i=0,n=b;
*k=0;
*bin=0;

Manjunath G.S. Asst. Professor, Dept. of CSE Page 8


ACHARYA INSTITUTE OF TECHNOLOGY
Acharya Dr. Sarvepalli Radhakrishnan Road, Soldevanahalli, Bangalore-560107
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

while(b!=0)
{
bin[i]=b%2;
*k=*k+1;
b/=2;
i++;
}
*k=*k-1;
printf("\n binary equivalent of %d is:",n);
for(i=*k;i>=0;printf("%5d",bin[i--]));
}

Output:
Case 1: Enter a no n: 3
Binary equivalent of 2 is: 1 0
i=1,bi=1,c=1,d=2
i=0,bi=0,c=2,d=1
3 is a prime !!!

Case 2: Enter a no n: 4
Binary equivalent of 3 is: 1 1
i=1,bi=1,c=1,d=2
i=0,bi=1,c=3,d=0
4 is a not prime !!!

Manjunath G.S. Asst. Professor, Dept. of CSE Page 9


ACHARYA INSTITUTE OF TECHNOLOGY
Acharya Dr. Sarvepalli Radhakrishnan Road, Soldevanahalli, Bangalore-560107
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

Program 3. Design, develop, and write a program to solve string matching problem
using naive approach and the KMP algorithm. Compare their performances.
The Knuth–Morris–Pratt string searching algorithm (or KMP algorithm) searches for
occurrences of a "word" W within a main "text string" S by employing the observation that
when a mismatch occurs, the word itself embodies sufficient information to determine where
the next match could begin, thus bypassing re-examination of previously matched characters.

Efficiency of the search algorithm


Assuming the prior existence of the table T, the search portion of the Knuth–Morris–Pratt
algorithm has complexity O (k), where k is the length of S and the O is big-O notation. As
except for the fixed overhead incurred in entering and exiting the function, all the
computations are performed in the while loop, we will calculate a bound on the number of
iterations of this loop; in order to do this we first make a key observation about the nature of
T. By definition it is constructed so that if a match which had begun at S[m] fails while
comparing S[m + i] to W[i], then the next possible match must begin at S[m + (i - T[i])]. In
particular the next possible match must occur at a higher index than m, so that T[i] < i.

Using this fact, we will show that the loop can execute at most 2k times. For in each iteration,
it executes one of the two branches in the loop. The first branch invariably increases i and
does not change m, so that the index m + i of the currently scrutinized character of S is
increased. The second branch adds i - T[i] to m, and as we have seen, this is always a positive
number. Thus the location m of the beginning of the current potential match is increased.
Now, the loop ends if m + i = k; therefore each branch of the loop can be reached at most k
times, since they respectively increase either m + i or m, and m ≤ m + i: if m = k, then certainly
m + i ≥ k, so that since it increases by unit increments at most, must have had m + i = k at
some point in the past, and therefore either way would be done.

Thus the loop executes at most 2k times, showing that the time complexity of the search
algorithm is O(k).

Here is another way to think about the runtime: begin to match W and S at position i and p,
if W exists as a substring of S at p, then W[0 through m] == S[p through p+m]. On succeed,
that is, the word and the text matched at the positions (W[i] == S[p+i]), increase i by 1 (i++).
On fail, that is, the word and the text does not match at the positions(W[i] != S[p+i]), the text
pointer is kept still, while the word pointer roll-back a certain amount(i = T[i], where T is the
jump table) And attempt to match W[T[i]] with S[p+i]. The maximum number of roll-back of
i is bounded by i that is to say, for any failure, can only roll-back as much as have progressed
up to the failure. Then it is clear the runtime is 2k.

NAIVE APPROACH (T, P)


The naïve approach finds all valid shifts using a loop that checks condition
P[1..m]=T[s+1..s+m] for each of the n-m+1 possible values of s.
Naïve string matching takes time O((n-m+1)m) and this bound is tight in the worst case.

Manjunath G.S. Asst. Professor, Dept. of CSE Page 10


ACHARYA INSTITUTE OF TECHNOLOGY
Acharya Dr. Sarvepalli Radhakrishnan Road, Soldevanahalli, Bangalore-560107
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

SOURCE CODE OF NAIVE ALGORITHM


#include<stdio.h>
#include<string.h>

int NaiveString(char *,char *);

int main( )
{
int pos;
char p[15],t[50];
printf("\n enter the text:");
gets(t);
printf("\n text is : %s ",t);
printf("\n enter the pattern:");
gets(p);
printf("\n pattern is : %s ",p);
pos=NaiveString(p,t);
if(pos==-1)
printf("\n search unsuccessful!!!pattern not found!!!\n");
else
printf("\n left most occurance of pattern in text is at position:%d\n",pos);
return 0;
}

int NaiveString(char *p,char *t)


{
int i,j,m,n;
n=strlen(t);
m=strlen(p);
for(i=0;i<=n-m-1;i++)
{
j=0;
while(j<m && (p[j]==t[i+j]))
j++;
if(j==m)
return i+1;
}
return -1;
}

Manjunath G.S. Asst. Professor, Dept. of CSE Page 11


ACHARYA INSTITUTE OF TECHNOLOGY
Acharya Dr. Sarvepalli Radhakrishnan Road, Soldevanahalli, Bangalore-560107
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

Output:
Case 1: enter the text: arunan
text is : arunan
enter the pattern: na
pattern is na
left most occurrence of pattern in text is at a position: 5
Case 2: enter the text: arunan
Text is: arunan
enter the pattern: btlit
pattern is btlit
search unsuccessful !!! Pattern not found !!!

SOURCE CODE OF KMP ALGORITHM


#include<stdio.h>
#include<string.h>

int main( )
{
char p[25],t[50];
int pi[25];
int m,n,i,k,flag=1;
printf("\n enter the text:");
gets(t);
printf("\n enter the pattern:");
gets(p);
m=strlen(p);
n=strlen(t);
pi[0]=0,k=0;
for(i=1;i<m;i++)
{
while((k>0)&&(p[k]!=p[i]))
k=pi[k-1];
if(p[k]==p[i])
k++;
pi[i]=k;
}
printf("\n the PI[] array is as follows:\n\n");
for(i=0;i<m;printf("\nPI[%d]=%d",i,pi[i]),i++);
k=0;
for(i=0;i<n;i++)
{

Manjunath G.S. Asst. Professor, Dept. of CSE Page 12


ACHARYA INSTITUTE OF TECHNOLOGY
Acharya Dr. Sarvepalli Radhakrishnan Road, Soldevanahalli, Bangalore-560107
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

while(k>0 && (p[k]!=t[i]))


k=pi[k-1];
if(p[k]==t[i])
k++;
if(k==m)
{
flag=0;
printf("\n pattern match occurs at location %d\n\n",i-m+2);
k=pi[k-1];
}
//printf("\n shift size q=%d\n\n",q);
}
if(flag==1)
printf("\n pattern not found!!!\n\n");
return 0;
}

Output:
Case1: Enter the text: Arunan
Enter the pattern: BTLIT
The PI[] array is as follows:
PI[0]=0
PI[1]=0
PI[2]=0
PI[3]=0
PI[4]=0
Pattern not found !!!
Case2: Enter the text: Arunan in BTLIT
Enter the pattern: BTLIT
The PI[] array is as follows:
PI[0]=0
PI[1]=0
PI[2]=0
PI[3]=0
PI[4]=0
Pattern match occurs at location 12

Manjunath G.S. Asst. Professor, Dept. of CSE Page 13


ACHARYA INSTITUTE OF TECHNOLOGY
Acharya Dr. Sarvepalli Radhakrishnan Road, Soldevanahalli, Bangalore-560107
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

Program 4. Design, develop, and write a program to solve String matching problem
using Finite Automata and determine its performance.
In this problem, we will discuss Finite Automata (FA) based pattern searching algorithm. In
FA based algorithm, we preprocess the pattern and build a 2D array that represents a Finite
Automata. Construction of the FA is the main tricky part of this algorithm. Once the FA is
built, the searching is simple. In search, we simply need to start from the first state of the
automata and first character of the text. At every step, we consider next character of text,
look for the next state in the built FA and move to new state. If we reach final state, then
pattern is found in text.

Time complexity of the search process is O (n). Before we discuss FA construction, let us take
a look at the following FA for pattern ACACAGA.

The above diagrams represent graphical and tabular representations of pattern ACACAGA.

Number of states in FA will be M+1 where M is length of the pattern. The main thing to
construct FA is to get the next state from the current state for every possible character. Given
a character x and a state k, we can get the next state by considering the string “pat[0..k-1]x”
which is basically concatenation of pattern characters pat[0], pat[1] … pat[k-1] and the
character x. The idea is to get length of the longest prefix of the given pattern such that the
prefix is also suffix of “pat[0..k-1]x”. The value of length gives us the next state. For example,
let us see how to get the next state from current state 5 and character ‘C’ in the above diagram.
We need to consider the string, “pat[0..5]C” which is “ACACAC”. The lenght of the longest
prefix of the pattern such that the prefix is suffix of “ACACAC”is 4 (“ACAC”). So the next state
(from state 5) is 4 for character ‘C’.

Manjunath G.S. Asst. Professor, Dept. of CSE Page 14


ACHARYA INSTITUTE OF TECHNOLOGY
Acharya Dr. Sarvepalli Radhakrishnan Road, Soldevanahalli, Bangalore-560107
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

In the following code, computeTF() constructs the FA. The time complexity of the
computeTF() is O(m^3*NO_OF_CHARS) where m is length of the pattern and NO_OF_CHARS
is size of alphabet (total number of possible characters in pattern and text). The
implementation tries all possible prefixes starting from the longest possible that can be a
suffix of “pat[0..k-1]x”. There are better implementations to construct FA in
O(m*NO_OF_CHARS)

SOURCE CODE
#include<stdio.h>
#include<string.h>
#define NO_OF_CHARS 256

int getNextState(char *pat, int M, int state, int x)


{
// If the character c is same as next character in pattern,
// then simply increment state
if (state < M && x == pat[state])
return state+1;

int ns, i; // ns stores the result which is next state

// ns finally contains the longest prefix which is also suffix in "pat[0..state-1]c"

// Start from the largest possible value and stop when you find
// a prefix which is also suffix
for (ns = state; ns > 0; ns--)
{
if(pat[ns-1] == x)
{
for(i = 0; i < ns-1; i++)
{
if (pat[i] != pat[state-ns+1+i])
break;
}
if (i == ns-1)
return ns;
}
}
return 0;
}
/* This function builds the TF table which represents Finite Automata for a given pattern*/

Manjunath G.S. Asst. Professor, Dept. of CSE Page 15


ACHARYA INSTITUTE OF TECHNOLOGY
Acharya Dr. Sarvepalli Radhakrishnan Road, Soldevanahalli, Bangalore-560107
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

void computeTF(char *pat, int M, int TF[][NO_OF_CHARS])


{
int state, x;
for (state = 0; state <= M; ++state)
for (x = 0; x < NO_OF_CHARS; ++x)
TF[state][x] = getNextState(pat, M, state, x);
}

/* Prints all occurrences of pat in txt */


void search(char *pat, char *txt)
{
int M = strlen(pat);
int N = strlen(txt);

int TF[M+1][NO_OF_CHARS];

computeTF(pat, M, TF);
// Process txt over FA.
int i, state=0;
for (i = 0; i < N; i++)
{
state = TF[state][txt[i]];
if (state == M)
{
printf ("\n patterb found at index %d", i-M+1);
}
}
}

int main ( )
{
char *txt = “AABAACAADAABAAABAA”;
char *pat = “AABA”;
search (pat, txt);
return 0;
}

Output:
Pattern found at index 0
Pattern found at index 9
Pattern found at index 13

Manjunath G.S. Asst. Professor, Dept. of CSE Page 16


ACHARYA INSTITUTE OF TECHNOLOGY
Acharya Dr. Sarvepalli Radhakrishnan Road, Soldevanahalli, Bangalore-560107
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

Program 5. Design, develop, and write a program to solve String matching problem
using Robin Karp algorithm and determine its performance.
The Rabin – Karp string searching algorithm calculates a hash value for the pattern, and for
each M-character subsequence of text to be compared.
If the hash values are unequal, the algorithm will calculate the hash value for next M-
character sequence.
If the hash values are equal, the algorithm will compare the pattern and the M-character
sequence.
In this way, there is only one comparison per text subsequence, and character matching is
only needed when hash values match.
Running time for Rabin Karp algorithm is O ((n-m+1) m) in the worst case, since the Rabin
Karp algorithm explicitly verifies every valid shift

ALGORITHM
1. n = Length[T],m = Length[P]
2. h = d^m-1 mod q
3. p = 0,to = 0
4. for i= 0 to m
5. do p = (d*p + P[i]) mod q
6. to = (d*to + T[i]) mod q
7. For s =0 to n-m
8. do if p=ts
9. then if P[1..m] ==T[s+1..s+m]
10. then “Pattern founds at shift “ s
11. if s<n-m
12. then ts+1=(d(ts-T[s+1])h+T[s+m+1]

SOURCE CODE
#include<stdio.h>
#include<string.h>

// d is the number of characters in input alphabet


#define d 256
/* pat -> pattern
txt -> text
q -> A prime number
*/
void search(char *pat, char *txt, int q)
{
int M = strlen(pat);
int N = strlen(txt);

Manjunath G.S. Asst. Professor, Dept. of CSE Page 17


ACHARYA INSTITUTE OF TECHNOLOGY
Acharya Dr. Sarvepalli Radhakrishnan Road, Soldevanahalli, Bangalore-560107
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

int i, j;
int p = 0; // hash value for pattern
int t = 0; // hash value for txt
int h = 1;

// The value of h would be "pow(d, M-1)%q"


for (i = 0; i < M-1; i++)
h = (h*d)%q;

// Calculate the hash value of pattern and first window of text


for (i = 0; i < M; i++)
{
p = (d*p + pat[i])%q;
t = (d*t + txt[i])%q;
}

// Slide the pattern over text one by one


for (i = 0; i <= N - M; i++)
{
// Check the hash values of current window of text and pattern
// If the hash values match then only check for characters on by one
if ( p == t )
{
/* Check for characters one by one */
for (j = 0; j < M; j++)
{
if (txt[i+j] != pat[j])
break;
}
if (j == M) // if p == t and pat[0...M-1] = txt[i, i+1, ...i+M-1]
{
printf("Pattern found at index %d \n", i);
}
}

// Calculate hash value for next window of text: Remove leading digit,
// add trailing digit
if ( i < N-M )
{
t = (d*(t - txt[i]*h) + txt[i+M])%q;

Manjunath G.S. Asst. Professor, Dept. of CSE Page 18


ACHARYA INSTITUTE OF TECHNOLOGY
Acharya Dr. Sarvepalli Radhakrishnan Road, Soldevanahalli, Bangalore-560107
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

// We might get negative value of t, converting it to positive


if(t < 0)
t = (t + q);
}
}
}

int main( )
{
char *txt = "GEEKS FOR GEEKS";
char *pat = "GEEK";
int q = 101; // A prime number
search(pat, txt, q);
getchar( );
return 0;
}

Output:
Example 1:
txt[ ] = "THIS IS A TEST TEXT"
pat[ ] = "TEST"
Pattern found at index 10

Example 2:
txt[ ] = "AABAACAADAABAAABAA"
pat[ ] = "AABA"
Pattern found at index 0
Pattern found at index 9
Pattern found at index 13

Manjunath G.S. Asst. Professor, Dept. of CSE Page 19

You might also like