KEMBAR78
RABIN KARP ALGORITHM STRING MATCHING | PPTX
PRESENTED BY
ABHISHEK KUMAR SINGH
RABIN KARP
String Matching
Algorithm
 Let text string be T of length N
 Pattern string be P of length M
 Example
 T=“hello world”; N=11;
 P=“llo”; M=3
Problem Statement
H E L L O W O R L D
L L O
 The Rabin-Karp string matching algorithm calculates a hash value
for the pattern, as well as for each M-character subsequence of
given text to be compared.
 If the hash values for a particular subsequence are unequal, the
algorithm will calculate the hash value for next M-character
sequence.
 If the hash values are equal, the algorithm will do a Brute Force
comparison between the pattern and the M-character sequence.
 Therefore there is only one comparison per text subsequence, and
Brute Force is only needed when hash values match.
Note: Only one comparison for sub sequence
Rabin-Karp Algorithm
 Consider an M-character sequence as an M-
digit number in the alphabet. The text
subsequence t[I ... i+M-1] is mapped to the
number
x(i) = t[i]*3^0+ t[i+1]*b^1+...+ t[i+M-1]*3^(M-
1)
 Furthermore, given x(i) we can compute x(i+1)
for the next subsequence t[i+1 .. i+M] in
constant time,as follows:
x(i+1) = (x(i)-(t[i]))/3+(3^(M-1))*t[i+M-1]
Rolling Hash Function
A B D A B C
Example
tHash = Hash(“ABD”) = 1*3^0+2*3^1+4*3^2= 43
pHash = Hash(“ABC”) = 1*3^0+2*3^1+3*3^2 = 34
tHash == pHash  FALSE
tHash = Hash(“BDA”) = 2*3^0+4*3^1+1*3^2 = 23
tHash == pHash  FALSE
tHash = Hash(“DAB”) = 4*3^0+1*3^1+2*3^2 = 25
tHash == pHash  FALSE
A B D A B C
A B D A B C
A B D A B C
tHash = Hash(“ABC”) = 30+30+40 = 34
tHash == pHash  TRUE
 TEXT :
 PATTERN :
HERE PATTERN MATCHES THE
SUBSTRING SO INDEX NO. 3 IS RETURNED
Incase of HIT
A B D A B C
A B C
int main()
{
string s1,s2;
cin>>s1>>s2;
int
l1=s1.size(),l2=s2.size(),
i,j,k;
int hs1=0,p=0,ar[l1];
for(i=0;i<l2;i++)
hs1+=(int)s2[i]*pow(3,i);
int hs2=0;
for(i=0;i<l1;i++)
{
if(i==0)
int c=0;
if(hs2==hs1)
{
for(j=i;j<i+l2;j++)
if(s1[j]==s2[j-
i])
c++;
else
break;
}
if(c==l2)
ar[p++]=i;
}
cout<<l1<<endl;
cout<<"Stored index
 Hash to two string match then it is called Hit
 There is possibility
 Hash of “abc” is 34
 Hash of “dga” is 34
 This is called Hash Collision
 Minimize Collision by
 Taking mod with prime number
HASH COLLISION
 Hash of Pattern
 O(m)
 Hash Comparison
 O(n-m+1) = O(n) ; m < n
 Average Running Time
 O(m+n)
 Worst Case Running Time
 m comparison in each iteration
 O(mn)
ANALYSIS
 Keyword matching in large files
 Good for Plagiarism detection
 Searching engines
 Database searching
 Bioinformatics

APPLICATION
1. CLRS – 2nd Edition, Page 911-916
2. slideshare.net/GajanandSharma1/rabin-karp-string-
matching-algorithm
3. stoimen.com/blog/2012/04/02/computer-algorithms-
rabin-karp-string-searching/
4. en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algori
thm
5. cs.princeton.edu/courses/archive/fall04/cos226/lectur
es/string.4up.pdf
References
THANK YOU

RABIN KARP ALGORITHM STRING MATCHING

  • 1.
    PRESENTED BY ABHISHEK KUMARSINGH RABIN KARP String Matching Algorithm
  • 2.
     Let textstring be T of length N  Pattern string be P of length M  Example  T=“hello world”; N=11;  P=“llo”; M=3 Problem Statement H E L L O W O R L D L L O
  • 3.
     The Rabin-Karpstring matching algorithm calculates a hash value for the pattern, as well as for each M-character subsequence of given text to be compared.  If the hash values for a particular subsequence are unequal, the algorithm will calculate the hash value for next M-character sequence.  If the hash values are equal, the algorithm will do a Brute Force comparison between the pattern and the M-character sequence.  Therefore there is only one comparison per text subsequence, and Brute Force is only needed when hash values match. Note: Only one comparison for sub sequence Rabin-Karp Algorithm
  • 4.
     Consider anM-character sequence as an M- digit number in the alphabet. The text subsequence t[I ... i+M-1] is mapped to the number x(i) = t[i]*3^0+ t[i+1]*b^1+...+ t[i+M-1]*3^(M- 1)  Furthermore, given x(i) we can compute x(i+1) for the next subsequence t[i+1 .. i+M] in constant time,as follows: x(i+1) = (x(i)-(t[i]))/3+(3^(M-1))*t[i+M-1] Rolling Hash Function
  • 5.
    A B DA B C Example tHash = Hash(“ABD”) = 1*3^0+2*3^1+4*3^2= 43 pHash = Hash(“ABC”) = 1*3^0+2*3^1+3*3^2 = 34 tHash == pHash  FALSE tHash = Hash(“BDA”) = 2*3^0+4*3^1+1*3^2 = 23 tHash == pHash  FALSE tHash = Hash(“DAB”) = 4*3^0+1*3^1+2*3^2 = 25 tHash == pHash  FALSE A B D A B C A B D A B C A B D A B C tHash = Hash(“ABC”) = 30+30+40 = 34 tHash == pHash  TRUE
  • 6.
     TEXT : PATTERN : HERE PATTERN MATCHES THE SUBSTRING SO INDEX NO. 3 IS RETURNED Incase of HIT A B D A B C A B C
  • 7.
    int main() { string s1,s2; cin>>s1>>s2; int l1=s1.size(),l2=s2.size(), i,j,k; inths1=0,p=0,ar[l1]; for(i=0;i<l2;i++) hs1+=(int)s2[i]*pow(3,i); int hs2=0; for(i=0;i<l1;i++) { if(i==0) int c=0; if(hs2==hs1) { for(j=i;j<i+l2;j++) if(s1[j]==s2[j- i]) c++; else break; } if(c==l2) ar[p++]=i; } cout<<l1<<endl; cout<<"Stored index
  • 8.
     Hash totwo string match then it is called Hit  There is possibility  Hash of “abc” is 34  Hash of “dga” is 34  This is called Hash Collision  Minimize Collision by  Taking mod with prime number HASH COLLISION
  • 9.
     Hash ofPattern  O(m)  Hash Comparison  O(n-m+1) = O(n) ; m < n  Average Running Time  O(m+n)  Worst Case Running Time  m comparison in each iteration  O(mn) ANALYSIS
  • 10.
     Keyword matchingin large files  Good for Plagiarism detection  Searching engines  Database searching  Bioinformatics  APPLICATION
  • 11.
    1. CLRS –2nd Edition, Page 911-916 2. slideshare.net/GajanandSharma1/rabin-karp-string- matching-algorithm 3. stoimen.com/blog/2012/04/02/computer-algorithms- rabin-karp-string-searching/ 4. en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algori thm 5. cs.princeton.edu/courses/archive/fall04/cos226/lectur es/string.4up.pdf References
  • 12.

Editor's Notes