Rabin-Karp Substring Search Algorithm
Prepared By:
Sabiya Fatima
sabiya1990fatima@gmail.com
Objectives
• What is the substring search problem?
• Definition of the Rabin-Karp algorithm
• How Rabin-Karp works
• An example to illustrate Rabin-Karp
• Complexity analysis
• Real-life applications
What is the Substring Search Problem?
We assume that the text is an array T[1..n] of length n and that the pattern is an array P[1..m] of length m, where m ≤ n (typically m << n).
We also assume that the elements of P and T are characters drawn from a finite alphabet Σ.
The goal is to find every shift s at which P occurs in T, that is, T[s+1..s+m] = P[1..m].
(e.g., with Σ = {a, b}, we want to find P = 'aab' in T = 'abbaabaaaab'.)
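To make the problem statement concrete, here is a minimal brute-force sketch in Python that solves the example above by comparing the pattern against every window of the text. The function name `naive_search` is illustrative and not from the slides, and shifts are reported 0-based, which coincides with the shift s used here.

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Return every shift s at which pattern occurs in text (brute force)."""
    n, m = len(text), len(pattern)
    # Compare the pattern against each of the n - m + 1 windows of length m.
    return [s for s in range(n - m + 1) if text[s:s + m] == pattern]


print(naive_search("abbaabaaaab", "aab"))  # [3, 8]
```

This baseline costs up to m character comparisons at every shift, which is the cost Rabin-Karp tries to avoid by comparing hash values first.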
Definition of the Rabin-Karp Algorithm
• A string search algorithm that compares hash values of strings rather than the strings themselves.
• For efficiency, the hash value of the next position in the text is computed in constant time from the hash value of the current position.
How Rabin-Karp Works
• Treat the characters in both arrays T and P as digits in radix-d notation, where d = |Σ| is the size of the alphabet Σ (e.g., Σ = {0, 1, ..., 9}, so d = 10).
• Let p be the numeric value of the m digits in P.
• Choose a prime number q such that dq fits within a computer word, to speed up computations.
• Compute (p mod q).
• The value of p mod q is what we will use to find all matches of the pattern P in T.
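The value p mod q can be computed one digit at a time (Horner's rule), reducing mod q after every step so that intermediate values stay small. The sketch below assumes decimal digit strings; the name `hash_mod` and its default arguments are illustrative choices, not part of the slides.

```python
def hash_mod(digits: str, d: int = 10, q: int = 11) -> int:
    """Value of a radix-d digit string, reduced mod q, via Horner's rule."""
    value = 0
    for ch in digits:
        # Shift the running value one digit to the left, add the next digit,
        # and reduce immediately so the value never exceeds d*q.
        value = (d * value + int(ch)) % q
    return value


print(hash_mod("26"))           # 4, since 26 = 2*11 + 4
print(hash_mod("31415", q=13))  # 7, since 31415 = 2416*13 + 7
```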
How Rabin-Karp Works (contd.)
• Compute t_s = (T[s+1..s+m] mod q) for s = 0, ..., n-m.
• Compare against P character by character only those windows of T whose value mod q equals (p mod q).
• (T[s+1..s+m] mod q) can be computed incrementally from the previous window by subtracting the high-order digit, shifting, and adding the new low-order digit, all in modulo-q arithmetic.
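The incremental update can be sketched as a single function. This is an illustrative sketch, not code from the slides: the name `roll` and the sample values (windows "31" and "14" with d = 10, q = 11) are assumptions chosen for the demonstration. It relies on Python's `%` returning a non-negative result for a positive modulus; in languages where `%` can return a negative value, q would have to be added back before the result is used.

```python
def roll(t_s: int, old_digit: int, new_digit: int, h: int,
         d: int = 10, q: int = 11) -> int:
    """Slide the window one position: drop old_digit, append new_digit."""
    # h is d^(m-1) mod q, the weight of the high-order digit.
    return (d * (t_s - old_digit * h) + new_digit) % q


m = 2
h = pow(10, m - 1, 11)   # d^(m-1) mod q = 10
t0 = 31 % 11             # hash of the window "31" is 9
t1 = roll(t0, old_digit=3, new_digit=4, h=h)
print(t1, 14 % 11)       # 3 3: the rolled value equals the directly computed one
```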
Algorithm
RABIN-KARP-MATCHER(T, P, d, q)
n = T.length
m = P.length
h = d^(m-1) mod q
p = 0
t_0 = 0
for i = 1 to m                // preprocessing
    p = (d·p + P[i]) mod q
    t_0 = (d·t_0 + T[i]) mod q
for s = 0 to n-m              // matching
    if p == t_s
        if P[1..m] == T[s+1..s+m]
            print "Pattern occurs with shift" s
    if s < n-m
        t_(s+1) = (d·(t_s - T[s+1]·h) + T[s+m+1]) mod q
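The pseudocode above translates almost line for line into Python. The following is a sketch under stated assumptions rather than a reference implementation: characters are mapped to numbers with `ord`, the defaults d = 256 and q = 1_000_000_007 are illustrative choices, indices are 0-based, and an empty pattern is treated as having no matches.

```python
def rabin_karp(text: str, pattern: str,
               d: int = 256, q: int = 1_000_000_007) -> list[int]:
    """Return all 0-based shifts at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    h = pow(d, m - 1, q)            # weight of the high-order character
    p = t = 0
    for i in range(m):              # preprocessing
        p = (d * p + ord(pattern[i])) % q
        t = (d * t + ord(text[i])) % q
    shifts = []
    for s in range(n - m + 1):      # matching
        # Equal hashes only suggest a match; verify to rule out spurious hits.
        if p == t and text[s:s + m] == pattern:
            shifts.append(s)
        if s < n - m:               # roll the hash to the next window
            t = (d * (t - ord(text[s]) * h) + ord(text[s + m])) % q
    return shifts


print(rabin_karp("abbaabaaaab", "aab"))  # [3, 8]
```

Because every hash hit is verified character by character, the result is correct for any choice of d and q; those parameters only affect how many spurious hits occur.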
An Example to Illustrate Rabin-Karp
• Given T = 31415926535 and P = 26
• We choose q = 11
• P mod q = 26 mod 11 = 4
• The text has n - m + 1 = 10 windows of length m = 2: 31, 14, 41, 15, 59, 92, 26, 65, 53, 35
Shift 0: 31 mod 11 = 9, not equal to 4
Shift 1: 14 mod 11 = 3, not equal to 4
Shift 2: 41 mod 11 = 8, not equal to 4
An Example to Illustrate Rabin-Karp (contd.)
Shift 3: 15 mod 11 = 4, equal to 4 -> spurious hit
Shift 4: 59 mod 11 = 4, equal to 4 -> spurious hit
Shift 5: 92 mod 11 = 4, equal to 4 -> spurious hit
Shift 6: 26 mod 11 = 4, equal to 4 -> an exact match!
Shift 7: 65 mod 11 = 10, not equal to 4
An Example to Illustrate Rabin-Karp (contd.)
Shift 8: 53 mod 11 = 9, not equal to 4
Shift 9: 35 mod 11 = 2, not equal to 4
As we can see, whenever the hash values agree, the window is compared with P character by character to ensure that a match has indeed been found.
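The walkthrough can be reproduced with a short script that labels every window as a miss, a spurious hit, or a match. This is an illustrative sketch: the function name `trace` is hypothetical, it assumes decimal digit strings, and it recomputes each window's value directly for clarity instead of rolling the hash.

```python
def trace(text: str, pattern: str, q: int = 11) -> list[tuple[int, str, int, str]]:
    """Classify every window of a decimal-digit text against the pattern."""
    m = len(pattern)
    p = int(pattern) % q
    rows = []
    for s in range(len(text) - m + 1):
        window = text[s:s + m]
        t = int(window) % q          # recomputed directly, not rolled
        if t != p:
            verdict = "miss"
        elif window == pattern:
            verdict = "match"
        else:
            verdict = "spurious hit"
        rows.append((s, window, t, verdict))
    return rows


rows = trace("31415926535", "26")
for row in rows:
    print(row)
```

With q = 11 this reports three spurious hits (15, 59, 92) and one match at shift 6; a larger prime q would make spurious hits rarer.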
Complexity Analysis
RABIN-KARP-MATCHER(T, P, d, q)
n = T.length
m = P.length
h = d^(m-1) mod q             // O(log m) by repeated squaring
p = 0
t_0 = 0
for i = 1 to m                // O(m)
    p = (d·p + P[i]) mod q
    t_0 = (d·t_0 + T[i]) mod q
for s = 0 to n-m              // O((n-m+1)m) in the worst case
    if p == t_s
        if P[1..m] == T[s+1..s+m]
            print "Pattern occurs with shift" s
    if s < n-m
        t_(s+1) = (d·(t_s - T[s+1]·h) + T[s+m+1]) mod q
Complexity Analysis Result
• The worst-case running time of the Rabin-Karp algorithm is O((n-m+1)m), which occurs when every shift requires verification (for example, P = a^m and T = a^n), but it has a good average-case running time.
• If the expected number of valid shifts is small, O(1), and the prime q is chosen to be quite large, then the Rabin-Karp algorithm can be expected to run in time O(n+m) plus the time required to process spurious hits.
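The worst case is easy to provoke. The sketch below counts how many shifts trigger the O(m) verification step; the name `count_verifications` and the small prime q = 101 are illustrative assumptions, and the function assumes 1 ≤ m ≤ n.

```python
def count_verifications(text: str, pattern: str,
                        d: int = 256, q: int = 101) -> int:
    """Count the shifts whose hash equals the pattern hash (assumes 1 <= m <= n)."""
    n, m = len(text), len(pattern)
    h = pow(d, m - 1, q)
    p = t = 0
    for i in range(m):
        p = (d * p + ord(pattern[i])) % q
        t = (d * t + ord(text[i])) % q
    checks = 0
    for s in range(n - m + 1):
        if p == t:
            checks += 1              # each check costs up to m comparisons
        if s < n - m:
            t = (d * (t - ord(text[s]) * h) + ord(text[s + m])) % q
    return checks


# Every window of "aaa...a" matches, so all n - m + 1 shifts are verified.
print(count_verifications("a" * 1000, "a" * 10))  # 991
```

When every one of the n - m + 1 shifts is verified at a cost of m comparisons each, the total work is the O((n-m+1)m) bound stated above.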
Real-Life Applications
• Bioinformatics: used in looking for similarities between two or more proteins; high sequence similarity usually implies significant structural or functional similarity.
Example:
Hb A_human
GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKL
G+ +VK+HGKKV A++++++AH+ D++ ++ +++LS+LH KL
Hb B_human
GNPKVKAHGKKVLGAFSDGLAH LDNLKGTF ATLSELH CDKL
+ similar amino acids
Real-Life Applications (contd.)
• Well suited to plagiarism detection, because it can search for multiple patterns at once.
• With a good hash function it can be quite effective, and it is easy to implement.
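Multiple pattern matching can be sketched by storing the hash of every pattern in a table and looking up each window's rolling hash in it. This is a simplified illustration, not a plagiarism detector: the name `multi_search` is hypothetical, and the sketch assumes a non-empty list of patterns that all share the same non-zero length.

```python
def multi_search(text: str, patterns: list[str],
                 d: int = 256, q: int = 1_000_000_007) -> list[tuple[int, str]]:
    """Find all (shift, pattern) occurrences for patterns of one common length."""
    m = len(patterns[0])
    assert m > 0 and all(len(p) == m for p in patterns), \
        "sketch assumes equal, non-zero pattern lengths"
    n = len(text)
    if m > n:
        return []

    def full_hash(s: str) -> int:
        value = 0
        for ch in s:
            value = (d * value + ord(ch)) % q
        return value

    # Map each hash value to the patterns that produce it.
    table: dict[int, list[str]] = {}
    for p in patterns:
        table.setdefault(full_hash(p), []).append(p)

    high = pow(d, m - 1, q)
    t = full_hash(text[:m])
    hits = []
    for s in range(n - m + 1):
        for p in table.get(t, []):   # verify only patterns with this hash
            if text[s:s + m] == p:
                hits.append((s, p))
        if s < n - m:
            t = (d * (t - ord(text[s]) * high) + ord(text[s + m])) % q
    return hits


print(multi_search("abbaabaaaab", ["aab", "bba", "aaa"]))
# [(1, 'bba'), (3, 'aab'), (6, 'aaa'), (7, 'aaa'), (8, 'aab')]
```

The text is scanned once regardless of how many patterns there are, which is what makes the approach attractive when many phrases must be checked against one document.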