Design and Analysis of Algorithms
String Matching Problem
(Knuth-Morris-Pratt Algorithm)
Dr. D. P. Acharjya
Professor, SCOPE
SJT Annex – 201 E
2/28/2024 Dr. D. P. Acharjya 1
String Matching Problem
Finding all occurrences of a pattern (String) in a text
leads to string matching problem.
It arises frequently in text editing programs where text
is a document, and the string searched for is a
particular word.
String-matching algorithms are also used to search for
particular patterns in DNA sequences.
For Example:
2/28/2024 Dr. D. P. Acharjya 2
Mathematical Formulation
Assume that the text is an array T[1 n] of length n.
The pattern is an array P[1 m] of length m. It is to be
noted that m ≤ n.
It is to be noted that the elements of P and T are
characters drawn from a finite alphabet Σ. For example,
Σ = {0, 1} or Σ = {a, b, , z}.
The character arrays P and T are often called strings of
characters.
2/28/2024 Dr. D. P. Acharjya 3
Continued…
The pattern P occurs with shift s in text T (or pattern P
occurs beginning at position s + 1 in text T) if
0 ≤ s ≤ n - m and T[s+1 s+m] = P[1 m]
It means that T[s + j] = P[j], for 1 ≤ j ≤ m.
If P occurs with shift s in T, then it is a valid shift
otherwise, s is an invalid shift.
The string-matching problem is the problem of finding
all valid shifts with which a given pattern P occurs in
a given text T.
2/28/2024 Dr. D. P. Acharjya 4
Prefix Function
The prefix function π for a pattern encapsulates
knowledge about how the pattern matches against
shifts of itself.
This information can be used to avoid testing useless
shifts in the naïve pattern-matching algorithm.
Simultaneously, it avoid the precomputation of δ
(transition function of finite automaton) for a string-
matching automaton.
It minimizes the total computation in comparison with
finite automaton string matching.
2/28/2024 Dr. D. P. Acharjya 5
Knuth-Morris-Pratt Algorithm
KMP-MATCHER(T, P)
1. n ← Length[T]
2. m ← Length[P]
3. π ← COMPUTE PREFIX FUNCTION(P)
4. q ← 0
5. for i ← 1 to n
6. do while q > 0 and P[q + 1] ≠ T[i]
7. do q ← π[q]
8. if P[q + 1] = T[i]
9. then q ← q + 1
10. if q = m
11. then print “Pattern occurs with shift” (i – m)
12. q ← π[q]
2/28/2024 Dr. D. P. Acharjya 6
Compute Prefix Function
COMPUTE-PREFIX-FUNCTION(P)
1. m ← Length[P]
2. π[1] ← 0
3. k ← 0
4. for q ← 2 to m
5. do while k > 0 and P[k + 1] ≠ P[q]
6. do k ← π[k]
7. if P[k +1] = P[q]
8. then k ← k+1
9. π[q] ← k
10. Return π
2/28/2024 Dr. D. P. Acharjya 7
Computing Time
The for loop at step 4 of compute prefix function is
computed (m-1) times. We associate a potential of k
with the current state k of the algorithm. This potential
has an initial value of 0.
The only other line that affects k is line 8, which
increases k by at most one during each execution of
the for loop body. Since the number of outer-loop
iterations is (m - 1), the computing time for compute
prefix function is Θ(m).
The total actual worst-case running time is Θ(m). It is
considered as pre-processing time.
2/28/2024 Dr. D. P. Acharjya 8
Continued…
The for loop at step 5 of KMP algorithm is computed
n - times. We associate a potential of q with the
current state q of the algorithm. This potential has an
initial value of 0.
The line that affects q is line 9, which increases q by at
most one during each execution of the for loop body.
Since the number of outer-loop iterations is n, the
computing time for KMP algorithm is Θ(n).
The total computing time for the algorithm is: Θ(m) +
Θ(n).
The algorithm computes the match in linear time.
2/28/2024 Dr. D. P. Acharjya 9
Numerical Illustration
Consider a text T =
bacbabababacaca, find all the
valid shifts for the pattern P =
ababaca.
Compute Prefix Function
m = Length[P] = 7
(1) = 0
k=0
q=2
k > 0 and P[1]P[2] (False)
2/28/2024 Dr. D. P. Acharjya 10
Continued …
q=2
k > 0 and P[1]P[2] (False)
P[1] = P[2] (False)
[2] = k = 0
q=3
k > 0 and P[1]P[3] (False)
P[1] = P[3] (True)
k=k+1=1
[3] = 1
2/28/2024 Dr. D. P. Acharjya 11
Continued …
q=4
k=1>0 and P[2]P[4] (False)
P[k+1=2] = P[4] (True)
k=k+1=2
[4] = 2
q=5
k=2>0 and P[3]P[5] (False)
P[k+1=3] = P[5] (True)
k=k+1=3
[5] = 3
2/28/2024 Dr. D. P. Acharjya 12
Continued …
q=6
k=3>0 and P[4]P[6] (True)
k = [3] = 1
k=1>0 and P[2]P[6] (True)
k = [1] = 0
k=0>0 and P[1]P[6] (False)
P[k+1=1] = P[6] (False)
[6] = 0 (k)
2/28/2024 Dr. D. P. Acharjya 13
Continued …
q=7
k=0>0 and P[1]P[7] (False)
P[k+1=1] = P[7] (True)
k = k+1 =1
[7] = 1 (k)
2/28/2024 Dr. D. P. Acharjya 14
Continued …
n = Length [T] = 15
m = Length[P] = 7
Compute
q=0
For i = 1 to 15
2/28/2024 Dr. D. P. Acharjya 15
Continued …
For q = 0 and i = 1
q >0 and P[1] T[1] (False)
P[1] = T[1] (False)
q m (7)
2/28/2024 Dr. D. P. Acharjya 16
Continued …
For q = 0 and i = 2
q >0 and P[1] T[2] (False)
P[1] = T[2] (True)
q = q +1 = 1
q (1) m (7)
2/28/2024 Dr. D. P. Acharjya 17
Continued …
For q = 1 and i = 3
q=1>0 and P[2]T[3](True)
q = [q =1] = 0
q=0>0 and P[1]T[3](False)
P[1] = T[3] (False)
q (0) m (7)
2/28/2024 Dr. D. P. Acharjya 18
Continued …
For q = 0 and i = 4
q>0 and P[1]T[4](False)
P[1] = T[4] (False)
q (0) m (7)
2/28/2024 Dr. D. P. Acharjya 19
Continued …
For q = 0 and i = 5
q>0 and P[1]T[5](False)
P[1] = T[5] (True)
q = q + 1 =1
q (1) m (7)
2/28/2024 Dr. D. P. Acharjya 20
Continued …
For q = 1 and i = 6
q>0 and P[2]T[6](False)
P[2] = T[6] (True)
q = q + 1 =2
q (2) m (7)
2/28/2024 Dr. D. P. Acharjya 21
Continued …
For q = 2 and i = 7
q>0 and P[3]T[7](False)
P[3] = T[7] (True)
q = q + 1 =3
q (3) m (7)
2/28/2024 Dr. D. P. Acharjya 22
Continued …
For q = 3 and i = 8
q>0 and P[4]T[8](False)
P[4] = T[8] (True)
q = q + 1 =4
q (4) m (7)
2/28/2024 Dr. D. P. Acharjya 23
Continued …
For q = 4 and i = 9
q>0 and P[5]T[9](False)
P[5] = T[9] (True)
q = q + 1=5
q (5) m (7)
2/28/2024 Dr. D. P. Acharjya 24
Continued …
For q = 5 and i = 10
q>0 and P[6]T[10](True)
q = [q =5] = 3
q>0 and P[4]T[10](False)
P[4] = T[10] (True)
q = q + 1=4
q (4) m (7)
2/28/2024 Dr. D. P. Acharjya 25
Continued …
For q = 4 and i = 11
q>0 and P[5]T[11](False)
P[5] = T[11] (True)
q = q + 1=5
q (5) m (7)
2/28/2024 Dr. D. P. Acharjya 26
Continued …
For q = 5 and i = 12
q>0 and P[6]T[12](False)
P[6] = T[12] (True)
q = q + 1=6
q (6) m (7)
2/28/2024 Dr. D. P. Acharjya 27
Continued …
For q = 6 and i = 13
q>0 and P[7]T[13](False)
P[7] = T[13] (True)
q = q + 1=7
q (7) = m (7)
Pattern with shift 13 – 7 = 6
q = [q =7] = 1
2/28/2024 Dr. D. P. Acharjya 28
Continued …
For q = 1 and i = 14
q>0 and P[2]T[14](True)
q = [q=1] = 0
P[2] = T[14] (False)
q (0) = m (7)
2/28/202 Dr. D. P. Acharjya 29
4
Continued …
For q = 0 and i = 15
q>0 and P[1]T[15](False)
P[1] = T[15] (True)
q (0) = m (7) (False)
Process Terminates
2/28/2024 Dr. D. P. Acharjya 30