Design and Analysis of Algorithms
String Matching Problem
            (Knuth-Morris-Pratt Algorithm)
                                           Dr. D. P. Acharjya
                                              Professor, SCOPE
                                             SJT Annex – 201 E
2/28/2024             Dr. D. P. Acharjya                         1
String Matching Problem
 Finding all occurrences of a pattern (String) in a text
  leads to string matching problem.
 It arises frequently in text editing programs where text
  is a document, and the string searched for is a
  particular word.
 String-matching algorithms are also used to search for
  particular patterns in DNA sequences.
 For Example:
2/28/2024              Dr. D. P. Acharjya               2
Mathematical Formulation
 Assume that the text is an array T[1  n] of length n.
  The pattern is an array P[1  m] of length m. It is to be
  noted that m ≤ n.
 It is to be noted that the elements of P and T are
  characters drawn from a finite alphabet Σ. For example,
  Σ = {0, 1} or Σ = {a, b, , z}.
 The character arrays P and T are often called strings of
  characters.
2/28/2024                Dr. D. P. Acharjya                3
Continued…
 The pattern P occurs with shift s in text T (or pattern P
  occurs beginning at position s + 1 in text T) if
      0 ≤ s ≤ n - m and T[s+1  s+m] = P[1  m]
 It means that T[s + j] = P[j], for 1 ≤ j ≤ m.
 If P occurs with shift s in T, then it is a valid shift
  otherwise, s is an invalid shift.
 The string-matching problem is the problem of finding
  all valid shifts with which a given pattern P occurs in
  a given text T.
2/28/2024               Dr. D. P. Acharjya               4
Prefix Function
 The prefix function π for a pattern encapsulates
  knowledge about how the pattern matches against
  shifts of itself.
 This information can be used to avoid testing useless
  shifts in the naïve pattern-matching algorithm.
 Simultaneously, it avoid the precomputation of δ
  (transition function of finite automaton) for a string-
  matching automaton.
 It minimizes the total computation in comparison with
  finite automaton string matching.
2/28/2024              Dr. D. P. Acharjya              5
Knuth-Morris-Pratt Algorithm
KMP-MATCHER(T, P)
1. n ← Length[T]
2. m ← Length[P]
3. π ← COMPUTE PREFIX FUNCTION(P)
4. q ← 0
5. for i ← 1 to n
6.   do while q > 0 and P[q + 1] ≠ T[i]
7.             do q ← π[q]
8.       if P[q + 1] = T[i]
9.             then q ← q + 1
10.      if q = m
11.           then print “Pattern occurs with shift” (i – m)
12.              q ← π[q]
2/28/2024                  Dr. D. P. Acharjya                  6
Compute Prefix Function
COMPUTE-PREFIX-FUNCTION(P)
1. m ← Length[P]
2. π[1] ← 0
3. k ← 0
4. for q ← 2 to m
5.    do while k > 0 and P[k + 1] ≠ P[q]
6.            do k ← π[k]
7.       if P[k +1] = P[q]
8.            then k ← k+1
9.       π[q] ← k
10. Return π
2/28/2024              Dr. D. P. Acharjya   7
Computing Time
 The for loop at step 4 of compute prefix function is
  computed (m-1) times. We associate a potential of k
  with the current state k of the algorithm. This potential
  has an initial value of 0.
 The only other line that affects k is line 8, which
  increases k by at most one during each execution of
  the for loop body. Since the number of outer-loop
  iterations is (m - 1), the computing time for compute
  prefix function is Θ(m).
 The total actual worst-case running time is Θ(m). It is
  considered as pre-processing time.
2/28/2024               Dr. D. P. Acharjya               8
Continued…
 The for loop at step 5 of KMP algorithm is computed
  n - times. We associate a potential of q with the
  current state q of the algorithm. This potential has an
  initial value of 0.
 The line that affects q is line 9, which increases q by at
  most one during each execution of the for loop body.
 Since the number of outer-loop iterations is n, the
  computing time for KMP algorithm is Θ(n).
 The total computing time for the algorithm is: Θ(m) +
  Θ(n).
 The algorithm computes the match in linear time.
2/28/2024               Dr. D. P. Acharjya                9
Numerical Illustration
 Consider a text T =
  bacbabababacaca, find all the
  valid shifts for the pattern P =
  ababaca.
 Compute Prefix Function
 m = Length[P] = 7
  (1) = 0
 k=0
 q=2
 k > 0 and P[1]P[2] (False)
2/28/2024              Dr. D. P. Acharjya   10
Continued …
 q=2
 k > 0 and P[1]P[2] (False)
 P[1] = P[2] (False)
 [2] = k = 0
 q=3
 k > 0 and P[1]P[3] (False)
 P[1] = P[3] (True)
    k=k+1=1
 [3] = 1
2/28/2024            Dr. D. P. Acharjya   11
Continued …
 q=4
 k=1>0 and P[2]P[4] (False)
 P[k+1=2] = P[4] (True)
    k=k+1=2
 [4] = 2
 q=5
 k=2>0 and P[3]P[5] (False)
 P[k+1=3] = P[5] (True)
    k=k+1=3
 [5] = 3
2/28/2024           Dr. D. P. Acharjya   12
Continued …
 q=6
 k=3>0 and P[4]P[6] (True)
    k = [3] = 1
 k=1>0 and P[2]P[6] (True)
    k = [1] = 0
 k=0>0 and P[1]P[6] (False)
 P[k+1=1] = P[6] (False)
 [6] = 0 (k)
2/28/2024           Dr. D. P. Acharjya   13
Continued …
 q=7
 k=0>0 and P[1]P[7] (False)
 P[k+1=1] = P[7] (True)
 k = k+1 =1
 [7] = 1 (k)
2/28/2024           Dr. D. P. Acharjya   14
Continued …
 n = Length [T] = 15
 m = Length[P] = 7
 Compute 
 q=0
 For i = 1 to 15
2/28/2024               Dr. D. P. Acharjya   15
Continued …
    For q = 0 and i = 1
    q >0 and P[1]  T[1] (False)
    P[1] = T[1] (False)
    q  m (7)
2/28/2024                Dr. D. P. Acharjya   16
Continued …
    For q = 0 and i = 2
    q >0 and P[1]  T[2] (False)
    P[1] = T[2] (True)
    q = q +1 = 1
    q (1)  m (7)
2/28/2024                Dr. D. P. Acharjya   17
Continued …
    For q = 1 and i = 3
    q=1>0 and P[2]T[3](True)
    q = [q =1] = 0
    q=0>0 and P[1]T[3](False)
    P[1] = T[3] (False)
    q (0)  m (7)
2/28/2024              Dr. D. P. Acharjya   18
Continued …
    For q = 0 and i = 4
    q>0 and P[1]T[4](False)
    P[1] = T[4] (False)
    q (0)  m (7)
2/28/2024               Dr. D. P. Acharjya   19
Continued …
    For q = 0 and i = 5
    q>0 and P[1]T[5](False)
    P[1] = T[5] (True)
    q = q + 1 =1
    q (1)  m (7)
2/28/2024               Dr. D. P. Acharjya   20
Continued …
    For q = 1 and i = 6
    q>0 and P[2]T[6](False)
    P[2] = T[6] (True)
    q = q + 1 =2
    q (2)  m (7)
2/28/2024               Dr. D. P. Acharjya   21
Continued …
    For q = 2 and i = 7
    q>0 and P[3]T[7](False)
    P[3] = T[7] (True)
    q = q + 1 =3
    q (3)  m (7)
2/28/2024               Dr. D. P. Acharjya   22
Continued …
    For q = 3 and i = 8
    q>0 and P[4]T[8](False)
    P[4] = T[8] (True)
    q = q + 1 =4
    q (4)  m (7)
2/28/2024               Dr. D. P. Acharjya   23
Continued …
    For q = 4 and i = 9
    q>0 and P[5]T[9](False)
    P[5] = T[9] (True)
    q = q + 1=5
    q (5)  m (7)
2/28/2024               Dr. D. P. Acharjya   24
Continued …
    For q = 5 and i = 10
    q>0 and P[6]T[10](True)
    q = [q =5] = 3
    q>0 and P[4]T[10](False)
    P[4] = T[10] (True)
    q = q + 1=4
    q (4)  m (7)
2/28/2024              Dr. D. P. Acharjya   25
Continued …
    For q = 4 and i = 11
    q>0 and P[5]T[11](False)
    P[5] = T[11] (True)
    q = q + 1=5
    q (5)  m (7)
2/28/2024              Dr. D. P. Acharjya   26
Continued …
    For q = 5 and i = 12
    q>0 and P[6]T[12](False)
    P[6] = T[12] (True)
    q = q + 1=6
    q (6)  m (7)
2/28/2024              Dr. D. P. Acharjya   27
Continued …
    For q = 6 and i = 13
    q>0 and P[7]T[13](False)
    P[7] = T[13] (True)
    q = q + 1=7
    q (7) = m (7)
    Pattern with shift 13 – 7 = 6
    q = [q =7] = 1
2/28/2024                 Dr. D. P. Acharjya   28
Continued …
    For q = 1 and i = 14
    q>0 and P[2]T[14](True)
    q = [q=1] = 0
    P[2] = T[14] (False)
    q (0) = m (7)
2/28/202               Dr. D. P. Acharjya   29
4
Continued …
    For q = 0 and i = 15
    q>0 and P[1]T[15](False)
    P[1] = T[15] (True)
    q (0) = m (7) (False)
    Process Terminates
2/28/2024              Dr. D. P. Acharjya   30