Longest Common Subsequence
Analysis of Algorithms 1
Longest Common Subsequence
• Problem: Given 2 sequences, X = 〈x1,...,xm〉 and
Y = 〈y1,...,yn〉, find a common subsequence whose length is maximum.
springtime ncaa tournament basketball
printing north carolina krzyzewski
Subsequence need not be consecutive, but must be in order.
Comp 122, Spring 2004
Longest common subsequence (LCS)
For a sequence X = x1, x2, …, xn, a subsequence is
a subset of the sequence defined by a set of
increasing indices (i1, i2, …, ik) where
1 ≤ i1 < i2 < … < i k ≤ n
X=ABACDABAB
ABA?
Longest common subsequence (LCS)
For a sequence X = x1, x2, …, xn, a subsequence is a
subset of the sequence defined by a set of increasing
indices (i1, i2, …, ik) where
1 ≤ i1 < i2 < … < i k ≤ n
X=ABACDABAB
ABA
Longest common subsequence (LCS)
For a sequence X = x1, x2, …, xn, a subsequence is
a subset of the sequence defined by a set of
increasing indices (i1, i2, …, ik) where
1 ≤ i1 < i2 < … < i k ≤ n
X=ABACDABAB
ACA?
Longest common subsequence (LCS)
For a sequence X = x1, x2, …, xn, a subsequence is
a subset of the sequence defined by a set of
increasing indices (i1, i2, …, ik) where
1 ≤ i1 < i2 < … < i k ≤ n
X=ABACDABAB
ACA
Longest common subsequence (LCS)
For a sequence X = x1, x2, …, xn, a subsequence is
a subset of the sequence defined by a set of
increasing indices (i1, i2, …, ik) where
1 ≤ i1 < i2 < … < i k ≤ n
X=ABACDABAB
DCA?
Longest common subsequence (LCS)
For a sequence X = x1, x2, …, xn, a subsequence is
a subset of the sequence defined by a set of
increasing indices (i1, i2, …, ik) where
1 ≤ i1 < i2 < … < i k ≤ n
X=ABACDABAB
DCA
Longest common subsequence (LCS)
For a sequence X = x1, x2, …, xn, a subsequence is
a subset of the sequence defined by a set of
increasing indices (i1, i2, …, ik) where
1 ≤ i1 < i2 < … < i k ≤ n
X=ABACDABAB
AADAA?
Longest common subsequence (LCS)
For a sequence X = x1, x2, …, xn, a subsequence is
a subset of the sequence defined by a set of
increasing indices (i1, i2, …, ik) where
1 ≤ i1 < i2 < … < i k ≤ n
X=ABACDABAB
AADAA
LCS problem
Given two sequences X and Y, a common subsequence is a
subsequence that occurs in both X and Y
Given two sequences X = x1, x2, …, xn and
Y = y1, y2, …, yn,
What is the longest common subsequence?
X=ABCBDAB
Y=BDCABA
LCS problem
Given two sequences X and Y, a common subsequence is a
subsequence that occurs in both X and Y
Given two sequences X = x1, x2, …, xn and
Y = y1, y2, …, yn,
What is the longest common subsequence?
X=ABCBDAB
Y=BDCABA
Longest Common Subsequence
Analysis of Algorithms 13
Longest Common Subsequence
Analysis of Algorithms 14
Characterizing a Longest Common Subsequence
Analysis of Algorithms 15
Optimal Substructure
Theorem
Let Z = 〈z1, . . . , zk〉 be any LCS of X and Y .
1. If xm = yn, then zk = xm = yn and Zk-1 is an LCS of Xm-1 and Yn-1.
2. If xm ≠ yn, then either zk ≠ xm and Z is an LCS of Xm-1 and Y .
3. or zk ≠ yn and Z is an LCS of X and Yn-1.
Comp 122, Spring 2004
Naïve Solution
Let X be a sequence of length m,
and Y a sequence of length n.
Check for every subsequence of X whether it is a subsequence of Y, and
return the longest common subsequence found.
There are 2m subsequences of X. Testing a sequences whether or not it is a
subsequence of Y takes O(n) time. Thus, the naïve algorithm would
take O(n2m) time.
17
Naïve Algorithm
• For every subsequence of X, check whether it’s a subsequence of Y .
• Time: Θ(n2m).
– 2m subsequences of X to check.
– Each subsequence takes Θ(n) time to check:
scan Y for first letter, for second, and so on.
Comp 122, Spring 2004
Optimal Substructure
Theorem
Let Z = 〈z1, . . . , zk〉 be any LCS of X and Y .
1. If xm = yn, then zk = xm = yn and Zk-1 is an LCS of Xm-1 and Yn-1.
2. If xm ≠ yn, then either zk ≠ xm and Z is an LCS of Xm-1 and Y .
3. or zk ≠ yn and Z is an LCS of X and Yn-1.
Notation:
prefix Xi = 〈x1,...,xi〉 is the first i letters of X.
This says what any longest common subsequence must look like;
do you believe it?
Comp 122, Spring 2004
Optimal Substructure
Theorem
Let Z = 〈z1, . . . , zk〉 be any LCS of X and Y .
1. If xm = yn, then zk = xm = yn and Zk-1 is an LCS of Xm-1 and Yn-1.
2. If xm ≠ yn, then either zk ≠ xm and Z is an LCS of Xm-1 and Y .
3. or zk ≠ yn and Z is an LCS of X and Yn-1.
Proof: (case 1: xm = yn)
Any sequence Z’ that does not end in xm = yn can be made longer by adding xm = yn
to the end. Therefore,
(1) longest common subsequence (LCS) Z must end in xm = yn.
(2) Zk-1 is a common subsequence of Xm-1 and Yn-1, and
(3) there is no longer CS of Xm-1 and Yn-1, or Z would not be an LCS.
Comp 122, Spring 2004
Optimal Substructure
Theorem
Let Z = 〈z1, . . . , zk〉 be any LCS of X and Y .
1. If xm = yn, then zk = xm = yn and Zk-1 is an LCS of Xm-1 and Yn-1.
2. If xm ≠ yn, then either zk ≠ xm and Z is an LCS of Xm-1 and Y .
3. or zk ≠ yn and Z is an LCS of X and Yn-1.
Proof: (case 2: xm ≠ yn, and zk ≠ xm)
Since Z does not end in xm,
(1) Z is a common subsequence of Xm-1 and Y, and
(2) there is no longer CS of Xm-1 and Y, or Z would not be an LCS.
Comp 122, Spring 2004
Recursive Solution
• Define c[i, j] = length of LCS of Xi and Yj .
• We want c[m,n].
This gives a recursive algorithm and solves the problem.
But does it solve it well?
Comp 122, Spring 2004
Longest Common Subsequence Algorithm
Analysis of Algorithms 23
A Recursive Solution to Subproblems
Analysis of Algorithms 24
A Recursive Solution to Subproblems
Analysis of Algorithms 25
Example
INPUT: two strings
OUTPUT: longest common subsequence
X= ACTGAACTCTGTGCACT
Y= TGACTCAGCACAAAAAC
Z= TGACTCGCAC
Ex: X= {A B C B D A B }, Y= {B D C A B A}
Longest Common Subsequence:
X= ABCBDAB X= AB C BDAB
Y= BDCAB A Y= BDCAB A
Analysis of Algorithms 26
Example
Analysis of Algorithms 27
Computing the Length of an LCS
Analysis of Algorithms
28
LCS Example
We’ll see how LCS algorithm works on the following example:
• X = ABCB
• Y = BDCAB
What is the Longest Common Subsequence
of X and Y?
LCS(X, Y) = BCB
X=ABCB
Y=BDCAB
29
LCS Example (0)
j 0 1 2 3 4 5
i Yj B D C A B
0 Xi
A
1
2 B
3 C
4 B
X = ABCB; m = |X| = 4
Y = BDCAB; n = |Y| = 5
Allocate array c[5,4]
30
LCS Example (1)
j 0 1 2 3 4 5
i Yj B D C A B
0 Xi
0 0 0 0 0 0
A
1 0
2 B
0
3 C 0
4 B 0
for i = 1 to m c[i,0] = 0
for j = 1 to n c[0,j] = 0
31
LCS Example (2) ABCB
j 0 1 2 3 4 5
BDCAB
i Yj B D C A B
0 Xi
0 0 0 0 0 0
A
1 0 0
2 B
0
3 C 0
4 B 0
if ( Xi == Yj )
c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
32
LCS Example (3) ABCB
j 0 1 2 3 4 5
BDCAB
i Yj B D C A B
0 Xi
0 0 0 0 0 0
A
1 0 0 0 0
2 B
0
3 C 0
4 B 0
if ( Xi == Yj )
c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
33
LCS Example (4) ABCB
j 0 1 2 3 4 5
BDCAB
i Yj B D C A B
0 Xi
0 0 0 0 0 0
A
1 0 0 0 0 1
2 B
0
3 C 0
4 B 0
if ( Xi == Yj )
c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
34
LCS Example (5) ABCB
j 0 1 2 3 4 5
BDCAB
i Yj B D C A B
0 Xi
0 0 0 0 0 0
A
1 0 0 0 0 1 1
2 B
0
3 C 0
4 B 0
if ( Xi == Yj )
c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
35
LCS Example (6) ABCB
j 0 1 2 3 4 5
BDCAB
i Yj B D C A B
0 Xi
0 0 0 0 0 0
A
1 0 0 0 0 1 1
2 B
0 1
3 C 0
4 B 0
if ( Xi == Yj )
c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
36
LCS Example (7) ABCB
j 0 1 2 3 4 5
BDCAB
i Yj B D C A B
0 Xi
0 0 0 0 0 0
A
1 0 0 0 0 1 1
2 B
0 1 1 1 1
3 C 0
4 B 0
if ( Xi == Yj )
c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
37
LCS Example (8) ABCB
j 0 1 2 3 4 5
BDCAB
i Yj B D C A B
0 Xi
0 0 0 0 0 0
A
1 0 0 0 0 1 1
2 B
0 1 1 1 1 2
3 C 0
4 B 0
if ( Xi == Yj )
c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
38
LCS Example (10) ABCB
j 0 1 2 3 4 5
BDCAB
i Yj B D C A B
0 Xi
0 0 0 0 0 0
A
1 0 0 0 0 1 1
2 B
0 1 1 1 1 2
3 C 0 1 1
4 B 0
if ( Xi == Yj )
c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
39
LCS Example (11) ABCB
j 0 1 2 3 4 5
BDCAB
i Yj B D C A B
0 Xi
0 0 0 0 0 0
A
1 0 0 0 0 1 1
2 B
0 1 1 1 1 2
3 C 0 1 1 2
4 B 0
if ( Xi == Yj )
c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
40
LCS Example (12) ABCB
j 0 1 2 3 4 5
BDCAB
i Yj B D C A B
0 Xi
0 0 0 0 0 0
A
1 0 0 0 0 1 1
2 B
0 1 1 1 1 2
3 C 0 1 1 2 2 2
4 B 0
if ( Xi == Yj )
c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
41
LCS Example (13) ABCB
j 0 1 2 3 4 5
BDCAB
i Yj B D C A B
0 Xi
0 0 0 0 0 0
A
1 0 0 0 0 1 1
2 B
0 1 1 1 1 2
3 C 0 1 1 2 2 2
4 B 0 1
if ( Xi == Yj )
c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
42
LCS Example (14) ABCB
j 0 1 2 3 4 5
BDCAB
i Yj B D C A B
0 Xi
0 0 0 0 0 0
A
1 0 0 0 0 1 1
2 B
0 1 1 1 1 2
3 C 0 1 1 2 2 2
4 B 0 1 1 2 2
if ( Xi == Yj )
c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
43
LCS Example (15) ABCB
j 0 1 2 3 4 5
BDCAB
i Yj B D C A B
0 Xi
0 0 0 0 0 0
A
1 0 0 0 0 1 1
2 B
0 1 1 1 1 2
3 C 0 1 1 2 2 2
4 B 0 1 1 2 2 3
if ( Xi == Yj )
c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
44
Finding LCS
j 0 1 2 3 4 5
i Yj B D C A B
0 Xi
0 0 0 0 0 0
A
1 0 0 0 0 1 1
2 B
0 1 1 1 1 2
3 C 0 1 1 2 2 2
4 B 0 1 1 2 2 3
45
Finding LCS
j 0 1 2 3 4 5
i Yj B D C A B
0 Xi
0 0 0 0 0 0
A
1 0 0 0 0 1 1
2 B
0 1 1 1 1 2
3 C 0 1 1 2 2 2
4 B 0 1 1 2 2 3
46
LCS Length Algorithm
LCS-Length(X, Y)
1. m = length(X) // get the # of symbols in X
2. n = length(Y) // get the # of symbols in Y
3. for i = 1 to m c[i,0] = 0 // special case: Y0
4. for j = 1 to n c[0,j] = 0 // special case: X0
5. for i = 1 to m // for all Xi
6. for j = 1 to n // for all Yj
7. if ( Xi == Yj )
8. c[i,j] = c[i-1,j-1] + 1
9. else c[i,j] = max( c[i-1,j], c[i,j-1] )
10. return c[m,n] // return LCS length for X and Y
47
Constructing an LCS
Analysis of Algorithms 48
Constructing an LCS
Analysis of Algorithms 49
Computing the length of an LCS
LCS-LENGTH (X, Y)
1. m ← length[X]
2. n ← length[Y]
3. for i ← 1 to m
4. do c[i, 0] ← 0
5. for j ← 0 to n
6. do c[0, j ] ← 0
7. for i ← 1 to m
b[i, j ] points to table entry
8. do for j ← 1 to n whose subproblem we used
9. do if xi = yj in solving LCS of Xi
10. then c[i, j ] ← c[i−1, j−1] + 1 and Yj.
11. b[i, j ] ← “ ”
12. else if c[i−1, j ] ≥ c[i, j−1]
13. then c[i, j ] ← c[i− 1, j ] c[m,n] contains the length
14. b[i, j ] ← “↑” of an LCS of X and Y.
15. else c[i, j ] ← c[i, j−1]
16. b[i, j ] ← “←”
Time: O(mn)
17. return c and b
Comp 122, Spring 2004
Constructing an LCS
PRINT-LCS (b, X, i, j)
1. if i = 0 or j = 0
2. then return
3. if b[i, j ] = “ ”
4. then PRINT-LCS(b, X, i−1, j−1)
5. print xi
6. elseif b[i, j ] = “↑”
7. then PRINT-LCS(b, X, i−1, j)
8. else PRINT-LCS(b, X, i, j−1)
•Initial call is PRINT-LCS (b, X,m, n).
•When b[i, j ] = , we have extended LCS by one character. So LCS
= entries with in them.
•Time: O(m+n)
Comp 122, Spring 2004
Example 2
Analysis of Algorithms 53
If xi=yj i-1,j-1
k
K+1
Arrow= i,j
i-1,j
If xi!=yj
max
i,j-1
If max
If max
Arrow=
If xi=yj i-1,j-1
k
K+1
Arrow=
i-1,j
If xi!=yj
max
i,j-1
If max
Arrow=
LCS= B C B A
The algorithm
The algorithm
Base case initialization
The algorithm
Fill in the matrix
The algorithm
The algorithm
The algorithm
Running time?
Θ(nm)
Computing the Length of an LCS
Analysis of Algorithms 64
Longest Common Subsequence
This improvement works if we only need the length of an LCS
Analysis of Algorithms 65
Example 3
Example: Determine the LCS of (1,0,0,1,0,1,0,1) and (0,1,0,1,1,0,1,1,0).
Solution: let X = (1,0,0,1,0,1,0,1) and
Y = (0,1,0,1,1,0,1,1,0).
From the table we can deduct that LCS = 6.
There are several such sequences, for
instance (1,0,0,1,1,0)
(0,1,0,1,0,1) and
(0,0,1,1,0,1)
Analysis of Algorithms 66
Thanks