Finding the Longest Common Subsequence using Dynamic Programming
Design and Analysis of Algorithms
Week 11 – Lecture 1
Longest Common Subsequence (LCS)
• The longest common subsequence (LCS) is defined as the longest subsequence
that is common to all the given sequences, provided that the elements of the
subsequence are not required to occupy consecutive positions within the
original sequences.
• If S1 and S2 are the two given sequences, then Z is a common subsequence of S1 and S2 if Z is a subsequence of both S1 and S2. Furthermore, the positions from which the elements of Z are taken must form a strictly increasing sequence of indices in both S1 and S2.
• In a strictly increasing sequence of indices, the elements chosen from each original sequence appear in Z in the same (ascending) order as they appear in that sequence.
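The definition above can be checked mechanically. The sketch below is a minimal illustration, assuming sequences are stored as std::string with one character per element; the function names isSubsequence and isCommonSubsequence are hypothetical helpers, not part of the lecture. A greedy left-to-right scan is enough, because each matched element must come at a strictly larger index than the previous one.

```cpp
#include <cstddef>
#include <string>

// Returns true if z is a subsequence of s: each character of z must be
// matched at a strictly larger index of s than the previous match.
bool isSubsequence(const std::string& z, const std::string& s) {
    std::size_t k = 0;  // next position of z still to be matched
    for (std::size_t i = 0; i < s.size() && k < z.size(); ++i) {
        if (s[i] == z[k]) {
            ++k;  // z[k-1] matched at index i; later matches need indices > i
        }
    }
    return k == z.size();
}

// Z is a common subsequence of S1 and S2 if it is a subsequence of both.
bool isCommonSubsequence(const std::string& z, const std::string& s1,
                         const std::string& s2) {
    return isSubsequence(z, s1) && isSubsequence(z, s2);
}
```

For example, isSubsequence("ADB", "BCDAACD") is false, since B would have to come after D in S1.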
Subsequences of a Single String
• If S1 = {B, C, D, A, A, C, D}
• Then some of its subsequences are {B, D}, {A, C}, {A, C, D}, etc.
• Here, {A, D, B} cannot be a subsequence of S1, because B occurs before A and D in S1, so the indices of the chosen elements would not form a strictly increasing sequence.
• The number of possible subsequences is 2^n (for each position, we either keep it or drop it), where n is the length of the string.
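The 2^n count comes from treating every subset of positions as one subsequence. As a rough sketch (the name allSubsequences is hypothetical), each n-bit mask can select the positions to keep; visiting the positions in increasing order guarantees the indices are strictly increasing. Note that a string with repeated characters, like S1, produces some identical subsequences from different position sets, so 2^n counts position subsets rather than distinct strings.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Lists every subsequence of s, one per n-bit mask (bit i set = keep s[i]).
// There are exactly 2^n masks. Assumes n < 32 so the shift is portable;
// this exhaustive approach is only practical for very short strings.
std::vector<std::string> allSubsequences(const std::string& s) {
    const std::size_t n = s.size();
    std::vector<std::string> result;
    for (unsigned long mask = 0; mask < (1UL << n); ++mask) {
        std::string sub;
        for (std::size_t i = 0; i < n; ++i) {
            if (mask & (1UL << i)) {
                sub.push_back(s[i]);  // positions are taken in increasing order
            }
        }
        result.push_back(sub);
    }
    return result;
}
```

For S1 = "BCDAACD" (n = 7) this yields 2^7 = 128 subsequences, including the empty one.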
LCS in Multiple Strings
If S1 = {B, C, D, A, A, C, D} and S2 = {A, C, D, B, A, C},
then common subsequences include {B, C}, {C, D, A, C}, {D, A, C}, {A, A, C}, {A, C}, {C, D}, ...
Among these subsequences, {C, D, A, C} is the longest common subsequence (length 4).
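Before turning to dynamic programming, it may help to see the naive approach this example suggests: generate every subsequence of S1 and keep the longest one that is also a subsequence of S2. The sketch below is a brute-force baseline only (bruteForceLCS and isSubseq are hypothetical names); it takes time exponential in the length of S1, which is exactly the cost the DP method avoids.

```cpp
#include <cstddef>
#include <string>

// Greedy check: is z a subsequence of s?
static bool isSubseq(const std::string& z, const std::string& s) {
    std::size_t k = 0;
    for (char c : s) {
        if (k < z.size() && c == z[k]) ++k;
    }
    return k == z.size();
}

// Brute-force LCS: try all 2^m subsequences of a (m < 32 assumed) and keep
// the longest that is also a subsequence of b. Exponential time.
std::string bruteForceLCS(const std::string& a, const std::string& b) {
    const std::size_t m = a.size();
    std::string best;
    for (unsigned long mask = 0; mask < (1UL << m); ++mask) {
        std::string sub;
        for (std::size_t i = 0; i < m; ++i) {
            if (mask & (1UL << i)) sub.push_back(a[i]);
        }
        if (sub.size() > best.size() && isSubseq(sub, b)) best = sub;
    }
    return best;
}
```

On S1 = "BCDAACD" and S2 = "ACDBAC" this returns "CDAC", matching the example.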
Algorithm for Finding LCS
function findLCS(str1, str2):
    m = length of str1
    n = length of str2

    // Create a 2D table to store the length of LCS
    lcsTable[0..m][0..n]

    // Initialize the table
    for i from 0 to m:
        for j from 0 to n:
            if i == 0 or j == 0:
                lcsTable[i][j] = 0

    // Building the LCS table in bottom-up manner
    for i from 1 to m:
        for j from 1 to n:
            if str1[i-1] == str2[j-1]:
                lcsTable[i][j] = lcsTable[i-1][j-1] + 1
            else:
                lcsTable[i][j] = max(lcsTable[i-1][j], lcsTable[i][j-1])

    // Retrieve the LCS using the table
    lenLCS = lcsTable[m][n]
    lcs = new character array of length lenLCS

    // Backtrack to find the LCS
    i = m, j = n
    while i > 0 and j > 0:
        if str1[i-1] == str2[j-1]:
            lcs[lenLCS-1] = str1[i-1]
            lenLCS = lenLCS - 1
            i = i - 1
            j = j - 1
        else if lcsTable[i-1][j] > lcsTable[i][j-1]:
            i = i - 1
        else:
            j = j - 1

    return lcs
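The table-filling loops of findLCS implement the following recurrence, written here with the pseudocode's own names: lcsTable[i][j] is the LCS length of the first i characters of str1 and the first j characters of str2, and strings are indexed from 0 as in the pseudocode.

```latex
\text{lcsTable}[i][j] =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0,\\
\text{lcsTable}[i-1][j-1] + 1 & \text{if } i, j > 0 \text{ and } \text{str1}[i-1] = \text{str2}[j-1],\\
\max\bigl(\text{lcsTable}[i-1][j],\ \text{lcsTable}[i][j-1]\bigr) & \text{if } i, j > 0 \text{ and } \text{str1}[i-1] \neq \text{str2}[j-1].
\end{cases}
```

The answer is \text{lcsTable}[m][n]; each entry depends only on the entries directly above, to the left, and diagonally up-left, which is why a row-by-row (bottom-up) fill works.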
Asymptotic Analysis of LCS using DP
The asymptotic complexity of the Longest Common Subsequence (LCS)
problem using dynamic programming is O(m * n), where m and n are the
lengths of the two input sequences.
The dynamic programming approach involves constructing a matrix of size
(m + 1) x (n + 1) to store the solutions to subproblems.
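The matrix is filled in a bottom-up manner by two nested loops that spend constant time per cell, giving a time complexity of O(m * n). The table itself needs O(m * n) space, and the backtracking pass that recovers the LCS adds only O(m + n) steps.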
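To make the cost concrete, here is a sketch of a length-only variant (lcsLength is a hypothetical name, not from the lecture). It uses the same nested loops and therefore the same O(m * n) time, but since each row depends only on the row above it, two rows of size n + 1 are enough. The trade-off, stated as an assumption of this sketch: the simple backtracking used later needs the full table, so this version returns only the length.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// LCS length only: O(m * n) time, O(n) extra space (two table rows).
int lcsLength(const std::string& s1, const std::string& s2) {
    const std::size_t m = s1.size(), n = s2.size();
    std::vector<int> prev(n + 1, 0), curr(n + 1, 0);  // rows i-1 and i; column 0 stays 0
    for (std::size_t i = 1; i <= m; ++i) {
        for (std::size_t j = 1; j <= n; ++j) {
            if (s1[i - 1] == s2[j - 1])
                curr[j] = prev[j - 1] + 1;                 // diagonal + 1
            else
                curr[j] = std::max(prev[j], curr[j - 1]);  // max(upper, left)
        }
        std::swap(prev, curr);  // row i becomes the "previous" row
    }
    return prev[n];  // after the loop, prev holds row m
}
```

Usage: lcsLength("BCDAACD", "ACDBAC") gives 4, matching the {C, D, A, C} example.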
Sample Code
// The longest common subsequence in C++
#include <algorithm>
#include <cstring>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

void lcsAlgo(const char *S1, const char *S2, int m, int n) {
    // (m + 1) x (n + 1) table; a vector avoids non-standard variable-length arrays
    vector<vector<int>> LCS_table(m + 1, vector<int>(n + 1, 0));

    // Building the matrix in bottom-up way
    for (int i = 0; i <= m; i++) {
        for (int j = 0; j <= n; j++) {
            if (i == 0 || j == 0)
                LCS_table[i][j] = 0;
            else if (S1[i - 1] == S2[j - 1])
                LCS_table[i][j] = LCS_table[i - 1][j - 1] + 1;
            else
                LCS_table[i][j] = max(LCS_table[i - 1][j], LCS_table[i][j - 1]);
        }
    }

    // Backtrack from the bottom-right cell, filling the LCS from right to left
    int index = LCS_table[m][n];
    string lcs(index, ' ');
    int i = m, j = n;
    while (i > 0 && j > 0) {
        if (S1[i - 1] == S2[j - 1]) {
            lcs[index - 1] = S1[i - 1];
            i--;
            j--;
            index--;
        }
        else if (LCS_table[i - 1][j] > LCS_table[i][j - 1])
            i--;
        else
            j--;
    }

    // Printing the sequences and their LCS
    cout << "S1 : " << S1 << "\nS2 : " << S2 << "\nLCS: " << lcs << "\n";
}

int main() {
    char S1[] = "ACADB";
    char S2[] = "CBDA";
    int m = strlen(S1);
    int n = strlen(S2);
    lcsAlgo(S1, S2, m, n);
}
USING DP TO FIND LCS - EXAMPLE
X = {a b a a b a}
Y = {b a b b a b}
It is not practical to list all common subsequences, because each of these strings already has 2^6 = 64 subsequences to check.
If the two characters are different, write the maximum of the immediate left cell and the immediate upper cell in the current cell.
If the two characters are the same, take the value in the upper-left diagonal cell, add 1 to it, and write the result in the current cell.
 X \ Y   -  b  a  b  b  a  b
   -     0  0  0  0  0  0  0
   a     0  0  1  1  1  1  1
   b     0  1  1  2  2  2  2
   a     0  1  2  2  2  3  3
   a     0  1  2  2  2  3  3
   b     0  1  2  3  3  3  4
   a     0  1  2  3  3  4  4
The bottom-right cell shows that the length of the longest common subsequence is 4.
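The table above can be reproduced with a short program that applies the two filling rules. This is a sketch under assumptions: buildTable and printTable are hypothetical names, X labels the rows and Y the columns as in the table, and the printout uses single spaces rather than the exact alignment shown above.

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Fills the (m+1) x (n+1) LCS table: equal characters -> upper-left + 1,
// different characters -> max(upper, left). Row 0 and column 0 stay 0.
std::vector<std::vector<int>> buildTable(const std::string& x, const std::string& y) {
    const std::size_t m = x.size(), n = y.size();
    std::vector<std::vector<int>> t(m + 1, std::vector<int>(n + 1, 0));
    for (std::size_t i = 1; i <= m; ++i)
        for (std::size_t j = 1; j <= n; ++j)
            t[i][j] = (x[i - 1] == y[j - 1]) ? t[i - 1][j - 1] + 1
                                            : std::max(t[i - 1][j], t[i][j - 1]);
    return t;
}

// Prints the table with y across the top and x down the left side.
void printTable(const std::string& x, const std::string& y) {
    const auto t = buildTable(x, y);
    std::cout << "    -";
    for (char c : y) std::cout << ' ' << c;
    std::cout << '\n';
    for (std::size_t i = 0; i < t.size(); ++i) {
        std::cout << "  " << (i == 0 ? '-' : x[i - 1]);
        for (int v : t[i]) std::cout << ' ' << v;
        std::cout << '\n';
    }
}
```

Calling printTable("abaaba", "babbab") prints the same values as the table above, with 4 in the bottom-right cell.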
FINDING LCS
Now we know that the length of the longest common subsequence is 4, but which 4 characters form it?
To find them, we backtrack through the table, starting from the last (bottom-right) cell.
Backtracking Method:
1. While the characters of the current cell's row and column differ, follow the arrow to the left or upwards, i.e., move to the neighbouring cell that holds the same value.
2. Stop when you reach a cell where the diagonal arrow appears (its row and column characters are equal).
3. Check the characters in this cell's row and column. In the first such cell they are both a's, and in the second they are both b's.
4. Write a as the last letter and b as the second-last letter of the LCS, moving diagonally up-left after each one. (The LCS is written from right to left, i.e., --ba.)
5. Continuing this way, the final subsequence is {b a b a}.
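The backtracking steps can be sketched in C++ as follows. The name backtrackLCS is hypothetical; the tie-break (move up only when the upper cell is strictly larger, otherwise move left) follows the pseudocode and sample code earlier in this lecture, and with that choice the walk through the example table produces {b a b a}.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Builds the LCS table, then walks back from the bottom-right cell:
// equal characters -> record the character and move diagonally up-left;
// otherwise -> move up if the upper cell is strictly larger, else left.
std::string backtrackLCS(const std::string& x, const std::string& y) {
    const std::size_t m = x.size(), n = y.size();
    std::vector<std::vector<int>> t(m + 1, std::vector<int>(n + 1, 0));
    for (std::size_t i = 1; i <= m; ++i)
        for (std::size_t j = 1; j <= n; ++j)
            t[i][j] = (x[i - 1] == y[j - 1]) ? t[i - 1][j - 1] + 1
                                            : std::max(t[i - 1][j], t[i][j - 1]);

    std::string lcs;
    std::size_t i = m, j = n;
    while (i > 0 && j > 0) {
        if (x[i - 1] == y[j - 1]) {
            lcs.push_back(x[i - 1]);  // characters are collected right to left
            --i;
            --j;
        } else if (t[i - 1][j] > t[i][j - 1]) {
            --i;
        } else {
            --j;
        }
    }
    std::reverse(lcs.begin(), lcs.end());  // restore left-to-right order
    return lcs;
}
```

For example, backtrackLCS("abaaba", "babbab") returns "baba", and backtrackLCS("ACADB", "CBDA") returns "CB", the result printed by the sample code.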
Note that the LCS found above is not necessarily the only one. In the table above, if you move from the last cell to the cell above it instead of to the left, backtracking can yield a different LCS, such as {b a a b}.
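To see every LCS rather than just one, the backtracking can follow both directions whenever the upper and left cells hold equal values, instead of committing to one of them. The sketch below (allLCS and collect are hypothetical names) gathers the distinct results in a std::set. It relies on the fact that when the row and column characters are equal, every LCS of those prefixes ends with that character, so only the diagonal move is needed there. The number of paths can grow exponentially, so this is intended for small examples like the one above.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using Table = std::vector<std::vector<int>>;

// Explores every backtracking path from cell (i, j); suffix holds the
// characters already collected (right to left) on the current path.
static void collect(const std::string& x, const std::string& y, const Table& t,
                    std::size_t i, std::size_t j, const std::string& suffix,
                    std::set<std::string>& out) {
    if (i == 0 || j == 0) {
        out.insert(suffix);
        return;
    }
    if (x[i - 1] == y[j - 1]) {  // diagonal: this character ends every LCS here
        collect(x, y, t, i - 1, j - 1, x[i - 1] + suffix, out);
        return;
    }
    if (t[i - 1][j] >= t[i][j - 1]) collect(x, y, t, i - 1, j, suffix, out);  // up
    if (t[i][j - 1] >= t[i - 1][j]) collect(x, y, t, i, j - 1, suffix, out);  // left
}

// Returns all distinct longest common subsequences of x and y.
std::set<std::string> allLCS(const std::string& x, const std::string& y) {
    const std::size_t m = x.size(), n = y.size();
    Table t(m + 1, std::vector<int>(n + 1, 0));
    for (std::size_t i = 1; i <= m; ++i)
        for (std::size_t j = 1; j <= n; ++j)
            t[i][j] = (x[i - 1] == y[j - 1]) ? t[i - 1][j - 1] + 1
                                            : std::max(t[i - 1][j], t[i][j - 1]);
    std::set<std::string> out;
    collect(x, y, t, m, n, "", out);
    return out;
}
```

For X = "abaaba" and Y = "babbab", the returned set contains both "baba" and "baab", each of length 4.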