5.3 dynamic programming 03
Dynamic Programming 1
Dynamic Programming
Outline and Reading
Matrix Chain-Product (§5.3.1)
The General Technique (§5.3.2)
0-1 Knapsack Problem (§5.3.3)
Matrix Chain-Products
Dynamic Programming is a general algorithm design paradigm.
• Rather than give the general structure, let us first give a motivating example:
  • Matrix Chain-Products
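Review: Matrix Multiplication.
• C = A*B
• A is d × e and B is e × f
• O(d⋅e⋅f) time: each of the d⋅f entries of C is a sum of e products
[Figure: A (d × e) times B (e × f) gives C (d × f); entry (i,j) of C combines row i of A with column j of B]

$$C[i,j] = \sum_{k=0}^{e-1} A[i,k] \cdot B[k,j]$$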
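A minimal Python sketch of the triple loop behind the formula above, counting scalar multiplications to confirm the d⋅e⋅f cost. The function name `matmul_count` and the list-of-lists matrix representation are illustrative assumptions, not part of the slides.

```python
def matmul_count(A, B):
    """Multiply A (d x e) by B (e x f) with the textbook triple loop.
    Returns (C, number of scalar multiplications performed)."""
    d, e, f = len(A), len(B), len(B[0])
    assert all(len(row) == e for row in A), "inner dimensions must agree"
    C = [[0] * f for _ in range(d)]
    mults = 0
    for i in range(d):
        for j in range(f):
            for k in range(e):  # C[i][j] = sum over k of A[i][k] * B[k][j]
                C[i][j] += A[i][k] * B[k][j]
                mults += 1
    return C, mults

A = [[1, 2, 3], [4, 5, 6]]        # 2 x 3
B = [[7, 8], [9, 10], [11, 12]]   # 3 x 2
C, mults = matmul_count(A, B)
print(C, mults)                   # [[58, 64], [139, 154]] 12
```

The count is exactly d⋅e⋅f (here 2⋅3⋅2 = 12), which is the cost model used for every product in the rest of this section.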
Matrix Chain-Products
Matrix Chain-Product:
• Compute $A = A_0 * A_1 * \cdots * A_{n-1}$
• $A_i$ is $d_i \times d_{i+1}$
• Problem: How to parenthesize?
Example
• B is 3 × 100
• C is 100 × 5
• D is 5 × 5
• (B*C)*D takes 3⋅100⋅5 + 3⋅5⋅5 = 1500 + 75 = 1575 ops
• B*(C*D) takes 3⋅100⋅5 + 100⋅5⋅5 = 1500 + 2500 = 4000 ops
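A quick arithmetic check of the example, assuming the d⋅e⋅f cost model from the matrix-multiplication review; the helper name `product_cost` is illustrative.

```python
def product_cost(p, q, r):
    """Scalar multiplications needed to multiply a p x q matrix by a q x r matrix."""
    return p * q * r

# B is 3 x 100, C is 100 x 5, D is 5 x 5
left_first = product_cost(3, 100, 5) + product_cost(3, 5, 5)     # (B*C)*D: B*C is 3 x 5
right_first = product_cost(100, 5, 5) + product_cost(3, 100, 5)  # B*(C*D): C*D is 100 x 5
print(left_first, right_first)  # 1575 4000
```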
Enumeration Approach
Matrix Chain-Product Alg.:
• Try all possible ways to parenthesize $A = A_0 * A_1 * \cdots * A_{n-1}$
• Calculate the number of ops for each one
• Pick the one that is best
Running time:
• The number of parenthesizations is equal to the number of full binary trees with n leaves (one leaf per matrix)
• This is exponential!
• It is the Catalan number $C_{n-1}$, which grows almost as fast as $4^n$: $C_n = \Theta(4^n / n^{3/2})$.
• This is a terrible algorithm!
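A minimal sketch of the enumeration approach, assuming the chain is described by a dimension list `d` where matrix $A_k$ is `d[k] x d[k+1]`. The function `all_costs` is a hypothetical helper; it deliberately recomputes everything, so its output size (and running time) grows with the Catalan numbers.

```python
def all_costs(d, i, j):
    """Costs of every parenthesization of A_i * ... * A_j (A_k is d[k] x d[k+1]).
    Pure brute force: no subresult is ever reused."""
    if i == j:
        return [0]
    costs = []
    for k in range(i, j):  # the final multiplication splits after A_k
        last = d[i] * d[k + 1] * d[j + 1]
        for left in all_costs(d, i, k):
            for right in all_costs(d, k + 1, j):
                costs.append(left + right + last)
    return costs

d = [3, 100, 5, 5]                    # B is 3 x 100, C is 100 x 5, D is 5 x 5
costs = all_costs(d, 0, len(d) - 2)
print(len(costs), min(costs))         # 2 1575

# Number of parenthesizations of n matrices: the Catalan numbers C_{n-1}
print([len(all_costs([1] * (n + 1), 0, n - 1)) for n in range(1, 8)])
# [1, 1, 2, 5, 14, 42, 132]
```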
Greedy Approach
Idea #1: repeatedly select the product that uses the fewest operations.
Counter-example:
• A is 101 × 11
• B is 11 × 9
• C is 9 × 100
• D is 100 × 99
• Greedy idea #1 gives A*((B*C)*D), which takes 109989 + 9900 + 108900 = 228789 ops
• (A*B)*(C*D) takes 9999 + 89991 + 89100 = 189090 ops
The greedy approach does not give us the optimal value.
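A sketch of greedy idea #1 on the counter-example, assuming the same dimension-list encoding (`dims[k] x dims[k+1]` is the shape of matrix k). The function `greedy_chain` is illustrative; it always performs the cheapest adjacent product next, breaking ties by leftmost position.

```python
def greedy_chain(dims):
    """Greedy idea #1: repeatedly perform the adjacent product with the fewest
    scalar multiplications. Returns (total cost, fully parenthesized expression)."""
    shapes = list(dims)
    names = [chr(ord("A") + k) for k in range(len(dims) - 1)]  # A, B, C, ...
    total = 0
    while len(names) > 1:
        # multiplying neighbours t and t+1 costs shapes[t] * shapes[t+1] * shapes[t+2]
        k = min(range(len(names) - 1),
                key=lambda t: shapes[t] * shapes[t + 1] * shapes[t + 2])
        total += shapes[k] * shapes[k + 1] * shapes[k + 2]
        names[k:k + 2] = [f"({names[k]}*{names[k + 1]})"]
        del shapes[k + 1]  # the shared inner dimension disappears
    return total, names[0]

print(greedy_chain([101, 11, 9, 100, 99]))         # (228789, '(A*((B*C)*D))')
print(101 * 11 * 9 + 9 * 100 * 99 + 101 * 9 * 99)  # (A*B)*(C*D): 189090
```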
“Recursive” Approach
Define subproblems:
• Find the best parenthesization of $A_i * A_{i+1} * \cdots * A_j$.
• Let $N_{i,j}$ denote the number of operations done by this subproblem.
• The optimal solution for the whole problem is $N_{0,n-1}$.
Subproblem optimality: The optimal solution can be defined in terms of optimal subproblems
• There has to be a final multiplication (root of the expression tree) for the optimal solution.
• Say, the final multiplication is at index i: $(A_0 * \cdots * A_i) * (A_{i+1} * \cdots * A_{n-1})$.
• Then the optimal solution $N_{0,n-1}$ is the sum of two optimal subproblems, $N_{0,i}$ and $N_{i+1,n-1}$, plus the time for the last multiplication, $d_0\, d_{i+1}\, d_n$.
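Characterizing Equation
The global optimum has to be defined in terms of optimal subproblems, depending on where the final multiplication is.
Let us consider all possible places for that final multiplication:
• Recall that $A_i$ is a $d_i \times d_{i+1}$ dimensional matrix.
• So, a characterizing equation for $N_{i,j}$ is the following:

$$N_{i,j} = \min_{i \le k < j} \left\{ N_{i,k} + N_{k+1,j} + d_i\, d_{k+1}\, d_{j+1} \right\}, \qquad N_{i,i} = 0$$

(The last multiplication combines a $d_i \times d_{k+1}$ matrix with a $d_{k+1} \times d_{j+1}$ matrix, hence the term $d_i\, d_{k+1}\, d_{j+1}$.)
Note that subproblems are not independent: the subproblems overlap.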
Subproblem Overlap
Algorithm RecursiveMatrixChain(S, i, j):
Input: sequence S of n matrices to be multiplied, where A_i is d[i] × d[i+1]
Output: number of operations in an optimal parenthesization of A_i * … * A_j
  if i = j then
    return 0
  N[i,j] ← +∞
  for k ← i to j − 1 do
    N[i,j] ← min{N[i,j], RecursiveMatrixChain(S, i, k) + RecursiveMatrixChain(S, k+1, j) + d[i]·d[k+1]·d[j+1]}
  return N[i,j]
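A minimal Python sketch of the recursive algorithm, instrumented with a call counter to make the overlap visible. The dimension list is the six-matrix example used later in this section (30 × 35, 35 × 15, …, 20 × 25); the names `recursive_matrix_chain` and `calls` are illustrative.

```python
calls = {}  # how many times each subproblem (i, j) gets solved

def recursive_matrix_chain(d, i, j):
    """Plain recursion on the characterizing equation; A_k is d[k] x d[k+1]."""
    calls[(i, j)] = calls.get((i, j), 0) + 1
    if i == j:
        return 0
    best = float("inf")
    for k in range(i, j):  # k = j would leave an empty right-hand factor
        cost = (recursive_matrix_chain(d, i, k)
                + recursive_matrix_chain(d, k + 1, j)
                + d[i] * d[k + 1] * d[j + 1])
        best = min(best, cost)
    return best

d = [30, 35, 15, 5, 10, 20, 25]  # six matrices
best = recursive_matrix_chain(d, 0, len(d) - 2)
print(best, sum(calls.values()), len(calls))  # 15125 243 21
```

For six matrices the recursion makes 243 calls (3^5) but there are only 21 distinct subproblems $(i, j)$, which is exactly the overlap that the bottom-up algorithm exploits.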
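Subproblem Overlap
[Figure: recursion tree for the subproblem 1..4. The root 1..4 splits into the pairs (1..1, 2..4), (1..2, 3..4) and (1..3, 4..4); the next level repeats subproblems such as 3..4, 2..2 and 3..3 in several branches.]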
Dynamic Programming Algorithm
Since subproblems overlap, we don’t use recursion. Instead, we construct optimal subproblems “bottom-up.”
• $N_{i,i}$’s are easy, so start with them
• Then do subproblems of “length” 2, 3, …, and so on.
• Running time: $O(n^3)$
Algorithm matrixChain(S):
Input: sequence S of n matrices to be multiplied, where A_i is d[i] × d[i+1]
Output: number of operations in an optimal parenthesization of S
  for i ← 0 to n − 1 do
    N[i,i] ← 0
  for b ← 1 to n − 1 do
    { b = j − i is the length of the problem }
    for i ← 0 to n − b − 1 do
      j ← i + b
      N[i,j] ← +∞
      for k ← i to j − 1 do
        N[i,j] ← min{N[i,j], N[i,k] + N[k+1,j] + d[i]·d[k+1]·d[j+1]}
  return N[0,n−1]
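A Python sketch of the bottom-up algorithm, assuming the chain is given as a dimension list `d` (matrix $A_k$ is `d[k] x d[k+1]`); `matrix_chain` is an illustrative name. It fills the table diagonal by diagonal, one "length" b at a time, so every entry it reads is already final.

```python
def matrix_chain(d):
    """Bottom-up DP: N[i][j] = fewest scalar multiplications for A_i * ... * A_j."""
    n = len(d) - 1                        # number of matrices
    N = [[0] * n for _ in range(n)]       # N[i][i] = 0 on the main diagonal
    for b in range(1, n):                 # b = j - i, the "length" of the subproblem
        for i in range(n - b):
            j = i + b
            N[i][j] = min(N[i][k] + N[k + 1][j] + d[i] * d[k + 1] * d[j + 1]
                          for k in range(i, j))
    return N

N = matrix_chain([30, 35, 15, 5, 10, 20, 25])
print(N[1][4], N[0][5])  # 7125 15125
```

The three nested loops over b, i and k give the $O(n^3)$ running time. Run on the greedy counter-example `[101, 11, 9, 100, 99]`, the table gives 189090, so (A*B)*(C*D) is in fact optimal there.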
Dynamic Programming Algorithm Visualization
[Figure: the table N, rows indexed by i and columns by j, both from 0 to n − 1; the answer $N_{0,n-1}$ is the top-right entry]
• The bottom-up construction fills in the N array by diagonals
• $N_{i,j}$ gets values from previous entries in the i-th row and the j-th column
• Filling in each entry in the N table takes O(n) time.
• Total run time: $O(n^3)$, since there are $O(n^2)$ entries
• Getting the actual parenthesization can be done by remembering “k” for each N entry

$$N_{i,j} = \min_{i \le k < j} \left\{ N_{i,k} + N_{k+1,j} + d_i\, d_{k+1}\, d_{j+1} \right\}$$
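Dynamic Programming Algorithm Visualization
$A_0$: 30 × 35; $A_1$: 35 × 15; $A_2$: 15 × 5; $A_3$: 5 × 10; $A_4$: 10 × 20; $A_5$: 20 × 25

$$N_{1,4} = \min \left\{ \begin{array}{l} N_{1,1} + N_{2,4} + d_1 d_2 d_5 = 0 + 2500 + 35 \cdot 15 \cdot 20 = 13000 \\ N_{1,2} + N_{3,4} + d_1 d_3 d_5 = 2625 + 1000 + 35 \cdot 5 \cdot 20 = 7125 \\ N_{1,3} + N_{4,4} + d_1 d_4 d_5 = 4375 + 0 + 35 \cdot 10 \cdot 20 = 11375 \end{array} \right\} = 7125$$

$$N_{i,j} = \min_{i \le k < j} \left\{ N_{i,k} + N_{k+1,j} + d_i\, d_{k+1}\, d_{j+1} \right\}$$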
Dynamic Programming Algorithm
Algorithm matrixChain(S):
Input: sequence S of n matrices to be multiplied, where A_i is d[i] × d[i+1]
Output: number of operations in an optimal parenthesization of S
  for i ← 0 to n − 1 do
    N[i,i] ← 0; T[i,i] ← i
  for b ← 1 to n − 1 do
    { b = j − i is the length of the problem }
    for i ← 0 to n − b − 1 do
      j ← i + b
      N[i,j] ← +∞; T[i,j] ← i
      for k ← i to j − 1 do
        if N[i,j] > N[i,k] + N[k+1,j] + d[i]·d[k+1]·d[j+1] then
          T[i,j] ← k   { remember the best split point }
        N[i,j] ← min{N[i,j], N[i,k] + N[k+1,j] + d[i]·d[k+1]·d[j+1]}
  return N[0,n−1]
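A Python sketch of the version that also records the split table T and then rebuilds the parenthesization from it. The helper names `matrix_chain_with_splits` and `parenthesize` are illustrative; the dimensions are the six-matrix example above.

```python
def matrix_chain_with_splits(d):
    """Bottom-up DP that also records T[i][j], the best split point k."""
    n = len(d) - 1
    N = [[0] * n for _ in range(n)]
    T = [[i] * n for i in range(n)]      # T[i][i] = i; only used for i < j
    for b in range(1, n):
        for i in range(n - b):
            j = i + b
            N[i][j] = float("inf")
            for k in range(i, j):
                cost = N[i][k] + N[k + 1][j] + d[i] * d[k + 1] * d[j + 1]
                if N[i][j] > cost:       # strictly better: remember this split
                    N[i][j] = cost
                    T[i][j] = k
    return N, T

def parenthesize(T, i, j):
    """Rebuild the optimal expression for A_i * ... * A_j from the split table."""
    if i == j:
        return f"A{i}"
    k = T[i][j]
    return f"({parenthesize(T, i, k)}*{parenthesize(T, k + 1, j)})"

N, T = matrix_chain_with_splits([30, 35, 15, 5, 10, 20, 25])
print(N[0][5], parenthesize(T, 0, 5))  # 15125 ((A0*(A1*A2))*((A3*A4)*A5))
```

Storing k costs only one extra table of the same size and no extra time, and the reconstruction touches each split once.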
Dynamic Programming Algorithm Visualization
Optimal parenthesization for the example: (A0*(A1*A2))*((A3*A4)*A5)
The General Dynamic Programming Technique
Applies to a problem that at first seems to require a lot of time (possibly exponential), provided we have:
• Subproblem optimality: the global optimum value can be defined in terms of optimal subproblems
• Subproblem overlap: the subproblems are not independent, but instead they overlap (hence, should be constructed bottom-up).
The 0/1 Knapsack Problem
Given: A set S of n items, with each item i having
• $w_i$ - a positive weight
• $b_i$ - a positive benefit
Goal: Choose items with maximum total benefit but with weight at most W.
If we are not allowed to take fractional amounts, then this is the 0/1 knapsack problem.
• In this case, we let T denote the set of items we take
• Objective: maximize $\sum_{i \in T} b_i$
• Constraint: $\sum_{i \in T} w_i \le W$
Given: A set S of n items, with each item i having
• $b_i$ - a positive “benefit”
• $w_i$ - a positive “weight”
Goal: Choose items with maximum total benefit but with weight at most W.
Example (the “knapsack” is a box of width 9 in)

Item      1     2     3     4     5
Weight    4 in  2 in  2 in  6 in  2 in
Benefit   $20   $3    $6    $25   $80

Solution (total benefit $106, total width 8 in):
• item 5 ($80, 2 in)
• item 3 ($6, 2 in)
• item 1 ($20, 4 in)
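A brute-force check of this example, trying all 2^n subsets. It is only practical for tiny n, which is why the dynamic-programming formulations follow. The dictionary layout (item number mapped to a (benefit, weight) pair) and the name `brute_force_knapsack` are illustrative assumptions.

```python
from itertools import combinations

# item number -> (benefit in $, weight in inches), from the example table
items = {1: (20, 4), 2: (3, 2), 3: (6, 2), 4: (25, 6), 5: (80, 2)}
W = 9

def brute_force_knapsack(items, W):
    """Try every subset and keep the most valuable one whose weight fits in W."""
    best_value, best_set = 0, ()
    for r in range(len(items) + 1):
        for subset in combinations(sorted(items), r):
            weight = sum(items[i][1] for i in subset)
            value = sum(items[i][0] for i in subset)
            if weight <= W and value > best_value:
                best_value, best_set = value, subset
    return best_value, best_set

print(brute_force_knapsack(items, W))  # (106, (1, 3, 5))
```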
A 0/1 Knapsack Algorithm, First Attempt
• $S_k$: set of items numbered 1 to k.
• Define B[k] = best selection from $S_k$.
• Problem: does not have subproblem optimality:
  • Consider set S = {(3,2), (5,4), (8,5), (4,3), (10,9)} of (benefit, weight) pairs and total weight W = 20
Best for $S_4$:
Best for $S_5$:
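A brute-force sketch that computes the best selections from $S_4$ and $S_5$ for this set, to show why B[k] alone is not enough. The helper `best_subset` is illustrative; items are numbered 1 to 5 in the listed order.

```python
from itertools import combinations

S = [(3, 2), (5, 4), (8, 5), (4, 3), (10, 9)]  # (benefit, weight); item i is S[i-1]
W = 20

def best_subset(k):
    """Best selection from S_k (items 1..k) with total weight at most W."""
    best = (0, ())
    for r in range(k + 1):
        for subset in combinations(range(1, k + 1), r):
            if sum(S[i - 1][1] for i in subset) <= W:
                value = sum(S[i - 1][0] for i in subset)
                best = max(best, (value, subset))
    return best

print(best_subset(4))  # (20, (1, 2, 3, 4))
print(best_subset(5))  # (26, (1, 2, 3, 5))
```

The best selection from $S_5$ drops item 4, which belongs to the best selection from $S_4$, so B[5] cannot be built from B[4]. This is the failure of subproblem optimality that motivates adding the weight bound w to the subproblem.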
A 0/1 Knapsack Algorithm, Second Attempt
• $S_k$: set of items numbered 1 to k.
• Define B[k,w] to be the best selection from $S_k$ with weight at most w
• Good news: this does have subproblem optimality.
I.e., the best subset of $S_k$ with weight at most w is either
• the best subset of $S_{k-1}$ with weight at most w, or
• the best subset of $S_{k-1}$ with weight at most $w - w_k$, plus item k

$$B[k,w] = \begin{cases} B[k-1,w] & \text{if } w_k > w \\ \max\{B[k-1,w],\; B[k-1,w-w_k] + b_k\} & \text{else} \end{cases}$$
0/1 Knapsack Algorithm
Consider set S={(1,1),(2,2),(4,3),(2,2),(5,5)} of (benefit, weight)
pairs and total weight W = 10
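A Python sketch that fills the full B[k,w] table for this set and traces back the chosen items. The names `knapsack_table` and `trace_back` are illustrative. Items 2 and 4 are identical (2,2), so several optimal selections exist; this trace-back takes item k only when B[k,w] differs from B[k−1,w], which reproduces the slides' pick of items 2, 3, 5.

```python
def knapsack_table(items, W):
    """B[k][w] = best benefit using items 1..k with total weight at most w.
    items is a list of (benefit, weight) pairs; item k is items[k-1]."""
    n = len(items)
    B = [[0] * (W + 1) for _ in range(n + 1)]      # row 0: no items, benefit 0
    for k in range(1, n + 1):
        b_k, w_k = items[k - 1]
        for w in range(W + 1):
            B[k][w] = B[k - 1][w]                   # leave item k out
            if w_k <= w and B[k - 1][w - w_k] + b_k > B[k][w]:
                B[k][w] = B[k - 1][w - w_k] + b_k   # take item k
    return B

def trace_back(B, items, W):
    """Walk from B[n][W] back to row 0; a change between rows k-1 and k means item k was taken."""
    picked, w = [], W
    for k in range(len(items), 0, -1):
        if B[k][w] != B[k - 1][w]:
            picked.append(k)
            w -= items[k - 1][1]
    return sorted(picked)

items = [(1, 1), (2, 2), (4, 3), (2, 2), (5, 5)]
B = knapsack_table(items, 10)
print(B[5][10], trace_back(B, items, 10))  # 11 [2, 3, 5]
```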
0/1 Knapsack Algorithm
Trace back to find the items picked
0/1 Knapsack Algorithm
Each diagonal arrow corresponds to adding one item into the bag
Pick items 2,3,5
{(2,2),(4,3),(5,5)} are what you will take away
0/1 Knapsack Algorithm
• Recall the definition of B[k,w]
• Since B[k,w] is defined in terms of B[k−1,*], we can use two arrays instead of a matrix
• Running time: O(nW).
• Not a polynomial-time algorithm, since W may be large: W takes only O(log W) bits to write down, so O(nW) can be exponential in the input size
• This is a pseudo-polynomial time algorithm
Algorithm 01Knapsack(S, W):
Input: set S of n items with benefit b_i and weight w_i; maximum weight W
Output: benefit of best subset of S with weight at most W
  let A and B be arrays of length W + 1
  for w ← 0 to W do
    B[w] ← 0
  for k ← 1 to n do
    copy array B into array A
    for w ← w_k to W do
      if A[w − w_k] + b_k > A[w] then
        B[w] ← A[w − w_k] + b_k
  return B[W]

$$B[k,w] = \begin{cases} B[k-1,w] & \text{if } w_k > w \\ \max\{B[k-1,w],\; B[k-1,w-w_k] + b_k\} & \text{else} \end{cases}$$
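A minimal Python sketch of the two-array algorithm, assuming positive integer weights and a list of (benefit, weight) pairs; `knapsack_01` is an illustrative name. A always holds row k−1 of the table and B becomes row k.

```python
def knapsack_01(items, W):
    """Two-array 0/1 knapsack: O(nW) time, O(W) extra space.
    items is a list of (benefit, weight) pairs with positive integer weights."""
    B = [0] * (W + 1)
    for b_k, w_k in items:
        A = B[:]                       # copy array B into array A (row k-1)
        for w in range(w_k, W + 1):
            if A[w - w_k] + b_k > A[w]:
                B[w] = A[w - w_k] + b_k
    return B[W]

print(knapsack_01([(1, 1), (2, 2), (4, 3), (2, 2), (5, 5)], 10))   # 11
print(knapsack_01([(20, 4), (3, 2), (6, 2), (25, 6), (80, 2)], 9))  # 106
```

Reading only from A is what keeps each item to at most one copy: if B were updated in place from left to right, an item could be counted several times, which would solve the unbounded knapsack instead.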
Longest Common Subsequence
Given two strings, find a longest subsequence that they share.
Substring vs. subsequence of a string
• Substring: the characters in a substring of S must occur contiguously in S
• Subsequence: the characters can be interspersed with gaps.
Consider ababc and abdcb
alignment 1
ababc.
abd.cb
a longest common subsequence is ab..c (abc), with length 3
alignment 2
aba.bc
abdcb.
a longest common subsequence is ab..b (abb), with length 3
Longest Common Subsequence
Let’s give an alignment a score M in this way:
$M = \sum_i s(x_i, y_i)$, where $x_i$ is the i-th character in the first aligned sequence and $y_i$ is the i-th character in the second aligned sequence
• $s(x_i, y_i) = 1$ if $x_i = y_i$
• $s(x_i, y_i) = 0$ if $x_i \ne y_i$ or either of them is a gap
The score for the alignment
ababc.
abd.cb
is M = s(a,a) + s(b,b) + s(a,d) + s(b,.) + s(c,c) + s(.,b) = 3
Finding the longest common subsequence between sequences S1 and S2 means finding the alignment that maximizes the score M.
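A small Python sketch of the scoring rule, assuming "." is the gap symbol as in the alignments above; `alignment_score` is an illustrative name.

```python
GAP = "."  # gap symbol used in the alignments above

def alignment_score(top, bottom):
    """M = sum of s(x_i, y_i): 1 for a column of equal non-gap characters, 0 otherwise."""
    assert len(top) == len(bottom), "aligned rows must have equal length"
    return sum(1 for x, y in zip(top, bottom) if x == y and x != GAP)

print(alignment_score("ababc.", "abd.cb"))  # alignment 1: 3
print(alignment_score("aba.bc", "abdcb."))  # alignment 2: 3
```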
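Longest Common Subsequence
Subproblem optimality
Consider two sequences
S1: $a_1 a_2 a_3 \ldots a_i$
S2: $b_1 b_2 b_3 \ldots b_j$
Let the optimal alignment be
$x_1 x_2 x_3 \ldots x_{n-1} x_n$
$y_1 y_2 y_3 \ldots y_{n-1} y_n$
There are three possible cases for the last pair $(x_n, y_n)$:
• $(x_n, y_n) = (a_i, b_j)$: match/mismatch
• $(x_n, y_n) = (\text{gap}, b_j)$: gap in sequence #1
• $(x_n, y_n) = (a_i, \text{gap})$: gap in sequence #2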
Longest Common Subsequence

$$M_{i,j} = \max \begin{cases} M_{i-1,j-1} + s(a_i, b_j) & \text{(match/mismatch)} \\ M_{i,j-1} + 0 & \text{(gap in sequence \#1)} \\ M_{i-1,j} + 0 & \text{(gap in sequence \#2)} \end{cases}$$

$M_{i,j}$ is the score for the optimal alignment between strings a[1…i] (the substring of a from index 1 to i) and b[1…j].
Longest Common Subsequence
Examples:
G A A T T C A G T T A (sequence #1)
G G A T C G A (sequence #2)
• $s(a_i, b_j) = 1$ if $a_i = b_j$
• $s(a_i, b_j) = 0$ if $a_i \ne b_j$ or either of them is a gap

$$M_{i,j} = \max\{ M_{i-1,j-1} + s(a_i, b_j),\; M_{i,j-1} + 0,\; M_{i-1,j} + 0 \}$$
Longest Common Subsequence
$M_{1,1} = \max\{M_{0,0} + 1,\; M_{1,0} + 0,\; M_{0,1} + 0\} = \max\{1, 0, 0\} = 1$
Fill the score matrix M and the trace-back table B.
[Figure: score matrix M and trace-back table B]
Longest Common Subsequence
[Figure: score matrix M and trace-back table B]
$M_{7,11} = 6$ (lower-right corner of the score matrix)
This tells us that the best alignment has a score of 6.
What is the best alignment?
Longest Common Subsequence
We need to use the trace-back table to find the best alignment, which has a score of 6:
(1) Find the path from the lower-right corner to the upper-left corner
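Longest Common Subsequence
(2) At the same time, write down the alignment backward:
• diagonal arrow: take one character from each sequence
• horizontal arrow: take one character from sequence S1 (columns)
• vertical arrow: take one character from sequence S2 (rows)
[Figure: trace-back path, with S1 along the columns and S2 along the rows]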
Longest Common Subsequence
Thus, the optimal alignment is
The longest common subsequence is G.A.T.C.G..A (that is, GATCGA, of length 6).
There might be multiple longest common subsequences (LCSs) between two given sequences.
These LCSs all have the same number of characters (not including gaps).
Longest Common Subsequence
Algorithm LCS(string A, string B) {
Input: strings A and B
Output: the longest common subsequence of A and B
M: score matrix
T: trace-back table (encode the three arrow directions with the letters a, b, c)
n = A.length()
m = B.length()
// fill in M and T; A and B are indexed from 1
for (i = 0; i < n+1; i++)
  for (j = 0; j < m+1; j++)
    if (i == 0) || (j == 0)
      then M(i,j) = 0;
    else if (A[i] == B[j])
      M(i,j) = max{M(i-1,j-1) + 1, M(i-1,j), M(i,j-1)}
      {update the entry in trace-back table T}
    else
      M(i,j) = max{M(i-1,j-1), M(i-1,j), M(i,j-1)}
      {update the entry in trace-back table T}
then use trace-back table T to print out the optimal alignment
…
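A runnable Python sketch of the LCS algorithm with its trace-back, assuming 0-based Python strings (so $a_i$ is `A[i-1]`). The labels "diag", "up" and "left" stand in for the slides' arrow letters and, like the function name `lcs`, are illustrative. On ties the diagonal move is preferred, so when several LCSs exist this returns one of them.

```python
def lcs(A, B):
    """Fill the score matrix M and trace-back table T, then walk back from
    M[n][m] to recover one longest common subsequence of A and B."""
    n, m = len(A), len(B)
    M = [[0] * (m + 1) for _ in range(n + 1)]   # row 0 and column 0 stay 0
    T = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = 1 if A[i - 1] == B[j - 1] else 0   # s(a_i, b_j)
            M[i][j], T[i][j] = max(
                (M[i - 1][j - 1] + s, "diag"),     # one character from each string
                (M[i - 1][j], "up"),               # a_i against a gap
                (M[i][j - 1], "left"),             # b_j against a gap
                key=lambda t: t[0])                # first maximum wins ties
    # Trace back from the lower-right corner towards the upper-left corner.
    out, i, j = [], n, m
    while i > 0 and j > 0:
        if T[i][j] == "diag":
            if A[i - 1] == B[j - 1]:
                out.append(A[i - 1])
            i, j = i - 1, j - 1
        elif T[i][j] == "up":
            i -= 1
        else:
            j -= 1
    return M[n][m], "".join(reversed(out))

score, common = lcs("GAATTCAGTTA", "GGATCGA")
print(score, common)
```

The score is 6, matching $M_{7,11}$ in the slides. Each trace-back step lowers M by exactly the number of characters it emits, so the recovered string always has length equal to the score.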
