KEMBAR78
Assignment 2 | PDF | Sequence Alignment | Dynamic Programming
0% found this document useful (0 votes)
154 views3 pages

Assignment 2

This document provides instructions for Assignment 2 of CSE 565 Fall 2021. It includes 9 problems related to algorithms on strings and dynamic programming. Students are asked to design algorithms to solve problems like sequence alignment with gaps, longest increasing subsequence, and currency exchange rates. Algorithms should have time complexities of O(mn), O(n log n), or O(n3). The assignment is due on September 29th at 11:59pm and should be submitted on Gradescope.

Uploaded by

guanlin He
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
154 views3 pages

Assignment 2

This document provides instructions for Assignment 2 of CSE 565 Fall 2021. It includes 9 problems related to algorithms on strings and dynamic programming. Students are asked to design algorithms to solve problems like sequence alignment with gaps, longest increasing subsequence, and currency exchange rates. Algorithms should have time complexities of O(mn), O(n log n), or O(n3). The assignment is due on September 29th at 11:59pm and should be submitted on Gradescope.

Uploaded by

guanlin He
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 3

Assignment 2 CSE 565 Fall 2021 (Instructor: Mingfu Shao) Due: Sep.

29, 11:59pm

INSTRUCTIONS:

1. Use the provided latex template to write your solutions.

2. Submit your solution to Gradescope: make sure only one of your group members submits. After
submitting, make sure to add your group members.

Problem 1 (10 points).


In the lecture, we learned a O(mn) dynamic programming (DP) algorithm to compute edit distance of two
strings, aka sequence alignment. Similar algorithms are widely used in bioinformatics where scientists often
compare similarity of two given DNA sequences.

Here you are required to design a DP algorithm to compute the edit penalty of the optimal alignment of two
give strings, but with a higher gap penalty. The biological implication is that mutations (i.e. mismatches)
happens much more frequently than insertions/deletions (i.e. gaps) in a DNA sequence. Also, extending
a gap (i.e. a longer gap) is more favored than opening a new gap (i.e. two short gaps). More specifically,
in the edit distance problem, the penalty (i.e., the edit distance) is incremented by 1 for a mismatch or a
gap between S1 and S2 . In this problem with refined gap model, the penalty is still incremented by 1 for a
mismatch, but by a when opening a new gap, and by b when extending an existing gap (a > b ≥ 3). For
example, the total penalty for alignment AA CC × AAGGCT is (a + b + 1) where a is for opening the gap at
position 3, b for extending the gap at position 4, and 1 for a mismatch at position 6.

Problem: Sequence alignment with gap penalty.

Input: Two strings S1 [1 · · · n] and S2 [1 · · · m] over alphabet Σ = {A,C, G,U}, a, b, a > b ≥ 3.

Output: the alignment between S1 and S2 with minimized total penalty.

Design a dynamic programming algorithm for above problem. Show correctness and complexity analysis
of your algorithm. Your algorithm should run in O(mn) time.

Hint: Use three DP tables. One for ending with a gap in S1 , one for ending with a gap in S2 , one for
match/mismatch only.

Problem 2 (10 points).


Given two strings S1 [1 · · · n] and S2 [1 · · · m], design an O(nm) algorithm to decide the optimal alignment
between S1 [1 · · · n] and S2 [1 · · · m] is unique.

Problem 3 (10 points).


Recall from the class we talked about the sequence alignment problem where we use dynamic programming
to calculate the edit distance between two strings. Given two strings A and B have the same length of n
characters, and a threshold k, 0 < k < n, design an algorithm to check whether the edit distance between A
and B is less than k. Your algorithm should run in O(nk) time.

Problem 4 (10 points).


We introduced longest increasing subsequence (LIS) problem in the lecture, and defined an DP algorithm
runs in O(n2 ) time. Now we want to further improve it: design an algorithm runs in O(n log n) time to solve
this problem but only finds the length of the longest increasing subsequence of the given array S[1 · · · n], i.e.,
this algorithm don’t need to find the actual longest increasing subsequence.

1
Assignment 2 CSE 565 Fall 2021 (Instructor: Mingfu Shao) Due: Sep. 29, 11:59pm

Here are some hints: we introduce a new array D, where D[i] represents the minimum value of the last
element in the increasing subsequence of length i. For example, if we have S = [2, 1, 4, 5], then we’ll have
D[1] = 1, D[2] = 4 and D[3] = 5. (There are four increasing subsequences of length 1, and 1 is the smallest;
among the four increasing subsequences of length 2, i.e., [2, 4], [2, 5], [1, 4] and [1, 5], 4 is the minimum
last element; and both the two increasing subsequences of length 3 end with 5.) In your algorithm, you can
enumerate the index i from 1 to n, and use the value of S[i] to update the array D. Meanwhile, we also need
to keep updating the length of LIS. Further hint: binary search might be needed.

Problem 5 (10 points).


Let A = a1 a2 · · · an and B = b1 b2 · · · bm be two strings. Design a dynamic programming algorithm to compute
the longest common subsequence between A and B, i.e., to find the largest k and indices 1 ≤ i1 < i2 < · · · <
ik ≤ n and 1 ≤ j1 < j2 < · · · < jk ≤ m such that A[i1 ] = B[ j1 ], A[i2 ] = B[ j2 ], · · · , A[ik ] = B[ jk ]. Your algorithm
should run in O(mn) time.

Problem 6 (10 points).


Given a string S of length n and a positive integer k, find the longest continuous repeat of a substring
which length is k. For example, when S = ACGACGCGCGACG, if k = 2, the longest continuous repeat
would be CGCGCG composed of CG repeating 3 times. And if k = 3, the longest continuous repeat would
be ACGACG composed of ACG repeating twice. Design a dynamic programming algorithm to solve this
problem and analyze its running time.

Problem 7 (10 points).


You have a wood plank in a shape of a convex polygon with n edges, represented as the (circular) list of the
2D coordinates of its n vertices. You want to cut it into (n−2) triangles using (n−3) non-crossing diagonals,
such that the total length of the cutting trace (i.e., total length of the picked diagonals) is minimized. Design
a dynamic programming algorithm runs in O(n3 ) time to solve this problem. Your solution should include
definition of subproblems, a recursion, how to calculate the minimized total length (i.e., the termination
step), and an explanation of why your algorithm runs in O(n3 ) time.

For example, in the convex polygon given below, one possible way to cut it into triangles is illustrated with
red dashed lines, and in this case the total length of cutting trace is BG + BF +CF + DF.

Problem 8 (16 points).


Let c1 , c2 , · · · , cn be various currencies. For any two currencies ci and c j , there is an exchange rate ri, j ; this
means that you can purchase ri, j units of currency c j in exchange for one unit of ci . These exchange rates
satisfy the condition that ri, j · r j,i < 1, so that if you start with a unit of currency ci , change it into currency c j
and then convert back to currency ci , you end up with less than one unit of currency ci (the difference is the
cost of the transaction). Give an efficient algorithm for the following problem: given a set of exchange rates

2
Assignment 2 CSE 565 Fall 2021 (Instructor: Mingfu Shao) Due: Sep. 29, 11:59pm

ri, j , and two currencies s and t, find the most advantageous sequence of currency exchanges for converting
currency s into currency t. (Assume that, there does not exists sequence of currencies c1 , c2 , · · · , ck such that
r1,2 · r2,3 · r3,4 · · · rk−1,k · rk,1 > 1.)

Occasionally the exchange rates satisfy the following property: there is a sequence of currencies c1 , c2 , · · · , ck
such that r1,2 · r2,3 · r3,4 · · · rk−1,k · rk,1 > 1. This means that by starting with a unit of currency c1 and then
successively converting it to currencies c2 , c3 , · · ·, ck , and finally back to c1 , you would end up with more
than one unit of currency c1 . Give an efficient algorithm for detecting the presence of such an anomaly.

Problem 9 (10 points).


There are n trains (X1 , X2 , · · · , Xn ) moving in the same direction on parallel tracks. Train Xk moves at constant
speed vk , and at time t = 0, is at position sk . Train Xk will be awarded, if there exists a time period [t1 ,t2 ] of
length at least δ (i.e., 0 ≤ t1 < t2 and t2 − t1 ≥ δ) such that during this entire time peroid Xk is in front of all
other trains (it is fine if Xk is behind some other train prior to t1 or Xk is surpassed by some other train after
t2 ). Given vk and sk , 1 ≤ k ≤ n, and δ > 0, design an algorithm to list all trains that will be awarded. Your
algorithm should run in O(n · log n) time.

You might also like