100% found this document useful (1 vote)

52 views107 pages

2024 Bioinformatics Algorithms Day 2

Uploaded by

ast56erx

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

100% found this document useful (1 vote)

52 views107 pages

2024 Bioinformatics Algorithms Day 2

Uploaded by

ast56erx

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 107

Bioinformatics

Algorithms
Stephan Peischl
Interfaculty Unit for Bioinformatics
Baltzerstrasse 6
CH-3012 Bern
Switzerland

Email: stephan.peischl@unibe.ch
Sequence Alignment
GATTACA

insertion
GAATTACA substitution
GATTACC

substitution
TAATTACA

GAATTACA TAATTACA GATTACC

22.09.24 Bioinformatics Algorithms 2

DNA and Protein sequences
• DNA (Deoxyribonucleic acid) uses a 4-letter alphabet (A,T,C,G)
• RNA (Ribonucleic acid) also uses a 4-letter alphabet (A,U,C,G)
• Proteins use 20 amino acids (A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y)
• We will represent sequences of DNA, RNA or Proteins as strings
e.g.: ATCTGTTA, TGCCATA, ...

22.09.24 Bioinformatics Algorithms 3

Sequence evolution
GATTACA

GAATTACA
GATTACC

TAATTACA

GAATTACA TAATTACA GATTACC

22.09.24 Bioinformatics Algorithms 4

Sequence evolution
GATTACA
We know real alignment:
insertion
GAATTACA substitution GAATTACA
GATTACC TAATTACA
GA-TTACC

substitution
TAATTACA How can we align sequences
if we don’t know the exact
evolutionary history?
GAATTACA TAATTACA GATTACC

22.09.24 Bioinformatics Algorithms 5

Sequence alignment
In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA,
RNA, or protein to identify regions of similarity that may be a consequence of functional,
structural, or evolutionary relationships between the sequences.

If two sequences in an alignment share a common ancestor, mismatches can be

interpreted as point mutations and gaps as indels (that is, insertion or deletion
mutations) introduced in one or both lineages in the time since they diverged from one
another.
AAAGCGGAAGTCACAG AAGGCTGAAGTATAG

There are two different ways to align sequences:

AAAGCGGAAGTCACAG
1. Global alignment assumes that the two proteins are basically similar over the ||.||.||||| |.||
entire length of one another. The alignment attempts to match them to each other
from end to end. AAGGCTGAAGT-ATAG
2. Local alignment searches for segments of the two sequences that match well. AAAGCGGAAGTCACAG
There is no attempt to force entire sequences into an alignment, just those parts ......||||| ....
that appear to have good similarity, according to some criterion.
AAGGCTGAAGT-ATAG

22.09.24 Bioinformatics Algorithms 6

Manhattan tourist algorithm
The problem: Given a weighted grid G, travel from source (top left)
to the sink (bottom right) along the highest scoring path only travelling
south and east.

There are only finitely many possible paths and

we can solve problem by trying all of them.

But what does that have to do with sequence

alignments? (we will see later….)
22.09.24 Bioinformatics Algorithms 7
Manhattan Toursit

Trying all possible routes will

give us best solution, but as we
have ssen before, this is
probably not the best way to
tackle this problem!

22.09.24 Bioinformatics Algorithms 8

Manhattan Toursit

There are
𝑛+𝑚 𝑛+𝑚 !
=
𝑛 𝑛! 𝑚!
possible paths!

Therfore, an exhaustive search

!"# !
algorithm is O (! ! !&# !).

22.09.24 Bioinformatics Algorithms 9

22.09.24 Bioinformatics Algorithms 15

Manhattan Tourist
Let’s have a look at a more general problem:

What is the best way (i.e., highest scoring

path) to get from the source (upper left
corner) to any location on the grid?

Even though this may seem a stupid thing to do at first glance.... Instead
of one question we now have to answer n x m questions!

22.09.24 Bioinformatics Algorithms 16

Dynamic programming
• Dynamic programming describes breaking a large problem into
several smaller ones that are easier to solve
• Solving the more general problem of how we can get from the source
to any location is as easy as solving the more specific problem of
going from source to sink and...
• ... we get the solution of the specific problem en route!
• Finding the best path to a cell is easy because there are only two
possible ways to get there
• These observations outline the key ideas of dynamic programming

22.09.24 Bioinformatics Algorithms 17

Manhattan Toursit
Many paths are calculated multiple
times.

We can split the problem into

small peices by first calculating a
set of sub-paths, and then use this
information to find the optimal
global path.

How shall we do this?

22.09.24 Bioinformatics Algorithms 18

Manhattan Tourist Algorithm
1. We start in the source and put a 0 in the source

22.09.24 Bioinformatics Algorithms 19

Manhattan Tourist Algorithm
1. We start in the source and put a 0 in the source
2. Next, we fill in the top row and leftmost column
with the according scores (note that there is
only onw way to get to each cell!)
3. The next cell to fill is (2,2) – there are only two
ways to get there: from west or from north.
4. We take the higher scoring path and enter the
score

22.09.24 Bioinformatics Algorithms 20

22.09.24 Bioinformatics Algorithms 21

22.09.24 Bioinformatics Algorithms 22

22.09.24 Bioinformatics Algorithms 23

22.09.24 Bioinformatics Algorithms 24

Manhattan Tourist Algorithm
Finally, there is a single (or multiple) optimal path(s) left:

22.09.24 Bioinformatics Algorithms 25

The Algorithm
SE(i,j), SS(i,j) .... Score to come to (i,j) from west or north
S .... nxm Matrix with score of highest scoring path
D ... nxm Matrix with direction of highest scoring path

S(0,0) = 0
for (i in 1:n) S(i,0) = S(i-1,0) + SS(i,0)
for(j in 1:m) S(0,j) = S(0,j-1) + SE(0,j)
For(i in 1:n)
for(j in 1:m)
S(i,j) = max(S(i-1,j) + SS(i,j), S(i,j-1) + SE(i,j))
D(i,j) = path.to(i,j)

22.09.24 Bioinformatics Algorithms 26

The Algorithm
SE(i,j), SS(i,j) .... Score to come to (i,j) from east or south
S .... nxm Matrix with score of highest scoring path
D ... nxm Matrix with direction of highest scoring path
Complexity:
S(0,0) = 0 T(n) = C1 + C2 (n + m) + C3 n m
for (i in 1:n) S(i,0) = S(i-1,0) + SS(i,0) = O(n m)
for(j in 1:m) S(0,j) = S(0,j-1) + SE(0,j)
For(i in 1:n)
for(j in 1:m)
S(i,j) = max(S(i-1,j) + SS(i,j), S(i,j-1) + SE(i,j))
D(i,j) = path.to(i,j)

22.09.24 Bioinformatics Algorithms 27

Global alignment
What does sequence alignment have to do
with the Manhattan Tourist algorithm?

AAAGCGGAAGTCACAG We can represent sequences as strings and

||.||.||||| |.|| give each alignment between two sites a score:
AAGGCTGAAGT-ATAG
w(i,j) = 1 if site i and j are equal (match)
w(i,j) = -1 if site i and j are not equal (substitution
or insertion/deletion )

22.09.24 Bioinformatics Algorithms 28

Global alignment
What does sequence alignment have to do
with the Manhattan Tourist algorithm?

AAAGCGGAAGTCACAG We can represent sequences as strings and

||.||.||||| |.|| give each alignment between two sites a score:
AAGGCTGAAGT-ATAG
w(i,j) = 1 if site i and j are equal (match)
w(i,j) = -1 if site i and j are not equal (substitution
1+1-1+1+1-1+1+1+1+1+1-1+1-1+1+1 or insertion/deletion )

22.09.24 Bioinformatics Algorithms 29

Global alignment
What does sequence alignment have to do
with the Manhattan Tourist algorithm?

AAAGCGGAAGTCACAG We can represent sequences as strings and

22.09.24 Bioinformatics Algorithms 30

Global alignment
We now have three possible ways to from
a site to the next:

match, subsitution, or insertion/deletion

If we label a grid the letters of the two

strings, we can represent alignments as
paths through the grid.

Matches and substitutions are represented

by going diagonally to the next cell and
indels by going down.
22.09.24 Bioinformatics Algorithms 31
Global alignment
We now have three possible ways to from
a site to the next:

match, subsitution, or insertion/deletion

ATCGT-AC-
If we label a grid the letters of the two
||.||.|..
strings, we can represent alignments as
paths throughAT-GTTA-T
the grid.

Matches and substitutions are represented

by going diagonally to the next cell and
indels by going down/right.
22.09.24 Bioinformatics Algorithms 32
Global alignment
We now have:
• how to find the highest scoring path on
a grid
• how to asign scores to sequence
alignments
• How to represent sequence-alignments
on a grid
AAAGCGGAAGTCACAG
||.||.||||| |.||
AAGGCTGAAGT-ATAG

1+1-1+1+1-1+1+1+1+1+1-1+1-1+1+1
22.09.24 Bioinformatics Algorithms 33
Needleman-wunsch algorithm
w(i,j,x) .... Score to come to (i,j) via x (= match, mismatch, indel)
S .... nxm Matrix with score of highest scoring path
D ... nxm Matrix with direction of highest scoring path

S(0,0) = 0
for (i in 1:n) S(i,0) = S(i-1,0) + SS(i,0)
for(j in 1:m) S(0,j) = S(0,j-1) + SE(0,j)
For(i in 1:n)
for(j in 1:m)
S(i,j) = max(w(i,j,x))
D(i,j) = path.to(i,j)

22.09.24 Bioinformatics Algorithms 34

References
• Needlman and Wunsch (1970) - A general method applicable to the
search for similarities in the amino acid sequence of two proteins
• Online Tools, e.g.: BLAST:

22.09.24 Bioinformatics Algorithms 35

Comparing strings
To compare strings, there are two common measures:
1. Edit distance: the minimum number of operations (insertions, deletions
and substitutions) required to transform 1 string into another
AAAGCGGAAGTCACAG
||.||.|||||-|.|| Edit distance: 4
AAGGCTGAAGT-ATAG

2. Hamming distance: the number of differences when comparing the ith

value of a string against the ith value of another
AAAGCGGAAGTCACAG
||.||.|||||....- Hamming distance: 7
AAGGCTGAAGTATAG

22.09.24 Bioinformatics Algorithms 36

Which distance?
DNA sequences are subject to insertions, deletions, and substitutions.

Subsitutions can be measured by the edit distance, but because insertions and deletions occur as
well, we usually do not know which is the «ith» position.

Thus, the edit distance seems to be more appropriate for sequence alignment.

e.g.: ATATATAT and TATATATA

Edit distance = 2 Hamming distance = 7

-ATATATAT ATATATAT
-|||||||- ........
TATATATA TATATATA

22.09.24 Bioinformatics Algorithms 37

Scoring function
The simplest form of the edit distance gives equal weight to substitutions, insertions or
deletions.

This is not reflecting biological reality because not all substitutions occur with the same
probability, substitutions are more/less likely than insertions or deleteions, etc.

We therfore introduce a scoring function that assigns a score (or penality) to each «edit».

For instance, if we penalize mismatches by μ, insertions/deletions by σ, and score matches

by 1, we can calculate the edit distance for two strings by:

# matches – μ # mismatches – σ # insertions/deletions

22.09.24 Bioinformatics Algorithms 38

Example
AAAGCGGAAGTCACAG AAAG-C-GGAAGTCACAG
||.||.|||||-|.|| ||-|-|-|-||||-|.||
AAGGCTGAAGT-ATAG AA-GGCTG-AAGT-ATAG

#matches = 12 #matches = 12
#mismatches = 3 #mismatches = 1
# indels = 1 # indels = 5

distance = 12 – 3 μ – σ distance = 12 – μ – 5 σ

22.09.24 Bioinformatics Algorithms 39

Example
AAAGCGGAAGTCACAG AAAG-C-GGAAGTCACAG
Which is better||-|-|-|-||||-|.||
alignment?
||.||.|||||-|.||
AAGGCTGAAGT-ATAG d1 > d2 or AA-GGCTG-AAGT-ATAG
d1 < d2 ?
d1 = 12 – 3 μ – σ
μ = 1, σ = 1: d1 > d2 (8 > 6)
#matches = 12 #matches = 12
#mismatches = 3 μ = 2, σ = 1: d1 #mismatches
= d2 (5 = 5) =1
d2 = 12 – μ – 5 σ
# indels = 1 # indels = 5
μ = 3, σ = 1: d1 < d2 (2 < 4)
distance = 12 – 3 μ – σ distance = 12 – μ – 5 σ

22.09.24 Bioinformatics Algorithms 40

Example
AAAGCGGAAGTCACAG AAAG-C-GGAAGTCACAG
||.||.|||||-|.|| ||-|-|-|-||||-|.||
AAGGCTGAAGT-ATAG AA-GGCTG-AAGT-ATAG
How we score matches, mismatches
and indels impacts the outcome
#matches = 12
of the
#matches = 12
#mismatches = 3 sequence alignment!#mismatches = 1
# indels = 1 # indels = 5

distance = 12 – 3 μ – σ distance = 12 – μ – 5 σ

22.09.24 Bioinformatics Algorithms 41

Needleman-wunsch algorithm
w(i,j,x) .... Score to come to (i,j) via x (= match, mismatch, indel)
S .... nxm Matrix with score of highest scoring path
D ... nxm Matrix with direction of highest scoring path

S(0,0) = 0
for (i in 1:n) S(i,0) = S(i-1,0) + SS(i,0)
for(j in 1:m) S(0,j) = S(0,j-1) + SE(0,j)
For(i in 1:n)
for(j in 1:m)
S(i,j) = max(w(i,j,x))
D(i,j) = path.to(i,j)

22.09.24 Bioinformatics Algorithms 42

Example Needlman-Wunsch

TGATACCA
-|.||.||
-GCTAACA

μ=1
σ=1

22.09.24 Bioinformatics Algorithms 43

Example Needlman-Wunsch

TGATACCA
-|.||.||
-GCTAACA

μ=1
σ=1

--TGATACCA
--|-|-||-|
GCT-A-AC-A

μ=2
σ=1

22.09.24 Bioinformatics Algorithms 44

Scoring matrix
We can define a function that incorporates empirical
evidence about the frequency at which a given amino
acid/nucelotide/ etc. is subsituted by another. A C T G
A dAA dAC dAT dAG

One problem: C dCA dCC dCT dAG

To get the the empirical data to construct scoring
functions, one needs to align sequences!
T dTA dTC dTT dAG

Solution: G dGA dGC dGT dGG

If we have «good» data, we can use a naive scoring
function and incrementally improve it.

22.09.24 Bioinformatics Algorithms 45

Modifications to
Needleman-Wunsch algorithm
• So far we have assumed that indels are always of length 1
• A single indel of length n is, however, more likely than n small indels
• An indel of length n should therfore have a smaller penalty than n
indels of length 1
• We can implement this in our algorithm by changing the score
function for indels:

σ1 if new indel
w(i,j,indel) =
σ2 if extension of «open» indel

22.09.24 Bioinformatics Algorithms 46

Modifications to
Needleman-Wunsch algorithm
An alignment with two long indels might be biologically more plausible
than an alignment with fewer indels:

ACCTAACCGGTAGATAC--- ACCT-A-AC-CGGTAGATAC
||||------||||.||--- ||||-|-||-|--||||.||
ACCT------TAGAGACGAC ACCTTAGACAC--TAGAGAC

2 long indels (9 positions in total) 4 short indels (5 positions in total)

1 substitution 1 substitution

22.09.24 Bioinformatics Algorithms 47

Modifications to
Needleman-Wunsch algorithm
• It can also make sense to allow long indels
at beginning and end of alignment:

---AAGCATAGCCAT
---||||||||----
CTGAAGCATAG----

• This can be achieved by simply set the first

row and first column to zero in the
initalization step of the NW algorithm

22.09.24 Bioinformatics Algorithms 48

Modifications to
Needleman-Wunsch algorithm
• It can also make sense to allow long indels
at beginning and end of alignment:

---AAGCATAGCCAT
---||||||||----
CTGAAGCATAG----

• This can be achieved by simply set the first

row and first column to zero in the
initalization step of the NW algorithm

22.09.24 Bioinformatics Algorithms 49

Modifications to
Needleman-Wunsch algorithm
• A simple heuristic approximation for
the NW algorithm can be obtained by
assuming that the best alignmnet
should be close to the diagonal of the
scoring matrix
• We can therfore improve space and
time complexity considerably by
considering only a «band» along the
main diagonal

22.09.24 Bioinformatics Algorithms 50

Complexity of NW algorithm
• It is easy to see from the pseudo-code that NW algorithm has
• space complexity O(n m) (store nxm matrix) and
• time complexity O(n m) (calculate entries of nxm matrix)
• We can reduce the space complexity to O(n) at the cost of doubling
the computation time
• We need to store only 2 n entries to calculate the next column of
entries
• But we need to backtrack the highest scoring path after the calculation:
this requires to store a nxm matrix = O(nm)
• How can we calculate alignment in O(n) space?

22.09.24 Bioinformatics Algorithms 51

From O(n^2) -> O(n) (in terms of memory)
by doubling the computation time

Step 1)

It is easy to calculate just the scores of paths

in linear space because we only need to store
at most 2n elements to calculate the next row
of entries.

22.09.24 Bioinformatics Algorithms 52

From O(n^2) -> O(n) (in terms of memory)
by doubling the computation time
If (U,V) = NW(X,Y) is the alignment of sequences X and Y then we can
see the following:

Let Xl and Xr be a partition of X than there exists a partitioning Yl and

Yr such that NW(X,Y) = NW(Xl,Yl) + NW(Xr,Yr)
ACCT-A-AC-CGGTAGATAC ACCT-A-AC-C GGTAGATAC
||||-|-||-|--||||.|| ||||-|-||-| --||||.||
ACCTTAGACAC--TAGAGAC ACCTTAGACAC --TAGAGAC

22.09.24 Bioinformatics Algorithms 53

From O(n^2) -> O(n) (in terms of memory)
by doubling the computation time
If (U,V) = NW(X,Y) is the alignment of sequences X and Y then we can see
the following:

Let Xl and Xr be a partition of X than there exists a partitioning Yl and Yr

such that NW(X,Y) = NW(Xl,Yl) + NW(Xr,Yr)

ACCT-A-AC-CGGTAGATAC ACCT-A-AC-C GGTAGATAC

||||-|-||-|--||||.|| ||||-|-||-| --||||.||
ACCTTAGACAC--TAGAGAC ACCTTAGACAC --TAGAGAC

X = Xl + Xr = ACCTAACC + GGTAGATAC, Y = Yl + Yr = ACCTTAGACAC+ TAGAGAC

22.09.24 Bioinformatics Algorithms 54

From O(n^2) -> O(n) (in terms of memory)
by doubling the computation time

We start by finding the middle, i.e.

the point on the middle column
of the matrix that divides the highest scoring
path into two sub-paths.

This can be done in O(n) space and O(n m/2)

time.

Note that we only need to find the cell with

the highest score and note the whole path!

That‘s why we can do it in O(n) space.

22.09.24 Bioinformatics Algorithms 55

From O(n^2) -> O(n) (in terms of memory)
by doubling the computation time

When we now the location of the «middle»,

we can simply repeat this process for the
remaining upper and lower sub-matrices to get
the location of next «middle» of the alignment
path.

This can be done in time proportional to the

area of the submatrices (gray rectangles).

22.09.24 Bioinformatics Algorithms 56

From O(n^2) -> O(n) (in terms of memory)
by doubling the computation time

We recursively repeat this until we have

reconstructed the whole sequence alignment, i.e.,
until we have found the «middle» for all columns.

At each step, this can be done in time

proportional to the area of the submatrices.

22.09.24 Bioinformatics Algorithms 57

From O(n^2) -> O(n) (in terms of memory)
by doubling the computation time

We recursively repeat this until we have

reconstructed the whole sequence alignment, i.e.,
until we have found the «middle» for all columns.

At each step, this can be done in time proportional

to the area of the submatrices.

Total time =
nm + nm/2 + nm/4 + nm/8 + ... ≤ 2 nm = O(n m)
......
Total space = O(n)

22.09.24 Bioinformatics Algorithms 58

From O(n2) -> O(n) (in terms of memory)
by doubling the computation time
• This is an example of a «divide and conquer» algorithm:

A divide and conquer algorithm works by recursively breaking down a

problem into two or more sub-problems of the same (or related)
type, until these become simple enough to be solved directly. The
solutions to the sub-problems are then combined to give a solution
to the original problem.

22.09.24 Bioinformatics Algorithms 59

Implementation in R
• Open the neddleman wunsch R script from ilias
• Run the code
• Try some examples: Score for match

Penalty for
plotNeedlemanWunsch("GCTAACA","TGATACCA",1,1,1) insertion/deletion

Sequence 1 Sequence 2
Penalty for mismatch

22.09.24 Bioinformatics Algorithms 60

Needleman−Wunsch

Implementation in R match = 1

T
mismatch = −1

G A T A
gap = −1

C C A

0 −1 −2 −3 −4 −5 −6 −7 −8

A −1 −1 −2 −1 −2 −3 −4 −5 −6

plotNeedlemanWunsch T −2 0 −1 −2 0 −1 −2 −3 −4

("ATTCCAATA","TGATACCA",1,1,1) T −3 −1 −1 −2 −1 −1 −2 −3 −4

C −4 −2 −2 −2 −2 −2 0 −1 −2
--ATTCCAATA
--||.|||--- C −5 −3 −3 −3 −3 −3 −1 1 0
TGATACCA
A −6 −4 −4 −2 −3 −2 −2 0 2

A −7 −5 −5 −3 −3 −2 −3 −1 1
--ATTCCAATA
--||.||---| T −8 −6 −6 −4 −2 −3 −3 −2 0
TGATACC---A
A −9 −7 −7 −5 −3 −1 −2 −3 −1

22.09.24 Bioinformatics Algorithms 61

Needleman−Wunsch

Implementation in R match = 1

T
mismatch = −1

G A T A
gap = −5

C C A

0 −5 −10 −15 −20 −25 −30 −35 −40

A −5 −1 −6 −9 −14 −19 −24 −29 −34

plotNeedlemanWunsch T −10 −4 −2 −7 −8 −13 −18 −23 −28

("ATTCCAATA","TGATACCA",1,1,5) T −15 −9 −5 −3 −6 −9 −14 −19 −24

C −20 −14 −10 −6 −4 −7 −8 −13 −18

ATTCCAATA
C −25 −19 −15 −11 −7 −5 −6 −7 −12
-|...|..|
-TGATACCA A −30 −24 −20 −14 −12 −6 −6 −7 −6

A −35 −29 −25 −19 −15 −11 −7 −7 −6

T −40 −34 −30 −24 −18 −16 −12 −8 −8

A −45 −39 −35 −29 −23 −17 −17 −13 −7

22.09.24 Bioinformatics Algorithms 62

Scores for amino acids
• The scoring matrices for DNA sequence comparison are usually
defined only by a few parameters
• Random mutations of the nucleotide sequence within a gene may
change the amino acid sequence of the corresponding protein.
• Some of these mutations do not drastically alter the protein’s
structure, but others do and impair the protein’s ability to function.
• The common matrices for protein sequence comparison are point
accepted mutations (PAM) and block substitution (BLOSUM)
• They reflect the frequency with which amino acid x replaces amino
acid y in evolutionarily related sequences

22.09.24 Bioinformatics Algorithms 63

PAM scoring matrices
PAM = Point (or Percentage) Accepted Mutations.
(Dayhoff, Schwartz, and Orcutt, 1978).

Proteins that have, on average, only one mutation per 100 amino acids are
defined as being one PAM unit diverged.

Roughly speaking: 1 PAM is the amount of time in which an “average” protein

mutates 1% of its amino acids.

Accepted mutations are the ones that are able to spread in the population
(typically mutations that do not disrupt the protein function or even increase
fitness).

22.09.24 Bioinformatics Algorithms 64

PAM scoring matrices
1) Given a set of base alignments, define f(i, j) as the total number of times amino acids i and j are aligned
against each other, divided by the total number of aligned positions.
2) We also define g(i, j) as f(i,j)/f(i) , where f(i) is the frequency of amino acid i in all proteins (or seqeunces)
from the data set. g(i, j) defines the probability that an amino acid i mutates into amino acid j within 1
PAM unit.
3) The (i, j) entry of the PAM1 matrix is defined as
log (f(i,j)/ (f(i)·f(j))) = log (g(i,j)/f(j))
Note that f(i) f(j) is the frequency of alignment i against j by chance.
4) The PAM n matrix can be defined as the result of applying the PAM 1 matrix n times.
5) Multiplying g by itself n times gives the probability that amino acid i mutates into amino acid j during n
PAM units.
6) The (i, j) entry of the PAM n matrix is then defined as log(gn(i,j)/f(j)).

22.09.24 Bioinformatics Algorithms 65

PAM matrix
• Usually one starts with sequences that are easy to align because they
are not strongly diverged (e.g., human-chimp)
• Align with naive scoring function
• Pick subset that satisfy the condition that 1% of the sequence is
mutated
• Calculate PAM 1 matrix
• Calculate PAM n matrix

22.09.24 Bioinformatics Algorithms 66

Multiple alignments
• So far we have only seen how to align 2 sequences
• What if we want to align 3 or more sequences?
• Mulitple pariwise alignment:

22.09.24 Bioinformatics Algorithms 67

Multiple pairwise alignment
• In this case, the pairwise alignments are compatible:

22.09.24 Bioinformatics Algorithms 68

Incompatible Alignments
• This is not always the case:

22.09.24 Bioinformatics Algorithms 69

Multiple alignment: three sequences
We can represent all alignments
of three sequences as paths on a
3-dimensional grid.

We now have more directions to move:

- match in 2 sequences, mismatch in 1 (2x)
- match in 2 sequences, indel in 1 (2x)
- match in 3 sequences
- Mismatch in all sequences
- indel in 2 sequences (2x)

22.09.24 Bioinformatics Algorithms 70

Multiple alignment: three sequences
We can represent all alignments
of three sequences as paths on a
3-dimensional grid.

We now have more directions to move:

- match in 2 sequences, mismatch in 1 (2x)
- match in 2 sequences, indel in 1 (2x)
- match in 3 sequences
- Mismatch in all sequences
- indel in 2 sequences (2x)

22.09.24 Bioinformatics Algorithms 71

Multiple alignment algorithm
• Needleman-Wunsch algorithm can be extend to arbitrarily many
sequences
• Two modifications:
1. Calculate highest score over all possible “directions”
2. Store path-pointers in a k-dimensional array instead of a matrix
• Idea of algorithm remains the same
• Time and space complexity increases with number of sequences

22.09.24 Bioinformatics Algorithms 72

Complexity
Unforutnately, time-complexity for k sequences is O(nk)
Space-complexity is also O(nk)
Multiple heuristics have been proposed
• Compute all possible pairwise alignments and find best multiple alignment
Sometimes difficult because of incompatible aligmnents
still demanding: (k 2) possible pairwise alginments
• Align sequences iterativels: start with strong pairwise alignment and join one
sequence after another
• Iterative procedures work well if sequences are not too different from each
other

22.09.24 Bioinformatics Algorithms 73

Sequence alignment
1. Global alignment assumes that the two proteins are basically
similar over the entire length of one another. The alignment
attempts to match them to each other from end to end.
AAAGCGGAAGTCACAG
||.||.||||| |.||
AAGGCTGAAGT-ATAG

1. Local alignment searches for segments of the two sequences

that match well. There is no attempt to force entire sequences
into an alignment, just those parts that appear to have good
similarity, according to some criterion.
AAAGCGGAAGTCACAG
AAAGCGGAAGTCACAG AAGGCTGAAGTATAG
......||||| ....
AAGGCTGAAGT-ATAG
22.09.24 Bioinformatics Algorithms 74
Local alignments
• In many biological applications, the score of an alignment between
two substrings of v and w might actually be larger than the score of
an alignment between the entireties of v and w.
• For example, homeobox genes, which regulate embryonic
development, are present in a large variety of species. Although
homeobox genes are very different in different species, one region in
each gene—called the homeodomain— is highly conserved.
• How to find this conserved area and ignore the areas that show little
similarity?

22.09.24 Bioinformatics Algorithms 75

Local alignments
--T--CC-C-AGT--TATGT-CAGGGGACACG--A-GCATGCAGA-GAC
--|--||-|--||--|-|-|-|||----||-|--|-|--|-||||---|
AATTGCCGCC-GTCGT-T-TTCAG----CA-GTTATG--T-CAGAT--C

versus

tccCAGTTATGTCAGgggacacgagcatgcagagac
||||||||||||
aattgccgccgtcgttttcagCAGTTATGTCAGatc

How can we represent such a local alignment on a grid?

22.09.24 Bioinformatics Algorithms 76

--T--CC-C-AGT--TATGT-CAGGGGACACG--A-GCATGCAGA-G
| || | || | | | ||| || | | | | ||||
AATTGCCGCC-GTCGT-T-TTCAG----CA-GTTATG--T-CAGAT-

Local alignments tccCAGTTATGTCAGgggacacgagcatgcaga

||||||||||||
aattgccgccgtcgttttcagCAGTTATGTCAGatc

A A T T G C CG C C G T CG T T T T C A G C A G T T A T G T C A G A T C
T
--T--CC-C-AGT--TATGT-CAGGGGACACG--A-GCATGCAGA-GAC C
C
C
--|--||-|--||--|-|-|-|||----||-|--|-|--|-||||---| A
G
T
AATTGCCGCC-GTCGT-T-TTCAG----CA-GTTATG--T-CAGAT--C T
A Global Local
T
G
T
C
A
versus G
G
G
G
A
C
tccCAGTTATGTCAGgggacacgagcatgcagagac A
C
G
|||||||||||| A
G
C
A
aattgccgccgtcgttttcagCAGTTATGTCAGatc T
G
C
A
G
A
G
A
C

22.09.24 Bioinformatics Algorithms 77

Figure 6.16 (a) Global and (b) local alignments of two hypothetical genes th
have a conserved domain. The local alignment has a much worse score accor
Local alignment
• Smith-Waterman algorithm (1981)
6.8 Local Sequence Alignment
• Introduces “free jumps” to arbitrary cells into Needleman-Wunsch
algorithm
• This can be represented by edges of weight 0
(here shown with dashed lines) from the
source (0, 0) to every other cell
on the grid.

Figure 6.17 The Smith-Waterman local alignment algorithm introdu

weight 0 (here shown with dashed lines) from the source vertex (0, 0) t
22.09.24 Bioinformatics Algorithms 78
vertex in the edit graph.
Smith-Waterman algorithm
w(i,j,x) .... Score to come to (i,j) via x (= match, mismatch, indel)
S .... nxm Matrix with score of highest scoring path
D ... nxm Matrix with direction of highest scoring path

S(0,0) = 0
for (i in 1:n) S(i,0) = 0
for(j in 1:m) S(0,j) = 0
for(i in 1:n)
for(j in 1:m)
S(i,j) = max(w(i,j,x),0)
D(i,j) = path.to(i,j)

22.09.24 Bioinformatics Algorithms 79

Smith-Waterman
How to find the best local alignment?

1. Find largest entry in matrix

22.09.24 Bioinformatics Algorithms 80

Smith-Waterman
How to find the best local alignment?

1. Find largest entry in matrix

2. Backtrack until cell with entry 0

22.09.24 Bioinformatics Algorithms 81

Smith-Waterman
How to find the best local alignment?

1. Find largest entry in matrix

2. Backtrack until cell with entry 0

It is also possible to find multiple

local alignments

22.09.24 Bioinformatics Algorithms 82

Mutiple
non-overlapping
local alignments:

22.09.24 Bioinformatics Algorithms 83

Mapping and re-sequencing
Alignemnt is also called mapping if you align (map) sequences against an
existing genome

This is an essential step in re-sequencing

If we have a reference genome of an organism, we can align new

sequences against this reference and find sites with genetic differences
between indivudals

Because genomes are quite big, mapping can be quite tedious

22.09.24 Bioinformatics Algorithms 84
NGS data
(next generation sequencing)
NGS generally produces shorter reads than Sanger sequencing (200 vs
1000 bp), which poses some challenges:

• The reads don’t come with position information

• Reference sequence can be quite long (3 billion bp for humans)
• Alignment is not perfect: we need to allow for small variations
• Reads have sequencing-errors
• Short reads are more likely to fit into several positions
• We need to map millions of reads
22.09.24 Bioinformatics Algorithms 85
Some tools
• BLAST (Basic Local Alignment Search Tool)
one of the most commonly used tools in BioInformatics
aligns sequences against existing databases
BLAST can be used for identifying species, locating domains, establishing
phylogeny, DNA mapping, and comparison
• Bowtie
Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short
DNA sequences (reads) to the human genome at a rate of over 25 million
35-bp reads per hour. Bowtie indexes the genome with a Burrows-
Wheeler index to keep its memory footprint small: typically about 2.2 GB
for the human genome (2.9 GB for paired-end).
• TopHat
Similar to BowTie but for Transcriptome data

22.09.24 Bioinformatics Algorithms 86

Excercises
• Take the NW-algorithm implementation and try to see how the
outcome changes when you change parameters
• Change the scoring function:
1. smaller penalty for longer indels
2. Include a scoring matrix (different penalities depending on the type of
substitution, e.g., C -> T is more costly than G -> T, etc.)
• Adapt NW algorithm for local alignments

22.09.24 Bioinformatics Algorithms 87

References
• Needleman and Wunsch - A general method applicable to the search
for similarities in the amino acid sequence of two proteins (1970)
• Smith and Waterman - Identification of common molecular
subsequences (1981)
• Chapter 6 in “An Introduction to Bioinformatics Algorithms” – Jones
and Pevzner

22.09.24 Bioinformatics Algorithms 88

Exhaustive search and search trees

22.09.24 Bioinformatics Algorithms 89

Exhaustive search algorithms
• NW and SW are algorithms where we can find solution not by
exhaustive search, but by smart algorithms (dynamic programming)
• Exhaustive search/brute force usually not good but first step towards
better algorithm design
• Sometimes exhaustive search is only option
• Can we find smart ways to ignore subsets of (wrong) solutions?

22.09.24 Bioinformatics Algorithms 90

Restriction sites
Hamilton Smith discovered in 1970 that the restriction enzyme HindII
cleaves DNA molecules at every occurrence, or site, of the sequences
GTGCAC or GTTAAC, breaking a long molecule into a set of restriction
fragments.
Shortly thereafter, maps of restriction sites in DNA molecules, or
restriction maps, became powerful research tools in molecular biology by
helping to narrow the location of certain genetic markers.

• Restiriciton sites are locations on a chromosome that are recognized

by restriction enzymes
• a particular restriction enzyme may cut the sequence between two
nucleotides within its recognition site, or somewhere nearby.
• E.g., the common restriction enzyme EcoRI recognizes the palindromic
sequence GAATTC and cuts between the G and the A on both the top
and bottom strands
• This can then be used to ligate in complementary pieces of DNA

22.09.24 Bioinformatics Algorithms 91

Restriction map

• A restriction map is a map of known

restriction sites within a sequence of
DNA.
• Restriction mapping requires the use of
restriction enzymes.
• In molecular biology, restriction maps are
used as a reference to engineer plasmids
or other relatively short pieces of DNA,
and sometimes for longer genomic DNA.

22.09.24 Bioinformatics Algorithms http://dx.doi.org/10.1590/S0074-02762007005000121

92
Restriction mapping
• If the genomic DNA sequence of an organism is known, then
construction of a restriction map for, say, HindII amounts to finding all
occurrences of GTGCAC and GTTAAC in the genome.
• However, not all restriction sites are known and
• sequencing is not always the best option if one wants to find a
restriction map
• Also: 25 years ago sequencing was not an option!
• How can we build restriction maps for genomes without prior
knowledge of the genomes’ DNA sequence?

22.09.24 Bioinformatics Algorithms 93

Electrophoresis
1. Aliquots of purified plasmid DNA for each digest
2. Digestion with enzyme
3. Samples are run on an electrophoresis gel
4. We can determine the lengths of DNA fragments from
band-pattern

Electrophoresis gives us number and

length of DNA fragments. How can we
infer restriciton map from this
information?
http://bio1151.nicerweb.com/Locked/media/ch20/electrophoresis.html
22.09.24 Bioinformatics Algorithms 94
Some notation
A multiset is a set that allows duplicate elements: e.g., {1,2,2,3,4,4,6}

Let X = {x1 = 0,x2, .... , xn} a set of n elements in increasing order

We denote by ΔX the multiset of all pairwise distances between

elements of X:
ΔX = {xj − xi : 1 ≤ i < j ≤ n} .

22.09.24 Bioinformatics Algorithms 95

Example
For example, if X={0, 2, 4, 7, 10}, then ΔX={2, 2, 3, 3, 4, 5, 6, 7, 8, 10}

22.09.24 Bioinformatics Algorithms 96

The problem

Given all pairwise differences between points on a

line, reconstruct the position of all those points.

!
Input: The multiset of pairwise distances L, containing "
integers.

Output: A set X of n integers such that ΔX = L.

22.09.24 Bioinformatics Algorithms 97

Restriction mapping: brute force
!
Input: The multiset of pairwise distances L, containing "
integers.
Output: A set X of n integers such that ΔX = L.

M = max(L)
for (every set of integers 0 < x2 < .. < xn-1 < M)
X = {0, x2, ...., xn-1,M}
ΔX = get.pairwise.distances(X)
if (ΔX == L)
return(X)
return(«no solution»)

22.09.24 Bioinformatics Algorithms 98

Complexity
• The algorithm is slow since it examines #$% !
different sets of
positions
• This requires about O(M(n−2)) time
• This makes the algorithm unpractical for most applications
• One could improve the algorithm by choosing the elements of the
sets more wisely (e.g., choosing n-2 distinct elements from L rather
than n-2 arbitrary integers)
• This reduces time-complexity to O(n2n-4)
(consider L = {2,998,1000}, n = 3, M = 1000)

22.09.24 Bioinformatics Algorithms 99

A practial algorithm (Skiena, 1990)
Idea:
1. Find largest distance in L (determines the two outermost points of X)
2. Find second-largest distance δ
3. two options to place δ:
left or right 0 δ(?) δ(?) max(X)
4. Pick left and add δ to preliminiary set
5. calculated all pairwise distances 0 δ(?) δ(?) xn-1 max(X)
6. check if they are contained in L
7. If yes: remove δ from L and go to 2.
8. If not: go back and pick other direction 0 x2 δ(?) δ(?) xn-1 max(X)
9. If L is empty, we have found a valid solution

22.09.24 Bioinformatics Algorithms 100

Example
L = {2,2,3,3,4,5,6,7,8,10} x3 = 3 is not possible because x3 – x2 = 1 is not in L

! Therfore x4 must be 7
L has 10 elements and therefore n = 5 ( "
= 10)

10 is the largest distance in L, so x1 = 0 and x5 = 10 X = {0,2,7,10} and L = {2,3,4,6}

X = {0,10} and L ={2,2,3,3,4,5,6,7,8} Largest remaining distance is 6

The largest remaining distance is 8. Two choices: x3 = 4 or x3 = 6

Set x2 = 2 and remove distances 8 and 2. (x2 - x1, x5 – x2) x3 = 6 does not work, so x3 must be 4

We get X = {0,2,10} and L = {2,3,3,4,5,6,7} We get X = {0,2,4,7,10} which is a valid solution

The largest remaining distance is 7 -> either x4 = 7 or x3 = 3

22.09.24 Bioinformatics Algorithms 101

Algorithm
get.rest.map = find.next = function(L,X,m)
If L is empty: X is solution, return
function(L,X)
y = max(L)
m = max(L) If Δ(y,X) is subset of L
L = L without m add y to X and remove Δ(y,X) from L
find.next(L,X,m)
X = {0,m} remove y from X and add Δ(y,X) to L
find.next(L,X,m) if Δ(m-y,X) is subset of L
add m-y to X and remove Δ(y,X) from L
find.next(L,X,m)
remove m – y from X and add Δ(m-y,X) to L

22.09.24 Bioinformatics Algorithms 102

Recursive tree
At each step we can go left or right,
which creates a recursive-tree like this:

If we find that going, say,

left doesn’t yield a viable
solution, we can ignore
whole subtree!
……

……

22.09.24 Bioinformatics Algorithms 103

Some notes
After each recursion we undo the modifications to sets X and L for the next recursive call

This algorithm will list all sets X with Δ(X) = L

This algorithm is usually very fast because we do not go deep into the «recursive tree»

However, there exists pathological examples where we explore 2k possible paths (left and right are
always viable paths)

So strictly speaking, this algorithm has exponential time-complexity, but is much faster in most cases

A polynomial-time algorithm was found in 2002 by Nivat et al.

22.09.24 Bioinformatics Algorithms 104

Search trees
As we have seen, trees can be useful structures to “organize”
information.

We can exploit the tree structure to efficiently chose whether a subset

of all possible combinations is “uninteresting”

22.09.24 Bioinformatics Algorithms 105

Example: Binary search tree
• One node is designated the root of the tree.
• Each node contains a key and has (at most) two subtrees.
• Each subtree is itself a binary search tree.
• The left subtree of a node contains only nodes with keys strictly less
than the node's key.
• The right subtree of a node contains only nodes with keys strictly
greater than the node's key.

22.09.24 Bioinformatics Algorithms 106

Example: Binary tree

See R-script: “tree.r” on ilias

22.09.24 Bioinformatics Algorithms 107

Lecture 3
No ratings yet
Lecture 3
62 pages
2024 Bioinformatics Algorithms Day 3 - 4
No ratings yet
2024 Bioinformatics Algorithms Day 3 - 4
106 pages
Algorithms in Bioinformatics: Instructor: Dr. M. Asif Rasheed
No ratings yet
Algorithms in Bioinformatics: Instructor: Dr. M. Asif Rasheed
11 pages
Black and Purple Gradient Modern Artificial Intelligence Presentation
No ratings yet
Black and Purple Gradient Modern Artificial Intelligence Presentation
21 pages
Unit Iv
No ratings yet
Unit Iv
98 pages
Unit 2 L1
No ratings yet
Unit 2 L1
65 pages
FASTA Algorithm
No ratings yet
FASTA Algorithm
15 pages
AI Pathfinding in 8-Puzzle Game
No ratings yet
AI Pathfinding in 8-Puzzle Game
17 pages
hw1 Sol
No ratings yet
hw1 Sol
6 pages
Module Comparing and Visualizing Multiple Biological Sequences
No ratings yet
Module Comparing and Visualizing Multiple Biological Sequences
34 pages
Bioinformatics Basics PDF
No ratings yet
Bioinformatics Basics PDF
10 pages
HandoutUninformedSearch PDF
No ratings yet
HandoutUninformedSearch PDF
14 pages
Introduction to AI Course Overview
No ratings yet
Introduction to AI Course Overview
32 pages
Ch08 GraphsDNAseq
No ratings yet
Ch08 GraphsDNAseq
82 pages
Lec3-Problem Solving by Search
No ratings yet
Lec3-Problem Solving by Search
144 pages
Tutorial Note 7 Midterm Exam Review (Again!)
No ratings yet
Tutorial Note 7 Midterm Exam Review (Again!)
16 pages
CS206 - AI: Class Discussions
No ratings yet
CS206 - AI: Class Discussions
55 pages
Artificial Intelligence
No ratings yet
Artificial Intelligence
35 pages
AE556 2024 Topic1 SSSearch
No ratings yet
AE556 2024 Topic1 SSSearch
52 pages
Unit-3-Full Notes
No ratings yet
Unit-3-Full Notes
72 pages
Chapter 05
No ratings yet
Chapter 05
86 pages
1c, Lectures - 815 - F14 - A - Metaheuristics
No ratings yet
1c, Lectures - 815 - F14 - A - Metaheuristics
98 pages
2023 Bioinformatics Algorithms Day 1
No ratings yet
2023 Bioinformatics Algorithms Day 1
74 pages
Searching: Deterministic Single-Agent: Andrew W. Moore Professor School of Computer Science Carnegie Mellon University
No ratings yet
Searching: Deterministic Single-Agent: Andrew W. Moore Professor School of Computer Science Carnegie Mellon University
45 pages
Eloise's Claim: by Showing That Franklin Is One of Eloise's Ancestors or by Showing That Eloise Is One of Franklin's
No ratings yet
Eloise's Claim: by Showing That Franklin Is One of Eloise's Ancestors or by Showing That Eloise Is One of Franklin's
7 pages
Ai Unit Ii
No ratings yet
Ai Unit Ii
31 pages
AI Assinment Final
No ratings yet
AI Assinment Final
22 pages
Running BLAST Through Perl
No ratings yet
Running BLAST Through Perl
35 pages
Alignment Methods: Introduction To Global and Local Sequence Alignment Methods
No ratings yet
Alignment Methods: Introduction To Global and Local Sequence Alignment Methods
57 pages
Problem Solving by Search - Updated
No ratings yet
Problem Solving by Search - Updated
102 pages
Needleman-Wunsch and Smith-Waterman Algorithm
67% (9)
Needleman-Wunsch and Smith-Waterman Algorithm
19 pages
AI Algorithms for CSE Students
No ratings yet
AI Algorithms for CSE Students
31 pages
Modu 4 (Aad)
No ratings yet
Modu 4 (Aad)
37 pages
3 8 (Amar)
No ratings yet
3 8 (Amar)
23 pages
Tabby
No ratings yet
Tabby
11 pages
Unit I Algorithms
No ratings yet
Unit I Algorithms
42 pages
Pattern Matching Techniques and Their Applications To Computational Molecular Biology - A Review
No ratings yet
Pattern Matching Techniques and Their Applications To Computational Molecular Biology - A Review
8 pages
5 BackTrack
No ratings yet
5 BackTrack
20 pages
Genomic Sequence Alignment
No ratings yet
Genomic Sequence Alignment
25 pages
Need & Emergence of The Field: Speaker Shashi Shekhar Head of Computational Section Biowits Life Sciences
No ratings yet
Need & Emergence of The Field: Speaker Shashi Shekhar Head of Computational Section Biowits Life Sciences
59 pages
MIT6 047F15 Lecture03
No ratings yet
MIT6 047F15 Lecture03
56 pages
Ai 02
No ratings yet
Ai 02
22 pages
UC Riverside Electronic Theses and Dissertations
No ratings yet
UC Riverside Electronic Theses and Dissertations
59 pages
Ai Unit 2 Notes
No ratings yet
Ai Unit 2 Notes
52 pages
Ai
No ratings yet
Ai
10 pages
Student Algorithm Competition Guide
No ratings yet
Student Algorithm Competition Guide
26 pages
3.1 Sequence Alignment
No ratings yet
3.1 Sequence Alignment
8 pages
AI Search Algorithms Guide
No ratings yet
AI Search Algorithms Guide
58 pages
CSC215 Complete Note
No ratings yet
CSC215 Complete Note
61 pages
Cs221 Section3 Solutions
No ratings yet
Cs221 Section3 Solutions
8 pages
CS249-Uninformed-Search-2023
No ratings yet
CS249-Uninformed-Search-2023
58 pages
Présentation Ekin en
No ratings yet
Présentation Ekin en
40 pages
Ai Lab Manual (Parta & B)
No ratings yet
Ai Lab Manual (Parta & B)
28 pages
Cs221 LEC 3 Slides
No ratings yet
Cs221 LEC 3 Slides
43 pages
Unit 2. Problem Solving Through Search
No ratings yet
Unit 2. Problem Solving Through Search
7 pages
Ch10 Clustering
No ratings yet
Ch10 Clustering
45 pages
A2 Solutions
No ratings yet
A2 Solutions
4 pages
Unit 2(s)
No ratings yet
Unit 2(s)
31 pages
Biology and The Service of Mankind Final
No ratings yet
Biology and The Service of Mankind Final
15 pages
Statistics Examples For Ocr As and A Level Biology
No ratings yet
Statistics Examples For Ocr As and A Level Biology
21 pages
2 ZOOLOGY Defining Science
No ratings yet
2 ZOOLOGY Defining Science
3 pages
Screenshot 2024-01-09 at 12.39.51 PM
No ratings yet
Screenshot 2024-01-09 at 12.39.51 PM
22 pages
KANDEL - Psychiatry, Psychoanalysis, and The New Biology of Mind
100% (2)
KANDEL - Psychiatry, Psychoanalysis, and The New Biology of Mind
441 pages
Biology Presentation 4.1
No ratings yet
Biology Presentation 4.1
23 pages
Business Statistics Groebner 9th Edition
No ratings yet
Business Statistics Groebner 9th Edition
304 pages
Step Up SIP Editing
No ratings yet
Step Up SIP Editing
4 pages
Fish Morphology
No ratings yet
Fish Morphology
10 pages
Tambopata Candamo Heath
No ratings yet
Tambopata Candamo Heath
192 pages
Environmental Science Exam
No ratings yet
Environmental Science Exam
3 pages
Paneli Zid 80 MM (Sever-Istok) - Krojna Lista
No ratings yet
Paneli Zid 80 MM (Sever-Istok) - Krojna Lista
3 pages
Molecular Ecology - 2022 - S Nsteb - Population Genomics of A Forest Fungus Reveals High Gene Flow and Climate Adaptation
No ratings yet
Molecular Ecology - 2022 - S Nsteb - Population Genomics of A Forest Fungus Reveals High Gene Flow and Climate Adaptation
17 pages
Holyoak DTHolyoak GA2014 Candidula
No ratings yet
Holyoak DTHolyoak GA2014 Candidula
45 pages
ĐỀ SOẠN THEO FORM MINH HỌA TNTHPT2024 BỘ GIÁO DỤC VÀ ĐÀO TẠO
No ratings yet
ĐỀ SOẠN THEO FORM MINH HỌA TNTHPT2024 BỘ GIÁO DỤC VÀ ĐÀO TẠO
6 pages
Understanding the Physical and Material Self
No ratings yet
Understanding the Physical and Material Self
2 pages
Human Physiology 16th Edition Stuart Fox Krista Rompolski ISBN10 1260720462 ISBN13 9781260720464 Ebook and TestBank Bundle Verified PDF
No ratings yet
Human Physiology 16th Edition Stuart Fox Krista Rompolski ISBN10 1260720462 ISBN13 9781260720464 Ebook and TestBank Bundle Verified PDF
404 pages
Ra Ahhhhh
No ratings yet
Ra Ahhhhh
2 pages
(PDF Download) Molecular Biology Structure and Dynamics of Genomes and Proteomes Zlatanova Fulll Chapter
100% (9)
(PDF Download) Molecular Biology Structure and Dynamics of Genomes and Proteomes Zlatanova Fulll Chapter
53 pages
Class 8 English Worksheet 11
No ratings yet
Class 8 English Worksheet 11
45 pages
Cell Analogy As One Piece
No ratings yet
Cell Analogy As One Piece
34 pages
3.2.3 Transport Across Cell Membranes
No ratings yet
3.2.3 Transport Across Cell Membranes
12 pages
Matter, Energy and Life in Ecosystem and Biomes
No ratings yet
Matter, Energy and Life in Ecosystem and Biomes
28 pages
Toefl Structure and Written Expression 03
No ratings yet
Toefl Structure and Written Expression 03
4 pages
Botany GE Syllabus
No ratings yet
Botany GE Syllabus
6 pages
Final Proceedings 2024
No ratings yet
Final Proceedings 2024
201 pages
Biology Pre-Board Examination
No ratings yet
Biology Pre-Board Examination
4 pages
GAP FILLING EXERCISE Unit 1 VERB TENSES
No ratings yet
GAP FILLING EXERCISE Unit 1 VERB TENSES
1 page
Safety, Branches and Careers in Biology
No ratings yet
Safety, Branches and Careers in Biology
6 pages
Genetically Modified Organisms
No ratings yet
Genetically Modified Organisms
5 pages