Multiple Sequence Alignment by Shubham Kaushik

What is Multiple Sequence Alignment?
 Multiple Sequence Alignment (MSA) is
generally the alignment of three or more
biological sequences (protein or nucleic acid) of
similar length. From the output, homology can be
inferred and the evolutionary relationships
between the sequences can be studied.

Why do we need MSA?
 MSA is central to many bioinformatics applications:
 Phylogenetic tree construction
 Motif discovery
 Identification of conserved patterns
 Structure prediction (RNA, protein)

Multiple Sequence Alignment: Methods and tools
 Sum-of-pairs (dynamic programming) method
 Progressive alignment method
 Iterative refinement method
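The sum-of-pairs (SP) score used by the first method adds up a pairwise score for every pair of rows in every column of the alignment. A minimal Python sketch, assuming a toy scoring scheme that is not taken from the slides (match +1, mismatch -1, residue against gap -2, gap against gap 0):

```python
from itertools import combinations

# Assumed toy scoring scheme (illustrative only).
MATCH, MISMATCH, GAP_PENALTY = 1, -1, -2
GAP = "-"

def pair_score(a: str, b: str) -> int:
    """Score one pair of symbols from the same column."""
    if a == GAP and b == GAP:
        return 0  # gap-gap pairs are conventionally ignored
    if a == GAP or b == GAP:
        return GAP_PENALTY
    return MATCH if a == b else MISMATCH

def sp_score(alignment: list[str]) -> int:
    """Sum pair scores over every column and every pair of rows."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows must have the same length")
    total = 0
    for col in range(length):
        column = [row[col] for row in alignment]
        for a, b in combinations(column, 2):
            total += pair_score(a, b)
    return total

print(sp_score(["ACG-T", "AC-GT", "ACGGT"]))  # 3
```

The three fully conserved columns contribute +3 each and the two gapped columns -3 each, giving 3.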

Dynamic programming method
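Direct method
N-dimensional matrix for N sequences.
For N sequences of length L:
Time: $O(2^N L^N)$, since each of the $L^N$ cells examines $2^N - 1$ predecessor cells
Memory: $O(L^N)$
Reduction of space and time:
Carrillo-Lipman algorithm (1988)
Uses pairwise dynamic programming alignments to bound the region of the N-dimensional matrix that must be explored.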
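To make the growth concrete, the short calculation below counts matrix cells and predecessor checks for the full N-dimensional dynamic programming table. It assumes each sequence contributes L + 1 positions (including the empty prefix) and that every cell looks back along all 2^N - 1 non-zero moves; the helper name `dp_cost` is hypothetical.

```python
def dp_cost(num_seqs: int, length: int) -> tuple[int, int]:
    """Cells and predecessor checks for a full N-dimensional DP table.

    Each sequence contributes length + 1 positions (empty prefix included);
    every cell considers 2**num_seqs - 1 non-zero predecessor moves.
    """
    cells = (length + 1) ** num_seqs
    checks = cells * (2 ** num_seqs - 1)
    return cells, checks

for n in (2, 3, 5, 10):
    cells, checks = dp_cost(n, 100)
    print(f"N={n:2d}, L=100: {cells:.3e} cells, {checks:.3e} predecessor checks")
```

For three sequences of length 100 this is already about one million cells and seven million checks; for ten sequences it exceeds 10^20 cells, which is why the exact method is impractical beyond a handful of sequences.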

Architecture of a 3D alignment cell
(Figure: cell (i,j,k) and its seven predecessor cells (i-1,j-1,k-1), (i-1,j-1,k), (i-1,j,k-1), (i,j-1,k-1), (i-1,j,k), (i,j-1,k), (i,j,k-1).)
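The seven predecessors in the figure are exactly the non-zero 0/1 step vectors in three dimensions: a 1 means that sequence contributes a residue to the new column, a 0 means it contributes a gap. A small sketch that enumerates them and classifies each move by the number of residues it consumes (the helper names are hypothetical):

```python
from itertools import product

# Number of residues consumed by a move -> type of move in the 3D cube.
KIND = {3: "cube diagonal (no indels)",
        2: "face diagonal (one indel)",
        1: "edge (two indels)"}

def predecessor_steps(n: int = 3) -> list[tuple[int, ...]]:
    """Non-zero 0/1 step vectors: 1 = residue from that sequence, 0 = gap."""
    return [step for step in product((0, 1), repeat=n) if any(step)]

def predecessors(cell: tuple[int, ...]) -> list[tuple[int, ...]]:
    """Cells from which `cell` can be reached in a single move."""
    return [tuple(c - s for c, s in zip(cell, step))
            for step in predecessor_steps(len(cell))]

for step in predecessor_steps():
    print(step, KIND[sum(step)])
```

The same construction generalises to N sequences, giving the 2^N - 1 predecessors per cell that drive the exponential cost.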

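• The score of cell (i, j, k) is the best of its seven predecessor moves:
$$
s_{i,j,k} = \max \begin{cases}
s_{i-1,j-1,k-1} + \delta(v_i, w_j, u_k) \\
s_{i-1,j-1,k} + \delta(v_i, w_j, \text{-}) \\
s_{i-1,j,k-1} + \delta(v_i, \text{-}, u_k) \\
s_{i,j-1,k-1} + \delta(\text{-}, w_j, u_k) \\
s_{i-1,j,k} + \delta(v_i, \text{-}, \text{-}) \\
s_{i,j-1,k} + \delta(\text{-}, w_j, \text{-}) \\
s_{i,j,k-1} + \delta(\text{-}, \text{-}, u_k)
\end{cases}
$$
• δ(x, y, z) is an entry in the 3D scoring matrix, and "-" denotes a gap.
• Cube diagonal (first case): no indels.
• Face diagonals (cases 2 to 4): one indel.
• Edge diagonals (cases 5 to 7): two indels.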
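The recurrence can be implemented directly for three sequences v, w and u. In this sketch δ is assumed to be the sum-of-pairs score of the column under a toy scheme (match +1, mismatch -1, residue/gap -2, gap/gap 0); real tools use substitution matrices and affine gap costs, so treat the scores as illustrative only.

```python
from itertools import product

GAP = "-"

def pair(a: str, b: str) -> int:
    """Assumed toy pair score: match +1, mismatch -1, residue/gap -2, gap/gap 0."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return -2
    return 1 if a == b else -1

def delta(x: str, y: str, z: str) -> int:
    """Sum-of-pairs score of one column (x, y, z), used as delta."""
    return pair(x, y) + pair(x, z) + pair(y, z)

def align3(v: str, w: str, u: str) -> tuple[int, list[str]]:
    """Fill s[i,j,k] with the seven-way recurrence, then trace back one optimum."""
    steps = [st for st in product((0, 1), repeat=3) if any(st)]
    s = {(0, 0, 0): 0}
    back = {}
    for i in range(len(v) + 1):
        for j in range(len(w) + 1):
            for k in range(len(u) + 1):
                if (i, j, k) == (0, 0, 0):
                    continue
                best = None
                for di, dj, dk in steps:
                    prev = (i - di, j - dj, k - dk)
                    if min(prev) < 0:
                        continue  # this move would leave the matrix
                    column = (v[i - 1] if di else GAP,
                              w[j - 1] if dj else GAP,
                              u[k - 1] if dk else GAP)
                    cand = s[prev] + delta(*column)
                    if best is None or cand > best:
                        best, back[(i, j, k)] = cand, (di, dj, dk)
                s[(i, j, k)] = best
    # Trace back from the far corner, emitting one column per move.
    rows = [[], [], []]
    i, j, k = len(v), len(w), len(u)
    while (i, j, k) != (0, 0, 0):
        di, dj, dk = back[(i, j, k)]
        rows[0].append(v[i - 1] if di else GAP)
        rows[1].append(w[j - 1] if dj else GAP)
        rows[2].append(u[k - 1] if dk else GAP)
        i, j, k = i - di, j - dj, k - dk
    return s[(len(v), len(w), len(u))], ["".join(reversed(r)) for r in rows]

score, rows = align3("ACGT", "AGT", "ACT")
print(score)
print("\n".join(rows))
```

Because cells are visited in lexicographic order, every predecessor of (i, j, k) is already filled when it is needed; the dictionary `s` holds all (|v|+1)(|w|+1)(|u|+1) cells, matching the cubic memory cost.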

Progressive alignment method
 Also known as the hierarchical or tree method.
 The most widely used approach to MSA.
 Developed by Paulien Hogeweg and Ben Hesper in 1984.
 All progressive alignment methods require two stages:
 Construction of a guide tree
 Building the MSA in the order given by the guide tree
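As an illustration of the first stage, the sketch below builds a guide tree with UPGMA (average linkage) from a precomputed distance matrix. UPGMA is assumed here as one common choice; other tools use neighbour-joining instead. The tree is returned as nested tuples, and the second stage would align sequences (or profiles) in the order in which the tuples are merged.

```python
def upgma(names: list[str], d: list[list[float]]):
    """Build a guide tree (nested tuples) by repeatedly merging the closest clusters."""
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (tree, size)
    dist = {(i, j): d[i][j]
            for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        (a, b), _ = min(dist.items(), key=lambda kv: kv[1])  # closest pair
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # Average-linkage distance from the merged cluster to every other cluster.
        for c in clusters:
            d_ac = dist[tuple(sorted((a, c)))]
            d_bc = dist[tuple(sorted((b, c)))]
            dist[(c, next_id)] = (d_ac * size_a + d_bc * size_b) / (size_a + size_b)
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]

matrix = [[0, 2, 6, 10],
          [2, 0, 6, 10],
          [6, 6, 0, 8],
          [10, 10, 8, 0]]
print(upgma(["A", "B", "C", "D"], matrix))  # ('D', ('C', ('A', 'B')))
```

Here A and B are aligned first, C is then added to their profile, and D is added last.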

Progressive Alignment tools
 CLUSTAL family (ClustalW2, ClustalX, Clustal Omega)
 Pileup
 MUSCLE
 Kalign
 T-Coffee
 Dialign
 PIMA
 Multialign

ITERATIVE REFINEMENT METHOD
 Methods that produce an MSA while reducing the errors inherent in progressive methods are classified as “iterative”.
 They work similarly to progressive methods but repeatedly realign the initial sequences as well as adding new sequences to the growing MSA.
 Barton and Sternberg formulated this approach to MSA.
 Tools:
 MUSCLE
 Dialign
 SAGA
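One simple iterative-refinement strategy can be sketched as leave-one-out realignment: remove each sequence in turn, realign it optimally against the fixed profile of the others, and keep the result only if the sum-of-pairs score improves. This is a generic illustration under an assumed toy scoring scheme, not the specific scheme used by MUSCLE, Dialign or SAGA; all function names are hypothetical.

```python
GAP = "-"

def pair(a: str, b: str) -> int:
    """Assumed toy scores: match +1, mismatch -1, residue/gap -2, gap/gap 0."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return -2
    return 1 if a == b else -1

def sp_score(aln: list[str]) -> int:
    """Sum-of-pairs score over all columns and all row pairs."""
    return sum(pair(col[x], col[y])
               for col in zip(*aln)
               for x in range(len(col)) for y in range(x + 1, len(col)))

def drop_gap_columns(rows: list[str]) -> list[str]:
    """Remove columns that contain only gaps."""
    keep = [c for c in range(len(rows[0])) if any(r[c] != GAP for r in rows)]
    return ["".join(r[c] for c in keep) for r in rows]

def realign(seq: str, profile: list[str]) -> tuple[str, list[str]]:
    """Optimally align an ungapped sequence to a fixed profile (rows move together)."""
    cols = list(zip(*profile))
    n, m = len(seq), len(cols)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            options = []
            if i and j:  # residue against an existing profile column
                options.append((score[i-1][j-1] + sum(pair(seq[i-1], x) for x in cols[j-1]), "diag"))
            if j:        # gap in seq against an existing profile column
                options.append((score[i][j-1] + sum(pair(GAP, x) for x in cols[j-1]), "col"))
            if i:        # residue against a new all-gap profile column
                options.append((score[i-1][j] + pair(seq[i-1], GAP) * len(profile), "res"))
            score[i][j], move[i][j] = max(options)
    out_seq, out_prof = [], [[] for _ in profile]
    i, j = n, m
    while i or j:
        if move[i][j] == "diag":
            out_seq.append(seq[i - 1])
            column = cols[j - 1]
            i, j = i - 1, j - 1
        elif move[i][j] == "col":
            out_seq.append(GAP)
            column = cols[j - 1]
            j -= 1
        else:  # "res"
            out_seq.append(seq[i - 1])
            column = (GAP,) * len(profile)
            i -= 1
        for row, ch in zip(out_prof, column):
            row.append(ch)
    return "".join(reversed(out_seq)), ["".join(reversed(r)) for r in out_prof]

def refine(aln: list[str], max_rounds: int = 5) -> tuple[list[str], int]:
    """Leave-one-out refinement: accept a realignment only if SP improves."""
    best, best_score = list(aln), sp_score(aln)
    for _ in range(max_rounds):
        improved = False
        for idx in range(len(best)):
            seq = best[idx].replace(GAP, "")
            others = drop_gap_columns(best[:idx] + best[idx + 1:])
            new_seq, new_others = realign(seq, others)
            candidate = new_others[:idx] + [new_seq] + new_others[idx:]
            cand_score = sp_score(candidate)
            if cand_score > best_score:
                best, best_score, improved = candidate, cand_score, True
        if not improved:
            break
    return best, best_score

print(refine(["ACGT-", "-ACGT", "ACGT-"]))  # (['ACGT', 'ACGT', 'ACGT'], 12)
```

Because the realignment is optimal for a fixed profile and only strict improvements are accepted, the score never decreases and the loop terminates; the misplaced second sequence is corrected in the first round.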

T-Coffee:
 Tree-based Consistency Objective Function For alignment Evaluation.
 Advanced feature: evaluates the quality of the alignment.
 Commonly produces alignments in .aln format; PIR, MSF and FASTA output can also be produced.
 Supports the most common input formats (FASTA, PIR).
http://www.tcoffee.crg.cat/
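The "consistency" in T-Coffee's name refers to rewarding residue pairs that other sequences agree on. The sketch below is a simplified library-extension step in that spirit: for residues x and y from two sequences, every residue z from a third sequence that is linked to both adds min(w(x, z), w(z, y)) to the weight of the pair (x, y). It assumes a primary library of residue-pair weights (for example derived from pairwise alignments) is already available; the data layout, helper names and weights are hypothetical.

```python
from collections import defaultdict

Residue = tuple[str, int]  # (sequence name, residue index)
Library = dict[tuple[Residue, Residue], int]

def extend_library(primary: Library) -> Library:
    """Add min(w(x, z), w(z, y)) for every third-sequence residue z linking x and y.

    `primary` must contain each residue pair in both orientations.
    """
    neighbours: dict[Residue, dict[Residue, int]] = defaultdict(dict)
    for (x, y), w in primary.items():
        neighbours[x][y] = w
    extended: Library = defaultdict(int, primary)
    for x, links in list(neighbours.items()):
        for z, w_xz in links.items():
            for y, w_zy in neighbours[z].items():
                if len({x[0], z[0], y[0]}) == 3:  # x, z, y from three sequences
                    extended[(x, y)] += min(w_xz, w_zy)
    return dict(extended)

def symmetric(pairs: Library) -> Library:
    """Store each pair in both orientations."""
    lib: Library = {}
    for (x, y), w in pairs.items():
        lib[(x, y)] = lib[(y, x)] = w
    return lib

primary = symmetric({(("A", 0), ("B", 0)): 80,
                     (("A", 0), ("C", 0)): 60,
                     (("B", 0), ("C", 0)): 70})
extended = extend_library(primary)
print(extended[(("A", 0), ("B", 0))])  # 80 + min(60, 70) = 140
```

Pairs supported by many intermediate sequences accumulate more weight, and pairs absent from the primary library can still gain weight through a shared intermediate residue.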

Clustal Family:
 A popular family of multiple alignment tools comprising ClustalW2, Clustal Omega and ClustalX.
 The steps are:
 Build a distance matrix
 Construct a guide tree
 Build the MSA
http://clustal.org/
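The first Clustal step can be sketched as follows: align each pair of sequences globally (Needleman-Wunsch with an assumed toy scoring scheme) and take the distance as 1 minus the fraction of identical residues among aligned, non-gap positions. The scoring values and this exact distance definition are illustrative assumptions, not Clustal's actual parameters.

```python
def nw_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> tuple[str, str]:
    """Global pairwise alignment (Needleman-Wunsch) with linear gap costs."""
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = S[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            S[i][j] = max(diag, S[i-1][j] + gap, S[i][j-1] + gap)
    # Trace back one optimal path.
    i, j, ra, rb = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch):
            ra.append(a[i-1]); rb.append(b[j-1]); i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i-1][j] + gap:
            ra.append(a[i-1]); rb.append("-"); i -= 1
        else:
            ra.append("-"); rb.append(b[j-1]); j -= 1
    return "".join(reversed(ra)), "".join(reversed(rb))

def p_distance(a: str, b: str) -> float:
    """1 - fraction of identical residues over aligned non-gap positions."""
    x, y = nw_align(a, b)
    aligned = [(p, q) for p, q in zip(x, y) if p != "-" and q != "-"]
    if not aligned:
        return 1.0
    identical = sum(p == q for p, q in aligned)
    return 1 - identical / len(aligned)

def distance_matrix(seqs: list[str]) -> list[list[float]]:
    """Symmetric matrix of pairwise distances (zero on the diagonal)."""
    n = len(seqs)
    return [[0.0 if i == j else p_distance(seqs[i], seqs[j]) for j in range(n)]
            for i in range(n)]

for row in distance_matrix(["ACGT", "ACGA", "TCGA"]):
    print(row)
```

The resulting matrix is the input to the second step, guide-tree construction.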

MUSCLE:
Multiple Sequence Comparison by Log-Expectation.
 MUSCLE is claimed to achieve both better average
accuracy and better speed than ClustalW2 or T-Coffee,
depending on the chosen options.
 Especially good with proteins.
 Suitable for medium-sized alignments.
 This tool can align up to 500 sequences or a maximum
file size of 1 MB.
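Part of MUSCLE's speed comes from estimating initial distances from shared k-mers instead of full pairwise alignments. A simplified sketch of a fractional common k-mer distance follows; k = 3 and the plain (uncompressed) alphabet are assumptions for illustration, and MUSCLE's own settings may differ by sequence type.

```python
from collections import Counter

def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def kmer_distance(x: str, y: str, k: int = 3) -> float:
    """1 - (shared k-mer count / number of k-mers in the shorter sequence)."""
    denom = min(len(x), len(y)) - k + 1
    if denom <= 0:
        return 1.0  # a sequence is shorter than k: no k-mers to compare
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    shared = sum(min(count, cy[kmer]) for kmer, count in cx.items())
    return 1 - shared / denom

print(kmer_distance("ACGTACGT", "ACGTACGT"))  # 0.0
print(kmer_distance("AAAA", "CCCC"))          # 1.0
```

No alignment is computed, so the cost of each distance is linear in sequence length, which makes an all-against-all distance matrix cheap even for many sequences.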

