Multiple Sequence Alignment by Shubham Kaushik
What is Multiple Sequence Alignment?
• Multiple Sequence Alignment (MSA) is generally the alignment of three or more biological sequences (protein or nucleic acid) of similar length. From the output, homology can be inferred and the evolutionary relationships between the sequences can be studied.
Why do we need MSA?
• MSA is central to many bioinformatics applications:
• Phylogenetic trees
• Motifs
• Patterns
• Structure prediction (RNA, protein)
Local & Global alignment
Multiple Sequence Alignment: Methods and tools
• Sum-of-pairs or dynamic programming method
• Progressive alignment method
• Iterative refinement method
Dynamic programming method
• Direct method
• N-dimensional matrix for N sequences.
• For N sequences of L residues each:
• Time: O(L^N)
• Memory: O(L^N)
Reduction of space and time:
• 1988: Carrillo-Lipman algorithm
• Uses pairwise dynamic programming alignments to bound the region of the N-dimensional matrix that has to be searched
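To make the exponential growth concrete, here is a small sketch that counts the cells of the full N-dimensional matrix and the predecessors each cell depends on. The function names are illustrative, and the "+ 1" per axis assumes the usual extra row for the empty prefix.

```python
def dp_cells(length: int, n_sequences: int) -> int:
    """Cells in a full N-dimensional DP matrix for sequences of equal length.

    Each axis has length + 1 positions (the extra one is the empty prefix).
    """
    return (length + 1) ** n_sequences


def predecessor_count(n_sequences: int) -> int:
    """Predecessor cells per DP cell: every non-empty subset of axes can advance."""
    return 2 ** n_sequences - 1


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(n, dp_cells(100, n), predecessor_count(n))
```

For three sequences this gives seven predecessors per cell, which is why the 3D recurrence has seven cases.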
Architecture of 3D alignment cell
[Figure: cell (i,j,k) and its seven predecessor cells (i-1,j-1,k-1), (i-1,j-1,k), (i-1,j,k-1), (i,j-1,k-1), (i-1,j,k), (i,j-1,k), (i,j,k-1)]
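• The score of cell (i,j,k) is the maximum over seven cases:
$$
s_{i,j,k} = \max
\begin{cases}
s_{i-1,j-1,k-1} + \delta(v_i, w_j, u_k) & \text{cube diagonal: no indels} \\
s_{i-1,j-1,k} + \delta(v_i, w_j, \_) & \text{face diagonal: one indel} \\
s_{i-1,j,k-1} + \delta(v_i, \_, u_k) & \text{face diagonal: one indel} \\
s_{i,j-1,k-1} + \delta(\_, w_j, u_k) & \text{face diagonal: one indel} \\
s_{i-1,j,k} + \delta(v_i, \_, \_) & \text{edge diagonal: two indels} \\
s_{i,j-1,k} + \delta(\_, w_j, \_) & \text{edge diagonal: two indels} \\
s_{i,j,k-1} + \delta(\_, \_, u_k) & \text{edge diagonal: two indels}
\end{cases}
$$
• $\delta(x, y, z)$ is an entry in the 3D scoring matrix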
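The seven-case recurrence can be sketched directly in Python. This is a minimal illustration, not a production aligner: the scoring function `delta` is assumed here to be a sum-of-pairs score with match +1, mismatch -1 and residue-versus-gap -1, and the names `pair_score`, `delta` and `align3_score` are hypothetical.

```python
from itertools import product

GAP = "_"


def pair_score(a, b, match=1, mismatch=-1, gap=-1):
    """Score one pair of symbols (assumed values, not from the slides)."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch


def delta(x, y, z):
    """Sum-of-pairs score of one alignment column (x, y, z)."""
    return pair_score(x, y) + pair_score(x, z) + pair_score(y, z)


def align3_score(v, w, u):
    """Optimal global alignment score of three sequences by 3D dynamic programming."""
    n, m, p = len(v), len(w), len(u)
    neg_inf = float("-inf")
    s = [[[neg_inf] * (p + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    s[0][0][0] = 0
    # The seven moves: every non-zero combination of advancing in v, w, u.
    moves = [mv for mv in product((0, 1), repeat=3) if any(mv)]
    for i, j, k in product(range(n + 1), range(m + 1), range(p + 1)):
        for di, dj, dk in moves:
            pi, pj, pk = i - di, j - dj, k - dk
            if pi < 0 or pj < 0 or pk < 0:
                continue
            x = v[i - 1] if di else GAP
            y = w[j - 1] if dj else GAP
            z = u[k - 1] if dk else GAP
            s[i][j][k] = max(s[i][j][k], s[pi][pj][pk] + delta(x, y, z))
    return s[n][m][p]


if __name__ == "__main__":
    print(align3_score("AC", "AC", "AC"))
```

Only the score is returned; recovering the alignment itself would need a traceback through the same matrix.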
Progressive alignment method
• Also known as the hierarchical or tree method.
• The most widely used approach to MSA.
• Developed by Paulien Hogeweg and Ben Hesper in 1984.
• All progressive alignment methods require two stages:
• Guide tree construction
• Building the MSA according to the guide tree
Algorithm
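The first stage, building a guide tree, could look roughly like the sketch below. It assumes average-linkage (UPGMA-style) clustering over a precomputed distance table; real tools may use other methods such as neighbor joining, and the function name `guide_tree` and the input layout are illustrative only.

```python
from itertools import combinations


def guide_tree(names, dist):
    """Build a guide tree as nested tuples by average-linkage clustering.

    dist maps frozenset({a, b}) to the distance between sequences a and b.
    """
    # Each cluster is (subtree, list of leaf names in that subtree).
    clusters = [(name, [name]) for name in names]

    def average_distance(c1, c2):
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(dist[frozenset(pair)] for pair in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two closest clusters; this is also the order in which
        # a progressive method would align them.
        a, b = min(combinations(clusters, 2), key=lambda ab: average_distance(*ab))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(((a[0], b[0]), a[1] + b[1]))
    return clusters[0][0]


if __name__ == "__main__":
    d = {
        frozenset("AB"): 0.1, frozenset("CD"): 0.2,
        frozenset("AC"): 0.6, frozenset("AD"): 0.7,
        frozenset("BC"): 0.6, frozenset("BD"): 0.7,
    }
    print(guide_tree(["A", "B", "C", "D"], d))
```

The second stage would then walk this tree from the leaves upward, aligning sequences or partial alignments at each internal node.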
Progressive Alignment tools
• CLUSTAL family (ClustalW2, ClustalX, Clustal Omega)
• Pileup
• MUSCLE
• Kalign
• T-Coffee
• Dialign
• PIMA
• Multialign
Iterative refinement method
• Methods that produce an MSA while reducing the errors inherent in progressive methods are classified as “iterative”.
• They work similarly to progressive methods but repeatedly realign the initial sequences as well as adding new sequences.
• Barton and Sternberg formulated this method of MSA.
• MUSCLE
• Dialign
• SAGA
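One way to picture the refinement loop is the sketch below: score the current alignment, produce a realigned candidate, and keep it only if the score improves. The sum-of-pairs objective and its score values are assumptions, and `realign` is a hypothetical caller-supplied function standing in for whatever realignment step a given tool performs.

```python
from itertools import combinations

GAP = "-"


def sp_score(alignment, match=1, mismatch=-1, gap=-1):
    """Sum-of-pairs score of an alignment given as equal-length rows."""
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == GAP and b == GAP:
                continue  # gap against gap scores 0
            if a == GAP or b == GAP:
                total += gap
            else:
                total += match if a == b else mismatch
    return total


def refine(alignment, realign, max_rounds=10):
    """Repeat realignment while the sum-of-pairs score keeps improving."""
    best, best_score = alignment, sp_score(alignment)
    for _ in range(max_rounds):
        candidate = realign(best)
        score = sp_score(candidate)
        if score <= best_score:
            break  # no improvement, stop iterating
        best, best_score = candidate, score
    return best


if __name__ == "__main__":
    start = ["AC-", "A-C"]
    # Toy realignment step used only for demonstration.
    print(refine(start, lambda rows: ["AC", "AC"]))
```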
• T-Coffee: Tree-based Consistency Objective Function For alignment Evaluation.
• Advanced feature: evaluates the quality of the alignment.
• Commonly produces alignments in .aln format.
• PIR, MSF and FASTA output can also be produced.
• The most common input formats are supported (FASTA, PIR).
http://www.tcoffee.crg.cat/
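Since FASTA is the most common input format, a minimal reader is sketched below. It is an illustration only: the function name `parse_fasta` is hypothetical, only the first word of each header is used as the sequence name, and no validation of residue symbols is attempted.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of name -> sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            # Fall back to a positional name if the header is empty.
            name = header[0] if header else f"seq{len(records) + 1}"
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


if __name__ == "__main__":
    example = ">s1 first sequence\nACGT\nAC\n>s2\nGG\n"
    print(parse_fasta(example))
```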
T-Coffee flavors
Input your sequences
MSAs Output
Clustal Family
• Popular family of multiple alignment tools comprising ClustalW2, Clustal Omega and ClustalX.
• The steps are:
• Build the distance matrix
• Construct the guide tree
• Build the MSA
http://clustal.org/
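The first step, building a distance matrix, can be illustrated with an alignment-free measure. The sketch below uses a Jaccard distance over shared k-mers; this is an assumption chosen for brevity, not the exact measure any Clustal program uses, and the function names are hypothetical.

```python
from itertools import combinations


def kmers(seq, k=2):
    """Set of all overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a, b, k=2):
    """Jaccard distance between the k-mer sets of two sequences (0 = identical sets)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 1.0  # sequence shorter than k: treat as maximally distant
    return 1.0 - len(ka & kb) / len(ka | kb)


def distance_matrix(seqs, k=2):
    """Pairwise distances for a dict of name -> sequence, keyed by frozenset of names."""
    return {
        frozenset((x, y)): kmer_distance(seqs[x], seqs[y], k)
        for x, y in combinations(seqs, 2)
    }


if __name__ == "__main__":
    print(distance_matrix({"a": "ACGA", "b": "ACGT", "c": "TTTT"}))
```

A table in this shape can be fed to a clustering step to obtain the guide tree.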
MUSCLE: Multiple Sequence Comparison by Log-Expectation
• MUSCLE is claimed to achieve both better average accuracy and better speed than ClustalW2 or T-Coffee, depending on the chosen options.
• Especially good with proteins.
• Suitable for medium-sized alignments.
• This tool can align up to 500 sequences or a maximum file size of 1 MB.
Thank you
