This document discusses dot plot analysis, which allows comparison of two biological sequences to identify similar regions. It describes how dot plots are generated using a similarity matrix and defines different features that can be observed, such as identical sequences appearing on the principal diagonal, direct and inverted repeats appearing as multiple diagonals, and low complexity regions forming boxes. Applications of dot plot analysis include identifying alignments, self-base pairing, sequence transposition, and gene locations between genomes. Limitations include high memory needs for long sequences and low efficiency for global alignments.
Introduction
In bioinformatics, a dot plot is a graphical method for comparing
two biological sequences and identifying
regions of close similarity between them.
It was introduced by Gibbs and McIntyre in 1970.
It is one way to visualize the similarity between two
protein or nucleotide sequences by means of a similarity matrix.
Principle
Dot plots are two-dimensional graphs showing a comparison of two sequences.
The principle used to generate the dot plot is:
The top (x) and left (y) axes of a rectangular array are used to represent the
two sequences to be compared.
Calculation:
Matrix
• Columns = residues of sequence 1
• Rows = residues of sequence 2
A dot is plotted at every coordinate where the two residues are similar.
Example
Seq 1: TWILIGHTZONE
Seq 2: MIDNIGHTZONE
Matrix = 12 × 12
A dot is plotted at every coordinate where the two residues match.
The shared ending IGHTZONE appears as a run of eight dots on the principal diagonal.
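The example above can be reproduced with a few lines of Python. This is a minimal sketch that assumes exact identity as the similarity rule; the function names dot_matrix and render are illustrative and do not come from any particular package.

```python
def dot_matrix(seq1, seq2):
    """Return a list of rows: rows follow seq2, columns follow seq1.

    A cell is True where the two residues are identical (assumed rule).
    """
    return [[a == b for a in seq1] for b in seq2]


def render(seq1, seq2):
    """Draw the matrix as text, '*' for a dot and '.' for an empty cell."""
    lines = ["  " + " ".join(seq1)]
    for residue, row in zip(seq2, dot_matrix(seq1, seq2)):
        cells = " ".join("*" if hit else "." for hit in row)
        lines.append(residue + " " + cells)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render("TWILIGHTZONE", "MIDNIGHTZONE"))
```

In the printed grid the eight stars for IGHTZONE line up on the principal diagonal, and a few isolated stars appear off the diagonal where single letters such as I, T and N recur.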
Analysis of dot plot matrix
Regions of similarity appear as diagonal runs of dots.
The principal diagonal shows identical sequences.
Both global and local alignments are visible.
Multiple parallel diagonals indicate repeats.
A reverse diagonal (perpendicular to the principal diagonal) indicates an
INVERSION.
A reverse diagonal crossing the diagonal (forming an X) indicates a
PALINDROME.
A box of dots indicates a low-complexity region.
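Reading diagonals by eye can be mirrored in code. The sketch below scans every forward diagonal and reports maximal runs of matches; the name diagonal_runs and the min_len cutoff are assumptions made for illustration, and only exact identity is scored.

```python
def diagonal_runs(seq1, seq2, min_len=3):
    """Return (col, row, length) for each maximal diagonal run of matches.

    col indexes seq1 (columns) and row indexes seq2 (rows).
    """
    runs = []
    n, m = len(seq1), len(seq2)
    # Each forward diagonal is identified by offset = col - row.
    for offset in range(-(m - 1), n):
        col = max(offset, 0)
        row = max(-offset, 0)
        length = 0
        while col < n and row < m:
            if seq1[col] == seq2[row]:
                length += 1
            else:
                if length >= min_len:
                    runs.append((col - length, row - length, length))
                length = 0
            col += 1
            row += 1
        # A run may reach the edge of the matrix.
        if length >= min_len:
            runs.append((col - length, row - length, length))
    return runs


print(diagonal_runs("TWILIGHTZONE", "MIDNIGHTZONE"))  # [(4, 4, 8)]
print(diagonal_runs("ABCABC", "ABC"))                 # [(0, 0, 3), (3, 0, 3)]
```

The second call shows the repeat case: the same segment produces two parallel diagonals, one per copy.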
Inverted repeat
An inverted repeat is a sequence of nucleotides followed downstream by its
reverse complement.
Inverted repeat: abcdeedcbafghijklmno
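The definition can be checked directly on a DNA string. The following is a rough sketch, assuming an alphabet of A, C, G and T only and a fixed arm length k; the helper names and the test sequence are made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Complement each base, then reverse the string."""
    return dna.translate(COMPLEMENT)[::-1]


def inverted_repeats(dna, k):
    """Return (i, j, arm): arm starts at i, its reverse complement at j >= i + k."""
    hits = []
    for i in range(len(dna) - k + 1):
        arm = dna[i:i + k]
        rc = reverse_complement(arm)
        j = dna.find(rc, i + k)  # search only downstream of the arm
        while j != -1:
            hits.append((i, j, arm))
            j = dna.find(rc, j + 1)
    return hits


# TTACG, a spacer of four A's, then CGTAA (the reverse complement of TTACG).
print(inverted_repeats("TTACGAAAACGTAA", 5))  # [(0, 9, 'TTACG')]
```

In a dot plot of a sequence against itself, each such hit would show up as a short reverse diagonal.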
Palindromic sequences
A palindromic sequence is a nucleic acid sequence (DNA or
RNA) that reads the same 5' to 3' on one strand as it does 5'
to 3' on the complementary strand with which it forms a
double helix.
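Put another way, a palindromic sequence equals its own reverse complement. A minimal check, assuming DNA written with A, C, G and T only (is_palindromic is a hypothetical helper name):

```python
# Assumed alphabet: A, C, G, T (upper or lower case).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_palindromic(dna):
    """True if the sequence equals its own reverse complement."""
    dna = dna.upper()
    return dna == dna.translate(_COMPLEMENT)[::-1]


print(is_palindromic("GAATTC"))   # True  (the EcoRI recognition site)
print(is_palindromic("GATTACA"))  # False
```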
Frame shifts
Frameshifts in a nucleotide
sequence can occur due to
insertions, deletions or
mutations that shift the reading frame:
1. Deletion of nucleotides
2. Insertion of nucleotides
3. Mutation (out of frame)
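A small illustration of why a single inserted or deleted nucleotide matters: every codon downstream of the change is read differently. The sequence and the codons helper below are made up for this sketch.

```python
def codons(dna):
    """Split a sequence into complete triplets, dropping any trailing bases."""
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]


original = "ATGGCCAAATGA"
deleted = original[:3] + original[4:]         # one base removed after ATG
inserted = original[:3] + "A" + original[3:]  # one base added after ATG

print(codons(original))  # ['ATG', 'GCC', 'AAA', 'TGA']
print(codons(deleted))   # ['ATG', 'CCA', 'AAT']
print(codons(inserted))  # ['ATG', 'AGC', 'CAA', 'ATG']
```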
Low complexity region
Low-complexity regions in sequences can be found as regions around the diagonal that all
obtain a high score. Low-complexity regions are calculated from the redundancy of
amino acids within a limited region [Wootton and Federhen, 1993].
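One simple way to quantify redundancy within a limited region is the Shannon entropy of a sliding window. The sketch below is only an approximation of the idea and is not the method of Wootton and Federhen; the window size and entropy threshold are arbitrary assumptions.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Entropy in bits of the residue composition of a segment."""
    total = len(segment)
    counts = Counter(segment)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def low_complexity_windows(seq, window=12, max_entropy=2.2):
    """Return start positions of windows whose entropy is at or below the cutoff."""
    return [
        i
        for i in range(len(seq) - window + 1)
        if shannon_entropy(seq[i:i + window]) <= max_entropy
    ]


# A run of Q followed by a varied stretch: only the first windows are flagged.
print(low_complexity_windows("QQQQQQQQACDEFGHI", window=8, max_entropy=1.0))  # [0, 1]
```

Windows flagged this way correspond to the stretches that would fill in as a box on a self-comparison dot plot.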
Application
Shows all possible alignments between two nucleic acid
or amino acid sequences.
All kinds of local and global alignment can be captured.
Helps to recognise large regions of similarity.
Finds self base pairing of RNA (e.g. tRNA) by comparing a
sequence to its own reverse complement.
An excellent approach for finding sequence transpositions.
Locates corresponding genes between two genomes.
Finds non-sequential alignments.
Limitation
For longer sequences, the memory required for the graphical
representation is very high, so very long sequences are impractical
to compare.
Many insignificant matches make the plot noisy (many off-diagonal
dots appear).
The time required to compare two sequences is proportional to
the product of the lengths of the sequences times the size of the search
window,
i.e. efficiency is higher for short sequences
and lower for long sequences.
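Both the noise and the cost can be seen in a windowed comparison. In the sketch below a dot is kept only when at least stringency positions match within a window, and the three nested loops make the len(seq1) × len(seq2) × window cost explicit. The function name and parameter names are assumptions for illustration.

```python
def windowed_dots(seq1, seq2, window=3, stringency=3):
    """Return (col, row) for windows with at least `stringency` matches.

    Work done is roughly len(seq1) * len(seq2) * window comparisons.
    """
    dots = []
    for row in range(len(seq2) - window + 1):
        for col in range(len(seq1) - window + 1):
            matches = sum(
                seq1[col + k] == seq2[row + k] for k in range(window)
            )
            if matches >= stringency:
                dots.append((col, row))
    return dots


a, b = "TWILIGHTZONE", "MIDNIGHTZONE"
print(len(windowed_dots(a, b, window=1, stringency=1)))  # 13 dots, 5 off the diagonal
print(windowed_dots(a, b, window=3, stringency=3))
# [(4, 4), (5, 5), (6, 6), (7, 7), (8, 8), (9, 9)]
```

With a window of 3 the isolated off-diagonal matches disappear and only the shared IGHTZONE diagonal remains, at the price of three comparisons per cell.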
Dot plot software
GCG is commercial software, so it is not always
available.
Instead, we can use the following programs from the EMBOSS
package:
Dotmatcher
Dotpath
Polydot
Dottup
(http://emboss.bioinformatics.nl/cgi-bin/emboss/dottup)