Blast (Basic Local Alignment Search Tool)

BLAST (Basic Local Alignment Search Tool) is a sequence similarity search program that compares a query sequence to sequence databases and calculates the statistical significance of matches. It can identify homologous sequences, find conserved domains and motifs within sequences, and help determine the function of uncharacterized sequences. BLAST uses a heuristic algorithm to quickly find regions of local similarity between sequences and outputs alignments, scores, and E-values to evaluate the significance of matches. Its applications include species identification, phylogenetic analysis, gene mapping, and domain detection.


BLAST (BASIC LOCAL ALIGNMENT SEARCH TOOL)

Presented By:
Madhavan Yasasve
Admn. No. 2019BY0017
I Year - M. Tech. (Biotechnology)
Sri Venkateswara College of Engineering
(Autonomous - Affiliated to Anna University),
Chennai

BY18102 - Computational Systems Biology - Seminar 30 SEP 2019


CONTENTS

1. IMPORTANT TERMS

2. INTRODUCTION TO BLAST

3. BLAST INPUT - OUTPUT

4. BLAST TYPES

5. BLAST SEARCH

6. RESULTS

7. APPLICATIONS

8. TAKE HOME MESSAGE


IMPORTANT TERMS

• Bioinformatics – The science of managing and analyzing
biological data (information associated with biomolecules such
as DNA, RNA, and proteins) using advanced computing techniques

• Database – A repository in which biological data is stored in
computer-readable form. Databases are classified on various
bases, such as data type, data source, and organism

• Tools – Software developed to perform tasks on the stored
data, such as searches, analysis, submission, and annotation

• Residue – The building block of a macromolecule stored in a
database: a nucleotide for DNA and RNA, an amino acid for
proteins

CLASSIFICATION OF DATABASE

IMPORTANT DATABASES AND TOOLS

INTRODUCTION TO BLAST

• It is a sequence similarity search program that compares
biological sequences, such as the amino acid sequences of
proteins or the nucleotide sequences of DNA, with a sequence
database or a library of sequences
• It can be viewed as an in silico hybridisation experiment that
identifies significant similarities between a query sequence
and the library sequences

• BLAST stands for :


B - Basic
L - Local
A - Alignment
S - Search
T - Tool

INTRODUCTION TO BLAST

• BLAST was designed by Eugene Myers, Samuel Karlin,
Stephen Altschul, Warren Gish, David J. Lipman and Webb
Miller (1990, 1994, 1997) at the National Institutes of Health
and was published in the Journal of Molecular Biology in 1990
• It was developed and is maintained by NCBI
• http://www.ncbi.nlm.nih.gov/BLAST/

BLAST – INPUT AND OUTPUT

• Protein BLAST programs use a substitution scoring matrix
(BLOSUM or PAM) to determine pairwise raw alignment scores;
nucleotide searches use simple match and mismatch scores
• The BLAST algorithm is fast, accurate, and web-accessible
• It is faster than sequence similarity search tools that rely
on exhaustive dynamic programming, such as Smith-Waterman
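
As a minimal sketch of how a raw alignment score can be obtained from a substitution matrix, the Python snippet below sums per-position scores over an ungapped pairwise alignment. The matrix is a toy example for a three-letter alphabet with illustrative values only (it is not a full BLOSUM or PAM matrix), and the names `pair_score` and `raw_score` are hypothetical.

```python
# Toy substitution scores for a 3-letter alphabet.
# Illustrative values only; this is NOT a complete BLOSUM or PAM matrix.
TOY_MATRIX = {
    ("A", "A"): 4, ("A", "L"): -1, ("A", "W"): -3,
    ("L", "L"): 4, ("L", "W"): -2,
    ("W", "W"): 11,
}


def pair_score(a, b, matrix=TOY_MATRIX):
    """Look up the score for residues a and b (the matrix is symmetric)."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]


def raw_score(seq1, seq2, matrix=TOY_MATRIX):
    """Sum substitution scores over an ungapped alignment of equal length."""
    if len(seq1) != len(seq2):
        raise ValueError("an ungapped alignment needs equal-length sequences")
    return sum(pair_score(a, b, matrix) for a, b in zip(seq1, seq2))


print(raw_score("ALW", "ALW"))  # identical sequences: 4 + 4 + 11
print(raw_score("ALW", "LLA"))  # mixed matches and substitutions
```

Gapped alignments would additionally subtract gap penalties, which this sketch leaves out.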

BLAST PROCESS

• BLAST uses a heuristic algorithm: an algorithm that produces an
acceptable solution to a problem in many practical scenarios
and is faster than classical (exact) methods
• Heuristics are typically used when there is no known method to
find an optimal solution under the given constraints, such as
limited search time
• Using this approach, BLAST finds homologous sequences not by
comparing the two sequences in their entirety, but by
locating short matches between them
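
The idea of locating short matches can be sketched as follows. This Python example indexes every k-letter word of a query and reports where the same words occur in a second sequence. It illustrates the principle only and is not the NCBI implementation; the function names and the default word size `k=3` are assumptions made for this example.

```python
from collections import defaultdict


def index_words(seq, k):
    """Map every k-letter word in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def shared_words(query, subject, k=3):
    """Return (query_pos, subject_pos, word) for each exact k-letter match."""
    index = index_words(query, k)
    hits = []
    for j in range(len(subject) - k + 1):
        word = subject[j:j + k]
        for i in index.get(word, []):
            hits.append((i, j, word))
    return hits


print(shared_words("GATTACA", "CATTAG"))
```

Because the query is indexed once, each database sequence only needs a single pass, which is where the speed advantage over full pairwise comparison comes from.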

BLAST – TYPES
• blastp: compares a protein query against a protein sequence
database
• tblastn: compares a protein query against all six reading
frames of a translated nucleotide sequence database
• blastn: compares a nucleotide query against a nucleotide
sequence database
• blastx: compares the six-frame conceptual translation
products of a nucleotide query sequence (both strands)
against a protein sequence database
BLAST - TYPES

• tblastx: compares the six-frame translations of a nucleotide
query against the six-frame translations of a nucleotide
sequence database
• Large numbers of query sequences (megablast): when comparing
large numbers of input sequences via command-line BLAST,
"megablast" is much faster than running BLAST multiple times
• Position-Specific Iterative BLAST (PSI-BLAST) (blastpgp): this
program is used to find distant relatives of a protein
• PHI-BLAST (Pattern-Hit Initiated BLAST): a search program
that combines matching of regular expressions with local
alignments surrounding the match
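
Several of the programs above (blastx, tblastn, tblastx) rely on six-frame conceptual translation. A minimal Python sketch of that step is given below, assuming the standard genetic code and a DNA sequence containing only A, C, G and T. The frame labels (`+1` to `-3`) and function names are conventions chosen for this example, not BLAST output fields.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unrecognised codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Translate both strands in all three reading frames."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for name, protein in six_frames("ATGGCCTAA").items():
    print(name, protein)
```

Each of the six protein strings can then be compared against a protein database, or against the translated frames of database sequences.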

PSI-BLAST

• First, a standard protein-protein BLAST search produces a list
of closely related proteins
• These proteins are combined into a general "profile" sequence,
which summarises the significant features present in these
sequences
• A query against the protein database is then run using this
profile, and a larger group of proteins is found
• This larger group is used to construct another profile, and the
process is repeated
• By including related proteins in the search, PSI-BLAST is much
more sensitive in picking up distant evolutionary relationships
than a standard protein-protein BLAST
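
The profile idea can be illustrated with a much simplified Python sketch: per-column residue frequencies are computed from a set of already aligned, equal-length sequences, and a candidate is scored against them. This is an assumption-laden toy. PSI-BLAST itself builds a position-specific scoring matrix with log-odds scores, sequence weighting and pseudocounts, none of which are modelled here, and the example sequences are invented.

```python
from collections import Counter


def build_profile(aligned_seqs):
    """Return one {residue: frequency} dict per alignment column."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")
    profile = []
    for column in zip(*aligned_seqs):
        counts = Counter(column)
        profile.append({res: n / len(column) for res, n in counts.items()})
    return profile


def profile_score(profile, candidate):
    """Toy score: sum the column frequencies of the candidate's residues."""
    return sum(col.get(res, 0.0) for col, res in zip(profile, candidate))


# Invented example hits standing in for the first-round BLAST results.
hits = ["MKVL", "MKIL", "MRVL", "MKVL"]
profile = build_profile(hits)
print(profile[1])                      # frequencies in the second column
print(profile_score(profile, "MKVL"))  # common residues score highly
print(profile_score(profile, "MRIL"))  # rarer residues score lower
```

In an iterative search, sequences scoring well against the profile would be added to the set and the profile rebuilt for the next round.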

PICTORIAL REPRESENTATION

BLAST SEARCH

BLAST RESULT
GRAPHIC SUMMARY
• The query sequence is at the top, with a colour key for
alignment scores
• Each bar represents the portion of another sequence that is
similar to your query sequence
• Red bars: the most similar sequences
• Pink bars: less good matches
• Green bars: unimpressive matches
• Blue bars: poor matches
• Black bars: the lowest-scoring hits
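
The colour key can be expressed as a small lookup. The sketch below assumes the classic NCBI graphic summary ranges (below 40, 40 to 50, 50 to 80, 80 to 200, and 200 or more); the treatment of the exact boundary values and the function name are assumptions made for this example.

```python
def score_colour(score):
    """Map an alignment score to its bar colour (assumed classic NCBI key)."""
    if score < 40:
        return "black"
    if score < 50:
        return "blue"
    if score < 80:
        return "green"
    if score < 200:
        return "pink"
    return "red"


for s in (35, 45, 60, 150, 250):
    print(s, score_colour(s))
```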

BLAST RESULT
• 1 - This portion of each description links to the sequence
record for a particular hit
• 2 - The score (bit score) is a value calculated from the
matches, substitutions, and gaps in each alignment.
The higher the score, the more significant the alignment. Each
score links to the corresponding pairwise alignment between the
query sequence and the hit sequence (also referred to as the
subject sequence)
• 3 - The E value (Expect value) is the number of hits with a
similar or better score that are expected to occur in the
database by chance. The smaller the E value, the more
significant the alignment
• 4 - These links give the user direct access from BLAST
results to related entries in other databases. ‘L’ links to
LocusLink records and ‘S’ links to structure records in NCBI's
Molecular Modelling DataBase
• Percentage of identity: this gives a concrete measure to read
alongside the E value. For proteins, an identity of more than
25 percent is good news. (In BLAST reports, the identity is the
number of identical residues divided by the alignment length,
with gap positions counted in the length)
• The Gaps field shows the number of alignment positions where a
gap was inserted in one of the sequences
• Length: the length of the alignment produced by BLAST
• Top sequence: Query sequence
• Bottom sequence: Hit (referred to as the Subject sequence)
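
The quantities in a BLAST report can be related with a few lines of Python. The sketch below uses the standard Karlin-Altschul relations, bit score S' = (λS − ln K) / ln 2 and E = m · n · 2^(−S'), where m and n are the query and database lengths. The parameter names are chosen for this example, real searches use effective lengths and matrix-specific λ and K values that are not reproduced here, and identity is computed over all alignment columns, as in the "Identities" line of a BLAST report.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalise a raw score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(bits, query_len, db_len):
    """Expected number of chance hits: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bits)


def percent_identity(query_aln, subject_aln, gap="-"):
    """Identical columns as a percentage of the alignment length."""
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned strings must have equal length")
    matches = sum(1 for q, s in zip(query_aln, subject_aln)
                  if q == s and q != gap)
    return 100.0 * matches / len(query_aln)


print(e_value(20, 1024, 1024))             # 2**20 * 2**-20
print(e_value(30, 1024, 1024))             # ten more bits, far fewer chance hits
print(percent_identity("ACG-T", "ACGAT"))  # 4 identical columns out of 5
```

Each additional bit of score halves the E value, which is why small differences in bit score correspond to large differences in significance.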

BLAST ALGORITHM

• List the possible matching words: for each k-letter word in
the query, the words that score above a threshold against it
• Organize the remaining high-scoring words into an efficient
search tree
• Repeat the two steps above for each k-letter word in the
query sequence
• Scan the database sequences for exact matches with the
remaining high-scoring words
• Extend the exact matches to high-scoring segment pairs (HSPs)
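
The final extension step can be sketched in Python as an ungapped extension that stops when the running score drops too far below the best score seen so far. This is a simplified illustration under stated assumptions: it uses plain match and mismatch scores instead of a substitution matrix, and the parameter names and default values (`match`, `mismatch`, `x_drop`) are hypothetical rather than BLAST defaults.

```python
def extend_seed(query, subject, q_pos, s_pos, k,
                match=1, mismatch=-1, x_drop=2):
    """Extend an exact k-letter seed without gaps in both directions.

    Extension in a direction stops once the running score falls more than
    x_drop below the best score seen in that direction.
    Returns (score, q_start, q_end, s_start), with q_end exclusive.
    """
    def walk(step, qi, si):
        best = running = 0
        best_len = length = 0
        while 0 <= qi < len(query) and 0 <= si < len(subject):
            running += match if query[qi] == subject[si] else mismatch
            length += 1
            if running > best:
                best, best_len = running, length
            elif best - running > x_drop:
                break
            qi += step
            si += step
        return best, best_len

    right_score, right_len = walk(+1, q_pos + k, s_pos + k)
    left_score, left_len = walk(-1, q_pos - 1, s_pos - 1)
    score = k * match + left_score + right_score
    return score, q_pos - left_len, q_pos + k + right_len, s_pos - left_len


# Seed "ATT" found at position 1 in both sequences.
score, q_start, q_end, s_start = extend_seed("GATTACA", "CATTAG", 1, 1, 3)
print(score, "GATTACA"[q_start:q_end])
```

The returned segment is only kept up to the point of its best score, so trailing mismatches explored during the walk are not included in the reported pair.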

BLAST APPLICATIONS

• Identifying Species: BLAST can help identify a species
correctly and/or find homologous species. This is useful, for
example, when working with a DNA sequence from an unknown
species
• Establishing Phylogeny: Using the results received through
BLAST, one can create a phylogenetic tree using the BLAST
web page
• DNA Mapping: When working with a known species and looking to
sequence a gene at an unknown location, BLAST can compare the
chromosomal position of the sequence of interest to relevant
sequences in the database(s)
• Locating Domains: A protein sequence can be input into BLAST
to locate known domains within the sequence of interest
• Comparison: When working with genes, BLAST can locate
common genes in two related species, and can be used to map
annotations from one organism to another
TAKE HOME MESSAGE

• BLAST is one of the most important programs in bioinformatics
• BLAST is based on sound statistical principles (key to its
speed and sensitivity)
• A basic understanding of its principles is key to using and
interpreting BLAST output
• BLAST can play an essential role in helping us propose the
following:
- Structure of a protein
- Function of a sequence
- Relationship to an organism
• Use blastn or MEGA-BLAST for DNA
• Use PSI-BLAST for protein searches
