Databases, bioinformatics, sequence analysis | PPT
INTRODUCTION
DATABASE
WHAT ARE BIOLOGICAL DATABASES?
TYPES OF DATABASES


PRIMARY DATABASES
GenBank
• Database from NCBI, includes sequences from publicly available resources.
http://www.ncbi.nlm.nih.gov/genbank/
• SEARCH FOR :
GENES, PROTEINS, GENOMES, STRUCTURES, DISEASES,
PUBLICATIONS AND MORE
GENBANK FILE FORMAT
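A GenBank flat-file record begins with a LOCUS line, carries keyword fields such as DEFINITION and ACCESSION at the start of a line, lists the sequence after ORIGIN, and ends with //. The sketch below reads only those fields. The sample record and the function name parse_genbank_record are hypothetical, made up for illustration; real records also contain FEATURES, REFERENCE, and other sections, and may wrap DEFINITION over several lines, which this sketch ignores.

```python
import re

# Hypothetical, heavily shortened record used only for illustration.
SAMPLE_RECORD = """\
LOCUS       EXAMPLE01                 24 bp    DNA     linear   UNA 01-JAN-2000
DEFINITION  Hypothetical example sequence.
ACCESSION   EXAMPLE01
ORIGIN
        1 atggccattg taatgggccg ctga
//
"""


def parse_genbank_record(text):
    """Return the locus name, definition, accession, and sequence of one record."""
    record = {"locus": None, "definition": None, "accession": None, "sequence": ""}
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        if in_sequence:
            # Sequence lines hold a position number followed by blocks of bases;
            # keep the letters only.
            record["sequence"] += re.sub(r"[^a-zA-Z]", "", line)
        elif line.startswith("LOCUS"):
            record["locus"] = line.split()[1]
        elif line.startswith("DEFINITION"):
            record["definition"] = line[len("DEFINITION"):].strip()
        elif line.startswith("ACCESSION"):
            record["accession"] = line.split()[1]
        elif line.startswith("ORIGIN"):
            in_sequence = True
    return record


record = parse_genbank_record(SAMPLE_RECORD)
print(record["locus"], len(record["sequence"]))
```

For real work, an established parser is a safer choice than hand-written code like this.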
EMBL
European Molecular Biology Laboratory
Nucleic acid database from EBI
(European Bioinformatics Institute)
Produced in collaboration with DDBJ and GenBank
Search engine – SRS (Sequence Retrieval System)
http://www.ebi.ac.uk/
DDBJ
DNA Data Bank of Japan
Started in 1986 in collaboration with GenBank
Produced and maintained at NIG
(National Institute of Genetics)
http://www.ddbj.nig.ac.jp/
SWISS PROT http://www.ebi.ac.uk/uniprot/
• Annotated sequence database established in 1986
• Consists of sequence entries made up of different line formats
• Similar format to EMBL
• http://us.expasy.org/sprot/sprot-top.html
PIR
• Protein Information Resource
• A division of the National Biomedical Research Foundation (NBRF) in the U.S.
• One can search for entries or do a sequence similarity search at the PIR site.
http://pir.georgetown.edu/
TREMBL
Translated European Molecular Biology Laboratory
Computer annotated supplement of SWISS PROT.
Contains all the translations of EMBL nucleotide
sequence entries not yet integrated in SWISS PROT.
http://www.ebi.ac.uk/trembl/
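To illustrate what "translation of a nucleotide sequence entry" means, here is a minimal sketch that translates a coding sequence with the standard genetic code. It assumes the input is already in frame on the forward strand; the function name translate_cds is hypothetical, and this is not a description of how TrEMBL itself is built.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(dna):
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        # Codons with ambiguous bases are reported as X (unknown residue).
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate_cds("ATGGCCTAA"))
```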
COMPOSITE DATABASES
Collection of various primary database sequences
Renders sequence searching highly efficient as it searches
multiple resources
Examples :- NRDB (Non Redundant Database), OWL,
MIPSX, SWISS PROT + TrEMBL
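The idea of a non-redundant composite can be sketched as merging entries from several sources and keeping one record per distinct sequence. The entries, identifiers, and the function name merge_non_redundant below are hypothetical, and exact sequence identity is an assumed redundancy criterion; actual composite databases each apply their own rules.

```python
def merge_non_redundant(*sources):
    """Merge (identifier, sequence) pairs, keeping one entry per distinct sequence.

    Returns a dict mapping each distinct sequence to the list of identifiers
    under which it appeared in the source databases.
    """
    merged = {}
    for source in sources:
        for identifier, sequence in source:
            key = sequence.upper()  # treat case differences as the same sequence
            merged.setdefault(key, []).append(identifier)
    return merged


# Hypothetical entries from two primary sources.
swissprot = [("SP_1", "MKTAYIAK"), ("SP_2", "MSLLTEVE")]
trembl = [("TR_1", "mktayiak"), ("TR_2", "MGGHHWLQ")]

nr = merge_non_redundant(swissprot, trembl)
for sequence, identifiers in nr.items():
    print(sequence, identifiers)
```

A search then runs once over the merged set instead of once per source database.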
SECONDARY DATABASES
Contains data derived from the results of analysing
primary data
Manually created or automatically generated
Contains more relevant and useful information
structured to specific requirements
Example :- PROSITE, PRINTS, BLOCKS, Pfam
PROSITE
Families of proteins
Can search using regular expressions
Similar to unix commands
Families exhibit these patterns
So we can search over families
http://ca.expasy.org/prosite/
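The link between PROSITE patterns and regular expressions can be sketched as a small conversion. In PROSITE syntax, elements are separated by "-", "x" matches any residue, "[...]" lists allowed residues, "{...}" lists excluded residues, "(n)" or "(n,m)" gives a repeat count, and "<" and ">" anchor the pattern to the N- and C-terminus. The function name prosite_to_regex and the test sequence are hypothetical, and the sketch does not handle rarer cases such as anchors inside brackets.

```python
import re


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern into a Python regular expression string."""
    pattern = pattern.rstrip(".")  # patterns are conventionally ended by a period
    parts = []
    for element in pattern.split("-"):
        element = element.replace("<", "^").replace(">", "$")  # terminus anchors
        element = element.replace("{", "[^").replace("}", "]")  # excluded residues
        element = element.replace("(", "{").replace(")", "}")  # repeat counts
        element = element.replace("x", ".")  # any residue
        parts.append(element)
    return "".join(parts)


# N-glycosylation site pattern: N, then anything but P, then S or T, then anything but P.
regex = prosite_to_regex("N-{P}-[ST]-{P}.")
sequence = "MKNGSAPNPTQ"  # hypothetical protein fragment
hits = [(m.start(), m.group()) for m in re.finditer(regex, sequence)]
print(regex, hits)
```

The braces are rewritten before the parentheses so that a repeat count is not mistaken for an exclusion list.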
• Motifs/blocks are created by automatically detecting the most conserved regions of each protein family.
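As a rough illustration of "most conserved regions", the sketch below scans an ungapped alignment of family members and reports runs of columns where every sequence has the same residue. This is a simplification for illustration only, not the actual block-detection procedure; the alignment, the min_length threshold, and the function name conserved_blocks are all hypothetical.

```python
def conserved_blocks(aligned, min_length=3):
    """Return (start, segment) for runs of fully identical alignment columns."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    blocks = []
    start = None
    # Iterate one column past the end so a run touching the last column is closed.
    for col in range(length + 1):
        identical = (
            col < length
            and aligned[0][col] != "-"
            and len({seq[col] for seq in aligned}) == 1
        )
        if identical and start is None:
            start = col
        elif not identical and start is not None:
            if col - start >= min_length:
                blocks.append((start, aligned[0][start:col]))
            start = None
    return blocks


# Hypothetical aligned family members.
family = [
    "MKLVHGSCAAQ",
    "MRLVHGSCTAQ",
    "MALVHGSCGAE",
]
blocks = conserved_blocks(family)
print(blocks)
```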
PRIMARY VS SECONDARY DATABASES
