PRIMARY AND SECONDARY BIOLOGICAL
DATABASE
By
KAUSHAL KUMAR SAHU
Assistant Professor (Ad Hoc)
Department of Biotechnology
Govt. Digvijay Autonomous P. G. College
Raj-Nandgaon (C. G.)
CONTENTS
• INTRODUCTION
• WHAT IS DATA AND DATABASE?
• WHAT IS BIOLOGICAL DATABASE?
• TYPES OF BIOLOGICAL DATABASE
– PRIMARY DATABASE
• Nucleic acid sequence database
• Protein sequence database
– SECONDARY DATABASE
– COMPOSITE DATABASE
– TERTIARY DATABASE
• WHY ARE THEY NEEDED?
• CONCLUSION
• REFERENCES
5/11/2020
2
INTRODUCTION
Bioinformatics: the application of computational techniques to the management and analysis of biological data.
History:
• The first English use of the word "data" is from the 1640s.
• The word "data" was first used to mean "transmittable and storable computer information" in 1946.
• The first database was created in 1956.
• Insulin was the first protein to be sequenced.
DATA
• A series of observations, measurements, or facts; information. In computing, data are also simply called information.
DATABASE
• A large, systematized collection of data that can be expanded, updated, and retrieved rapidly for a specific purpose.
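To make the three operations in this definition concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table name `sequences`, its columns, and the accession `SEQ001` are hypothetical, chosen only for illustration; real biological databases use far richer schemas.

```python
import sqlite3

# In-memory database for the sketch; a real resource would persist its data.
conn = sqlite3.connect(":memory:")

# The PRIMARY KEY creates an index on accession, which makes lookups fast.
conn.execute(
    "CREATE TABLE sequences ("
    " accession TEXT PRIMARY KEY,"
    " organism TEXT,"
    " sequence TEXT)"
)

# Expand: add a new (hypothetical) record.
conn.execute(
    "INSERT INTO sequences VALUES (?, ?, ?)",
    ("SEQ001", "Homo sapiens", "ATGGCC"),
)

# Update: correct the stored sequence of an existing record.
conn.execute(
    "UPDATE sequences SET sequence = ? WHERE accession = ?",
    ("ATGGCCTAA", "SEQ001"),
)

# Retrieve: look the record up by its key.
row = conn.execute(
    "SELECT organism, sequence FROM sequences WHERE accession = ?",
    ("SEQ001",),
).fetchone()
print(row)  # ('Homo sapiens', 'ATGGCCTAA')
```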
BIOLOGICAL DATABASE
• Storage of biological information (nucleic acid sequences, protein sequences, and structures).
DEFINITION
Biological databases are computer sites that organise, store, and disseminate files containing literature references, nucleic acid sequences, and protein sequences and structures.
SOURCES ON THE WEB FOR IMPORTANT DATABASES
TYPES OF BIOLOGICAL DATABASE
1. Primary Database
2. Secondary Database
3. Composite Database
4. Tertiary Database
Primary Database
Stores biomolecular sequences (protein or nucleic acid) and associated annotation information (organism, species, mutations linked to particular diseases, bibliographic details, etc.).
Primary sources are the original materials on which research is based.
They are neither interpreted, condensed, nor evaluated by other writers.
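A primary-database entry pairs a raw sequence with its annotation. The sketch below models one such entry as a Python dataclass. The class name `PrimaryRecord`, its field names, the `is_valid` check, and the example accession `EX0001` are hypothetical illustrations, not the schema of GenBank, EMBL, DDBJ, or any other real database.

```python
from dataclasses import dataclass, field

# Allowed residue letters per molecule type (ambiguity codes such as N are
# deliberately left out to keep the sketch short).
ALPHABETS = {
    "DNA": set("ACGT"),
    "protein": set("ACDEFGHIKLMNPQRSTVWY"),
}


@dataclass
class PrimaryRecord:
    """Hypothetical primary-database entry: a sequence plus its annotation."""
    accession: str
    molecule: str  # "DNA" or "protein"
    sequence: str
    organism: str
    disease_mutations: list[str] = field(default_factory=list)
    references: list[str] = field(default_factory=list)

    def is_valid(self) -> bool:
        """True if every residue is in the alphabet for this molecule type."""
        return set(self.sequence.upper()) <= ALPHABETS[self.molecule]


record = PrimaryRecord(
    accession="EX0001",
    molecule="DNA",
    sequence="ATGCGTTAA",
    organism="Homo sapiens",
)
print(record.is_valid())  # True
```

Keeping the annotation in typed fields next to the sequence is what lets a database answer questions such as "which records from this organism carry a disease-linked mutation?" without re-parsing free text.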
PRIMARY DATABASES
• Nucleotide sequences: NCBI GenBank, EMBL, DDBJ
• Protein sequences: PIR, UniProt, SWISS-PROT, TrEMBL
NCBI
• The National Center for Biotechnology Information (NCBI) is located in Bethesda, Maryland, and was founded in 1988 through legislation sponsored by Senator Claude Pepper.
• It was directed by David Lipman, one of the original authors of BLAST.
• The NCBI houses a series of databases, for example:
– GenBank: DNA sequences.
– PubMed (a bibliographic database): the biomedical literature.
– Other databases, such as the Epigenomics database.
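NCBI's databases can also be queried programmatically through its Entrez Programming Utilities (E-utilities). The sketch below only builds request URLs with the Python standard library and never contacts NCBI. The `efetch.fcgi`/`esearch.fcgi` endpoints and the `db`, `id`, `term`, `rettype`, and `retmode` parameters follow E-utilities conventions, but check NCBI's own E-utilities documentation (including its request-rate limits) before relying on them. The helper names are hypothetical; the accession `NC_002371.2` comes from this deck's reference list, and the PubMed search term is an arbitrary example.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def efetch_url(db: str, accession: str, rettype: str = "fasta") -> str:
    """Build a URL that would fetch one record as plain text (not sent here)."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return EUTILS + "efetch.fcgi?" + urlencode(params)


def esearch_url(db: str, term: str) -> str:
    """Build a URL that would search a database such as PubMed (not sent here)."""
    return EUTILS + "esearch.fcgi?" + urlencode({"db": db, "term": term})


# Nucleotide records (GenBank and RefSeq) are served from the 'nuccore' database.
print(efetch_url("nuccore", "NC_002371.2"))
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NC_002371.2&rettype=fasta&retmode=text

# PubMed is the bibliographic database; spaces are encoded as '+'.
print(esearch_url("pubmed", "biological database"))
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=biological+database
```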
GenBank
• Part of the International Nucleotide Sequence Database Collaboration (INSDC), which comprises EMBL, DDBJ, and GenBank at NCBI.
• The database was started in 1982 by Walter Goad at Los Alamos National Laboratory.
• As of 15 August 2017, GenBank release 221.0 contained 203,180,606 loci and 240,343,378,258 bases from 203,180,606 reported sequences.
https://www.revolvy.com/main/index.php?s=GenBank
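The release figures also give a rough sense of scale. Dividing the base count by the number of reported sequences gives the mean sequence length in that release; this is a simple derived average, not a number stated in the release itself.

```latex
\bar{L} = \frac{240{,}343{,}378{,}258\ \text{bases}}{203{,}180{,}606\ \text{sequences}}
        \approx 1{,}183\ \text{bases per sequence}
```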
EMBL-EBI
• Established in 1980 at the EMBL laboratories in
Heidelberg, Germany.
• An international, innovative and interdisciplinary
research organisation funded by 23 member states and
two associate member states.
• Location: Hinxton, Cambridge, UK.
DDBJ
• DDBJ (DNA Data Bank of Japan) release 1 was provided in 1987.
• Situated in Mishima, Japan.
SECONDARY DATABASE
• Derived from the analysis of primary data.
• Stores information in the form of regular expressions (patterns), fingerprints, and blocks.
• Examples: PROSITE, PRINTS.
PROSITE
• Consists of entries describing protein families, domains, and functional sites, as well as the amino acid patterns and profiles found in them.
• Complemented by ProRule, a collection of rules based on profiles and patterns.
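Because PROSITE patterns are a form of regular expression, they can be translated into Python's `re` syntax and scanned against a sequence. This is a simplified sketch that handles only the core notation: `x` for any residue, `[...]` for allowed residues, `{...}` for forbidden residues, `(n)` or `(n,m)` repeats, and `<`/`>` terminal anchors. It ignores edge cases such as `>` inside brackets and does not implement profiles or ProRule. The function names are hypothetical. The pattern `N-{P}-[ST]-{P}` is PROSITE's N-glycosylation site pattern (PS00001); the protein sequence is made up.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE pattern into a Python regular expression."""
    regex = []
    for element in pattern.rstrip(".").split("-"):
        start = "^" if element.startswith("<") else ""  # N-terminal anchor
        end = "$" if element.endswith(">") else ""      # C-terminal anchor
        element = element.strip("<>")
        # Split off an optional repeat count such as (3) or (2,4).
        core, _, repeat = element.partition("(")
        if core == "x":
            core = "."                      # any amino acid
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any amino acid except these
        # Single letters and [ST]-style classes are already valid regex syntax.
        if repeat:
            core += "{" + repeat.rstrip(")") + "}"
        regex.append(start + core + end)
    return "".join(regex)


def find_sites(sequence: str, pattern: str) -> list[int]:
    """Return 1-based start positions of every match, including overlapping ones."""
    # A lookahead lets finditer report matches that overlap each other.
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() + 1 for m in regex.finditer(sequence)]


print(prosite_to_regex("N-{P}-[ST]-{P}"))            # N[^P][ST][^P]
print(find_sites("MKNGSAPNLTW", "N-{P}-[ST]-{P}"))   # [3, 8]
```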
PRINTS
• A collection of protein motif fingerprints.
• The motifs do not overlap; they are separated along a sequence, though they may be contiguous in 3D space.
• Fingerprints can encode protein folds and functionalities more flexibly and powerfully than single motifs can, with their full diagnostic potency deriving from the mutual context provided by motif neighbours.
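The idea that a fingerprint is a set of motifs occurring in order along a sequence, without overlapping, can be illustrated with a deliberately simplified sketch. Real PRINTS motifs are aligned sequence blocks scored against the query, not exact patterns, so the regular-expression motifs here are hypothetical stand-ins. The greedy left-to-right search also assumes fixed-length motifs, and the function name `fingerprint_hits` is hypothetical.

```python
import re
from typing import Optional


def fingerprint_hits(sequence: str, motifs: list[str]) -> list[Optional[int]]:
    """Search for each motif in order, each one starting after the previous hit.

    Returns the 0-based start of each motif, or None where a motif is missing.
    """
    hits: list[Optional[int]] = []
    pos = 0  # motifs must not overlap, so the next search starts after the last hit
    for motif in motifs:
        match = re.compile(motif).search(sequence, pos)
        if match is None:
            hits.append(None)
        else:
            hits.append(match.start())
            pos = match.end()
    return hits


# Three hypothetical motifs making up one fingerprint.
motifs = ["GAS", "W.W", "KLM"]
print(fingerprint_hits("MGASPPWAWQQKLMR", motifs))  # [1, 6, 11]: all motifs, in order
print(fingerprint_hits("MKLMWAWGAS", motifs))       # [7, None, None]: wrong order
```

Both sequences contain all three motifs, but only the first has them in the expected order; that distinction, which no single motif can make, is a crude echo of the "mutual context" that gives fingerprints their diagnostic power.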
COMPOSITE DATABASE
• Represents an amalgamation of several primary database sources and is easy to use.
• Lets users access all the relevant information from a single source rather than connecting to multiple resources.
Ex. NCBI, UniProt, etc.
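One common way to amalgamate several sources is to merge identical sequences into a single non-redundant entry that remembers every source it came from. The sketch below assumes each source is a simple mapping from accession to sequence. The function name `build_composite`, the source names, and the accessions are hypothetical. Real composite databases also reconcile annotation, record versions, and near-identical entries.

```python
def build_composite(
    sources: dict[str, dict[str, str]],
) -> dict[str, list[tuple[str, str]]]:
    """Merge sources into {sequence: [(source, accession), ...]} without redundancy."""
    composite: dict[str, list[tuple[str, str]]] = {}
    for source_name, records in sources.items():
        for accession, sequence in records.items():
            # Identical sequences share one entry; every origin is recorded.
            composite.setdefault(sequence.upper(), []).append((source_name, accession))
    return composite


sources = {
    "SourceA": {"A001": "MKTAYIAK"},
    "SourceB": {"B017": "MKTAYIAK", "B018": "GSHMLEDP"},
}
composite = build_composite(sources)
print(len(composite))         # 2 unique sequences from 3 records
print(composite["MKTAYIAK"])  # [('SourceA', 'A001'), ('SourceB', 'B017')]
```

A user querying the composite sees one entry per distinct sequence, along with pointers back to each original source, instead of visiting every resource separately.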
CONCLUSION
• Bioinformatics is the application of information technology to store and organize biological data and to make it available in computer-readable form.
• We can easily analyze the vast amount of biological data available in the form of sequences and structures of proteins (the building blocks of organisms) and nucleic acids (the information carriers).
• The need for storing and communicating large datasets has grown.
• Biological databases make these data available to scientists.
REFERENCES
• Books:
– C.S.V. Murthy, Bioinformatics, 1st edition, 2003.
– S.C. Rastogi, Bioinformatics, 1st edition, 2003.
• Other sources:
– https://www.ncbi.nlm.nih.gov/nuccore/NC_002371.2
– http://vle.du.ac.in/mod/book/print.php?id=8913&chapterid=12618
– https://web.expasy.org/docs/swiss-prot_guideline.html
– nd%20Managing%20Information%20Leicester/page_21.htm
– https://bioinf.comav.upv.es/courses/biotech3/theory/databases.html