Primary and secondary databases ppt by puneet kulyana

INTRODUCTION TO DATABASES
By:-
 PUNEET
 NEERAJ
 KARTIK
 VARUN

INDEX/CONTENTS
 Introduction
 Data & Information
 Database
 Biological Databases
 Types of Databases
- Primary Databases
- Secondary Databases
- Composite Databases
 References

DATA & INFORMATION
DATA
Data is raw, unorganized facts that need to be processed.
Example:- Each student's test score is one piece of data.
INFORMATION
When data is processed, organized, structured or presented in a given context so as to make it useful, it is called information.
Example:- The average score of a class or of the entire school is information that can be derived from the given data.
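The test-score example can be sketched in a few lines of Python. The student names and scores below are hypothetical values chosen only for illustration; the point is that the individual scores are data, and the average computed from them is information.

```python
from statistics import mean

# Hypothetical raw data: each student's test score is one piece of data.
scores = {"student_a": 72, "student_b": 85, "student_c": 64, "student_d": 91}


def class_average(scores_by_student):
    """Process raw scores (data) into a class average (information)."""
    return mean(scores_by_student.values())


average = class_average(scores)
print(f"Class average: {average:.1f}")
```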

COMPARISON BETWEEN DATA & INFORMATION
Aspect | Data | Information
Definition (Oxford Dictionaries) | Facts and statistics collected together for reference or analysis | Facts provided or learned about something or someone; data as processed, stored, or transmitted by a computer
Refers to | Raw data | Analyzed data
Description | Qualitative or quantitative variables that can be used to make ideas or conclusions | A group of data which carries news and meaning
In the form of | Numbers, letters, or a set of characters | Ideas and inferences
Collected via | Measurements, experiments, etc. | Linking data and making inferences
Represented in | A structure, such as tabular data, a data tree, a data graph, etc. | Language, ideas, and thoughts based on the data
Interrelation | Information that is collected | Data that has been processed

TYPES OF DATA & INFORMATION
Databases are categorized based on the data type. A few examples are listed below:
S. No. | Type of data | Example(s) | Weblinks
1. | Sequences of biomolecules, viz. DNA, RNA, proteins | GenBank, EMBL, DDBJ, Swiss-Prot, PIR | (i) www.ncbi.nlm.nih.gov/genbank/ (ii) https://www.ebi.ac.uk/embl/ (iii) www.ddbj.nig.ac.jp/ (iv) http://web.expasy.org/docs/swiss-prot_guideline.html (v) http://pir.georgetown.edu/
2. | Bio-molecular structures | PDB | http://www.rcsb.org/pdb/home/home.do
3. | Bibliography/scientific literature ** | PubMed, Scopus (search engine) | (i) www.ncbi.nlm.nih.gov/pubmed (ii) www.scopus.com
4. | Patent databases | USPTO | www.uspto.gov/
5. | Metabolic pathways / molecular interactions | KEGG | http://www.genome.jp/kegg/pathway.htm

DATABASE???
A database is an organized collection of data that can be accessed in various ways.

WHAT ARE BIOLOGICAL DATABASES???

Biological databases serve a critical purpose in the collation and organization of data related to biological systems.
They provide computational support and a user-friendly interface to researchers for meaningful analysis of biological data.

TYPES OF DATABASES
 Primary Databases
 Secondary Databases

PRIMARY DATABASES
 Contain bio-molecular data in its original form.
 Experimental results are submitted directly into the database by researchers, and the data are essentially archival in nature.
 Once given a database accession number, the data in primary databases are never changed.
 Examples :- GenBank, EMBL and DDBJ for DNA/RNA sequences, SWISS-PROT and PIR for protein sequences, and PDB for molecular structures.

GenBank
Nucleotide sequence database from NCBI that includes sequences from publicly available resources.
http://www.ncbi.nlm.nih.gov/genbank/

EMBL
 European Molecular Biology Laboratory
 Nucleic acid database from EBI (European Bioinformatics Institute)
 Produced in collaboration with DDBJ and GenBank
 Search engine – SRS (Sequence Retrieval System)
http://www.ebi.ac.uk/

DDBJ
 DNA Data Bank of Japan
 Started in 1986 in collaboration with GenBank
 Produced and maintained at NIG (National Institute of Genetics)
http://www.ddbj.nig.ac.jp/

SWISS PROT
 Annotated sequence database established in 1986
 Consists of sequence entries made up of different line formats
 Similar format to EMBL
 http://us.expasy.org/sprot/sprot-top.html
http://www.ebi.ac.uk/uniprot/

PIR
 Protein Information Resource
 A division of the National Biomedical Research Foundation (NBRF) in the U.S.
 One can search for entries or run a sequence similarity search at the PIR site.
http://pir.georgetown.edu/

TrEMBL
 Translated European Molecular Biology Laboratory
 Computer-annotated supplement to SWISS PROT.
 Contains all the translations of EMBL nucleotide sequence entries not yet integrated in SWISS PROT.
http://www.ebi.ac.uk/trembl/

COMPOSITE DATABASES
 Collection of various primary database sequences
 Renders sequence searching highly efficient as it
searches multiple resources
 Examples :- NRDB (Non Redundant Database), OWL,
MIPSX, SWISS PROT + TrEMBL
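As a rough illustration of how a composite database pools primary sources, the sketch below merges records from two sources and keeps one entry per distinct sequence. The accession numbers and sequences are made up, and collapsing exact duplicates is an assumed simplification; real composite databases such as NRDB or OWL apply their own redundancy criteria.

```python
def build_nonredundant(*sources):
    """Merge (database, accession, sequence) records, one entry per distinct sequence."""
    merged = {}
    for records in sources:
        for database, accession, sequence in records:
            key = sequence.upper()
            # Remember every source entry that contributed this sequence.
            merged.setdefault(key, []).append(f"{database}:{accession}")
    return merged


# Hypothetical records; accessions and sequences are invented for illustration.
swissprot = [("SWISS-PROT", "X0001", "MKTAYIAKQR"), ("SWISS-PROT", "X0002", "MSLLTEVETP")]
pir = [("PIR", "Y0001", "MKTAYIAKQR"), ("PIR", "Y0002", "MGDVEKGKKI")]

composite = build_nonredundant(swissprot, pir)
for sequence, origins in composite.items():
    print(sequence, origins)
```

A single search over `composite` then covers both sources without scanning the shared sequence twice.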

SECONDARY DATABASES
 Contain data derived from the results of analysing primary data
 Manually created or automatically generated
 Contain more relevant and useful information, structured to specific requirements
 Examples :- PROSITE, PRINTS, BLOCKS, Pfam

SECONDARY DATABASES
Secondary database | Primary source | Information stored
PROSITE | SWISS PROT | Regular expressions
BLOCKS | PROSITE/PRINTS | Aligned motifs (blocks)
PRINTS | OWL (composite DB) | Aligned motifs
Pfam | SWISS PROT | Hidden Markov Models
Profile | SWISS PROT | Weighted matrices (profiles)

PROSITE
Families of proteins
Can search using regular expressions
Similar to unix commands using wildcards, etc.
E.g., [AC]-x-V-x(4)-{ED}
Interpreted as: [Ala or Cys]-any-Val-any-any-any-any-{any but Glu or Asp}
Families exhibit these patterns
So we can search over families
http://ca.expasy.org/prosite/
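The pattern notation above maps closely onto ordinary regular expressions. The following is a minimal sketch, assuming only the elements used in the example (`x`, `[...]`, `{...}` and repeat counts); the function name `prosite_to_regex` and the test sequences are hypothetical, and the full PROSITE syntax has further elements that are not handled here.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split off an optional repeat count such as (4) or (2,5).
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        residue, repeat = match.group(1), match.group(2)
        if residue == "x":
            residue = "."  # x: any amino acid
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"  # {ED}: anything but E or D
        # [AC] and single letters such as V are already valid regex syntax.
        parts.append(residue + ("{" + repeat + "}" if repeat else ""))
    return "".join(parts)


regex = prosite_to_regex("[AC]-x-V-x(4)-{ED}")
print(regex)

# Made-up sequences: the first ends in Q and matches, the second ends in E and does not.
hit = re.search(regex, "MGAKVLLLLQ")
miss = re.search(regex, "MGAKVLLLLE")
print(hit.group() if hit else None, miss)
```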

BLOCKS
 Motifs/blocks are created by automatically detecting the most conserved regions of each protein family.
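To give a feel for what detecting conserved regions means, here is a simplified sketch that scans a multiple alignment column by column and reports ungapped runs of highly conserved columns. It is not the procedure BLOCKS itself uses; the function name, the thresholds `min_identity` and `min_length`, and the toy alignment are all assumptions made for illustration.

```python
def conserved_blocks(alignment, min_identity=0.75, min_length=3):
    """Return (start, end) column ranges (end exclusive) of conserved runs."""
    n_cols = len(alignment[0])
    blocks, start = [], None
    # Iterate one column past the end so a block touching the last column is closed.
    for col in range(n_cols + 1):
        conserved = False
        if col < n_cols:
            column = [seq[col] for seq in alignment]
            # Count of the most common residue, ignoring gap characters.
            top = max((column.count(r) for r in set(column) if r != "-"), default=0)
            conserved = top / len(column) >= min_identity
        if conserved and start is None:
            start = col
        elif not conserved and start is not None:
            if col - start >= min_length:
                blocks.append((start, col))
            start = None
    return blocks


# Toy alignment of four made-up sequences of equal length.
alignment = ["MKVLAAGHE", "TRVLAAGQD", "SEVLAAGWK", "AQVLSAGYR"]
blocks = conserved_blocks(alignment)
for start, end in blocks:
    print((start, end), [seq[start:end] for seq in alignment])
```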

PRINTS
 Most protein families are characterized not by one, but by several conserved motifs
 Fingerprints are groups of conserved motifs excised from sequence alignments
 Taken together, they provide diagnostic family signatures. They are the basis of the PRINTS database, and are stored in the form of aligned motifs.
 Protein family information is entered manually

Pfam
Maintained by the Sanger Centre (Cambridge)
Protein families aligned using Hidden Markov Models (HMMs)
Given a new sequence, find the families which the sequence might fit into
Sequence coverage: 11912 families
Split into Pfam-A (high quality) and Pfam-B (low quality)
http://pfam.sanger.ac.uk/

PRIMARY VS SECONDARY DATABASES

REFERENCES
 Class notes
 Jin Xiong, Essential Bioinformatics
 file:///C:/Users/student/Downloads/DATABASES%20IN%20BIOINFORMATICS.pdf
 https://www.ebi.ac.uk/training/online/course/bioinformatics-terrified/what-database/relational-databases/primary-and-secondary-databases
 http://www.diffen.com/difference/Data_vs_Information
 Google images
