BIOINFORMATICS AND DATABASES IN BIOINFORMATICS.pdf
Dr. Harisingh Gour Viswavidyalaya
A Central University
DEPARTMENT OF ZOOLOGY
TOPIC – DATABASES IN BIOINFORMATICS
MID II ASSIGNMENT
ZOO – SEC – 128
SUBMITTED TO – MR. ANUPAM KUMAR
SUBMITTED BY –
PRAVANJAN DASH
ROLL NO. – Y23265020, M.Sc. 1st YEAR, 1st SEMESTER
INTRODUCTION TO DATABASES
BIOLOGICAL DATABASES are
➢ Collections of files containing records of biological data in
machine-readable form that can be accessed, added to, retrieved,
manipulated and modified.
➢ Used to store, manage, connect and distribute data.
➢ Data are arranged by sets of rules programmed into the
software that manages the data, called a Database
Management System (DBMS).
➢ A biological database is a collection of data that is
structured, searchable, updated periodically and
cross-referenced.
➢ The data are stored, maintained, annotated and curated
for public/research use.
➢ Data are collected and organized in a specific but useful way.
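The operations listed above (add, retrieve, modify, search) can be sketched with a small DBMS. The example below is only an illustration using Python's built-in sqlite3 module; the table name, column names, identifiers and sequences are made up and do not come from any real biological database.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries (accession TEXT PRIMARY KEY, organism TEXT, sequence TEXT)"
)

# Add records (made-up identifiers and sequences).
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?)",
    [
        ("DEMO0001", "Homo sapiens", "ATGGCC"),
        ("DEMO0002", "Mus musculus", "ATGGTC"),
    ],
)

# Retrieve a record by its identifier.
row = conn.execute(
    "SELECT organism, sequence FROM entries WHERE accession = ?", ("DEMO0001",)
).fetchone()

# Modify a record, then search across the whole table.
conn.execute(
    "UPDATE entries SET sequence = ? WHERE accession = ?", ("ATGGCCTAA", "DEMO0001")
)
matches = [
    accession
    for (accession,) in conn.execute(
        "SELECT accession FROM entries WHERE sequence LIKE ? ORDER BY accession",
        ("ATGG%",),
    )
]
```

Here the DBMS, not the user, enforces the rules (for example, one record per accession), which is the role the slide assigns to database management software.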
Classification based on type of data stored
➢ Primary Databases: Contain original data in the form of
primary sequence data or structural data as submitted by the
scientific community.
➢ Secondary Databases: Contain information that has been
processed and derived from the raw data available in primary
databases, e.g. PROSITE, PRINTS, BLOCKS.
➢ Composite Databases: Collect and present data after
comparing and filtering them from different primary databases
and exhibit only the non-redundant sequences.
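The "compare and filter" step of a composite database can be sketched as a simple de-duplication. This is a minimal illustration, assuming records are (identifier, sequence) pairs and that two records are redundant when their sequences are identical; real composite databases use their own, more elaborate criteria. The function name, identifiers and sequences are hypothetical.

```python
def build_composite(*sources):
    """Merge records from several sources, keeping one entry per distinct sequence."""
    seen = {}
    for source in sources:
        for identifier, sequence in source:
            key = sequence.upper()  # treat case differences as the same sequence
            # Collect every identifier that carries this sequence.
            seen.setdefault(key, []).append(identifier)
    # The first identifier represents the entry; the rest become cross-references.
    return [(ids[0], seq, ids[1:]) for seq, ids in seen.items()]


# Made-up records standing in for two primary sources.
source_a = [("A1", "MKTAYIAK"), ("A2", "MGSSHHHH")]
source_b = [("B7", "mktayiak"), ("B9", "MLLAVLYC")]
composite = build_composite(source_a, source_b)
```

Four input records yield three composite entries, with "B7" kept as a cross-reference on the entry for "A1".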
PRIMARY DATA VERSUS SECONDARY DATA
PRIMARY DATA
• Primary data is a type of data researchers
directly collect from main sources.
• Includes real-time data.
• Collected to address a current research
problem.
• Accessing primary data involves a relatively
long process.
• Data collection tools include observations,
surveys, questionnaires, physical testing,
online questionnaires, personal or telephone
interviews, case studies, and focus group
discussions.
SECONDARY DATA
• Secondary data refers to already existing data
produced by previous researchers.
• Related to the past.
• Primarily collected to address earlier
research problems, but can be used
to address the current research problem as
well.
• Referring to secondary data is quick and easy.
• Data collection tools include journal articles,
websites, books, government publications,
records, etc.
PRIMARY DATABASES
➢ Primary databases contain original biological data. They are
archives of raw sequence or structural data submitted by the scientific
community.
➢ Once a record has been given an accession number, that identifier is
permanent, and the archived data are generally changed only by the
original submitters.
➢ Three major public sequence databases (GenBank, EMBL, DDBJ)
store raw nucleic acid sequence data produced and
submitted by researchers worldwide.
➢ SOME PRIMARY DATABASES
Nucleic acid databases: GenBank, EMBL, DDBJ
Protein sequence databases: PIR, Swiss-Prot, UniProt
Protein structure database: PDB
Metabolic databases: KEGG
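Nucleotide records are commonly cited as an accession followed by a dot and a version number. The sketch below splits such an identifier into its two parts. The identifier "XY123456.2" is invented for illustration, and the function and class names are hypothetical rather than part of any database's software.

```python
from typing import NamedTuple, Optional


class SequenceId(NamedTuple):
    accession: str
    version: Optional[int]  # None when no ".N" suffix is present


def parse_sequence_id(text: str) -> SequenceId:
    """Split 'ACCESSION.VERSION' into its parts; the version is optional."""
    accession, dot, version = text.strip().partition(".")
    if not accession:
        raise ValueError("empty accession")
    if not dot:
        return SequenceId(accession, None)
    if not version.isdigit():
        raise ValueError(f"bad version in {text!r}")
    return SequenceId(accession, int(version))


example = parse_sequence_id("XY123456.2")  # made-up identifier
```

Keeping the accession separate from the version makes it possible to refer to a record as a whole or to one specific version of its sequence.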
SECONDARY DATABASES
• Secondary databases contain additional information
derived from the analysis of data available in primary
sources. The primary data are analysed in a variety
of ways, so secondary databases contain different information in different
formats.
• SOME SECONDARY DATABASES ARE
✓ TrEMBL
✓ Pfam
✓ PROSITE
✓ Profiles
✓ SCOP
✓ CATH
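The primary/secondary grouping used in these slides can be written down as a small lookup table. The assignments below simply copy the lists given in the slides and are not an authoritative classification; the names `Category`, `CATEGORY_BY_NAME` and `categorize` are hypothetical.

```python
from enum import Enum


class Category(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"


# Assignments copied from the slides, not an authoritative classification.
CATEGORY_BY_NAME = {
    "GenBank": Category.PRIMARY,
    "EMBL": Category.PRIMARY,
    "DDBJ": Category.PRIMARY,
    "PIR": Category.PRIMARY,
    "Swiss-Prot": Category.PRIMARY,
    "UniProt": Category.PRIMARY,
    "PDB": Category.PRIMARY,
    "KEGG": Category.PRIMARY,
    "TrEMBL": Category.SECONDARY,
    "Pfam": Category.SECONDARY,
    "PROSITE": Category.SECONDARY,
    "PRINTS": Category.SECONDARY,
    "BLOCKS": Category.SECONDARY,
    "SCOP": Category.SECONDARY,
    "CATH": Category.SECONDARY,
}

# Lower-cased index so that lookups ignore capitalisation.
_LOOKUP = {name.lower(): category for name, category in CATEGORY_BY_NAME.items()}


def categorize(name):
    """Return the Category recorded for a database name, or None if unlisted."""
    return _LOOKUP.get(name.strip().lower())
```

For example, `categorize("genbank")` returns `Category.PRIMARY`, while a name that is not in the table returns `None`.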
NUCLEOTIDE SEQUENCE DATABASES
• Composed of a group of nucleotide sequence entries.
• Data repositories that accept nucleic acid sequence data
and make it freely available to the public.
• GenBank, EMBL and DDBJ are the principal nucleotide
databases.
• All three are members of the International Nucleotide
Sequence Database Collaboration (INSDC) and exchange
data.
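Nucleotide entries are often exchanged as FASTA text, in which a header line starting with ">" is followed by the sequence. The slides do not mention FASTA, so treat it as an assumption here; the sketch reads a group of made-up entries and computes a simple property (GC fraction) for each sequence.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous entry, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_fraction(sequence):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


# Made-up entries, not taken from any real database.
demo = io.StringIO(">demo1 made-up entry\nATGC\nGGCC\n>demo2 made-up entry\nATAT\n")
records = list(read_fasta(demo))
```

Sequence lines belonging to one entry are joined, so "demo1" is read as the single sequence "ATGCGGCC".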
PROTEIN SEQUENCE DATABASES
➢ An array of amino acid sequence entries arranged
according to identification number.
➢ Well-known protein sequence databases available
on the web are
✓ Swiss-Prot
✓ PIR
✓ UniProt
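Arranging entries by identification number makes lookup fast. A minimal sketch, assuming entries are (identifier, sequence) pairs kept in sorted order and searched by binary search; the identifiers and sequences are invented and the function name is hypothetical.

```python
from bisect import bisect_left

# Made-up entries, sorted by identifier.
entries = sorted([("ID0003", "MKV"), ("ID0001", "MAL"), ("ID0002", "MGS")])
ids = [identifier for identifier, _ in entries]


def find_entry(identifier):
    """Return the sequence stored under an identifier, or None if it is absent."""
    position = bisect_left(ids, identifier)
    if position < len(ids) and ids[position] == identifier:
        return entries[position][1]
    return None
```

Because the list is sorted, each lookup needs only a logarithmic number of comparisons instead of a scan over every entry.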
PROTEIN STRUCTURE DATABASES
➢ Many proteins that share a common evolutionary
origin show structural similarities.
➢ Dissimilar proteins exhibit differences in primary, secondary,
tertiary and quaternary structures.
➢ Structural similarity or dissimilarity between proteins can be assessed
with a structure database.
➢ These databases store a collection of three-dimensional
structures of proteins.
➢ An example is the Protein Data Bank (PDB).
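Three-dimensional structures in the Protein Data Bank are traditionally distributed in a fixed-column text format, where each ATOM line gives one atom and its x, y, z coordinates. The sketch below reads one such line using the standard column positions. The coordinates are made up, and the demo line is generated with a format string so that the columns line up; the function name is hypothetical.

```python
def parse_atom_line(line):
    """Extract the main fields from one fixed-column ATOM/HETATM record."""
    if not line.startswith(("ATOM  ", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return {
        "serial": int(line[6:11]),            # columns 7-11
        "atom": line[12:16].strip(),          # columns 13-16
        "residue": line[17:20].strip(),       # columns 18-20
        "chain": line[21],                    # column 22
        "residue_number": int(line[22:26]),   # columns 23-26
        "xyz": (
            float(line[30:38]),               # columns 31-38
            float(line[38:46]),               # columns 39-46
            float(line[46:54]),               # columns 47-54
        ),
    }


# Made-up atom; the format string places every field in its column.
demo_line = "ATOM  {:>5d} {:<4s} {:>3s} {:1s}{:>4d}    {:8.3f}{:8.3f}{:8.3f}".format(
    1, " CA", "GLY", "A", 1, 1.0, 2.5, -3.25
)
atom = parse_atom_line(demo_line)
```

Slicing by column rather than splitting on whitespace matters here, because adjacent fields in this format may touch without any space between them.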
THANK YOU
