Bio Informatics science related work.pptx
Soumya Mathunny
P2115015
I PG Zoology
What is bioinformatics
Computational biology Vs bioinformatics
Branches of Bioinformatics
Scope of bioinformatics
Biological databases
BIOINFORMATICS
• Original definition: Paulien Hogeweg
• “Application of information technology and computer science to the field
of molecular biology”
• A discipline in which techniques from applied mathematics, computer science,
statistics and artificial intelligence are integrated to solve biological problems
• Three components:
1. Development of new algorithms and statistics for assessing relationships
among large sets of biological data (e.g. DNA sequence data)
2. Application of these tools to the analysis and interpretation of various
biological data (nucleotide sequences, amino acid sequences)
3. Development of databases for efficient storage, access and management of
various biological information
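As a small illustration of the first component (a statistic for assessing the relationship between sequences), the sketch below computes percent identity between two sequences. It assumes the sequences are already aligned to the same length; the function name `percent_identity` is hypothetical and not taken from any library.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two pre-aligned sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Compare position by position, ignoring letter case.
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    print(percent_identity("ACGT", "ACGA"))
```

Real tools first align the sequences (allowing gaps) before scoring them; this sketch covers only the scoring step.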
Computational biology and bioinformatics
• Computational biology and bioinformatics are related branches with a subtle
difference
• Computational biology: use of computer technology to solve a biological question
  • Development of algorithms and statistical models to analyse biological data
  • E.g.: Ramachandran plot
• Bioinformatics: multipurpose computerised analysis of
biological data to make statistical or comparative
inferences
  • Collection and storage of biological information
  • Derives knowledge from computer analysis of
biological data
  • E.g.: databases
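The Ramachandran plot mentioned above plots the backbone dihedral angles phi and psi of each residue. A minimal sketch of the underlying calculation, a dihedral angle from four atom positions, is given below. The function names are hypothetical, the coordinates in the demo are made up, and the sign convention follows the common atan2 formulation, which should be checked against the convention of any tool it is compared with.

```python
import math

Point = tuple[float, float, float]


def _sub(u: Point, v: Point) -> Point:
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def _dot(u: Point, v: Point) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u: Point, v: Point) -> Point:
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dihedral_degrees(p0: Point, p1: Point, p2: Point, p3: Point) -> float:
    """Dihedral angle (degrees) defined by four consecutive atoms."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Made-up coordinates: a planar "trans" arrangement of four atoms.
    print(abs(dihedral_degrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))))
```

For phi the four atoms are C(i-1), N, CA, C; for psi they are N, CA, C, N(i+1).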
Branches of Bioinformatics
• Genomics
• Transcriptomics
• Proteomics
• Systems Biology
• Functional Genomics
• Metabolomics
• Structural Genomics
• Nutritional Genomics
• Cheminformatics
• Glycomics
• Molecular Phylogeny
Genomics
• Modern biological research
• Nucleotide sequences are mapped and located
• Analysis of NA
Transcriptomics
• Study of the transcriptome: the whole set of mRNA
• Depicts the expression level of genes (DNA microarrays)
Proteomics
• Sequencing of amino acids in a protein
• 3D structure of proteins
• Hb & insulin
Systems Biology
• System-level understanding of biological systems
• Interaction between components of a biological system
Functional Genomics
• Determining functions of genes
• Understanding functions of human genes, genes responsible for
production of antibodies, pathogenesis
Metabolomics
• Chemical fingerprint
• Drug toxicity assessment
Structural Genomics
• 3D structure of gene products
• NMR studies
Nutritional Genomics
• Genes responsible for synthesis of nutritionally important enzymes
• Golden rice (provitamin A)
Cheminformatics
• Products of secondary metabolism
• Natural products
Glycomics
• Carbohydrate research
• Future field
Molecular Phylogeny
• Origin & evolution of organisms
• To understand genetic and evolutionary relationships of organisms
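To make the molecular phylogeny item concrete, the sketch below estimates an evolutionary distance between two aligned DNA sequences. It is one possible approach, not taken from the slides: it uses the proportion of differing sites (p-distance) and the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3). The function names and the example sequences are hypothetical.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of sites that differ between two aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for an observed p-distance."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the correction to exist")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    p = p_distance("ACGTACGT", "ACGTACGA")
    print(p, jukes_cantor(p))
```

The corrected distance is always at least as large as the raw p-distance, because it accounts for repeated substitutions at the same site. Distances like these are the usual input to tree-building methods.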
SCOPE OF BIOINFORMATICS
• The main scope of bioinformatics is to retrieve all the relevant data
and process it into useful information
• Management and analysis of a wide range of biological data
• It is used in human genome sequencing, where large sets of data are
being handled
• Bioinformatics plays a major role in the research and development
of the biomedical field
• Bioinformatics uses computer programs for several applications
that involve finding gene and protein functions and sequences,
inferring evolutionary relationships, and analysing the three
dimensional shape of proteins
• Research on genetic and microbial diseases
relies heavily on bioinformatics, and the derived information
can be vital for producing personalised medicine
Biological databases
• A collection of data that is structured, searchable, updated periodically and
cross-referenced
• Biological databases are developed to perform several functions, such as:
i. Databases aid in the systematization of results from biological experiments and
analysis
ii. Databases make biological data available to scientists in one place and help
them obtain data for their research and cross-validation
iii. Biological data in databases are available in computer-readable form, and this
forms the first fundamental step of biological data analysis
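To show what "computer-readable form" can look like in practice, the sketch below parses sequence records in FASTA format, a plain-text format widely used by sequence databases. FASTA is not named in the slides, so treat it as an assumed example; the function name `parse_fasta` and the sample records are hypothetical.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each record identifier to its sequence.

    A header line starts with '>'; the identifier is the first word after it.
    Sequence lines that follow are joined until the next header.
    """
    records: dict[str, str] = {}
    current_id = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            current_id = words[0] if words else ""
            records[current_id] = ""
        elif current_id is not None:
            records[current_id] += line
    return records


if __name__ == "__main__":
    sample = ">seq1 demo record\nACGT\nTTGA\n>seq2\nGGCC\n"
    print(parse_fasta(sample))
```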
Classification of Biological databases
• Based on data types
1. Genome databases: human, mouse, yeast, C. elegans, FlyBase
2. Sequence databases
a) Nucleotide databases: Alternative splicing, EMBL-Bank, Ensembl, Genomes
server, Genome MOT, EMBL-Align, Simple queries, dbSTS queries, Parasites,
Mutations and IMGT
b) Protein databases: Swiss-Prot, TrEMBL, InterPro, CluSTr, IPI, GOA, GO,
Proteome analysis, HPI, IntEnz, TrEMBL new, SP_ML, NEWT and PANDIT
3. Structure databases: PDB, MSD, NDB, FSSP and DALI
4. Microarray databases: ArrayExpress and MIAME
5. Chemical databases: ChEBI
6. Pathway databases: BRENDA, KEGG and BioSilico
7. Enzyme databases: EC enzyme database, REBASE
8. Disease databases: OMIM, OMIA
9. Literature databases: MEDLINE, FlyBase archives
• Based on maintainer status: NCBI, EMBL, SIB
• Based on data access
1. Publicly available
2. Available with copyright
3. Browsing only, accessible but not downloadable
4. Academic, but not freely available
5. Proprietary, commercial
6. Restricted SQL queries against underlying DBMS
• Based on data source
1. Primary data(archival)
a. Nucleotide: GenBank / EMBL / DDBJ
b. Protein: UniProt, TrEMBL
c. structure: PDB
d. literature: Medline(PubMed)
2. Secondary database(curated)
a. Genomic: RefSeq, TIGR gene indices of human
b. Proteomic: Prosite, Swiss-Prot
• Database design- Relational and object oriented
• Organism-Bacteria, Virus, Human etc
Primary databases
1. NUCLEOTIDE SEQUENCE DATABASE
i. GenBank
• Hosted by NCBI
• Offers all publicly available nucleotide sequences, their protein translations and
their bibliographic and annotation information
• Facilitates and encourages direct submission of sequence data by providing a very
simple and user-friendly process
• Researchers from anywhere in the world can submit their data to GenBank
• The information in GenBank is growing exponentially and is assumed to continue
growing with a doubling time of approximately 30 months
• http://www.ncbi.nlm.nih.gov/genbank/
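The doubling-time statement above corresponds to exponential growth, N(t) = N0 * 2^(t/T), with T the doubling time. A minimal sketch, taking the 30-month figure quoted on this slide as the default; the function name `projected_size` and the starting size of 100 units are hypothetical.

```python
def projected_size(current_size: float,
                   months_ahead: float,
                   doubling_months: float = 30.0) -> float:
    """Projected database size after `months_ahead` months of steady doubling."""
    if doubling_months <= 0:
        raise ValueError("doubling time must be positive")
    return current_size * 2 ** (months_ahead / doubling_months)


if __name__ == "__main__":
    # Two doubling periods (60 months) quadruple the starting size.
    print(projected_size(100.0, 60.0))
    # Implied growth factor per year for a 30-month doubling time.
    print(projected_size(1.0, 12.0))
```

With a 30-month doubling time the yearly growth factor is 2^(12/30), roughly 1.32, i.e. about 32% growth per year.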
ii. EMBL
• Nucleotide sequence database hosted in the UK by the EMBL European Bioinformatics
Institute (EMBL-EBI)
• EMBL is a non-profit research institution supported by 20 European countries and Australia
• Collects nucleotide sequence data from individual researchers, genome sequencing
projects and patent applications
• EMBL was first established in 1974
• Contains taxonomic and non-taxonomic divisions
• Sequences are stored in the database as they would exist in the biological state
• Stored data generally correspond to wild-type sequences without mutations or
genetic manipulations
• https://www.ebi.ac.uk/
iii. DDBJ
• DNA Data Bank of Japan
• Started in 1986
• Now hosted at the National Institute of Genetics
• Gathers data mainly from scientists in Japan, and also from researchers all over
the world, and shares these nucleotide data with EMBL and GenBank
• Each database entry includes details of sequence, submitter’s details,
bibliographic references, biological significance and the scientific name and
taxonomy of the organism
• http://www.ddbj.nig.ac.jp
2. PROTEIN SEQUENCE DATABASE
i. TrEMBL
• Translated EMBL (TrEMBL) is a computer-annotated supplement to Swiss-Prot
• Developed by the Swiss-Prot groups at SIB and EBI in 1996
• Contains translations of all coding sequences in EMBL except for coding
sequences already included in Swiss-Prot
• Created to cope with the enormous volume of sequence information and the
time-consuming curation process
• Two major sections: SP-TrEMBL and REM-TrEMBL
• SP-TrEMBL contains the entries that will eventually be merged into Swiss-Prot
• REM-TrEMBL contains sequences that will not be included in Swiss-Prot
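TrEMBL entries are described above as translations of coding sequences. The sketch below shows that basic operation with the standard genetic code. It is an illustration only, not the actual TrEMBL pipeline; the names `CODON_TABLE` and `translate` are hypothetical, and real pipelines also handle alternative genetic codes, partial codons and ambiguity codes.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon."""
    cds = cds.upper().replace("U", "T")   # accept RNA input as well
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE.get(cds[i:i + 3], "X")  # X = unknown codon
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))
```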
ii. UniProt
• UniProt is a freely accessible database of protein sequences and
functional information
• It contains a large amount of information about the biological function
of proteins, derived from the research literature
• It now combines a network of sister databases, centralising all
levels of annotation produced for protein sequences
• https://www.uniprot.org/help/linking_to_uniprot
3. STRUCTURE DATABASES
PDB
• The Protein Data Bank (PDB) is the main primary database of 3D
structures of proteins and nucleic acids
• It is the single worldwide archive of structural data and is maintained by the
Research Collaboratory for Structural Bioinformatics (RCSB)
• This knowledge can be used to help work out the role played by the higher-level
structure of molecules in human health and disease, and in drug development
• Data obtained from X-ray crystallography and NMR spectroscopy are
submitted to the PDB
• https://pdbj.org/help/faq_data03
4. LITERATURE DATABASE
MEDLINE (PubMed)
• Bibliographic database
• PubMed is a free service for accessing the MEDLINE database of citations and some
full-text articles on the life sciences and fields such as medicine, nursing, healthcare
systems and preclinical sciences
• Developed and maintained by the National Center for Biotechnology Information (NCBI)
• New journals are not included automatically in PubMed
• It also provides access to additional relevant websites and links to other NCBI
molecular biology resources
• https://pubmed.ncbi.nlm.nih.gov/
Secondary databases
1. PROTEOMIC
i. Prosite
• Prosite, a part of Swiss-Prot, is a database of protein families and domains
• Consists of entries describing the families, domains and functional sites, as well as
the amino acid patterns, signatures and profiles in them
• Helps to identify the known protein family to which a new sequence belongs
• Prosite offers tools for protein sequence analysis and motif detection
• The basis of Prosite is regular expressions describing characteristic subsequences of
specific protein families and domains
• Part of the ExPASy proteomics analysis servers
• https://prosite.expasy.org/
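Prosite's regular-expression basis can be sketched by converting a Prosite-style pattern into a Python regular expression. The converter below is a simplified assumption that handles only the common elements: `x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repeats, and `<` / `>` for sequence ends. The function name `prosite_to_regex` and the test peptide are hypothetical; `N-{P}-[ST]-{P}` is the well-known N-glycosylation site pattern.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple Prosite-style pattern into a Python regex string."""
    pattern = pattern.rstrip(".")             # patterns may end with a period
    parts = []
    for element in pattern.split("-"):        # elements are separated by '-'
        element = element.replace("{", "[^").replace("}", "]")  # exclusions
        element = element.replace("(", "{").replace(")", "}")   # repeat counts
        element = element.replace("x", ".")                     # any residue
        element = element.replace("<", "^").replace(">", "$")   # sequence ends
        parts.append(element)
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex("N-{P}-[ST]-{P}.")
    print(regex)
    # Report the start positions of non-overlapping matches in a made-up peptide.
    print([m.start() for m in re.finditer(regex, "MKNGSAL")])
```

Note that `re.finditer` reports non-overlapping matches only, so overlapping sites would need a lookahead-based search.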
ii. PRINTS
• PRINTS is a protein database that uses a different approach to
pattern recognition, called ‘fingerprinting’
• Provides both a detailed annotation resource for protein families and
a diagnostic tool for new protein sequences
• PRINTS is a very high quality database, created with a great
deal of manual effort
iii. BLOCKS
• The Blocks database represents protein families in
terms of multiply aligned ungapped segments
• Blocks are derived from the most highly conserved regions in a group of proteins or
a protein family
• Ungapped multiple alignments of short regions are called blocks
• The database was constructed from sequences of protein families using a
fully automated method
• http://blocks.fhcrc.org
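The idea of a block as an ungapped aligned segment can be sketched as follows: given a multiple alignment in which `-` marks a gap, report the column ranges where no sequence has a gap. This is an illustration only, not the automated method used to build the Blocks database; the function name `ungapped_blocks`, the `min_width` parameter and the toy alignment are hypothetical, and a real method would also score how conserved each column is.

```python
def ungapped_blocks(alignment: list[str],
                    min_width: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, that contain no gaps."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    blocks = []
    start = None
    # Iterate one column past the end so a trailing block is closed too.
    for col in range(length + 1):
        gap_free = col < length and all(row[col] != "-" for row in alignment)
        if gap_free and start is None:
            start = col
        elif not gap_free and start is not None:
            if col - start >= min_width:
                blocks.append((start, col))
            start = None
    return blocks


if __name__ == "__main__":
    toy_alignment = ["MKV-LLAGT", "MRV-LIAGS", "MKVALL-GT"]
    print(ungapped_blocks(toy_alignment))
    print(ungapped_blocks(toy_alignment, min_width=2))
```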
iv. Swiss-Prot
• Swiss-Prot is a high-quality, manually annotated, non-redundant protein
sequence database
• Created in 1986
• Developed by the Swiss Institute of Bioinformatics and the European
Bioinformatics Institute
• Provides a high level of annotation, including a description of the function of the
protein and its post-translational modifications
• Aim of the database is to provide all known relevant information about a
particular protein
• http://www.ebi.ac.uk/swissprot/
v. TIGR Database
• It provides a collection of molecular biology databases comprising DNA
and protein sequences, gene expression, function, cellular role etc.
• Maintained at The Institute for Genomic Research (TIGR), which is a part of the J.
Craig Venter Institute in the USA
• http://cmr.tigr.org/tigr-scripts/CMR/CmrHomePage.cgi
RNA databases
RNA databases hold information on ribonucleotide sequences, coding and non-coding RNA
sequences, the functions of RNA molecules and their spatial structures
i. Rfam
• Stores non-coding RNA families
• Rfam also contains multiple sequence alignments and covariance models
• Allows users to view and download multiple sequence alignments, read annotations
and examine the species distribution of family members
• Also provides links to Wikipedia so that entries can be created or edited by users
• https://rfam.xfam.org/
ii. GtRNAdb
• The GtRNAdb database stores genomic tRNA sequences and
secondary structures
• An index of RNA structures provides a lot of information about RNA,
including indexes of the locations of molecular structures in the PDB
database
• Data on RNA sequences can also be found in
GenBank
• http://gtrnadb.ucsc.edu/
Curated databases
• Data are collected by human effort through consultation, verification and
aggregation of existing sources, and by interpreting new raw data
• Machine readable
• Sourced from pre-existing data
• Updates are tracked
• E.g.: Swiss-Prot
Uncurated databases
• Follow automated curation
• Provide quick updates and tend to be larger
• Less accurate
• E.g.: PDB
Functional databases
• Provide information on the physiological role of gene products
• E.g.: enzyme activity
• Collect experimentally derived biological information