Computational Genomics - Bioinformatics - IK
TU-SOFIA
Computer Sciences and Engineering
BIOINFORMATICS
Computational Genomics
Ilgın KAVAKLIOĞULLARI
273213005
CSE III. COURSE
Genomics
Genomics is a discipline in genetics that
applies recombinant DNA, DNA sequencing
methods, and bioinformatics to sequence,
assemble, and analyze the function and
structure of genomes. Advances in genomics
have triggered a revolution in discovery-based
research to understand even the most complex
biological systems such as the brain. The field
includes efforts to determine the entire DNA
sequence of organisms and fine-scale genetic
mapping.
Computational Genomics
Computational genomics
(often referred to as Computational
Genetics) refers to the use of
computational and statistical analysis to
decipher biology from genome
sequences and related data, including
both DNA and RNA sequence as well as
other "post-genomic" data.
Computational Genomics
Because it combines these data with
computational and statistical approaches
to understanding the function of genes,
together with statistical association
analysis, the field is also often referred
to as Computational and Statistical
Genetics/Genomics.
Computational Genomics
As such, computational genomics may be
regarded as a subset of bioinformatics and
computational biology, but with a focus on using
whole genomes (rather than individual genes) to
understand the principles of how the DNA of a
species controls its biology at the molecular
level and beyond. With the current abundance
of massive biological datasets, computational
studies have become one of the most important
means to biological discovery.
History of Computational Genomics
The roots of computational genomics are
shared with those of bioinformatics. During the
1960s, Margaret Dayhoff and others at the National
Biomedical Research Foundation assembled
databases of homologous protein sequences for
evolutionary study. Their research developed a
phylogenetic tree that determined the evolutionary
changes that were required for a particular protein to
change into another protein based on the underlying
amino acid sequences. This led them to create a
scoring matrix that assessed the likelihood of one
protein being related to another.
History of Computational Genomics
Beginning in the 1980s, databases of genome
sequences began to be recorded, but this presented new
challenges in the form of searching and comparing the
databases of gene information. Unlike text-searching algorithms
that are used on websites such as Google or Wikipedia,
searching for sections of genetic similarity requires one to find
strings that are not simply identical, but similar. Such searches
build on the Needleman-Wunsch algorithm (published in 1970), a
dynamic programming algorithm for comparing amino acid
sequences with each other, which can use scoring matrices
derived from the earlier research by Dayhoff. Later, the BLAST
algorithm was developed for performing fast, optimized
searches of gene sequence databases. BLAST and its
derivatives are probably the most widely used algorithms for this
purpose.
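As a rough illustration of the dynamic programming idea behind Needleman-Wunsch, here is a minimal Python sketch of global alignment. It assumes a simple match/mismatch/gap scoring scheme (the values 1, -1 and -2 are illustrative choices, not taken from the slides) instead of a Dayhoff-derived scoring matrix, and the function name is hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align sequences a and b.

    Returns (score, aligned_a, aligned_b). The scoring values are
    illustrative assumptions; a real protein aligner would look up
    substitution scores in a matrix such as PAM instead.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = needleman_wunsch("ACGT", "AGT")
print(score)
print(top)
print(bottom)
```

The table takes time and memory proportional to the product of the two sequence lengths, which is one reason faster heuristic tools such as BLAST are preferred for searching whole databases.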
History of Computational Genomics
The emergence of the phrase "computational
genomics" coincides with the availability of complete sequenced
genomes in the mid-to-late 1990s. The first meeting of the
Annual Conference on Computational Genomics was organized
by scientists from The Institute for Genomic Research (TIGR) in
1998, providing a forum for this speciality and effectively
distinguishing this area of science from the more general fields
of Genomics or Computational Biology. The first use of this term
in scientific literature, according to MEDLINE abstracts, was just
one year earlier in Nucleic Acids Research.
The final Computational Genomics conference was held
in 2006, featuring a keynote talk by Nobel Laureate Barry
Marshall, co-discoverer of the link between Helicobacter pylori
and stomach ulcers. As of 2014, the leading conferences in the
field include Intelligent Systems for Molecular Biology (ISMB)
and RECOMB.
History of Computational Genomics
The development of computer-assisted
mathematics (using products such as Mathematica or
Matlab) has helped engineers, mathematicians and
computer scientists to start working in this domain, and
a public collection of case studies and demonstrations is
growing, ranging from whole genome comparisons to
gene expression analysis. This has brought in ideas from
other fields, including concepts from systems and control,
information theory, string analysis and data mining. It is
anticipated that computational approaches will become
and remain a standard topic for research and teaching,
while the multiple courses created in the past few years
begin to train students fluent in both topics.
Contributions of computational genomics research
to biology
Contributions of computational genomics research to
biology include:
• discovering subtle patterns in genomic sequences
• proposing cellular signaling networks
• proposing mechanisms of genome evolution
• predicting precise locations of all human genes using
comparative genomics techniques with several
mammalian and vertebrate species
• predicting conserved genomic regions that are related to
early embryonic development
• discovering potential links between repeated sequence
motifs and tissue-specific gene expression
• measuring regions of genomes that have undergone
unusually rapid evolution
First Computer Model of an Organism
Researchers at Stanford University
created the first software simulation of an entire
organism. The smallest free-living organism,
Mycoplasma genitalium, has 525 genes which
are fully mapped.
With data from more than 900 scientific
papers reported on the bacterium, researchers
developed the software model using an
object-oriented programming approach.
First Computer Model of an Organism
• A series of modules mimics the various functions of the
cell, and the modules are then integrated into a whole
simulated organism. The simulation runs on a single
CPU and recreates the complete life span of the cell at the
molecular level, reproducing the interactions of
molecules in cell processes including metabolism and
cell division.
• The 'silicon cell' will act as a computerized laboratory
that could perform experiments which are difficult to do
on an actual organism, or could carry out procedures
much faster. Applications will include faster
screening of new compounds and a better understanding
of basic cellular principles and behavior.
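The modular, object-oriented structure described above can be sketched as a toy program. This is only an illustration of the design idea, not the Stanford model: every class name, parameter, and number below is a hypothetical placeholder chosen so that the example runs.

```python
class CellState:
    """Shared state that every module reads and updates (toy quantities)."""

    def __init__(self, nutrients, biomass):
        self.nutrients = nutrients
        self.biomass = biomass
        self.divided = False


class Metabolism:
    """Toy module: converts available nutrients into biomass."""

    def step(self, state):
        uptake = min(state.nutrients, 1.0)  # hypothetical uptake limit per step
        state.nutrients -= uptake
        state.biomass += 0.5 * uptake       # hypothetical conversion yield


class Division:
    """Toy module: marks the cell as divided once biomass reaches a threshold."""

    def __init__(self, threshold=2.0):
        self.threshold = threshold

    def step(self, state):
        if state.biomass >= self.threshold:
            state.divided = True


def simulate(state, modules, max_steps=100):
    """Run all modules once per time step until the cell divides.

    Returns the step at which division occurred, or None if it never did.
    """
    for t in range(1, max_steps + 1):
        for module in modules:
            module.step(state)
        if state.divided:
            return t
    return None


cell = CellState(nutrients=10.0, biomass=1.0)
print(simulate(cell, [Metabolism(), Division()]))
```

Each module only needs a `step` method that updates the shared state, so new cellular processes could be added to the list without changing the simulation loop.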
The Covert Lab incorporated more than 1,900 experimentally
observed parameters into their model of the tiny parasite
Mycoplasma genitalium.
Problems in computational biology
• Permutations
• Graph algorithms
• Pattern matching and discovery
• String similarity
• Clustering
• Optimization
• 3D structure alignment
• Statistical methods, significance
• Randomized algorithms
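To make two of the listed problems concrete (pattern matching and string similarity), here is a small Python sketch that finds every position where a pattern occurs in a sequence with at most a given number of mismatches. Hamming distance is used as the similarity measure; this is one simple choice among many, and the function names and example sequences are hypothetical.

```python
def hamming(s, t):
    """Number of positions at which two equal-length strings differ."""
    if len(s) != len(t):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(s, t))


def approximate_matches(text, pattern, max_mismatches):
    """Start positions where pattern matches text with <= max_mismatches."""
    k = len(pattern)
    return [
        i
        for i in range(len(text) - k + 1)
        if hamming(text[i:i + k], pattern) <= max_mismatches
    ]


print(approximate_matches("ACGTACGA", "ACGT", 1))
```

This brute-force scan compares the pattern against every window of the text, which is fine for short sequences but too slow for genome-scale searches.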
Data Storage
Use computational algorithms to efficiently store
large amounts of biological data.
• Standardize
• Ontologies
• Search for 3D protein structures
Biological Databases
Vast genomic data is freely available online:
• NCBI GenBank (http://ncbi.nih.gov): huge collection of databases,
including a DNA sequence database
• Protein Data Bank (http://www.pdb.org): database of protein tertiary structures
• SWISSPROT (http://www.expasy.org/sprot/): database of annotated protein sequences
• PROSITE (http://kr.expasy.org/prosite): database of protein active site motifs