16th December 2015
Genomics 3.0: Big Data in
Precision Medicine
Asoke K Talukder, Ph.D
InterpretOmics, Bangalore, India
17th December 2009
Big Data Analytics 2015
Hyderabad 16-18 December, 2015
Acknowledgement
• BDA2015 Technical committee
• Authors & Publishers making their articles Open
Access in the Web
• Open Source Software/Foundation
• Authors of Open Source & Open Domain software
• NCBI & other open domain databases
• Wikipedia & other sites that believe in Bhikshu
Economy
Disclaimer
• During my research for this tutorial, I referred to
many texts and presentations available on the
Web and obtained from various colleagues and
professionals. I have tried to give credit to the creators of
artifacts used in this presentation; however, if I
have missed a citation to the original author,
that is unintentional. Such
omissions are regretted.
About the Speaker
• Dr. Asoke K. Talukder is a computer scientist who has worked for
companies like Fujitsu-ICIM, Microsoft, Oracle, Informix, Digital,
Hewlett Packard, ICL, Sequoia, Northern Telecom, NEC,
KredietBank, iGate, Cellnext, etc. He has authored or edited six
books, two of which have been translated into Chinese, and has published
many peer-reviewed research papers. He is the recipient of many
international awards including the All India Radio/Doordarshan award,
ICIM Professional Excellence Award, ICL Excellence Award, IBM
Solutions Excellence Award, Simagine GSMWorld Award etc. He
has been listed in “Who’s Who in the World”, “Who’s Who in
Science and Engineering”, and “Outstanding Scientists of 21st
Century”. He holds an M.Sc. (Physics with Biophysics Major) and a Ph.D. in
Computer Science. He was the DaimlerChrysler Chair Professor at
IIIT, Adjunct Professor, Department of CSE, NIT Warangal and
Adjunct Faculty CE, NITK, Surathkal. He is Co-founder and Chief
Scientific Officer of InterpretOmics, the Data Sciences and Systems
Biology company.
Part I - Introduction
Everyday Newspaper Headlines
Structure of the Tutorial
• Introduction to Omic Sciences
• Omic Sciences Challenges
• Computational Biology
• Algorithms & Data Mining in Biology
• Blood Biopsy – a case study
Goal of this Tutorial
• This tutorial will define the role of Big Data and
Data Sciences in biology and the life sciences. With the
help of chemistry and physics, we have gained some
understanding of biology. With the advancement of
technology, our next leap in biology is becoming
possible. We need mathematics and computers to
solve grand challenges in biology, for a better
understanding of life and of
genomics, the building block of life. This will help
solve real-life problems such as disease management and the
management of food and the environment
Leading causes of death (U.S., 1999)
Rank  Cause                      Number of deaths  % of total deaths
1     heart disease              725,192           30.3
2     malignant neoplasm         549,192           23.0
3     cerebrovascular disease    167,366           7.0
4     chronic lower respiratory  124,181           5.2
5     accidents                  97,860            4.1
6     diabetes mellitus          68,399            2.9
7     influenza, pneumonia       63,730            2.7
8     Alzheimer’s disease        44,536            1.9
9     nephritis & related        35,525            1.5
10    septicemia                 30,680            1.3
11 …  all other                  2,391,399         20.2
Source: National Vital Statistics Reports 49(11):1-87, 2001.
Classification of Disease
Genomics and World Health
• “It is now believed that the information generated by
genomics will, in the long-term, have major benefits for the
prevention, diagnosis and management of many diseases
which hitherto have been difficult or impossible to control.
These include communicable and genetic diseases,
together with other common killers or causes of chronic
illhealth, including cardiovascular disease, cancer, diabetes,
the major psychoses, dementia, rheumatic disease,
asthma, and many others.”
– Genomics and World Health, Report of the Advisory
Committee on Health Research, presented to the
Director-General of WHO on 20 December 2001; Ref - Jeffrey D.
Sachs, WHO, Geneva, 2002
Genomics and Food Chain
• To develop high-nutrient food and high-yield
crops, we need to understand the genetic
structure of plants and the disease vectors.
• We also need genetically modified (GMO)
crops that can grow and
produce in hostile environments such as drought-affected
or highly saline areas
Genomics and Energy
• Most of our energy comes from fossil fuels like
coal and petroleum, which were
converted from living biological
organisms into fuel over millions of years
• Can we culture organisms that will reduce
this cycle to a few years instead of millions of
years?
• Can we generate bio-fuels that will be
economical and commercially viable?
Genomics and Environment
• Can we culture organisms that will help the
carbon cycle and reduce CO2?
• Can we culture organisms or plants that will
desalinate sea water and produce fresh
drinking water?
• Can we culture organisms or plants that will
clean the environment and accelerate the
bio-degradation of waste?
Genetic Components of Disease
Alzheimer’s Disease
Landmark Discoveries
• 1941 Genes code for single proteins
• 1944 Proof that DNA carries genetic information
• 1949 The concept of sickle cell anaemia as a “molecular disease”
• 1953 Structure of insulin determined
• 1953 Multistage mutational theory of cancer by Nordling
• 1953 Field Cancerization theory of cancer
• 1953 Structure of Nucleic Acid and DNA determined
• 1956 Monogenic disease due to a single amino acid substitution of the β-chain of haemoglobin
• 1960 The X-ray crystallographic structure of haemoglobin
• 1961 The genetic code, messenger RNA, gene regulation
• 1972 Recombinant DNA, cloning and gene isolation
• 1974 Direct demonstration of a human gene deletion
• 1975 Southern blotting*
• 1976 Proto-oncogenes
• 1977 DNA sequencing
• 1978 Human gene library
• 1979 Restriction fragment length polymorphism used for prenatal diagnosis; stop codon mutation
demonstrated in human globin messenger RNA; cellular oncogenes
• 1979–81 Human genes cloned and sequenced
• 1985 “Disease genes” isolated by positional cloning; polymerase chain reaction (PCR)
• 2000 The Human Genome Project — completion of 90% draft
Questions Biologists Often Ask
Biologists need answers to a number of questions:
• How can we get all the knowledge that is contained in a
given sequence or structural data?
– analysis
– prediction of certain properties
• How can software tools help in designing drugs and
curing diseases based on available data?
– Tools for the early drug discovery process
– Tools to predict and treat diseases before they manifest
Omic Sciences
• Genomics – is the study of the "basic recipe" book defining an individual’s
characteristics, or those of a population or of a living species
• Transcriptomics – is the science that studies the RNA transcripts through which the
"basic recipes" are read out on the way to the final product: the proteins
• Proteomics – is the study of all proteins produced by expression of the
genome
• Metabolomics – is the study of interactions between proteins and all
"metabolites" (sugar, fat, biomolecules, etc.) of a cell or a biological entity
• Physiomics – is the study of interaction with physiology
• Fluxomics – is the study of dynamic changes of molecules within a cell over
time
• Sociomics – is the study of all social and cultural ecosystems that interact
with the genomes
• Epigenomics – is the study of the influence of the environmental imprint on the "coat"
that covers the genetic material in the genome
• Phenomics – is the study of phenotype
• Bibliomics – is the study of literature
Genomics
• Genomics is the study of the genomes of organisms. The
field includes intensive efforts to determine the entire DNA
sequence of organisms and fine-scale genetic mapping
efforts. The field also includes studies of intragenomic
phenomena such as heterosis, epistasis, pleiotropy and
other interactions between loci and alleles within the
genome. In contrast, the investigation of the roles and
functions of single genes is a primary focus of molecular
biology or genetics and is a common topic of modern
medical and biological research. Research of single genes
does not fall into the definition of genomics unless the aim
of this genetic, pathway, and functional information analysis
is to elucidate its effect on, place in, and response to the
entire genome's networks.
Gene
• With the exception of viruses, which are intracellular parasites, living
organisms are divided into two general classes. First, there are
eukaryotes whose cells have a complex compartmentalized internal
structure; they comprise algae, fungi, plants and animals. Second, there
are prokaryotes, single-celled microorganisms with a simple internal
organization, which comprise bacteria and related organisms. Genetic
information is transferred from one generation to the next by subcellular
structures called chromosomes. Prokaryotes usually have a single
circular chromosome, while most eukaryotes have more than two and in
some cases up to several hundred. For example, in humans there are
23 pairs; one chromosome of each pair is inherited from each parent. Twenty-two pairs
are called autosomes and one pair is called the sex chromosomes. The
latter are designated X and Y; females have two X chromosomes (XX)
while males have an X and Y (XY).
Genetics Vs Genomics
• Genetics is Biology
• Genomics is Statistical Data Mining
• Genetics is Confirmatory
• Genomics is Exploratory
• Genetics is hypothesis driven
• Genomics is hypothesis creating
Genomics 3.0
• Genomics 1.0: started with the Human Genome Project and was used by
academics and researchers to understand disease dynamics and
the genotype-phenotype association of a living system, at a time when
clinicians treated the symptoms of a disease (the phenotype)
• Genomics 2.0: entered the clinic and pharmaceutical companies
through translational genomics. It is used today as a tool for diagnosis
of non-communicable and genetic diseases. Clinicians use Genomics
2.0 not just to treat symptoms, but to treat the disease
• Genomics 3.0: will deal with holistic precision medicine and will be
driven by big-data genomic analytics of the 21st Century. Genomics 3.0
will be used at asymptomatic disease onset. It will not just treat a
disease, but treat a patient and cure a disease
Reduction Vs Integration
What is a System?
• A system is a whole entity made of a set of interacting or
interdependent components that form an integrated whole
• It can be a collection of elements (often called
'components') and relationships among them, which are different from
the relationships of the set or its elements to other elements or sets
• Interdependent components may exhibit only some properties, or
none at all, outside the whole
• When these components are combined, they become a whole
system with static and dynamic properties completely different
from the properties of the individual components
Systems Biology
• Systems Biology is about the integration of modeling,
simulation, experimentation, databases, and
bioinformatic approaches
• Predictive understanding of microbial and plant
systems for advancing clinical medicine, high-yield
crops, high-nutrient produce, biofuel,
biological control of carbon cycling, cleaning up
contaminated environments etc.
The Synergy
Genomics
Transcriptomics
Proteomics
Metabolomics
Fluxomics
Sociomics
Epigenomics
Systems Biology
........
Bibliomics
Model
• Scientific modelling is an activity to make a particular function
or entity of the real world easier to define, quantify, visualize,
understand, or simulate by referencing it to existing and
usually commonly accepted knowledge
• A simulator should be able to model the actual system in
Reduced or Enlarged Space & Time
• Key issues in simulation include representation of the true
characteristics, function, and behaviours of the original
system in a space that can be manipulated or changed as
desired
• However, in many cases the similarity is only approximate or
even intentionally distorted.
Biological System
Ways To Study A System*
Deductive and Inductive Science
Ref: Sylvia Wassertheil-Smoller, Biostatistics and Epidemiology, Springer, 2003
Physical Science
Law of Gravitation,
Newton's Law of Motion
E = mc²
Chemical/Molecular Properties
Statistics
Biological Phenomenon
Simulation (Model fitting)
Wireless Mobile Communication
Clinical Trial
Technical Attractions of
Simulation
• Ability to compress time, expand time
• Ability to control sources of variation
• Avoids errors in measurement
• Ability to stop and review
• Ability to restore system state
• Facilitates replication
• Modeler can control level of detail
Discrete-Event Simulation: Modeling, Programming, and Analysis by G. Fishman, 2001
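Several of the attractions listed above (controlling sources of variation, stopping and restoring state, replication) can be illustrated with a toy example. The following is only a minimal sketch, not a method from the tutorial: the birth-death process, the function name `simulate_counts`, and the seeds are all hypothetical choices made for illustration.

```python
import random

def simulate_counts(rng, steps, start=10):
    """Toy birth-death process (hypothetical): each step makes or degrades one molecule."""
    count = start
    history = [count]
    for _ in range(steps):
        # A fair coin decides production (+1) or degradation (-1); never below zero.
        count = max(0, count + rng.choice((1, -1)))
        history.append(count)
    return history

# Replication and control of variation: the same seed reproduces the same run.
run_a = simulate_counts(random.Random(42), steps=50)
run_b = simulate_counts(random.Random(42), steps=50)

# Stop, review, restore: save the generator state, continue, then rewind and repeat.
rng = random.Random(7)
simulate_counts(rng, steps=20)
checkpoint = rng.getstate()
first_continuation = simulate_counts(rng, steps=10)
rng.setstate(checkpoint)
second_continuation = simulate_counts(rng, steps=10)
```

Here `run_a` and `run_b` are identical because the only source of randomness is the seeded generator, and the two continuations match because the generator state was restored from the checkpoint.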
Simulation System
Part II – Some Biology
Precision Medicine Will Transform
the Health Care Industry
Will impact the health care system significantly:
• Pharmaceuticals
• Biotechnology
• Healthcare industry
• Health insurance
• Medicine: diagnostics, therapy, prevention, wellness
• Nutrition
• Assessments of environmental toxicities
• Academia and medical schools
Healthcare System
New ideas need new organizational structures
Instruments to Decipher Various
Types of Biological Information
Protein interactions: Yeast two-hybrid method
Watson-Crick Model of DNA
• Based on X-ray data from Rosalind Franklin, Watson and Crick recognized that the 3.4
Angstrom period suggested a double helix.
• Based on Chargaff’s rule ([A]=[T] and [C]=[G]), they recognized that the
two strands must be held together by H-bonds between purine and
pyrimidine pairs.
• They accepted the assumption that nucleotides were held together by
phosphodiester bonds with phosphate as the chain backbone.
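Chargaff’s rule follows directly from complementary pairing: if every A on one strand faces a T on the other, and every C faces a G, then the duplex as a whole must have [A]=[T] and [C]=[G]. The sketch below checks this numerically. The function names and the example sequence are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Pairing partners: adenine with thymine, cytosine with guanine.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def base_fractions(sequence):
    """Fraction of each base (A, C, G, T) in a sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[base] for base in "ACGT")
    return {base: counts[base] / total for base in "ACGT"}

def duplex_fractions(strand):
    """Base fractions over both strands of the duplex built from one strand."""
    strand = strand.upper()
    partner = "".join(COMPLEMENT[base] for base in strand)
    return base_fractions(strand + partner)

# Arbitrary example strand; a single strand alone need not obey the rule.
fractions = duplex_fractions("ATGGCGTACCTTAG")
```

For any input strand, `fractions["A"]` equals `fractions["T"]` and `fractions["C"]` equals `fractions["G"]`, because each base on one strand contributes its partner on the other.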
Watson & Crick
• James D. Watson and Francis
Crick, using X-ray data
collected by Rosalind Franklin,
proposed the double helix
structure of the DNA molecule in
1953. Their article, Molecular
Structure of Nucleic Acids: A
Structure for Deoxyribose
Nucleic Acid, is celebrated for its
treatment of the B form of DNA
(B-DNA), and as the source of
Watson-Crick base pairing of
nucleotides. They, with Maurice
Wilkins, were awarded the Nobel
Prize in Physiology or Medicine
in 1962.
The Journal Article that Won the Nobel Prize
Interactions within a Cell
Animal Plant
Nucleus
Ribosome
Endoplasmic Reticulum
Golgi Body
Ribosome: site where proteins are made
Nucleus
Chromosome
DNA
Nucleic Acid
Nucleotide
Inside the Nucleus
Nucleic Acids
• Deoxyribonucleic acid (DNA)
– DNA is found in the nucleus with small amounts
in mitochondria and chloroplasts
• Ribonucleic acid (RNA)
– RNA is found throughout the cell
© 2007 Paul Billiet ODWS
Watson-Crick Model of DNA
• Chains are in an antiparallel
orientation
• Bases are stacked perpendicular
to the helix axis and associate
through hydrogen bonds
• Each turn spans 34 Angstroms and
contains 10 base pairs
• Major and minor grooves
within the helix
• Double helix has a 20
Angstrom diameter
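The two figures on this slide, 34 Angstroms per turn and 10 bases per turn, imply a rise of 3.4 Angstroms per base pair, which matches the 3.4 Angstrom period seen in the X-ray data. A small sketch of that arithmetic follows. The helper names are hypothetical, and it assumes an ideal, straight B-form helix.

```python
RISE_PER_TURN_ANGSTROM = 34.0  # from the slide
BASE_PAIRS_PER_TURN = 10       # from the slide

# Axial rise contributed by one base pair: 34 / 10 = 3.4 Angstroms.
RISE_PER_BASE_PAIR = RISE_PER_TURN_ANGSTROM / BASE_PAIRS_PER_TURN

def helix_length_angstrom(n_base_pairs):
    """Length along the helix axis of an idealized duplex, in Angstroms."""
    return n_base_pairs * RISE_PER_BASE_PAIR

def helix_turns(n_base_pairs):
    """Number of full helical turns in an idealized duplex."""
    return n_base_pairs / BASE_PAIRS_PER_TURN
```

For example, a 1,000 base pair fragment would be about 3,400 Angstroms (340 nm) long and make 100 turns under these assumptions.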
ADDING IN THE BASES
• The bases are attached to the 1st carbon of the sugar
• Their order is important: it determines the
genetic information of the molecule
[Figure: sugar-phosphate backbone (P) with attached bases G, C, A and T]
© 2007 Paul Billiet ODWS
Nucleotide Base Pairing
Nucleotides pair by forming H-bonds between bases. The
pairing is the basis for the antiparallel strands associating with
each other.
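Because the strands are antiparallel, the partner of a strand read 5’ to 3’ is its reverse complement. The following minimal sketch shows this. The function name is hypothetical, and it assumes the input contains only the letters A, C, G and T.

```python
# Translation table for Watson-Crick pairing: A<->T and C<->G.
PAIRING = str.maketrans("ACGT", "TGCA")

def reverse_complement(strand_5_to_3):
    """Return the partner strand, also written 5' to 3'."""
    complement = strand_5_to_3.upper().translate(PAIRING)
    # Reverse so that the partner is also read from its own 5' end.
    return complement[::-1]
```

Applying the function twice returns the original strand, which is a quick sanity check on the pairing table.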
[Figure: single-stranded DNA and double-stranded DNA, with 5’ and 3’ ends labeled]
Proteins play key roles in a living
system
• Three examples of protein functions
– Catalysis:
Almost all chemical reactions in a
living cell are catalyzed by protein
enzymes. Example: alcohol
dehydrogenase oxidizes alcohols
to aldehydes or ketones.
– Transport:
Some proteins transport various
substances, such as oxygen, ions, and
so on. Example: haemoglobin
carries oxygen.
– Information transfer:
For example, hormones. Insulin controls
the amount of sugar in the
blood.
Amino acid: Basic unit of protein
[Figure: an amino acid, with an amino group (NH3+), a carboxylic acid group (COO-), a hydrogen (H) and a side chain (R) attached to the central carbon (C)]
Different side chains, R, determine the
properties of the 20
amino acids.
Proteins are linear polymers of
amino acids
[Figure: two amino acids (side chains R1 and R2) joining through a peptide bond with the release of H2O, and a chain of residues R1, R2, R3 linked by peptide bonds]
The amino acid
sequence is called the
primary structure
A carboxylic acid group
condenses with an amino
group with the release of a
water molecule
Gene is protein’s blueprint,
genome is life’s blueprint
[Figure: the genome (DNA) contains many genes, and each gene encodes a protein]
Gene is protein’s blueprint,
Genome is life’s blueprint
[Figure: the genome contains many genes, each encoding a protein; the proteins are shown alongside the glycolysis network]
Amino acid sequence is
encoded by DNA base sequence
in a gene
First   Second letter                                   Third
letter  G          A          C          T              letter
G       GGG Gly    GAG Glu    GCG Ala    GTG Val        G
        GGA Gly    GAA Glu    GCA Ala    GTA Val        A
        GGC Gly    GAC Asp    GCC Ala    GTC Val        C
        GGT Gly    GAT Asp    GCT Ala    GTT Val        T
A       AGG Arg    AAG Lys    ACG Thr    ATG Met        G
        AGA Arg    AAA Lys    ACA Thr    ATA Ile        A
        AGC Ser    AAC Asn    ACC Thr    ATC Ile        C
        AGT Ser    AAT Asn    ACT Thr    ATT Ile        T
C       CGG Arg    CAG Gln    CCG Pro    CTG Leu        G
        CGA Arg    CAA Gln    CCA Pro    CTA Leu        A
        CGC Arg    CAC His    CCC Pro    CTC Leu        C
        CGT Arg    CAT His    CCT Pro    CTT Leu        T
T       TGG Trp    TAG Stop   TCG Ser    TTG Leu        G
        TGA Stop   TAA Stop   TCA Ser    TTA Leu        A
        TGC Cys    TAC Tyr    TCC Ser    TTC Phe        C
        TGT Cys    TAT Tyr    TCT Ser    TTT Phe        T
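The codon table can be turned into a lookup that reads a gene three bases at a time. The sketch below is one compact way to do that, not the tutorial's own code. It assumes the standard genetic code, a coding-strand DNA sequence starting in frame, and one-letter amino acid codes (for example M for Met, G for Gly, K for Lys) with "*" marking a stop codon. The names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, listed with bases in the order T, C, A, G for the
# first, second and third codon position; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons starting with T
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(coding_dna):
    """Translate an in-frame coding-strand DNA sequence, stopping at the first stop codon."""
    sequence = coding_dna.upper()
    protein = []
    for start in range(0, len(sequence) - 2, 3):
        residue = CODON_TABLE[sequence[start:start + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)
```

For instance, `translate("ATGGGAAAATGA")` reads ATG (Met), GGA (Gly), AAA (Lys) and then the stop codon TGA, giving "MGK".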
Our life is maintained by
molecular network systems
Molecular network
system in a cell
(From ExPASy Biochemical Pathways; http://www.expasy.org/cgi-bin/show_thumbnails.pl?2)
So how can we meaningfully
integrate the data?
Cellular networks:
• GENOME (genes): protein-gene interactions
• PROTEOME: protein-protein interactions
• METABOLISM: bio-chemical reactions (example: Citrate Cycle)
A Real-life System - Reactome
End of Part I & II
InterpretOmics
Office: Shezan Lavelle, 5th Floor,
#15 Walton Road, Bengaluru 560001
Lab: #329, 7th Main, HAL 2nd Stage,
Indiranagar, Bengaluru 560008
Phone: +91(80)46623800

Bda2015 tutorial-part1-intro

  • 1.
    16th December 2015 Genomics3.0: Big Data in Precision Medicine Asoke K Talukder, Ph.D InterpretOmics, Bangalore, India 17th December 2009 Big Data Analytics 2015 Hyderabad 16-18 December, 2015
  • 2.
    16th December 2015 Acknowledgement •BDA2015 Technical committee • Authors & Publishers making their articles Open Access in the Web • Open Source Software/Foundation • Authors of Open Source & Open Domain software • NCBI & other open domain databases • Wikipedia & other sites that believe in Bhikshu Economy 2
  • 3.
    16th December 20153 Disclaimer • During my research for this tutorial, I have referred many text and many presentations available in the Web and obtained from various colleagues and professionals. I tried to give credit to creators of artifacts used in this presentation; however, if I have missed credit citation to the original author, that is undeliberate and unintentional. Such omissions are regretted.
  • 4.
    16th December 2015 Aboutthe Speaker • Dr. Asoke K. Talukder is a computer scientist – worked for companies like Fujitsu-ICIM, Microsoft, Oracle, Informix, Digital, Hewlett Packard, ICL, Sequoia, Northern Telecom, NEC, KredietBank, iGate, Cellnext, etc. Dr. Asoke authored/edited six books out of which two are translated in Chinese and published many peer-reviewed research papers. He is recipient of many international awards including All India Radio/Doordarshan award, ICIM Professional Excellence Award, ICL Excellence Award, IBM Solutions Excellence Award, Simagine GSMWorld Award etc. He has been listed in “Who’s Who in the World”, “Who’s Who in Science and Engineering”, and “Outstanding Scientists of 21st Century”. He did M.Sc (Physics with Biophysics Major) and Ph.D in Computer Science. He was the DaimlerChrysler Chair Professor at IIIT, Adjunct Professor, Department of CSE, NIT Warangal and Adjunct Faculty CE, NITK, Surathkal. He is Co-founder and Chief Scientific Officer of InterpretOmics the Data Sciences and Systems Biology company. 4
  • 5.
    16th December 2015 PartI - Introduction
  • 6.
    16th December 2015 EverydayNewspaper Headlines 6
  • 7.
    16th December 2015 Structureof the Tutorial • Introduction to Omic Sciences • Omic Sciences Challenges • Computational Biology • Algorithms, & Data Mining in Biology • Blood Biopsy – a case study 7
  • 8.
    16th December 2015 Goalof this Tutorial • This tutorial will define the role of Big Data and Data Sciences in biology and lifesciences. With the help of chemistry and physics, we have some understanding of biology. With advancement of technology, our next leap in biology is becoming possible. We need Mathematics and Computers to solve grand challenges in Biology for better understanding of life and understanding of genomics – the building block of life. This will help solve problems in life like diseases management or management of food and environment 8
  • 9.
    16th December 2015 Leadingcauses of death (U.S., 1999) number of % total Rank Cause deaths deaths 1 heart disease 725,192 30.3 2 malignant neoplasm 549,192 23.0 3 cerebrovascular disease 167,366 7.0 4 chronic lower respiratory 124,181 5.2 5 accidents 97,860 4.1 6 diabetes mellitus 68,399 2.9 7 influenza, pneumonia 63,730 2.7 8 Alzheimer’s disease 44,536 1.9 9 nephritis & related 35,525 1.5 10 septicemia 30,680 1.3 11 … all other 2,391,39920.2 Source: National Vital Statistics Reports 49(11):1-87, 2001. Classification of Disease 9
  • 10.
    16th December 2015 Genomicsand World Health • “It is now believed that the information generated by genomics will, in the long-term, have major benefits for the prevention, diagnosis and management of many diseases which hitherto have been difficult or impossible to control. These include communicable and genetic diseases, together with other common killers or causes of chronic illhealth, including cardiovascular disease, cancer, diabetes, the major psychoses, dementia, rheumatic disease, asthma, and many others.” – Genomics and World Health, Report of the Advisory Committee on Health Research, presented to Director general of WHO on 20 December 2001; Ref - Jeffrey D. Sachs, WHO, Geneva, 2002 10
  • 11.
    16th December 2015 Genomicsand Food Chain • To develop high nutrient food and high yield crop, we need to understand the genetic structure of plants and the disease vectors. • We also need GMO (Genetically Modified Organisms) crops that can grow and produce in hostile environments like drought affected or high salineted areas 11
  • 12.
    16th December 2015 Genomicsand Energy • All our energy come from fossil fuels like coal and petroleum, which has been converted from some living biological organism to fuel for millions of years • Can we culture organisms that will reduce this cycle to few years instead of millions of years • Can we generate bio-fuels that will be economic and commercially viable? 12
  • 13.
    16th December 2015 Genomicsand Environment • Can we culture organisms that will help the carbon cycle and reduce the CO2? • Can we culture organisms or plants that will desalinate the sea water and produce sweet drinking water? • Can we culture organisms or plans that will clean the environment and accelerate the bio-degradability of waste? 13
  • 14.
    16th December 2015 GeneticComponents of Disease Alzheimer’s Disease 14
  • 15.
    16th December 2015 LandmarkDiscoveries • 1941 Genes code for single proteins • 1944 Proof that DNA carries genetic information • 1949 The concept of sickle cell anaemia as a “molecular disease” • 1953 Structure of insulin determined • 1953 Multistage mutational theory of cancer by Nordling • 1953 Field Cancerization theory of cancer • 1953 Structure of Neuclic Acid and DNA determined • 1956 Monogenic disease due to a single amino acid substitution of the β-chain of haemoglobin • 1960 The X-ray crystallographic structure of haemoglobin • 1961 The genetic code, messenger RNA, gene regulation • 1972 Recombinant DNA, cloning and gene isolation • 1974 Direct demonstration of a human gene deletion • 1975 Southern blotting* • 1976 Proto-oncogenes • 1977 DNA sequencing • 1978 Human gene library • 1979 Restriction fragment length polymorphism used for prenatal diagnosis Stop codon mutation demonstrated in human globin messenger RNA Cellular oncogenes • 1979–81 Human genes cloned and sequenced • 1985 “Disease genes” isolated by positional cloning Polymerase chain reaction (PCR) • 2000 The Human Genome Project — completion of 90% draft 15
  • 16.
    16th December 2015 QuestionsBiologists Often Ask Biologists need answers to a number of questions How can we get all the knowledge that are contained in a given sequence or structural data analysis prediction of certain properties How can software tools help in designing drugs and cure diseases based on available data Tools for early drug discovery process Tools to predict and treat before they manifest 16
  • 17.
    16th December 2015 OmicSciences • Genomics – is the "basic recipe" book defining an individual’s characteristics, or that of a population or of a living species • Transcriptomics – is the science that studies how the "basic recipes" are translated into a final product: the proteins • Proteomics – is the study of all proteins produced by the genome expression • Metabolomics – is the the study of interactions between proteins and all "metabolites" (sugar, fat, biomolecules, etc.) – of a cell or a biological entity • Physiomics – is the study of interaction with physiology • Fluxomics – is the study of dynamic changes of molecules within a cell over time. • Sociomics – is the study of all social and cultural ecosystems that interact with the genomes • Epigenomics – is the influence of the environmental imprint on the "coat" that covers the genetic material in the genome • Phenomics – is the study of phenotype • Bibliomics – is the study of literature 17
  • 18.
    16th December 2015 Genomics •Genomics is the study of the genomes of organisms. The field includes intensive efforts to determine the entire DNA sequence of organisms and fine-scale genetic mapping efforts. The field also includes studies of intragenomic phenomena such as heterosis, epistasis, pleiotropy and other interactions between loci and alleles within the genome. In contrast, the investigation of the roles and functions of single genes is a primary focus of molecular biology or genetics and is a common topic of modern medical and biological research. Research of single genes does not fall into the definition of genomics unless the aim of this genetic, pathway, and functional information analysis is to elucidate its effect on, place in, and response to the entire genome's networks. 18
  • 19.
    16th December 2015 Gene •With the exception of viruses, which are intracellular parasites, living organisms are divided into two general classes. First, there are eukaryotes whose cells have a complex compartmentalized internal structure; they comprise algae, fungi, plants and animals. Second, there are prokaryotes, single-celled microorganisms with a simple internal organization, which comprise bacteria and related organisms. Genetic information is transferred from one generation to the next by subcellular structures called chromosomes. Prokaryotes usually have a single circular chromosome, while most eukaryotes have more than two and in some cases up to several hundred. For example, in humans there are 23 pairs; one of the pair is inherited from each parent. Twenty-two pairs are called autosomes and one pair are called sex chromosomes. The latter are designated X and Y; females have two X chromosomes (XX) while males have an X and Y (XY). 19
  • 20.
    16th December 2015 GeneticsVs Genomics • Genetics is Biology • Genomics is Statistical Data Mining • Genetics is Confirmatory • Genomics is Expolratory • Genetics is hypothesis driven • Genomics is hypothesis creating 20
  • 21.
    16th December 2015 Genomics3.0 • Genomics 1.0: started with the Human genome project, used by academics and researchers to understand the disease dynamics and the genotype phenotypic association of a living system at a time when clinicians treat the symptom of a disease (phenotype) • Genomics 2.0: entered the clinic and pharmaceutical companies through translational genomics. It is used today as a tool for diagnosis of non-communicable and genetic diseases. Clinicians use Genomics 2.0 to not just treat symptoms; but, to treat the disease • Genomics 3.0: will deal with holistic precision medicine and will be driven by big-data genomic analytics of the 21st Century. Genomics 3.0 will be used for asymptomatic disease onset. It will not just treat a disease, but treat a patient and cure a disease
  • 22.
  • 23.
    16th December 2015 Whatis a System? • A system is a whoesome entity made out of set of interacting or interdependent components forming an integrated whole object • It can be collection of a set of elements (often called 'components') and relationships which are different from relationships of the set or its elements to other elements or sets • Interdependent components may have some property or even cannot exibit any property outside the wholesome object • These components when combined, it becomes a wholesome system with a static and dynamic property completely different from the properties of individual components 23
  • 24.
    16th December 2015 SystemsBiology • Systems Biology Is about integration of modeling, simulation, experimentation, databases, and bioinformatic approaches • Predictive understanding of microbial and plant systems for advancing for clinical medicine, high yield crops, hight nutriant produce, biofuel, biological sontrol on carbon-cycling, cleaning up contaminated environment etc. • integration of modeling, simulation, experimentation, and bioinformatic approaches 24
  • 25.
    16th December 2015 TheSynergy Genomics Transcriptomics Proteomics Metabolomics Fluxomics Sociomics Epigenomics Systems Biology ........ Bibliomics 25
  • 26.
    16th December 2015 Model •Scientific modelling is an activity to make a particular function or entity of the real world easier to define, quantify, visualize, understand, or simulate by referencing it to existing and usually commonly accepted knowledge • A simulator should be able to model the actual system in Reduced or Enlarged Space & Time • Key issues in simulation include representation of the true characteristics, function, and behaviours of the original system in a space that can be manipulated or changed as desired • However, in many cases the similarity is only approximate or even intentionally distorted. 26
  • 27.
  • 28.
    16th December 2015 WaysTo Study A System* 28
  • 29.
    16th December 2015 Deductiveand Inductive Science Ref: Sylvia Wassertheil-Smoller, Biostatistics and Epidemiology, Springer, 2003 Physical Science Law of Gravitation, Newton's Law of Motion E = mC2 Chemical/Molecular Properties Statistics Biological Phenomenon Simulation (Model fitting) Wireless Mobile Communication Clinical Trial 29
  • 30.
    Technical Attractions of Simulation • Ability to compress time, expand time • Ability to control sources of variation • Avoids errors in measurement • Ability to stop and review • Ability to restore system state • Facilitates replication • Modeler can control level of detail Ref: G. Fishman, Discrete-Event Simulation: Modeling, Programming, and Analysis, 2001
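A minimal sketch of a few of these attractions: controlling the source of variation, replication, and the ability to stop, review, and restore state. The `ToyPopulation` model, its probabilities, and every name in it are hypothetical and chosen only for illustration; they are not taken from Fishman's text.

```python
import random


class ToyPopulation:
    """Toy stochastic growth model (hypothetical, for illustration only)."""

    def __init__(self, size=10, seed=0):
        self.size = size
        self.time = 0
        # A seeded generator is the single, controlled source of variation.
        self.rng = random.Random(seed)

    def step(self):
        """Advance one simulated time unit."""
        # Assumed rates: each individual divides with p=0.10, dies with p=0.05.
        births = sum(self.rng.random() < 0.10 for _ in range(self.size))
        deaths = sum(self.rng.random() < 0.05 for _ in range(self.size))
        self.size = max(self.size + births - deaths, 0)
        self.time += 1

    def snapshot(self):
        """Capture the full system state so the run can be reviewed later."""
        return (self.size, self.time, self.rng.getstate())

    def restore(self, snap):
        """Return the system to a previously captured state."""
        self.size, self.time, rng_state = snap
        self.rng.setstate(rng_state)


def run(model, steps):
    """Run the model for a number of simulated time units; return final size."""
    for _ in range(steps):
        model.step()
    return model.size


# Replication: two runs with the same seed follow the same trajectory.
a, b = ToyPopulation(seed=42), ToyPopulation(seed=42)
print(run(a, 50) == run(b, 50))

# Stop, review, restore: replaying from a snapshot reproduces the outcome.
snap = a.snapshot()
first = run(a, 20)
a.restore(snap)
print(run(a, 20) == first)
```

Fifty simulated time units run in a negligible amount of real time, which is time compression in its simplest form.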
  • 31.
  • 32.
    Part II – Some Biology
  • 33.
    Precision Medicine Will Transform the Health Care Industry • Will impact the health care system significantly: Pharmaceuticals; Biotechnology; Healthcare industry; Health insurance; Medicine (diagnostics, therapy, prevention, wellness); Nutrition; Assessments of environmental toxicities; Academia and medical schools • Healthcare system: new ideas need new organizational structures
  • 34.
    Instruments to Decipher Various Types of Biological Information
  • 35.
    Protein interactions: Yeast two-hybrid method
  • 36.
    Watson-Crick Model of DNA • Based on X-ray data from Rosalind Franklin, recognized that the 3.4 Angstrom period suggested a double helix. • Based on Chargaff’s rule ([A]=[T] and [C]=[G]), recognized that the two strands must be held together by H-bonds between purine and pyrimidine pairs. • Accepted the assumption that nucleotides were held together by phosphodiester bonds, with phosphate as the chain backbone.
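Chargaff's rule can be checked on a toy duplex. The sketch below assumes sequences are plain upper-case strings over A, C, G, T; the function names are illustrative only.

```python
from collections import Counter

# Watson-Crick partners: each purine (A, G) pairs with a pyrimidine (T, C).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def duplex_base_counts(strand):
    """Count bases over both strands of the duplex formed by `strand`."""
    partner = "".join(COMPLEMENT[base] for base in strand)
    return Counter(strand + partner)


def satisfies_chargaff(counts):
    """True when [A] == [T] and [C] == [G]."""
    return counts["A"] == counts["T"] and counts["C"] == counts["G"]


# The rule always holds for a fully base-paired duplex ...
print(satisfies_chargaff(duplex_base_counts("GCCATT")))
# ... but it need not hold for one strand taken alone.
print(satisfies_chargaff(Counter("GCCATT")))
```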
  • 37.
    Watson & Crick • James D. Watson and Francis Crick, using X-ray data collected by Rosalind Franklin, proposed the double helix structure of the DNA molecule in 1953. Their article, Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid, is celebrated for its treatment of the B form of DNA (B-DNA), and as the source of Watson-Crick base pairing of nucleotides. They, with Maurice Wilkins, were awarded the Nobel Prize in Physiology or Medicine in 1962.
  • 38.
    The Journal Article that Won the Nobel Prize
  • 39.
    Interactions within a Cell (figure labels: Animal, Plant, Nucleus, Ribosome, Endoplasmic Reticulum, Golgi Body) • Ribosome: site where proteins are made
  • 40.
    Inside the Nucleus (figure labels: Nucleus, Chromosome, DNA, Nucleic Acid, Nucleotide)
  • 41.
    Nucleic Acids • Deoxyribonucleic acid (DNA) – DNA is found in the nucleus, with small amounts in mitochondria and chloroplasts • Ribonucleic acid (RNA) – RNA is found throughout the cell © 2007 Paul Billiet ODWS
  • 42.
    Watson-Crick Model of DNA • Chains are in an antiparallel orientation • Bases stack perpendicular to the helix axis and associate through hydrogen bonds • Each turn spans 34 Angstroms and contains 10 base pairs (3.4 Angstroms per base pair) • Major and minor grooves within the helix • Double helix has a 20 Angstrom diameter
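The helix dimensions on this slide translate directly into simple arithmetic. The sketch below assumes the idealized figures quoted above (34 Angstroms and 10 base pairs per turn); the function names are illustrative.

```python
# Idealized helix geometry as quoted on the slide.
ANGSTROM_PER_TURN = 34.0
BP_PER_TURN = 10
RISE_PER_BP_ANGSTROM = ANGSTROM_PER_TURN / BP_PER_TURN  # 3.4 Angstroms


def helix_turns(n_bp):
    """Number of helical turns spanned by n_bp base pairs."""
    return n_bp / BP_PER_TURN


def helix_length_angstrom(n_bp):
    """End-to-end length along the helix axis, in Angstroms."""
    return n_bp * RISE_PER_BP_ANGSTROM


def helix_length_nm(n_bp):
    """Same length in nanometres (1 nm = 10 Angstroms)."""
    return helix_length_angstrom(n_bp) / 10.0


# A 1000 bp fragment spans 100 turns and is about 340 nm long.
print(helix_turns(1000), round(helix_length_nm(1000), 1))
```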
  • 43.
    ADDING IN THE BASES • The bases are attached to the 1st carbon of the sugar • Their order is important: it determines the genetic information of the molecule (figure labels: P P P P P P; G C C A T T) © 2007 Paul Billiet ODWS
  • 44.
    Nucleotide Base Pairing • Nucleotides pair by forming H-bonds between bases. • The pairing is the basis for the antiparallel strands associating with each other.
  • 45.
    Single-Stranded DNA and Double-Stranded DNA (figure; strand ends labelled 5’ and 3’)
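Because the two strands are antiparallel, the partner of a strand written 5’ to 3’ is its reverse complement when it is also written 5’ to 3’. A minimal sketch, assuming upper-case A/C/G/T input; the function name is illustrative.

```python
# Watson-Crick pairing rules for DNA.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand_5_to_3):
    """Return the partner strand, also written 5' to 3'."""
    # Complement each base, then reverse, because the partner runs the other way.
    return "".join(COMPLEMENT[base] for base in reversed(strand_5_to_3))


print(reverse_complement("GCCATT"))  # prints AATGGC
```

Applying the function twice returns the original strand, which is a quick sanity check on the pairing table.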
  • 46.
    Proteins play key roles in a living system • Three examples of protein functions – Catalysis: Almost all chemical reactions in a living cell are catalyzed by protein enzymes. Example: alcohol dehydrogenase oxidizes alcohols to aldehydes or ketones. – Transport: Some proteins transport various substances, such as oxygen and ions. Example: haemoglobin carries oxygen. – Information transfer: For example, hormones. Insulin controls the amount of sugar in the blood.
  • 47.
    Amino acid: Basic unit of protein • An amino acid (figure): a central carbon (C) bonded to an amino group (NH3+), a carboxylic acid group (COO-), a hydrogen (H), and a side chain (R) • Different side chains, R, determine the properties of the 20 amino acids.
  • 48.
    Proteins are linear polymers of amino acids • A carboxylic acid group condenses with an amino group, releasing a water molecule (H2O) and forming a peptide bond (figure: free amino acids with side chains R1 and R2, and a chain NH3+–CH(R1)–CO–NH–CH(R2)–CO–NH–CH(R3)–CO joined by peptide bonds) • The amino acid sequence is called the primary structure (example: A A F N G G S T S D K)
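Since each peptide bond releases one water molecule, a linear chain of n residues releases n - 1 waters, and its mass is the sum of the free amino acid masses minus that water. The sketch below is illustrative only: the masses are approximate average molecular weights in g/mol, rounded to two decimals, listed just for the residues in the example sequence above, and the function names are hypothetical.

```python
# Approximate average molecular weights of free amino acids (g/mol).
# Only the residues used in the slide's example sequence are listed.
FREE_AMINO_ACID_MASS = {
    "A": 89.09,   # alanine
    "D": 133.10,  # aspartic acid
    "F": 165.19,  # phenylalanine
    "G": 75.07,   # glycine
    "K": 146.19,  # lysine
    "N": 132.12,  # asparagine
    "S": 105.09,  # serine
    "T": 119.12,  # threonine
}
WATER_MASS = 18.015  # g/mol, approximate


def waters_released(sequence):
    """One water per peptide bond: n residues form n - 1 bonds."""
    return max(len(sequence) - 1, 0)


def peptide_mass(sequence):
    """Mass of a linear peptide: free residue masses minus released water."""
    total = sum(FREE_AMINO_ACID_MASS[residue] for residue in sequence)
    return total - waters_released(sequence) * WATER_MASS


print(waters_released("AAFNGGSTSDK"))  # prints 10
print(round(peptide_mass("GA"), 2))
```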
  • 49.
    Gene is protein’s blueprint, genome is life’s blueprint (figure labels: DNA; Genome, containing many Genes; many Proteins)
  • 50.
    Gene is protein’s blueprint, genome is life’s blueprint (figure labels: Genome, containing many Genes; many Proteins; Glycolysis network)
  • 51.
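    Amino acid sequence is encoded by DNA base sequence in a gene
    Codon table (each codon written as first letter, second letter, third letter):
    First letter G • Second letter G: GGG, GGA, GGC, GGT = Gly • Second letter A: GAG, GAA = Glu; GAC, GAT = Asp • Second letter C: GCG, GCA, GCC, GCT = Ala • Second letter T: GTG, GTA, GTC, GTT = Val
    First letter A • Second letter G: AGG, AGA = Arg; AGC, AGT = Ser • Second letter A: AAG, AAA = Lys; AAC, AAT = Asn • Second letter C: ACG, ACA, ACC, ACT = Thr • Second letter T: ATG = Met; ATA, ATC, ATT = Ile
    First letter C • Second letter G: CGG, CGA, CGC, CGT = Arg • Second letter A: CAG, CAA = Gln; CAC, CAT = His • Second letter C: CCG, CCA, CCC, CCT = Pro • Second letter T: CTG, CTA, CTC, CTT = Leu
    First letter T • Second letter G: TGG = Trp; TGA = Stop; TGC, TGT = Cys • Second letter A: TAG, TAA = Stop; TAC, TAT = Tyr • Second letter C: TCG, TCA, TCC, TCT = Ser • Second letter T: TTG, TTA = Leu; TTC, TTT = Phe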
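The codon table can be applied mechanically: read the DNA three letters at a time and look up each codon. The sketch below builds the standard genetic code from the commonly used compact layout (bases in T, C, A, G order, one-letter amino acid codes, `*` for Stop). The function name and the choice to stop at the first Stop codon are assumptions made for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; '*' marks Stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons starting with T
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding DNA sequence (5' to 3') into one-letter protein code."""
    protein = []
    # Step through complete codons only; trailing 1-2 bases are ignored.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # Stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTAA"))  # prints MA (Met, Ala, then Stop)
```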
  • 52.
    Our life is maintained by molecular network systems • Molecular network system in a cell (From ExPASy Biochemical Pathways; http://www.expasy.org/cgi-bin/show_thumbnails.pl?2)
  • 53.
    So how can we meaningfully integrate the data?
  • 54.
  • 55.
    A Real-life System - Reactome
  • 56.
    End of Part I & II • InterpretOmics • Office: Shezan Lavelle, 5th Floor, #15 Walton Road, Bengaluru 560001 • Lab: #329, 7th Main, HAL 2nd Stage, Indiranagar, Bengaluru 560008 • Phone: +91(80)46623800