bioinformatics simple


Bioinformatics
• The science of collecting, analyzing, and conceptualizing biological data by applying informatics techniques.
• Bioinformatics = Biology + Informatics
• Biological data + computer analysis
• Mouse genome: 2.5 billion base pairs
• Human genome: 3 billion base pairs


Goals of Bioinformatics
• Manage biological information
• Organize biological information using databases
• Process, analyze, and visualize biological data
• Share biological information with the public over the Internet


Definition
• Bio-informatics
• Bioinformatics is conceptualizing biology in terms of molecules (in the sense of physical chemistry) and applying “informatics” techniques (derived from disciplines such as applied math, computer science, and statistics) to understand and organize the information associated with these molecules on a large scale.
• Bioinformatics is a practical discipline with many applications.


Bioinformatics
[Figure: bioinformatics and related fields: computational biology, systems biology, genomics]


Biological Information
• Central Dogma of Molecular Biology
  DNA -> RNA -> Protein -> Phenotype -> DNA
• Molecules
  • Sequence, Structure, Function, Interaction
• Processes
  • Mechanism, Specificity, Regulation
• Central Paradigm for Bioinformatics
  Genomic Sequence Information -> mRNA (level) -> Protein Sequence -> Protein Structure -> Protein Function -> Protein Interaction -> Phenotype
• Large Amounts of Information
  • Statistical
  • Computer Processing
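
The first two arrows of the central dogma (DNA -> RNA -> Protein) can be sketched in a few lines of Python. This is only an illustration: the function names and the example sequence are assumptions made for this sketch, and the input is assumed to be the coding strand, read in frame from its first base.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order ("*" = stop).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def transcribe(coding_dna):
    """DNA -> RNA: copy the coding strand, replacing T with U."""
    return coding_dna.upper().replace("T", "U")

def translate(mrna):
    """RNA -> protein: read codons from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

# Hypothetical toy sequence: ATG GCC AAG TAA
mrna = transcribe("ATGGCCAAGTAA")
protein = translate(mrna)
```

Real transcripts also involve splicing and regulation, which this sketch ignores.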


Systems Analysis
Information Theory
Graph Theory
Robotics
Algorithms
Artificial Intelligence
Statistics


Domains of bioinformatics
• Bioinformaticists: develop new software and algorithms
• Bioinformaticians: use different algorithms and computer software


Human genome project
• Could not have been achieved without bioinformatics
• Goals
  • Sequence all 3 billion DNA subunits
  • Discover all the human genes
  • Make them accessible for further biological study
• Then?
  • Need to bring together and store vast amounts of information from
    • Lab equipment and experiments
    • Computer analysis
    • Human analysis
  • Make it visible to the world’s scientists


How to analyze information
• Data
  – Management
  – Analysis
  – Derive a hypothesis
  – Design and implement an in silico experiment
  – Confirm in the wet lab


Why bioinformatics
1. Find an answer quickly
  • Most in silico biology is faster than in vitro
2. Massive amounts of data to analyze
  • Need to make use of all information
  • Not possible to do the analysis by hand
  • Can’t organize and store information using only lab notebooks
  • Automation is key
• However: verification?


Applications
1. Computational biology
• Computing methods for classical biology
• Primarily concerned with evolutionary, population, and theoretical biology
• Cellular/molecular biology?
2. Medical informatics
• Computing methods to improve communication, understanding, and management of medical data
• Data manipulation


Applications (continued)
3. Chemoinformatics
• Chemical and biological technology for drug design and development
4. Genomics
• Analysis and comparison of the entire genome of a single species or of multiple species
• Genomics existed before any genomes were completely sequenced, but in a very primitive state


Applications (continued)
5. Proteomics
• Study of how the genome is expressed in proteins, and of how these proteins function and interact
• Concerned with the actual states of specific cells, rather than the potential states described by the genome
6. Pharmacogenomics
• The application of genomic methods to identify drug targets
• For example, searching entire genomes for potential drug receptors, or studying gene expression patterns in tumors


7. Pharmacogenetics
• The use of genomic methods to determine what causes variations in individual response to drug treatments
• The goal is to identify drugs that may only be effective for subsets of patients, or to tailor drugs for specific individuals or groups


The “post-genomics” era
Main goal: ?
• Annotation
• Comparative genomics
• Structural genomics
• Functional genomics

Annotation
• Identify the genes within a given sequence of DNA
• Identify the sites which regulate the gene
• Predict the function


How do we identify a gene in a genome?
• A gene is characterized by several features (promoter, ORF…)
• Some are easier and some harder to detect…
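
Of the gene features mentioned here, an open reading frame (ORF) is among the easier ones to detect. The following is a naive sketch, not a real gene finder: it scans only the three forward reading frames for an ATG followed by an in-frame stop codon. The function name `find_orfs`, the `min_codons` parameter, and the test sequence are assumptions made for this illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=2):
    """Return (start, end) index pairs of ORFs on the forward strand.

    An ORF here runs from an ATG to the first in-frame stop codon.
    `end` is exclusive and includes the stop codon. `min_codons` counts
    the codons before the stop, including the ATG itself.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)

# Hypothetical toy sequence with one ORF in the third reading frame.
sequence = "CCATGAAATTTTAGGG"
orfs = find_orfs(sequence)
orf_sequences = [sequence[s:e] for s, e in orfs]
```

Promoters, introns, and the reverse strand are deliberately left out, which is part of why some features are harder to detect than others.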


How similar are humans and chimps?
Comparison between the full drafts of the human and chimp genomes revealed that they differ by only 1.23%.
Perhaps not surprising!


So where are we different?
Human ATAGCGGGGGGATGCGGGCCCTATACCC
Chimp ATAGGGG--GGATGCGGGCCCTATACCC
Mouse ATAGCG---GGATGCGGCGC-TATACCA
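
One simple way to answer "where are we different" for sequences that are already aligned is to walk the alignment column by column and count matches, mismatches, and gaps. The sketch below uses the three aligned sequences from this slide, written with the gap characters closed up so every row has the same length. The function name `compare_aligned` is an assumption made for this illustration.

```python
def compare_aligned(a, b):
    """Count (matches, mismatches, gap columns) for two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    matches = mismatches = gaps = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            gaps += 1
        elif x == y:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, gaps

human = "ATAGCGGGGGGATGCGGGCCCTATACCC"
chimp = "ATAGGGG--GGATGCGGGCCCTATACCC"
mouse = "ATAGCG---GGATGCGGCGC-TATACCA"

human_vs_chimp = compare_aligned(human, chimp)
human_vs_mouse = compare_aligned(human, mouse)
```

On this short example the chimp row differs from the human row in fewer columns than the mouse row does.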

The three-dimensional structure of a protein can tell much more than the sequence alone
• Protein-ligand complexes
• Functional sites
• Fold
• Evolutionary relationship
• Shape and electrostatics
• Active sites
• Protein complexes
• Biological processes


Resources and Databases
The different types of data are collected in databases:
• Sequence databases
• Structural databases
• Databases of experimental results
All databases are connected.


Sequence databases
• Gene databases
• Genome databases
• Disease-related mutation databases


Structure Databases
• 3-dimensional structures of proteins, nucleic acids, molecular complexes, etc.
• 3-D data is available thanks to techniques such as NMR and X-ray crystallography


Databases of Experimental Results
• Data such as experimental microarray images: gene expression data
• Proteomic data: protein expression data
• Metabolic pathways, protein-protein interaction data, regulatory networks


Literature Databases
PubMed
Service of the National Library of Medicine
http://www.ncbi.nlm.nih.gov/pubmed/


Putting it all Together
• Each database contains specific information
• Like biological systems, these databases are interrelated

GENOMIC DATA: GenBank, DDBJ, EMBL
ASSEMBLED GENOMES: GoldenPath, WormBase, TIGR
PROTEIN: PIR, SWISS-PROT
STRUCTURE: PDB, MMDB, SCOP
LITERATURE: PubMed
PATHWAY: KEGG, COG
DISEASE: LocusLink, OMIM, OMIA
GENES: RefSeq, AllGenes, GDB
SNPs: dbSNP
ESTs: dbEST, UniGene
MOTIFS: BLOCKS, Pfam, Prosite
GENE EXPRESSION: Stanford MGDB, NetAffx, ArrayExpress


Application I: Genomics
• Finding Genes in Genomic DNA
  • Introns
  • Exons
  • Promoters
• Characterizing Repeats in Genomic DNA
  • Statistics
  • Patterns
• Expression Analysis
  • Time Course Clustering
  • Identifying Regulatory Regions
  • Measuring Differences
• Genome Comparisons
  • Ortholog Families
  • Genome Annotation
  • Evolutionary Phylogenetic Trees
• Characterizing Intergenic Regions
  • Finding Pseudogenes
  • Patterns
• Duplications in the Genome
  • Large-scale Genomic Alignment
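
As a small illustration of the "statistics" and "patterns" side of characterizing repeats, one can count how often each short substring (k-mer) occurs in a sequence. This is a sketch under simple assumptions: exact matches only, forward strand only. The function names and the example sequence are made up for this illustration.

```python
from collections import Counter

def kmer_counts(dna, k):
    """Count every overlapping substring of length k."""
    dna = dna.upper()
    return Counter(dna[i:i + k] for i in range(len(dna) - k + 1))

def repeated_kmers(dna, k, min_count=2):
    """Keep only the k-mers seen at least `min_count` times."""
    return {
        kmer: count
        for kmer, count in kmer_counts(dna, k).items()
        if count >= min_count
    }

# Hypothetical toy sequence: a short dinucleotide repeat.
repeats = repeated_kmers("ATATAT", 2)
```

Real repeat analysis also has to handle inexact copies and both strands, which a plain counter does not.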


Application II: Protein Sequence
• Sequence Alignment
  • Non-exact string matching, gaps
  • How to align two strings optimally via Dynamic Programming
  • Local vs Global Alignment
  • Suboptimal Alignment
  • Hashing to increase speed (BLAST, FASTA)
  • Amino acid substitution scoring matrices
• Multiple Alignment and Consensus Patterns
  • How to align more than one sequence and then fuse the result in a consensus representation
  • Transitive Comparisons
  • HMMs, Profiles
  • Motifs
• Scoring Schemes and Matching Statistics
  • How to tell if a given alignment or match is statistically significant
  • A P-value (or an E-value)?
  • Score Distributions (extreme value distribution)
  • Low Complexity Sequences
• Evolutionary Issues
  • Rates of mutation and change
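
To make "align two strings optimally via dynamic programming" concrete, here is a minimal sketch of global alignment in the style of Needleman-Wunsch. The scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for illustration, standing in for a real amino acid substitution matrix, and the function name `global_align` is likewise made up.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for an optimal global alignment."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

result = global_align("ACGT", "AGT")
```

This fills the full table, so time and memory grow with the product of the two lengths. That cost is the motivation for the hashing heuristics used by BLAST and FASTA. A local alignment would differ mainly in clamping cell scores at zero and starting the traceback from the highest-scoring cell.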


Application III: Protein Structure
• Secondary Structure “Prediction”
  • Via Propensities
  • Neural Networks, Genetic Algorithms
  • Simple Statistics
  • Transmembrane Regions
  • Assessing Secondary Structure Prediction
• Tertiary Structure Prediction
  • Fold Recognition
  • Threading
  • Ab initio
• Function Prediction
  • Active site identification
• Relation of Sequence Similarity to Structural Similarity
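
A propensity-style prediction can be illustrated with a sliding-window hydropathy profile, a classic simple-statistics approach to spotting candidate transmembrane regions. The slides do not name a specific method. This sketch assumes the Kyte-Doolittle hydropathy scale, and the window length of 19 and threshold of 1.6 are commonly quoted values treated here as assumptions. The function names are made up for this illustration.

```python
# Kyte-Doolittle hydropathy values per amino acid (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(protein, window=19):
    """Average hydropathy of each full window along the sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

def candidate_tm_windows(protein, window=19, threshold=1.6):
    """Start positions of windows whose average exceeds the threshold."""
    profile = hydropathy_profile(protein, window)
    return [i for i, value in enumerate(profile) if value > threshold]

# Hypothetical toy sequence: a charged stretch followed by a hydrophobic one.
toy_protein = "D" * 19 + "L" * 19
hits = candidate_tm_windows(toy_protein)
```

A window above the threshold is only a candidate. Assessing such predictions against known structures is a separate step.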


Example Application IV: Finding Homologs
Core


Example Application IV: Overall Genome Characterization
• Overall occurrence of a certain feature in the genome
  • e.g. how many kinases in yeast
• Compare organisms and tissues
  • Expression levels in cancerous vs normal tissues
• Databases, statistics
