BIOINFORMATICS
DEFINITION
Application of computational techniques to understand and organize the information associated with biological macromolecules
SOURCE OF DATA USED
Raw DNA sequence
Protein sequence
Macromolecular structure
Genomes
Gene expression
Literature
Metabolic pathways
AIMS
To organize data in a way that allows researchers to access existing information & to submit new entries as they are produced
To develop tools & resources that aid in the analysis of data
To use the tools to analyze the data & interpret the results in a biologically meaningful manner
Structural studies provide valuable insight into the stereochemical principles of binding
GENOMIC STUDIES
Concentrated on model organisms & analysis of regulatory systems
GENE EXPRESSION STUDIES
Focused on devising methods to cluster genes by similarities in expression profiles
To determine proteins that are expressed together under different cellular conditions
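As a rough illustration of clustering genes by similarity in expression profile, the sketch below groups genes whose profiles are strongly correlated. It is a simple greedy grouping by Pearson correlation, not a specific published method; the gene names, the expression values and the `min_corr` cutoff are all made-up assumptions.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / (sd_x * sd_y)


def cluster_by_profile(
    profiles: Dict[str, Sequence[float]], min_corr: float = 0.9
) -> List[List[str]]:
    """Greedy grouping: a gene joins the first cluster in which it
    correlates with every member at or above min_corr (assumed cutoff)."""
    clusters: List[List[str]] = []
    for gene, profile in profiles.items():
        for cluster in clusters:
            if all(pearson(profile, profiles[m]) >= min_corr for m in cluster):
                cluster.append(gene)
                break
        else:
            clusters.append([gene])
    return clusters


# Hypothetical mRNA levels measured under four cellular conditions.
toy_profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 6.0, 8.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.1, 4.0, 2.1],
}
clusters = cluster_by_profile(toy_profiles)
```

With this toy data, geneA and geneB rise together and geneC and geneD fall together, so they end up in two separate groups. Genes that land in the same group are candidates for being expressed together under the conditions measured.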
FARAH ALIA BT RAHAMAT
2013235384
EH 222 7C
GENE EXPRESSION DATA
DATA INTEGRATION
THE BIOINFORMATICS SPECTRUM
REDUNDANCY & MULTIPLICITY OF DATA
PROTEIN SEQUENCE DATABASES
Data classification between genomes & their products
Categorized as primary, composite & secondary
Primary databases act as a repository for the raw data (e.g. SWISS-PROT)
Composite databases compile & filter sequence data from different primary databases to produce combined non-redundant sets that are more complete than the individual databases (e.g. OWL)
Secondary databases contain information derived from protein sequences & help the user determine whether a new sequence belongs to a known protein family (e.g. PROSITE)
STRUCTURAL DATABASES
Databases of macromolecular structures
PDB provides the primary archive of all 3D structures for macromolecules (proteins, RNA, DNA)
Structures are solved by X-ray crystallography and NMR
3 major databases classify proteins by structure to identify structural & evolutionary relationships (CATH, SCOP, FSSP)
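The idea of a composite protein sequence database described above (compile entries from several primary databases, then filter them into one non-redundant set) can be sketched as follows. This is a minimal illustration only: the source names, identifiers and sequences are invented, and treating two entries as redundant only when their sequences are exactly identical is an assumption, not the rule used by OWL or any other real composite database.

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str]  # (identifier, amino-acid sequence)


def merge_non_redundant(sources: Dict[str, Iterable[Record]]) -> List[dict]:
    """Combine records from several sources, keeping one entry per
    distinct sequence and remembering every identifier it appeared under."""
    merged: Dict[str, dict] = {}
    for source_name, records in sources.items():
        for identifier, sequence in records:
            key = sequence.strip().upper()  # normalise case and whitespace
            entry = merged.setdefault(key, {"sequence": key, "ids": []})
            entry["ids"].append(f"{source_name}:{identifier}")
    return list(merged.values())


# Hypothetical primary databases with one shared sequence.
toy_sources = {
    "db_one": [("P1", "MKTAYIAK"), ("P2", "MSLLTEVE")],
    "db_two": [("X9", "mktayiak"), ("X7", "MADEEKLP")],
}
composite = merge_non_redundant(toy_sources)
```

Here the four input records collapse into three entries, and the shared sequence keeps both of its source identifiers, so the combined set is smaller than the sum of its sources but more complete than either one.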
Gene expression data measure the amount of mRNA or protein products produced by the cell
3 main technologies: cDNA microarray, Affymetrix GeneChip & SAGE methods
In yeast, some studies measure mRNA levels throughout the whole cell cycle, some focus on a particular stage in the cycle
The most profitable research integrates multiple sources of data
Sources are not always straightforward to access because of differences in nomenclature & file formats
Data are separated according to sources of information
The bioinformatics spectrum allows expansion of biological analysis in depth & breadth
Depth: take a single protein & maximize understanding about the proteins encoded
Breadth: compare a gene with others
NUCLEOTIDE & GENOME SEQUENCES
FINDING HOMOLOGUES
Simplifies the problem of understanding complex genomes
RATIONAL DRUG DESIGN
One of the earliest medical applications
LARGE-SCALE CENSUS
Helps identify interesting subject areas for further detailed analysis
Biggest excitement is the availability of complete genome sequences for different organisms
Whole-genome sequencing is often conducted through international collaborations, so individual genomes are published at different sites
The Entrez genome database combines complete & partial genomes in a single location
Clusters of Orthologous Groups (COG) predict the function of uncharacterized proteins & identify phylogenetic patterns of protein occurrence
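A phylogenetic pattern of protein occurrence can be pictured as a presence/absence string: for each protein family, one character per genome. The sketch below builds such strings from a toy table. The genome and family names are hypothetical, and this is not how the COG database itself is queried or stored.

```python
from typing import Dict, List, Set


def phylogenetic_patterns(genome_families: Dict[str, Set[str]]) -> Dict[str, str]:
    """For each family, return a string with '1' where the family occurs
    in a genome and '0' where it does not (genomes in sorted order)."""
    genomes: List[str] = sorted(genome_families)
    families = sorted(set().union(*genome_families.values()))
    return {
        family: "".join(
            "1" if family in genome_families[g] else "0" for g in genomes
        )
        for family in families
    }


# Hypothetical genomes and the protein families found in each.
toy_genomes = {
    "genome_A": {"fam1", "fam2"},
    "genome_B": {"fam1"},
    "genome_C": {"fam1", "fam3"},
}
patterns = phylogenetic_patterns(toy_genomes)
```

In this toy table, fam1 occurs in every genome, while fam2 and fam3 each occur in only one. Families that share the same pattern across many genomes are the kind of observation that can hint at a shared function for uncharacterized proteins.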
GENE EXPRESSION ANALYSIS
Compile expression data for cells affected by different diseases
APPLICATION
TRANSCRIPTION REGULATION
STRUCTURAL STUDIES
INTRODUCTION AND OVERVIEW OF BIOINFORMATICS
Data can be grouped together based on biologically meaningful similarities
Genes are grouped by particular functions or by metabolic pathway
Organisms often have multiple copies of a particular gene through duplication
Proteins can adopt equivalent structures even when they differ greatly in sequence
Analogous proteins have related folds but unrelated sequences
Homologous proteins are both sequentially & structurally similar
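The distinction between analogous and homologous proteins can be made concrete with a small sketch. It assumes the two sequences are already aligned (gaps written as "-") and that each protein carries a fold label. The 30% identity cutoff and the fold names are illustrative assumptions; real homology detection relies on statistical significance rather than a fixed threshold.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical residues over ungapped alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"
    ]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


def classify_pair(
    aligned_a: str,
    aligned_b: str,
    fold_a: str,
    fold_b: str,
    identity_cutoff: float = 30.0,  # assumed cutoff, for illustration only
) -> str:
    """Label a protein pair following the definitions in the notes above."""
    same_fold = fold_a == fold_b
    similar_sequence = percent_identity(aligned_a, aligned_b) >= identity_cutoff
    if same_fold and similar_sequence:
        return "homologous"  # similar in both sequence and structure
    if same_fold:
        return "analogous"  # related folds, unrelated sequences
    return "no structural relationship"


# Hypothetical aligned fragments and fold labels.
label_1 = classify_pair("ACDE", "ACDF", "fold_x", "fold_x")
label_2 = classify_pair("ACDE", "WYHK", "fold_x", "fold_x")
```

In this toy case, `label_1` is "homologous" (3 of 4 residues match and the fold is shared), and `label_2` is "analogous" (same fold, no matching residues).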