5 Microarray PDF

This document discusses data mining in transcriptomics databases. It provides three key points: 1. It describes various technologies used for genome-wide expression profiling like microarrays and RNA sequencing. It also discusses how the resulting data is organized and classified in databases. 2. It provides examples of several existing public databases that store and provide access to gene expression data from microarray and RNA-seq experiments. This includes databases for specific model organisms like FlyView and databases with a broader scope like Stanford Microarray Database. 3. It gives an overview of common analytical methods used for mining gene expression data, such as clustering, to group genes or samples based on similarity in expression patterns. It also discusses distance and similarity

Uploaded by

Swapnil Gudmalwar


Topic 7

Data-mining in transcriptomics databases

• Genome-wide expression profiling

• The technology
• Organization and classification of data-sets
• Data-mining
ORGANIZATION OF BIOLOGICAL DATA

Gene → Genomics

mRNA → Transcriptomics

Protein → Protein sequence / Proteomics

Function (enzyme, hormone, etc.) → 3-D structural database
The Flow of Genetic Information

DNA 5’ ACTGCACCATGGGGCTCAGCGACGGGGAATGGCACTTGGTG 3’ (sequence same as RNA)
DNA 3’ TGACGTGGTACCCCGAGTCGCTGCCCCTTACCGTGAACCAC 5’ (sequence complementary to RNA)

mRNA 5’ ACUGCACCAUGGGGCUCAGCGACGGGGAAUGGCACUUGGUG 3’

Initiation codon signal: AUG

Protein
Met-Gly-Leu-Ser-Asp-Gly-Glu-Trp-His-Leu-Val
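
The flow from DNA to mRNA to protein on this slide can be checked with a short sketch. It assumes the standard genetic code (NCBI translation table 1) and translation starting at the first AUG; the function names are illustrative and not part of the slides.

```python
# Standard genetic code, codons enumerated in T, C, A, G order
# (NCBI translation table 1). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def complement(dna):
    """Base-by-base complement of a DNA strand (not reversed)."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))


def transcribe(coding_strand):
    """mRNA has the same sequence as the coding strand, with U in place of T."""
    return coding_strand.replace("T", "U")


def translate(mrna):
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    start = mrna.find("AUG")
    if start < 0:
        return ""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3].replace("U", "T")]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


DNA = "ACTGCACCATGGGGCTCAGCGACGGGGAATGGCACTTGGTG"
MRNA = transcribe(DNA)
PROTEIN = translate(MRNA)  # one-letter amino acid codes
```

In one-letter codes the result reads M, G, L, S, D, G, E, W, H, L, V; the seventh codon, GAA, encodes glutamate (Glu).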
DESCRIPTION OF A LIVING CELL / VIRUS

Genome / Genomics: general capability of the cell

Transcriptomics: readiness of the cell

Proteomics / Protein map: physiological state of the cell
Network genomics

[Figure labels: DNA, RNA, Protein, Metabolites, Growth rate, Expression; stem cells, cancer cells, microbes]
Some useful signals on Genes
[Figure: signals along a gene, from DNA through mRNA to protein]
• Upstream activating sequences (UAS)
• TATA box
• mRNA expression start & end
• Ribosomal binding site
• Protein synthesis starts / stops
A typical gene in higher organisms

[Figure labels: transcription start site; translation start site; exon (coding region); intron (non-coding region); donor model; acceptor model; stop codon]
Alternative splicing leads to diversity
Transcription
start site
E1 I1 E2 I2 E3

E1 E2 E3

E1 I1 E2 E3
Human RNA-splice junctions sequence matrix
Genetic Regulation of Processes
(Regulation of Transcriptional Activity)
A Typical Genetic Regulatory Circuit

McAdams and Arkin, Proc. Natl. Acad. Sci., 1997, vol 94, 814-819
Newly identified members of Gal4 Regulatory Circuit

Ren et al, Science, 22 Dec 2000, vol 290, 2306-2309

8 cross-checks for regulon quantitation
• In vitro selection (SELEX)
• Protein fusions (A-B)
• In vivo selection (one-hybrid)
• Microarray data: coregulated sets of genes
• Phylogenetic profiles (presence/absence of proteins P1 to P7 across the genomes EC, SC, BS, HI)
• Metabolic pathways (e.g. TCA cycle)
• Conserved operons (B. subtilis: purM purN purH purD; E. coli: purM purN; purH purD)
• Known regulons in other organisms

Data mining in transcriptomics
databases

47 articles on RNA array data

13 databases (3 Sybase, 2 Oracle, 8 Other)
60 articles on RNA array data mining
108 companies, 23 for software
Current Gene Expression Databases

• Axeldb: www.dkfz-heidelberg.de/abt0135/axeldb.htm
  Gene expression in Xenopus
• BodyMap: bodymap.ims.u-tokyo.ac.jp/
  Human & mouse gene expression
• FlyView: pbio07.uni-muenster.de/
  Drosophila
• Interferon Stimulated Gene Database: www.lerner.ccf.org/labs/williams/xchi-html.cgi
  Genes induced by treatment with interferon
• Stanford Microarray Database: genome-www.stanford.edu/microarray
  Raw & normalized data from various sources
RNA quantitation database integration
Microarrays (1): experiment vs. control hybridization to ~1000 bp ORF probes
• R/G ratios
• R, G values
• quality indicators

Affymetrix (2): 25-bp hybridization to PM and MM probes for each ORF
• Averaged PM-MM
• “presence”
• feature statistics
• 25-mers

SAGE (3): sequence counting of SAGE tags in concatamers
• Counts of SAGE 14-mers sequence tags for each ORF

1 DeRisi et al., Science 278:680-686 (1997)
2 Lockhart et al., Nat Biotech 14:1675-1680 (1996)
3 Velculescu et al., Science 270:484-487 (1995)
GeneChip expression analysis probe array
• Each probe cell contains millions of copies of a specific oligonucleotide probe
• Biotinylated RNA from experiment
• Streptavidin-phycoerythrin conjugate
[Figure: image of hybridized probe array]
Error Model for Microarray Data

Fawcett et al, Proc. Natl. Acad. Sci. USA (2000) 97, 8063-68
Representation of expression data

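[Figure: normalized expression data from microarrays as a table of genes (Gene 1 ... Gene N) by time-points (T1, T2, T3); each gene is plotted as a point in the space with axes Time-point 1 ... Time-point 3, and dij marks the distance between two genes]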
Cluster analysis of mRNA expression data

By gene (rat spinal cord development, yeast cell cycle):

Wen et al., 1998; Tavazoie et al., 1999; Eisen et al., 1998;
Tamayo et al., 1999

By condition or cell-type or by gene & cell-type (human cancer):
Golub, et al. 1999; Alon, et al. 1999; Perou, et al. 1999;
Weinstein, et al. 1997
Cluster Analysis

• To divide samples into homogeneous groups based on a set of features.
• Clustering of genes based on similarity in expression
pattern over a range of conditions.

Protein/protein complex

Genes

DNA regulatory elements

Gene Expression Data Analysis

Gene Expression Data

Pairwise Measures
Distance/Similarity Matrix
Clustering
Gene Clusters
Motif Searching/...
Regulatory Elements / Gene Functions
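
The first two steps of this pipeline (pairwise measures, then a distance matrix) can be sketched as follows. This is a minimal illustration assuming Euclidean distance and a tiny made-up expression table; the gene names and values are hypothetical.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def distance_matrix(profiles, dist=euclidean):
    """Symmetric matrix d[i][j] of pairwise distances between profiles."""
    n = len(profiles)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = dist(profiles[i], profiles[j])
    return d


# Hypothetical normalized expression values at three time-points.
expression = {
    "gene1": [0.0, 0.0, 0.0],
    "gene2": [3.0, 4.0, 0.0],
    "gene3": [3.0, 4.0, 12.0],
}
D = distance_matrix(list(expression.values()))
```

Any other pairwise measure (Manhattan distance, one minus the correlation coefficient) can be passed as `dist`; the clustering step then works from `D` alone.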
Clusters of Two-Dimensional Data
Key Terms in Cluster Analysis

• Distance & Similarity measures

• Hierarchical & non-hierarchical
• Single/complete/average linkage
• Dendrograms & ordering
Distance Measures: Minkowski Metric

Suppose two objects x and y both have p features:

$x = (x_1, x_2, \ldots, x_p)$
$y = (y_1, y_2, \ldots, y_p)$

The Minkowski metric is defined by

$$d(x, y) = \sqrt[r]{\sum_{i=1}^{p} |x_i - y_i|^r}$$
Most Common Minkowski Metrics
1. $r = 2$ (Euclidean distance):

$$d(x, y) = \sqrt{\sum_{i=1}^{p} |x_i - y_i|^2}$$

2. $r = 1$ (Manhattan distance):

$$d(x, y) = \sum_{i=1}^{p} |x_i - y_i|$$

3. $r \to +\infty$ ("sup" distance):

$$d(x, y) = \max_{1 \le i \le p} |x_i - y_i|$$
An Example
[Figure: two points x and y that differ by 4 along one feature and by 3 along the other]

1. Euclidean distance: $\sqrt{4^2 + 3^2} = 5$.
2. Manhattan distance: $4 + 3 = 7$.
3. "sup" distance: $\max\{4, 3\} = 4$.
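
The three special cases can be reproduced with one small function. This is a sketch; the name `minkowski` and the placement of x at the origin are assumptions made for illustration.

```python
import math


def minkowski(x, y, r):
    """Minkowski distance of order r; r = math.inf gives the "sup" distance."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(r):
        return max(diffs)
    return sum(d ** r for d in diffs) ** (1.0 / r)


# The example above: the two points differ by 4 and 3 in their two features.
x, y = (0, 0), (4, 3)
euclidean = minkowski(x, y, 2)
manhattan = minkowski(x, y, 1)
sup = minkowski(x, y, math.inf)
```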
Manhattan distance is called Hamming
distance when all features are binary.

Gene Expression Levels Under 17 Conditions (1-High,0-Low)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
GeneA 0 1 1 0 0 1 0 0 1 0 0 1 1 1 0 0 1
GeneB 0 1 1 1 0 0 0 0 1 1 1 1 1 1 0 1 1

Hamming distance: #(01) + #(10) = 4 + 1 = 5.
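
The count can be verified directly from the two binary profiles in the table. A minimal sketch; the variable names are illustrative.

```python
gene_a = [0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1]
gene_b = [0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1]


def hamming(x, y):
    """Number of positions at which two equal-length binary profiles differ."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    return sum(a != b for a, b in zip(x, y))


# Split the mismatches by direction, as on the slide.
n01 = sum(a == 0 and b == 1 for a, b in zip(gene_a, gene_b))  # low in A, high in B
n10 = sum(a == 1 and b == 0 for a, b in zip(gene_a, gene_b))  # high in A, low in B
distance = hamming(gene_a, gene_b)
```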
Similarity Measures: Correlation Coefficient
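$$s(x, y) = \frac{\sum_{i=1}^{p} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{p} (x_i - \bar{x})^2 \times \sum_{i=1}^{p} (y_i - \bar{y})^2}}$$

averages: $\bar{x} = \frac{1}{p} \sum_{i=1}^{p} x_i$ and $\bar{y} = \frac{1}{p} \sum_{i=1}^{p} y_i$.

$$|s(x, y)| \le 1$$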
What kind of x and y give
(1) s(x,y)=1,
(2) s(x,y)=-1,
(3) s(x,y)=0 ?
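
One way to explore the question is to compute s(x, y) for a few hand-made profiles. The sketch below implements the correlation coefficient as defined above; the example vectors are made up for illustration.

```python
import math


def correlation(x, y):
    """Pearson correlation coefficient s(x, y) between two profiles."""
    p = len(x)
    mean_x = sum(x) / p
    mean_y = sum(y) / p
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


x = [1.0, 2.0, 3.0, 4.0]
s_up = correlation(x, [2.0, 4.0, 6.0, 8.0])      # y rises linearly with x
s_down = correlation(x, [8.0, 6.0, 4.0, 2.0])    # y falls linearly with x
s_none = correlation(x, [1.0, -1.0, -1.0, 1.0])  # no linear trend with x
```

The three results are 1, -1 and 0 (up to rounding). Note that s(x, y) depends only on the shape of the profiles, not on their scale or offset.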
Similarity Measures: Correlation Coefficient

[Figure: three panels plotting expression level against time for Gene A and Gene B]
Pattern recognition &
normalization

Singular Value Decomposition (SVD) =

Principal-Component Analysis (PCA)

Linear transformation of Genes by Conditions space

to “Eigen” space producing orthonormal superpositions.
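
The right singular vectors of the mean-centered data matrix are the eigenvectors of its covariance matrix, which is the sense in which SVD and PCA coincide. The sketch below does not call an SVD routine; as a swapped-in stand-in it finds only the leading principal component by power iteration on the covariance matrix. The function name, starting vector and data are assumptions.

```python
import math


def first_principal_component(rows, iterations=200):
    """Leading eigenvector of the covariance matrix, found by power iteration.

    rows: list of samples, each a list of p feature values.
    Returns (unit-length direction, projection of each centered sample on it).
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Assumed starting vector; it must not be orthogonal to the answer.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, scores


# Made-up data lying exactly on the line y = 2x.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, scores = first_principal_component(data)
```

For this data the direction comes out proportional to (1, 2). A full decomposition would return all orthonormal components at once.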
[Figure: normalized expression data with a dendrogram over a, b, c, d]
Clustering methods

Hierarchical: a series of successive fusions or splittings of the data until a final number of clusters is obtained.
• A definite hierarchy between clusters & sub-clusters.
Non-hierarchical: a number of clusters is assumed at the start. Points are allocated among clusters so that a criterion is minimized, e.g. the within-cluster sum of variances.
• No hierarchy within clusters or between clusters.
• E.g. K-means, Self-Organizing Maps, etc.
Hierarchical Clustering Techniques

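At the beginning, each object (gene) is a cluster. In each of the subsequent steps, the two closest clusters merge into one cluster, until there is only one cluster left.
The distance between two clusters is defined by the linkage method:
• Single-link (nearest neighbor): the distance between their two closest members.
• Complete-link: the distance between their two farthest members.
• Average-link: the average distance over all pairs of members, one from each cluster.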

Complete-Link Method (Euclidean distance)

Merge sequence: (1) a and b merge into {a,b}; (2) c and d merge into {c,d}; (3) {a,b} and {c,d} merge into {a,b,c,d}.

Distance matrix at the start:
       b   c   d
a      2   5   6
b          3   5
c              4

After step (1):
       c   d
a,b    5   6
c          4

After step (2):
       c,d
a,b    6
Compare Dendrograms
[Figure: Single-Link and Complete-Link dendrograms over the leaves a, b, c, d, drawn against a distance axis from 0 to 6]
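
The two dendrograms can be recomputed from the four-object distance matrix (a-b 2, a-c 5, a-d 6, b-c 3, b-d 5, c-d 4). The sketch below is a naive agglomerative procedure written for illustration; the function name and data layout are assumptions, and passing `max` or `min` as the linkage selects complete-link or single-link.

```python
def agglomerate(labels, dist, linkage=max):
    """Naive agglomerative clustering.

    dist maps frozenset({u, v}) to the distance between objects u and v.
    linkage reduces the member-to-member distances between two clusters:
    max gives complete-link, min gives single-link.
    Returns the merges as (cluster_1, cluster_2, height) in order.
    """
    clusters = [(label,) for label in labels]
    merges = []

    def cluster_distance(c1, c2):
        return linkage(dist[frozenset((u, v))] for u in c1 for v in c2)

    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        height, i, j = min(
            (cluster_distance(c1, c2), i, j)
            for i, c1 in enumerate(clusters)
            for j, c2 in enumerate(clusters)
            if i < j
        )
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


pairs = {("a", "b"): 2, ("a", "c"): 5, ("a", "d"): 6,
         ("b", "c"): 3, ("b", "d"): 5, ("c", "d"): 4}
dist = {frozenset(k): v for k, v in pairs.items()}

complete = agglomerate("abcd", dist, linkage=max)
single = agglomerate("abcd", dist, linkage=min)
```

Complete-link merges at heights 2, 4 and 6 (a with b, c with d, then both pairs). Single-link merges at heights 2, 3 and 4: c joins {a,b} before d does, which produces the chained shape typical of single-link trees.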
Which clustering methods do you suggest
for the following two-dimensional data?
Problems of Hierarchical
Clustering
• It is concerned more with the complete tree structure than with the optimal number of clusters.
• There is no possibility of correcting for a
poor initial partition.
• Similarity and distance measures rarely
have strict numerical significance.
Non-hierarchical clustering
Normalized Expression Data
Interpreting Patterns of Gene Expression
with Self Organizing Maps

Tamayo et al, Proc. Natl. Acad. Sci. USA, 1999, Vol 96, 2907
SOM algorithm
• The initial mapping of nodes, $f_0$, is random.
• At each iteration i, a data point P is selected and the node $N_P$ that maps closest to P is identified.
• The mapping of the nodes is then adjusted by the formula

$$f_{i+1}(N) = f_i(N) + \tau(d(N, N_P), i)\,(P - f_i(N))$$

where the learning rate is $\tau(x, i) = 0.02\,T / (T + 100\,i)$ and T is the maximum number of iterations.
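
A single SOM iteration can be sketched as below. It assumes that each updated node moves from its own current position toward P, and that only nodes within a grid radius of the winning node are updated; the `radius` parameter, the grid layout and the function names are assumptions for illustration, not details given on the slide.

```python
import math


def learning_rate(i, T):
    """Learning rate 0.02 T / (T + 100 i), which decays with iteration i."""
    return 0.02 * T / (T + 100 * i)


def som_step(weights, grid, point, i, T, radius):
    """Apply one SOM update in place and return the index of the winning node.

    weights: current position f_i(N) of each node in expression space.
    grid:    fixed coordinates of each node on the SOM grid.
    radius:  nodes within this grid distance of the winner are updated (assumed).
    """
    winner = min(range(len(weights)),
                 key=lambda n: math.dist(weights[n], point))
    rate = learning_rate(i, T)
    for n in range(len(weights)):
        if math.dist(grid[n], grid[winner]) <= radius:
            weights[n] = [w + rate * (p - w)
                          for w, p in zip(weights[n], point)]
    return winner


# Two nodes on a 1 x 2 grid; one data point near the first node.
weights = [[0.0, 0.0], [10.0, 10.0]]
grid = [(0, 0), (1, 0)]
winner = som_step(weights, grid, [1.0, 1.0], i=0, T=100, radius=0)
```

With radius 0 only the winning node moves, here by 2% of the way toward the data point. Training repeats this step for i = 0 ... T-1 over randomly drawn data points.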
Clustering of genes with Self Organizing Maps
Clustering by K-means
• Given a set S of N p-dimensional vectors and no prior knowledge about the set, the K-means clustering algorithm forms K disjoint nonempty subsets, each of which locally minimizes a measure of dissimilarity. The algorithm converges to a local optimum of the total dissimilarity over all subsets; a global optimum is not guaranteed.
• K-means is an unsupervised, iterative algorithm that minimizes the within-cluster sum of squared distances from the cluster mean. Using the Euclidean distance between the coordinates of any two genes in the space reflects ignorance of a more biologically relevant measure of distance.
• The first cluster center is chosen as the centroid of the entire data set, and subsequent centers are chosen by finding the data point farthest from the centers already chosen. 200-400 iterations.
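
The procedure described above can be sketched as follows. Two points are assumptions: "farthest from the centers already chosen" is read as the point whose distance to its nearest chosen center is largest, and a cluster that becomes empty keeps its previous center. The data are made up.

```python
import math


def kmeans(points, k, max_iter=400):
    """K-means with the initialization described above.

    Returns (labels, centers); labels[i] is the cluster index of points[i].
    """
    dim = len(points[0])
    # First center: centroid of the entire data set.
    centers = [[sum(p[d] for p in points) / len(points) for d in range(dim)]]
    # Further centers: the data point farthest from the centers chosen so far
    # (assumed to mean farthest from its nearest chosen center).
    while len(centers) < k:
        farthest = max(points,
                       key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stable: a local optimum has been reached
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old center (assumption)
                centers[c] = [sum(m[d] for m in members) / len(members)
                              for d in range(dim)]
    return labels, centers


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, k=2)
```

Because the result depends on the starting centers, different initializations can end in different local optima.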
Representation of expression data
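[Figure: normalized expression data from microarrays as a table of genes (Gene 1 ... Gene N) by time-points (T1, T2, T3); each gene is plotted as a point in the space with axes Time-point 1 ... Time-point 3, and dij marks the distance between two genes]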
Identifying prevalent expression patterns
(gene clusters)
[Figure: genes plotted in the space with axes Time-point 1 ... Time-point 3, with three panels of normalized expression against time-point (1 to 3), one per prevalent expression pattern]

Evaluate Cluster contents
MIPS functional category: Genes
Glycolysis: gpm1
Nuclear Organization: HTB1
Ribosome: RPL11A, RPL12B, RPL13A, RPL14A, RPL15A, RPL17A, RPL23A
Translation: TEF2
Unknown: YDL228c, YDR133C, YDR134C, YDR327W, YDR417C, YKL153W, YPL142C
Representation and clustering of Gene Expression Data

Eisen et al, Proc. Natl. Acad. Sci. USA, 1998, Vol 95, 14863
Hierarchical Clustering of Genes from Expression Data

Red=up-regulated, green=down-regulated
Gene Disruption Studies in Yeast

[Figure: expression matrix with mutants as rows and genes as columns]

Hughes et al, Cell, 2000, vol 102, 109-126

Molecular Classification of Human Breast Tumors
Biclustering of Gene Expression Data
[Figure: expression matrix with breast tumor samples as columns and genes as rows]

Perou et al, Nature, 2000, vol 406, 747-752

Identification of marker genes in cancer by
expression profiling
Data-Management in Cancer Research

Weinstein et al, Science (1997) 275, 343-349

Obtaining correlation by integrating two data-sets
Database S: Molecular Structure Descriptors
460,000 compounds x 588 descriptors
Database A: Activity patterns (-log GI50)
60,000 compounds x 60 cell lines
Database T: molecular targets (abundance/expression)
100 targets x 60 cell lines
Database A.T’: Correlation between compounds & targets

A (60k compds x 60 cell lines) . T’ (60 cell lines x 100 targets) = A.T’ (60k compds x 100 targets)
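
The construction of A.T’ can be sketched on toy data: every compound's activity pattern across the cell lines is correlated with every target's pattern across the same cell lines. The sketch assumes the Pearson correlation coefficient and uses made-up numbers; the names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation between two patterns over the same cell lines."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlate_rows(A, T):
    """Compound-by-target matrix: entry [c][t] correlates row c of A with row t of T."""
    return [[pearson(activity, target) for target in T] for activity in A]


# Toy data: 2 compounds and 2 targets measured over 4 cell lines.
A = [[1.0, 2.0, 3.0, 4.0],   # activity pattern of compound 1
     [4.0, 3.0, 2.0, 1.0]]   # activity pattern of compound 2
T = [[2.0, 4.0, 6.0, 8.0],   # expression pattern of target 1
     [1.0, -1.0, -1.0, 1.0]] # expression pattern of target 2
AT = correlate_rows(A, T)
```

The result has one row per compound and one column per target, the same shape as the A.T’ database on the slide.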
‘‘Clustered correlation’’ map of compounds & molecular targets

[Figure: clustered correlation map with compounds on one axis and targets on the other]
Gleaning information from the Cancer databases at NCI

• Clustering of cell lines based on A, T, & A.T’

databases
• Prediction of mechanism of action of drugs based on
A.T’ database
• Correlation of targets in terms of expression based on
T.T’ database.
• Correlation of targets in terms of activities based on
(A.T’)’.(A.T’) database.
• Correlation between structure descriptors and
molecular targets based on S’.(A.T’) database.
Target-target correlation using cancer data

[Figure: two target-by-target correlation maps (targets 1 to 113 on both axes): in terms of expression (T.T’) and in terms of activities ((A.T’)’.(A.T’))]

Correlation between structure descriptors and targets in the S’.(A.T’) database
Scherf et al, Nature Genetics (2000) 24, 236-44
Hierarchical clustering of human cancer cell lines

[Figure: two dendrograms of the cell lines, one based on gene expression profiles and one based on sensitivity to 1400 compds tested]

Clustered Correlation for A.T’ database
[Figure: correlation map of drugs against genes]
Distinct Types of Diffuse Large B-Cell Lymphoma
Identified by Gene Expression Profiling

Alizadeh et al, Nature (2000) 403, 503-511

Gene expression signatures for cancer types
DLBCL gene expression subgroups define
prognostic categories
Class Discovery & Class Prediction in Cancer Research
by Gene Expression Monitoring

• General strategy, independent of previous biological knowledge

• Class Discovery: New Cancer Classes

• Class Prediction: Assigning tumors to known classes

• Based solely on gene expression monitoring

Golub et al, Science, 1999, vol 286, 531-537

Class Distinction Between
Acute Myeloid Leukemia (AML) &
Acute Lymphoblastic Leukemia (ALL)

Identify Distinguishing Features in a Dataset

Class Prediction Between AML & ALL
Assigning new tumor to known class
Class Discovery in Cancer with a 2-cluster SOM

Golub et al, Science, 1999, vol 286, 531-537

Class Discovery with a 4-cluster SOM

• Possibly, discovers a New Class of Cancer

• Can be applied to cancer data irrespective of
biological background
Exon Microarrays for Human Genome

Shoemaker, et al, Nature (2001) 409, 922-927

15,511 probes for 8,183 predicted exons
69 experiments
Using Expression Data from multiple experiments to
validate exons & define Gene boundaries.
Characterization of novel transcripts using Tiling Arrays
Verification of predicted exons using tiling microarrays.
Whole genome scan for validating predicted exons.
Determination of Regulatory Network and Motifs
from Microarray Data

Tavazoie et al, Nature Genetics (1999) 22, 281-85

Application of Microarray Technology

• Classification of cancers, identification of marker genes
• Validation of predicted exons / genes for higher organisms.
• Identification of genetic regulatory networks.
