Lecture 8: Dimensionality Reduction
Iain Styles
5 November 2019
Reducing the Dimensionality of MNIST
One problem with the analysis we have done so far is that we have
considered points that are distributed throughout a hypervolume.
"Real" data does not behave in this way; it tends to lie on some
low-dimensional subspace. Trivially, in MNIST, some pixels have
the same value in all images, as shown in Figure 1, which shows
the mean and standard deviation of 1000 MNIST images, together
with a mask that shows which pixels vary, and which do not. 175
of the 784 pixels in the image, which correspond to dimensions in
the underlying vector space, have zero variance. In this case they
also have zero mean, but that is less important. This means that
we could, trivially, reduce the dimensionality of the data to 609, by
just retaining those pixels which vary and therefore contain useful
information.
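As a concrete (if minimal) illustration, the following numpy sketch computes the per-pixel statistics described above. The array `images` and its shape are assumptions standing in for however the MNIST data has actually been loaded; the random array is only a placeholder.

```python
import numpy as np

# Minimal sketch: `images` is assumed to be a (1000, 784) array of flattened
# 28x28 MNIST images. The random array below is just a stand-in for real data.
images = np.random.rand(1000, 784)

pixel_mean = images.mean(axis=0)   # per-pixel mean (Figure 1, left)
pixel_var = images.var(axis=0)     # per-pixel variance

varying = pixel_var > 0            # mask of pixels that actually vary
print("zero-variance pixels:", np.sum(~varying))
print("retained dimensions: ", np.sum(varying))

# Discard the uninformative pixels: a trivial dimensionality reduction
reduced = images[:, varying]

# Cumulative variance of the pixels, ordered from most to least variable
# (this is the quantity plotted in Figure 2)
cum_var = np.cumsum(np.sort(pixel_var)[::-1])
cum_var /= cum_var[-1]
```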
Figure 1: Image statistics from 1000 MNIST images. Left: the pixel mean. Centre: the pixel standard deviation. Right: black pixels are those with zero variance.

In Figure 2 we show the cumulative sum of the pixel variances, and see that nearly all of the variance is from 500 pixels, and around 75% of it is from only half of the pixels, suggesting that
further reductions in dimensionality may be possible. In fact, by
looking at each pixel in isolation, we are only scratching the surface
of what is possible in terms of reducing the dimension of the data
in a meaningful way. One easy-to-picture reason why we might be
able to reduce the dimensionality further is that correlations be-
tween pixels mean that the value of one pixel can be predicted from
the value of another, but more complex situations are possible. For
instance, imagine that you have a picture or a poster that is rolled
up. When unrolled, points on the poster can be described quite
adequately in a 2d coordinate system. However, when the poster
is rolled up, it becomes a three-dimensional object that requires a
3d coordinate system. If we could find a transformation that could
"unroll" the poster, then we could work with it in 2d coordinates.
This argument can be extended to higher dimensions, and the task
of dimensionality reduction is to find the transformations that allow us to represent data using their intrinsic dimensionality: the number of meaningful degrees of freedom.

Figure 2: Cumulative variance as a function of dimensionality. The pixels were ordered by variance and the cumulative sum was calculated. This shows that around 75% of the variance in the data is from fewer than half of the pixels.

To understand what this means in terms of MNIST, let us consider what the underlying degrees of freedom are.

• We have ten digit classes.
• Members of each class may vary in width and height (although this is somewhat mitigated by the alignment process).
• Each character class will have some shape variation – for example, whether ones have a serif and/or a base (e.g. 1 vs 1).

An order-of-magnitude estimate suggests that there may be a few tens of intrinsic degrees of freedom in the data. The question now is, how do we find a transformation that extracts them?
Dimensionality Reduction
We will limit our discussion to linear methods for dimensionality reduction; that is, methods that apply a single global linear transformation (rotation, scale, shear, etc.) to the data. To understand what this might do, imagine a sheet of paper embedded in 3d. If we can rotate and shift the sheet so that it aligns with the 3d coordinate system (x, y, z), then we can discard the coordinate that aligns with the thinnest dimension of the sheet of paper (its thickness), and just retain those coordinates that describe the main variations
in the sheet: we can reduce the problem from 3d to 2d. We will be
trying to do the same here: to shift and rotate the data to find a
coordinate system that describes the main variation whilst preserv-
ing the internal structure of the data. Our hope is that by reducing
the dimensionality of the data in a way that preserves its internal
structure, we can mitigate some of the issues that we see in high
dimensional spaces.
There are many ways in which the dimensionality of a dataset
can be reduced; we will focus on the random projection.
This is an attractive method because it is computationally very
simple and low-cost, but there is some elegant theory behind it that
provides much insight into the problems that high dimensionality
causes.

Random Projections

We have already seen random projections in action, and have seen, experimentally, how much benefit they can bring to a high-dimensional learning problem. The central idea is exceptionally simple. Given an original dataset containing N samples in M-dimensional space (for MNIST, M = 784 pixels and N = 60,000 training images), arranged as an M × N matrix X (one sample per column):
• Generate K random vectors with M components randomly sampled from N(0, 1). Arrange these as a matrix R of size M × K, with one vector per column.
• Normalise the columns of R so each has unit length.
• Compute X′ = RᵀX.
• The columns of X′ contain the samples projected (Figure 3) into the lower-dimensional space defined by the K random vectors: X′ is a K × N matrix holding the N samples in K-dimensional space, so the dimensionality has been reduced from M to K.

Figure 3: Projection of a point P onto two unit random vectors r1 and r2. Noting that r1 and r2 are unit vectors, the projection is given by the dot product P · r = |P| cos θ.
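A minimal numpy sketch of these steps is given below. The shapes follow the M × N convention above; the data matrix is a random stand-in rather than real MNIST pixels.

```python
import numpy as np

rng = np.random.default_rng(0)

M, N, K = 784, 1000, 40          # original dimension, number of samples, target dimension
X = rng.random((M, N))           # stand-in data: one M-dimensional sample per column

# K random vectors with M components, sampled from N(0, 1), one per column
R = rng.standard_normal((M, K))
# Normalise each column of R to unit length
R /= np.linalg.norm(R, axis=0)

# Project: X' = R^T X has shape (K, N) -- the N samples in K-dimensional space
X_prime = R.T @ X
print(X_prime.shape)             # (40, 1000)
```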
We saw, in k-nn classification, that this works. But why does it work? The key idea is that whilst distances in high-dimensional spaces are of limited use, if we could map the points into some low-dimensional space (where distances are more useful) in a way that preserves the local structure of the data, then we can render distances useful again.
Is it possible to find such a map? It turns out that random projections, subject to a few constraints, can be shown to generate just such a mapping. The key to this is a mathematical result known as the Johnson-Lindenstrauss lemma¹. This is a statement that points in a high-dimensional space can be mapped onto a low-dimensional space in such a way that relative distances between points are preserved, whilst the absolute distances are reduced.

¹ Johnson, William B., and Joram Lindenstrauss. "Extensions of Lipschitz mappings into a Hilbert space." Contemporary Mathematics 26 (1984): 189–206.

The lemma states that:
Given a set X of N data points in ℝ^M (M-dimensional real space), there is a linear map f : ℝ^M → ℝ^K that maps X onto a random lower-dimensional space ℝ^K. The map obeys

$$(1 - \varepsilon)\,\lVert x_1 - x_2 \rVert^2 \;\leq\; \lVert f(x_1) - f(x_2) \rVert^2 \;\leq\; (1 + \varepsilon)\,\lVert x_1 - x_2 \rVert^2 \qquad (1)$$

for all x1, x2 ∈ X, for 0 < ε < 1 and K > 8 ln(N)/ε². Here ‖x1 − x2‖² is the squared distance between two points before the mapping, and ‖f(x1) − f(x2)‖² is the squared distance between them afterwards. It has also been proven that this holds when the function f is a random, orthonormal linear transformation². A proof of this result is far beyond the scope of this course. Let us try to unpick its implications.

² Frankl, Peter, and Hiroshi Maehara. "The Johnson-Lindenstrauss lemma and the sphericity of some graphs." Journal of Combinatorial Theory, Series B 44.3 (1988): 355–362.

Firstly, we need to be clear about the central result here: Johnson-Lindenstrauss and its extension by Frankl and Maehara together state that a "random, orthonormal linear transformation" can be used to project a set of points onto a lower-dimensional subspace whilst preserving the distances between them to within some bounds controlled by ε: distances between points are scaled by a factor between 1 − ε and 1 + ε. Noting the condition that K > 8 ln(N)/ε², this means that smaller values of ε (better distance preservation) require larger values of K.
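To get a feel for what this condition implies, the short calculation below (an illustrative sketch, not part of the lecture's own code) evaluates the bound K > 8 ln(N)/ε² for an MNIST-sized dataset.

```python
import numpy as np

# The condition K > 8 ln(N) / eps^2 from the lemma, for N = 60,000 points
N = 60_000
for eps in (0.1, 0.25, 0.5):
    K_min = 8 * np.log(N) / eps**2
    print(f"eps = {eps:.2f}  ->  K > {K_min:.0f}")
# Tighter distance preservation (smaller eps) demands a much larger projected
# dimension K; looser preservation can be had with comparatively few dimensions.
```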
One now comes across yet another remarkable property of high-dimensional spaces. The requirement that the random vectors be orthonormal – r_i · r_j = 1 if i = j and 0 otherwise – seems like it could be rather onerous. There are well-established methods for doing this (such as Gram-Schmidt orthogonalisation), but they carry a computational cost. Fortunately, high dimensionality saves us, and it turns out that an explicit orthogonalisation step is often not needed, because in very high-dimensional spaces, the probability that two random vectors will be nearly orthogonal tends towards 1! We demonstrate this numerically in Figure 4, noting that because x · y = |x||y| cos θ, x · y = 0 means that cos θ = 0, which corresponds to an angle of 90° and therefore means that x and y are orthogonal. These results demonstrate that as the dimensionality increases, the probability that two random vectors are far from orthogonal tends to zero. This is again a consequence of the dimensionality, and it can be shown that the number of sets of nearly orthogonal vectors is exponential in the dimensionality³. If the dimensionality of our problem is high enough, random vectors are self-orthogonalising!

³ Kainen, Paul C.; Kůrková, Vĕra (1993), "Quasiorthogonal dimension of Euclidean spaces", Applied Mathematics Letters, 6 (3): 7–10, doi:10.1016/0893-9659(93)90023-G, MR 1347278.
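A small numerical sketch of this self-orthogonalising behaviour, in the spirit of the experiment behind Figure 4, is given below; it is a rough stand-in for the lecture's own experiment rather than a reproduction of it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample pairs of random vectors, normalise them to unit length, and look at
# how their dot products (i.e. cos(theta)) spread around zero as d grows.
for d in (5, 20, 100, 1000):
    u = rng.standard_normal((10_000, d))
    v = rng.standard_normal((10_000, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    dots = np.sum(u * v, axis=1)
    print(f"d = {d:4d}   std of r1 . r2 = {dots.std():.3f}")
# The spread shrinks roughly like 1/sqrt(d): in high dimensions almost every
# pair of random vectors is very nearly orthogonal.
```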
Figure 4: P(r_i · r_j | i ≠ j) from 100,000 normalised random vectors in spaces of dimension d = 5, 10, 20, 50, 100, 200, 500 and 1000.

Finally, let us investigate the effect that random projection has on
the pairwise distances. We compute all pairwise distances on the
data projected onto forty random vectors, with distances computed
in the lower-dimensional space. The results are shown in Figure 5.
The mean/median of the distribution has been reduced to ≈ 580
from ≈ 2300, and the standard deviation to ≈ 100 from ≈ 300. This
does not seem entirely consistent with the Johnson-Lindenstrauss
lemma, which stated that distances would be preserved. Why is
this?
Although the main implications of Johnson-Lindenstrauss seem clear, there is a subtlety. One would expect dimensionality reduction to reduce the distances between points, not to preserve them, but J-L seems to say the opposite. However, it is perhaps more helpful to rewrite the theorem as

$$1 - \varepsilon \;\leq\; \frac{\lVert f(x_1) - f(x_2) \rVert^2}{\lVert x_1 - x_2 \rVert^2} \;\leq\; 1 + \varepsilon \qquad (2)$$

which allows us to see that J-L is in fact a theorem about the relative distances between points: it states that it is the relative distances that are preserved, whilst the absolute distances become reduced due to dimensional scaling. This is what makes it such a powerful result for machine learning: we can use random projections to reduce the dimensionality and overcome the "curse" at a very low computational cost, whilst preserving the structure (relative distances) within the data. This is why we see such a dramatic performance improvement in k-nn classification on MNIST digits.
Figure 5: Distribution of pairwise distances between 1000 examples from the MNIST test set and 1000 examples from the training set, following projection onto forty random vectors in pixel space.
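Finally, a sketch of this last experiment is shown below, with a random stand-in for the MNIST pixel data: it checks that the ratio of distances after and before projection is tightly concentrated, even though the absolute distances shrink.

```python
import numpy as np

rng = np.random.default_rng(0)

M, N, K = 784, 200, 40
X = rng.random((M, N))                 # stand-in for MNIST pixel vectors (columns)

# Random projection onto K unit vectors, as before
R = rng.standard_normal((M, K))
R /= np.linalg.norm(R, axis=0)
Xp = R.T @ X

def pairwise_distances(A):
    """Euclidean distances between all pairs of columns of A."""
    sq = np.sum(A**2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2 * A.T @ A
    return np.sqrt(np.maximum(d2, 0.0))

D_high = pairwise_distances(X)
D_low = pairwise_distances(Xp)

offdiag = ~np.eye(N, dtype=bool)
ratio = D_low[offdiag] / D_high[offdiag]
print("mean distance before projection:", D_high[offdiag].mean())
print("mean distance after projection: ", D_low[offdiag].mean())
print("std of (after / before) ratio:  ", ratio.std())
# Absolute distances fall, but the after/before ratio is tightly concentrated:
# relative distances -- the structure of the data -- are preserved.
```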