CS303: Mathematical Foundations for AI
Nonlinear Dimensionality Reduction
23 Jan 2025
Recap
• Recap
▶ Principal Component Analysis (PCA)
▶ Linear Discriminant Analysis
• kernel PCA
• Multidimensional Scaling (MDS)
• Isometric Mapping (ISOMAP)
1 / 28
References
• kernel PCA
▶ PCA and Fisher’s Discriminant Analysis – Bishop, Christopher M. Pattern Recognition and Machine Learning. New York: Springer, 2006.
▶ kernel PCA
▶ kernel Matrix
• MDS
▶ Video Lecture
▶ Slides
• ISOMAP
▶ Original Paper
▶ Video Lecture
▶ Slides
2 / 28
Manifold
3 / 28
Manifold
Locally Euclidean (flat)
4 / 28
PCA on Swiss Roll Dataset
PCA Fails
4 / 28
Recall PCA
• X is the n × d data matrix
• Center the data (i.e., subtract the column means) to obtain X̄
• X̄ᵀX̄ is the covariance matrix (d × d), up to a factor of 1/n
• Compute the top k eigenvectors v1, …, vk of X̄ᵀX̄
• The new data X′ = X̄ [v1 … vk], which is n × k
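To make these steps concrete, here is a minimal NumPy sketch of the recipe (the function name, the random example data, and the use of np.linalg.eigh are illustrative choices, not the course notebook):

    import numpy as np

    def pca(X, k):
        """Project the n x d data matrix X onto its top-k principal components."""
        Xbar = X - X.mean(axis=0)              # center the data
        C = Xbar.T @ Xbar                      # d x d (covariance up to a 1/n factor)
        eigvals, eigvecs = np.linalg.eigh(C)   # eigh returns ascending eigenvalues
        V = eigvecs[:, ::-1][:, :k]            # top-k eigenvectors v1, ..., vk
        return Xbar @ V                        # n x k projected data

    X = np.random.randn(100, 5)
    X_proj = pca(X, k=2)
    print(X_proj.shape)                        # (100, 2)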
5 / 28
Recall PCA for Wide Matrices
We have X̄ with n ≪ d
• We compute X̄X̄ᵀ, which is n × n; let K = X̄X̄ᵀ
• Compute the top k eigenvectors of X̄X̄ᵀ, denoted u1, …, uk
• The normalized eigenvectors vi of X̄ᵀX̄ are
$$ v_i = \frac{1}{\sqrt{\lambda_i}} \bar{X}^T u_i $$
Proof.
$$ \bar{X}\bar{X}^T u_i = \lambda_i u_i $$
Multiplying both sides on the left by X̄ᵀ:
$$ \bar{X}^T \bar{X}\,(\bar{X}^T u_i) = \lambda_i\,(\bar{X}^T u_i) $$
so X̄ᵀui is an eigenvector of X̄ᵀX̄ with eigenvalue λi. Normalizing,
$$ v_i = \frac{\bar{X}^T u_i}{\lVert \bar{X}^T u_i \rVert}, \qquad \lVert \bar{X}^T u_i \rVert^2 = u_i^T \bar{X}\bar{X}^T u_i = \lambda_i. $$
6 / 28
Recall PCA for Wide Matrices
We have X̄ with n ≪ d
• We compute X̄X̄ᵀ; let K = X̄X̄ᵀ
• Compute the top k eigenvectors of X̄X̄ᵀ, denoted u1, …, uk
• The eigenvectors vi of X̄ᵀX̄ are
$$ v_i = \frac{1}{\sqrt{\lambda_i}} \bar{X}^T u_i $$
• The new data X′, i.e., n × k, is given by
$$ X' = \bar{X}\,[v_1 \cdots v_k] = \bar{X}\bar{X}^T [u_1 \cdots u_k]\,\Lambda^{-1/2} = \bar{K}\,[u_1 \cdots u_k]\,\Lambda^{-1/2} $$
(since X̄ is already centered, K = X̄X̄ᵀ is itself the centered Gram matrix K̄)
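A quick numerical sanity check of this equivalence (a sketch only; the matrix sizes and random data are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, k = 20, 500, 3                       # wide matrix: n << d
    X = rng.standard_normal((n, d))
    Xbar = X - X.mean(axis=0)

    K = Xbar @ Xbar.T                          # n x n Gram matrix
    lam, U = np.linalg.eigh(K)
    lam, U = lam[::-1][:k], U[:, ::-1][:, :k]  # top-k eigenpairs of K

    V = Xbar.T @ U / np.sqrt(lam)              # v_i = Xbar^T u_i / sqrt(lambda_i), d x k

    # v_i are unit-norm eigenvectors of Xbar^T Xbar with the same eigenvalues
    print(np.allclose((Xbar.T @ Xbar) @ V, V * lam))    # True
    print(np.allclose(np.linalg.norm(V, axis=0), 1.0))  # True

    # The projection Xbar V equals K U Lambda^{-1/2}
    print(np.allclose(Xbar @ V, K @ U / np.sqrt(lam)))  # True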
6 / 28
kernel PCA
The projected data X′ (n × k) is given by
$$ X' = \bar{K}\,[u_1 \cdots u_k]\,\Lambda^{-1/2} $$
where the ui are the eigenvectors of K̄.
Note: we do not need to know X at all; all we need is the kernel matrix K (the inner products between all pairs of data points).
“kernel trick”
7 / 28
kernel PCA
8 / 28
kernel PCA
• Project the data x to a higher-dimensional space ϕ(x)
• To reduce dimensions, we are going to “higher dimensions”?
• Note that we will never need ϕ(x) itself (a row vector), just as we did not need X but only K̄
• All we need is the following n × n matrix
$$ K = \phi(X)\phi(X)^T = \begin{pmatrix} \phi(x_1)\phi(x_1)^T & \cdots & \phi(x_1)\phi(x_n)^T \\ \vdots & \ddots & \vdots \\ \phi(x_n)\phi(x_1)^T & \cdots & \phi(x_n)\phi(x_n)^T \end{pmatrix} $$
• Note that the above matrix is symmetric and positive semi-definite
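As a small (hypothetical) illustration of this point: for the quadratic kernel K(x, y) = (xᵀy)², a 2-D point has the explicit feature map ϕ(x) = (x₁², √2·x₁x₂, x₂²), and both routes give the same n × n matrix:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((6, 2))            # n = 6 points in 2-D

    # Explicit feature map for the quadratic kernel (x^T y)^2
    def phi(x):
        return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

    Phi = np.array([phi(x) for x in X])        # n x 3 matrix of mapped points
    K_explicit = Phi @ Phi.T                   # phi(x_i) phi(x_j)^T for all pairs
    K_trick = (X @ X.T) ** 2                   # same matrix, without computing phi

    print(np.allclose(K_explicit, K_trick))    # True
    print(np.allclose(K_trick, K_trick.T))     # symmetric
    print(np.all(np.linalg.eigvalsh(K_trick) >= -1e-10))  # positive semi-definite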
9 / 28
kernel PCA
Given X,
• We need K̄
• Compute the top k eigenvectors of K̄, u1, …, uk
• Given an input xi, the output with reduced dimensions is given by yi = (yi1, …, yik), where
$$ y_{i1} = \sum_{j=1}^{n} \bar{K}(x_i, x_j)\,\frac{u_{1j}}{\sqrt{\lambda_1}}, \quad \ldots, \quad y_{ik} = \sum_{j=1}^{n} \bar{K}(x_i, x_j)\,\frac{u_{kj}}{\sqrt{\lambda_k}} $$
In matrix form:
$$ Y = \bar{K}\,[u_1 \cdots u_k]\,\Lambda^{-1/2} $$
10 / 28
kernel PCA
• How do we get K̄ from K? That is, the kernel matrix of the mean-centered feature space.
• How do we choose K? Which higher-dimensional space?
11 / 28
Obtaining K̄
Centering in Feature Space
$$ \mu_\phi = \frac{1}{n} \sum_{i=1}^{n} \phi(x_i); \qquad \phi'(x_i) = \phi(x_i) - \mu_\phi, \quad \forall i \in \{1, 2, \ldots, n\}. $$
The centered kernel matrix K̄ is computed as:
$$ \bar{K}_{ij} = \langle \phi'(x_i), \phi'(x_j) \rangle. $$
Expanding ϕ′(xi) and ϕ′(xj): substitute ϕ′(xi) = ϕ(xi) − µϕ to get
$$ \bar{K}_{ij} = \langle \phi(x_i) - \mu_\phi,\; \phi(x_j) - \mu_\phi \rangle. $$
12 / 28
Obtaining K̄
Expand the inner product:
$$ \bar{K}_{ij} = \langle \phi(x_i), \phi(x_j) \rangle - \langle \phi(x_i), \mu_\phi \rangle - \langle \mu_\phi, \phi(x_j) \rangle + \langle \mu_\phi, \mu_\phi \rangle. $$
• First term: $\langle \phi(x_i), \phi(x_j) \rangle = K_{ij}$
• Second term: $\langle \phi(x_i), \mu_\phi \rangle = \frac{1}{n} \sum_{k=1}^{n} \langle \phi(x_i), \phi(x_k) \rangle = \frac{1}{n} \sum_{k=1}^{n} K_{ik}$
• Third term: $\langle \mu_\phi, \phi(x_j) \rangle = \frac{1}{n} \sum_{k=1}^{n} \langle \phi(x_k), \phi(x_j) \rangle = \frac{1}{n} \sum_{k=1}^{n} K_{kj}$
• Fourth term: $\langle \mu_\phi, \mu_\phi \rangle = \frac{1}{n^2} \sum_{k=1}^{n} \sum_{l=1}^{n} \langle \phi(x_k), \phi(x_l) \rangle = \frac{1}{n^2} \sum_{k=1}^{n} \sum_{l=1}^{n} K_{kl}$
12 / 28
Obtaining K̄
Combine these results:
$$ \bar{K}_{ij} = K_{ij} - \frac{1}{n} \sum_{k=1}^{n} K_{ik} - \frac{1}{n} \sum_{k=1}^{n} K_{kj} + \frac{1}{n^2} \sum_{k=1}^{n} \sum_{l=1}^{n} K_{kl}. $$
In matrix form:
$$ \bar{K} = K - \frac{1}{n} K \mathbf{1}_n - \frac{1}{n} \mathbf{1}_n K + \frac{1}{n^2} \mathbf{1}_n K \mathbf{1}_n, $$
where $\mathbf{1}_n$ is an n × n matrix with all entries equal to 1.
Simplify further using $H = I_n - \frac{1}{n} \mathbf{1}_n$ (the centering matrix):
$$ \bar{K} = H K H. $$
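A short sketch of this centering step (function names are illustrative, not from the course notebook):

    import numpy as np

    def center_kernel(K):
        """Return Kbar = H K H, where H = I - (1/n) * ones matrix."""
        n = K.shape[0]
        H = np.eye(n) - np.ones((n, n)) / n    # centering matrix
        return H @ K @ H

    # Equivalent elementwise form: Kij - row mean - column mean + grand mean
    def center_kernel_elementwise(K):
        row = K.mean(axis=1, keepdims=True)
        col = K.mean(axis=0, keepdims=True)
        return K - row - col + K.mean()

    K = np.random.rand(5, 5)
    K = K @ K.T                                # make it a valid (PSD) kernel matrix
    print(np.allclose(center_kernel(K), center_kernel_elementwise(K)))  # True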
12 / 28
Choice of K
• ϕ(X)ϕ(X)ᵀ leads to a symmetric and positive semidefinite K
• Any K which is symmetric and positive semidefinite has a corresponding ϕ(X) (Mercer’s theorem)
A popular choice is the Radial Basis Function (RBF) kernel,
$$ K(x_i, x_j) = \exp\!\left( -\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2} \right). $$
13 / 28
Choice of K
Which ϕ leads to the RBF kernel?
13 / 28
Implementation
kernelPCA code
Summary: given a Gram matrix, i.e., K or XXᵀ (linear kernel), we can compute the data in a lower dimension without access to X
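The linked notebook is not reproduced here; below is a minimal sketch of kernel PCA with an RBF kernel, along the lines of the algorithm above (the dataset, σ = 2.0, and k = 2 are illustrative choices):

    import numpy as np

    def rbf_kernel(X, sigma=1.0):
        sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        return np.exp(-sq_dists / (2 * sigma ** 2))

    def kernel_pca(K, k):
        n = K.shape[0]
        H = np.eye(n) - np.ones((n, n)) / n
        Kbar = H @ K @ H                           # centered kernel matrix
        lam, U = np.linalg.eigh(Kbar)
        lam, U = lam[::-1][:k], U[:, ::-1][:, :k]  # top-k eigenpairs
        return Kbar @ U / np.sqrt(lam)             # Y = Kbar U Lambda^{-1/2}

    X = np.random.randn(200, 10)
    Y = kernel_pca(rbf_kernel(X, sigma=2.0), k=2)
    print(Y.shape)                                 # (200, 2)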
14 / 28
Unrolling Swiss Roll
15 / 28
Multidimensional Scaling
16 / 28
Multidimensional Scaling (MDS)
• We have X in n × d (High dimension)
• We need to go to Y in n × k (Low dimension)
$$ \min_{Y,\ \mathrm{rank}(Y) \le k} \lVert D_X - D_Y \rVert_F^2 $$
• DX (ij): distance between xi and x j
• DY (ij): distance between yi and y j
“Preserving the distances after transformation”
17 / 28
Classical Multidimensional Scaling
The given distances are Euclidean
• $D_X(ij) = d_{ij}^2 = \lVert x_i - x_j \rVert_2^2$
• $d_{ij}^2 = \lVert x_i - x_j \rVert^2 = \lVert x_i \rVert^2 + \lVert x_j \rVert^2 - 2 x_i x_j^T$ (Eq. 1)
• Can we get the kernel (Gram matrix) from the distances?
▶ Non-unique
▶ Assume translational invariance ∑i xi = 0
• From the Gram matrix we can then obtain the optimal low-rank representation
18 / 28
Classical MDS
Solve the following 3 equations for xi xjᵀ (using ∑_i x_i = 0):
Summing Eq. 1 over j (the cross term vanishes since ∑_j x_j = 0):
$$ \sum_j d_{ij}^2 = n \lVert x_i \rVert^2 + \sum_j \lVert x_j \rVert^2, \qquad \lVert x_i \rVert^2 = \frac{1}{n}\left( \sum_j d_{ij}^2 - \sum_j \lVert x_j \rVert^2 \right) = \frac{1}{n} \sum_j d_{ij}^2 - \frac{1}{2n^2} \sum_{ij} d_{ij}^2 $$
Summing Eq. 1 over i:
$$ \sum_i d_{ij}^2 = \sum_i \lVert x_i \rVert^2 + n \lVert x_j \rVert^2, \qquad \lVert x_j \rVert^2 = \frac{1}{n} \sum_i d_{ij}^2 - \frac{1}{2n^2} \sum_{ij} d_{ij}^2 $$
Summing over both i and j:
$$ \sum_i \sum_j d_{ij}^2 = 2n \sum_k \lVert x_k \rVert^2 $$
Substituting the above in Eq. 1, we have
$$ x_i x_j^T = \frac{1}{2} \left( \lVert x_i \rVert^2 + \lVert x_j \rVert^2 - d_{ij}^2 \right) $$
19 / 28
Classical MDS
Using these, the centered Gram matrix K (with Kij = xi xjᵀ) is:
$$ K_{ij} = -\frac{1}{2} \left( d_{ij}^2 - \frac{1}{n} \sum_i d_{ij}^2 - \frac{1}{n} \sum_j d_{ij}^2 + \frac{1}{n^2} \sum_{ij} d_{ij}^2 \right). $$
Let $\mathbf{1}_n$ be an n × n matrix of ones. In matrix form:
$$ K = -\frac{1}{2} H D_X H, $$
where $H = I_n - \frac{1}{n} \mathbf{1}_n$ is the centering matrix
20 / 28
Classical MDS
Complete Algorithm
• We are given DX
• Compute $K = -\frac{1}{2} H D_X H$
• Find the top k eigenvectors $V = [v_1 \cdots v_k]$ of K
• The new data Y would be
$$ Y = K V \Lambda^{-1/2} = V \Lambda^{1/2} \qquad \left( \text{since } K \frac{v_i}{\sqrt{\lambda_i}} = \lambda_i \frac{v_i}{\sqrt{\lambda_i}} = \sqrt{\lambda_i}\, v_i \right) $$
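A minimal sketch of the complete algorithm (assuming, as defined earlier, that DX holds squared Euclidean distances; the sanity check at the end is illustrative):

    import numpy as np

    def classical_mds(D_sq, k):
        """D_sq: n x n matrix of squared pairwise distances; returns n x k embedding."""
        n = D_sq.shape[0]
        H = np.eye(n) - np.ones((n, n)) / n
        K = -0.5 * H @ D_sq @ H                      # K = -1/2 H D_X H
        lam, V = np.linalg.eigh(K)
        lam, V = lam[::-1][:k], V[:, ::-1][:, :k]    # top-k eigenpairs
        return V * np.sqrt(lam)                      # Y = V Lambda^{1/2}

    # Sanity check: recover a 2-D configuration from its own distances
    X = np.random.randn(50, 2)
    D_sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    Y = classical_mds(D_sq, k=2)
    # Y matches X up to rotation/reflection and translation; distances agree:
    D_sq_Y = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    print(np.allclose(D_sq, D_sq_Y))                 # True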
21 / 28
Metric MDS
For Swiss roll data, the Euclidean distance is not a good choice!
• D is any general distance metric
▶ d( x, x ) = 0
▶ Non-negative: d( x, y) ≥ 0
▶ Symmetry: d( x, y) = d(y, x )
▶ Triangle Inequality: d( x, z) ≤ d( x, y) + d(y, z)
• Examples?
22 / 28
Metric MDS
• A non-Euclidean D will not recover the exact embedding :(
• How good is the embedding found?
$$ \mathrm{stress}(y) = \frac{\sum_{ij} \left( d_{ij} - \lVert y_i - y_j \rVert \right)^2}{\sum_{ij} d_{ij}^2} $$
• Find the y’s that “minimize stress” :)
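A small sketch of this stress computation (using the normalized form written above; variable names and the random example are illustrative):

    import numpy as np

    def stress(D, Y):
        """D: n x n target distances; Y: n x k embedding. Lower is better."""
        D_Y = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)  # pairwise ||y_i - y_j||
        return np.sum((D - D_Y) ** 2) / np.sum(D ** 2)

    # Example: stress of a random 2-D embedding of random 5-D points
    X = np.random.randn(30, 5)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    print(stress(D, np.random.randn(30, 2)))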
23 / 28
ISOMAP
What D should we use for a Swiss-roll-like dataset?
Geodesic distance
24 / 28
ISOMAP
Given the data X, which lies in a high-dimensional space but on a low-dimensional manifold:
• Is the manifold known?
• Then how do we compute the geodesic distance?
25 / 28
ISOMAP
Exploit the “locally Euclidean” property
• Given a point x
• Find the neighbors of x (ϵ-ball)
• Build the nearest-neighbor graph with edge weights given by the Euclidean distances
26 / 28
ISOMAP
How do we find the distance DX(ij)?
• DX(ij) is the shortest-path distance between i and j on the nearest-neighbor graph (Dijkstra’s algorithm)
27 / 28
ISOMAP
ISOMAP Algorithm
• Given: pairwise distances dij between high-dimensional input points xi, xj
• Compute the nearest-neighbour graph G using an ϵ-ball
• Compute DX from G (shortest paths)
• Apply MDS on DX to obtain the low-dimensional Y
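A compact sketch of these steps (using a k-nearest-neighbour graph in place of the ϵ-ball, and SciPy’s shortest_path routine for Dijkstra; all parameter choices are illustrative, and the neighbourhood graph is assumed to be connected):

    import numpy as np
    from scipy.sparse.csgraph import shortest_path
    from scipy.spatial.distance import pdist, squareform

    def isomap(X, n_neighbors=10, k=2):
        D = squareform(pdist(X))                          # pairwise Euclidean distances
        # Keep only edges to the n_neighbors closest points (0 = no edge)
        G = np.zeros_like(D)
        idx = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
        for i, nbrs in enumerate(idx):
            G[i, nbrs] = D[i, nbrs]
        G = np.maximum(G, G.T)                            # symmetrize the graph
        # Geodesic distances via Dijkstra; assumes the graph is connected
        D_geo = shortest_path(G, method="D", directed=False)
        # Classical MDS on squared geodesic distances
        n = D_geo.shape[0]
        H = np.eye(n) - np.ones((n, n)) / n
        K = -0.5 * H @ (D_geo ** 2) @ H
        lam, V = np.linalg.eigh(K)
        lam, V = lam[::-1][:k], V[:, ::-1][:, :k]
        return V * np.sqrt(lam)

    X = np.random.randn(300, 3)
    Y = isomap(X, n_neighbors=10, k=2)
    print(Y.shape)                                        # (300, 2)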
27 / 28
Implementation ISOMAP
Notebook
28 / 28