
Module3 OTML

The document outlines a course on optimization techniques in machine learning, focusing on Principal Component Analysis (PCA) for dimensionality reduction. It explains the motivation for PCA, its mathematical foundations, and its applications in real-world scenarios such as image compression and finance. The course aims to equip students with the ability to analyze and operationalize machine learning models while addressing challenges posed by high-dimensional data.

A8751 – Optimization Techniques in Machine Learning

Course Overview:
Students will be able to understand and analyze how to deal
with changing data. They will also be able to identify and interpret
potential unintended effects in a project, and to understand
and define procedures to operationalize and maintain an
applied machine learning model.

Edited By Mr S Srinivas Reddy Asst Professor


Module 3:
Dimensionality Reduction and Optimization
Based on Mathematics for Machine Learning by
Deisenroth et al.

Chapter Referenced:
Chapter 10 (Dimensionality Reduction with PCA) of
Mathematics for Machine Learning

Topics:
Problem Setting, Maximum Variance Perspective,
Projection Perspective, Eigenvector Computation and
Low-Rank Approximations, PCA in High Dimensions, Key
Steps of PCA in Practice, Latent Variable Perspective
Motivation & Intuition
"Imagine we collect data on 5 characteristics of students: height,
weight, exam score, attendance, and class participation. This is a
5-dimensional dataset. Visualizing and analyzing it is hard. But what if
we could summarize most of this information in just 2 numbers and
still capture nearly all the differences between students?"

This is exactly what PCA (Principal Component Analysis) does.

Objective: Reduce dimensions but retain maximum useful information.
What is the Problem Setting in PCA?

• Full form: Principal Component Analysis.
• Type: Linear dimensionality reduction technique.
• Objective: Transform high-dimensional data into a lower-dimensional
  space while retaining as much information (variance) as possible.
Why do we need PCA?

• Real-world datasets often have many features (e.g., hundreds or thousands).
• Many of these features are correlated or redundant.
• Working with all features leads to:
   • High computation cost.
   • Difficulty in visualization.
   • Overfitting due to noisy or irrelevant features.
• PCA helps by:
   • Finding new uncorrelated variables (principal components).
   • Keeping only the most important components (those with
     highest variance).
Why do we need this?

• High-dimensional data often lies on or near a low-dimensional subspace.
• Many features are redundant or correlated.
• Working in lower dimensions reduces storage and computation,
  and makes visualization easier.
PCA
• Goal: find new directions (called principal components) along which the data
  varies the most.
• These directions are orthogonal (perpendicular) to each other.
• Why do this?
   • Reduce dimensionality (2D → 1D, or higher → lower).
   • Remove redundancy between correlated features.
   • Focus on the most informative directions.

Principal Components
• The directions we find are the eigenvectors of the covariance matrix.
• The importance of each direction is given by the eigenvalues (they tell
  how much variance lies along that direction).
Eigenvectors and Eigenvalues

Eigenvectors (of the covariance matrix): Directions (axes) of the new
coordinate system. The eigenvector with the largest eigenvalue is the
direction along which the data shows maximum variance.

Eigenvalues: Amount of variance captured along each eigenvector.

Key Idea
• PCA transforms original correlated features into new
  uncorrelated features (principal components).
• The first principal component = eigenvector with largest
  eigenvalue (maximum variance).
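
The key idea above can be checked numerically. The following is a minimal sketch, assuming NumPy and a small synthetic dataset (the data, the seed, and all variable names are illustrative and not part of the course material): it projects correlated 2-D data onto the eigenvectors of its covariance matrix and confirms that the new features are uncorrelated, with variances equal to the eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical correlated 2-D data: the second feature depends on the first.
x1 = rng.normal(size=500)
x2 = 0.8 * x1 + 0.3 * rng.normal(size=500)
X = np.column_stack([x1, x2])

Xc = X - X.mean(axis=0)                 # center each feature
S = np.cov(Xc, rowvar=False)            # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)    # eigenvalues in ascending order

Z = Xc @ eigvecs                        # data expressed in the eigenvector basis
S_z = np.cov(Z, rowvar=False)           # covariance of the new features

print("original covariance:\n", S)
print("covariance after PCA rotation:\n", S_z)
```

The off-diagonal entries of `S_z` should be zero up to rounding, and its diagonal should match `eigvals`.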
Real-World Examples

• Image Compression: Reduce each image from 784 pixel values to 50
  components while preserving the shape of the digit (MNIST dataset).
• Face Recognition: PCA generates "eigenfaces" for efficient storage and
  recognition.
• Finance: Reduce correlated stock indicators into a few principal factors.
• Weather Data: Temperature and humidity projected into one dimension
  for seasonal trend analysis.
How does PCA work (Conceptual)?

1. Data as points in high-dimensional space
   Example: Each 28×28 pixel image = a point in 784-D space.
2. Variance as information
   Directions where data varies most = most informative.
3. Principal components
   • First principal component (PC1): Direction of maximum variance.
   • Second principal component (PC2): Next orthogonal direction of
     maximum variance.
   • And so on.
4. Projection
   • Project original data onto the first few components.
   • New representation = lower dimension but preserves most variance.
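
The four conceptual steps above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the course's reference implementation: the function name `pca`, the synthetic 5-feature dataset, and the choice of `k=2` are all assumptions made for the example.

```python
import numpy as np

def pca(X, k):
    """Return (components, explained_ratio, Z) for the top-k principal components."""
    Xc = X - X.mean(axis=0)                      # 1. treat rows as centered points
    S = np.cov(Xc, rowvar=False)                 # 2. variance/covariance of features
    eigvals, eigvecs = np.linalg.eigh(S)         # 3. eigen-decomposition (ascending)
    order = np.argsort(eigvals)[::-1]            #    sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :k]                  #    PC1, PC2, ... as columns
    Z = Xc @ components                          # 4. project onto first k components
    explained_ratio = eigvals[:k] / eigvals.sum()
    return components, explained_ratio, Z

# Hypothetical data: 200 samples with 5 features driven by 2 hidden factors.
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 5))

components, explained_ratio, Z = pca(X, k=2)
print("reduced shape:", Z.shape)
print("fraction of variance kept:", explained_ratio.sum())
```

Because the synthetic features are generated from only two hidden factors plus a little noise, two components should keep most of the variance here. Real data will usually need the explained ratio inspected before choosing `k`.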
PCA with Covariance Matrix
Problem:
Solution:

Step 1: Interpret the Covariance Matrix
Diagonal elements (2, 2): Variance of each feature = 2 → both features vary
equally.
Off-diagonal elements (1): Covariance = 1 → positive covariance: the two
features tend to increase together. (An inverse relationship, such as larger
engines having lower fuel efficiency, would appear as a negative covariance.)

Step 2: Find Principal Components (Systematic Approach)
Step 2.1: Write the Covariance Matrix

Given covariance matrix: S = [[2, 1], [1, 2]]

Step 2.2: Compute Eigenvalues

Step 2.3: Compute Eigenvectors

Step 2.4: Order Eigenvalues and Select Principal Components

Step 2.5: Variance Explained
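
The steps above can be worked through by hand. The derivation below is a sketch that assumes the covariance matrix has diagonal entries 2 and off-diagonal entries 1, as described in the interpretation step; the symbols S, λ, and v are notation chosen for this example.

```latex
\begin{align*}
S &= \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}
  && \text{(Step 2.1)} \\[4pt]
\det(S - \lambda I) &= (2-\lambda)^2 - 1 = \lambda^2 - 4\lambda + 3 = 0
  \;\Rightarrow\; \lambda_1 = 3,\ \lambda_2 = 1
  && \text{(Step 2.2)} \\[4pt]
(S - 3I)\,v = 0 &\;\Rightarrow\; -v_1 + v_2 = 0
  \;\Rightarrow\; v^{(1)} = \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}
  && \text{(Step 2.3)} \\
(S - I)\,v = 0 &\;\Rightarrow\; v_1 + v_2 = 0
  \;\Rightarrow\; v^{(2)} = \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix} \\[4pt]
\lambda_1 = 3 > \lambda_2 = 1 &\;\Rightarrow\; \text{PC1} = v^{(1)},\ \text{PC2} = v^{(2)}
  && \text{(Step 2.4)} \\[4pt]
\frac{\lambda_1}{\lambda_1 + \lambda_2} &= \frac{3}{4} = 75\%, \qquad
\frac{\lambda_2}{\lambda_1 + \lambda_2} = \frac{1}{4} = 25\%
  && \text{(Step 2.5)}
\end{align*}
```

Under this assumption, keeping only PC1 retains 75% of the total variance.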


Maximum Variance Perspective of PCA

What is the idea?

We have high-dimensional data (e.g., 2D or 3D) and want to reduce it to
fewer dimensions (e.g., 1D) while keeping maximum information.

Information = spread of data = variance.

So, choose a line (direction) where the variance of the projected data is
maximum.

1. What is the spread of data?
• Spread = how far data points are from the center (mean).
• If data points are close to the mean → small spread.
• If data points are far from the mean → large spread.
• Mathematically, spread is measured by distance from the mean.

2. Measuring distance: deviation from the mean
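
The deviation from the mean, and the variance built from it, can be written out as follows. This is a standard formulation offered as a sketch; the symbols (N data points x_i, mean x̄, deviation d_i) and the 1/N normalization are assumptions, since the slide's own formula is not available.

```latex
\begin{align*}
\bar{x} &= \frac{1}{N} \sum_{i=1}^{N} x_i
  && \text{(mean, the center of the data)} \\
d_i &= x_i - \bar{x}
  && \text{(deviation of point } i \text{ from the mean)} \\
\operatorname{Var}(x) &= \frac{1}{N} \sum_{i=1}^{N} d_i^{\,2}
  = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2
  && \text{(average squared deviation = spread)}
\end{align*}
```

Squaring the deviations keeps positive and negative deviations from cancelling, so points far from the mean on either side increase the variance.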


Maximum Variance Perspective of PCA: step-by-step solution
• PCA looks for a line (direction) where the data
  points are spread out the most.
• This line is called the first principal component (PC1).
• By projecting data onto this line, we keep most
  of the important information but in fewer dimensions.
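
One way to see this in action is to try many candidate lines and measure the variance of the projected data along each. The sketch below assumes NumPy and a synthetic 2-D dataset (the data and the 0.1-degree angle grid are illustrative choices); it finds the best direction by brute force and compares it with the top eigenvector of the covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical correlated 2-D data, centered at the origin.
x1 = rng.normal(size=400)
X = np.column_stack([x1, 0.5 * x1 + 0.2 * rng.normal(size=400)])
Xc = X - X.mean(axis=0)

# Candidate directions: unit vectors at angles from 0 to 180 degrees.
angles = np.linspace(0.0, np.pi, 1801)
directions = np.column_stack([np.cos(angles), np.sin(angles)])

projected = Xc @ directions.T            # one column of 1-D projections per direction
variances = projected.var(axis=0)        # spread of the data along each direction
best = directions[np.argmax(variances)]  # direction with maximum projected variance

# Compare with PC1 from the covariance matrix (1/N normalization).
S = Xc.T @ Xc / len(Xc)
eigvals, eigvecs = np.linalg.eigh(S)     # ascending order
pc1 = eigvecs[:, -1]

print("best direction by search:", best)
print("PC1 from eigenvectors:   ", pc1)
print("max projected variance:", variances.max(), "largest eigenvalue:", eigvals[-1])
```

The two directions should agree up to sign and the resolution of the angle grid, and the maximum projected variance should match the largest eigenvalue.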
Projection Perspective:

Instead of thinking about "maximum spread," the projection perspective
looks at it like this:

"If we drop the data onto a line, which line gives the least
reconstruction error?"

In other words, we want the line where points stay closest to their
original positions after projection and reconstruction.
There are two complementary ways to understand PCA
mathematically:
1. Maximum Variance Perspective (already studied):
   • Finds directions (principal components) with maximum
     variance.
   • Equivalent to finding eigenvectors of the covariance matrix.
2. Projection Perspective (to study now):
   • Minimizes reconstruction error when projecting data onto a
     subspace.
   • Mathematically equivalent to the variance perspective but
     focuses on error minimization.

Motivation
• In real-world applications, we often project high-dimensional data onto
  fewer dimensions (like a plane or line).
• The question: How do we choose this projection to minimize the
  information lost?

Maximum Variance Perspective says: "Pick the direction with maximum
spread."
Projection Perspective says: "Pick the subspace that gives the smallest
reconstruction error after projection."

Both lead to the same principal components but are derived differently:
• Variance View: Maximize variance of projected data.
• Projection View: Minimize error of reconstructing original data from
  the projection.
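
Why the two views agree can be seen from a short identity. The sketch below assumes centered data points x_n, a unit-length direction b, and the covariance matrix S with 1/N normalization; this notation is chosen for the example.

```latex
\begin{align*}
\|x_n - b\,b^{\top} x_n\|^2
  &= \|x_n\|^2 - 2\,(b^{\top} x_n)^2 + (b^{\top} x_n)^2\,\|b\|^2
   = \|x_n\|^2 - (b^{\top} x_n)^2
   \qquad (\|b\| = 1) \\[4pt]
\underbrace{\frac{1}{N}\sum_{n=1}^{N} \|x_n - b\,b^{\top} x_n\|^2}_{\text{reconstruction error}}
  &= \underbrace{\frac{1}{N}\sum_{n=1}^{N} \|x_n\|^2}_{\text{total variance (fixed)}}
   \;-\; \underbrace{b^{\top} S\, b}_{\text{variance of projected data}},
   \qquad S = \frac{1}{N}\sum_{n=1}^{N} x_n x_n^{\top}
\end{align*}
```

Since the total variance does not depend on b, the direction that maximizes the projected variance is exactly the direction that minimizes the reconstruction error.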
How It Works

1. Choose a line (direction) → candidate principal component.
2. Project data points onto this line (like casting shadows).
3. Reconstruct back (lift shadows back to 2D).
4. Measure error:
   • Error = distance between original point and reconstructed point.
5. Find the direction that gives the smallest total error.
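
The five steps above can be sketched as follows. This is a minimal illustration assuming NumPy, a synthetic 2-D dataset, and mean squared distance as the error measure; the helper name `reconstruction_error` and the three candidate directions are hypothetical choices for the example.

```python
import numpy as np

def reconstruction_error(Xc, b):
    """Mean squared distance between centered points and their reconstructions along b."""
    b = b / np.linalg.norm(b)          # step 1: candidate direction as a unit vector
    z = Xc @ b                         # step 2: project (1-D "shadow" coordinates)
    X_rec = np.outer(z, b)             # step 3: lift the shadows back to 2-D
    return np.mean(np.sum((Xc - X_rec) ** 2, axis=1))  # step 4: measure the error

# Hypothetical strongly correlated 2-D data.
rng = np.random.default_rng(3)
x1 = rng.normal(size=300)
X = np.column_stack([x1, 0.9 * x1 + 0.2 * rng.normal(size=300)])
Xc = X - X.mean(axis=0)

S = Xc.T @ Xc / len(Xc)
eigvals, eigvecs = np.linalg.eigh(S)   # ascending: eigvals[0] is the smaller one
pc1, pc2 = eigvecs[:, 1], eigvecs[:, 0]

# Step 5: compare candidate directions.
err_pc1 = reconstruction_error(Xc, pc1)
err_axis = reconstruction_error(Xc, np.array([1.0, 0.0]))
err_pc2 = reconstruction_error(Xc, pc2)
retained = eigvals[1] / eigvals.sum()  # fraction of variance kept by PC1

print("error along PC1:   ", err_pc1)
print("error along x-axis:", err_axis)
print("error along PC2:   ", err_pc2)
print("variance retained by PC1:", retained)
```

With this error measure, the error along PC1 equals the smaller eigenvalue, which is the variance in the discarded direction.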


Interpretation:
• When we reduce dimensions (2D → 1D), we lose some information.
• PCA keeps the direction of maximum variance (PC1) and ignores the second
  direction (PC2).
• The error represents information lost in the ignored direction.
• Error = 0.0665 (very small).
• Meaning: Projection onto PC1 retains almost all the information (≈95.9% of
  the variance kept).
• The lost variance (≈4%) is small, so 1D is a good approximation.
