
Unsupervised Learning

Agenda
• Clustering methods

• K-means clustering

• Hierarchical clustering

• Dimensionality reduction

• PCA

Machine Learning Methods
Do you have labeled data?

• Yes → Supervised learning. What do you want to predict?
  - A category → Classification: logistic regression, KNN, SVM, CART
  - A quantity → Regression: linear regression, ridge regression, lasso

• No → Unsupervised learning. Do you want to group the data?
  - Yes → Clustering: K-means, hierarchical clustering
  - No → Dimensionality reduction: PCA
Unsupervised Learning
Recall: A set of statistical tools for data that only has features/input
available, but no response.

In other words, we have X’s but no labels y.

Goal: Discover interesting patterns/properties of the data.

• E.g. for visualizing or interpreting high-dimensional data.


Challenges of Unsupervised Learning
Why is unsupervised learning challenging?

• Exploratory data analysis — goal is not always clearly defined

• Difficult to assess performance — “right answer” unknown

• Working with high-dimensional data


Types of Unsupervised Learning
Two approaches:

• Cluster analysis

- For identifying homogeneous subgroups of samples

• Dimensionality reduction

- For finding a low-dimensional representation to characterize and visualize the data
Cluster Analysis

Clustering
A set of methods for finding subgroups within the dataset.

• Observations should share common characteristics within the same group, but differ across groups.

• Groupings are determined from attributes of the data itself — this differs from classification.

Image: https://medium.com/square-corner-blog/so-you-have-some-clusters-now-what-abfd297a575b
Clustering vs. Classification

[Figure: classification assigns a new observation ("?") to one of the known classes A or B; clustering partitions an unlabeled dataset into discovered groups, Clusters A–D.]
Types of Clustering
• Centroid-based clustering

• Hierarchical clustering

• Model-based clustering
- Each cluster is represented by a parametric distribution
- Dataset is a mixture of distributions

• Hard vs. soft/fuzzy clustering


- Hard: observations divided into distinct clusters
- Soft: observations may belong to more than one cluster (see the sketch below)
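Model-based and soft clustering can be illustrated with a Gaussian mixture model. Below is a minimal sketch using scikit-learn's GaussianMixture on hypothetical two-blob data: predict returns hard labels, while predict_proba returns soft (fuzzy) membership probabilities.

```python
# Sketch: model-based (mixture-of-Gaussians) clustering with soft assignments.
# The two-blob toy data is purely illustrative.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(4.0, 1.0, size=(100, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
hard_labels = gmm.predict(X)        # hard clustering: one cluster per observation
soft_probs = gmm.predict_proba(X)   # soft clustering: membership probabilities
print(soft_probs[:3].round(3))
```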
K-means Clustering
Groups data into K clusters that satisfy two properties:

1. Each observation belongs to at least one of the K clusters.

2. Clusters are non-overlapping. No observation belongs to more than one cluster.
K-means Clustering
A good clustering is one for which the within-cluster variation is as small as possible.

Denote each cluster by $C_k$, and let $W(C_k)$ be a measure of the within-cluster variation.

K-means aims to solve

$\min_{C_1, \dots, C_K} \sum_{k=1}^{K} W(C_k)$

where the standard choice of within-cluster variation is the squared Euclidean distance

$W(C_k) = \frac{1}{|C_k|} \sum_{i, i' \in C_k} \sum_{j=1}^{p} (x_{ij} - x_{i'j})^2.$
K-means Clustering Algorithm
1. Initialize each observation to a cluster by randomly assigning a cluster, from 1 to K, to each observation.

2. Iterate until the cluster assignments stop changing:

a. For each of the K clusters, compute the cluster centroid. The k-th cluster centroid is the vector of the p feature means for the observations in the k-th cluster.

b. Assign each observation to the cluster whose centroid is closest (using Euclidean distance as the metric).
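A minimal NumPy sketch of the algorithm above (Lloyd's algorithm). It is illustrative only: it ignores the possibility of a cluster becoming empty and is not an optimized implementation.

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: randomly assign each observation to one of the K clusters.
    labels = rng.integers(K, size=len(X))
    for _ in range(max_iter):
        # Step 2a: compute each cluster centroid (vector of feature means).
        centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
        # Step 2b: assign each observation to the closest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):   # assignments stopped changing
            break
        labels = new_labels
    return labels, centroids
```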
K-means Clustering Iterations

[Figures: snapshots of successive K-means iterations, followed by an animation of the algorithm converging.]
K-means Clustering Properties
It can be shown that the value of the objective function never increases from one iteration of K-means to the next.

However, since the algorithm only finds local minima, it can produce different clusterings under different initializations.
K-means Pros and Cons
Pros:

• Easy to implement and understand

Cons:

• Not robust to data perturbations and different initializations

• Treats each feature equally, so it is not robust to noise features or features on different scales — it looks for spherical clusters in feature space

• Need to define K before running the algorithm (see the usage sketch below)

Hierarchical Clustering
Clusters observations based on the distances between them.

The result is represented as a tree hierarchy (dendrogram) rather than a partition of the data.

Does not require committing to a choice of K.

Image: Sørlie, Therese, et al. (2003) "Repeated observation of breast tumor subtypes in independent gene expression data sets," PNAS.
Hierarchical Clustering Algorithm
1. Initialize each observation to its own cluster.

2. For i = n, n-1, …, 2:

a. Examine all pairwise inter-cluster similarities among the i clusters and identify the pair of clusters that are most similar. Fuse these two clusters. The dissimilarity between these two clusters indicates the height in the dendrogram at which the fusion occurs.

b. Compute the new pairwise inter-cluster similarities among the i-1 remaining clusters.
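A minimal sketch of this agglomerative procedure with SciPy, on illustrative random data. linkage returns the merge history (which pair was fused and at what dissimilarity), and dendrogram draws the tree with fusion heights on the y-axis.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))                           # illustrative data

Z = linkage(X, method="complete", metric="euclidean")  # (n-1) x 4 merge history
dendrogram(Z)                                          # fusion heights on the y-axis
plt.show()
```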
Distance Between Groups
It's easy to compute Euclidean distance between two observations. What is the distance or similarity between two groups or clusters of observations?

Linkage: defines the dissimilarity between two groups of observations. Most common types are complete, average, single, and centroid.
Types of Linkage

• Complete linkage
• Single linkage
• Average linkage
• Centroid linkage
Hierarchical Clustering Example
Illustration of the first few steps of
the hierarchical clustering
algorithm, with complete linkage
and Euclidean distance.
Different Linkage, Different Dendrogram
Hierarchical Clustering Pros and Cons
Pros:

• Don't have to choose a value of K (number of clusters) before running the algorithm

Cons:

• Do have to pick where to cut the dendrogram to obtain clusters (see the sketch below)

• Sensitive to similarity measure and type of linkage used
Dimensionality Reduction
Recall the curse of dimensionality when working in high dimensions.

Dimensionality reduction is the process of reducing the number of features under consideration.

We already saw some examples of this in the lasso and forward/backward selection algorithms. These methods reduce dimensionality by selecting a subset of features. However, they do so using supervision — knowing a response y that is of interest.
Principal Component Analysis
Look for a low-dimensional representation of the dataset that captures as much of the variation in the data as possible.

E.g., if we can obtain a 2D representation of the data, we can plot the observations in this low-dimensional space to gain intuition.

Note that you want to center the data and make the scales of the features comparable before performing PCA. E.g., if one feature is in kilometers and another in meters, the one in kilometers may appear to have lower variance when in fact this is due to scaling.
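A minimal sketch of this preprocessing with scikit-learn (on illustrative data): StandardScaler centers each feature and puts them on a comparable scale, and PCA then extracts a 2D representation for plotting.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                  # illustrative data: n = 100, p = 5

X_std = StandardScaler().fit_transform(X)      # center to mean 0, scale to unit variance
pca = PCA(n_components=2).fit(X_std)
Z = pca.transform(X_std)                       # 2D scores, one row per observation
print(pca.explained_variance_ratio_)           # share of variance captured by each PC
```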
Principal Components
The first principal component of a set of features $X_1, X_2, \dots, X_p$ is the normalized linear combination of the features

$Z_1 = \phi_{11} X_1 + \phi_{21} X_2 + \cdots + \phi_{p1} X_p$

that has the largest variance. "Normalized" refers to $\sum_{j=1}^{p} \phi_{j1}^2 = 1$.

We refer to $\phi_{11}, \dots, \phi_{p1}$ as the loadings of the first principal component.

Together, the loadings make up the first principal component loading vector $\phi_1 = (\phi_{11}, \dots, \phi_{p1})^T$.
Principal Components
The first principal component loading vector solves the optimization problem

$\max_{\phi_{11}, \dots, \phi_{p1}} \; \frac{1}{n} \sum_{i=1}^{n} \Big( \sum_{j=1}^{p} \phi_{j1} x_{ij} \Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} \phi_{j1}^2 = 1,$

where the $x_{ij}$ are assumed to have been centered to have mean zero.
Principal Components
The second principal component $Z_2$ is the linear combination of features that has maximal variance out of all linear combinations that are uncorrelated with $Z_1$.

Constraining $Z_2$ to be uncorrelated with $Z_1$ is equivalent to constraining the direction of $\phi_2$ to be orthogonal to $\phi_1$.
Principal Component Analysis

[Figure: the first two principal axes of a Gaussian dataset.]
Principal Components
Equivalently, find the eigenvectors with the largest eigenvalues of the sample covariance matrix.

By the singular value decomposition (SVD) of the centered data matrix, $X = U D V^T$.

The right singular vectors (the columns of V) are the loadings, or principal axes, of the data.

UD is the full principal components decomposition of X, aka the Z's on the previous slides.
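A sketch of the SVD route in NumPy on illustrative data: after centering, the columns of V are the loadings (principal axes), UD gives the scores Z, and the squared singular values divided by n-1 are the eigenvalues of the sample covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                 # illustrative data
Xc = X - X.mean(axis=0)                       # center the columns

U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
loadings = Vt.T                               # columns = principal axes (right singular vectors)
scores = U * d                                # U @ diag(d): the principal component scores Z
eigvals = d**2 / (len(X) - 1)                 # eigenvalues of the sample covariance matrix
```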
How many principal components?
Scree plot: plot the proportion of variance explained by each principal component and look for an "elbow" where the marginal gain drops off.