Artificial Intelligence
Machine Learning
Unsupervised learning
Disclaimer Statement
In preparing these slides, material has been drawn from various online sources, including books, websites, research papers, and presentations. The author does not intend to claim any of this material as her/his own. This lecture (audio, video, slides, etc.) is prepared and delivered for educational purposes only and is not intended to infringe upon copyrighted material. Sources have been acknowledged where applicable. The views expressed are the presenter's alone and do not necessarily represent those of the original author(s) or the institution.
Unsupervised Learning
Training data: "examples" $x_1, \dots, x_n$, with $x_i \in \mathcal{X} \subset \mathbb{R}^d$.
• Clustering/segmentation: $f : \mathbb{R}^d \to \{C_1, \dots, C_k\}$ (a set of clusters).
Example: finding clusters in a population, among fruits, or among species.
Unsupervised learning
[Figure: scatter plot of the data over Feature 1 and Feature 2.]
Find clusters in the population using Feature 1 and Feature 2.
Unsupervised learning
[Figure: scatter plot over Feature 1 and Feature 2.]
Methods: k-means, Gaussian mixtures, hierarchical agglomerative clustering,
spectral clustering, DBSCAN, etc.
Clustering examples
• Clustering of the population by their demographics.
• Clustering of geographic objects (mineral deposits,
houses, etc.)
• Clustering of stars
• Audio signal separation. Example: separating individual speakers or instruments in a recording.
• Image segmentation. Example: grouping pixels of an image into regions that correspond to objects.
K-Means: example
Clustering of the population by their demographics.
K-Means: example
Clustering of geographic objects (mineral deposits, houses, etc.)
K-Means: example
Clustering of stars
Clustering: K-Means
• Goal: Assign each example $x_1, \dots, x_n$ to one of the $k$ clusters $\{C_1, \dots, C_k\}$.
• $\mu_j$ is the mean of all examples in the $j$-th cluster $C_j$.
• Minimize:
$$J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \|x_i - \mu_j\|^2$$
Clustering: K-Means
Algorithm K-Means:
Initialize $\mu_1, \dots, \mu_k$ randomly.
Repeat:
  Assign each point $x_i$ to the cluster with the closest mean $\mu_j$.
  Recompute the mean of each cluster:
  $$\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i$$
Until convergence*.
*Convergence: no change in the cluster assignments OR the maximum number of iterations is reached.
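To make the loop concrete, here is a minimal NumPy sketch of the algorithm above (the function name, seeding, and empty-cluster handling are illustrative choices, not part of the slides):

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    """Minimal k-means (Lloyd's algorithm). X: (n, d) array; returns (labels, means)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize the k means with k distinct data points chosen at random.
    mu = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster with the closest mean.
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)  # (n, k)
        new_labels = dists.argmin(axis=1)
        # Convergence: no change in the cluster assignments.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update step: each mean becomes the average of its assigned points.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # guard against an empty cluster
                mu[j] = members.mean(axis=0)
    return labels, mu
```

A call like `labels, mu = kmeans(X, k=3)` returns one cluster index per row of `X` together with the final cluster means.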
K-Means: pros and cons
+ Easy to implement
BUT...
- Need to know k
- Suffers from the curse of dimensionality
- No theoretical foundation
K-Means: questions
1. How to set k to optimally cluster the data?
2. How to evaluate your model?
3. How to cluster non-circular shapes?
K-Means: question 1
How to set k to optimally cluster the data?
G-means algorithm (Hamerly and Elkan, NIPS 2003):
1. Initialize k to be a small number.
2. Run k-means with those cluster centers, and store the resulting centers as C.
3. Assign each point to its nearest cluster.
4. Determine whether the points in each cluster fit a Gaussian distribution (Anderson-Darling test).
5. For each cluster, if the points seem to be normally distributed, keep the cluster center. Otherwise, replace it with two cluster centers.
6. Repeat from step 2 until no more cluster centers are created.
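The key step is the normality test in step 4. Following the paper's idea, one can project a cluster's points onto the direction connecting the two candidate child centers and run an Anderson-Darling test on the 1-D projections; the helper below is an illustrative sketch under that reading, not the authors' code:

```python
import numpy as np
from scipy.stats import anderson

def looks_gaussian(points, child_centers, significance=0.05):
    """G-means step 4 (sketch): is this cluster plausibly one Gaussian?

    `child_centers` are the two centers obtained by running 2-means
    inside the cluster; a genuine split should show up along the line
    connecting them."""
    # Project the points onto the direction between the child centers.
    v = child_centers[1] - child_centers[0]
    proj = points @ v / np.dot(v, v)
    # Anderson-Darling test for normality on the 1-D projections.
    result = anderson(proj, dist='norm')
    # scipy reports critical values at fixed significance levels
    # (15, 10, 5, 2.5, 1 percent); pick the one closest to ours.
    idx = int(np.argmin(np.abs(result.significance_level - significance * 100)))
    return result.statistic < result.critical_values[idx]
```

If `looks_gaussian` returns False, the cluster's center is replaced by the two child centers and the loop returns to step 2.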
K-Means: question 2
How to evaluate your model?
• Not trivial (compared to counting the number of errors in classification).
• Internal evaluation: uses the same data. Look for high intra-cluster similarity
(documents within a cluster are similar) and low inter-cluster similarity. E.g., the
Davies-Bouldin index takes into account both the distances inside the clusters and
the distances between clusters: the lower the value of the index, the wider the
separation between different clusters and the more tightly the points within each
cluster are located together.
• External evaluation: uses ground truth or external data. E.g., mutual
information, entropy, adjusted Rand index, etc.
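Both kinds of measures are available off the shelf; for example, a small scikit-learn sketch (the toy data and parameter values are purely illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score, adjusted_rand_score

# Toy data: two well-separated blobs with known ground-truth labels.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
true_labels = np.array([0] * 50 + [1] * 50)

pred_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Internal evaluation: needs no ground truth; lower is better.
print("Davies-Bouldin index:", davies_bouldin_score(X, pred_labels))
# External evaluation: compares against ground-truth labels;
# 1.0 means a perfect match up to relabeling of the clusters.
print("Adjusted Rand index:", adjusted_rand_score(true_labels, pred_labels))
```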
K-Means: question 3
How to cluster non-circular shapes?
There are other methods, e.g., spectral clustering, DBSCAN, BIRCH, etc., that can
handle such shapes.
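As an illustration, DBSCAN can recover non-circular clusters because it grows clusters from local density rather than distance to a single mean; a short scikit-learn sketch on the classic two-moons data (the `eps` and `min_samples` values are illustrative and data dependent):

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaving half-circles: k-means would cut them in half,
# since neither "moon" is a circular blob around a single mean.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# DBSCAN groups points that are density-reachable from one another;
# points in low-density regions are labeled -1 (noise).
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
print("clusters found:", len(set(labels) - {-1}))
```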