UNSUPERVISED LEARNING
(Clustering and K-Means Algorithm)
Outline of Presentation
❑Clustering
❑Types of Clustering
❑Examples and explanation of Clustering
❑K-Means Clustering
❑Working of K-Means Algorithm
❑K-Means Algorithm Examples
❑Hierarchical Clustering
❑Distance Based Clustering
Machine Learning
(Figure: the three branches of machine learning)
➢ Supervised Learning: task driven, with defined labels; predicts future values (Classification, Regression)
➢ Unsupervised Learning: data driven, with no defined labels (Clustering)
➢ Reinforcement Learning: learning from the mistakes
Supervised learning vs. Unsupervised learning
Supervised learning: discover patterns in the data that relate
data attributes with a target (class) attribute.
• These patterns are then utilized to predict the values of the
target attribute in future data instances.
Unsupervised learning: The data have no target attribute.
• We want to explore the data to find some intrinsic structures in
them.
Clustering
❑ Clustering is a technique for finding similarity groups in data,
called clusters, i.e.,
❖ It groups data instances that are similar to (near) each other in
one cluster and data instances that are very different (far away)
from each other into different clusters.
❑ Clustering is often called an unsupervised learning task as
no class values denoting an a priori grouping of the data
instances are given, unlike in supervised learning.
❑ Due to historical reasons, clustering is often considered
synonymous with unsupervised learning.
❖ In fact, association rule mining is also unsupervised
What is Clustering?
“Clustering is the process of dividing a dataset into groups
consisting of similar data points”
• Points in the same group are as similar as possible.
• Points in different groups are as dissimilar as possible.
Examples of Clustering
(Figures: items arranged in a mall; a group of diners in a restaurant)
Illustration of Clustering
The data set has three natural groups of
data points, i.e., 3 natural clusters.
Example 1: group people of similar sizes together to
make “small”, “medium” and “large” T-shirts.
• Tailor-made for each person: too expensive
• One-size-fits-all: does not fit all.
Example 2: In marketing, segment customers
according to their similarities
• To do targeted marketing.
➢ Clustering has a long history and is used in almost every field, e.g., medicine,
psychology, botany, sociology, biology, archeology, marketing, insurance,
libraries, etc.
➢ In recent years, due to the rapid increase of online documents, text
clustering has become important.
Aspects of Clustering
➢ A clustering algorithm
▪ Partitional clustering
▪ Hierarchical clustering
▪ …
➢ A distance (similarity, or dissimilarity) function
➢ Clustering quality
▪ Inter-cluster distance maximized
▪ Intra-cluster distance minimized
➢ The quality of a clustering result depends on the algorithm, the
distance function, and the application.
What is Similarity?
• In the field of cluster analysis, similarity plays an important part.
• Now, we shall learn how similarity (alternatively judged as “dissimilarity”) between any two data points can be measured.
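As a small illustration (my own, not from the slides), the snippet below computes two common dissimilarity measures, Euclidean and Manhattan distance, between two feature vectors using NumPy:

```python
import numpy as np

# Two data points described by the same numeric features.
p = np.array([1.0, 2.0, 3.0])
q = np.array([4.0, 0.0, 3.0])

# Euclidean distance: straight-line distance in feature space.
euclidean = np.linalg.norm(p - q)

# Manhattan distance: sum of absolute per-feature differences.
manhattan = np.abs(p - q).sum()

print(euclidean, manhattan)  # ≈3.606, 5.0
```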
Applications of clustering
• Recommendation systems
• Search results
• Banking / finance / insurance
• Grouping of photos
• Movie recommendation
• Retail stores
Applications of clustering (Face Clustering)
Types of CLUSTERING
❑ Exclusive Clustering
❑ Overlapping Clustering
❑ Hierarchical Clustering
Exclusive Clustering
• Hard clustering
• A data point/item belongs exclusively to one cluster
• For example: K-Means clustering
Overlapping Clustering
• Soft clustering
• Data points/items may belong to multiple clusters
• For example: Fuzzy C-Means clustering
Hierarchical Clustering
Methods of CLUSTERING
❑ Partitioning Method
❑ Hierarchical Method
❑ Density-Based Methods
Partitioning Methods
Partitioning a database D of n objects into a set of k clusters, such that the sum of
squared distances is minimized (where $c_i$ is the centroid or medoid of cluster $C_i$):

$E = \sum_{i=1}^{k} \sum_{p \in C_i} (p - c_i)^2$

k-means (MacQueen ’67, Lloyd ’57/’82): Each cluster is represented by the
center of the cluster
k-medoids or PAM (Partitioning Around Medoids) (Kaufman & Rousseeuw ’87):
Each cluster is represented by one of the objects in the cluster
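As a hedged illustration (names and data are my own, not from the slides), the snippet below evaluates this objective E for a given assignment of points to centroids:

```python
import numpy as np

def sse(X, labels, centroids):
    """Sum of squared distances of each point to its cluster centre (the objective E)."""
    return sum(np.sum((X[labels == i] - c) ** 2) for i, c in enumerate(centroids))

# Four 2-D points partitioned into k = 2 clusters.
X = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([X[labels == i].mean(axis=0) for i in range(2)])
print(sse(X, labels, centroids))  # 2.25
```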
K-means Clustering
➢ K-Means is a clustering algorithm whose main goal is to group similar elements
or data-points into a cluster.
➢ “K” represents the number of clusters
Examples:
• Pile of dirty clothes
• Document classifier
K-means Clustering
(A sequence of slides, not reproduced here, illustrates the working of the algorithm step by step: choose k initial centroids, assign each data point to its nearest centroid, recompute each centroid as the mean of its assigned points, and repeat until the assignments no longer change.)
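Since the original slide animation cannot be reproduced here, below is a minimal sketch of the standard K-means loop (Lloyd’s algorithm) in Python; all names are illustrative, not taken from the slides:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick k random data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the mean of its assigned points.
        new_centroids = np.array([
            X[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
            for i in range(k)
        ])
        # Step 4: stop when the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```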
How will you find the value of k?
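The slide’s figure is not reproduced here. A common approach is the elbow method: run K-means for a range of k values, plot the within-cluster sum of squared distances (inertia) against k, and pick the k at the “elbow” where the curve flattens. A sketch, assuming scikit-learn and matplotlib are available:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(300, 2))  # illustrative data

inertias = []
ks = range(1, 10)
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)  # within-cluster sum of squared distances

plt.plot(list(ks), inertias, marker="o")
plt.xlabel("k (number of clusters)")
plt.ylabel("inertia")
plt.title("Elbow method for choosing k")
plt.show()
```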
Summary of K-means algorithm
Application of K-means algorithm
Colour-Based Image Segmentation Using K-means
Step 1: Load a colour image of tissue stained with hematoxylin and eosin (H&E)
Application of K-means algorithm
Colour-Based Image Segmentation Using K-means
Step 2: Convert the image from RGB colour space to L*a*b*
colour space
• Unlike the RGB colour model, L*a*b* colour is
designed to approximate human vision.
• There is a complicated transformation between RGB
and L*a*b*.
(L*, a*, b*) = T(R, G, B)
(R, G, B) = T’(L*, a*, b*).
Application of K-means algorithm
Colour-Based Image Segmentation Using K-means
Step 3: Undertake clustering analysis in the (a*, b*) colour space
with the K-means algorithm
• In the L*a*b* colour space, each pixel has a
feature vector: (L*, a*, b*).
• As a form of feature selection, the L* feature is discarded. As a
result, each pixel has a feature vector (a*, b*).
• Apply the K-means algorithm to the image in the
a*b* feature space with K = 3, chosen using
domain knowledge (see the sketch below).
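A hedged sketch of Steps 2–3 (and the labelling in Step 4) using scikit-image and scikit-learn; the filename and variable names are illustrative assumptions:

```python
from skimage import io, color
from sklearn.cluster import KMeans

# Step 2: load the RGB image and convert it to L*a*b* colour space.
rgb = io.imread("he_tissue.png")   # hypothetical H&E image file
lab = color.rgb2lab(rgb)

# Step 3: discard L* and keep the (a*, b*) features of each pixel.
ab = lab[:, :, 1:3].reshape(-1, 2)

# Cluster the pixels into K = 3 colour groups (domain knowledge).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(ab)

# Step 4: label every pixel with its cluster index.
pixel_labels = kmeans.labels_.reshape(lab.shape[:2])
```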
Application of K-means algorithm
Colour-Based Image Segmentation Using K-means
Step 4: Label every pixel in the
image using the results
from K-means clustering
(indicated by three
different grey levels)
Application of K-means algorithm
Colour-Based Image Segmentation Using K-means
Step 5: Create images that segment the H&E image by colour
• Apply the label and the colour information of each pixel to obtain
separate colour images corresponding to the three clusters:
“blue” pixels, “white” pixels, “pink” pixels
Application of K-means algorithm
Colour-Based Image Segmentation Using K-means
Step 6: Segment the nuclei into a
separate image with the L*
feature
• In cluster 1, there are dark and light
blue objects (pixels). The dark blue
objects (pixels) correspond to nuclei
(from the domain knowledge).
• The L* feature specifies the brightness
value of each colour.
• With a threshold for L*, we obtain
an image containing the nuclei only.
Problem of the K-Means Method
• The algorithm is only applicable if the mean is defined.
• For categorical data, k-modes is used: the centroid is represented by
the most frequent values.
• The user needs to specify k.
• The algorithm is sensitive to outliers.
• Outliers are data points that are very far away from other
data points.
• Outliers could be errors in the data recording or some special
data points with very different values.
Problems with outliers
Problem of the K-Means Method
• The algorithm is sensitive to initial seeds.
Problem of the K-Means Method
• If we use different seeds: good results.
▪ There are some methods to help choose good seeds, as shown in the sketch below.
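One such method (my addition; the slide does not name it) is k-means++ seeding, which spreads the initial centroids apart. scikit-learn uses it by default and can also restart from several seeds and keep the best run:

```python
from sklearn.cluster import KMeans

# init="k-means++" spreads the initial seeds apart;
# n_init=10 restarts from 10 different seeds and keeps the best result.
km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0)
labels = km.fit_predict(X)  # X: an (n_samples, n_features) array, assumed given
```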
Problem of the K-Means Method
• The k-means algorithm is not suitable for discovering clusters
that are not hyper-ellipsoids (or hyper-spheres).
Hierarchical CLUSTERING
• Uses a distance matrix as the clustering criterion. This method does not
require the number of clusters k as an input, but needs a termination
condition.
(Figure: agglomerative nesting (AGNES) proceeds from Step 0 to Step 4, merging a, b, c, d, e bottom-up into {a,b}, {d,e}, {c,d,e}, and finally {a,b,c,d,e}; divisive analysis (DIANA) runs the same steps top-down, from Step 4 back to Step 0.)
Types of hierarchical clustering
➢ Agglomerative (bottom-up) clustering: It builds the
dendrogram (tree) from the bottom level, and
▪ merges the most similar (or nearest) pair of clusters,
▪ stops when all the data points are merged into a single cluster
(i.e., the root cluster).
➢ Divisive (top-down) clustering: It starts with all data
points in one cluster, the root.
▪ Splits the root into a set of child clusters; each child cluster is
recursively divided further.
▪ Stops when only singleton clusters of individual data points
remain, i.e., each cluster contains only a single point.
An example: working of the algorithm
Hierarchical CLUSTERING
Decompose data objects into several levels of nested partitionings (a tree of
clusters), called a dendrogram.
A clustering of the data objects is obtained by cutting the dendrogram at the
desired level; each connected component then forms a cluster. A sketch of this workflow follows.
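A brief sketch of this cut-the-dendrogram workflow using SciPy (assuming it is available; the data and cut threshold are illustrative):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1.0, 1.0], [1.2, 1.1], [5.0, 5.0], [5.1, 4.9], [9.0, 1.0]])

# Build the dendrogram bottom-up (agglomerative, Ward linkage).
Z = linkage(X, method="ward")

# Cut the dendrogram at a chosen distance; each connected
# component below the cut becomes one cluster.
labels = fcluster(Z, t=2.0, criterion="distance")
print(labels)  # e.g., [1 1 2 2 3]
```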
Density-Based CLUSTERING
• Clustering based on density (a local cluster criterion), such as density-
connected points
• Major features:
➢ Discovers clusters of arbitrary shape
➢ Handles noise
➢ One scan
➢ Needs density parameters as a termination condition
• Several interesting studies:
▪ DBSCAN: Ester, et al. (KDD’96)
▪ OPTICS: Ankerst, et al. (SIGMOD’99)
▪ DENCLUE: Hinneburg & D. Keim (KDD’98)
▪ CLIQUE: Agrawal, et al. (SIGMOD’98) (more grid-based)
Density-Based CLUSTERING
• Two parameters: Epsilon (Eps) and Minimum Points (MinPts)
• Eps: maximum radius of the neighbourhood
• MinPts: minimum number of points in an Eps-neighbourhood
of that point
• $N_{Eps}(p) = \{q \in D \mid dist(p, q) \le Eps\}$
• Directly density-reachable: A point p is directly density-reachable
from a point q w.r.t. Eps, MinPts if
▪ p belongs to $N_{Eps}(q)$, and
▪ q satisfies the core point condition: $|N_{Eps}(q)| \ge MinPts$
(Figure: p in the neighbourhood of core point q, with MinPts = 5 and Eps = 1 cm)
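A tiny illustration (my own, hedged) of the Eps-neighbourhood and the core point condition:

```python
import numpy as np

def eps_neighbourhood(X, p, eps):
    """Indices of all points in X within distance eps of point p, i.e. N_Eps(p)."""
    return np.flatnonzero(np.linalg.norm(X - X[p], axis=1) <= eps)

def is_core_point(X, p, eps, min_pts):
    """Core point condition: |N_Eps(p)| >= MinPts."""
    return len(eps_neighbourhood(X, p, eps)) >= min_pts
```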
Density-Reachable and Density-Connected
• Density-reachable: A point p is density-reachable from a point q
w.r.t. Eps, MinPts if there is a chain of points $p_1, \dots, p_n$,
$p_1 = q$, $p_n = p$, such that $p_{i+1}$ is directly density-reachable
from $p_i$.
• Density-connected: A point p is density-connected to a point q
w.r.t. Eps, MinPts if there is a point o such that both p and q are
density-reachable from o w.r.t. Eps and MinPts.
(Figures: a chain q, p1, …, p for density-reachability; p and q both reachable from o for density-connectivity)
DBSCAN: Density-Based Spatial
Clustering of Applications with Noise
• Relies on a density-based notion of cluster: A cluster is defined as a
maximal set of density-connected points
• Discovers clusters of arbitrary shape in spatial databases with noise
(Figure: core, border, and outlier points, with Eps = 1 cm and MinPts = 5)
DBSCAN: The Algorithm
• Arbitrarily select a point p
• Retrieve all points density-reachable from p w.r.t. Eps and MinPts
• If p is a core point, a cluster is formed
• If p is a border point, no points are density-reachable from p, and DBSCAN
visits the next point of the database
• Continue the process until all of the points have been processed
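A compact, self-contained sketch of this procedure (a hedged illustration, not a reference implementation; the label conventions are my own):

```python
import numpy as np

def region_query(X, i, eps):
    # All points within eps of point i (its Eps-neighbourhood).
    return np.flatnonzero(np.linalg.norm(X - X[i], axis=1) <= eps)

def dbscan(X, eps, min_pts):
    UNVISITED, NOISE = -2, -1
    labels = np.full(len(X), UNVISITED)
    cluster_id = 0
    for i in range(len(X)):
        if labels[i] != UNVISITED:
            continue                          # already processed
        neighbours = region_query(X, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE                 # may later become a border point
            continue
        labels[i] = cluster_id                # i is a core point: start a cluster
        seeds = list(neighbours)
        while seeds:                          # expand via density-reachability
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id        # border point joins the cluster
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(X, j, eps)
            if len(j_neighbours) >= min_pts:  # j is also a core point
                seeds.extend(j_neighbours)
        cluster_id += 1
    return labels                             # label -1 marks noise/outliers
```

Points left with label -1 after the scan are the outliers; every other point carries the id of the cluster it is density-connected to.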