SE-807/CS-871
Machine Learning
Prof Dr. Hammad Afzal
hammad.afzal@mcs.edu.pk
Data and Text Processing Lab
www.codteem.com
Agenda
• Unsupervised Learning
– K-Means Clustering
– Agglomerative Clustering
Unsupervised Learning
CLUSTERING
Clustering is the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait, often according to some defined distance measure.
Clustering is unsupervised classification.
CLUSTERING
There is no explicit teacher; the system forms clusters, or "natural groupings", from the structure in the input patterns.
CLUSTERING
• Data WITHOUT classes or labels: x1, x2, x3, ..., xn, where each xi ∈ R^d
• Deals with finding a structure in a collection of unlabeled data.
• The process of organizing objects into groups whose members are similar in some way.
• A cluster is therefore a collection of objects which are "similar" to one another and "dissimilar" to the objects belonging to other clusters.
CLUSTERING
• In this case we can easily identify the 4 clusters into which the data can be divided.
• The similarity criterion is distance: two or more objects belong to the same cluster if they are "close" according to a given distance.
[Figure: scatter plot of 2-D points forming 4 well-separated clusters]
Types of Clustering
Hierarchical algorithms
These find successive clusters using previously established clusters.
1. Agglomerative ("bottom-up"): agglomerative algorithms begin with each element as a separate cluster and merge them into successively larger clusters.
2. Divisive ("top-down"): divisive algorithms begin with the whole set and proceed to divide it into successively smaller clusters.
Types of Clustering
• Partitional clustering
– Constructs a partition of the data set to produce several clusters at once
– The process is repeated iteratively until a termination condition is met
– Examples:
▪ K-means clustering
▪ Fuzzy c-means clustering
K-means Clustering
1. Choose the number (K) of clusters and randomly select the centroids of each cluster.
2. For each data point:
   I. Calculate the distance from the data point to each cluster centroid.
   II. Assign the data point to the closest cluster.
3. Recompute the centroid of each cluster.
4. Repeat steps 2 and 3 until there is no further change in the assignment of data points (or in the centroids).
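These steps translate almost line-for-line into NumPy. The following is a minimal sketch, not a definitive implementation: it assumes Euclidean distance and that no cluster ever becomes empty.

import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    # X is an (n, d) array of n data points with d attributes each.
    rng = np.random.default_rng(seed)
    # Step 1: randomly pick k data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: distance from every point to every centroid,
        # then assign each point to its closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its members
        # (assumes every cluster keeps at least one member).
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 4: stop when the centroids (and hence assignments) no longer change.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids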
K MEANS – Example 2
– Suppose we have 4 medicines, each with two attributes (weight and pH index).
– Our goal is to group these objects into K = 2 clusters of medicines.

Medicine   Weight   pH-Index
A          1        1
B          2        1
C          4        3
D          5        4
K MEANS – Example 2
– Compute the distance between all samples and the K centroids, taking c1 = A = (1, 1) and c2 = B = (2, 1). For example, for D = (5, 4):
d(D, c1) = √((5 − 1)² + (4 − 1)²) = 5
d(D, c2) = √((5 − 2)² + (4 − 1)²) = 4.24
K MEANS – Example 2
– Assign each sample to its closest cluster.
– An element in a row of the Group matrix is 1 if and only if the object is assigned to that group.
K MEANS – Example 2
– Re-calculate the K centroids: knowing the members of each cluster, we now compute the new centroid of each group based on these new memberships.
c1 = (1, 1)
c2 = ((2 + 4 + 5)/3, (1 + 3 + 4)/3) = (11/3, 8/3) = (3.67, 2.67)
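As a hedged sketch, this first iteration can be reproduced in a few lines of NumPy (the array names are illustrative):

import numpy as np

# Medicines A-D as (weight, pH-index) points; initial centroids c1 = A, c2 = B.
X = np.array([[1., 1.], [2., 1.], [4., 3.], [5., 4.]])
centroids = np.array([[1., 1.], [2., 1.]])

# Distance of every point to every centroid, e.g. d(D, c2) = 4.24.
dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
labels = dists.argmin(axis=1)   # A -> c1;  B, C, D -> c2

# New centroids from the memberships: c1 = (1, 1), c2 = (3.67, 2.67).
centroids = np.array([X[labels == j].mean(axis=0) for j in range(2)])
print(dists.round(2), labels, centroids.round(2))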
K MEANS – Example 2
• Repeat the above steps: compute the distance of all objects to the new centroids.
K MEANS – Example 2
Assign the membership to objects
K MEANS – Example 2
Knowing the members of each cluster, we now compute the new centroid of each group based on these new memberships:
c1 = ((1 + 2)/2, (1 + 1)/2) = (1.5, 1)
c2 = ((4 + 5)/2, (3 + 4)/2) = (4.5, 3.5)
K MEANS – Example 2
• We obtain G2 = G1: comparing the grouping of the last iteration with this iteration reveals that the objects do not move between groups anymore.
• Thus, the k-means clustering has converged and no more iterations are needed.
Kmeans - Examples
Data points: the RGB values of pixels. This can be used for image segmentation.
D. Comaniciu and P. Meer, Robust Analysis of Feature Spaces: Color Image Segmentation, 1997.
Kmeans - Examples
Extraction of text in degraded documents
Extraction of text in degraded documents
[Figure: original image vs. K-means result with k = 3]
Kmeans - Examples
[Figure: original image vs. K-means results with K = 5 and K = 11]
Kmeans - Examples
• Quantization of colors
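A sketch of how color quantization is commonly done with scikit-learn (the file name and k = 8 are illustrative assumptions): cluster the RGB values of all pixels, then repaint each pixel with its cluster centroid.

import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

img = np.asarray(Image.open("photo.jpg"))      # placeholder file name
pixels = img.reshape(-1, 3).astype(float)      # one RGB sample per pixel

km = KMeans(n_clusters=8, n_init=10).fit(pixels)
quantized = km.cluster_centers_[km.labels_]    # replace pixels by their centroid
out = quantized.reshape(img.shape).astype(np.uint8)
Image.fromarray(out).save("photo_8colors.png")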
Hierarchical Clustering
Hierarchical clustering
Agglomerative and divisive clustering on the data set {a, b, c, d, e}:
• Agglomerative (step 0 → step 4): start from the singletons {a}, {b}, {c}, {d}, {e}; merge a and b into ab, d and e into de, then c and de into cde, and finally ab and cde into abcde.
• Divisive (step 4 → step 0): the same tree read in reverse, starting from abcde and splitting down to the single elements.
Agglomerative clustering
1. Convert object attributes to a distance matrix.
2. Set each object as a cluster (thus, if we have N objects, we will have N clusters at the beginning).
3. Repeat until the number of clusters is one (or a known number of clusters), as in the sketch below:
   a. Merge the two closest clusters.
   b. Update the distance matrix.
[Figure: points d1–d5 merged step by step: first {d1, d2} and {d4, d5}, then {d3, d4, d5}, and finally a single cluster]
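A naive sketch of this loop under single linkage (O(n³), purely for illustration; in practice one would use scipy.cluster.hierarchy, shown later):

import numpy as np

def agglomerative_single_link(X, n_clusters=1):
    # Step 1: distance matrix; Step 2: each object starts as its own cluster.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = [[i] for i in range(len(X))]
    # Step 3: repeat until the desired number of clusters remains.
    while len(clusters) > n_clusters:
        # 3a: find the two closest clusters (single link: smallest
        # point-to-point distance between them) ...
        best = (np.inf, 0, 1)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(D[p, q] for p in clusters[i] for q in clusters[j])
                if d < best[0]:
                    best = (d, i, j)
        # ... and 3b: merge them, which implicitly updates all cluster distances.
        _, i, j = best
        clusters[i] += clusters.pop(j)
    return clusters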
Starting Situation
• Start with clusters of individual points and a distance/proximity matrix
[Figure: empty distance/proximity matrix with one row and one column per point p1, p2, p3, …]
Intermediate situation
• After some merging steps, we have some clusters
[Figure: clusters C1–C5 in the data and the corresponding 5 × 5 distance matrix]
Intermediate situation
• How do we compare two clusters?
[Figure: clusters C1–C5]
Inter cluster distance measures
Similarity?
• Single Link
• Average Link
• Complete Link
• Distance between centroids
Intermediate situation
• We want to merge the two closest clusters (C2 and C5) and update the distance matrix.
[Figure: clusters C1–C5 and the distance matrix before merging C2 and C5]
Single link
• Smallest distance between an element in one cluster and an element in the other:
D(ci, cj) = min { D(x, y) : x ∈ ci, y ∈ cj }
Complete link
• Largest distance between an element in one cluster and an element in the other:
D(ci, cj) = max { D(x, y) : x ∈ ci, y ∈ cj }
Average Link
• Average distance between an element in one cluster and an element in the other:
D(ci, cj) = avg { D(x, y) : x ∈ ci, y ∈ cj }
Distance between centroids
• Distance between the centroids of two clusters
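These four measures differ only in how the pairwise distances are reduced. A small sketch, assuming scipy is available (the cluster arguments are (m, d) arrays of points):

import numpy as np
from scipy.spatial.distance import cdist

def cluster_distance(ci, cj, link="single"):
    D = cdist(ci, cj)            # all pairwise distances D(x, y)
    if link == "single":         # smallest pairwise distance
        return D.min()
    if link == "complete":       # largest pairwise distance
        return D.max()
    if link == "average":        # mean over all pairs
        return D.mean()
    # otherwise: distance between the two centroids
    return np.linalg.norm(ci.mean(axis=0) - cj.mean(axis=0))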
After Merging
• Update the distance matrix: only the row and column of the merged cluster C2 ∪ C5 need to be recomputed (the entries marked "?"):

Dist      C1    C2 ∪ C5   C3    C4
C1        0.00  ?         ·     ·
C2 ∪ C5   ?     0.00      ?     ?
C3        ·     ?         0.00  ·
C4        ·     ?         ·     0.00

(· = entries carried over unchanged from the previous matrix)
Agglomerative Clustering - Example
Data matrix:
      X1    X2
A     1     1
B     1.5   1.5
C     5     5
D     3     4
E     4     4
F     3     3.5

Euclidean distance, e.g. dAB = √((1 − 1.5)² + (1 − 1.5)²) = 0.707.

Distance matrix:
Dist  A     B     C     D     E     F
A     0.00  0.71  5.66  3.61  4.24  3.20
B     0.71  0.00  4.95  2.92  3.54  2.50
C     5.66  4.95  0.00  2.24  1.41  2.50
D     3.61  2.92  2.24  0.00  1.00  0.50
E     4.24  3.54  1.41  1.00  0.00  1.12
F     3.20  2.50  2.50  0.50  1.12  0.00
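This matrix can be checked with scipy (a sketch; pdist returns the condensed pairwise distances and squareform expands them to the full matrix):

import numpy as np
from scipy.spatial.distance import pdist, squareform

# Points A-F from the data matrix above.
X = np.array([[1, 1], [1.5, 1.5], [5, 5], [3, 4], [4, 4], [3, 3.5]])
D = squareform(pdist(X))        # 6 x 6 Euclidean distance matrix
print(D.round(2))               # e.g. D[0, 1] = 0.71, D[3, 5] = 0.50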
Merge two closest clusters
Agglomerative Clustering - Example
Find the two closest clusters in the distance matrix above: D and F at distance 0.50. Merge them into a single cluster (D, F).
Update Distance Matrix
Agglomerative Clustering - Example
After the merge, the row and column for (D, F) must be recomputed:

Dist  A     B     C     D,F   E
A     0.00  0.71  5.66  ?     4.24
B     0.71  0.00  4.95  ?     3.54
C     5.66  4.95  0.00  ?     1.41
D,F   ?     ?     ?     0.00  ?
E     4.24  3.54  1.41  ?     0.00
Update Distance Matrix
Agglomerative Clustering - Example
Minimum distance – single linkage:
D(D,F)→A = min(dDA, dFA) = min(3.61, 3.20) = 3.20
D(D,F)→B = min(dDB, dFB) = min(2.92, 2.50) = 2.50
D(D,F)→C = min(dDC, dFC) = min(2.24, 2.50) = 2.24
D(D,F)→E = min(dDE, dFE) = min(1.00, 1.12) = 1.00

Dist  A     B     C     D,F   E
A     0.00  0.71  5.66  3.20  4.24
B     0.71  0.00  4.95  2.50  3.54
C     5.66  4.95  0.00  2.24  1.41
D,F   3.20  2.50  2.24  0.00  1.00
E     4.24  3.54  1.41  1.00  0.00
Merge two closest clusters
Agglomerative Clustering - Example
The closest pair is now A and B at distance 0.71; merge them into (A, B):

Dist  A,B   C     D,F   E
A,B   0.00  ?     ?     ?
C     ?     0.00  2.24  1.41
D,F   ?     2.24  0.00  1.00
E     ?     1.41  1.00  0.00
Update Distance Matrix
Agglomerative Clustering - Example
D(A,B)→C = min(dCA, dCB) = min(5.66, 4.95) = 4.95
D(A,B)→(D,F) = min(dDA, dDB, dFA, dFB) = min(3.61, 2.92, 3.20, 2.50) = 2.50
D(A,B)→E = min(dAE, dBE) = min(4.24, 3.54) = 3.54

Dist  A,B   C     D,F   E
A,B   0.00  4.95  2.50  3.54
C     4.95  0.00  2.24  1.41
D,F   2.50  2.24  0.00  1.00
E     3.54  1.41  1.00  0.00
Merge two closest clusters / Update Distance Matrix
Agglomerative Clustering - Example
The closest pair is now (D, F) and E at distance 1.00; merge them into ((D, F), E) and update, e.g. D((D,F),E)→C = min(2.24, 1.41) = 1.41:

Dist     (A,B)  C     (D,F),E
(A,B)    0.00   4.95  2.50
C        4.95   0.00  1.41
(D,F),E  2.50   1.41  0.00
Merge two closest clusters / Update Distance Matrix
Agglomerative Clustering - Example
The closest pair is now ((D, F), E) and C at distance 1.41; merging them leaves two clusters:

Dist         (A,B)  ((D,F),E),C
(A,B)        0.00   2.50
((D,F),E),C  2.50   0.00
Final Result
Agglomerative Clustering - Example
The final merge joins (((D, F), E), C) with (A, B) at distance 2.50, leaving a single cluster containing all six points of the data matrix.
[Figure: the clustered points A–F]
Dendrogram Representation
Agglomerative Clustering - Example
1. In the beginning we have 6 clusters: A, B, C, D, E and F.
2. We merge clusters D and F into (D, F) at distance 0.50.
3. We merge clusters A and B into (A, B) at distance 0.71.
4. We merge cluster E and (D, F) into ((D, F), E) at distance 1.00.
5. We merge clusters ((D, F), E) and C into (((D, F), E), C) at distance 1.41.
6. We merge clusters (((D, F), E), C) and (A, B) into ((((D, F), E), C), (A, B)) at distance 2.50.
7. The last cluster contains all the objects, which concludes the computation.
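This whole sequence of merges is what scipy.cluster.hierarchy computes in one call; a sketch that reproduces the dendrogram above:

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.array([[1, 1], [1.5, 1.5], [5, 5], [3, 4], [4, 4], [3, 3.5]])
Z = linkage(X, method="single")   # list of merges, e.g. (D, F) at height 0.50
dendrogram(Z, labels=["A", "B", "C", "D", "E", "F"])
plt.show()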
Single Link Clustering
[Figure: nested clusters of six points and the corresponding single-link dendrogram]
Complete Link Clustering
[Figure: nested clusters of six points and the corresponding complete-link dendrogram]
Average Link Clustering
[Figure: nested clusters of six points and the corresponding average-link dendrogram]
Comparison
[Figure: the same six points clustered with single link, complete link, and average link, side by side]
Agglomerative Clustering
• Where to cut the tree?
[Figure: cutting the dendrogram at different heights gives a 3-cluster model or a 2-cluster model]
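One common answer is to cut for a fixed number of clusters; a sketch with scipy's fcluster on the same six points:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1, 1], [1.5, 1.5], [5, 5], [3, 4], [4, 4], [3, 3.5]])
Z = linkage(X, method="single")
labels_3 = fcluster(Z, t=3, criterion="maxclust")   # 3-cluster model
labels_2 = fcluster(Z, t=2, criterion="maxclust")   # 2-cluster model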
Thank You