Warmup Exercise: K-means Clustering
Hierarchical Clustering
Clusters are created iteratively, using the clusters created in the previous step.
Constructs a hierarchy of clusters (a dendrogram) by merging the clusters with minimum distance.
Uses a distance matrix as the clustering criterion.
The hierarchical method works by grouping data objects (records) into a tree of clusters.
Further classified as:
Agglomerative Hierarchical Clustering
Divisive Hierarchical Clustering
Dendrogram
Dendrogram: a tree data
structure which illustrates
hierarchical clustering
techniques.
Each level of the tree shows the clusters at that level.
Leaves – individual clusters (single objects)
Root – one all-inclusive cluster
A cluster at level i is the
union of its children
clusters at level i+1.
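The tree can be represented directly as a recursive structure. A minimal sketch, with names of our own choosing, where each internal node stores the level at which its two children were merged:

```python
# Minimal sketch of a dendrogram node (illustrative names, not from the slides).
# Each internal node is the union of its two children and records the level
# (merge distance) at which the union was formed.
from dataclasses import dataclass
from typing import Optional

@dataclass
class DendroNode:
    members: frozenset                     # objects contained in this cluster
    level: float = 0.0                     # distance at which the cluster was formed
    left: Optional["DendroNode"] = None
    right: Optional["DendroNode"] = None

# Leaves are individual objects; the root is the single all-inclusive cluster.
a = DendroNode(frozenset({"a"}))
b = DendroNode(frozenset({"b"}))
ab = DendroNode(a.members | b.members, level=1.0, left=a, right=b)
print(ab.members)                          # frozenset({'a', 'b'})
```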
Hierarchical Clustering
[Figure: agglomerative clustering (AGNES) merges objects a, b, c, d, e bottom-up over Steps 0–4 (a+b → ab, d+e → de, c+de → cde, ab+cde → abcde); divisive clustering (DIANA) performs the same steps in reverse, splitting abcde top-down over Steps 4–0.]
Hierarchical Clustering
Clusters are created in levels, producing a set of clusters at each level.
Agglomerative
Initially each item in its own cluster
Iteratively clusters are merged together
Bottom Up
Divisive
Initially all items in one cluster
Large clusters are successively divided
Top Down
Distance Between Clusters
Single Link: smallest distance between any pair of points, one from each cluster
Complete Link: largest distance between any pair of points, one from each cluster
Distance Between Clusters
Average Link: average distance over all pairs of points, one from each cluster
Centroid: distance between the cluster centroids
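The four criteria can be written in a few lines. A small sketch (the function name is ours) that computes all of them for two sets of points, using Euclidean point-to-point distances:

```python
import numpy as np

def cluster_distances(A, B):
    """Single, complete, average and centroid link between point sets A and B."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    # pairwise Euclidean distances between every point of A and every point of B
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return {
        "single":   d.min(),                                  # smallest pairwise distance
        "complete": d.max(),                                  # largest pairwise distance
        "average":  d.mean(),                                 # mean of all pairwise distances
        "centroid": np.linalg.norm(A.mean(0) - B.mean(0)),    # distance between centroids
    }

print(cluster_distances([[0, 0], [1, 0]], [[4, 0], [6, 0]]))
# {'single': 3.0, 'complete': 6.0, 'average': 4.5, 'centroid': 4.5}
```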
Agglomerative Algorithm
The agglomerative method is basically a bottom-up approach that involves the following steps; an implementation may, however, include some variation of these steps (a small sketch follows the list).
1. Allocate each point to a cluster of its own. Thus we start with n clusters for n objects.
2. Create a distance matrix by computing distances between all pairs of clusters using, for example, the single-link metric or the complete-link metric; some other metric may also be used. Sort these distances in ascending order.
3. Find the two clusters that have the smallest distance between them.
4. Remove the pair of clusters from the matrix and merge them into a single cluster.
5. If there is only one cluster left then stop.
6. Compute all distances from the new cluster, update the distance matrix after the merger, and go to Step 3.
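A compact sketch of these steps using single-link distances over a precomputed distance matrix; this is a naive O(n³) illustration rather than an optimized implementation, and the function name is ours:

```python
import numpy as np

def agglomerative_single_link(D):
    """Naive single-link agglomeration on a symmetric distance matrix D.
    Returns the list of merges as (cluster_i, cluster_j, distance)."""
    D = np.asarray(D, float)
    clusters = [[i] for i in range(len(D))]        # step 1: every point is its own cluster
    merges = []
    while len(clusters) > 1:                        # step 5: stop at one cluster
        best = None
        for i in range(len(clusters)):              # step 3: find the closest pair of clusters
            for j in range(i + 1, len(clusters)):
                d = min(D[p, q] for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters[j]      # step 4: merge the pair
        del clusters[j]                              # step 6: distances recomputed above
    return merges

D = [[0, 2, 6], [2, 0, 5], [6, 5, 0]]
print(agglomerative_single_link(D))   # merges (0,1) at 2.0, then with (2) at 5.0
```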
Agglomerative Algorithm
Allocate each point to its own cluster and compute the distance matrix.
Only the lower half of the symmetric matrix is shown.
Agglomerative Algorithm
Consider the following data.
Student  Age  Marks1  Marks2  Marks3
S1       18   73      75      57
S2       18   79      85      75
S3       23   70      70      52
S4       20   55      55      55
S5       22   85      86      87
S6       19   91      90      89
S7       20   70      65      60
S8       21   53      56      59
S9       19   82      82      60
S10      47   75      76      77
Agglomerative Example
S1 S2 S3 S4 S5 S6 S7 S8 S9 S10
S1 0
S2 34 0
S3 18 52 0
S4 42 76 36 0
S5 57 23 67 95 0
S6 66 32 82 106 15 0
S7 18 46 16 30 65 76 0
S8 44 74 40 8 91 104 28 0
S9 20 22 36 60 37 46 30 58 0
S10 52 44 60 90 55 70 60 86 58 0
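The entries in this matrix can be reproduced as Manhattan (city-block) distances over the four attributes; a short check (variable names are ours):

```python
import numpy as np

# (Age, Marks1, Marks2, Marks3) for S1..S10, keyed in from the table above
X = np.array([
    [18, 73, 75, 57], [18, 79, 85, 75], [23, 70, 70, 52], [20, 55, 55, 55],
    [22, 85, 86, 87], [19, 91, 90, 89], [20, 70, 65, 60], [21, 53, 56, 59],
    [19, 82, 82, 60], [47, 75, 76, 77],
])

# Manhattan (city-block) distance between every pair of students
D = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
print(D[1, 0], D[7, 3], D[5, 4])   # 34, 8, 15 -- matching the matrix entries above
```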
Agglomerative Algorithm
The smallest distance is 8, between S4 and S8.
Combine them into cluster C1 and update the table by putting C1 in the place where S4 was.
All distances except those involving cluster C1 remain unchanged.
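In this example the merged cluster is represented by the centroid (mean) of its members, and distances are then recomputed from that centroid. A quick check of the C1 row and of Dist(C1, S1) = 41 in the next table:

```python
import numpy as np

s4 = np.array([20, 55, 55, 55])
s8 = np.array([21, 53, 56, 59])
s1 = np.array([18, 73, 75, 57])

c1 = (s4 + s8) / 2               # centroid of the merged cluster
print(c1)                        # [20.5 54.  55.5 57. ] -- the C1 row in the next table
print(np.abs(c1 - s1).sum())     # 41.0 -- Dist(C1, S1) in the updated distance matrix
```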
Agglomerative Algorithm
Student  Age   Marks1  Marks2  Marks3
S1       18    73      75      57
S2       18    79      85      75
S3       23    70      70      52
C1       20.5  54      55.5    57
S5       22    85      86      87
S6       19    91      90      89
S7       20    70      65      60
S9       19    82      82      60
S10      47    75      76      77
Agglomerative Algorithm
S1 S2 S3 C1 S5 S6 S7 S9 S10
S1 0
S2 34 0
S3 18 52 0
C1 41 75 38 0
S5 57 23 67 93 0
S6 66 32 82 105 15 0
S7 18 46 16 29 65 76 0
S9 20 22 36 59 37 46 30 0
S10 52 44 60 88 55 70 60 58 0
Agglomerative Example
The smallest distance is now 15, between S5 and S6.
Combine them into cluster C2 and update the table.
Agglomerative Algorithm
Student  Age   Marks1  Marks2  Marks3
S1       18    73      75      57
S2       18    79      85      75
S3       23    70      70      52
C1       20.5  54      55.5    57
C2       20.5  88      88      88
S7       20    70      65      60
S9       19    82      82      60
S10      47    75      76      77
Agglomerative Algorithm
S1 S2 S3 C1 C2 S7 S9 S10
S1 0
S2 34 0
S3 18 52 0
C1 41 75 38 0
C2 61.5 27.5 74.5 97.5 0
S7 18 46 16 29 69.5 0
S9 20 22 36 59 41.5 30 0
S10 52 44 60 88 62.5 60 58 0
Agglomerative Example
The smallest distance is now 16, between S3 and S7. Merge them as cluster C3.
Continue the process.
Agglomerative Example
Let’s now see a simple example: a hierarchical clustering of
distances in kilometers between some Italian cities. The
method used is single-linkage.
Single Link: smallest distance between points
Input distance matrix (L = 0 for all the clusters):
Agglomerative Algorithm
The nearest pair of cities is MI and TO, at
distance 138.
BA FI MI NA RM TO
BA 0
FI 662 0
MI 877 295 0
NA 255 468 754 0
RM 412 268 564 219 0
TO 996 400 138 869 669 0
Agglomerative Algorithm
The level of the new cluster is L(MI/TO) = 138 and the new
sequence number is m = 1.
The distance from the compound object to another object is
equal to the shortest distance from any member of the
cluster to the outside object.
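A small sketch of this single-link update applied to the MI/TO merge; it reproduces the four values worked out on the next slides (the dictionary layout is ours):

```python
# Single-link update: the distance from the merged cluster {MI, TO} to each other
# city is the minimum of the two original distances.
dist = {
    ("MI", "BA"): 877, ("TO", "BA"): 996,
    ("MI", "FI"): 295, ("TO", "FI"): 400,
    ("MI", "NA"): 754, ("TO", "NA"): 869,
    ("MI", "RM"): 564, ("TO", "RM"): 669,
}
for city in ["BA", "FI", "NA", "RM"]:
    d = min(dist[("MI", city)], dist[("TO", city)])
    print(f"Dist(MI/TO, {city}) = {d}")   # 877, 295, 754, 564
```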
Agglomerative Algorithm
Dist(MI/TO, BA) = min{Dist(MI, BA), Dist(TO, BA)} = min{877, 996} = 877
Agglomerative Algorithm
Dist(MI/TO, FI) = min{Dist(MI, FI), Dist(TO, FI)} = min{295, 400} = 295
Agglomerative Algorithm
Dist(MI/TO, NA) = min{Dist(MI, NA), Dist(TO, NA)} = min{754, 869} = 754
Agglomerative Algorithm
Dist(MI/TO, RM) = min{Dist(MI, RM), Dist(TO, RM)} = min{564, 669} = 564
Agglomerative Example
After merging MI with TO we obtain
the following matrix:
BA FI MI/TO NA RM
BA 0
FI 662 0
MI/TO 877 295 0
NA 255 468 754 0
RM 412 268 564 219 0
Agglomerative Example
min d(i,j) = d(NA,RM) = 219
merge NA and RM into a new cluster called
NA/RM, L(NA/RM) = 219
m=2
Agglomerative Example
After merging NA with RM we obtain
the following matrix:
BA FI MI/TO NA/RM
BA 0
FI 662 0
MI/TO 877 295 0
NA/RM 255 268 564 0
Agglomerative Example
min d(i,j) = d(BA,NA/RM) = 255
merge BA and NA/RM into a new cluster
called BA/NA/RM
L(BA/NA/RM) = 255
m=3
Agglomerative Example
After merging BA with NA/RM we obtain
the following matrix:
BA/NA/RM FI MI/TO
BA/NA/RM 0
FI 268 0
MI/TO 564 295 0
Agglomerative Example
min d(i,j) = d(BA/NA/RM,FI) = 268
merge BA/NA/RM and FI into a new
cluster called BA/FI/NA/RM
L(BA/FI/NA/RM) = 268
m=4
Agglomerative Example
After merging FI with BA/NA/RM we obtain
the following matrix:
BA/FI/NA/RM MI/TO
BA/FI/NA/RM 0 295
MI/TO 295 0
Agglomerative Example
Finally, we merge the last two clusters at level 295.
The process is summarized by the resulting hierarchical tree (dendrogram), with merges at levels 138, 219, 255, 268 and 295.
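The whole example can also be reproduced with SciPy's hierarchical clustering routines; a hedged sketch, assuming SciPy is available, that recovers the merge levels 138, 219, 255, 268 and 295 and can draw the dendrogram:

```python
import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, dendrogram

labels = ["BA", "FI", "MI", "NA", "RM", "TO"]
D = np.array([
    [  0, 662, 877, 255, 412, 996],
    [662,   0, 295, 468, 268, 400],
    [877, 295,   0, 754, 564, 138],
    [255, 468, 754,   0, 219, 869],
    [412, 268, 564, 219,   0, 669],
    [996, 400, 138, 869, 669,   0],
])

# linkage expects a condensed distance matrix; method="single" is single-link
Z = linkage(squareform(D), method="single")
print(Z[:, 2])                       # merge levels: [138. 219. 255. 268. 295.]
# dendrogram(Z, labels=labels)       # draws the hierarchical tree (needs matplotlib)
```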
Agglomerative Example
Advantages:
Simple, and outputs a hierarchy, a structure that is more informative than a flat partition
Does not require us to pre-specify the number of clusters
Disadvantages:
Selection of merge or split points is critical: once a group of objects is merged or split, subsequent steps operate on the newly generated clusters and never undo what was done previously. Merge or split decisions that are not well chosen may therefore lead to low-quality clusters.
Divisive Hierarchical Clustering
A typical polythetic divisive method works as follows (a small sketch follows the list):
1. Decide on a method of measuring the distance between two objects. Also decide on a threshold distance.
2. Create a distance matrix by computing distances between all pairs of objects within the cluster. Sort these distances in ascending order.
3. Find the two objects that have the largest distance between them. They are the most dissimilar objects.
4. If the distance between the two objects is smaller than the pre-specified threshold and there is no other cluster that needs to be divided, then stop; otherwise continue.
5. Use the pair of objects as seeds of a k-means method to create two new clusters.
6. If there is only one object in each cluster then stop; otherwise continue with Step 2.
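A compact sketch of a single split under these steps, using a precomputed distance matrix: the most dissimilar pair becomes the seeds and every other object joins its nearer seed (one k-means-style assignment pass; a full implementation might iterate further). The function name is ours:

```python
import numpy as np

def split_once(D, members):
    """One divisive split: seed with the most dissimilar pair in `members`,
    then assign every other object to the nearer seed."""
    D = np.asarray(D, float)
    # step 3: find the most dissimilar pair inside this cluster
    i, j = max(((a, b) for a in members for b in members if a < b),
               key=lambda p: D[p[0], p[1]])
    left, right = [i], [j]
    for m in members:                      # step 5: assign each object to the nearer seed
        if m in (i, j):
            continue
        (left if D[m, i] <= D[m, j] else right).append(m)
    return left, right

D = [[0, 1, 9, 8],
     [1, 0, 8, 9],
     [9, 8, 0, 2],
     [8, 9, 2, 0]]
print(split_once(D, [0, 1, 2, 3]))   # ([0, 1], [2, 3])
```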
Divisive Hierarchical Clustering
Consider the following data
Student  Age  Marks1  Marks2  Marks3
S1       18   73      75      57
S2       18   79      85      75
S3       23   70      70      52
S4       20   55      55      55
S5       22   85      86      87
S6       19   91      90      89
S7       20   70      65      60
S8       21   53      56      59
S9       19   82      82      60
S10      47   75      76      77
Divisive Hierarchical Clustering
S1 S2 S3 S4 S5 S6 S7 S8 S9 S10
S1 0
S2 34 0
S3 18 52 0
S4 42 76 36 0
S5 57 23 67 95 0
S6 66 32 82 106 15 0
S7 18 46 16 30 65 76 0
S8 44 74 40 8 91 104 28 0
S9 20 22 36 60 37 46 30 58 0
S10 52 44 60 90 55 70 60 86 58 0
Divisive Hierarchical Clustering
The largest distance is 106, between S4 and S6.
Use them as the two new seeds.
Use the k-means method to find two new clusters.
S1 S2 S3 S4 S5 S6 S7 S8 S9 S10
S4 42 76 36 0 95 106 30 8 60 90
S6 66 32 82 106 15 0 76 104 46 70
Cluster membership
Cluster-1 (S4):
Cluster-2 (S6):
Divisive Hierarchical Clustering
Use the k-means method to find the new clusters.
Dist(S4, S1) = 42 and Dist(S6, S1) = 66
Minimum = 42
S1 belongs to Cluster 1.
Cluster membership
Cluster-1 (S4): S1
Cluster-2 (S6):
Divisive Hierarchical Clustering
Use the k-means method to find the new clusters.
Dist(S4, S2) = 76 and Dist(S6, S2) = 32
Minimum = 32
S2 belongs to Cluster 2.
Cluster membership
Cluster-1 (S4): S1
Cluster-2 (S6): S2
Divisive Hierarchical Clustering
Use the k-means method to find the new clusters.
Dist(S4, S3) = 36 and Dist(S6, S3) = 82
Minimum = 36
S3 belongs to Cluster 1.
Cluster membership
Cluster-1 (S4): S1, S3
Cluster-2 (S6): S2
Divisive Hierarchical Clustering
Finally we get the following:
Cluster membership
Cluster-1 (S4): S1, S3, S4, S7, S8
Cluster-2 (S6): S2, S5, S6, S9, S10
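A quick check of this assignment using the S4 and S6 rows of the distance matrix above (ties, if any, are sent to Cluster 1 here):

```python
# Distances from each student S1..S10 to the two seeds S4 and S6 (rows of the matrix)
d_s4 = [42, 76, 36, 0, 95, 106, 30, 8, 60, 90]
d_s6 = [66, 32, 82, 106, 15, 0, 76, 104, 46, 70]

cluster1 = [f"S{i + 1}" for i in range(10) if d_s4[i] <= d_s6[i]]
cluster2 = [f"S{i + 1}" for i in range(10) if d_s4[i] > d_s6[i]]
print(cluster1)   # ['S1', 'S3', 'S4', 'S7', 'S8']
print(cluster2)   # ['S2', 'S5', 'S6', 'S9', 'S10']
```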
Divisive Hierarchical Clustering
None of the stopping criteria have been met.
Split Cluster 1 (shown below) using its most dissimilar pair, S1 and S8, as the two new seeds.
Take each cluster in turn and continue the process.
S1 S3 S4 S7 S8
S1 0
S3 18 0
S4 42 36 0
S7 18 16 30 0
S8 44 40 8 28 0
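A short check that S1 and S8 are indeed the most dissimilar pair within Cluster 1, at distance 44, and hence the seeds for its split:

```python
# Distances within Cluster 1, read off the sub-matrix above
d = {("S1", "S3"): 18, ("S1", "S4"): 42, ("S1", "S7"): 18, ("S1", "S8"): 44,
     ("S3", "S4"): 36, ("S3", "S7"): 16, ("S3", "S8"): 40,
     ("S4", "S7"): 30, ("S4", "S8"): 8,
     ("S7", "S8"): 28}

pair = max(d, key=d.get)
print(pair, d[pair])   # ('S1', 'S8') 44 -- the seeds for splitting Cluster 1
```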
Divisive Hierarchical Clustering
The table below shows the distance sub-matrix for Cluster 2.
Continue the process with this cluster in the same way.
S2 S5 S6 S9 S10
S2 0
S5 23 0
S6 32 15 0
S9 22 37 46 0
S10 44 55 70 58 0
Conclusion
Cluster analysis is a collection of methods that assist the user in putting different objects from a collection into different groups. In some ways, one could say that cluster analysis is best used as an exploratory data analysis exercise, when the user has no hypothesis to test. Cluster analysis can therefore be used to uncover hidden structure that may assist further exploration.
Thank you