Clustering
What is Cluster Analysis?
• Finding groups of objects such that the objects in a group will be
similar (or related) to one another and different from (or
unrelated to) the objects in other groups
Applications of Cluster Analysis
• Understanding
• Group stocks with similar price fluctuations, e.g., a Financial-DOWN cluster (MBNA-Corp-DOWN, Morgan-Stanley-DOWN) and an Oil-UP cluster (Baker-Hughes-UP, Dresser-Inds-UP, Halliburton-HLD-UP, Louisiana-Land-UP, Phillips-Petro-UP, Unocal-UP, Schlumberger-UP)
• Summarization
• Reduce the size of large data sets, e.g., clustering precipitation in Australia
What is not Cluster Analysis?
• Supervised classification
• Have class label information
• Simple segmentation
• Dividing students into different registration groups alphabetically,
by last name
• Results of a query
• Groupings are a result of an external specification
• Graph partitioning
• Some mutual relevance and synergy, but areas are not identical
Types of Clusterings
• Partitional Clustering
• A division of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset
• Hierarchical clustering
• A set of nested clusters organized as a hierarchical tree
Partitional Clustering
[Figure: a partitional clustering of the points p1-p4]
Hierarchical Clustering
[Figure: a traditional hierarchical clustering of p1-p4 with its dendrogram, and a non-traditional hierarchical clustering of the same points with its dendrogram]
Types of Clusters
• Well-separated clusters
• Center-based clusters
• Contiguous clusters
• Density-based clusters
• Property or conceptual clusters
Types of Clusters: Well-Separated
• Well-separated clusters:
• A cluster is a set of points such that any point in a cluster is closer (or more similar) to every other point in the cluster than to any point not in the cluster
[Figure: 3 well-separated clusters]
Types of Clusters: Center-Based
• Center-based
• A cluster is a set of objects such that an object in a cluster is closer (more similar) to the "center" of its cluster than to the center of any other cluster
• The center of a cluster is often a centroid, the average of all the points in the cluster, or a medoid, the most "representative" point of a cluster
[Figure: 4 center-based clusters]
Types of Clusters: Contiguity-Based
• Contiguity-based (nearest neighbor or transitive)
• A cluster is a set of points such that a point in a cluster is closer (or more similar) to one or more other points in the cluster than to any point not in the cluster
[Figure: 8 contiguous clusters]
Types of Clusters: Density-Based
• Density-based
• A cluster is a dense region of points, separated from other regions of high density by low-density regions
• Used when the clusters are irregular or intertwined, and when noise and outliers are present
[Figure: 6 density-based clusters]
Types of Clusters: Conceptual Clusters
• Points in a cluster share some common property or represent a particular concept
[Figure: 2 overlapping circles]
Types of Clusters: Objective Function
• Want to minimize the edge weight between clusters and maximize the edge weight within clusters
Characteristics of the Input Data Are Important
• Hierarchical clustering
• Density-based clustering
K-means Clustering
• Partitional clustering approach: each cluster is associated with a centroid, each point is assigned to the cluster with the closest centroid, and the number of clusters, K, must be specified
[Figure: the original points and two different K-means clusterings of the same data, one optimal and one sub-optimal]
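The assignment/update loop behind these pictures is short; below is a minimal NumPy sketch (the function name and the random initialization are illustrative, not from the lecture):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Basic K-means: alternate between assigning points to the nearest
    centroid and recomputing each centroid as the mean of its points."""
    rng = np.random.default_rng(seed)
    # Initialize by picking k distinct data points as starting centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: label each point with its closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        new = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                        else centroids[i] for i in range(k)])
        if np.allclose(new, centroids):
            break  # centroids stopped moving: converged
        centroids = new
    return centroids, labels
```

Different random seeds can land in different local optima, which is exactly the optimal vs. sub-optimal contrast in the figure above.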
Importance of Choosing Initial Centroids
[Figure: K-means converging over six iterations from one choice of initial centroids]
Evaluating K-means Clusters
• The most common measure is the Sum of Squared Error (SSE): for each point, the error is its distance to the nearest centroid

$SSE = \sum_{i=1}^{K} \sum_{x \in C_i} \mathrm{dist}^2(m_i, x)$, where $m_i$ is the centroid of cluster $C_i$

• Given two clusterings, we prefer the one with the smaller SSE
[Figure: K-means converging over five iterations from a second, poorer choice of initial centroids]
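The SSE measure translates directly into a few lines; a sketch assuming `X`, `labels`, and `centroids` come from a K-means run such as the one above:

```python
import numpy as np

def sse(X, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return float(np.sum((X - centroids[labels]) ** 2))
```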
Problems with Selecting Initial Points
• If there are K 'real' clusters, the chance of selecting one centroid from each cluster is small, and shrinks rapidly as K grows
• If the clusters are the same size, n, then

$P = \frac{\text{ways to select one centroid from each cluster}}{\text{ways to select } K \text{ centroids}} = \frac{K!\,n^K}{(Kn)^K} = \frac{K!}{K^K}$

• For example, if K = 10, the probability is $10!/10^{10} = 0.00036$
[Figure: ten clusters in five pairs; starting with two initial centroids in one cluster of each pair of clusters]
10 Clusters Example
[Figure: iterations 1-4 of K-means on the ten-cluster data, starting with two initial centroids in one cluster of each pair of clusters]
10 Clusters Example
[Figure: iterations 1-4 of K-means on the ten-cluster data, starting with some pairs of clusters having three initial centroids while others have only one]
Solutions to Initial Centroids Problem
• Multiple runs
• Helps, but probability is not on your side
• Sample and use hierarchical clustering to determine initial centroids
• Select more than k initial centroids and then select among these
initial centroids
• Select the most widely separated (see the sketch after this list)
• Postprocessing
• Bisecting K-means
• Not as susceptible to initialization issues
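One common reading of "select the most widely separated" is a farthest-first pass over candidate points; a sketch under that assumption (not the lecture's prescribed procedure):

```python
import numpy as np

def farthest_first_centroids(X, k, seed=0):
    """Pick one random point, then repeatedly add the point farthest
    from all centroids chosen so far."""
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Distance from every point to its nearest already-chosen centroid
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])  # take the most isolated point
    return np.array(centroids)
```

In practice this is applied to a sample or to over-selected candidate centroids, since a pure farthest-first pass is sensitive to outliers.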
Handling Empty Clusters
• The basic K-means algorithm can yield empty clusters; several replacement strategies exist:
• Choose the point that contributes most to SSE
• Choose a point from the cluster with the highest SSE
• If there are several empty clusters, the above can be repeated several times
Updating Centers Incrementally
• In the basic K-means algorithm, centroids are updated after all points
are assigned to a centroid
• An alternative is to update the centroids after each assignment (incremental approach); this never yields an empty cluster, but it is more expensive and introduces an order dependency
Hierarchical Clustering
• Produces a set of nested clusters organized as a hierarchical tree, visualized as a dendrogram: a tree-like diagram that records the sequence of merges or splits
[Figure: six points shown as nested clusters and as the corresponding dendrogram]
Hierarchical Clustering: Two Main Types
• Agglomerative:
• Start with the points as individual clusters
• At each step, merge the closest pair of clusters until one cluster (or k clusters) remains
• Divisive:
• Start with one, all-inclusive cluster
• At each step, split a cluster until each cluster contains a single point (or there are k clusters)
[Figure: dendrogram over the five-point example (points 1-5) used in the linkage examples below]
Agglomerative methods differ in how they measure the distance between clusters:
• Single Linkage
• Complete Linkage
• Group Average Linkage
• Centroid Linkage
• Median Linkage
Single Linkage
• The distance between two clusters is the minimum distance between any member of Cluster A and any member of Cluster B
[Figure: single linkage between Cluster A and Cluster B]
Initial distance matrix:

      1     2     3     4     5
1    0.0
2    2.0   0.0
3    6.0   5.0   0.0
4   10.0   9.0   4.0   0.0
5    9.0   8.0   5.0   3.0   0.0

Merge 1 and 2 at distance 2.0; under the minimum rule the updated matrix is:

       (12)    3     4     5
(12)   0.0
3      5.0   0.0
4      9.0   4.0   0.0
5      8.0   5.0   3.0   0.0

Next, merge 4 and 5 at distance 3.0, then 3 with (4 5) at distance 4.0, and finally (1 2) with (3 4 5) at distance 5.0.
[Figure: the single-linkage dendrogram built step by step, joining 1-2, then 4-5, then 3 with (4 5), then the two remaining clusters]
Complete Linkage
• The distance between two clusters is the maximum distance between any member of Cluster A and any member of Cluster B
[Figure: complete linkage between Cluster A and Cluster B]
Starting from the same initial distance matrix, merge 1 and 2 at distance 2.0; under the maximum rule the updated matrix is:

       (12)    3     4     5
(12)   0.0
3      6.0   0.0
4     10.0   4.0   0.0
5      9.0   5.0   3.0   0.0

Merge 4 and 5 at distance 3.0:

        (12)    3    (45)
(12)    0.0
3       6.0   0.0
(45)   10.0   5.0   0.0

Merge 3 with (4 5) at distance 5.0:

         (12)   (345)
(12)      0.0
(345)    10.0   0.0

Finally, merge (1 2) with (3 4 5) at distance 10.0.
[Figure: the resulting complete-linkage dendrogram over points 1-5]
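Both worked examples can be checked with SciPy's hierarchical-clustering routines; a sketch using the 5x5 distance matrix from above:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# The 5x5 distance matrix from the worked example (points 1-5)
D = np.array([
    [ 0.0,  2.0,  6.0, 10.0,  9.0],
    [ 2.0,  0.0,  5.0,  9.0,  8.0],
    [ 6.0,  5.0,  0.0,  4.0,  5.0],
    [10.0,  9.0,  4.0,  0.0,  3.0],
    [ 9.0,  8.0,  5.0,  3.0,  0.0],
])
condensed = squareform(D)  # linkage() expects the condensed (upper-triangle) form
for method in ("single", "complete"):
    Z = linkage(condensed, method=method)
    # Column 2 holds the merge heights: 2, 3, 4, 5 (single); 2, 3, 5, 10 (complete)
    print(method, Z[:, 2])
```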
Group Average Clustering
• The distance between two clusters is the average of all pairwise distances between members of Cluster A and members of Cluster B
[Figure: group average linkage between Cluster A (points 1, 2) and Cluster B (points 3, 4, 5)]
Centroid Clustering
• The distance between two clusters is the distance between their centroids, e.g., between centroid a of Cluster A and centroid b of Cluster B
[Figure: centroid linkage measured between the cluster centroids a and b]
Median Clustering
• Like centroid clustering, but when two clusters merge, the new center is taken as the midpoint of the two old centers a and b, so a large cluster does not dominate the merge
[Figure: median linkage between Cluster A and Cluster B via centers a and b]
Divisive Methods
Polythetic Approach
• Splits a cluster using all variables (attributes) at once, working from the distance matrix
Worked example. Distance matrix for seven objects:

    1   2   3   4   5   6   7
1   0
2  10   0
3   7   7   0
4  30  23  21   0
5  29  25  22   7   0
6  38  34  31  10  11   0
7  42  36  36  13  17   9   0

Step 1: compute each object's average distance to all the others:
D(1, *) = 26.0, D(2, *) = 22.5, D(3, *) = 20.7, D(4, *) = 17.3, D(5, *) = 18.5, D(6, *) = 22.2, D(7, *) = 25.5
Object 1 has the largest average distance, so it starts the splinter group: A = {1}, B = {2, 3, 4, 5, 6, 7}.

Step 2: for each object i in B, compare its average distance to A with its average distance to the rest of B:
D(2, A) = 10,  D(2, B) = 25.0,  difference = 15.0
D(3, A) = 7,   D(3, B) = 23.4,  difference = 16.4
D(4, A) = 30,  D(4, B) = 14.8,  difference = -15.2
D(5, A) = 29,  D(5, B) = 16.4,  difference = -12.6
D(6, A) = 38,  D(6, B) = 19.0,  difference = -19.0
D(7, A) = 42,  D(7, B) = 22.2,  difference = -19.8
Object 3 has the largest positive difference, so move it to the splinter group: A = {1, 3}, B = {2, 4, 5, 6, 7}.

Step 3: recompute with the new split:
D(2, A) = 8.5,   D(2, B) = 29.5,  difference = 21.0
D(4, A) = 25.5,  D(4, B) = 13.2,  difference = -12.3
D(5, A) = 25.5,  D(5, B) = 15.0,  difference = -10.5
D(6, A) = 34.5,  D(6, B) = 16.0,  difference = -18.5
D(7, A) = 39.0,  D(7, B) = 18.8,  difference = -20.3
Object 2 has the largest positive difference, so move it: A = {1, 3, 2}, B = {4, 5, 6, 7}.

Step 4: recompute once more:
D(4, A) = 24.7,  D(4, B) = 10.0,  difference = -14.7
D(5, A) = 25.3,  D(5, B) = 11.7,  difference = -13.6
D(6, A) = 34.3,  D(6, B) = 10.0,  difference = -24.3
D(7, A) = 38.0,  D(7, B) = 13.0,  difference = -25.0
All differences are now negative, so the process stops with the final split A = {1, 2, 3} and B = {4, 5, 6, 7}.
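The splinter procedure above mechanizes easily; a sketch (0-based indices, so labels shift down by one relative to the tables):

```python
import numpy as np

def polythetic_split(D):
    """Splinter-group split of objects {0..n-1} using average distances,
    as in the worked example."""
    n = len(D)
    # Seed A with the object having the largest average distance to the rest
    avg = D.sum(axis=1) / (n - 1)
    A = [int(avg.argmax())]
    B = [i for i in range(n) if i not in A]
    while len(B) > 1:
        # For each i in B: average distance to the rest of B minus average distance to A
        diff = {i: D[i, [j for j in B if j != i]].mean() - D[i, A].mean()
                for i in B}
        best = max(diff, key=diff.get)
        if diff[best] <= 0:
            break  # everything left in B is closer to B than to A
        A.append(best)
        B.remove(best)
    return A, B

# The 7x7 matrix from the example; prints ([0, 2, 1], [3, 4, 5, 6]),
# i.e., A = {1, 3, 2} and B = {4, 5, 6, 7} in the tables' 1-based labels
D = np.array([
    [ 0, 10,  7, 30, 29, 38, 42],
    [10,  0,  7, 23, 25, 34, 36],
    [ 7,  7,  0, 21, 22, 31, 36],
    [30, 23, 21,  0,  7, 10, 13],
    [29, 25, 22,  7,  0, 11, 17],
    [38, 34, 31, 10, 11,  0,  9],
    [42, 36, 36, 13, 17,  9,  0],
])
print(polythetic_split(D))
```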
Monothetic Approach
• Splits a cluster using one variable at a time; it is usually used when the data consists of binary variables
Data: five objects with three binary attributes:

    A  B  C
1   0  1  1
2   1  1  0
3   1  1  1
4   1  1  0
5   0  0  1

For each pair of attributes, build a 2x2 contingency table; for A and B:

          A = 1   A = 0
B = 1     a = 3   b = 1
B = 0     c = 0   d = 1

Chi-square measure:

$\chi^2_{AB} = \frac{(ad - bc)^2 N}{(a+b)(a+c)(b+d)(c+d)} = \frac{(3 \cdot 1 - 1 \cdot 0)^2 \cdot 5}{4 \cdot 3 \cdot 2 \cdot 1} = 1.875$

Doing the same for the other attribute pairs:

Attr     AB      AC      BC
a        3       1       2
b        1       2       1
c        0       2       2
d        1       0       0
N        5       5       5
chi^2    1.875   2.22    0.83

For attribute A: $\chi^2_{AB} + \chi^2_{AC} = 4.09$
For attribute B: $\chi^2_{AB} + \chi^2_{BC} = 2.70$
For attribute C: $\chi^2_{AC} + \chi^2_{BC} = 3.05$
Attribute A has the largest total association, so we choose A to divide the data into two groups: {2, 3, 4} and {1, 5}.
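The chi-square computation and the choice of splitting attribute can be scripted; a sketch over the 5x3 binary table above:

```python
import numpy as np

def chi_square(x, y):
    """Chi-square association between two binary attributes."""
    a = np.sum((x == 1) & (y == 1))
    b = np.sum((x == 0) & (y == 1))
    c = np.sum((x == 1) & (y == 0))
    d = np.sum((x == 0) & (y == 0))
    n = len(x)
    return (a * d - b * c) ** 2 * n / ((a + b) * (a + c) * (b + d) * (c + d))

# Data from the example: rows = objects 1-5, columns = attributes A, B, C
X = np.array([[0, 1, 1],
              [1, 1, 0],
              [1, 1, 1],
              [1, 1, 0],
              [0, 0, 1]])
# Total association of each attribute with all others; split on the maximum
totals = [sum(chi_square(X[:, i], X[:, j])
              for j in range(X.shape[1]) if j != i)
          for i in range(X.shape[1])]
best = int(np.argmax(totals))  # attribute A (index 0), total ~4.09
groups = (np.where(X[:, best] == 1)[0], np.where(X[:, best] == 0)[0])
print(totals, groups)  # groups: objects {2, 3, 4} and {1, 5} in 1-based labels
```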
Other Clustering Techniques
What we have learnt:
• K-means
• Hierarchical clustering (dendrograms)
What comes next:
• Model-Based Clustering: the EM Algorithm
• Density-Based Clustering: DBSCAN
• Scalable Clustering: BIRCH
EM Algorithm
• Model each cluster with a Gaussian distribution parameterized by a mean $\mu$ and standard deviation $\sigma$:

$p(x \mid \langle \mu, \sigma \rangle) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
• EM = Expectation-Maximization
• Algorithm
• Step 1 (Parameter Initialization)
• Initialize all $\mu_i$ and $\sigma_i$
• Step 2 (Expectation)
• For each point x and each cluster i, calculate the probability that x belongs to cluster i; one possible implementation:

$P(x \in C_i) = \frac{p(x \mid \langle \mu_i, \sigma_i \rangle)}{\sum_j p(x \mid \langle \mu_j, \sigma_j \rangle)}$

• Step 3 (Maximization)
• For each cluster i, calculate the mean $\mu_i$ according to the probabilities that all points belong to cluster i; one possible implementation:

$\mu_i = \frac{\sum_x x \cdot P(x \in C_i)}{\sum_x P(x \in C_i)}$

• Repeat Step 2 and Step 3 until the parameters converge
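A minimal sketch of these steps for one-dimensional data, assuming equal, fixed cluster weights to keep it short (a full Gaussian mixture would also update the weights):

```python
import numpy as np

def em_gaussians(x, k, n_iter=100, seed=0):
    """EM for a 1-D Gaussian mixture with equal cluster weights."""
    rng = np.random.default_rng(seed)
    # Step 1 (Parameter Initialization)
    mu = rng.choice(x, size=k, replace=False)
    sigma = np.full(k, x.std())
    for _ in range(n_iter):
        # Step 2 (Expectation): P(x in C_i) proportional to p(x | <mu_i, sigma_i>)
        dens = np.exp(-(x[:, None] - mu) ** 2 / (2 * sigma ** 2)) \
               / (np.sqrt(2 * np.pi) * sigma)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # Step 3 (Maximization): weighted mean and standard deviation per cluster
        w = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / w
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / w)
    return mu, sigma
```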
Other Clustering Models
•Model-Based Clustering
•EM Algorithm
•Density-Based Clustering
•DBSCAN
•Scalable Clustering Method
•BIRCH
DBSCAN
• Traditional clustering
• Tends to represent only spherical clusters; cannot handle irregularly shaped clusters
• DBSCAN
• Density-Based Spatial Clustering of Applications with Noise
DBSCAN
• A border point has fewer than MinPts within Eps, but is in the
neighborhood of a core point
• Resistant to Noise
• Can handle clusters of different shapes and sizes
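scikit-learn ships a DBSCAN implementation; a usage sketch (the data and parameter values here are placeholders):

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.random.default_rng(0).normal(size=(200, 2))  # toy 2-D data
db = DBSCAN(eps=0.3, min_samples=4).fit(X)  # eps = Eps, min_samples = MinPts
labels = db.labels_  # cluster ids per point; -1 marks noise points
```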
When DBSCAN Does NOT Work Well
• Varying densities
• High-dimensional data
[Figure: the original points and DBSCAN results with (MinPts=4, Eps=9.75) and (MinPts=4, Eps=9.92)]
DBSCAN: Determining Eps and MinPts
• Idea: points inside a cluster have their k-th nearest neighbor at roughly the same distance, while noise points have theirs farther away
• So plot the sorted distance of every point to its k-th nearest neighbor and look for a "knee"
BIRCH
• Balanced Iterative Reducing and Clustering using Hierarchies
• Advantages
• Incremental
• Scalable
BIRCH
• Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
• Radius (average distance from member objects to the mean): $R = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$
• Diameter (average pairwise distance within a cluster): $D = \sqrt{\frac{\sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - x_j)^2}{n(n-1)}}$
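A quick numeric check of R and D on a toy one-dimensional cluster (NumPy; note both are root-mean-square averages):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])  # toy cluster members
mean = x.mean()
n = len(x)
# Radius: (root-mean-square) average distance from members to the mean
R = np.sqrt(((x - mean) ** 2).sum() / n)
# Diameter: (root-mean-square) average pairwise distance within the cluster;
# the i = j terms contribute zero, matching the double sum in the formula
D = np.sqrt(((x[:, None] - x[None, :]) ** 2).sum() / (n * (n - 1)))
print(R, D)
```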