Chapter 5

Clustering
5.1 Introduction
Chapters 3 and 4 describe how samples may be classified if a training set is available to
use in the design of a classifier. However, there are many situations where the classes
themselves are initially undefined. Given a set of feature vectors sampled from some
population, we would like to know if the data set consists of a number of relatively
distinct subsets. If it does and we can determine these subsets, we can define them to
be classes. This is sometimes called class discovery. The techniques from Chapters
3 and 4 can then be used to further analyze or model the data or to classify new data
if desired. Clustering refers to the process of grouping samples so that the samples
are similar within each group. The groups are called clusters.
In some applications, the main goal may be to discover the subgroups rather than
to model them statistically. For example, the marketing director of a firm that supplies
business services may want to know if the businesses in a particular community fall
into any natural groupings or subgroups, so that specific service packages and
marketing plans can be designed for each of these. Reading the public data on these
companies might give some idea of what the subgroups of similar companies could be,
but the process is difficult and unreliable, particularly if the number of companies is
large. Fortunately, clustering techniques allow the division into subgroups to be done
automatically, without any preconceptions about what kinds of features or groupings
should be found in the community being analyzed.

Cluster analysis has been applied in many fields. For example, Paykel [13] used
cluster analysis to group 165 depressed patients into four clusters, which were then
called "anxious," "hostile," "retarded psychotic," and "young depressive." In image
analysis, clustering can be used to find groups of regions or pixels with similar gray
levels, colors, or local textures in order to discover the various regions in the image.

Animais

Ca's
Dogs

Large 1Sma air


Long HaifShort

S t . B e m a r d L a b r a d o r

4
5
3
2

clustering.
hierarchical

5.1: A
Figure
In cases where there are only two features, clusters can be found through visual
inspection of the data by looking for dense regions in a scatterplot, provided the
subgroups or classes are well separated in the feature space. It was shown in Figure
4.20, for example, that if two classes are bivariate normally distributed and their means
are separated by more than two standard deviations, two distinct peaks form in the
data, which could be found even if the classes were unknown. However, even if each
of the classes forms a distinct cluster, the clusters may exist in a high-dimensional
feature space and still not be apparent in any of the projections of the data onto a
plane defined by a pair of the feature axes. One general way to find candidates for
the centers of clusters is to form an n-dimensional histogram of the data and find the
peaks in the histogram. However, if the number of features is large, the histogram may
have to be very coarse to have a significant number of samples in any cell, and the
locations of the boundaries between the cells are specified arbitrarily in advance,
rather than depending on the data.

5.2 Hierarchical Clustering


A hierarchy can be represented by a tree structure such as the simple one shown
in Figure 5.1. The patients in an animal hospital are composed of two main groups,
dogs and cats, each of which is, in turn, composed of subgroups. Each subgroup is
composed of further subgroups, and so on. Each of the individual animals, 1 through 5,
is represented at the lowest level of the tree. Hierarchical clustering refers to a
clustering process that organizes the data into large groups, which contain smaller
groups, and so on. A hierarchical clustering may be drawn as a tree or dendrogram.
The finest grouping is at the bottom of the dendrogram; each sample by itself forms a
cluster.

The coarsest grouping is at the top of the dendrogram, where all samples are grouped
into one cluster. In between, there are various numbers of clusters. For example, in
the hierarchical clustering of Figure 5.1, at level 0 the clusters are

{1}, {2}, {3}, {4}, {5},

each consisting of an individual sample. At level 1, the clusters are

{1,2}, {3}, {4}, {5}.


At level 2, the clusters are

{1,2}, {3}, {4,5}.
At level 3, the clusters are

{1,2,3}, {4,5}.
At level 4, the single cluster
{1,2,3,4,5}
consists of all the samples.
In a hierarchical clustering, if at some level two samples belong to a cluster, they
belong to the same cluster at all higher levels. For example, in Figure 5.1, at level 2
samples 4 and 5 belong to the same cluster; samples 4 and 5 also belong to the same
cluster at levels 3 and 4.
Hierarchical clustering algorithms are called agglomerative if they build the den-
drogram from the bottom up, and they are called divisive if they build the dendrogram
from the top down.
The general agglomerative clustering algorithm is straightforward to describe. The
total number of samples will be denoted by n.

Agglomerative Clustering Algorithm

1. Begin with n clusters, each consisting of one sample.

2. Repeat step 3 a total of n - 1 times.

3. Find the most similar clusters Ci and Cj and merge Ci and Cj into one cluster.
If there is a tie, merge the first pair found.

Different hierarchical clustering algorithms are obtained by using different meth-
ods to determine the similarity between clusters. One way to measure the similarity
between clusters is to define a distance function that measures the distance between
clusters. This distance function is typically induced by an underlying function that
measures the distance between pairs of samples. In cluster analysis, as in nearest
neighbor techniques (Section 4.4), the most popular distance measures are Euclidean
distance and city block distance.
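As an aside, the sample-level Euclidean distances used throughout this section are easy to tabulate by machine. The short sketch below is illustrative only (it is not from the text); it assumes the NumPy and SciPy libraries are available and uses the sample coordinates of Figure 5.2.

    import numpy as np
    from scipy.spatial.distance import pdist, squareform

    # Feature vectors (x, y) of the five samples plotted in Figure 5.2.
    samples = np.array([[4, 4], [8, 4], [15, 8], [24, 4], [24, 12]], dtype=float)

    # Pairwise Euclidean distances; squareform expands the condensed vector
    # into the symmetric matrix written as (5.1) in Example 5.1 below.
    D = squareform(pdist(samples, metric="euclidean"))
    print(np.round(D, 1))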

Figure 5.2: Samples for clustering.

The Single-Linkage Algorithm


The single-linkage algorithm is also known as the minimum method and the
nearest neighbor method. The latter title underscores its close relation to the
nearest neighbor classification method. The single-linkage algorithm is obtained by

defining the distance between two clusters to be the smallest distance between two
points such that one point is in each cluster. Formally, if C; and C; are clustes, the
distance between them is defined as

DsL(Ci, C) =min d(a, b),


aECi,bEC;
where d(a, b) denotes the distance between the samples a and b.

Example 5.1 Hierarchical clustering using the single-linkage algorithm.

Perform a hierarchical clustering of five samples using the single-linkage algorithm and
two features, x and y. A scatterplot of the data is shown in Figure 5.2. Use Euclidean
distance for the distance between samples. The following tables give the feature values
for each sample and the distance d between each pair of samples:

Sample    x    y
  1       4    4
  2       8    4
  3      15    8
  4      24    4
  5      24   12

 d       1      2      3      4      5
 1            4.0   11.7   20.0   21.5
 2     4.0           8.1   16.0   17.9
 3    11.7    8.1           9.8    9.8        (5.1)
 4    20.0   16.0    9.8           8.0
 5    21.5   17.9    9.8    8.0

For the single-sample clusters {a} and {b}, D_{SL}({a}, {b}) = d(a, b).
The algorithm begins with five clusters, each consisting of one sample. The two
nearest clusters are then merged. The smallest number in (5.1) is 4, which is the
distance between samples 1 and 2, so the clusters {1} and {2} are merged. At this
point there are four clusters

{1,2}, {3}, {4}, {5}.


Next obtain the matrix that gives the distances between these clusters:

         {1,2}     3      4      5
{1,2}            8.1   16.0   17.9
  3       8.1           9.8    9.8
  4      16.0    9.8           8.0
  5      17.9    9.8    8.0

The value 8.1 in row {1,2} and column 3 gives the distance between the clusters {1,2}
and {3} and is computed in the following way. Matrix (5.1) shows that d(1,3) = 11.7
and d(2,3) = 8.1. In the single-linkage algorithm, the distance between clusters is the
minimum of these values, 8.1. The other values in the first row are computed in a
similar way. The values in other than the first row or first column are simply copied
from the previous table (5.1). Since the minimum value in this matrix is 8.0, the clusters
{4} and {5} are merged. At this point there are three clusters:

{1,2}, {3}, {4,5}.


Next obtain the matrix that gives the distance between these clusters:

         {1,2}     3    {4,5}
{1,2}            8.1   16.0
  3       8.1           9.8
{4,5}    16.0    9.8

Since the minimum value in this matrix is 8.1, the clusters {1,2} and {3} are merged.

At this point there are two clusters:

{1,2,3}, {4,5}.

The next step will merge the two remaining clusters at a distance of 9.8. The
hierarchical clustering is complete. The dendrogram is shown in Figure 5.3.
Figure 5.3: Hierarchical clustering using the single-linkage algorithm. The distance
D_{SL} between clusters that merge is shown on the vertical axis.

The Complete-Linkage Algorithm


The complete-linkage algorithm is also called the maximum method or the far-
thest neighbor method. It is obtained by defining the distance between two clusters
to be the largest distance between a sample in one cluster and a sample in the other
cluster. If Ci and Cj are clusters, we define

D_{CL}(C_i, C_j) = \max_{a \in C_i, b \in C_j} d(a, b).

Example 5.2 Hierarchical clustering using the complete-linkage algorithm.

Perform a hierarchical clustering using the complete-linkage algorithm on the data


shown in Figure 5.2. Use Euclidean distance (4.1) for the distance between samples.
As before, the algorithm begins with five clusters, each consisting of one sample.
The nearest clusters {1} and {2} are then merged to produce the clusters

{1,2}, {3}, {4}, {5}.
Next obtain the matrix that gives the distances between these clusters:

         {1,2}     3      4      5
{1,2}           11.7   20.0   21.5
  3      11.7           9.8    9.8
  4      20.0    9.8           8.0
  5      21.5    9.8    8.0

The value 11.7 in row {1,2} and column 3 gives the distance between the clusters {1,2}
and {3} and is computed in the following way. Matrix (5.1) shows that d(1,3) = 11.7
and d(2,3) = 8.1. In the complete-linkage algorithm, the distance between clusters is
the maximum of these values, 11.7. The other values in the first row are computed in
a similar way. The values in other than the first row or first column are simply copied
from (5.1). Since the minimum value in this matrix is 8.0, the clusters {4} and {5} are
merged. At this point the clusters are

{1,2}, {3}, {4,5}.


Next obtain the matrix that gives the distance between these clusters:

         {1,2}     3    {4,5}
{1,2}           11.7   21.5
  3      11.7           9.8
{4,5}    21.5    9.8

Since the minimum value in this matrix is 9.8, the clusters {3} and {4,5} are merged.
At this point the clusters are
{1,2}, {3,4,5}.
Notice that these clusters are different from those obtained at the corresponding point
of the single-linkage algorithm.
At the next step, the two remaining clusters will be merged. The hierarchical
clustering is complete. The dendrogram is shown in Figure 5.4.
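The different merge sequences of Examples 5.1 and 5.2 can be cross-checked with standard hierarchical clustering routines. The sketch below is only such a check, under the assumption that SciPy is available; it is not part of the original examples.

    import numpy as np
    from scipy.cluster.hierarchy import linkage
    from scipy.spatial.distance import pdist

    samples = np.array([[4, 4], [8, 4], [15, 8], [24, 4], [24, 12]], dtype=float)
    condensed = pdist(samples)   # Euclidean distances, as in matrix (5.1)

    # Each row of a linkage matrix lists the two merged clusters, the merge
    # distance, and the size of the newly formed cluster.
    print(linkage(condensed, method="single"))     # merges at 4.0, 8.0, 8.1, 9.8
    print(linkage(condensed, method="complete"))   # merges at 4.0, 8.0, 9.8, 21.5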

A cluster, by definition, contains similar samples. The single-linkage algorithm
and the complete-linkage algorithm differ in how they determine when the samples in two
clusters are similar enough for the clusters to be merged. The single-linkage algorithm says that two
clusters Ci and Cj are similar if there are any elements a in Ci and b in Cj that are
similar, in the sense that the distance between a and b is small. In other words, in
the single-linkage algorithm, it takes a single similar pair a, b with a in Ci and b in Cj
in order to merge Ci and Cj. (Readers familiar with graph theory will recognize this
procedure as that used by Kruskal's algorithm to find a minimum spanning tree.) On
the other hand, the complete-linkage algorithm says that two clusters Ci and Cj are
similar if the maximum of d(a, b) over all a in Ci and b in Cj is small. In other
words, in the complete-linkage algorithm, all pairs in Ci and Cj must be similar in
order to merge Ci and Cj.
Figure 5.4: Hierarchical clustering using the complete-linkage algorithm.

The Average-Linkage Algorithm

The single-linkage algorithm allows clusters to grow long and thin, whereas the complete-
linkage algorithm produces more compact clusters. Both clusterings are susceptible to
distortion by outliers or deviant observations. The average-linkage algorithm is an
attempt to compromise between the extremes of the single- and complete-linkage
algorithms.

The average-linkage clustering algorithm, also known as the unweighted pair-
group method using arithmetic averages (UPGMA), is one of the most widely
used hierarchical clustering algorithms. The average-linkage algorithm is obtained by
defining the distance between two clusters to be the average distance between a point in
one cluster and a point in the other cluster. Formally, if Ci is a cluster with ni
members and Cj is a cluster with nj members, the distance between the clusters is

D_{AL}(C_i, C_j) = \frac{1}{n_i n_j} \sum_{a \in C_i, b \in C_j} d(a, b).

Example 5.3 Hierarchical clustering using the average-linkage algorithm.

Perform a hierarchical clustering using the average-linkage algorithm on the data shown
in Figure 5.2. Use Euclidean distance (4.1) for the distance between samples. The
algorithm begins with five clusters, each consisting of one sample. The nearest
clusters {1} and {2} are then merged to form the clusters

{1,2}, {3}, {4}, {5}.
Next obtain the matrix that gives the distances between these clusters:

         {1,2}     3      4      5
{1,2}            9.9   18.0   19.7
  3       9.9           9.8    9.8
  4      18.0    9.8           8.0
  5      19.7    9.8    8.0

The value 9.9 in row {1,2} and column 3 gives the distance between the clusters {1,2}
and {3} and is computed in the following way. Matrix (5.1) shows that d(1,3) = 11.7
and d(2,3) = 8.1. In the average-linkage algorithm, the distance between clusters is
the average of these values, 9.9. The other values in the first row are computed in a
similar way. The values in other than the first row or first column are simply copied
from (5.1). Since the minimum value in this matrix is 8.0, the clusters {4} and {5} are
merged. At this point the clusters are

{1,2}, {3}, {4,5}.


Next obtain the matrix that gives the distances between these clusters:

         {1,2}     3    {4,5}
{1,2}            9.9   18.9
  3       9.9          9.8
{4,5}    18.9    9.8

Since the minimum value in this matrix is 9.8, the clusters {3} and {4,5} are merged.
At this point the clusters are

{1,2}, {3,4,5}.

At the next step, the two remaining clusters are merged and the hierarchical clustering
is complete.

An example of the application of the average-linkage algorithm to a larger data set
using the SAS statistical analysis software package is presented in Appendix B.4.
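The averaging step can also be written out directly. The following sketch is illustrative only, assuming NumPy; it reproduces the cluster distances 9.9 and 18.9 obtained in Example 5.3.

    import numpy as np

    # Sample label -> feature vector, from Figure 5.2.
    samples = {1: (4, 4), 2: (8, 4), 3: (15, 8), 4: (24, 4), 5: (24, 12)}

    def d(a, b):
        """Euclidean distance between two samples given by their labels."""
        return float(np.hypot(samples[a][0] - samples[b][0],
                              samples[a][1] - samples[b][1]))

    def d_al(ci, cj):
        """Average-linkage (UPGMA) distance: mean of all pairwise distances."""
        return sum(d(a, b) for a in ci for b in cj) / (len(ci) * len(cj))

    print(round(d_al({1, 2}, {3}), 1))      # 9.9
    print(round(d_al({1, 2}, {4, 5}), 1))   # 18.9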

Ward's Method

Ward's method is also called the minimum-variance method. Like the other
algorithms, Ward's method begins with one cluster for each individual sample. At each
iteration, among all pairs of clusters, it merges the pair that produces the smallest
squared error for the resulting set of clusters. The squared error for each cluster is
defined as follows. If a cluster contains m samples x_1, ..., x_m, where x_i is the feature
vector (x_{i1}, ..., x_{id}), the squared error for sample x_i, which is the squared Euclidean
distance from the mean, is

e_i^2 = \sum_{j=1}^{d} (x_{ij} - \mu_j)^2,

where \mu_j is the mean value of feature j for the samples in the cluster,

\mu_j = \frac{1}{m} \sum_{i=1}^{m} x_{ij}.

The squared error E for the entire cluster is the sum of the squared errors of the samples,

E = \sum_{i=1}^{m} e_i^2.

The vector composed of the means of each feature, \mu = (\mu_1, ..., \mu_d), is called the
mean vector or centroid of the cluster. The squared error for a cluster is thus the sum
of the squared distances in each feature from the cluster members to their mean. The
total squared error is equal to the total variance of the cluster times the number of
samples in the cluster, m, where the total variance is defined to be
\sigma^2 = \sigma_1^2 + \cdots + \sigma_d^2, the sum of the variances defined for each
feature. The squared error for a set of clusters is defined to be the sum of the squared
errors for the individual clusters.
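The squared-error criterion is easy to state in code. The helper below is a sketch rather than the book's own implementation, assuming NumPy; it reproduces the value 8 obtained for the merge of samples 1 and 2 in Example 5.4.

    import numpy as np

    def squared_error(cluster):
        """Sum of squared Euclidean distances from each member to the cluster centroid."""
        pts = np.asarray(cluster, dtype=float)
        return float(((pts - pts.mean(axis=0)) ** 2).sum())

    def total_squared_error(clusters):
        """Squared error of a set of clusters: the sum over the individual clusters."""
        return sum(squared_error(c) for c in clusters)

    # Candidate merge of samples 1 and 2 from Figure 5.2.
    print(total_squared_error([[(4, 4), (8, 4)], [(15, 8)], [(24, 4)], [(24, 12)]]))  # 8.0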
Example 5.4 Hierarchical clustering using Ward's method.

Perform a hierarchical clustering using Ward's method on the data shown in Figure
5.2. The algorithm begins with five clusters, each consisting of one sample. At this
point, the squared error is zero. There are 10 possible ways to merge a pair of clusters:
merge {1} and {2}, merge {1} and {3}, and so on. Figure 5.5 shows the squared error
for each possibility. For example, consider merging {1} and {2}. Since sample 1 has
the feature vector (4,4) and sample 2 has the feature vector (8,4), the feature means of
the merged cluster are 6 and 4. The squared error for cluster {1,2} is

(4 - 6)^2 + (8 - 6)^2 + (4 - 4)^2 + (4 - 4)^2 = 8.

The squared error for each of the other clusters {3}, {4}, and {5} is 0. Thus the
squared error for the clusters {1,2}, {3}, {4}, {5} is 8 + 0 + 0 + 0 = 8.

Since the smallest squared error in Figure 5.5 is 8, the clusters {1} and {2} are merged
to give the clusters

{1,2}, {3}, {4}, {5}.

                         Squared
Clusters                 Error, E
{1,2},{3},{4},{5}           8.0
{1,3},{2},{4},{5}          68.5
{1,4},{2},{3},{5}         200.0
{1,5},{2},{3},{4}         232.0
{2,3},{1},{4},{5}          32.5
{2,4},{1},{3},{5}         128.0
{2,5},{1},{3},{4}         160.0
{3,4},{1},{2},{5}          48.5
{3,5},{1},{2},{4}          48.5
{4,5},{1},{2},{3}          32.0

Figure 5.5: Squared errors for each way of creating four clusters.

                         Squared
Clusters                 Error, E
{1,2,3},{4},{5}            72.7
{1,2,4},{3},{5}           224.0
{1,2,5},{3},{4}           266.7
{1,2},{3,4},{5}            56.5
{1,2},{3,5},{4}            56.5
{1,2},{4,5},{3}            40.0

Figure 5.6: Squared errors for three clusters.

Figure 5.6 shows the squared error for all possible sets of clusters that result from
merging two of {1,2}, {3}, {4}, {5}. Since the smallest squared error in Figure 5.6 is
40, the clusters {4} and {5} are merged to form the clusters

{1,2}, {3}, {4,5}.

Figure 5.7 shows the squared error for all possible sets of clusters that result from
merging two of {1,2}, {3}, {4,5}. Since the smallest squared error in Figure 5.7 is 94,
the clusters {3} and {4,5} are merged to give the clusters

{1,2}, {3,4,5}.
                     Squared
Clusters             Error, E
{1,2,3},{4,5}         104.7
{1,2,4,5},{3}         380.0
{1,2},{3,4,5}          94.0

Figure 5.7: Squared errors for two clusters.

Figure 5.8: Dendrogram for Ward's method.

At the next step, the two remaining clusters are merged and the hierarchical clus-
tering is complete. The resulting dendrogram is shown in Figure 5.8.

5.3 Partitional Clustering


Agglomerative clustering (Section 5.2) creates a series of nested clusters. This contrasts
with partitional clustering, in which the goal is usually to create one set of clusters
that partitions the data into similar groups. Samples close to one another are assumed
to be similar, and the goal of the partitional clustering algorithms is to group samples
that are close together. In many of the partitional algorithms, the number of clusters to be
constructed is specified in advance.

If a partitional algorithm is used to divide the data set into two groups, and
each of these groups is divided into two parts, and so on, a hierarchical clustering
could be produced from the top down. The hierarchy produced by this divisive

technique is more general than the bottom-up hierarchies produced by agglomerative
techniques because the groups can be divided into more than two subgroups in one
step. (The only way this could happen for an agglomerative technique would be for two
distances to tie, which would be extremely unlikely even if allowed by the algorithm.)
Another advantage of partitional techniques is that only the top part of the tree, which
shows the main groups and possibly their subgroups, may be required, and there may
be no need to complete the dendrogram. All of the examples in this section assume
that Euclidean distances are used, but the techniques could use any distance measure.

Forgy's Algorithm
One of the simplest partitional clustering algorithms is Forgy's algorithm [Forgy].
Besides the data, input to the algorithm consists of k, the number of clusters to be
constructed, and k samples called seed points. The seed points could be chosen
randomly, or some knowledge of the desired cluster structure could be used to guide
their selection.

Forgy's Algorithm

1. Initialize the cluster centroids to the seed points.

2. For each sample, find the cluster centroid nearest it. Put the sample in the
cluster identified with this nearest cluster centroid.

3. If no samples changed clusters in step 2, stop.

4. Compute the centroids of the resulting clusters and go to step 2.

Example 5.5 Partitional clustering using Forgy's algorithm.

Perform a partitional clustering using Forgy's algorithm on the data shown in Figure
5.2. Set k = 2, which will produce two clusters, and use the first two samples (4,4)
and (8,4) in the list as seed points. In this algorithm, the samples will be denoted by
their feature vectors rather than their sample numbers to aid in the computation.

For step 2, find the nearest cluster centroid for each sample. Figure 5.9 shows the
results. The clusters {(4,4)} and {(8,4), (15,8), (24,4), (24,12)} are produced.

For step 4, compute the centroids of the clusters. The centroid of the first cluster
is (4,4). The centroid of the second cluster is (17.75, 7) since

(8 + 15 + 24 + 24)/4 = 17.75

Sample      Nearest Cluster Centroid
(4,4)       (4,4)
(8,4)       (8,4)
(15,8)      (8,4)
(24,4)      (8,4)
(24,12)     (8,4)

Figure 5.9: First iteration of Forgy's algorithm.

Sample      Nearest Cluster Centroid
(4,4)       (4,4)
(8,4)       (4,4)
(15,8)      (17.75,7)
(24,4)      (17.75,7)
(24,12)     (17.75,7)

Figure 5.10: Second iteration of Forgy's algorithm.

and
(4 +8+4 + 12)/4 =7.

Since some samples changed clusters (there were initially no clusters), return to step 2.
Find the cluster centroid nearest each sample. Figure 5.10 shows the results. The
clusters {(4,4), (8, 4)} and {(15,8), (24, 4), (24, 12)} are produced.
For step 4, compute the centroids (6, 4) and (21,8) of the clusters. Since the sample

(8,4) changed clusters, return to step 2.


Find the cluster centroid nearest each sample. Figure 5.11 shows the results. The
clusters {(4, 4), (8,4)} and {(15, 8), (24, 4), (24, 12)} are obtained.
For step 4, compute the centroids (6, 4) and (21,8) of the clusters. Since no sample
will change clusters, the algorithm terminates.

In this version of Forgy's algorithm, the seed points are chosen arbitrarily as the
first two samples; however, other possibilities have been suggested. One alternative is
to begin with k clusters generated by one of the hierarchical clustering algorithms and
use their centroids as initial seed points.
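A compact sketch of the four steps of Forgy's algorithm is given below. It is an illustration under the assumption that NumPy is available, not code from the text; run on the data of Figure 5.2 with the first two samples as seed points, it reproduces the clusters of Example 5.5.

    import numpy as np

    def forgy(samples, seeds, max_iter=100):
        """Sketch of Forgy's algorithm: assign to nearest centroid, recompute, repeat."""
        samples = np.asarray(samples, dtype=float)
        centroids = np.asarray(seeds, dtype=float)
        labels = np.full(len(samples), -1)
        for _ in range(max_iter):
            # Step 2: put each sample in the cluster of the nearest centroid.
            dists = np.linalg.norm(samples[:, None, :] - centroids[None, :, :], axis=2)
            new_labels = dists.argmin(axis=1)
            # Step 3: stop if no sample changed clusters.
            if np.array_equal(new_labels, labels):
                break
            labels = new_labels
            # Step 4: recompute the centroids (assumes no cluster becomes empty).
            centroids = np.array([samples[labels == j].mean(axis=0)
                                  for j in range(len(centroids))])
        return labels, centroids

    data = [(4, 4), (8, 4), (15, 8), (24, 4), (24, 12)]
    print(forgy(data, seeds=[(4, 4), (8, 4)]))   # reproduces the clusters of Example 5.5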

Sample      Nearest Cluster Centroid
(4,4)       (6,4)
(8,4)       (6,4)
(15,8)      (21,8)
(24,4)      (21,8)
(24,12)     (21,8)

Figure 5.11: Third iteration of Forgy's algorithm.


It has been proved [Selim] that Forgy's algorithm terminates; that is, eventually no
samples change clusters. However, if the number of samples is large, it may take the
algorithm considerable time to produce stable clusters. For this reason, some versions
of Forgy's algorithm allow the user to restrict the number of iterations. Other versions
of Forgy's algorithm [Dubes] permit the user to supply parameters that allow new
clusters to be created and to establish a minimum cluster size.

The k-means Algorithm

An algorithm similar to Forgy's algorithm is known as the k-means algorithm. Be-
sides the data, input to the algorithm consists of k, the number of clusters to be
constructed. The k-means algorithm differs from Forgy's algorithm in that the cen-
troids of the clusters are recomputed as soon as a sample joins a cluster. Also, unlike
Forgy's algorithm, which is iterative, the k-means algorithm makes only two passes
through the data set.

k-means Algorithm

1. Begin with k clusters, each consisting of one of the first k samples. For each
of the remaining n - k samples, find the centroid nearest it. Put the sample in
the cluster identified with this nearest centroid. After each sample is assigned,
recompute the centroid of the altered cluster.

2. Go through the data a second time. For each sample, find the centroid nearest
it. Put the sample in the cluster identified with this nearest centroid. (During
this step, do not recompute any centroid.)
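The two passes can be sketched as follows. This is an illustrative reading of the description above under the assumption that NumPy is available, not the text's own code.

    import numpy as np

    def two_pass_kmeans(samples, k):
        """Sketch of the two-pass k-means variant described above."""
        samples = np.asarray(samples, dtype=float)
        clusters = [[samples[i]] for i in range(k)]       # the first k samples seed the clusters
        centroids = [samples[i].copy() for i in range(k)]
        # Pass 1: assign each remaining sample, updating that cluster's centroid at once.
        for x in samples[k:]:
            j = int(np.argmin([np.linalg.norm(x - c) for c in centroids]))
            clusters[j].append(x)
            centroids[j] = np.mean(clusters[j], axis=0)
        # Pass 2: reassign every sample to the nearest centroid; centroids stay fixed.
        labels = [int(np.argmin([np.linalg.norm(x - c) for c in centroids]))
                  for x in samples]
        return labels, centroids

    data = [(8, 4), (24, 4), (15, 8), (4, 4), (24, 12)]   # the ordering used in Example 5.6 below
    print(two_pass_kmeans(data, k=2))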

Example 5.6 Partitional clustering using the k-means algorithm.


Sample     Distance to Centroid (9,5.3)    Distance to Centroid (24,8)
(8,4)                 1.6                            16.5
(24,4)               15.1                             4.0
(15,8)                6.6                             9.0
(4,4)                 5.2                            20.4
(24,12)              16.4                             4.0

Figure 5.12: Distances for use by step 2 of the k-means algorithm.
Perform a partitional clustering using the k-means algorithm on the data in Figure
5.2. Set k = 2 and assume that the data are ordered so that the first two samples are
(8,4) and (24,4).

For step 1, begin with two clusters {(8,4)} and {(24,4)}, which have centroids at
(8,4) and (24,4). For each of the remaining three samples, find the centroid nearest
it, put the sample in this cluster, and recompute the centroid of this cluster.

The next sample (15,8) is nearest the centroid (8,4), so it joins cluster {(8,4)}.
At this point, the clusters are {(8,4), (15,8)} and {(24,4)}. The centroid of the first
cluster is updated to (11.5, 6) since

(8 + 15)/2 = 11.5,   (4 + 8)/2 = 6.

The next sample (4,4) is nearest the centroid (11.5, 6), so it joins cluster {(8,4),
(15,8)}. At this point, the clusters are {(8,4), (15,8), (4,4)} and {(24,4)}. The cen-
troid of the first cluster is updated to (9, 5.3).

The next sample (24,12) is nearest the centroid (24,4), so it joins cluster {(24,4)}.
At this point, the clusters are {(8,4), (15,8), (4,4)} and {(24,12), (24,4)}. The centroid
of the second cluster is updated to (24,8). At this point, step 1 of the algorithm is
complete.

For step 2, examine the samples one by one and put each one in the cluster identified
with the nearest centroid. As Figure 5.12 shows, in this case no sample changes clusters.
The resulting clusters are

{(8,4), (15,8), (4,4)} and {(24,12), (24,4)}.


An alternative version of the k-means algorithm iterates step 2. Specifically, step 2
is replaced by the following steps 2 through 4:

2. For each sample, find the centroid nearest it. Put the sample in the cluster
identified with this nearest centroid.

3. If no samples changed clusters, stop.


4. Recompute the centroids of altered clusters and go to step 2.

The goal of Forgy's algorithm and the k-means algorithm is to minimize the squared
error (as defined in Section 5.2) for a fixed number of clusters. These algorithms assign
samples to clusters so as to reduce the squared error and, in the iterative versions, they
stop when no further reduction occurs. However, to achieve reasonable computation
time, they do not consider all possible clusterings. For this reason, they sometimes
terminate with a clustering that achieves a local minimum squared error. Furthermore,
in general, the clusterings that these algorithms generate depend on the choice of the
seed points. For example, if Forgy's algorithm is applied to the data in Figure 5.2

using (8,4) and (24, 12) as seed points, the algorithm terminates with the clusters
{(4,4), (8,4), (15,8)}, {(24,4), (24,12)},      (5.2)

which is different from the clustering produced in Example 5.5. The clustering (5.2) has
a squared error of 104.7, whereas the clustering of Example 5.5 has a squared error of
94. The clustering (5.2) produces a local minimum; the clustering of Example 5.5 can
be shown to produce a global minimum. For a given set of seed points, the resulting
clusters may also depend on the order in which the points are checked.
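Both squared errors quoted above can be checked directly against the definition of Section 5.2; the short sketch below is only such a check, assuming NumPy.

    import numpy as np

    def total_squared_error(clusters):
        """Sum over clusters of squared distances from members to their centroid."""
        return sum(float(((np.asarray(c, float) - np.mean(c, axis=0)) ** 2).sum())
                   for c in clusters)

    # Clustering (5.2) versus the clustering found in Example 5.5.
    print(round(total_squared_error([[(4, 4), (8, 4), (15, 8)], [(24, 4), (24, 12)]]), 1))  # 104.7
    print(round(total_squared_error([[(4, 4), (8, 4)], [(15, 8), (24, 4), (24, 12)]]), 1))  # 94.0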

The Isodata Algorithm


The isodata algorithm can be considered to be an enhancement of the approach
taken by Forgy's algorithm and the k-means algorithm. Like those algorithms, it tries
to minimize the squared error by assigning samples to the nearest centroid. Unlike
those algorithms, it does not deal with a fixed number of clusters, but rather it deals
with k clusters where k is allowed to range over an interval that includes the number of
clusters requested by the user. It discards clusters with too few elements. Clusters are
merged if the number of clusters grows too large or if clusters are too close together.
A cluster is split if the number of clusters is too few or if the cluster contains very
dissimilar samples. The specific details follow.

Besides the data and seed points, the following parameters are required by the
isodata algorithm:
no_clusters: the desired number of clusters, which is also the number of seed points

min_elements: the minimum number of samples permitted per cluster

min_dist: the minimum distance permitted between cluster centroids without merging
them

split_size: a parameter that controls splitting

iter_start: the maximum number of iterations permitted in the first part of the
algorithm

max_merge: the maximum number of cluster mergers permitted per iteration

iter_body: the maximum number of iterations permitted within the main part of the
algorithm

These parameters are further explained in the algorithm.

Isodata Algorithm

1. Initialize the cluster centroids to the seed points. (Steps 1 through 4 are like
Forgy's algorithm.)

2. For each sample, find the cluster centroid nearest it. Put the sample into the
cluster identified with this nearest cluster centroid.

3. Compute the centroids of the resulting clusters.

4. If at least one sample changed clusters and the number of iterations is less than
iter_start, go to step 2.

5. Discard clusters with fewer than min_elements samples. Also discard the samples
they contain.

6. If the number of clusters is greater than or equal to 2 * no_clusters or the
number of this iteration is even, execute step 7 (merging operation); otherwise
go to step 8.

7. If the distance between two cluster centroids is less than min_dist, merge these
clusters and update the centroid; otherwise, go to step 8. Repeat this step up to
max_merge times and then go to step 8.

8. If the number of clusters is less than or equal to no_clusters/2 or the number
of this iteration is odd, execute step 9 (splitting operation); otherwise, go to step
10.

9. Find a cluster that has a standard deviation for some variable x which
exceeds split_size * \sigma_x, where \sigma_x is the standard deviation of x within the
original set of samples. If there is none, go to step 10. Compute the mean of x within the
cluster. Separate the samples in this cluster into two sets, one consisting of the samples
in which x is greater than or equal to the mean, and the other consisting of the samples
in which x is less than the mean. Compute the centroids of these two sets.
If the distance between these centroids is greater than or equal to 1.1 * min_dist,
replace the original cluster by these two clusters; otherwise, do not split the
cluster.
10. If the number of iterations is less than iter_body and any clusters changed, go
to step 2; otherwise, stop.

Example 5.7 Partitional clustering using the isodata algorithm.
For step 6, the number of clusters is not greater than or equal to 2 * no_clusters,
and the number (0) of this iteration is even, so the merging operation (step 7) is
executed.

Since the distance between the centroids of the clusters {1,2,4,6} and {3,5,7} is
less than min_dist, these clusters are merged. At this point, there are two clusters:

{1,2,3,4,5,6,7}, {8,9,10,11,12,13,14}.

The merge step is not repeated (since max_merge = 1), so proceed to step 8. (In this
case, the remaining clusters could not be merged anyway since the distance between
their centroids is greater than min_dist.)
For step 8, the number of clusters (2) is greater than no_clusters/2 = 1.5 and the
number of this iteration is not odd, so go to step 10.

For step 10, since the number of iterations is less than the requested number (5)
and the clusters changed, proceed to step 2.

This time the Forgy part of the algorithm (steps 1 through 4) does not change the
clusters.

For step 5, since no cluster has fewer than min_elements members, none is dis-
carded.
For step 6, the number of clusters is not greater than or equal to 2 * no_clusters
and the number (1) of this iteration is odd, so proceed to step 8.

For step 8, the number of clusters (2) is greater than no_clusters/2 and the
number of this iteration is odd, so the splitting operation (step 9) is executed.

For step 9, there is a cluster {8,9,10,11,12,13,14} that has a standard deviation
for the variable y exceeding split_size * \sigma_y. The samples are then divided into two
sets that have y values less than, or greater than, the mean value of y in the cluster:

{8,9,10,11,12,13}, {14}.

Since the distance between their centroids is greater than or equal to 1.1 * min_dist,
the cluster is split. At this point there are three clusters:

{1,2,3,4,5,6,7}, {8,9,10,11,12,13}, {14}.
For step 10, since the number of iterations is less than the requested number and
the clusters changed, proceed to step 2.

Again, the Forgy part of the algorithm (steps 1 through 4) does not change the
clusters.

For step 5, cluster {14} is discarded since it has fewer than min_elements members.
At this point there are two clusters:

{1,2,3,4,5,6,7}, {8,9,10,11,12,13}.
For step 6, the number of clusters is less than 2 * no_clusters and the number (2)
of this iteration is even, so the merging operation (step 7) is executed.

Since the distance between the centroids of the clusters {1,2,3,4,5,6,7} and
{8,9,10,11,12,13} is not less than min_dist, these clusters are not merged. Proceed
to step 8.

For step 8, the number of clusters (2) is greater than no_clusters/2 and the
number of this iteration is even, so go to step 10.

For step 10, since the number of iterations is less than the requested number and
the clusters changed, proceed to step 2.

Again the Forgy part of the algorithm (steps 1 through 4) does not change the
clusters.

For step 5, since no cluster has fewer than min_elements members, none is dis-
carded.

For step 6, the number of clusters is less than 2 * no_clusters and the number (3)
of this iteration is odd, so proceed to step 8.

For step 8, the number of clusters (2) is greater than no_clusters/2 and the
number of this iteration is odd, so the splitting operation (step 9) is executed.

For step 9, no cluster has a variable whose standard deviation exceeds split_size *
\sigma, so proceed to step 10.

For step 10, the number of iterations is less than the requested number, but no
clusters changed, so the algorithm terminates.
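A full isodata implementation is lengthy, but the two decisions that distinguish it from Forgy's algorithm, the merge test of step 7 and the split test of step 9, can be sketched briefly. The helper names and structure below are hypothetical illustrations written under the assumptions of the parameter list above; they are not the book's code.

    import numpy as np

    def should_merge(centroid_a, centroid_b, min_dist):
        """Step 7 test: two clusters are merged if their centroids are closer than min_dist."""
        diff = np.asarray(centroid_a, dtype=float) - np.asarray(centroid_b, dtype=float)
        return np.linalg.norm(diff) < min_dist

    def try_split(cluster, sigma_full, split_size, min_dist):
        """Step 9 test: split on a feature whose spread exceeds split_size * sigma of the full data."""
        pts = np.asarray(cluster, dtype=float)
        over = np.where(pts.std(axis=0) > split_size * np.asarray(sigma_full, dtype=float))[0]
        if len(over) == 0:
            return None                       # nothing to split on
        j = over[0]                           # split on the first offending feature
        mean_j = pts[:, j].mean()
        low, high = pts[pts[:, j] < mean_j], pts[pts[:, j] >= mean_j]
        # Keep the split only if the two new centroids are far enough apart.
        if np.linalg.norm(low.mean(axis=0) - high.mean(axis=0)) >= 1.1 * min_dist:
            return low, high
        return None

    # For instance, with the parameters of Example 5.8 one would call
    # try_split(cluster_points, sigma_full=all_points.std(axis=0), split_size=0.5, min_dist=2.5),
    # where cluster_points and all_points are hypothetical arrays of feature vectors.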

The isodata algorithm has been used in many engineering and scientific applica-
tions. Example 5.7 shows that the algorithm requires extensive computation, even for
a small data set. For large data sets, it may be too expensive to run on conventional
single-processor computers. [Tilton] reports that the isodata algorithm required seven
hours of computing time on a VAX-11/780 to cluster the gray levels in a 512 x 512 im-
age into 16 clusters. Parallel systems can do much better; the same problem ran in 20
seconds on a system containing an array of 16,384 processors. Whether large amounts
of time or massively parallel computers are necessary is debatable. One study [Mezzich]
showed that simple k-means algorithms often outperform the isodata algorithm.

Example 5.8 Applying the isodata algorithm to the 8OX data set.

In this example, developed by Daniel Kusswurm, the isodata algorithm is applied to a
set of real data.
Figure 5.14: A digitized binary image of a handwritten X that has the feature vector
(4, 10, 10, 4, 10, 9, 11, 9).

A standard data set used in many pattern recognition studies consists of 24 x 24
pixel digitized binary images of handwritten FORTRAN characters written by several
persons. Eight features were obtained from each sample by counting the number of
consecutive noncharacter pixels on a line to the center beginning at each of eight
boundary positions (see Figure 5.14). The isodata algorithm was applied to a data
set consisting of 15 images of each of the characters 8, O, and X. Figure 5.15 gives the
feature values and classes for this data set. (The class information is not used by the
clustering algorithm.) The parameters chosen were
no_clusters = 4
min_elements = 8
min_dist = 2.5
split_size = 0.5
iter_start = 5
max_merge = 1
iter_body = 3

and the seed points were the first four samples.
In the first iteration of the isodata algorithm, after the maximum number
(iter_start = 5) of iterations of the Forgy part of the algorithm (steps 1 through
4), four clusters were obtained, which contained the following numbers of samples
from the three classes:
Class   Number in Cluster 1   Number in Cluster 2   Number in Cluster 3   Number in Cluster 4
  8              2                    13                     0                     0
  O             14                     0                     1                     0
  X              1                     0                     0                    14
For step 5, one cluster has fewer than min_elements members, so it is discarded
along with the samples it contains. At this point, the three remaining clusters are:

Class   Number in Cluster 1   Number in Cluster 2   Number in Cluster 3
  8              2                    13                     0
  O             14                     0                     0
  X              1                     0                    14
For step 6, the number of clusters is not greater than or equal to 2 * no_clusters
and the number (0) of this iteration is even, so the merging operation (step 7) is
executed. However, no clusters are merged.

For step 8, the number of clusters is greater than no_clusters/2 = 1.5 and the
number of this iteration is not odd, so go to step 10.

For step 10, the number of iterations is less than the requested number (iter_body =
3) and the clusters changed, so proceed to step 2.
In the second iteration of the isodata algorithm, the Forgy part of the algorithm
does not change the clusters.

For step 5, since no cluster has fewer than min_elements members, none is dis-
carded.

For step 6, the number of clusters is not greater than or equal to 2 * no_clusters
and the number (1) of this iteration is odd, so proceed to step 8.

For step 8, the number of clusters is greater than no_clusters/2 and the number
of this iteration is odd, so execute the splitting operation (step 9).

For step 9, a split occurs, producing the clusters
Class   Number in Cluster 1   Number in Cluster 2   Number in Cluster 3   Number in Cluster 4
  8                                   13                     0
  O                                                          11
  X              0                                                                14
For step 10, since the number of iterations is less than the requested number and
the clusters changed, proceed to step 2.

In the third and final iteration of the isodata algorithm, the Forgy part of the
algorithm converges after two iterations to produce the clusters
Class   Number in Cluster 1   Number in Cluster 2   Number in Cluster 3   Number in Cluster 4
  8              1                    13
  O              0                     2
  X              2                     0                                          14