KEMBAR78
Clustering | PDF | Statistical Classification | Artificial Intelligence
0% found this document useful (0 votes)
19 views104 pages

Clustering

Uploaded by

rotwiler38
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
19 views104 pages

Clustering

Uploaded by

rotwiler38
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 104

Pattern Recognition:

Clustering

For BCSE Final Year


Jadavpur University
Pattern Classification by
Distance Functions
The method of pattern classification by
distance functions can be expected to yield
practical and satisfactory results only when
the pattern classes tend to have clustering
properties
Contd..

Figure 1. Patterns classifiable by


proximity concept
Contd..

Figure 2. Patterns not easily classifiable


by proximity concept
Contd..
Unlike Fig. 2, in Fig. 1, the patterns of each
class tend to cluster tightly about a typical or
representative pattern for the class. This
occurs in case where pattern variability and
other corruptive influences are well behaved.

A typical example is the problem of reading


bank checks by machine.
Contd..
Here the proximity of an unknown pattern to
the patterns of a class will serve as a measure
for its classification, so the method is called
minimum-distance pattern classification. And
the system which does this is called minimum-
distance classifiers.

x
Single Prototypes
In this case, the patterns of each class tend to
cluster tightly about a typical or
representative pattern for that class. Under
these conditions, minimum-distance classifiers
can constitute a very effective approach to the
classification problem.
Contd..
Consider M pattern classes and assume that
these classes are representable by prototype
patterns Z 1,Z 2,...,Z M .

The Euclidean distance between an arbitrary


pattern vector x and the ith prototype is given
by,

D i  X   Z i  ( X  Z i )( X  Z i ) ………………….(1)
Contd..
A minimum-distance classifier computes the
distance from a pattern x of unknown
classification to the prototype of each class,
and assigns the pattern to the class to which it
is closest.
Contd..
In other words, x is assigned to class wi if
Di < Dj for all i≠j. Ties are resolved arbitrarily.

Equation (1) may be expressed as

D i 2  X Z i 2
 ( X  Z i )( X  Z i )
Contd..
  
 X X  2 X Z iZ i Z i
1 
 X X  2( X Z i  Z i Z i )
2
Contd..
Choosing the minimum Di2 is equivalent to
choosing the minimum Di since all distances
are positive, since the term XtX is independent
of i, choosing the minimum Dt2 is equivalent to
choosing the maximum (XtZi – 1/2xZitZi)
Consequently, we may define the decision
functions..
Contd..
1  ……….(2)
d i ( X )  X Z i  Z i Z i , i  1,2,..., M
2
where x is assigned to class wi , if d i ( x)  d j ( x)
for all j  i.

Observe that d i ( X ) is a linear decision


function; that is, if Z ij , j  1,2,..., n, are the
components of Z i , and we let..
Contd..
 x1 
 
 x2 
. 
wij  z ij , j  1,2,..., n wi.n1   1 Z i  Z i and X 
.


2  
 .x n 
1 
 
then we may express Eq. (2) in the familiar
linear form

d i ( X ) W i X , i  1,2,..., M where W i ( w i1 ,w i 2 ,....,w i.n1 ).
Contd..
The decision boundary of a two-class
example in which each class is characterized
by a single prototype is shown below.

Figure 3. Decision boundary of two classes


characterized by single prototype
Contd..
It can be shown that the linear decision
surface separating every pair of prototype
points zi and zj is the hyperplane which is the
perpendicular bisector of the line segment
joining the two points. Thus the minimum-
distance-classifiers are a special case of linear
classifiers.
Multiprototypes
Here each class is characterized by several
prototypes, that is, each pattern of class wi
tends to cluster about one of the prototypes
Z i 1, Z i 2 ,..., Z i N , where N i is the number of
i

prototypes in the ith pattern class.

Let the distance function between an


arbitrary pattern x and class wi be denoted by

D i  min X  Z i l , l  1,2,..., N i
Contd..
Following the development for single prototype,
the decision functions
1 l
d i ( X )  max{( X Z i )  ( Z i )Z i l }, l  1,2,..., N i
l

2
where, as before, x is placed in class w i ,if
d i ( X )  d j ( X ), for all j  i.

The decision boundaries for a two-class case


in which each class contains two prototypes
are shown in the next slide.
Contd..

Figure 2. Piecewise-linear decision boundaries


for two classes, each of which is characterized
by two prototypes
Contd..
Note that the boundaries between classes w i
w j and are piecewise linear. Since we could
have defined this as a single-prototype, four-
class problem, the sections of the boundaries
are the perpendicular bisectors of the lines
joining the prototypes of different classes.
Contd..
Although general iterative algorithms exist
which can be used in the calculation of linear
decision function parameters, unfortunately,
no truly general algorithm is yet known for the
piecewise-linear classifier decision function.
Cluster Seeking

It is evident that the ability to determine


characteristic prototypes or cluster centers
in a given set of data plays a central role in
the design of pattern classifiers based on
the minimum-distance concept. So we need
to study various cluster-seeking methods.
Contd..
Note that the performance of a given
algorithm is not only dependent on the type of
data being analyzed, but is also strongly
influenced by the chosen measure of pattern
similarity
Measures of Similarity
To define a data cluster, it is necessary to first
define a measure of similarity which will
establish a rule for assigning patterns to the
domain of a particular cluster center. Earlier
we considered the Euclidean distance between
two patterns x and z:
D X Z
as a measure of their similarity----the smaller
the distance, the greater the similarity.
Contd..
There are, however, other meaningful
distance measures which are sometimes
useful. For example, the Mahalanobis distance
from x to m.
D  ( X  m)C 1 ( X  m) ………………(1)
Contd..
It is a useful measure of similarity when
statistical properties are being explicitly
considered. Here C is the covariance matrix of
a pattern population, m is the mean vector,
and x represents a variable pattern.
Contd..
Measures of similarity need not be restricted
to distance measures. For example, the non-
metric similarity function:
XZ
S(X , Z)  ………………….(2)
X Z
which is the cosine of the angle between
the vectors x and z, is maximum when
x and z are oriented in the same direction
with respect to the origin.
Contd..
This measure of similarity is useful when
cluster regions tend to develop along
principal axes, as shown below.
x z 1
s ( x, z 1 )  cos  1 
x z2
x z 2
s ( x, z 2 )  cos  2 
x z2

Figure 4. Illustration of a similarity measure


Contd..

Here the use of this similarity measure is


governed by certain qualifications, such as
sufficient separation of cluster regions with
respect to each other as well as with respect
to the coordinate system origin.
Contd..
When the patterns under consideration are
binary valued with 0, 1 elements, the
similarity function of Eq. (2) may be given an
interesting non-geometrical interpretation.
Contd..
We say that a binary pattern x possesses the
ith attribute if x Then the term XtZ in Eq. (2) is
simply the number of attributes shared by x
and z, while X Z  ( X X )( Z Z ) is the geometric
mean of the number of attributes possessed
by x and the number possessed by z. In this
case the similarity function s ( X , Z ) is,
therefore, seen to be a measure of common
attributes possessed by the binary vectors x
and z.
Contd..
A binary variation of Eq. (2) which has been
widely used in information retrieval, nosology
(classification of diseases), and taxonomy
(classification of plants and animals) is the so-
called Tanimoto measure, which is given by

XZ
S(X , Z) 
X X  Z Z  X  Z
Clustering Criteria
After a measure of pattern similarity has
been adopted, we are still faced with the
problem of specifying a procedure for
partitioning the given data into cluster
domains.
Contd..
The clustering criterion used may represent a
heuristic scheme, or it may be based on the
minimization (or maximization) of a certain
performance index. The heuristic approach is
guided by intuition and experience.
Contd..

The Euclidean distance is generally used as


measure of proximity. A suitable threshold is
also necessary in order to define degrees of
acceptable similarity in the cluster-seeking
process.
Contd..
The performance-index (sometimes called
objective function) approach is guided by the
development of a procedure which will
minimize or maximize the chosen performance
index. One of the most often used indices is
the sum of the squared errors index, given
by..
Contd..
NC
J   X m j 2

j 1 X S j

where N C is the number of clusters, S j is the


set of samples belonging to the jth cluster,
and 1
mj
Nj
X
X S j

is the sample mean vector of set S j


Contd..

N j represents the number of samples in Sj.


The index J represents the overall sum of the
.

squared errors between the samples of a


cluster domain and their corresponding mean.
The K-means clustering algorithm is based on
this performance index.
Contd..
Other common performance indices are :
 the average squared distances between
samples in a cluster domain
 the average squared distances between
samples in different cluster domains
 indices based on the scatter matrix concept,

 minimum and maximum-variance indices

 a score of other performance measures which


have been used throughout the years.
Contd..

It is not uncommon to find a cluster-seeking


algorithm that represents a combination of the
heuristic and performance index approaches.
The Isodata clustering algorithm is such a
combination.
A Simple Cluster-Seeking
Algorithm
Suppose that we have a set of N sample
patterns X 1, X 2,..., X N . Let the first cluster
center Z1=X1(or any other pattern) and select
.
an arbitrary nonnegative threshold T.
Next, we compute the distance D 21from X 2 to Z 2
If this distance exceeds T, a new cluster
center, Z 2  X 2, is started. Otherwise, we assign
X 2 to the domain of cluster center Z 1.Suppose
that D 21  T so that Z 2 is established. In the next
step, the distances D 31 and D 32 from X 3 to Z 1
and Z 2 are computed.
Contd..
If both D 31 and D 32 are greater than T, a new
cluster center, Z 3 X 3, is created. Otherwise,
we assign X 3 to the domain of the cluster
center to which it is closest. This process is
repeated for all patterns.
Contd..

Figure 5. Effect of threshold in clustering


Contd..
The results of the foregoing procedure
depend on
 the first cluster center chosen,
 the order in which the patterns are
considered,
 the value of T, and,
 the geometrical properties of the data.
Contd..
Some intuitive idea for choosing a suitable
value of T may be based on

 MST of data points


 Intra-cluster distance
 Inter-cluster distance
 The closest and farthest points in cluster from
the cluster center and
 the variance of these cluster domains
Contd..

This procedure may be expected to yield


useful results in situations where the data
exhibit characteristic "pockets" which are
reasonably well separated with respect to the
chosen values of the threshold.
Maximin-Distance
Algorithm
The maximum-distance algorithm is another
simple heuristic procedure based on the
Euclidean distance concept. Consider the ten
two-dimensional samples shown in the Figure
below.
Contd..

The table is initially empty. Then, in the first step,


we arbitrarily let X 1 become the first cluster
center, designated by Z 1 .
Contd..
 Next, we determine the farthest sample from X 1 ,
which in this case is X 6 , and call it cluster center Z 2
 In the third step we compute the distance from
, each remaining sample to Z 1 and Z 2 . For every pair
of these computations we save the minimum
distance. Then, we select the maximum of these
minimum distances. If this distance is an appreciable
fraction (say, at least half or more), of the distance
between cluster centers Z 1 and Z 2 we call the Z 3
corresponding sample cluster center. Otherwise the
algorithm is terminated.
K-means Algorithm
Clustering is an unsupervised technique
used in discovering inherent structure
present in the set of patterns. In fact,
clustering techniques aim to extract the
groups present in a given data set and
each such group is termed as a cluster.

Let the set of patterns be S  x1 , x 2 , ...., x n   m ,


where xi is the ith pattern vector, n is the
total number of patterns and m is the
dimensionality of the feature space.
Contd..
Let the number of clusters be K . If the
clusters are represented by C1 , C 2 , ........ , C K
then we assume
.

P1. Ci   , for i  1, 2, ........ , K


P2. Ci  C j   for i  j and
P3.  iK1 C i  S where  represents null set.
Contd..
Clustering techniques may broadly be divided
into two categories: Hierarchical and non-
hierarchical. The non-hierarchical or partitional
clustering problem deals with obtaining an
optimal partition of into subsets such that
some clustering criterion is satisfied.
Contd..
Among the partitional clustering techniques,
the -Means algorithm has been one of the
most widely used algorithms. Here the value
of K needs to be known a priori. The principle
used for clustering by K-means algorithm is to
minimize the sum of intraclass distances to
get the optimal clusters. Mathematically this
principle has been stated below.
Contd..

1.Let C1 , C 2 , ........ , C K be a set of K clusters of S


 
  x
 xC 
2.Let zj    for j .  1,2,.....K . where # C j
j

#Cj
represents the number of points in C j
Contd..
K 2

3.Let f (C1 , C2 , ........ , C K )    x  z j


j 1 xC j

f (C1 , C2 , ........ , C K ) is referred as the objective


function of the clustering C1 , C 2 , ........ , C K
4. Minimize f (C1 , C2 , ........ , C K ) over all such
C1 , C 2 , ........ , C K where C1 , C 2 , ........ , C K
satisfy P1, P2 and P3 stated earlier.
Contd..
All possible clusterings of S are to be
considered to get the optimal C1 , C 2 ,......, C k .
So obtaining the exact solution of the problem
is theoretically possible, yet not feasible in
practice due to limitations of computer storage
and time. One requires the evaluation of
S (n, k ) partitions if exhaustive enumeration is
used to solve the problem, where

1 k k j  k  n
S (n, k )   (1)   j .
k! j 1  j
This clearly indicates that exhaustive
enumeration cannot lead to the required
solution for most practical problems in
reasonable computation time. For example,
for the crude-oil data, the exact solution
requires the examination of S (56,3)  1018
partitions.
Contd..
Thus, approximate heuristic techniques
seeking a compromise or looking for an
acceptable solution have usually been
adopted. One such method is the Forgy's K-
means algorithm.
Algorithm K-means
Step 1 : Select an initial cluster configuration.
Repeat
Step 2 : Calculate cluster centers z j , j  1,2,.....K .
of the existing groups.
Step 3 : Redistribute patterns among clusters
utilizing the minimum squared Euclidean distance
classifier concept2 :
xi  C jif xi  z j  xi  zl l  {1,2,....K }, l  j
2

Until (there is no change in cluster centers)


End
Contd..
The behavior of the K-means algorithm is
influenced by
 the number of cluster centers specified,
 the choice of initial cluster centers,
 the order in which the samples are taken, and,
 the geometrical properties of the data.
Contd..
Although no general proof of convergence
exists for this algorithm, it can be expected to
yield acceptable results when the data exhibit
characteristic pockets which are relatively far
from each other.
Isodata Algorithm
The Isodata (Iterative Self-Organizing Data
Analysis) is similar in principle to the K-means
procedure in the sense that cluster centers are
iteratively determined sample means. Unlike
the latter algorithm, however, Isodata
represents a fairly comprehensive set of
additional heuristic procedures which have
been incorporated into an interactive scheme.

Before executing the algorithm it is necessary


to specify a set N C of initial cluster centers
Z 1, Z 2 ,..., Z N . For a set of N samples, { X 1, X 2,..., X N },
Isodata consists of the following steps..
C
Contd..
Step 1. Specify the following process
parameters:
K = number of cluster centers desired;
 N = a parameter against which the number of
samples in a cluster domain is compared;
 s = standard deviation parameter;
 C = lumping parameter;
L = maximum number of pairs of cluster
centers which can be lumped;
I = number of iterations allowed.
Contd..
Step 2. Distribute the N samples among the
present cluster centers, using the relation
X S j if X  Z j  X  Z i , i  1,2,..., N C ; i  j
for all x in the sample set. In this notation,
S j represents the subset of samples assigned to
cluster center Z j .
Step 3. Discard sample subsets with fewer
than  N members; that is, if for any j , N j  N ,
discard S j and reduce N C by 1.
Contd..
Step 4. Update each cluster center by
Z j , j  1,2,..., N C , setting it equal to the sample
mean of its corresponding set S j; that is,
1
Z j
Nj
 X,
X S j
j  1,2,..., N C

where N j is the number of samples in S j


Step 5. Compute the average distance D j
of samples in cluster domain S j from their
corresponding cluster center, using the
relation 1
D j
Nj

X S j
X Z j , j  1,2,..., N C
Contd..
Step 6. Compute the overall average
distance of the samples from their
respective cluster centers, using the
relation

NC
1
D
N
N
j 1
j Dj
Contd..
Step 7.
(a) If this is the last iteration, set  C 0
and go to Step 11.
(b) If N C  K / 2, go to Step 8.
(c) If this is an even-numbered iteration,
or if N C  2K , go to Step 11; otherwise,
continue
Contd..
Step 8. Find the standard deviation vector
 j ( 1 j, 2 j,..., nj) for each sample subset, using
the relation 1
 ij
N
(X
X S j
ik
2
 Z ij ) , i  1,2,..., n; j  1,2,..., N C

where n is the sample dimensionality,x ik is the


ith component of the kth sample in S j , Z ij is the
ith component of Z j , and N j is the number of
samples in S j Each component of  j represents
the standard deviation of the samples in S j
along a principal coordinate axis.
Contd..
Step 9. Find the maximum component of each
 j, j  1,2,..., N C and denote it by 
. j max
Step 10. If for any  j max, j  1,2,..., N C , we have  j max S
and
.

(a) D j  D and N j  2( N1) or


(b) N C  K / 2
Contd..

then split Z j into two new cluster centers Z j and Z j

, delete Z j , and increase N C by 1.


Cluster center Z j is formed by adding a given
quantity y j to the component of Z j which
,

corresponds to the maximum component of  j


Z j is formed by subtracting y j from the same

component of Z j. One way of specifying y j is to let


it be equal to some fraction of  j max, that is,
y j  k j max, where 0  k  1.
Contd..
The basic requirement in choosing yj is that it
be sufficient to provide a detectable difference
in the distance from an arbitrary sample to the
two new cluster centers, but not so large as to
change the overall cluster domain
arrangement appreciably.
If splitting took place in this step, go to Step
2; otherwise continue.
Contd..
Step 11. Compute the pairwise distances D ij
between all cluster centers :

D ij  Z i  Z j , i  1,2,..., N C 1; j  i  1,..., N C


Contd..
Step 12. Compare the distances D ij against
the parameter  C . Arrange the L smallest
distances which are less than  C in
ascending order:
[ D i1 j1 , D i 2 j 2 ,..., D iLjL ]
where D i j  D i j  D iL jL and L is the maximum
1 1 2 2

number of pairs of cluster centers which can


be lumped together. The lumping process is
discussed in the next step.
Contd..
Step 13. With each distance D iljl there is
associated a pair of cluster centers Z il and Z jl
Starting with the smallest of these distances,
perform a pairwise lumping operation according
to the following rule :
For l  1,2,..., L, if neither Z il nor Z jl has been used
in lumping in this iteration, merge these two
cluster centers using the following relation:
 1
Zl  [ N il ( Z il )  N jl ( Z jl )]
N il  N jl

Delete Z il and Z jl and reduce N C by 1.


Contd..
It is noted that only pairwise lumping is
allowed and that a lumped cluster center is
obtained by weighting each old cluster center
by the number of samples in its domain.
Experimental evidence indicates that more
complex lumping can produce unsatisfactory
results. The above procedure makes the
lumped cluster centers representative of the
true average point of the combined subsets. It
is also important to note that, since a cluster
center can be lumped only once, this step will
not always result in L lumped centers.
Contd..
Step 14. If this is the last iteration, the
algorithm terminates. Otherwise go to Step
1 if any of the process parameters requires
changing at the user's discretion, or go to
Step 2 if the parameters are to remain the
same for the next iteration. An iteration is
counted every time the procedure returns to
Step 1 or 2.
Example
Example: Let the patterns be {(0,0), (1,1),
(2,2), (4,3), (4,4), (5,3), (5,4), (6,5)}

In this case N  8 and n  2 . Suppose that


we initially let N C  1, Z 1 (0,0),and specify the
following parameters:

Step 1.K  2,  N 1,  S 1,  C 4, L  0, I 4
Contd..
If no a priori information on the data being
analyzed is available, these parameters are
arbitrarily chosen and then adjusted during
successive iterations through the algorithm.
Step 2. Since there is only one cluster
center,

S 1 { X 1, X 2,,..., X 8} and N 1 8

Step 3. Since N 1 N,no subsets are discarded.


Contd..
Step 4. Update the cluster centers
1  3.38 
Z 1  X   
N1 X S 1  2.75 

Step 5. ComputeD j :
1
D 1
N1

X S 1
X  Z 1  2.26
Contd..
Step 6. Compute D in this case
D  D1  2.26
,

Step 7. Since this is not the last iteration and


:

N c  K / 2 Go to step 8.

Step 8. Find the standard deviation vector for S1


1.99 
 1   
1.56 
Contd..
Step 9. The maximum component of  1 is
1.99 ; hence,  1 max 1.99.
Step 10. Since  1 max S and N C  K / 2, we
split Z 1 into two new clusters. Following the
procedure described in Step 10, suppose
that we let y j  0.5 j max 1.0. Then,

  4.38    2.38 
Z 1   , Z 1   
 2.75   2.75 
Contd..
For convenience these two cluster centers
are renamed Z 1 and Z 2 respectively. Also, N C
is increased by 1. Go to Step 2.
Step 2. The sample sets are now
S 1 { X 4, X 5, X 6, X 7, X 8}, S 2  { X 1, X 2, X 3}
and N 1 5, N 2 3.
Step 3. Since both N 1 and N 2 are greater
than  N, no subsets are discarded.
Contd..
Step 4. Update the cluster centers :

1  4.80  1 1.00 
Z 1  X   , Z 2   
N1 X S 1  3.80  N2 X S 2 1.00 

Step 5. Compute D j , j  1,2 :

1 1
D 1
N1

X S 1
X  Z 1  0.80, D 2
N2

X S 2
X  Z 2  0.94
Contd..
Step 6. Compute D :
NC
1 1 2
D
N

j 1
N j D j   N j D j  0.85
8 j 1

Step 7. Since this is an even-numbered


iteration, condition (c) of Step 7 is satisfied.
Therefore, go to Step 11.
Contd..
Step 11. Compute the pairwise distances:
D 12  Z 1 Z 2  4.72
Step 12. Compare D12 to  C. In this case, D12  C.
Step 13. From the results of Step 12, we see
that no lumping of cluster centers can take
place.
Step 14. Since this is not the last iteration,
we are faced with the decision of whether or
not to make an alteration in the parameters.
Since, in this simple example,
Contd..
1. we have obtained the desired number of
clusters,
2. their separation is greater than the average
spread indicated by the standard deviations,
and
3. each cluster subset contains a significant
percentage of the total number of samples,
we arrive at the conclusion that the cluster
centers are representative of the data.
Therefore, we return to Step 2.
Contd..
Steps 2-6 : yield the same results as in the
previous iteration
Step 7. None of the conditions in this step is
satisfied. Therefore, we proceed to Step 8.
Step 8. Compute the standard deviation of sets
S 1 { X 4, X 5, X 6, X 7, X 8} and S 2  { X 1, X 2, X 3} :

 0.75   0.82 
 1    ,  2   
 0.75   0.82 
Contd..
Step 9. In this case  1 max  0.75 and  2 max  0.82 .
Step 10. The conditions for splitting are not
satisfied. Therefore, we proceed to Step 11.
Step 11. We obtain the same result as in the
previous iteration:
D12  Z 1 Z 2  4.72

Step 12. We obtain the same result as in the


Previous iteration.
Contd..
Step 13. We obtain the same result as in
the previous iteration.
Step 14. Nothing new has been added in
this iteration, except the computation of the
standard deviation vectors. Therefore, we
return to Step 2.
Steps 2—6. yield the same results as in
the previous iteration.
Step 7. Since this is the last iteration, we
set  C 0 and go to Step ll.
Contd..
Step 11. D12  Z 1 Z 2  4.72 as before.
Step 12. We obtain the same result as in
the previous iteration.
Step 13. From the results of Step 12, we
see that no lumping can take place.
Step 14. Since this is the last iteration, the
algorithm is terminated
Contd..
It should be clear, even from the above
simple example, that the application of
Isodata to a set of moderately complex
data requires, in general, extensive
experimentation before one can arrive at
meaningful conclusions. However, by
properly organizing the information
obtained in each iteration, it is possible to
gain considerable insight into the structure
of the data.
Evaluation of Clustering Results

The principal difficulty in evaluating the


results of clustering algorithms is inability to
visualize the geometrical properties of a
high-dimensional space.

Several interpretation techniques exist


which allow at least partial insight into the
geometrical properties of the resulting
cluster domains.
Contd..
A very useful interpretation tool is the distance
between cluster centers This information is
best presented in the form of a table.
Contd..

Cluster
Center
Z1 Z2 Z3 Z4 Z5

Z1 0.0 4.8 14.7 2.1 50.6


Z2 0.0 21.1 6.1 48.3
Z3 0.0 15.0 36.7

Z4 0.0 49.3
Z5 0.0

Table 1. Distance Table for Interpreting


Clustering Results
Contd..
Cluster center Z 5 is far removed from the

other cluster centers. If it is known that this


cluster center is associated with numerous
samples, we would accept it as a valid
description of the data, otherwise we might
dismiss this cluster center as representative
of noise samples.
Contd..

If two cluster centers are relatively close,


and one of the centers is associated with a
much larger number of samples, it is often
possible to merge the two cluster domains.
The variances of a cluster domain about its
mean can be used to infer the relative
distribution of the samples in the domain.
Contd..
4

Cluster Variances
Domains
1 2 3 4

S1 1.2 0.9 4
0.7 1.0

2.0 1.3 1.5 0.9


S2
S3 3.7 4.8 7.3 10.4

S4 0.3 0.8 0.7 1.1

S5 4.2 5.4 18.3 3.3

Table 2. Variance Table for Interpreting Clustering Results


Contd..
It is assumed that each variance
component is along the direction of one of
the coordinate axes.

Note :
 Since domain S1 has very similar variances,
it can be expected to be roughly spherical in
nature.
 Cluster domain S5 on the other hand, is
significantly elongated about the third
coordinate axis. A similar analysis can be
carried out for the other domains.
Contd..

This information, coupled with the distance


table and sample numbers, can be of
significant value in interpreting clustering
results.
Graph-Theoretic Approach

Till now, the clusters are determined in such a way


that the intraset distance among each cluster is kept
to a minimum, and the interset distances between
two clusters are made as large as possible.
Contd..
An alternative approach to cluster seeking is to make
use of some basic notion in graph theory. In this
approach, a pattern graph is first constructed from
the given sample patterns, which form the nodes of
the graph. A node j is connected to a node k by an
edge, if the patterns corresponding to these two
nodes are similar or are related.
Pattern X j and pattern X k are said to be similar if the
similarity measure S ( X j, X k ) is greater than a pre-
specified threshold T. The similarity measure may be
used to generate a similarity matrix S, whose
elements are 0 or 1. The similarity matrix provides a
systematic way to construct the pattern graph. Since
cliques of a pattern graph form the clusters of the
patterns, cluster seeking can be accomplished by
detecting the cliques of the pattern graph.
Contd..
Several clique detection algorithms and
programs have been introduced in the
literature. Several such methods exist that
utilize the MST of the data points to get
the clustering.
Thank You

You might also like