
Spectral Clustering

Machine Learning. Block-2: Kernel Methods

Emilio Parrado-Hernández, emilio.parrado@uc3m.es


March 1, 2022

Index

1 Introduction to Spectral Clustering
    Review of clustering
2 KMeans limitations
3 Spectral Clustering
    Kernels as similarities
    Graph partitioning
    K-Prototypes spectral clustering
4 Tricks and rules of thumb




Recommended Bibliography

Bach and Jordan, Learning Spectral Clustering, With Application To Speech Separation, Journal of Machine Learning Research, 7, 1963-2001, 2006
Ng, Jordan and Weiss, On Spectral Clustering: Analysis and an algorithm, NeurIPS 2001
Azran and Ghahramani, A new approach to data driven clustering, International Conference on Machine Learning (ICML) 2006
Shi and Malik, Normalized cuts and image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 2000





Review of clustering

Clustering is about finding groups in data.
Usually framed as unsupervised learning, since there is no need for labelled data.
Widely used in exploratory data analysis, or "machine learning for humans".
Useful in cases where collecting data is easy but labelling data is difficult.
Applications:
  - Image segmentation
  - Speaker/audio segmentation
  - Automatic organisation of a collection of documents
  - Data compression
  - Biomedical applications like clustering sequences of proteins
  - Marketing and sales (groups of customers)


Defining groups can be controversial


Notation

Data: n observations with d features


X = \begin{bmatrix} x_1^\top \\ x_2^\top \\ x_3^\top \\ \vdots \\ x_{n-1}^\top \\ x_n^\top \end{bmatrix}
  = \begin{bmatrix}
      x_{1,1} & x_{1,2} & \cdots & x_{1,d-1} & x_{1,d} \\
      x_{2,1} & x_{2,2} & \cdots & x_{2,d-1} & x_{2,d} \\
      x_{3,1} & x_{3,2} & \cdots & x_{3,d-1} & x_{3,d} \\
      \vdots  & \vdots  & \ddots & \vdots    & \vdots  \\
      x_{n-1,1} & x_{n-1,2} & \cdots & x_{n-1,d-1} & x_{n-1,d} \\
      x_{n,1} & x_{n,2} & \cdots & x_{n,d-1} & x_{n,d}
    \end{bmatrix}

Groups: K. In many clustering algorithms K is an input. Spectral clustering will provide methods to select K.



Similarity functions and distances
Data samples that are similar should end up in the same group, while data samples that are different should end up in different groups.
The starting point in clustering analysis is the definition of a similarity function or a distance that measures how close two patterns are:
  - Similarity: s(x_i, x_j) → 1 means x_i, x_j should be in the same cluster; s(x_i, x_j) → 0 means x_i, x_j should be in different clusters
  - Distance: d(x_i, x_j) → 0 means x_i, x_j should be in the same cluster; d(x_i, x_j) → ∞ means x_i, x_j should be in different clusters
Adjacency matrix: an n × n matrix that stores all the pairwise similarities or distances between training data, usually with zeroes in the diagonal. All we need to cluster the data is here.
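As an illustration, a minimal NumPy sketch of building such an adjacency matrix with a Gaussian (RBF) similarity, one common choice of similarity function; the function name and the bandwidth gamma are illustrative, not from the slides:

```python
import numpy as np

def rbf_affinity(X, gamma=1.0):
    """Build the n x n adjacency (affinity) matrix from the data matrix X."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    A = np.exp(-gamma * sq_dists)   # s(x_i, x_j) -> 1 for close points, -> 0 for distant ones
    np.fill_diagonal(A, 0.0)        # zeroes in the diagonal, as described above
    return A
```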



Links between Kernel Methods and clustering

Kernel methods propose to solve machine learning problems in an unknown feature space induced by a kernel.
Kernel methods are an easy framework to obtain non-linear versions of common linear algorithms.
Application of the kernel trick to clustering algorithms whose similarity or distance can be expressed in terms of scalar products: kernel K-means.
The kernel function itself can be regarded as a similarity measure (it is a scalar product, symmetric and positive semidefinite); therefore the kernel matrix can be used to construct an adjacency matrix that can be decomposed into clusters by means of spectral clustering techniques.





Review of KMeans

KMeans Algorithm
Input:
  - K: number of clusters
  - X: data matrix, each observation in a row
Output: K centroids, m_1, m_2, ..., m_K
1 Initialise the centroids at random
2 Assign observations to clusters: x_j is assigned to cluster C_k if k = \arg\min_i \|x_j - m_i\|^2
3 Recompute centroids:
    m_k = \frac{1}{N_k} \sum_{x_j \in C_k} x_j, \quad k = 1, \ldots, K
4 Iterate 2 and 3 until convergence
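A minimal NumPy sketch of the algorithm above; the helper name, the convergence check and the handling of empty clusters are illustrative choices, not part of the slides:

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Minimal KMeans sketch: returns centroids and cluster assignments."""
    rng = np.random.default_rng(seed)
    # 1. Initialise the centroids at random (here: K random training samples)
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iter):
        # 2. Assign each observation to the closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Recompute each centroid as the mean of its assigned samples
        new_centroids = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                                  else centroids[k] for k in range(K)])
        if np.allclose(new_centroids, centroids):   # 4. stop at convergence
            break
        centroids = new_centroids
    return centroids, labels
```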



Example of KMeans


Less typical geometry




Intuitive view of spectral clustering
The adjacency matrix contains all the information that we need to split a dataset into groups.
In kernel methods, the kernel matrix replaces the matrix with the input data observations (typically the X matrix) as input to the training algorithm.
The kernel matrix can be considered an adjacency matrix, as the kernel function can be interpreted as a scalar product in feature space.
Think about an ideal kernel or similarity function for this dataset:



Ideal kernel matrix

K =
[ 1 1 1 1 1 0 0 0 0 0 0 0 ]
[ 1 1 1 1 1 0 0 0 0 0 0 0 ]
[ 1 1 1 1 1 0 0 0 0 0 0 0 ]
[ 1 1 1 1 1 0 0 0 0 0 0 0 ]
[ 1 1 1 1 1 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 1 1 1 1 0 0 0 ]
[ 0 0 0 0 0 1 1 1 1 0 0 0 ]
[ 0 0 0 0 0 1 1 1 1 0 0 0 ]
[ 0 0 0 0 0 1 1 1 1 0 0 0 ]
[ 0 0 0 0 0 0 0 0 0 1 1 1 ]
[ 0 0 0 0 0 0 0 0 0 1 1 1 ]
[ 0 0 0 0 0 0 0 0 0 1 1 1 ]


Spectral Analysis of the ideal kernel matrix

K = V \Lambda V^\top, with only three non-zero eigenvalues, \lambda_1 = 5, \lambda_2 = 4, \lambda_3 = 3, and eigenvectors that are constant within each cluster:

v_1 = -\tfrac{1}{\sqrt{5}}\,[1\ 1\ 1\ 1\ 1\ 0\ 0\ 0\ 0\ 0\ 0\ 0]^\top   (entries ≈ −0.45 on the 5 samples of the first cluster)
v_2 = -\tfrac{1}{2}\,[0\ 0\ 0\ 0\ 0\ 1\ 1\ 1\ 1\ 0\ 0\ 0]^\top           (entries = −0.5 on the 4 samples of the second cluster)
v_3 = -\tfrac{1}{\sqrt{3}}\,[0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 1\ 1\ 1]^\top   (entries ≈ −0.58 on the 3 samples of the third cluster)

so that K = 5\, v_1 v_1^\top + 4\, v_2 v_2^\top + 3\, v_3 v_3^\top.


Clusters in the spectrum

The structure of the spectrum of the kernel matrix with this ideal kernel reflects the cluster structure within the data:
  - The number of non-zero eigenvalues equals the number of clusters
  - Rows of the eigenvector matrix are identical for points in the same cluster
Real kernel:
  - Kernel of instances in the same cluster will not always reach a maximum value of one
  - Kernel of instances in different clusters will not be zero
Matrix perturbation analysis of the ideal situation leads to spectral clustering: a means of retrieving the underlying clusters from the spectrum of the kernel matrix of a non-ideal kernel.
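As a quick check of these properties, a small NumPy sketch that builds the ideal block-diagonal kernel matrix from the earlier slide and inspects its spectrum:

```python
import numpy as np

# Three clusters of sizes 5, 4 and 3; K is block-diagonal with blocks of ones.
sizes = [5, 4, 3]
K = np.block([[np.ones((si, sj)) if i == j else np.zeros((si, sj))
               for j, sj in enumerate(sizes)] for i, si in enumerate(sizes)])

eigvals, eigvecs = np.linalg.eigh(K)      # eigenvalues in ascending order
print(np.round(eigvals[::-1], 2))         # -> [5. 4. 3. 0. ... 0.]: as many non-zeros as clusters
# Rows of the leading eigenvectors are identical (up to sign) for samples of the same cluster
print(np.round(eigvecs[:, ::-1][:, :3], 2))
```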


Real kernel matrices





Links with graph partitioning

The affinity matrix and the training data samples define a graph in which:
  - Each training sample becomes a node in the graph
  - A positive value in position A(i, j) of the affinity matrix is the weight of the edge connecting the nodes corresponding to x_i and x_j
Therefore, finding clusters in the data can be addressed as cutting the graph into subgraphs by removing the weak edges that connect disjoint subsets of heavily connected nodes.
Normalised-Cut graph partitioning



Partition graph by cutting the blue edge



Normalized cut algorithm

Given two sets of nodes S_1 and S_2, define W(S_1, S_2) as the weight of the edges connecting S_1 and S_2:

W(S_1, S_2) = \sum_{x_i \in S_1} \sum_{x_j \in S_2} A_{i,j}

For a graph with node set V we define the Normalised Cut of a partition of V into K disjoint clusters V = \cup_{k=1}^{K} V_k as

\sum_{k=1}^{K} \frac{W(V_k, V \setminus V_k)}{W(V_k, V)}

Finding the optimal normalised cut is equivalent to an eigenvalue problem with the Laplacian matrix of the graph


Example with python notebook
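The notebook itself is not reproduced in these slides; as a hedged stand-in, a short scikit-learn sketch of the kind of example it might contain, comparing KMeans and spectral clustering on a non-convex dataset (dataset, parameters and gamma value are illustrative):

```python
from sklearn.datasets import make_moons
from sklearn.cluster import SpectralClustering, KMeans

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
sc_labels = SpectralClustering(n_clusters=2, affinity="rbf", gamma=20.0,
                               random_state=0).fit_predict(X)
# sc_labels follows the two moons; km_labels splits them with a straight boundary.
```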



Alternative normalized cut formulation

\sum_{k=1}^{K} \frac{W(V_k, V \setminus V_k)}{W(V_k, V)} = \sum_{k=1}^{K} \frac{e_k^\top (D - A) e_k}{e_k^\top D e_k}

where each e_k is a vector with n components that encodes cluster C_k as

(e_k)_i = \begin{cases} 1 & \text{if } x_i \in C_k \\ 0 & \text{otherwise} \end{cases}

A is the n \times n adjacency matrix and D is a diagonal matrix with the sum of all the elements in the rows of A:

D_{ii} = \sum_{j=1}^{n} A_{ij}


Alternative normalized cut formulation

The denominator:

W(V_k, V) = e_k^\top D e_k

Each element D_{ii} in the diagonal of D is the weight of all the edges connecting x_i with the other observations, so e_k^\top D e_k sums all the D_{ii} that correspond to elements in the same cluster.

The numerator:

W(V_k, V \setminus V_k) = e_k^\top (D - A) e_k

accounts for the weight of the connections of the elements in each cluster with the elements not in the cluster (the edges we would cut if we decided to apply that partition). e_k^\top D e_k contains the weight of all the edges connecting the elements of the cluster with the whole dataset; e_k^\top A e_k contains the weight of the edges among members of the cluster, that is, A_{i,j} enters the sum as long as (e_k)_i = 1 and (e_k)_j = 1, i.e. x_i and x_j are members of C_k.
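A small numerical sketch of these identities on a toy affinity matrix (the matrix values are illustrative):

```python
import numpy as np

# Check e_k^T D e_k = W(V_k, V) and e_k^T (D - A) e_k = W(V_k, V \ V_k) on a toy graph.
A = np.array([
    [0.0, 0.9, 0.8, 0.1],
    [0.9, 0.0, 0.7, 0.0],
    [0.8, 0.7, 0.0, 0.2],
    [0.1, 0.0, 0.2, 0.0],
])
D = np.diag(A.sum(axis=1))
e = np.array([1.0, 1.0, 1.0, 0.0])   # indicator of the cluster V_k = {x_1, x_2, x_3}

volume = e @ D @ e                   # W(V_k, V): total edge weight incident to the cluster
cut = e @ (D - A) @ e                # W(V_k, V \ V_k): weight of the edges we would cut
print(volume, cut)                   # cut = 0.1 + 0.0 + 0.2 = 0.3
```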



Normalized cut minimization
The minimisation of

\sum_{k=1}^{K} \frac{e_k^\top (D - A) e_k}{e_k^\top D e_k}

can be relaxed into an eigenspectrum problem:
  - Minimising e_k^\top (D - A) e_k implies looking for eigenvectors of D − A with a small eigenvalue
  - Maximising e_k^\top D e_k is equivalent to minimising e_k^\top D^{-1} e_k, that is, looking for eigenvectors of D with a small eigenvalue
Combining both goals:

\min\; e_k^\top D^{-1/2} (D - A) D^{-1/2} e_k = \min\; e_k^\top \left( I - D^{-1/2} A D^{-1/2} \right) e_k

which is equivalent to finding the eigenvectors of D^{-1/2} A D^{-1/2} with the largest eigenvalues



Spectral Clustering [Relaxed] Algorithm

Input:
  - K: number of clusters
  - A: affinity matrix
Output: K clusters, each cluster being a list of input observations C_1, C_2, ..., C_K
1 Zero the elements in the diagonal of A
2 Construct a diagonal matrix D where D_{i,i} = \sum_{j=1}^{n} A_{i,j}
3 Construct the Laplacian of the graph L = D^{-1/2} A D^{-1/2}
4 Find the K eigenvectors of L with largest eigenvalues, z_1, z_2, ..., z_K
5 Form matrix Z by stacking the eigenvectors as columns: Z_{n \times K} = [z_1, ..., z_K]
6 Form matrix Y by normalising Z row-wise: Y_{i,j} = Z_{i,j} / \sqrt{\sum_j Z_{i,j}^2}
7 Perform a KMeans clustering on the rows of Y and assign instance x_j to cluster C_k if the j-th row of Y falls in cluster C_k
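A compact NumPy/scikit-learn sketch of the relaxed algorithm above; the function name and the small numerical safeguards are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(A, K):
    """Relaxed spectral clustering sketch following steps 1-7 above."""
    A = np.asarray(A, dtype=float).copy()
    np.fill_diagonal(A, 0.0)                                  # 1. zero the diagonal
    d = A.sum(axis=1)                                         # 2. degrees D_ii
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]         # 3. L = D^{-1/2} A D^{-1/2}
    eigvals, eigvecs = np.linalg.eigh(L)                      # ascending eigenvalues
    Z = eigvecs[:, -K:]                                       # 4-5. top-K eigenvectors as columns
    Y = Z / np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), 1e-12)  # 6. row normalisation
    return KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(Y)  # 7. KMeans on rows of Y
```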





A random walk view of Spectral Clustering

Consider the data samples (nodes in the graph) as locations.
Edges in the graph (kernel evaluations of pairs of samples) can be regarded as the ease of travelling between the nodes they connect. Given a node x_i and two neighbours x_j and x_k, κ(x_i, x_j) > κ(x_i, x_k) means that it is easier to travel from x_i to x_j than from x_i to x_k.
An affinity matrix normalised so that all the rows sum to one gives the probabilities of travelling from each node to its neighbours in one jump:
  - Define A as a kernel matrix with zeros in the diagonal
  - Define D as the diagonal matrix with the sum of the elements in each row: D_{ii} = \sum_j A_{ij}
  - Then P = D^{-1} A gives the probabilities of transitioning from each node to its neighbours in one jump
  - P_{ij} is the probability of being in node x_i and jumping to x_j


A random walk view of Spectral Clustering

If a traveller starts in each node, after M iterations of the random walk (jumps from node to node following the probabilities in P), the distribution of travellers over the nodes is given by

P_{i,j}(M) = probability of the i-th traveller being in node x_j after M jumps

P_{i,j}(M) = (P^M)_{i,j}

If the graph is clearly divided into clusters, random walk exploration would show that travellers that start in one cluster seldom change cluster; they rather wander around within the nodes of the same cluster.
After a large number of iterations, the probability distributions of travellers that started in the same cluster will be similar.



Affinity matrix and transition matrix

A =
[ 0    0.8  0    0.7   0     0    0   ]
[ 0.8  0    0    0.8   0     0    0   ]
[ 0    0    0    0.9   0     0    0   ]
[ 0.7  0.8  0.9  0     0.05  0    0   ]
[ 0    0    0    0.05  0     0.7  0.9 ]
[ 0    0    0    0     0.7   0    0.8 ]
[ 0    0    0    0     0.9   0.8  0   ]

P = D^{-1} A =
[ 0     0.53  0     0.47  0     0     0    ]
[ 0.5   0     0     0.5   0     0     0    ]
[ 0     0     0     1.    0     0     0    ]
[ 0.29  0.33  0.37  0     0.02  0     0    ]
[ 0     0     0     0.03  0     0.42  0.55 ]
[ 0     0     0     0     0.47  0     0.53 ]
[ 0     0     0     0     0.53  0.47  0    ]
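A short NumPy sketch that reproduces P from the affinity matrix above and shows the effect of taking M jumps (the choice M = 8 is illustrative):

```python
import numpy as np

A = np.array([
    [0,   0.8, 0,   0.7,  0,    0,   0  ],
    [0.8, 0,   0,   0.8,  0,    0,   0  ],
    [0,   0,   0,   0.9,  0,    0,   0  ],
    [0.7, 0.8, 0.9, 0,    0.05, 0,   0  ],
    [0,   0,   0,   0.05, 0,    0.7, 0.9],
    [0,   0,   0,   0,    0.7,  0,   0.8],
    [0,   0,   0,   0,    0.9,  0.8, 0  ],
])
P = np.diag(1.0 / A.sum(axis=1)) @ A          # one-jump transition probabilities
print(np.round(P, 2))

# After M jumps, within-cluster entries stay large and between-cluster entries stay
# small (the two weakly connected groups are nodes {1,2,3,4} and {5,6,7}).
M = 8
print(np.round(np.linalg.matrix_power(P, M), 2))
```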



Transition matrices after M jumps



Retrieve clusters from the transition matrix

Identify clusters of rows of the matrix with the probabilities of travellers ending in each node.
K-means with the rows of P^M as data.
Rows are discrete probabilities, therefore a natural distance can be the Kullback-Leibler divergence:

KL divergence
If Q_k is a prototype (centroid) for the K-prototypes algorithm (K-means with the rows of P^M), then the distance with a row of P^M can be computed as:

KL(P_i^M \| Q_k) = \sum_{j=1}^{n} P_{ij}^M \ln \frac{P_{ij}^M}{Q_{kj}}

where P_i^M is the i-th row of P^M
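A minimal sketch of this distance; the small eps constant is an illustrative safeguard against log(0) and is not part of the slides:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as 1-D arrays."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()          # renormalise after smoothing
    return float(np.sum(p * np.log(p / q)))
```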



K-Prototypes Algorithm

Input:
  - K: number of clusters
  - P^M: transition matrix after M iterations in the graph
Output:
  - K clusters, each cluster being a list of input observations C_1, C_2, ..., C_K
  - K prototypes Q_1, Q_2, ..., Q_K (centroids)
1 Initialise the prototypes (for instance, choose K rows of P^M at random)
2 Assign observations to clusters: observation x_i goes to cluster C_k if k = \arg\min_k KL(P_i^M \| Q_k)
3 Recompute prototypes:
    Q_k^{new} = \text{average}\{P_i^M : P_i^M \in V_k\}, \quad k = 1, \ldots, K
4 Iterate 2 and 3 until convergence
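A NumPy sketch of the algorithm above, with the KL assignment step written with array broadcasting; the function name, smoothing constant and empty-cluster handling are illustrative:

```python
import numpy as np

def k_prototypes(PM, K, n_iter=100, seed=0, eps=1e-12):
    """K-prototypes on the rows of P^M, using the KL divergence as distance."""
    rng = np.random.default_rng(seed)
    P = np.asarray(PM, dtype=float) + eps
    P = P / P.sum(axis=1, keepdims=True)                  # rows as proper distributions
    # 1. Initialise the prototypes with K rows of P^M chosen at random
    Q = P[rng.choice(len(P), size=K, replace=False)].copy()
    for _ in range(n_iter):
        # 2. Assign each row i to the prototype k with smallest KL(P_i^M || Q_k)
        kl = np.sum(P[:, None, :] * np.log(P[:, None, :] / Q[None, :, :]), axis=2)
        labels = kl.argmin(axis=1)
        # 3. Recompute each prototype as the average of its assigned rows
        new_Q = np.array([P[labels == k].mean(axis=0) if np.any(labels == k) else Q[k]
                          for k in range(K)])
        if np.allclose(new_Q, Q):                         # 4. stop at convergence
            break
        Q = new_Q
    return labels, Q
```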



K Prototypes Algorithm and hierarchical clustering
M is critical to retrieve the proper cluster structure.
As M increases, the final probability distribution of each node converges to a steady state that does not change even if M keeps increasing.



Determine M and K from the training data

Eigenvalues of P are smaller than 1 in absolute value: if (λ_i, v_i) are the (eigenvalue, eigenvector) pairs of P, then one can prove that |λ_i| ≤ 1, i = 1, ..., n.

P = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}
    \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}
    \begin{bmatrix} v_1^\top \\ v_2^\top \\ \vdots \\ v_n^\top \end{bmatrix}

Since

\begin{bmatrix} v_1^\top \\ v_2^\top \\ \vdots \\ v_n^\top \end{bmatrix}
\begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix} = I


Determine M and K from the training data

then

P^M = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}
      \begin{bmatrix} \lambda_1^M & 0 & \cdots & 0 \\ 0 & \lambda_2^M & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n^M \end{bmatrix}
      \begin{bmatrix} v_1^\top \\ v_2^\top \\ \vdots \\ v_n^\top \end{bmatrix}

The spectrum of P^M is formed by (λ_i^M, v_i), i = 1, ..., n



Interpretation of the spectrum of P M
Each pair (λ_i, v_i) of P represents a data structure that can be unveiled by running random walks on the graph.
If |λ_i| is close to 1, that structure will survive the random walks and unveil stable clusters of nodes of the graph. In that sense, the smaller the value of |λ_i|, the sooner it decays as M increases, and therefore that data structure dissolves into more stable clusters.
Learn M given K:
Find the value of M that yields the most robust structure of K clusters. In other words, it has to maximise the gap between the eigenvalues λ_K^M and λ_{K+1}^M:

M_K = \arg\max_{M\ \mathrm{even}} \left( |\lambda_K|^M - |\lambda_{K+1}|^M \right)

M_K = \left[ \frac{\log\!\left( \log|\lambda_{K+1}| \,/\, \log|\lambda_K| \right)}{\log\!\left( |\lambda_K| \,/\, |\lambda_{K+1}| \right)} \right]_{M\ \mathrm{even}}
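A sketch of this closed form, assuming lambda_ holds the eigenvalues of P sorted by decreasing magnitude and that 1 > |λ_K| > |λ_{K+1}| > 0; the names and the rounding rule to the nearest even integer are illustrative:

```python
import numpy as np

def choose_M(lambda_, K):
    """Pick the number of jumps M that best separates the K-th and (K+1)-th eigenvalues."""
    lk, lk1 = abs(lambda_[K - 1]), abs(lambda_[K])        # |lambda_K| and |lambda_{K+1}|
    M = np.log(np.log(lk1) / np.log(lk)) / np.log(lk / lk1)
    return max(2, int(2 * round(M / 2)))                  # nearest even integer, at least 2
```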





Selection of the number of clusters

Clustering is something of an art within machine learning, in the sense that it involves a lot of human intervention.
In fact, exploratory data analysis is more human oriented than machine oriented.
In K-means the number of clusters has to be input by the user with prior domain knowledge.
Spectral clustering provides more hints to select an appropriate number of clusters, related to the number of significant eigenvalues in the spectrum of the Laplacian or affinity matrices that are used to partition the data.
Many rules of thumb are proposed along these ideas.
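One such rule of thumb is the eigengap heuristic; a hedged sketch using the normalised matrix D^{-1/2} A D^{-1/2} from the earlier slides (function name and max_K bound are illustrative):

```python
import numpy as np

def eigengap_K(A, max_K=10):
    """Suggest K as the position of the largest drop in the leading eigenvalues."""
    A = np.asarray(A, dtype=float).copy()
    np.fill_diagonal(A, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(A.sum(axis=1), 1e-12))
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    eigvals = np.sort(np.linalg.eigvalsh(L))[::-1]        # decreasing order
    gaps = eigvals[:max_K] - eigvals[1:max_K + 1]         # gap after the K-th eigenvalue
    return int(np.argmax(gaps)) + 1
```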



Initialisation of the centroids/prototypes, etc

Spectral clustering methods resort to a core implementation of KMeans in the space of the rows of the matrix formed by stacking as columns the eigenvectors of the Laplacian matrix (or of the P^M matrix in the random walks case).
KMeans converges: it reaches a number of iterations after which the configuration of the clusters does not change. But its convergence is to a local minimum.
Therefore the initialisation of the centroids is critical to ensure this local minimum is good enough for our purposes.
There are some heuristic strategies that aim at covering most of the space occupied by the training samples when selecting the centroids. Two of these accepted heuristics are:
1 Choose at random K training samples as initial centroids
2 Choose as first centroid the sample average of the training samples. Then centroids 2, ..., K are selected as the training samples that are furthest from the previously selected centroids (see the sketch below).
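A sketch of the second heuristic (the function name is illustrative):

```python
import numpy as np

def farthest_first_init(X, K):
    """Start from the sample mean, then repeatedly pick the sample furthest from the chosen centroids."""
    centroids = [X.mean(axis=0)]                           # first centroid: sample average
    for _ in range(K - 1):
        chosen = np.array(centroids)
        dists = np.min(np.linalg.norm(X[:, None, :] - chosen[None, :, :], axis=2), axis=1)
        centroids.append(X[np.argmax(dists)])              # furthest sample becomes the next centroid
    return np.array(centroids)
```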



Evaluation of the clustering quality
The evaluation of the quality of a clustering is usually very subjective. In exploratory data contexts one asks the clusters to be meaningful and aligned with the story that the data are telling, but there is a lack of supervised elements that would enable objective performance measures.
Most clustering applications end up discussing dispersion, a measure of cluster compactness that gives an idea of how similar the members of the same cluster are, and how different they are from members of other clusters.
A commonly used method is to run several clustering algorithms in parallel and compare their outcomes. The underlying principle resembles that of ensemble methods: if two samples are similar, they are likely to end up as members of the same cluster independently of the algorithm. Mutual information can be used to compare two partitions:

I(C; G) = \sum_{k=1}^{N_C} \sum_{j=1}^{N_G} p(x \in C_k, x \in G_j) \log \frac{p(x \in C_k, x \in G_j)}{p(x \in C_k)\, p(x \in G_j)}
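A short sketch comparing two partitions with scikit-learn's mutual information utilities (the label arrays are illustrative placeholders for two clusterings of the same samples):

```python
import numpy as np
from sklearn.metrics import mutual_info_score, adjusted_mutual_info_score

labels_a = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])   # e.g. spectral clustering output
labels_b = np.array([1, 1, 1, 0, 0, 2, 2, 2, 2])   # e.g. KMeans output on the same data

print(mutual_info_score(labels_a, labels_b))            # I(C; G) in nats
print(adjusted_mutual_info_score(labels_a, labels_b))   # chance-corrected variant
```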

