Spectral Clustering
emilio.parrado@uc3m.es
March 1, 2022
Index
2 KMeans limitations
3 Spectral Clustering
   Kernels as similarities
   Graph partitioning
   K-Prototypes spectral clustering
KMeans Algorithm
Input:
- K: number of clusters
- X: data matrix, each observation in a row
Output: K centroids, m1, m2, ..., mK
1 Initialise the centroids at random
2 Assign observations to clusters
3 Recompute centroids
    mk = (1/Nk) Σ_{xj ∈ Ck} xj ,   k = 1, ..., K
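The three steps above can be sketched in NumPy; this is a minimal, illustrative implementation (it uses a deterministic farthest-point initialisation for reproducibility instead of the purely random one on the slide):

```python
import numpy as np

def kmeans(X, K, n_iter=100):
    """Lloyd's algorithm: alternate cluster assignment and centroid update."""
    # 1) Initialise centroids: first is x_0, the rest by farthest-point
    idx = [0]
    for _ in range(1, K):
        d2 = ((X[:, None] - X[idx][None]) ** 2).sum(-1).min(1)
        idx.append(int(d2.argmax()))
    m = X[idx].astype(float)
    for _ in range(n_iter):
        # 2) Assign each observation to its nearest centroid
        labels = ((X[:, None] - m[None]) ** 2).sum(-1).argmin(1)
        # 3) Recompute each centroid as the mean of its cluster
        for k in range(K):
            if (labels == k).any():
                m[k] = X[labels == k].mean(0)
    return m, labels
```

For two well-separated blobs this recovers one centroid per blob in a handful of iterations.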
K =
    1 1 1 1 1 0 0 0 0 0 0 0
    1 1 1 1 1 0 0 0 0 0 0 0
    1 1 1 1 1 0 0 0 0 0 0 0
    1 1 1 1 1 0 0 0 0 0 0 0
    1 1 1 1 1 0 0 0 0 0 0 0
    0 0 0 0 0 1 1 1 1 0 0 0
    0 0 0 0 0 1 1 1 1 0 0 0
    0 0 0 0 0 1 1 1 1 0 0 0
    0 0 0 0 0 1 1 1 1 0 0 0
    0 0 0 0 0 0 0 0 0 1 1 1
    0 0 0 0 0 0 0 0 0 1 1 1
    0 0 0 0 0 0 0 0 0 1 1 1

(a block-diagonal affinity matrix: three fully connected groups of sizes 5, 4 and 3)
The three largest eigenvalues of K are 5, 4 and 3 (the group sizes), and the corresponding eigenvectors are scaled group indicators:

V =
    −0.45    0       0
    −0.45    0       0
    −0.45    0       0
    −0.45    0       0
    −0.45    0       0
      0    −0.5      0
      0    −0.5      0
      0    −0.5      0
      0    −0.5      0
      0      0    −0.58
      0      0    −0.58
      0      0    −0.58

so that K = V diag(5, 4, 3) V^T: each column is a (sign-flipped) normalised indicator of one group, with entries −1/√5, −1/√4 and −1/√3 respectively.
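This structure can be checked numerically: the leading eigenvectors of a block-diagonal matrix of ones are (up to sign) scaled cluster indicators, with eigenvalues equal to the block sizes. A NumPy sketch:

```python
import numpy as np

# Block-diagonal affinity: three fully connected groups of sizes 5, 4 and 3
sizes = [5, 4, 3]
K = np.zeros((12, 12))
start = 0
for s in sizes:
    K[start:start + s, start:start + s] = 1.0
    start += s

vals, vecs = np.linalg.eigh(K)        # eigh returns ascending eigenvalues
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]
# Top eigenvalues are the block sizes; each top eigenvector is
# +-1/sqrt(size) on its block and 0 elsewhere.
print(np.round(vals[:3], 2))          # prints [5. 4. 3.]
```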
The affinity matrix and the training data samples define a graph in which:
- Each training sample becomes a node in the graph
- A positive value in position A(i, j) of the affinity matrix is the weight of the edge connecting the nodes corresponding to xi and xj
Therefore, finding clusters in the data can be addressed as cutting the graph into subgraphs by removing the weak edges that connect disjoint subsets of heavily connected nodes.
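A common way to build such an affinity matrix from raw data is a Gaussian (RBF) kernel: nearby points get edge weights close to 1, distant points close to 0. A minimal sketch (the bandwidth `gamma` is an illustrative choice, not prescribed by the slides):

```python
import numpy as np

def rbf_affinity(X, gamma=1.0):
    """A[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq)
```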
Normalised-Cut graph partitioning
Given two sets of nodes S1 and S2, define W(S1, S2) as the weight of the edges connecting S1 and S2:

    W(S1, S2) = Σ_{xi ∈ S1} Σ_{xj ∈ S2} Ai,j

For a graph with a set of nodes V, we define the Normalised Cut of a partition of V into K disjoint clusters V = ∪_{k=1}^K Vk as

    NCut = Σ_{k=1}^K W(Vk, V \ Vk) / W(Vk, V) = Σ_{k=1}^K ek^T (D − A) ek / (ek^T D ek)

where ek is the 0/1 indicator vector of cluster Vk (its i-th entry is 1 if xi ∈ Vk) and D is the diagonal degree matrix with Di,i = Σ_j Ai,j.
The denominator:

    W(Vk, V) = ek^T D ek

Each element Di,i in the diagonal of D is the weight of all the edges connecting xi with the other observations, so ek^T D ek sums all the Di,i that correspond to elements in the same cluster.

The numerator:

    W(Vk, V \ Vk) = ek^T (D − A) ek

accounts for the weight of the connections of the elements in each cluster with the elements not in the cluster (the edges we would cut if we applied that partition). ek^T D ek contains the weight of all the edges connecting the elements of the cluster with the whole dataset; ek^T A ek contains the weight of the edges among members of the cluster, that is, Ai,j enters the sum as long as (ek)i = 1 and (ek)j = 1, i.e. xi and xj are both members of Ck.
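These identities can be verified directly on a small affinity matrix using 0/1 indicator vectors (the toy matrix below is an illustrative example, not from the slides):

```python
import numpy as np

# Toy affinity: two well-connected pairs joined by one weak edge (weight 0.1)
A = np.array([[0.0, 1.0, 0.1, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [0.1, 0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
D = np.diag(A.sum(axis=1))

e1 = np.array([1.0, 1.0, 0.0, 0.0])  # indicator of cluster {x1, x2}
within = e1 @ A @ e1                 # weight of edges inside the cluster
total  = e1 @ D @ e1                 # W(V1, V): cluster-to-everything weight
cut    = e1 @ (D - A) @ e1           # W(V1, V \ V1): edges we would cut
```

Here `cut` equals the weight of the single weak edge (0.1), and `total = within + cut`, exactly as the decomposition above says.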
Minimising the normalised cut therefore amounts to

    min Σ_{k=1}^K ek^T (D − A) ek / (ek^T D ek)

and, rescaling with D^{-1/2},

    min ek^T D^{-1/2} (D − A) D^{-1/2} ek = min ek^T (I − D^{-1/2} A D^{-1/2}) ek
Spectral Clustering Algorithm
Input:
- K: number of clusters
- A: affinity matrix
Output: K clusters, each cluster being a list of input observations C1, C2, ..., CK
1 Zero the elements in the diagonal of A
2 Construct a diagonal matrix D where the element Di,i = Σ_{j=1}^n Ai,j
3 Construct the Laplacian of the graph L = D^{-1/2} A D^{-1/2}
4 Find the K eigenvectors of L with largest eigenvalues, z1, z2, ..., zK
5 Form matrix Z stacking the eigenvectors in columns: Zn×K = [z1, ..., zK]
6 Form matrix Y normalising Z row-wise: Yi,j = Zi,j / √(Σj Zi,j²)
7 Perform a KMeans clustering on the rows of Y and assign instance xj to cluster Ck if row j of Y is assigned to cluster k
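The seven steps above can be sketched compactly in NumPy. This is an illustrative implementation: the KMeans of step 7 is inlined as a small Lloyd loop with farthest-point initialisation rather than random initialisation.

```python
import numpy as np

def spectral_clustering(A, K, n_iter=50):
    """Normalised spectral clustering on an affinity matrix A (steps 1-7)."""
    A = A.astype(float).copy()
    np.fill_diagonal(A, 0.0)                          # 1) zero the diagonal
    d = A.sum(axis=1)                                 # 2) degrees D_ii
    Dm12 = np.diag(1.0 / np.sqrt(d))
    L = Dm12 @ A @ Dm12                               # 3) L = D^{-1/2} A D^{-1/2}
    vals, vecs = np.linalg.eigh(L)
    Z = vecs[:, np.argsort(vals)[::-1][:K]]           # 4-5) top-K eigenvectors
    Y = Z / np.linalg.norm(Z, axis=1, keepdims=True)  # 6) row-normalise
    # 7) KMeans on the rows of Y (farthest-point init, then Lloyd updates)
    idx = [0]
    for _ in range(1, K):
        d2 = ((Y[:, None] - Y[idx][None]) ** 2).sum(-1).min(1)
        idx.append(int(d2.argmax()))
    m = Y[idx].copy()
    for _ in range(n_iter):
        labels = ((Y[:, None] - m[None]) ** 2).sum(-1).argmin(1)
        for k in range(K):
            if (labels == k).any():
                m[k] = Y[labels == k].mean(0)
    return labels
```

On a block-structured affinity matrix like the one shown earlier, the rows of Y collapse to one point per cluster, so the final KMeans separates them trivially.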
P =
    0.   0.53 0.   0.47 0.   0.   0.
    0.5  0.   0.   0.5  0.   0.   0.
    0.   0.   0.   1.   0.   0.   0.
    0.29 0.33 0.37 0.   0.02 0.   0.
    0.   0.   0.   0.03 0.   0.42 0.55
    0.   0.   0.   0.   0.47 0.   0.53
    0.   0.   0.   0.   0.53 0.47 0.
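Assuming P is the row-stochastic random-walk matrix obtained by normalising the rows of the affinity matrix (P = D^{-1} A, which matches the rows above summing to 1), the transition matrix after M iterations in the graph is just the matrix power P^M. A sketch:

```python
import numpy as np

def transition_matrix(A, M=1):
    """Random-walk transition matrix P = D^{-1} A, raised to the M-th power."""
    P = A / A.sum(axis=1, keepdims=True)  # normalise each row to sum to 1
    return np.linalg.matrix_power(P, M)
```

Row-stochasticity is preserved under powers, so every row of P^M is again a probability distribution over the nodes.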
KL divergence
If Qk is a prototype (centroid) for the K-prototypes algorithm (K-means run on the rows of P^M), then the distance from a row of P^M to the prototype can be computed as:

    KL(Pi^M || Qk) = Σ_{j=1}^n Pij^M ln(Pij^M / Qk,j)
Input:
- K: number of clusters
- P^M: transition matrix after M iterations in the graph
Output:
- K clusters, each cluster being a list of input observations C1, C2, ..., CK
- K prototypes Q1, Q2, ..., QK (centroids)
1 Initialise the prototypes (for instance, choose K rows of P^M at random)
2 Assign observations to clusters: observation xi goes to cluster Ck if k = argmin_l KL(Pi^M || Ql)
3 Recompute prototypes
If the eigenvectors v1, v2, ..., vn of P are orthonormal,

    [v1 v2 ... vn]^T [v1 v2 ... vn] = I

then

    P^M = [v1 v2 ... vn] diag(λ1^M, λ2^M, ..., λn^M) [v1 v2 ... vn]^T
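Assuming P is symmetric (so that an orthonormal eigenbasis exists), this identity can be checked numerically on a small example:

```python
import numpy as np

# A symmetric, row-stochastic transition matrix on 3 nodes
P = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
lam, V = np.linalg.eigh(P)                    # P = V diag(lam) V^T
assert np.allclose(V.T @ V, np.eye(3))        # orthonormal eigenvectors
M = 4
PM = V @ np.diag(lam ** M) @ V.T              # P^M via the eigendecomposition
assert np.allclose(PM, np.linalg.matrix_power(P, M))
```

Raising the eigenvalues to the M-th power is far cheaper than M matrix multiplications, which is the point of writing P^M this way.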
(Figure: log(|λK+1|) as a function of M, with M even.)