Spectral Clustering
Spectral Clustering
emilio.parrado@uc3m.es
March 1, 2022
                                               09
Index
2 KMeans limitations
3   Spectral Clustering
      Kernels as similarities
      Graph partitioning
      K-Prototypes spectral clustering
2 KMeans limitations
3   Spectral Clustering
      Kernels as similarities
      Graph partitioning
      K-Prototypes spectral clustering
2 KMeans limitations
3   Spectral Clustering
      Kernels as similarities
      Graph partitioning
      K-Prototypes spectral clustering
KMeans Algorithm
      Input:
         I   K: number of clusters
         I   X: data matrix, each observation in a row
      Output: K centroids, m1 , m2 , . . . , mK
 1    Initialise the centroids at random
 2    Assign observations to clusters
 3    Recompute centroids
                                       1 X
                                mk =       xj ,    k = 1, . . . , K
                                       Nk
                                         xj ∈Ck
                                                                                  Acrónimo + Nombre 3 líneas
2 KMeans limitations
3   Spectral Clustering
      Kernels as similarities
      Graph partitioning
      K-Prototypes spectral clustering
2 KMeans limitations
3   Spectral Clustering
      Kernels as similarities
      Graph partitioning
      K-Prototypes spectral clustering
                                                                                     
                                   1   1   1   1   1   0   0   0   0   0   0    0
                               
                                  1   1   1   1   1   0   0   0   0   0   0    0     
                                                                                      
                               
                                  1   1   1   1   1   0   0   0   0   0   0    0     
                                                                                      
                               
                                  1   1   1   1   1   0   0   0   0   0   0    0     
                                                                                      
                               
                                  1   1   1   1   1   0   0   0   0   0   0    0     
                                                                                      
                                  0   0   0   0   0   1   1   1   1   0   0    0     
                             K=                                                      
                               
                                  0   0   0   0   0   1   1   1   1   0   0    0     
                                                                                      
                               
                                  0   0   0   0   0   1   1   1   1   0   0    0     
                                                                                      
                               
                                  0   0   0   0   0   1   1   1   1   0   0    0     
                                                                                      
                               
                                  0   0   0   0   0   0   0   0   0   1   1    1     
                                                                                      
                                  0   0   0   0   0   0   0   0   0   1   1    1     
                                   0   0   0   0   0   0   0   0   0   1   1    1
                                                                                    Acrónimo + Nombre 3 líneas
                                                                                     >
            −0.45      0      0                        −0.45    0      0
    
           −0.45      0      0      
                                     
                                                   
                                                      −0.45    0      0                
                                                                                        
    
           −0.45      0      0      
                                     
                                                   
                                                      −0.45    0      0                
                                                                                        
    
           −0.45      0      0      
                                     
                                                   
                                                      −0.45    0      0                
                                                                                        
           −0.45      0      0      
                                      5
                                                     −0.45    0      0                
                                           0   0                                      
            0        −0.5     0     
                                      0
                                                       0      −0.5     0               
  K=                                       4   0                                     
            0        −0.5     0     
                                      0
                                                       0      −0.5     0               
                                           0   3                                      
    
            0        −0.5     0     
                                     
                                                   
                                                       0      −0.5     0               
                                                                                        
    
            0        −0.5     0     
                                     
                                                   
                                                       0      −0.5     0               
                                                                                        
    
            0         0     −0.58   
                                     
                                                   
                                                       0       0     −0.58             
                                                                                        
            0         0     −0.58                    0       0     −0.58             
             0         0     −0.58                      0       0     −0.58
                                                                             Acrónimo + Nombre 3 líneas
2 KMeans limitations
3   Spectral Clustering
      Kernels as similarities
      Graph partitioning
      K-Prototypes spectral clustering
   The affinity matrix and training data samples define a graph in which:
      I   Each training sample becomes a node in the graph
      I   A positive value in position A(i, j) of the affinity matrix is the edge
          connecting the nodes of the graph corresponding to xi and xj .
   Therefore, finding clusters in the data can be addressed as cutting the
   graph in subgraphs by removing weak edges that connect disjoint subsets
   of heavily connected nodes.
   Normalised-Cut graph partitioning
   Given two sets of nodes S1 and S2 , define W (S1 , S2 ) as the weight of the
   edges connecting S1 and S2 :
                                        X X
                         W (S1 , S2 ) =           Ai,j
                                                xi ∈S1 xj ∈S2
   For a graph with a set of nodes V we define the Normalised Cut that
   partitions V in K disjoint clusters
                                                K
                                                X W (Vk , V \ Vk )
                             V =   ∪K
                                    k=1 Vk   as
                                                   W (Vk , V )
                                                k=1
                         K                   K
                         X W (Vk , V \ Vk ) X e>
                                               k (D − A)ek
                                           =
                         k=1
                            W (Vk , V )
                                             k=1
                                                 e>
                                                  k Dek
   The denominator:
                                  W (Vk , V ) = e>
                                                 k Dek
   each element Dii in the diagonal of D is the weight of all the edges
   connecting xi with the other observations, so e>
                                                  k Dek sums all the Dii
   that correspond to elements in the same cluster.
   The numerator
                             W (Vk , V \ Vk ) = e>
                                                 k (D − A)ek
   accounts for the weight of the connections of the elements in each cluster
   with the elements not in the cluster (the edges we would cut if we decide
   to apply that partition). e>
                              k Dek contains the weight of all the edges
   connnecting the elements of the cluster with the whole dataset; e>   k Aek
   contains the weight of the edges among members of the cluster, that is
   Ai,j will enter the sum as long as (ek )i = 1 and (ek )j = 1, that is, xi and
   xj are members of Ck .                                                  Acrónimo + Nombre 3 líneas
                                    k=1
                                              e>
                                               k Dek
                              min e>
                                   kD
                                      −1/2
                                           (D − A)D−1/2 ek =
                                min I − e>
                                         kD
                                            −1/2
                                                 AD−1/2 ek
                                                                           Acrónimo + Nombre 3 líneas
      Input:
         I   K: number of clusters
         I   A: Affinity matrix
      Output: K clusters, each cluster being a list of input observations
      C1 , C2 , . . . , CK
 1    Zero the elements in the diagonal of A
                                                                           Pn
 2    Construct a diagonal matrix D where the element Di,i =                  j=1   Ai,j
                                                  −1/2        −1/2
 3    Construct Laplacian of the graph L = D             AD
 4    Find the K eigenvectors of L with largest eigenvalue, z1 , z2 , . . . , zK
 5    Form matrix Z stacking the eigenvectors in columns Zn×K = [z1 , . . . , zK ]
                                                   Z
 6    Form matrix Y normalising Z row-wise Yi,j = √Pi,j              2
                                                                j   Zi,j
 7    Perform a Kmeans clustering in the rows of Y and assign instance xj to           Acrónimo + Nombre 3 líneas
2 KMeans limitations
3   Spectral Clustering
      Kernels as similarities
      Graph partitioning
      K-Prototypes spectral clustering
                                                                                          
                                     0. 0.53   0. 0.47  0.   0.             0.
                                
                                    0.5  0.   0.  0.5  0.   0.             0.             
                                                                                           
                                
                                    0.   0.   0.   1.  0.   0.             0.             
                                                                                           
                             P =
                                   0.29 0.33 0.37 0. 0.02 0.               0.             
                                                                                           
                                
                                    0.   0.   0. 0.03  0. 0.42            0.55            
                                                                                           
                                    0.   0.   0.   0. 0.47 0.             0.53            
                                                                                Acrónimo + Nombre 3 líneas
0. 0. 0. 0. 0.53 0.47 0.
KL divergence
If Qk is a prototype (centroid) for the algorithm K-prototypes (K-means with
the rows of P M ), then the distance with a row of P M can be computed as:
                                             n
                                          X            PijM
                              KL PiM ||Qk =     PijM ln
                                            j=1
                                                        Qkj
      Input:
         I   K: number of clusters
         I   P M : transition matrix after M iterations in the graph
      Output:
         I   K clusters, each cluster being a list of input observations C1 , C2 , . . . , CK
         I   K Prototypes Q1 , Q2 , . . . , QK (centroids)
 1    Initialize the prototypes (for instace choose K rows of P M at random)
 2    Assign observations to clusters. Observation xi goes to cluster Ck if
3 Recompute prototypes
                                 v1>
                                      
                                v2>                            
                                        v1   v2       ...   vn       =I
                                      
                                 ..
                                  .   
                                 vn>
                                                                                       Acrónimo + Nombre 3 líneas
   then
                                                                          >
                                                   λM
                                                                                   
                                                    1     0    ...     0     v1
                                                  0    λM
                                                          2    ...     0   v2>    
           PM =
                     
                         v1   v2   ...   vn
                                                                                 
                                                    ..    ..   ..         .
                                                                       0   ..
                                                                                   
                                                    .     .      .                 
                                                    0     0    ...    λM
                                                                       n     vn>
                                               log(|λK+1 |)
                                                                   M   even
2 KMeans limitations
3   Spectral Clustering
      Kernels as similarities
      Graph partitioning
      K-Prototypes spectral clustering