Math 118: Mathematical Methods of Data Theory: Lecture 9: Graphs and Spectral Clustering

This document provides an overview of spectral clustering and graphs. It discusses how graphs can be represented by adjacency matrices and graph Laplacians. It then explains that finding clusters in graphs is NP-hard, so spectral clustering approximates the solution by relaxing the problem and solving for the second eigenvector of the graph Laplacian. Finally, it analyzes spectral clustering and shows that the solution approximates minimizing the ratio cut of the graph by assigning vertices to clusters based on the signs of the second eigenvector.


Math 118: Mathematical Methods of Data Theory

Lecture 9: Graphs and Spectral Clustering

Instructor: Daniel Mckenzie

Dept. of Mathematics, UCLA

TBD

1 / 11
Graphs

• Graphs: G = (V, E) where V = vertex set and E = edge set.

• For this class, V = {v_1, . . . , v_n}, and we write (i, j) for the edge between v_i and v_j.

• Adjacency matrix: A ∈ R^{n×n} with A_{ij} = 1 if (i, j) is an edge, and A_{ij} = 0 otherwise.

Insert Adjacency matrix and small graph here
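Pending the figure, a minimal sketch of building an adjacency matrix; the graph here (a triangle on v_1, v_2, v_3 with a pendant vertex v_4 attached to v_3) is an illustrative assumption, not from the lecture:

```python
import numpy as np

# Hypothetical 4-vertex graph: triangle on v1, v2, v3 plus a pendant
# vertex v4 attached to v3 (0-indexed below, so code vertex k is v_{k+1}).
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = A[j, i] = 1  # undirected edge: set both entries

print(A)  # symmetric, zero diagonal (no self-loops)
```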

2 / 11
Graphs

• d_i = degree of v_i = number of edges incident to v_i.

• D = diag(d_1, . . . , d_n) ∈ R^{n×n}.

• The graph Laplacian: L = D − A.

• Important properties of L:
• L is symmetric and pos. semi-definite.
• L1 = 0.
• Further variants: G can have weighted or directed edges.
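The listed properties of L can be checked numerically; a small sketch using the same kind of toy graph (the graph itself is an illustrative assumption):

```python
import numpy as np

# Hypothetical toy graph (triangle plus pendant); verify the stated
# properties of the graph Laplacian L = D - A numerically.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1

d = A.sum(axis=1)                  # degrees d_i
D = np.diag(d)
L = D - A                          # graph Laplacian

assert np.allclose(L, L.T)                    # L is symmetric
assert np.allclose(L @ np.ones(n), 0)         # L1 = 0
assert np.linalg.eigvalsh(L).min() >= -1e-10  # positive semi-definite
```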

3 / 11
Examples of Graphs

Figure: Left to right: Zachary's Karate Club¹, College Football 2000 season², Erdős–Rényi random graph generated using networkx

Graphs often called networks in applied settings.

¹ Originally: An information flow model for conflict and fission in small groups, Zachary, W. (1977). Image from https://studentwork.prattsi.org/infovis/labs/zacharys-karate-club/
² Originally: Community structure in social and biological networks, Girvan & Newman (2002). Image from Compressive sensing for cut improvement and local clustering, Lai & Mckenzie (2020).
4 / 11
Connected Components and Clusters
• C_1 is a connected component of G if there are no edges between C_1 and V \ C_1.
• Corollary: if C_1 is a connected component, then so is C_2 = V \ C_1.

Figure: Left: Two connected components. Right: One connected component but two clusters

• C_1 is a cluster of G if there are "few" edges between C_1 and V \ C_1 and many internal edges in C_1.

• Ratio Cut:
  • Let e(S, V \ S) = # edges from S to V \ S.
  • RCut(S) = e(S, V \ S) / (|S| |V \ S|).
  • Find a cluster as C = arg min_{S ⊂ V} RCut(S).
5 / 11
Why is finding clusters hard?

• Finding connected components: Breadth-First Search or Depth-First Search.

• Min Cut:
  • Recall e(S, V \ S) = # edges from S to V \ S.
  • Min Cut problem: find C = arg min_{S ⊂ V} e(S, V \ S).
  • Can be done efficiently (O(n^3)) using the Ford–Fulkerson algorithm.
  • Problem: typically finds a small C.

• Recall RCut(S) = e(S, V \ S) / (|S| |V \ S|).

• Unfortunately, finding C = arg min_{S ⊂ V} RCut(S) is NP-hard.

• Thus, we resort to approximate algorithms, like Spectral Clustering.
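The contrast can be seen by brute force on a small hypothetical graph: two 4-cliques joined by two bridge edges, plus a pendant vertex. Min Cut isolates the pendant, while Ratio Cut prefers the balanced community split. (Exhaustive search stands in for Ford–Fulkerson here and is feasible only because n is tiny.)

```python
from itertools import combinations

# Hypothetical graph: two 4-cliques joined by bridges (3,4) and (2,5),
# plus a pendant vertex 8 attached to vertex 0.
clique1 = [(i, j) for i in range(4) for j in range(i + 1, 4)]
clique2 = [(i, j) for i in range(4, 8) for j in range(i + 1, 8)]
edges = clique1 + clique2 + [(3, 4), (2, 5), (0, 8)]
n = 9

def e_cut(S):
    S = set(S)
    return sum((i in S) != (j in S) for i, j in edges)

def rcut(S):
    return e_cut(S) / (len(S) * (n - len(S)))

subsets = [S for k in range(1, n) for S in combinations(range(n), k)]
min_cut = min(subsets, key=e_cut)    # isolates the pendant: a small C
min_rcut = min(subsets, key=rcut)    # prefers the balanced split
print(sorted(min_cut), sorted(min_rcut))
```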

6 / 11
The Spectral Clustering Algorithm

• Spectral clustering for 2 clusters:

1. Compute d_i for i = 1, . . . , n. Let D = diag(d_1, . . . , d_n) ∈ R^{n×n}.
2. Compute the Laplacian: L = D − A.
3. Compute the second eigenpair (λ_2, v_2) of L.
4. Assign vertices to clusters as:

   v_i ∈ C if (v_2)_i > 0, or v_i ∈ V \ C if (v_2)_i < 0.

5. Output: C.
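The steps above can be sketched in a few lines of numpy; the function name and the test graph (two 4-cliques joined by a single edge) are illustrative assumptions, not from the lecture:

```python
import numpy as np

def spectral_cluster(A):
    """Two-way spectral clustering: sign pattern of the 2nd eigenvector of L."""
    n = len(A)
    d = A.sum(axis=1)                          # step 1: degrees
    L = np.diag(d) - A                         # step 2: Laplacian L = D - A
    _, eigvecs = np.linalg.eigh(L)             # eigenvalues in ascending order
    v2 = eigvecs[:, 1]                         # step 3: second eigenvector
    return {i for i in range(n) if v2[i] > 0}  # step 4: sign rule gives C

# Hypothetical test graph: two 4-cliques joined by one edge.
n = 8
A = np.zeros((n, n))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1
A[3, 4] = A[4, 3] = 1                          # the single joining edge

C = spectral_cluster(A)
print(sorted(C))   # one clique; V \ C is the other (sign of v2 is arbitrary)
```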

7 / 11
Output of Spectral Clustering

Figure: Left: Two connected components. Right: One connected component but two clusters

8 / 11
Analysis of Spectral Clustering

• Recall that solving C = arg min_{S ⊂ V} RCut(S), where RCut(S) = e(S, V \ S) / (|S| |V \ S|), is NP-hard.

• Instead, we will show that Spectral Clustering solves a relaxed version of Ratio Cut.

• We proceed in steps:

1. Introduce indicator vectors I_S ∈ R^n for S ⊂ V.
2. Relate to Ratio Cut: RCut(S) = (1/n^2) I_S^T L I_S.
3. Relax: replace I_S ∈ R^n with an arbitrary v ∈ R^n.
4. Argue that solving the relaxed problem is easy: v_2 = arg min_{v ∈ R^n} v^T L v (subject to the constraints introduced below).
5. (Approximately) reconstruct C from v_2.
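A small running example for step 2 (the 4-vertex triangle-plus-pendant graph is a hypothetical choice): numerically verify RCut(S) = (1/n^2) I_S^T L I_S for every nonempty proper subset S.

```python
import numpy as np
from itertools import combinations

# Hypothetical 4-vertex graph: triangle plus pendant.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A

def indicator(S):
    """The indicator vector I_S defined on the next slide."""
    s, sc = len(S), n - len(S)
    return np.array([np.sqrt(sc / s) if i in S else -np.sqrt(s / sc)
                     for i in range(n)])

for k in range(1, n):
    for S in map(set, combinations(range(n), k)):
        e = sum((i in S) != (j in S) for i, j in edges)
        I = indicator(S)
        assert np.isclose(e / (len(S) * (n - len(S))), (I @ L @ I) / n**2)
print("RCut(S) = I_S^T L I_S / n^2 holds for every S")
```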

9 / 11
Analysis of Spectral Clustering
• For any S ⊂ V, write S^c = V \ S and define I_S ∈ R^n by:

  (I_S)_i = √(|S^c| / |S|)    if v_i ∈ S,
  (I_S)_i = −√(|S| / |S^c|)   if v_i ∉ S.

• Properties of indicator vectors:


1. RCut(S) = (1/n^2) I_S^T L I_S (Homework).
2. So: C = arg min_{S ⊂ V} RCut(S) ⟺ I_C = arg min_{S ⊂ V} I_S^T L I_S.
3. 1^T I_S = 0. Proof:

   1^T I_S = Σ_{i ∈ V} (I_S)_i
           = Σ_{v_i ∈ S} √(|S^c| / |S|) − Σ_{v_i ∈ S^c} √(|S| / |S^c|)
           = |S| √(|S^c| / |S|) − |S^c| √(|S| / |S^c|)
           = √(|S| |S^c|) − √(|S| |S^c|) = 0.

4. If S ≠ ∅, V then ‖I_S‖^2 = n (Homework).
• Relax the problem arg min_{S ⊂ V} I_S^T L I_S to:

  arg min v^T L v over v ∈ R^n with ‖v‖^2 = n and 1^T v = 0.
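Properties 3 and 4, and the fact that the relaxed minimum can only be smaller than any value achieved by an indicator vector, can be checked numerically; the subset S and the path graph below are arbitrary illustrative choices:

```python
import numpy as np

# Check properties 3 and 4 for a subset S of a 6-element vertex set.
n = 6
S = {0, 1}
s, sc = len(S), n - len(S)
I_S = np.array([np.sqrt(sc / s) if i in S else -np.sqrt(s / sc)
                for i in range(n)])

assert np.isclose(np.ones(n) @ I_S, 0)   # property 3: 1^T I_S = 0
assert np.isclose(I_S @ I_S, n)          # property 4: ||I_S||^2 = n

# Relaxation: the minimizer v_2 over {||v||^2 = n, 1^T v = 0} does at
# least as well as any feasible indicator vector I_S.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]   # a path graph
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A
v2 = np.linalg.eigh(L)[1][:, 1]
v2 *= np.sqrt(n) / np.linalg.norm(v2)    # rescale so ||v2||^2 = n
assert v2 @ L @ v2 <= I_S @ L @ I_S + 1e-9
```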

10 / 11
Analysis of Spectral Clustering

• Detour: the claims below follow from the Rayleigh–Ritz characterization of eigenvalues. Caution: from now on we enumerate the eigenvalues of L in increasing order, λ_1 ≤ λ_2 ≤ · · · ≤ λ_n.

• Claim: v_2 = arg min { v^T L v : v ∈ R^n, 1^T v = 0 and ‖v‖^2 = n }. Why?

• First eigenvector: 1 = v_1 = arg min v^T L v over v ∈ R^n with ‖v‖^2 = n (since L1 = 0, so λ_1 = 0).

• Second eigenvector: v_2 = arg min v^T L v over v ∈ R^n with ‖v‖^2 = n and 1^T v = 0.
• So:

  I_C = arg min_{S ⊂ V} I_S^T L I_S ≈ arg min { v^T L v : v ∈ R^n, ‖v‖^2 = n, 1^T v = 0 } = v_2.

• (I_C)_i > 0 if v_i ∈ C and (I_C)_i < 0 if v_i ∉ C.

• Use the same rule with v_2:

  v_i ∈ C if (v_2)_i > 0, or v_i ∈ V \ C if (v_2)_i < 0.
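A quick numerical sanity check of these claims on a hypothetical graph (two triangles joined by a single edge): λ_1 = 0 with v_1 a multiple of 1, v_2 is orthogonal to 1, and the sign rule recovers the two triangles.

```python
import numpy as np

# Hypothetical graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A

eigvals, eigvecs = np.linalg.eigh(L)           # eigenvalues in increasing order
assert np.isclose(eigvals[0], 0)               # lambda_1 = 0
assert np.allclose(eigvecs[:, 0] ** 2, 1 / n)  # v_1 = (+/-)(1/sqrt(n)) * 1

v2 = eigvecs[:, 1]
assert np.isclose(np.ones(n) @ v2, 0)          # 1^T v_2 = 0
C = {i for i in range(n) if v2[i] > 0}
print(sorted(C))   # the sign rule recovers one of the two triangles
```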

11 / 11
