Tutorial on Graph ML: Part II
Soumyodeep Dey
May 21, 2025
Overview
Introduction to Graphs (Revision)
Machine Learning Tasks (Revision)
Traditional Graph Analysis Methods
Node Embedding Methods
Conclusion: Toward Graph Neural Networks
Graphs: Definition and Types
Key Idea
Graphs model relationships between entities via nodes and edges.
Graph G = (V, E), adjacency matrix (built explicitly in the sketch below):
A[i, j] = \begin{cases} 1 & \text{if } (v_i, v_j) \in E, \\ 0 & \text{otherwise} \end{cases}   (1)
Types: Simple, directed, weighted, multi-relational
Example: Social networks, molecular graphs
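As a concrete illustration of Eq. (1), here is a minimal NumPy sketch that builds A for a small, made-up undirected graph; the edge list is purely illustrative.

```python
import numpy as np

# Toy undirected graph given as an edge list (illustrative values).
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

# Build the adjacency matrix A of Eq. (1).
A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = A[j, i] = 1  # symmetric, since the graph is undirected
print(A)
```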
Features and Importance
Node Features: X ∈ R^{n×m} (e.g., user profiles)
Edge Features: Weights, timestamps
Graph-Level: Diameter, density
Applications: Biology (protein interactions), knowledge graphs
ML Tasks: Node and Link Prediction
Node Classification: Predict labels, homophily
Loss (computed in the sketch after this list):
L = -\sum_{v_i \in V_{\text{train}}} \sum_{c \in Y} y_{i,c} \log \hat{y}_{i,c}   (2)
Link Prediction: Predict missing edges
Loss:
L = \sum_{(v_i, v_j) \in E_{\text{train}}} \log s(v_i, v_j) + \sum_{(v_i, v_j) \notin E_{\text{train}}} \log(1 - s(v_i, v_j))   (3)
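A minimal NumPy sketch of both losses, on made-up labels, predictions, and embeddings; the score s is assumed here to be the sigmoid of an embedding dot product, which is one common choice rather than the only one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Eq. (2): cross-entropy over 3 training nodes and 2 classes (toy values).
y = np.array([[1, 0], [0, 1], [1, 0]])                  # one-hot labels
y_hat = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])  # predicted probabilities
node_loss = -np.sum(y * np.log(y_hat))

# Eq. (3): log-likelihood over observed edges and sampled non-edges,
# with an assumed score s(v_i, v_j) = sigmoid(z_i . z_j).
def s(z_i, z_j):
    return 1.0 / (1.0 + np.exp(-z_i @ z_j))

Z = rng.normal(size=(4, 8))       # toy node embeddings
E_train = [(0, 1), (2, 3)]        # observed edges
E_neg = [(0, 3), (1, 2)]          # sampled non-edges
link_ll = sum(np.log(s(Z[i], Z[j])) for i, j in E_train) \
        + sum(np.log(1 - s(Z[i], Z[j])) for i, j in E_neg)
print(node_loss, link_ll)
```

Note that Eq. (3) is a log-likelihood to be maximized; in practice one minimizes its negative.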
ML Tasks: Graph-Level and Challenges
Graph Classification: Predict graph labels
Loss:
L = -\sum_{i=1}^{N} \sum_{c \in Y} y_{i,c} \log f(G_i)_c   (4)
Clustering: Detect communities
Challenges: Non-i.i.d. data, scalability
Traditional Methods
Key Idea
Use hand-crafted features and statistics for graph analysis.
Serve as baselines for ML tasks, but are limited by manual feature engineering
Include graph statistics, kernels, spectral methods
Graph Statistics and Features
Node-Level: Degree, PageRank, Centrality measures
Edge-Level: Weights, shortest paths
Graph-Level: Diameter, density
Centrality Measures
Eigenvector Centrality: Importance via connections to other important nodes
Closeness Centrality: Inverse of the average shortest-path distance to all other nodes
Betweenness Centrality: Fraction of shortest paths that pass through a node
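All of these statistics are available off the shelf; a short networkx sketch on a standard example graph:

```python
import networkx as nx

G = nx.karate_club_graph()  # standard 34-node example graph

degree      = dict(G.degree())              # node-level: degree
pagerank    = nx.pagerank(G)                # node-level: PageRank
eigenvector = nx.eigenvector_centrality(G)  # importance via connections
closeness   = nx.closeness_centrality(G)    # inverse average distance
betweenness = nx.betweenness_centrality(G)  # shortest paths through node

print(max(betweenness, key=betweenness.get))  # most "bridging" node
```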
Kernel Methods
Goal: Measure graph similarity via kernels
Examples: Weisfeiler-Lehman, Shortest-Path, Graphlets
Limitation: Expensive, non-adaptive
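A toy sketch of the Weisfeiler-Lehman idea (not a production kernel): iteratively relabel each node by hashing its label together with its sorted neighbour labels, then compare graphs via label histograms. The helper name and hashing scheme are my own choices for illustration.

```python
import networkx as nx
from collections import Counter

def wl_histogram(G, iterations=2):
    """Histogram of Weisfeiler-Lehman labels over a few refinement rounds."""
    labels = {v: str(d) for v, d in G.degree()}  # initial labels: degrees
    hist = Counter(labels.values())
    for _ in range(iterations):
        # New label = hash of own label plus sorted neighbour labels.
        labels = {v: str(hash(labels[v] + "|" + ",".join(sorted(labels[u] for u in G[v]))))
                  for v in G}
        hist.update(labels.values())
    return hist

h1, h2 = wl_histogram(nx.cycle_graph(6)), wl_histogram(nx.path_graph(6))
# WL kernel value: dot product of the two label histograms.
print(sum(h1[l] * h2[l] for l in h1.keys() & h2.keys()))
```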
Neighbourhood Overlap Measures
Goal: Predict edges based on local/global neighbourhoods
Examples: Common Neighbors (CN), Jaccard Coefficient, Sørensen Index, Katz Index
Common Neighbors:
CN(v_i, v_j) = \sum_{k} A[i, k] \, A[j, k]   (5)
Limitation: Not effective for multi-relational or sparse graphs.
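Eq. (5) for all node pairs at once is just a matrix product; a short sketch reusing the toy adjacency matrix from earlier, with the Jaccard coefficient as a normalised variant:

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])

CN = A @ A       # entry (i, j) = number of common neighbours of v_i, v_j
print(CN[0, 1])  # common neighbours of v_0 and v_1

# Jaccard coefficient: overlap normalised by the neighbourhood union.
deg = A.sum(axis=1)
print(CN[0, 1] / (deg[0] + deg[1] - CN[0, 1]))
```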
Spectral Clustering: Graph Cut
Goal: Partition nodes into communities
Graph Cut:
Cut(C_1, \ldots, C_k) = \frac{1}{2} \sum_{i=1}^{k} Cut(C_i, V \setminus C_i)   (6)
Ratio Cut: Balanced partitions
RatioCut(C_1, \ldots, C_k) = \frac{1}{2} \sum_{i=1}^{k} \frac{Cut(C_i, V \setminus C_i)}{|C_i|}   (7)
Spectral Clustering: Laplacian
Graph Laplacian:
L = D - A   (8)
Normalized Laplacian:
L_{sym} = D^{-1/2} L D^{-1/2}   (9)
Properties: Positive semi-definite; eigenvalue 0 with multiplicity equal to the number of connected components
Algorithm: Compute k eigenvectors, k-means on embeddings
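A minimal sketch of the stated algorithm using the unnormalized Laplacian of Eq. (8); swapping in L_sym from Eq. (9) follows the same pattern. The example graph and k = 2 are arbitrary choices.

```python
import numpy as np
import networkx as nx
from scipy.linalg import eigh
from sklearn.cluster import KMeans

G = nx.karate_club_graph()
A = nx.to_numpy_array(G)
D = np.diag(A.sum(axis=1))
L = D - A                           # Eq. (8)

k = 2
eigvals, eigvecs = eigh(L)          # eigenvalues in ascending order
embedding = eigvecs[:, :k]          # k smallest eigenvectors, one row per node
clusters = KMeans(n_clusters=k, n_init=10).fit_predict(embedding)
print(clusters)
```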
Spectral Clustering: Insights
Intuition: Eigenvectors embed nodes to reveal communities
Applications: Fraud detection, social network analysis
Limitation: O(n^3) complexity, assumes well-separated communities
Node Embedding Methods
Key Idea
Learn low-dimensional node representations for ML tasks.
Encoder-Decoder framework
Loss:
L = -\sum_{i,j} \left[ A[i, j] \log \hat{A}[i, j] + (1 - A[i, j]) \log(1 - \hat{A}[i, j]) \right]   (10)
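A minimal sketch of Eq. (10) with an inner-product decoder Â = σ(ZZᵀ), which is one common (assumed) decoder choice; the embeddings here are random placeholders, not trained values.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])
Z = rng.normal(size=(3, 4))              # placeholder embeddings

A_hat = 1.0 / (1.0 + np.exp(-Z @ Z.T))   # decoder: predicted edge probabilities
loss = -np.sum(A * np.log(A_hat) + (1 - A) * np.log(1 - A_hat))  # Eq. (10)
print(loss)
```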
Factorization-Based Approaches
Encoder: Factorizes matrix (e.g., A)
Objective:
Z = \arg\min_{Z} \| A - Z Z^\top \|_F^2   (11)
Limitation: Expensive (O(n^3)), ignores node features
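For a symmetric A, Eq. (11) has a closed-form solution via the top-d eigenpairs (Eckart-Young); a short sketch, clipping negative eigenvalues since ZZᵀ is positive semi-definite:

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

d = 2
eigvals, eigvecs = np.linalg.eigh(A)  # ascending eigenvalues
# Z = U_d diag(lambda_d)^{1/2}, keeping the d largest eigenpairs.
Z = eigvecs[:, -d:] * np.sqrt(np.clip(eigvals[-d:], 0, None))
print(np.linalg.norm(A - Z @ Z.T, "fro"))  # Frobenius error of Eq. (11)
```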
Random Walk-Based Embeddings
Use random walks to capture node co-occurrences
Skip-gram, negative sampling:
L = \sum_{v_i} \sum_{v_j \in context(v_i)} \left[ \log \sigma(z_i^\top z_j) + \sum_{n=1}^{k} \mathbb{E}_{v_n} \log \sigma(-z_i^\top z_n) \right]   (12)
Scalable, captures local/global structure
DeepWalk
Uniform random walks of length l
Skip-gram: Predict context nodes in window w
Objective (negative sampling, as above)
Intuition: Co-occurring nodes embedded closely
Limitation: Fixed walk strategy
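A minimal DeepWalk sketch: uniform walks fed to gensim's skip-gram Word2Vec with negative sampling (Eq. 12). The walk counts and hyperparameters below are illustrative, and the gensim 4.x API is assumed.

```python
import random
import networkx as nx
from gensim.models import Word2Vec

def random_walk(G, start, length):
    """One uniform random walk, returned as strings for Word2Vec."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(list(G[walk[-1]])))
    return [str(v) for v in walk]

G = nx.karate_club_graph()
walks = [random_walk(G, v, length=10) for v in G for _ in range(5)]

# sg=1 selects skip-gram; negative=5 draws k = 5 negative samples.
model = Word2Vec(walks, vector_size=32, window=4, sg=1, negative=5, min_count=1)
print(model.wv[str(0)][:4])  # first few embedding dimensions of node 0
```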
node2vec
Biased walks controlled by the return parameter p and the in-out parameter q
Transition probability:
\pi(v_{t+1} \mid v_t, v_{t-1}) = \begin{cases} \frac{1}{p} & \text{if } v_{t+1} = v_{t-1}, \\ 1 & \text{if } (v_t, v_{t+1}) \in E, \ dist(v_{t-1}, v_{t+1}) = 1, \\ \frac{1}{q} & \text{if } (v_t, v_{t+1}) \in E, \ dist(v_{t-1}, v_{t+1}) = 2 \end{cases}   (13)
Flexible: BFS-like or DFS-like walks
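The biased walk reduces to the local rule of Eq. (13); a sketch that returns the unnormalized transition weights for one step (the function name and the sample values of p, q are my own):

```python
import networkx as nx

def transition_weights(G, v_prev, v_t, p=1.0, q=2.0):
    """Unnormalized pi(v_next | v_t, v_prev) from Eq. (13)."""
    weights = {}
    for v_next in G[v_t]:
        if v_next == v_prev:
            weights[v_next] = 1.0 / p   # return to the previous node
        elif G.has_edge(v_prev, v_next):
            weights[v_next] = 1.0       # distance 1 from v_prev (BFS-like)
        else:
            weights[v_next] = 1.0 / q   # distance 2 from v_prev (DFS-like)
    return weights

G = nx.karate_club_graph()
print(transition_weights(G, v_prev=0, v_t=1))
```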
Conclusion
Graphs model complex systems
Traditional methods limited by scalability, manual engineering
Shallow embeddings limited by transductivity and their inability to integrate node features
Graph Neural Networks (GNNs) needed for:
Hierarchical pattern learning
Generalization to new nodes/graphs
Dynamic feature integration
Continued in Part III: GNNs and beyond