Machine Learning with Graphs
Graph Neural Networks (GNN)
Ding Li 2021.10
CS224W: Machine Learning with Graphs
Prof. Jure Leskovec @ Stanford
Deep Learning on Graphs
Graph Components and Types
Node Degrees
Representing Graphs
1. Adjacency Matrix
2. Edge List
3. Adjacency List
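As a minimal illustration (plain Python, toy 4-node graph assumed), the same undirected graph in all three representations:

```python
# Toy undirected graph: edges (0,1), (0,2), (1,2), (2,3)
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

# 1. Adjacency matrix: n x n, A[i][j] = 1 if edge (i, j) exists
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1  # symmetric for an undirected graph

# 2. Edge list: just the pairs themselves
edge_list = list(edges)

# 3. Adjacency list: node -> neighbors (efficient for sparse graphs)
adj = {v: [] for v in range(n)}
for i, j in edges:
    adj[i].append(j)
    adj[j].append(i)

print(A)    # [[0,1,1,0],[1,0,1,0],[1,1,0,1],[0,0,1,0]]
print(adj)  # {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
```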
Graph Connectivity
Node Features: Clustering Coefficient & Graphlets
Clustering Coefficient
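A sketch of the node-level clustering coefficient C_v = 2e_v / (k_v(k_v - 1)), cross-checked against networkx's built-in; the karate-club graph is just an example input:

```python
import networkx as nx
from itertools import combinations

def clustering_coefficient(G, v):
    """C_v = 2 * e_v / (k_v * (k_v - 1)): fraction of neighbor pairs that are linked."""
    nbrs = list(G.neighbors(v))
    k = len(nbrs)
    if k < 2:
        return 0.0
    e = sum(1 for u, w in combinations(nbrs, 2) if G.has_edge(u, w))
    return 2 * e / (k * (k - 1))

G = nx.karate_club_graph()  # example graph; any undirected graph works
assert abs(clustering_coefficient(G, 0) - nx.clustering(G, 0)) < 1e-9
print(nx.average_clustering(G))  # graph-level average clustering
```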
Link-Level Features
Graph-Level Features: Kernel Methods
Graphlet Kernel
Weisfeiler-Lehman Kernel (Color Refinement, computationally efficient)
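One possible sketch of color refinement: each round, a node's color becomes a hash of its own color plus the sorted multiset of its neighbors' colors, and the kernel value is the inner product of the accumulated color histograms (production implementations use a compressed label table rather than Python's hash):

```python
import networkx as nx
from collections import Counter

def wl_color_histogram(G, num_iters=3):
    """Run WL color refinement; return the histogram of all colors seen."""
    colors = {v: 0 for v in G}           # start with a uniform color
    hist = Counter(colors.values())
    for _ in range(num_iters):
        new_colors = {}
        for v in G:
            # Signature = own color + sorted multiset of neighbor colors
            sig = (colors[v], tuple(sorted(colors[u] for u in G.neighbors(v))))
            new_colors[v] = hash(sig)    # stand-in for a compressed label table
        colors = new_colors
        hist.update(colors.values())
    return hist

def wl_kernel(G1, G2, num_iters=3):
    """WL kernel value = inner product of the two color histograms."""
    h1, h2 = wl_color_histogram(G1, num_iters), wl_color_histogram(G2, num_iters)
    return sum(h1[c] * h2[c] for c in h1.keys() & h2.keys())

print(wl_kernel(nx.cycle_graph(5), nx.cycle_graph(5)))
```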
Node Embeddings: Encoder and Decoder
Random Walk for Node Embeddings
Random Walk Optimization & Stochastic Gradient Descent
Stochastic Gradient Descent
Perozzi 2014
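A DeepWalk-style sketch: sample uniform random walks, then fit skip-gram with negative sampling (gensim's Word2Vec is assumed available and runs the SGD internally; the hyperparameters are illustrative):

```python
import random
import networkx as nx
from gensim.models import Word2Vec  # assumes gensim >= 4 is installed

def random_walk(G, start, length=10):
    walk = [start]
    for _ in range(length - 1):
        nbrs = list(G.neighbors(walk[-1]))
        if not nbrs:
            break
        walk.append(random.choice(nbrs))
    return [str(v) for v in walk]  # Word2Vec expects string tokens

G = nx.karate_club_graph()
walks = [random_walk(G, v) for v in G for _ in range(10)]  # 10 walks per node

# Skip-gram with negative sampling, optimized by SGD inside gensim
model = Word2Vec(walks, vector_size=64, window=5, sg=1, negative=5, min_count=0)
z = model.wv[str(0)]  # 64-dim embedding of node 0
```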
Node2vec: Biased Walks
Grover 2016
Breadth First Search
Depth First Search
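A sketch of the second-order biased walk: the return parameter p and in-out parameter q set unnormalized transition weights 1/p (back to the previous node), 1 (a neighbor of the previous node, BFS-like), and 1/q (further away, DFS-like):

```python
import random
import networkx as nx

def biased_step(G, prev, cur, p=1.0, q=1.0):
    """One node2vec step: weight 1/p to return, 1 to stay near, 1/q to explore."""
    nbrs = list(G.neighbors(cur))
    weights = []
    for x in nbrs:
        if x == prev:                 # distance 0 from prev: return
            weights.append(1.0 / p)
        elif G.has_edge(x, prev):     # distance 1 from prev: BFS-like
            weights.append(1.0)
        else:                         # distance 2 from prev: DFS-like
            weights.append(1.0 / q)
    return random.choices(nbrs, weights=weights)[0]

def node2vec_walk(G, start, length=10, p=1.0, q=1.0):
    walk = [start, random.choice(list(G.neighbors(start)))]
    while len(walk) < length:
        walk.append(biased_step(G, walk[-2], walk[-1], p, q))
    return walk

G = nx.karate_club_graph()
print(node2vec_walk(G, 0, p=0.25, q=4.0))  # low p, high q => local, BFS-like walk
```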
Embedding Entire Graphs
Approach 3: Anonymous Walk Embeddings
Anonymous Walk Embeddings
Ivanov 2018
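A sketch under simple assumptions: an anonymous walk keeps only the first-occurrence index of each node, and a graph embedding is the empirical distribution over anonymous walk types of a fixed length (the sample count is illustrative):

```python
import random
from collections import Counter
import networkx as nx

def anonymize(walk):
    """Map e.g. [a, b, a, c] -> (0, 1, 0, 2): first-occurrence indices only."""
    first_seen = {}
    return tuple(first_seen.setdefault(v, len(first_seen)) for v in walk)

def anonymous_walk_embedding(G, walk_len=4, num_samples=10000, seed=0):
    rng = random.Random(seed)
    counts = Counter()
    nodes = list(G)
    for _ in range(num_samples):
        walk = [rng.choice(nodes)]
        for _ in range(walk_len - 1):
            walk.append(rng.choice(list(G.neighbors(walk[-1]))))
        counts[anonymize(walk)] += 1
    # Embedding = probability of each anonymous walk type
    return {w: c / num_samples for w, c in counts.items()}

print(anonymous_walk_embedding(nx.karate_club_graph()))
```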
PageRank Flow Model
PageRank: Power Iteration Method
PageRank: Random Teleports
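A numpy sketch of power iteration on the teleport-augmented flow equation r = βMr + (1 − β)/N; it assumes a column-stochastic M with no dead ends:

```python
import numpy as np

def pagerank(A, beta=0.85, tol=1e-10):
    """Power iteration for r = beta * M r + (1 - beta) / N.
    A[i, j] = 1 if there is a link j -> i; columns of M are out-degree-normalized.
    Assumes no dead ends (every node has at least one out-link)."""
    N = A.shape[0]
    M = A / A.sum(axis=0, keepdims=True)   # column-stochastic transition matrix
    r = np.full(N, 1.0 / N)
    while True:
        r_new = beta * M @ r + (1.0 - beta) / N
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new

# Tiny 3-page web: 0 -> 1, 1 -> 2, 2 -> 0, 2 -> 1
A = np.array([[0, 0, 1],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
print(pagerank(A))  # ranks sum to 1
```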
Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec (Qiu 2018)
A: adjacency matrix
Model can be of arbitrary depth
Unsupervised Training
Supervised Training
Graph Convolutional Networks (GCN), Kipf 2017
Adjacency matrix
Degree matrix
Input-to-hidden weight matrix with H feature maps
Hidden-to-output weight matrix with F output channels
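A numpy sketch of the layer rule H' = σ(D̃^{-1/2}(A + I)D̃^{-1/2} H W); the weights are random toy values (H = 4 hidden feature maps, F = 2 output channels), and the final softmax and training loop are omitted:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer (Kipf 2017): H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W)."""
    A_tilde = A + np.eye(A.shape[0])            # add self-loops
    d = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(d ** -0.5)             # inverse-sqrt degree matrix
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt   # symmetric normalization
    return np.maximum(A_hat @ H @ W, 0.0)       # ReLU

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
X = rng.normal(size=(3, 5))     # C = 5 input features per node
W0 = rng.normal(size=(5, 4))    # input-to-hidden, H = 4 feature maps
W1 = rng.normal(size=(4, 2))    # hidden-to-output, F = 2 output channels
logits = gcn_layer(A, gcn_layer(A, X, W0), W1)  # 2-layer GCN, shape (3, 2)
```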
Citation networks: Citeseer, Cora and Pubmed
Features: bag-of-words feature vectors
Label: each document has one class label (Citeseer 6, Cora 7, Pubmed 3)
NELL: a dataset extracted from a knowledge graph
Separate relation nodes r1 and r2 are assigned for each entity pair: (e1, r, e2) → (e1, r1), (e2, r2)
Features: bag-of-words feature vectors
Label: node label
Propagation of feature information from neighboring nodes in every
layer improves classification performance in comparison to previous
methods, where only label information is aggregated.
Results for all other baseline methods are taken from the Planetoid paper
(Yang et al., 2016). Planetoid* denotes the best model for the respective
dataset out of the variants presented in their paper.
GraphSAGE (Hamilton 2018)
Citation data: predicting paper subject categories (6) on a large citation dataset
Features: (i) node degrees; (ii) word embeddings of abstracts
Reddit data: predicting which community different Reddit posts belong to
Posts were connected if the same user commented on both
Features: (i) embedding of the title; (ii) embedding of the comments; (iii) post's score; (iv) number of comments
Protein-protein interactions: predicting protein roles (121 labels) in terms of their cellular functions
Features: (i) positional gene sets; (ii) motif gene sets; (iii) immunological signatures
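A sketch of a GraphSAGE mean-aggregator layer with neighbor sampling (toy adjacency and dimensions assumed); sampling a fixed-size neighborhood is what lets the method scale and generalize to unseen nodes:

```python
import numpy as np

def sage_layer(adj, H, W, num_samples=5, seed=0):
    """GraphSAGE mean-aggregator layer: h_v' = ReLU(W @ [h_v || mean(h_N(v))]).
    adj maps node -> neighbor list; a fixed-size neighbor sample is used, not all."""
    rng = np.random.default_rng(seed)
    out = []
    for v in range(H.shape[0]):
        nbrs = adj[v]
        sampled = rng.choice(nbrs, size=min(num_samples, len(nbrs)), replace=False)
        h_nbr = H[sampled].mean(axis=0)          # AGGREGATE from sampled neighbors
        h_cat = np.concatenate([H[v], h_nbr])    # concat self and neighborhood
        out.append(np.maximum(W @ h_cat, 0.0))   # ReLU
    return np.stack(out)

adj = {0: [1, 2], 1: [0], 2: [0]}
H = np.random.default_rng(1).normal(size=(3, 8))    # 8-dim input node features
W = np.random.default_rng(2).normal(size=(16, 16))  # maps concat (2*8) -> 16 dims
print(sage_layer(adj, H, W).shape)                  # (3, 16)
```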
PinSage (Ying 2018)
I (2 billion pins) ↔ C (1 billion boards)
Method | Hit-rate
Visual embeddings (4,096 dim, from CNN) | 17%
Annotation embeddings (256 dim, title & description → Word2Vec) | 14%
Combined embeddings (2-layer MLP on visual and annotation embeddings) | 27%
Pixie (random-walk-based, closeness only from graph structure) | -
PinSage (graph convolution with visual and annotation features) | 67%
Hit-rate: the probability that a positive sample is ranked in the top 500 out of the 5M negative samples
Importance pooling: neighbor features are weighted by random-walk similarity during aggregation, leading to a 46% performance gain in offline evaluation metrics.
Curriculum training: the algorithm is fed harder and harder negative examples (ranked by PageRank score) during training, resulting in a 12% performance gain.
Pinterest
A/B tests show 30% to 100% improvements in user engagement across various settings after deploying PinSage.
Graph Attention Networks (GAT), Velickovic 2018
Multi-head attention with K heads:
• Concatenation (hidden layers)
• Averaging (final layer)
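A numpy sketch of one attention head, e_ij = LeakyReLU(aᵀ[Wh_i ‖ Wh_j]) with a softmax over each node's neighborhood (plus self), followed by the two multi-head combinations (toy shapes assumed):

```python
import numpy as np

def gat_head(A, H, W, a, alpha=0.2):
    """One GAT attention head: e_ij = LeakyReLU(a^T [W h_i || W h_j])."""
    Wh = H @ W                                       # (N, F')
    N = A.shape[0]
    mask = A + np.eye(N)                             # attend to neighbors + self
    e = np.full((N, N), -np.inf)
    for i in range(N):
        for j in range(N):
            if mask[i, j] > 0:
                s = a @ np.concatenate([Wh[i], Wh[j]])
                e[i, j] = s if s > 0 else alpha * s  # LeakyReLU
    att = np.exp(e - e.max(axis=1, keepdims=True))
    att /= att.sum(axis=1, keepdims=True)            # softmax over neighborhood
    return att @ Wh

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
H = rng.normal(size=(3, 5))
heads = [gat_head(A, H, rng.normal(size=(5, 4)), rng.normal(size=8))
         for _ in range(3)]                          # K = 3 heads
hidden = np.concatenate(heads, axis=1)               # hidden layers: concatenate
final = np.mean(heads, axis=0)                       # final layer: average
```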
GitHub
For the inductive task, evaluation used two unseen test graphs.
He 2015
You 2021
Ying 2018
DeepSNAP provides core modules for this pipeline
GraphGym further implements the full pipeline to facilitate GNN design
Classification Loss
Regression Loss
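A PyTorch sketch of the two prediction-head losses (names and shapes are illustrative):

```python
import torch
import torch.nn.functional as F

node_emb = torch.randn(4, 16)        # embeddings from the final GNN layer

# Classification head: cross-entropy over class logits
cls_head = torch.nn.Linear(16, 3)    # 3 classes
labels = torch.tensor([0, 2, 1, 0])
cls_loss = F.cross_entropy(cls_head(node_emb), labels)

# Regression head: mean-squared error against real-valued targets
reg_head = torch.nn.Linear(16, 1)
targets = torch.randn(4, 1)
reg_loss = F.mse_loss(reg_head(node_emb), targets)
```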
Xu 2019
Block Diagonal Matrices
Basis Learning
TransE
Relation Patterns
Bordes 2013
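A sketch of the TransE score f(h, r, t) = −‖h + r − t‖ with a margin ranking loss against a corrupted triple; the entities and dimensions are toy values:

```python
import numpy as np

def transe_score(h, r, t):
    """TransE: plausible triples satisfy h + r ~= t, so the distance is small."""
    return -np.linalg.norm(h + r - t)

def margin_loss(pos, neg, gamma=1.0):
    """Margin ranking loss: push positive scores above negatives by gamma."""
    return max(0.0, gamma - pos + neg)

rng = np.random.default_rng(0)
ent = {e: rng.normal(size=16) for e in ["paris", "france", "tokyo"]}
rel = {"capital_of": rng.normal(size=16)}

pos = transe_score(ent["paris"], rel["capital_of"], ent["france"])
neg = transe_score(ent["tokyo"], rel["capital_of"], ent["france"])  # corrupted head
print(margin_loss(pos, neg))
```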
Lin 2015
Trouillon 2016
KG Embeddings in Practice
Path Queries
Ren 2020
z-score: classifies subgraph “significance”
• Negative: under-representation
• Positive: over-representation
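A numpy sketch, with the motif counts assumed to be given: z_i = (N_i^real − mean_i^rand) / std_i^rand over degree-matched random graphs, and the normalized z-vector is the significance profile:

```python
import numpy as np

def motif_z_scores(real_counts, rand_counts):
    """z_i = (N_i^real - mean_i^rand) / std_i^rand; positive = over-represented."""
    rand = np.asarray(rand_counts)      # shape: (num_random_graphs, num_motifs)
    z = (np.asarray(real_counts) - rand.mean(axis=0)) / rand.std(axis=0)
    sp = z / np.linalg.norm(z)          # significance profile (SP)
    return z, sp

# Toy numbers: motif counts in the real graph vs. 3 degree-matched random graphs
real = [120, 40, 7]
rand = [[80, 55, 9], [90, 50, 8], [85, 60, 10]]
z, sp = motif_z_scores(real, rand)
print(z)  # motif 0 over-represented (+), motifs 1-2 under-represented (-)
```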
Yang 2013
Clustering Coefficient
Watts 1998
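A networkx sketch of the Watts-Strogatz model: a little rewiring keeps clustering near the lattice value while the average path length collapses (parameters are illustrative):

```python
import networkx as nx

# Ring lattice (p=0), small-world (small p), fully rewired (p=1)
for p in [0.0, 0.05, 1.0]:
    G = nx.connected_watts_strogatz_graph(n=1000, k=10, p=p, seed=0)
    print(f"p={p}: clustering={nx.average_clustering(G):.3f}, "
          f"avg path length={nx.average_shortest_path_length(G):.2f}")
# Small p: clustering stays high (lattice-like) while path length drops sharply
```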
EPINIONS: who-trusts-whom social network
Leskovec 2010
You 2018
You 2019
92
You 2021
Wu 2019
The accuracy of Simple Graph Convolution (SGC) is similar to GCN’s
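A sketch of SGC's simplification: with the nonlinearities between layers removed, the K propagation steps collapse into a precomputed S^K X, and training reduces to logistic regression (random toy graph; scikit-learn assumed available):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sgc_features(A, X, K=2):
    """Precompute S^K X with S = D^{-1/2} (A + I) D^{-1/2} (no nonlinearities)."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(A_tilde.sum(axis=1) ** -0.5)
    S = d_inv_sqrt @ A_tilde @ d_inv_sqrt
    return np.linalg.matrix_power(S, K) @ X

rng = np.random.default_rng(0)
A = (rng.random((20, 20)) < 0.2).astype(float)
A = np.triu(A, 1); A = A + A.T     # random undirected toy graph
X = rng.normal(size=(20, 8))
y = rng.integers(0, 2, size=20)

# All "graph convolution" work is precomputed; what remains is logistic regression
clf = LogisticRegression().fit(sgc_features(A, X), y)
```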
Zitnik 2018
Alsentzer 2020; Li 2021
Gysi 2021
CS224W Colab
• Colab 0: NetworkX, PyTorch Geometric, & GCN
• Colab 1: Node Embeddings
• Colab 2: GCN Implementation
• Colab 3: GraphSAGE Implementation
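In the spirit of Colabs 0 and 2, a minimal 2-layer GCN on Cora with PyTorch Geometric (assumes torch and torch_geometric are installed; hyperparameters follow common defaults, not the Colabs exactly):

```python
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv

data = Planetoid(root="/tmp/Cora", name="Cora")[0]  # one graph with train/test masks

class GCN(torch.nn.Module):
    def __init__(self, in_dim, hid_dim, num_classes):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hid_dim)
        self.conv2 = GCNConv(hid_dim, num_classes)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        x = F.dropout(x, p=0.5, training=self.training)
        return self.conv2(x, edge_index)

model = GCN(data.num_node_features, 16, 7)          # Cora has 7 classes
opt = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
for _ in range(200):
    opt.zero_grad()
    out = model(data.x, data.edge_index)
    loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    opt.step()
```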
