Introduction to Graph Neural Networks
Non-Euclidean data structures
Successful deep learning architectures operate on Euclidean, grid-like data:
1. Convolutional neural networks, e.g., object detection
   (figure: https://medium.com/comet-app/review-of-deep-learning-algorithms-for-object-detection-c1f3d437b852)
2. Recurrent neural networks, e.g., sentiment analysis
Many important data, however, are non-Euclidean graphs: social graphs (Facebook, Wikipedia), 3D meshes, and molecular graphs.
$\mathrm{Graph} = G(X, A)$
X : nodes (vertices)
- Individual people in a social network
- Atoms in a molecule
A : adjacency matrix
- Edges of the graph
- Connectivity, relationships
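As a concrete illustration, a graph $G(X, A)$ can be stored as two arrays. The NumPy sketch below is minimal; the 4-node graph and 2-dimensional features are assumptions for illustration, not data from the slides.

```python
import numpy as np

# X: one feature vector per node (rows = nodes), e.g., one-hot atom types
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [0.0, 0.0]])

# A: adjacency matrix, A[i, j] = 1 if nodes i and j are connected
A = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 1.],
              [0., 1., 0., 0.],
              [1., 1., 0., 0.]])
```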
Example node features in a social graph:
• Donald Trump: 72 years old, male, American, president and businessman, ...
• Vladimir Putin: 65 years old, male, Russian, president, ...
Edge features describe the graph structure and relationships, e.g., an aromatic bond in a molecular graph.
• Node classification
• Link prediction
Input node features $H_i^{(0)}$ : raw node information
Final node states $H_i^{(L)}$ : how the NN recognizes the nodes
Graph features $Z$
Principles of graph neural networks
Updates in a graph neural network aggregate each node's neighborhood, in analogy with a convolution over a 1D window:
$$X_i^{(l+1)} = \sigma\Big(\sum_{j \in [i-k,\, i+k]} W_j^{(l)} X_j^{(l)} + b_j^{(l)}\Big)$$
What the NN learns: the weights $W_j^{(l)}$ and biases $b_j^{(l)}$.
GCN : Graph Convolutional Network
Example: a 4-node graph where node 2 is connected to nodes 1, 3, and 4 (and to itself via a self-loop):
$$H_2^{(l+1)} = \sigma\big(H_1^{(l)} W^{(l)} + H_2^{(l)} W^{(l)} + H_3^{(l)} W^{(l)} + H_4^{(l)} W^{(l)}\big)$$
GCN : Graph Convolutional Network
$$H_i^{(l+1)} = \sigma\Big(\sum_{j \in N(i)} H_j^{(l)} W^{(l)}\Big) \quad \Longleftrightarrow \quad H^{(l+1)} = \sigma\big(A H^{(l)} W^{(l)}\big) \;\text{(matrix form)}$$
What the NN learns: the shared convolution weight $W^{(l)}$.
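A minimal NumPy sketch of this layer, $H^{(l+1)} = \sigma(\hat{A} H^{(l)} W^{(l)})$. The random graph, the ReLU choice for $\sigma$, and all variable names are illustrative assumptions.

```python
import numpy as np

def gcn_layer(A_hat, H, W):
    """One graph convolution: sum neighbor states (via A_hat), then transform."""
    return np.maximum(A_hat @ H @ W, 0.0)   # sigma = ReLU here

N, F_in, F_out = 4, 2, 8
A = np.random.randint(0, 2, (N, N))
A = np.triu(A, 1)
A = A + A.T                              # random symmetric adjacency matrix
A_hat = A + np.eye(N)                    # self-loop so node i enters its own sum over N(i)
H0 = np.random.randn(N, F_in)            # input node features H^(0)
W0 = np.random.randn(F_in, F_out) * 0.1  # learnable weight W^(0)
H1 = gcn_layer(A_hat, H0, W0)            # H^(1), shape (N, F_out)
```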
GAT : Graph Attention Network
• Attention revisited
What the NN learns: the convolution weight and the attention coefficients
$$\text{GAT:} \quad H_i^{(l+1)} = \sigma\Big(\sum_{j \in N(i)} \alpha_{ij}^{(l)} H_j^{(l)} W^{(l)}\Big)$$
Velickovic, Petar, et al. "Graph attention networks." arXiv preprint arXiv:1710.10903 (2017).
GAT : Graph Attention Network
• The attention mechanism in natural language processing scores the relevance between hidden states, e.g.
$$score(\boldsymbol{h}_t, \boldsymbol{h}_s) = \beta \cdot \boldsymbol{h}_t^T \boldsymbol{h}_s \qquad \text{or} \qquad score(\boldsymbol{h}_t, \boldsymbol{h}_s) = \boldsymbol{h}_t^T W_a \boldsymbol{h}_s$$
https://devblogs.nvidia.com/introduction-neural-machine-translation-gpus-part-3/figure6_sample_translations-2/
http://www.wildml.com/2016/01/attention-and-memory-in-deep-learning-and-nlp/
What the NN learns: the convolution weight and the attention coefficients
$$H_i^{(l+1)} = \sigma\Big(\sum_{j \in N(i)} \alpha_{ij} H_j^{(l)} W^{(l)}\Big), \qquad \alpha_{ij} = f(H_i W, H_j W)$$
One concrete choice of the coefficient function:
$$\alpha_{ij} = \tanh\big((H_i W)^T C (H_j W)\big)$$
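A NumPy sketch of this bilinear coefficient, computed for all node pairs at once; the dimensions and the matrix $C$ are illustrative assumptions.

```python
import numpy as np

def attention_coeffs(H, W, C):
    HW = H @ W                        # transformed node states, shape (N, F')
    return np.tanh(HW @ C @ HW.T)     # alpha[i, j] = tanh((H_i W)^T C (H_j W))

N, F_in, F_out = 4, 2, 8
H = np.random.randn(N, F_in)
W = np.random.randn(F_in, F_out)      # shared convolution weight
C = np.random.randn(F_out, F_out)     # learnable attention matrix
alpha = attention_coeffs(H, W, C)     # (N, N) attention coefficients
```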
GAT : Graph Attention Network
• Multi-head attention
• Average:
$$H_i^{(l+1)} = \sigma\Big(\frac{1}{K} \sum_{k=1}^{K} \sum_{j \in N(i)} \alpha_{ij}^{(k)} H_j^{(l)} W^{(k)}\Big)$$
• Concatenation:
$$H_i^{(l+1)} = \big\Vert_{k=1}^{K} \; \sigma\Big(\sum_{j \in N(i)} \alpha_{ij}^{(k)} H_j^{(l)} W^{(k)}\Big)$$
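A sketch of the two head-combination schemes; each element of `head_preacts` stands for one head's pre-activation $\sum_{j} \alpha_{ij}^{(k)} H_j^{(l)} W^{(k)}$, and all names are illustrative.

```python
import numpy as np

def sigma(x):
    return np.maximum(x, 0.0)   # ReLU stands in for the nonlinearity

def combine_heads(head_preacts, mode="average"):
    """head_preacts: list of K arrays of shape (N, F')."""
    if mode == "average":       # nonlinearity applied after averaging over heads
        return sigma(np.mean(head_preacts, axis=0))
    # concatenation: nonlinearity applied per head, then stack along features
    return np.concatenate([sigma(h) for h in head_preacts], axis=-1)

heads = [np.random.randn(4, 8) for _ in range(3)]    # K = 3 heads
print(combine_heads(heads, "average").shape)          # (4, 8)
print(combine_heads(heads, "concat").shape)           # (4, 24)
```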
More generally, message passing updates take the form
$$m_i^{(l+1)} = \sum_{j \in N(i)} M^{(l+1)}\big(h_i^{(l)}, h_j^{(l)}, e_{ij}\big), \qquad h_i^{(l+1)} = U\big(h_i^{(l)}, m_i^{(l+1)}\big)$$
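A minimal sketch of one such step; the message function $M$ and update function $U$ are reduced to simple linear maps here, an illustrative simplification rather than the form used in any particular paper.

```python
import numpy as np

def mpnn_step(h, edges, e, W_msg, W_upd):
    """h: (N, F) node states; edges: list of (i, j); e[k]: feature of edge k."""
    m = np.zeros_like(h)
    for k, (i, j) in enumerate(edges):
        # message function M: a linear map of the neighbor state and edge feature
        m[i] += np.concatenate([h[j], e[k]]) @ W_msg
    # update function U: a linear map of [h_i, m_i] with a tanh nonlinearity
    return np.tanh(np.concatenate([h, m], axis=-1) @ W_upd)

N, F, Fe = 4, 8, 3
h = np.random.randn(N, F)
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]         # directed edge list
e = np.random.randn(len(edges), Fe)               # edge features e_ij
W_msg = np.random.randn(F + Fe, F) * 0.1
W_upd = np.random.randn(2 * F, F) * 0.1
h_next = mpnn_step(h, edges, e, W_msg, W_upd)     # updated states, (N, F)
```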
Input node features $H_i^{(0)}$ : raw node information
Final node states $H_i^{(L)}$ : how the NN recognizes the nodes
Graph features $Z$
Readout : permutation invariance under reordering of the nodes
Figure from "Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction" (ResearchGate):
https://www.researchgate.net/Graph-permutation-invariance-and-structured-prediction-A-graph-labeling-function-F-is_fig1_323217335 [accessed 8 Sep, 2018]
• Graph feature
$$z_G = f\big(\{H_i^{(L)}\}\big)$$
• Node-wise summation
$$z_G = \tau\Big(\sum_{i \in G} \mathrm{MLP}\big(H_i^{(L)}\big)\Big)$$
• Graph gathering
$$z_G = \tau\Big(\sum_{i \in G} \sigma\Big(\mathrm{MLP}_1\big([H_i^{(L)}, H_i^{(0)}]\big)\Big) \odot \mathrm{MLP}_2\big(H_i^{(L)}\big)\Big)$$
where $\tau$ is a ReLU activation and $\sigma$ is a sigmoid activation.
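A sketch of the node-wise summation readout; the MLP is reduced to one linear layer for brevity. The final assertion shows why summation gives permutation invariance.

```python
import numpy as np

def readout_sum(H_final, W_mlp):
    """Permutation-invariant readout: per-node MLP, sum over nodes, then tau."""
    per_node = H_final @ W_mlp                     # MLP(H_i^(L)) for all nodes at once
    return np.maximum(per_node.sum(axis=0), 0.0)   # tau = ReLU over the node sum

H_L = np.random.randn(4, 8)                        # final node states
W = np.random.randn(8, 16)
z_G = readout_sum(H_L, W)                          # graph feature, shape (16,)
# Reordering the nodes leaves z_G unchanged:
assert np.allclose(z_G, readout_sum(H_L[::-1], W))
```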
  • Clustering
  • Link prediction
  • Matrix completion and recommendation
Kipf, Thomas N., and Max Welling. "Variational graph auto-encoders." arXiv preprint arXiv:1611.07308 (2016). https://github.com/tkipf/gae
Graph Auto-Encoders (GAE)
Encoder: maps the node features and adjacency matrix to latent node embeddings $Z$
Decoder: reconstructs the adjacency matrix, $\hat{A} = \sigma(Z Z^T)$
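A sketch of the inner-product decoder, assuming $Z$ holds one embedding per node as produced by the encoder.

```python
import numpy as np

def decode(Z):
    """GAE inner-product decoder: edge probability for every node pair."""
    logits = Z @ Z.T                       # pairwise inner products z_i^T z_j
    return 1.0 / (1.0 + np.exp(-logits))   # sigmoid -> reconstructed A_hat

Z = np.random.randn(5, 16)                 # 5 nodes, 16-dim embeddings
A_hat = decode(Z)                          # (5, 5) matrix of edge probabilities
```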
Practical Issues : Inception
Inception module
https://towardsdatascience.com/an-intuitive-guide-to-deep-network-architectures-65fdc477db41
Stacking graph convolutions composes neighborhoods, layer by layer:
$$H^{(1)} = \sigma\big(A H^{(0)} W^{(0)}\big), \qquad H^{(2)} = \sigma\big(A H^{(1)} W^{(1)}\big)$$
Going deeper!
Practical Issues : Skip-connection
ResNet – Winner of 2015 ImageNet Challenge
$$y = H_i^{(l+1)} + H_i^{(l)}$$
                                             https://towardsdatascience.com/an-intuitive-guide-to-deep-network-architectures-65fdc477db41
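A sketch of a graph convolution layer with this skip-connection; it assumes the input and output feature dimensions match so the residual sum is well-defined.

```python
import numpy as np

def gcn_residual_layer(A_hat, H, W):
    """y = H^(l+1) + H^(l): graph convolution plus an identity shortcut."""
    H_next = np.maximum(A_hat @ H @ W, 0.0)   # sigma(A H W)
    return H_next + H                          # skip-connection

N, F = 4, 8
A_hat = np.eye(N)                              # trivial graph for illustration
H = np.random.randn(N, F)
W = np.random.randn(F, F) * 0.1                # square so dimensions match
y = gcn_residual_layer(A_hat, H, W)
```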
Practical Issues : Dropout
With dropout rate $p = 0.25$, dropping hidden units in a dense network reduces the effective number of weights from $N_w$ to $N_w (1-p)^2$, since a weight survives only if both of its endpoint units are kept.
Hidden states in a graph neural network carry per-node features:
Graph Conv. : $H^{(l+1)} = A H^{(l)} W$
    Age:          28        40                36           ...
    Sex:          M         F                 F            ...
    Job:          Student   Medical Doctor    Politician   ...
    Nationality:  Korean    American          French       ...
Dropout randomly zeroes some of these hidden-state entries at each training step.
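A sketch of (inverted) dropout applied to hidden node states, with the slide's rate $p = 0.25$; rescaling by $1/(1-p)$ keeps the expected activation unchanged.

```python
import numpy as np

def dropout(H, p=0.25, training=True):
    """Randomly zero a fraction p of hidden-state entries during training."""
    if not training:
        return H
    mask = (np.random.rand(*H.shape) >= p).astype(H.dtype)
    return H * mask / (1.0 - p)   # inverted dropout: rescale survivors

H = np.random.randn(4, 8)
H_dropped = dropout(H, p=0.25)
```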
• Molecular Applications
 1. Neural molecular fingerprint
 2. Quantitative Structure-Property Relationship (QSPR)
 3. Molecular generative model
Karate club graph, colors denote communities obtained via modularity-based clustering (Brandes et al., 2008); GCN embedding (with random weights) for nodes in the karate club network.
• All figures and descriptions are taken from Thomas N. Kipf's blog.
• Watch the video on his blog.
Kipf, Thomas N., and Max Welling. "Semi-supervised classification with graph convolutional networks." arXiv preprint arXiv:1609.02907 (2016). http://tkipf.github.io/graph-convolutional-networks/
Network analysis
1. Node classification
  • Good node features → good node classification results
Renormalization trick:
$$I_N + D^{-1/2} A D^{-1/2} \;\rightarrow\; \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}, \qquad \tilde{A} = A + I_N, \quad \tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$$
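A NumPy sketch of this trick: add self-loops, recompute degrees, and symmetrically normalize.

```python
import numpy as np

def renormalize(A):
    """A_tilde = A + I, D_tilde = diag(degrees); return D~^{-1/2} A~ D~^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)               # degrees after adding self-loops
    D_inv_sqrt = np.diag(d ** -0.5)
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

A = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])
A_norm = renormalize(A)   # used as A_hat in H^(l+1) = sigma(A_hat H W)
```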
  • Clustering
  • Link prediction
  • Matrix completion and recommendation
Kipf, Thomas N., and Max Welling. "Variational graph auto-encoders." arXiv preprint arXiv:1611.07308 (2016). https://github.com/tkipf/gae
Network analysis
2. Link prediction
Encoder: latent node embeddings $Z$
Decoder: $\hat{A} = g(Z)$, which takes pairs of node embeddings $(z_i, z_j)$ and predicts the respective entries $\hat{A}_{ij}$, e.g., the inner-product decoder $\hat{A} = \sigma(Z Z^T)$.
Encoder
$$U, V = f(X, M_1, \ldots, M_R), \quad \text{where } M_r \in \{0,1\}^{N_u \times N_v} \text{ is the adjacency matrix associated with rating type } r \in \mathcal{R}$$
$$\mu_{j \to i, r} = \frac{1}{c_{ij}} W_r x_j \;:\; \text{message function from item } j \text{ to user } i, \qquad c_{ij} = \sqrt{|\mathcal{N}_i| |\mathcal{N}_j|}$$
Decoder
$$p(M_{ij} = r) = \frac{e^{u_i^T Q_r v_j}}{\sum_{s \in \mathcal{R}} e^{u_i^T Q_s v_j}} \;:\; \text{probability that rating } M_{ij} \text{ is } r$$
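A sketch of this bilinear decoder; $Q$ is assumed to be a stack of $R$ learnable matrices, one per rating type.

```python
import numpy as np

def rating_probs(u_i, v_j, Q):
    """p(M_ij = r) = softmax_r(u_i^T Q_r v_j) over the R rating types."""
    scores = np.array([u_i @ Q_r @ v_j for Q_r in Q])
    e = np.exp(scores - scores.max())         # numerically stable softmax
    return e / e.sum()

d, R = 8, 5                                    # embedding size, number of ratings
u_i, v_j = np.random.randn(d), np.random.randn(d)
Q = np.random.randn(R, d, d)                   # one Q_r per rating type
p = rating_probs(u_i, v_j, Q)                  # sums to 1 over the R ratings
```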
Loss function: the negative log-likelihood of the observed ratings,
$$\mathcal{L} = -\sum_{(i,j):\,\Omega_{ij}=1} \; \sum_{r=1}^{R} I[M_{ij} = r] \, \log p(M_{ij} = r)$$
Example: the Tox21 challenge dataset
• 12 types of toxicity
• Molecular structures (represented as SMILES) and toxicity labels are given
• But the dataset is too small to train a deep learning model from scratch
https://tripod.nih.gov/tox21/challenge/data.jsp
Molecular applications
1. Neural molecular fingerprint
                                               https://mc.ai/machine-learning-for-drug-discovery-in-a-nutshell%E2%80%8A-%E2%80%8Apart-ii/
Such conventional molecular fingerprints can be easily obtained with open-source packages, e.g., RDKit.
                                           http://kehang.github.io/basic_project/2017/04/18/machine-learning-in-molecular-property-prediction/
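For example, a Morgan (ECFP-like) fingerprint takes a few lines with RDKit; the SMILES string and the radius/nBits settings below are illustrative choices.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles("c1ccccc1O")   # phenol, as an example molecule
fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
bits = list(fp)                          # 2048-dim binary fingerprint
print(sum(bits), "bits set")
```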
Recently, neural fingerprints generated by graph convolutional networks have been widely used for more accurate molecular property prediction.
Sorting molecule pairs by the latent distance $\lVert Z_i - Z_j \rVert^2$ in ascending order shows that similar molecules are located close together in the graph latent space.
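A sketch of that ranking, assuming `Z` holds one graph feature vector per molecule:

```python
import numpy as np

def most_similar(Z, i):
    """Indices of molecules ordered by squared latent distance to molecule i."""
    d2 = ((Z - Z[i]) ** 2).sum(axis=1)   # ||Z_i - Z_j||^2 for every j
    return np.argsort(d2)                 # ascending order: most similar first

Z = np.random.randn(100, 32)              # 100 molecules, 32-dim graph features
ranking = most_similar(Z, 0)              # ranking[0] == 0 (the molecule itself)
```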
Molecular applications
3. Molecular generative model
Motivation : de novo molecular design
Segler, Marwin HS, et al. "Generating focused molecule libraries for drug discovery with recurrent neural networks." ACS central science 4.1 (2017): 120-131.
Literature
• Li, Y., Vinyals, O., Dyer, C., Pascanu, R., & Battaglia, P. (2018). Learning deep generative models of graphs. arXiv preprint arXiv:1803.03324.
• Jin, Wengong, Regina Barzilay, and Tommi Jaakkola. "Junction Tree Variational Autoencoder for Molecular Graph Generation." arXiv preprint arXiv:1802.04364 (2018).
• Liu, Qi, et al. "Constrained Graph Variational Autoencoders for Molecule Design." arXiv preprint arXiv:1805.09076 (2018).
• Simonovsky, Martin, and Nikos Komodakis. "GraphVAE: Towards Generation of Small Graphs Using Variational Autoencoders." arXiv preprint arXiv:1802.03480 (2018).
• De Cao, Nicola, and Thomas Kipf. "MolGAN: An implicit generative model for small molecular graphs." arXiv preprint arXiv:1805.11973 (2018).
• ...
Molecular applications
3. Molecular generative model
1. Sample whether to add a new node of a particular type or to terminate; if a node type is chosen, continue
2. Add a node of this type to the graph
3. Check whether any further edges are needed to connect the new node to the existing graph
4. If yes, select a node in the graph and add an edge connecting the new node to the selected node
• If a node is added, decide whether to add edges between the new node and the existing nodes (see the sketch below).
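A high-level sketch of this loop, after Li et al. (2018); `f_addnode`, `f_addedge`, and `f_nodes` are placeholders for the learned modules that make each decision, not actual APIs.

```python
import networkx as nx

def generate_graph(f_addnode, f_addedge, f_nodes):
    """Sequentially grow a graph by the four steps above."""
    G = nx.Graph()
    while True:
        node_type = f_addnode(G)             # step 1: pick a node type, or None to terminate
        if node_type is None:
            return G
        v = G.number_of_nodes()
        G.add_node(v, type=node_type)         # step 2: add the node
        while G.number_of_nodes() > 1 and f_addedge(G, v):   # step 3: more edges needed?
            u = f_nodes(G, v)                 # step 4: score nodes, pick one to connect
            G.add_edge(u, v)
```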
Graph propagation process: update all node states, then read them out to generate a graph feature:
$$\boldsymbol{a}_v = \sum_{(u,v) \in \mathcal{E}} f_e\big(\boldsymbol{h}_u, \boldsymbol{h}_v, \boldsymbol{x}_{u,v}\big) \;\; \forall v \in \mathcal{V}, \qquad \boldsymbol{h}'_v = f_n\big(\boldsymbol{a}_v, \boldsymbol{h}_v\big) \;\; \forall v \in \mathcal{V}$$
Li, Yujia, et al. "Learning deep generative models of graphs." arXiv preprint arXiv:1803.03324 (2018).
$$f_{addedge}(G, v) = \sigma\big(f_{ae}(\boldsymbol{h}_G^{(T)}, \boldsymbol{h}_v^{(T)})\big) \;:\; \text{the probability of adding an edge to the newly created node } v$$
If "yes":
$$f_{nodes}(G, v) = \mathrm{softmax}\big(f_s(\boldsymbol{h}_u^{(T)}, \boldsymbol{h}_v^{(T)})\big) \;:\; \text{score for each node } u \text{ to connect the edge}$$
Li, Yujia, et al. "Learning deep generative models of graphs." arXiv preprint arXiv:1803.03324 (2018).
Objective function: the marginal likelihood $p(G)$ over node permutations $\pi$:
$$p(G) = \sum_{\pi \in \mathcal{P}(G)} p(G, \pi) = \mathbb{E}_{q(\pi|G)}\left[\frac{p(G, \pi)}{q(\pi|G)}\right]$$
Neural relational inference for interacting systems
• Nodes : particles; Edges : (physical) interactions between particle pairs
• Latent code : the underlying interaction graph
$$e \to v :\; \boldsymbol{h}_j^{l+1} = f_v^l\Big(\sum_{i \in \mathcal{N}_j} \boldsymbol{h}_{(i,j)}^l,\, \boldsymbol{x}_j\Big), \qquad \boldsymbol{x}_i : \text{initial node features}, \quad \boldsymbol{x}_{(i,j)} : \text{initial edge features}$$
$$\boldsymbol{h}_{(i,j)}^1 = f_e^1\big(\boldsymbol{h}_i^1, \boldsymbol{h}_j^1\big), \qquad \boldsymbol{h}_j^2 = f_v^1\Big(\sum_{i \neq j} \boldsymbol{h}_{(i,j)}^1\Big), \qquad \boldsymbol{h}_{(i,j)}^2 = f_e^2\big(\boldsymbol{h}_i^2, \boldsymbol{h}_j^2\big)$$
Encoder:
$$q_\phi(z_{ij} \mid x) = \mathrm{softmax}\big(f_{enc,\phi}(x)_{ij,1:K}\big) = \mathrm{softmax}\big(h_{(i,j)}^2\big)$$
Decoder:
$$p_\theta(\boldsymbol{x} \mid \boldsymbol{z}) = \prod_{t=1}^{T} p_\theta\big(\boldsymbol{x}^{t+1} \mid \boldsymbol{x}^t, \ldots, \boldsymbol{x}^1, \boldsymbol{z}\big) = \prod_{t=1}^{T} p_\theta\big(\boldsymbol{x}^{t+1} \mid \boldsymbol{x}^t, \boldsymbol{z}\big), \text{ since the dynamics is Markovian.}$$
Prior: $p(\boldsymbol{z}) = \prod_{i \neq j} p_\theta(\boldsymbol{z}_{ij})$, a factorized uniform distribution over edge types.
•   Bronstein, Michael M., et al. "Geometric deep learning: going beyond euclidean data." IEEE Signal Processing
    Magazine 34.4 (2017): 18-42.
•   [NIPS 2017] Tutorial - Geometric deep learning on graphs and manifolds,
    https://nips.cc/Conferences/2017/Schedule?showEvent=8735
•   Goyal, Palash, and Emilio Ferrara. "Graph embedding techniques, applications, and performance: A survey."
    Knowledge-Based Systems 151 (2018): 78-94., https://github.com/palash1992/GEM
•   Battaglia, Peter W., et al. "Relational inductive biases, deep learning, and graph networks." arXiv preprint
    arXiv:1806.01261 (2018).
•   Awesome Graph Embedding And Representation Learning Papers,
    https://github.com/benedekrozemberczki/awesome-graph-embedding
                                                                                        https://github.com/SeongokRyu/Graph-neural-networks
References
•   Defferrard, Michaël, Xavier Bresson, and Pierre Vandergheynst. "Convolutional neural networks on graphs with fast
    localized spectral filtering." Advances in Neural Information Processing Systems. 2016.
•   Kipf, Thomas N., and Max Welling. "Semi-supervised classification with graph convolutional networks." arXiv preprint
    arXiv:1609.02907 (2016).
•   van den Berg, Rianne, Thomas N. Kipf, and Max Welling. "Graph Convolutional Matrix Completion." stat 1050 (2017): 7.
•   Schlichtkrull, Michael, et al. "Modeling relational data with graph convolutional networks." European Semantic Web
    Conference. Springer, Cham, 2018.
•   Levie, Ron, et al. "Cayleynets: Graph convolutional neural networks with complex rational spectral filters." arXiv preprint
    arXiv:1705.07664 (2017).
                                                                                       https://github.com/SeongokRyu/Graph-neural-networks
References
•   Velickovic, Petar, et al. "Graph attention networks." arXiv preprint arXiv:1710.10903 (2017).
•   Choi, Edward, et al. "GRAM: Graph-based Attention Model for Healthcare Representation Learning." Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2017).
•   Shang, C., Liu, Q., Chen, K. S., Sun, J., Lu, J., Yi, J., & Bi, J. (2018). Edge Attention-based Multi-Relational Graph
    Convolutional Networks. arXiv preprint arXiv:1802.04944.
•   Lee, John Boaz, et al. "Attention Models in Graphs: A Survey." arXiv preprint arXiv:1807.07984 (2018).
•   Li, Yujia, et al. "Gated graph sequence neural networks." arXiv preprint arXiv:1511.05493 (2015).
•   Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., & Dahl, G. E. (2017). Neural message passing for quantum
    chemistry. arXiv preprint arXiv:1704.01212.
                                                                                            https://github.com/SeongokRyu/Graph-neural-networks
References
•   Kipf, Thomas N., and Max Welling. "Variational graph auto-encoders." arXiv preprint arXiv:1611.07308 (2016).
•   Simonovsky, Martin, and Nikos Komodakis. "GraphVAE: Towards Generation of Small Graphs Using Variational
    Autoencoders." arXiv preprint arXiv:1802.03480 (2018).
•   Liu, Qi, et al. "Constrained Graph Variational Autoencoders for Molecule Design." arXiv preprint arXiv:1805.09076
    (2018).
•   Pan, Shirui, et al. "Adversarially Regularized Graph Autoencoder." arXiv preprint arXiv:1802.04407 (2018).
•   Li, Y., Vinyals, O., Dyer, C., Pascanu, R., & Battaglia, P. (2018). Learning deep generative models of graphs. arXiv
    preprint arXiv:1803.03324.
•   Jin, Wengong, Regina Barzilay, and Tommi Jaakkola. "Junction Tree Variational Autoencoder for Molecular Graph
    Generation." arXiv preprint arXiv:1802.04364 (2018).
                                                                                         https://github.com/SeongokRyu/Graph-neural-networks
References
Applications of GNN
 •   Duvenaud, David K., et al. "Convolutional networks on graphs for learning molecular fingerprints." Advances in
     neural information processing systems. 2015.
 •   Kearnes, Steven, et al. "Molecular graph convolutions: moving beyond fingerprints." Journal of computer-aided
     molecular design 30.8 (2016): 595-608.
 •   Schütt, K. T., Arbabzadah, F., Chmiela, S., Müller, K. R., & Tkatchenko, A. (2017). Quantum-chemical insights from
     deep tensor neural networks. Nature communications, 8, 13890.
 •   Wu, Z., Ramsundar, B., Feinberg, E. N., Gomes, J., Geniesse, C., Pappu, A. S., ... & Pande, V. (2018). MoleculeNet: a
     benchmark for molecular machine learning. Chemical Science, 9(2), 513-530.
 •   Feinberg, Evan N., et al. "Spatial Graph Convolutions for Drug Discovery." arXiv preprint arXiv:1803.04465 (2018).
                                                                                         https://github.com/SeongokRyu/Graph-neural-networks
References
Applications of GNN
 •   Jin, Wengong, Regina Barzilay, and Tommi Jaakkola. "Junction Tree Variational Autoencoder for Molecular Graph
     Generation." arXiv preprint arXiv:1802.04364 (2018).
 •   Liu, Qi, et al. "Constrained Graph Variational Autoencoders for Molecule Design." arXiv preprint arXiv:1805.09076
     (2018).
 •   De Cao, Nicola, and Thomas Kipf. "MolGAN: An implicit generative model for small molecular graphs." arXiv preprint
     arXiv:1805.11973 (2018).
 •   Selvan, Raghavendra, et al. "Extraction of Airways using Graph Neural Networks." arXiv preprint arXiv:1804.04436
     (2018).
 •   Kipf, Thomas, et al. "Neural relational inference for interacting systems." arXiv preprint arXiv:1802.04687 (2018).
https://github.com/SeongokRyu/Graph-neural-networks