Mining Graph Data
COURSE INTRODUCTION
Teacher: Le Ngoc Thanh
Email: lnthanh@fit.hcmus.edu.vn
                                  1
Contents
• Introduction of subjects and topics
• Rule
• Networks and graphs
• Graph mining
• Maths and applications
                                        2
About the subject
• Subject name : Mining Graph Data
• Theory/Practice: 45/30.
• Rate of listening to lectures and self-studying: 40/60.
• References :
    • Lecture slides
    • Aggarwal, Charu C., and Haixun Wang, eds. Managing and mining graph
      data. Vol. 40. New York: Springer, 2010.
    • Easley, David, and Jon Kleinberg. "Networks, crowds, and markets:
      Reasoning about a highly connected world." Significance 9 (2012): 43-44.
    • Ketmaneechairat, Hathairat. "Graph Mining Laws, Tools and Case Studies."
      Journal of Digital Information Management 12, no. 6 (2014): 446
    • …
                                                                             3
Subjects
 Week                          Subject
  1     Introduction to Graph Data Mining, Algorithms and
        Applications
  2     Review the knowledge related to graphs and data
        mining
  3     Patterns in static and dynamic graphs; generate
        graph.
  4     Indexing Graph and Ranking
  5     Mining Graph Pattern
  6     Graph Classification
                                                            4
Subjects
 Week                          Subject
  7      Clustering and community detection
  8      Link prediction
  9      Embedding graphs
 10-15   Seminar topics includes: deep learning for graphs,
         graph summarization, recommendation systems,
         anomaly detection, large size graphs.
                                                              5
Contents
• Introduction of subjects and topics
• Rule
• Networks and graphs
• Graph mining
• Maths and applications
                                        6
Regulations and academic assessment
• Theory: 40%
   • Final Exam: 40%
• Lab: 30%
• Seminar Project: 30%
• Regulations:
   • Copy, cheats → not be allowed to take the final exam
   • Actions that disrupt the classroom → not be allowed to take
    the final exam.
                                                             7
Contents
• Introduction of subjects and topics
• Rule
• Networks and graphs
• Graph mining
• Maths and applications
                                        8
Networks and graphs
• Network often used to represent the natural relationship of
 objects in the real world.
• Meanwhile, graphs show the relationship generated through
 the automatic process.
      Social Network                            Graphs          9
Social Network
• Vertices: People
• Edges: Friendships
Communication network
• Vertices: People
• Edges: email exchange
Business network
▪ Vertices: Companies
▪ Edges: relationships (financial, collaboration)
Biological network
▪ Vetices: Proteins     ▪ Vertices: metabolites, enzymes
▪ Edges: interactions   ▪ Edges: chemical reactions
Information network (WWW)
▪ Vertices: Web Pages
▪ Edges: Links
Social network
▪ Vertices: Twitter users
▪ Edges: Follows/conversations
Contents
• Introduction of subjects and topics
• Rule
• Networks and graphs
• Graph mining
• Maths and applications
                                        16
Why is network analysis important?
• The system is connected by many components, if we
 only focus on understanding a single individual, we
 cannot grasp the whole system.
• There are 2 big questions :
   • What are the structural properties of the network?
   • What interactive process is happening in the network?
                                                             17
Research in the net
• In the field of network analysis, people focus on studying
 network behaviors such as human-to-human behavior in
 social networks.
   • Predict behavior based on its measurable properties.
                                                               18
Modeling network with graphs
• Networks are not separate from graphs, they can be
 re-modeled as graphs and take advantage of its
 theoretical foundations.
                                                  19
Mining Network’s Challenges
• Normal graph:
   • Large size, very very large size (massive)
   • Too sparsity/ too density
   • small diameter
   • dynamic
• Requires efficient algorithms for storage and
 computation.
                                                  20
Contents
• Introduction of subjects and topics
• Rule
• Networks and graphs
• Graph mining
• Maths and applications
                                        21
Maths and applications
• Graph matching
                         22
Maths and applications
• Graph matching
           Finding proteins with the same function across
           different species based on their interaction networks
                                                                   23
Maths and applications
• Graph matching
           Using graphs in a network to identify hidden
           identities in a social network
                                                          24
Maths and applications
• Semantic class
                         25
Maths and applications
• Graph clustering
                         26
Maths and applications
• Graph clustering
            Grouping to reduce graph complexity
                                                  27
Maths and applications
• Graph clustering
                     Community detection
                                           28
Maths and applications
• Graph classification
   • Labeling the top
   • Labeling the link
   • Labeling the graph/subgraph
                                   29
Maths and applications
• Graph classification
                         30
Maths and applications
• Frequent pattern mining in graph)
                                      31
Maths and applications
• Frequent pattern mining in graph
               Common chemical bond string pattern
                                                     32
Maths and applications
• Frequent pattern mining in graph
                   Withdrawal and deposit form
                                                 33
Maths and applications
• privacy-preserving in graph
   • It may not be enough to remove the identifying
     information, since the information can be interpolated
     from known vertices.
   • How to mask identifier information without breaking the
     overall structure of the graph?
                                                          34
Maths and applications
• Link prediction
                         35
Maths and applications
• Link prediction
                         36
Maths and applications
                         37
References
• Aggarwal, Charu C., and Haixun Wang, eds.
 Managing and mining graph data
                                              38