Computational Social Science, Lecture 05: Networks, Part I
Networks
           Part I

            Sharad Goel
        Columbia University
Computational Social Science: Lecture 5

          February 22, 2013
High School Dating Network
[ Bearman, Moody, & Stovel, 2004 ]
Image by Mark Newman, via Easley & Kleinberg
Corporate E-mail Communication
[ Adamic & Adar, 2004 ]
via Easley & Kleinberg
“Internet map 2004” from Math Insight
http://mathinsight.org/image/internet_map_jurvetson_2004
Networks/Graphs

             Nodes/vertices
people, organizations, webpages, computers

                  Edges
represent connections between pairs of nodes
[Figures: example networks with numeric edge labels]
[Figure: 7-node example network, shown with its adjacency matrix]

        1   2   3   4   5   6   7
    1   0   1   0   1   0   0   0
    2   1   0   1   0   1   0   0
    3   0   1   0   0   1   0   0
    4   1   0   0   0   1   0   0
    5   0   1   1   1   0   1   1
    6   0   0   0   0   1   0   0
    7   0   0   0   0   1   0   0
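As a minimal sketch, the matrix above can be built from the example's edges in plain Python. The names `edges` and `adjacency_matrix` are illustrative choices, not part of the lecture.

```python
# Edges of the 7-node example network (undirected).
edges = [(1, 2), (1, 4), (2, 3), (2, 5), (3, 5), (4, 5), (5, 6), (5, 7)]

def adjacency_matrix(edges, n):
    """Return an n x n 0/1 matrix for an undirected graph on nodes 1..n."""
    A = [[0] * n for _ in range(n)]
    for u, w in edges:
        A[u - 1][w - 1] = 1  # nodes are 1-indexed, Python lists are 0-indexed
        A[w - 1][u - 1] = 1  # undirected, so the matrix is symmetric
    return A

A = adjacency_matrix(edges, 7)
for row in A:
    print(" ".join(str(x) for x in row))
```

The matrix needs n^2 entries regardless of how many edges exist, which is one reason sparse networks are usually stored as adjacency lists or edge lists instead.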
[Figure: 7-node example network, shown with its adjacency list]

    1   {2, 4}
    2   {1, 3, 5}
    3   {2, 5}
    4   {1, 5}
    5   {2, 3, 4, 6, 7}
    6   {5}
    7   {5}
[Figure: 7-node example network, shown with its edge list]

    (1, 2)
    (1, 4)
    (2, 3)
    (2, 5)
    (3, 5)
    (4, 5)
    (5, 6)
    (5, 7)
[Figure: 7-node example network with edge weights, shown with its weighted adjacency list]

    1   {2:2, 4:4}
    2   {1:2, 3:5, 5:11}
    3   {2:5, 5:1}
    4   {1:4, 5:9}
    5   {2:11, 3:1, 4:9, 6:6, 7:2}
    6   {5:6}
    7   {5:2}
[Figure: 7-node example network with edge weights, shown with its weighted edge list]

    (1, 2, 2)
    (1, 4, 4)
    (2, 3, 5)
    (2, 5, 11)
    (3, 5, 1)
    (4, 5, 9)
    (5, 6, 6)
    (5, 7, 2)
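One possible way to hold both weighted representations in Python is a list of `(u, w, weight)` tuples and a dict of dicts. The sketch below converts the first into the second; `weighted_edges` and `to_weighted_adjacency` are names assumed for illustration.

```python
from collections import defaultdict

# Weighted edge list of the example: (u, w, weight).
weighted_edges = [
    (1, 2, 2), (1, 4, 4), (2, 3, 5), (2, 5, 11),
    (3, 5, 1), (4, 5, 9), (5, 6, 6), (5, 7, 2),
]

def to_weighted_adjacency(edges):
    """Build {node: {neighbor: weight}} for an undirected weighted graph."""
    adj = defaultdict(dict)
    for u, w, weight in edges:
        adj[u][w] = weight
        adj[w][u] = weight  # store each undirected edge in both directions
    return dict(adj)

adj = to_weighted_adjacency(weighted_edges)
print(adj[5])  # {2: 11, 3: 1, 4: 9, 6: 6, 7: 2}
```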
Adjacency list → edge list
(weighted) directed network

Input: adjacency list
Output: edge list

Map
    input: u, {w1, …, wk}
    foreach wi:
        output (u, wi)

Reduce
    pass
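The map step above can be simulated locally as a generator. This is only a sketch of the logic, not a real MapReduce job; the `adjacency` dict is a hypothetical directed input chosen for illustration.

```python
def map_adjacency(u, neighbors):
    """Map step: emit one (u, w) edge per out-neighbor w of u."""
    for w in neighbors:
        yield (u, w)

# Hypothetical directed adjacency list, used only for illustration.
adjacency = {1: [2, 4], 2: [3, 5], 3: [5], 4: [5], 5: [6, 7]}

# No reduce step is needed: the mapper output already is the edge list.
edge_list = [edge for u, nbrs in adjacency.items()
             for edge in map_adjacency(u, nbrs)]
print(edge_list)
```

For a weighted network, the mapper would emit `(u, w, weight)` triples in the same way.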
Edge list → adjacency list
(weighted) undirected network

Input: edge list
Output: adjacency list

Map
    input: (u, w)
    output: (u, w), key := u
    output: (w, u), key := w

Reduce
    input: u, {w1, …, wk}
    identity
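A local sketch of this job follows, with a dictionary standing in for the framework's group-by-key step. The helper names `map_edge` and `shuffle` are assumptions made for the example.

```python
from collections import defaultdict

def map_edge(u, w):
    """Map step: emit the edge once under each endpoint."""
    yield (u, w)  # key := u
    yield (w, u)  # key := w

def shuffle(pairs):
    """Stand-in for the framework's group-by-key step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

edges = [(1, 2), (1, 4), (2, 3), (2, 5), (3, 5), (4, 5), (5, 6), (5, 7)]
pairs = [kv for u, w in edges for kv in map_edge(u, w)]

# Reduce is the identity: each key already carries its neighbor list.
# Sorting is only for readable output.
adjacency = {u: sorted(ws) for u, ws in shuffle(pairs).items()}
print(adjacency[5])  # [2, 3, 4, 6, 7]
```

Emitting each edge under both endpoints is what makes the result undirected; emitting only `(u, w)` would give out-neighbors alone.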
[Figure: 7-node example network]

Degree of node u
Number of edges incident on u
Edge list → node degrees
undirected network

Input: edge list
Output: node degrees

Map
    input: (u, w)
    output: (u, w), key := u
    output: (w, u), key := w

Reduce
    input: u, {w1, …, wk}
    output: u, k
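As a rough local illustration of the job above, the grouping can be done with a dictionary and the reducer reduces to a length count. Variable names here are illustrative.

```python
from collections import defaultdict

edges = [(1, 2), (1, 4), (2, 3), (2, 5), (3, 5), (4, 5), (5, 6), (5, 7)]

# Map + group: file each edge under both of its endpoints.
groups = defaultdict(list)
for u, w in edges:
    groups[u].append(w)  # key := u
    groups[w].append(u)  # key := w

# Reduce: the degree of u is the number of values grouped under u.
degree = {u: len(ws) for u, ws in groups.items()}
print(sorted(degree.items()))
```

A quick sanity check is that the degrees sum to twice the number of edges, since every edge is counted at both endpoints.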
Edge list → degree distribution
undirected network

Input: edge list
Output: degree distribution

Pass 1: node degrees

Map
    input: (u, w)
    output: (u, w), key := u
    output: (w, u), key := w

Reduce
    input: u, {w1, …, wk}
    output: u, k

Pass 2: number of nodes with each degree

Map
    input: u, k
    identity, key := k

Reduce
    input: k, {u1, …, um}
    output: k, m
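The two passes can be sketched locally with `collections.Counter`, which plays the role of the group-and-count reducers. This is an in-memory illustration under that assumption, not a distributed implementation.

```python
from collections import Counter

edges = [(1, 2), (1, 4), (2, 3), (2, 5), (3, 5), (4, 5), (5, 6), (5, 7)]

# Pass 1: node degrees (each edge contributes to both endpoints).
degree = Counter()
for u, w in edges:
    degree[u] += 1
    degree[w] += 1

# Pass 2: re-key by degree k and count the m nodes that have it.
distribution = Counter(degree.values())
print(sorted(distribution.items()))  # [(1, 2), (2, 3), (3, 1), (5, 1)]
```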
[Figure: 7-node example network]

Path
Sequence of nodes with each consecutive pair connected by an edge
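The definition translates almost directly into a check over consecutive pairs. The sketch below uses the 7-node example as a dict of neighbor sets; `is_path` is a hypothetical helper name.

```python
# Adjacency list of the 7-node example network.
adjacency = {
    1: {2, 4}, 2: {1, 3, 5}, 3: {2, 5}, 4: {1, 5},
    5: {2, 3, 4, 6, 7}, 6: {5}, 7: {5},
}

def is_path(nodes, adjacency):
    """True if every consecutive pair of nodes is joined by an edge."""
    return all(w in adjacency[u] for u, w in zip(nodes, nodes[1:]))

print(is_path([1, 2, 5, 7], adjacency))  # True: 1-2, 2-5, 5-7 are edges
print(is_path([1, 3, 5], adjacency))     # False: 1-3 is not an edge
```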
[Figure: 7-node example network]

Cycle
Path with at least three edges with first and last nodes the same and all other nodes distinct
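Each clause of the cycle definition can be tested separately. The following is a sketch on the 7-node example; `is_cycle` is an illustrative name, and the node sequence is assumed to list the start node again at the end.

```python
adjacency = {
    1: {2, 4}, 2: {1, 3, 5}, 3: {2, 5}, 4: {1, 5},
    5: {2, 3, 4, 6, 7}, 6: {5}, 7: {5},
}

def is_cycle(nodes, adjacency):
    """Closed path with at least three edges and no other repeated node."""
    n_edges = len(nodes) - 1
    if n_edges < 3 or nodes[0] != nodes[-1]:
        return False
    # Apart from the repeated endpoint, all nodes must be distinct.
    if len(set(nodes[:-1])) != n_edges:
        return False
    return all(w in adjacency[u] for u, w in zip(nodes, nodes[1:]))

print(is_cycle([1, 2, 5, 4, 1], adjacency))  # True
print(is_cycle([1, 2, 1], adjacency))        # False: only two edges
```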
Connected Graph
There is a path between every pair of nodes
Connected Component
 A connected subset of nodes that is not
contained in any larger connected subset
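One common way to find connected components on a single machine is to grow a component outward from each node not yet seen. The sketch below does this with an explicit stack; the extra edge 8-9 is a hypothetical addition to the 7-node example so that there are two components.

```python
def connected_components(adjacency):
    """Return a list of components, each a set of mutually reachable nodes."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        component = {start}
        stack = [start]
        while stack:
            u = stack.pop()
            for w in adjacency[u]:
                if w not in component:
                    component.add(w)
                    stack.append(w)
        seen |= component
        components.append(component)
    return components

# The 7-node example plus a hypothetical separate edge 8-9.
adjacency = {
    1: {2, 4}, 2: {1, 3, 5}, 3: {2, 5}, 4: {1, 5},
    5: {2, 3, 4, 6, 7}, 6: {5}, 7: {5}, 8: {9}, 9: {8},
}
components = connected_components(adjacency)
print([sorted(c) for c in components])
```

A graph is connected exactly when this returns a single component.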
Distance
Length of the shortest path between two nodes
Breadth-first Search
iteratively explore nodes one layer at a time
# assumes: G is the collection of nodes, neighbors[u] is the set of
# nodes adjacent to u, and u1 is the starting node

# initialize distances (None = not reached yet)
dist = {}
for u in G:
    dist[u] = None

dist[u1] = 0

d = 0
periphery = {u1}
while len(periphery) > 0:
    # find unvisited nodes one step away from the periphery
    next_level = set()
    for u in periphery:
        next_level |= {w for w in neighbors[u] if dist[w] is None}

    # update distances
    d += 1
    for u in next_level:
        dist[u] = d

    # update periphery
    periphery = next_level
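For a concrete run, the layer-by-layer search above can be wrapped in a function and applied to the 7-node example. The function name `bfs_distances` and the use of `None` for unreached nodes are choices made for this sketch.

```python
def bfs_distances(adjacency, source):
    """Layer-by-layer BFS; unreachable nodes keep distance None."""
    dist = {u: None for u in adjacency}
    dist[source] = 0
    d = 0
    periphery = {source}
    while periphery:
        # unvisited nodes one step beyond the current layer
        next_level = set()
        for u in periphery:
            next_level |= {w for w in adjacency[u] if dist[w] is None}
        d += 1
        for w in next_level:
            dist[w] = d
        periphery = next_level
    return dist

adjacency = {
    1: {2, 4}, 2: {1, 3, 5}, 3: {2, 5}, 4: {1, 5},
    5: {2, 3, 4, 6, 7}, 6: {5}, 7: {5},
}
dist = bfs_distances(adjacency, 1)
print(dist)
```

Starting from node 1, the layers are {2, 4}, then {3, 5}, then {6, 7}, so nodes 6 and 7 end up at distance 3.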
BFS @ scale
undirected network

Input: edge list, starting node u0
Output: distance to all nodes from u0

Each round takes the current distances (u, d) as input, starting from (u0, 0).
Repeat until no distance changes:
1. join distances with edge list
2. foreach (u, d, w) output (w, d+1)
   [ also output (u0, 0) ]
3. group by w, and output min d
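The three steps can be simulated in memory to see how distances spread one hop per round. This is a sketch only: the function names are hypothetical, and stopping when a round changes nothing is an assumption, since the slide does not state a stopping rule.

```python
def bfs_round(distances, edges, source):
    """One join / emit / group-by-min round over an undirected edge list."""
    # Undirected: make each edge available in both directions for the join.
    directed = edges + [(w, u) for u, w in edges]
    candidates = [(source, 0)]  # always re-emit (u0, 0)
    for u, w in directed:
        if u in distances:  # step 1: join distances with edge list on u
            candidates.append((w, distances[u] + 1))  # step 2
    new_distances = {}
    for w, d in candidates:  # step 3: group by w, keep min d
        new_distances[w] = min(d, new_distances.get(w, d))
    return new_distances

def bfs_at_scale(edges, source):
    """Iterate rounds until the distances stop changing (assumed rule)."""
    distances = {source: 0}
    while True:
        updated = bfs_round(distances, edges, source)
        if updated == distances:
            return distances
        distances = updated

edges = [(1, 2), (1, 4), (2, 3), (2, 5), (3, 5), (4, 5), (5, 6), (5, 7)]
print(bfs_at_scale(edges, 1))
```

Each round costs one pass over the full edge list, so the number of rounds is governed by the largest distance from u0 rather than by the number of nodes.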
