ECS765P - W9 - Large-Scale Graph Processing

This document discusses large scale graph processing. It covers graph applications, graph databases, graph databases with Python, Pregel, and GraphX. Graphs are used to model interactions and relationships in many domains like social networks.

ECS640U/ECS765P Big Data Processing

Large Scale Graph Processing

Lecturer: Ahmed M. A. Sayed
School of Electronic Engineering and Computer Science

Credit: Joseph Doyle, Jesus Carrion, Felix Cuadrado, …

Weeks 6-11: Processing

Data Sources → Ingestion → Storage → Processing → Output

This week, we focus on graph processing

Big Data Processing: Week 9
Topic List:

● Graph Applications
● Graph Databases
● Graph Databases with Python
● Pregel
● GraphX
Graph Definition

A graph G = (V,E), where

• V represents the set of vertices (nodes)
• E represents the set of edges (links)
• Both vertices and edges may contain additional information

Different types of graphs:

• Directed vs. undirected edges
• Cyclic vs Acyclic
• Temporal Graphs
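As a minimal sketch of the definition above (not part of the original slides), a graph G = (V, E) can be held as an adjacency list built from an edge list. The helper names `build_graph` and `has_cycle` and the sample vertices are hypothetical; the cycle check applies to directed graphs only.

```python
from collections import defaultdict

def build_graph(edges, directed=True):
    """Return an adjacency list {vertex: set(neighbours)} for an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v]  # touch v so it appears as a key even with no outgoing edges
        if not directed:
            adj[v].add(u)
    return dict(adj)

def has_cycle(adj):
    """Detect a directed cycle with a depth-first search (three-colour marking)."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {v: WHITE for v in adj}

    def visit(v):
        colour[v] = GREY                  # v is on the current DFS path
        for w in adj[v]:
            if colour[w] == GREY:         # back edge: w is an ancestor of v
                return True
            if colour[w] == WHITE and visit(w):
                return True
        colour[v] = BLACK                 # v and its descendants are done
        return False

    return any(colour[v] == WHITE and visit(v) for v in adj)

E = [("a", "b"), ("b", "c")]               # hypothetical edge list
dag = build_graph(E)                       # directed and acyclic
cyclic = build_graph(E + [("c", "a")])     # adding c -> a closes a cycle
undirected = build_graph(E, directed=False)
```

Additional information on vertices or edges could be kept in separate dictionaries keyed by vertex or by (source, destination) pair.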
Graphs are ubiquitous
Modeling and tracking interactions
Social Graphs

Social media defines interaction networks

• Contacts
• Messages
• Tags
Graph analysis is useful for extracting valuable information
• Identify leaders in a community
Measure of influence (centrality)
Identify “special” nodes and communities
• Find the right fitness influencer on Instagram to advertise your protein powder
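One simple measure of influence is degree centrality: the number of contacts of a node, normalised by the maximum possible number. The sketch below assumes an undirected contact list with at least two nodes; the names and the data are made up for illustration.

```python
from collections import Counter

def degree_centrality(edges):
    """Degree of each node divided by (n - 1), for an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    return {node: d / (n - 1) for node, d in degree.items()}

# Hypothetical contact network
contacts = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("bob", "cat")]
centrality = degree_centrality(contacts)
leader = max(centrality, key=centrality.get)   # node with the most contacts
```

Other centrality measures (betweenness, closeness, PageRank) weigh a node's position differently; degree is only the cheapest one to compute.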
Community Detection
Community detection (closely related to graph partitioning)
• Helps us to reveal the hidden relations among the nodes in the network.
• Many algorithms have been developed to detect communities

Communities of college football network, using colors for conferences and spatial clustering for identified communities
https://www.ese.wustl.edu/~nehorai/research/network_science/Lu_Community_Detection_SR_2018.html
Bipartite graphs

Bipartite: the vertices can be split into two disjoint groups such that every edge connects a vertex in one group to a vertex in the other group (no edges within a group)
https://en.wikipedia.org/wiki/Bipartite_graph

Example: the “stable marriage/matching” problem: find a stable matching between two equally sized sets of elements, given each element's ordering of preferences. A matching is a bijection from the elements of one set to the elements of the other set https://en.wikipedia.org/wiki/Stable_marriage_problem
Not stable: a matching is not stable if there is an element A of the first set that prefers some element B of the second set over the element A is currently matched with, and B also prefers A over the element B is currently matched with.
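The stability condition can be checked directly by searching for such a pair (A, B), often called a blocking pair. This is a sketch of the check only, not of an algorithm that finds a stable matching; the function name and the preference lists are hypothetical.

```python
def is_stable(matching, prefs_a, prefs_b):
    """matching maps each element of set A to its partner in set B.
    prefs_x[e] lists e's candidates from most to least preferred."""
    partner_of_b = {b: a for a, b in matching.items()}
    for a, current_b in matching.items():
        # Every b that a ranks strictly above its current partner
        for b in prefs_a[a][:prefs_a[a].index(current_b)]:
            rank_b = prefs_b[b]
            # b also prefers a over its own partner: (a, b) blocks the matching
            if rank_b.index(a) < rank_b.index(partner_of_b[b]):
                return False
    return True

# Hypothetical preferences: everyone ranks "1" above "2"
prefs_a = {"a1": ["b1", "b2"], "a2": ["b1", "b2"]}
prefs_b = {"b1": ["a1", "a2"], "b2": ["a1", "a2"]}
stable = {"a1": "b1", "a2": "b2"}
unstable = {"a1": "b2", "a2": "b1"}   # a1 and b1 prefer each other
```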

Practical applications: Web advertising, click prediction

Contagion/epidemic networks

How quickly will COVID-19 spread on this graph?

Contact tracing and analysis of epidemic spreading

“Needle exchange” networks of drug users [Weeks et al. 2002]

Interesting Properties – Power Law

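The degree distribution (or popularity) follows a power law

• A small minority of nodes has a very high degree (and therefore high influence)
• Such networks are also called scale-free networks
• Quite common in many human/social networks
Central nodes (influencers) are the nodes with high degree

Scale-free: the degree distribution follows a power law, P(k) ∝ k^(-γ)

Random: degrees are concentrated around the average degree (binomial/Poisson distribution), with no heavy tail

[Figure: degree distributions of a random and a scale-free network; x-axis: degree, y-axis: frequency]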

https://en.wikipedia.org/wiki/Scale-free_network
Graph Management / Storage

Traditional (relational) and NoSQL databases can store graphs

But their query languages do not natively support queries over graph elements (e.g., checking for relationships)

We need query languages/abstractions suitable for finding relationship patterns

Rise of graph DBs

• Neo4j : Graph database management system (Java-based, not distributed)
• Titan: Distributed Graph Database
• Amazon Neptune (Nov 2017) : Fully Managed Graph Database
Graph Databases

Database that uses graph structures with nodes, edges and properties to store data

Provides index-free adjacency

• Every node holds direct pointers to its adjacent elements, so traversal needs no index lookup
• Fast for following relationships

Edges hold most of the important information and connect:

• nodes to other nodes
• nodes to properties/metadata (Resource Description Framework - RDF)
Neo4j

Java-based graph database management system

Like relational (SQL) databases, it provides ACID transactions: logical units of work are Atomic, Consistent, Isolated and Durable (https://en.wikipedia.org/wiki/ACID)

Property graph model: powerful schema-less way to model graph-based information

Good performance for non-massive datasets

Not distributed – but sharded (partition)

Cypher - Graph-specific query language

• ASCII-art syntax for defining and matching patterns
The property graph model

Entities – Vertices and Edges

Tags – Entities have type(s)

Properties – Key value pairs attached to entities
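A minimal sketch of the three ingredients above, assuming a plain in-memory representation (this is not how Neo4j stores data). The class names, the `WROTE` tag and the sample book are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """A vertex or an edge: it carries tags (types) and key-value properties."""
    tags: set
    properties: dict = field(default_factory=dict)

@dataclass
class Edge(Entity):
    """An edge is an entity that additionally connects two vertex ids."""
    source: str = ""
    target: str = ""

# Hypothetical data: one author, one book, one relationship
vertices = {
    "author1": Entity({"Person", "Author"}, {"name": "Frank Herbert"}),
    "book1": Entity({"Book"}, {"title": "Dune", "year": 1965}),
}
edges = [Edge({"WROTE"}, {"role": "author"}, source="author1", target="book1")]

def neighbours_by_tag(vertex_id, tag):
    """Follow outgoing edges of a given type from one vertex."""
    return [e.target for e in edges if e.source == vertex_id and tag in e.tags]
```

Because there is no fixed schema, a new property or tag can be attached to a single entity without touching the others.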

The property graph model: Books
[Figure: a property graph of books, with labels pointing out an entity, its tags, and a property]
Why is SQL not suitable for graph-based data?

SQL: Modelling and Querying a Graph

Relationship graph between account holders

Example query: get the non-immediate friends of Person001 who are up to 3 hops away

Imagine using such cumbersome SQL to query large social network graphs!
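To make the semantics of that query concrete, here is a sketch that expands the friendship relation one hop at a time, which is roughly what each additional self-join does in SQL. The friendship data and the function name are hypothetical, and the relation is treated as directed for simplicity.

```python
def friends_by_hop(adj, person, max_hops):
    """Return a list of sets: level i holds people first reached in i + 1 hops."""
    seen = {person}
    frontier = {person}
    levels = []
    for _ in range(max_hops):
        # One "join": everyone directly connected to the current frontier
        frontier = {f for p in frontier for f in adj.get(p, ())} - seen
        seen |= frontier
        levels.append(frontier)
    return levels

# Hypothetical account holders
friends = {
    "Person001": ["Person002", "Person003"],
    "Person002": ["Person004"],
    "Person004": ["Person005"],
    "Person005": ["Person006"],
}
levels = friends_by_hop(friends, "Person001", 3)
# Non-immediate friends: reached in 2 or 3 hops, but not in 1
non_immediate = sorted(set().union(*levels[1:]))
```

In SQL each extra hop adds another join on the relationship table, which is why the query grows unwieldy as the hop limit increases.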
Cypher query language

Query Language for Neo4j

• Becoming a standard through the openCypher initiative (https://opencypher.org)

Declarative and Expressive language

Match queries return all the graph elements that satisfy the whole pattern
Sample Cypher query on a graph

*..5 => variable-length path: any number of hops (relationships) up to 5

Movies Database neo4j
Install neo4j: https://neo4j.com/docs/operations-manual/current/installation/
Download desktop edition: https://neo4j.com/download/

Open the Movies project in Neo4j Desktop, then run the command :play movies

Then follow the instructions for creating movies database and queries
Movies Database neo4j
Find movies released in the 1990s
Find actors up to 4 hops away from Kevin Bacon
Find the shortest path between two actors
Find co-co-actors of Tom Hanks (co-actors of his co-actors)

Break and Quiz

Neo4j Python Library

Install the neo4j Python library

pip install neo4j

For example: https://github.com/neo4j-examples/movies-python-bolt

The neo4j library lets Python code access a Neo4j database
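A minimal sketch of such access, assuming the movies database from the previous section is loaded and a server is reachable. The function name, the parameter names and the connection details are placeholders; the driver is imported inside the function so that the module can be loaded even where the library is not installed.

```python
# Cypher for "movies released in the 1990s", with the bounds as parameters
NINETIES_QUERY = (
    "MATCH (m:Movie) "
    "WHERE m.released >= $start AND m.released < $end "
    "RETURN m.title AS title"
)

def movies_released_between(uri, user, password, start=1990, end=2000):
    """Run the query against a Neo4j server and return the matching titles."""
    from neo4j import GraphDatabase  # requires `pip install neo4j`

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            result = session.run(NINETIES_QUERY, start=start, end=end)
            return [record["title"] for record in result]
    finally:
        driver.close()
```

A call would look like `movies_released_between("bolt://localhost:7687", "neo4j", "<password>")`, where the URI and credentials depend on the local installation.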
Neo4j Python: Obtaining a JSON Graph

Obtain a JSON representation of the graph defined in the movies database, showing movie titles and their actors/cast
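The conversion step can be sketched without a database, assuming the query has already returned rows of (movie title, cast list). The nodes/links layout below is one common choice for graph visualisation and is an assumption here, as are the function name and the hand-typed sample rows.

```python
import json

def to_graph_json(rows):
    """rows: iterable of (movie_title, [actor names]) pairs."""
    nodes, links, index = [], [], {}

    def node_id(title, label):
        # Reuse the position of a node that was already added
        key = (title, label)
        if key not in index:
            index[key] = len(nodes)
            nodes.append({"title": title, "label": label})
        return index[key]

    for movie, cast in rows:
        target = node_id(movie, "movie")
        for actor in cast:
            links.append({"source": node_id(actor, "actor"), "target": target})
    return json.dumps({"nodes": nodes, "links": links})

# Sample rows typed by hand rather than fetched from Neo4j
rows = [
    ("The Matrix", ["Keanu Reeves", "Carrie-Anne Moss"]),
    ("The Replacements", ["Keanu Reeves"]),
]
graph = json.loads(to_graph_json(rows))
```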
Neo4j Python: Search Functionality

Search for movies in the database whose titles contain the sub-text held in the variable q

Pattern annotations (Java regex syntax): (?i) = case-insensitive matching; . = any character; * = zero or more times

https://neo4j.com/docs/cypher-manual/current/clauses/where/#query-where-regex
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Pattern.html
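A sketch of how such a pattern might be built from q and passed as a query parameter. Python's `re` module stands in for the Java regex engine here, which is an assumption that holds for simple patterns like this one; the function name, the query text and the sample titles are illustrative. Cypher's `=~` must match the whole string, hence the leading and trailing `.*`.

```python
import re

def title_pattern(q):
    """Case-insensitive 'contains q' pattern; q is escaped to match literally."""
    return "(?i).*" + re.escape(q) + ".*"

# The pattern would be supplied as the $title parameter of this query
SEARCH_QUERY = "MATCH (movie:Movie) WHERE movie.title =~ $title RETURN movie.title"

titles = ["The Matrix", "The Matrix Reloaded", "Top Gun"]
pattern = title_pattern("matrix")
# re.fullmatch mimics the whole-string matching of Cypher's =~ operator
hits = [t for t in titles if re.fullmatch(pattern, t)]
```

Escaping q matters when it comes from user input, since characters such as `.` or `*` would otherwise be interpreted as regex syntax.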
Graph Traversal in MapReduce
Approach: parallel processing of each vertex
● Each Map/Reduce function has access to limited information: one node and its links
Iterative executions of a MapReduce job
● Map: compute something on each node, and potentially emit information addressed to that node or to other nodes; the Reducers aggregate it.
● Reduce: compute something for each unique node
● The output of the reducers in iteration n becomes the input of the mappers in iteration n+1
Finding the Shortest Path: Intuition

Breadth-First Search (BFS) algorithm (https://en.wikipedia.org/wiki/Breadth-first_search)

We can define the solution to this problem via induction:

● distanceTo(startNode) = 0
● For all nodes n directly reachable from startNode → distanceTo(n) = 1
● For all nodes n reachable from some other set of nodes S,
distanceTo(n) = 1 + min(distanceTo(m)) over all m ∈ S that are neighbours of n
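The inductive definition translates into a sequential BFS as sketched below, assuming an unweighted graph stored as an adjacency dictionary. The graph and the function name are made up for illustration.

```python
from collections import deque

def distance_to(adj, start_node):
    """Hop distance from start_node to every reachable node."""
    dist = {start_node: 0}              # base case: distanceTo(startNode) = 0
    frontier = deque([start_node])
    while frontier:
        m = frontier.popleft()
        for n in adj.get(m, ()):
            if n not in dist:           # the first visit comes from a nearest neighbour
                dist[n] = 1 + dist[m]
                frontier.append(n)
    return dist

# Hypothetical graph
adj = {"s": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["d"]}
dist = distance_to(adj, "s")
```

Because nodes leave the queue in order of increasing distance, the first distance assigned to a node is already the minimum over its neighbours.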
Visualizing Parallel BFS

Inefficient: we need to keep track of the list of visited nodes and pass it between jobs, along with the updated graph state/structure
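The iterative MapReduce formulation can be simulated in a few lines, which also makes the overhead visible: every iteration re-emits the whole graph structure alongside the distance messages. This is a single-process sketch under the assumption that every node appears as a key in the state; the function names and the graph are hypothetical, and no Hadoop API is used.

```python
from collections import defaultdict

INF = float("inf")

def bfs_map(node, record):
    """record = (distance, neighbours). Emit structure and candidate distances."""
    distance, neighbours = record
    yield node, ("graph", neighbours)        # the graph structure is re-sent
    yield node, ("dist", distance)
    if distance != INF:
        for n in neighbours:
            yield n, ("dist", distance + 1)  # message to each neighbour

def bfs_reduce(node, values):
    """Keep the adjacency list and the smallest distance seen for this node."""
    neighbours, best = [], INF
    for kind, value in values:
        if kind == "graph":
            neighbours = value
        else:
            best = min(best, value)
    return node, (best, neighbours)

def run_iteration(state):
    """One MapReduce job: map every node, shuffle by key, reduce every key."""
    shuffled = defaultdict(list)
    for node, record in state.items():
        for key, value in bfs_map(node, record):
            shuffled[key].append(value)
    return dict(bfs_reduce(node, values) for node, values in shuffled.items())

state = {"s": (0, ["a", "b"]), "a": (INF, ["c"]), "b": (INF, ["c"]), "c": (INF, [])}
iterations = 0
while True:
    new_state = run_iteration(state)
    iterations += 1
    if new_state == state:                   # no distance changed: stop
        break
    state = new_state
```

In a real cluster each pass through the loop is a separate job whose output is written to and read back from HDFS.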
MapReduce graph processing performance
Iterative algorithms involve writing to HDFS in each step
● Resending the graph structure in each iteration is VERY inefficient
One Map task per node, plus messages sent to other nodes according to the connections between graph nodes, results in significant communication cost.
In-memory systems are a much better fit for this type of computation → Spark framework
● Graph-specific in-memory systems have also been developed
More Efficient Alternatives
Google’s Pregel
● Introduced in the original Google paper [1]
● Pregel model: “think like a vertex”
Apache Giraph
● Java-based
Apache Spark GraphX
● Extension of Spark with Graph-centric computation model (Scala)
● GraphFrames for Python API (used for this week’s lab)

… Ongoing research efforts in this space

[1] Malewicz, G., Austern, M. H., Bik, A. J., Dehnert, J. C., Horn, I., Leiser, N., & Czajkowski, G. (2010, June). “Pregel: a system for large-scale graph processing.” In Proceedings of the ACM SIGMOD
Pregel: Think like a vertex

The Pregel framework allows you to write “vertex-centric” code.

The same user code, a compute() function, is run concurrently on each vertex of the graph.
Each instance of this function
1. keeps track of information (the state of its vertex)
2. can iterate over outgoing edges (each of which has a value)
3. can send messages to the vertices connected to those edges, or to any other vertices it may know about (e.g., having received a vertex ID via a message)

Execution follows the Bulk Synchronous Parallel (BSP) model

https://people.cs.rutgers.edu/~pxk/417/notes/pregel.html
Pregel’s node/vertex-centric processing model
Pregel-style graph processing systems
Computation is iterative, organised as a sequence of supersteps
● In every superstep, the same function is executed at each vertex
Vertices can send messages to their neighbours
Messages arrive in the next superstep
Computation is executed in parallel
● Each vertex is independent of the others within the same superstep
● Messages are the synchronization mechanism

https://people.cs.rutgers.edu/~pxk/417/notes/pregel.html
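The superstep model can be sketched with a single-process simulation that propagates the maximum value through a graph. It assumes a strongly connected graph so that the maximum reaches every vertex; the function name, the vertex values and the edges are hypothetical, and a real system would run the per-vertex work in parallel across machines.

```python
def pregel_max(values, out_edges):
    """Propagate the maximum vertex value using Pregel-style supersteps."""
    values = dict(values)
    inbox = {v: [] for v in values}
    active = set(values)                  # every vertex is active in superstep 0
    superstep = 0
    while active:
        outbox = {v: [] for v in values}
        for v in active:                  # conceptually executed in parallel
            new_value = max([values[v]] + inbox[v])
            if superstep == 0 or new_value > values[v]:
                values[v] = new_value
                for dst in out_edges[v]:
                    outbox[dst].append(new_value)
            # otherwise the vertex votes to halt and sends nothing
        inbox = outbox                    # messages arrive in the next superstep
        active = {v for v, msgs in inbox.items() if msgs}   # messages reactivate
        superstep += 1
    return values, superstep

values = {"a": 3, "b": 6, "c": 2, "d": 1}
out_edges = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
result, supersteps = pregel_max(values, out_edges)
```

Each vertex reads only its own value and its inbox, so the order in which vertices are processed within a superstep does not affect the result.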
Google’s PageRank
PageRank is a link analysis algorithm
The rank value indicates the importance of a particular web page
A hyperlink to a page counts as a vote of support
A page that is linked to by many pages with high PageRank receives a high rank itself
Example: A PageRank of 0.5 means there is a 50% chance that a person clicking on a random link
will be directed to the document with a PageRank of 0.5

Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: bringing order to the web. Technical report, Stanford InfoLab
PageRank Example
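The rank of a page is the sum, over the pages that link to it, of the neighbour's rank divided by the neighbour's outdegree:

$$r_{k+1}(P_i) = \sum_{P_j \to P_i} \frac{r_k(P_j)}{d(P_j)}$$

where $r_k(P_j)$ is the rank of the neighbour, $d(P_j)$ is the outdegree of the neighbour, and the initial value is $r_0(P) = 1/N$ ($N$ = number of pages; here $N = 6$).

$$r_1(P_2) = \frac{r_0(P_3)}{d(P_3)} + \frac{r_0(P_1)}{d(P_1)} = \frac{1/6}{3} + \frac{1/6}{2} = \frac{1}{18} + \frac{1}{12} = \frac{30}{216} = \frac{5}{36}$$

$$r_2(P_2) = \frac{r_1(P_3)}{d(P_3)} + \frac{r_1(P_1)}{d(P_1)} = \frac{1/12}{3} + \frac{1/18}{2} = \frac{1}{36} + \frac{1}{36} = \frac{1}{18}$$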
https://en.wikipedia.org/wiki/PageRank
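One iteration of this simplified update (no damping factor) can be sketched with exact fractions. The three-page graph below is hypothetical and is not the graph from the slide; the sketch also assumes every page has at least one outgoing link.

```python
from fractions import Fraction

def pagerank_step(ranks, out_links):
    """One iteration: every page splits its rank equally among its out-links."""
    new_ranks = {page: Fraction(0) for page in ranks}
    for page, targets in out_links.items():
        share = ranks[page] / len(targets)      # rank / outdegree of the neighbour
        for target in targets:
            new_ranks[target] += share
    return new_ranks

# Hypothetical web graph: A -> B, A -> C, B -> C, C -> A
out_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
r0 = {page: Fraction(1, len(out_links)) for page in out_links}   # 1 / N
r1 = pagerank_step(r0, out_links)
```

Here C collects half of A's rank plus all of B's rank, and the ranks still sum to 1 after the step.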
Spark GraphX
Spark’s library for graph processing
Provides specialized RDDs for representing the graph structure as well as its information (property graphs)
Provides methods for creating graphs, transforming them, and computing many common graph metrics and algorithms
GraphX is written in Scala → GraphFrames is the Python library for graph processing on Spark
Spark GraphX Property Graphs
Spark GraphX RDD
Holds graph data and provides methods for manipulating it
VertexRDD[VertexData]: an RDD of (VertexId, VertexData) pairs
Vertex IDs are 64-bit integers (Long)
VertexData holds the vertex properties
EdgeRDD[EdgeData]
Each edge holds the source and destination IDs plus the edge properties (EdgeData)
Technically a directed graph
Triplets
Join of source vertex, destination vertex, and edge
GraphX predefined methods
A GraphX graph has multiple convenience methods that provide access to its information and implement relevant operations

● Access to RDDs with the property information:
graph.vertices, graph.edges, graph.triplets
● Provides a tuple (vertexId, degree) for each vertex:
graph.degrees
● Obtains the connected components of the graph:
graph.connectedComponents()
GraphX predefined methods
Each element of graph.vertices is a pair (id, property): _2 selects the property, and _2._2 selects the second field of the property, i.e. the position of the person

graph.vertices.map(v => v._2._2).collect() // returns (student, postdoc, professor, professor)

graph.edges.filter(e => e.attr == "PI").count() // returns 1 (an Edge exposes srcId, dstId and attr, not tuple fields)
graph.vertices.filter { case (id, (name, pos)) => pos == "postdoc" }.count // counts all users that are postdocs; returns 1

https://spark.apache.org/docs/latest/graphx-programming-guide.html
Graph aggregate computation
Aggregate transformations send messages along each edge and combine them at the receiving vertices
graph.aggregateMessages: this operator applies a user-defined sendMsg function to each edge triplet in the graph and then uses the mergeMsg function to aggregate those messages at their destination vertex.

The operation involves the following:

● sendMsg: EdgeContext[VD, ED, Msg] => Unit
Can send messages to either the source or the destination, using the context (analogous to Map in MapReduce)
● mergeMsg: (Msg, Msg) => Msg
All the messages received by a vertex are reduced into one (analogous to Reduce in MapReduce)
Returns a VertexRDD[Msg] of (vertexId, aggregated message) pairs

https://spark.apache.org/docs/latest/graphx-programming-guide.html#aggregate-messages-aggregatemessages
Age of the oldest follower of each node
(Scala code)

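// Assumption: each vertex attribute is the user's age (an Int)
val oldFollowers: VertexRDD[Int] =
  graph.aggregateMessages[Int](
    // sendMsg: every follower (source) sends its age to the vertex it follows (destination)
    edge => edge.sendToDst(edge.srcAttr),
    // mergeMsg: keep the larger age, e.g. max(23, 42) = 42
    (a, b) => math.max(a, b)
  )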

http://webprojects.eecs.qmul.ac.uk/ag316/notesSite/BDP_slides/Week7%20%7C%20BigGraphs/ECS640-9-BigGraphs.pdf
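The same send-then-merge pattern can be simulated in plain Python to see what each function contributes. This is a single-process sketch, not the GraphX or GraphFrames API; the function signature, the ages and the follower edges are assumptions made for illustration.

```python
def aggregate_messages(vertices, edges, send_msg, merge_msg):
    """vertices: {id: attr}; edges: [(src_id, dst_id)].
    send_msg returns a list of (target_id, message) pairs for one edge."""
    inbox = {}
    for src, dst in edges:
        for target, msg in send_msg(src, vertices[src], dst, vertices[dst]):
            # Merge with whatever this vertex has already received
            inbox[target] = msg if target not in inbox else merge_msg(inbox[target], msg)
    return inbox

ages = {1: 23, 2: 42, 3: 30, 4: 19}        # vertex id -> age
follows = [(1, 3), (2, 3), (4, 1)]         # (follower, followed)
oldest_follower = aggregate_messages(
    ages, follows,
    # send the follower's age to the followed vertex
    send_msg=lambda src, src_age, dst, dst_age: [(dst, src_age)],
    # keep the larger of two ages
    merge_msg=max,
)
```

Vertices that receive no message (here 2 and 4, which have no followers) are absent from the result, which matches the behaviour of aggregateMessages in GraphX.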

End and Quiz
