GraphX & Graph Analytics

GraphX is a distributed graph computation framework within Apache Spark that integrates graph and data parallel computation for Big Data Analytics. It supports various graph operations, including property graphs, neighborhood aggregation, and advanced graph algorithms like PageRank and connected components. GraphX is designed for efficient storage, flexible modeling, and fault tolerance, making it suitable for applications in social network analysis, recommendation systems, and drug discovery.

Uploaded by

nainalashalini

GraphX
GraphX is a distributed graph computation framework that unifies
graph-parallel and data-parallel computation for big data analytics.
It is the Apache Spark component for graphs and graph-parallel
computation, and it unifies ETL (Extract, Transform, Load), exploratory
analysis, and iterative graph computation within a single system.
● Built on top of Apache Spark
● Supports an RDD-based graph abstraction
● Enables graph-parallel computation using the Pregel API
● Combines the power of both data-parallel and graph-parallel systems
Introduction
GraphX is useful for a specific set of tasks:
● It can measure high-level properties of a graph such as
connectedness, degree distribution, average path length, and triangle
counts.
● It can count triangles in the graph and apply the PageRank algorithm
to it.
● It can join graphs together and transform graphs quickly.
● It supports the Pregel API (a model introduced by Google) for
traversing a graph.
● It introduces VertexRDD and EdgeRDD, and the Edge data type.
Getting Started

To get started you first need to import Spark and GraphX into your
project, as follows:
import org.apache.spark._
import org.apache.spark.graphx._
// To make some of the examples work we will also need RDD
import org.apache.spark.rdd.RDD
Graphs in Machine Learning Landscape

● Graphs are a flexible and powerful data structure used to
represent entities (nodes/vertices) and the relationships between
them (edges).
● Traditional ML models work well with tabular data, but many
real-world scenarios involve interconnected data, which is
best modeled as graphs.
Graph-Structured Data

Graph-structured data consists of:

● Vertices (Nodes): represent entities (e.g., users, airports)

● Edges (Links): represent relationships or interactions between nodes

The Property Graph

● Definition: A directed multigraph with user-defined objects attached to each
vertex and edge.
● Key Characteristics:
○ Directed Multigraph: Allows multiple parallel edges between the same
source and destination.
○ Vertex Identifier: Each vertex is keyed by a unique 64-bit VertexId.
○ Edge Identification: Each edge carries the IDs of its source and
destination vertices along with its own property.
● Applications:
○ Modeling multiple relationships between the same two vertices (e.g.,
co-worker and friend).
Parameterization of Property Graphs

● Vertex (VD) and Edge (ED) Types:
○ VD: Type of the data associated with each vertex.
○ ED: Type of the data associated with each edge.
● Optimization:
○ When VD or ED is a primitive type (e.g., Int, Double), GraphX stores
the values in specialized arrays, which reduces memory usage.
Heterogeneous Vertex Types

Scenario: Modeling vertices with different property types.

Implementation via Inheritance:
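The listing for this slide did not survive extraction. A minimal sketch of the inheritance approach, assuming a graph that mixes users and products: the class names follow the example in the GraphX programming guide, while `describe` and the sample values are hypothetical additions for illustration.

```scala
// A common base type lets one graph hold different kinds of vertices.
sealed trait VertexProperty
case class UserProperty(name: String) extends VertexProperty
case class ProductProperty(name: String, price: Double) extends VertexProperty

// Hypothetical helper: pattern matching recovers the concrete vertex type.
def describe(p: VertexProperty): String = p match {
  case UserProperty(name)           => s"user $name"
  case ProductProperty(name, price) => s"product $name at $price"
}

// With GraphX on the classpath, the graph type would then be
// Graph[VertexProperty, String].
val vertexData: Seq[(Long, VertexProperty)] = Seq(
  (1L, UserProperty("alice")),
  (2L, ProductProperty("book", 12.5))
)
vertexData.foreach { case (id, p) => println(s"$id: ${describe(p)}") }
```

A sealed base type is one design option here; it lets the compiler check that a match covers every vertex kind.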
Immutability and Fault Tolerance

● Immutability:
○ Graphs are immutable like RDDs.
○ Changes produce new graphs, reusing unaffected parts of the
original.

● Fault Tolerance:
○ Graphs are partitioned across executors.
○ Partitions can be recreated on different machines if failures occur.
Logical Structure of Property Graphs
● Components:
○ Vertices: RDD encoding properties of each vertex.
○ Edges: RDD encoding properties of each edge.

● Graph Class:
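The class listing for this slide is missing. As a sketch of the logical structure, the two members of a `Graph` can be inspected directly. The example assumes a local SparkContext and a tiny hypothetical two-edge graph that is not part of the original deck.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{EdgeRDD, Graph, VertexRDD}

// Reuse the active context (e.g. in spark-shell) or start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("graph-structure").setMaster("local[*]"))

// Hypothetical edge list: 1 -> 2 and 2 -> 3; every vertex gets attribute 0.
val graph: Graph[Int, Int] =
  Graph.fromEdgeTuples(sc.parallelize(Seq((1L, 2L), (2L, 3L))), defaultValue = 0)

// Logically, a Graph[VD, ED] is a pair of typed RDDs.
val vertices: VertexRDD[Int] = graph.vertices // (VertexId, VD) pairs
val edges: EdgeRDD[Int] = graph.edges         // Edge[ED] records

println(s"vertices = ${vertices.count()}, edges = ${edges.count()}")
```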
Optimized RDDs in Property Graphs
● VertexRDD[VD] and EdgeRDD[ED]:
○ Extend and optimize RDD[(VertexId, VD)] and RDD[Edge[ED]].
○ Provide additional graph computation functionality.

● Conceptual Representation:
○ VertexRDD: RDD[(VertexId, VD)]
○ EdgeRDD: RDD[Edge[ED]]
Key Benefits of Property Graphs in GraphX

● Efficient Storage: Optimized memory usage for primitive data
types.
● Flexible Modeling: Supports diverse vertex and edge
properties.
● Distributed and Fault-Tolerant: Handles failures seamlessly.
● Functional Structure: Immutability ensures clean
transformations.
● Rich API: Extends RDD functionality for graph computations.
Example Property Graph
Suppose we want to construct a property graph consisting of the various
collaborators on the GraphX project. The vertex property might contain
the username and occupation. We could annotate edges with a string
describing the relationships between collaborators:
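The construction code for this example is not in the extracted text. A sketch of how such a graph could be built, assuming a local SparkContext: the usernames, occupations and relationship labels are the sample values from the GraphX programming guide, not data taken from this deck.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Reuse the active context (e.g. in spark-shell) or start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("property-graph").setMaster("local[*]"))

// Vertices: (id, (username, occupation))
val users: RDD[(VertexId, (String, String))] = sc.parallelize(Seq(
  (3L, ("rxin", "student")), (7L, ("jgonzal", "postdoc")),
  (5L, ("franklin", "prof")), (2L, ("istoica", "prof"))))

// Edges: source id, destination id, relationship label
val relationships: RDD[Edge[String]] = sc.parallelize(Seq(
  Edge(3L, 7L, "collab"), Edge(5L, 3L, "advisor"),
  Edge(2L, 5L, "colleague"), Edge(5L, 7L, "pi")))

// Attribute for vertices that appear in an edge but not in `users`
val defaultUser = ("John Doe", "Missing")

val graph: Graph[(String, String), String] =
  Graph(users, relationships, defaultUser)

// Simple queries over the vertex and edge views
val postdocs = graph.vertices.filter { case (_, (_, pos)) => pos == "postdoc" }.count()
val backEdges = graph.edges.filter(e => e.srcId > e.dstId).count()
println(s"postdocs = $postdocs, edges with src > dst = $backEdges")
```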
Graph-Structured Data

Example:
● Social Network:
○ Nodes = users
○ Edges = friendships
Types of graphs:
● Directed vs Undirected
● Weighted vs Unweighted
● Homogeneous vs Heterogeneous
Applications & Examples

1. Social Network Analysis

Graphs represent:
● Users → Nodes

● Friendships or follows → Edges

🔹 Use Case: Node Classification
● Goal: Predict the type of a user (e.g., spam vs real, interests, age group).

● Example:

○ Facebook graph with users and friends.

○ Train a model to classify users into categories based on their connections.

🔹 Use Case: Link Prediction
● Goal: Predict if a link (friendship) is likely to form.

● Example:

○ Suggesting "People You May Know" on Facebook.

○ Based on mutual friends, interests, and interaction patterns.

2. Recommendation Systems

Graphs represent:
● Users and items (movies, books, products) as nodes

● Interactions (e.g., rating, purchase) as edges

🔹 Use Case: Link Prediction
● Goal: Recommend products a user might buy.

● Example:

○ On Amazon, if User A likes Book 1 and Book 2, and User B likes
Book 2 and Book 3, then Book 1 can be recommended to User B.

○ This is modeled as collaborative filtering on a bipartite graph.

3. Drug Discovery (Bioinformatics)

Graphs represent:
● Atoms → Nodes
● Chemical bonds → Edges
🔹 Use Case: Graph Classification
● Goal: Predict the property of a molecule.

● Example:
○ Classify if a molecule is toxic or not.
○ Model learns structure–activity relationship (SAR) from
molecular graphs.
○ Graph Neural Networks (GNNs) like Graph Convolutional
Networks (GCNs) are commonly used.
Introduction to Graph Operators

● Definition: Graph operators are functions applied to graphs to
transform, analyze, or retrieve information from their vertices
and edges.
● Key Objectives:
○ Enable transformations and computations on graph data.
○ Provide flexibility and efficiency in graph processing.
● Categories:
○ Property Operators: Modify vertex or edge attributes.
○ Structural Operators: Alter graph structure.
○ Join Operators: Integrate external data with graph
elements.
● Application Areas:
○ Social network analysis.
○ Recommendation systems.
○ Fraud detection.
Property Operators

● Purpose: Transform vertex or edge properties without changing
the graph structure.
● The resulting graph reuses the structural indices of the original,
which GraphX exploits for optimization.
● Typical use: initializing a graph for a specific computation.
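A short sketch of the three property operators (`mapVertices`, `mapEdges`, `mapTriplets`), assuming a local SparkContext and a small hypothetical graph whose vertex attribute is a name and whose edge attribute is a weight:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

// Reuse the active context (e.g. in spark-shell) or start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("property-operators").setMaster("local[*]"))

// Hypothetical graph: vertex attribute = name, edge attribute = weight.
val graph: Graph[String, Int] = Graph(
  sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c"))),
  sc.parallelize(Seq(Edge(1L, 2L, 10), Edge(2L, 3L, 20))))

// mapVertices: new vertex attributes, same structure.
val upper: Graph[String, Int] = graph.mapVertices((id, name) => name.toUpperCase)

// mapEdges: new edge attributes, same structure.
val halved: Graph[String, Double] = graph.mapEdges(e => e.attr / 2.0)

// mapTriplets: the new edge attribute may depend on both endpoint attributes.
val labelled: Graph[String, String] =
  graph.mapTriplets(t => s"${t.srcAttr}->${t.dstAttr}")

println(upper.vertices.collect().sortBy(_._1).mkString(", "))
println(labelled.edges.collect().map(_.attr).sorted.mkString(", "))
```

In each case the set of vertices and edges is unchanged; only the attached attributes differ.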
Structural Operators

● Purpose: Modify the structure of the graph.

Examples:
● Removing invalid vertices or edges.
● Simplifying multigraphs.
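Both examples can be sketched with `subgraph` and `groupEdges`. The graph below is hypothetical: it has two parallel edges and one edge pointing at a vertex that is absent from the vertex list, and it assumes a local SparkContext.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, PartitionStrategy}

// Reuse the active context (e.g. in spark-shell) or start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("structural-operators").setMaster("local[*]"))

// Two parallel edges 1 -> 2, plus an edge to vertex 4, which is not in the
// vertex list and therefore receives the default attribute "Missing".
val graph: Graph[String, Int] = Graph(
  sc.parallelize(Seq((1L, "a"), (2L, "b"))),
  sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(1L, 2L, 1), Edge(2L, 4L, 1))),
  "Missing")

// subgraph: keep only vertices that satisfy the predicate; edges touching a
// dropped vertex are removed with it.
val valid = graph.subgraph(vpred = (id, attr) => attr != "Missing")

// groupEdges: merge parallel edges. Identical edges must sit in the same
// partition first, hence the partitionBy call.
val simple = valid
  .partitionBy(PartitionStrategy.EdgePartition2D)
  .groupEdges((a, b) => a + b)

println(simple.edges.collect().mkString(", "))
```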
Join Operators

● Purpose: Combine external data with graph elements.

Applications:
● Enriching graph data with external attributes.
● Integrating results from other computations.
joinVertices: Joins an RDD with the vertices and updates the properties of
matching vertices; vertices without a match keep their original value.
val updatedGraph = graph.joinVertices(extraData)((id, oldAttr, newAttr) => newAttr)

outerJoinVertices: Joins an RDD with all vertices; the update function receives
an Option, so it also handles unmatched vertices and may change the property type.
val updatedGraph = graph.outerJoinVertices(extraData)((id, attr, opt) => opt.getOrElse(attr))
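To make the difference between the two joins concrete, here is a small sketch in which the external data covers only some vertices. The graph and the `extraData` values are hypothetical, and a local SparkContext is assumed.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Graph, VertexId}
import org.apache.spark.rdd.RDD

// Reuse the active context (e.g. in spark-shell) or start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("join-operators").setMaster("local[*]"))

// Hypothetical graph 1 -> 2 -> 3; every vertex starts with attribute 0.
val graph: Graph[Int, Int] =
  Graph.fromEdgeTuples(sc.parallelize(Seq((1L, 2L), (2L, 3L))), defaultValue = 0)

// External data exists only for vertices 1 and 2.
val extraData: RDD[(VertexId, Int)] = sc.parallelize(Seq((1L, 100), (2L, 200)))

// joinVertices: vertex 3 has no match and keeps its old value 0.
val joined: Graph[Int, Int] =
  graph.joinVertices(extraData)((id, oldAttr, extra) => oldAttr + extra)

// outerJoinVertices: unmatched vertices see None, and the attribute type
// may change (here from Int to String).
val outer: Graph[String, Int] = graph.outerJoinVertices(extraData) {
  (id, attr, opt) => opt.map(_.toString).getOrElse("none")
}

println(joined.vertices.collect().sortBy(_._1).mkString(", "))
println(outer.vertices.collect().sortBy(_._1).mkString(", "))
```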
Advanced Graph Computations

Aggregate Messages:
● Collects and aggregates information from neighboring vertices.
Pregel API:
● Iterative computation framework for graph processing.
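As an illustration of the Pregel operator, the sketch below computes single-source shortest paths, following the pattern used in the GraphX programming guide. The three-edge weighted graph and the choice of source vertex are hypothetical, and a local SparkContext is assumed.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}

// Reuse the active context (e.g. in spark-shell) or start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("pregel-sssp").setMaster("local[*]"))

// Hypothetical weighted graph: 1 -> 2 (1.0), 2 -> 3 (2.0), 1 -> 3 (5.0).
val graph: Graph[Int, Double] = Graph.fromEdges(
  sc.parallelize(Seq(Edge(1L, 2L, 1.0), Edge(2L, 3L, 2.0), Edge(1L, 3L, 5.0))),
  defaultValue = 0)

val sourceId: VertexId = 1L

// Distance 0 at the source, infinity everywhere else.
val initialGraph: Graph[Double, Double] = graph.mapVertices((id, _) =>
  if (id == sourceId) 0.0 else Double.PositiveInfinity)

val sssp = initialGraph.pregel(Double.PositiveInfinity)(
  // Vertex program: keep the shorter of the current and received distance.
  (id, dist, newDist) => math.min(dist, newDist),
  // Send message: offer the destination a shorter path, if there is one.
  triplet => {
    if (triplet.srcAttr + triplet.attr < triplet.dstAttr) {
      Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))
    } else {
      Iterator.empty
    }
  },
  // Merge message: of several offers, keep the shortest.
  (a, b) => math.min(a, b))

println(sssp.vertices.collect().sortBy(_._1).mkString(", "))
```

The computation stops on its own once no vertex receives a message, which happens when no edge can offer a shorter path.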
Neighborhood Aggregation in GraphX
● Neighborhood aggregation is central to many graph analytics tasks.
● Examples include:
○ Counting followers.
○ Calculating average attributes (e.g., age of followers).
○ Iterative algorithms like PageRank, Shortest Path, and Connected
Components.
● The core aggregation operator changed from mapReduceTriplets to
aggregateMessages for improved performance.
AggregateMessages Operator
● Purpose: Core operation for neighborhood aggregation.
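● Definition: aggregateMessages[Msg](sendMsg, mergeMsg, tripletFields) applies
sendMsg to every edge triplet and combines the messages arriving at each
vertex with mergeMsg.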

Components:
1. sendMsg: Map function to send messages via EdgeContext.
2. mergeMsg: Reduce function to aggregate messages.
3. tripletFields: Optional argument to optimize join strategies by specifying accessed fields.
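The three components can be seen together in a follower count. This is a sketch on a hypothetical three-edge graph, assuming a local SparkContext; `TripletFields.None` is passed because the send function reads no vertex or edge attributes.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Graph, TripletFields, VertexRDD}

// Reuse the active context (e.g. in spark-shell) or start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("aggregate-messages").setMaster("local[*]"))

// Hypothetical follower graph: 1 -> 3, 2 -> 3, 3 -> 1.
val graph: Graph[Int, Int] = Graph.fromEdgeTuples(
  sc.parallelize(Seq((1L, 3L), (2L, 3L), (3L, 1L))), defaultValue = 0)

val followerCounts: VertexRDD[Int] = graph.aggregateMessages[Int](
  ctx => ctx.sendToDst(1), // sendMsg: one message per incoming edge
  (a, b) => a + b,         // mergeMsg: sum the messages at each vertex
  TripletFields.None)      // tripletFields: no attributes are accessed

// Vertex 2 has no followers, receives no message, and is absent here.
println(followerCounts.collect().sortBy(_._1).mkString(", "))
```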
Key Features
● Explicit user control over accessed fields (TripletFields).
● Returns VertexRDD[Msg] containing aggregated messages.
● Vertices without messages are excluded.
● Optimized for constant-sized messages.
Example: Average Age of Older Followers
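import org.apache.spark.graphx.util.GraphGenerators
// Random graph whose vertex attribute ("age") is simply the vertex ID
val graph: Graph[Double, Int] =
  GraphGenerators.logNormalGraph(sc, numVertices = 100).mapVertices((id, _) => id.toDouble)
// For each vertex, count its older followers and sum their ages
val olderFollowers: VertexRDD[(Int, Double)] = graph.aggregateMessages[(Int, Double)](
  triplet => { // Map function: send (1, age) to the destination vertex
    if (triplet.srcAttr > triplet.dstAttr) {
      triplet.sendToDst((1, triplet.srcAttr))
    }
  },
  // Reduce function: add the counters and the ages
  (a, b) => (a._1 + b._1, a._2 + b._2)
)
// Divide the total age by the number of older followers
val avgAgeOfOlderFollowers: VertexRDD[Double] =
  olderFollowers.mapValues((id, value) =>
    value match { case (count, totalAge) => totalAge / count })
avgAgeOfOlderFollowers.collect.foreach(println)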
Legacy Operator: mapReduceTriplets
Replaced by aggregateMessages due to:
● Inefficiency of iterator-based message aggregation.
● Limited optimization capabilities.
val graph: Graph[Int, Float] = ...

// Before: mapReduceTriplets (legacy; messages are returned as an iterator)
def msgFun(triplet: EdgeTriplet[Int, Float]): Iterator[(VertexId, String)] =
  Iterator((triplet.dstId, "Hi"))
def reduceFun(a: String, b: String): String = a + " " + b
val result = graph.mapReduceTriplets[String](msgFun, reduceFun)

// After: aggregateMessages (messages are sent through the EdgeContext)
def msgFun(triplet: EdgeContext[Int, Float, String]): Unit = {
  triplet.sendToDst("Hi")
}
def reduceFun(a: String, b: String): String = a + " " + b
val result = graph.aggregateMessages[String](msgFun, reduceFun)

Common Aggregation Tasks
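1. Degree Information:
// Reduce operation that keeps the vertex with the larger degree
def max(a: (VertexId, Int), b: (VertexId, Int)): (VertexId, Int) =
  if (a._2 > b._2) a else b
val maxInDegree: (VertexId, Int) = graph.inDegrees.reduce(max)
val maxOutDegree: (VertexId, Int) = graph.outDegrees.reduce(max)
val maxDegrees: (VertexId, Int) = graph.degrees.reduce(max)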
2. Collecting Neighbors:
val neighborIds: VertexRDD[Array[VertexId]] =
  graph.collectNeighborIds(EdgeDirection.Out)
val neighbors: VertexRDD[Array[(VertexId, VD)]] =
  graph.collectNeighbors(EdgeDirection.In)
In some cases it may be easier to express a computation by collecting the
neighboring vertices and their attributes at each vertex. This can be done
with the collectNeighborIds and collectNeighbors operators:
class GraphOps[VD, ED] {
  def collectNeighborIds(edgeDirection: EdgeDirection): VertexRDD[Array[VertexId]]
  def collectNeighbors(edgeDirection: EdgeDirection): VertexRDD[Array[(VertexId, VD)]]
}
Distributed Graphs

GraphX adopts a vertex-cut approach to distributed graph partitioning.
● Rather than splitting the graph along edges, GraphX partitions it along
vertices, which can reduce both the communication and storage overhead.
● Logically, this corresponds to assigning edges to machines and
allowing vertices to span multiple machines.
● The exact method of assigning edges depends on the PartitionStrategy,
and the various heuristics involve several tradeoffs.
● Users can choose between different strategies by repartitioning the
graph with the Graph.partitionBy operator.
● The default partitioning strategy is to use the initial partitioning of the
edges as provided on graph construction.
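A sketch of choosing a strategy with `partitionBy`, assuming a local SparkContext and a hypothetical four-edge graph. `EdgePartition2D` is used only as an example; no particular strategy is recommended by the deck.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Graph, PartitionStrategy}

// Reuse the active context (e.g. in spark-shell) or start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("partition-by").setMaster("local[*]"))

// Hypothetical edge list; every vertex gets the attribute 0.
val graph: Graph[Int, Int] = Graph.fromEdgeTuples(
  sc.parallelize(Seq((1L, 2L), (2L, 3L), (3L, 1L), (3L, 4L))), defaultValue = 0)

// Reassign edges to partitions with the 2D strategy. The logical graph is
// unchanged; only the physical placement of the edges differs.
val repartitioned = graph.partitionBy(PartitionStrategy.EdgePartition2D)

// Other built-in strategies: EdgePartition1D, RandomVertexCut,
// CanonicalRandomVertexCut.
println(s"edges before = ${graph.edges.count()}, after = ${repartitioned.edges.count()}")
```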
Graph Algorithms
PageRank Algorithm
● What is PageRank?
○ Measures the importance of each vertex in a graph based on the
edges (or connections).
○ Works on the premise that an edge from node u to node v
represents an endorsement of v's importance by u.
○ Example: A Twitter user with many followers will have a higher
rank (importance).
● Types of PageRank in GraphX:
○ Static PageRank: Runs for a fixed number of iterations.
○ Dynamic PageRank: Runs until the ranks converge (i.e., stop
changing by more than a specified tolerance).
● How to use PageRank:
○ PageRank Methods: Static and dynamic PageRank are available as
methods on the PageRank object.
○ GraphOps: Enables calling the algorithms directly on the graph.
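The two variants and the two calling styles can be sketched side by side. The follower graph, the iteration count of 10 and the tolerance are hypothetical values chosen for illustration, and a local SparkContext is assumed.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.Graph
import org.apache.spark.graphx.lib.PageRank

// Reuse the active context (e.g. in spark-shell) or start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("pagerank-variants").setMaster("local[*]"))

// Hypothetical follower graph: a 3-cycle plus vertex 4, which follows 1.
val graph: Graph[Int, Int] = Graph.fromEdgeTuples(
  sc.parallelize(Seq((1L, 2L), (2L, 3L), (3L, 1L), (4L, 1L))), defaultValue = 0)

// Static PageRank: exactly 10 iterations.
val staticRanks = graph.staticPageRank(10).vertices

// Dynamic PageRank: iterate until ranks change by less than the tolerance.
val dynamicRanks = graph.pageRank(0.0001).vertices

// The same algorithm called on the PageRank object instead of the graph.
val viaObject = PageRank.run(graph, 10).vertices

println(dynamicRanks.collect().sortBy(_._1).mkString(", "))
```

Vertex 4 has no incoming edges, so it should end up with the lowest rank in this graph.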

● Example Dataset:
○ Social Network Example:
■ Users are listed in data/graphx/users.txt.
■ Relationships between users are in data/graphx/followers.txt.
○ Goal: Compute the PageRank for each user based on their
followers.
import org.apache.spark.graphx.GraphLoader
// Load the edges as a graph
val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt")
// Run dynamic PageRank with a convergence tolerance of 0.0001
val ranks = graph.pageRank(0.0001).vertices
// Join the ranks with the usernames
val users = sc.textFile("data/graphx/users.txt").map { line =>
  val fields = line.split(",")
  (fields(0).toLong, fields(1))
}
val ranksByUsername = users.join(ranks).map {
  case (id, (username, rank)) => (username, rank)
}
// Print the result
println(ranksByUsername.collect().mkString("\n"))
What are Connected Components?
● A connected component is a subgraph in which any two vertices are
connected by a path (edge direction is ignored).
● Each connected component is labeled with the ID of its
lowest-numbered vertex.
● Example: In a social network, connected components can represent
clusters of users who are all connected directly or indirectly.
Connected Components in GraphX
● GraphX Implementation:
○ The algorithm is available in the ConnectedComponents object
in GraphX.
○ Labels each connected component in the graph with a unique ID
(the ID of the lowest-numbered vertex).
● How It Works:
○ The connected component algorithm runs on a graph and assigns
the same ID to all vertices in the same connected component.
○ Vertices with no edges are considered individual components.
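The labeling rule can be checked on a small in-memory graph. This is a sketch on hypothetical data, assuming a local SparkContext: vertices 1 to 3 and 4 to 5 form two groups, and vertex 6 has no edges.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

// Reuse the active context (e.g. in spark-shell) or start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("connected-components").setMaster("local[*]"))

// Hypothetical graph: {1, 2, 3} and {4, 5} are linked; 6 is isolated.
val graph: Graph[String, Int] = Graph(
  sc.parallelize((1L to 6L).map(id => (id, s"user$id"))),
  sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(5L, 4L, 1))))

// Each vertex attribute becomes the smallest vertex ID in its component.
val cc = graph.connectedComponents().vertices

println(cc.collect().sortBy(_._1).mkString(", "))
```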
import org.apache.spark.graphx.GraphLoader
// Load the graph as in the PageRank example
val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt")
// Find the connected components
val cc = graph.connectedComponents().vertices
// Join the connected components with the usernames
val users = sc.textFile("data/graphx/users.txt").map { line =>
  val fields = line.split(",")
  (fields(0).toLong, fields(1))
}
val ccByUsername = users.join(cc).map {
  case (id, (username, cc)) => (username, cc)
}
// Print the result
println(ccByUsername.collect().mkString("\n"))
Triangle Counting in GraphX

● What is Triangle Counting?
○ A vertex is part of a triangle if it has two adjacent vertices with an
edge between them.
○ Triangle counting provides a measure of clustering by counting
how many triangles pass through a vertex.
● GraphX Implementation:
○ The triangle counting algorithm is available in the TriangleCount
object in GraphX.
○ The algorithm counts the number of triangles that pass through
each vertex.
● How It Works:
○ A triangle consists of three vertices in which every pair is joined
by an edge.
○ The resulting counts measure clustering, indicating how tightly
connected groups of vertices are.
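A sketch of per-vertex triangle counts on a hypothetical in-memory graph, assuming a local SparkContext. Edges are listed with the smaller ID first and the graph is partitioned before counting, mirroring the file-based example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Graph, PartitionStrategy}

// Reuse the active context (e.g. in spark-shell) or start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("triangle-count").setMaster("local[*]"))

// Hypothetical graph: 1, 2, 3 form a triangle; 4 hangs off vertex 3.
// Edges are given with srcId < dstId (canonical orientation).
val raw: Graph[Int, Int] = Graph.fromEdgeTuples(
  sc.parallelize(Seq((1L, 2L), (2L, 3L), (1L, 3L), (3L, 4L))), defaultValue = 0)
val graph = raw.partitionBy(PartitionStrategy.RandomVertexCut)

// Number of triangles passing through each vertex.
val triCounts = graph.triangleCount().vertices

println(triCounts.collect().sortBy(_._1).mkString(", "))
```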
import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}
// Load the edges in canonical order and partition the graph for triangle count
val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt", true)
  .partitionBy(PartitionStrategy.RandomVertexCut)
// Find the triangle count for each vertex
val triCounts = graph.triangleCount().vertices
// Join the triangle counts with the usernames
val users = sc.textFile("data/graphx/users.txt").map { line =>
  val fields = line.split(",")
  (fields(0).toLong, fields(1))
}
val triCountByUsername = users.join(triCounts).map {
  case (id, (username, tc)) => (username, tc)
}
// Print the result
println(triCountByUsername.collect().mkString("\n"))
