Webinar: Working with Graph Data in MongoDB
Graph Operations
With MongoDB
Charles Sarrazin
Senior Consulting Engineer, MongoDB
Agenda
01 MongoDB Introduction
02 Graph Use & Concepts
03 New Lookup Operators
04 Example Scenarios
05 Design & Performance Considerations
06 Wrap-up
MongoDB Introduction
Documents
{
  first_name: 'Paul',
  surname: 'Miller',
  cell: 447557505611,
  city: 'London',
  location: [45.123,47.232],
  profession: ['banking', 'finance', 'trader'],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
• Fields
• Typed field values (e.g. Number)
• Fields can contain arrays
• Fields can contain an array of sub-documents
Query Language
db.collection.find({'city':'London'})
db.collection.find({'profession':{'$in':['banking','trader']}},{'surname':1,'profession':1})
db.collection.find({'cars.year':{'$lte':1968}}).sort({'surname':1}).limit(10)
db.collection.find({'cars.model':'Bentley','cars.year':{'$lt':1966}})
db.collection.find({'cars':{'$elemMatch':{'model':'Bentley','year':{'$lt':1966}}}})
db.collection.find({'location':{'$geoWithin': { '$geometry': {
'type': 'Polygon',
coordinates: [ <array-of-coordinates> ]
}}}})
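The two Bentley queries above look alike but behave differently on array fields. As a rough in-memory sketch (plain JavaScript rather than a MongoDB call; the helper names dottedMatch and elemMatch are made up for illustration), the dotted form lets each condition be satisfied by a different car, while $elemMatch requires a single car to satisfy both:

```javascript
// Paul's document from the Documents slide, reduced to the fields used here.
const paul = {
  surname: "Miller",
  cars: [
    { model: "Bentley", year: 1973 },
    { model: "Rolls Royce", year: 1965 },
  ],
};

// {'cars.model':'Bentley','cars.year':{'$lt':1966}}:
// each condition may be met by a different array element.
const dottedMatch = (doc) =>
  doc.cars.some((c) => c.model === "Bentley") &&
  doc.cars.some((c) => c.year < 1966);

// {'cars':{'$elemMatch':{'model':'Bentley','year':{'$lt':1966}}}}:
// one array element must meet both conditions.
const elemMatch = (doc) =>
  doc.cars.some((c) => c.model === "Bentley" && c.year < 1966);

console.log(dottedMatch(paul)); // true: Bentley exists, and the Rolls Royce is pre-1966
console.log(elemMatch(paul));   // false: the Bentley itself is from 1973
```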
Secondary Indexes
compound, geospatial, text, multikey, hashed,
unique, sparse, partial, TTL
Query Language
db.collection.aggregate([
  {$match: {'profession': {'$in': ['banking','trader']}}},
  {$addFields: {'surnameLower': {$toLower: "$surname"}, 'prof': {$ifNull: ["$prof","Unknown"]}}},
  {$group: { ... }},
  {$sort: { ... }},
  {$limit: ... },
  {$match: { ... }},
  ...
])
Aggregation pipeline
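One way to picture the pipeline is as a list of functions, each taking the documents produced by the previous stage. The following is only an in-memory sketch in plain JavaScript, not the server's implementation; the match and addFields helpers and the second sample person are assumptions added for illustration.

```javascript
// Sample input: Paul from the slides plus one hypothetical non-matching person.
const people = [
  { surname: "Miller", profession: ["banking", "finance", "trader"] },
  { surname: "Jones", profession: ["teacher"], prof: "Education" },
];

// Each helper returns a stage: a function from documents to documents.
const match = (predicate) => (docs) => docs.filter(predicate);
const addFields = (compute) => (docs) =>
  docs.map((d) => ({ ...d, ...compute(d) }));

const pipeline = [
  // {$match:{'profession':{'$in':['banking','trader']}}}
  match((d) => d.profession.some((p) => ["banking", "trader"].includes(p))),
  // {$addFields:{surnameLower:{$toLower:"$surname"}, prof:{$ifNull:["$prof","Unknown"]}}}
  addFields((d) => ({
    surnameLower: d.surname.toLowerCase(),
    prof: d.prof ?? "Unknown",
  })),
];

// Feed the output of each stage into the next one.
const result = pipeline.reduce((docs, stage) => stage(docs), people);
console.log(result);
```

Only the Miller document survives the first stage; the second stage then adds surnameLower and the default prof value to it.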
Schema Design
{
  first_name: 'Paul',
  surname: 'Miller',
  cell: 447557505611,
  city: 'London',
  location: [45.123,47.232],
  profession: ['banking', 'finance', 'trader'],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
Embed in the same document
Schema Design
{
  first_name: 'Paul',
  surname: 'Miller',
  cell: 447557505611,
  city: 'London',
  location: [45.123,47.232],
  profession: ['banking', 'finance', 'trader']
}
cars:
{ owner_id: 146,
  model: 'Bentley',
  year: 1973,
  value: 100000, … },
{ owner_id: 146,
  model: 'Rolls Royce',
  year: 1965,
  value: 330000, … }
Separate collection with reference
Functionality Timeline
• 2.0 – 2.2: Geospatial Polygon support; Aggregation Framework
• 2.4 – 2.6: New 2dsphere index; Aggregation Framework efficiency optimisations; Full text search
• 3.0 – 3.2: Join functionality; Increased geo accuracy; New Aggregation operators; Improved case insensitivity
• 3.4: Recursive graph traversal; Faceted search; Multiple collations
MongoDB 3.4 - Multi-Model Database
• Document: Rich JSON Data Structures, Flexible Schema, Global Scale
• Relational: Left-Outer Join, Views, Schema Validation
• Key/Value: Horizontal Scale, In-Memory
• Search: Text Search, Multiple Languages, Faceted Search
• Binaries: Files & Metadata, Encrypted
• Graph: Graph & Hierarchical, Recursive Lookups
• GeoSpatial: GeoJSON, 2D & 2DSphere
Graph Use & Concepts
Common Use Cases
• Networks
• Social – circle of friends/colleagues
• Computer network – physical/virtual/application layer
• Mapping / Routes
• Shortest route A to B
• Cybersecurity & Fraud Detection
• Real-time fraud/scam recognition
• Personalisation/Recommendation Engine
• Product, social, service, professional etc.
Graph Key Concepts
• Vertices (nodes)
• Edges (relationships)
• Nodes have properties
• Relationships have name & direction
Relational DBs Lack Relationships
• “Relationships” are actually JOINs
• Raw business or storage logic and constraints – not semantic
• JOIN tables, sparse columns, null-checks
• More JOINs = degraded performance and flexibility
Relational DBs Lack Relationships
• How expensive/complex is:
– Find my friends?
– Find friends of my friends?
– Find mutual friends?
– Find friends of my friends of my friends?
– And so on…
Native Graph Database Strengths
• Relationships are first class citizens of the database
• Index-free adjacency
• Nodes “point” directly to other nodes
• Efficient relationship traversal
Native Graph Database Challenges
• Complex query languages
• Poorly optimized for non-traversal queries
• Difficult to express
• May be memory intensive
• Less often used as System Of Record
• Synchronisation with SOR required
• Increased operational complexity
• Consistency concerns
NoSQL DBs Lack Relationships
• “Flat” disconnected documents or key/value pairs
• “Foreign keys” inferred at application layer
• Data integrity/quality onus is on the application
• It is often suggested that modeling ANY relationship efficiently is difficult with aggregate stores
• However…
Friends Network – Document Style
{
_id: 0,
name: "Bob Smith",
friends: ["Anna Jones", "Chris Green"]
},
{
_id: 1,
name: "Anna Jones",
friends: ["Bob Smith", "Chris Green", "Joe Lee"]
},
{
_id: 2,
name: "Chris Green",
friends: ["Anna Jones", "Bob Smith"]
}
Schema Design – before $graphLookup
• Options
• Store an array of direct children in each node
• Store parent in each node
• Store parent and array of ancestors
• Trade-offs
• Simple queries…
• …vs simple updates
[Diagram: tree of 17 numbered nodes]
Why MongoDB For Graph?
Lookup Operators
$lookup
Syntax
$lookup: {
from: <target lookup collection>,
localField: <field from the input document>,
foreignField: <field from the target collection to connect to>,
as: <field name for resulting array>
}
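As a rough mental model, $lookup behaves like a left outer join: every input document is kept, and the matching documents from the target collection are attached as an array. The sketch below imitates that in plain JavaScript for scalar fields only. The lookup helper, the _id value 146 on the person document (inferred from owner_id: 146 on the Schema Design slide) and the second person are assumptions for illustration.

```javascript
// Simplified in-memory imitation of $lookup (scalar equality only).
function lookup(inputDocs, fromDocs, { localField, foreignField, as }) {
  return inputDocs.map((doc) => ({
    ...doc,
    // Unmatched input documents still appear, with an empty array.
    [as]: fromDocs.filter((f) => f[foreignField] === doc[localField]),
  }));
}

const people = [
  { _id: 146, surname: "Miller" },
  { _id: 147, surname: "Jones" }, // hypothetical person without cars
];
const cars = [
  { owner_id: 146, model: "Bentley", year: 1973 },
  { owner_id: 146, model: "Rolls Royce", year: 1965 },
];

const joined = lookup(people, cars, {
  localField: "_id",
  foreignField: "owner_id",
  as: "cars",
});
console.log(JSON.stringify(joined, null, 2));
```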
$graphLookup
Syntax
$graphLookup: {
from: <target lookup collection>,
startWith: <expression for value to start from>,
connectToField: <field name in target collection to connect to>,
connectFromField: <field name in target collection to connect from – recurse from here>,
as: <field name for resulting array>,
maxDepth: <max number of iterations to perform>,
depthField: <field name for number of recursive iterations required to reach this node>,
restrictSearchWithMatch: <match condition to apply to lookup>
}
Things To Note
• startWith value is an expression
• Referencing value of a field requires the ‘$’ prefix
• Can do things like {$toLower: "$name" }
• Handles array fields automatically
• connectToField and connectFromField take field names
• restrictSearchWithMatch takes a standard query expression
Things To Note
• Cycles are automatically detected
• Can be used with 3.4 views:
• Define a view
• Recurse across existing view (‘base’ or ‘from’)
• Can be used multiple times per Aggregation pipeline
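The notes above can be pictured with a small breadth-first search. This is a simplified in-memory sketch of the recursion, not the server's algorithm: it assumes connectToField holds a scalar, ignores restrictSearchWithMatch, and uses a made-up three-node cycle as data. The function name graphLookup is reused purely for readability.

```javascript
// Breadth-first sketch of the $graphLookup recursion over an in-memory array.
function graphLookup(collection, startWith, options) {
  const { connectFromField, connectToField, maxDepth = Infinity, depthField } = options;
  const asArray = (v) => (Array.isArray(v) ? v : [v]);
  const seen = new Set(); // documents already returned, so cycles terminate
  const results = [];
  let frontier = asArray(startWith); // array values are handled element by element

  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const doc of collection) {
      if (seen.has(doc) || !frontier.includes(doc[connectToField])) continue;
      seen.add(doc);
      results.push(depthField ? { ...doc, [depthField]: depth } : doc);
      next.push(...asArray(doc[connectFromField] ?? []));
    }
    frontier = next;
  }
  return results;
}

// Hypothetical data containing a cycle: a -> b -> c -> a.
const nodes = [
  { name: "a", next: ["b"] },
  { name: "b", next: ["c"] },
  { name: "c", next: ["a"] },
];

const reached = graphLookup(nodes, "a", {
  connectFromField: "next",
  connectToField: "name",
  depthField: "depth",
});
console.log(reached.map((d) => `${d.name}@${d.depth}`)); // [ 'a@0', 'b@1', 'c@2' ]
```

Each document is returned once, at the depth where it was first reached, so the loop ends even though c points back to a.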
Schema Design – with $graphLookup
• Options
• Store immediate parent in each node
• Store immediate children in each node
• Traverse in multiple directions
• Recurse in same collection
• Join/recurse into another collection
[Diagram: tree of 17 numbered nodes]
"So just how suitable is MongoDB for the many varied graph use cases I have, then?"
75% of use cases*
*based on beta test user feedback
Example Scenarios
Scenario: Calculate Friend Network
{
_id: 0,
name: "Bob Smith",
friends: ["Anna Jones", "Chris Green"]
},
{
_id: 1,
name: "Anna Jones",
friends: ["Bob Smith", "Chris Green", "Joe Lee"]
},
{
_id: 2,
name: "Chris Green",
friends: ["Anna Jones", "Bob Smith"]
}
Scenario: Calculate Friend Network
[
{
$match: { "name": "Bob Smith" }
},
{
$graphLookup: {
from: "contacts",
startWith: "$friends",
connectToField: "name",
connectFromField: "friends",
as: "socialNetwork"
}
},
{
$project: { name: 1, friends:1, socialNetwork: "$socialNetwork.name"}
}
]
Notes: the friends field is an array. No maxDepth is set.
Scenario: Calculate Friend Network
{
"_id" : 0,
"name" : "Bob Smith",
"friends" : [
"Anna Jones",
"Chris Green"
],
"socialNetwork" : [
"Joe Lee",
"Fred Brown",
"Bob Smith",
"Chris Green",
"Anna Jones"
]
}
Note: socialNetwork is returned as an array.
Friends Network - Social
[Diagram: nodes Bob Smith, Chris Green, Anna Jones and Joe Lee connected by "friends" edges, with the question "Recommendation?"; a second build of the slide adds an "Acme Soda" node]
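The "Recommendation?" question can be explored with the three contact documents shown earlier. The sketch below is plain in-memory JavaScript, not a MongoDB pipeline; the function names socialNetwork and recommend are illustrative assumptions. It first walks the friends arrays, then suggests anyone named in the network who is neither the person nor already a direct friend.

```javascript
// The three contact documents from the slides.
const contacts = [
  { _id: 0, name: "Bob Smith", friends: ["Anna Jones", "Chris Green"] },
  { _id: 1, name: "Anna Jones", friends: ["Bob Smith", "Chris Green", "Joe Lee"] },
  { _id: 2, name: "Chris Green", friends: ["Anna Jones", "Bob Smith"] },
];

// Names of all contact documents reachable through friends arrays.
function socialNetwork(startName) {
  const person = contacts.find((c) => c.name === startName);
  const reached = new Set();
  let frontier = person ? person.friends : [];
  while (frontier.length > 0) {
    const next = [];
    for (const c of contacts) {
      if (reached.has(c.name) || !frontier.includes(c.name)) continue;
      reached.add(c.name);
      next.push(...c.friends);
    }
    frontier = next;
  }
  return reached;
}

// People mentioned in the network who are not yet direct friends.
function recommend(startName) {
  const person = contacts.find((c) => c.name === startName);
  if (!person) return [];
  const candidates = new Set();
  for (const name of socialNetwork(startName)) {
    contacts.find((c) => c.name === name).friends.forEach((f) => candidates.add(f));
  }
  candidates.delete(startName);
  person.friends.forEach((f) => candidates.delete(f));
  return [...candidates];
}

console.log(recommend("Bob Smith")); // [ 'Joe Lee' ]
```

With only these three documents, the traversal reaches Anna, Chris and Bob himself, and Joe Lee is the one name Bob is not yet connected to.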
Scenario: Determine Air Travel Options
[Diagram: route map linking ORD, JFK, BOS, PWM and LHR]
{ "_id" : 0, "airport" : "JFK", "connects" : [ "BOS", "ORD" ] }
{ "_id" : 1, "airport" : "BOS", "connects" : [ "JFK", "PWM" ] }
{ "_id" : 2, "airport" : "ORD", "connects" : [ "JFK" ] }
{ "_id" : 3, "airport" : "PWM", "connects" : [ "BOS", "LHR" ] }
{ "_id" : 4, "airport" : "LHR", "connects" : [ "PWM" ] }
Scenario: Determine Air Travel Options
Meet Lucy
{ "_id" : 0, "name" : "Lucy", "nearestAirport" : "JFK" }
[
{
"$match": {"name":"Lucy"}
},
{
"$graphLookup": {
from: "airports",
startWith: "$nearestAirport",
connectToField: "airport",
connectFromField: "connects",
maxDepth: 2,
depthField: "numFlights",
as: "destinations"
}
}
]
Note: depthField records the number of recursions.
Scenario: Determine Air Travel Options
{
name: "Lucy",
nearestAirport: "JFK",
destinations: [
{ _id: 0, airport: "JFK", connects: ["BOS", "ORD"], numFlights: 0 },
{ _id: 1, airport: "BOS", connects: ["JFK", "PWM"], numFlights: 1 },
{ _id: 2, airport: "ORD", connects: ["JFK"], numFlights: 1 },
{ _id: 3, airport: "PWM", connects: ["BOS", "LHR"], numFlights: 2 }
]
}
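The result above can be reproduced by hand with a queue-based breadth-first search over the five airport documents. This is an in-memory sketch in plain JavaScript rather than the server's implementation; the function name destinations is an illustrative assumption, and its second argument plays the role of maxDepth.

```javascript
// The airport documents from the slides.
const airports = [
  { _id: 0, airport: "JFK", connects: ["BOS", "ORD"] },
  { _id: 1, airport: "BOS", connects: ["JFK", "PWM"] },
  { _id: 2, airport: "ORD", connects: ["JFK"] },
  { _id: 3, airport: "PWM", connects: ["BOS", "LHR"] },
  { _id: 4, airport: "LHR", connects: ["PWM"] },
];
const byCode = new Map(airports.map((a) => [a.airport, a]));

// Breadth-first search that stops expanding once maxDepth is reached.
function destinations(start, maxDepth) {
  if (!byCode.has(start)) return [];
  const depthOf = new Map([[start, 0]]);
  const queue = [start];
  while (queue.length > 0) {
    const code = queue.shift();
    const depth = depthOf.get(code);
    if (depth === maxDepth) continue; // do not recurse any further
    for (const next of byCode.get(code).connects) {
      if (byCode.has(next) && !depthOf.has(next)) {
        depthOf.set(next, depth + 1);
        queue.push(next);
      }
    }
  }
  return [...depthOf].map(([airport, numFlights]) => ({ airport, numFlights }));
}

console.log(destinations("JFK", 2));
// JFK at 0, BOS and ORD at 1, PWM at 2; LHR would need a third recursion
```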
Note: numFlights shows how many flights this would take.
Scenario: Determine Air Travel Options
[Diagram: route map linking ORD, JFK, BOS, PWM and LHR, extended with ATL]
Scenario: Determine Air Travel Options
{ "_id" : 0, "airport" : "JFK", "connects" : [
{ "to" : "BOS", "airlines" : [ "UA", "AA" ] },
{ "to" : "ORD", "airlines" : [ "UA", "AA" ] },
{ "to" : "ATL", "airlines" : [ "AA", "DL" ] }] }
{ "_id" : 1, "airport" : "BOS", "connects" : [
{ "to" : "JFK", "airlines" : [ "UA", "AA" ] },
{ "to" : "PWM", "airlines" : [ "AA" ] }] }
{ "_id" : 2, "airport" : "ORD", "connects" : [
{ "to" : "JFK", "airlines" : [ "UA", "AA" ] }] }
{ "_id" : 3, "airport" : "PWM", "connects" : [
{ "to" : "BOS", "airlines" : [ "AA" ] }] }
Scenario: Determine Air Travel Options
[
{
"$match":{"name":"Lucy"}
},
{
"$graphLookup": {
from: "airports",
startWith: "$nearestAirport",
connectToField: "airport",
connectFromField: "connects.to",
maxDepth: 2,
depthField: "numFlights",
restrictSearchWithMatch: {"connects.airlines":"UA"},
as: "UAdestinations"
}
}
]
Note: we've added a filter (restrictSearchWithMatch).
Scenario: Determine Air Travel Options
{
"name" : "Lucy",
"from" : "JFK",
"UAdestinations" : [
{ "_id" : 2, "airport" : "ORD", "numFlights" : NumberLong(1) },
{ "_id" : 1, "airport" : "BOS", "numFlights" : NumberLong(1) }
]
}
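To see what the filter does, the sketch below replays the traversal in plain JavaScript over the four airline-annotated documents. It is an approximation under stated assumptions, not the server's code: servedByUA mimics the match {"connects.airlines":"UA"}, and reach is an illustrative helper. The filter is applied to whole documents, so PWM (served only by AA) is never returned. Unlike the slide's output, this sketch also lists the starting airport JFK at depth 0, as in the earlier unfiltered result.

```javascript
// The airline-annotated airport documents from the slides.
const airports = [
  { _id: 0, airport: "JFK", connects: [
    { to: "BOS", airlines: ["UA", "AA"] },
    { to: "ORD", airlines: ["UA", "AA"] },
    { to: "ATL", airlines: ["AA", "DL"] } ] },
  { _id: 1, airport: "BOS", connects: [
    { to: "JFK", airlines: ["UA", "AA"] },
    { to: "PWM", airlines: ["AA"] } ] },
  { _id: 2, airport: "ORD", connects: [
    { to: "JFK", airlines: ["UA", "AA"] } ] },
  { _id: 3, airport: "PWM", connects: [
    { to: "BOS", airlines: ["AA"] } ] },
];

// Mimics {"connects.airlines": "UA"}: some connection lists UA.
const servedByUA = (a) => a.connects.some((c) => c.airlines.includes("UA"));

// Traverse only documents that pass the filter, recording the depth reached.
function reach(start, maxDepth, filter) {
  const candidates = airports.filter(filter);
  const found = new Map(); // airport code -> depth
  let frontier = [start];
  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const a of candidates) {
      if (found.has(a.airport) || !frontier.includes(a.airport)) continue;
      found.set(a.airport, depth);
      next.push(...a.connects.map((c) => c.to)); // "connects.to"
    }
    frontier = next;
  }
  return found;
}

const ua = reach("JFK", 2, servedByUA);
console.log([...ua]); // [ [ 'JFK', 0 ], [ 'BOS', 1 ], [ 'ORD', 1 ] ]
```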
Scenario: Product Categories
[Diagram: category tree with the nodes Mugs, Kitchen & Dining, Commuter & Travel, Glassware & Drinkware, Outdoor Recreation, Camping Mugs, Running Thermos, Red Run Thermos, White Run Thermos and Blue Run Thermos]
Scenario: Product Categories
Get all children 2 levels deep – flat result
Scenario: Product Categories
Get all children 2 levels deep – nested result
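The queries for these two slides did not survive extraction, so the following is only a sketch of the idea in plain JavaScript. It assumes each category stores its parent's name; the parent links below are invented for illustration (only the category names come from the diagram), and childrenFlat and nest are hypothetical helpers.

```javascript
// Hypothetical hierarchy: names are from the slide, parent links are assumed.
const categories = [
  { name: "Kitchen & Dining", parent: null },
  { name: "Glassware & Drinkware", parent: "Kitchen & Dining" },
  { name: "Commuter & Travel", parent: "Kitchen & Dining" },
  { name: "Mugs", parent: "Glassware & Drinkware" },
  { name: "Camping Mugs", parent: "Mugs" },
];

// Flat result: every descendant up to `levels` levels below the root.
function childrenFlat(root, levels) {
  const out = [];
  let frontier = [root];
  for (let level = 1; level <= levels && frontier.length > 0; level++) {
    const found = categories.filter((c) => frontier.includes(c.parent));
    found.forEach((c) => out.push({ name: c.name, parent: c.parent, level }));
    frontier = found.map((c) => c.name);
  }
  return out;
}

// Nested result: rebuild the tree shape from the flat list.
function nest(root, flat) {
  return flat
    .filter((c) => c.parent === root)
    .map((c) => ({ name: c.name, children: nest(c.name, flat) }));
}

const flat = childrenFlat("Kitchen & Dining", 2);
const nested = nest("Kitchen & Dining", flat);
console.log(flat.map((c) => `${c.level}: ${c.name}`));
console.log(JSON.stringify(nested, null, 2));
```

The flat form is convenient for listing or counting; the nested form mirrors the menu structure a storefront would render.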
Scenario: Article Recommendation
[Diagram: graph of articles, each labelled with a content id and a conversion rate and grouped into Depth 0, Depth 1 and Depth 2; a second build highlights Target #1 (best), Target #2 and Target #3, with recommendations for Target #1 and for Targets #2 and #3]
Design & Performance Considerations
The Tale of Two Biebers
VS
Follower Churn
• Everyone worries about scaling content
• But follow requests can be >> message send rates
• Twitter enforces per-day follow limits
Edge Metadata
• Models – friends/followers
• Requirements typically start simple
• Add Groups, Favorites, Relationships
Options for Storing Graphs in MongoDB
Option One – Embedding Edges
Embedded Edge Arrays
• Storing connections with user (popular choice)
✓ Most compact form
✓ Efficient for reads
• However….
• User documents grow
• Upper limit on degree (document size)
• Difficult to annotate (and index) edge
{
"_id" : "djw",
"fullname" : "Darren Wood",
"country" : "Australia",
"followers" : [ "jsr", "ian"],
"following" : [ "jsr", "pete"]
}
Embedded Edge Arrays
• Creating Rich Graph Information
• Can become cumbersome
{
"_id" : "djw",
"fullname" : "Darren Wood",
"country" : "Australia",
"friends" : [
{"uid" : "jsr", "grp" : "school"},
{"uid" : "ian", "grp" : "work"} ]
}
{
"_id" : "djw",
"fullname" : "Darren Wood",
"country" : "Australia",
"friends" : [ "jsr", "ian"],
"group" : [ "school", "work"]
}
Option Two – Edge Collection
Edge Collections
• Document per edge
• Very flexible for adding edge data
> db.followers.findOne()
{
"_id" : ObjectId(…),
"from" : "djw",
"to" : "jsr"
}
> db.friends.findOne()
{
"_id" : ObjectId(…),
"from" : "djw",
"to" : "jsr",
"grp" : "work",
"ts" : ISODate("2013-07-10T00:00:00Z")
}
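Moving from embedded edge arrays to an edge collection amounts to emitting one document per array element. A minimal sketch in plain JavaScript, assuming the richer "friends" array of { uid, grp } objects shown earlier; the helper name toEdges is hypothetical, and any extra fields on an element are carried over as edge metadata.

```javascript
// User document with embedded, annotated edges (from the earlier slide).
const user = {
  _id: "djw",
  fullname: "Darren Wood",
  country: "Australia",
  friends: [
    { uid: "jsr", grp: "school" },
    { uid: "ian", grp: "work" },
  ],
};

// One edge document per embedded element; remaining fields become metadata.
function toEdges(u) {
  return u.friends.map(({ uid, ...metadata }) => ({
    from: u._id,
    to: uid,
    ...metadata,
  }));
}

const edges = toEdges(user);
console.log(edges);
// [ { from: 'djw', to: 'jsr', grp: 'school' }, { from: 'djw', to: 'ian', grp: 'work' } ]
```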
Edge Collection Indexing Strategies
Finding Followers
Find followers in single edge collection :
> db.followers.find({from : "djw"}, {_id:0, to:1})
{
"to" : "jsr"
}
Using index :
{
"v" : 1,
"key" : { "from" : 1, "to" : 1 },
"unique" : true,
"ns" : "socialite.followers",
"name" : "from_1_to_1"
}
Notes:
• Covered index when searching on "from" for all followers
• Specify "unique" only if multiple edges cannot exist
Finding Following
What about who a user is following?
Could use a reverse covered index :
{
"v" : 1,
"key" : { "from" : 1, "to" : 1 },
"unique" : true,
"ns" : "socialite.followers",
"name" : "from_1_to_1"
}
{
"v" : 1,
"key" : { "to" : 1, "from" : 1 },
"unique" : true,
"ns" : "socialite.followers",
"name" : "to_1_from_1"
}
Notice the flipped field order here.
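The two index definitions above can be pictured as two sorted lists of key pairs. The sketch below is an in-memory analogy in plain JavaScript, not how the storage engine is implemented. It assumes, as the slide's queries suggest, that an edge { from, to } means "to" follows "from", and it builds the sample edges from the djw document's followers and following arrays. The helper names cmp and prefixScan are illustrative.

```javascript
// Edges derived from djw's followers ["jsr", "ian"] and following ["jsr", "pete"].
const edges = [
  { from: "djw", to: "jsr" },
  { from: "djw", to: "ian" },
  { from: "jsr", to: "djw" },
  { from: "pete", to: "djw" },
];

// Order pairs lexicographically, like a compound index on two fields.
const cmp = ([a1, b1], [a2, b2]) =>
  a1 < a2 ? -1 : a1 > a2 ? 1 : b1 < b2 ? -1 : b1 > b2 ? 1 : 0;

const fromTo = edges.map((e) => [e.from, e.to]).sort(cmp); // { from: 1, to: 1 }
const toFrom = edges.map((e) => [e.to, e.from]).sort(cmp); // { to: 1, from: 1 }

// Binary-search the first entry with the leading key, then read the run.
// The answer comes from the index entries alone, which is what "covered" means.
function prefixScan(index, value) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] < value) lo = mid + 1;
    else hi = mid;
  }
  const out = [];
  for (let i = lo; i < index.length && index[i][0] === value; i++) {
    out.push(index[i][1]);
  }
  return out;
}

const followers = prefixScan(fromTo, "djw"); // leading key is "from"
const following = prefixScan(toFrom, "djw"); // leading key is "to"
console.log(followers, following); // [ 'ian', 'jsr' ] [ 'jsr', 'pete' ]
```

A lookup by "to" cannot use the first list efficiently because "to" is not its leading key, which is why the flipped index exists.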
Wait! There may be an issue with the reverse index…
Finding Following: SHARDING!
If we shard this collection by "from", looking up followers for a specific user is "targeted" to a shard.
To find who the user is following, however, it must scatter-gather the query to all shards.
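The targeted versus scatter-gather behaviour can be imitated with a few arrays standing in for shards. This is a toy model in plain JavaScript with an assumed hash function and shard count, not MongoDB's chunk-based routing; shardFor, findByFrom and findByTo are hypothetical names, and the edges follow the same "to follows from" reading as the slide's queries.

```javascript
const SHARD_COUNT = 3; // assumed cluster size for the illustration

// Toy hash of the shard key value.
function shardFor(key) {
  let h = 0;
  for (const ch of key) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % SHARD_COUNT;
}

const edges = [
  { from: "djw", to: "jsr" },
  { from: "djw", to: "ian" },
  { from: "jsr", to: "djw" },
  { from: "pete", to: "djw" },
];

// Distribute edges by the shard key "from".
const shards = Array.from({ length: SHARD_COUNT }, () => []);
edges.forEach((e) => shards[shardFor(e.from)].push(e));

// Query on the shard key: only one shard needs to be consulted.
function findByFrom(user) {
  const docs = shards[shardFor(user)].filter((e) => e.from === user);
  return { shardsQueried: 1, docs };
}

// Query on "to": the shard key is unknown, so every shard is asked.
function findByTo(user) {
  const docs = shards.flatMap((s) => s.filter((e) => e.to === user));
  return { shardsQueried: SHARD_COUNT, docs };
}

console.log(findByFrom("djw").shardsQueried, findByTo("djw").shardsQueried); // 1 3
```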
Dual Edge Collections
• When "following" queries are common
• Not always the case
• Consider overhead carefully
• Can use dual collections storing
• One for each direction
• Edges are duplicated in reverse
• Can be sharded independently
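A minimal sketch of the dual-collection idea, using two in-memory Maps in place of two collections (plain JavaScript; the names followersOf, followingOf, follow and unfollow are illustrative assumptions). Every write touches both structures, which is the overhead the slide asks you to weigh, and each structure can then be keyed, or sharded, by its own lookup field.

```javascript
const followersOf = new Map(); // followee -> Set of followers
const followingOf = new Map(); // follower -> Set of followees

function addTo(map, key, value) {
  if (!map.has(key)) map.set(key, new Set());
  map.get(key).add(value);
}

// One logical edge is stored twice, once per direction.
function follow(follower, followee) {
  addTo(followersOf, followee, follower);
  addTo(followingOf, follower, followee);
}

// Removal must also be applied to both copies to keep them consistent.
function unfollow(follower, followee) {
  followersOf.get(followee)?.delete(follower);
  followingOf.get(follower)?.delete(followee);
}

follow("jsr", "djw");
follow("ian", "djw");
follow("djw", "jsr");
follow("djw", "pete");
unfollow("ian", "djw");

console.log([...(followersOf.get("djw") ?? [])]); // [ 'jsr' ]
console.log([...(followingOf.get("djw") ?? [])]); // [ 'jsr', 'pete' ]
```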
Wrap-up
MongoDB $graphLookup
• Efficient, index-based recursive queries
• Familiar, MongoDB query language
• Use a single System Of Record
• Cater for all query types
• No added operational overhead
• No synchronization requirements
• Reduced technology surface area