Schema Design with MongoDB

open-source, high-performance,
document-oriented database

Schema Design Basics
roger@10Gen.com

This talk
‣Intro
-Terms / Deﬁnitions

This talk
‣Intro
‣Getting a ﬂavor
-Creating a Schema
-Indexes
-Evolving the Schema

This talk
‣Intro
‣Getting a ﬂavor
-Creating a Schema
-Indexes
-Evolving the Schema
‣Data modeling
-DBRef
-Single Table Inheritance
-Many - Many
-Trees
-Lists / Queues / Stacks

Document Oriented

Basic unit of data: JSON Documents
Not Relational, Key Value

Not OODB
- Associations implied by Document Structure
- but your database schema != your program schema

Terms
Table -> Collection

Row(s) -> JSON Document

Index -> Index

Join -> Embedding and Linking
across documents

Partition -> Shard
Partition Key -> Shard Key

Considerations

What are the requirements ?
- Functionality to be supported
- Access Patterns ?
- Data Life Cycle (insert, update, deletes)
- Expected Performance / Workload ?

Capabilities of the database ?

DB Considerations
How can we manipulate this data ?
Dynamic Queries
Secondary Indexes
Atomic Updates
Map Reduce

Access Patterns ?
Read / Write Ratio
Types of updates
Types of queries

Considerations
No Joins
Single Document Transactions only

Design Session
Use Rich Design Documents

post = {author: “kyle”,
date: new Date(),
text: “my blog post...”,
tags: [“mongodb”, “intro”]}

>db.post.save(post)

>db.posts.ﬁnd()

{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
author : "kyle",
date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
text : "My ﬁrst blog",
tags : [ "mongodb", "intro" ] }

Notes:
- ID is unique, but can be anything you’d like

Secondary index for “author”

// 1 means ascending, -1 means descending

>db.posts.ensureIndex({author: 1})

>db.posts.ﬁnd({author: 'kyle'})

author : "kyle",
... }

Verifying indexes exist
>db.system.indexes.ﬁnd()

// Index on ID
{ name : "_id_",
ns : "test.posts",
key : { "_id" : 1 } }

Verifying indexes exist
>db.system.indexes.ﬁnd()

// Index on ID
{ name : "_id_",
ns : "test.posts",
key : { "_id" : 1 } }

// Index on author
ns : "test.posts",
key : { "author" : 1 },
name : "author_1" }

Query operators
Conditional operators:
$ne, $in, $nin, $mod, $all, $size, $exists, $type, ..
$lt, $lte, $gt, $gte, $ne,

// ﬁnd posts with any tags
>db.posts.ﬁnd({tags: {$exists: true}})

Query operators


Regular expressions:
// posts where author starts with k
>db.posts.ﬁnd({author: /^k*/i })

Query operators


Regular expressions:
// posts where author starts with k
>db.posts.ﬁnd({author: /^k*/i })

Counting:
// posts written by mike
>db.posts.ﬁnd({author: “mike”}).count()

Extending the Schema
comment = {author: “fred”,
date: new Date(),
text: “super duper”}

update = { ‘$push’: {comments: comment},
‘$inc’: {comments_count: 1}}

>db.posts.update({_id: “...” }, update)

author : "kyle",
text : "My ﬁrst blog",
tags : [ "mongodb", "intro" ],
comments_count: 1,
comments : [

{

author : "Fred",


text : "Super Duper"

}
]}

// create index on nested documents:
>db.posts.ensureIndex({"comments.author": 1})

>db.posts.ﬁnd({comments.author:”Fred”})


>db.posts.find({comments.author:”kyle”})

// find last 5 posts:
>db.posts.find().sort({date:-1}).limit(5)


>db.posts.find({comments.author:”kyle”})

// find last 5 posts:
>db.posts.find().sort({date:-1}).limit(5)

// most commented post:
>db.posts.find().sort({comments_count:-1}).limit(1)

When sorting, check if you need an index

Map Reduce
Aggregation and batch manipulation

Collection in, Collection out

Parallel in sharded environments

Map reduce
mapFunc = function () {
this.tags.forEach(function (z) {emit(z, {count:1});});
}

reduceFunc = function (k, v) {
var total = 0;
for (var i = 0; i < v.length; i++) { total += v[i].count; }
return {count:total}; }

res = db.posts.mapReduce(mapFunc, reduceFunc)

>db[res.result].ﬁnd()
{ _id : "intro", value : { count : 1 } }
{ _id : "mongodb", value : { count : 1 } }

Review
So Far:
- Started out with a simple schema
- Queried Data
- Evolved the schema
- Queried / Updated the data some more

Wordnik
9B records, 100M queries / week, 1,2TB
{

entry : {

header: { id: 0,

headword: "m",

sourceDictionary: "GCide",

textProns : [

{text: "(em)",

seq:0}

],

syllables: [

{id: 0,

text: "m"}

],

sourceDictionary: "1913 Webster",

headWord: "m",

id: 1,

deﬁnitions: : [

{text: "M, the thirteenth letter..."},

{text: "As a numeral, M stands for 1000"}]

}

}
}

Review
So Far:
- Started out with a simple schema
- Queried Data
- Evolved the schema
- Queried / Updated the data some more

Observations:
- Using Rich Documents works well
- Simplify relations by embedding them
- Iterative development is easy with MongoDB

Single Table Inheritance
>db.shapes.find()

{ _id: ObjectId("..."), type: "circle", area: 3.14, radius: 1}
{ _id: ObjectId("..."), type: "square", area: 4, d: 2}
{ _id: ObjectId("..."), type: "rect", area: 10, length: 5, width: 2}

// find shapes where radius > 0
>db.shapes.find({radius: {$gt: 0}})

// create index
>db.shapes.ensureIndex({radius: 1})

One to Many
- Embedded Array / Array Keys
- slice operator to return subset of array
- hard to ﬁnd latest comments across all documents

One to Many

- Embedded tree
- Single document
- Natural
- Hard to query

One to Many

- Embedded tree
- Single document
- Natural
- Hard to query

- Normalized (2 collections)
- most ﬂexible
- more queries

Many - Many
Example:

- Product can be in many categories
- Category can have many products

Products id | product_id | category_id Category

products:
{ _id: ObjectId("4c4ca23933fb5941681b912e"),
name: "Sumatra Dark Roast",
category_ids: [ ObjectId("4c4ca25433fb5941681b912f"),
ObjectId("4c4ca25433fb5941681b92af”]}

products:

categories:
{ _id: ObjectId("4c4ca25433fb5941681b912f"),
name: "Indonesia",
product_ids: [ ObjectId("4c4ca23933fb5941681b912e"),
ObjectId("4c4ca30433fb5941681b9130"),
ObjectId("4c4ca30433fb5941681b913a"]}

products:

categories:
name: "Indonesia",
ObjectId("4c4ca30433fb5941681b9130"),

//All categories for a given product
>db.categories.ﬁnd({product_ids: ObjectId("4c4ca23933fb5941681b912e")})

products:

categories:
name: "Indonesia",
ObjectId("4c4ca30433fb5941681b9130"),

//All categories for a given product
>db.categories.ﬁnd({product_ids: ObjectId("4c4ca23933fb5941681b912e")})

//All products for a given category
>db.products.ﬁnd({category_ids: ObjectId("4c4ca25433fb5941681b912f")})

Alternative
products:

categories:
name: "Indonesia"}

Alternative
products:

categories:
name: "Indonesia"}

// All products for a given category

Alternative
products:

categories:
name: "Indonesia"}

// All products for a given category

// All categories for a given product
product = db.products.ﬁnd(_id : some_id)
>db.categories.ﬁnd({_id : {$in : product.category_ids}})

Trees
Full Tree in Document

{ comments: [
{ author: “rpb”, text: “...”,
replies: [
{author: “Fred”, text: “...”,
replies: []}
]}
]}

Pros: Single Document, Performance, Intuitive
Cons: Hard to search, Partial Results, 4MB limit

Trees
Parent Links
- Each node is stored as a document
- Contains the id of the parent

Child Links
- Each node contains the id’s of the children
- Can support graphs (multiple parents / child)

Array of Ancestors
- Store Ancestors of a node
{ _id: "a" }
{ _id: "b", ancestors: [ "a" ], parent: "a" }
{ _id: "c", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "d", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "e", ancestors: [ "a" ], parent: "a" }
{ _id: "f", ancestors: [ "a", "e" ], parent: "e" }
{ _id: "g", ancestors: [ "a", "b", "d" ], parent: "d" }

Array of Ancestors
{ _id: "a" }

//ﬁnd all descendants of b:
>db.tree2.ﬁnd({ancestors: ‘b’})

Array of Ancestors
{ _id: "a" }

//find all descendants of b:
>db.tree2.find({ancestors: ‘b’})

//find all ancestors of f:
>ancestors = db.tree2.findOne({_id:’f’}).ancestors
>db.tree2.find({_id: { $in : ancestors})

ﬁndAndModify
Queue example

//Example: grab highest priority job and mark

job = db.jobs.ﬁndAndModify({
query: {inprogress: false},
sort: {priority: -1),
update: {$set: {inprogress: true,
started: new Date()}},
new: true})

More Cool Stuff

• Aggregation
• Capped collections
• GridFS
• Geo

Learn More
Kyle’s presentation + video:
http://www.slideshare.net/kbanker/mongodb-schema-design
http://www.blip.tv/file/3704083

Dwight’s presentation
http://www.slideshare.net/mongosf/schema-design-with-mongodb-dwight-
merriman

Documentation
Trees: http://www.mongodb.org/display/DOCS/Trees+in+MongoDB
Queues: http://www.mongodb.org/display/DOCS/findandmodify+Command
Aggregration: http://www.mongodb.org/display/DOCS/Aggregation
Capped Col. : http://www.mongodb.org/display/DOCS/Capped+Collections
Geo: http://www.mongodb.org/display/DOCS/Geospatial+Indexing
GridFS: http://www.mongodb.org/display/DOCS/GridFS+Specification

Download
MongoDB

and let us know what you think
@mongodb
http://www.mongodb.org

DBRef
DBRef
{$ref: collection, $id: id_value}

- Think URL
- YDSMV: your driver support may vary

Sample Schema:
nr = {note_refs: [{"$ref" : "notes", "$id" : 5}, ... ]}

Dereferencing:
nr.forEach(function(r) {
printjson(db[r.$ref].ﬁndOne({_id: r.$id}));
}

BSON
Mongodb stores data in BSON internally

Lightweight, Traversable, Efficient encoding

Typed
boolean, integer, ﬂoat, date, string, binary, array...

Schema Design with MongoDB

More Related Content

What's hot

Viewers also liked

Similar to Schema Design with MongoDB

Recently uploaded

Schema Design with MongoDB

Editor's Notes