MongoDB intro
Introduction

Christian Kvalheim - christkv@10gen.com
@christkv
Today's Talk
• Quick introduction to NoSQL

• Some Background about mongoDB

• Using mongoDB

• Deploying mongoDB
Database Landscape

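[Figure: database landscape. Vertical axis: scalability & performance; horizontal axis: depth of functionality. memcached and key/value stores sit high on scalability & performance; RDBMS sits far along depth of functionality.]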
What is NoSQL?




• Key/Value
• Column
• Graph
• Document
Key-Value Stores
• A mapping from a key to a value
• The store doesn't know anything about the key
  or value
• The store doesn't know anything about the insides
  of the value
• Operations
 • Set, get, or delete a key-value pair
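A minimal sketch of the three operations above, in plain JavaScript. The class name `KeyValueStore` and the key `session:42` are hypothetical, chosen only for illustration; the point is that the store treats the value as an opaque string and only the client interprets it.

```javascript
// Hypothetical in-memory key-value store; names are illustrative only.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  set(key, value) { this.data.set(key, value); }
  get(key) { return this.data.get(key); }
  delete(key) { return this.data.delete(key); }
}

const kv = new KeyValueStore();
// The value is stored as an opaque string; the store never parses it.
kv.set('session:42', JSON.stringify({ user: 'Chris', cart: ['tt-123'] }));
const raw = kv.get('session:42');
// Interpreting the value happens in the client, not in the store.
const cartSize = JSON.parse(raw).cart.length;
kv.delete('session:42');
const afterDelete = kv.get('session:42');
```

Because the store cannot look inside values, a query such as "all sessions for user Chris" would need a full scan or a separately maintained key.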
Column-Oriented Stores
• Like a relational store, but flipped around: all data
 for a column is kept together
• An index provides a means to get a column value for a
  record
• Operations:
 • Get, insert, delete records; updating fields
 • Streaming column data in and out of Hadoop
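To make "flipped around" concrete, here is a small sketch that turns a row layout into a column layout. The sample records and the helper `toColumns` are assumptions for illustration, and the sketch assumes every record has the same fields.

```javascript
// Row layout: each record keeps all of its fields together.
const rows = [
  { id: 1, author: 'Chris', views: 10 },
  { id: 2, author: 'Fred', views: 25 },
  { id: 3, author: 'Chris', views: 7 },
];

// Column layout: all values of one column are kept together.
function toColumns(records) {
  const columns = {};
  records.forEach((record) => {
    Object.keys(record).forEach((field) => {
      if (!columns[field]) columns[field] = [];
      columns[field].push(record[field]);
    });
  });
  return columns;
}

const columns = toColumns(rows);
// Scanning one column touches only that column's array.
const totalViews = columns.views.reduce((sum, v) => sum + v, 0);
// Array position stands in for the index that maps a record to its column value.
const authorOfSecond = columns.author[1];
```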
Graph Databases
• Stores vertex-to-vertex edges
• Operations:
 • Getting and setting edges
 • Sometimes possible to annotate vertices or edges
• Query languages support finding paths between
  vertices, subject to various constraints
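A rough sketch of path finding over vertex-to-vertex edges, using a breadth-first search with a hop limit as the "constraint". The vertex names and the functions `buildAdjacency` and `findPath` are hypothetical; real graph databases expose this through their own query languages.

```javascript
// Hypothetical graph stored as a list of directed vertex-to-vertex edges.
const edges = [
  ['alice', 'bob'],
  ['bob', 'carol'],
  ['carol', 'dave'],
  ['alice', 'erin'],
];

// Build an adjacency list from the edge list.
function buildAdjacency(edgeList) {
  const adjacency = new Map();
  for (const [from, to] of edgeList) {
    if (!adjacency.has(from)) adjacency.set(from, []);
    adjacency.get(from).push(to);
  }
  return adjacency;
}

// Breadth-first search: returns a path with the fewest edges, or null
// if no path exists within maxHops edges.
function findPath(adjacency, start, goal, maxHops) {
  const queue = [[start]];
  const visited = new Set([start]);
  while (queue.length > 0) {
    const path = queue.shift();
    const last = path[path.length - 1];
    if (last === goal) return path;
    if (path.length - 1 >= maxHops) continue; // hop limit reached
    for (const next of adjacency.get(last) || []) {
      if (!visited.has(next)) {
        visited.add(next);
        queue.push(path.concat(next));
      }
    }
  }
  return null;
}

const graph = buildAdjacency(edges);
const path = findPath(graph, 'alice', 'dave', 3);
const tooFar = findPath(graph, 'alice', 'dave', 2);
```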
Document Stores
• The store is a container for documents
• Documents are made up of named fields
 • Fields may or may not have type definitions
 • e.g. XSDs for XML stores, vs. schema-less JSON stores
• Can create "secondary indexes"
 • These provide the ability to query on any document field(s)
• Operations:
 • Insert and delete documents
 • Update fields within documents
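The idea of a secondary index can be sketched with a toy in-memory store. Everything here (`DocumentStore`, `createIndex`, `find`) is a hypothetical illustration and not how MongoDB is implemented; it covers inserts and lookups only, leaving out deletes and updates to stay short.

```javascript
// Hypothetical in-memory document store with optional secondary indexes.
class DocumentStore {
  constructor() {
    this.docs = new Map();     // _id -> document
    this.indexes = new Map();  // field name -> Map(value -> Set of _id)
    this.nextId = 1;
  }
  insert(doc) {
    const stored = Object.assign({}, doc, { _id: this.nextId++ });
    this.docs.set(stored._id, stored);
    // Keep every existing index up to date.
    for (const [field, index] of this.indexes) {
      this.addToIndex(index, field, stored);
    }
    return stored._id;
  }
  addToIndex(index, field, doc) {
    if (!(field in doc)) return;
    const value = doc[field];
    if (!index.has(value)) index.set(value, new Set());
    index.get(value).add(doc._id);
  }
  createIndex(field) {
    const index = new Map();
    for (const doc of this.docs.values()) this.addToIndex(index, field, doc);
    this.indexes.set(field, index);
  }
  find(field, value) {
    const index = this.indexes.get(field);
    if (index) {
      // Indexed lookup: jump straight to the matching ids.
      const ids = index.get(value) || new Set();
      return Array.from(ids, (id) => this.docs.get(id));
    }
    // No index on this field: scan every document.
    return Array.from(this.docs.values()).filter((doc) => doc[field] === value);
  }
}

const store = new DocumentStore();
store.insert({ author: 'Chris', text: 'About MongoDB...' });
store.insert({ author: 'Fred', text: 'Best Post Ever!' });
store.createIndex('author');
store.insert({ author: 'Chris', text: 'Another post' });
const byAuthor = store.find('author', 'Chris');        // uses the index
const byText = store.find('text', 'Best Post Ever!');  // falls back to a scan
```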
What is mongoDB?
MongoDB is a scalable, high-performance,
open source NoSQL database.

• Document-oriented storage
• Full Index Support
• Replication & High Availability
• Auto-Sharding
• Querying
• Fast In-Place Updates
• Map/Reduce
• GridFS
• Company behind mongoDB
 – (A)GPL license, own copyrights, engineering team
 – support, consulting, commercial license

• Management
 – Google/DoubleClick, Oracle, Apple, NetApp
 – Funding: Sequoia, Union Square, Flybridge
 – Offices in NYC, Palo Alto, London, Dublin
 – 100+ employees
Where can you use it?
MongoDB is implemented in C++
• Platforms: 32/64-bit Windows, Linux, Mac OS X,
  FreeBSD, Solaris

Drivers are available in many languages

10gen supported
• C, C# (.Net), C++, Erlang, Haskell, Java, JavaScript,
  Perl, PHP, Python, Ruby, Scala, Node.JS

Community supported
• Clojure, ColdFusion, F#, Go, Groovy, Lua, R ...
  http://www.mongodb.org/display/DOCS/Drivers
History
• First release - February 2009
• v1.0 - August 2009
• v1.2 - December 2009 - MapReduce, ++
• v1.4 - March 2010 - Concurrency, Geo
• v1.6 - August 2010 - Sharding, Replica Sets
• v1.8 - March 2011 - Journaling, Geosphere
• v2.0 - September 2011 - V1 Indexes, Concurrency
• v2.2 - Soon - Aggregation, Concurrency
Terminology
RDBMS           MongoDB
Table           Collection
Row(s)          JSON Document
Index           Index
Join            Embedding & Linking
Partition       Shard
Partition Key   Shard Key
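The "Join" row is the least obvious mapping, so here is a small sketch of the two alternatives using plain JavaScript objects. The field name `post_id` and the helper `commentsFor` are assumptions for illustration only.

```javascript
// Embedding: comments live inside the post document.
const embeddedPost = {
  _id: 1,
  author: 'Chris',
  text: 'About MongoDB...',
  comments: [{ author: 'Fred', text: 'Best Post Ever!' }],
};

// Linking: comments are separate documents that reference the post by _id.
const posts = [{ _id: 1, author: 'Chris', text: 'About MongoDB...' }];
const comments = [
  { _id: 10, post_id: 1, author: 'Fred', text: 'Best Post Ever!' },
  { _id: 11, post_id: 2, author: 'Ann', text: 'Unrelated' },
];

// With linking, the application resolves the reference with a second lookup.
function commentsFor(postId) {
  return comments.filter((c) => c.post_id === postId);
}

const embeddedCount = embeddedPost.comments.length;     // one read
const linkedCount = commentsFor(posts[0]._id).length;   // post read plus comment lookup
```

Embedding returns everything in one read; linking keeps documents smaller at the cost of an extra lookup done by the application.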
Documents
  Blog Post Document

> p = { author: "Chris",
         date: new ISODate(),
         text: "About MongoDB...",
         tags: ["tech", "databases"]}

> db.posts.save(p)
Querying

> db.posts.find()

   { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
     author : "Chris",
     date : ISODate("2012-02-02T11:52:27.442Z"),
     text : "About MongoDB...",
     tags : [ "tech", "databases" ] }

Notes:
     _id is unique, but can be anything you'd like
Introducing BSON
JSON has a powerful but limited set of datatypes
 • arrays, objects, strings, numbers, booleans and null

BSON is a binary representation of JSON
 • Adds extra datatypes such as Date, Int types, Id, …
 • Optimized for performance and navigational abilities
 • And compression

MongoDB sends and stores data in BSON
 • bsonspec.org
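To show what "binary representation" means, here is a minimal encoder sketch for Node.js that handles only two BSON element types, UTF-8 strings (type 0x02) and 32-bit integers (type 0x10). The function names are hypothetical and this is not a driver's encoder; the layout follows the published BSON specification: a little-endian int32 total size, the elements, then a terminating zero byte.

```javascript
// NUL-terminated UTF-8 string, used for element names and string bodies.
function cstring(s) {
  return Buffer.concat([Buffer.from(s, 'utf8'), Buffer.from([0x00])]);
}

// Little-endian 32-bit integer.
function int32(n) {
  const b = Buffer.alloc(4);
  b.writeInt32LE(n, 0);
  return b;
}

// Encode one element: type byte, element name, then the value.
function encodeElement(name, value) {
  if (typeof value === 'string') {
    const bytes = cstring(value); // string length includes the trailing NUL
    return Buffer.concat([Buffer.from([0x02]), cstring(name), int32(bytes.length), bytes]);
  }
  if (Number.isInteger(value) && value >= -2147483648 && value <= 2147483647) {
    return Buffer.concat([Buffer.from([0x10]), cstring(name), int32(value)]);
  }
  throw new Error('unsupported type in this sketch');
}

// Encode a flat document: total size, elements, terminating zero byte.
function encodeDocument(doc) {
  const body = Buffer.concat(Object.keys(doc).map((k) => encodeElement(k, doc[k])));
  const total = 4 + body.length + 1;
  return Buffer.concat([int32(total), body, Buffer.from([0x00])]);
}

const encoded = encodeDocument({ hello: 'world' });
const hex = encoded.toString('hex');
// hex is "160000000268656c6c6f0006000000776f726c640000" (22 bytes)
```

The length prefixes are what make BSON quick to traverse: a reader can skip a string or a whole sub-document without parsing its contents.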
Secondary Indexes
Create an index on any field in a document

//   1 means ascending, -1 means descending
 > db.posts.ensureIndex({author: 1})

> db.posts.findOne({author: 'Chris'})

  { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
    author: "Chris", ... }
Compound Indexes
Create an index on multiple fields in a document

//   1 means ascending, -1 means descending
 > db.posts.ensureIndex({author: 1, ts: -1})

> db.posts.find({author: 'Chris'}).sort({ts: -1})

  [{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
     author: "Chris", ...},
   { _id : ObjectId("4f61d325c496820ceba84124"),
     author: "Chris", ...}]
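The ordering that a compound key such as `{author: 1, ts: -1}` implies can be sketched with a comparator in plain JavaScript. The helper `compareBy` and the numeric `ts` values are assumptions for illustration; a real index keeps entries in this order in a B-tree rather than sorting on demand.

```javascript
// Build a comparator from an index-style key specification,
// where 1 means ascending and -1 means descending.
function compareBy(spec) {
  const fields = Object.keys(spec);
  return (a, b) => {
    for (const field of fields) {
      if (a[field] < b[field]) return -1 * spec[field];
      if (a[field] > b[field]) return 1 * spec[field];
    }
    return 0; // equal on every indexed field
  };
}

const entries = [
  { author: 'Fred', ts: 3 },
  { author: 'Chris', ts: 1 },
  { author: 'Chris', ts: 5 },
];

// Author ascending first; within one author, newest ts first.
const ordered = entries.slice().sort(compareBy({ author: 1, ts: -1 }));
const orderedKeys = ordered.map((e) => e.author + ':' + e.ts);
```

Because entries for one author sit next to each other, already sorted by `ts` descending, a query on `author` with a sort on `ts` can read them in order without a separate sort step.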
Query Operators
Conditional Operators
- $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $size, $type
- $lt, $lte, $gt, $gte

// find posts with any tags
> db.posts.find({tags: {$exists: true }})

// find posts matching a regular expression
> db.posts.find({author: /^ro*/i })

// count posts by author
> db.posts.find({author: 'Chris'}).count()
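As a rough illustration of how such query documents are evaluated, here is a toy matcher for a handful of the operators listed above, run against in-memory objects. The `matches` function and the `votes` field are hypothetical, and MongoDB's real semantics (for example matching inside arrays) are richer than this sketch.

```javascript
// Each operator is a predicate over (field value, operand).
const operators = {
  $exists: (value, expected) => (value !== undefined) === expected,
  $ne: (value, operand) => value !== operand,
  $in: (value, list) => list.includes(value),
  $lt: (value, operand) => value < operand,
  $gte: (value, operand) => value >= operand,
};

// A document matches when every field condition in the query holds.
function matches(doc, query) {
  return Object.keys(query).every((field) => {
    const condition = query[field];
    const value = doc[field];
    if (condition instanceof RegExp) {
      return typeof value === 'string' && condition.test(value);
    }
    if (condition !== null && typeof condition === 'object') {
      return Object.keys(condition).every((op) => {
        if (!operators[op]) throw new Error('unsupported operator: ' + op);
        return operators[op](value, condition[op]);
      });
    }
    return value === condition; // plain equality
  });
}

const posts = [
  { author: 'Chris', tags: ['tech'], votes: 3 },
  { author: 'Ross', votes: 7 },
  { author: 'Fred', tags: [], votes: 1 },
];

const withTags = posts.filter((p) => matches(p, { tags: { $exists: true } }));
const byRegex = posts.filter((p) => matches(p, { author: /^ro*/i }));
const popular = posts.filter((p) => matches(p, { votes: { $gte: 3, $lt: 10 } }));
```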
Examine the query plan
> db.posts.find({"author": 'Chris'}).explain()
{
	    "cursor" : "BtreeCursor author_1",
	    "nscanned" : 1,
	    "nscannedObjects" : 1,
	    "n" : 1,
	    "millis" : 0,
	    "indexBounds" : {
	    	   "author" : [
	    	   	    [
	    	   	    	   "Chris",
	    	   	    	   "Chris"
	    	   	    ]
	    	   ]
	    }
}
Atomic Operations
  $set, $unset, $inc, $push, $pushAll, $pull, $pullAll, $bit

// Create a comment
> new_comment = { author: "Fred",
                      date: new Date(),
                      text: "Best Post Ever!"}

// Add to post
> db.posts.update({ _id: "..." },
                  {"$push": {comments: new_comment},
                   "$inc": {comment_count: 1}
                  });
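What the modifiers do to a document can be sketched on a plain object. The function `applyUpdate` is a hypothetical stand-in covering only `$set`, `$unset`, `$inc` and `$push`; in MongoDB these run on the server as a single atomic change to one document, which a client-side sketch like this cannot reproduce.

```javascript
// Apply a small subset of update modifiers to a plain object, in place.
function applyUpdate(doc, update) {
  for (const [field, value] of Object.entries(update.$set || {})) {
    doc[field] = value;
  }
  for (const field of Object.keys(update.$unset || {})) {
    delete doc[field];
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    doc[field] = (doc[field] || 0) + amount; // a missing field starts at 0
  }
  for (const [field, item] of Object.entries(update.$push || {})) {
    if (doc[field] === undefined) doc[field] = []; // a missing field becomes an array
    doc[field].push(item);
  }
  return doc;
}

const post = { author: 'Chris', text: 'About MongoDB...' };
const newComment = { author: 'Fred', text: 'Best Post Ever!' };

// One update both appends the comment and bumps the counter.
applyUpdate(post, { $push: { comments: newComment }, $inc: { comment_count: 1 } });
applyUpdate(post, {
  $push: { comments: { author: 'Ann', text: 'Agreed' } },
  $inc: { comment_count: 1 },
});
```

Sending the modifier instead of the whole document means the client never has to read, change and write back the post, which is where lost updates usually come from.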
Nested Documents
    { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
      author : "Chris",
      date : "Thu Feb 02 2012 11:50:01",
      text : "About MongoDB...",
      tags : [ "tech", "databases" ],
      comments : [{
          author : "Fred",
          date : "Fri Feb 03 2012 13:23:11",
          text : "Best Post Ever!"
      }],
      comment_count : 1
    }
Secondary Indexes
// Index nested documents
> db.posts.ensureIndex({"comments.author": 1})
> db.posts.find({"comments.author": "Fred"})

// Index on tags (multi-key index)
> db.posts.ensureIndex({tags: 1})
> db.posts.find({tags: "tech"})
Geo
  • Geo-spatial queries
   • Require a geo index
   • Find points near a given point
   • Find points within a polygon/sphere


// geospatial index
> db.posts.ensureIndex({"author.location": "2d"})
> db.posts.find({"author.location":
                 { $near : [22, 42] }})
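The result ordering of a "near" query can be sketched without an index by sorting on planar distance. The functions `distance` and `near` and the sample posts are hypothetical; a real geo index avoids scanning every document, which this sketch does not attempt.

```javascript
// Planar distance between two [x, y] points, as on a flat 2d index.
function distance(a, b) {
  return Math.hypot(a[0] - b[0], a[1] - b[1]);
}

// Return up to `limit` documents ordered from nearest to farthest.
function near(docs, getPoint, target, limit) {
  return docs
    .map((doc) => ({ doc, d: distance(getPoint(doc), target) }))
    .sort((a, b) => a.d - b.d)
    .slice(0, limit)
    .map((entry) => entry.doc);
}

const posts = [
  { title: 'far', author: { location: [80, 10] } },
  { title: 'close', author: { location: [23, 41] } },
  { title: 'exact', author: { location: [22, 42] } },
];

const nearest = near(posts, (p) => p.author.location, [22, 42], 2);
const titles = nearest.map((p) => p.title);
```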
Map Reduce
    The caller provides map and reduce functions written
    in JavaScript
// Emit each tag
> map = function() {
    this.tags.forEach(
      function(item) { emit(item, 1); }
    );
  };

// Calculate totals
> reduce = function(key, values) {
    var total = 0;
    var valuesSize = values.length;
    for (var i = 0; i < valuesSize; i++) {
      total += parseInt(values[i], 10);
    }
    return total;
  };
Map Reduce
// run the map reduce
> db.posts.mapReduce(map, reduce, {"out": { inline : 1}});
{
	    "results" : [
	    	    {"_id" : "databases", "value" : 1},
	    	    {"_id" : "tech", "value" : 1 }
	    ],
	    "timeMillis" : 1,
	    "counts" : {
	    	    "input" : 1,
	    	    "emit" : 2,
	    	    "reduce" : 0,
	    	    "output" : 2
	    },
	    "ok" : 1
}
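The flow of a map/reduce run can be imitated in plain JavaScript to see where the counts come from. This `mapReduce` helper is a hypothetical stand-in, not the server's implementation: it passes `emit` as an argument (in the shell it is a global), and it assumes that reduce is only called for keys with more than one emitted value, which is consistent with the `"reduce" : 0` count in the output above.

```javascript
// Run map over every document, group emitted values by key, then reduce.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // As in the shell, `this` inside map is the current document.
  docs.forEach((doc) => map.call(doc, emit));
  return Array.from(groups.keys()).sort().map((key) => {
    const values = groups.get(key);
    // A key with a single value is passed through without calling reduce.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    return { _id: key, value };
  });
}

const posts = [
  { tags: ['tech', 'databases'] },
  { tags: ['tech'] },
];

function map(emit) {
  this.tags.forEach((tag) => emit(tag, 1));
}

function reduce(key, values) {
  return values.reduce((total, v) => total + v, 0);
}

const results = mapReduce(posts, map, reduce);
```

With the two sample posts, `tech` is emitted twice and reduced to 2, while `databases` is emitted once and passed through as 1.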
Aggregation - coming in 2.2
// Count tags
> agg = db.posts.aggregate(
    {$unwind: "$tags"},
    {$group : {_id : "$tags",
               count : {$sum: 1}}}
  )

> agg.result
  [{"_id": "databases", "count": 1},
   {"_id": "tech", "count": 1}]
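The two pipeline stages can be approximated on in-memory arrays to show what each one does. The helpers `unwind` and `groupCount` are hypothetical and cover only this tag-counting case, not the general `$unwind` and `$group` operators.

```javascript
// $unwind-like step: one output document per array element.
function unwind(docs, field) {
  const out = [];
  docs.forEach((doc) => {
    (doc[field] || []).forEach((item) => {
      out.push(Object.assign({}, doc, { [field]: item }));
    });
  });
  return out;
}

// $group-like step with a {$sum: 1} accumulator: count documents per key.
function groupCount(docs, field) {
  const counts = new Map();
  docs.forEach((doc) => {
    counts.set(doc[field], (counts.get(doc[field]) || 0) + 1);
  });
  return Array.from(counts, ([key, count]) => ({ _id: key, count }));
}

const posts = [
  { author: 'Chris', tags: ['tech', 'databases'] },
  { author: 'Fred', tags: ['tech'] },
];

const unwound = unwind(posts, 'tags');      // three documents, one tag each
const result = groupCount(unwound, 'tags');
```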
GridFS
 Save files in mongoDB
 Stream data back to the client

# (Python) Create a new instance of GridFS; db is an existing pymongo database
>>> import gridfs
>>> fs = gridfs.GridFS(db)

# Save file to mongo (open in binary mode)
>>> my_image = open('my_image.jpg', 'rb')
>>> file_id = fs.put(my_image)

# Read file
>>> fs.get(file_id).read()
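Under the hood, GridFS stores a file as a series of chunk documents that reference the file by id. The sketch below imitates that split and reassembly in Node.js; the helper names and the 5-byte chunk size are assumptions chosen for illustration, and only the `files_id` and `n` field names follow the GridFS chunk layout.

```javascript
// Split a buffer into numbered chunk documents.
function toChunks(fileId, data, chunkSize) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < data.length; offset += chunkSize, n++) {
    chunks.push({
      files_id: fileId,
      n, // chunk sequence number
      data: data.subarray(offset, offset + chunkSize),
    });
  }
  return chunks;
}

// Reassemble the file by ordering chunks on n and concatenating them.
function fromChunks(chunks) {
  const ordered = chunks.slice().sort((a, b) => a.n - b.n);
  return Buffer.concat(ordered.map((c) => c.data));
}

const original = Buffer.from('hello gridfs', 'utf8'); // 12 bytes
const chunks = toChunks('file-1', original, 5);       // chunks of 5, 5 and 2 bytes
// Chunks may come back in any order; n restores the sequence.
const restored = fromChunks(chunks.slice().reverse());
```

Because each chunk is an ordinary document, a client can stream a file back piece by piece instead of loading it whole.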
Rich Documents

• Intuitive
• Developer friendly
• Encapsulates whole objects
• Performant
• Scalable
Rich Documents
{   _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
    line_items : [ { sku: 'tt-123',
                     name: 'Coltrane: Impressions' },
                   { sku: 'tt-457',
                     name: 'Davis: Kind of Blue' } ],
    address : { name: 'Banker',
                street: '111 Main',
                zip: 10010 },
    payment: { cc: 4567,
               exp: new Date(2012, 7, 7) },
    subtotal: 2355
}
Deployment

(P = primary, S = secondary)

• Single server
 - Needs a strong backup plan
 - Topology: P
• Replica sets
 - High availability
 - Automatic failover
 - Topology: P S S
• Sharded
 - Horizontal scaling
 - Auto balancing
 - Topology: one replica set (P S S) per shard
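How a sharded deployment decides where a document lives can be sketched as a lookup over shard-key ranges. The ranges, shard names and the `routeTo` function are all made up for illustration; in a real cluster the range boundaries are maintained and rebalanced automatically.

```javascript
// Hypothetical ranges over the shard key "author"; boundaries and shard names are invented.
const chunkRanges = [
  { min: '', max: 'h', shard: 'shard0' },    // keys below "h"
  { min: 'h', max: 'q', shard: 'shard1' },   // "h" up to, not including, "q"
  { min: 'q', max: null, shard: 'shard2' },  // null means no upper bound
];

// Find the shard whose range contains the given shard-key value.
function routeTo(ranges, key) {
  const match = ranges.find(
    (r) => key >= r.min && (r.max === null || key < r.max)
  );
  if (!match) throw new Error('no range covers key: ' + key);
  return match.shard;
}

const chrisShard = routeTo(chunkRanges, 'chris');
const maxShard = routeTo(chunkRanges, 'max');
const rossShard = routeTo(chunkRanges, 'ross');
```

A query that includes the shard key can be sent to one shard; a query without it has to be sent to every shard and the results merged.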
MongoDB Use Cases
• Archiving
• Content Management
• Ecommerce
• Finance
• Gaming
• Government
• Metadata Storage
• News & Media
• Online Advertising
• Online Collaboration
• Real-time stats/analytics
• Social Networks
• Telecommunications
In Production
download at mongodb.org

conferences, appearances, and meetups: http://www.10gen.com/events

Facebook: http://bit.ly/mongofb
Twitter: @mongodb
LinkedIn: http://linkd.in/joinmongo


  support, training, and this talk brought to you by

Editor's Notes

  • #5
    * memcache, redis, membase
    * mongodb, couch
    * cassandra, riak
    * neo4j, flockdb
  • #10
    * JSON-style documents with dynamic schemas offer simplicity and power.
    * Index on any attribute, just like you're used to.
    * Mirror across LANs and WANs for scale and peace of mind.
    * Scale horizontally without compromising functionality.
    * Rich, document-based queries.
    * Atomic modifiers for contention-free performance.
    * Flexible aggregation and data processing.
    * Store files of any size without complicating your stack.
  • #14
    * No joins for scalability - doing joins across shards in SQL is highly inefficient and difficult to perform.
    * MongoDB is geared for easy scaling - going from a single node to a distributed cluster is easy.
    * Little or no application code changes are needed to scale from a single node to a sharded cluster.
  • #30
    * If a document is always presented as a whole, a single doc gives performance benefits
    * A single doc is not a panacea, as we'll see
  • #31
    * Summarise using mongodb
  • #33
    * Single Master..