MongoDB Strange Loop 2009
open-source, high-performance, schema-free, document-oriented database
RDBMS

• Great for many applications
• Shortcomings
 • Scalability
 • Flexibility
CAP theorem

• Consistency
• Availability
• Tolerance to network Partitions
• Pick two

       http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
ACID vs BASE

• ACID
 • Atomicity
 • Consistency
 • Isolation
 • Durability
• BASE
 • Basically Available
 • Soft state
 • Eventually consistent
Compared to key-value stores
Compared to CouchDB
JSON-style documents
Schema-free

• Loosening constraints - added flexibility
• Dynamically typed languages
• Migrations
Dynamic queries

• Administration
• Ease of development
• Familiarity
Focus on performance
Replication
Auto-sharding
Many supported platforms / languages
Good at

• The web
• Caching
• High volume data
• Scalability
Less good at

• Highly transactional
• Ad-hoc business intelligence
• Problems that require SQL
MongoDB Basics
Document


• Unit of storage (think row)
• BSON (Binary JSON)
Collection

• Schema-free equivalent of a table
• Logical groups of documents
• Indexes are per-collection
_id

• Special key
• Present in all documents
• Unique across a Collection
• Any type you want
Blog back-end
Post

{author: "mike",
 date: new Date(),
 text: "my blog post...",
 tags: ["mongodb", "strange", "loop"]}
Comment


{author: "eliot",
 date: new Date(),
 text: "great post!"}
New post

post = {author: "mike",
  date: new Date(),
  text: "my blog post...",
  tags: ["mongodb", "strange", "loop"]}

db.posts.save(post)
Embedding a comment

c = {author: "eliot",
  date: new Date(),
  text: "great post!"}

db.posts.update({_id: post._id},
                {$push: {comments: c}})
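
What the update does to the stored document can be sketched in plain JavaScript, with no server involved. The `pushField` helper below is hypothetical and only imitates the array-append behavior of `$push`; it is not MongoDB code.

```javascript
// Hypothetical stand-in for {$push: {field: value}}: append to an array field,
// creating the array first if the document does not have the field yet.
function pushField(doc, field, value) {
  if (doc[field] === undefined) {
    doc[field] = [];
  } else if (!Array.isArray(doc[field])) {
    throw new Error("cannot push to a non-array field: " + field);
  }
  doc[field].push(value);
  return doc;
}

const post = {author: "mike", text: "my blog post...", tags: ["mongodb", "strange", "loop"]};
const c = {author: "eliot", text: "great post!"};

pushField(post, "comments", c);
console.log(JSON.stringify(post.comments));
```

The comment ends up inside the post document itself, so a single read of the post returns its comments as well.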
Posts by author


db.posts.find({author: "mike"})
Last 10 posts

db.posts.find()
        .sort({date: -1})
        .limit(10)
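
The effect of sorting by date descending and then limiting can be imitated on a plain array. This is only a sketch: the `posts` array and its `n` field are made up for illustration, and nothing here talks to a database.

```javascript
// Stand-in collection: 25 posts, one per day starting 1 October 2009 (UTC).
const posts = Array.from({length: 25}, (_, i) => ({
  n: i,
  date: new Date(Date.UTC(2009, 9, 1 + i))
}));

// Rough equivalent of find().sort({date: -1}).limit(10) on an array:
// copy the array, order it newest first, keep the first ten entries.
const latest = posts
  .slice()
  .sort((a, b) => b.date - a.date)
  .slice(0, 10);

console.log(latest[0].n, latest[9].n); // 24 15
```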
Posts in the last week


last_week = new Date(2009, 9, 9)

db.posts.find({date: {$gt: last_week}})
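
The slide hard-codes the cutoff date. Note that JavaScript months are zero-based, so `new Date(2009, 9, 9)` is 9 October 2009. A cutoff relative to the current time could be computed along these lines; `daysAgo` is a hypothetical helper, not part of the shell.

```javascript
// Hypothetical helper: the Date that lies `days` days before `now`.
function daysAgo(days, now = new Date()) {
  const MS_PER_DAY = 24 * 60 * 60 * 1000;
  return new Date(now.getTime() - days * MS_PER_DAY);
}

// JavaScript months are zero-based, so month 9 is October.
const fixed = new Date(2009, 9, 9);
console.log(fixed.getMonth()); // 9

const last_week = daysAgo(7);
// In the mongo shell this value would be used exactly as on the slide:
// db.posts.find({date: {$gt: last_week}})
```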
Posts ending with 'Loop'


db.posts.find({text: /loop$/i})
Posts with a tag
db.posts.find({tags: "mongodb"})

          ... and fast
db.posts.ensureIndex({tags: 1})
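
The post documents store their tags in an array field named `tags`. Matching a single value against an array field succeeds when any element equals that value. The plain JavaScript below imitates that rule; `hasTag` is a hypothetical helper and the sample posts are made up.

```javascript
const posts = [
  {author: "mike", tags: ["mongodb", "strange", "loop"]},
  {author: "mike", tags: ["strange", "loop"]},
  {author: "eliot"} // no tags field at all; a schema-free collection allows this
];

// Hypothetical helper imitating a {tags: tag} query on one document:
// true when the tags array exists and contains the tag.
function hasTag(post, tag) {
  return Array.isArray(post.tags) && post.tags.includes(tag);
}

const tagged = posts.filter((p) => hasTag(p, "mongodb"));
console.log(tagged.length); // 1
```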
Counting posts


db.posts.count()

db.posts.find({author: "mike"}).count()
Basic paging

page = 2
page_size = 15

db.posts.find().limit(page_size)
               .skip(page * page_size)
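
The arithmetic above treats `page` as zero-based: with `page = 2` and `page_size = 15`, the query skips 30 documents and returns the third page. A sketch of the same arithmetic on a plain array, where `getPage` and the sample data are hypothetical:

```javascript
// Stand-in for a collection: 50 posts numbered 0..49.
const posts = Array.from({length: 50}, (_, i) => ({n: i}));

// Hypothetical helper imitating find().skip(page * page_size).limit(page_size)
// on a plain array. `page` is zero-based, as the slide's arithmetic implies.
function getPage(items, page, page_size) {
  const start = page * page_size;
  return items.slice(start, start + page_size);
}

const page = 2;
const page_size = 15;
const result = getPage(posts, page, page_size);
console.log(result[0].n, result[result.length - 1].n); // 30 44
```

If pages are numbered from 1 in the user interface, the skip would be `(page - 1) * page_size` instead.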
Migration: adding titles
  • Easy - just start adding them:
post = {author: "mike",
        date: new Date(),
        text: "another blog post...",
        tags: ["strange", "loop"],
        title: "Review from Strange Loop"}

post_id = db.posts.save(post)
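
Older posts saved before this change have no `title` key, so code that reads posts has to tolerate its absence. A minimal sketch, assuming the application prefers a placeholder; `displayTitle` and the "(untitled)" text are hypothetical:

```javascript
const oldPost = {author: "mike", text: "my blog post...",
                 tags: ["mongodb", "strange", "loop"]};
const newPost = {author: "mike", text: "another blog post...",
                 tags: ["strange", "loop"],
                 title: "Review from Strange Loop"};

// Hypothetical helper: fall back to a placeholder when the title key is absent.
function displayTitle(post) {
  return post.title !== undefined ? post.title : "(untitled)";
}

console.log(displayTitle(oldPost)); // (untitled)
console.log(displayTitle(newPost)); // Review from Strange Loop
```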
Advanced queries


    • $gt, $lt, $gte, $lte, $ne, $all, $in, $nin
    • $where
db.posts.find({$where: "this.author == 'mike'"})
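
To give a feel for what the listed operators test, here is a toy evaluator in plain JavaScript. It is a sketch under heavy assumptions: `operators` and `matches` are hypothetical, the `votes` field is made up, and the real query engine handles many cases (arrays, types, missing fields) that this ignores.

```javascript
// Hypothetical evaluator for some of the listed operators, applied to one field value.
const operators = {
  $gt:  (v, arg) => v > arg,
  $gte: (v, arg) => v >= arg,
  $lt:  (v, arg) => v < arg,
  $lte: (v, arg) => v <= arg,
  $ne:  (v, arg) => v !== arg,
  $in:  (v, arg) => arg.includes(v),
  $nin: (v, arg) => !arg.includes(v),
  $all: (v, arg) => Array.isArray(v) && arg.every((x) => v.includes(x))
};

// True when every {field: {$op: arg}} condition in `query` holds for `doc`.
function matches(doc, query) {
  return Object.entries(query).every(([field, conds]) =>
    Object.entries(conds).every(([op, arg]) => operators[op](doc[field], arg)));
}

const post = {author: "mike", votes: 7, tags: ["mongodb", "strange", "loop"]};
console.log(matches(post, {votes: {$gte: 5, $lt: 10}}));           // true
console.log(matches(post, {author: {$in: ["mike", "eliot"]}}));    // true
console.log(matches(post, {tags: {$all: ["mongodb", "python"]}})); // false
```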
Sharding
Terminology
• Shard key
• Chunk
 • Range of the value space
 • (collection, key, min_val, max_val)
• Shard
 • Single node (or replica pair)
 • Responsible for set of chunks
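
How these terms fit together can be sketched as a lookup from a shard key value to the chunk whose range contains it, and from that chunk to its shard. Everything named here is hypothetical (the `chunks` table, `shardFor`, the shard names, the choice of `author` as shard key), and the sketch assumes `min_val` is inclusive and `max_val` is exclusive.

```javascript
// Each chunk is (collection, key, min_val, max_val) plus the shard that owns it.
// Assumption: min_val is inclusive, max_val is exclusive, null means no upper bound.
const chunks = [
  {collection: "posts", key: "author", min_val: "",  max_val: "h",  shard: "shard0"},
  {collection: "posts", key: "author", min_val: "h", max_val: "q",  shard: "shard1"},
  {collection: "posts", key: "author", min_val: "q", max_val: null, shard: "shard0"}
];

// Hypothetical router: find the shard whose chunk range contains the shard key value.
function shardFor(collection, doc) {
  const chunk = chunks.find((c) =>
    c.collection === collection &&
    doc[c.key] >= c.min_val &&
    (c.max_val === null || doc[c.key] < c.max_val));
  return chunk ? chunk.shard : undefined;
}

console.log(shardFor("posts", {author: "eliot"})); // shard0
console.log(shardFor("posts", {author: "mike"}));  // shard1
```

In this sketch shard0 owns two chunks, which illustrates a shard being responsible for a set of chunks rather than a single range.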
Other cool stuff

• Aggregation and map reduce
• Capped collections
• Unique indexes
• Mongo shell
• GridFS
• Download MongoDB
  http://www.mongodb.org

• Try it out
• Let us know what you think!
• http://www.mongodb.org
• irc.freenode.net#mongodb
• mongodb-user on google groups
• @mongodb, @mdirolf
• mike@10gen.com
• http://www.slideshare.net/mdirolf


Editor's Notes

  • #2 Mike Dirolf, 10gen. 10gen sponsors MongoDB. Thanks. Plan: a little about what makes MongoDB interesting; some flavor for the API / querying possibilities; sharding; questions.
  • #5 C: the client perceives that a set of operations has occurred all at once. A: every operation must terminate in an intended response. P: operations will complete even if individual components are unavailable. Horizontal scaling necessitates P, so we are forced to choose C or A.
  • #6 ACID: a transaction is all or none (atomicity); state is consistent at the beginning and end of a transaction (consistency); a transaction behaves as if it were the only operation (isolation); upon completion, an operation will not be reversed (durability). BASE: sacrifice consistency for availability.
  • #9 More complex than just key-value: secondary indexes, embedded documents
  • #11 Compare to Couch
  • #12 No separate caching layer
  • #13 Master-slave; replica pairs for failover
  • #14 Infinite scalability
  • #25 Collection (logical groupings of documents); indexes are per-collection
  • #37 Order-preserving partitioning; split; migrate
  • #38 Process diagram; global vs. targeted operations; config servers use two-phase commit
  • #39 Server layout
  • #41 blog post, twitter