Inside MongoDB: the Internals of an Open-Source Database
Mike Dirolf - @mdirolf - 10gen, Inc.
http://www.mongodb.org
Inside
    (photo: http://www.flickr.com/photos/tmh9/677919415/)
A word of warning
    this talk might be a bit “hard”, but MongoDB is really easy: http://try.mongodb.org
db.test.insert({hello: "world"})
_id
    if not specified, drivers will add a default:
    ObjectId("4bface1a2231316e04f3c434")
    4bface1a    223131       6e04         f3c434
    timestamp   machine id   process id   counter
    (4 bytes)   (3 bytes)    (2 bytes)    (3 bytes)
http://www.mongodb.org/display/DOCS/Object+IDs
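To make the four fields concrete, here is a small sketch that splits the slide's example ObjectId into its parts. It assumes the classic 12-byte layout shown on the slide (big-endian 4-byte timestamp in seconds, 3-byte machine id, 2-byte process id, 3-byte counter); `parseObjectId` is a hypothetical helper, not a driver API.

```javascript
// Split a 24-hex-character ObjectId into the fields shown on the slide.
// Assumed layout: 4-byte timestamp (seconds since the epoch, big-endian),
// 3-byte machine id, 2-byte process id, 3-byte counter.
function parseObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) throw new Error("expected 24 hex characters");
  const b = Buffer.from(hex, "hex");
  return {
    timestamp: b.readUInt32BE(0),   // bytes 0-3
    machineId: b.readUIntBE(4, 3),  // bytes 4-6
    processId: b.readUInt16BE(7),   // bytes 7-8
    counter: b.readUIntBE(9, 3),    // bytes 9-11
  };
}

const id = parseObjectId("4bface1a2231316e04f3c434");
console.log(id, new Date(id.timestamp * 1000).toISOString());
```

Because the timestamp leads, ObjectIds generated later tend to sort after earlier ones, which is one reason this layout makes a reasonable default `_id`.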
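BSON Encoding
    {_id: ObjectId(XXXXXXXXXXXX), hello: "world"}

    \x27\x00\x00\x00              total document length (39 bytes)
    \x07 _id\x00 XXXXXXXXXXXX     type 0x07 (ObjectId), key "_id", 12-byte value
    \x02 hello\x00                type 0x02 (string), key "hello"
    \x06\x00\x00\x00 world\x00    string length (6, including the trailing \x00), value
    \x00                          end of document
http://bsonspec.org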
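The byte layout above can be reproduced with a deliberately tiny encoder. This is a sketch, not a BSON library: it handles only strings (type 0x02) and ObjectIds (type 0x07), and it assumes, as a convention invented for this example, that a 12-byte `Buffer` stands for an ObjectId. `encodeDocument` and `encodeElement` are hypothetical names.

```javascript
// Minimal BSON encoder sketch: strings (0x02) and ObjectIds (0x07) only.
// Assumption for this sketch: a 12-byte Buffer represents an ObjectId.
function cstring(s) {
  return Buffer.concat([Buffer.from(s, "utf8"), Buffer.from([0x00])]);
}

function encodeElement(key, value) {
  if (typeof value === "string") {
    const str = cstring(value);          // UTF-8 bytes plus trailing NUL
    const len = Buffer.alloc(4);
    len.writeInt32LE(str.length, 0);     // the int32 length counts the NUL
    return Buffer.concat([Buffer.from([0x02]), cstring(key), len, str]);
  }
  if (Buffer.isBuffer(value) && value.length === 12) {
    return Buffer.concat([Buffer.from([0x07]), cstring(key), value]);
  }
  throw new TypeError("unsupported value type in this sketch: " + key);
}

// Takes [key, value] pairs so the field order is explicit.
function encodeDocument(pairs) {
  const body = Buffer.concat(pairs.map(([k, v]) => encodeElement(k, v)));
  const total = Buffer.alloc(4);
  total.writeInt32LE(4 + body.length + 1, 0); // prefix counts itself and the final NUL
  return Buffer.concat([total, body, Buffer.from([0x00])]);
}

const oid = Buffer.from("4bface1a2231316e04f3c434", "hex");
const doc = encodeDocument([["_id", oid], ["hello", "world"]]);
console.log(doc.length, doc.toString("hex"));
```

The length prefix is what lets a reader skip an entire document (or an embedded one) without parsing its fields.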
Insert Message (TCP/IP)
    message length     request id         response id        op code (insert)
    \x44\x00\x00\x00   \xXX\xXX\xXX\xXX   \x00\x00\x00\x00   \xd2\x07\x00\x00

    reserved           collection name    document(s)
    \x00\x00\x00\x00   foo.test\x00       BSON Data

    for the 39-byte {_id, hello} document: 16-byte header + 4 reserved + 9 for "foo.test\x00" + 39 = 68 bytes (0x44)
http://www.mongodb.org/display/DOCS/Mongo+Wire+Protocol
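A sketch of assembling that message, assuming a zero reserved field, a caller-chosen request id, and the 39-byte BSON document from the previous slide (with a concrete `_id`). `buildInsertMessage` is a hypothetical helper; opcode 2002 (0x07d2) is the legacy OP_INSERT code shown on the slide.

```javascript
// Sketch of a legacy OP_INSERT message (opcode 2002) laid out as on the slide.
// Assumptions: reserved field = 0, requestId chosen by the caller.
const OP_INSERT = 2002; // 0x000007d2, little-endian on the wire: \xd2\x07\x00\x00

function buildInsertMessage(requestId, collection, bsonDocs) {
  const reserved = Buffer.alloc(4); // int32 zero
  const name = Buffer.concat([Buffer.from(collection, "utf8"), Buffer.from([0x00])]);
  const body = Buffer.concat([reserved, name, ...bsonDocs]);
  const header = Buffer.alloc(16);
  header.writeInt32LE(16 + body.length, 0); // message length includes the header itself
  header.writeInt32LE(requestId, 4);        // request id
  header.writeInt32LE(0, 8);                // response id: 0 for client requests
  header.writeInt32LE(OP_INSERT, 12);       // op code
  return Buffer.concat([header, body]);
}

// {_id: ObjectId("4bface1a2231316e04f3c434"), hello: "world"} as BSON (39 bytes)
const doc = Buffer.from(
  "27000000075f6964004bface1a2231316e04f3c434" +
  "0268656c6c6f0006000000776f726c640000", "hex");
const msg = buildInsertMessage(1, "foo.test", [doc]);
console.log(msg.length, msg.subarray(0, 16).toString("hex"));
```

All integers are little-endian, which is why the op code 2002 appears as `\xd2\x07\x00\x00`.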
Data File Allocation
    $ ls -sk /data/db/
       16384  foo.ns
       65536  foo.0
      131072  foo.1
         ...
       16384  bar.ns
         ...
    files are allocated per database; each data file doubles in size (up to 2 GB)
    (sizes are in KB: foo.0 is 64 MB, foo.1 is 128 MB)
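The doubling rule can be sketched in a few lines. The starting size is taken from the listing above (foo.0 at 65536 KB, i.e. 64 MB) and the 2 GB cap from the slide; `dataFileSizesKB` is a hypothetical helper that only reproduces the sequence, not the server's allocation code.

```javascript
// Sketch of the data-file growth pattern, in KB as `ls -sk` reports it.
// Assumptions: foo.0 starts at 64 MB (as in the listing) and each following
// file doubles until the 2 GB cap is reached.
const FIRST_FILE_KB = 65536;          // 64 MB
const MAX_FILE_KB = 2 * 1024 * 1024;  // 2 GB

function dataFileSizesKB(count) {
  const sizes = [];
  let size = FIRST_FILE_KB;
  for (let i = 0; i < count; i++) {
    sizes.push(size);                          // size of foo.<i>
    size = Math.min(size * 2, MAX_FILE_KB);    // double, but never past the cap
  }
  return sizes;
}

console.log(dataFileSizesKB(7));
```

Doubling keeps small databases small on disk while letting large ones grow with few file creations.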
Memory Management
Extent Allocation
    [diagram: data files foo.0, foo.1, foo.2 divided into extents]
    extents allocated per namespace: foo.test, foo.bar, foo.baz, foo.$freelist
    unused space in the files is preallocated
    ns details stored in foo.ns
Record Allocation
    ...
    Header (Size, Offset, Next, Prev)
    BSON Data
    Padding

    Deleted Record (Size, Offset, Next)
    ...
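A toy allocator helps show how deleted records and padding fit together. Everything here is hypothetical: a first-fit walk over a singly linked list of deleted records `{size, offset, next}`, a 16-byte header assumed for Size/Offset/Next/Prev, and a caller-supplied padding factor. It illustrates the idea on the slide rather than MongoDB's actual allocation strategy.

```javascript
// Toy record allocator (hypothetical; not MongoDB's actual algorithm).
// Free space is a singly linked list of deleted records {size, offset, next}.
// A new record gets header + BSON data + padding so it can grow in place later.
const HEADER_SIZE = 16; // assumption: Size, Offset, Next, Prev as 4-byte ints

function allocate(freeHead, bsonSize, paddingFactor) {
  const needed = HEADER_SIZE + Math.ceil(bsonSize * paddingFactor);
  let prev = null;
  for (let rec = freeHead; rec !== null; prev = rec, rec = rec.next) {
    if (rec.size < needed) continue; // first fit: skip records that are too small
    const remaining = rec.size - needed;
    const split = remaining >= HEADER_SIZE; // tiny leftovers become extra padding
    const size = split ? needed : rec.size;
    const rest = split
      ? { size: remaining, offset: rec.offset + needed, next: rec.next }
      : rec.next;
    if (prev === null) freeHead = rest; // unlink (or shrink) the chosen deleted record
    else prev.next = rest;
    return { record: { offset: rec.offset, size, padding: size - HEADER_SIZE - bsonSize }, freeHead };
  }
  return { record: null, freeHead }; // no fit: a real system would allocate a new extent
}

const free = { size: 100, offset: 0, next: { size: 500, offset: 1000, next: null } };
const result = allocate(free, 200, 1.5);
console.log(result.record, result.freeHead);
```

The padding is the design point: if an update makes the document slightly larger, it can still fit in place instead of being moved and leaving a new deleted record behind.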
Indexing
    B-Tree indexes, stored in their own namespaces:
    > db.system.namespaces.find()
    { "name" : "foo.system.indexes" }
    { "name" : "foo.test" }
    { "name" : "foo.test.$_id_" }
http://www.mongodb.org/display/DOCS/Indexes
db.test.find({hello: "world"})
Query Language
    "query by example" plus $ modifiers:
    {first_name: "Mike", age: {$gte: 20, $lt: 40}}
http://www.mongodb.org/display/DOCS/Advanced+Queries
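To show what "query by example plus $ modifiers" means for matching, here is a minimal matcher sketch. It is not MongoDB's matcher: it handles only top-level fields, plain equality on scalar values, and the four range operators listed in the hypothetical `OPERATORS` table.

```javascript
// Minimal "query by example" matcher sketch (not MongoDB's implementation).
// A plain value means equality; an object whose keys all start with "$"
// is a set of operators that must all hold.
const OPERATORS = {
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
};

function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    const value = doc[field];
    const keys = cond !== null && typeof cond === "object" ? Object.keys(cond) : [];
    const isOperatorObject = keys.length > 0 && keys.every((k) => k.startsWith("$"));
    if (!isOperatorObject) return value === cond; // example value: exact match
    return keys.every((op) => {
      const fn = OPERATORS[op];
      if (!fn) throw new Error("unsupported operator in this sketch: " + op);
      return value !== undefined && fn(value, cond[op]);
    });
  });
}

const query = { first_name: "Mike", age: { $gte: 20, $lt: 40 } };
console.log(matches({ first_name: "Mike", age: 30 }, query));
```

Every field in the query must match, so the example document doubles as an implicit AND of its conditions.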
Cursors
    > var c = db.test.find({x: 20}).skip(20).limit(10)
    > c.next()
    > c.next()
    ...

    client                          server
    query                    -->
                             <--    first N results + cursor id
    getMore w/ cursor id     -->
                             <--    next N results + cursor id (or 0 when exhausted)
    ...
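The query/getMore exchange can be simulated in memory. `FakeServer` and `iterate` are hypothetical names for this sketch: the server keeps leftover results under a cursor id and returns 0 once nothing is left, and the client drains each batch before issuing the next getMore.

```javascript
// Simulated cursor protocol (hypothetical classes, not a driver API).
class FakeServer {
  constructor(docs, batchSize) {
    this.docs = docs;
    this.batchSize = batchSize;
    this.cursors = new Map(); // cursor id -> results not yet sent
    this.nextId = 1;
  }
  // Initial query: apply filter, skip and limit, then send the first batch.
  query(predicate, skip, limit) {
    return this.reply(this.docs.filter(predicate).slice(skip, skip + limit));
  }
  // getMore: send the next batch for an open cursor.
  getMore(cursorId) {
    const remaining = this.cursors.get(cursorId) ?? [];
    this.cursors.delete(cursorId);
    return this.reply(remaining);
  }
  reply(results) {
    const batch = results.slice(0, this.batchSize);
    const rest = results.slice(this.batchSize);
    if (rest.length === 0) return { batch, cursorId: 0 }; // 0 = cursor exhausted
    const id = this.nextId++;
    this.cursors.set(id, rest);
    return { batch, cursorId: id };
  }
}

// Client side: yield each batch, then getMore until the cursor id is 0.
function* iterate(server, predicate, skip, limit) {
  let { batch, cursorId } = server.query(predicate, skip, limit);
  while (true) {
    yield* batch;
    if (cursorId === 0) return;
    ({ batch, cursorId } = server.getMore(cursorId));
  }
}

const docs = Array.from({ length: 50 }, (_, i) => ({ i, x: 20 }));
const server = new FakeServer(docs, 4);
const results = [...iterate(server, (d) => d.x === 20, 20, 10)];
console.log(results.map((d) => d.i)); // i values 20 through 29, fetched in batches of 4, 4 and 2
```

Batching means `c.next()` usually returns a document already buffered on the client; only when a batch runs out does the shell go back to the server.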
Query Optimizer
    find({x: 10, y: "foo"})
    candidate plans run in parallel:
        scan
        index on x
        index on y
    the first plan to finish wins and is remembered; the others are terminated
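The race between plans can be sketched with generators, where every `next()` call is one unit of work. The plans, the data, the pre-filtered arrays standing in for indexes, and the cache keyed by field names are all assumptions made for this illustration; `runPlans` is a hypothetical name.

```javascript
// Sketch of racing query plans (hypothetical plans and cache, not MongoDB's code).
// Each plan yields a matching document or undefined per step. Plans advance
// round-robin; the first to finish wins and is cached by query shape.
const planCache = new Map();

function queryShape(query) {
  return Object.keys(query).sort().join(","); // field names only, not values
}

function runPlans(query, plans) {
  const shape = queryShape(query);
  if (planCache.has(shape)) {
    const name = planCache.get(shape); // reuse the remembered plan
    const results = [...plans[name]()].filter((d) => d !== undefined);
    return { winner: name, results, cached: true };
  }
  const running = Object.entries(plans).map(([name, make]) => ({ name, it: make(), results: [] }));
  for (;;) {
    for (const p of running) {
      const step = p.it.next();
      if (step.done) {
        planCache.set(shape, p.name); // remember the winner; the others are dropped
        return { winner: p.name, results: p.results, cached: false };
      }
      if (step.value !== undefined) p.results.push(step.value);
    }
  }
}

// Example data: 1000 documents; x === 10 for 50 of them, y === "foo" for 10.
const docs = Array.from({ length: 1000 }, (_, i) => ({ _id: i, x: i % 20, y: i % 100 === 10 ? "foo" : "bar" }));
const query = { x: 10, y: "foo" };
const match = (d) => d.x === query.x && d.y === query.y;
// Simulated index lookups: pre-filtered candidate lists (an assumption for brevity).
const byX = docs.filter((d) => d.x === 10);
const byY = docs.filter((d) => d.y === "foo");
const plans = {
  "scan": function* () { for (const d of docs) yield match(d) ? d : undefined; },
  "index on x": function* () { for (const d of byX) yield match(d) ? d : undefined; },
  "index on y": function* () { for (const d of byY) yield match(d) ? d : undefined; },
};

const first = runPlans(query, plans);
const second = runPlans(query, plans);
console.log(first.winner, first.results.length, second.cached);
```

Racing plans measures actual cost on the real data instead of estimating it, and caching the winner avoids paying for the race on every query of the same shape.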
db.foo.drop()
Commands
    drop, count, copydb, findAndModify, serverStatus, ...
http://www.mongodb.org/display/DOCS/Commands
db.foo.drop();
    = db.foo.runCommand({drop: "foo"});
    = db.$cmd.findOne({drop: "foo"});
    = db.$cmd.find({drop: "foo"}).limit(-1);
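The last form suggests what travels over the wire: an ordinary legacy OP_QUERY (opcode 2004) against the `$cmd` pseudo-collection, with the command document as the query and `numberToReturn` set to -1 (return one document and close the cursor). This sketch assumes a database named `test`, so the namespace is `test.$cmd`; `commandMessage` and `bsonStrings` are hypothetical helpers, and the encoder handles string values only.

```javascript
// Sketch: a command as an OP_QUERY (opcode 2004) on "<db>.$cmd".
// Assumptions: database name "test"; command values are strings only.
const OP_QUERY = 2004;

function cstring(s) {
  return Buffer.concat([Buffer.from(s, "utf8"), Buffer.from([0x00])]);
}

function int32(n) {
  const b = Buffer.alloc(4);
  b.writeInt32LE(n, 0);
  return b;
}

// Encodes a document whose values are all strings (BSON type 0x02).
function bsonStrings(obj) {
  const elems = Object.entries(obj).map(([k, v]) => {
    const str = cstring(v);
    return Buffer.concat([Buffer.from([0x02]), cstring(k), int32(str.length), str]);
  });
  const body = Buffer.concat(elems);
  return Buffer.concat([int32(4 + body.length + 1), body, Buffer.from([0x00])]);
}

function commandMessage(dbName, command, requestId) {
  const body = Buffer.concat([
    int32(0),                   // flags
    cstring(dbName + ".$cmd"),  // full collection name
    int32(0),                   // numberToSkip
    int32(-1),                  // numberToReturn: one document, then close the cursor
    bsonStrings(command),       // the command document is the query
  ]);
  return Buffer.concat([int32(16 + body.length), int32(requestId), int32(0), int32(OP_QUERY), body]);
}

const msg = commandMessage("test", { drop: "foo" }, 1);
console.log(msg.length, msg.toString("hex"));
```

Treating commands as queries means the server needs no separate command opcode: the reply to the one-document query is the command's result.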
Capped Collections
    preallocated
    automatic LRI (least recently inserted) age-out
    no default _id index
    always in insertion order
http://www.mongodb.org/display/DOCS/Capped+Collections
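The listed properties map naturally onto a fixed-size circular buffer. This sketch bounds the collection by document count for simplicity (real capped collections are bounded by size in bytes); `CappedCollection` is a hypothetical class.

```javascript
// Sketch of capped-collection behavior as a fixed-size circular buffer
// (hypothetical class; bounded by document count rather than bytes).
class CappedCollection {
  constructor(maxDocs) {
    this.slots = new Array(maxDocs); // space preallocated up front
    this.start = 0;                  // index of the oldest document
    this.count = 0;
  }
  insert(doc) {
    const cap = this.slots.length;
    if (this.count < cap) {
      this.slots[(this.start + this.count) % cap] = doc;
      this.count++;
    } else {
      this.slots[this.start] = doc;  // overwrite the least recently inserted document
      this.start = (this.start + 1) % cap;
    }
  }
  // Natural order is always insertion order, oldest first.
  *find() {
    for (let i = 0; i < this.count; i++) {
      yield this.slots[(this.start + i) % this.slots.length];
    }
  }
}

const c = new CappedCollection(3);
for (let n = 1; n <= 5; n++) c.insert({ n });
console.log([...c.find()]);
```

Because inserts only ever overwrite the oldest slot, there is no free-space management and no index is needed to return documents in insertion order.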
Replication Oplog
    > use foo
    switched to db foo
    > db.test.insert({x: 1, url: "http://dirolf.com"});
    > db.test.update({url: "http://dirolf.com"}, {$inc: {x: 1}});

    > use local
    switched to db local
    > db.oplog.$main.find()
    {ts: ..., op: "n", ns: "", o: {}}
    {ts: ..., op: "n", ns: "", o: {}}
    {ts: ..., op: "i", ns: "foo.test", o: {_id: ObjectId("..."), x: 1, url: "http://dirolf.com"}}
    {ts: ..., op: "n", ns: "", o: {}}
    {ts: ..., op: "u", ns: "foo.test", o2: {_id: ObjectId("...")}, o: {$set: {x: 2}}}

    op: "i" = insert, "u" = update, "n" = no-op
    the $inc is logged as {$set: {x: 2}}, so replaying the entry more than once gives the same result
http://www.mongodb.org/display/DOCS/Replication
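Replaying those entries on a slave can be sketched with an in-memory store. The entry shapes follow the slide; `applyOp` is a hypothetical function, only `$set` updates are handled, and the string `"id1"` stands in for the elided `ObjectId("...")`.

```javascript
// Sketch of oplog replay on a slave (hypothetical in-memory store).
// "i" inserts o, "u" applies o.$set to the document matched by o2._id,
// "n" is a no-op.
function applyOp(store, entry) {
  if (entry.op === "n") return; // no-op entries carry no data
  if (!store.has(entry.ns)) store.set(entry.ns, new Map());
  const coll = store.get(entry.ns);
  if (entry.op === "i") {
    coll.set(entry.o._id, { ...entry.o });
  } else if (entry.op === "u") {
    const doc = coll.get(entry.o2._id);
    if (doc) Object.assign(doc, entry.o.$set); // logged as $set, so replay is idempotent
  } else {
    throw new Error("unsupported op in this sketch: " + entry.op);
  }
}

const oplog = [
  { op: "n", ns: "", o: {} },
  { op: "i", ns: "foo.test", o: { _id: "id1", x: 1, url: "http://dirolf.com" } },
  { op: "u", ns: "foo.test", o2: { _id: "id1" }, o: { $set: { x: 2 } } },
];
const store = new Map();
for (const e of oplog) applyOp(store, e);
applyOp(store, oplog[2]); // replaying the update again still leaves x at 2
console.log(store.get("foo.test").get("id1"));
```

Had the entry been logged as `{$inc: {x: 1}}`, the second replay would have produced x = 3; logging the resulting value is what makes re-application safe.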
Replication Topology
    [diagram: example master/slave topologies, including one master replicating to three slaves and arrangements combining several masters and slaves]
http://www.mongodb.org/display/DOCS/Replication
Auto-Sharding
    Shards:
        mongod    mongod    mongod    ...
        mongod    mongod    mongod
    Config Servers:
        mongod
        mongod
        mongod
    mongos    mongos    ...
    client
http://www.mongodb.org/display/DOCS/Sharding
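The mongos processes route each operation to the shard that owns the relevant range of the shard key, using chunk metadata kept on the config servers. This sketch assumes a hypothetical in-memory chunk table of sorted, non-overlapping `[min, max)` ranges over a numeric key; `route` is an illustrative name, not a MongoDB API.

```javascript
// Sketch of range-based routing from a shard-key value to a shard.
// Assumption: a hypothetical chunk table, sorted and non-overlapping [min, max).
const chunks = [
  { min: -Infinity, max: 100, shard: "shard0" },
  { min: 100, max: 500, shard: "shard1" },
  { min: 500, max: Infinity, shard: "shard2" },
];

// Binary search for the chunk whose range contains the key.
function route(key) {
  let lo = 0;
  let hi = chunks.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const c = chunks[mid];
    if (key < c.min) hi = mid - 1;
    else if (key >= c.max) lo = mid + 1;
    else return c.shard;
  }
  throw new Error("no chunk covers key " + key);
}

console.log(route(5), route(100), route(500));
```

Keeping ranges (rather than hashing every key) means a range query on the shard key touches only the shards whose chunks overlap it.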
Geohashing
    (20, 10) = (0001 0100, 0000 1010)
    interleaved bits: 0000 0010 0110 0100

    maps close coordinates such as (21, 9) to close hashes:
    0000 0010 0110 0011

    the tricky part happens at bit flips: 127 (0111 1111) vs 128 (1000 0000)
http://www.mongodb.org/display/DOCS/Geospatial+Indexing
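The interleaving on the slide takes one bit from x, then one from y, starting at the most significant bit. This sketch assumes 8-bit coordinates, as in the example; `interleave` and `toBits` are hypothetical helpers.

```javascript
// Bit interleaving behind the slide's geohash (8 bits per coordinate assumed).
// For each bit position, most significant first: x bit, then y bit.
function interleave(x, y, bits = 8) {
  let hash = 0;
  for (let i = bits - 1; i >= 0; i--) {
    hash = (hash << 2) | (((x >> i) & 1) << 1) | ((y >> i) & 1);
  }
  return hash;
}

// Format as groups of four bits to match the slide's notation.
function toBits(hash, bits = 16) {
  return hash.toString(2).padStart(bits, "0").match(/.{4}/g).join(" ");
}

console.log(toBits(interleave(20, 10)));   // 0000 0010 0110 0100
console.log(toBits(interleave(21, 9)));    // 0000 0010 0110 0011
console.log(toBits(interleave(127, 127))); // 0011 1111 1111 1111
console.log(toBits(interleave(128, 128))); // 1100 0000 0000 0000
```

The last two lines show the bit-flip problem: (127, 127) and (128, 128) are neighbors on the plane, yet their hashes share no leading bits, so a search by hash prefix alone would miss one of them.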
Download MongoDB
    http://www.mongodb.org
    these slides are available at http://dirolf.com
