MongoDB Live Hacking
MongoDB Live Coding Session
tobias.trelle@codecentric.de
@tobiastrelle
 codecentric AG
"It's not my fault the chapters are short,
  MongoDB is just easy to learn"

      from "The Little MongoDB Book"




MongoDB User Groups by codecentric



                  MongoDB User-Gruppe Düsseldorf
                 https://www.xing.com/net/mongodb-dus
                               @MongoDUS
                         Contact: Tobias Trelle




                 MongoDB User-Gruppe Frankfurt/Main
                  https://www.xing.com/net/mongodb-ffm
                          Contact: Uwe Seiler



What is MongoDB?

      Name derived from "humongous" (= gigantic)   http://www.mongodb.org


      NoSQL datastore, open source https://github.com/mongodb
      Commercial support from the vendor 10gen http://www.10gen.com


      Highly scalable (scale-out)

      Stores so-called "documents"


      Supports replication & sharding


      Map/Reduce


      Geospatial indexes / queries

Basic structure of a MongoDB server

      Hierarchy:   Server > Database > Collection > Document > Field

      Relational counterpart    MongoDB       But …
      Table                     Collection    Flexible schema
      Row                       Document
      Column                    Field         - Arrays
                                              - recursive

What's a document?

      Single record that can be stored in a collection


      Written as JSON (JavaScript Object Notation); stored internally as BSON (Binary JSON)


      var doc =      {
                     title: "MongoDB_Live_Hacking.pptx",
                     tags: [ "cc", "mongodb", "nosql" ],
                     slides: [
                               { nr: 1, header: "MongoDB User Groups by codecentric" },
                               { nr: 2, header: "MongoDB at codecentric Wiki" },
                               …
                               ]
                     };
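
As a small illustration of the nesting described on this slide, the following sketch builds a similar document as a plain JavaScript object and reads values from its array and embedded documents. It runs in any JavaScript engine without a MongoDB server. The variable names are chosen for illustration only, and the slide list is shortened to the two entries shown.

```javascript
// A document is an ordinary nested structure: fields can hold arrays,
// and array elements can themselves be documents.
var doc = {
    title: "MongoDB_Live_Hacking.pptx",
    tags: ["cc", "mongodb", "nosql"],
    slides: [
        { nr: 1, header: "MongoDB User Groups by codecentric" },
        { nr: 2, header: "MongoDB at codecentric Wiki" }
    ]
};

// Access an array element and a field of an embedded document.
var firstTag = doc.tags[0];
var secondHeader = doc.slides[1].header;

// The textual form is JSON; a round trip preserves the structure.
var copy = JSON.parse(JSON.stringify(doc));

console.log(firstTag, "|", secondHeader, "|", copy.slides.length);
```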




Live Session

      CRUD operations


      Queries


      Geospatial Queries


      Map/Reduce


      Replication


      Sharding


      Raw Java API & Spring Data API


Geospatial Queries

      Queries based on 2-dimensional coordinates:


      { _id: "A", position: [0.001, -0.002] }
      { _id: "B", position: [0.75, 0.75] }
      { _id: "C", position: [0.5, 0.5] }
      { _id: "D", position: [-0.5, -0.5] }


      Queries based on distances & shapes


      Details:
      http://blog.codecentric.de/en/2012/02/spring-data-mongodb-geospatial-queries/
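
To make the idea of a distance-based query concrete, here is a minimal sketch in plain JavaScript that scans the four sample positions and keeps those within a given radius of a center point. It is not the MongoDB query API: the helper `withinCircle`, the center `[0, 0]` and the radius `0.75` are assumptions made for this illustration, and a flat Euclidean distance is used.

```javascript
// The four sample locations from the slide.
var locations = [
    { _id: "A", position: [0.001, -0.002] },
    { _id: "B", position: [0.75, 0.75] },
    { _id: "C", position: [0.5, 0.5] },
    { _id: "D", position: [-0.5, -0.5] }
];

// Hypothetical helper: returns the _ids of all documents whose position
// lies within `radius` of `center` (flat 2-D Euclidean distance).
function withinCircle(docs, center, radius) {
    return docs.filter(function (d) {
        var dx = d.position[0] - center[0];
        var dy = d.position[1] - center[1];
        return Math.sqrt(dx * dx + dy * dy) <= radius;
    }).map(function (d) { return d._id; });
}

var hits = withinCircle(locations, [0, 0], 0.75);
console.log(hits); // A, C and D are inside the circle; B is not
```

In MongoDB itself, such queries are answered with the help of a geospatial index rather than a scan over all documents.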




Map/Reduce

      Data processing algorithm based on two phases: map & reduce


      Code execution co-located with the data


      Map phase can be run in parallel (on multiple nodes etc.) on huge data sets


      MongoDB map / reduce:

                 Runs on all documents of a collection or on a subset of them


                 Map and reduce are written as JavaScript functions


                 Output documents of the map function are input to the reduce function


                 Results are documents stored in a target collection

Map/Reduce example

      We want to count occurrences of tags assigned to our documents:

          {name: "Doc 1", tags: [ "cc", "mongodb", "nosql" ] }
          {name: "Doc 2", tags: [ "cc", "agile" ] }
          {name: "Doc 3", tags: [ "cc" ] }

      Map function:

      function() {
             this.tags.forEach( function(tag) {
                       emit( tag, {count: 1} );
             });
      }

      Map output:

          key = "cc", value = {count: 1}
          key = "mongodb", value = {count: 1}
          key = "nosql", value = {count: 1}
          key = "cc", value = {count: 1}
          key = "agile", value = {count: 1}
          key = "cc", value = {count: 1}

      Reduce input:

          key = "cc", values = [ {count: 1}, {count: 1}, {count: 1} ]
          key = "mongodb", values = [ {count: 1} ]
          key = "nosql", values = [ {count: 1} ]
          key = "agile", values = [ {count: 1} ]

      Reduce function:

      function(key, values) {
             var result = {count: 0};
             values.forEach(function(value) {
                       result.count += value.count;
             });
             return result;
      }
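
The data flow of this example can be traced without a server. The sketch below reuses the map and reduce functions from the slide and replaces the database with a few lines of plain JavaScript. The `emit` implementation and the grouping loop are stand-ins written for this illustration, not MongoDB code.

```javascript
// Input documents from the slide.
var docs = [
    { name: "Doc 1", tags: ["cc", "mongodb", "nosql"] },
    { name: "Doc 2", tags: ["cc", "agile"] },
    { name: "Doc 3", tags: ["cc"] }
];

// Map and reduce functions as shown on the slide.
function map() {
    this.tags.forEach(function (tag) {
        emit(tag, { count: 1 });
    });
}

function reduce(key, values) {
    var result = { count: 0 };
    values.forEach(function (value) {
        result.count += value.count;
    });
    return result;
}

// Hypothetical stand-in for the server: collects emitted values per key.
var emitted = {};
function emit(key, value) {
    (emitted[key] = emitted[key] || []).push(value);
}

// Map phase: call map with each document bound to `this`.
docs.forEach(function (doc) { map.call(doc); });

// Reduce phase: one reduce call per key.
var counts = {};
Object.keys(emitted).forEach(function (key) {
    counts[key] = reduce(key, emitted[key]);
});

console.log(counts);
```

Note that the reduce function returns a value of the same shape as the values it receives. This matters on a real server, which may call reduce several times for one key and feed earlier results back in.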

MongoDB Replication

      A cluster is called a "replica set"
      Replication follows the master/slave principle: one master (primary), all other members are slaves (secondaries)
      Writes from clients go to the master only
      If the master goes down, the slaves elect a new master (needs n > 2 members so that a majority can still vote)

                                                    Replica set w/ n = 3



                                                                           Slave 1

                 Client                    Master

                                                                           Slave 2
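
Why a failover needs more than two members can be shown with a little arithmetic. The sketch below is a deliberately simplified model that only checks whether the reachable members form a strict majority. The function name `canElectNewMaster` is made up for this illustration, and real elections take further factors into account.

```javascript
// Hypothetical helper: an election can succeed only if the members that
// are still reachable form a strict majority of the whole set.
function canElectNewMaster(totalMembers, reachableMembers) {
    return reachableMembers > totalMembers / 2;
}

// n = 3: the master fails, two slaves remain. 2 of 3 is a majority.
var threeNodeSet = canElectNewMaster(3, 2);

// n = 2: the master fails, one slave remains. 1 of 2 is not a majority.
var twoNodeSet = canElectNewMaster(2, 1);

console.log(threeNodeSet, twoNodeSet);
```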



MongoDB Sharding

      Data is distributed over n nodes; each record is persisted only once
      Data is stored only on the shard nodes
      Config server = bookkeeper, knows where the data is
      Switch (mongos): gateway for clients

                                                  Sharding setup

                                Config
                                Server                                 Shard 1




                                                                        Shard 2
             Client             Switch


MongoDB Sharding in Production

      Each shard is a replica set; in addition, there are 3 config servers




Source: http://www.mongodb.org/display/DOCS/Sharding+Introduction

MongoDB Sharding Example: Initial State

mongos> sh.status()
--- Sharding Status ---
   sharding version: { "_id" : 1, "version" : 3 }
   shards:
             {   "_id" : "shard0000",     "host" : "tmp-pc:9000" }
             {   "_id" : "shard0001",     "host" : "tmp-pc:9001" }
   databases:
             {   "_id" : "admin",   "partitioned" : false,    "primary" : "config" }
             {   "_id" : "data",    "partitioned" : true,    "primary" : "shard0000" }
                      data.foo chunks:
                                         shard0000      1
                              { "age" : { $minKey : 1 } } -->> { "age" : { $maxKey : 1 } } on : shard0000 { "t" : 1000, "i" : 0 }

Note on the output: 2 shards




MongoDB Sharding Example: Multiple Chunks

Note on the output: 2 shards; the chunks are distributed almost equally (4 on shard0001, 5 on shard0000)

mongos> sh.status()
--- Sharding Status ---
   sharding version: { "_id" : 1, "version" : 3 }
   shards:
             {   "_id" : "shard0000",     "host" : "tmp-pc:9000" }
             {   "_id" : "shard0001",     "host" : "tmp-pc:9001" }
   databases:
             {   "_id" : "admin",   "partitioned" : false,    "primary" : "config" }
             {   "_id" : "data",    "partitioned" : true,    "primary" : "shard0000" }
                      data.foo chunks:
                                         shard0001      4
                                         shard0000      5
                              { "age" : { $minKey : 1 } } -->> { "age" : 50 } on : shard0001 { "t" : 2000, "i" : 0 }
                              { "age" : 50 } -->> { "age" : 53 } on : shard0001 { "t" : 3000, "i" : 0 }
                              { "age" : 53 } -->> { "age" : 54 } on : shard0001 { "t" : 4000, "i" : 0 }
                              { "age" : 54 } -->> { "age" : 58 } on : shard0001 { "t" : 5000, "i" : 0 }
                              { "age" : 58 } -->> { "age" : 60 } on : shard0000 { "t" : 5000, "i" : 1 }
                              { "age" : 60 } -->> { "age" : 63 } on : shard0000 { "t" : 1000, "i" : 14 }
                              { "age" : 63 } -->> { "age" : 65 } on : shard0000 { "t" : 1000, "i" : 11 }
                              { "age" : 65 } -->> { "age" : 69 } on : shard0000 { "t" : 1000, "i" : 12 }
                              { "age" : 69 } -->> { "age" : { $maxKey : 1 } } on : shard0000 { "t" : 1000, "i" : 4 }
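
How a chunk table like this one decides where a document lives can be sketched in a few lines of plain JavaScript. The ranges are copied from the output above. The function `shardFor` and the use of `-Infinity` / `Infinity` in place of `$minKey` / `$maxKey` are assumptions of this illustration, not part of MongoDB.

```javascript
// Chunk ranges copied from the sh.status() output; min is inclusive,
// max is exclusive. -Infinity / Infinity stand in for $minKey / $maxKey.
var chunks = [
    { min: -Infinity, max: 50, shard: "shard0001" },
    { min: 50, max: 53, shard: "shard0001" },
    { min: 53, max: 54, shard: "shard0001" },
    { min: 54, max: 58, shard: "shard0001" },
    { min: 58, max: 60, shard: "shard0000" },
    { min: 60, max: 63, shard: "shard0000" },
    { min: 63, max: 65, shard: "shard0000" },
    { min: 65, max: 69, shard: "shard0000" },
    { min: 69, max: Infinity, shard: "shard0000" }
];

// Hypothetical router: find the chunk whose range contains the shard key.
function shardFor(age) {
    for (var i = 0; i < chunks.length; i++) {
        if (age >= chunks[i].min && age < chunks[i].max) {
            return chunks[i].shard;
        }
    }
    return null; // not reached for numeric ages: the ranges cover all values
}

console.log(shardFor(42), shardFor(58), shardFor(70));
```

A document with `age` 42 falls into the first chunk on shard0001, while ages 58 and 70 both map to chunks on shard0000.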


MongoDB API
      Drivers for many languages (Java, Ruby, PHP, C++, …)
      Low-level Java API: MongoDB Java Driver
      Spring Data MongoDB: Repository Support + Object/Collection Mapping

      Spring Data: CrudRepository, PagingAndSortingRepository

      Module        Spring Data JPA    Spring Data MongoDB    Spring Data Neo4j    Spring Data …
      Repository    JpaRepository      MongoRepository        GraphRepository
      Template                         MongoTemplate          Neo4jTemplate
      Access via    JPA, JDBC          Mongo Java Driver      Embedded, REST
      Datastore     RDBMS              MongoDB                Neo4j                …




Questions?

Tobias Trelle

codecentric AG
Merscheider Str. 1
42699 Solingen

tel              +49 (0) 212.233628.47
fax              +49 (0) 212.233628.79
mail             Tobias.Trelle@codecentric.de
twitter          @tobiastrelle

www.codecentric.de
www.mbg-online.de
blog.codecentric.de
www.xing.com/net/mongodb-dus


codecentric AG                                  20.08.2012
