MongoDB as a Message Queue
         Luke Gotszling

          Aol / About.me

Silicon Valley MongoDB User Group
           Big Data Week
           Palo Alto, CA
           April 25, 2012

Prior AMQP Usage

• 3-node RabbitMQ cluster on v1.8, opted to
  forgo disk persistence for better
  performance
• Hard to diagnose cause of failure at scale




At About.me


• All asynchronous and periodic tasks
• Short lived messages
 • No journalling
• Sharded cluster on v2.0.4 (shard key =
  queue name)



Benefits

• Async operations
• Per message (document) atomicity
• Batch processes
• Periodic processes
• Durability / ability to shard
• Operational familiarity


AMQP?
               Direct    Topic                 Fanout
 AMQP          Push      Yes                   Yes
 Mongo Queue   Poll      Regular expression    Sort of*

* Options include passing a message along with an incrementing key or
multiple declarations. Added to Kombu in v2.1 -- reduces performance for
non-fanout operations due to additional queries
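
The "Regular expression" entry for topic routing can be illustrated by translating an AMQP-style binding pattern into a regex. This is only a rough sketch under simplified assumptions: `*` stands for exactly one dot-separated word and `#` is treated as "any run of characters", so unlike real AMQP a pattern such as `mail.#` will not match the bare key `mail`. The function name `topic_to_regex` is hypothetical, and the slides do not show how Kombu implements this internally.

```python
import re


def topic_to_regex(pattern):
    """Translate an AMQP-style topic pattern into a compiled regex (simplified)."""
    parts = []
    for word in pattern.split("."):
        if word == "*":
            parts.append(r"[^.]+")         # exactly one word
        elif word == "#":
            parts.append(r".*")            # simplification: any run of characters
        else:
            parts.append(re.escape(word))  # literal word
    return re.compile("^" + r"\.".join(parts) + "$")


one_word = topic_to_regex("mail.*")
many_words = topic_to_regex("mail.#")

print(bool(one_word.match("mail.welcome")))       # one trailing word
print(bool(one_word.match("mail.welcome.eu")))    # two trailing words
print(bool(many_words.match("mail.welcome.eu")))  # "#" spans several words
```

The resulting pattern string (`one_word.pattern`) could in principle be passed to a `$regex` query on a routing-key field, though whether that field exists depends on the schema in use.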
To cap or not to cap
• Capped collections[1]
   • Better performance but limited to single node[2]
   • FIFO
• Uncapped collections -- rest of this presentation
   • Can shard, lower performance per-node
   • FIFO-ish[3], custom ordering available
[1] http://blog.boxedice.com/2011/04/13/queueing-mongodb-using-mongodb/
    http://blog.boxedice.com/2011/09/28/replacing-rabbitmq-with-mongodb/
[2] SERVER-211, SERVER-2654
[3] Only down to 1 second granularity
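
The "FIFO-ish" caveat in [3] follows from how `_id` values sort. A small sketch, assuming driver-generated ObjectIds with the layout used by drivers of that era (4-byte timestamp in whole seconds, 3-byte machine id, 2-byte process id, 3-byte counter). The helper `fake_object_id` and all of the values are made up for illustration; no MongoDB server is involved.

```python
import struct


def fake_object_id(ts_seconds, machine, pid, counter):
    """Build 12 bytes laid out like a 2012-era ObjectId."""
    return (struct.pack(">I", ts_seconds)       # 4-byte timestamp, whole seconds
            + machine                           # 3-byte machine identifier
            + struct.pack(">H", pid)            # 2-byte process id
            + struct.pack(">I", counter)[1:])   # low 3 bytes of the counter


# Two producers on different machines publish within the same second.
first = fake_object_id(1335312000, b"\xbb\xbb\xbb", 100, 1)   # published first
second = fake_object_id(1335312000, b"\xaa\xaa\xaa", 100, 1)  # published slightly later
later = fake_object_id(1335312001, b"\xbb\xbb\xbb", 100, 2)   # published in the next second

# Sorting by _id compares the raw bytes, as sort:{"_id":+1} would.
queue_order = sorted([first, second, later])
print(queue_order == [second, first, later])
```

Within one second the timestamp bytes tie, so the machine id decides the order rather than the actual publish time. Across seconds the order is preserved.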
Code (mongo)
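• Create:
    // Enqueue: one document per message, tagged with its queue name
    db.messages.insert( { queue:"email",
                          payload:serialized_data} )

• Consume:
    // Dequeue: atomically fetch and remove the lowest-_id message in the queue
    db.messages.findAndModify( { query:{"queue":"email"},
                                 sort:{"_id":+1},
                                 remove:true} )

• Index:
    // The compound index covers both the queue filter and the _id sort
    db.messages.ensureIndex({ queue:1 })
    db.messages.ensureIndex({ queue:1, _id:1})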



Code (Python)
• Create:
    self.client.insert({"payload": serialize(message),
                        "queue": queue})


• Consume:
     self.client.database.command("findandmodify", "messages",
                           query={"queue": queue},
                           sort={"_id": pymongo.ASCENDING},
                           remove=True)



• Index:
     col.ensure_index([("queue", 1)])
     col.ensure_index([("queue", 1),("_id", 1)])

  http://packages.python.org/kombu/
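
The calls on this slide target the pymongo 2.1 API. `insert` and `ensure_index` were deprecated and later removed in newer PyMongo releases. As a sketch only, assuming PyMongo 3.0 or later, the same three operations might look like the following. The function names are hypothetical, and `collection` is expected to be a `pymongo.collection.Collection`.

```python
def publish(collection, queue, payload):
    """Append one message document to the given queue."""
    collection.insert_one({"queue": queue, "payload": payload})


def consume(collection, queue):
    """Atomically remove and return the lowest-_id message, or None if empty."""
    return collection.find_one_and_delete({"queue": queue},
                                          sort=[("_id", 1)])


def ensure_indexes(collection):
    """Create the compound index that serves the queue filter and _id sort."""
    collection.create_index([("queue", 1), ("_id", 1)])


# Usage (needs a running MongoDB; database and collection names are made up):
#   from pymongo import MongoClient
#   messages = MongoClient()["jobs"]["messages"]
#   ensure_indexes(messages)
#   publish(messages, "email", "serialized task")
#   task = consume(messages, "email")
```

`find_one_and_delete` wraps the same findAndModify-with-remove behaviour, so each message is still handed to at most one consumer.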

Celery Task Creation Benchmarks (Single-Node)

[Chart: tasks created per second (y-axis, 0 to 5600) versus concurrency in processes (x-axis, 1 to 5)]
Series: RabbitMQ v2.7.1, MongoDB (2.0.4) --nojournal, MongoDB (2.0.4) --journal
celery 2.4.5 / kombu 2.0 / pymongo 2.1 / amqplib 1.0.2 / eventlet 0.9.16

Celery Task Consumption Benchmarks (Single-Node)

[Chart: tasks consumed per second (y-axis, 0 to 2000) versus eventlet concurrency (x-axis, 1 to 25)]
Series: RabbitMQ v2.7.1, MongoDB (2.0.4) --nojournal, MongoDB (2.0.4) --journal
celery 2.4.5 / kombu 2.0 / pymongo 2.1 / amqplib 1.0.2 / eventlet 0.9.16

Pros
• Familiar technology
• Sharding
• Durability
• Lower operational overhead
• Advanced querying (map/reduce etc...)

Cons
• Not AMQP
• Need to poll
• Performance depends on polling frequency and concurrency
• Message consumption is a locking operation
• Fewer libraries available[1]

[1] Python has kombu, < v2.1 no fanout support but better async task performance
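
The "need to poll" and "performance depends on polling frequency" points can be made concrete with a consumer loop. This is a minimal sketch, not what Celery or Kombu actually do. `fetch` is a hypothetical callable that returns the next message or `None` (for example, a wrapper around the findAndModify call), and the delay values and `max_idle_polls` cut-off are arbitrary choices for illustration.

```python
import time


def poll(fetch, handle, min_delay=0.05, max_delay=1.0,
         sleep=time.sleep, max_idle_polls=None):
    """Drain a queue by polling, backing off while the queue is empty."""
    delay = min_delay
    idle = 0
    while max_idle_polls is None or idle < max_idle_polls:
        message = fetch()
        if message is not None:
            handle(message)
            delay = min_delay                  # work is flowing: poll eagerly again
            idle = 0
        else:
            sleep(delay)                       # empty queue: wait before asking again
            delay = min(delay * 2, max_delay)  # exponential backoff, capped
            idle += 1


# Demo with an in-memory list standing in for the queue.
pending = ["a", "b"]
handled, slept = [], []
poll(lambda: pending.pop(0) if pending else None,
     handled.append, sleep=slept.append, max_idle_polls=3)
print(handled, slept)
```

A short minimum delay keeps latency low at the price of more queries against an empty queue. A longer maximum delay reduces load but lets messages wait longer before they are picked up.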
Don’t Forget To Shard
  Your Collections!
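
As a reminder of what that involves, here is a sketch of sharding the message collection on the queue name, the shard key mentioned earlier in the talk. It assumes `client` is a PyMongo `MongoClient` connected to a mongos router. The helper name and the `jobs` / `messages` names are hypothetical.

```python
def shard_queue_collection(client, db_name="jobs", coll_name="messages"):
    """Enable sharding for the database, then shard the queue collection."""
    namespace = "%s.%s" % (db_name, coll_name)
    # Both are admin commands and must be sent through mongos.
    client.admin.command("enableSharding", db_name)
    client.admin.command("shardCollection", namespace, key={"queue": 1})
    return namespace
```

With the queue name as the only shard key field, all messages for one queue stay together on a shard, so spreading load depends on having several active queues.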




Questions?

 luke@about.me
 about.me/luke
   @lmgtwit



