Inter-Process/Task Communication With Message Queues
William McVey < [email_address] >
PyOhio, July 26, 2009
Intro
  • How I found a solution that works well for me
  • There is a LOT of material out there that isn't covered
  • Not necessarily the ideal solution, but I learned a lot along the way
Description of the Problem
  • HPC Controller: tries to discover new ways web browsers (and other client software) get exploited "in the wild" and ensures that my employer's mitigations for these threats are effective.
  • A Django-based data management application
  • Invokes the long-running Capture-HPC Java application
  • Collects and processes large amounts of data
Architecture
Key Difficulties
  • Long-running processes under short-lived web requests.
  • My initial (naive) approach:
      • Spawn detached processes to handle jobs
      • Process coordination via database
Lesson learned
  • Do not screw with Apache's process model.
Rediscovering Queues
  • Basic queue overview
  • Standard lib:
      • Queue - mostly for thread pool management
      • collections.deque - provides efficient access to both endpoints of a list structure
      • heapq - ordered queues (e.g. priority queue)
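A minimal sketch of the three standard-library structures named on this slide. It is written for Python 3, where the Queue module is spelled queue; the job names and priorities are made up for illustration.

```python
import heapq
import queue  # this module is named "Queue" in Python 2
from collections import deque

# queue.Queue: a thread-safe FIFO, typically shared between worker threads.
fifo = queue.Queue()
for job in ("a", "b", "c"):
    fifo.put(job)
first_out = fifo.get()  # the oldest item comes out first

# collections.deque: cheap appends and pops at both ends.
d = deque([1, 2, 3])
d.appendleft(0)
d.append(4)
left, right = d.popleft(), d.pop()

# heapq: a priority queue kept in a plain list; the smallest tuple pops first.
heap = []
heapq.heappush(heap, (2, "medium"))
heapq.heappush(heap, (1, "urgent"))
heapq.heappush(heap, (3, "low"))
most_urgent = heapq.heappop(heap)
```

All three live inside a single process, which is why they do not solve the cross-process coordination problem on their own.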
Generic message broker
  • Message brokers can provide:
      • Simple queue-like dataflow
      • Simplified interprocess communication with message routing
      • More effective scaling
      • Better resilience to failure
beanstalkd/beanstalkc
  • beanstalkd: A very simple text-based protocol with a simple yet powerful set of queue management primitives. http://xph.us/software/beanstalkd/
  • beanstalkc: A simple yet powerful client API that is well documented. http://github.com/earl/beanstalkc/
  • [demo here]
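The live demo is not part of the transcript. As a stand-in, here is a toy in-memory model of the put / reserve / delete / release job lifecycle that beanstalkd's protocol is built around. ToyJobQueue and the job body are hypothetical names for illustration; this is not the beanstalkc API and it does no networking.

```python
import itertools
from collections import deque


class ToyJobQueue:
    """In-memory stand-in for a work queue with reserve/delete semantics."""

    def __init__(self):
        self._ready = deque()     # jobs waiting for a worker
        self._reserved = {}       # jobs handed to a worker but not yet finished
        self._ids = itertools.count(1)

    def put(self, body):
        """Producer side: enqueue a job and return its id."""
        job_id = next(self._ids)
        self._ready.append((job_id, body))
        return job_id

    def reserve(self):
        """Worker side: take the next ready job, or None if there is none."""
        if not self._ready:
            return None
        job_id, body = self._ready.popleft()
        self._reserved[job_id] = body
        return job_id, body

    def delete(self, job_id):
        """Worker finished successfully: forget the job."""
        del self._reserved[job_id]

    def release(self, job_id):
        """Worker gave up: put the job back so another worker can retry it."""
        self._ready.append((job_id, self._reserved.pop(job_id)))


q = ToyJobQueue()
q.put("scan http://example.com/")
job_id, body = q.reserve()
q.release(job_id)                  # first attempt failed, job goes back
retry_id, retry_body = q.reserve()
q.delete(retry_id)                 # second attempt succeeded
```

The point of the reserve step is that a job is never lost just because a worker crashed halfway through it.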
The need for something more
  • Beanstalkd continues to be effective for hpc_controller.
  • A new project came along and I ran into some issues...
      • Lack of authentication
      • Lack of message integrity/confidentiality
      • Lack of persistent messages
memcacheq
  • Memcacheq uses the memcachedb protocol to implement queues.
  • A "cache" lookup of a queue name pops a value from the queue
  • Pro:
      • Fast, lightweight, and scales well
      • Persistent messages across reboots
  • Con:
      • Doesn't support either blocking or callback interfaces
      • Have to poll to see if you have messages
      • Didn't address the authentication requirement
  • [demo here]
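Without blocking or callback interfaces, a consumer ends up in a polling loop. A rough sketch of one, with a capped backoff so an idle queue is not hammered. poll_for_message and the fetch callable are hypothetical; in practice fetch would wrap a memcache-style get on the queue name, which is faked here with an iterator.

```python
import time


def poll_for_message(fetch, interval=0.01, max_interval=0.1,
                     max_attempts=10, sleep=time.sleep):
    """Call fetch() until it returns something other than None.

    The delay doubles after each empty poll, up to max_interval.
    Returns None if max_attempts polls all come back empty.
    """
    delay = interval
    for _ in range(max_attempts):
        message = fetch()
        if message is not None:
            return message
        sleep(delay)
        delay = min(delay * 2, max_interval)
    return None


# Fake queue: empty on the first two polls, then a job shows up.
responses = iter([None, None, "job-42"])
result = poll_for_message(lambda: next(responses))
```

The trade-off is latency versus load: a short interval finds messages quickly but keeps the server busy answering empty polls.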
AMQP
  • Advanced Message Queuing Protocol (AMQP): an open protocol layer for message queues.
  • Pro:
      • A more powerful message routing capability
      • TLS (aka SSL) as part of the protocol spec
      • A variety of broker implementations
  • Con:
      • More complex
AMQP
AMQP Message Routing
  • Image from: Messaging Tutorial - AMQP Programming Tutorial for C++, Java, Python, and C#. Copyright © 2008 Red Hat, Inc. Under the Open Publication License.
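The routing diagram did not survive in this transcript. As a rough substitute, here is a toy in-memory model of AMQP-style direct routing: publishers send to an exchange with a routing key, queues are bound to the exchange with a key, and the exchange copies each message to every queue whose binding matches. ToyDirectExchange and all the names below are illustrative only; a real broker does this over the network.

```python
from collections import defaultdict, deque


class ToyDirectExchange:
    """Delivers a message to every queue bound with a matching routing key."""

    def __init__(self):
        self._bindings = defaultdict(list)  # routing key -> bound queues

    def bind(self, queue, routing_key):
        self._bindings[routing_key].append(queue)

    def publish(self, routing_key, message):
        """Route one message; returns how many queues received it."""
        queues = self._bindings.get(routing_key, [])
        for q in queues:
            q.append(message)
        return len(queues)


po_box = deque()
audit_log = deque()

sorting_room = ToyDirectExchange()
sorting_room.bind(po_box, "jason")
sorting_room.bind(audit_log, "jason")   # two queues may share a key
sorting_room.bind(audit_log, "maria")

delivered = sorting_room.publish("jason", "hello")   # reaches both queues
dropped = sorting_room.publish("nobody", "lost")     # no binding, no delivery
```

Publishers never name a queue directly, so consumers can be added or rearranged by changing bindings alone.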
ØMQ - http://zeromq.org/
  • High performance messaging broker which can speak AMQP, or you can use its own set of Python bindings to communicate via the library code.
  • Pro: A more flexible set of possible topologies (including brokerless/peer to peer, directory referral, and more).
  • Con: Misguided 'fail fast' implementation within the library
RabbitMQ
  • RabbitMQ <http://www.rabbitmq.com/> is conformant to the AMQP spec and provided the features I needed:
      • TLS protected communication
      • Authentication / Authorization
      • High reliability
      • Persistent messages
  • Broker is implemented in Erlang, but the implementation doesn't matter since the client side has py-amqplib.
amqplib / carrot
  • py-amqplib is a client library around the AMQP protocol.
  • Fairly low level for my needs though, so a little digging found carrot.
carrot sample
    >>> from carrot.messaging import Publisher, Consumer
    >>> class PostOfficePublisher(Publisher):
    ...     exchange = "sorting_room"
    ...     routing_key = "jason"
    >>> class PostOfficeConsumer(Consumer):
    ...     queue = "po_box"
    ...     exchange = "sorting_room"
    ...     routing_key = "jason"
    ...
    ...     def receive(self, message_data, message):
    ...         """Called when we receive a message."""
    ...         print("Received: %s" % message_data)
carrot sample
    >>> from ConfigParser import ConfigParser
    >>> config = ConfigParser()
    >>> config.read("application.ini")
    >>> from carrot.connection import AMQPConnection
    >>> amqpconn = AMQPConnection(
    ...     hostname=config.get("broker", "host"),
    ...     port=config.get("broker", "port"),
    ...     userid=config.get("broker", "userid"),
    ...     password=config.get("broker", "password"),
    ...     vhost=config.get("broker", "vhost"))
    >>> PostOfficePublisher(connection=amqpconn).send(
    ...     {"My message": ["foo", "bar", "baz"]})
    >>> PostOfficeConsumer(connection=amqpconn).next()
    Received: {"My message": ["foo", "bar", "baz"]}
multiprocessing
  • Part of the Python 2.6 standard library.
  • Main intent is to provide a process-based alternative to the threading library.
  • Provides some process coordination facilities, including a Queue object and a network-aware interprocess Manager object.
  • Pro: Part of standard library (2.6 and beyond)
  • Con: Pretty low level
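A small sketch of the Queue facility mentioned above: a parent process feeds numbers to one worker process and reads results back. It is written for Python 3 and assumes a Unix-like system, since it asks for the "fork" start method explicitly (get_context did not exist in 2.6). square_worker and run_demo are made-up names.

```python
import multiprocessing


def square_worker(jobs, results):
    """Read numbers from jobs until the None sentinel, push (n, n*n) to results."""
    for n in iter(jobs.get, None):
        results.put((n, n * n))


def run_demo(numbers):
    """Square each number in a separate process and return {n: n*n}."""
    ctx = multiprocessing.get_context("fork")  # assumption: Unix-like OS
    jobs, results = ctx.Queue(), ctx.Queue()
    worker = ctx.Process(target=square_worker, args=(jobs, results))
    worker.start()
    for n in numbers:
        jobs.put(n)
    jobs.put(None)  # sentinel: tells the worker to stop
    # Drain the results before joining so the worker is never blocked on a full pipe.
    squares = dict(results.get(timeout=10) for _ in numbers)
    worker.join(timeout=10)
    return squares


if __name__ == "__main__":
    print(run_demo([1, 2, 3]))
```

This shows why the slide calls it low level: sentinels, draining, and joining are all left to the caller.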
In Summary
  • I like beanstalkc.
  • I like AMQP (specifically RabbitMQ) along with the carrot API
  • Memcacheq would work well if all you need to do is cache jobs until you can process them in batch
  • Multiprocessing is worth a look
  • I've only scratched the surface (Kamaelia, sprinkle/STOMP, etc.)
