WORK QUEUE SYSTEMS
MOTIVATIONS 
Do work in the background 
Parallelize tasks 
Distribute work among many machines
DESIGN CONSIDERATIONS 
Expect failure and design accordingly (process crashes, 
machine reboots, network partition) 
Break work into small, bite-size tasks 
Idempotency: ensure nothing bad will happen if your job runs 
multiple times
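
One way to make a job idempotent is to record which units of work have already been applied and skip repeats. The following is a minimal sketch under that assumption; `CreditAccountJob`, the payment IDs, and the in-memory `Set` are all hypothetical stand-ins (a real system would keep the processed IDs in durable storage).

```ruby
require "set"

# Hypothetical job: credits an account at most once per payment ID.
class CreditAccountJob
  attr_reader :balances

  def initialize
    @balances = Hash.new(0)  # account => balance
    @processed = Set.new     # payment IDs that were already applied
  end

  # Safe to run more than once: a repeated payment_id is a no-op.
  def perform(payment_id, account, amount)
    # Set#add? returns nil when the ID is already present.
    return false unless @processed.add?(payment_id)
    @balances[account] += amount
    true
  end
end

job = CreditAccountJob.new
job.perform("pay-1", "alice", 50)
job.perform("pay-1", "alice", 50)  # duplicate delivery, ignored
job.balances["alice"]              # => 50
```

Because the second call changes nothing, a retry or a duplicate delivery from the queue is harmless.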
WORK DISTRIBUTION STRATEGIES
SINGLE MACHINE 
Distribute work to multiple worker threads or forked worker 
processes. 
Can easily parallelize work, but jobs go away if the process 
restarts 
Cannot distribute work to multiple machines this way 
IPC (Inter-Process Communication) is difficult to do right 
Big no-no for web apps (you want to offload work to a 
separate machine)
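
The worker-thread approach can be sketched with Ruby's built-in `Queue`. This is an illustration only; `run_in_threads` and its `worker_count` parameter are hypothetical names, and the jobs live purely in memory, so they disappear if the process restarts, as noted above.

```ruby
# Run each job (any callable) on a small pool of worker threads
# and return the results in the original order.
def run_in_threads(jobs, worker_count: 4)
  queue = Queue.new
  jobs.each_with_index { |job, i| queue << [i, job] }
  queue.close  # once drained, pop returns nil instead of blocking

  results = Array.new(jobs.size)
  workers = Array.new(worker_count) do
    Thread.new do
      while (item = queue.pop)
        index, job = item
        results[index] = job.call
      end
    end
  end
  workers.each(&:join)
  results
end

squares = run_in_threads((1..5).map { |n| -> { n * n } })
# squares => [1, 4, 9, 16, 25]
```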
MULTIPLE MACHINES 
Distribute work to workers on other machines directly over the 
network 
Ruby’s DRb can distribute work, but is unstable under high 
load 
A dedicated messaging system can be used to distribute work 
reliably 
Jobs are (usually) not persistent, so they can be lost if something 
crashes
PERSISTENT QUEUE 
Workers pull jobs from a persistent backend queue 
Suitable when many jobs need to be queued up and worked 
over time 
Jobs can still be lost if workers crash or the database hiccups 
“Reliable” queueing can recover jobs if workers crash
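
The idea behind "reliable" queueing can be sketched as follows: a job handed to a worker is not forgotten but moved to an in-flight list until the worker acknowledges it, so a crashed worker's job can be put back. The class below is an in-memory stand-in for a persistent backend, and every name in it (`ReliableQueue`, `reserve`, `ack`, `recover`) is an assumption made for illustration.

```ruby
# In-memory stand-in for a persistent backend queue.
class ReliableQueue
  def initialize
    @pending = []    # jobs waiting for a worker
    @in_flight = {}  # worker_id => job currently being worked on
  end

  def push(job)
    @pending.push(job)
  end

  # Hand the next job to a worker, but remember it until it is acknowledged.
  def reserve(worker_id)
    job = @pending.shift
    @in_flight[worker_id] = job if job
    job
  end

  # Called by the worker once the job has finished successfully.
  def ack(worker_id)
    @in_flight.delete(worker_id)
  end

  # Called by a supervisor when a worker is known to have crashed:
  # the unacknowledged job goes back to the front of the queue.
  def recover(worker_id)
    job = @in_flight.delete(worker_id)
    @pending.unshift(job) if job
    job
  end

  def pending
    @pending.dup
  end
end

queue = ReliableQueue.new
queue.push("resize-image-1")
queue.reserve("worker-a")  # worker-a takes the job, then crashes
queue.recover("worker-a")  # the job is available again
```

Since a recovered job may already have been partly executed, this pattern relies on jobs being idempotent.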
CAPABILITIES OF A (GOOD) WORK QUEUE SYSTEM
RETRIES 
Things go wrong all the time. You want jobs to be automatically 
retried.
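
Automatic retries are often paired with a growing delay between attempts. A minimal sketch, assuming exponential backoff and hypothetical names (`with_retries`, `max_attempts`, `base_delay`):

```ruby
# Run the block, retrying on StandardError with exponential backoff.
# Re-raises the last error once max_attempts has been reached.
def with_retries(max_attempts: 3, base_delay: 0.01)
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue StandardError
    raise if attempt >= max_attempts
    sleep(base_delay * (2**(attempt - 1)))  # base, 2x base, 4x base, ...
    retry
  end
end

# Example: a job that fails twice and succeeds on the third attempt.
result = with_retries do |attempt|
  raise "temporary failure" if attempt < 3
  "done"
end
# result => "done"
```

Work queue libraries typically also cap the total number of retries and park permanently failing jobs somewhere for inspection; that part is left out here.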
RELIABILITY 
Messages / Jobs should never be lost.
SCHEDULING 
Schedule a job to run at a certain time instead of running 
immediately.
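
Scheduling can be pictured as a holding area that releases jobs once their run time has passed. The sketch below is illustrative; `Scheduler`, `schedule`, and `due` are hypothetical names, and a real backend would persist the entries rather than keep them in an array.

```ruby
# Holds jobs until their run_at time has arrived.
class Scheduler
  def initialize
    @entries = []  # [run_at, job] pairs
  end

  def schedule(run_at, job)
    @entries << [run_at, job]
  end

  # Remove and return every job whose time has come, earliest first.
  def due(now = Time.now)
    ready, @entries = @entries.partition { |run_at, _job| run_at <= now }
    ready.sort_by(&:first).map(&:last)
  end
end

scheduler = Scheduler.new
scheduler.schedule(Time.now + 3600, "send-reminder")
scheduler.due  # nothing is due yet, so this returns an empty array
```

A worker loop would call `due` periodically and push the returned jobs onto the normal work queue.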
STATUS 
Report back to the application on the job’s completion 
percentage and whether it succeeded or failed.
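
Status reporting amounts to the job writing its progress somewhere the application can read it. A small sketch with a hypothetical `JobStatus` class, where a job is modelled as a list of callable steps:

```ruby
# Tracks state and completion percentage for one job.
class JobStatus
  attr_reader :state, :percent, :error

  def initialize
    @state = :queued
    @percent = 0
  end

  # Run the steps in order, updating the percentage after each one.
  def run(steps)
    @state = :working
    steps.each_with_index do |step, i|
      step.call
      @percent = ((i + 1) * 100) / steps.size
    end
    @state = :succeeded
  rescue StandardError => e
    @state = :failed
    @error = e.message
  end
end

status = JobStatus.new
status.run([-> { :download }, -> { :convert }])
status.state    # => :succeeded
status.percent  # => 100
```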
PRIORITY 
If your queue fills up, important jobs might be waiting in the back 
of the queue. A priority queue allows important jobs to go to the 
top so they can be executed ASAP.
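
A priority queue can be sketched as below. The class and method names are hypothetical, and the linear scan in `pop` is chosen for brevity; a heap or a backend with native priorities would be the usual choice for large queues.

```ruby
# Jobs with a higher priority are popped first; equal priorities
# are served in the order they were pushed.
class PriorityQueue
  def initialize
    @jobs = []
    @seq = 0  # tie-breaker that preserves first-in, first-out order
  end

  def push(job, priority: 0)
    @jobs << [-priority, @seq, job]
    @seq += 1
  end

  # Return the highest-priority job, or nil if the queue is empty.
  def pop
    entry = @jobs.min_by { |neg_priority, seq, _job| [neg_priority, seq] }
    return nil unless entry
    @jobs.delete_at(@jobs.index(entry))
    entry.last
  end
end

queue = PriorityQueue.new
queue.push("weekly-report")
queue.push("password-reset-email", priority: 10)
queue.pop  # => "password-reset-email"
```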
TYPES OF QUEUING BACKENDS
DEDICATED QUEUING SYSTEM 
Backend built specifically for the purpose of queueing 
Natively supports desired properties of queues 
Gearman: One of the originals. Out of date, not as fully featured 
as modern alternatives 
Beanstalkd: Very fully featured and well-maintained
GENERAL-PURPOSE DATABASE 
Simple to use if you’re already using a standard database 
May not scale to massive / high-throughput workloads 
SQL: May have locking / concurrency issues 
Document Store: Probably won’t provide reliability 
Redis: Swiss-Army Knife of key-value stores, used by Resque 
and Sidekiq. Everything has to fit in memory.
MESSAGING SYSTEM 
Provides generic message-passing capabilities (queues are 
just a special case) 
Very scalable and high-throughput 
Can be very complex to set up and use (topics, consumers, 
exchanges, brokers, OH MY) 
ActiveMQ, RabbitMQ, ZeroMQ, HornetQ 
Apache Kafka - distributed commit log
BATCH PROCESSING SYSTEM 
MapReduce on huge volumes of data 
Apache Hadoop 
Apache Spark 
Amazon Elastic MapReduce - hosted Hadoop
REALTIME PROCESSING SYSTEM 
Continual stream of input (firehose), need results within 
seconds or minutes 
Apache Storm
THIRD PARTY SERVICE 
IronMQ / IronWorker - reliable message queue service 
Amazon SQS: Scalable, but very bare-bones (lacks good Ruby 
worker client)
RUBY WORK QUEUE LIBRARIES 
A backend isn’t very useful without a good worker library to run 
the jobs. Often the library can provide capabilities that the 
backend does not.
RESQUE VS SIDEKIQ 
Resque forks workers, Sidekiq uses threads via Celluloid 
Both use Redis for the backend and are mostly compatible 
with each other 
Very fully featured (often via a separate gem) 
Both come with web UI to make it easier to monitor job status 
Sidekiq has a performance edge, and Sidekiq Pro offers 
reliability and batches
DELAYED JOB 
Uses Active Record, so it is easy to plug into an existing Rails app 
Fairly well supported in the community 
Alternatives that take advantage of PostgreSQL advanced 
features: Queue Classic, Que, Toro
IN-MEMORY 
Sucker Punch and Threaded In Memory Queue run workers in 
the same process (in background threads) and distribute the 
jobs directly to these workers.
HONORABLE MENTIONS 
Sneakers - RabbitMQ 
Backburner - Beanstalkd 
TorqueBox Backgroundable (JRuby-only) 
Qu - Supports multiple backends (Redis, MongoDB, SQS). Not 
as well maintained or fully featured.
ADAPTERS 
You may want to change queueing backends / libraries without 
rewriting all your jobs. 
MultiWorker - Adapts all the libraries mentioned in this 
presentation 
ActiveJob - Built into Rails 4.2.0 (beta), but can be used as a 
separate gem
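
The adapter idea can be illustrated with plain Ruby. The sketch below is not the API of MultiWorker or ActiveJob; `Jobs`, `InlineAdapter`, `ThreadAdapter`, and `GreetJob` are hypothetical names showing how job classes can stay unchanged while the thing that runs them is swapped.

```ruby
# Runs the job immediately in the calling thread.
class InlineAdapter
  def enqueue(job_class, *args)
    job_class.new.perform(*args)
  end
end

# Runs the job in a background thread and returns the Thread.
class ThreadAdapter
  def enqueue(job_class, *args)
    Thread.new { job_class.new.perform(*args) }
  end
end

# Single entry point used by application code.
module Jobs
  class << self
    attr_accessor :adapter
  end
  self.adapter = InlineAdapter.new

  def self.enqueue(job_class, *args)
    adapter.enqueue(job_class, *args)
  end
end

# A job only needs to respond to #perform.
class GreetJob
  def perform(name)
    "Hello, #{name}"
  end
end

Jobs.enqueue(GreetJob, "Ada")     # => "Hello, Ada"
Jobs.adapter = ThreadAdapter.new  # switch backends without touching GreetJob
Jobs.enqueue(GreetJob, "Ada").value
```

Application code calls only `Jobs.enqueue`, so changing the queueing library means writing one new adapter rather than rewriting every job.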
