Work Queue Systems

MOTIVATIONS
Do work in the background
Parallelize tasks
Distribute work among many machines

DESIGN CONSIDERATIONS
Expect failure and design accordingly (process crashes,
machine reboots, network partitions)
Break work into small, bite-size tasks
Idempotency: ensure nothing bad will happen if your job runs
multiple times
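
A minimal sketch of an idempotent job, assuming a hypothetical WelcomeEmailJob and an in-memory Set standing in for a durable "already done" record. Running the job twice for the same user has the same effect as running it once.

```ruby
require "set"

# Hypothetical job: delivers a welcome email at most once per user.
class WelcomeEmailJob
  attr_reader :deliveries

  def initialize(sent_log = Set.new)
    @sent_log = sent_log # assumption: stands in for a durable record
    @deliveries = []     # stands in for the real side effect
  end

  def perform(user_id)
    # Set#add? returns nil when the element is already present,
    # so a repeated run becomes a no-op.
    return :skipped unless @sent_log.add?(user_id)

    @deliveries << user_id
    :sent
  end
end

job = WelcomeEmailJob.new
first = job.perform(42)  # first run does the work
second = job.perform(42) # a duplicate run is skipped
```

In a real system the record would need to live somewhere that survives restarts, such as the application database.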

SINGLE MACHINE
Distribute work to multiple worker threads or forked worker
processes.
Can easily parallelize work, but jobs go away if the process
restarts
Cannot distribute work to multiple machines this way
IPC (Inter-Process Communication) is difficult to do right
Big no-no for web apps (you want to offload work to a
separate machine)
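
A rough sketch of the worker-thread approach using Ruby's built-in Queue. The pool size of 4 and the squaring "work" are placeholders. The queue lives only in this process's memory, so any jobs still waiting are gone if the process restarts.

```ruby
jobs = Queue.new    # thread-safe FIFO from Ruby's core library
results = Queue.new

# Start a small pool of worker threads.
workers = Array.new(4) do
  Thread.new do
    # Queue#pop blocks until a job arrives; nil is used as a stop signal.
    while (n = jobs.pop)
      results << n * n
    end
  end
end

(1..10).each { |n| jobs << n }
workers.size.times { jobs << nil } # one stop signal per worker
workers.each(&:join)

squares = []
squares << results.pop until results.empty?
squares.sort! # workers finish in no particular order
```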

MULTIPLE MACHINES
Distribute work to workers on other machines directly over the
network
Ruby’s DRb can distribute work, but is unstable under high
load
A dedicated messaging system can be used to distribute work
reliably
Jobs are (usually) not persistent so can be lost if something
crashes

PERSISTENT QUEUE
Workers pull jobs from a persistent backend queue
Suitable when many jobs need to be queued up and worked
over time
Jobs can still be lost if workers crash or the database hiccups
“Reliable” queueing can recover jobs if workers crash
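
One way to picture "reliable" queueing: a reserved job is moved to an in-progress list instead of being deleted, and is put back if its worker dies before acknowledging it. The ReliableQueue class below is a hypothetical in-memory stand-in; a real system would keep both lists in the persistent backend.

```ruby
# Hypothetical in-memory stand-in for a reliable queue.
class ReliableQueue
  def initialize
    @ready = []    # jobs waiting to be picked up
    @reserved = {} # worker id => job currently being worked
  end

  def push(job)
    @ready.push(job)
  end

  # Hands a job to a worker but keeps a record of it.
  def reserve(worker_id)
    job = @ready.shift
    @reserved[worker_id] = job if job
    job
  end

  # The worker finished; only now is the job forgotten.
  def ack(worker_id)
    @reserved.delete(worker_id)
  end

  # The worker was found dead; put its job back at the front.
  def recover(worker_id)
    job = @reserved.delete(worker_id)
    @ready.unshift(job) if job
    job
  end

  def size
    @ready.size
  end
end

queue = ReliableQueue.new
queue.push("resize-image-1")
queue.reserve("worker-a")          # worker-a crashes before calling ack
size_while_reserved = queue.size
queue.recover("worker-a")          # a monitor notices and requeues the job
retried = queue.reserve("worker-b")
queue.ack("worker-b")
```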

CAPABILITIES OF A (GOOD) WORK QUEUE SYSTEM

RETRIES
Things go wrong all the time. You want jobs to be automatically
retried.
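
A minimal sketch of automatic retries with exponential backoff. The helper name with_retries and its parameters are made up for illustration; work queue libraries typically ship their own retry mechanism.

```ruby
# Hypothetical helper: runs the block, retrying on failure.
def with_retries(max_attempts: 3, base_delay: 0.01)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue StandardError
    raise if attempts >= max_attempts # give up and surface the error

    sleep(base_delay * (2**(attempts - 1))) # wait longer after each failure
    retry
  end
end

calls = 0
result = with_retries(max_attempts: 5) do |attempt|
  calls += 1
  raise "transient failure" if attempt < 3 # fails twice, then succeeds

  "done on attempt #{attempt}"
end
```

Retries are only safe when the job is idempotent, since a job that failed halfway may already have done part of its work.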

RELIABILITY
Messages / Jobs should never be lost.

SCHEDULING
Schedule a job to run at a certain time instead of running
immediately.
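
Scheduling can be sketched as storing a run-at time with each job and only handing out jobs whose time has come. ScheduledQueue and its method names are hypothetical.

```ruby
# Hypothetical scheduler: jobs become available once their time arrives.
class ScheduledQueue
  Job = Struct.new(:name, :run_at)

  def initialize
    @jobs = []
  end

  def enqueue_at(run_at, name)
    @jobs << Job.new(name, run_at)
  end

  # Removes and returns the names of all jobs due at `now`, earliest first.
  def pop_due(now)
    due, @jobs = @jobs.partition { |job| job.run_at <= now }
    due.sort_by(&:run_at).map(&:name)
  end
end

now = Time.at(1_000) # fixed time so the example is repeatable
queue = ScheduledQueue.new
queue.enqueue_at(now + 60, "send-reminder")
queue.enqueue_at(now, "send-receipt")

due_now = queue.pop_due(now)        # only the receipt is due
due_later = queue.pop_due(now + 60) # a minute later the reminder is due
```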

STATUS
Report back to the application on the job’s completion
percentage and whether it succeeded or failed.
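
A small sketch of status reporting, assuming a hypothetical JobStatus object that the application can poll. A real library would store this in the backend so other processes can read it.

```ruby
# Hypothetical status record for a single job.
class JobStatus
  attr_reader :state, :percent, :error

  def initialize
    @state = :queued
    @percent = 0
    @error = nil
  end

  # Runs the block once per step and updates progress after each step.
  def track(total_steps)
    @state = :running
    total_steps.times do |i|
      yield i
      @percent = ((i + 1) * 100) / total_steps
    end
    @state = :succeeded
  rescue StandardError => e
    @state = :failed
    @error = e.message
  end
end

ok = JobStatus.new
ok.track(4) { |i| i }

bad = JobStatus.new
bad.track(4) { |i| raise "boom" if i == 2 } # fails on the third step
```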

PRIORITY
If your queue fills up, important jobs might be waiting in the back
of the queue. A priority queue allows important jobs to go to the
top so they can be executed ASAP.
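
A simple illustration of priority ordering, assuming lower numbers mean higher priority and that jobs of equal priority run in arrival order. SimplePriorityQueue is a hypothetical class, and its linear scan is only suitable for small queues.

```ruby
# Hypothetical priority queue: lowest priority number is popped first.
class SimplePriorityQueue
  def initialize
    @items = []
    @sequence = 0
  end

  def push(job, priority:)
    # The sequence number breaks ties so equal priorities stay first-in, first-out.
    @items << [priority, @sequence, job]
    @sequence += 1
  end

  def pop
    entry = @items.min # arrays compare by priority, then by sequence
    return nil unless entry

    @items.delete_at(@items.index(entry))
    entry.last
  end
end

queue = SimplePriorityQueue.new
queue.push("nightly-report", priority: 10)
queue.push("password-reset", priority: 1)
queue.push("welcome-email", priority: 1)

order = [queue.pop, queue.pop, queue.pop]
```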

DEDICATED QUEUING SYSTEM
Backend built specifically for the purpose of queueing
Natively supports desired properties of queues
Gearman: One of the originals. Out of date, not as fully featured
as modern alternatives
Beanstalkd: Very fully featured and well-maintained

GENERAL-PURPOSE DATABASE
Simple to use if you’re already using a standard database
May not scale to massive / high-throughput workloads
SQL: May have locking / concurrency issues
Document Store: Probably won’t provide reliability
Redis: Swiss-Army Knife of key-value stores, used by Resque
and Sidekiq. Everything has to fit in memory.

MESSAGING SYSTEM
Provides generic message-passing capabilities (queues are
just a special case)
Very scalable and high-throughput
Can be very complex to set up and use (topics, consumers,
exchanges, brokers, OH MY)
ActiveMQ, RabbitMQ, ZeroMQ, HornetQ
Apache Kafka - distributed commit log

BATCH PROCESSING SYSTEM
MapReduce on huge volumes of data
Apache Hadoop
Apache Spark
Amazon Elastic MapReduce - hosted Hadoop

REAL-TIME PROCESSING SYSTEM
Continual stream of input (firehose), need results within
seconds or minutes
Apache Storm

THIRD PARTY SERVICE
IronMQ / IronWorker - reliable message queue service
Amazon SQS: Scalable, but very bare-bones (lacks a good Ruby
worker client)

RUBY WORK QUEUE LIBRARIES
A backend isn’t very useful without a good worker library to run
the jobs. Often the library can provide capabilities that the
backend does not.

RESQUE VS SIDEKIQ
Resque forks workers, Sidekiq uses threads via Celluloid
Both use Redis for the backend and are mostly compatible
with each other
Both are very fully featured (often via separate gems)
Both come with a web UI to make it easier to monitor job status
Sidekiq has a performance edge, and Sidekiq Pro offers
reliability and batches

DELAYED JOB
Uses Active Record, so it is easy to plug into an existing Rails app
Fairly well supported in the community
Alternatives that take advantage of PostgreSQL advanced
features: Queue Classic, Que, Toro

IN-MEMORY
Sucker Punch and Threaded In Memory Queue run workers in
the same process (in background threads) and distribute the
jobs directly to these workers.

HONORABLE MENTIONS
Sneakers - RabbitMQ
Backburner - Beanstalkd
TorqueBox Backgroundable (JRuby-only)
Qu - Supports multiple backends (Redis, MongoDB, SQS). Not
as well maintained or fully-featured.

ADAPTERS
You may want to change queueing backends / libraries without
rewriting all your jobs.
MultiWorker - Adapts all the libraries mentioned in this
presentation
ActiveJob - Built into Rails 4.2.0 (beta), but can be used as a
separate gem
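
The adapter idea can be sketched in plain Ruby: jobs are enqueued through one interface and the backend is swapped by changing a single setting. All names below (JobDispatch, InlineAdapter, RecordingAdapter, GreetJob) are hypothetical and are not the API of MultiWorker or ActiveJob.

```ruby
# Hypothetical adapter that runs the job immediately in-process.
class InlineAdapter
  def enqueue(job_class, *args)
    job_class.new.perform(*args)
  end
end

# Hypothetical adapter that only records what was enqueued;
# a real one would push to Redis, a database, and so on.
class RecordingAdapter
  attr_reader :enqueued

  def initialize
    @enqueued = []
  end

  def enqueue(job_class, *args)
    @enqueued << [job_class, args]
  end
end

# Single entry point used by application code.
module JobDispatch
  class << self
    attr_accessor :adapter
  end

  def self.enqueue(job_class, *args)
    adapter.enqueue(job_class, *args)
  end
end

class GreetJob
  def perform(name)
    "Hello, #{name}"
  end
end

JobDispatch.adapter = InlineAdapter.new
inline_result = JobDispatch.enqueue(GreetJob, "Ada")

recorder = RecordingAdapter.new
JobDispatch.adapter = recorder # switch backends without touching GreetJob
JobDispatch.enqueue(GreetJob, "Ada")
```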
