Advanced Task Management in Celery


           Mahendra M
           @mahendra
    https://github.com/mahendra
@mahendra
●   Python developer for 6 years
●   FOSS enthusiast/volunteer for 14 years
    ●   Bangalore LUG and Infosys LUG
    ●   FOSS.in and LinuxBangalore/200x
●   Celery user for 3 years
●   Contributions
    ●   patches, testing new releases
    ●   Zookeeper msg transport for kombu
    ●   Kafka support (in-progress)
Quick Intro to Celery
●   Asynchronous task/job queue
●   Uses distributed message passing
●   Tasks are run asynchronously on worker nodes
●   Results are passed back to the caller (if any)
Overview

                       +--> Worker 1
                       |
    Sender --> Msg Q --+--> Worker 2
                       |      ...
                       +--> Worker N
Sample Code
from celery.task import task


@task
def add(x, y):
    return x + y


# Queue the task on a worker, then block until the result arrives
result = add.delay(5, 6)
result.get()   # returns 11
Uses of Celery
●   Asynchronous task processing
●   Handling long-running / heavy jobs
    ●   Image resizing, video transcoding, PDF generation
●   Offloading heavy web backend operations
●   Scheduling tasks to be run at a particular time
    ●   Cron for Python (a minimal sketch follows)
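
A sketch of the "cron" use case, assuming the same era of Celery as the
other examples (celery.task.periodic_task); the task name and schedule are
illustrative, and the celerybeat scheduler must be running:

from celery.schedules import crontab
from celery.task import periodic_task


# Runs every day at 07:30, dispatched by celerybeat
@periodic_task(run_every=crontab(hour=7, minute=30))
def nightly_report():
    ...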
Advanced Uses
●   Task Routing
●   Task retries, timeouts and revoking
●   Task Canvas – combining tasks
    ●   Task co-ordination
    ●   Dependencies
    ●   Task trees or graphs
    ●   Batch tasks
    ●   Progress monitoring
●   Tricks
    ●   DB conflict management
Sending tasks to a particular worker

                      +-- windows --> Worker 1 (Windows)
    Sender --> Msg Q -+-- windows --> Worker 2 (Windows)
                      |     ...
                      +-- linux ----> Worker N (Linux)
Routing tasks – Use cases
●   Priority execution
●   Based on hardware capabilities
    ●   Special cards available for video capture
    ●   Making use of GPUs (CUDA)
●   Based on OS (e.g. PlayReady encryption)
●   Based on location
    ●   Moving compute closer to data (Hadoop-ish)
    ●   Sending tasks to different data centers
●   Sequencing operations (CouchDB conflicts)
Sample Code
from celery.task import task


@task(queue='windows')
def drm_encrypt(audio_file, key_phrase):
    ...


r = drm_encrypt.apply_async(args=[afile, key],
                            queue='windows')


# Start a celery worker that consumes only the 'windows' queue
$ celery worker -Q windows
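
Routing can also be configured centrally instead of per call. A minimal
sketch, assuming the Celery 3.x CELERY_ROUTES setting (the module path
'tasks.drm_encrypt' is illustrative):

# In the Celery configuration module
CELERY_ROUTES = {
    'tasks.drm_encrypt': {'queue': 'windows'},
}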
Retrying tasks
@task(default_retry_delay=60,
      max_retries=3)
def drm_encrypt(audio_file, key_phrase):
    try:
        playready.encrypt(...)
    except Exception, exc:
        raise drm_encrypt.retry(exc=exc, countdown=5)
Retrying tasks
●   You can specify the number of times a task can
    be retried
●   The cases for retrying a task must be handled in
    code; Celery will not do it automatically
●   Tasks should be designed to be idempotent
    (see the sketch below)
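
A sketch of idempotent task design: re-running after a partial failure
must not repeat completed work. The get_job / mark_encrypted helpers are
hypothetical; only the retry pattern comes from the slides above:

from celery.task import task


@task(max_retries=3)
def drm_encrypt(audio_file, key_phrase):
    job = get_job(audio_file)       # hypothetical job record
    if job.already_encrypted:       # work already done?
        return job.output_path      # then a retry is a no-op
    try:
        output = playready.encrypt(audio_file, key_phrase)
        job.mark_encrypted(output)  # record completion
        return output
    except Exception, exc:
        raise drm_encrypt.retry(exc=exc, countdown=5)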
Handling worker failures
@task(acks_late=True)
def drm_encrypt(audio_file, key_phrase):
    try:
        playready.encrypt(...)
    except Exception, exc:
        raise drm_encrypt.retry(exc=exc, countdown=5)



●   This is used where the task must be re-sent in case of
    worker or node failure
●   The ack message is sent to the message queue only after
    the task finishes executing (config sketch below)
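
Late acking can also be turned on globally. A one-line sketch, assuming
the Celery 3.x setting name:

# In the Celery configuration module: ack after execution for all tasks
CELERY_ACKS_LATE = True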
Worker processes

                      +-- windows --> Worker 1 (Windows)
    Sender --> Msg Q -+-- windows --> Worker 2 (Windows)
                      |     ...
                      +-- linux ----> Worker N (Linux)
                                        +-- Process 1
                                        +-- Process 2
                                              ...
                                        +-- Process N
Worker process
●   On every worker node, celery starts a pool of
    worker processes
●   The number is determined by the concurrency
    setting (or autodetected, for full CPU usage)
●   Each process can be configured to restart
    after running x number of tasks
    ●   Disabled by default
●   Alternatively, eventlet can be used instead of
    processes (discussed later; CLI sketch below)
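
A sketch of the corresponding worker options, assuming Celery 3.x flag
spellings (they changed in later releases):

# Fixed pool of 8 worker processes
$ celery worker --concurrency=8

# Recycle each pool process after 100 tasks
$ celery worker --maxtasksperchild=100

# Use an eventlet pool instead of processes
$ celery worker -P eventlet --concurrency=1000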
Revoking tasks
celery.control.revoke(task_id,
                      terminate=False,
                      signal='SIGKILL')
●   revoke() works by sending a broadcast
    message to all workers
●   If a task has not yet run, workers will keep this
    task_id in memory and ensure that it does not
    run
●   If a task is running, revoke() will not work
    unless terminate = True (example below)
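
The same broadcast can be issued from an AsyncResult handle. A minimal
sketch (the task invocation is illustrative):

r = drm_encrypt.apply_async(args=[afile, key])

# Revoke before it runs: workers remember the id and skip it
r.revoke()

# Forcefully stop it even if it is already executing
r.revoke(terminate=True, signal='SIGKILL')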
Task expiration
task.apply_async(expires=x)
        x can be
        * a number of seconds
        * a specific datetime()


●   Global time limits can be configured in settings
    (sketch below)
    ●   Soft time limit – the task receives an exception
        which can be used to clean up
    ●   Hard time limit – the process running the task is
        killed and replaced with another one
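
A sketch of both mechanisms, assuming Celery 3.x setting names:

from datetime import datetime, timedelta

# Discard the task if it has not started within 60 seconds
drm_encrypt.apply_async(args=[afile, key], expires=60)

# ... or by an absolute deadline
drm_encrypt.apply_async(args=[afile, key],
                        expires=datetime.utcnow() + timedelta(hours=1))

# In the Celery configuration module (values in seconds)
CELERYD_TASK_SOFT_TIME_LIMIT = 300   # raises SoftTimeLimitExceeded
CELERYD_TASK_TIME_LIMIT = 360        # kills the pool process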
Handling soft time limit
from celery.exceptions import SoftTimeLimitExceeded


@task()
def drm_encrypt(audio_file, key_phrase):
    try:
        setup_tmp_files()
        playready.encrypt(...)
    except SoftTimeLimitExceeded:
        cleanup_tmp_files()
    except Exception, exc:
        raise drm_encrypt.retry(exc=exc, countdown=5)
Task Canvas
●   Chains – linking one task to another
●   Groups – execute several tasks in parallel
●   Chord – execute a task after a set of tasks has
    finished
●   Map and starmap – similar to the map() function
●   Chunks – divide an iterable of work into chunks
●   Chunks + chord/chain can be used for map-
    reduce
                Best shown in a demo (sketch below)
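
In lieu of the demo, a sketch of the canvas primitives, assuming Celery
3.x signatures (the add and tsum tasks are illustrative):

from celery import chain, group, chord

# Chain: add(2, 2), then add 4 to the result, then add 8
chain(add.s(2, 2), add.s(4), add.s(8))()

# Group: run ten adds in parallel
group(add.s(i, i) for i in range(10))()

# Chord: run tsum once every add in the group has finished
chord(add.s(i, i) for i in range(10))(tsum.s())

# Chunks: split 100 pairs of arguments into 10 tasks of 10 pairs each
add.chunks(zip(range(100), range(100)), 10)()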
Task trees

[ task 1 ] --- spawns --- [ task 2 ] ---- spawns -->   [ task 2_1 ]
                  |                                    [ task 2_3 ]
                  |
                  +------ [ task 3 ] ---- spawns -->   [ task 3_1 ]
                  |                                    [ task 3_2 ]
                  |
                  +------ [ task 4 ] ---- links ---> [ task 5 ]
                                                         |(spawns)
                                                         |
                                                         |
                          [ task 8 ] <--- links <--- [ task 6 ]
                                                         |(spawns)
                                                     [ task 7 ]
Task Trees
●   Home-grown solution (our current approach)
    ●   Use db models and keep track of trees
●   Better approach (a stock-Celery sketch follows)
    ●   Use celery-tasktree
    ●   http://pypi.python.org/pypi/celery-tasktree
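
Without the extra dependency, small trees can also be expressed with the
stock link= option from the canvas. A minimal sketch (task_4, task_5 and
task_6 are the illustrative names from the tree diagram above):

sig5 = task_5.s()
sig5.link(task_6.s())           # task_6 runs after task_5 succeeds
task_4.apply_async(link=sig5)   # task_5 runs after task_4 succeeds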
Celery Batches
●   Collect jobs and execute them in a batch
●   Can be used for stats collection
●   Batch execution is done once
    ●   a configured timeout is reached, OR
    ●   a configured number of tasks has been received
●   Useful for reducing network and db loads
Celery Batches
from celery.contrib.batches import Batches


@task(base=Batches, flush_every=50, flush_interval=10)
def collect_stats(requests):
    items = {}
    for request in requests:
        item_id = request.kwargs['item_id']
        items[item_id] = get_obj(item_id)
        items[item_id].count += 1
    # Sync to db


collect_stats.delay(item_id=45)
collect_stats.delay(item_id=57)
Celery monitoring
●   Celery Flower
    https://github.com/mher/flower
●   Django admin monitor
●   jobtastic
    http://pypi.python.org/pypi/jobtastic
Celery deployment
●   Cyme – celery instance manager
    https://github.com/celery/cyme
●   Celery autoscaling
●   Use the celery eventlet pool where required
