Django Celery
celery
distributed task




                   @matclayton
warning
background
what is celery?

  "celery is an open source asynchronous task queue/job queue
  based on distributed message passing. It is focused ..."




but what does that
you can run


• distributed
• concurrently
• in the background
use cases


  • external api calls (twitter)
  • long running tasks (transcoding)
  • concurrent execution (batch image
    resize)
  • load balancing (across servers)
sql

      django   celery

      ORM      carrot

      SQL      AMQP or STOMP
result


•   database
•   AMQP
•   cache
•   tokyo tyrant
•   redis
•   mongodb
components


1. views.py / management
   command
2. broker – RabbitMQ
3. workers
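
The three components can be mimicked in one process with the standard library. This is only an analogy, not how celery is implemented: a `queue.Queue` stands in for the broker, a thread stands in for a worker, and the hypothetical `publish` helper plays the part of the view or management command.

```python
import queue
import threading

broker = queue.Queue()   # stand-in for the broker (RabbitMQ in the slides)
results = {}             # stand-in for a result backend


def add(x, y):
    return x + y


def worker():
    """Consume messages until a None sentinel arrives."""
    while True:
        message = broker.get()
        if message is None:
            break
        task_id, func, args = message
        results[task_id] = func(*args)


def publish(task_id, func, *args):
    """What a view would do: enqueue the job and return at once."""
    broker.put((task_id, func, args))


consumer = threading.Thread(target=worker)
consumer.start()
publish("job-1", add, 4, 4)
broker.put(None)   # tell the worker to stop
consumer.join()
print(results["job-1"])
```

The caller never runs `add` itself; it only puts a message on the queue, which is the property that keeps a web request fast.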
workflow




http://robertpogorzelski.com/blog/2009/09/10/rabbitmq-celery-
rabbits and
warrens
setting up the

 $ sudo apt-get install rabbitmq-server
 $ sudo pip install celery


$ rabbitmqctl add_user myuser mypassword
$ rabbitmqctl add_vhost myvhost
$ rabbitmqctl set_permissions -p
setup
        # settings.py
        INSTALLED_APPS += ("djcelery", )
        BROKER_HOST = "localhost"
        BROKER_PORT = 5672
        BROKER_USER = "myuser"
        BROKER_PASSWORD = "mypassword"
        BROKER_VHOST = "myvhost"

        CELERY_QUEUES = {
            "regular_tasks": {
                "binding_key": "task.#",
            },
            "twitter_tasks": {
                "binding_key": "twitter.#",
            },
            "feed_tasks": {
                "binding_key": "feed.#",
            },
        }


        $ python manage.py celeryd -B
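
A rough sketch of how the binding keys above pick a queue, assuming the queues are bound to an AMQP topic exchange (the `#` wildcard suggests this, but the slide does not say so). In topic matching, `*` stands for exactly one dot-separated word and `#` for zero or more. `topic_matches` and `queues_for` are hypothetical helpers, not part of celery.

```python
def topic_matches(binding_key, routing_key):
    """Return True if an AMQP-style topic binding key matches a routing key."""
    def match(pattern, words):
        if not pattern:
            return not words
        head, rest = pattern[0], pattern[1:]
        if head == "#":
            # '#' may swallow any number of words, including none.
            return any(match(rest, words[i:]) for i in range(len(words) + 1))
        if not words:
            return False
        return head in ("*", words[0]) and match(rest, words[1:])

    return match(binding_key.split("."), routing_key.split("."))


# Same queue definitions as on the slide.
CELERY_QUEUES = {
    "regular_tasks": {"binding_key": "task.#"},
    "twitter_tasks": {"binding_key": "twitter.#"},
    "feed_tasks": {"binding_key": "feed.#"},
}


def queues_for(routing_key):
    """List the queues whose binding key matches the routing key."""
    return sorted(
        name for name, opts in CELERY_QUEUES.items()
        if topic_matches(opts["binding_key"], routing_key)
    )


print(queues_for("twitter.updatestatus"))  # ['twitter_tasks']
print(queues_for("feed.periodic_import"))  # ['feed_tasks']
```

This is why the tasks on the later slides set routing keys such as `twitter.updatestatus` and `feed.import`: the prefix decides which queue receives the message.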
hello

    # tasks.py
    from celery.decorators import task

    @task
    def add(x, y):
        return x + y




    >>> result = add.delay(4, 4)
    >>> result.wait() # wait
    8
post to

# tasks.py
from celery.task import Task

class UpdateStatus(Task):
    name = "twitter.updatestatus"
    routing_key = 'twitter.updatestatus'
    ignore_result = True

    def run(self, tweet, **kwargs):
        post_to_twitter(tweet)


# views.py
from twitter.tasks import UpdateStatus
UpdateStatus.delay(tweet="hello world")
retry / rate
# tasks.py
from celery.task import Task

class UpdateStatus(Task):
    name = "twitter.updatestatus"
    routing_key = 'twitter.updatestatus'
    ignore_result = True
    default_retry_delay = 5 * 60  # seconds between retries
    max_retries = 12 # 1 hour retry
    rate_limit = "10/s"

    def run(self, tweet, **kwargs):
        try:
            post_to_twitter(tweet)
        except Exception as exc:
            # If twitter crashes retry
            self.retry([tweet], kwargs, exc=exc)

# views.py
from twitter.tasks import UpdateStatus
UpdateStatus.delay(tweet="hello world")
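
The "1 hour retry" comment follows from the two settings: 12 retries spaced 5 minutes apart. The sketch below shows the same fixed-delay retry idea in plain Python. It is not celery's mechanism (celery re-queues the task instead of sleeping inside the worker), and `run_with_retries` and `flaky_post` are hypothetical names.

```python
import time


def run_with_retries(func, max_retries, retry_delay, sleep=time.sleep):
    """Call func until it succeeds, retrying up to max_retries times.

    Returns (result, retries_used). The last exception propagates once
    the retry budget is exhausted.
    """
    retries = 0
    while True:
        try:
            return func(), retries
        except Exception:
            if retries >= max_retries:
                raise
            retries += 1
            sleep(retry_delay)


# Retry window for the settings on the slide.
default_retry_delay = 5 * 60
max_retries = 12
print(default_retry_delay * max_retries / 60)  # 60.0 minutes

# Usage: a call that fails twice before succeeding. Delays are recorded
# instead of slept so the example finishes immediately.
attempts = []


def flaky_post():
    attempts.append(1)
    if len(attempts) < 3:
        raise IOError("twitter is down")
    return "posted"


waits = []
result, used = run_with_retries(flaky_post, max_retries, default_retry_delay,
                                sleep=waits.append)
print(result, used, waits)  # posted 2 [300, 300]
```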
podcast


# tasks.py
from datetime import timedelta

from celery.task import PeriodicTask

class FeedImportPeriodicTask(PeriodicTask):
    run_every = timedelta(hours=1)
    routing_key = 'feed.periodic_import'

    def run(self, **kwargs):
        logger = self.get_logger(**kwargs)
        logger.info("Running Periodic Feed Import task!")
        update_podcasts(silent=False)

# tasks.py (continued)
from celery.task import Task
from django.core.cache import cache

class FeedImporter(Task):
    name = "feed.import"
    routing_key = 'feed.import'
    ignore_result = True
    default_retry_delay = 5 * 60 # retry in 5 minutes
    max_retries = 72 # 6 Hours to cover major outages

    def run(self, podcast_id, **kwargs):
        # Create the logger first so the except clause can always use it.
        logger = self.get_logger(**kwargs)
        try:
            # The cache key consists of the task name and the podcast id.
            lock_id = "%s-lock-%s" % (self.name, podcast_id)
            is_locked = lambda: str(cache.get(lock_id)) == "true"
            # The lock expires by itself after 300 seconds.
            acquire_lock = lambda: cache.set(lock_id, "true", 300)
            # memcache delete is very slow, so we'd rather set a false value
            # with a very low expiry time.
            release_lock = lambda: cache.set(lock_id, "nil", 1)

            logger.debug("Trying to import feed: %s" % podcast_id)
            if is_locked():
                logger.debug("Feed %s is already being imported by another worker" % podcast_id)
                return

            acquire_lock()
            try:
                import_feed(logger, podcast_id)
            finally:
                release_lock()
        except Exception as exc:
            logger.error(exc)
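
One caveat with a check-then-set lock: two workers can both see the lock as free before either sets it. A common alternative is an atomic "add if absent" operation, which Django's cache API exposes as `cache.add`. The sketch below is not the author's code. It uses a hypothetical in-memory cache so it runs without Django or memcached, and it leaves out the expiry timeout for brevity.

```python
import threading


class InMemoryCache:
    """Hypothetical stand-in for a cache backend with an atomic add()."""

    def __init__(self):
        self._data = {}
        self._guard = threading.Lock()

    def add(self, key, value):
        """Store value only if key is absent; return True if it was stored."""
        with self._guard:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def delete(self, key):
        with self._guard:
            self._data.pop(key, None)


cache = InMemoryCache()


def import_once(podcast_id, do_import):
    """Run do_import(podcast_id) unless another caller holds the lock."""
    lock_id = "feed.import-lock-%s" % podcast_id
    if not cache.add(lock_id, "true"):
        return False   # someone else is importing this feed
    try:
        do_import(podcast_id)
        return True
    finally:
        cache.delete(lock_id)


imported = []
print(import_once(42, imported.append))    # True
cache.add("feed.import-lock-7", "true")    # simulate another worker's lock
print(import_once(7, imported.append))     # False
print(imported)                            # [42]
```

Because testing and setting happen in one call, only one caller can win the lock. With a real cache a timeout would still be needed so that a crashed worker cannot hold the lock forever.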
typical




• running out of disk space ==
  rabbitmq fail
• queue priorities, difficult
• non-pickle-able errors
• crashing consumers
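
The "non-pickle-able errors" item can be reproduced with the standard library alone. The sketch assumes, as the bullet implies, that a task's exception gets pickled when it is stored or sent back. `FeedError`, `is_picklable` and `to_safe_error` are hypothetical names used only for illustration.

```python
import pickle
import threading


class FeedError(Exception):
    """Hypothetical error that carries an unpicklable attribute."""

    def __init__(self, message, lock):
        super().__init__(message)
        self.lock = lock   # thread locks cannot be pickled


def is_picklable(obj):
    """Return True if pickle can serialise obj."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


def to_safe_error(exc):
    """Reduce any exception to a plain one built from its string form."""
    return RuntimeError("%s: %s" % (type(exc).__name__, exc))


bad = FeedError("import failed", threading.Lock())
print(is_picklable(bad))                  # False
print(is_picklable(to_safe_error(bad)))   # True
```

Raising only exceptions whose attributes are simple values (strings, numbers) sidesteps the problem at the source.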
other cool


   •   tasksets / callbacks
   •   remote control tasks
   •   abortable tasks
   •   eta – run tasks at a set time
   •   HttpDispatchTask
   •   expiring tasks
   •   celerymon
   •   celeryev
   •   ajax views
finding




• http://github.com/ask/celery
• http://github.com/ask/django-celery
• irc.freenode.net #celery (asksol owner, always helpful and about)
@matclayton
mat@mixcloud.com
