Playing tasks with Django & Celery
Mauro Rocco
@fireantology
EuroPython 2011 – Florence
About me
● I'm a Web Developer
● Python, Javascript, PHP, Java/Android
● Celery contributor (just one of the hundreds)
About Jamendo
● Jamendo is a community of free, legal and unlimited music published
under Creative Commons licenses
● Free Music for users
● Popularity and earnings for artists
● Music licensing and background music at competitive prices for
companies
Jamendo needs
● Multi-format music encoding
● Statistics (downloads, listens, reviews, stars, fb
likes) on different units
● Music analysis through external services
● Music qualification through several sources
● Integration with third-party services
● Common jobs (contract generation, certifications, bills, search index
updates)
Celery
“Celery is an asynchronous task queue/job queue based on
distributed message passing. It is focused on real-time
operation, but supports scheduling as well”
● Async & Sync processes
● Concurrency within a box
● Distributed (across machines)
● Scheduling (interval, cron, ...); see the sketch after this list
● Fault tolerant
● Subtasks, sets of tasks
● Web monitoring (django-celery and others)
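As an example of the scheduling feature, a minimal sketch using the celery 2.x
decorator API (the task itself is made up for illustration):

from celery.task.schedules import crontab
from celery.decorators import periodic_task

# Hypothetical periodic task: runs every day at 7:30 in the morning
@periodic_task(run_every=crontab(hour=7, minute=30))
def rebuild_stats():
    print("Rebuilding statistics...")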
AMQP
The Advanced Message Queuing Protocol (AMQP) is an open standard application layer
protocol for Message Oriented Middleware.
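Pointing celery at an AMQP broker is just configuration; a minimal sketch
assuming RabbitMQ and the celery 2.x setting names (host and credentials are
placeholders, not a real setup):

# celery 2.x broker settings for a RabbitMQ server (placeholder values)
BROKER_HOST = "localhost"
BROKER_PORT = 5672
BROKER_USER = "guest"
BROKER_PASSWORD = "guest"
BROKER_VHOST = "/"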
Celery schema
[architecture diagram]
Celery worker
● The celery process that executes the tasks
● Can serve one or multiple queues
● Has a maximum number of tasks that can be executed at the same time
● Can be remotely controlled
● Has a very useful configuration option, CELERYD_MAX_TASKS_PER_CHILD
$ celeryd -l INFO -c 5 -Q queue1 -E
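The same knobs also exist as configuration; a minimal sketch with celery 2.x
setting names (the values are illustrative):

# Illustrative worker settings (celery 2.x names)
CELERYD_CONCURRENCY = 5            # like the -c 5 flag above
CELERYD_MAX_TASKS_PER_CHILD = 100  # recycle a pool process after 100 tasks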
Defining a simple task
from celery.decorators import task

@task
def make_money(how_much):
    logger = make_money.get_logger()
    logger.info("Congratulations, you earned %s$" % how_much)
    if how_much > 1000000:
        return "Bora Bora"
    return "Keep working"

>>> result = make_money.delay(200)
>>> result.get()
"Keep working"
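delay() is a shortcut for apply_async(), which also accepts execution options;
a minimal sketch (the 10-second countdown is purely illustrative):

# Same call with an execution option: start no sooner than 10 seconds from now
result = make_money.apply_async(args=[200], countdown=10)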
Retrying a task if something fails
from celery.decorators import task

@task
def make_money_real_life(how_much, wife=True):
    try:
        logger = make_money_real_life.get_logger()
        if wife:
            raise Exception("Incompatibility exception")
        logger.info("Congratulations, you earned %s$" % how_much)
        if how_much > 1000000:
            return "Bora Bora"
        return "Keep working"
    except Exception as exc:
        make_money_real_life.retry(exc=exc, countdown=60,
                                   args=[how_much, False])
Task set example
Extract from a Jamendo task that uploads track metadata in XML format to an
FTP server for music analysis
def run(self, setid=None, subtasks=None, **kwargs):
    ...
    if not setid or not subtasks:
        ...
        tasks = []
        for slice in slices:
            tasks.append(uploadTrackSlice.subtask((slice, folder_name)))
        job = TaskSet(tasks=tasks)
        task_set_result = job.apply_async()
        setid = task_set_result.taskset_id
        subtasks = [result.task_id for result in task_set_result.subtasks]
        self.incrementalRetry("Result not ready", args=[setid, subtasks])
    # It is a retry, so we just have to check the results
    tasks_result = TaskSetResult(setid, map(AsyncResult, subtasks))
    if not tasks_result.ready():
        self.incrementalRetry("Result not ready", args=[setid, subtasks])
    else:
        if tasks_result.successful():
            return tasks_result.join()
        else:
            raise Exception("Some of the tasks failed")
The Jamendo Task class
The way to define common behaviour for all your tasks is to override the
__call__ and after_return methods of the celery Task class
class JamTask(Task):
    def __call__(self, *args, **kwargs):
        """This method is in charge of calling the run method of the task"""
        self.max_retries = 30
        self.sandbox = SandBox(self.name, self.request.id,
                               settings.PATH_SANDBOX, settings.DEBUG)
        self.taskLogger = TaskLogger(args, kwargs)
        self.taskLogger.__enter__()
        ...
        return self.run(*args, **kwargs)
    ...
    def after_return(self, status, retval, task_id, args, kwargs, einfo):
        """This method is called when the task ends,
        whatever the return state"""
        self.taskLogger.__exit__(status, retval, args, kwargs, einfo)
        self.cleanTaskSandBox(status, kwargs)
        self.closeAllConnections()
Web Monitoring tools
● django-celery
https://github.com/ask/django-celery/
● celery-pylons
http://pypi.python.org/pypi/celery-pylons
● flask-celery
https://github.com/ask/flask-celery/
django-celery
Task scheduling and monitoring through the Django admin interface
● The celeryconf.py file is replaced by the django
settings
● The CELERY_IMPORTS conf var is replaced by the
Django INSTALLED_APPS
You run celery through the manage.py of your project
$ python manage.py celeryd -l INFO -E
django-celery settings.py
INSTALLED_APPS += ("djcelery", )
.
.
import djcelery
djcelery.setup_loader()
.
.
CELERYBEAT_SCHEDULER = "djcelery.schedulers.DatabaseScheduler"
.
.
# standard celery conf vars (broker settings, concurrency, ...)
django-celery schema
[architecture diagram]
django-celery
[admin interface screenshots]
Some little nice extensions
Execute tasks directly from the Django admin interface
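A minimal sketch of how such an extension might look, as a custom admin action
on djcelery's PeriodicTask model (this is an assumption about the approach,
not the actual Jamendo code):

# Hypothetical admin action that enqueues the selected tasks immediately
from django.contrib import admin
from djcelery.models import PeriodicTask
from djcelery.admin import PeriodicTaskAdmin
from celery.execute import send_task

def run_now(modeladmin, request, queryset):
    """Enqueue each selected task right away, looked up by registered name."""
    for periodic_task in queryset:
        send_task(periodic_task.task)
run_now.short_description = "Run selected tasks now"

class RunnablePeriodicTaskAdmin(PeriodicTaskAdmin):
    actions = [run_now]

admin.site.unregister(PeriodicTask)
admin.site.register(PeriodicTask, RunnablePeriodicTaskAdmin)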
Jamendo needs UNIQUE tasks
A task is unique when only one instance of it can run at the same time in
the whole cloud
● Rational utilization of shared resources
● Atomic access to sensitive resources
Our idea:
● Define a list of UNIQUE tasks in settings.py
● If a lock is found, define the behaviour: retry or fail
● Allow the possibility of defining a task UNIQUE on its arguments (the
same task type with different arguments can run)
● Our solution: MongoDB for writing and releasing locks (see the sketch
after this list)
● Best solution: cache, virtual file system?
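A minimal sketch of the MongoDB locking idea (pymongo 1.x/2.x era API; the
collection name and key layout are assumptions, not the actual TaskConcurrency
implementation):

# Hypothetical lock helpers: a unique _id makes the insert an atomic "acquire"
from pymongo import Connection
from pymongo.errors import DuplicateKeyError

locks = Connection("mongohost").celery.task_locks

def acquire_lock(task_name, args=None, lock_on_type=True):
    """Return True if we got the lock, False if another instance holds it."""
    key = task_name if lock_on_type else "%s:%r" % (task_name, args)
    try:
        locks.insert({"_id": key}, safe=True)
        return True
    except DuplicateKeyError:
        return False

def release_lock(task_name, args=None, lock_on_type=True):
    key = task_name if lock_on_type else "%s:%r" % (task_name, args)
    locks.remove({"_id": key}, safe=True)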
Unique tasks
UNIQUE_TASKS = {
    "searchengines.solr.index": {"retry_on_lock": False, "lock_on_type": True},
    "stats.album.rebuild": {"retry_on_lock": True, "lock_on_type": False},
}

On task start (method __call__):

self.taskConcurrency = None
if kwargs["task_name"] in settings.UNIQUE_TASKS:
    self.taskConcurrency = TaskConcurrency(kwargs, args,
                               settings.UNIQUE_TASKS[kwargs["task_name"]])
    if not self.taskConcurrency.canRun():
        if self.taskConcurrency.retry:
            self.incrementalRetry(Exception("Concurrency Exception"))
        else:
            raise Exception("Concurrency Exception")

On task end (method after_return):

if self.taskConcurrency:
    self.taskConcurrency.__exit__()
Celery logs
● The logger object is not unique; the same handler is added to different
logger objects
● Main Process logger, PoolWorker logger, TaskLogger
● The call logging.getLogger("Celery") gives you back only the Main
Process logger
● Extending the logging features was a bit tricky until the last version
Centralized logging
● We made a very small contribution to celery by adding the signals
after_setup_logger and after_setup_task_logger (the names are
self-explanatory)
● after_setup_logger is triggered after the build of the Main Process
logger and after the build of each PoolWorker logger
● The signals give you back a log object; in this way you can add
additional handlers to implement centralized logging
● In our specific case we send the logs of all workers to a syslog
server that stores log lines in a separate file.
Centralized logging
import logging
import logging.handlers
from celery.signals import after_setup_logger, after_setup_task_logger

def after_setup_logger_handler(sender=None, logger=None, loglevel=None,
                               logfile=None, format=None, colorize=None,
                               **kwds):
    handler = logging.handlers.SysLogHandler(address=('syslogserver', 514))
    handler.setFormatter(logging.Formatter(format))
    handler.setLevel(logging.INFO)
    logger.addHandler(handler)

after_setup_logger.connect(after_setup_logger_handler)
after_setup_task_logger.connect(after_setup_logger_handler)
Thank you
http://www.celeryproject.org
Q&A