Engineering

Celery in production: How to be ready and quickly resolve issues

By Mohamed Taha

Celery is a simple, flexible, and reliable distributed system for processing vast amounts of messages, while providing the tools required to maintain such a system. In fact, Celery has become the de facto standard for background task processing in the Python ecosystem.


Many large companies, like Instagram, Mozilla and Udemy, rely on it for different tasks, from sending emails to processing large files or generating image thumbnails of different sizes in the background. You can find out more here.


What this means is that it’s a reliable and battle-tested technology. Awesome, right? But here comes the problem. Celery behaves very differently in production than it does in development: things go wrong, you discover you misconfigured it, and you run into scalability issues, duplicated task executions and much more. None of these issues are addressed in the beginner tutorials. Things can get really tough.


I don’t want to sound too pessimistic. In fact, I find it a satisfying challenge and a useful learning process. And yes, you will learn how to cope by experiencing production issues and fixing them, but things get really problematic if you don’t have a good control panel and the right tools to help you. You need to be alerted to an issue, have the right tools to identify it, and finally be able to fix it and make sure it doesn’t happen again.

So what is this post about?

I’m going to prepare you for when issues arise in your production environment and explain the Celery setup that will help you resolve such incidents. Although the things mentioned might seem a little basic to some, especially more experienced technologists, this tutorial can serve as a best-practice checklist for a healthy Celery production setup.

Choose your message broker wisely

Celery supports several message brokers, including RabbitMQ, which is the default, but you might want to use Redis or Amazon SQS. Research the brokers before you start development. They all do the same job, but they differ in how they are configured, so make sure you choose one that will benefit your project. I learned this the hard way when I joined a project that was using Redis. We had a critical task, a payment, that was expected to run once, but we found out that it had been scheduled to run more than 20 times! Thousands of dollars could have been lost due to a misconfiguration of the Redis visibility timeout: with the Redis transport, a task that is not acknowledged within the visibility timeout is redelivered to another worker.

Takeaway:

Research the brokers Celery supports here and, once you decide on one, read through its configuration options carefully.
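
As a rough illustration of the kind of setting worth reading up on, here is a minimal configuration sketch for the Redis transport. The idea is to keep the visibility timeout longer than the longest delay you ever schedule a task with (`eta` or `countdown`). The module name `celeryconfig.py`, the broker URL and the six-hour figure are assumptions made for this example, not values from the project described above.

```python
# celeryconfig.py (hypothetical module name), loaded with
# app.config_from_object("celeryconfig")
from datetime import timedelta

# Assumed local Redis instance; replace with your own broker URL.
broker_url = "redis://localhost:6379/0"

# Assumption for this sketch: no task is ever scheduled more than
# six hours ahead (via eta or countdown).
LONGEST_SCHEDULE_AHEAD = timedelta(hours=6)
SAFETY_MARGIN = timedelta(hours=1)

# Keep the visibility timeout longer than the longest delay, so that a
# waiting task is not handed to a second worker before it has run.
broker_transport_options = {
    "visibility_timeout": int(
        (LONGEST_SCHEDULE_AHEAD + SAFETY_MARGIN).total_seconds()
    ),
}
```

A larger timeout also delays the redelivery of tasks that were lost to a crashed worker, so the value is a trade-off rather than "the bigger the better".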

Monitoring

The ability to track and monitor your task queues and workers in real time is essential. Celery provides a few options that you should familiarize yourself with, because they will be your Swiss Army knife in any difficult situation.

Here are the fundamental ones:

  • Flower: this is essential. It’s a real-time web dashboard where you can monitor task progress and history, view graphs and statistics, and remotely control workers, queues and the task lifecycle.
  • Broker client: depending on the broker, you can use its own client, for example, redis-cli or rabbitmqctl. They can be helpful sometimes for inspecting the queues and seeing what’s inside them.
  • Management command-line utilities: these are celery commands for inspecting and controlling the workers. For example, you can list the tasks currently being executed with the command `celery -A proj inspect active`.

You can check everything about Celery monitoring here.
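
The same information is available from Python through `app.control.inspect()`. The sketch below takes the result of `inspect().active()`, which maps each worker name to a list of task descriptions (or is `None` when no worker replies), and counts running tasks per task name. The helper name and the sample data are made up for illustration.

```python
from collections import Counter


def count_active_by_task(active):
    """Count currently executing tasks per task name.

    `active` is expected to look like the result of
    app.control.inspect().active(): {worker_name: [task_info, ...]},
    or None when no worker replied.
    """
    counts = Counter()
    for tasks in (active or {}).values():
        for task in tasks:
            counts[task["name"]] += 1
    return counts


# Hypothetical sample shaped like an inspect().active() reply.
sample = {
    "celery@worker1": [
        {"id": "a1", "name": "proj.tasks.send_email"},
        {"id": "a2", "name": "proj.tasks.make_thumbnail"},
    ],
    "celery@worker2": [
        {"id": "b1", "name": "proj.tasks.send_email"},
    ],
}

print(count_active_by_task(sample))
```

With a running application, the call would be `count_active_by_task(app.control.inspect().active())`, where `app` is your Celery application.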

Logging

Most of the time your logs will be the source of truth to refer to when something unexpected happens. That’s why logging is a key feature in modern software applications, so make sure you properly set up your Celery log configuration. Also, make sure you log enough information whenever something changes in your code. Log item details such as the `ID` to make debugging and troubleshooting easier.
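
For instance, a task that handles payments might log the payment ID at every step. This is a minimal sketch using the standard `logging` module so that it runs on its own; inside a Celery task module, `celery.utils.log.get_task_logger(__name__)` would typically take the place of `logging.getLogger`. The function name, the fields and the ID format are hypothetical.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("payments")


def process_payment(payment_id, amount_cents):
    """Hypothetical task body that logs the payment ID at each step."""
    logger.info("payment_id=%s charging amount_cents=%d", payment_id, amount_cents)
    if amount_cents <= 0:
        logger.error("payment_id=%s rejected: non-positive amount", payment_id)
        return "rejected"
    # ... the real charge would happen here ...
    logger.info("payment_id=%s charged successfully", payment_id)
    return "charged"


process_payment("pay_123", 500)
```

Putting the ID in every message means a single search for `payment_id=pay_123` returns the whole history of that item, including any duplicate executions.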

Real-time application error monitoring

“Software errors are inevitable. Chaos is not.” (sentry.io)

Errors in production will always happen, but you can discover them early and resolve them quickly, before they do serious damage. For that, you need a system that monitors your application and instantly alerts your team via different channels, such as email or Slack. Not only that, you need a system that can assign each issue to whoever is responsible for it, tell you what its priority is, and give you full context about it.

Nowadays, Sentry is the de facto standard for real-time application error monitoring, with companies such as Uber, Microsoft and PayPal using it.

That’s it!

Enjoy getting to know Celery!

Written by
Mohamed Taha
Software Engineer | Side Projects | Python | Django