For the work I am doing, I need a job queue system with the following features:
Uses a PostgreSQL database as the (default) Job Queue. Postgres provides persistence and ACID transaction guarantees, which is the simplest way to ensure that a job is never lost. It is also already running in most application clusters, so building on Postgres lets developers add a Job Queue without substantially increasing their application's complexity.
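As a minimal sketch of the claim pattern (the `jobs` table and its columns are hypothetical), a worker can dequeue atomically with `SELECT ... FOR UPDATE SKIP LOCKED`, so concurrent workers never claim the same job, and a crashed worker's claim is rolled back with its transaction:

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # assumed connection string

def claim_next_job():
    """Atomically claim one queued job, or return None if the queue is empty."""
    with conn:  # commits on success, rolls back on exception
        with conn.cursor() as cur:
            cur.execute(
                """
                UPDATE jobs
                   SET status = 'running'
                 WHERE id = (SELECT id FROM jobs
                              WHERE status = 'queued'
                              ORDER BY id
                              LIMIT 1
                              FOR UPDATE SKIP LOCKED)
                 RETURNING id, payload
                """
            )
            return cur.fetchone()  # None when nothing is queued
```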
Defines Job Workflows as a DAG of Tasks. Most existing job queue systems (e.g., Celery) define jobs as single tasks, so it's up to the user to compose more complex workflows. But many applications (like CI/CD pipelines and data pipelines) need to define workflows at a higher level, as a DAG of tasks in which a task may depend on earlier tasks that must complete first and may run in parallel with other tasks in the workflow.
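For instance (a sketch with made-up task names), a Workflow can be declared as a mapping from each Task to the set of Tasks it depends on, and the standard library's `graphlib` can validate it and produce an execution order:

```python
from graphlib import TopologicalSorter  # Python 3.9+

# A hypothetical CI-style Workflow: keys are Tasks, values are dependencies.
workflow = {
    "checkout": set(),
    "build":    {"checkout"},
    "lint":     {"checkout"},        # can run in parallel with "build"
    "test":     {"build"},
    "deploy":   {"test", "lint"},
}

# static_order() yields a valid execution order and raises CycleError
# if the graph is not actually a DAG.
print(list(TopologicalSorter(workflow).static_order()))
```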
Runs each Task in a (temporary) container using an image. Many existing task queue systems assume that the environment the queue worker is written in is also the environment each task executes in; for example, Celery tasks are written and run in Python. Instead, we need to run tasks in any environment that can be defined and built as a container image. This lets a task use any software, not just the software available in the queue worker.
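A minimal sketch of that execution model (the image and command are placeholders), shelling out to the Docker CLI so each Task runs in a throwaway container:

```python
import subprocess

def run_task(image: str, command: list[str]) -> int:
    """Run one Task in a temporary container; --rm removes it on exit."""
    result = subprocess.run(
        ["docker", "run", "--rm", image, *command],
        capture_output=True,
        text=True,
    )
    print(result.stdout)
    return result.returncode  # 0 means the Task succeeded

# e.g. run_task("python:3.12-slim", ["python", "-c", "print('hello')"])
```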
[TODO] Can use a message broker as the Job Queue. Applications that need more performance and throughput than PostgreSQL can provide should be able to switch to a dedicated broker, for example RabbitMQ, a high-performance message broker written in Erlang.
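By way of illustration (the queue name and payload are invented), enqueueing a Job on RabbitMQ with the `pika` client could look like:

```python
import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()

# Durable queue + persistent messages, so Jobs survive a broker restart.
channel.queue_declare(queue="jobs", durable=True)
channel.basic_publish(
    exchange="",
    routing_key="jobs",
    body=json.dumps({"job_id": 42, "workflow": "ci"}),
    properties=pika.BasicProperties(delivery_mode=2),  # persistent
)
connection.close()
```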
[TODO] Can run (persistent) Task workers. Some Tasks or Task environments (images) will be needed continually. In those cases, the Task workers can be persistent services that listen on the Job queue for their own Jobs. (In essence, this lets a Task be a complete sub-workflow handled by its own Workflow Job queue workers, with the Tasks running inside the Job worker container as subprocesses.)
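A sketch of such a persistent worker (`claim_next_job` is the hypothetical dequeue helper sketched earlier, and the Task command is assumed to live in the Job payload):

```python
import json
import subprocess
import time

def worker_loop():
    """Persistent worker: poll the queue and run each claimed Task in-process."""
    while True:
        job = claim_next_job()   # hypothetical helper; see the Postgres sketch
        if job is None:
            time.sleep(1)        # queue is empty; back off briefly
            continue
        job_id, payload = job
        task = json.loads(payload)
        # The Task runs inside this worker's container as a subprocess,
        # so the image's environment is already available to it.
        subprocess.run(task["command"], check=False)
```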
Each Task container can be executed either with docker run (must have a docker daemon running and be able to use it), or by submitting a Kubernetes Job and watching it (kubectl describe?) until it has completed.
Given the Workflow DAG for the Job, the general Execution algorithm can be: find every Task whose dependencies have all completed, dispatch those Tasks (in parallel where possible), wait for completions, and repeat until every Task is done or one fails.
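One plausible rendering of that loop (a serial sketch; the `ready` set is exactly what a real runner would dispatch in parallel):

```python
def execute(dag, run_task):
    """dag maps each Task to the set of Tasks it depends on."""
    completed = set()
    while len(completed) < len(dag):
        ready = [t for t, deps in dag.items()
                 if t not in completed and deps <= completed]
        if not ready:
            raise RuntimeError("DAG has a cycle or an unsatisfiable dependency")
        for task in ready:       # a real runner dispatches these in parallel
            run_task(task)
            completed.add(task)
```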
Implementing this algorithm calls for multiprocessing with message passing: a coordinator process dispatches ready Tasks to worker processes and collects completion messages back. Super interesting!
(Using ZeroMQ is simpler and more robust than multiprocessing.Queue from the Python standard library.)
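A bare-bones sketch of that message passing with pyzmq (the addresses and ports are arbitrary): the coordinator PUSHes ready Tasks out to workers and PULLs completion messages back in:

```python
import zmq

ctx = zmq.Context()

# Coordinator side: fan Tasks out, collect completions.
tasks_out = ctx.socket(zmq.PUSH)
tasks_out.bind("tcp://127.0.0.1:5557")   # workers connect a PULL socket here
results_in = ctx.socket(zmq.PULL)
results_in.bind("tcp://127.0.0.1:5558")  # workers connect a PUSH socket here

tasks_out.send_json({"task": "build"})
done = results_in.recv_json()            # blocks until a worker reports back
print("completed:", done)

# Worker side (in a separate process):
#   tasks_in = ctx.socket(zmq.PULL); tasks_in.connect("tcp://127.0.0.1:5557")
#   results  = ctx.socket(zmq.PUSH); results.connect("tcp://127.0.0.1:5558")
#   task = tasks_in.recv_json(); ...run it...; results.send_json(task)
```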