Description
Today, Swarm mode allows users to specify a group of homogeneous containers which are meant to be kept running with the `docker service` CLI. This abstraction, while powerful, may not be the right fit for containers which are intended to eventually terminate or to run only periodically.
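For contrast, the existing abstraction looks like this; by default Swarm restarts these containers whenever they exit, which is exactly the wrong behavior for a run-to-completion job:

```
$ docker service create --name web --replicas 3 nginx
```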
Consider, for instance:
- An admin who wishes to allow users to submit long-running compiler jobs on a Swarm cluster
- A website which needs to process all user uploaded images into thumbnails of various sizes
- An operator who wishes to periodically run `docker rmi $(docker images --filter dangling=true -q)` on each machine (see the cron sketch below)
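For context on that last item: today this cleanup is typically wired up with host-level cron on every node, along the lines of this hypothetical crontab entry, with nothing coordinating it across the cluster:

```
# Hypothetical crontab entry, installed separately on each node:
# remove dangling images nightly at 03:00 local time.
0 3 * * * docker rmi $(docker images --filter dangling=true -q)
```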
Problem
Though some use cases could potentially be implemented by a service which pulls jobs off a distributed queue (sketched below), there are some issues with this approach:
- It adds the operational burden of creating, administering, and ensuring the health of such a queue. For many, this will raise the barrier to entry for performing such tasks.
- It does not necessarily ensure that failed jobs are re-run the correct number of times or with the correct parameters. As above, this burden is offloaded to the user instead of being handled natively by the orchestrator.
- It does not allow for easy customization of job parallelism settings.
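To make the first two points concrete, here is a minimal sketch of such a worker, assuming a Redis list named `jobs` as the queue (both the queue name and the use of Redis are illustrative, not part of this proposal):

```
# Hypothetical queue worker: block until a job payload arrives, then
# run it as a one-off container. Retry counts, parallelism, and the
# health of Redis itself are all left for the user to manage.
while true; do
    payload="$(redis-cli blpop jobs 0 | tail -n 1)"
    docker run --rm nathanleclaire/convert "$payload"
done
```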
Anything which is intended to run periodically (such as the image garbage collection example above) could also cause a thundering herd problem if the scheduling is not handled by Swarm. Imagine, for instance, that a user creates a docker service to periodically run a command inside its containers using normal old cron. If these all wake up and attempt to execute at the same moment while production traffic is surging, more critical production web services may suffer. Proper capacity planning and flags such as `--reserve-cpu` can help mitigate the problem, but separating these concerns early on, especially when a temporal element is involved, seems prudent. (Thanks to @jpetazzo, who originally pointed out these concerns to me and will probably have good insight here as well.)
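For illustration, the usual hand-rolled mitigation is to add jitter before the real work; a rough sketch assuming bash (where `$RANDOM` is available):

```
# Hypothetical wrapper script each node runs from cron: sleep a random
# 0-299 seconds so the fleet doesn't wake up in lockstep during a
# traffic surge.
sleep "$((RANDOM % 300))"
docker rmi $(docker images --filter dangling=true -q)
```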
This issue requests and outlines a proposed CLI, to get the ball rolling on discussion, track the issue, and gather information about potential use cases.
Proposal
A new top-level command, `docker jobs`, could be introduced. It would allow users to specify that a container should be run X times, or every Y interval of time. It could also be used to check up on these jobs.
Examples:
Run batch job once:
```
$ docker jobs create \
    -e S3_INPUT_BUCKET \
    -e S3_OUTPUT_BUCKET \
    -e IMAGE_NAME \
    nathanleclaire/convert
3x3bq2ibh1qe
$ docker jobs wait 3x3bq2ibh1qe; docker jobs ls
ID              NEXT    PREV            FINISHED    FAILURES    IMAGE
3x3bq2ibh1qe    -       3 seconds ago   1/1         0/1         nathanleclaire/convert
```

Run batch job 16 times:
```
$ docker jobs create \
    --runs 16 \
    --parallel 3 \
    nathanleclaire/failer
bku4f1s1ncm0
$ docker jobs ls
ID              NEXT    PREV            FINISHED    FAILURES    IMAGE
3x3bq2ibh1qe    -       2 minutes ago   1/1         0/1         nathanleclaire/convert
bku4f1s1ncm0    Now     Now             4/16        6/10        nathanleclaire/failer
```
Run a task every hour:
```
$ docker jobs create \
    --every 1hr \
    nathanleclaire/hourlytask
ddh7pqvgbd8l
$ # One hour later...
$ docker jobs ls
ID              NEXT    PREV            FINISHED    FAILURES    IMAGE
3x3bq2ibh1qe    -       1 hour ago      1/1         0/1         nathanleclaire/convert
bku4f1s1ncm0    -       1 hour ago      16/16       6/20        nathanleclaire/failer
ddh7pqvgbd8l    1hr     1 minute ago    1/1         0/1         nathanleclaire/hourlytask
```
Interactively re-run a job:
```
$ docker jobs restart 3x3bq2ibh1qe
Running job 3x3bq2ibh1qe again from the beginning
```

Alternatively, the service model could be expanded to accommodate this? But it seems jobs would be easier to manage (for users) as a separate thing.
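For comparison, the closest approximation with today's service model would be something like the following sketch; it stops the restart loop, but there is still no notion of `--runs`, `--every`, or completion tracking:

```
$ docker service create \
    --restart-condition none \
    --replicas 16 \
    nathanleclaire/failer
```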
Please let me know what you think (when you get a chance -- please focus on 1.12 first and foremost ;) ) @aluzzardi @vieux @stevvooe @abronan and others.
Cute animal
Since I'm requesting a feature, the least I can do is provide a cute animal picture.