[feature request] Swarm mode should support batch/cron jobs in addition to persistent services #23880

@nathanleclaire

Description

Today, Swarm mode's docker service CLI lets users specify a group of homogeneous containers which are meant to be kept running. This abstraction, while powerful, may not be the right fit for containers which are intended to eventually terminate or to run only periodically.

Consider, for instance:

  • An admin who wishes to allow users to submit long-running compiler jobs on a Swarm cluster
  • A website which needs to process all user uploaded images into thumbnails of various sizes
  • An operator who wishes to periodically run docker rmi $(docker images --filter dangling=true -q) on each machine

Problem

Though some use cases could potentially be implemented by a service which pulls jobs off a distributed queue, there are some issues with this approach:

  1. It adds the operational burden of creating, administering, and ensuring the health of such a queue. For many, this will raise the barrier to entry for performing such tasks.
  2. It does not necessarily ensure that failed jobs can be re-run the correct number of times or with the correct parameters. As above, this burden is offloaded to the user instead of being natively orchestrated.
  3. It does not allow for easy customization of job parallelism settings.
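
To make that burden concrete, here is a minimal sketch of the retry bookkeeping a queue-based worker ends up carrying itself. It is an illustration only: `queue.Queue` stands in for a real distributed queue, and `run_with_retries`, `max_attempts`, and the job payloads are hypothetical names, not part of Docker or Swarm.

```python
import queue


def run_with_retries(jobs, handler, max_attempts=3):
    """Drain `jobs`, retrying each failed job up to `max_attempts` times.

    Returns (succeeded, failed) lists of job payloads.
    """
    pending = queue.Queue()  # stand-in for a distributed queue (assumption)
    for job in jobs:
        pending.put((job, 1))  # (payload, attempt number)

    succeeded, failed = [], []
    while not pending.empty():
        job, attempt = pending.get()
        try:
            handler(job)
            succeeded.append(job)
        except Exception:
            if attempt < max_attempts:
                # Re-enqueue with the same parameters and a bumped attempt count.
                pending.put((job, attempt + 1))
            else:
                failed.append(job)
    return succeeded, failed


if __name__ == "__main__":
    calls = {}

    def flaky(job):
        # Hypothetical handler: fails the first time it sees each job.
        calls[job] = calls.get(job, 0) + 1
        if calls[job] == 1:
            raise RuntimeError("transient failure")

    print(run_with_retries(["a.png", "b.png"], flaky))
```

Even this toy version has to decide how attempts are counted and what happens to jobs that exhaust them, which is the kind of policy the issue argues should be orchestrated natively.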

Anything which is intended to be run periodically (such as the image garbage collection example above) could potentially cause a thundering herd problem if the scheduling is not handled by Swarm. Imagine, for instance, that a user creates a docker service to periodically run a command inside the containers using plain old cron. If these all wake up and attempt to execute at the same time, while production traffic is surging, this may cause problems for more critical production web services. While it may help to mitigate the problem if users do proper capacity planning and use flags such as --reserve-cpu, separating these concerns early on, especially when a temporal element is involved, seems prudent. (Thanks to @jpetazzo, who originally pointed out these concerns to me and will probably have good insight as well.)
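
For comparison, one workaround available when scheduling stays in user hands is to give every node a stable offset within a window, so that periodic work does not start at the same instant everywhere. The sketch below only illustrates that idea; `stagger_offset`, the node names, and the 300-second window are assumptions for the example, not something Swarm provides.

```python
import hashlib


def stagger_offset(node_name, window_seconds):
    """Map a node name to a stable delay in [0, window_seconds)."""
    digest = hashlib.sha256(node_name.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer; the same name always gives the same offset.
    return int.from_bytes(digest[:8], "big") % window_seconds


if __name__ == "__main__":
    for node in ["node-1", "node-2", "node-3"]:
        # Each node would sleep this many seconds before starting its periodic task.
        print(node, stagger_offset(node, 300))
```

This spreads start times but knows nothing about current cluster load, which is why handling the scheduling in the orchestrator is the more attractive option.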

This issue requests the feature and outlines a proposed CLI, in order to get the ball rolling on discussion, track the request, and gather information about potential use cases.

Proposal

A new top-level command, docker jobs, could be introduced. It would allow users to specify that a container should be run X times, or once every Y interval of time. It could also be used to check up on these jobs.

Examples.

Run batch job once:

$ docker jobs create \
    -e S3_INPUT_BUCKET \
    -e S3_OUTPUT_BUCKET \
    -e IMAGE_NAME \
    nathanleclaire/convert
3x3bq2ibh1qe

$ docker jobs wait 3x3bq2ibh1qe; docker jobs ls
ID              NEXT PREV           FINISHED   FAILURES IMAGE
3x3bq2ibh1qe    -    3 seconds ago  1/1        0/1      nathanleclaire/convert 

Run batch job 16 times:

$ docker jobs create \
    --runs 16 \
    --parallel 3 \
    nathanleclaire/failer
bku4f1s1ncm0

$ docker jobs ls
ID            NEXT  PREV           FINISHED   FAILURES IMAGE
3x3bq2ibh1qe  -     2 minutes ago  1/1        0/1      nathanleclaire/convert 
bku4f1s1ncm0  Now   Now            4/16       6/10     nathanleclaire/failer
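
The --runs and --parallel flags above belong to the proposal, not to an existing CLI. As a rough illustration of the semantics they suggest (N successful completions, at most P attempts at once, failed attempts retried), here is a sketch using a thread pool. `run_batch`, `task`, `max_failures`, and the retry-until-success policy are all assumptions; the proposal does not define how failures count toward the total.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_batch(task, runs, parallel, max_failures=100):
    """Run `task` until it has succeeded `runs` times, at most `parallel` at once.

    Returns (finished, failures). Stops early once `max_failures` is reached.
    """
    finished = failures = 0
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        in_flight = set()
        while finished < runs and failures < max_failures:
            # Top up so that in-flight attempts never exceed what is still needed.
            while len(in_flight) < min(parallel, runs - finished):
                in_flight.add(pool.submit(task))
            done, in_flight = wait(in_flight, return_when=FIRST_COMPLETED)
            for future in done:
                if future.exception() is None:
                    finished += 1
                else:
                    failures += 1
    return finished, failures


if __name__ == "__main__":
    import itertools
    import threading

    counter = itertools.count(1)
    lock = threading.Lock()

    def flaky():
        # Hypothetical task: every third attempt fails.
        with lock:
            n = next(counter)
        if n % 3 == 0:
            raise RuntimeError("attempt %d failed" % n)

    print(run_batch(flaky, runs=16, parallel=3))
```

Capping in-flight attempts at the number of runs still needed keeps the count of successes from overshooting `runs`, at the cost of less parallelism near the end of the batch.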

Run a task every hour:

$ docker jobs create \
    --every 1hr \
    nathanleclaire/hourlytask
ddh7pqvgbd8l

$ # One hour later...

$ docker jobs ls
ID            NEXT PREV         FINISHED  FAILURES IMAGE
3x3bq2ibh1qe  -    1 hour ago   1/1       0/1      nathanleclaire/convert 
bku4f1s1ncm0  -    1 hour ago   16/16     6/20     nathanleclaire/failer 
ddh7pqvgbd8l  1hr  1 minute ago 1/1       0/1      nathanleclaire/hourlytask
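
How the NEXT column might be derived from an --every value can be sketched as follows. This is a guess at the semantics rather than a specification: only "1hr" appears in the proposal, so the other unit suffixes, the names `parse_interval` and `next_run`, and the choice to skip missed runs are all assumptions.

```python
import re
from datetime import datetime, timedelta

# Hypothetical unit suffixes; only "1hr" appears in the proposal itself.
_UNITS = {"s": "seconds", "min": "minutes", "hr": "hours", "d": "days"}


def parse_interval(text):
    """Parse strings such as "1hr" or "30min" into a timedelta."""
    match = re.fullmatch(r"(\d+)([a-z]+)", text.strip())
    if not match or match.group(2) not in _UNITS:
        raise ValueError("unrecognized interval: %r" % text)
    return timedelta(**{_UNITS[match.group(2)]: int(match.group(1))})


def next_run(previous, interval, now):
    """Return the first scheduled time after `now`, stepping from `previous`."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    elapsed = now - previous
    # Skip any runs that were missed rather than firing them all at once.
    missed = max(elapsed // interval, 0)
    return previous + (missed + 1) * interval


if __name__ == "__main__":
    previous = datetime(2016, 6, 23, 12, 0)
    now = datetime(2016, 6, 23, 12, 1)
    print(next_run(previous, parse_interval("1hr"), now))
```

Skipping missed runs instead of replaying them avoids a burst of catch-up executions after downtime, though a real design might want that to be configurable.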

Interactively re-run a job:

$ docker jobs restart 3x3bq2ibh1qe
Running job 3x3bq2ibh1qe again from the beginning

Alternatively, the service model could be expanded to accommodate this. But it seems that jobs and services would be easier for users to manage as separate things.

Please let me know what you think (when you get a chance -- please focus on 1.12 first and foremost ;) ) @aluzzardi @vieux @stevvooe @abronan and others.

Cute animal

Since I'm requesting a feature, the least I can do is provide a cute animal picture.

image

Metadata

    Labels

    area/swarm, kind/enhancement, kind/feature
