Easily manage and submit robust jobs to Slurm using Python and Bash.
- Freezes source code by copying it to a separate $JOB_DIR.
- Auto-submits another job if the current job times out.
- Exposes hooks for custom Bash code: setup / setup_resume, on_run / on_run_resume, and teardown.
- Formats job names using parameters from config files.
- Supports interactive jobs for easy debugging.
pip install easy-slurm

Easy Slurm provides a CLI / YAML interface, as well as a Python interface.
To submit a job, simply fill in the various parameters shown in the example below.
import easy_slurm
easy_slurm.submit_job(
    job_dir="$HOME/jobs/{date}-{job_name}",
    src=["./src", "./assets"],
    setup="""
        virtualenv "$SLURM_TMPDIR/env"
        source "$SLURM_TMPDIR/env/bin/activate"
        pip install -r "$SLURM_TMPDIR/src/requirements.txt"
    """,
    setup_resume="""
        # Runs only on subsequent runs. Call setup and do anything else needed.
        setup
    """,
    on_run="cd src && python main.py",
    on_run_resume="cd src && python main.py --resume",
    teardown="""
        # Do any cleanup tasks here.
    """,
    sbatch_options={
        "job-name": "example-simple",
        "account": "your-username",
        "time": "3:00:00",
        "nodes": "1",
    },
    resubmit_limit=64,  # Automatic resubmission limit.
)

All job files will be kept in the job_dir directory. Provide directory paths to src -- these will be archived and copied to the job_dir directory. Also provide Bash code in the hooks, which will be run in the following order:
| First run: | Subsequent runs: | 
|---|---|
| setup | setup_resume | 
| on_run | on_run_resume | 
| teardown | teardown | 
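
The ordering in the table can be summarized in a few lines of Python. This is only an illustrative sketch of the selection logic, not Easy Slurm's implementation; `hooks_for_run` and `run_index` are hypothetical names introduced here, and the three-run scenario is made up for the example.

```python
def hooks_for_run(run_index: int) -> list[str]:
    """Return hook names in execution order for a run (0 = first run).

    Hypothetical helper for illustration; not part of the easy_slurm API.
    """
    if run_index == 0:
        return ["setup", "on_run", "teardown"]
    # Every resubmitted run uses the *_resume variants, then the same teardown.
    return ["setup_resume", "on_run_resume", "teardown"]


# Example: a job that times out twice and finishes on its third run.
for run_index in range(3):
    print(run_index, " -> ".join(hooks_for_run(run_index)))
```

Since setup_resume runs instead of setup on later runs, the example above calls setup from within setup_resume to rebuild the environment before resuming.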
Full examples can be found in the examples directory, including a simple example to run "training epochs" on a cluster.
Jobs can also be fully configured using YAML files. See examples/simple_yaml.
job.yaml:

job_dir: "$HOME/jobs/{date}-{job_name}"
src: ["./src", "./assets"]
setup: |
  virtualenv "$SLURM_TMPDIR/env"
  source "$SLURM_TMPDIR/env/bin/activate"
  pip install -r "$SLURM_TMPDIR/src/requirements.txt"
setup_resume: |
  # Runs only on subsequent runs. Call setup and do anything else needed.
  setup
on_run: "cd src && python main.py"
on_run_resume: "cd src && python main.py --resume"
teardown: |
  # Do any cleanup tasks here.
sbatch_options:
  job-name: "example-simple"
  account: "your-username"
  time: "3:00:00"
  nodes: 1
resubmit_limit: 64  # Automatic resubmission limit.

Then submit the job using:
easy-slurm --job="job.yaml"

Parameters in the YAML file can be overridden using command-line arguments. For example:
easy-slurm --job="job.yaml" --src='["./src", "./assets", "./extra"]'

One useful feature is formatting paths using custom template strings:
easy_slurm.submit_job(
    job_dir="$HOME/jobs/{date:%Y-%m-%d_%H-%M-%S_%3f}-{job_name}",
)

The job names can be formatted using a config dictionary:
easy_slurm.submit_job(
    sbatch_options={
        "job-name": "bs={hp.batch_size:04},lr={hp.lr:.1e}",
        # Equivalent to:
        # "job-name": "bs=0032,lr=1.0e-02"
    },
    config={"hp": {"batch_size": 32, "lr": 1e-2}},
)

This helps in automatically creating descriptive, human-readable job names.
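
The template fields in these examples use the same syntax as Python's built-in `str.format`, so their effect can be previewed without submitting a job. The following is a minimal sketch, not Easy Slurm's actual implementation. It assumes the config dictionary is exposed with attribute-style access, emulated here by a hypothetical helper `to_namespace`, and it uses a fixed, made-up timestamp. The `%3f` directive from the job_dir example is left out because it is not one of Python's standard `strftime` codes.

```python
from datetime import datetime
from types import SimpleNamespace


def to_namespace(value):
    """Hypothetical helper: recursively wrap dicts so that {hp.lr} resolves."""
    if isinstance(value, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in value.items()})
    return value


config = {"hp": {"batch_size": 32, "lr": 1e-2}}

# Attribute lookups and format specs are handled by str.format itself.
job_name = "bs={hp.batch_size:04},lr={hp.lr:.1e}".format(**vars(to_namespace(config)))
print(job_name)  # bs=0032,lr=1.0e-02

# datetime objects forward the format spec to strftime.
date = datetime(2024, 1, 2, 3, 4, 5)  # made-up timestamp for the example
job_dir = "$HOME/jobs/{date:%Y-%m-%d_%H-%M-%S}-{job_name}".format(
    date=date, job_name=job_name
)
print(job_dir)  # $HOME/jobs/2024-01-02_03-04-05-bs=0032,lr=1.0e-02
```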
For the CLI / YAML interface, the same can be achieved using the --config argument:
easy-slurm --job="job.yaml" --config="config.yaml"

See the documentation for more information and examples.