
sworkflow

A lightweight Python toolkit for composing, visualizing, and submitting Slurm job workflows with complex dependencies.

Instead of writing fragile Bash scripts with nested sbatch --dependency calls, sworkflow lets you declare dependencies cleanly in Python or YAML, visualize them as a graph, and submit jobs in the correct order.

❓ Why sworkflow?

Traditional Bash or “Python-flavored Bash” workflows for Slurm often suffer from:

  • Fragile chaining: manual sbatch --dependency wiring, brittle string parsing
  • Hard fan-in/out: merging branches and complex DAGs is error-prone
  • Little validation/visibility: cycles and typos appear late at submit/run time
  • Poor reuse: copy-pasted scripts with ad-hoc parameters

sworkflow addresses this by:

  • Declarative DAGs: dependencies as data (YAML/Python), not shell glue
  • Built-in validation: uses graphlib.TopologicalSorter to detect cycles and order jobs
  • Visualization first: render the DAG before submitting
  • Consistent submission: captures job IDs and applies dependency rules uniformly
  • Python API + CLI: use as a library or via simple commands

Quick contrast:

# Bash (fragile)
jid_pre=$(sbatch --parsable preprocess.sh)
jid_train=$(sbatch --parsable --dependency=afterok:$jid_pre train.sh)
jid_post=$(sbatch --parsable --dependency=afterok:$jid_train postprocess.sh)

echo "preprocess=$jid_pre train=$jid_train postprocess=$jid_post"


# sworkflow (declarative)
dependency:
  train: afterok:preprocess
  postprocess: afterok:train
jobs:
  preprocess: preprocess.sh
  train: train.sh
  postprocess: postprocess.sh

🚀 Features

  • Declarative workflow definition – express dependencies in a dictionary or YAML file
  • Visualization – generate ASCII or graph-based DAGs before submission
  • Python API & CLI – use as a library or standalone tool
  • Safer workflows – prevents dependency cycles, ensures correct ordering
  • Config-driven – define jobs in YAML for reuse and easy editing

📦 Prerequisites

  • Python 3.9+ (uses graphlib.TopologicalSorter)
  • A working Slurm environment (sbatch, squeue, sacct available)
  • Optional: Graphviz for advanced graph visualization

🔧 Installation

Clone the repository and install:

git clone https://github.com/siligam/sworkflow.git
cd sworkflow

# Option A: Conda environment
conda env create -f environment.yaml -n sworkflow
conda activate sworkflow
pip install .

# Option B: Virtualenv / system install
python3 -m venv venv
source venv/bin/activate
pip install .

⚑ Quick Start

Example workflow (workflow.yaml)

dependency:
  train: afterok:preprocess
  postprocess: afterok:train

jobs:
  preprocess: preprocess.sh
  train: train.sh
  postprocess: postprocess.sh

Submit and visualize

# Visualize workflow
sworkflow -f workflow.yaml vis

# Submit workflow
sworkflow -f workflow.yaml submit

# Check job status
sworkflow -f workflow.yaml status

🧾 YAML schema (quick reference)

dependency:        # map[job] -> "<condition>:<dep1>[:<dep2>...]"
  train: afterok:preprocess
  eval: afterany:train:postprocess

jobs:              # map[job] -> shell command or path to script
  preprocess: preprocess.sh
  train: train.sh
  postprocess: postprocess.sh
  eval: eval.sh    # every job named under dependency also needs an entry here

Note: If a job value does not include the word sbatch, sworkflow will automatically prepend sbatch --parsable and inject the appropriate --dependency=... flag based on your dependency mapping. You may also pass raw sbatch flags directly (e.g., --array=... --wrap=...).
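
The rule in the note above can be sketched in a few lines of Python. This is an illustration of the described behavior, not sworkflow's actual code: `build_command` is a hypothetical helper, and passing a value that already contains `sbatch` through unchanged is an assumption, since the note does not say what happens in that case.

```python
def build_command(job_value: str, dependency_flag: str = "") -> str:
    """Return the shell command for one job (illustrative sketch only)."""
    if "sbatch" in job_value:
        # Assumption: a value that already mentions sbatch is left untouched.
        return job_value
    parts = ["sbatch", "--parsable"]
    if dependency_flag:
        # e.g. "--dependency=afterok:1001", built from the dependency mapping
        parts.append(dependency_flag)
    parts.append(job_value)
    return " ".join(parts)


root_cmd = build_command("preprocess.sh")
train_cmd = build_command("train.sh", "--dependency=afterok:1001")
array_cmd = build_command("--array=10,20,30 --wrap='sleep $SLURM_ARRAY_TASK_ID'")
```

The job ID `1001` is a placeholder; in a real run it would be whatever `sbatch --parsable` printed for the predecessor.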


🐍 Python API

Define workflows directly in Python:

import sworkflow

dependency = {
    "train": "afterok:preprocess",
    "postprocess": "afterok:train",
}

jobs = {
    "preprocess": "preprocess.sh",
    "train": "train.sh",
    "postprocess": "postprocess.sh",
}

suite = sworkflow.Suite(dependency, jobs)

suite.visualize(as_ascii=True)
suite.submit()

Output:

preprocess → train → postprocess

🌳 Advanced Workflows

Branch and Merge

dependency:
  B: afterok:A
  C: afterok:A
  D: afterok:B:C
  E: afterok:D

jobs:
  A: A.sh
  B: B.sh
  C: C.sh
  D: D.sh
  E: E.sh

This produces:

    A
   / \
  B   C
   \ /
    D
    |
    E
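
To see why this graph has a valid submission order, the same mapping can be fed to the standard-library graphlib.TopologicalSorter. The snippet below is a standalone sketch that does not import sworkflow; splitting each dependency string on `:` to get the predecessors is an assumption about the format, based on the examples in this README.

```python
from graphlib import TopologicalSorter

dependency = {
    "B": "afterok:A",
    "C": "afterok:A",
    "D": "afterok:B:C",
    "E": "afterok:D",
}

# Map each job to its predecessors by dropping the leading condition.
graph = {job: spec.split(":")[1:] for job, spec in dependency.items()}

# static_order() yields every job after all of its predecessors.
order = list(TopologicalSorter(graph).static_order())
print(order)
```

Any order it returns starts with A and ends with E, with B and C somewhere before D. Their relative position does not matter, because neither depends on the other.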

Job Arrays

dependency:
  analyze: afterok:array

jobs:
  array: --array=10,20,30 --wrap='sleep $SLURM_ARRAY_TASK_ID'
  analyze: analyze.sh

🔗 Dependency syntax

  • Conditions: afterok, afterany, afternotok
  • Multiple predecessors are colon-separated, e.g. afterok:B:C means run after B and C succeed
  • All referenced predecessors must be defined under jobs
  • Example:
dependency:
  D: afterok:B:C
jobs:
  B: sbatch B.sh
  C: sbatch C.sh
  D: sbatch D.sh
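
As a rough illustration of how such a string maps onto Slurm's own `--dependency=<condition>:<jobid>[:<jobid>...]` form, the sketch below splits a spec into its parts and substitutes job IDs for job names. `parse_dependency` and `to_slurm_flag` are hypothetical names used only here, and the job IDs are placeholders.

```python
def parse_dependency(spec: str) -> tuple[str, list[str]]:
    """Split "afterok:B:C" into ("afterok", ["B", "C"])."""
    condition, *predecessors = spec.split(":")
    if not predecessors:
        raise ValueError(f"no predecessors in dependency spec: {spec!r}")
    return condition, predecessors


def to_slurm_flag(spec: str, job_ids: dict[str, str]) -> str:
    """Replace job names with the Slurm job IDs captured at submit time."""
    condition, predecessors = parse_dependency(spec)
    ids = [job_ids[name] for name in predecessors]  # KeyError if a name is unknown
    return f"--dependency={condition}:" + ":".join(ids)


flag = to_slurm_flag("afterok:B:C", {"B": "101", "C": "102"})
```

The `KeyError` on an unknown name mirrors the rule that every referenced predecessor must be defined under jobs.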

📊 Visualization

sworkflow can render ASCII or Graphviz diagrams.

sworkflow -f workflow.yaml vis

Output:

preprocess
   |
 train
   |
postprocess

❓ CLI Reference

  • vis – visualize workflow
  • submit – submit jobs with dependencies
  • status – check current job states

You can set a default workflow file:

export SFILE=workflow.yaml
sworkflow vis
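
A CLI can resolve a default like this with a small lookup. The sketch below is one plausible way to do it, not sworkflow's implementation: `resolve_workflow_file` is a hypothetical name, and letting an explicit `-f` value take precedence over `SFILE` is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_workflow_file(
    cli_value: Optional[str], env: Mapping[str, str] = os.environ
) -> str:
    """Pick the workflow file from -f if given, otherwise from $SFILE."""
    if cli_value:
        return cli_value
    if env.get("SFILE"):
        return env["SFILE"]
    raise SystemExit("no workflow file: pass -f or set SFILE")


from_flag = resolve_workflow_file("other.yaml", {"SFILE": "workflow.yaml"})
from_env = resolve_workflow_file(None, {"SFILE": "workflow.yaml"})
```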

🧪 Examples

See examples/minimal/ for a minimal runnable setup:

  • examples/minimal/workflow.yaml – declarative DAG
  • examples/minimal/preprocess.sh, examples/minimal/train.sh, examples/minimal/postprocess.sh – sample jobs

Run locally (requires Slurm):

export SFILE=examples/minimal/workflow.yaml
sworkflow vis
sworkflow submit

Note: make scripts executable first:

chmod +x examples/minimal/*.sh

⚠️ Error Handling

  • Dependencies are resolved using graphlib.TopologicalSorter, which rejects cyclic graphs with a CycleError
  • Jobs will only run if their dependency conditions (afterok, afterany, afternotok) are satisfied
  • Use sworkflow status to monitor running workflows
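
Cycle detection with graphlib can be tried in isolation. The sketch below uses only the standard library; `find_cycle` is a hypothetical helper, and how sworkflow reports such an error to the user is not covered here.

```python
from graphlib import CycleError, TopologicalSorter


def find_cycle(graph: dict[str, list[str]]):
    """Return the nodes of a cycle in graph, or None if it is acyclic."""
    try:
        TopologicalSorter(graph).prepare()
    except CycleError as err:
        # args[1] holds the cycle as a list whose first and last node match.
        return err.args[1]
    return None


# train waits for postprocess and postprocess waits for train: a cycle.
cycle = find_cycle({"train": ["postprocess"], "postprocess": ["train"]})
acyclic = find_cycle({"train": ["preprocess"], "postprocess": ["train"]})
```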

🛠️ Troubleshooting

  • command not found: sbatch – ensure Slurm is installed/loaded and on your PATH (e.g., module load slurm)
  • Graphviz visualization fails – install graphviz and ensure dot is on PATH
  • Jobs stuck in PENDING – check partition/account/QOS constraints and your sbatch resource flags (--time, --mem, --account, etc.)
  • status shows nothing – confirm sacct is enabled at your site and you have permission to query accounting data

🧭 Scope & limitations

  • Designed for Slurm; requires sbatch/squeue/sacct
  • No built-in retries/backoff beyond what you script in your job commands
  • Not a full workflow engine (no caching, scheduling, or cross-cluster orchestration)
  • status relies on Slurm accounting and may be subject to site-specific retention/latency
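
For readers who want to query accounting data by hand, the sketch below parses the pipe-delimited text that a command such as `sacct -j <ids> --format=JobID,State --parsable2 --noheader` prints. It is a standalone illustration and says nothing about how sworkflow's status command is implemented; `parse_sacct` is a hypothetical helper and the sample text is made up.

```python
def parse_sacct(output: str) -> dict[str, str]:
    """Map top-level job IDs to their state from "JobID|State" lines."""
    states = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        job_id, state = line.split("|", 1)
        if "." in job_id:
            # Skip job steps such as "1001.batch"; keep whole jobs only.
            continue
        # "CANCELLED by 1234" is reduced to "CANCELLED".
        states[job_id] = state.split()[0] if state.strip() else ""
    return states


# Made-up sample in --parsable2 --noheader layout.
sample = "1001|COMPLETED\n1001.batch|COMPLETED\n1002|CANCELLED by 1234\n1003|PENDING\n"
states = parse_sacct(sample)
```

If sacct returns nothing for a recent job, accounting retention or latency at your site is the first thing to check.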

🀝 Contributing

Issues and pull requests are welcome!

  • Fork the repo and create a feature branch
  • Add tests or examples if applicable
  • Submit a pull request with a clear description

📜 License

MIT License. See LICENSE for details.
