Torc workflow management system

Distributed workflow orchestration for complex computational pipelines

Torc is a workflow management system designed for running large-scale computational workflows with complex dependencies on local machines and HPC clusters. It uses a client-server architecture with a centralized SQLite database for state management and coordination.

Project Status

The software is currently being ported from Python + JavaScript + ArangoDB to Rust + SQLite. This increases portability, especially for local environments. Previous releases are still available and supported.

Most functionality is currently available, but the interfaces should not be treated as stable. Validation is not complete, so the tool should not yet be used for production workloads. We expect the port to be ready for use by January 2026.

Please post new ideas for Torc in the discussions.

Features

  • Declarative Workflow Specifications - Define workflows in YAML, JSON5, JSON, or KDL
  • Automatic Dependency Resolution - Dependencies inferred from file and data relationships
  • Job Parameterization - Create parameter sweeps and grid searches with simple syntax
  • Distributed Execution - Run jobs across multiple compute nodes with resource tracking
  • Slurm Integration - Native support for HPC cluster job submission
  • Workflow Resumption - Restart workflows after failures without losing progress
  • Change Detection - Automatically detect input changes and re-run affected jobs
  • Resource Management - Track CPU, memory, and GPU usage across all jobs
  • RESTful API - Complete OpenAPI-specified REST API for integration
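
As an illustration of dependency inference from file relationships, here is a minimal Python sketch. It is not Torc's implementation or specification format: the job records, the `inputs`/`outputs` keys, and the `infer_dependencies` helper are all hypothetical. The idea is simply that a job depends on whichever job produces one of its input files.

```python
from graphlib import TopologicalSorter

# Hypothetical job records (not Torc's spec format): each job lists the
# files it reads and the files it writes.
jobs = {
    "preprocess": {"inputs": ["raw.csv"], "outputs": ["clean.csv"]},
    "simulate": {"inputs": ["clean.csv"], "outputs": ["sim.json"]},
    "report": {"inputs": ["clean.csv", "sim.json"], "outputs": ["report.html"]},
}


def infer_dependencies(jobs):
    """Map each job to the set of jobs that produce one of its input files."""
    producer = {}
    for name, job in jobs.items():
        for path in job["outputs"]:
            producer[path] = name
    deps = {}
    for name, job in jobs.items():
        # Inputs with no producer (such as raw.csv) are external files.
        deps[name] = {producer[p] for p in job["inputs"] if p in producer}
    return deps


deps = infer_dependencies(jobs)
# A topological order is one valid execution order for the jobs.
order = list(TopologicalSorter(deps).static_order())
print(deps)
print(order)
```

No job names another job explicitly here; the ordering falls out of the file names alone, which is the property the feature list describes.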

Architecture

┌─────────────────────────────────────────────────────────────┐
│                         Torc Server                          │
│  ┌────────────────────────────────────────────────────┐     │
│  │            REST API (Tokio 8-thread)               │     │
│  │  /workflows  /jobs  /files  /user_data  /results  │     │
│  └───────────────────┬────────────────────────────────┘     │
│                      │                                       │
│  ┌───────────────────▼────────────────────────────────┐     │
│  │              SQLite Database (WAL)                 │     │
│  │  • Workflow state    • Job dependencies           │     │
│  │  • Resource tracking • Execution results          │     │
│  └────────────────────────────────────────────────────┘     │
└─────────────────────────────────────────────────────────────┘
                             ▲
                             │ HTTP/REST
                             │
        ┌────────────────────┼────────────────────┐
        │                    │                    │
┌───────▼────────┐  ┌────────▼────────┐  ┌───────▼────────┐
│  Torc Client   │  │  Job Runner 1   │  │  Job Runner N  │
│                │  │  (compute-01)   │  │  (compute-nn)  │
│ • Create       │  │                 │  │                │
│   workflows    │  │ • Poll for jobs │  │ • Poll for jobs│
│ • Submit specs │  │ • Execute tasks │  │ • Execute tasks│
│ • Monitor      │  │ • Report results│  │• Report results│
└────────────────┘  └─────────────────┘  └────────────────┘
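
The poll, execute, report cycle of a job runner in the diagram can be sketched in Python as follows. This is an illustration under stated assumptions, not Torc's API: `StandInServer`, its methods, and the `jobs` table are hypothetical, and the stand-in runs in-process on an in-memory SQLite database, whereas real runners talk to the server over HTTP.

```python
import shlex
import sqlite3
import subprocess
import sys


class StandInServer:
    """In-process stand-in for the server; schema and method names are hypothetical."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE jobs (id INTEGER PRIMARY KEY, command TEXT, "
            "status TEXT DEFAULT 'ready', return_code INTEGER)"
        )

    def add_job(self, command):
        with self.db:
            self.db.execute("INSERT INTO jobs (command) VALUES (?)", (command,))

    def claim_next_job(self):
        """Hand out one ready job and mark it as running, or return None."""
        with self.db:  # commits on success, rolls back on error
            row = self.db.execute(
                "SELECT id, command FROM jobs WHERE status = 'ready' "
                "ORDER BY id LIMIT 1"
            ).fetchone()
            if row is not None:
                self.db.execute(
                    "UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],)
                )
            return row

    def report_result(self, job_id, return_code):
        status = "done" if return_code == 0 else "failed"
        with self.db:
            self.db.execute(
                "UPDATE jobs SET status = ?, return_code = ? WHERE id = ?",
                (status, return_code, job_id),
            )


def run_until_idle(server):
    """Poll for jobs, execute each one, and report its return code."""
    completed = 0
    while (job := server.claim_next_job()) is not None:
        job_id, command = job
        result = subprocess.run(shlex.split(command))
        server.report_result(job_id, result.returncode)
        completed += 1
    return completed


server = StandInServer()
server.add_job(shlex.join([sys.executable, "-c", "print('hello from a job')"]))
server.add_job(shlex.join([sys.executable, "-c", "raise SystemExit(3)"]))
completed = run_until_idle(server)
statuses = server.db.execute(
    "SELECT status, return_code FROM jobs ORDER BY id"
).fetchall()
print(completed, statuses)
```

Keeping all state on the server side means a runner holds nothing that cannot be recovered: any number of runners can execute the same loop, and a failed job is recorded with its return code rather than lost.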

Why develop another workflow management tool?

Since there are so many open source workflow management tools available, some may ask, "why develop another?" We evaluated many of them, including Nextflow, Snakemake, and Pegasus. Those are excellent tools and we took inspiration from them. However, they did not fully meet our needs and it wasn't difficult to create exactly what we wanted.

Here are the features of Torc that we think differentiate it from other tools:

  • Simple execution on local computers. Many tools require advanced setup and management. Torc provides precompiled binaries for each supported platform.

  • Node packing on HPC compute nodes

    A Torc worker can maintain a maximum queue depth of jobs on a compute node until the allocation runs out of time. Users can start workers on any number of single-node or multi-node allocations.

    Users who are not savvy with Bash, Slurm, or workflows can easily distribute many jobs across nodes.

  • Torc API Server

    Torc provides a server whose API conforms to an OpenAPI specification, which allows client libraries to be generated automatically. We use both Python and Julia clients to build and manage workflows. Users can monitor workflows through Torc-provided CLI and TUI applications or develop their own scripts.

  • Debugging errors

    We run large numbers of simulations on untested input data. Many of them fail. Torc provides automatic resource monitoring, log collection, and detailed error reporting through raw text, tables, and formatted JSON. Torc makes it easy for users to rerun failed jobs after applying fixes.

  • Traceability

    All workflows and results are stored in a database, tracked by user and other metadata.
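
The node packing behavior described above (keep a fixed number of jobs in flight until the allocation runs out of time) can be sketched in Python as follows. This is a simplified illustration, not Torc's worker: `pack_node`, `max_depth`, and `deadline` are hypothetical names, jobs are plain function calls on a thread pool, and a job that is already running when the deadline passes is allowed to finish.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def pack_node(jobs, run_job, max_depth, deadline):
    """Keep up to max_depth jobs in flight; start no new jobs after the deadline.

    Returns (results, leftover), where leftover holds jobs that were never started.
    """
    pending = list(jobs)  # copy so the caller's list is not modified
    in_flight = {}  # future -> job
    results = {}
    with ThreadPoolExecutor(max_workers=max_depth) as pool:
        while pending or in_flight:
            # Top up the queue while there is room and time left.
            while (
                pending
                and len(in_flight) < max_depth
                and time.monotonic() < deadline
            ):
                job = pending.pop(0)
                in_flight[pool.submit(run_job, job)] = job
            if not in_flight:
                break  # out of time with jobs still pending
            # Block until at least one running job finishes, then record it.
            done, _ = wait(list(in_flight), return_when=FIRST_COMPLETED)
            for future in done:
                results[in_flight.pop(future)] = future.result()
    return results, pending


def square(n):
    time.sleep(0.01)  # stand-in for real work
    return n * n


results, leftover = pack_node(
    range(6), square, max_depth=3, deadline=time.monotonic() + 30
)
print(results, leftover)

# With a deadline that has already passed, nothing is started.
expired_results, expired_leftover = pack_node(
    [1, 2], square, max_depth=2, deadline=time.monotonic() - 1
)
print(expired_results, expired_leftover)
```

Refilling the queue as soon as any job finishes, rather than running jobs in fixed batches, keeps the node busy even when job durations vary widely.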

License

Torc is released under a BSD 3-Clause license.

Software Record

This package is developed under NREL Software Record SWR-24-127.