ADR: Remote pipeline inclusion #7213

Open: bentsherman wants to merge 5 commits into master from adr-meta-pipelines

@bentsherman

Member

This PR adds an ADR for remote pipeline inclusion, aka "meta-pipelines".

It describes an approach for including remote pipelines into a meta-pipeline in a way that preserves dataflow concurrency between pipeline inputs/outputs.

It discusses alternative approaches such as pipeline chaining / nf-cascade and why they don't satisfy certain use cases (preserving dataflow concurrency).

It also walks through a basic example of fetchngs -> rnaseq.

Signed-off-by: Ben Sherman <[email protected]>
@bentsherman bentsherman requested review from ewels and pditommaso June 10, 2026 00:19
@netlify

netlify Bot commented Jun 10, 2026


Deploy Preview for nextflow-docs-staging ready!

Name Link
🔨 Latest commit 09c9e96
🔍 Latest deploy log https://app.netlify.com/projects/nextflow-docs-staging/deploys/6a2ae8fb06f1bf00081bb8b5
😎 Deploy Preview https://deploy-preview-7213--nextflow-docs-staging.netlify.app
@bentsherman bentsherman added this to the 26.10 milestone Jun 10, 2026
@ewels

ewels commented Jun 10, 2026

Member

Great write up, thanks for this Ben!

As you might expect, I'm most concerned about the params. You characterise it as a one-off cost which is mitigated by LLMs, however that doesn't take into account updates to included pipelines (a core functionality with included modules). The params drift with updates would be dangerous and a constant source of dev work.

I'd still love to look into how we could bulk import nested config and apply it at root level. Even if it is a separate import + apply mechanism (eg. like config profiles in a sense?). I think without it, the use of the meta pipeline functionality is substantially limited.

Comment on lines +58 to +61
3. No use of project-level assets (`projectDir`, `bin`, `lib`) within the core workflow. Module-level assets can be used through the module `resources/` bundle and `moduleDir`.
4. Declare software dependencies (`container`, `conda`) in the process definition, not in config.
5. No default `ext` settings in config -- specify these defaults in the process definition or use explicit process inputs. Otherwise, any default `ext` settings must be replicated manually in the meta-pipeline.
6. No plugin functions within the core workflow.

@jorgee jorgee Jun 10, 2026

Contributor

It's not clear to me what some of these best practices mean, or what goes wrong when they aren't followed; it might be good to add an example.

Member Author

Following these guidelines makes it so that when you include the core workflow and its dependent modules/subworkflows, it is self-contained

For example:

  • if the core workflow uses project-level assets like bin or lib, I have to remember to copy them into the meta-pipeline
  • if the core workflow uses a param directly and I import that into the meta-pipeline, I have to remember to define the same param (with the same meaning) in the meta-pipeline
  • and so on

> results/output-rnaseq.json
```

While pipeline chaining has always been possible in theory, new language features such as [workflow outputs](20251020-workflow-outputs.md) and [record types](20260306-record-types.md) make it much more practical. Each pipeline can define a structured output which can be passed to the next pipeline via JSON. Mismatches between an upstream output and downstream input (e.g. missing columns, different column names) can be resolved by a small adapter pipeline.

Contributor

I remain to be convinced of the point of pipeline chaining if we can trivially make meta pipelines.

Collaborator

Agreed, I think pipeline chaining is used because metapipelines don't work right now. If they did, the number of pipeline chains would drop.

That's not to say they're never useful, but it's much less common.

Two main use cases:

  • Run major pipeline (sarek, rnaseq) and add a few auxiliary processes
  • Daisy chain two pipelines (fetchngs -> rnaseq)

Both are solved better by metapipelines than pipeline chaining.

The main use case for daisy chaining is actually wiring Nextflow up to non-Nextflow tools, e.g. Nextflow into an ETL system. In this case structured inputs and outputs are still very useful.

Member Author

at this point the value prop of pipeline chaining appears to be low development overhead (just plug A into B)

Contributor

Well, chaining has development overhead, it's quite a faff, all we have to do is bring meta-pipeline dev under that faff level

Comment thread adr/20260608-remote-pipeline-inclusion.md
@pinin4fjords

Contributor

> As you might expect, I'm most concerned about the params.

Agreed. Feel like we need some sort of auto-import of the params of child workflows, so e.g. they appear automatically in Platform, and I could say e.g. meta.rnaseq.pseudoaligner = 'kallisto' in the meta pipeline's nextflow.config to override.

Then some auto-assembly of docs as well.

Basically we need to standardise at the nextflow level where a bunch of the non-nextflow pieces need to live.

3. No use of project-level assets (`projectDir`, `bin`, `lib`) within the core workflow. Module-level assets can be used through the module `resources/` bundle and `moduleDir`.
4. Declare software dependencies (`container`, `conda`) in the process definition, not in config.
5. No default `ext` settings in config -- specify these defaults in the process definition or use explicit process inputs. Otherwise, any default `ext` settings must be replicated manually in the meta-pipeline.
6. No plugin functions within the core workflow.

Collaborator

Plugin support feels like a requirement, functions like a webhook or logging statement could be critical for the workflow. The main challenge might be supporting multiple versions (e.g. WORKFLOW1 uses [email protected] and WORKFLOW2 uses [email protected]), but maybe we can just say "ONE PLUGIN ONLY"

Member Author

plugins for webhooks / logging typically live outside the core workflow. so the meta-pipeline would just import the core workflow logic and decide whether to include those plugins in its own shell

I have yet to see a plugin that is actually used in a workflow's core logic, although it's certainly possible. Most plugins provide third-party integrations at the pipeline boundary

Collaborator

I agree but this might become more popular with the plugin registry + vibe coding.

Sounds like premature optimization by me, easier to just tell people to be careful and deal with it if it's a problem.

Member Author

sure, that's why I call them out as best practices instead of hard rules. you can use a plugin function as long as you remember to declare it in the meta-pipeline config

2. No `publishDir` -- use the `output` block.
3. No use of project-level assets (`projectDir`, `bin`, `lib`) within the core workflow. Module-level assets can be used through the module `resources/` bundle and `moduleDir`.
4. Declare software dependencies (`container`, `conda`) in the process definition, not in config.
5. No default `ext` settings in config -- specify these defaults in the process definition or use explicit process inputs. Otherwise, any default `ext` settings must be replicated manually in the meta-pipeline.

Collaborator

ext.args is soooooo powerful, yet clearly breaks the interface contract for processes.

I still think we should promote args to a directive and it will solve a number of these issues (process.args) 😉 .

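// process definition: default declared with the proposed `args` directive
process my_process {
    args "--concise"
    // etc...
}

// main.nf: override at the call site
my_process(ch_inputs, args: "--verbose")

// nextflow.config: override through a process selector
process {
    withName: 'my_process' {
        args = "--verbose"
    }
}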

@bentsherman bentsherman Jun 10, 2026

Member Author

both ext.args and process.args can work, as long as the default value for the arg is defined in the process definition rather than in config

the core problem is that when I import a workflow, Nextflow doesn't know which config is "tied" to that workflow

5. No default `ext` settings in config -- specify these defaults in the process definition or use explicit process inputs. Otherwise, any default `ext` settings must be replicated manually in the meta-pipeline.
6. No plugin functions within the core workflow.

For process directives, it is helpful to distinguish *what* is executed vs *how* it is executed. Directives that affect the *what* (`container`, `ext` settings) should be owned by the process definition. Directives that affect the *how* (`cpus`, `memory`, `executor`, `queue`, `errorStrategy`) should be owned by the meta-pipeline.

Collaborator

I don't understand the distinction here.

Member Author

in other words, some directives affect the task result while others don't
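To make that split concrete, here is a small Python sketch (not Nextflow code) that sorts directives by who should own them. The directive names are the ones listed in the ADR excerpt above; the `owner` helper and the two set names are hypothetical, and any directive the excerpt does not mention is reported as undecided rather than guessed.

```python
# Directives that change *what* a task computes: owned by the process definition.
AFFECTS_RESULT = {"container", "conda", "ext"}

# Directives that change *how* a task is executed: owned by the meta-pipeline.
AFFECTS_EXECUTION = {"cpus", "memory", "executor", "queue", "errorStrategy"}


def owner(directive: str) -> str:
    """Return who should own a directive under the what/how split."""
    if directive in AFFECTS_RESULT:
        return "process definition"
    if directive in AFFECTS_EXECUTION:
        return "meta-pipeline"
    # Not covered by the ADR excerpt, so no claim is made here.
    return "undecided"


if __name__ == "__main__":
    for name in ("container", "memory", "publishDir"):
        print(f"{name}: {owner(name)}")
```

The test behind the split is whether changing the directive could change the task's output: if it can, the included workflow has to carry it; if it can't, the meta-pipeline is free to set it.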


Alternatively, these core plugin dependencies could be specified in the pipeline spec under `requires.plugins`. When installing a pipeline, Nextflow could copy these plugin declarations into the meta-pipeline config and/or spec.

Since this use case is rare -- plugin functions are typically used in the entry workflow outside the core workflow -- it can be deferred in the first iteration.

Collaborator

With more private plugin registries, I expect more utility methods in plugins (e.g. updateLims(sampleId, status)), but maybe this is premature optimization.

Member Author

A LIMS integration sounds like something that could live outside the core workflow


The Nextflow-in-Nextflow approach treats the included pipeline as a *black box* -- it preserves the exact pipeline behavior (core workflow + entry workflow + config) while forfeiting dataflow composition (separate dataflow graphs).

An ideal solution might combine the best of both: compose pipelines into a single dataflow graph (white box) while inheriting each pipeline's params, outputs, and config so they need not be replicated (black box). We considered such a model, where an included pipeline contributes its shell as namespaced, overridable defaults, but rejected it. Dataflow composition fundamentally requires exposing the core workflow as a set of channel ports, so the white-box mechanism is unavoidable; inheritance would only layer implicit behavior on top of it. That behavior comes at a steep cost: it relocates a one-time *write* cost (boilerplate) into a recurring *read* cost (hidden defaults, auto-bound arguments, auto-published outputs), burdens every tool that must now understand it (linter, type checker, config resolution, resume), and conflicts with the frozen-island philosophy that otherwise governs vendored code.

Collaborator

I agree with this. The added complexity is enormous.

Member Author

@ewels @pinin4fjords @adamrtalbot

Pulling everyone into this thread to talk about auto-inheritance

> As you might expect, I'm most concerned about the params. You characterise it as a one-off cost which is mitigated by LLMs, however that doesn't take into account updates to included pipelines (a core functionality with included modules). The params drift with updates would be dangerous and a constant source of dev work.

That's fair, but not my main point. The core problem is this -- if you want to preserve dataflow concurrency between pipelines, then you can't really just auto-import params into the meta-pipeline. You have to define which params are replaced with inter-pipeline wiring vs exposed to the top-level. That amounts to just writing the meta-workflow.

The development overhead is what it is. I suggest the AI skill just as an idea. I'm sure it could also handle updates. All of that is better than having loads of hidden behavior that makes the meta-pipeline impossible to reason about

> I'd still love to look into how we could bulk import nested config and apply it at root level. Even if it is a separate import + apply mechanism (eg. like config profiles in a sense?). I think without it, the use of the meta pipeline functionality is substantially limited.

Not sure I understand this point. Most of the config is just standard boilerplate, so it doesn't make sense to auto-import it because you will just get lots of duplicate config

Unless you are talking about ext config. That will depend on whether we can move the default ext settings into the process definition

Member Author

Building on what Adam said:

> In a scenario where I update my workflow from v1.1 to v1.2, an update to params should be explicit in the input block, not implicit and I hope it doesn't change too much.

The nice thing about an explicit meta-pipeline definition is that when I update the included pipeline, the linter / language server will immediately pick up on any inconsistencies, because it's just regular code. I'm not sure the tooling would be able to do that if there was a lot of implicit behavior
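As a rough model of the kind of inconsistency explicit wiring exposes on update, the Python sketch below compares the declared inputs of an included workflow at two versions. It is only an illustration, not how the Nextflow linter or language server works; `diff_inputs`, the input names, and the type labels are all hypothetical.

```python
def diff_inputs(old: dict, new: dict) -> dict:
    """Compare two {input name: type} mappings and report what changed."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "retyped": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# Hypothetical declared inputs of an included workflow at two versions.
v1_1 = {"samples": "Channel", "aligner": "String", "fasta": "Path"}
v1_2 = {"samples": "Channel", "aligner": "String", "pseudoaligner": "String"}

if __name__ == "__main__":
    for kind, names in diff_inputs(v1_1, v1_2).items():
        print(f"{kind}: {names}")
```

With an explicit call site, every entry under `removed` or `retyped` is an argument the meta-pipeline already passes, so the mismatch surfaces as an ordinary error in regular code instead of a silent change in inherited defaults.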

}

// perform RNAseq analysis
multiqc_report = NFCORE_RNASEQ( ch_samples )

Collaborator

Side note - I would remove MultiQC from all nf-core pipelines and put them in the metapipelines, i.e. no MultiQC repeats, but that's a matter of opinion.

FETCHNGS(ch_inputs)
RNASEQ(FETCHNGS.out)
MULTIQC(RNASEQ.out.qc_files)

Member Author

I was wondering about that. Wasn't sure if you would want a meta-pipeline to produce one multiqc report per pipeline or just one for the whole thing

Comment thread adr/20260608-remote-pipeline-inclusion.md Outdated
@adamrtalbot

adamrtalbot commented Jun 10, 2026

Collaborator

> > As you might expect, I'm most concerned about the params.
>
> Agreed. Feel like we need some sort of auto-import of the params of child workflows, so e.g. they appear automatically in Platform, and I could say e.g. meta.rnaseq.pseudoaligner = 'kallisto' in the meta pipeline's nextflow.config to override.
>
> Then some auto-assembly of docs as well.
>
> Basically we need to standardise at the nextflow level where a bunch of the non-nextflow pieces need to live.

I disagree. Having unpredictable global-scope params blocks is just weird, and if we were designing Nextflow today we would never include this behaviour. In other languages, globals need to be used with caution and are generally not advised. Having random params.foo.bar.baz with no way of validating or checking is just "something you have to know", instead of being clear to the author.

In a scenario where I update my workflow from v1.1 to v1.2, an update to params should be explicit in the input block, not implicit and I hope it doesn't change too much.

If we really want to make them importable, we could add a dedicated params block to the workflow definition:

workflow THING {
   params:
        foo: Int
        bar: Bool
        baz: String

   take:
   // etc
}

but this doesn't feel very different to:

record ThingParams {
    foo: Int
    bar: Bool
    baz: String
}

workflow THING {
   take:
       params: ThingParams

   // etc
}

@adamrtalbot

Collaborator

My main concern here is versioning of imported workflows. Do we include a lock file or something to ensure consistency or just trust in the files that are copied into the workflow code?


When a pipeline is included, it is vendored into the meta-pipeline project under `workflows/<scope>/<name>/`. Included pipelines are isolated -- each included pipeline has its own `modules/` and `workflows/` directories. This way, two pipelines can use different versions of the same module without compromising reproducibility.

Included pipelines should be committed to the meta-pipeline repository. The pipeline should have a *pipeline spec* (`nextflow_spec.json`) which specifies the pipeline version, so that Nextflow can track local changes.

Member Author

@adamrtalbot

> My main concern here is versioning of imported workflows. Do we include a lock file or something to ensure consistency or just trust in the files that are copied into the workflow code?

See here. Like modules, we will likely want to have some sort of checksum verification (e.g. .pipeline-info)

I guess the simplest way would be to commit the entire pipeline, even though only the core workflow will be used. Then you can have a single checksum for the entire pipeline directory

It's probably still useful to keep the pipeline shell in the meta-pipeline repo, since e.g. your agent will want to refer to it when updating the meta-pipeline
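For illustration, a single checksum over a vendored pipeline directory could be computed along these lines. This is a Python sketch under stated assumptions, not Nextflow's implementation: the thread does not specify a hash algorithm or file format, so SHA-256 and the `directory_checksum` helper are hypothetical choices.

```python
import hashlib
from pathlib import Path


def directory_checksum(root: Path) -> str:
    """Return a SHA-256 digest covering every file's relative path and contents."""
    digest = hashlib.sha256()
    # Sort so the digest does not depend on filesystem listing order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix().encode("utf-8")
        data = path.read_bytes()
        # Length-prefix each field so file boundaries cannot be confused.
        digest.update(len(rel).to_bytes(8, "big"))
        digest.update(rel)
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


if __name__ == "__main__":
    print(directory_checksum(Path(".")))
```

Any local edit, rename, or added file under the directory changes the digest. A real implementation would need to store the recorded digest outside the hashed tree, or exclude that file from the walk.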

Collaborator

nf-core copy+pastes modules for subworkflows and it works well!

Contributor

could we force shared modules if we wanted to?

Comment on lines +282 to +284
```groovy
include { NFCORE_FETCHNGS } from 'nf-core/fetchngs'
include { NFCORE_RNASEQ } from 'nf-core/rnaseq'
```

Member Author

For anyone feeling adventurous, here is what Claude and I came up with while exploring auto-inheritance:

include { NFCORE_FETCHNGS } from 'nf-core/fetchngs'
include { NFCORE_RNASEQ } from 'nf-core/rnaseq'

params {
    input: Path // meta entry point
    strandedness: String = 'auto' // one new knob
    // aligner / fasta / ... inherited from rnaseq's params, override on CLI as --rnaseq.fasta=...
}

workflow {
    main:
    ch_ids = channel.fromPath(params.input).splitCsv()
    ch_samples = NFCORE_FETCHNGS( ch_ids )

    ch_samples = ch_samples.map { r -> r + record(strandedness: params.strandedness) }

    // rnaseq.* params automagically passed to rnaseq workflow via named arguments
    NFCORE_RNASEQ( samples: ch_samples )

    // no publish/output blocks: each pipeline's outputs publish under <output-dir>/<pipeline>/
    // question: what if I don't want to publish something (e.g. fetchngs output)?
}

Feel free to take it and run with it...

Contributor

Exactly the way I was thinking. We just namespace the children's params

Member Author

Idea from today's discussion: import the inner pipeline params as a record type:

include { NFCORE_FETCHNGS } from 'nf-core/fetchngs'
include { NFCORE_RNASEQ } from 'nf-core/rnaseq'
include { params as RnaseqParams } from 'nf-core/rnaseq'

params {
    input: Path // meta entry point
    strandedness: String = 'auto' // one new knob
    rnaseq: RnaseqParams
}

// ...

Member Author

Related idea: include params/entry/output trio as a fake named workflow so that you can just call it:

include { workflow as FETCHNGS } from 'nf-core/fetchngs'
include { workflow as RNASEQ } from 'nf-core/rnaseq'

workflow {
    fetchngs_out = FETCHNGS(params.fetchngs)

    // glue logic...
    rnaseq_in = params.rnaseq + fetchngs_out

    rnaseq_out = RNASEQ(rnaseq_in)
}

This doesn't require the inner pipelines to isolate a core workflow, it just includes the entire pipeline (params and publishing, not config)

If we allow channels in the params block (we already do it for the output block), we can even preserve the dataflow concurrency

@pditommaso pditommaso left a comment

Member

Thanks for putting this together, Ben — the dataflow-composition motivation and the rejection of the runtime-inheritance hybrid are both nicely argued. A few thoughts to share before this moves past draft:

1. The key technical challenge could be expanded. At its core this proposes a mechanism to include a fully-fledged Nextflow workflow into another, mimicking how we already include modules and sub-workflows. The part I'd love to see fleshed out is how channels and values get bound into the included workflow's inputs. The ADR sets out the policy (params live at the top level, the core workflow consumes everything via take:) but doesn't yet describe the binding mechanics: how a scalar value vs. a streaming channel is bound at the call site, the value-channel/queue-channel broadcast semantics, and whether a typed take: can accept a bare value type like String/Path. The example here (take: aligner: String) also reads a bit differently from the typed-workflows ADR, where every take: input is a channel type. Since this binding question largely determines feasibility, it'd be great to work it out explicitly.

2. The nomenclature can be better shaped. The document moves between "meta-pipeline" and "remote pipeline", and I think the framing could be sharpened. Terms like workflow modularisation / workflow inclusion / workflow composition might describe what's happening (composing one workflow into another) more directly than introducing a new "meta-pipeline" category.

3. There's some overlap with existing sub-workflow inclusion. Once you discard the entry workflow, params, and output block and import only the core workflow, what's left looks a lot like a sub-workflow. It'd be helpful to clarify how this differs from including a remote sub-workflow, and what the main benefit is that justifies a separate mechanism (separate storage layout, a new nextflow_spec.json, a separate CLI, etc.).

4. A possible framing. I'd lean toward framing the next step as enabling remote sub-workflows — the natural progression after remote modules (processes). Module (process) → sub-workflow → composition feels like a clean, incremental story that reuses the conventions we already have, rather than introducing a "pipeline" as a new top-level artifact with its own resolution rules, storage path, and spec file. If we get remote sub-workflow inclusion right, "meta-pipelines" might largely fall out of it as a usage pattern rather than a new concept.

@bentsherman

Member Author

@pditommaso thanks for the review

> The part I'd love to see fleshed out is how channels and values get bound into the included workflow's inputs.

There isn't much to say here because it just works like normal. In the appendix example, NFCORE_RNASEQ is just a named workflow. The meta-pipeline calls it the same way that rnaseq would call it. The only difference is that some inputs might come from upstream outputs instead of params.

> ... how a scalar value vs. a streaming channel is bound at the call site, the value-channel/queue-channel broadcast semantics, and whether a typed take: can accept a bare value type like String/Path.

A workflow take can be a channel, a dataflow value, or a regular value. This is how it has always worked

> The document moves between "meta-pipeline" and "remote pipeline", and I think the framing could be sharpened. Terms like workflow modularisation / workflow inclusion / workflow composition might describe what's happening (composing one workflow into another) more directly than introducing a new "meta-pipeline" category.

"Meta-pipeline" is the top-line feature that everyone is after, but the only actual new feature proposed by the ADR is "remote pipeline inclusion" -- how to install a pipeline as a component and keep it in sync with the source. This is why the ADR is titled "Remote pipeline inclusion". Once you have that, everything else is just normal workflow composition and convention.

They are distinct concepts -- the ADR does not treat them as interchangeable.

> Once you discard the entry workflow, params, and output block and import only the core workflow, what's left looks a lot like a sub-workflow. It'd be helpful to clarify how this differs from including a remote sub-workflow, and what the main benefit is that justifies a separate mechanism (separate storage layout, a new nextflow_spec.json, a separate CLI, etc.).

The core workflow looks like a subworkflow because it is a subworkflow 😄

The only new thing that we introduce here is installing a pipeline into a project as a component and keeping it in sync with the remote source (either from Git or the registry). For that you likely need a pipeline spec (version, checksum) and a CLI (installing, updating). I just haven't spelled all that out yet because the bigger question right now is how to minimize developer overhead

> I'd lean toward framing the next step as enabling remote sub-workflows — the natural progression after remote modules (processes). Module (process) → sub-workflow → composition feels like a clean, incremental story that reuses the conventions we already have, rather than introducing a "pipeline" as a new top-level artifact with its own resolution rules, storage path, and spec file. If we get remote sub-workflow inclusion right, "meta-pipelines" might largely fall out of it as a usage pattern rather than a new concept.

Looks like you arrived at the same place as me. Remote workflows are the real feature, meta-pipelines emerge naturally as a convention on top.

I'm not sure whether it's worth trying to distinguish between pipelines / workflows / subworkflows. They're all basically the same thing. Especially if we add the ability to execute named workflows directly (#7208). The difference boils down to boilerplate, which we want to minimize anyway

This is why I just talk about "remote pipeline inclusion", because when I import a workflow, I don't really care whether that workflow is a "pipeline" like rnaseq or a "subworkflow" like BAM_STATS_SAMTOOLS. Workflow composition works the same way either way.

Happy to rename the ADR to "remote workflow inclusion" to align with the workflow keyword.

@ewels

ewels commented Jun 10, 2026

Member

> My main concern here is versioning of imported workflows. Do we include a lock file or something to ensure consistency or just trust in the files that are copied into the workflow code?

Modules have a .moduleinfo file with a hash to allow checking that stuff wasn't modified. I think I saw something similar mentioned here for pipelines / workflows?

@ewels

ewels commented Jun 10, 2026

Member

> Having unpredictable global scope params blocks is just weird and if we were designing Nextflow today we would never include this behaviour. In other languages, globals need to be used with caution and are generally not advised.

@adamrtalbot agreed, I never said global. I would love it if the pipeline config were imported within a dedicated scope and treated as a baseline default. Then the importing pipeline can override anything, but doesn't need to duplicate config that isn't being changed.

Doing this would not be trivial. The only way I can think of is to do something fairly radical like rendering the config at import time and saving that to a locked config file somewhere. Or some other crazy mechanism.
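The "scoped baseline plus overrides" idea can be modelled as a recursive merge. The Python sketch below is only a model of the semantics being asked for, not a proposed Nextflow mechanism; `apply_overrides`, the `rnaseq` scope name, and the default values are hypothetical placeholders (only `pseudoaligner = 'kallisto'` comes from the thread).

```python
import copy


def apply_overrides(defaults: dict, overrides: dict) -> dict:
    """Recursively merge overrides onto defaults without mutating either."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are scopes: descend and merge key by key.
            merged[key] = apply_overrides(merged[key], value)
        else:
            # A leaf value (or a type change): the override wins.
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical snapshot of the included pipeline's defaults, rendered at import time.
imported = {"rnaseq": {"aligner": "star", "pseudoaligner": "salmon"}}

# What the importing pipeline actually writes: only the values it changes.
overrides = {"rnaseq": {"pseudoaligner": "kallisto"}}

if __name__ == "__main__":
    print(apply_overrides(imported, overrides))
```

The merge itself is simple; the hard part raised in the comment above is producing the `imported` snapshot, since config has to be rendered and locked at import time for the baseline to be stable.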

@adamrtalbot

Collaborator

> > Having unpredictable global scope params blocks is just weird and if we were designing Nextflow today we would never include this behaviour. In other languages, globals need to be used with caution and are generally not advised.
>
> @adamrtalbot agreed, I never said global. I would love it if the pipeline config is imported within a dedicated scope and treated as a baseline default. Then the importing pipeline can override anything, but doesn't need to duplicate config that isn't being changed.
>
> Doing this would not be trivial. The only way I can think of is to do something fairly radical like rendering the config at import time and saving that to a locked config file somewhere. Or some other crazy mechanism.

Config or params? In my mind they are very different concepts, I was referring to parameters here.

@adamrtalbot

Collaborator

Happy to rename the ADR to "remote workflow inclusion" to align with the workflow keyword.

I agree with this. They're all workflows*, the only thing that separates a "pipeline" from a subworkflow is perception.

*except the anonymous entry workflow, which is where the sticky point about params and config comes in 😉

@ewels

ewels commented Jun 11, 2026

Member

Config or params? In my mind they are very different concepts, I was referring to parameters here.

Ideally params, but might need to be config for all the ext stuff..?

Happy to rename the ADR to "remote workflow inclusion" to align with the workflow keyword.

Yeah as it stands I think this basically boils down to the functionality we already have with nf-core subworkflows, right? Which is quite far from what I think of as meta-pipelines. Still good to have and useful..

@bentsherman

Member Author

Yeah as it stands I think this basically boils down to the functionality we already have with nf-core subworkflows, right?

Can the nf-core tooling install a workflow from a pipeline repo, e.g. NFCORE_RNASEQ from nf-core/rnaseq? I think that is the main thing that this ADR adds.

@bentsherman changed the title from "ADR: Meta-pipelines" to "ADR: Remote pipeline inclusion" on Jun 11, 2026
Comment on lines +72 to +76
// module
include { BWA_MEM } from 'nf-core/bwa/mem'

// pipeline
include { NFCORE_RNASEQ } from 'nf-core/rnaseq'

Member Author

One point that makes me hesitant to reframe the ADR as "remote workflow inclusion": here we are referencing the pipeline by name (nf-core/rnaseq).

It could be the GitHub repo or an entity in the Nextflow registry, but either way, the pipeline itself plays a role in facilitating the inclusion. Even if we only include the core workflow (NFCORE_RNASEQ), we likely need to store the entire pipeline code in the meta-pipeline repo, because that is the thing that is versioned.

As a user, I will want to know that my meta-pipeline is using a specific pipeline version (e.g. nf-core/rnaseq 3.3.0), so in effect we have to say that we are including the entire pipeline.

@bentsherman

Member Author

From our discussion today:

  • define a workflow registry as a distinct concept from modules, but follows similar patterns (spec file, publish/install mechanism)

  • workflow registry encapsulates both "core workflows" and "subworkflows"

  • pipeline developer should isolate core workflow and publish it to the workflow registry, then other pipelines can import it

  • for users who want to daisy-chain pipelines with minimal effort -- just make a pipeline chain (e.g. nf-cascade, platform actions) and sacrifice dataflow concurrency

@edmundmiller edmundmiller left a comment

Member

left a few thoughts on scope/clarity — overall the direction makes sense to me.


Nextflow supports reusing process definitions via remote *module* inclusion (e.g. `include { BWA_MEM } from 'nf-core/bwa/mem'`), but there is no standard mechanism to reuse an entire *pipeline* as a building block. Users must either fork and copy code, or compose/chain multiple `nextflow run` sessions, which forfeits dataflow composition.

The natural unit of reuse for a pipeline is its *core workflow* -- the named workflow that takes and emits channels (e.g. `NFCORE_RNASEQ` in `nf-core/rnaseq`) -- as distinct from the deployment shell around it (the `params`, entry workflow, `output` block, and config). The nf-core community has already structured their pipelines around this split in anticipation of meta-pipelines.

Member

might be worth clarifying whether this means nf-core pipelines are already generally import-ready today, or that they are moving toward this split. the rest of the ADR depends pretty heavily on a clean core-workflow / shell boundary.


- **Preserve dataflow composition**: the included pipeline participates in the meta-pipeline's dataflow graph (same session, same DAG, same work dir), enabling incremental reaction to emitted outputs.

- **Preserve reproducibility**: an included pipeline should produce the exact same results as it would when executed directly. Transitive dependencies should not be silently altered to reduce duplication.

Member

maybe qualify “exact same results” a bit here? since params, config, outputs, plugins, and publishing are not inherited, direct-run equivalence seems to require the meta-pipeline to explicitly reproduce the relevant shell behavior.


## Decision

Allow remote pipelines to be included into meta-pipelines, using the same namespacing conventions and include syntax as modules. Store the included pipeline in the meta-pipeline repository under `workflows/<scope>/<name>/` with its own subdirectories for modules and subworkflows. The meta-pipeline owns all top-level concerns (entry workflow, params, outputs, config). Pipelines should be written with a self-contained core workflow to make importing as easy as possible.

Member

this boundary makes sense to me. might be worth spelling out the implication that this is a contract for pipelines that want to be composable, rather than automatic reuse of arbitrary existing pipelines.

include { NFCORE_RNASEQ } from 'nf-core/rnaseq'
```

Including a remote pipeline is equivalent to including the top-level `main.nf` of that pipeline; any named workflow defined there can be included by name. By convention, the main script defines only the entry workflow and the core workflow (e.g. `NFCORE_RNASEQ` in nf-core/rnaseq). From this point, the included workflow can be called like any other workflow.

Member

could this clarify what happens to the unnamed entry workflow / top-level params / output block when including the top-level main.nf? “equivalent to including main.nf” sounds a little like ordinary script inclusion, but the model here discards the shell.


When a pipeline is included, it is vendored into the meta-pipeline project under `workflows/<scope>/<name>/`. Included pipelines are isolated -- each included pipeline has its own `modules/` and `workflows/` directories. This way, two pipelines can use different versions of the same module without compromising reproducibility.
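
To illustrate the isolation described above, here is a small Python sketch of how the same module name could resolve to a separate copy per included pipeline. The layout below `modules/` and the module name are assumptions made for the example; only the `workflows/<scope>/<name>/` prefix comes from the text.

```python
from pathlib import PurePosixPath


def vendored_root(scope: str, name: str) -> PurePosixPath:
    """Directory where an included pipeline is vendored in the meta-pipeline."""
    return PurePosixPath("workflows") / scope / name


def module_path(scope: str, name: str, module: str) -> PurePosixPath:
    """Resolve a module relative to the included pipeline that owns it."""
    return vendored_root(scope, name) / "modules" / module


# Hypothetical: both pipelines depend on a module with the same name.
rnaseq_copy = module_path("nf-core", "rnaseq", "nf-core/fastqc")
fetchngs_copy = module_path("nf-core", "fetchngs", "nf-core/fastqc")
```

Because each copy lives under its owning pipeline, the two can be pinned to different versions without either pipeline observing the other's choice.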

Included pipelines should be committed to the meta-pipeline repository. The pipeline should have a *pipeline spec* (`nextflow_spec.json`) which specifies the pipeline version, so that Nextflow can track local changes.

Member

should nextflow_spec.json align with the module metadata shape from the module-system ADR, or is this intentionally a separate pipeline-level spec? a short note would help avoid seeming like two parallel metadata conventions.


### Sourcing from Nextflow registry vs Git repositories

Pipelines could be stored in the Nextflow registry (as a new artifact type) or fetched directly from Git repositories. The pipeline registry is a potential long-term goal with other use cases, but it likely introduces additional scope that is not strictly related to meta-pipelines. Using existing Git repositories would be an expedient solution for the first iteration.

Member

since distribution/resolution/storage are part of the decision, could this ADR define the minimal first slice here? for example, git/tag install into workflows/<scope>/<name>/, with registry support left as a later extension.

@pditommaso

Member
  • define a workflow registry as a distinct concept from modules, but follows similar patterns (spec file, publish/install mechanism)

I'm still convinced that this isn't a distinct concept — it's the same entity at a different granularity, and the framing matters because it drives how much we end up building.

Logical view. A module is just a component with a well-defined input/output interface that hides its internals — it's self-contained. By that definition a process, a subworkflow, and a pipeline's core workflow are all modules; they differ only in granularity, not in kind. We already agreed upthread that "they're all basically the same thing." If the abstraction is identical, introducing a second top-level entity ("workflow") next to "module" creates a distinction the type system and the user's mental model don't actually need.

Tactical view. Making it a distinct concept means re-implementing the entire surface twice — for module and for workflow — for operations that are literally identical:

create      Create a new skeleton
install     Install from the registry
run         Run directly from the registry
list        List installed
remove      Remove an installed entry
search      Search the registry
view        Show info and usage template
publish     Publish to the registry
spec        Generate a meta.yml spec
validate    Validate structure and metadata

That's two CLIs, two sets of registry APIs, two spec formats, two sets of Platform integrations, two docs trees — to express what is, behaviorally, one operation set over one kind of artifact. The cost is recurring (every new capability has to be added in both places and kept in sync), not one-off.

Proposal. Keep a single registry and a single command group over one artifact: the module. If we need to differentiate a coarse-grained "core workflow" from a fine-grained process, do it with a type/kind field in the spec (and resolution conventions), not with a parallel system. The publish/install/spec mechanism is shared by construction; the differences (e.g. a core workflow vendoring its own modules/) become metadata and layout conventions on the same entity rather than a second concept.
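
As a rough illustration of this proposal, the Python sketch below models one registry and one operation set, with a `kind` field as the only discriminator. The `Spec` and `Registry` classes, the field names, the kind values, and the `1.0.0` version are hypothetical and do not describe the current registry API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Spec:
    """Hypothetical spec entry; `kind` is the only thing that tells artifacts apart."""
    name: str
    version: str
    kind: str  # e.g. "process", "subworkflow", "workflow"


@dataclass
class Registry:
    """One registry and one operation set, regardless of artifact kind."""
    published: dict = field(default_factory=dict)
    installed: dict = field(default_factory=dict)

    def publish(self, spec: Spec) -> None:
        self.published[(spec.name, spec.version)] = spec

    def install(self, name: str, version: str) -> Spec:
        spec = self.published[(name, version)]
        self.installed[name] = spec
        return spec

    def list_installed(self, kind: Optional[str] = None) -> list:
        """List installed names, optionally filtered by kind."""
        return sorted(
            name for name, spec in self.installed.items()
            if kind is None or spec.kind == kind
        )


registry = Registry()
registry.publish(Spec("nf-core/bwa/mem", "1.0.0", kind="process"))
registry.publish(Spec("nf-core/rnaseq", "3.3.0", kind="workflow"))
registry.install("nf-core/bwa/mem", "1.0.0")
registry.install("nf-core/rnaseq", "3.3.0")
```

Here `publish`, `install`, and `list_installed` never branch on `kind`; it only matters as a filter, which is the sense in which the differences become metadata on one entity.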

This also keeps the door open for #7208 (running named workflows directly) without forking the model — "run a module" and "run a workflow" stay the same command.

@pditommaso

Member

Note: my previous comment assumes the focus is on sub-workflows only, not plain pipelines.
