
Add module directive to lineage TaskRun record #7203

Open
pinin4fjords wants to merge 2 commits into nextflow-io:master from pinin4fjords:lineage-taskrun-module

@pinin4fjords

Contributor

Partially addresses #7202.

The task cache hash keys on environment module directives (TaskHasher adds task.config.getModule() to the hash), but the lineage TaskRun record had no field for them. As a result, comparing two task runs with nextflow lineage diff could not surface a module change that invalidated the cache, even though the record already captures the other environment inputs (conda, spack, container, architecture).

This adds a module field to the TaskRun v1beta1 model and populates it in LinObserver, alongside the existing environment fields.

Example

process GREET {
    module 'samtools/1.9'
    input:  val name
    output: path 'out.txt'
    script: """ echo "Hello ${name}" > out.txt """
}
workflow { GREET(Channel.of('world')) }

Changing samtools/1.9 to samtools/1.17 and re-running with -resume correctly re-runs the task. Before this change, lineage diff of the two task runs showed only the parent run id changing; now it shows the module field changing.
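To make the before/after behaviour concrete, here is a minimal Java sketch of a field-by-field record diff. It is not the nf-lineage implementation (the real TaskRun model is Groovy and has many more fields). TaskRunSketch and diff are hypothetical names, and the record is modelled as a plain map of field name to resolved value. It also shows why a record decoded without a module field can still be compared safely: the missing field simply reads as null.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical, simplified stand-in for a lineage TaskRun record: a few
// environment fields held as a field-name -> resolved-value map.
// This is NOT the nf-lineage TaskRun model.
class TaskRunSketch {
    private final Map<String, String> fields = new LinkedHashMap<>();

    // Fluent setter; a null value stands for "directive not set".
    TaskRunSketch set(String name, String value) {
        fields.put(name, value);
        return this;
    }

    // Returns the names of fields whose values differ between two records,
    // in first-seen order. A field missing from one record (e.g. an older
    // record written before a "module" field existed) reads as null, and
    // Objects.equals treats two nulls as equal, so nothing throws.
    static List<String> diff(TaskRunSketch a, TaskRunSketch b) {
        Set<String> names = new LinkedHashSet<>(a.fields.keySet());
        names.addAll(b.fields.keySet());
        List<String> changed = new ArrayList<>();
        for (String name : names) {
            if (!Objects.equals(a.fields.get(name), b.fields.get(name))) {
                changed.add(name);
            }
        }
        return changed;
    }
}
```

Under these assumptions, diffing two sketch records that set module to samtools/1.9 and samtools/1.17 returns a list containing only "module". If the record had no module field at all, the two runs would look identical on every modelled field, which mirrors the earlier situation where only the parent run id appeared to change.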

Scope

This covers the module directive only. The eval output commands and the stub-run flag (the other cache-hash inputs missing from the record, per #7202) are left out: representing eval commands needs a design decision, and the stub flag is partially covered already via codeChecksum switching to the stub source. Those remain tracked in #7202.

This makes the record consistent with that cache-hash input; it does not make lineage diff a full substitute for -dump-hashes, since the record stores resolved values rather than the ordered hash-key list and DataPath checksums always use the default hash mode.

Tests

  • LinEncoderTest now round-trips a populated module field.
  • LinObserverTest and LinCommandImplTest are updated for the new field.
  • LinTypeAdapterFactoryTest confirms records without a module field still decode (backward compatible).

@netlify

netlify Bot commented Jun 8, 2026


Deploy Preview for nextflow-docs-staging ready!

Latest commit: 3c82cca
Latest deploy log: https://app.netlify.com/projects/nextflow-docs-staging/deploys/6a2be7ea8ea46200082e1c4a
Deploy Preview: https://deploy-preview-7203--nextflow-docs-staging.netlify.app

The task cache hash keys on environment module directives, but the
lineage TaskRun record had no field for them, so comparing two task
runs could not surface a module change that invalidated the cache.

Record the module directive alongside the other environment fields
(conda, spack, container, architecture) so the lineage record reflects
this cache-hash input.

Signed-off-by: Jonathan Manning <[email protected]>
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
pinin4fjords force-pushed the lineage-taskrun-module branch from 07fee3e to de22609 on June 8, 2026 11:33
@jorgee

jorgee commented Jun 8, 2026

Contributor

This uses the same name as the Nextflow modules field in #7160.

@jorgee jorgee left a comment

Contributor


Rename to envModule

Comment thread (outdated): modules/nf-lineage/src/main/nextflow/lineage/model/v1beta1/TaskRun.groovy
Comment thread (outdated): modules/nf-lineage/src/test/nextflow/lineage/serde/LinEncoderTest.groovy
Co-authored-by: Jorge Ejarque <[email protected]>
Signed-off-by: Jonathan Manning <[email protected]>
@pditommaso

Member

Let's keep the envModule rename in its own PR, likely #7205.

@pditommaso

Member

Isn't this PR a duplicate of #7160?

@jorgee

jorgee commented Jun 12, 2026

Contributor

No, this refers to environment modules, and #7160 is about adding the Nextflow module id in the lineage record.

@pditommaso

Member

Umm, not sure it should be included. Lineage is not a replacement for canonical task info.

@bentsherman

Member

@pditommaso actually, we did say that the TaskRun record should contain all of the task hash components, so that nextflow lineage diff can show the difference when a task is re-executed.

By that logic we should also include environment modules... even though I would hope they were extinct by now...



Development

Successfully merging this pull request may close these issues.

Lineage TaskRun record omits cache-determining inputs (module, eval, stub), so 'lineage diff' can't explain those cache misses

4 participants