Add Nextflow module info to lineage TaskRun entry #7160

Merged: bentsherman merged 6 commits into master from add-module-in-lineage-task-run on Jun 12, 2026

Conversation

jorgee (Contributor) commented May 20, 2026

Summary

  • Add a new module field to the lineage TaskRun model (modules/nf-lineage/.../model/v1beta1/TaskRun.groovy) that records the remote Nextflow module defining the process executed by a task, encoded as name@version (e.g. nf-core/[email protected]).
  • LinObserver.getTaskModule walks task.processor.getOwnerScript() -> ScriptMeta to locate the script, and LinObserver.extractModuleInfo reads the module's meta.yml next to the .module-info marker. The marker identifies the directory as a Nextflow-managed remote module rather than any directory that happens to contain a meta.yml.
  • Manifest parsing is wrapped in try/catch and failures are logged as a warning, so a corrupt meta.yml will not fail the task-complete handler. The field is null for local includes and for processes defined in the main script.
  • The field is appended at the end of TaskRun's @Canonical constructor, preserving backward compatibility with existing positional constructor callers.
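
As an illustration of the marker-plus-manifest check described above, here is a minimal Java sketch. It is not the code in LinObserver (which is Groovy): the class ModuleMarkerSketch, the method findManifest, and the script name main.nf are hypothetical, and the sketch assumes that both .module-info and meta.yml sit in the same directory as the script.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration; names are not taken from the Nextflow code base.
class ModuleMarkerSketch {
    static final String MARKER = ".module-info";   // marks a managed remote module
    static final String MANIFEST = "meta.yml";     // carries name and version

    /**
     * Returns the manifest path only when BOTH the marker and the manifest
     * exist next to the script (assumed layout); otherwise returns null.
     */
    static Path findManifest(Path scriptPath) {
        Path dir = scriptPath.toAbsolutePath().getParent();
        if (dir == null) return null;
        if (!Files.exists(dir.resolve(MARKER))) return null;
        Path manifest = dir.resolve(MANIFEST);
        return Files.exists(manifest) ? manifest : null;
    }

    /** Builds a temp directory step by step and records whether a manifest is found. */
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("module-sketch");
            Path script = Files.createFile(dir.resolve("main.nf"));
            boolean neither = findManifest(script) != null;
            Files.createFile(dir.resolve(MANIFEST));
            boolean manifestOnly = findManifest(script) != null;
            Files.createFile(dir.resolve(MARKER));
            boolean both = findManifest(script) != null;
            return neither + "," + manifestOnly + "," + both;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints false,false,true
    }
}
```

A directory with only meta.yml is treated as a non-module here, which mirrors the stated purpose of the marker: a stray meta.yml alone should not produce a module entry.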

Test plan

  • ./gradlew :nf-lineage:test (full nf-lineage suite green)
  • New unit tests cover the happy path (meta.yml + .module-info -> name@version), no owner script, script not marked as module, script marked as module but no manifest (local include), and malformed meta.yml (warn + null)
  • Manual: run a pipeline that uses a remote module (e.g. nf-core module) with lineage enabled and confirm the module field is populated in the persisted TaskRun entry
  • Manual: run a pipeline with only local includes / main script and confirm the module field is null

🤖 Generated with Claude Code

netlify Bot commented May 20, 2026

Deploy Preview for nextflow-docs-staging ready!

🔨 Latest commit: d6bf29a
🔍 Latest deploy log: https://app.netlify.com/projects/nextflow-docs-staging/deploys/6a2c1cf1c6c7f60008238e63
😎 Deploy Preview: https://deploy-preview-7160--nextflow-docs-staging.netlify.app

@jorgee jorgee requested review from bentsherman and pditommaso and removed request for pditommaso May 20, 2026 10:59

pditommaso (Member) left a comment

Code review

Solid feature overall — happy/edge paths are covered and the .module-info + meta.yml AND-check correctly disambiguates remote modules from local includes (since setModule(true) is called by IncludeDef/ScriptLoaderV1/ScriptLoaderV2 for any include). Backward compatibility of the TaskRun positional constructor is preserved via Groovy's @Canonical defaults=true semantics.

Requesting changes for the following:

Important

  1. Risk of persisting "null@null" (LinObserver.groovy:315)

    return "${spec.name}@${spec.version}"

    ModuleSpecFactory.fromYaml does spec.name = data.name as String (and same for version). A valid-YAML manifest that happens to be missing those keys silently persists the GString "null@null" into the lineage record. Suggest:

    if( !spec.name || !spec.version ) {
        log.warn "Incomplete module manifest at '${manifestPath.toUriString()}'"
        return null
    }
    return "${spec.name}@${spec.version}".toString()

    The explicit .toString() is safer for downstream serializers (Kryo) than letting a GString flow into a String field.

  2. Per-task overhead with no caching: storeTaskRun runs on every onTaskComplete. Each call does two Files.exists checks plus a YAML parse on a potentially remote file system. For workflows with thousands of tasks reusing the same module, this is O(tasks) instead of O(modules). Modules are immutable for the duration of a run, so caching is safe. Easiest fix: annotate extractModuleInfo(Path scriptPath) with @groovy.transform.Memoized. Keyed on the script path, this turns it into a one-shot lookup per module.

  3. catch( Throwable e ) is too broad (LinObserver.groovy:317). It catches OutOfMemoryError, StackOverflowError, ThreadDeath, etc. Use Exception (or IOException | RuntimeException if you want explicitness).
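
The null guard from point 1 and the narrowed catch from point 3 can be sketched in plain Java. This is an illustration under stated assumptions, not the PR's Groovy code: ModuleIdSketch, naive, and guarded are hypothetical names, a Map stands in for the parsed manifest, a Callable stands in for the YAML parse, and example/tool@1.2.3 is a made-up value. Java string concatenation renders null as the text "null", which reproduces the same failure mode as the GString interpolation.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.Callable;

// Hypothetical illustration; names are not taken from the Nextflow code base.
class ModuleIdSketch {

    /** Unguarded formatting: missing keys silently become the text "null". */
    static String naive(Map<String, String> manifest) {
        return manifest.get("name") + "@" + manifest.get("version");
    }

    /**
     * Guarded formatting: returns null when name or version is missing,
     * and when loading the manifest fails with an ordinary exception.
     */
    static String guarded(Callable<Map<String, String>> loadManifest) {
        try {
            Map<String, String> manifest = loadManifest.call();
            String name = manifest.get("name");
            String version = manifest.get("version");
            if (name == null || name.isEmpty() || version == null || version.isEmpty()) {
                return null;   // incomplete manifest: never emit "null@null"
            }
            return name + "@" + version;
        } catch (Exception e) {
            // Exception, not Throwable: errors such as OutOfMemoryError still propagate.
            System.err.println("WARN: unable to read module manifest: " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(naive(Map.of()));   // prints null@null
        System.out.println(guarded(() -> Map.of("name", "example/tool", "version", "1.2.3")));
        System.out.println(guarded(() -> Map.of("name", "example/tool")));
        System.out.println(guarded(() -> { throw new IOException("malformed YAML"); }));
    }
}
```

The last two calls both yield null, but only the failing parse writes a warning, so an incomplete manifest and a corrupt one stay distinguishable in the logs.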

Minor

  1. Test coverage gap on serialization: all 5 new tests exercise getTaskModule(task) in isolation with mocks; they cover neither storeTaskRun end-to-end nor the new field's encode/decode roundtrip. Consider extending LinEncoderTest's should encode and decode TaskRun to assert the module field survives the roundtrip.

  2. Malformed-YAML test doesn't assert the warning log: acceptable, but a regression that silently drops the warning would go unnoticed. A MemoryAppender assertion would lock it in.

  3. Field naming: module is fine but slightly ambiguous given the rich nextflow.module.* package. moduleRef / moduleId would be more self-documenting and align with workflowRun (a reference id) rather than the noun "module". Not blocking.

  4. Style nit: LinObserver.groovy:301 is missing a space before the opening brace in extractModuleInfo(Path scriptPath){.

  5. Uncovered edge cases: symlinked module paths and subworkflow includes work with the current implementation (verified by reading loadModuleV2 + ScriptMeta), but a brief integration-style test would lock that behavior in.

Verification done

  • Read LinObserver.groovy, TaskRun.groovy, IncludeDef.groovy, ScriptMeta.groovy, ModuleSpecFactory.groovy.
  • grep -rn 'setModule\s*(' — confirmed local + remote includes both set module=true.
  • grep -rn 'new TaskRun(' in test sources — confirmed pre-existing positional callers still compile under defaults=true.
  • ./gradlew :nf-lineage:test — green.

Overall: changes requested, non-blocking. Once 1–3 are addressed, this is ready to merge.

jorgee and others added 3 commits June 9, 2026 18:01
- Guard against incomplete manifest persisting 'null@null'; return
  null + warn when name/version missing, and force String for Kryo
- Cache extractModuleInfo with @Memoized (O(modules) not O(tasks))
- Narrow catch(Throwable) to catch(Exception)
- Rename TaskRun field 'module' -> 'moduleId' (and getTaskModule ->
  getTaskModuleId) to align with the workflowRun reference convention
- Style: add missing space before method body brace
- Tests: assert warning logs via ListAppender, add incomplete-manifest
  case, and cover moduleId encode/decode roundtrip in LinEncoderTest

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
Signed-off-by: jorgee <[email protected]>
Drop the warning when the manifest is missing name/version; just
return null to avoid persisting 'null@null'.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
Signed-off-by: jorgee <[email protected]>
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
Signed-off-by: jorgee <[email protected]>
jorgee (Contributor, Author) commented Jun 9, 2026

Thanks for the review — all points addressed in the latest pushes:

Important

  1. "null@null" risk: extractModuleInfo now returns null (with a log.debug) when the manifest is missing name/version, and the success path returns "${spec.name}@${spec.version}".toString() so a GString never reaches the String field / Kryo.
  2. Per-task overhead: extractModuleInfo(Path) is now @Memoized, so the Files.exists checks + YAML parse run once per module instead of once per task.
  3. catch(Throwable): narrowed to catch(Exception).
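
To make the "once per module instead of once per task" effect concrete, here is a small Java sketch of a path-keyed cache. It is an analogy rather than Groovy's @Memoized: MemoizedLookupSketch, its lookup function, and the two script paths are hypothetical, and the lookup is a counter-instrumented stub instead of a real file check and YAML parse.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical illustration; names are not taken from the Nextflow code base.
class MemoizedLookupSketch {
    // Optional wrapper so that "no module" (null) results are cached too;
    // computeIfAbsent does not record a mapping when the function returns null.
    private final ConcurrentHashMap<Path, Optional<String>> cache = new ConcurrentHashMap<>();
    private final Function<Path, String> lookup;

    MemoizedLookupSketch(Function<Path, String> lookup) {
        this.lookup = lookup;
    }

    /** Runs the expensive lookup at most once per distinct script path. */
    String moduleId(Path scriptPath) {
        return cache
            .computeIfAbsent(scriptPath, p -> Optional.ofNullable(lookup.apply(p)))
            .orElse(null);
    }

    /** Simulates 1000 task completions spread over two scripts; returns the lookup count. */
    static int demo() {
        AtomicInteger calls = new AtomicInteger();
        MemoizedLookupSketch sketch = new MemoizedLookupSketch(p -> {
            calls.incrementAndGet();   // stands in for Files.exists + YAML parse
            return p.toString().contains("modules") ? "example/tool@1.2.3" : null;
        });
        Path remote = Paths.get("modules", "example", "tool", "main.nf");
        Path local = Paths.get("local", "helper.nf");
        for (int task = 0; task < 1000; task++) {
            sketch.moduleId(task % 2 == 0 ? remote : local);
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints 2
    }
}
```

Caching the null result matters as much as caching the hit: local includes are the common case in many pipelines, and without the Optional wrapper each of their tasks would repeat the lookup.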

Minor

  4. Serialization coverage: extended LinEncoderTest's TaskRun roundtrip to set and assert workflowRun + moduleId.
  5. Warning assertion: added a logback ListAppender helper; the malformed-manifest test now asserts the WARN is emitted. (The incomplete-manifest case is debug-level only, so its test just asserts null.)
  6. Field naming: renamed module -> moduleId (and getTaskModule -> getTaskModuleId) to align with the workflowRun reference convention.
  7. Style nit: added the missing space before the method body brace.

@jorgee jorgee requested a review from pditommaso June 9, 2026 16:18

pditommaso (Member) left a comment

Thanks @jorgee for addressing all the review points — the null-safety fix, memoization, and the extra test coverage look great. LGTM 👍

@bentsherman bentsherman merged commit 1be0561 into master Jun 12, 2026
21 of 23 checks passed
@bentsherman bentsherman deleted the add-module-in-lineage-task-run branch June 12, 2026 15:26

3 participants