Add Nextflow module info to lineage TaskRun entry #7160
Signed-off-by: jorgee <[email protected]>
pditommaso left a comment
Code review
Solid feature overall — happy/edge paths are covered and the .module-info + meta.yml AND-check correctly disambiguates remote modules from local includes (since setModule(true) is called by IncludeDef/ScriptLoaderV1/ScriptLoaderV2 for any include). Backward compatibility of the TaskRun positional constructor is preserved via Groovy's @Canonical defaults=true semantics.
Requesting changes for the following:
Important
1. Risk of persisting "null@null" (LinObserver.groovy:315): return "${spec.name}@${spec.version}"

   ModuleSpecFactory.fromYaml does spec.name = data.name as String (and the same for version). A valid-YAML manifest that happens to be missing those keys silently persists the GString "null@null" into the lineage record. Suggest:

       if( !spec.name || !spec.version ) {
           log.warn "Incomplete module manifest at '${manifestPath.toUriString()}'"
           return null
       }
       return "${spec.name}@${spec.version}".toString()

   The explicit .toString() is safer for downstream serializers (Kryo) than letting a GString flow into a String field.

2. Per-task overhead with no caching: storeTaskRun runs on every onTaskComplete. Each call does 2 Files.exists checks plus a YAML parse on a potentially remote FS. For workflows with thousands of tasks reusing the same module, this is O(tasks) instead of O(modules). Modules are immutable for the duration of a run, so caching is safe. Easiest fix: annotate extractModuleInfo(Path scriptPath) with @groovy.transform.Memoized. Keyed on the script path, this turns it into a one-shot lookup per module.

3. catch( Throwable e ) is too broad (LinObserver.groovy:317). It catches OutOfMemoryError, StackOverflowError, ThreadDeath, etc. Use Exception (or IOException | RuntimeException if you want explicitness).
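Points 1 and 2 can be sketched together outside Nextflow. The following is a minimal illustration in plain Java, not the PR's Groovy code: ModuleIdCache, toModuleId, moduleIdFor, the manifestLoader function, and the loads counter are all hypothetical names, and a ConcurrentHashMap stands in for what @Memoized would provide. It assumes the manifest has already been parsed into a map.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical stand-in for the module-id lookup; not the LinObserver code. */
final class ModuleIdCache {
    // One entry per script path. Optional lets a "no module id" result be
    // cached as well, because computeIfAbsent does not store null values.
    private final Map<String, Optional<String>> cache = new ConcurrentHashMap<>();
    // Assumed loader: returns the parsed manifest keys for a script path.
    private final Function<String, Map<String, Object>> manifestLoader;
    // Counts real loads, only so the caching effect can be observed.
    final AtomicInteger loads = new AtomicInteger();

    ModuleIdCache(Function<String, Map<String, Object>> manifestLoader) {
        this.manifestLoader = manifestLoader;
    }

    /** Builds "name@version", or returns null when either key is missing or empty. */
    static String toModuleId(Map<String, Object> manifest) {
        if (manifest == null) return null;
        Object name = manifest.get("name");
        Object version = manifest.get("version");
        if (name == null || version == null) return null;
        String n = name.toString();
        String v = version.toString();
        if (n.isEmpty() || v.isEmpty()) return null;
        return n + "@" + v; // a plain String, never the text "null@null"
    }

    /** Loads the manifest at most once per script path. */
    String moduleIdFor(String scriptPath) {
        return cache.computeIfAbsent(scriptPath, p -> {
            loads.incrementAndGet();
            return Optional.ofNullable(toModuleId(manifestLoader.apply(p)));
        }).orElse(null);
    }
}
```

Calling moduleIdFor repeatedly with the same path triggers a single load, which is the O(modules) behaviour the review asks for. Caching the empty Optional matters here: without it, every task from a local include would repeat the lookup.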
Minor
- Test coverage gap on serialization: all 5 new tests exercise getTaskModule(task) in isolation with mocks; they cover neither storeTaskRun end-to-end nor the new field's encode/decode roundtrip. Consider extending LinEncoderTest's "should encode and decode TaskRun" to assert the module field survives the roundtrip.
- Malformed-YAML test doesn't assert the warning log: acceptable, but a regression that silently drops the warning would go unnoticed. A MemoryAppender assertion would lock it in.
- Field naming: module is fine but slightly ambiguous given the rich nextflow.module.* package. moduleRef / moduleId would be more self-documenting and align with workflowRun (a reference id) rather than the noun "module". Not blocking.
- Style nit: LinObserver.groovy:301 is missing a space before { in extractModuleInfo(Path scriptPath){.
- Uncovered edge cases: symlinked module paths and subworkflow includes work with the current implementation (verified by reading loadModuleV2 + ScriptMeta), but a brief integration-style test would lock that behavior in.
Verification done
- Read LinObserver.groovy, TaskRun.groovy, IncludeDef.groovy, ScriptMeta.groovy, ModuleSpecFactory.groovy.
- grep -rn 'setModule\s*(': confirmed local + remote includes both set module=true.
- grep -rn 'new TaskRun(' in test sources: confirmed pre-existing positional callers still compile under defaults=true.
- ./gradlew :nf-lineage:test: green.
Overall: changes requested, non-blocking. Once 1–3 are addressed, this is ready to merge.
- Guard against incomplete manifest persisting 'null@null'; return null + warn when name/version missing, and force String for Kryo
- Cache extractModuleInfo with @Memoized (O(modules) not O(tasks))
- Narrow catch(Throwable) to catch(Exception)
- Rename TaskRun field 'module' -> 'moduleId' (and getTaskModule -> getTaskModuleId) to align with the workflowRun reference convention
- Style: add missing space before method body brace
- Tests: assert warning logs via ListAppender, add incomplete-manifest case, and cover moduleId encode/decode roundtrip in LinEncoderTest

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
Signed-off-by: jorgee <[email protected]>
Drop the warning when the manifest is missing name/version; just return null to avoid persisting 'null@null'.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
Signed-off-by: jorgee <[email protected]>
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]> Signed-off-by: jorgee <[email protected]>
Thanks for the review. All points, Important and Minor, are addressed in the latest pushes.
pditommaso left a comment
Thanks @jorgee for addressing all the review points — the null-safety fix, memoization, and the extra test coverage look great. LGTM 👍
Summary
- Adds a module field to the lineage TaskRun model (modules/nf-lineage/.../model/v1beta1/TaskRun.groovy) that records the remote Nextflow module defining the process executed by a task, encoded as name@version (e.g. nf-core/[email protected]).
- LinObserver.getTaskModule walks task.processor.getOwnerScript() → ScriptMeta to locate the script, and LinObserver.extractModuleInfo reads the module's meta.yml next to the .module-info marker (the latter identifies the directory as a Nextflow-managed remote module rather than any directory that happens to contain a meta.yml).
- Manifest read errors are caught in a try/catch and logged as a warning, so a corrupt meta.yml will not fail the task-complete handler. The field is null for local includes and for processes defined in the main script.
- The new field is added to TaskRun's @Canonical constructor, preserving backward compatibility with existing positional constructor callers.

Test plan
- ./gradlew :nf-lineage:test: full nf-lineage suite green
- Unit tests cover: remote module (meta.yml + .module-info → name@version), no owner script, script not marked as module, script marked as module but no manifest (local include), and malformed meta.yml (warn + null)
- Run a pipeline that uses a remote module (e.g. an nf-core module) with lineage enabled and confirm the module field is populated in the persisted TaskRun entry
- Run a pipeline with only local includes and confirm the module field is null

🤖 Generated with Claude Code
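The marker-plus-manifest check described in the summary can be illustrated with a small sketch in plain Java. This is not the code from LinObserver: RemoteModuleCheck and isRemoteModule are hypothetical names, and the sketch assumes that both .module-info and meta.yml sit in the same directory as the module script.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical illustration of the ".module-info AND meta.yml" check. */
final class RemoteModuleCheck {
    /**
     * Returns true only when the script's directory contains both the
     * .module-info marker and a meta.yml manifest. A directory that merely
     * happens to contain a meta.yml (a local include) is rejected.
     */
    static boolean isRemoteModule(Path scriptPath) {
        // Assumption: marker and manifest are siblings of the script file.
        Path dir = scriptPath.toAbsolutePath().getParent();
        if (dir == null) return false;
        return Files.exists(dir.resolve(".module-info"))
            && Files.exists(dir.resolve("meta.yml"));
    }
}
```

Requiring both files is what keeps the field null for local includes: the manifest alone is not treated as proof that the directory is a Nextflow-managed remote module.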