Add seqera:// data-links support to nf-tower filesystem #7070
Signed-off-by: jorgee <[email protected]>
…ing cache Signed-off-by: jorgee <[email protected]>
Follow-up changes (pushed in
List `data-links/<provider>/` via the Platform `search=provider:<provider>` keyword instead of scanning the whole workspace list. Adds SeqeraDataLinkClient.listDataLinksByProvider() and switches the handler to use it, keeping a client-side equality guard against non-exact matches. Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]> Signed-off-by: jorgee <[email protected]>
Provider-side filtering for data-links listing (pushed in
@pditommaso I fixed the conflicts with master and ran a Claude review. I also included fixes for the detected issues, plus some improvements. It is ready for your review.
pditommaso left a comment
Thanks @jorgee for the thorough follow-up work! I reviewed the thread and confirmed all the self-documented fixes have landed:
- Datasets listing now skips-and-logs bad entries instead of aborting the whole listing
- Signed URLs are redacted in errors/logs via `redactUrl()`
- `PagedIterable` is explicitly single-use (fail-fast on second `iterator()`)
- Provider-side filtering for `data-links/<provider>/` via `listDataLinksByProvider`, with the client-side equality guard retained

Code and tests look good. One minor leftover: the "View role → clear AccessDeniedException" item in the test plan is still unchecked — the 403 mapping is in place, so just worth confirming manually or deferring explicitly. Approving. 🚀
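
For readers skimming the thread, the single-use contract noted above can be pictured with a small Java sketch. It is not the PR's `PagedIterable`: the class name `PagedIterableSketch`, the `Page` holder, the `PageFetcher` interface, and the string cursor are all assumptions made for illustration. The sketch also fetches the first page in the constructor, which is how the PR description says `IOException` reaches the call site.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Illustrative only: names and cursor handling are assumptions, not the PR's code.
final class PagedIterableSketch<T> implements Iterable<T> {

    /** One page of items plus the cursor of the next page (null when there is none). */
    static final class Page<T> {
        final List<T> items;
        final String nextCursor;

        Page(List<T> items, String nextCursor) {
            this.items = items;
            this.nextCursor = nextCursor;
        }
    }

    /** Hypothetical fetcher: a null cursor asks for the first page. */
    interface PageFetcher<T> {
        Page<T> fetch(String cursor) throws IOException;
    }

    private final PageFetcher<T> fetcher;
    private Page<T> current;
    private boolean consumed;

    PagedIterableSketch(PageFetcher<T> fetcher) throws IOException {
        this.fetcher = fetcher;
        // Eager first fetch: a failure is thrown here, at the call site.
        this.current = fetcher.fetch(null);
    }

    @Override
    public Iterator<T> iterator() {
        // Fail fast on a second call instead of silently replaying or skipping pages.
        if (consumed) {
            throw new IllegalStateException("single-use iterable: iterator() already called");
        }
        consumed = true;
        return new Iterator<T>() {
            private Iterator<T> items = current.items.iterator();

            @Override
            public boolean hasNext() {
                // Skip over empty pages until an item or the last page is reached.
                while (!items.hasNext() && current.nextCursor != null) {
                    try {
                        current = fetcher.fetch(current.nextCursor);
                    } catch (IOException e) {
                        // Iterator methods cannot throw checked exceptions.
                        throw new UncheckedIOException(e);
                    }
                    items = current.items.iterator();
                }
                return items.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return items.next();
            }
        };
    }
}
```

In this sketch only the first page fails at construction time; errors on later pages are rethrown as `UncheckedIOException` from `hasNext()`. Whether the PR handles later pages the same way is not stated in the thread.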
Summary
Extends the `seqera://` NIO filesystem in `nf-tower` with a second resource type, `data-links`. Paths of the form `seqera://<org>/<ws>/data-links/<provider>/<name>/<sub-path>` resolve to files and directories inside Platform-managed data-links (S3/GCS/Azure buckets or prefixes).

Listings and attribute queries go through the Platform's `/data-links/{id}/browse[/path]` endpoints; byte reads go through pre-signed URLs returned by `/data-links/{id}/generate-download-url` and fetched with a plain JDK `HttpClient`. Only the Seqera access token is required: no AWS/GCP/Azure credentials are needed, and no cloud SDK dependency is introduced.

As part of this change, the existing dataset-specific logic in `SeqeraFileSystemProvider`, `SeqeraFileSystem`, and `SeqeraPath` is extracted into a real `ResourceTypeHandler` abstraction; `DatasetsResourceHandler` and `DataLinksResourceHandler` are the two implementations. The generic `fs/` classes become resource-type-agnostic for depth ≥ 3 (enforced by `ResourceTypeAbstractionTest`).

Design artifacts: spec.md, plan.md, ADR.
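
To make the path shape concrete, here is a minimal Java sketch that splits a data-links URI into its segments. It is an illustration under stated assumptions, not the parsing code in `SeqeraPath`: the class `DataLinkUriSketch` and its field names are hypothetical, and the example organization, workspace, and bucket names are made up.

```java
import java.net.URI;
import java.util.Arrays;

// Illustrative only: the class and field names are assumptions, not the PR's code.
final class DataLinkUriSketch {
    final String org;
    final String workspace;
    final String provider;
    final String name;
    final String subPath; // empty when the URI points at the data-link root

    private DataLinkUriSketch(String org, String workspace, String provider,
                              String name, String subPath) {
        this.org = org;
        this.workspace = workspace;
        this.provider = provider;
        this.name = name;
        this.subPath = subPath;
    }

    static DataLinkUriSketch parse(String text) {
        URI uri = URI.create(text);
        if (!"seqera".equals(uri.getScheme())) {
            throw new IllegalArgumentException("Not a seqera:// URI: " + text);
        }
        // The org sits in the authority position; everything else is in the path.
        String org = uri.getAuthority();
        String path = uri.getPath();
        if (org == null || path == null || path.length() < 2) {
            throw new IllegalArgumentException("Missing org or workspace: " + text);
        }
        // Drop the leading "/" and split: <ws>/data-links/<provider>/<name>/<sub-path...>
        String[] seg = path.substring(1).split("/");
        if (seg.length < 4 || !"data-links".equals(seg[1])) {
            throw new IllegalArgumentException("Not a data-links path: " + text);
        }
        String subPath = String.join("/", Arrays.copyOfRange(seg, 4, seg.length));
        return new DataLinkUriSketch(org, seg[0], seg[2], seg[3], subPath);
    }
}
```

For example, `DataLinkUriSketch.parse("seqera://acme/research/data-links/aws/my-bucket/path/to/file.txt")` yields provider `aws`, name `my-bucket`, and sub-path `path/to/file.txt`.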
Highlights
- `seqera://<org>/<ws>/data-links/<provider>/<name>/<sub-path>`. Provider segments are the lowercase `DataLinkProvider.toString()` value (`aws`, `google`, `azure`, …).
- `PagedIterable<T>`: a single shared abstraction backs both the workspace data-link list (offset paginated) and data-link content browse (token paginated). The first page is fetched eagerly so `IOException` surfaces at the call site, not at the first `Iterator.hasNext()`. Two named static fetchers (`DataLinkListFetcher`, `DataLinkContentFetcher`) own their own cursor state.
- `readAttributes` on a sub-path lists the path's parent directory and finds the entry by name; the entry's `type` (`FILE`/`FOLDER`) is the authoritative signal, and a missing entry → `NoSuchFileException`. The `/browse/{path}` response shape alone does not reliably distinguish file/directory/missing paths.
- … `SeqeraFileAttributes` to each emitted `SeqeraPath`; the provider also writes resolved attributes back onto the path after a fresh read. Subsequent `readAttributes` calls on the same path instance hit the cache (zero API calls).
- `getDataLink(ws, provider, name)` issues a combined keyword search (`<name> provider:<provider>`) so the server returns at most one match.
- … `@Memoized`, including `null` misses.
- `SeqeraFileSystem` holds the `TowerClient` directly and exposes `getUserId()`, cached for the lifetime of the FS because the token doesn't change during a pipeline run. User/workspace lookup is shared infrastructure across resource types, not a dataset-client method.
- `credentialsId` forwarding: when `DataLinkDto.credentials` is non-empty, the first credential's `id` is forwarded as the `credentialsId` query parameter on browse and download-URL requests.
- … `AbortOperationException`; 403 → `AccessDeniedException`; 404 → `NoSuchFileException`. Consistent with the dataset client.
- … `Mock(TowerClient)`). The pre-existing dataset tests are unchanged and continue to pass.

Requirements / prerequisites
- `nf-tower` plugin must be enabled with `tower.accessToken` / `TOWER_ACCESS_TOKEN`.

Known limitations
- … `IOException`; Nextflow task retry handles recovery.
- `SeqeraFileAttributes.lastModifiedTime()` returns `Instant.EPOCH` for data-link entries.
- … `UnsupportedOperationException`. The Platform's `/data-links/{id}/upload` endpoints are a natural future extension point.

Test plan
- `./gradlew :plugins:nf-tower:test`: all 369 tests pass (verified locally)
- `./gradlew :plugins:nf-tower:dependencies --configuration runtimeClasspath` shows no new cloud-SDK artifacts (no `aws-sdk`, `google-cloud-storage`, `azure-*`)
- `nextflow fs ls seqera://<org>/<ws>/data-links/*` lists providers
- `nextflow fs ls seqera://<org>/<ws>/data-links/<provider>/*` lists data-link names
- `nextflow fs ls seqera://<org>/<ws>/data-links/<provider>/<name>/*` lists top-level bucket entries
- `nextflow fs stat seqera://<org>/<ws>/data-links/<provider>/<name>/<file>` reports `is directory: false` and the correct `size`
- `nextflow fs stat seqera://<org>/<ws>/data-links/<provider>/<name>/<dir>` reports `is directory: true`
- `nextflow fs stat seqera://<org>/<ws>/data-links/<provider>/<name>/<missing>` raises `NoSuchFileException`
- … `file('seqera://…/data-links/<provider>/<name>/path/to/file')` using only `TOWER_ACCESS_TOKEN`
- View role → clear `AccessDeniedException`
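
The three `fs stat` checks above exercise the parent-listing lookup described under Highlights. A rough Java sketch of that idea follows, assuming a hypothetical `Browser` interface in place of the Platform browse call and assuming entry names are returned without a trailing slash. `ParentListingStatSketch`, `Entry`, and `EntryType` are illustrative names, not classes from the PR.

```java
import java.io.IOException;
import java.nio.file.NoSuchFileException;
import java.util.List;

// Illustrative only: a stand-in for the lookup, not the PR's handler code.
final class ParentListingStatSketch {

    enum EntryType { FILE, FOLDER }

    static final class Entry {
        final String name;
        final EntryType type;
        final long size;

        Entry(String name, EntryType type, long size) {
            this.name = name;
            this.type = type;
            this.size = size;
        }
    }

    /** Hypothetical stand-in for listing one directory of a data-link. */
    interface Browser {
        List<Entry> list(String dirPath) throws IOException;
    }

    /** Resolves attributes by listing the parent directory and matching on name. */
    static Entry stat(Browser browser, String subPath) throws IOException {
        int slash = subPath.lastIndexOf('/');
        // An entry at the data-link root has the empty string as its parent.
        String parent = slash < 0 ? "" : subPath.substring(0, slash);
        String name = subPath.substring(slash + 1);
        for (Entry entry : browser.list(parent)) {
            if (entry.name.equals(name)) {
                return entry; // entry.type decides file versus directory
            }
        }
        // Nothing with that name in the parent listing: the path does not exist.
        throw new NoSuchFileException(subPath);
    }
}
```

The trade-off in this shape is one listing call per uncached lookup, which is presumably why the PR also caches attributes on path instances.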