Fix S3FileSystemProvider.newInputStream() draining full object on close #7046
Merged
…stemProvider.newInputStream() Signed-off-by: jorgee <[email protected]>
Signed-off-by: jorgee <[email protected]>
Member
The only documented way to stream files in the Nextflow std lib is withReader, which provides a BufferedReader in a closure and automatically closes it. So if anyone is calling |
bentsherman approved these changes Apr 17, 2026
bentsherman pushed a commit that referenced this pull request May 21, 2026
Fixes #7043.
Summary
S3FileSystemProvider.newInputStream() previously returned the raw ResponseInputStream<GetObjectResponse> from the AWS SDK. When a caller closed the stream without having read it to EOF (e.g. path.withInputStream { readLine() }), Apache HTTP client's ContentLengthInputStream.close() would drain the remaining response body to release the connection back to the pool. For a multi-GB S3 object this blocked the caller for many minutes. On a small head node (e.g. 2 vCPUs) the blocked actor thread serialised the pipeline, producing a completely silent gap of tens of minutes in the log (only TaskPollingMonitor dumps every 5 minutes).
Fix
Wrap the returned stream so that close() calls ResponseInputStream.abort() on the underlying stream instead of triggering the Apache drain. The AWS SDK v2 ResponseInputStream exposes abort() exactly for this case (close the HTTP connection without draining).
Test
Adds S3InputStreamAbortTest, a regression test against the public ~1GB FASTQ at s3://ngi-igenomes/test-data/sarek/SRR7890919_WES_HCC1395BL-EA_normal_1.fastq.gz (anonymous, eu-west-1). The test reads the first line and closes the stream on a background thread, bounded by Future.get(30s) (Spock's @Timeout / Thread.interrupt() cannot unblock a native SSL read, so the bound has to be on the caller side). Without the fix the close() drain blows the 30s timeout; with the fix the test completes in seconds.
Notes
This is a potentially user-visible semantic change: a caller that relied on close() to drain and discard the rest of the stream for connection reuse would now get an abort. In practice, abort() closes the pooled connection instead of reusing it, which for an S3 NIO provider accessing arbitrary large objects is the correct trade-off.
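For readers who want to see the shape of the change described under Fix, here is a minimal sketch of an abort-on-close wrapper. It is not the code merged in this PR. To stay runnable without the AWS SDK on the classpath it uses a stand-in interface, AbortableSource, in place of ResponseInputStream.abort(). AbortOnCloseInputStream, FakeObjectStream and AbortOnCloseDemo are likewise hypothetical names introduced only for illustration. The sketch also aborts unconditionally, even when the stream was read to EOF, which the real change may handle differently.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Stand-in for the abort capability of the SDK stream (assumption for this sketch). */
interface AbortableSource {
    void abort();
}

/** Hypothetical wrapper: close() aborts the transfer instead of draining it. */
class AbortOnCloseInputStream extends FilterInputStream {
    private final AbortableSource source;
    private boolean closed;

    <T extends InputStream & AbortableSource> AbortOnCloseInputStream(T delegate) {
        super(delegate);
        this.source = delegate;
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return; // keep close() idempotent
        }
        closed = true;
        // Drop the connection rather than reading the remaining body.
        // super.close() is deliberately skipped here: in the scenario from the
        // Summary, the delegate's own close() is what triggers the drain.
        source.abort();
    }
}

/** Fake delegate that records how many bytes were read and whether abort() ran. */
class FakeObjectStream extends ByteArrayInputStream implements AbortableSource {
    boolean aborted;
    int bytesRead;

    FakeObjectStream(byte[] data) {
        super(data);
    }

    @Override
    public synchronized int read() {
        int b = super.read();
        if (b >= 0) {
            bytesRead++;
        }
        return b;
    }

    @Override
    public void abort() {
        aborted = true;
    }
}

class AbortOnCloseDemo {
    public static void main(String[] args) throws IOException {
        FakeObjectStream fake = new FakeObjectStream(new byte[1024]);
        try (InputStream in = new AbortOnCloseInputStream(fake)) {
            in.read(); // read one byte, then close early
        }
        // prints "aborted=true bytesRead=1"
        System.out.println("aborted=" + fake.aborted + " bytesRead=" + fake.bytesRead);
    }
}
```

The demo closes the wrapper after a single byte and shows that the remaining 1023 bytes are never read. With a real ResponseInputStream the same early close would give up the pooled connection, which is the trade-off described in the Notes.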