Fix S3FileSystemProvider.newInputStream() draining full object on close #7046

Merged
bentsherman merged 2 commits into master from 7043-fix-s3-newInputStream-close
Apr 17, 2026

@jorgee jorgee commented Apr 17, 2026

Contributor

Fixes #7043.

Summary

S3FileSystemProvider.newInputStream() previously returned the raw ResponseInputStream<GetObjectResponse> from the AWS SDK. When a caller closed the stream without having read it to EOF (e.g. path.withInputStream { readLine() }), Apache HTTP client's ContentLengthInputStream.close() would drain the remaining response body to release the connection back to the pool. For a multi-GB S3 object this blocked the caller for many minutes. On a small head node (e.g. 2 vCPUs) the blocked actor thread serialised the pipeline, producing a completely silent gap of tens of minutes in the log (only TaskPollingMonitor dumps every 5 minutes).

Fix

Wrap the returned stream so that close() calls ResponseInputStream.abort() on the underlying stream instead of triggering the Apache drain. The AWS SDK v2 ResponseInputStream exposes abort() exactly for this case (close the HTTP connection without draining).
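
The wrapping idea can be sketched as follows. This is a minimal illustration, not the code merged in this PR: the class name `AbortOnCloseInputStream` and the local `Abortable` interface are hypothetical stand-ins so the example runs without the AWS SDK on the classpath. In the real provider the abort callback would be the SDK stream's own `abort()`. Whether the actual fix also calls `close()` on the delegate afterwards is not stated here, so the sketch only aborts.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;

class AbortOnCloseDemo {
    public static void main(String[] args) throws IOException {
        AtomicInteger aborts = new AtomicInteger();
        // In-memory bytes stand in for the HTTP response body.
        InputStream body = new ByteArrayInputStream(
                "line1\nline2\n".getBytes(StandardCharsets.UTF_8));

        // Read only the first line, then let try-with-resources close the stream.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new AbortOnCloseInputStream(body, aborts::incrementAndGet),
                StandardCharsets.UTF_8))) {
            System.out.println(reader.readLine());
        }
        System.out.println("abort calls: " + aborts.get());
    }
}

/** Hypothetical stand-in for the SDK's abort capability (assumption, not the SDK type). */
interface Abortable {
    void abort();
}

/** Delegates reads unchanged; on close, aborts instead of closing (and draining) the delegate. */
final class AbortOnCloseInputStream extends FilterInputStream {
    private final Abortable abortable;
    private boolean closed;

    AbortOnCloseInputStream(InputStream delegate, Abortable abortable) {
        super(delegate);
        this.abortable = abortable;
    }

    @Override
    public synchronized void close() {
        if (closed) {
            return; // make repeated close() calls harmless
        }
        closed = true;
        // Deliberately skip super.close(): in the scenario described above,
        // closing the delegate is what triggers the drain.
        abortable.abort();
    }
}
```

Reads pass straight through `FilterInputStream`, so a caller that consumes the whole stream sees no difference; only the behaviour of `close()` changes.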

Test

Adds S3InputStreamAbortTest — a regression test against the public ~1GB FASTQ at s3://ngi-igenomes/test-data/sarek/SRR7890919_WES_HCC1395BL-EA_normal_1.fastq.gz (anonymous, eu-west-1). The test reads the first line and closes on a background thread, bounded by Future.get(30s) (Spock's @Timeout/Thread.interrupt() cannot unblock a native SSL read, so the bound has to be on the caller side). Without the fix the close() drain blows the 30s timeout; with the fix the test completes in seconds.
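
The caller-side bound described above can be sketched like this. It is an illustration of the general `Future.get(timeout)` pattern only, not the contents of `S3InputStreamAbortTest`; the class and method names (`BoundedCloseDemo`, `completesWithin`) are hypothetical, and a `Thread.sleep` stands in for the blocking read-and-close.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedCloseDemo {

    /** Runs the task on a background thread; returns true if it finished within the timeout. */
    static boolean completesWithin(Callable<?> task, long timeout, TimeUnit unit) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-task");
            t.setDaemon(true); // a task stuck in a blocking call must not keep the JVM alive
            return t;
        });
        Future<?> future = executor.submit(task);
        try {
            // The waiting thread enforces the bound, so it holds even if the
            // worker cannot be interrupted out of its blocking call.
            future.get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // best effort; the worker may ignore the interrupt
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        boolean fast = completesWithin(() -> "first line", 2, TimeUnit.SECONDS);
        boolean slow = completesWithin(() -> {
            Thread.sleep(5_000); // stands in for a close() that drains a large body
            return null;
        }, 200, TimeUnit.MILLISECONDS);
        System.out.println("fast=" + fast + " slow=" + slow);
    }
}
```

Putting the timeout on the waiting side means the test fails promptly even when the background thread itself stays blocked.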

Notes

This is a potentially user-visible semantic change: a caller that relied on close() to drain and discard the rest of the stream for connection reuse would now get an abort. In practice:

  • Callers that consume the whole stream before closing are unaffected (reads still flow through the wrapper normally).
  • Connection pooling still works: abort() closes the aborted connection rather than returning it to the pool for reuse, which for an S3 NIO provider accessing arbitrary large objects is the correct trade-off.

netlify Bot commented Apr 17, 2026


Deploy Preview for nextflow-docs-staging canceled.

Name Link
🔨 Latest commit 7a2e423
🔍 Latest deploy log https://app.netlify.com/projects/nextflow-docs-staging/deploys/69e2547a18c7830008062a75

@bentsherman

Member

> This is a potentially user-visible semantic change: a caller that relied on close() to drain and discard the rest of the stream for connection reuse would now get an abort.

The only documented way to stream files in the Nextflow std lib is withReader, which provides a BufferedReader in a closure and automatically closes it.

So if anyone is calling close() directly, they are operating outside the standard library anyway.

bentsherman changed the title from "[nf-amazon] Fix S3FileSystemProvider.newInputStream() draining full object on close" to "Fix S3FileSystemProvider.newInputStream() draining full object on close" Apr 17, 2026
bentsherman merged commit cf38676 into master Apr 17, 2026
24 of 25 checks passed
bentsherman deleted the 7043-fix-s3-newInputStream-close branch April 17, 2026 16:36

Development

Successfully merging this pull request may close these issues.

S3FileSystemProvider.newInputStream() drains the full object body on close, blocking for minutes on large files