[DRAFT] stream: prototype for new stream implementation#62066

Draft
jasnell wants to merge 3 commits into nodejs:main from jasnell:jasnell/new-streams-prototype

Conversation

Member

@jasnell jasnell commented Mar 1, 2026

Opening this for discussion. Not intending to land this yet. It adds an implementation of the "new streams" to core and adds support to FileHandle with tests and benchmarks just to explore implementation feasibility, performance, etc.

It's worth noting that the performance of the FileHandle benchmark added here, which reads files, converts them to upper case, and then compresses them, is on par with Node.js streams and twice as fast as web streams (though web streams are not perf-optimized in any way, so take that 2x with a grain of salt). The majority of the perf cost in the benchmark is due to compression overhead. Without the compression transform, the new stream can be up to 15% faster than reading the file with classic Node.js streams.

The main thing this shows is that the new streams implementation can (a) perform reasonably and (b) sit comfortably alongside the existing implementations without any backwards-compatibility concerns.

Benchmark runs:

```
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=1048576 api="classic": 0.4520276595366672
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=16777216 api="classic": 0.5974527572097321
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=67108864 api="classic": 0.6425952035725405
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=1048576 api="webstream": 0.1911778984563999
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=16777216 api="webstream": 0.2179878501077266
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=67108864 api="webstream": 0.2446390516960688
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=1048576 api="pull": 0.5118129753083176
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=16777216 api="pull": 0.6280697056085692
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=67108864 api="pull": 0.596177892010514
---
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=1048576 api="classic": 0.44890689503274533
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=16777216 api="classic": 0.5922959407897667
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=67108864 api="classic": 0.6151916200977057
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=1048576 api="webstream": 0.22796906713941217
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=16777216 api="webstream": 0.2517499148269662
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=67108864 api="webstream": 0.2613608248108332
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=1048576 api="pull": 0.4725187688512099
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=16777216 api="pull": 0.5180217625521253
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=67108864 api="pull": 0.616770183722841
```

@jasnell jasnell requested review from mcollina and ronag March 1, 2026 18:37
@nodejs-github-bot
Collaborator

Review requested:

  • @nodejs/performance
  • @nodejs/streams

@nodejs-github-bot nodejs-github-bot added lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Mar 1, 2026
Member

@ronag ronag left a comment

Super impressed! This is amazing.

One note: since this is supposed to be "web compatible", it looks to me like everything is based on Uint8Array, which is a bit unfortunate for Node. Could the Node implementation use Buffer? It would still be compatible; it's just that we could access the Buffer prototype methods without doing hacks like Buffer.prototype.write.call(...).

Member

ronag commented Mar 2, 2026

Also, could you do some mitata-based benchmarks so that we can see the GC and memory pressure relative to Node streams?

Member

ronag commented Mar 2, 2026

Another thing: in the async generator case, can we pass an optional AbortSignal? i.e. async function* (src, { signal }). We could even check the function signature and, if it doesn't take a second parameter, not allocate the AbortController at all.
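A sketch of the signature check being suggested (runTransform and the option-bag shape are hypothetical names, not the prototype's API), relying on Function.prototype.length to detect a second parameter:

```javascript
// Only allocate an AbortController when the transform declares a second
// parameter; a one-parameter transform skips the allocation entirely.
function runTransform(transform, source) {
  if (transform.length >= 2) {
    const ac = new AbortController();
    return {
      iter: transform(source, { signal: ac.signal }),
      abort: () => ac.abort(),
    };
  }
  return { iter: transform(source), abort: () => {} };
}

// A transform that ignores cancellation...
async function* upper(src) {
  for await (const s of src) yield s.toUpperCase();
}

// ...and one that opts in to the signal.
async function* upperWithSignal(src, { signal }) {
  for await (const s of src) {
    if (signal.aborted) return;
    yield s.toUpperCase();
  }
}

console.log(upper.length, upperWithSignal.length); // 1 2
```

Note that default and rest parameters would also make fn.length report fewer parameters, so any real implementation would need to document that caveat.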

Member Author

jasnell commented Mar 2, 2026

One note: since this is supposed to be "web compatible", it looks to me like everything is based on Uint8Array, which is a bit unfortunate for Node. Could the Node implementation use Buffer? It would still be compatible; it's just that we could access the Buffer prototype methods without doing hacks like Buffer.prototype.write.call(...).

This makes me a bit nervous for code portability. If someone starts working with this in Node.js, they would end up writing code that depends on the values being Buffer and not just Uint8Array. When they go to move that code to another runtime or a standalone implementation like https://github.com/jasnell/new-streams, that assumption suddenly breaks.

Member

@benjamingr benjamingr left a comment

just to explore implementation feasibility, performance, etc

Sounds fine, as this isn't exposed externally at this time.


```js
// Buffer is full
switch (this._backpressure) {
  case 'strict':
```
Member

I'm not sure strict should be the default and not block here.

Member Author

That'll be a central part of the discussion around this. A big part of the challenge with web streams is that backpressure can be fully ignored. One of the design principles for this new approach is to apply it strictly by default. We'll need to debate this; I recommend opening an issue at https://github.com/jasnell/new-streams

```js
  return this._bytesWritten;
}

this._writerState = 'closed';
```
Member

A lot of these state variables can be optimized to const numbers (or a bit map overall) to make the size of these classes smaller per instance, especially for many small streams
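A minimal sketch of the bit-map idea (the flag and class names here are hypothetical, not from the PR): several string-valued state properties collapse into one small integer per instance.

```javascript
// Each state becomes one bit in a single number field.
const kWritable = 1 << 0;
const kClosed = 1 << 1;
const kErrored = 1 << 2;

class WriterState {
  #state = kWritable; // one integer instead of e.g. _writerState = 'writable'

  close() {
    // clear the writable bit, set the closed bit
    this.#state = (this.#state & ~kWritable) | kClosed;
  }

  get writable() { return (this.#state & kWritable) !== 0; }
  get closed() { return (this.#state & kClosed) !== 0; }
  get errored() { return (this.#state & kErrored) !== 0; }
}

const w = new WriterState();
console.log(w.writable, w.closed); // true false
w.close();
console.log(w.writable, w.closed); // false true
```

Small integers stay as V8 Smis, so this also avoids per-instance string pointer fields, which is where the size win for many small streams would come from.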

Member Author

Yeah, I haven't yet taken an optimization pass over all this. I wanted to focus on correctness and feasibility first. But noted!

```js
// PushQueue - Internal Queue with Chunk-Based Backpressure
// =============================================================================

class PushQueue {
```
Member

Come to think of it, I'm wondering why a push stream should buffer at all. Almost all other implementations of push streams I'm aware of don't do this.

Member Author

Another good discussion ;-) ... Worth opening an issue at https://github.com/jasnell/new-streams

Member Author

that said... it buffers because throughput matters. A zero-buffer rendezvous (producer blocks until consumer takes each chunk) means the producer and consumer are perfectly lock-stepped and neither can work while the other is working. That's correct but slow. The buffer decouples them so work can overlap:

  • Producer writes chunk 1 into a slot, immediately starts producing chunk 2
  • Consumer reads chunk 1, starts processing it
  • Producer finishes chunk 2, drops it in the second slot
  • Both sides are working in parallel

Especially in JavaScript, at least some degree of buffering is required.

The way it works in the new-streams prototype is straightforward. Imagine a bucket being filled by a pipe. The highWaterMark defines how many slots the bucket has AND how many slots the pipe has. With highWaterMark: 2, the bucket has 2 slots and the pipe has 2 slots. Backpressure is signaled when both the bucket and the pipe slots are full. In strict mode (the default), backpressure is signaled by rejecting additional writes. In block mode, the pipeline just keeps growing and you have to manually pay attention to the write promises.

I've tested a ton of different strategies, and this one has consistently proven to be the easiest to reason about and the easiest to optimize around.
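The bucket/pipe model described above can be sketched as a toy queue (illustrative only; this is not the prototype's actual code, and the real queue tracks pending write promises rather than a plain array):

```javascript
// highWaterMark sizes BOTH the buffered-chunk slots (the bucket) and the
// accepted-but-unbuffered write slots (the pipe); strict mode rejects a
// write only once both are full.
class TinyPushQueue {
  #bucket = []; // chunks ready for the reader
  #pipe = [];   // writes accepted but not yet buffered

  constructor(highWaterMark = 2) {
    this.hwm = highWaterMark;
  }

  write(chunk) {
    if (this.#bucket.length < this.hwm) {
      this.#bucket.push(chunk);
      return 'buffered';
    }
    if (this.#pipe.length < this.hwm) {
      this.#pipe.push(chunk);
      return 'pending';
    }
    // strict mode: both the bucket and the pipe are full
    throw new Error('backpressure: write rejected');
  }

  read() {
    const chunk = this.#bucket.shift();
    // a bucket slot just freed up, so a pending write moves in
    if (this.#pipe.length > 0) this.#bucket.push(this.#pipe.shift());
    return chunk;
  }
}

const q = new TinyPushQueue(2);
q.write('a'); q.write('b'); // bucket full
q.write('c'); q.write('d'); // pipe full
let rejected = false;
try { q.write('e'); } catch { rejected = true; }
console.log(rejected, q.read()); // true a
```

With highWaterMark: 2, the fifth write is rejected until a read frees a slot, which is exactly the "both the bucket and the pipe are full" condition above.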

```js
 * @yields {Uint8Array}
 */
async function* flattenTransformYieldAsync(value) {
  if (value instanceof Uint8Array) {
```
Member

All these instanceof checks don't work cross-realm

Member Author

Yep, I've got a note on this locally already. If we decide to move forward with this that'll be one of the outstanding issues to address

```js
  return;
}
// Check for async iterable first
if (isAsyncIterable(value)) {
```
Member

Are you sure you want the timing to be different between sync and async here? Async iterables normalize this (you can for await a sync iterable) but the perf might suffer
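The timing difference in question, sketched (illustrative, not the prototype's code): a sync loop delivers chunks immediately, while for-await defers each chunk to a later microtask even when the input is a plain sync iterable.

```javascript
function drainSync(iterable, out) {
  for (const v of iterable) out.push(v);
}

async function drainAsync(iterable, out) {
  // for-await normalizes sync iterables, but every value settles on a
  // separate microtask turn.
  for await (const v of iterable) out.push(v);
}

const out = [];
drainSync([1, 2], out);               // lands immediately: out is [1, 2]
const done = drainAsync([3, 4], out); // nothing lands yet
out.push('marker');                   // runs before any async chunk arrives
done.then(() => console.log(out));    // [ 1, 2, 'marker', 3, 4 ]
```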

Member Author

That's something that's still to be fully evaluated. I've run maybe a few dozen scenarios through analysis and haven't encountered one yet where it causes an actual problem but it needs to be fully explored.

Member

@benjamingr benjamingr left a comment

Sorry, I meant to approve. Regardless of design changes/suggestions regarding timing and a lot of other stuff, as experimental this is fine.

I would maybe update the docs to emphasize the experimental status even more strongly than usual.

Member Author

jasnell commented Mar 3, 2026

@ronag ... implemented a couple of mitata benchmarks in the https://github.com/jasnell/new-streams repo (the reference implementation)

--

Memory Benchmark Results

Environment: Node 25.6.0, Intel Xeon w9-3575X, --expose-gc, mitata with .gc('inner')

Per-Operation Allocations (New Streams vs Web Streams)

| Scenario | Speed | Heap/iter (new) | Heap/iter (web) |
| --- | --- | --- | --- |
| Push write/read (1K x 4KB) | 2.24x faster | 2.06 MB | 1.43 MB |
| Pull + transform (1K x 4KB) | 2.44x faster | 334 KB | 5.57 MB |
| pipeTo + transform (1K x 4KB) | 3.15x faster | 303 KB | 7.47 MB |
| Broadcast 2 consumers (500 x 4KB) | 1.04x faster | 1.92 MB | 1.81 MB |
| Large pull 40MB (10K x 4KB) | 1.26x faster | 2.62 MB | 52.35 MB |

Pipeline scenarios (pull, pipeTo) show the biggest gains: 16-25x less heap, because transforms are inline function calls rather than stream-to-stream pipes with internal queues. Push is faster but uses slightly more heap due to batch iteration (Uint8Array[]). Broadcast/tee are comparable at this scale.

Sustained Load (97.7 MB volume)

| Scenario | Peak Heap (new) | Peak Heap (web stream) |
| --- | --- | --- |
| pipeTo + transform | 6.9 MB | 50.6 MB |
| Broadcast 2 consumers | 0.5 MB | 42.8 MB |
| Push write/read | 5.9 MB | 2.5 MB |
| Pull + transform | 6.1 MB | 2.8 MB |

pipeTo and broadcast show the largest sustained-load heap difference. Web Streams' pipeThrough chain buffers ~50% of the total volume in flight, while the new streams' pipeTo pulls synchronously through the transform. Compare broadcast's shared ring buffer (0.5 MB) with tee's per-branch queues (42.8 MB).

Zero retained memory for both APIs after completion -- no leaks.


jedwards1211 commented Mar 3, 2026

@ronag passing a signal to an async generator allows the underlying source to abort it, but we're lacking a built-in way for the consumer iterating the async generator to safely cancel the stream. The consumer can .return() its iterator when it's done, but that won't break the async generator out of a pending await until it receives the next chunk, which isn't guaranteed to happen if the underlying source is something nondeterministic like pubsub events. In this case there would be leaks that are kind of awkward to blame on user error.

Barring an improvement at the language level, the consumer can only safely cancel the underlying source if it has a reference to an AbortController that signals it.

WHATWG Streams don't have this problem if the consumer .cancel()s their reader, though they do if the consumer is async iterating them.

Happy to create examples to reproduce this if it's not clear what I'm talking about.
