[DRAFT] stream: prototype for new stream implementation #62066
jasnell wants to merge 3 commits into nodejs:main
Conversation
ronag left a comment
Super impressed! This is amazing.
One note. Since this is supposed to be "web compatible", it looks to me like everything is based on Uint8Array, which is a bit unfortunate for Node. Could the Node implementation use Buffer? It would still be compatible; it's just that we could access the Buffer prototype methods without doing hacks like Buffer.prototype.write.call(...).
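For illustration, a hedged sketch of the alternative being asked for: a Buffer can wrap the same bytes a web-compatible stream hands out as Uint8Array, making the Buffer prototype methods available without borrowing them (the variable names here are invented, not the PR's API):

```js
// A Uint8Array chunk as a web-compatible stream might hand out
// (illustrative data only).
const chunk = new Uint8Array([104, 101, 108, 108, 111]); // "hello"

// Buffer.from(arrayBuffer, byteOffset, length) creates a zero-copy view:
// same underlying memory, but with the full Buffer API available.
const buf = Buffer.from(chunk.buffer, chunk.byteOffset, chunk.byteLength);

console.log(buf.toString('utf8'));        // 'hello'
console.log(buf.buffer === chunk.buffer); // true: no bytes were copied
```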
Also, could you do some mitata-based benchmarks so that we can see the GC and memory pressure relative to Node streams?
Another thing: in the async generator case, can we pass an optional AbortSignal?
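To make the suggestion concrete, here is one possible shape (entirely hypothetical; no such API is defined in this PR): the source generator receives an options bag carrying the signal, so it can stop producing when the stream is aborted.

```js
// Hypothetical source signature: an options bag with an optional AbortSignal.
async function* source({ signal } = {}) {
  let i = 0;
  while (!signal?.aborted && i < 3) {
    yield `chunk ${i++}`; // a real source would yield Uint8Array chunks
  }
}

const ac = new AbortController();
const received = [];

// Minimal driver standing in for the stream machinery:
const done = (async () => {
  for await (const chunk of source({ signal: ac.signal })) {
    received.push(chunk);
  }
})();
// calling ac.abort() would stop the source at its next loop check
```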
This makes me a bit nervous for code portability. If someone starts working with this in Node.js, they would end up writing code that depends on the values being Buffer instances.
benjamingr left a comment
just to explore implementation feasibility, performance, etc
Sounds fine, as this isn't exposed outside at this time.
```js
// Buffer is full
switch (this._backpressure) {
  case 'strict':
```
I'm not sure 'strict' should be the default here rather than 'block'.
That'll be a big part of the discussion around this. A big part of the challenge with web streams is that backpressure can be fully ignored. One of the design principles for this new approach is to apply it strictly by default. We'll need to debate this. Recommend opening an issue at https://github.com/jasnell/new-streams
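A toy sketch of the difference between the two policies under discussion (the class and option names are invented for illustration, not the PR's API):

```js
// 'strict': a write beyond the high water mark is rejected outright.
// 'block': the write is accepted and the queue grows; the caller is
// expected to await the returned promises to self-regulate.
class ToyWriter {
  constructor({ highWaterMark = 2, backpressure = 'strict' } = {}) {
    this.queue = [];
    this.highWaterMark = highWaterMark;
    this.backpressure = backpressure;
  }
  write(chunk) {
    if (this.queue.length >= this.highWaterMark &&
        this.backpressure === 'strict') {
      return Promise.reject(new Error('backpressure: queue is full'));
    }
    this.queue.push(chunk);
    return Promise.resolve();
  }
}

const w = new ToyWriter({ highWaterMark: 1, backpressure: 'strict' });
w.write('a');                                      // accepted
w.write('b').catch((e) => console.log(e.message)); // 'backpressure: queue is full'
```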
```js
  return this._bytesWritten;
}
```

```js
this._writerState = 'closed';
```
A lot of these state variables can be optimized into const numbers (or one bit map overall) to make these classes smaller per instance, which matters especially when there are many small streams.
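A sketch of what that could look like (the flag names are invented): several string-valued state slots collapse into a single small integer per instance.

```js
// Each state becomes a bit in one SMI-sized field instead of a separate
// string-valued property per instance.
const kReadable = 1 << 0;
const kWritable = 1 << 1;
const kClosed   = 1 << 2;

class StreamState {
  #flags = kReadable | kWritable;

  close() {
    // clear readable/writable and set closed in one integer operation
    this.#flags = (this.#flags & ~(kReadable | kWritable)) | kClosed;
  }
  get writable() { return (this.#flags & kWritable) !== 0; }
  get closed()   { return (this.#flags & kClosed) !== 0; }
}

const state = new StreamState();
console.log(state.writable, state.closed); // true false
state.close();
console.log(state.writable, state.closed); // false true
```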
Yeah, I haven't yet taken an optimization pass over all this. Wanted to focus on correctness and feasibility first, but noted!
```js
// PushQueue - Internal Queue with Chunk-Based Backpressure
// =============================================================================

class PushQueue {
```
Come to think of it, I'm wondering why a push stream should buffer at all; almost all other implementations of push streams I'm aware of don't do this.
Another good discussion ;-) ... Worth opening an issue at https://github.com/jasnell/new-streams
that said... it buffers because throughput matters. A zero-buffer rendezvous (producer blocks until consumer takes each chunk) means the producer and consumer are perfectly lock-stepped and neither can work while the other is working. That's correct but slow. The buffer decouples them so work can overlap:
- Producer writes chunk 1 into a slot, immediately starts producing chunk 2
- Consumer reads chunk 1, starts processing it
- Producer finishes chunk 2, drops it in the second slot
- Both sides are working in parallel
Especially in JavaScript, at least some degree of buffering is required.
The way it works in the new-streams prototype is straightforward. Imagine a bucket being filled by a pipe. The highWaterMark defines how many slots are in the bucket AND how many slots are in the pipe. With highWaterMark: 2, the bucket has 2 slots and the pipe has 2 slots. Backpressure is signaled when both the bucket and pipeline slots are full. In strict mode (the default), backpressure is signaled by rejecting additional writes. In block mode, the pipeline just keeps growing and you have to manually pay attention to the write promises.
I've tested out a ton of different strategies, and this one has consistently proven to be the easiest to reason about and the easiest to optimize around.
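The bucket-and-pipe description above can be sketched as a toy queue (all names invented; this is not the PushQueue in the PR):

```js
// highWaterMark slots in the "bucket" (buffered chunks) plus highWaterMark
// slots in the "pipe" (writes accepted but not yet buffered). Backpressure
// is signaled only once both are full.
class ToyPushQueue {
  constructor(highWaterMark = 2) {
    this.hwm = highWaterMark;
    this.bucket = []; // chunks waiting for the consumer
    this.pipe = [];   // accepted writes not yet settled into the bucket
  }
  push(chunk) {
    if (this.bucket.length >= this.hwm) {
      if (this.pipe.length >= this.hwm) return false; // backpressure
      this.pipe.push(chunk);
      return true;
    }
    this.bucket.push(chunk);
    return true;
  }
  shift() {
    const chunk = this.bucket.shift();
    // a freed bucket slot immediately drains one chunk from the pipe
    if (this.pipe.length > 0) this.bucket.push(this.pipe.shift());
    return chunk;
  }
}

const q = new ToyPushQueue(2);
console.log(q.push(1), q.push(2), q.push(3), q.push(4)); // true true true true
console.log(q.push(5));  // false: bucket (2) and pipe (2) are both full
console.log(q.shift());  // 1 -- frees a slot; chunk 4 stays queued in the pipe
```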
```js
 * @yields {Uint8Array}
 */
async function* flattenTransformYieldAsync(value) {
  if (value instanceof Uint8Array) {
```
All these instanceof checks don't work cross-realm
Yep, I've got a note on this locally already. If we decide to move forward with this that'll be one of the outstanding issues to address
```js
    return;
  }
  // Check for async iterable first
  if (isAsyncIterable(value)) {
```
Are you sure you want the timing to be different between sync and async here? Async iterables normalize this (you can for await a sync iterable) but the perf might suffer
That's something that's still to be fully evaluated. I've run maybe a few dozen scenarios through analysis and haven't encountered one yet where it causes an actual problem but it needs to be fully explored.
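The timing difference under discussion can be demonstrated directly: `for await` accepts a sync iterable, but inserts at least one microtask tick per element, whereas plain `for..of` stays fully synchronous.

```js
const order = [];
queueMicrotask(() => order.push('microtask'));

for (const v of [1, 2]) order.push(`of:${v}`);
// for..of finished before the queued microtask could run:
console.log(order.join(',')); // of:1,of:2

const done = (async () => {
  const order2 = [];
  queueMicrotask(() => order2.push('microtask'));
  for await (const v of [1, 2]) order2.push(`await:${v}`);
  // the first await yielded to the microtask queue before element 1 arrived:
  console.log(order2.join(',')); // microtask,await:1,await:2
  return order2;
})();
```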
benjamingr left a comment
Sorry, meant to approve. Regardless of design changes/suggestions regarding timing and a lot of other stuff, as experimental this is fine.
I would maybe update the docs to emphasize the experimental status even further than normal.
Refactors the cancelation per updates in the design doc
@ronag ... implemented a couple of mitata benchmarks.

Memory Benchmark Results

Environment: Node 25.6.0, Intel Xeon w9-3575X, --expose-gc, mitata with .gc('inner')

Per-Operation Allocations (New Streams vs Web Streams): pipeline scenarios (pull, pipeTo) show the biggest gains, 16-25x less heap, because transforms are inline function calls, not stream-to-stream pipes with internal queues. Push is faster but uses slightly more heap due to batch iteration (Uint8Array[]). Broadcast/tee are comparable at this scale.

Sustained Load (97.7 MB volume): pipeTo and broadcast show the largest sustained-load heap difference. Web Streams' pipeThrough chain buffers ~50% of total volume in flight; new streams' pipeTo pulls synchronously through the transform. Broadcast's shared ring buffer (0.5 MB) vs tee's per-branch queues (42.8 MB). Zero retained memory for both APIs after completion -- no leaks.
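To illustrate the "inline function calls" point above (a toy sketch, not the PR's implementation): pulling chunks through an array of plain functions involves no intermediate streams, queues, or per-stage promise machinery.

```js
// Hypothetical transform stages as plain functions:
const transforms = [
  (chunk) => chunk.toUpperCase(),
  (chunk) => `<${chunk}>`,
];

// Each stage is just a call frame, not a queued hop between stream objects.
function* pull(source, fns) {
  for (const chunk of source) {
    yield fns.reduce((acc, fn) => fn(acc), chunk);
  }
}

console.log([...pull(['a', 'b'], transforms)]); // [ '<A>', '<B>' ]
```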
@ronag passing a signal to an async generator allows the underlying source to abort it, but we're lacking a built-in way for the consumer iterating the async generator to safely cancel the stream. It can break out of the loop, but barring an improvement at the language level, the consumer can only safely cancel the underlying source if it has a reference to an AbortController. WHATWG Streams don't have this problem if the consumer cancels through the reader. Happy to create examples to reproduce this if it's not clear what I'm talking about.
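A sketch of the gap being described (illustrative code only): breaking out of the loop finalizes the generator, but actually tearing down the underlying source requires the consumer to hold its own AbortController.

```js
const events = [];
const ac = new AbortController();

async function* source(signal) {
  try {
    let i = 0;
    while (!signal.aborted) yield i++;
  } finally {
    // finally runs on break, but by itself cannot distinguish "consumer is
    // done iterating" from "consumer wants the source torn down"
    events.push('finalized');
  }
}

const done = (async () => {
  for await (const v of source(ac.signal)) {
    if (v === 2) {
      ac.abort(); // the consumer must hold the controller to cancel the source
      break;
    }
  }
  events.push(`aborted:${ac.signal.aborted}`);
})();
```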
Opening this for discussion. Not intending to land this yet. It adds an implementation of the "new streams" to core and adds support to FileHandle, with tests and benchmarks, just to explore implementation feasibility, performance, etc.

It's worth noting that the performance of the FileHandle benchmark added, which reads files, converts them to upper case, and then compresses them, is on par with Node.js streams and twice as fast as web streams (tho... web streams are not perf optimized in any way, so take that 2x with a grain of salt). The majority of the perf cost in the benchmark is due to compression overhead. Without the compression transform, the new stream can be up to 15% faster than reading the file with classic Node.js streams.

The main thing this shows is that the new streams impl can (a) perform reasonably and (b) sit comfortably alongside the existing impls without any backwards compat concerns.
Benchmark runs: