Add ZStream.mapZIOChunked
#9697
Conversation
I think this is not good:

> Unlike `mapZIO`, processing is done chunk by chunk. When `f` fails for an element in the middle of a Chunk, the other elements of the chunk are lost.

If processing fails in the middle of a chunk, the stream should first emit the elements where processing succeeded and then fail the stream.
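To make the desired semantics concrete, here is a stdlib-only sketch (no ZIO; `processChunk` and the `Either`-based error channel are hypothetical stand-ins for `mapZIOChunked` and the stream's failure channel): it applies `f` to each element of a chunk and, on the first failure, returns the successfully processed prefix together with the error, so a caller could emit the prefix downstream before failing the stream.

```scala
// Stdlib-only sketch of the desired semantics (not the ZIO API):
// on failure mid-chunk, return the successfully mapped prefix plus the
// error, instead of discarding the whole chunk.
def processChunk[A, B, E](chunk: Vector[A])(f: A => Either[E, B]): (Vector[B], Option[E]) = {
  val builder = Vector.newBuilder[B]
  var error: Option[E] = None
  val it = chunk.iterator
  while (it.hasNext && error.isEmpty) {
    f(it.next()) match {
      case Right(b) => builder += b
      case Left(e)  => error = Some(e)
    }
  }
  (builder.result(), error)
}

// Element 3 fails: the prefix (1, 4) is still produced, then the error surfaces.
val (prefix, err) =
  processChunk(Vector(1, 2, 3))(i => if (i == 3) Left("boom") else Right(i * i))
// prefix == Vector(1, 4), err == Some("boom")
```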
the act of doing this breaks chunking structure though

No, not if you implement it right.
I've written an implementation that will retain chunking while also emitting every element that was processed. I'd open a merge request to @erikvanoosten's repo, but for whatever reason, I can't create a PR targeting his ZIO fork.
Thanks @mberndt123! I have cherry-picked your implementation and updated the scaladoc and unit test. I was playing around with `foldLeft` to get the same result, but your solution is simpler.
@regiskuckaertz Thanks for your approval. Can you merge as well please?
For the ZIO contributor that is reviewing this: @mberndt123 proposed another solution:

```scala
ZPipeline.chunks.mapStream { (chunk: Chunk[In]) =>
  val builder = ChunkBuilder.make[Out](chunk.size)
  def writeNonEmpty = {
    val out = builder.result()
    ZStream.when(out.nonEmpty)(ZStream.fromChunk(out))
  }
  ZStream.unwrap {
    chunk
      .mapZIODiscard(f(_).map(builder += _))
      .foldCause(
        cause => writeNonEmpty ++ ZStream.failCause(cause),
        _ => writeNonEmpty
      )
  }
}
```

Please let me know if that has preference and I'll swap it in.
@kyri-petrou I feel like I'm not in a position to merge anymore, but @erikvanoosten has exercised all due diligence and this has my vote, for what it's worth.
@paulpdaniels, a thumbs-down emoji without further explanation is a bit disappointing. My implementation is significantly terser and relies on higher-level combinators (i.e. …)
I've done some cursory benchmarks with this thing in:

```scala
@Benchmark
def zioMapZIOChunked: Unit = {
  val stream = ZStream
    .fromChunks(zioChunks: _*)
    .mapZIOChunked(i => ZIO.succeed((i * i).toLong))
  val result = stream.runDrain
  unsafeRun(result)
}
// run with `benchmarks/Jmh/run zio.StreamBenchmarks.zioMapZIOChunked`
```

I didn't see any meaningful (i.e. beyond noise) performance difference between the different implementations.
Running your above example with the two implementations (current vs proposed) side-by-side 🪄 I haven't run a memory profile on these, but I would expect it would lean toward the current implementation as well. This is a mistake for low-level operators, for instance.

The current implementation is written in line with the rest of the library; consistency matters just as much for maintainability. This does open up some questions in my mind about why the performance is so bad in some of the cases above (perhaps they are too naively reliant on other operators), certainly some fruitful avenues to explore on how to optimize those!

Edit: there is a case where the proposed is faster than the current: …
@kyri-petrou this PR is ready for merging!
Thanks, this is amazing! Just a minor comment, as I think it'd be good to increase awareness that this method exists.
Add `ZStream.mapZIOChunked` as an alternative to `mapZIO` that does not destroy the chunking structure.
We could introduce some more documentation around these chunk-breakers on https://zio.dev as well. This zio-kafka page could be used as a start (see the page source for hidden sections).
Love this 🚀
Yeah I agree. I'll create an issue for it.
```scala
ZChannel.readWithCause(
  chunk =>
    ZChannel.unwrap {
      val builder = ChunkBuilder.make[Out](chunk.size)
```
As a side note: I wonder if we could reuse the `ChunkBuilder` after calling `result()` on it, but maybe we can revisit that in a follow-up PR.
I looked into this a bit and I think it is not going to help. The thing is, as long as the stream is not failing or ending, we give the chunk builder a `sizeHint` that is spot-on; the backing array will be filled to exactly that size. You should know that if the `ArrayBuilder` fills up the array completely, it will hand out that array in `result()` and then build a new array when the builder is reused. In other words, the array is always instantiated anew, and we might as well also construct a new `ChunkBuilder`, so that we get the small benefit of using memory that is local to the CPU core this is running on.
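The behavior described above can be observed directly with the standard library's `ArrayBuilder`, whose full-capacity fast path ZIO's `ChunkBuilder` follows: this stdlib-only sketch shows that when the builder is filled to exactly its size hint, `result()` hands out the backing array itself, so a subsequent reuse of the builder must allocate a fresh array anyway.

```scala
import scala.collection.mutable.ArrayBuilder

val b = new ArrayBuilder.ofInt
b.sizeHint(3)
b ++= Array(1, 2, 3)
// Exactly full: result() returns the backing array itself (no copy).
val first = b.result()

b.clear() // reuse the builder...
b.sizeHint(3)
b ++= Array(4, 5, 6)
// ...but a new backing array had to be allocated for the second round.
val second = b.result()

assert(!(first eq second))
assert(first.toSeq == Seq(1, 2, 3)) // the handed-out array was not clobbered
```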
Add `ZStream.mapZIOChunked` as an alternative to `mapZIO` that does not destroy the chunking structure.

Co-authored-by: Matthias Berndt [email protected]
Co-authored-by: Paul Daniels [email protected]
Co-authored-by: Regis Kuckaertz [email protected]