Conversation

@erikvanoosten
Contributor

@erikvanoosten erikvanoosten commented Mar 14, 2025

Add ZStream.mapZIOChunked as an alternative to mapZIO that does not destroy the chunking structure.

Co-authored-by: Matthias Berndt
Co-authored-by: Paul Daniels
Co-authored-by: Regis Kuckaertz

Contributor

@mberndt123 mberndt123 left a comment

I think this is not good:

   * Unlike mapZIO processing is done chunk by chunk. When f fails for an
   * element in the middle of a Chunk, the other elements of the chunk are lost.

If processing fails in the middle of a chunk, the stream should first emit the elements for which processing succeeded and then fail.
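
The requested behaviour can be sketched without ZIO. The following is only a plain-Scala model, not the code from this PR: a stream is represented as a sequence of `Vector` chunks, `f` returns an `Either` instead of an effect, and the names `ChunkedMapModel`, `Outcome` and `mapChunked` are made up for illustration. It shows that the successfully processed prefix of a failing chunk can be emitted as one (shorter) chunk before the failure, so earlier chunk boundaries stay intact.

```scala
// Plain-Scala model (no ZIO): hypothetical names, for illustration only.
object ChunkedMapModel {
  // What downstream would observe: the emitted chunks, plus the error that ended the stream (if any).
  final case class Outcome[E, B](emitted: Vector[Vector[B]], failure: Option[E])

  def mapChunked[A, E, B](chunks: Seq[Vector[A]])(f: A => Either[E, B]): Outcome[E, B] = {
    val emitted            = Vector.newBuilder[Vector[B]]
    var failure: Option[E] = None
    val chunkIt            = chunks.iterator
    while (failure.isEmpty && chunkIt.hasNext) {
      val chunk = chunkIt.next()
      val out   = Vector.newBuilder[B]
      out.sizeHint(chunk.size)
      val elems = chunk.iterator
      // Process element by element and stop at the first failure.
      while (failure.isEmpty && elems.hasNext) {
        f(elems.next()) match {
          case Right(b) => out += b
          case Left(e)  => failure = Some(e)
        }
      }
      // Emit what succeeded as a single chunk, even if the chunk was cut short by a failure.
      val result = out.result()
      if (result.nonEmpty) emitted += result
    }
    Outcome(emitted.result(), failure)
  }

  def main(args: Array[String]): Unit = {
    val input   = Seq(Vector(1, 2, 3), Vector(4, 5, 6), Vector(7, 8))
    val outcome = mapChunked[Int, String, Int](input)(i => if (i == 5) Left("boom") else Right(i * 10))
    println(outcome.emitted) // prints Vector(Vector(10, 20, 30), Vector(40))
    println(outcome.failure) // prints Some(boom)
  }
}
```

In this model the element `4` is not lost even though `5` fails in the same chunk, and the first chunk still arrives as a chunk of three.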

@calvinlfer
Member

calvinlfer commented Mar 14, 2025

> I think this is not good:
>
>    * Unlike mapZIO processing is done chunk by chunk. When f fails for an
>    * element in the middle of a Chunk, the other elements of the chunk are lost.
>
> If processing fails in the middle of a chunk, the stream should first emit the elements where processing succeeded and then fail the stream.

The act of doing this breaks the chunking structure, though.

@mberndt123
Contributor

> the act of doing this breaks chunking structure though

No, not if you implement it right.

@mberndt123
Contributor

I've written an implementation that will retain chunking while also emitting every element that was processed:
https://github.com/mberndt123/zio/tree/mapziochunked

I'd open a merge request to @erikvanoosten's repo, but for whatever reason, I can't create a PR targeting his ZIO fork.

@erikvanoosten
Contributor Author

erikvanoosten commented Mar 15, 2025

> I've written an implementation that will retain chunking while also emitting every element that was processed: https://github.com/mberndt123/zio/tree/mapziochunked
>
> I'd open a merge request to @erikvanoosten's repo, but for whatever reason, I can't create a PR targeting his ZIO fork.

Thanks @mberndt123! I have cherry-picked your implementation and updated the scaladoc and unit test. I was playing around with foldLeft to get the same result, but your solution is simpler.

regiskuckaertz previously approved these changes Mar 16, 2025
@erikvanoosten
Contributor Author

@regiskuckaertz Thanks for your approval. Can you merge it as well, please?

@erikvanoosten
Contributor Author

erikvanoosten commented Mar 17, 2025

For the ZIO contributor who is reviewing this: @mberndt123 proposed another solution:

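    ZPipeline.chunks.mapStream { (chunk: Chunk[In]) =>
      // Collects the results of `f` for the elements of this chunk.
      val builder = ChunkBuilder.make[Out](chunk.size)
      // Emits whatever has been collected so far as one chunk, or nothing if it is empty.
      def writeNonEmpty = {
        val out = builder.result()
        ZStream.when(out.nonEmpty)(ZStream.fromChunk(out))
      }
      ZStream.unwrap {
        chunk
          .mapZIODiscard(f(_).map(builder += _))
          .foldCause(
            // On failure: first emit the successfully processed elements, then fail the stream.
            cause => writeNonEmpty ++ ZStream.failCause(cause),
            _ => writeNonEmpty
          )
      }
    }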

Please let me know if that is preferred and I'll swap it in.

@regiskuckaertz
Member

@kyri-petrou I feel like I'm not in a position to merge anymore but I feel like @erikvanoosten has exercised all due diligence and this has my vote, for what it's worth.

regiskuckaertz previously approved these changes Mar 18, 2025
@mberndt123
Contributor

@paulpdaniels, a thumbs-down emoji without further explanation is a bit disappointing. My implementation is significantly terser and relies on higher-level combinators (i.e. mapStream), both of which are significant maintainability advantages. If you're going to argue performance, then I'd like to see some evidence. Using low-level ZChannel primitives doesn't magically make code faster.

@mberndt123
Contributor

mberndt123 commented Mar 18, 2025

I've done some cursory benchmarks with this thing in StreamBenchmarks.scala:

  @Benchmark
  def zioMapZIOChunked: Unit = {
    val stream = ZStream
      .fromChunks(zioChunks: _*)
      .mapZIOChunked(i => ZIO.succeed((i * i).toLong))
    
    val result = stream.runDrain

    unsafeRun(result)
  }

// run with `benchmarks/Jmh/run zio.StreamBenchmarks.zioMapZIOChunked`

I didn't see any meaningful (i.e. beyond noise) performance difference between the different implementations.

@paulpdaniels
Contributor

paulpdaniels commented Mar 18, 2025

@mberndt123

> then I'd like to see some evidence

Running your above example with the two implementations (current vs proposed) side by side:

// Using  17.0.12-amzn
// benchmarks/jmh:run .*StreamBenchmarks.zioPipelineMapChunked
chunkCount=10000 chunkSize=5000: current impl faster ~15% 
chunkCount=5000 chunkSize=5000: current impl faster ~12%
chunkCount=5000 chunkSize=50: current impl faster ~450%
chunkCount=5000 chunkSize=5: current impl faster ~1000% (10x)
chunkCount=5000 chunkSize=1: current impl faster ~1300% (13x)

🪄

I haven't run a memory profile on these, but I would expect it would lean toward the current implementation as well.

> and relies on higher level combinators

This is a mistake for low-level operators. For instance, mapStream is meant as an interop operator for user-land code that has a stream but wants it to work in a pipeline without having to unpack it into a ZChannel. You can quickly check the implementations of these operators to see how much extra work the operator would have to do if they were included.

> both of which are significant maintainability advantages

The current implementation is written in line with the rest of the library; consistency is just as important for maintainability.

This does raise some questions in my mind about why the performance is so bad in some of the cases above (perhaps they rely too naively on other operators). There are certainly some fruitful avenues to explore on how to optimize those!

Edit

There is a case where the proposed is faster than the current:

// Seems consistently faster, though for some reason the current implementation generates a lot of variance with this one.
chunkCount=5000 chunkSize=0: proposed impl faster ~2%

@erikvanoosten
Contributor Author

@kyri-petrou this PR is ready for merging!

kyri-petrou previously approved these changes Mar 24, 2025
Contributor

@kyri-petrou kyri-petrou left a comment

Thanks, this is amazing! Just a minor comment, as I think it'd be good to increase awareness that this method exists.

erikvanoosten and others added 4 commits March 24, 2025 18:01
Add `ZStream.mapZIOChunked` as an alternative to `mapZIO` that does not destroy the chunking structure.
@erikvanoosten
Contributor Author

erikvanoosten commented Mar 24, 2025

@kyri-petrou

> Can we add a `@see [[mapZIOChunked]]` to the scaladoc of mapZIO as well?

Done.

We could introduce some more documentation around these chunk-breakers on https://zio.dev as well. This zio-kafka page could be used as a start (see the page source for hidden sections).

Contributor

@kyri-petrou kyri-petrou left a comment

Love this 🚀

@kyri-petrou
Contributor

> We could introduce some more documentation around these chunk-breakers on https://zio.dev/ as well

Yeah, I agree. I'll create an issue for it.

    ZChannel.readWithCause(
      chunk =>
        ZChannel.unwrap {
          val builder = ChunkBuilder.make[Out](chunk.size)
Contributor

As a side note: I wonder if we could reuse the ChunkBuilder after calling result() on it, but maybe we can revisit that in a follow-up PR.

Contributor Author

I looked into this a bit and I don't think it is going to help. As long as the stream is not failing or ending, we give the chunk builder a 'sizeHint' that is spot-on; the backing array will be filled to exactly that size. If the ArrayBuilder fills up the array completely, it hands out that array in result and then builds a new array when the builder is re-used. In other words, the array is always instantiated anew, and we might as well construct a new ChunkBuilder too, so that we get the small benefit of using memory that is local to the CPU core that this is running on.
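
The allocation argument above can be illustrated with a small model. This is a simplified, hypothetical builder written for this note (the names `BuilderReuseModel` and `ExactBuilder` are invented); it is not the source of Scala's `ArrayBuilder` or ZIO's `ChunkBuilder`, and it only assumes the behaviour described in the comment: an exactly full backing array is handed out by `result()`, so the next use has to allocate again.

```scala
// Simplified model, for illustration only; not the real ArrayBuilder/ChunkBuilder code.
object BuilderReuseModel {
  final class ExactBuilder(sizeHint: Int) {
    private var backing: Array[Int] = null
    private var size                = 0
    var allocations                 = 0 // number of arrays allocated to hold elements

    def +=(x: Int): this.type = {
      if (backing == null) {
        // First element after construction or after result(): allocate a fresh backing array.
        backing = new Array[Int](math.max(sizeHint, 1))
        allocations += 1
      } else if (size == backing.length) {
        // The size hint was too small: grow by copying into a larger array.
        backing = java.util.Arrays.copyOf(backing, size * 2)
        allocations += 1
      }
      backing(size) = x
      size += 1
      this
    }

    def result(): Array[Int] = {
      val out =
        if (backing == null) new Array[Int](0)
        else if (size == backing.length) backing // exactly full: hand out the backing array itself
        else { allocations += 1; java.util.Arrays.copyOf(backing, size) }
      backing = null
      size = 0
      out
    }
  }

  def fillAndBuild(b: ExactBuilder, xs: Seq[Int]): Array[Int] = {
    xs.foreach(b += _)
    b.result()
  }

  def main(args: Array[String]): Unit = {
    // One builder reused for two chunks of exactly the hinted size.
    val reused = new ExactBuilder(3)
    fillAndBuild(reused, Seq(1, 2, 3))
    fillAndBuild(reused, Seq(4, 5, 6))

    // A fresh builder per chunk.
    val first  = new ExactBuilder(3)
    val second = new ExactBuilder(3)
    fillAndBuild(first, Seq(1, 2, 3))
    fillAndBuild(second, Seq(4, 5, 6))

    println(reused.allocations)                     // prints 2
    println(first.allocations + second.allocations) // prints 2
  }
}
```

Under these assumptions, reusing the builder saves no array allocations when the size hint is exact; the only thing saved is the small builder object itself.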

@kyri-petrou kyri-petrou merged commit c194b2e into zio:series/2.x Mar 28, 2025
18 checks passed
@erikvanoosten erikvanoosten deleted the mapziochunked branch March 29, 2025 07:22