ZStream.mapZIOPar[Unordered] optimization attempt + ZIO.forkIn optimization #9801

eyalfa · 2025-04-16T16:11:33Z

This toy branch started as an attempt to optimize ZStream.mapZIOPar and ZStream.mapZIOParUnordered (and the corresponding ZPipeline operations), but ended up producing an alternative, optimized implementation of ZIO.forkIn.

The idea behind the optimization is to avoid the additional intermediate scope object that the current implementation requires, and instead to use low-level atomic operations to track the fiber and finalizer states.
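
To illustrate the kind of atomic state tracking described above, here is a minimal sketch in plain Scala. It is not the code from this branch: the names `ForkState`, `TrackedFork`, `onFiberDone` and `onScopeClose` are hypothetical, and the `interrupt` callback stands in for whatever the real implementation does to stop the child fiber. The only point it shows is that a single compare-and-set can decide whether the fiber's completion or the scope's finalizer "wins", without allocating an intermediate scope.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical names throughout; this is an illustration, not ZIO's actual code.
object ForkState {
  val Running     = 0 // fiber was forked and the scope is still open
  val FiberDone   = 1 // fiber finished first: the finalizer has nothing left to do
  val ScopeClosed = 2 // scope closed first: the fiber has to be interrupted
}

final class TrackedFork(interrupt: () => Unit) {
  import ForkState._

  private val state = new AtomicInteger(Running)

  // Called when the child fiber completes. Returns true if the fiber won the race.
  def onFiberDone(): Boolean =
    state.compareAndSet(Running, FiberDone)

  // Called when the scope closes. Interrupts the fiber only if it is still running.
  def onScopeClose(): Boolean = {
    val won = state.compareAndSet(Running, ScopeClosed)
    if (won) interrupt()
    won
  }
}
```

Whichever side performs the successful compare-and-set owns the cleanup, so the other side becomes a no-op. That is the property that would let a finalizer be registered directly in the target scope.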

So far I've modified the two stream operators to use this implementation and added two variations to ForkAllBenchmark. The relevant benchmark results are below:

sbt -J-Xmx4g   'benchmarks/jmh:run -i10  zio.ForkAllBenchmark\..*'

[info] Benchmark                   (count)   Mode  Cnt      Score      Error  Units
[info] ForkAllBenchmark.run              1  thrpt   20  97175.155 ± 5230.725  ops/s
[info] ForkAllBenchmark.run            128  thrpt   20  22355.400 ±  718.012  ops/s
[info] ForkAllBenchmark.run           1024  thrpt   20   3866.213 ±  127.598  ops/s
[info] ForkAllBenchmark.scoped           1  thrpt   20  88549.481 ± 1638.243  ops/s
[info] ForkAllBenchmark.scoped         128  thrpt   20  10970.893 ±  705.796  ops/s
[info] ForkAllBenchmark.scoped        1024  thrpt   20   1480.439 ±  101.085  ops/s
[info] ForkAllBenchmark.scopedAlt        1  thrpt   20  95303.147 ± 1360.062  ops/s
[info] ForkAllBenchmark.scopedAlt      128  thrpt   20  14009.977 ±  545.024  ops/s
[info] ForkAllBenchmark.scopedAlt     1024  thrpt   20   2101.676 ±   76.975  ops/s

Relevant ZStream benchmarks, baseline:

sbt -J-Xmx4g  'benchmarks/jmh:run -i20  zio.StreamParBenchmark\.zioMap.*'
[info] Benchmark                              (chunkCount)  (chunkSize)  (parChunkSize)   Mode  Cnt  Score   Error  Units
[info] StreamParBenchmark.zioMapPar                  10000         5000              50  thrpt   20  0.574 ± 0.034  ops/s
[info] StreamParBenchmark.zioMapParUnordered         10000         5000              50  thrpt   20  0.645 ± 0.036  ops/s

Relevant ZStream benchmarks on this branch:

sbt -J-Xmx4g  'benchmarks/jmh:run -i20  zio.StreamParBenchmark\.zioMap.*'

[info] Benchmark                              (chunkCount)  (chunkSize)  (parChunkSize)   Mode  Cnt  Score   Error  Units
[info] StreamParBenchmark.zioMapPar                  10000         5000              50  thrpt   20  0.751 ± 0.018  ops/s
[info] StreamParBenchmark.zioMapParUnordered         10000         5000              50  thrpt   20  0.821 ± 0.011  ops/s
[success] Total time: 128 s (0:02:08.0), completed 16 Apr 2025, 13:36:29

I'm introducing this work as a draft PR in order to discuss the optimization approach.

eyalfa · 2025-04-16T16:12:19Z

@kyri-petrou @ghostdogpr, I'd appreciate your input on this.

guizmaii · 2025-04-17T00:53:55Z

FYI, I already provided an optimisation for ZStream.mapZIOParUnordered in #9602.
I'm just waiting for a review, but it's the kind of optimisation we have already done in some other places. Maybe you can base your work on that PR?

eyalfa · 2025-04-20T17:59:11Z

@kyri-petrou, @guizmaii, @ghostdogpr,
I've managed to come up with an even faster implementation that doesn't rely on an additional atomic: in this approach the fiber itself manages the registration of its finalizer in the scope.
This approach also exposes a limit on the scope's concurrency level. I've introduced additional benchmarks that use multiple scopes (roughly 1 scope per 256 forked fibers), which demonstrate an app-level mitigation for the issue (it could also be made part of ZIO internals that fork heavily).
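
As a rough illustration of the multi-scope mitigation, the sketch below spreads forked fibers over several scopes so that no single scope has to register all the finalizers. It is not the benchmark code from this branch: `MultiScopeFork`, `forkAcrossScopes` and the `FibersPerScope` constant are hypothetical names, and the value 256 is simply the approximate ratio mentioned above. It assumes ZIO 2.x on the classpath and uses only `Scope.make`, `forkIn`, `join` and `close`.

```scala
import zio._

// Hypothetical helper, written only to illustrate the idea of sharding forks over scopes.
object MultiScopeFork {
  // Roughly one scope per 256 forked fibers; the exact value is an assumption.
  val FibersPerScope = 256

  // Forks `count` copies of `task`, assigning fiber i to scope i / FibersPerScope.
  // Returns the number of scopes that were used.
  def forkAcrossScopes(count: Int)(task: UIO[Unit]): UIO[Int] = {
    val scopeCount = (count + FibersPerScope - 1) / FibersPerScope
    for {
      scopes <- ZIO.foreach((0 until scopeCount).toVector)(_ => Scope.make)
      fibers <- ZIO.foreach((0 until count).toVector) { i =>
                  task.forkIn(scopes(i / FibersPerScope))
                }
      _      <- ZIO.foreachDiscard(fibers)(_.join)
      _      <- ZIO.foreachDiscard(scopes)(_.close(Exit.unit))
    } yield scopeCount
  }
}
```

With `count = 1024` this uses four scopes, so each scope sees at most 256 concurrent finalizer registrations instead of 1024.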

Benchmark results (M-suffixed benchmarks use multiple scopes):

[info] Benchmark                    (count)   Mode  Cnt      Score      Error  Units
[info] ForkAllBenchmark.scoped            1  thrpt   20  92752.573 ±  586.722  ops/s
[info] ForkAllBenchmark.scoped          128  thrpt   20  11150.409 ±  174.932  ops/s
[info] ForkAllBenchmark.scoped         1024  thrpt   20   1584.055 ±   17.354  ops/s
[info] ForkAllBenchmark.scopedAlt         1  thrpt   20  96579.610 ± 1324.357  ops/s
[info] ForkAllBenchmark.scopedAlt       128  thrpt   20  19045.804 ±  107.888  ops/s
[info] ForkAllBenchmark.scopedAlt      1024  thrpt   20   2140.608 ±   12.332  ops/s
[info] ForkAllBenchmark.scopedAltM        1  thrpt   20  94756.176 ± 3354.481  ops/s
[info] ForkAllBenchmark.scopedAltM      128  thrpt   20  18515.166 ±  833.846  ops/s
[info] ForkAllBenchmark.scopedAltM     1024  thrpt   20   3383.836 ±   59.078  ops/s
[info] ForkAllBenchmark.scopedM           1  thrpt   20  93495.217 ±  428.504  ops/s
[info] ForkAllBenchmark.scopedM         128  thrpt   20  11385.373 ±   63.730  ops/s
[info] ForkAllBenchmark.scopedM        1024  thrpt   20   1666.338 ±   11.102  ops/s

eyalfa · 2025-04-21T07:32:09Z

Benchmarks after renaming the Alt impl to be the real impl:

[info] Benchmark                     (count)   Mode  Cnt      Score      Error  Units
[info] ForkAllBenchmark.scoped             1  thrpt   20  97644.621 ±  711.240  ops/s
[info] ForkAllBenchmark.scoped           128  thrpt   20  18344.875 ±  248.346  ops/s
[info] ForkAllBenchmark.scoped          1024  thrpt   20   2130.510 ±   25.111  ops/s
[info] ForkAllBenchmark.scopedM            1  thrpt   20  92957.889 ± 1580.500  ops/s
[info] ForkAllBenchmark.scopedM          128  thrpt   20  18397.223 ±  348.382  ops/s
[info] ForkAllBenchmark.scopedM         1024  thrpt   20   3428.264 ±   57.074  ops/s
[info] ForkAllBenchmark.scopedOrig         1  thrpt   20  93339.264 ±  764.003  ops/s
[info] ForkAllBenchmark.scopedOrig       128  thrpt   20  11502.471 ±  150.414  ops/s
[info] ForkAllBenchmark.scopedOrig      1024  thrpt   20   1640.955 ±   25.957  ops/s
[info] ForkAllBenchmark.scopedOrigM        1  thrpt   20  90530.373 ± 3243.244  ops/s
[info] ForkAllBenchmark.scopedOrigM      128  thrpt   20   9853.839 ± 2693.841  ops/s
[info] ForkAllBenchmark.scopedOrigM     1024  thrpt   20   1659.317 ±   87.482  ops/s

eyalfa · 2025-04-21T09:41:03Z

I'm closing this in order to reopen it with a more appropriate branch name and description.

Please see #9813.

eyalfa added 9 commits April 14, 2025 23:46

stream_par_ops_strm_first: mapZIOPar

246a67b

stream_par_ops_strm_first: mapZIOParUnordered

6a86b3e

stream_par_ops_strm_first: bypass deferred upstream when possible

58cd742

stream_par_ops_strm_first: temprarily revert the deferred upstream op…

2069acd

…t due to env propagation issues

stream_par_ops_strm_first: introduce scope.forkSingle, use it to impl…

6e9c680

…ement forkInAlt

stream_par_ops_strm_first: use forkInAlt to optimize stream.mapZioPar…

50b01e3

…[Unordered]

Merge branch 'series/2.x' into stream_par_ops_strm_first

342cd73

stream_par_ops_strm_first: correct and fast forkInAlt impl

42a76c7

stream_par_ops_strm_first: introduce benchmark for forkScoped

c58992d

eyalfa added 5 commits April 17, 2025 23:05

stream_par_ops_strm_first: forkInAlt version that doesn't use atomics…

1f8aa79

…, but it introduces more concurrent pressure on the scope resulting with inconsistent benchmarks results

Revert "stream_par_ops_strm_first: forkInAlt version that doesn't use…

d649380

… atomics, but it introduces more concurrent pressure on the scope resulting with inconsistent benchmarks results" This reverts commit 1f8aa79.

stream_par_ops_strm_first: added few alternative impls, settled on th…

5d6e20f

…e one managing the finalizer from within the child fiber. introduce additional benchmarks showing how can the scope's concurrency limits can be bypassed

stream_par_ops_strm_first: cleanup

d22bd68

stream_par_ops_strm_first: fmt

ae9d888

eyalfa added 4 commits April 20, 2025 21:21

stream_par_ops_strm_first: scala3 syntax related fixes

53c3ced

stream_par_ops_strm_first: rewireforkIn to use the new impl, keep the…

ff83a2f

… old impl as a private[zio] (which hopefully pass the mima tests), modify the benchmark names accordingly

stream_par_ops_strm_first: fmt

c69a9a6

Merge branch 'series/2.x' into stream_par_ops_strm_first

62ff1ac

eyalfa added 5 commits April 21, 2025 10:58

stream_par_ops_strm_first: remove the orig impl

c23bae4

stream_par_ops_strm_first: fmt

03f1ea1

Merge branch 'series/2.x' into stream_par_ops_strm_first

4fbca2c

stream_par_ops_strm_first: forSingle on global scope must return an e…

f700e07

…ffect that effectively executes the finalizer when explicitly invoked

stream_par_ops_strm_first: mima

ee42365

eyalfa marked this pull request as ready for review April 21, 2025 09:09

eyalfa mentioned this pull request Apr 21, 2025

forkIn optimization #9813

Open

eyalfa closed this Apr 21, 2025
