Conversation

@eyalfa
Contributor

@eyalfa eyalfa commented Apr 16, 2025

This toy branch started as an attempt to optimize ZStream.mapZIOPar and ZStream.mapZIOParUnordered (and the counterpart ZPipeline operations), but ended up as an alternative, optimized implementation of ZIO.forkIn.

The idea behind the optimization is to avoid the additional intermediate scope object required by the current implementation, and to use low-level atomic operations to track the fiber and finalizer states.

I've modified the two stream operators to use this implementation and added two variations to ForkAllBenchmark. The relevant benchmark results are below:

sbt -J-Xmx4g   'benchmarks/jmh:run -i10  zio.ForkAllBenchmark\..*'

[info] Benchmark                   (count)   Mode  Cnt      Score      Error  Units
[info] ForkAllBenchmark.run              1  thrpt   20  97175.155 ± 5230.725  ops/s
[info] ForkAllBenchmark.run            128  thrpt   20  22355.400 ±  718.012  ops/s
[info] ForkAllBenchmark.run           1024  thrpt   20   3866.213 ±  127.598  ops/s
[info] ForkAllBenchmark.scoped           1  thrpt   20  88549.481 ± 1638.243  ops/s
[info] ForkAllBenchmark.scoped         128  thrpt   20  10970.893 ±  705.796  ops/s
[info] ForkAllBenchmark.scoped        1024  thrpt   20   1480.439 ±  101.085  ops/s
[info] ForkAllBenchmark.scopedAlt        1  thrpt   20  95303.147 ± 1360.062  ops/s
[info] ForkAllBenchmark.scopedAlt      128  thrpt   20  14009.977 ±  545.024  ops/s
[info] ForkAllBenchmark.scopedAlt     1024  thrpt   20   2101.676 ±   76.975  ops/s

Relevant ZStream benchmarks, baseline:

sbt -J-Xmx4g  'benchmarks/jmh:run -i20  zio.StreamParBenchmark\.zioMap.*'
[info] Benchmark                              (chunkCount)  (chunkSize)  (parChunkSize)   Mode  Cnt  Score   Error  Units
[info] StreamParBenchmark.zioMapPar                  10000         5000              50  thrpt   20  0.574 ± 0.034  ops/s
[info] StreamParBenchmark.zioMapParUnordered         10000         5000              50  thrpt   20  0.645 ± 0.036  ops/s

Relevant ZStream benchmarks in this branch:

sbt -J-Xmx4g  'benchmarks/jmh:run -i20  zio.StreamParBenchmark\.zioMap.*'

[info] Benchmark                              (chunkCount)  (chunkSize)  (parChunkSize)   Mode  Cnt  Score   Error  Units
[info] StreamParBenchmark.zioMapPar                  10000         5000              50  thrpt   20  0.751 ± 0.018  ops/s
[info] StreamParBenchmark.zioMapParUnordered         10000         5000              50  thrpt   20  0.821 ± 0.011  ops/s
[success] Total time: 128 s (0:02:08.0), completed 16 Apr 2025, 13:36:29
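
To make the comparison between the two StreamParBenchmark runs easier to read, the ratio of the scores can be computed directly. The snippet below is only a convenience sketch: the object name `StreamParSpeedup` and the helper `speedup` are made up for illustration, the scores are copied from the tables above, and the reported error margins are ignored.

```scala
object StreamParSpeedup {
  // Scores in ops/s, copied from the baseline run above.
  val baseline: Map[String, Double] =
    Map("zioMapPar" -> 0.574, "zioMapParUnordered" -> 0.645)

  // Scores in ops/s, copied from the run on this branch.
  val branch: Map[String, Double] =
    Map("zioMapPar" -> 0.751, "zioMapParUnordered" -> 0.821)

  // Ratio of branch throughput to baseline throughput (values above 1 mean faster).
  def speedup(name: String): Double = branch(name) / baseline(name)

  def main(args: Array[String]): Unit =
    baseline.keys.toList.sorted.foreach { name =>
      println(f"$name%-20s ${speedup(name)}%.2fx")
    }
}
```

With these numbers the ratios come out at roughly 1.31 for zioMapPar and 1.27 for zioMapParUnordered.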

I'm introducing this work as a draft PR in order to discuss the optimization approach.

@eyalfa
Contributor Author

eyalfa commented Apr 16, 2025

@kyri-petrou @ghostdogpr, I'd appreciate your input on this.

@guizmaii
Member

guizmaii commented Apr 17, 2025

FYI, I already provided some optimisation for ZStream.mapZIOParUnordered here: #9602
I'm just waiting for a review, but it's the kind of optimisation we've already done in some other places. Maybe you can base your work on this PR?

eyalfa added 5 commits April 17, 2025 23:05
…, but it introduces more concurrent pressure on the scope resulting with inconsistent benchmarks results
… atomics, but it introduces more concurrent pressure on the scope resulting with inconsistent benchmarks results"

This reverts commit 1f8aa79.
…e one managing the finalizer from within the child fiber. introduce additional benchmarks showing how can the scope's concurrency limits can be bypassed
@eyalfa
Contributor Author

eyalfa commented Apr 20, 2025

@kyri-petrou, @guizmaii, @ghostdogpr,
I've managed to come up with an even faster implementation that doesn't rely on an additional atomic. In this approach, the fiber itself manages the registration of the finalizer into the scope.
This approach also exposes a limit on the scope's concurrency level. I've introduced additional benchmarks that use multiple scopes (roughly 1 scope per 256 forked fibers), which demonstrate an app-level mitigation of the issue (this could also be made part of ZIO internals that fork heavily).
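
The multi-scope mitigation could look roughly like the sketch below. This is not the code of the benchmarks in this branch: `ShardedScopes`, `forkAllSharded` and `fibersPerScope` are hypothetical names chosen for illustration, and the only thing taken from the description above is the ratio of about one scope per 256 forked fibers. It assumes ZIO 2 and a positive `fibersPerScope`.

```scala
import zio._

object ShardedScopes {
  // Hypothetical helper, not part of ZIO or of this PR: forks `count` copies
  // of `task`, spreading them over several scopes so that no single scope
  // has more than `fibersPerScope` fibers forked into it.
  def forkAllSharded[A](count: Int, fibersPerScope: Int = 256)(task: UIO[A]): UIO[Vector[A]] = {
    // Round up, and always keep at least one scope.
    val scopeCount = math.max(1, (count + fibersPerScope - 1) / fibersPerScope)

    for {
      // One independent closeable scope per shard.
      scopes <- ZIO.foreach(Vector.range(0, scopeCount))(_ => Scope.make)
      results <- (for {
                   // Fiber i is forked into shard i / fibersPerScope.
                   fibers <- ZIO.foreach(Vector.range(0, count)) { i =>
                               task.forkIn(scopes(i / fibersPerScope))
                             }
                   values <- ZIO.foreach(fibers)(_.join)
                 } yield values)
                   // Close every shard once the work has finished, failed or been interrupted.
                   .onExit(exit => ZIO.foreachDiscard(scopes)(_.close(exit)))
    } yield results
  }
}
```

For example, `ShardedScopes.forkAllSharded(1024)(ZIO.unit)` would spread 1024 fibers over 4 scopes. A production version would probably also want to guard the window between creating the scopes and attaching the `onExit` handler against interruption.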

Benchmark results (M-suffixed benchmarks use multiple scopes):

[info] Benchmark                    (count)   Mode  Cnt      Score      Error  Units
[info] ForkAllBenchmark.scoped            1  thrpt   20  92752.573 ±  586.722  ops/s
[info] ForkAllBenchmark.scoped          128  thrpt   20  11150.409 ±  174.932  ops/s
[info] ForkAllBenchmark.scoped         1024  thrpt   20   1584.055 ±   17.354  ops/s
[info] ForkAllBenchmark.scopedAlt         1  thrpt   20  96579.610 ± 1324.357  ops/s
[info] ForkAllBenchmark.scopedAlt       128  thrpt   20  19045.804 ±  107.888  ops/s
[info] ForkAllBenchmark.scopedAlt      1024  thrpt   20   2140.608 ±   12.332  ops/s
[info] ForkAllBenchmark.scopedAltM        1  thrpt   20  94756.176 ± 3354.481  ops/s
[info] ForkAllBenchmark.scopedAltM      128  thrpt   20  18515.166 ±  833.846  ops/s
[info] ForkAllBenchmark.scopedAltM     1024  thrpt   20   3383.836 ±   59.078  ops/s
[info] ForkAllBenchmark.scopedM           1  thrpt   20  93495.217 ±  428.504  ops/s
[info] ForkAllBenchmark.scopedM         128  thrpt   20  11385.373 ±   63.730  ops/s
[info] ForkAllBenchmark.scopedM        1024  thrpt   20   1666.338 ±   11.102  ops/s

@eyalfa
Contributor Author

eyalfa commented Apr 21, 2025

Benchmarks after renaming the Alt impl to be the real impl:

[info] Benchmark                     (count)   Mode  Cnt      Score      Error  Units
[info] ForkAllBenchmark.scoped             1  thrpt   20  97644.621 ±  711.240  ops/s
[info] ForkAllBenchmark.scoped           128  thrpt   20  18344.875 ±  248.346  ops/s
[info] ForkAllBenchmark.scoped          1024  thrpt   20   2130.510 ±   25.111  ops/s
[info] ForkAllBenchmark.scopedM            1  thrpt   20  92957.889 ± 1580.500  ops/s
[info] ForkAllBenchmark.scopedM          128  thrpt   20  18397.223 ±  348.382  ops/s
[info] ForkAllBenchmark.scopedM         1024  thrpt   20   3428.264 ±   57.074  ops/s
[info] ForkAllBenchmark.scopedOrig         1  thrpt   20  93339.264 ±  764.003  ops/s
[info] ForkAllBenchmark.scopedOrig       128  thrpt   20  11502.471 ±  150.414  ops/s
[info] ForkAllBenchmark.scopedOrig      1024  thrpt   20   1640.955 ±   25.957  ops/s
[info] ForkAllBenchmark.scopedOrigM        1  thrpt   20  90530.373 ± 3243.244  ops/s
[info] ForkAllBenchmark.scopedOrigM      128  thrpt   20   9853.839 ± 2693.841  ops/s
[info] ForkAllBenchmark.scopedOrigM     1024  thrpt   20   1659.317 ±   87.482  ops/s

@eyalfa eyalfa marked this pull request as ready for review April 21, 2025 09:09
@eyalfa eyalfa mentioned this pull request Apr 21, 2025
@eyalfa
Contributor Author

eyalfa commented Apr 21, 2025

I'm closing this in order to reopen it with a more appropriate branch name and description.

please see #9813

@eyalfa eyalfa closed this Apr 21, 2025