@kyri-petrou
Contributor

Attempt to fix #8659

This PR contains four micro-optimizations that collectively yield a ~10-15% performance improvement:

  1. RuntimeFlags: Add private overloads of isEnabled (and related methods) that take mask: Int as an argument instead of flag: RuntimeFlag. This lets the compiler inline the final val mask whenever the flag being tested is statically known, avoiding the megamorphic virtual call.
  2. FiberRuntime#runLoop: Call updateLastTrace(???) after pattern matching on the concrete class type wherever possible (the happy path). As in (1), this avoids a megamorphic virtual method call.
  3. FiberRuntime#updateLastTrace: Store a reference to Trace.empty in a private val and compare against it; according to the benchmarks, reading a local val is faster than calling the method on the Trace object.
  4. Avoid calling inbox.isEmpty before inbox.poll(); instead, check whether the value returned by poll() is null.
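
To illustrate the idea behind (1), here is a minimal sketch. The names `Flag` and `Flags` are hypothetical, simplified stand-ins and not the actual `RuntimeFlag` / `RuntimeFlags` sources; the point is only that a mask-based overload lets a statically known flag skip the virtual `mask` lookup.

```scala
// Hypothetical stand-ins for illustration only; not the ZIO implementation.
sealed trait Flag {
  // Abstract accessor: calling it through a `Flag` reference is an
  // interface call that is dispatched at runtime.
  def mask: Int
}

object Flag {
  // `final val` with a constant right-hand side is a constant value
  // definition, so references such as `Flag.OpLog.mask` can be inlined.
  case object Interruption extends Flag { final val mask = 1 << 0 }
  case object OpLog        extends Flag { final val mask = 1 << 1 }
  case object FiberRoots   extends Flag { final val mask = 1 << 2 }
}

object Flags {
  // Generic entry point: the flag is only known at runtime,
  // so `flag.mask` goes through virtual dispatch.
  def isEnabled(flags: Int, flag: Flag): Boolean =
    isEnabled(flags, flag.mask)

  // Mask-based overload: a plain bit test with no dispatch at all.
  private def isEnabled(flags: Int, mask: Int): Boolean =
    (flags & mask) != 0

  // The flag is statically known here, so the constant mask is passed
  // directly to the overload above.
  def interruption(flags: Int): Boolean =
    isEnabled(flags, Flag.Interruption.mask)

  def enable(flags: Int, flag: Flag): Int =
    flags | flag.mask
}
```

For example, `Flags.interruption(Flags.enable(0, Flag.Interruption))` evaluates to `true` without going through the generic `isEnabled(flags, flag)` path.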

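The change in (4) can be sketched as follows. This assumes the inbox behaves like a `java.util.Queue` (a `ConcurrentLinkedQueue` is used here purely for illustration), whose `poll()` returns `null` when the queue is empty; `InboxDrain` and its methods are hypothetical names, not code from the PR.

```scala
import java.util.concurrent.ConcurrentLinkedQueue

// Hypothetical helper for illustration; not part of FiberRuntime.
object InboxDrain {

  // Before: every iteration calls into the queue twice (isEmpty, then poll).
  def drainWithIsEmpty[A <: AnyRef](inbox: ConcurrentLinkedQueue[A])(handle: A => Unit): Int = {
    var handled = 0
    while (!inbox.isEmpty) {
      val msg = inbox.poll()
      // Another consumer may have emptied the queue in between.
      if (msg ne null) { handle(msg); handled += 1 }
    }
    handled
  }

  // After: poll() alone is enough, since it returns null on an empty queue.
  def drainWithPoll[A <: AnyRef](inbox: ConcurrentLinkedQueue[A])(handle: A => Unit): Int = {
    var handled = 0
    var msg     = inbox.poll()
    while (msg ne null) {
      handle(msg)
      handled += 1
      msg = inbox.poll()
    }
    handled
  }
}
```

Both variants process the same messages in the same order; the second simply makes one queue call per message instead of two.
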
With these changes, BroadFlatMapBenchmark consistently shows a ~10-15% performance improvement:

series/2.x:
[info] Benchmark                                                    (depth)   Mode  Cnt      Score     Error  Units
[info] BroadFlatMapBenchmark.zioBroadFlatMap                             20  thrpt    5  19911.313 ± 580.931  ops/s

PR:
[info] Benchmark                                                    (depth)   Mode  Cnt      Score     Error  Units
[info] BroadFlatMapBenchmark.zioBroadFlatMap                             20  thrpt    5  21731.037 ± 287.458  ops/s

I also published local versions of series/2.x and the PR branch and tested them against one of Caliban's E2E benchmarks. The improvement is consistent: with these changes, Caliban's benchmarks show a ~10-15% improvement in throughput:

series/2.x:
[info] Benchmark                                           Mode  Cnt    Score    Error  Units
[info] NestedZQueryBenchmark.multifieldBatchedQuery1000   thrpt    5  727.499 ± 10.052  ops/s

PR branch:
[info] Benchmark                                           Mode  Cnt    Score    Error  Units
[info] NestedZQueryBenchmark.multifieldBatchedQuery1000   thrpt    5  845.926 ± 14.887  ops/s
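
For reference, the relative gain can be derived from the scores quoted above. The helper below is a hypothetical sketch that compares the mean scores only and ignores the reported error margins.

```scala
// Hypothetical helper; compares mean JMH scores and ignores error bars.
object Speedup {
  // Relative throughput gain of `candidate` over `baseline`, in percent.
  def relativeGainPercent(baseline: Double, candidate: Double): Double =
    (candidate / baseline - 1.0) * 100.0

  // Mean scores copied from the benchmark output in this PR description.
  val broadFlatMap: Double = relativeGainPercent(19911.313, 21731.037)
  val nestedZQuery: Double = relativeGainPercent(727.499, 845.926)
}
```

On these particular runs this works out to roughly 9% for BroadFlatMapBenchmark and roughly 16% for NestedZQueryBenchmark.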

Finally, as we can see in the profiling graphs below, the itable stub blocks are (mostly) gone now:

series/2.x:
image

PR branch:
image

@adamgfraser adamgfraser merged commit ecb38a3 into zio:series/2.x Feb 20, 2024
@kyri-petrou
Contributor Author

@adamgfraser should I create a PR to backport these improvements to series/2.0.x as well?

@adamgfraser
Contributor

@kyri-petrou Sure!

Development

Successfully merging this pull request may close these issues.

Performance loss due to interface method invocation in ZIO's executor
