FiberRuntime optimizations
#8671
Merged
Attempt to fix #8659
There are 4 micro-optimizations in this PR which collectively result in a ~10-15% performance improvement:
1. `RuntimeFlags`: Add private `isEnabled` et al. method overloads which take `mask: Int` as an argument (as opposed to `flag: RuntimeFlag`). This way, we can utilize the inlining of the `final val mask` whenever the flag being tested is statically known and avoid the megamorphic virtual call.
2. `FiberRuntime#runLoop`: Call `updateLastTrace(???)` after we've pattern matched on the concrete class type wherever possible (happy path). Similar to (1), this avoids the megamorphic virtual method call.
3. `FiberRuntime#updateLastTrace`: Store a reference to `Trace.empty` in a private val and check against it, as reading a local val is more performant (according to the benchmarks) than calling the method on the `Trace` object.
4. Avoid calling `inbox.isEmpty` prior to `inbox.poll()` by checking whether the value returned by `poll()` is `null`.

With these changes, the `BroadFlatMapBenchmark` benchmarks consistently show a ~10-15% performance improvement. I also published local versions of series/2.x and the PR branch and tested them against one of Caliban's E2E benchmarks; the improvement is consistent: Caliban's benchmarks show a ~10-15% improvement in throughput.
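The mask-overload trick in (1) can be illustrated with a minimal sketch. All names here (`FlagMaskSketch`, `Flags`, `Interruption`, `OpSupervision`) are hypothetical stand-ins for the pattern, not ZIO's actual `RuntimeFlags` internals:

```scala
// Minimal model of the mask-overload pattern: when the flag is statically
// known, the caller passes the constant `final val mask` directly and the
// virtual call to `flag.mask` is avoided entirely.
object FlagMaskSketch {
  sealed abstract class Flag { def mask: Int }
  case object Interruption  extends Flag { final val mask = 1 << 0 }
  case object OpSupervision extends Flag { final val mask = 1 << 1 }

  final case class Flags(packed: Int) {
    // Generic path: `flag.mask` is a virtual call; with many Flag
    // subclasses observed at a call site it becomes megamorphic.
    def isEnabled(flag: Flag): Boolean = isEnabled(flag.mask)

    // Fast path: plain bitwise test against an already-inlined constant.
    def isEnabled(mask: Int): Boolean = (packed & mask) != 0
  }

  def main(args: Array[String]): Unit = {
    val flags = Flags(Interruption.mask)
    println(flags.isEnabled(Interruption.mask)) // prints true
    println(flags.isEnabled(OpSupervision))     // prints false
  }
}
```

Because `mask` is a `final val` initialized with a compile-time constant, the compiler can inline it at call sites where the flag is statically known, so the monomorphic `Int` overload is taken on the happy path.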
Finally, as we can see in the profiling graphs below, the `itable stub` blocks are (mostly) gone now:

series/2.x:

PR branch:

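As a footnote, the inbox-draining change in (4) is mechanical enough to sketch. This is an illustrative model, not ZIO's actual inbox code (the real inbox type and element type may differ):

```scala
import java.util.concurrent.ConcurrentLinkedQueue

object DrainSketch {
  // Drain by checking poll()'s null return: one queue operation per
  // element, instead of an isEmpty check followed by a separate poll.
  def drain(inbox: ConcurrentLinkedQueue[String]): List[String] = {
    val drained = List.newBuilder[String]
    var message = inbox.poll() // null means the queue is empty
    while (message ne null) {
      drained += message
      message = inbox.poll()
    }
    drained.result()
  }

  def main(args: Array[String]): Unit = {
    val inbox = new ConcurrentLinkedQueue[String]()
    inbox.add("a")
    inbox.add("b")
    println(drain(inbox)) // prints List(a, b)
    println(drain(inbox)) // prints List()
  }
}
```

`ConcurrentLinkedQueue.poll()` is documented to return `null` when the queue is empty, so the extra `isEmpty` traversal of the queue head is redundant.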