Emit less jump-stubs on x64 #77639

EgorBo · 2022-10-29T19:46:29Z

Fixes #77638, reduces the number of jump stubs emitted from 44k to zero.

Codegen example for a method not within reloc32 reach:

static object Foo() => new object();

G_M29399_IG01:              ;; offset=0000H
       4883EC28             sub      rsp, 40
G_M29399_IG02:              ;; offset=0004H
       48B9E0950831F87F0000 mov      rcx, 0x7FF8310895E0      ; System.Object
       48B88004BB90F87F0000 mov      rax, 0x7FF890BB0480      ; function address
       FFD0                 call     rax ; CORINFO_HELP_NEWSFAST
       90                   nop      
G_M29399_IG03:              ;; offset=001BH
       4883C428             add      rsp, 40
       C3                   ret      
; Total bytes of code 32

where previously we emitted just a call to a jump-stub.
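For readers unfamiliar with the "reloc32 reach" constraint, here is a minimal sketch of the arithmetic involved. It is not the JIT's actual code: `FitsInRel32` and the two size constants are hypothetical names introduced only for illustration. The facts it relies on are that `call rel32` is 5 bytes (opcode `E8` plus a 4-byte displacement measured from the next instruction) and that the `mov rax, imm64` + `call rax` pair in the listing above takes 10 + 2 bytes.

```cpp
#include <cstdint>
#include <limits>

// Size in bytes of `call rel32` (opcode E8 + 4-byte displacement).
constexpr int kCallRel32Size = 5;
// Size of `mov rax, imm64` (10 bytes) followed by `call rax` (2 bytes).
constexpr int kMovCallRaxSize = 10 + 2;

// Hypothetical helper: returns true if `target` can be encoded as the rel32
// displacement of a 5-byte call instruction located at `callSite`.
// The displacement is relative to the address of the next instruction.
inline bool FitsInRel32(std::uint64_t callSite, std::uint64_t target)
{
    const std::uint64_t next = callSite + kCallRel32Size;
    // Wrapping subtraction reinterpreted as signed yields the signed distance.
    const std::int64_t delta = static_cast<std::int64_t>(target - next);
    return delta >= std::numeric_limits<std::int32_t>::min() &&
           delta <= std::numeric_limits<std::int32_t>::max();
}
```

Under these assumptions the indirect sequence costs 7 more bytes per call site than a direct `call rel32`, which is the code-size trade-off raised later in the thread.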

ghost · 2022-10-29T19:46:43Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes #77638, reduces the number of jump stubs emitted from 44k to zero.

Author:	EgorBo
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	-

EgorBo · 2022-10-29T21:28:34Z

I did a couple of runs: with the current main I got 44k jump stubs in every single run of the benchmark. With this PR it is mostly 0, although one run produced 271.

EgorBo · 2022-10-29T22:01:53Z

Hm... however, this PR might regress managed-to-managed calls if both calls are not within reach but are next to each other. Probably still better than emitting jump-stubs for literally all VM helpers such as allocators.

So a better fix (as was done on arm) is to rewrite the code manager so that it knows the exact place in memory where the current code will be saved.
Although, it doesn't seem painful on x64 to use direct addresses... a single instruction to populate it.

jkotas · 2022-10-30T06:53:47Z

> Probably still better than emitting jump-stubs for literally all VM helpers such as allocators.

How does the `mov rax, XXXXX` + `call rax` sequence work with the call target predictor in current CPUs?

IIRC, this sequence consumed precious slots in the call target predictor back in the day, and going through jump stubs was faster in real workloads since it was better predicted. (Also, the smaller code helps too: instruction caches are often a bottleneck in .NET workloads since the code is so large.)

EgorBo · 2022-10-31T23:26:47Z

@jkotas I've just pushed a change to limit this to Dynamic methods like you suggested in #62302 (comment)

It lowers the number of jump stubs generated from 44k to ~4k per benchmark launch.

However, it relies on a more precise getRelocTypeHint than it is now. But probably still better than what it ends up now - emitting jump-stubs for small temp dynamic methods?

EgorBo · 2022-10-31T23:29:04Z

> IIRC, this sequence consumed precious slots in call target predictor back in the day and going through jump stubs was faster in real workloads since it was better predicted.

Reminds me of this issue on Apple M1: https://twitter.com/dougallj/status/1580826539205496832 🙂

jkotas · 2022-10-31T23:38:52Z

> But probably still better than what it ends up now - emitting jump-stubs for small temp dynamic methods?

I expect that it will regress small workloads with dynamic methods.

The current scheme used by the JIT/EE interface is to assume that everything fits into Rel32. Once we hit one situation that does not fit, we will redo JITing of that method and stop returning IMAGE_REL_BASED_REL32 from getRelocTypeHint from then on. It allows small workloads to get full rel32 benefit, and only large workloads with gigabytes of code to pay the overhead.

I think it would be best to extend the current scheme to handle the dynamic jumpstub case.
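The optimistic scheme described above can be sketched roughly as follows. This is an illustrative model only, not the runtime's implementation: `RelocHint`, `RelocPolicy`, and `CompileWithRetry` are hypothetical names, and the `compile` callback stands in for one JIT attempt that reports whether every rel32 reference fit. It models the case where a reference that cannot fall back to a jump stub does not fit.

```cpp
#include <functional>

// Hypothetical stand-in for the hint returned to the JIT.
enum class RelocHint { Rel32, Absolute64 };

// Hypothetical process-wide state: starts optimistic, flips once on overflow.
struct RelocPolicy
{
    bool rel32Allowed = true;

    RelocHint Hint() const
    {
        return rel32Allowed ? RelocHint::Rel32 : RelocHint::Absolute64;
    }
};

// Compiles with the current hint. On the first overflow, rel32 is disabled
// for all later compilations and the method is compiled a second time.
// Returns the number of compile attempts made (1 or 2).
inline int CompileWithRetry(RelocPolicy& policy,
                            const std::function<bool(RelocHint)>& compile)
{
    int attempts = 1;
    if (!compile(policy.Hint()))
    {
        policy.rel32Allowed = false; // never hand out the rel32 hint again
        ++attempts;
        compile(policy.Hint());      // redo the method with absolute addresses
    }
    return attempts;
}
```

In this model a small workload never takes the retry path, so it keeps the compact rel32 encodings throughout; only a process that actually overflows pays for the second compilation and the larger code afterwards.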

EgorBo · 2022-10-31T23:43:53Z

> Once we hit one situation that does not fit, we will redo JITing of that method and stop returning IMAGE_REL_BASED_REL32

Isn't it a rare event and is not supposed to happen normally for small apps?
Btw, if a method reports "I fit into rel32" but it turns out not to - a jump stub is emitted but the global REL32 is not turned off. My understanding is that we stop using REL32 globally once e.g. a memory load doesn't fit.

jkotas · 2022-10-31T23:56:15Z

> Isn't it a rare event and is not supposed to happen normally for small apps?

Right, it is not supposed to happen for normal apps.

> if a method reports "I fit into rel32" but it turns out not to - a jump stub is emitted but the global REL32 is not turned off.

Right. I am suggesting changing the "a jump stub is emitted" part into "if it is a dynamic method { set a global flag that says to use inline call targets for dynamic methods and redo the method } else { a jump stub is emitted }".
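The suggested branch can be written out as a small decision function. This is only a sketch of the pseudocode in the comment above; `OverflowAction`, `DynamicMethodPolicy`, and `OnCallTargetOutOfReach` are hypothetical names and do not correspond to existing runtime APIs.

```cpp
// What to do when a call target turns out not to be within rel32 reach.
enum class OverflowAction { EmitJumpStub, RejitWithInlineTargets };

// Hypothetical global flag: once set, dynamic methods are compiled with
// inline (absolute) call targets instead of rel32 calls.
struct DynamicMethodPolicy
{
    bool inlineCallTargets = false;
};

// Dynamic methods flip the global flag and are compiled again;
// all other methods keep the existing jump-stub behavior.
inline OverflowAction OnCallTargetOutOfReach(bool isDynamicMethod,
                                             DynamicMethodPolicy& policy)
{
    if (isDynamicMethod)
    {
        policy.inlineCallTargets = true;
        return OverflowAction::RejitWithInlineTargets;
    }
    return OverflowAction::EmitJumpStub;
}
```

Under this sketch, workloads whose dynamic methods never overflow keep emitting rel32 calls, which is the property the existing scheme is meant to preserve.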

ghost · 2022-12-01T05:02:19Z

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

ghost added the `area-CodeGen-coreclr` label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Oct 29, 2022

ghost assigned EgorBo Oct 29, 2022

EgorBo marked this pull request as ready for review October 29, 2022 21:25

EgorBo marked this pull request as draft October 29, 2022 23:39

Limit to dynamic method

cebfced

EgorBo force-pushed the fix-IsCallTargetInRange branch from 984b13b to cebfced October 31, 2022 23:23

EgorBo mentioned this pull request Nov 6, 2022

Dynamic PGO startup improvements in NET 8 #76969

Closed

23 tasks

ghost closed this Dec 1, 2022

ghost locked as resolved and limited conversation to collaborators Dec 31, 2022

This pull request was closed.
