
EgorBo
Member

@EgorBo EgorBo commented Oct 29, 2022

Fixes #77638, reduces the number of jump stubs emitted from 44k to zero.

Codegen example for a method not within reloc32 reach:

static object Foo() => new object();
G_M29399_IG01:              ;; offset=0000H
       4883EC28             sub      rsp, 40
G_M29399_IG02:              ;; offset=0004H
       48B9E0950831F87F0000 mov      rcx, 0x7FF8310895E0      ; System.Object
       48B88004BB90F87F0000 mov      rax, 0x7FF890BB0480      ; function address
       FFD0                 call     rax ; CORINFO_HELP_NEWSFAST
       90                   nop      
G_M29399_IG03:              ;; offset=001BH
       4883C428             add      rsp, 40
       C3                   ret      
; Total bytes of code 32

where previously we emitted just a call to a jump-stub.
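To make "within reloc32 reach" concrete, here is a minimal C++ sketch of the reachability test, not the JIT's actual code. `FitsInRel32` and the constants are hypothetical names. The instruction sizes come from the x64 encodings visible in the listing above: `mov rax, imm64` is 10 bytes, `call rax` is 2 bytes, and a direct `call rel32` is 5 bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Encoded sizes in bytes (x64).
constexpr std::uint64_t kCallRel32Size = 5;   // E8 + 4-byte displacement
constexpr std::uint64_t kMovImm64Size  = 10;  // 48 B8 + 8-byte immediate
constexpr std::uint64_t kCallRegSize   = 2;   // FF D0
constexpr std::uint64_t kMovCallSize   = kMovImm64Size + kCallRegSize;

// Hypothetical helper: true if a `call rel32` located at callSiteAddr
// can reach target with a signed 32-bit displacement.
bool FitsInRel32(std::uint64_t callSiteAddr, std::uint64_t target)
{
    // Precondition: the end of the instruction must not wrap around.
    assert(callSiteAddr <= std::numeric_limits<std::uint64_t>::max() - kCallRel32Size);

    // The displacement is relative to the address of the next instruction.
    const std::uint64_t nextIp = callSiteAddr + kCallRel32Size;
    const std::int64_t delta = static_cast<std::int64_t>(target - nextIp);

    return delta >= std::numeric_limits<std::int32_t>::min() &&
           delta <= std::numeric_limits<std::int32_t>::max();
}
```

When the check fails, the options discussed in this thread are a jump stub placed within reach of the call site, or the 12-byte `mov`/`call` pair shown in the listing.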

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 29, 2022
@ghost ghost assigned EgorBo Oct 29, 2022
@ghost

ghost commented Oct 29, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@EgorBo

This comment was marked as outdated.

@EgorBo EgorBo marked this pull request as ready for review October 29, 2022 21:25
@EgorBo
Member Author

EgorBo commented Oct 29, 2022

I did a couple of runs: with current main I got 44k jump stubs in every single run of the benchmark. With this PR it's mostly 0, though one run produced 271.

@EgorBo
Member Author

EgorBo commented Oct 29, 2022

Hm... however, this PR might regress managed-to-managed calls if both are not within reach but are next to each other. Probably still better than emitting jump stubs for literally all VM helpers such as allocators.

So a better fix (as was done on arm) is to rewrite the code manager to know the exact place in memory the current code will be saved to.
That said, using direct addresses doesn't seem painful on x64: it takes a single instruction to load one.

@EgorBo EgorBo marked this pull request as draft October 29, 2022 23:39
@jkotas
Member

jkotas commented Oct 30, 2022

Probably still better than emitting jump stubs for literally all VM helpers such as allocators.

How does the mov rax, XXXXX + call rax sequence work with the call target predictor in current CPUs?

IIRC, this sequence consumed precious slots in the call target predictor back in the day, and going through jump stubs was faster in real workloads since it was better predicted. (Also, the smaller code helps too: instruction caches are often a bottleneck in .NET workloads since the code is so big.)

@EgorBo EgorBo force-pushed the fix-IsCallTargetInRange branch from 984b13b to cebfced Compare October 31, 2022 23:23
@EgorBo
Member Author

EgorBo commented Oct 31, 2022

@jkotas I've just pushed a change to limit this to dynamic methods, as you suggested in #62302 (comment)

It lowers the number of jump stubs generated from 44k to ~4k per benchmark launch.

However, it relies on getRelocTypeHint being more precise than it is now. But it's probably still better than what happens now: emitting jump stubs for small temporary dynamic methods?

@EgorBo
Member Author

EgorBo commented Oct 31, 2022

IIRC, this sequence consumed precious slots in the call target predictor back in the day, and going through jump stubs was faster in real workloads since it was better predicted.

Reminds me of this issue on Apple M1: https://twitter.com/dougallj/status/1580826539205496832 🙂

@jkotas
Member

jkotas commented Oct 31, 2022

But it's probably still better than what happens now: emitting jump stubs for small temporary dynamic methods?

I expect that it will regress small workloads with dynamic methods.

The current scheme used by the JIT/EE interface is to assume that everything fits into Rel32. Once we hit one situation that does not fit, we will redo JITing of that method and stop returning IMAGE_REL_BASED_REL32 from getRelocTypeHint from then on. It allows small workloads to get full rel32 benefit, and only large workloads with gigabytes of code to pay the overhead.

I think it would be best to extend the current scheme to handle the dynamic jumpstub case.
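The "assume rel32 until it fails once" scheme described above can be sketched in C++ as follows. This is a toy model under stated assumptions, not the runtime's implementation: `RelocHint`, `RelocPolicy`, and `CompileMethod` are hypothetical names, and the real hint is the value returned by getRelocTypeHint.

```cpp
#include <cassert>

// Hypothetical stand-in for the hint the EE hands to the JIT.
enum class RelocHint { Rel32, Absolute64 };

// Hypothetical process-wide state: optimistic until the first overflow.
struct RelocPolicy {
    bool rel32Allowed = true;

    RelocHint Hint() const {
        return rel32Allowed ? RelocHint::Rel32 : RelocHint::Absolute64;
    }

    // Called when an address turned out not to fit into rel32.
    void RecordOverflow() { rel32Allowed = false; }
};

// Models compiling one method; returns the number of JIT attempts.
// addressFitsInRel32 says whether the method's addresses are reachable.
int CompileMethod(RelocPolicy& policy, bool addressFitsInRel32) {
    int attempts = 1;
    if (policy.Hint() == RelocHint::Rel32 && !addressFitsInRel32) {
        // The optimistic guess failed: stop giving the rel32 hint from
        // now on and redo the JITing of this method.
        policy.RecordOverflow();
        ++attempts;
    }
    assert(attempts <= 2);  // the retry no longer relies on rel32
    return attempts;
}
```

In this model a small workload never triggers `RecordOverflow` and keeps the compact rel32 encoding everywhere, while a large one pays for a single re-JIT and then uses absolute addresses.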

@EgorBo
Member Author

EgorBo commented Oct 31, 2022

Once we hit one situation that does not fit, we will redo JITing of that method and stop returning IMAGE_REL_BASED_REL32

Isn't it a rare event and is not supposed to happen normally for small apps?
Btw, if a method reports "I fit into rel32" but it turns out not to, a jump stub is emitted but the global REL32 is not turned off. My understanding is that we stop using REL32 globally once e.g. a memory load doesn't fit.

@jkotas
Member

jkotas commented Oct 31, 2022

Isn't it a rare event and is not supposed to happen normally for small apps?

Right, it is not supposed to happen for normal apps.

if a method reports "I fit into rel32" but it turns out not to, a jump stub is emitted but the global REL32 is not turned off.

Right. I am suggesting to change the "a jump stub is emitted" part into "if it is a dynamic method { set a global flag that says to use inline call targets for dynamic methods and redo the method } else { a jump stub is emitted }".
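The suggested change maps onto a small decision function. The C++ sketch below only mirrors the pseudocode in the comment above; `CallFixup`, `DynamicCallPolicy`, and `OnCallTargetOutOfReach` are hypothetical names and do not exist in the runtime.

```cpp
#include <cassert>

// What to do with a call whose target turned out to be out of rel32 reach.
enum class CallFixup { EmitJumpStub, RedoWithInlineTargets };

// Hypothetical global state: once set, dynamic methods are compiled with
// inline call targets (mov rax, imm64 + call rax) instead of call rel32.
struct DynamicCallPolicy {
    bool inlineTargetsForDynamicMethods = false;
};

CallFixup OnCallTargetOutOfReach(DynamicCallPolicy& policy, bool isDynamicMethod) {
    if (isDynamicMethod) {
        // Set the global flag and ask for the method to be redone.
        policy.inlineTargetsForDynamicMethods = true;
        // The redo must observe the flag, otherwise it would hit the
        // same out-of-reach call again.
        assert(policy.inlineTargetsForDynamicMethods);
        return CallFixup::RedoWithInlineTargets;
    }
    // Regular methods keep today's behavior.
    return CallFixup::EmitJumpStub;
}
```

Regular methods are untouched in this sketch, so small workloads keep the rel32 benefit; only dynamic methods switch to the longer sequence, and only after the first miss.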

@ghost

ghost commented Dec 1, 2022

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@ghost ghost closed this Dec 1, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Dec 31, 2022
This pull request was closed.

Successfully merging this pull request may close these issues.

Too many jump stubs