JIT: use root compiler instance for sufficient PGO observation #115119

Open: wants to merge 1 commit into main
Conversation

AndyAyersMS
Member

During inlining, we evaluate some aspects of an inlinee's viability while importing its direct caller. If that caller is not the inline root we may make inconsistent observations of the overall state of PGO. So for PGO observations always consult the root compiler.

For example, the root R may have decided to inline a small method A that did not have PGO (say because of minimal profiling or lack of PGO for always inlined R2R methods), and that method calls another method B; we want to evaluate the viability of B using the PGO state of R, not of A.
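
The R/A/B scenario can be sketched with a toy model. This is not the JIT's actual `Compiler` class: the `inlineParent` and `hasSufficientPgo` fields and both `observe...` helpers are hypothetical names invented for illustration, and only `impInlineRoot()` and `fgHaveSufficientProfileWeights()` are names taken from this PR (their bodies here are assumptions).

```cpp
#include <cassert>

// Toy stand-in for the JIT's Compiler class. Only the two member names used
// in this PR are modeled; the fields and method bodies are assumptions.
struct Compiler
{
    Compiler* inlineParent     = nullptr; // hypothetical: null for the inline root
    bool      hasSufficientPgo = false;   // hypothetical stored flag

    // Walk up the chain of inliners until the root is reached.
    Compiler* impInlineRoot()
    {
        Compiler* c = this;
        while (c->inlineParent != nullptr)
        {
            c = c->inlineParent;
        }
        return c;
    }

    bool fgHaveSufficientProfileWeights() const
    {
        return hasSufficientPgo;
    }
};

// Observation made by asking the direct caller (behavior before this PR).
bool observeViaCaller(Compiler* caller)
{
    assert(caller != nullptr);
    return caller->fgHaveSufficientProfileWeights();
}

// Observation made by asking the inline root (behavior with this PR).
bool observeViaRoot(Compiler* caller)
{
    assert(caller != nullptr);
    return caller->impInlineRoot()->fgHaveSufficientProfileWeights();
}
```

With a root R that has sufficient PGO and an inlinee A that does not, evaluating B while importing A yields `false` from `observeViaCaller(&A)` but `true` from `observeViaRoot(&A)`, which is the inconsistency described above.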

@Copilot Copilot AI review requested due to automatic review settings April 28, 2025 16:11
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 28, 2025
Contributor

@Copilot Copilot AI left a comment


Pull Request Overview

This PR refactors inlining behavior to ensure that PGO observations are based on the root compiler instance. Key changes include:

  • Logging updates in inlinepolicy.cpp to reflect the PGO state of the root compiler.
  • Modification in importercalls.cpp to use impInlineRoot()->fgHaveSufficientProfileWeights() for consistency.
  • Additional debug messages in fginline.cpp to report the PGO data state.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/coreclr/jit/inlinepolicy.cpp | Added and refined JITDUMP logging for inline candidate evaluation. |
| src/coreclr/jit/importercalls.cpp | Updated to reference the root compiler PGO state. |
| src/coreclr/jit/fginline.cpp | Introduced extra debug logging for PGO status in inliner. |
Comments suppressed due to low confidence (3)

src/coreclr/jit/inlinepolicy.cpp:1375

  • [nitpick] Consider including additional context—such as the inline candidate identifier—in this debug log for clearer traceability of inlining decisions.
JITDUMP("Callee has trusted profile\n");

src/coreclr/jit/inlinepolicy.cpp:1410

  • [nitpick] Consider appending the inline candidate’s identifier to this debug message to improve the clarity and usefulness when diagnosing inlining failures.
JITDUMP("Callee IL size %u exceeds maxCodeSize %u\n", m_CodeSize, maxCodeSize);

src/coreclr/jit/fginline.cpp:790

  • [nitpick] Ensure that these verbose debug logs are appropriately gated for production builds to avoid unintended performance impacts.
JITDUMP("INLINER: pgo source is %s; pgo data is %sconsistent; %strusted; %ssufficient\n", compGetPgoSourceName(), fgPgoConsistent ? "" : "not ", fgHaveTrustedProfileWeights() ? "" : "not ", fgHaveSufficientProfileWeights() ? "" : "not ");

@AndyAyersMS
Member Author

@dotnet/jit-contrib PTAL

Likely will be difficult to see the impact via SPMI/PMI.

@@ -9256,7 +9256,7 @@ void Compiler::impCheckCanInline(GenTreeCall* call,
             // Profile data allows us to avoid early "too many IL bytes" outs.
             //
             inlineResult->NoteBool(InlineObservation::CALLSITE_HAS_PROFILE_WEIGHTS,
-                                   compiler->fgHaveSufficientProfileWeights());
+                                   compiler->impInlineRoot()->fgHaveSufficientProfileWeights());
Member Author


This is the actual change; the rest is just dumping more state

@AndyAyersMS
Member Author

Lots of missed contexts (2% ish). I will collect a bespoke ASP.NET SPMI to get a clearer picture.

@AndyAyersMS
Member Author

> Lots of missed contexts (2% ish). I will collect a bespoke ASP.NET SPMI to get a clearer picture.

Locally this seems to be too much (excluding ~800 contexts that don't replay with base)

[17:46:35] Asm diffs found
[17:46:35] Total instructions executed by base: 201022764491
[17:46:35] Total instructions executed by diff: 218769331734
[17:46:35] Total instructions executed delta: 17746567243 (8.83% of base)

[17:42:53] Total bytes of base: 66211175
[17:42:53] Total bytes of diff: 68489412
[17:42:53] Total bytes of delta: 2278237 (3.44% of base)
[17:42:53]
[17:42:53] Total PerfScore of base: 66735541.16814355
[17:42:53] Total PerfScore of diff: 66848347.7220094
[17:42:53] Total PerfScore of delta: 112806.55386584997 (0.17% of base)
[17:42:53]
[17:42:53] Relative PerfScore Geomean: 0.5682%
[17:42:53] Relative PerfScore Geomean (Diffs): 39.2229%
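
The "% of base" figures in the summary follow directly from the base and diff totals. A minimal sketch for recomputing them, where `percentOfBase` is a hypothetical helper and not part of the SPMI tooling:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Relative change of diff versus base, expressed in percent of base.
double percentOfBase(std::int64_t base, std::int64_t diff)
{
    assert(base != 0);
    return 100.0 * static_cast<double>(diff - base) / static_cast<double>(base);
}
```

For the instruction counts above, `percentOfBase(201022764491, 218769331734)` comes out at roughly 8.83, and for the code size totals `percentOfBase(66211175, 68489412)` comes out at roughly 3.44, matching the log.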

However most of the big diffs are from OSR methods, and my bespoke SPMI artificially enhances the set of OSR methods.

[17:42:54] Top method regressions (bytes):
[17:42:54]        10314 (1,121.09% of base) : 107161.dasm - Microsoft.EntityFrameworkCore.Metadata.Conventions.Internal.ConventionDispatcher+ImmediateConventionScope:OnEntityTypeAdded(Microsoft.EntityFrameworkCore.Metadata.Builders.IConventionEntityTypeBuilder):Microsoft.EntityFrameworkCore.Metadata.Builders.IConventionEntityTypeBuilder:this (Tier1-OSR)
[17:42:54]        10242 (1,113.26% of base) : 58257.dasm - Microsoft.EntityFrameworkCore.Metadata.Conventions.Internal.ConventionDispatcher+ImmediateConventionScope:OnEntityTypeAdded(Microsoft.EntityFrameworkCore.Metadata.Builders.IConventionEntityTypeBuilder):Microsoft.EntityFrameworkCore.Metadata.Builders.IConventionEntityTypeBuilder:this (Tier1-OSR)
[17:42:54]         8914 (633.10% of base) : 171459.dasm - Markdig.Parsers.ParserList`2[System.__Canon,System.__Canon]:.ctor(System.Collections.Generic.IEnumerable`1[System.__Canon]):this (Tier1-OSR)
[17:42:54]         8781 (624.98% of base) : 171463.dasm - Markdig.Parsers.ParserList`2[System.__Canon,System.__Canon]:.ctor(System.Collections.Generic.IEnumerable`1[System.__Canon]):this (Tier1-OSR)
[17:42:54]         8218 (2,257.69% of base) : 107077.dasm - System.Diagnostics.Metrics.MeterListener:Start():this (Tier1-OSR)
[17:42:54]         8216 (2,257.14% of base) : 122787.dasm - System.Diagnostics.Metrics.MeterListener:Start():this (Tier1-OSR)
[17:42:54]         8215 (2,256.87% of base) : 110339.dasm - System.Diagnostics.Metrics.MeterListener:Start():this (Tier1-OSR)
[17:42:54]         8215 (2,256.87% of base) : 90999.dasm - System.Diagnostics.Metrics.MeterListener:Start():this (Tier1-OSR)

Going to try restricting this to be non-OSR and see what that looks like

@AndyAyersMS
Member Author

Still seems quite impactful with OSR using the old behavior -- this is with a local release baseline to rule out possible compiler version issues

[18:26:56] Total instructions executed by base: 201039018573
[18:26:56] Total instructions executed by diff: 211365390699
[18:26:56] Total instructions executed delta: 10326372126 (5.14% of base)

[17:57:53] Total bytes of base: 66211175
[17:57:53] Total bytes of diff: 67616223
[17:57:53] Total bytes of delta: 1405048 (2.12% of base)
[17:57:53]
[17:57:53] Total PerfScore of base: 66735541.16814355
[17:57:53] Total PerfScore of diff: 66586403.12200942
[17:57:53] Total PerfScore of delta: -149138.04613412917 (-0.22% of base)
[17:57:53]
[17:57:53] Relative PerfScore Geomean: 0.5572%
[17:57:53] Relative PerfScore Geomean (Diffs): 48.4549%

A next step might be to introduce an intermediate position for the case where the root compiler has sufficient PGO but the calling compiler doesn't. The main impact of this observation on the heuristic is the maximum IL size the inliner will consider: 1024 with sufficient PGO, 128 without. So in this mixed mode we could choose some value in between.
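
A rough sketch of what such a mixed mode could look like. The 1024 and 128 limits are the ones quoted above; the in-between value of 512, the function names, and the handling of the "caller has PGO but root does not" case are all assumptions made for illustration, since the thread does not settle on any of them.

```cpp
#include <cassert>

// Limits quoted in the discussion above.
constexpr unsigned kMaxIlWithPgo    = 1024;
constexpr unsigned kMaxIlWithoutPgo = 128;

// Hypothetical mixed-mode limit; the thread does not pick a value.
constexpr unsigned kMaxIlMixed = 512;

static_assert(kMaxIlWithoutPgo < kMaxIlMixed && kMaxIlMixed < kMaxIlWithPgo,
              "mixed-mode limit should sit between the two existing limits");

// Pick the IL size limit from the PGO state of the root and the direct caller.
unsigned maxInlineIlSize(bool rootHasSufficientPgo, bool callerHasSufficientPgo)
{
    if (!rootHasSufficientPgo)
    {
        // Assumption: without root PGO, fall back to the small limit.
        return kMaxIlWithoutPgo;
    }
    if (callerHasSufficientPgo)
    {
        return kMaxIlWithPgo;
    }
    // Root has sufficient PGO but the direct caller does not: mixed mode.
    return kMaxIlMixed;
}

// True if a callee of the given IL size stays within the selected limit.
bool fitsInlineBudget(unsigned ilSize, bool rootHasSufficientPgo, bool callerHasSufficientPgo)
{
    const unsigned limit = maxInlineIlSize(rootHasSufficientPgo, callerHasSufficientPgo);
    assert(limit >= kMaxIlWithoutPgo && limit <= kMaxIlWithPgo);
    return ilSize <= limit;
}
```

Under these assumptions a 300-byte callee reached through a PGO-less inlinee of a PGO-rich root would be accepted, while a 600-byte one would not; the current PR would consider both, and the old behavior neither.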

@AndyAyersMS
Member Author

@EgorBo FYI -- seems quite costly still.

@EgorBo
Member

EgorBo commented May 2, 2025

I think the change makes total sense. I guess we can just revert it if the dotnet/performance results show it isn't worth it.
