[JIT] Enable conditional chaining for Intel APX #111072

Merged
merged 11 commits into from
Apr 3, 2025

anthonycanino
Contributor

@anthonycanino anthonycanino commented Jan 3, 2025

Design

This PR mostly enables the existing conditional chaining logic for X86, now that the APX ccmp instruction is available. Currently, the optimization must be explicitly enabled via DOTNET_JitEnableApxConditionalChaining=1.
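To make the design concrete, here is a small C++ model of what chaining buys. It assumes the APX ccmp behavior as we understand it: if the source condition holds on the incoming flags, the compare executes; otherwise the flags are loaded from an immediate default flags value (dfv). Only OF/SF/ZF/CF are modeled, and `Cmp`, `CCmp`, `AndChain` and the `ccmp(EQ)` notation are illustrative names, not JIT APIs.

```cpp
#include <cassert>
#include <cstdint>

// Only the four arithmetic flags that a ccmp default flags value (dfv) covers.
struct Flags
{
    bool of, sf, zf, cf;
};

// Flags from a 64-bit "cmp a, b", i.e. the flags of a - b.
Flags Cmp(int64_t a, int64_t b)
{
    uint64_t ua = static_cast<uint64_t>(a);
    uint64_t ub = static_cast<uint64_t>(b);
    uint64_t r  = ua - ub; // unsigned math avoids signed-overflow UB
    Flags    f;
    f.zf = (r == 0);
    f.sf = ((r >> 63) & 1) != 0;
    f.cf = ua < ub;                             // borrow out of the top bit
    f.of = (((ua ^ ub) & (ua ^ r)) >> 63) != 0; // signed overflow
    return f;
}

using Cond = bool (*)(const Flags&);

bool CondEQ(const Flags& f) { return f.zf; }
bool CondLT(const Flags& f) { return f.sf != f.of; }

// Assumed ccmp semantics: if `scc` holds on the incoming flags, compare a and b;
// otherwise the flags are replaced by the default flags value.
Flags CCmp(const Flags& in, Cond scc, int64_t a, int64_t b, const Flags& dfv)
{
    return scc(in) ? Cmp(a, b) : dfv;
}

// (x == 0 && y < 5) with no branch between the two compares:
//   cmp      x, 0
//   ccmp(EQ) {dfv = all clear} y, 5   ; all clear => SF == OF => LT is false
//   setl / jl
bool AndChain(int64_t x, int64_t y)
{
    Flags f   = Cmp(x, 0);
    Flags dfv = {false, false, false, false};
    f         = CCmp(f, CondEQ, y, 5, dfv);
    return CondLT(f);
}
```

The key choice is the dfv: for an AND chain it must make the final condition false, so that a failed first compare short-circuits the whole expression.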

Testing

Note: The testing plan for APX work was discussed in #106557; please refer to that PR for details. Only results and comments will be posted in this PR. Results are posted below.

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 3, 2025
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jan 3, 2025
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@anthonycanino
Contributor Author

anthonycanino commented Jan 3, 2025

1. Intel SDE Testing

Test run with SDE:

base

Test run with SDE with DOTNET_JitEnableApxConditionalChaining=1

diff

2. SuperPMI results

Diffs are based on 2,635,272 contexts (1,050,818 MinOpts, 1,584,454 FullOpts).

MISSED contexts: 2,984 (0.11%)

Base JIT options: JitBypassApxCheck=1

Diff JIT options: JitBypassApxCheck=1;JitEnableApxIfConv=1

Overall (-169,140 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
aspnet.run.windows.x64.checked.mch 42,216,437 -6,257 -9.13%
benchmarks.run.windows.x64.checked.mch 8,860,704 -21,878 -4.99%
benchmarks.run_pgo.windows.x64.checked.mch 35,294,983 -25,089 -8.82%
benchmarks.run_tiered.windows.x64.checked.mch 12,613,813 -20,816 -4.88%
coreclr_tests.run.windows.x64.checked.mch 389,370,227 -11,578 -8.49%
libraries.crossgen2.windows.x64.checked.mch 44,888,851 +1,338 -8.60%
libraries.pmi.windows.x64.checked.mch 60,136,361 -7,956 -9.53%
libraries_tests.run.windows.x64.Release.mch 322,952,768 -56,484 -10.31%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 147,678,294 -20,097 -6.96%
realworld.run.windows.x64.checked.mch 10,242,976 -451 -6.47%
smoke_tests.nativeaot.windows.x64.checked.mch 4,496,305 +128 -6.51%
FullOpts (-169,140 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
aspnet.run.windows.x64.checked.mch 23,294,229 -6,257 -9.13%
benchmarks.run.windows.x64.checked.mch 8,860,282 -21,878 -4.99%
benchmarks.run_pgo.windows.x64.checked.mch 20,646,767 -25,089 -8.82%
benchmarks.run_tiered.windows.x64.checked.mch 3,214,524 -20,816 -4.88%
coreclr_tests.run.windows.x64.checked.mch 118,145,145 -11,578 -8.49%
libraries.crossgen2.windows.x64.checked.mch 44,887,136 +1,338 -8.60%
libraries.pmi.windows.x64.checked.mch 60,023,468 -7,956 -9.53%
libraries_tests.run.windows.x64.Release.mch 132,262,168 -56,484 -10.31%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 137,028,529 -20,097 -6.96%
realworld.run.windows.x64.checked.mch 10,018,142 -451 -6.47%
smoke_tests.nativeaot.windows.x64.checked.mch 4,495,216 +128 -6.51%

@anthonycanino anthonycanino changed the title from "Enable conditional chaining for Intel APX" to "[JIT] Enable conditional chaining for Intel APX" Jan 3, 2025
@BruceForstall BruceForstall added the apx Related to the Intel Advanced Performance Extensions (APX) label Jan 7, 2025
Comment on lines 410 to 415
// On X86, a FP compare is implemented as a fallthrough, which requires two flag checks; hence,
// we cannot simply get a single output condition to feed into a ccmp. Might be possible to chain
// this, but skipping those cases for now
GenCondition cond1;
if (op2->OperIsCmpCompare() && varTypeIsIntegralOrI(op2->gtGetOp1()) && IsInvariantInRange(op2, tree) &&
ProducesPotentialConsumableFlagsForCCMP(op1) && TryLowerConditionToFlagsNode(tree, op1, &cond1))
Member

@jakobbotsch jakobbotsch Jan 9, 2025

I think it would be preferable to get rid of ProducesPotentialConsumableFlagsForCCMP and add an argument to TryLowerConditionToFlagsNode that controls whether it is allowed to lower to a condition requiring multiple flag checks. Otherwise we have to keep ProducesPotentialConsumableFlagsForCCMP and TryLowerConditionToFlagsNode in sync.

You can use GenConditionDesc::Get(cond).jumpKind2 == EJ_NONE to check this condition in the appropriate places in TryLowerConditionToFlagsNode.
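A minimal sketch of the suggested approach, under simplified assumptions: each condition maps to a descriptor holding one jump, or two jumps joined by AND/OR, modeled loosely on GenConditionDesc. Whether a condition can feed a ccmp then becomes a lookup on that descriptor rather than a separate predicate that must be kept in sync. The enums, the `Describe` table and `CanFeedCcmp` are illustrative, not the JIT's actual table.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for the JIT's emitJumpKind and genTreeOps.
enum class JumpKind { None, JE, JNE, JL, JGE, JA, JAE, JB, JP, JNP };
enum class Combine  { None, And, Or };

// Modeled loosely on GenConditionDesc: one jump, or two jumps joined by AND/OR.
struct CondDesc
{
    JumpKind jumpKind1;
    Combine  oper;
    JumpKind jumpKind2;
};

// A few conditions, with descriptors assumed for x64 integer cmp and
// ucomis* float compares.
enum class Cond { IntEQ, IntLT, FloatOrderedGT, FloatOrderedEQ, FloatUnorderedNE };

CondDesc Describe(Cond c)
{
    switch (c)
    {
        case Cond::IntEQ:            return {JumpKind::JE, Combine::None, JumpKind::None};
        case Cond::IntLT:            return {JumpKind::JL, Combine::None, JumpKind::None};
        case Cond::FloatOrderedGT:   return {JumpKind::JA, Combine::None, JumpKind::None};
        case Cond::FloatOrderedEQ:   return {JumpKind::JE, Combine::And, JumpKind::JNP}; // ZF=1 and PF=0
        case Cond::FloatUnorderedNE: return {JumpKind::JNE, Combine::Or, JumpKind::JP};  // ZF=0 or PF=1
    }
    return {JumpKind::None, Combine::None, JumpKind::None};
}

// The suggested test: a condition can feed a ccmp only if it needs a single
// flags check. In this model, jumpKind2 == None exactly when oper == None.
bool CanFeedCcmp(Cond c)
{
    return Describe(c).jumpKind2 == JumpKind::None;
}
```

Because the descriptor table is the single source of truth, adding or changing a condition automatically updates the ccmp eligibility check.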

@anthonycanino
Contributor Author

Latest superpmi results after rebasing and making some changes.

Diffs are based on 2,669,439 contexts (1,084,276 MinOpts, 1,585,163 FullOpts).

MISSED contexts: 1,455 (0.05%)

Base JIT options: JitBypassApxCheck=1

Diff JIT options: JitBypassApxCheck=1;EnableApxConditionalChaining=1

Overall (-148,632 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
aspnet.run.windows.x64.checked.mch 53,246,741 -3,781 -3.77%
benchmarks.run.windows.x64.checked.mch 8,913,794 -21,476 -4.44%
benchmarks.run_pgo.windows.x64.checked.mch 31,189,257 -27,274 -3.64%
benchmarks.run_tiered.windows.x64.checked.mch 12,550,443 -20,509 -4.42%
coreclr_tests.run.windows.x64.checked.mch 409,509,635 -8,538 -4.23%
libraries.crossgen2.windows.x64.checked.mch 39,148,972 +4,006 -8.70%
libraries.pmi.windows.x64.checked.mch 60,485,843 -6,789 -6.83%
libraries_tests.run.windows.x64.Release.mch 328,247,219 -52,846 -4.07%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 151,291,795 -11,562 -6.46%
realworld.run.windows.x64.checked.mch 11,802,974 -131 -4.70%
smoke_tests.nativeaot.windows.x64.checked.mch 3,697,229 +268 -4.41%
FullOpts (-148,632 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
aspnet.run.windows.x64.checked.mch 28,806,967 -3,781 -3.77%
benchmarks.run.windows.x64.checked.mch 8,913,327 -21,476 -4.44%
benchmarks.run_pgo.windows.x64.checked.mch 17,455,616 -27,274 -3.64%
benchmarks.run_tiered.windows.x64.checked.mch 3,102,870 -20,509 -4.42%
coreclr_tests.run.windows.x64.checked.mch 125,765,026 -8,538 -4.23%
libraries.crossgen2.windows.x64.checked.mch 39,147,218 +4,006 -8.70%
libraries.pmi.windows.x64.checked.mch 60,372,913 -6,789 -6.83%
libraries_tests.run.windows.x64.Release.mch 132,114,880 -52,846 -4.07%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 140,638,024 -11,562 -6.46%
realworld.run.windows.x64.checked.mch 11,578,095 -131 -4.70%
smoke_tests.nativeaot.windows.x64.checked.mch 3,696,221 +268 -4.41%

@anthonycanino anthonycanino force-pushed the apx-conditional-chaining branch from a459a25 to 4f6dd62 February 26, 2025 17:46
@anthonycanino
Contributor Author

Note that we have found a small bug, caused by an implicit dependency on canUseEvexEncoding(), and we are working on a solution. In addition, we needed to patch in some code to enable EGPR for ccmp, which must use extended EVEX (not REX2) to encode EGPR (809f921).

Once we have this ironed out, I can open this as a PR ready for review.

@anthonycanino
Copy link
Contributor Author

The following are the latest SuperPMI results, rebased on main. Note that I am comparing against a patched version of main that forces APX (JitBypassApxCheck), so the diff reflects only the conditional chaining optimization.

Diffs are based on 2,673,047 contexts (1,085,228 MinOpts, 1,587,819 FullOpts).

MISSED contexts: 123 (0.00%)

Base JIT options: JitBypassApxCheck=1

Diff JIT options: JitBypassApxCheck=1;EnableApxConditionalChaining=1

Overall (-140,494 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
aspnet.run.windows.x64.checked.mch 54,183,056 -536 +2.55%
benchmarks.run.windows.x64.checked.mch 8,915,564 -21,477 -5.22%
benchmarks.run_pgo.windows.x64.checked.mch 32,755,227 -22,078 -1.74%
benchmarks.run_tiered.windows.x64.checked.mch 12,499,782 -20,479 -4.91%
coreclr_tests.run.windows.x64.checked.mch 409,304,402 -10,860 -3.56%
libraries.crossgen2.windows.x64.checked.mch 38,988,490 +3,892 -10.34%
libraries.pmi.windows.x64.checked.mch 59,843,714 -6,897 -9.30%
libraries_tests.run.windows.x64.Release.mch 329,775,208 -50,133 -0.12%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 152,213,317 -11,713 -8.41%
realworld.run.windows.x64.checked.mch 11,790,985 -138 -7.11%
smoke_tests.nativeaot.windows.x64.checked.mch 3,748,084 -75 -5.71%
FullOpts (-140,494 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
aspnet.run.windows.x64.checked.mch 29,887,748 -536 +2.55%
benchmarks.run.windows.x64.checked.mch 8,915,097 -21,477 -5.22%
benchmarks.run_pgo.windows.x64.checked.mch 18,993,176 -22,078 -1.74%
benchmarks.run_tiered.windows.x64.checked.mch 3,051,029 -20,479 -4.91%
coreclr_tests.run.windows.x64.checked.mch 125,636,448 -10,860 -3.56%
libraries.crossgen2.windows.x64.checked.mch 38,986,736 +3,892 -10.34%
libraries.pmi.windows.x64.checked.mch 59,730,784 -6,897 -9.30%
libraries_tests.run.windows.x64.Release.mch 133,918,228 -50,133 -0.12%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 141,524,187 -11,713 -8.41%
realworld.run.windows.x64.checked.mch 11,566,106 -138 -7.11%
smoke_tests.nativeaot.windows.x64.checked.mch 3,747,076 -75 -5.71%

@anthonycanino
Contributor Author

Attaching the full summary for reference:

diff_summary.1.md

@anthonycanino anthonycanino marked this pull request as ready for review March 5, 2025 00:58
@anthonycanino
Contributor Author

@jakobbotsch @BruceForstall this is ready for review. It would be helpful if we can discuss any major changes / concerns with respect to this optimization (it will be off by default, and can be tuned at a later point as well).

Member

@BruceForstall BruceForstall left a comment

LGTM. Just a few nits: we don't want names in comments, so please fix the "TODO".

@@ -8908,6 +8954,58 @@ void CodeGen::genEmitHelperCall(unsigned helper, int argSize, emitAttr retSize,
regSet.verifyRegistersUsed(killMask);
}

insOpts CodeGen::OptsFromCFlags(insCflags flags)
Member

Please add standard function header comments to this function, and the next one.
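For reference, the JIT's standard function header layout (the same one visible in the optSwitchDetectAndConvert excerpt later in this review) looks like the following. The function body is a hypothetical sketch only: the enum names, bit positions and the mapping from default flags to instruction options are invented for illustration and are not the PR's actual OptsFromCFlags.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bits describing the flags a ccmp should produce when its
// source condition is false (its default flags value).
enum HypotheticalCFlags : uint8_t
{
    CFLAGS_NONE = 0,
    CFLAGS_CF   = 1 << 0,
    CFLAGS_ZF   = 1 << 1,
    CFLAGS_SF   = 1 << 2,
    CFLAGS_OF   = 1 << 3,
};

// Hypothetical instruction-option bits consumed by an emitter.
enum HypotheticalOpts : uint32_t
{
    OPTS_NONE   = 0,
    OPTS_DFV_CF = 1u << 8,
    OPTS_DFV_ZF = 1u << 9,
    OPTS_DFV_SF = 1u << 10,
    OPTS_DFV_OF = 1u << 11,
};

//------------------------------------------------------------------------
// OptsFromCFlagsSketch: Convert a ccmp default flags value into the
//    instruction options used to encode it.
//
// Arguments:
//    flags - combination of CFLAGS_* bits to set when the ccmp's source
//            condition is false
//
// Return Value:
//    The matching OPTS_DFV_* bits, or OPTS_NONE if no flags are requested.
//
uint32_t OptsFromCFlagsSketch(uint8_t flags)
{
    uint32_t opts = OPTS_NONE;
    if ((flags & CFLAGS_CF) != 0) opts |= OPTS_DFV_CF;
    if ((flags & CFLAGS_ZF) != 0) opts |= OPTS_DFV_ZF;
    if ((flags & CFLAGS_SF) != 0) opts |= OPTS_DFV_SF;
    if ((flags & CFLAGS_OF) != 0) opts |= OPTS_DFV_OF;
    return opts;
}
```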

@@ -443,6 +443,7 @@ RELEASE_CONFIG_INTEGER(EnableArm64Sve, "EnableArm64Sve",
RELEASE_CONFIG_INTEGER(EnableEmbeddedBroadcast, "EnableEmbeddedBroadcast", 1) // Allows embedded broadcasts to be disabled
RELEASE_CONFIG_INTEGER(EnableEmbeddedMasking, "EnableEmbeddedMasking", 1) // Allows embedded masking to be disabled
RELEASE_CONFIG_INTEGER(EnableApxNDD, "EnableApxNDD", 0) // Allows APX NDD feature to be disabled
RELEASE_CONFIG_INTEGER(EnableApxConditionalChaining, "EnableApxConditionalChaining", 0) // Allows APX conditional compare chaining
Member

Do we really need a release knob for this?

Contributor Author

For now, I think it's good to have this as a knob until we are able to tune it on APX hardware (if needed).

@@ -4422,6 +4425,11 @@ bool Lowering::TryLowerConditionToFlagsNode(GenTree* parent, GenTree* condition,
GenTree* relopOp2 = relop->gtGetOp2();

#ifdef TARGET_XARCH
if (!allowMultipleFlagsChecks && cond->IsFloat())
Member

Did you run into problems with my suggestion? Not all float compares require multiple flag checks so this is unnecessarily conservative.

The SETCC case below also needs to be updated, and it has an ARM64 ifdef that should be enabled for xarch too.
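Why not all float compares need two flag checks can be seen from a small model of the flags ucomiss/ucomisd leave behind. It assumes the documented x64 behavior: unordered sets ZF, PF and CF; greater clears all three; less sets only CF; equal sets only ZF. Ordered > and >= then each test a single condition (A/AE), while ordered == must also check PF, because an unordered result sets ZF too. The function names are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Flags written by ucomiss/ucomisd (OF, SF and AF are cleared, so omitted).
struct FpFlags
{
    bool zf, pf, cf;
};

FpFlags Ucomis(double a, double b)
{
    if (std::isnan(a) || std::isnan(b))
        return {true, true, true}; // unordered
    if (a > b)
        return {false, false, false};
    if (a < b)
        return {false, false, true};
    return {true, false, false}; // equal
}

// One flags check each: unordered sets CF, so both are already false for NaN.
bool CondA(const FpFlags& f)  { return !f.cf && !f.zf; } // ordered a > b
bool CondAE(const FpFlags& f) { return !f.cf; }          // ordered a >= b

// Ordered equality needs two checks, because unordered also sets ZF.
bool CondOrderedEQ(const FpFlags& f) { return f.zf && !f.pf; }
```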

Contributor Author

I went through and kept it conservative, but I can try your suggestion. I am away for part of this week and all of next week, so I will give it a try then.

Are there any other major thoughts or comments?

Contributor Author

I'm looking at the code now and I understand a bit better. Will make the changes and try.

Member

Are there any other major thoughts or comments?

No, I think this is starting to look good. I just had a comment about unifying TryLowerAndOrToCCMP between the platforms.

Comment on lines +2303 to +2317
#if defined(TARGET_AMD64)
case GT_CCMP:
genCodeForCCMP(treeNode->AsCCMP());
break;
#endif
Member

What prevents this from being supported on x86?

Member

IIUC, APX is only available in Intel 64-bit mode.

@anthonycanino anthonycanino force-pushed the apx-conditional-chaining branch 2 times, most recently from 78e46c4 to 6ba5070 March 25, 2025 20:43
anthonycanino and others added 4 commits March 27, 2025 02:35
Also lifts GenConditionDesc into CodeGenInterface to better check which
flag lowerings will produce multiple instructions.
Some code will conflict with latest changes. I've squashed so we
can discuss how to merge in properly.
@anthonycanino anthonycanino force-pushed the apx-conditional-chaining branch from a27d330 to f294c57 March 27, 2025 09:43
@anthonycanino
Contributor Author

@jakobbotsch I think the failing tests are unrelated. I will grab one final set of SPMI results for documentation. Are there any other changes required?

Comment on lines 4531 to 4541
#ifdef TARGET_XARCH
if (!allowMultipleFlagsChecks)
{
const GenConditionDesc& desc = GenConditionDesc::Get(*cond);

if (desc.oper != GT_NONE)
{
return false;
}
}
#endif
Member

Suggested change
#ifdef TARGET_XARCH
if (!allowMultipleFlagsChecks)
{
const GenConditionDesc& desc = GenConditionDesc::Get(*cond);
if (desc.oper != GT_NONE)
{
return false;
}
}
#endif
if (!allowMultipleFlagsChecks)
{
const GenConditionDesc& desc = GenConditionDesc::Get(*cond);
if (desc.oper != GT_NONE)
{
return false;
}
}

// (NOTE: This just has to make the condition be true, i.e., if the condition calls for (SF ^ OF), then
// returning one will suffice
//
// TODO-XArch-APX: Revisit this
Member

Why this TODO?

Contributor Author

This isn't needed; it is left over from before I understood how the code worked (an internal note).
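The NOTE in the reviewed code says the chosen flags only need to make the condition true (for SF ^ OF, setting one of the two suffices). A brute-force sketch of that idea, under our own assumptions: it searches all 16 OF/SF/ZF/CF combinations and keeps one with the fewest bits set. `FindTruthifyingFlags` is a hypothetical name, and the JIT's actual selection logic may differ.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <optional>

struct Flags
{
    bool of, sf, zf, cf;
};

using Cond = bool (*)(const Flags&);

// A few x86 condition codes written over OF/SF/ZF/CF.
bool CondEQ(const Flags& f) { return f.zf; }
bool CondNE(const Flags& f) { return !f.zf; }
bool CondLT(const Flags& f) { return f.sf != f.of; }
bool CondGE(const Flags& f) { return f.sf == f.of; }
bool CondLE(const Flags& f) { return f.zf || f.sf != f.of; }
bool CondGT(const Flags& f) { return !f.zf && f.sf == f.of; }
bool CondB(const Flags& f)  { return f.cf; }
bool CondA(const Flags& f)  { return !f.cf && !f.zf; }

// Search the 16 OF/SF/ZF/CF combinations for one that makes `cond` true,
// preferring the fewest set bits. For LT (SF ^ OF) this sets just one of them.
std::optional<Flags> FindTruthifyingFlags(Cond cond)
{
    std::optional<Flags> best;
    std::size_t          bestBits = 5;
    for (unsigned mask = 0; mask < 16; mask++)
    {
        Flags       f{(mask & 8) != 0, (mask & 4) != 0, (mask & 2) != 0, (mask & 1) != 0};
        std::size_t bits = std::bitset<4>(mask).count();
        if (cond(f) && bits < bestBits)
        {
            best     = f;
            bestBits = bits;
        }
    }
    return best;
}
```

For an AND chain, the same search applied to the reversed condition yields a dfv that makes the final condition false.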

//
// Return Value:
// True if the conversion was successful, false otherwise
//
bool Compiler::optSwitchDetectAndConvert(BasicBlock* firstBlock)
bool Compiler::optSwitchDetectAndConvert(BasicBlock* firstBlock, bool testingForConversion)
Member

What is this check for, and why does it affect x64 and not arm64?

Contributor Author

I put this check in when I saw that for x64, with conditional chaining, an optimized switch statement/jump table was getting converted into a long series of ccmp instructions.

With respect to ARM, I have not dug into that.
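The concern here is one of phase ordering: if conditional chaining folds a chain of `x == c` tests into ccmp first, switch detection never sees it. A hypothetical sketch of the dry-run pattern that the new `testingForConversion` parameter suggests: the caller asks whether switch conversion would apply and, if so, leaves the chain alone. The detection rule (at least three tests on the same local) and all names are assumptions, far simpler than what optSwitchDetectAndConvert actually checks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of one "if (local == constant)" test in a chain.
struct EqTest
{
    int     lclNum;
    int64_t constant;
};

// Hypothetical minimum number of tests worth turning into a switch.
constexpr std::size_t kMinSwitchTests = 3;

// Returns true if `chain` looks like a switch candidate (enough tests, all on
// the same local). With testingForConversion == true it only answers the
// question; otherwise it also records the case values in `cases`.
bool DetectSwitchSketch(const std::vector<EqTest>& chain, bool testingForConversion, std::vector<int64_t>* cases)
{
    if (chain.size() < kMinSwitchTests)
        return false;

    for (const EqTest& t : chain)
    {
        if (t.lclNum != chain[0].lclNum)
            return false;
    }

    if (!testingForConversion && (cases != nullptr))
    {
        cases->clear();
        for (const EqTest& t : chain)
            cases->push_back(t.constant);
    }
    return true;
}

// A conditional chaining pass could then skip chains that switch detection
// would rather turn into a jump table.
bool ShouldChainWithCcmp(const std::vector<EqTest>& chain)
{
    return !DetectSwitchSketch(chain, /* testingForConversion */ true, nullptr);
}
```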

Member

Can you post the x64 diffs comparing the PR without this and the PR with this?

OK on arm64; we can try that as a follow-up to see how it is affected.

Contributor Author

Yes, I have updated: #111072 (comment)

@anthonycanino
Contributor Author

anthonycanino commented Apr 1, 2025

FYI, here are the latest SuperPMI results, using APX without conditional chaining as the base.

Diffs are based on 2,603,000 contexts (1,081,295 MinOpts, 1,521,705 FullOpts).

MISSED contexts: 3,511 (0.13%)

Base JIT options: JitBypassApxCheck=1

Diff JIT options: JitBypassApxCheck=1;EnableApxConditionalChaining=1

Overall (-118,918 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
aspnet.run.windows.x64.checked.mch 57,020,120 -1,562 +2.56%
benchmarks.run.windows.x64.checked.mch 11,381,572 -20,107 -5.14%
benchmarks.run_pgo.windows.x64.checked.mch 39,205,286 -22,901 -3.05%
benchmarks.run_pgo_optrepeat.windows.x64.checked.mch 8,821,041 -20,229 -5.19%
coreclr_tests.run.windows.x64.checked.mch 408,476,326 -4,499 -13.95%
libraries.crossgen2.windows.x64.checked.mch 38,986,902 +3,959 -9.82%
libraries.pmi.windows.x64.checked.mch 40,511,830 -4,273 -9.87%
libraries_tests.run.windows.x64.Release.mch 336,563,451 -42,717 +0.48%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 152,230,624 -6,909 -8.46%
realworld.run.windows.x64.checked.mch 11,556,000 -31 -6.77%
smoke_tests.nativeaot.windows.x64.checked.mch 4,293,239 +351 -5.89%
FullOpts (-118,918 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
aspnet.run.windows.x64.checked.mch 28,956,036 -1,562 +2.56%
benchmarks.run.windows.x64.checked.mch 11,381,029 -20,107 -5.14%
benchmarks.run_pgo.windows.x64.checked.mch 20,236,412 -22,901 -3.05%
benchmarks.run_pgo_optrepeat.windows.x64.checked.mch 8,820,574 -20,229 -5.19%
coreclr_tests.run.windows.x64.checked.mch 125,227,025 -4,499 -13.95%
libraries.crossgen2.windows.x64.checked.mch 38,985,148 +3,959 -9.82%
libraries.pmi.windows.x64.checked.mch 40,399,141 -4,273 -9.87%
libraries_tests.run.windows.x64.Release.mch 137,971,643 -42,717 +0.48%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 140,977,044 -6,909 -8.46%
realworld.run.windows.x64.checked.mch 11,331,121 -31 -6.77%
smoke_tests.nativeaot.windows.x64.checked.mch 4,292,221 +351 -5.89%

====

Diffs with optimizeSwitchDetectAndConvert removed

Diffs are based on 2,602,983 contexts (1,081,172 MinOpts, 1,521,811 FullOpts).

MISSED contexts: 3,717 (0.14%)

Base JIT options: JitBypassApxCheck=1

Diff JIT options: JitBypassApxCheck=1;EnableApxConditionalChaining=1

Overall (-98,217 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
aspnet.run.windows.x64.checked.mch 56,774,363 -1,282 +1.63%
benchmarks.run.windows.x64.checked.mch 11,381,572 -19,921 -4.44%
benchmarks.run_pgo.windows.x64.checked.mch 39,205,286 -24,364 -1.92%
benchmarks.run_pgo_optrepeat.windows.x64.checked.mch 8,821,041 -20,099 -4.58%
coreclr_tests.run.windows.x64.checked.mch 408,476,326 +12,152 -12.62%
libraries.crossgen2.windows.x64.checked.mch 38,986,902 +6,396 -8.97%
libraries.pmi.windows.x64.checked.mch 40,511,830 -2,505 -7.88%
libraries_tests.run.windows.x64.Release.mch 336,563,451 -44,147 -0.05%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 152,230,624 -5,013 -6.86%
realworld.run.windows.x64.checked.mch 11,556,000 +330 -5.01%
smoke_tests.nativeaot.windows.x64.checked.mch 4,293,239 +236 -5.77%
FullOpts (-98,217 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
aspnet.run.windows.x64.checked.mch 28,807,001 -1,282 +1.63%
benchmarks.run.windows.x64.checked.mch 11,381,029 -19,921 -4.44%
benchmarks.run_pgo.windows.x64.checked.mch 20,236,412 -24,364 -1.92%
benchmarks.run_pgo_optrepeat.windows.x64.checked.mch 8,820,574 -20,099 -4.58%
coreclr_tests.run.windows.x64.checked.mch 125,227,025 +12,152 -12.62%
libraries.crossgen2.windows.x64.checked.mch 38,985,148 +6,396 -8.97%
libraries.pmi.windows.x64.checked.mch 40,399,141 -2,505 -7.88%
libraries_tests.run.windows.x64.Release.mch 137,971,643 -44,147 -0.05%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 140,977,044 -5,013 -6.86%
realworld.run.windows.x64.checked.mch 11,331,121 +330 -5.01%
smoke_tests.nativeaot.windows.x64.checked.mch 4,292,221 +236 -5.77%
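A quick arithmetic check on the two tables: summing the per-collection diff sizes reproduces both headline totals. The difference, 20,701 bytes, is a rough measure of what keeping the switch check saves, with coreclr_tests showing the largest swing (-4,499 vs. +12,152). It is only approximate, because the two runs cover slightly different context sets (3,511 vs. 3,717 missed contexts).

```cpp
#include <array>
#include <cassert>
#include <numeric>

// Per-collection code size diffs (bytes), copied from the two runs above,
// in the same collection order as the tables.
constexpr std::array<long, 11> kWithSwitchCheck{
    -1562, -20107, -22901, -20229, -4499, +3959, -4273, -42717, -6909, -31, +351};
constexpr std::array<long, 11> kWithoutSwitchCheck{
    -1282, -19921, -24364, -20099, +12152, +6396, -2505, -44147, -5013, +330, +236};

// Sum of all per-collection diffs, i.e. the "Overall" figure.
long Total(const std::array<long, 11>& diffs)
{
    return std::accumulate(diffs.begin(), diffs.end(), 0L);
}
```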

@BruceForstall
Member

@anthonycanino Please fix the formatting issues.
@jakobbotsch Do you have any more review feedback?

@@ -4460,6 +4463,18 @@ bool Lowering::TryLowerConditionToFlagsNode(GenTree* parent, GenTree* condition,
}
#endif

#if !defined(TARGET_LOONGARCH64) && !defined(TARGET_RISCV64)
if (!allowMultipleFlagsChecks && cond->IsFloat())
Member

Suggested change
if (!allowMultipleFlagsChecks && cond->IsFloat())
if (!allowMultipleFlagsChecks)

Nit (the check below also doesn't have this)

Member

@jakobbotsch jakobbotsch left a comment

LGTM with a nit.

@BruceForstall
Member

/ba-g azure linux timeouts

@BruceForstall BruceForstall merged commit 9e8a9e9 into dotnet:main Apr 3, 2025
118 of 120 checks passed
@anthonycanino
Contributor Author

Thanks for all the review efforts @jakobbotsch and @BruceForstall !

khushal1996 pushed a commit to khushal1996/runtime that referenced this pull request Apr 9, 2025
* Enable conditional compare chaining for AMD64.

* Reduce duplication from `optSwitchDetectLikely`.

* Update src/coreclr/jit/lsrabuild.cpp

Co-authored-by: Bruce Forstall <[email protected]>

* Update src/coreclr/jit/lowerxarch.cpp

Co-authored-by: Bruce Forstall <[email protected]>

* Update src/coreclr/jit/lowerxarch.cpp

Co-authored-by: Bruce Forstall <[email protected]>

* Widen the potential candidates for ccmp folding.

Also lifts GenConditionDesc into CodeGenInterface to better check which
flag lowerings will produce multiple instructions.

* Refactor some common code into lower.cpp.

Some code will conflict with latest changes. I've squashed so we
can discuss how to merge in properly.

* Refactored common code out.

* Review edits.

* Fix build errors.

* Formatting.

---------

Co-authored-by: Bruce Forstall <[email protected]>
@github-actions github-actions bot locked and limited conversation to collaborators May 5, 2025
Labels
apx Related to the Intel Advanced Performance Extensions (APX) area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
3 participants