Unmasking EGPRs in register allocator #114867

Open. 15 commits to be merged into base: main.

Conversation

khushal1996
Member

@khushal1996 khushal1996 commented Apr 21, 2025

Public issue: #114559

Description:
Currently in the JIT, we have enabled extended general-purpose registers (EGPRs) in most instructions via the REX2 or extended EVEX prefix. Previously, however, for the sake of testing stability, we masked out EGPRs for most GenTree nodes when building them in the register allocator (LinearScan::BuildNode). This PR reverts that masking and makes EGPRs actually available to those nodes.

Specifically, the following nodes and routines previously applied masks, and this PR removes all of them:

  • BuildHWIntrinsic
  • BuildIntrinsic
  • BuildBlockStore
  • BuildCall
  • BuildShiftRotate
  • GT_INDEX_ADDR
  • GT_CKFINITE
  • GT_RETURNTRAP
  • GT_SWITCH_TABLE
  • GT_JMPTABLE
  • BuildMul
  • BuildIndir
  • BuildCast
  • genFnProlog
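
To illustrate what "masking out" EGPRs means, here is a minimal sketch. It assumes a register set modeled as a 32-bit mask in which bits 0-15 stand for the legacy GPRs and bits 16-31 for the APX EGPRs (R16-R31). `RegSet`, `kLowGprMask`, `maskOutEgprs`, and `keepAll` are hypothetical names for illustration only, not the JIT's actual types or functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: one bit per general-purpose register.
// Bits 0-15 stand for the legacy GPRs, bits 16-31 for the APX EGPRs (R16-R31).
using RegSet = std::uint32_t;

constexpr RegSet kLowGprMask = 0x0000FFFFu;

// Behaviour before this PR (sketch): strip the EGPR bits from a node's candidates.
constexpr RegSet maskOutEgprs(RegSet candidates)
{
    return candidates & kLowGprMask;
}

// Behaviour after this PR (sketch): the candidates are left untouched,
// so the allocator is free to pick R16-R31 as well.
constexpr RegSet keepAll(RegSet candidates)
{
    return candidates;
}
```

With all 32 bits set as input, `maskOutEgprs` leaves only the low 16 bits, while `keepAll` returns the full set.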

Results:

SuperPMI asmdiff:
latest main + APX on vs. latest main + changes + APX on

Note: We intend to demonstrate the improvement from our changes, so both base and diff are run with APX on. We have added an icount (instruction count) column to the SuperPMI results to measure how well our changes work. The following columns are added in the SuperPMI run below:

  • Base Instruction Count: total instructions in base.
  • Diff Instruction Count: delta from the total instructions in base, followed by two percentages: the change considering only methods that had a diff, and the change considering only methods whose icount changed.
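
As a rough illustration of how a percentage like the ones in the Diff Instruction Count column could be derived, here is a small sketch. The formula (delta divided by the base count) and the name `icountPercentChange` are assumptions for illustration; the actual SuperPMI computation is not shown in this PR.

```cpp
#include <cassert>
#include <cstdint>

// Percentage change of the diff instruction count relative to the base count.
// Assumption: the caller has already restricted both counts to the set of
// methods of interest (for example, only methods that had a diff).
// Returns 0.0 when the base count is zero to avoid dividing by zero.
double icountPercentChange(std::int64_t baseCount, std::int64_t diffCount)
{
    if (baseCount == 0)
    {
        return 0.0;
    }
    return 100.0 * static_cast<double>(diffCount - baseCount) / static_cast<double>(baseCount);
}
```

A negative result means the diff executes fewer instructions than the base for the chosen set of methods.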

image

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 21, 2025
@khushal1996 khushal1996 changed the title unmasking EGPRs in register allocator Unmasking EGPRs in register allocator Apr 21, 2025
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Apr 21, 2025
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@khushal1996 khushal1996 marked this pull request as ready for review May 2, 2025 09:59
@Copilot Copilot AI review requested due to automatic review settings May 2, 2025 09:59
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR reverts the masking of extended GPRs (EGPRs) in the register allocator to allow their full usage in JIT code generation.

  • Removed calls to BuildApxIncompatibleGPRMask in favor of unmasked register definitions when extended registers are available.
  • Updated multiple LSRA routines and code emission paths to use the new useEvex/canUseApxRegs driven logic.
  • Introduced the BuildApxIncompatibleGPRMaskIfNeeded helper to streamline candidates selection when EVEX support is available.

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/coreclr/jit/lsraxarch.cpp | Removed masking calls in node building and introduced useEvex checks. |
| src/coreclr/jit/lsrabuild.cpp | Updated BuildBinaryUses and related routines to conditionally use APX masks. |
| src/coreclr/jit/lsra.h | Added inline declaration for BuildApxIncompatibleGPRMaskIfNeeded. |
| src/coreclr/jit/instrsxarch.h | Added Encoding_REX2 to imul instructions for extended registers. |
| src/coreclr/jit/emitxarch.cpp | Adjusted movbe emission to select the APX variant based on register usage. |
| src/coreclr/jit/codegenxarch.cpp | Updated code emission paths for movbe and internal register extraction. |
| src/coreclr/jit/codegencommon.cpp | Modified function prologue to conditionally exclude high registers. |

@khushal1996
Member Author

@dotnet/intel for review

@BruceForstall
Member

cc @kunalspathak @dotnet/jit-contrib

@kunalspathak kunalspathak self-requested a review May 2, 2025 23:13
@khushal1996
Member Author

@kunalspathak for reviews

INSTMUL(imul_SI, "imul", IUM_RD, BAD_CODE, 0x003068, BAD_CODE, INS_TT_NONE, Writes_OF | Undefined_SF | Undefined_ZF | Undefined_AF | Undefined_PF | Writes_CF | INS_FLAGS_Has_Sbit | INS_Flags_Has_NF)
INSTMUL(imul_DI, "imul", IUM_RD, BAD_CODE, 0x003868, BAD_CODE, INS_TT_NONE, Writes_OF | Undefined_SF | Undefined_ZF | Undefined_AF | Undefined_PF | Writes_CF | INS_FLAGS_Has_Sbit | INS_Flags_Has_NF)
INSTMUL(imul_AX, "imul", IUM_RD, BAD_CODE, 0x000068, BAD_CODE, INS_TT_NONE, Writes_OF | Undefined_SF | Undefined_ZF | Undefined_AF | Undefined_PF | Writes_CF | INS_FLAGS_Has_Sbit | INS_Flags_Has_NF | Encoding_REX2)
INSTMUL(imul_CX, "imul", IUM_RD, BAD_CODE, 0x000868, BAD_CODE, INS_TT_NONE, Writes_OF | Undefined_SF | Undefined_ZF | Undefined_AF | Undefined_PF | Writes_CF | INS_FLAGS_Has_Sbit | INS_Flags_Has_NF | Encoding_REX2)
Member

What changed in this section? Is it mostly spaces?

@@ -3841,19 +3841,24 @@ int LinearScan::BuildBinaryUses(GenTreeOp* node, SingleTypeRegSet candidates)
GenTree* op1 = node->gtGetOp1();
GenTree* op2 = node->gtGetOp2IfPresent();
#ifdef TARGET_XARCH
const bool canUseApxRegs = compiler->canUseApxEncoding() && compiler->canUseEvexEncoding();
Member

canUseApxEncoding

Can you cache this value in the compiler object or in LinearScan?

if (op1->isContainedIndir() &&
((varTypeUsesFloatReg(node) || node->OperGet() == GT_BSWAP || node->OperGet() == GT_BSWAP16)) &&
candidates == RBM_NONE)
if (op1->isContainedIndir() && candidates == RBM_NONE && !canUseApxRegs)
Member

Suggested change
if (op1->isContainedIndir() && candidates == RBM_NONE && !canUseApxRegs)
if (op1->isContainedIndir() && (candidates == RBM_NONE) && !canUseApxRegs)

@@ -3879,7 +3882,7 @@ int LinearScan::BuildBinaryUses(GenTreeOp* node, SingleTypeRegSet candidates)
{

#ifdef TARGET_XARCH
if (op2->isContainedIndir() && varTypeUsesFloatReg(op1) && candidates == RBM_NONE)
if (op2->isContainedIndir() && candidates == RBM_NONE && !canUseApxRegs)
Member

Suggested change
if (op2->isContainedIndir() && candidates == RBM_NONE && !canUseApxRegs)
if (op2->isContainedIndir() && (candidates == RBM_NONE) && !canUseApxRegs)

Member

It seems that here we should have if ((op2->isContainedIndir() && (candidates == RBM_NONE)) || !canUseApxRegs). Basically, even if op2 is not a contained indir and candidates != RBM_NONE, we should still use lowGprRegs if we cannot use APX registers.
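
The difference between the condition as written in the diff and the one suggested in this comment can be sketched with plain booleans. The function names `restrictAsWritten` and `restrictAsSuggested` are hypothetical, and the parameters stand in for op2->isContainedIndir(), candidates == RBM_NONE, and canUseApxRegs; "restrict" here means taking the branch that falls back to the low GPRs.

```cpp
#include <cassert>

// Condition as written in the diff (sketch): restrict only when the operand is
// a contained indirection, no candidates were supplied, and APX is unavailable.
constexpr bool restrictAsWritten(bool containedIndir, bool noCandidates, bool canUseApxRegs)
{
    return containedIndir && noCandidates && !canUseApxRegs;
}

// Condition as suggested in the review comment (sketch): additionally restrict
// whenever APX registers cannot be used, regardless of the other two inputs.
constexpr bool restrictAsSuggested(bool containedIndir, bool noCandidates, bool canUseApxRegs)
{
    return (containedIndir && noCandidates) || !canUseApxRegs;
}
```

The two forms disagree, for example, when op2 is not a contained indir and APX is unavailable: the written form does not restrict, while the suggested form does.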

@@ -3879,7 +3882,7 @@ int LinearScan::BuildBinaryUses(GenTreeOp* node, SingleTypeRegSet candidates)
{

#ifdef TARGET_XARCH
if (op2->isContainedIndir() && varTypeUsesFloatReg(op1) && candidates == RBM_NONE)
Member

Likewise here. Can you double-check whether && !canUseApxRegs is the right check?

@@ -3982,14 +3984,6 @@ void LinearScan::BuildStoreLocDef(GenTreeLclVarCommon* storeLoc,
defCandidates = allRegs(type);
#endif // TARGET_X86

#ifdef TARGET_AMD64
if (op1->isContained() && op1->OperIs(GT_BITCAST) && varTypeUsesIntReg(varDsc->GetRegisterType(storeLoc)))
Member

So we don't care whether APX regs are available here, and below for GT_BITCAST too?

@@ -1422,6 +1409,7 @@ int LinearScan::BuildBlockStore(GenTreeBlk* blkNode)
SingleTypeRegSet dstAddrRegMask = RBM_NONE;
SingleTypeRegSet srcRegMask = RBM_NONE;
SingleTypeRegSet sizeRegMask = RBM_NONE;
bool useEvex = compiler->canUseEvexEncoding();
Member

See if you can cache this value in the LinearScan object. It is used in multiple places.
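
One way the caching requested here could look is sketched below. `FakeCompiler` and `FakeLinearScan` are hypothetical stand-ins, not the real Compiler or LinearScan classes; the sketch only shows the pattern of querying the capability once at construction and reusing the stored flag afterwards.

```cpp
#include <cassert>

// Hypothetical stand-in for the compiler object; it counts how often it is queried.
struct FakeCompiler
{
    mutable int queries       = 0;
    bool        evexSupported = true;

    bool canUseEvexEncoding() const
    {
        ++queries;
        return evexSupported;
    }
};

// Hypothetical allocator that reads the capability once in its constructor.
class FakeLinearScan
{
public:
    explicit FakeLinearScan(const FakeCompiler& compiler)
        : m_useEvex(compiler.canUseEvexEncoding())
    {
    }

    // Later uses read the cached flag instead of querying the compiler again.
    bool useEvex() const
    {
        return m_useEvex;
    }

private:
    const bool m_useEvex;
};
```

This assumes the capability cannot change during the lifetime of the allocator object, which would need to be confirmed for the real JIT.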

@@ -1542,7 +1529,8 @@ int LinearScan::BuildBlockStore(GenTreeBlk* blkNode)
// or if are but the remainder is a power of 2 and less than the
// size of a register

SingleTypeRegSet regMask = BuildApxIncompatibleGPRMask(blkNode, availableIntRegs, true);
// SingleTypeRegSet regMask = BuildApxIncompatibleGPRMask(blkNode, availableIntRegs, true);
SingleTypeRegSet regMask = availableIntRegs;
Member

At some of these places, don't we need a useEvex check, the way it is done elsewhere?

@@ -1542,7 +1529,8 @@ int LinearScan::BuildBlockStore(GenTreeBlk* blkNode)
// or if are but the remainder is a power of 2 and less than the
// size of a register

SingleTypeRegSet regMask = BuildApxIncompatibleGPRMask(blkNode, availableIntRegs, true);
// SingleTypeRegSet regMask = BuildApxIncompatibleGPRMask(blkNode, availableIntRegs, true);
Member

delete the comment

BuildUse(srcAddrOrFill, BuildApxIncompatibleGPRMask(srcAddrOrFill, srcRegMask));
if (useEvex)
{
BuildUse(srcAddrOrFill, srcRegMask);
Member

@kunalspathak kunalspathak May 8, 2025

The useEvex check is scattered throughout the code. Is it possible to create a helper method, or update BuildApxIncompatibleGPRMask, so that it takes the original mask? It would check whether useEvex == true and, if so, return the mask as is; otherwise it would perform the BuildApxIncompatibleGPRMask() operation on it to get the right mask.

Just noticed you already have BuildApxIncompatibleGPRMaskIfNeeded. Can we reuse it in other places as well?
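
The helper described in this comment can be sketched as follows. This is not the actual BuildApxIncompatibleGPRMaskIfNeeded: `RegSet`, the mask constants, and `maskIfNeeded` are hypothetical, and treating an empty candidate set as "no preference" is an assumption made for the sketch.

```cpp
#include <cassert>
#include <cstdint>

using RegSet = std::uint32_t;                // hypothetical stand-in for SingleTypeRegSet

constexpr RegSet kAllGprMask = 0xFFFFFFFFu;  // hypothetical: all 32 GPRs
constexpr RegSet kLowGprMask = 0x0000FFFFu;  // hypothetical: bits 0-15 = legacy GPRs

// Sketch of a single entry point for the scattered checks: pass the mask
// through unchanged when the wider encoding is usable, otherwise strip the
// extended registers.
RegSet maskIfNeeded(RegSet candidates, bool useEvex)
{
    if (useEvex)
    {
        return candidates;
    }
    // Assumption: an empty set means "no preference", so start from all registers.
    const RegSet base = (candidates == 0) ? kAllGprMask : candidates;
    return base & kLowGprMask;
}
```

Call sites would then pass their candidate set and the flag to one function instead of branching on useEvex themselves.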

GenTree* user = nullptr;

if (LIR::AsRange(blockSequence[curBBSeqNum]).TryGetUse(intrinsicTree, &use))
// TODO-XArch-APX: some of the permute intrinsics are APX-EVEX compatible, we need to separate and
Member

Do we need a GH issue for this?

assert(INS_imul_21 - INS_imul_AX == REG_R21);
assert(INS_imul_22 - INS_imul_AX == REG_R22);
assert(INS_imul_23 - INS_imul_AX == REG_R23);
// TODO-XArch-APX: The asserts below need the register definition from R24~R31.
Member

We already have it, right?
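
The asserts quoted above check that the per-register imul instruction entries are laid out in the same order as the register numbers. A shortened sketch of that invariant follows; the enums here are hypothetical four-entry miniatures, not the JIT's real tables, and `imulForReg` is an illustrative name.

```cpp
#include <cassert>

// Hypothetical, shortened enums: the real definitions have many more entries.
enum Reg
{
    REG_AX = 0,
    REG_CX = 1,
    REG_DX = 2,
    REG_BX = 3
};

enum Ins
{
    INS_imul_AX = 100,
    INS_imul_CX,
    INS_imul_DX,
    INS_imul_BX
};

// Because both enums follow the same order, the instruction for a register
// can be found by offsetting from the first entry.
constexpr Ins imulForReg(Reg reg)
{
    return static_cast<Ins>(static_cast<int>(INS_imul_AX) + static_cast<int>(reg));
}

// Compile-time version of the layout check.
static_assert(INS_imul_BX - INS_imul_AX == REG_BX, "imul entries must follow register order");
```

If a new register were added to one enum without a matching instruction entry in the same position, a check of this kind would fail at build time.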

@kunalspathak
Member

SuperPMI asmdiff:

Can you also share the tpdiff numbers for similar configuration?

Also, I was hoping the asmdiff on CI would be zero-diff, but I am still seeing differences. Do we know why?
