Unmasking EGPRs in register allocator #114867
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
ca02c0a to 2efec54
Pull Request Overview
This PR reverts the masking of extended GPRs (EGPRs) in the register allocator to allow their full usage in JIT code generation.
- Removed calls to BuildApxIncompatibleGPRMask in favor of unmasked register definitions when extended registers are available.
- Updated multiple LSRA routines and code emission paths to use the new useEvex/canUseApxRegs driven logic.
- Introduced the BuildApxIncompatibleGPRMaskIfNeeded helper to streamline candidates selection when EVEX support is available.
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.
File | Description |
---|---|
src/coreclr/jit/lsraxarch.cpp | Removed masking calls in node building and introduced useEvex checks. |
src/coreclr/jit/lsrabuild.cpp | Updated BuildBinaryUses and related routines to conditionally use APX masks. |
src/coreclr/jit/lsra.h | Added inline declaration for BuildApxIncompatibleGPRMaskIfNeeded. |
src/coreclr/jit/instrsxarch.h | Added Encoding_REX2 to imul instructions for extended registers. |
src/coreclr/jit/emitxarch.cpp | Adjusted movbe emission to select the APX variant based on register usage. |
src/coreclr/jit/codegenxarch.cpp | Updated code emission paths for movbe and internal register extraction. |
src/coreclr/jit/codegencommon.cpp | Modified function prologue to conditionally exclude high registers. |
@dotnet/intel for review
cc @kunalspathak @dotnet/jit-contrib
@kunalspathak for reviews
INSTMUL(imul_SI, "imul", IUM_RD, BAD_CODE, 0x003068, BAD_CODE, INS_TT_NONE, Writes_OF | Undefined_SF | Undefined_ZF | Undefined_AF | Undefined_PF | Writes_CF | INS_FLAGS_Has_Sbit | INS_Flags_Has_NF)
INSTMUL(imul_DI, "imul", IUM_RD, BAD_CODE, 0x003868, BAD_CODE, INS_TT_NONE, Writes_OF | Undefined_SF | Undefined_ZF | Undefined_AF | Undefined_PF | Writes_CF | INS_FLAGS_Has_Sbit | INS_Flags_Has_NF)
INSTMUL(imul_AX, "imul", IUM_RD, BAD_CODE, 0x000068, BAD_CODE, INS_TT_NONE, Writes_OF | Undefined_SF | Undefined_ZF | Undefined_AF | Undefined_PF | Writes_CF | INS_FLAGS_Has_Sbit | INS_Flags_Has_NF | Encoding_REX2)
INSTMUL(imul_CX, "imul", IUM_RD, BAD_CODE, 0x000868, BAD_CODE, INS_TT_NONE, Writes_OF | Undefined_SF | Undefined_ZF | Undefined_AF | Undefined_PF | Writes_CF | INS_FLAGS_Has_Sbit | INS_Flags_Has_NF | Encoding_REX2)
What changed in this section? Is it mostly spaces?
@@ -3841,19 +3841,24 @@ int LinearScan::BuildBinaryUses(GenTreeOp* node, SingleTypeRegSet candidates)
GenTree* op1 = node->gtGetOp1();
GenTree* op2 = node->gtGetOp2IfPresent();
#ifdef TARGET_XARCH
const bool canUseApxRegs = compiler->canUseApxEncoding() && compiler->canUseEvexEncoding();
if (op1->isContainedIndir() &&
((varTypeUsesFloatReg(node) || node->OperGet() == GT_BSWAP || node->OperGet() == GT_BSWAP16)) &&
candidates == RBM_NONE)
if (op1->isContainedIndir() && candidates == RBM_NONE && !canUseApxRegs)
if (op1->isContainedIndir() && candidates == RBM_NONE && !canUseApxRegs)
if (op1->isContainedIndir() && (candidates == RBM_NONE) && !canUseApxRegs)
@@ -3879,7 +3882,7 @@ int LinearScan::BuildBinaryUses(GenTreeOp* node, SingleTypeRegSet candidates)
{

#ifdef TARGET_XARCH
if (op2->isContainedIndir() && varTypeUsesFloatReg(op1) && candidates == RBM_NONE)
if (op2->isContainedIndir() && candidates == RBM_NONE && !canUseApxRegs)
if (op2->isContainedIndir() && candidates == RBM_NONE && !canUseApxRegs)
if (op2->isContainedIndir() && (candidates == RBM_NONE) && !canUseApxRegs)
It seems we should have `if ((op2->isContainedIndir() && (candidates == RBM_NONE)) || !canUseApxRegs)` here. Basically, even if `op2` is not a contained indir and `candidates != RBM_NONE`, we should still use `lowGprRegs` if we cannot use APX registers.
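To see how far the two conditions diverge, the sketch below enumerates all eight combinations of the three inputs. The struct and function names are hypothetical stand-ins, not JIT code; only the two boolean expressions are taken from the diff and from the comment above.

```cpp
// Hypothetical stand-ins for the three inputs to the check (not JIT types).
struct UseState
{
    bool containedIndir; // op2->isContainedIndir()
    bool noCandidates;   // candidates == RBM_NONE
    bool canUseApxRegs;  // value of canUseApxRegs
};

// Condition as written in the diff.
bool restrictAsWritten(const UseState& s)
{
    return s.containedIndir && s.noCandidates && !s.canUseApxRegs;
}

// Condition as suggested in the review comment.
bool restrictAsSuggested(const UseState& s)
{
    return (s.containedIndir && s.noCandidates) || !s.canUseApxRegs;
}

// Counts the input combinations on which the two conditions disagree.
int countDisagreements()
{
    int count = 0;
    for (int bits = 0; bits < 8; ++bits)
    {
        const UseState s{(bits & 1) != 0, (bits & 2) != 0, (bits & 4) != 0};
        if (restrictAsWritten(s) != restrictAsSuggested(s))
        {
            ++count;
        }
    }
    return count;
}
```

The two forms disagree on four of the eight combinations, and in each of them only the suggested form restricts: the three non-APX cases where the operand is not a contained indir or the candidate set is non-empty, and the APX case where the operand is a contained indir with an empty candidate set.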
@@ -3879,7 +3882,7 @@ int LinearScan::BuildBinaryUses(GenTreeOp* node, SingleTypeRegSet candidates)
{

#ifdef TARGET_XARCH
if (op2->isContainedIndir() && varTypeUsesFloatReg(op1) && candidates == RBM_NONE)
Likewise here... can you double check if `&& !canUseApxRegs` is the right check?
@@ -3982,14 +3984,6 @@ void LinearScan::BuildStoreLocDef(GenTreeLclVarCommon* storeLoc,
defCandidates = allRegs(type);
#endif // TARGET_X86

#ifdef TARGET_AMD64
if (op1->isContained() && op1->OperIs(GT_BITCAST) && varTypeUsesIntReg(varDsc->GetRegisterType(storeLoc)))
So we don't care whether APX regs are available here, and below for `GT_BITCAST` too?
@@ -1422,6 +1409,7 @@ int LinearScan::BuildBlockStore(GenTreeBlk* blkNode)
SingleTypeRegSet dstAddrRegMask = RBM_NONE;
SingleTypeRegSet srcRegMask = RBM_NONE;
SingleTypeRegSet sizeRegMask = RBM_NONE;
bool useEvex = compiler->canUseEvexEncoding();
See if you can cache this value in the `LinearScan` object. It is used in multiple places.
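One way to read this suggestion is to query the compiler once when the allocator is constructed and keep the answer in a member. The sketch below uses hypothetical `FakeCompiler` and `FakeLinearScan` classes rather than the real JIT types, and it assumes the answer cannot change while the allocator is alive.

```cpp
// Hypothetical stand-in for the compiler; counts how often the query runs.
class FakeCompiler
{
public:
    explicit FakeCompiler(bool evex) : m_evex(evex) {}

    bool canUseEvexEncoding() const
    {
        ++m_queries;
        return m_evex;
    }

    int queryCount() const { return m_queries; }

private:
    bool        m_evex;
    mutable int m_queries = 0;
};

// Hypothetical allocator that reads the flag once, at construction.
class FakeLinearScan
{
public:
    explicit FakeLinearScan(const FakeCompiler& compiler)
        : m_useEvex(compiler.canUseEvexEncoding())
    {
    }

    // Every Build* routine would call this instead of asking the compiler again.
    bool useEvex() const { return m_useEvex; }

private:
    const bool m_useEvex;
};
```

With this shape, the per-routine `bool useEvex = compiler->canUseEvexEncoding();` locals could be replaced by a read of the cached member.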
@@ -1542,7 +1529,8 @@ int LinearScan::BuildBlockStore(GenTreeBlk* blkNode)
// or if are but the remainder is a power of 2 and less than the
// size of a register

SingleTypeRegSet regMask = BuildApxIncompatibleGPRMask(blkNode, availableIntRegs, true);
// SingleTypeRegSet regMask = BuildApxIncompatibleGPRMask(blkNode, availableIntRegs, true);
SingleTypeRegSet regMask = availableIntRegs;
At some of these places, don't we need a `useEvex` check, the way it is done at other places?
@@ -1542,7 +1529,8 @@ int LinearScan::BuildBlockStore(GenTreeBlk* blkNode)
// or if are but the remainder is a power of 2 and less than the
// size of a register

SingleTypeRegSet regMask = BuildApxIncompatibleGPRMask(blkNode, availableIntRegs, true);
// SingleTypeRegSet regMask = BuildApxIncompatibleGPRMask(blkNode, availableIntRegs, true);
delete the comment
BuildUse(srcAddrOrFill, BuildApxIncompatibleGPRMask(srcAddrOrFill, srcRegMask));
if (useEvex)
{
BuildUse(srcAddrOrFill, srcRegMask);
The `useEvex` check is scattered throughout the code. Is it possible to create a helper method, or update `BuildApxIncompatibleGPRMask`, so that it takes the original mask? It would check whether `useEvex == true` and, if so, return that mask as is; otherwise it would apply `BuildApxIncompatibleGPRMask()` to it to get the right mask.
Just noticed you already have `BuildApxIncompatibleGPRMaskIfNeeded`. Can we reuse it at other places as well?
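The helper shape described in this comment can be sketched as follows. This is not the JIT's `BuildApxIncompatibleGPRMaskIfNeeded`: the names, the 32-bit register layout, and the treatment of an empty candidate set as "any low GPR" are all assumptions made for illustration.

```cpp
#include <cstdint>

// Hypothetical register set: one bit per integer register.
using RegSet = std::uint64_t;

constexpr RegSet kNone     = 0;             // stands in for RBM_NONE
constexpr RegSet kLowGprs  = 0x0000FFFFull; // bits 0-15: legacy GPRs (assumed layout)
constexpr RegSet kHighGprs = 0xFFFF0000ull; // bits 16-31: EGPRs r16-r31 (assumed layout)

// Unconditional masking: drop the EGPRs from the candidate set.
// Assumption: an empty set means "no preference", so it becomes all low GPRs.
constexpr RegSet maskIncompatibleGprs(RegSet candidates)
{
    return (candidates == kNone) ? kLowGprs : (candidates & ~kHighGprs);
}

// Conditional wrapper: return the mask untouched when the extended encoding
// is usable, otherwise fall back to the unconditional masking above.
constexpr RegSet maskIncompatibleGprsIfNeeded(RegSet candidates, bool useEvex)
{
    return useEvex ? candidates : maskIncompatibleGprs(candidates);
}

static_assert(maskIncompatibleGprsIfNeeded(0x00030003ull, true) == 0x00030003ull,
              "mask is returned as is when useEvex is true");
static_assert(maskIncompatibleGprsIfNeeded(0x00030003ull, false) == 0x00000003ull,
              "EGPR bits are stripped when useEvex is false");
```

Routing every call site through a wrapper like this keeps the `useEvex` decision in one place instead of repeating an `if (useEvex)` branch around each `BuildUse` call.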
GenTree* user = nullptr;

if (LIR::AsRange(blockSequence[curBBSeqNum]).TryGetUse(intrinsicTree, &use))
// TODO-XArch-APX: some of the permute intrinsics are APX-EVEX compatible, we need to separate and
Do we need a GH issue for this?
assert(INS_imul_21 - INS_imul_AX == REG_R21);
assert(INS_imul_22 - INS_imul_AX == REG_R22);
assert(INS_imul_23 - INS_imul_AX == REG_R23);
// TODO-XArch-APX: The asserts below need the register definition from R24~R31.
We already have it, right?
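The asserts in this hunk check a layout invariant: the per-register `imul` instruction IDs sit in the same order as the register numbers, so subtracting the first ID yields the register. A reduced sketch with four registers, using hypothetical enum values (only the `INS_imul_*` and `REG_*` naming pattern comes from the diff), might look like this:

```cpp
// Hypothetical, reduced register numbering (standard x64 encoding order).
enum Reg : int
{
    REG_RAX = 0,
    REG_RCX = 1,
    REG_RDX = 2,
    REG_RBX = 3
};

// Hypothetical instruction IDs: the imul variants are declared contiguously,
// in the same order as the registers above. The starting value is arbitrary.
enum Ins : int
{
    INS_mov = 10,
    INS_imul_AX,
    INS_imul_CX,
    INS_imul_DX,
    INS_imul_BX
};

// Relies on the invariant: the variant for a register is a fixed offset away.
constexpr Ins imulForReg(Reg reg)
{
    return static_cast<Ins>(INS_imul_AX + reg);
}

// Compile-time versions of the runtime asserts shown in the hunk.
static_assert(INS_imul_CX - INS_imul_AX == REG_RCX, "imul_CX out of order");
static_assert(INS_imul_DX - INS_imul_AX == REG_RDX, "imul_DX out of order");
static_assert(INS_imul_BX - INS_imul_AX == REG_RBX, "imul_BX out of order");
```

Extending such checks to a further register requires both the register and the matching instruction variant to be defined, which is what the TODO in the hunk refers to.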
Can you also share the tpdiff numbers for a similar configuration? Also, I was hoping the asmdiff on CI would be zero diff, but I am still seeing differences. Do we know why?
Public issue: #114559
Description:
Currently in the JIT, we have enabled extended general-purpose registers (EGPRs) in most instructions via the REX2 or extended EVEX prefix. Previously, however, for the sake of testing stability, we masked out EGPRs for most of the GenTree nodes when building them in the register allocator (`LinearScan::BuildNode`). This PR reverts that masking and makes EGPRs actually available to those nodes.
In detail, we used to have masks in the following nodes, and this PR removes all of them:
BuildHWIntrinsic
BuildIntrinsic
BuildBlockStore
BuildCall
BuildShiftRotate
GT_INDEX_ADDR
GT_CKFINITE
GT_RETURNTRAP
GT_SWITCH_TABLE
GT_JMPTABLE
BuildMul
BuildIndir
BuildCast
genFnProlog
Results:
SuperPMI asmdiff:
latest main + APX on vs. latest main + changes + APX on