Unmasking EGPRs in register allocator #114867
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
ca02c0a to 2efec54
Pull Request Overview
This PR reverts the masking of extended GPRs (EGPRs) in the register allocator to allow their full usage in JIT code generation.
- Removed calls to BuildApxIncompatibleGPRMask in favor of unmasked register definitions when extended registers are available.
- Updated multiple LSRA routines and code emission paths to use the new useEvex/canUseApxRegs driven logic.
- Introduced the BuildApxIncompatibleGPRMaskIfNeeded helper to streamline candidate selection when EVEX support is available.
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
src/coreclr/jit/lsraxarch.cpp | Removed masking calls in node building and introduced useEvex checks. |
src/coreclr/jit/lsrabuild.cpp | Updated BuildBinaryUses and related routines to conditionally use APX masks. |
src/coreclr/jit/lsra.h | Added inline declaration for BuildApxIncompatibleGPRMaskIfNeeded. |
src/coreclr/jit/instrsxarch.h | Added Encoding_REX2 to imul instructions for extended registers. |
src/coreclr/jit/emitxarch.cpp | Adjusted movbe emission to select the APX variant based on register usage. |
src/coreclr/jit/codegenxarch.cpp | Updated code emission paths for movbe and internal register extraction. |
src/coreclr/jit/codegencommon.cpp | Modified function prologue to conditionally exclude high registers. |
@dotnet/intel for review
cc @kunalspathak @dotnet/jit-contrib
@kunalspathak for reviews
@@ -3879,7 +3882,7 @@ int LinearScan::BuildBinaryUses(GenTreeOp* node, SingleTypeRegSet candidates)
{

#ifdef TARGET_XARCH
if (op2->isContainedIndir() && varTypeUsesFloatReg(op1) && candidates == RBM_NONE)
Likewise here... can you double check if && !canUseApxRegs is the right check?
This is to make sure that if APX is not supported, we prune down the available candidates to only lowGPRRegs. @Ruihan-Yin to confirm.
canUseApxRegs checks whether both APX and EVEX are available; only with both can we guarantee that instructions will have the right encoding with EGPRs as operands. In some cases we don't care about EVEX, because we can be sure that none of the nodes in the build function will ever need EVEX, but in most cases we need both.
It is the same argument I made in #114867 (comment). Basically, we want to strictly use lowGprRegs if:
- (op2->isContainedIndir || ...) OR
- apx is not supported
Basically, shouldn't we have this?
- if (op1->isContainedIndir() && !getCanUseApxRegs())
+ if (op1->isContainedIndir() || !getCanUseApxRegs())
Hi Kunal,
I tested the above changes out. We just want to force lowGPRs when the following condition is true and we don't have APX: op1->isContainedIndir() && (candidates == RBM_NONE)
Hence the changes. Let me know if you have any questions. Other cases do not need to be forced to lowGPRs. I tested the changes on a superpmi replay with APX on / off.
I think Kunal is right.
Theoretically we have 3 cases:
Case 1: APX is not supported at all. We do not need to worry about it, since high GPRs don't come into play. In effect, candidates are limited to lowGPRs.
Case 2: APX is supported but EVEX is not. In this case, we need to restrict candidates to just lowGPRs.
Case 3: APX is supported along with EVEX. In this case, we do not need to do anything and can give LSRA access to all registers for this node.
I assume you merged case 1 and case 2 into the following condition:
if (op2->isContainedIndir() && (candidates == RBM_NONE) && !getCanUseApxRegs())
This means that if op2->isContainedIndir() && (candidates == RBM_NONE) is true, we can't give access to eGPRs unless APX AND EVEX are available.
In code, if getCanUseApxRegs() == false, one of isApxSupported or evexIsSupported is false.
1. If isApxSupported is false but evexIsSupported is true, it's okay to call BuildOperandUses(op1, candidates), since APX not being supported guarantees candidates will not include eGPRs.
2. If evexIsSupported is false but isApxSupported is true, candidates can possibly include eGPRs, but they should not be used, since the node might use an instruction that does not have eEVEX support.
Consider the case where evexIsSupported is false but isApxSupported is true and (candidates != RBM_NONE):
- Can this case happen?
- If it is possible, will it not break the code, since it might assign an eGPR to this node which cannot be encoded with eEVEX?
I think the original condition I added is a problem as well:
if (op1->isContainedIndir() && ((varTypeUsesFloatReg(node) || node->OperGet() == GT_BSWAP || node->OperGet() == GT_BSWAP16)) && candidates == RBM_NONE)
I made an assumption here that if candidates != RBM_NONE, it has already been handled to account for APX.
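The three cases discussed in this comment can be written down as a small decision function. This is only a sketch: `EgprPolicy` and `classify` are hypothetical names invented for illustration, and `canUseApxRegs` here merely mirrors the `apxIsSupported && evexIsSupported` definition quoted from the PR; none of this is the JIT's actual code.

```cpp
#include <cassert>

// Hypothetical enum naming the three cases from the review thread.
enum class EgprPolicy
{
    NoEgprsExist,      // Case 1: APX unsupported, candidates never contain eGPRs
    RestrictToLowGprs, // Case 2: APX supported, EVEX unsupported
    AllowAllGprs       // Case 3: APX and EVEX both supported
};

// Maps the two feature flags to the case that applies.
inline EgprPolicy classify(bool apxIsSupported, bool evexIsSupported)
{
    if (!apxIsSupported)
    {
        return EgprPolicy::NoEgprsExist;
    }
    if (!evexIsSupported)
    {
        return EgprPolicy::RestrictToLowGprs;
    }
    return EgprPolicy::AllowAllGprs;
}

// Mirrors the combined flag quoted in the PR: true only when both features are present.
inline bool canUseApxRegs(bool apxIsSupported, bool evexIsSupported)
{
    return apxIsSupported && evexIsSupported;
}
```

Under these definitions, `!canUseApxRegs(...)` is true for both case 1 and case 2, which is the merge the comment describes; only case 2 actually needs the candidates narrowed.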
@kunalspathak Do you think this makes sense here?
if (op2->isContainedIndir() && varTypeUsesFloatReg(op1) && !getEvexIsSupported())
{
    if (candidates == RBM_NONE)
    {
        candidates = lowGprRegs;
    }
    else
    {
        assert((candidates & ~rbmAllInt) == RBM_NONE);
        candidates &= lowGprRegs;
    }
}
Yes. @khushal1996, please also capture these comments in the method docs so we know all the scenarios that exist.
Hi @kunalspathak I have made the suggested changes after discussing with @DeepakRajendrakumaran.
@@ -3982,14 +3984,6 @@ void LinearScan::BuildStoreLocDef(GenTreeLclVarCommon* storeLoc,
defCandidates = allRegs(type);
#endif // TARGET_X86

#ifdef TARGET_AMD64
if (op1->isContained() && op1->OperIs(GT_BITCAST) && varTypeUsesIntReg(varDsc->GetRegisterType(storeLoc)))
So we don't care whether APX regs are available here and below for GT_BITCAST too?
@Ruihan-Yin can you please comment here.
This will be lowered to REX2 or eEVEX instructions, so we will need to check whether APX encodings are available in this case. I missed the SIMD load initially.
@@ -1542,7 +1529,8 @@ int LinearScan::BuildBlockStore(GenTreeBlk* blkNode)
// or if are but the remainder is a power of 2 and less than the
// size of a register

SingleTypeRegSet regMask = BuildApxIncompatibleGPRMask(blkNode, availableIntRegs, true);
// SingleTypeRegSet regMask = BuildApxIncompatibleGPRMask(blkNode, availableIntRegs, true);
SingleTypeRegSet regMask = availableIntRegs;
At some of these places, don't we need a useEvex check, the way it is done at other places?
Refactored some code in my latest commit.
src/coreclr/jit/lsraxarch.cpp
BuildUse(srcAddrOrFill, BuildApxIncompatibleGPRMask(srcAddrOrFill, srcRegMask));
if (useEvex)
{
BuildUse(srcAddrOrFill, srcRegMask);
The useEvex check is scattered throughout the code. Is it possible to create a helper method, or update BuildApxIncompatibleGPRMask, so that it takes the original mask? It would check whether useEvex == true and return that mask as it is; otherwise it would do the BuildApxIncompatibleGPRMask() operation on it to get the right mask.
Just noticed you already have BuildApxIncompatibleGPRMaskIfNeeded. Can we reuse it at other places as well?
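The helper shape requested here (take the original mask, return it untouched when EVEX is usable, otherwise narrow it) might look roughly like the following. It is a sketch under stated assumptions: `RegSet`, the bit layout of `LOW_GPRS` / `HIGH_GPRS`, and the function names are hypothetical stand-ins, not the JIT's `SingleTypeRegSet` or the real `BuildApxIncompatibleGPRMaskIfNeeded` signature.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register-set type and masks; the bit layout is illustrative only.
using RegSet = std::uint64_t;
constexpr RegSet RBM_NONE  = 0;
constexpr RegSet LOW_GPRS  = 0x0000FFFFull; // stand-in for the 16 legacy GPRs
constexpr RegSet HIGH_GPRS = 0xFFFF0000ull; // stand-in for the eGPRs r16-r31

// Unconditional narrowing step. RBM_NONE is treated as "no preference",
// so narrowing starts from everything that is available.
inline RegSet forceLowGprs(RegSet candidates, RegSet available)
{
    RegSet base = (candidates == RBM_NONE) ? available : candidates;
    return base & LOW_GPRS;
}

// Conditional wrapper: call sites pass their mask and a flag instead of
// repeating an `if (useEvex)` branch around every BuildUse-style call.
inline RegSet forceLowGprsIfNeeded(RegSet candidates, RegSet available, bool useEvex)
{
    return useEvex ? candidates : forceLowGprs(candidates, available);
}
```

A call site would then pass `forceLowGprsIfNeeded(mask, available, useEvex)` directly as the candidate argument, which keeps the EVEX decision in one place.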
Refactored some code to use BuildApxIncompatibleGPRMaskIfNeeded
src/coreclr/jit/lsraxarch.cpp
GenTree* user = nullptr;

if (LIR::AsRange(blockSequence[curBBSeqNum]).TryGetUse(intrinsicTree, &use))
// TODO-XArch-APX: some of the permute intrinsics are APX-EVEX compatible, we need to separate and
Do we need a GH issue for this?
Nope. This was a stale comment and has been removed in latest changes.
Can you also share the tpdiff numbers for a similar configuration? Also, I was hoping the asmdiff on CI would be zero diff, but I am still seeing differences. Do we know why?
src/coreclr/jit/lsraxarch.cpp
// NI_System_Math_Abs is the only one likely to use a GPR
op1RegCandidates = BuildApxIncompatibleGPRMask(op1, op1RegCandidates);
if (op1RegCandidates == RBM_NONE)
// op1RegCandidates = BuildApxIncompatibleGPRMask(op1, op1RegCandidates);
delete commented code
deleted.
src/coreclr/jit/lsraxarch.cpp
case NI_System_Math_Abs:
{
op1RegCandidates = BuildApxIncompatibleGPRMaskIfNeeded(op1, RBM_NONE, getCanUseApxRegs());
// getCanUseApxRegs() ? op1RegCandidates : BuildApxIncompatibleGPRMask(op1);
delete commented code
deleted.
I think this method is not named correctly. As far as I understand, all GPRs (low and high) work with APX; it is just that high GPRs (eGPRs) use REX2 encoding (correct me if I am wrong). So we are not really returning a register mask that is incompatible with APX, but just prefer to return
Refers to: src/coreclr/jit/lsraxarch.cpp:3378 in 34c693f.
Renaming it to
{
    assert((candidates & lowGprRegs) != RBM_NONE);
    srcCount += BuildOperandUses(op1, candidates & lowGprRegs);
}
Just FYI - @DeepakRajendrakumaran, the else part is not reached right now. Shall I change it to assert(false) so that anyone who reaches this part fails and learns about this handling?
I think it's fine as is. Will leave it up to @kunalspathak
src/coreclr/jit/codegencommon.cpp
// TODO-Xarch-apx : Revert. Excluding eGPR so that it's not used for non REX2 supported movs.
excludeMask = excludeMask | RBM_HIGHINT;
// we'd require eEVEX present to enable EGPRs in HWIntrinsics.
if (!compiler->canUseEvexEncoding() || !compiler->canUseApxEncoding())
Do we need the !compiler->canUseApxEncoding() part here?
I don't think so. I ran the replays and they look good. I have removed it in my latest changes.
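One plausible reading of why the APX half of the condition is redundant can be checked with a small model. The sketch below rests on an assumption that is not confirmed in this thread: that the eGPRs are part of the allocatable integer set only when APX is supported. `RegMask`, `allIntRegs`, and both `usable...` functions are hypothetical names, and the bit layout is illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the JIT's register masks.
using RegMask = std::uint64_t;
constexpr RegMask RBM_LOWINT  = 0x0000FFFFull; // legacy GPRs
constexpr RegMask RBM_HIGHINT = 0xFFFF0000ull; // eGPRs r16-r31

// Assumption: eGPRs are in the allocatable set only when APX is supported.
inline RegMask allIntRegs(bool apx)
{
    return apx ? (RBM_LOWINT | RBM_HIGHINT) : RBM_LOWINT;
}

// Condition as first posted: exclude eGPRs unless both EVEX and APX are usable.
inline RegMask usableOriginal(bool apx, bool evex)
{
    RegMask excludeMask = 0;
    if (!evex || !apx)
    {
        excludeMask |= RBM_HIGHINT;
    }
    return allIntRegs(apx) & ~excludeMask;
}

// Simplified condition: only the EVEX check remains.
inline RegMask usableSimplified(bool apx, bool evex)
{
    RegMask excludeMask = 0;
    if (!evex)
    {
        excludeMask |= RBM_HIGHINT;
    }
    return allIntRegs(apx) & ~excludeMask;
}
```

The two conditions differ only when EVEX is available and APX is not, and in that combination the model has no eGPRs to exclude, so both forms leave the same usable set.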
// Get the APXIncompatible register first
regNumber tmpReg2 = internalRegisters.Extract(treeNode);
// tmpReg1 can be EGPR
regNumber tmpReg1 = internalRegisters.Extract(treeNode);
Can you please explain this part step by step?
- What are we doing here? i.e., which node are we generating code for, and what instruction is being generated?
- What exactly changed w.r.t. CodeGen with this change?
- Why is regNumber tmpReg2 = internalRegisters.Extract(treeNode); guaranteed to be an APX-incompatible register?
src/coreclr/jit/lsra.cpp
isApxSupported = compiler->canUseApxEncoding();
if (isApxSupported)
apxIsSupported = compiler->canUseApxEncoding();
canUseApxRegs = apxIsSupported && evexIsSupported;
Is there a need for canUseApxRegs based on our discussions? Will an evexIsSupported check suffice?
It was used in some places which could do fine with EVEX checks. I have made the changes accordingly.
}
}

ins = needsEvex ? INS_movbe_apx : INS_movbe;
- ins = needsEvex ? INS_movbe_apx : INS_movbe;
+ instruction ins = needsEvex ? INS_movbe_apx : INS_movbe;
This is AMD64 code and hence ins is defined outside the #ifdef checks.
src/coreclr/jit/lsrabuild.cpp
// to restrict candidates to just lowGPRs
// Case 3: APX support exists with EVEX support. – In this case, we do not need
// to do anything. Can give LSRA access to all registers for this node
// Case 4: APX support without Evex support - candidates can possibly include
Aren't case 2 and case 4 the same?
src/coreclr/jit/lsraxarch.cpp
// updated register mask.
inline SingleTypeRegSet LinearScan::ForceLowGprForApxIfNeeded(GenTree* tree,
                                                              SingleTypeRegSet candidates,
                                                              bool UseApxRegs)
- bool UseApxRegs)
+ bool useApxRegs)
Made the changes in my latest commit.
added minor comments
src/coreclr/jit/lsraxarch.cpp
buildInternalIntRegisterDefForNode(cast, BuildApxIncompatibleGPRMask(cast, availableIntRegs, true));
buildInternalIntRegisterDefForNode(cast, BuildApxIncompatibleGPRMask(cast, availableIntRegs, true));
Related comments - #114867 (comment), #114867 (comment), #114867 (comment)
@DeepakRajendrakumaran @kunalspathak
Details about this change below:
- The above change generates 2 temporary registers. One of them can be an EGPR, whereas the other one cannot.
- Node affected - Cast nodes (Ulong -> float)
- Without the tmpReg1, tmpReg2 swap we would see this code as
buildInternalIntRegisterDefForNode(cast, BuildApxIncompatibleGPRMask(cast, availableIntRegs, true));
buildInternalIntRegisterDefForNode(cast, BuildApxIncompatibleGPRMask(cast, availableIntRegs, true));
Sample program -
[MethodImplAttribute(MethodImplOptions.NoInlining)]
public static float UlongToFloat(ulong val)
{
    return (float)val;
}
public static int Main(string[] args)
{
    Console.WriteLine(UlongToFloat(1500));
    return 0;
}
Disasm
G_M49684_IG02: ;; offset=0x0000
C5F857C0 vxorps xmm0, xmm0, xmm0
488BC1 mov rax, rcx
48D1E8 shr rax, 1
8BD1 mov edx, ecx
83E201 and edx, 1
480BD0 or rdx, rax
4885C9 test rcx, rcx
480F49D1 cmovns rdx, rcx
C4E1FA2AC2 vcvtsi2ss xmm0, rdx
In the above disasm, Ins_mov, Ins_shr and Ins_or can use EGPRs, and allowing an EGPR for one of the temp registers in RA enables that usage.
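For readers unfamiliar with the instruction sequence in the disassembly, the arithmetic it performs can be sketched in C++. This is an illustration of the general halve-with-sticky-bit technique for converting an unsigned 64-bit value through a signed conversion, not the JIT's code; the function name `ulongToFloat` and the branch structure are assumptions made for readability (the generated code uses cmovns plus a conditional doubling, as the asmdiffs further down show).

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the arithmetic behind the mov/shr/and/or/cmovns/cvtsi2ss sequence.
inline float ulongToFloat(std::uint64_t v)
{
    // Top bit clear: the value fits in a signed 64-bit integer, so the
    // signed conversion can be used directly (the cmovns path).
    if ((v >> 63) == 0)
    {
        return static_cast<float>(static_cast<std::int64_t>(v));
    }

    // Top bit set: halve the value, folding the dropped low bit back in as a
    // sticky bit so rounding is unaffected (shr, and, or in the disassembly).
    std::uint64_t halved = (v >> 1) | (v & 1);

    // The halved value fits in a signed integer; convert it and double the
    // result (the conditional add of xmm0 to itself after the conversion).
    float f = static_cast<float>(static_cast<std::int64_t>(halved));
    return f + f;
}
```

In the diffs that follow, the temporary holding the shifted value is the one that moves to an eGPR (r16, r17 or r21), while the register feeding the conversion stays a low GPR.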
The superpmi run below shows the asmdiffs for only this case.
Diffs are based on 25,335 contexts (13 MinOpts, 25,322 FullOpts).
Base JIT options: JitBypassApxCheck=1;JitStressRegs=4000
Diff JIT options: JitBypassApxCheck=1;JitStressRegs=4000
Overall (+57 bytes)
Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
---|---|---|---|
smoke_tests.nativeaot.windows.x64.checked.mch | 6,084,089 | +57 | 0.00% |
FullOpts (+57 bytes)
Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
---|---|---|---|
smoke_tests.nativeaot.windows.x64.checked.mch | 6,083,071 | +57 | 0.00% |
Example diffs
smoke_tests.nativeaot.windows.x64.checked.mch
+6 (+0.22%) : 7020.dasm - System.Array:CopyImplPrimitiveTypeWithWidening(System.Array,int,System.Array,int,int,ubyte) (FullOpts)
@@ -485,17 +485,17 @@ G_M25508_IG41: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=10000 {r16}, by
; byrRegs +[rbp]
mov rcx, qword ptr [rbp]
xorps xmm0, xmm0
- mov rax, rcx
- shr rax, 1
+ mov r21, rcx
+ shr r21, 1
mov edx, ecx
and edx, 1
- or rdx, rax
+ or rdx, r21
test rcx, rcx
cmovns rdx, rcx
cvtsi2ss xmm0, rdx
jns SHORT G_M25508_IG42
addss xmm0, xmm0
- ;; size=57 bbWeight=2 PerfScore 35.17
+ ;; size=60 bbWeight=2 PerfScore 35.17
G_M25508_IG42: ; bbWeight=2, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=10020 {rbp r16}, gcvars, byref
; GC ptr vars -{V11}
mov rbx, r16
@@ -512,17 +512,17 @@ G_M25508_IG43: ; bbWeight=2, gcVars=0000000000000001 {V11}, gcrefRegs=000
; byrRegs +[rbp]
mov rcx, qword ptr [rbp]
xorps xmm0, xmm0
- mov rax, rcx
- shr rax, 1
+ mov r21, rcx
+ shr r21, 1
mov edx, ecx
and edx, 1
- or rdx, rax
+ or rdx, r21
test rcx, rcx
cmovns rdx, rcx
cvtsi2sd xmm0, rdx
jns SHORT G_M25508_IG44
addsd xmm0, xmm0
- ;; size=51 bbWeight=2 PerfScore 33.17
+ ;; size=54 bbWeight=2 PerfScore 33.17
G_M25508_IG44: ; bbWeight=2, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=10020 {rbp r16}, gcvars, byref
; GC ptr vars -{V00 V11}
mov rbx, r16
@@ -1207,7 +1207,7 @@ RWD144 dd G_M25508_IG80 - G_M25508_IG02
dd G_M25508_IG90 - G_M25508_IG02
-; Total bytes of code 2723, prolog size 47, PerfScore 1131.45, instruction count 588, allocated bytes for code 2723 (MethodHash=b09f9c5b) for method System.Array:CopyImplPrimitiveTypeWithWidening(System.Array,int,System.Array,int,int,ubyte) (FullOpts)
+; Total bytes of code 2729, prolog size 47, PerfScore 1131.45, instruction count 588, allocated bytes for code 2729 (MethodHash=b09f9c5b) for method System.Array:CopyImplPrimitiveTypeWithWidening(System.Array,int,System.Array,int,int,ubyte) (FullOpts)
; ============================================================
Unwind Info:
+6 (+0.28%) : 18129.dasm - System.InvokeUtils:ConvertOrWidenPrimitivesEnumsAndPointersIfPossible(System.Object,ulong,int,byref):System.Exception (FullOpts)
@@ -632,19 +632,20 @@ G_M48359_IG49: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=10000 {r16},
; byrRegs +[rbx]
mov rcx, qword ptr [rbx]
xorps xmm0, xmm0
- mov rax, rcx
- shr rax, 1
+ mov r16, rcx
+ ; byrRegs -[r16]
+ shr r16, 1
mov edx, ecx
and edx, 1
- or rdx, rax
+ or rdx, r16
test rcx, rcx
cmovns rdx, rcx
cvtsi2ss xmm0, rdx
jns SHORT G_M48359_IG50
addss xmm0, xmm0
- ;; size=47 bbWeight=0.50 PerfScore 7.92
+ ;; size=50 bbWeight=0.50 PerfScore 7.92
G_M48359_IG50: ; bbWeight=0.50, gcVars=0000000000000001 {V10}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref
- ; byrRegs -[rbx r16]
+ ; byrRegs -[rbx]
; GC ptr vars -{V00 V01 V26}
mov rbp, bword ptr [rsp+0x28]
; byrRegs +[rbp]
@@ -660,19 +661,20 @@ G_M48359_IG51: ; bbWeight=0.50, gcVars=0000000000000003 {V10 V26}, gcrefR
; byrRegs +[rbx]
mov rcx, qword ptr [rbx]
xorps xmm0, xmm0
- mov rax, rcx
- shr rax, 1
+ mov r16, rcx
+ ; byrRegs -[r16]
+ shr r16, 1
mov edx, ecx
and edx, 1
- or rdx, rax
+ or rdx, r16
test rcx, rcx
cmovns rdx, rcx
cvtsi2sd xmm0, rdx
jns SHORT G_M48359_IG52
addsd xmm0, xmm0
- ;; size=51 bbWeight=0.50 PerfScore 7.92
+ ;; size=54 bbWeight=0.50 PerfScore 7.92
G_M48359_IG52: ; bbWeight=0.50, gcVars=0000000000000001 {V10}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref
- ; byrRegs -[rbx r16]
+ ; byrRegs -[rbx]
; GC ptr vars -{V01 V26}
mov rbp, bword ptr [rsp+0x28]
; byrRegs +[rbp]
@@ -966,7 +968,7 @@ RWD176 dd G_M48359_IG72 - G_M48359_IG02
dd G_M48359_IG70 - G_M48359_IG02
-; Total bytes of code 2132, prolog size 13, PerfScore 284.20, instruction count 470, allocated bytes for code 2132 (MethodHash=15234318) for method System.InvokeUtils:ConvertOrWidenPrimitivesEnumsAndPointersIfPossible(System.Object,ulong,int,byref):System.Exception (FullOpts)
+; Total bytes of code 2138, prolog size 13, PerfScore 284.20, instruction count 470, allocated bytes for code 2138 (MethodHash=15234318) for method System.InvokeUtils:ConvertOrWidenPrimitivesEnumsAndPointersIfPossible(System.Object,ulong,int,byref):System.Exception (FullOpts)
; ============================================================
Unwind Info:
+3 (+0.58%) : 16035.dasm - System.Number:NumberToFloatingPointBits[double](byref):ulong (FullOpts)
@@ -203,17 +203,17 @@ G_M60445_IG20: ; bbWeight=0.50, gcVars=0000000000000000 {}, gcrefRegs=000
; byrRegs -[rbp]
mov r8, r16
vxorps xmm0, xmm0, xmm0
- mov rax, r8
- shr rax, 1
- mov ecx, r8d
- and ecx, 1
- or rcx, rax
+ mov r17, r8
+ shr r17, 1
+ mov eax, r8d
+ and eax, 1
+ or rax, r17
test r8, r8
- cmovns rcx, r8
- vcvtsi2sd xmm0, rcx
+ cmovns rax, r8
+ vcvtsi2sd xmm0, rax
jns SHORT G_M60445_IG21
vaddsd xmm0, xmm0
- ;; size=41 bbWeight=0.50 PerfScore 6.29
+ ;; size=44 bbWeight=0.50 PerfScore 6.29
G_M60445_IG21: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
cmp r20, 23
jae SHORT G_M60445_IG26
@@ -249,7 +249,7 @@ G_M60445_IG26: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
int3
;; size=6 bbWeight=0 PerfScore 0.00
-; Total bytes of code 515, prolog size 8, PerfScore 72.42, instruction count 119, allocated bytes for code 515 (MethodHash=388813e2) for method System.Number:NumberToFloatingPointBits[double](byref):ulong (FullOpts)
+; Total bytes of code 518, prolog size 8, PerfScore 72.42, instruction count 119, allocated bytes for code 518 (MethodHash=388813e2) for method System.Number:NumberToFloatingPointBits[double](byref):ulong (FullOpts)
; ============================================================
Unwind Info:
+6 (+1.13%) : 22054.dasm - System.Linq.Expressions.Interpreter.NumericConvertInstruction+Unchecked:ConvertUInt64(ulong):System.Object:this (FullOpts)
@@ -81,19 +81,19 @@ G_M42680_IG05: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
mov r16, rax
; gcrRegs +[r16]
xorps xmm0, xmm0
- mov rax, rbp
- ; gcrRegs -[rax]
- shr rax, 1
+ mov r17, rbp
+ shr r17, 1
mov ecx, ebp
and ecx, 1
- or rcx, rax
+ or rcx, r17
test rbp, rbp
cmovns rcx, rbp
cvtsi2sd xmm0, rcx
jns SHORT G_M42680_IG06
addsd xmm0, xmm0
- ;; size=51 bbWeight=0.50 PerfScore 7.04
+ ;; size=54 bbWeight=0.50 PerfScore 7.04
G_M42680_IG06: ; bbWeight=0.50, gcrefRegs=10000 {r16}, byrefRegs=0000 {}, byref
+ ; gcrRegs -[rax]
mov rcx, r16
; gcrRegs +[rcx]
movsd qword ptr [rcx+0x08], xmm0
@@ -108,19 +108,19 @@ G_M42680_IG07: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
mov r16, rax
; gcrRegs +[r16]
xorps xmm0, xmm0
- mov rax, rbp
- ; gcrRegs -[rax]
- shr rax, 1
+ mov r17, rbp
+ shr r17, 1
mov ecx, ebp
and ecx, 1
- or rcx, rax
+ or rcx, r17
test rbp, rbp
cmovns rcx, rbp
cvtsi2ss xmm0, rcx
jns SHORT G_M42680_IG08
addss xmm0, xmm0
- ;; size=51 bbWeight=0.50 PerfScore 7.04
+ ;; size=54 bbWeight=0.50 PerfScore 7.04
G_M42680_IG08: ; bbWeight=0.50, gcrefRegs=10000 {r16}, byrefRegs=0000 {}, byref
+ ; gcrRegs -[rax]
mov rcx, r16
; gcrRegs +[rcx]
movss dword ptr [rcx+0x08], xmm0
@@ -288,7 +288,7 @@ RWD00 dd G_M42680_IG17 - G_M42680_IG02
dd G_M42680_IG04 - G_M42680_IG02
-; Total bytes of code 531, prolog size 5, PerfScore 55.62, instruction count 123, allocated bytes for code 531 (MethodHash=52025947) for method System.Linq.Expressions.Interpreter.NumericConvertInstruction+Unchecked:ConvertUInt64(ulong):System.Object:this (FullOpts)
+; Total bytes of code 537, prolog size 5, PerfScore 55.62, instruction count 123, allocated bytes for code 537 (MethodHash=52025947) for method System.Linq.Expressions.Interpreter.NumericConvertInstruction+Unchecked:ConvertUInt64(ulong):System.Object:this (FullOpts)
; ============================================================
Unwind Info:
+6 (+0.89%) : 22059.dasm - System.Linq.Expressions.Interpreter.NumericConvertInstruction+Checked:ConvertUInt64(ulong):System.Object:this (FullOpts)
@@ -81,19 +81,19 @@ G_M29411_IG05: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
mov r16, rax
; gcrRegs +[r16]
xorps xmm0, xmm0
- mov rax, rbp
- ; gcrRegs -[rax]
- shr rax, 1
+ mov r17, rbp
+ shr r17, 1
mov ecx, ebp
and ecx, 1
- or rcx, rax
+ or rcx, r17
test rbp, rbp
cmovns rcx, rbp
cvtsi2sd xmm0, rcx
jns SHORT G_M29411_IG06
addsd xmm0, xmm0
- ;; size=51 bbWeight=0.50 PerfScore 7.04
+ ;; size=54 bbWeight=0.50 PerfScore 7.04
G_M29411_IG06: ; bbWeight=0.50, gcrefRegs=10000 {r16}, byrefRegs=0000 {}, byref
+ ; gcrRegs -[rax]
mov rcx, r16
; gcrRegs +[rcx]
movsd qword ptr [rcx+0x08], xmm0
@@ -108,19 +108,19 @@ G_M29411_IG07: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
mov r16, rax
; gcrRegs +[r16]
xorps xmm0, xmm0
- mov rax, rbp
- ; gcrRegs -[rax]
- shr rax, 1
+ mov r17, rbp
+ shr r17, 1
mov ecx, ebp
and ecx, 1
- or rcx, rax
+ or rcx, r17
test rbp, rbp
cmovns rcx, rbp
cvtsi2ss xmm0, rcx
jns SHORT G_M29411_IG08
addss xmm0, xmm0
- ;; size=51 bbWeight=0.50 PerfScore 7.04
+ ;; size=54 bbWeight=0.50 PerfScore 7.04
G_M29411_IG08: ; bbWeight=0.50, gcrefRegs=10000 {r16}, byrefRegs=0000 {}, byref
+ ; gcrRegs -[rax]
mov rcx, r16
; gcrRegs +[rcx]
movss dword ptr [rcx+0x08], xmm0
@@ -318,7 +318,7 @@ RWD00 dd G_M29411_IG17 - G_M29411_IG02
dd G_M29411_IG04 - G_M29411_IG02
-; Total bytes of code 676, prolog size 5, PerfScore 61.87, instruction count 150, allocated bytes for code 676 (MethodHash=9b428d1c) for method System.Linq.Expressions.Interpreter.NumericConvertInstruction+Checked:ConvertUInt64(ulong):System.Object:this (FullOpts)
+; Total bytes of code 682, prolog size 5, PerfScore 61.87, instruction count 150, allocated bytes for code 682 (MethodHash=9b428d1c) for method System.Linq.Expressions.Interpreter.NumericConvertInstruction+Checked:ConvertUInt64(ulong):System.Object:this (FullOpts)
; ============================================================
Unwind Info:
+3 (+0.58%) : 2277.dasm - System.Number:NumberToFloatingPointBits[double](byref):ulong (FullOpts)
@@ -203,17 +203,17 @@ G_M60445_IG20: ; bbWeight=0.50, gcVars=0000000000000000 {}, gcrefRegs=000
; byrRegs -[rbp]
mov r8, r16
xorps xmm0, xmm0
- mov rax, r8
- shr rax, 1
- mov ecx, r8d
- and ecx, 1
- or rcx, rax
+ mov r17, r8
+ shr r17, 1
+ mov eax, r8d
+ and eax, 1
+ or rax, r17
test r8, r8
- cmovns rcx, r8
- cvtsi2sd xmm0, rcx
+ cmovns rax, r8
+ cvtsi2sd xmm0, rax
jns SHORT G_M60445_IG21
addsd xmm0, xmm0
- ;; size=40 bbWeight=0.50 PerfScore 6.29
+ ;; size=43 bbWeight=0.50 PerfScore 6.29
G_M60445_IG21: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
cmp r20, 23
jae SHORT G_M60445_IG26
@@ -249,7 +249,7 @@ G_M60445_IG26: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
int3
;; size=6 bbWeight=0 PerfScore 0.00
-; Total bytes of code 514, prolog size 8, PerfScore 72.42, instruction count 119, allocated bytes for code 514 (MethodHash=388813e2) for method System.Number:NumberToFloatingPointBits[double](byref):ulong (FullOpts)
+; Total bytes of code 517, prolog size 8, PerfScore 72.42, instruction count 119, allocated bytes for code 517 (MethodHash=388813e2) for method System.Number:NumberToFloatingPointBits[double](byref):ulong (FullOpts)
; ============================================================
Unwind Info:
Details
Size improvements/regressions per collection
Collection | Contexts with diffs | Improvements | Regressions | Same size | Improvements (bytes) | Regressions (bytes) |
---|---|---|---|---|---|---|
smoke_tests.nativeaot.windows.x64.checked.mch | 11 | 0 | 11 | 0 | -0 | +57 |
PerfScore improvements/regressions per collection
Collection | Contexts with diffs | Improvements | Regressions | Same PerfScore | Improvements (PerfScore) | Regressions (PerfScore) | PerfScore Overall in FullOpts |
---|---|---|---|---|---|---|---|
smoke_tests.nativeaot.windows.x64.checked.mch | 11 | 0 | 0 | 11 | 0.00% | 0.00% | 0.0000% |
Context information
Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
---|---|---|---|---|---|
smoke_tests.nativeaot.windows.x64.checked.mch | 25,335 | 13 | 25,322 | 0 (0.00%) | 0 (0.00%) |
jit-analyze output
SuperPMI diff with just APX on:
Diffs are based on 25,335 contexts (13 MinOpts, 25,322 FullOpts).
Base JIT options: JitBypassApxCheck=1
Diff JIT options: JitBypassApxCheck=1
Overall (-20 bytes)
Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
---|---|---|---|
smoke_tests.nativeaot.windows.x64.checked.mch | 4,308,719 | -20 | 0.00% |
FullOpts (-20 bytes)
Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
---|---|---|---|
smoke_tests.nativeaot.windows.x64.checked.mch | 4,307,701 | -20 | 0.00% |
Example diffs
smoke_tests.nativeaot.windows.x64.checked.mch
-4 (-0.25%) : 1229.dasm - System.Array:CopyImplPrimitiveTypeWithWidening(System.Array,int,System.Array,int,int,ubyte) (FullOpts)
@@ -473,17 +473,17 @@ G_M25508_IG43: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=0048 {rbx rsi},
jne SHORT G_M25508_IG45
mov rcx, qword ptr [rbx]
xorps xmm0, xmm0
- mov rdx, rcx
- shr rdx, 1
- mov r8d, ecx
- and r8d, 1
- or r8, rdx
+ mov r8, rcx
+ shr r8, 1
+ mov edx, ecx
+ and edx, 1
+ or rdx, r8
test rcx, rcx
- cmovns r8, rcx
- cvtsi2ss xmm0, r8
+ cmovns rdx, rcx
+ cvtsi2ss xmm0, rdx
jns SHORT G_M25508_IG44
addss xmm0, xmm0
- ;; size=45 bbWeight=2 PerfScore 31.17
+ ;; size=43 bbWeight=2 PerfScore 31.17
G_M25508_IG44: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=0048 {rbx rsi}, byref
movss dword ptr [rsi], xmm0
jmp G_M25508_IG35
@@ -493,17 +493,17 @@ G_M25508_IG45: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=0048 {rbx rsi},
jne G_M25508_IG30
mov rcx, qword ptr [rbx]
xorps xmm0, xmm0
- mov rdx, rcx
- shr rdx, 1
- mov r8d, ecx
- and r8d, 1
- or r8, rdx
+ mov r8, rcx
+ shr r8, 1
+ mov edx, ecx
+ and edx, 1
+ or rdx, r8
test rcx, rcx
- cmovns r8, rcx
- cvtsi2sd xmm0, r8
+ cmovns rdx, rcx
+ cvtsi2sd xmm0, rdx
jns SHORT G_M25508_IG46
addsd xmm0, xmm0
- ;; size=49 bbWeight=2 PerfScore 31.17
+ ;; size=47 bbWeight=2 PerfScore 31.17
G_M25508_IG46: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=0048 {rbx rsi}, byref
movsd qword ptr [rsi], xmm0
jmp G_M25508_IG35
@@ -792,7 +792,7 @@ RWD176 dd G_M25508_IG66 - G_M25508_IG02
dd G_M25508_IG64 - G_M25508_IG02
-; Total bytes of code 1573, prolog size 35, PerfScore 895.89, instruction count 429, allocated bytes for code 1573 (MethodHash=b09f9c5b) for method System.Array:CopyImplPrimitiveTypeWithWidening(System.Array,int,System.Array,int,int,ubyte) (FullOpts)
+; Total bytes of code 1569, prolog size 35, PerfScore 895.89, instruction count 429, allocated bytes for code 1569 (MethodHash=b09f9c5b) for method System.Array:CopyImplPrimitiveTypeWithWidening(System.Array,int,System.Array,int,int,ubyte) (FullOpts)
; ============================================================
Unwind Info:
+0 (0.00%) : 16035.dasm - System.Number:NumberToFloatingPointBits[double](byref):ulong (FullOpts)
@@ -194,14 +194,14 @@ G_M60445_IG17: ; bbWeight=0.50, epilog, nogc, extend
;; size=11 bbWeight=0.50 PerfScore 1.88
G_M60445_IG18: ; bbWeight=0.50, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref, isz
vxorps xmm0, xmm0, xmm0
- mov rcx, rax
- shr rcx, 1
- mov edx, eax
- and edx, 1
- or rdx, rcx
+ mov rdx, rax
+ shr rdx, 1
+ mov ecx, eax
+ and ecx, 1
+ or rcx, rdx
test rax, rax
- cmovns rdx, rax
- vcvtsi2sd xmm0, rdx
+ cmovns rcx, rax
+ vcvtsi2sd xmm0, rcx
jns SHORT G_M60445_IG19
vaddsd xmm0, xmm0
;; size=36 bbWeight=0.50 PerfScore 6.17
+0 (0.00%) : 22054.dasm - System.Linq.Expressions.Interpreter.NumericConvertInstruction+Unchecked:ConvertUInt64(ulong):System.Object:this (FullOpts)
@@ -75,14 +75,14 @@ G_M42680_IG05: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
; gcrRegs +[rax]
; gcr arg pop 0
xorps xmm0, xmm0
- mov rcx, rbx
- shr rcx, 1
- mov edx, ebx
- and edx, 1
- or rdx, rcx
+ mov rdx, rbx
+ shr rdx, 1
+ mov ecx, ebx
+ and ecx, 1
+ or rcx, rdx
test rbx, rbx
- cmovns rdx, rbx
- cvtsi2sd xmm0, rdx
+ cmovns rcx, rbx
+ cvtsi2sd xmm0, rcx
jns SHORT G_M42680_IG06
addsd xmm0, xmm0
;; size=47 bbWeight=0.50 PerfScore 6.92
@@ -97,14 +97,14 @@ G_M42680_IG07: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
; gcrRegs +[rax]
; gcr arg pop 0
xorps xmm0, xmm0
- mov rcx, rbx
- shr rcx, 1
- mov edx, ebx
- and edx, 1
- or rdx, rcx
+ mov rdx, rbx
+ shr rdx, 1
+ mov ecx, ebx
+ and ecx, 1
+ or rcx, rdx
test rbx, rbx
- cmovns rdx, rbx
- cvtsi2ss xmm0, rdx
+ cmovns rcx, rbx
+ cvtsi2ss xmm0, rcx
jns SHORT G_M42680_IG08
addss xmm0, xmm0
;; size=47 bbWeight=0.50 PerfScore 6.92
+0 (0.00%) : 18129.dasm - System.InvokeUtils:ConvertOrWidenPrimitivesEnumsAndPointersIfPossible(System.Object,ulong,int,byref):System.Exception (FullOpts)
@@ -463,14 +463,14 @@ G_M48359_IG36: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0088 {rbx rd
jne SHORT G_M48359_IG38
mov rcx, qword ptr [rdi]
xorps xmm0, xmm0
- mov rax, rcx
- shr rax, 1
- mov edx, ecx
- and edx, 1
- or rdx, rax
+ mov rdx, rcx
+ shr rdx, 1
+ mov eax, ecx
+ and eax, 1
+ or rax, rdx
test rcx, rcx
- cmovns rdx, rcx
- cvtsi2ss xmm0, rdx
+ cmovns rax, rcx
+ cvtsi2ss xmm0, rax
jns SHORT G_M48359_IG37
addss xmm0, xmm0
;; size=44 bbWeight=0.50 PerfScore 7.79
@@ -485,14 +485,14 @@ G_M48359_IG38: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0088 {rbx rd
jne G_M48359_IG14
mov rcx, qword ptr [rdi]
xorps xmm0, xmm0
- mov rax, rcx
- shr rax, 1
- mov edx, ecx
- and edx, 1
- or rdx, rax
+ mov rdx, rcx
+ shr rdx, 1
+ mov eax, ecx
+ and eax, 1
+ or rax, rdx
test rcx, rcx
- cmovns rdx, rcx
- cvtsi2sd xmm0, rdx
+ cmovns rax, rcx
+ cvtsi2sd xmm0, rax
jns SHORT G_M48359_IG39
addsd xmm0, xmm0
;; size=48 bbWeight=0.50 PerfScore 7.79
+0 (0.00%) : 22059.dasm - System.Linq.Expressions.Interpreter.NumericConvertInstruction+Checked:ConvertUInt64(ulong):System.Object:this (FullOpts)
@@ -75,14 +75,14 @@ G_M29411_IG05: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
; gcrRegs +[rax]
; gcr arg pop 0
xorps xmm0, xmm0
- mov rcx, rbx
- shr rcx, 1
- mov edx, ebx
- and edx, 1
- or rdx, rcx
+ mov rdx, rbx
+ shr rdx, 1
+ mov ecx, ebx
+ and ecx, 1
+ or rcx, rdx
test rbx, rbx
- cmovns rdx, rbx
- cvtsi2sd xmm0, rdx
+ cmovns rcx, rbx
+ cvtsi2sd xmm0, rcx
jns SHORT G_M29411_IG06
addsd xmm0, xmm0
;; size=47 bbWeight=0.50 PerfScore 6.92
@@ -97,14 +97,14 @@ G_M29411_IG07: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
; gcrRegs +[rax]
; gcr arg pop 0
xorps xmm0, xmm0
- mov rcx, rbx
- shr rcx, 1
- mov edx, ebx
- and edx, 1
- or rdx, rcx
+ mov rdx, rbx
+ shr rdx, 1
+ mov ecx, ebx
+ and ecx, 1
+ or rcx, rdx
test rbx, rbx
- cmovns rdx, rbx
- cvtsi2ss xmm0, rdx
+ cmovns rcx, rbx
+ cvtsi2ss xmm0, rcx
jns SHORT G_M29411_IG08
addss xmm0, xmm0
;; size=47 bbWeight=0.50 PerfScore 6.92
+0 (0.00%) : 2277.dasm - System.Number:NumberToFloatingPointBits[double](byref):ulong (FullOpts)
@@ -194,14 +194,14 @@ G_M60445_IG17: ; bbWeight=0.50, epilog, nogc, extend
;; size=11 bbWeight=0.50 PerfScore 1.88
G_M60445_IG18: ; bbWeight=0.50, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref, isz
xorps xmm0, xmm0
- mov rcx, rax
- shr rcx, 1
- mov edx, eax
- and edx, 1
- or rdx, rcx
+ mov rdx, rax
+ shr rdx, 1
+ mov ecx, eax
+ and ecx, 1
+ or rcx, rdx
test rax, rax
- cmovns rdx, rax
- cvtsi2sd xmm0, rdx
+ cmovns rcx, rax
+ cvtsi2sd xmm0, rcx
jns SHORT G_M60445_IG19
addsd xmm0, xmm0
;; size=35 bbWeight=0.50 PerfScore 6.17
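For readers unfamiliar with the sequence that every hunk above touches, it appears to be the usual unsigned 64-bit to floating-point conversion built on a signed convert instruction. The sketch below is a plain C++ rendering of that pattern as read from the disassembly. The function name is hypothetical, and it assumes the default round-to-nearest mode.

```cpp
#include <cassert>
#include <cstdint>

// Convert an unsigned 64-bit value to double using only a signed conversion,
// mirroring the shr/and/or/cmovns/cvtsi2sd/addsd shape seen in the diffs.
double ulongToDouble(std::uint64_t x) {
    // High bit clear: the value fits in a signed 64-bit integer (the "jns" path).
    if ((x >> 63) == 0) {
        return static_cast<double>(static_cast<std::int64_t>(x));
    }
    // High bit set: halve the value, keeping the dropped low bit as a sticky
    // bit so that rounding still goes the right way (shr, and 1, or).
    std::uint64_t halved = (x >> 1) | (x & 1);
    assert((halved >> 63) == 0); // now representable as a signed value
    double d = static_cast<double>(static_cast<std::int64_t>(halved));
    // Doubling is exact and undoes the halving (the addsd xmm0, xmm0 step).
    return d + d;
}
```

The diffs change only which registers hold the shifted copy and the sticky-bit result. The computation itself is the same in base and diff.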
Details
Size improvements/regressions per collection
Collection | Contexts with diffs | Improvements | Regressions | Same size | Improvements (bytes) | Regressions (bytes) |
---|---|---|---|---|---|---|
smoke_tests.nativeaot.windows.x64.checked.mch | 11 | 5 | 0 | 6 | -20 | +0 |
PerfScore improvements/regressions per collection
Collection | Contexts with diffs | Improvements | Regressions | Same PerfScore | Improvements (PerfScore) | Regressions (PerfScore) | PerfScore Overall in FullOpts |
---|---|---|---|---|---|---|---|
smoke_tests.nativeaot.windows.x64.checked.mch | 11 | 0 | 0 | 11 | 0.00% | 0.00% | 0.0000% |
Context information
Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
---|---|---|---|---|---|
smoke_tests.nativeaot.windows.x64.checked.mch | 25,335 | 13 | 25,322 | 0 (0.00%) | 0 (0.00%) |
jit-analyze output
I have made all the necessary changes. Let me know if you have any other concerns. For testing -
LGTM
/ba-g unrelated failures
Public issue: #114559
Description:
Currently in the JIT, we have enabled extended general-purpose registers (EGPRs) in most instructions via the REX2 or extended EVEX prefix. Previously, for testing stability, we masked out EGPRs for most GenTree nodes when building them in the register allocator (LinearScan::BuildNode). This PR reverts that masking and makes EGPRs actually available to those nodes.
In detail, the following nodes used to have masks, and this PR removes all of them:
BuildHWIntrinsic
BuildIntrinsic
BuildBlockStore
BuildCall
BuildShiftRotate
GT_INDEX_ADDR
GT_CKFINITE
GT_RETURNTRAP
GT_SWITCH_TABLE
GT_JMPTABLE
BuildMul
BuildIndir
BuildCast
genFnProlog
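To illustrate what masking and unmasking mean here, the following is a minimal sketch, not the JIT's code. The mask layout, constants, and function names are all hypothetical and do not mirror the real LSRA helpers. It only shows the idea of stripping r16..r31 from a candidate set unconditionally (old behaviour) versus only when the instruction cannot encode them.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: bit i stands for general-purpose register i.
using RegMask = std::uint64_t;

const RegMask kLegacyGprs = 0x0000FFFFull; // rax..r15
const RegMask kEgprs      = 0xFFFF0000ull; // r16..r31
const RegMask kAllGprs    = kLegacyGprs | kEgprs;

// Old behaviour in this sketch: EGPRs are always removed from the candidates.
RegMask maskedCandidates(RegMask candidates) {
    assert((candidates & ~kAllGprs) == 0); // only GPR bits are modelled
    return candidates & ~kEgprs;
}

// New behaviour in this sketch: EGPRs stay available unless the instruction
// being built has no encoding that can reach them.
RegMask candidatesFor(RegMask candidates, bool canEncodeEgprs) {
    assert((candidates & ~kAllGprs) == 0);
    return canEncodeEgprs ? candidates : (candidates & ~kEgprs);
}
```

With the mask removed, the allocator in this model can hand out any of the 32 registers for a node whose instruction supports them, which is why registers such as r17 start to appear in the diffs.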
Results:
SuperPMI asmdiff:
latest main + APX on vs. latest main + changes + APX on