
Unmasking EGPRs in register allocator #114867

Merged (31 commits) on May 27, 2025

@khushal1996 (Member) commented Apr 21, 2025

Public issue: #114559

Description:
Currently, in the JIT, we have enabled extended general-purpose registers (EGPRs) in most instructions via the REX2 or extended EVEX prefix. Previously, however, we masked out EGPRs for most GenTree nodes when building them in the register allocator (LinearScan::BuildNode), for the sake of testing stability. This PR reverts that masking and makes EGPRs actually available to those nodes.

Specifically, the following places used to apply these masks, and this PR removes all of them:

  • BuildHWIntrinsic
  • BuildIntrinsic
  • BuildBlockStore
  • BuildCall
  • BuildShiftRotate
  • GT_INDEX_ADDR
  • GT_CKFINITE
  • GT_RETURNTRAP
  • GT_SWITCH_TABLE
  • GT_JMPTABLE
  • BuildMul
  • BuildIndir
  • BuildCast
  • genFnProlog
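The following minimal C++ sketch makes "masking" concrete. It uses a simplified model that is our own assumption, not the JIT's actual types: a 32-bit candidate set where bit i stands for GPR ri, r0-r15 are the legacy GPRs, and r16-r31 are the EGPRs. kLowGprs loosely plays the role of the JIT's lowGprRegs. The names maskOutEgprs, keepCandidates, and countCandidates are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Simplified, hypothetical model: bit i set => GPR ri is an allocation candidate.
using RegMask = uint32_t;

constexpr RegMask kLowGprs = 0x0000FFFFu;  // r0..r15 (legacy GPRs)
constexpr RegMask kEgprs   = 0xFFFF0000u;  // r16..r31 (APX EGPRs)
static_assert((kLowGprs & kEgprs) == 0, "the two register classes are disjoint");

// Before this PR, roughly: the listed build paths stripped every EGPR.
RegMask maskOutEgprs(RegMask candidates) {
    return candidates & kLowGprs;
}

// After this PR: the candidate set is passed through unchanged.
RegMask keepCandidates(RegMask candidates) {
    return candidates;
}

// Number of registers the allocator may choose from.
int countCandidates(RegMask m) {
    int n = 0;
    for (; m != 0; m &= m - 1) {  // clear the lowest set bit each iteration
        ++n;
    }
    return n;
}
```

With all 32 GPRs available, masking leaves 16 choices. Unmasking keeps all 32 in play for these nodes.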

Results:

SuperPMI asmdiff:
Base: latest main + APX on. Diff: latest main + this PR's changes + APX on.

Note: To demonstrate the improvement from our changes, both base and diff are run with APX on. We added an icount (instruction count) column to the SuperPMI results to measure how well our changes work. The SuperPMI run below includes the following columns:
  • Base Instruction Count: total number of instructions in Base.
  • Diff Instruction Count: delta from the total instructions in Base, followed by the percentage change considering only methods that had a diff, and the percentage change considering only methods whose icount changed.
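As we read the column definitions above, both percentages share the same numerator (the instruction-count delta) and differ only in which methods contribute to the denominator. The C++ sketch below illustrates that reading. MethodResult, IcountSummary, summarize, and all field names are hypothetical; they are not SuperPMI's actual code or output format.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-method record; the field names are illustrative only.
struct MethodResult {
    int64_t baseIcount;  // instructions in the base JIT's code for this method
    int64_t diffIcount;  // instructions in the diff JIT's code for this method
    bool hasAsmDiff;     // true if the generated code differed at all
};

struct IcountSummary {
    int64_t baseTotal;            // "Base Instruction Count"
    int64_t delta;                // total diff icount minus total base icount
    double pctOverDiffed;         // % change over methods with any asm diff
    double pctOverIcountChanged;  // % change over methods whose icount changed
};

IcountSummary summarize(const std::vector<MethodResult>& methods) {
    IcountSummary s{0, 0, 0.0, 0.0};
    int64_t baseOfDiffed = 0;
    int64_t baseOfChanged = 0;
    for (const MethodResult& m : methods) {
        const int64_t d = m.diffIcount - m.baseIcount;
        // A method without an asm diff cannot have a different icount,
        // so it contributes nothing to the delta.
        assert(m.hasAsmDiff || d == 0);
        s.baseTotal += m.baseIcount;
        s.delta += d;
        if (m.hasAsmDiff) baseOfDiffed += m.baseIcount;
        if (d != 0) baseOfChanged += m.baseIcount;
    }
    // Same numerator (the delta); only the set of methods in the denominator differs.
    s.pctOverDiffed = baseOfDiffed != 0 ? 100.0 * s.delta / baseOfDiffed : 0.0;
    s.pctOverIcountChanged = baseOfChanged != 0 ? 100.0 * s.delta / baseOfChanged : 0.0;
    return s;
}
```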

[Image: SuperPMI asmdiff results table]

@ghost added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) on Apr 21, 2025
@khushal1996 changed the title from "unmasking EGPRs in register allocator" to "Unmasking EGPRs in register allocator" on Apr 21, 2025
@dotnet-policy-service (bot) added the community-contribution label (Indicates that the PR has been added by a community member) on Apr 21, 2025
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Contributor

@Copilot (AI) left a comment

Pull Request Overview

This PR reverts the masking of extended GPRs (EGPRs) in the register allocator to allow their full usage in JIT code generation.

  • Removed calls to BuildApxIncompatibleGPRMask in favor of unmasked register definitions when extended registers are available.
  • Updated multiple LSRA routines and code emission paths to use the new useEvex/canUseApxRegs driven logic.
  • Introduced the BuildApxIncompatibleGPRMaskIfNeeded helper to streamline candidate selection when EVEX support is available.

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

File | Description
src/coreclr/jit/lsraxarch.cpp | Removed masking calls in node building and introduced useEvex checks.
src/coreclr/jit/lsrabuild.cpp | Updated BuildBinaryUses and related routines to conditionally use APX masks.
src/coreclr/jit/lsra.h | Added inline declaration for BuildApxIncompatibleGPRMaskIfNeeded.
src/coreclr/jit/instrsxarch.h | Added Encoding_REX2 to imul instructions for extended registers.
src/coreclr/jit/emitxarch.cpp | Adjusted movbe emission to select the APX variant based on register usage.
src/coreclr/jit/codegenxarch.cpp | Updated code emission paths for movbe and internal register extraction.
src/coreclr/jit/codegencommon.cpp | Modified function prologue to conditionally exclude high registers.

@khushal1996
Member Author

@dotnet/intel for review

@BruceForstall
Contributor

cc @kunalspathak @dotnet/jit-contrib

@kunalspathak kunalspathak self-requested a review May 2, 2025 23:13
@khushal1996
Member Author

@kunalspathak for reviews

@@ -3879,7 +3882,7 @@ int LinearScan::BuildBinaryUses(GenTreeOp* node, SingleTypeRegSet candidates)
{

#ifdef TARGET_XARCH
if (op2->isContainedIndir() && varTypeUsesFloatReg(op1) && candidates == RBM_NONE)
Member

Likewise here: can you double-check whether && !canUseApxRegs is the right check?

Member Author

This is to make sure that if APX is not supported, we prune down the available candidates to only lowGPRRegs. @Ruihan-Yin to confirm.

Member

canUseApxRegs checks whether both APX and EVEX are available. Only when both are available can we guarantee that instructions get the right encoding with EGPRs as operands. In some cases we don't care about EVEX, because we can ensure that none of the nodes in the build function will ever need EVEX, but in most cases we need both.

Member

It is the same argument I made in #114867 (comment). Basically, we want to strictly use lowGprRegs if:

  • (op2->isContainedIndir || ...) OR
  • apx is not supported

Basically, shouldn't we have this?

- if (op1->isContainedIndir() && !getCanUseApxRegs())
+ if (op1->isContainedIndir() || !getCanUseApxRegs())
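The two conditions differ only in how the flags combine. The minimal sketch below treats containedIndir and canUseApxRegs as plain booleans, which are hypothetical stand-ins for op1->isContainedIndir() and getCanUseApxRegs(). Printing both predicates side by side makes the difference explicit.

```cpp
#include <cassert>
#include <cstdio>

// Current code: force low GPRs only for a contained indir when APX+EVEX is unavailable.
bool forceLowGprAnd(bool containedIndir, bool canUseApxRegs) {
    return containedIndir && !canUseApxRegs;
}

// Proposed: force low GPRs for any contained indir, or whenever APX+EVEX is unavailable.
bool forceLowGprOr(bool containedIndir, bool canUseApxRegs) {
    return containedIndir || !canUseApxRegs;
}

// Prints the truth table of both predicates.
void printTable() {
    std::printf("indir apx | && ||\n");
    for (int indir = 0; indir <= 1; ++indir) {
        for (int apx = 0; apx <= 1; ++apx) {
            std::printf("%5d %3d | %2d %2d\n", indir, apx,
                        forceLowGprAnd(indir != 0, apx != 0),
                        forceLowGprOr(indir != 0, apx != 0));
        }
    }
}
```

The predicates disagree in two rows: no contained indir without APX+EVEX, and a contained indir with APX+EVEX available. In both rows only the || form forces low GPRs.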

Member Author

Hi Kunal,

I tested the above changes. We only want to force lowGPRs when the following condition is true: op1->isContainedIndir() && (candidates == RBM_NONE), and we don't have APX.

Hence the changes. Let me know if you have any questions. Other cases do not need to be forced to lowGPRs. I tested the changes with SuperPMI replay, with APX on and off.

Contributor

@khushal1996 @Ruihan-Yin

I think Kunal is right.

Theoretically we have 3 cases
Case 1: APX is not supported at all – We do not need to worry about it at all since high GPR doesn’t come into play at all. So, in effect, candidates are limited to lowGPRs
Case 2: APX is supported but EVEX support is not there – In this case, we need to restrict candidates to just lowGPRs
Case 3: APX support exists with EVEX support. – In this case, we do not need to do anything. Can give LSRA access to all registers for this node

I assume you merged case 1 and case 2 into the following condition:
if (op2->isContainedIndir() && (candidates == RBM_NONE) && !getCanUseApxRegs())

This means
If op2->isContainedIndir() && (candidates == RBM_NONE) is true, we can't give access to eGPR unless APX AND EVEX are available

In code, if getCanUseApxRegs() == false, at least one of isApxSupported or evexIsSupported is false.

  1. If isApxSupported is false but evexIsSupported is true, it's okay to call BuildOperandUses(op1, candidates), since APX not being supported guarantees that candidates will not include an eGPR.
  2. If evexIsSupported is false but isApxSupported is true, candidates can possibly include an eGPR, but it should not be used, since the node might use an instruction that has no eEVEX support.

Consider the case where evexIsSupported is false but isApxSupported is true and (candidates != RBM_NONE)

  1. Can we have this case happen?
  2. If this is possible, will this not break the code, since it might assign an eGPR to this node which cannot be encoded with eEVEX?

I think the original condition I added is a problem as well
if (op1->isContainedIndir() && ((varTypeUsesFloatReg(node) || node->OperGet() == GT_BSWAP || node->OperGet() == GT_BSWAP16)) && candidates == RBM_NONE)

I made an assumption here that if candidates != RBM_NONE, it's already been handled to account for APX.
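The three cases listed above reduce to a single rule for nodes whose instruction might need EVEX: EGPRs are allowed only when APX and EVEX are both present, which is what canUseApxRegs captures. The sketch below uses a simplified 32-bit mask model, which is our assumption. gprCandidates and gprCandidatesViaCanUseApxRegs are hypothetical helpers, not JIT functions.

```cpp
#include <cassert>
#include <cstdint>

using RegMask = uint32_t;                  // bit i => GPR ri (assumed layout)
constexpr RegMask kLowGprs = 0x0000FFFFu;  // r0..r15
constexpr RegMask kAllGprs = 0xFFFFFFFFu;  // r0..r31

// Candidate GPRs for a node whose instruction may need EVEX encoding.
RegMask gprCandidates(bool apxSupported, bool evexSupported) {
    if (!apxSupported) {
        // Case 1: EGPRs do not exist, so only low GPRs are possible.
        return kLowGprs;
    }
    if (!evexSupported) {
        // Case 2: EGPRs exist, but without EVEX the node's instruction
        // may not be able to encode them.
        return kLowGprs;
    }
    // Case 3: APX + EVEX, so LSRA may use every GPR.
    return kAllGprs;
}

// Equivalent single check: canUseApxRegs == apxSupported && evexSupported.
RegMask gprCandidatesViaCanUseApxRegs(bool apxSupported, bool evexSupported) {
    const bool canUseApxRegs = apxSupported && evexSupported;
    return canUseApxRegs ? kAllGprs : kLowGprs;
}
```

This also shows why Case 1 and Case 2 can share a single !canUseApxRegs branch.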

Contributor

@kunalspathak Do you think this makes sense here?

if (op2->isContainedIndir() && varTypeUsesFloatReg(op1) && !getEvexIsSupported())
{
    if (candidates == RBM_NONE)
    {
        candidates = lowGprRegs;
    }
    else
    {
        assert((candidates & ~rbmAllInt) == RBM_NONE);
        candidates &= lowGprRegs;
    }
}
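The suggestion narrows an existing candidate set instead of overwriting it. The standalone C++ sketch below assumes a 64-bit mask layout (GPRs in bits 0-31, float registers above). restrictToLowGprs is a hypothetical wrapper, and RBM_NONE, lowGprRegs and rbmAllInt are local stand-ins for the JIT names. The sketch also shows why the assertion needs parentheses: in C++, == binds tighter than &, so candidates & ~rbmAllInt == RBM_NONE would parse as candidates & (~rbmAllInt == RBM_NONE).

```cpp
#include <cassert>
#include <cstdint>

using RegMask = uint64_t;  // assumed layout: bits 0..31 GPRs, bits 32..63 float regs

constexpr RegMask RBM_NONE   = 0;
constexpr RegMask lowGprRegs = 0x000000000000FFFFull;  // r0..r15
constexpr RegMask rbmAllInt  = 0x00000000FFFFFFFFull;  // r0..r31

// Hypothetical wrapper around the suggested narrowing.
RegMask restrictToLowGprs(RegMask candidates) {
    if (candidates == RBM_NONE) {
        // No constraint yet: allow only the low GPRs.
        return lowGprRegs;
    }
    // Parenthesized on purpose; the caller must pass integer registers only.
    assert((candidates & ~rbmAllInt) == RBM_NONE);
    // Narrowing must leave at least one usable register.
    assert((candidates & lowGprRegs) != RBM_NONE);
    return candidates & lowGprRegs;  // keep the caller's constraint, drop EGPRs
}
```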

Member

Yes. @khushal1996, please also capture these comments in the method docs so we know which scenarios exist.

Member Author

Hi @kunalspathak, I have made the suggested changes after discussing them with @DeepakRajendrakumaran.

@@ -3982,14 +3984,6 @@ void LinearScan::BuildStoreLocDef(GenTreeLclVarCommon* storeLoc,
defCandidates = allRegs(type);
#endif // TARGET_X86

#ifdef TARGET_AMD64
if (op1->isContained() && op1->OperIs(GT_BITCAST) && varTypeUsesIntReg(varDsc->GetRegisterType(storeLoc)))
Member

So we don't care whether APX registers are available here, or below for GT_BITCAST either?

Member Author

@Ruihan-Yin can you please comment here.

Member

This will be lowered to REX2 or eEVEX instructions, so we need to check whether APX encodings are available in this case. I missed the SIMD load initially.

@@ -1542,7 +1529,8 @@ int LinearScan::BuildBlockStore(GenTreeBlk* blkNode)
// or if are but the remainder is a power of 2 and less than the
// size of a register

SingleTypeRegSet regMask = BuildApxIncompatibleGPRMask(blkNode, availableIntRegs, true);
// SingleTypeRegSet regMask = BuildApxIncompatibleGPRMask(blkNode, availableIntRegs, true);
SingleTypeRegSet regMask = availableIntRegs;
Member

At some of these places, don't we need a useEvex check, the way it is done elsewhere?

Member Author

Refactored some code in my latest commit.

BuildUse(srcAddrOrFill, BuildApxIncompatibleGPRMask(srcAddrOrFill, srcRegMask));
if (useEvex)
{
BuildUse(srcAddrOrFill, srcRegMask);
Member

@kunalspathak, May 8, 2025

The useEvex check is scattered throughout the code. Is it possible to create a helper method, or update BuildApxIncompatibleGPRMask, so that it takes the original mask? It would check useEvex: if it is true, return the mask as is; otherwise, apply the BuildApxIncompatibleGPRMask() operation to it to get the right mask.

Just noticed you already have BuildApxIncompatibleGPRMaskIfNeeded. Can we reuse it at other places as well?
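The helper shape being proposed is: pass the original mask plus the capability flag, and let one function decide, instead of repeating if (useEvex) at each call site. The minimal sketch below illustrates it. forceLowGprs and forceLowGprsIfNeeded are hypothetical names. Treating an empty mask as "no constraint yet" is an assumption, modelled on calls such as BuildApxIncompatibleGPRMaskIfNeeded(op1, RBM_NONE, getCanUseApxRegs()) in this PR.

```cpp
#include <cassert>
#include <cstdint>

using RegMask = uint32_t;                  // bit i => GPR ri (assumed layout)
constexpr RegMask kNone    = 0;            // "no constraint yet"
constexpr RegMask kLowGprs = 0x0000FFFFu;  // r0..r15

// Restrict candidates to low GPRs; an empty set becomes "any low GPR".
RegMask forceLowGprs(RegMask candidates) {
    return candidates == kNone ? kLowGprs : (candidates & kLowGprs);
}

// Single decision point: only restrict when EGPRs cannot be used.
RegMask forceLowGprsIfNeeded(RegMask candidates, bool useApxRegs) {
    return useApxRegs ? candidates : forceLowGprs(candidates);
}
```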

Member Author

Refactored some code to use BuildApxIncompatibleGPRMaskIfNeeded

GenTree* user = nullptr;

if (LIR::AsRange(blockSequence[curBBSeqNum]).TryGetUse(intrinsicTree, &use))
// TODO-XArch-APX: some of the permute intrinsics are APX-EVEX compatible, we need to separate and
Member

Do we need a GitHub issue for this?

Member Author

Nope. This was a stale comment and has been removed in latest changes.

@kunalspathak
Member

SuperPMI asmdiff:

Can you also share the tpdiff numbers for similar configuration?

Also, I was hoping the asmdiff on CI would show zero diffs, but I'm still seeing differences. Do we know why?

// NI_System_Math_Abs is the only one likely to use a GPR
op1RegCandidates = BuildApxIncompatibleGPRMask(op1, op1RegCandidates);
if (op1RegCandidates == RBM_NONE)
// op1RegCandidates = BuildApxIncompatibleGPRMask(op1, op1RegCandidates);
Member

Delete commented-out code.

Member Author

deleted.

case NI_System_Math_Abs:
{
op1RegCandidates = BuildApxIncompatibleGPRMaskIfNeeded(op1, RBM_NONE, getCanUseApxRegs());
// getCanUseApxRegs() ? op1RegCandidates : BuildApxIncompatibleGPRMask(op1);
Member

Delete commented-out code.

Member Author

deleted.

@kunalspathak
Member

inline SingleTypeRegSet LinearScan::BuildApxIncompatibleGPRMask(GenTree* tree,

I think this method is misnamed. As far as I understand, all GPRs (low and high) work with APX; it is just that the high GPRs (eGPRs) use REX2 encoding (correct me if I am wrong). So we are not really returning a register mask that is incompatible with APX; we just prefer to return lowGprRegs in cases where APX is not supported. Can we rename the method appropriately? Every time I see this method, I wonder why we are generating an incompatible mask.


Refers to: src/coreclr/jit/lsraxarch.cpp:3378 in 34c693f.

@khushal1996
Member Author

Renaming it to ForceLowGPRForApx()

{
assert((candidates & lowGprRegs) != RBM_NONE);
srcCount += BuildOperandUses(op1, candidates & lowGprRegs);
}
Member Author

Just FYI, @DeepakRajendrakumaran: the else part is not reached right now. Shall I change it to assert(false), so that anyone who does reach this path gets a failure and learns about this handling?

Contributor

I think it's fine as is. Will leave it up to @kunalspathak

// TODO-Xarch-apx : Revert. Excluding eGPR so that it's not used for non REX2 supported movs.
excludeMask = excludeMask | RBM_HIGHINT;
// we'd require eEVEX present to enable EGPRs in HWIntrinsics.
if (!compiler->canUseEvexEncoding() || !compiler->canUseApxEncoding())
Contributor

Do we need the !compiler->canUseApxEncoding() part here?

Member Author

I don't think so. I ran the replays and they look good. I have removed it in my latest changes.

Comment on lines +7378 to +7381
// Get the APXIncompatible register first
regNumber tmpReg2 = internalRegisters.Extract(treeNode);
// tmpReg1 can be EGPR
regNumber tmpReg1 = internalRegisters.Extract(treeNode);
Contributor

Can you please explain this part step by step?

  1. What are we doing here? That is, which node are we generating code for, and what instruction is being generated?
  2. What exactly changed w.r.t. CodeGen with this change?
  3. Why is regNumber tmpReg2 = internalRegisters.Extract(treeNode); guaranteed to be an APXIncompatible register?

isApxSupported = compiler->canUseApxEncoding();
if (isApxSupported)
apxIsSupported = compiler->canUseApxEncoding();
canUseApxRegs = apxIsSupported && evexIsSupported;
Contributor

Is there a need for canUseApxRegs based on our discussions? Will evexIsSupported check suffice?

Member Author

It was used in some places that could do fine with EVEX checks alone. I have made the changes accordingly.

}
}

ins = needsEvex ? INS_movbe_apx : INS_movbe;
Member

Suggested change
ins = needsEvex ? INS_movbe_apx : INS_movbe;
instruction ins = needsEvex ? INS_movbe_apx : INS_movbe;

Member Author

This is AMD64 code, and hence ins is defined outside the #ifdef checks.
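This explains why movbe needs a separate APX variant at all. As we understand the APX encoding rules, REX2 can only address legacy opcode maps 0 and 1, while MOVBE (0F 38 F0/F1) lives in map 2. An EGPR operand therefore requires the EVEX-promoted form. The sketch below shows the selection, with caveats: isEgpr and selectMovbe are hypothetical, the register numbering is assumed, and the real codegen may compute needsEvex differently.

```cpp
#include <cassert>

enum class Ins { movbe, movbe_apx };  // stand-ins for INS_movbe / INS_movbe_apx

// Assumed numbering: 0..15 legacy GPRs, 16..31 APX EGPRs; -1 means "no register".
bool isEgpr(int reg) {
    return reg >= 16 && reg <= 31;
}

// Any EGPR among the data register and the address registers forces the
// EVEX-promoted encoding, since REX2 cannot reach MOVBE's opcode map.
Ins selectMovbe(int dataReg, int baseReg, int indexReg) {
    const bool needsEvex = isEgpr(dataReg) || isEgpr(baseReg) || isEgpr(indexReg);
    return needsEvex ? Ins::movbe_apx : Ins::movbe;
}
```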

// to restrict candidates to just lowGPRs
// Case 3: APX support exists with EVEX support. – In this case, we do not need
// to do anything. Can give LSRA access to all registers for this node
// Case 4: APX support without Evex support - candidates can possibly include
Member

Isn't case 2 and case 4 the same?

// updated register mask.
inline SingleTypeRegSet LinearScan::ForceLowGprForApxIfNeeded(GenTree* tree,
SingleTypeRegSet candidates,
bool UseApxRegs)
Member

Suggested change
bool UseApxRegs)
bool useApxRegs)

Member Author

Made the changes in my latest commit.

Member

@kunalspathak left a comment

added minor comments

Comment on lines 3001 to 2999
buildInternalIntRegisterDefForNode(cast, BuildApxIncompatibleGPRMask(cast, availableIntRegs, true));
buildInternalIntRegisterDefForNode(cast, BuildApxIncompatibleGPRMask(cast, availableIntRegs, true));
Member Author

@khushal1996, May 21, 2025

Related comments - #114867 (comment), #114867 (comment), #114867 (comment)
@DeepakRajendrakumaran @kunalspathak
Details about this change below -->

  • The above change creates two temporary registers. One of them can be an EGPR, whereas the other cannot.
  • Node affected: cast nodes (ulong -> float).
  • Without the tmpReg1/tmpReg2 swap, we would see this code as:
buildInternalIntRegisterDefForNode(cast, BuildApxIncompatibleGPRMask(cast, availableIntRegs, true));
buildInternalIntRegisterDefForNode(cast, BuildApxIncompatibleGPRMask(cast, availableIntRegs, true));
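This bears on why the first Extract call in the codegen snippet earlier in the thread can be treated as the APXIncompatible register. Suppose one temp is constrained to low GPRs and the other is unconstrained. If Extract hands out the lowest-numbered register first, the first register extracted is always a low GPR. The reason is that the minimum of the two registers can be no higher than the constrained one, which is below r16. Note that we have not verified this ordering against the JIT's implementation; it is an assumption. In the sketch below, extractLowest and firstExtracted are hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstdint>

using RegMask = uint32_t;       // bit i => GPR ri (assumed layout)
constexpr int kFirstEgpr = 16;  // r16..r31 are EGPRs

// Hypothetical stand-in for internalRegisters.Extract(treeNode), assuming it
// returns the lowest-numbered remaining register and removes it from the set.
int extractLowest(RegMask& internalRegs) {
    assert(internalRegs != 0);
    int reg = 0;
    while (((internalRegs >> reg) & 1u) == 0) {
        ++reg;
    }
    internalRegs &= ~(RegMask{1} << reg);
    return reg;
}

// Two temps: lowTemp was constrained to r0..r15, anyTemp may be any GPR.
// Returns the first register extracted.
int firstExtracted(int lowTemp, int anyTemp) {
    assert(lowTemp < kFirstEgpr && lowTemp != anyTemp);
    RegMask regs = (RegMask{1} << lowTemp) | (RegMask{1} << anyTemp);
    return extractLowest(regs);
}
```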

Sample program -

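    using System;
    using System.Runtime.CompilerServices;

    public static class Program
    {
        [MethodImplAttribute(MethodImplOptions.NoInlining)]
        public static float UlongToFloat(ulong val)
        {
            return (float)val;
        }

        public static int Main(string[] args)
        {
            Console.WriteLine(UlongToFloat(1500));
            return 100; // conventional "pass" exit code in dotnet/runtime tests
        }
    }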

Disasm

G_M49684_IG02:  ;; offset=0x0000
       C5F857C0             vxorps   xmm0, xmm0, xmm0
       488BC1               mov      rax, rcx
       48D1E8               shr      rax, 1
       8BD1                 mov      edx, ecx
       83E201               and      edx, 1
       480BD0               or       rdx, rax
       4885C9               test     rcx, rcx
       480F49D1             cmovns   rdx, rcx
       C4E1FA2AC2           vcvtsi2ss xmm0, rdx

In the above disasm, the mov, shr and or instructions can use an EGPR, so allowing one of the temp registers to be an EGPR in RA lets these instructions use it.
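The disasm follows the usual way to convert an unsigned 64-bit value using only a signed conversion. If the top bit is set, it halves the value while OR-ing the shifted-out bit back in as a sticky bit, so rounding stays correct. It then converts and doubles the result, which is the jns/addss pair visible in the diffs below. The temporary that holds val >> 1 (rax in the base) is the one that may now become an EGPR. The C++ sketch below follows the same steps. It is reconstructed from the disasm rather than taken from the JIT source, and it assumes the default round-to-nearest mode.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the emitted sequence for (float)ulong shown above.
float ulongToFloat(uint64_t val) {
    const bool topBitSet = (val >> 63) != 0;  // test rcx, rcx (sign flag)

    // mov rax, rcx ; shr rax, 1   -> tmp = val >> 1 (the temp that may be an EGPR)
    // mov edx, ecx ; and edx, 1   -> the bit about to be shifted out
    // or rdx, rax                 -> halved value with a sticky low bit
    const uint64_t halved = (val >> 1) | (val & 1);

    // cmovns rdx, rcx: if the top bit is clear, convert val directly.
    const uint64_t src = topBitSet ? halved : val;

    // vcvtsi2ss: signed 64-bit -> float; src < 2^63, so the int64_t cast keeps its value.
    float f = static_cast<float>(static_cast<int64_t>(src));

    // jns / addss xmm0, xmm0: undo the halving for large inputs.
    if (topBitSet) {
        f += f;
    }
    return f;
}
```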

The SuperPMI run below shows the asmdiffs for this case only.
Diffs are based on 25,335 contexts (13 MinOpts, 25,322 FullOpts).

Base JIT options: JitBypassApxCheck=1;JitStressRegs=4000

Diff JIT options: JitBypassApxCheck=1;JitStressRegs=4000

Overall (+57 bytes)
Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs
smoke_tests.nativeaot.windows.x64.checked.mch | 6,084,089 | +57 | 0.00%
FullOpts (+57 bytes)
Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs
smoke_tests.nativeaot.windows.x64.checked.mch | 6,083,071 | +57 | 0.00%
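The size growth in the example diffs below follows from the encoding. As we understand the APX spec, the REX2 prefix is two bytes (0xD5 plus a payload byte) and replaces the one-byte REX prefix. The mov, shr and or instructions already carry REX.W in the base (for example, 48 8B C1 for mov rax, rcx), so each grows by one byte when it names an EGPR. The small C++ sketch below works through that arithmetic. The constants are a simplified model, not emitter code.

```cpp
#include <cassert>

constexpr int kRexBytes  = 1;  // e.g. 0x48 (REX.W)
constexpr int kRex2Bytes = 2;  // 0xD5 + payload byte

// Extra bytes when `instrs` REX-prefixed instructions switch to REX2.
constexpr int rex2Growth(int instrs) {
    return instrs * (kRex2Bytes - kRexBytes);
}

// One ulong->float/double block: mov, shr and or now use the EGPR temp.
constexpr int kGrowthPerBlock = rex2Growth(3);
static_assert(kGrowthPerBlock == 3, "each converted block grows by 3 bytes");
```

This matches the per-block size change (for example, 57 to 60 bytes). It also matches the method totals: +6 bytes for methods with two such blocks (2723 to 2729) and +3 for the method with a single block (515 to 518). If every diff in the collection follows this pattern, the +57 bytes overall would correspond to 19 such blocks.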
Example diffs
smoke_tests.nativeaot.windows.x64.checked.mch
+6 (+0.22%) : 7020.dasm - System.Array:CopyImplPrimitiveTypeWithWidening(System.Array,int,System.Array,int,int,ubyte) (FullOpts)
@@ -485,17 +485,17 @@ G_M25508_IG41:        ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=10000 {r16}, by
        ; byrRegs +[rbp]
        mov      rcx, qword ptr [rbp]
        xorps    xmm0, xmm0
-       mov      rax, rcx
-       shr      rax, 1
+       mov      r21, rcx
+       shr      r21, 1
        mov      edx, ecx
        and      edx, 1
-       or       rdx, rax
+       or       rdx, r21
        test     rcx, rcx
        cmovns   rdx, rcx
        cvtsi2ss xmm0, rdx
        jns      SHORT G_M25508_IG42
        addss    xmm0, xmm0
-						;; size=57 bbWeight=2 PerfScore 35.17
+						;; size=60 bbWeight=2 PerfScore 35.17
 G_M25508_IG42:        ; bbWeight=2, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=10020 {rbp r16}, gcvars, byref
        ; GC ptr vars -{V11}
        mov      rbx, r16
@@ -512,17 +512,17 @@ G_M25508_IG43:        ; bbWeight=2, gcVars=0000000000000001 {V11}, gcrefRegs=000
        ; byrRegs +[rbp]
        mov      rcx, qword ptr [rbp]
        xorps    xmm0, xmm0
-       mov      rax, rcx
-       shr      rax, 1
+       mov      r21, rcx
+       shr      r21, 1
        mov      edx, ecx
        and      edx, 1
-       or       rdx, rax
+       or       rdx, r21
        test     rcx, rcx
        cmovns   rdx, rcx
        cvtsi2sd xmm0, rdx
        jns      SHORT G_M25508_IG44
        addsd    xmm0, xmm0
-						;; size=51 bbWeight=2 PerfScore 33.17
+						;; size=54 bbWeight=2 PerfScore 33.17
 G_M25508_IG44:        ; bbWeight=2, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=10020 {rbp r16}, gcvars, byref
        ; GC ptr vars -{V00 V11}
        mov      rbx, r16
@@ -1207,7 +1207,7 @@ RWD144 	dd	G_M25508_IG80 - G_M25508_IG02
        	dd	G_M25508_IG90 - G_M25508_IG02
 
 
-; Total bytes of code 2723, prolog size 47, PerfScore 1131.45, instruction count 588, allocated bytes for code 2723 (MethodHash=b09f9c5b) for method System.Array:CopyImplPrimitiveTypeWithWidening(System.Array,int,System.Array,int,int,ubyte) (FullOpts)
+; Total bytes of code 2729, prolog size 47, PerfScore 1131.45, instruction count 588, allocated bytes for code 2729 (MethodHash=b09f9c5b) for method System.Array:CopyImplPrimitiveTypeWithWidening(System.Array,int,System.Array,int,int,ubyte) (FullOpts)
 ; ============================================================
 
 Unwind Info:
+6 (+0.28%) : 18129.dasm - System.InvokeUtils:ConvertOrWidenPrimitivesEnumsAndPointersIfPossible(System.Object,ulong,int,byref):System.Exception (FullOpts)
@@ -632,19 +632,20 @@ G_M48359_IG49:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=10000 {r16},
        ; byrRegs +[rbx]
        mov      rcx, qword ptr [rbx]
        xorps    xmm0, xmm0
-       mov      rax, rcx
-       shr      rax, 1
+       mov      r16, rcx
+       ; byrRegs -[r16]
+       shr      r16, 1
        mov      edx, ecx
        and      edx, 1
-       or       rdx, rax
+       or       rdx, r16
        test     rcx, rcx
        cmovns   rdx, rcx
        cvtsi2ss xmm0, rdx
        jns      SHORT G_M48359_IG50
        addss    xmm0, xmm0
-						;; size=47 bbWeight=0.50 PerfScore 7.92
+						;; size=50 bbWeight=0.50 PerfScore 7.92
 G_M48359_IG50:        ; bbWeight=0.50, gcVars=0000000000000001 {V10}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref
-       ; byrRegs -[rbx r16]
+       ; byrRegs -[rbx]
        ; GC ptr vars -{V00 V01 V26}
        mov      rbp, bword ptr [rsp+0x28]
        ; byrRegs +[rbp]
@@ -660,19 +661,20 @@ G_M48359_IG51:        ; bbWeight=0.50, gcVars=0000000000000003 {V10 V26}, gcrefR
        ; byrRegs +[rbx]
        mov      rcx, qword ptr [rbx]
        xorps    xmm0, xmm0
-       mov      rax, rcx
-       shr      rax, 1
+       mov      r16, rcx
+       ; byrRegs -[r16]
+       shr      r16, 1
        mov      edx, ecx
        and      edx, 1
-       or       rdx, rax
+       or       rdx, r16
        test     rcx, rcx
        cmovns   rdx, rcx
        cvtsi2sd xmm0, rdx
        jns      SHORT G_M48359_IG52
        addsd    xmm0, xmm0
-						;; size=51 bbWeight=0.50 PerfScore 7.92
+						;; size=54 bbWeight=0.50 PerfScore 7.92
 G_M48359_IG52:        ; bbWeight=0.50, gcVars=0000000000000001 {V10}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref
-       ; byrRegs -[rbx r16]
+       ; byrRegs -[rbx]
        ; GC ptr vars -{V01 V26}
        mov      rbp, bword ptr [rsp+0x28]
        ; byrRegs +[rbp]
@@ -966,7 +968,7 @@ RWD176 	dd	G_M48359_IG72 - G_M48359_IG02
        	dd	G_M48359_IG70 - G_M48359_IG02
 
 
-; Total bytes of code 2132, prolog size 13, PerfScore 284.20, instruction count 470, allocated bytes for code 2132 (MethodHash=15234318) for method System.InvokeUtils:ConvertOrWidenPrimitivesEnumsAndPointersIfPossible(System.Object,ulong,int,byref):System.Exception (FullOpts)
+; Total bytes of code 2138, prolog size 13, PerfScore 284.20, instruction count 470, allocated bytes for code 2138 (MethodHash=15234318) for method System.InvokeUtils:ConvertOrWidenPrimitivesEnumsAndPointersIfPossible(System.Object,ulong,int,byref):System.Exception (FullOpts)
 ; ============================================================
 
 Unwind Info:
+3 (+0.58%) : 16035.dasm - System.Number:NumberToFloatingPointBits[double](byref):ulong (FullOpts)
@@ -203,17 +203,17 @@ G_M60445_IG20:        ; bbWeight=0.50, gcVars=0000000000000000 {}, gcrefRegs=000
        ; byrRegs -[rbp]
        mov      r8, r16
        vxorps   xmm0, xmm0, xmm0
-       mov      rax, r8
-       shr      rax, 1
-       mov      ecx, r8d
-       and      ecx, 1
-       or       rcx, rax
+       mov      r17, r8
+       shr      r17, 1
+       mov      eax, r8d
+       and      eax, 1
+       or       rax, r17
        test     r8, r8
-       cmovns   rcx, r8
-       vcvtsi2sd xmm0, rcx
+       cmovns   rax, r8
+       vcvtsi2sd xmm0, rax
        jns      SHORT G_M60445_IG21
        vaddsd   xmm0, xmm0
-						;; size=41 bbWeight=0.50 PerfScore 6.29
+						;; size=44 bbWeight=0.50 PerfScore 6.29
 G_M60445_IG21:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
        cmp      r20, 23
        jae      SHORT G_M60445_IG26
@@ -249,7 +249,7 @@ G_M60445_IG26:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        int3     
 						;; size=6 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 515, prolog size 8, PerfScore 72.42, instruction count 119, allocated bytes for code 515 (MethodHash=388813e2) for method System.Number:NumberToFloatingPointBits[double](byref):ulong (FullOpts)
+; Total bytes of code 518, prolog size 8, PerfScore 72.42, instruction count 119, allocated bytes for code 518 (MethodHash=388813e2) for method System.Number:NumberToFloatingPointBits[double](byref):ulong (FullOpts)
 ; ============================================================
 
 Unwind Info:
+6 (+1.13%) : 22054.dasm - System.Linq.Expressions.Interpreter.NumericConvertInstruction+Unchecked:ConvertUInt64(ulong):System.Object:this (FullOpts)
@@ -81,19 +81,19 @@ G_M42680_IG05:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
        mov      r16, rax
        ; gcrRegs +[r16]
        xorps    xmm0, xmm0
-       mov      rax, rbp
-       ; gcrRegs -[rax]
-       shr      rax, 1
+       mov      r17, rbp
+       shr      r17, 1
        mov      ecx, ebp
        and      ecx, 1
-       or       rcx, rax
+       or       rcx, r17
        test     rbp, rbp
        cmovns   rcx, rbp
        cvtsi2sd xmm0, rcx
        jns      SHORT G_M42680_IG06
        addsd    xmm0, xmm0
-						;; size=51 bbWeight=0.50 PerfScore 7.04
+						;; size=54 bbWeight=0.50 PerfScore 7.04
 G_M42680_IG06:        ; bbWeight=0.50, gcrefRegs=10000 {r16}, byrefRegs=0000 {}, byref
+       ; gcrRegs -[rax]
        mov      rcx, r16
        ; gcrRegs +[rcx]
        movsd    qword ptr [rcx+0x08], xmm0
@@ -108,19 +108,19 @@ G_M42680_IG07:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
        mov      r16, rax
        ; gcrRegs +[r16]
        xorps    xmm0, xmm0
-       mov      rax, rbp
-       ; gcrRegs -[rax]
-       shr      rax, 1
+       mov      r17, rbp
+       shr      r17, 1
        mov      ecx, ebp
        and      ecx, 1
-       or       rcx, rax
+       or       rcx, r17
        test     rbp, rbp
        cmovns   rcx, rbp
        cvtsi2ss xmm0, rcx
        jns      SHORT G_M42680_IG08
        addss    xmm0, xmm0
-						;; size=51 bbWeight=0.50 PerfScore 7.04
+						;; size=54 bbWeight=0.50 PerfScore 7.04
 G_M42680_IG08:        ; bbWeight=0.50, gcrefRegs=10000 {r16}, byrefRegs=0000 {}, byref
+       ; gcrRegs -[rax]
        mov      rcx, r16
        ; gcrRegs +[rcx]
        movss    dword ptr [rcx+0x08], xmm0
@@ -288,7 +288,7 @@ RWD00  	dd	G_M42680_IG17 - G_M42680_IG02
        	dd	G_M42680_IG04 - G_M42680_IG02
 
 
-; Total bytes of code 531, prolog size 5, PerfScore 55.62, instruction count 123, allocated bytes for code 531 (MethodHash=52025947) for method System.Linq.Expressions.Interpreter.NumericConvertInstruction+Unchecked:ConvertUInt64(ulong):System.Object:this (FullOpts)
+; Total bytes of code 537, prolog size 5, PerfScore 55.62, instruction count 123, allocated bytes for code 537 (MethodHash=52025947) for method System.Linq.Expressions.Interpreter.NumericConvertInstruction+Unchecked:ConvertUInt64(ulong):System.Object:this (FullOpts)
 ; ============================================================
 
 Unwind Info:
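Every diffed block above contains the same instruction pattern: an unsigned 64-bit integer is converted to `double` (or `float`) using only the signed conversion instruction. The EGPR change only moves the scratch register from `rax` to `r17`. The sketch below is a C++ rendering of that pattern. It assumes round-to-nearest (the default rounding mode), and the function name is hypothetical. It is not code from the JIT.

```cpp
#include <cstdint>

// Hypothetical helper: a C++ rendering of the instruction sequence in the diffed blocks.
// The float variant is identical, with cvtsi2ss/addss and float in place of double.
double ConvertUInt64ToDouble(uint64_t value)
{
    // mov tmp, value ; shr tmp, 1         : halve the input so it fits in a signed 64-bit register
    const uint64_t halved = value >> 1;
    // mov ecx, value32 ; and ecx, 1       : keep the bit that the shift discarded
    const uint64_t lowBit = value & 1;
    // or rcx, tmp                         : fold that bit back in as a "sticky" bit so rounding stays correct
    const uint64_t adjusted = halved | lowBit;
    // test value, value ; cmovns rcx, value : if the top bit is clear, convert the original value instead
    const bool topBitSet = (value >> 63) != 0;
    const int64_t operand = topBitSet ? static_cast<int64_t>(adjusted) : static_cast<int64_t>(value);
    // cvtsi2sd xmm0, rcx                  : signed 64-bit integer to double
    double result = static_cast<double>(operand);
    // jns skip ; addsd xmm0, xmm0         : for large inputs, double the result to undo the halving (exact)
    if (topBitSet)
    {
        result += result;
    }
    return result;
}
```

The sticky bit matters for inputs such as `0x8000000000000401`. The halved value would otherwise land exactly halfway between two doubles and round down to 2^63. With the sticky bit, the result is 2^63 + 2048, which matches `static_cast<double>` on the original `uint64_t`.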
+6 (+0.89%) : 22059.dasm - System.Linq.Expressions.Interpreter.NumericConvertInstruction+Checked:ConvertUInt64(ulong):System.Object:this (FullOpts)
@@ -81,19 +81,19 @@ G_M29411_IG05:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
        mov      r16, rax
        ; gcrRegs +[r16]
        xorps    xmm0, xmm0
-       mov      rax, rbp
-       ; gcrRegs -[rax]
-       shr      rax, 1
+       mov      r17, rbp
+       shr      r17, 1
        mov      ecx, ebp
        and      ecx, 1
-       or       rcx, rax
+       or       rcx, r17
        test     rbp, rbp
        cmovns   rcx, rbp
        cvtsi2sd xmm0, rcx
        jns      SHORT G_M29411_IG06
        addsd    xmm0, xmm0
-						;; size=51 bbWeight=0.50 PerfScore 7.04
+						;; size=54 bbWeight=0.50 PerfScore 7.04
 G_M29411_IG06:        ; bbWeight=0.50, gcrefRegs=10000 {r16}, byrefRegs=0000 {}, byref
+       ; gcrRegs -[rax]
        mov      rcx, r16
        ; gcrRegs +[rcx]
        movsd    qword ptr [rcx+0x08], xmm0
@@ -108,19 +108,19 @@ G_M29411_IG07:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
        mov      r16, rax
        ; gcrRegs +[r16]
        xorps    xmm0, xmm0
-       mov      rax, rbp
-       ; gcrRegs -[rax]
-       shr      rax, 1
+       mov      r17, rbp
+       shr      r17, 1
        mov      ecx, ebp
        and      ecx, 1
-       or       rcx, rax
+       or       rcx, r17
        test     rbp, rbp
        cmovns   rcx, rbp
        cvtsi2ss xmm0, rcx
        jns      SHORT G_M29411_IG08
        addss    xmm0, xmm0
-						;; size=51 bbWeight=0.50 PerfScore 7.04
+						;; size=54 bbWeight=0.50 PerfScore 7.04
 G_M29411_IG08:        ; bbWeight=0.50, gcrefRegs=10000 {r16}, byrefRegs=0000 {}, byref
+       ; gcrRegs -[rax]
        mov      rcx, r16
        ; gcrRegs +[rcx]
        movss    dword ptr [rcx+0x08], xmm0
@@ -318,7 +318,7 @@ RWD00  	dd	G_M29411_IG17 - G_M29411_IG02
        	dd	G_M29411_IG04 - G_M29411_IG02
 
 
-; Total bytes of code 676, prolog size 5, PerfScore 61.87, instruction count 150, allocated bytes for code 676 (MethodHash=9b428d1c) for method System.Linq.Expressions.Interpreter.NumericConvertInstruction+Checked:ConvertUInt64(ulong):System.Object:this (FullOpts)
+; Total bytes of code 682, prolog size 5, PerfScore 61.87, instruction count 150, allocated bytes for code 682 (MethodHash=9b428d1c) for method System.Linq.Expressions.Interpreter.NumericConvertInstruction+Checked:ConvertUInt64(ulong):System.Object:this (FullOpts)
 ; ============================================================
 
 Unwind Info:
+3 (+0.58%) : 2277.dasm - System.Number:NumberToFloatingPointBits[double](byref):ulong (FullOpts)
@@ -203,17 +203,17 @@ G_M60445_IG20:        ; bbWeight=0.50, gcVars=0000000000000000 {}, gcrefRegs=000
        ; byrRegs -[rbp]
        mov      r8, r16
        xorps    xmm0, xmm0
-       mov      rax, r8
-       shr      rax, 1
-       mov      ecx, r8d
-       and      ecx, 1
-       or       rcx, rax
+       mov      r17, r8
+       shr      r17, 1
+       mov      eax, r8d
+       and      eax, 1
+       or       rax, r17
        test     r8, r8
-       cmovns   rcx, r8
-       cvtsi2sd xmm0, rcx
+       cmovns   rax, r8
+       cvtsi2sd xmm0, rax
        jns      SHORT G_M60445_IG21
        addsd    xmm0, xmm0
-						;; size=40 bbWeight=0.50 PerfScore 6.29
+						;; size=43 bbWeight=0.50 PerfScore 6.29
 G_M60445_IG21:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
        cmp      r20, 23
        jae      SHORT G_M60445_IG26
@@ -249,7 +249,7 @@ G_M60445_IG26:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        int3     
 						;; size=6 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 514, prolog size 8, PerfScore 72.42, instruction count 119, allocated bytes for code 514 (MethodHash=388813e2) for method System.Number:NumberToFloatingPointBits[double](byref):ulong (FullOpts)
+; Total bytes of code 517, prolog size 8, PerfScore 72.42, instruction count 119, allocated bytes for code 517 (MethodHash=388813e2) for method System.Number:NumberToFloatingPointBits[double](byref):ulong (FullOpts)
 ; ============================================================
 
 Unwind Info:
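The +3 bytes per block (`size=51` to `size=54`) are consistent with x64 prefix costs. A legacy register in a 32-bit operation needs no prefix. A 64-bit operation, or any use of `r8`-`r15`, needs a 1-byte REX prefix. Any use of `r16`-`r31` needs the 2-byte APX REX2 prefix (`0xD5` plus a payload byte). The model below is a simplified sketch under those assumptions. It ignores byte registers (`spl`/`bpl`/`sil`/`dil`), memory operands, legacy prefixes such as `F3`, and VEX/EVEX-encoded instructions. It also assumes register numbers follow the hardware encoding order. `Reg` and `PrefixBytes` are hypothetical names.

```cpp
#include <algorithm>
#include <initializer_list>

// Assumed numbering: hardware encoding order for rax..r15 (0..15), then APX EGPRs r16..r31 (16..31).
enum Reg : int { RAX = 0, RCX = 1, RDX = 2, RBX = 3, RSP = 4, RBP = 5, RSI = 6, RDI = 7, R8 = 8, R16 = 16, R17 = 17 };

// Simplified: bytes of REX/REX2 prefix needed by a register-only integer instruction.
int PrefixBytes(bool is64BitOperation, std::initializer_list<int> regs)
{
    int highest = 0;
    for (int r : regs)
    {
        highest = std::max(highest, r);
    }
    if (highest >= 16)
    {
        return 2; // REX2: 0xD5 plus one payload byte (also carries the W bit)
    }
    if (is64BitOperation || highest >= 8)
    {
        return 1; // REX: one byte carrying W and/or the R/X/B extension bits
    }
    return 0; // legacy register, 32-bit operation: no prefix
}
```

Replacing `rax` with `r17` in `mov`, `shr`, and `or` raises each of those instructions from 1 to 2 prefix bytes, giving +3 per block and +6 per method. The same model accounts for the -2 per block in the `CopyImplPrimitiveTypeWithWidening` diff later in this comment. There, the 32-bit `mov`/`and` move from `r8d` (REX) to `edx` (no prefix), and every other swapped instruction keeps a 1-byte REX.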

Size improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same size | Improvements (bytes) | Regressions (bytes) |
| --- | --- | --- | --- | --- | --- | --- |
| smoke_tests.nativeaot.windows.x64.checked.mch | 11 | 0 | 11 | 0 | -0 | +57 |

PerfScore improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same PerfScore | Improvements (PerfScore) | Regressions (PerfScore) | PerfScore Overall in FullOpts |
| --- | --- | --- | --- | --- | --- | --- | --- |
| smoke_tests.nativeaot.windows.x64.checked.mch | 11 | 0 | 0 | 11 | 0.00% | 0.00% | 0.0000% |

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
| --- | --- | --- | --- | --- | --- |
| smoke_tests.nativeaot.windows.x64.checked.mch | 25,335 | 13 | 25,322 | 0 (0.00%) | 0 (0.00%) |
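The per-method headers (for example `+6 (+0.89%)`) and the size tables appear to follow simple arithmetic. The delta is the diff size minus the base size, the percentage is taken relative to the base size, and each context with a diff is bucketed by the sign of its delta. The sketch below reproduces that arithmetic under those assumptions. It is consistent with the numbers shown in this comment, but it is not the code of `jit-analyze` or SuperPMI, and all names are hypothetical.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical types: one entry per SuperPMI context whose generated code differed.
struct MethodDiff
{
    int baseBytes; // "Total bytes of code" in the base listing
    int diffBytes; // "Total bytes of code" in the diff listing
};

struct SizeSummary
{
    int  contextsWithDiffs = 0;
    int  improvements      = 0; // contexts that got smaller
    int  regressions       = 0; // contexts that got larger
    int  sameSize          = 0; // code changed but the byte count did not
    long improvementBytes  = 0; // sum of negative deltas
    long regressionBytes   = 0; // sum of positive deltas
};

// Formats a per-method header such as "+6 (+0.89%)": the delta and the delta relative to the base size.
std::string FormatHeaderDelta(const MethodDiff& m)
{
    const int delta = m.diffBytes - m.baseBytes;
    if (delta == 0)
    {
        return "+0 (0.00%)";
    }
    const double percent = 100.0 * delta / m.baseBytes;
    char buffer[64];
    std::snprintf(buffer, sizeof(buffer), "%+d (%+.2f%%)", delta, percent);
    return buffer;
}

// Buckets each context by the sign of its size delta and accumulates the byte totals.
SizeSummary Summarize(const std::vector<MethodDiff>& diffs)
{
    SizeSummary summary;
    for (const MethodDiff& m : diffs)
    {
        const int delta = m.diffBytes - m.baseBytes;
        summary.contextsWithDiffs++;
        if (delta < 0)
        {
            summary.improvements++;
            summary.improvementBytes += delta;
        }
        else if (delta > 0)
        {
            summary.regressions++;
            summary.regressionBytes += delta;
        }
        else
        {
            summary.sameSize++;
        }
    }
    return summary;
}
```

For example, `{676, 682}` (22059.dasm) formats as `+6 (+0.89%)`, and `{1573, 1569}` (1229.dasm) formats as `-4 (-0.25%)`. A context whose instructions changed but whose size did not, like the `+0` examples, lands in the "Same size" column.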

jit-analyze output

SuperPMI asmdiff with just APX on:
Diffs are based on 25,335 contexts (13 MinOpts, 25,322 FullOpts).

Base JIT options: JitBypassApxCheck=1

Diff JIT options: JitBypassApxCheck=1

Overall (-20 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
| --- | --- | --- | --- |
| smoke_tests.nativeaot.windows.x64.checked.mch | 4,308,719 | -20 | 0.00% |

FullOpts (-20 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
| --- | --- | --- | --- |
| smoke_tests.nativeaot.windows.x64.checked.mch | 4,307,701 | -20 | 0.00% |
Example diffs
smoke_tests.nativeaot.windows.x64.checked.mch
-4 (-0.25%) : 1229.dasm - System.Array:CopyImplPrimitiveTypeWithWidening(System.Array,int,System.Array,int,int,ubyte) (FullOpts)
@@ -473,17 +473,17 @@ G_M25508_IG43:        ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=0048 {rbx rsi},
        jne      SHORT G_M25508_IG45
        mov      rcx, qword ptr [rbx]
        xorps    xmm0, xmm0
-       mov      rdx, rcx
-       shr      rdx, 1
-       mov      r8d, ecx
-       and      r8d, 1
-       or       r8, rdx
+       mov      r8, rcx
+       shr      r8, 1
+       mov      edx, ecx
+       and      edx, 1
+       or       rdx, r8
        test     rcx, rcx
-       cmovns   r8, rcx
-       cvtsi2ss xmm0, r8
+       cmovns   rdx, rcx
+       cvtsi2ss xmm0, rdx
        jns      SHORT G_M25508_IG44
        addss    xmm0, xmm0
-						;; size=45 bbWeight=2 PerfScore 31.17
+						;; size=43 bbWeight=2 PerfScore 31.17
 G_M25508_IG44:        ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=0048 {rbx rsi}, byref
        movss    dword ptr [rsi], xmm0
        jmp      G_M25508_IG35
@@ -493,17 +493,17 @@ G_M25508_IG45:        ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=0048 {rbx rsi},
        jne      G_M25508_IG30
        mov      rcx, qword ptr [rbx]
        xorps    xmm0, xmm0
-       mov      rdx, rcx
-       shr      rdx, 1
-       mov      r8d, ecx
-       and      r8d, 1
-       or       r8, rdx
+       mov      r8, rcx
+       shr      r8, 1
+       mov      edx, ecx
+       and      edx, 1
+       or       rdx, r8
        test     rcx, rcx
-       cmovns   r8, rcx
-       cvtsi2sd xmm0, r8
+       cmovns   rdx, rcx
+       cvtsi2sd xmm0, rdx
        jns      SHORT G_M25508_IG46
        addsd    xmm0, xmm0
-						;; size=49 bbWeight=2 PerfScore 31.17
+						;; size=47 bbWeight=2 PerfScore 31.17
 G_M25508_IG46:        ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=0048 {rbx rsi}, byref
        movsd    qword ptr [rsi], xmm0
        jmp      G_M25508_IG35
@@ -792,7 +792,7 @@ RWD176 	dd	G_M25508_IG66 - G_M25508_IG02
        	dd	G_M25508_IG64 - G_M25508_IG02
 
 
-; Total bytes of code 1573, prolog size 35, PerfScore 895.89, instruction count 429, allocated bytes for code 1573 (MethodHash=b09f9c5b) for method System.Array:CopyImplPrimitiveTypeWithWidening(System.Array,int,System.Array,int,int,ubyte) (FullOpts)
+; Total bytes of code 1569, prolog size 35, PerfScore 895.89, instruction count 429, allocated bytes for code 1569 (MethodHash=b09f9c5b) for method System.Array:CopyImplPrimitiveTypeWithWidening(System.Array,int,System.Array,int,int,ubyte) (FullOpts)
 ; ============================================================
 
 Unwind Info:
+0 (0.00%) : 16035.dasm - System.Number:NumberToFloatingPointBits[double](byref):ulong (FullOpts)
@@ -194,14 +194,14 @@ G_M60445_IG17:        ; bbWeight=0.50, epilog, nogc, extend
 						;; size=11 bbWeight=0.50 PerfScore 1.88
 G_M60445_IG18:        ; bbWeight=0.50, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref, isz
        vxorps   xmm0, xmm0, xmm0
-       mov      rcx, rax
-       shr      rcx, 1
-       mov      edx, eax
-       and      edx, 1
-       or       rdx, rcx
+       mov      rdx, rax
+       shr      rdx, 1
+       mov      ecx, eax
+       and      ecx, 1
+       or       rcx, rdx
        test     rax, rax
-       cmovns   rdx, rax
-       vcvtsi2sd xmm0, rdx
+       cmovns   rcx, rax
+       vcvtsi2sd xmm0, rcx
        jns      SHORT G_M60445_IG19
        vaddsd   xmm0, xmm0
 						;; size=36 bbWeight=0.50 PerfScore 6.17
+0 (0.00%) : 22054.dasm - System.Linq.Expressions.Interpreter.NumericConvertInstruction+Unchecked:ConvertUInt64(ulong):System.Object:this (FullOpts)
@@ -75,14 +75,14 @@ G_M42680_IG05:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
        ; gcrRegs +[rax]
        ; gcr arg pop 0
        xorps    xmm0, xmm0
-       mov      rcx, rbx
-       shr      rcx, 1
-       mov      edx, ebx
-       and      edx, 1
-       or       rdx, rcx
+       mov      rdx, rbx
+       shr      rdx, 1
+       mov      ecx, ebx
+       and      ecx, 1
+       or       rcx, rdx
        test     rbx, rbx
-       cmovns   rdx, rbx
-       cvtsi2sd xmm0, rdx
+       cmovns   rcx, rbx
+       cvtsi2sd xmm0, rcx
        jns      SHORT G_M42680_IG06
        addsd    xmm0, xmm0
 						;; size=47 bbWeight=0.50 PerfScore 6.92
@@ -97,14 +97,14 @@ G_M42680_IG07:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
        ; gcrRegs +[rax]
        ; gcr arg pop 0
        xorps    xmm0, xmm0
-       mov      rcx, rbx
-       shr      rcx, 1
-       mov      edx, ebx
-       and      edx, 1
-       or       rdx, rcx
+       mov      rdx, rbx
+       shr      rdx, 1
+       mov      ecx, ebx
+       and      ecx, 1
+       or       rcx, rdx
        test     rbx, rbx
-       cmovns   rdx, rbx
-       cvtsi2ss xmm0, rdx
+       cmovns   rcx, rbx
+       cvtsi2ss xmm0, rcx
        jns      SHORT G_M42680_IG08
        addss    xmm0, xmm0
 						;; size=47 bbWeight=0.50 PerfScore 6.92
+0 (0.00%) : 18129.dasm - System.InvokeUtils:ConvertOrWidenPrimitivesEnumsAndPointersIfPossible(System.Object,ulong,int,byref):System.Exception (FullOpts)
@@ -463,14 +463,14 @@ G_M48359_IG36:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0088 {rbx rd
        jne      SHORT G_M48359_IG38
        mov      rcx, qword ptr [rdi]
        xorps    xmm0, xmm0
-       mov      rax, rcx
-       shr      rax, 1
-       mov      edx, ecx
-       and      edx, 1
-       or       rdx, rax
+       mov      rdx, rcx
+       shr      rdx, 1
+       mov      eax, ecx
+       and      eax, 1
+       or       rax, rdx
        test     rcx, rcx
-       cmovns   rdx, rcx
-       cvtsi2ss xmm0, rdx
+       cmovns   rax, rcx
+       cvtsi2ss xmm0, rax
        jns      SHORT G_M48359_IG37
        addss    xmm0, xmm0
 						;; size=44 bbWeight=0.50 PerfScore 7.79
@@ -485,14 +485,14 @@ G_M48359_IG38:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0088 {rbx rd
        jne      G_M48359_IG14
        mov      rcx, qword ptr [rdi]
        xorps    xmm0, xmm0
-       mov      rax, rcx
-       shr      rax, 1
-       mov      edx, ecx
-       and      edx, 1
-       or       rdx, rax
+       mov      rdx, rcx
+       shr      rdx, 1
+       mov      eax, ecx
+       and      eax, 1
+       or       rax, rdx
        test     rcx, rcx
-       cmovns   rdx, rcx
-       cvtsi2sd xmm0, rdx
+       cmovns   rax, rcx
+       cvtsi2sd xmm0, rax
        jns      SHORT G_M48359_IG39
        addsd    xmm0, xmm0
 						;; size=48 bbWeight=0.50 PerfScore 7.79
+0 (0.00%) : 22059.dasm - System.Linq.Expressions.Interpreter.NumericConvertInstruction+Checked:ConvertUInt64(ulong):System.Object:this (FullOpts)
@@ -75,14 +75,14 @@ G_M29411_IG05:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
        ; gcrRegs +[rax]
        ; gcr arg pop 0
        xorps    xmm0, xmm0
-       mov      rcx, rbx
-       shr      rcx, 1
-       mov      edx, ebx
-       and      edx, 1
-       or       rdx, rcx
+       mov      rdx, rbx
+       shr      rdx, 1
+       mov      ecx, ebx
+       and      ecx, 1
+       or       rcx, rdx
        test     rbx, rbx
-       cmovns   rdx, rbx
-       cvtsi2sd xmm0, rdx
+       cmovns   rcx, rbx
+       cvtsi2sd xmm0, rcx
        jns      SHORT G_M29411_IG06
        addsd    xmm0, xmm0
 						;; size=47 bbWeight=0.50 PerfScore 6.92
@@ -97,14 +97,14 @@ G_M29411_IG07:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
        ; gcrRegs +[rax]
        ; gcr arg pop 0
        xorps    xmm0, xmm0
-       mov      rcx, rbx
-       shr      rcx, 1
-       mov      edx, ebx
-       and      edx, 1
-       or       rdx, rcx
+       mov      rdx, rbx
+       shr      rdx, 1
+       mov      ecx, ebx
+       and      ecx, 1
+       or       rcx, rdx
        test     rbx, rbx
-       cmovns   rdx, rbx
-       cvtsi2ss xmm0, rdx
+       cmovns   rcx, rbx
+       cvtsi2ss xmm0, rcx
        jns      SHORT G_M29411_IG08
        addss    xmm0, xmm0
 						;; size=47 bbWeight=0.50 PerfScore 6.92
+0 (0.00%) : 2277.dasm - System.Number:NumberToFloatingPointBits[double](byref):ulong (FullOpts)
@@ -194,14 +194,14 @@ G_M60445_IG17:        ; bbWeight=0.50, epilog, nogc, extend
 						;; size=11 bbWeight=0.50 PerfScore 1.88
 G_M60445_IG18:        ; bbWeight=0.50, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref, isz
        xorps    xmm0, xmm0
-       mov      rcx, rax
-       shr      rcx, 1
-       mov      edx, eax
-       and      edx, 1
-       or       rdx, rcx
+       mov      rdx, rax
+       shr      rdx, 1
+       mov      ecx, eax
+       and      ecx, 1
+       or       rcx, rdx
        test     rax, rax
-       cmovns   rdx, rax
-       cvtsi2sd xmm0, rdx
+       cmovns   rcx, rax
+       cvtsi2sd xmm0, rcx
        jns      SHORT G_M60445_IG19
        addsd    xmm0, xmm0
 						;; size=35 bbWeight=0.50 PerfScore 6.17

Size improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same size | Improvements (bytes) | Regressions (bytes) |
| --- | --- | --- | --- | --- | --- | --- |
| smoke_tests.nativeaot.windows.x64.checked.mch | 11 | 5 | 0 | 6 | -20 | +0 |

PerfScore improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same PerfScore | Improvements (PerfScore) | Regressions (PerfScore) | PerfScore Overall in FullOpts |
| --- | --- | --- | --- | --- | --- | --- | --- |
| smoke_tests.nativeaot.windows.x64.checked.mch | 11 | 0 | 0 | 11 | 0.00% | 0.00% | 0.0000% |

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
| --- | --- | --- | --- | --- | --- |
| smoke_tests.nativeaot.windows.x64.checked.mch | 25,335 | 13 | 25,322 | 0 (0.00%) | 0 (0.00%) |

jit-analyze output

@khushal1996
Member Author

> added minor comments

I have made all the necessary changes. Let me know if you have any other concerns.

For testing, I ran SuperPMI replay with APX on/off and JitStressRegs=4000 to check for failures. All replays are clean.

Member

@kunalspathak kunalspathak left a comment

LGTM

@kunalspathak
Member

/ba-g unrelated failures

@kunalspathak kunalspathak merged commit 13c30d4 into dotnet:main May 27, 2025
110 of 113 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Jun 27, 2025
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
5 participants