Assert failure: CheckInstrBytePattern(base[offset] & 0xF8, 0x58, base[offset]) #115120

Open
kunalspathak opened this issue Apr 28, 2025 · 18 comments
Labels
area-VM-coreclr; blocking-clean-ci-optional (Blocking optional rolling runs); untriaged (New issue has not been triaged by the area owner)

Comments

@kunalspathak
Member

runtime-coreclr gcstress-extra: https://dev.azure.com/dnceng-public/public/_build/results?buildId=1027838&view=ms.vss-test-web.build-test-results-tab

set DOTNET_TieredCompilation=0
set DOTNET_GCStress=0xC
set DOTNET_ReadyToRun=0

23:18:11.365 Running test: JIT/Methodical/flowgraph/dev10_bug642944/GCMaskForGSCookie/GCMaskForGSCookie.dll

Assert failure(PID 6256 [0x00001870], Thread: 1688 [0x0698]): CheckInstrBytePattern(base[offset] & 0xF8, 0x58, base[offset])

CORECLR! SKIP_POP_REG + 0x6F (0x71462b1f)
CORECLR! UnwindEbpDoubleAlignFrameEpilog + 0xF6 (0x71462ef6)
CORECLR! UnwindStackFrameX86 + 0x61 (0x714638f1)
CORECLR! EECodeManager::UnwindStackFrame + 0x113 (0x71463843)
CORECLR! DoGcStress + 0x207 (0x71651fce)
CORECLR! OnGcCoverageInterrupt + 0x12D (0x71652594)
CORECLR! IsGcMarker + 0xA8 (0x7147a04c)
CORECLR! CLRVectoredExceptionHandlerShim + 0xB4 (0x7146ef34)
NTDLL! LdrSetAppCompatDllRedirectionCallback + 0x1B825 (0x771cf2e5)
NTDLL! RtlUnwind + 0x1BA (0x7718ea7a)
    File: D:\a\_work\1\s\src\coreclr\vm\gc_unwind_x86.inl:2101
    Image: C:\h\w\9B4808E2\p\corerun.exe
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Apr 28, 2025
@kunalspathak kunalspathak added blocking-clean-ci-optional Blocking optional rolling runs area-VM-coreclr untriaged New issue has not been triaged by the area owner and removed area-VM-coreclr untriaged New issue has not been triaged by the area owner labels Apr 28, 2025
Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

@mangod9
Member

mangod9 commented Apr 28, 2025

Is this failing only on x86?

@kunalspathak
Member Author

ping @mangod9

@mangod9
Member

mangod9 commented May 6, 2025

@janvorli @cshung, were there any recent changes in x86 stack walking related to your interpreter changes? I don't believe so, but I'm asking since you have both been looking into this code recently.

@janvorli
Member

janvorli commented May 6, 2025

There were some possibly related changes made by @filipnavara recently.

@filipnavara
Member

Do we have some commit range where this may have started happening? While I did touch some of the code, I didn't make any conscious changes for the current default runtime configuration (i.e., no FEATURE_EH_FUNCLETS defined).

@janvorli
Member

janvorli commented May 6, 2025

I've taken a look at the dump. This is the code of the method that it is unwinding from (the actual location is marked):

0:000> !U /d 0bdf6778
Normal JIT generated code
MS_jumps1_il.VT.ToStringHelper()
ilAddr is 0BEC20B0 pImport is 07D30990
Begin 0BDF6778, size 91
0bdf6778 55              push    ebp
0bdf6779 8bec            mov     ebp,esp
0bdf677b 56              push    esi
0bdf677c 50              push    eax
0bdf677d 894df8          mov     dword ptr [ebp-8],ecx
0bdf6780 8b4df8          mov     ecx,dword ptr [ebp-8]
0bdf6783 8b09            mov     ecx,dword ptr [ecx]
0bdf6785 badcffab08      mov     edx,8ABFFDCh ("->ToStringHelper")
0bdf678a ff15b878e609    call    dword ptr ds:[9E678B8h] (System.String.Concat(System.String, System.String), mdToken: 060009C6)
0bdf6790 8b55f8          mov     edx,dword ptr [ebp-8]
0bdf6793 e8ccc21a66      call    coreclr!JIT_CheckedWriteBarrierEAX (71fa2a64)
0bdf6798 8b45f8          mov     eax,dword ptr [ebp-8]
0bdf679b 8b4004          mov     eax,dword ptr [eax+4]
0bdf679e 83f802          cmp     eax,2
0bdf67a1 774a            ja      0bdf67ed
0bdf67a3 8d0d1068df0b    lea     ecx,ds:[0BDF6810h]
0bdf67a9 8b0c81          mov     ecx,dword ptr [ecx+eax*4]
0bdf67ac 8d158067df0b    lea     edx,ds:[0BDF6780h]
0bdf67b2 03ca            add     ecx,edx
0bdf67b4 ffe1            jmp     ecx
0bdf67b6 8b45f8          mov     eax,dword ptr [ebp-8]
0bdf67b9 33c9            xor     ecx,ecx
0bdf67bb 894804          mov     dword ptr [eax+4],ecx
0bdf67be 8b4df8          mov     ecx,dword ptr [ebp-8]
0bdf67c1 8d65fc          lea     esp,[ebp-4]
0bdf67c4 5e              pop     esi (gcstress)
0bdf67c5 5d              pop     ebp (gcstress)
0bdf67c6 ff25a864b90b    jmp     dword ptr ds:[0BB964A8h] (gcstress)
0bdf67cc 8b45f8          mov     eax,dword ptr [ebp-8]
0bdf67cf 33c9            xor     ecx,ecx
0bdf67d1 894804          mov     dword ptr [eax+4],ecx
0bdf67d4 8b4df8          mov     ecx,dword ptr [ebp-8]
0bdf67d7 8d65fc          lea     esp,[ebp-4]
0bdf67da 5e              pop     esi (gcstress)
0bdf67db 5d              pop     ebp (gcstress)
0bdf67dc ff259064b90b    jmp     dword ptr ds:[0BB96490h] (gcstress)
0bdf67e2 b80c00ac08      mov     eax,8AC000Ch ("VT")
0bdf67e7 8d65fc          lea     esp,[ebp-4]
>>> 0bdf67ea 5e              pop     esi
0bdf67eb 5d              pop     ebp (gcstress)
0bdf67ec c3              ret (gcstress)
0bdf67ed b978baa308      mov     ecx,8A3BA78h (MT: System.Exception)
0bdf67f2 e81dc8bff7      call    039f3014 (JitHelp: CORINFO_HELP_NEWSFAST)
0bdf67f7 8bf0            mov     esi,eax
0bdf67f9 8bce            mov     ecx,esi
0bdf67fb ff15c87ae609    call    dword ptr ds:[9E67AC8h] (gcstress)
0bdf6801 8bce            mov     ecx,esi
0bdf6803 e8981efb65      call    coreclr!IL_Throw (71da86a0)
0bdf6808 cc              int     3

It ends up calling UnwindEbpDoubleAlignFrameEpilog with the epilogBase pointing to a copy of the code from 0bdf67e7 above, so it looks like this:

0be67a0f 8d65fc          lea     esp,[ebp-4]
0be67a12 5e              pop     esi
0be67a13 5d              pop     ebp
0be67a14 c3              ret

At that point, it tries to invoke SKIP_POP_REG, but as you can see, the instruction is lea esp,[ebp-4] and so the assert fails.
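
For reference, the failing check can be reduced to a one-line opcode test. The following is a minimal sketch, not the runtime's code: `IsPopReg32` is a hypothetical helper that mirrors only the `base[offset] & 0xF8` comparison against `0x58` from the assert message, applied to the bytes of the epilog copy above.

```cpp
#include <cstdint>

// One-byte "pop r32" opcodes are 0x58 + register number (0x58..0x5F),
// so masking off the low three bits must leave exactly 0x58.
// Hypothetical helper name; it only mirrors the comparison in the assert.
constexpr bool IsPopReg32(std::uint8_t opcode)
{
    return (opcode & 0xF8) == 0x58;
}

// Bytes taken from the epilog copy in the dump.
constexpr std::uint8_t kLeaEspEbpMinus4[] = { 0x8D, 0x65, 0xFC }; // lea esp,[ebp-4]
constexpr std::uint8_t kPopEsi = 0x5E;                            // pop esi
constexpr std::uint8_t kPopEbp = 0x5D;                            // pop ebp

static_assert(IsPopReg32(kPopEsi), "pop esi matches the pattern");
static_assert(IsPopReg32(kPopEbp), "pop ebp matches the pattern");
static_assert(!IsPopReg32(kLeaEspEbpMinus4[0]), "0x8D & 0xF8 is 0x88, not 0x58");
```

The first byte of the lea, 0x8D, masks to 0x88, which is why a check that expects a register pop at that offset cannot succeed.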
Looking at the code of the UnwindEbpDoubleAlignFrameEpilog and how it got to the problematic spot:

    bool needLea = false;

    if (info->localloc)
    {
        // ESP may be variable if a localloc was actually executed. We will reset it.
        //    lea esp, [ebp-calleeSavedRegs]
        needLea = true;
    }
    else if (info->savedRegsCountExclFP == 0)
    {
        // We will just generate "mov esp, ebp" and be done with it.
        if (info->rawStkSize != 0)
        {
            needMovEspEbp = true;
        }
    }
    else if (info->rawStkSize == 0)
    {
        // do nothing before popping the callee-saved registers
    }
    else if (info->rawStkSize == sizeof(void*))
    {
        // "pop ecx" will make ESP point to the callee-saved registers
        if (!InstructionAlreadyExecuted(offset, info->epilogOffs))
            ESP += sizeof(void*);
        offset = SKIP_POP_REG(epilogBase, offset);
    }
    else
    {
        // We need to make ESP point to the callee-saved registers
        //    lea esp, [ebp-calleeSavedRegs]
        needLea = true;
    }

    if (needLea)
    {
        // lea esp, [ebp-calleeSavedRegs]
        unsigned calleeSavedRegsSize = info->savedRegsCountExclFP * sizeof(void*);
        if (!InstructionAlreadyExecuted(offset, info->epilogOffs))
            ESP = GetRegdisplayFP(pContext) - calleeSavedRegsSize;
        offset = SKIP_LEA_ESP_EBP(-int(calleeSavedRegsSize), epilogBase, offset);
    }

The relevant info members are set as follows:

localloc = 0
savedRegsCountExclFP = 1
rawStkSize = 4

So due to info->rawStkSize == sizeof(void*), we get to the branch where we expect a register pop, but there is the lea instead.
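
The branch selection can be replayed in isolation. The sketch below is an illustration only: `EpilogInfo`, `ExpectedFirstInstr`, and `ExpectedEpilogStart` are hypothetical names that model just the three fields and the if/else chain quoted above, and the pointer size is hard-coded to 4 so the result does not depend on the host.

```cpp
// Hypothetical stand-in for the three fields of the unwinder's info
// structure discussed above; the real structure has many more members.
struct EpilogInfo
{
    bool     localloc;
    unsigned savedRegsCountExclFP;
    unsigned rawStkSize;
};

// What the unwinder expects to find before the callee-saved register pops.
enum class ExpectedFirstInstr { LeaEspEbp, MovEspEbp, Nothing, PopReg };

// 32-bit x86 pointer size, fixed so the sketch behaves the same on any host.
constexpr unsigned kX86PointerSize = 4;

// Mirrors the order of the branches in the quoted unwinder excerpt.
inline ExpectedFirstInstr ExpectedEpilogStart(const EpilogInfo& info)
{
    if (info.localloc)
        return ExpectedFirstInstr::LeaEspEbp;
    if (info.savedRegsCountExclFP == 0)
        return info.rawStkSize != 0 ? ExpectedFirstInstr::MovEspEbp
                                    : ExpectedFirstInstr::Nothing;
    if (info.rawStkSize == 0)
        return ExpectedFirstInstr::Nothing;
    if (info.rawStkSize == kX86PointerSize)
        return ExpectedFirstInstr::PopReg;   // the "pop ecx" branch
    return ExpectedFirstInstr::LeaEspEbp;
}
```

With `EpilogInfo{false, 1, 4}`, the values read from the dump, this returns `ExpectedFirstInstr::PopReg`, while the generated epilog actually starts with the lea form.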

So it seems to me some change in x86 codegen is probably breaking the unwinding code assumptions. @dotnet/jit-contrib, were there any recent changes in this area?

@kunalspathak
Member Author

@filipnavara - is it related to #115089?

@kunalspathak
Member Author

> Do we have some commit range where this may have started happening?

It first started failing on 4/26

@filipnavara
Member

filipnavara commented May 6, 2025

> is it related to #115089?

That PR affects only built-in COM interop and not JIT codegen. I don't see anything related in the failing tests.

> It first started failing on 4/26

I briefly checked the commits from that day. Nothing really stood out, although the async2 VM merge landed in that timeframe.

@kunalspathak
Member Author

Alright, found the commit range to be 77f71dc...4144fda

@filipnavara
Member

I assume you mean 4144fda...77f71dc (switched the start/end)

Thanks, that's helpful!

@filipnavara
Member

JFYI, my commits:

  • a2779f6 was fixing some off-by-one errors in instruction pointers, but at that time the code path was only used for FEATURE_EH_FUNCLETS on x86 (not enabled). It's now used for all configurations in main, but that was done only a few days ago, long after the first failure.
  • 97b50b0 should affect only FEATURE_EH_FUNCLETS platforms, so once again should not be relevant to x86 (yet).

@janvorli
Member

janvorli commented May 6, 2025

The commit range is still too large to figure out a potential culprit. @kunalspathak it would be great to see what the code generated for the method looked like before the range you've mentioned. Maybe seeing the diff would help us better understand which change could have done that.

@BruceForstall
Member

It looks like the dump is from running JIT/Methodical/VT/callconv/jumps1.il (not the test listed at the top, JIT/Methodical/flowgraph/dev10_bug642944/GCMaskForGSCookie/GCMaskForGSCookie.dll)

I'm guessing the change that caused this is #114899.

@jakobbotsch What was the purpose of this change?

@janvorli Should the x86 unwinder be able to handle either the "pop ecx" or "lea" form, and adjust appropriately?

@jakobbotsch
Member

jakobbotsch commented May 6, 2025

The change is for the potential case where there is a jmp IL instruction and pop ecx would overwrite the argument value. We can switch the JIT to keep emitting pop ecx in the epilogs that aren't actually doing any jumping, but I am unsure if that will fix the problem (do we then just hit it in the epilogs that do jump instead?).

Note that we already had a similar case that was handled in the same way:

    if (frameSize > 0)
    {
#ifdef TARGET_X86
        // Add 'compiler->compLclFrameSize' to ESP. Use "pop ECX" for that, except in cases
        // where ECX may contain some state.
        if ((frameSize == TARGET_POINTER_SIZE) && !compiler->compJmpOpUsed && !compiler->compIsAsync())
        {
            inst_RV(INS_pop, REG_ECX, TYP_I_IMPL);
            regSet.verifyRegUsed(REG_ECX);
        }
        else
#endif // TARGET_X86
        {
            // Add 'compiler->compLclFrameSize' to ESP
            // Generate "add esp, <stack-size>"
            inst_RV_IV(INS_add, REG_SPBASE, frameSize, EA_PTRSIZE);
        }
    }

Here the other case emits add esp, 4 instead of pop ecx though, which I suppose gets handled by the VM.
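
To make the trade-off between the three forms concrete, here is a small register-level simulation. It is a sketch under stated assumptions: `Regs`, `PopEcx`, `AddEsp`, and `LeaEspEbpMinus` are hypothetical names, memory is not modeled (the popped value is passed in), and the starting values are made up to match the frame shape in the dump (one callee-saved register at [ebp-4], one 4-byte local at [ebp-8]).

```cpp
#include <cstdint>

// Only the registers relevant to the epilog forms discussed above.
struct Regs
{
    std::uint32_t esp;
    std::uint32_t ebp;
    std::uint32_t ecx;
};

// pop ecx: loads the dword at [esp] into ECX, then adds 4 to ESP.
inline Regs PopEcx(Regs r, std::uint32_t valueAtTopOfStack)
{
    r.ecx = valueAtTopOfStack;
    r.esp += 4;
    return r;
}

// add esp, imm: adjusts ESP only.
inline Regs AddEsp(Regs r, std::uint32_t imm)
{
    r.esp += imm;
    return r;
}

// lea esp, [ebp-disp]: recomputes ESP from EBP.
inline Regs LeaEspEbpMinus(Regs r, std::uint32_t disp)
{
    r.esp = r.ebp - disp;
    return r;
}
```

Starting from `Regs{0x0FF8, 0x1000, 0x1234}`, all three leave ESP at 0x0FFC, pointing at the saved ESI slot, but only `PopEcx` changes ECX. That is the state a jmp-style epilog needs to preserve, and each form has a different byte encoding that the unwinder has to recognize.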

@janvorli
Member

janvorli commented May 6, 2025

> Should the x86 unwinder be able to handle either the "pop ecx" or "lea" form, and adjust appropriately?

Yes, I think the unwinder should be modified to accommodate what the JIT generates. It is good that we now understand where the change comes from and that it was intentional.
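
One possible shape for such a tolerant check is sketched below. This is not the actual fix and not runtime code: `FrameAdjust`, `Decoded`, and `DecodeFrameAdjust` are hypothetical names, only the one-byte `pop r32` form and the three-byte `lea esp,[ebp+disp8]` form (8D 65 disp8, as seen in the dump) are recognized, and the ESP bookkeeping the real unwinder performs is left out.

```cpp
#include <cstddef>
#include <cstdint>

enum class FrameAdjust { PopReg, LeaEspEbp, Unrecognized };

struct Decoded
{
    FrameAdjust kind;
    std::size_t length;   // bytes consumed; 0 when unrecognized
};

// Hypothetical decoder: accepts either form at the point where the
// local frame is released, instead of asserting on a single pattern.
inline Decoded DecodeFrameAdjust(const std::uint8_t* code, std::size_t size, std::size_t offset)
{
    const std::size_t remaining = (offset < size) ? size - offset : 0;

    // pop r32: opcodes 0x58..0x5F
    if (remaining >= 1 && (code[offset] & 0xF8) == 0x58)
        return { FrameAdjust::PopReg, 1 };

    // lea esp, [ebp+disp8]: opcode 0x8D, ModRM 0x65 (mod=01, reg=esp, rm=ebp), disp8
    if (remaining >= 3 && code[offset] == 0x8D && code[offset + 1] == 0x65)
        return { FrameAdjust::LeaEspEbp, 3 };

    return { FrameAdjust::Unrecognized, 0 };
}
```

A caller would advance its offset by `length` and pick the matching ESP update for `kind`. Whether the real unwinder should decode the bytes like this or be driven by extra information from the JIT is a design question this sketch does not settle.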
