Are !UseThreadAllocationContexts branches relevant? #115102

am11 opened this issue Apr 27, 2025 · 17 comments
Labels
area-GC-coreclr, question, untriaged

Comments

@am11
Member

am11 commented Apr 27, 2025

s_useThreadAllocationContexts is only ever false on win-x86/x64 with a single processor:

s_useThreadAllocationContexts = !useGlobalAllocationContext && (IsServerHeap() || ::g_SystemInfo.dwNumberOfProcessors != 1 || CPUGroupInfo::CanEnableGCCPUGroups());
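The expression above can be read as a small predicate. The following is a minimal sketch that restates it as standalone C++ so the cases are easy to enumerate. The struct and function names are hypothetical, and the fields merely stand in for the runtime queries named in the quoted line; this is not the runtime's code.

```cpp
#include <cassert>

// Hypothetical stand-ins for the values used in the quoted expression.
struct AllocContextInputs {
    bool useGlobalAllocationContext; // value the runtime computed earlier
    bool isServerHeap;               // stands in for IsServerHeap()
    unsigned numberOfProcessors;     // stands in for g_SystemInfo.dwNumberOfProcessors
    bool canEnableGCCPUGroups;       // stands in for CPUGroupInfo::CanEnableGCCPUGroups()
};

// Mirrors the boolean structure of the quoted expression.
inline bool UseThreadAllocationContexts(const AllocContextInputs& in) {
    assert(in.numberOfProcessors >= 1); // a machine has at least one processor
    return !in.useGlobalAllocationContext &&
           (in.isServerHeap || in.numberOfProcessors != 1 || in.canEnableGCCPUGroups);
}
```

Under these assumptions the result is false whenever the global context was requested, or when all three of the following hold: workstation GC, exactly one processor, and no GC CPU groups.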

In some places they have specialized optimizations, e.g. this win-x64 only branch:

else
{
    // Replace the 1p slow allocation helpers with faster version
    //
    // When we're running Workstation GC on a single proc box we don't have
    // InlineGetThread versions because there is no need to call GetThread
    SetJitHelperFunction(CORINFO_HELP_NEWSFAST, JIT_TrialAllocSFastSP);
    SetJitHelperFunction(CORINFO_HELP_NEWSFAST_ALIGN8, JIT_TrialAllocSFastSP);
    SetJitHelperFunction(CORINFO_HELP_BOX, JIT_BoxFastUP);
    SetJitHelperFunction(CORINFO_HELP_NEWARR_1_VC, JIT_NewArr1VC_UP);
    SetJitHelperFunction(CORINFO_HELP_NEWARR_1_OBJ, JIT_NewArr1OBJ_UP);
    ECall::DynamicallyAssignFCallImpl(GetEEFuncEntryPoint(AllocateStringFastUP), ECall::FastAllocateString);
}

Are these relevant? If not, we can delete these paths.

am11 added the area-VM-coreclr and question labels Apr 27, 2025
dotnet-policy-service bot added the untriaged label Apr 27, 2025
Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

@am11
Member Author

am11 commented Apr 27, 2025

I was looking at JIT_Box for HMF (HelperMethodFrame) removal and found that JIT_BoxFastUP is yet another variant, which also falls back to the (allocating) JIT_Box when it fails. JIT_BoxFastUP is supposed to be a "fast" helper for this case, but for some reason it is defined in a file called JitHelpers_Slow.asm.

JIT_Box variants:

  • JIT_Box - allocates the object, then unboxes and copies the data.
  • JIT_Box_MP_FastPortable - uses the allocation context and falls back to JIT_Box. All platforms except win-x86, linux-x86 and single-proc win-x64 switch to this during helper initialization (not sure why it's not just the default?).
  • JIT_BoxFastUP - win-x64 only; does much the same as JIT_Box_MP_FastPortable in asm code, except without GetThread, and also falls back to JIT_Box.
  • GenBox - x86 variant src/coreclr/vm/i386/jitinterfacex86.cpp
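
The shape shared by the "fast" variants in this list (try the allocation context, otherwise fall back to the allocating helper) can be sketched roughly as below. This is a simplified illustration, not the runtime's implementation: AllocContext, FastBox and SlowBox are hypothetical names, malloc stands in for the GC slow path, and the object header / MethodTable write is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical, simplified allocation context: a bump-pointer window.
struct AllocContext {
    std::uint8_t* alloc_ptr;   // next free byte in the window
    std::uint8_t* alloc_limit; // one past the last usable byte
};

// Hypothetical slow path standing in for the allocating JIT_Box helper.
inline void* SlowBox(const void* value, std::size_t size) {
    void* obj = std::malloc(size);
    assert(obj != nullptr);
    std::memcpy(obj, value, size);
    return obj;
}

// Fast path: bump the pointer if the value fits, otherwise fall back.
inline void* FastBox(AllocContext& ctx, const void* value, std::size_t size) {
    std::size_t remaining = static_cast<std::size_t>(ctx.alloc_limit - ctx.alloc_ptr);
    if (remaining < size) {
        return SlowBox(value, size); // window exhausted: take the slow path
    }
    void* obj = ctx.alloc_ptr;
    ctx.alloc_ptr += size;           // reserve the bytes
    std::memcpy(obj, value, size);   // copy the value into the new object
    return obj;
}
```

In this sketch the variants would differ only in where `ctx` comes from (a per-thread context versus a single global one), which is the distinction the question is about.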

If UseThreadAllocationContexts is not relevant, we can start by cleaning that up as part of JIT_Box non-HMF implementation.

cc @jkotas, @filipnavara

@EgorBo
Member

EgorBo commented Apr 27, 2025

Are these relevant? If not, we can delete these paths.

A global allocation context avoids slow TLS loads in allocators (workstation GC, a single or low number of CPU cores). It would make sense to benchmark it for docker/linux, where assigning just 1 CPU core is a popular case, but the current logic only supports Windows.
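
To make the TLS point concrete, here is a rough sketch of the two lookups. All names are hypothetical, the struct is a simplified stand-in for the runtime's allocation context, and any synchronization the runtime needs around a shared global context is deliberately not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified allocation context.
struct AllocContext {
    std::uint8_t* alloc_ptr;
    std::uint8_t* alloc_limit;
};

// Per-thread context: every access goes through a thread-local lookup.
thread_local AllocContext t_allocContext{nullptr, nullptr};

// Single global context: a plain static address, no TLS lookup.
// Serializing access to it is the runtime's job and is omitted here.
AllocContext g_allocContext{nullptr, nullptr};

// Picks the context the way a flag like s_useThreadAllocationContexts would.
inline AllocContext& CurrentAllocContext(bool useThreadAllocationContexts) {
    return useThreadAllocationContexts ? t_allocContext : g_allocContext;
}

// Bytes still available in the context's window.
inline std::size_t RemainingBytes(const AllocContext& ctx) {
    assert(ctx.alloc_limit >= ctx.alloc_ptr);
    return static_cast<std::size_t>(ctx.alloc_limit - ctx.alloc_ptr);
}
```

Whether skipping the thread-local lookup is measurable on current hardware is exactly what would need benchmarking.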

@jkotas
Member

jkotas commented Apr 27, 2025

This is a question for @dotnet/gc. They wanted to keep this special path alive in the past.

Contributor

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

@filipnavara
Member

filipnavara commented Apr 27, 2025

Are these relevant?

Purely from the support perspective: Windows 10 is the last version of Windows to support uniprocessor systems, and it goes out of support on October 14, 2025. The last Intel single-core processor was the Celeron G4xx, released in 2013.

The code paths don't seem to trigger for containers restricted to a single processor. That's basically the only use case where I would consider this relevant. Then again, that's more relevant on Linux, where we currently don't have this code path enabled.

I'm a bit torn on this. As @EgorBo said, it would be interesting to see if this makes any difference for containers but that's a separate effort...

@jkotas
Member

jkotas commented Apr 27, 2025

docker/linux where it's a popular case when just 1 cpu core is assigned,

This path requires 1 affinitized core assigned. The popular docker case is 1 non-affinitized core assigned. This special path is not usable for that.

@filipnavara
Member

filipnavara commented Apr 27, 2025

  • GenBox - x86 variant src/coreclr/vm/i386/jitinterfacex86.cpp

I don't see many good reasons to keep x86 special. I'd be in favor of moving this to .S/.asm files, possibly even sharing some of it with NativeAOT. Then again, that's a separate task. If we could reduce the number of code paths, it would save some effort on such consolidation. (This comment is not about Box specifically: NativeAOT has the implementation in managed code, CoreCLR has it in C++, and the portable versions may be good enough.)

@am11
Member Author

am11 commented Apr 27, 2025

I don't see many good reasons to keep x86 special. I'd be in favor of moving this to .S/.asm files, possibly even sharing some of it with NativeAOT.

That is one possibility, and while we're at it, we could replace JIT_Box_MP_FastPortable with asm code, including the JIT_Box implementation. The JIT_BoxFastUP implementation in JitHelpers_Slow.asm suggests that we may be able to use ASM offsets similar to AllocFast.S/asm in NativeAOT.

Alternatively..
AFAICT, the only reason JIT_Box requires HMF is the call to AllocateObject(). Other than IL_Throw/Rethrow (which you have already ported with win-x86 funclets), all remaining HMFs are related to Allocate{Object,String,Array} and FOH (#95695). We already have InternalAlloc and InternalAllocNoChecks exposed in RuntimeHandles.cs, perhaps we can implement JIT_Box in C# (following #109135)?

@filipnavara
Member

perhaps we can implement JIT_Box in C#

Sounds reasonable to me. It will take some effort and benchmarking but I don't expect the code to be too dissimilar to RhBox in NativeAOT.

@am11
Member Author

am11 commented Apr 27, 2025

Initial implementation: main...am11:runtime:feature/HMF-removal/JIT_Box

Currently throws:

$ pushd src/libraries/System.Collections/tests
$ ../../../../dotnet.sh build -t:Test         
...
===========================================================================================================
  ~/projects/runtime4/artifacts/bin/System.Collections.Tests/Debug/net10.0 ~/projects/runtime4/src/libraries/System.Collections/tests
    Discovering: System.Collections.Tests (method display = ClassAndMethod, method display options = None)
    Discovered:  System.Collections.Tests (found 7096 of 9184 test cases)
    Starting:    System.Collections.Tests (parallel test collections = on [10 threads], stop on fail = off)
  [createdump] Gathering state for process 87021 
  [createdump] Crashing thread d809ecd signal 11 (000b)
  [createdump] Writing crash report to file /coredump.87021.dmp.crashreport.json
  [createdump] Could not create json file '/coredump.87021.dmp.crashreport.json': Read-only file system (30)
  [createdump] Writing minidump with heap to file /coredump.87021.dmp
  [createdump] Target process is alive
  [createdump] Could not create output file '/coredump.87021.dmp': Read-only file system (30)
  [createdump] Failure took 1470ms
  waitpid() returned successfully (wstatus 0000ff00) WEXITSTATUS ff WTERMSIG 0
  /Users/adeel/projects/runtime4/artifacts/bin/System.Collections.Tests/Debug/net10.0/RunTests.sh: line 178: 87021 Segmentation fault: 11  "$RUNTIME_PATH/dotnet" exec --runtimeconfig System.Collections.Tests.runtimeconfig.json --depsfile System.Collections.Tests.deps.json /Users/adeel/.nuget/packages/microsoft.dotnet.xunitconsolerunner/2.9.2-beta.25225.102/build/../tools/net/xunit.console.dll System.Collections.Tests.dll -xml testResults.xml -nologo -notrait category=OuterLoop -notrait category=failing $RSP_FILE

Also, for fast alloc, we would need to figure out whether we can expose the thread-local ee_alloc_context* eeAllocContext = &t_runtime_thread_locals.alloc_context to managed code (I haven't checked whether we already have an existing helper).

@EgorBo
Member

EgorBo commented Apr 28, 2025

Initial implementation: main...am11:runtime:feature/HMF-removal/JIT_Box

I wonder if we need it at all, given that the JIT already knows how to inline boxing (but does so only in tier1); we probably just need to make it always expand.

Also, for fast alloc, we would need to figure out if we can expose thread local ee_alloc_context* eeAllocContext = &t_runtime_thread_locals.alloc_context on managed code (I haven't checked if we already have an existing helper).

We probably can, but the whole allocator has to be marked as nogc somehow.

@am11
Member Author

am11 commented Apr 28, 2025

I wonder if we don't need it at all given jit knows how to inline boxing as is (but does it only in tier1), we just need to make it always-expand probably.

That would be cleaner. We can then delete getBoxHelper and JIT_Box helpers similar to https://github.com/dotnet/runtime/pull/114754/files. :)

@jkotas
Member

jkotas commented Apr 28, 2025

we just need to make it always-expand probably.

Can we measure the Tier0 JIT throughput and code size regression from doing this? Are there pathological cases where this can generate a lot of new temps?

@EgorBo
Member

EgorBo commented Apr 28, 2025

we just need to make it always-expand probably.

Can we measure the Tier0 JIT throughput and code size regression from doing this? Are there pathological cases where this can generate a lot of new temps?

At the same time, we unroll copies even in tier0 and, unlike the helper, the JIT always knows the size of the object. But it's hard to say without measurements.
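
As a loose analogy for the "known size" point (not JIT code), compare a copy whose length is a compile-time constant with one that only learns the length at run time. The function names are hypothetical, and the C++ compiler here merely plays the role the JIT plays in the discussion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Size known at compile time: sizeof(T) is a constant, so a compiler is free
// to expand the copy inline instead of calling a general routine.
template <typename T>
inline void CopyKnownSize(void* dst, const T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "sketch only handles trivially copyable values");
    std::memcpy(dst, &value, sizeof(T));
}

// Size known only at run time: a shared helper has to handle any length.
inline void CopyRuntimeSize(void* dst, const void* src, std::size_t size) {
    assert(dst != nullptr && src != nullptr);
    std::memcpy(dst, src, size);
}
```

Both produce the same bytes; the open question raised above is what the inline form costs in Tier0 throughput and code size.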

@am11
Member Author

am11 commented Apr 28, 2025

#115134 is testing to see the impact.

@cshung
Member

cshung commented Apr 30, 2025

From a usage perspective, here is a relatively recent case where a customer was still using x86 single core back in 2022.

dotnet/diagnostics#2929

I can't remember where the dump came from - probably internal, given that I could not share it at that point.
