Log error to host #114944

Merged
merged 1 commit into from
May 2, 2025

Conversation

Maoni0
Member

@Maoni0 Maoni0 commented Apr 23, 2025

lately I've seen a couple of customer reports where GC initialization failed to reserve the default 256GB of virtual memory for the regions range. to make this easier to diagnose, I've added this as an error communicated to the host, so you would see something like this:

C:\temp>dotnet GCPerfSim.dll -tc 28 -tagb 50 -tlgb 2 -lohar 0 -sohsi 50 -lohsi 0 -pohsi 0 -sohpi 100 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time
GC: Reserving 274877906944 bytes (256 GiB) for the regions range failed, do you have a virtual memory limit set on this process?
GC heap initialization failed with error 0x8007000E
Failed to create CoreCLR, HRESULT: 0x8007000E
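
The numbers in the message above can be checked with a small sketch. It assumes `gib()` simply converts a byte count to whole GiB, and that the `GC: ` prefix is part of the final string; both are assumptions, since the PR shows only the format string and the resulting output. `format_reserve_failure` is a hypothetical name used for illustration.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in for the gib() helper named in the review:
// the number of whole GiB in a byte count (1 GiB = 2^30 bytes).
constexpr uint64_t gib (uint64_t bytes)
{
    return bytes >> 30;
}

// Rebuilds the text shown in the sample output. Where the "GC: " prefix is
// actually added (helper or host) is an assumption, not taken from the PR.
std::string format_reserve_failure (uint64_t reserve_size)
{
    char buf[256];
    std::snprintf (buf, sizeof (buf),
        "GC: Reserving %llu bytes (%llu GiB) for the regions range failed, "
        "do you have a virtual memory limit set on this process?",
        (unsigned long long)reserve_size,
        (unsigned long long)gib (reserve_size));
    return std::string (buf);
}
```

Calling `format_reserve_failure (274877906944ULL)` reproduces the first line of the output, since 274877906944 is exactly 256 * 2^30.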

also added logging at a few other places we might hit, and got rid of some that are only for private testing. I'm not adding this for every single case where initialization could fail, as most of them are really unlikely to be hit.

since the solution is to adjust some region configs I also exposed them to the runtimeconfig.

and fixed a typo I had in a config used for private testing.

@Copilot Copilot AI review requested due to automatic review settings April 23, 2025 03:09
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds enhanced error reporting for GC initialization failures by logging detailed error messages to the host. It also exposes additional region configuration values to the runtimeconfig and removes private testing code while updating internal GC logging and promotion/demotion logic.

  • Updated the signature and usage of functions managing GC region promotion/demotion, including new parameters in decide_on_demotion_pin_surv.
  • Revised configuration variables for GCRegionRange and GCRegionSize to expose them via runtimeconfig.
  • Replaced many direct LogErrorToHost calls with a new log_init_error_to_host function for improved diagnostic output.

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

File | Description
src/coreclr/gc/gcpriv.h | Updated function signatures (decide_on_demotion_pin_surv and overloads) for region logging.
src/coreclr/gc/gcconfig.h | Changed configuration strings for region settings to expose them to runtimeconfig.
src/coreclr/gc/gccommon.cpp | Added the log_init_error_to_host implementation and updated error handling in log file setup.
src/coreclr/gc/gc.h | Declared the new log_init_error_to_host function.
src/coreclr/gc/gc.cpp | Replaced direct logging calls with log_init_error_to_host and updated promotion/demotion logic.
Comments suppressed due to low confidence (4)

src/coreclr/gc/gcpriv.h:32294

  • Please add documentation or inline comments to clearly describe the purpose and expected values for the new bool parameters 'promote_gen1_pins_p' and 'large_pins_p', to aid future maintainers.
void decide_on_demotion_pin_surv (heap_segment* region, int* no_pinned_surv_region_count, bool promote_gen1_pins_p, bool large_pins_p)

src/coreclr/gc/gcconfig.h:104

  • With the configuration strings for GCRegionRange and GCRegionSize now exposed via runtimeconfig, please ensure that the associated documentation is updated to reflect these new settings.
INT_CONFIG   (GCRegionRange,             "GCRegionRange",             "System.GC.RegionRange",             0, ...

src/coreclr/gc/gc.cpp:14550

  • The new error logging calls replacing GCToEEInterface::LogErrorToHost improve diagnostic output. Please consider adding a brief inline comment to explain the threshold logic and usage of the gib() helper for clarity.
log_init_error_to_host ("Reserving %zd bytes (%zd GiB) for the regions range failed, do you have a virtual memory limit set on this process?", reserve_size, gib (reserve_size));

src/coreclr/gc/gccommon.cpp:277

  • Since log_init_error_to_host uses a static buffer for formatting the error message, consider potential thread-safety issues if this function can be called concurrently. It might be beneficial to either allocate a buffer on the stack or protect access with a lock.
char error_buf[256];
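
For reference, a printf-style helper that formats into a local buffer could look like the sketch below. This is not the merged implementation: `log_init_error_sketch`, `log_error_to_host_stub`, and `g_last_host_error` are hypothetical names, and the stub only records the message where the runtime would hand it to the host. With a stack-allocated buffer, concurrent callers do not share formatting storage, whether or not the concern above applies to the actual code.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical host callback: records the last message so it can be inspected.
static std::string g_last_host_error;
static void log_error_to_host_stub (const char* message)
{
    g_last_host_error = message;
}

// Sketch of a printf-style init-error logger. error_buf is a local (stack)
// array, so each call formats into its own storage.
void log_init_error_sketch (const char* format, ...)
{
    char error_buf[256];
    va_list args;
    va_start (args, format);
    // vsnprintf writes at most sizeof (error_buf) bytes including the
    // terminating null, so overly long messages are truncated, not overrun.
    std::vsnprintf (error_buf, sizeof (error_buf), format, args);
    va_end (args);
    log_error_to_host_stub (error_buf);
}
```

A call such as `log_init_error_sketch ("Reserving %d bytes failed", 42)` hands the stub the fully formatted string; anything longer than 255 characters is cut off at the buffer size.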

Contributor

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

…region size related configs in runtimeconfig
@Maoni0
Member Author

Maoni0 commented Apr 23, 2025

I will be making the doc change for the Region* configs. I think the msg above should mention the RegionRange config (when the doc change happens).

@@ -49584,12 +49610,10 @@ HRESULT GCHeap::Initialize()
uint8_t* numa_mem = (uint8_t*)GCToOSInterface::VirtualReserve (hb_info_size_per_node, 0, 0, (uint16_t)numa_node_index);
if (!numa_mem)
{
GCToEEInterface::LogErrorToHost("Reservation of numa_mem failed");
Member

these don't need to be logged?

Member Author

that's under #ifdef HEAP_BALANCE_INSTRUMENTATION and that's only for a specific local analysis and is normally not defined.

@@ -49691,7 +49715,6 @@ HRESULT GCHeap::Initialize()

if (seg_mem == nullptr)
{
GCToEEInterface::LogErrorToHost("STRESS_REGIONS couldn't allocate ro segment");
Member

Does it make sense to remove logs that we already had? I know this failure would be extremely rare, but I was trying to cover cases where the coreclr initialization could return E_FAIL and so users would have no idea where it came from and having the log at these places costs nothing.

Member Author

I see why you made the choices you made. why not log places that return other HRESULTs like E_OUTOFMEMORY? those are actually more interesting, because just telling someone there's an OOM most likely doesn't help.

my choices are for places that are likely hit by normal users and hard to debug currently. while it costs nothing - there are many, many places that return a non S_OK hr and I really didn't want to log in that many places so I just picked the ones I thought would be helpful. STRESS_REGIONS is not even defined normally - it's only used in private testing and in those cases it wouldn't be a problem for the person to look at where it fails.
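
To illustrate why a bare HRESULT says little about where a failure came from, here is a minimal sketch that splits one into its standard fields. `hresult_parts` and `decode_hresult` are hypothetical names, not runtime APIs. The 0x8007000E from the sample output decodes to a failure in facility 7 (FACILITY_WIN32) with code 14 (ERROR_OUTOFMEMORY), which is E_OUTOFMEMORY and carries no hint about which reservation failed.

```cpp
#include <cstdint>

// Fields of a 32-bit HRESULT: bit 31 is the failure bit, bits 16-26 hold the
// facility, and bits 0-15 hold the error code.
struct hresult_parts
{
    bool     failed;
    uint32_t facility;
    uint32_t code;
};

// Hypothetical helper for illustration; not part of the runtime.
constexpr hresult_parts decode_hresult (uint32_t hr)
{
    return hresult_parts { (hr >> 31) != 0, (hr >> 16) & 0x7FF, hr & 0xFFFF };
}
```

The same value is returned by every out-of-memory path during initialization, which is the case for pairing it with a site-specific message.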

Member

@janvorli janvorli left a comment

LGTM, thank you!

@Maoni0 Maoni0 merged commit 9a879d3 into dotnet:main May 2, 2025
93 of 95 checks passed
3 participants