[profiler] Fix thread list insertion failures. #6115
mono/metadata/threads.c
Interesting change, because the old code was incorrect.
MONO_PROFILER_RAISE uses hazard pointers in the same way as mono_thread_info_lookup, so whatever we need to be protected would be overwritten.
Well, the callbacks invoked by MONO_PROFILER_RAISE may or may not use hazard pointers. The mono_hazard_pointer_clear below was there to clear HP1 (which would have been set by mono_thread_info_lookup) just in case the profiler callbacks wouldn't have overwritten it with something else. Was that approach actually problematic?
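To make the concern in this exchange concrete, here is a minimal toy model of a single hazard pointer slot shared between a lookup and a callback. It is only a sketch under stated assumptions: `toy_hazard_slot`, `toy_lookup`, `toy_callback`, and `toy_still_protected` are hypothetical names and do not correspond to Mono's actual hazard pointer API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one per-thread hazard pointer slot; not Mono's API. */
static void *toy_hazard_slot = NULL;

/* Models a lookup that publishes its result in the slot before returning it. */
static void *toy_lookup(void *entry)
{
    assert(entry != NULL);
    toy_hazard_slot = entry;
    return entry;
}

/* Models a profiler callback that may or may not reuse the same slot. */
static void toy_callback(bool uses_slot, void *other)
{
    if (uses_slot)
        toy_hazard_slot = other;
}

/* Returns true if the looked-up entry is still protected after the callback. */
static bool toy_still_protected(bool callback_uses_slot)
{
    static int entry, other;
    void *p = toy_lookup(&entry);
    toy_callback(callback_uses_slot, &other);
    bool still_protected = (toy_hazard_slot == p);
    toy_hazard_slot = NULL; /* unconditional clear after the callback */
    return still_protected;
}
```

In this model the trailing clear always leaves the slot empty, but the looked-up entry is only guaranteed to stay protected while no callback touches the slot, which is the situation both comments are describing.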
mono/profiler/log.c
If this keeps happening frequently, we could return a dummy MonoProfilerThread instead.
It should never happen, and indeed never did outside #5710, where it only happened because that PR introduced events that got fired after the thread_stopped event. I think an assertion is fine, we don't want a bug like this to go unnoticed. (Also, it already passed the stress tests in an earlier run of this PR, so I think we're good.)
mono/metadata/profiler-events.h
Does it make sense to have thread_exited introduce a new callback typedef while thread_stopping doesn't?
Copy/paste error.
thread_stopping occurs earlier than thread_stopped, before any of the detach code has run. thread_exited occurs after thread_stopped, once all detach logic has finished.
…events.
With the revamped GC root reporting work, some GC root unregister events
arrive after the thread_stopped event. This causes the thread to be
re-added to the thread list in the profiler and never removed until program
exit. If such a thread has actually exited and a new thread
reuses its thread ID, that new thread fails to add itself to the
thread list, leading to failures like this one:
init_thread: failed to insert thread 0x70000b761000 in log_profiler.profiler_thread_list, found = true
By using the new thread_exited event, we remove the thread from the thread list
after these GC root unregister events have arrived, thereby ensuring that the
thread won't be incorrectly 'resurrected'.
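
The resurrection scenario can be sketched with a toy thread list keyed by thread ID. This is an illustration only: every `sim_` name below is hypothetical, the lazy re-registration in `sim_on_gc_root_unregister` is an assumption about how a late event re-adds the thread, and none of it is the profiler's real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_MAX_THREADS 16

/* Hypothetical stand-in for the profiler's thread list, keyed by thread ID. */
static uintptr_t sim_list[SIM_MAX_THREADS];
static size_t sim_count = 0;

static bool sim_contains(uintptr_t tid)
{
    for (size_t i = 0; i < sim_count; i++)
        if (sim_list[i] == tid)
            return true;
    return false;
}

/* Inserts tid; returns false if it is already present (the failure case). */
static bool sim_insert(uintptr_t tid)
{
    if (sim_contains(tid))
        return false;
    assert(sim_count < SIM_MAX_THREADS);
    sim_list[sim_count++] = tid;
    return true;
}

static void sim_remove(uintptr_t tid)
{
    for (size_t i = 0; i < sim_count; i++) {
        if (sim_list[i] == tid) {
            sim_list[i] = sim_list[--sim_count];
            return;
        }
    }
}

/* Assumption: a late event re-registers the thread if it is no longer listed. */
static void sim_on_gc_root_unregister(uintptr_t tid)
{
    if (!sim_contains(tid))
        (void)sim_insert(tid);
}

/* Simulates one thread lifetime; returns whether a new thread reusing the
   same ID can register afterwards. */
static bool sim_reuse_succeeds(bool remove_on_exited)
{
    const uintptr_t tid = 0x1000;      /* arbitrary ID for the demo */
    sim_count = 0;

    (void)sim_insert(tid);             /* thread starts and registers itself */
    if (!remove_on_exited)
        sim_remove(tid);               /* old behavior: remove on thread_stopped */
    sim_on_gc_root_unregister(tid);    /* late event, after thread_stopped */
    if (remove_on_exited)
        sim_remove(tid);               /* new behavior: remove on thread_exited */

    return sim_insert(tid);            /* a new thread reuses the same ID */
}
```

Calling `sim_reuse_succeeds(false)` models removal at thread_stopped, where the late event leaves a stale entry behind, while `sim_reuse_succeeds(true)` models removal at thread_exited, after the late event has already been handled.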
…ents. These checks are no longer necessary as these thread callbacks are not invoked for tools threads in the first place.
…esurrection case.
@DavidKarlas said this fixes the assertions for him.
@monojenkins merge
Stress test failure is unrelated and will be fixed by #6109.
cannot merge:
Stress test is failing on
[profiler] Fix thread list insertion failures. Commit migrated from mono/mono@0c88bc2