Hybrid suspend: backtrace in GC Unsafe mode considered harmful (on Linux) #8356

@lambdageek

It can race with the dynamic linker 😠

This is from https://jenkins.mono-project.com/job/test-mono-mainline-linux-hybrid-suspend/label=debian-9-armel/5/parsed_console/log.html

Thread 1 is... I don't even know. But it's apparently GC Safe, so we async suspend it, and it is holding a lock that Thread 4 needs. Thread 4 just did a GC Safe -> GC Unsafe transition (so we want to cooperatively suspend it); it wants to record the state transition and is collecting a backtrace. So it ends up in dl_iterate_phdr and tries to lock a mutex, but Thread 1 already holds it, so T4 waits.

Meanwhile the suspend initiator is waiting for T4 to coop suspend itself, and the whole runtime is stuck.

We previously had a similar issue where two threads were racing for a backtrace because they were both doing thread transitions. Attempted fix here:

#if defined (__GNUC__) && !defined (__clang__)
/* GNU libc backtrace calls _Unwind_Backtrace in libgcc, which internally may take a lock. */
/* Suppose we're using hybrid suspend and T1 is in GC Unsafe and T2 is
* GC Safe. T1 will be coop suspended, and T2 will be async suspended.
* Suppose T1 is in RUNNING, and T2 just changed from RUNNING to
* BLOCKING and it is in trace_state_change to record this fact.
*
* suspend initiator: switches T1 to ASYNC_SUSPEND_REQUESTED
* suspend initiator: switches T2 to BLOCKING_SUSPEND_REQUESTED and sends a suspend signal
* T1: calls mono_threads_transition_state_poll (),
* T1: switches to SELF_SUSPENDED and starts trace_state_change ()
* T2: is still in checked_build_thread_transition for the RUNNING->BLOCKING transition and calls backtrace ()
* T2: suspend signal lands while T2 is in backtrace() holding a lock; T2 switches to BLOCKING_ASYNC_SUSPENDED () and waits for resume
* T1: calls backtrace (), waits for the lock ()
* suspend initiator: waiting for T1 to suspend.
*
* At this point we're deadlocked.
*
* So what we'll do is try to take a lock before calling backtrace and
* only collect a backtrace if there is no contention.
*/
	int i;
	for (i = 0; i < 2; i++) {
		if (backtrace_mutex_trylock ()) {
			int sz = backtrace (out_data, MAX_NATIVE_BT_PROBE);
			backtrace_mutex_unlock ();
			return sz;
		} else {
			mono_thread_info_yield ();
		}
	}
	/* didn't get a backtrace, oh well. */
	return 0;
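For anyone who wants to poke at the pattern outside the runtime, here is a minimal standalone sketch of the same "try the lock, otherwise give up" idea. It is not the code in checked-build.c: `collect_backtrace_guarded`, the plain pthread mutex, the use of `sched_yield ()` in place of `mono_thread_info_yield ()`, and the value chosen for `MAX_NATIVE_BT_PROBE` are all assumptions made for illustration. It assumes glibc, since `backtrace ()` comes from `<execinfo.h>`.

```c
#include <assert.h>
#include <execinfo.h>
#include <pthread.h>
#include <sched.h>

/* Assumed value for illustration; the runtime defines its own. */
#define MAX_NATIVE_BT_PROBE 16

/* Hypothetical stand-in for the runtime's private backtrace mutex. */
static pthread_mutex_t backtrace_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Collect a native backtrace only if the guard mutex is uncontended.
 * Returns the number of frames stored in out_data, or 0 if the mutex
 * could not be taken after two attempts.
 */
static int
collect_backtrace_guarded (void **out_data)
{
	assert (out_data != NULL);
	for (int i = 0; i < 2; i++) {
		if (pthread_mutex_trylock (&backtrace_mutex) == 0) {
			int sz = backtrace (out_data, MAX_NATIVE_BT_PROBE);
			pthread_mutex_unlock (&backtrace_mutex);
			return sz;
		}
		/* Somebody else is collecting a backtrace: let them run, then retry once. */
		sched_yield ();
	}
	/* Didn't get a backtrace; the caller has to cope with an empty one. */
	return 0;
}
```

Note that the guard only serializes callers that go through this function. Any lock that `backtrace ()` takes internally is still shared with every other user of that lock.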

But this is worse - this time the suspended thread was not doing a backtrace - it just needed the linker for whatever reason.
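To make it concrete why the private mutex does not help here, this is a small model of the situation, not a reproduction against the real dynamic linker. Everything in it is a stand-in: `loader_lock` models the mutex that dl_iterate_phdr takes, `suspended_holder` models Thread 1 being stopped while holding it (a semaphore wait plays the role of sigsuspend), and `backtrace_would_block` models Thread 4, using trylock so the sketch reports the hang instead of actually hanging.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>

/* Hypothetical stand-ins, see the paragraph above. */
static pthread_mutex_t loader_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t backtrace_mutex = PTHREAD_MUTEX_INITIALIZER;
static sem_t holder_ready, resume_holder;

/* Models Thread 1: takes the loader lock and is "suspended" while holding it. */
static void *
suspended_holder (void *arg)
{
	(void) arg;
	pthread_mutex_lock (&loader_lock);
	sem_post (&holder_ready);
	sem_wait (&resume_holder); /* stands in for sigsuspend () in the suspend handler */
	pthread_mutex_unlock (&loader_lock);
	return NULL;
}

/* Models Thread 4: returns true if a real backtrace () would block at this point. */
static bool
backtrace_would_block (void)
{
	bool blocked;
	if (pthread_mutex_trylock (&backtrace_mutex) != 0)
		return false; /* guard contended: the workaround skips the backtrace */
	/* The guard was free, yet the lock backtrace () needs may still be held. */
	blocked = pthread_mutex_trylock (&loader_lock) != 0;
	if (!blocked)
		pthread_mutex_unlock (&loader_lock);
	pthread_mutex_unlock (&backtrace_mutex);
	return blocked;
}

/* Runs the scenario once and reports whether Thread 4 would have been stuck. */
static bool
run_model (void)
{
	pthread_t t;
	bool blocked;
	sem_init (&holder_ready, 0, 0);
	sem_init (&resume_holder, 0, 0);
	int rc = pthread_create (&t, NULL, suspended_holder, NULL);
	assert (rc == 0);
	sem_wait (&holder_ready);  /* holder now owns loader_lock and is parked */
	blocked = backtrace_would_block ();
	sem_post (&resume_holder); /* the model resumes the holder; the stuck runtime never does */
	pthread_join (t, NULL);
	sem_destroy (&holder_ready);
	sem_destroy (&resume_holder);
	return blocked;
}
```

In the model `run_model ()` reports that the backtrace would block even though the guard mutex was uncontended, which is the shape of the hang in the thread dump that follows.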

Thread 4 (Thread 0xf52ff450 (LWP 20777)):
#0  0xf7b381b8 in __lll_lock_wait () from /lib/arm-linux-gnueabi/libpthread.so.0
#1  0xf7b30630 in pthread_mutex_lock () from /lib/arm-linux-gnueabi/libpthread.so.0
#2  0xf7abf0dc in dl_iterate_phdr () from /lib/arm-linux-gnueabi/libc.so.6
#3  0xf7ac00f4 in __gnu_Unwind_Find_exidx () from /lib/arm-linux-gnueabi/libc.so.6
#4  0xf7b12454 in ?? () from /lib/arm-linux-gnueabi/libgcc_s.so.1
#5  0xf7b12b58 in ?? () from /lib/arm-linux-gnueabi/libgcc_s.so.1
#6  0xf7b1358c in _Unwind_Backtrace () from /lib/arm-linux-gnueabi/libgcc_s.so.1
#7  0xf7a92c04 in backtrace () from /lib/arm-linux-gnueabi/libc.so.6
#8  0x00d1c4bc in collect_backtrace (out_data=0xf5000cf0) at checked-build.c:224
#9  checked_build_thread_transition (transition=0xdd6238 "DONE_BLOCKING", info=info@entry=0xf5000470, from_state=from_state@entry=6, suspend_count=suspend_count@entry=0, next_state=next_state@entry=2, suspend_count_delta=suspend_count_delta@entry=0, capture_backtrace=capture_backtrace@entry=1) at checked-build.c:305
#10 0x00d18180 in trace_state_change_with_func (func=0xd3b1f0 <__func__.27444> "build_native_trace", suspend_count_delta=0, next_state=2, cur_raw_state=6, info=0xf5000470, transition=0xdd6238 "DONE_BLOCKING") at mono-threads-state-machine.c:105
#11 mono_threads_transition_done_blocking (info=info@entry=0xf5000470, func=func@entry=0xd3b1f0 <__func__.27444> "build_native_trace") at mono-threads-state-machine.c:541
#12 0x00d193cc in mono_threads_exit_gc_safe_region_unbalanced_internal (cookie=0xf5000470, stackdata=<optimized out>) at mono-threads-coop.c:311
#13 0x00a5f8ec in build_native_trace (error=<optimized out>) at mini-exceptions.c:1550
#14 setup_stack_trace (mono_ex=mono_ex@entry=0xf7402ef8, dynamic_methods=dynamic_methods@entry=0x0, trace_ips=0xf52fe548, trace_ips@entry=0xf52fe540) at mini-exceptions.c:1575
#15 0x00afe714 in handle_exception_first_pass (catch_frame=0xf52fe610, non_exception=0xf52fe690, out_prev_ji=<synthetic pointer>, out_ji=<synthetic pointer>, out_filter_idx=<synthetic pointer>, obj=0x0, ctx=0xf5001280) at mini-exceptions.c:1855
#16 mono_handle_exception_internal (ctx=0x0, ctx@entry=0xf52fe968, obj=0x0, resume=36250472, resume@entry=0, out_ji=0x0) at mini-exceptions.c:2059
#17 0x00aff124 in mono_handle_exception (ctx=ctx@entry=0xf52fe968, obj=<optimized out>) at mini-exceptions.c:2440
#18 0x00b37c24 in mono_arm_throw_exception (exc=<optimized out>, pc=-141186916, pc@entry=-141186912, sp=sp@entry=-181408992, int_regs=int_regs@entry=0xf52feaf8, fp_regs=fp_regs@entry=0xf52fea78) at exceptions-arm.c:173
#19 0x00b37ce4 in mono_arm_throw_exception_by_token (ex_token_index=<optimized out>, pc=-141186912, sp=-181408992, int_regs=0xf52feaf8, fp_regs=0xf52fea78) at exceptions-arm.c:185
#20 0xf799dd7c in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 3 (Thread 0xf55e0450 (LWP 20776)):
#0  0xf7b37268 in do_futex_wait.constprop () from /lib/arm-linux-gnueabi/libpthread.so.0
#1  0xf7b373e4 in __new_sem_wait_slow.constprop.0 () from /lib/arm-linux-gnueabi/libpthread.so.0
#2  0x00c8f34c in mono_os_sem_wait (flags=MONO_SEM_FLAGS_ALERTABLE, sem=0xdfc028 <finalizer_sem>) at ../../mono/utils/mono-os-semaphore.h:209
#3  mono_coop_sem_wait (flags=MONO_SEM_FLAGS_ALERTABLE, sem=0xdfc028 <finalizer_sem>) at ../../mono/utils/mono-coop-semaphore.h:43
#4  finalizer_thread (unused=unused@entry=0x0) at gc.c:903
#5  0x00c463d4 in start_wrapper_internal (stack_ptr=<optimized out>, start_info=0x0) at threads.c:1100
#6  start_wrapper (data=0x2306648) at threads.c:1160
#7  0xf7b2d2c4 in start_thread () from /lib/arm-linux-gnueabi/libpthread.so.0
#8  0xf7a8401c in ?? () from /lib/arm-linux-gnueabi/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
make[5]: *** [runtest-managed] Error 1
make[4]: *** [test-jit] Error 2

Thread 2 (Thread 0xf73ff450 (LWP 20773)):
#0  0xf7b34760 in pthread_cond_wait@@GLIBC_2.4 () from /lib/arm-linux-gnueabi/libpthread.so.0
#1  0xf7b343cc in pthread_cond_destroy@@GLIBC_2.4 () from /lib/arm-linux-gnueabi/libpthread.so.0
#2  0x00000002 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 1 (Thread 0xf7c5a4b0 (LWP 20711)):
#0  0xf79d9340 in sigsuspend () from /lib/arm-linux-gnueabi/libc.so.6
#1  0x00d1899c in suspend_signal_handler (_dummy=<optimized out>, info=<optimized out>, context=0xfff26380) at mono-threads-posix-signals.c:199
#2  <signal handler called>
#3  0xf7ac0080 in ?? () from /lib/arm-linux-gnueabi/libc.so.6
#4  0xf7abf19c in dl_iterate_phdr () from /lib/arm-linux-gnueabi/libc.so.6
#5  0x00000008 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
