Description
It can race with the dynamic linker.
I'm not sure what Thread 1 is doing, but it's apparently GC Safe, so we async suspend it. It's holding a lock that Thread 4 needs. Thread 4 just did a GC Safe -> GC Unsafe transition (so we want to cooperatively suspend it); it wants to record the state transition and is collecting a backtrace. So it ends up in dl_iterate_phdr and tries to lock a mutex, but Thread 1 already holds it, so T4 waits.
Meanwhile the suspend initiator is waiting for T4 to coop suspend itself, and the whole runtime is stuck.
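The circular wait can be made explicit with a toy wait-for model. This is only an illustrative sketch, not runtime code: the `actor` enum and both helper functions are hypothetical names that do not exist in Mono. It assumes the initiator is blocked on T4 (the thread meant to suspend cooperatively), T4 is blocked on the mutex T1 holds, and T1 stays suspended until the initiator resumes it.

```c
#include <stdbool.h>

/* Illustrative model only: none of these names exist in the Mono runtime. */
enum actor { INITIATOR, T1, T4, ACTOR_COUNT };

/* waits_on[a] is the actor that a is blocked on, or -1 if a can make progress.
 * Returns true if following the waits from start leads back to start. */
static bool has_wait_cycle (const int waits_on[ACTOR_COUNT], int start)
{
	int cur = start;
	for (int steps = 0; steps < ACTOR_COUNT; steps++) {
		cur = waits_on[cur];
		if (cur < 0)
			return false; /* reached an actor that can run */
		if (cur == start)
			return true;  /* back where we began: circular wait */
	}
	return false; /* no cycle that passes through start */
}

/* Encodes the scenario described above (assumed edges, see the lead-in). */
static bool issue_scenario_deadlocks (void)
{
	int waits_on[ACTOR_COUNT];
	waits_on[INITIATOR] = T4; /* waiting for T4 to reach a safepoint */
	waits_on[T4] = T1;        /* waiting for the mutex T1 holds */
	waits_on[T1] = INITIATOR; /* async suspended until the initiator resumes it */
	return has_wait_cycle (waits_on, INITIATOR);
}
```

Removing any one edge (for example, if T1 were not holding the mutex when the suspend signal landed) breaks the cycle in this model.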
We previously had a similar issue where two threads were racing for a backtrace because they were both doing thread transitions. Attempted fix here:
mono/mono/utils/checked-build.c
Lines 200 to 232 in 006d6ce
```c
#if defined (__GNUC__) && !defined (__clang__)
	/* GNU libc backtrace calls _Unwind_Backtrace in libgcc, which internally may take a lock. */
	/* Suppose we're using hybrid suspend and T1 is in GC Unsafe and T2 is
	 * GC Safe.  T1 will be coop suspended, and T2 will be async suspended.
	 * Suppose T1 is in RUNNING, and T2 just changed from RUNNING to
	 * BLOCKING and it is in trace_state_change to record this fact.
	 *
	 * suspend initiator: switches T1 to ASYNC_SUSPEND_REQUESTED
	 * suspend initiator: switches T2 to BLOCKING_SUSPEND_REQUESTED and sends a suspend signal
	 * T1: calls mono_threads_transition_state_poll (),
	 * T1: switches to SELF_SUSPENDED and starts trace_state_change ()
	 * T2: is still in checked_build_thread_transition for the RUNNING->BLOCKING transition and calls backtrace ()
	 * T2: suspend signal lands while T2 is in backtrace() holding a lock; T2 switches to BLOCKING_ASYNC_SUSPENDED () and waits for resume
	 * T1: calls backtrace (), waits for the lock ()
	 * suspend initiator: waiting for T1 to suspend.
	 *
	 * At this point we're deadlocked.
	 *
	 * So what we'll do is try to take a lock before calling backtrace and
	 * only collect a backtrace if there is no contention.
	 */
	int i;
	for (i = 0; i < 2; i++) {
		if (backtrace_mutex_trylock ()) {
			int sz = backtrace (out_data, MAX_NATIVE_BT_PROBE);
			backtrace_mutex_unlock ();
			return sz;
		} else {
			mono_thread_info_yield ();
		}
	}
	/* didn't get a backtrace, oh well. */
	return 0;
```
But this is worse - this time the suspended thread was not doing a backtrace - it just needed the linker for whatever reason.
Thread 4 (Thread 0xf52ff450 (LWP 20777)):
#0 0xf7b381b8 in __lll_lock_wait () from /lib/arm-linux-gnueabi/libpthread.so.0
#1 0xf7b30630 in pthread_mutex_lock () from /lib/arm-linux-gnueabi/libpthread.so.0
#2 0xf7abf0dc in dl_iterate_phdr () from /lib/arm-linux-gnueabi/libc.so.6
#3 0xf7ac00f4 in __gnu_Unwind_Find_exidx () from /lib/arm-linux-gnueabi/libc.so.6
#4 0xf7b12454 in ?? () from /lib/arm-linux-gnueabi/libgcc_s.so.1
#5 0xf7b12b58 in ?? () from /lib/arm-linux-gnueabi/libgcc_s.so.1
#6 0xf7b1358c in _Unwind_Backtrace () from /lib/arm-linux-gnueabi/libgcc_s.so.1
#7 0xf7a92c04 in backtrace () from /lib/arm-linux-gnueabi/libc.so.6
#8 0x00d1c4bc in collect_backtrace (out_data=0xf5000cf0) at checked-build.c:224
#9 checked_build_thread_transition (transition=0xdd6238 "DONE_BLOCKING", info=info@entry=0xf5000470, from_state=from_state@entry=6, suspend_count=suspend_count@entry=0, next_state=next_state@entry=2, suspend_count_delta=suspend_count_delta@entry=0, capture_backtrace=capture_backtrace@entry=1) at checked-build.c:305
#10 0x00d18180 in trace_state_change_with_func (func=0xd3b1f0 <__func__.27444> "build_native_trace", suspend_count_delta=0, next_state=2, cur_raw_state=6, info=0xf5000470, transition=0xdd6238 "DONE_BLOCKING") at mono-threads-state-machine.c:105
#11 mono_threads_transition_done_blocking (info=info@entry=0xf5000470, func=func@entry=0xd3b1f0 <__func__.27444> "build_native_trace") at mono-threads-state-machine.c:541
#12 0x00d193cc in mono_threads_exit_gc_safe_region_unbalanced_internal (cookie=0xf5000470, stackdata=<optimized out>) at mono-threads-coop.c:311
#13 0x00a5f8ec in build_native_trace (error=<optimized out>) at mini-exceptions.c:1550
#14 setup_stack_trace (mono_ex=mono_ex@entry=0xf7402ef8, dynamic_methods=dynamic_methods@entry=0x0, trace_ips=0xf52fe548, trace_ips@entry=0xf52fe540) at mini-exceptions.c:1575
#15 0x00afe714 in handle_exception_first_pass (catch_frame=0xf52fe610, non_exception=0xf52fe690, out_prev_ji=<synthetic pointer>, out_ji=<synthetic pointer>, out_filter_idx=<synthetic pointer>, obj=0x0, ctx=0xf5001280) at mini-exceptions.c:1855
#16 mono_handle_exception_internal (ctx=0x0, ctx@entry=0xf52fe968, obj=0x0, resume=36250472, resume@entry=0, out_ji=0x0) at mini-exceptions.c:2059
#17 0x00aff124 in mono_handle_exception (ctx=ctx@entry=0xf52fe968, obj=<optimized out>) at mini-exceptions.c:2440
#18 0x00b37c24 in mono_arm_throw_exception (exc=<optimized out>, pc=-141186916, pc@entry=-141186912, sp=sp@entry=-181408992, int_regs=int_regs@entry=0xf52feaf8, fp_regs=fp_regs@entry=0xf52fea78) at exceptions-arm.c:173
#19 0x00b37ce4 in mono_arm_throw_exception_by_token (ex_token_index=<optimized out>, pc=-141186912, sp=-181408992, int_regs=0xf52feaf8, fp_regs=0xf52fea78) at exceptions-arm.c:185
#20 0xf799dd7c in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 3 (Thread 0xf55e0450 (LWP 20776)):
#0 0xf7b37268 in do_futex_wait.constprop () from /lib/arm-linux-gnueabi/libpthread.so.0
#1 0xf7b373e4 in __new_sem_wait_slow.constprop.0 () from /lib/arm-linux-gnueabi/libpthread.so.0
#2 0x00c8f34c in mono_os_sem_wait (flags=MONO_SEM_FLAGS_ALERTABLE, sem=0xdfc028 <finalizer_sem>) at ../../mono/utils/mono-os-semaphore.h:209
#3 mono_coop_sem_wait (flags=MONO_SEM_FLAGS_ALERTABLE, sem=0xdfc028 <finalizer_sem>) at ../../mono/utils/mono-coop-semaphore.h:43
#4 finalizer_thread (unused=unused@entry=0x0) at gc.c:903
#5 0x00c463d4 in start_wrapper_internal (stack_ptr=<optimized out>, start_info=0x0) at threads.c:1100
#6 start_wrapper (data=0x2306648) at threads.c:1160
#7 0xf7b2d2c4 in start_thread () from /lib/arm-linux-gnueabi/libpthread.so.0
#8 0xf7a8401c in ?? () from /lib/arm-linux-gnueabi/libc.so.6
make[5]: *** [runtest-managed] Error 1
make[4]: *** [test-jit] Error 2
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 2 (Thread 0xf73ff450 (LWP 20773)):
#0 0xf7b34760 in pthread_cond_wait@@GLIBC_2.4 () from /lib/arm-linux-gnueabi/libpthread.so.0
#1 0xf7b343cc in pthread_cond_destroy@@GLIBC_2.4 () from /lib/arm-linux-gnueabi/libpthread.so.0
#2 0x00000002 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 1 (Thread 0xf7c5a4b0 (LWP 20711)):
#0 0xf79d9340 in sigsuspend () from /lib/arm-linux-gnueabi/libc.so.6
#1 0x00d1899c in suspend_signal_handler (_dummy=<optimized out>, info=<optimized out>, context=0xfff26380) at mono-threads-posix-signals.c:199
#2 <signal handler called>
#3 0xf7ac0080 in ?? () from /lib/arm-linux-gnueabi/libc.so.6
#4 0xf7abf19c in dl_iterate_phdr () from /lib/arm-linux-gnueabi/libc.so.6
#5 0x00000008 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
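In the traces above, Thread 1 is suspended inside dl_iterate_phdr with no visible backtrace () frame, while Thread 4 blocks in pthread_mutex_lock called from the same function. That matters because dl_iterate_phdr is a public glibc entry point that any code can call directly, so a guard around the runtime's own backtrace () calls cannot cover every path into it. A minimal sketch of such a direct call follows; `count_object` and `count_loaded_objects` are hypothetical helper names, and the example assumes glibc on Linux.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* dl_iterate_phdr is a GNU extension declared in <link.h> */
#endif
#include <link.h>
#include <stddef.h>

/* Callback invoked once per loaded object (main program, shared libraries). */
static int count_object (struct dl_phdr_info *info, size_t size, void *data)
{
	(void) info;
	(void) size;
	(*(int *) data)++;
	return 0; /* returning 0 tells dl_iterate_phdr to keep iterating */
}

/* Hypothetical helper: walks the loaded objects without calling backtrace ().
 * A suspend signal landing during this call would stop the thread inside
 * dl_iterate_phdr, like Thread 1 in the trace above. */
static int count_loaded_objects (void)
{
	int count = 0;
	dl_iterate_phdr (count_object, &count);
	return count;
}
```

Any thread making a call like this while GC Safe could be async suspended partway through, which is the situation the trylock in collect_backtrace does not protect against.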