
Conversation

@lambdageek
Member

Fixes #9407

@lambdageek lambdageek requested review from kumpera and vargaz as code owners July 2, 2018 20:41
@marek-safar marek-safar mentioned this pull request Jul 2, 2018
23 tasks
@vargaz
Contributor

vargaz commented Jul 2, 2018

Why is this needed? The marshal lock is a relatively low-level lock used to protect hash table accesses etc., so how can it block?

@lambdageek
Member Author

@vargaz There's a stack trace of a deadlock in #9407. Under hybrid suspend, GC stop-the-world (s-t-w) hangs if Thread A locks the mutex, then steps into GC Safe code and is preemptively suspended, while Thread B is in GC Unsafe (cooperatively suspended) code, also tries to lock the mutex, and blocks.

If the hash table is protected by an OS mutex and the code does a g_hash_table_foreach and the callback is something that runs in GC Unsafe mode, that's pretty much guaranteed to be a deadlock under hybrid suspend.

OS mutexes should only be used if they might be taken by threads that aren't attached, or if we're doing explicit GC Safe/Unsafe transitions by hand, or if we can guarantee that there will be no GC Unsafe->GC Safe transitions while the mutex is locked. Everywhere else it should be a coop mutex.
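A minimal sketch of what "coop mutex" means in this discussion, assuming a toy two-state model of GC Safe/Unsafe. All `Toy*` and `toy_*` names are hypothetical and are not Mono APIs; the real runtime's thread state machine has more states and would also check for a pending suspend before returning to GC Unsafe. The idea shown is only that the thread never blocks on the OS mutex while still marked GC Unsafe.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Toy thread states (assumption: the real runtime tracks more than two). */
enum { TOY_GC_UNSAFE = 0, TOY_GC_SAFE = 1 };

typedef struct {
    atomic_int state;         /* TOY_GC_UNSAFE or TOY_GC_SAFE */
    atomic_int safe_entries;  /* how often the slow path switched to GC Safe */
} ToyThread;

typedef struct { pthread_mutex_t m; } ToyCoopMutex;

static void toy_coop_mutex_init(ToyCoopMutex *mu)
{
    int rc = pthread_mutex_init(&mu->m, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Lock without ever blocking while the thread is marked GC Unsafe. */
static void toy_coop_mutex_lock(ToyThread *self, ToyCoopMutex *mu)
{
    /* Fast path: uncontended, so no state switch is needed. */
    if (pthread_mutex_trylock(&mu->m) == 0)
        return;

    /* Slow path: mark the thread GC Safe so a suspend initiator can
     * treat it as stopped while it waits on the OS mutex. */
    atomic_fetch_add(&self->safe_entries, 1);
    atomic_store(&self->state, TOY_GC_SAFE);
    pthread_mutex_lock(&mu->m);
    atomic_store(&self->state, TOY_GC_UNSAFE);
}

static void toy_coop_mutex_unlock(ToyCoopMutex *mu)
{
    pthread_mutex_unlock(&mu->m);
}

typedef struct { ToyThread t; ToyCoopMutex *mu; } ToyWorker;

static void *toy_worker_main(void *arg)
{
    ToyWorker *w = arg;
    toy_coop_mutex_lock(&w->t, w->mu);
    toy_coop_mutex_unlock(w->mu);
    return NULL;
}

/* Returns 1 if a contended worker parked in GC Safe exactly once and
 * ended up back in GC Unsafe after acquiring the lock. */
static int toy_demo_contended(void)
{
    ToyCoopMutex mu;
    ToyWorker w = { .mu = &mu };
    pthread_t tid;

    toy_coop_mutex_init(&mu);
    pthread_mutex_lock(&mu.m);                 /* caller holds the lock */
    pthread_create(&tid, NULL, toy_worker_main, &w);
    while (atomic_load(&w.t.state) != TOY_GC_SAFE)
        sched_yield();                         /* wait for the worker to park */
    pthread_mutex_unlock(&mu.m);
    pthread_join(tid, NULL);
    return atomic_load(&w.t.safe_entries) == 1 &&
           atomic_load(&w.t.state) == TOY_GC_UNSAFE;
}
```

The state switch on the slow path is the extra cost being weighed in this thread; the uncontended path in this sketch pays only for a trylock.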

@vargaz
Contributor

vargaz commented Jul 3, 2018

Coop mutexes are kinda heavyweight, so we should avoid them if possible. In this case, the problem is that the code calls mono_method_signature (), which is an API function, so it enters unsafe mode. It should call some internal version which doesn't do that.

@lambdageek
Member Author

I think for this specific mutex we can fix it by hoisting the call to mono_method_signature, so we get the signature before taking the marshal_lock in mono_marshal_free_dynamic_wrappers.
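A rough sketch of the hoisting idea, not the actual marshal.c change. Every name here (`toy_marshal_lock`, `toy_get_signature`, `toy_cache`, and so on) is a hypothetical stand-in, and the counter only works as bookkeeping in a single-threaded demo. It contrasts calling back into the runtime inside the critical section with doing that call first and keeping only the table update under the lock.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t toy_marshal_lock = PTHREAD_MUTEX_INITIALIZER;
static int toy_lock_depth;        /* > 0 while toy_marshal_lock is held */
static int toy_calls_under_lock;  /* runtime callbacks made with the lock held */
static int toy_cache[16];

/* Stand-in for a call back into the runtime that may hit a poll point. */
static unsigned toy_get_signature(unsigned method_id)
{
    if (toy_lock_depth > 0)
        toy_calls_under_lock++;
    return method_id * 2u;        /* arbitrary fake "signature" */
}

/* Risky shape: the runtime call happens inside the critical section. */
static void toy_free_wrapper_risky(unsigned method_id)
{
    pthread_mutex_lock(&toy_marshal_lock);
    toy_lock_depth++;
    unsigned sig = toy_get_signature(method_id);
    toy_cache[sig % 16u] = 0;
    toy_lock_depth--;
    pthread_mutex_unlock(&toy_marshal_lock);
}

/* Hoisted shape: compute the signature first, then lock only for the update. */
static void toy_free_wrapper_hoisted(unsigned method_id)
{
    unsigned sig = toy_get_signature(method_id);

    pthread_mutex_lock(&toy_marshal_lock);
    toy_lock_depth++;
    assert(toy_lock_depth == 1);  /* nothing else re-enters in this demo */
    toy_cache[sig % 16u] = 0;
    toy_lock_depth--;
    pthread_mutex_unlock(&toy_marshal_lock);
}
```

In the hoisted version nothing that could poll for suspension runs while the lock is held, which is the property an OS mutex needs here.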

But I don't think your suggestion about mono_method_signature is right. The runtime is mostly in GC Unsafe mode. The fact that mono_method_signature is a wrapper that does an Unsafe->Unsafe transition is mostly irrelevant: that just happens to be the first place where we do a thread state poll and coop-suspend the thread. The implementation of mono_method_signature_checked calls a ton of other methods that can poll and coop-suspend the thread.

We should make mono_threads_enter_gc_safe_region_unbalanced_with_info cheaper on non-bitcode platforms, but I still think most mutexes in the runtime should be coop.

But this kind of thing is going to come up again. All non-coop locks are suspect.

@vargaz
Contributor

vargaz commented Jul 3, 2018

Alternatively, the problem is that the code which calls it is in GC Safe mode; the AOT runtime code should probably transition to GC Unsafe mode at the beginning.

@lambdageek
Member Author

lambdageek commented Jul 3, 2018

@vargaz

  1. Yeah, I need to look a bit closer at that AOT runtime code.
  2. But I think this problem would still show up if all threads were in GC Unsafe. The issue really is that if you're Thread A holding an OS mutex and you call anything that may checkpoint, you will be suspended while holding the mutex. If any other Thread B, also in Unsafe, tries to take the same mutex, it will block and prevent suspend from finishing. So Thread A is already suspended and is holding the mutex. Thread B is waiting for the mutex. And the Suspend Initiator is waiting for B to suspend. B will never wake up. The Suspend Initiator will never resume A, so the mutex will never be unlocked. Everyone is stuck.
  3. For curiosity's sake, I just tried auditing marshal.c to see what it would take to keep the marshal mutex as an OS mutex, and it would require changes to mono_ftnptr_to_delegate_handle, delegate_hash_table_add, get_cache (because it's called with mono_metadata_signature_equal as equal_func), lookup_string_ctor_signature, mono_marshal_get_runtime_invoke_dynamic, mono_marshal_get_synchronized_wrapper, and mono_marshal_free_dynamic_wrappers, because they all call back into the runtime with the mutex locked. Some of them would require pretty substantial reorganization.
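The three-party deadlock in point 2 can be written down as a tiny state model. This is a sketch under simplifying assumptions: the state names and functions are hypothetical, a thread running in GC Unsafe is assumed to reach a poll point eventually, and a thread blocked in a GC Safe region is assumed to count as stopped for the suspend initiator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    TOY_RUNNING_UNSAFE,  /* running runtime code; will reach a poll point */
    TOY_BLOCKED_UNSAFE,  /* blocked on an OS mutex while GC Unsafe: never polls */
    TOY_BLOCKED_SAFE,    /* blocked inside a GC Safe region (coop mutex slow path) */
    TOY_SUSPENDED        /* already parked by the suspend machinery */
} ToyState;

typedef enum { TOY_OS_MUTEX, TOY_COOP_MUTEX } ToyLockKind;

/* Stop-the-world can finish only if no thread is stuck somewhere it can
 * neither poll nor be treated as stopped. */
static bool toy_stw_can_finish(const ToyState *threads, size_t n)
{
    assert(threads != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (threads[i] == TOY_BLOCKED_UNSAFE)
            return false;
    }
    return true;
}

/* Thread A took the lock, hit a poll point, and is suspended holding it.
 * Thread B then tries to take the same lock. */
static bool toy_scenario_finishes(ToyLockKind kind)
{
    ToyState threads[2];
    threads[0] = TOY_SUSPENDED;                       /* Thread A */
    threads[1] = (kind == TOY_COOP_MUTEX)             /* Thread B */
                     ? TOY_BLOCKED_SAFE
                     : TOY_BLOCKED_UNSAFE;
    return toy_stw_can_finish(threads, 2);
}
```

With the OS mutex the model reports that suspend cannot finish, so A is never resumed and never unlocks. With the coop mutex B counts as stopped, the stop-the-world completes, A is resumed and releases the lock, and B proceeds.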

@lambdageek lambdageek merged commit 27cb721 into mono:master Jul 3, 2018