lewurm commented Feb 2, 2016

There's a short window where we release the lock and another thread
could modify the state. That is what we saw on the ARM machines for a
while, e.g. in `subthread-exit.cs` there's a race between
`mono_thread_execute_interruption` and `mono_thread_suspend_all_other_threads`:

> Thread 2 (Thread 0x429ff430 (LWP 5189)):  // SUB THREAD
> #0  0x40398e30 in nanosleep () from /lib/arm-linux-gnueabihf/libpthread.so.0
> #1  0x0028c0ac in monoeg_g_usleep (microseconds=86650) at gdate-unix.c:53
> #2  0x0027b1d8 in suspend_sync_nolock (id=1079603792, interrupt_kernel=1) at mono-threads.c:913
> #3  0x0027b2d4 in mono_thread_info_safe_suspend_and_run (id=1079603792, interrupt_kernel=1, callback=0x1ab109 <suspend_thread_critical>, user_data=0x429fe534)
>     at mono-threads.c:935
> #4  0x001ab28a in suspend_thread_internal (thread=0x401c0120, interrupt=1) at threads.c:4807
> #5  0x001a91ec in mono_thread_suspend_all_other_threads () at threads.c:3297
> #6  0x00158d8c in ves_icall_System_Environment_Exit (result=0) at icall.c:6378
> #7  0x400e7d9c in ?? ()
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>
> Thread 1 (Thread 0x40597250 (LWP 5186)):  // MAIN THREAD
> #0  0x4039a5f4 in __libc_do_syscall () from /lib/arm-linux-gnueabihf/libpthread.so.0
> #1  0x4039796e in do_futex_wait () from /lib/arm-linux-gnueabihf/libpthread.so.0
> #2  0x403979da in sem_wait@@GLIBC_2.4 () from /lib/arm-linux-gnueabihf/libpthread.so.0
> #3  0x002788d8 in mono_os_sem_wait (sem=0x388248, flags=MONO_SEM_FLAGS_NONE) at ../../mono/utils/mono-os-semaphore.h:163
> #4  0x00279722 in mono_thread_info_wait_for_resume (info=0x388210) at mono-threads.c:144
> #5  0x0027ad5e in mono_thread_info_end_self_suspend () at mono-threads.c:700
> #6  0x001ab2d2 in self_suspend_internal (thread=0x401c0120) at threads.c:4822
> #7  0x001aab0a in mono_thread_execute_interruption () at threads.c:4337
> #8  0x001a6a82 in ves_icall_System_Threading_Thread_Sleep_internal (ms=1200) at threads.c:1217
> #9  0x400da75c in ?? ()
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
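
For readers who want the general shape of this kind of bug, here is a minimal sketch in C. It is not the code changed by this pull request (the diff is not shown in this conversation), and every name in it (`toy_thread`, `ST_SUSPEND_REQUESTED`, `ST_SUSPENDED`, `racy_self_suspend`, `atomic_self_suspend`, `observe`) is hypothetical. To keep it deterministic, the "other thread" is modelled as a callback that runs at the point where the lock is free, rather than as a real concurrent thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical state bits for illustration; not Mono's ThreadState values. */
enum { ST_SUSPEND_REQUESTED = 1, ST_SUSPENDED = 2 };

typedef struct {
    pthread_mutex_t lock;
    int state;          /* guarded by lock */
    int seen_by_other;  /* state the other thread observed, -1 if none yet */
} toy_thread;

/* Stand-in for whatever another thread does once it can take the lock. */
typedef void (*other_thread_fn)(toy_thread *);

static void toy_init(toy_thread *t, int initial_state) {
    int rc = pthread_mutex_init(&t->lock, NULL);
    assert(rc == 0);
    (void)rc;  /* keep the variable used when NDEBUG disables assert */
    t->state = initial_state;
    t->seen_by_other = -1;
}

/* The other thread takes the lock and records the state it sees. */
static void observe(toy_thread *t) {
    pthread_mutex_lock(&t->lock);
    t->seen_by_other = t->state;
    pthread_mutex_unlock(&t->lock);
}

/* Racy shape: clear the request flag, drop the lock, set the suspended
 * flag later. In between, the thread looks neither requested nor suspended. */
static void racy_self_suspend(toy_thread *t, other_thread_fn other) {
    pthread_mutex_lock(&t->lock);
    t->state &= ~ST_SUSPEND_REQUESTED;
    pthread_mutex_unlock(&t->lock);

    if (other)
        other(t);  /* the window: another thread can read or modify state */

    pthread_mutex_lock(&t->lock);
    t->state |= ST_SUSPENDED;
    pthread_mutex_unlock(&t->lock);
}

/* Safer shape: do the whole transition inside one critical section, so
 * other threads see either the state before it or the state after it. */
static void atomic_self_suspend(toy_thread *t, other_thread_fn other) {
    pthread_mutex_lock(&t->lock);
    t->state = (t->state & ~ST_SUSPEND_REQUESTED) | ST_SUSPENDED;
    pthread_mutex_unlock(&t->lock);

    if (other)
        other(t);  /* can only run after the transition is complete */
}
```

Starting from `ST_SUSPEND_REQUESTED`, the racy version lets `observe` record a state with neither flag set, while the single-critical-section version only ever exposes `ST_SUSPENDED`. With real threads the first outcome would be intermittent, which is consistent with a failure that shows up only on some machines.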


lewurm commented Feb 2, 2016

Thanks to @ludovic-henry for coming up with the fix! @kumpera, can I have a review of this please?


luhenry commented Feb 2, 2016

LGTM


kumpera commented Feb 2, 2016

The current code looks very suspicious, so I had to git blame for a while to figure out what was going on.

It's a leftover from the old abort, which is long gone.

LGTM. Good catch.


lewurm commented Feb 2, 2016

thanks!

@monojenkins merge.

monojenkins added a commit that referenced this pull request Feb 3, 2016
[threads] avoid race between setting the state flag and suspension

monojenkins merged commit 477d396 into mono:master Feb 3, 2016
lewurm deleted the thread-suspension-race branch February 3, 2016 00:01
akoeplinger commented

Should we get this into 4.3.2?
