@alexanderkyte
Contributor

This contains the commits from #12125, #12126, and #12518.

We add a checked build mode that asserts when mono calls malloc inside
the crash reporter, turning risky allocations into assertion failures.
This is useful for automated testing because the double abort often
presents as an indefinite hang. If it happens before the thread-dumping
supervisor process is started, or after it ends, the crash reporter
hangs.
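
A minimal sketch of what such a checked-build guard could look like, assuming a flag that is set while the crash reporter runs. The names `in_crash_reporter`, `crash_reporter_enter`, `crash_reporter_exit`, `checked_alloc_allowed`, and `checked_malloc` are hypothetical and are not taken from the mono sources:

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>

/* Hypothetical flag: nonzero while the crash reporter is running. */
static volatile sig_atomic_t in_crash_reporter = 0;

static void crash_reporter_enter(void) { in_crash_reporter = 1; }
static void crash_reporter_exit(void)  { in_crash_reporter = 0; }

/* Returns nonzero when allocating is currently considered safe. */
static int checked_alloc_allowed(void)
{
    return !in_crash_reporter;
}

/* Hypothetical wrapper: in a checked build, allocating inside the crash
 * reporter fails an assertion right away instead of risking a hang. */
static void *checked_malloc(size_t size)
{
    assert(checked_alloc_allowed() && "malloc inside the crash reporter");
    return malloc(size);
}
```

With a guard like this, a test run stops at the offending allocation with an assertion message rather than timing out on a hang.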
Threads without a domain that hit a segfault will end up in
this handler. It's not safe to call this function with a NULL domain.

See crash below:

```
* thread #1, name = 'tid_307', queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x10eff40f8)
  * frame #0: 0x000000010e1510d9 mono-sgen`mono_threads_summarize_execute(ctx=0x0000000000000000, out=0x0000001000000000, hashes=0x0000100000100000, silent=4096, mem="", provided_size=2199023296512) at threads.c:6414
    frame #1: 0x000000010e152092 mono-sgen`mono_threads_summarize(ctx=0x000000010effda00, out=0x000000010effdba0, hashes=0x000000010effdb90, silent=0, signal_handler_controller=1, mem=0x0000000000000000, provided_size=0) at threads.c:6508
    frame #2: 0x000000010df7c69f mono-sgen`dump_native_stacktrace(signal="SIGSEGV", ctx=0x000000010effef48) at mini-posix.c:1026
    frame #3: 0x000000010df7c37f mono-sgen`mono_dump_native_crash_info(signal="SIGSEGV", ctx=0x000000010effef48, info=0x000000010effeee0) at mini-posix.c:1147
    frame #4: 0x000000010de720a9 mono-sgen`mono_handle_native_crash(signal="SIGSEGV", ctx=0x000000010effef48, info=0x000000010effeee0) at mini-exceptions.c:3227
    frame #5: 0x000000010dd6ac0d mono-sgen`mono_sigsegv_signal_handler_debug(_dummy=11, _info=0x000000010effeee0, context=0x000000010effef48, debug_fault_addr=0xffffffffffffffff) at mini-runtime.c:3574
    frame #6: 0x000000010dd6a8d3 mono-sgen`mono_sigsegv_signal_handler(_dummy=11, _info=0x000000010effeee0, context=0x000000010effef48) at mini-runtime.c:3612
    frame #7: 0x00007fff73dbdf5a libsystem_platform.dylib`_sigtramp + 26
    frame #8: 0x0000000110bb81c1
    frame #9: 0x000000011085ffe1
    frame #10: 0x000000010dd6d4f3 mono-sgen`mono_jit_runtime_invoke(method=0x00007faae4f01fe8, obj=0x0000000000000000, params=0x00007ffee1eaa180, exc=0x00007ffee1ea9f08, error=0x00007ffee1eaa250) at mini-runtime.c:3215
    frame #11: 0x000000010e11509d mono-sgen`do_runtime_invoke(method=0x00007faae4f01fe8, obj=0x0000000000000000, params=0x00007ffee1eaa180, exc=0x0000000000000000, error=0x00007ffee1eaa250) at object.c:2977
    frame #12: 0x000000010e10d961 mono-sgen`mono_runtime_invoke_checked(method=0x00007faae4f01fe8, obj=0x0000000000000000, params=0x00007ffee1eaa180, error=0x00007ffee1eaa250) at object.c:3145
    frame #13: 0x000000010e11aa58 mono-sgen`do_exec_main_checked(method=0x00007faae4f01fe8, args=0x000000010f0003e8, error=0x00007ffee1eaa250) at object.c:5042
    frame #14: 0x000000010e118803 mono-sgen`mono_runtime_exec_main_checked(method=0x00007faae4f01fe8, args=0x000000010f0003e8, error=0x00007ffee1eaa250) at object.c:5138
    frame #15: 0x000000010e118856 mono-sgen`mono_runtime_run_main_checked(method=0x00007faae4f01fe8, argc=2, argv=0x00007ffee1eaa760, error=0x00007ffee1eaa250) at object.c:4599
    frame #16: 0x000000010de1db2f mono-sgen`mono_jit_exec_internal(domain=0x00007faae4f00860, assembly=0x00007faae4c02ab0, argc=2, argv=0x00007ffee1eaa760) at driver.c:1298
    frame #17: 0x000000010de1d95d mono-sgen`mono_jit_exec(domain=0x00007faae4f00860, assembly=0x00007faae4c02ab0, argc=2, argv=0x00007ffee1eaa760) at driver.c:1257
    frame #18: 0x000000010de2257f mono-sgen`main_thread_handler(user_data=0x00007ffee1eaa6a0) at driver.c:1375
    frame #19: 0x000000010de20852 mono-sgen`mono_main(argc=3, argv=0x00007ffee1eaa758) at driver.c:2551
    frame #20: 0x000000010dd56d7e mono-sgen`mono_main_with_options(argc=3, argv=0x00007ffee1eaa758) at main.c:50
    frame #21: 0x000000010dd5638d mono-sgen`main(argc=3, argv=0x00007ffee1eaa758) at main.c:406
    frame #22: 0x00007fff73aaf015 libdyld.dylib`start + 1
    frame #23: 0x00007fff73aaf015 libdyld.dylib`start + 1
  thread #2, name = 'SGen worker'
    frame #0: 0x000000010e2afd77 mono-sgen`mono_get_hazardous_pointer(pp=0x0000000000000178, hp=0x000000010ef87618, hazard_index=0) at hazard-pointer.c:208
    frame #1: 0x000000010e0b28e1 mono-sgen`mono_jit_info_table_find_internal(domain=0x0000000000000000, addr=0x00007fff73bffa16, try_aot=1, allow_trampolines=1) at jit-info.c:304
    frame #2: 0x000000010dd6aa5f mono-sgen`mono_sigsegv_signal_handler_debug(_dummy=11, _info=0x000070000fb81c58, context=0x000070000fb81cc0, debug_fault_addr=0x000000010e28fb20) at mini-runtime.c:3540
    frame #3: 0x000000010dd6a8d3 mono-sgen`mono_sigsegv_signal_handler(_dummy=11, _info=0x000070000fb81c58, context=0x000070000fb81cc0) at mini-runtime.c:3612
    frame #4: 0x00007fff73dbdf5a libsystem_platform.dylib`_sigtramp + 26
    frame #5: 0x00007fff73bffa17 libsystem_kernel.dylib`__psynch_cvwait + 11
    frame #6: 0x00007fff73dc8589 libsystem_pthread.dylib`_pthread_cond_wait + 732
    frame #7: 0x000000010e28d76d mono-sgen`mono_os_cond_wait(cond=0x000000010e44c9d8, mutex=0x000000010e44c998) at mono-os-mutex.h:168
    frame #8: 0x000000010e28df4f mono-sgen`get_work(worker_index=0, work_context=0x000070000fb81ee0, do_idle=0x000070000fb81ed4, job=0x000070000fb81ec8) at sgen-thread-pool.c:165
    frame #9: 0x000000010e28d2cb mono-sgen`thread_func(data=0x0000000000000000) at sgen-thread-pool.c:196
    frame #10: 0x00007fff73dc7661 libsystem_pthread.dylib`_pthread_body + 340
    frame #11: 0x00007fff73dc750d libsystem_pthread.dylib`_pthread_start + 377
    frame #12: 0x00007fff73dc6bf9 libsystem_pthread.dylib`thread_start + 13
  thread #3, name = 'Finalizer'
    frame #0: 0x00007fff73bf6246 libsystem_kernel.dylib`semaphore_wait_trap + 10
    frame #1: 0x000000010e1d9c0a mono-sgen`mono_os_sem_wait(sem=0x000000010e43e400, flags=MONO_SEM_FLAGS_ALERTABLE) at mono-os-semaphore.h:84
    frame #2: 0x000000010e1d832d mono-sgen`mono_coop_sem_wait(sem=0x000000010e43e400, flags=MONO_SEM_FLAGS_ALERTABLE) at mono-coop-semaphore.h:41
    frame #3: 0x000000010e1da787 mono-sgen`finalizer_thread(unused=0x0000000000000000) at gc.c:920
    frame #4: 0x000000010e152919 mono-sgen`start_wrapper_internal(start_info=0x0000000000000000, stack_ptr=0x000070000fd85000) at threads.c:1178
    frame #5: 0x000000010e1525b6 mono-sgen`start_wrapper(data=0x00007faae4f31bd0) at threads.c:1238
    frame #6: 0x00007fff73dc7661 libsystem_pthread.dylib`_pthread_body + 340
    frame #7: 0x00007fff73dc750d libsystem_pthread.dylib`_pthread_start + 377
    frame #8: 0x00007fff73dc6bf9 libsystem_pthread.dylib`thread_start + 13
  thread #4
    frame #0: 0x00007fff73c0028a libsystem_kernel.dylib`__workq_kernreturn + 10
    frame #1: 0x00007fff73dc7009 libsystem_pthread.dylib`_pthread_wqthread + 1035
    frame #2: 0x00007fff73dc6be9 libsystem_pthread.dylib`start_wqthread + 13
(lldb)
```
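
In the 'SGen worker' thread of the trace above, `mono_jit_info_table_find_internal` is entered with `domain=0x0000000000000000` from the SIGSEGV handler. The guard this implies can be sketched as follows, assuming simplified stand-in types. `Domain`, `JitInfo`, `jit_info_find`, and `jit_info_find_checked` are hypothetical names for illustration, not the runtime's real definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the runtime's JIT info and domain types. */
typedef struct { const char *code_start; size_t code_size; } JitInfo;
typedef struct { JitInfo entry; } Domain;

/* Hypothetical lookup that dereferences the domain, so it must never
 * be handed a NULL domain. */
static const JitInfo *jit_info_find(const Domain *domain, const void *addr)
{
    assert(domain != NULL);
    const char *p = (const char *)addr;
    const JitInfo *ji = &domain->entry;
    if (p >= ji->code_start && p < ji->code_start + ji->code_size)
        return ji;
    return NULL;
}

/* Guarded variant for the signal handler: a thread without a domain
 * simply has no JIT info to report. */
static const JitInfo *jit_info_find_checked(const Domain *domain, const void *addr)
{
    if (domain == NULL)
        return NULL;
    return jit_info_find(domain, addr);
}
```

Returning NULL for a missing domain lets the handler treat the fault as one in native code rather than faulting a second time inside the handler.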
Each frame that prints grows by the size of buff.
In practice, clang often fails to deduplicate some of these buffers,
leading to 30k stack frames.

This was noticed through a series of hard-to-diagnose segfaults on stacks
that otherwise looked fine during the crash reporting stress test.

This change fixes that, shrinking the stacks to a tenth of their size. It
doesn't seem to break the crash reporter messages anywhere (we may need to
shrink other "max name length" fields), and it's not mission-critical
anywhere else.
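
As a rough illustration of why the buffer size matters, the sketch below formats a frame name into a fixed-size local buffer. The local array is part of the function's stack frame, so its length sets a floor on the frame size. The names `NEW_NAME_BUF_LEN`, `format_frame_name`, and `print_frame` and the value 100 are assumptions for illustration, not the values used in this change:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical length; the description only says stacks shrink to a tenth. */
#define NEW_NAME_BUF_LEN 100

/* Formats "module`symbol" into out. snprintf never writes more than len
 * bytes, so a smaller buffer truncates instead of overflowing.
 * Returns the number of characters actually stored. */
static size_t format_frame_name(char *out, size_t len,
                                const char *module, const char *symbol)
{
    assert(out != NULL || len == 0);
    if (len == 0)
        return 0;
    int n = snprintf(out, len, "%s`%s", module, symbol);
    if (n < 0) {
        out[0] = '\0';
        return 0;
    }
    return (size_t)n < len ? (size_t)n : len - 1;
}

/* The local buff lives in this function's stack frame, so every frame
 * that prints pays for its full size. */
static size_t print_frame(const char *module, const char *symbol)
{
    char buff[NEW_NAME_BUF_LEN];
    size_t n = format_frame_name(buff, sizeof buff, module, symbol);
    fputs(buff, stderr);
    fputc('\n', stderr);
    return n;
}
```

Names longer than the buffer are cut short rather than rejected, which is the trade-off behind the note about "max name length" fields.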
@alexanderkyte
Contributor Author

@monojenkins build deb with monolite

@alexanderkyte
Contributor Author

@monojenkins build failed

@alexanderkyte
Contributor Author

I'm having difficulty reproducing the failure. The logs on CI make it seem like the gdb phase is running well after the rest of it closes. After adding some assertions locally, I can see some weird behavior around waitpid, but I can't force it to act anywhere near what the crash is showing.

@alexanderkyte
Contributor Author

alexanderkyte commented Jan 24, 2019

The logs here:

	0x3 - Unknown

=================================================================
	Telemetry Dumper:
=================================================================
Pkilling 0x70000ff9f000 from 0x7fffc56d73c0
Entering thread summarizer pause from 0x7fffc56d73c0
Finished thread summarizer pause from 0x7fffc56d73c0.

Waiting for dumping threads to resume

The MERP upload step has failed.
Merp invoke command failed, expected failure?
Xml file <?xml version="1.0" encoding="UTF-8"?>
<WERReportMetadata>
<ProblemSignatures>
<EventType>AppleAppCrash</EventType>
<Parameter0>Test.Xam.Minimal</Parameter0>
<Parameter1>123456</Parameter1>
<Parameter2>x64</Parameter2>
<Parameter3>com.xam.Minimal</Parameter3>
<Parameter4>Mono Exception</Parameter4>
<Parameter5>5.20.0 (pull-request-12571/9af70bcc82c Thu Jan 24 01:11:29 EST 2019)</Parameter5>
<Parameter6>0x0</Parameter6>

imply a crash happening after

https://github.com/mono/mono/blob/master/mono/mini/mini-posix.c#L1090

But since the "managed stacktrace" is never printed, it must be crashing before the dumper returns. We don't see any gdb or lldb output, but that may be because all of the lldb/gdb output gets moved to the end of the test output file on Jenkins. The same thing happened with the other jobs that seemed to finish successfully.

Since we're not seeing the JSON file printed, the file was not created. The XML file also seems not to have been made.

The "Merp invoke command failed, expected failure?" line in the log means that we managed to hit the log statement for invoking merp.

https://github.com/mono/mono/blob/master/mono/utils/mono-merp.c#L497

This contrasts with the fact that we can't find the relevant files. I'm going to log those file paths in the output and see if CI shows anything weird.

@alexanderkyte
Copy link
Contributor Author

I've run this like 3,000 times locally on OSX x64 and have not seen any failures.

@luhenry luhenry merged commit 0255d3c into mono:2018-10 Jan 28, 2019