@alexanderkyte alexanderkyte commented Dec 19, 2018

The added test caught a lot of failures. Before these changes it failed on most runs: about 90% of executions crashed in the seq-point tables, the JIT info tables, or the gdb dumper.

After these changes, I was able to run the suite 10k times on OSX and 10k times on Linux without seeing a failure.

This brings some crash format changes too:

https://gist.github.com/alexanderkyte/72c12f0450513f079189e3e6561db502

I think this is a good idea. If we've got so many crash-time state reporters, having visual demarcation between them makes it easier to say "we crashed in the middle of doing X".

Please don't squash these changes, as they're rather varied, and would not offer a single semantic unit to be rebased around and reverted.

@alexanderkyte
Contributor Author

alexanderkyte commented Dec 22, 2018

So I managed to get it reporting "no crash" most of the time (50 successful runs in a row on OSX). I ran into a few issues with our sequence point code and with our JIT lookup code (outside of the crash reporter path) crashing sometimes.

These tests really showed just how unstable a lot of the other code along the native crash path is. It's interesting to me that subjectively, it doesn't appear to be very unstable. When run in a loop like this, even with crash reporting compiled out of the runtime, you see a ton of failures (1 per 200-300 runs).

Most of them seem non-reproducible and related to sequence point and JIT internal memory. It would be a good idea to run this "stress test" a lot and to file the resulting failures as bugs for the bug pool.
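
The kind of loop described above can be sketched in C roughly as follows. This is a minimal illustration, not the harness used in this PR: `stress_count_failures` is a hypothetical helper that runs a command repeatedly and counts every run that exits nonzero or is killed by a signal.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] (with its arguments) `runs` times and
 * count the runs that did not exit cleanly, i.e. a nonzero exit status or
 * death by signal. Returns -1 if a child could not be created or reaped. */
int stress_count_failures(char *const argv[], int runs)
{
    int failures = 0;

    assert(argv != NULL && argv[0] != NULL);

    for (int i = 0; i < runs; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            execvp(argv[0], argv);
            _exit(127); /* exec failed; the parent counts this as a failure */
        }

        int status = 0;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failures++;
    }
    return failures;
}
```

Pointing it at the test binary with a large `runs` value gives a failure count that can be compared against the "1 per 200-300 runs" rate quoted above.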

@alexanderkyte
Contributor Author

I'm going to talk to one of the MERP folks and get some kind of schema I can assert against the output config files for this test. If we can cover the file contents, we should remove any bug surface for integration "surprises".

@alexanderkyte
Contributor Author

Really rare but still interesting crash:

(lldb) bt all
warning: could not execute support code to read Objective-C class data in the process. This may reduce the quality of type information available.
* thread #1, name = 'tid_307', queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
  * frame #0: 0x00000001119b9360 dyld`dladdr + 102
    frame #1: 0x00007fff73ab0caf libdyld.dylib`dladdr + 103
    frame #2: 0x000000010c34523c mono-sgen`mono_get_portable_ip(in_ip=4537948628, out_ip=0x00007ffee39d3be8, out_offset=0x00007ffee39d3bf0, out_module=0x00007ffee39d3bf8, out_name="") at mini-exceptions.c:1426
    frame #3: 0x000000010c33cbc9 mono-sgen`mono_summarize_unmanaged_stack(out=0x00007ffee39cf280) at mini-exceptions.c:1638
    frame #4: 0x000000010c62134f mono-sgen`mono_threads_summarize_native_self(out=0x00007ffee39cf280, ctx=0x00007ffee39d7b30) at threads.c:6100
    frame #5: 0x000000010c62143d mono-sgen`mono_threads_summarize_execute(ctx=0x00007ffee39d7b30, out=0x00007ffee39d7cd0, hashes=0x00007ffee39d7cc0, silent=0, mem=0x0000000000000000, provided_size=0) at threads.c:6408
    frame #6: 0x000000010c622052 mono-sgen`mono_threads_summarize(ctx=0x00007ffee39d7b30, out=0x00007ffee39d7cd0, hashes=0x00007ffee39d7cc0, silent=0, signal_handler_controller=1, mem=0x0000000000000000, provided_size=0) at threads.c:6487
    frame #7: 0x000000010c44d5a6 mono-sgen`dump_native_stacktrace(signal="SIGABRT", ctx=0x00007ffee39d8a90) at mini-posix.c:1017
    frame #8: 0x000000010c44d3cf mono-sgen`mono_dump_native_crash_info(signal="SIGABRT", ctx=0x00007ffee39d8a90, info=0x00007ffee39d8a28) at mini-posix.c:1129
    frame #9: 0x000000010c3430d9 mono-sgen`mono_handle_native_crash(signal="SIGABRT", ctx=0x00007ffee39d8a90, info=0x00007ffee39d8a28) at mini-exceptions.c:3227
    frame #10: 0x000000010c44c790 mono-sgen`sigabrt_signal_handler(_dummy=6, _info=0x00007ffee39d8a28, context=0x00007ffee39d8a90) at mini-posix.c:223
    frame #11: 0x00007fff73dbdf5a libsystem_platform.dylib`_sigtramp + 26
    frame #12: 0x00007fff73bffb67 libsystem_kernel.dylib`__pthread_kill + 11
    frame #13: 0x00007fff73dca080 libsystem_pthread.dylib`pthread_kill + 333
    frame #14: 0x00007fff73b5b1ae libsystem_c.dylib`abort + 127
    frame #15: 0x00007fff73c59822 libsystem_malloc.dylib`free + 521
    frame #16: 0x000000010e7b99d4 libtest.0.dylib`monoeg_g_free(ptr=0x00007ffee39d8c34) at gmem.c:86
    frame #17: 0x000000010e7b333b libtest.0.dylib`mono_test_MerpCrashMalloc at libtest.c:7710
    frame #18: 0x000000010e798288
    frame #19: 0x000000010cbd5fd1
    frame #20: 0x000000010c23e563 mono-sgen`mono_jit_runtime_invoke(method=0x00007f86bc60a2e8, obj=0x0000000000000000, params=0x00007ffee39d9180, exc=0x00007ffee39d8f08, error=0x00007ffee39d9250) at mini-runtime.c:3215
    frame #21: 0x000000010c5e535d mono-sgen`do_runtime_invoke(method=0x00007f86bc60a2e8, obj=0x0000000000000000, params=0x00007ffee39d9180, exc=0x0000000000000000, error=0x00007ffee39d9250) at object.c:2977
    frame #22: 0x000000010c5ddc21 mono-sgen`mono_runtime_invoke_checked(method=0x00007f86bc60a2e8, obj=0x0000000000000000, params=0x00007ffee39d9180, error=0x00007ffee39d9250) at object.c:3145
    frame #23: 0x000000010c5ead18 mono-sgen`do_exec_main_checked(method=0x00007f86bc60a2e8, args=0x000000010cc003e8, error=0x00007ffee39d9250) at object.c:5042
    frame #24: 0x000000010c5e8ac3 mono-sgen`mono_runtime_exec_main_checked(method=0x00007f86bc60a2e8, args=0x000000010cc003e8, error=0x00007ffee39d9250) at object.c:5138
    frame #25: 0x000000010c5e8b16 mono-sgen`mono_runtime_run_main_checked(method=0x00007f86bc60a2e8, argc=2, argv=0x00007ffee39d9760, error=0x00007ffee39d9250) at object.c:4599
    frame #26: 0x000000010c2eeb9f mono-sgen`mono_jit_exec_internal(domain=0x00007f86bc608c20, assembly=0x00007f86bc7004f0, argc=2, argv=0x00007ffee39d9760) at driver.c:1298
    frame #27: 0x000000010c2ee9cd mono-sgen`mono_jit_exec(domain=0x00007f86bc608c20, assembly=0x00007f86bc7004f0, argc=2, argv=0x00007ffee39d9760) at driver.c:1257
    frame #28: 0x000000010c2f35af mono-sgen`main_thread_handler(user_data=0x00007ffee39d96a0) at driver.c:1375
    frame #29: 0x000000010c2f187d mono-sgen`mono_main(argc=3, argv=0x00007ffee39d9758) at driver.c:2551
    frame #30: 0x000000010c227dee mono-sgen`mono_main_with_options(argc=3, argv=0x00007ffee39d9758) at main.c:50
    frame #31: 0x000000010c2273fd mono-sgen`main(argc=3, argv=0x00007ffee39d9758) at main.c:406
    frame #32: 0x00007fff73aaf015 libdyld.dylib`start + 1
    frame #33: 0x00007fff73aaf015 libdyld.dylib`start + 1
  thread #2, name = 'SGen worker'
    frame #0: 0x00007fff73bffa16 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff73dc8589 libsystem_pthread.dylib`_pthread_cond_wait + 732
    frame #2: 0x000000010c75d6ad mono-sgen`mono_os_cond_wait(cond=0x000000010c88a398, mutex=0x000000010c88a358) at mono-os-mutex.h:168
    frame #3: 0x000000010c75de8f mono-sgen`get_work(worker_index=0, work_context=0x0000700005651ee0, do_idle=0x0000700005651ed4, job=0x0000700005651ec8) at sgen-thread-pool.c:165
    frame #4: 0x000000010c75d20b mono-sgen`thread_func(data=0x0000000000000000) at sgen-thread-pool.c:196
    frame #5: 0x00007fff73dc7661 libsystem_pthread.dylib`_pthread_body + 340
    frame #6: 0x00007fff73dc750d libsystem_pthread.dylib`_pthread_start + 377
    frame #7: 0x00007fff73dc6bf9 libsystem_pthread.dylib`thread_start + 13
  thread #3, name = 'Finalizer'
    frame #0: 0x00007fff73bffa46 libsystem_kernel.dylib`__psynch_mutexwait + 10
    frame #1: 0x00007fff73dc7b9d libsystem_pthread.dylib`_pthread_mutex_lock_wait + 83
    frame #2: 0x00007fff73dc54c8 libsystem_pthread.dylib`_pthread_mutex_lock_slow + 253
    frame #3: 0x00007fff73aaf140 libdyld.dylib`LockHelper::LockHelper() + 16
    frame #4: 0x00007fff73ab0c81 libdyld.dylib`dladdr + 57
    frame #5: 0x000000010c34523c mono-sgen`mono_get_portable_ip(in_ip=140735137213433, out_ip=0x0000700005850f18, out_offset=0x0000700005850f20, out_module=0x0000700005850f28, out_name="") at mini-exceptions.c:1426
    frame #6: 0x000000010c33cbc9 mono-sgen`mono_summarize_unmanaged_stack(out=0x000070000584c5b0) at mini-exceptions.c:1638
    frame #7: 0x000000010c62134f mono-sgen`mono_threads_summarize_native_self(out=0x000070000584c5b0, ctx=0x0000700005854578) at threads.c:6100
    frame #8: 0x000000010c62143d mono-sgen`mono_threads_summarize_execute(ctx=0x0000700005854578, out=0x0000700005854570, hashes=0x0000700005854560, silent=0, mem=0x0000000000000000, provided_size=0) at threads.c:6408
    frame #9: 0x000000010c44c274 mono-sgen`sigterm_signal_handler(_dummy=15, _info=0x0000700005854b58, context=0x0000700005854bc0) at mini-posix.c:240
    frame #10: 0x00007fff73dbdf5a libsystem_platform.dylib`_sigtramp + 26
    frame #11: 0x00007fff73bf6247 libsystem_kernel.dylib`semaphore_wait_trap + 11
    frame #12: 0x000000010c6a9b4a mono-sgen`mono_os_sem_wait(sem=0x000000010c87bdc0, flags=MONO_SEM_FLAGS_ALERTABLE) at mono-os-semaphore.h:84
    frame #13: 0x000000010c6a826d mono-sgen`mono_coop_sem_wait(sem=0x000000010c87bdc0, flags=MONO_SEM_FLAGS_ALERTABLE) at mono-coop-semaphore.h:41
    frame #14: 0x000000010c6aa6c7 mono-sgen`finalizer_thread(unused=0x0000000000000000) at gc.c:920
    frame #15: 0x000000010c6228d9 mono-sgen`start_wrapper_internal(start_info=0x0000000000000000, stack_ptr=0x0000700005855000) at threads.c:1178
    frame #16: 0x000000010c622576 mono-sgen`start_wrapper(data=0x00007f86bc639d00) at threads.c:1238
    frame #17: 0x00007fff73dc7661 libsystem_pthread.dylib`_pthread_body + 340
    frame #18: 0x00007fff73dc750d libsystem_pthread.dylib`_pthread_start + 377
    frame #19: 0x00007fff73dc6bf9 libsystem_pthread.dylib`thread_start + 13
  thread #4
    frame #0: 0x00007fff73c0028a libsystem_kernel.dylib`__workq_kernreturn + 10
    frame #1: 0x00007fff73dc7009 libsystem_pthread.dylib`_pthread_wqthread + 1035
    frame #2: 0x00007fff73dc6be9 libsystem_pthread.dylib`start_wqthread + 13

@alexanderkyte
Contributor Author

Looks like condvar waiting is inherently dangerous because segfaults on OSX are sent to every thread waiting on a condvar, along with the thread that triggered the segfault. Doesn't seem to impact semaphores. I had noticed this a while back when it was suggested I use condvars for the dumper; it didn't really work. That's why we're using semaphores now.

  thread #1, name = 'tid_307', queue = 'com.apple.main-thread'
    frame #0: 0x0000000105d4c0d9 mono-sgen`mono_threads_summarize_execute(ctx=0x0000000000000000, out=0x0000001000000000, hashes=0x0000100000100000, silent=4096, mem="", provided_size=2199023296512) at threads.c:6414
    frame #1: 0x0000000105d4d092 mono-sgen`mono_threads_summarize(ctx=0x0000000106bf9a00, out=0x0000000106bf9ba0, hashes=0x0000000106bf9b90, silent=0, signal_handler_controller=1, mem=0x0000000000000000, provided_size=0) at threads.c:6508
    frame #2: 0x0000000105b7769f mono-sgen`dump_native_stacktrace(signal="SIGSEGV", ctx=0x0000000106bfaf48) at mini-posix.c:1026
    frame #3: 0x0000000105b7737f mono-sgen`mono_dump_native_crash_info(signal="SIGSEGV", ctx=0x0000000106bfaf48, info=0x0000000106bfaee0) at mini-posix.c:1147
    frame #4: 0x0000000105a6d0a9 mono-sgen`mono_handle_native_crash(signal="SIGSEGV", ctx=0x0000000106bfaf48, info=0x0000000106bfaee0) at mini-exceptions.c:3227
    frame #5: 0x0000000105965c0d mono-sgen`mono_sigsegv_signal_handler_debug(_dummy=11, _info=0x0000000106bfaee0, context=0x0000000106bfaf48, debug_fault_addr=0xffffffffffffffff) at mini-runtime.c:3574
    frame #6: 0x00000001059658d3 mono-sgen`mono_sigsegv_signal_handler(_dummy=11, _info=0x0000000106bfaee0, context=0x0000000106bfaf48) at mini-runtime.c:3612
    frame #7: 0x00007fff73dbdf5a libsystem_platform.dylib`_sigtramp + 26
    frame #8: 0x00000001087b81c1
    frame #9: 0x000000010845ffd1
    frame #10: 0x00000001059684f3 mono-sgen`mono_jit_runtime_invoke(method=0x00007f954df085b8, obj=0x0000000000000000, params=0x00007ffeea2af180, exc=0x00007ffeea2aef08, error=0x00007ffeea2af250) at mini-runtime.c:3215
    frame #11: 0x0000000105d1009d mono-sgen`do_runtime_invoke(method=0x00007f954df085b8, obj=0x0000000000000000, params=0x00007ffeea2af180, exc=0x0000000000000000, error=0x00007ffeea2af250) at object.c:2977
    frame #12: 0x0000000105d08961 mono-sgen`mono_runtime_invoke_checked(method=0x00007f954df085b8, obj=0x0000000000000000, params=0x00007ffeea2af180, error=0x00007ffeea2af250) at object.c:3145
    frame #13: 0x0000000105d15a58 mono-sgen`do_exec_main_checked(method=0x00007f954df085b8, args=0x0000000106c003e8, error=0x00007ffeea2af250) at object.c:5042
    frame #14: 0x0000000105d13803 mono-sgen`mono_runtime_exec_main_checked(method=0x00007f954df085b8, args=0x0000000106c003e8, error=0x00007ffeea2af250) at object.c:5138
    frame #15: 0x0000000105d13856 mono-sgen`mono_runtime_run_main_checked(method=0x00007f954df085b8, argc=2, argv=0x00007ffeea2af760, error=0x00007ffeea2af250) at object.c:4599
    frame #16: 0x0000000105a18b2f mono-sgen`mono_jit_exec_internal(domain=0x00007f954df06e90, assembly=0x00007f954dd0c270, argc=2, argv=0x00007ffeea2af760) at driver.c:1298
    frame #17: 0x0000000105a1895d mono-sgen`mono_jit_exec(domain=0x00007f954df06e90, assembly=0x00007f954dd0c270, argc=2, argv=0x00007ffeea2af760) at driver.c:1257
    frame #18: 0x0000000105a1d57f mono-sgen`main_thread_handler(user_data=0x00007ffeea2af6a0) at driver.c:1375
    frame #19: 0x0000000105a1b852 mono-sgen`mono_main(argc=3, argv=0x00007ffeea2af758) at driver.c:2551
    frame #20: 0x0000000105951d7e mono-sgen`mono_main_with_options(argc=3, argv=0x00007ffeea2af758) at main.c:50
    frame #21: 0x000000010595138d mono-sgen`main(argc=3, argv=0x00007ffeea2af758) at main.c:406
    frame #22: 0x00007fff73aaf015 libdyld.dylib`start + 1
    frame #23: 0x00007fff73aaf015 libdyld.dylib`start + 1
* thread #2, name = 'SGen worker'
    frame #0: 0x0000000105eaad77 mono-sgen`mono_get_hazardous_pointer(pp=0x0000000000000178, hp=0x0000000106b84618, hazard_index=0) at hazard-pointer.c:208
    frame #1: 0x0000000105cad8e1 mono-sgen`mono_jit_info_table_find_internal(domain=0x0000000000000000, addr=0x00007fff73bffa16, try_aot=1, allow_trampolines=1) at jit-info.c:304
    frame #2: 0x0000000105965a5f mono-sgen`mono_sigsegv_signal_handler_debug(_dummy=11, _info=0x000070000d6fec58, context=0x000070000d6fecc0, debug_fault_addr=0x0000000105e8ab20) at mini-runtime.c:3540
    frame #3: 0x00000001059658d3 mono-sgen`mono_sigsegv_signal_handler(_dummy=11, _info=0x000070000d6fec58, context=0x000070000d6fecc0) at mini-runtime.c:3612
    frame #4: 0x00007fff73dbdf5a libsystem_platform.dylib`_sigtramp + 26
    frame #5: 0x00007fff73bffa17 libsystem_kernel.dylib`__psynch_cvwait + 11
    frame #6: 0x00007fff73dc8589 libsystem_pthread.dylib`_pthread_cond_wait + 732
  * frame #7: 0x0000000105e8876d mono-sgen`mono_os_cond_wait(cond=0x00000001060479d8, mutex=0x0000000106047998) at mono-os-mutex.h:168
    frame #8: 0x0000000105e88f4f mono-sgen`get_work(worker_index=0, work_context=0x000070000d6feee0, do_idle=0x000070000d6feed4, job=0x000070000d6feec8) at sgen-thread-pool.c:165
    frame #9: 0x0000000105e882cb mono-sgen`thread_func(data=0x0000000000000000) at sgen-thread-pool.c:196
    frame #10: 0x00007fff73dc7661 libsystem_pthread.dylib`_pthread_body + 340
    frame #11: 0x00007fff73dc750d libsystem_pthread.dylib`_pthread_start + 377
    frame #12: 0x00007fff73dc6bf9 libsystem_pthread.dylib`thread_start + 13
  thread #3, name = 'Finalizer'
    frame #0: 0x00007fff73bf6246 libsystem_kernel.dylib`semaphore_wait_trap + 10
    frame #1: 0x0000000105dd4c0a mono-sgen`mono_os_sem_wait(sem=0x0000000106039400, flags=MONO_SEM_FLAGS_ALERTABLE) at mono-os-semaphore.h:84
    frame #2: 0x0000000105dd332d mono-sgen`mono_coop_sem_wait(sem=0x0000000106039400, flags=MONO_SEM_FLAGS_ALERTABLE) at mono-coop-semaphore.h:41
    frame #3: 0x0000000105dd5787 mono-sgen`finalizer_thread(unused=0x0000000000000000) at gc.c:920
    frame #4: 0x0000000105d4d919 mono-sgen`start_wrapper_internal(start_info=0x0000000000000000, stack_ptr=0x000070000d902000) at threads.c:1178
    frame #5: 0x0000000105d4d5b6 mono-sgen`start_wrapper(data=0x00007f954f906a70) at threads.c:1238
    frame #6: 0x00007fff73dc7661 libsystem_pthread.dylib`_pthread_body + 340
    frame #7: 0x00007fff73dc750d libsystem_pthread.dylib`_pthread_start + 377
    frame #8: 0x00007fff73dc6bf9 libsystem_pthread.dylib`thread_start + 13
  thread #4
    frame #0: 0x00007fff73c0028a libsystem_kernel.dylib`__workq_kernreturn + 10
    frame #1: 0x00007fff73dc7009 libsystem_pthread.dylib`_pthread_wqthread + 1035
    frame #2: 0x00007fff73dc6be9 libsystem_pthread.dylib`start_wqthread + 13

@alexanderkyte
Contributor Author

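Looks like forking isn't async-signal-safe on OSX: fork runs libSystem's atfork child handler, which calls malloc (frames #0 through #8 below). No suggested workarounds.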

(lldb) bt
warning: could not execute support code to read Objective-C class data in the process. This may reduce the quality of type information available.
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
  * frame #0: 0x00007fff73c59227 libsystem_malloc.dylib`tiny_malloc_from_free_list + 148
    frame #1: 0x00007fff73c583bf libsystem_malloc.dylib`szone_malloc_should_clear + 422
    frame #2: 0x00007fff73c581bd libsystem_malloc.dylib`malloc_zone_malloc + 103
    frame #3: 0x00007fff73c574c7 libsystem_malloc.dylib`malloc + 24
    frame #4: 0x00007fff73dbb13a libsystem_notify.dylib`_nc_table_new + 20
    frame #5: 0x00007fff73db715b libsystem_notify.dylib`_notify_init_globals + 47
    frame #6: 0x00007fff73db719f libsystem_notify.dylib`_notify_fork_child + 35
    frame #7: 0x00007fff717c9b13 libSystem.B.dylib`libSystem_atfork_child + 49
    frame #8: 0x00007fff73b0e683 libsystem_c.dylib`fork + 47
    frame #9: 0x000000010d8eb4e6 mono-sgen`mono_merp_send(merp=0x00007ffee285c588) at mono-merp.c:266
    frame #10: 0x000000010d8e9057 mono-sgen`mono_merp_invoke(crashed_pid=97389, signal="SIGABRT", non_param_data="{\n  \"protocol_version\" : \"0.0.2\",\n  \"configuration\" : {\n    \"version\" : \"(5.23.0) (mono_merp_crashed/c100817b22a)\",\n    \"tlc\" : \"normal\",\n    \"sigsgev\" : \"altstack\",\n    \"notifications\" : \"kqueue\",\n    \"architecture\" : \"amd64\",\n    \"disabled_features\" : \"none\",\n    \"smallconfig\" : \"disabled\",\n    \"bigarrays\" : \"disabled\",\n    \"softdebug\" : \"enabled\",\n    \"interpreter\" : \"enabled\",\n    \"llvm_support\" : \"disabled\",\n    \"suspend\" : \"hybrid\"\n  },\n  \"memory\" : {\n    \"Resident Size\" : \"17739776\",\n    \"Virtual Size\" : \"4532985856\",\n    \"minor_gc_time\" : \"0\",\n    \"major_gc_time\" : \"0\",\n    \"minor_gc_count\" : \"0\",\n    \"major_gc_count\" : \"0\",\n    \"major_gc_time_concurrent\" : \"0\"\n },\n  \"threads\" : [\n {\n    \"is_managed\" : false,\n    \"crashed\" : true,\n    \"managed_thread_ptr\" : \"0x0\",\n    \"native_thread_id\" : \"0x7fffac5b7380\",\n    \"thread_info_addr\" : \"0x0\",\n    \"thread_name\" : \"tid_307\",\n    \"ctx\" : {\n      \"IP\" : \"0x7fff73bffb66\",\n      \"SP\" : \"0x7ffee285db48\",\n      \"BP\" : \"0x7ffee285db80\"\n  },\n    \"managed_frames\" : ["..., hashes=0x00007ffee285c8c0) at mono-merp.c:477
    frame #11: 0x000000010d5c8820 mono-sgen`dump_native_stacktrace(signal="SIGABRT", ctx=0x00007ffee285da90) at mini-posix.c:1081
    frame #12: 0x000000010d5c837f mono-sgen`mono_dump_native_crash_info(signal="SIGABRT", ctx=0x00007ffee285da90, info=0x00007ffee285da28) at mini-posix.c:1147
    frame #13: 0x000000010d4be0a9 mono-sgen`mono_handle_native_crash(signal="SIGABRT", ctx=0x00007ffee285da90, info=0x00007ffee285da28) at mini-exceptions.c:3227
    frame #14: 0x000000010d5c7740 mono-sgen`sigabrt_signal_handler(_dummy=6, _info=0x00007ffee285da28, context=0x00007ffee285da90) at mini-posix.c:223
    frame #15: 0x00007fff73dbdf5a libsystem_platform.dylib`_sigtramp + 26
    frame #16: 0x00007fff73bffb67 libsystem_kernel.dylib`__pthread_kill + 11
    frame #17: 0x00007fff73dca080 libsystem_pthread.dylib`pthread_kill + 333
    frame #18: 0x00007fff73b5b1ae libsystem_c.dylib`abort + 127
    frame #19: 0x00007fff73c59822 libsystem_malloc.dylib`free + 521
    frame #20: 0x000000010e755a90 libtest.0.dylib`monoeg_g_free(ptr=0x00007ffee285dc34) at gmem.c:86
    frame #21: 0x000000010e74f50b libtest.0.dylib`mono_test_MerpCrashMalloc at libtest.c:7710
    frame #22: 0x000000010e734288
    frame #23: 0x000000010e649fd1
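
The mechanism visible in frames #4 through #8 is that handlers registered for the child side of `fork` run inside the `fork` call itself, before it returns in the child. A minimal sketch of that behavior, using the standard `pthread_atfork` and a hypothetical `child_handler` that stands in for `libSystem_atfork_child`:

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1; /* write end of a pipe, set before fork */

/* Child-side atfork handler. It only calls write(), which is
 * async-signal-safe; the handler in the trace above ends up in malloc. */
static void child_handler(void)
{
    if (report_fd >= 0) {
        ssize_t n = write(report_fd, "c", 1);
        (void)n;
    }
}

/* Returns 1 if the child handler ran as part of fork(), 0 if it did not,
 * and -1 on error. */
int atfork_child_handler_runs(void)
{
    int fds[2];
    char byte = 0;

    if (pipe(fds) != 0)
        return -1;
    report_fd = fds[1];
    if (pthread_atfork(NULL, NULL, child_handler) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0); /* child: the handler already ran inside fork() */

    close(fds[1]);
    report_fd = -1;
    ssize_t n = read(fds[0], &byte, 1);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return (n == 1 && byte == 'c') ? 1 : 0;
}
```

If such a handler allocates, as in the trace, then calling `fork` from a crash handler after the heap has been corrupted can fault before the child gets a chance to do anything.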

@alexanderkyte
Contributor Author

* thread #1, name = 'tid_307', queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x10eff40f8)
  * frame #0: 0x000000010e1510d9 mono-sgen`mono_threads_summarize_execute(ctx=0x0000000000000000, out=0x0000001000000000, hashes=0x0000100000100000, silent=4096, mem="", provided_size=2199023296512) at threads.c:6414
    frame #1: 0x000000010e152092 mono-sgen`mono_threads_summarize(ctx=0x000000010effda00, out=0x000000010effdba0, hashes=0x000000010effdb90, silent=0, signal_handler_controller=1, mem=0x0000000000000000, provided_size=0) at threads.c:6508
    frame #2: 0x000000010df7c69f mono-sgen`dump_native_stacktrace(signal="SIGSEGV", ctx=0x000000010effef48) at mini-posix.c:1026
    frame #3: 0x000000010df7c37f mono-sgen`mono_dump_native_crash_info(signal="SIGSEGV", ctx=0x000000010effef48, info=0x000000010effeee0) at mini-posix.c:1147
    frame #4: 0x000000010de720a9 mono-sgen`mono_handle_native_crash(signal="SIGSEGV", ctx=0x000000010effef48, info=0x000000010effeee0) at mini-exceptions.c:3227
    frame #5: 0x000000010dd6ac0d mono-sgen`mono_sigsegv_signal_handler_debug(_dummy=11, _info=0x000000010effeee0, context=0x000000010effef48, debug_fault_addr=0xffffffffffffffff) at mini-runtime.c:3574
    frame #6: 0x000000010dd6a8d3 mono-sgen`mono_sigsegv_signal_handler(_dummy=11, _info=0x000000010effeee0, context=0x000000010effef48) at mini-runtime.c:3612
    frame #7: 0x00007fff73dbdf5a libsystem_platform.dylib`_sigtramp + 26
    frame #8: 0x0000000110bb81c1
    frame #9: 0x000000011085ffe1
    frame #10: 0x000000010dd6d4f3 mono-sgen`mono_jit_runtime_invoke(method=0x00007faae4f01fe8, obj=0x0000000000000000, params=0x00007ffee1eaa180, exc=0x00007ffee1ea9f08, error=0x00007ffee1eaa250) at mini-runtime.c:3215
    frame #11: 0x000000010e11509d mono-sgen`do_runtime_invoke(method=0x00007faae4f01fe8, obj=0x0000000000000000, params=0x00007ffee1eaa180, exc=0x0000000000000000, error=0x00007ffee1eaa250) at object.c:2977
    frame #12: 0x000000010e10d961 mono-sgen`mono_runtime_invoke_checked(method=0x00007faae4f01fe8, obj=0x0000000000000000, params=0x00007ffee1eaa180, error=0x00007ffee1eaa250) at object.c:3145
    frame #13: 0x000000010e11aa58 mono-sgen`do_exec_main_checked(method=0x00007faae4f01fe8, args=0x000000010f0003e8, error=0x00007ffee1eaa250) at object.c:5042
    frame #14: 0x000000010e118803 mono-sgen`mono_runtime_exec_main_checked(method=0x00007faae4f01fe8, args=0x000000010f0003e8, error=0x00007ffee1eaa250) at object.c:5138
    frame #15: 0x000000010e118856 mono-sgen`mono_runtime_run_main_checked(method=0x00007faae4f01fe8, argc=2, argv=0x00007ffee1eaa760, error=0x00007ffee1eaa250) at object.c:4599
    frame #16: 0x000000010de1db2f mono-sgen`mono_jit_exec_internal(domain=0x00007faae4f00860, assembly=0x00007faae4c02ab0, argc=2, argv=0x00007ffee1eaa760) at driver.c:1298
    frame #17: 0x000000010de1d95d mono-sgen`mono_jit_exec(domain=0x00007faae4f00860, assembly=0x00007faae4c02ab0, argc=2, argv=0x00007ffee1eaa760) at driver.c:1257
    frame #18: 0x000000010de2257f mono-sgen`main_thread_handler(user_data=0x00007ffee1eaa6a0) at driver.c:1375
    frame #19: 0x000000010de20852 mono-sgen`mono_main(argc=3, argv=0x00007ffee1eaa758) at driver.c:2551
    frame #20: 0x000000010dd56d7e mono-sgen`mono_main_with_options(argc=3, argv=0x00007ffee1eaa758) at main.c:50
    frame #21: 0x000000010dd5638d mono-sgen`main(argc=3, argv=0x00007ffee1eaa758) at main.c:406
    frame #22: 0x00007fff73aaf015 libdyld.dylib`start + 1
    frame #23: 0x00007fff73aaf015 libdyld.dylib`start + 1
  thread #2, name = 'SGen worker'
    frame #0: 0x000000010e2afd77 mono-sgen`mono_get_hazardous_pointer(pp=0x0000000000000178, hp=0x000000010ef87618, hazard_index=0) at hazard-pointer.c:208
    frame #1: 0x000000010e0b28e1 mono-sgen`mono_jit_info_table_find_internal(domain=0x0000000000000000, addr=0x00007fff73bffa16, try_aot=1, allow_trampolines=1) at jit-info.c:304
    frame #2: 0x000000010dd6aa5f mono-sgen`mono_sigsegv_signal_handler_debug(_dummy=11, _info=0x000070000fb81c58, context=0x000070000fb81cc0, debug_fault_addr=0x000000010e28fb20) at mini-runtime.c:3540
    frame #3: 0x000000010dd6a8d3 mono-sgen`mono_sigsegv_signal_handler(_dummy=11, _info=0x000070000fb81c58, context=0x000070000fb81cc0) at mini-runtime.c:3612
    frame #4: 0x00007fff73dbdf5a libsystem_platform.dylib`_sigtramp + 26
    frame #5: 0x00007fff73bffa17 libsystem_kernel.dylib`__psynch_cvwait + 11
    frame #6: 0x00007fff73dc8589 libsystem_pthread.dylib`_pthread_cond_wait + 732
    frame #7: 0x000000010e28d76d mono-sgen`mono_os_cond_wait(cond=0x000000010e44c9d8, mutex=0x000000010e44c998) at mono-os-mutex.h:168
    frame #8: 0x000000010e28df4f mono-sgen`get_work(worker_index=0, work_context=0x000070000fb81ee0, do_idle=0x000070000fb81ed4, job=0x000070000fb81ec8) at sgen-thread-pool.c:165
    frame #9: 0x000000010e28d2cb mono-sgen`thread_func(data=0x0000000000000000) at sgen-thread-pool.c:196
    frame #10: 0x00007fff73dc7661 libsystem_pthread.dylib`_pthread_body + 340
    frame #11: 0x00007fff73dc750d libsystem_pthread.dylib`_pthread_start + 377
    frame #12: 0x00007fff73dc6bf9 libsystem_pthread.dylib`thread_start + 13
  thread #3, name = 'Finalizer'
    frame #0: 0x00007fff73bf6246 libsystem_kernel.dylib`semaphore_wait_trap + 10
    frame #1: 0x000000010e1d9c0a mono-sgen`mono_os_sem_wait(sem=0x000000010e43e400, flags=MONO_SEM_FLAGS_ALERTABLE) at mono-os-semaphore.h:84
    frame #2: 0x000000010e1d832d mono-sgen`mono_coop_sem_wait(sem=0x000000010e43e400, flags=MONO_SEM_FLAGS_ALERTABLE) at mono-coop-semaphore.h:41
    frame #3: 0x000000010e1da787 mono-sgen`finalizer_thread(unused=0x0000000000000000) at gc.c:920
    frame #4: 0x000000010e152919 mono-sgen`start_wrapper_internal(start_info=0x0000000000000000, stack_ptr=0x000070000fd85000) at threads.c:1178
    frame #5: 0x000000010e1525b6 mono-sgen`start_wrapper(data=0x00007faae4f31bd0) at threads.c:1238
    frame #6: 0x00007fff73dc7661 libsystem_pthread.dylib`_pthread_body + 340
    frame #7: 0x00007fff73dc750d libsystem_pthread.dylib`_pthread_start + 377
    frame #8: 0x00007fff73dc6bf9 libsystem_pthread.dylib`thread_start + 13
  thread #4
    frame #0: 0x00007fff73c0028a libsystem_kernel.dylib`__workq_kernreturn + 10
    frame #1: 0x00007fff73dc7009 libsystem_pthread.dylib`_pthread_wqthread + 1035
    frame #2: 0x00007fff73dc6be9 libsystem_pthread.dylib`start_wqthread + 13
(lldb) 

Seeing this one more and more often; it may be one of the last big remaining problems.

@alexanderkyte
Contributor Author

I've got a partial fix for the handling of the double-signal situation. It won't hang now. May remain uncaught though.
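
One common way to keep a second signal from re-entering a crash reporter is a one-shot guard. The sketch below assumes C11 atomics and is not necessarily what this PR does; `crash_report_once`, `write_report` and `reports_written` are hypothetical names. `atomic_flag` is guaranteed lock-free, so it is usable from a signal handler.

```c
#include <stdatomic.h>

static atomic_flag crash_in_progress = ATOMIC_FLAG_INIT;

/* Hypothetical stand-in for the real reporting work. */
static int reports_written;
static void write_report(void)
{
    reports_written++;
}

/* Run `report` only for the first caller. Returns 1 if this call ran the
 * report, 0 if another signal or thread had already claimed it. */
int crash_report_once(void (*report)(void))
{
    if (atomic_flag_test_and_set(&crash_in_progress))
        return 0; /* second signal: do not re-enter and do not wait */
    report();
    return 1;
}
```

What the losing caller does after getting 0 back is a separate policy choice that this sketch leaves open.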

@alexanderkyte
Contributor Author

A bunch of fixes are coming. They'll be varied and semantically unrelated. Do not squash.

@alexanderkyte alexanderkyte changed the title [crash] Test mono-merp pipeline in edge cases [ Do not squash] [crash] Test mono-merp pipeline in edge cases Jan 4, 2019
@alexanderkyte alexanderkyte changed the title [ Do not squash] [crash] Test mono-merp pipeline in edge cases [crash] Test and fix stability of native crash pipeline(s) Jan 7, 2019
@alexanderkyte alexanderkyte force-pushed the mono_merp_crashed branch 3 times, most recently from 7871666 to 6fea905 Compare January 7, 2019 19:13
@alexanderkyte alexanderkyte force-pushed the mono_merp_crashed branch 2 times, most recently from f7491b7 to 9b91d9d Compare January 8, 2019 21:35
@alexanderkyte
Contributor Author

alexanderkyte commented Jan 8, 2019


I: policy-rc.d already exists
I: Obtaining the cached apt archive contents
useradd: warning: the home directory already exists.
Not copying any file from skel directory into it.
+ cd /mnt/jenkins/workspace/test-mono-pull-request-wasm
+ [[ master == \2\0\1\8\-* ]]
+ export CI_TAGS=linux-amd64,retry-flaky-tests,pull-request,webassembly
+ CI_TAGS=linux-amd64,retry-flaky-tests,pull-request,webassembly
+ scripts/ci/run-jenkins.sh
*** start: provision
make: Entering directory '/mnt/jenkins/workspace/test-mono-pull-request-wasm/sdks/builds'
From https://github.com/juj/emsdk
   b4de632..3ef46e9  master     -> origin/master
error: Your local changes to the following files would be overwritten by merge:
	upstream/lkgr.json
Please commit your changes or stash them before you merge.
Aborting
make: *** [.stamp-wasm-checkout-and-update-emsdk] Error 1
wasm.mk:11: target '.stamp-wasm-checkout-and-update-emsdk' does not exist
cd /mnt/jenkins/workspace/test-mono-pull-request-wasm/sdks/builds/toolchains/emsdk && git clean -xdff && git pull
Removing .emscripten
Removing .emscripten_cache.lock
Removing .emscripten_cache/
Removing .emscripten_sanity
Removing clang/
Removing emscripten/
Removing emsdk_set_env.sh
Removing llvm-tags-32bit.txt
Removing llvm-tags-64bit.txt
Removing node/
Removing tmp/
Removing zips/
Updating b4de632..3ef46e9
wasm.mk:11: recipe for target '.stamp-wasm-checkout-and-update-emsdk' failed
make: Leaving directory '/mnt/jenkins/workspace/test-mono-pull-request-wasm/sdks/builds'
/mnt/jenkins/workspace/test-mono-pull-request-wasm/scripts/ci/babysitter: Test suite terminated with code 2, and suite cannot report test case data. Halting.
*** end(1): provision: Failed
I: Copying back the cached apt archive contents
I: unmounting /mnt/jenkins/workspace/test-mono-pull-request-wasm filesystem
I: unmounting dev/pts filesystem
I: unmounting dev/shm filesystem
I: unmounting proc filesystem
I: unmounting sys filesystem
I: cleaning the build env 
I: removing directory /mnt/jenkins/buildplace/6917 and its subdirectories
FATAL: null
java.io.IOException
	at org.jenkinsci.plugins.chroot.builders.ChrootBuilder.perform(ChrootBuilder.java:245)
	at hudson.tasks.BuildStepCompatibilityLayer.perform(BuildStepCompatibilityLayer.java:81)
	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:744)
	at hudson.model.Build$BuildExecution.build(Build.java:206)
	at hudson.model.Build$BuildExecution.doRun(Build.java:163)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504)
	at hudson.model.Run.execute(Run.java:1810)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
	at hudson.model.ResourceController.execute(ResourceController.java:97)
	at hudson.model.Executor.run(Executor.java:429)

On Wasm: ^

@alexanderkyte alexanderkyte force-pushed the mono_merp_crashed branch 2 times, most recently from 5f03bbc to 8694651 Compare January 9, 2019 02:08
Threads without domains that get segfaults will end up in
this handler. It's not safe to call this function with a NULL domain.

See crash below:

```
* thread #1, name = 'tid_307', queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x10eff40f8)
  * frame #0: 0x000000010e1510d9 mono-sgen`mono_threads_summarize_execute(ctx=0x0000000000000000, out=0x0000001000000000, hashes=0x0000100000100000, silent=4096, mem="", provided_size=2199023296512) at threads.c:6414
    frame #1: 0x000000010e152092 mono-sgen`mono_threads_summarize(ctx=0x000000010effda00, out=0x000000010effdba0, hashes=0x000000010effdb90, silent=0, signal_handler_controller=1, mem=0x0000000000000000, provided_size=0) at threads.c:6508
    frame #2: 0x000000010df7c69f mono-sgen`dump_native_stacktrace(signal="SIGSEGV", ctx=0x000000010effef48) at mini-posix.c:1026
    frame #3: 0x000000010df7c37f mono-sgen`mono_dump_native_crash_info(signal="SIGSEGV", ctx=0x000000010effef48, info=0x000000010effeee0) at mini-posix.c:1147
    frame #4: 0x000000010de720a9 mono-sgen`mono_handle_native_crash(signal="SIGSEGV", ctx=0x000000010effef48, info=0x000000010effeee0) at mini-exceptions.c:3227
    frame #5: 0x000000010dd6ac0d mono-sgen`mono_sigsegv_signal_handler_debug(_dummy=11, _info=0x000000010effeee0, context=0x000000010effef48, debug_fault_addr=0xffffffffffffffff) at mini-runtime.c:3574
    frame #6: 0x000000010dd6a8d3 mono-sgen`mono_sigsegv_signal_handler(_dummy=11, _info=0x000000010effeee0, context=0x000000010effef48) at mini-runtime.c:3612
    frame #7: 0x00007fff73dbdf5a libsystem_platform.dylib`_sigtramp + 26
    frame #8: 0x0000000110bb81c1
    frame #9: 0x000000011085ffe1
    frame #10: 0x000000010dd6d4f3 mono-sgen`mono_jit_runtime_invoke(method=0x00007faae4f01fe8, obj=0x0000000000000000, params=0x00007ffee1eaa180, exc=0x00007ffee1ea9f08, error=0x00007ffee1eaa250) at mini-runtime.c:3215
    frame #11: 0x000000010e11509d mono-sgen`do_runtime_invoke(method=0x00007faae4f01fe8, obj=0x0000000000000000, params=0x00007ffee1eaa180, exc=0x0000000000000000, error=0x00007ffee1eaa250) at object.c:2977
    frame #12: 0x000000010e10d961 mono-sgen`mono_runtime_invoke_checked(method=0x00007faae4f01fe8, obj=0x0000000000000000, params=0x00007ffee1eaa180, error=0x00007ffee1eaa250) at object.c:3145
    frame #13: 0x000000010e11aa58 mono-sgen`do_exec_main_checked(method=0x00007faae4f01fe8, args=0x000000010f0003e8, error=0x00007ffee1eaa250) at object.c:5042
    frame #14: 0x000000010e118803 mono-sgen`mono_runtime_exec_main_checked(method=0x00007faae4f01fe8, args=0x000000010f0003e8, error=0x00007ffee1eaa250) at object.c:5138
    frame #15: 0x000000010e118856 mono-sgen`mono_runtime_run_main_checked(method=0x00007faae4f01fe8, argc=2, argv=0x00007ffee1eaa760, error=0x00007ffee1eaa250) at object.c:4599
    frame #16: 0x000000010de1db2f mono-sgen`mono_jit_exec_internal(domain=0x00007faae4f00860, assembly=0x00007faae4c02ab0, argc=2, argv=0x00007ffee1eaa760) at driver.c:1298
    frame #17: 0x000000010de1d95d mono-sgen`mono_jit_exec(domain=0x00007faae4f00860, assembly=0x00007faae4c02ab0, argc=2, argv=0x00007ffee1eaa760) at driver.c:1257
    frame #18: 0x000000010de2257f mono-sgen`main_thread_handler(user_data=0x00007ffee1eaa6a0) at driver.c:1375
    frame #19: 0x000000010de20852 mono-sgen`mono_main(argc=3, argv=0x00007ffee1eaa758) at driver.c:2551
    frame #20: 0x000000010dd56d7e mono-sgen`mono_main_with_options(argc=3, argv=0x00007ffee1eaa758) at main.c:50
    frame #21: 0x000000010dd5638d mono-sgen`main(argc=3, argv=0x00007ffee1eaa758) at main.c:406
    frame #22: 0x00007fff73aaf015 libdyld.dylib`start + 1
    frame #23: 0x00007fff73aaf015 libdyld.dylib`start + 1
  thread #2, name = 'SGen worker'
    frame #0: 0x000000010e2afd77 mono-sgen`mono_get_hazardous_pointer(pp=0x0000000000000178, hp=0x000000010ef87618, hazard_index=0) at hazard-pointer.c:208
    frame #1: 0x000000010e0b28e1 mono-sgen`mono_jit_info_table_find_internal(domain=0x0000000000000000, addr=0x00007fff73bffa16, try_aot=1, allow_trampolines=1) at jit-info.c:304
    frame #2: 0x000000010dd6aa5f mono-sgen`mono_sigsegv_signal_handler_debug(_dummy=11, _info=0x000070000fb81c58, context=0x000070000fb81cc0, debug_fault_addr=0x000000010e28fb20) at mini-runtime.c:3540
    frame #3: 0x000000010dd6a8d3 mono-sgen`mono_sigsegv_signal_handler(_dummy=11, _info=0x000070000fb81c58, context=0x000070000fb81cc0) at mini-runtime.c:3612
    frame #4: 0x00007fff73dbdf5a libsystem_platform.dylib`_sigtramp + 26
    frame #5: 0x00007fff73bffa17 libsystem_kernel.dylib`__psynch_cvwait + 11
    frame #6: 0x00007fff73dc8589 libsystem_pthread.dylib`_pthread_cond_wait + 732
    frame #7: 0x000000010e28d76d mono-sgen`mono_os_cond_wait(cond=0x000000010e44c9d8, mutex=0x000000010e44c998) at mono-os-mutex.h:168
    frame #8: 0x000000010e28df4f mono-sgen`get_work(worker_index=0, work_context=0x000070000fb81ee0, do_idle=0x000070000fb81ed4, job=0x000070000fb81ec8) at sgen-thread-pool.c:165
    frame #9: 0x000000010e28d2cb mono-sgen`thread_func(data=0x0000000000000000) at sgen-thread-pool.c:196
    frame #10: 0x00007fff73dc7661 libsystem_pthread.dylib`_pthread_body + 340
    frame #11: 0x00007fff73dc750d libsystem_pthread.dylib`_pthread_start + 377
    frame #12: 0x00007fff73dc6bf9 libsystem_pthread.dylib`thread_start + 13
  thread #3, name = 'Finalizer'
    frame #0: 0x00007fff73bf6246 libsystem_kernel.dylib`semaphore_wait_trap + 10
    frame #1: 0x000000010e1d9c0a mono-sgen`mono_os_sem_wait(sem=0x000000010e43e400, flags=MONO_SEM_FLAGS_ALERTABLE) at mono-os-semaphore.h:84
    frame #2: 0x000000010e1d832d mono-sgen`mono_coop_sem_wait(sem=0x000000010e43e400, flags=MONO_SEM_FLAGS_ALERTABLE) at mono-coop-semaphore.h:41
    frame #3: 0x000000010e1da787 mono-sgen`finalizer_thread(unused=0x0000000000000000) at gc.c:920
    frame #4: 0x000000010e152919 mono-sgen`start_wrapper_internal(start_info=0x0000000000000000, stack_ptr=0x000070000fd85000) at threads.c:1178
    frame #5: 0x000000010e1525b6 mono-sgen`start_wrapper(data=0x00007faae4f31bd0) at threads.c:1238
    frame #6: 0x00007fff73dc7661 libsystem_pthread.dylib`_pthread_body + 340
    frame #7: 0x00007fff73dc750d libsystem_pthread.dylib`_pthread_start + 377
    frame #8: 0x00007fff73dc6bf9 libsystem_pthread.dylib`thread_start + 13
  thread #4
    frame #0: 0x00007fff73c0028a libsystem_kernel.dylib`__workq_kernreturn + 10
    frame #1: 0x00007fff73dc7009 libsystem_pthread.dylib`_pthread_wqthread + 1035
    frame #2: 0x00007fff73dc6be9 libsystem_pthread.dylib`start_wqthread + 13
(lldb)
```
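In the trace above, the 'SGen worker' thread enters `mono_jit_info_table_find_internal` with `domain=0x0000000000000000` from the SIGSEGV handler. The actual fix is not shown in this conversation, so what follows is only a minimal sketch of the general idea (check for a NULL domain before doing a lookup that dereferences it). `Domain`, `JitInfo`, `lookup_jit_info`, and `lookup_jit_info_guarded` are hypothetical stand-ins, not the runtime's real types or functions.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the runtime's domain and JIT info records. */
typedef struct { int id; } Domain;
typedef struct { const char *name; } JitInfo;

static JitInfo sample_info = { "managed_method" };

/* Unsafe lookup: dereferences domain, so it must never see NULL. */
static JitInfo *lookup_jit_info(Domain *domain, void *ip)
{
    (void) ip; /* the address is ignored in this sketch */
    return domain->id == 1 ? &sample_info : NULL;
}

/* Guarded lookup: a thread with no domain is treated as "no JIT info",
 * so the caller can fall through to its native-crash path instead of
 * faulting a second time inside the signal handler. */
static JitInfo *lookup_jit_info_guarded(Domain *domain, void *ip)
{
    if (domain == NULL)
        return NULL;
    return lookup_jit_info(domain, ip);
}
```

Returning "not found" for a domain-less thread keeps the handler on the path it would take for any address that does not belong to managed code.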
Every stack frame of a function that prints grows by the size of buff.
In practice, clang often fails to deduplicate some of these buffers,
leading to stack frames as large as 30k.

This was noticed through a series of hard-to-diagnose segfaults on stacks that
otherwise looked fine during the crash reporting stress test.

This change fixes that, shrinking stacks to one tenth of their previous size. It doesn't
seem to break the crash reporter messages anywhere (we may need to shrink
other "max name length" fields), and it's not mission-critical anywhere
else.
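As a rough illustration of the trade-off described here, the sketch below formats a message into a small fixed-size stack buffer and reports whether the text had to be truncated. The constant `SMALL_BUFF_LEN` and the function `format_frame` are hypothetical; the conversation does not state the real buffer name's size or the functions that were changed.

```c
#include <stdio.h>

/* Hypothetical size; the real constant is not given in this conversation. */
#define SMALL_BUFF_LEN 128

/* Formats "at <symbol>" through a small stack buffer and copies the
 * result into sink. Returns 1 if the text was truncated to fit the
 * small buffer (or formatting failed), 0 otherwise. */
static int format_frame(const char *symbol, char *sink, size_t sink_len)
{
    char buff[SMALL_BUFF_LEN]; /* this is what each frame pays for */
    int needed;

    buff[0] = '\0';
    needed = snprintf(buff, sizeof buff, "at %s", symbol);
    snprintf(sink, sink_len, "%s", buff);
    return needed < 0 || (size_t) needed >= sizeof buff;
}
```

A smaller buffer costs less stack per printing frame, at the price of truncating unusually long names, which is why other length limits may need to shrink to match.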
@alexanderkyte
Contributor Author

alexanderkyte commented Jan 18, 2019

./mcs/class/System.Runtime.Remoting/TestResult-net_4_x.xml
/home/builder/jenkins/workspace/test-mono-pull-request-coop-arm64/scripts/ci/babysitter: Sending SIGABRT to `find`
/home/builder/jenkins/workspace/test-mono-pull-request-coop-arm64/scripts/ci/babysitter: Command `find` returned with -6 after SIGABRT
/home/builder/jenkins/workspace/test-mono-pull-request-coop-arm64/scripts/ci/babysitter: Command `find` timed out with -6
/home/builder/jenkins/workspace/test-mono-pull-request-coop-arm64/scripts/ci/babysitter: Saw timeout in test case None (never allowed). Will halt testing.
*** end(124): bundle-test-results: Unstable

That is from Linux AArch64 Coop Suspend: https://jenkins.mono-project.com/job/test-mono-pull-request-coop-arm64/7904/

@alexanderkyte
Contributor Author

@luhenry could you give this a look when you have the chance?
