@alexanderkyte
Contributor

This contains the commits from #12125, #12126, and #12518.

We add a checked build mode that asserts when mono calls malloc inside
the crash reporter, turning risky allocations into assertion failures.
It's useful for automated testing because the double-abort often presents
itself as an indefinite hang: if it happens before the thread-dumping
supervisor process is started, or after it ends, the crash reporter
hangs.
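
A minimal sketch of what such a checked mode could look like, assuming a hypothetical `in_crash_reporter` flag, a hypothetical `CHECKED_BUILD` macro, and a hypothetical `checked_malloc` wrapper (none of these names come from the PR; mono's real hooks may differ):

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flag: nonzero while the crash reporter is running. */
static volatile sig_atomic_t in_crash_reporter = 0;

/* Counts risky allocations in unchecked builds so they can still be observed. */
static int risky_alloc_count = 0;

static void crash_reporter_enter(void) { in_crash_reporter = 1; }
static void crash_reporter_leave(void) { in_crash_reporter = 0; }

/* Hypothetical allocation wrapper; not mono's actual allocator hook. */
static void *checked_malloc(size_t size)
{
#ifdef CHECKED_BUILD
    /* Checked build: fail loudly at the offending call site. */
    assert(!in_crash_reporter && "malloc inside the crash reporter");
#else
    /* Unchecked build: only record that it happened. */
    if (in_crash_reporter)
        risky_alloc_count++;
#endif
    return malloc(size);
}
```

Failing at the call site is the point of the design: an assertion names the risky allocation directly, instead of leaving a hang to be diagnosed later.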

Threads without domains that get segfaults will end up in
this handler. It's not safe to call this function with a NULL domain.

See crash below:

```
* thread #1, name = 'tid_307', queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x10eff40f8)
  * frame #0: 0x000000010e1510d9 mono-sgen`mono_threads_summarize_execute(ctx=0x0000000000000000, out=0x0000001000000000, hashes=0x0000100000100000, silent=4096, mem="", provided_size=2199023296512) at threads.c:6414
    frame #1: 0x000000010e152092 mono-sgen`mono_threads_summarize(ctx=0x000000010effda00, out=0x000000010effdba0, hashes=0x000000010effdb90, silent=0, signal_handler_controller=1, mem=0x0000000000000000, provided_size=0) at threads.c:6508
    frame #2: 0x000000010df7c69f mono-sgen`dump_native_stacktrace(signal="SIGSEGV", ctx=0x000000010effef48) at mini-posix.c:1026
    frame #3: 0x000000010df7c37f mono-sgen`mono_dump_native_crash_info(signal="SIGSEGV", ctx=0x000000010effef48, info=0x000000010effeee0) at mini-posix.c:1147
    frame #4: 0x000000010de720a9 mono-sgen`mono_handle_native_crash(signal="SIGSEGV", ctx=0x000000010effef48, info=0x000000010effeee0) at mini-exceptions.c:3227
    frame #5: 0x000000010dd6ac0d mono-sgen`mono_sigsegv_signal_handler_debug(_dummy=11, _info=0x000000010effeee0, context=0x000000010effef48, debug_fault_addr=0xffffffffffffffff) at mini-runtime.c:3574
    frame #6: 0x000000010dd6a8d3 mono-sgen`mono_sigsegv_signal_handler(_dummy=11, _info=0x000000010effeee0, context=0x000000010effef48) at mini-runtime.c:3612
    frame #7: 0x00007fff73dbdf5a libsystem_platform.dylib`_sigtramp + 26
    frame #8: 0x0000000110bb81c1
    frame #9: 0x000000011085ffe1
    frame #10: 0x000000010dd6d4f3 mono-sgen`mono_jit_runtime_invoke(method=0x00007faae4f01fe8, obj=0x0000000000000000, params=0x00007ffee1eaa180, exc=0x00007ffee1ea9f08, error=0x00007ffee1eaa250) at mini-runtime.c:3215
    frame #11: 0x000000010e11509d mono-sgen`do_runtime_invoke(method=0x00007faae4f01fe8, obj=0x0000000000000000, params=0x00007ffee1eaa180, exc=0x0000000000000000, error=0x00007ffee1eaa250) at object.c:2977
    frame #12: 0x000000010e10d961 mono-sgen`mono_runtime_invoke_checked(method=0x00007faae4f01fe8, obj=0x0000000000000000, params=0x00007ffee1eaa180, error=0x00007ffee1eaa250) at object.c:3145
    frame #13: 0x000000010e11aa58 mono-sgen`do_exec_main_checked(method=0x00007faae4f01fe8, args=0x000000010f0003e8, error=0x00007ffee1eaa250) at object.c:5042
    frame #14: 0x000000010e118803 mono-sgen`mono_runtime_exec_main_checked(method=0x00007faae4f01fe8, args=0x000000010f0003e8, error=0x00007ffee1eaa250) at object.c:5138
    frame #15: 0x000000010e118856 mono-sgen`mono_runtime_run_main_checked(method=0x00007faae4f01fe8, argc=2, argv=0x00007ffee1eaa760, error=0x00007ffee1eaa250) at object.c:4599
    frame #16: 0x000000010de1db2f mono-sgen`mono_jit_exec_internal(domain=0x00007faae4f00860, assembly=0x00007faae4c02ab0, argc=2, argv=0x00007ffee1eaa760) at driver.c:1298
    frame #17: 0x000000010de1d95d mono-sgen`mono_jit_exec(domain=0x00007faae4f00860, assembly=0x00007faae4c02ab0, argc=2, argv=0x00007ffee1eaa760) at driver.c:1257
    frame #18: 0x000000010de2257f mono-sgen`main_thread_handler(user_data=0x00007ffee1eaa6a0) at driver.c:1375
    frame #19: 0x000000010de20852 mono-sgen`mono_main(argc=3, argv=0x00007ffee1eaa758) at driver.c:2551
    frame #20: 0x000000010dd56d7e mono-sgen`mono_main_with_options(argc=3, argv=0x00007ffee1eaa758) at main.c:50
    frame #21: 0x000000010dd5638d mono-sgen`main(argc=3, argv=0x00007ffee1eaa758) at main.c:406
    frame #22: 0x00007fff73aaf015 libdyld.dylib`start + 1
    frame #23: 0x00007fff73aaf015 libdyld.dylib`start + 1
  thread #2, name = 'SGen worker'
    frame #0: 0x000000010e2afd77 mono-sgen`mono_get_hazardous_pointer(pp=0x0000000000000178, hp=0x000000010ef87618, hazard_index=0) at hazard-pointer.c:208
    frame #1: 0x000000010e0b28e1 mono-sgen`mono_jit_info_table_find_internal(domain=0x0000000000000000, addr=0x00007fff73bffa16, try_aot=1, allow_trampolines=1) at jit-info.c:304
    frame #2: 0x000000010dd6aa5f mono-sgen`mono_sigsegv_signal_handler_debug(_dummy=11, _info=0x000070000fb81c58, context=0x000070000fb81cc0, debug_fault_addr=0x000000010e28fb20) at mini-runtime.c:3540
    frame #3: 0x000000010dd6a8d3 mono-sgen`mono_sigsegv_signal_handler(_dummy=11, _info=0x000070000fb81c58, context=0x000070000fb81cc0) at mini-runtime.c:3612
    frame #4: 0x00007fff73dbdf5a libsystem_platform.dylib`_sigtramp + 26
    frame #5: 0x00007fff73bffa17 libsystem_kernel.dylib`__psynch_cvwait + 11
    frame #6: 0x00007fff73dc8589 libsystem_pthread.dylib`_pthread_cond_wait + 732
    frame #7: 0x000000010e28d76d mono-sgen`mono_os_cond_wait(cond=0x000000010e44c9d8, mutex=0x000000010e44c998) at mono-os-mutex.h:168
    frame #8: 0x000000010e28df4f mono-sgen`get_work(worker_index=0, work_context=0x000070000fb81ee0, do_idle=0x000070000fb81ed4, job=0x000070000fb81ec8) at sgen-thread-pool.c:165
    frame #9: 0x000000010e28d2cb mono-sgen`thread_func(data=0x0000000000000000) at sgen-thread-pool.c:196
    frame #10: 0x00007fff73dc7661 libsystem_pthread.dylib`_pthread_body + 340
    frame #11: 0x00007fff73dc750d libsystem_pthread.dylib`_pthread_start + 377
    frame #12: 0x00007fff73dc6bf9 libsystem_pthread.dylib`thread_start + 13
  thread #3, name = 'Finalizer'
    frame #0: 0x00007fff73bf6246 libsystem_kernel.dylib`semaphore_wait_trap + 10
    frame #1: 0x000000010e1d9c0a mono-sgen`mono_os_sem_wait(sem=0x000000010e43e400, flags=MONO_SEM_FLAGS_ALERTABLE) at mono-os-semaphore.h:84
    frame #2: 0x000000010e1d832d mono-sgen`mono_coop_sem_wait(sem=0x000000010e43e400, flags=MONO_SEM_FLAGS_ALERTABLE) at mono-coop-semaphore.h:41
    frame #3: 0x000000010e1da787 mono-sgen`finalizer_thread(unused=0x0000000000000000) at gc.c:920
    frame #4: 0x000000010e152919 mono-sgen`start_wrapper_internal(start_info=0x0000000000000000, stack_ptr=0x000070000fd85000) at threads.c:1178
    frame #5: 0x000000010e1525b6 mono-sgen`start_wrapper(data=0x00007faae4f31bd0) at threads.c:1238
    frame #6: 0x00007fff73dc7661 libsystem_pthread.dylib`_pthread_body + 340
    frame #7: 0x00007fff73dc750d libsystem_pthread.dylib`_pthread_start + 377
    frame #8: 0x00007fff73dc6bf9 libsystem_pthread.dylib`thread_start + 13
  thread #4
    frame #0: 0x00007fff73c0028a libsystem_kernel.dylib`__workq_kernreturn + 10
    frame #1: 0x00007fff73dc7009 libsystem_pthread.dylib`_pthread_wqthread + 1035
    frame #2: 0x00007fff73dc6be9 libsystem_pthread.dylib`start_wqthread + 13
(lldb)
```
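
The 'SGen worker' thread in the trace reaches a lookup with `domain=0x0000000000000000` from inside the SIGSEGV handler. A rough sketch of the kind of guard that avoids this, using hypothetical stand-in types and function names (`domain_t`, `jit_info_lookup`, `fault_in_managed_code` are illustrative and are not the functions changed in this PR):

```c
#include <stddef.h>

/* Hypothetical stand-in for a runtime domain. */
typedef struct { int id; } domain_t;

/* Hypothetical lookup that dereferences the domain, so it must never see NULL. */
static int jit_info_lookup(const domain_t *domain, const void *addr)
{
    (void)addr;
    return domain->id;
}

/* Returns 1 if the fault can be attributed to managed code, 0 otherwise. */
static int fault_in_managed_code(const domain_t *domain, const void *addr)
{
    /* A thread without a domain cannot be running managed code,
       so skip the lookup instead of dereferencing NULL. */
    if (domain == NULL)
        return 0;
    return jit_info_lookup(domain, addr) != 0;
}
```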
Each frame that prints ends up enlarged by the size of buff.
In practice, clang often fails to deduplicate some of these buffers,
leading to stack frames as large as 30k.

This was noticed through a series of hard-to-diagnose segfaults on stacks
that looked otherwise fine during the crash reporting stress test.

This change fixes that, making stacks 1/10th of the size. It doesn't
seem to break the crash reporter messages anywhere (we may need to shrink
other "max name length" fields), and it's not mission-critical anywhere
else.
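
As an illustration of the trade-off, here is a small sketch of a printing helper with a deliberately small stack buffer. The size `FRAME_BUF_SIZE` and the helper `describe_frame` are hypothetical and are not taken from the PR. Names that don't fit are truncated rather than overflowing:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-frame scratch size; the PR's actual value is not shown here. */
enum { FRAME_BUF_SIZE = 100 };

/* Formats "name:line" into out (capacity out_size) and returns the stored length. */
static size_t describe_frame(char *out, size_t out_size, const char *name, int line)
{
    char buff[FRAME_BUF_SIZE];  /* the only large local in this frame */

    /* snprintf always NUL-terminates and truncates instead of overflowing. */
    snprintf(buff, sizeof buff, "%s:%d", name, line);
    snprintf(out, out_size, "%s", buff);
    return strlen(out);
}
```

With a buffer this small, an over-long name is cut to `FRAME_BUF_SIZE - 1` characters, which is the visible cost of shrinking "max name length" fields.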
@alexanderkyte
Contributor Author

@monojenkins build deb with monolite

@alexanderkyte
Contributor Author

Tarball build is failing.

Diagnosed a proximal cause: when building with a system mono of 5.16, the 2018-08 HEAD fails, but when building with 5.23.0 (from the nightly package) it gets:

Creating .dep_dirs-basic...
Creating .dep_dirs-basic...
CSC     [basic] Mono.Cecil.dll
mkdir -p -- ../../class/lib/basic/tmp/
CSC     [basic] cil-stringreplacer.exe
CSC     [basic] mscorlib.dll

Unhandled Exception:
System.IO.FileNotFoundException: Could not load file or assembly 'Mono.Cecil, Version=0.10.0.0, Culture=neutral, PublicKeyToken=0738eb9f132ed756' or one of its dependencies.
File name: 'Mono.Cecil, Version=0.10.0.0, Culture=neutral, PublicKeyToken=0738eb9f132ed756'
  at Program.Main (System.String[] args) [0x0010e] in <a90a1ae5ea75408fa75e79014ad08562>:0 
[ERROR] FATAL UNHANDLED EXCEPTION: System.IO.FileNotFoundException: Could not load file or assembly 'Mono.Cecil, Version=0.10.0.0, Culture=neutral, PublicKeyToken=0738eb9f132ed756' or one of its dependencies.
File name: 'Mono.Cecil, Version=0.10.0.0, Culture=neutral, PublicKeyToken=0738eb9f132ed756'
  at Program.Main (System.String[] args) [0x0010e] in <a90a1ae5ea75408fa75e79014ad08562>:0 
make[8]: *** [../../class/lib/basic/mscorlib.dll] Error 1
make[7]: *** [do-all] Error 2
make[6]: *** [all-recursive] Error 1
make[5]: *** [all-recursive] Error 1
make[4]: *** [profile-do--basic--all] Error 2

@alexanderkyte
Contributor Author

I added an infrastructure fix commit temporarily. I'll drop the tarball-making fix commit from this PR after I've got a tarball to test the rest of the build with.

@alexanderkyte
Contributor Author

@monojenkins build deb with monolite

@alexanderkyte
Contributor Author

The fix from #12574 seemed to work to get a tarball building.

@alexanderkyte
Contributor Author

alexanderkyte commented Jan 23, 2019

Got the tarball at https://jenkins.mono-project.com/job/build-source-tarball-mono-pullrequest/539/. Now I'll remove that commit and leave it to #12574 to merge that fix.

@alexanderkyte
Contributor Author

iOS SDK failing:


[19/21] cp Info.plist Info.plist.binary; plutil -convert binary1 Info.plist.binary
[20/21] if cmp -s Info.plist.binary /Users/builder/jenkins/workspace/test-mono-pull-request-amd64-osx-products-sdks-ios/sdks/ios/bin/ios-sim/test-Mono.Runtime.Tests.app/Info.plist ; then : ; else cp Info.plist.binary /Users/builder/jenkins/workspace/test-mono-pull-request-amd64-osx-products-sdks-ios/sdks/ios/bin/ios-sim/test-Mono.Runtime.Tests.app/Info.plist ; fi
[21/21] codesign --force --sign - --timestamp=none /Users/builder/jenkins/workspace/test-mono-pull-request-amd64-osx-products-sdks-ios/sdks/ios/bin/ios-sim/test-Mono.Runtime.Tests.app/test-Mono.Runtime.Tests
make[1]: Leaving directory '/Users/builder/jenkins/workspace/test-mono-pull-request-amd64-osx-products-sdks-ios/sdks/ios'
make[1]: Entering directory '/Users/builder/jenkins/workspace/test-mono-pull-request-amd64-osx-products-sdks-ios/sdks/ios'
csc /out:harness.exe -r:System.Json.dll harness/harness.cs ../../mcs/class/Mono.Options//Mono.Options/Options.cs
Microsoft (R) Visual C# Compiler version 2.7.0.62620 (e873e693)
Copyright (C) Microsoft Corporation. All rights reserved.

mono harness.exe --start-sim
Running: xcrun simctl list devices
xcrun simctl install xamarin.ios-sdk.sim bin/ios-sim/test-Mono.Runtime.Tests.app
mono harness.exe --run-sim --logfile ios-sim-Mono.Runtime.Tests.log --bundle-id com.xamarin.mono.ios.test-Mono.Runtime.Tests --bundle-dir bin/ios-sim/test-Mono.Runtime.Tests.app  test-runner.exe CONNSTR -exclude:MobileNotWorking,NotOnMac,NotWorking,ValueAdd,CAS,InetAccess,NotWorkingLinqInterpreter -labels monotouch_Mono.Runtime.Tests_test.dll
App: com.xamarin.mono.ios.test-Mono.Runtime.Tests
Running: xcrun simctl list devices
Running: xcrun simctl install xamarin.ios-sdk.sim bin/ios-sim/test-Mono.Runtime.Tests.app
Running: xcrun simctl terminate xamarin.ios-sdk.sim com.xamarin.mono.ios.test-Mono.Runtime.Tests
Running: xcrun simctl launch xamarin.ios-sdk.sim com.xamarin.mono.ios.test-Mono.Runtime.Tests test-runner.exe tcp:localhost:63334 -exclude:MobileNotWorking,NotOnMac,NotWorking,ValueAdd,CAS,InetAccess,NotWorkingLinqInterpreter -labels monotouch_Mono.Runtime.Tests_test.dll 
com.xamarin.mono.ios.test-Mono.Runtime.Tests: 36280
NUnitLite 1.0.0 (.NET 4.5 Debug)
Copyright 2013, Charlie Poole

Runtime Environment -
    OS Version: Unix 17.7.0.0
  Mono Version: 4.0.50524.0

***** /Users/builder/Library/Developer/CoreSimulator/Devices/D5579B79-C34F-4521-BF8F-3508EC9E6B6C/data/Containers/Bundle/Application/EC9EEDC8-0E2B-4397-8E8D-B5FFA46902D7/test-Mono.Runtime.Tests.app/monotouch_Mono.Runtime.Tests_test.dll
***** MonoTests.Runtime.JitTests
***** MonoTests.Runtime.JitTests.Aot
***** MonoTests.Runtime.JitTests.Arrays
***** MonoTests.Runtime.JitTests.Basic
***** MonoTests.Runtime.JitTests.Calls
***** MonoTests.Runtime.JitTests.Exceptions
***** MonoTests.Runtime.JitTests.Float
***** MonoTests.Runtime.JitTests.Generics
***** MonoTests.Runtime.JitTests.GShared
***** MonoTests.Runtime.JitTests.Long
***** MonoTests.Runtime.JitTests.Math
***** MonoTests.Runtime.JitTests.Objects

Tests run: 11, Passed: 11, Errors: 0, Failures: 0, Inconclusive: 0
  Not run: 0, Invalid: 0, Ignored: 0, Skipped: 0
Elapsed time: 00:00:00.5370000
make[1]: Leaving directory '/Users/builder/jenkins/workspace/test-mono-pull-request-amd64-osx-products-sdks-ios/sdks/ios'
make[1]: Entering directory '/Users/builder/jenkins/workspace/test-mono-pull-request-amd64-osx-products-sdks-ios/sdks/ios'
mono appbuilder.exe --target ios-sim64 --mono-sdkdir /Users/builder/jenkins/workspace/test-mono-pull-request-amd64-osx-products-sdks-ios/sdks/ios/../out --appdir /Users/builder/jenkins/workspace/test-mono-pull-request-amd64-osx-products-sdks-ios/sdks/ios/bin/ios-sim/test-corlib.app --runtimedir /Users/builder/jenkins/workspace/test-mono-pull-request-amd64-osx-products-sdks-ios/sdks/ios/runtime --builddir obj/ios-sim/test-corlib.app --sysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS.sdk --signing-identity - --bundle-executable test-corlib --bundle-identifier com.xamarin.mono.ios.test-corlib --bundle-name test-corlib -r ../../mcs/class/lib/monotouch/mscorlib.dll -r ../../mcs/class/lib/monotouch/System.dll -r ../../mcs/class/lib/monotouch/System.Xml.dll -r ../../mcs/class/lib/monotouch/System.Core.dll -r ../../mcs/class/lib/monotouch/I18N.dll -r ../../mcs/class/lib/monotouch/I18N.West.dll -r ../../mcs/class/lib/monotouch/Mono.Simd.dll -r ../../mcs/class/lib/monotouch/Mono.Security.dll -r ../../mcs/class/lib/monotouch/System.Numerics.dll -r ../../mcs/class/lib/monotouch/System.Numerics.Vectors.dll -r ../../mcs/class/lib/monotouch/nunitlite.dll -r test-runner.exe -r ../../mcs/class/lib/monotouch/tests/monotouch_corlib_test.dll
mkdir -p /Users/builder/jenkins/workspace/test-mono-pull-request-amd64-osx-products-sdks-ios/sdks/ios/bin/ios-sim/test-corlib.app
if test "x../../mcs/class/corlib/{es-ES,nn-NO}" != "x"; then cp -r ../../mcs/class/corlib/{es-ES,nn-NO} /Users/builder/jenkins/workspace/test-mono-pull-request-amd64-osx-products-sdks-ios/sdks/ios/bin/ios-sim/test-corlib.app/; fi
cp: ../../mcs/class/corlib/es-ES: No such file or directory
cp: ../../mcs/class/corlib/nn-NO: No such file or directory
make[1]: *** [Makefile:89: build-ios-sim-corlib] Error 1
make[1]: Leaving directory '/Users/builder/jenkins/workspace/test-mono-pull-request-amd64-osx-products-sdks-ios/sdks/ios'
make: *** [Makefile:150: run-ios-sim-all] Error 1
make: Leaving directory '/Users/builder/jenkins/workspace/test-mono-pull-request-amd64-osx-products-sdks-ios/sdks/ios'
/Users/builder/jenkins/workspace/test-mono-pull-request-amd64-osx-products-sdks-ios/scripts/ci/babysitter: Test suite terminated with code 2, and suite cannot report test case data. Halting.
*** end(19): run-sim: Unstable

https://jenkins.mono-project.com/job/test-mono-pull-request-amd64-osx-products-sdks-ios/8608/parsed_console/log_content.html#WARNING1

@alexanderkyte
Contributor Author

Interesting, the merp test is failing. Will investigate.

@alexanderkyte
Contributor Author

Hmm, the last "batch of fixes" seems to have broken this test. I see it was the icall rename. The associated test change I made doesn't seem to have landed. Since what this json test checks is also covered by the json checks made in d5dc911#diff-4da111a5945e8cac7c9ed5d39a73b9e0R200, I am just going to disable this test here.

@alexanderkyte
Contributor Author

alexanderkyte commented Jan 24, 2019

@alexanderkyte
Contributor Author

Yeah, when I run it locally it fails every time.
https://gist.github.com/alexanderkyte/4697b2e880a82c6af3cfa3a24b193f54

This behavior on CI seems completely broken. It's probably not getting the return code it needs to identify failure.
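
For context, a harness can only flag a failure if it actually collects the child's exit status. A minimal POSIX sketch of that step, assuming a hypothetical helper `run_and_get_exit_code` (this is not the CI babysitter's code):

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs `sh -c command` and returns its exit code, or -1 if it did not exit normally. */
static int run_and_get_exit_code(const char *command)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: replace this process with the shell running the command. */
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127);  /* only reached if exec itself failed */
    }

    int status = 0;
    /* Retry if the wait is interrupted by a signal. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }

    /* A nonzero code here is what the caller must treat as a test failure. */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If a wrapper in the chain drops this value (for example by always returning 0), a test that fails every time locally can still show up as passing.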

@alexanderkyte
Contributor Author

So the functionality tested by this test seems to work just fine locally, and it held up over a lot of runs.

Given the hidden nature of the unmanaged APIs and the fact that the marshaling wrappers are resistant to reflection outside of the BCL (SafeStringMarshal), I don't think it's feasible to get this tested on CI for this release without adding that EnableMicrosoftTelemetryNext additional endpoint that was rejected.

@marek-safar marek-safar merged commit a4956c8 into mono:2018-08 Jan 25, 2019