-
Notifications
You must be signed in to change notification settings - Fork 3.8k
[arm64] Correct exception filter stack frame size #14757
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[arm64] Correct exception filter stack frame size #14757
Conversation
Fixes: mono#14170 Fixes: dotnet/android#3112 The `call_filter()` function generated by `mono_arch_get_call_filter()` was overwriting a part of the previous stack frame because it was not creating a large enough new stack frame for itself. This had been working by luck before the switch to building the runtime with Android NDK r19. I used a local build of [xamarin/xamarin-android/d16-1@87a80b][0] to verify that this change resolved both of the issues linked above. I also confirmed that I was able to reintroduce the issues in my local environment by removing the change and rebuilding. After this change, the generated `call_filter()` function now starts by reserving sufficient space on the stack to hold a 64-bit value at the `ctx_offset` location. The 64-bit `ctx` value is saved to the `ctx_offset` location (offset 344 in the new stack frame) shortly afterwards: stp x29, x30, [sp,#-352]! mov x29, sp str x0, [x29,mono#344] As expected, the invocation of `call_filter()` now no longer modifies the top of the previous stack frame. Top of the stack before `call_filter()`: (gdb) x/8x $sp 0x7fcd2774a0: 0x7144d250 0x00000000 0x2724b700 0x00000000 0x7fcd2774b0: 0x1ee19ff0 0x00000071 0x0cc00300 0x00000071 Top of the stack after `call_filter()` (unchanged): (gdb) x/8x $sp 0x7fcd2774a0: 0x7144d250 0x00000000 0x2724b700 0x00000000 0x7fcd2774b0: 0x1ee19ff0 0x00000071 0x0cc00300 0x00000071 Additional background information ================================= The original lucky, "good" behavior worked as follows for [mono/mono@725ba2a built with Android NDK r14][1]: 1. The `resume` parameter for `mono_handle_exception_internal()` is held in register `w23`. 2. That register is saved into the current stack frame at offset 20 by a `str` instruction: 0x00000000000bc1bc <+3012>: str w23, [sp,mono#20] 3. `handle_exception_first_pass()` invokes `call_filter()` via a `blr` instruction: 2279 filtered = call_filter (ctx, ei->data.filter); 0x00000000000bc60c <+4116>: add x9, x22, x24, lsl mono#6 0x00000000000bc610 <+4120>: ldr x8, [x8,mono#3120] 0x00000000000bc614 <+4124>: ldr x1, [x9,mono#112] 0x00000000000bc618 <+4128>: add x0, sp, #0x110 0x00000000000bc61c <+4132>: blr x8 Before the `blr` instruction, the top of the stack looks like this: (gdb) x/8x $sp 0x7fcd2774a0: 0x7144d250 0x00000000 0x0a8b31d8 0x00000071 0x7fcd2774b0: 0x00000000 0x00000000 0x1ef51980 0x00000071 4. `call_filter()` runs. This function is generated by `mono_arch_get_call_filter()` and starts with: stp x29, x30, [sp,#-336]! mov x29, sp str x0, [x29,mono#344] Note in particular how the first line subtracts 336 from `sp` to start a new stack frame, but the third line writes to a position that is 344 bytes beyond that, back within the *previous* stack frame. 5. After the invocations of `call_filter()` and `handle_exception_first_pass()`, `w23` is restored from the stack: 0x00000000000bc820 <+4648>: ldr w23, [sp,mono#20] At this step, the top of the stack looks like this: (gdb) x/8x $sp 0x7fcd2774a0: 0x7144d250 0x00000000 0xcd2775b0 0x0000007f 0x7fcd2774b0: 0x00000000 0x00000000 0x1ef51980 0x00000071 Notice how `call_filter()` has overwritten bytes 8 through 15 of the stack frame. In this original lucky scenario, this does not affect the value restored into register `w23` because that value starts at byte 20. 6. `mono_handle_exception_internal()` tests the value of `w23` to decide how to set the `ji` local variable: 2574 if (resume) { 0x00000000000bb960 <+872>: cbz w23, 0xbb9c4 <mono_handle_exception_internal+972> Since `w23` is `0`, `mono_handle_exception_internal()` correctly continues into the `else` branch. The bad behavior for [mono/mono@725ba2a built with Android NDK r19][2] works just slightly differently: 1. As before, the local `resume` parameter starts in register `w23`. 2. This time, the register is saved into the stack frame at offset 12 (instead of 20): 0x00000000000bed7c <+3200>: str w23, [sp,mono#12] 3. As before, `handle_exception_first_pass()` invokes `call_filter()` via a `blr` instruction. At this step, the top of the stack looks like this: (gdb) x/8x $sp 0x7fcd2774a0: 0x7144d250 0x00000000 0x2724b700 0x00000000 0x7fcd2774b0: 0x1ee19ff0 0x00000071 0x0cc00300 0x00000071 4. `call_filter()` runs as before. And the first few instructions of that function are the same as before: stp x29, x30, [sp,#-336]! mov x29, sp str x0, [x29,mono#344] 5. `w23` is again restored from the stack, but this time from offset 12: 0x00000000000bf2c0 <+4548>: ldr w23, [sp,mono#12] The problem is that `call_filter()` has again overwritten bytes 8 through 15 of the stack frame: (gdb) x/8x $sp 0x7fcd2774a0: 0x7144d250 0x00000000 0xcd2775b0 0x0000007f 0x7fcd2774b0: 0x1ee19ff0 0x00000071 0x0cc00300 0x00000071 So after the `ldr` instruction, `w23` has an incorrect value of `0x7f` rather than the correct value `0`. 6. As before, `mono_handle_exception_internal()` tests the value of `w23` to decide how to set the `ji` local variable: 2574 if (resume) { 0x00000000000be430 <+820>: cbz w23, 0xbe834 <mono_handle_exception_internal+1848> But this time because `w23` is *not* zero, execution continues *incorrectly* into the `if` branch rather than the `else` branch. The result is that `ji` is set to `0` rather than a valid `(MonoJitInfo *)` value. This incorrect value for `ji` leads to a null pointer dereference, as seen in the bug reports. [0]: https://github.com/xamarin/xamarin-android/tree/xamarin-android-9.3.0.22 [1]: https://jenkins.xamarin.com/view/Xamarin.Android/job/xamarin-android-freestyle/1432/Azure/ [2]: https://jenkins.xamarin.com/view/Xamarin.Android/job/xamarin-android-freestyle/1436/Azure/
|
build |
|
(/cc @lambdageek) To fill in a little more background story about how I ended up at this PR, I was originally just planning to do a few experiments with the different versions of To do:
|
mono/mini/exceptions-arm64.c
Outdated
| fregs_offset = offset; | ||
| offset += num_fregs * 8; | ||
| ctx_offset = offset; | ||
| ctx_offset += 8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think the more correct change (in the sense of saving a couple of bytes on the stack frame) would be to change this line:
ctx_offset += 8;to
offset += 8;then the frame size computation below would be correct.
Great find! 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should totally do this. The code can be pretty confusing as is. Having a mysterious unused stack slot makes it also harder to follow.
|
We desperately need a fix for this. This is a desaster. This is broken in the stable version of VS2019 16.1.1 and there doesn't seem to be a way to downgrade to 16.0.X for those who are using VS Community. |
|
/cc @SamMonoRT |
|
@brendanzagaeski I've updated the PR so that it reflects my comment (since Vlad confirmed my observation). Before we start the backport process I would appreciate if you could give it another test on an Android device. |
|
Oooh, yes, that makes a lot of sense. My updated testing results with that change are GOOD. The test apps I was using for both issues run correctly. The top of the generated So the And the stack contents comparing before and after Top of the stack after call_filter() (unchanged): |
|
@monojenkins backport to 2019-06 |
|
@monojenkins squash |
[2018-08] [arm64] Correct exception filter stack frame size Fixes: #14170 Fixes: dotnet/android#3112 The `call_filter()` function generated by `mono_arch_get_call_filter()` was overwriting a part of the previous stack frame because it was not creating a large enough new stack frame for itself. This had been working by luck before the switch to building the runtime with Android NDK r19. I used a local build of [xamarin/xamarin-android/d16-1@87a80b][0] to verify that this change resolved both of the issues linked above. I also confirmed that I was able to reintroduce the issues in my local environment by removing the change and rebuilding. After this change, the generated `call_filter()` function now starts by reserving sufficient space on the stack to hold a 64-bit value at the `ctx_offset` location. The 64-bit `ctx` value is saved to the `ctx_offset` location (offset 344 in the new stack frame) shortly afterwards: stp x29, x30, [sp,#-352]! mov x29, sp str x0, [x29,#344] As expected, the invocation of `call_filter()` now no longer modifies the top of the previous stack frame. Top of the stack before `call_filter()`: (gdb) x/8x $sp 0x7fcd2774a0: 0x7144d250 0x00000000 0x2724b700 0x00000000 0x7fcd2774b0: 0x1ee19ff0 0x00000071 0x0cc00300 0x00000071 Top of the stack after `call_filter()` (unchanged): (gdb) x/8x $sp 0x7fcd2774a0: 0x7144d250 0x00000000 0x2724b700 0x00000000 0x7fcd2774b0: 0x1ee19ff0 0x00000071 0x0cc00300 0x00000071 <details> <summary>Additional background information</summary> ### The original lucky, "good" behavior for [725ba2a built with Android NDK r14][1] 1. The `resume` parameter for `mono_handle_exception_internal()` is held in register `w23`. 2. That register is saved into the current stack frame at offset 20 by a `str` instruction: 0x00000000000bc1bc <+3012>: str w23, [sp,#20] 3. `handle_exception_first_pass()` invokes `call_filter()` via a `blr` instruction: 2279 filtered = call_filter (ctx, ei->data.filter); 0x00000000000bc60c <+4116>: add x9, x22, x24, lsl #6 0x00000000000bc610 <+4120>: ldr x8, [x8,#3120] 0x00000000000bc614 <+4124>: ldr x1, [x9,#112] 0x00000000000bc618 <+4128>: add x0, sp, #0x110 0x00000000000bc61c <+4132>: blr x8 Before the `blr` instruction, the top of the stack looks like this: (gdb) x/8x $sp 0x7fcd2774a0: 0x7144d250 0x00000000 0x0a8b31d8 0x00000071 0x7fcd2774b0: 0x00000000 0x00000000 0x1ef51980 0x00000071 4. `call_filter()` runs. This function is generated by `mono_arch_get_call_filter()` and starts with: stp x29, x30, [sp,#-336]! mov x29, sp str x0, [x29,#344] Note in particular how the first line subtracts 336 from `sp` to start a new stack frame, but the third line writes to a position that is 344 bytes beyond that, back within the *previous* stack frame. 5. After the invocations of `call_filter()` and `handle_exception_first_pass()`, `w23` is restored from the stack: 0x00000000000bc820 <+4648>: ldr w23, [sp,#20] At this step, the top of the stack looks like this: (gdb) x/8x $sp 0x7fcd2774a0: 0x7144d250 0x00000000 0xcd2775b0 0x0000007f 0x7fcd2774b0: 0x00000000 0x00000000 0x1ef51980 0x00000071 Notice how `call_filter()` has overwritten bytes 8 through 15 of the stack frame. In this original lucky scenario, this does not affect the value restored into register `w23` because that value starts at byte 20. 6. `mono_handle_exception_internal()` tests the value of `w23` to decide how to set the `ji` local variable: 2574 if (resume) { 0x00000000000bb960 <+872>: cbz w23, 0xbb9c4 <mono_handle_exception_internal+972> Since `w23` is `0`, `mono_handle_exception_internal()` correctly continues into the `else` branch. ### The bad behavior for [725ba2a built with Android NDK r19][2] works slightly differently 1. As before, the local `resume` parameter starts in register `w23`. 2. This time, the register is saved into the stack frame at offset 12 (instead of 20): 0x00000000000bed7c <+3200>: str w23, [sp,#12] 3. As before, `handle_exception_first_pass()` invokes `call_filter()` via a `blr` instruction. At this step, the top of the stack looks like this: (gdb) x/8x $sp 0x7fcd2774a0: 0x7144d250 0x00000000 0x2724b700 0x00000000 0x7fcd2774b0: 0x1ee19ff0 0x00000071 0x0cc00300 0x00000071 4. `call_filter()` runs as before. And the first few instructions of that function are the same as before: stp x29, x30, [sp,#-336]! mov x29, sp str x0, [x29,#344] 5. `w23` is again restored from the stack, but this time from offset 12: 0x00000000000bf2c0 <+4548>: ldr w23, [sp,#12] The problem is that `call_filter()` has again overwritten bytes 8 through 15 of the stack frame: (gdb) x/8x $sp 0x7fcd2774a0: 0x7144d250 0x00000000 0xcd2775b0 0x0000007f 0x7fcd2774b0: 0x1ee19ff0 0x00000071 0x0cc00300 0x00000071 So after the `ldr` instruction, `w23` has an incorrect value of `0x7f` rather than the correct value `0`. 6. As before, `mono_handle_exception_internal()` tests the value of `w23` to decide how to set the `ji` local variable: 2574 if (resume) { 0x00000000000be430 <+820>: cbz w23, 0xbe834 <mono_handle_exception_internal+1848> But this time because `w23` is *not* zero, execution continues *incorrectly* into the `if` branch rather than the `else` branch. The result is that `ji` is set to `0` rather than a valid `(MonoJitInfo *)` value. This incorrect value for `ji` leads to a null pointer dereference, as seen in the bug reports. [0]: https://github.com/xamarin/xamarin-android/tree/xamarin-android-9.3.0.22 [1]: https://jenkins.xamarin.com/view/Xamarin.Android/job/xamarin-android-freestyle/1432/Azure/ [2]: https://jenkins.xamarin.com/view/Xamarin.Android/job/xamarin-android-freestyle/1436/Azure/ </details> Backport of #14757. /cc @lambdageek @brendanzagaeski
|
Cannot squash because the following required status checks are not successful:
|
[2019-02] [arm64] Correct exception filter stack frame size Fixes: #14170 Fixes: dotnet/android#3112 The `call_filter()` function generated by `mono_arch_get_call_filter()` was overwriting a part of the previous stack frame because it was not creating a large enough new stack frame for itself. This had been working by luck before the switch to building the runtime with Android NDK r19. I used a local build of [xamarin/xamarin-android/d16-1@87a80b][0] to verify that this change resolved both of the issues linked above. I also confirmed that I was able to reintroduce the issues in my local environment by removing the change and rebuilding. After this change, the generated `call_filter()` function now starts by reserving sufficient space on the stack to hold a 64-bit value at the `ctx_offset` location. The 64-bit `ctx` value is saved to the `ctx_offset` location (offset 344 in the new stack frame) shortly afterwards: stp x29, x30, [sp,#-352]! mov x29, sp str x0, [x29,#344] As expected, the invocation of `call_filter()` now no longer modifies the top of the previous stack frame. Top of the stack before `call_filter()`: (gdb) x/8x $sp 0x7fcd2774a0: 0x7144d250 0x00000000 0x2724b700 0x00000000 0x7fcd2774b0: 0x1ee19ff0 0x00000071 0x0cc00300 0x00000071 Top of the stack after `call_filter()` (unchanged): (gdb) x/8x $sp 0x7fcd2774a0: 0x7144d250 0x00000000 0x2724b700 0x00000000 0x7fcd2774b0: 0x1ee19ff0 0x00000071 0x0cc00300 0x00000071 <details> <summary>Additional background information</summary> ### The original lucky, "good" behavior for [725ba2a built with Android NDK r14][1] 1. The `resume` parameter for `mono_handle_exception_internal()` is held in register `w23`. 2. That register is saved into the current stack frame at offset 20 by a `str` instruction: 0x00000000000bc1bc <+3012>: str w23, [sp,#20] 3. `handle_exception_first_pass()` invokes `call_filter()` via a `blr` instruction: 2279 filtered = call_filter (ctx, ei->data.filter); 0x00000000000bc60c <+4116>: add x9, x22, x24, lsl #6 0x00000000000bc610 <+4120>: ldr x8, [x8,#3120] 0x00000000000bc614 <+4124>: ldr x1, [x9,#112] 0x00000000000bc618 <+4128>: add x0, sp, #0x110 0x00000000000bc61c <+4132>: blr x8 Before the `blr` instruction, the top of the stack looks like this: (gdb) x/8x $sp 0x7fcd2774a0: 0x7144d250 0x00000000 0x0a8b31d8 0x00000071 0x7fcd2774b0: 0x00000000 0x00000000 0x1ef51980 0x00000071 4. `call_filter()` runs. This function is generated by `mono_arch_get_call_filter()` and starts with: stp x29, x30, [sp,#-336]! mov x29, sp str x0, [x29,#344] Note in particular how the first line subtracts 336 from `sp` to start a new stack frame, but the third line writes to a position that is 344 bytes beyond that, back within the *previous* stack frame. 5. After the invocations of `call_filter()` and `handle_exception_first_pass()`, `w23` is restored from the stack: 0x00000000000bc820 <+4648>: ldr w23, [sp,#20] At this step, the top of the stack looks like this: (gdb) x/8x $sp 0x7fcd2774a0: 0x7144d250 0x00000000 0xcd2775b0 0x0000007f 0x7fcd2774b0: 0x00000000 0x00000000 0x1ef51980 0x00000071 Notice how `call_filter()` has overwritten bytes 8 through 15 of the stack frame. In this original lucky scenario, this does not affect the value restored into register `w23` because that value starts at byte 20. 6. `mono_handle_exception_internal()` tests the value of `w23` to decide how to set the `ji` local variable: 2574 if (resume) { 0x00000000000bb960 <+872>: cbz w23, 0xbb9c4 <mono_handle_exception_internal+972> Since `w23` is `0`, `mono_handle_exception_internal()` correctly continues into the `else` branch. ### The bad behavior for [725ba2a built with Android NDK r19][2] works slightly differently 1. As before, the local `resume` parameter starts in register `w23`. 2. This time, the register is saved into the stack frame at offset 12 (instead of 20): 0x00000000000bed7c <+3200>: str w23, [sp,#12] 3. As before, `handle_exception_first_pass()` invokes `call_filter()` via a `blr` instruction. At this step, the top of the stack looks like this: (gdb) x/8x $sp 0x7fcd2774a0: 0x7144d250 0x00000000 0x2724b700 0x00000000 0x7fcd2774b0: 0x1ee19ff0 0x00000071 0x0cc00300 0x00000071 4. `call_filter()` runs as before. And the first few instructions of that function are the same as before: stp x29, x30, [sp,#-336]! mov x29, sp str x0, [x29,#344] 5. `w23` is again restored from the stack, but this time from offset 12: 0x00000000000bf2c0 <+4548>: ldr w23, [sp,#12] The problem is that `call_filter()` has again overwritten bytes 8 through 15 of the stack frame: (gdb) x/8x $sp 0x7fcd2774a0: 0x7144d250 0x00000000 0xcd2775b0 0x0000007f 0x7fcd2774b0: 0x1ee19ff0 0x00000071 0x0cc00300 0x00000071 So after the `ldr` instruction, `w23` has an incorrect value of `0x7f` rather than the correct value `0`. 6. As before, `mono_handle_exception_internal()` tests the value of `w23` to decide how to set the `ji` local variable: 2574 if (resume) { 0x00000000000be430 <+820>: cbz w23, 0xbe834 <mono_handle_exception_internal+1848> But this time because `w23` is *not* zero, execution continues *incorrectly* into the `if` branch rather than the `else` branch. The result is that `ji` is set to `0` rather than a valid `(MonoJitInfo *)` value. This incorrect value for `ji` leads to a null pointer dereference, as seen in the bug reports. [0]: https://github.com/xamarin/xamarin-android/tree/xamarin-android-9.3.0.22 [1]: https://jenkins.xamarin.com/view/Xamarin.Android/job/xamarin-android-freestyle/1432/Azure/ [2]: https://jenkins.xamarin.com/view/Xamarin.Android/job/xamarin-android-freestyle/1436/Azure/ </details> Backport of #14757. /cc @lambdageek @brendanzagaeski
[2019-06] [arm64] Correct exception filter stack frame size Fixes: #14170 Fixes: dotnet/android#3112 The `call_filter()` function generated by `mono_arch_get_call_filter()` was overwriting a part of the previous stack frame because it was not creating a large enough new stack frame for itself. This had been working by luck before the switch to building the runtime with Android NDK r19. I used a local build of [xamarin/xamarin-android/d16-1@87a80b][0] to verify that this change resolved both of the issues linked above. I also confirmed that I was able to reintroduce the issues in my local environment by removing the change and rebuilding. After this change, the generated `call_filter()` function now starts by reserving sufficient space on the stack to hold a 64-bit value at the `ctx_offset` location. The 64-bit `ctx` value is saved to the `ctx_offset` location (offset 344 in the new stack frame) shortly afterwards: stp x29, x30, [sp,#-352]! mov x29, sp str x0, [x29,#344] As expected, the invocation of `call_filter()` now no longer modifies the top of the previous stack frame. Top of the stack before `call_filter()`: (gdb) x/8x $sp 0x7fcd2774a0: 0x7144d250 0x00000000 0x2724b700 0x00000000 0x7fcd2774b0: 0x1ee19ff0 0x00000071 0x0cc00300 0x00000071 Top of the stack after `call_filter()` (unchanged): (gdb) x/8x $sp 0x7fcd2774a0: 0x7144d250 0x00000000 0x2724b700 0x00000000 0x7fcd2774b0: 0x1ee19ff0 0x00000071 0x0cc00300 0x00000071 <details> <summary>Additional background information</summary> ### The original lucky, "good" behavior for [725ba2a built with Android NDK r14][1] 1. The `resume` parameter for `mono_handle_exception_internal()` is held in register `w23`. 2. That register is saved into the current stack frame at offset 20 by a `str` instruction: 0x00000000000bc1bc <+3012>: str w23, [sp,#20] 3. `handle_exception_first_pass()` invokes `call_filter()` via a `blr` instruction: 2279 filtered = call_filter (ctx, ei->data.filter); 0x00000000000bc60c <+4116>: add x9, x22, x24, lsl #6 0x00000000000bc610 <+4120>: ldr x8, [x8,#3120] 0x00000000000bc614 <+4124>: ldr x1, [x9,#112] 0x00000000000bc618 <+4128>: add x0, sp, #0x110 0x00000000000bc61c <+4132>: blr x8 Before the `blr` instruction, the top of the stack looks like this: (gdb) x/8x $sp 0x7fcd2774a0: 0x7144d250 0x00000000 0x0a8b31d8 0x00000071 0x7fcd2774b0: 0x00000000 0x00000000 0x1ef51980 0x00000071 4. `call_filter()` runs. This function is generated by `mono_arch_get_call_filter()` and starts with: stp x29, x30, [sp,#-336]! mov x29, sp str x0, [x29,#344] Note in particular how the first line subtracts 336 from `sp` to start a new stack frame, but the third line writes to a position that is 344 bytes beyond that, back within the *previous* stack frame. 5. After the invocations of `call_filter()` and `handle_exception_first_pass()`, `w23` is restored from the stack: 0x00000000000bc820 <+4648>: ldr w23, [sp,#20] At this step, the top of the stack looks like this: (gdb) x/8x $sp 0x7fcd2774a0: 0x7144d250 0x00000000 0xcd2775b0 0x0000007f 0x7fcd2774b0: 0x00000000 0x00000000 0x1ef51980 0x00000071 Notice how `call_filter()` has overwritten bytes 8 through 15 of the stack frame. In this original lucky scenario, this does not affect the value restored into register `w23` because that value starts at byte 20. 6. `mono_handle_exception_internal()` tests the value of `w23` to decide how to set the `ji` local variable: 2574 if (resume) { 0x00000000000bb960 <+872>: cbz w23, 0xbb9c4 <mono_handle_exception_internal+972> Since `w23` is `0`, `mono_handle_exception_internal()` correctly continues into the `else` branch. ### The bad behavior for [725ba2a built with Android NDK r19][2] works slightly differently 1. As before, the local `resume` parameter starts in register `w23`. 2. This time, the register is saved into the stack frame at offset 12 (instead of 20): 0x00000000000bed7c <+3200>: str w23, [sp,#12] 3. As before, `handle_exception_first_pass()` invokes `call_filter()` via a `blr` instruction. At this step, the top of the stack looks like this: (gdb) x/8x $sp 0x7fcd2774a0: 0x7144d250 0x00000000 0x2724b700 0x00000000 0x7fcd2774b0: 0x1ee19ff0 0x00000071 0x0cc00300 0x00000071 4. `call_filter()` runs as before. And the first few instructions of that function are the same as before: stp x29, x30, [sp,#-336]! mov x29, sp str x0, [x29,#344] 5. `w23` is again restored from the stack, but this time from offset 12: 0x00000000000bf2c0 <+4548>: ldr w23, [sp,#12] The problem is that `call_filter()` has again overwritten bytes 8 through 15 of the stack frame: (gdb) x/8x $sp 0x7fcd2774a0: 0x7144d250 0x00000000 0xcd2775b0 0x0000007f 0x7fcd2774b0: 0x1ee19ff0 0x00000071 0x0cc00300 0x00000071 So after the `ldr` instruction, `w23` has an incorrect value of `0x7f` rather than the correct value `0`. 6. As before, `mono_handle_exception_internal()` tests the value of `w23` to decide how to set the `ji` local variable: 2574 if (resume) { 0x00000000000be430 <+820>: cbz w23, 0xbe834 <mono_handle_exception_internal+1848> But this time because `w23` is *not* zero, execution continues *incorrectly* into the `if` branch rather than the `else` branch. The result is that `ji` is set to `0` rather than a valid `(MonoJitInfo *)` value. This incorrect value for `ji` leads to a null pointer dereference, as seen in the bug reports. [0]: https://github.com/xamarin/xamarin-android/tree/xamarin-android-9.3.0.22 [1]: https://jenkins.xamarin.com/view/Xamarin.Android/job/xamarin-android-freestyle/1432/Azure/ [2]: https://jenkins.xamarin.com/view/Xamarin.Android/job/xamarin-android-freestyle/1436/Azure/ </details> Backport of #14757. /cc @lambdageek @brendanzagaeski
[2018-10] [arm64] Correct exception filter stack frame size Fixes: #14170 Fixes: dotnet/android#3112 The `call_filter()` function generated by `mono_arch_get_call_filter()` was overwriting a part of the previous stack frame because it was not creating a large enough new stack frame for itself. This had been working by luck before the switch to building the runtime with Android NDK r19. I used a local build of [xamarin/xamarin-android/d16-1@87a80b][0] to verify that this change resolved both of the issues linked above. I also confirmed that I was able to reintroduce the issues in my local environment by removing the change and rebuilding. After this change, the generated `call_filter()` function now starts by reserving sufficient space on the stack to hold a 64-bit value at the `ctx_offset` location. The 64-bit `ctx` value is saved to the `ctx_offset` location (offset 344 in the new stack frame) shortly afterwards: stp x29, x30, [sp,#-352]! mov x29, sp str x0, [x29,#344] As expected, the invocation of `call_filter()` now no longer modifies the top of the previous stack frame. Top of the stack before `call_filter()`: (gdb) x/8x $sp 0x7fcd2774a0: 0x7144d250 0x00000000 0x2724b700 0x00000000 0x7fcd2774b0: 0x1ee19ff0 0x00000071 0x0cc00300 0x00000071 Top of the stack after `call_filter()` (unchanged): (gdb) x/8x $sp 0x7fcd2774a0: 0x7144d250 0x00000000 0x2724b700 0x00000000 0x7fcd2774b0: 0x1ee19ff0 0x00000071 0x0cc00300 0x00000071 <details> <summary>Additional background information</summary> ### The original lucky, "good" behavior for [725ba2a built with Android NDK r14][1] 1. The `resume` parameter for `mono_handle_exception_internal()` is held in register `w23`. 2. That register is saved into the current stack frame at offset 20 by a `str` instruction: 0x00000000000bc1bc <+3012>: str w23, [sp,#20] 3. `handle_exception_first_pass()` invokes `call_filter()` via a `blr` instruction: 2279 filtered = call_filter (ctx, ei->data.filter); 0x00000000000bc60c <+4116>: add x9, x22, x24, lsl #6 0x00000000000bc610 <+4120>: ldr x8, [x8,#3120] 0x00000000000bc614 <+4124>: ldr x1, [x9,#112] 0x00000000000bc618 <+4128>: add x0, sp, #0x110 0x00000000000bc61c <+4132>: blr x8 Before the `blr` instruction, the top of the stack looks like this: (gdb) x/8x $sp 0x7fcd2774a0: 0x7144d250 0x00000000 0x0a8b31d8 0x00000071 0x7fcd2774b0: 0x00000000 0x00000000 0x1ef51980 0x00000071 4. `call_filter()` runs. This function is generated by `mono_arch_get_call_filter()` and starts with: stp x29, x30, [sp,#-336]! mov x29, sp str x0, [x29,#344] Note in particular how the first line subtracts 336 from `sp` to start a new stack frame, but the third line writes to a position that is 344 bytes beyond that, back within the *previous* stack frame. 5. After the invocations of `call_filter()` and `handle_exception_first_pass()`, `w23` is restored from the stack: 0x00000000000bc820 <+4648>: ldr w23, [sp,#20] At this step, the top of the stack looks like this: (gdb) x/8x $sp 0x7fcd2774a0: 0x7144d250 0x00000000 0xcd2775b0 0x0000007f 0x7fcd2774b0: 0x00000000 0x00000000 0x1ef51980 0x00000071 Notice how `call_filter()` has overwritten bytes 8 through 15 of the stack frame. In this original lucky scenario, this does not affect the value restored into register `w23` because that value starts at byte 20. 6. `mono_handle_exception_internal()` tests the value of `w23` to decide how to set the `ji` local variable: 2574 if (resume) { 0x00000000000bb960 <+872>: cbz w23, 0xbb9c4 <mono_handle_exception_internal+972> Since `w23` is `0`, `mono_handle_exception_internal()` correctly continues into the `else` branch. ### The bad behavior for [725ba2a built with Android NDK r19][2] works slightly differently 1. As before, the local `resume` parameter starts in register `w23`. 2. This time, the register is saved into the stack frame at offset 12 (instead of 20): 0x00000000000bed7c <+3200>: str w23, [sp,#12] 3. As before, `handle_exception_first_pass()` invokes `call_filter()` via a `blr` instruction. At this step, the top of the stack looks like this: (gdb) x/8x $sp 0x7fcd2774a0: 0x7144d250 0x00000000 0x2724b700 0x00000000 0x7fcd2774b0: 0x1ee19ff0 0x00000071 0x0cc00300 0x00000071 4. `call_filter()` runs as before. And the first few instructions of that function are the same as before: stp x29, x30, [sp,#-336]! mov x29, sp str x0, [x29,#344] 5. `w23` is again restored from the stack, but this time from offset 12: 0x00000000000bf2c0 <+4548>: ldr w23, [sp,#12] The problem is that `call_filter()` has again overwritten bytes 8 through 15 of the stack frame: (gdb) x/8x $sp 0x7fcd2774a0: 0x7144d250 0x00000000 0xcd2775b0 0x0000007f 0x7fcd2774b0: 0x1ee19ff0 0x00000071 0x0cc00300 0x00000071 So after the `ldr` instruction, `w23` has an incorrect value of `0x7f` rather than the correct value `0`. 6. As before, `mono_handle_exception_internal()` tests the value of `w23` to decide how to set the `ji` local variable: 2574 if (resume) { 0x00000000000be430 <+820>: cbz w23, 0xbe834 <mono_handle_exception_internal+1848> But this time because `w23` is *not* zero, execution continues *incorrectly* into the `if` branch rather than the `else` branch. The result is that `ji` is set to `0` rather than a valid `(MonoJitInfo *)` value. This incorrect value for `ji` leads to a null pointer dereference, as seen in the bug reports. [0]: https://github.com/xamarin/xamarin-android/tree/xamarin-android-9.3.0.22 [1]: https://jenkins.xamarin.com/view/Xamarin.Android/job/xamarin-android-freestyle/1432/Azure/ [2]: https://jenkins.xamarin.com/view/Xamarin.Android/job/xamarin-android-freestyle/1436/Azure/ </details> Backport of #14757. /cc @lambdageek @brendanzagaeski
|
@monojenkins build failed |
|
@brendanzagaeski @lambdageek all the backports are merged, you can start the bumping fun 🙂 |
|
Cannot squash because the following required status checks are not successful:
|
|
@brendanzagaeski great fix! I appreciate the time you spent on this issue |
Fixes: #14170
Fixes: dotnet/android#3112
The
call_filter()function generated bymono_arch_get_call_filter()was overwriting a part of the previous stack frame because it was not
creating a large enough new stack frame for itself. This had been
working by luck before the switch to building the runtime with Android
NDK r19.
I used a local build of xamarin/xamarin-android/d16-1@87a80b to
verify that this change resolved both of the issues linked above. I
also confirmed that I was able to reintroduce the issues in my local
environment by removing the change and rebuilding.
After this change, the generated
call_filter()function now starts byreserving sufficient space on the stack to hold a 64-bit value at the
ctx_offsetlocation. The 64-bitctxvalue is saved to thectx_offsetlocation (offset 344 in the new stack frame) shortlyafterwards:
As expected, the invocation of
call_filter()now no longer modifiesthe top of the previous stack frame.
Top of the stack before
call_filter():Top of the stack after
call_filter()(unchanged):Additional background information
The original lucky, "good" behavior for mono/mono/2018-08@725ba2a built with Android NDK r14
The
resumeparameter formono_handle_exception_internal()isheld in register
w23.That register is saved into the current stack frame at offset 20 by
a
strinstruction:handle_exception_first_pass()invokescall_filter()via ablrinstruction:
Before the
blrinstruction, the top of the stack looks like this:call_filter()runs. This function is generated bymono_arch_get_call_filter()and starts with:Note in particular how the first line subtracts 336 from
sptostart a new stack frame, but the third line writes to a position
that is 344 bytes beyond that, back within the previous stack
frame.
After the invocations of
call_filter()andhandle_exception_first_pass(),w23is restored from the stack:At this step, the top of the stack looks like this:
Notice how
call_filter()has overwritten bytes 8 through 15 of thestack frame. In this original lucky scenario, this does not affect
the value restored into register
w23because that value starts atbyte 20.
mono_handle_exception_internal()tests the value ofw23todecide how to set the
jilocal variable:Since
w23is0,mono_handle_exception_internal()correctlycontinues into the
elsebranch.The bad behavior for mono/mono/2018-08@725ba2a built with Android NDK r19 works slightly differently
As before, the local
resumeparameter starts in registerw23.This time, the register is saved into the stack frame at offset 12
(instead of 20):
As before,
handle_exception_first_pass()invokescall_filter()via a
blrinstruction.At this step, the top of the stack looks like this:
call_filter()runs as before. And the first few instructions ofthat function are the same as before:
w23is again restored from the stack, but this time from offset 12:The problem is that
call_filter()has again overwritten bytes 8through 15 of the stack frame:
So after the
ldrinstruction,w23has an incorrect value of0x7frather than the correct value0.As before,
mono_handle_exception_internal()tests the value ofw23todecide how to set the
jilocal variable:But this time because
w23is not zero, execution continuesincorrectly into the
ifbranch rather than theelsebranch.The result is that
jiis set to0rather than a valid(MonoJitInfo *)value. This incorrect value forjileads to anull pointer dereference, as seen in the bug reports.