@brendanzagaeski
Contributor

Fixes: #14170
Fixes: dotnet/android#3112

The call_filter() function generated by mono_arch_get_call_filter()
was overwriting a part of the previous stack frame because it was not
creating a large enough new stack frame for itself. This had been
working by luck before the switch to building the runtime with Android
NDK r19.

I used a local build of xamarin/xamarin-android/d16-1@87a80b to
verify that this change resolved both of the issues linked above. I
also confirmed that I was able to reintroduce the issues in my local
environment by removing the change and rebuilding.

After this change, the generated call_filter() function now starts by
reserving sufficient space on the stack to hold a 64-bit value at the
ctx_offset location. The 64-bit ctx value is saved to the
ctx_offset location (offset 344 in the new stack frame) shortly
afterwards:

stp	x29, x30, [sp,#-352]!
mov	x29, sp
str	x0, [x29,#344]

As expected, the invocation of call_filter() now no longer modifies
the top of the previous stack frame.

Top of the stack before call_filter():

(gdb) x/8x $sp
0x7fcd2774a0:	0x7144d250	0x00000000	0x2724b700	0x00000000
0x7fcd2774b0:	0x1ee19ff0	0x00000071	0x0cc00300	0x00000071

Top of the stack after call_filter() (unchanged):

(gdb) x/8x $sp
0x7fcd2774a0:	0x7144d250	0x00000000	0x2724b700	0x00000000
0x7fcd2774b0:	0x1ee19ff0	0x00000071	0x0cc00300	0x00000071

Additional background information

The original lucky, "good" behavior for mono/mono/2018-08@725ba2a built with Android NDK r14

  1. The resume parameter for mono_handle_exception_internal() is
    held in register w23.

  2. That register is saved into the current stack frame at offset 20 by
    a str instruction:

       0x00000000000bc1bc <+3012>:	str	w23, [sp,#20]
    
  3. handle_exception_first_pass() invokes call_filter() via a blr
    instruction:

    2279							filtered = call_filter (ctx, ei->data.filter);
       0x00000000000bc60c <+4116>:	add	x9, x22, x24, lsl #6
       0x00000000000bc610 <+4120>:	ldr	x8, [x8,#3120]
       0x00000000000bc614 <+4124>:	ldr	x1, [x9,#112]
       0x00000000000bc618 <+4128>:	add	x0, sp, #0x110
       0x00000000000bc61c <+4132>:	blr	x8
    

    Before the blr instruction, the top of the stack looks like this:

    (gdb) x/8x $sp
    0x7fcd2774a0:	0x7144d250	0x00000000	0x0a8b31d8	0x00000071
    0x7fcd2774b0:	0x00000000	0x00000000	0x1ef51980	0x00000071
    
  4. call_filter() runs. This function is generated by
    mono_arch_get_call_filter() and starts with:

       stp	x29, x30, [sp,#-336]!
       mov	x29, sp
       str	x0, [x29,#344]
    

    Note in particular how the first line subtracts 336 from sp to
    start a new stack frame, but the third line writes to a position
    that is 344 bytes beyond that. Since 344 - 336 = 8, the 8-byte
    store lands on bytes 8 through 15 of the previous stack frame.

  5. After the invocations of call_filter() and
    handle_exception_first_pass(), w23 is restored from the stack:

       0x00000000000bc820 <+4648>:	ldr	w23, [sp,#20]
    

    At this step, the top of the stack looks like this:

    (gdb) x/8x $sp
    0x7fcd2774a0:	0x7144d250	0x00000000	0xcd2775b0	0x0000007f
    0x7fcd2774b0:	0x00000000	0x00000000	0x1ef51980	0x00000071
    

    Notice how call_filter() has overwritten bytes 8 through 15 of the
    stack frame. In this original lucky scenario, this does not affect
    the value restored into register w23 because that value starts at
    byte 20.

  6. mono_handle_exception_internal() tests the value of w23 to
    decide how to set the ji local variable:

    2574			if (resume) {
       0x00000000000bb960 <+872>:	cbz	w23, 0xbb9c4 <mono_handle_exception_internal+972>
    

    Since w23 is 0, mono_handle_exception_internal() correctly
    continues into the else branch.
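
The arithmetic behind this lucky case can be sketched in a few lines of C. This is only an illustration, not code from the Mono runtime: the helper names `spill_into_caller`, `ranges_overlap`, and `r14_slot_clobbered` are hypothetical, and the numbers (frame size 336, store offset 344, spill slot at offset 20) are taken from the disassembly above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Offset, relative to the caller's sp, at which a store to
 * [x29,#store_off] lands when the callee reserved only frame_size
 * bytes. Assumes store_off >= frame_size. */
static size_t spill_into_caller(size_t frame_size, size_t store_off)
{
    return store_off - frame_size;
}

/* True if byte ranges [a_off, a_off + a_len) and
 * [b_off, b_off + b_len) share at least one byte. */
static bool ranges_overlap(size_t a_off, size_t a_len,
                           size_t b_off, size_t b_len)
{
    return a_off < b_off + b_len && b_off < a_off + a_len;
}

/* NDK r14 build: the stray 8-byte ctx store versus the 4-byte
 * w23 spill slot at offset 20 of the caller's frame. */
static bool r14_slot_clobbered(void)
{
    size_t hit = spill_into_caller(336, 344); /* 344 - 336 = 8 */
    return ranges_overlap(hit, 8, 20, 4);     /* [8,16) vs [20,24) */
}
```

Here `r14_slot_clobbered()` returns false: the store covers bytes 8 through 15, and the spill slot occupies bytes 20 through 23, so the reloaded value is untouched.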

The bad behavior for mono/mono/2018-08@725ba2a built with Android NDK r19 works slightly differently

  1. As before, the local resume parameter starts in register w23.

  2. This time, the register is saved into the stack frame at offset 12
    (instead of 20):

       0x00000000000bed7c <+3200>:	str	w23, [sp,#12]
    
  3. As before, handle_exception_first_pass() invokes call_filter()
    via a blr instruction.

    At this step, the top of the stack looks like this:

    (gdb) x/8x $sp
    0x7fcd2774a0:	0x7144d250	0x00000000	0x2724b700	0x00000000
    0x7fcd2774b0:	0x1ee19ff0	0x00000071	0x0cc00300	0x00000071
    
  4. call_filter() runs as before, and the first few instructions of
    that function are unchanged:

    stp	x29, x30, [sp,#-336]!
    mov	x29, sp
    str	x0, [x29,#344]
    
  5. w23 is again restored from the stack, but this time from offset 12:

       0x00000000000bf2c0 <+4548>:	ldr	w23, [sp,#12]
    

    The problem is that call_filter() has again overwritten bytes 8
    through 15 of the stack frame:

    (gdb) x/8x $sp
    0x7fcd2774a0:	0x7144d250	0x00000000	0xcd2775b0	0x0000007f
    0x7fcd2774b0:	0x1ee19ff0	0x00000071	0x0cc00300	0x00000071
    

    So after the ldr instruction, w23 has an incorrect value of
    0x7f rather than the correct value 0.

  6. As before, mono_handle_exception_internal() tests the value of w23 to
    decide how to set the ji local variable:

    2574			if (resume) {
       0x00000000000be430 <+820>:	cbz	w23, 0xbe834 <mono_handle_exception_internal+1848>
    

    But this time because w23 is not zero, execution continues
    incorrectly into the if branch rather than the else branch.
    The result is that ji is set to 0 rather than a valid
    (MonoJitInfo *) value. This incorrect value for ji leads to a
    null pointer dereference, as seen in the bug reports.
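
The corruption can be reproduced in a small byte-level model. The sketch below is an illustration only, with hypothetical helper names (`store_le64`, `load_le32`, `reloaded_resume`). It assumes the little-endian layout used by AArch64 Android and takes the `ctx` pointer value from the dumps above: `add x0, sp, #0x110` with `sp` = 0x7fcd2774a0 gives 0x7fcd2775b0, which is exactly the 64-bit value that appears at offset 8 after the call.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 64-bit value in little-endian byte order. */
static void store_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Load a 32-bit value stored in little-endian byte order. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Value reloaded into w23 when resume (0) is spilled at slot_off
 * and call_filter() then writes ctx at offset 8 of the same frame.
 * slot_off must leave room for 4 bytes inside the 32-byte model. */
static uint32_t reloaded_resume(size_t slot_off)
{
    uint8_t frame[32];
    const uint64_t ctx = 0x7fcd2775b0ULL; /* sp + 0x110 in the dumps */

    memset(frame, 0, sizeof frame);     /* str w23 with resume == 0 */
    store_le64(frame + 8, ctx);         /* stray str x0, [x29,#344] */
    return load_le32(frame + slot_off); /* ldr w23, [sp,#slot_off] */
}
```

With the NDK r19 slot, `reloaded_resume(12)` yields 0x7f, the upper half of the pointer. With the NDK r14 slot, `reloaded_resume(20)` yields 0.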

@marek-safar
Member

build

@brendanzagaeski
Contributor Author

(/cc @lambdageek) To fill in a little more of the background on how I ended up at this PR: I was originally planning to run only a few experiments with the different versions of libmonosgen-2.0.so before trying the build configuration tests we had discussed. Then curiosity got the better of me, and I think I ended up identifying an appropriate fix.

To do:

  • I didn't investigate whether there would be a good way to add an automated test around this. If appropriate, feel free to add one or point me in the right direction to add one.
  • I will follow up on "Adding: How-to Native Debugging of Mono using Android Studio" (dotnet/android#2942) to get that merged. I will include some additional tips I now have for using lldb and gdb with the Mono runtime on Android devices.

fregs_offset = offset;
offset += num_fregs * 8;
ctx_offset = offset;
ctx_offset += 8;

Contributor

Nit: I think the more correct change (in the sense of saving a couple of bytes on the stack frame) would be to change this line:

ctx_offset += 8;

to

offset += 8;

then the frame size computation below would be correct.

Great find! 👍
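
The effect of the two variants on the frame layout can be checked with a small sketch. It is not the Mono source: `align_to`, `FilterFrame`, and `layout` are hypothetical names, `align_to` is a stand-in for the `ALIGN_TO()` macro, and the inputs (`fregs_offset` of 272, 8 floating-point registers, 16-byte alignment) are assumptions inferred from the disassembly posted in this thread.

```c
#include <stddef.h>

/* Round val up to a multiple of align (a power of two).
 * Stand-in for Mono's ALIGN_TO() macro. */
static size_t align_to(size_t val, size_t align)
{
    return (val + align - 1) & ~(align - 1);
}

typedef struct {
    size_t ctx_offset; /* where ctx is stored, relative to x29 */
    size_t frame_size; /* bytes reserved by the initial stp */
} FilterFrame;

/* bump_offset == 0 models the original code (ctx_offset += 8);
 * bump_offset != 0 models the suggested fix (offset += 8). */
static FilterFrame layout(int bump_offset)
{
    const size_t fregs_offset = 272; /* assumed: d8 saved at #272 */
    const size_t num_fregs = 8;      /* assumed: d8 through d15 */
    FilterFrame f;
    size_t offset = fregs_offset + num_fregs * 8; /* 336 */

    f.ctx_offset = offset;
    if (bump_offset)
        offset += 8;       /* reserve the ctx slot in the frame */
    else
        f.ctx_offset += 8; /* moves the slot, reserves nothing */
    f.frame_size = align_to(offset, 16);
    return f;
}
```

Under these assumptions, `layout(0)` gives a 336-byte frame with `ctx` at offset 344, outside the frame, while `layout(1)` gives a 352-byte frame with `ctx` at offset 336, inside it.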

Member

We should totally do this. The code can be pretty confusing as is. Having a mysterious unused stack slot also makes it harder to follow.

@tobiasschulz

tobiasschulz commented Jun 3, 2019

We desperately need a fix for this. This is a disaster.

This is broken in the stable version of VS2019 16.1.1 and there doesn't seem to be a way to downgrade to 16.0.X for those who are using VS Community.

@marek-safar
Member

/cc @SamMonoRT

@marek-safar marek-safar added this to the 2019-02 (6.0.xx) milestone Jun 3, 2019
@lewurm
Contributor

lewurm commented Jun 3, 2019

@brendanzagaeski I've updated the PR so that it reflects my comment (since Vlad confirmed my observation). Before we start the backport process, I would appreciate it if you could give it another test on an Android device.

@brendanzagaeski
Contributor Author

Oooh, yes, that makes a lot of sense.

My updated testing results with that change are GOOD. The test apps I was using for both issues run correctly.

The top of the generated call_filter() is now:

stp	x29, x30, [sp,#-352]!
mov	x29, sp
str	x0, [x29,#336]
stp	x19, x20, [x29,#168]
stp	x21, x22, [x29,#184]
stp	x23, x24, [x29,#200]
stp	x25, x26, [x29,#216]
stp	x27, x28, [x29,#232]
str	x29, [x29,#248]
str	d8, [x29,#272]
str	d9, [x29,#280]
str	d10, [x29,#288]
str	d11, [x29,#296]
str	d12, [x29,#304]
str	d13, [x29,#312]
str	d14, [x29,#320]
str	d15, [x29,#328]
ldp	x19, x20, [x0,#152]

So the ctx is saved appropriately at offset 336, just after where d15 is saved. (The calculated size for the stack frame in the final output native instructions is still 352 due to the ALIGN_TO() macro, but that's OK. We still get the advantage Vlad mentioned that the C code now correctly matches the intention.)

The stack contents before and after call_filter() also match, as expected. Top of the stack before call_filter():

(gdb) x/8x $sp
0x7fcd2774a0:	0x7144d250	0x00000000	0xab77aed8	0x00000000
0x7fcd2774b0:	0x273d2ff0	0x00000071	0x0cc00300	0x00000071

Top of the stack after call_filter() (unchanged):

(gdb) x/8x $sp
0x7fcd2774a0:	0x7144d250	0x00000000	0xab77aed8	0x00000000
0x7fcd2774b0:	0x273d2ff0	0x00000071	0x0cc00300	0x00000071

@lambdageek
Member

@monojenkins backport to 2019-06
@monojenkins backport to 2019-02
@monojenkins backport to 2018-10
@monojenkins backport to 2018-08

@lewurm
Contributor

lewurm commented Jun 3, 2019

@monojenkins squash

@marek-safar marek-safar closed this Jun 3, 2019
@marek-safar marek-safar reopened this Jun 3, 2019
@marek-safar marek-safar removed this from the 2019-02 (6.0.xx) milestone Jun 3, 2019
monojenkins added a commit that referenced this pull request Jun 3, 2019
[2018-08] [arm64] Correct exception filter stack frame size

Backport of #14757.

/cc @lambdageek @brendanzagaeski
@monojenkins
Contributor

Cannot squash because the following required status checks are not successful:

  • "OS X x64 Android SDK" state is "failure"
  • "Windows x64 C++" state is "failure"

monojenkins added a commit that referenced this pull request Jun 3, 2019
[2019-02] [arm64] Correct exception filter stack frame size

Backport of #14757.

/cc @lambdageek @brendanzagaeski
monojenkins added a commit that referenced this pull request Jun 4, 2019
[2019-06] [arm64] Correct exception filter stack frame size

<details>
<summary>Additional background information</summary>

### The original lucky, "good" behavior for [725ba2a built with Android NDK r14][1]

 1. The `resume` parameter for `mono_handle_exception_internal()` is
    held in register `w23`.

 2. That register is saved into the current stack frame at offset 20 by
    a `str` instruction:

           0x00000000000bc1bc <+3012>:	str	w23, [sp,#20]

 3. `handle_exception_first_pass()` invokes `call_filter()` via a `blr`
    instruction:

        2279							filtered = call_filter (ctx, ei->data.filter);
           0x00000000000bc60c <+4116>:	add	x9, x22, x24, lsl #6
           0x00000000000bc610 <+4120>:	ldr	x8, [x8,#3120]
           0x00000000000bc614 <+4124>:	ldr	x1, [x9,#112]
           0x00000000000bc618 <+4128>:	add	x0, sp, #0x110
           0x00000000000bc61c <+4132>:	blr	x8

    Before the `blr` instruction, the top of the stack looks like this:

        (gdb) x/8x $sp
        0x7fcd2774a0:	0x7144d250	0x00000000	0x0a8b31d8	0x00000071
        0x7fcd2774b0:	0x00000000	0x00000000	0x1ef51980	0x00000071

 4. `call_filter()` runs.  This function is generated by
    `mono_arch_get_call_filter()` and starts with:

           stp	x29, x30, [sp,#-336]!
           mov	x29, sp
           str	x0, [x29,#344]

    Note in particular how the first line subtracts 336 from `sp` to
    start a new stack frame, but the third line writes to a position
    that is 344 bytes beyond that, back within the *previous* stack
    frame.

 5. After the invocations of `call_filter()` and
    `handle_exception_first_pass()`, `w23` is restored from the stack:

           0x00000000000bc820 <+4648>:	ldr	w23, [sp,#20]

    At this step, the top of the stack looks like this:

        (gdb) x/8x $sp
        0x7fcd2774a0:	0x7144d250	0x00000000	0xcd2775b0	0x0000007f
        0x7fcd2774b0:	0x00000000	0x00000000	0x1ef51980	0x00000071

    Notice how `call_filter()` has overwritten bytes 8 through 15 of the
    stack frame.  In this original lucky scenario, this does not affect
    the value restored into register `w23` because that 4-byte value
    occupies bytes 20 through 23.

 6. `mono_handle_exception_internal()` tests the value of `w23` to
    decide how to set the `ji` local variable:

        2574			if (resume) {
           0x00000000000bb960 <+872>:	cbz	w23, 0xbb9c4 <mono_handle_exception_internal+972>

    Since `w23` is `0`, `mono_handle_exception_internal()` correctly
    continues into the `else` branch.
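
The overlap arithmetic in steps 4 and 5 can be sketched in C.  This is only an illustration, not code from the runtime: the helper names are hypothetical, and the constants are copied from the disassembly above.  The offset 12 case is the NDK r19 layout described in the next section, and 352 is the frame size after the fix.

```c
#include <stdbool.h>

/* Values taken from the generated call_filter() prologue. */
enum {
    CTX_OFFSET = 344, /* str x0, [x29,#344] */
    CTX_SIZE   = 8    /* x0 is a 64-bit register */
};

/* Hypothetical helper: offset, relative to the caller's sp, of the first
 * byte written by the ctx store.  A negative result means the store stays
 * inside call_filter()'s own frame. */
static long clobber_start(long frame_size)
{
    return CTX_OFFSET - frame_size;
}

/* True if the half-open byte ranges [a, a+a_len) and [b, b+b_len) overlap. */
static bool ranges_overlap(long a, long a_len, long b, long b_len)
{
    return a < b + b_len && b < a + a_len;
}

/* Hypothetical helper: does the ctx store overwrite a spill slot that the
 * caller keeps at [sp + slot_offset, sp + slot_offset + slot_size)? */
static bool slot_is_clobbered(long frame_size, long slot_offset, long slot_size)
{
    return ranges_overlap(clobber_start(frame_size), CTX_SIZE,
                          slot_offset, slot_size);
}
```

With a 336-byte frame, `clobber_start()` gives 8, so `slot_is_clobbered(336, 20, 4)` is false (the r14 layout) while `slot_is_clobbered(336, 12, 4)` is true (the r19 layout).  With a 352-byte frame the store starts at -8, inside the new frame, and neither slot is touched.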

### The bad behavior for [725ba2a built with Android NDK r19][2] works slightly differently

 1. As before, the local `resume` parameter starts in register `w23`.
 2. This time, the register is saved into the stack frame at offset 12
    (instead of 20):

           0x00000000000bed7c <+3200>:	str	w23, [sp,#12]

 3. As before, `handle_exception_first_pass()` invokes `call_filter()`
    via a `blr` instruction.

    At this step, the top of the stack looks like this:

        (gdb) x/8x $sp
        0x7fcd2774a0:	0x7144d250	0x00000000	0x2724b700	0x00000000
        0x7fcd2774b0:	0x1ee19ff0	0x00000071	0x0cc00300	0x00000071

 4. `call_filter()` runs as before, and its first few instructions are
    unchanged:

        stp	x29, x30, [sp,#-336]!
        mov	x29, sp
        str	x0, [x29,#344]

 5. `w23` is again restored from the stack, but this time from offset 12:

           0x00000000000bf2c0 <+4548>:	ldr	w23, [sp,#12]

    The problem is that `call_filter()` has again overwritten bytes 8
    through 15 of the stack frame:

        (gdb) x/8x $sp
        0x7fcd2774a0:	0x7144d250	0x00000000	0xcd2775b0	0x0000007f
        0x7fcd2774b0:	0x1ee19ff0	0x00000071	0x0cc00300	0x00000071

    So after the `ldr` instruction, `w23` holds `0x7f`, the upper 32
    bits of the 64-bit value `0x0000007fcd2775b0` that `call_filter()`
    stored at bytes 8 through 15, rather than the correct value `0`.

 6. As before, `mono_handle_exception_internal()` tests the value of `w23` to
    decide how to set the `ji` local variable:

        2574			if (resume) {
           0x00000000000be430 <+820>:	cbz	w23, 0xbe834 <mono_handle_exception_internal+1848>

    But this time because `w23` is *not* zero, execution continues
    *incorrectly* into the `if` branch rather than the `else` branch.
    The result is that `ji` is set to `0` rather than a valid
    `(MonoJitInfo *)` value.  This incorrect value for `ji` leads to a
    null pointer dereference, as seen in the bug reports.
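
To make step 5 concrete, here is a small C model of the store and the reload.  It is a simplified sketch, not runtime code: the function names are hypothetical, the frame is reduced to its top 32 bytes, and every byte other than the overwritten ones is assumed to be zero (so the saved `resume` value is `0`).  The byte order is little-endian, as on the arm64 Android devices in question.

```c
#include <stddef.h>
#include <stdint.h>

/* Store a 64-bit value at byte offset `off` in little-endian order. */
static void store_le64(uint8_t *frame, size_t off, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        frame[off + i] = (uint8_t)(v >> (8 * i));
}

/* Load a 32-bit value from byte offset `off` in little-endian order. */
static uint32_t load_le32(const uint8_t *frame, size_t off)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v |= (uint32_t)frame[off + i] << (8 * i);
    return v;
}

/* Hypothetical model: the caller spilled resume == 0 at `spill_offset`,
 * call_filter() then stored `ctx` at caller offset 8 (344 - 336), and the
 * caller reloads the 32-bit spill slot.  spill_offset must be <= 28. */
static uint32_t reload_resume_after_filter(size_t spill_offset, uint64_t ctx)
{
    uint8_t frame[32] = {0};               /* simplified caller frame */
    store_le64(frame, 8, ctx);             /* str x0, [x29,#344] */
    return load_le32(frame, spill_offset); /* ldr w23, [sp,#spill_offset] */
}
```

Using the `ctx` value from the dump, `reload_resume_after_filter(12, 0x0000007fcd2775b0ULL)` returns `0x7f`, while the same call with offset 20 returns `0`, matching the r19 and r14 behavior respectively.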

[0]: https://github.com/xamarin/xamarin-android/tree/xamarin-android-9.3.0.22
[1]: https://jenkins.xamarin.com/view/Xamarin.Android/job/xamarin-android-freestyle/1432/Azure/
[2]: https://jenkins.xamarin.com/view/Xamarin.Android/job/xamarin-android-freestyle/1436/Azure/

</details>

Backport of #14757.

/cc @lambdageek @brendanzagaeski
monojenkins added a commit that referenced this pull request Jun 4, 2019
[2018-10] [arm64] Correct exception filter stack frame size

@lewurm

lewurm commented Jun 4, 2019

@monojenkins build failed

@lewurm

lewurm commented Jun 4, 2019

@brendanzagaeski @lambdageek all the backports are merged, you can start the bumping fun 🙂

@monojenkins

Cannot squash because the following required status checks are not successful:

  • "OS X x64 Android SDK" state is "failure"

@marek-safar marek-safar merged commit 888a8d4 into mono:master Jun 5, 2019
@marek-safar

@brendanzagaeski great fix! I appreciate the time you spent on this issue
