aykevl
Contributor

@aykevl aykevl commented Feb 13, 2018

By adding __builtin_unreachable() at the end of nlr_push, we're essentially telling the compiler that this function will never return. When GCC link-time optimisation is in use, this means that any time nlr_push() is called (which is often), the compiler thinks this function will never return and thus eliminates all code following the call. It breaks the nrf port, which uses -flto to reduce code size. When the port compiles with 97cc485, the code size is reduced by about 10K so it was easy to see there's some invalid code elimination going on.

Note: older GCC versions might complain about a missing return statement, but at least version 5.4.1 (Debian stretch) doesn't. If there are any versions that complain, I would propose inserting a return statement anyway for that GCC version and lower as a workaround.

See also:
#3484
#3492
97cc485 (introducing the issue)

@aykevl
Contributor Author

aykevl commented Feb 13, 2018

I see one of the tests fails, but it looks unrelated:

CC ../../ports/stm32/pin.c
../../ports/stm32/pin.c: In function 'pin_obj_init_helper':
../../ports/stm32/pin.c:379:32: error: 'GPIO_SPEED_FREQ_HIGH' undeclared (first use in this function)
     GPIO_InitStructure.Speed = GPIO_SPEED_FREQ_HIGH;
                                ^
../../ports/stm32/pin.c:379:32: note: each undeclared identifier is reported only once for each function it appears in
make: *** [build/ports/stm32/pin.o] Error 1
make: Leaving directory `/home/travis/build/micropython/micropython/ports/teensy'
The command "make -C ports/teensy" exited with 2.

@stinos
Contributor

stinos commented Feb 14, 2018

Since the comment on __builtin_unreachable says what it's there for, and that seems a good reason, I don't think simply removing this is acceptable (not sure, I didn't write it). I'd think the fix is to leave it there, conditionally. That is, it should be there by default, but shouldn't be there when -flto is active and the gcc version is new enough.

@dpgeorge
Member

> I see one of the tests fails, but it looks unrelated:

That should be fixed now. If you rebase on to the latest then Travis should build OK.

@dpgeorge
Member

As you see, it's not easy to get this right :) Especially to support all versions of gcc (of which older ones definitely seem buggy) and also LTO (which seems a bit fragile with respect to hand-written assembler code).

> I'd think the fix is to leave it there, conditionally. Like, should be there by default, but shouldn't be there when -flto is active and gcc version is new enough.

That's also what I'd suggest. @aykevl are you able to add such a check for the version (not sure which version...) and/or LTO?

@aykevl aykevl force-pushed the fix-lto-nlr branch 2 times, most recently from 921c5e7 to 87f8b47 on February 14, 2018 17:30
@aykevl
Contributor Author

aykevl commented Feb 14, 2018

I've investigated this a bit more, and updated the PR.

Some older GCC versions indeed complain about not returning anything in a naked function. Versions 4.6 and lower do, versions 5.4 and up don't, so I've added a return 0 only for GCC versions < 5.4. Looking at the generated assembly code, returning 0 results in mov r0, #0 which is never reached but harmless. Using __builtin_unreachable() instead eliminates lots of reachable code.

To further verify that __builtin_unreachable() marks the function as noreturn, I've created a small gist: https://gist.github.com/aykevl/ecdb91a6ade834879849dde160fa36fc
It is made for an ARM system, like the Raspberry Pi, and can be compiled using gcc -o lto-test lto-test.c -O2. With __builtin_unreachable() commented out, the program does what you would expect. But when you uncomment __builtin_unreachable(), everything following the call to foo() is removed, effectively marking it noreturn. This is easy to verify in the generated assembly code:

without __builtin_unreachable():

00010344 <main>:
   10344:       e92d4010        push    {r4, lr}
   10348:       eb000055        bl      104a4 <foo>
   1034c:       e1a01000        mov     r1, r0
   10350:       e59f0008        ldr     r0, [pc, #8]    ; 10360 <main+0x1c>
   10354:       ebffffeb        bl      10308 <printf@plt>
   10358:       e3a00000        mov     r0, #0
   1035c:       e8bd8010        pop     {r4, pc}
   10360:       0001051c        .word   0x0001051c

with __builtin_unreachable():

00010314 <main>:
   10314:       e92d4010        push    {r4, lr}
   10318:       eb00004f        bl      1045c <foo>

So all in all I think __builtin_unreachable() should never be used in a case like this. Returning something (like 0) is technically invalid, but from all I've seen it does no harm either. Only doing that for older GCC versions makes sure we won't add the unnecessary mov r0, #0 and won't somehow cause problems in the future.

@aykevl
Contributor Author

aykevl commented Feb 14, 2018

Note: it increases code size on Travis (using GCC 4.8). I'll test a few things and update this comment.

UPDATE
It appears not returning anything from naked functions is allowed at least in GCC 4.8, so I've updated the check.
__builtin_unreachable() was added in GCC 4.5, so it would only cover 3 minor versions (4.5-4.7). For that reason alone it would be a poor choice.
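The version cutoff described in this update could be expressed as a preprocessor check along the following lines. This is only a minimal sketch, not the code from the PR: the macro name NLR_NEEDS_DUMMY_RETURN and the helper gcc_older_than() are hypothetical, and the 4.8 threshold is an assumption taken from the update above. The helper repeats the same comparison at run time so the logic can be checked for arbitrary version numbers.

```c
/* Hypothetical helper: returns 1 when version major.minor is older than
 * ref_major.ref_minor, 0 otherwise. It mirrors the preprocessor
 * comparison below so the cutoff logic can be exercised at run time. */
int gcc_older_than(int major, int minor, int ref_major, int ref_minor) {
    return major < ref_major || (major == ref_major && minor < ref_minor);
}

/* Hypothetical macro name. The 4.8 cutoff is assumed from the discussion:
 * GCC older than 4.8 is taken to need a dummy return statement in a
 * naked function. defined(__GNUC__) keeps compilers that do not define
 * the macro out of the workaround. */
#if defined(__GNUC__) && (__GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 8))
#define NLR_NEEDS_DUMMY_RETURN 1
#else
#define NLR_NEEDS_DUMMY_RETURN 0
#endif
```

Under these assumptions, gcc_older_than(4, 6, 4, 8) is true (the workaround applies), while 4.8 and 5.4 fall outside the check.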

Again, there is a failing test but I can't see how it's related to this PR:

$ (cd tests && MICROPY_CPYTHON3=python3.4 MICROPY_MICROPYTHON=../ports/unix/micropython_coverage ./run-tests -d thread)
[...]
17 tests performed (55 individual testcases)
16 tests passed
9 tests skipped: mutate_bytearray mutate_dict mutate_instance mutate_list mutate_set stress_heap stress_recurse thread_gc1 thread_lock4
1 tests failed: thread_qstr1

@dpgeorge
Member

@aykevl thanks for the detailed investigation and testing. I'm happy with the PR as it stands now.

py/nlrthumb.c Outdated
Member

I just noticed that this will probably not work well with non-GNUC compilers, because the __GNUC__ < 4 check will probably succeed if __GNUC__ is not defined. So I guess it needs to retain the `defined(__GNUC__)` bit at the start of the if.

Contributor Author

Ah, yes, that check should be there. Fixed it.
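
The pitfall raised in this review thread comes from how the C preprocessor treats identifiers that are not defined: inside #if they are replaced by 0, so a bare "__GNUC__ < 4" is true on a compiler that never defines __GNUC__. A minimal sketch of the effect, using a hypothetical macro name HYPOTHETICAL_CC_MAJOR as a stand-in for __GNUC__ so it behaves the same on any compiler:

```c
/* HYPOTHETICAL_CC_MAJOR stands in for __GNUC__ on a compiler that does
 * not define it. It is deliberately left undefined here. */
#undef HYPOTHETICAL_CC_MAJOR

/* Unguarded check: the undefined identifier is replaced by 0 inside #if,
 * so the condition becomes 0 < 4 and the branch is taken. */
#if HYPOTHETICAL_CC_MAJOR < 4
#define UNGUARDED_CHECK_TAKEN 1
#else
#define UNGUARDED_CHECK_TAKEN 0
#endif

/* Guarded check: defined(...) is false, so the whole condition is false
 * and the branch is skipped. */
#if defined(HYPOTHETICAL_CC_MAJOR) && HYPOTHETICAL_CC_MAJOR < 4
#define GUARDED_CHECK_TAKEN 1
#else
#define GUARDED_CHECK_TAKEN 0
#endif

/* Small accessors so the results can be inspected at run time. */
int unguarded_check_taken(void) { return UNGUARDED_CHECK_TAKEN; }
int guarded_check_taken(void) { return GUARDED_CHECK_TAKEN; }
```

Only the guarded form leaves the workaround out for compilers that do not identify themselves through the macro.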

By adding __builtin_unreachable() at the end of nlr_push, we're
essentially telling the compiler that this function will never return.
When GCC LTO is in use, this means that any time nlr_push() is called
(which is often), the compiler thinks this function will never return
and thus eliminates all code following the call.

Note: I've added a 'return 0' for older GCC versions like 4.6 which
complain about not returning anything (which doesn't make sense in a
naked function). Newer GCC versions (tested 4.8, 5.4 and some others)
don't complain about this.
@dpgeorge dpgeorge merged commit 5591bd2 into micropython:master Feb 18, 2018
@dpgeorge
Member

Thanks for updating the PR, and thanks for writing a good commit message! Merged.

@aykevl aykevl deleted the fix-lto-nlr branch February 18, 2018 14:25
tannewt added a commit to tannewt/circuitpython that referenced this pull request Dec 3, 2020
Moving Adafruit_CircuitPython_BusDevice to core