mpy_ld.py: Support modules larger than 4KiB on armv6m #12241

jonnor · 2023-08-15T20:15:54Z

I hit a problem with modules larger than 4 KiB when working on dynamic native modules for RP2040. This should fix that for ARM Thumb targets, using a strategy similar to the one used for ARM Thumb 2 targets.

I have tested it on RP2040, and it seems to work: my module gets initialized, and the defined functions can be called. I have never written Thumb assembly before, so please sanity-check it!

jonnor · 2023-08-15T20:17:48Z

Besides testing on device, I wrote an automated test. However, I am not sure where it should be placed (or whether you want such tests), as I could not find any existing unit tests for functions in mpy_ld.py.

Here is the code for the test:

from tools.mpy_ld import asm_jump_thumb
import struct

def play_thumb_bl(instruction, PC=0):
    """Compute the effects of "BL" ARM Thumb instruction on the program counter (PC)

    Based on explanations and pseudocode from
    https://stackoverflow.com/a/70756436/
    """

    bl0, bl1 = struct.unpack('<HH', instruction)

    # check that this is a valid BL instruction
    # Encoding: 1111 HOOO OOOO OOOO
    code0 = (bl0 & 0xF800) >> 11
    assert code0 == 0b11110 # H=0
    code1 = (bl1 & 0xF800) >> 11
    assert code1 == 0b11111 # H=1

    # always forward 1 instruction
    # An offset of 0 means next instruction
    PC += 4

    # effect of first instruction
    offset = bl0 & 0x7FF
    LR = PC + (offset << 12)

    # effect of second instruction
    offset = bl1 & 0x7FF
    PC = LR + (offset << 1)

    return PC


def test_asm_jump_thumb():

    start = 0x1000 # BL only used for more than 11 bit offset
    for offset in range(start, 0xFFFF, 4):

        instruction = asm_jump_thumb(offset)
        pc_change = play_thumb_bl(instruction)

        #print(instruction)
        assert pc_change == offset

test_asm_jump_thumb()

codecov · 2023-08-15T20:21:38Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.42%. Comparing base (f61fac0) to head (1962597).
Report is 217 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff           @@
##           master   #12241   +/-   ##
=======================================
  Coverage   98.42%   98.42%           
=======================================
  Files         161      161           
  Lines       21251    21253    +2     
=======================================
+ Hits        20917    20919    +2     
  Misses        334      334


github-actions · 2023-08-15T20:25:09Z

Code size report:

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:    +0 +0.000% standard
      stm32:    +0 +0.000% PYBV10
     mimxrt:    +0 +0.000% TEENSY40
        rp2:    +0 +0.000% RPI_PICO_W
       samd:    +0 +0.000% ADAFRUIT_ITSYBITSY_M4_EXPRESS

dpgeorge · 2023-09-01T05:52:14Z

Thanks for the contribution.

But I don't think this will work because it's a bl instruction which is branch-and-link. That means it stores the return PC in LR. So I don't see how this code can return to the caller, because LR is overwritten.

I'm not sure how this patch worked in your case?

jonnor · 2023-09-01T18:21:02Z

Hi @dpgeorge and thank you for the review. I have no idea why it would appear to work... This is my first time doing any ARM assembly. Do you have any tips as to what the correct approach would be?

I don't really know where this code is invoked from, which makes it a bit hard for me to reason about. I have also not found any way to get a disassembly of the .mpy files to look at the code produced. I guess I need to bring out a debugger on an ARM Cortex-M0 to see what is going on.

dpgeorge · 2023-09-02T07:41:21Z

Looking at assembly output of the rp2 firmware (generated by gcc), you can find these patterns used to do a long jump:

200076f8 <__mbedtls_rsa_import_veneer>:
200076f8:       b401            push    {r0}
200076fa:       4802            ldr     r0, [pc, #8]
200076fc:       4684            mov     ip, r0
200076fe:       bc01            pop     {r0}
20007700:       4760            bx      ip
20007702:       bf00            nop
20007704:       100516cd        .word   0x100516cd

That's 16 bytes, but I think it's the proper way to do it. You might be able to get away with doing it in fewer bytes using something like:

push {r0, lr}
bl <dest>
pop {r0, pc}

That's only 8 bytes. The downside is that it increases the stack usage by 8 bytes, and it may also make the code harder to debug due to the unusual stack frame. But I don't think that's a concern here. So maybe try using this 8-byte version.
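To make the LR concern concrete, here is a toy Python model of where execution ends up after mpy_init returns with `bx lr`. It is only a sketch: the function names and addresses are made up for illustration, r0 is left out, and nothing here is taken from mpy_ld.py.

```python
def return_address_plain_bl(caller_return, after_bl):
    """Entry stub is a bare `bl mpy_init`; mpy_init returns with `bx lr`."""
    lr = caller_return   # LR as set by the runtime's call into the stub
    lr = after_bl        # bl overwrites LR with the address after itself
    pc = lr              # mpy_init: bx lr
    return pc            # execution resumes inside the stub; caller_return is lost


def return_address_push_bl_pop(caller_return, after_bl):
    """Entry stub is `push {r0, lr}; bl mpy_init; pop {r0, pc}`."""
    stack = []
    lr = caller_return   # LR as set by the runtime's call into the stub
    stack.append(lr)     # push {r0, lr} (r0 is left out of this model)
    lr = after_bl        # bl overwrites LR
    pc = lr              # mpy_init: bx lr, which lands on the pop
    pc = stack.pop()     # pop {r0, pc} loads the saved LR into PC
    return pc
```

With the bare `bl`, the model ends at the address after the `bl` rather than at the caller. With the push/pop pair, it ends back at the caller.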

But note that this version should only be used if necessary. That makes it a bit tricky, because you need to know the address/offset of mpy_init early on so that you know whether the offset to this symbol will fit in a short branch or whether the above long branch will be needed.

jonnor · 2023-09-04T08:38:19Z

Thanks a lot @dpgeorge - I will try something like that, sometime in the next weeks.

Regarding only doing it when necessary, this is already inside a check for how large the offset is. So that should be sufficient?

dpgeorge · 2023-09-04T09:00:57Z

> Regarding only doing it when necessary, this is already inside a check for how large the offset is. So that should be sufficient?

Yes, but the issue is that the asm_jump function is called twice, the first time with a dummy value of 8. This first call is used to determine the offsets of everything. The second time it's called, it must return a bytes object of the same size as it returned the first time. There are ways around this, but it's a bit tricky.
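The size constraint can be illustrated with a small Python sketch. The encoder names are hypothetical, the 8-byte long jump is a zero-filled placeholder, and only the dummy value of 8 comes from the comment above. The first encoder changes size with the offset. The second always returns 8 bytes by padding the short branch.

```python
import struct


def short_or_long_jump(entry):
    """Hypothetical variable-size encoder: 2 bytes if the branch fits, else 8."""
    b_off = entry - 4  # Thumb reads PC as the branch address + 4
    if -2048 <= b_off < 2048:
        # Unconditional short branch: 0xE000 | imm11
        return struct.pack("<H", 0xE000 | ((b_off >> 1) & 0x7FF))
    return bytes(8)  # placeholder standing in for an 8-byte long jump


def fixed_size_jump(entry):
    """Hypothetical fixed-size encoder: always 8 bytes."""
    # The padding after an unconditional branch is never executed.
    return short_or_long_jump(entry).ljust(8, b"\x00")


def sizes_agree(encoder, real_entry, dummy_entry=8):
    """True if the sizing pass and the final pass emit the same byte count."""
    return len(encoder(dummy_entry)) == len(encoder(real_entry))
```

For a far entry point such as 6157, `sizes_agree(short_or_long_jump, 6157)` is false while `sizes_agree(fixed_size_jump, 6157)` is true, which is the property the second call relies on.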

jonnor · 2024-07-06T22:16:30Z

> Yes, but the issue is that the asm_jump function is called twice, the first time with a dummy value of 8. This first call is used to determine the offsets of everything. The second time it's called, it must return a bytes object of the same size as it returned the first time. There are ways around this, but it's a bit tricky.

I see. In that case, I propose always doing the longer jump. It is only a few bytes more than the short jump, and this is code that exists once per module and is called once (at import time).

jonnor · 2024-07-07T10:33:45Z

I tried doing the jump with the push and pop (the 8-byte version), but my first attempts just ended up hanging. I will order a hardware debugger so I can look at this properly.

Where is this jump performed when actually loading the .mpy file? I need to find a decent starting point for my debugging...
I have found mp_raw_code_load_file() in py/persistentcode.c, which seems to load the .mpy and store it in (executable) memory. And do_execute_proto_fun() in py/builtinimport.c seems to be the place where the module is executed?
Is that where mp_make_function_from_proto_fun() and mp_obj_new_fun_native() are used to make a MicroPython function that wraps the executable code? So my best guess is that fun_data is the actual executable code from the .mpy file (the env.full_text when seen from the perspective of build_mpy() in mpy_ld.py)?

Any guidance here would be much appreciated!

dpgeorge · 2024-07-08T01:49:04Z

> I tried doing the jump with the pushing and popping (8 byte version). But first attempts just ended up with things hanging.

Maybe try doing the 16-byte version to start with, because we know that works.
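As a starting point, the 16-byte veneer from the GCC listing earlier in this thread can be packed in Python as below. This is a sketch rather than code from mpy_ld.py: the function names are made up, the halfwords are copied from that listing, and it jumps to an absolute address, so it assumes the veneer starts on a 4-byte boundary and the final address is known.

```python
import struct


def gcc_style_veneer(target):
    """Pack the 16-byte veneer from the GCC listing for an absolute target."""
    return struct.pack(
        "<HHHHHHI",
        0xB401,  # push {r0}
        0x4802,  # ldr  r0, [pc, #8]
        0x4684,  # mov  ip, r0
        0xBC01,  # pop  {r0}
        0x4760,  # bx   ip
        0xBF00,  # nop  (pads the literal to a 4-byte boundary)
        target & 0xFFFFFFFF,  # .word: absolute address, bit 0 set for Thumb
    )


def literal_address(veneer_start):
    """Address that `ldr r0, [pc, #8]` reads from, for a veneer at veneer_start."""
    ldr_addr = veneer_start + 2          # the ldr is the second halfword
    return ((ldr_addr + 4) & ~3) + 8     # Align(PC, 4) + 8, with PC = ldr + 4
```

For the listing's addresses, `literal_address(0x200076f8)` gives `0x20007704`, which is where the `.word` sits.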

> Where is this jump performed when actually loading the .mpy file?

When the .mpy is imported (or any .py file is imported) the top-level module code is executed. That's the point where the native code is run, and it just executes the first instruction in the text segment. That first instruction is this jump, which jumps to mpy_init within the native code.

The transfer of control from MicroPython runtime to the native code (the jump) happens via:

do_execute_proto_fun
  mp_call_function_0
    mp_call_function_n_kw
      fun_viper_call
        <jump instruction>
          mpy_init

Using a veneer with the BX instruction

All other architectures seems to support larger modules already

Fixes exception such as:

  File "micropython/tools/mpy_ld.py", line 904, in build_mpy
    jump = env.arch.asm_jump(entry_offset)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "micropython/tools/mpy_ld.py", line 92, in asm_jump_thumb
    assert b_off >> 11 == 0 or b_off >> 11 == -1, b_off
                               ^^^^^^^^^^^^^^^^^
AssertionError: 6153

Signed-off-by: Jon Nordby <[email protected]>
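The assertion in that traceback checks that the branch offset fits in a signed 12-bit value. A minimal Python sketch of the same condition follows. The helper name is hypothetical, and only the expression itself comes from the traceback.

```python
def fits_short_branch(b_off):
    """Same condition as the assertion in the traceback above."""
    # An arithmetic shift by 11 leaves 0 for 0..2047 and -1 for -2048..-1.
    return b_off >> 11 == 0 or b_off >> 11 == -1
```

The reported value fails the check because `6153 >> 11` is 3, so `fits_short_branch(6153)` is false.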

jonnor · 2024-07-14T23:47:52Z

@dpgeorge thanks a lot for the tips. Being able to trace the code in the debugger made it much more doable to figure this out.

After some fiddling, I seem to have gotten the BX/veneer type of jump to work. It is like the example veneer from GCC you showed above - except that we need to add PC to our offset value to achieve a relative jump.
I will do some more testing on various module sizes etc., clean up the code a bit, and probably update the native mod tests for ARMv6m.

Is it acceptable to always do this far jump? Then the change can be localized to asm_jump_thumb(). Otherwise, we will have to complicate the overall logic of mpy_ld to account for the potentially changing jump size (as you mentioned in a previous comment). That seems like quite a bit of trouble to save (up to) 12 bytes in module size... But it is up to you as maintainers to decide.

dpgeorge · 2024-07-15T00:28:39Z

> After some fiddling, I seem to have gotten the BX/veneer type of jump to work.

Great! The code you pushed here looks a lot better now.

> Is it acceptable to do this far jump always?

Well... if you could get the following jump version working IMO that would be better (it's half the size):

push {r0, lr}
bl <dest>
pop {r0, pc}

If that works I'd be happy to have that be used in all cases.
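For reference, the three instructions above can be encoded in Python roughly as follows. This is an illustrative sketch, not the code from this PR or from mpy_ld.py. The function names are hypothetical, and `dest` is assumed to be an even byte offset measured from the start of the sequence.

```python
import struct


def encode_push_bl_pop(dest):
    """Encode `push {r0, lr}; bl dest; pop {r0, pc}` placed at offset 0."""
    # The bl sits at offset 2, and Thumb reads PC as instruction address + 4.
    off = dest - (2 + 4)
    assert off % 2 == 0, "Thumb branch targets are halfword aligned"
    assert -(1 << 22) <= off < (1 << 22), "outside the bl range"
    return struct.pack(
        "<HHHH",
        0xB501,                          # push {r0, lr}
        0xF000 | ((off >> 12) & 0x7FF),  # bl, first half: upper offset bits
        0xF800 | ((off >> 1) & 0x7FF),   # bl, second half: lower offset bits
        0xBD01,                          # pop {r0, pc}
    )


def decode_bl_target(code):
    """Recover the bl destination from the bytes above, as a cross-check."""
    _, hi, lo, _ = struct.unpack("<HHHH", code)
    upper = hi & 0x7FF
    if upper & 0x400:  # sign-extend the 11-bit upper field
        upper -= 0x800
    return (2 + 4) + (upper << 12) + ((lo & 0x7FF) << 1)
```

Encoding and then decoding a destination should give the same value back, which is a cheap check to run on the host before trying the bytes on hardware.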

vshymanskyy · 2024-09-06T15:25:40Z

Looks like this is required for WebAssembly .wasm to .mpy conversion: https://github.com/vshymanskyy/wasm2mpy

dpgeorge · 2024-09-08T13:37:18Z

@jonnor do you want to have a go at reducing the size of the jump, based on my comment #12241 (comment) ?

If not, I can do it (and you could test it).

jonnor · 2024-09-08T16:15:51Z

@dpgeorge I would love some help with this. It is quite far outside my area of expertise, and longer stretches of focus time for this kind of work will be rare in the upcoming weeks/months. But I should be able to test it :)

dpgeorge · 2024-09-09T00:16:55Z

See #15812 for an alternative to this PR that emits 8 bytes for the jump.

jonnor · 2024-09-14T14:06:54Z

Superseded by the other PR linked by @dpgeorge.

dpgeorge added the tools Relates to tools/ directory in source, or other tooling label Sep 1, 2023

jonnor force-pushed the fix-large-thumb-jump branch from f3191bc to 1962597 Compare July 14, 2024 23:38

dpgeorge mentioned this pull request Aug 30, 2024

tools/mpy_ld.py: Support long entry jumps for armv6m. #15749

Closed

dpgeorge added this to the release-1.24.0 milestone Sep 4, 2024

dpgeorge mentioned this pull request Sep 9, 2024

tools/mpy_ld.py: Support jumping more than 2k on armv6m architectures. #15812

Merged

jonnor closed this Sep 14, 2024
