mpy_ld.py: Support modules larger than 4KiB on armv6m #12241
Besides testing on device, I wrote an automated test. However, I am not sure where it should be placed (or if you want such tests), as I could not find any existing unit tests for functions in mpy_ld.py. Here is the code for the test:

```python
import struct

from tools.mpy_ld import asm_jump_thumb


def play_thumb_bl(instruction, PC=0):
    """Compute the effect of an ARM Thumb "BL" instruction on the program counter (PC)

    Based on explanations and pseudocode from
    https://stackoverflow.com/a/70756436/
    """
    bl0, bl1 = struct.unpack('<HH', instruction)

    # check that this is a valid BL instruction
    # Encoding: 1111 HOOO OOOO OOOO
    code0 = (bl0 & 0xF800) >> 11
    assert code0 == 0b11110  # H=0
    code1 = (bl1 & 0xF800) >> 11
    assert code1 == 0b11111  # H=1

    # always forward 1 instruction
    # An offset of 0 means next instruction
    PC += 4

    # effect of first instruction
    # (offset treated as unsigned: enough for the forward jumps tested below)
    offset = bl0 & 0x7FF
    LR = PC + (offset << 12)

    # effect of second instruction
    offset = bl1 & 0x7FF
    PC = LR + (offset << 1)
    return PC


def test_asm_jump_thumb():
    start = 0x1000  # BL only used for more than 11 bit offset
    for offset in range(start, 0xFFFF, 4):
        instruction = asm_jump_thumb(offset)
        pc_change = play_thumb_bl(instruction)
        assert pc_change == offset


test_asm_jump_thumb()
```
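The `play_thumb_bl` helper above reads the two halfwords with the older ARMv4T "H=0/H=1" view and treats the offset as unsigned, which is enough for the forward offsets the test covers. For reference, here is a minimal sketch of the full ARMv6-M `BL` (encoding T1) layout, including the `S`, `J1` and `J2` bits that carry the sign and the top of the 25-bit offset. The names `encode_bl_t1`/`decode_bl_t1` are hypothetical and not part of `mpy_ld.py`; offsets are relative to the BL's PC value (instruction address + 4), as in the test. For positive offsets below 4 MiB, `S = 0` and `J1 = J2 = 1`, so the halfwords carry the same `0b11110`/`0b11111` prefixes the test checks.

```python
import struct


def encode_bl_t1(offset):
    """Encode a Thumb BL (T1) whose target is PC + offset, with PC = address + 4.

    Hypothetical helper, not part of mpy_ld.py.
    """
    if offset & 1 or not -(1 << 24) <= offset < (1 << 24):
        raise ValueError("BL offset must be even and within +/-16 MiB: %d" % offset)
    imm = offset & 0x1FFFFFF  # 25-bit two's complement: S:I1:I2:imm10:imm11:'0'
    s = (imm >> 24) & 1
    i1 = (imm >> 23) & 1
    i2 = (imm >> 22) & 1
    j1 = 1 - (i1 ^ s)  # J1 = NOT(I1 XOR S)
    j2 = 1 - (i2 ^ s)  # J2 = NOT(I2 XOR S)
    hw0 = 0xF000 | (s << 10) | ((imm >> 12) & 0x3FF)
    hw1 = 0xD000 | (j1 << 13) | (j2 << 11) | ((imm >> 1) & 0x7FF)
    return struct.pack("<HH", hw0, hw1)


def decode_bl_t1(instruction):
    """Return the signed offset (relative to address + 4) encoded in a BL."""
    hw0, hw1 = struct.unpack("<HH", instruction)
    assert (hw0 >> 11) == 0b11110 and (hw1 & 0xD000) == 0xD000, "not a BL"
    s = (hw0 >> 10) & 1
    i1 = 1 - (((hw1 >> 13) & 1) ^ s)  # I1 = NOT(J1 XOR S)
    i2 = 1 - (((hw1 >> 11) & 1) ^ s)  # I2 = NOT(J2 XOR S)
    imm = (s << 24) | (i1 << 23) | (i2 << 22) | ((hw0 & 0x3FF) << 12) | ((hw1 & 0x7FF) << 1)
    return imm - (1 << 25) if s else imm  # sign-extend from 25 bits


# Round trip over backward and forward offsets, including the 4 KiB+ range.
for off in (-(1 << 24), -4, 0, 0x1000, 0x1FFA, (1 << 24) - 2):
    assert decode_bl_t1(encode_bl_t1(off)) == off
```

As a sanity check, an offset of 0 encodes to the familiar `F000 F800` ("BL to the next instruction") and an offset of -4 to `F7FF FFFE` ("BL to itself").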
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##           master   #12241   +/-   ##
=======================================
  Coverage   98.42%   98.42%
=======================================
  Files         161      161
  Lines       21251    21253    +2
=======================================
+ Hits        20917    20919    +2
  Misses        334      334
```
Code size report:
Thanks for the contribution. But I don't think this will work because it's a […] I'm not sure how this patch worked in your case?
Hi @dpgeorge, and thank you for the review. I have no idea why it would appear to work... This is my first time doing any ARM assembly. Do you have any tips as to what the correct approach would be? I don't really know where this code is invoked from, which makes it a bit hard for me to reason about. I have also not found any way to get the disassembly of the .mpy files to look at the code produced. I guess I need to bring out a debugger on an ARM Cortex-M0 to see what is going on.
Looking at the assembly output of the rp2 firmware (generated by gcc), you can find these patterns used to do a long jump:
That's 16 bytes, but I think it's the proper way to do it. You might be able to get away with doing it in fewer bytes using something like:
That's only 8 bytes. The downside is that it increases the stack usage by 8 bytes, and it may also make the code harder to debug due to the unusual stack frame. But I don't think that's a concern here. So maybe try using this 8-byte version. But note that this version should only be used if necessary. That makes it a bit tricky because you need to know the address/offset of […]
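One plausible shape for an 8-byte sequence with those properties is sketched below. This is an illustrative guess, not necessarily the version suggested above: it assumes a `push {r4, lr}` / `bl entry` / `pop {r4, pc}` layout at offset 0 of the text segment, and `asm_long_jump_thumb_8` is a hypothetical name. Pushing two registers costs exactly 8 bytes of stack and keeps the stack 8-byte aligned; `r4` is used as the filler because it is callee-saved, whereas popping into `r0` after the call would overwrite the entry function's return value.

```python
import struct


def asm_long_jump_thumb_8(entry):
    """Hypothetical 8-byte long jump for ARMv6-M Thumb (illustrative sketch).

    Layout, assuming the sequence sits at offset 0 of the text segment:
        0: push {r4, lr}   ; save return address (2 regs keep 8-byte alignment)
        2: bl   entry      ; call the real entry point, LR = address of the pop
        6: pop  {r4, pc}   ; restore r4 and return to the original caller
    """
    # BL sits at offset 2, and its PC value is its own address plus 4.
    offset = entry - 6
    # Two-halfword BL with J1 = J2 = 1; this form is valid within +/-4 MiB.
    assert offset & 1 == 0 and -(1 << 22) <= offset < (1 << 22), offset
    bl_hi = 0xF000 | ((offset >> 12) & 0x7FF)
    bl_lo = 0xF800 | ((offset >> 1) & 0x7FF)
    return struct.pack("<HHHH", 0xB510, bl_hi, bl_lo, 0xBD10)
```

For `entry = 0x2000` this produces the halfwords `0xB510` (`push {r4, lr}`), `0xF001 0xFFFD` (the BL) and `0xBD10` (`pop {r4, pc}`). The entry function returns into the `pop`, which then returns to the MicroPython runtime through the saved LR.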
Thanks a lot @dpgeorge, I will try something like that sometime in the next few weeks. Regarding only doing it when necessary, this is already inside a check for how large the offset is. So that should be sufficient?
Yes, but the issue is that the […]
I see. In that case, I propose always doing the longer jump. It is only a few bytes more than the short jump, and this code exists once per module and is called only once (at import time).
I tried doing the jump with the push and pop (the 8-byte version), but my first attempts just ended up with things hanging. I will order a hardware debugger so I can look at this properly. Where is this jump performed when actually loading the .mpy file? I need to find a decent starting point for my debugging... Any guidance here would be much appreciated!
Maybe try doing the 16-byte version to start with, because we know that works.
When the .mpy is imported (or any .py file is imported), the top-level module code is executed. That's the point where the native code is run, and it just executes the first instruction in the text segment. That first instruction is this jump, which jumps to […]. The transfer of control from the MicroPython runtime to the native code (the jump) happens via:
Using a veneer with the BX instruction. All other architectures seem to support larger modules already.

Fixes exceptions such as:

```
File "micropython/tools/mpy_ld.py", line 904, in build_mpy
    jump = env.arch.asm_jump(entry_offset)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "micropython/tools/mpy_ld.py", line 92, in asm_jump_thumb
    assert b_off >> 11 == 0 or b_off >> 11 == -1, b_off
           ^^^^^^^^^^^^^^^^^
AssertionError: 6153
```

Signed-off-by: Jon Nordby <[email protected]>
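The assertion in the traceback guards the 16-bit unconditional branch (`B`, encoding T2), whose 11-bit halfword offset only reaches about ±2 KiB from PC. The sketch below applies the same range check as the traceback, which shows why an entry point a few KiB into the text segment fails. `encode_thumb_b` is a hypothetical name for illustration, not the `mpy_ld.py` function itself.

```python
import struct


def encode_thumb_b(entry):
    """Hypothetical sketch: 16-bit Thumb B (T2) at offset 0, jumping to entry."""
    b_off = entry - 4  # PC reads as the instruction address + 4
    # Same range check as in the traceback: b_off must fit in 12 signed bits.
    if not (b_off >> 11 == 0 or b_off >> 11 == -1):
        raise ValueError("B (T2) cannot reach offset %d" % b_off)
    # 11100 imm11, where imm11 is the offset in halfwords
    return struct.pack("<H", 0xE000 | ((b_off >> 1) & 0x7FF))


print(encode_thumb_b(0x400).hex())  # short jump: encodes fine
try:
    encode_thumb_b(0x1800)  # about 6 KiB away: out of range
except ValueError as exc:
    print(exc)
```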
Branch updated from f3191bc to 1962597.
@dpgeorge thanks a lot for the tips. Being able to trace the code in the debugger made this much easier to figure out. After some fiddling, I seem to have gotten the BX/veneer type of jump to work. It is like the example veneer from GCC you showed above, except that we need to add PC to our offset value to achieve a relative jump. Is it acceptable to always do this far jump? Then the change can be localized to asm_jump_thumb(). Otherwise, we will have to complicate the overall logic of mpy_ld to account for the potentially-changing (as you mentioned in a previous comment). That seems like a lot of trouble to save (up to) 12 bytes in module size... But it is up to you as maintainers to decide.
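A PC-relative BX veneer of the kind described here can be sketched as follows. This is an illustrative layout under our own assumptions (scratch register `ip`, literal at offset 12, text segment starting on a 4-byte boundary), not necessarily the code pushed in this PR, and `asm_veneer_thumb` is a hypothetical name. ARMv6-M cannot load a literal directly into `ip`, so `r0` is borrowed and restored. `add ip, pc` turns the stored relative offset into an absolute address, and the low bit of the literal keeps the core in Thumb state for `bx`.

```python
import struct


def asm_veneer_thumb(entry):
    """Hypothetical 16-byte ARMv6-M jump veneer at offset 0 of the text segment.

        0: push {r0}          ; borrow r0 (ARMv6-M LDR literal needs a low register)
        2: ldr  r0, [pc, #8]  ; Align(2 + 4, 4) + 8 = 12 -> the literal below
        4: mov  ip, r0
        6: pop  {r0}          ; restore r0 (first argument to the entry function)
        8: add  ip, pc        ; PC reads as 8 + 4 = 12 here
       10: bx   ip            ; tail-jump; LR is untouched, so entry returns to our caller
       12: .word (entry - 12) | 1   ; relative offset, low bit set for Thumb state

    Assumes the text segment starts on a 4-byte boundary so the literal is aligned.
    """
    literal = ((entry - 12) | 1) & 0xFFFFFFFF
    return struct.pack("<6HI", 0xB401, 0x4802, 0x4684, 0xBC01, 0x44FC, 0x4760, literal)
```

Unlike the 8-byte version discussed earlier in the thread, this veneer leaves `lr` untouched and nothing on the stack once it has jumped. The cost is 8 more bytes and clobbering `ip`, which the AAPCS reserves as a scratch register for exactly this kind of linker veneer.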
Great! The code you pushed here looks a lot better now.
Well... if you could get the following jump version working, IMO that would be better (it's half the size):
If that works, I'd be happy to have it used in all cases.
Looks like this is required for WebAssembly
@jonnor do you want to have a go at reducing the size of the jump, based on my comment #12241 (comment)? If not, I can do it (and you could test it).
@dpgeorge I would love some help with this. It is quite far outside my area of expertise, and longer stretches of focus time for this kind of work are rare in the upcoming weeks/months. But I should be able to test it :)
See #15812 for an alternative to this PR that emits 8 bytes for the jump. |
Superseded by the other PR linked by @dpgeorge.
I hit a problem with modules larger than 4 KiB when working on dynamic native modules for RP2040. This should fix that for ARM Thumb targets, using a strategy similar to the one used for ARM Thumb-2 targets.
I have tested it on RP2040, and it seems to work: my module gets initialized, and the defined functions can be called. I have never written Thumb assembly before, so please do sanity-check it!
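For comparison with the Thumb-2 strategy mentioned above: ARMv7-M has a 32-bit unconditional branch (`B.W`, encoding T4) with a ±16 MiB range. ARMv6-M (Cortex-M0/M0+, as in the RP2040) only provides the 16-bit `B` plus the 32-bit `BL`, so a plain wide jump is not available there. The sketch below shows the T4 bit layout. `encode_b_w_t4` is a hypothetical name, and this is not the `mpy_ld.py` code.

```python
import struct


def encode_b_w_t4(offset):
    """Hypothetical sketch: Thumb-2 B.W (T4), target = PC + offset, PC = address + 4.

    ARMv7-M and later only; ARMv6-M cores such as the Cortex-M0+ cannot execute it.
    """
    if offset & 1 or not -(1 << 24) <= offset < (1 << 24):
        raise ValueError("B.W offset must be even and within +/-16 MiB: %d" % offset)
    imm = offset & 0x1FFFFFF  # 25-bit two's complement: S:I1:I2:imm10:imm11:'0'
    s = (imm >> 24) & 1
    j1 = 1 - (((imm >> 23) & 1) ^ s)  # J1 = NOT(I1 XOR S)
    j2 = 1 - (((imm >> 22) & 1) ^ s)  # J2 = NOT(I2 XOR S)
    hw0 = 0xF000 | (s << 10) | ((imm >> 12) & 0x3FF)
    # Same layout as BL except bit 14 of the second halfword is 0 (no link).
    hw1 = 0x9000 | (j1 << 13) | (j2 << 11) | ((imm >> 1) & 0x7FF)
    return struct.pack("<HH", hw0, hw1)
```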