py/asmrv32: Add support for Zcmp opcodes in generated code. #18399
Open: agatti wants to merge 6 commits into micropython:master from agatti:rv32-zcmp
+156
−26
This commit extends the test runner to automatically discover inline
assembler tests for known RV32 extensions, and checks whether to add the
discovered tests to the enabled tests list.
Automatic discovery requires that inline assembler tests for RV32
extensions follow a specific pattern both for filenames and for the
tests' output in case of success. A valid RV32 extension test must
have:
* A code fragment that checks for support of the extension on the
running target in "/tests/feature_check", called
"inlineasm_rv32_<extensionname>.py" that should print the string
"rv32_<extensionname>" if the extension is supported
* A matching expected result file in "/tests/feature_check" called
"inlineasm_rv32_<extensionname>.py.exp" that must contain the string
"rv32_<extensionname>" (without quotes)
* A regular MicroPython test file in "/tests/inlineasm/rv32" called
"asm_ext_<extensionname>.py"
For example, to test the Zba extension, there must be a file called
"/tests/feature_check/inlineasm_rv32_zba.py" that should print the
string "rv32_zba" if the extension is supported, together with a file
called "/tests/feature_check/inlineasm_rv32_zba.py.exp" that contains the
string "rv32_zba" in it, and finally there must be a regular MicroPython
test file called "/tests/inlineasm/rv32/asm_ext_zba.py".
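The naming convention above can be sketched in a few lines of Python. This is only an illustration of the pattern described in the commit message, not the test runner's actual code: the helper names (`expected_files`, `expected_marker`, `discover_extensions`) are hypothetical, and paths are written relative to the repository root.

```python
from pathlib import PurePosixPath

# Directories named in the commit message, relative to the repository root.
FEATURE_CHECK_DIR = PurePosixPath("tests/feature_check")
INLINEASM_RV32_DIR = PurePosixPath("tests/inlineasm/rv32")


def expected_files(extension):
    """Return the three files the convention requires for one extension."""
    name = extension.lower()
    check = FEATURE_CHECK_DIR / f"inlineasm_rv32_{name}.py"
    return {
        "feature_check": check,
        "feature_check_exp": check.with_name(check.name + ".exp"),
        "test": INLINEASM_RV32_DIR / f"asm_ext_{name}.py",
    }


def expected_marker(extension):
    """String the feature check prints when the extension is supported."""
    return f"rv32_{extension.lower()}"


def discover_extensions(filenames):
    """Extract extension names from a listing of the RV32 inlineasm test directory."""
    prefix, suffix = "asm_ext_", ".py"
    return sorted(
        f[len(prefix):-len(suffix)]
        for f in filenames
        if f.startswith(prefix)
        and f.endswith(suffix)
        and len(f) > len(prefix) + len(suffix)
    )
```

For the Zba example, `expected_files("zba")` yields the same three paths listed in the commit message, and `expected_marker("zba")` gives the string the feature check is expected to print.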
Signed-off-by: Alessandro Gatti <[email protected]>
This commit introduces a new optional makefile variable that tells the build system to use a custom QEMU binary when running code, instead of the one found in the system's PATH. The CI machine won't keep up with QEMU updates unless its base image tracks a new version of QEMU itself, so it is sometimes necessary to use a custom QEMU build to test new code in an emulated context rather than having to perform on-device testing during development.
Signed-off-by: Alessandro Gatti <[email protected]>
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@           Coverage Diff           @@
##           master   #18399   +/-   ##
=======================================
  Coverage   98.38%   98.38%
=======================================
  Files         171      171
  Lines       22294    22294
=======================================
  Hits        21933    21933
  Misses        361      361
This commit makes the changes needed to handle an additional RV32 CPU extension flag, in this case for the Zcmp extension. The changes are not limited to RV32-only code, as other parts of the tooling need to be modified too: the testing framework has to be made aware that an extra bit can be set in sys.implementation._mpy and needs to know what that bit is called, and "mpy-cross" must be able to set that flag bit in the first place via the appropriate command line argument.
Signed-off-by: Alessandro Gatti <[email protected]>
This commit introduces the possibility of using Zcmp opcodes when generating function prologues and epilogues, reducing the generated code size. With the addition of selected Zcmp opcodes, each generated function can be up to 30 bytes shorter and have a faster prologue and epilogue. If Zcmp opcodes can be used, register saving is a single CM.PUSH opcode rather than a series of C.SWSP opcodes. Likewise, register restoring is a single CM.POPRET opcode instead of a series of C.LWSP opcodes followed by a C.JR RA opcode. This should also lead to faster code, given that a single opcode saves the registers rather than a series of them.
For functions that allocate fewer than three locals, the generated code will allocate up to 12 bytes of unused stack space. Whilst this is a relatively rare occurrence for generated native and viper code, inline assembler blocks will probably incur this penalty. Still, considering that at the moment the only targets that support Zcmp opcodes are relatively high-end MCUs (the RP2350 in RV32 mode and the ESP32P4), this is probably not much of an issue.
Signed-off-by: Alessandro Gatti <[email protected]>
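The size figures quoted in this PR can be cross-checked with a small back-of-the-envelope model. The sketch below is not taken from the emitter; it assumes that every opcode involved is a 16-bit compressed one, that the stack pointer adjustment takes a single opcode on each side, and that seven registers are saved. The register count is not stated anywhere in the PR; it is inferred here from the 34-byte figure given for an empty function.

```python
COMPRESSED_OPCODE_BYTES = 2  # RVC and Zcmp opcodes are all 16 bits wide


def prologue_epilogue_bytes(saved_registers, use_zcmp):
    """Estimate the combined prologue and epilogue size in bytes."""
    if use_zcmp:
        # One CM.PUSH in the prologue, one CM.POPRET in the epilogue.
        opcodes = 2
    else:
        # Prologue: one stack adjustment plus one C.SWSP per saved register.
        prologue = 1 + saved_registers
        # Epilogue: one C.LWSP per saved register, one stack adjustment,
        # and the final C.JR RA.
        epilogue = saved_registers + 1 + 1
        opcodes = prologue + epilogue
    return opcodes * COMPRESSED_OPCODE_BYTES


# ASSUMPTION: seven saved registers, inferred from the 34-byte figure.
without_zcmp = prologue_epilogue_bytes(7, use_zcmp=False)
with_zcmp = prologue_epilogue_bytes(7, use_zcmp=True)
savings = without_zcmp - with_zcmp
```

Under these assumptions the model gives 34 bytes without Zcmp and 4 bytes with it, a difference of 30 bytes, which agrees with both the "34 bytes" and the "up to 30 bytes shorter" figures in this PR.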
This commit enables support for Zcmp opcodes when the firmware is built for the RP2350 in RV32 mode. The RP2350 explicitly supports the Zcmp extension for reducing the amount of code needed for function prologues and epilogues (see section 3.8.1.20 of the datasheet).
Signed-off-by: Alessandro Gatti <[email protected]>
This commit lets the RP2 port build system pass the appropriate flags to "mpy-cross" when building frozen MPY files as part of the build process. All current variants (RP2040, RP2350/Arm, and RP2350/RV32) now have the right flags assigned, falling back to the RP2040 flag set if a new variant is introduced. Before these changes all variants used the RP2040 flag set, which may be an issue when building code for the RP2350 in RV32 mode.
Signed-off-by: Alessandro Gatti <[email protected]>
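The selection logic described in this commit amounts to a lookup with a fallback. The Python sketch below only illustrates that idea; the real change lives in the RP2 port's build files, and the variant keys and function name used here are hypothetical. The flag values are the ones quoted in the PR summary.

```python
# Flag values as quoted in the PR summary; the variant keys are hypothetical.
MPY_CROSS_FLAGS = {
    "rp2040": ["-march=armv6m"],
    "rp2350-arm": ["-march=armv7m"],
    "rp2350-riscv": ["-march=rv32imc", "-march-flags=zba,zcmp"],
}


def mpy_cross_flags(variant):
    """Return the mpy-cross flags for a variant, defaulting to the RP2040 set."""
    return list(MPY_CROSS_FLAGS.get(variant, MPY_CROSS_FLAGS["rp2040"]))
```

An unknown variant, for example `mpy_cross_flags("some-new-chip")`, gets the RP2040 flags, mirroring the fallback behaviour the commit message describes.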
Summary
This PR adds support for Zcmp opcodes to the native and viper emitters. Inline assembler support is not yet present in this PR as I want to get this out of the door first.
Zcmp opcodes (supported by the RP2350 in RV32 mode and by the ESP32P4) add multi-register push/pop opcodes, similar to those available on Arm and Thumb, which noticeably reduce the footprint of every generated function.
As a reference, the prologue and epilogue without Zcmp opcodes for an empty function (def f(): pass) would take 34 bytes:
For comparison, the same prologue and epilogue with Zcmp opcodes becomes this:
Given that the RP2350 supports this RV32 extension natively, it is enabled by default for that port. The ESP32P4 port will need to enable it as well once it is ready.
As an added bonus, the RP2 port now has the correct mpy-cross flags set for each supported variant (-march=armv6m for the 2040, -march=armv7m for the 2350 in Arm mode, and -march=rv32imc -march-flags=zba,zcmp for the 2350 in RV32 mode). There are also a couple of minor changes, which will prove more useful once inline assembler support is added for this extension.
Testing
Since current QEMU versions do not yet support Zcmp opcodes, this has to be tested on device. I've run the test suite locally on an RP2350 in RV32 mode with the --via-mpy --emit native command line options. In addition, Octoprobe run 368 didn't report any errors.
Incidentally, I've also run this under a modified version of QEMU that adds support for cm.push and cm.popret, but that's not really to be trusted as it was patched just to get this working.
Trade-offs and Alternatives
With my (probably too old) Core-V embecosm toolchain this adds around 500 bytes to the final binary. I'm not sure why the RV32 compiler isn't able to figure out that the non-Zcmp code path shouldn't be brought in. Arm toolchains do not seem to have this problem, so maybe it's my setup at fault here.
The size check should probably be done in a different environment, as the final binary size I get doesn't even match what is reported by the Octoprobe compilation run, for example.