agatti commented Nov 11, 2025

Summary

This PR adds support for Zcmp opcodes to the native and viper emitters. Inline assembler support is not included yet, as I wanted to get this out the door first.

Zcmp opcodes (supported by the RP2350 in RV32 mode and by the ESP32P4) add multi-register PUSH/POP instructions, similar to the ones found in Arm and Thumb, noticeably reducing the size of every generated function.

For reference, without Zcmp opcodes the prologue and epilogue for an empty function (def f(): pass) take 34 bytes:

       0: fd010113      addi    sp, sp, -0x30
       4: c006          c.swsp  ra, 0x0
       6: c222          c.swsp  s0, 0x4
       8: c426          c.swsp  s1, 0x8
       a: c64e          c.swsp  s3, 0xc
       c: c852          c.swsp  s4, 0x10
       e: ca56          c.swsp  s5, 0x14

          // Rest of the function here

      10: 4082          c.lwsp  ra, 0x0
      12: 4412          c.lwsp  s0, 0x4
      14: 44a2          c.lwsp  s1, 0x8
      16: 49b2          c.lwsp  s3, 0xc
      18: 4a42          c.lwsp  s4, 0x10
      1a: 4ad2          c.lwsp  s5, 0x14
      1c: 03010113      addi    sp, sp, 0x30
      20: 8082          c.jr    ra

For comparison, with Zcmp opcodes the same prologue and epilogue shrink to just 8 bytes:

       0: bf82          cm.push   {ra, s0-s11}, -64
       2: 1151          c.addi    sp, -0xc

          // Rest of the function here
       
       4: 0131          c.addi    sp, 0xc
       6: bef2          cm.popret {ra, s0-s11}, 64
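
Both byte counts can be double-checked from the encodings in the two listings. Under the RISC-V base ISA length encoding, a 16-bit parcel whose two lowest bits are not 0b11 is a compressed (2-byte) instruction, while 0b11 with bits [4:2] not equal to 0b111 marks a 4-byte one. The helper below is an illustrative sketch (not part of this PR) that applies that rule to the prologue and epilogue encodings copied from the listings:

```python
def insn_length(encoding: int) -> int:
    """Length in bytes of a RISC-V instruction, for 16- and 32-bit formats only."""
    if (encoding & 0b11) != 0b11:
        return 2  # compressed instruction (C/Zc* extensions)
    if ((encoding >> 2) & 0b111) != 0b111:
        return 4  # standard 32-bit instruction
    raise ValueError(f"longer instruction formats not handled: {encoding:#x}")


# Encodings copied from the two listings (prologue first, then epilogue).
WITHOUT_ZCMP = [
    0xFD010113, 0xC006, 0xC222, 0xC426, 0xC64E, 0xC852, 0xCA56,
    0x4082, 0x4412, 0x44A2, 0x49B2, 0x4A42, 0x4AD2, 0x03010113, 0x8082,
]
WITH_ZCMP = [0xBF82, 0x1151, 0x0131, 0xBEF2]


def total_size(encodings):
    """Sum of instruction lengths, i.e. the bytes taken by the sequence."""
    return sum(insn_length(e) for e in encodings)


print(total_size(WITHOUT_ZCMP), total_size(WITH_ZCMP))  # 34 8
```

The difference, 26 bytes for this particular function, is in line with the "up to 30 bytes" saving mentioned in the emitter commit message.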

Given that the RP2350 supports this RV32 extension natively, it is enabled by default for that port. The ESP32P4 port will need to enable it as well once it is ready.

As an added bonus, the RP2 port now has the correct mpy-cross flags set for each supported variant (-march=armv6m for the 2040, -march=armv7m for the 2350 in Arm mode, and -march=rv32imc -march-flags=zba,zcmp for the 2350 in RV32 mode). There are also a couple of minor changes that will prove more useful once inline assembler support is added for this extension.

Testing

Since current QEMU versions do not support Zcmp opcodes yet, this has to be tested on device. I've run the test suite locally on an RP2350 in RV32 mode with the --via-mpy --emit native command line options. In addition, Octoprobe run 368 didn't report any errors.

Incidentally, I've also run this under a modified version of QEMU with added support for cm.push and cm.popret, but those results shouldn't really be trusted, as QEMU was patched just enough to get this working.

Trade-offs and Alternatives

With my (probably too old) Core-V Embecosm toolchain, this adds around 500 bytes to the final binary. I'm not sure why the RV32 compiler isn't able to figure out that the non-Zcmp code path isn't needed and can be left out. Arm toolchains do not seem to have this problem, so maybe my setup is at fault here.

The size check should probably be done in a different environment; for example, the final binary size I get doesn't even match the one reported by the Octoprobe compilation run.

This commit extends the test runner to automatically discover inline
assembler tests for known RV32 extensions, and checks whether to add the
discovered tests to the enabled tests list.

Automatic discovery requires that inline assembler tests for RV32
extensions follow a specific pattern both for filenames and for the
tests' output in case of success.  A valid RV32 extension test must
have:

  * A code fragment that checks for support of the extension on the
    running target in "/tests/feature_check", called
    "inlineasm_rv32_<extensionname>.py" that should print the string
    "rv32_<extensionname>" if the extension is supported
  * A matching expected result file in "/tests/feature_check" called
    "inlineasm_rv32_<extensionname>.py.exp" that must contain the string
    "rv32_<extensionname>" (without quotes)
  * A regular MicroPython test file in "/tests/inlineasm/rv32" called
    "asm_ext_<extensionname>.py"

For example, to test the Zba extension, there must be a file called
"/tests/feature_check/inlineasm_rv32_zba.py" that prints the string
"rv32_zba" if the extension is supported, a file called
"/tests/feature_check/inlineasm_rv32_zba.py.exp" that contains the
string "rv32_zba", and finally a regular MicroPython test file called
"/tests/inlineasm/rv32/asm_ext_zba.py".

Signed-off-by: Alessandro Gatti <[email protected]>
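
The discovery rule described in this commit can be sketched in Python as follows. This is an illustrative reimplementation based only on the naming convention above; the function names are hypothetical and the actual test runner code may be structured differently.

```python
import os
import re

# Naming convention from the commit message:
#   feature_check/inlineasm_rv32_<ext>.py      prints "rv32_<ext>" if supported
#   feature_check/inlineasm_rv32_<ext>.py.exp  contains "rv32_<ext>"
#   inlineasm/rv32/asm_ext_<ext>.py            the actual test
FEATURE_CHECK_RE = re.compile(r"^inlineasm_rv32_([a-z0-9]+)\.py$")


def discover_rv32_extension_tests(tests_dir):
    """Return {extension: test_path} for every extension with all three files present."""
    feature_dir = os.path.join(tests_dir, "feature_check")
    found = {}
    for name in sorted(os.listdir(feature_dir)):
        match = FEATURE_CHECK_RE.match(name)
        if match is None:
            continue  # not an RV32 feature check (the anchored regex also skips .exp files)
        ext = match.group(1)
        exp_path = os.path.join(feature_dir, name + ".exp")
        test_path = os.path.join(tests_dir, "inlineasm", "rv32", "asm_ext_" + ext + ".py")
        if os.path.isfile(exp_path) and os.path.isfile(test_path):
            found[ext] = test_path
    return found


def extension_supported(ext, feature_check_output):
    """Enable a discovered test only if the target's feature check printed "rv32_<ext>"."""
    return feature_check_output.strip() == "rv32_" + ext
```

Here an extension with a feature check but no matching test file is silently ignored rather than reported; whether the real runner warns in that case is not stated in the commit.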

This commit introduces a new optional makefile variable to let the build
system know that, when running code, a custom QEMU binary must be used
instead of the one found in the system's PATH.

Given that the CI machine won't keep up with QEMU updates unless its
base image tracks a newer version of QEMU itself, it is sometimes
necessary to use a custom QEMU build to test new code in an emulated
context, rather than having to perform on-device testing during
development.

Signed-off-by: Alessandro Gatti <[email protected]>
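
The selection logic the new variable enables boils down to "an explicit custom binary wins over the PATH lookup". A minimal Python sketch of that rule follows; the real change lives in the port's makefile, whose variable name is not given here, so the parameter names below are hypothetical.

```python
import shutil


def resolve_qemu_binary(custom_binary=None, default_name="qemu-system-riscv32"):
    """Pick the QEMU executable used to run tests.

    custom_binary stands in for the new optional makefile variable: when it
    is set, it is used as-is; otherwise the default name is looked up in PATH.
    """
    if custom_binary:
        return custom_binary
    found = shutil.which(default_name)
    if found is None:
        raise FileNotFoundError(default_name + " was not found in PATH")
    return found
```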

codecov bot commented Nov 11, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.38%. Comparing base (2762fe6) to head (5f1a4e5).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #18399   +/-   ##
=======================================
  Coverage   98.38%   98.38%           
=======================================
  Files         171      171           
  Lines       22294    22294           
=======================================
  Hits        21933    21933           
  Misses        361      361           

github-actions bot commented Nov 11, 2025

Code size report:

Reference:  esp32/boards: Add Silicognition ManT1S board definition. [27544a2]
Comparison: rp2/CMakeLists.txt: Set the appropriate mpy-cross flags on all targets. [merge of e2529bc]
  mpy-cross:  +465 +0.123% 
   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:    +0 +0.000% standard
      stm32:    +0 +0.000% PYBV10
     mimxrt:    +0 +0.000% TEENSY40
        rp2:    +0 +0.000% RPI_PICO_W
       samd:    +0 +0.000% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:    +0 +0.000% VIRT_RV32

This commit performs the necessary changes to handle an additional RV32
CPU extension flag, for the Zcmp extension in this case.

The changes are not limited to RV32-only code, as other parts of the
tooling need to be modified for this: the testing framework has to be
made aware that an extra bit can be set in sys.implementation._mpy and
needs to know what that bit is called, and "mpy-cross" must be able to
actually set that flag bit in the first place via the appropriate
command line argument.

Signed-off-by: Alessandro Gatti <[email protected]>
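
The bookkeeping described above, turning extension names such as those passed via -march-flags=zba,zcmp into flag bits and naming the bits found set, can be sketched in Python. The bit positions below are hypothetical placeholders chosen for illustration; the real layout is defined in MicroPython's sources and is not reproduced here.

```python
# Hypothetical bit assignments, for illustration only.
RV32_EXTENSION_BITS = {"zba": 1 << 0, "zcmp": 1 << 1}


def parse_march_flags(value):
    """Turn a comma-separated extension list (e.g. "zba,zcmp") into a bitmask."""
    mask = 0
    for name in value.split(","):
        name = name.strip().lower()
        if not name:
            continue  # tolerate stray commas such as "zba,"
        try:
            mask |= RV32_EXTENSION_BITS[name]
        except KeyError:
            raise ValueError("unknown RV32 extension: " + name) from None
    return mask


def describe_flags(mask):
    """Inverse mapping: the extension names whose bits are set in mask."""
    return [name for name, bit in RV32_EXTENSION_BITS.items() if mask & bit]
```

Keeping a single name-to-bit table shared by both directions is what lets the compiler side and the test side agree on what each bit is called.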

dpgeorge added the py-core label (Relates to py/ directory in source) on Nov 13, 2025

This commit introduces the possibility of using Zcmp opcodes when
generating function prologues and epilogues, reducing the generated code
size.

With the addition of selected Zcmp opcodes, each generated function can
be up to 30 bytes shorter and have a faster prologue and epilogue.  If
Zcmp opcodes can be used then register saving is a matter of a single
CM.PUSH opcode rather than a series of C.SWSP opcodes.  Conversely,
register restoring is a single CM.POPRET opcode instead of a series of
C.LWSP opcodes followed by a C.JR RA opcode.  This should also lead to
faster code, given that a single opcode performs all the register
saving rather than a series of them.

For functions that allocate fewer than three locals, the generated
code will allocate up to 12 bytes of unused stack space.  Whilst this is
relatively rare for generated native and viper code, inline assembler
blocks will probably incur this penalty.  Still, considering that at
the moment the only targets that support Zcmp opcodes are relatively
high-end MCUs (the RP2350 in RV32 mode and the ESP32P4), this is
probably not much of an issue.

Signed-off-by: Alessandro Gatti <[email protected]>
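
One plausible reading of the 12-byte figure comes from how cm.push sizes its frame. Per the Zcmp specification, the only encodable register lists are {ra} or {ra, s0-sN} (with {ra, s0-s10} missing), and on RV32 the base stack adjustment is the size of the saved list rounded up to a multiple of 16 bytes. Pushing {ra, s0-s11} (13 registers, 52 bytes) therefore reserves 64 bytes, three words more than strictly needed. The Python helpers below are illustrative only and are not the emitter's code:

```python
def zcmp_saved_registers(highest_s=None):
    """Registers saved by cm.push/cm.popret on RV32 when s0..s<highest_s> are needed.

    The encodable lists are {ra} or {ra, s0-sN}; {ra, s0-s10} does not exist,
    so needing s10 means saving s11 as well.
    """
    if highest_s is None:
        return 1  # just ra
    if not 0 <= highest_s <= 11:
        raise ValueError("callee-saved register index must be in 0..11")
    if highest_s == 10:
        highest_s = 11
    return highest_s + 2  # ra plus s0..s<highest_s>


def zcmp_stack_adj_base(num_regs):
    """Bytes cm.push reserves on RV32: 4 bytes per register, rounded up to 16."""
    return (num_regs * 4 + 15) // 16 * 16


regs = zcmp_saved_registers(11)      # {ra, s0-s11}, as in the PR description listing
frame = zcmp_stack_adj_base(regs)    # matches "cm.push {ra, s0-s11}, -64"
print(regs, frame, frame - regs * 4)  # 13 64 12
```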

This commit enables support for Zcmp opcodes when the firmware is built
for the RP2350 in RV32 mode.

The RP2350 explicitly supports the Zcmp extension for reducing the
amount of code needed for function prologues and epilogues (see section
3.8.1.20 of the datasheet).

Signed-off-by: Alessandro Gatti <[email protected]>

This commit lets the RP2 port build system pass the appropriate flags
to "mpy-cross" when building frozen MPY files as part of the build
process.

Now all possible variants (RP2040, RP2350/Arm, and RP2350/RV32) have
their correct flags assigned, falling back to the RP2040 flag set if
a new variant is introduced.  Before these changes all variants would
use the RP2040 set of flags, which may be a bit of an issue when
building code for the RP2350 in RV32 mode.

Signed-off-by: Alessandro Gatti <[email protected]>
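
The per-variant selection with its RP2040 fallback can be sketched as a lookup table. The flags are the ones listed in the PR description; the variant keys are hypothetical labels, since the real selection is done in rp2/CMakeLists.txt.

```python
# Flags from the PR description; the dictionary keys are hypothetical labels.
MPY_CROSS_FLAGS = {
    "rp2040": ["-march=armv6m"],
    "rp2350-arm": ["-march=armv7m"],
    "rp2350-riscv": ["-march=rv32imc", "-march-flags=zba,zcmp"],
}


def mpy_cross_flags(variant):
    """Flags for a build variant, falling back to the RP2040 set for unknown variants."""
    return list(MPY_CROSS_FLAGS.get(variant, MPY_CROSS_FLAGS["rp2040"]))
```

Falling back to the RP2040 set is the conservative choice: ARMv6-M code runs on the Arm cores of both chips, so an unrecognised Arm-based variant still gets usable frozen code.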