py/emitnative: Optimise register clearing. #18363
Open
+22
−15
Summary
This PR introduces a new generic ASM API function to clear a register (i.e. set all of the register's bits to zero).
The native emitter used to perform a XOR operation to clear a given register, but different platforms have more optimised methods to achieve the same result that take up less space, either in the generated code or in the code generator itself.
Arm, RV32, X86, and X64 already had an optimised generator and already produced optimised code. When built for Thumb, the code generator takes less space if it emits a constant immediate move rather than a XOR operation, even though both operations distill down to a single narrow opcode. On Xtensa the situation is almost the same as on Thumb, with the exception that a constant immediate move takes one byte less than a XOR operation.
Thumb builds should shrink down by a small amount (QEMU/MPS2_AN385 is 24 bytes smaller, for example), whilst the ESP8266 port should be 16 bytes smaller. QEMU/VIRT_RV32 and ESP32_GENERIC builds should show no size changes.
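As a rough illustration of the Thumb case described above, here is a minimal, self-contained sketch. It is not the code from this PR: `toy_thumb_t`, `toy_eors`, `toy_movs_i8`, and `TOY_CLR_REG` are hypothetical names made up for this example, and only the two 16-bit Thumb encodings (`EORS Rdn, Rm` and `MOVS Rd, #imm8`) are taken from the architecture itself. It shows that both ways of clearing a low register produce one narrow opcode; it does not measure the size of the generator.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical toy emitter, for illustration only (not MicroPython's assembler state).
typedef struct {
    uint16_t code[8]; // emitted 16-bit Thumb opcodes
    size_t len;       // number of opcodes emitted so far
} toy_thumb_t;

static void toy_emit16(toy_thumb_t *as, uint16_t op) {
    as->code[as->len++] = op;
}

// EORS Rdn, Rm (16-bit Thumb encoding): 0100 0000 01 Rm Rdn, low registers r0-r7 only.
static void toy_eors(toy_thumb_t *as, unsigned rdn, unsigned rm) {
    toy_emit16(as, (uint16_t)(0x4040u | (rm << 3) | rdn));
}

// MOVS Rd, #imm8 (16-bit Thumb encoding): 0010 0 Rd imm8, low registers r0-r7 only.
static void toy_movs_i8(toy_thumb_t *as, unsigned rd, unsigned imm8) {
    toy_emit16(as, (uint16_t)(0x2000u | (rd << 8) | imm8));
}

// Hypothetical generic "clear register" hook: on this toy Thumb target it
// maps to an immediate move of zero instead of a XOR of the register with itself.
#define TOY_CLR_REG(as, reg) toy_movs_i8((as), (reg), 0)
```

With this sketch, `toy_eors(&as, 3, 3)` emits `0x405b` (`eors r3, r3`) and `TOY_CLR_REG(&as, 3)` emits `0x2300` (`movs r3, #0`). Both are a single halfword, so under these assumptions any saving on Thumb would come from the generator, not from the generated code.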
Testing
The full test suite passed locally on QEMU for SABRELITE, MPS2_AN385, and VIRT_RV32. On my repo's branch, CI passes for Unix/x86 and Unix/x64. Xtensa and Xtensawin pass the test suite without additional failures (see Octoprobe run 359).