inlineasm: Add inline assembler support for RV32. #15714
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files (coverage diff)

| | master | #15714 |
|---|---|---|
| Coverage | 98.59% | 98.59% |
| Files | 167 | 167 |
| Lines | 21617 | 21617 |
| Hits | 21313 | 21313 |
| Misses | 304 | 304 |
Code size report:
|
34e4df7
to
a60756a
Compare
@dpgeorge I've resolved the conflict blocking this PR, but now the issue with Arm tests has gotten worse. Before the qemu-arm refactoring it'd take longer than 30 seconds, but would still end. Now it's been stuck for a couple hours already. Am I missing something in |
a60756a
to
6ebf2b6
Compare
It looks like this issue with CI is resolved now? The qemu-arm CI job now passes without issue, it seems. |
Yes, that issue was indeed resolved - however the inlineasm tests for RV32 won't be picked up until #15784 is merged (and until this PR gets rebased once more). |
3bb290e
to
afb0102
Compare
Here's the rebased version taking the merged Running tests for
|
You are right, these boards don't run the tests on master. Looking into it it seems the problems are:
This is unrelated to this PR and so out of scope. But feel free to fix it if you like 😄 |
@dpgeorge fyi, MICROBIT crashes when accessing the function table, which doesn't make much sense. Edit: unless the ROM cannot be read from RAM or some other protection mechanism like that? Adding (xr) to ROM and (xrw) to RAM in the nrf51 linkerscript didn't help anyway. Edit2: yep, attaching a debugger to an emulator that's debugging a binary tells it all. nRF51 needs some MMU magic beforehand.
|
afb0102
to
fcf780d
Compare
This should now work with the changes to the test runner made after 1.24.0. This PR also makes the test runner able to detect if inline assembler support is enabled and if so for which architecture. If so, the |
fcf780d
to
66564ad
Compare
66564ad
to
4aefb36
Compare
4aefb36
to
5ba0bd1
Compare
I'm wondering on what's the policy for new test files after the introduction of Should the test files contained in this PR be moved to that framework (as in, is it a blocking issue) or is it something that can be done at a later stage (not necessarily by myself, either)? |
5ba0bd1
to
356b522
Compare
Using unittest is only for convenience, if it makes writing the test easier. So you don't need to change anything in this PR. |
356b522
to
3026f81
Compare
3026f81
to
8020a5c
Compare
assert(offset_node != NULL && "Offset node pointer is NULL.");
assert(negative != NULL && "Negative pointer is NULL.");

if (!MP_PARSE_NODE_IS_STRUCT_KIND(node, PN_atom_expr_normal) && !MP_PARSE_NODE_IS_STRUCT_KIND(node, PN_factor_2)) {
This validation logic can be simplified, in pseudo code:
if MP_PARSE_NODE_IS_STRUCT_KIND(node, PN_factor_2):
if MP_PARSE_NODE_IS_TOKEN_KIND(node_struct->nodes[0], MP_TOKEN_OP_MINUS):
*negative = true;
elif MP_PARSE_NODE_IS_TOKEN_KIND(node_struct->nodes[0], MP_TOKEN_OP_PLUS):
// pass
else:
goto invalid;
node = node_struct->nodes[1]
if !MP_PARSE_NODE_IS_STRUCT_KIND(node, PN_atom_expr_normal):
goto invalid
Note that PN_factor_2 and PN_atom_expr_normal always have 2 elements so there's no need to check that.
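For readers who want to step through the suggested control flow, here is a minimal Python sketch of it. It is only a model: `Token`, `Struct`, `split_offset_sign` and the kind strings are hypothetical stand-ins for MicroPython's parse-node macros, and the error messages are made up.

```python
from typing import NamedTuple

# Toy stand-ins for MicroPython parse nodes; all names here are hypothetical.
class Token(NamedTuple):
    kind: str      # e.g. "OP_MINUS", "OP_PLUS"

class Struct(NamedTuple):
    kind: str      # e.g. "factor_2", "atom_expr_normal"
    nodes: tuple   # child nodes

def split_offset_sign(node):
    """Return (negative, inner_node), or raise ValueError on invalid input."""
    negative = False
    if isinstance(node, Struct) and node.kind == "factor_2":
        sign = node.nodes[0]
        if isinstance(sign, Token) and sign.kind == "OP_MINUS":
            negative = True
        elif isinstance(sign, Token) and sign.kind == "OP_PLUS":
            pass  # an explicit plus sign changes nothing
        else:
            raise ValueError("invalid unary operator")
        # Unwrap the operand of the unary operator.
        node = node.nodes[1]
    if not (isinstance(node, Struct) and node.kind == "atom_expr_normal"):
        raise ValueError("expected an offset expression")
    return negative, node
```

The point of the restructuring is that the sign is handled once up front, so everything after it only ever sees the unwrapped node.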
Thanks, I ended up doing something similar, but I left the elements count check. I'll take that out.
Right now I'm cleaning up the range checks, as negative offsets are still seen as positive due to the offset node being without its minus sign and you can imagine the rest. Right now there are lots of repeated code chunks for the special cases and I'm merging them together.
When an rv32 inline-asm function is entered, what registers are automatically saved? Is there anything else done in the prelude? With Thumb inline-asm, a few registers are saved/restored on function entry/exit. This was arguably not a good choice; it might have been better to have a very minimal prelude for inline-asm functions.
The prelude saves the function table pointer and the three local registers. The reason behind that is purely because they're stored in S-registers and those registers must be restored on exit (in other words, A0-A7 and T0-T6 can be trashed as much as you want, but S0-S11 must be preserved). In theory T-registers or even the A4 to A7 could have been used for those, but then half of the compressed opcodes wouldn't be able to access them. In fact, the emitter just calls
For inline assembly code it's up to the user to keep track of RA, SP, TP, GP, and S0-S11, but that's expected of the user in the first place no matter the assembler used.
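As a rough illustration of that rule of thumb, the sketch below takes the registers an assembly body writes to and reports which of them the author would have to save and restore. The helper name `must_preserve` and the grouping constants are hypothetical; the register classes follow the comment above (A0-A7 and T0-T6 are free to clobber, while RA, SP, TP, GP and S0-S11 are not).

```python
# RV32 integer register ABI names, grouped as described in the comment above.
TEMPORARIES = {f"t{i}" for i in range(7)}   # t0-t6, free to clobber
ARGUMENTS = {f"a{i}" for i in range(8)}     # a0-a7, free to clobber
SAVED = {f"s{i}" for i in range(12)}        # s0-s11, must be preserved
SPECIAL = {"ra", "sp", "gp", "tp"}          # tracked by the user as well

def must_preserve(clobbered):
    """Return the sorted subset of `clobbered` that needs save/restore code."""
    regs = {name.lower() for name in clobbered}
    unknown = regs - (TEMPORARIES | ARGUMENTS | SAVED | SPECIAL)
    if unknown:
        raise ValueError(f"unknown register(s): {sorted(unknown)}")
    return sorted(regs & (SAVED | SPECIAL))
```

For example, a body that writes `a0`, `t1`, `s0` and `ra` only needs to take care of `ra` and `s0`.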
89071c8
to
d7b61e4
Compare
OK, this should be it I guess. I wish the offset-to-register syntax support wasn't bolted on this way, but it works :) That can be cleaned up for v2 anyway. |
What do you mean, that you'd prefer it was designed in from the start? |
Indeed, everything else fits in nicely without special cases and exceptions. It's more of a personal preference, really, but re-designing everything to cover that case in a generic way as well would take a bit more time and delay this PR for the 1.25.0 milestone. Anyway, I'll rework that for v2 when adding dotted opcodes support - in the end it works and the code isn't that bad, but it could be better for sure :) |
OK, sounds fine to me. I'm glad that the syntax |
Just to be clear, this is necessary for @native/@viper code generation. But for inline-rv32-asm it's not strictly needed, but done just because inline-rv32-asm is using Would it be possible for inline-rv32-asm to use a different entry/exit set of functions and not save any registers automatically? As you say, it's up to the assembler user to keep track of and restore needed registers. |
d7b61e4
to
9270005
Compare
Done. Maybe when more targets will support |
9270005
to
c3af779
Compare
Apologies for the extra push, it's just a fix for a minor issue I found when reviewing the code for the last time (the check for negative values' range didn't take into account that the negative flag was a pointer), plus some commented test code was removed. |
This is looking very good now. I particularly like how the tests suite can now auto-detect the inline-asm capability and arch, that's nice!
I have a few final comments and then it should be good to go in.
Thumb/Thumb2 tests are now into their own subdirectory, as RV32IMC-specific tests will be added as part of the RV32 inline assembler support. Signed-off-by: Alessandro Gatti <[email protected]>
This makes the existing popcount(uint32_t) implementation found in the RV32 emitter available to the rest of the codebase. This version of popcount will use intrinsic or builtin implementations if they are available, falling back to a generic implementation if that is not the case. Signed-off-by: Alessandro Gatti <[email protected]>
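To show what a generic fallback for a 32-bit popcount can look like, here is a small Python sketch using the well-known parallel bit-summing trick. It is not the C code from this commit, which is not shown here; `popcount32` is just an illustrative name.

```python
def popcount32(x):
    """Count the set bits in the low 32 bits of x without a builtin."""
    x &= 0xFFFFFFFF
    # Sum adjacent bits into 2-bit fields.
    x = x - ((x >> 1) & 0x55555555)
    # Sum adjacent 2-bit fields into 4-bit fields.
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)
    # Sum adjacent 4-bit fields into bytes.
    x = (x + (x >> 4)) & 0x0F0F0F0F
    # Add the four bytes together; the total lands in the top byte.
    return ((x * 0x01010101) & 0xFFFFFFFF) >> 24
```

In C the same steps would be the fallback path, with a compiler builtin used instead whenever one is available.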
c3af779
to
5baa894
Compare
This commit adds support for writing inline assembler functions when targeting a RV32IMC processor. Given that this takes up a bit of rodata space due to its large instruction decoding table and its extensive error messages, it is enabled by default only on offline targets such as mpy-cross and the qemu port. Signed-off-by: Alessandro Gatti <[email protected]>
In certain circumstances depending on the code size, the `deflate_decompress` test fails on both ARM and RV32 with a memory allocation failure error. The issue is mitigated by having a larger GC heap, in this case around 20 KBytes more than the original 100 KBytes default. This commit makes the GC heap size configurable on a per-arch basis, with both ARM and RV32 using the enlarged 120 KBytes heap. Signed-off-by: Alessandro Gatti <[email protected]>
This commit enables by default inline assembly support for the RP2 target when it is operating in RISC-V mode. This brings the feature set when in RISC-V mode to parity with what's available in ARM mode. Signed-off-by: Alessandro Gatti <[email protected]>
This commit implements a method to detect at runtime if inline assembler support is enabled, and if so which platform it targets. This allows clean test runs even on modified version of ARM-based ports where inline assembler support is disabled, running inline assembler tests on ports that have such feature not enabled by default and manually enabled, and allows to always run the correct inlineasm tests for ports that support more than one architecture (esp32, qemu, rp2). Signed-off-by: Alessandro Gatti <[email protected]>
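A minimal sketch of the selection logic this enables in a test runner is shown below. Everything in it is an assumption made for illustration: the `probe` callable (which on a real target might try to compile a tiny assembler function for the given architecture), the architecture names, and the `inlineasm/<arch>` directory layout.

```python
def detect_inlineasm_arch(probe, candidates=("thumb", "xtensa", "rv32")):
    """Return the first architecture for which probe(arch) succeeds, else None."""
    for arch in candidates:
        if probe(arch):
            return arch
    return None

def inlineasm_test_dirs(arch):
    """Map the detected architecture to the test directories to run."""
    # No inline assembler support means no inline assembler tests at all.
    return [f"inlineasm/{arch}"] if arch else []
```

With a probe that only accepts `"rv32"`, the runner would pick the RV32 tests; with a probe that rejects everything, the inline assembler tests are skipped entirely instead of failing.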
5baa894
to
931a768
Compare
Tested on RPI_PICO2 in RISC-V mode. Works well.
@agatti thanks for your efforts on this, it's a very nice feature to have, especially because RISC-V will become more and more prevalent in MCUs in the future. BTW, if you want to have a go at reducing the code size, you should be able to cut off at least 1k with the following tricks:
|
Oh, I don't think I've ever seen
That's nice - wish I'd thought of that :) This will end up as a PR after 1.25.0 is out, so it can be featured in a later version along with eventual fixes for bugs that may be found in the meantime. All things considered, reducing the footprint by 30% or more may well make this a candidate for being enabled by default on ESP32Cx targets too. I can probably shorten some error messages' text and coalesce some messages as well to further reduce the space taken up by this feature, I'll see what can be done on that. Anyway, thanks for the help on this - glad it's been merged! |
Note that there are configurable error message levels:
The default for most ports is NORMAL. It's possible to provide different error messages for the different levels. But actually I think what you have done already is pretty good, it's a much better user experience with more detailed error messages for tricky things like inline assembler. Maybe it could be tweaked to reduce size by a little, eg I just noticed a few messages end in |
Summary
This commit adds support for writing inline assembler functions when targeting a RV32IMC processor.
An almost complete test suite is also present, covering 95% of the opcodes (leaving out things like breakpoint triggers and the like).
Testing
This was tested both on QEMU-RISCV and on an ESP32-C3 after manually enabling the inline assembler emitter in `mpconfigport.h`.

Due to the "interesting" limitations about immediates' size and registers set that vary from opcode to opcode, the code has a lot of checks for invalid input, along with (hopefully) helpful error messages as part of the exceptions it raises. However, unless I'm doing something really wrong, I cannot seem to be able to trap those exceptions in tests. That would be helpful for further validation of range and registers checking.
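To give an idea of the kind of per-opcode immediate check involved, here is a host-side Python sketch. It is not the emitter's code: `fits_signed`, `check_immediate`, the two-entry width table and the error text are all made up for illustration. The widths themselves come from the RISC-V specification (`ADDI` takes a 12-bit signed immediate, `C.LI` a 6-bit signed one).

```python
def fits_signed(value, bits):
    """True if value is representable as a two's complement `bits`-bit integer."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))

# Immediate widths for two example opcodes; a real table would cover many more.
IMMEDIATE_BITS = {"addi": 12, "c_li": 6}

def check_immediate(opcode, value):
    """Return value unchanged, or raise ValueError if it is out of range."""
    bits = IMMEDIATE_BITS[opcode]
    if not fits_signed(value, bits):
        raise ValueError(
            f"{opcode}: immediate {value} does not fit in {bits} signed bits"
        )
    return value
```

So `check_immediate("addi", 2047)` passes while `check_immediate("addi", 2048)` raises, and the same value can be valid for one opcode and invalid for its compressed counterpart.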
Trade-offs and Alternatives
As mentioned in the commit message, this bit of code takes up a fair amount of .rodata space (last time I checked it was something like 4.5k) and thus it is enabled by default only on targets that are not microcontrollers. Since the Pico2 port has Arm inline assembler support enabled, this is also enabled for said port to maintain feature parity across MCU architectures.
Syntax-wise, there are a few things that may need a bit of explanation:

- `c.` prefix, which I replaced with `c_` - so for example `c.srai` turns into `c_srai`.
- `lw a0, 10(a1)` is defined as `lw(a0, a1, 10)`. This is to simplify syntax parsing, as I don't think the original syntax can be parsed as such, and should be easier to grasp for folks coming from Arm assembler.
- `li Rd, Imm32` and `la Rd, Label`. The first loads the given immediate into the given register using as few bytes as possible, the second loads the address of the given label into the given register.
- `c_mov(Rd, Rs)` has been aliased to `mov(Rd, Rs)`.
- `add(a0, a1)` becoming `C.ADD A0, A1` or `ADD(A0, A0, A1)` behind the scenes, or `add(a0, 100)` turning into `ADDI A0, A0, 100`.
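The first two rewriting rules can be illustrated with a small Python helper that turns a line of conventional RISC-V assembly into the call syntax described in this list. This is purely a reading aid under the mapping stated above: `to_inline_syntax` is a hypothetical name, it is not part of MicroPython, and the syntax that was finally merged may differ from what this description lays out.

```python
import re

# Matches an offset(base) operand such as 10(a1) or -4(sp).
_OFFSET = re.compile(r"^(-?\w+)\((\w+)\)$")

def to_inline_syntax(line):
    """Rewrite e.g. 'lw a0, 10(a1)' as 'lw(a0, a1, 10)'."""
    mnemonic, _, rest = line.strip().partition(" ")
    # Dots are not valid in Python identifiers, so c.srai becomes c_srai.
    name = mnemonic.replace(".", "_")
    args = []
    for operand in (part.strip() for part in rest.split(",") if part.strip()):
        match = _OFFSET.match(operand)
        if match:
            # Split offset(base) into two plain arguments: base, then offset.
            offset, base = match.groups()
            args += [base, offset]
        else:
            args.append(operand)
    return f"{name}({', '.join(args)})"
```

For instance, `to_inline_syntax("c.srai a0, 3")` gives `c_srai(a0, 3)`.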