-
Notifications
You must be signed in to change notification settings - Fork 13.4k
[openmp] Add initial ARM64EC support #138769
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter. |
Treat ARM64EC as an AArch64 target for the most part. Since the vararg calling convention differs, add a dedicated implementation based on the AArch64 variant. The differences include: - Only the first four parameters are passed via registers - x4 contains a pointer to the first stack argument - x5 contains the size of arguments passed via the stack - The call is preceded by a call to __os_arm64x_check_icall, which optionally sets up an indirect call via an exit thunk if the target is x86_64 code
@@ -17,6 +17,8 @@ function(libomp_get_architecture return_arch) | |||
set(detect_arch_src_txt " | |||
#if defined(__KNC__) | |||
#error ARCHITECTURE=mic | |||
#elif defined(__aarch64__) || defined(_M_ARM64) || defined(__arm64ec__) || defined(_M_ARM64EC) | |||
#error ARCHITECTURE=aarch64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should leave a comment here to say that arm64ec also has x86_64 defined, and therefore has to be ordered here before that (so nobody tries to reorder it back later)?
Then again, if we add a comment here we should probably do the same in kmp_platform.h
below too. So perhaps it's not necessary; once things are working well enough we'll probably build this regularly enough anyway, so we'd catch any breakage that way.
@@ -159,6 +159,7 @@ KMP_PREFIX_UNDERSCORE($0): | |||
.globl KMP_PREFIX_UNDERSCORE(\proc) | |||
KMP_PREFIX_UNDERSCORE(\proc): | |||
.endm | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Stray change
b KMP_LABEL(kmp_0) | ||
KMP_LABEL(kmp_1): | ||
adrp x10, $iexit_thunk$cdecl$v$varargs | ||
add x10, x10, :lo12:$iexit_thunk$cdecl$v$varargs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which object file provides this symbol?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A number of .cpp files from openmp generate this exit thunk, for example for __kmp_fatal
guest exit thunk.
|
||
sub w9, w9, #1 | ||
cbz w9, KMP_LABEL(kmp_1) | ||
ldr x3, [x10, #8]! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The calling convention details are a bit unclear to me here; we're accepting arguments to this function up to x4
in registers to this function, but for the called function we only pass up to x3
in registers and the rest on the stack? Is this something that the thunks and __os_arm64x_check_icall
sort out if the called function is arm64ec, reloading the later arguments into registers? Or is this that the called function has a different calling convention as a variadic function?
I'm a little unsure about this aspect, since I'm not sure if the called function really is a variadic function; I have a faint memory that this function is used as adapter for calling non-variadic functions with a varying number of arguments. Did you manage to test this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__os_arm64x_check_icall
can’t resolve the target correctly if it’s ARM64EC. In that case, it leaves the target address unchanged and we call the function directly. As you noted, this means that if the target isn’t actually a varargs function, it will misinterpret the arguments.
If we want to support such cases, ARM64EC alone likely won’t suffice. We may need to build x86_64 assembly instead. If we do that and the target is ARM64EC, control will return from the emulator to the entry chunk, which should handle it correctly. The downside is that it's a bit tricky for the build system to handle. I’ll investigate this further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another aspect overall here, is that openmp involves a lot of compiler generated calls to the helper functions; I'm curious about whether all of that really works out correctly without any further touches anywhere; both for the simple case of everything compiled as arm64ec (which would be fine if as such as a first step, if that works). Mixing in object files built as x86_64 would be the advanced case...
Treat ARM64EC as an AArch64 target for the most part. Since the vararg calling convention differs, add a dedicated implementation based on the aarch64 variant. The differences include:
__os_arm64x_check_icall
, which optionally sets up an indirect call via an exit thunk if the target is x86_64 code