Thanks to visit codestin.com
Credit goes to github.com

Skip to content

[openmp] Add initial ARM64EC support #138769

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

cjacek
Copy link
Contributor

@cjacek cjacek commented May 6, 2025

Treat ARM64EC as an AArch64 target for the most part. Since the vararg calling convention differs, add a dedicated implementation based on the aarch64 variant. The differences include:

  • Only the first four parameters are passed via registers
  • x4 contains a pointer to the first stack argument
  • x5 contains the size of arguments passed via the stack
  • The call is preceded by a call to __os_arm64x_check_icall, which optionally sets up an indirect call via an exit thunk if the target is x86_64 code

@cjacek cjacek requested a review from mstorsjo May 6, 2025 21:49
@llvmbot llvmbot added the openmp:libomp OpenMP host runtime label May 6, 2025
Copy link

github-actions bot commented May 6, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Treat ARM64EC as an AArch64 target for the most part. Since the vararg calling convention
differs, add a dedicated implementation based on the AArch64 variant. The differences include:

- Only the first four parameters are passed via registers
- x4 contains a pointer to the first stack argument
- x5 contains the size of arguments passed via the stack
- The call is preceded by a call to __os_arm64x_check_icall, which optionally sets up an indirect
  call via an exit thunk if the target is x86_64 code
@cjacek cjacek force-pushed the arm64ec-openmp branch from 83e12ad to af3bac5 Compare May 6, 2025 22:06
@@ -17,6 +17,8 @@ function(libomp_get_architecture return_arch)
set(detect_arch_src_txt "
#if defined(__KNC__)
#error ARCHITECTURE=mic
#elif defined(__aarch64__) || defined(_M_ARM64) || defined(__arm64ec__) || defined(_M_ARM64EC)
#error ARCHITECTURE=aarch64
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should leave a comment here to say that arm64ec also has x86_64 defined, and therefore has to be ordered here before that (so nobody tries to reorder it back later)?

Then again, if we add a comment here we should probably do the same in kmp_platform.h below too. So perhaps it's not necessary; once things are working well enough we'll probably build this regularly enough anyway, so we'd catch any breakage that way.

@@ -159,6 +159,7 @@ KMP_PREFIX_UNDERSCORE($0):
.globl KMP_PREFIX_UNDERSCORE(\proc)
KMP_PREFIX_UNDERSCORE(\proc):
.endm

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Stray change

b KMP_LABEL(kmp_0)
KMP_LABEL(kmp_1):
adrp x10, $iexit_thunk$cdecl$v$varargs
add x10, x10, :lo12:$iexit_thunk$cdecl$v$varargs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which object file provides this symbol?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A number of .cpp files from openmp generate this exit thunk, for example for __kmp_fatal guest exit thunk.


sub w9, w9, #1
cbz w9, KMP_LABEL(kmp_1)
ldr x3, [x10, #8]!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The calling convention details are a bit unclear to me here; we're accepting arguments to this function up to x4 in registers to this function, but for the called function we only pass up to x3 in registers and the rest on the stack? Is this something that the thunks and __os_arm64x_check_icall sort out if the called function is arm64ec, reloading the later arguments into registers? Or is this that the called function has a different calling convention as a variadic function?

I'm a little unsure about this aspect, since I'm not sure if the called function really is a variadic function; I have a faint memory that this function is used as adapter for calling non-variadic functions with a varying number of arguments. Did you manage to test this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__os_arm64x_check_icall can’t resolve the target correctly if it’s ARM64EC. In that case, it leaves the target address unchanged and we call the function directly. As you noted, this means that if the target isn’t actually a varargs function, it will misinterpret the arguments.

If we want to support such cases, ARM64EC alone likely won’t suffice. We may need to build x86_64 assembly instead. If we do that and the target is ARM64EC, control will return from the emulator to the entry chunk, which should handle it correctly. The downside is that it's a bit tricky for the build system to handle. I’ll investigate this further.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another aspect overall here, is that openmp involves a lot of compiler generated calls to the helper functions; I'm curious about whether all of that really works out correctly without any further touches anywhere; both for the simple case of everything compiled as arm64ec (which would be fine if as such as a first step, if that works). Mixing in object files built as x86_64 would be the advanced case...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
openmp:libomp OpenMP host runtime
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants