RFC: First approach to add target-specific intrinsics for gfx90a targets #1796

Draft: wants to merge 1 commit into base: develop

sbalint98
Collaborator

This MR is an initial, experimental approach to adding target-specific SSCP builtins. In particular, the HIP unsafe atomics are exposed through the hipsycl::sycl::detail::__acpp_unsafe_atomic_fetch_add function. It can be used by calling into AdaptiveCpp internals as follows:

q.parallel_for(a.size(), [=](sycl::id<1> idx){
    constexpr auto global_address_space = hipsycl::sycl::access::address_space::global_space;
    constexpr auto global_memory_scope = hipsycl::sycl::memory_scope::device;
    // Assumption: relaxed_memory_order denotes the standard relaxed ordering.
    constexpr auto relaxed_memory_order = hipsycl::sycl::memory_order::relaxed;
    hipsycl::sycl::detail::__acpp_unsafe_atomic_fetch_add<global_address_space>(&dev_a[0], 4.5f, relaxed_memory_order, global_memory_scope);
  });
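
For context on why a hardware floating-point atomic add is attractive: when no native instruction is used, the operation is commonly emulated with a compare-and-swap loop. The host-side sketch below illustrates that fallback using only std::atomic. The helper name cas_fetch_add is hypothetical and is not part of AdaptiveCpp or this MR.

```cpp
#include <atomic>

// Hypothetical helper for illustration only (not an AdaptiveCpp API):
// emulates an atomic floating-point fetch-add with a compare-and-swap loop.
inline float cas_fetch_add(std::atomic<float>& target, float value) {
  float expected = target.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak writes the current value into
  // `expected`, so each retry recomputes the sum from the latest value.
  while (!target.compare_exchange_weak(expected, expected + value,
                                       std::memory_order_relaxed)) {
  }
  return expected;  // the value held immediately before the addition
}
```

Like the builtin-backed function above, the helper returns the previous value; the difference is that every contended update may cost several retries, which a single hardware instruction avoids.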

@illuhad
Collaborator

illuhad commented May 5, 2025

Why do we need a new bitcode file? Could we not just implement the unsafe atomic add in the existing one with some JIT reflection?

@sbalint98
Collaborator Author

Unfortunately, clang will choke on these builtins if there is no appropriate -mcpu specified.

Compiling it without specifying the gfx90a target architecture results in the following error:

/home/soproni/Projects/AdaptiveCpp/src/libkernel/sscp/amdgpu/atomic_gfx90a.cpp:18:10: error: '__builtin_amdgcn_global_atomic_fadd_f64' needs target feature gfx90a-insts
   18 |   return __builtin_amdgcn_global_atomic_fadd_f64(ptr, x);
      |         

As far as I can tell, this is due to the frontend checking the sub-target compatibility of the builtin here:
https://github.com/intel/llvm/blob/sycl/clang/lib/CodeGen/CodeGenFunction.cpp#L3190

There are some exceptions made when compiling with --hipstdpar. However, passing this flag when compiling the bitcode library results in an error about amdgcn not being a valid target for host compilation. Even after adding -Xclang -fcuda-is-device to the compilation arguments, the same issue persists. I am not sure what --hipstdpar changes that results in this new behavior. Do you think it makes sense to dig further into the LLVM source to figure out whether we could trick the frontend into emitting IR for these builtins?

Additionally, the --hipstdpar flag was only merged into LLVM 18.1.
