[cmake] Refuse too new or development LLVM versions by default #1805

Open: illuhad wants to merge 5 commits into develop

illuhad
Collaborator

illuhad commented May 5, 2025

We do not officially support development LLVM versions (only official releases) and currently do not test with LLVM > 20.
However, experience has shown that users regularly miss this restriction in our documentation and then are surprised when something breaks.

To avoid this, this PR adds additional cmake logic to check:

  • Whether LLVM is newer than the latest version we have in CI (currently 20)
  • Whether it is a development version (this can only be done starting from LLVM 18, because releases from LLVM 18 on have 1 as their minor version, while development snapshots have 0)

If these checks return that the LLVM version is unsupported, we now emit a cmake error.
Expert users or AdaptiveCpp developers wishing to work on new LLVM releases can override this check using -DACPP_EXPERIMENTAL_LLVM=ON.
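A minimal sketch of the two checks plus the override, written as a standalone script that runs with cmake -P. It is not the PR's actual code: the helper acpp_check_llvm_version and the variable ACPP_MAX_TESTED_LLVM_MAJOR are hypothetical names, and LLVM_VERSION_MAJOR/LLVM_VERSION_MINOR, which a real build would take from find_package(LLVM), are set by hand.

```cmake
cmake_minimum_required(VERSION 3.10)

# Hypothetical name for "newest LLVM major version covered by CI" (20 at the time of this PR).
set(ACPP_MAX_TESTED_LLVM_MAJOR 20)

# Sets <out_var> to an empty string if the LLVM version passes both checks,
# otherwise to a short reason for refusing it.
function(acpp_check_llvm_version major minor out_var)
  set(reason "")
  if(major GREATER ACPP_MAX_TESTED_LLVM_MAJOR)
    set(reason "LLVM ${major} is newer than the newest LLVM tested in CI (${ACPP_MAX_TESTED_LLVM_MAJOR})")
  elseif(major GREATER_EQUAL 18 AND minor EQUAL 0)
    # From LLVM 18 on, releases are X.1.y, so X.0.y indicates a development snapshot.
    # Older versions cannot be classified this way (e.g. 17.0.6 is a release).
    set(reason "LLVM ${major}.${minor} seems to be a development version")
  endif()
  set(${out_var} "${reason}" PARENT_SCOPE)
endfunction()

# Stand-in values; a real build would take these from find_package(LLVM).
if(NOT DEFINED LLVM_VERSION_MAJOR)
  set(LLVM_VERSION_MAJOR 20)
  set(LLVM_VERSION_MINOR 1)
endif()
# In a real CMakeLists.txt this would be an option(); -DACPP_EXPERIMENTAL_LLVM=ON bypasses the gate.
if(NOT DEFINED ACPP_EXPERIMENTAL_LLVM)
  set(ACPP_EXPERIMENTAL_LLVM OFF)
endif()

acpp_check_llvm_version(${LLVM_VERSION_MAJOR} ${LLVM_VERSION_MINOR} llvm_reject_reason)
if(llvm_reject_reason AND NOT ACPP_EXPERIMENTAL_LLVM)
  message(SEND_ERROR "${llvm_reject_reason}; only tested LLVM releases are supported. "
                     "Use -DACPP_EXPERIMENTAL_LLVM=ON if you still wish to try your luck.")
endif()
```

SEND_ERROR (rather than FATAL_ERROR) lets configuration continue so that other errors are still reported, but generation fails.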

@al42and I haven't tried with ROCm LLVM, but with this change it's quite likely that ROCm LLVM will be refused by default, so you may have to update your build instructions to include -DACPP_EXPERIMENTAL_LLVM=ON.

illuhad added this to the 25.02.0 milestone May 5, 2025
@al42and
Contributor

al42and commented May 6, 2025

illuhad added this to the 25.02.0 milestone May 5, 2025

I think it would be less confusing to just call it 25.06 at this point :)

I haven't tried with ROCm LLVM, but with this change it's quite likely that ROCm LLVM will be refused by default, so you may have to update your build instructions to include -DACPP_EXPERIMENTAL_LLVM=ON.

Thanks for the heads up. Indeed, this breaks with ROCm 6.2+ (or so), which migrated to Clang 18.

But what's the reason to disallow intermediate versions? I understand preventing future versions (even there, a warning would perhaps be more appropriate, IMO; LLVM doesn't break things with every release, does it?). People already struggle to install ACpp, so why make configuring more complex for known-working configurations?

I think a check that prevents incompatible combinations of ROCm and mainline LLVM compilers would be more useful, since that is pretty non-trivial to check (if one does not use ROCm LLVM, how does one know its version to check against the LLVM used?)

@tdavidcl
Contributor

tdavidcl commented May 6, 2025

Thanks for the heads up. Indeed, this breaks with ROCm 6.2+ (or so), which migrated to Clang 18.

From what I remember, a minor version of 0 is quite common in ROCm LLVM (if not universal), so I'm not sure about that check.

if(${LLVM_VERSION_MINOR} EQUAL 0)
  message(SEND_ERROR "This LLVM seems to be a development version; only LLVM releases are tested and supported. Use -DACPP_EXPERIMENTAL_LLVM=ON if you still wish to try your luck.")
endif()

Would it be feasible to have a specific code path for rocm/llvm, since most of the concerns in this PR are about that case?

@illuhad
Collaborator Author

illuhad commented May 6, 2025

I think it would be less confusing to just call it 25.06 at this point :)

That would mean delaying the release by another month... I guess it depends on what the number means :) Maybe it just denotes the start of the release development cycle, rather than the release point? :)

But what's the reason to disallow intermediate versions? I understand preventing future versions (even there, a warning would perhaps be more appropriate, IMO; LLVM doesn't break things with every release, does it?)

I would say that, more often than not, things break in some way with a newer LLVM release. Things have gotten worse after LLVM 14 because of larger changes the LLVM community is making, such as opaque pointers and the upcoming untyped getelementptr change.
Neither 19 nor 20 worked for a long time, and IIRC 18 also required some patches.

The rationale behind preventing intermediate versions is that LLVM is a moving target, and for a snapshot taken between releases, we have no way of determining which APIs are available and which are not. It only works for you with ROCm LLVM (which typically sits between releases) because you disable most of the compiler-based functionality.

People already struggle to install ACpp, so why make configuring more complex for known-working configurations?

IMO this PR makes it easier, because it aligns cmake behavior with what we have in our documentation and what is actually tested.

  • We already state quite clearly in the documentation that only official LLVM releases are supported
  • We have never tested with non-release LLVM versions, and we don't have them in CI.

Accidentally building against development versions regularly occurs with users. For example, I strongly believe that this PR could have prevented the AdaptiveCpp issues that @masterleinad encountered and presented in his IWOCL talk.

It happens again and again that people just use some clone of intel/llvm or just git clone https://github.com/llvm/llvm-project (without checking out a release branch) and then experience AdaptiveCpp not compiling, compiler crashes, or other problems.
This PR prevents that.

I think a check that prevents incompatible combinations of ROCm and mainline LLVM compilers would be more useful, since that is pretty non-trivial to check (if one does not use ROCm LLVM, how does one know its version to check against the LLVM used?)

We could special case the situation when ROCm LLVM is targeted and a minimum compiler configuration is requested?

From what I remember, a minor version of 0 is quite common in ROCm LLVM (if not universal), so I'm not sure about that check.

The check is working exactly as intended :) Starting from LLVM version 18, the minor version being 0 indicates that this is a pre-release, development version of LLVM.
ROCm LLVM branches off from mainline LLVM at random points, and is indeed typically based on a pre-release version of LLVM. Since LLVM is a moving target, this means that its LLVM API support is somewhere between two LLVM versions. In this situation, there's little we can do to figure out whether a specific API that we need and that has been added/removed/changed in LLVM is actually available because we are in an ill-defined intermediate state. Sometimes things still work and can be used for experiments, but we do not have such configurations in CI. As I said, our documentation is quite clear about this.

From experience, it has become clear that it's virtually impossible for an out-of-tree LLVM-based project to robustly support arbitrary development versions of LLVM. This only works when you have a fork of LLVM that is always synchronized with LLVM trunk (and even then you're not supporting arbitrary LLVM versions, but only tip-of-tree, as e.g. the llvm-spirv translator does).

So, this PR refusing ROCm LLVM is a consequence of ROCm LLVM typically being in between LLVM releases, and thus potentially breaking much of the LLVM-based compiler functionality.

As I said, it might be possible to special-case ROCm LLVM so that such a configuration is accepted for builds with minimal compiler components enabled (no SSCP, no stdpar, ...), since the less LLVM functionality is used, the lower the risk of breakage. And if you know what you're doing and you really, really want to build against an intermediate LLVM version, you can always override the check using -DACPP_EXPERIMENTAL_LLVM=ON.

Note that even if we special case ROCm LLVM with minimum compiler profile, this does not change the fact that such configurations are still untested!
To me it makes sense to highlight the fact that the user is doing something that is untested and that may break rather than silently accepting them...

@al42and
Contributor

al42and commented May 6, 2025

We could special case the situation when ROCm LLVM is targeted and a minimum compiler configuration is requested?

Thanks, that sounds reasonable.

I think a check that prevents incompatible combinations of ROCm and mainline LLVM compilers would be more useful, since that is pretty non-trivial to check (if one does not use ROCm LLVM, how does one know its version to check against the LLVM used?)

Here, I was referring to a different issue, e.g., when one is using mainline LLVM 17 and ROCm 6.4 (based on LLVM 19), which, IIRC, is also not supported.

So, this PR refusing ROCm LLVM is a consequence of ROCm LLVM typically being in between LLVM releases, and thus potentially breaking much of the LLVM-based compiler functionality.

I'd like to point out that I totally blame AMD for not identifying their version properly in LLVM's CMake integration. But sadly in the end it's usability of ACpp that suffers.

At least among our users, there are more people who don't know what a "compiler" is than people who have random LLVM builds lying around :)

Note that even if we special case ROCm LLVM with minimum compiler profile, this does not change the fact that such configurations are still untested!

FWIW, we do test them manually for pre-releases. And, unlike any random LLVM commit, checking that ROCm LLVM works is at least feasible to do.

@illuhad
Collaborator Author

illuhad commented May 6, 2025

Thanks, that sounds reasonable.

I've added code that should skip the test if we're in a minimal compiler configuration against ROCm LLVM. The cmake detection of ROCm LLVM is new (we've had some previous code to check ROCm version, but I found it unreliable. E.g. it doesn't work on Arch ROCm packages), so it would be great if you could test whether it works for all your use cases!
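One way such a special case could be expressed, as a sketch only: the PR's actual implementation is not reproduced here, and the helper acpp_should_check_llvm_version is hypothetical. The ROCm detection result (the review below refers to a USE_ROCM_LLVM variable), the value of ACPP_COMPILER_FEATURE_PROFILE, and the override are passed in as plain arguments so the decision can be exercised with cmake -P.

```cmake
cmake_minimum_required(VERSION 3.10)

# Hypothetical helper: should the LLVM version gate run at all?
#   use_rocm_llvm - result of an earlier ROCm LLVM detection step (assumed)
#   profile       - value of ACPP_COMPILER_FEATURE_PROFILE
#   experimental  - value of ACPP_EXPERIMENTAL_LLVM
function(acpp_should_check_llvm_version use_rocm_llvm profile experimental out_var)
  if(experimental)
    # The user explicitly accepted an untested LLVM.
    set(result OFF)
  elseif(use_rocm_llvm AND profile STREQUAL "minimal")
    # Minimal profile: no SSCP, stdpar or accelerated CPU, so little of the LLVM
    # API is used and a ROCm LLVM sitting between releases is tolerated.
    set(result OFF)
  else()
    set(result ON)
  endif()
  set(${out_var} ${result} PARENT_SCOPE)
endfunction()

# Example: ROCm clang detected and the minimal profile requested.
acpp_should_check_llvm_version(ON minimal OFF run_llvm_version_gate)
message(STATUS "Run LLVM version gate: ${run_llvm_version_gate}")
```

Keeping the decision separate from the version check itself means the override and the ROCm special case cannot silently interact with the version thresholds.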

Here, I was referring to a different issue, e.g., when one is using mainline LLVM 17 and ROCm 6.4 (based on LLVM 19), which, IIRC, is also not supported.

Ah right, there are some additional constraints here.
For SSCP, AdaptiveCpp LLVM must be <= ROCm LLVM due to LLVM IR compatibility guarantees between LLVM versions. I've also added an additional cmake check to enforce this now, as this is also something that users routinely stumble over.
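The SSCP rule above reduces to a version comparison. In this sketch the helper name acpp_sscp_rocm_compatible is hypothetical, and comparing only major versions is an assumption (the comment does not say at which granularity the real check compares). The demo uses the CI case reported further down in this thread: SSCP built against LLVM 17, ROCm shipping LLVM 15.

```cmake
cmake_minimum_required(VERSION 3.10)

# Hypothetical helper: for SSCP, the LLVM used by AdaptiveCpp must not be newer
# than ROCm's LLVM (rule taken from the comment above; major-only comparison assumed).
function(acpp_sscp_rocm_compatible acpp_llvm_major rocm_llvm_major out_var)
  if(acpp_llvm_major GREATER rocm_llvm_major)
    set(${out_var} OFF PARENT_SCOPE)
  else()
    set(${out_var} ON PARENT_SCOPE)
  endif()
endfunction()

# SSCP with LLVM 17 against a ROCm installation based on LLVM 15.
acpp_sscp_rocm_compatible(17 15 sscp_rocm_ok)
if(NOT sscp_rocm_ok)
  # A real CMakeLists.txt would likely raise an error here; STATUS keeps the demo runnable.
  message(STATUS "SSCP: AdaptiveCpp LLVM 17 is newer than ROCm LLVM 15; this combination would be refused")
endif()
```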

For SMCP, unfortunately there is no simple rule that we could "just implement" :( AMD has routinely changed things that can break compatibility between ROCm LLVM and vanilla LLVM (directory layouts, bitcode libraries, code object model, ...). Usually AdaptiveCpp LLVM == ROCm LLVM works, but it really depends on when exactly the required patches from ROCm have landed in upstream LLVM. We'd need to test every single ROCm version against different LLVM versions on an individual basis to figure out what actually works :(
So from this point of view, I can understand your position that at least building against ROCm LLVM is reliable for SMCP (well, sort of - there can be other issues :) ).

At least among our users, there are more people who don't know what a "compiler" is than people who have random LLVM builds lying around :)

Fair :)

From my point of view, this PR is the last one I'd like to get in before release. Once you @al42and give green light, we can release from my side :)

EDIT: Looks like CI has already successfully detected that we're trying to build an incompatible compiler (SSCP with LLVM 17 vs ROCm with LLVM 15) :D Although it's fine there because we can only do ROCm compiler testing (not JITting) in Github-hosted CI. Good to see that the detection seems to be doing its job.

… enforce that for SSCP AdaptiveCpp LLVM <= ROCm LLVM
illuhad force-pushed the feature/be-more-restrictive-with-llvm-versions branch from cb0413d to ec1fd0b May 6, 2025 23:49
CMakeLists.txt Outdated
Comment on lines 370 to 371
file(GLOB_RECURSE ROCM_FOUND_FILES FOLLOW_SYMLINKS "${ROCM_PATH}/*")
if ("${CLANG_EXECUTABLE_PATH}" IN_LIST ROCM_FOUND_FILES)
Contributor

Do you have a case in mind where traversing the full tree is more robust than the following?

Suggested change
- file(GLOB_RECURSE ROCM_FOUND_FILES FOLLOW_SYMLINKS "${ROCM_PATH}/*")
- if ("${CLANG_EXECUTABLE_PATH}" IN_LIST ROCM_FOUND_FILES)
+ file(REAL_PATH "${CLANG_EXECUTABLE_PATH}" CLANG_REAL_PATH)
+ file(REAL_PATH "${ROCM_PATH}" ROCM_REAL_PATH)
+ cmake_path(IS_PREFIX ROCM_REAL_PATH "${CLANG_REAL_PATH}" NORMALIZE CLANG_IS_IN_ROCM_PATH)

I'm worried that enumerating all the files in the tree is going to make some network filesystems very unhappy.

Collaborator Author

illuhad May 7, 2025

My understanding is that cmake_path is available from cmake 3.20, but the current minimum cmake version for AdaptiveCpp is set to 3.10. Not sure if we want to bump the minimum cmake version because of this?

I understand your concern. I think we could be more conservative with the paths that we test. Realistically, we would e.g. never expect to find clang in the include subdirectory (which I imagine might be responsible for a large chunk of the many small files that might be challenging for network filesystems). So perhaps it's enough to query the bin, llvm/bin, and lib/llvm/bin subdirectories?

Such an approach will probably work for the majority of ROCm installations (especially AMD-packaged ones). The case I'm most concerned about, though, is ROCm installations where ROCm is split up (spack?), or where there is no specific ROCm subdirectory because it's just installed into /usr. I suppose in that case, users could always use -DACPP_EXPERIMENTAL_LLVM=ON.

Collaborator Author

I've updated the PR to now only scan common bin directories ($ROCM_PATH/bin, $ROCM_PATH/llvm/bin, $ROCM_PATH/lib/llvm/bin)
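A sketch of how the narrower check could avoid globbing altogether while staying within the CMake 3.10 minimum discussed above. Instead of enumerating even the bin directories, it resolves symlinks with get_filename_component(... REALPATH), which CMake 3.10 supports (unlike file(REAL_PATH), added in 3.19, and cmake_path, added in 3.20), and compares the directory containing clang with each candidate directory. The helper acpp_clang_is_in_rocm and this direct-comparison approach are assumptions for illustration, not the PR's code.

```cmake
cmake_minimum_required(VERSION 3.10)

# Hypothetical helper: does <clang_path> live in one of the common ROCm bin directories?
# No files are enumerated, so network filesystems only see a handful of path lookups.
function(acpp_clang_is_in_rocm clang_path rocm_path out_var)
  # Resolve symlinks (e.g. /opt/rocm -> /opt/rocm-x.y.z, clang++ -> clang-NN).
  get_filename_component(clang_real "${clang_path}" REALPATH)
  get_filename_component(clang_dir "${clang_real}" DIRECTORY)
  set(result OFF)
  foreach(subdir bin llvm/bin lib/llvm/bin)
    get_filename_component(candidate "${rocm_path}/${subdir}" REALPATH)
    if(clang_dir STREQUAL candidate)
      set(result ON)
    endif()
  endforeach()
  set(${out_var} ${result} PARENT_SCOPE)
endfunction()

acpp_clang_is_in_rocm("/opt/rocm/llvm/bin/clang++" "/opt/rocm" clang_in_rocm)
message(STATUS "clang++ belongs to ROCm: ${clang_in_rocm}")
```

As noted above, split installations (spack) or ROCm installed directly into /usr would not necessarily be recognized by such a check.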

@@ -400,6 +446,28 @@ if(BUILD_CLANG_PLUGIN)
set(WITH_ACCELERATED_CPU false)
endif()
endif()

Contributor

Right above (can't comment on unmodified code) we disable some things for LLVM < 15. Since the same limitation applies to ROCm LLVM, should we expand the code to handle USE_ROCM_LLVM there too?

Collaborator Author

I'm not sure what you're suggesting. For the ROCm LLVM/Gromacs use case, we would be in a minimal compiler configuration (i.e. we wouldn't have SSCP, accelerated CPU, etc. enabled), so this check shouldn't matter?

Or are you suggesting that we also prevent LLVM < 15 for ROCm LLVM with minimal compiler configurations?

Contributor

we would be in a minimal compiler configuration (i.e. we wouldn't have SSCP, accelerated CPU, etc. enabled), so this check shouldn't matter?

We will be if the user sets it. If not, I get a barrage of conflicting error messages:

$ cmake ../.. -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ -DLLVM_DIR=/opt/rocm/lib/llvm/lib/cmake/llvm 
...
CMake Error at CMakeLists.txt:466 (message):
  This LLVM seems to be a development version; only LLVM releases are tested
  and supported.  Use -DACPP_EXPERIMENTAL_LLVM=ON if you still wish to try
  your luck.

-- Using clang include directory: /opt/rocm/lib/llvm/lib/clang/19/include/..
-- Looking for C++ include filesystem
-- Looking for C++ include filesystem - found
-- Performing Test CXX_FILESYSTEM_NO_LINK_NEEDED
-- Performing Test CXX_FILESYSTEM_NO_LINK_NEEDED - Success
CMake Error at src/compiler/llvm-to-backend/CMakeLists.txt:43 (message):
  LLVM at /opt/rocm/lib/llvm/lib/cmake/llvm does not have libLLVM.so.  Please
  disable SSCP and related backends (-DWITH_SSCP_COMPILER=OFF
  -DWITH_OPENCL_BACKEND=OFF -DWITH_LEVEL_ZERO_BACKEND=OFF) or choose another
  LLVM installation
Call Stack (most recent call first):
  src/compiler/llvm-to-backend/CMakeLists.txt:146 (create_llvm_based_library)

If I apply the flags recommended in the last message (which is the actual error), the development-version error still does not go away:

$ cmake ../.. -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ -DLLVM_DIR=/opt/rocm/lib/llvm/lib/cmake/llvm -DWITH_SSCP_COMPILER=OFF -DWITH_OPENCL_BACKEND=OFF -DWITH_LEVEL_ZERO_BACKEND=OFF   

...

CMake Error at CMakeLists.txt:466 (message):
  This LLVM seems to be a development version; only LLVM releases are tested
  and supported.  Use -DACPP_EXPERIMENTAL_LLVM=ON if you still wish to try
  your luck.

I pushed a fix to our docs to recommend -DACPP_COMPILER_FEATURE_PROFILE=minimal, but that, in turn, raises a warning with ACpp < 24.06, so it's not a perfect solution either.

IMO, as long as we're detecting ROCm Clang anyway, I was suggesting to also default to minimal profile (or a subset of it; e.g., I think stdpar can still work in SMCP mode, right?) in this case.

Collaborator Author

illuhad May 7, 2025

IMO, as long as we're detecting ROCm Clang anyway, I was suggesting to also default to minimal profile (or a subset of it; e.g., I think stdpar can still work in SMCP mode, right?) in this case.

Not sure how easy this would be... There is a danger of circular dependencies here (ROCm detection exists to rule out problematic cases for the non-minimal profile, but we would also need the ROCm detection to know what to default to).

stdpar works in SMCP, but not in minimal profile because it needs to do fancy compiler trickery to make all allocations GPU-accessible.

FWIW, WITH_SSCP_COMPILER and friends are all deprecated, and users should migrate to ACPP_COMPILER_FEATURE_PROFILE anyway. Although I think they should still work, so I'll look into what causes these issues.

Collaborator Author

illuhad May 7, 2025

@al42and I believe the issue was that we've also been checking for an undocumented variable, WITH_REFLECTION_BUILTINS, which also needed to be disabled to achieve parity with the minimal feature profile. I've added a commit that treats WITH_REFLECTION_BUILTINS as something that is enabled internally whenever SSCP, stdpar, or CBS is enabled, so that the user doesn't need to worry about it anymore.

For me, the following now works: cmake -DLLVM_DIR=/opt/rocm/llvm/lib/cmake/llvm/ -DWITH_SSCP_COMPILER=OFF -DWITH_STDPAR_COMPILER=OFF -DWITH_ACCELERATED_CPU=OFF ..

Note that all of SSCP, accelerated CPU/CBS and stdpar need to be disabled to achieve something that matches minimal profile.
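The derived-flag rule described above can be sketched as follows. The helper acpp_derive_reflection_builtins is a hypothetical name, and the commit's actual code may be structured differently; the demo passes the three features disabled, as in the command line above, which should leave reflection builtins off.

```cmake
cmake_minimum_required(VERSION 3.10)

# Hypothetical helper: reflection builtins are enabled internally whenever SSCP,
# stdpar or the accelerated CPU (CBS) compiler is on, so users never set them directly.
function(acpp_derive_reflection_builtins sscp stdpar accelerated_cpu out_var)
  if(sscp OR stdpar OR accelerated_cpu)
    set(${out_var} ON PARENT_SCOPE)
  else()
    # None of the features that need reflection builtins: matches the minimal profile.
    set(${out_var} OFF PARENT_SCOPE)
  endif()
endfunction()

# -DWITH_SSCP_COMPILER=OFF -DWITH_STDPAR_COMPILER=OFF -DWITH_ACCELERATED_CPU=OFF
acpp_derive_reflection_builtins(OFF OFF OFF WITH_REFLECTION_BUILTINS)
message(STATUS "WITH_REFLECTION_BUILTINS=${WITH_REFLECTION_BUILTINS}")
```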

Does this address your suggestion?
