[cmake] Refuse too new or development LLVM versions by default #1805
I think it would be less confusing to just call it 25.06 at this point :)
Thanks for the heads up. Indeed, this breaks with ROCm 6.2+ (or so), which migrated to Clang 18. But what's the reason to disallow intermediate versions? I understand preventing future versions (even there, a warning would perhaps be more appropriate, IMO; LLVM does not break things every release, does it?). People already struggle to install ACpp, so why make configuring more complex on known working configurations? I think a more useful check would be one that prevents incompatible combinations of ROCm and mainline LLVM compilers, since that is pretty non-trivial to verify (if one does not use ROCm LLVM, how does one know its version to check against the LLVM used?)
From what I remember, the minor version being 0 on ROCm LLVM is quite frequent (if not the case for all of them). So I don't know about that check:
if(${LLVM_VERSION_MINOR} EQUAL 0)
message(SEND_ERROR "This LLVM seems to be a development version; only LLVM releases are tested and supported. Use -DACPP_EXPERIMENTAL_LLVM=ON if you still wish to try your luck.")
Could it be feasible to have a specific code path for ROCm LLVM, since most concerns in this PR arise in that case?
That would mean delaying the release by another month... I guess it depends on what the number means :) Maybe it just denotes the start of the release development cycle, rather than the release point? :)
I would say more often than not, things break in some way with a newer LLVM release. Things got worse post-14 with larger changes that the LLVM community is doing such as opaque pointers and the upcoming untyped The rationale behind preventing intermediate versions is that LLVM is a moving target, and in between intermediate versions, we have no way of determining which APIs are available and which are not. It's only working for you with ROCm LLVM (which typically sits in between releases) because you disable most of the compiler-based functionality.
IMO this PR makes it easier, because it aligns cmake behavior with what we have in our documentation and what is actually tested.
Accidentally building against development versions regularly occurs with users. For example, I strongly believe that this PR could have prevented the AdaptiveCpp issues that @masterleinad encountered and presented in his IWOCL talk. It happens again and again that people just use some clone of
We could special case the situation when ROCm LLVM is targeted and a minimum compiler configuration is requested?
The check is working exactly as intended :) Starting from LLVM version 18, the minor version being 0 indicates that this is a pre-release, development version of LLVM. From experience, it has become clear that it's virtually impossible for an out-of-tree LLVM-based project to robustly support arbitrary development versions of LLVM. This only works when you have a fork of LLVM that is always synchronized with LLVM trunk (and even then you're not supporting arbitrary LLVM versions, but only tip-of-tree, as e.g. the llvm-spirv translator does). So, this PR refusing ROCm LLVM is a consequence of ROCm LLVM typically being in between LLVM releases, and thus potentially breaking much of the LLVM compiler functionality. As I said, it might be possible to special-case ROCm LLVM so that such a configuration is accepted for builds with minimum compiler components enabled (no SSCP, no stdpar, ...), since the less LLVM functionality is used, the lower the risk of breakage. And if you know what you're doing and you really, really want to build against an intermediate LLVM version, you can always override the check using
Note that even if we special-case ROCm LLVM with the minimum compiler profile, this does not change the fact that such configurations are still untested!
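To make the rule described above concrete, here is a minimal sketch of how such a classification could look in CMake. It is not the code from this PR: the function name and the `SKETCH_MAX_TESTED_LLVM_MAJOR` variable are hypothetical, and only the cutoffs (minor version 0 from LLVM 18 onwards, nothing newer than LLVM 20) and the `ACPP_EXPERIMENTAL_LLVM` override are taken from the conversation.

```cmake
cmake_minimum_required(VERSION 3.10)

# Hypothetical name: newest LLVM major version that is tested.
set(SKETCH_MAX_TESTED_LLVM_MAJOR 20)

# Classify an LLVM version as "too-new", "development" or "release".
function(sketch_classify_llvm_version major minor out_var)
  if(${major} GREATER ${SKETCH_MAX_TESTED_LLVM_MAJOR})
    set(${out_var} "too-new" PARENT_SCOPE)
  elseif(NOT ${major} LESS 18 AND ${minor} EQUAL 0)
    # From LLVM 18 on, a minor version of 0 marks a pre-release build.
    set(${out_var} "development" PARENT_SCOPE)
  else()
    set(${out_var} "release" PARENT_SCOPE)
  endif()
endfunction()

# Example input; a real build would pass LLVM_VERSION_MAJOR and
# LLVM_VERSION_MINOR as provided by find_package(LLVM).
sketch_classify_llvm_version(19 0 llvm_kind)
if(NOT llvm_kind STREQUAL "release" AND NOT ACPP_EXPERIMENTAL_LLVM)
  message(STATUS "LLVM 19.0 classified as '${llvm_kind}'; a real check would refuse it unless -DACPP_EXPERIMENTAL_LLVM=ON is set")
endif()
```

The sketch only reports the classification; the actual check emits `message(SEND_ERROR ...)` as quoted earlier in this thread.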
Thanks, that sounds reasonable.
Here, I was referring to a different issue, e.g., when one is using mainline LLVM 17 and ROCm 6.4 (based on LLVM 19), which, IIRC, is also not supported.
I'd like to point out that I totally blame AMD for not identifying their version properly in LLVM's CMake integration. But sadly, in the end it's the usability of ACpp that suffers. At least among our users, there are more people who don't know what a "compiler" is than people who have random LLVM builds lying around :)
FWIW, we do test them manually for pre-releases. And, unlike any random LLVM commit, checking that ROCm LLVM works is at least feasible to do.
I've added code that should skip the test if we're in a minimal compiler configuration against ROCm LLVM. The cmake detection of ROCm LLVM is new (we've had some previous code to check ROCm version, but I found it unreliable. E.g. it doesn't work on Arch ROCm packages), so it would be great if you could test whether it works for all your use cases!
Ah right, there are some additional constraints here. For SMCP, unfortunately there is no simple rule that we could "just implement" :( AMD has routinely changed things that can break compatibility with ROCm LLVM and vanilla LLVM (directory layouts, bitcode libraries, code object model, ...). Usually AdaptiveCpp LLVM == ROCm LLVM works, but it really depends on when exactly the required patches from ROCm have landed in upstream LLVM. We'd need to test every single ROCm version against different LLVM versions on an individual basis to figure out what actually works :(
Fair :) From my point of view, this PR is the last one I'd like to get in before release. Once you @al42and give green light, we can release from my side :)
EDIT: Looks like CI has already successfully detected that we're trying to build an incompatible compiler (SSCP with LLVM 17 vs ROCm with LLVM 15) :D Although it's fine there because we can only do ROCm compiler testing (not JITting) in GitHub-hosted CI. Good to see that the detection seems to be doing its job.
… enforce that for SSCP AdaptiveCpp LLVM <= ROCm LLVM
…ompatibility check and use in CI
CMakeLists.txt
file(GLOB_RECURSE ROCM_FOUND_FILES FOLLOW_SYMLINKS "${ROCM_PATH}/*")
if ("${CLANG_EXECUTABLE_PATH}" IN_LIST ROCM_FOUND_FILES)
Do you have a case in mind where traversing the full tree is more robust than
- file(GLOB_RECURSE ROCM_FOUND_FILES FOLLOW_SYMLINKS "${ROCM_PATH}/*")
- if ("${CLANG_EXECUTABLE_PATH}" IN_LIST ROCM_FOUND_FILES)
+ file(REAL_PATH "${CLANG_EXECUTABLE_PATH}" CLANG_REAL_PATH)
+ file(REAL_PATH "${ROCM_PATH}" ROCM_REAL_PATH)
+ cmake_path(IS_PREFIX ROCM_REAL_PATH "${CLANG_REAL_PATH}" NORMALIZE CLANG_IS_IN_ROCM_PATH)
? I'm worried that enumerating all the files in the tree is going to make some network filesystems very unhappy.
My understanding is that cmake_path is available from cmake 3.20, but the current minimum cmake version for AdaptiveCpp is set to 3.10. Not sure if we want to bump the minimum cmake version because of this?
I can follow your concern. I think we could be more conservative with the paths that we test. Realistically, we would e.g. never expect to find clang in the include subdirectory (which I imagine might be responsible for a large chunk of the many small files that might be challenging for network filesystems). So perhaps it's enough to query the bin, llvm/bin and lib/llvm/bin subdirectories?
Such an approach will probably work for the majority of ROCm installations (especially AMD-packaged ones). The case I'm most concerned about though is ROCm installations where ROCm is split up (spack?), or where there is no specific ROCm subdirectory because it's just installed into /usr... I suppose in that case, users could always use -DACPP_EXPERIMENTAL_LLVM=ON.
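For reference, a prefix test that avoids both the full-tree traversal and `cmake_path` can be sketched with commands that exist in CMake 3.10. This is an illustration only, not what the PR implements: the function name `sketch_path_is_inside` and the example paths are hypothetical. It resolves symlinks with `get_filename_component(... REALPATH)` and then does a plain string-prefix comparison.

```cmake
cmake_minimum_required(VERSION 3.10)

# Sets out_var to TRUE if `candidate` resolves to a location below `root`.
function(sketch_path_is_inside candidate root out_var)
  # REALPATH resolves symlinks and yields an absolute path.
  get_filename_component(candidate_real "${candidate}" REALPATH)
  get_filename_component(root_real "${root}" REALPATH)
  # Terminate the root with a separator so that /opt/rocm does not
  # accidentally match /opt/rocm-6.2.
  if(NOT root_real MATCHES "/$")
    set(root_real "${root_real}/")
  endif()
  string(FIND "${candidate_real}" "${root_real}" pos)
  if(pos EQUAL 0)
    set(${out_var} TRUE PARENT_SCOPE)
  else()
    set(${out_var} FALSE PARENT_SCOPE)
  endif()
endfunction()

# Example call with illustrative paths.
sketch_path_is_inside("/opt/rocm/llvm/bin/clang" "/opt/rocm" clang_is_in_rocm)
message(STATUS "clang inside ROCm tree: ${clang_is_in_rocm}")
```

Like the `cmake_path` suggestion, this only helps when ROCm lives in its own directory; a ROCm installed straight into /usr would still need the manual override.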
I've updated the PR to now only scan common bin directories ($ROCM_PATH/bin, $ROCM_PATH/llvm/bin, $ROCM_PATH/lib/llvm/bin).
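A restricted scan of this kind might look roughly like the following. This is a hedged sketch rather than the code in the PR: the function name is hypothetical, and it matches the clang path by exact string, so it assumes the clang path is spelled with the same prefix as the ROCm path.

```cmake
cmake_minimum_required(VERSION 3.10)

# Sets out_var to TRUE if clang_path is listed in one of the common
# ROCm bin directories below rocm_path.
function(sketch_clang_is_rocm clang_path rocm_path out_var)
  set(found FALSE)
  foreach(subdir bin llvm/bin lib/llvm/bin)
    # Non-recursive glob: only the direct entries of each bin directory.
    file(GLOB candidates "${rocm_path}/${subdir}/*")
    list(FIND candidates "${clang_path}" idx)
    if(NOT idx EQUAL -1)
      set(found TRUE)
    endif()
  endforeach()
  set(${out_var} ${found} PARENT_SCOPE)
endfunction()

# Example call with illustrative paths.
sketch_clang_is_rocm("/opt/rocm/llvm/bin/clang" "/opt/rocm" use_rocm_llvm)
message(STATUS "ROCm clang detected: ${use_rocm_llvm}")
```

Globbing three directories touches far fewer files than a recursive walk of the whole ROCm tree, which is the network-filesystem concern raised above.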
@@ -400,6 +446,28 @@ if(BUILD_CLANG_PLUGIN)
set(WITH_ACCELERATED_CPU false)
endif()
endif()
Right above (can't comment on unmodified code) we disable some things for LLVM < 15. Since the same limitation applies to ROCm LLVM, should we expand the code to handle USE_ROCM_LLVM there too?
I'm not sure what you're suggesting. For the ROCm LLVM/Gromacs use case, we would be in a minimal compiler configuration (i.e. we wouldn't have SSCP, accelerated CPU etc enabled), so therefore this check shouldn't matter?
Or are you suggesting that we also prevent LLVM < 15 for ROCm LLVM with minimal compiler configurations?
we would be in a minimal compiler configuration (i.e. we wouldn't have SSCP, accelerated CPU etc enabled), so therefore this check shouldn't matter?
We will be if the user sets it. If not, I get a barrage of conflicting error messages:
$ cmake ../.. -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ -DLLVM_DIR=/opt/rocm/lib/llvm/lib/cmake/llvm
...
CMake Error at CMakeLists.txt:466 (message):
This LLVM seems to be a development version; only LLVM releases are tested
and supported. Use -DACPP_EXPERIMENTAL_LLVM=ON if you still wish to try
your luck.
-- Using clang include directory: /opt/rocm/lib/llvm/lib/clang/19/include/..
-- Looking for C++ include filesystem
-- Looking for C++ include filesystem - found
-- Performing Test CXX_FILESYSTEM_NO_LINK_NEEDED
-- Performing Test CXX_FILESYSTEM_NO_LINK_NEEDED - Success
CMake Error at src/compiler/llvm-to-backend/CMakeLists.txt:43 (message):
LLVM at /opt/rocm/lib/llvm/lib/cmake/llvm does not have libLLVM.so. Please
disable SSCP and related backends (-DWITH_SSCP_COMPILER=OFF
-DWITH_OPENCL_BACKEND=OFF -DWITH_LEVEL_ZERO_BACKEND=OFF) or choose another
LLVM installation
Call Stack (most recent call first):
src/compiler/llvm-to-backend/CMakeLists.txt:146 (create_llvm_based_library)
If I apply flags recommended in the last one (which is the actual error), the compatibility error does not go away:
$ cmake ../.. -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ -DLLVM_DIR=/opt/rocm/lib/llvm/lib/cmake/llvm -DWITH_SSCP_COMPILER=OFF -DWITH_OPENCL_BACKEND=OFF -DWITH_LEVEL_ZERO_BACKEND=OFF
...
CMake Error at CMakeLists.txt:466 (message):
This LLVM seems to be a development version; only LLVM releases are tested
and supported. Use -DACPP_EXPERIMENTAL_LLVM=ON if you still wish to try
your luck.
I pushed a fix to our docs to recommend -DACPP_COMPILER_FEATURE_PROFILE=minimal, but then it, in turn, raises a warning with ACpp < 24.06, so it is not a perfect solution either.
IMO, as long as we're detecting ROCm Clang anyway, I was suggesting to also default to the minimal profile (or a subset of it; e.g., I think stdpar can still work in SMCP mode, right?) in this case.
IMO, as long as we're detecting ROCm Clang anyway, I was suggesting to also default to minimal profile (or a subset of it; e.g., I think stdpar can still work in SMCP mode, right?) in this case.
Not sure how easy this is... There is a danger of circular dependencies here (ROCm detection is there to rule out problematic cases for non-minimal profiles, but we need to have the ROCm detection to know what to default to).
stdpar works in SMCP, but not in minimal profile because it needs to do fancy compiler trickery to make all allocations GPU-accessible.
FWIW, WITH_SSCP_COMPILER and friends are all deprecated, and users should migrate to ACPP_COMPILER_FEATURE_PROFILE anyway. Although I think they should still work, so I'll look into what causes these issues.
@al42and I believe the issue was that we've also been checking for an undocumented variable, WITH_REFLECTION_BUILTINS, which also needed to be disabled to achieve parity with the minimal feature profile. I've added a commit that treats WITH_REFLECTION_BUILTINS as something that is internally enabled whenever we enable SSCP/stdpar/CBS, so that the user doesn't need to worry about it anymore.
For me, the following now works: cmake -DLLVM_DIR=/opt/rocm/llvm/lib/cmake/llvm/ -DWITH_SSCP_COMPILER=OFF -DWITH_STDPAR_COMPILER=OFF -DWITH_ACCELERATED_CPU=OFF ..
Note that all of SSCP, accelerated CPU/CBS and stdpar need to be disabled to achieve something that matches minimal profile.
Does this address your suggestion?
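The idea of deriving WITH_REFLECTION_BUILTINS from the other options can be sketched as below. This is only an illustration of the described behavior, not the actual commit; it assumes the three feature switches are the `WITH_*` variables named in the command above.

```cmake
# Hypothetical sketch: treat WITH_REFLECTION_BUILTINS as an internal
# setting that follows the user-facing compiler features.
if(WITH_SSCP_COMPILER OR WITH_STDPAR_COMPILER OR WITH_ACCELERATED_CPU)
  set(WITH_REFLECTION_BUILTINS ON)
else()
  # All three features off corresponds to the minimal profile.
  set(WITH_REFLECTION_BUILTINS OFF)
endif()
```

With this arrangement, disabling SSCP, stdpar and accelerated CPU is sufficient on its own, which matches the cmake invocation reported as working above.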
We do not officially support development LLVM versions (only official releases) and currently do not test with LLVM > 20.
However, experience has shown that users regularly miss this restriction in our documentation and then are surprised when something breaks.
To avoid this, this PR adds additional cmake logic to check:
If these checks return that the LLVM version is unsupported, we now emit a cmake error.
Expert users or AdaptiveCpp developers wishing to work on new LLVM releases can override this check using -DACPP_EXPERIMENTAL_LLVM=ON.
@al42and I haven't tried with ROCm LLVM, but with this change it's quite likely that ROCm LLVM will be refused by default, so you may have to update your build instructions to include -DACPP_EXPERIMENTAL_LLVM=ON.