-
-
Notifications
You must be signed in to change notification settings - Fork 11.4k
BLD: Fix PPC64LE build failure with -mcpu=power9 flag #29627
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
BLD: Fix PPC64LE build failure with -mcpu=power9 flag #29627
Conversation
Fixes numpy#29622: VSX2 targets were missing -mvsx flag and incorrectly using VSX3 intrinsics. - Add -mvsx to VSX2 configuration - Guard VSX3 intrinsics with NPY__CPU_TARGET_VSX3 - Resolves build error when CFLAGS='-mcpu=power9' on PPC64LE
There are some failing tests around compiler options. |
@mattip Working on it |
The test was expecting '-mcpu=power8' but the fix correctly generates '-mcpu=power8 -mvsx' for VSX2 targets. Update test expectations to match the actual (correct) compiler flags being generated. This resolves the failing tests that were expecting the old format while the code correctly generates the new format with -mvsx flag. Fixes the failing tests mentioned in PR numpy#29627.
@mattip can you check the changes now |
LGTM. It would be nice if @seiko2plus could chime in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few questions. The change does solve a reported issue, but I'm not sure that everything is correct, so some more explanations would be helpful.
@@ -30,7 +30,7 @@ class Half final { | |||
#if defined(NPY_HAVE_FP16) | |||
__m128 mf = _mm_load_ss(&f); | |||
bits_ = _mm_extract_epi16(_mm_cvtps_ph(mf, _MM_FROUND_TO_NEAREST_INT), 0); | |||
#elif defined(NPY_HAVE_VSX3) && defined(NPY_HAVE_VSX_ASM) | |||
#elif defined(NPY_HAVE_VSX3) && defined(NPY_HAVE_VSX_ASM) && defined(NPY__CPU_TARGET_VSX3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As explained in https://numpy.org/devdocs/reference/simd/how-it-works.html#generating-the-main-configuration-header, these NPY__CPU_TARGET_
defines shouldn't be used. Why is this change necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that VSX2 targets (compiled with -mcpu=power8 -mvsx) were trying to use VSX3 intrinsics, causing compilation errors.
NPY_HAVE_VSX3 tells us the project supports VSX3, but NPY__CPU_TARGET_VSX3 tells us we're currently compiling a VSX3-specific target. We need both to prevent VSX2 targets from using VSX3 intrinsics.
This isn't runtime dispatch it's preventing compilation errors during the build. The macro is only defined when compiling VSX3 targets so it's the right tool for this job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NPY__CPU_TARGET_VSX3
is a private #definition and should not be used here.
Filtering out VSX
from the regex of VSX3
and VSX4
should fix the build issue.
NPY__CPU_TARGET_VSX3
was originally used internally with distutils for
dispatchable sources, serving as a helper to define implied feature macros.
Since moving to Meson, we explicitly pass implied feature #definitions instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So on other words, adding && defined(NPY__CPU_TARGET_VSX3)
will always disables this branch.
@@ -13,7 +13,7 @@ if compiler_id == 'clang' | |||
VSX.update(args: ['-mvsx', '-maltivec']) | |||
endif | |||
VSX2 = mod_features.new( | |||
'VSX2', 2, implies: VSX, args: {'val': '-mcpu=power8', 'match': '.*vsx'}, | |||
'VSX2', 2, implies: VSX, args: ['-mcpu=power8', '-mvsx'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-mvsx
is already defined in the VSX
feature, and right in this line we have implies: VSX
. So this looks like it'll start inserting duplicate flags.
Also, can you explain why 'match': '.*vsx'
needs to be dropped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the 'match': '.*vsx' thinking the pattern matching wasn't working properly for VSX2 targets
The build error suggests VSX2 targets aren't getting the right flags
but maybe the real issue is elsewhere in the inheritance chain. Should I revert this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dropping 'match': '.*vsx'. It's only needed here alongside the other targets.
The match
key is a regex used to filter implied flags. Normally, the compiler
should imply this by default, which is why it’s explicitly defined here.
Fixes #29622
Problem:
When
CFLAGS='-mcpu=power9'
is set on PPC64LE systems, NumPy build fails with:Root Cause:
VSX2 targets were missing
-mvsx
flagVSX2 targets incorrectly used VSX3 intrinsics even when compiled for VSX2
Solution:
-mvsx
flag to VSX2 configuration inmeson_cpu/ppc64/meson.build
-mvsx
flag to VSX2 flags innumpy/distutils/ccompiler_opt.py
NPY__CPU_TARGET_VSX3
innumpy/_core/src/common/half.hpp