SumitkCodes

Fixes #29622

Problem:
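When CFLAGS='-mcpu=power9' is set on PPC64LE systems, the NumPy build fails with the error reported in the linked issue:
error: '__builtin_vsx_vextract_fp_from_shorth' requires the '-mcpu=power9' and '-mvsx' options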
Root Cause:
- VSX2 targets were missing the -mvsx flag
- VSX2 targets incorrectly used VSX3 intrinsics even when compiled for VSX2
Solution:

  1. Add -mvsx flag to VSX2 configuration in meson_cpu/ppc64/meson.build
  2. Add -mvsx flag to VSX2 flags in numpy/distutils/ccompiler_opt.py
  3. Guard VSX3 intrinsics with NPY__CPU_TARGET_VSX3 in numpy/_core/src/common/half.hpp

Fixes numpy#29622: VSX2 targets were missing -mvsx flag and incorrectly using VSX3 intrinsics.

- Add -mvsx to VSX2 configuration
- Guard VSX3 intrinsics with NPY__CPU_TARGET_VSX3
- Resolves build error when CFLAGS='-mcpu=power9' on PPC64LE
@charris changed the title from "Fix PPC64LE build failure with -mcpu=power9 flag" to "BLD: Fix PPC64LE build failure with -mcpu=power9 flag" on Aug 25, 2025
@charris added the "36 - Build" (build related PR) label on Aug 25, 2025
@mattip
Member

mattip commented Aug 25, 2025

There are some failing tests around compiler options.

@SumitkCodes
Author

@mattip Working on it

The test was expecting '-mcpu=power8' but the fix correctly generates
'-mcpu=power8 -mvsx' for VSX2 targets. Update test expectations to
match the actual (correct) compiler flags being generated.

This resolves the failing tests that were expecting the old format
while the code correctly generates the new format with -mvsx flag.

Fixes the failing tests mentioned in PR numpy#29627.
@SumitkCodes
Author

@mattip, can you check the changes now?

@mattip
Member

mattip commented Aug 26, 2025

LGTM. It would be nice if @seiko2plus could chime in.

@mattip added this to the 2.4.0 release milestone on Aug 26, 2025
@rgommers
Member

rgommers left a comment

A few questions. The change does solve a reported issue, but I'm not sure that everything is correct, so some more explanations would be helpful.

@@ -30,7 +30,7 @@ class Half final {
 #if defined(NPY_HAVE_FP16)
 __m128 mf = _mm_load_ss(&f);
 bits_ = _mm_extract_epi16(_mm_cvtps_ph(mf, _MM_FROUND_TO_NEAREST_INT), 0);
-#elif defined(NPY_HAVE_VSX3) && defined(NPY_HAVE_VSX_ASM)
+#elif defined(NPY_HAVE_VSX3) && defined(NPY_HAVE_VSX_ASM) && defined(NPY__CPU_TARGET_VSX3)
Member

As explained in https://numpy.org/devdocs/reference/simd/how-it-works.html#generating-the-main-configuration-header, these NPY__CPU_TARGET_ defines shouldn't be used. Why is this change necessary?

Author

The issue is that VSX2 targets (compiled with -mcpu=power8 -mvsx) were trying to use VSX3 intrinsics, causing compilation errors.
NPY_HAVE_VSX3 tells us the project supports VSX3, but NPY__CPU_TARGET_VSX3 tells us we're currently compiling a VSX3-specific target. We need both to prevent VSX2 targets from using VSX3 intrinsics.
This isn't runtime dispatch; it's about preventing compilation errors during the build. The macro is only defined when compiling VSX3 targets, so it's the right tool for this job.

Member

@seiko2plus Sep 10, 2025

NPY__CPU_TARGET_VSX3 is a private #define and should not be used here.
Filtering out VSX from the regex of VSX3 and VSX4 should fix the build issue.

NPY__CPU_TARGET_VSX3 was originally used internally with distutils for
dispatchable sources, serving as a helper to define implied feature macros.
Since moving to Meson, we explicitly pass the implied feature #defines instead.

Member

So in other words, adding && defined(NPY__CPU_TARGET_VSX3) will always disable this branch.
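
To illustrate the point above, here is a minimal Python model of how the preprocessor guard is evaluated. It is only a sketch: `branch_enabled` is a hypothetical helper, and the set of macros assumed for a Meson build is inferred from the comments in this thread (NPY__CPU_TARGET_VSX3 is not defined there), not taken from the build system itself.

```python
def branch_enabled(required, defined):
    """Model `#elif defined(A) && defined(B) ...`: true only if every macro is defined."""
    return all(name in defined for name in required)

# Assumed macro set for a VSX3-capable Meson build. NPY__CPU_TARGET_VSX3 is
# left out because, per the discussion, Meson does not define it.
meson_defines = {"NPY_HAVE_VSX3", "NPY_HAVE_VSX_ASM"}

old_guard = ["NPY_HAVE_VSX3", "NPY_HAVE_VSX_ASM"]
new_guard = old_guard + ["NPY__CPU_TARGET_VSX3"]

old_taken = branch_enabled(old_guard, meson_defines)
new_taken = branch_enabled(new_guard, meson_defines)
print(old_taken, new_taken)
```

Under that assumption the original guard selects the VSX3 branch while the extended guard never does, so the build error would disappear only because the fast path is no longer compiled at all.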

@@ -13,7 +13,7 @@ if compiler_id == 'clang'
 VSX.update(args: ['-mvsx', '-maltivec'])
 endif
 VSX2 = mod_features.new(
-'VSX2', 2, implies: VSX, args: {'val': '-mcpu=power8', 'match': '.*vsx'},
+'VSX2', 2, implies: VSX, args: ['-mcpu=power8', '-mvsx'],
Member

-mvsx is already defined in the VSX feature, and right in this line we have implies: VSX. So this looks like it'll start inserting duplicate flags.

Also, can you explain why 'match': '.*vsx' needs to be dropped?
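
The duplicate-flag concern can be sketched in a few lines of Python. This assumes, purely for illustration, that a feature's final flags are the implied feature's flags followed by its own, with no de-duplication; `combine_args` is a hypothetical helper, not part of NumPy's build code.

```python
def combine_args(implied_args, own_args):
    """Hypothetical: flags of implied features come first, then the feature's own flags."""
    return list(implied_args) + list(own_args)

vsx_args = ["-mvsx"]                             # already provided by the VSX feature
vsx2_args_proposed = ["-mcpu=power8", "-mvsx"]   # args from the proposed VSX2 change

flags = combine_args(vsx_args, vsx2_args_proposed)
dup_count = flags.count("-mvsx")
print(flags, dup_count)
```

If the real build de-duplicates flags, the duplication is harmless, but the extra -mvsx in VSX2 would still be redundant.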

Author

I removed 'match': '.*vsx' because I thought the pattern matching wasn't working properly for VSX2 targets.
The build error suggests VSX2 targets aren't getting the right flags,
but maybe the real issue is elsewhere in the inheritance chain. Should I revert this change?

Member

Dropping 'match': '.*vsx'. It's only needed here alongside the other targets.
The match key is a regex used to filter implied flags. Normally, the compiler
should imply this by default, which is why it’s explicitly defined here.
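
As a rough illustration of the match key described above, the following Python sketch filters implied flags with a regex before adding the feature's own flag. `effective_flags` is a hypothetical helper; the full-match semantics and the flag ordering are assumptions and may differ from what NumPy's Meson feature module actually does.

```python
import re

def effective_flags(implied_flags, val, match=None):
    """Hypothetical helper: drop implied flags that `match` matches, then add `val`."""
    if match is None:
        kept = list(implied_flags)
    else:
        pattern = re.compile(match)
        kept = [flag for flag in implied_flags if not pattern.fullmatch(flag)]
    return kept + [val]

implied = ["-mvsx"]  # flag contributed by the implied VSX feature

with_match = effective_flags(implied, "-mcpu=power8", match=".*vsx")
without_match = effective_flags(implied, "-mcpu=power8")
print(with_match)
print(without_match)
```

With the regex, the implied -mvsx is filtered out and only -mcpu=power8 remains, which matches the flags the tests expected before this change; without it, -mvsx is kept alongside -mcpu=power8.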

@seiko2plus self-assigned this on Sep 10, 2025
Successfully merging this pull request may close these issues.

BUG: Build failure with -mcpu=power9: error: '__builtin_vsx_vextract_fp_from_shorth' requires the '-mcpu=power9' and '-mvsx' options