BLD: PPC issues with new CPU dispatcher #24789

matoro · 2023-09-24T22:15:18Z

Describe the issue:

Collecting here multiple issues I observed that prevent building on PPC with the new CPU dispatcher.

Issue 1:

When building on a POWER9 host, which supports VSX3 but not VSX4, VSX4 is incorrectly detected as supported.

Host machine cpu family: ppc64
Host machine cpu: ppc64le
Program python found: YES (/usr/bin/python3.11)
Found pkg-config: /usr/bin/pkg-config (2.0.3)
Run-time dependency python found: YES 3.11
Has header "Python.h" with dependency python-3.11: YES
Compiler for C supports arguments -fno-strict-aliasing: YES
Message: Appending option "detect" to "cpu-baseline" due to detecting global architecture c_arg "-mcpu=native"
Test features "VSX VSX2 VSX3 VSX4" : Supported
Message: During parsing cpu-dispatch: The following CPU features were ignored due to platform incompatibility or lack of support:
"XOP FMA4"
Test features "VSX VSX2 VSX3 VSX4" : Supported
Configuring npy_cpu_dispatch_config.h using configuration
Message:
CPU Optimization Options
  baseline:
    Requested : min+detect
    Enabled   : VSX VSX2 VSX3 VSX4
  dispatch:
    Requested : max -xop -fma4
    Enabled   :
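
One way to cross-check what the compiler itself is targeting, independent of the build system's feature test, is to look at the compiler's predefined macros. The sketch below is not part of NumPy or its Meson machinery; the helper names are hypothetical. It assumes GCC or Clang, which predefine `_ARCH_PWR7` through `_ARCH_PWR10` according to the effective `-mcpu`, and it assumes NumPy's naming in which VSX3 corresponds to POWER9 and VSX4 to POWER10.

```cpp
// Hypothetical diagnostic helpers, not NumPy code.
// Reports the newest POWER generation the compiler is targeting, based on
// the _ARCH_PWR* macros that GCC and Clang predefine for PowerPC targets.
// Returns 0 when none of them is defined (older or non-PowerPC target).
constexpr int targeted_power_generation() {
#if defined(_ARCH_PWR10)
    return 10;
#elif defined(_ARCH_PWR9)
    return 9;
#elif defined(_ARCH_PWR8)
    return 8;
#elif defined(_ARCH_PWR7)
    return 7;
#else
    return 0;
#endif
}

// True unless NPY_HAVE_VSX4 was passed on the command line while the
// compiler is targeting something older than POWER10.
constexpr bool vsx4_define_is_consistent() {
#if defined(NPY_HAVE_VSX4)
    return targeted_power_generation() >= 10;
#else
    return true;
#endif
}
```

Printing both values from a small `main` compiled with the flags from the failing build (for example `-mcpu=power10 -DNPY_HAVE_VSX4 -mcpu=native`) should show whether the `NPY_HAVE_VSX4` define agrees with the target the compiler actually selected.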

Issue 2:

Even in VSX3-conditional code, the build fails because of an incorrect variable name in the inline ASM.

[9/297] powerpc64le-unknown-linux-gnu-g++ -Inumpy/core/libnpymath.a.p -Inumpy/core -I../numpy-1.26.0/numpy/core -Inumpy/core/include -I../numpy-1.26.0/numpy/core/include -I../numpy-1.26.0/numpy/core/src/npymath -I../numpy-1.26.0/numpy/core/src/common -I/usr/include/python3.11 -I/var/tmp/portage/dev-python/numpy-1.26.0/work/numpy-1.26.0-python3_11/meson_cpu -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -std=c++17 -O3 -mcpu=power10 -DNPY_HAVE_VSX2 -DNPY_HAVE_VSX -DNPY_HAVE_VSX_ASM -DNPY_HAVE_VSX3 -DNPY_HAVE_VSX4 -DNPY_HAVE_VSX4_MMA -O3 -mcpu=native -mtune=native -pipe -fno-strict-aliasing -DNDEBUG -fPIC -MD -MQ numpy/core/libnpymath.a.p/src_npymath_halffloat.cpp.o -MF numpy/core/libnpymath.a.p/src_npymath_halffloat.cpp.o.d -o numpy/core/libnpymath.a.p/src_npymath_halffloat.cpp.o -c ../numpy-1.26.0/numpy/core/src/npymath/halffloat.cpp
FAILED: numpy/core/libnpymath.a.p/src_npymath_halffloat.cpp.o
powerpc64le-unknown-linux-gnu-g++ -Inumpy/core/libnpymath.a.p -Inumpy/core -I../numpy-1.26.0/numpy/core -Inumpy/core/include -I../numpy-1.26.0/numpy/core/include -I../numpy-1.26.0/numpy/core/src/npymath -I../numpy-1.26.0/numpy/core/src/common -I/usr/include/python3.11 -I/var/tmp/portage/dev-python/numpy-1.26.0/work/numpy-1.26.0-python3_11/meson_cpu -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -std=c++17 -O3 -mcpu=power10 -DNPY_HAVE_VSX2 -DNPY_HAVE_VSX -DNPY_HAVE_VSX_ASM -DNPY_HAVE_VSX3 -DNPY_HAVE_VSX4 -DNPY_HAVE_VSX4_MMA -O3 -mcpu=native -mtune=native -pipe -fno-strict-aliasing -DNDEBUG -fPIC -MD -MQ numpy/core/libnpymath.a.p/src_npymath_halffloat.cpp.o -MF numpy/core/libnpymath.a.p/src_npymath_halffloat.cpp.o.d -o numpy/core/libnpymath.a.p/src_npymath_halffloat.cpp.o -c ../numpy-1.26.0/numpy/core/src/npymath/halffloat.cpp
In file included from ../numpy-1.26.0/numpy/core/src/common/common.hpp:10,
                 from ../numpy-1.26.0/numpy/core/src/npymath/halffloat.cpp:12:
../numpy-1.26.0/numpy/core/src/common/half.hpp: In member function ‘np::Half::operator double() const’:
../numpy-1.26.0/numpy/core/src/common/half.hpp:116:38: error: ‘vf32’ was not declared in this scope
  116 |                              : "=wa"(vf32)
      |                                      ^~~~
ninja: build stopped: subcommand failed.

Issue 3:

After applying the following patch to fix Issue 2:

diff --git a/numpy/core/src/common/half.hpp b/numpy/core/src/common/half.hpp
index 054c99859..b85b592bb 100644
--- a/numpy/core/src/common/half.hpp
+++ b/numpy/core/src/common/half.hpp
@@ -113,7 +113,7 @@ class Half final {
     #elif defined(NPY_HAVE_VSX3) && defined(NPY_HAVE_VSX_ASM)
         __vector float vf64;
         __asm__ __volatile__("xvcvhpdp %x0,%x1"
-                             : "=wa"(vf32)
+                             : "=wa"(vf64)
                              : "wa"(vec_splats(bits_)));
         return vec_extract(vf64, 0);
     #else

it would seem that the assembly uses invalid instructions.

[9/297] powerpc64le-unknown-linux-gnu-g++ -Inumpy/core/libnpymath.a.p -Inumpy/core -I../numpy-1.26.0/numpy/core -Inumpy/core/include -I../numpy-1.26.0/numpy/core/include -I../numpy-1.26.0/numpy/core/src/npymath -I../numpy-1.26.0/numpy/core/src/common -I/usr/include/python3.11 -I/var/tmp/portage/dev-python/numpy-1.26.0/work/numpy-1.26.0-python3_11/meson_cpu -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -std=c++17 -O3 -mcpu=power10 -DNPY_HAVE_VSX2 -DNPY_HAVE_VSX -DNPY_HAVE_VSX_ASM -DNPY_HAVE_VSX3 -DNPY_HAVE_VSX4 -DNPY_HAVE_VSX4_MMA -O3 -mcpu=native -mtune=native -pipe -fno-strict-aliasing -DNDEBUG -fPIC -MD -MQ numpy/core/libnpymath.a.p/src_npymath_halffloat.cpp.o -MF numpy/core/libnpymath.a.p/src_npymath_halffloat.cpp.o.d -o numpy/core/libnpymath.a.p/src_npymath_halffloat.cpp.o -c ../numpy-1.26.0/numpy/core/src/npymath/halffloat.cpp
FAILED: numpy/core/libnpymath.a.p/src_npymath_halffloat.cpp.o
powerpc64le-unknown-linux-gnu-g++ -Inumpy/core/libnpymath.a.p -Inumpy/core -I../numpy-1.26.0/numpy/core -Inumpy/core/include -I../numpy-1.26.0/numpy/core/include -I../numpy-1.26.0/numpy/core/src/npymath -I../numpy-1.26.0/numpy/core/src/common -I/usr/include/python3.11 -I/var/tmp/portage/dev-python/numpy-1.26.0/work/numpy-1.26.0-python3_11/meson_cpu -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -std=c++17 -O3 -mcpu=power10 -DNPY_HAVE_VSX2 -DNPY_HAVE_VSX -DNPY_HAVE_VSX_ASM -DNPY_HAVE_VSX3 -DNPY_HAVE_VSX4 -DNPY_HAVE_VSX4_MMA -O3 -mcpu=native -mtune=native -pipe -fno-strict-aliasing -DNDEBUG -fPIC -MD -MQ numpy/core/libnpymath.a.p/src_npymath_halffloat.cpp.o -MF numpy/core/libnpymath.a.p/src_npymath_halffloat.cpp.o.d -o numpy/core/libnpymath.a.p/src_npymath_halffloat.cpp.o -c ../numpy-1.26.0/numpy/core/src/npymath/halffloat.cpp
{standard input}: Assembler messages:
{standard input}:38: Error: unrecognized opcode: `xvcvhpdp'
{standard input}:83: Error: unrecognized opcode: `xvcvdphp'
{standard input}:846: Error: unrecognized opcode: `xvcvdphp'
{standard input}:892: Error: unrecognized opcode: `xvcvhpdp'
ninja: build stopped: subcommand failed.

This is using the following binutils:

$ ld --version
GNU ld (Gentoo 2.41 p2) 2.41.0
Copyright (C) 2023 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.

All three issues occur on both big and little endian, regardless of whether -mcpu=native or -mcpu=power9 is used. As always, I can provide free shell access to the hardware on which I encountered these issues, if desired.
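
For reference when reading Issues 2 and 3: the failing code sits in `np::Half::operator double()`, so the asm path is meant to turn the 16-bit half-precision pattern in `bits_` into a double. The sketch below shows that conversion in portable C++14 so results from a hardware path can be compared against it. It is an illustration only, not NumPy's fallback implementation; the names `pow2` and `half_bits_to_double` are hypothetical, and NaN payloads are not preserved. As a side note, Power ISA 3.0 does define scalar conversions named `xscvhpdp` and `xscvdphp`; whether those are what the code intended is not confirmed in this report.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: exact power of two for small integer exponents.
constexpr double pow2(int e) {
    double r = 1.0;
    for (; e > 0; --e) r *= 2.0;
    for (; e < 0; ++e) r *= 0.5;
    return r;
}

// Hypothetical reference conversion from IEEE 754 binary16 bits to double.
// Layout: 1 sign bit, 5 exponent bits (bias 15), 10 mantissa bits.
constexpr double half_bits_to_double(std::uint16_t h) {
    const bool negative = (h & 0x8000u) != 0;
    const int exp = (h >> 10) & 0x1F;
    const int man = h & 0x3FF;
    if (exp == 0x1F) {
        // All-ones exponent: infinity if the mantissa is zero, else NaN.
        if (man != 0) return std::numeric_limits<double>::quiet_NaN();
        return negative ? -std::numeric_limits<double>::infinity()
                        : std::numeric_limits<double>::infinity();
    }
    const double sign = negative ? -1.0 : 1.0;
    if (exp == 0) {
        // Zero or subnormal: value is man * 2^-24 (sign kept for -0.0).
        return sign * man * pow2(-24);
    }
    // Normal: (1 + man/1024) * 2^(exp-15) == (1024 + man) * 2^(exp-25).
    return sign * (1024 + man) * pow2(exp - 25);
}
```

Every half-precision value is exactly representable as a double, so a correct hardware conversion should agree with this sketch bit for bit on all non-NaN inputs.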

@seiko2plus @rgommers

Reproduce the code example:

N/A

Error message:

No response

Runtime information:

N/A

Context for the issue:

No response

charris · 2023-09-24T23:33:20Z

Is this a build of the main branch?

matoro · 2023-09-24T23:35:06Z

> Is this a build of the main branch?

This is on the 1.26.0 release.

mattip · 2023-09-26T08:30:14Z

@seiko2plus thoughts?

matoro · 2023-10-15T20:15:52Z

All issues still present in 1.26.1.

yselkowitz · 2023-11-06T04:29:16Z

Also seeing this in Fedora ELN.

rgommers · 2023-11-06T09:31:26Z

gh-24806 looks close to finished, and the added CI job is green. Not entirely sure why it's still marked as draft, but it may be worth testing to see if it resolves your issues @matoro and @yselkowitz.

matoro · 2023-11-06T15:56:53Z

> gh-24806 looks close to finished, and the added CI job is green. Not entirely sure why it's still marked as draft, but it may be worth testing to see if it resolves your issues @matoro and @yselkowitz.

I just attempted it, but the PR targets the 2.0 branch and is incompatible with the 1.26 branch, so I can't test it.

seiko2plus · 2023-11-07T02:08:03Z

@matoro, my apologies for the delayed response; here's a backport: #25083

matoro · 2023-11-07T02:15:56Z

> @matoro, My apologies for delayed response, here's a backport for it #25083

Works for me here on POWER9, both LE and BE. Thanks!

matoro added the 00 - Bug label Sep 24, 2023

seiko2plus mentioned this issue Sep 26, 2023

BUG: Fix build on ppc64 when the baseline set to Power9 or higher #24806

Merged

mattip mentioned this issue Nov 7, 2023

BUG: Backport fix build on ppc64 when the baseline set to Power9 or higher #25083

Merged

charris closed this as completed in #24806 Nov 7, 2023

matoro mentioned this issue Nov 19, 2023

BUG: VSX3 optimizations broken with float16 on big-endian #25178

Closed
