BUG: VSX3 optimizations broken with float16 on big-endian #25178

Closed
matoro opened this issue Nov 19, 2023 · 3 comments · Fixed by #25195

Comments

@matoro
Contributor

matoro commented Nov 19, 2023

Describe the issue:

Downstream bug: https://bugs.gentoo.org/917544

When compiling for a target supporting VSX3 optimizations such as -mcpu=power9 on big-endian, numpy.float16 is broken. Targets at VSX2 and below, i.e. -mcpu=power8, are fine.

The problem does not reproduce on little-endian at any optimization level.

This is reflected in the tests, which pass at -mcpu=power8 and have 255 failures at -mcpu=power9. Full test logs: build.log

The snippet below is a minimized reproducer which demonstrates the problem.

Unfortunately, it is impossible to compare against older versions, because this is the first working version using meson that even compiles with -mcpu=power9, due to #24789.

If you don't have suitable hardware available, I offer free shell access to the machines I used to reproduce this.

CC @seiko2plus @zeldin

Reproduce the code example:

import numpy as np
np.float16(1.0)
np.float16(1.0)+0.0    # operand is widened before addition
np.float16(1.0)+np.float16(0.0)
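
For anyone who wants to check more than one pair of operands, the reproducer can be widened into a small self-check. This is only a sketch, not part of the NumPy test suite: the helper name `float16_add_mismatches` and the chosen sample values are assumptions made for illustration. It compares float16 + float16 against the same sum computed in float32 and rounded back to float16; on a healthy build the list of mismatches should be empty, while on the affected build the pair (1.0, 0.0) would be expected to show up, given the output reported here.

```python
import sys

import numpy as np


def float16_add_mismatches(values):
    """Return (a, b, got, expected) for every pair whose float16 sum
    disagrees with the float32 sum rounded back to float16.

    Hypothetical helper for illustration only.
    """
    mismatches = []
    for a in values:
        for b in values:
            x, y = np.float16(a), np.float16(b)
            got = x + y  # float16 + float16, the path reported as broken
            # Reference: widen both operands, add, then round back to float16.
            expected = np.float16(np.float32(x) + np.float32(y))
            if got != expected:
                mismatches.append((a, b, float(got), float(expected)))
    return mismatches


# Sample values whose pairwise sums are exactly representable in float16.
samples = [0.0, 0.5, 1.0, 2.0, -1.0]
print("byte order:", sys.byteorder)
print("mismatches:", float16_add_mismatches(samples))
```

Running the same script on a -mcpu=power8 and a -mcpu=power9 build gives a quick side-by-side comparison without running the full test suite.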

Error message:

Python 3.11.5 (main, Oct 21 2023, 17:50:00) [GCC 12.3.1 20230526] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
/usr/lib/python3.11/site-packages/numpy/core/getlimits.py:52: RuntimeWarning: divide by zero encountered in log10
  self.precision = int(-log10(self.eps))
>>> np.float16(1.0)
1.0
>>> np.float16(1.0)+0.0    # operand is widened before addition
1.0
>>> np.float16(1.0)+np.float16(0.0)
0.0

Runtime information:

Python 3.11.5 (main, Aug 28 2023, 05:57:37) [GCC 12.3.1 20230526] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys, numpy; print(numpy.__version__); print(sys.version)
1.26.2
3.11.5 (main, Aug 28 2023, 05:57:37) [GCC 12.3.1 20230526]
>>> print(numpy.show_runtime())
WARNING: `threadpoolctl` not found in system! Install it by `pip install threadpoolctl`. Once installed, try `np.show_runtime` again for more detailed build information
[{'numpy_version': '1.26.2',
  'python': '3.11.5 (main, Aug 28 2023, 05:57:37) [GCC 12.3.1 20230526]',
  'uname': uname_result(system='Linux', node='matoro-ppc64dev', release='6.6.1-gentoo-ppc64', version='#1 SMP Wed Nov  8 14:31:08 EST 2023', machine='ppc64')},
 {'simd_extensions': {'baseline': ['VSX', 'VSX2'],
                      'found': ['VSX3'],
                      'not_found': ['VSX4']}}]
None
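
When comparing hosts, it can help to record the byte order next to the runtime information. The following is a minimal sketch using only the Python standard library; `describe_host` is a hypothetical helper name, and the snippet does not diagnose the bug itself. It just reports the host byte order, the machine name, and how the half-precision value 1.0 (bit pattern 0x3C00) is laid out in native byte order.

```python
import platform
import struct
import sys


def describe_host():
    """Collect byte order, machine name, and the native bytes of float16 1.0.

    Hypothetical helper for illustration only.
    """
    return {
        "byteorder": sys.byteorder,     # 'big' or 'little'
        "machine": platform.machine(),  # 'ppc64' on the machine in this report
        # 'e' is the struct format for IEEE 754 half precision;
        # '=' selects native byte order with standard size (2 bytes).
        "half_one_native": struct.pack("=e", 1.0).hex(),
    }


print(describe_host())
```

On a big-endian host the `half_one_native` field reads `3c00`; on a little-endian host it reads `003c`.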

Context for the issue:

No response

@seiko2plus
Member

Thanks for reporting this issue, #25195 should resolve it.

@mattip
Member

mattip commented Nov 20, 2023

@matoro could you confirm the fix from #25195 which was merged?

@matoro
Contributor Author

matoro commented Nov 20, 2023

> @matoro could you confirm the fix from #25195 which was merged?

The original reporter already confirmed it in our bug tracker! https://bugs.gentoo.org/917544#c8


4 participants