BUG: Fixes for building on Cygwin. #18102

charris · 2020-12-31T23:39:02Z

Rebase of #16246.

This contains the changes needed to get numpy to compile on Cygwin, a few changes to try to reduce the number of test failures due to extension module overlap, and marks on the differences I think are due to the floating-point implementation or fork() failures.

Someone should probably check whether the list of ignored floating-point failures is reasonable.
Most of those have been around since 1.16, which was a while back.

Rebased to get all the commits gathered together and on top of master. They should probably all be squashed at some point;
this is for review.

@DWesl ping.

…rebase. Cygwin needs each DLL to have a unique address for fork() emulation to work. I'm hoping that calling rebase on a complete list of modules compiled in this session, plus those installed globally, will allow the test suite to get an accurate result for more tests.

… on cygwin. Most likely I should be testing for newlib (the C runtime, roughly takes the place of glibc, I think), not cygwin (the emulation layer on Windows), but I have no idea how to do that from within python. If someone working on embedded systems runs into this issue, this hopefully gives them some idea where to start.

…win. See two commits back for an explanation of why fork() fails on cygwin. Alternately, see the much better explanation at: https://cygwin.com/cygwin-ug-net/highlights.html#ov-hi-process with hints on error messages and workarounds at: https://cygwin.com/faq.html#faq.using.fixing-fork-failures

This was actually making the situation worse, I think because I forgot to include the NumPy dlls themselves. Removing this made for many fewer fork() failures when I tried it. `python3 -m pip show numpy --files | grep dll` will give a list of dlls, with at least a relative path. This can be done in python with subprocess (will require adding pip as a runtime dependency), but I'm not sure that's in scope here.
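A minimal sketch of the subprocess approach mentioned above, assuming pip is importable at runtime. The helper names `dll_lines` and `list_package_dlls` are made up for illustration; `pip show --files` is the same command as in the shell one-liner. Unlike `grep dll`, this keeps only entries that end in `.dll`.

```python
import subprocess
import sys


def dll_lines(pip_show_output):
    """Return the file entries ending in .dll from `pip show --files` output."""
    return [
        line.strip()
        for line in pip_show_output.splitlines()
        if line.strip().lower().endswith(".dll")
    ]


def list_package_dlls(package="numpy"):
    """Run `python -m pip show PACKAGE --files` and keep only the DLL entries."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "show", package, "--files"],
        check=True,
        capture_output=True,
        text=True,
    )
    return dll_lines(result.stdout)
```

The returned paths are relative to the `Location:` line of the same pip output, so a caller that wants to hand them to another tool would still need to join the two.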

I think I saw a test failure because the generated tempfile name contained the string "mkl". I think what the code is trying to do is to change the "[mkl]" section to a "[DEFAULT]" section without changing any of the contents of that section. I think this change should do the trick.
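To illustrate the failure mode described here, the sketch below renames only the section header and leaves every other occurrence of "mkl" alone. It is not the code from the test file; `rename_section` and the sample contents are hypothetical.

```python
import re


def rename_section(cfg_text, old="mkl", new="DEFAULT"):
    """Rename the first `[old]` section header, leaving all other text untouched."""
    # Match the header only when it is alone on its line.
    header = re.compile(r"^\[" + re.escape(old) + r"\][ \t]*$", flags=re.MULTILINE)
    return header.sub(lambda match: "[" + new + "]", cfg_text, count=1)


# Hypothetical site.cfg contents whose temporary path happens to contain "mkl".
sample = "[mkl]\nlibrary_dirs = /tmp/tmpmklx1/lib\n"
print(rename_section(sample))
```

A plain `str.replace("mkl", "DEFAULT")` on the same sample would also rewrite the path, which is the kind of breakage a tempfile name containing "mkl" could cause.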

charris · 2020-12-31T23:44:38Z

A lot of the fixes look to be disabling tests, which seems not quite right. Ideally the tests would pass without being marked. It might be that we should just wait for Cygwin to improve, or stop supporting it.

DWesl · 2021-01-01T16:45:41Z

A lot of those xfail marks are for fork() failures on 32-bit Cygwin, which the Cygwin maintainers have been discouraging people from using for a year or so now because there are so many, so just dropping those would work for me. If that becomes a problem on 64-bit again I can mark the failing tests then.

The edits to npy_common.h to mark which floating functions have trouble should probably stay in, and the note that the Windows dynamic loader doesn't really have a concept of rpath should probably also stay. I'm fine with keeping the list of floating-point tests that give weird results as a local patch or note to self.

Should I start the PR over? Proposed commit order:

1. Change some tests with loops to use pytest.mark.parametrize
2. The test_system_info patch to avoid weird behavior with certain tempfile names
3. Tell fcompiler.gnu that Windows doesn't do rpath
4. Tell npy_common.h which functions numpy should use its own versions for

The first two might work better as separate PRs.

charris · 2021-01-01T16:55:59Z

Should I start the PR over? Proposed commit order:

You are in a better position to carry this forward than I am. How would you like to proceed?

DWesl · 2021-01-01T16:56:05Z

The added compiler flags to avoid a ufunc crash got merged in another PR, and my fork problems on 64-bit seem to have resolved themselves, so those portions of the PR are gone now, and there's not terribly much left.

DWesl · 2021-01-01T17:01:31Z

Should I start the PR over? Proposed commit order:

You are in a better position to carry this forward than I am. How would you like to proceed?

The first two changes in that list aren't really connected to the others, so I should probably separate those into different PRs focused on just those. I'm not sure if the Cygwin-specific changes would be better handled on this PR or as a new PR or what.

charris · 2021-01-01T17:15:39Z

@DWesl I would prefer to close this and deal with new PRs. I want to know what would be most convenient for you.

DWesl · 2021-01-03T16:07:48Z

The Test_SIMD_ALL_256_FMA3__AVX2_[usf]{8,16,32,64} tests are segfaulting, as are the Test_SIMD_FP_256_FMA3__AVX2_f{32,64} tests and the Test_SIMD_INT_256_FMA3__AVX2_[us]{8,16,32,64} tests and the Test_SIMD_BOOL_256_FMA3__AVX2_b{8,16,32,64} tests. I'll try to figure out how to skip those so I can avoid using pytest-forked, which makes the tests really slow.

I should probably also figure out why they're segfaulting at some point. /proc/cpuinfo mentions AVX2 and FMA but not FMA3, and that roughly exhausts my knowledge of how to debug the crashes.
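A quick way to see exactly which flags `/proc/cpuinfo` reports is to parse its `flags` line. This is only a sketch: the helper names are hypothetical, and it assumes the Linux-style layout with a `flags : ...` line.

```python
def cpu_flags(cpuinfo_text):
    """Return the lower-cased flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return {flag.lower() for flag in value.split()}
    return set()


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read and parse the flags from PATH (assumed to exist on this platform)."""
    with open(path) as f:
        return cpu_flags(f.read())
```

For example, `{"avx2", "fma"} <= read_cpu_flags()` checks that both flags are present.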

rgommers · 2021-01-24T10:25:29Z

I should probably also figure out why they're segfaulting at some point. /proc/cpuinfo mentions AVX2 and FMA but not FMA3, and that roughly exhausts my knowledge of how to debug the crashes.

@seiko2plus or @mattip any advice here on how to debug?

seiko2plus · 2021-01-24T20:47:06Z

@DWesl, we use the term FMA3 instead of FMA to avoid any confusion with AMD/FMA4.

@rgommers,

any advice here on how to debug?

* Testing the CPU detection mechanism -> `python runtests.py -t numpy/core/tests/test_cpu_features.py`
  NOTE: You will have to patch `test_cpu_features.py` to enable `Cygwin`.
* Testing the `_SIMD` module itself without any dispatched features:
  * `python runtests.py --cpu-dispatch="none" -t numpy/core/tests/test_simd_module.py`
  * `python runtests.py --cpu-dispatch="none" -t numpy/core/tests/test_simd.py`
* Testing the `_SIMD` module with certain dispatched features, e.g. `sse41`
  `python runtests.py --simd-test="baseline sse41" -t numpy/core/tests/test_simd.py`
* Checking the build log, and finally GDB or LLDB

DWesl · 2021-01-31T21:35:05Z

@DWesl, we use the term FMA3 instead of FMA to avoid any confusion with AMD/FMA4.

@rgommers,

any advice here on how to debug?
* Testing CPU detecting mechanism -> `python runtests.py -t numpy/core/tests/test_cpu_features.py`
  NOTE: You will have to patch `test_cpu_features.py` to enable `Cygwin`.

python runtests.py -t numpy/core/tests/test_cpu_features.py
Building, see build.log...
Build OK
NumPy version 1.21.0.dev0+393.g0386777c3
NumPy relaxed strides checking option: True
NumPy CPU features: SSE SSE2 SSE3 SSSE3* SSE41* POPCNT* SSE42* AVX* F16C* FMA3* AVX2* AVX512F? AVX512CD? AVX512_KNL? AVX512_KNM? AVX512_SKX? AVX512_CLX? AVX512_CNL? AVX512_ICL?
.ss                                                            [100%]
1 passed, 2 skipped in 0.08s

* Testing `_SIMD` module itself without any dispatched features:
  
  * `python runtests.py --cpu-dispatch="none" -t numpy/core/tests/test_simd_module.py`

python runtests.py --cpu-dispatch="none" -t numpy/core/tests/test_simd_module.py
Building, see build.log...
    ... build in progress
Build OK
NumPy version 1.21.0.dev0+393.g0386777c3
NumPy relaxed strides checking option: True
NumPy CPU features: SSE SSE2 SSE3
.........................s..........                                   [100%]
35 passed, 1 skipped in 0.43s

  * `python runtests.py --cpu-dispatch="none" -t numpy/core/tests/test_simd.py`

python runtests.py --cpu-dispatch="none" -t numpy/core/tests/test_simd.py
Building, see build.log...
Build OK
NumPy version 1.21.0.dev0+393.g0386777c3
NumPy relaxed strides checking option: True
NumPy CPU features: SSE SSE2 SSE3
.......................................................................................................................................................................................................... [ 84%]
....................................                                                                                                                                                                       [100%]
238 passed in 1.89s

* Testing `_SIMD` module with certain dispatched features, e.g. `sse41`
  `python runtests.py --simd-test="baseline sse41" -t numpy/core/tests/test_simd.py`

python runtests.py --simd-test="baseline sse41" -t numpy/core/tests/test_simd.py
Building, see build.log...
    ... build in progress
    ... build in progress
Build OK
NumPy version 1.21.0.dev0+393.g0386777c3
NumPy relaxed strides checking option: True
NumPy CPU features: SSE SSE2 SSE3 SSSE3* SSE41* POPCNT* SSE42* AVX* F16C* FMA3* AVX2* AVX512F? AVX512CD? AVX512_KNL? AVX512_KNM? AVX512_SKX? AVX512_CLX? AVX512_CNL? AVX512_ICL?
........................................................................ [ 15%]
........................................................................ [ 30%]
........................................................................ [ 45%]
........................................................................ [ 60%]
........................................................................ [ 75%]
........................................................................ [ 90%]
............................................                             [100%]
476 passed in 7.31s

$ python runtests.py --simd-test="baseline sse41 sse42 avx avx2 fma" -t numpy/core/tests/test_simd.py
Building, see build.log...
    ... build in progress
    ... build in progress
Build OK
NumPy version 1.21.0.dev0+393.g0386777c3
NumPy relaxed strides checking option: True
NumPy CPU features: SSE SSE2 SSE3 SSSE3* SSE41* POPCNT* SSE42* AVX* F16C* FMA3* AVX2* AVX512F? AVX512CD? AVX512_KNL? AVX512_KNM? AVX512_SKX? AVX512_CLX? AVX512_CNL? AVX512_ICL?
........................................................................ [ 15%]
........................................................................ [ 30%]
........................................................................ [ 45%]
........................................................................ [ 60%]
........................................................................ [ 75%]
........................................................................ [ 90%]
............................................                             [100%]
476 passed in 7.40s

$ python runtests.py -t numpy/core/tests/test_simd.py
Building, see build.log...
Build OK
NumPy version 1.21.0.dev0+393.g0386777c3
NumPy relaxed strides checking option: True
NumPy CPU features: SSE SSE2 SSE3 SSSE3* SSE41* POPCNT* SSE42* AVX* F16C* FMA3* AVX2* AVX512F? AVX512CD? AVX512_KNL? AVX512_KNM? AVX512_SKX? AVX512_CLX? AVX512_CNL? AVX512_ICL?
........................................................................ [ 15%]
........................................................................ [ 30%]
........................................................................ [ 45%]
........................................................................ [ 60%]
........................................................................ [ 75%]
........................................................................ [ 90%]
............................................                             [100%]
476 passed in 3.55s

* Checking the build log, and finally GDB or LLDB

After following the above steps, I can run the whole testsuite with no segfaults. If I run git clean -xdf first, I get segfaults. I have no idea why, but at least I have a workaround now.

@seiko2plus

This was suggested by @seiko2plus for debugging a segfault in the tests on Cygwin: numpy#18102 (comment) This test passes on Cygwin, and the whole testsuite has only the failures I expect from running on Cygwin (see numpy#18102 and numpy#16246).

DWesl · 2021-02-02T20:47:18Z

Out of curiosity, should abs(inf + nanj) be inf or nan? My first thought was nan because there's a nan in there, but the tests seem to be expecting inf.

seberg · 2021-02-02T20:51:17Z

should abs(inf + nanj) be inf or nan?

I am sure NaN is correct. If an inf result is tested, that could well be a bug in the complex abs implementation.

EDIT: I am basing this on the fact that "all NaNs are NaNs" for complex numbers (i.e. inf+nanj should be the same as nan+nanj). If you disregard that, I guess an inf result does make sense. A NaN result would seem safer to me though.

DWesl · 2021-02-02T21:23:27Z

should abs(inf + nanj) be inf or nan?

I am sure NaN is correct. If an inf result is tested, that could well be a bug in the complex abs implementation.

Okay, I recently wrote a C program to check cabsl(+-inf + nanj) and cabsl(nan +- infj), since those tests have been failing for months. All four return nan. The corresponding tests fail:

FAILED numpy/core/tests/test_multiarray.py::test_npymath_complex[complex256-inf-nan-npy_cabs-absolute] - AssertionError:
FAILED numpy/core/tests/test_multiarray.py::test_npymath_complex[complex256--inf-nan-npy_cabs-absolute] - AssertionError:
FAILED numpy/core/tests/test_multiarray.py::test_npymath_complex[complex256-nan-inf-npy_cabs-absolute] - AssertionError:
FAILED numpy/core/tests/test_multiarray.py::test_npymath_complex[complex256-nan--inf-npy_cabs-absolute] - AssertionError:

with the error:

E       AssertionError:
E       Not equal to tolerance rtol=1e-07, atol=0
E
E       x and y nan location mismatch:
E        x: array(inf)
E        y: array(nan, dtype=float128)

I just noticed the dtype for x isn't right, so I'll look into fixing that.

DWesl · 2021-02-02T22:15:43Z

I tracked the numpy.core._multiarray_tests.npy_cabs call to numpy/core/src/npymath/npy_math_complex.c.src line 121, where it calls npy_hypotl and got stuck there. I'm going to leave that be and post my other work.

charris · 2021-02-02T23:22:16Z

@DWesl Do you want me to keep this up?

DWesl · 2021-02-02T23:30:06Z

You can close it if you want. I'm running a last round of tests before creating the pull request, to see if any of the failing tests are passing now.

WarrenWeckesser · 2021-02-02T23:34:41Z

Out of curiosity, should abs(inf + nanj) be inf or nan?

Actually, it looks like the result should be inf, based on behavior specified in IEEE 754-2008. I haven't read the full specification; this comment is based on a comment that @mdickinson made in a Python bug report. And inf is the value returned by Python's abs and by np.abs in 1.19.5:

In [1]: import math

In [2]: z = complex(math.inf, math.nan)

In [3]: abs(z)
Out[3]: inf

In [4]: import numpy as np

In [5]: np.__version__
Out[5]: '1.19.5'

In [6]: np.abs(z)
Out[6]: inf

And there is also

In [11]: np.hypot(np.inf, np.nan)
Out[11]: inf

seberg · 2021-02-02T23:42:15Z

Makes sense, the usual safe bet is to replace NaN with "any possible value". The odd part is just that two complex values for which np.isnan() is True will give different results in these cases. But I suppose that isn't really an issue.

mdickinson · 2021-02-03T08:18:43Z

Annex G of the C standard is also a useful (and more easily available) source of information for these corner cases. For the C11 standard, §G.6p6 has:

Each of the functions cabs and carg is specified by a formula in terms of a real function (whose special cases are covered in annex F): cabs(x + iy) = hypot(x, y) [...]

while §F.10.4.3p1 has:

hypot(±∞, y) returns +∞, even if y is a NaN.
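The rule quoted above can be checked from plain Python, without NumPy, for the four inputs discussed in this thread. This is a small sketch using only the standard library; it shows what CPython does, not what any particular C library's `cabsl` does.

```python
import math

# The four cases discussed in this thread: one infinite part, one NaN part.
cases = [
    complex(math.inf, math.nan),
    complex(-math.inf, math.nan),
    complex(math.nan, math.inf),
    complex(math.nan, -math.inf),
]

for z in cases:
    # Compare the built-in abs() with math.hypot() on the same components.
    print(z, abs(z), math.hypot(z.real, z.imag))

# With no infinite part, a NaN input is expected to give a NaN result.
both_nan = abs(complex(math.nan, math.nan))
```

An implementation that returns NaN for any of the four cases, as the C test program above reported for `cabsl` on Cygwin, disagrees with the Annex F wording.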

DWesl added 30 commits December 31, 2020 16:12

BLD: Move the configuration defines in Python.h earlier.

1926a6b

Leaving it later can lead to some headers getting confused, because they're included with one set of feature flags, but the headers they depend on were included with a different set of feature flags.

DEV: Revert executable renames for F77/G77 compiler.

6d8f36e

These changes made some sense when compiling only for cygwin. They do not belong in NumPy master.

BLD: Add functions that fail tests on Cygwin to npy_config.h.

c8d1059

I don't know if the list should be the same as one of the Windows lists, or if this should instead be a Newlib list. I know how to make a Cygwin list, so I did that. It didn't always work, though.

TST: Describe what I know about the FPE test failure.

c356079

The generic "this doesn't work" was confusing people. Specifying what "this" is and what happened works better.

STY: A few largely cosmetic changes.

faca5fb

Mark tests suddenly passing, and also document how I expect them to fail.

Remove xfails from tests that now pass.

db30000

Use parametrize for the dtypes in the abs tests.

66dc53a

Remove xfail mark from an assert function.

0d4b4de

These should primarily be on test functions.

Remove references to np.complex256

0f11301

Unconditional access to np.complex256 triggers an error in test collection on platforms where the dtype does not exist. Find another way to test the dtype for equality.

Remove reference to np.complex256 in the other test

bc209f1

I changed this for the one test, but forgot to do so for the other. Hopefully this actually fixes the problem.

Don't add gfortran to the list of GNU77 compilers

c93cb73

Presumably whoever wrote this code had reasons for leaving this out, and making a separate class for `gfortran`

Include only 'gfortran' in gnu fcompiler, not absolute version

343127d

Including both 'gfortran' and '/usr/bin/gfortran' should be redundant, and I'm not sure even my platform ever needed this. Remove it.

TST: Use usual names for the dtypes in xfail mark

f9bea44

Short-circuiting is lovely.

BLD: Remind GCC that .seh_savexmm fails for xmm16-31.

f655535

Workaround for GCC bug 65782. Fixes numpy issue 14787.

BLD: Fix code checking for old GCC on cygwin.

d22b665

I forgot the function returned bytes, not str. I also accidentally passed an extra argument.
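The commit message does not say which function was involved. Assuming it was something like `subprocess.check_output`, which returns `bytes` unless text mode is requested, the sketch below shows the decode step that is easy to forget. `parse_gcc_version` and `gcc_version` are hypothetical helpers, not the code from this commit.

```python
import subprocess


def parse_gcc_version(raw):
    """Turn the bytes printed by `gcc -dumpversion` into a tuple of ints."""
    # Decode first: in Python 3, b"7.4.0" == "7.4.0" is False.
    text = raw.decode("ascii", errors="replace").strip()
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def gcc_version(compiler="gcc"):
    """Ask the compiler for its version; check_output returns bytes by default."""
    return parse_gcc_version(subprocess.check_output([compiler, "-dumpversion"]))
```

Comparing tuples such as `gcc_version() < (8,)` avoids string comparisons altogether.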

TST: Only mark fork()-using tests xfail on 32-bit cygwin.

8e2352b

The problem seems to have gone away on 64-bit.
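One way to express "32-bit Cygwin only" as a condition is sketched below. The constant names are made up for illustration and are not necessarily what the test suite uses.

```python
import struct
import sys

# Pointer size in bits: 32 on a 32-bit interpreter, 64 on a 64-bit one.
POINTER_BITS = struct.calcsize("P") * 8

# True only on 32-bit Cygwin, where the fork() failures were still seen.
IS_32BIT_CYGWIN = sys.platform == "cygwin" and POINTER_BITS == 32
```

A flag like this could be passed as the condition of `pytest.mark.xfail`, together with a `reason` string naming fork() as the cause.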

BLD: Fix flags added to compiler line for old GCC on MS.

c519dd0

It may be worth broadening the platform list, since I believe the error this tries to fix is in the MS ABI/CRT rather than in the Cygwin runtime.

TST: parametrize test_npymath_complex.

33e863b

Allows more precise xfail.

TST: Clarify the xfail messages for branch cuts.

853f82a

I think these are the tests where the "branch cut code doesn't use sign of zero" became relevant.

BLD: Remove probably-unnecessary include.

c32f1ce

This was trying to ensure the definition of symbols that depend on feature macros set in "Python.h". That issue seems to have resolved itself.

BLD: Mark modfl as problematic on cygwin.

3666fce

The implementation currently segfaults. I'll report this to the developers soon; hopefully it can be re-enabled soon.

Undef HAVE_MODFL only on 64-bit cygwin

78e8c95

32-bit modfl appears to work fine. Problem exists in Cygwin 3.1.5 to 3.1.7 and should be fixed in 3.2.0.
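The actual change is a C preprocessor condition, but the same version window can be sketched in Python, for example to decide whether a test needs a mark. This assumes the Cygwin release string starts with `major.minor.patch`, as in `3.1.7(0.340/5/3)`; both helper names are hypothetical.

```python
import re


def cygwin_release_tuple(release):
    """Extract (major, minor, patch) from a string such as '3.1.7(0.340/5/3)'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised release string: %r" % (release,))
    return tuple(int(part) for part in match.groups())


def modfl_suspect(release, is_64bit):
    """True for 64-bit Cygwin 3.1.5 through 3.1.7, the range named in the commit."""
    return is_64bit and (3, 1, 5) <= cygwin_release_tuple(release) <= (3, 1, 7)
```

On a live system the release string would presumably come from `platform.release()`; that mapping is an assumption and has not been checked here.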

Revert "BLD: Fix flags added to compiler line for old GCC on MS."

af0de3d

This reverts commit 6ffbcb2. Replaced by numpy#17548.

Revert "BLD: Fix code checking for old GCC on cygwin."

8fc9a9f

This reverts commit d9c21bb. Replaced by numpy#17548.

Revert "BLD: Remind GCC that .seh_savexmm fails for xmm16-31."

7c43e16

This reverts commit 92eaff3. Replaced by numpy#17548.

charris added the 00 - Bug and component: build labels Dec 31, 2020

charris mentioned this pull request Dec 31, 2020

BUG: Fixes for building on Cygwin. #16246

Closed

DWesl mentioned this pull request Feb 3, 2021

ENH, DOC: Build notes and fixes for Cygwin. #18308

Closed

charris closed this Feb 3, 2021

charris deleted the rebase-16246 branch February 3, 2021 01:15

DWesl mentioned this pull request Jul 21, 2021

TST: Add Cygwin to the x86 feature tests. #19535

Merged
