TST: Add cygwin build to CI #18330
I suppose I should mention: I've been using |
I wonder what is up with the doc build?
|
No idea. I'm hoping it's a transient thing that goes away with the new commit. I tried building a NumPy wheel, installing it, and testing against that, but it still failed to import |
It has been going on a lot, I have a PR open to fix it (sorry missed the discussion here). My theory is that we are getting different hardware so that it now takes sometimes up to 30 instead of around 10 minutes, and so we suddenly get timeouts frequently. In any case, seems that bumping the limit in gh-18349 should make it reliable again. |
The test is failing with "ImportError: No such file or directory" after attempting to import numpy.linalg.lapack_lite. A previous step said the file it's trying to open ( I could try a user install, a virtual environment, or tox, but I'm not sure those would make a difference. I'm still not sure what's going on here. |
Unfortunately, Windows does not have tooling like Linux to detect which DLLs are missing when loading a DLL at runtime. Does Cygwin provide an ldd-like command to do anything like that? It might be something around the Fortran or OpenBLAS DLL. |
It provides ldd and an alternate command I'm more familiar with. I
added both to the check on the dlls.
|
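A minimal sketch of the kind of DLL check discussed above: run `ldd` on an extension module and list the dependencies it could not resolve. This is not the script used in the workflow. It assumes unresolved entries are printed as `name => not found` (the format glibc's `ldd` uses; Cygwin's `ldd` may print them differently, so the match may need adjusting), and the function names are made up for illustration.

```python
import subprocess


def missing_from_ldd_output(ldd_output):
    """Return the names of dependencies that ldd reported as unresolved."""
    missing = []
    for line in ldd_output.splitlines():
        name, arrow, target = line.strip().partition(" => ")
        # Assumption: unresolved entries look like "name => not found".
        if arrow and target.startswith("not found"):
            missing.append(name)
    return missing


def missing_dependencies(path):
    """Run ldd on one extension module and list what it could not resolve."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=False
    )
    return missing_from_ldd_output(result.stdout)
```

Calling `missing_dependencies()` on each installed `*.dll` under `site-packages` would give one list per extension module; an empty list means `ldd` resolved everything it looked for.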
Is the failure to load something specific to Docker or Docker running Windows? Edit: just checked the Azure jobs for Windows, and it doesn't look like there's extra steps I'm missing. I'm assuming Microsoft is doing similar things for Azure and GitHub Actions. |
I don't know, but Windows has been tricky in that regard in the past. What we need here is some input from a Windows person. |
Is there some flag we need to set to tell Docker or Windows that we want to use this file as a DLL, despite it being created in this container? |
A bit of background on Cygwin BLAS and LAPACK
The netlib reference BLAS and LAPACK dlls are installed to
Attempts to load BLAS and LAPACK from Python3.8 on my local Cygwin
$ ls /usr/bin/cygwin1.dll
/usr/bin/cygwin1.dll
$ python3.8
Python 3.8.7 (default, Jan 31 2021, 21:50:45)
[GCC 10.2.0] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.add_dll_directory
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'os' has no attribute 'add_dll_directory'
>>> import ctypes.util
>>> ctypes.CDLL("cygwin1.dll").cygwin_conv_path
<_FuncPtr object at 0x6fffffe6eac0>
>>> ctypes.CDLL(ctypes.util.find_library("lapack")).dgesv_
<_FuncPtr object at 0x6ffff790dd00>
>>> import numpy.linalg.lapack_lite
>>>
The lack of I could try switching to 3.7 if you think that would help. |
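The interactive checks above can be wrapped in a small helper so the same probe runs locally and on the CI runner. This is only a sketch: `try_load` is a hypothetical name, and it relies on nothing beyond `ctypes.util.find_library` and `ctypes.CDLL`, the same calls used in the session.

```python
import ctypes
import ctypes.util


def try_load(name):
    """Locate a shared library by its short name and try to load it.

    Returns (path, error). path is None if the library was not found;
    error is None if loading succeeded.
    """
    path = ctypes.util.find_library(name)
    if path is None:
        return None, "not found on the library search path"
    try:
        ctypes.CDLL(path)
    except OSError as exc:
        # The loader found the file but could not load it, for example
        # because one of its own dependencies is missing.
        return path, str(exc)
    return path, None
```

Printing `try_load("lapack")` and `try_load("blas")` in a CI step would show whether the failure is in finding the library or in loading it.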
It would eliminate one possible concern. Have you tried searching Stack Overflow? |
I tried with python 3.7:
Same error. None of my searches ("GitHub Actions on Windows", "Problems loading DLLs in GitHub Actions on Windows", or "GitHub Actions on Windows third-party DLL") returned anything relevant. |
Could you save the wheel as a GitHub Actions artifact and then download that to a local machine to test it? Maybe something is missing in the packaging. |
I installed a snapshot of the Cygwin dll a few months back to get around a segfault in I got
|
I seem to have a similar problem with a different project on my computer. I still have no idea why it's not importing the file it finds with the proper permissions, but I can investigate this locally now. |
Great! |
Thanks @mattip for suggesting I look at Does anyone want me to rebase and squash some of the comments together or make the commit messages numpy-style (add TST: at start)? I've recently had a couple rounds of "try to do something, then account for differences between local and CI systems" |
That would be good and you might as well rewrite the commit message(s) while doing so. The tests are passing now, do you think this is about ready? |
Force-pushed from 5464471 to 7bbe511
.github/workflows/cygwin.yml
Outdated
for name in ${dll_list};
do
    echo ${name}
    python3.7 -c "import "$(echo ${name} | sed -E -e "s/\/+(home|usr).*?site-packages\/+//g" -e "s/\//./g" -e "s/.cpython-3.m?-x86(_64)?-cygwin.dll//g")
Is it possible to break this line (and similar) with \?
Some places, yes, other places, not easily.
I could try using runtests.py again; that should work now and would be shorter than the current np.test() line.
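For readers untangling the `sed` expression in the snippet above, here is a rough Python equivalent of the path-to-module-name conversion. It is a sketch, not part of the workflow: the function name is hypothetical, the regular expressions are slightly stricter than the `sed` ones (literal dots are escaped, and the version digits are matched with `\d+`), and Python's `re` treats `.*?` as a lazy match, which `sed -E` may not.

```python
import re

# Everything up to and including the site-packages directory.
_PREFIX = re.compile(r"/+(?:home|usr).*?site-packages/+")
# The ABI tag and extension that Cygwin builds append to extension modules.
_SUFFIX = re.compile(r"\.cpython-3\d+m?-x86(?:_64)?-cygwin\.dll$")


def module_name_from_path(path):
    """Turn the path of an installed extension DLL into a dotted module name."""
    relative = _PREFIX.sub("", path)
    relative = _SUFFIX.sub("", relative)
    return relative.replace("/", ".")


example = (
    "/usr/lib/python3.7/site-packages/"
    "numpy/linalg/lapack_lite.cpython-37m-x86_64-cygwin.dll"
)
print(module_name_from_path(example))  # numpy.linalg.lapack_lite
```

The resulting name can be handed to `importlib.import_module` to reproduce the import check without shell quoting.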
.github/workflows/cygwin.yml
Outdated
with:
  platform: x64
  install-dir: 'C:\tools\cygwin'
  packages: python37-devel python37-pytest python37-hypothesis python37-cython python37-pip python37-wheel python37-setuptools liblapack-devel libopenblas gcc-fortran git
It should be possible to break this line with packages: >. See https://stackoverflow.com/questions/3790454/how-do-i-break-a-string-in-yaml-over-multiple-lines.
Done
.github/workflows/cygwin.yml
Outdated
-e "s/\/+(home|usr).*?site-packages\/+//g" -e "s/\//./g" \
-e "s/.cpython-3.m?-x86(_64)?-cygwin.dll//g")
done
'
Stray?
What do you mean by that?
I've moved this section to later in the run, guarded by if: failure() so it only runs if something breaks, and cleaned up the output so it's easier to see problems. I also split finding the extension module name onto an earlier line than the import check, to help clarify what's going on.
I was wondering about the lone ' on the last line.
But now I see it starts way up above :)
Yes, that's the end of the extension-module-checking script. Should I split that into a separate file somewhere rather than writing it out each time?
I think the first thing is to get the test passing again, we can worry about further cleanups after that. I'm beginning to regret the original suggestion :) But it does look much nicer now.
This is now failing within the actual NumPy test suite, somewhere inside If I use Should I switch to using |
I did that, and the tests that had been segfaulting still pass, so I pushed it up for CI. |
That seems to have removed too many function inline annotations. The Cygwin test reports a segfault while executing
I suspect one of the annotations was for an SIMD equality comparison loop. |
The next step would be to make a PR adding the annotations that you think are needed, then we can ask @seiko2plus for a review. |
It could be that the relevant changes were lost unintentionally. |
|
Edit: Fix spelling of "pull request"
TST: Follow workflow for other OSs using commands available on Cygwin. Other GitHub actions set fetch-depth to zero and use ./.github/actions for the actual testing. Let's see if this works.
Edit: Drop shell from the last step in the action. Apparently "shell" isn't valid for steps. I hope the action knows to use bash rather than PowerShell or cmd.exe, or this is going to get very confused.
TST: Adjust for cygwin. The action assumes `sudo` exists, which is a bit of a problem.
TST: Fetch with Cygwin git. Make sure Cygwin git can access the repository. Versioneer depends on git, so this is kind of important.
Edit: Include closing quote.
Edit: Install Cygwin git. Versioneer needs this in order to function.
Edit: FIXUP: build_src option is not spelled --verbose-config. It's --verbose-cfg.
TST: Check which version of Python is being used. I want to be sure it's a Cygwin one, not a Windows one.
TST: Build a wheel for Cygwin. I may need to install in a virtualenv, and installing from a wheel is much faster than installing from a directory.
TST: Use runtests.py to build the extensions. It couldn't find them during the tests, which is not a problem I've run into locally. Hopefully this fixes that.
TST: Drop separate build step.
TST: Cygwin CI: Install numpy and run tests from that. Running with runtests.py didn't work, but this should.
TST: Add cygwin bin dir to path, not root dir.
TST: Make sure Cygwin build fails when tests fail. The tests managed to pass despite NumPy not importing. This is not a good thing.
TST: Avoid steps after tests in Cygwin CI. I really need this to fail if NumPy doesn't import.
TST: Ensure Cygwin CI is running in Cygwin. Kinda defeats the point if it's not.
TST: Check that pip installed the C extensions. This really shouldn't need to be checked, but the CI runs have been failing due to a failed import of a C extension. I also printed the version, which will make me feel better about the right version being found and used.
TST: Include closing quote in test command
TST: Report loaded modules. Still trying to find why the C extension module import fails.
TST: Check permissions on NumPy C extensions
TST: Ask python which files it's trying to load. Hopefully this gives me a general direction for where to look for "Cannot find file or directory"
TST: Work around PowerShell syntax weirdness. Drop a level of quoting and just write the command to a file, then run that. It should still show DLL permissions.
TST: Fix line endings in script. bash expects \n; PowerShell creates \r\n.
TST: Check for DLLs required by C extensions in Cygwin CI. It's still not working; hopefully this shows why.
TST: Add import checks to the dll testing. I have a project where `python -c 'import module'` worked but `pytest --pyargs module` did not. Let's see if that happens here.
Allow for global installs. The runners have global install privileges, so pip will install NumPy there. I need to account for that in my script.
Fix sed regex. Lots of leaning-toothpick syndrome, but it actually does what it's supposed to locally. Hopefully the CI agrees.
Stop trying to import NumPy modules from sourcedir. This doesn't work and hasn't for a while.
TST: Simplify PATH and make sure lapack is on it. Most recent CI run said it couldn't find lapack. Reduce PATH to just Cygwin directories, make sure /usr/lib/lapack is included (after /usr/bin), and try again. Shortening PATH occasionally fixes some other problems. Let's see if that works here.
TST: Change Cygwin CI python from 3.8 to 3.7. Let's see if this solves the problems. There was a change in DLL load path handling in 3.8 that might be causing the "cannot load numpy.linalg.lapack_lite" errors.
TST: Stop running CI on PR branch.
STY: Wrap long lines in Cygwin workflow file.
TST: Update name of "main" branch used to trigger workflow. I forgot this changed a while back.
TST: Specify full paths for commands. Also use dash in more places.
TST: Change the newline-escape mechanism in Cygwin workflow. Backslash is apparently difficult. This reduces that step to a single command and tells YAML the string is to be interpreted as a single line.
TST: Move the test command onto one line so GitHub can read it. Apparently the GitHub Actions YAML parser is incomplete. It should be reading what was there as a single-line string with no linebreak at the end and all linebreaks in the middle replaced by spaces. It seems to have parsed this as two lines. Does GitHub have a document saying which subset of YAML they actually recognize?
TST: Add dependency on importlib-metadata. Pytest didn't declare it.
TST: Move importlib-metadata earlier in install list. I hope this means it actually gets installed. The manager is ignoring it right now.
TST: Add zipp to the package install list. Also tell pip to install the test_requirements, so I don't keep running into this problem.
TST: Specify full path to python for testing. I apparently missed this earlier.
TST: Add CFFI and pytz to Cygwin test environment.
TST: Set global path to include /usr/lib/lapack. I'd forgotten there was a line for this already.
TST: Shorten and unpin the Cygwin test requirements. I don't want to build more modules than I have to. It works fine on my machine with the system setuptools, pytz, and CFFI as well as system-ish Cython, so it should work fine on the CI runners.
TST: Make sure test requirements are actually installed. Apparently pytest->importlib_metadata->"typing_extensions; python_version <= 3.7" isn't a declared dependency chain. Mentioning the last two explicitly should get them installed anyway.
TST: Fix quoting in command line.
Also serves to document the script a bit.
Several of the test failures still happen. I'm not entirely sure why.
BLD: Add more functions to the Cygwin replace list. These functions are mentioned in test failures, so I mark them for replacement, along with the more obvious functions that might get called by functions that continue to fail.
TST: List more functions to be replaced on Cygwin. casin and casinf don't pass the branch cut tests. Let's see if replacing them lets the test pass on CI.
TST: Mark more functions to be replaced on Cygwin. I tried to note the tests each function fails by group, but I don't remember all of them anymore. Let's see if casin{,f,l} gets replaced on the CI runner.
BLD: Mark more functions for replacement on Cygwin. I probably need to run `git clean` to see improvements locally.
BLD: List more functions to be replaced on Cygwin. This is nearly the last of them. There are still a few failures I don't know how to deal with. The cabsl/hypotl overflows will be gone next Cygwin update. I may have an idea for cpowl (if cpowl(x, n) == cexpl(n*clogl(x))). I don't understand timezones, CFFI, or LAPACK, and that's the rest of the failures.
BLD: Change list of functions marked for replacement on Cygwin. I hoped this would convince `cpowl` to flag its overflows, but that doesn't appear to have happened.
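The identity floated for `cpowl` in the commit notes, cpowl(x, n) == cexpl(n*clogl(x)), can be tried out in double precision with Python's `cmath`. This is only an illustration of the formula, not the long double C replacement discussed for NumPy, and `cpow_via_exp_log` is a made-up name.

```python
import cmath


def cpow_via_exp_log(x, n):
    """Compute x**n as exp(n * log(x)) for a nonzero complex x."""
    return cmath.exp(n * cmath.log(x))


x = complex(1.5, -2.0)
direct = x ** 3
via_identity = cpow_via_exp_log(x, 3)
# The two results should agree to within rounding error.
print(abs(via_identity - direct) / abs(direct))
```

One property of this formulation in Python is that `cmath.exp` raises `OverflowError` when the result is too large to represent, so an overflow is reported rather than passing silently. Whether the C functions on Cygwin flag overflow the same way is a separate question.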
There will probably be compilation involved for coverage (from pytest-cov), maybe also cython, but the rest look like pure python and should be fast.
CFFI should be able to find them now. The Cygwin runtime DLL loader is the Windows one, and the linker also shares most of the same semantics.
The implementation is already there, and the tests require npy_longdouble arithmetic, so I set up the boilerplate to make it so. It seems to fix only np.abs(npy_clongdouble), not abs(npy_clongdouble), for reasons I don't understand.
…hat accept AVX registers" These changes are not present in `main`. I see no commits likely to have specifically changed whether these SIMD functions are inlined. Adding these back to `main` is left for another PR. The symptoms I saw were segfaults, basically because function calls do not preserve alignment information.
Force-pushed from f8b795f to bbd571a
…ctions that accept AVX registers"" This reverts commit bbd571a. That commit was original to this branch, but is now in main, so undoing it is a bad idea.
Rebasing to include that commit again seems to have gotten the tests working again. Anything else I need to do? One of the changes since the original version of this PR was to add |
Could move some of the github workflow into a |
Instead of trying to write it every run, just add it to the repository and use that. I'm not sure how Windows git handles line endings; I suspect it changes things to \r\n, which is not what I want. I'm on Cygwin, not MSYS, so everything expects \n.
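If the checked-in script does arrive with `\r\n` endings, one way to guard against it is to normalize the file before bash sees it. The sketch below is an assumption about how that could be done, not something this PR adds; both function names are hypothetical.

```python
def to_unix_newlines(data: bytes) -> bytes:
    """Convert CRLF (and any stray CR) line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def normalize_script(path):
    """Rewrite a script in place with LF endings. Return True if it changed."""
    with open(path, "rb") as handle:
        original = handle.read()
    converted = to_unix_newlines(original)
    if converted != original:
        with open(path, "wb") as handle:
            handle.write(converted)
    return converted != original
```

An alternative that avoids a runtime step is a `.gitattributes` entry such as `*.sh text eol=lf`, which asks git to check the file out with LF endings on every platform.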
Marking this in the parametrize call gets really big. Moving this inside the function is much shorter, even if it won't tell me when a new Cygwin release makes this obsolete. Since this only needs to wait for one Cygwin release, I suppose that's not much of a stretch.
I want the `sys.platform == "cygwin"` check before I try to access `np.clongdouble`, so the check doesn't crash the test on platforms where np.clongdouble doesn't exist.
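The ordering matters because `and` short-circuits: if the platform test fails, the attribute lookup never happens. A small sketch of that pattern, using `types.SimpleNamespace` as a stand-in for the `numpy` module so it runs without NumPy installed (the helper name and the stand-ins are illustrative, not the test code from this PR):

```python
import sys
from types import SimpleNamespace


def needs_cygwin_xfail(dtype, np_module, platform=sys.platform):
    """True only on Cygwin and only for the complex long double type."""
    # The platform test comes first. Because `and` short-circuits,
    # np_module.clongdouble is never looked up on other platforms.
    return platform == "cygwin" and dtype is np_module.clongdouble


# Stand-in for a NumPy build that has no clongdouble attribute.
np_without = SimpleNamespace()
print(needs_cygwin_xfail(float, np_without, platform="linux"))  # False

# Stand-in for a build that has it (complex is just a placeholder type).
np_with = SimpleNamespace(clongdouble=complex)
print(needs_cygwin_xfail(complex, np_with, platform="cygwin"))  # True
```

With the two operands swapped, the first call would raise `AttributeError` instead of returning `False`.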
In it goes, thanks @DWesl . |
Documented better here: numpy#18330 (comment)
Attempt to add a Cygwin build to the CI.
The code builds just fine, but the actual "running the tests" bit is crashing for reasons I don't understand and can't reproduce on my local machine. One way around this would be to build a wheel, install that into a virtual environment, and test using np.test(). Another would be using tox to run the tests, basically automating the previous suggestion.
Inspired by discussion on #18308.
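The wheel-based approach described above could be scripted roughly as follows. This is a sketch under assumptions, not the workflow that was merged: the directory names are placeholders, the virtual environment is assumed to use a POSIX-style `bin/` layout (as Cygwin Python does), and `--system-site-packages` is assumed so that the system pytest and hypothesis stay visible inside it.

```python
import subprocess
import sys
from pathlib import Path


def wheel_test_commands(venv_dir="cygwin-test-env", wheel_dir="dist"):
    """Build the argv lists for: build wheel, make venv, install, test."""
    venv_python = str(Path(venv_dir) / "bin" / "python")
    # Exit non-zero when the test suite fails, so the CI step fails too.
    check = "import sys, numpy; sys.exit(0 if numpy.test() else 1)"
    return [
        [sys.executable, "-m", "pip", "wheel", "--no-deps", "-w", wheel_dir, "."],
        [sys.executable, "-m", "venv", "--system-site-packages", venv_dir],
        [venv_python, "-m", "pip", "install", "--ignore-installed",
         "--no-index", "--find-links", wheel_dir, "numpy"],
        [venv_python, "-c", check],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

The final command is best run from a directory outside the source tree, so that `import numpy` picks up the installed wheel rather than the checkout.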