Illegal instruction (core dumped) on import for numpy 1.19.5 on ARM64 #18131
Comments
Sounds like an OpenBLAS cpu detection problem. @martin-frbg Thoughts? |
Note that we do test on aarch64 during the wheel build, so docker may also be at fault. Where did you get it? |
Not sure what the illegal instruction could be - the |
I can confirm that
doesn't segfault, while omitting the OPENBLAS_CORETYPE env var does. |
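A minimal sketch of the environment-variable workaround mentioned above. The exact command from the comment is not preserved here, so the value `ARMV8` (the generic ARMv8 target) and the helper name `run_with_coretype` are assumptions for illustration. The idea is to set `OPENBLAS_CORETYPE` before the interpreter that imports NumPy starts, so OpenBLAS skips its runtime CPU detection.

```python
import os
import subprocess
import sys


def run_with_coretype(code, coretype="ARMV8"):
    """Run `code` in a child interpreter with OPENBLAS_CORETYPE set.

    `coretype="ARMV8"` is an assumed value, not taken from the thread.
    """
    # Copy the current environment and add (or override) the variable.
    env = dict(os.environ, OPENBLAS_CORETYPE=coretype)
    return subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
    )
```

For example, `run_with_coretype("import numpy").returncode` should be `0` on an affected machine if the workaround applies there. Running the import in a child process keeps a crash from taking down the calling script.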
It may be relevant that |
While that certainly does not help with cpu detection, it should not cause a crash (and is actually a scenario that I did test) - this is tried only after an unsuccessful getauxval(), and when the fopen() fails the function returns NULL (which is subsequently interpreted as an unknown/generic ARMV8 core) |
GDB says yes:
With 1.19.4, that instruction never gets executed, as the tbz instruction just before it causes a jump past it:
w0 is 255, so bit 11 is zero. So HWCAP_CPUID isn't set. In the numpy 1.19.5 code, mrs is executed directly after the getauxval call:
The compiler seems to have reordered the instructions. I believe adding the |
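The bit test described above can be checked by hand. This sketch assumes `HWCAP_CPUID` is bit 11 of the hwcap word (as on arm64 Linux) and uses the value 255 reported for `w0`. The function name `has_cpuid` is made up for illustration.

```python
# Assumed: HWCAP_CPUID is bit 11 of the AT_HWCAP word on arm64 Linux.
HWCAP_CPUID = 1 << 11


def has_cpuid(hwcap):
    """Return True if the HWCAP_CPUID bit is set in `hwcap`."""
    return bool(hwcap & HWCAP_CPUID)


# 255 == 0b1111_1111 only has bits 0..7 set, so bit 11 is clear.
print(has_cpuid(255))  # False
```

With the bit clear, the `mrs` read of the CPU ID register must be skipped, which is what the `tbz` branch in the 1.19.4 build does.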
Thank you. I had actually looked at this (in the context of the PR that changed the keyword to |
I think the restructuring of the code you did caused the optimiser to make different decisions.
That's not good. The only thing I can think of that would definitely work is to pass the result of getauxval() into the asm, and do some branching within the asm. |
We got around this in NumPy in npy_get_floatstatus_barrier by passing a dummy parameter into the function, which prevented reordering even with |
- 1.19.5 NumPy version doesn't work well on all aarch64 systems due to numpy/numpy#18131 - this PR sets the maximum version used by test on Xavier to 1.19.4 Signed-off-by: Janusz Lisiecki <[email protected]>
I had the same issue. I had to reinstall NumPy 1.19.4 manually to fix the error. |
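Since importing the affected build is what crashes, a guard has to read the installed version without importing NumPy. A rough sketch, assuming Python 3.8+ for `importlib.metadata`. The helper names are hypothetical, and the check is deliberately coarse: per this thread, not every aarch64 machine is affected.

```python
import platform
from importlib import metadata

AFFECTED_VERSION = "1.19.5"  # the release this issue is about


def installed_numpy_version():
    """Return the installed NumPy version string without importing numpy."""
    try:
        return metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return None


def may_hit_sigill(numpy_version, machine):
    """Coarse check: the affected release on a 64-bit ARM Linux machine."""
    return numpy_version == AFFECTED_VERSION and machine == "aarch64"


if may_hit_sigill(installed_numpy_version(), platform.machine()):
    print("numpy 1.19.5 on aarch64: consider pinning numpy==1.19.4")
```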
I'm running into this on an NVIDIA Jetson (aarch64) - took me a while to isolate it. Funny thing is that it fails in a virtualenv but seems to be working if you install it at the system level. The NVIDIA NGC containers from https://github.com/dusty-nv/jetson-containers install |
I ran into this with CI breaking on Jetson NX & TX2, and it took me a good while to isolate as well. Serves me right for using |
This has nothing to do with virtualenv or Python. Those images are based on
Confirmed both workarounds
avoid the issue, from system, docker, and virtualenv. |
Unfortunately I cannot reproduce this (the underlying problem with OpenBLAS' cpu detection code) on my hardware, so I cannot confirm that the trivial attempt at fixing it with a |
My initial report was from a Jetson device; I can try it on a different ARM CPU tomorrow. |
Raspberry Pi 4 (armv7l) has no problems. |
OpenBLAS does not provide DYNAMIC_ARCH for ARMV7 (no practical difference between the provided cpu targets), but I could not reproduce the problem on a Pi4 in 64bit ARMV8 mode. |
I can't reproduce the issue on AWS Graviton2 either (ARMv8), but it definitely does occur on Jetson Nano, Jetson TX2 and Jetson Xavier NX. |
My reproduction was not on a Jetson. I'm actually not sure what it is - it's just a (pretty powerful) server I ssh into. |
Numpy 1.19.5 leads to core dumped error on python3.6. See numpy/numpy#18131
I encountered this bug (or at least an 'Illegal Instruction' error) on a Pi Zero W (ARM1176 CPU: ARMv6 architecture) whilst trying different versions of NumPy ( |
I also found that on a Pi Zero W |
This issue is about arm64, not armv6 (32-bit), and was fixed. Commenting here is less effective than opening a new issue. If you do open a new issue, be sure to faithfully report
|
@Edward-Knight You might have to check whether your SSD supports NVMe or not
You should see a list of SSDs. If you don't see any results, NVMe is not supported. |
Installing `numpy` 1.19.5 from the `manylinux2014` wheel causes a SIGILL. I have tested that this happens on Python 3.7 and 3.9, but `numpy` 1.19.4 works fine, as does running on `x86-64`.

Reproducing code example:

Using the docker image `python:3` (currently with python 3.9.1, but can reproduce with other versions):

    $ pip install numpy==1.19.5
    ...
    Successfully installed numpy-1.19.5
    $ python3 -c "import numpy"
    Illegal instruction (core dumped)

Error message:

Running inside `gdb`, I get:

Full `gdb` backtrace:

NumPy/Python version information:

Numpy 1.19.5
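For anyone scripting the reproduction (in CI, for instance), a small sketch of detecting the SIGILL from Python rather than from the shell. It assumes a POSIX system, where `subprocess` reports death by signal N as return code `-N`. The helper name `died_from_sigill` is made up for illustration.

```python
import signal
import subprocess
import sys


def died_from_sigill(code):
    """Run `code` in a child interpreter; True if it was killed by SIGILL."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # On POSIX, a negative return code -N means "terminated by signal N".
    return proc.returncode == -signal.SIGILL
```

On an affected machine, `died_from_sigill("import numpy")` would be expected to return `True` with 1.19.5 installed and `False` with 1.19.4.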