numpy 1.25.0 fails with illegal instruction in aws batch #24028
Comments
Can you help us by saying what type of machines these are? I guess graviton? Also, getting the output of How exactly did you install/build numpy?
The build work shouldn't affect the released versions of NumPy, although it could be a build issue. Another possibility is incorrect runtime CPU detection, that sometimes happens in virtual machines for example. |
A quick update: it seems to be the 7g series that causes the problem, but because both 6g and 7g nodes are available, triggering it has become intermittent. I'll update with the results from a c7g node when I can get them. |
Thanks, in that case, it may be good to see the difference between the two also (if there is any). |
I am seeing what I believe to be the same issue. I'm also using
For context, I'm installing For
For For
For
|
There are two unrelated systems to use advanced SIMD instructions in NumPy. One is builtin, and is reported in the Could you try limiting the If that does not work, the problem might be with OpenBLAS (although I think this is less likely). Its dispatch can be controlled by adding |
Quickly spun up a Here's the output for
Here's the output for
|
Thanks @mattip, I'll give that a shot tomorrow and report back. |
Hmm. Outside the container, you can use NumPy 1.25.1. Inside the container, you get an illegal instruction. Perhaps you are pulling in the macos wheel for numpy when building the container, instead of the linux aarch64 one? |
I'm not sure how to check that. However, I was able to run the same image directly on the
I'm a bit out of my depth now, so I'm going to reach out to AWS business support for some assistance. Thanks for your help @mattip! |
It occurs to me that we (OpenBLAS) are naively expecting SVE to be available if the detected CPU is known to support it (there's currently no fine-grained checking of hwcap, unlike on other platforms). Maybe that is not always a given with containers? |
@martin-frbg Is there anything that I can do / add to our container to help debug this? |
If you could add a call to |
@martin-frbg I was able to run the following directly on the host as well as instrument the container to run the following on entry. TL;DR Note that the same image is used when testing Docker via running on the host C7G instance and AWS Batch.
Here are the results: Directly on the host C7G EC2 instance:
Docker running on the C7G EC2 Instance:
AWS Batch running on a C7G instance:
|
That would mean you could only run with OPENBLAS_CORETYPE=NEOVERSEN1 on AWS Batch, unless AWS support can |
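As a rough illustration of the workaround mentioned above, the sketch below sets `OPENBLAS_CORETYPE` from Python before NumPy is imported. The helper name `pin_openblas_coretype` is hypothetical (it is not part of NumPy or OpenBLAS), and the sketch assumes OpenBLAS reads the variable when the library is first loaded, so it has to be set before the first `import numpy` in the process.

```python
import os


def pin_openblas_coretype(coretype="NEOVERSEN1", environ=None):
    """Set OPENBLAS_CORETYPE unless a value is already present.

    Hypothetical helper for illustration only. `environ` defaults to the
    real process environment; a plain dict can be passed for testing.
    Returns the value that ends up in the environment.
    """
    env = os.environ if environ is None else environ
    # setdefault keeps any value the operator has already exported.
    return env.setdefault("OPENBLAS_CORETYPE", coretype)


# Typical use (assumption: this runs before anything imports numpy):
#     pin_openblas_coretype()
#     import numpy as np
```

The same effect can be had by exporting the variable in the container's environment; doing it in Python is only useful when the job definition cannot be changed.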
Thanks @martin-frbg, this is incredibly helpful. I'll try to sort this out with AWS support. |
The An AMI ID with the 5.10 kernel compatible with Batch can be found by running the following command (where
It should be noted that this issue isn't specific to AWS Batch - any Amazon Linux 2 AMI with the older 4.14 kernel in it will have this problem today. |
4.14 predated the kernel support for SVE, so the kernel doesn't enable access to SVE because it doesn't know how to save and restore the registers. The best solution here would be to check the HWCAP bits to confirm SVE is available before using it in OpenBLAS as just knowing the processor ID supports it isn't really sufficient given the kernel itself might not. |
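To make the HWCAP point concrete from the user side, here is a minimal sketch that checks whether the running kernel advertises SVE. It assumes an aarch64 Linux system where `/proc/cpuinfo` contains a `Features` line listing the kernel's hwcap names; the function names `cpu_features` and `kernel_exposes_sve` are hypothetical, and this is not how OpenBLAS itself performs the check.

```python
from pathlib import Path


def cpu_features(cpuinfo_text):
    """Return the flags from the first 'Features' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "features":
            # Flags are whitespace-separated tokens, e.g. "fp asimd sve".
            return set(value.split())
    return set()


def kernel_exposes_sve(cpuinfo_text):
    """True if the kernel lists 'sve' as an exact feature token."""
    return "sve" in cpu_features(cpuinfo_text)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    text = path.read_text() if path.exists() else ""
    print("SVE exposed by kernel:", kernel_exposes_sve(text))
```

Running this on the host, in Docker, and in the AWS Batch job is one way to compare the three environments discussed in this thread: a CPU that supports SVE on a kernel that does not enable it would be expected to report no `sve` flag.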
|
Hmmm, seems that OpenBLAS commit isn't yet in the version used by the wheels. Should we bump the commit pin slightly? |
@martin-frbg thoughts about a stable commit to build against? |
Depends - I have not committed anything that I'd consider high risk since the fix for this from two weeks ago, but IIUC there should also be a sufficiently recent weekly test build for Scipy that (I think) you might be able to reuse for numpy (testing, at least) ? |
We have the weekly builds here as well. Thanks, I will start the update process. Edit: well, we have a PR for weekly builds :) |
A newer OpenBLAS was built in MacPython/openblas-libs#112 |
Added scipy (for statistical tests) and ipython. We pin numpy to version 1.24.2, as versions newer than this cause python to crash with SIGILL on AL2 4.14 c7g.metal instances. I expect this will be patched in the upcoming 1.25.3 release. See numpy/numpy#24028 Also drop boto3, which is unnecessary since d1015be. Signed-off-by: Patrick Roy <[email protected]> Co-authored-by: Pablo Barbáchano <[email protected]>
Steps to reproduce:
Importing numpy 1.25.0 in a task running in AWS Batch fails with an illegal instruction, but it runs locally.
python version: 3.10.11
working numpy version: 1.24.3
not working numpy version: 1.25.0
AWS batch is configured to use the following instance types:
Error message:
bash: line 1: 47 Illegal instruction (core dumped) ( python -u src-metrics-flows-test.py --quiet --metadata se
Unfortunately, because this runs in AWS Batch, we are currently unable to get the core dump.
Additional information:
Rolling back to 1.24.3 makes it work again without any change to our code. I would hazard an unscientific guess that this could be related to the build work that's been going on.