[numpy-1.19.3, py3.6.2, dockerized env] - Fatal Python error: Segmentation fault (core dumped) #17674
Comments
@moylop260 hmm, interesting this happens without any other changes. I wonder if there is some other reason involved, e.g. a virtual machine, or so? Could you try running it using
Then type
More info: - Vauxoo/maintainer-quality-tools#315 and - numpy/numpy#17674. Pinning numpy to a version where the error is not raised lets us bypass this error. The only project used by all our customers is server-tools. Revert this change after it is fixed.
Could be the change in the OpenBLAS library version, perhaps hardware detection related.
Where exactly is the segfault? On first import? In some routine? What hardware are you running on (
Inside docker container
In the host
Running
It is running well.
But running our custom program, which imports the packages, in the following way:
$ gdb --args python3 odoo-bin
gdb --args python3 ~/odoo-12.0/odoo-bin -d openerp_template -i benandfrank --xmlrpc-port=18069 --logfile=out.txt
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.3) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python3...(no debugging symbols found)...done.
(gdb) r
Starting program: /.repo_requirements/virtualenv/python3.6/bin/python3 /home/odoo/odoo-12.0/odoo-bin -d openerp_template -i benandfrank --xmlrpc-port=18069 --logfile=out.txt
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
/.repo_requirements/virtualenv/python3.6/lib/python3.6/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.25.11) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
[New Thread 0x7fb7d2225700 (LWP 1613)]
[New Thread 0x7fb7ca094700 (LWP 1615)]
[New Thread 0x7fb7c9893700 (LWP 1616)]
[New Thread 0x7fb7c1092700 (LWP 1617)]
[New Thread 0x7fb7b8891700 (LWP 1618)]
[New Thread 0x7fb7b0090700 (LWP 1619)]
[New Thread 0x7fb7a788f700 (LWP 1620)]
[New Thread 0x7fb79f08e700 (LWP 1621)]
[New Thread 0x7fb79688d700 (LWP 1622)]
[New Thread 0x7fb78e08c700 (LWP 1623)]
[New Thread 0x7fb78588b700 (LWP 1624)]
[New Thread 0x7fb77d08a700 (LWP 1625)]
[New Thread 0x7fb774889700 (LWP 1626)]
[New Thread 0x7fb76c088700 (LWP 1627)]
[New Thread 0x7fb763887700 (LWP 1628)]
[New Thread 0x7fb75b086700 (LWP 1629)]
[New Thread 0x7fb752885700 (LWP 1630)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fb752885700 (LWP 1630)]
0x0000000000000000 in ?? ()
If I install the previous version:
The segmentation fault is not reproduced:
gdb --args python3 odoo-bin
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.3) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python3...(no debugging symbols found)...done.
(gdb) r
Starting program: /.repo_requirements/virtualenv/python3.6/bin/python3 /home/odoo/odoo-12.0/odoo-bin -d openerp_template -i benandfrank --xmlrpc-port=18069 --logfile=out.txt
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
/.repo_requirements/virtualenv/python3.6/lib/python3.6/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.25.11) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
[New Thread 0x7fdb6d8da700 (LWP 1658)]
[New Thread 0x7fdb65df8700 (LWP 1660)]
[New Thread 0x7fdb655f7700 (LWP 1661)]
[New Thread 0x7fdb62df6700 (LWP 1662)]
[New Thread 0x7fdb605f5700 (LWP 1663)]
[New Thread 0x7fdb5ddf4700 (LWP 1664)]
[New Thread 0x7fdb5b5f3700 (LWP 1665)]
[New Thread 0x7fdb58df2700 (LWP 1666)]
[New Thread 0x7fdb565f1700 (LWP 1667)]
[New Thread 0x7fdb53df0700 (LWP 1668)]
[New Thread 0x7fdb515ef700 (LWP 1669)]
[New Thread 0x7fdb4edee700 (LWP 1670)]
[New Thread 0x7fdb4c5ed700 (LWP 1671)]
[New Thread 0x7fdb49dec700 (LWP 1672)]
[New Thread 0x7fdb475eb700 (LWP 1673)]
[New Thread 0x7fdb44dea700 (LWP 1674)]
[New Thread 0x7fdb425e9700 (LWP 1675)]
[New Thread 0x7fdb3fde8700 (LWP 1676)]
[New Thread 0x7fdb3d5e7700 (LWP 1677)]
[New Thread 0x7fdb3ade6700 (LWP 1678)]
[New Thread 0x7fdb385e5700 (LWP 1679)]
[New Thread 0x7fdb35de4700 (LWP 1680)]
[New Thread 0x7fdb335e3700 (LWP 1681)]
[New Thread 0x7fdb30de2700 (LWP 1682)]
[New Thread 0x7fdb2e5e1700 (LWP 1683)]
[New Thread 0x7fdb2bde0700 (LWP 1684)]
[New Thread 0x7fdb295df700 (LWP 1685)]
[New Thread 0x7fdb26dde700 (LWP 1686)]
[New Thread 0x7fdb245dd700 (LWP 1687)]
[New Thread 0x7fdb21ddc700 (LWP 1688)]
[New Thread 0x7fdb1f5db700 (LWP 1689)]
[New Thread 0x7fdb1adda700 (LWP 1690)]
...
[Inferior 1 (process 2073) exited normally]
It is using
What other command should I run?
OpenBLAS is the best guess right now, but that doesn't say much, unless it is OpenBLAS threading related. One way to narrow it down could be to set the environment variable
Not many things changed
The only two, other than OpenBLAS, that look possible are the memory leak fix and nulling the deprecated fields. This would be an easy bisect if we had a simple way to test.
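A simple pass/fail test for a bisect could look roughly like the sketch below. It runs a snippet in a child interpreter and reports whether the child was killed by SIGSEGV. The function name `died_from_segfault` and the `import numpy` reproducer are assumptions for illustration only; in this thread the crash showed up while running odoo-bin, so the real reproducer would have to be substituted.

```python
import signal
import subprocess
import sys


def died_from_segfault(code, python=sys.executable):
    """Run `code` in a child interpreter; return True if it was killed by SIGSEGV."""
    result = subprocess.run(
        [python, "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # On POSIX, a negative return code means the child was killed by that signal.
    return result.returncode == -signal.SIGSEGV


if __name__ == "__main__":
    # Hypothetical reproducer: replace with whatever import or call triggers the crash.
    crashed = died_from_segfault("import numpy")
    # `git bisect run` treats exit code 0 as good and 1 as bad.
    sys.exit(1 if crashed else 0)
```

Saved as a script, this could be passed to `git bisect run` after marking v1.19.2 as good and v1.19.3 as bad, provided each step also rebuilds and reinstalls numpy.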
@moylop260 Does your program have any C code, or is it pure Python?
I think it is pure Python, but it uses a lot of other libraries (maybe with C code).
Let me revert commit by commit from v1.19.2 to v1.19.3. I will be back.
Using
Maybe it is not related to the code changes but to the package build. Here is the output of installing and uninstalling:
🔴 $ pip install numpy==1.19.3
Collecting numpy==1.19.3
Using cached numpy-1.19.3-cp36-cp36m-manylinux2010_x86_64.whl (14.9 MB)
Installing collected packages: numpy
Successfully installed numpy-1.19.3
$ pip uninstall numpy
Found existing installation: numpy 1.19.3
Uninstalling numpy-1.19.3:
Would remove:
/virtualenv/python3.6/bin/f2py
/virtualenv/python3.6/bin/f2py3
/virtualenv/python3.6/bin/f2py3.6
/virtualenv/python3.6/lib/python3.6/site-packages/numpy-1.19.3.dist-info/*
/virtualenv/python3.6/lib/python3.6/site-packages/numpy.libs/libgfortran-2e0d59d6.so.5.0.0
/virtualenv/python3.6/lib/python3.6/site-packages/numpy.libs/libopenblasp-r0-a32f1dca.3.12.so
/virtualenv/python3.6/lib/python3.6/site-packages/numpy.libs/libquadmath-2d0c479f.so.0.0.0
/virtualenv/python3.6/lib/python3.6/site-packages/numpy.libs/libz-eb09ad1d.so.1.2.3
/virtualenv/python3.6/lib/python3.6/site-packages/numpy/*
🟢 (v1.19.3)$ git log -r -1 --oneline
5010177cf (HEAD -> v1.19.3, tag: v1.19.3) REL: NumPy 1.19.3 release.
(v1.19.3)$ pip install .
Created wheel for numpy: filename=numpy-1.19.3-cp36-cp36m-linux_x86_64.whl size=10498376 sha256=1c798acde71e7e800b37b3b79aef4dbdb9a01a270b47e89dd43292799909ed24
Stored in directory: /tmp/pip-ephem-wheel-cache-wk7jbnti/wheels/80/4e/3e/d6f0d3d1d0e6c064a0d152f30c9cec6
(v1.19.3)$ pip uninstall numpy
Found existing installation: numpy 1.19.3
Uninstalling numpy-1.19.3:
Would remove:
/virtualenv/python3.6/bin/f2py
/virtualenv/python3.6/bin/f2py3
/virtualenv/python3.6/bin/f2py3.6
/virtualenv/python3.6/lib/python3.6/site-packages/numpy-1.19.3.dist-info/*
/virtualenv/python3.6/lib/python3.6/site-packages/numpy/*
Notice there is not
After installing, I just deleted all files from
And now it was fixed! 🟢
Checking differences between
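The two uninstall listings above differ in the `numpy.libs` directory: the PyPI wheel ships `libopenblasp-r0-a32f1dca.3.12.so` and friends there, while the locally built wheel does not. A small sketch for checking which vendored libraries an installation carries is shown below. The helper name `bundled_libs` is hypothetical, and the use of the `platlib` directory assumes numpy is installed in the interpreter's default site-packages.

```python
from pathlib import Path


def bundled_libs(site_packages):
    """Return sorted names of shared libraries found in <site_packages>/numpy.libs."""
    libs_dir = Path(site_packages) / "numpy.libs"
    if not libs_dir.is_dir():
        return []  # e.g. a wheel built locally without vendored libraries
    return sorted(p.name for p in libs_dir.iterdir() if ".so" in p.name)


if __name__ == "__main__":
    import sysconfig

    # Assumption: numpy lives in the interpreter's platform site-packages.
    for name in bundled_libs(sysconfig.get_paths()["platlib"]):
        print(name)
```

An empty result would match the locally built case shown above, where no bundled OpenBLAS file is listed.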
TensorFlow building against C++ Numpy crashes with OOM on the Windows VMs. This is with numpy-1.19.3, numpy 1.19.2 is all good. Error message looks something like:
I was unable to reproduce outside of TF for now, but will keep trying.
Maybe we should back out the OpenBLAS 0.3.12, and put in the PR to detect a bad
That is unfortunate, but does seem likely necessary.
I've reopened the warning PR. I'll put it into master also, as at this point we probably don't want the 3.12 library in 1.20 either. It would be nice to pin the problem down to something easily reproducible so that OpenBLAS could be fixed.
Using OpenBLAS 0.3.12 led to segfaults, see gh-17674, so revert to previous version. The Windows 10 version 2004 problem remains, but we will warn instead when it is detected.
@moylop260 that sounds like an OpenBLAS issue as well, since NumPy will not normally use more than one thread itself. It is very unfortunate, but it sounds pretty safe to assume that reverting the wheel to use the older OpenBLAS version will fix it.
We are missing some tests that might catch this sort of thing, but I'm not sure how to do that.
@moylop260 Could you try installing 1.19.4 from the staging repo:
I'd like to get it tested before uploading to PyPI.
Hi @charris, thanks for the fix. With numpy-1.19.4 installed, the error is not reproduced anymore.
I think it is a corner case, since we have other host servers that do not reproduce the issue. I think it is reproduced only on a particular kind of processor. Let me collect this information if you want. Or tell me, what kind of output do you need? If you need anything else, don't hesitate to ask.
@moylop260 Thanks for checking. Any information about the processor reported in the docker environment would be helpful to the OpenBLAS developers, we can't stick with an old library version forever :)
@mihaimaruseac Could you also check if your problem is fixed?
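One way to gather the processor information as seen from inside the container is sketched below. It assumes a Linux environment where `/proc/cpuinfo` is readable; the function names `parse_cpu_flags` and `cpu_report` are made up for this example and are not part of NumPy or OpenBLAS.

```python
import os
import platform


def parse_cpu_flags(cpuinfo_text):
    """Return the sorted CPU feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return sorted(value.split())
    return []


def cpu_report(cpuinfo_path="/proc/cpuinfo"):
    """Collect basic processor details as seen from inside the current environment."""
    try:
        with open(cpuinfo_path) as fh:
            text = fh.read()
    except OSError:
        text = ""  # not Linux, or /proc is unavailable
    return {
        "machine": platform.machine(),
        "logical_cpus": os.cpu_count(),
        "flags": parse_cpu_flags(text),
    }


if __name__ == "__main__":
    for key, value in cpu_report().items():
        print(key, value)
```

Running it both on the host and inside the container would show whether the two report the same core count and feature flags.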
Will trigger a test build later today, sorry for the delay. We are in the process of a new TF release and the CI system is currently used for the release.
I have created the following folder:
It contains the info of:
And more. I hope it helps to build a corner-case environment that reproduces it when running unit tests. Regards!
@moylop260 Thanks.
bokeh depends on numpy, and numpy raises weird errors: - Vauxoo/maintainer-quality-tools#315 - numpy/numpy#17674 - numpy/numpy#13059
altair depends on numpy, and numpy raises weird errors: - Vauxoo/maintainer-quality-tools#315 - numpy/numpy#17674 - numpy/numpy#13059
numpy raises weird errors: - Vauxoo/maintainer-quality-tools#315 - numpy/numpy#17674 - numpy/numpy#13059
This issue is reproduced only with python3.6 and numpy==1.19.3
This issue is reproduced only with python3.6 and numpy==1.19.3, and only on one kind of processor - https://drive.google.com/drive/folders/18mcYQu4GGPzwCRj14Pzy8LvfC9bcLid8?usp=sharing Check the following history: - Vauxoo/maintainer-quality-tools#315
Currently, if numpy is available in the modules, even if you are not using it, Odoo tries to compile and the system goes down, but only on one type of processor. We currently know of 2 servers reproducing the error: B&F-production and Runbot. More info: numpy/numpy#17674, numpy/numpy#17759. It is reproduced in the following MR: https://git.vauxoo.com/vauxoo/lasec/-/merge_requests/197 See the following discussion: https://odoo-community.org/groups/contributors-15/contributors-186006?mode=thread&date_begin=&date_end= OpenBLAS creates a number of threads equal to the number of core threads available: 56 in my case (production server), so it quickly reached limit_memory_hard and the process was killed (SIGSEGV). Forcing OPENBLAS_NUM_THREADS=1 fixed the issue.
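The OPENBLAS_NUM_THREADS workaround can also be applied from Python, as long as the variable is set before numpy is first imported. The sketch below only illustrates that ordering; the helper name `cap_openblas_threads` is hypothetical, and whether a cap of 1 is appropriate depends on the workload.

```python
import os


def cap_openblas_threads(count=1, env=None):
    """Set OPENBLAS_NUM_THREADS in `env` (default: os.environ) and return the value set."""
    env = os.environ if env is None else env
    env["OPENBLAS_NUM_THREADS"] = str(count)
    return env["OPENBLAS_NUM_THREADS"]


# Must run before the first `import numpy`: OpenBLAS reads the variable when it is loaded.
cap_openblas_threads(1)

try:
    import numpy  # noqa: F401  (imported only after the cap is in place)
except ImportError:
    numpy = None  # numpy is not installed; the sketch still shows the ordering
```

Setting the variable after numpy has already been imported elsewhere in the process would come too late to affect the threads OpenBLAS has already created.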
Was anyone ever able to pinpoint exactly why Numpy was segfaulting with OpenBLAS 0.3.12? We are having a similar problem in SageMath with OpenBLAS 0.3.13, though it seems to be hardware-dependent.
My theory is that OpenBLAS has CPU detection logic that is used to activate different optimizations. If the docker image somehow interferes with this logic, OpenBLAS will try to use features that do not exist. You may have some luck overriding the logic by setting
The reason websocket-client was deactivated: numpy has the following issue:
- numpy/numpy#13059
It is a corner case involving one kind of processor, docker, and python3. More info:
- numpy/numpy#17674
- numpy/numpy#17759
But who is using numpy? Several projects use libraries that depend on numpy:
./web/requirements.txt:2:bokeh==1.1.0
./reporting-engine/requirements.txt:1:altair
./icm/requirements.txt:1:pandas
./maintainer-quality-tools/requirements.txt:7:websocket-client
So we ran odoo-bin with loglevel=debug to find the last line logged before the crash. The path was:
- Last logging: https://github.com/odoo/odoo/blob/92ef3b2dd4655913198d10d06598b799fdcae6d0/odoo/modules/loading.py#L152
- Using pdb, I traced it to the following line: https://github.com/odoo/odoo/blob/92ef3b2dd4655913198d10d06598b799fdcae6d0/odoo/modules/module.py#L368
- The last module imported was `resource`: https://github.com/odoo/odoo/tree/92ef3b2dd4655913198d10d06598b799fdcae6d0/addons/resource
- I removed the imports and reproduced the error again and again, until one change fixed the issue. It was when I commented out the following import: https://github.com/odoo/odoo/blob/92ef3b2dd4655913198d10d06598b799fdcae6d0/addons/resource/tests/common.py#L4
- I debugged that file line by line and finally found the problematic import: https://github.com/odoo/odoo/blob/92ef3b2dd4655913198d10d06598b799fdcae6d0/odoo/tests/common.py#L50
- Now, why does this raise the error? Because the following line: https://github.com/websocket-client/websocket-client/blob/29c15714ac9f5272e1adefc9c99b83420b409f63/websocket/_abnf.py#L34 imports numpy if you are using python3. numpy is installed because of the requirements.txt files above, and the damage was done.
We could have removed all the numpy requirements, but there are a lot of them.
We decided that the better (faster) option was to avoid the websocket import that pulls in numpy by not installing websocket-client.
However, after researching, we found that OpenBLAS creates a number of threads equal to the number of core threads available, so it quickly reached limit_memory_hard and the process was killed (SIGSEGV). Forcing OPENBLAS_NUM_THREADS=1 fixed the issue. We built a test image to reproduce the error, set that environment variable, and the error was fixed. That change was applied in the following PRs:
- Vauxoo/docker-ubuntu-base#89
- Vauxoo/docker-ubuntu-base#90
With the change applied in docker-ubuntu-base, it is no longer necessary to avoid importing websocket-client (which lets the JS tests work again); we are covered by the env var OPENBLAS_NUM_THREADS.
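Where the base image cannot be changed, the same variable could be passed only to the process that needs it. The sketch below launches a command with OPENBLAS_NUM_THREADS=1 added to a copy of the current environment; the helper name `run_with_single_blas_thread` and the `odoo-bin` invocation in the comment are illustrative assumptions, not the change made in the PRs above.

```python
import os
import subprocess
import sys


def run_with_single_blas_thread(argv):
    """Run argv with OPENBLAS_NUM_THREADS=1 added to a copy of the current environment."""
    env = dict(os.environ, OPENBLAS_NUM_THREADS="1")
    return subprocess.run(argv, env=env)


if __name__ == "__main__":
    # Hypothetical usage: python this_script.py python3 odoo-bin -d some_database
    completed = run_with_single_blas_thread(sys.argv[1:] or [sys.executable, "--version"])
    sys.exit(completed.returncode)
```

Because only the child's environment is modified, other processes on the same host keep their default OpenBLAS thread count.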
Reproducing code example:
After release 1.19.3
Fatal Python error: Segmentation fault (core dumped)
(I just enabled the following flag:
export PYTHONFAULTHANDLER=1
but it does not give any useful output.)
I'm using a python3.6 virtualenv inside a docker container.
But if I install numpy==1.19.2 the error is not raised.
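The same fault handler can be switched on from code instead of through PYTHONFAULTHANDLER. The sketch below is a minimal illustration; the wrapper name `enable_crash_tracebacks` is made up for this example. Note that `faulthandler` dumps Python-level tracebacks only, so it cannot show the native frames where a crash occurred.

```python
import faulthandler
import sys


def enable_crash_tracebacks(stream=None):
    """Enable faulthandler for all threads, writing to `stream` (default: sys.stderr)."""
    target = stream if stream is not None else sys.stderr
    # The stream must stay open for as long as the handler is enabled.
    faulthandler.enable(file=target, all_threads=True)
    return faulthandler.is_enabled()
```

Calling `enable_crash_tracebacks()` at the top of the entry script, before the suspect imports, would have the same effect as exporting the flag.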
NumPy/Python version information:
python -c "import sys, numpy; print(numpy.__version__, sys.version)"
🔴 With error
1.19.3 3.6.2 (default, Jul 29 2017, 00:00:00)
[GCC 4.8.4]
🟢 without error
1.19.2 3.6.2 (default, Jul 29 2017, 00:00:00)
[GCC 4.8.4]
What command should I execute to get more info about this issue?