[numpy-1.19.3, py3.6.2, dockerized env] - Fatal Python error: Segmentation fault (core dumped) #17674

Closed
moylop260 opened this issue Oct 29, 2020 · 28 comments

Reproducing code example:

After upgrading to release 1.19.3:

  • The following error is raised: Fatal Python error: Segmentation fault (core dumped)

(I enabled the flag export PYTHONFAULTHANDLER=1, but it did not produce any useful output.)

I'm using a python3.6 virtualenv inside a docker container.

If I install numpy==1.19.2 instead, the error is not raised.

NumPy/Python version information:

python -c "import sys, numpy; print(numpy.__version__, sys.version)"

🔴 With error
1.19.3 3.6.2 (default, Jul 29 2017, 00:00:00)
[GCC 4.8.4]

🟢 without error
1.19.2 3.6.2 (default, Jul 29 2017, 00:00:00)
[GCC 4.8.4]

What command should I execute to get more information about this issue?
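
The PYTHONFAULTHANDLER=1 flag mentioned above has an in-process equivalent in the standard library's faulthandler module. The following is only a minimal sketch, assuming the crash happens in a process whose entry point you can edit; the helper name and the optional log path are hypothetical.

```python
import faulthandler
import sys


def enable_crash_tracebacks(log_path=None):
    """Dump the Python traceback of every thread if the process gets SIGSEGV.

    log_path is a hypothetical file name; when omitted, output goes to stderr.
    The stream is returned because it must stay open while the handler is on.
    """
    stream = open(log_path, "w") if log_path else sys.stderr
    faulthandler.enable(file=stream, all_threads=True)
    return stream


# Intended use (not executed here): install the handler first, then import
# the suspect package.
#   stream = enable_crash_tracebacks("crash.log")
#   import numpy
```

The dump only contains Python-level frames, so a crash inside a native library still needs a debugger such as gdb to see the C frames.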

@moylop260 moylop260 changed the title [1.19.3] - Fatal Python error: Segmentation fault (core dumped) [numpy-1.19.3, py3.6.2] - Fatal Python error: Segmentation fault (core dumped) Oct 29, 2020
@moylop260 moylop260 changed the title [numpy-1.19.3, py3.6.2] - Fatal Python error: Segmentation fault (core dumped) [numpy-1.19.3, py3.6.2, dockerized env] - Fatal Python error: Segmentation fault (core dumped) Oct 29, 2020

seberg (Member) commented Oct 29, 2020

@moylop260 hmm, interesting this happens without any other changes. I wonder if there is some other reason involved, e.g. a virtual machine, or so?

Could you try running it using gdb (which you may need to install):

gdb --args python -c "import sys, numpy; print(numpy.__version__, sys.version)"

Then type r (or run) and hit enter.

moylop260 added a commit to Vauxoo/server-tools that referenced this issue Oct 29, 2020
More info about:
 - Vauxoo/maintainer-quality-tools#315

And here:
 - numpy/numpy#17674

So, by pinning numpy to a version where the error is not raised, we can bypass this error.
The only project used by all our customers is server-tools

Revert this change after it is fixed.

charris (Member) commented Oct 29, 2020

Could be the change in the OpenBLAS library version, perhaps hardware detection related.

mattip (Member) commented Oct 29, 2020

Where exactly is the segfault? On first import? In some routine? What hardware are you running on (uname -a)?

moylop260 (Author) commented Oct 29, 2020

$ uname -a

Inside docker container
Linux HOSTNAME 5.4.0-51-generic #56-Ubuntu SMP Mon Oct 5 14:28:49 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

In the host
Linux HOSTNAME 5.4.0-51-generic #56-Ubuntu SMP Mon Oct 5 14:28:49 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Running
gdb --args python3 -c "import sys, numpy; print(numpy.__version__, sys.version)"

works fine.

But the segmentation fault appears when running our custom program, which imports the packages, as follows:

$ gdb --args python3 odoo-bin
gdb --args python3 ~/odoo-12.0/odoo-bin -d openerp_template -i benandfrank --xmlrpc-port=18069 --logfile=out.txt
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.3) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python3...(no debugging symbols found)...done.
(gdb) r
Starting program: /.repo_requirements/virtualenv/python3.6/bin/python3 /home/odoo/odoo-12.0/odoo-bin -d openerp_template -i benandfrank --xmlrpc-port=18069 --logfile=out.txt
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
/.repo_requirements/virtualenv/python3.6/lib/python3.6/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.25.11) or chardet (3.0.4) doesn't match a supported version!
  RequestsDependencyWarning)
[New Thread 0x7fb7d2225700 (LWP 1613)]
[New Thread 0x7fb7ca094700 (LWP 1615)]
[New Thread 0x7fb7c9893700 (LWP 1616)]
[New Thread 0x7fb7c1092700 (LWP 1617)]
[New Thread 0x7fb7b8891700 (LWP 1618)]
[New Thread 0x7fb7b0090700 (LWP 1619)]
[New Thread 0x7fb7a788f700 (LWP 1620)]
[New Thread 0x7fb79f08e700 (LWP 1621)]
[New Thread 0x7fb79688d700 (LWP 1622)]
[New Thread 0x7fb78e08c700 (LWP 1623)]
[New Thread 0x7fb78588b700 (LWP 1624)]
[New Thread 0x7fb77d08a700 (LWP 1625)]
[New Thread 0x7fb774889700 (LWP 1626)]
[New Thread 0x7fb76c088700 (LWP 1627)]
[New Thread 0x7fb763887700 (LWP 1628)]
[New Thread 0x7fb75b086700 (LWP 1629)]
[New Thread 0x7fb752885700 (LWP 1630)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fb752885700 (LWP 1630)]
0x0000000000000000 in ?? ()

If I install the previous version:
pip install numpy==1.19.2

The segmentation fault is not reproduced:

gdb --args python3 odoo-bin
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.3) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python3...(no debugging symbols found)...done.
(gdb) r
Starting program: /.repo_requirements/virtualenv/python3.6/bin/python3 /home/odoo/odoo-12.0/odoo-bin -d openerp_template -i benandfrank --xmlrpc-port=18069 --logfile=out.txt
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
/.repo_requirements/virtualenv/python3.6/lib/python3.6/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.25.11) or chardet (3.0.4) doesn't match a supported version!
  RequestsDependencyWarning)
[New Thread 0x7fdb6d8da700 (LWP 1658)]
[New Thread 0x7fdb65df8700 (LWP 1660)]
[New Thread 0x7fdb655f7700 (LWP 1661)]
[New Thread 0x7fdb62df6700 (LWP 1662)]
[New Thread 0x7fdb605f5700 (LWP 1663)]
[New Thread 0x7fdb5ddf4700 (LWP 1664)]
[New Thread 0x7fdb5b5f3700 (LWP 1665)]
[New Thread 0x7fdb58df2700 (LWP 1666)]
[New Thread 0x7fdb565f1700 (LWP 1667)]
[New Thread 0x7fdb53df0700 (LWP 1668)]
[New Thread 0x7fdb515ef700 (LWP 1669)]
[New Thread 0x7fdb4edee700 (LWP 1670)]
[New Thread 0x7fdb4c5ed700 (LWP 1671)]
[New Thread 0x7fdb49dec700 (LWP 1672)]
[New Thread 0x7fdb475eb700 (LWP 1673)]
[New Thread 0x7fdb44dea700 (LWP 1674)]
[New Thread 0x7fdb425e9700 (LWP 1675)]
[New Thread 0x7fdb3fde8700 (LWP 1676)]
[New Thread 0x7fdb3d5e7700 (LWP 1677)]
[New Thread 0x7fdb3ade6700 (LWP 1678)]
[New Thread 0x7fdb385e5700 (LWP 1679)]
[New Thread 0x7fdb35de4700 (LWP 1680)]
[New Thread 0x7fdb335e3700 (LWP 1681)]
[New Thread 0x7fdb30de2700 (LWP 1682)]
[New Thread 0x7fdb2e5e1700 (LWP 1683)]
[New Thread 0x7fdb2bde0700 (LWP 1684)]
[New Thread 0x7fdb295df700 (LWP 1685)]
[New Thread 0x7fdb26dde700 (LWP 1686)]
[New Thread 0x7fdb245dd700 (LWP 1687)]
[New Thread 0x7fdb21ddc700 (LWP 1688)]
[New Thread 0x7fdb1f5db700 (LWP 1689)]
[New Thread 0x7fdb1adda700 (LWP 1690)]
...
[Inferior 1 (process 2073) exited normally]

The output of docker --version is:

  • Docker version 19.03.13, build 4484c46d9d

What other command should I run?

seberg (Member) commented Oct 29, 2020

I guess OpenBLAS is the best guess right now but that doesn't seem to say much, unless it is OpenBLAS threading related. One way to narrow it down could be to set the environment variable OMP_NUM_THREADS=1 and/or to check whether there is a hardware detection issue by setting OPENBLAS_CORETYPE=haswell (potentially with OPENBLAS_VERBOSE=2 to be sure it worked). Although if it was the latter, I would have somewhat expected a more useful backtrace.
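
The experiment suggested above could be scripted so that each combination of variables runs in a fresh child process. This is only a sketch: the helper names are hypothetical, and setting OPENBLAS_NUM_THREADS alongside OMP_NUM_THREADS is an assumption made here for convenience (that variable is the one reported to help later in this thread).

```python
import os
import subprocess
import sys


def openblas_overrides(threads=None, coretype=None, verbose=None):
    """Build the environment overrides; an argument left as None sets nothing."""
    overrides = {}
    if threads is not None:
        overrides["OMP_NUM_THREADS"] = str(threads)
        overrides["OPENBLAS_NUM_THREADS"] = str(threads)  # assumption: set both
    if coretype is not None:
        overrides["OPENBLAS_CORETYPE"] = coretype
    if verbose is not None:
        overrides["OPENBLAS_VERBOSE"] = str(verbose)
    return overrides


def run_with_overrides(command, **kwargs):
    """Run command in a child process so the variables exist before any library loads."""
    env = dict(os.environ, **openblas_overrides(**kwargs))
    return subprocess.run(command, env=env).returncode
```

For example, run_with_overrides([sys.executable, "odoo-bin"], threads=1) and then run_with_overrides([sys.executable, "odoo-bin"], coretype="haswell", verbose=2) would test the two hypotheses separately. On POSIX a negative return code means the child was killed by that signal number, so a segfault shows up as -signal.SIGSEGV.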

charris (Member) commented Oct 29, 2020

Not many things changed

  • BLD: set upper versions for build dependencies
  • BUG: Set deprecated fields to null in PyArray_InitArrFuncs
  • ENH: Warn on unsupported Python 3.10+
  • MAINT: Update test_requirements.txt.
  • ENH: Support for the NVIDIA HPC SDK nvfortran compiler
  • BUG: Cygwin Workaround for #14787 (Compiling numpy 1.17 on cygwin fails with `Error: invalid register for .seh_savexmm`) on affected platforms
  • BUG: Fix memory leak of buffer-info cache due to relaxed strides
  • MAINT: Backport openblas_support from master.
  • TST: Add Python 3.9 to the CI testing on Windows, Mac.
  • TST: Simplify source path names in test_extending.

The only two, other than OpenBLAS, that look possible are the memory leak fix and nulling the deprecated fields. This would be an easy bisect if we had a simple way to test.

charris (Member) commented Oct 29, 2020

@moylop260 Does your program have any C code or is it pure python?

@moylop260 (Author)

@charris

"Does your program have any C code or is it pure python."

I think it is pure Python, but it uses a lot of other libraries (maybe with C code).

"Not many things changed"

Let me go commit by commit from v1.19.2 to v1.19.3 to find out which change reproduces it,
installing each one with (v1.19.3)$ pip install .

I will be back.

moylop260 (Author) commented Oct 29, 2020

Using

  • (v1.19.3)$ pip install . 🟢 (It is running well)
  • pip install numpy==1.19.3 🔴 (It is reproducing the error)

So maybe it is not related to the code changes but to how the package is built.

Here is the output of installing and uninstalling each one:

🔴

$ pip install numpy==1.19.3
Collecting numpy==1.19.3
  Using cached numpy-1.19.3-cp36-cp36m-manylinux2010_x86_64.whl (14.9 MB)
Installing collected packages: numpy
Successfully installed numpy-1.19.3

$ pip uninstall numpy
Found existing installation: numpy 1.19.3
Uninstalling numpy-1.19.3:
  Would remove:
    /virtualenv/python3.6/bin/f2py
    /virtualenv/python3.6/bin/f2py3
    /virtualenv/python3.6/bin/f2py3.6
    /virtualenv/python3.6/lib/python3.6/site-packages/numpy-1.19.3.dist-info/*
    /virtualenv/python3.6/lib/python3.6/site-packages/numpy.libs/libgfortran-2e0d59d6.so.5.0.0
    /virtualenv/python3.6/lib/python3.6/site-packages/numpy.libs/libopenblasp-r0-a32f1dca.3.12.so
    /virtualenv/python3.6/lib/python3.6/site-packages/numpy.libs/libquadmath-2d0c479f.so.0.0.0
    /virtualenv/python3.6/lib/python3.6/site-packages/numpy.libs/libz-eb09ad1d.so.1.2.3
    /virtualenv/python3.6/lib/python3.6/site-packages/numpy/*

🟢

(v1.19.3)$ git log -r -1 --oneline
5010177cf (HEAD -> v1.19.3, tag: v1.19.3) REL: NumPy 1.19.3 release.

(v1.19.3)$ pip install .
Created wheel for numpy: filename=numpy-1.19.3-cp36-cp36m-linux_x86_64.whl size=10498376 sha256=1c798acde71e7e800b37b3b79aef4dbdb9a01a270b47e89dd43292799909ed24
  Stored in directory: /tmp/pip-ephem-wheel-cache-wk7jbnti/wheels/80/4e/3e/d6f0d3d1d0e6c064a0d152f30c9cec6

(v1.19.3)$ pip uninstall numpy
Found existing installation: numpy 1.19.3
Uninstalling numpy-1.19.3:
  Would remove:
    /virtualenv/python3.6/bin/f2py
    /virtualenv/python3.6/bin/f2py3
    /virtualenv/python3.6/bin/f2py3.6
    /virtualenv/python3.6/lib/python3.6/site-packages/numpy-1.19.3.dist-info/*
    /virtualenv/python3.6/lib/python3.6/site-packages/numpy/*

Notice that there is no numpy.libs in this build (maybe I'm missing an install parameter),
so I installed the PyPI package again:
$ pip install numpy==1.19.3

After installing it, I deleted all the files in the numpy.libs folder:

  • rm /virtualenv/python3.6/lib/python3.6/site-packages/numpy.libs/*.so

And now it is fixed! 🟢
I mean, the issue is not reproduced.

Comparing numpy.libs between 1.19.3 and 1.19.2, the shared libraries that differ are:

  • 🔴 1.19.3 numpy.libs/libopenblasp-r0-a32f1dca.3.12.so
  • 🟢 1.19.2 numpy.libs/libopenblasp-r0-ae94cfde.3.9.dev.so
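
The comparison above can be automated by reading the version fragment out of the bundled file names. This is a sketch only: the naming pattern is an assumption inferred from the two file names in this comment, and the helper names are hypothetical.

```python
import re
from pathlib import Path

# Assumed pattern: libopenblasp-r0-<hash>.<version>.so, as in the names above.
_OPENBLAS_NAME = re.compile(r"^libopenblas\w*-r\d+-[0-9a-f]+\.(?P<version>.+)\.so$")


def bundled_openblas_version(filename):
    """Return the version fragment embedded in the file name, or None."""
    match = _OPENBLAS_NAME.match(filename)
    return match.group("version") if match else None


def scan_numpy_libs(site_packages):
    """Map each shared object under <site_packages>/numpy.libs to its version fragment."""
    libs_dir = Path(site_packages) / "numpy.libs"
    if not libs_dir.is_dir():  # e.g. the local `pip install .` build shown above
        return {}
    return {p.name: bundled_openblas_version(p.name) for p in sorted(libs_dir.glob("*.so*"))}
```

With the names from this comment, the 1.19.3 wheel yields the fragment "3.12" and the 1.19.2 wheel yields "3.9.dev"; other bundled libraries such as libgfortran map to None.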

mihaimaruseac commented Oct 29, 2020

TensorFlow building against C++ Numpy crashes with OOM on the Windows VMs. This is with numpy-1.19.3, numpy 1.19.2 is all good.

Error message looks something like:

Traceback (most recent call last):
  File "\\?\T:\tmp\Bazel.runfiles_a8aatfhm\runfiles\org_tensorflow\py_test_dir\tensorflow\python\ops\control_flow_ops_test.py", line 24, in <module>
    import numpy as np
  File "C:\Python36\lib\site-packages\numpy\__init__.py", line 140, in <module>
    from . import core
  File "C:\Python36\lib\site-packages\numpy\core\__init__.py", line 98, in <module>
    from . import _add_newdocs
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 674, in exec_module
  File "<frozen importlib._bootstrap_external>", line 771, in get_code
  File "<frozen importlib._bootstrap_external>", line 482, in _validate_bytecode_header
MemoryError

I was unable to reproduce outside of TF for now, but will keep trying.

mattip (Member) commented Oct 29, 2020

Maybe we should back out the OpenBLAS 0.3.12, and put in the PR to detect a bad fmod in windows instead.

seberg (Member) commented Oct 29, 2020

That is unfortunate, but does seem likely necessary.

charris (Member) commented Oct 29, 2020

I've reopened the warning PR. I'll put it into master also, as at this point we probably don't want the 3.12 library in 1.20 either. It would be nice to pin the problem down to something easily reproducible so that OpenBLAS could be fixed.

charris added a commit to charris/numpy that referenced this issue Oct 30, 2020
Using OpenBLAS 0.3.12 led to segfaults, see numpygh-17674, so revert
to previous version. The Windows 10 version 2004 problem remains,
but we will warn instead when it is detected.

moylop260 (Author) commented Oct 30, 2020

Off-topic, just for the record, about this version:

In another environment where this error is not reproduced, a different problem shows up:
all processors were at 100% usage.

We uninstalled numpy and the problem went away.

seberg (Member) commented Oct 30, 2020

@moylop260 that sounds like an OpenBLAS issue as well, since NumPy will not use more than one thread itself normally. It is very unfortunate, but it sounds pretty safe to assume that reverting the wheel to use the older OpenBLAS version will fix it.

charris (Member) commented Oct 30, 2020

"It is very unfortunate"

We are missing some tests that might catch this sort of thing, but not sure how to do that.

charris (Member) commented Oct 30, 2020

@moylop260 Could you try installing 1.19.4 from the staging repo:

python -mpip install -i https://pypi.anaconda.org/multibuild-wheels-staging/simple numpy

I'd like to get it tested before uploading to PyPI.

@moylop260 (Author)

Hi @charris

Thanks for fixing it.

With numpy-1.19.4 installed, the issue is not reproduced anymore.

moylop260 (Author) commented Oct 30, 2020

"We are missing some tests that might catch this sort of thing, but not sure how to do that."

I think it is a corner case: we have different host servers, and they do not all reproduce the issue,
even though we are using the same docker image.

I think it is reproduced only on a particular kind of processor.

I can collect this information if you want.

Just tell me what kind of output you need.

If you need anything else, don't hesitate to ask.

charris (Member) commented Oct 30, 2020

@moylop260 Thanks for checking. Any information about the processor reported in the docker environment would be helpful to the OpenBLAS developers, we can't stick with an old library version forever :)

@mihaimaruseac Could you also check if your problem is fixed?

@mihaimaruseac

Will trigger a test build later today, sorry for the delay. We are in the process of a new TF release and the CI system is currently used for the release.

@moylop260 (Author)

I have created the following folder:

It contains the output of:

  • docker --version
  • cat /proc/cpuinfo
  • cat /proc/meminfo

And more.
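
The processor details listed above could also be summarized programmatically. The sketch below assumes the x86 Linux layout of /proc/cpuinfo (a "model name" line and a "flags" line per processor); the sample text and the helper names are hypothetical and are not the reporter's data.

```python
# Hypothetical sample in the x86 Linux /proc/cpuinfo layout (not real data).
SAMPLE = """processor\t: 0
model name\t: Example CPU @ 2.00GHz
flags\t\t: fpu sse2 avx avx2

processor\t: 1
model name\t: Example CPU @ 2.00GHz
flags\t\t: fpu sse2 avx avx2
"""


def summarize_cpuinfo(text):
    """Return the model name and the flag set of the first processor entry."""
    summary = {"model name": None, "flags": set()}
    for line in text.splitlines():
        if not line.strip():
            if summary["model name"] or summary["flags"]:
                break  # a blank line ends the first processor block
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "model name":
            summary["model name"] = value.strip()
        elif key == "flags":
            summary["flags"] = set(value.split())
    return summary


def read_cpu_summary(path="/proc/cpuinfo"):
    """Summarize the processor the current environment reports."""
    with open(path) as handle:
        return summarize_cpuinfo(handle.read())
```

Running read_cpu_summary() both on the host and inside the container would show whether the two report the same model and instruction-set flags.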

I hope it helps to build an environment that reproduces this corner case when running unit tests

Regards!

charris (Member) commented Nov 2, 2020

"I have created the following folder:"

@moylop260 Thanks.

moylop260 added a commit to vauxoo-dev/web that referenced this issue Nov 10, 2020
moylop260 added a commit to vauxoo-dev/reporting-engine that referenced this issue Nov 10, 2020
moylop260 added a commit to vauxoo-dev/account-financial-tools that referenced this issue Nov 10, 2020
moylop260 added a commit to vauxoo-dev/account-financial-tools that referenced this issue Nov 18, 2020
This issue is only reproduced using python3.6 and numpy==1.19.3
moylop260 added a commit to vauxoo-dev/account-financial-tools that referenced this issue Nov 18, 2020
This issue is only reproduced using python3.6 and numpy==1.19.3
And only on one kind of processor
 - https://drive.google.com/drive/folders/18mcYQu4GGPzwCRj14Pzy8LvfC9bcLid8?usp=sharing

Check the following history:
 - Vauxoo/maintainer-quality-tools#315
fernandahf added a commit to vauxoo-dev/docker-ubuntu-base that referenced this issue Feb 3, 2021
Currently, if numpy is available among the modules, even if you are not using it,
Odoo tries to compile it and the system goes down, but only on one type of processor

Currently we know of 2 servers that reproduce the error:

    B&F-production
    Runbot

More info:

numpy/numpy#17674
numpy/numpy#17759

It is reproduced in the following MR:

https://git.vauxoo.com/vauxoo/lasec/-/merge_requests/197

Check the following discussion https://odoo-community.org/groups/contributors-15/contributors-186006?mode=thread&date_begin=&date_end=

OpenBLAS creates a number of threads equal to the number of core threads available: 56 in my case (production server),
so it quickly reached limit_memory_hard
and the process was killed (SIGSEGV)
Forcing OPENBLAS_NUM_THREADS=1 fixed the issue.
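
The workaround described in this commit message can also be applied from Python, provided it runs before the first numpy import. This is a sketch under that assumption; the helper name is hypothetical, and also setting OMP_NUM_THREADS (suggested earlier in the thread) is a choice made here, not part of the commit.

```python
import os

THREAD_VARIABLES = ("OPENBLAS_NUM_THREADS", "OMP_NUM_THREADS")


def cap_blas_threads(limit=1, environ=None):
    """Set the thread-count variables unless something already set them.

    Assumption: this must run before numpy is first imported, because the
    library is expected to read these variables when it is loaded.
    """
    environ = os.environ if environ is None else environ
    for name in THREAD_VARIABLES:
        environ.setdefault(name, str(limit))
    return {name: environ[name] for name in THREAD_VARIABLES}
```

Using setdefault keeps any value an operator already exported, so the cap acts as a default and not as an override.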
fernandahf added a commit to vauxoo-dev/docker-ubuntu-base that referenced this issue Feb 3, 2021

embray (Contributor) commented Mar 12, 2021

Was anyone ever able to pinpoint exactly why Numpy was segfaulting with OpenBLAS 0.3.12? We are having a similar problem in SageMath with OpenBLAS 0.3.13, though it seems to be hardware-dependent.

mattip (Member) commented Mar 14, 2021

My theory is that OpenBLAS has CPU detection logic that is used to activate different optimizations. If the docker image somehow interferes with this logic, OpenBLAS will try to use features that do not exist. You may have some luck overriding the logic by setting OPENBLAS_NUM_THREADS=1 or by setting the core type with OPENBLAS_CORETYPE=???.

fernandahf added a commit to vauxoo-dev/docker-ubuntu-base that referenced this issue Apr 15, 2021
fernandahf added a commit to vauxoo-dev/maintainer-quality-tools that referenced this issue Apr 19, 2021
The reason websocket-client was deactivated is:

numpy has the following issue:
 - numpy/numpy#13059

It is a corner case that depends on a particular kind of processor, docker and python3

More info:

 - numpy/numpy#17674
 - numpy/numpy#17759

But who is using numpy?

Several projects use libraries that depend on numpy:
./web/requirements.txt:2:bokeh==1.1.0
./reporting-engine/requirements.txt:1:altair
./icm/requirements.txt:1:pandas
./maintainer-quality-tools/requirements.txt:7:websocket-client

So, we ran odoo-bin with loglevel=debug to find the last line logged before the crash.

This was the path:
 - Last logging https://github.com/odoo/odoo/blob/92ef3b2dd4655913198d10d06598b799fdcae6d0/odoo/modules/loading.py#L152
 - Using pdb, I traced it to the following line https://github.com/odoo/odoo/blob/92ef3b2dd4655913198d10d06598b799fdcae6d0/odoo/modules/module.py#L368
 - The last module imported was `resource` https://github.com/odoo/odoo/tree/92ef3b2dd4655913198d10d06598b799fdcae6d0/addons/resource
 - I removed the imports one by one, reproducing the error again and again, until one change fixed the issue: commenting out the following import: https://github.com/odoo/odoo/blob/92ef3b2dd4655913198d10d06598b799fdcae6d0/addons/resource/tests/common.py#L4
 - I debugged that file line by line and finally found the problematic import: https://github.com/odoo/odoo/blob/92ef3b2dd4655913198d10d06598b799fdcae6d0/odoo/tests/common.py#L50
 - Why does this raise the error? Because the following line: https://github.com/websocket-client/websocket-client/blob/29c15714ac9f5272e1adefc9c99b83420b409f63/websocket/_abnf.py#L34
   imports numpy if you are using python3.
   numpy is installed because of the requirements.txt files above, and that is how the disaster happened.
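
The manual hunt described above (finding which import ends up pulling in numpy) can be shortened with an import hook that records the call stack. This is a sketch using the standard importlib machinery; the class and function names are hypothetical and are not part of Odoo or websocket-client.

```python
import importlib.abc
import sys
import traceback


class ImportTracer(importlib.abc.MetaPathFinder):
    """Record the call stack when `target` is first imported.

    Modules already cached in sys.modules never reach the finders, so only
    the first import is seen.
    """

    def __init__(self, target):
        self.target = target
        self.stacks = []

    def find_spec(self, fullname, path, target=None):
        if fullname == self.target:
            # Drop the last frame (this method) so the stack ends at the importer.
            self.stacks.append(traceback.format_stack()[:-1])
        return None  # never handle the import; the regular finders do that


def trace_imports_of(name):
    """Install a tracer ahead of the regular finders and return it."""
    tracer = ImportTracer(name)
    sys.meta_path.insert(0, tracer)
    return tracer
```

Calling tracer = trace_imports_of("numpy") before the application's own imports and then printing "".join(tracer.stacks[0]) would show the chain of modules that led to the numpy import.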

We could have removed every requirement that pulls in numpy, but there are a lot of them.
Instead we decided the better (faster) option was to avoid the websocket import that imports numpy, by not installing websocket-client.

However, after researching, we found that:

OpenBLAS creates a number of threads equal to the number of core threads available,
so it quickly reached limit_memory_hard
and the process was killed (SIGSEGV)
Forcing OPENBLAS_NUM_THREADS=1 fixed the issue.

We tested this by building an image that reproduces the error and setting that environment variable, and it was fixed.

That change was applied in the following PRs:

 - Vauxoo/docker-ubuntu-base#89
 - Vauxoo/docker-ubuntu-base#90

With the change applied in docker-ubuntu-base, it is no longer necessary to avoid importing
websocket-client (which allows the JS tests to work again);
we are covered by the env var OPENBLAS_NUM_THREADS.
moylop260 pushed a commit to Vauxoo/maintainer-quality-tools that referenced this issue Apr 19, 2021
fernandahf added a commit to vauxoo-dev/docker-odoo-image that referenced this issue Apr 19, 2021
fernandahf added a commit to vauxoo-dev/docker-odoo-image that referenced this issue Apr 19, 2021
moylop260 pushed a commit to Vauxoo/docker-odoo-image that referenced this issue Apr 19, 2021
luisg123v pushed a commit to Vauxoo/reporting-engine that referenced this issue Apr 16, 2023