Large memory overhead change in numpy==1.19.5 vs 1.19.4 on ubuntu 20.04 #18141

Closed
whitty opened this issue Jan 9, 2021 · 14 comments · Fixed by #18196

whitty commented Jan 9, 2021

Background

We run our compile jobs under daemontools to lock virtual memory usage to around 1GB per process (important when running highly parallel builds). During the update from 1.19.4 -> 1.19.5 our nightlies started tripping over code that uses numpy.

Fine with numpy==1.19.4 (1GB):

softlimit -a $(expr 1024 \* 1024 \* 1024) make

Needed to build with numpy==1.19.5 (1.7GB):

softlimit -a $(expr 1740 \* 1024 \* 1024) make

I've whittled it down to the smallest possible reproduction, which is simply importing numpy 1.19.5 with a 1GB virtual memory limit; see "Reproducing code example" below.

To examine further, I ratcheted the limit up and down with 1.19.4 and 1.19.5 to find the point at which the import succeeds; a sketch of how this search can be scripted follows the table.

version      approx. minimum import limit
1.19.4       softlimit -a  400000000 python3
1.19.5       softlimit -a 1200000000 python3
1.20.0rc2    softlimit -a 1200000000 python3
1.19.3       softlimit -a 1200000000 python3
1.19.2       softlimit -a  400000000 python3

Note this is with nothing but import numpy. The ~800MB difference in overhead matches the limit changes we needed to get our builds running.
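For reference, a minimal sketch of how that search can be scripted (assuming daemontools' softlimit is on PATH, as in the Dockerfile further down; illustrative only, not necessarily how the table above was produced):

# Step an address-space limit upward until `import numpy` first succeeds.
import subprocess

def import_succeeds(limit_bytes):
    """Return True if `import numpy` completes under the given softlimit."""
    result = subprocess.run(
        ["softlimit", "-a", str(limit_bytes), "python3", "-c", "import numpy"],
        capture_output=True,
    )
    return result.returncode == 0

step = 100_000_000  # 100 MB increments
limit = step
while not import_succeeds(limit):
    limit += step
print(f"approx minimum import limit: {limit}")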

Reproducing code example:

You can reproduce this just by loading the module for version 1.19.5 on ubuntu:20.04 with daemontools; see below for a Dockerfile that sets up a clean 20.04 environment to reproduce.

import numpy as np

Invoked with softlimit -a 1073741824 python3

root@88e4e56b16a1:~# pip3 install numpy==1.19.5
Collecting numpy==1.19.5
  Downloading numpy-1.19.5-cp38-cp38-manylinux2010_x86_64.whl (14.9 MB)
     |████████████████████████████████| 14.9 MB 10.7 MB/s 
Installing collected packages: numpy
  Attempting uninstall: numpy
    Found existing installation: numpy 1.19.4
    Uninstalling numpy-1.19.4:
      Successfully uninstalled numpy-1.19.4
Successfully installed numpy-1.19.5
root@88e4e56b16a1:~# softlimit -a 1073741824 python3
Python 3.8.5 (default, Jul 28 2020, 12:59:40) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/numpy/__init__.py", line 143, in <module>
    from . import lib
  File "/usr/local/lib/python3.8/dist-packages/numpy/lib/__init__.py", line 40, in <module>
    from .arraypad import *
  File "/usr/local/lib/python3.8/dist-packages/numpy/lib/arraypad.py", line 533, in <module>
    def pad(array, pad_width, mode='constant', **kwargs):
  File "/usr/local/lib/python3.8/dist-packages/numpy/core/overrides.py", line 180, in decorator
    source_object = compile(
MemoryError
>>> 

But ok with 1.19.4:

root@88e4e56b16a1:~# pip3 install numpy==1.19.4
Collecting numpy==1.19.4
  Using cached numpy-1.19.4-cp38-cp38-manylinux2010_x86_64.whl (14.5 MB)
Installing collected packages: numpy
  Attempting uninstall: numpy
    Found existing installation: numpy 1.19.5
    Uninstalling numpy-1.19.5:
      Successfully uninstalled numpy-1.19.5
Successfully installed numpy-1.19.4
root@88e4e56b16a1:~# softlimit -a 1073741824 python3
Python 3.8.5 (default, Jul 28 2020, 12:59:40) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> 

Dockerfile for reproducing - vanilla 20.04 + python3 + pip + softlimit

cat Dockerfile
FROM ubuntu:20.04

RUN apt-get update && apt-get install -y \
        python3 python3-pip \
        daemontools \
        && apt-get clean
docker build -t numpylimit .
docker run --rm -it numpylimit /bin/bash

Error message:

>>> import numpy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/numpy/__init__.py", line 143, in <module>
    from . import lib
  File "/usr/local/lib/python3.8/dist-packages/numpy/lib/__init__.py", line 40, in <module>
    from .arraypad import *
  File "/usr/local/lib/python3.8/dist-packages/numpy/lib/arraypad.py", line 533, in <module>
    def pad(array, pad_width, mode='constant', **kwargs):
  File "/usr/local/lib/python3.8/dist-packages/numpy/core/overrides.py", line 180, in decorator
    source_object = compile(
MemoryError

NumPy/Python version information:

1.19.5 (Python 3.8.5, per the transcripts above)

Let me know if there is any more information I can give you.

charris (Member) commented Jan 9, 2021

OpenBLAS is more memory greedy by default. The short history of this difficulty is:

These memory problems seem docker specific; I expect the environment isn't providing accurate information about resources. The best solution going forward may be to use OPENBLAS_NUM_THREADS to limit the number of threads.
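To spell out the mechanics of that suggestion (a minimal sketch, only an illustration): the variable has to be in the environment before numpy, and hence OpenBLAS, is loaded, either exported in the shell or set at the top of the script.

# Cap OpenBLAS's thread count before the library is loaded; OpenBLAS reads
# OPENBLAS_NUM_THREADS at load time, so this must come before `import numpy`.
import os
os.environ["OPENBLAS_NUM_THREADS"] = "3"

import numpy  # the bundled OpenBLAS now sizes its thread pool for 3 threads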

@mattip @martin-frbg Here we go again.

martin-frbg commented

Hm, I thought numpy had settled on building OpenBLAS with BUFFERSIZE=22 to keep the "old" memory footprint with 32MB GEMM buffers (at the expense of risking segfaults with large matrix sizes)?

charris (Member) commented Jan 9, 2021

@martin-frbg It is actually BUFFERSIZE=20. This may be a different problem, but it looks the same at first glance.
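For context, a rough back-of-the-envelope of what those values mean, assuming my reading of OpenBLAS is right that the per-GEMM-buffer size is (32 << BUFFERSIZE) bytes (treat the formula as an assumption, not something verified against the exact build shipped here):

# Hedged arithmetic: buffer size implied by BUFFERSIZE, assuming
# BUFFER_SIZE = (32 << BUFFERSIZE) bytes.
for buffersize in (20, 22):
    print(f"BUFFERSIZE={buffersize}: {(32 << buffersize) // 2**20} MiB per GEMM buffer")
# BUFFERSIZE=20 ->  32 MiB (the "old" footprint mentioned above)
# BUFFERSIZE=22 -> 128 MiB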

mattip (Member) commented Jan 9, 2021

@whitty what is the lower bound of the softlimit for 1.19.3 (and if it is not too much trouble 1.19.2)?

whitty (Author) commented Jan 9, 2021

@whitty what is the lower bound of the softlimit for 1.19.3 (and if it is not too much trouble 1.19.2)?

Updated table in original:

version      approx. minimum import limit
1.19.4       softlimit -a  400000000 python3
1.19.5       softlimit -a 1200000000 python3
1.20.0rc2    softlimit -a 1200000000 python3
1.19.3       softlimit -a 1200000000 python3
1.19.2       softlimit -a  400000000 python3

whitty (Author) commented Jan 10, 2021

The best solution going forward may be to use OPENBLAS_NUM_THREADS to limit the number of threads.

For reference, I can get numpy==1.19.5 loaded under a limit of 400000000 (same as 1.19.4) using OPENBLAS_NUM_THREADS=3.

Relating this to the number of threads changes my view somewhat: softlimit -a restricts all process mappings, not actual memory usage. It sounds like the culprit here is threads reserving their stacks, which isn't really a risk for the concurrent memory usage we are trying to limit.
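For what it's worth, the same point can be illustrated without daemontools, assuming softlimit -a corresponds to the address-space limit (RLIMIT_AS), against which reserved thread stacks count even if the pages are never touched:

# Reproduce the limit without daemontools (assumption: softlimit -a ~ RLIMIT_AS).
import resource

one_gib = 1024 ** 3
resource.setrlimit(resource.RLIMIT_AS, (one_gib, one_gib))

import numpy  # MemoryError with 1.19.5 under this limit, fine with 1.19.4 (per the transcripts above)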

That said, I'm not sure I can find a satisfactory alternative rlimit that suits our requirements.

These memory problems seem docker specific, I expect the environment isn't providing accurate information about resources.

Do you have any specifics here? Is OpenBLAS trying to detect resources for scaling? All of our builders are docker-based (to validate multiple OS versions), and I observe the failure isn't 100% reliable across all builds. Perhaps some combination of host OS and container OS is implicated?

charris (Member) commented Jan 10, 2021

Do you have any specifics here?

No :) I'm just spitballing. Someone who knows more about how OpenBLAS preallocates memory will need to address that. I find it curious that the memory usage didn't change between 1.19.3 and 1.19.5; my understanding is that it should have gone down, although I suppose it is possible that the number of threads allocated increased in 1.19.5.

martin-frbg commented

Nope, OpenBLAS does not currently do resource detection beyond counting the number of available cores. Is that number constant for all entries in your table of minimum softlimits above? (In a way it would make me happy if the problem turned out to stem from something entirely different from the GEMM buffer, but I do not remember anybody creating huge sinkholes elsewhere in the code.)
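One way to check that count per builder, using only the standard library (inside docker, os.cpu_count() typically reports the host's cores, while sched_getaffinity reflects any cpuset restriction):

# Compare the core counts a container actually sees.
import os

print("os.cpu_count():            ", os.cpu_count())
print("len(sched_getaffinity(0)): ", len(os.sched_getaffinity(0)))

The threadpoolctl package (a separate pip install) can also report the num_threads the bundled OpenBLAS actually settled on.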

whitty (Author) commented Jan 11, 2021

Is that number constant for all entries in your table of minimum softlimits above?

Yes, the only difference between the runs is the installed version of numpy.

martin-frbg commented

I'm beginning to suspect this is all caused by the BUFFERSIZE=20 workaround not actually getting through to the compiler/preprocessor when building with gmake. (Looks like I dropped a crucial bit of my original patch that would make Makefile.system append the user-supplied BUFFERSIZE definition to CCOMMON_OPT.) That is a bit embarrassing...

mattip (Member) commented Jan 14, 2021

xref OpenMathLib/OpenBLAS#3066

I added that change as a patch to openblas-libs and triggered a build in MacPython/openblas-libs#50. Once the tarballs are uploaded, I can try to play with the resulting openblas and see if it fixes this issue.

mattip (Member) commented Jan 14, 2021

For me the script

LIMIT=XXX; softlimit -a ${LIMIT}00000000 python -c "import numpy; print(numpy.__version__)"

gave these results:

version                           minimum XXX
1.19.3                             34
1.19.4                             11
1.19.5                             34
with MacPython/openblas-libs#50    11

So once @martin-frbg makes that fix official, we can build new openblas libs.

Aside: building with our openblas in a conda environment that also has openblas installed is painful: you need to override LDSHARED, since otherwise conda's python pulls LDSHARED from sysconfig.get_config_var('LDSHARED'), which adds a -L directive for the conda-supplied libopenblas ahead of the one added by site.cfg.
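A quick way to inspect that value (sketch only; the override is just one workaround, not a description of how the official wheels are built):

# Show the link command distutils will use for extension modules; distutils
# also honors an LDSHARED environment variable, so exporting a corrected value
# before running setup.py is one way to override it.
import sysconfig

print(sysconfig.get_config_var("LDSHARED"))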

mattip (Member) commented Jan 20, 2021

@whitty could you confirm that this issue should indeed be closed? You should be able to find wheels in the weekly builds in a few days.

whitty (Author) commented Jan 26, 2021

I've validated I can get basic loading under softlimit 400MB with numpy-1.21.0.dev0+485.gbec2b07db-cp38-cp38-manylinux2010_x86_64.whl, and a manually invoked build seems to build OK under 1GB softlimit.

Our formal build processes won't see any change until 1.19.6 or later comes out on pip.

Thanks for your help
