Large memory overhead change in numpy==1.19.5 vs 1.19.4 on ubuntu 20.04 #18141

Closed
whitty opened this issue Jan 9, 2021 · 14 comments · Fixed by #18196

whitty commented Jan 9, 2021

Background

We run our compile jobs under daemontools to lock virtual memory usage to around 1GB per process (important when running highly parallel builds). During the update from 1.19.4 -> 1.19.5 our nightlies started tripping over code that uses numpy.

Fine with numpy==1.19.4 (1GB):

softlimit -a $(expr 1024 \* 1024 \* 1024) make

Needed to build with numpy==1.19.5 (1.7GB):

softlimit -a $(expr 1740 \* 1024 \* 1024) make

I've whittled it down to the smallest possible reproduction, which is simply importing numpy 1.19.5 with a 1GB virtual memory limit; see "Reproducing code example" below.

To examine further, I ratcheted the limit up and down with 1.19.4 and 1.19.5 to find the point at which the import succeeds; a sketch of how this search can be scripted follows the table.

version      approx. minimum import limit
1.19.4       softlimit -a  400000000 python3
1.19.5       softlimit -a 1200000000 python3
1.20.0rc2    softlimit -a 1200000000 python3
1.19.3       softlimit -a 1200000000 python3
1.19.2       softlimit -a  400000000 python3

Note this is with nothing but import numpy. The ~800MB difference in overhead matches the limit changes we needed to get our builds running.
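For reference, a minimal sketch of how that search can be scripted (assuming daemontools' softlimit is on PATH, as in the Dockerfile further down; illustrative only, not necessarily how the table above was produced):

# Step an address-space limit upward until `import numpy` first succeeds.
import subprocess

def import_succeeds(limit_bytes):
    """Return True if `import numpy` completes under the given softlimit."""
    result = subprocess.run(
        ["softlimit", "-a", str(limit_bytes), "python3", "-c", "import numpy"],
        capture_output=True,
    )
    return result.returncode == 0

step = 100_000_000  # 100 MB increments
limit = step
while not import_succeeds(limit):
    limit += step
print(f"approx minimum import limit: {limit}")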

Reproducing code example:

You can reproduce this just by loading the module for version 1.19.5 on ubuntu:20.04 with daemontools; see below for a Dockerfile that sets up a clean 20.04 environment to reproduce.

import numpy as np

Invoked with softlimit -a 1073741824 python3

root@88e4e56b16a1:~# pip3 install numpy==1.19.5
Collecting numpy==1.19.5
  Downloading numpy-1.19.5-cp38-cp38-manylinux2010_x86_64.whl (14.9 MB)
     |████████████████████████████████| 14.9 MB 10.7 MB/s 
Installing collected packages: numpy
  Attempting uninstall: numpy
    Found existing installation: numpy 1.19.4
    Uninstalling numpy-1.19.4:
      Successfully uninstalled numpy-1.19.4
Successfully installed numpy-1.19.5
root@88e4e56b16a1:~# softlimit -a 1073741824 python3
Python 3.8.5 (default, Jul 28 2020, 12:59:40) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/numpy/__init__.py", line 143, in <module>
    from . import lib
  File "/usr/local/lib/python3.8/dist-packages/numpy/lib/__init__.py", line 40, in <module>
    from .arraypad import *
  File "/usr/local/lib/python3.8/dist-packages/numpy/lib/arraypad.py", line 533, in <module>
    def pad(array, pad_width, mode='constant', **kwargs):
  File "/usr/local/lib/python3.8/dist-packages/numpy/core/overrides.py", line 180, in decorator
    source_object = compile(
MemoryError
>>> 

But ok with 1.19.4:

root@88e4e56b16a1:~# pip3 install numpy==1.19.4
Collecting numpy==1.19.4
  Using cached numpy-1.19.4-cp38-cp38-manylinux2010_x86_64.whl (14.5 MB)
Installing collected packages: numpy
  Attempting uninstall: numpy
    Found existing installation: numpy 1.19.5
    Uninstalling numpy-1.19.5:
      Successfully uninstalled numpy-1.19.5
Successfully installed numpy-1.19.4
root@88e4e56b16a1:~# softlimit -a 1073741824 python3
Python 3.8.5 (default, Jul 28 2020, 12:59:40) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> 

Dockerfile for reproducing - vanilla 20.04 + python3 + pip + softlimit

cat Dockerfile
FROM ubuntu:20.04

RUN apt-get update && apt-get install -y \
        python3 python3-pip \
        daemontools \
        && apt-get clean
docker build -t numpylimit .
docker run --rm -it numpylimit /bin/bash

Error message:

>>> import numpy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/numpy/__init__.py", line 143, in <module>
    from . import lib
  File "/usr/local/lib/python3.8/dist-packages/numpy/lib/__init__.py", line 40, in <module>
    from .arraypad import *
  File "/usr/local/lib/python3.8/dist-packages/numpy/lib/arraypad.py", line 533, in <module>
    def pad(array, pad_width, mode='constant', **kwargs):
  File "/usr/local/lib/python3.8/dist-packages/numpy/core/overrides.py", line 180, in decorator
    source_object = compile(
MemoryError

NumPy/Python version information:

1.19.5 (Python 3.8.5, per the transcripts above)

Let me know if there is any more information I can give you.

charris (Member) commented Jan 9, 2021

OpenBLAS is more memory greedy by default. The short history of this difficulty is:

These memory problems seem docker specific; I expect the environment isn't providing accurate information about resources. The best solution going forward may be to use OPENBLAS_NUM_THREADS to limit the number of threads.
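To spell out the mechanics of that suggestion (a minimal sketch, only an illustration): the variable has to be in the environment before numpy, and hence OpenBLAS, is loaded, either exported in the shell or set at the top of the script.

# Cap OpenBLAS's thread count before the library is loaded; OpenBLAS reads
# OPENBLAS_NUM_THREADS at load time, so this must come before `import numpy`.
import os
os.environ["OPENBLAS_NUM_THREADS"] = "3"

import numpy  # the bundled OpenBLAS now sizes its thread pool for 3 threads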

@mattip @martin-frbg Here we go again.

martin-frbg commented

Hm, I thought numpy had settled on building OpenBLAS with BUFFERSIZE=22 to keep the "old" memory footprint with 32MB GEMM buffers (at the expense of risking segfaults with large matrix sizes)?

charris (Member) commented Jan 9, 2021

@martin-frbg It is actually BUFFERSIZE=20. This may be a different problem, but it looks the same at first glance.
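For context, a rough back-of-the-envelope of what those values mean, assuming my reading of OpenBLAS is right that the per-GEMM-buffer size is (32 << BUFFERSIZE) bytes (treat the formula as an assumption, not something verified against the exact build shipped here):

# Hedged arithmetic: buffer size implied by BUFFERSIZE, assuming
# BUFFER_SIZE = (32 << BUFFERSIZE) bytes.
for buffersize in (20, 22):
    print(f"BUFFERSIZE={buffersize}: {(32 << buffersize) // 2**20} MiB per GEMM buffer")
# BUFFERSIZE=20 ->  32 MiB (the "old" footprint mentioned above)
# BUFFERSIZE=22 -> 128 MiB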

mattip (Member) commented Jan 9, 2021

@whitty what is the lower bound of the softlimit for 1.19.3 (and if it is not too much trouble 1.19.2)?

whitty (Author) commented Jan 9, 2021

@whitty what is the lower bound of the softlimit for 1.19.3 (and if it is not too much trouble 1.19.2)?

Updated table in original:

version      approx. minimum import limit
1.19.4       softlimit -a  400000000 python3
1.19.5       softlimit -a 1200000000 python3
1.20.0rc2    softlimit -a 1200000000 python3
1.19.3       softlimit -a 1200000000 python3
1.19.2       softlimit -a  400000000 python3

whitty (Author) commented Jan 10, 2021

The best solution going forward may be to use OPENBLAS_NUM_THREADS to limit the number of threads.

For reference, I can get numpy==1.19.5 loaded under a limit of 400000000 (same as 1.19.4) using OPENBLAS_NUM_THREADS=3.

Relating this to the number of threads changes my view somewhat: softlimit -a restricts all process mappings, not actual memory usage. It sounds like the culprit here is threads reserving their stacks, which isn't really a risk for the concurrent memory usage we are trying to limit.
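For what it's worth, the same point can be illustrated without daemontools, assuming softlimit -a corresponds to the address-space limit (RLIMIT_AS), against which reserved thread stacks count even if the pages are never touched:

# Reproduce the limit without daemontools (assumption: softlimit -a ~ RLIMIT_AS).
import resource

one_gib = 1024 ** 3
resource.setrlimit(resource.RLIMIT_AS, (one_gib, one_gib))

import numpy  # MemoryError with 1.19.5 under this limit, fine with 1.19.4 (per the transcripts above)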

That said, I'm not sure I can find a satisfactory alternative rlimit that suits our requirements.

These memory problems seem docker specific, I expect the environment isn't providing accurate information about resources.

Do you have any specifics here? Is OpenBLAS trying to detect resources for scaling? All of our builders are docker-based (to validate multiple OS versions), and I observe the failure isn't 100% reliable across all builds. Perhaps some combination of host OS and container OS is implicated?

charris (Member) commented Jan 10, 2021

Do you have any specifics here?

No :) I'm just spitballing. Someone who knows more about how OpenBLAS preallocates memory will need to address that. I find it curious that the memory usage didn't change between 1.19.3 and 1.19.5; my understanding is that it should have gone down, although I suppose it is possible that the number of threads allocated increased in 1.19.5.

martin-frbg commented

Nope, OpenBLAS does not currently do resource detection beyond counting the number of available cores. Is that number constant for all entries in your table of minimum softlimits above? (In a way it would make me happy if the problem turned out to stem from something entirely different from the GEMM buffer, but I do not remember anybody creating huge sinkholes elsewhere in the code.)
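One way to check that count per builder, using only the standard library (inside docker, os.cpu_count() typically reports the host's cores, while sched_getaffinity reflects any cpuset restriction):

# Compare the core counts a container actually sees.
import os

print("os.cpu_count():            ", os.cpu_count())
print("len(sched_getaffinity(0)): ", len(os.sched_getaffinity(0)))

The threadpoolctl package (a separate pip install) can also report the num_threads the bundled OpenBLAS actually settled on.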

whitty (Author) commented Jan 11, 2021

Is that number constant for all entries in your table of minimum softlimits above?

Yes, the only difference between the runs is the installed version of numpy.

martin-frbg commented

I'm beginning to suspect this is all caused by the BUFFERSIZE=20 workaround not actually getting through to the compiler/preprocessor when building with gmake. (Looks like I dropped a crucial bit of my original patch that would make Makefile.system append the user-supplied BUFFERSIZE definition to CCOMMON_OPT.) That is a bit embarrassing...

mattip (Member) commented Jan 14, 2021

xref OpenMathLib/OpenBLAS#3066

I added that change as a patch to openblas-libs and triggered a build in MacPython/openblas-libs#50. Once the tarballs are uploaded, I can try to play with the resulting openblas and see if it fixes this issue.

mattip (Member) commented Jan 14, 2021

For me the script

LIMIT=XXX; softlimit -a ${LIMIT}00000000 python -c "import numpy; print(numpy.__version__)"

gave these results:

version                           minimum XXX
1.19.3                             34
1.19.4                             11
1.19.5                             34
with MacPython/openblas-libs#50    11

So once @martin-frbg makes that fix official, we can build new openblas libs.

Aside: building with our openblas in a conda environment that also has openblas installed is painful: you need to override LDSHARED, since otherwise conda's python pulls LDSHARED from sysconfig.get_config_var('LDSHARED'), which adds a -L directive for the conda-supplied libopenblas ahead of the one added by site.cfg.
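A quick way to inspect that value (sketch only; the override is just one workaround, not a description of how the official wheels are built):

# Show the link command distutils will use for extension modules; distutils
# also honors an LDSHARED environment variable, so exporting a corrected value
# before running setup.py is one way to override it.
import sysconfig

print(sysconfig.get_config_var("LDSHARED"))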

mattip (Member) commented Jan 20, 2021

@whitty could you confirm that this issue should indeed be closed? You should be able to find wheels in the weekly builds in a few days.

whitty (Author) commented Jan 26, 2021

I've validated I can get basic loading under softlimit 400MB with numpy-1.21.0.dev0+485.gbec2b07db-cp38-cp38-manylinux2010_x86_64.whl, and a manually invoked build seems to build OK under 1GB softlimit.

Our formal build processes won't see any change until 1.19.6 or later comes out on pip.

Thanks for your help
