Large memory overhead change in numpy==1.19.5 vs 1.19.4 on ubuntu 20.04 #18141
Comments
OpenBLAS is more memory-greedy by default. The short history of this difficulty is:
These memory problems seem Docker-specific; I expect the environment isn't providing accurate information about resources. The best solution going forward may be to use OPENBLAS_NUM_THREADS to limit the number of threads. @mattip @martin-frbg Here we go again.
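As a quick illustration of that workaround (a sketch; the thread count of 1 is an arbitrary choice, not a value taken from this thread), the variable has to be set before NumPy first loads OpenBLAS:

```python
import os

# OpenBLAS reads this when its shared library is first loaded, so it
# must be set before the first `import numpy`.
os.environ["OPENBLAS_NUM_THREADS"] = "1"

import numpy as np

# Small matmul to confirm BLAS still works with a single thread.
a = np.ones((100, 100))
print(np.dot(a, a).sum())
```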
Hm, I thought numpy had settled for building OpenBLAS with
@martin-frbg It is actually
@whitty what is the lower bound of the softlimit?
Updated table in original:
For reference, I can get numpy==1.19.5 loaded within a limit of 400000000 (same as 1.19.4) using OPENBLAS_NUM_THREADS. Relating this to the number of threads changes my view of this somewhat. That said, I'm not sure I can find a satisfactory alternative rlimit that suits our requirements.
Do you have any specifics here? Is OpenBLAS trying to detect resources for scaling? All of our builders are Docker-based (to validate multiple OS versions), and I observe the failure isn't 100% reliable across all builds. Perhaps some combination of host OS and container OS is implicated?
No :) I'm just spitballing. Someone who knows more about how OpenBLAS preallocates memory will need to address that. I find it curious that the memory usage didn't change between 1.19.3 and 1.19.5; my understanding is that it should have gone down, although I suppose it is possible that the number of threads allocated increased in 1.19.5.
Nope, OpenBLAS does not currently do resource detection beyond counting the number of available cores. Is that number constant for all entries in your table of minimum soft rlimits above? (In a way it would make me happy if the problem turned out to stem from something entirely different than the GEMM buffer, but I do not remember anybody creating huge sinkholes elsewhere in the code.)
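For anyone checking this inside a container, a small illustrative snippet (not from this thread; `sched_getaffinity` is Linux-only) shows what core count a Python process observes. OpenBLAS's buffer preallocation scales roughly with the thread count it derives from that number, so a container exposing all host CPUs can look far larger than its memory limit suggests:

```python
import os

# Every CPU the kernel reports; what a naive core count sees.
print("os.cpu_count():       ", os.cpu_count())

# CPUs this process may actually run on (respects taskset/cpusets,
# but not cgroup CPU quotas such as `docker --cpus`).
print("sched_getaffinity(0): ", len(os.sched_getaffinity(0)))

# If threadpoolctl is installed, it reports the thread count OpenBLAS
# itself settled on:
#   from threadpoolctl import threadpool_info
#   print(threadpool_info())
```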
Yes, the only difference between the runs is the installed version of numpy.
I'm beginning to suspect this is all caused by the BUFFERSIZE=20 workaround not actually getting through to the compiler/preprocessor when building with gmake. (Looks like I dropped a crucial bit of my original patch that would make Makefile.system append the user-supplied BUFFERSIZE declaration to CCOMMON_OPT.) That is a bit embarrassing...
xref OpenMathLib/OpenBLAS#3066. I added that change as a patch to openblas-libs and triggered a build in MacPython/openblas-libs#50. Once the tarballs are uploaded, I can try to play with the resulting OpenBLAS and see if it fixes this issue.
For me the script gave these results:
so once @martin-frbg makes that fix official we can build new OpenBLAS libs. Aside: building with our OpenBLAS in a conda environment with openblas installed is painful: you need to override LDSHARED, since otherwise conda's python pulls LDSHARED from
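As a side note on that LDSHARED point, a small inspection snippet (illustrative only) shows which link command a given interpreter would hand to distutils-based builds, and whether an environment override is in place:

```python
import os
import sysconfig

# The default link command baked into this Python build; conda
# interpreters typically embed their own compiler wrapper here.
print("sysconfig LDSHARED:", sysconfig.get_config_var("LDSHARED"))

# An LDSHARED environment variable, if set, overrides the baked-in
# value for distutils/setuptools builds.
print("env LDSHARED:      ", os.environ.get("LDSHARED"))
```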
@whitty could you confirm that this issue should indeed be closed? You should be able to find wheels in the weekly builds in a few days |
I've validated that I can get basic loading under a 400 MB softlimit with the new wheels. Our formal build processes won't change until 1.19.6 or later comes out on pip. Thanks for your help.
Background
We run our compile jobs under daemontools to lock virtual memory usage to around 1 GB per process (important when running very, very parallel builds). During the update from 1.19.4 to 1.19.5, our nightlies trip over code that uses numpy.
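For readers without daemontools handy, a rough Python-only stand-in for that setup (a sketch, assuming Linux; `softlimit -a` is approximated with RLIMIT_AS, and the 1 GB figure mirrors the limit described above):

```python
import resource

# Cap this process's virtual address space at ~1 GB before importing
# numpy; roughly what `softlimit -a 1073741824` does.
limit = 1 * 1024 ** 3
resource.setrlimit(resource.RLIMIT_AS, (limit, limit))

# With the affected 1.19.5 wheels this import is expected to fail under
# the cap, while 1.19.4 imports fine.
import numpy as np
print(np.__version__)
```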
Fine with numpy==1.19.4 (1 GB limit).
Needed a 1.7 GB limit to build with numpy==1.19.5.
I've whittled it down to the smallest possible reproduction, which is simply importing numpy version 1.19.5 with a 1 GB virtual memory limit; see "Reproducing code example" below. To examine further I ratcheted the limit up and down with 1.19.4 and 1.19.5 to find the point at which import numpy succeeds:

softlimit -a 400000000 python3
softlimit -a 1200000000 python3
softlimit -a 1200000000 python3
softlimit -a 1200000000 python3
softlimit -a 400000000 python3

Note this is with nothing but import numpy. The overhead difference of ~800 MB matches the limit changes we needed to get builds running.

Reproducing code example:
You can reproduce this just by loading the module for version 1.19.5 on ubuntu:20.04 with daemontools. See below for a Dockerfile that sets up a clean 20.04 image to reproduce.
Invoked with:
softlimit -a 1073741824 python3
But OK with 1.19.4:
Dockerfile for reproducing - vanilla 20.04 + python3 + pip + softlimit
Error message:
NumPy/Python version information:
1.19.5
Let me know if there is any more information I can give you.