BUG: precision error when using jemalloc as memory allocator #27988


Closed
pedroaugustosmribeiro opened this issue Dec 12, 2024 · 7 comments

@pedroaugustosmribeiro

Describe the issue:

When testing NumPy with jemalloc as the memory allocator (for performance reasons), one test from numpy.core.test fails (below) because of precision. When using the standard Linux malloc, all tests pass (all of numpy.test).

Reproduce the code example:

LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libjemalloc.so" python -c 'import numpy;numpy.core.test()'
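
The specific failing comparison can also be reproduced outside the test runner with a short script adapted from the test body shown in the traceback below; whether the assertion trips depends on the allocator and math library in use.

# repro_logspace.py -- standalone version of the failing assertion from
# TestLogspace.test_start_stop_array; may or may not fail depending on the
# allocator and math library.
import numpy as np

start = np.array([0., 1.])
stop = np.array([6., 7.])

# One vectorized call with an array start and a scalar stop ...
t3 = np.logspace(start, stop[0], 6)

# ... versus per-element calls stacked into the same shape.
t4 = np.stack([np.logspace(_start, stop[0], 6) for _start in start], axis=1)

print("max abs difference:", np.max(np.abs(t3 - t4)))
np.testing.assert_equal(t3, t4)  # raises AssertionError if any element differs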

Error message:

================================================================ FAILURES ================================================================
___________________________________________________ TestLogspace.test_start_stop_array ___________________________________________________

self = <numpy.core.tests.test_function_base.TestLogspace object at 0x7fcbed9a8860>

    def test_start_stop_array(self):
        start = array([0., 1.])
        stop = array([6., 7.])
        t1 = logspace(start, stop, 6)
        t2 = stack([logspace(_start, _stop, 6)
                    for _start, _stop in zip(start, stop)], axis=1)
        assert_equal(t1, t2)
        t3 = logspace(start, stop[0], 6)
        t4 = stack([logspace(_start, stop[0], 6)
                    for _start in start], axis=1)
>       assert_equal(t3, t4)

self       = <numpy.core.tests.test_function_base.TestLogspace object at 0x7fcbed9a8860>
start      = array([0., 1.])
stop       = array([6., 7.])
t1         = array([[1.00000000e+00, 1.00000000e+01],
       [1.58489319e+01, 1.58489319e+02],
       [2.51188643e+02, 2.51188643e+...   [3.98107171e+03, 3.98107171e+04],
       [6.30957344e+04, 6.30957344e+05],
       [1.00000000e+06, 1.00000000e+07]])
t2         = array([[1.00000000e+00, 1.00000000e+01],
       [1.58489319e+01, 1.58489319e+02],
       [2.51188643e+02, 2.51188643e+...   [3.98107171e+03, 3.98107171e+04],
       [6.30957344e+04, 6.30957344e+05],
       [1.00000000e+06, 1.00000000e+07]])
t3         = array([[1.00000000e+00, 1.00000000e+01],
       [1.58489319e+01, 1.00000000e+02],
       [2.51188643e+02, 1.00000000e+...   [3.98107171e+03, 1.00000000e+04],
       [6.30957344e+04, 1.00000000e+05],
       [1.00000000e+06, 1.00000000e+06]])
t4         = array([[1.00000000e+00, 1.00000000e+01],
       [1.58489319e+01, 1.00000000e+02],
       [2.51188643e+02, 1.00000000e+...   [3.98107171e+03, 1.00000000e+04],
       [6.30957344e+04, 1.00000000e+05],
       [1.00000000e+06, 1.00000000e+06]])

miniconda3/envs/murilo/lib/python3.12/site-packages/numpy/core/tests/test_function_base.py:65:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

args = (<built-in function eq>, array([[1.00000000e+00, 1.00000000e+01],
       [1.58489319e+01, 1.00000000e+02],
       [2.5...  [3.98107171e+03, 1.00000000e+04],
       [6.30957344e+04, 1.00000000e+05],
       [1.00000000e+06, 1.00000000e+06]]))
kwds = {'err_msg': '', 'header': 'Arrays are not equal', 'strict': False, 'verbose': True}

    @wraps(func)
    def inner(*args, **kwds):
        with self._recreate_cm():
>           return func(*args, **kwds)
E           AssertionError:
E           Arrays are not equal
E
E           Mismatched elements: 2 / 12 (16.7%)
E           Max absolute difference: 4.54747351e-13
E           Max relative difference: 1.1422737e-16
E            x: array([[1.000000e+00, 1.000000e+01],
E                  [1.584893e+01, 1.000000e+02],
E                  [2.511886e+02, 1.000000e+03],...
E            y: array([[1.000000e+00, 1.000000e+01],
E                  [1.584893e+01, 1.000000e+02],
E                  [2.511886e+02, 1.000000e+03],...

args       = (<built-in function eq>, array([[1.00000000e+00, 1.00000000e+01],
       [1.58489319e+01, 1.00000000e+02],
       [2.5...  [3.98107171e+03, 1.00000000e+04],
       [6.30957344e+04, 1.00000000e+05],
       [1.00000000e+06, 1.00000000e+06]]))
func       = <function assert_array_compare at 0x7fcbf6938540>
kwds       = {'err_msg': '', 'header': 'Arrays are not equal', 'strict': False, 'verbose': True}
self       = <contextlib._GeneratorContextManager object at 0x7fcbf69161e0>

Python and NumPy Versions:

NumPy: 1.26.4
Python: 3.12.5 | packaged by conda-forge | (main, Aug 8 2024, 18:36:51) [GCC 12.4.0]

Runtime Environment:

[{'numpy_version': '1.26.4',
'python': '3.12.5 | packaged by conda-forge | (main, Aug 8 2024, 18:36:51) '
'[GCC 12.4.0]',
'uname': uname_result(system='Linux', node='***', release='5.15.167.4-microsoft-standard-WSL2', version='#1 SMP Tue Nov 5 00:21:55 UTC 2024', machine='x86_64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2',
'AVX512F',
'AVX512CD',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL'],
'not_found': ['AVX512_KNL', 'AVX512_KNM', 'AVX512_SPR']}},
{'architecture': 'Cooperlake',
'filepath': '/home/pedro/miniconda3/envs/murilo/lib/libopenblasp-r0.3.28.so',
'internal_api': 'openblas',
'num_threads': 24,
'prefix': 'libopenblas',
'threading_layer': 'openmp',
'user_api': 'blas',
'version': '0.3.28'},
{'filepath': '/home/pedro/miniconda3/envs/murilo/lib/libomp.so',
'internal_api': 'openmp',
'num_threads': 24,
'prefix': 'libomp',
'user_api': 'openmp',
'version': None}]

Context for the issue:

For High Performance Computing jobs, using jemalloc as the memory allocator can decrease memory usage and speed up allocation, and it is recommended for use with Dask.
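
As a side note, a quick Linux-only way to confirm that the preloaded allocator actually took effect is to look for libjemalloc in the process's memory maps; a minimal sketch:

# check_preload.py -- Linux-only sketch: confirm that the LD_PRELOAD'ed
# jemalloc is actually mapped into this process before trusting any
# benchmark or test results.
with open("/proc/self/maps") as maps:
    loaded = any("libjemalloc" in line for line in maps)
print("jemalloc loaded:", loaded)

Run it the same way as the reproducer above, e.g. LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libjemalloc.so" python check_preload.py.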

@pedroaugustosmribeiro
Author

pedroaugustosmribeiro commented Dec 12, 2024

For the NumPy and Python versions below, all tests pass:
NumPy: 2.2.0
Python: 3.13.1 experimental free-threading build | packaged by conda-forge | (main, Dec 5 2024, 21:20:03) [GCC 13.3.0]

Runtime:
[{'numpy_version': '2.2.0',
'python': '3.13.1 experimental free-threading build | packaged by '
'conda-forge | (main, Dec 5 2024, 21:20:03) [GCC 13.3.0]',
'uname': uname_result(system='Linux', node='***', release='5.15.167.4-microsoft-standard-WSL2', version='#1 SMP Tue Nov 5 00:21:55 UTC 2024', machine='x86_64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2',
'AVX512F',
'AVX512CD',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL'],
'not_found': ['AVX512_KNL', 'AVX512_KNM', 'AVX512_SPR']}},
{'architecture': 'Cooperlake',
'filepath': '/home/pedro/miniconda3/envs/pyft/lib/libopenblasp-r0.3.28.so',
'internal_api': 'openblas',
'num_threads': 24,
'prefix': 'libopenblas',
'threading_layer': 'openmp',
'user_api': 'blas',
'version': '0.3.28'},
{'filepath': '/home/pedro/miniconda3/envs/pyft/lib/libomp.so',
'internal_api': 'openmp',
'num_threads': 24,
'prefix': 'libomp',
'user_api': 'openmp',
'version': None}]

@seberg
Member

seberg commented Dec 12, 2024

My suspicion would be that this is pretty harmless. The power implementation has a scalar fall-back that uses the native math lib (not sure if it wouldn't be nicer to generally use the SIMD path even in the scalar case to get stable results).

Either way, it also decides whether it is taken based on a memory overlap check, and that check was slightly wrong. Maybe jemalloc allocates differently.

I.e. you are basically running into: #26972
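
A small sketch of how such a path difference can show up at the Python level; the two results may agree exactly or differ in the last bit, depending on the build and math library (the test failure above shows a relative difference of ~1e-16, i.e. a difference in the last bit of a float64):

# ulp_compare.py -- sketch: compare np.power over a whole array (the
# vectorized path) with per-element scalar calls (which may take a
# different internal code path). Any difference should be on the order
# of one ULP, i.e. a relative difference of ~1e-16 for float64.
import numpy as np

exponents = np.linspace(0.0, 7.0, 1000)
vectorized = np.power(10.0, exponents)
elementwise = np.array([np.power(10.0, float(e)) for e in exponents])

diff = np.abs(vectorized - elementwise)
print("max abs difference:", diff.max())
print("max rel difference:", (diff / np.abs(elementwise)).max())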

@pedroaugustosmribeiro
Author

> My suspicion would be that this is pretty harmless. The power implementation has a scalar fall-back that uses the native math lib (not sure if it wouldn't be nicer to generally use the SIMD path even in the scalar case to get stable results).
>
> Either way, it also decides whether it is taken based on a memory overlap check, and that check was slightly wrong. Maybe jemalloc allocates differently.
>
> I.e. you are basically running into: #26972

Thank you for the prompt reply @seberg.
I'm not an expert in C or low-level programming, but the whole point of using jemalloc is to allocate memory differently; as page 29 of the zendnn-user-guide says: "jemalloc is a memory allocator that emphasizes fragmentation avoidance and scalable concurrency support."

Anyway, I guess you are right that this was fixed in #26972, since testing with a newer NumPy version passed all tests.
As a side note, in some tests with the older NumPy 1.24 and Dask (using xarray), it was significantly faster using jemalloc, with no apparent errors.

@seberg
Member

seberg commented Dec 13, 2024

Closing then, since I don't think you are worried about the actual precision change, just the test failure observation.

@seberg seberg closed this as completed Dec 13, 2024
@seberg
Member

seberg commented Dec 13, 2024

FWIW, Python 3.13 switched to mimalloc, IIUC. I doubt that it is preloaded to override malloc, but we should maybe make sure to go through Python in the future to get it (rather than using malloc).
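
For reference, NumPy's own allocation policy (NEP 49) is a separate layer from an LD_PRELOAD'ed malloc replacement; a sketch of how to inspect it on NumPy 1.26.x (a preloaded jemalloc swaps malloc out underneath libc, so the handler name reported here should still be the default one):

# handler_name.py -- sketch (NumPy 1.26.x, NEP 49): report which allocation
# policy NumPy used for an array's data buffer. An LD_PRELOAD'ed jemalloc
# replaces malloc below libc, so this is still expected to print the
# default handler name.
import numpy as np
from numpy.core.multiarray import get_handler_name

a = np.arange(10)
print(get_handler_name(a))  # expected: 'default_allocator'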

@pedroaugustosmribeiro
Author

> Closing then, since I don't think you are worried about the actual precision change, just the test failure observation.

I'm actually concerned about the precision change, but I understood from your replies that precision wasn't affected significantly ("My suspicion would be that this is pretty harmless.") and that it was more an issue with the test suite, which newer releases (2.x) don't reproduce; maybe it was fixed there, if I understood correctly.

@seberg
Member

seberg commented Dec 14, 2024

Yes, the precision might even be pretty much the same, just rounding off in a different direction (most likely the SIMD path is very slightly worse, but that depends on the math library, so I don't know).
In either case, the difference isn't worrying in itself precision-wise (although it would be nice to avoid it, just to avoid surprises).
