
Using open_memmap with shape tuple can create npys that are not loadable when tuple contains np.ints #28334


Open
jungerm2 opened this issue Feb 13, 2025 · 1 comment

Comments

@jungerm2

Describe the issue:

Passing open_memmap a shape tuple that contains np.int64 values instead of Python ints raises no error and writes the array to disk without complaint, but np.load then fails to load the file.

Specifically, the .npy file starts with the following bytes (non-ASCII characters removed):

NUMPY v {'descr': '|u1', 'fortran_order': False, 'shape': (np.int64(1), np.int64(2), np.int64(3), np.int64(4)), }

as opposed to:

NUMPY  v {'descr': '|u1', 'fortran_order': False, 'shape': (1, 2, 3, 4), } 

It seems np.load fails on this because it parses the header with ast.literal_eval, which cannot evaluate the np.int64(...) calls.
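A minimal, standard-library-only sketch of that failure: the two header strings are copied from the files shown above, and try_parse is a hypothetical helper, not part of NumPy. The np.int64(...) text presumably comes from NumPy 2.x scalar reprs (NumPy 1.x printed np.int64(1) as just 1), and ast.literal_eval treats each one as a function call, which it refuses to evaluate.

```python
import ast

# Header dicts copied from the broken and the working file above.
bad_header = ("{'descr': '|u1', 'fortran_order': False, "
              "'shape': (np.int64(1), np.int64(2), np.int64(3), np.int64(4)), }")
good_header = ("{'descr': '|u1', 'fortran_order': False, "
               "'shape': (1, 2, 3, 4), }")


def try_parse(header):
    """Hypothetical helper: return the parsed dict, or None if rejected."""
    try:
        return ast.literal_eval(header)
    except ValueError:
        # np.int64(1) parses as an ast.Call node, which literal_eval
        # reports as "malformed node or string".
        return None


print(try_parse(bad_header))            # None
print(try_parse(good_header)["shape"])  # (1, 2, 3, 4)
```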

While the open_memmap docs correctly state that shape should be a tuple of ints, I think this should either be enforced by raising an error when the type is wrong, or the values should be converted to plain Python ints so the file can be loaded. This might be exclusively an open_memmap problem, but it might also make sense to allow np.load to read headers containing np.integer values. At the moment, the write succeeds while creating an unusable .npy file.
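A sketch of the "validate and convert" option, done on the caller's side rather than inside NumPy. normalize_shape and write_ones are hypothetical helpers, not NumPy functions. The sketch relies on operator.index, which turns NumPy integer scalars into Python ints and raises TypeError for non-integers such as floats, so it covers both suggestions (raise on bad types, convert good ones) in one step.

```python
import operator

import numpy as np
from numpy.lib.format import open_memmap


def normalize_shape(shape):
    """Hypothetical helper: return shape as a tuple of plain Python ints.

    operator.index() accepts Python ints and NumPy integer scalars
    (np.int64, np.uint8, ...) and raises TypeError for floats.
    """
    if isinstance(shape, (int, np.integer)):
        shape = (shape,)  # allow a bare scalar, as NumPy shapes do
    return tuple(operator.index(dim) for dim in shape)


def write_ones(path, shape):
    """Hypothetical helper: write an all-ones uint8 .npy, then reload it."""
    data = open_memmap(path, mode="w+", dtype=np.uint8,
                       shape=normalize_shape(shape))
    data[:] = 1
    data.flush()
    del data  # drop the memmap before reopening the file
    return np.load(path)
```

With this, write_ones("data.npy", tuple(np.array([1, 2, 3, 4]))) writes a header with shape (1, 2, 3, 4) and np.load succeeds, while normalize_shape((2.0, 3)) raises TypeError up front instead of producing a bad file.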

Reproduce the code example:

import numpy as np
from numpy.lib.format import open_memmap
shape = np.array([1, 2, 3, 4])
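# tuple(shape) holds NumPy integer scalars (np.int64 here), not Python ints
data = open_memmap("data.npy", mode="w+", dtype=np.uint8, shape=tuple(shape))
data[:] = np.ones(shape, dtype=np.uint8)
np.load("data.npy")  # raises ValueError: malformed node or string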

Error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[9], line 1
----> 1 np.load("data.npy")

File ~/micromamba/envs/py39/lib/python3.9/site-packages/numpy/lib/_npyio_impl.py:484, in load(file, mmap_mode, allow_pickle, fix_imports, encoding, max_header_size)
    481         return format.open_memmap(file, mode=mmap_mode,
    482                                   max_header_size=max_header_size)
    483     else:
--> 484         return format.read_array(fid, allow_pickle=allow_pickle,
    485                                  pickle_kwargs=pickle_kwargs,
    486                                  max_header_size=max_header_size)
    487 else:
    488     # Try a pickle
    489     if not allow_pickle:

File ~/micromamba/envs/py39/lib/python3.9/site-packages/numpy/lib/format.py:811, in read_array(fp, allow_pickle, pickle_kwargs, max_header_size)
    809 version = read_magic(fp)
    810 _check_version(version)
--> 811 shape, fortran_order, dtype = _read_array_header(
    812         fp, version, max_header_size=max_header_size)
    813 if len(shape) == 0:
    814     count = 1

File ~/micromamba/envs/py39/lib/python3.9/site-packages/numpy/lib/format.py:644, in _read_array_header(fp, version, max_header_size)
    633 # The header is a pretty-printed string representation of a literal
    634 # Python dictionary with trailing newlines padded to a ARRAY_ALIGN byte
    635 # boundary. The keys are strings.
   (...)
    641 #
    642 # For performance reasons, we try without _filter_header first though
    643 try:
--> 644     d = ast.literal_eval(header)
    645 except SyntaxError as e:
    646     if version <= (2, 0):

File ~/micromamba/envs/py39/lib/python3.9/ast.py:107, in literal_eval(node_or_string)
    105                 return left - right
    106     return _convert_signed_num(node)
--> 107 return _convert(node_or_string)

File ~/micromamba/envs/py39/lib/python3.9/ast.py:96, in literal_eval.<locals>._convert(node)
     94     if len(node.keys) != len(node.values):
     95         _raise_malformed_node(node)
---> 96     return dict(zip(map(_convert, node.keys),
     97                     map(_convert, node.values)))
     98 elif isinstance(node, BinOp) and isinstance(node.op, (Add, Sub)):
     99     left = _convert_signed_num(node.left)

File ~/micromamba/envs/py39/lib/python3.9/ast.py:85, in literal_eval.<locals>._convert(node)
     83     return node.value
     84 elif isinstance(node, Tuple):
---> 85     return tuple(map(_convert, node.elts))
     86 elif isinstance(node, List):
     87     return list(map(_convert, node.elts))

File ~/micromamba/envs/py39/lib/python3.9/ast.py:106, in literal_eval.<locals>._convert(node)
    104         else:
    105             return left - right
--> 106 return _convert_signed_num(node)

File ~/micromamba/envs/py39/lib/python3.9/ast.py:80, in literal_eval.<locals>._convert_signed_num(node)
     78     else:
     79         return - operand
---> 80 return _convert_num(node)

File ~/micromamba/envs/py39/lib/python3.9/ast.py:71, in literal_eval.<locals>._convert_num(node)
     69 def _convert_num(node):
     70     if not isinstance(node, Constant) or type(node.value) not in (int, float, complex):
---> 71         _raise_malformed_node(node)
     72     return node.value

File ~/micromamba/envs/py39/lib/python3.9/ast.py:68, in literal_eval.<locals>._raise_malformed_node(node)
     67 def _raise_malformed_node(node):
---> 68     raise ValueError(f'malformed node or string: {node!r}')

ValueError: malformed node or string: <ast.Call object at 0x7f06b1d31a60>

Python and NumPy Versions:

Tested with Python 3.9 and NumPy 2.0.2, as well as Python 3.12 and NumPy 2.2.3.

Runtime Environment:

[{'numpy_version': '2.0.2',
'python': '3.9.21 | packaged by conda-forge | (main, Dec 5 2024, '
'13:51:40) \n'
'[GCC 13.3.0]',
'uname': uname_result(system='Linux', node='fedora', release='6.12.11-200.fc41.x86_64', version='#1 SMP PREEMPT_DYNAMIC Fri Jan 24 04:59:58 UTC 2025', machine='x86_64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2',
'AVX512F',
'AVX512CD',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL'],
'not_found': ['AVX512_KNL', 'AVX512_KNM']}},
{'architecture': 'Cooperlake',
'filepath': '/home/sjung/micromamba/envs/py39/lib/python3.9/site-packages/numpy.libs/libscipy_openblas64_-99b71e71.so',
'internal_api': 'openblas',
'num_threads': 24,
'prefix': 'libscipy_openblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.27'}]

Context for the issue:

The silent failure produces unreadable .npy files, which caused me data loss (or forced a manual header rewrite).
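For files already written with the bad header, one possible in-place repair is sketched below. It assumes the .npy layout from the NumPy format specification: a 6-byte magic string, 2 version bytes, a little-endian header length (2 bytes for version 1.0, 4 bytes for 2.0/3.0), then the header padded with spaces and ending in a newline. repair_npy_header is a hypothetical helper, not a NumPy API. Because it pads the fixed header back to its original length, the data offset does not move; back up the file before trying it.

```python
import re
import struct

# Matches NumPy integer scalar reprs such as np.int64(3) or np.uint32(12)
# and captures the digits.
_NP_INT_REPR = re.compile(rb"np\.u?int(?:8|16|32|64)\((\d+)\)")


def repair_npy_header(path):
    """Hypothetical helper: rewrite np.int64(N)-style shape entries as N.

    Returns True if the header was changed, False if it was already clean.
    """
    with open(path, "r+b") as f:
        if f.read(6) != b"\x93NUMPY":
            raise ValueError(f"{path} is not an .npy file")
        major, _minor = f.read(2)
        # Version 1.0 stores the header length as uint16, 2.0/3.0 as uint32.
        if major == 1:
            (header_len,) = struct.unpack("<H", f.read(2))
        else:
            (header_len,) = struct.unpack("<I", f.read(4))
        header_start = f.tell()
        header = f.read(header_len)

        fixed = _NP_INT_REPR.sub(rb"\1", header)
        if fixed == header:
            return False
        if not fixed.endswith(b"\n"):
            raise ValueError("unexpected header: missing trailing newline")

        # Substitution only shortens the header; pad with spaces before the
        # final newline so its length, and hence the data offset, is unchanged.
        padded = fixed[:-1] + b" " * (header_len - len(fixed)) + b"\n"
        f.seek(header_start)
        f.write(padded)
        return True
```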

@jungerm2
Author

I'm currently working around this with shape=tuple(np.atleast_1d(shape).tolist()), which converts the shape entries to plain Python ints. Maybe something like this should be done internally?
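The workaround above, spelled out as a small sketch (the variable names are illustrative only). It contrasts tuple(shape), whose elements stay NumPy scalars, with the .tolist() route, which yields built-in ints that repr as plain numbers in the header.

```python
import numpy as np

shape = np.array([1, 2, 3, 4])

# tuple() over an array yields NumPy scalars (np.int64 on most platforms);
# in NumPy 2.x their repr is "np.int64(1)", which ends up in the header.
raw = tuple(shape)

# .tolist() converts each element to the closest built-in Python type (int),
# and np.atleast_1d lets the same expression also handle a scalar shape.
safe = tuple(np.atleast_1d(shape).tolist())
scalar_safe = tuple(np.atleast_1d(np.int64(7)).tolist())
```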
