Array from memoryview fails if there's trailing padding #7797
Comments
Right, it seems like #6361 correctly added trailing padding to structured types, but didn't update the format string. I went to look how to fix this. It's easy to add the trailing padding so we get a memoryview with the right format string. However, when converting back to an array,
should be
(By the way, numpy is sometimes a bit buggy for arrays containing padding bytes. See e.g. #2215, #3176, #5224, which mean that you can't always use
@ahaldane I was wondering - should the dtypes "roundtrip" this way:
>>> d2 = np.dtype({'formats': ['u1'], 'offsets': [0], 'names': ['x'], 'itemsize': 4})
# assuming we will get 'T{B:x:3x}' as the memoryview format string from this dtype
# in future numpy versions
>>> np.core._internal._dtype_from_pep3118('T{B:x:3x}')
dtype([('x', 'u1'), ('', 'V3')])
So after passing it through array -> memoryview -> array we get an extra trailing field, and technically it's a different dtype that's not type-equivalent, because the dtype constructor doesn't add this field and the converter from the buffer spec does. In general, why does the converter choose to add a trailing void field instead of explicitly setting the itemsize? Also consider this:
>>> d3 = np.core._internal._dtype_from_pep3118('T{B:x:3x}')
>>> d3
dtype([('x', 'u1'), ('', 'V3')])
>>> d3 == np.dtype([('x', 'u1'), ('', 'V3')])
False
>>> d3 == d2
False
>>> memoryview(np.empty(0, d3)).format
'T{B:x:3x::}' # <-- whoops
So the format string doesn't roundtrip either. If you add an initial offset:
>>> d4 = np.core._internal._dtype_from_pep3118('T{xB:x:2x}')
>>> d4
dtype({'names':['x',''], 'formats':['u1','V2'], 'offsets':[1,2], 'itemsize':4})
# if we have explicit itemsize here, is the dummy V2 field necessary?
>>> memoryview(np.empty(0, d4)).format
'T{xB:x:2x::}'
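The type-equivalence point can be checked without the private converter. This is a minimal sketch using only the public `np.dtype` constructor; the name `explicit` is made up for illustration, and it only stands in for the dtype the converter returns (it is not the converter's actual output, since the list constructor renames empty field names).

```python
import numpy as np

# One-byte field inside a 4-byte item: the itemsize is set explicitly.
d2 = np.dtype({'names': ['x'], 'formats': ['u1'], 'offsets': [0], 'itemsize': 4})

# Same memory layout, but the padding is spelled out as a trailing void field.
# (`explicit` is a hypothetical name used only in this sketch.)
explicit = np.dtype([('x', 'u1'), ('', 'V3')])

print(d2.itemsize, explicit.itemsize)  # both describe 4-byte items
print(explicit.names)                  # the list constructor renames the empty field
print(d2 == explicit)                  # the field lists differ, so the dtypes differ
```

Both dtypes occupy the same number of bytes per item, yet they do not compare equal, which is why a trailing void field cannot stand in for an explicit itemsize if the roundtrip is supposed to preserve the dtype.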
@aldanor, yeah I think we do want it to roundtrip. I opened #7798 which fixes both the trailing padding and your roundtripping examples. There are some other issues with trailing padding that came up that I'm still working on there. Actually, I haven't "solved" your second examples involving conversion of
Re: fields with empty names, yeah, that feels like the right thing to do (I wonder if it could cause any breakage downstream). What we currently have is this (which doesn't look very consistent):
>>> np.dtype([('', 'u4')])
dtype([('f0', '<u4')])
>>> np.dtype({'names': [''], 'formats': ['u4'], 'offsets': [0]})
dtype([('', '<u4')])
>>> np.core._internal._dtype_from_pep3118('I::')
dtype([('', '<u4')])
@ahaldane Any thoughts on this? Any chance of fitting a fix into 1.16 (given that there's a related fix for np.save/np.load)? There was the #7798 PR, but it seems to have been forgotten. // It's been 2.5 years :) I've hit this bug again in a different context, having completely forgotten about it, and while searching Google for numpy issues I was surprised to find one opened by myself!
gh-7798 was probably a practically finished PR to fix this. But it needs to be picked up and made to work with the current code base (the PR was unfortunately left hanging for a very long time). If someone wants to pick up that PR, it is probably pretty straightforward to get it in.
Here's converting an array to a memoryview and back (Python 3) where itemsize equals the offset of the last item plus its size:
If we try to do the same where itemsize is bigger, it fails:
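The original snippets did not survive on this page. As a rough sketch of the two cases being described, the code below roundtrips two hypothetical dtypes (`tight` and `padded`, names and layouts assumed for illustration) through a memoryview. Whether the padded case raises depends on the NumPy version, so the helper reports the outcome instead of assuming one.

```python
import numpy as np

def roundtrip(dtype):
    """Convert an array to a memoryview and back; report what happened."""
    arr = np.empty(3, dtype)
    view = memoryview(arr)
    try:
        back = np.array(view)
    except Exception as exc:  # NumPy versions affected by this issue may raise here
        return view.format, view.itemsize, repr(exc)
    return view.format, view.itemsize, back.dtype.itemsize

# Hypothetical dtypes: same two fields, without and with trailing padding.
tight = np.dtype({'names': ['x', 'y'], 'formats': ['u1', 'u1'],
                  'offsets': [0, 1], 'itemsize': 2})
padded = np.dtype({'names': ['x', 'y'], 'formats': ['u1', 'u1'],
                   'offsets': [0, 1], 'itemsize': 4})

print(roundtrip(tight))
print(roundtrip(padded))
```

The memoryview always reports the true itemsize; the interesting part is whether the exported format string accounts for the two trailing bytes in the padded case.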
This seems quite wrong.
Looking at the code where it fails, _dtype_from_pep3118 only accepts a format string and not the itemsize, so the generated format string is probably wrong and should explicitly contain the trailing bytes?
May be somewhat related: #6361
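To see why the format string has to carry the trailing bytes, here is a small sketch with the standard-library `struct` module. It is only an analogy: `struct` understands the `B` and `x` codes but not the `T{...}` or `:name:` syntax that NumPy emits, so the strings below are simplified stand-ins for the buffer formats.

```python
import struct

# A single unsigned byte, as a format string without padding would describe it.
without_padding = struct.calcsize('B')

# The same byte followed by three explicit pad bytes ('x' is one pad byte).
with_padding = struct.calcsize('B3x')

# One leading pad byte, the field, then two trailing pad bytes.
leading_and_trailing = struct.calcsize('xB2x')

print(without_padding, with_padding, leading_and_trailing)
```

A consumer that sees only the unpadded format has no way to recover a 4-byte itemsize from it, which matches the failure described above when nothing but the format string is passed along.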