Array from memoryview fails if there's trailing padding #7797

Open
aldanor opened this issue Jul 2, 2016 · 6 comments
Labels
00 - Bug 64 - Good Idea Inactive PR with a good start or idea. Consider studying it if you are working on a related issue. Project Possible project, may require specific skills and long commitment

Comments

@aldanor

aldanor commented Jul 2, 2016

Here's converting an array to a memoryview and back (Python 3) where the itemsize equals the offset of the last item plus its size:

>>> d1 = np.dtype({'formats': ['u1'], 'offsets': [0], 'names': ['x']})
>>> a1 = np.empty(0, d1)
>>> memoryview(a1).format
'T{B:x:}'
>>> memoryview(a1).itemsize
1
>>> np.array(memoryview(a1))
array([], 
      dtype=[('x', 'u1')])

If we try to do the same where itemsize is bigger, it fails:

>>> d2 = np.dtype({'formats': ['u1'], 'offsets': [0], 'names': ['x'], 'itemsize': 4})
>>> d2.descr
[('x', '|u1'), ('', '|V3')]  # the trailing padding is there
>>> a2 = np.empty(0, d2)
>>> memoryview(a2).format
'T{B:x:}'  # shouldn't this be 'T{B:x:3x}'?
>>> memoryview(a2).itemsize
4
# so far so good...
# however:
>>> np.array(memoryview(a2))
RuntimeWarning: Item size computed from the PEP 3118 buffer format string does not match the actual item size.
NotImplementedError: memoryview: unsupported format T{B:x:}

This seems quite wrong.

Looking at the code where it fails, _dtype_from_pep3118 only accepts a format string and not the itemsize, so the generated format string is probably wrong and should explicitly contain the trailing bytes?
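To make the mismatch concrete, here is a minimal sketch using only the standard library. The helper `implied_itemsize` is hypothetical (it is not part of NumPy) and assumes a single, non-nested `T{...}` struct with standard-size type codes and no alignment padding. It shows that the exported format describes 1 byte while the buffer reports an itemsize of 4, and that adding explicit trailing pad bytes would make the two agree.

```python
import re
import struct


def implied_itemsize(pep3118_format):
    """Hypothetical helper: size in bytes implied by a flat 'T{...}' format.

    Only handles a single, non-nested struct with standard-size type codes.
    """
    body = pep3118_format
    if body.startswith("T{") and body.endswith("}"):
        body = body[2:-1]
    # Drop ':name:' field labels, which the struct module does not understand.
    body = re.sub(r":[^:]*:", "", body)
    # '=' selects standard sizes with no implicit alignment padding.
    return struct.calcsize("=" + body)


reported_itemsize = 4  # memoryview(a2).itemsize from the session above

print(implied_itemsize("T{B:x:}"))    # -> 1, disagrees with the itemsize
print(implied_itemsize("T{B:x:3x}"))  # -> 4, matches the itemsize
```

Under these assumptions, the warning in the traceback corresponds to the first case: the size computed from the format string is smaller than the itemsize the buffer reports.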

May be somewhat related: #6361

@aldanor aldanor changed the title Array from memoryview fails if itemsize is set explicitly Array from memoryview fails if there's trailing padding Jul 2, 2016
@ahaldane
Member

ahaldane commented Jul 2, 2016

Right, it seems like #6361 correctly added trailing padding to structured types, but didn't update the format string.

I went to look how to fix this. It's easy to add the trailing padding so we get a memoryview with the right format string.

However, when converting back to an array, _dtype_from_pep3118 has some special cases to explicitly keep trailing padding as a named field, which were added in 86d3b8 and b5967. Note all the test cases in 86d3b8 related to trailing padding. I'm thinking those should be changed, so that eg

self._check('ixxx', [('f0', 'i'), ('', VV(3))])

should be

self._check('ixxx', {'names': ['f0'], 'formats': ['i'], 'itemsize': 7})
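The proposed behaviour can be sketched without NumPy. The function `layout_from_simple_format` below is a hypothetical illustration, not the code in `_dtype_from_pep3118`: it assumes a flat format with single-byte-aligned standard sizes, no field names, no nesting, and no string codes such as `s`. Pad bytes only advance the running offset, so trailing padding ends up in `itemsize` rather than in an extra void field.

```python
import re
import struct


def layout_from_simple_format(fmt):
    """Hypothetical helper: turn a flat struct-style format into a dict-style
    dtype specification where 'x' pad bytes never become fields."""
    names, formats, offsets = [], [], []
    offset = 0
    for count, code in re.findall(r"(\d*)([a-zA-Z?])", fmt):
        repeat = int(count) if count else 1
        if code == "x":
            offset += repeat  # padding: skip bytes, emit no field
            continue
        for _ in range(repeat):
            names.append(f"f{len(names)}")
            formats.append(code)
            offsets.append(offset)
            # '=' selects standard sizes with no alignment padding.
            offset += struct.calcsize("=" + code)
    return {"names": names, "formats": formats,
            "offsets": offsets, "itemsize": offset}


spec = layout_from_simple_format("ixxx")
print(spec)
# -> {'names': ['f0'], 'formats': ['i'], 'offsets': [0], 'itemsize': 7}
```

The same approach covers leading padding: `layout_from_simple_format("xB2x")` yields a single field at offset 1 with an itemsize of 4.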

(By the way, numpy is sometimes a bit buggy for arrays containing padding bytes. See eg #2215, #3176, #5224, which mean that you can't always use np.save and np.load if there are padding bytes. I hope to fix these once #6053 (or similar) goes through.)

@aldanor
Author

aldanor commented Jul 3, 2016

@ahaldane I was wondering - should the dtypes "roundtrip" this way:

>>> d2 = np.dtype({'formats': ['u1'], 'offsets': [0], 'names': ['x'], 'itemsize': 4})
# assuming we will get 'T{B:x:3x}' as the memoryview format string from this dtype
# in future numpy versions
>>> np.core._internal._dtype_from_pep3118('T{B:x:3x}')
dtype([('x', 'u1'), ('', 'V3')])

So after passing it through array -> memoryview -> array we get an extra trailing field, and technically it's a different dtype that's not type-equivalent, because the dtype constructor doesn't add this field and the converter from the buffer spec does. In general, why does the converter choose to add a trailing void field instead of explicitly setting the itemsize?

Also consider this:

>>> d3 = np.core._internal._dtype_from_pep3118('T{B:x:3x}')
>>> d3
dtype([('x', 'u1'), ('', 'V3')])
>>> d3 == np.dtype([('x', 'u1'), ('', 'V3')])
False
>>> d3 == d2
False
>>> memoryview(np.empty(0, d3)).format
'T{B:x:3x::}'  # <-- whoops

So the format string doesn't roundtrip either.

If you add an initial offset:

>>> d4 = np.core._internal._dtype_from_pep3118('T{xB:x:2x}')
>>> d4
dtype({'names':['x',''], 'formats':['u1','V2'], 'offsets':[1,2], 'itemsize':4})
# if we have explicit itemsize here, is the dummy V2 field necessary?
>>> memoryview(np.empty(0, d4)).format
'T{xB:x:2x::}'

@ahaldane
Member

ahaldane commented Jul 4, 2016

@aldanor, yeah I think we do want it to roundtrip. I opened #7798 which fixes both the trailing padding and your roundtripping examples. There are some other issues with trailing padding that came up that I'm still working on there.

Actually, I haven't "solved" your second set of examples involving conversion of '' to 'f0' in the list form of specification; I only avoided them in #7798 because we no longer end up with explicit padding bytes. Nevertheless, I think that conversion to f0 can still cause roundtripping problems. Maybe the solution is to completely disallow empty field names - they are currently only possible with the dict form of specification, like dtype({'names': ['a', ''], 'formats': ['u1', 'i4']}).

@aldanor
Author

aldanor commented Jul 4, 2016

Re: fields with empty names, yeah, that feels like the right thing to do (I wonder if it could cause any breakage downstream). What we currently have is this (which doesn't look very consistent):

>>> np.dtype([('', 'u4')])
dtype([('f0', '<u4')])

>>> np.dtype({'names': [''], 'formats': ['u4'], 'offsets': [0]})
dtype([('', '<u4')])

>>> np.core._internal._dtype_from_pep3118('I::')
dtype([('', '<u4')])

@aldanor
Author

aldanor commented Jan 2, 2019

@ahaldane Any thoughts on this? Any chances to fit a fix in 1.16? (given that there's a related fix for np.save/np.load)

There was PR #7798, but it seems to have been forgotten?

// Been 2.5 years :) I've hit this bug again in a different context, having completely forgotten about this, and while google-searching for numpy issues, was surprised to find one opened by myself!

ahaldane added a commit to ahaldane/numpy that referenced this issue Feb 21, 2021
@seberg seberg added Project Possible project, may require specific skills and long commitment 00 - Bug 64 - Good Idea Inactive PR with a good start or idea. Consider studying it if you are working on a related issue. labels Apr 17, 2024
@seberg
Member

seberg commented Apr 17, 2024

gh-7798 was probably a practically finished PR to fix this, but it needs to be picked up and made to work with the current code base (the PR was unfortunately hanging for a very long time).

If someone wants to pick up that PR, it is probably pretty straightforward to get it in.
