Array from memoryview fails if there's trailing padding #7797

aldanor · 2016-07-02T16:31:58Z

Here's an example of converting an array to a memoryview and back (Python 3), where the itemsize equals the offset of the last field plus its size:

>>> d1 = np.dtype({'formats': ['u1'], 'offsets': [0], 'names': ['x']})
>>> a1 = np.empty(0, d1)
>>> memoryview(a1).format
'T{B:x:}'
>>> memoryview(a1).itemsize
1
>>> np.array(memoryview(a1))
array([], 
      dtype=[('x', 'u1')])

If we do the same with a larger itemsize, it fails:

>>> d2 = np.dtype({'formats': ['u1'], 'offsets': [0], 'names': ['x'], 'itemsize': 4})
>>> d2.descr
[('x', '|u1'), ('', '|V3')]  # the trailing padding is there
>>> a2 = np.empty(0, d2)
>>> memoryview(a2).format
'T{B:x:}'  # shouldn't this be 'T{B:x:3x}'?
>>> memoryview(a2).itemsize
4
# so far so good...
# however:
>>> np.array(memoryview(a2))
RuntimeWarning: Item size computed from the PEP 3118 buffer format string does not match the actual item size.
NotImplementedError: memoryview: unsupported format T{B:x:}

This seems quite wrong.

Looking at the code where it fails, _dtype_from_pep3118 accepts only a format string, not the itemsize. So the generated format string is probably wrong: shouldn't it explicitly contain the trailing padding bytes?
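The size mismatch behind the warning can be illustrated with Python's standard `struct` module, whose single-character codes (`B`, `x`) are the ones PEP 3118 reuses inside `T{...}`. This is a sketch under stated assumptions: `with_trailing_padding` is a hypothetical helper, not a NumPy function, and it only handles a flat body of struct codes (no `T{...}` or `:name:` parts). It appends an `Nx` run so that the size implied by the format equals the dtype's itemsize.

```python
import struct

def with_trailing_padding(body: str, itemsize: int) -> str:
    """Hypothetical helper: pad a flat struct-style body out to `itemsize` bytes.

    Only plain struct codes are supported (no T{...} or :name: annotations).
    """
    # "=" selects standard sizes with no automatic alignment, so calcsize
    # returns exactly the sum of the listed items.
    used = struct.calcsize("=" + body)
    if used > itemsize:
        raise ValueError(f"format {body!r} needs {used} bytes, itemsize is {itemsize}")
    missing = itemsize - used
    return body + (f"{missing}x" if missing else "")

# d1 from the report: one u1 field, itemsize 1, so no padding is needed.
print(with_trailing_padding("B", 1))  # B
# d2 from the report: one u1 field, itemsize 4, so three trailing pad bytes.
print(with_trailing_padding("B", 4))  # B3x
```

With the padding spelled out, the size implied by the format string and the buffer's itemsize agree, which is the consistency the warning above checks for.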

May be somewhat related: #6361


ahaldane · 2016-07-02T18:07:58Z

Right, it seems like #6361 correctly added trailing padding to structured types, but didn't update the format string.

I looked into how to fix this. Adding the trailing padding is easy, and it gives a memoryview with the right format string.

However, when converting back to an array, _dtype_from_pep3118 has some special cases that explicitly keep trailing padding as a named field; these were added in 86d3b8 and b5967. Note all the test cases in 86d3b8 related to trailing padding. I think those should be changed so that, e.g.,

self._check('ixxx', [('f0', 'i'), ('', VV(3))])

should be

self._check('ixxx', {'names': ['f0'], 'formats': ['i'], 'itemsize': 7})
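The proposed parsing rule can be sketched in plain Python: padding codes (`x`) only advance the byte offset, so leading and internal padding show up as gaps in `offsets`, and trailing padding only increases `itemsize` instead of producing a `('', 'V3')` field. `flat_format_to_spec` is hypothetical, not part of NumPy; it assumes a flat run of struct codes with optional repeat counts (no `T{...}`, no `:name:`, no string codes), standard sizes with no alignment, and it emits an explicit `offsets` key that the dict in the proposed test omits.

```python
import re
import struct

# One struct code with an optional repeat count, e.g. "i", "3x", "2B".
_TOKEN = re.compile(r"(\d*)([A-Za-z?])")

def flat_format_to_spec(fmt: str) -> dict:
    """Hypothetical parser: flat struct-style format -> NumPy-style dict spec."""
    tokens = _TOKEN.findall(fmt)
    # Reject anything the simple token pattern did not fully consume.
    if "".join(count + code for count, code in tokens) != fmt:
        raise ValueError(f"unsupported format {fmt!r}")
    names, formats, offsets = [], [], []
    offset = 0
    for count_str, code in tokens:
        count = int(count_str) if count_str else 1
        if code == "x":
            # Padding never becomes a field; it only moves the next offset.
            offset += count
            continue
        if code in "sp":
            raise ValueError("string codes are not handled in this sketch")
        size = struct.calcsize("=" + code)
        for _ in range(count):
            names.append(f"f{len(names)}")
            formats.append(code)
            offsets.append(offset)
            offset += size
    # Whatever padding remains at the end is absorbed into the itemsize.
    return {"names": names, "formats": formats, "offsets": offsets, "itemsize": offset}

print(flat_format_to_spec("ixxx"))
# {'names': ['f0'], 'formats': ['i'], 'offsets': [0], 'itemsize': 7}
```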

(By the way, numpy is sometimes a bit buggy for arrays containing padding bytes. See e.g. #2215, #3176, #5224, which mean that you can't always use np.save and np.load if there are padding bytes. I hope to fix these once #6053 (or similar) goes through.)

aldanor · 2016-07-03T12:11:42Z

@ahaldane I was wondering - should the dtypes "roundtrip" this way:

>>> d2 = np.dtype({'formats': ['u1'], 'offsets': [0], 'names': ['x'], 'itemsize': 4})
# assuming we will get 'T{B:x:3x}' as the memoryview format string from this dtype
# in future numpy versions
>>> np.core._internal._dtype_from_pep3118('T{B:x:3x}')
dtype([('x', 'u1'), ('', 'V3')])

So after passing it through array -> memoryview -> array, we get an extra trailing field. Technically that's a different dtype, and not an equivalent one, because the dtype constructor doesn't add this field while the converter from the buffer spec does. More generally, why does the converter add a trailing void field instead of explicitly setting the itemsize?

Also consider this:

>>> d3 = np.core._internal._dtype_from_pep3118('T{B:x:3x}')
>>> d3
dtype([('x', 'u1'), ('', 'V3')])
>>> d3 == np.dtype([('x', 'u1'), ('', 'V3')])
False
>>> d3 == d2
False
>>> memoryview(np.empty(0, d3)).format
'T{B:x:3x::}'  # <-- whoops

So the format string doesn't roundtrip either.

If you add an initial offset:

>>> d4 = np.core._internal._dtype_from_pep3118('T{xB:x:2x}')
>>> d4
dtype({'names':['x',''], 'formats':['u1','V2'], 'offsets':[1,2], 'itemsize':4})
# if we have explicit itemsize here, is the dummy V2 field necessary?
>>> memoryview(np.empty(0, d4)).format
'T{xB:x:2x::}'
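Going the other way, offsets plus an explicit itemsize are enough to regenerate every padding byte, which suggests the dummy `V2` field is not needed. A sketch under stated assumptions: `spec_to_pep3118` is hypothetical, not a NumPy function; fields are given with struct codes (`B`) rather than NumPy type strings (`u1`), and the output mimics the `T{...}` form used in this thread, emitting `x` runs for leading, internal and trailing gaps.

```python
import struct

def spec_to_pep3118(spec: dict) -> str:
    """Hypothetical: dict spec (struct codes) -> PEP 3118 'T{...}' string."""
    def pad(n: int) -> str:
        # One pad byte is written as "x", more as e.g. "3x".
        return "" if n == 0 else ("x" if n == 1 else f"{n}x")

    parts = []
    pos = 0
    fields = sorted(zip(spec["offsets"], spec["formats"], spec["names"]))
    for offset, code, name in fields:
        if offset < pos:
            raise ValueError(f"field {name!r} overlaps the previous field")
        parts.append(pad(offset - pos))  # leading or internal gap
        parts.append(f"{code}:{name}:")
        pos = offset + struct.calcsize("=" + code)
    if pos > spec["itemsize"]:
        raise ValueError("fields extend past itemsize")
    # Trailing padding comes from itemsize alone; no dummy void field needed.
    parts.append(pad(spec["itemsize"] - pos))
    return "T{" + "".join(parts) + "}"

d4 = {"names": ["x"], "formats": ["B"], "offsets": [1], "itemsize": 4}
print(spec_to_pep3118(d4))  # T{xB:x:2x}
```

The same function gives `T{B:x:3x}` for d2 and `T{B:x:}` for d1, so in this sketch the format string carries the trailing padding without any extra field in the dtype.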

ahaldane · 2016-07-04T01:13:12Z

@aldanor, yeah, I think we do want it to roundtrip. I opened #7798, which fixes both the trailing padding and your roundtripping examples. Some other issues with trailing padding came up that I'm still working on there.

Actually, I haven't "solved" your second set of examples involving the conversion of '' to 'f0' in the list form of the specification; I only avoided them in #7798 because we no longer end up with explicit padding bytes. Nevertheless, I think the conversion to f0 can still cause roundtripping problems. Maybe the solution is to disallow empty field names completely; they are currently only possible with the dict form of specification, like dtype({'names': ['a', ''], 'formats': ['u1', 'i4']}).

aldanor · 2016-07-04T17:55:28Z

Re: fields with empty names, yeah, that feels like the right thing to do (I wonder whether it could cause any breakage downstream). Here is what we currently have, which doesn't look very consistent:

>>> np.dtype([('', 'u4')])
dtype([('f0', '<u4')])

>>> np.dtype({'names': [''], 'formats': ['u4'], 'offsets': [0]})
dtype([('', '<u4')])

>>> np.core._internal._dtype_from_pep3118('I::')
dtype([('', '<u4')])

aldanor · 2019-01-02T15:34:35Z

@ahaldane Any thoughts on this? Any chance of fitting a fix into 1.16, given that there's a related fix for np.save/np.load?

There was PR #7798, but was it forgotten?

// It's been 2.5 years :) I hit this bug again in a different context, having completely forgotten about it, and while google-searching for numpy issues I was surprised to find one opened by myself!


seberg · 2024-04-17T19:29:55Z

gh-7798 was probably a practically finished PR to fix this, but it needs to be picked up and made to work with the current code base (the PR was unfortunately left hanging for a very long time).

If someone wants to pick up that PR, it is probably pretty straightforward to get it in.

aldanor changed the title from "Array from memoryview fails if itemsize is set explicitly" to "Array from memoryview fails if there's trailing padding" Jul 2, 2016

aldanor mentioned this issue Jul 2, 2016

How to deal with structured/record arrays? pybind/pybind11#67

Closed

ahaldane mentioned this issue Jul 2, 2016

ENH: properly account for trailing padding in PEP3118 #7798

Closed

aldanor mentioned this issue Nov 21, 2016

Add the buffer interface for wrapped STL vectors pybind/pybind11#488

Merged

This was referenced Apr 16, 2018

BUG: Revert multifield-indexing adds padding bytes for NumPy 1.15. #10411

Merged

BUG: Fix np.load for aligned dtypes. #10931

Closed

ahaldane mentioned this issue Aug 17, 2018

BUG: np.save() and np.load() are not idempotent when align=True or fields are discontiguous #8100

Closed

aldanor mentioned this issue Jan 2, 2019

BUG: test, fix loading structured dtypes with padding #12358

Merged

ahaldane added a commit to ahaldane/numpy that referenced this issue Feb 21, 2021

ENH: output trailing padding in PEP3118 format strings

08734b1

Fixes numpy#7797

aldanor mentioned this issue Jan 10, 2022

Discussion: dtype system and integrating record types PyO3/rust-numpy#254

Closed

seberg added the labels Project (Possible project, may require specific skills and long commitment), 00 - Bug, and 64 - Good Idea (Inactive PR with a good start or idea. Consider studying it if you are working on a related issue.) Apr 17, 2024
