Bug: Fix behavior of structured_to_unstructured on non-trivial dtypes #14310
Note to self: need backport to 1.16.x.
numpy/lib/recfunctions.py
To elaborate, this is easy to fix as:

    dts = []
    counts = []
    offsets = []
    names = []
    for i, (dt, count, offset) in enumerate(fields):
        dts.append(dt)
        counts.append(count)
        offsets.append(offset)
        names.append('f{}'.format(i))
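To make the loop concrete, here is a minimal sketch of where such (dtype, count, offset) triples could come from. flatten_fields is a hypothetical helper written for illustration, not numpy's internal implementation. It assumes nested structures and subarrays of structures are expanded into their leaf fields, while a plain subarray of scalars stays as one entry with an element count.

```python
import numpy as np


def flatten_fields(dt, base_offset=0):
    """Hypothetical helper (not numpy's internal one): list the leaf fields of
    `dt` as (dtype, count, offset) triples, with offsets measured from the
    start of the outermost item."""
    dt = np.dtype(dt)
    if dt.names is not None:
        # Structured: recurse into each field, accumulating byte offsets.
        out = []
        for name in dt.names:
            field_dt, field_offset = dt.fields[name][:2]  # [:2] skips any title
            out.extend(flatten_fields(field_dt, base_offset + field_offset))
        return out
    if dt.subdtype is not None:
        base, shape = dt.subdtype
        n = int(np.prod(shape))
        if base.names is not None:
            # Subarray of a structured type: expand every element separately.
            out = []
            for k in range(n):
                out.extend(flatten_fields(base, base_offset + k * base.itemsize))
            return out
        # Subarray of scalars: a single entry carrying the element count.
        return [(base, n, base_offset)]
    # Plain scalar field.
    return [(dt, 1, base_offset)]


# The loop from the comment, applied to the flattened fields.
fields = flatten_fields([('a', 'i4'), ('b', [('c', 'f8'), ('d', 'i4', 2)])])
dts = []
counts = []
offsets = []
names = []
for i, (dt, count, offset) in enumerate(fields):
    dts.append(dt)
    counts.append(count)
    offsets.append(offset)
    names.append('f{}'.format(i))

print(counts, offsets, names)  # [1, 1, 2] [0, 4, 12] ['f0', 'f1', 'f2']
```

With a subarray of a structured type, each element contributes its own entries, so the list (and any per-field work) grows with the subarray size.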
Or as:

    def unzip(seq, n_items):
        # like zip(*seq), but with the length of the sequences specified so
        # that it generalizes to `len(seq) == 0`
        arrs = tuple([] for _ in range(n_items))  # one independent list per item
        for item in seq:
            assert len(item) == n_items
            for j in range(n_items):
                arrs[j].append(item[j])
        return arrs
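When writing an accumulator helper like this, note that multiplying a tuple that contains a list repeats a reference to the same list rather than copying it. A quick plain-Python check:

```python
n_items = 3
shared = ([],) * n_items                      # three references to ONE list
separate = tuple([] for _ in range(n_items))  # three independent lists

shared[0].append('x')
separate[0].append('x')

print(shared)    # (['x'], ['x'], ['x'])
print(separate)  # (['x'], [], [])
```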
I originally wrote:

    if n_fields == 0:
        if dtype is None:
            raise ValueError('could not determine dtype: '
                             'no fields or dtype specified')
        dts, counts, offsets = [], [], []
    else:
        dts, counts, offsets = zip(*fields)

This worked fine for structured_to_unstructured. Note that if there are no fields, the dtype must be given.
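The n_fields == 0 branch is needed because unpacking zip(*fields) into three names only works when fields is non-empty. A plain-Python illustration (unpack3 is a hypothetical name):

```python
def unpack3(fields):
    # Transpose a list of (dtype, count, offset) triples into three tuples.
    dts, counts, offsets = zip(*fields)
    return dts, counts, offsets


print(unpack3([('i4', 1, 0), ('f8', 1, 4)]))
# (('i4', 'f8'), (1, 1), (0, 4))

try:
    unpack3([])  # zip() of nothing yields nothing, so there is nothing to unpack
except ValueError as exc:
    print("ValueError:", exc)  # not enough values to unpack (expected 3, got 0)
```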
The problem was that after supplying a dtype, the resulting array of shape "(x, y, z, 0)" (for some integers x, y, z) has a 0-size axis at the end, which trips up unstructured_to_structured if you try to reverse the operation, in particular at the line "return arr.view(out_dtype)[..., 0]". It does not make sense to remove the last axis unless the dtype is 0-sized too, and numpy makes that case difficult.
For example, this fails on that line (after adding "if n_fields == 0" checks to unstructured_to_structured):

    import numpy as np
    from numpy.lib import recfunctions as rfn

    x = np.zeros(2, structured())
    y = rfn.structured_to_unstructured(x, dtype=int)
    z = rfn.unstructured_to_structured(y, dtype=x.dtype)
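The failure mode comes down to indexing: taking element 0 along an axis of length 0 raises IndexError, whereas on a length-1 axis it simply drops that axis. A small sketch, where drop_trailing_axis is a hypothetical stand-in for the "[..., 0]" step rather than a numpy function:

```python
import numpy as np


def drop_trailing_axis(arr):
    """Hypothetical stand-in for `arr.view(out_dtype)[..., 0]`: remove the
    trailing axis by taking its first element (needs length >= 1)."""
    return arr[..., 0]


ok = drop_trailing_axis(np.zeros((2, 3, 1)))
print(ok.shape)  # (2, 3)

empty = np.zeros((2, 3, 0))  # trailing 0-size axis, as in the zero-field case
try:
    drop_trailing_axis(empty)
except IndexError as exc:
    print("IndexError:", exc)
```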
Because all this was tricky and probably not what the user intended, I thought it might be best to simply disallow mucking with 0-size arrays.
Edit: This whole comment might also be summarized as: If the structured array has no fields, what should the output dtype be? No choice really makes sense.
Some further investigation shows that this view behavior is part of what makes it difficult:

    >>> a = np.zeros((2, 0), dtype={'names':[], 'formats':[], 'offsets':[], 'itemsize':8})
    >>> b = a.view(np.dtype([]))
    >>> b
    array([], shape=(2, 0),
          dtype={'names':[], 'formats':[], 'offsets':[], 'itemsize':8})
    >>> b.dtype == np.dtype([])
    False

In other words, the view seems to have no effect here?
> Edit: This whole comment might also be summarized as: If the structured array has no fields, what should the output dtype be? No choice really makes sense.

That's a really good justification; can you include it in the error message?
> this view behavior is part of what makes it difficult

I'm not sure that's the example you want: the correct behavior there would be to raise ValueError, and I doubt its failure to do so is a problem for you now.
But that is indeed a bug, and it stems from the fact that there is not enough distinction between:

- np.dtype(np.void), which might mean "a byte buffer of unknown length"
- np.dtype("V0"), which might mean "a byte buffer of length 0"
- np.dtype([]), which might mean "a structured type with no fields"
That was one of the motivations for introducing PyDescr_ISUNSIZED, which currently fires on all 3 but should really only be true for the first one.
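Two of the three cases can be compared directly. A sketch assuming a recent NumPy, where both report an itemsize of 0 and differ only in names (None for a plain void, an empty tuple for a fieldless structured type); the "V0" spelling is left out here.

```python
import numpy as np

unsized_void = np.dtype(np.void)  # "a byte buffer of unknown length"
no_fields = np.dtype([])          # "a structured type with no fields"

for label, dt in [("np.void", unsized_void), ("[]", no_fields)]:
    # itemsize alone cannot tell these apart; names is the only marker here
    print(label, dt.itemsize, dt.names)
```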
> the correct behavior of that should be to raise ValueError

OK, that's fine. But that led me to think a little more about what is going on in structured_to_unstructured, where I managed to convert a 0-field, itemsize-0 dtype array to a size-0 array. Consider:

    >>> a = np.zeros(2, dtype='V0')
    >>> a.reshape((2,0))  # numpy disallows creating a size-0 axis
    ValueError: cannot reshape array of size 2 into shape (2,0)
    >>> a.view((int, 0))  # but I can bypass that by viewing with a subarray of size 0
    array([], shape=(2, 0), dtype=int64)

So I was able to bypass numpy's normal restrictions for the size-0 dtype in structured_to_unstructured, but the behavior in my last comment prevented me from doing the reverse in unstructured_to_structured.
In any case, this is the mucky size-0 stuff that we can just forget about by simply disallowing fieldless structured types in this PR.
> by simply disallowing fieldless structured types in this PR.

I'm fine with that, but we should probably raise NotImplementedError as I mentioned above. I'll take a look at .view some other time; your example looks pretty damning.
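A minimal sketch of the kind of guard being discussed, assuming the check is made on the input dtype's names and that the "no sensible output dtype" justification goes into the message. reject_fieldless and its wording are hypothetical, not the code that was merged.

```python
import numpy as np


def reject_fieldless(dt):
    """Hypothetical guard: refuse structured dtypes that have no fields."""
    dt = np.dtype(dt)
    if dt.names is not None and len(dt.names) == 0:
        raise NotImplementedError(
            "structured arrays with no fields are not supported: with no "
            "fields there is no sensible choice of output dtype")
    return dt


print(reject_fieldless([('a', 'f8')]).names)  # ('a',)
```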
Probably worth capturing this view weirdness in a new issue.
OK, I'll add one.
26718bf to 8ec9179

Fixed up the error messages.

@eric-wieser Happy with this?

Will look soon.

a15c961 to 44a90b2

Good suggestions, fixed.

44a90b2 to 63ecfb8

Codecov isn't tracking these files for some reason.

Thanks Allan, Eric.
Fixes #13333
This fixes up some weird behavior of structured_to_unstructured discovered in #13333.

Performance may not be so great for dtypes with a very large subarray of structured type, but that seems like a poor situation anyway.
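For reference, a sketch of the round trip this PR concerns, on a dtype with a nested structure and a subarray field. It assumes a NumPy release that includes this fix; the dtype and values are made up for illustration.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Nested structure 'b' containing a 2-element subarray field 'd'.
dt = np.dtype([('a', 'i4'), ('b', [('c', 'f8'), ('d', 'i4', 2)])])
a = np.array([(1, (2.0, [3, 4])), (5, (6.0, [7, 8]))], dtype=dt)

# Every leaf element becomes one column; the common type of i4/f8 is f8.
u = rfn.structured_to_unstructured(a)
print(u.shape)  # (2, 4)
print(u)

# Reverse the operation with the original dtype.
back = rfn.unstructured_to_structured(u, dtype=dt)
print(back['b']['d'])
```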