Bug: Fix behavior of structured_to_unstructured on non-trivial dtypes #14310

ahaldane · 2019-08-20T20:52:50Z

This fixes up some weird behavior of structured_to_unstructured discovered in #13333.

Performance may not be so great for dtypes with a very large subarray of structured type, but that seems like a poor situation anyway.
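To make "non-trivial dtypes" concrete, here is a minimal sketch of the round trip on a dtype with a nested structured field and a subarray field. It assumes a NumPy version that already contains this fix. The dtype and values are made up for illustration and do not come from #13333.

```python
import numpy as np
import numpy.lib.recfunctions as rfn

# A dtype with a nested structured field ('b') and a subarray field ('c').
dt = np.dtype([('a', 'i4'), ('b', [('x', 'f8'), ('y', 'f8')]), ('c', 'i4', (2,))])
arr = np.array([(1, (2.0, 3.0), [4, 5])], dtype=dt)

# Every scalar leaf becomes one column of the output; the column dtype is
# the promoted common type of the leaves (float64 here).
flat = rfn.structured_to_unstructured(arr)
print(flat)  # [[1. 2. 3. 4. 5.]]

# The reverse conversion rebuilds the nested layout from the flat columns.
back = rfn.unstructured_to_structured(flat, dtype=dt)
print(back['b']['y'], back['c'])
```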

charris · 2019-08-20T23:26:43Z

Note to self: this needs a backport to 1.16.x.

numpy/lib/recfunctions.py

eric-wieser · 2019-08-21T08:21:30Z

numpy/lib/recfunctions.py

To elaborate, this is easy to fix as:

```python
dts = []
counts = []
offsets = []
names = []
for i, (dt, count, offset) in enumerate(fields):
    dts.append(dt)
    counts.append(count)
    offsets.append(offset)
    names.append('f{}'.format(i))
```

Or as

```python
def unzip(seq, n_items):
    # like zip(*seq), but with the length of the sequences specified so
    # that it generalizes to `len(seq) == 0`
    arrs = tuple([] for _ in range(n_items))  # one independent list per position
    for item in seq:
        assert len(item) == n_items
        for j in range(n_items):
            arrs[j].append(item[j])
    return arrs
```
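The reason a helper like `unzip` is needed at all is that `zip(*seq)` produces nothing when `seq` is empty, so unpacking it into three names fails. The sketch below demonstrates this. The `(dtype, count, offset)` triples are made-up sample data. It also builds one independent list per position, because `([],) * n_items` would repeat a single shared list `n_items` times.

```python
def unzip(seq, n_items):
    """Transpose seq like zip(*seq), but always return n_items lists."""
    # A generator creates n_items distinct lists (no shared list aliasing).
    arrs = tuple([] for _ in range(n_items))
    for item in seq:
        assert len(item) == n_items
        for j, value in enumerate(item):
            arrs[j].append(value)
    return arrs


fields = [('i4', 1, 0), ('f8', 2, 8)]  # hypothetical (dtype, count, offset) triples
dts, counts, offsets = unzip(fields, 3)
print(dts, counts, offsets)  # ['i4', 'f8'] [1, 2] [0, 8]

# With no fields, zip(*[]) yields nothing, so unpacking into three names fails...
try:
    dts, counts, offsets = zip(*[])
except ValueError as exc:
    print("zip(*[]):", exc)

# ...while unzip still returns three (empty) lists.
print(unzip([], 3))  # ([], [], [])
```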

I originally put

```python
if n_fields == 0:
    if dtype is None:
        raise ValueError('could not determine dtype: '
                         'no fields or dtype specified')
    dts, counts, offsets = [], [], []
else:
    dts, counts, offsets = zip(*fields)
```

and this worked fine for structured_to_unstructured. Note that if there are no fields the dtype must be given.
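Why must the dtype be given when there are no fields? A minimal sketch, assuming (as the error message above suggests) that the default output dtype is found by promoting the field dtypes to a common type, here with `np.result_type`. With no fields there is nothing to promote. The helper name `default_out_dtype` is hypothetical and is not part of `numpy.lib.recfunctions`.

```python
import numpy as np


def default_out_dtype(field_dtypes, dtype=None):
    """Hypothetical helper: choose the output dtype for the unstructured array."""
    if dtype is not None:
        return np.dtype(dtype)  # an explicit dtype always wins
    if len(field_dtypes) == 0:
        # Nothing to promote, so the caller has to supply a dtype.
        raise ValueError('could not determine dtype: '
                         'no fields or dtype specified')
    # Otherwise promote all field dtypes to one common type.
    return np.result_type(*field_dtypes)


print(default_out_dtype([np.dtype('i4'), np.dtype('f4')]))  # float64
```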

The problem was that, after supplying a dtype, the resulting array of shape (x, y, z, 0) for some integers x, y, z has a size-0 axis at the end. This trips up unstructured_to_structured if you try to reverse the operation, in particular the line `return arr.view(out_dtype)[..., 0]`. It does not make sense to remove the last axis unless the dtype is 0-sized too, and numpy makes that case difficult.
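This is a simplified illustration, not the PR's code, of why a final step like `arr.view(out_dtype)[..., 0]` depends on the last axis having nonzero length. The three-field dtype and the array values are made up. In the normal case the view collapses the trailing axis into a length-1 axis that `[..., 0]` can drop. A trailing length-0 axis has no element 0 to select.

```python
import numpy as np

# Three float64 fields packed into 24 bytes.
dt = np.dtype([('a', 'f8'), ('b', 'f8'), ('c', 'f8')])

# Normal case: the trailing axis of 3 x 8 bytes becomes one structured
# element, so the view has a length-1 last axis that [..., 0] removes.
unstructured = np.arange(6, dtype='f8').reshape(2, 3)
viewed = unstructured.view(dt)
print(viewed.shape)          # (2, 1)
print(viewed[..., 0].shape)  # (2,)

# Fieldless case: the unstructured array ends in a length-0 axis,
# so there is nothing at index 0 to select.
empty = np.zeros((2, 0), dtype=np.int64)
try:
    empty[..., 0]
except IndexError as exc:
    print("IndexError:", exc)
```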

E.g., this fails on that line (after adding `if n_fields == 0` checks to unstructured_to_structured):

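```python
import numpy as np
import numpy.lib.recfunctions as rfn

# x has a structured dtype with no fields
x = np.zeros(2, structured())
y = rfn.structured_to_unstructured(x, dtype=int)
z = rfn.unstructured_to_structured(y, dtype=x.dtype)
```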

Because all this was tricky and probably not what the user intended, I thought it might be best to simply disallow mucking with 0-size arrays.

Edit: This whole comment might also be summarized as: If the structured array has no fields, what should the output dtype be? No choice really makes sense.

Some further investigation shows that this `view` behavior is part of what makes it difficult:

```python
>>> a = np.zeros((2, 0), dtype={'names':[], 'formats':[], 'offsets':[], 'itemsize':8})
>>> b = a.view(np.dtype([]))
>>> b
array([], shape=(2, 0), dtype={'names':[], 'formats':[], 'offsets':[], 'itemsize':8})
>>> b.dtype == np.dtype([])
False
```

In other words, `view` seems to have no effect here?

> Edit: This whole comment might also be summarized as: If the structured array has no fields, what should the output dtype be? No choice really makes sense.

That's a really good justification - can you include it in the error message?

> this view behavior is part of what makes it difficult

I'm not sure that's the example you want: the correct behavior there should be to raise ValueError, and I doubt its failure to do so is a problem for you now.

But that is indeed a bug, and it stems from the fact that there is not enough distinction between:

- `np.dtype(np.void)`, which might mean "a byte buffer of unknown length"
- `np.dtype("V0")`, which might mean "a byte buffer of length 0"
- `np.dtype([])`, which might mean "a structured type with no fields"

That was one of the motivations for introducing PyDescr_ISUNSIZED, which currently fires on all 3 but should really only be true for the first one.
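One way to see the distinction from Python is to compare `.itemsize` and `.names`. This is a small sketch, and the reprs may vary between NumPy versions. It compares `np.dtype(np.void)`, `np.dtype([])`, and the padded fieldless dtype from the earlier `view` example. `.names` is `None` for a plain void type and a (possibly empty) tuple for a structured one.

```python
import numpy as np

unsized_void = np.dtype(np.void)  # flexible void type with no length yet
no_fields = np.dtype([])          # structured dtype with zero fields
padded_no_fields = np.dtype({'names': [], 'formats': [],
                             'offsets': [], 'itemsize': 8})

for label, dt in [('np.void', unsized_void), ('[]', no_fields),
                  ('padded', padded_no_fields)]:
    # itemsize alone cannot separate the first two; .names can.
    print(label, dt.itemsize, dt.names)
```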

> the correct behavior of that should be to raise ValueError

OK, that's fine. But that led me to think a little more about what is going on in structured_to_unstructured where I managed to convert a 0-field, itemsize-0 dtype array to a size-0 array. Consider:

```python
>>> a = np.zeros(2, dtype='V0')
>>> a.reshape((2, 0))  # numpy disallows creating a size-0 axis this way
ValueError: cannot reshape array of size 2 into shape (2,0)
>>> a.view((int, 0))  # but viewing with a size-0 subarray dtype bypasses that
array([], shape=(2, 0), dtype=int64)
```

So I was able to find a way to bypass numpy's normal restrictions for the size-0 dtype in structured_to_unstructured, but the behavior in my last comment prevented me from doing the reverse in unstructured_to_structured.

In any case, this is the mucky size-0 stuff that we can just forget about by simply disallowing fieldless structured types in this PR.

> by simply disallowing fieldless structured types in this PR.

I'm fine with that, but we should probably raise NotImplementedError as I mentioned above. I'll take a look at .view some other time; your example looks pretty damning.
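A minimal sketch of the kind of guard being agreed on here: reject fieldless structured dtypes with NotImplementedError and put the "no sensible output dtype" justification into the message. The name `require_fields` and the message wording are hypothetical. This is not the check that was merged, whose exact code the thread does not show.

```python
import numpy as np


def require_fields(dtype):
    """Hypothetical guard: accept only structured dtypes with at least one field."""
    dtype = np.dtype(dtype)
    if dtype.names is None:
        raise TypeError('expected a structured dtype, got {}'.format(dtype))
    if len(dtype.names) == 0:
        # No output dtype makes sense for a fieldless structured dtype,
        # so report it as unsupported rather than as a bad value.
        raise NotImplementedError(
            'structured dtypes with no fields are not supported: '
            'there is no sensible choice of output dtype')
    return dtype


print(require_fields([('a', 'i4'), ('b', 'f8')]).names)  # ('a', 'b')
try:
    require_fields([])
except NotImplementedError as exc:
    print('NotImplementedError:', exc)
```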

Probably worth capturing this view weirdness in a new issue.

OK, I'll add one.

ahaldane · 2019-08-22T17:08:40Z

Fixed up the error messages.

charris · 2019-08-23T02:38:43Z

@eric-wieser Happy with this?

eric-wieser · 2019-08-23T04:38:13Z

Will look soon

numpy/lib/recfunctions.py

ahaldane · 2019-08-23T15:48:47Z

good suggestions, fixed.

numpy/lib/recfunctions.py

Fixes numpy#13333

eric-wieser · 2019-08-23T18:53:25Z

Codecov isn't tracking these files for some reason.

charris · 2019-08-23T19:25:33Z

Thanks Allan, Eric.

ahaldane added 00 - Bug component: numpy.lib labels Aug 20, 2019

charris added the 09 - Backport-Candidate PRs tagged should be backported label Aug 20, 2019

charris added this to the 1.16.5 release milestone Aug 20, 2019

eric-wieser reviewed Aug 21, 2019

numpy/lib/recfunctions.py (outdated, resolved)

eric-wieser reviewed Aug 21, 2019

numpy/lib/recfunctions.py (outdated, resolved)

eric-wieser reviewed Aug 21, 2019

numpy/lib/recfunctions.py (outdated, resolved)

eric-wieser reviewed Aug 21, 2019

ahaldane mentioned this pull request Aug 21, 2019

BUG: Fix misuse of .names and .fields in various places #14290

Merged

ahaldane force-pushed the fix_struct_to_unstruct_nesting branch 2 times, most recently from 26718bf to 8ec9179, August 22, 2019 17:08

charris changed the title ~~MAINT: fix behavior of structured_to_unstructured on non-trivial dtypes~~ MAINT: Fix behavior of structured_to_unstructured on non-trivial dtypes Aug 22, 2019

eric-wieser self-requested a review August 23, 2019 04:47

eric-wieser reviewed Aug 23, 2019

numpy/lib/recfunctions.py (outdated, resolved)

eric-wieser reviewed Aug 23, 2019

numpy/lib/recfunctions.py (outdated, resolved)

ahaldane force-pushed the fix_struct_to_unstruct_nesting branch 3 times, most recently from a15c961 to 44a90b2, August 23, 2019 15:45

eric-wieser reviewed Aug 23, 2019

numpy/lib/recfunctions.py (outdated, resolved)

eric-wieser reviewed Aug 23, 2019

numpy/lib/recfunctions.py (outdated, resolved)

MAINT: fix behavior of structured_to_unstructured on non-trivial dtypes

63ecfb8

Fixes numpy#13333

ahaldane force-pushed the fix_struct_to_unstruct_nesting branch from 44a90b2 to 63ecfb8, August 23, 2019 17:15

ahaldane mentioned this pull request Aug 23, 2019

viewing an array with a fieldless structured dtype does not update the dtype: #14344

Closed

eric-wieser approved these changes Aug 23, 2019


charris merged commit f7f8759 into numpy:master Aug 23, 2019

This was referenced Aug 23, 2019

BUG: fix behavior of structured_to_unstructured on non-trivial dtypes #14345

Merged

BUG: fix behavior of structured_to_unstructured on non-trivial dtypes #14346

Merged

charris removed the 09 - Backport-Candidate PRs tagged should be backported label Aug 23, 2019

charris removed this from the 1.16.5 release milestone Aug 23, 2019

charris changed the title ~~MAINT: Fix behavior of structured_to_unstructured on non-trivial dtypes~~ Bug: Fix behavior of structured_to_unstructured on non-trivial dtypes Aug 23, 2019

tylerjereddy mentioned this pull request Aug 23, 2019

TST, MAINT: include numpy package in .coveragerc #14347

Closed
