Conversation

@ahaldane
Member

Fixes #13333

This fixes up some weird behavior of structured_to_unstructured discovered in #13333.

Performance may not be so great for dtypes with a very large subarray of structured type, but that seems like a poor situation anyway.
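As a rough illustration of the kind of dtype involved, the sketch below converts an array whose dtype mixes a plain field, a nested struct, and a subarray field. The dtype and variable names are assumptions chosen for illustration, and the shape shown assumes a NumPy release that includes this fix.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Illustrative dtype: a plain field, a nested struct, and a subarray field.
dt = np.dtype([("a", "i4"), ("b", "f4,u2"), ("c", "f4", 2)])
a = np.zeros(4, dtype=dt)

# Each nested field and each subarray element becomes one output column:
# 1 (a) + 2 (b.f0, b.f1) + 2 (c[0], c[1]) = 5 columns.
u = rfn.structured_to_unstructured(a)
print(u.shape)
```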

@charris
Member

charris commented Aug 20, 2019

Note to self, need backport to 1.16.x

@charris charris added this to the 1.16.5 release milestone Aug 20, 2019
Member

To elaborate - easy to fix as:

dts = []
counts = []
offsets = []
names = []
for i, (dt, count, offset) in enumerate(fields):
   dts.append(dt)
   counts.append(count)
   offsets.append(offset)
   names.append('f{}'.format(i))

Member

@eric-wieser eric-wieser Aug 21, 2019

Or as

def unzip(seq, n_items):
    # like zip(*seq), but with the length of the sequences specified so
    # that it generalizes to `len(seq) == 0`
    # build independent lists; ([],) * n_items would repeat one shared list
    arrs = tuple([] for _ in range(n_items))
    for item in seq:
        assert len(item) == n_items
        for j in range(n_items):
            arrs[j].append(item[j])
    return arrs
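For context, a minimal sketch of why a plain zip(*seq) does not generalize to an empty sequence: with no items, zip yields nothing, so unpacking into a fixed number of names fails. The field tuples here are made-up placeholders, not values from the PR.

```python
# Placeholder (dtype, count, offset) tuples, for illustration only.
fields = [("i4", 1, 0), ("f8", 2, 4)]
dts, counts, offsets = zip(*fields)  # works: three tuples of length 2

empty_fields = []
try:
    # zip(*[]) produces no items, so there is nothing to unpack into 3 names.
    e_dts, e_counts, e_offsets = zip(*empty_fields)
    unpack_failed = False
except ValueError:
    unpack_failed = True

print(dts, counts, offsets, unpack_failed)
```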

Member Author

@ahaldane ahaldane Aug 21, 2019

I originally put

    if n_fields == 0:
        if dtype is None:
            raise ValueError('could not determine dtype: '
                             'no fields or dtype specified')
        dts, counts, offsets = [], [], []
    else:
        dts, counts, offsets = zip(*fields)

and this worked fine for structured_to_unstructured. Note that if there are no fields the dtype must be given.

The problem was that, after supplying a dtype, the resulting array of shape "(x, y, z, 0)" (for some integers x, y, z) has a 0-size axis at the end. That trips up unstructured_to_structured if you try to reverse the operation, in particular at the line return arr.view(out_dtype)[..., 0]. It does not make sense to remove the last axis if the dtype is not 0-sized too, and numpy makes it difficult in that case.

E.g., this fails on that line (after adding if n_fields == 0 checks in unstructured_to_structured):

x = np.zeros(2, structured())
y = rfn.structured_to_unstructured(x, dtype=int)
z = rfn.unstructured_to_structured(y, dtype=x.dtype)

Because all this was tricky and probably not what the user intended, I thought it might be best to simply disallow mucking with 0-size arrays.

Edit: This whole comment might also be summarized as: If the structured array has no fields, what should the output dtype be? No choice really makes sense.
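For contrast, here is a sketch of the same round trip when the dtype has at least one field, where the output dtype is well defined. The field names and values are assumptions for illustration.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Illustrative two-field dtype; with fields present the common output
# dtype can be inferred (here float64 from int32 and float64).
x = np.array([(1, 2.5), (3, 4.5)], dtype=[("a", "i4"), ("b", "f8")])

y = rfn.structured_to_unstructured(x)                  # shape (2, 2)
z = rfn.unstructured_to_structured(y, dtype=x.dtype)   # back to shape (2,)

print(y.shape, y.dtype, z.dtype == x.dtype)
```

With no fields there is no such inferred dtype, and the last axis has length 0, which is the case the comment above describes.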

Member Author

Some further investigation shows that this view behavior is part of what makes it difficult:

>>> a = np.zeros((2, 0),dtype={'names':[], 'formats':[], 'offsets':[], 'itemsize':8})
>>> b = a.view(np.dtype([]))
>>> b
array([], shape=(2, 0),
      dtype={'names':[], 'formats':[], 'offsets':[], 'itemsize':8})
>>> b.dtype == np.dtype([])
False

In other words, view seems to have no effect here?

Member

Edit: This whole comment might also be summarized as: If the structured array has no fields, what should the output dtype be? No choice really makes sense.

That's a really good justification - can you include it in the error message?

Member

this view behavior is part of what makes it difficult

I'm not sure that's the example you want - the correct behavior of that should be to raise ValueError, and I doubt its failure to do that is a problem for you now.

But that is indeed a bug, and stems from the fact that there is not enough difference between:

  • np.dtype(np.void), which might mean "a byte buffer of unknown length"
  • np.dtype("V0"), which might mean "a byte buffer of length 0"
  • np.dtype([]), which might mean "a structured type with no fields"

That was one of the motivations for introducing PyDescr_ISUNSIZED, which currently fires on all 3 but should really only be true for the first one.
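A small sketch of why these three are hard to tell apart from Python: all of them report an itemsize of 0. Printing names is included only as one way to inspect them; no particular output for it is asserted here.

```python
import numpy as np

unsized_void = np.dtype(np.void)  # "byte buffer of unknown length"
zero_void = np.dtype("V0")        # "byte buffer of length 0"
empty_struct = np.dtype([])       # "structured type with no fields"

# All three have zero itemsize, so size alone cannot distinguish them.
sizes = [d.itemsize for d in (unsized_void, zero_void, empty_struct)]
print(sizes)

# The names attribute can be inspected to look for a difference.
print(unsized_void.names, zero_void.names, empty_struct.names)
```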

Member Author

the correct behavior of that should be to raise ValueError

OK, that's fine. But that led me to think a little more about what is going on in structured_to_unstructured where I managed to convert a 0-field, itemsize-0 dtype array to a size-0 array. Consider:

>>> a = np.zeros(2, dtype='V0') 
>>> a.reshape((2,0))  # numpy disallows creating a size-0 axis.
ValueError: cannot reshape array of size 2 into shape (2,0)
>>> a.view((int, 0))  # but I can bypass by viewing with subarray of size 0
array([], shape=(2, 0), dtype=int64)

So I was able to find a way to bypass numpy's normal restrictions for the size-0 dtype in structured_to_unstructured, but the behavior in my last comment prevented me from doing the reverse in unstructured_to_structured.

In any case, this is the mucky size-0 stuff that we can just forget about by simply disallowing fieldless structured types in this PR.

Member

by simply disallowing fieldless structured types in this PR.

I'm fine with that, but we should probably raise NotImplementedError as I mention above. I'll take a look at .view some other time; your example looks pretty damning.

Member

Probably worth capturing this view weirdness in a new issue.

Member Author

OK, I'll add one.

@ahaldane ahaldane force-pushed the fix_struct_to_unstruct_nesting branch 2 times, most recently from 26718bf to 8ec9179 Compare August 22, 2019 17:08
@ahaldane
Member Author

Fixed up the error messages.

@charris charris changed the title MAINT: fix behavior of structured_to_unstructured on non-trivial dtypes MAINT: Fix behavior of structured_to_unstructured on non-trivial dtypes Aug 22, 2019
@charris
Member

charris commented Aug 23, 2019

@eric-wieser Happy with this?

@eric-wieser
Member

Will look soon

@eric-wieser eric-wieser self-requested a review August 23, 2019 04:47
@ahaldane ahaldane force-pushed the fix_struct_to_unstruct_nesting branch 3 times, most recently from a15c961 to 44a90b2 Compare August 23, 2019 15:45
@ahaldane
Member Author

good suggestions, fixed.

@eric-wieser
Member

Codecov isn't tracking these files for some reason.

@charris charris merged commit f7f8759 into numpy:master Aug 23, 2019
@charris
Member

charris commented Aug 23, 2019

Thanks Allan, Eric.

@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Aug 23, 2019
@charris charris removed this from the 1.16.5 release milestone Aug 23, 2019
@charris charris changed the title MAINT: Fix behavior of structured_to_unstructured on non-trivial dtypes Bug: Fix behavior of structured_to_unstructured on non-trivial dtypes Aug 23, 2019

Successfully merging this pull request may close these issues.

Weird behavior of structured_to_unstructured on non-trivial dtypes