BUG: Handle subarrays in descr_to_dtype #13433
The test failure was the matmul heisenbug. Restarted test.
Strange beast, nested subfields... But they do seem to work fine for all things (including being resolved at arbitrary depth when there are no fields left).
Anyway, LGTM, will merge soon.
    This function reverses the process, eliminating the empty padding fields.
    '''
-    if isinstance(descr, (str, dict)):
+    if isinstance(descr, (str, dict, tuple)):
        # No padding removal needed
What happens if this is a subarray of structured types?
s = np.dtype([('a', np.int8), ('b', np.int16), ('c', np.int32)], align=True)
s_sub = np.dtype((s, (3,)))
I think you need to recurse for subarray types
That is an interesting case. Top-level subarrays degenerate on arrays (their shape is appended to the dimensions of the array). I cannot quickly find a way to create an array with such a dtype, but it somewhat feels like there may have been strange ways to do it.
s = np.dtype([('a', np.int8), ('b', np.int16), ('c', np.int32)], align=True)
s_sub = np.dtype((s, (1,1)))
arr = np.zeros(3, s_sub)
print(arr.shape, arr.dtype)
arr = np.ndarray(shape=3, buffer=arr, dtype=s_sub)
print(arr.shape, arr.dtype)
Also, watch out for structured types like (int, [('fields', int)]), which have a non-void base.
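For readers unfamiliar with this form, a small sketch (not from the thread) of what a structured type with a non-void base looks like. The fixed-width `'i4'` is used here instead of `int` only so the sizes are the same on every platform; that substitution is an assumption made for illustration.

```python
import numpy as np

# (base_dtype, new_dtype): a 4-byte integer that is also viewable through
# a field layout. The resulting dtype has fields but is not a void type.
dt = np.dtype(('i4', [('fields', 'i4')]))

has_fields = dt.names is not None
is_void = dt.type is np.void

print(dt.names, dt.kind, dt.itemsize)
print("has fields:", has_fields, "| is void:", is_void)
```

Code that decides "structured or not" by checking for `np.void` would therefore treat this dtype differently from code that checks `dt.names`.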
Yeah, this one is still broken (although maybe the original issue is solved and this is just another issue). I probably had too shallow a look at this, though :/.
No need for the subarray to be at the top level to hit this code-path - nest it inside a structured one.
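A minimal sketch of that nesting (not from the thread), reusing the aligned struct from the earlier comments. It assumes a NumPy version in which `numpy.lib.format.descr_to_dtype` already handles subarrays, i.e. one that includes this fix; the field name `'x'` is chosen for illustration.

```python
import numpy as np
from numpy.lib import format as npy_format

# Aligned struct: one padding byte sits between 'a' and 'b'.
s = np.dtype([('a', np.int8), ('b', np.int16), ('c', np.int32)], align=True)

# Subarray of the struct, nested inside an outer structured dtype.
outer = np.dtype([('x', (s, (3,)))])

# The header description contains a (descr, shape) tuple for field 'x'.
descr = npy_format.dtype_to_descr(outer)
restored = npy_format.descr_to_dtype(descr)

base, shape = restored['x'].subdtype
offsets = [base.fields[name][1] for name in base.names]
print(descr)
print(base.names, offsets, shape, restored.itemsize)
```

The inner names should come back as `a`, `b`, `c` with no extra field created for the padding byte.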
Trying to unstick my pending comment...
Here's the case that this handles incorrectly:
I think we should fail to create a dtype with In any case, its
Agreed that the non-void struct is not important. We should still support arbitrarily nested subarrays though.
I think the last commit fixed parsing nested subarrays, at least the tests with the new dtypes pass.
numpy/lib/tests/test_format.py
Outdated
np.dtype([('x', ([('a', '|i1'),
                  ('', '|V3'),
                  ('b', '|i1'),
                  ('', '|V3'),
Why are you inserting these empty fields? The point of my example was that your code fails when there is unnamed padding here (fails by creating new fields, which this function's purpose is to avoid)
To hit the failing path, you need to use np.dtype({'names':['a','b'], 'formats':['i1','i1'], 'offsets':[0,4], 'itemsize':8}) as the inner type here, not [('a', '|i1'), ('', '|V3'), ('b', '|i1'), ('', '|V3')]
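To make the difference between the two spellings concrete, here is a short sketch (not from the thread). It assumes a NumPy version that ships `numpy.lib.format.descr_to_dtype`; the variable names are illustrative.

```python
import numpy as np
from numpy.lib import format as npy_format

# Two named fields with unnamed padding, expressed via explicit offsets.
padded = np.dtype({'names': ['a', 'b'], 'formats': ['i1', 'i1'],
                   'offsets': [0, 4], 'itemsize': 8})

# The description written to a file header spells the padding out as
# fields with an empty name.
descr = npy_format.dtype_to_descr(padded)

# Passing that list straight to np.dtype turns the padding into real
# fields, which is what descr_to_dtype is meant to avoid.
naive = np.dtype(descr)
restored = npy_format.descr_to_dtype(descr)

print(descr)
print(len(naive.names), restored.names)
```

So a test that starts from the list form already contains the padding as fields; only the offsets form exercises the padding-removal path.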
that passes too
The full type I use in a comment above still fails
Could you provide a complete example of a dtype that fails to roundtrip?
test added and fixed
            'offsets':[0,4],
            'itemsize':8,
        },
        (3,)),
I don't think dtype(dict, tuple) is legal, which will cause an error during test collection
tests are passing
>>> np.dtype(int, "this argument is ignored")
dtype('int32')
This test is ignoring the (3,) silently, which is a different bug.
It seems not, removing
This stuff still confuses me a bit, but it does seem the test should cover the interesting corner cases, so can probably merge.
numpy/lib/tests/test_format.py
Outdated
np.dtype((
    np.dtype((
        np.dtype([
            ('a', int)
-            ('a', int)
+            ('a', int),
+            ('b', np.dtype({'names':['a','b'],
+                            'formats':['i1','i1'],
+                            'offsets':[0,4],
+                            'itemsize':8})),
Finally, this is what will make things fail...
numpy/lib/format.py
Outdated
        # subtype, will always have a shape descr[1]
        dt = descr_to_dtype(descr[0])
        return numpy.dtype((dt, descr[1]))
    return numpy.dtype(descr)
-    return numpy.dtype(descr)
+    return np.dtype(descr_to_dtype(descr[0]), descr[1])
Is that assert correct here? Since there is no list around it, there cannot be a field name, so it must have two entries, right? (We should probably not leave the assert in, or does it get stripped on install?)
Almost looks correct, but this should be np.dtype((descr_to_dtype(descr[0]), descr[1]))
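The recursion discussed in this thread can be sketched roughly as follows. This is a simplified illustration written for this summary, not the code that was merged: `descr_to_dtype_sketch` is a hypothetical name, and titles and other corner cases are deliberately ignored.

```python
import numpy as np

def descr_to_dtype_sketch(descr):
    """Hypothetical, simplified version of the function under review."""
    if isinstance(descr, str):
        # Plain type string: no padding to remove.
        return np.dtype(descr)
    if isinstance(descr, tuple):
        # (base_descr, shape): recurse on the base and pass ONE tuple to
        # np.dtype, so the shape is not taken as a second argument.
        return np.dtype((descr_to_dtype_sketch(descr[0]), descr[1]))

    names, formats, offsets = [], [], []
    offset = 0
    for field in descr:
        if len(field) == 2:
            name, sub = field
            dt = descr_to_dtype_sketch(sub)
        else:
            name, sub, shape = field
            dt = np.dtype((descr_to_dtype_sketch(sub), shape))
        # Unnamed plain-void entries are padding: skip them, but still
        # advance the running offset.
        is_pad = name == '' and dt.type is np.void and dt.names is None
        if not is_pad:
            names.append(name)
            formats.append(dt)
            offsets.append(offset)
        offset += dt.itemsize
    return np.dtype({'names': names, 'formats': formats,
                     'offsets': offsets, 'itemsize': offset})

# A padded struct inside a subarray inside a struct.
nested_descr = [('x', ([('a', '|i1'), ('', '|V3'),
                        ('b', '|i1'), ('', '|V3')], (2,)))]
result = descr_to_dtype_sketch(nested_descr)
inner, inner_shape = result['x'].subdtype
print(result.names, inner.names, inner_shape, result.itemsize)
```

The tuple branch is where the one-argument spelling matters: the base and the shape have to travel together as a single tuple.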
close/reopen
OK, putting this in then. What I am not quite sure about is whether there is some issue that should be opened here. I may come back to it, but it will be a fringe issue in any case, I suppose.
Fixes #13431.
There are alternative spellings of dtype=[('c', '<f8', (2, 5))]; handle the dtype=[('c', ('<f8', (5,)), (2,))] variant.
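A quick way to check the variant from the description is an in-memory save/load round trip. This is a sketch under the assumption that the installed NumPy includes this fix; the buffer and variable names are illustrative.

```python
import io
import numpy as np

# The nested spelling from the PR description.
nested = np.dtype([('c', ('<f8', (5,)), (2,))])

arr = np.zeros(3, dtype=nested)
arr['c'] = np.arange(10.0).reshape(2, 5)  # broadcast over the 3 records

# Round-trip through the .npy format without touching the filesystem.
buf = io.BytesIO()
np.save(buf, arr)
buf.seek(0)
loaded = np.load(buf)

print(loaded.dtype, loaded['c'].shape)
```

Each record holds ten float64 values either way, so field `c` of the loaded array should have shape `(3, 2, 5)`.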