ENH: Accept nested CAI arrays in cupy.asarray and indexing #9419
Conversation
The indexing path went through `min_scalar_type`; I am not sure that makes sense, but I kept it for now. That required a slight change there, because the old call expected a list/tuple (and this is a bare CAI), but full conversion seemed not helpful.

I changed the flattening code to append recursively. It is still depth first. The reason is that it is now necessary to convert to a cupy array earlier, or `obj.ndim` failed in the tests for CAI objects.

Note: that is a very subtle change, because that path requires `.shape` and `.dtype` attributes for objects supporting `__cupy_get_ndarray__`. If that is a worry, we should tweak this a bit.
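For illustration, here is a minimal sketch of the recursive, depth-first append idea. The helper names `_flatten_into` and `_is_cupy_like` are hypothetical, not CuPy's actual internals:

```python
import cupy

def _is_cupy_like(obj):
    # Hypothetical check: a bare CAI object, or a __cupy_get_ndarray__
    # supporter (the subtle case mentioned above).
    return (hasattr(obj, '__cuda_array_interface__')
            or hasattr(obj, '__cupy_get_ndarray__'))

def _flatten_into(obj, out):
    # Depth-first: recurse into sequences, append leaves in order.
    if _is_cupy_like(obj):
        # Convert early, so later code can rely on .ndim/.shape/.dtype.
        out.append(cupy.asarray(obj))
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            _flatten_into(item, out)
    else:
        out.append(obj)
```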
I bloated this slightly now, to definitely avoid regressions.
It also re-organizes the error raising in the indexing code. I think that should be safe enough, but I am happy to undo it, as it is unrelated to the original issue.
(I guess part of it is that I don't like `min_scalar_type` on principle :).)
```python
except Exception:
    # Can't convert, raise original (probably identical) error.
    # For example `[1, [2]]` is "ragged" and fails this way.
    raise e from None
```
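To spell out the surrounding control flow as I read it (a sketch, not the actual CuPy source; `cupy.asarray` and the wrapper function stand in for the real conversion path):

```python
import numpy
import cupy

def _coerce_index(obj):
    try:
        return cupy.asarray(obj)
    except Exception as e:
        try:
            # The fallback is only attempted after conversion failed.
            dtype = numpy.min_scalar_type(obj)
        except Exception:
            # Can't convert, raise original (probably identical) error.
            # For example `[1, [2]]` is "ragged" and fails this way.
            raise e from None
        # The dtype-based error refinement continues below in the thread.
        raise
```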
I admit, this got a bit longer than I hoped for, and `min_scalar_type` should rarely need to convert.
But since the `min_scalar_type` handling looked unclear to me, I tried to re-organize it so the conversion is not attempted ahead of time.
(For reference, it was added in gh-7286.)
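For reference, `min_scalar_type` (shown with NumPy; I assume the internal call behaves the same) picks the smallest dtype that can hold a scalar, and returns object dtype for arbitrary objects:

```python
import numpy

print(numpy.min_scalar_type(10))        # uint8
print(numpy.min_scalar_type(-1))        # int8
print(numpy.min_scalar_type(1.5))       # float16
print(numpy.min_scalar_type(object()))  # object
```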
```cython
        return a.dtype
    cdef _ndarray_base arr = _convert_from_cupy_like(a, error=False)
    if arr is not None:
        return arr.dtype
```
The thing is that `_array_info_from_nested_sequence` actually requires a sequence.
We could use `_compute_concat_info_impl` instead, but it currently assumes that an object with `__cupy_get_ndarray__` also has `.shape` and `.dtype` attributes.
So I assumed that just using the simple conversion pattern is good enough here.
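Written out, the simple conversion pattern from the diff above looks roughly like this (the surrounding function is illustrative; `_convert_from_cupy_like` is the internal helper shown in the diff):

```python
import cupy

def _dtype_from_cupy_like(a):
    if isinstance(a, cupy.ndarray):
        return a.dtype
    # A bare CAI object is not a sequence, so it cannot go through
    # _array_info_from_nested_sequence; convert it first instead.
    arr = _convert_from_cupy_like(a, error=False)
    if arr is not None:
        return arr.dtype
    return None  # caller falls back to the nested-sequence path
```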
/test mini
```python
# If this is a NumPy array, it should have an unsupported dtype;
# raise that as not an "integer or boolean" dtype error.
if isinstance(s, numpy.ndarray):
```
This is a special edge case, but if `s` is a tensor from another CUDA tensor library that supports the CUDA array interface and has a dtype not supported by CuPy, this line might need to evaluate to True.
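For concreteness, a hypothetical sketch of that case: a foreign tensor advertising CAI with a typestr CuPy has no dtype for (a 2-byte void as a stand-in for something like bfloat16):

```python
import numpy

class ForeignTensor:
    # Hypothetical tensor from another CUDA library.
    @property
    def __cuda_array_interface__(self):
        return {
            'shape': (3,),
            'typestr': '|V2',    # no matching CuPy dtype
            'data': (0, False),  # null pointer; illustration only
            'version': 3,
        }

s = ForeignTensor()
print(isinstance(s, numpy.ndarray))  # False, yet its dtype is unsupported
```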
Hmmm, good point, it's surprisingly tricky (to the point that I wouldn't bother matching NumPy 100%)!
I'll let it sink in for a moment. `min_scalar_type` is probably pragmatic here in the end.
I think I'll apply this, which should get us as close as we reasonably can:
```python
try:
    dtype = min_scalar_type(s)
except Exception:
    raise e from None  # raise original error

if dtype.char in _dtype.all_type_chars:
    raise  # Conversion failed but dtype is fine, original error.
elif dtype.char != "O" or isinstance(s, numpy.ndarray):
    raise IndexError('arrays used as indices...')

# Object dtype but not a NumPy array is probably an arbitrary
# object (that can be wrapped into a 0-D array).
raise IndexError(generic_message)
```
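My reading of where typical inputs land under those branches (illustrative only, not tested against the final code):

```python
import numpy

print(numpy.min_scalar_type(1.0).char)
# 'e' (float16): a supported char, so the original error is re-raised.
print(numpy.min_scalar_type(numpy.array(['a'])))
# <U1: unsupported, and `s` is an ndarray -> 'arrays used as indices...'
print(numpy.min_scalar_type(object()).char)
# 'O', and not an ndarray -> the generic IndexError message.
```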
@asi1024 I am going back on my statement. You are right, but the TL;DR is: I am tempted to just delete this for simplicity, or otherwise would keep it approximately as is.

NumPy also raises the generic IndexError for `__array_interface__` objects. The specific error is only used for `numpy.ndarray`.
So, matching that, we could say that only `cupy.ndarray` should get this specific error, and that is already the case.
Or we keep broadening it to `numpy.ndarray` input as well (which this attempts).

The problem is that, for example, `arr[1.0]` would also go into the "arrays used as indices" path there... NumPy may have a fighting chance to make this specific, but I don't really think it is worth the trouble even there.

EDIT: I'll simplify this to check for `dtype.kind not in {'b', 'i', 'u'}` though.
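For reference, the NumPy behavior being matched (error wording from a recent NumPy release):

```python
import numpy

try:
    numpy.arange(3)[1.0]
except IndexError as exc:
    print(exc)
# only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`)
# and integer or boolean arrays are valid indices
```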
Indeed, as you say, this case is unlikely to become an issue. Let's revisit it if someone opens an issue.
/test mini

LGTM!
DRAFT: Based on previous PR gh-9418.

This fixes nested CAI coercion and indirectly gh-8352. (I am not sure how much of a priority this is in the end!)

I changed the flattening code to append recursively. It is still depth first. The reason is that it is now necessary to convert to a cupy array earlier, or `obj.ndim` failed in the tests for CAI objects.

The `min_scalar_type` call was the original source of the issue and problematic. While it is fixed, I opted to remove it from the indexing code anyway.

EDIT: Updated to reflect the change in PR content.
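To illustrate what the change enables, a sketch (CAIWrapper is a hypothetical stand-in for another library's tensor; only the CAI protocol itself is real):

```python
import cupy

class CAIWrapper:
    # Hypothetical: exposes only the CUDA Array Interface, nothing else.
    def __init__(self, arr):
        self._arr = arr

    @property
    def __cuda_array_interface__(self):
        return self._arr.__cuda_array_interface__

# With this change, a nested list of bare CAI objects coerces directly:
parts = [CAIWrapper(cupy.arange(3)), CAIWrapper(cupy.arange(3, 6))]
stacked = cupy.asarray(parts)
print(stacked.shape)  # (2, 3)
```

Indexing with such nested objects should go through the same coercion path, per the PR title.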