DOC Add info on 'array-like' array API inputs when array_api_dispatch=False
#32676
base: main
|
Another thought that I am 50/50 on - currently when an array that is not able to be converted via What do we think about adding something along the lines of - "if you wish to enable array API support and get {namespace} arrays as output, set the configuration At the moment this change would be a bit complicated because we perform the |
|
I think it would be beneficial to mention that array API dispatch should be activated. I think we try to handle non NumPy arrays with |
|
Yeah I agree, this seems like a niche/odd thing that we do, but we wouldn't really recommend anyone to use it. I've amended the wording. I think it is important to document, so people know this happens, because it can be quite unexpected to get a different namespaced array than what you've input e.g., you've forgotten to set |
|
I'm not sure if the array API section is the right place for this information. I think we should have this in a more general section of the scikit-learn docs, possibly we already state it there :-/ It has happened that as part of the array API we've changed what happens when you pass torch arrays to scikit-learn. In which case we get reports about it breaking. This makes me think that it is something people rely on and is possibly on purpose. But I don't fully know the "rule of thumb" for what works and how. I think it is something like "if it looks enough like a NumPy array" and/or if it supports |
My naive guess would be arrays on CPU device can be converted. I have tried torch 'cpu' and jax (
Interesting, can you point me to any references?
Tricky to look up. In the SVM user guide we state:
In "Developing scikit-learn estimators" -> "Input validation" we state:
I looked for "array " and "asarray" in our I am happy to add a note about this behaviour in the metrics section of the user guide and link it to here? |
doc/modules/array_api.rst
Outdated
| to avoid having to reset it to `False` it at the end of every code snippet, so as to | ||
| not affect the rest of the documentation. | ||
|
|
||
| For historical reasons, if you provide a non-NumPy array input to a |
Not sure 'for historical reasons' is the best phrase. I just meant that as some inputs can be lists, we always did asarray, which will also convert non-numpy arrays to numpy arrays.
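A minimal sketch of the conversion described above, assuming only NumPy is installed. `CpuArrayLike` is a hypothetical stand-in for a non-NumPy CPU array (it is not a class from any real library); it exposes `__array__`, which is the hook `np.asarray` uses to convert such objects, in the same way it converts a plain list.

```python
import numpy as np


class CpuArrayLike:
    """Hypothetical stand-in for a non-NumPy array living on CPU."""

    def __init__(self, values):
        self._values = list(values)

    def __array__(self, dtype=None, copy=None):
        # np.asarray calls this hook to obtain a NumPy array.
        return np.array(self._values, dtype=dtype)


# A list and a non-NumPy array-like both end up as numpy.ndarray.
from_list = np.asarray([1.0, 2.0, 3.0])
from_array_like = np.asarray(CpuArrayLike([1.0, 2.0, 3.0]))

print(type(from_list).__name__, type(from_array_like).__name__)
```

In this sketch the caller gets a NumPy array back regardless of the input type, which is the kind of namespace change the PR aims to document.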
|
#29107 is an example of a bug like this. |
lucyleeow
left a comment
I've realised that we (probably) do np.array or np.asarray on all inputs that we specify as array-like. Thus I amended the glossary entry for array-like to include PyTorch tensors on CPU and JAX arrays as examples. Also amended the wording in the array API user guide.
I couldn't find another suitable place in the user guide to add this info, so I will leave it for now and hope the glossary entry is adequate.
| to avoid having to reset it to `False` it at the end of every code snippet, so as to | ||
| not affect the rest of the documentation. | ||
|
|
||
| Scikit-learn accepts :term:`array-like` inputs for all :mod:`~sklearn.metrics` |
all :mod:`~sklearn.metrics`
Is this true..?
| into NumPy arrays using :func:`numpy.asarray` (or :func:`numpy.array`). | ||
| While this will successfully convert some array API inputs (e.g., JAX array), | ||
| we generally recommend setting `array_api_dispatch=True` when using array API inputs. | ||
| Note when `array_api_dispatch=False`, array outputs will be NumPy arrays. |
This last line is probably redundant, I am happy to remove.
I think it's important to somehow highlight that when array_api_dispatch=False, NumPy conversion can fail, for instance, calling fit on a torch tensor allocated on a GPU will raise an error.
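A rough illustration of that failure mode, assuming only NumPy is installed. `DeviceOnlyArray` is a hypothetical class (not from any real library) whose `__array__` hook raises `TypeError`, mimicking an array that cannot be copied to host memory implicitly. The error message is made up for the example.

```python
import numpy as np


class DeviceOnlyArray:
    """Hypothetical stand-in for an array that lives on an accelerator."""

    def __array__(self, dtype=None, copy=None):
        # Mimics a library refusing an implicit device-to-host copy.
        raise TypeError("cannot convert a device array to NumPy implicitly")


error = None
try:
    np.asarray(DeviceOnlyArray())
except TypeError as exc:
    # The error raised inside __array__ propagates to the caller.
    error = exc

print(f"conversion failed: {error}")
```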
| * a :class:`pandas.DataFrame` with all columns numeric | ||
| * a numeric :class:`pandas.Series` | ||
|
|
||
| Some array API inputs see (:ref:`array_api` for details): |
How about joining the two lists?
I think it's better to keep the lists separated because the latter is only expected to work when array API dispatch is enabled, which is not the case by default.
I thought the reason for listing them here is that these inputs already work/are used in the wild. See the linked issue in the discussion of this PR (and I think the same person created another one like it earlier)
| Note that we set it with :func:`config_context` below to avoid having to call | ||
| :func:`set_config(array_api_dispatch=False)` at the end of every code snippet | ||
| that uses the array API. | ||
| Note that in the examples below, we set it within a context (:func:`config_context`) |
| Note that in the examples below, we set it within a context (:func:`config_context`) | |
| Note that in the examples below, we use a context manager (:func:`config_context`) |
| :func:`set_config(array_api_dispatch=False)` at the end of every code snippet | ||
| that uses the array API. | ||
| Note that in the examples below, we set it within a context (:func:`config_context`) | ||
| to avoid having to reset it to `False` it at the end of every code snippet, so as to |
| to avoid having to reset it to `False` it at the end of every code snippet, so as to | |
| to avoid having to reset it to `False` at the end of every code snippet, so as to |
Reference Issues/PRs
Related #30454
Found out in #32600 (comment)
What does this implement/fix? Explain your changes.
Adds info on what happens when non-NumPy array input occurs with array_api_dispatch=False.
I only realised that we do this and I think it would be nice if it was mentioned in the docs.
I thought I'd add a section on the array_api_dispatch as it no longer seemed to fit under the 'Example usage' section.
Any other comments?
cc @OmarManzoor @lesteve