
@lucyleeow
Member

Reference Issues/PRs

Related #30454
Found out in #32600 (comment)

What does this implement/fix? Explain your changes.

Adds info on what happens when non-NumPy array inputs are passed with array_api_dispatch=False.
I only just realised that we do this, and I think it would be nice to mention it in the docs.

I thought I'd add a section on the array_api_dispatch setting, as it no longer seemed to fit under the 'Example usage' section.

Any other comments?

cc @OmarManzoor @lesteve

@github-actions

github-actions bot commented Nov 8, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 25e8251. Link to the linter CI: here

@lucyleeow
Member Author

lucyleeow commented Nov 8, 2025

Another thought that I am 50/50 on: currently, when an array cannot be converted via numpy.asarray, we get an error like:

RuntimeError: Can not convert array on the 'array_api_strict.Device('device1')' device to a Numpy array.

What do we think about adding something along the lines of "If you wish to enable array API support and get {namespace} arrays as output, set the configuration array_api_dispatch=True"? If the input is, e.g., a CuPy array, the user may well want to enable array API support but forgot, or didn't know, that they need to set array_api_dispatch=True...

At the moment this change would be a bit complicated because we perform the asarray in different places for classification metrics vs regression metrics vs clustering metrics etc, but once we start adding support for mixed array inputs (#31829), I think the error could be raised in move_to, which would be called at the start of functions.

@OmarManzoor
Contributor

I think it would be beneficial to mention that array API dispatch should be activated. I think we try to handle non-NumPy arrays with array_api_dispatch=False by converting them to NumPy for metrics, but I don't think that is something that should be recommended, because we can run into unexpected errors.

@lucyleeow
Member Author

Yeah, I agree: this seems like a niche/odd thing that we do, but we wouldn't really recommend that anyone rely on it. I've amended the wording.

I think it is important to document this so people know it happens, because it can be quite unexpected to get an array from a different namespace than the one you passed in, e.g., if you've forgotten to set array_api_dispatch=True.

@lucyleeow lucyleeow changed the title DOC Add info on array API input when array_api_dispatch=False DOC Add info on array API input in metrics when array_api_dispatch=False Nov 10, 2025
@betatim
Member

betatim commented Nov 12, 2025

I'm not sure if the array API section is the right place for this information. I think we should have this in a more general section of the scikit-learn docs, possibly we already state it there :-/

It has happened that, as part of the array API work, we've changed what happens when you pass torch arrays to scikit-learn. In which case we get reports about it breaking. This makes me think that it is something people rely on, possibly on purpose. But I don't fully know the "rule of thumb" for what works and how. I think it is something like "if it looks enough like a Numpy array" and/or if it supports __array_function__

@lucyleeow
Member Author

lucyleeow commented Nov 13, 2025

But I don't fully know the "rule of thumb" for what works and how. I think it is something like "if it looks enough like a Numpy array" and/or if it supports __array_function__

My naive guess would be that arrays on a CPU device can be converted. I have tried torch 'cpu' tensors and JAX arrays (import jax.numpy as jnp; arr = jnp.array([1, 2, 3])), and both can be converted to NumPy with np.asarray.
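The NumPy-only sketch below illustrates why such CPU arrays convert. `numpy.asarray` accepts plain lists and any object that implements the `__array__` conversion hook. As far as I know, CPU PyTorch tensors and JAX arrays implement that hook, but that is an assumption here. `CpuLikeArray` is a hypothetical stand-in so the snippet runs without torch or JAX installed.

```python
import numpy as np


class CpuLikeArray:
    """Hypothetical stand-in for a CPU array from another library."""

    def __init__(self, values):
        self._values = list(values)

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this hook when converting the object, e.g. in np.asarray.
        return np.array(self._values, dtype=dtype)


# Plain Python lists are "array-like" too, which is one reason asarray is used.
from_list = np.asarray([1, 2, 3])
from_foreign = np.asarray(CpuLikeArray([1, 2, 3]))

# Both come back as NumPy arrays: the input's original namespace is lost.
print(type(from_list).__name__, type(from_foreign).__name__)  # ndarray ndarray
```

This is why, with array_api_dispatch=False, outputs are NumPy arrays even when the input came from another library.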

In which case we get reports about it breaking.

Interesting, can you point me to any references?

I think we should have this in a more general section of the scikit-learn docs, possibly we already state it there :-/

Tricky to look up. In the SVM user guide we state:

The support vector machines in scikit-learn support both dense (numpy.ndarray and convertible to that by numpy.asarray) and sparse (any scipy.sparse) sample vectors as input.

In "Developing scikit-learn estimators" -> "Input validation" we state:

The module sklearn.utils contains various functions for doing input validation and conversion. Sometimes, np.asarray suffices for validation;

I looked for "array " and "asarray" in our .rst files and couldn't find anything else relevant.

I am happy to add a note about this behaviour in the metrics section of the user guide and link to it from here?

to avoid having to reset it to `False` it at the end of every code snippet, so as to
not affect the rest of the documentation.

For historical reasons, if you provide a non-NumPy array input to a
Member Author

Not sure 'for historical reasons' is the best phrase. I just meant that, because some inputs can be lists, we have always called asarray, which also converts non-NumPy arrays to NumPy arrays.

@betatim
Member

betatim commented Nov 13, 2025

#29107 is an example of a bug like this.

Member Author

@lucyleeow lucyleeow left a comment

I've realised that we (probably) call np.array or np.asarray on all inputs that we specify as array-like. Thus I amended the glossary entry for array-like to include PyTorch tensors on CPU and JAX arrays as examples. I also amended the wording in the array API user guide.

I couldn't find another suitable place in the user guide to add this info, so I will leave it for now and hope the glossary entry is adequate.

to avoid having to reset it to `False` it at the end of every code snippet, so as to
not affect the rest of the documentation.

Scikit-learn accepts :term:`array-like` inputs for all :mod:`~sklearn.metrics`
Member Author

all :mod:`~sklearn.metrics`

Is this true..?

into NumPy arrays using :func:`numpy.asarray` (or :func:`numpy.array`).
While this will successfully convert some array API inputs (e.g., JAX array),
we generally recommend setting `array_api_dispatch=True` when using array API inputs.
Note when `array_api_dispatch=False`, array outputs will be NumPy arrays.
Member Author

@lucyleeow lucyleeow Nov 15, 2025

This last line is probably redundant, I am happy to remove.

Member

I think it's important to somehow highlight that when array_api_dispatch=False, NumPy conversion can fail; for instance, calling fit on a torch tensor allocated on a GPU will raise an error.

@lucyleeow lucyleeow changed the title DOC Add info on array API input in metrics when array_api_dispatch=False DOC Add info on 'array-like' array AI inputs when array_api_dispatch=False Nov 17, 2025
@betatim betatim changed the title DOC Add info on 'array-like' array AI inputs when array_api_dispatch=False DOC Add info on 'array-like' array API inputs when array_api_dispatch=False Nov 17, 2025
* a :class:`pandas.DataFrame` with all columns numeric
* a numeric :class:`pandas.Series`

Some array API inputs see (:ref:`array_api` for details):
Member

How about joining the two lists?

Member

I think it's better to keep the lists separated because the latter is only expected to work when array API dispatch is enabled, which is not the case by default.

Member

I thought the reason for listing them here is that these inputs already work/are used in the wild. See the linked issue in the discussion of this PR (and I think the same person created another one like it earlier)

Note that we set it with :func:`config_context` below to avoid having to call
:func:`set_config(array_api_dispatch=False)` at the end of every code snippet
that uses the array API.
Note that in the examples below, we set it within a context (:func:`config_context`)
Member

Suggested change
Note that in the examples below, we set it within a context (:func:`config_context`)
Note that in the examples below, we use a context manager (:func:`config_context`)

:func:`set_config(array_api_dispatch=False)` at the end of every code snippet
that uses the array API.
Note that in the examples below, we set it within a context (:func:`config_context`)
to avoid having to reset it to `False` it at the end of every code snippet, so as to
Member

Suggested change
to avoid having to reset it to `False` it at the end of every code snippet, so as to
to avoid having to reset it to `False` at the end of every code snippet, so as to
