FEA Add array API support for average_precision_score
#32909
Conversation
```
binary_metric : callable, returns shape [n_classes]
    The binary metric function to use.
```
I changed the order of the params in the docs to match the order in which they are read into the function, like in all the other places.
_average_binary_score is also used in roc_auc_score. So it's probably best to first discuss how to properly add array API support to it, or wait until this PR is merged before creating a PR for roc_auc_score.
```python
if metric.__name__ == "average_precision_score":
    # we need y_pred_np to be of shape (n_samples, n_classes)
    y_pred_np = np.array(
        [
            [0.7, 0.2, 0.05, 0.05],
            [0.1, 0.8, 0.05, 0.05],
            [0.1, 0.1, 0.7, 0.1],
            [0.05, 0.05, 0.1, 0.8],
        ],
        dtype=dtype_name,
    )
```
This case handling needs to be added because average_precision_score handles multiclass inputs by binarising y_true. As a consequence, y_score needs to be of shape (n_samples, n_classes). I have made a PR with a fix here: #32912.
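For illustration, a minimal sketch of the shapes involved (the values are made up, and it assumes the multiclass handling described above, where y_true is binarised internally):

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Multiclass targets as class labels, shape (n_samples,).
y_true = np.array([0, 1, 2, 3])
# Per-class scores, shape (n_samples, n_classes); each row sums to 1 here.
y_score = np.array(
    [
        [0.7, 0.2, 0.05, 0.05],
        [0.1, 0.8, 0.05, 0.05],
        [0.1, 0.1, 0.7, 0.1],
        [0.05, 0.05, 0.1, 0.8],
    ]
)

# Per the comment above, average_precision_score binarises y_true internally
# in the multiclass case, so y_score must be 2D.
average_precision_score(y_true, y_score)
```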
```python
if average_weight is not None:
    # Scores with 0 weights are forced to be 0, preventing the average
    # score from being affected by 0-weighted NaN elements.
    average_weight = np.asarray(average_weight)
```
This is not needed anymore, since average_weight is converted to an array earlier.
```python
if not get_config().get("array_api_dispatch", False):
    y_true = _convert_to_numpy(y_true, xp=xp)
    y_score = _convert_to_numpy(y_score, xp=xp)
    if sample_weight is not None:
        sample_weight = _convert_to_numpy(sample_weight, xp=xp)
```
This (hopefully) fixes an error occurring on the CI (link to output):
```
E   TypeError: sum() received an invalid combination of arguments - got (out=NoneType, axis=int, ), but expected one of:
E    * (*, torch.dtype dtype = None)
E      didn't match because some of the keywords were incorrect: out, axis
E    * (tuple of ints dim, bool keepdim = False, *, torch.dtype dtype = None)
E    * (tuple of names dim, bool keepdim = False, *, torch.dtype dtype = None)
```
@lesteve and I investigated this together. It seems that numpy.sum(), when passed a torch tensor, internally tries to dispatch to torch, but that fails because torch.sum() doesn't have the out argument. The same happens with np.repeat(), which has an axis argument that torch.repeat() doesn't have (this is what raised locally in my setup). A minimal reproduction is sketched below.
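A rough reproduction sketch of the failure mode described above (exact behaviour depends on the NumPy and PyTorch versions; NumPy delegates the reduction to the tensor's own `sum` method with keyword arguments torch does not accept):

```python
import numpy as np
import torch

t = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

# np.sum delegates to t.sum(axis=0, out=None); torch's Tensor.sum overloads
# do not accept this keyword combination, so with the versions used on the CI
# this raises:
#   TypeError: sum() received an invalid combination of arguments ...
np.sum(t, axis=0)
```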
Yes, I ran into the same issue when I was working on roc_auc_score.
Instead of converting everything to NumPy ahead of time, we should inspect the namespace and device of y_score only. Indeed, we want to allow y_true to have an object dtype for string class labels (hence a NumPy array) even when the predictions come from a non-NumPy namespace with a non-CPU device.
See: https://scikit-learn.org/dev/modules/array_api.html#scoring-functions
y_true should thereafter be converted to binary indicators (using pos_label when available) and moved to the same device, dtype and namespace as y_score just before doing the arithmetic computation.
sample_weight should similarly follow the namespace, dtype and device of y_score using move_to.
See the discussion and implementation of the common test being worked on in #32755 for mixed input metrics.
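A rough, hypothetical sketch of the pattern described in this comment, for the binary case; the helper name `_binarize_and_move` is made up and the exact conversion calls are placeholders, not scikit-learn's final API:

```python
import numpy as np
from sklearn.utils._array_api import get_namespace_and_device


def _binarize_and_move(y_true, y_score, pos_label=1):
    # Only y_score determines the namespace and device used for arithmetic.
    xp, _, device = get_namespace_and_device(y_score)

    # y_true may be a NumPy array with an object dtype (string class labels),
    # so build the binary indicators on the NumPy side first.
    y_true_indicator = np.asarray(y_true == pos_label, dtype=np.float64)

    # Move the indicators to y_score's namespace, device and dtype just
    # before the arithmetic computation.
    return xp.asarray(y_true_indicator, dtype=y_score.dtype, device=device)
```

The multiclass case would binarise with the label binarization machinery instead of a single pos_label comparison, and sample_weight would follow the same namespace/device/dtype, as described above.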
I will keep your comment in mind for later, but this addresses a different problem: it fixes a test failure when all inputs (specifically y_true and y_score) are torch tensors on CPU.
In check_array_api_metric() we have defined a numpy_as_array_works boolean that is True if array API dispatch is not enabled but we get input from a non-NumPy namespace that can be converted to numpy (see here):

```python
# When array API dispatch is disabled, and np.asarray works (for example PyTorch
# with CPU device), calling the metric function with such numpy compatible inputs
# should work (albeit by implicitly converting to numpy arrays instead of
# dispatching to the array library).
try:
    np.asarray(a_xp)
    np.asarray(b_xp)
    numpy_as_array_works = True
```

In this case, we feed two torch arrays into the metric, but the test fails (see here):

```python
if numpy_as_array_works:
    metric_xp = metric(a_xp, b_xp, **metric_kwargs)
```

If I have understood correctly, we should always try to convert to numpy when array API dispatch is disabled (get_config().get("array_api_dispatch", False) is False) but the input is convertible to numpy, and making sure that we do is the purpose of this test (added in #30454).
(In this WIP PR I haven't taken care of mixed inputs at all yet, also #32755 is not merged yet.)
sklearn/metrics/_base.py
Outdated
```python
import sklearn.externals.array_api_extra as xpx
from sklearn.utils import check_array, check_consistent_length
from sklearn.utils._array_api import _ravel, get_namespace_and_device
```
```diff
- from sklearn.utils._array_api import _ravel, get_namespace_and_device
+ from sklearn.utils._array_api import _ravel, get_namespace_and_device, xpx
```
I wasn't aware of that, and maybe that had been discussed somewhere I missed. So I did a bit of searching and I have found xpx imported via _array_api.py in sklearn/model_selection/_search.py and in sklearn/tree/tests/test_tree.py.
And I have found from sklearn.externals import array_api_extra as xpx imports in three other files.
I prefer the more explicit import from sklearn.externals, since it makes it easier to see what is going on at first glance even if you are unfamiliar with the array API. It will have to be obvious for people unfamiliar with the array API, as this import will appear in every function in the future, and we also want to make it easy for future contributors.
Reference Issues/PRs
towards #26024
What does this implement/fix? Explain your changes.
Adds array API support to `average_precision_score`.
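For context, a sketch of the intended usage once this is in (values are made up; this follows the documented array API dispatch mechanism, not code from this PR):

```python
import torch
from sklearn import config_context
from sklearn.metrics import average_precision_score

y_true = torch.tensor([0, 0, 1, 1])
y_score = torch.tensor([0.1, 0.4, 0.35, 0.8])

# With dispatch enabled, the metric should be computed with torch operations
# instead of implicitly converting the inputs to NumPy arrays.
with config_context(array_api_dispatch=True):
    ap = average_precision_score(y_true, y_score)
```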
AI usage disclosure
I used AI assistance for:
Any other comments?
WIP
TODO:
- … `average_precision_score`, as well as from `_check_set_wise_labels`, which allows to have `present_labels` as an array instead of a list
- `LabelBinarizer` instead of `label_binarize` (# TODO: check if in fact necessary)
- `_average_binary_score`