FEA Add array API support for temperature scaling in CalibratedClassifierCV #32246

OmarManzoor · 2025-09-22T06:43:58Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Attempts to add array API support for temperature scaling in CalibratedClassifierCV

Notes:

find_minimum from scipy.optimize.elementwise that supports the array API doesn't quite work because it works with arrays and if we use arrays as inputs and outputs it breaks for array-api-strict. If we stick with scalars it breaks at a point where an array is expected.
I tried simply adapting the multinomial loss for the array API. Since minimize_scalar, which we currently use, works only with scalar values within its main computation loops, I convert the input to the array API namespace when entering the _half_multinomial_loss function and convert the result back to a float when returning from it. This has the drawback of converting back and forth between the CPU and the GPU.
I don't think we can use array API consistently within CalibratedClassifierCV because it involves an estimator which we don't know supports array API or not and also involves cross validation before going on to the actual calibration computation.
I ran some benchmarks for just the _TemperatureScaling class on Google Colab and on my local Mac M1. The results are as follows, and they vary depending on the SciPy version:

scipy == 1.16.1

Local Mac M1

Avg execution_time for numpy: 7.254365730285644
Avg execution_time for torch mps: 7.542782831192016

Google Colab with T4 GPU

Avg execution_time for numpy: 14.60176453590393
Avg execution_time for torch cuda: 10.203765702247619

___________________________________________________

scipy == 1.15.3

Local Mac M1

Avg execution_time for numpy: 7.495703768730164
Avg execution_time for torch mps: 1.769882321357727

Google Colab with T4 GPU

Avg execution_time for numpy: 14.949613022804261
Avg execution_time for torch cuda: 0.7114695549011231

I noticed the significant performance difference with scipy == 1.15.3 because I also tried running the benchmarks on a Kaggle kernel: it runs Python 3.10, so the latest SciPy version available there is 1.15.3.
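For readers following the conversion strategy described in the notes, here is a minimal sketch of a float-in/float-out objective of the kind `minimize_scalar` expects. It is not the PR's code: the helper name `make_scalar_loss`, the inverse-temperature parameter `beta`, and NumPy standing in for the array namespace `xp` are all assumptions. With a GPU namespace such as torch, each call would build a 0-d array on the device and `float(...)` would copy the result back to the host, which is where the CPU/GPU round-trip cost comes from.

```python
import numpy as np


def make_scalar_loss(logits, y, xp=np):
    """Build a float -> float objective for a scalar optimizer.

    Hypothetical helper (not the PR's code). `logits` is an
    (n_samples, n_classes) array from namespace `xp` and `y` holds integer
    class indices. The objective is the mean half multinomial loss of the
    logits scaled by an inverse temperature `beta`.
    """
    n_classes = logits.shape[1]
    # Boolean one-hot mask of the true classes, computed once up front.
    mask = xp.asarray(y)[:, None] == xp.arange(n_classes)[None, :]

    def loss(beta):
        # Python float -> array in the namespace (a host -> device copy on
        # a GPU; a real implementation would also pass device=...).
        beta_xp = xp.asarray(beta, dtype=logits.dtype)
        z = beta_xp * logits
        # Numerically stable log-sum-exp over the class axis.
        z_max = xp.max(z, axis=1, keepdims=True)
        lse = xp.log(xp.sum(xp.exp(z - z_max), axis=1)) + z_max[:, 0]
        # Scaled logit of the true class for each sample.
        z_true = xp.sum(xp.where(mask, z, xp.zeros_like(z)), axis=1)
        # Array -> Python float (a device -> host sync on a GPU).
        return float(xp.mean(lse - z_true))

    return loss


rng = np.random.default_rng(0)
logits = rng.normal(size=(100, 3))
y = rng.integers(0, 3, size=100)
loss = make_scalar_loss(logits, y)
print(loss(0.0))  # beta = 0 gives uniform probabilities, so this is close to log(3)
print(loss(1.0))
```

Assuming SciPy is installed, the same callable could then be passed to `scipy.optimize.minimize_scalar(loss, bounds=(0.01, 100.0), method="bounded")`; the bounds here are arbitrary illustrative values.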

Any other comments?

CC: @ogrisel @betatim @virchan @lesteve

What is your opinion? Should we support array API for this class?


github-actions · 2025-09-22T06:44:59Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: c57fef8. Link to the linter CI: here

OmarManzoor · 2025-09-22T06:45:15Z

Note: Since this is a DRAFT PR for now I left the bench.py file so that we can experiment if required.
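Since bench.py itself is not reproduced in this thread, here is a hypothetical sketch of the kind of averaging harness that produces numbers like the "Avg execution_time" lines above. The name `average_runtime`, the warm-up count, and the repeat count are assumptions. On GPU backends, work is often queued asynchronously, so the timed callable should end with a host synchronization (for example, converting its final result to a Python float), or the averages may undercount.

```python
import statistics
import time


def average_runtime(func, *args, n_repeats=5, warmup=1, **kwargs):
    """Return the mean wall-clock time in seconds of func(*args, **kwargs).

    Hypothetical benchmark helper. Warm-up calls are excluded so that
    one-off costs (imports, JIT compilation, GPU context creation) do not
    inflate the average.
    """
    for _ in range(warmup):
        func(*args, **kwargs)
    timings = []
    for _ in range(n_repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


# Placeholder workload; a real benchmark would time the calibration fit.
avg = average_runtime(sum, range(1_000_000), n_repeats=3)
print(f"Avg execution_time: {avg:.6f}")
```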

ogrisel · 2025-09-23T12:29:06Z

find_minimum from scipy.optimize.elementwise that supports the array API doesn't quite work because it works with arrays and if we use arrays as inputs and outputs it breaks for array-api-strict. If we stick with scalars it breaks at a point where an array is expected.

Has this problem already been reported upstream?

I don't think we can use array API consistently within CalibratedClassifierCV because it involves an estimator which we don't know supports array API or not

I think it's fair for meta estimators to only support array API when the base estimator does (and let any exception raised by the underlying estimator bubble up otherwise).

and also involves cross validation before going on to the actual calibration computation.

How is that a problem? I think our cross-validation tools support array API, no? If not, we should fix that first.

I ran some benchmarks for just the _TemperatureScaling class on google colab and my local mac M1, the results are as follows which show they vary based on the scipy version:

Have you tried to use a profiler to understand the performance difference between scipy versions with torch/mps? Ideally, this should be reported as a performance regression upstream along with a minimal reproducer that does not involve scikit-learn.
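A minimal standard-library sketch of the profiling step suggested here: `profile_call` is a hypothetical helper, and the idea would be to run it on the same temperature-scaling fit under SciPy 1.15 and 1.16 and compare the top cumulative-time entries. The `slow_sum` function is only a placeholder workload.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, sort_by="cumulative", limit=15, **kwargs):
    """Run func under cProfile and return (result, formatted stats text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Show the `limit` entries with the largest cumulative time.
    stats.sort_stats(sort_by).print_stats(limit)
    return result, buffer.getvalue()


def slow_sum(n):
    return sum(i * i for i in range(n))


result, report = profile_call(slow_sum, 10_000)
print(report)
```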

OmarManzoor · 2025-09-23T12:33:47Z

@ogrisel cross_val_predict currently does not support the array API.

Also I haven't reported either of the scipy issues that I noted. But wouldn't it be better for someone else to confirm my observations as well?

ogrisel · 2025-09-23T12:53:25Z

@ogrisel cross_val_predict currently does not support the array API.

Ah ok. This should indeed be addressed in a dedicated PR then.

ogrisel · 2025-09-23T13:26:21Z

I confirm that I observe a similar performance regression using SciPy 1.16 (vs 1.15) using the bench.py (with "mps" device and np/xp.float32 dtypes).

OmarManzoor · 2025-09-24T06:08:07Z

I opened a scipy issue with respect to the slower runtimes that we noted: scipy/scipy#23670

ogrisel

Here is some preliminary feedback, some of which we already discussed over Discord.

ogrisel · 2025-09-24T07:42:03Z

sklearn/calibration.py

            calibrators.append(calibrator)
    elif method == "temperature":
+        max_float_dtype = _max_precision_float_dtype(xp, xp_device)
+        predictions = xp.asarray(predictions, dtype=max_float_dtype, device=xp_device)


This should not be needed if the underlying classifier supports array API.
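For context on the dtype line in this diff: PyTorch's MPS backend does not support float64, so a "maximum precision" float dtype has to depend on the device. The following is a hypothetical stand-in written for illustration; scikit-learn's private `_max_precision_float_dtype` may be implemented differently, and detecting torch through the namespace's `__name__` is an assumption.

```python
import numpy as np


def max_precision_float_dtype(xp, device):
    """Return the widest floating dtype usable on `device` (hypothetical)."""
    # Assumption: a torch namespace (torch itself or a compat wrapper such
    # as array_api_compat.torch) has a module name ending in "torch".
    is_torch = getattr(xp, "__name__", "").endswith("torch")
    # PyTorch's Apple "mps" backend has no float64 support.
    if is_torch and str(device).startswith("mps"):
        return xp.float32
    return xp.float64


dtype = max_precision_float_dtype(np, "cpu")
predictions = np.asarray([[0.2, 0.8], [0.6, 0.4]], dtype=dtype)
print(predictions.dtype)  # float64
```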

Since we are going with the approach of supporting array API only when the underlying estimator also supports it, then yes, these kinds of conversions in the CalibratedClassifierCV fit method and here can be removed. However, I think we would first need array API support for at least the cross_val_predict function, so that we can handle the case when ensemble=False. Maybe this first PR should focus on ensemble=False, because when ensemble=True our CV splitters return NumPy ndarrays for the train and test indices, which won't work when we are dealing with array API inputs. What do you think?
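To illustrate the index issue raised above: scikit-learn's CV splitters yield NumPy integer arrays, and a NumPy array is not guaranteed to be a valid index for, say, a torch or array-api-strict array, so a portable implementation would convert the indices into the data's namespace first. This is a hypothetical sketch with NumPy standing in for `xp`; `take_fold` is an invented name, and a manual `np.array_split` replaces a real splitter so that the example has no scikit-learn dependency.

```python
import numpy as np


def take_fold(X, y, indices, xp=np):
    """Select the rows of X and y given NumPy fold indices (hypothetical).

    The indices are moved into the namespace of the data first; a real
    implementation would also pass device=... to match X.
    """
    idx = xp.asarray(indices)
    return xp.take(X, idx, axis=0), xp.take(y, idx, axis=0)


X = np.arange(12.0).reshape(6, 2)
y = np.array([0, 1, 0, 1, 0, 1])

# Stand-in for a CV splitter: three contiguous folds of NumPy indices.
folds = np.array_split(np.arange(6), 3)
test_idx = folds[0]
train_idx = np.concatenate(folds[1:])

X_train, y_train = take_fold(X, y, train_idx)
X_test, y_test = take_fold(X, y, test_idx)
```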


ogrisel · 2025-09-24T08:07:36Z

sklearn/tests/test_calibration.py

+    with config_context(array_api_dispatch=True):
+        cal_clf_xp = CalibratedClassifierCV(
+            FrozenEstimator(clf), cv=3, method="temperature", ensemble=False
+        ).fit(X_cal_xp, y_cal_xp)


I think we should test with y_cal being an array of str instances here as well.
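Testing with string labels matters because the Array API standard defines no string dtype, so string targets cannot be moved to a namespace such as torch. One common pattern, sketched here with NumPy only, is to encode labels into integer codes on the host and decode predictions back through the `classes` array. The variable names are illustrative, and whether CalibratedClassifierCV follows exactly this path is not shown in this thread.

```python
import numpy as np

y_cal = np.array(["spam", "ham", "spam", "eggs"])

# Encode string labels into integer codes on the host (NumPy side);
# `classes` is sorted, and classes[y_encoded] reconstructs y_cal.
classes, y_encoded = np.unique(y_cal, return_inverse=True)

# y_encoded is an integer array that could be converted to another
# array namespace; the decoding step stays on the NumPy side.
decoded = classes[y_encoded]
print(classes)    # ['eggs' 'ham' 'spam']
print(y_encoded)  # [2 1 2 0]
```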

sklearn/calibration.py

virchan

The Array API version _half_multinomial_loss should be okay. I did some tests to compare it with HalfMultinomialLoss in addition to running test_half_multinomal_loss, and they all passed.
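A cross-check of the kind described can be sketched with NumPy alone: compare a numerically stable log-sum-exp formulation of the per-sample multinomial loss, log(sum_k exp(raw_k)) - raw_y, against a direct -log(softmax(raw)[y]) reference. This does not use scikit-learn's private `HalfMultinomialLoss`, and the function names are hypothetical. The naive reference would overflow for large logits (exp(1000) is inf in float64), which is why the stable form subtracts the row maximum first.

```python
import numpy as np


def half_multinomial_loss_stable(raw, y):
    """Per-sample log-sum-exp(raw) - raw[y], computed stably."""
    raw_max = np.max(raw, axis=1, keepdims=True)
    lse = np.log(np.sum(np.exp(raw - raw_max), axis=1)) + raw_max[:, 0]
    return lse - raw[np.arange(raw.shape[0]), y]


def half_multinomial_loss_naive(raw, y):
    """Direct per-sample reference: -log(softmax(raw)[y])."""
    out = []
    for row, label in zip(raw, y):
        probs = np.exp(row) / np.sum(np.exp(row))
        out.append(-np.log(probs[label]))
    return np.array(out)


rng = np.random.default_rng(42)
raw = rng.normal(size=(50, 4))
y = rng.integers(0, 4, size=50)
np.testing.assert_allclose(
    half_multinomial_loss_stable(raw, y),
    half_multinomial_loss_naive(raw, y),
    rtol=1e-10,
)
```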

OmarManzoor · 2025-09-26T09:29:20Z

The Array API version _half_multinomial_loss should be okay. I did some tests to compare it with HalfMultinomialLoss in addition to running test_half_multinomal_loss, and they all passed.

Thank you for verifying

OmarManzoor added 9 commits September 9, 2025 12:59

FEA Add array API support for TemperatureScaling in CalibratedClassifierCV

11aed7a

Merge branch 'main' into array-api-temperature-scaling

3fa226f

Minor fix

86fbfef

Just add support for computing the multinomial loss and convert to float at the end

ee42dce

Add benchmark for testing

0aed74d

Update bench

892a3af

Update bench

dc1bd35

Update bench

01347b0

Update bench

4cae31d

github-actions bot added the module:utils label Sep 22, 2025

virchan added module:calibration Array API and removed module:utils labels Sep 22, 2025

OmarManzoor mentioned this pull request Sep 24, 2025

BUG: xp_promote and xp_result_type are slow for big tensors because iter is slow for tensors scipy/scipy#23670

Open

ogrisel reviewed Sep 24, 2025


OmarManzoor added 3 commits September 24, 2025 16:14

Update the code to use a classifier which supports array API

7f92b99

Merge branch 'main' into array-api-temperature-scaling

3492934

Some updates as suggested in the PR

c57fef8

OmarManzoor mentioned this pull request Sep 25, 2025

FEA Add array API support for cross_val_predict #32270

Merged

virchan reviewed Sep 26, 2025
