FEA Add array API support to `LabelBinarizer(sparse_output=False)` for numeric labels #32582

virchan · 2025-10-27T06:55:10Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

This PR adds Array API support to LabelBinarizer and label_binarize when sparse_output=False for numeric labels, and therefore does not conflict with #30439 (comment). Specifically,

1. Both LabelBinarizer and label_binarize will raise a ValueError when the input y has a non-NumPy namespace and sparse_output=True.
2. If LabelBinarizer is fitted on a sparse matrix (i.e., sparse_input_=True), calling inverse_transform on a non-NumPy array will raise a ValueError.

~~3. If the input classes contains string labels, label_binarize will automatically fall back to the NumPy namespace.~~
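As a rough illustration of the behaviour described above, here is a minimal pure-Python sketch. It is not scikit-learn's implementation: `check_sparse_output_supported` and `binarize_numeric_labels` are hypothetical names, the namespace is modelled as a plain string, and the error message is invented. The real `LabelBinarizer` also special-cases binary problems (a single output column), which this sketch ignores.

```python
def check_sparse_output_supported(namespace_name, sparse_output):
    """Hypothetical guard: sparse output is only available for NumPy inputs."""
    if sparse_output and namespace_name != "numpy":
        raise ValueError(
            f"sparse_output=True is not supported for {namespace_name!r} arrays"
        )


def binarize_numeric_labels(y, classes, neg_label=0, pos_label=1):
    """One-vs-all encoding: one row per sample, one column per class."""
    return [[pos_label if label == c else neg_label for c in classes] for label in y]


# Both calls pass the guard: NumPy may request sparse output,
# and a non-NumPy namespace is fine as long as the output is dense.
check_sparse_output_supported("numpy", sparse_output=True)
check_sparse_output_supported("torch", sparse_output=False)

encoded = binarize_numeric_labels([1, 2, 6, 4, 2], classes=[1, 2, 4, 6])
print(encoded)
# [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]
```

Calling `check_sparse_output_supported("torch", sparse_output=True)` raises a `ValueError`, mirroring the first rule in the list.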

Any other comments?

Adjusted the atol value in the test_graphical_lassos function due to a CI failure with `random_seed=95`, discovered during local testing.
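For readers unfamiliar with why one seed can trip a tolerance: assuming the test uses an `assert_allclose`-style comparison, the check passes when `|actual - desired| <= atol + rtol * |desired|`. The sketch below re-implements that rule in plain Python with made-up numbers. They are not the values from `test_graphical_lassos`, and `within_tolerance` is a hypothetical helper.

```python
def within_tolerance(actual, desired, rtol=1e-7, atol=0.0):
    """Same acceptance rule as an assert_allclose-style check."""
    return abs(actual - desired) <= atol + rtol * abs(desired)


# Hypothetical values: the expected entry is exactly zero, so the relative
# term contributes nothing and only atol can absorb the numerical noise.
desired = 0.0
actual = 3e-6

print(within_tolerance(actual, desired))             # False
print(within_tolerance(actual, desired, atol=1e-5))  # True
```

When the expected value is at or near zero, `rtol` alone cannot help, which is why such fixes usually touch `atol`.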

github-actions · 2025-10-27T06:56:27Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 21ec7da. Link to the linter CI: here

virchan · 2025-10-27T08:08:04Z

I'll fix the CI later.

sklearn/preprocessing/_label.py

virchan · 2025-10-29T23:12:17Z

CI is green, including CUDA. Ready for review, @ogrisel, @OmarManzoor!

OmarManzoor

Thank you for the PR @virchan

An initial set of comments.

doc/whats_new/upcoming_changes/array-api/32582.feature.rst

sklearn/preprocessing/_label.py

OmarManzoor

A few more comments. Generally looks good.

sklearn/preprocessing/_label.py

sklearn/preprocessing/tests/test_label.py

ogrisel

Please update the existing code with array API support that relies on LabelBinarizer to check that this PR can help simplify it. I think we only have the following two occurrences, but I am not 100% sure:

scikit-learn/sklearn/linear_model/_ridge.py

Lines 1324 to 1329 in ca39ad1

    # TODO: Update this line to avoid calling `_convert_to_numpy`
    # once LabelBinarizer has been updated to accept non-NumPy array API
    # compatible inputs.
    Y = self._label_binarizer.fit_transform(
        _convert_to_numpy(y, xp_y) if y_is_array_api else y
    )

scikit-learn/sklearn/metrics/_classification.py

Lines 185 to 193 in ca39ad1

    # For classification metrics both array API compatible and non array API
    # compatible inputs are allowed for `y_true`. This is because arrays that
    # store class labels as strings cannot be represented in namespaces other
    # than Numpy. Thus to avoid unnecessary complexity, we always convert
    # `y_true` to a Numpy array so that it can be processed appropriately by
    # `LabelBinarizer` and then transfer the integer encoded output back to the
    # target namespace and device.
    if is_y_true_array_api:
        y_true = _convert_to_numpy(y_true, xp=xp_y_true)

sklearn/preprocessing/_label.py

sklearn/preprocessing/tests/test_label.py

OmarManzoor

Otherwise LGTM. Thank you @virchan

sklearn/preprocessing/_label.py

lucyleeow

Thanks for working on this! Just a few questions but looks good to me. I'm not that familiar with label binarizer though, so I would feel more comfortable if someone else looked at this...

sklearn/preprocessing/_label.py

sklearn/linear_model/_ridge.py

sklearn/covariance/tests/test_graphical_lasso.py

virchan

Apologies for the force-push.

virchan · 2025-11-14T22:22:29Z

Update: CUDA CI passed after merging #32705.

lucyleeow

One small question about the atol change, but LGTM - I think I had a good look so will just merge after 😬

virchan · 2025-11-17T08:40:15Z

I ran the CI with the [all random seeds] commit marker and found additional failures unrelated to LabelBinarizer and label_binarize.

In particular, I updated test_mcd, but there are still more failures showing up in the macOS pylatest_conda_forge_arm job. It might be better to handle those in a separate PR, so I opened #32725 to track them.

virchan added 2 commits October 26, 2025 23:24

FEA Add array API support to LabelBinarizer(sparse_output=False) fo…

d9dbad1

…r numeric labels

Merge remote-tracking branch 'upstream/main' into issues/26024/LabelB…

5a08907

…inarizer_Array_API

github-actions bot added module:covariance module:preprocessing labels Oct 27, 2025

virchan added 2 commits October 27, 2025 00:06

add changelog

078a8ef

Fix dtype for mps

7b2b972

lucyleeow added the Array API label Oct 28, 2025

OmarManzoor reviewed Oct 28, 2025

sklearn/preprocessing/_label.py

virchan added 9 commits October 28, 2025 17:27

change dtype casting

2f5818c

Merge remote-tracking branch 'upstream/main' into issues/26024/LabelB…

658f248

…inarizer_Array_API

update indexing dtype casting

6ef8e96

update integer type casting

d00382a

update _inverse_bianrize functions

4a6fc06

add integer type check to test_label_binarizer_array_api_compliance

7fe4ba8

Merge remote-tracking branch 'upstream/main' into issues/26024/LabelB…

678fae4

…inarizer_Array_API

update changelog

1065133

update test_label_binarize_array_api_compliance to fix codecov

0665124

virchan added the CUDA CI label Oct 29, 2025

github-actions bot removed the CUDA CI label Oct 29, 2025

virchan marked this pull request as ready for review October 29, 2025 23:11

ogrisel moved this to Todo in Array API Oct 30, 2025

ogrisel added this to Array API Oct 30, 2025

ogrisel moved this from Todo to In Progress in Array API Oct 30, 2025

OmarManzoor reviewed Oct 30, 2025

doc/whats_new/upcoming_changes/array-api/32582.feature.rst

sklearn/preprocessing/_label.py

sklearn/preprocessing/_label.py

OmarManzoor reviewed Oct 31, 2025

sklearn/preprocessing/_label.py

sklearn/preprocessing/_label.py

sklearn/preprocessing/tests/test_label.py

ogrisel reviewed Oct 31, 2025

sklearn/preprocessing/_label.py

sklearn/preprocessing/tests/test_label.py

update changelog

ec0eacd

virchan added 2 commits November 7, 2025 17:03

Merge remote-tracking branch 'upstream/main' into issues/26024/LabelB…

7273c21

…inarizer_Array_API

Merge branch 'main' into issues/26024/LabelBinarizer_Array_API

b4355f6

OmarManzoor approved these changes Nov 10, 2025

sklearn/preprocessing/_label.py

virchan added 2 commits November 9, 2025 22:59

fix dtype casting

710bef5

Merge remote-tracking branch 'upstream/main' into issues/26024/LabelB…

c96acc8

…inarizer_Array_API

virchan added the CUDA CI label Nov 10, 2025

github-actions bot removed the CUDA CI label Nov 10, 2025

lucyleeow reviewed Nov 12, 2025

virchan added 2 commits November 12, 2025 23:14

apply suggestions from code review

a43354c

Merge remote-tracking branch 'upstream/main' into issues/26024/LabelB…

0edd742

…inarizer_Array_API

virchan force-pushed the issues/26024/LabelBinarizer_Array_API branch from f0dd7f7 to 0edd742 Compare November 13, 2025 07:59

virchan commented Nov 13, 2025

virchan added 2 commits November 13, 2025 23:10

Merge branch 'main' into issues/26024/LabelBinarizer_Array_API

b857ceb

Merge branch 'main' into issues/26024/LabelBinarizer_Array_API

ff31ae8

virchan added the CUDA CI label Nov 14, 2025

github-actions bot removed the CUDA CI label Nov 14, 2025

lucyleeow approved these changes Nov 17, 2025

vivaannanavati123 and others added 7 commits November 16, 2025 17:38

DOC Add link to plot_gmm_pdf.py in GaussianMixture (scikit-learn#31230)

459a340

Co-authored-by: Vivaan Nanavati <[email protected]> Co-authored-by: Maren Westermann <[email protected]>

Revert "[all_random_seed]"

2fa9c93

[all_random_seed]

75db8a4

[all_random_seeds]

7e18273

[all random seeds]

b86dab9

update tol_support in test_mcd for random_seed=46 [all random s…

8a28354

…eeds]

Re-run CI

a728feb

virchan mentioned this pull request Nov 17, 2025

Random-seed-dependent test failures in macOS pylatest_conda_forge_arm job #32725

Open

Revert "Revert "[all_random_seed]""

21ec7da

This reverts commit 2fa9c93.

virchan added the CUDA CI label Nov 17, 2025

github-actions bot removed the CUDA CI label Nov 17, 2025
