@virchan
Member

@virchan virchan commented Oct 27, 2025

Reference Issues/PRs

Towards #26024 and #32422 (comment).

What does this implement/fix? Explain your changes.

This PR adds Array API support to LabelBinarizer and label_binarize for numeric labels when sparse_output=False. Because the sparse case is left unchanged, it does not conflict with #30439 (comment). Specifically:

  1. Both LabelBinarizer and label_binarize will raise a ValueError when the input y has a non-NumPy namespace and sparse_output=True.

  2. If LabelBinarizer is fitted on a sparse matrix (i.e., sparse_input_=True), calling inverse_transform on a non-NumPy array will raise a ValueError.

  3. If the classes argument contains string labels, label_binarize will automatically fall back to the NumPy namespace.
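The three rules above can be summarized in a small, NumPy-only sketch. It is not scikit-learn's implementation: binarize_sketch and check_inverse_transform_namespace are hypothetical names, the xp argument stands in for the namespace that scikit-learn would infer from y, the order of the checks is an assumption, and building the sparse matrix itself is left out.

```python
import numpy as np


def binarize_sketch(y, classes, *, xp=np, sparse_output=False):
    """Hypothetical sketch of the dispatch rules; not scikit-learn's code.

    `xp` stands in for the array namespace of `y`. Only neg_label=0 and
    pos_label=1 are handled.
    """
    # Rule 1: sparse output is only allowed for NumPy inputs.
    if sparse_output and xp is not np:
        raise ValueError("sparse_output=True is only supported for NumPy inputs.")
    if sparse_output:
        # Building the scipy.sparse matrix is out of scope for this sketch.
        raise NotImplementedError("sparse construction is omitted here")

    # Rule 3: string labels cannot be stored in non-NumPy namespaces,
    # so fall back to NumPy.
    if any(isinstance(c, str) for c in classes):
        xp = np

    y = xp.asarray(y)
    classes = xp.asarray(classes)
    # Compare every sample with every class: shape (n_samples, n_classes).
    mask = y[:, None] == classes[None, :]
    # xp.astype is the array API spelling; older NumPy only has the method.
    if hasattr(xp, "astype"):
        Y = xp.astype(mask, xp.int64)
    else:
        Y = mask.astype(np.int64)
    # With two classes, keep only the positive-class column.
    if classes.shape[0] == 2:
        Y = Y[:, 1:]
    return Y


def check_inverse_transform_namespace(sparse_input_, xp=np):
    # Rule 2: a binarizer fitted on sparse input only inverts NumPy arrays.
    if sparse_input_ and xp is not np:
        raise ValueError("inverse_transform requires NumPy input after a sparse fit.")
```

For example, binarize_sketch([1, 3, 2], classes=[1, 2, 3]) returns the integer indicator array [[1, 0, 0], [0, 0, 1], [0, 1, 0]], while passing any non-NumPy xp together with sparse_output=True raises a ValueError before any computation happens.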

Any other comments?

Adjusted the atol value in the test_graphical_lassos function because of a CI failure with `random_seed=95`, which I discovered during local testing.

@github-actions

github-actions bot commented Oct 27, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 21ec7da. Link to the linter CI: here

@virchan
Member Author

virchan commented Oct 27, 2025

I'll fix the CI later.

@github-actions github-actions bot removed the CUDA CI label Oct 29, 2025
@virchan virchan marked this pull request as ready for review October 29, 2025 23:11
@virchan
Member Author

virchan commented Oct 29, 2025

CI is green, including CUDA. Ready for review, @ogrisel, @OmarManzoor!

@ogrisel ogrisel moved this to Todo in Array API Oct 30, 2025
@ogrisel ogrisel moved this from Todo to In Progress in Array API Oct 30, 2025
Contributor

@OmarManzoor OmarManzoor left a comment

Thank you for the PR @virchan

An initial set of comments.

Contributor

@OmarManzoor OmarManzoor left a comment

A few more comments. Generally looks good.

Member

@ogrisel ogrisel left a comment

Please update the existing code with array API support that relies on LabelBinarizer to check that this PR can help simplify it. I think we only have the following two occurrences, but I am not 100% sure:

  • # TODO: Update this line to avoid calling `_convert_to_numpy`
    # once LabelBinarizer has been updated to accept non-NumPy array API
    # compatible inputs.
    Y = self._label_binarizer.fit_transform(
        _convert_to_numpy(y, xp_y) if y_is_array_api else y
    )
  • # For classification metrics both array API compatible and non array API
    # compatible inputs are allowed for `y_true`. This is because arrays that
    # store class labels as strings cannot be represented in namespaces other
    # than Numpy. Thus to avoid unnecessary complexity, we always convert
    # `y_true` to a Numpy array so that it can be processed appropriately by
    # `LabelBinarizer` and then transfer the integer encoded output back to the
    # target namespace and device.
    if is_y_true_array_api:
        y_true = _convert_to_numpy(y_true, xp=xp_y_true)
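The simplification requested here can be sketched with NumPy standing in for every namespace. In this sketch np.asarray plays the role of scikit-learn's private _convert_to_numpy helper, and one_hot, encode_via_numpy and encode_in_namespace are hypothetical names, not scikit-learn functions; the two-class single-column case is ignored. For numeric labels, binarizing directly in the input namespace should give the same result as the NumPy round trip, so the conversion could be dropped. String labels, as the second comment notes, still need the NumPy path.

```python
import numpy as np


def one_hot(y, classes, xp):
    # Hypothetical numeric-label binarizer written with array-API-style ops.
    y = xp.asarray(y)
    classes = xp.asarray(classes)
    mask = y[:, None] == classes[None, :]
    if hasattr(xp, "astype"):
        return xp.astype(mask, xp.int64)
    return mask.astype(np.int64)


def encode_via_numpy(y, classes, xp):
    # Current pattern: convert to NumPy (np.asarray stands in for
    # _convert_to_numpy), binarize there, then move the result back to xp.
    # The real code also restores the original device.
    Y_np = one_hot(np.asarray(y), np.asarray(classes), np)
    return xp.asarray(Y_np)


def encode_in_namespace(y, classes, xp):
    # Pattern this PR enables for numeric labels: stay in the caller's namespace.
    return one_hot(y, classes, xp)
```

Keeping the computation in the caller's namespace avoids a host round trip for GPU inputs, which is the main motivation for removing the TODO above.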

Contributor

@OmarManzoor OmarManzoor left a comment

Otherwise LGTM. Thank you @virchan

Member

@lucyleeow lucyleeow left a comment

Thanks for working on this! Just a few questions, but looks good to me. I'm not that familiar with LabelBinarizer, though, so I would feel more comfortable if someone else looked at this...

@virchan virchan force-pushed the issues/26024/LabelBinarizer_Array_API branch from f0dd7f7 to 0edd742 November 13, 2025 07:59
Member Author

@virchan virchan left a comment

Apologies for the force-push.

@virchan
Member Author

virchan commented Nov 14, 2025

Update: CUDA CI passed after merging #32705.

Member

@lucyleeow lucyleeow left a comment

One small question about the atol change, but LGTM - I think I had a good look so will just merge after 😬

@virchan
Member Author

virchan commented Nov 17, 2025

I ran the CI with the [all random seeds] commit marker and found additional failures unrelated to LabelBinarizer and label_binarize.

In particular, I updated test_mcd, but there are still more failures showing up in the macOS pylatest_conda_forge_arm job. It might be better to handle those in a separate PR, so I opened #32725 to track them.

Projects

Status: In Progress

5 participants