Support pandas nullable dtypes for scoring metrics #25578

tamargrey · 2023-02-09T21:11:33Z

Describe the workflow you want to enable

I would like to be able to pass data with the nullable pandas dtypes (Int64, Float64, and boolean) into sklearn metrics such as matthews_corrcoef, accuracy_score, and f1_score (and more) even if the data does not contain any nans. Currently, they result in one of several errors:

If y_true and y_pred are both nullable types: ValueError: unknown is not supported
If only one of y_true or y_pred is nullable and the other is non-nullable: ValueError: Classification metrics can't handle a mix of unknown and binary [or multiclass] targets
Some metrics such as log_loss result in a different error when y_true is nullable: ValueError: Unknown label type: (0 1

Repro with sklearn 1.2.1 and pandas 1.5.3:

    import pandas as pd
    import pytest
    from sklearn import metrics

    for dtype in ["Int64", "Float64", "boolean"]:
        # Case 1: both y_true and y_predicted use a nullable dtype
        y_true = pd.Series([1, 0, 1, 0], dtype=dtype)
        y_predicted = pd.Series([1, 0, 1, 0], dtype=dtype)
        with pytest.raises(ValueError, match="unknown is not supported"):
            metrics.accuracy_score(y_true, y_predicted)

        # Case 2: only y_true uses a nullable dtype
        y_predicted = pd.Series([1, 0, 1, 0], dtype="float64")
        with pytest.raises(
            ValueError,
            match="Classification metrics can't handle a mix of unknown and binary targets",
        ):
            metrics.accuracy_score(y_true, y_predicted)

Describe your proposed solution

Sklearn should recognize the pandas nullable dtypes as valid target types for its scoring metrics, as it already does with the non-nullable dtypes.

Describe alternatives you've considered, if relevant

As this data doesn't have null values, we can convert to the non nullable dtype prior to passing to sklearn, but doing that will make it cumbersome to build software that leverages both the latest pandas dtypes and sklearn.
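For reference, the conversion workaround described above might look like the following minimal sketch. The helper name `to_non_nullable` and the dtype mapping are hypothetical, written only for illustration; they are not part of pandas or sklearn. It assumes the data has no missing values, which is the case this issue is about.

```python
import pandas as pd

# Hypothetical mapping from pandas nullable dtypes to their NumPy counterparts.
_NULLABLE_TO_NUMPY = {"Int64": "int64", "Float64": "float64", "boolean": "bool"}


def to_non_nullable(y: pd.Series) -> pd.Series:
    """Return y with a NumPy-backed dtype if it uses Int64, Float64, or boolean."""
    target = _NULLABLE_TO_NUMPY.get(str(y.dtype))
    if target is None:
        # Not one of the three nullable dtypes handled here; leave untouched.
        return y
    if y.isna().any():
        # The NumPy int and bool dtypes cannot represent missing values.
        raise ValueError("cannot convert: series contains missing values")
    return y.astype(target)


y_true = to_non_nullable(pd.Series([1, 0, 1, 0], dtype="Int64"))
y_predicted = to_non_nullable(pd.Series([1, 0, 1, 0], dtype="boolean"))
```

The converted series can then be passed to a metric such as metrics.accuracy_score. Having to repeat this step before every metric call is the overhead this issue asks to remove.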

Additional context

No response


thomasjpfan · 2023-02-24T14:29:19Z

Thank you for opening this issue! With #25638 merged, nullable dtypes are supported and the provided snippet does not raise anymore.

tamargrey · 2023-02-24T14:52:21Z

Thank you for the quick work! Do you know if this will go out in the next patch release?

thomasjpfan · 2023-02-24T15:20:46Z

Currently, #25638 is targeted for v1.3, but I think we can consider it for a bug patch release.

WDYT @lorentzenchr @jeremiedbb ?

lorentzenchr · 2023-02-24T16:02:47Z

I'm not the release manager. If we do a further bugfix release, why not.

tamargrey added the Needs Triage and New Feature labels Feb 9, 2023

This was referenced Feb 17, 2023

Objective Functions: Remove nullable type handling when sklearn adds support alteryx/evalml#4018

Closed

Add-handling-utils alteryx/evalml#4024

Merged

thomasjpfan closed this as completed Feb 24, 2023

thomasjpfan mentioned this issue Feb 24, 2023

DOC Move allowing pandas nullable dtypes to 1.2.2 #25692

Merged

fdsteffen mentioned this issue May 17, 2023

Classifier not training in scikit-learn < 1.2.2 fractal-napari-plugins-collection/napari-feature-classifier#27

Closed

