Describe the workflow you want to enable
I would like to be able to pass data with the nullable pandas dtypes (Int64, Float64, and boolean) into sklearn metrics such as matthews_corrcoef, accuracy_score, and f1_score (and more), even when the data contains no NaNs. Currently, doing so raises one of several errors:
If y_true and y_pred are both nullable types: ValueError: unknown is not supported
If only one of y_true or y_pred is nullable and the other is non-nullable: ValueError: Classification metrics can't handle a mix of unknown and binary [or multiclass] targets
Some metrics, such as log_loss, raise a different error when y_true is nullable: ValueError: Unknown label type: (0 1
Repro with sklearn 1.2.1 and pandas 1.5.3:
import pandas as pd
import pytest
from sklearn import metrics

for dtype in ['Int64', 'Float64', 'boolean']:
    # Error if only target uses nullable types
    X = pd.DataFrame({"a": pd.Series([1, 2, 3, 4]),
                      "b": pd.Series([9, 8, 7, 6])})

    # Two nullable dtypes used
    y_true = pd.Series([1, 0, 1, 0], dtype=dtype)
    y_predicted = pd.Series([1, 0, 1, 0], dtype=dtype)
    with pytest.raises(ValueError, match="unknown is not supported"):
        metrics.accuracy_score(
            y_true,
            y_predicted,
        )

    # Only one nullable dtype used
    y_predicted = pd.Series([1, 0, 1, 0], dtype="float64")
    with pytest.raises(ValueError, match="Classification metrics can't handle a mix of unknown and binary targets"):
        metrics.accuracy_score(
            y_true,
            y_predicted,
        )
Describe your proposed solution
Sklearn should recognize the pandas nullable dtypes as valid target types for its scoring metrics, as it does with the non-nullable dtypes.
Describe alternatives you've considered, if relevant
As this data doesn't have null values, we can convert to the non-nullable dtype before passing it to sklearn, but that makes it cumbersome to build software that leverages both the latest pandas dtypes and sklearn.
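For reference, the conversion workaround could look roughly like the sketch below. The helper name to_non_nullable and its dtype mapping are my own illustration, not part of pandas or sklearn, and it only covers the three dtypes named in this issue.

```python
import pandas as pd
from sklearn import metrics

# Hypothetical mapping (an assumption for this sketch): nullable dtype name -> NumPy dtype.
_NULLABLE_TO_NUMPY = {"Int64": "int64", "Float64": "float64", "boolean": "bool"}


def to_non_nullable(y: pd.Series) -> pd.Series:
    """Return y cast to the matching NumPy dtype, if it uses a nullable dtype.

    Series with any other dtype are returned unchanged. Missing values
    cannot be represented after the cast, so they are rejected explicitly.
    """
    target = _NULLABLE_TO_NUMPY.get(str(y.dtype))
    if target is None:
        return y
    if y.isna().any():
        raise ValueError("cannot convert a nullable Series that contains missing values")
    return y.astype(target)


for dtype in ["Int64", "Float64", "boolean"]:
    y_true = pd.Series([1, 0, 1, 0], dtype=dtype)
    y_predicted = pd.Series([1, 0, 0, 0], dtype=dtype)
    score = metrics.accuracy_score(to_non_nullable(y_true), to_non_nullable(y_predicted))
    print(dtype, score)
```

Every call site that feeds targets into a metric needs this extra step, which is the overhead I would like to avoid.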
Additional context
No response