Describe the bug
When all data instances come from the same class, #27412 changed the behaviour of roc_auc_score to return 0.0 instead of raising an exception, arguing consistency with PR curves. I believe this result is incorrect, or at least not correct under all interpretations. Even if only the latter: it is not worth breaking backward compatibility for a change that is a matter of discussion, in particular when the change masks an error by returning a dubious "default".
Arguments
The issue arises when all data instances belong to the same class. While AUC is, literally, the area under the ROC curve, we interpret it as the score reflecting the quality of ranking, which is also related to the Gini index and Mann-Whitney U-statistics, as also described in sklearn documentation.
- Under the geometric interpretation, if all data comes from the same class, the curve goes either straight right or straight up, depending on the class, so the area could be argued to be 0 or 1 (or 0.5), not necessarily 0.0.
- Under the statistical interpretation, the AUC is undefined. AUC is the probability that, for a random pair of instances from different classes, the score assigned to the instance from the positive class is higher than the score assigned to the instance from the negative class. This probability cannot be computed for data from a single class, so it is undefined. The function should return `np.nan` or raise an exception (as it used to).
- Furthermore (and related to the previous point), for any `y_true` and `y_score`, it holds that

  ```
  auc(y_true, y_score) \
      == auc(1 - y_true, 1 - y_score) \
      == 1 - auc(y_true, 1 - y_score) \
      == 1 - auc(1 - y_true, y_score)
  ```

  Flipping either the labels or the scores reverses the curve and the AUC, while flipping both keeps the AUC the same. Before #27412, roc_auc_score raised an exception when the result could not be computed. Now it returns 0.0, which leads to inconsistency when flipping classes or scores (or both).
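To make the last point concrete, a small sketch (assuming scikit-learn is installed; the one-class branch's output depends on the installed version, so it is printed rather than asserted):

```python
# Demonstrates the four AUC identities on a two-class example, then
# shows why returning 0.0 for one-class input breaks them.
import numpy as np
from sklearn.metrics import roc_auc_score

# Two-class case: all four identities hold.
y_true = np.array([0, 0, 1, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7])

auc = roc_auc_score(y_true, y_score)  # 5 of 6 pos/neg pairs ranked correctly
assert np.isclose(auc, roc_auc_score(1 - y_true, 1 - y_score))
assert np.isclose(1 - auc, roc_auc_score(y_true, 1 - y_score))
assert np.isclose(1 - auc, roc_auc_score(1 - y_true, y_score))

# One-class case: if both calls return 0.0 (post-#27412), the identity
# auc(y, s) == 1 - auc(y, 1 - s) is violated.
y_one = np.ones(5, dtype=int)
s = np.array([0.8, 0.6, 0.5, 0.3, 0.2])
try:
    print(roc_auc_score(y_one, s), roc_auc_score(y_one, 1 - s))
except ValueError as exc:  # pre-#27412 behaviour
    print("raised:", exc)
```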
Suggestion
I suggest reverting the change at https://github.com/scikit-learn/scikit-learn/pull/27412/files#diff-4eb3c023f8a3f088d62208f6adbd02b6df5196de2257ccd228dffc972c964634R375, that is, raising an exception instead of returning a number that is arbitrary in some contexts. Alternatively, the function could return np.nan, but an explicit exception is preferable and, above all, preserves backward compatibility with behaviour that was not wrong.
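Regardless of which option is chosen upstream, callers that may receive degenerate inputs can guard explicitly. A minimal sketch (the wrapper name and the np.nan fallback are this sketch's choices, not sklearn API):

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def safe_roc_auc_score(y_true, y_score):
    """Return ROC AUC, or np.nan when y_true contains a single class.

    Hypothetical helper illustrating the suggested behaviour; not part
    of scikit-learn's API.
    """
    y_true = np.asarray(y_true)
    if np.unique(y_true).size < 2:
        return np.nan  # undefined: no positive/negative pairs to rank
    return roc_auc_score(y_true, y_score)


print(safe_roc_auc_score([1, 1, 1], [0.2, 0.5, 0.9]))  # nan
print(safe_roc_auc_score([0, 1, 1], [0.5, 0.7, 0.9]))  # 1.0
```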
Steps/Code to Reproduce
```python
from sklearn.metrics import roc_auc_score
import numpy as np

y_true = np.array([1, 1, 1, 1, 1])
y_score = np.array([0.8, 0.6, 0.5, 0.3, 0.2])
print(roc_auc_score(y_true, y_score))
```

Expected Results
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/janez/miniforge3/envs/o3/lib/python3.11/site-packages/sklearn/utils/_param_validation.py", line 213, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/janez/miniforge3/envs/o3/lib/python3.11/site-packages/sklearn/metrics/_ranking.py", line 640, in roc_auc_score
    return _average_binary_score(
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/janez/miniforge3/envs/o3/lib/python3.11/site-packages/sklearn/metrics/_base.py", line 76, in _average_binary_score
    return binary_metric(y_true, y_score, sample_weight=sample_weight)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/janez/miniforge3/envs/o3/lib/python3.11/site-packages/sklearn/metrics/_ranking.py", line 382, in _binary_roc_auc_score
    raise ValueError(
ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.
```

Actual Results
0.0
Versions
```
System:
    python: 3.11.10 | packaged by conda-forge | (main, Sep 10 2024, 10:57:35) [Clang 17.0.6 ]
executable: /Users/janez/miniforge3/envs/o3edge/bin/python
   machine: macOS-14.6.1-arm64-arm-64bit

Python dependencies:
      sklearn: 1.6.dev0
          pip: 24.2
   setuptools: 73.0.1
        numpy: 1.26.4
        scipy: 1.15.0.dev0
       Cython: 3.0.11
       pandas: 3.0.0.dev0+1524.g23c497bb2f
   matplotlib: 3.9.2
       joblib: 1.4.2
threadpoolctl: 3.5.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libopenblas
       filepath: /Users/janez/miniforge3/envs/o3edge/lib/python3.11/site-packages/numpy/.dylibs/libopenblas64_.0.dylib
        version: 0.3.23.dev
threading_layer: pthreads
   architecture: armv8

       user_api: openmp
   internal_api: openmp
    num_threads: 8
         prefix: libomp
       filepath: /Users/janez/miniforge3/envs/o3edge/lib/python3.11/site-packages/sklearn/.dylibs/libomp.dylib
        version: None
```