AUC of the ROC is based on class labels (predict()) instead of scores (decision_function() or predict_proba()) during call to cross_validate #28234


Closed
NoPenguinsLand opened this issue Jan 23, 2024 · 1 comment
Labels: Bug, Needs Triage

Comments

@NoPenguinsLand
Contributor

NoPenguinsLand commented Jan 23, 2024

Describe the bug

Related to #27977

This also applies to the pr_auc metric.

When defining multi-metric scoring as a dictionary and passing it to cross_validate():

scoring = {
    "accuracy": make_scorer(metrics.accuracy_score),
    "sensitivity": make_scorer(metrics.recall_score),
    "specificity": make_scorer(metrics.recall_score, pos_label=0),
    "f1": make_scorer(metrics.f1_score),
    "roc_auc": make_scorer(metrics.roc_auc_score),
    "pr_auc": make_scorer(metrics.average_precision_score),
    "precision": make_scorer(metrics.precision_score),
}

the roc_auc score is computed from class labels (predict()) rather than from scores (decision_function() or predict_proba()).
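
To make the difference concrete, here is a minimal sketch (assuming a toy make_classification dataset and a LogisticRegression model, not the actual pipeline from this report) comparing the AUC computed from predict() labels against the AUC computed from decision_function() scores:

# Minimal sketch (not from the original report): ROC AUC from hard labels
# vs. from continuous scores on the same fitted model.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X, y = make_classification(random_state=0)
model = LogisticRegression().fit(X, y)

# AUC computed from predict() labels (what the label-based scorer effectively does).
auc_from_labels = roc_auc_score(y, model.predict(X))
# AUC computed from decision_function() scores (what the 'roc_auc' scorer does).
auc_from_scores = roc_auc_score(y, model.decision_function(X))

print(auc_from_labels, auc_from_scores)  # the two values typically differ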

Trying to set response_method in make_scorer doesn't work:

scoring = {
    "accuracy": make_scorer(metrics.accuracy_score),
    "sensitivity": make_scorer(metrics.recall_score),
    "specificity": make_scorer(metrics.recall_score, pos_label=0),
    "f1": make_scorer(metrics.f1_score),
    "roc_auc": make_scorer(metrics.roc_auc_score, response_method="decision_function"),
    "pr_auc": make_scorer(metrics.average_precision_score, response_method="decision_function"),
    "precision": make_scorer(metrics.precision_score),
}

because the roc_auc scorer is still a _PredictScorer object.

Passing roc_auc as a string does work, though:

scoring = {
    "accuracy": make_scorer(metrics.accuracy_score),
    "sensitivity": make_scorer(metrics.recall_score),
    "specificity": make_scorer(metrics.recall_score, pos_label=0),
    "f1": make_scorer(metrics.f1_score),
    "roc_auc": 'roc_auc',
    "pr_auc": make_scorer(metrics.average_precision_score),
    "precision": make_scorer(metrics.precision_score),
}

The following code also works:

roc_auc_score_dec_fnc = _ThresholdScorer(metrics.roc_auc_score, 1, {})
pr_auc_score_dec_fnc = _ThresholdScorer(metrics.average_precision_score, 1, {})

scoring = {
    "accuracy": make_scorer(metrics.accuracy_score),
    "sensitivity": make_scorer(metrics.recall_score),
    "specificity": make_scorer(metrics.recall_score, pos_label=0),
    "f1": make_scorer(metrics.f1_score),
    "roc_auc": roc_auc_score_dec_fnc ,
    "pr_auc": pr_auc_score_dec_fnc ,
    "precision": make_scorer(metrics.precision_score),
}

but _ThresholdScorer is an undocumented class.

I think make_scorer() should return a _ThresholdScorer or _ProbaScorer object instead of a _PredictScorer object when the response_method parameter is set. This bug is very easy to miss, and most users would probably expect make_scorer(metrics.roc_auc_score) and the string 'roc_auc' to behave the same way.
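
To illustrate that they do not behave the same way, here is a minimal sketch (assuming scikit-learn 1.3.x and a toy LogisticRegression model, not the actual data from this report) comparing the built-in 'roc_auc' scorer obtained via get_scorer with the make_scorer(metrics.roc_auc_score) variant:

# Minimal sketch (not from the original report), scikit-learn 1.3.x:
# the string 'roc_auc' and make_scorer(roc_auc_score) score the same model differently.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer, make_scorer, roc_auc_score

X, y = make_classification(random_state=0)
model = LogisticRegression().fit(X, y)

string_scorer = get_scorer("roc_auc")       # threshold-based scorer
label_scorer = make_scorer(roc_auc_score)   # predict()-based scorer

print(string_scorer(model, X, y))  # AUC from decision_function() scores
print(label_scorer(model, X, y))   # AUC from predict() labels, usually lower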

Update: setting the needs_threshold parameter in make_scorer() works properly.

roc_auc_score_dec_fnc = make_scorer(metrics.roc_auc_score, needs_threshold=True)
pr_auc_score_dec_fnc = make_scorer(metrics.average_precision_score, needs_threshold=True)

scoring = {
    "accuracy": make_scorer(metrics.accuracy_score),
    "sensitivity": make_scorer(metrics.recall_score),
    "specificity": make_scorer(metrics.recall_score, pos_label=0),
    "f1": make_scorer(metrics.f1_score),
    "roc_auc": roc_auc_score_dec_fnc,
    "pr_auc": pr_auc_score_dec_fnc,
    "precision": make_scorer(metrics.precision_score),
}
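
A quick check (a sketch against scikit-learn 1.3.x, not part of the original run) confirms that with needs_threshold=True the returned object is a _ThresholdScorer rather than a _PredictScorer:

# Sketch (not from the original report), scikit-learn 1.3.x: verify that
# needs_threshold=True yields a scorer that uses decision_function()/predict_proba().
from sklearn import metrics
from sklearn.metrics import make_scorer
from sklearn.metrics._scorer import _PredictScorer, _ThresholdScorer

roc_auc_score_dec_fnc = make_scorer(metrics.roc_auc_score, needs_threshold=True)

print(isinstance(roc_auc_score_dec_fnc, _ThresholdScorer))  # True
print(isinstance(roc_auc_score_dec_fnc, _PredictScorer))    # False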

Steps/Code to Reproduce

from sklearn import metrics
from sklearn.metrics import make_scorer
from sklearn.metrics._scorer import _MultimetricScorer
from sklearn.metrics._scorer import _PredictScorer
from sklearn.metrics._scorer import _ThresholdScorer

error_score = "raise"

scoring = {
    "accuracy": make_scorer(metrics.accuracy_score),
    "sensitivity": make_scorer(metrics.recall_score),
    "specificity": make_scorer(metrics.recall_score, pos_label=0),
    "f1": make_scorer(metrics.f1_score),
    "roc_auc": make_scorer(metrics.roc_auc_score),
    "pr_auc": make_scorer(metrics.average_precision_score),
    "precision": make_scorer(metrics.precision_score),
}

scorer = _MultimetricScorer(scorers=scoring, raise_exc=(error_score == "raise"))

# Should print True
print(isinstance(scorer._scorers['roc_auc'], _PredictScorer))
# Should print False
print(isinstance(scorer._scorers['roc_auc'], _ThresholdScorer))

Expected Results

n/a

Actual Results

n/a

Versions

System:
    python: 3.11.6 | packaged by conda-forge | (main, Oct  3 2023, 10:29:11) [MSC v.1935 64 bit (AMD64)]
executable: C:\Users\mning\AppData\Local\miniforge3\envs\mne\python.exe
   machine: Windows-10-10.0.19045-SP0

Python dependencies:
      sklearn: 1.3.2
          pip: 23.3.1
   setuptools: 68.2.2
        numpy: 1.26.0
        scipy: 1.11.3
       Cython: None
       pandas: 2.1.3
   matplotlib: 3.8.1
       joblib: 1.3.2
threadpoolctl: 3.2.0

Built with OpenMP: True

threadpoolctl info:
       user_api: openmp
   internal_api: openmp
    num_threads: 8
         prefix: libomp
       filepath: C:\Users\mning\AppData\Local\miniforge3\envs\mne\Library\bin\libomp.dll
        version: None

       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libblas
       filepath: C:\Users\mning\AppData\Local\miniforge3\envs\mne\Library\bin\libblas.dll
        version: 0.3.24
threading_layer: pthreads
   architecture: Haswell

       user_api: openmp
   internal_api: openmp
    num_threads: 8
         prefix: vcomp
       filepath: C:\Users\mning\AppData\Local\miniforge3\envs\mne\vcomp140.dll
        version: None
@NoPenguinsLand added the Bug and Needs Triage labels on Jan 23, 2024
@glemaitre
Member

response_method is new in 1.4, so you need to upgrade scikit-learn. In 1.4, just to show that it works:

In [1]: from sklearn.metrics import make_scorer, roc_auc_score
In [2]: from sklearn.tree import DecisionTreeClassifier
In [3]: from sklearn.datasets import make_classification
In [4]: X, y = make_classification()
In [5]: tree = DecisionTreeClassifier().fit(X, y)
In [6]: scorer = make_scorer(roc_auc_score, response_method="decision_function")
In [7]: scorer(tree, X, y)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[7], line 1
----> 1 scorer(tree, X, y)

File ~/Documents/packages/scikit-learn/sklearn/metrics/_scorer.py:253, in _BaseScorer.__call__(self, estimator, X, y_true, sample_weight, **kwargs)
    250 if sample_weight is not None:
    251     _kwargs["sample_weight"] = sample_weight
--> 253 return self._score(partial(_cached_call, None), estimator, X, y_true, **_kwargs)

File ~/Documents/packages/scikit-learn/sklearn/metrics/_scorer.py:344, in _Scorer._score(self, method_caller, estimator, X, y_true, **kwargs)
    334 self._warn_overlap(
    335     message=(
    336         "There is an overlap between set kwargs of this scorer instance and"
   (...)
    340     kwargs=kwargs,
    341 )
    343 pos_label = None if is_regressor(estimator) else self._get_pos_label()
--> 344 response_method = _check_response_method(estimator, self._response_method)
    345 y_pred = method_caller(
    346     estimator, response_method.__name__, X, pos_label=pos_label
    347 )
    349 scoring_kwargs = {**self._kwargs, **kwargs}

File ~/Documents/packages/scikit-learn/sklearn/utils/validation.py:2022, in _check_response_method(estimator, response_method)
   2020 prediction_method = reduce(lambda x, y: x or y, prediction_method)
   2021 if prediction_method is None:
-> 2022     raise AttributeError(
   2023         f"{estimator.__class__.__name__} has none of the following attributes: "
   2024         f"{', '.join(list_methods)}."
   2025     )
   2027 return prediction_method

AttributeError: DecisionTreeClassifier has none of the following attributes: decision_function.

This is indeed correct because DecisionTreeClassifier does not implement decision_function. Let's try LogisticRegression, which should work:

In [13]: model = LogisticRegression().fit(X, y)
In [14]: scorer(model, X, y)
Out[14]: 0.9691876750700279
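
As a side note (a sketch under the assumption of scikit-learn 1.4 and a toy dataset, not part of the original comment), an estimator without decision_function, such as DecisionTreeClassifier, can still be scored by requesting predict_proba; if I recall correctly, response_method also accepts a list of method names that are tried in order:

# Sketch (not from the original comment), scikit-learn 1.4: score a
# DecisionTreeClassifier, which has no decision_function, via predict_proba.
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer, roc_auc_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(random_state=0)
tree = DecisionTreeClassifier().fit(X, y)

proba_scorer = make_scorer(roc_auc_score, response_method="predict_proba")
print(proba_scorer(tree, X, y))

# If I recall correctly, 1.4 also accepts a list of methods tried in order,
# which covers estimators with either decision_function or predict_proba:
# make_scorer(roc_auc_score, response_method=["decision_function", "predict_proba"])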

In 1.4, all scorers are now _Scorer instances:

In [15]: type(scorer)
Out[15]: sklearn.metrics._scorer._Scorer

We did a major refactoring that allowed us to remove those internal scorer classes.
