cross_val_score handling of multiple metrics #11006

Closed
@minggli

Description

model_selection.cross_val_score explicitly blocks multiple scorers even though it calls cross_validate under the hood. In effect we have two nearly identical functions, with cross_val_score manually prevented from accepting multiple scores:

def cross_val_score(estimator, X, y=None, groups=None, scoring=None, cv=None,
                    n_jobs=1, verbose=0, fit_params=None,
                    pre_dispatch='2*n_jobs'):

    # To ensure multimetric format is not supported
    scorer = check_scoring(estimator, scoring=scoring)

    cv_results = cross_validate(estimator=estimator, X=X, y=y, groups=groups,
                                scoring={'score': scorer}, cv=cv,
                                return_train_score=False,
                                n_jobs=n_jobs, verbose=verbose,
                                fit_params=fit_params,
                                pre_dispatch=pre_dispatch)
    return cv_results['test_score']

This creates confusion and duplication of APIs. Open to discussion, but could we perhaps consider one of the following:

  1. Re-evaluate the previous discussion in [MRG + 2] ENH Allow cross_val_score, GridSearchCV et al. to evaluate on multiple metrics #7388 and make cross_val_score accept single and multiple scores, as well as a callable or None (cross_validate already handles the multi-metric case; see the sketch after this list)?
  2. Or, alternatively, make the error message more explicit when multiple scores are passed to cross_val_score.
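
For reference, cross_validate already accepts a list of scorer names and returns one test_<metric> array per entry. A minimal sketch of that existing workaround, reusing the same estimator and data as the reproduction code below:

from sklearn.model_selection import cross_validate
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
lr = LogisticRegression()

# Passing a list (or dict) of scorers is supported here; the result is a dict
# with fit_time, score_time and one test_<metric> array per scorer.
results = cross_validate(lr, X, y, scoring=['accuracy', 'f1'],
                         return_train_score=False)
print(results['test_accuracy'], results['test_f1'])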

Steps/Code to Reproduce

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
lr = LogisticRegression()

cross_val_score(lr, X, y, scoring=['accuracy', 'f1'])

Expected Results

>>> cross_val_score(lr, X, y, scoring=['accuracy', 'f1'])
{'fit_time': array([0.00580502, 0.00351   , 0.00399804]), 'score_time': array([0.00100303, 0.00081301, 0.00084615]), 'test_accuracy': array([0.93684211, 0.96842105, 0.94179894]), 'test_f1': array([0.95081967, 0.97520661, 0.9527897 ])}

Actual Results

>>> cross_val_score(lr, X, y, scoring=['accuracy', 'f1'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mingli/GitHub/personal/scikit-learn/sklearn/model_selection/_validation.py", line 349, in cross_val_score
    scorer = check_scoring(estimator, scoring=scoring)
  File "/Users/mingli/GitHub/personal/scikit-learn/sklearn/metrics/scorer.py", line 305, in check_scoring
    " None. %r was passed" % scoring)
ValueError: scoring value should either be a callable, string or None. ['accuracy', 'f1'] was passed

Versions

Darwin-17.4.0-x86_64-i386-64bit
Python 3.6.4 (default, Mar 22 2018, 13:54:22)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)]
NumPy 1.14.2
SciPy 1.0.1
Scikit-Learn 0.20.dev0
