
FIX cross_validate with multimetric scoring returns the non-failed scorers results if there is a failing scorer #23101


Merged
Changes from all commits
26 commits
d49b41c
Formatting
simonandras Apr 10, 2022
82e6dc1
If all scorer fails, raise an error.
simonandras Apr 10, 2022
056b40c
Update warning message.
simonandras Apr 10, 2022
2aa8ac7
Update raised exception
simonandras Apr 10, 2022
30363eb
Making the changes of the rewiev. TODO: update the tests and add new …
simonandras Apr 10, 2022
5caba43
Update to a list.
simonandras Apr 12, 2022
e88cfde
Added formatted exception to the warnings. All the failing scorers me…
simonandras Apr 16, 2022
69c61e8
Calling the function with positional arguments instead.
simonandras Apr 16, 2022
01ccc1a
added test_multimetric_scorer_returning_exceptions_in_dictianary. TOD…
simonandras Apr 18, 2022
218f5bd
Update test_cross_validate_failing_scorer to cover the failing and no…
simonandras Apr 18, 2022
1ab52b2
Changing '2' to more explicit 'score_2' in the test_cross_validate_fa…
simonandras Apr 30, 2022
3e066c6
_MultimetricScorer now can raise an exception in __call__ or not. Thi…
simonandras May 3, 2022
b56b585
Remove not used function from the imports.
simonandras May 3, 2022
ea793c9
Changing _MultimetricScorer API: now the 'scorers' parameter is a pos…
simonandras May 11, 2022
a6e05c7
Adding changelog.
simonandras May 11, 2022
8100661
small changes to changelelog
simonandras May 11, 2022
176d406
Changing the changelog again.
simonandras May 11, 2022
84a0808
Merge branch 'main' into cross_validate__with_multiple_scorer_nan_issue
simonandras May 11, 2022
fd616b9
Changing the number in the changelogand a small change in the test_cr…
simonandras May 11, 2022
f7857ad
Correct doc.
simonandras May 11, 2022
70c59e0
Changing changelog based on rewiev.
simonandras May 21, 2022
03d33c8
Removing assertions from _validation.py and leaving comments instead.
simonandras May 21, 2022
1f3e60b
Merge remote-tracking branch 'upstream/main' into cross_validate__wit…
simonandras Jun 4, 2022
94fab7e
Merge branch 'main' into cross_validate__with_multiple_scorer_nan_issue
glemaitre Dec 28, 2022
89cf53e
DOC move entry changelog
glemaitre Dec 28, 2022
7be9ca1
Apply suggestions from code review
glemaitre Dec 28, 2022
7 changes: 7 additions & 0 deletions doc/whats_new/v1.3.rst
Original file line number Diff line number Diff line change
@@ -48,6 +48,13 @@ Changelog
:class:`ensemble.ExtraTreesClassifier` and :class:`ensemble.ExtraTreesRegressor`.
:pr:`25177` by :user:`Tim Head <betatim>`.

:mod:`sklearn.model_selection`
..............................
- |Fix| :func:`model_selection.cross_validate` with multimetric scoring: when
some scorers fail, the non-failing scorers now return proper scores instead
of `error_score` values.
:pr:`23101` by :user:`András Simon <simonandras>` and `Thomas Fan`_.

:mod:`sklearn.pipeline`
.......................
- |Feature| :class:`pipeline.FeatureUnion` can now use indexing notation (e.g.
2 changes: 1 addition & 1 deletion sklearn/inspection/_permutation_importance.py
@@ -252,7 +252,7 @@ def permutation_importance(
scorer = check_scoring(estimator, scoring=scoring)
else:
scorers_dict = _check_multimetric_scoring(estimator, scoring)
scorer = _MultimetricScorer(**scorers_dict)
scorer = _MultimetricScorer(scorers=scorers_dict)

baseline_score = _weights_scorer(scorer, estimator, X, y, sample_weight)

26 changes: 20 additions & 6 deletions sklearn/metrics/_scorer.py
@@ -21,6 +21,7 @@
from collections.abc import Iterable
from functools import partial
from collections import Counter
from traceback import format_exc

import numpy as np
import copy
@@ -91,10 +92,16 @@ class _MultimetricScorer:
----------
scorers : dict
Dictionary mapping names to callable scorers.

raise_exc : bool, default=True
Whether to raise an exception in `__call__` or not. If set to `False`,
a formatted string with the exception details is returned as the result
of the failing scorer.
"""

def __init__(self, **scorers):
def __init__(self, *, scorers, raise_exc=True):
self._scorers = scorers
self._raise_exc = raise_exc

def __call__(self, estimator, *args, **kwargs):
"""Evaluate predicted target values."""
@@ -103,11 +110,18 @@ def __call__(self, estimator, *args, **kwargs):
cached_call = partial(_cached_call, cache)

for name, scorer in self._scorers.items():
if isinstance(scorer, _BaseScorer):
score = scorer._score(cached_call, estimator, *args, **kwargs)
else:
score = scorer(estimator, *args, **kwargs)
scores[name] = score
try:
if isinstance(scorer, _BaseScorer):
score = scorer._score(cached_call, estimator, *args, **kwargs)
else:
score = scorer(estimator, *args, **kwargs)
scores[name] = score
except Exception as e:
if self._raise_exc:
raise e
else:
scores[name] = format_exc()
Review comment: For future reviewers, a string is used to pass the information back to the caller.

return scores

def _use_cache(self, estimator):
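The new control flow in `__call__` can be sketched as a standalone function. This is a simplified stand-in for the private `_MultimetricScorer`, not the actual implementation, and all names here are illustrative:

```python
from traceback import format_exc

def multimetric_score(scorers, estimator, X, y, raise_exc=True):
    """Simplified sketch of the patched _MultimetricScorer.__call__."""
    scores = {}
    for name, scorer in scorers.items():
        try:
            scores[name] = scorer(estimator, X, y)
        except Exception:
            if raise_exc:
                # Propagate immediately, matching error_score="raise".
                raise
            # Record the formatted traceback so the caller can later
            # distinguish failed scorers (strings) from real scores.
            scores[name] = format_exc()
    return scores

def good_scorer(estimator, X, y):
    return 0.5

def bad_scorer(estimator, X, y):
    raise ValueError("boom")

scores = multimetric_score(
    {"good": good_scorer, "bad": bad_scorer}, None, None, None, raise_exc=False
)
print(scores["good"])                  # 0.5
print("ValueError" in scores["bad"])   # True: a traceback string, not a score
```

The key design point mirrored here is that a failing scorer no longer aborts the whole loop: the other entries of the returned dict still hold valid scores.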
51 changes: 47 additions & 4 deletions sklearn/metrics/tests/test_score_objects.py
@@ -786,7 +786,7 @@ def test_multimetric_scorer_calls_method_once(
mock_est.classes_ = np.array([0, 1])

scorer_dict = _check_multimetric_scoring(LogisticRegression(), scorers)
multi_scorer = _MultimetricScorer(**scorer_dict)
multi_scorer = _MultimetricScorer(scorers=scorer_dict)
results = multi_scorer(mock_est, X, y)

assert set(scorers) == set(results) # compare dict keys
@@ -813,7 +813,7 @@ def predict_proba(self, X):

scorers = ["roc_auc", "neg_log_loss"]
scorer_dict = _check_multimetric_scoring(clf, scorers)
scorer = _MultimetricScorer(**scorer_dict)
scorer = _MultimetricScorer(scorers=scorer_dict)
scorer(clf, X, y)

assert predict_proba_call_cnt == 1
@@ -836,7 +836,7 @@ def predict(self, X):

scorers = {"neg_mse": "neg_mean_squared_error", "r2": "roc_auc"}
scorer_dict = _check_multimetric_scoring(clf, scorers)
scorer = _MultimetricScorer(**scorer_dict)
scorer = _MultimetricScorer(scorers=scorer_dict)
scorer(clf, X, y)

assert predict_called_cnt == 1
@@ -859,7 +859,7 @@ def test_multimetric_scorer_sanity_check():
clf.fit(X, y)

scorer_dict = _check_multimetric_scoring(clf, scorers)
multi_scorer = _MultimetricScorer(**scorer_dict)
multi_scorer = _MultimetricScorer(scorers=scorer_dict)

result = multi_scorer(clf, X, y)

@@ -873,6 +873,49 @@
assert_allclose(value, separate_scores[score_name])


@pytest.mark.parametrize("raise_exc", [True, False])
def test_multimetric_scorer_exception_handling(raise_exc):
"""Check that calling `_MultimetricScorer` returns exception messages
in the result dict for the failing scorers when `raise_exc` is `False`,
and raises the exception when `raise_exc` is `True`.
"""
scorers = {
"failing_1": "neg_mean_squared_log_error",
"non_failing": "neg_median_absolute_error",
"failing_2": "neg_mean_squared_log_error",
}

X, y = make_classification(
n_samples=50, n_features=2, n_redundant=0, random_state=0
)
y *= -1 # neg_mean_squared_log_error fails if y contains negative values

clf = DecisionTreeClassifier().fit(X, y)

scorer_dict = _check_multimetric_scoring(clf, scorers)
multi_scorer = _MultimetricScorer(scorers=scorer_dict, raise_exc=raise_exc)

error_msg = (
"Mean Squared Logarithmic Error cannot be used when targets contain"
" negative values."
)

if raise_exc:
with pytest.raises(ValueError, match=error_msg):
multi_scorer(clf, X, y)
else:
result = multi_scorer(clf, X, y)

exception_message_1 = result["failing_1"]
score = result["non_failing"]
exception_message_2 = result["failing_2"]

assert isinstance(exception_message_1, str) and error_msg in exception_message_1
assert isinstance(score, float)
assert isinstance(exception_message_2, str) and error_msg in exception_message_2


@pytest.mark.parametrize(
"scorer_name, metric",
[
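The test above leans on `neg_mean_squared_log_error` rejecting negative targets. A quick sketch of that underlying behavior, assuming a standard scikit-learn install:

```python
import numpy as np
from sklearn.metrics import mean_squared_log_error

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.1, 1.9, 3.2])

# Non-negative targets are fine and yield a small positive error.
print(mean_squared_log_error(y_true, y_pred))

# Negative targets raise the ValueError the test matches on.
try:
    mean_squared_log_error(-y_true, y_pred)
except ValueError as exc:
    print(exc)
```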
38 changes: 28 additions & 10 deletions sklearn/model_selection/_validation.py
@@ -758,27 +758,45 @@ def _score(estimator, X_test, y_test, scorer, error_score="raise"):
"""
if isinstance(scorer, dict):
# will cache method calls if needed. scorer() returns a dict
scorer = _MultimetricScorer(**scorer)
scorer = _MultimetricScorer(scorers=scorer, raise_exc=(error_score == "raise"))

try:
if y_test is None:
scores = scorer(estimator, X_test)
else:
scores = scorer(estimator, X_test, y_test)
except Exception:
if error_score == "raise":
if isinstance(scorer, _MultimetricScorer):
# If `_MultimetricScorer` raised an exception, the `error_score`
# parameter must have been set to "raise".
raise
else:
if isinstance(scorer, _MultimetricScorer):
scores = {name: error_score for name in scorer._scorers}
if error_score == "raise":
raise
else:
scores = error_score
warnings.warn(
"Scoring failed. The score on this train-test partition for "
f"these parameters will be set to {error_score}. Details: \n"
f"{format_exc()}",
UserWarning,
)
warnings.warn(
"Scoring failed. The score on this train-test partition for "
f"these parameters will be set to {error_score}. Details: \n"
f"{format_exc()}",
UserWarning,
)

# Check non-raised error messages in `_MultimetricScorer`
if isinstance(scorer, _MultimetricScorer):
exception_messages = [
(name, str_e) for name, str_e in scores.items() if isinstance(str_e, str)
]
if exception_messages:
# error_score != "raise"
for name, str_e in exception_messages:
scores[name] = error_score
warnings.warn(
"Scoring failed. The score on this train-test partition for "
f"these parameters will be set to {error_score}. Details: \n"
f"{str_e}",
UserWarning,
)

error_msg = "scoring must return a number, got %s (%s) instead. (scorer=%s)"
if isinstance(scores, dict):
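The caller-side handling (replacing traceback strings with `error_score` and warning) can be sketched in isolation. `resolve_failed_scores` is a hypothetical helper written for illustration, not part of scikit-learn's `_score`:

```python
import math
import warnings

def resolve_failed_scores(scores, error_score):
    """Replace traceback strings left by failed scorers with error_score."""
    for name, value in list(scores.items()):
        if isinstance(value, str):
            # A string entry is a formatted traceback from a failed scorer.
            warnings.warn(
                f"Scoring failed for {name!r}; its score will be set to "
                f"{error_score}. Details:\n{value}",
                UserWarning,
            )
            scores[name] = error_score
    return scores

raw = {"acc": 0.93, "msle": "Traceback (most recent call last): ..."}
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)
    resolved = resolve_failed_scores(raw, error_score=float("nan"))

print(resolved["acc"])               # 0.93
print(math.isnan(resolved["msle"]))  # True
```

Using a string as the failure sentinel works here because a real score is always numeric, which is exactly the convention the review comment in `_scorer.py` documents.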
25 changes: 19 additions & 6 deletions sklearn/model_selection/tests/test_validation.py
@@ -2232,15 +2232,22 @@ def test_cross_val_score_failing_scorer(error_score):
def test_cross_validate_failing_scorer(
error_score, return_train_score, with_multimetric
):
# check that an estimator can fail during scoring in `cross_validate` and
# that we can optionally replaced it with `error_score`
# Check that an estimator can fail during scoring in `cross_validate` and
# that we can optionally replace it with `error_score`. In the multimetric
# case also check the result of a non-failing scorer where the other scorers
# are failing.
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=5).fit(X, y)

error_msg = "This scorer is supposed to fail!!!"
failing_scorer = partial(_failing_scorer, error_msg=error_msg)
if with_multimetric:
scoring = {"score_1": failing_scorer, "score_2": failing_scorer}
non_failing_scorer = make_scorer(mean_squared_error)
scoring = {
"score_1": failing_scorer,
"score_2": non_failing_scorer,
"score_3": failing_scorer,
}
else:
scoring = failing_scorer

@@ -2272,9 +2279,15 @@ def test_cross_validate_failing_scorer(
)
for key in results:
if "_score" in key:
# check the test (and optionally train score) for all
# scorers that should be assigned to `error_score`.
assert_allclose(results[key], error_score)
if "_score_2" in key:
# check the test (and optionally train) score for the
# scorer that should be non-failing
for i in results[key]:
assert isinstance(i, float)
else:
# check the test (and optionally train) score for all
# scorers that should be assigned to `error_score`.
assert_allclose(results[key], error_score)


def three_params_scorer(i, j, k):