[MRG + 2] ENH Allow cross_val_score, GridSearchCV et al. to evaluate on multiple metrics #7388
@@ -84,6 +84,10 @@ evaluated and the best combination is retained.
dataset. This is the best practice for evaluating the performance of a
model with grid search.

- See :ref:`sphx_glr_auto_examples_model_selection_plot_multi_metric_evaluation`
  for an example of :class:`GridSearchCV` being used to evaluate multiple
  metrics simultaneously.

.. _randomized_parameter_search:

Randomized Parameter Optimization
@@ -161,6 +165,27 @@ scoring function can be specified via the ``scoring`` parameter to
specialized cross-validation tools described below.
See :ref:`scoring_parameter` for more details.

.. _multimetric_grid_search:

Specifying multiple metrics for evaluation
------------------------------------------

``GridSearchCV`` and ``RandomizedSearchCV`` allow specifying multiple metrics
for the ``scoring`` parameter.

Multimetric scoring can be specified either as a list of predefined score
name strings or as a dict mapping scorer names to scorer callables and/or
predefined score names. See :ref:`multimetric_scoring` for more details.
Reviewer comment: I think it's appropriate to mention the ...
When specifying multiple metrics, the ``refit`` parameter must be set to the
metric (string) for which the ``best_params_`` will be found and used to build
the ``best_estimator_`` on the whole dataset. If the search should not be
refit, set ``refit=False``. Leaving ``refit`` at its default value ``None``
will result in an error when using multiple metrics.

See :ref:`sphx_glr_auto_examples_model_selection_plot_multi_metric_evaluation`
for an example usage.
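To make the interaction between multiple metrics and ``refit`` concrete, here
is a minimal sketch (the dataset, estimator and parameter grid are
illustrative assumptions, not part of this change)::

    >>> from sklearn.datasets import make_classification
    >>> from sklearn.model_selection import GridSearchCV
    >>> from sklearn.tree import DecisionTreeClassifier
    >>> X, y = make_classification(random_state=0)
    >>> # Both metrics are computed for every candidate, but the final refit
    >>> # (and hence best_params_ / best_estimator_) follows 'accuracy' only.
    >>> gs = GridSearchCV(DecisionTreeClassifier(random_state=0),
    ...                   param_grid={'max_depth': [2, 4, 8]},
    ...                   scoring=['accuracy', 'roc_auc'], refit='accuracy')
    >>> gs = gs.fit(X, y)
    >>> sorted(k for k in gs.cv_results_ if k.startswith('mean_test_'))
    ['mean_test_accuracy', 'mean_test_roc_auc']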
Composite estimators and parameter spaces
-----------------------------------------
@@ -210,6 +210,51 @@ the following two rules:
Again, by convention higher numbers are better, so if your scorer
returns loss, that value should be negated.

.. _multimetric_scoring:
Reviewer comment: I don't understand why this is here.
Reply: I've referenced it in ...
Using multiple metric evaluation
--------------------------------

Scikit-learn also permits evaluation of multiple metrics in ``GridSearchCV``,
``RandomizedSearchCV`` and ``cross_validate``.

There are two ways to specify multiple scoring metrics for the ``scoring``
parameter:

- As an iterable of string metrics (a ``cross_validate`` sketch using this
  form appears at the end of this section)::

      >>> scoring = ['accuracy', 'precision']

- As a ``dict`` mapping the scorer name to the scoring function::

      >>> from sklearn.metrics import accuracy_score
      >>> from sklearn.metrics import make_scorer
      >>> scoring = {'accuracy': make_scorer(accuracy_score),
      ...            'prec': 'precision'}

Note that the dict values can either be scorer functions or one of the
predefined metric strings.

Currently only those scorer functions that return a single score can be passed
inside the dict. Scorer functions that return multiple values are not
permitted and will require a wrapper to return a single metric::
    >>> from sklearn import datasets
    >>> from sklearn.svm import LinearSVC
    >>> from sklearn.model_selection import cross_validate
    >>> from sklearn.metrics import confusion_matrix
    >>> # A sample toy binary classification dataset
    >>> X, y = datasets.make_classification(n_classes=2, random_state=0)
    >>> svm = LinearSVC(random_state=0)
    >>> # confusion_matrix returns [[tn, fp], [fn, tp]] for binary problems
    >>> tp = lambda y_true, y_pred: confusion_matrix(y_true, y_pred)[1, 1]

Reviewer comment: Is it a bad idea to recommend lambda when it's not able to be
pickled (dill notwithstanding)?

    >>> tn = lambda y_true, y_pred: confusion_matrix(y_true, y_pred)[0, 0]
    >>> fp = lambda y_true, y_pred: confusion_matrix(y_true, y_pred)[0, 1]
    >>> fn = lambda y_true, y_pred: confusion_matrix(y_true, y_pred)[1, 0]
    >>> scoring = {'tp': make_scorer(tp), 'tn': make_scorer(tn),
    ...            'fp': make_scorer(fp), 'fn': make_scorer(fn)}
    >>> cv_results = cross_validate(svm.fit(X, y), X, y, scoring=scoring)
    >>> # Getting the test set true positive scores

Reviewer comment: Next line says tp, not fp.

    >>> print(cv_results['test_tp'])  # doctest: +NORMALIZE_WHITESPACE
    [12 13 15]
    >>> # Getting the test set false negative scores
    >>> print(cv_results['test_fn'])  # doctest: +NORMALIZE_WHITESPACE
    [5 4 1]
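The list form shown in the first bullet above works the same way with
``cross_validate``; a minimal sketch, reusing the toy dataset and estimator
from the snippet above::

    >>> scores = cross_validate(LinearSVC(random_state=0), X, y,
    ...                         scoring=['accuracy', 'precision'])
    >>> # One 'test_<name>' entry per requested metric, plus fit/score times
    >>> sorted(k for k in scores if k.startswith('test_'))
    ['test_accuracy', 'test_precision']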
.. _classification_metrics:
@@ -0,0 +1,94 @@
"""Demonstration of multi-metric evaluation on cross_val_score and GridSearchCV

Reviewer comment: @GaelVaroquaux is probably right in suggesting we can reduce
the number of examples, and instead demonstrate features successively when
needed. Can we roll this feature into existing grid search examples?
Reply: And we really don't need to illustrate each quirk of the feature,
e.g. different ways to specify multiple metrics.
Reply: What do you think about keeping this example and removing this one
instead? The ...
Reply: Why not change the existing example? It will break fewer links.
Reply: Like discussed IRL, I feel it's better to have both... (at least for now)

Multiple metric parameter search can be done by setting the ``scoring``
parameter to a list of metric scorer names or a dict mapping the scorer names
to the scorer callables.

The scores of all the scorers are available in the ``cv_results_`` dict at keys
ending in ``'_<scorer_name>'`` (``'mean_test_precision'``,
``'rank_test_precision'``, etc.).

The ``best_estimator_``, ``best_index_``, ``best_score_`` and ``best_params_``
correspond to the scorer (key) that is set to the ``refit`` attribute.

Reviewer comment: When using multiple metrics, you need to specify the ...
Reply: @amueller That is the current case, except ...
Also cc: @jnothman @GaelVaroquaux
Reply: Ah, you've already answered this at #7388 (comment). I read this one
before.
Reply: Yes, but I just want to be very explicit about the current behavior.
This is something that people will definitely run into, so just tell them
exactly what they need to do.
Reply: I think @amueller is suggesting you use this kind of instructive wording
in the narrative docs. Perhaps just adopt his wording?

"""
# Author: Raghav RV <[email protected]>
# License: BSD

import numpy as np
from matplotlib import pyplot as plt

from sklearn.datasets import make_hastie_10_2
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

print(__doc__)

###############################################################################
# Running ``GridSearchCV`` using multiple evaluation metrics
# ----------------------------------------------------------
#
X, y = make_hastie_10_2(n_samples=8000, random_state=42)

# The scorers can either be one of the predefined metric strings or a scorer
# callable, like the one returned by make_scorer
scoring = {'AUC': 'roc_auc', 'Accuracy': make_scorer(accuracy_score)}

# Setting refit='AUC' refits an estimator on the whole dataset with the
# parameter setting that has the best cross-validated AUC score.
# That estimator is made available at ``gs.best_estimator_`` along with
# attributes like ``gs.best_score_``, ``gs.best_params_`` and
# ``gs.best_index_``
gs = GridSearchCV(DecisionTreeClassifier(random_state=42),
                  param_grid={'min_samples_split': range(2, 403, 10)},
                  scoring=scoring, cv=5, refit='AUC')
gs.fit(X, y)
results = gs.cv_results_
###############################################################################
# Plotting the result
# -------------------

plt.figure(figsize=(13, 13))
plt.title("GridSearchCV evaluating using multiple scorers simultaneously",
          fontsize=16)

plt.xlabel("min_samples_split")
plt.ylabel("Score")
plt.grid()

ax = plt.axes()
ax.set_xlim(0, 402)
ax.set_ylim(0.73, 1)

# Get the regular numpy array from the MaskedArray
X_axis = np.array(results['param_min_samples_split'].data, dtype=float)

for scorer, color in zip(sorted(scoring), ['g', 'k']):
    for sample, style in (('train', '--'), ('test', '-')):
        sample_score_mean = results['mean_%s_%s' % (sample, scorer)]
        sample_score_std = results['std_%s_%s' % (sample, scorer)]
        ax.fill_between(X_axis, sample_score_mean - sample_score_std,
                        sample_score_mean + sample_score_std,
                        alpha=0.1 if sample == 'test' else 0, color=color)
        ax.plot(X_axis, sample_score_mean, style, color=color,
                alpha=1 if sample == 'test' else 0.7,
                label="%s (%s)" % (scorer, sample))

    best_index = np.nonzero(results['rank_test_%s' % scorer] == 1)[0][0]
    best_score = results['mean_test_%s' % scorer][best_index]

    # Plot a dotted vertical line at the best score for that scorer marked by x
    ax.plot([X_axis[best_index], ] * 2, [0, best_score],
            linestyle='-.', color=color, marker='x', markeredgewidth=3, ms=8)

    # Annotate the best score for that scorer
    ax.annotate("%0.2f" % best_score,
                (X_axis[best_index], best_score + 0.005))

plt.legend(loc="best")
plt.grid(False)
plt.show()
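A brief inspection sketch, not part of the committed example; it assumes the
``gs`` and ``results`` objects created above:

# Sketch only: inspect the per-scorer entries and the refit-related attributes.
print([key for key in results if key.endswith('_AUC')])
# expected to include entries such as 'mean_test_AUC', 'std_test_AUC' and
# 'rank_test_AUC' (plus the corresponding train-score keys, if computed)

# Because refit='AUC', the best_* attributes refer to the AUC scorer only.
print(gs.best_score_)   # best mean cross-validated AUC
print(gs.best_params_)  # the parameter setting that achieved it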
Reviewer comment: Am I missing something? What's changed?
Reply: The clf now is different as we modified it above.