[MRG + 2] ENH Allow cross_val_score, GridSearchCV et al. to evaluate on multiple metrics #7388


Merged
150 commits, merged on Jul 7, 2017

Commits
d6d1000
ENH cross_val_score now supports multiple metrics
raghavrv Sep 28, 2016
2e52d9b
DOCFIX permutation_test_score
raghavrv Sep 29, 2016
4e7845a
ENH validate multiple metric scorers
raghavrv Sep 29, 2016
47e282f
ENH Move validation of multimetric scoring param out
raghavrv Sep 29, 2016
823a079
ENH GridSearchCV and RandomizedSearchCV now support multiple metrics
raghavrv Sep 30, 2016
55c0743
EXA Add an example demonstrating the multiple metric in GridSearchCV
raghavrv Sep 30, 2016
8e4dd35
ENH Let check_multimetric_scoring tell if its multimetric or not
raghavrv Sep 30, 2016
a4ce716
FIX For single metric name of scorer should remain 'score'
raghavrv Oct 1, 2016
dfcd15e
ENH validation_curve and learning_curve now support multiple metrics
raghavrv Oct 1, 2016
729c262
MNT move _aggregate_score_dicts helper into _validation.py
raghavrv Oct 1, 2016
9b71cfe
TST More testing/ Fixing scores to the correct values
raghavrv Oct 1, 2016
41c1b02
EXA Add cross_val_score to multimetric example
raghavrv Oct 1, 2016
92e031b
Rename to multiple_metric_evaluation.py
raghavrv Oct 1, 2016
10043a7
MNT Remove scaffolding
raghavrv Oct 1, 2016
752adb6
FIX doctest imports
raghavrv Oct 2, 2016
929158f
FIX wrap the scorer and unwrap the score when using _score() in rfe
raghavrv Oct 2, 2016
ac54beb
TST Cleanup the tests. Test for is_multimetric too
raghavrv Oct 2, 2016
137c788
TST Make sure it registers as single metric when scoring is of that type
raghavrv Oct 2, 2016
dcae56d
PEP8
raghavrv Oct 2, 2016
9b0b5ef
Don't use dict comprehension to make it work in python2.6
raghavrv Oct 2, 2016
20fe3ad
ENH/FIX/TST grid_scores_ should not be available for multimetric eval…
raghavrv Oct 2, 2016
5960b40
FIX+TST delegated methods NA when multimetric is enabled...
raghavrv Oct 3, 2016
dc3ee2c
ENH add option to disable delegation on multimetric scoring
raghavrv Oct 3, 2016
88004b9
Remove old function from __all__
raghavrv Oct 3, 2016
590b49b
flake8
raghavrv Oct 3, 2016
ef0fe7d
FIX revert disable_on_multimetric
raghavrv Nov 18, 2016
5951e94
stash
raghavrv Dec 6, 2016
8815b80
Fix incorrect rebase
raghavrv Dec 8, 2016
117de6c
[ci skip]
raghavrv Dec 9, 2016
728d004
Make sure refit works as expected and remove irrelevant tests
raghavrv Dec 9, 2016
3bac166
Allow passing standard scorers by name in multimetric scorers
raghavrv Dec 9, 2016
8d0c1e5
Fix example
raghavrv Dec 9, 2016
91053cd
flake8
raghavrv Dec 9, 2016
5611a08
Address reviews
raghavrv Dec 9, 2016
e797c08
Fix indentation
raghavrv Dec 26, 2016
9501be1
Ensure {'acc': 'accuracy'} and ['precision'] are valid inputs
raghavrv Dec 26, 2016
1e84296
Test that for single metric, 'score' is a key
raghavrv Dec 26, 2016
2551155
Fix incorrect rebase
raghavrv Dec 27, 2016
fd36391
Typos
raghavrv Dec 26, 2016
24ef398
Compare multimetric grid search with multiple single metric searches
raghavrv Dec 26, 2016
c16ac88
Test X, y list and pandas input; Test multimetric for unsupervised gr…
raghavrv Dec 27, 2016
c7094c4
Fix tests; Unsupervised multimetric gs will not pass until #8117 is m…
raghavrv Dec 27, 2016
51cab0d
Make a plot of Precision vs ROC AUC for RandomForest varying the n_es…
raghavrv Jan 5, 2017
21a6304
Add example to grid_search.rst
raghavrv Jan 5, 2017
7143ad5
Use the classic tuning of C param in SVM instead of estimators in RF
raghavrv Jan 16, 2017
44659a1
FIX Remove scoring arg in deafult scorer test
raghavrv Jan 16, 2017
a6d0060
flake8
raghavrv Jan 16, 2017
e64041c
Search for min_samples_split in DTC; Also show f-score
raghavrv Jan 17, 2017
21b6fcb
REVIEW Make check_multimetric_scoring private
raghavrv Jan 17, 2017
9e57644
FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed
raghavrv Jan 17, 2017
fbf0527
REVIEW Plot best score; Shorten legends
raghavrv Jan 18, 2017
2b666bd
REVIEW/COSMIT multimetric --> multi-metric
raghavrv Jan 18, 2017
c74d5f3
REVIEW Mark the best scores of P/R scores too
raghavrv Jan 18, 2017
c521b63
Revert "FIX Add more samples to see if 3% mismatch on 32 bit systems …
raghavrv Jan 18, 2017
a6727aa
ENH Use looping for iid testing
raghavrv Jan 18, 2017
feed649
FIX use param grid as scipy's stats dist in 0.12 do not accept seed
raghavrv Jan 18, 2017
528d36c
ENH more looping less code; Use small non-noisy dataset
raghavrv Jan 18, 2017
cf3faa4
FIX Use named arg after expanded args
raghavrv Jan 18, 2017
27f025f
TST More testing of the refit parameter
raghavrv Jan 19, 2017
095f3cf
COSMIT multimetric --> multi-metric
raghavrv Jan 24, 2017
074cbcf
REV Correct example doc
raghavrv Jan 24, 2017
e853017
COSMIT
raghavrv Jan 24, 2017
faa1fd0
REVIEW Make tests stronger; Fix bugs in _check_multimetric_scorer
raghavrv Jan 26, 2017
946f41c
REVIEW refit param: Raise for empty strings
raghavrv Jan 26, 2017
5696627
TST Invalid refit params
raghavrv Jan 26, 2017
4819968
REVIEW Use <scorer_name> alone; recall --> Recall
raghavrv Jan 26, 2017
7627e34
REV specify when we expect scorers to not be None
raghavrv Jan 26, 2017
a52f9cc
FLAKE8
raghavrv Jan 26, 2017
afb2a34
REVERT multimetrics in learning_curve and validation_curve
raghavrv Jan 27, 2017
36dfd1a
REVIEW Simpler coding style
raghavrv Jan 27, 2017
afe2bf8
COSMIT
raghavrv Jan 27, 2017
ef88554
REV Compress example a bit. Move comment to top
raghavrv Jan 27, 2017
5986e18
COSMIT
raghavrv Jan 27, 2017
aa6fa8c
FIX fit_grid_point's previous API must be preserved
raghavrv Jan 27, 2017
5e424e7
Flake8
raghavrv Jan 28, 2017
c431ce8
TST Use loop; Compare with single-metric
raghavrv Jan 30, 2017
6f0396e
REVIEW Use dict-comprehension instead of helper
raghavrv Jan 30, 2017
e9c71bf
REVIEW Remove redundant test
raghavrv Jan 30, 2017
ee23970
Fix tests incorrect braces
raghavrv Jan 30, 2017
e457bdb
COSMIT
raghavrv Jan 30, 2017
b4e0213
REVIEW Use regexp
raghavrv Jan 30, 2017
c9a7da4
REV Simplify aggregation of score dicts
raghavrv Jan 30, 2017
a7d865f
FIX precision and accuracy test
raghavrv Jan 30, 2017
277f0a0
FIX doctest and flake8
raghavrv Jan 30, 2017
9eeacfc
TST the best_* attributes multimetric with single metric
raghavrv Jan 30, 2017
428121c
Address @jnothman's review
raghavrv Feb 13, 2017
dd4ac3a
Address more comments \o/
raghavrv Feb 13, 2017
6bc8726
DOCFIXES
raghavrv Feb 13, 2017
0134338
Fix use the validated fit_param from fit's arguments
raghavrv Feb 13, 2017
5db9cea
Revert alpha to a lower value as before
raghavrv Feb 19, 2017
1d44f9e
Using def instead of lambda
raghavrv Feb 24, 2017
2372fbb
Address @jnothman's review batch 1: Fix tests / Doc fixes
raghavrv Mar 7, 2017
8c02a66
Remove superfluous tests
raghavrv Mar 7, 2017
9eae2d6
Remove more superfluous testing
raghavrv Mar 7, 2017
da932ec
TST/FIX loop over refit and check found n_clusters
raghavrv Mar 7, 2017
5c695bb
Cosmetic touches
raghavrv Mar 7, 2017
e40d6e5
Use zip instead of manually listing the keys
raghavrv Mar 16, 2017
2c89a25
Fix inverse_transform
raghavrv May 17, 2017
b5c8b46
MRG update master and fix merge conflicts
raghavrv Jun 6, 2017
ff88ace
FIX bug in fit_grid_point; Allow only single score
raghavrv Jun 6, 2017
6f40803
ENH Use only ROC-AUC and F1-score
raghavrv Jun 6, 2017
674882b
Fix typos and flake8; Address Andy's reviews
raghavrv Jun 6, 2017
09fd482
ENH Better error messages for incorrect multimetric scoring values +...
raghavrv Jun 6, 2017
fd9e82c
Dict keys must be of string type only
raghavrv Jun 6, 2017
9896333
1. Better error message for invalid scoring 2...
raghavrv Jun 6, 2017
d136889
Fix test failures and shuffle tests
raghavrv Jun 6, 2017
13f6a44
Avoid wrapping scorer as dict in learning_curve
raghavrv Jun 6, 2017
d5ab0f1
Remove doc example as asked for
raghavrv Jun 6, 2017
bf408a8
Some leftover ones
raghavrv Jun 6, 2017
bc8c815
Don't wrap scorer in validation_curve either
raghavrv Jun 6, 2017
d63c770
Add a doc example and skip it as dict order fails doctest
raghavrv Jun 6, 2017
309c33b
Import zip from six for python2.7 compat
raghavrv Jun 7, 2017
2631ffe
Make cross_val_score return a cv_results-like dict
raghavrv Jun 7, 2017
afe4837
Add relevant sections to userguide
raghavrv Jun 7, 2017
0da53b2
Flake8 fixes
raghavrv Jun 8, 2017
c63d6e3
Add whatsnew and fix broken links
raghavrv Jun 8, 2017
2a59b12
Merge branch 'master' into multimetric_cross_val_score
raghavrv Jun 8, 2017
7b204d8
Use AUC and accuracy instead of f1
raghavrv Jun 9, 2017
5dbe2a1
Fix failing doctests cross_validation.rst
raghavrv Jun 10, 2017
b6de448
DOC add the wrapper example for metrics that return multiple return v…
raghavrv Jun 10, 2017
2364e39
Address andy's comments
raghavrv Jun 10, 2017
0984907
Merge branch 'master' into multimetric_cross_val_score
raghavrv Jun 10, 2017
58dac26
Be less weird
raghavrv Jun 10, 2017
0130163
Address more of andy's comments
raghavrv Jun 10, 2017
1d077a5
Make a separate cross_validate function to return dict and a cross_va…
raghavrv Jun 11, 2017
f72ab5c
Update the docs to reflect the new cross_validate function
raghavrv Jun 11, 2017
bcfe238
Add cross_validate to toc-tree
raghavrv Jun 11, 2017
ec290fb
Add more tests on type of cross_validate return and time limits
raghavrv Jun 11, 2017
ea7cae3
FIX failing doctests
raghavrv Jun 12, 2017
1c70d51
FIX ensure keys are not plural
raghavrv Jun 12, 2017
a1d386f
DOC fix
raghavrv Jun 12, 2017
bcb0051
Address some pending comments
raghavrv Jun 12, 2017
39c43c3
Remove the comment as it is irrelevant now
raghavrv Jun 12, 2017
751e10a
Remove excess blank line
raghavrv Jun 12, 2017
192d146
Fix flake8 inconsistencies
raghavrv Jun 16, 2017
e01a1c4
Allow fit_times to be 0 to conform with windows precision
raghavrv Jun 16, 2017
809db2e
Merge branch 'master' into multimetric_cross_val_score
raghavrv Jun 19, 2017
f1b4bf1
DOC specify how refit param is to be set in multiple metric case
raghavrv Jun 19, 2017
67fc1a4
TST ensure cross_validate works for string single metrics + address @…
raghavrv Jun 19, 2017
3822147
Doc fixes
raghavrv Jun 19, 2017
926f81f
Remove the shape and transform parameter of _aggregate_score_dicts
raghavrv Jun 19, 2017
b2e7833
Address Joel's doc comments
raghavrv Jun 19, 2017
d48443c
Fix broken doctest
raghavrv Jun 19, 2017
8a20053
Fix the spurious file
raghavrv Jun 20, 2017
81713b6
Address Andy's comments
raghavrv Jun 26, 2017
e06495b
Merge branch 'master' into multimetric_cross_val_score
raghavrv Jun 29, 2017
9402cbb
MNT Remove erroneous entry
raghavrv Jun 30, 2017
d03a515
Address Andy's comments
raghavrv Jun 30, 2017
8a1ebf1
FIX broken links
raghavrv Jun 30, 2017
2d51ac6
Update whats_new.rst
amueller Jul 6, 2017
1 change: 1 addition & 0 deletions doc/modules/classes.rst
@@ -223,6 +223,7 @@ Model validation
:toctree: generated/
:template: function.rst

model_selection.cross_validate
model_selection.cross_val_score
model_selection.cross_val_predict
model_selection.permutation_test_score
61 changes: 60 additions & 1 deletion doc/modules/cross_validation.rst
@@ -172,6 +172,65 @@ validation iterator instead, for instance::

See :ref:`combining_estimators`.


.. _multimetric_cross_validation:

The cross_validate function and multiple metric evaluation
----------------------------------------------------------

The ``cross_validate`` function differs from ``cross_val_score`` in two ways:

- It allows specifying multiple metrics for evaluation.

- It returns a dict containing training scores, fit times and score times in
  addition to the test score (a short sketch follows this list).
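
For a quick illustration of the difference in return types, here is a minimal
sketch (an editorial addition, not part of the diff; it assumes an estimator
``clf`` and the ``iris`` data as used in the examples below)::

    from sklearn.model_selection import cross_val_score, cross_validate

    # cross_val_score returns a 1-D array with one test score per CV split
    test_scores = cross_val_score(clf, iris.data, iris.target, cv=5)

    # cross_validate returns a dict of 1-D arrays, one entry per CV split,
    # keyed by 'test_score', 'train_score', 'fit_time' and 'score_time'
    # (with return_train_score left at its default of True)
    cv_results = cross_validate(clf, iris.data, iris.target, cv=5)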

For single metric evaluation, where the ``scoring`` parameter is a string,
a callable or ``None``, the keys will be
``['test_score', 'fit_time', 'score_time']``.

For multiple metric evaluation, the return value is a dict with the
following keys:
``['test_<scorer1_name>', 'test_<scorer2_name>', 'test_<scorer...>', 'fit_time', 'score_time']``

``return_train_score`` is set to ``True`` by default. It adds train score keys
for all the scorers. If train scores are not needed, this should be set to
``False`` explicitly.

The multiple metrics can be specified either as a list, tuple or set of
predefined scorer names::

>>> from sklearn.model_selection import cross_validate
>>> from sklearn.metrics import recall_score
>>> scoring = ['precision_macro', 'recall_macro']
>>> clf = svm.SVC(kernel='linear', C=1, random_state=0)
>>> scores = cross_validate(clf, iris.data, iris.target, scoring=scoring,
... cv=5, return_train_score=False)
>>> sorted(scores.keys())
['fit_time', 'score_time', 'test_precision_macro', 'test_recall_macro']
>>> scores['test_recall_macro'] # doctest: +ELLIPSIS
array([ 0.96..., 1. ..., 0.96..., 0.96..., 1. ])

Or as a dict mapping scorer name to a predefined or custom scoring function::

>>> from sklearn.metrics.scorer import make_scorer
>>> scoring = {'prec_macro': 'precision_macro',
...            'rec_macro': make_scorer(recall_score, average='macro')}
>>> scores = cross_validate(clf, iris.data, iris.target, scoring=scoring,
...                         cv=5, return_train_score=True)
>>> sorted(scores.keys()) # doctest: +NORMALIZE_WHITESPACE
['fit_time', 'score_time', 'test_prec_macro', 'test_rec_macro',
 'train_prec_macro', 'train_rec_macro']
>>> scores['train_rec_macro'] # doctest: +ELLIPSIS
array([ 0.97..., 0.97..., 0.99..., 0.98..., 0.98...])

Here is an example of ``cross_validate`` using a single metric::

>>> scores = cross_validate(clf, iris.data, iris.target,
... scoring='precision_macro')
>>> sorted(scores.keys())
['fit_time', 'score_time', 'test_score', 'train_score']
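
Because every value in the returned dict is an array with one entry per CV
split, the results are easy to tabulate. A minimal sketch, assuming ``pandas``
is installed (pandas is not required by scikit-learn itself)::

    import pandas as pd

    df = pd.DataFrame(scores)  # one row per CV split, one column per key
    print(df.mean())           # mean fit time, score time and score(s)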


Obtaining predictions by cross-validation
-----------------------------------------

@@ -186,7 +245,7 @@ These predictions can then be used to evaluate the classifier::
>>> from sklearn.model_selection import cross_val_predict
>>> predicted = cross_val_predict(clf, iris.data, iris.target, cv=10)
>>> metrics.accuracy_score(iris.target, predicted) # doctest: +ELLIPSIS
0.966...

Member:
Am I missing something? What's changed?

Member Author:
The clf now is different as we modified it above.

0.973...

Note that the result of this computation may be slightly different from those
obtained using :func:`cross_val_score` as the elements are grouped in different
25 changes: 25 additions & 0 deletions doc/modules/grid_search.rst
@@ -84,6 +84,10 @@ evaluated and the best combination is retained.
dataset. This is the best practice for evaluating the performance of a
model with grid search.

- See :ref:`sphx_glr_auto_examples_model_selection_plot_multi_metric_evaluation`
for an example of :class:`GridSearchCV` being used to evaluate multiple
metrics simultaneously.

.. _randomized_parameter_search:

Randomized Parameter Optimization
@@ -161,6 +165,27 @@ scoring function can be specified via the ``scoring`` parameter to
specialized cross-validation tools described below.
See :ref:`scoring_parameter` for more details.

.. _multimetric_grid_search:

Specifying multiple metrics for evaluation
------------------------------------------

``GridSearchCV`` and ``RandomizedSearchCV`` allow specifying multiple metrics
for the ``scoring`` parameter.

Multimetric scoring can be specified either as a list of strings of predefined
score names or as a dict mapping scorer names to scorer functions and/or
predefined scorer name(s). See :ref:`multimetric_scoring` for more details.

Member:
I think it's appropriate to mention the refit behaviour, as *SearchCV must optimise over a single score.

When specifying multiple metrics, the ``refit`` parameter must be set to the
metric (string) for which the ``best_params_`` will be found and used to build
the ``best_estimator_`` on the whole dataset. If the search should not be
refit, set ``refit=False``. Leaving refit to the default value ``None`` will
result in an error when using multiple metrics.

See :ref:`sphx_glr_auto_examples_model_selection_plot_multi_metric_evaluation`
for an example usage.
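
As a minimal sketch of the behaviour described above (an editorial
illustration, not part of the diff; the parameter grid and scorers are
arbitrary)::

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    iris = load_iris()
    gs = GridSearchCV(SVC(kernel='linear'),
                      param_grid={'C': [1, 10, 100]},
                      scoring={'prec_macro': 'precision_macro',
                               'rec_macro': 'recall_macro'},
                      refit='prec_macro',  # the best_* attributes follow this scorer
                      cv=5)
    gs.fit(iris.data, iris.target)
    # cv_results_ has one set of columns per scorer, e.g.
    # 'mean_test_prec_macro' and 'rank_test_rec_macro'
    print(sorted(gs.cv_results_.keys()))
    print(gs.best_params_)  # selected according to 'prec_macro'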

Composite estimators and parameter spaces
-----------------------------------------

45 changes: 45 additions & 0 deletions doc/modules/model_evaluation.rst
@@ -210,6 +210,51 @@ the following two rules:
Again, by convention higher numbers are better, so if your scorer
returns loss, that value should be negated.

.. _multimetric_scoring:

Member:
I don't understand why this is here.

Member Author:
I've referenced it in grid_search.rst


Using multiple metric evaluation
--------------------------------

Scikit-learn also permits evaluation of multiple metrics in ``GridSearchCV``,
``RandomizedSearchCV`` and ``cross_validate``.

There are two ways to specify multiple scoring metrics for the ``scoring``
parameter:

- As an iterable of string metrics::
>>> scoring = ['accuracy', 'precision']

- As a ``dict`` mapping the scorer name to the scoring function::
>>> from sklearn.metrics import accuracy_score
>>> from sklearn.metrics import make_scorer
>>> scoring = {'accuracy': make_scorer(accuracy_score),
... 'prec': 'precision'}

Note that the dict values can either be scorer functions or one of the
predefined metric strings.

Currently only those scorer functions that return a single score can be passed
inside the dict. Scorer functions that return multiple values are not
permitted and will require a wrapper to return a single metric::

>>> from sklearn.model_selection import cross_validate
>>> from sklearn.metrics import confusion_matrix
>>> # A sample toy binary classification dataset
>>> X, y = datasets.make_classification(n_classes=2, random_state=0)
>>> svm = LinearSVC(random_state=0)
>>> tp = lambda y_true, y_pred: confusion_matrix(y_true, y_pred)[1, 1]

Member:
Is it a bad idea to recommend lambda when it's not able to be pickled (dill notwithstanding)?

>>> tn = lambda y_true, y_pred: confusion_matrix(y_true, y_pred)[0, 0]
>>> fp = lambda y_true, y_pred: confusion_matrix(y_true, y_pred)[0, 1]
>>> fn = lambda y_true, y_pred: confusion_matrix(y_true, y_pred)[1, 0]
>>> scoring = {'tp' : make_scorer(tp), 'tn' : make_scorer(tn),
... 'fp' : make_scorer(fp), 'fn' : make_scorer(fn)}
>>> cv_results = cross_validate(svm.fit(X, y), X, y, scoring=scoring)
>>> # Getting the test set true negative scores

Member:
Next line says tp, not fp

>>> print(cv_results['test_tn']) # doctest: +NORMALIZE_WHITESPACE
[12 13 15]
>>> # Getting the test set false positive scores
>>> print(cv_results['test_fp']) # doctest: +NORMALIZE_WHITESPACE
[5 4 1]
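
As the review comment above notes, ``lambda`` scorers cannot be pickled. A
minimal sketch of the same wrappers written as module-level functions instead
(an editorial illustration, not part of the diff; for binary labels
``[0, 1]``, ``confusion_matrix`` returns counts laid out as
``[[tn, fp], [fn, tp]]``)::

    from sklearn.metrics import confusion_matrix
    from sklearn.metrics import make_scorer

    def tn(y_true, y_pred):
        return confusion_matrix(y_true, y_pred)[0, 0]

    def fp(y_true, y_pred):
        return confusion_matrix(y_true, y_pred)[0, 1]

    def fn(y_true, y_pred):
        return confusion_matrix(y_true, y_pred)[1, 0]

    def tp(y_true, y_pred):
        return confusion_matrix(y_true, y_pred)[1, 1]

    # Unlike lambdas, module-level functions can be pickled.
    scoring = {'tp': make_scorer(tp), 'tn': make_scorer(tn),
               'fp': make_scorer(fp), 'fn': make_scorer(fn)}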

.. _classification_metrics:

13 changes: 13 additions & 0 deletions doc/whats_new.rst
@@ -31,6 +31,19 @@ Changelog
New features
............

- :class:`model_selection.GridSearchCV` and
:class:`model_selection.RandomizedSearchCV` now support simultaneous
evaluation of multiple metrics. Refer to the
:ref:`multimetric_grid_search` section of the user guide for more
information. :issue:`7388` by `Raghav RV`_

- Added :func:`model_selection.cross_validate`, which allows evaluation
of multiple metrics. This function returns a dict with more useful
information from cross-validation, such as the train scores, fit times and
score times.
Refer to the :ref:`multimetric_cross_validation` section of the user guide
for more information. :issue:`7388` by `Raghav RV`_

- Added :class:`multioutput.ClassifierChain` for multi-label
classification. By `Adam Kleczewski <adamklec>`_.

94 changes: 94 additions & 0 deletions examples/model_selection/plot_multi_metric_evaluation.py
@@ -0,0 +1,94 @@
"""Demonstration of multi-metric evaluation on cross_val_score and GridSearchCV

Member:
@GaelVaroquaux is probably right in suggesting we can reduce the number of examples, and instead demonstrate features successively when needed. Can we roll this feature into existing grid search examples?

Member:
And we really don't need to illustrate each quirk of the feature, e.g. different ways to specify multiple metrics.

Member Author:
What do you think about keeping this example and removing this one instead?
The GridSearchCV documentation has a class example that illustrates a simple single metric evaluation...

Member:
Why not change the existing example? will break less links.

Member Author:
Like discussed IRL, I feel it's better to have both... (at least for now)


Multiple metric parameter search can be done by setting the ``scoring``
parameter to a list of metric scorer names or a dict mapping the scorer names
to the scorer callables.

The scores of all the scorers are available in the ``cv_results_`` dict at keys
ending in ``'_<scorer_name>'`` (``'mean_test_precision'``,
``'rank_test_precision'``, etc...)

The ``best_estimator_``, ``best_index_``, ``best_score_`` and ``best_params_``
correspond to the scorer (key) that is set to the ``refit`` attribute.

Member:
When using multiple metrics, you need to specify the refit parameter to either specify a metric to use to select the best parameter setting or specify refit="False" to not refit any estimator.

Member Author:
@amueller That is the current case, except refit=False and not the string "False". Is that fine or do you want the behavior changed to refit=False by default for multiple metrics and refit=True by default for single metric?
Also cc: @jnothman @GaelVaroquaux

Member Author:
Ah you've already answered this at #7388 (comment). I read this one before.

Member:
yes, but I just want to be very explicit about the current behavior. This is something that people will definitely run into, so just tell them exactly what they need to do.

Member:
I think @amueller is suggesting you use this kind of instructive wording in the narrative docs. Perhaps just adopt his wording?

"""

# Author: Raghav RV <[email protected]>
# License: BSD

import numpy as np
from matplotlib import pyplot as plt

from sklearn.datasets import make_hastie_10_2
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

print(__doc__)

###############################################################################
# Running ``GridSearchCV`` using multiple evaluation metrics
# ----------------------------------------------------------
#

X, y = make_hastie_10_2(n_samples=8000, random_state=42)

# The scorers can either be one of the predefined metric strings or a scorer
# callable, like the one returned by make_scorer
scoring = {'AUC': 'roc_auc', 'Accuracy': make_scorer(accuracy_score)}

# Setting refit='AUC' refits an estimator on the whole dataset with the
# parameter setting that has the best cross-validated AUC score.
# That estimator is made available at ``gs.best_estimator_`` along with
# attributes like ``gs.best_score_``, ``gs.best_params_`` and
# ``gs.best_index_``
gs = GridSearchCV(DecisionTreeClassifier(random_state=42),
                  param_grid={'min_samples_split': range(2, 403, 10)},
                  scoring=scoring, cv=5, refit='AUC')
gs.fit(X, y)
results = gs.cv_results_
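
# Editorial sketch (not part of the original example): the per-scorer
# results appear in cv_results_ under keys suffixed with the scorer names
# defined above, and the best_* attributes follow the scorer selected via
# refit='AUC'.
print(sorted(key for key in results if key.endswith('_AUC')))
print(gs.best_params_)   # parameter setting with the best mean test AUC
print(gs.best_score_)    # its mean cross-validated AUC score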

###############################################################################
# Plotting the result
# -------------------

plt.figure(figsize=(13, 13))
plt.title("GridSearchCV evaluating using multiple scorers simultaneously",
          fontsize=16)

plt.xlabel("min_samples_split")
plt.ylabel("Score")
plt.grid()

ax = plt.axes()
ax.set_xlim(0, 402)
ax.set_ylim(0.73, 1)

# Get the regular numpy array from the MaskedArray
X_axis = np.array(results['param_min_samples_split'].data, dtype=float)

for scorer, color in zip(sorted(scoring), ['g', 'k']):
    for sample, style in (('train', '--'), ('test', '-')):
        sample_score_mean = results['mean_%s_%s' % (sample, scorer)]
        sample_score_std = results['std_%s_%s' % (sample, scorer)]
        ax.fill_between(X_axis, sample_score_mean - sample_score_std,
                        sample_score_mean + sample_score_std,
                        alpha=0.1 if sample == 'test' else 0, color=color)
        ax.plot(X_axis, sample_score_mean, style, color=color,
                alpha=1 if sample == 'test' else 0.7,
                label="%s (%s)" % (scorer, sample))

    best_index = np.nonzero(results['rank_test_%s' % scorer] == 1)[0][0]
    best_score = results['mean_test_%s' % scorer][best_index]

    # Plot a dotted vertical line at the best score for that scorer marked by x
    ax.plot([X_axis[best_index], ] * 2, [0, best_score],
            linestyle='-.', color=color, marker='x', markeredgewidth=3, ms=8)

    # Annotate the best score for that scorer
    ax.annotate("%0.2f" % best_score,
                (X_axis[best_index], best_score + 0.005))

plt.legend(loc="best")
plt.grid('off')
plt.show()