From be6a14a67b3dfb760312b193bd8f814d5bd39bfc Mon Sep 17 00:00:00 2001
From: Lucy Liu
Date: Thu, 21 Nov 2024 15:44:04 +1100
Subject: [PATCH 1/9] improve scoring param

---
 doc/modules/model_evaluation.rst | 162 ++++++++++++++++++-------------
 sklearn/metrics/_scorer.py       |   2 +-
 2 files changed, 95 insertions(+), 69 deletions(-)

diff --git a/doc/modules/model_evaluation.rst b/doc/modules/model_evaluation.rst
index b161014f5268f..8abc11906c82f 100644
--- a/doc/modules/model_evaluation.rst
+++ b/doc/modules/model_evaluation.rst
@@ -11,13 +11,16 @@ predictions:
 
 * **Estimator score method**: Estimators have a ``score`` method providing a
   default evaluation criterion for the problem they are designed to solve.
-  This is not discussed on this page, but in each estimator's documentation.
+  Most commonly this is mean :ref:`accuracy <accuracy_score>` for classifiers and the
+  :ref:`coefficient of determination <r2_score>` (:math:`R^2`) for regressors.
+  Details for each estimator can be found in it's documentation.
 
-* **Scoring parameter**: Model-evaluation tools using
+* **Scoring parameter**: Model-evaluation tools that use
   :ref:`cross-validation <cross_validation>` (such as
-  :func:`model_selection.cross_val_score` and
-  :class:`model_selection.GridSearchCV`) rely on an internal *scoring* strategy.
-  This is discussed in the section :ref:`scoring_parameter`.
+  :class:`model_selection.GridSearchCV`, :func:`model_selection.validation_curve` and
+  :class:`linear_model.LogisticRegressionCV`) rely on an internal *scoring* strategy.
+  This can be specified using the `scoring` parameter and is discussed in the
+  section :ref:`scoring_parameter`.
 
 * **Metric functions**: The :mod:`sklearn.metrics` module implements functions
   assessing prediction error for specific purposes. These metrics are detailed
@@ -38,24 +41,39 @@ value of those metrics for random predictions.
 The ``scoring`` parameter: defining model evaluation rules
 ==========================================================
 
-Model selection and evaluation using tools, such as
-:class:`model_selection.GridSearchCV` and
-:func:`model_selection.cross_val_score`, take a ``scoring`` parameter that
+Model selection and evaluation using tools that use
+:ref:`cross-validation <cross_validation>` (such as
+:class:`model_selection.GridSearchCV`, :func:`model_selection.validation_curve` and
+:class:`linear_model.LogisticRegressionCV`) take a ``scoring`` parameter that
 controls what metric they apply to the estimators evaluated.
 
-Common cases: predefined values
--------------------------------
+They can be specified in several ways:
+
+* `None`: the estimator's default evaluation criterion (i.e., the method used in the
+  estimators `score` method) is used.
+* :ref:`String name <scoring_string_names>`: common metrics can be passed via a string
+  name.
+* :ref:`Callable <scoring_callable>`: more complex metrics can be passed via a callable
+  (e.g., function).
+
+Some tools may also accept multiple metric evaluation. See :ref:`multimetric_scoring`
+for details.
+
+.. _scoring_string_names:
+
+Common cases: string names
+--------------------------
 
 For the most common use cases, you can designate a scorer object with the
-``scoring`` parameter; the table below shows all possible values.
+``scoring`` parameter via a string name; the table below shows all possible values.
 All scorer objects follow the convention that **higher return values are better
 than lower return values**.
Thus metrics which measure the distance between the model and the data, like :func:`metrics.mean_squared_error`, are -available as neg_mean_squared_error which return the negated value +available as 'neg_mean_squared_error' which return the negated value of the metric. ==================================== ============================================== ================================== -Scoring Function Comment +Scoring string name Function Comment ==================================== ============================================== ================================== **Classification** 'accuracy' :func:`metrics.accuracy_score` @@ -123,10 +141,17 @@ Usage examples: .. currentmodule:: sklearn.metrics -.. _scoring: +.. _scoring_callable: + +Callable scorers +---------------- + +For more more complex use cases and more flexibility, you can pass a callable to +the `scoring` parameter. Below we describe different methods of creating the callable, +in increasing order of flexibility. Defining your scoring strategy from metric functions ------------------------------------------------------ +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The following metrics functions are not implemented as named scorers, sometimes because they require additional parameters, such as @@ -171,59 +196,61 @@ measuring a prediction error given ground truth and prediction: the ``greater_is_better`` parameter to ``False`` (``True`` by default; see the parameter description below). +.. _scoring_make_scorer: + +Custom scorer objects using `make_scorer` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -.. dropdown:: Custom scorer objects - - The second use case is to build a completely custom scorer object - from a simple python function using :func:`make_scorer`, which can - take several parameters: - - * the python function you want to use (``my_custom_loss_func`` - in the example below) - - * whether the python function returns a score (``greater_is_better=True``, - the default) or a loss (``greater_is_better=False``). If a loss, the output - of the python function is negated by the scorer object, conforming to - the cross validation convention that scorers return higher values for better models. - - * for classification metrics only: whether the python function you provided requires - continuous decision certainties. If the scoring function only accepts probability - estimates (e.g. :func:`metrics.log_loss`) then one needs to set the parameter - `response_method`, thus in this case `response_method="predict_proba"`. Some scoring - function do not necessarily require probability estimates but rather non-thresholded - decision values (e.g. :func:`metrics.roc_auc_score`). In this case, one provides a - list such as `response_method=["decision_function", "predict_proba"]`. In this case, - the scorer will use the first available method, in the order given in the list, - to compute the scores. - - * any additional parameters, such as ``beta`` or ``labels`` in :func:`f1_score`. - - Here is an example of building custom scorers, and of using the - ``greater_is_better`` parameter:: - - >>> import numpy as np - >>> def my_custom_loss_func(y_true, y_pred): - ... diff = np.abs(y_true - y_pred).max() - ... return np.log1p(diff) - ... - >>> # score will negate the return value of my_custom_loss_func, - >>> # which will be np.log(2), 0.693, given the values for X - >>> # and y defined below. 
- >>> score = make_scorer(my_custom_loss_func, greater_is_better=False) - >>> X = [[1], [1]] - >>> y = [0, 1] - >>> from sklearn.dummy import DummyClassifier - >>> clf = DummyClassifier(strategy='most_frequent', random_state=0) - >>> clf = clf.fit(X, y) - >>> my_custom_loss_func(y, clf.predict(X)) - 0.69... - >>> score(clf, X, y) - -0.69... +The second use case is to build a completely custom scorer object +from a simple python function using :func:`make_scorer`, which can +take several parameters: + +* the python function you want to use (``my_custom_loss_func`` + in the example below) + +* whether the python function returns a score (``greater_is_better=True``, + the default) or a loss (``greater_is_better=False``). If a loss, the output + of the python function is negated by the scorer object, conforming to + the cross validation convention that scorers return higher values for better models. + +* for classification metrics only: whether the python function you provided requires + continuous decision certainties. If the scoring function only accepts probability + estimates (e.g. :func:`metrics.log_loss`) then one needs to set the parameter + `response_method`, thus in this case `response_method="predict_proba"`. Some scoring + function do not necessarily require probability estimates but rather non-thresholded + decision values (e.g. :func:`metrics.roc_auc_score`). In this case, one provides a + list such as `response_method=["decision_function", "predict_proba"]`. In this case, + the scorer will use the first available method, in the order given in the list, + to compute the scores. + +* any additional parameters, such as ``beta`` or ``labels`` in :func:`f1_score`. + +Here is an example of building custom scorers, and of using the +``greater_is_better`` parameter:: + + >>> import numpy as np + >>> def my_custom_loss_func(y_true, y_pred): + ... diff = np.abs(y_true - y_pred).max() + ... return np.log1p(diff) + ... + >>> # score will negate the return value of my_custom_loss_func, + >>> # which will be np.log(2), 0.693, given the values for X + >>> # and y defined below. + >>> score = make_scorer(my_custom_loss_func, greater_is_better=False) + >>> X = [[1], [1]] + >>> y = [0, 1] + >>> from sklearn.dummy import DummyClassifier + >>> clf = DummyClassifier(strategy='most_frequent', random_state=0) + >>> clf = clf.fit(X, y) + >>> my_custom_loss_func(y, clf.predict(X)) + 0.69... + >>> score(clf, X, y) + -0.69... .. _diy_scoring: -Implementing your own scoring object ------------------------------------- +Custom scorer objects from scratch +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ You can generate even more flexible model scorers by constructing your own scoring object from scratch, without using the :func:`make_scorer` factory. @@ -2934,10 +2961,9 @@ Clustering metrics .. currentmodule:: sklearn.metrics The :mod:`sklearn.metrics` module implements several loss, score, and utility -functions. For more information see the :ref:`clustering_evaluation` -section for instance clustering, and :ref:`biclustering_evaluation` for -biclustering. - +functions to measure clustering performance. For more information see the +:ref:`clustering_evaluation` section for instance clustering, and +:ref:`biclustering_evaluation` for biclustering. .. 
_dummy_estimators: diff --git a/sklearn/metrics/_scorer.py b/sklearn/metrics/_scorer.py index bc8c3a09a320c..3a565c67e10ab 100644 --- a/sklearn/metrics/_scorer.py +++ b/sklearn/metrics/_scorer.py @@ -640,7 +640,7 @@ def make_scorer( The parameter `response_method` allows to specify which method of the estimator should be used to feed the scoring/loss function. - Read more in the :ref:`User Guide `. + Read more in the :ref:`User Guide `. Parameters ---------- From 491982a8770fb779fc80d656870ec597d342630e Mon Sep 17 00:00:00 2001 From: Lucy Liu Date: Thu, 21 Nov 2024 16:03:16 +1100 Subject: [PATCH 2/9] wording --- doc/modules/model_evaluation.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/modules/model_evaluation.rst b/doc/modules/model_evaluation.rst index 8abc11906c82f..e20c3b6b24270 100644 --- a/doc/modules/model_evaluation.rst +++ b/doc/modules/model_evaluation.rst @@ -19,8 +19,8 @@ predictions: :ref:`cross-validation ` (such as :class:`model_selection.GridSearchCV`, :func:`model_selection.validation_curve` and :class:`linear_model.LogisticRegressionCV`) rely on an internal *scoring* strategy. - This can be specified using the `scoring` parameter and is discussed in the - section :ref:`scoring_parameter`. + This can be specified using the `scoring` parameter of that tool and is discussed + in the section :ref:`scoring_parameter`. * **Metric functions**: The :mod:`sklearn.metrics` module implements functions assessing prediction error for specific purposes. These metrics are detailed From 853cfd80353386e5c0953f8a76b3f7b6575a127b Mon Sep 17 00:00:00 2001 From: Lucy Liu Date: Thu, 21 Nov 2024 16:03:54 +1100 Subject: [PATCH 3/9] wording --- doc/modules/model_evaluation.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/modules/model_evaluation.rst b/doc/modules/model_evaluation.rst index e20c3b6b24270..f2473dd169ada 100644 --- a/doc/modules/model_evaluation.rst +++ b/doc/modules/model_evaluation.rst @@ -49,7 +49,7 @@ controls what metric they apply to the estimators evaluated. They can be specified in several ways: -* `None`: the estimator's default evaluation criterion (i.e., the method used in the +* `None`: the estimator's default evaluation criterion (i.e., the metric used in the estimators `score` method) is used. * :ref:`String name `: common metrics can be passed via a string name. From f09f35307eb1361fb45f5c2fdafd97439658906f Mon Sep 17 00:00:00 2001 From: Lucy Liu Date: Thu, 21 Nov 2024 16:36:08 +1100 Subject: [PATCH 4/9] wip --- doc/modules/model_evaluation.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/modules/model_evaluation.rst b/doc/modules/model_evaluation.rst index f2473dd169ada..cbc64ca89aa4d 100644 --- a/doc/modules/model_evaluation.rst +++ b/doc/modules/model_evaluation.rst @@ -61,8 +61,8 @@ for details. .. _scoring_string_names: -Common cases: string names --------------------------- +Scoring parameter: string names +------------------------------- For the most common use cases, you can designate a scorer object with the ``scoring`` parameter via a string name; the table below shows all possible values. 
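
To make the `None` and string-name forms of ``scoring`` introduced in the patches above
concrete, here is a minimal, self-contained sketch; the dataset, estimator and metric
choice are illustrative assumptions rather than part of the patch::

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=100, random_state=0)
    clf = LogisticRegression(max_iter=1000, random_state=0)

    # scoring=None falls back to the estimator's own `score` method
    # (mean accuracy for a classifier).
    default_scores = cross_val_score(clf, X, y, cv=5, scoring=None)

    # A string name selects a predefined scorer from the table, e.g. balanced accuracy.
    balanced_scores = cross_val_score(clf, X, y, cv=5, scoring="balanced_accuracy")

    print(default_scores.mean(), balanced_scores.mean())

Because every built-in scorer follows the higher-is-better convention, the two sets of
cross-validation scores are directly comparable.
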
From 69f23de4dec6b4829b99aced63188772f51f76ca Mon Sep 17 00:00:00 2001 From: Lucy Liu Date: Thu, 21 Nov 2024 16:40:08 +1100 Subject: [PATCH 5/9] fix cross ref --- doc/modules/classification_threshold.rst | 2 +- doc/modules/model_evaluation.rst | 4 ++-- sklearn/feature_selection/_sequential.py | 2 +- sklearn/inspection/_permutation_importance.py | 2 +- sklearn/metrics/_scorer.py | 4 ++-- sklearn/model_selection/_plot.py | 4 ++-- sklearn/model_selection/_search.py | 4 ++-- sklearn/model_selection/_search_successive_halving.py | 4 ++-- sklearn/model_selection/_validation.py | 4 ++-- 9 files changed, 15 insertions(+), 15 deletions(-) diff --git a/doc/modules/classification_threshold.rst b/doc/modules/classification_threshold.rst index 8b3e6e3a68438..9adf846e75cba 100644 --- a/doc/modules/classification_threshold.rst +++ b/doc/modules/classification_threshold.rst @@ -97,7 +97,7 @@ a meaningful metric for their use case. the label of the class of interest (i.e. `pos_label`). Thus, if this label is not the right one for your application, you need to define a scorer and pass the right `pos_label` (and additional parameters) using the - :func:`~sklearn.metrics.make_scorer`. Refer to :ref:`scoring` to get + :func:`~sklearn.metrics.make_scorer`. Refer to :ref:`scoring_callable` to get information to define your own scoring function. For instance, we show how to pass the information to the scorer that the label of interest is `0` when maximizing the :func:`~sklearn.metrics.f1_score`:: diff --git a/doc/modules/model_evaluation.rst b/doc/modules/model_evaluation.rst index cbc64ca89aa4d..4473005c32514 100644 --- a/doc/modules/model_evaluation.rst +++ b/doc/modules/model_evaluation.rst @@ -61,8 +61,8 @@ for details. .. _scoring_string_names: -Scoring parameter: string names -------------------------------- +String name scorers +------------------- For the most common use cases, you can designate a scorer object with the ``scoring`` parameter via a string name; the table below shows all possible values. diff --git a/sklearn/feature_selection/_sequential.py b/sklearn/feature_selection/_sequential.py index ac5f13fd00e7d..bd1e27efef60b 100644 --- a/sklearn/feature_selection/_sequential.py +++ b/sklearn/feature_selection/_sequential.py @@ -78,7 +78,7 @@ class SequentialFeatureSelector(SelectorMixin, MetaEstimatorMixin, BaseEstimator scoring : str or callable, default=None A single str (see :ref:`scoring_parameter`) or a callable - (see :ref:`scoring`) to evaluate the predictions on the test set. + (see :ref:`scoring_callable`) to evaluate the predictions on the test set. NOTE that when using a custom scorer, it should return a single value. diff --git a/sklearn/inspection/_permutation_importance.py b/sklearn/inspection/_permutation_importance.py index fb3c646a271a6..74000aa9e8556 100644 --- a/sklearn/inspection/_permutation_importance.py +++ b/sklearn/inspection/_permutation_importance.py @@ -177,7 +177,7 @@ def permutation_importance( If `scoring` represents a single score, one can use: - a single string (see :ref:`scoring_parameter`); - - a callable (see :ref:`scoring`) that returns a single value. + - a callable (see :ref:`scoring_callable`) that returns a single value. 
If `scoring` represents multiple scores, one can use: diff --git a/sklearn/metrics/_scorer.py b/sklearn/metrics/_scorer.py index 3a565c67e10ab..fb173cd096a43 100644 --- a/sklearn/metrics/_scorer.py +++ b/sklearn/metrics/_scorer.py @@ -640,7 +640,7 @@ def make_scorer( The parameter `response_method` allows to specify which method of the estimator should be used to feed the scoring/loss function. - Read more in the :ref:`User Guide `. + Read more in the :ref:`User Guide `. Parameters ---------- @@ -933,7 +933,7 @@ def check_scoring(estimator=None, scoring=None, *, allow_none=False, raise_exc=T Scorer to use. If `scoring` represents a single score, one can use: - a single string (see :ref:`scoring_parameter`); - - a callable (see :ref:`scoring`) that returns a single value. + - a callable (see :ref:`scoring_callable`) that returns a single value. If `scoring` represents multiple scores, one can use: diff --git a/sklearn/model_selection/_plot.py b/sklearn/model_selection/_plot.py index b16e0f4c1019a..8cae3dc97d2c5 100644 --- a/sklearn/model_selection/_plot.py +++ b/sklearn/model_selection/_plot.py @@ -369,7 +369,7 @@ def from_estimator( scoring : str or callable, default=None A string (see :ref:`scoring_parameter`) or a scorer callable object / function with signature - `scorer(estimator, X, y)` (see :ref:`scoring`). + `scorer(estimator, X, y)` (see :ref:`scoring_callable`). exploit_incremental_learning : bool, default=False If the estimator supports incremental learning, this will be @@ -752,7 +752,7 @@ def from_estimator( scoring : str or callable, default=None A string (see :ref:`scoring_parameter`) or a scorer callable object / function with signature - `scorer(estimator, X, y)` (see :ref:`scoring`). + `scorer(estimator, X, y)` (see :ref:`scoring_callable`). n_jobs : int, default=None Number of jobs to run in parallel. Training the estimator and diff --git a/sklearn/model_selection/_search.py b/sklearn/model_selection/_search.py index 7515436af33da..4c05098269521 100644 --- a/sklearn/model_selection/_search.py +++ b/sklearn/model_selection/_search.py @@ -1247,7 +1247,7 @@ class GridSearchCV(BaseSearchCV): If `scoring` represents a single score, one can use: - a single string (see :ref:`scoring_parameter`); - - a callable (see :ref:`scoring`) that returns a single value. + - a callable (see :ref:`scoring_callable`) that returns a single value. If `scoring` represents multiple scores, one can use: @@ -1623,7 +1623,7 @@ class RandomizedSearchCV(BaseSearchCV): If `scoring` represents a single score, one can use: - a single string (see :ref:`scoring_parameter`); - - a callable (see :ref:`scoring`) that returns a single value. + - a callable (see :ref:`scoring_callable`) that returns a single value. If `scoring` represents multiple scores, one can use: diff --git a/sklearn/model_selection/_search_successive_halving.py b/sklearn/model_selection/_search_successive_halving.py index 5ff5f1198121a..55073df14bfc1 100644 --- a/sklearn/model_selection/_search_successive_halving.py +++ b/sklearn/model_selection/_search_successive_halving.py @@ -480,7 +480,7 @@ class HalvingGridSearchCV(BaseSuccessiveHalving): scoring : str, callable, or None, default=None A single string (see :ref:`scoring_parameter`) or a callable - (see :ref:`scoring`) to evaluate the predictions on the test set. + (see :ref:`scoring_callable`) to evaluate the predictions on the test set. If None, the estimator's score method is used. 
refit : bool, default=True @@ -821,7 +821,7 @@ class HalvingRandomSearchCV(BaseSuccessiveHalving): scoring : str, callable, or None, default=None A single string (see :ref:`scoring_parameter`) or a callable - (see :ref:`scoring`) to evaluate the predictions on the test set. + (see :ref:`scoring_callable`) to evaluate the predictions on the test set. If None, the estimator's score method is used. refit : bool, default=True diff --git a/sklearn/model_selection/_validation.py b/sklearn/model_selection/_validation.py index dddc0cce795af..353ed6caf9559 100644 --- a/sklearn/model_selection/_validation.py +++ b/sklearn/model_selection/_validation.py @@ -175,7 +175,7 @@ def cross_validate( If `scoring` represents a single score, one can use: - a single string (see :ref:`scoring_parameter`); - - a callable (see :ref:`scoring`) that returns a single value. + - a callable (see :ref:`scoring_callable`) that returns a single value. If `scoring` represents multiple scores, one can use: @@ -1562,7 +1562,7 @@ def permutation_test_score( scoring : str or callable, default=None A single str (see :ref:`scoring_parameter`) or a callable - (see :ref:`scoring`) to evaluate the predictions on the test set. + (see :ref:`scoring_callable`) to evaluate the predictions on the test set. If `None` the estimator's score method is used. From 836a263a7b2eae822e21fa51552499f83d6118df Mon Sep 17 00:00:00 2001 From: Lucy Liu Date: Wed, 27 Nov 2024 14:29:57 +1100 Subject: [PATCH 6/9] reviews --- doc/modules/model_evaluation.rst | 181 ++++++++++++++++--------------- 1 file changed, 94 insertions(+), 87 deletions(-) diff --git a/doc/modules/model_evaluation.rst b/doc/modules/model_evaluation.rst index 96ae43eb75a4f..b01952a14e7b7 100644 --- a/doc/modules/model_evaluation.rst +++ b/doc/modules/model_evaluation.rst @@ -148,9 +148,9 @@ predictions: * **Estimator score method**: Estimators have a ``score`` method providing a default evaluation criterion for the problem they are designed to solve. - Most commonly this is mean :ref:`accuracy ` for classifiers and the + Most commonly this is :ref:`accuracy ` for classifiers and the :ref:`coefficient of determination ` (:math:`R^2`) for regressors. - Details for each estimator can be found in it's documentation. + Details for each estimator can be found in its documentation. * **Scoring parameter**: Model-evaluation tools that use :ref:`cross-validation ` (such as @@ -178,7 +178,7 @@ value of those metrics for random predictions. The ``scoring`` parameter: defining model evaluation rules ========================================================== -Model selection and evaluation using tools that use +Model selection and evaluation tools that internally use :ref:`cross-validation ` (such as :class:`model_selection.GridSearchCV`, :func:`model_selection.validation_curve` and :class:`linear_model.LogisticRegressionCV`) take a ``scoring`` parameter that @@ -187,13 +187,13 @@ controls what metric they apply to the estimators evaluated. They can be specified in several ways: * `None`: the estimator's default evaluation criterion (i.e., the metric used in the - estimators `score` method) is used. + estimator's `score` method) is used. * :ref:`String name `: common metrics can be passed via a string name. -* :ref:`Callable `: more complex metrics can be passed via a callable - (e.g., function). +* :ref:`Callable `: more complex metrics can be passed via a custom + metric callable (e.g., function). -Some tools may also accept multiple metric evaluation. 
See :ref:`multimetric_scoring` +Some tools do also accept multiple metric evaluation. See :ref:`multimetric_scoring` for details. .. _scoring_string_names: @@ -204,7 +204,7 @@ String name scorers For the most common use cases, you can designate a scorer object with the ``scoring`` parameter via a string name; the table below shows all possible values. All scorer objects follow the convention that **higher return values are better -than lower return values**. Thus metrics which measure the distance between +than lower return values**. Thus metrics which measure the distance between the model and the data, like :func:`metrics.mean_squared_error`, are available as 'neg_mean_squared_error' which return the negated value of the metric. @@ -283,14 +283,20 @@ Usage examples: Callable scorers ---------------- -For more more complex use cases and more flexibility, you can pass a callable to -the `scoring` parameter. Below we describe different methods of creating the callable, -in increasing order of flexibility. +For more complex use cases and more flexibility, you can pass a callable to +the `scoring` parameter. This can be done by: -Defining your scoring strategy from metric functions -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +* :ref:`scoring_adapt_metric` (least flexible) +* :ref:`scoring_make_scorer` + * Using `make_scorer` (more flexible) + * From scratch (most flexible) -The following metrics functions are not implemented as named scorers, +.. _scoring_adapt_metric: + +Adapting predefined metrics via `make_scorer` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The following metric functions are not implemented as named scorers, sometimes because they require additional parameters, such as :func:`fbeta_score`. They cannot be passed to the ``scoring`` parameters; instead their callable needs to be passed to @@ -328,72 +334,73 @@ measuring a prediction error given ground truth and prediction: maximize, the higher the better. - functions ending with ``_error``, ``_loss``, or ``_deviance`` return a - value to minimize, the lower the better. When converting + value to minimize, the lower the better. When converting into a scorer object using :func:`make_scorer`, set the ``greater_is_better`` parameter to ``False`` (``True`` by default; see the parameter description below). .. _scoring_make_scorer: -Custom scorer objects using `make_scorer` -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -The second use case is to build a completely custom scorer object -from a simple python function using :func:`make_scorer`, which can -take several parameters: - -* the python function you want to use (``my_custom_loss_func`` - in the example below) +Creating a custom scorer object +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -* whether the python function returns a score (``greater_is_better=True``, - the default) or a loss (``greater_is_better=False``). If a loss, the output - of the python function is negated by the scorer object, conforming to - the cross validation convention that scorers return higher values for better models. +You can create your own custom scorer object using +:func:`make_scorer` or for the most flexibility, from scratch. -* for classification metrics only: whether the python function you provided requires - continuous decision certainties. If the scoring function only accepts probability - estimates (e.g. :func:`metrics.log_loss`) then one needs to set the parameter - `response_method`, thus in this case `response_method="predict_proba"`. 
Some scoring - function do not necessarily require probability estimates but rather non-thresholded - decision values (e.g. :func:`metrics.roc_auc_score`). In this case, one provides a - list such as `response_method=["decision_function", "predict_proba"]`. In this case, - the scorer will use the first available method, in the order given in the list, - to compute the scores. - -* any additional parameters, such as ``beta`` or ``labels`` in :func:`f1_score`. - -Here is an example of building custom scorers, and of using the -``greater_is_better`` parameter:: +Custom scorer objects using `make_scorer` - >>> import numpy as np - >>> def my_custom_loss_func(y_true, y_pred): - ... diff = np.abs(y_true - y_pred).max() - ... return np.log1p(diff) - ... - >>> # score will negate the return value of my_custom_loss_func, - >>> # which will be np.log(2), 0.693, given the values for X - >>> # and y defined below. - >>> score = make_scorer(my_custom_loss_func, greater_is_better=False) - >>> X = [[1], [1]] - >>> y = [0, 1] - >>> from sklearn.dummy import DummyClassifier - >>> clf = DummyClassifier(strategy='most_frequent', random_state=0) - >>> clf = clf.fit(X, y) - >>> my_custom_loss_func(y, clf.predict(X)) - 0.69... - >>> score(clf, X, y) - -0.69... - -.. _diy_scoring: - -Custom scorer objects from scratch -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -You can generate even more flexible model scorers by constructing your own -scoring object from scratch, without using the :func:`make_scorer` factory. - - -.. dropdown:: How to build a scorer from scratch +.. dropdown:: Custom scorer objects using `make_scorer` + + You can build a completely custom scorer object + from a simple python function using :func:`make_scorer`, which can + take several parameters: + + * the python function you want to use (``my_custom_loss_func`` + in the example below) + + * whether the python function returns a score (``greater_is_better=True``, + the default) or a loss (``greater_is_better=False``). If a loss, the output + of the python function is negated by the scorer object, conforming to + the cross validation convention that scorers return higher values for better models. + + * for classification metrics only: whether the python function you provided requires + continuous decision certainties. If the scoring function only accepts probability + estimates (e.g. :func:`metrics.log_loss`), then one needs to set the parameter + `response_method="predict_proba"`. Some scoring + functions do not necessarily require probability estimates but rather non-thresholded + decision values (e.g. :func:`metrics.roc_auc_score`). In this case, one can provide a + list (e.g., `response_method=["decision_function", "predict_proba"]`), + and scorer will use the first available method, in the order given in the list, + to compute the scores. + + * any additional parameters of the scoring function, such as ``beta`` or ``labels``. + + Here is an example of building custom scorers, and of using the + ``greater_is_better`` parameter:: + + >>> import numpy as np + >>> def my_custom_loss_func(y_true, y_pred): + ... diff = np.abs(y_true - y_pred).max() + ... return np.log1p(diff) + ... + >>> # score will negate the return value of my_custom_loss_func, + >>> # which will be np.log(2), 0.693, given the values for X + >>> # and y defined below. 
+ >>> score = make_scorer(my_custom_loss_func, greater_is_better=False) + >>> X = [[1], [1]] + >>> y = [0, 1] + >>> from sklearn.dummy import DummyClassifier + >>> clf = DummyClassifier(strategy='most_frequent', random_state=0) + >>> clf = clf.fit(X, y) + >>> my_custom_loss_func(y, clf.predict(X)) + 0.69... + >>> score(clf, X, y) + -0.69... + +.. dropdown:: Custom scorer objects from scratch + + You can generate even more flexible model scorers by constructing your own + scoring object from scratch, without using the :func:`make_scorer` factory. For a callable to be a scorer, it needs to meet the protocol specified by the following two rules: @@ -416,24 +423,24 @@ scoring object from scratch, without using the :func:`make_scorer` factory. more details. - .. note:: **Using custom scorers in functions where n_jobs > 1** +.. dropdown:: Using custom scorers in functions where n_jobs > 1 - While defining the custom scoring function alongside the calling function - should work out of the box with the default joblib backend (loky), - importing it from another module will be a more robust approach and work - independently of the joblib backend. + While defining the custom scoring function alongside the calling function + should work out of the box with the default joblib backend (loky), + importing it from another module will be a more robust approach and work + independently of the joblib backend. - For example, to use ``n_jobs`` greater than 1 in the example below, - ``custom_scoring_function`` function is saved in a user-created module - (``custom_scorer_module.py``) and imported:: + For example, to use ``n_jobs`` greater than 1 in the example below, + ``custom_scoring_function`` function is saved in a user-created module + (``custom_scorer_module.py``) and imported:: - >>> from custom_scorer_module import custom_scoring_function # doctest: +SKIP - >>> cross_val_score(model, - ... X_train, - ... y_train, - ... scoring=make_scorer(custom_scoring_function, greater_is_better=False), - ... cv=5, - ... n_jobs=-1) # doctest: +SKIP + >>> from custom_scorer_module import custom_scoring_function # doctest: +SKIP + >>> cross_val_score(model, + ... X_train, + ... y_train, + ... scoring=make_scorer(custom_scoring_function, greater_is_better=False), + ... cv=5, + ... n_jobs=-1) # doctest: +SKIP .. _multimetric_scoring: @@ -3093,7 +3100,7 @@ display. .. _clustering_metrics: Clustering metrics -====================== +================== .. currentmodule:: sklearn.metrics From 68ba8610a37f0a3b4b7df8d72ae14ed71344d33a Mon Sep 17 00:00:00 2001 From: Lucy Liu Date: Wed, 27 Nov 2024 15:27:49 +1100 Subject: [PATCH 7/9] fix typos --- doc/modules/model_evaluation.rst | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/doc/modules/model_evaluation.rst b/doc/modules/model_evaluation.rst index b01952a14e7b7..5b282a28afa76 100644 --- a/doc/modules/model_evaluation.rst +++ b/doc/modules/model_evaluation.rst @@ -288,6 +288,7 @@ the `scoring` parameter. This can be done by: * :ref:`scoring_adapt_metric` (least flexible) * :ref:`scoring_make_scorer` + * Using `make_scorer` (more flexible) * From scratch (most flexible) @@ -345,9 +346,7 @@ Creating a custom scorer object ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ You can create your own custom scorer object using -:func:`make_scorer` or for the most flexibility, from scratch. - -Custom scorer objects using `make_scorer` +:func:`make_scorer` or for the most flexibility, from scratch. See below for details. .. 
dropdown:: Custom scorer objects using `make_scorer` From 84cba213f65dc09b0bbfbbf0798335e282c8ff73 Mon Sep 17 00:00:00 2001 From: Lucy Liu Date: Wed, 27 Nov 2024 16:02:57 +1100 Subject: [PATCH 8/9] fixes --- doc/modules/model_evaluation.rst | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/doc/modules/model_evaluation.rst b/doc/modules/model_evaluation.rst index 5b282a28afa76..dacdb19a0111c 100644 --- a/doc/modules/model_evaluation.rst +++ b/doc/modules/model_evaluation.rst @@ -286,11 +286,8 @@ Callable scorers For more complex use cases and more flexibility, you can pass a callable to the `scoring` parameter. This can be done by: -* :ref:`scoring_adapt_metric` (least flexible) -* :ref:`scoring_make_scorer` - - * Using `make_scorer` (more flexible) - * From scratch (most flexible) +* :ref:`scoring_adapt_metric` +* :ref:`scoring_custom` (most flexible) .. _scoring_adapt_metric: @@ -340,7 +337,7 @@ measuring a prediction error given ground truth and prediction: the ``greater_is_better`` parameter to ``False`` (``True`` by default; see the parameter description below). -.. _scoring_make_scorer: +.. _scoring_custom: Creating a custom scorer object ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ From bb03495671bf00079bcdc00a11f80cf9278cee40 Mon Sep 17 00:00:00 2001 From: Lucy Liu Date: Fri, 29 Nov 2024 09:39:55 +1100 Subject: [PATCH 9/9] review --- sklearn/model_selection/_validation.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/sklearn/model_selection/_validation.py b/sklearn/model_selection/_validation.py index 353ed6caf9559..7d38182911fb8 100644 --- a/sklearn/model_selection/_validation.py +++ b/sklearn/model_selection/_validation.py @@ -170,7 +170,8 @@ def cross_validate( scoring : str, callable, list, tuple, or dict, default=None Strategy to evaluate the performance of the cross-validated model on the test set. If `None`, the - :ref:`default evaluation criterion ` of the estimator is used. + :ref:`default evaluation criterion ` of the estimator + is used. If `scoring` represents a single score, one can use:
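
As a closing illustration of the callable form of ``scoring`` documented in this series,
here is a minimal sketch that wraps a custom loss with :func:`make_scorer` and passes it
to :func:`model_selection.cross_validate`; the regression dataset and estimator below are
illustrative assumptions, not part of the patch::

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Ridge
    from sklearn.metrics import make_scorer
    from sklearn.model_selection import cross_validate

    def max_abs_error(y_true, y_pred):
        # A loss: smaller is better, so greater_is_better=False below makes the
        # scorer negate it, matching the higher-is-better convention.
        return np.abs(y_true - y_pred).max()

    scorer = make_scorer(max_abs_error, greater_is_better=False)

    X, y = make_regression(n_samples=200, noise=10.0, random_state=0)
    results = cross_validate(Ridge(), X, y, cv=5, scoring=scorer)
    print(results["test_score"])  # negated losses; values closer to zero are better

Since ``greater_is_better=False``, the reported `test_score` values are the negated
losses, so values closer to zero indicate a better model.
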