From be6a14a67b3dfb760312b193bd8f814d5bd39bfc Mon Sep 17 00:00:00 2001
From: Lucy Liu
Date: Thu, 21 Nov 2024 15:44:04 +1100
Subject: [PATCH 1/9] improve scoring param

---
 doc/modules/model_evaluation.rst | 162 ++++++++++++++++++-------------
 sklearn/metrics/_scorer.py       |   2 +-
 2 files changed, 95 insertions(+), 69 deletions(-)

diff --git a/doc/modules/model_evaluation.rst b/doc/modules/model_evaluation.rst
index b161014f5268f..8abc11906c82f 100644
--- a/doc/modules/model_evaluation.rst
+++ b/doc/modules/model_evaluation.rst
@@ -11,13 +11,16 @@ predictions:
 
 * **Estimator score method**: Estimators have a ``score`` method providing a
   default evaluation criterion for the problem they are designed to solve.
-  This is not discussed on this page, but in each estimator's documentation.
+  Most commonly this is mean :ref:`accuracy <accuracy_score>` for classifiers and the
+  :ref:`coefficient of determination <r2_score>` (:math:`R^2`) for regressors.
+  Details for each estimator can be found in it's documentation.
 
-* **Scoring parameter**: Model-evaluation tools using
+* **Scoring parameter**: Model-evaluation tools that use
   :ref:`cross-validation <cross_validation>` (such as
-  :func:`model_selection.cross_val_score` and
-  :class:`model_selection.GridSearchCV`) rely on an internal *scoring* strategy.
-  This is discussed in the section :ref:`scoring_parameter`.
+  :class:`model_selection.GridSearchCV`, :func:`model_selection.validation_curve` and
+  :class:`linear_model.LogisticRegressionCV`) rely on an internal *scoring* strategy.
+  This can be specified using the `scoring` parameter and is discussed in the
+  section :ref:`scoring_parameter`.
 
 * **Metric functions**: The :mod:`sklearn.metrics` module implements functions
   assessing prediction error for specific purposes. These metrics are detailed
@@ -38,24 +41,39 @@ value of those metrics for random predictions.
 The ``scoring`` parameter: defining model evaluation rules
 ==========================================================
 
-Model selection and evaluation using tools, such as
-:class:`model_selection.GridSearchCV` and
-:func:`model_selection.cross_val_score`, take a ``scoring`` parameter that
+Model selection and evaluation using tools that use
+:ref:`cross-validation <cross_validation>` (such as
+:class:`model_selection.GridSearchCV`, :func:`model_selection.validation_curve` and
+:class:`linear_model.LogisticRegressionCV`) take a ``scoring`` parameter that
 controls what metric they apply to the estimators evaluated.
 
-Common cases: predefined values
--------------------------------
+They can be specified in several ways:
+
+* `None`: the estimator's default evaluation criterion (i.e., the method used in the
+  estimators `score` method) is used.
+* :ref:`String name <scoring_string_names>`: common metrics can be passed via a string
+  name.
+* :ref:`Callable <scoring_callable>`: more complex metrics can be passed via a callable
+  (e.g., function).
+
+Some tools may also accept multiple metric evaluation. See :ref:`multimetric_scoring`
+for details.
+
+.. _scoring_string_names:
+
+Common cases: string names
+--------------------------
 
 For the most common use cases, you can designate a scorer object with the
-``scoring`` parameter; the table below shows all possible values.
+``scoring`` parameter via a string name; the table below shows all possible values.
 All scorer objects follow the convention that **higher return values are better
 than lower return values**.
Thus metrics which measure the distance between the model and the data, like :func:`metrics.mean_squared_error`, are -available as neg_mean_squared_error which return the negated value +available as 'neg_mean_squared_error' which return the negated value of the metric. ==================================== ============================================== ================================== -Scoring Function Comment +Scoring string name Function Comment ==================================== ============================================== ================================== **Classification** 'accuracy' :func:`metrics.accuracy_score` @@ -123,10 +141,17 @@ Usage examples: .. currentmodule:: sklearn.metrics -.. _scoring: +.. _scoring_callable: + +Callable scorers +---------------- + +For more more complex use cases and more flexibility, you can pass a callable to +the `scoring` parameter. Below we describe different methods of creating the callable, +in increasing order of flexibility. Defining your scoring strategy from metric functions ------------------------------------------------------ +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The following metrics functions are not implemented as named scorers, sometimes because they require additional parameters, such as @@ -171,59 +196,61 @@ measuring a prediction error given ground truth and prediction: the ``greater_is_better`` parameter to ``False`` (``True`` by default; see the parameter description below). +.. _scoring_make_scorer: + +Custom scorer objects using `make_scorer` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -.. dropdown:: Custom scorer objects - - The second use case is to build a completely custom scorer object - from a simple python function using :func:`make_scorer`, which can - take several parameters: - - * the python function you want to use (``my_custom_loss_func`` - in the example below) - - * whether the python function returns a score (``greater_is_better=True``, - the default) or a loss (``greater_is_better=False``). If a loss, the output - of the python function is negated by the scorer object, conforming to - the cross validation convention that scorers return higher values for better models. - - * for classification metrics only: whether the python function you provided requires - continuous decision certainties. If the scoring function only accepts probability - estimates (e.g. :func:`metrics.log_loss`) then one needs to set the parameter - `response_method`, thus in this case `response_method="predict_proba"`. Some scoring - function do not necessarily require probability estimates but rather non-thresholded - decision values (e.g. :func:`metrics.roc_auc_score`). In this case, one provides a - list such as `response_method=["decision_function", "predict_proba"]`. In this case, - the scorer will use the first available method, in the order given in the list, - to compute the scores. - - * any additional parameters, such as ``beta`` or ``labels`` in :func:`f1_score`. - - Here is an example of building custom scorers, and of using the - ``greater_is_better`` parameter:: - - >>> import numpy as np - >>> def my_custom_loss_func(y_true, y_pred): - ... diff = np.abs(y_true - y_pred).max() - ... return np.log1p(diff) - ... - >>> # score will negate the return value of my_custom_loss_func, - >>> # which will be np.log(2), 0.693, given the values for X - >>> # and y defined below. 
- >>> score = make_scorer(my_custom_loss_func, greater_is_better=False) - >>> X = [[1], [1]] - >>> y = [0, 1] - >>> from sklearn.dummy import DummyClassifier - >>> clf = DummyClassifier(strategy='most_frequent', random_state=0) - >>> clf = clf.fit(X, y) - >>> my_custom_loss_func(y, clf.predict(X)) - 0.69... - >>> score(clf, X, y) - -0.69... +The second use case is to build a completely custom scorer object +from a simple python function using :func:`make_scorer`, which can +take several parameters: + +* the python function you want to use (``my_custom_loss_func`` + in the example below) + +* whether the python function returns a score (``greater_is_better=True``, + the default) or a loss (``greater_is_better=False``). If a loss, the output + of the python function is negated by the scorer object, conforming to + the cross validation convention that scorers return higher values for better models. + +* for classification metrics only: whether the python function you provided requires + continuous decision certainties. If the scoring function only accepts probability + estimates (e.g. :func:`metrics.log_loss`) then one needs to set the parameter + `response_method`, thus in this case `response_method="predict_proba"`. Some scoring + function do not necessarily require probability estimates but rather non-thresholded + decision values (e.g. :func:`metrics.roc_auc_score`). In this case, one provides a + list such as `response_method=["decision_function", "predict_proba"]`. In this case, + the scorer will use the first available method, in the order given in the list, + to compute the scores. + +* any additional parameters, such as ``beta`` or ``labels`` in :func:`f1_score`. + +Here is an example of building custom scorers, and of using the +``greater_is_better`` parameter:: + + >>> import numpy as np + >>> def my_custom_loss_func(y_true, y_pred): + ... diff = np.abs(y_true - y_pred).max() + ... return np.log1p(diff) + ... + >>> # score will negate the return value of my_custom_loss_func, + >>> # which will be np.log(2), 0.693, given the values for X + >>> # and y defined below. + >>> score = make_scorer(my_custom_loss_func, greater_is_better=False) + >>> X = [[1], [1]] + >>> y = [0, 1] + >>> from sklearn.dummy import DummyClassifier + >>> clf = DummyClassifier(strategy='most_frequent', random_state=0) + >>> clf = clf.fit(X, y) + >>> my_custom_loss_func(y, clf.predict(X)) + 0.69... + >>> score(clf, X, y) + -0.69... .. _diy_scoring: -Implementing your own scoring object ------------------------------------- +Custom scorer objects from scratch +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ You can generate even more flexible model scorers by constructing your own scoring object from scratch, without using the :func:`make_scorer` factory. @@ -2934,10 +2961,9 @@ Clustering metrics .. currentmodule:: sklearn.metrics The :mod:`sklearn.metrics` module implements several loss, score, and utility -functions. For more information see the :ref:`clustering_evaluation` -section for instance clustering, and :ref:`biclustering_evaluation` for -biclustering. - +functions to measure clustering performance. For more information see the +:ref:`clustering_evaluation` section for instance clustering, and +:ref:`biclustering_evaluation` for biclustering. .. 
_dummy_estimators: diff --git a/sklearn/metrics/_scorer.py b/sklearn/metrics/_scorer.py index bc8c3a09a320c..3a565c67e10ab 100644 --- a/sklearn/metrics/_scorer.py +++ b/sklearn/metrics/_scorer.py @@ -640,7 +640,7 @@ def make_scorer( The parameter `response_method` allows to specify which method of the estimator should be used to feed the scoring/loss function. - Read more in the :ref:`User Guide `. + Read more in the :ref:`User Guide `. Parameters ---------- From 491982a8770fb779fc80d656870ec597d342630e Mon Sep 17 00:00:00 2001 From: Lucy Liu Date: Thu, 21 Nov 2024 16:03:16 +1100 Subject: [PATCH 2/9] wording --- doc/modules/model_evaluation.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/modules/model_evaluation.rst b/doc/modules/model_evaluation.rst index 8abc11906c82f..e20c3b6b24270 100644 --- a/doc/modules/model_evaluation.rst +++ b/doc/modules/model_evaluation.rst @@ -19,8 +19,8 @@ predictions: :ref:`cross-validation ` (such as :class:`model_selection.GridSearchCV`, :func:`model_selection.validation_curve` and :class:`linear_model.LogisticRegressionCV`) rely on an internal *scoring* strategy. - This can be specified using the `scoring` parameter and is discussed in the - section :ref:`scoring_parameter`. + This can be specified using the `scoring` parameter of that tool and is discussed + in the section :ref:`scoring_parameter`. * **Metric functions**: The :mod:`sklearn.metrics` module implements functions assessing prediction error for specific purposes. These metrics are detailed From 853cfd80353386e5c0953f8a76b3f7b6575a127b Mon Sep 17 00:00:00 2001 From: Lucy Liu Date: Thu, 21 Nov 2024 16:03:54 +1100 Subject: [PATCH 3/9] wording --- doc/modules/model_evaluation.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/modules/model_evaluation.rst b/doc/modules/model_evaluation.rst index e20c3b6b24270..f2473dd169ada 100644 --- a/doc/modules/model_evaluation.rst +++ b/doc/modules/model_evaluation.rst @@ -49,7 +49,7 @@ controls what metric they apply to the estimators evaluated. They can be specified in several ways: -* `None`: the estimator's default evaluation criterion (i.e., the method used in the +* `None`: the estimator's default evaluation criterion (i.e., the metric used in the estimators `score` method) is used. * :ref:`String name `: common metrics can be passed via a string name. From f09f35307eb1361fb45f5c2fdafd97439658906f Mon Sep 17 00:00:00 2001 From: Lucy Liu Date: Thu, 21 Nov 2024 16:36:08 +1100 Subject: [PATCH 4/9] wip --- doc/modules/model_evaluation.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/modules/model_evaluation.rst b/doc/modules/model_evaluation.rst index f2473dd169ada..cbc64ca89aa4d 100644 --- a/doc/modules/model_evaluation.rst +++ b/doc/modules/model_evaluation.rst @@ -61,8 +61,8 @@ for details. .. _scoring_string_names: -Common cases: string names --------------------------- +Scoring parameter: string names +------------------------------- For the most common use cases, you can designate a scorer object with the ``scoring`` parameter via a string name; the table below shows all possible values. 
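
To make the `None` and string-name forms of ``scoring`` introduced in the patches above
concrete, here is a minimal, self-contained sketch; the dataset, estimator and metric
choice are illustrative assumptions rather than part of the patch::

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=100, random_state=0)
    clf = LogisticRegression(max_iter=1000, random_state=0)

    # scoring=None falls back to the estimator's own `score` method
    # (mean accuracy for a classifier).
    default_scores = cross_val_score(clf, X, y, cv=5, scoring=None)

    # A string name selects a predefined scorer from the table, e.g. balanced accuracy.
    balanced_scores = cross_val_score(clf, X, y, cv=5, scoring="balanced_accuracy")

    print(default_scores.mean(), balanced_scores.mean())

Because every built-in scorer follows the higher-is-better convention, the two sets of
cross-validation scores are directly comparable.
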
From 69f23de4dec6b4829b99aced63188772f51f76ca Mon Sep 17 00:00:00 2001 From: Lucy Liu Date: Thu, 21 Nov 2024 16:40:08 +1100 Subject: [PATCH 5/9] fix cross ref --- doc/modules/classification_threshold.rst | 2 +- doc/modules/model_evaluation.rst | 4 ++-- sklearn/feature_selection/_sequential.py | 2 +- sklearn/inspection/_permutation_importance.py | 2 +- sklearn/metrics/_scorer.py | 4 ++-- sklearn/model_selection/_plot.py | 4 ++-- sklearn/model_selection/_search.py | 4 ++-- sklearn/model_selection/_search_successive_halving.py | 4 ++-- sklearn/model_selection/_validation.py | 4 ++-- 9 files changed, 15 insertions(+), 15 deletions(-) diff --git a/doc/modules/classification_threshold.rst b/doc/modules/classification_threshold.rst index 8b3e6e3a68438..9adf846e75cba 100644 --- a/doc/modules/classification_threshold.rst +++ b/doc/modules/classification_threshold.rst @@ -97,7 +97,7 @@ a meaningful metric for their use case. the label of the class of interest (i.e. `pos_label`). Thus, if this label is not the right one for your application, you need to define a scorer and pass the right `pos_label` (and additional parameters) using the - :func:`~sklearn.metrics.make_scorer`. Refer to :ref:`scoring` to get + :func:`~sklearn.metrics.make_scorer`. Refer to :ref:`scoring_callable` to get information to define your own scoring function. For instance, we show how to pass the information to the scorer that the label of interest is `0` when maximizing the :func:`~sklearn.metrics.f1_score`:: diff --git a/doc/modules/model_evaluation.rst b/doc/modules/model_evaluation.rst index cbc64ca89aa4d..4473005c32514 100644 --- a/doc/modules/model_evaluation.rst +++ b/doc/modules/model_evaluation.rst @@ -61,8 +61,8 @@ for details. .. _scoring_string_names: -Scoring parameter: string names -------------------------------- +String name scorers +------------------- For the most common use cases, you can designate a scorer object with the ``scoring`` parameter via a string name; the table below shows all possible values. diff --git a/sklearn/feature_selection/_sequential.py b/sklearn/feature_selection/_sequential.py index ac5f13fd00e7d..bd1e27efef60b 100644 --- a/sklearn/feature_selection/_sequential.py +++ b/sklearn/feature_selection/_sequential.py @@ -78,7 +78,7 @@ class SequentialFeatureSelector(SelectorMixin, MetaEstimatorMixin, BaseEstimator scoring : str or callable, default=None A single str (see :ref:`scoring_parameter`) or a callable - (see :ref:`scoring`) to evaluate the predictions on the test set. + (see :ref:`scoring_callable`) to evaluate the predictions on the test set. NOTE that when using a custom scorer, it should return a single value. diff --git a/sklearn/inspection/_permutation_importance.py b/sklearn/inspection/_permutation_importance.py index fb3c646a271a6..74000aa9e8556 100644 --- a/sklearn/inspection/_permutation_importance.py +++ b/sklearn/inspection/_permutation_importance.py @@ -177,7 +177,7 @@ def permutation_importance( If `scoring` represents a single score, one can use: - a single string (see :ref:`scoring_parameter`); - - a callable (see :ref:`scoring`) that returns a single value. + - a callable (see :ref:`scoring_callable`) that returns a single value. 
If `scoring` represents multiple scores, one can use: diff --git a/sklearn/metrics/_scorer.py b/sklearn/metrics/_scorer.py index 3a565c67e10ab..fb173cd096a43 100644 --- a/sklearn/metrics/_scorer.py +++ b/sklearn/metrics/_scorer.py @@ -640,7 +640,7 @@ def make_scorer( The parameter `response_method` allows to specify which method of the estimator should be used to feed the scoring/loss function. - Read more in the :ref:`User Guide `. + Read more in the :ref:`User Guide `. Parameters ---------- @@ -933,7 +933,7 @@ def check_scoring(estimator=None, scoring=None, *, allow_none=False, raise_exc=T Scorer to use. If `scoring` represents a single score, one can use: - a single string (see :ref:`scoring_parameter`); - - a callable (see :ref:`scoring`) that returns a single value. + - a callable (see :ref:`scoring_callable`) that returns a single value. If `scoring` represents multiple scores, one can use: diff --git a/sklearn/model_selection/_plot.py b/sklearn/model_selection/_plot.py index b16e0f4c1019a..8cae3dc97d2c5 100644 --- a/sklearn/model_selection/_plot.py +++ b/sklearn/model_selection/_plot.py @@ -369,7 +369,7 @@ def from_estimator( scoring : str or callable, default=None A string (see :ref:`scoring_parameter`) or a scorer callable object / function with signature - `scorer(estimator, X, y)` (see :ref:`scoring`). + `scorer(estimator, X, y)` (see :ref:`scoring_callable`). exploit_incremental_learning : bool, default=False If the estimator supports incremental learning, this will be @@ -752,7 +752,7 @@ def from_estimator( scoring : str or callable, default=None A string (see :ref:`scoring_parameter`) or a scorer callable object / function with signature - `scorer(estimator, X, y)` (see :ref:`scoring`). + `scorer(estimator, X, y)` (see :ref:`scoring_callable`). n_jobs : int, default=None Number of jobs to run in parallel. Training the estimator and diff --git a/sklearn/model_selection/_search.py b/sklearn/model_selection/_search.py index 7515436af33da..4c05098269521 100644 --- a/sklearn/model_selection/_search.py +++ b/sklearn/model_selection/_search.py @@ -1247,7 +1247,7 @@ class GridSearchCV(BaseSearchCV): If `scoring` represents a single score, one can use: - a single string (see :ref:`scoring_parameter`); - - a callable (see :ref:`scoring`) that returns a single value. + - a callable (see :ref:`scoring_callable`) that returns a single value. If `scoring` represents multiple scores, one can use: @@ -1623,7 +1623,7 @@ class RandomizedSearchCV(BaseSearchCV): If `scoring` represents a single score, one can use: - a single string (see :ref:`scoring_parameter`); - - a callable (see :ref:`scoring`) that returns a single value. + - a callable (see :ref:`scoring_callable`) that returns a single value. If `scoring` represents multiple scores, one can use: diff --git a/sklearn/model_selection/_search_successive_halving.py b/sklearn/model_selection/_search_successive_halving.py index 5ff5f1198121a..55073df14bfc1 100644 --- a/sklearn/model_selection/_search_successive_halving.py +++ b/sklearn/model_selection/_search_successive_halving.py @@ -480,7 +480,7 @@ class HalvingGridSearchCV(BaseSuccessiveHalving): scoring : str, callable, or None, default=None A single string (see :ref:`scoring_parameter`) or a callable - (see :ref:`scoring`) to evaluate the predictions on the test set. + (see :ref:`scoring_callable`) to evaluate the predictions on the test set. If None, the estimator's score method is used. 
refit : bool, default=True @@ -821,7 +821,7 @@ class HalvingRandomSearchCV(BaseSuccessiveHalving): scoring : str, callable, or None, default=None A single string (see :ref:`scoring_parameter`) or a callable - (see :ref:`scoring`) to evaluate the predictions on the test set. + (see :ref:`scoring_callable`) to evaluate the predictions on the test set. If None, the estimator's score method is used. refit : bool, default=True diff --git a/sklearn/model_selection/_validation.py b/sklearn/model_selection/_validation.py index dddc0cce795af..353ed6caf9559 100644 --- a/sklearn/model_selection/_validation.py +++ b/sklearn/model_selection/_validation.py @@ -175,7 +175,7 @@ def cross_validate( If `scoring` represents a single score, one can use: - a single string (see :ref:`scoring_parameter`); - - a callable (see :ref:`scoring`) that returns a single value. + - a callable (see :ref:`scoring_callable`) that returns a single value. If `scoring` represents multiple scores, one can use: @@ -1562,7 +1562,7 @@ def permutation_test_score( scoring : str or callable, default=None A single str (see :ref:`scoring_parameter`) or a callable - (see :ref:`scoring`) to evaluate the predictions on the test set. + (see :ref:`scoring_callable`) to evaluate the predictions on the test set. If `None` the estimator's score method is used. From 836a263a7b2eae822e21fa51552499f83d6118df Mon Sep 17 00:00:00 2001 From: Lucy Liu Date: Wed, 27 Nov 2024 14:29:57 +1100 Subject: [PATCH 6/9] reviews --- doc/modules/model_evaluation.rst | 181 ++++++++++++++++--------------- 1 file changed, 94 insertions(+), 87 deletions(-) diff --git a/doc/modules/model_evaluation.rst b/doc/modules/model_evaluation.rst index 96ae43eb75a4f..b01952a14e7b7 100644 --- a/doc/modules/model_evaluation.rst +++ b/doc/modules/model_evaluation.rst @@ -148,9 +148,9 @@ predictions: * **Estimator score method**: Estimators have a ``score`` method providing a default evaluation criterion for the problem they are designed to solve. - Most commonly this is mean :ref:`accuracy ` for classifiers and the + Most commonly this is :ref:`accuracy ` for classifiers and the :ref:`coefficient of determination ` (:math:`R^2`) for regressors. - Details for each estimator can be found in it's documentation. + Details for each estimator can be found in its documentation. * **Scoring parameter**: Model-evaluation tools that use :ref:`cross-validation ` (such as @@ -178,7 +178,7 @@ value of those metrics for random predictions. The ``scoring`` parameter: defining model evaluation rules ========================================================== -Model selection and evaluation using tools that use +Model selection and evaluation tools that internally use :ref:`cross-validation ` (such as :class:`model_selection.GridSearchCV`, :func:`model_selection.validation_curve` and :class:`linear_model.LogisticRegressionCV`) take a ``scoring`` parameter that @@ -187,13 +187,13 @@ controls what metric they apply to the estimators evaluated. They can be specified in several ways: * `None`: the estimator's default evaluation criterion (i.e., the metric used in the - estimators `score` method) is used. + estimator's `score` method) is used. * :ref:`String name `: common metrics can be passed via a string name. -* :ref:`Callable `: more complex metrics can be passed via a callable - (e.g., function). +* :ref:`Callable `: more complex metrics can be passed via a custom + metric callable (e.g., function). -Some tools may also accept multiple metric evaluation. 
See :ref:`multimetric_scoring` +Some tools do also accept multiple metric evaluation. See :ref:`multimetric_scoring` for details. .. _scoring_string_names: @@ -204,7 +204,7 @@ String name scorers For the most common use cases, you can designate a scorer object with the ``scoring`` parameter via a string name; the table below shows all possible values. All scorer objects follow the convention that **higher return values are better -than lower return values**. Thus metrics which measure the distance between +than lower return values**. Thus metrics which measure the distance between the model and the data, like :func:`metrics.mean_squared_error`, are available as 'neg_mean_squared_error' which return the negated value of the metric. @@ -283,14 +283,20 @@ Usage examples: Callable scorers ---------------- -For more more complex use cases and more flexibility, you can pass a callable to -the `scoring` parameter. Below we describe different methods of creating the callable, -in increasing order of flexibility. +For more complex use cases and more flexibility, you can pass a callable to +the `scoring` parameter. This can be done by: -Defining your scoring strategy from metric functions -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +* :ref:`scoring_adapt_metric` (least flexible) +* :ref:`scoring_make_scorer` + * Using `make_scorer` (more flexible) + * From scratch (most flexible) -The following metrics functions are not implemented as named scorers, +.. _scoring_adapt_metric: + +Adapting predefined metrics via `make_scorer` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The following metric functions are not implemented as named scorers, sometimes because they require additional parameters, such as :func:`fbeta_score`. They cannot be passed to the ``scoring`` parameters; instead their callable needs to be passed to @@ -328,72 +334,73 @@ measuring a prediction error given ground truth and prediction: maximize, the higher the better. - functions ending with ``_error``, ``_loss``, or ``_deviance`` return a - value to minimize, the lower the better. When converting + value to minimize, the lower the better. When converting into a scorer object using :func:`make_scorer`, set the ``greater_is_better`` parameter to ``False`` (``True`` by default; see the parameter description below). .. _scoring_make_scorer: -Custom scorer objects using `make_scorer` -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -The second use case is to build a completely custom scorer object -from a simple python function using :func:`make_scorer`, which can -take several parameters: - -* the python function you want to use (``my_custom_loss_func`` - in the example below) +Creating a custom scorer object +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -* whether the python function returns a score (``greater_is_better=True``, - the default) or a loss (``greater_is_better=False``). If a loss, the output - of the python function is negated by the scorer object, conforming to - the cross validation convention that scorers return higher values for better models. +You can create your own custom scorer object using +:func:`make_scorer` or for the most flexibility, from scratch. -* for classification metrics only: whether the python function you provided requires - continuous decision certainties. If the scoring function only accepts probability - estimates (e.g. :func:`metrics.log_loss`) then one needs to set the parameter - `response_method`, thus in this case `response_method="predict_proba"`. 
Some scoring - function do not necessarily require probability estimates but rather non-thresholded - decision values (e.g. :func:`metrics.roc_auc_score`). In this case, one provides a - list such as `response_method=["decision_function", "predict_proba"]`. In this case, - the scorer will use the first available method, in the order given in the list, - to compute the scores. - -* any additional parameters, such as ``beta`` or ``labels`` in :func:`f1_score`. - -Here is an example of building custom scorers, and of using the -``greater_is_better`` parameter:: +Custom scorer objects using `make_scorer` - >>> import numpy as np - >>> def my_custom_loss_func(y_true, y_pred): - ... diff = np.abs(y_true - y_pred).max() - ... return np.log1p(diff) - ... - >>> # score will negate the return value of my_custom_loss_func, - >>> # which will be np.log(2), 0.693, given the values for X - >>> # and y defined below. - >>> score = make_scorer(my_custom_loss_func, greater_is_better=False) - >>> X = [[1], [1]] - >>> y = [0, 1] - >>> from sklearn.dummy import DummyClassifier - >>> clf = DummyClassifier(strategy='most_frequent', random_state=0) - >>> clf = clf.fit(X, y) - >>> my_custom_loss_func(y, clf.predict(X)) - 0.69... - >>> score(clf, X, y) - -0.69... - -.. _diy_scoring: - -Custom scorer objects from scratch -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -You can generate even more flexible model scorers by constructing your own -scoring object from scratch, without using the :func:`make_scorer` factory. - - -.. dropdown:: How to build a scorer from scratch +.. dropdown:: Custom scorer objects using `make_scorer` + + You can build a completely custom scorer object + from a simple python function using :func:`make_scorer`, which can + take several parameters: + + * the python function you want to use (``my_custom_loss_func`` + in the example below) + + * whether the python function returns a score (``greater_is_better=True``, + the default) or a loss (``greater_is_better=False``). If a loss, the output + of the python function is negated by the scorer object, conforming to + the cross validation convention that scorers return higher values for better models. + + * for classification metrics only: whether the python function you provided requires + continuous decision certainties. If the scoring function only accepts probability + estimates (e.g. :func:`metrics.log_loss`), then one needs to set the parameter + `response_method="predict_proba"`. Some scoring + functions do not necessarily require probability estimates but rather non-thresholded + decision values (e.g. :func:`metrics.roc_auc_score`). In this case, one can provide a + list (e.g., `response_method=["decision_function", "predict_proba"]`), + and scorer will use the first available method, in the order given in the list, + to compute the scores. + + * any additional parameters of the scoring function, such as ``beta`` or ``labels``. + + Here is an example of building custom scorers, and of using the + ``greater_is_better`` parameter:: + + >>> import numpy as np + >>> def my_custom_loss_func(y_true, y_pred): + ... diff = np.abs(y_true - y_pred).max() + ... return np.log1p(diff) + ... + >>> # score will negate the return value of my_custom_loss_func, + >>> # which will be np.log(2), 0.693, given the values for X + >>> # and y defined below. 
+ >>> score = make_scorer(my_custom_loss_func, greater_is_better=False) + >>> X = [[1], [1]] + >>> y = [0, 1] + >>> from sklearn.dummy import DummyClassifier + >>> clf = DummyClassifier(strategy='most_frequent', random_state=0) + >>> clf = clf.fit(X, y) + >>> my_custom_loss_func(y, clf.predict(X)) + 0.69... + >>> score(clf, X, y) + -0.69... + +.. dropdown:: Custom scorer objects from scratch + + You can generate even more flexible model scorers by constructing your own + scoring object from scratch, without using the :func:`make_scorer` factory. For a callable to be a scorer, it needs to meet the protocol specified by the following two rules: @@ -416,24 +423,24 @@ scoring object from scratch, without using the :func:`make_scorer` factory. more details. - .. note:: **Using custom scorers in functions where n_jobs > 1** +.. dropdown:: Using custom scorers in functions where n_jobs > 1 - While defining the custom scoring function alongside the calling function - should work out of the box with the default joblib backend (loky), - importing it from another module will be a more robust approach and work - independently of the joblib backend. + While defining the custom scoring function alongside the calling function + should work out of the box with the default joblib backend (loky), + importing it from another module will be a more robust approach and work + independently of the joblib backend. - For example, to use ``n_jobs`` greater than 1 in the example below, - ``custom_scoring_function`` function is saved in a user-created module - (``custom_scorer_module.py``) and imported:: + For example, to use ``n_jobs`` greater than 1 in the example below, + ``custom_scoring_function`` function is saved in a user-created module + (``custom_scorer_module.py``) and imported:: - >>> from custom_scorer_module import custom_scoring_function # doctest: +SKIP - >>> cross_val_score(model, - ... X_train, - ... y_train, - ... scoring=make_scorer(custom_scoring_function, greater_is_better=False), - ... cv=5, - ... n_jobs=-1) # doctest: +SKIP + >>> from custom_scorer_module import custom_scoring_function # doctest: +SKIP + >>> cross_val_score(model, + ... X_train, + ... y_train, + ... scoring=make_scorer(custom_scoring_function, greater_is_better=False), + ... cv=5, + ... n_jobs=-1) # doctest: +SKIP .. _multimetric_scoring: @@ -3093,7 +3100,7 @@ display. .. _clustering_metrics: Clustering metrics -====================== +================== .. currentmodule:: sklearn.metrics From 68ba8610a37f0a3b4b7df8d72ae14ed71344d33a Mon Sep 17 00:00:00 2001 From: Lucy Liu Date: Wed, 27 Nov 2024 15:27:49 +1100 Subject: [PATCH 7/9] fix typos --- doc/modules/model_evaluation.rst | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/doc/modules/model_evaluation.rst b/doc/modules/model_evaluation.rst index b01952a14e7b7..5b282a28afa76 100644 --- a/doc/modules/model_evaluation.rst +++ b/doc/modules/model_evaluation.rst @@ -288,6 +288,7 @@ the `scoring` parameter. This can be done by: * :ref:`scoring_adapt_metric` (least flexible) * :ref:`scoring_make_scorer` + * Using `make_scorer` (more flexible) * From scratch (most flexible) @@ -345,9 +346,7 @@ Creating a custom scorer object ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ You can create your own custom scorer object using -:func:`make_scorer` or for the most flexibility, from scratch. - -Custom scorer objects using `make_scorer` +:func:`make_scorer` or for the most flexibility, from scratch. See below for details. .. 
dropdown:: Custom scorer objects using `make_scorer` From 84cba213f65dc09b0bbfbbf0798335e282c8ff73 Mon Sep 17 00:00:00 2001 From: Lucy Liu Date: Wed, 27 Nov 2024 16:02:57 +1100 Subject: [PATCH 8/9] fixes --- doc/modules/model_evaluation.rst | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/doc/modules/model_evaluation.rst b/doc/modules/model_evaluation.rst index 5b282a28afa76..dacdb19a0111c 100644 --- a/doc/modules/model_evaluation.rst +++ b/doc/modules/model_evaluation.rst @@ -286,11 +286,8 @@ Callable scorers For more complex use cases and more flexibility, you can pass a callable to the `scoring` parameter. This can be done by: -* :ref:`scoring_adapt_metric` (least flexible) -* :ref:`scoring_make_scorer` - - * Using `make_scorer` (more flexible) - * From scratch (most flexible) +* :ref:`scoring_adapt_metric` +* :ref:`scoring_custom` (most flexible) .. _scoring_adapt_metric: @@ -340,7 +337,7 @@ measuring a prediction error given ground truth and prediction: the ``greater_is_better`` parameter to ``False`` (``True`` by default; see the parameter description below). -.. _scoring_make_scorer: +.. _scoring_custom: Creating a custom scorer object ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ From bb03495671bf00079bcdc00a11f80cf9278cee40 Mon Sep 17 00:00:00 2001 From: Lucy Liu Date: Fri, 29 Nov 2024 09:39:55 +1100 Subject: [PATCH 9/9] review --- sklearn/model_selection/_validation.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/sklearn/model_selection/_validation.py b/sklearn/model_selection/_validation.py index 353ed6caf9559..7d38182911fb8 100644 --- a/sklearn/model_selection/_validation.py +++ b/sklearn/model_selection/_validation.py @@ -170,7 +170,8 @@ def cross_validate( scoring : str, callable, list, tuple, or dict, default=None Strategy to evaluate the performance of the cross-validated model on the test set. If `None`, the - :ref:`default evaluation criterion ` of the estimator is used. + :ref:`default evaluation criterion ` of the estimator + is used. If `scoring` represents a single score, one can use:
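
As a closing illustration of the callable form of ``scoring`` documented in this series,
here is a minimal sketch that wraps a custom loss with :func:`make_scorer` and passes it
to :func:`model_selection.cross_validate`; the regression dataset and estimator below are
illustrative assumptions, not part of the patch::

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Ridge
    from sklearn.metrics import make_scorer
    from sklearn.model_selection import cross_validate

    def max_abs_error(y_true, y_pred):
        # A loss: smaller is better, so greater_is_better=False below makes the
        # scorer negate it, matching the higher-is-better convention.
        return np.abs(y_true - y_pred).max()

    scorer = make_scorer(max_abs_error, greater_is_better=False)

    X, y = make_regression(n_samples=200, noise=10.0, random_state=0)
    results = cross_validate(Ridge(), X, y, cv=5, scoring=scorer)
    print(results["test_score"])  # negated losses; values closer to zero are better

Since ``greater_is_better=False``, the reported `test_score` values are the negated
losses, so values closer to zero indicate a better model.
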