DOC fix wrong indentations in the documentation that lead to undesired blockquotes #28107

Merged: 17 commits, Jan 13, 2024
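
For context on the bug class this PR fixes: in reStructuredText, a paragraph or
list that is indented relative to the text before it is parsed as a block
quote, not as plain body text, which is why the stray indentation fixed below
rendered as unwanted blockquotes. A minimal sketch of the effect using
docutils (the parser underneath Sphinx); the sample strings are illustrative
and not taken from the PR::

    # Demonstrate that indented reST is parsed as a <block_quote>.
    # Assumes only that the docutils package is installed.
    from docutils.core import publish_doctree

    flush = "Some intro text:\n\n- item one\n- item two\n"
    indented = "Some intro text:\n\n  - item one\n  - item two\n"

    for name, source in [("flush", flush), ("indented", indented)]:
        doctree = publish_doctree(source)
        # A top-level block_quote node means the list was demoted to a quote.
        quoted = any(node.tagname == "block_quote" for node in doctree.children)
        print(f"{name}: block_quote present -> {quoted}")

    # Prints:
    #   flush: block_quote present -> False
    #   indented: block_quote present -> True
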
68 changes: 34 additions & 34 deletions doc/about.rst
@@ -96,44 +96,44 @@ Citing scikit-learn
If you use scikit-learn in a scientific publication, we would appreciate
citations to the following paper:

-  `Scikit-learn: Machine Learning in Python
-  <https://jmlr.csail.mit.edu/papers/v12/pedregosa11a.html>`_, Pedregosa
-  *et al.*, JMLR 12, pp. 2825-2830, 2011.
-
-  Bibtex entry::
-
-    @article{scikit-learn,
-     title={Scikit-learn: Machine Learning in {P}ython},
-     author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
-             and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
-             and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
-             Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
-     journal={Journal of Machine Learning Research},
-     volume={12},
-     pages={2825--2830},
-     year={2011}
-    }
+`Scikit-learn: Machine Learning in Python
+<https://jmlr.csail.mit.edu/papers/v12/pedregosa11a.html>`_, Pedregosa
+*et al.*, JMLR 12, pp. 2825-2830, 2011.
+
+Bibtex entry::
+
+  @article{scikit-learn,
+   title={Scikit-learn: Machine Learning in {P}ython},
+   author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
+           and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
+           and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
+           Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
+   journal={Journal of Machine Learning Research},
+   volume={12},
+   pages={2825--2830},
+   year={2011}
+  }

If you want to cite scikit-learn for its API or design, you may also want to consider the
following paper:

-  :arxiv:`API design for machine learning software: experiences from the scikit-learn
-  project <1309.0238>`, Buitinck *et al.*, 2013.
-
-  Bibtex entry::
-
-    @inproceedings{sklearn_api,
-      author = {Lars Buitinck and Gilles Louppe and Mathieu Blondel and
-                Fabian Pedregosa and Andreas Mueller and Olivier Grisel and
-                Vlad Niculae and Peter Prettenhofer and Alexandre Gramfort
-                and Jaques Grobler and Robert Layton and Jake VanderPlas and
-                Arnaud Joly and Brian Holt and Ga{\"{e}}l Varoquaux},
-      title = {{API} design for machine learning software: experiences from the scikit-learn
-               project},
-      booktitle = {ECML PKDD Workshop: Languages for Data Mining and Machine Learning},
-      year = {2013},
-      pages = {108--122},
-    }
+:arxiv:`API design for machine learning software: experiences from the scikit-learn
+project <1309.0238>`, Buitinck *et al.*, 2013.
+
+Bibtex entry::
+
+  @inproceedings{sklearn_api,
+    author = {Lars Buitinck and Gilles Louppe and Mathieu Blondel and
+              Fabian Pedregosa and Andreas Mueller and Olivier Grisel and
+              Vlad Niculae and Peter Prettenhofer and Alexandre Gramfort
+              and Jaques Grobler and Robert Layton and Jake VanderPlas and
+              Arnaud Joly and Brian Holt and Ga{\"{e}}l Varoquaux},
+    title = {{API} design for machine learning software: experiences from the scikit-learn
+             project},
+    booktitle = {ECML PKDD Workshop: Languages for Data Mining and Machine Learning},
+    year = {2013},
+    pages = {108--122},
+  }

Artwork
-------
28 changes: 15 additions & 13 deletions doc/computing/computational_performance.rst
@@ -39,10 +39,11 @@ machine learning toolkit is the latency at which predictions can be made in a
production environment.

The main factors that influence the prediction latency are
-1. Number of features
-2. Input data representation and sparsity
-3. Model complexity
-4. Feature extraction
+
+1. Number of features
+2. Input data representation and sparsity
+3. Model complexity
+4. Feature extraction

A last major parameter is also the possibility to do predictions in bulk or
one-at-a-time mode.
@@ -224,9 +225,9 @@ files, tokenizing the text and hashing it into a common vector space) is
taking 100 to 500 times more time than the actual prediction code, depending on
the chosen model.

-  .. |prediction_time| image:: ../auto_examples/applications/images/sphx_glr_plot_out_of_core_classification_004.png
-     :target: ../auto_examples/applications/plot_out_of_core_classification.html
-     :scale: 80
+.. |prediction_time| image:: ../auto_examples/applications/images/sphx_glr_plot_out_of_core_classification_004.png
+   :target: ../auto_examples/applications/plot_out_of_core_classification.html
+   :scale: 80

.. centered:: |prediction_time|

@@ -283,10 +284,11 @@ scikit-learn install with the following command::
python -c "import sklearn; sklearn.show_versions()"

Optimized BLAS / LAPACK implementations include:
-- Atlas (need hardware specific tuning by rebuilding on the target machine)
-- OpenBLAS
-- MKL
-- Apple Accelerate and vecLib frameworks (OSX only)
+
+- Atlas (need hardware specific tuning by rebuilding on the target machine)
+- OpenBLAS
+- MKL
+- Apple Accelerate and vecLib frameworks (OSX only)

More information can be found on the `NumPy install page <https://numpy.org/install/>`_
and in this
@@ -364,5 +366,5 @@ sufficient to not generate the relevant features, leaving their columns empty.
Links
......

-  - :ref:`scikit-learn developer performance documentation <performance-howto>`
-  - `Scipy sparse matrix formats documentation <https://docs.scipy.org/doc/scipy/reference/sparse.html>`_
+- :ref:`scikit-learn developer performance documentation <performance-howto>`
+- `Scipy sparse matrix formats documentation <https://docs.scipy.org/doc/scipy/reference/sparse.html>`_
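
The bulk-versus-atomic trade-off this file discusses is easy to measure
directly. A rough sketch, assuming scikit-learn and NumPy are installed; the
model choice and data sizes are arbitrary illustrations, not taken from the
docs::

    # Compare predicting 1000 rows one-at-a-time (atomic) with a single
    # vectorized call (bulk). Bulk mode amortizes Python and input-validation
    # overhead across rows, so its per-row latency is typically far lower.
    import time

    import numpy as np
    from sklearn.linear_model import SGDRegressor

    rng = np.random.RandomState(0)
    X = rng.rand(1000, 50)
    y = rng.rand(1000)
    model = SGDRegressor().fit(X, y)

    start = time.perf_counter()
    for i in range(X.shape[0]):
        model.predict(X[i:i + 1])  # one row per call
    atomic = time.perf_counter() - start

    start = time.perf_counter()
    model.predict(X)  # all rows in one call
    bulk = time.perf_counter() - start

    print(f"atomic: {atomic:.4f}s  bulk: {bulk:.4f}s")
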
20 changes: 10 additions & 10 deletions doc/computing/parallelism.rst
@@ -87,15 +87,15 @@ will use as many threads as possible, i.e. as many threads as logical cores.

You can control the exact number of threads that are used either:

-  - via the ``OMP_NUM_THREADS`` environment variable, for instance when:
-    running a python script:
+- via the ``OMP_NUM_THREADS`` environment variable, for instance when:
+  running a python script:

-    .. prompt:: bash $
+  .. prompt:: bash $

-        OMP_NUM_THREADS=4 python my_script.py
+    OMP_NUM_THREADS=4 python my_script.py

-  - or via `threadpoolctl` as explained by `this piece of documentation
-    <https://github.com/joblib/threadpoolctl/#setting-the-maximum-size-of-thread-pools>`_.
+- or via `threadpoolctl` as explained by `this piece of documentation
+  <https://github.com/joblib/threadpoolctl/#setting-the-maximum-size-of-thread-pools>`_.

Parallel NumPy and SciPy routines from numerical libraries
..........................................................
@@ -107,15 +107,15 @@ such as MKL, OpenBLAS or BLIS.
You can control the exact number of threads used by BLAS for each library
using environment variables, namely:

-  - ``MKL_NUM_THREADS`` sets the number of thread MKL uses,
-  - ``OPENBLAS_NUM_THREADS`` sets the number of threads OpenBLAS uses
-  - ``BLIS_NUM_THREADS`` sets the number of threads BLIS uses
+- ``MKL_NUM_THREADS`` sets the number of thread MKL uses,
+- ``OPENBLAS_NUM_THREADS`` sets the number of threads OpenBLAS uses
+- ``BLIS_NUM_THREADS`` sets the number of threads BLIS uses

Note that BLAS & LAPACK implementations can also be impacted by
`OMP_NUM_THREADS`. To check whether this is the case in your environment,
you can inspect how the number of threads effectively used by those libraries
is affected when running the following command in a bash or zsh terminal
-for different values of `OMP_NUM_THREADS`::
+for different values of `OMP_NUM_THREADS`:

.. prompt:: bash $

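As a programmatic companion to the environment variables discussed in the
hunks above, thread pools can also be inspected and capped at runtime with
`threadpoolctl`; a sketch, assuming the threadpoolctl package is installed::

    # List the BLAS/OpenMP thread pools loaded in this process, then run a
    # computation under a temporary thread cap -- the in-process counterpart
    # of exporting OMP_NUM_THREADS before launching the interpreter.
    import numpy as np
    from threadpoolctl import threadpool_info, threadpool_limits

    for pool in threadpool_info():
        print(pool["user_api"], pool["internal_api"], "threads:", pool["num_threads"])

    a = np.random.rand(2000, 2000)
    with threadpool_limits(limits=4):  # cap every pool at 4 threads
        a @ a  # BLAS-backed matmul now uses at most 4 threads
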
52 changes: 26 additions & 26 deletions doc/computing/scaling_strategies.rst
@@ -20,9 +20,9 @@ data that cannot fit in a computer's main memory (RAM).

Here is a sketch of a system designed to achieve this goal:

-  1. a way to stream instances
-  2. a way to extract features from instances
-  3. an incremental algorithm
+1. a way to stream instances
+2. a way to extract features from instances
+3. an incremental algorithm

Streaming instances
....................
@@ -62,29 +62,29 @@ balances relevancy and memory footprint could involve some tuning [1]_.

Here is a list of incremental estimators for different tasks:

-  - Classification
-    + :class:`sklearn.naive_bayes.MultinomialNB`
-    + :class:`sklearn.naive_bayes.BernoulliNB`
-    + :class:`sklearn.linear_model.Perceptron`
-    + :class:`sklearn.linear_model.SGDClassifier`
-    + :class:`sklearn.linear_model.PassiveAggressiveClassifier`
-    + :class:`sklearn.neural_network.MLPClassifier`
-  - Regression
-    + :class:`sklearn.linear_model.SGDRegressor`
-    + :class:`sklearn.linear_model.PassiveAggressiveRegressor`
-    + :class:`sklearn.neural_network.MLPRegressor`
-  - Clustering
-    + :class:`sklearn.cluster.MiniBatchKMeans`
-    + :class:`sklearn.cluster.Birch`
-  - Decomposition / feature Extraction
-    + :class:`sklearn.decomposition.MiniBatchDictionaryLearning`
-    + :class:`sklearn.decomposition.IncrementalPCA`
-    + :class:`sklearn.decomposition.LatentDirichletAllocation`
-    + :class:`sklearn.decomposition.MiniBatchNMF`
-  - Preprocessing
-    + :class:`sklearn.preprocessing.StandardScaler`
-    + :class:`sklearn.preprocessing.MinMaxScaler`
-    + :class:`sklearn.preprocessing.MaxAbsScaler`
+- Classification
+  + :class:`sklearn.naive_bayes.MultinomialNB`
+  + :class:`sklearn.naive_bayes.BernoulliNB`
+  + :class:`sklearn.linear_model.Perceptron`
+  + :class:`sklearn.linear_model.SGDClassifier`
+  + :class:`sklearn.linear_model.PassiveAggressiveClassifier`
+  + :class:`sklearn.neural_network.MLPClassifier`
+- Regression
+  + :class:`sklearn.linear_model.SGDRegressor`
+  + :class:`sklearn.linear_model.PassiveAggressiveRegressor`
+  + :class:`sklearn.neural_network.MLPRegressor`
+- Clustering
+  + :class:`sklearn.cluster.MiniBatchKMeans`
+  + :class:`sklearn.cluster.Birch`
+- Decomposition / feature Extraction
+  + :class:`sklearn.decomposition.MiniBatchDictionaryLearning`
+  + :class:`sklearn.decomposition.IncrementalPCA`
+  + :class:`sklearn.decomposition.LatentDirichletAllocation`
+  + :class:`sklearn.decomposition.MiniBatchNMF`
+- Preprocessing
+  + :class:`sklearn.preprocessing.StandardScaler`
+  + :class:`sklearn.preprocessing.MinMaxScaler`
+  + :class:`sklearn.preprocessing.MaxAbsScaler`

For classification, a somewhat important thing to note is that although a
stateless feature extraction routine may be able to cope with new/unseen
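
The three ingredients listed at the top of this file (stream, extract,
learn incrementally) fit together in a few lines. A toy sketch with
hard-coded mini-batches standing in for a real stream; everything here is
illustrative::

    # Out-of-core text classification: hash each mini-batch into a fixed
    # vector space with the stateless HashingVectorizer, then update an
    # incremental SGDClassifier via partial_fit.
    from sklearn.feature_extraction.text import HashingVectorizer
    from sklearn.linear_model import SGDClassifier

    vectorizer = HashingVectorizer(n_features=2**18)  # stateless: no fit needed
    clf = SGDClassifier()
    classes = [0, 1]  # every class must be declared on the first partial_fit

    # Stand-in for a generator that streams batches from disk or the network.
    batches = [
        (["good product", "bad service"], [1, 0]),
        (["terrible support", "great value"], [0, 1]),
    ]

    for texts, labels in batches:
        X = vectorizer.transform(texts)
        clf.partial_fit(X, labels, classes=classes)

    print(clf.predict(vectorizer.transform(["great product"])))
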
18 changes: 9 additions & 9 deletions doc/developers/bug_triaging.rst
@@ -19,18 +19,18 @@ A third party can give useful feedback or even add
comments on the issue.
The following actions are typically useful:

-  - documenting issues that are missing elements to reproduce the problem
-    such as code samples
+- documenting issues that are missing elements to reproduce the problem
+  such as code samples

-  - suggesting better use of code formatting
+- suggesting better use of code formatting

-  - suggesting to reformulate the title and description to make them more
-    explicit about the problem to be solved
+- suggesting to reformulate the title and description to make them more
+  explicit about the problem to be solved

-  - linking to related issues or discussions while briefly describing how
-    they are related, for instance "See also #xyz for a similar attempt
-    at this" or "See also #xyz where the same thing happened in
-    SomeEstimator" provides context and helps the discussion.
+- linking to related issues or discussions while briefly describing how
+  they are related, for instance "See also #xyz for a similar attempt
+  at this" or "See also #xyz where the same thing happened in
+  SomeEstimator" provides context and helps the discussion.

.. topic:: Fruitful discussions

100 changes: 50 additions & 50 deletions doc/developers/contributing.rst
@@ -291,7 +291,7 @@ The next steps now describe the process of modifying code and submitting a PR:

9. Create a feature branch to hold your development changes:

-    .. prompt:: bash $
+   .. prompt:: bash $

git checkout -b my_feature

@@ -529,25 +529,25 @@ Continuous Integration (CI)
Please note that if one of the following markers appear in the latest commit
message, the following actions are taken.

-  ====================== ===================
-  Commit Message Marker  Action Taken by CI
-  ---------------------- -------------------
-  [ci skip]              CI is skipped completely
-  [cd build]             CD is run (wheels and source distribution are built)
-  [cd build gh]          CD is run only for GitHub Actions
-  [cd build cirrus]      CD is run only for Cirrus CI
-  [lint skip]            Azure pipeline skips linting
-  [scipy-dev]            Build & test with our dependencies (numpy, scipy, etc.) development builds
-  [nogil]                Build & test with the nogil experimental branches of CPython, Cython, NumPy, SciPy, ...
-  [pypy]                 Build & test with PyPy
-  [pyodide]              Build & test with Pyodide
-  [azure parallel]       Run Azure CI jobs in parallel
-  [cirrus arm]           Run Cirrus CI ARM test
-  [float32]              Run float32 tests by setting `SKLEARN_RUN_FLOAT32_TESTS=1`. See :ref:`environment_variable` for more details
-  [doc skip]             Docs are not built
-  [doc quick]            Docs built, but excludes example gallery plots
-  [doc build]            Docs built including example gallery plots (very long)
-  ====================== ===================
+====================== ===================
+Commit Message Marker  Action Taken by CI
+---------------------- -------------------
+[ci skip]              CI is skipped completely
+[cd build]             CD is run (wheels and source distribution are built)
+[cd build gh]          CD is run only for GitHub Actions
+[cd build cirrus]      CD is run only for Cirrus CI
+[lint skip]            Azure pipeline skips linting
+[scipy-dev]            Build & test with our dependencies (numpy, scipy, etc.) development builds
+[nogil]                Build & test with the nogil experimental branches of CPython, Cython, NumPy, SciPy, ...
+[pypy]                 Build & test with PyPy
+[pyodide]              Build & test with Pyodide
+[azure parallel]       Run Azure CI jobs in parallel
+[cirrus arm]           Run Cirrus CI ARM test
+[float32]              Run float32 tests by setting `SKLEARN_RUN_FLOAT32_TESTS=1`. See :ref:`environment_variable` for more details
+[doc skip]             Docs are not built
+[doc quick]            Docs built, but excludes example gallery plots
+[doc build]            Docs built including example gallery plots (very long)
+====================== ===================

Note that, by default, the documentation is built but only the examples
that are directly modified by the pull request are executed.
@@ -713,30 +713,30 @@ We are glad to accept any sort of documentation:

In general have the following in mind:

-  * Use Python basic types. (``bool`` instead of ``boolean``)
-  * Use parenthesis for defining shapes: ``array-like of shape (n_samples,)``
-    or ``array-like of shape (n_samples, n_features)``
-  * For strings with multiple options, use brackets: ``input: {'log',
-    'squared', 'multinomial'}``
-  * 1D or 2D data can be a subset of ``{array-like, ndarray, sparse matrix,
-    dataframe}``. Note that ``array-like`` can also be a ``list``, while
-    ``ndarray`` is explicitly only a ``numpy.ndarray``.
-  * Specify ``dataframe`` when "frame-like" features are being used, such as
-    the column names.
-  * When specifying the data type of a list, use ``of`` as a delimiter: ``list
-    of int``. When the parameter supports arrays giving details about the
-    shape and/or data type and a list of such arrays, you can use one of
-    ``array-like of shape (n_samples,) or list of such arrays``.
-  * When specifying the dtype of an ndarray, use e.g. ``dtype=np.int32`` after
-    defining the shape: ``ndarray of shape (n_samples,), dtype=np.int32``. You
-    can specify multiple dtype as a set: ``array-like of shape (n_samples,),
-    dtype={np.float64, np.float32}``. If one wants to mention arbitrary
-    precision, use `integral` and `floating` rather than the Python dtype
-    `int` and `float`. When both `int` and `floating` are supported, there is
-    no need to specify the dtype.
-  * When the default is ``None``, ``None`` only needs to be specified at the
-    end with ``default=None``. Be sure to include in the docstring, what it
-    means for the parameter or attribute to be ``None``.
+* Use Python basic types. (``bool`` instead of ``boolean``)
+* Use parenthesis for defining shapes: ``array-like of shape (n_samples,)``
+  or ``array-like of shape (n_samples, n_features)``
+* For strings with multiple options, use brackets: ``input: {'log',
+  'squared', 'multinomial'}``
+* 1D or 2D data can be a subset of ``{array-like, ndarray, sparse matrix,
+  dataframe}``. Note that ``array-like`` can also be a ``list``, while
+  ``ndarray`` is explicitly only a ``numpy.ndarray``.
+* Specify ``dataframe`` when "frame-like" features are being used, such as
+  the column names.
+* When specifying the data type of a list, use ``of`` as a delimiter: ``list
+  of int``. When the parameter supports arrays giving details about the
+  shape and/or data type and a list of such arrays, you can use one of
+  ``array-like of shape (n_samples,) or list of such arrays``.
+* When specifying the dtype of an ndarray, use e.g. ``dtype=np.int32`` after
+  defining the shape: ``ndarray of shape (n_samples,), dtype=np.int32``. You
+  can specify multiple dtype as a set: ``array-like of shape (n_samples,),
+  dtype={np.float64, np.float32}``. If one wants to mention arbitrary
+  precision, use `integral` and `floating` rather than the Python dtype
+  `int` and `float`. When both `int` and `floating` are supported, there is
+  no need to specify the dtype.
+* When the default is ``None``, ``None`` only needs to be specified at the
+  end with ``default=None``. Be sure to include in the docstring, what it
+  means for the parameter or attribute to be ``None``.

* Add "See Also" in docstrings for related classes/functions.

@@ -809,15 +809,15 @@ details, and give intuition to the reader on what the algorithm does.

* Information that can be hidden by default using dropdowns is:

-    * low hierarchy sections such as `References`, `Properties`, etc. (see for
-      instance the subsections in :ref:`det_curve`);
+  * low hierarchy sections such as `References`, `Properties`, etc. (see for
+    instance the subsections in :ref:`det_curve`);

-    * in-depth mathematical details;
+  * in-depth mathematical details;

-    * narrative that is use-case specific;
+  * narrative that is use-case specific;

-    * in general, narrative that may only interest users that want to go beyond
-      the pragmatics of a given tool.
+  * in general, narrative that may only interest users that want to go beyond
+    the pragmatics of a given tool.

* Do not use dropdowns for the low level section `Examples`, as it should stay
visible to all users. Make sure that the `Examples` section comes right after
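
To make the docstring conventions from the contributing.rst hunk above
concrete, here is a hypothetical parameter section that follows them (shape
syntax, dtype after the shape, ``default=None`` last); the function and its
parameters are invented for illustration::

    import numpy as np

    def scale_rows(X, sample_weight=None):
        """Scale each row of X by a per-row weight (illustrative only).

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Input data. ``array-like`` also covers plain lists, whereas
            ``ndarray`` would mean strictly ``numpy.ndarray``.

        sample_weight : array-like of shape (n_samples,), dtype=np.float64, \
                default=None
            Per-row weights. If ``None``, every row gets unit weight.
        """
        X = np.asarray(X, dtype=float)
        if sample_weight is None:
            sample_weight = np.ones(X.shape[0])
        return X * np.asarray(sample_weight)[:, np.newaxis]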