diff --git a/doc/themes/scikit-learn-modern/static/css/theme.css b/doc/themes/scikit-learn-modern/static/css/theme.css
index db2acbc3a11bb..c00ecee1ad1ad 100644
--- a/doc/themes/scikit-learn-modern/static/css/theme.css
+++ b/doc/themes/scikit-learn-modern/static/css/theme.css
@@ -839,10 +839,6 @@ div.highlight:hover span.copybutton:hover {
   background-color: #20252B;
 }
 
-div.body img.align-center {
-  max-width: 800px;
-}
-
 div.body img {
   max-width: 100%;
   height: unset!important; /* Needed because sphinx sets the height */
@@ -1234,6 +1230,10 @@ table.sk-sponsor-table td {
   padding: 0.30rem;
 }
 
+.caption {
+  text-align: center;
+}
+
 /* pygments - highlightning */
 
 .highlight .hll { background-color: #ffffcc }
diff --git a/doc/tutorial/statistical_inference/finding_help.rst b/doc/tutorial/statistical_inference/finding_help.rst
deleted file mode 100644
index 69026e2e5dbd2..0000000000000
--- a/doc/tutorial/statistical_inference/finding_help.rst
+++ /dev/null
@@ -1,32 +0,0 @@
-Finding help
-============
-
-
-The project mailing list
-------------------------
-
-If you encounter a bug with ``scikit-learn`` or something that needs
-clarification in the docstring or the online documentation, please feel free to
-ask on the `Mailing List `_
-
-
-Q&A communities with Machine Learning practitioners
-----------------------------------------------------
-
-  :Quora.com:
-
-    Quora has a topic for Machine Learning related questions that
-    also features some interesting discussions:
-    https://www.quora.com/topic/Machine-Learning
-
-  :Stack Exchange:
-
-    The Stack Exchange family of sites hosts `multiple subdomains for Machine Learning questions`_.
-
-.. _`How do I learn machine learning?`: https://www.quora.com/How-do-I-learn-machine-learning-1
-
-.. _`multiple subdomains for Machine Learning questions`: https://meta.stackexchange.com/q/130524
-
--- _'An excellent free online course for Machine Learning taught by Professor Andrew Ng of Stanford': https://www.coursera.org/learn/machine-learning
-
--- _'Another excellent free online course that takes a more general approach to Artificial Intelligence': https://www.udacity.com/course/intro-to-artificial-intelligence--cs271
diff --git a/doc/tutorial/statistical_inference/index.rst b/doc/tutorial/statistical_inference/index.rst
index f4aa9f8833129..1ea527054fc38 100644
--- a/doc/tutorial/statistical_inference/index.rst
+++ b/doc/tutorial/statistical_inference/index.rst
@@ -34,4 +34,3 @@ A tutorial on statistical-learning for scientific data processing
     model_selection
     unsupervised_learning
     putting_together
-    finding_help
diff --git a/doc/tutorial/statistical_inference/model_selection.rst b/doc/tutorial/statistical_inference/model_selection.rst
index 63af08b320752..070e86c18e8b1 100644
--- a/doc/tutorial/statistical_inference/model_selection.rst
+++ b/doc/tutorial/statistical_inference/model_selection.rst
@@ -180,23 +180,20 @@ scoring method.
 .. currentmodule:: sklearn.svm
 
 .. topic:: **Exercise**
-   :class: green
 
-    On the digits dataset, plot the cross-validation score of a :class:`SVC`
-    estimator with an linear kernel as a function of parameter ``C`` (use a
-    logarithmic grid of points, from 1 to 10).
+   On the digits dataset, plot the cross-validation score of a :class:`SVC`
+   estimator with a linear kernel as a function of parameter ``C`` (use a
+   logarithmic grid of points, from 1 to 10).
 
-    .. literalinclude:: ../../auto_examples/exercises/plot_cv_digits.py
-        :lines: 13-23
-
-    .. image:: /auto_examples/exercises/images/sphx_glr_plot_cv_digits_001.png
+   .. literalinclude:: ../../auto_examples/exercises/plot_cv_digits.py
+       :lines: 13-23
+
+   .. image:: /auto_examples/exercises/images/sphx_glr_plot_cv_digits_001.png
        :target: ../../auto_examples/exercises/plot_cv_digits.html
        :align: center
        :scale: 90
 
-    **Solution:** :ref:`sphx_glr_auto_examples_exercises_plot_cv_digits.py`
-
-
+   **Solution:** :ref:`sphx_glr_auto_examples_exercises_plot_cv_digits.py`
 
 Grid-search and cross-validated estimators
 ============================================
@@ -272,7 +269,6 @@ These estimators are called similarly to their counterparts, with 'CV'
 appended to their name.
 
 .. topic:: **Exercise**
-   :class: green
 
    On the diabetes dataset, find the optimal regularization parameter
   alpha.
diff --git a/doc/tutorial/statistical_inference/putting_together.rst b/doc/tutorial/statistical_inference/putting_together.rst
index 5106958d77e96..033bed2e33884 100644
--- a/doc/tutorial/statistical_inference/putting_together.rst
+++ b/doc/tutorial/statistical_inference/putting_together.rst
@@ -11,16 +11,13 @@ Pipelining
 
 We have seen that some estimators can transform data and that some estimators
 can predict variables. We can also create combined estimators:
 
-.. image:: ../../auto_examples/compose/images/sphx_glr_plot_digits_pipe_001.png
-   :target: ../../auto_examples/compose/plot_digits_pipe.html
-   :scale: 65
-   :align: right
-
 .. literalinclude:: ../../auto_examples/compose/plot_digits_pipe.py
     :lines: 23-63
-
-
+.. image:: ../../auto_examples/compose/images/sphx_glr_plot_digits_pipe_001.png
+   :target: ../../auto_examples/compose/plot_digits_pipe.html
+   :scale: 65
+   :align: center
 
 Face recognition with eigenfaces
 =================================
@@ -34,26 +31,15 @@ The dataset used in this example is a preprocessed excerpt of the
 
 .. literalinclude:: ../../auto_examples/applications/plot_face_recognition.py
 
-.. |prediction| image:: ../../images/plot_face_recognition_1.png
-   :scale: 50
-
-.. |eigenfaces| image:: ../../images/plot_face_recognition_2.png
+.. figure:: ../../images/plot_face_recognition_1.png
    :scale: 50
 
-.. list-table::
-   :class: centered
-
-   *
+   **Prediction**
 
-     - |prediction|
-
-     - |eigenfaces|
-
-   *
+.. figure:: ../../images/plot_face_recognition_2.png
+   :scale: 50
 
-     - **Prediction**
+   **Eigenfaces**
 
-     - **Eigenfaces**
-
 Expected results for the top 5 most represented people in the dataset::
diff --git a/doc/tutorial/statistical_inference/settings.rst b/doc/tutorial/statistical_inference/settings.rst
index 0ca4c69f48f2e..0ab9e39d63345 100644
--- a/doc/tutorial/statistical_inference/settings.rst
+++ b/doc/tutorial/statistical_inference/settings.rst
@@ -38,19 +38,20 @@ needs to be preprocessed in order to be used by scikit-learn.
 
     >>> digits.images.shape
     (1797, 8, 8)
     >>> import matplotlib.pyplot as plt #doctest: +SKIP
-    >>> plt.imshow(digits.images[-1], cmap=plt.cm.gray_r) #doctest: +SKIP
+    >>> plt.imshow(digits.images[-1],
+    ...            cmap=plt.cm.gray_r) #doctest: +SKIP
 
 .. image:: /auto_examples/datasets/images/sphx_glr_plot_digits_last_image_001.png
     :target: ../../auto_examples/datasets/plot_digits_last_image.html
-    :align: left
-    :scale: 60
-
+    :align: center
+
 To use this dataset with scikit-learn, we transform each 8x8 image into a
 feature vector of length 64 ::
 
-    >>> data = digits.images.reshape((digits.images.shape[0], -1))
-
+    >>> data = digits.images.reshape(
+    ...     (digits.images.shape[0], -1)
+    ... )
 
 Estimators objects
 ===================
diff --git a/doc/tutorial/statistical_inference/supervised_learning.rst b/doc/tutorial/statistical_inference/supervised_learning.rst
index 18a7f1336da11..013100a054648 100644
--- a/doc/tutorial/statistical_inference/supervised_learning.rst
+++ b/doc/tutorial/statistical_inference/supervised_learning.rst
@@ -38,11 +38,6 @@ Nearest neighbor and the curse of dimensionality
 
 .. topic:: Classifying irises:
 
-    .. image:: /auto_examples/datasets/images/sphx_glr_plot_iris_dataset_001.png
-        :target: ../../auto_examples/datasets/plot_iris_dataset.html
-        :align: right
-        :scale: 65
-
     The iris dataset is a classification task consisting in identifying 3
     different types of irises (Setosa, Versicolour, and Virginica) from
     their petal and sepal length and width::
@@ -53,6 +48,11 @@ Nearest neighbor and the curse of dimensionality
     >>> np.unique(iris_y)
     array([0, 1, 2])
 
+.. image:: /auto_examples/datasets/images/sphx_glr_plot_iris_dataset_001.png
+   :target: ../../auto_examples/datasets/plot_iris_dataset.html
+   :align: center
+   :scale: 50
+
 k-Nearest neighbors classifier
 -------------------------------
 
@@ -155,11 +155,6 @@ in its simplest form, fits a linear model to the data set by adjusting a set of
 parameters in order to make the sum of the squared residuals of the model as
 small as possible.
 
-.. image:: /auto_examples/linear_model/images/sphx_glr_plot_ols_001.png
-   :target: ../../auto_examples/linear_model/plot_ols.html
-   :scale: 40
-   :align: right
-
 Linear models: :math:`y = X\beta + \epsilon`
 
 * :math:`X`: data
@@ -167,6 +162,11 @@ Linear models: :math:`y = X\beta + \epsilon`
 * :math:`\beta`: Coefficients
 * :math:`\epsilon`: Observation noise
 
+.. image:: /auto_examples/linear_model/images/sphx_glr_plot_ols_001.png
+   :target: ../../auto_examples/linear_model/plot_ols.html
+   :scale: 50
+   :align: center
+
 ::
 
     >>> from sklearn import linear_model
@@ -197,11 +197,6 @@ Shrinkage
 If there are few data points per dimension, noise in the observations
 induces high variance:
 
-.. image:: /auto_examples/linear_model/images/sphx_glr_plot_ols_ridge_variance_001.png
-   :target: ../../auto_examples/linear_model/plot_ols_ridge_variance.html
-   :scale: 70
-   :align: right
-
 ::
 
     >>> X = np.c_[ .5, 1].T
@@ -219,18 +214,15 @@ induces high variance:
     ...     plt.plot(test, regr.predict(test)) # doctest: +SKIP
     ...     plt.scatter(this_X, y, s=3)  # doctest: +SKIP
 
-
+.. image:: /auto_examples/linear_model/images/sphx_glr_plot_ols_ridge_variance_001.png
+   :target: ../../auto_examples/linear_model/plot_ols_ridge_variance.html
+   :align: center
 
 A solution in high-dimensional statistical learning is to *shrink* the
 regression coefficients to zero: any two randomly chosen set of
 observations are likely to be uncorrelated. This is called :class:`Ridge`
 regression:
 
-.. image:: /auto_examples/linear_model/images/sphx_glr_plot_ols_ridge_variance_002.png
-   :target: ../../auto_examples/linear_model/plot_ols_ridge_variance.html
-   :scale: 70
-   :align: right
-
 ::
 
     >>> regr = linear_model.Ridge(alpha=.1)
@@ -244,6 +236,10 @@ regression:
     ...     plt.plot(test, regr.predict(test)) # doctest: +SKIP
     ...     plt.scatter(this_X, y, s=3)  # doctest: +SKIP
 
+.. image:: /auto_examples/linear_model/images/sphx_glr_plot_ols_ridge_variance_002.png
+   :target: ../../auto_examples/linear_model/plot_ols_ridge_variance.html
+   :align: center
+
 This is an example of **bias/variance tradeoff**: the larger the ridge
 ``alpha`` parameter, the higher the bias and the lower the variance.
@@ -327,8 +323,8 @@ application of Occam's razor: *prefer simpler models*.
 
     >>> regr.fit(diabetes_X_train, diabetes_y_train)
     Lasso(alpha=0.025118864315095794)
     >>> print(regr.coef_)
-    [   0.         -212.43764548  517.19478111  313.77959962 -160.8303982    -0.
-     -187.19554705   69.38229038  508.66011217   71.84239008]
+    [   0.         -212.437...  517.194...  313.779... -160.830...
+      -0.         -187.195...   69.382...  508.660...   71.842...]
 
 .. topic:: **Different algorithms for the same problem**
@@ -346,17 +342,17 @@ application of Occam's razor: *prefer simpler models*.
 
 Classification
 ---------------
 
-.. image:: /auto_examples/linear_model/images/sphx_glr_plot_logistic_001.png
-   :target: ../../auto_examples/linear_model/plot_logistic.html
-   :scale: 65
-   :align: right
-
 For classification, as in the labeling `iris `_ task, linear
 regression is not the right approach as it will give too much weight to
 data far from the decision frontier. A linear approach is to fit a sigmoid
 function or **logistic** function:
 
+.. image:: /auto_examples/linear_model/images/sphx_glr_plot_logistic_001.png
+   :target: ../../auto_examples/linear_model/plot_logistic.html
+   :scale: 70
+   :align: center
+
 .. math::
 
    y = \textrm{sigmoid}(X\beta - \textrm{offset}) + \epsilon =
@@ -373,6 +369,7 @@ This is known as :class:`LogisticRegression`.
 
 .. image:: /auto_examples/linear_model/images/sphx_glr_plot_iris_logistic_001.png
    :target: ../../auto_examples/linear_model/plot_iris_logistic.html
    :scale: 83
+   :align: center
 
 .. topic:: Multiclass classification
@@ -398,7 +395,7 @@ This is known as :class:`LogisticRegression`.
    .. literalinclude:: ../../auto_examples/exercises/plot_digits_classification_exercise.py
       :lines: 15-19
 
-   Solution: :download:`../../auto_examples/exercises/plot_digits_classification_exercise.py`
+   A solution can be downloaded :download:`here <../../auto_examples/exercises/plot_digits_classification_exercise.py>`.
 
 
 Support vector machines (SVMs)
@@ -418,21 +415,15 @@ the separating line (less regularization).
 
 .. currentmodule :: sklearn.svm
 
-.. |svm_margin_unreg| image:: /auto_examples/svm/images/sphx_glr_plot_svm_margin_001.png
+.. figure:: /auto_examples/svm/images/sphx_glr_plot_svm_margin_001.png
    :target: ../../auto_examples/svm/plot_svm_margin.html
-   :scale: 70
+
+   **Unregularized SVM**
 
-.. |svm_margin_reg| image:: /auto_examples/svm/images/sphx_glr_plot_svm_margin_002.png
+.. figure:: /auto_examples/svm/images/sphx_glr_plot_svm_margin_002.png
    :target: ../../auto_examples/svm/plot_svm_margin.html
-   :scale: 70
-
-.. rst-class:: centered
-
-   ============================= ==============================
-   **Unregularized SVM**         **Regularized SVM (default)**
-   ============================= ==============================
-   |svm_margin_unreg|            |svm_margin_reg|
-   ============================= ==============================
+
+   **Regularized SVM (default)**
 
 .. topic:: Example:
@@ -459,79 +450,46 @@ classification --:class:`SVC` (Support Vector Classification).
 
 .. _using_kernels_tut:
 
 Using kernels
---------------
+-------------
 
 Classes are not always linearly separable in feature space. The solution is to
 build a decision function that is not linear but may be polynomial instead.
 This is done using the *kernel trick* that can be seen as
 creating a decision energy by positioning *kernels* on observations:
 
-.. |svm_kernel_linear| image:: /auto_examples/svm/images/sphx_glr_plot_svm_kernels_001.png
-   :target: ../../auto_examples/svm/plot_svm_kernels.html
-   :scale: 65
-
-.. |svm_kernel_poly| image:: /auto_examples/svm/images/sphx_glr_plot_svm_kernels_002.png
-   :target: ../../auto_examples/svm/plot_svm_kernels.html
-   :scale: 65
-
-.. rst-class:: centered
-
-   .. list-table::
-
-      *
-
-        - **Linear kernel**
-
-        - **Polynomial kernel**
-
-      *
-
-        - |svm_kernel_linear|
-
-        - |svm_kernel_poly|
-
-      *
-
-        - ::
-
-             >>> svc = svm.SVC(kernel='linear')
-
-        - ::
-
-             >>> svc = svm.SVC(kernel='poly',
-             ...               degree=3)
-             >>> # degree: polynomial degree
-
-.. |svm_kernel_rbf| image:: /auto_examples/svm/images/sphx_glr_plot_svm_kernels_003.png
-   :target: ../../auto_examples/svm/plot_svm_kernels.html
-   :scale: 65
-
-.. rst-class:: centered
-
-   .. list-table::
-
-      *
-
-        - **RBF kernel (Radial Basis Function)**
-
-      *
-
-        - |svm_kernel_rbf|
-
-      *
-
-        - ::
-
-             >>> svc = svm.SVC(kernel='rbf')
-             >>> # gamma: inverse of size of
-             >>> # radial kernel
+Linear kernel
+^^^^^^^^^^^^^
+
+::
+
+    >>> svc = svm.SVC(kernel='linear')
+
+.. image:: /auto_examples/svm/images/sphx_glr_plot_svm_kernels_001.png
+   :target: ../../auto_examples/svm/plot_svm_kernels.html
+
+Polynomial kernel
+^^^^^^^^^^^^^^^^^
+
+::
+
+    >>> svc = svm.SVC(kernel='poly',
+    ...               degree=3)
+    >>> # degree: polynomial degree
+
+.. image:: /auto_examples/svm/images/sphx_glr_plot_svm_kernels_002.png
+   :target: ../../auto_examples/svm/plot_svm_kernels.html
+
+RBF kernel (Radial Basis Function)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+::
+
+    >>> svc = svm.SVC(kernel='rbf')
+    >>> # gamma: inverse of size of
+    >>> # radial kernel
+
+.. image:: /auto_examples/svm/images/sphx_glr_plot_svm_kernels_003.png
+   :target: ../../auto_examples/svm/plot_svm_kernels.html
 
@@ -541,11 +499,6 @@ creating a decision energy by positioning *kernels* on observations:
    ``svm_gui.py``; add data points of both classes with right and left button,
    fit the model and change parameters and data.
 
-.. image:: /auto_examples/datasets/images/sphx_glr_plot_iris_dataset_001.png
-   :target: ../../auto_examples/datasets/plot_iris_dataset.html
-   :align: right
-   :scale: 70
-
 .. topic:: **Exercise**
    :class: green
@@ -562,4 +515,10 @@ creating a decision energy by positioning *kernels* on observations:
    .. literalinclude:: ../../auto_examples/exercises/plot_iris_exercise.py
       :lines: 18-23
 
-   Solution: :download:`../../auto_examples/exercises/plot_iris_exercise.py`
+   .. image:: /auto_examples/datasets/images/sphx_glr_plot_iris_dataset_001.png
+      :target: ../../auto_examples/datasets/plot_iris_dataset.html
+      :align: center
+      :scale: 70
+
+
+   A solution can be downloaded :download:`here <../../auto_examples/exercises/plot_iris_exercise.py>`
diff --git a/doc/tutorial/statistical_inference/unsupervised_learning.rst b/doc/tutorial/statistical_inference/unsupervised_learning.rst
index b87fb64ec8d9b..033872fac895e 100644
--- a/doc/tutorial/statistical_inference/unsupervised_learning.rst
+++ b/doc/tutorial/statistical_inference/unsupervised_learning.rst
@@ -41,18 +41,6 @@ algorithms. The simplest clustering algorithm is :ref:`k_means`.
 
     >>> print(y_iris[::10])
     [0 0 0 0 0 1 1 1 1 1 2 2 2 2 2]
 
-.. |k_means_iris_bad_init| image:: /auto_examples/cluster/images/sphx_glr_plot_cluster_iris_003.png
-   :target: ../../auto_examples/cluster/plot_cluster_iris.html
-   :scale: 63
-
-.. |k_means_iris_8| image:: /auto_examples/cluster/images/sphx_glr_plot_cluster_iris_001.png
-   :target: ../../auto_examples/cluster/plot_cluster_iris.html
-   :scale: 63
-
-.. |cluster_iris_truth| image:: /auto_examples/cluster/images/sphx_glr_plot_cluster_iris_004.png
-   :target: ../../auto_examples/cluster/plot_cluster_iris.html
-   :scale: 63
-
 .. warning::
 
     There is absolutely no guarantee of recovering a ground truth. First,
@@ -60,43 +48,28 @@ algorithms. The simplest clustering algorithm is :ref:`k_means`.
     is sensitive to initialization, and can fall into local minima, although
     scikit-learn employs several tricks to mitigate this issue.
 
-    .. list-table::
-        :class: centered
-
-        *
+    |
 
-            - |k_means_iris_bad_init|
+    .. figure:: /auto_examples/cluster/images/sphx_glr_plot_cluster_iris_003.png
+       :target: ../../auto_examples/cluster/plot_cluster_iris.html
+       :scale: 63
 
-            - |k_means_iris_8|
+       **Bad initialization**
 
-            - |cluster_iris_truth|
+    .. figure:: /auto_examples/cluster/images/sphx_glr_plot_cluster_iris_001.png
+       :target: ../../auto_examples/cluster/plot_cluster_iris.html
+       :scale: 63
 
-        *
+       **8 clusters**
 
-            - **Bad initialization**
+    .. figure:: /auto_examples/cluster/images/sphx_glr_plot_cluster_iris_004.png
+       :target: ../../auto_examples/cluster/plot_cluster_iris.html
+       :scale: 63
 
-            - **8 clusters**
-
-            - **Ground truth**
+       **Ground truth**
 
     **Don't over-interpret clustering results**
 
-.. |face| image:: /auto_examples/cluster/images/sphx_glr_plot_face_compress_001.png
-   :target: ../../auto_examples/cluster/plot_face_compress.html
-   :scale: 60
-
-.. |face_regular| image:: /auto_examples/cluster/images/sphx_glr_plot_face_compress_002.png
-   :target: ../../auto_examples/cluster/plot_face_compress.html
-   :scale: 60
-
-.. |face_compressed| image:: /auto_examples/cluster/images/sphx_glr_plot_face_compress_003.png
-   :target: ../../auto_examples/cluster/plot_face_compress.html
-   :scale: 60
-
-.. |face_histogram| image:: /auto_examples/cluster/images/sphx_glr_plot_face_compress_004.png
-   :target: ../../auto_examples/cluster/plot_face_compress.html
-   :scale: 60
-
 .. topic:: **Application example: vector quantization**
 
     Clustering in general and KMeans, in particular, can be seen as a way
@@ -120,28 +93,27 @@ algorithms. The simplest clustering algorithm is :ref:`k_means`.
     >>> face_compressed = np.choose(labels, values)
     >>> face_compressed.shape = face.shape
 
-    .. list-table::
-      :class: centered
-
-      *
-        - |face|
-
-        - |face_compressed|
+    .. figure:: /auto_examples/cluster/images/sphx_glr_plot_face_compress_001.png
+       :target: ../../auto_examples/cluster/plot_face_compress.html
 
-        - |face_regular|
+       **Raw image**
 
-        - |face_histogram|
+    .. figure:: /auto_examples/cluster/images/sphx_glr_plot_face_compress_003.png
+       :target: ../../auto_examples/cluster/plot_face_compress.html
 
-      *
+       **K-means quantization**
 
-        - Raw image
+    .. figure:: /auto_examples/cluster/images/sphx_glr_plot_face_compress_002.png
+       :target: ../../auto_examples/cluster/plot_face_compress.html
 
-        - K-means quantization
+       **Equal bins**
 
-        - Equal bins
-
-        - Image histogram
+    .. figure:: /auto_examples/cluster/images/sphx_glr_plot_face_compress_004.png
+       :target: ../../auto_examples/cluster/plot_face_compress.html
+
+       **Image histogram**
 
 Hierarchical agglomerative clustering: Ward
 ---------------------------------------------