diff --git a/doc/Makefile b/doc/Makefile
index 5e9861159a1..4390fed7e0b 100644
--- a/doc/Makefile
+++ b/doc/Makefile
@@ -38,6 +38,7 @@ $(HTML_DIR)/2019-07-03-%.html: $(IPYNB_DIR)/%.ipynb
 	@mkdir -p $(FAIL_DIR)
 	@echo "[nbconvert]  $<"
 	@jupyter nbconvert $< --to html --template nb.tpl \
+			--ExecutePreprocessor.timeout=600\
 	  	--output-dir $(HTML_DIR) --output 2019-07-03-$*.html \
 	  	--execute > $(FAIL_DIR)/$* 2>&1  && rm -f $(FAIL_DIR)/$*
 
diff --git a/doc/python/ml-knn.md b/doc/python/ml-knn.md
new file mode 100644
index 00000000000..6d823cf49b1
--- /dev/null
+++ b/doc/python/ml-knn.md
@@ -0,0 +1,324 @@
+---
+jupyter:
+  jupytext:
+    notebook_metadata_filter: all
+    text_representation:
+      extension: .md
+      format_name: markdown
+      format_version: '1.2'
+      jupytext_version: 1.4.2
+  kernelspec:
+    display_name: Python 3
+    language: python
+    name: python3
+  language_info:
+    codemirror_mode:
+      name: ipython
+      version: 3
+    file_extension: .py
+    mimetype: text/x-python
+    name: python
+    nbconvert_exporter: python
+    pygments_lexer: ipython3
+    version: 3.7.7
+  plotly:
+    description: Visualize scikit-learn's k-Nearest Neighbors (kNN) classification
+      in Python with Plotly.
+    display_as: ai_ml
+    language: python
+    layout: base
+    name: kNN Classification
+    order: 2
+    page_type: u-guide
+    permalink: python/knn-classification/
+    thumbnail: thumbnail/knn-classification.png
+---
+
+## Basic binary classification with kNN
+
+This section gets us started with displaying basic binary classification using 2D data. We first show how to display training versus testing data using [various marker styles](https://plot.ly/python/marker-style/), then demonstrate how to evaluate our classifier's performance on the **test split** using a continuous color gradient to indicate the model's predicted score.
+
+We will use [Scikit-learn](https://scikit-learn.org/) for training our model and for loading and splitting data. Scikit-learn is a popular Machine Learning (ML) library that offers various tools for creating and training ML algorithms, feature engineering, data cleaning, and evaluating and testing models. It was designed to be accessible, and to work seamlessly with popular libraries like NumPy and Pandas.
+
+We will train a [k-Nearest Neighbors (kNN)](https://scikit-learn.org/stable/modules/neighbors.html) classifier. First, the model records the label of each training sample. Then, whenever we give it a new sample, it will look at the `k` closest samples from the training set to find the most common label, and assign it to our new sample.
+
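+The voting rule described above can be sketched in a few lines of NumPy. This is only an illustration, assuming Euclidean distance and a hypothetical helper named `knn_predict` that is not part of scikit-learn; the rest of this page relies on scikit-learn's `KNeighborsClassifier`, which is far more optimized.
+
+```python
+import numpy as np
+from collections import Counter
+
+def knn_predict(X_train, y_train, x_new, k=3):
+    # Hypothetical helper for illustration only (not a scikit-learn function)
+    # Euclidean distance from the new sample to every training sample
+    dists = np.linalg.norm(X_train - x_new, axis=1)
+    # Indices of the k closest training samples
+    nearest = np.argsort(dists)[:k]
+    # Most common label among those neighbors
+    return Counter(y_train[nearest]).most_common(1)[0][0]
+
+# Toy data: one cluster near the origin, another near (5, 5)
+X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
+y_train = np.array(['0', '0', '0', '1', '1', '1'])
+print(knn_predict(X_train, y_train, np.array([4.5, 5.0])))
+```
+
+The new sample sits next to the second cluster, so its three nearest neighbors all carry the label `'1'` and the vote returns that label.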
+
+### Display training and test splits
+
+Using Scikit-learn, we first generate synthetic data that forms two interleaving half-moons. We then split it into a training set and a test set. Finally, we display the ground truth labels using [a scatter plot](https://plotly.com/python/line-and-scatter/).
+
+In the graph, we display all the negative labels as squares, and positive labels as circles. We differentiate the training and test set by adding a dot to the center of test data.
+
+In this example, we will use [graph objects](/python/graph-objects/), Plotly's low-level API for building figures.
+
+```python
+import plotly.graph_objects as go
+import numpy as np
+from sklearn.datasets import make_moons
+from sklearn.model_selection import train_test_split
+from sklearn.neighbors import KNeighborsClassifier
+
+# Load and split data
+X, y = make_moons(noise=0.3, random_state=0)
+X_train, X_test, y_train, y_test = train_test_split(
+    X, y.astype(str), test_size=0.25, random_state=0)
+
+trace_specs = [
+    [X_train, y_train, '0', 'Train', 'square'],
+    [X_train, y_train, '1', 'Train', 'circle'],
+    [X_test, y_test, '0', 'Test', 'square-dot'],
+    [X_test, y_test, '1', 'Test', 'circle-dot']
+]
+
+fig = go.Figure(data=[
+    go.Scatter(
+        x=X[y==label, 0], y=X[y==label, 1],
+        name=f'{split} Split, Label {label}',
+        mode='markers', marker_symbol=marker
+    )
+    for X, y, label, split, marker in trace_specs
+])
+fig.update_traces(
+    marker_size=12, marker_line_width=1.5,
+    marker_color="lightyellow"
+)
+fig.show()
+```
+
+### Visualize predictions on test split with [`plotly.express`](https://plotly.com/python/plotly-express/)
+
+
+Now, we train the kNN model on the same training data displayed in the previous graph. Then, we predict the confidence score of the model for each of the data points in the test set. We will use shapes to denote the true labels, and the color will indicate the score the model assigns to each point.
+
+In this example, we will use [Plotly Express](/python/plotly-express/), Plotly's high-level API for building figures. Notice that `px.scatter` requires only one function call to plot both negative and positive labels, and can additionally set a continuous color scale based on the `y_score` output by our kNN model.
+
+```python
+import plotly.express as px
+import numpy as np
+from sklearn.datasets import make_moons
+from sklearn.model_selection import train_test_split
+from sklearn.neighbors import KNeighborsClassifier
+
+# Load and split data
+X, y = make_moons(noise=0.3, random_state=0)
+X_train, X_test, y_train, y_test = train_test_split(
+    X, y.astype(str), test_size=0.25, random_state=0)
+
+# Fit the model on training data, predict on test data
+clf = KNeighborsClassifier(15)
+clf.fit(X_train, y_train)
+y_score = clf.predict_proba(X_test)[:, 1]
+
+fig = px.scatter(
+    X_test, x=0, y=1,
+    color=y_score, color_continuous_scale='RdBu',
+    symbol=y_test, symbol_map={'0': 'square-dot', '1': 'circle-dot'},
+    labels={'symbol': 'label', 'color': 'score of <br>class 1'}
+)
+fig.update_traces(marker_size=12, marker_line_width=1.5)
+fig.update_layout(legend_orientation='h')
+fig.show()
+```
+
+## Probability Estimates with `go.Contour`
+
+Just like the previous example, we will first train our kNN model on the training set.
+
+Instead of predicting the confidence for the test set, we can predict the confidence map for the entire area that wraps around the dimensions of our dataset. To do this, we use [`np.meshgrid`](https://numpy.org/doc/stable/reference/generated/numpy.meshgrid.html) to create a grid, where the spacing between adjacent points is set by the `mesh_size` variable.
+
+Then, for each of those points, we will use our model to give a confidence score, and plot it with a [contour plot](https://plotly.com/python/contour-plots/).
+
+In this example, we will use [graph objects](/python/graph-objects/), Plotly's low-level API for building figures.
+
+```python
+import plotly.graph_objects as go
+import numpy as np
+from sklearn.datasets import make_moons
+from sklearn.model_selection import train_test_split
+from sklearn.neighbors import KNeighborsClassifier
+
+mesh_size = .02
+margin = 0.25
+
+# Load and split data
+X, y = make_moons(noise=0.3, random_state=0)
+X_train, X_test, y_train, y_test = train_test_split(
+    X, y.astype(str), test_size=0.25, random_state=0)
+
+# Create a mesh grid on which we will run our model
+x_min, x_max = X[:, 0].min() - margin, X[:, 0].max() + margin
+y_min, y_max = X[:, 1].min() - margin, X[:, 1].max() + margin
+xrange = np.arange(x_min, x_max, mesh_size)
+yrange = np.arange(y_min, y_max, mesh_size)
+xx, yy = np.meshgrid(xrange, yrange)
+
+# Create classifier, run predictions on grid
+clf = KNeighborsClassifier(15, weights='uniform')
+clf.fit(X_train, y_train)
+Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]
+Z = Z.reshape(xx.shape)
+
+
+# Plot the figure
+fig = go.Figure(data=[
+    go.Contour(
+        x=xrange,
+        y=yrange,
+        z=Z,
+        colorscale='RdBu'
+    )
+])
+fig.show()
+```
+
+Now, let's try to combine our `go.Contour` plot with the first scatter plot of our data points, so that we can visually compare the confidence of our model with the true labels.
+
+```python
+import plotly.graph_objects as go
+import numpy as np
+from sklearn.datasets import make_moons
+from sklearn.model_selection import train_test_split
+from sklearn.neighbors import KNeighborsClassifier
+
+mesh_size = .02
+margin = 0.25
+
+# Load and split data
+X, y = make_moons(noise=0.3, random_state=0)
+X_train, X_test, y_train, y_test = train_test_split(
+    X, y.astype(str), test_size=0.25, random_state=0)
+
+# Create a mesh grid on which we will run our model
+x_min, x_max = X[:, 0].min() - margin, X[:, 0].max() + margin
+y_min, y_max = X[:, 1].min() - margin, X[:, 1].max() + margin
+xrange = np.arange(x_min, x_max, mesh_size)
+yrange = np.arange(y_min, y_max, mesh_size)
+xx, yy = np.meshgrid(xrange, yrange)
+
+# Create classifier, run predictions on grid
+clf = KNeighborsClassifier(15, weights='uniform')
+clf.fit(X_train, y_train)
+Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]
+Z = Z.reshape(xx.shape)
+
+trace_specs = [
+    [X_train, y_train, '0', 'Train', 'square'],
+    [X_train, y_train, '1', 'Train', 'circle'],
+    [X_test, y_test, '0', 'Test', 'square-dot'],
+    [X_test, y_test, '1', 'Test', 'circle-dot']
+]
+
+fig = go.Figure(data=[
+    go.Scatter(
+        x=X[y==label, 0], y=X[y==label, 1],
+        name=f'{split} Split, Label {label}',
+        mode='markers', marker_symbol=marker
+    )
+    for X, y, label, split, marker in trace_specs
+])
+fig.update_traces(
+    marker_size=12, marker_line_width=1.5,
+    marker_color="lightyellow"
+)
+
+fig.add_trace(
+    go.Contour(
+        x=xrange,
+        y=yrange,
+        z=Z,
+        showscale=False,
+        colorscale='RdBu',
+        opacity=0.4,
+        name='Score',
+        hoverinfo='skip'
+    )
+)
+fig.show()
+```
+
+## Multi-class prediction confidence with [`go.Heatmap`](https://plotly.com/python/heatmaps/)
+
+It is also possible to visualize the prediction confidence of the model using [heatmaps](https://plotly.com/python/heatmaps/). In this example, you can see how to compute how confident the model is about its prediction at every point in the 2D grid. Here, we define the confidence as the difference between the highest score and the score of the other classes summed, at a certain point.
+
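+As a small worked example of this definition, the snippet below applies it to a few made-up probability rows (hypothetical values, not output from the model). Because each row of probabilities sums to 1, the confidence is equivalent to twice the top probability minus one, so it becomes negative whenever no class reaches 50%.
+
+```python
+import numpy as np
+
+# Hypothetical class probabilities for three grid points
+proba = np.array([
+    [0.7, 0.2, 0.1],
+    [0.4, 0.3, 0.3],
+    [1.0, 0.0, 0.0],
+])
+
+# Highest score minus the summed score of the other classes
+top = proba.max(axis=-1)
+confidence = top - (proba.sum(axis=-1) - top)
+
+# Each row sums to 1, so this is the same as 2 * top - 1
+assert np.allclose(confidence, 2 * top - 1)
+print(confidence)
+```
+
+The three rows give confidences of roughly 0.4, -0.2 and 1.0.
+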
+In this example, we will use [Plotly Express](/python/plotly-express/), Plotly's high-level API for building figures.
+
+```python
+import plotly.express as px
+import plotly.graph_objects as go
+import numpy as np
+from sklearn.model_selection import train_test_split
+from sklearn.neighbors import KNeighborsClassifier
+
+mesh_size = .02
+margin = 1
+
+# We will use the iris data, which is included in px
+df = px.data.iris()
+df_train, df_test = train_test_split(df, test_size=0.25, random_state=0)
+X_train = df_train[['sepal_length', 'sepal_width']]
+y_train = df_train.species_id
+
+# Create a mesh grid on which we will run our model
+l_min, l_max = df.sepal_length.min() - margin, df.sepal_length.max() + margin
+w_min, w_max = df.sepal_width.min() - margin, df.sepal_width.max() + margin
+lrange = np.arange(l_min, l_max, mesh_size)
+wrange = np.arange(w_min, w_max, mesh_size)
+ll, ww = np.meshgrid(lrange, wrange)
+
+# Create classifier, run predictions on grid
+clf = KNeighborsClassifier(15, weights='distance')
+clf.fit(X_train, y_train)
+Z = clf.predict(np.c_[ll.ravel(), ww.ravel()])
+Z = Z.reshape(ll.shape)
+proba = clf.predict_proba(np.c_[ll.ravel(), ww.ravel()])
+proba = proba.reshape(ll.shape + (3,))
+
+# Confidence: top class probability minus the summed probability of the other classes
+diff = proba.max(axis=-1) - (proba.sum(axis=-1) - proba.max(axis=-1))
+
+fig = px.scatter(
+    df_test, x='sepal_length', y='sepal_width',
+    symbol='species',
+    symbol_map={
+        'setosa': 'square-dot',
+        'versicolor': 'circle-dot',
+        'virginica': 'diamond-dot'},
+)
+fig.update_traces(
+    marker_size=12, marker_line_width=1.5,
+    marker_color="lightyellow"
+)
+fig.add_trace(
+    go.Heatmap(
+        x=lrange,
+        y=wrange,
+        z=diff,
+        opacity=0.25,
+        customdata=proba,
+        colorscale='RdBu',
+        hovertemplate=(
+            'sepal length: %{x} <br>'
+            'sepal width: %{y} <br>'
+            'p(setosa): %{customdata[0]:.3f}<br>'
+            'p(versicolor): %{customdata[1]:.3f}<br>'
+            'p(virginica): %{customdata[2]:.3f}<extra></extra>'
+        )
+    )
+)
+fig.update_layout(
+    legend_orientation='h',
+    title='Prediction Confidence on Test Split'
+)
+fig.show()
+```
+
+### Reference
+
+Learn more about `px`, `go.Contour`, and `go.Heatmap` here:
+* https://plot.ly/python/plotly-express/
+* https://plot.ly/python/heatmaps/
+* https://plot.ly/python/contour-plots/
+
+This tutorial was inspired by amazing examples from the official scikit-learn docs:
+* https://scikit-learn.org/stable/auto_examples/neighbors/plot_classification.html
+* https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html
+* https://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html
diff --git a/doc/python/ml-pca.md b/doc/python/ml-pca.md
new file mode 100644
index 00000000000..3737a2c5e3a
--- /dev/null
+++ b/doc/python/ml-pca.md
@@ -0,0 +1,265 @@
+---
+jupyter:
+  jupytext:
+    notebook_metadata_filter: all
+    text_representation:
+      extension: .md
+      format_name: markdown
+      format_version: '1.1'
+      jupytext_version: 1.1.1
+  kernelspec:
+    display_name: Python 3
+    language: python
+    name: python3
+  language_info:
+    codemirror_mode:
+      name: ipython
+      version: 3
+    file_extension: .py
+    mimetype: text/x-python
+    name: python
+    nbconvert_exporter: python
+    pygments_lexer: ipython3
+    version: 3.7.7
+  plotly:
+    description: Visualize Principal Component Analysis (PCA) of your high-dimensional
+      data in Python with Plotly.
+    display_as: ai_ml
+    language: python
+    layout: base
+    name: PCA Visualization
+    order: 4
+    page_type: u-guide
+    permalink: python/pca-visualization/
+    thumbnail: thumbnail/ml-pca.png
+---
+
+This page first shows how to visualize high-dimensional data using various Plotly figures combined with dimensionality reduction (aka projection). Then, we dive into the specific details of our projection algorithm.
+
+We will use [Scikit-learn](https://scikit-learn.org/) to load one of the datasets, and apply dimensionality reduction. Scikit-learn is a popular Machine Learning (ML) library that offers various tools for creating and training ML algorithms, feature engineering, data cleaning, and evaluating and testing models. It was designed to be accessible, and to work seamlessly with popular libraries like NumPy and Pandas.
+
+
+## High-dimensional PCA Analysis with `px.scatter_matrix`
+
+The dimensionality reduction technique we will be using is called [Principal Component Analysis (PCA)](https://scikit-learn.org/stable/modules/decomposition.html#pca). It is a powerful technique that arises from linear algebra and probability theory. In essence, it computes the [covariance matrix][covmatrix] of your data, which captures how the features vary together, finds its eigenvectors (the principal components), and ranks them by their eigenvalues (the variance each one explains). For a video tutorial, see [this segment on PCA](https://youtu.be/rng04VJxUt4?t=98) from the Coursera ML course.
+
+[covmatrix]: https://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues#:~:text=As%20it%20is%20a%20square%20symmetric%20matrix%2C%20it%20can%20be%20diagonalized%20by%20choosing%20a%20new%20orthogonal%20coordinate%20system%2C%20given%20by%20its%20eigenvectors%20(incidentally%2C%20this%20is%20called%20spectral%20theorem)%3B%20corresponding%20eigenvalues%20will%20then%20be%20located%20on%20the%20diagonal.%20In%20this%20new%20coordinate%20system%2C%20the%20covariance%20matrix%20is%20diagonal%20and%20looks%20like%20that%3A
+
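+To make the link between the covariance matrix, its eigenvectors and the explained variance concrete, here is a minimal NumPy sketch of those steps on the Iris features. It is an illustration under simple assumptions (no feature scaling, variable names chosen for this sketch), not a replacement for scikit-learn's `PCA`, which the examples on this page use.
+
+```python
+import numpy as np
+import plotly.express as px
+
+df = px.data.iris()
+X = df[["sepal_width", "sepal_length", "petal_width", "petal_length"]].values
+
+# Center the data, then compute the covariance matrix of the features
+X_centered = X - X.mean(axis=0)
+cov = np.cov(X_centered, rowvar=False)
+
+# eigh is meant for symmetric matrices and returns eigenvalues in ascending order
+eigenvalues, eigenvectors = np.linalg.eigh(cov)
+order = np.argsort(eigenvalues)[::-1]
+eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
+
+# Share of the total variance explained by each component
+explained_variance_ratio = eigenvalues / eigenvalues.sum()
+# Project the centered data onto the principal axes
+components = X_centered @ eigenvectors
+print(explained_variance_ratio)
+```
+
+The projected `components` should agree with the output of scikit-learn's `PCA` up to the sign of each column, since an eigenvector and its negation describe the same axis.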
+
+### Visualize all the original dimensions
+
+First, let's plot all the features and see how the `species` in the Iris dataset are grouped. In a [Scatter Plot Matrix (splom)](https://plot.ly/python/splom/), each subplot displays a feature against another, so if we have $N$ features we have an $N \times N$ matrix.
+
+In our example, we are plotting all 4 features from the Iris dataset, thus we can see how `sepal_width` is compared against `sepal_length`, then against `petal_width`, and so forth. Keep in mind how some pairs of features can more easily separate different species.
+
+In this example, we will use [Plotly Express](/python/plotly-express/), Plotly's high-level API for building figures.
+
+```python
+import plotly.express as px
+
+df = px.data.iris()
+features = ["sepal_width", "sepal_length", "petal_width", "petal_length"]
+
+fig = px.scatter_matrix(
+    df,
+    dimensions=features,
+    color="species"
+)
+fig.update_traces(diagonal_visible=False)
+fig.show()
+```
+
+### Visualize all the principal components
+
+Now, we apply `PCA` to the same dataset, and retrieve **all** the components. We use the same `px.scatter_matrix` trace to display our results, but this time our features are the resulting *principal components*, ordered by how much variance they are able to explain.
+
+The importance of explained variance is demonstrated in the example below. The subplot between PC3 and PC4 is clearly unable to separate each class, whereas the subplot between PC1 and PC2 shows a clear separation between each species.
+
+In this example, we will use [Plotly Express](/python/plotly-express/), Plotly's high-level API for building figures.
+
+```python
+import plotly.express as px
+from sklearn.decomposition import PCA
+
+df = px.data.iris()
+features = ["sepal_width", "sepal_length", "petal_width", "petal_length"]
+
+pca = PCA()
+components = pca.fit_transform(df[features])
+labels = {
+    str(i): f"PC {i+1} ({var:.1f}%)"
+    for i, var in enumerate(pca.explained_variance_ratio_ * 100)
+}
+
+fig = px.scatter_matrix(
+    components,
+    labels=labels,
+    dimensions=range(4),
+    color=df["species"]
+)
+fig.update_traces(diagonal_visible=False)
+fig.show()
+```
+
+### Visualize a subset of the principal components
+
+When you have too many features to visualize, you might be interested in only visualizing the most relevant components. Those components often capture a majority of the [explained variance](https://en.wikipedia.org/wiki/Explained_variation), which is a good way to tell if those components are sufficient for modelling this dataset.
+
+In the example below, our dataset contains 13 features, but we only select the first 4 components, since they explain over 99% of the total variance.
+
+```python
+import pandas as pd
+import plotly.express as px
+from sklearn.decomposition import PCA
+from sklearn.datasets import load_boston
+
+# Note: load_boston was removed in scikit-learn 1.2, so this example needs an older release
+boston = load_boston()
+df = pd.DataFrame(boston.data, columns=boston.feature_names)
+n_components = 4
+
+pca = PCA(n_components=n_components)
+components = pca.fit_transform(df)
+
+total_var = pca.explained_variance_ratio_.sum() * 100
+
+labels = {str(i): f"PC {i+1}" for i in range(n_components)}
+labels['color'] = 'Median Price'
+
+fig = px.scatter_matrix(
+    components,
+    color=boston.target,
+    dimensions=range(n_components),
+    labels=labels,
+    title=f'Total Explained Variance: {total_var:.2f}%',
+)
+fig.update_traces(diagonal_visible=False)
+fig.show()
+```
+
+## 2D PCA Scatter Plot
+
+In the previous examples, you saw how to visualize high-dimensional PCs. In this example, we show you how to simply visualize the first two principal components of a PCA, by reducing a dataset of 4 dimensions to 2D.
+
+```python
+import plotly.express as px
+from sklearn.decomposition import PCA
+
+df = px.data.iris()
+X = df[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']]
+
+pca = PCA(n_components=2)
+components = pca.fit_transform(X)
+
+fig = px.scatter(components, x=0, y=1, color=df['species'])
+fig.show()
+```
+
+## Visualize PCA with `px.scatter_3d`
+
+With `px.scatter_3d`, you can visualize an additional dimension, which lets you capture even more variance.
+
+```python
+import plotly.express as px
+from sklearn.decomposition import PCA
+
+df = px.data.iris()
+X = df[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']]
+
+pca = PCA(n_components=3)
+components = pca.fit_transform(X)
+
+total_var = pca.explained_variance_ratio_.sum() * 100
+
+fig = px.scatter_3d(
+    components, x=0, y=1, z=2, color=df['species'],
+    title=f'Total Explained Variance: {total_var:.2f}%',
+    labels={'0': 'PC 1', '1': 'PC 2', '2': 'PC 3'}
+)
+fig.show()
+```
+
+## Plotting explained variance
+
+Often, you might be interested in seeing how much variance PCA is able to explain as you increase the number of components, in order to decide how many dimensions to ultimately keep or analyze. This example shows you how to quickly plot the cumulative sum of explained variance for a high-dimensional dataset like [Diabetes](https://scikit-learn.org/stable/datasets/index.html#diabetes-dataset).
+
+With a higher explained variance, you are able to capture more variability in your dataset, which could potentially lead to better performance when training your model. For a more mathematical explanation, see this [Q&A thread](https://stats.stackexchange.com/questions/22569/pca-and-proportion-of-variance-explained).
+
+```python
+import plotly.express as px
+import numpy as np
+import pandas as pd
+from sklearn.decomposition import PCA
+from sklearn.datasets import load_diabetes
+
+diabetes = load_diabetes()
+df = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
+
+pca = PCA()
+pca.fit(df)
+exp_var_cumul = np.cumsum(pca.explained_variance_ratio_)
+
+fig = px.area(
+    x=range(1, exp_var_cumul.shape[0] + 1),
+    y=exp_var_cumul,
+    labels={"x": "# Components", "y": "Explained Variance"}
+)
+fig.show()
+```
+
+## Visualize Loadings
+
+It is also possible to visualize loadings using `shapes`, and use `annotations` to indicate which feature a given loading originally belongs to. Here, we define loadings as:
+
+$$
+\text{loadings} = \text{eigenvectors} \cdot \sqrt{\text{eigenvalues}}
+$$
+
+For more details about the linear algebra behind eigenvectors and loadings, see this [Q&A thread](https://stats.stackexchange.com/questions/143905/loadings-vs-eigenvectors-in-pca-when-to-use-one-or-another).
+
+```python
+import plotly.express as px
+from sklearn.decomposition import PCA
+import numpy as np
+from sklearn.preprocessing import StandardScaler
+
+df = px.data.iris()
+features = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
+X = df[features]
+
+pca = PCA(n_components=2)
+components = pca.fit_transform(X)
+
+loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
+
+fig = px.scatter(components, x=0, y=1, color=df['species'])
+
+for i, feature in enumerate(features):
+    fig.add_shape(
+        type='line',
+        x0=0, y0=0,
+        x1=loadings[i, 0],
+        y1=loadings[i, 1]
+    )
+    fig.add_annotation(
+        x=loadings[i, 0],
+        y=loadings[i, 1],
+        ax=0, ay=0,
+        xanchor="center",
+        yanchor="bottom",
+        text=feature,
+    )
+fig.show()
+```
+
+## References
+
+Learn more about `px`, `px.scatter_3d`, and `px.scatter_matrix` here:
+* https://plot.ly/python/plotly-express/
+* https://plot.ly/python/3d-scatter-plots/
+* https://plot.ly/python/splom/
+
+The following resources offer an in-depth overview of PCA and explained variance:
+* https://en.wikipedia.org/wiki/Explained_variation
+* https://scikit-learn.org/stable/modules/decomposition.html#pca
+* https://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues/140579#140579
+* https://stats.stackexchange.com/questions/143905/loadings-vs-eigenvectors-in-pca-when-to-use-one-or-another
+* https://stats.stackexchange.com/questions/22569/pca-and-proportion-of-variance-explained
diff --git a/doc/python/ml-regression.md b/doc/python/ml-regression.md
new file mode 100644
index 00000000000..42215c7ad9d
--- /dev/null
+++ b/doc/python/ml-regression.md
@@ -0,0 +1,537 @@
+---
+jupyter:
+  jupytext:
+    notebook_metadata_filter: all
+    text_representation:
+      extension: .md
+      format_name: markdown
+      format_version: '1.2'
+      jupytext_version: 1.4.2
+  kernelspec:
+    display_name: Python 3
+    language: python
+    name: python3
+  language_info:
+    codemirror_mode:
+      name: ipython
+      version: 3
+    file_extension: .py
+    mimetype: text/x-python
+    name: python
+    nbconvert_exporter: python
+    pygments_lexer: ipython3
+    version: 3.7.7
+  plotly:
+    description: Visualize regression in scikit-learn with Plotly.
+    display_as: ai_ml
+    language: python
+    layout: base
+    name: ML Regression
+    order: 1
+    page_type: u-guide
+    permalink: python/ml-regression/
+    thumbnail: thumbnail/ml-regression.png
+---
+
+<!-- #region -->
+This page shows how to use Plotly charts for displaying various types of regression models, starting from simple models like [Linear Regression](https://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html) and progressively moving towards models like [Decision Tree][tree] and [Polynomial Features][poly]. We highlight various capabilities of Plotly, such as comparative analysis of the same model with different parameters, displaying LaTeX, [surface plots](https://plotly.com/python/3d-surface-plots/) for 3D data, and enhanced prediction error analysis with [Plotly Express](https://plotly.com/python/plotly-express/).
+
+We will use [Scikit-learn](https://scikit-learn.org/) to split and preprocess our data and train various regression models. Scikit-learn is a popular Machine Learning (ML) library that offers various tools for creating and training ML algorithms, feature engineering, data cleaning, and evaluating and testing models. It was designed to be accessible, and to work seamlessly with popular libraries like NumPy and Pandas.
+
+
+[lasso]: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoCV.html
+[tree]: https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html
+[poly]: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html
+<!-- #endregion -->
+
+## Basic linear regression plots
+
+In this section, we show you how to apply a simple regression model for predicting tips a server will receive based on various client attributes (such as sex, time of the week, and whether they are a smoker).
+
+We will be using [Linear Regression][lr], a simple model that fits an intercept (the predicted tip when every feature is zero) and adds a slope for each feature we use, such as the value of the total bill. We show you how to do that with both Plotly Express and Scikit-learn.
+
+### Ordinary Least Squares (OLS) with `plotly.express`
+
+This example shows [how to use `plotly.express`'s `trendline` parameter to fit a simple Ordinary Least Squares (OLS) model](/python/linear-fits/) for predicting the tips waiters will receive based on the value of the total bill. Note that `trendline='ols'` relies on the `statsmodels` package being installed.
+
+[lr]: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html
+
+```python
+import plotly.express as px
+
+df = px.data.tips()
+fig = px.scatter(
+    df, x='total_bill', y='tip', opacity=0.65,
+    trendline='ols', trendline_color_override='darkblue'
+)
+fig.show()
+```
+
+### Linear Regression with scikit-learn
+
+You can also perform the same prediction using scikit-learn's `LinearRegression`.
+
+```python
+import numpy as np
+import plotly.express as px
+import plotly.graph_objects as go
+from sklearn.linear_model import LinearRegression
+
+df = px.data.tips()
+X = df.total_bill.values.reshape(-1, 1)
+
+model = LinearRegression()
+model.fit(X, df.tip)
+
+x_range = np.linspace(X.min(), X.max(), 100)
+y_range = model.predict(x_range.reshape(-1, 1))
+
+fig = px.scatter(df, x='total_bill', y='tip', opacity=0.65)
+fig.add_traces(go.Scatter(x=x_range, y=y_range, name='Regression Fit'))
+fig.show()
+```
+
+## Model generalization on unseen data
+
+With `go.Scatter`, you can easily color your plot based on a predefined data split. By giving the training and the testing data points different colors, you can easily see whether the model generalizes well to the test data.
+
+```python
+import numpy as np
+import plotly.express as px
+import plotly.graph_objects as go
+from sklearn.linear_model import LinearRegression
+from sklearn.model_selection import train_test_split
+
+df = px.data.tips()
+X = df.total_bill.values.reshape(-1, 1)
+X_train, X_test, y_train, y_test = train_test_split(X, df.tip, random_state=0)
+
+model = LinearRegression()
+model.fit(X_train, y_train)
+
+x_range = np.linspace(X.min(), X.max(), 100)
+y_range = model.predict(x_range.reshape(-1, 1))
+
+
+fig = go.Figure([
+    go.Scatter(x=X_train.squeeze(), y=y_train, name='train', mode='markers'),
+    go.Scatter(x=X_test.squeeze(), y=y_test, name='test', mode='markers'),
+    go.Scatter(x=x_range, y=y_range, name='prediction')
+])
+fig.show()
+```
+
+## Comparing kNN models with different parameters
+
+In addition to linear regression, it's possible to fit the same data using [k-Nearest Neighbors][knn]. When you perform a prediction on a new sample, this model takes either the distance-weighted or the unweighted average of its neighbors. To see the difference between those two averaging options, we train one kNN model with each setting and plot both in the same way as the previous graph.
+
+Notice how we can combine scatter points with lines using Plotly.py. You can learn more about [multiple chart types](https://plotly.com/python/graphing-multiple-chart-types/).
+
+[knn]: https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsRegressor.html
+
+```python
+import numpy as np
+import plotly.express as px
+import plotly.graph_objects as go
+from sklearn.neighbors import KNeighborsRegressor
+
+df = px.data.tips()
+X = df.total_bill.values.reshape(-1, 1)
+x_range = np.linspace(X.min(), X.max(), 100)
+
+# Model #1
+knn_dist = KNeighborsRegressor(10, weights='distance')
+knn_dist.fit(X, df.tip)
+y_dist = knn_dist.predict(x_range.reshape(-1, 1))
+
+# Model #2
+knn_uni = KNeighborsRegressor(10, weights='uniform')
+knn_uni.fit(X, df.tip)
+y_uni = knn_uni.predict(x_range.reshape(-1, 1))
+
+fig = px.scatter(df, x='total_bill', y='tip', color='sex', opacity=0.65)
+fig.add_traces(go.Scatter(x=x_range, y=y_uni, name='Weights: Uniform'))
+fig.add_traces(go.Scatter(x=x_range, y=y_dist, name='Weights: Distance'))
+fig.show()
+```
+
+<!-- #region -->
+## Displaying `PolynomialFeatures` using $\LaTeX$
+
+Notice how linear regression fits a straight line, but kNN can take non-linear shapes. Moreover, it is possible to extend linear regression to polynomial regression by using scikit-learn's `PolynomialFeatures`, which lets you fit a slope for your features raised to the power of `n`, where `n=1,2,3,4` in our example.
+
+
+With Plotly, it's easy to display LaTeX equations in legends and titles by simply adding `$` before and after your equation. This way, you can see the coefficients that our polynomial regression fitted.
+<!-- #endregion -->
+
+```python
+import numpy as np
+import plotly.express as px
+import plotly.graph_objects as go
+from sklearn.linear_model import LinearRegression
+from sklearn.preprocessing import PolynomialFeatures
+
+def format_coefs(coefs):
+    equation_list = [f"{coef}x^{i}" for i, coef in enumerate(coefs)]
+    equation = "$" +  " + ".join(equation_list) + "$"
+
+    replace_map = {"x^0": "", "x^1": "x", '+ -': '- '}
+    for old, new in replace_map.items():
+        equation = equation.replace(old, new)
+
+    return equation
+
+df = px.data.tips()
+X = df.total_bill.values.reshape(-1, 1)
+x_range = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)
+
+fig = px.scatter(df, x='total_bill', y='tip', opacity=0.65)
+for degree in [1, 2, 3, 4]:
+    poly = PolynomialFeatures(degree)
+    poly.fit(X)
+    X_poly = poly.transform(X)
+    x_range_poly = poly.transform(x_range)
+
+    model = LinearRegression(fit_intercept=False)
+    model.fit(X_poly, df.tip)
+    y_poly = model.predict(x_range_poly)
+
+    equation = format_coefs(model.coef_.round(2))
+    fig.add_traces(go.Scatter(x=x_range.squeeze(), y=y_poly, name=equation))
+
+fig.show()
+```
+
+## 3D regression surface with `px.scatter_3d` and `go.Surface`
+
+Visualize the regression surface of your model when your input data has two variables. Here, we will use [`sklearn.svm.SVR`](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVR.html), which is a Support Vector Machine (SVM) model specifically designed for regression.
+
+```python
+import numpy as np
+import plotly.express as px
+import plotly.graph_objects as go
+from sklearn.svm import SVR
+
+mesh_size = .02
+margin = 0
+
+df = px.data.iris()
+
+X = df[['sepal_width', 'sepal_length']]
+y = df['petal_width']
+
+# Condition the model on sepal width and length, predict the petal width
+model = SVR(C=1.)
+model.fit(X, y)
+
+# Create a mesh grid on which we will run our model
+x_min, x_max = X.sepal_width.min() - margin, X.sepal_width.max() + margin
+y_min, y_max = X.sepal_length.min() - margin, X.sepal_length.max() + margin
+xrange = np.arange(x_min, x_max, mesh_size)
+yrange = np.arange(y_min, y_max, mesh_size)
+xx, yy = np.meshgrid(xrange, yrange)
+
+# Run model
+pred = model.predict(np.c_[xx.ravel(), yy.ravel()])
+pred = pred.reshape(xx.shape)
+
+# Generate the plot
+fig = px.scatter_3d(df, x='sepal_width', y='sepal_length', z='petal_width')
+fig.update_traces(marker=dict(size=5))
+fig.add_traces(go.Surface(x=xrange, y=yrange, z=pred, name='pred_surface'))
+fig.show()
+```
+
+## Visualizing coefficients for multiple linear regression (MLR)
+
+Visualizing regression with one or two variables is straightforward, since we can respectively plot them with scatter plots and 3D scatter plots. However, if you have more than 2 features, you will need to find alternative ways to visualize your data.
+
+One way is to use [bar charts](https://plotly.com/python/bar-charts/). In our example, each bar indicates the coefficient of our linear regression model for one input feature. Our model was trained on the [Iris dataset](https://archive.ics.uci.edu/ml/datasets/iris).
+
+```python
+import pandas as pd
+import plotly.express as px
+import plotly.graph_objects as go
+from sklearn.linear_model import LinearRegression
+
+df = px.data.iris()
+
+X = df.drop(columns=['petal_width', 'species_id'])
+X = pd.get_dummies(X, columns=['species'], prefix_sep='=')
+y = df['petal_width']
+
+model = LinearRegression()
+model.fit(X, y)
+
+colors = ['Positive' if c > 0 else 'Negative' for c in model.coef_]
+
+fig = px.bar(
+    x=X.columns, y=model.coef_, color=colors,
+    color_discrete_sequence=['red', 'blue'],
+    labels=dict(x='Feature', y='Linear coefficient'),
+    title='Weight of each feature for predicting petal width'
+)
+fig.show()
+```
+
+## Prediction Error Plots
+
+When you are working with very high-dimensional data, it is inconvenient to plot every dimension with your output `y`. Instead, you can use methods such as prediction error plots, which let you visualize how well your model does compared to the ground truth.
+
+
+### Simple actual vs predicted plot
+
+This example shows you the simplest way to compare the predicted output vs. the actual output. A good model will have most of the scatter dots near the diagonal black line.
+
+```python
+import plotly.express as px
+import plotly.graph_objects as go
+from sklearn.linear_model import LinearRegression
+
+df = px.data.iris()
+X = df[['sepal_width', 'sepal_length']]
+y = df['petal_width']
+
+# Condition the model on sepal width and length, predict the petal width
+model = LinearRegression()
+model.fit(X, y)
+y_pred = model.predict(X)
+
+fig = px.scatter(x=y, y=y_pred, labels={'x': 'ground truth', 'y': 'prediction'})
+fig.add_shape(
+    type="line", line=dict(dash='dash'),
+    x0=y.min(), y0=y.min(),
+    x1=y.max(), y1=y.max()
+)
+fig.show()
+```
+
+### Enhanced prediction error analysis using `plotly.express`
+
+Add marginal histograms to quickly diagnose any prediction bias your model might have. The built-in `OLS` functionality lets you visualize how well your model generalizes by comparing it with the theoretical optimal fit (black dashed line).
+
+```python
+import plotly.express as px
+import plotly.graph_objects as go
+from sklearn.linear_model import LinearRegression
+from sklearn.model_selection import train_test_split
+
+df = px.data.iris()
+
+# Split data into training and test splits
+train_idx, test_idx = train_test_split(df.index, test_size=.25, random_state=0)
+df['split'] = 'train'
+df.loc[test_idx, 'split'] = 'test'
+
+X = df[['sepal_width', 'sepal_length']]
+y = df['petal_width']
+X_train = df.loc[train_idx, ['sepal_width', 'sepal_length']]
+y_train = df.loc[train_idx, 'petal_width']
+
+# Condition the model on sepal width and length, predict the petal width
+model = LinearRegression()
+model.fit(X_train, y_train)
+df['prediction'] = model.predict(X)
+
+fig = px.scatter(
+    df, x='petal_width', y='prediction',
+    marginal_x='histogram', marginal_y='histogram',
+    color='split', trendline='ols'
+)
+fig.update_traces(histnorm='probability', selector={'type':'histogram'})
+fig.add_shape(
+    type="line", line=dict(dash='dash'),
+    x0=y.min(), y0=y.min(),
+    x1=y.max(), y1=y.max()
+)
+
+fig.show()
+```
+
+## Residual plots
+
+Just like prediction error plots, it's easy to visualize your prediction residuals in just a few lines of code using `plotly.express` built-in capabilities.
+
+```python
+import numpy as np
+import plotly.express as px
+import plotly.graph_objects as go
+from sklearn.linear_model import LinearRegression
+from sklearn.model_selection import train_test_split
+
+df = px.data.iris()
+
+# Split data into training and test splits
+train_idx, test_idx = train_test_split(df.index, test_size=.25, random_state=0)
+df['split'] = 'train'
+df.loc[test_idx, 'split'] = 'test'
+
+X = df[['sepal_width', 'sepal_length']]
+X_train = df.loc[train_idx, ['sepal_width', 'sepal_length']]
+y_train = df.loc[train_idx, 'petal_width']
+
+# Condition the model on sepal width and length, predict the petal width
+model = LinearRegression()
+model.fit(X_train, y_train)
+df['prediction'] = model.predict(X)
+df['residual'] = df['prediction'] - df['petal_width']
+
+fig = px.scatter(
+    df, x='prediction', y='residual',
+    marginal_y='violin',
+    color='split', trendline='ols'
+)
+fig.show()
+```
+
+## Visualize regularization across cross-validation folds
+
+
+In this example, we show how to plot the cross-validation error obtained for each $\alpha$ penalization value using scikit-learn's `LassoCV`. This is useful to see how much the error at the optimal alpha actually varies across CV folds.
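+
+For context, the scikit-learn documentation gives the Lasso objective as the following (the symbols $w$ for the coefficients and $n_{\text{samples}}$ for the number of rows follow that documentation, not this page):
+
+$$
+\min_{w} \; \frac{1}{2 n_{\text{samples}}} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
+$$
+
+A larger $\alpha$ puts more weight on the $\ell_1$ penalty and pushes more coefficients to exactly zero, which is why the error is plotted against $\alpha$ on a log scale.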
+
+```python
+import numpy as np
+import pandas as pd
+import plotly.express as px
+import plotly.graph_objects as go
+from sklearn.linear_model import LassoCV
+
+N_FOLD = 6
+
+# Load and preprocess the data
+df = px.data.gapminder()
+X = df.drop(columns=['lifeExp', 'iso_num'])
+X = pd.get_dummies(X, columns=['country', 'continent', 'iso_alpha'])
+y = df['lifeExp']
+
+# Train model to predict life expectancy
+model = LassoCV(cv=N_FOLD, normalize=True)
+model.fit(X, y)
+mean_alphas = model.mse_path_.mean(axis=-1)
+
+fig = go.Figure([
+    go.Scatter(
+        x=model.alphas_, y=model.mse_path_[:, i],
+        name=f"Fold: {i+1}", opacity=.5, line=dict(dash='dash'),
+        hovertemplate="alpha: %{x} <br>MSE: %{y}"
+    )
+    for i in range(N_FOLD)
+])
+fig.add_traces(go.Scatter(
+    x=model.alphas_, y=mean_alphas,
+    name='Mean', line=dict(color='black', width=3),
+    hovertemplate="alpha: %{x} <br>MSE: %{y}",
+))
+
+fig.add_shape(
+    type="line", line=dict(dash='dash'),
+    x0=model.alpha_, y0=0,
+    x1=model.alpha_, y1=1,
+    yref='paper'
+)
+
+fig.update_layout(
+    xaxis_title='alpha',
+    xaxis_type="log",
+    yaxis_title="Mean Square Error (MSE)"
+)
+fig.show()
+```
+
+## Grid search visualization using `px.density_heatmap` and `px.box`
+
+In this example, we show how to visualize the results of a grid search on a `DecisionTreeRegressor`. The first plot shows how to visualize the score of each model parameter on individual splits (grouped using facets). The second plot aggregates the results of all splits such that each box represents a single model.
+
+```python
+import numpy as np
+import pandas as pd
+import plotly.express as px
+import plotly.graph_objects as go
+from sklearn.model_selection import GridSearchCV
+from sklearn.tree import DecisionTreeRegressor
+
+N_FOLD = 6
+
+# Load and shuffle dataframe
+df = px.data.iris()
+df = df.sample(frac=1, random_state=0)
+
+X = df[['sepal_width', 'sepal_length']]
+y = df['petal_width']
+
+# Define and fit the grid
+model = DecisionTreeRegressor()
+param_grid = {
+    'criterion': ['mse', 'friedman_mse', 'mae'],
+    'max_depth': range(2, 5)
+}
+grid = GridSearchCV(model, param_grid, cv=N_FOLD)
+grid.fit(X, y)
+grid_df = pd.DataFrame(grid.cv_results_)
+
+# Convert the wide format of the grid into the long format
+# accepted by plotly.express
+melted = (
+    grid_df
+    .rename(columns=lambda col: col.replace('param_', ''))
+    .melt(
+        value_vars=[f'split{i}_test_score' for i in range(N_FOLD)],
+        id_vars=['mean_test_score', 'mean_fit_time', 'criterion', 'max_depth'],
+        var_name="cv_split",
+        value_name="r_squared"
+    )
+)
+
+# Format the variable names for simplicity
+melted['cv_split'] = (
+    melted['cv_split']
+    .str.replace('_test_score', '')
+    .str.replace('split', '')
+)
+
+# Single function call to plot each figure
+fig_hmap = px.density_heatmap(
+    melted, x="max_depth", y='criterion',
+    histfunc="sum", z="r_squared",
+    title='Grid search results on individual fold',
+    hover_data=['mean_fit_time'],
+    facet_col="cv_split", facet_col_wrap=3,
+    labels={'mean_test_score': "mean_r_squared"}
+)
+
+fig_box = px.box(
+    melted, x='max_depth', y='r_squared',
+    title='Grid search results',
+    hover_data=['mean_fit_time'],
+    points='all',
+    color="criterion",
+    hover_name='cv_split',
+    labels={'mean_test_score': "mean_r_squared"}
+)
+
+# Display
+fig_hmap.show()
+fig_box.show()
+```
+
+### Reference
+
+Learn more about the `px` figures used in this tutorial:
+* Plotly Express: https://plot.ly/python/plotly-express/
+* Vertical Lines: https://plot.ly/python/shapes/
+* Heatmaps: https://plot.ly/python/heatmaps/
+* Box Plots: https://plot.ly/python/box-plots/
+* 3D Scatter: https://plot.ly/python/3d-scatter-plots/
+* Surface Plots: https://plot.ly/python/3d-surface-plots/
+
+Learn more about the Machine Learning models used in this tutorial:
+* https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html
+* https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoCV.html
+* https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsRegressor.html
+* https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html
+* https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html
+
+Other tutorials that inspired this notebook:
+* https://seaborn.pydata.org/examples/residplot.html
+* https://scikit-learn.org/stable/auto_examples/linear_model/plot_lasso_model_selection.html
+* http://www.scikit-yb.org/zh/latest/api/regressor/peplot.html
diff --git a/doc/python/ml-roc-pr.md b/doc/python/ml-roc-pr.md
new file mode 100644
index 00000000000..eced1074109
--- /dev/null
+++ b/doc/python/ml-roc-pr.md
@@ -0,0 +1,275 @@
+---
+jupyter:
+  jupytext:
+    notebook_metadata_filter: all
+    text_representation:
+      extension: .md
+      format_name: markdown
+      format_version: '1.2'
+      jupytext_version: 1.4.2
+  kernelspec:
+    display_name: Python 3
+    language: python
+    name: python3
+  language_info:
+    codemirror_mode:
+      name: ipython
+      version: 3
+    file_extension: .py
+    mimetype: text/x-python
+    name: python
+    nbconvert_exporter: python
+    pygments_lexer: ipython3
+    version: 3.7.7
+  plotly:
+    description: Interpret the results of your classification using Receiver Operating
+      Characteristics (ROC) and Precision-Recall (PR) Curves in Python with Plotly.
+    display_as: ai_ml
+    language: python
+    layout: base
+    name: ROC and PR Curves
+    order: 3
+    page_type: u-guide
+    permalink: python/roc-and-pr-curves/
+    thumbnail: thumbnail/ml-roc-pr.png
+---
+
+## Preliminary plots
+
+Before diving into the receiver operating characteristic (ROC) curve, we will look at two plots that will give some context to the threshold mechanism behind the ROC and PR curves.
+
+In the histogram, we observe that the scores are spread such that most of the positive labels are binned near 1, and many of the negative labels are close to 0. When we set a threshold on the score, all of the bins to its left will be classified as 0's, and everything to the right will be 1's. There are a few outliers, such as **negative** samples that our model gave a high score, and *positive* samples with a low score. If we set a threshold right in the middle, those outliers will respectively become **false positives** and *false negatives*.
+
+As we adjust thresholds, the number of false positives will increase or decrease, and at the same time the number of true positives will also change; this is shown in the second plot. As you can see, the model seems to perform fairly well, because the true positive rate decreases slowly, whereas the false positive rate decreases sharply as we increase the threshold. Those two lines each represent a dimension of the ROC curve.
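+
+As a reminder, and using the standard confusion-matrix counts (true positives $TP$, false positives $FP$, true negatives $TN$, false negatives $FN$) at a given threshold, the two rates are defined as:
+
+$$
+\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}
+$$
+
+Raising the threshold can only move samples from the predicted-positive side to the predicted-negative side, so both rates can only stay flat or decrease.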
+
+```python
+import plotly.express as px
+import pandas as pd
+from sklearn.linear_model import LogisticRegression
+from sklearn.metrics import roc_curve, auc
+from sklearn.datasets import make_classification
+
+X, y = make_classification(n_samples=500, random_state=0)
+
+model = LogisticRegression()
+model.fit(X, y)
+y_score = model.predict_proba(X)[:, 1]
+fpr, tpr, thresholds = roc_curve(y, y_score)
+
+# The histogram of scores compared to true labels
+fig_hist = px.histogram(
+    x=y_score, color=y, nbins=50,
+    labels=dict(color='True Labels', x='Score')
+)
+
+fig_hist.show()
+
+
+# Evaluating model performance at various thresholds
+df = pd.DataFrame({
+    'False Positive Rate': fpr,
+    'True Positive Rate': tpr
+}, index=thresholds)
+df.index.name = "Thresholds"
+df.columns.name = "Rate"
+
+fig_thresh = px.line(
+    df, title='TPR and FPR at every threshold',
+    width=700, height=500
+)
+
+fig_thresh.update_yaxes(scaleanchor="x", scaleratio=1)
+fig_thresh.update_xaxes(range=[0, 1], constrain='domain')
+fig_thresh.show()
+```
+
+## Basic binary ROC curve
+
+Notice how this ROC curve looks similar to the True Positive Rate curve from the previous plot. This is because they are the same curve, except the x-axis consists of increasing values of FPR instead of threshold, which is why the line is flipped and distorted.
+
+We also display the area under the ROC curve (ROC AUC), which is fairly high, thus consistent with our interpretation of the previous plots.
+
+```python
+import plotly.express as px
+from sklearn.linear_model import LogisticRegression
+from sklearn.metrics import roc_curve, auc
+from sklearn.datasets import make_classification
+
+X, y = make_classification(n_samples=500, random_state=0)
+
+model = LogisticRegression()
+model.fit(X, y)
+y_score = model.predict_proba(X)[:, 1]
+
+fpr, tpr, thresholds = roc_curve(y, y_score)
+
+fig = px.area(
+    x=fpr, y=tpr,
+    title=f'ROC Curve (AUC={auc(fpr, tpr):.4f})',
+    labels=dict(x='False Positive Rate', y='True Positive Rate'),
+    width=700, height=500
+)
+fig.add_shape(
+    type='line', line=dict(dash='dash'),
+    x0=0, x1=1, y0=0, y1=1
+)
+
+fig.update_yaxes(scaleanchor="x", scaleratio=1)
+fig.update_xaxes(constrain='domain')
+fig.show()
+```
+
+## Multiclass ROC Curve
+
+When you have more than 2 classes, you will need to plot the ROC curve for each class separately. Make sure that you use a [one-versus-rest](https://scikit-learn.org/stable/modules/multiclass.html#one-vs-the-rest) model, or make sure that your problem has a [multi-label](https://scikit-learn.org/stable/modules/multiclass.html#multilabel-classification-format) format; otherwise, your ROC curve might not return the expected results.
+
+```python
+import plotly.graph_objects as go
+import plotly.express as px
+import numpy as np
+import pandas as pd
+from sklearn.linear_model import LogisticRegression
+from sklearn.metrics import roc_curve, roc_auc_score
+
+np.random.seed(0)
+
+# Artificially add noise to make task harder
+df = px.data.iris()
+samples = df.species.sample(n=50, random_state=0)
+np.random.shuffle(samples.values)
+df.loc[samples.index, 'species'] = samples.values
+
+# Define the inputs and outputs
+X = df.drop(columns=['species', 'species_id'])
+y = df['species']
+
+# Fit the model
+model = LogisticRegression(max_iter=200)
+model.fit(X, y)
+y_scores = model.predict_proba(X)
+y_onehot = pd.get_dummies(y)[model.classes_]  # align columns with y_scores
+
+# Create an empty figure, and iteratively add new lines
+# every time we compute a new class
+fig = go.Figure()
+fig.add_shape(
+    type='line', line=dict(dash='dash'),
+    x0=0, x1=1, y0=0, y1=1
+)
+
+for i in range(y_scores.shape[1]):
+    y_true = y_onehot.iloc[:, i]
+    y_score = y_scores[:, i]
+
+    fpr, tpr, _ = roc_curve(y_true, y_score)
+    auc_score = roc_auc_score(y_true, y_score)
+
+    name = f"{y_onehot.columns[i]} (AUC={auc_score:.2f})"
+    fig.add_trace(go.Scatter(x=fpr, y=tpr, name=name, mode='lines'))
+
+fig.update_layout(
+    xaxis_title='False Positive Rate',
+    yaxis_title='True Positive Rate',
+    yaxis=dict(scaleanchor="x", scaleratio=1),
+    xaxis=dict(constrain='domain'),
+    width=700, height=500
+)
+fig.show()
+```
+
+## Precision-Recall Curves
+
+Plotting the PR curve is very similar to plotting the ROC curve. The following examples are slightly modified from the previous examples:
+
+```python
+import plotly.express as px
+from sklearn.linear_model import LogisticRegression
+from sklearn.metrics import precision_recall_curve, auc
+from sklearn.datasets import make_classification
+
+X, y = make_classification(n_samples=500, random_state=0)
+
+model = LogisticRegression()
+model.fit(X, y)
+y_score = model.predict_proba(X)[:, 1]
+
+precision, recall, thresholds = precision_recall_curve(y, y_score)
+
+fig = px.area(
+    x=recall, y=precision,
+    title=f'Precision-Recall Curve (AUC={auc(recall, precision):.4f})',
+    labels=dict(x='Recall', y='Precision'),
+    width=700, height=500
+)
+fig.add_shape(
+    type='line', line=dict(dash='dash'),
+    x0=0, x1=1, y0=1, y1=0
+)
+fig.update_yaxes(scaleanchor="x", scaleratio=1)
+fig.update_xaxes(constrain='domain')
+
+fig.show()
+```
+
+In this example, we use the [average precision](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.average_precision_score.html) metric, which is an alternative scoring method to the area under the PR curve.
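+
+For reference, the scikit-learn documentation defines average precision (AP) as a weighted mean of the precisions reached at each threshold, where $P_n$ and $R_n$ denote the precision and recall at the $n$-th threshold:
+
+$$
+\mathrm{AP} = \sum_n (R_n - R_{n-1}) \, P_n, \qquad P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}
+$$
+
+Unlike the trapezoidal area under the PR curve, this sum does not interpolate between operating points, so the two numbers can differ slightly.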
+
+```python
+import plotly.graph_objects as go
+import plotly.express as px
+import numpy as np
+import pandas as pd
+from sklearn.linear_model import LogisticRegression
+from sklearn.metrics import precision_recall_curve, average_precision_score
+
+np.random.seed(0)
+
+# Artificially add noise to make task harder
+df = px.data.iris()
+samples = df.species.sample(n=30, random_state=0)
+np.random.shuffle(samples.values)
+df.loc[samples.index, 'species'] = samples.values
+
+# Define the inputs and outputs
+X = df.drop(columns=['species', 'species_id'])
+y = df['species']
+
+# Fit the model
+model = LogisticRegression(max_iter=200)
+model.fit(X, y)
+y_scores = model.predict_proba(X)
+y_onehot = pd.get_dummies(y)[model.classes_]  # align columns with y_scores
+
+# Create an empty figure, and iteratively add new lines
+# every time we compute a new class
+fig = go.Figure()
+fig.add_shape(
+    type='line', line=dict(dash='dash'),
+    x0=0, x1=1, y0=1, y1=0
+)
+
+for i in range(y_scores.shape[1]):
+    y_true = y_onehot.iloc[:, i]
+    y_score = y_scores[:, i]
+
+    precision, recall, _ = precision_recall_curve(y_true, y_score)
+    auc_score = average_precision_score(y_true, y_score)
+
+    name = f"{y_onehot.columns[i]} (AP={auc_score:.2f})"
+    fig.add_trace(go.Scatter(x=recall, y=precision, name=name, mode='lines'))
+
+fig.update_layout(
+    xaxis_title='Recall',
+    yaxis_title='Precision',
+    yaxis=dict(scaleanchor="x", scaleratio=1),
+    xaxis=dict(constrain='domain'),
+    width=700, height=500
+)
+fig.show()
+```
+
+## References
+
+Learn more about `px.histogram`, `px.area`, and `px.line`:
+* https://plot.ly/python/histograms/
+* https://plot.ly/python/filled-area-plots/
+* https://plot.ly/python/line-charts/
diff --git a/doc/python/ml-tsne-umap-projections.md b/doc/python/ml-tsne-umap-projections.md
new file mode 100644
index 00000000000..26ca99d51b4
--- /dev/null
+++ b/doc/python/ml-tsne-umap-projections.md
@@ -0,0 +1,179 @@
+---
+jupyter:
+  jupytext:
+    notebook_metadata_filter: all
+    text_representation:
+      extension: .md
+      format_name: markdown
+      format_version: '1.1'
+      jupytext_version: 1.1.1
+  kernelspec:
+    display_name: Python 3
+    language: python
+    name: python3
+  language_info:
+    codemirror_mode:
+      name: ipython
+      version: 3
+    file_extension: .py
+    mimetype: text/x-python
+    name: python
+    nbconvert_exporter: python
+    pygments_lexer: ipython3
+    version: 3.7.7
+  plotly:
+    description: Visualize scikit-learn's t-SNE and UMAP in Python with Plotly.
+    display_as: ai_ml
+    language: python
+    layout: base
+    name: t-SNE and UMAP projections
+    order: 5
+    page_type: u-guide
+    permalink: python/t-sne-and-umap-projections/
+    thumbnail: thumbnail/tsne-umap-projections.png
+---
+
+This page presents various ways to visualize two popular dimensionality reduction techniques, namely the [t-distributed stochastic neighbor embedding](https://lvdmaaten.github.io/tsne/) (t-SNE) and [Uniform Manifold Approximation and Projection](https://umap-learn.readthedocs.io/en/latest/index.html) (UMAP). They are needed whenever you want to visualize data with more than two or three features (i.e. dimensions).
+
+We first show how to visualize data with more than three features using the [scatter plot matrix](https://medium.com/plotly/what-is-a-splom-chart-make-scatterplot-matrices-in-python-8dc4998921c3), then we apply dimensionality reduction techniques to get 2D/3D representation of our data, and visualize the results with [scatter plots](https://plotly.com/python/line-and-scatter/) and [3D scatter plots](https://plotly.com/python/3d-scatter-plots/).
+
+
+## Basic t-SNE projections
+
+t-SNE is a popular dimensionality reduction algorithm that arises from probability theory. Simply put, it projects the high-dimensional data points (sometimes with hundreds of features) into 2D/3D by inducing the projected data to have a similar distribution as the original data points by minimizing something called the [KL divergence](https://towardsdatascience.com/light-on-math-machine-learning-intuitive-guide-to-understanding-kl-divergence-2b382ca2b2a8).
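+
+Concretely, following the original t-SNE paper's notation, if $p_{ij}$ denotes the pairwise similarity of points $i$ and $j$ in the original space and $q_{ij}$ their similarity in the low-dimensional embedding, t-SNE minimizes:
+
+$$
+\mathrm{KL}(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
+$$
+
+The cost is large when two points that are close in the original space (large $p_{ij}$) end up far apart in the projection (small $q_{ij}$), which is why local neighborhoods tend to be preserved.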
+
+Compared to a method like Principal Component Analysis (PCA), it takes significantly more time to converge, but presents significantly better insights when visualized. For example, by projecting the features of flowers, it can distinctly group the different species.
+
+
+### Visualizing high-dimensional data with `px.scatter_matrix`
+
+First, let's try to visualize every feature of the [Iris dataset](https://archive.ics.uci.edu/ml/datasets/iris), and color everything by the species. We will use the Scatter Plot Matrix ([splom](https://plotly.com/python/splom/)), which lets us plot each feature against everything else, which is convenient when your dataset has more than 3 dimensions.
+
+```python
+import plotly.express as px
+
+df = px.data.iris()
+features = ["sepal_width", "sepal_length", "petal_width", "petal_length"]
+fig = px.scatter_matrix(df, dimensions=features, color="species")
+fig.show()
+```
+
+### Project data into 2D with t-SNE and `px.scatter`
+
+Now, let's use the t-SNE algorithm to project the data shown above into two dimensions. Notice how each of the species is physically separate from each other.
+
+```python
+from sklearn.manifold import TSNE
+import plotly.express as px
+
+df = px.data.iris()
+
+features = df.loc[:, :'petal_width']
+
+tsne = TSNE(n_components=2, random_state=0)
+projections = tsne.fit_transform(features)
+
+fig = px.scatter(
+    projections, x=0, y=1,
+    color=df.species, labels={'color': 'species'}
+)
+fig.show()
+```
+
+### Project data into 3D with t-SNE and `px.scatter_3d`
+
+t-SNE can reduce your data to any number of dimensions you want! Here, we show you how to project it to 3D and visualize with a 3D scatter plot.
+
+```python
+from sklearn.manifold import TSNE
+import plotly.express as px
+
+df = px.data.iris()
+
+features = df.loc[:, :'petal_width']
+
+tsne = TSNE(n_components=3, random_state=0)
+projections = tsne.fit_transform(features)
+
+fig = px.scatter_3d(
+    projections, x=0, y=1, z=2,
+    color=df.species, labels={'color': 'species'}
+)
+fig.update_traces(marker_size=8)
+fig.show()
+```
+
+## Projections with UMAP
+
+Just like t-SNE, [UMAP](https://umap-learn.readthedocs.io/en/latest/index.html) is a dimensionality reduction technique specifically designed for visualizing complex data in low dimensions (2D or 3D). As the number of data points increases, [UMAP becomes more time efficient](https://umap-learn.readthedocs.io/en/latest/benchmarking.html) compared to t-SNE.
+
+In the example below, we see how easy it is to use UMAP as a drop-in replacement for scikit-learn's `manifold.TSNE`.
+
+```python
+from umap import UMAP
+import plotly.express as px
+
+df = px.data.iris()
+
+features = df.loc[:, :'petal_width']
+
+umap_2d = UMAP(n_components=2, init='random', random_state=0)
+umap_3d = UMAP(n_components=3, init='random', random_state=0)
+
+proj_2d = umap_2d.fit_transform(features)
+proj_3d = umap_3d.fit_transform(features)
+
+fig_2d = px.scatter(
+    proj_2d, x=0, y=1,
+    color=df.species, labels={'color': 'species'}
+)
+fig_3d = px.scatter_3d(
+    proj_3d, x=0, y=1, z=2,
+    color=df.species, labels={'color': 'species'}
+)
+fig_3d.update_traces(marker_size=5)
+
+fig_2d.show()
+fig_3d.show()
+```
+
+## Visualizing image datasets
+
+In the following example, we show how to visualize large image datasets using UMAP. Here, we use [`load_digits`](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html), a dataset of 8x8 handwritten digit images, similar in spirit to the famous [MNIST dataset](http://yann.lecun.com/exdb/mnist/), in which each image is flattened to 64 dimensions.
+
+Although there are over 1,000 data points, and many more dimensions than in the previous example, the projection is still fast to compute. This is because UMAP is optimized for speed, both from a theoretical perspective, and in the way it is implemented. Learn more in [this comparison post](https://umap-learn.readthedocs.io/en/latest/benchmarking.html).
+
+```python
+import plotly.express as px
+from sklearn.datasets import load_digits
+from umap import UMAP
+
+digits = load_digits()
+
+umap_2d = UMAP(random_state=0)
+umap_2d.fit(digits.data)
+
+projections = umap_2d.transform(digits.data)
+
+fig = px.scatter(
+    projections, x=0, y=1,
+    color=digits.target.astype(str), labels={'color': 'digit'}
+)
+fig.show()
+```
+
+<!-- #region -->
+## Reference
+
+Plotly figures:
+* https://plotly.com/python/line-and-scatter/
+* https://plotly.com/python/3d-scatter-plots/
+* https://plotly.com/python/splom/
+
+
+Details about algorithms:
+* UMAP library: https://umap-learn.readthedocs.io/en/latest/
+* t-SNE User guide: https://scikit-learn.org/stable/modules/manifold.html#t-sne
+* t-SNE paper: https://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf
+* MNIST: http://yann.lecun.com/exdb/mnist/
+<!-- #endregion -->
diff --git a/doc/requirements.txt b/doc/requirements.txt
index 51c56a393a1..63860785d93 100644
--- a/doc/requirements.txt
+++ b/doc/requirements.txt
@@ -17,6 +17,7 @@ requests
 networkx
 squarify
 scikit-image
+scikit-learn
 sphinx
 sphinx_bootstrap_theme
 recommonmark
@@ -25,3 +26,4 @@ python-frontmatter
 datashader
 pyarrow 
 cufflinks==0.17.3
+umap-learn
\ No newline at end of file