scikit-learn · GaelVaroquaux · Jan 22, 2018 · Jul 15, 2017
diff --git a/doc/modules/clustering.rst b/doc/modules/clustering.rst
@@ -567,30 +567,24 @@ considers at each step all the possible merges.
    number of features. It is a dimensionality reduction tool, see
    :ref:`data_reduction`.
 
-Different linkage type: Ward, complete and average linkage
------------------------------------------------------------
+Different linkage type: Ward, complete, average, and single linkage
+-------------------------------------------------------------------
 
-:class:`AgglomerativeClustering` supports Ward, average, and complete
+:class:`AgglomerativeClustering` supports Ward, single, average, and complete
 linkage strategies.
 
-.. image:: ../auto_examples/cluster/images/sphx_glr_plot_digits_linkage_001.png
-    :target: ../auto_examples/cluster/plot_digits_linkage.html
+.. image:: ../auto_examples/cluster/images/sphx_glr_plot_linkage_comparison_001.png
+    :target: ../auto_examples/cluster/plot_linkage_comparison.html
     :scale: 43
 
-.. image:: ../auto_examples/cluster/images/sphx_glr_plot_digits_linkage_002.png
-    :target: ../auto_examples/cluster/plot_digits_linkage.html
-    :scale: 43
-
-.. image:: ../auto_examples/cluster/images/sphx_glr_plot_digits_linkage_003.png
-    :target: ../auto_examples/cluster/plot_digits_linkage.html
-    :scale: 43
-
-
 Agglomerative cluster has a "rich get richer" behavior that leads to
-uneven cluster sizes. In this regard, complete linkage is the worst
+uneven cluster sizes. In this regard, single linkage is the worst
 strategy, and Ward gives the most regular sizes. However, the affinity
 (or distance used in clustering) cannot be varied with Ward, thus for non
-Euclidean metrics, average linkage is a good alternative.
+Euclidean metrics, average linkage is a good alternative. Single linkage,
+while not robust to noisy data, can be computed very efficiently and can
+therefore be useful to provide hierarchical clustering of larger datasets.
+Single linkage can also perform well on non-globular data.
 
 .. topic:: Examples:
 
@@ -652,15 +646,16 @@ enable only merging of neighboring pixels on an image, as in the
 
  * :ref:`sphx_glr_auto_examples_cluster_plot_agglomerative_clustering.py`
 
-.. warning:: **Connectivity constraints with average and complete linkage**
+.. warning:: **Connectivity constraints with single, average and complete linkage**
 
-    Connectivity constraints and complete or average linkage can enhance
+    Connectivity constraints and single, complete or average linkage can enhance
     the 'rich getting richer' aspect of agglomerative clustering,
     particularly so if they are built with
     :func:`sklearn.neighbors.kneighbors_graph`. In the limit of a small
     number of clusters, they tend to give a few macroscopically occupied
     clusters and almost empty ones. (see the discussion in
     :ref:`sphx_glr_auto_examples_cluster_plot_agglomerative_clustering.py`).
+    Single linkage is the most brittle linkage option with regard to this issue.
 
 .. image:: ../auto_examples/cluster/images/sphx_glr_plot_agglomerative_clustering_001.png
     :target: ../auto_examples/cluster/plot_agglomerative_clustering.html
@@ -682,7 +677,7 @@ enable only merging of neighboring pixels on an image, as in the
 Varying the metric
 -------------------
 
-Average and complete linkage can be used with a variety of distances (or
+Single, average and complete linkage can be used with a variety of distances (or
 affinities), in particular Euclidean distance (*l2*), Manhattan distance
 (or Cityblock, or *l1*), cosine distance, or any precomputed affinity
 matrix.
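For reviewers, a minimal pure-Python sketch of the three distances named in this paragraph. The helper names (`l1`, `l2`, `cosine_distance`) are hypothetical and only illustrate the definitions; scikit-learn computes these through its own pairwise-distance routines.

```python
from math import sqrt


def l1(u, v):
    """Manhattan (cityblock) distance: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def l2(u, v):
    """Euclidean distance: square root of the summed squared differences."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """One minus the cosine of the angle between u and v (nonzero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Same direction, different length: far apart in l1/l2, identical for cosine.
print(l1((1, 0), (2, 0)), l2((1, 0), (2, 0)), cosine_distance((1, 0), (2, 0)))
# Orthogonal unit vectors: cosine distance is at its value for a right angle.
print(l1((1, 0), (0, 1)), l2((1, 0), (0, 1)), cosine_distance((1, 0), (0, 1)))
```

Because the linkage criteria only consume pairwise distances, swapping the metric changes which pairs of clusters look closest and therefore the order of the merges.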

diff --git a/doc/whats_new/v0.20.rst b/doc/whats_new/v0.20.rst
@@ -78,14 +78,15 @@ Model evaluation
 - Added the :func:`metrics.balanced_accuracy_score` metric and a corresponding
   ``'balanced_accuracy'`` scorer for binary classification.
   :issue:`8066` by :user:`xyguo` and :user:`Aman Dalmia <dalmia>`.
-
 - Added :class:`multioutput.RegressorChain` for multi-target
   regression. :issue:`9257` by :user:`Kumar Ashutosh <thechargedneutron>`.
 
-- Added the :class:`preprocessing.TransformedTargetRegressor` which transforms
-  the target y before fitting a regression model. The predictions are mapped
-  back to the original space via an inverse transform. :issue:`9041` by
-  `Andreas Müller`_ and :user:`Guillaume Lemaitre <glemaitre>`.
+Clustering
+
+- :class:`cluster.AgglomerativeClustering` now supports single linkage
+  clustering via ``linkage='single'``. :issue:`9372` by
+  :user:`Leland McInnes <lmcinnes>` and :user:`Steve Astels <sastels>`.
+
 
 Enhancements
 ............

diff --git a/examples/cluster/plot_agglomerative_clustering.py b/examples/cluster/plot_agglomerative_clustering.py
@@ -9,17 +9,18 @@
 Two consequences of imposing a connectivity can be seen. First clustering
 with a connectivity matrix is much faster.
 
-Second, when using a connectivity matrix, average and complete linkage are
-unstable and tend to create a few clusters that grow very quickly. Indeed,
-average and complete linkage fight this percolation behavior by considering all
-the distances between two clusters when merging them. The connectivity
-graph breaks this mechanism. This effect is more pronounced for very
-sparse graphs (try decreasing the number of neighbors in
-kneighbors_graph) and with complete linkage. In particular, having a very
-small number of neighbors in the graph, imposes a geometry that is
-close to that of single linkage, which is well known to have this
-percolation instability.
-"""
+Second, when using a connectivity matrix, single, average and complete
+linkage are unstable and tend to create a few clusters that grow very
+quickly. Indeed, average and complete linkage fight this percolation behavior
+by considering all the distances between two clusters when merging them
+(while single linkage exaggerates the behavior by considering only the
+shortest distance between clusters). The connectivity graph breaks this
+mechanism for average and complete linkage, making them resemble the more
+brittle single linkage. This effect is more pronounced for very sparse graphs
+(try decreasing the number of neighbors in kneighbors_graph) and with
+complete linkage. In particular, having a very small number of neighbors in
+the graph imposes a geometry that is close to that of single linkage,
+which is well known to have this percolation instability. """
 # Authors: Gael Varoquaux, Nelle Varoquaux
 # License: BSD 3 clause
 
@@ -52,8 +53,11 @@
 for connectivity in (None, knn_graph):
     for n_clusters in (30, 3):
         plt.figure(figsize=(10, 4))
-        for index, linkage in enumerate(('average', 'complete', 'ward')):
-            plt.subplot(1, 3, index + 1)
+        for index, linkage in enumerate(('average',
+                                         'complete',
+                                         'ward',
+                                         'single')):
+            plt.subplot(1, 4, index + 1)
             model = AgglomerativeClustering(linkage=linkage,
                                             connectivity=connectivity,
                                             n_clusters=n_clusters)
@@ -62,7 +66,7 @@
             elapsed_time = time.time() - t0
             plt.scatter(X[:, 0], X[:, 1], c=model.labels_,
                         cmap=plt.cm.spectral)
-            plt.title('linkage=%s (time %.2fs)' % (linkage, elapsed_time),
+            plt.title('linkage=%s\n(time %.2fs)' % (linkage, elapsed_time),
                       fontdict=dict(verticalalignment='top'))
             plt.axis('equal')
             plt.axis('off')
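To make the docstring's argument concrete, here is a minimal pure-Python sketch of how the linkage rule decides which clusters merge. It is a naive illustration only, not how scikit-learn implements agglomerative clustering; `linkage_distance`, `naive_agglomerative` and the toy `points` are hypothetical names and data chosen for this example.

```python
from itertools import combinations
from math import dist  # Python 3.8+


def linkage_distance(a, b, linkage):
    """Distance between two clusters (lists of points) under a linkage rule."""
    pairwise = [dist(p, q) for p in a for q in b]
    if linkage == "single":    # only the closest pair of points matters
        return min(pairwise)
    if linkage == "complete":  # the farthest pair of points matters
        return max(pairwise)
    if linkage == "average":   # every pair contributes equally
        return sum(pairwise) / len(pairwise)
    raise ValueError("unknown linkage: %r" % linkage)


def naive_agglomerative(points, n_clusters, linkage):
    """Repeatedly merge the two closest clusters (brute force, for intuition)."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # combinations() yields i < j, so popping j leaves index i valid.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage_distance(clusters[ij[0]],
                                                   clusters[ij[1]], linkage))
        clusters[i].extend(clusters.pop(j))
    return clusters


# Points on a line with slowly growing gaps: 1.0, 1.1, 1.2, 1.3, 1.4.
points = [(0.0, 0.0), (1.0, 0.0), (2.1, 0.0), (3.3, 0.0), (4.6, 0.0), (6.0, 0.0)]

for linkage in ("single", "complete"):
    sizes = sorted(len(c) for c in naive_agglomerative(points, 2, linkage))
    print(linkage, sizes)
```

On this toy chain, single linkage keeps absorbing the next point into the growing cluster and ends with sizes 5 and 1, whereas complete linkage penalizes the growing cluster's far end and ends with sizes 4 and 2. This is the percolation behavior the docstring describes.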

diff --git a/examples/cluster/plot_digits_linkage.py b/examples/cluster/plot_digits_linkage.py
@@ -12,8 +12,10 @@
 
 What this example shows us is the behavior "rich getting richer" of
 agglomerative clustering that tends to create uneven cluster sizes.
-This behavior is especially pronounced for the average linkage strategy,
-that ends up with a couple of singleton clusters.
+This behavior is pronounced for the average linkage strategy,
+which ends up with a couple of singleton clusters, while in the case
+of single linkage we get a single central cluster with all other clusters
+being drawn from noise points around the fringes.
 """
 
 # Authors: Gael Varoquaux
@@ -69,7 +71,7 @@ def plot_clustering(X_red, X, labels, title=None):
     if title is not None:
         plt.title(title, size=17)
     plt.axis('off')
-    plt.tight_layout()
+    plt.tight_layout(rect=[0, 0.03, 1, 0.95])
 
 #----------------------------------------------------------------------
 # 2D embedding of the digits dataset
@@ -79,11 +81,11 @@ def plot_clustering(X_red, X, labels, title=None):
 
 from sklearn.cluster import AgglomerativeClustering
 
-for linkage in ('ward', 'average', 'complete'):
+for linkage in ('ward', 'average', 'complete', 'single'):
     clustering = AgglomerativeClustering(linkage=linkage, n_clusters=10)
     t0 = time()
     clustering.fit(X_red)
-    print("%s : %.2fs" % (linkage, time() - t0))
+    print("%s :\t%.2fs" % (linkage, time() - t0))
 
     plot_clustering(X_red, X, clustering.labels_, "%s linkage" % linkage)
 

diff --git a/examples/cluster/plot_linkage_comparison.py b/examples/cluster/plot_linkage_comparison.py
@@ -0,0 +1,149 @@
+"""
+================================================================
+Comparing different hierarchical linkage methods on toy datasets
+================================================================
+
+This example shows characteristics of different linkage
+methods for hierarchical clustering on datasets that are
+"interesting" but still in 2D.
+
+The main observations to make are:
+
+- single linkage is fast, and can perform well on
+  non-globular data, but it performs poorly in the
+  presence of noise.
+- average and complete linkage perform well on
+  cleanly separated globular clusters, but have mixed
+  results otherwise.
+- Ward is the most effective method for noisy data.
+
+While these examples give some intuition about the
+algorithms, this intuition might not apply to very high
+dimensional data.
+"""
+print(__doc__)
+
+import time
+import warnings
+
+import numpy as np
+import matplotlib.pyplot as plt
+
+from sklearn import cluster, datasets
+from sklearn.preprocessing import StandardScaler
+from itertools import cycle, islice
+
+np.random.seed(0)
+
+######################################################################
+# Generate datasets. We choose the size big enough to see the scalability
+# of the algorithms, but not too big to avoid too long running times
+
+n_samples = 1500
+noisy_circles = datasets.make_circles(n_samples=n_samples, factor=.5,
+                                      noise=.05)
+noisy_moons = datasets.make_moons(n_samples=n_samples, noise=.05)
+blobs = datasets.make_blobs(n_samples=n_samples, random_state=8)
+no_structure = np.random.rand(n_samples, 2), None
+
+# Anisotropically distributed data
+random_state = 170
+X, y = datasets.make_blobs(n_samples=n_samples, random_state=random_state)
+transformation = [[0.6, -0.6], [-0.4, 0.8]]
+X_aniso = np.dot(X, transformation)
+aniso = (X_aniso, y)
+
+# blobs with varied variances
+varied = datasets.make_blobs(n_samples=n_samples,
+                             cluster_std=[1.0, 2.5, 0.5],
+                             random_state=random_state)
+
+######################################################################
+# Run the clustering and plot
+
+# Set up cluster parameters
+plt.figure(figsize=(9 * 1.3 + 2, 14.5))
+plt.subplots_adjust(left=.02, right=.98, bottom=.001, top=.96, wspace=.05,
+                    hspace=.01)
+
+plot_num = 1
+
+default_base = {'n_neighbors': 10,
+                'n_clusters': 3}
+
+datasets = [
+    (noisy_circles, {'n_clusters': 2}),
+    (noisy_moons, {'n_clusters': 2}),
+    (varied, {'n_neighbors': 2}),
+    (aniso, {'n_neighbors': 2}),
+    (blobs, {}),
+    (no_structure, {})]
+
+for i_dataset, (dataset, algo_params) in enumerate(datasets):
+    # update parameters with dataset-specific values
+    params = default_base.copy()
+    params.update(algo_params)
+
+    X, y = dataset
+
+    # normalize dataset for easier parameter selection
+    X = StandardScaler().fit_transform(X)
+
+    # ============
+    # Create cluster objects
+    # ============
+    ward = cluster.AgglomerativeClustering(
+        n_clusters=params['n_clusters'], linkage='ward')
+    complete = cluster.AgglomerativeClustering(
+        n_clusters=params['n_clusters'], linkage='complete')
+    average = cluster.AgglomerativeClustering(
+        n_clusters=params['n_clusters'], linkage='average')
+    single = cluster.AgglomerativeClustering(
+        n_clusters=params['n_clusters'], linkage='single')
+
+    clustering_algorithms = (
+        ('Single Linkage', single),
+        ('Average Linkage', average),
+        ('Complete Linkage', complete),
+        ('Ward Linkage', ward),
+    )
+
+    for name, algorithm in clustering_algorithms:
+        t0 = time.time()
+
+        # catch warnings related to kneighbors_graph
+        with warnings.catch_warnings():
+            warnings.filterwarnings(
+                "ignore",
+                message="the number of connected components of the " +
+                "connectivity matrix is [0-9]{1,2}" +
+                " > 1. Completing it to avoid stopping the tree early.",
+                category=UserWarning)
+            algorithm.fit(X)
+
+        t1 = time.time()
+        if hasattr(algorithm, 'labels_'):
+            y_pred = algorithm.labels_.astype(int)
+        else:
+            y_pred = algorithm.predict(X)
+
+        plt.subplot(len(datasets), len(clustering_algorithms), plot_num)
+        if i_dataset == 0:
+            plt.title(name, size=18)
+
+        colors = np.array(list(islice(cycle(['#377eb8', '#ff7f00', '#4daf4a',
+                                             '#f781bf', '#a65628', '#984ea3',
+                                             '#999999', '#e41a1c', '#dede00']),
+                                      int(max(y_pred) + 1))))
+        plt.scatter(X[:, 0], X[:, 1], s=10, color=colors[y_pred])
+
+        plt.xlim(-2.5, 2.5)
+        plt.ylim(-2.5, 2.5)
+        plt.xticks(())
+        plt.yticks(())
+        plt.text(.99, .01, ('%.2fs' % (t1 - t0)).lstrip('0'),
+                 transform=plt.gca().transAxes, size=15,
+                 horizontalalignment='right')
+        plot_num += 1
+
+plt.show()
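A stripped-down, plot-free sketch of the same comparison may be handy for quick checks while reviewing. It assumes a scikit-learn build that includes this change (so that `linkage='single'` is accepted); the sample count and `random_state` are arbitrary choices for illustration, not values taken from the example above.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_moons

# Two interleaved half-circles: the non-globular case from the example.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

cluster_sizes = {}
for linkage in ("single", "average", "complete", "ward"):
    model = AgglomerativeClustering(n_clusters=2, linkage=linkage).fit(X)
    labels = model.labels_
    # Count how many samples landed in each of the two clusters.
    cluster_sizes[linkage] = sorted(int((labels == k).sum()) for k in (0, 1))
    print("%-8s cluster sizes: %s" % (linkage, cluster_sizes[linkage]))
```

Comparing the printed sizes across linkages gives a rough, numeric view of the uneven-cluster behavior discussed in the documentation changes, without needing the figure.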