Description
Single linkage clustering fails for sufficiently large data arrays. This is due to problems in SciPy's single linkage clustering; see issue scipy/scipy#9031. Pushing a fix upstream to SciPy is complicated by the way the code is structured: some parts are in pure C++ and inaccessible from Cython.
Steps/Code to Reproduce
import numpy as np
import sklearn.cluster
data = np.random.normal(size=(64000,2))
clusterer = sklearn.cluster.AgglomerativeClustering(linkage='single').fit(data)
Expected Results
clusterer is assigned a trained single linkage clustering instance.
Actual Results
On my Mac laptop this simply freezes the whole machine. On Linux it raises a MemoryError with no traceback. If the failure is not reproducible on your machine, simply make data a larger array.
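To help pick an array size that triggers the failure on a given machine, here is a rough back-of-the-envelope sketch. It assumes (this is not confirmed above) that the failing code path materializes a full condensed pairwise-distance array of float64 values, the layout that `scipy.spatial.distance.pdist` returns. The helper name `condensed_distance_bytes` is made up for illustration.

```python
def condensed_distance_bytes(n_samples, itemsize=8):
    """Bytes needed to hold n_samples * (n_samples - 1) / 2 distances.

    Assumption: one float64 (8 bytes) per unordered pair of samples,
    i.e. a condensed distance array. Hypothetical helper, not a
    scikit-learn or SciPy function.
    """
    n_pairs = n_samples * (n_samples - 1) // 2
    return n_pairs * itemsize


# Size used in the reproduction snippet above.
n_bytes = condensed_distance_bytes(64000)
print("{:.1f} GB".format(n_bytes / 1e9))  # prints "16.4 GB"
```

Under that assumption, 64000 samples already need more memory than a typical laptop has, which would be consistent with the freeze and the MemoryError. The actual memory use of the SciPy code may differ.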
Versions
Darwin-17.6.0-x86_64-i386-64bit
Python 3.5.4 |Continuum Analytics, Inc.| (default, Aug 14 2017, 12:43:10)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
NumPy 1.11.3
SciPy 1.0.0
Scikit-Learn 0.19.1