sqeuclidean metric is much slower than euclidean in pairwise_distances #12600

rth · 2018-11-15T21:50:18Z

Currently, passing metric='sqeuclidean' to pairwise_distances falls back to scipy pairwise distance calculations instead of using the fast (and less accurate) implementation in scikit-learn. As a result, it can be up to 10x slower in some cases:

In [1]: from sklearn.metrics import pairwise_distances
   ...: import numpy as np

In [2]: rng = np.random.RandomState(1)
   ...: a = rng.rand(1000, 300)
   ...: b = rng.rand(1000, 300)

In [3]: %timeit pairwise_distances(a, b, metric='euclidean')**2
21.5 ms ± 804 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [4]: %timeit pairwise_distances(a, b, metric='sqeuclidean')
212 ms ± 2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

This is particularly confusing because people who use sqeuclidean will, I think, mostly do so for performance reasons: to save one square root computation with respect to euclidean.
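For context on the "fast (and less accurate)" path: scikit-learn's euclidean_distances is documented as using the expansion ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2, which turns the bulk of the work into a single matrix product. The NumPy sketch below only illustrates that idea and is not scikit-learn's actual code; the helper names sq_dists_expanded and sq_dists_direct are made up for this example.

```python
import numpy as np


def sq_dists_expanded(a, b):
    """Squared distances via ||x||^2 - 2 x.y + ||y||^2 (one matrix product)."""
    aa = np.einsum("ij,ij->i", a, a)[:, np.newaxis]  # row norms of a, as a column
    bb = np.einsum("ij,ij->i", b, b)[np.newaxis, :]  # row norms of b, as a row
    d = aa - 2.0 * (a @ b.T) + bb
    np.maximum(d, 0, out=d)  # rounding can leave tiny negative values
    return d


def sq_dists_direct(a, b):
    """Squared distances by explicitly forming every difference vector."""
    diff = a[:, np.newaxis, :] - b[np.newaxis, :, :]
    return np.einsum("ijk,ijk->ij", diff, diff)


rng = np.random.RandomState(1)
a = rng.rand(50, 30)
b = rng.rand(40, 30)

expanded = sq_dists_expanded(a, b)
direct = sq_dists_direct(a, b)
max_abs_diff = np.abs(expanded - direct).max()
print("max abs difference:", max_abs_diff)
```

In float64 on well-scaled data like this the two agree closely. The accuracy concern comes from cancellation in the expanded form, which matters more in float32 or when the points are very close together relative to their norms.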

jeremiedbb · 2018-11-15T21:56:32Z

This is just a matter of documentation, because you can use sklearn's implementation with:

pairwise_distances(a, b, metric='euclidean', squared=True)
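A minimal check of this workaround, assuming a reasonably recent scikit-learn: with metric='euclidean', pairwise_distances dispatches to euclidean_distances, which accepts squared=True. The sketch calls euclidean_distances directly and compares it with the scipy-backed 'sqeuclidean' path. The array sizes are arbitrary.

```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.metrics.pairwise import euclidean_distances

rng = np.random.RandomState(1)
a = rng.rand(100, 30)
b = rng.rand(80, 30)

# scikit-learn's own implementation, skipping the final square root
fast = euclidean_distances(a, b, squared=True)

# 'sqeuclidean' is handled by scipy's pairwise distance routines
reference = pairwise_distances(a, b, metric="sqeuclidean")

same = np.allclose(fast, reference)
print("results agree:", same)
```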

amueller · 2018-11-15T21:57:38Z

The question is also what's the expected behavior for sqeuclidean.

jnothman · 2018-11-18T05:19:28Z

I suppose this is similar to the fact that we use different implementations for euclidean and for minkowski with p=2. Should we be trying to resolve aliases?
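To make the alias point concrete, here is a small sketch showing that the two spellings give numerically matching results even though they may be routed to different code. It assumes that extra keyword arguments such as p are forwarded to the underlying metric, as the pairwise_distances documentation describes.

```python
import numpy as np
from sklearn.metrics import pairwise_distances

rng = np.random.RandomState(1)
a = rng.rand(100, 30)
b = rng.rand(80, 30)

# Same mathematical metric, requested under two different names
via_euclidean = pairwise_distances(a, b, metric="euclidean")
via_minkowski = pairwise_distances(a, b, metric="minkowski", p=2)

aliases_agree = np.allclose(via_euclidean, via_minkowski)
print("euclidean == minkowski(p=2):", aliases_agree)
```

Resolving such aliases up front would mean mapping 'minkowski' with p=2, and likewise 'sqeuclidean', onto the same fast implementation before dispatching.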

rth mentioned this issue Nov 15, 2018

Numerical precision of euclidean_distances with float32 #9354

Closed

rth mentioned this issue Nov 15, 2018

[MRG] Use fast pairwise distance calculations for metric='sqeuclidean' #12601

Closed

TomDLT added the Performance label Nov 16, 2018

cmarmo added the module:cluster label Feb 6, 2022
