sqeuclidean metric is much slower than euclidean in pairwise_distances #12600

rth opened this issue Nov 15, 2018 · 3 comments

@rth
Member

rth commented Nov 15, 2018

Currently, passing metric='sqeuclidean' to pairwise_distances falls back to scipy's pairwise distance calculations instead of using the fast (and less accurate) implementation in scikit-learn. As a result, it can be up to 10x slower in some cases:

In [1]: from sklearn.metrics import pairwise_distances
   ...: import numpy as np

In [2]: rng = np.random.RandomState(1)
   ...: a = rng.rand(1000, 300)
   ...: b = rng.rand(1000, 300)

In [3]: %timeit pairwise_distances(a, b, metric='euclidean')**2
21.5 ms ± 804 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [4]: %timeit pairwise_distances(a, b, metric='sqeuclidean')
212 ms ± 2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

This is particularly confusing because people who use sqeuclidean will, I think, mostly do so for performance reasons, to save one square root computation with respect to euclidean.
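For context on the "fast (and less accurate)" remark above, here is a minimal sketch of the dot-product expansion ||x - y||² = ||x||² - 2 x·y + ||y||², which lets a single matrix product do most of the work. The helper name sqeuclidean_via_dot is hypothetical and this is not scikit-learn's actual code; it only illustrates the trade-off, with scipy's cdist as the reference.

```python
import numpy as np
from scipy.spatial.distance import cdist


def sqeuclidean_via_dot(a, b):
    """Squared Euclidean distances via ||x||^2 - 2 x.y + ||y||^2.

    Hypothetical helper for illustration only. The subtraction can lose
    precision when two points are very close, hence "less accurate".
    """
    a_sq = np.einsum("ij,ij->i", a, a)[:, np.newaxis]  # row norms of a, shape (n, 1)
    b_sq = np.einsum("ij,ij->i", b, b)[np.newaxis, :]  # row norms of b, shape (1, m)
    d = a_sq - 2.0 * (a @ b.T) + b_sq
    np.maximum(d, 0, out=d)  # clip small negatives caused by rounding
    return d


rng = np.random.RandomState(1)
a = rng.rand(1000, 300)
b = rng.rand(1000, 300)

fast = sqeuclidean_via_dot(a, b)
exact = cdist(a, b, metric="sqeuclidean")  # direct per-pair computation
max_abs_err = np.abs(fast - exact).max()
print("largest absolute difference:", max_abs_err)
```

On well-separated random data like this the two results should agree to within floating-point rounding; the accuracy gap mainly matters for nearly identical points or low-precision dtypes.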

@jeremiedbb
Member

This is just a matter of documentation because you can use sklearn's implementation with

pairwise_distances(a, b, metric='euclidean', squared=True)
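As a quick sanity check of this workaround, the sketch below compares three routes to squared distances. It assumes that extra keyword arguments given to pairwise_distances are forwarded to the underlying metric function (here euclidean_distances, which accepts squared); the array sizes are arbitrary.

```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.metrics.pairwise import euclidean_distances

rng = np.random.RandomState(1)
a = rng.rand(200, 30)
b = rng.rand(150, 30)

# Assumption: squared=True is passed through to euclidean_distances.
d_kw = pairwise_distances(a, b, metric="euclidean", squared=True)

# Calling scikit-learn's implementation directly.
d_direct = euclidean_distances(a, b, squared=True)

# The metric name discussed in this issue.
d_sq = pairwise_distances(a, b, metric="sqeuclidean")

print(np.allclose(d_kw, d_direct), np.allclose(d_kw, d_sq))
```

All three should return the same values up to rounding, so the choice between them is about speed and accuracy rather than results.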

@amueller
Member

The question is also what's the expected behavior for sqeuclidean.

@jnothman
Member

jnothman commented Nov 18, 2018 via email
