Estimators with metric='euclidean' default but supporting metric_params and p should instead have metric='minkowski' #12437

Open
jnothman opened this issue Oct 22, 2018 · 7 comments
Labels
help wanted, Moderate (anything that requires some knowledge of conventions and best practices)

Comments

@jnothman
Member

Where estimators can be configured for metric, the point of providing a parameter p is that it should be possible to switch between Manhattan, Euclidean and other Minkowski distances just by modifying p (without changing metric). However, this is not possible where metric='euclidean' by default. Rather, where p is available, metric should be 'minkowski' by default.
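To make the role of p concrete, here is a minimal pure-Python sketch of the Minkowski distance. The function name `minkowski` and the sample points are illustrative only and are not taken from scikit-learn's implementation.

```python
def minkowski(x, y, p=2):
    """Minkowski distance of order p between two equal-length sequences."""
    if p < 1:
        raise ValueError("p must be at least 1 for this to be a metric")
    # Sum the p-th powers of the coordinate differences, then take the p-th root.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


x, y = (0.0, 0.0), (3.0, 4.0)
manhattan = minkowski(x, y, p=1)  # |3| + |4|
euclidean = minkowski(x, y, p=2)  # sqrt(3**2 + 4**2)
```

With metric='minkowski', changing p alone moves between these two distances, which is the behaviour requested here.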

We should change the default metric in OPTICS without deprecation as it has not been released. Elsewhere, we need to:

  • Warn for two versions if the user has not set metric but has set p != 2
  • Change the default metric to minkowski when that deprecation period is over.
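The two steps above could be sketched roughly as follows. This is only an illustration under assumptions: `resolve_metric` and the `_UNSET` sentinel are hypothetical names, the warning class and message are placeholders, and none of this is existing scikit-learn code.

```python
import warnings

_UNSET = object()  # hypothetical sentinel meaning "the user did not pass metric"


def resolve_metric(metric=_UNSET, p=2):
    """Return the metric name to use during the transition period."""
    if metric is _UNSET:
        if p != 2:
            # The user changed p but left metric at its default, so p has no effect yet.
            warnings.warn(
                "p=%r has no effect while the default metric is 'euclidean'. "
                "The default metric will change to 'minkowski' in a future "
                "release; pass metric='minkowski' explicitly to use p." % p,
                FutureWarning,
            )
        return "euclidean"  # current default, kept until the deprecation period ends
    return metric
```

Once the deprecation period is over, the sentinel branch would simply return 'minkowski' and the warning could be dropped.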
@jnothman added the Easy and help wanted labels on Oct 22, 2018
@amueller
Member

which estimators are those?

@qinhanmin2014
Member

+1 to change the default in OPTICS (without deprecation cycle), especially considering that our euclidean is not robust.

@TomDLT
Member

TomDLT commented Oct 23, 2018

> which estimators are those?

from sklearn.utils.testing import all_estimators, ignore_warnings

for name, estimator in all_estimators():
    with ignore_warnings():
        params = estimator().get_params()

    keys = ['p', 'metric']

    if any(key in params for key in keys):
        msg = name + '('
        for key in keys:
            if key in params:
                msg += '%s=%r, ' % (key, params[key])
        msg = msg[:-2] + ')'
        print(msg)
# Results
DBSCAN(p=None, metric='euclidean')
KNeighborsClassifier(p=2, metric='minkowski')
KNeighborsRegressor(p=2, metric='minkowski')
KernelDensity(metric='euclidean')
LocalOutlierFactor(p=2, metric='minkowski')
MDS(metric=True)
NearestCentroid(metric='euclidean')
NearestNeighbors(p=2, metric='minkowski')
OPTICS(p=2, metric='euclidean')
RadiusNeighborsClassifier(p=2, metric='minkowski')
RadiusNeighborsRegressor(p=2, metric='minkowski')
TSNE(metric='euclidean')

So we have:

@jnothman
Member Author

Hmmm.... @qinhanmin2014 has implied that this isn't the full story. Where sklearn.metrics.pairwise is used, unlike sklearn.neighbors, the implementation of minkowski(p=2) is not the same as the implementation of euclidean.

Should pairwise_distances be special-casing the minkowski(p=2) case as equivalent to euclidean (and similar for manhattan)? Perhaps we should only do this after we've solved open issues with the precision of euclidean_distances.
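As a rough illustration of the special-casing being asked about, a dispatch step could map the Minkowski special cases onto their dedicated metric names before any distance is computed. The helper `canonicalize_metric` is a hypothetical name for this sketch and is not part of sklearn.metrics.pairwise.

```python
def canonicalize_metric(metric, p=2):
    """Map Minkowski special cases onto their dedicated metric names."""
    if metric == "minkowski":
        if p == 1:
            return "manhattan"
        if p == 2:
            return "euclidean"
    # Any other metric, or any other p, is passed through unchanged.
    return metric
```

Under this sketch, minkowski with p=2 would take the same code path as euclidean, which is why the precision of that path matters before making the change.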

@giba0

giba0 commented Nov 5, 2018

I'll try to do this!

@rth
Member

rth commented Nov 19, 2018

> Should pairwise_distances be special-casing the minkowski(p=2) case as equivalent to euclidean (and similar for manhattan)? Perhaps we should only do this after we've solved open issues with the precision of euclidean_distances.

Yes, the solution proposed by @jnothman in #12601 (review) would be most elegant, but we need to come up with a solution to Euclidean distance accuracy first.
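For readers unfamiliar with the accuracy concern, the pure-Python sketch below contrasts a direct difference-based Euclidean distance with the dot-product expansion sqrt(x·x - 2 x·y + y·y), which is the form the euclidean_distances documentation describes. The function names and the sample values are illustrative assumptions, chosen so that float64 cancellation is visible; this is not the library's code.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def euclidean_direct(x, y):
    # Subtract first, then square: small differences are preserved.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def euclidean_expanded(x, y):
    # Expand ||x - y||^2 into dot products; large, nearly equal terms cancel.
    squared = dot(x, x) - 2.0 * dot(x, y) + dot(y, y)
    return math.sqrt(max(squared, 0.0))  # clip tiny negatives caused by rounding


x, y = (1e8,), (1e8 + 1.0,)
direct = euclidean_direct(x, y)      # the points are exactly 1 apart
expanded = euclidean_expanded(x, y)  # the cancellation loses that difference
```

In this example the expanded form loses the unit difference entirely, while the direct form keeps it.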

@jnothman
Member Author

jnothman commented Mar 4, 2019

This is available for contributors to attempt, but it is not trivial.

@amueller removed the Easy label on Jul 12, 2019
@cmarmo added the Moderate label on Aug 23, 2020
7 participants