Estimators with metric='euclidean' default but supporting metric_params and p should instead have metric='minkowski' #12437

jnothman · 2018-10-22T23:00:20Z

Where estimators can be configured for metric, the point of providing a parameter p is that it should be possible to switch between Manhattan, Euclidean and other Minkowski distances just by modifying p (without changing metric). However, this is not possible where metric='euclidean' by default. Rather, where p is available, metric should be 'minkowski' by default.
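For readers less familiar with the relationship between p and the named metrics, here is a minimal pure-Python sketch. The helper name `minkowski_distance` is made up for illustration and is not a scikit-learn function; it only shows that p=1 gives Manhattan and p=2 gives Euclidean.

```python
def minkowski_distance(x, y, p=2):
    """Minkowski distance of order p between two equal-length sequences.

    Illustrative helper only; not part of scikit-learn.
    """
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


a, b = [0.0, 0.0], [3.0, 4.0]

# p=1 is the Manhattan distance: |3| + |4|
manhattan = minkowski_distance(a, b, p=1)

# p=2 is the Euclidean distance: sqrt(3**2 + 4**2)
euclidean = minkowski_distance(a, b, p=2)
```

With metric='minkowski' as the default, changing p alone would select between these; with metric='euclidean' as the default, p has no effect unless metric is changed as well.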

We should change the default metric in OPTICS without deprecation as it has not been released. Elsewhere, we need to:

Warn for two versions if the user has not set metric but has set p != 2
Change the default metric to minkowski when that deprecation period is over.
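The warning step could look roughly like the sketch below. Everything here is hypothetical: `SketchEstimator`, the `metric=None` sentinel used to detect "the user has not set metric", and the `effective_metric` helper are assumptions for illustration, not existing scikit-learn code.

```python
import warnings


class SketchEstimator:
    """Hypothetical estimator used only to illustrate the deprecation path."""

    def __init__(self, metric=None, p=2):
        # Assumption: None is a sentinel meaning "metric was not set by the user".
        self.metric = metric
        self.p = p

    def effective_metric(self):
        if self.metric is None:
            if self.p != 2:
                # During the deprecation period the old behaviour is kept,
                # so p is still ignored, but the user is told about it.
                warnings.warn(
                    "p is ignored while the default metric is 'euclidean'; "
                    "the default will change to 'minkowski'. Set metric "
                    "explicitly to silence this warning.",
                    FutureWarning,
                )
            return "euclidean"
        return self.metric
```

Once the deprecation period is over, the sentinel branch would simply return 'minkowski' and the warning would be dropped.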


amueller · 2018-10-23T00:55:29Z

which estimators are those?

qinhanmin2014 · 2018-10-23T01:39:02Z

+1 to change the default in OPTICS (without deprecation cycle), especially considering that our euclidean is not robust.

TomDLT · 2018-10-23T09:37:22Z

> which estimators are those?

from sklearn.utils.testing import all_estimators, ignore_warnings

for name, estimator in all_estimators():
    with ignore_warnings():
        params = estimator().get_params()

    keys = ['p', 'metric']

    if any(key in params for key in keys):
        msg = name + '('
        for key in keys:
            if key in params:
                msg += '%s=%r, ' % (key, params[key])
        msg = msg[:-2] + ')'
        print(msg)

# Results
DBSCAN(p=None, metric='euclidean')
KNeighborsClassifier(p=2, metric='minkowski')
KNeighborsRegressor(p=2, metric='minkowski')
KernelDensity(metric='euclidean')
LocalOutlierFactor(p=2, metric='minkowski')
MDS(metric=True)
NearestCentroid(metric='euclidean')
NearestNeighbors(p=2, metric='minkowski')
OPTICS(p=2, metric='euclidean')
RadiusNeighborsClassifier(p=2, metric='minkowski')
RadiusNeighborsRegressor(p=2, metric='minkowski')
TSNE(metric='euclidean')

So we have:

DBSCAN. Interestingly, the function uses different defaults: dbscan(p=2, metric='minkowski'). We should probably fix this.
OPTICS ([MRG] MNT Change default metric in OPTICS #12439)
TSNE if we want to add parameter p ([MRG] Add distance metric_params argument to TSNE constructor #12387).
KernelDensity if we want to add parameter p.
MDS is not concerned.
other estimators already have metric='minkowski' as default.

jnothman · 2018-10-23T09:47:17Z

Hmmm.... @qinhanmin2014 has implied that this isn't the full story. Where sklearn.metrics.pairwise is used, unlike sklearn.neighbors, the implementation of minkowski(p=2) is not the same as the implementation of euclidean.

Should pairwise_distances be special-casing the minkowski(p=2) case as equivalent to euclidean (and similar for manhattan)? Perhaps we should only do this after we've solved open issues with the precision of euclidean_distances.
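To make the special-casing idea concrete, a rough sketch follows. The function `resolve_metric` is hypothetical and does not exist in scikit-learn; it only shows the kind of dispatch that pairwise_distances could do before picking an implementation.

```python
def resolve_metric(metric, p=2):
    """Map minkowski with p=1 or p=2 onto the equivalent named metric.

    Hypothetical helper for illustration; any other metric or p value
    is returned unchanged.
    """
    if metric == "minkowski":
        if p == 2:
            return "euclidean"
        if p == 1:
            return "manhattan"
    return metric
```

With such a mapping, minkowski(p=2) and euclidean would share one code path, which is why the precision of that path matters before making the switch.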

giba0 · 2018-11-05T23:24:26Z

I'll try to do this!

rth · 2018-11-19T10:27:37Z

> Should pairwise_distances be special-casing the minkowski(p=2) case as equivalent to euclidean (and similar for manhattan)? Perhaps we should only do this after we've solved open issues with the precision of euclidean_distances.

Yes, the solution proposed by @jnothman in #12601 (review) would be most elegant, but we need to come up with a solution to Euclidean distance accuracy first.
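As a rough illustration of the accuracy concern, the pure-Python sketch below compares a direct difference-based distance with the dot-product expansion sqrt(x·x - 2 x·y + y·y). The function names are made up for this example, and the expansion is assumed here to stand in for the formula that euclidean_distances is documented to use; this is not the library's code.

```python
import math


def direct_distance(x, y):
    """Euclidean distance computed from coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def expanded_distance(x, y):
    """Euclidean distance via the expansion x.x - 2 x.y + y.y.

    The subtraction of large, nearly equal terms can lose precision.
    """
    xx = sum(a * a for a in x)
    yy = sum(b * b for b in y)
    xy = sum(a * b for a, b in zip(x, y))
    # Clamp at zero because rounding can make the expression negative.
    return math.sqrt(max(xx - 2.0 * xy + yy, 0.0))


# Two points that are far from the origin but close to each other.
x, y = [1e8], [1e8 + 1.0]
direct = direct_distance(x, y)
expanded = expanded_distance(x, y)
```

For these inputs the two results disagree even in float64, so silently routing minkowski(p=2) to the expanded form could change results for some users.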

jnothman · 2019-03-04T07:30:04Z

This is available for contributors to attempt, but it is not trivial.

jnothman added Easy Well-defined and straightforward way to resolve help wanted labels Oct 22, 2018

qinhanmin2014 mentioned this issue Oct 23, 2018

[MRG] MNT Change default metric in OPTICS #12439

Merged

TomDLT mentioned this issue Oct 23, 2018

[MRG] Add distance metric_params argument to TSNE constructor #12387

Closed

giba0 mentioned this issue Nov 6, 2018

[WIP] Fix Estimators with metric='euclidean' default but supporting metric_params and p should instead have metric='minkowski' #12524

Closed

rth removed the help wanted label Nov 18, 2018

jnothman added the help wanted label Mar 4, 2019

amueller removed the Easy Well-defined and straightforward way to resolve label Jul 12, 2019

cmarmo added the Moderate Anything that requires some knowledge of conventions and best practices label Aug 23, 2020
