Estimators with metric='euclidean' default but supporting metric_params and p should instead have metric='minkowski' #12437
Comments
which estimators are those? |
+1 to change the default in OPTICS (without deprecation cycle), especially considering that our euclidean is not robust. |
from sklearn.utils.testing import all_estimators, ignore_warnings

for name, estimator in all_estimators():
    with ignore_warnings():
        params = estimator().get_params()
    keys = ['p', 'metric']
    if any(key in params for key in keys):
        msg = name + '('
        for key in keys:
            if key in params:
                msg += '%s=%r, ' % (key, params[key])
        msg = msg[:-2] + ')'
        print(msg)

# Results
DBSCAN(p=None, metric='euclidean')
KNeighborsClassifier(p=2, metric='minkowski')
KNeighborsRegressor(p=2, metric='minkowski')
KernelDensity(metric='euclidean')
LocalOutlierFactor(p=2, metric='minkowski')
MDS(metric=True)
NearestCentroid(metric='euclidean')
NearestNeighbors(p=2, metric='minkowski')
OPTICS(p=2, metric='euclidean')
RadiusNeighborsClassifier(p=2, metric='minkowski')
RadiusNeighborsRegressor(p=2, metric='minkowski')
TSNE(metric='euclidean')

So we have:
|
Hmmm.... @qinhanmin2014 has implied that this isn't the full story. Where Should pairwise_distances be special-casing the minkowski(p=2) case as equivalent to euclidean (and similar for manhattan)? Perhaps we should only do this after we've solved open issues with the precision of |
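To make the special-casing question concrete, here is a minimal sketch of what such a mapping could look like. The helper name `canonicalize_metric` is hypothetical and is not part of scikit-learn; it only illustrates treating `minkowski` with `p=2` as `euclidean` and `p=1` as `manhattan`.

```python
def canonicalize_metric(metric, p=2):
    """Hypothetical helper (not a scikit-learn API).

    Maps the Minkowski special cases onto their named equivalents so that
    a specialised implementation could be chosen for them.
    """
    if metric == "minkowski":
        if p == 1:
            return "manhattan"
        if p == 2:
            return "euclidean"
    # Any other metric, or any other value of p, is passed through unchanged.
    return metric


print(canonicalize_metric("minkowski", p=2))
print(canonicalize_metric("minkowski", p=1))
print(canonicalize_metric("minkowski", p=3))
```

Whether the special-cased path should be used at all depends on it being at least as accurate as the generic Minkowski computation, which is the concern raised in this thread.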
I'll try to do this! |
Yes, the solution proposed by @jnothman in #12601 (review) would be most elegant, but we need to come up with a solution to Euclidean distance accuracy first. |
This is available for contributors to attempt, but it is not trivial. |
Where estimators can be configured for metric, the point of providing a parameter `p` is that it should be possible to switch between Manhattan, Euclidean and other Minkowski distances just by modifying `p` (without changing `metric`). However, this is not possible where `metric='euclidean'` by default. Rather, where `p` is available, `metric` should be `'minkowski'` by default.

We should change the default metric in OPTICS without deprecation as it has not been released. Elsewhere, we need to:
- ... `p != 2`
- ... `metric` to minkowski when that deprecation period is over.
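The motivation can be illustrated with a small pure-Python sketch. The `distance` function below is a hypothetical stand-in for an estimator's metric handling, not scikit-learn code: with a `'minkowski'` default, changing `p` alone switches the distance, whereas with `'euclidean'` the value of `p` has no effect.

```python
import math


def minkowski(x, y, p=2):
    # Minkowski distance of order p between two equal-length sequences.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def distance(x, y, metric="minkowski", p=2):
    # Hypothetical dispatcher, for illustration only.
    if metric == "minkowski":
        return minkowski(x, y, p)
    if metric == "euclidean":
        # p is silently ignored here, which is the problem described above.
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    raise ValueError("unsupported metric: %r" % (metric,))


x, y = (0.0, 0.0), (3.0, 4.0)
manhattan = distance(x, y, p=1)                           # |3| + |4|
euclid = distance(x, y, p=2)                              # sqrt(9 + 16)
ignored_p = distance(x, y, metric="euclidean", p=1)       # still Euclidean
print(manhattan, euclid, ignored_p)
```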