Description
Describe the workflow you want to enable
Currently, KNeighborsRegressor.predict() accepts None as input, in which case it returns predictions for all samples in the training set, each computed from that sample's nearest neighbors excluding the sample itself (consistent with NearestNeighbors behavior). However, KNeighborsClassifier.predict() does not accept None as input. This is inconsistent and should arguably be harmonized:
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor, NearestNeighbors
import numpy as np
X = np.random.normal(size=(10, 5))
y = np.random.normal(size=(10, 1))
knn = NearestNeighbors(n_neighbors=3)
knn.fit(X)
knn.kneighbors() # works
knn = KNeighborsRegressor(n_neighbors=3)
knn.fit(X, y)
knn.predict(None) # works (NB: does not work without "None")
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, np.ravel(y) > 0)
knn.predict(None) # fails with an error
Describe your proposed solution
My proposed solution is to make KNeighborsClassifier.predict(None) behave the same as KNeighborsRegressor.predict(None). As explained in #27747, the necessary fix requires changing only two lines of code.
Additional context
As explained in #27747, this would be a useful and convenient feature for computing LOOCV accuracy simply via score(None, y). Using score(X, y), where X is the training set passed to fit(X, y), gives a biased (overly optimistic) result because each training sample is included among its own neighbors.
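To illustrate the bias, here is a minimal sketch of a workaround that avoids predict(None) altogether. It relies on KNeighborsClassifier.kneighbors() called without X, which excludes each training sample from its own neighbor list, and then takes a manual majority vote. The random data, the binary labels, the choice of n_neighbors=1, and the names loo_accuracy and resub_accuracy are assumptions made for the example only; this is not the fix proposed in #27747.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
# Random binary labels, unrelated to X (illustrative data only).
y = (rng.normal(size=50) > 0).astype(int)

# n_neighbors=1 makes the bias obvious; any odd k avoids ties for binary labels.
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)

# Without X, kneighbors() queries the training set and leaves each
# sample out of its own neighbor list.
neighbor_idx = clf.kneighbors(return_distance=False)  # shape (n_samples, k)

# Manual majority vote over the neighbors' labels (uniform weights).
votes = y[neighbor_idx]
loo_pred = (votes.mean(axis=1) > 0.5).astype(int)
loo_accuracy = np.mean(loo_pred == y)

# Scoring on the training set lets each sample vote for itself.
resub_accuracy = clf.score(X, y)

print(f"leave-one-out accuracy: {loo_accuracy:.2f}")
print(f"score(X, y) on the training set: {resub_accuracy:.2f}")
```

With a single neighbor, score(X, y) is perfect because every sample is its own nearest neighbor, while the leave-one-out estimate reflects that the labels carry no signal. Accepting None in predict() and score() would make the manual vote unnecessary.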