knn predict unreasonably slow b/c of use of scipy.stats.mode #13783

Closed
@amueller

Description

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier

X, y = make_blobs(centers=2, random_state=4, n_samples=30)
knn = KNeighborsClassifier(algorithm='kd_tree').fit(X, y)

x_min, x_max = X[:, 0].min(), X[:, 0].max()
y_min, y_max = X[:, 1].min(), X[:, 1].max()

xx = np.linspace(x_min, x_max, 1000)
# change 100 to 1000 below and wait a long time
yy = np.linspace(y_min, y_max, 100)

X1, X2 = np.meshgrid(xx, yy)
X_grid = np.c_[X1.ravel(), X2.ravel()]
decision_values = knn.predict(X_grid)

This spends all its time in unique within stats.mode, not in the distance calculation: mode runs unique once for every row.
I'm pretty sure we can replace the call to mode by building a csr matrix of per-class neighbor counts and then taking the argmax over each row.
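
A minimal sketch of that idea, not the actual scikit-learn code path. It assumes uniform weights and that the neighbor labels are already encoded as integers in 0..n_classes-1. The helper name vote_mode is made up for illustration.

```python
import numpy as np
from scipy import sparse


def vote_mode(neigh_labels, n_classes):
    """Majority vote per row (hypothetical helper, not a sklearn API).

    neigh_labels: (n_queries, k) integer array holding the class index of
    each neighbor, with values in [0, n_classes).
    """
    n_queries, k = neigh_labels.shape
    # One (row, class) entry per neighbor, each worth one vote.
    rows = np.repeat(np.arange(n_queries), k)
    votes = np.ones(n_queries * k, dtype=np.intp)
    # Duplicate (row, col) pairs are summed when the matrix is built,
    # so counts[i, c] is the number of neighbors of query i in class c.
    counts = sparse.csr_matrix(
        (votes, (rows, neigh_labels.ravel())),
        shape=(n_queries, n_classes),
    )
    # Row-wise argmax picks the most frequent class. How ties are broken
    # is not checked here and may differ from stats.mode.
    return np.asarray(counts.argmax(axis=1)).ravel()


labels = np.array([[0, 1, 1],
                   [2, 2, 0],
                   [0, 0, 0]])
pred = vote_mode(labels, n_classes=3)
```

This builds the whole count matrix in one vectorized call instead of calling unique per row, which is where the time goes in the example above.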

How much is it worth optimizing this? I feel KNN should be fast in low dimensions and people might actually use this. Having the bottleneck in the wrong place just feels wrong to me ;)
