import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier

# Small 2d toy problem.
X, y = make_blobs(centers=2, random_state=4, n_samples=30)
knn = KNeighborsClassifier(algorithm='kd_tree').fit(X, y)

# Dense grid over the data range, as you would build to plot a
# decision surface.
x_min, x_max = X[:, 0].min(), X[:, 0].max()
y_min, y_max = X[:, 1].min(), X[:, 1].max()
xx = np.linspace(x_min, x_max, 1000)
# change 100 to 1000 below and wait a long time
yy = np.linspace(y_min, y_max, 100)
X1, X2 = np.meshgrid(xx, yy)
X_grid = np.c_[X1.ravel(), X2.ravel()]
decision_values = knn.predict(X_grid)
This spends all its time in unique within stats.mode, not within the distance calculation: mode runs unique once for every row of the neighbor-label array.
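You can confirm this with a quick profile of the snippet above (assuming knn and X_grid are still in scope); most of the cumulative time should land under stats.mode and unique rather than in the kd-tree query:

import cProfile

# Expect scipy.stats.mode and numpy's unique near the top of the
# cumulative-time column, not the neighbor search itself.
cProfile.run('knn.predict(X_grid)', sort='cumulative')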
I'm pretty sure we can replace the call to mode by building a CSR matrix of per-class vote counts and taking an argmax over it.
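Roughly like the sketch below; mode_via_csr and its signature are just illustrative, and it assumes integer labels in [0, n_classes):

import numpy as np
from scipy import sparse

def mode_via_csr(neigh_labels, n_classes):
    # neigh_labels: (n_samples, n_neighbors) array of integer class labels.
    n_samples, n_neighbors = neigh_labels.shape
    # One entry per (sample, neighbor-label) pair; the CSR constructor
    # sums duplicate coordinates, giving per-class vote counts.
    rows = np.repeat(np.arange(n_samples), n_neighbors)
    votes = sparse.csr_matrix(
        (np.ones(rows.shape[0]), (rows, neigh_labels.ravel())),
        shape=(n_samples, n_classes),
    )
    # The predicted class for each row is its most frequent neighbor label.
    return np.asarray(votes.argmax(axis=1)).ravel()

Here neigh_labels would be y[knn.kneighbors(X_grid, return_distance=False)], i.e. the labels of each query point's neighbors. Each label is touched only once, though tie-breaking might differ from stats.mode and would need checking.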
How much is this worth optimizing? I feel that KNN should be fast in low dimensions, and people might actually use this. Having the bottleneck in the wrong place just feels wrong to me ;)