Describe the bug
My simple implementation of the RBF kernel is significantly faster than the default implementation when n_samples << n_features. How does this happen? The effect diminishes when n_samples >> n_features; in that case both implementations have a similar runtime.
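For reference, both implementations should compute the standard RBF (Gaussian) kernel below. scikit-learn's rbf_kernel uses gamma = 1/n_features when gamma is None, and its pairwise squared Euclidean distances are computed with the dot-product expansion in the second line. Whether the simple implementation in the gists uses the same expansion is an assumption here. In matrix form, with s_i the squared norm of row i of X and the exponential taken elementwise, the whole kernel matrix reduces to one Gram matrix product plus row-norm corrections:

```latex
K(x, y) = \exp\!\left(-\gamma \lVert x - y \rVert^2\right), \qquad \gamma = \frac{1}{n_{\text{features}}} \ \text{(default)}

\lVert x - y \rVert^2 = \lVert x \rVert^2 + \lVert y \rVert^2 - 2\, x^\top y

K = \exp\!\left(-\gamma \left(\mathbf{s}\mathbf{1}^\top + \mathbf{1}\mathbf{s}^\top - 2\, X X^\top\right)\right), \qquad s_i = \lVert x_i \rVert^2
```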
Steps/Code to Reproduce
Gist n_samples << n_features
Gist n_samples >> n_features
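The original gists are not reproduced here. The following is a hypothetical minimal benchmark in the same spirit, assuming the simple implementation uses the dot-product expansion and that the default implementation is sklearn.metrics.pairwise.rbf_kernel. The names naive_rbf_kernel and benchmark, and the matrix shapes, are illustrative choices, not taken from the gists.

```python
import timeit

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel


def naive_rbf_kernel(X, gamma):
    """Hypothetical stand-in for the simple implementation: exp(-gamma * ||x - y||^2)."""
    sq_norms = np.einsum("ij,ij->i", X, X)  # squared norm of each row
    # ||x||^2 + ||y||^2 - 2 x.y for all pairs, via the Gram matrix X @ X.T
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip small negatives from rounding
    return np.exp(-gamma * sq_dists)


def benchmark(n_samples, n_features, repeat=5, seed=0):
    """Return the best-of-`repeat` runtimes (seconds) of both implementations."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_features))
    gamma = 1.0 / n_features  # matches rbf_kernel's default when gamma=None
    # Sanity check: both implementations must agree before timing them
    assert np.allclose(naive_rbf_kernel(X, gamma), rbf_kernel(X, gamma=gamma))
    t_naive = min(timeit.repeat(lambda: naive_rbf_kernel(X, gamma), number=1, repeat=repeat))
    t_sklearn = min(timeit.repeat(lambda: rbf_kernel(X, gamma=gamma), number=1, repeat=repeat))
    return t_naive, t_sklearn


if __name__ == "__main__":
    # Arbitrary shapes: one with n_samples << n_features, one with n_samples >> n_features
    for n_samples, n_features in [(100, 10_000), (2_000, 100)]:
        t_naive, t_sklearn = benchmark(n_samples, n_features)
        print(
            f"n_samples={n_samples}, n_features={n_features}: "
            f"naive {t_naive:.4f}s, sklearn {t_sklearn:.4f}s"
        )
```

Taking the minimum over several single runs reduces noise from other processes, and the allclose check ensures the two functions compute the same kernel before their runtimes are compared.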
Expected Results
The default C implementation is faster than a naive implementation in Python.
Actual Results
The naive implementation in Python is 10x faster than the default C implementation when n_samples << n_features.
Versions
System:
python: 3.8.12 (default, Aug 30 2021, 00:00:00) [GCC 11.2.1 20210728 (Red Hat 11.2.1-1)]
executable: /home/t/.cache/pypoetry/virtualenvs/mutation-prediction-VhT0dLh3-py3.8/bin/python
machine: Linux-5.14.10-200.fc34.x86_64-x86_64-with-glibc2.2.5
Python dependencies:
pip: 21.0.1
setuptools: 54.1.2
sklearn: 0.24.2
numpy: 1.21.0
scipy: 1.7.0
Cython: None
pandas: 1.2.5
matplotlib: 3.4.2
joblib: 1.0.1
threadpoolctl: 2.1.0
Built with OpenMP: True
I ran the benchmark on a Ryzen 7 2700 octa-core CPU (hyperthreading enabled) with 64 GB of RAM.