Description
Describe the bug
Observed behavior
When using the "most_frequent" strategy of SimpleImputer and there is a tie, the code takes the minimum value among all tied values. This crashes if the values are not comparable, such as str and NoneType.
Steps/Code to Reproduce
import numpy as np
from sklearn.impute import SimpleImputer

# X1: 'a' and None each appear once, so there is a tie
X1 = np.asarray(['a', None])[:, None]
# X2: None appears twice, so there is no tie
X2 = np.asarray(['a', None, None])[:, None]

imputer = SimpleImputer(add_indicator=True, strategy="most_frequent")

try:
    imputer.fit_transform(X1)
    print('X1 processed successfully')
except Exception as e:
    print('Error while processing X1:', e)

try:
    imputer.fit_transform(X2)
    print('X2 processed successfully')
except Exception as e:
    print('Error while processing X2:', e)
Expected Results
I would expect the imputer to behave consistently, whether or not a tie is present. Namely, it should either:
- Run whether or not the values are comparable, or
- Crash if the values are not comparable, whether there is a tie or not.
Note that the code claims to process data like scipy.stats.mode, but mode only accepts numeric values: support for non-numeric input was deprecated in SciPy 1.9.0 and removed in SciPy 1.11.0. It therefore crashes on this example and redirects the user toward np.unique:
Traceback (most recent call last):
  File "/Users/aabraham/NeuralkFoundry/tutorials/repro.py", line 11, in <module>
    print(scipy.stats.mode(X1))
          ~~~~~~~~~~~~~~~~^^^^
  File "/Users/aabraham/.local/share/mamba/envs/skle/lib/python3.13/site-packages/scipy/stats/_axis_nan_policy.py", line 611, in axis_nan_policy_wrapper
    res = hypotest_fun_out(*samples, axis=axis, **kwds)
  File "/Users/aabraham/.local/share/mamba/envs/skle/lib/python3.13/site-packages/scipy/stats/_stats_py.py", line 567, in mode
    raise TypeError(message)
TypeError: Argument `a` is not recognized as numeric. Support for input that cannot be coerced to a numeric array was deprecated in SciPy 1.9.0 and removed in SciPy 1.11.0. Please consider `np.unique`.
Let me know the correct behavior you expect and I can contribute a PR. A quick way to solve it would be to use hash(value) in case values are not comparable.
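To make the hash(value) idea concrete, here is a minimal sketch of what such a fallback could look like. The helper name most_frequent_with_fallback is hypothetical and this is not scikit-learn's implementation; it only assumes a non-empty sequence of hashable values and keeps the current min() behavior whenever the tied values are comparable.

```python
from collections import Counter


def most_frequent_with_fallback(values):
    """Hypothetical helper: return the most frequent value, breaking ties safely."""
    counter = Counter(values)
    top_count = max(counter.values())
    # Collect every value that reaches the highest count (the tied candidates).
    tied = [value for value, count in counter.items() if count == top_count]
    try:
        # Same tie-breaking as today when the values can be ordered.
        return min(tied)
    except TypeError:
        # Values such as str and None cannot be ordered: compare their hashes instead.
        return min(tied, key=hash)
```

One caveat with this approach: str hashes are randomized per Python process unless PYTHONHASHSEED is set, so the value chosen on a tie may differ between runs. A deterministic key would be needed if reproducibility matters.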
Actual Results
Error while processing X1: '<' not supported between instances of 'NoneType' and 'str'
X2 processed successfully
If the error is not caught, here is the stack trace:
Traceback (most recent call last):
  File "/Users/aabraham/NeuralkFoundry/tutorials/repro.py", line 10, in <module>
    imputer.fit_transform(X1)
    ~~~~~~~~~~~~~~~~~~~~~^^^^
  File "/Users/aabraham/scikit-learn/sklearn/utils/_set_output.py", line 316, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/Users/aabraham/scikit-learn/sklearn/base.py", line 894, in fit_transform
    return self.fit(X, **fit_params).transform(X)
           ~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/Users/aabraham/scikit-learn/sklearn/base.py", line 1365, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "/Users/aabraham/scikit-learn/sklearn/impute/_base.py", line 453, in fit
    self.statistics_ = self._dense_fit(
                       ~~~~~~~~~~~~~~~^
        X, self.strategy, self.missing_values, fill_value
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/aabraham/scikit-learn/sklearn/impute/_base.py", line 565, in _dense_fit
    most_frequent[i] = _most_frequent(row, np.nan, 0)
                       ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
  File "/Users/aabraham/scikit-learn/sklearn/impute/_base.py", line 53, in _most_frequent
    most_frequent_value = min(
        value
        for value, count in counter.items()
        if count == most_frequent_count
    )
TypeError: '<' not supported between instances of 'NoneType' and 'str'
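The failing min(...) expression from the last frame above can be reproduced without scikit-learn. The sketch below is a simplified stand-in (the function name tie_break_min is hypothetical, and the real _most_frequent does more than this); it only illustrates why the X2-like column passes while the X1-like column fails.

```python
from collections import Counter


def tie_break_min(values):
    """Simplified stand-in for the tie-breaking step shown in the traceback."""
    counter = Counter(values)
    most_frequent_count = max(counter.values())
    # min() needs '<' between candidates only when more than one value is tied.
    return min(
        value
        for value, count in counter.items()
        if count == most_frequent_count
    )


# X2-like column: None occurs twice, so there is a single candidate and no comparison.
print(tie_break_min(['a', None, None]))  # prints None

# X1-like column: 'a' and None are tied, so min() has to compare a str with None.
try:
    tie_break_min(['a', None])
except TypeError as exc:
    print('TypeError:', exc)
```

So the crash depends on the data distribution rather than on the column's types alone: the same mixed-type column works as long as no tie occurs.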
Versions
System:
python: 3.13.5 | packaged by conda-forge | (main, Jun 16 2025, 08:24:05) [Clang 18.1.8 ]
executable: /Users/aabraham/.local/share/mamba/envs/skle/bin/python
machine: macOS-15.4.1-arm64-arm-64bit-Mach-O
Python dependencies:
sklearn: 1.8.dev0
pip: 25.1.1
setuptools: 80.9.0
numpy: 2.3.1
scipy: 1.16.0
Cython: 3.1.2
pandas: 2.3.0
matplotlib: 3.10.3
joblib: 1.5.1
threadpoolctl: 3.6.0
Built with OpenMP: True
threadpoolctl info:
user_api: blas
internal_api: openblas
num_threads: 12
prefix: libopenblas
filepath: /Users/aabraham/.local/share/mamba/envs/skle/lib/libopenblas.0.dylib
version: 0.3.30
threading_layer: openmp
architecture: VORTEX
user_api: openmp
internal_api: openmp
num_threads: 12
prefix: libomp
filepath: /Users/aabraham/.local/share/mamba/envs/skle/lib/libomp.dylib
version: None