Describe the bug
While using CalibratedClassifierCV with a multiclass dataset, I noticed that the following warning is raised, even though the number of classes is much smaller than the number of samples:
UserWarning: The number of unique classes is greater than 50% of the number of samples.
This seems unexpected, so I tried to reproduce the situation with synthetic data. From what I can tell, the number of classes is well below 50% of the number of training samples passed to fit().
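It’s possible I’m misunderstanding the intended behavior, but from reading the source code, the warning appears to come from a call to type_of_target(classes) in sklearn/utils/_response.py, i.e. on the array of class labels rather than on y. Every entry of that array is unique by construction, so if it is treated like sample data, the number of unique classes equals the number of "samples" and the 50% condition is triggered regardless of how large y is.
(The same happens with GridSearchCV, for example.)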
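To make the suspected mechanism concrete, here is a minimal pure-Python sketch. It does not call scikit-learn: the helper class_to_sample_ratio is hypothetical, and the 50% threshold is taken only from the warning text. It compares the ratio computed on y, on a training fold of the size used with cv=2 (taken as the first half of y, ignoring stratification), and on the class array itself.

```python
def class_to_sample_ratio(values):
    """Return (#unique values) / (#values) for a 1-D sequence of labels.

    Hypothetical helper for illustration only; not part of scikit-learn.
    """
    values = list(values)
    return len(set(values)) / len(values)


n_samples, n_classes, n_folds = 1000, 30, 2

# Same labels as in the reproduction script: 0..29 repeated, cut to 1000 samples.
y = [i % n_classes for i in range(n_samples)]

# A training fold of the size produced by cv=2 (assumption: first half of y).
y_fold = y[: n_samples // n_folds]

# The sorted class labels, i.e. the kind of array stored in `classes_`.
classes = sorted(set(y))

ratio_y = class_to_sample_ratio(y)
ratio_fold = class_to_sample_ratio(y_fold)
ratio_classes = class_to_sample_ratio(classes)

print(f"y:       {ratio_y:.2%}")        # 3.00%
print(f"fold:    {ratio_fold:.2%}")     # 6.00%
print(f"classes: {ratio_classes:.2%}")  # 100.00%

# Threshold assumed from the warning message ("greater than 50%").
print("would exceed 50% on y:      ", ratio_y > 0.5)        # False
print("would exceed 50% on classes:", ratio_classes > 0.5)  # True
```

Only the class array exceeds the threshold, which matches the warning being emitted even though y and the CV folds are far below it.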
Steps/Code to Reproduce
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import RandomForestClassifier


def main():
    # Simulate 1000 samples, 40 features, 30 classes (<< 50%)
    n_samples = 1000
    n_features = 40
    n_classes = 30
    rng = np.random.RandomState(42)
    x = rng.rand(n_samples, n_features)
    # Balanced labels: 0..29 repeated, truncated to n_samples
    y = np.tile(np.arange(n_classes), int(np.ceil(n_samples / n_classes)))[:n_samples]
    print(f"Samples: {len(y)}")
    print(f"Unique classes: {len(np.unique(y))}")
    print(f"Class/sample ratio: {len(np.unique(y)) / len(y):.2%}")
    base_clf = RandomForestClassifier(n_estimators=100, random_state=42)
    cal_clf = CalibratedClassifierCV(base_clf, method='isotonic', cv=2)
    cal_clf.fit(x, y)


if __name__ == '__main__':
    main()
Expected Results
I expected no warning to be raised, as the class/sample ratio is only ~3% (well under the 50% threshold). There are no rare classes, and the splits from CV should still contain enough samples.
Actual Results
Samples: 1000
Unique classes: 30
Class/sample ratio: 3.00%
/miniconda3/envs/sklearn_check/lib/python3.13/site-packages/sklearn/utils/_response.py:203: UserWarning: The number of unique classes is greater than 50% of the number of samples.
  target_type = type_of_target(classes)
/miniconda3/envs/sklearn_check/lib/python3.13/site-packages/sklearn/utils/_response.py:203: UserWarning: The number of unique classes is greater than 50% of the number of samples.
  target_type = type_of_target(classes)
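Until this is clarified, the warning can be silenced on the caller's side with a message filter from the standard warnings module. This is only a sketch of a workaround, not a fix: the wrapper run_without_class_ratio_warning is a hypothetical helper, and the filter assumes the message text stays exactly as shown in the output above.

```python
import warnings

# Prefix of the message shown in the output above; filterwarnings matches it
# as a regular expression against the start of the warning text.
MESSAGE_PREFIX = "The number of unique classes is greater than 50%"


def run_without_class_ratio_warning(func, *args, **kwargs):
    """Call func(*args, **kwargs) with only this specific UserWarning ignored.

    Hypothetical helper for illustration; other warnings are left untouched,
    and the previous warning filters are restored on exit.
    """
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore", message=MESSAGE_PREFIX, category=UserWarning
        )
        return func(*args, **kwargs)
```

With the reproduction script, this would be used as run_without_class_ratio_warning(cal_clf.fit, x, y) in place of cal_clf.fit(x, y).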
Versions
System:
    python: 3.13.5 | packaged by conda-forge | (main, Jun 16 2025, 08:27:50) [GCC 13.3.0]
    executable: /miniconda3/envs/sklearn_check/bin/python
    machine: Linux-6.8.0-60-generic-x86_64-with-glibc2.39

Python dependencies:
    sklearn: 1.7.0
    pip: 25.1.1
    setuptools: 80.9.0
    numpy: 2.3.0
    scipy: 1.15.2
    Cython: None
    pandas: None
    matplotlib: None
    joblib: 1.5.1
    threadpoolctl: 3.6.0

Built with OpenMP: True

threadpoolctl info:
    user_api: blas
    internal_api: openblas
    num_threads: 20
    prefix: libopenblas
    filepath: /miniconda3/envs/sklearn_check/lib/libopenblasp-r0.3.29.so
    version: 0.3.29
    threading_layer: pthreads
    architecture: Haswell

    user_api: openmp
    internal_api: openmp
    num_threads: 20
    prefix: libgomp
    filepath: /miniconda3/envs/sklearn_check/lib/libgomp.so.1.0.0
    version: None