Unjustified "number of unique classes > 50%" warning in CalibratedClassifierCV #31583

Open
@saskra

Describe the bug

While using CalibratedClassifierCV with a multiclass dataset, I noticed that the following warning is raised, even though the number of classes is much smaller than the number of samples:

UserWarning: The number of unique classes is greater than 50% of the number of samples.

This seems unexpected, so I tried to reproduce the situation with synthetic data. From what I can tell, the number of classes is well below 50% of the number of training samples passed to fit().

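It’s possible I’m misunderstanding the intended behavior, but based on reading the source code, the warning seems to come from a call to type_of_target(classes) in sklearn/utils/_response.py, where the argument is the array of unique classes rather than y. That array contains each class exactly once, so if it is treated like data, the number of unique classes always equals the number of "samples" and the 50% condition is met regardless of the real dataset size.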
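
To illustrate the suspected mechanism, here is a minimal sketch in plain Python. The function exceeds_half is a hypothetical, simplified stand-in for the ratio check behind the warning; it is not scikit-learn's actual implementation, which may apply additional conditions.

```python
def exceeds_half(values):
    """Return True if the unique values outnumber 50% of the entries.

    Hypothetical, simplified stand-in for the check behind the warning;
    it is not scikit-learn's actual implementation.
    """
    n_entries = len(values)
    n_unique = len(set(values))
    return n_unique > 0.5 * n_entries


n_samples, n_classes = 1000, 30

# Labels as passed to fit(): 30 classes cycled over 1000 samples.
y = [i % n_classes for i in range(n_samples)]

# What an estimator's classes_ attribute would hold: each class exactly once.
classes = sorted(set(y))

print(exceeds_half(y))        # False: 30 unique values in 1000 entries
print(exceeds_half(classes))  # True: 30 unique values in 30 entries
```

If the check really does receive the class array instead of y, it would fire for any dataset, which matches what I am seeing.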

(The same warning also appears with GridSearchCV, for example.)

Steps/Code to Reproduce

import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import RandomForestClassifier


def main():
	# Simulate 1000 samples, 40 features, 30 classes (<< 50%)
	n_samples = 1000
	n_features = 40
	n_classes = 30

	rng = np.random.RandomState(42)
	x = rng.rand(n_samples, n_features)
	y = np.tile(np.arange(n_classes), int(np.ceil(n_samples / n_classes)))[:n_samples]

	print(f"Samples: {len(y)}")
	print(f"Unique classes: {len(np.unique(y))}")
	print(f"Class/sample ratio: {len(np.unique(y)) / len(y):.2%}")

	base_clf = RandomForestClassifier(n_estimators=100, random_state=42)
	cal_clf = CalibratedClassifierCV(base_clf, method='isotonic', cv=2)
	cal_clf.fit(x, y)


if __name__ == '__main__':
	main()

Expected Results

I expected no warning to be raised, as the class/sample ratio is only ~3% (well under the 50% threshold). There are no rare classes, and the splits from CV should still contain enough samples.

Actual Results

Samples: 1000
Unique classes: 30
Class/sample ratio: 3.00%
/miniconda3/envs/sklearn_check/lib/python3.13/site-packages/sklearn/utils/_response.py:203: UserWarning: The number of unique classes is greater than 50% of the number of samples.
  target_type = type_of_target(classes)
/miniconda3/envs/sklearn_check/lib/python3.13/site-packages/sklearn/utils/_response.py:203: UserWarning: The number of unique classes is greater than 50% of the number of samples.
  target_type = type_of_target(classes)
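
As a possible stopgap (not a fix), the message can be filtered with the standard-library warnings module. The sketch below is self-contained: noisy_fit is a hypothetical stand-in for cal_clf.fit(x, y) that just emits the same message, and the message pattern is copied from the output above. I have not checked how such a filter interacts with parallel workers.

```python
import warnings

# Message text copied from the warning in the report; `message` is treated
# as a regular expression matched against the start of the warning text.
MESSAGE = r"The number of unique classes is greater than 50%"


def noisy_fit():
    """Hypothetical stand-in for cal_clf.fit(x, y) that emits the warning."""
    warnings.warn(
        "The number of unique classes is greater than 50% of the number "
        "of samples.",
        UserWarning,
    )
    return "fitted"


# record=True collects every warning that is not filtered out.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Added last, so this filter takes precedence over "always".
    warnings.filterwarnings("ignore", message=MESSAGE, category=UserWarning)
    result = noisy_fit()

print(result, len(caught))  # fitted 0
```

This only hides the message; it would also hide the warning in cases where it is actually justified.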

Versions

System:
    python: 3.13.5 | packaged by conda-forge | (main, Jun 16 2025, 08:27:50) [GCC 13.3.0]
executable: /miniconda3/envs/sklearn_check/bin/python
   machine: Linux-6.8.0-60-generic-x86_64-with-glibc2.39

Python dependencies:
      sklearn: 1.7.0
          pip: 25.1.1
   setuptools: 80.9.0
        numpy: 2.3.0
        scipy: 1.15.2
       Cython: None
       pandas: None
   matplotlib: None
       joblib: 1.5.1
threadpoolctl: 3.6.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 20
         prefix: libopenblas
       filepath: /miniconda3/envs/sklearn_check/lib/libopenblasp-r0.3.29.so
        version: 0.3.29
threading_layer: pthreads
   architecture: Haswell

       user_api: openmp
   internal_api: openmp
    num_threads: 20
         prefix: libgomp
       filepath: /miniconda3/envs/sklearn_check/lib/libgomp.so.1.0.0
        version: None
