KFold(n_splits=n) not equivalent to LeaveOneOut() cv in CalibratedClassifierCV() #29000


Closed
ethanresnick opened this issue May 11, 2024 · 10 comments · Fixed by #29545

@ethanresnick

ethanresnick commented May 11, 2024

Describe the bug

Calling CalibratedClassifierCV() with cv=KFold(n_splits=n) (where n is the number of samples) can give different results than using cv=LeaveOneOut(), but the docs for LeaveOneOut() say these should be equivalent.

In particular, the KFold class has an "n_splits" attribute, which means this branch runs when setting up sigmoid calibration, and then this error can be thrown. With LeaveOneOut(), n_folds is set to None and that error is never hit.

I'm not sure whether that error is correct/desirable in every case (see the code to reproduce for my use case where I think(?) the error may be unnecessary) but, either way, the two different cv values seem like they should behave equivalently.
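
For illustration, here is a rough paraphrase of that check (a sketch based on my reading of the linked branch; the function name is mine and this is not the exact scikit-learn source):

import numpy as np

def check_n_folds(cv, y, classes):
    # KFold exposes an `n_splits` attribute, so n_folds gets set here...
    if isinstance(cv, int):
        n_folds = cv
    elif hasattr(cv, "n_splits"):
        n_folds = cv.n_splits
    else:
        # ...while LeaveOneOut() has no `n_splits` attribute, so n_folds stays
        # None and the check below is skipped.
        n_folds = None
    if n_folds and any(np.sum(y == c) < n_folds for c in classes):
        raise ValueError(
            f"Requesting {n_folds}-fold cross-validation but provided "
            f"less than {n_folds} examples for at least one class."
        )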

Steps/Code to Reproduce

from sklearn.pipeline import make_pipeline
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import KFold, LeaveOneOut
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=20, random_state=42)

pipeline = make_pipeline(
    StandardScaler(),
    CalibratedClassifierCV(
        SVC(probability=False),
        ensemble=False,
        cv=LeaveOneOut()
    )
)
pipeline.fit(X, y)

pipeline2 = make_pipeline(
    StandardScaler(),
    CalibratedClassifierCV(
        SVC(probability=False),
        ensemble=False,
        cv=KFold(n_splits=20, shuffle=True)
    )
)
pipeline2.fit(X, y)

Expected Results

pipeline and pipeline2 should behave identically. Instead, pipeline.fit() succeeds while pipeline2.fit() raises a ValueError.

Actual Results

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/python3.11/site-packages/sklearn/base.py", line 1152, in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/python3.11/site-packages/sklearn/pipeline.py", line 427, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "/python3.11/site-packages/sklearn/base.py", line 1152, in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/python3.11/site-packages/sklearn/calibration.py", line 419, in fit
    raise ValueError(
ValueError: Requesting 20-fold cross-validation but provided less than 20 examples for at least one class.

Versions

System:
    python: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:34:54) [Clang 16.0.6 ]
   machine: macOS-14.4.1-arm64-arm-64bit

Python dependencies:
      sklearn: 1.3.2
          pip: 24.0
   setuptools: 69.0.2
        numpy: 1.26.2
        scipy: 1.11.4
       Cython: None
       pandas: 2.1.3
   matplotlib: 3.8.2
       joblib: 1.3.2
threadpoolctl: 3.2.0

Built with OpenMP: True

threadpoolctl info:
       user_api: openmp
   internal_api: openmp
    num_threads: 12
         prefix: libomp
        version: None

       user_api: blas
   internal_api: openblas
    num_threads: 12
         prefix: libopenblas
        version: 0.3.23.dev
threading_layer: pthreads
   architecture: armv8

       user_api: blas
   internal_api: openblas
    num_threads: 12
         prefix: libopenblas
        version: 0.3.21.dev
threading_layer: pthreads
   architecture: armv8
@ethanresnick added the Bug and Needs Triage labels on May 11, 2024
@glemaitre
Member

The error in KFold is actually expected. We expect to have at least a sample from each class in each fold. This cannot be achieved with the LeaveOneOut cross-validation. So we should not accept this strategy.

So we could raise an error early for this strategy. However, I can also see other strategies leading to a single class being present when fitting the calibrator. I assume it would be safer to raise an error in that case as well; otherwise we get an ill-fitted calibrator anyway.

ping @lucyleeow @ogrisel, who might have more insight on this part of the calibrator; I'd like to know their opinions.

@glemaitre removed the Needs Triage label on May 16, 2024
@ogrisel
Member

ogrisel commented May 16, 2024

I think I agree on both counts but did not check the details in the code yet.

@ethanresnick
Author

ethanresnick commented May 16, 2024

The error in KFold is actually expected. We expect to have at least a sample from each class in each fold.

Isn't it the case that KFold also doesn't guarantee one sample from each class in each fold (since it doesn't create stratified folds)?

However, I can also see some other strategy leading to having a single class present when fitting the calibrator.

Yeah, exactly. There are lots of ways to end up with poorly-fit calibrators, and I'm not sure the code's current check (even when it does apply) really covers that.
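
A quick toy example (mine, not from the thread) showing that plain KFold on sorted labels can indeed produce splits that miss a class entirely:

import numpy as np
from sklearn.model_selection import KFold

y = np.array([0, 0, 1, 1, 1, 1])
X = np.arange(len(y)).reshape(-1, 1)

for train_idx, test_idx in KFold(n_splits=3).split(X):
    print("train classes:", np.unique(y[train_idx]),
          "test classes:", np.unique(y[test_idx]))
# The first split's training set contains only class 1 and its test set
# contains only class 0, because KFold does not stratify.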

@kyrajeep

kyrajeep commented May 28, 2024

LeaveOneOut does not have different groups like k-fold cv (https://www.cs.cmu.edu/~schneide/tut5/node42.html). More accurately, it treats each sample as its own 'fold': it trains on n-1 samples at a time (where n is the training-set size), making it computationally expensive but very reliable. KFold, on the other hand, divides the training data into k groups and trains the model k times, leaving out one group at a time. Perhaps this clarification was not the main issue, but I thought it might be helpful :)
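
For what it's worth, a quick check (my own snippet) confirming the docs' claim that KFold with n_splits equal to the number of samples produces exactly the same splits as LeaveOneOut:

import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut

X = np.arange(5).reshape(-1, 1)

kf_splits = [(tr.tolist(), te.tolist()) for tr, te in KFold(n_splits=len(X)).split(X)]
loo_splits = [(tr.tolist(), te.tolist()) for tr, te in LeaveOneOut().split(X)]
print(kf_splits == loo_splits)  # True: identical splits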

@lucyleeow
Member

We expect to have at least a sample from each class in each fold.

Looking into this, with ensemble=True we do need at least one sample from each class in each fold. I think it makes sense to warn when this is not guaranteed (e.g., KFold) and error when it is not possible (e.g., LeaveOneOut).

In the case of ensemble=False we use cross_val_predict to get unbiased predictions. These predictions are then used to fit a single calibrator. I think (?) in this case it is okay that there is not at least a sample from each class in each fold.
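
A rough sketch of that ensemble=False flow (my approximation, not sklearn's actual code; a plain logistic regression stands in for the sigmoid calibrator):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.svm import SVC

X, y = make_classification(n_samples=20, random_state=42)

# One unbiased (out-of-fold) decision score per sample, 20 scores in total.
scores = cross_val_predict(SVC(), X, y, cv=LeaveOneOut(),
                           method="decision_function")

# A single calibrator is then fit on all 20 out-of-fold scores at once,
# and both classes are present at that point.
calibrator = LogisticRegression().fit(scores.reshape(-1, 1), y)
print(calibrator.predict_proba(scores.reshape(-1, 1))[:3])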

WDYT @glemaitre @ogrisel ?

@glemaitre
Member

These predictions are then used to fit a single calibrator. I think (?) in this case it is okay that there is not at least a sample from each class in each fold.

I think it is still kind of weird to fit a calibrator on a single class, isn't it?

@lucyleeow
Member

But it wouldn't be a single class. The 'test' set for each split would be a single sample (and thus one class) but cross_val_predict(cv=LeaveOneOut) would iterate through splits for all the data and give predictions for all samples?

@glemaitre
Member

Yes you are completely right, I was not awake yet :)

@daustria

daustria commented Jun 27, 2024

Hi, I'm new to this repository and to ML in general; I found this discussion interesting and hope you don't mind me chiming in.

Another suggestion might be to not raise an error but instead give a warning when ensemble=True and cv does not guarantee that each fold has a sample from each class. Perhaps this could also be added to the documentation to further caution the user about this issue when using CalibratedClassifierCV().

This would result in KFold(n_splits=n) and LeaveOneOut() having the same behaviour. Also, I can imagine it could be annoying when you are working with 20 classes and want to do cross-validation with KFold(n_splits=30), but exactly one of your 20 classes has only 29 samples. Would having that class missing from one of the 30 folds be so bad that it warrants the error? I understand that the negative effects would be much more severe in the case of LeaveOneOut() though.

edit: in the first case, perhaps we require each fold to have a sample of each class because of something like #28178 happening?
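
To make that 20-class scenario concrete, a hedged sketch (my own construction; with scikit-learn 1.3.2 as in the report above, ensemble=False hits the same ValueError because one class has fewer samples than n_splits):

import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

rng = np.random.RandomState(0)
counts = [30] * 19 + [29]               # exactly one class is one sample short of n_splits
y = np.repeat(np.arange(20), counts)
X = rng.randn(len(y), 5) + y[:, None]   # crude class-dependent features

clf = CalibratedClassifierCV(
    LogisticRegression(max_iter=1000),
    ensemble=False,
    cv=KFold(n_splits=30, shuffle=True, random_state=0),
)
try:
    clf.fit(X, y)
except ValueError as e:
    print(e)  # Requesting 30-fold cross-validation but provided less than 30 examples ...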

@lucyleeow
Member

Looking at this further, the only cv splitters where we guarantee that each class is represented in every split are StratifiedKFold and StratifiedShuffleSplit. Perhaps it is a better idea to highlight in our docs what happens when a class is missing: the predicted probability for that split defaults to 0 for the missing class, skewing results when it is averaged in ensemble=True.

In the case of LeaveOneOut, which really should not be used with CalibratedClassifierCV, I agree with the OP that we should give the same error.

cc @glemaitre
