FIX Add error when LeaveOneOut used in CalibratedClassifierCV
#29545
Conversation
    assert np.all(proba[:, :i] > 0)
    assert np.all(proba[:, i + 1 :] > 0)
else:
    # Check `proba` are all 1/n_classes
Note `proba` was 1/n_classes here because the original test data was unique (it consisted of 10 samples belonging to 10 classes); this was not really related to the train subset not containing all classes.
I think the estimator ended up being overfit and the calibrator did not respond well to the low predict values, calibrating them all to the same value. Note `proba` was the same value even before normalization of the probabilities.
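To illustrate the point about the calibrator mapping low predict values to (nearly) the same probability, here is a small sketch; the decision values and the sigmoid parameters `a` and `b` are made up for illustration, not values scikit-learn would fit:

```python
import numpy as np
from scipy.special import expit

n_classes = 10

# Hypothetical per-class decision values for one test sample: an overfit
# one-vs-rest estimator pushes all of them far into the negative range.
decision_values = np.linspace(-11.0, -9.0, n_classes)

# Illustrative sigmoid calibration p = expit(a * f + b). The small slope `a`
# stands in for a calibrator that barely reacts to such low inputs; a and b
# are assumptions, not fitted parameters.
a, b = 0.05, -2.0
per_class = expit(a * decision_values + b)

# Every class receives a similarly small probability, so the row-wise
# normalization applied by CalibratedClassifierCV drives each class towards
# 1 / n_classes.
normalized = per_class / per_class.sum()
print(per_class)   # all roughly 0.07-0.08, i.e. nearly the same value pre-normalization
print(normalized)  # all roughly 0.1 == 1 / n_classes
```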
I wonder if we even have to check `ensemble=False`: there we use `cross_val_predict` to get the predictions used for calibration, and at predict time only one estimator is fit using all the data.
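For context, here is a minimal sketch of the `ensemble=False` flow described above (a simplification under assumed settings, not the actual `CalibratedClassifierCV` internals):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = make_classification(
    n_samples=200, n_classes=3, n_informative=6, random_state=0
)

# With ensemble=False, out-of-fold predictions are gathered with
# cross_val_predict and used to fit a single calibrator ...
oof_scores = cross_val_predict(
    LogisticRegression(), X, y, cv=5, method="decision_function"
)

# ... while the estimator used at predict time is a single clone fit on all
# the data, which is why the per-split "all classes present" concern matters
# less here than for ensemble=True.
final_estimator = LogisticRegression().fit(X, y)
```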
sklearn/calibration.py (outdated)
if isinstance(self.cv, LeaveOneOut):
    raise ValueError(
        "LeaveOneOut cross-validation does not allow "
        "all classes to be present in test splits."
Let's be explicit by asking people to use an alternative cross-validation strategy.
Amended. Hopefully not too long now?
Nope, this is good.
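For reference, a sketch of the kind of explicit check discussed above, wrapped in a hypothetical helper `_check_cv_not_loo`; the exact wording merged in the PR may differ:

```python
from sklearn.model_selection import LeaveOneOut


def _check_cv_not_loo(cv):
    # Hypothetical helper illustrating the amended, more explicit message;
    # the wording actually merged may differ.
    if isinstance(cv, LeaveOneOut):
        raise ValueError(
            "LeaveOneOut cross-validation does not allow all classes to be "
            "present in test splits. Please use a cross-validation "
            "generator that allows all classes to appear in both test and "
            "train splits."
        )
```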
@pytest.mark.parametrize("ensemble", [True, False])
def test_calibration_less_classes(ensemble):
    # Test to check calibration works fine when train set in a test-train
This is a use case where I'm really wondering whether it is valid :). But it is already here, so let's go with it.
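A hedged sketch of how the new guard could be exercised in a test (the test name and the matched message fragment are assumptions, based on the snippet shown above):

```python
import pytest

from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut


def test_calibration_rejects_leave_one_out():
    # Hypothetical test: fitting with cv=LeaveOneOut() should raise the new
    # error instead of silently producing degenerate probabilities.
    X, y = make_classification(n_samples=20, random_state=0)
    clf = CalibratedClassifierCV(LogisticRegression(), cv=LeaveOneOut())
    with pytest.raises(ValueError, match="LeaveOneOut cross-validation"):
        clf.fit(X, y)
```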
glemaitre left a comment
Regarding the scope of the PR, I'm happy to already have this in the codebase. I opened a related PR regarding the cross-validation and the fact that we can get a subset of classes in training. I'm not sure this is something we should allow, but it requires much more discussion and I might be overlooking some aspects.
I'll add this PR to the 1.5.2 milestone.
thomasjpfan left a comment
LGTM
Reference Issues/PRs
closes #29000
What does this implement/fix? Explain your changes.
Adds an error when `LeaveOneOut` is used in `CalibratedClassifierCV`, as `LeaveOneOut` cannot produce test splits containing all classes.
Any other comments?