CalibratedClassifierCV with mode = 'isotonic' has predict_proba return infinite probabilities #10903
Comments
Thanks for the report.
@amueller Shall I take this?
I can take this up. - NYC WiMLDS
I'm going to take a stab at this and will edit this comment with updates. - NYC WiMLDS
Update #1: I'm currently scouring through both calibration.py and isotonic.py, and I can't figure out what makes the code produce infinite values. For example, y_min and y_max default to None, and the if block in isotonic.py that handles a None y_min/y_max sets them to infinite values. I am also looking through the probability-prediction code, `def predict_proba(self, X)` in calibration.py, which might be the piece that sets it off; I'll do more testing on `predict_proba()` to see if any infinite values come up. @jay-z007, any thoughts from your end?
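A minimal sketch of the default-bounds behaviour that comment describes (paraphrased, not the verbatim sklearn/isotonic.py source): when y_min/y_max are left as None, they are replaced with infinite bounds, so no clipping happens.

```python
import numpy as np

# Paraphrased sketch (assumed, not copied from sklearn/isotonic.py):
# unspecified bounds default to +/- infinity, i.e. no clipping.
y_min, y_max = None, None          # IsotonicRegression defaults
if y_min is None or y_max is None:
    if y_min is None:
        y_min = -np.inf            # no lower bound
    if y_max is None:
        y_max = np.inf             # no upper bound
```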
@jay-z007 @jfrank94 please make sure you coordinate.
I will take this one.
@ImenRajhi and I have so far found out that it is caused by the scipy.interpolate.interp1d() call used in isotonic.py at line 249:
`self.f_ = interpolate.interp1d(X, y, kind='linear', bounds_error=bounds_error)`
It is triggered by very small values in the X input. If you add fill_value="extrapolate" to that line, you get:
`.../scipy/interpolate/interpolate.py:609: RuntimeWarning: overflow encountered in true_divide`
Since, …
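To illustrate the failure mode described in that comment, here is a self-contained example (with made-up knot values, not the issue's data): when two interpolation knots are nearly coincident, the slope of the linear segment overflows float64, and extrapolated values come out infinite.

```python
import numpy as np
from scipy import interpolate

# Two x-knots spaced by a subnormal amount: the segment slope
# (1.0 / 1e-320) exceeds the float64 maximum (~1.8e308).
x = np.array([0.0, 1e-320])
y = np.array([0.0, 1.0])

f = interpolate.interp1d(x, y, kind="linear", bounds_error=False,
                         fill_value="extrapolate")

# May emit "RuntimeWarning: overflow encountered in true_divide"
# and return inf, mirroring the behaviour reported above.
print(f(2.0))
```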
@NurzatRakhman what versions of NumPy, SciPy, and scikit-learn are you running?
Libraries: [this is the latest]. When you run the code with the given data and set the random seed of the split method to 22, you will see that the predicted probabilities at indices [1111, 2057] come out as "inf", and you can observe the behaviour I described above.
I'm working on this issue. |
Take |
Description
I am using scikit-learn's `CalibratedClassifierCV` with `GaussianNB()` to run binary classification on some data. When I run `.predict_proba(X_test)`, the probabilities returned for some of the samples are `-inf` or `inf`.
This came to light when I tried running `brier_score_loss` on the resulting predictions, and it threw a `ValueError: Input contains NaN, infinity or a value too large for dtype('float64')`.
.I have posted this issue as a question on StackOverflow (see this link). According to an answer there, the problem stems from some linear regression, used specifically by the 'isotonic' mode, being unable to handle extreme values.
Code to Reproduce
I have added some data at this Google Drive link. It's larger than I wanted, but I couldn't get consistent reproduction with smaller datasets.
The code for reproduction is below. There is some randomness to the code, so if no infinities are found, try running it again; from my experiments it should find them on the first try.
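The author's original script was not preserved on this page; the sketch below shows the setup the issue describes, with a hypothetical data file name and variable names (data.npz, X, y).

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.calibration import CalibratedClassifierCV

# Hypothetical data loading; the issue shares its dataset via Google Drive.
data = np.load("data.npz")
X, y = data["X"], data["y"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)

# Calibrate a GaussianNB classifier with isotonic regression.
clf = CalibratedClassifierCV(GaussianNB(), method="isotonic")
clf.fit(X_train, y_train)

proba = clf.predict_proba(X_test)

# Report any samples whose calibrated probabilities are not finite.
bad = np.where(~np.isfinite(proba).all(axis=1))[0]
print("non-finite probability rows:", bad)
```

On an affected run, the printed indices are the rows of the `predict_proba` output containing `inf` or `-inf`.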
Expected Results
Calculated probabilities lie within [0, 1].
Actual Results
Some entries have infinite probabilities.
Versions
Linux-4.9.81-35.56.amzn1.x86_64-x86_64-with-glibc2.9
Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 18:10:19)
[GCC 7.2.0]
NumPy 1.14.0
SciPy 1.0.0
Scikit-Learn 0.19.1