CalibratedClassifierCV with mode = 'isotonic' has predict_proba return infinite probabilities #10903

Closed
@LotusZephyr

Description

I am using scikit-learn's CalibratedClassifierCV with GaussianNB() to run binary classification on some data. When I run .predict_proba(X_test), the probabilities returned for some of the samples are -inf or inf.

This came to light when I tried running brier_score_loss on the resulting predictions, and it threw a ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
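
A quick way to confirm what triggers that error is to count the non-finite entries before scoring. The sketch below is only illustrative: `summarize_nonfinite` is a hypothetical helper (not part of scikit-learn), and the array is made-up data standing in for the real `predict_proba` output.

```python
import numpy as np


def summarize_nonfinite(probabilities):
    """Count NaN, +inf, -inf and finite entries in an array of probabilities."""
    p = np.asarray(probabilities, dtype=float)
    return {
        "nan": int(np.isnan(p).sum()),
        "posinf": int(np.isposinf(p).sum()),
        "neginf": int(np.isneginf(p).sum()),
        "finite": int(np.isfinite(p).sum()),
    }


# Made-up values standing in for predict_proba(...)[:, 1]; not the real data.
example = np.array([0.1, np.inf, 0.7, -np.inf, 0.3])
summary = summarize_nonfinite(example)
print(summary)
```

Any non-zero count under `nan`, `posinf` or `neginf` is enough for a metric that validates its input to reject the array.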

I have posted this issue as a question on StackOverflow (see this link). According to an answer there, the problem stems from a linear regression step, used only when method = 'isotonic', that cannot handle extreme values.

Code to Reproduce

I have uploaded some data at this Google Drive link. It is larger than I wanted, but I couldn't reproduce the problem consistently with smaller datasets.

The reproduction code is below. The splits are random, so if no infinite values are found, try running it again; in my experiments they showed up on the first try.

import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.naive_bayes import GaussianNB

# Load the dataset from the linked file.
loaded = np.load('data.npz')
X = loaded['X']
y = loaded['y']

# Fit on the first `num` samples. No random_state is set, so the splits differ between runs.
num = 2 * 10**4
sss = StratifiedShuffleSplit(n_splits=10, test_size=0.2)
cal_classifier = CalibratedClassifierCV(GaussianNB(), method='isotonic', cv=sss)

classifier_fit = cal_classifier.fit(X[:num], y[:num])
# Positive-class probabilities for the next num // 4 samples.
predicted_probabilities = classifier_fit.predict_proba(X[num:num + num // 4])[:, 1]

# Show the entries that are not finite (inf, -inf or NaN).
print(predicted_probabilities[~np.isfinite(predicted_probabilities)])

Expected Results

All calculated probabilities lie within [0, 1].

Actual Results

Some entries of the predicted probabilities are -inf or inf.
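
As a stopgap while the cause is investigated, the values can be clipped into [0, 1] before scoring. This is a sketch under a loud assumption: it treats inf as a saturated probability of 1 and -inf as 0, which nothing in this report confirms. `clip_probabilities` and `brier_score` are hypothetical helpers written for this example, and the arrays are made-up data.

```python
import numpy as np


def clip_probabilities(probabilities):
    """Clip values into [0, 1]; +inf becomes 1.0 and -inf becomes 0.0.

    ASSUMPTION: infinite values are read as saturated probabilities.
    NaN entries stay NaN and still need separate handling.
    """
    return np.clip(np.asarray(probabilities, dtype=float), 0.0, 1.0)


def brier_score(y_true, probabilities):
    """Mean squared difference between predicted probability and outcome."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(probabilities, dtype=float)
    return float(np.mean((p - y) ** 2))


# Made-up labels and predictions, not taken from the attached data.
y_example = np.array([0, 1, 1, 0])
p_example = np.array([0.2, np.inf, 0.6, -np.inf])

clipped = clip_probabilities(p_example)
score = brier_score(y_example, clipped)
print(clipped, score)
```

Clipping only hides the symptom; the calibrated values are still wrong wherever the infinities came from, so any score computed this way should be treated with caution.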

Versions

Linux-4.9.81-35.56.amzn1.x86_64-x86_64-with-glibc2.9
Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 18:10:19)
[GCC 7.2.0]
NumPy 1.14.0
SciPy 1.0.0
Scikit-Learn 0.19.1

Labels

Bug, Easy, Sprint
