Bug in metrics.roc_auc_score #3864
This is a result of the following code:
The numpy function …
I have been affected by the same bug after I noticed that I got very different auROC values on two different machines using two different versions of sklearn on the same data. On my dataset, the auROC score changes from 0.9821 (old code) to 0.9764 (new code). I am making my data available at https://www.dropbox.com/s/7nbhw9nhyavxcdm/roc-test-data.pkl.gz?dl=0 so you can verify the bug. The file unpickles to a tuple that has …
This behaviour is due to #3268. It is problematic to use differences within floating-point error as the basis of ranking, but perhaps we should require a higher resolution, or allow the user to set the tolerance as a parameter (though the chances someone will do that of their own accord seem slim).
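To make the ranking point concrete, here is a minimal sketch (not scikit-learn's actual implementation; the tolerance value and the rank-based AUC helper are illustrative) showing how collapsing score differences smaller than a tolerance into ties changes the AUC for the example reported in this issue:

import numpy as np
from scipy.stats import rankdata

y_true = np.array([1, 0, 0])
y_score = np.array([1e-10, 0.0, 0.0])

def auc_from_ranks(y_true, ranks):
    # Mann-Whitney formulation: ROC AUC computed from the ranks of the scores.
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

exact_ranks = rankdata(y_score)                          # 1e-10 ranks strictly above 0
tol = 1e-8                                               # hypothetical tolerance
coarse_ranks = rankdata(np.round(y_score / tol) * tol)   # differences < tol become ties

print(auc_from_ranks(y_true, exact_ranks))   # 1.0
print(auc_from_ranks(y_true, coarse_ranks))  # 0.5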
It seems to me that we could decrease the tol down to 1e-12 (but maybe … ;)
I'm inclined to agree, particularly under rank transformations.
This has been reported again at #3950. Given that we may be dealing with probabilities, which are frequently very small (and in particular, our scorer implementation does not exploit …), it is frustrating that such a widely-used metric is so brittle to numeric instability.
Maybe we should use predict_log_proba when possible.
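A toy illustration of why working in log space would help here (the numbers below are made up, not taken from the thread): two probabilities that are indistinguishable under an absolute tolerance can still be far apart in log space.

import numpy as np

p = np.array([1e-12, 2e-12])   # nearly identical on an absolute scale
print(np.diff(p))              # 1e-12 -- smaller than a tolerance like 1e-8
print(np.diff(np.log(p)))      # ~0.693 -- clearly distinguishable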
I am worried that if we do this, people will then complain about unstable … I am certainly in favor of decreasing the tol.
Sounds reasonable!
I think there was a discussion elsewhere on always providing a …
Duplicate issues in #4864, #6688, perhaps #6711. It seems that reverting #3268 might be the best solution (as discussed at #6693). As @jblackburne states there:
This is only somewhat true given that the main source of data for metrics is the output of our estimators without the opportunity for tweaking, and that, as @GaelVaroquaux says above:
I noticed the same. I checked how the …
@nielsenmarkus11 can you provide code to reproduce that? Which version of scikit-learn are you using? Can you try using master?
@jnothman I'm not on top of all the AUC issues. Which do you think we can fix for 0.18-rc or 0.18?
I'm using scikit-learn 0.17.1. So it appears that the problem is created when I add the …
# Generate Data
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve

pred = np.random.beta(2, 10, 20000)
pred = np.append(pred, np.zeros(5000))

# Generate label and features, then put the data into a sparse DataFrame
import pandas as pd
actual = pd.DataFrame([np.random.binomial(1, p) for p in pred], columns=['resp'])
for i in range(10):
    actual['PRE_' + str(i)] = [np.random.binomial(1, p * i / 10) for p in pred]
actual = actual.to_sparse()

# Fit the model
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(n_jobs=-1, warm_start=True, oob_score=True,
                            n_estimators=101, random_state=123)
rf.fit(actual.filter(regex='PRE+.'), actual['resp'])

# AUC above 1 and tpr/fpr above 1
roc_auc_score(actual['resp'], rf.oob_decision_function_[:, 1])
roc_curve(actual['resp'], rf.oob_decision_function_[:, 1])

# AUC and ROC curve with NaNs
roc_auc_score(actual['resp'], rf.predict_proba(actual.filter(regex='PRE+.'))[:, 1])
roc_curve(actual['resp'], rf.predict_proba(actual.filter(regex='PRE+.'))[:, 1])

# Check the lengths of the different predicted probability vectors
(len(rf.oob_decision_function_[:, 1]),
 len(rf.predict_proba(actual.filter(regex='PRE+.'))[:, 1]))
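As a quick diagnostic on the reproduction above (a sketch, not part of the original report), one could check whether either score vector handed to roc_auc_score actually contains NaNs before blaming the metric:

import numpy as np

oob_scores = rf.oob_decision_function_[:, 1]
proba_scores = rf.predict_proba(actual.filter(regex='PRE+.'))[:, 1]
print(np.isnan(oob_scores).sum(), np.isnan(proba_scores).sum())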
@amueller, I'm personally in favour of removing the tolerance for small differences. It's only been in there recently, and I think it was a mistake. Such differences are just part and parcel of rank-based metrics.
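For what "removing the tolerance" would mean in practice, here is a simplified contrast of the two threshold-selection strategies being debated (a sketch, not the actual scikit-learn source):

import numpy as np

y_score = np.array([1e-10, 0.0, 0.0])
desc = np.sort(y_score)[::-1]                              # scores in decreasing order

exact_idx = np.where(np.diff(desc) != 0)[0]                # keeps 1e-10 distinct from 0
tolerant_idx = np.where(~np.isclose(np.diff(desc), 0))[0]  # default atol=1e-8 merges them

print(exact_idx)     # [0] -> two distinct thresholds
print(tolerant_idx)  # []  -> a single threshold; every sample is tied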
I've not yet looked at the new issue here.
@nielsenmarkus11, your issue is unrelated. See #7352.
from sklearn import metrics

pred = [1e-10, 0, 0]
sol = [1, 0, 0]
# The single positive has a strictly higher score than both negatives, so a
# perfect ranking exists and the AUC should be 1.0; the tolerance collapses
# 1e-10 and 0 into a tie, yielding 0.5.
metrics.roc_auc_score(sol, pred)  # 0.5 -- wrong, 1.0 is correct

pred = [1, 0, 0]
sol = [1, 0, 0]
metrics.roc_auc_score(sol, pred)  # 1.0 -- correct