
Bug in precision_recall_curve? #6265


Closed
lfiaschi opened this issue Feb 2, 2016 · 3 comments

Comments

@lfiaschi

lfiaschi commented Feb 2, 2016

According to the definition of precision and recall:
http://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html
when TP = 0 and FP = 0, recall is zero while precision is undefined (a 0/0 limit).
In such cases the function precision_recall_curve returns a precision equal to 1, which is inconsistent with other functions within sklearn and could lead to misleading interpretations. The following code demonstrates the problem:

import numpy as np
from sklearn.metrics import precision_recall_curve, precision_recall_fscore_support, \
    average_precision_score, auc
from matplotlib import pyplot as plt

# Create some example labels: 100 positives and 900 negatives.
y = np.hstack([np.ones((100,)), np.zeros((900,))])

# Create fake uniform probabilities of a dummy classifier.
prob_random = np.ones_like(y) * .5

# Note that when recall is 0 (all samples assigned to class 0) the returned precision is 1.
# However, this case is mathematically undefined (a 0/0 limit), and it could make sense to set
# the returned precision to 0 rather than 1, as is done in other functions within sklearn.
precision, recall, _ = precision_recall_curve(y, prob_random)

print('precision at 0 recall = {}'.format(precision[1]))

# For example, this function returns precision 0 and recall 0.
P, R, _, _ = precision_recall_fscore_support(y, np.zeros_like(y))
print('precision at 0 recall = {}'.format(P[1]))

# A problem with the first definition is that the AUC under the precision-recall curve
# of a random classifier appears artificially very good.
plt.plot(recall, precision)
recall_def2 = [1, 0]
precision_def2 = [.1, 0]
plt.plot(recall_def2, precision_def2)
print('AUC current definition {}'.format(auc(recall, precision)))
print('AUC second definition {}'.format(auc(recall_def2, precision_def2)))

Output:

precision at 0 recall = 1.0
precision at 0 recall = 0.0
AUC current definition 0.55
AUC second definition 0.05

Also, this definition may create problems when using PR-AUC as a CV scorer, because classifiers with crisp (e.g. constant) probabilities, such as the random one above, will get high scores.
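As a point of comparison, here is a minimal sketch of the CV-scoring side of this. DummyClassifier, cross_val_score and the built-in 'average_precision' scorer are used only for illustration and are not part of the snippet above; the ~0.1 result assumes the step-wise (non-interpolated) average_precision_score.

import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

# Same labels as above, plus an uninformative feature so the estimator has something to fit.
X = np.zeros((1000, 1))
y = np.hstack([np.ones((100,)), np.zeros((900,))])

# DummyClassifier(strategy='prior') predicts the class prior for every sample,
# i.e. a constant positive-class probability, just like prob_random above.
clf = DummyClassifier(strategy='prior')

# With the step-wise average_precision_score, this gives roughly the positive class
# prevalence (~0.1) instead of the inflated 0.55 reported by auc(recall, precision).
scores = cross_val_score(clf, X, y, cv=5, scoring='average_precision')
print(scores.mean())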

@lfiaschi
Author

Also discussed in issue #4223

@amueller amueller added the Bug label Oct 7, 2016
@amueller amueller added this to the 0.19 milestone Oct 7, 2016
@amueller
Member

amueller commented Oct 7, 2016

Is this fixed by #7356?

@amueller
Member

OK, I got double confused here. This is not AUC; AUC is the area under the ROC curve. You're talking about average precision, which is not computed using the auc function and should not be computed that way. Calling average_precision_score on your example yields 0.1.
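For reference, a minimal self-contained check of that number, reconstructing the same labels and constant scores as in the issue description:

import numpy as np
from sklearn.metrics import average_precision_score

# Same example as in the issue description: a constant score for 100 positives and 900 negatives.
y = np.hstack([np.ones((100,)), np.zeros((900,))])
prob_random = np.ones_like(y) * .5

# Average precision for a constant score equals the positive class prevalence.
print(average_precision_score(y, prob_random))  # 0.1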

Please reopen if you disagree.
