Bug in precision_recall_curve? #6265

Closed
@lfiaschi

Description

According to the definition of precision and recall:
http://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html
when TP = 0 and FP = 0, recall is zero while precision is undefined (a 0/0 limit).
In such cases the function precision_recall_curve returns a precision equal to 1, which is inconsistent with other functions within sklearn and could lead to misleading interpretations. The following code demonstrates the problem:

```python
import numpy as np
from matplotlib import pyplot as plt
from sklearn.metrics import (precision_recall_curve,
                             precision_recall_fscore_support, auc)

# Create some example labels: 100 positives, 900 negatives.
y = np.hstack([np.ones((100,)), np.zeros((900,))])

# Create fake uniform probabilities from a dummy classifier.
prob_random = np.ones_like(y) * 0.5

# Note that when recall is 0 (all samples assigned to class 0), the
# precision returned is 1. However, this case is mathematically undefined
# (a 0/0 limit), and it could make sense to set the returned precision
# to 0 rather than 1, as is done in other functions within sklearn.
precision, recall, _ = precision_recall_curve(y, prob_random)
print('precision at 0 recall = {}'.format(precision[1]))

# For example, this function returns precision 0 and recall 0.
P, R, _, _ = precision_recall_fscore_support(y, np.zeros_like(y))
print('precision at 0 recall = {}'.format(P[1]))

# A problem with the first definition is that the AUC under the
# precision-recall curve of a random classifier appears artificially good.
plt.plot(recall, precision)
recall_def2 = [1, 0]
precision_def2 = [0.1, 0]
plt.plot(recall_def2, precision_def2)
print('AUC current definition {}'.format(auc(recall, precision)))
print('AUC second definition {}'.format(auc(recall_def2, precision_def2)))
```

Output:

```
precision at 0 recall = 1.0
precision at 0 recall = 0.0
AUC current definition 0.55
AUC second definition 0.05
```
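For context, the trailing (recall = 0, precision = 1) point is appended by precision_recall_curve itself as an endpoint convention, independent of the data: the curve always has one more (precision, recall) pair than there are thresholds. A minimal sketch (toy labels and scores chosen here for illustration) makes this visible:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# A tiny example: two positives, two negatives, arbitrary scores.
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

precision, recall, thresholds = precision_recall_curve(y_true, scores)

# The curve ends with the conventional (recall=0, precision=1) endpoint,
# even though no threshold actually produces it; there is one more
# (precision, recall) pair than there are thresholds.
print(precision[-1], recall[-1])        # 1.0 0.0
print(len(precision), len(thresholds))  # one extra point
```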

Also, this definition may create problems when using PR-AUC as a CV scorer, because classifiers with constant probabilities (such as the random one above) will get high scores.
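As a related note, average_precision_score sidesteps the trapezoidal-interpolation issue: in scikit-learn 0.19 and later it computes the step-wise sum sum((R_n - R_(n-1)) * P_n) rather than interpolating, so the constant-probability classifier above scores only the positive-class prevalence (0.1) instead of 0.55. A sketch, assuming scikit-learn >= 0.19:

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Same setup as above: 100 positives out of 1000, constant 0.5 scores.
y = np.hstack([np.ones((100,)), np.zeros((900,))])
prob_random = np.ones_like(y) * 0.5

# Step-wise AP reduces to the positive-class prevalence here (0.1),
# unlike auc(recall, precision), which interpolates up to 0.55.
ap = average_precision_score(y, prob_random)
print('average precision = {}'.format(ap))
```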
