Description
precision_recall_curve returns values that are not correct.
Steps/Code to Reproduce
Example: compare the answer at https://stats.stackexchange.com/questions/183504/are-precision-and-recall-supposed-to-be-monotonic-to-classification-threshold with what sklearn returns:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import metrics

labels = np.array([False, True, False, True, True, True, False, False])
scores = np.linspace(0, 1, len(labels))

# Curve as returned by sklearn
pr, rc, th = metrics.precision_recall_curve(y_true=labels, probas_pred=scores, pos_label=True)
plt.plot(rc, pr, 'o-', label='sklearn')

# Curve computed by hand from cumulative sums over the labels
pr = np.cumsum(labels) / np.arange(1, len(labels) + 1)
rc = np.cumsum(labels) / np.sum(labels)
pr[0] = 0
plt.plot(rc, pr, '.-', label='mine')
plt.legend()
labels.mean()
Expected Results
rc = array([ 0. , 0.25, 0.25, 0.5 , 0.75, 1. , 1. , 1. ])
pr = array([ 0. , 0.5 , 0.33333333, 0.5 , 0.6 , 0.66666667, 0.57142857, 0.5 ])
Actual Results
rc = array([ 1. , 0.75, 0.75, 0.5 , 0.25, 0. , 0. , 0. ])
pr = array([ 0.57142857, 0.5 , 0.6 , 0.5 , 0.33333333, 0. , 0. , 1. ])
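As a cross-check on the two sets of numbers, here is a minimal sketch that evaluates precision and recall one threshold at a time, using only NumPy. It assumes a sample is predicted positive when its score is greater than or equal to the threshold; that convention is an assumption of this sketch, and the helper name `pr_at_thresholds` is hypothetical, not part of sklearn.

```python
import numpy as np

labels = np.array([False, True, False, True, True, True, False, False])
scores = np.linspace(0, 1, len(labels))


def pr_at_thresholds(y_true, y_score):
    """Precision and recall at every distinct score.

    Assumption: a sample counts as predicted positive when
    its score is >= the threshold.
    """
    thresholds = np.unique(y_score)  # sorted in ascending order
    n_pos = y_true.sum()
    precision, recall = [], []
    for t in thresholds:
        predicted = y_score >= t
        tp = np.sum(predicted & y_true)  # true positives at this threshold
        precision.append(tp / predicted.sum())
        recall.append(tp / n_pos)
    return np.array(precision), np.array(recall), thresholds


manual_pr, manual_rc, manual_th = pr_at_thresholds(labels, scores)
print(manual_rc)
print(manual_pr)
```

Under that assumption, the entries from the second threshold onward match the first seven values listed under Actual Results; the final (precision 1, recall 0) pair in the sklearn output has no threshold and is not produced by this sketch. The cumulative sums in the reproduction code, by contrast, accumulate labels starting from the lowest score, which may account for the reversed ordering.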
Versions
Windows-10-10.0.14393-SP0
Python 3.5.3 |Anaconda custom (64-bit)| (default, Feb 22 2017, 21:28:42) [MSC v.1900 64 bit (AMD64)]
NumPy 1.12.1
SciPy 0.19.0
Scikit-Learn 0.18.1