Bugs in metrics.ranking.precision_recall_curve #5073
Comments
How would you fix this? What test would you use to check that we do it the right way?
I found this problem because one of my baseline models had an AUC score much higher than I expected. My suggestion is to use the point (0, precision at the first threshold) as the left-most end: (0, 1) will over-estimate the AUC, while (0, 0) will under-estimate it. For the example in the issue, the current metrics.precision_recall_curve returns a curve with AUC = 0.5167, while the method based on my suggestion would return p_fix = [0.4, 0.33333333, 0.33333333].
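One way to read that suggestion, sketched below (the post-processing step is my interpretation, not an agreed scikit-learn API): keep the output of the current precision_recall_curve but replace the appended (recall=0, precision=1) point with (0, precision at the highest real threshold), i.e. the point nearest recall = 0.

```python
import numpy as np
from sklearn import metrics

pred_proba = [0.8, 0.8, 0.8, 0.2, 0.2]
true_value = [0, 0, 1, 1, 0]

# Current behaviour: a final (recall=0, precision=1) point is appended.
precision, recall, _ = metrics.precision_recall_curve(true_value, pred_proba)
print(precision)                        # [0.4, 0.33333333, 1.0]
print(recall)                           # [1.0, 0.5, 0.0]
print(metrics.auc(recall, precision))   # 0.5167, inflated by the (0, 1) endpoint

# Suggested fix: use (0, precision at the highest real threshold) instead.
p_fix = np.asarray(precision).copy()
p_fix[-1] = p_fix[-2]                   # replace the artificial 1.0
print(p_fix)                            # [0.4, 0.33333333, 0.33333333]
print(metrics.auc(recall, p_fix))       # 0.35
```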
Can you report how it is done in other standard packages, e.g. in R? It would be great to agree on this.
R has several packages for plotting PR curves. According to page 7 of the PRROC vignette (https://cran.r-project.org/web/packages/PRROC/vignettes/PRROC.pdf), the PR curve starts from neither (0, 0) nor (0, 1). The source code is rather messy and I don't fully understand what they are doing there, but it looks like they compute the curve point by point.
Honestly, I don't know what the best way to go is. I acknowledge we certainly do something wrong here.
Yeah, but taking the precision at the first threshold (i.e. the average label of the points scored above it) should be a reasonable solution, since it is the expected value of the y axis there. If n is large and the points are randomly ordered, the curve will be really close to the expected line.
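A quick simulation of that argument (my own setup, not from the thread): when the scores are independent of the labels, every point on the PR curve has expected precision equal to the positive prevalence, so the curve with the fixed endpoint stays near that value while the current (0, 1) endpoint inflates the area.

```python
import numpy as np
from sklearn import metrics

# Labels are independent of the scores, so the expected precision everywhere
# is the prevalence (0.4).  Scores take only three distinct values, which makes
# the artificial jump to (recall=0, precision=1) span a wide recall range.
rng = np.random.RandomState(0)
n, prevalence = 100_000, 0.4
y_true = rng.binomial(1, prevalence, size=n)
scores = rng.choice([0.2, 0.5, 0.8], size=n)

precision, recall, _ = metrics.precision_recall_curve(y_true, scores)
print(precision)   # roughly [0.4, 0.4, 0.4, 1.0]
print(recall)      # roughly [1.0, 0.67, 0.33, 0.0]

# AUC as currently returned: roughly 0.5, well above the no-skill value of 0.4.
print(metrics.auc(recall, precision))

# AUC after replacing the endpoint with the precision at the highest threshold:
# close to the expected 0.4.
p_fix = precision.copy()
p_fix[-1] = p_fix[-2]
print(metrics.auc(recall, p_fix))
```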
I assume this is a duplicate of #4223 (and no doubt others!)
Yeah, exactly.
Closing as duplicate.
Current code for precision_recall_curve assumes the curve always passes through (recall=0, precision=1). However, this is not the case.
For instance,

```python
from sklearn import metrics

pred_proba = [0.8, 0.8, 0.8, 0.2, 0.2]
true_value = [0, 0, 1, 1, 0]
metrics.precision_recall_curve(true_value, pred_proba)
```

will return

```
precision = [0.4, 0.33333333, 1.]
recall = [1., 0.5, 0.]
```

The result is not correct and is actually in favor of the poor model (a model that misclassifies the high-scoring points will have more area under the curve).
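For reference, here is a small hand check (not part of the original report) of where the returned points come from: it recomputes precision and recall at each of the two distinct score thresholds.

```python
pred_proba = [0.8, 0.8, 0.8, 0.2, 0.2]
true_value = [0, 0, 1, 1, 0]

# Recompute each point by hand; the thresholds are the distinct scores 0.2 and 0.8.
for threshold in [0.2, 0.8]:
    predicted = [p >= threshold for p in pred_proba]
    tp = sum(1 for pred, y in zip(predicted, true_value) if pred and y == 1)
    fp = sum(1 for pred, y in zip(predicted, true_value) if pred and y == 0)
    fn = sum(1 for pred, y in zip(predicted, true_value) if not pred and y == 1)
    print(threshold, tp / (tp + fp), tp / (tp + fn))
# threshold 0.2 -> precision 0.4,    recall 1.0
# threshold 0.8 -> precision 0.3333, recall 0.5
# The trailing (precision=1, recall=0) point is then appended unconditionally,
# which is where the questionable assumption enters.
```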
Be careful when using AUC based on metrics.ranking.precision_recall_curve until this bug is fixed.