
Bugs in metrics.ranking.precision_recall_curve #5073


Closed
adamzjw opened this issue Jul 31, 2015 · 9 comments


adamzjw commented Jul 31, 2015

The current code for precision_recall_curve assumes the curve always passes through (recall=0, precision=1). However, this is not always the case.

For instance:

```python
pred_proba = [0.8, 0.8, 0.8, 0.2, 0.2]
true_value = [0, 0, 1, 1, 0]
```

metrics.precision_recall_curve(true_value, pred_proba) will return:

```
precision = [0.4, 0.33333333, 1.]
recall    = [1., 0.5, 0.]
```

This result is not correct, and it actually favors the poor model: a model that misclassifies high-scoring points gets more area under the curve.

Be careful when computing AUC from metrics.ranking.precision_recall_curve until this bug is fixed.
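
For reference, a minimal sketch reproducing the numbers above (assuming scikit-learn's public sklearn.metrics.precision_recall_curve and sklearn.metrics.auc; the inline comments show the values reported in this issue):

```python
import numpy as np
from sklearn.metrics import auc, precision_recall_curve

# All scores at the highest threshold (0.8) are tied, and that tied
# block already contains misclassified points.
true_value = np.array([0, 0, 1, 1, 0])
pred_proba = np.array([0.8, 0.8, 0.8, 0.2, 0.2])

precision, recall, thresholds = precision_recall_curve(true_value, pred_proba)
print(precision)  # [0.4, 0.33333333, 1.] -- the trailing 1.0 is appended, not measured
print(recall)     # [1., 0.5, 0.]
print(auc(recall, precision))  # ~0.5167, inflated by the artificial (0, 1) endpoint
```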

agramfort (Member) commented Aug 1, 2015 via email

@amueller amueller added the Bug label Aug 1, 2015
adamzjw (Author) commented Aug 1, 2015

I found this problem because one of my baseline models had an AUC score much higher than I expected.
The baseline model returned a result similar to my example above. The tie at the first threshold is the root cause.

My suggestion is to use the point (0, precision at the first threshold) as the leftmost end: (0, 1) will overestimate the AUC, while (0, 0) will underestimate it.

The current metrics.precision_recall_curve returns a curve with AUC = 0.5167, while the method based on my suggestion returns:

```
p_fix  = [0.4, 0.33333333, 0.33333333]
recall = [1., 0.5, 0.]
AUC    = 0.3498
```
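
A sketch of that suggestion on the same example (pr_curve_fixed is a hypothetical helper, not part of scikit-learn):

```python
from sklearn.metrics import auc, precision_recall_curve

def pr_curve_fixed(y_true, y_score):
    # Hypothetical fix: replace the appended (recall=0, precision=1) endpoint
    # with the precision measured at the strictest real threshold.
    precision, recall, _ = precision_recall_curve(y_true, y_score)
    precision = precision.copy()
    precision[-1] = precision[-2]
    return precision, recall

precision, recall = pr_curve_fixed([0, 0, 1, 1, 0], [0.8, 0.8, 0.8, 0.2, 0.2])
print(precision)               # [0.4, 0.33333333, 0.33333333]
print(auc(recall, precision))  # ~0.35 rather than the inflated ~0.5167
```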

@amueller amueller added this to the 0.17 milestone Aug 3, 2015
agramfort (Member) commented Aug 4, 2015 via email

adamzjw (Author) commented Aug 6, 2015

R has several packages for plotting PR curves.

According to page 7 of the PRROC vignette (https://cran.r-project.org/web/packages/PRROC/vignettes/PRROC.pdf), the PR curve starts from neither (0, 0) nor (0, 1).

The source code is a bit of a mess and I don't fully understand what they are doing there, but it looks like they compute the curve point by point.

agramfort (Member) commented Aug 8, 2015 via email

adamzjw (Author) commented Aug 11, 2015

Yeah, but taking the average precision at the first threshold should be a reasonable solution, since it's the expected value on the y-axis. If n is large and the tied points are randomly ordered, the curve will be very close to that expected line.
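
To illustrate the expectation argument on the example's tied block (a simulation sketch, not a proposed implementation): randomly ordering the three points tied at score 0.8 and averaging the running precision gives roughly the block's overall precision of 1/3 at every position.

```python
import numpy as np

rng = np.random.default_rng(0)
tied_block = np.array([0, 0, 1])  # labels of the three points tied at score 0.8

# Break the tie with a random ordering, trace the running precision as the
# cut-off sweeps through the block, and average over many orderings.
precisions = [
    np.cumsum(rng.permutation(tied_block)) / np.arange(1, len(tied_block) + 1)
    for _ in range(10_000)
]
print(np.mean(precisions, axis=0))  # each entry ~1/3, the tied block's precision
```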

jnothman (Member) commented

I assume this is a duplicate of #4223 (and no doubt others!)

adamzjw (Author) commented Aug 16, 2015

Yeah, exactly.

jnothman (Member) commented

Closing as duplicate.
