Bugs in metrics.ranking.precision_recall_curve #5073
Comments
How would you fix this? What test would you use to check that we do it the right way?
I found this problem because one of my baseline models had an AUC score much higher than I expected. My suggestion is to use the point (0, precision at the first threshold) as the left-most end: (0, 1) will over-estimate the AUC, while (0, 0) will under-estimate it. For the example in the issue, the current metrics.precision_recall_curve returns a curve with AUC = 0.5167, while the method based on my suggestion would return p_fix = [0.4, 0.33333333, 0.33333333].
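One way to read that suggestion, sketched below (the post-processing step is my interpretation, not an agreed scikit-learn API): keep the output of the current precision_recall_curve but replace the appended (recall=0, precision=1) point with (0, precision at the highest real threshold), i.e. the point nearest recall = 0.

```python
import numpy as np
from sklearn import metrics

pred_proba = [0.8, 0.8, 0.8, 0.2, 0.2]
true_value = [0, 0, 1, 1, 0]

# Current behaviour: a final (recall=0, precision=1) point is appended.
precision, recall, _ = metrics.precision_recall_curve(true_value, pred_proba)
print(precision)                        # [0.4, 0.33333333, 1.0]
print(recall)                           # [1.0, 0.5, 0.0]
print(metrics.auc(recall, precision))   # 0.5167, inflated by the (0, 1) endpoint

# Suggested fix: use (0, precision at the highest real threshold) instead.
p_fix = np.asarray(precision).copy()
p_fix[-1] = p_fix[-2]                   # replace the artificial 1.0
print(p_fix)                            # [0.4, 0.33333333, 0.33333333]
print(metrics.auc(recall, p_fix))       # 0.35
```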
Can you report how it is done in other standard packages, e.g. in R? It would be great to agree on this.
R has several packages for plotting PR curves. According to page 7 of the PRROC vignette (https://cran.r-project.org/web/packages/PRROC/vignettes/PRROC.pdf), the PR curve starts from neither (0, 0) nor (0, 1). The source code is rather messy and I don't fully understand what they are doing there, but it looks like they compute the curve point by point.
Honestly, I don't know what the best way to go is. I acknowledge we certainly do something wrong here.
Yeah, but taking the precision at the first threshold (i.e. the average label of the points scored above it) should be a reasonable solution, since it is the expected value of the y axis there. If n is large and the points are randomly ordered, the curve will be really close to the expected line.
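A quick simulation of that argument (my own setup, not from the thread): when the scores are independent of the labels, every point on the PR curve has expected precision equal to the positive prevalence, so the curve with the fixed endpoint stays near that value while the current (0, 1) endpoint inflates the area.

```python
import numpy as np
from sklearn import metrics

# Labels are independent of the scores, so the expected precision everywhere
# is the prevalence (0.4).  Scores take only three distinct values, which makes
# the artificial jump to (recall=0, precision=1) span a wide recall range.
rng = np.random.RandomState(0)
n, prevalence = 100_000, 0.4
y_true = rng.binomial(1, prevalence, size=n)
scores = rng.choice([0.2, 0.5, 0.8], size=n)

precision, recall, _ = metrics.precision_recall_curve(y_true, scores)
print(precision)   # roughly [0.4, 0.4, 0.4, 1.0]
print(recall)      # roughly [1.0, 0.67, 0.33, 0.0]

# AUC as currently returned: roughly 0.5, well above the no-skill value of 0.4.
print(metrics.auc(recall, precision))

# AUC after replacing the endpoint with the precision at the highest threshold:
# close to the expected 0.4.
p_fix = precision.copy()
p_fix[-1] = p_fix[-2]
print(metrics.auc(recall, p_fix))
```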
I assume this is a duplicate of #4223 (and no doubt others!)
Yeah, exactly.
Closing as duplicate.
Current code for precision_recall_curve assumes the curve always passes through (recall=0, precision=1). However, this is not the case.
For instance,

```python
from sklearn import metrics

pred_proba = [0.8, 0.8, 0.8, 0.2, 0.2]
true_value = [0, 0, 1, 1, 0]
metrics.precision_recall_curve(true_value, pred_proba)
```

will return

```
precision = [0.4, 0.33333333, 1.]
recall = [1., 0.5, 0.]
```

The result is not correct and is actually in favor of the poor model (a model that misclassifies the high-scoring points will have more area under the curve).
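For reference, here is a small hand check (not part of the original report) of where the returned points come from: it recomputes precision and recall at each of the two distinct score thresholds.

```python
pred_proba = [0.8, 0.8, 0.8, 0.2, 0.2]
true_value = [0, 0, 1, 1, 0]

# Recompute each point by hand; the thresholds are the distinct scores 0.2 and 0.8.
for threshold in [0.2, 0.8]:
    predicted = [p >= threshold for p in pred_proba]
    tp = sum(1 for pred, y in zip(predicted, true_value) if pred and y == 1)
    fp = sum(1 for pred, y in zip(predicted, true_value) if pred and y == 0)
    fn = sum(1 for pred, y in zip(predicted, true_value) if not pred and y == 1)
    print(threshold, tp / (tp + fp), tp / (tp + fn))
# threshold 0.2 -> precision 0.4,    recall 1.0
# threshold 0.8 -> precision 0.3333, recall 0.5
# The trailing (precision=1, recall=0) point is then appended unconditionally,
# which is where the questionable assumption enters.
```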
Be careful when using AUC based on metrics.ranking.precision_recall_curve until this bug is fixed.