The shape of threshold returned by precision_recall_curve #4996


Closed · yangj1e opened this issue Jul 18, 2015 · 11 comments

yangj1e commented Jul 18, 2015

In the sklearn.metrics.precision_recall_curve documentation,

thresholds : array, shape = [n_thresholds := len(np.unique(probas_pred))]

But as the example below shows,

>>> import numpy as np
>>> from sklearn.metrics import precision_recall_curve
>>> y_true = np.array([0, 0, 1, 1])
>>> y_scores = np.array([0.1, 0.4, 0.35, 0.8])
>>> precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
>>> thresholds
array([ 0.35,  0.4 ,  0.8 ])

the length of thresholds is 3, not len(np.unique(probas_pred)) == 4.
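
For concreteness, a minimal sketch reproducing the mismatch (same data as the docstring example above):

import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

precision, recall, thresholds = precision_recall_curve(y_true, y_scores)

print(len(np.unique(y_scores)))  # 4 distinct score values
print(len(thresholds))           # 3 -- the lowest score, 0.1, has no threshold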

@amueller (Member)

True... and that is inconsistent between roc_curve (which includes all thresholds) and precision_recall_curve. Maybe we should just add the smallest value as a threshold for precision_recall_curve, too?

That would be backward-incompatible but would actually do what the docs say.
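
A quick comparison on the same toy data (a sketch; the first element of the roc_curve thresholds is a sentinel whose exact value differs across scikit-learn versions):

import numpy as np
from sklearn.metrics import precision_recall_curve, roc_curve

y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

_, _, pr_thresholds = precision_recall_curve(y_true, y_scores)
_, _, roc_thresholds = roc_curve(y_true, y_scores)

print(pr_thresholds)   # [0.35, 0.4, 0.8] -- the 0.1 score has no entry
print(roc_thresholds)  # all four distinct scores, preceded by the sentinel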

jayflo commented Aug 5, 2015

I can try this one.

Can you expand on "Maybe we should just add the smallest value as a threshold for precision_recall_curve, too"?

amueller (Member) commented Aug 5, 2015

Well, the smallest score value seems to be dropped from thresholds; we could simply not drop it.

jayflo commented Aug 5, 2015

My fault, I misread the 0.1 as 1.0 and did not understand the "smallest" part in your statement. Taking a look now.

@jnothman (Member)

My explanation is in #5091 (comment). TL;DR: the documentation is not precisely correct, but the code is.

@amueller (Member)

Why is that the correct behavior? I find it surprising, and it seems to be inconsistent with roc_curve, but maybe I'm missing something?

@amueller (Member)

Inspecting the output of the code you gave, never more than one value is dropped (but maybe that is just because more is very unlikely?).

jayflo commented Aug 14, 2015

len(thresholds) depends both on the unique values in y_scores and on the index of the last y_scores value that is a true positive (after y_scores has been sorted in descending order). When y_true values are generated uniformly at random, it is likely that even low-scoring entries are true positives, which keeps len(y_scores) - len(thresholds) small; see the sketch below.
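
A sketch of a case where more than one value is dropped: score every negative below the lowest-scored positive, so full recall is reached while several distinct scores remain:

import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([1, 1, 0, 0])
y_scores = np.array([0.9, 0.8, 0.3, 0.1])

precision, recall, thresholds = precision_recall_curve(y_true, y_scores)

# recall already reaches 1.0 at the 0.8 threshold, so both 0.3 and 0.1
# are cut off: thresholds is [0.8, 0.9], two shorter than the unique scores
print(thresholds)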

@amueller (Member)

Right, len(np.unique(y_scores)) - len(thresholds) can be arbitrary, but len(precision) - len(thresholds) will always be 1. I thought this wasn't about a discrepancy between the unique scores and thresholds but between precisions and thresholds. I understand that we don't give the last threshold, I just don't understand why, and I feel that should be consistent between roc and precision_recall.

But then again, I have to recheck the behavior of roc_curve, which seems to drop points differently.
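
As a sketch of that invariant on the toy data from the top of the thread (the documented behavior is that the final precision/recall point (1.0, 0.0) has no corresponding threshold):

import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

precision, recall, thresholds = precision_recall_curve(y_true, y_scores)

# precision and recall each end with the extra (1.0, 0.0) point, so both
# are exactly one element longer than thresholds
assert len(precision) == len(recall) == len(thresholds) + 1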

@jnothman (Member)

Yes, we possibly should be consistent, but it depends in part on how auc should be calculated for each, which is disputed elsewhere (e.g. #4223). The difference in size might indeed be considered a historical quirk, but I don't think it's harmful as long as the documentation is sufficient.


@jnothman (Member)

I'm considering this a duplicate of #4223 and #5073 and closing it, while quick-fixing the documentation to say <= instead of := in 749f2a9.
