precision_recall_curve threshold fix #5091
Conversation
Included the smallest threshold value when full recall is attained in sklearn.metrics.precision_recall_curve. Adjusted associated tests to compensate.
I'm not sure we should change it just like this. It now does what the docstring says, but the behavior change might break people's code. Should we go through a deprecation cycle?
@ogrisel do you know what's happening with AppVeyor? Can we restart it? I still haven't figured that out.
Good point. We could provide this behavior via an optional argument, and then deprecate that argument once the new behavior becomes the default.
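A minimal sketch of the optional-argument-plus-deprecation pattern being floated here; the toy function, the include_last_threshold flag, and the warning text are all hypothetical illustrations, not part of this PR:

import warnings
import numpy as np

def toy_curve(y_score, include_last_threshold=False):
    # Hypothetical flag: keep the old behavior as the default, warn,
    # and flip the default once users have had time to migrate.
    order = np.argsort(y_score)[::-1]
    thresholds = np.asarray(y_score)[order]
    if not include_last_threshold:
        warnings.warn("include_last_threshold currently defaults to False; "
                      "it will default to True in a future release.",
                      FutureWarning)
        thresholds = thresholds[:-1]  # old behavior: drop the smallest score
    return thresholds

print(toy_curve([0.1, 0.4, 0.8]))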
That means two deprecations, though, and I'm not sure if this is worth it :-/
Deprecation it is!
To me, it looks like a bug fix more than a backward-incompatible change.
I'll wait for the verdict, but (for my own information) in this case, if we do both the deprecation and the fix, those would need to be in separate PRs, right?
@@ -467,7 +467,7 @@ def test_precision_recall_curve_pos_label():
    assert_array_almost_equal(r, r2)
    assert_array_almost_equal(thresholds, thresholds2)
    assert_equal(p.size, r.size)
    assert_equal(p.size, thresholds.size + 1)
This looks so deliberate. I'm confused.
How about in the test above? What is happening there?
Ah, this is an issue caused by the values appended in the return statement. When last_ind == thresholds.size - 1 (which occurs when the smallest y_scores value has ground truth True), there are no more threshold values available to make the returned thresholds array the same size as precision and recall.
If we want to keep the array sizes consistent in this case, we could simply duplicate the final threshold value:
thresholds = thresholds[sl]
... np.r_[thresholds, thresholds[min(last_ind + 1, thresholds.size - 1)]]
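As a standalone illustration of what such padding would do to the returned arrays (this uses the released precision_recall_curve, where thresholds is one element shorter than precision and recall; the padding itself is only the idea sketched above, not actual scikit-learn code):

import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7])
precision, recall, thresholds = precision_recall_curve(y_true, y_score)
print(precision.size, recall.size, thresholds.size)  # thresholds is one shorter

# Repeating the final threshold would equalize all three lengths:
padded = np.r_[thresholds, thresholds[-1]]
assert padded.size == precision.size == recall.size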
We probably want to. It would be odd to have different shapes depending on the data. This might have been the reason for the original behavior?
Sorry, that's not right. Nor is the shape specified in the documentation, I'll admit. The docstring explains the discrepancy between P, R and threshold arrays, which your patch misconstrues: "The last precision and recall values are 1. and 0. respectively and do not have a corresponding threshold."
You might want to assure yourself of this behaviour (i.e. that the lowest score is not always dropped, and sometimes many scores are) by repeatedly inspecting the output of:

import numpy as np
from sklearn.metrics import precision_recall_curve

y_score = np.arange(10)
y_true = np.random.randint(2, size=10)
p, r, t = precision_recall_curve(y_true, y_score)
print(p, r, t, y_true, [p.shape, r.shape, t.shape])
I agree with your statements and that the code correctly does what it says. What is being called into question is whether that is what it should do. For instance, should we have 1. and 0. values appended that have no corresponding threshold values?
+1
I can't say I'm certain about this, and it and similar issues have been raised before (at least #5073, #4223). But it is certainly the behaviour of the initial scikit-learn implementation, so we need to be sure about its theoretical correctness (in terms of calculating average precision) and its usefulness before breaking people's current code. In the meantime, because the bug is not as stated above, I'm considering this a duplicate of the above issues and closing it, while quick-fixing the documentation to say
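For context on why the appended (precision = 1., recall = 0.) point matters to average precision, here is one standard step-wise computation of AP from these arrays (an illustration of the general definition, not code from this PR nor a claim about how scikit-learn computes it): the appended recall = 0 closes the final step of the sum, while the appended precision = 1 never enters it.

import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 1, 1, 0, 1])
y_score = np.array([0.8, 0.9, 0.55, 0.4, 0.7])
p, r, t = precision_recall_curve(y_true, y_score)

# Step-wise AP: precision weighted by the drop in recall at each step,
# with recall returned in decreasing order.
ap = np.sum((r[:-1] - r[1:]) * p[:-1])
print(ap)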
(potential) Fix for #4996
Included smallest threshold value when full recall is attained in sklearn.metrics.precision_recall_curve. Modified associated tests to agree.