precision_recall_curve threshold fix #5091


Closed
wants to merge 2 commits

Conversation

@jayflo commented Aug 5, 2015

(potential) Fix for #4996

Included the smallest threshold value when full recall is attained in sklearn.metrics.precision_recall_curve. Modified the associated tests to agree.

@jayflo changed the title from "precision_recall_curve threshold fix (#4996)" to "precision_recall_curve threshold fix" on Aug 5, 2015
@amueller (Member) commented Aug 6, 2015

I'm not sure we should change it just like this. It now does what the docstring says, but the behavior change might break people's code. Should we go through a deprecation cycle?

@amueller (Member) commented Aug 6, 2015

@ogrisel do you know what's happening with AppVeyor? Can we restart it? I still haven't figured that out.

@jayflo (Author) commented Aug 6, 2015

Good point. We could provide this behavior via an optional argument, and then deprecate that argument once the new behavior becomes the default.
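
A minimal sketch of that two-step plan, assuming a hypothetical wrapper and keyword (pr_curve_sketch and include_lowest_threshold are illustrative names only, not part of scikit-learn's API):

import warnings

import numpy as np
from sklearn.metrics import precision_recall_curve


def pr_curve_sketch(y_true, y_score, include_lowest_threshold=None):
    # Step 1 of the deprecation: the old behavior stays the default but
    # warns; a later release flips the default, and later still the flag
    # itself can be deprecated and removed.
    if include_lowest_threshold is None:
        warnings.warn(
            "The current threshold behavior is deprecated; pass "
            "include_lowest_threshold=True to opt in to the new behavior.",
            DeprecationWarning,
        )
        include_lowest_threshold = False
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    if include_lowest_threshold:
        # Illustrative only: prepend the smallest score as an extra
        # threshold; exactly what should be included is what this PR is
        # debating.
        lowest = np.min(y_score)
        if thresholds.size == 0 or lowest < thresholds[0]:
            thresholds = np.r_[lowest, thresholds]
    return precision, recall, thresholds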

@amueller (Member) commented Aug 6, 2015

That means two deprecations, though, and I'm not sure if this is worth it :-/

@jayflo (Author) commented Aug 6, 2015

Deprecation it is!

@arjoly (Member) commented Aug 7, 2015

To me, it looks more like a bug fix than a backward-incompatible change.
LGTM

@jayflo (Author) commented Aug 7, 2015

I'll wait for the verdict, but (for my information) in this case, if we do both the deprecation and the fix, those would need to be in separate PRs, right?

@@ -467,7 +467,7 @@ def test_precision_recall_curve_pos_label():
    assert_array_almost_equal(r, r2)
    assert_array_almost_equal(thresholds, thresholds2)
    assert_equal(p.size, r.size)
    assert_equal(p.size, thresholds.size + 1)

Member commented:

This looks so deliberate. I'm confused.

Member commented:

How about in the test above? What is happening there?

@jayflo (Author) commented:

Ah, this is an issue due to the values appended in the return statement. When last_ind == thresholds.size - 1 (which occurs when the smallest y_score value has ground truth True), there are no more threshold values available to make the returned thresholds array the same size as precision and recall.

If we want to keep array sizes consistent, in this case we could simply duplicate the final threshold value:

thresholds = thresholds[sl]
... np.r_[thresholds, thresholds[min(last_ind + 1, thresholds.size - 1)]]
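
For illustration, the same clamping trick as a standalone snippet with made-up arrays (this is not the actual precision_recall_curve internals):

import numpy as np

thresholds = np.array([0.1, 0.4, 0.8])
last_ind = thresholds.size - 1  # i.e. the smallest y_score is a true positive

# min() clamps the index so the final threshold is repeated instead of
# reading past the end of the array.
padded = np.r_[thresholds, thresholds[min(last_ind + 1, thresholds.size - 1)]]
print(padded)  # [ 0.1  0.4  0.8  0.8] -- array lengths stay consistent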

Member commented:

We probably want that. It would be odd to have different shapes depending on the data. This might have been the reason for the original behavior?

@amueller (Member) commented:

@jnothman added the tests here: 8e9719d. It would be good to have his opinion.

@jayflo if we deprecate, it needs to be in the same PR. But if @arjoly says no deprecation, maybe not. Let's see what @jnothman says.

@jnothman (Member) commented:

Sorry, that's not right. Nor is the shape specified in the documentation, I'll admit.

The docstring explains the discrepancy between P, R and threshold arrays, which your patch misconstrues: "The last precision and recall values are 1. and 0. respectively and do not have a corresponding threshold."

thresholds consists of the maximum score (i.e. the most conservative model) obtaining the corresponding P and R. Lower scores that are redundant, in the sense that no further data points are true or predicted, are omitted. Hence len(thresholds) <= len(np.unique(y_score)). The documentation should be updated to make this clearer, and a PR is very welcome.

You might want to assure yourself of this behaviour (i.e. that the lowest score is not always dropped and sometimes many scores are) by repeatedly inspecting the output of:

import numpy as np
from sklearn.metrics import precision_recall_curve

y_score = np.arange(10)
y_true = np.random.randint(2, size=10)
p, r, t = precision_recall_curve(y_true, y_score)
print(p, r, t, y_true, [p.shape, r.shape, t.shape])

@jayflo (Author) commented Aug 14, 2015

I agree with your statements and that the code correctly does what it says. What is being called into question is whether that is what it should do. For instance, should we have 1. and 0. values appended that have no corresponding threshold values?
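
For reference, that appended endpoint is easy to see with made-up data:

import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7])
p, r, t = precision_recall_curve(y_true, y_score)
print(p[-1], r[-1])     # 1.0 0.0 -- the final pair, with no threshold
print(p.size - t.size)  # 1 -- one more P/R point than thresholds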

@arjoly (Member) commented Aug 14, 2015

> The documentation should be updated to make this clearer, and a PR is very welcome.

+1

@jnothman (Member) commented:

> For instance, should we have 1. and 0. values appended that have no corresponding threshold values?

I can't say I'm certain about this, and it and similar issues have been raised (at least #5073, #4223). But it's certainly the behaviour in the initial scikit-learn implementation, so we need to be sure about its theoretical correctness (in terms of calculating average precision) and usefulness, before breaking people's current code. In the meantime, because the bug is not as stated above, I'm considering this a duplicate of the above issues and closing it, while quick-fixing the documentation to say <= instead of := in 749f2a9.

@jnothman reopened this Aug 15, 2015
@jnothman added a commit that referenced this pull request on Aug 15, 2015
@jnothman closed this Aug 15, 2015