precision_recall_curve threshold fix #5091
Conversation
Included the smallest threshold value when full recall is attained in sklearn.metrics.precision_recall_curve. Adjusted associated tests to compensate.
I'm not sure we should change it just like this. It now does what the docstring says, but the behavior change might break people's code. Should we go through a deprecation cycle?
@ogrisel do you know what's happening with AppVeyor? Can we restart it? I still haven't figured that out.
Good point. We could provide this behavior via an optional argument, and then deprecate that argument once the new behavior becomes the default.
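A minimal sketch of the optional-argument-plus-deprecation pattern being floated here; the toy function, the include_last_threshold flag, and the warning text are all hypothetical illustrations, not part of this PR:

import warnings
import numpy as np

def toy_curve(y_score, include_last_threshold=False):
    # Hypothetical flag: keep the old behavior as the default, warn,
    # and flip the default once users have had time to migrate.
    order = np.argsort(y_score)[::-1]
    thresholds = np.asarray(y_score)[order]
    if not include_last_threshold:
        warnings.warn("include_last_threshold currently defaults to False; "
                      "it will default to True in a future release.",
                      FutureWarning)
        thresholds = thresholds[:-1]  # old behavior: drop the smallest score
    return thresholds

print(toy_curve([0.1, 0.4, 0.8]))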
That means two deprecations, though, and I'm not sure if this is worth it :-/
Deprecation it is!
To me, it looks like a bug fix more than a backward-incompatible change.
I'll wait for the verdict, but (for my own information) in this case, if we do both the deprecation and the fix, those would need to be in separate PRs, right?
@@ -467,7 +467,7 @@ def test_precision_recall_curve_pos_label():
    assert_array_almost_equal(r, r2)
    assert_array_almost_equal(thresholds, thresholds2)
    assert_equal(p.size, r.size)
    assert_equal(p.size, thresholds.size + 1)
This looks so deliberate. I'm confused.
How about in the test above? What is happening there?
Ah, this is an issue caused by the values appended in the return statement. When last_ind == thresholds.size - 1 (which occurs when the smallest y_scores value has ground truth True), there are no more threshold values available to make the returned thresholds array the same size as precision and recall.
If we want to keep the array sizes consistent in this case, we could simply duplicate the final threshold value:
thresholds = thresholds[sl]
... np.r_[thresholds, thresholds[min(last_ind + 1, thresholds.size - 1)]]
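As a standalone illustration of what such padding would do to the returned arrays (this uses the released precision_recall_curve, where thresholds is one element shorter than precision and recall; the padding itself is only the idea sketched above, not actual scikit-learn code):

import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7])
precision, recall, thresholds = precision_recall_curve(y_true, y_score)
print(precision.size, recall.size, thresholds.size)  # thresholds is one shorter

# Repeating the final threshold would equalize all three lengths:
padded = np.r_[thresholds, thresholds[-1]]
assert padded.size == precision.size == recall.size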
We probably want to. It would be odd to have different shapes depending on the data. This might have been the reason for the original behavior?
Sorry, that's not right. Nor is the shape specified in the documentation, I'll admit. The docstring explains the discrepancy between P, R and threshold arrays, which your patch misconstrues: "The last precision and recall values are 1. and 0. respectively and do not have a corresponding threshold."
You might want to assure yourself of this behaviour (i.e. that the lowest score is not always dropped, and sometimes many scores are) by repeatedly inspecting the output of:

import numpy as np
from sklearn.metrics import precision_recall_curve

y_score = np.arange(10)
y_true = np.random.randint(2, size=10)
p, r, t = precision_recall_curve(y_true, y_score)
print(p, r, t, y_true, [p.shape, r.shape, t.shape])
I agree with your statements and that the code correctly does what it says. What is being called into question is whether that is what it should do. For instance, should we have 1. and 0. values appended that have no corresponding threshold values?
+1
I can't say I'm certain about this, and it and similar issues have been raised before (at least #5073, #4223). But it is certainly the behaviour of the initial scikit-learn implementation, so we need to be sure about its theoretical correctness (in terms of calculating average precision) and its usefulness before breaking people's current code. In the meantime, because the bug is not as stated above, I'm considering this a duplicate of the above issues and closing it, while quick-fixing the documentation to say
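For context on why the appended (precision = 1., recall = 0.) point matters to average precision, here is one standard step-wise computation of AP from these arrays (an illustration of the general definition, not code from this PR nor a claim about how scikit-learn computes it): the appended recall = 0 closes the final step of the sum, while the appended precision = 1 never enters it.

import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 1, 1, 0, 1])
y_score = np.array([0.8, 0.9, 0.55, 0.4, 0.7])
p, r, t = precision_recall_curve(y_true, y_score)

# Step-wise AP: precision weighted by the drop in recall at each step,
# with recall returned in decreasing order.
ap = np.sum((r[:-1] - r[1:]) * p[:-1])
print(ap)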
(potential) Fix for #4996
Included smallest threshold value when full recall is attained in sklearn.metrics.precision_recall_curve. Modified associated tests to agree.