FIX Improve check to ensure that ROC curve starts at (0,0) point #9850
The CIs are red; you need to look at the test failures and try to fix them. Look at Travis first and ignore lgtm.
As I see it, the tests don't expect an extra threshold to be added; for instance, one of them checks:
I think you should look at git blame and understand why these tests were added. Only when you are confident that you understand the reason, and that the reason is no longer relevant, or that for some other reason these tests do not check anything that matters much, should you change the tests. Also, you are basically changing the behaviour of the
It seems that the code was introduced in 8e9719d and the comment in 4d9a67f. From my perspective, the reason for the code is that we want to ensure the curve starts on the y-axis (according to _binary_clf_curve, when
I think it would be good simply to explain clearly in the docs how the function works, but from my point of view the function behaves in a slightly tricky way:
I'm not sure what the best fix is. Maybe we should always add (0,0), but set thresholds[0] to max(y_score) (not max(y_score) + 1)?
@alexryndin At this point we might need to hear from the core developers, especially the author of the code.
Don't worry too much about tests failing if the tests seem unjustified. Also, our concern should not be about making the function match the minutiae of the documentation. IMO, our concern should be that in all cases the curve includes a point that represents predicting nothing in the positive class, and that every further point represents predicting more than nothing, for every threshold at which this changes the fpr or tpr, until all samples are predicted positive. If this is the case with the current implementation, then update the docs. If it is not, then update the implementation and its tests.
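The property described above can be written down as a small check. This is only a sketch: `check_roc_invariants` is a hypothetical helper, not part of scikit-learn, and it assumes `fpr` and `tpr` are plain sequences of equal length in curve order.

```python
def check_roc_invariants(fpr, tpr):
    """Return True if the curve has the properties discussed above.

    Hypothetical helper for illustration; not a scikit-learn function.
    """
    # First point: nothing is predicted positive.
    starts_at_origin = fpr[0] == 0 and tpr[0] == 0
    # Last point: everything is predicted positive.
    ends_at_top_right = fpr[-1] == 1 and tpr[-1] == 1
    points = list(zip(fpr, tpr))
    # Each further point never moves backwards and changes fpr or tpr.
    moves_forward = all(
        f2 >= f1 and t2 >= t1 and (f2, t2) != (f1, t1)
        for (f1, t1), (f2, t2) in zip(points, points[1:])
    )
    return starts_at_origin and ends_at_top_right and moves_forward
```

A curve whose first point is, say, (0, 0.5) fails this check, which is the situation the PR title refers to.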
@alexryndin Are you still working on this? If so, could you please try to make the CIs green?
@qinhanmin2014 I'm not currently working on this, and unfortunately I'm not sure I can get back to it in the near future. Also, I don't understand why the lgtm analysis failed; the lgtm log doesn't give me any error, just "could not build the base commit".
lgtm.com was down at the time. Don't worry about it.
Thanks for your work so far, and for your honesty in leaving it to someone else to complete.
Reference Issue
Fixes #9790
What does this implement/fix? Explain your changes.
roc_curve will always arbitrarily set thresholds[0] to max(y_score) + 1
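To illustrate the intended behaviour, here is a minimal pure-Python sketch. It is not the scikit-learn implementation; `roc_points` is a hypothetical name, and the sketch assumes binary labels in {0, 1} with at least one sample of each class.

```python
def roc_points(y_true, y_score):
    """Toy ROC curve returning (fpr, tpr, thresholds) as lists.

    Illustration only; assumes labels in {0, 1} and both classes present.
    """
    # Visit samples in order of decreasing score.
    order = sorted(range(len(y_score)), key=lambda i: -y_score[i])
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    # Extra leading point: a threshold above every score predicts nothing
    # as positive, so the curve always starts at (0, 0).
    fpr, tpr, thresholds = [0.0], [0.0], [max(y_score) + 1]
    tp = fp = 0
    for rank, i in enumerate(order):
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample sharing this score.
        is_last = rank + 1 == len(order)
        if is_last or y_score[order[rank + 1]] != y_score[i]:
            fpr.append(fp / n_neg)
            tpr.append(tp / n_pos)
            thresholds.append(y_score[i])
    return fpr, tpr, thresholds
```

For example, `roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` yields five points, the first of which is the added (0, 0) point with a threshold one above the highest score.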
Any other comments?
Not checked yet on a large dataset.