FIX Improve check to ensure that ROC curve starts at (0,0) point #9850

Closed

alexryndin

Reference Issue

Fixes #9790

What does this implement/fix? Explain your changes.

roc_curve will always arbitrarily set thresholds[0] to max(y_score) + 1
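
As a rough illustration of the behaviour under discussion, here is a minimal pure-Python sketch. It is not scikit-learn's implementation: `roc_points` and its `always_prepend` flag are hypothetical names made up for this example, and the sketch keeps every distinct score as a threshold. It prepends a (0, 0) point with threshold `max(y_score) + 1` either only when the curve does not start on the y-axis, or unconditionally, as this PR proposes.

```python
def roc_points(y_true, y_score, always_prepend=False):
    """Toy ROC computation (hypothetical helper, not scikit-learn's code)."""
    # Each distinct score, highest first, acts as a decision threshold.
    thresholds = sorted(set(y_score), reverse=True)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    fpr, tpr = [], []
    for t in thresholds:
        # Samples with score >= t are predicted positive.
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 0)
        tpr.append(tp / n_pos)
        fpr.append(fp / n_neg)
    # Prepend (0, 0) with threshold max(y_score) + 1, either only when the
    # first point is off the y-axis, or always (the change proposed here).
    if always_prepend or fpr[0] != 0:
        fpr.insert(0, 0.0)
        tpr.insert(0, 0.0)
        thresholds.insert(0, thresholds[0] + 1)
    return fpr, tpr, thresholds


y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
# Conditional behaviour: the top-scored sample is positive, so fpr[0] == 0
# and no extra threshold is added.
print(roc_points(y_true, y_score)[2])
# Unconditional behaviour: an extra leading threshold is always present.
print(roc_points(y_true, y_score, always_prepend=True)[2])
```

With the conditional rule, whether the extra threshold appears depends on the data, which is the inconsistency reported in the linked issue.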

Any other comments?

Not checked yet on a large dataset.

@lesteve
Member

lesteve commented Sep 29, 2017

The CIs are red: you need to look at the test failures and try to fix them. Look at Travis first and ignore lgtm.

@alexryndin
Author

As far as I can see, the tests don't expect an extra threshold to be added. For instance, one of them checks:
assert_array_almost_equal(thresholds, [1., 0.7, 0.])
whereas we would now expect one more point, with threshold = max(y_score) + 1.
Should I fix the tests?

@lesteve
Member

lesteve commented Sep 29, 2017

I think you should look at git blame and understand why these tests were added. Only when you are confident that you understand the reason, and that the reason is no longer relevant, or that for some other reason these tests do not test something that matters much, should you change the tests.

Also, you are basically changing the behaviour of the roc_curve function, so you should ask yourself about the potential implications for someone who is using this function. How could a user be using roc_curve in a way that your change would break their code?

@qinhanmin2014
Member

It seems that the code was introduced in 8e9719d and the comment in 4d9a67f. From my perspective, the reason for the code is that we want to ensure that the curve starts on the y-axis (according to _binary_clf_curve, when tps[0] (tpr[0]) != 0, then fps[0] (fpr[0]) = 0, so the curve already starts on the y-axis). So it seems that the change will not make a huge difference, except that it introduces a seemingly unnecessary point in the curve and requires modifying some tests. I would prefer to modify the comment to make the result more consistent for users.

@alexryndin
Author

alexryndin commented Sep 29, 2017

I think it is good to clearly explain how the function works in the docs, but from my point of view the function behaves in a slightly tricky way:

  • the function ensures that the curve always starts on the y-axis (if it does not, the function adds the (0,0) point), but why doesn't it ensure that the curve starts on the x-axis?
  • sometimes the user will get a strange thresholds[0] = max(y_score) + 1, and sometimes not.

I'm not sure what the best solution is. Maybe we should always add (0,0), but set thresholds[0] to max(y_score) (not max(y_score) + 1)?

@qinhanmin2014
Member

@alexryndin At this point we might need to hear from the core developers, especially the author of the code.

@jnothman
Member

jnothman commented Oct 3, 2017

Don't worry too much about tests failing if the tests seem unjustified. Also our concern should not be about making the minutiae of the documentation be matched by the function.

IMO, our concern should be that in all cases, the curve includes a point that represents predicting nothing in the positive class, and that every further point represents predicting more than nothing, for every threshold at which this changes the fpr or tpr, until all are predicted. If this is the case with the current implementation, then update the docs. If this is not the case with the current implementation, then update the implementation and its tests.
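
One way to read the criterion above is as a property that any returned curve should satisfy. The sketch below is an interpretation, not project code: `check_roc_endpoints` is a hypothetical helper that takes `fpr` and `tpr` as plain lists and reports whether the curve has a point for predicting nothing as positive, a point for predicting everything as positive, and a change in fpr or tpr at every step in between.

```python
def check_roc_endpoints(fpr, tpr):
    """Return a list of problems; an empty list means the curve passes.

    Hypothetical helper for illustration, not part of scikit-learn.
    """
    problems = []
    # Predicting nothing as positive gives no true or false positives.
    if (fpr[0], tpr[0]) != (0.0, 0.0):
        problems.append("curve does not start at (0, 0)")
    # Predicting everything as positive gives fpr == tpr == 1.
    if (fpr[-1], tpr[-1]) != (1.0, 1.0):
        problems.append("curve does not end at (1, 1)")
    # Every further point should move up or right, never back, and
    # should differ from the previous point.
    for i in range(1, len(fpr)):
        moved_back = fpr[i] < fpr[i - 1] or tpr[i] < tpr[i - 1]
        unchanged = (fpr[i], tpr[i]) == (fpr[i - 1], tpr[i - 1])
        if moved_back or unchanged:
            problems.append("step %d does not advance the curve" % i)
    return problems


# A curve whose first point is on the y-axis but not at the origin.
print(check_roc_endpoints([0.0, 0.5, 0.5, 1.0], [0.5, 0.5, 1.0, 1.0]))
# The same curve with (0, 0) prepended.
print(check_roc_endpoints([0.0, 0.0, 0.5, 0.5, 1.0], [0.0, 0.5, 0.5, 1.0, 1.0]))
```

A check of this kind could be run against the existing implementation to decide between updating the docs and updating the code.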

@qinhanmin2014
Member

@alexryndin Are you still working on this? If so, could you please try to make CIs green?
(1) please merge master in/rebase to make lgtm run
(2) please change the test accordingly

@alexryndin
Author

@qinhanmin2014 I'm not currently working on this, and unfortunately I'm not sure I can get back to it in the near future. Also, I'm not sure I understand why the lgtm analysis failed: the lgtm log doesn't give me any error, just "could not build the base commit".

@jnothman
Member

jnothman commented Nov 8, 2017 via email

Successfully merging this pull request may close these issues.

roc_curve doesn't always arbitrarily set thresholds[0] to max(y_score) + 1