FIX Improve check to ensure that ROC curve starts at (0,0) point #9850

alexryndin · 2017-09-28T19:47:19Z

Reference Issue

Fixes #9790

What does this implement/fix? Explain your changes.

roc_curve will always arbitrarily set thresholds[0] to max(y_score) + 1

Any other comments?

Not checked yet on a large dataset.

lesteve · 2017-09-29T09:41:13Z

CI is red: you need to look at the test failures and try to fix them. Look at Travis first; ignore lgtm.

alexryndin · 2017-09-29T10:46:50Z

As far as I can see, the tests don't expect an extra threshold to be added. For instance, one of them checks:
assert_array_almost_equal(thresholds, [1., 0.7, 0.])
whereas we are supposed to see one more point with threshold = max(y_score) + 1.
Should I fix the tests?

lesteve · 2017-09-29T12:03:27Z

I think you should look at git blame and understand why these tests were added. Only when you are confident that you understand the reason and that it is no longer relevant, or that for some other reason these tests do not check something that matters much, should you change the tests.

Also, you are basically changing the behaviour of the roc_curve function, so you should ask yourself about the potential implications for someone using it. How could a user rely on roc_curve in a way that your change would break their code?

qinhanmin2014 · 2017-09-29T15:21:00Z

It seems that the code was introduced in 8e9719d and the comment in 4d9a67f. From my perspective, the reason for the code is that we want to ensure that the curve starts on the y-axis (according to _binary_clf_curve, when tps[0] (tpr[0]) != 0, then fps[0] (fpr[0]) = 0, so the curve already starts on the y-axis). So modifying the code will not make a huge difference, except that it introduces a seemingly unnecessary point in the curve and requires modifying some tests. I would prefer to modify the comment to make the result more consistent for users.
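The conditional rule described in this comment can be sketched in plain Python. This is only an illustration, not the scikit-learn source: binary_clf_curve and prepend_origin_if_needed are hypothetical stand-ins, and the sketch assumes non-empty inputs with labels in {0, 1}.

```python
def binary_clf_curve(y_true, y_score):
    """Cumulative (fps, tps, thresholds), one entry per distinct score, highest first.

    Hypothetical pure-Python stand-in for the helper discussed in the thread.
    """
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    fps, tps, thresholds = [], [], []
    fp = tp = 0
    for rank, i in enumerate(order):
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        is_last = rank == len(order) - 1
        # Emit a point only when the score changes (or at the very end).
        if is_last or y_score[order[rank + 1]] != y_score[i]:
            fps.append(fp)
            tps.append(tp)
            thresholds.append(y_score[i])
    return fps, tps, thresholds


def prepend_origin_if_needed(fps, tps, thresholds):
    """Conditional rule under discussion: add (0, 0) only when fps[0] != 0."""
    if fps[0] != 0:
        return [0] + fps, [0] + tps, [thresholds[0] + 1] + thresholds
    return fps, tps, thresholds


# Top-scored sample is positive: fps[0] == 0, so no point is added and the
# curve starts on the y-axis at (0, 1) in raw counts, not at the origin.
fps, tps, thr = binary_clf_curve([0, 0, 0, 0, 1, 1], [0.0, 0.2, 0.5, 0.6, 0.7, 1.0])
print(prepend_origin_if_needed(fps, tps, thr))

# Top-scored sample is negative: fps[0] != 0, so (0, 0) is prepended with
# threshold max(y_score) + 1.
fps, tps, thr = binary_clf_curve([1, 0], [0.25, 0.75])
print(prepend_origin_if_needed(fps, tps, thr))
```

The two calls show the inconsistency at issue: whether the extra threshold appears depends on the label of the highest-scored sample.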

alexryndin · 2017-09-29T20:11:21Z

I think it is good to clearly explain in the docs how the function works, but from my point of view the function behaves in a slightly tricky way:

the function ensures that the curve always starts on the y-axis (if it does not, the function adds the (0,0) point), but why doesn't it ensure that the curve starts on the x-axis?
sometimes the user will get a strange thresholds[0] = max(y_score) + 1, and sometimes not.

I'm not sure what the best fix is. Maybe we should always add (0,0), but set thresholds[0] to max(y_score) (not max(y_score) + 1)?
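A small check may help weigh this suggestion. It assumes the usual convention that a sample is predicted positive when its score is >= the threshold; counts_at_threshold is a hypothetical helper written for this illustration, not part of scikit-learn.

```python
def counts_at_threshold(y_true, y_score, threshold):
    """Return (false positives, true positives) when predicting positive
    for every sample with score >= threshold."""
    fp = sum(1 for t, s in zip(y_true, y_score) if s >= threshold and t == 0)
    tp = sum(1 for t, s in zip(y_true, y_score) if s >= threshold and t == 1)
    return fp, tp


y_true = [0, 0, 0, 0, 1, 1]
y_score = [0.0, 0.2, 0.5, 0.6, 0.7, 1.0]
top = max(y_score)

# At threshold == max(y_score) the top-scored sample is still predicted positive.
print(counts_at_threshold(y_true, y_score, top))      # (0, 1)

# Any threshold strictly above max(y_score) predicts nothing positive.
print(counts_at_threshold(y_true, y_score, top + 1))  # (0, 0)
```

Under that convention, a threshold equal to max(y_score) cannot correspond to the (0,0) point, because at least one sample still meets it; the threshold has to lie strictly above the maximum score.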

qinhanmin2014 · 2017-09-30T00:40:36Z

@alexryndin At this point we might need to hear from the core developers, especially the author of the code.

jnothman · 2017-10-03T03:41:46Z

Don't worry too much about tests failing if the tests seem unjustified. Also, our concern should not be making the function match the minutiae of the documentation.

IMO, our concern should be that in all cases, the curve includes a point that represents predicting nothing in the positive class, and that every further point represents predicting more than nothing, for every threshold at which this changes the fpr or tpr, until all are predicted. If this is the case with the current implementation, then update the docs. If this is not the case with the current implementation, then update the implementation and its tests.
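The endpoint part of this criterion can be written as a pair of checks on raw cumulative counts. The sketch below is one possible reading, not the implementation that was eventually merged: the function names are hypothetical, and the inputs are assumed to be non-empty lists ordered by decreasing threshold.

```python
def starts_with_nothing_predicted(fps, tps):
    """True if the first point corresponds to predicting no positives at all."""
    return fps[0] == 0 and tps[0] == 0


def ends_with_everything_predicted(fps, tps, n_neg, n_pos):
    """True if the last point corresponds to predicting every sample positive."""
    return fps[-1] == n_neg and tps[-1] == n_pos


def always_prepend_origin(fps, tps, thresholds):
    """Unconditional variant: always add (0, 0) with a threshold above the top score."""
    return [0] + list(fps), [0] + list(tps), [thresholds[0] + 1] + list(thresholds)


# Cumulative counts for y_true = [0, 0, 0, 0, 1, 1] and
# y_score = [0., 0.2, 0.5, 0.6, 0.7, 1.0], highest score first.
fps = [0, 0, 1, 2, 3, 4]
tps = [1, 2, 2, 2, 2, 2]
thresholds = [1.0, 0.7, 0.6, 0.5, 0.2, 0.0]

print(starts_with_nothing_predicted(fps, tps))   # False: first point is (0, 1)

fps2, tps2, thr2 = always_prepend_origin(fps, tps, thresholds)
print(starts_with_nothing_predicted(fps2, tps2))                      # True
print(ends_with_everything_predicted(fps2, tps2, n_neg=4, n_pos=2))   # True
print(thr2[0])                                                        # 2.0
```

With the unconditional variant, every curve gains the "predict nothing" point regardless of the label of the top-scored sample, at the cost of one extra threshold in the output.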

qinhanmin2014 · 2017-11-08T00:59:47Z

@alexryndin Are you still working on this? If so, could you please try to make CIs green?
(1) please merge master in/rebase to make lgtm run
(2) please change the test accordingly

alexryndin · 2017-11-08T18:49:28Z

@qinhanmin2014 I'm not currently working on this and, unfortunately, I'm not sure I can get back to it in the near future. Also, I'm not sure I understand why the lgtm analysis failed: the lgtm log doesn't give me any error, just "could not build the base commit".

jnothman · 2017-11-08T22:24:22Z

lgtm.com was down at the time. Don't worry about it. Thanks for your work so far, and your honesty in leaving it to someone else to complete.

FIX Improve check to ensure that ROC curve starts at (0,0) point

b8e1337

jnothman closed this Nov 8, 2017

qinhanmin2014 mentioned this pull request Nov 9, 2017

[MRG] Ensure that ROC curve starts at (0, 0) #10093

Merged
