[MRG] Ensure that ROC curve starts at (0, 0) #10093

Merged: 4 commits, Nov 10, 2017

qinhanmin2014
Member

Reference Issues/PRs

Fixes #9790
See also #9850

What does this implement/fix? Explain your changes.

Currently, when the first point of the ROC curve lies on the y-axis, we don't add the point (0, 0), which is inconsistent with the docs, some papers, and some R packages.
Reference:
(1)scikit-learn doc
thresholds : array, shape = [n_thresholds]
Decreasing thresholds on the decision function used to compute fpr and tpr. thresholds[0] represents no instances being predicted and is arbitrarily set to max(y_score) + 1.
(2)@jnothman's comment
our concern should be that in all cases, the curve includes a point that represents predicting nothing in the positive class, and that every further point represents predicting more than nothing, for every threshold at which this changes the fpr or tpr, until all are predicted.
(3)An introduction to ROC analysis (cited more than 7000 times)
(4)R package ROCR

library(ROCR)
pred <- prediction(c(0.1, 0.4, 0.35, 0.8), c(0, 0, 1, 1))
perf <- performance(pred,"tpr","fpr")
plot(perf)
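
For comparison with the ROCR output, here is a rough pure-Python sketch of the behavior this PR aims for, run on the same toy data. It is a simplification, not scikit-learn's implementation: `roc_points` is a hypothetical helper, it ignores sample weights, and it always prepends the (0, 0) point with threshold max(y_score) + 1.

```python
def roc_points(y_true, y_score):
    """Hypothetical helper (not scikit-learn's roc_curve).

    Returns (fpr, tpr, thresholds) with a leading (0, 0) point that
    represents "no instances predicted positive".
    Assumes binary labels in {0, 1} with at least one of each class.
    """
    # Visit samples from highest to lowest score.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos

    # Leading point: threshold above every score, so nothing is predicted.
    fpr, tpr, thresholds = [0.0], [0.0], [max(y_score) + 1]

    tp = fp = 0
    for rank, i in enumerate(order):
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only at the last sample of each distinct score.
        is_last = rank == len(order) - 1
        if is_last or y_score[order[rank + 1]] != y_score[i]:
            fpr.append(fp / n_neg)
            tpr.append(tp / n_pos)
            thresholds.append(y_score[i])
    return fpr, tpr, thresholds


# Same toy data as the ROCR snippet.
fpr, tpr, thresholds = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(fpr)
print(tpr)
print(thresholds)
```

On this data the first real threshold (0.8) already gives tpr = 0.5 at fpr = 0, so without the prepended point the curve would start on the y-axis rather than at (0, 0).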

Any other comments?

@jnothman
Member

jnothman commented Nov 9, 2017

Looks good at a glance. Please add to what's new. Can this change affect the auc? If so, document carefully.

Also, perhaps add that reference on roc analysis to the docs

@qinhanmin2014
Member Author

@jnothman Thanks a lot for the instant review :)
I have updated the doc and what's new accordingly. Since the fix only adds a vertical segment (overlapping the y-axis) at the beginning of the curve, I believe it will not affect roc_auc_score.
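
A quick way to see why the area should not change: under the trapezoidal rule, a segment from (0, 0) to (0, y) has zero width and contributes nothing. The sketch below uses a hypothetical `trapezoid_area` helper (not scikit-learn's `auc`) on the toy data from the ROCR example, with curve points worked out by hand, with and without the leading (0, 0) point.

```python
def trapezoid_area(x, y):
    """Hypothetical helper: area under a piecewise-linear curve."""
    area = 0.0
    for i in range(1, len(x)):
        # Width times average height of each segment.
        area += (x[i] - x[i - 1]) * (y[i] + y[i - 1]) / 2.0
    return area


# Curve that starts on the y-axis at (0, 0.5), as before the fix.
fpr_old = [0.0, 0.5, 0.5, 1.0]
tpr_old = [0.5, 0.5, 1.0, 1.0]

# Same curve with (0, 0) prepended, as after the fix.
fpr_new = [0.0] + fpr_old
tpr_new = [0.0] + tpr_old

area_old = trapezoid_area(fpr_old, tpr_old)
area_new = trapezoid_area(fpr_new, tpr_new)
print(area_old, area_new)
```

The extra segment runs along the y-axis, so both calls return the same area.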

@massich
Contributor

massich commented Nov 9, 2017

LGTM

@@ -160,6 +161,11 @@ Metrics
- Fixed a bug due to floating point error in :func:`metrics.roc_auc_score` with
non-integer sample weights. :issue:`9786` by :user:`Hanmin Qin <qinhanmin2014>`.

- Fixed a bug where :func:`metrics.roc_curve` sometimes starts on y-axis instead
of (0, 0), which is inconsistent with the document and other implementations.

Note that this does not affect auc

@qinhanmin2014
Member Author

@jnothman Thanks. Comment addressed.

@jnothman
Member

I don't think this is controversial: I'll take Joan's +1 on this... Let's merge. Thanks!

@jnothman jnothman merged commit 3e85359 into scikit-learn:master Nov 10, 2017
@qinhanmin2014 qinhanmin2014 deleted the roc_curve branch November 14, 2017 03:48
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
Development

Successfully merging this pull request may close these issues.

roc_curve doesn't always arbitrarily set thresholds[0] to max(y_score) + 1
3 participants