The shape of threshold returned by precision_recall_curve #4996


Closed · yangj1e opened this issue Jul 18, 2015 · 11 comments

yangj1e commented Jul 18, 2015

In the sklearn.metrics.precision_recall_curve documentation,

thresholds : array, shape = [n_thresholds := len(np.unique(probas_pred))]

But as the example below shows,

>>> import numpy as np
>>> from sklearn.metrics import precision_recall_curve
>>> y_true = np.array([0, 0, 1, 1])
>>> y_scores = np.array([0.1, 0.4, 0.35, 0.8])
>>> precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
>>> thresholds
array([ 0.35,  0.4 ,  0.8 ])

the length of thresholds is 3, not len(np.unique(probas_pred)) == 4.
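
For concreteness, a minimal sketch reproducing the mismatch (same data as the docstring example above):

import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

precision, recall, thresholds = precision_recall_curve(y_true, y_scores)

print(len(np.unique(y_scores)))  # 4 distinct score values
print(len(thresholds))           # 3 -- the lowest score, 0.1, has no threshold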

@amueller (Member)

True... and that is inconsistent between roc_curve (which includes all thresholds) and precision_recall_curve. Maybe we should just add the smallest value as a threshold for precision_recall_curve, too?

That would be backward-incompatible but would actually do what the docs say.
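
A quick comparison on the same toy data (a sketch; the first element of the roc_curve thresholds is a sentinel whose exact value differs across scikit-learn versions):

import numpy as np
from sklearn.metrics import precision_recall_curve, roc_curve

y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

_, _, pr_thresholds = precision_recall_curve(y_true, y_scores)
_, _, roc_thresholds = roc_curve(y_true, y_scores)

print(pr_thresholds)   # [0.35, 0.4, 0.8] -- the 0.1 score has no entry
print(roc_thresholds)  # all four distinct scores, preceded by the sentinel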

jayflo commented Aug 5, 2015

I can try this one.

Can you expand on "Maybe we should just add the smallest value as a threshold for precision_recall_curve, too"?

amueller (Member) commented Aug 5, 2015

Well, the smallest score value seems to be dropped from thresholds; we could simply not drop it.

jayflo commented Aug 5, 2015

My fault, I misread the 0.1 as 1.0 and did not understand the "smallest" part in your statement. Taking a look now.

@jnothman (Member)

My explanation is in #5091 (comment). TL;DR: the documentation is not precisely correct, but the code is.

@amueller (Member)

Why is that the correct behavior? I find it surprising, and it seems to be inconsistent with roc_curve, but maybe I'm missing something?

@amueller (Member)

Inspecting the output of the code you gave, never more than one value is dropped (but maybe that is just because more is very unlikely?).

jayflo commented Aug 14, 2015

len(thresholds) depends both on the unique values in y_scores and on the index of the last y_scores value that is a true positive (after y_scores has been sorted in descending order). When y_true values are generated uniformly at random, it is likely that even low-scoring entries are true positives, which keeps len(y_scores) - len(thresholds) small; see the sketch below.
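
A sketch of a case where more than one value is dropped: score every negative below the lowest-scored positive, so full recall is reached while several distinct scores remain:

import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([1, 1, 0, 0])
y_scores = np.array([0.9, 0.8, 0.3, 0.1])

precision, recall, thresholds = precision_recall_curve(y_true, y_scores)

# recall already reaches 1.0 at the 0.8 threshold, so both 0.3 and 0.1
# are cut off: thresholds is [0.8, 0.9], two shorter than the unique scores
print(thresholds)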

@amueller (Member)

Right, len(np.unique(y_scores)) - len(thresholds) can be arbitrary, but len(precision) - len(thresholds) will always be 1. I thought this wasn't about a discrepancy between the unique scores and thresholds but between precisions and thresholds. I understand that we don't give the last threshold, I just don't understand why, and I feel that should be consistent between roc and precision_recall.

But then again, I have to recheck the behavior of roc_curve, which seems to drop points differently.
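
As a sketch of that invariant on the toy data from the top of the thread (the documented behavior is that the final precision/recall point (1.0, 0.0) has no corresponding threshold):

import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

precision, recall, thresholds = precision_recall_curve(y_true, y_scores)

# precision and recall each end with the extra (1.0, 0.0) point, so both
# are exactly one element longer than thresholds
assert len(precision) == len(recall) == len(thresholds) + 1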

@jnothman (Member)

Yes, we possibly should be consistent, but it depends in part on how auc should be calculated for each, which is disputed elsewhere (e.g. #4223). The difference in size might indeed be considered a historical quirk, but I don't think it's harmful as long as the documentation is sufficient.


@jnothman (Member)

I'm considering this a duplicate of #4223 and #5073 and closing it, while quick-fixing the documentation to say <= instead of := in 749f2a9.
