
average_precision_score breaks on string labels #12312


Closed
amueller opened this issue Oct 5, 2018 · 10 comments

@amueller
Member

amueller commented Oct 5, 2018

import numpy as np
from sklearn.metrics import average_precision_score
probs = np.array([0.41722746, 0.07162791, 0.41722746, 0.07162791, 0.69208494,
                  0.69208494, 0.40750916, 0.18227092, 0.40750916, 0.07162791])
labels = np.array(['No', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'Yes', 'No'])

average_precision_score(labels, probs)

TypeError: 'bool' object is not subscriptable

That's not very helpful. Fixed in #12313

Casting to dtype object (as coming from pandas):

import numpy as np
from sklearn.metrics import average_precision_score
probs = np.array([0.41722746, 0.07162791, 0.41722746, 0.07162791, 0.69208494,
                  0.69208494, 0.40750916, 0.18227092, 0.40750916, 0.07162791])
labels = np.array(['No', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'Yes', 'No'], dtype=object)

average_precision_score(labels, probs)
RuntimeWarning: invalid value encountered in true_divide
  recall = tps / tps[-1]

np.NaN

That's terrible.... Fixed in #12313

What I actually did was

probs = np.array([0.41722746, 0.07162791, 0.41722746, 0.07162791, 0.69208494,
                  0.69208494, 0.40750916, 0.18227092, 0.40750916, 0.07162791])
labels = np.array(['No', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'Yes', 'No'])

average_precision_score(labels, probs, pos_label='yes')  # TYPO: the label is 'Yes', not 'yes'
RuntimeWarning: invalid value encountered in true_divide
  recall = tps / tps[-1]

np.NaN

That's also terrible...

Originally I used cross-validation, which is arguably worse:

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()

target = np.array(["yes", "no"], dtype="object")[data.target]
cross_val_score(DecisionTreeClassifier(max_depth=3), data.data, target, scoring='average_precision', cv=5)
/home/andy/checkout/scikit-learn/sklearn/metrics/ranking.py:521: RuntimeWarning: invalid value encountered in true_divide
  recall = tps / tps[-1]
/home/andy/checkout/scikit-learn/sklearn/metrics/ranking.py:521: RuntimeWarning: invalid value encountered in true_divide
  recall = tps / tps[-1]
/home/andy/checkout/scikit-learn/sklearn/metrics/ranking.py:521: RuntimeWarning: invalid value encountered in true_divide
  recall = tps / tps[-1]
/home/andy/checkout/scikit-learn/sklearn/metrics/ranking.py:521: RuntimeWarning: invalid value encountered in true_divide
  recall = tps / tps[-1]
/home/andy/checkout/scikit-learn/sklearn/metrics/ranking.py:521: RuntimeWarning: invalid value encountered in true_divide
  recall = tps / tps[-1]
array([nan, nan, nan, nan, nan])
@amueller amueller added the Bug label Oct 5, 2018
@amueller
Member Author

amueller commented Oct 5, 2018

I think this bug was introduced in #9980: pos_label=1 is the culprit, because it overwrites the logic in _binary_clf_curve that requires the user to explicitly set a pos_label.

ping @qinhanmin2014.
We should probably also error, or at least warn, if pos_label is not present in y_true.
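
For reference, the first snippet above works once the positive label is passed explicitly (a minimal check, assuming a scikit-learn build where average_precision_score forwards pos_label, as added in #9980):

import numpy as np
from sklearn.metrics import average_precision_score

probs = np.array([0.41722746, 0.07162791, 0.41722746, 0.07162791, 0.69208494,
                  0.69208494, 0.40750916, 0.18227092, 0.40750916, 0.07162791])
labels = np.array(['No', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'Yes', 'No'])

# Spelling the positive class exactly as it appears in labels avoids the failure.
average_precision_score(labels, probs, pos_label='Yes')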

@qinhanmin2014
Member

Thanks for the issue and apologies for the inconvenience. Some quick replies; I'll investigate further and fix this later today:
(1) Yes, we need to raise an error when there are two classes and pos_label is not among them (as we already do in P/R/F).
(2) The NaN bug is not related to my PR, but I'll try to look into it. It won't be hard to handle it separately.
(3) I use pos_label=1 because pos_label=None is ill-defined in _binary_clf_curve (and elsewhere in the repo, see #10010). You might expect pos_label=None to take the larger label, but it actually means pos_label=1 in _binary_clf_curve.
(4) I guess it's acceptable that users who need to set a parameter rely on make_scorer instead of the default scorer (see the sketch below)?
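
For context, (4) would look something like this (a sketch, assuming average_precision_score accepts pos_label as added in #9980):

from sklearn.metrics import average_precision_score, make_scorer

# needs_threshold=True makes the scorer consume decision_function /
# predict_proba output, matching the built-in 'average_precision' scorer;
# pos_label is forwarded to the metric as a keyword argument.
ap_scorer = make_scorer(average_precision_score, needs_threshold=True,
                        pos_label='yes')

# Usage: cross_val_score(estimator, X, y, scoring=ap_scorer, cv=5)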

@qinhanmin2014
Member

qinhanmin2014 commented Oct 6, 2018

Regarding the nan issue, we have PR #8280. I can take that one, but before that, @amueller, I want your decision on the edge cases of precision & recall.
(1) (not related to the current fix) How to define precision when TP + FP = 0? In precision_score, we raise a warning and return 0. In average_precision_score, we return 0 (from my PR #9980). But in a Stack Overflow question provided by the contributor of #8280, it's defined as 1.
(2) (related to the current fix) How to define recall when TP + FN = 0? In recall_score, we raise a warning and return 0. In average_precision_score, we raise a warning and return nan (the nan issue here; see the snippet below). In #8280, the contributor returns 1 in average_precision_score (consistent with the Stack Overflow question he provided), which is inconsistent with our recall_score.
Also note that the nan behavior is covered by our current tests, so it is actually the expected behavior now.
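
To make edge case (2) concrete (a minimal illustration; the scores are made up):

import numpy as np
from sklearn.metrics import average_precision_score, recall_score

y_true = np.array([0, 0, 0, 0])            # no positive samples, so TP + FN = 0
y_score = np.array([0.1, 0.4, 0.35, 0.8])

recall_score(y_true, (y_score > 0.5).astype(int))  # warns and returns 0.0
average_precision_score(y_true, y_score)           # warns and returns nan on the version under discussion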

@amueller
Member Author

amueller commented Oct 7, 2018

I'm not sure what we should advise. We could also ask users to just do y == pos_label, which would remove the need to call make_scorer (see the sketch below). So I think we should either recommend that to users in the error message, or try to use the minority class as positive, but we could only do the latter if we had the training set available, which we don't.
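
The y == pos_label workaround would look like this (a sketch, reusing the breast-cancer example from above):

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
target = np.array(["yes", "no"], dtype="object")[data.target]

# Binarizing up front makes the default scorer's pos_label=1 correct,
# since (target == "yes") is a boolean array and True == 1.
cross_val_score(DecisionTreeClassifier(max_depth=3), data.data,
                target == "yes", scoring='average_precision', cv=5)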

@amueller
Member Author

amueller commented Oct 7, 2018

I'm not sure what you mean by "the nan issue".
It's a consequence of an undefined pos_label here, right? I think it's probably the right thing to do if only one label is present.

@qinhanmin2014
Member

qinhanmin2014 commented Oct 8, 2018

I'm not sure what you mean by "the nan issue".

The nan issue means that if we only have negative classes in y_true, we'll get nan (along with a warning). The issue here is whether we allow metrics & scorers to output nan and how to solve it.

We could ask users also to just do y == pos_label.

Apologies, I don't understand. I think we added pos_label precisely so that users don't need to do things like y == pos_label.

I've added the error message in #12313. If you think the nan output you got in this issue is fine, then that PR will close this issue.

@amueller
Member Author

amueller commented Oct 9, 2018

Apologies I don't understand here. I think we add pos_label here so that users don't need to do things like y == pos_label.

Well, doing y == pos_label is much easier than using make_scorer, though.

@amueller
Member Author

amueller commented Oct 9, 2018

I feel like all the issues I raised are fixed by #12313. But it's still awkward to actually use this with the scorer interface, and I'm not sure what we should be suggesting.

@qinhanmin2014
Member

I feel like all the issues I raised are fixed by #12313. But it's still awkward to actually use this with the scorer interface and I'm not sure what we should be suggesting.

If you don't like make_scorer, then we need to define pos_label=None. The difficult thing is how to define it when there's only one label (i.e., should we treat it as the positive or the negative label?).

It seems you're not worried about getting nan, so I'm going to close this one. For pos_label-related discussion, we have issue #10010.

Reopen if you disagree.

@amueller
Member Author

No, I think this is good.
