
average_precision_score breaks on string labels #12312


Closed
amueller opened this issue Oct 5, 2018 · 10 comments

@amueller
Member

amueller commented Oct 5, 2018

import numpy as np
from sklearn.metrics import average_precision_score
probs = np.array([0.41722746, 0.07162791, 0.41722746, 0.07162791, 0.69208494,
                  0.69208494, 0.40750916, 0.18227092, 0.40750916, 0.07162791])
labels = np.array(['No', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'Yes', 'No'])

average_precision_score(labels, probs)

TypeError: 'bool' object is not subscriptable

That's not very helpful. Fixed in #12313

Casting to dtype object (as coming from pandas):

import numpy as np
from sklearn.metrics import average_precision_score
probs = np.array([0.41722746, 0.07162791, 0.41722746, 0.07162791, 0.69208494,
                  0.69208494, 0.40750916, 0.18227092, 0.40750916, 0.07162791])
labels = np.array(['No', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'Yes', 'No'], dtype=object)

average_precision_score(labels, probs)
RuntimeWarning: invalid value encountered in true_divide
  recall = tps / tps[-1]

np.NaN

That's terrible.... Fixed in #12313

What I actually did was

probs = np.array([0.41722746, 0.07162791, 0.41722746, 0.07162791, 0.69208494,
                  0.69208494, 0.40750916, 0.18227092, 0.40750916, 0.07162791])
labels = np.array(['No', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'Yes', 'No'])

average_precision_score(labels, probs, pos_label='yes')  # TYPO: the label is 'Yes', not 'yes'
RuntimeWarning: invalid value encountered in true_divide
  recall = tps / tps[-1]

np.NaN

That's also terrible...

Originally I used cross-validation, which is arguably worse:

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()

target = np.array(["yes", "no"], dtype="object")[data.target]
cross_val_score(DecisionTreeClassifier(max_depth=3), data.data, target, scoring='average_precision', cv=5)
/home/andy/checkout/scikit-learn/sklearn/metrics/ranking.py:521: RuntimeWarning: invalid value encountered in true_divide
  recall = tps / tps[-1]
/home/andy/checkout/scikit-learn/sklearn/metrics/ranking.py:521: RuntimeWarning: invalid value encountered in true_divide
  recall = tps / tps[-1]
/home/andy/checkout/scikit-learn/sklearn/metrics/ranking.py:521: RuntimeWarning: invalid value encountered in true_divide
  recall = tps / tps[-1]
/home/andy/checkout/scikit-learn/sklearn/metrics/ranking.py:521: RuntimeWarning: invalid value encountered in true_divide
  recall = tps / tps[-1]
/home/andy/checkout/scikit-learn/sklearn/metrics/ranking.py:521: RuntimeWarning: invalid value encountered in true_divide
  recall = tps / tps[-1]
array([nan, nan, nan, nan, nan])
@amueller amueller added the Bug label Oct 5, 2018
@amueller
Member Author

amueller commented Oct 5, 2018

I think this bug was introduced in #9980: pos_label=1 is the culprit, because it overwrites the logic in _binary_clf_curve that requires the user to explicitly set a pos_label.

ping @qinhanmin2014.
We should probably also error, or at least warn, if pos_label is not present in y_true.
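
For reference, the first snippet above works once the positive label is passed explicitly (a minimal check, assuming a scikit-learn build where average_precision_score forwards pos_label, as added in #9980):

import numpy as np
from sklearn.metrics import average_precision_score

probs = np.array([0.41722746, 0.07162791, 0.41722746, 0.07162791, 0.69208494,
                  0.69208494, 0.40750916, 0.18227092, 0.40750916, 0.07162791])
labels = np.array(['No', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'Yes', 'No'])

# Spelling the positive class exactly as it appears in labels avoids the failure.
average_precision_score(labels, probs, pos_label='Yes')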

@qinhanmin2014
Member

Thanks for the issue and apologies for the inconvenience. Some quick replies; I'll investigate further and fix this later today:
(1) Yes, we need to raise an error when there are two classes and pos_label is not among them (as we already do in P/R/F).
(2) The NaN bug is not related to my PR, but I'll try to look into it. It won't be hard to handle it separately.
(3) I use pos_label=1 because pos_label=None is ill-defined in _binary_clf_curve (and elsewhere in the repo, see #10010). You might expect pos_label=None to take the larger label, but it actually means pos_label=1 in _binary_clf_curve.
(4) I guess it's acceptable that users who need to set a parameter rely on make_scorer instead of the default scorer (see the sketch below)?
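
For context, (4) would look something like this (a sketch, assuming average_precision_score accepts pos_label as added in #9980):

from sklearn.metrics import average_precision_score, make_scorer

# needs_threshold=True makes the scorer consume decision_function /
# predict_proba output, matching the built-in 'average_precision' scorer;
# pos_label is forwarded to the metric as a keyword argument.
ap_scorer = make_scorer(average_precision_score, needs_threshold=True,
                        pos_label='yes')

# Usage: cross_val_score(estimator, X, y, scoring=ap_scorer, cv=5)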

@qinhanmin2014
Member

qinhanmin2014 commented Oct 6, 2018

Regarding the nan issue, we have PR #8280. I can take that one, but before that, @amueller, I want your decision on the edge cases of precision & recall.
(1) (not related to the current fix) How to define precision when TP + FP = 0? In precision_score, we raise a warning and return 0. In average_precision_score, we return 0 (from my PR #9980). But in a Stack Overflow question provided by the contributor of #8280, it's defined as 1.
(2) (related to the current fix) How to define recall when TP + FN = 0? In recall_score, we raise a warning and return 0. In average_precision_score, we raise a warning and return nan (the nan issue here; see the snippet below). In #8280, the contributor returns 1 in average_precision_score (consistent with the Stack Overflow question he provided), which is inconsistent with our recall_score.
Also note that the nan behavior is covered by our current tests, so it is actually the expected behavior now.
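
To make edge case (2) concrete (a minimal illustration; the scores are made up):

import numpy as np
from sklearn.metrics import average_precision_score, recall_score

y_true = np.array([0, 0, 0, 0])            # no positive samples, so TP + FN = 0
y_score = np.array([0.1, 0.4, 0.35, 0.8])

recall_score(y_true, (y_score > 0.5).astype(int))  # warns and returns 0.0
average_precision_score(y_true, y_score)           # warns and returns nan on the version under discussion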

@amueller
Member Author

amueller commented Oct 7, 2018

I'm not sure what we should advise. We could also ask users to just do y == pos_label, which would remove the need to call make_scorer (see the sketch below). So I think we should either recommend that to users in the error message, or try to use the minority class as positive, but we could only do the latter if we had the training set available, which we don't.
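
The y == pos_label workaround would look like this (a sketch, reusing the breast-cancer example from above):

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
target = np.array(["yes", "no"], dtype="object")[data.target]

# Binarizing up front makes the default scorer's pos_label=1 correct,
# since (target == "yes") is a boolean array and True == 1.
cross_val_score(DecisionTreeClassifier(max_depth=3), data.data,
                target == "yes", scoring='average_precision', cv=5)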

@amueller
Member Author

amueller commented Oct 7, 2018

I'm not sure what you mean by "the nan issue".
It's a consequence of an undefined pos_label here, right? I think it's probably the right thing to do if only one label is present.

@qinhanmin2014
Member

qinhanmin2014 commented Oct 8, 2018

I'm not sure what you mean by "the nan issue".

The nan issue means that if we only have negative classes in y_true, we'll get nan (along with a warning). The issue here is whether we allow metrics & scorers to output nan and how to solve it.

We could ask users also to just do y == pos_label.

Apologies, I don't understand. I think we added pos_label precisely so that users don't need to do things like y == pos_label.

I've added the error message in #12313. If you think the nan output you got in this issue is fine, then that PR will close this issue.

@amueller
Member Author

amueller commented Oct 9, 2018

Apologies I don't understand here. I think we add pos_label here so that users don't need to do things like y == pos_label.

Well, doing y == pos_label is much easier than using make_scorer, though.

@amueller
Member Author

amueller commented Oct 9, 2018

I feel like all the issues I raised are fixed by #12313. But it's still awkward to actually use this with the scorer interface, and I'm not sure what we should be suggesting.

@qinhanmin2014
Member

I feel like all the issues I raised are fixed by #12313. But it's still awkward to actually use this with the scorer interface and I'm not sure what we should be suggesting.

If you don't like make_scorer, then we need to define pos_label=None. The difficult thing is how to define it when there's only one label (i.e., should we treat it as the positive or the negative label?).

It seems you're not worried about getting nan, so I'm going to close this one. For pos_label-related discussion, we have issue #10010.

Reopen if you disagree.

@amueller
Member Author

No, I think this is good.
