FIX binary/multiclass jaccard_similarity_score and extend to handle averaging #13092
@@ -160,22 +160,29 @@ Support for Python 3.4 and below has been officially dropped.
   metrics such as recall, specificity, fall out and miss rate.
   :issue:`11179` by :user:`Shangwu Yao <ShangwuYao>` and `Joel Nothman`_.

+- |Feature| |Fix| :func:`metrics.jaccard_similarity_score` now accepts an
+  ``average`` argument like :func:`metrics.precision_recall_fscore_support`,
+  as a naively set-wise measure applying only to binary and multilabel
+  targets. It now binarizes multiclass input and treats it like the
+  corresponding multilabel problem.
+  :issue:`10083` by :user:`Gaurav Dhingra <gxyd>` and `Joel Nothman`_.
+
 - |Enhancement| Use label `accuracy` instead of `micro-average` in
   :func:`metrics.classification_report` to avoid confusion. `micro-average`
   is only shown for multilabel or multiclass with a subset of classes,
   because it is otherwise identical to accuracy.
   :issue:`12334` by :user:`Emmanuel Arias <[email protected]>`,
   `Joel Nothman`_ and `Andreas Müller`_.

 - |Fix| The metric :func:`metrics.r2_score` is degenerate with a single
   sample: it now returns NaN and raises
   :class:`exceptions.UndefinedMetricWarning`.
   :issue:`12855` by :user:`Pawel Sendyk <psendyk>`.

 - |API| The parameter ``labels`` in :func:`metrics.hamming_loss` is
   deprecated in version 0.21 and will be removed in version 0.23.
   :issue:`10580` by :user:`Reshama Shaikh <reshamas>` and `Sandra
   Mitrovic <SandraMNE>`.

 - |Efficiency| The pairwise manhattan distances with sparse input now use
   the BLAS shipped with scipy instead of the bundled BLAS.
   :issue:`12732` by :user:`Jérémie du Boisberranger <jeremiedbb>`.
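To make the behaviour described in the new changelog entry concrete, here is a minimal NumPy-only sketch of how multiclass input is binarized and scored as the corresponding one-vs-rest multilabel problem. The data and the expected value 0.625 come from the multiclass doctest added further down in this diff; the loop illustrates the semantics and is not the implementation in this PR.

```python
import numpy as np

y_true = np.array([0, 1, 2, 3])
y_pred = np.array([0, 2, 2, 3])

scores = []
for label in np.union1d(y_true, y_pred):
    t = y_true == label  # one-vs-rest binarization of y_true
    p = y_pred == label  # one-vs-rest binarization of y_pred
    intersection = np.logical_and(t, p).sum()
    union = np.logical_or(t, p).sum()
    # a label absent from both targets contributes 0 to the macro average
    scores.append(intersection / union if union else 0.0)

print(np.mean(scores))  # 0.625, matching the average='macro' doctest below
```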
@@ -577,7 +577,8 @@ class labels [2]_.
     return 1 - k


-def jaccard_similarity_score(y_true, y_pred, normalize=True,
+def jaccard_similarity_score(y_true, y_pred, labels=None, pos_label=1,
+                             average='samples', normalize='true-if-samples',
                              sample_weight=None):
     """Jaccard similarity coefficient score
@@ -596,72 +597,136 @@ def jaccard_similarity_score(y_true, y_pred, normalize=True,
     y_pred : 1d array-like, or label indicator array / sparse matrix
         Predicted labels, as returned by a classifier.

+    labels : list, optional
+        The set of labels to include when ``average != 'binary'``, and their
+        order if ``average is None``. Labels present in the data can be
+        excluded, for example to calculate a multiclass average ignoring a
+        majority negative class, while labels not present in the data will
+        result in 0 components in a macro average. For multilabel targets,
+        labels are column indices. By default, all labels in ``y_true`` and
+        ``y_pred`` are used in sorted order.
+
+    pos_label : str or int, 1 by default
+        The class to report if ``average='binary'`` and the data is binary.
+        If the data are multiclass or multilabel, this will be ignored;
+        setting ``labels=[pos_label]`` and ``average != 'binary'`` will
+        report scores for that label only.
+
+    average : string, ['samples' (default), 'binary', 'micro', 'macro', \
+            None, 'weighted']
+        If ``None``, the scores for each class are returned. Otherwise, this
+        determines the type of averaging performed on the data:
+
+        ``'binary'``:
+            Only report results for the class specified by ``pos_label``.
+            This is applicable only if targets (``y_{true,pred}``) are
+            binary.
+        ``'micro'``:
+            Calculate metrics globally by counting the total true positives,
+            false negatives and false positives.
+        ``'macro'``:
+            Calculate metrics for each label, and find their unweighted
+            mean. This does not take label imbalance into account.
+        ``'weighted'``:
+            Calculate metrics for each label, and find their average,
+            weighted by support (the number of true instances for each
+            label). This alters 'macro' to account for label imbalance.
+        ``'samples'``:
+            Calculate metrics for each instance, and find their average
+            (only meaningful for multilabel classification).
+
     normalize : bool, optional (default=True)

        [Review comment on this line] default is ``'true-if-samples'`` now.

         If ``False``, return the sum of the Jaccard similarity coefficient
         over the sample set. Otherwise, return the average of Jaccard
-        similarity coefficient.
+        similarity coefficient. ``normalize`` is only applicable when
+        ``average='samples'``. The default value 'true-if-samples' behaves
+        like True, but does not raise an error with other values of
+        ``average``.

     sample_weight : array-like of shape = [n_samples], optional
         Sample weights.

     Returns
     -------
-    score : float
-        If ``normalize == True``, return the average Jaccard similarity
-        coefficient, else it returns the sum of the Jaccard similarity
-        coefficient over the sample set.
-
-        The best performance is 1 with ``normalize == True`` and the number
-        of samples with ``normalize == False``.
+    score : float (if average is not None) or array of floats, shape =\
+            [n_unique_labels]

     See also
     --------
     accuracy_score, hamming_loss, zero_one_loss

     Notes
     -----
-    In binary and multiclass classification, this function is equivalent
-    to the ``accuracy_score``. It differs in the multilabel classification
-    problem.
+    :func:`jaccard_similarity_score` may be a poor metric if there are no
+    positives for some samples or classes.

     References
     ----------
     .. [1] `Wikipedia entry for the Jaccard index
            <https://en.wikipedia.org/wiki/Jaccard_index>`_

     Examples
     --------
     >>> import numpy as np
     >>> from sklearn.metrics import jaccard_similarity_score
-    >>> y_pred = [0, 2, 1, 3]
-    >>> y_true = [0, 1, 2, 3]
-    >>> jaccard_similarity_score(y_true, y_pred)
-    0.5
-    >>> jaccard_similarity_score(y_true, y_pred, normalize=False)
-    2

+    In the multilabel case:
+
+    >>> y_true = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])
+    >>> y_pred = np.array([[0, 1, 1], [1, 1, 1], [0, 0, 1]])
+    >>> jaccard_similarity_score(y_true, y_pred, average='samples')
+    ... # doctest: +ELLIPSIS
+    0.33...
+    >>> jaccard_similarity_score(y_true, y_pred, average='micro')
+    ... # doctest: +ELLIPSIS
+    0.33...

        [Review thread on the second ``# doctest: +ELLIPSIS``]
        * I think this is redundant, it's already set above (and it
          generates the odd empty line).
        * I think these flags are per-statement, so I don't see how "it's
          already set above" applies.
        * The scope of those flags is at least per-block; see, for example,
          scikit-learn/sklearn/covariance/empirical_covariance_.py,
          lines 128 to 132 in 8d10ba0.

+    >>> jaccard_similarity_score(y_true, y_pred, average='weighted')
+    0.5
+    >>> jaccard_similarity_score(y_true, y_pred, average=None)
+    array([0., 0., 1.])

-    In the multilabel case with binary label indicators:
+    In the multiclass case:

-    >>> import numpy as np
-    >>> jaccard_similarity_score(np.array([[0, 1], [1, 1]]),\
-            np.ones((2, 2)))
-    0.75
+    >>> jaccard_similarity_score(np.array([0, 1, 2, 3]),
+    ...                          np.array([0, 2, 2, 3]), average='macro')
+    0.625
""" | ||||||||||||
if average != 'samples' and normalize != 'true-if-samples': | ||||||||||||
adrinjalali marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||
raise ValueError("'normalize' is only meaningful with " | ||||||||||||
"`average='samples'`, got `average='%s'`." | ||||||||||||
% average) | ||||||||||||
labels = _check_set_wise_labels(y_true, y_pred, average, labels, | ||||||||||||
pos_label) | ||||||||||||
if labels is _ALL_ZERO: | ||||||||||||
warnings.warn('Jaccard is ill-defined and being set to 0.0 with no ' | ||||||||||||
'true or predicted samples', UndefinedMetricWarning) | ||||||||||||
return 0. | ||||||||||||
samplewise = average == 'samples' | ||||||||||||
MCM = multilabel_confusion_matrix(y_true, y_pred, | ||||||||||||
sample_weight=sample_weight, | ||||||||||||
labels=labels, samplewise=samplewise) | ||||||||||||
numerator = MCM[:, 1, 1] | ||||||||||||
denominator = MCM[:, 1, 1] + MCM[:, 0, 1] + MCM[:, 1, 0] | ||||||||||||
|
||||||||||||
# Compute accuracy for each possible representation | ||||||||||||
y_type, y_true, y_pred = _check_targets(y_true, y_pred) | ||||||||||||
check_consistent_length(y_true, y_pred, sample_weight) | ||||||||||||
if y_type.startswith('multilabel'): | ||||||||||||
with np.errstate(divide='ignore', invalid='ignore'): | ||||||||||||
# oddly, we may get an "invalid" rather than a "divide" error here | ||||||||||||
pred_or_true = count_nonzero(y_true + y_pred, axis=1) | ||||||||||||
pred_and_true = count_nonzero(y_true.multiply(y_pred), axis=1) | ||||||||||||
score = pred_and_true / pred_or_true | ||||||||||||
score[pred_or_true == 0.0] = 1.0 | ||||||||||||
if average == 'micro': | ||||||||||||
numerator = np.array([numerator.sum()]) | ||||||||||||
denominator = np.array([denominator.sum()]) | ||||||||||||
|
||||||||||||
jaccard = _prf_divide(numerator, denominator, 'jaccard', | ||||||||||||
'true or predicted', average, ('jaccard',)) | ||||||||||||
if average is None: | ||||||||||||
return jaccard | ||||||||||||
if not normalize: | ||||||||||||
return np.sum(jaccard * (1 if sample_weight is None | ||||||||||||
else sample_weight)) | ||||||||||||
if average == 'weighted': | ||||||||||||
weights = MCM[:, 1, 0] + MCM[:, 1, 1] | ||||||||||||
if not np.any(weights): | ||||||||||||
# numerator is 0, and warning should have already been issued | ||||||||||||
weights = None | ||||||||||||
elif average == 'samples' and sample_weight is not None: | ||||||||||||
weights = sample_weight | ||||||||||||
else: | ||||||||||||
score = y_true == y_pred | ||||||||||||
|
||||||||||||
return _weighted_sum(score, sample_weight, normalize) | ||||||||||||
weights = None | ||||||||||||
return np.average(jaccard, weights=weights) | ||||||||||||
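The rewritten body above delegates counting to `multilabel_confusion_matrix` (available in scikit-learn >= 0.21, from the work referenced in the :issue:`11179` changelog entry). A small sketch of the mapping, reproducing the per-label scores from the ``average=None`` doctest: each ``MCM[i]`` is the 2x2 matrix ``[[tn, fp], [fn, tp]]``, so the per-label Jaccard is ``tp / (tp + fp + fn)``.

```python
import numpy as np
from sklearn.metrics import multilabel_confusion_matrix  # needs >= 0.21

y_true = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])
y_pred = np.array([[0, 1, 1], [1, 1, 1], [0, 0, 1]])

MCM = multilabel_confusion_matrix(y_true, y_pred)
tp = MCM[:, 1, 1]
fp = MCM[:, 0, 1]
fn = MCM[:, 1, 0]
print(tp / (tp + fp + fn))  # [0. 0. 1.], matching the average=None doctest
```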


 def matthews_corrcoef(y_true, y_pred, sample_weight=None):

@@ -1056,8 +1121,10 @@ def _prf_divide(numerator, denominator, metric, modifier, average, warn_for):
     The metric, modifier and average arguments are used only for determining
     an appropriate warning.
     """
-    result = numerator / denominator
     mask = denominator == 0.0
+    denominator = denominator.copy()
+    denominator[mask] = 1
+    result = numerator / denominator
     if not np.any(mask):
         return result
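The change to `_prf_divide` above reorders operations so the division never sees a zero denominator: build the mask first, substitute 1, then divide. A standalone NumPy sketch of the same trick (not the sklearn helper itself, which additionally issues `UndefinedMetricWarning` for the masked entries):

```python
import numpy as np

numerator = np.array([1.0, 0.0])
denominator = np.array([2.0, 0.0])

mask = denominator == 0.0
safe = denominator.copy()
safe[mask] = 1            # avoid 0/0 -> nan and the RuntimeWarning
result = numerator / safe
result[mask] = 0.0        # ill-defined entries are reported as 0
print(result)             # [0.5 0. ]
```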
@@ -1091,6 +1158,41 @@ def _prf_divide(numerator, denominator, metric, modifier, average, warn_for):
     return result


+_ALL_ZERO = object()  # sentinel for special, degenerate case
+
+
+def _check_set_wise_labels(y_true, y_pred, average, labels, pos_label):
+    """Validation associated with set-wise metrics
+
+    Returns identified labels or _ALL_ZERO sentinel
+    """
+    average_options = (None, 'micro', 'macro', 'weighted', 'samples')
+    if average not in average_options and average != 'binary':
+        raise ValueError('average has to be one of ' +
+                         str(average_options))
+
+    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
+    present_labels = unique_labels(y_true, y_pred)
+    if average == 'binary':
+        if y_type == 'binary':
+            if pos_label not in present_labels:
+                if len(present_labels) < 2:
+                    return _ALL_ZERO
+                else:
+                    raise ValueError("pos_label=%r is not a valid label: "
+                                     "%r" % (pos_label, present_labels))
+            labels = [pos_label]
+        else:
+            raise ValueError("Target is %s but average='binary'. Please "
+                             "choose another average setting." % y_type)
+    elif pos_label not in (None, 1):
+        warnings.warn("Note that pos_label (set to %r) is ignored when "
+                      "average != 'binary' (got %r). You may use "
+                      "labels=[pos_label] to specify a single positive class."
+                      % (pos_label, average), UserWarning)
+    return labels


 def precision_recall_fscore_support(y_true, y_pred, beta=1.0, labels=None,
                                     pos_label=1, average=None,
                                     warn_for=('precision', 'recall',
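The validation consolidated into `_check_set_wise_labels` is observable through any metric that routes through `precision_recall_fscore_support`; for instance, `f1_score` warns that `pos_label` is ignored whenever ``average != 'binary'``. A small sketch (this behaviour exists in released scikit-learn as well as on this branch):

```python
import warnings
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([0, 1, 1, 0])
y_pred = np.array([0, 1, 0, 0])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    f1_score(y_true, y_pred, pos_label=0, average='macro')

print(caught[0].category.__name__)  # UserWarning
print(caught[0].message)  # pos_label (set to 0) is ignored when average != 'binary'
```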
@@ -1234,35 +1336,12 @@ def precision_recall_fscore_support(y_true, y_pred, beta=1.0, labels=None,
            array([2, 2, 2]))

     """
-    average_options = (None, 'micro', 'macro', 'weighted', 'samples')
-    if average not in average_options and average != 'binary':
-        raise ValueError('average has to be one of ' +
-                         str(average_options))
     if beta <= 0:
         raise ValueError("beta should be >0 in the F-beta score")

-    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
     check_consistent_length(y_true, y_pred, sample_weight)
-    present_labels = unique_labels(y_true, y_pred)
-
-    if average == 'binary':
-        if y_type == 'binary':
-            if pos_label not in present_labels:
-                if len(present_labels) < 2:
-                    # Only negative labels
-                    return (0., 0., 0., 0)
-                else:
-                    raise ValueError("pos_label=%r is not a valid label: %r" %
-                                     (pos_label, present_labels))
-            labels = [pos_label]
-        else:
-            raise ValueError("Target is %s but average='binary'. Please "
-                             "choose another average setting." % y_type)
-    elif pos_label not in (None, 1):
-        warnings.warn("Note that pos_label (set to %r) is ignored when "
-                      "average != 'binary' (got %r). You may use "
-                      "labels=[pos_label] to specify a single positive class."
-                      % (pos_label, average), UserWarning)
+    labels = _check_set_wise_labels(y_true, y_pred, average, labels,
+                                    pos_label)
+    if labels is _ALL_ZERO:
+        return (0., 0., 0., 0)

     # Calculate tp_sum, pred_sum, true_sum ###
     samplewise = average == 'samples'
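The `_ALL_ZERO` sentinel carries the pre-existing early return across the refactor, so the observable behaviour of `precision_recall_fscore_support` is unchanged. A sketch of that degenerate case (only the negative class present with `average='binary'`), against the scikit-learn versions this PR targets:

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

y_true = np.array([0, 0, 0])
y_pred = np.array([0, 0, 0])

# pos_label=1 never appears, so every statistic is defined to be zero
print(precision_recall_fscore_support(y_true, y_pred, average='binary'))
# expected on this code path: (0.0, 0.0, 0.0, 0)
```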
Review comments:

> Just a note that this is not backward compatible for users calling it with positional arguments [sigh]! But I'm not sure what we should do in these cases.

> If we deprecate the current function and make jaccard_score, that would solve it :)
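A concrete, hypothetical illustration of the concern above. The stubs are not sklearn code; they just mirror the two signatures so the shift in positional binding is visible:

```python
def old_signature(y_true, y_pred, normalize=True, sample_weight=None):
    return 'normalize', normalize

def new_signature(y_true, y_pred, labels=None, pos_label=1,
                  average='samples', normalize='true-if-samples',
                  sample_weight=None):
    return 'labels', labels

# The same positional call binds False to different parameters:
print(old_signature([0, 1], [0, 1], False))  # ('normalize', False)
print(new_signature([0, 1], [0, 1], False))  # ('labels', False)
```

Hence the suggestion to deprecate `jaccard_similarity_score` and introduce `jaccard_score` with the new signature instead.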