ENH/FIX Replace jaccard_similarity_score by sane jaccard_score #13151

Merged
129 commits merged into scikit-learn:master on Mar 13, 2019

Conversation

@jnothman (Member) commented Feb 13, 2019

Fixes #7332. Alternative to #13092

Also simplifies the division warning logic, such that it fixes #10812 and fixes #10843 (with thanks to @qinhanmin2014 in #13143).

What does this implement/fix? Explain your changes.

The current Jaccard implementation is ridiculous for binary and multiclass problems: it simply returns accuracy. This makes a new Jaccard function with an API comparable to precision, recall and F-score, which are also fundamentally set-wise metrics.

This also drops the normalize parameter.
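
For illustration, a minimal sketch of the intended multilabel usage (the data here is made up for this description; the values are worked out by hand as |intersection| / |union| per sample or per label):

>>> import numpy as np
>>> from sklearn.metrics import jaccard_score
>>> y_true = np.array([[0, 1, 1], [1, 1, 0]])
>>> y_pred = np.array([[1, 1, 1], [1, 0, 0]])
>>> jaccard_score(y_true, y_pred, average='samples')  # doctest: +ELLIPSIS
0.58...
>>> jaccard_score(y_true, y_pred, average='macro')  # doctest: +ELLIPSIS
0.66...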

gxyd added 30 commits November 8, 2017 01:25
this deals with both multilabel and multiclass problems
labels, sample_weight seem to be working fine, though I haven't
fully tested them again; will do in the next commit

@ogrisel (Member) left a comment

LGTM overall. I like the consistency with f1 / precision / recall.

However, what I find a bit suboptimal is the combination of the following:

1- the Jaccard index is most useful with multilabel classification problems;
2- average="samples" is the most sensible averaging for this kind of problem (multilabel classification);
3- the default value of average (average="binary") yields an (informative) error for multilabel problems (it is only valid for binary classification problems, for which the Jaccard index is not a commonly used evaluation metric).

Which means that for 99% of the regular use cases for jaccard_score in scikit-learn, the user has to pass the extra verbose jaccard_score(y_multilabel_true, y_multilabel_pred, average="samples").
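
(Concretely, a sketch with made-up data; the exact error text is whatever this PR settles on:)

>>> import numpy as np
>>> from sklearn.metrics import jaccard_score
>>> y_true = np.array([[0, 1], [1, 1]])
>>> y_pred = np.array([[1, 1], [1, 0]])
>>> jaccard_score(y_true, y_pred)  # default average='binary'
Traceback (most recent call last):
    ...
ValueError: ...
>>> jaccard_score(y_true, y_pred, average='samples')
0.5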

One alternative would be to use average="samples" by default and make it explicit in the docstring that Jaccard index is most useful for multilabel classification problems.

That would make jaccard_score slightly less consistent with the F/P/R metrics but maybe more intuitive for the users.

WDYT?

In any case I like this PR overall.

>>> jaccard_score(y_true[0], y_pred[0]) # doctest: +ELLIPSIS
0.6666...
>>> jaccard_score(y_true, y_pred, average='macro') # doctest: +ELLIPSIS
0.6666...

Member

I think it would make the example easier to follow if you inserted the jaccard_score(y_true, y_pred, average=None) case before the average='macro' case.

>>> jaccard_score(y_true, y_pred, average='micro')
0.33...
>>> jaccard_score(y_true, y_pred, average=None)
array([1., 0., 0., 1.])

Member

I think this example is a bit weird because the per-class Jaccard indices are either 0. or 1., which is a bit of an edge case. Maybe make the example slightly less specific, e.g.:

>>> jaccard_score(np.array([0, 2, 1, 3]), np.array([0, 1, 1, 3]), average=None)
array([1. , 0.5, 0. , 1. ])

Member

Also could you give an intuition as to when it's useful to use the Jaccard index to score a multiclass (or binary) classification problem instead of using accuracy, AUC or F/P/R? It does not seem to be a common practice.

If so maybe we should add a note such as:

The Jaccard index is most useful to score multilabel classification models (with average="samples"). The generalization to binary and multiclass classification problems is provided for the sake of consistency but is not a common practice. Those two kinds of tasks are more commonly evaluated using other metrics such as accuracy, ROC AUC or Precision/Recall/F-score.

Member Author

You're right. It's not common practice for evaluation, although Jaccard does have some nice properties that F1 (= Dice) does not, such as being the complement of a true distance metric.
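
(A small hand-worked illustration of that property, using plain Python sets: for A={1}, B={2}, C={1, 2}, the Dice distance violates the triangle inequality while the Jaccard distance satisfies it.)

>>> def jaccard_dist(a, b):
...     return 1 - len(a & b) / len(a | b)
>>> def dice_dist(a, b):
...     return 1 - 2 * len(a & b) / (len(a) + len(b))
>>> A, B, C = {1}, {2}, {1, 2}
>>> dice_dist(A, B) <= dice_dist(A, C) + dice_dist(C, B)            # 1 <= 1/3 + 1/3
False
>>> jaccard_dist(A, B) <= jaccard_dist(A, C) + jaccard_dist(C, B)   # 1 <= 1/2 + 1/2
True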

And in whatever context, @bthirion had assumed jaccard_similarity_score would work on binary problems, since many people first know this index for set comparison.

It is, I admit, quite unusual for multiclass... and I'd be happy just to disallow it, and even throw away this PR and just allow multilabel.

Member Author

FWIW, I think P/R/F are pretty weird for multiclass in some ways too...

Member

And in whatever context, @bthirion had assumed jaccard_similarity_score would work on binary problems, since many people first know this index for set comparison.

Seems that the wiki page we cite also demonstrates the binary case? (See https://en.wikipedia.org/wiki/Jaccard_index, Similarity of asymmetric binary attributes)

It is, I admit, quite unusual for multiclass...

Agree. I'm happy to remove, or keep it for the sake of consistency.

FWIW, I think P/R/F are pretty weird for multiclass in some ways too...

What do you mean? I guess it's reasonable for P/R/F to support multiclass?

>>> jaccard_score(y_true, y_pred, average='weighted')
0.5
>>> jaccard_score(y_true, y_pred, average=None)
array([0., 0., 1.])

Member

Here again, maybe better to illustrate with data that yield class-wise Jaccard indices that are not all 0 or 1.
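
(For instance, something along these lines, with the per-class values and supports worked out by hand:)

>>> # per-class Jaccard is [1., 0., 0.33...] and the class supports are [1, 1, 2]
>>> jaccard_score([0, 1, 2, 2], [0, 2, 1, 2], average='weighted')  # doctest: +ELLIPSIS
0.41...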

assert_raise_message(ValueError, msg2, jaccard_score, y_true,
                     y_pred, average='binary')
msg3 = ("Samplewise metrics are not available outside of multilabel "
        "classification.")

Member

If it's easy enough, can you include the actual type of target that was inferred from y_true and y_pred in the error message?

Member

Also maybe the phrasing "Samplewise averaging is not available outside...." would make more sense.

Member Author

Also maybe the phrasing "Samplewise averaging is not available outside...." would make more sense

The problem with this is only that the error is being raised within multilabel_confusion_matrix which doesn't do the averaging...

# size(y1 \inter y2) = [1, 2]
# size(y1 \union y2) = [2, 2]

jss = partial(assert_warns, DeprecationWarning, jaccard_similarity_score)

Member

nice trick :)
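
(The trick, for readers following along: functools.partial pre-binds the first arguments, and scikit-learn's assert_warns returns the wrapped function's result, so each jss(...) call both checks the deprecation warning and yields the score. A rough equivalent:)

>>> def jss(*args, **kwargs):
...     # fails the test if no DeprecationWarning is issued,
...     # otherwise returns jaccard_similarity_score(*args, **kwargs)
...     return assert_warns(DeprecationWarning, jaccard_similarity_score,
...                         *args, **kwargs)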

@jnothman (Member Author) commented Mar 10, 2019 via email

@qinhanmin2014 (Member)

Which means that for 99% of the regular use cases for jaccard_score in scikit-learn, the user has to pass the extra verbose jaccard_score(y_multilabel_true, y_multilabel_pred, average="samples").

This is actually the advantage of average="binary", right? (i.e., to make users aware of different average options).

@jnothman (Member Author)

Apologies if this doesn't pass CI... I'm having trouble with my build environment

@@ -1239,8 +1245,12 @@ def _check_set_wise_labels(y_true, y_pred, average, labels, pos_label):
                                 "%r" % (pos_label, present_labels))
        labels = [pos_label]
    else:
        applicable_options = list(average_options)
        if y_type == 'multiclass':
            applicable_options = applicable_options.remove('samples')

Member

should be applicable_options.remove('samples') :)
I think we can use average_options directly because it's not used later.
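
(The underlying issue: list.remove mutates the list in place and returns None, so assigning its result throws the list away. A quick illustration:)

>>> opts = ['binary', 'micro', 'macro', 'weighted', 'samples']
>>> opts.remove('samples')        # mutates opts in place, returns None
>>> opts
['binary', 'micro', 'macro', 'weighted']
>>> opts = opts.remove('macro')   # the bug: rebinds opts to None
>>> print(opts)
None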

@jnothman (Member Author)

What have I missed?

756     >>> y_pred = [0, 2, 1, 2]  # doctest: +NORMALIZE_WHITESPACE, +ELLIPSIS
757     >>> y_true = [0, 1, 2, 2]
758     >>> jaccard_score(y_true, y_pred, average=None)
Expected:
    array([1. , 0. , 0.33...])
Got:
    array([1.        , 0.        , 0.33333333])


In the multiclass case:

>>> y_pred = [0, 2, 1, 2] # doctest: +NORMALIZE_WHITESPACE, +ELLIPSIS

Member

these should be added at L760?

Member Author

I keep being told by @adrinjalali that their scope is not just the line

>>> y_pred = [0, 2, 1, 2]
>>> y_true = [0, 1, 2, 2]
>>> jaccard_score(y_true, y_pred, average=None)
array([1., 0., 0.33...])

Member

  >>> jaccard_score(y_true, y_pred, average=None)
  ... # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
  array([1. , 0. , 0.33...])
  >>> jaccard_score(y_true, y_pred, average='macro')
  0.44...
  >>> jaccard_score(y_true, y_pred, average='micro')
  0.33...

For the micro case, total true positives = 2 and total false positives + false negatives = 4, so the micro Jaccard is 2 / (2 + 4) = 0.33.

Member Author

Yes, I was lazy and fudged this while waiting for CI output, because I'm not actually able to compile and use the repo on my laptop atm :\

C compiler problems after migrating to this laptop, which I've not worked out how to resolve yet.

@qinhanmin2014 (Member)

@jnothman tests are still failing; @thomasjpfan's solution #13151 (comment) seems reasonable and works locally.

@jnothman (Member Author)

So shall we merge?

@qinhanmin2014 qinhanmin2014 merged commit 19c8af6 into scikit-learn:master Mar 13, 2019

@jnothman (Member Author)

Sigh of relief. Thanks @gxyd for all your work on this!
