ENH/FIX Replace jaccard_similarity_score by sane jaccard_score #13151

Merged
129 commits merged into scikit-learn:master on Mar 13, 2019

Conversation

@jnothman (Member) commented Feb 13, 2019

Fixes #7332. Alternative to #13092

Also simplifies the division warning logic, such that it fixes #10812 and fixes #10843 (with thanks to @qinhanmin2014 in #13143).

What does this implement/fix? Explain your changes.

The current Jaccard implementation is ridiculous for binary and multiclass problems: it simply returns accuracy. This makes a new Jaccard function with an API comparable to precision, recall and F-score, which are also fundamentally set-wise metrics.

This also drops the normalize parameter.
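
For illustration, a minimal sketch of the intended multilabel usage (the data here is made up for this description; the values are worked out by hand as |intersection| / |union| per sample or per label):

>>> import numpy as np
>>> from sklearn.metrics import jaccard_score
>>> y_true = np.array([[0, 1, 1], [1, 1, 0]])
>>> y_pred = np.array([[1, 1, 1], [1, 0, 0]])
>>> jaccard_score(y_true, y_pred, average='samples')  # doctest: +ELLIPSIS
0.58...
>>> jaccard_score(y_true, y_pred, average='macro')  # doctest: +ELLIPSIS
0.66...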

gxyd added 30 commits November 8, 2017 01:25
this deals with both multilabel and multiclass problems
labels, sample_weight seem to be working fine, though I haven't
fully tested them again; will do in the next commit

@ogrisel (Member) left a comment

LGTM overall. I like the consistency with f1 / precision / recall.

However, what I find a bit suboptimal is the combination of the following:

1- the Jaccard index is most useful with multilabel classification problems;
2- average="samples" is the most sensible averaging for this kind of problem (multilabel classification);
3- the default value of average (average="binary") yields an (informative) error for multilabel problems (it is only valid for binary classification problems, for which the Jaccard index is not a commonly used evaluation metric).

Which means that for 99% of the regular use cases for jaccard_score in scikit-learn, the user has to pass the extra verbose jaccard_score(y_multilabel_true, y_multilabel_pred, average="samples").
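
(Concretely, a sketch with made-up data; the exact error text is whatever this PR settles on:)

>>> import numpy as np
>>> from sklearn.metrics import jaccard_score
>>> y_true = np.array([[0, 1], [1, 1]])
>>> y_pred = np.array([[1, 1], [1, 0]])
>>> jaccard_score(y_true, y_pred)  # default average='binary'
Traceback (most recent call last):
    ...
ValueError: ...
>>> jaccard_score(y_true, y_pred, average='samples')
0.5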

One alternative would be to use average="samples" by default and make it explicit in the docstring that Jaccard index is most useful for multilabel classification problems.

That would make jaccard_score slightly less consistent with the F/P/R metrics but maybe more intuitive for the users.

WDYT?

In any case I like this PR overall.

>>> jaccard_score(y_true[0], y_pred[0]) # doctest: +ELLIPSIS
0.6666...
>>> jaccard_score(y_true, y_pred, average='macro') # doctest: +ELLIPSIS
0.6666...

Member

I think it would make the example easier to follow if you inserted the jaccard_score(y_true, y_pred, average=None) case before the average='macro' case.

>>> jaccard_score(y_true, y_pred, average='micro')
0.33...
>>> jaccard_score(y_true, y_pred, average=None)
array([1., 0., 0., 1.])

Member

I think this example is a bit weird because the per-class Jaccard indices are either 0. or 1., which is a bit of an edge case. Maybe make the example slightly less specific, e.g.:

>>> jaccard_score(np.array([0, 2, 1, 3]), np.array([0, 1, 1, 3]), average=None)
array([1. , 0.5, 0. , 1. ])

Member

Also could you give an intuition as to when it's useful to use the Jaccard index to score a multiclass (or binary) classification problem instead of using accuracy, AUC or F/P/R? It does not seem to be a common practice.

If so maybe we should add a note such as:

The Jaccard index is most useful to score multilabel classification models (with average="samples"). The generalization to binary and multiclass classification problems is provided for the sake of consistency but is not a common practice. Those two kinds of tasks are more commonly evaluated using other metrics such as accuracy, ROC AUC or Precision/Recall/F-score.

Member Author

You're right. It's not common practice for evaluation, although Jaccard does have some nice properties that F1 (= Dice) does not, such as being the complement of a true distance metric.
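
(A small hand-worked illustration of that property, using plain Python sets: for A={1}, B={2}, C={1, 2}, the Dice distance violates the triangle inequality while the Jaccard distance satisfies it.)

>>> def jaccard_dist(a, b):
...     return 1 - len(a & b) / len(a | b)
>>> def dice_dist(a, b):
...     return 1 - 2 * len(a & b) / (len(a) + len(b))
>>> A, B, C = {1}, {2}, {1, 2}
>>> dice_dist(A, B) <= dice_dist(A, C) + dice_dist(C, B)            # 1 <= 1/3 + 1/3
False
>>> jaccard_dist(A, B) <= jaccard_dist(A, C) + jaccard_dist(C, B)   # 1 <= 1/2 + 1/2
True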

And in whatever context, @bthirion had assumed jaccard_similarity_score would work on binary problems, since many people first know this index for set comparison.

It is, I admit, quite unusual for multiclass... and I'd be happy just to disallow it, and even throw away this PR and just allow multilabel.

Member Author

FWIW, I think P/R/F are pretty weird for multiclass in some ways too...

Member

And in whatever context, @bthirion had assumed jaccard_similarity_score would work on binary problems, since many people first know this index for set comparison.

Seems that the wiki page we cite also demonstrates the binary case? (See https://en.wikipedia.org/wiki/Jaccard_index, Similarity of asymmetric binary attributes)

It is, I admit, quite unusual for multiclass...

Agree. I'm happy to remove, or keep it for the sake of consistency.

FWIW, I think P/R/F are pretty weird for multiclass in some ways too...

What do you mean? I guess it's reasonable for P/R/F to support multiclass?

>>> jaccard_score(y_true, y_pred, average='weighted')
0.5
>>> jaccard_score(y_true, y_pred, average=None)
array([0., 0., 1.])

Member

Here again, maybe better to illustrate with data that yield class-wise Jaccard indices that are not all 0 or 1.
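
(For instance, something along these lines, with the per-class values and supports worked out by hand:)

>>> # per-class Jaccard is [1., 0., 0.33...] and the class supports are [1, 1, 2]
>>> jaccard_score([0, 1, 2, 2], [0, 2, 1, 2], average='weighted')  # doctest: +ELLIPSIS
0.41...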

assert_raise_message(ValueError, msg2, jaccard_score, y_true,
                     y_pred, average='binary')
msg3 = ("Samplewise metrics are not available outside of multilabel "
        "classification.")

Member

If it's easy enough, can you include the actual type of target that was inferred from y_true and y_pred in the error message?

Member

Also maybe the phrasing "Samplewise averaging is not available outside...." would make more sense.

Member Author

Also maybe the phrasing "Samplewise averaging is not available outside...." would make more sense

The problem with this is only that the error is being raised within multilabel_confusion_matrix which doesn't do the averaging...

# size(y1 \inter y2) = [1, 2]
# size(y1 \union y2) = [2, 2]

jss = partial(assert_warns, DeprecationWarning, jaccard_similarity_score)

Member

nice trick :)
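
(The trick, for readers following along: functools.partial pre-binds the first arguments, and scikit-learn's assert_warns returns the wrapped function's result, so each jss(...) call both checks the deprecation warning and yields the score. A rough equivalent:)

>>> def jss(*args, **kwargs):
...     # fails the test if no DeprecationWarning is issued,
...     # otherwise returns jaccard_similarity_score(*args, **kwargs)
...     return assert_warns(DeprecationWarning, jaccard_similarity_score,
...                         *args, **kwargs)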

@jnothman (Member Author) commented Mar 10, 2019 via email

@qinhanmin2014 (Member)

Which means that for 99% of the regular use cases for jaccard_score in scikit-learn, the user has to pass the extra verbose jaccard_score(y_multilabel_true, y_multilabel_pred, average="samples").

This is actually the advantage of average="binary", right? (i.e., to make users aware of different average options).

@jnothman (Member Author)

Apologies if this doesn't pass CI... I'm having trouble with my build environment

@@ -1239,8 +1245,12 @@ def _check_set_wise_labels(y_true, y_pred, average, labels, pos_label):
                                 "%r" % (pos_label, present_labels))
        labels = [pos_label]
    else:
        applicable_options = list(average_options)
        if y_type == 'multiclass':
            applicable_options = applicable_options.remove('samples')

Member

should be applicable_options.remove('samples') :)
I think we can use average_options directly because it's not used later.
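
(The underlying issue: list.remove mutates the list in place and returns None, so assigning its result throws the list away. A quick illustration:)

>>> opts = ['binary', 'micro', 'macro', 'weighted', 'samples']
>>> opts.remove('samples')        # mutates opts in place, returns None
>>> opts
['binary', 'micro', 'macro', 'weighted']
>>> opts = opts.remove('macro')   # the bug: rebinds opts to None
>>> print(opts)
None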

@jnothman (Member Author)

What have I missed?

756     >>> y_pred = [0, 2, 1, 2]  # doctest: +NORMALIZE_WHITESPACE, +ELLIPSIS
757     >>> y_true = [0, 1, 2, 2]
758     >>> jaccard_score(y_true, y_pred, average=None)
Expected:
    array([1. , 0. , 0.33...])
Got:
    array([1.        , 0.        , 0.33333333])


In the multiclass case:

>>> y_pred = [0, 2, 1, 2] # doctest: +NORMALIZE_WHITESPACE, +ELLIPSIS

Member

these should be added at L760?

Member Author

I keep being told by @adrinjalali that their scope is not just the line

>>> y_pred = [0, 2, 1, 2]
>>> y_true = [0, 1, 2, 2]
>>> jaccard_score(y_true, y_pred, average=None)
array([1., 0., 0.33...])

Member

  >>> jaccard_score(y_true, y_pred, average=None)
  ... # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
  array([1. , 0. , 0.33...])
  >>> jaccard_score(y_true, y_pred, average='macro')
  0.44...
  >>> jaccard_score(y_true, y_pred, average='micro')
  0.33...

For the micro case, total true positives = 2 and total false positives + false negatives = 4, so the micro Jaccard is 2 / (2 + 4) = 0.33.

Member Author

Yes, I was lazy and fudged this while waiting for CI output, because I'm not actually able to compile and use the repo on my laptop atm :\

C compiler problems after migrating to this laptop, which I've not worked out how to resolve yet.

@qinhanmin2014 (Member)

@jnothman tests are still failing; @thomasjpfan's solution #13151 (comment) seems reasonable and works locally.

@jnothman (Member Author)

So shall we merge?

@qinhanmin2014 qinhanmin2014 merged commit 19c8af6 into scikit-learn:master Mar 13, 2019

@jnothman (Member Author)

Sigh of relief. Thanks @gxyd for all your work on this!
