Classification metrics inconsistencies #11743
Comments
The jaccard_similarity_score implementation is confused. Let's leave it
alone. I would like to have it fixed for the 0.20 release, but it likely
won't be (unless another core dev wants to look at #10083 or, preferably,
#11179, from which it would be simple). But I don't see why providing
Jaccard is fundamentally a problem. Lots of information retrieval and
classification work uses F1 score which is Dice coefficient, and as far as
I can tell, Jaccard is more principled and practically more informative
than Dice (it doesn't double-count errors; and it corresponds to a true
distance metric).
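For illustration, here is a minimal sketch (plain Python, toy label sets made up for the example) contrasting the two scores; Dice on two sets equals the F1 score, and the numbers show how Jaccard counts each error once while Dice effectively discounts it:

```python
# Jaccard vs. Dice on a single pair of label sets (illustrative only).
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def dice(a, b):  # equivalent to the F1 score on two sets
    a, b = set(a), set(b)
    return 2 * len(a & b) / (len(a) + len(b))

y_true_labels = {"cat", "dog", "bird"}
y_pred_labels = {"cat", "dog", "fish"}

print(jaccard(y_true_labels, y_pred_labels))  # 2 / 4 = 0.5
print(dice(y_true_labels, y_pred_labels))     # 2 * 2 / 6 ≈ 0.667
```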
Your points aren't especially clear. zero_one_loss is defined, in the code,
as the complement of accuracy_score. So I think the rest of your concerns,
about how hamming_loss relates to zero_one_loss (and perhaps about how the
available metrics relate to the set of metrics you might want), need to be
clarified further, ideally with an example.
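That complement relationship is easy to verify (a minimal sketch; the toy labels are arbitrary):

```python
import numpy as np
from sklearn.metrics import accuracy_score, zero_one_loss

y_true = np.array([0, 1, 2, 1, 0])
y_pred = np.array([0, 2, 2, 1, 1])

# With the default normalize=True, zero_one_loss is exactly 1 - accuracy_score.
acc = accuracy_score(y_true, y_pred)   # 3 correct out of 5 -> 0.6
loss = zero_one_loss(y_true, y_pred)   # 2 wrong out of 5   -> 0.4
assert np.isclose(acc + loss, 1.0)
```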
The use of the labels parameter in hamming_loss is weird. It is used for
its length, but only in the multilabel case, where the number of labels
should be y_true.shape[1]. I suspect this is due to legacy code not being
changed correctly when we changed multilabel format. We should probably
deprecate it.
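For context, a sketch of what the multilabel computation reduces to (made-up indicator matrices, default sample_weight assumed):

```python
import numpy as np
from sklearn.metrics import hamming_loss

# Multilabel indicator format: rows are samples, columns are labels.
y_true = np.array([[1, 0, 1],
                   [0, 1, 0]])
y_pred = np.array([[1, 1, 1],
                   [0, 0, 0]])

# The multilabel Hamming loss is the fraction of label entries that
# disagree, so the number of labels is just y_true.shape[1]; a labels
# argument would only ever contribute its length.
print(hamming_loss(y_true, y_pred))  # 2 mismatches / 6 entries ≈ 0.333
print(np.mean(y_true != y_pred))     # same value
```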
normalize is available inconsistently. Okay.
Now why would we *need* a related accuracy metric corresponding to hamming
loss? We provide both accuracy_score and zero_one_loss only out of
convenience. We have no intention of maintaining complements of every loss.
You might also be interested in #11179, which defines generic tools for
set-wise scores.
Agreed with everything. To clarify further, I believe it would be worth having an accuracy metric which, in the multi-label setting, instead of computing subset accuracy (which is a strict metric) would just compute the average accuracy across samples and labels. One idea for implementing this would be to include an additional input argument.

The reason I am making this point is not because I believe that we need a complement/dyadic metric to hamming_loss, but because in some, though not all, applications this metric might actually be useful. I am currently working on an application where such a metric makes much more sense than the strict subset accuracy provided by accuracy_score. I know it is rather straightforward to implement, but I reckon it would be worth providing it off-the-shelf.
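To make the suggestion concrete, a rough sketch of such a metric; the name label_wise_accuracy and the function itself are hypothetical (not part of scikit-learn), and for multilabel indicator input it is simply the complement of the Hamming loss:

```python
import numpy as np

def label_wise_accuracy(y_true, y_pred):
    """Hypothetical metric: average accuracy over samples and labels.

    For multilabel indicator matrices this equals 1 - hamming_loss;
    for single-label (binary/multiclass) targets it coincides with
    accuracy_score.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return np.mean(y_true == y_pred)

y_true = np.array([[1, 0, 1],
                   [0, 1, 0]])
y_pred = np.array([[1, 1, 1],
                   [0, 1, 0]])
print(label_wise_accuracy(y_true, y_pred))  # 5/6, vs. the strict subset accuracy of 1/2
```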
It's not different enough from hamming loss for it to be interesting to me. I
think multilabel_confusion_matrix will also help users imagine variant
metrics.
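For reference, a sketch of how per-label variants could be derived from such a matrix (this assumes the multilabel_confusion_matrix function that was later added in scikit-learn 0.21; the toy data is made up):

```python
import numpy as np
from sklearn.metrics import multilabel_confusion_matrix

y_true = np.array([[1, 0, 1],
                   [0, 1, 0]])
y_pred = np.array([[1, 1, 1],
                   [0, 1, 0]])

# One 2x2 confusion matrix per label, laid out as [[tn, fp], [fn, tp]].
mcm = multilabel_confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = mcm[:, 0, 0], mcm[:, 0, 1], mcm[:, 1, 0], mcm[:, 1, 1]

# Per-label accuracy, then averaged; again just 1 - hamming_loss here.
per_label_acc = (tp + tn) / (tp + tn + fp + fn)
print(per_label_acc.mean())  # ≈ 0.833
```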
Agreed, but that's why I made the connection between zero_one_loss and accuracy_score in my first post. I mean, if we were to follow the same logic, then we wouldn't need accuracy_score since we already have zero_one_loss? Anyway, please feel free to close the issue if you think this is not needed (although it might be worth opening a new one for fixing hamming_loss).
[This issue is related to #7332].
I think there is some inconsistency with the current classification metrics.
Let's consider the following four metrics:
accuracy_score
zero_one_loss
jaccard_similarity_score
hamming_loss
From my understanding, accuracy_score and zero_one_loss represent exactly opposite things (i.e. there is a 1-to-1 relationship between the two), therefore they must always sum to 1. Makes sense. Additionally, in the multi-label setting, both metrics are strict; that is, for a given sample the set of labels in y_pred must exactly match the corresponding set of labels in y_true for either metric to consider this sample as accurate.

Now, am I right in understanding that hamming_loss is essentially the same as zero_one_loss, except that it is not strict in the multi-label setting, since in that case it penalises individual labels? In a single-label setting (either binary or multi-class, it doesn't matter), hamming_loss and zero_one_loss should be equivalent.

So far so good. Now, I would assume that, with respect to hamming_loss, we would need a related accuracy metric which looks at the individual labels in the multi-label setting. In a single-label setting (either binary or multi-class) this metric would be equivalent to accuracy_score. In other words, similarly to the pair accuracy_score / zero_one_loss, we would have another pair some_new_accuracy_score / hamming_loss so that the latter two always sum up to one, in the same way that accuracy_score and zero_one_loss sum up to one (again, this would hold for multi-class and multi-label classification alike).

Instead we have jaccard_similarity_score, which is defined as the intersection over union -- are we sure this is what we really want?

Furthermore, I am not sure I understand why hamming_loss does not have a normalize input argument, in the same way that zero_one_loss, accuracy_score and jaccard_similarity_score do. Equally, why is hamming_loss the only metric supporting the classes (to be deprecated in favour of labels) input argument?
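To see the relationships discussed above side by side, a small multilabel example (a sketch using the API as it existed around 0.20; jaccard_similarity_score was later replaced by jaccard_score):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, zero_one_loss,
                             hamming_loss, jaccard_similarity_score)

y_true = np.array([[1, 0, 1],
                   [0, 1, 0]])
y_pred = np.array([[1, 1, 1],
                   [0, 1, 0]])

print(accuracy_score(y_true, y_pred))            # strict subset accuracy: 0.5
print(zero_one_loss(y_true, y_pred))             # its complement: 0.5
print(hamming_loss(y_true, y_pred))              # per-label errors: 1/6
print(jaccard_similarity_score(y_true, y_pred))  # sample-wise intersection over union, averaged
```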