Classification metrics inconsistencies #11743
Comments
The jaccard_similarity_score implementation is confused. Let's leave it
alone. I would like to have it fixed for the 0.20 release, but it likely
won't be (unless another core dev wants to look at #10083 or, preferably,
#11179, from which it would be simple). But I don't see why providing
Jaccard is fundamentally a problem. Lots of information retrieval and
classification work uses F1 score which is Dice coefficient, and as far as
I can tell, Jaccard is more principled and practically more informative
than Dice (it doesn't double-count errors; and it corresponds to a true
distance metric).
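For illustration, here is a minimal sketch (plain Python, toy label sets made up for the example) contrasting the two scores; Dice on two sets equals the F1 score, and the numbers show how Jaccard counts each error once while Dice effectively discounts it:

```python
# Jaccard vs. Dice on a single pair of label sets (illustrative only).
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def dice(a, b):  # equivalent to the F1 score on two sets
    a, b = set(a), set(b)
    return 2 * len(a & b) / (len(a) + len(b))

y_true_labels = {"cat", "dog", "bird"}
y_pred_labels = {"cat", "dog", "fish"}

print(jaccard(y_true_labels, y_pred_labels))  # 2 / 4 = 0.5
print(dice(y_true_labels, y_pred_labels))     # 2 * 2 / 6 ≈ 0.667
```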
Your points aren't especially clear. zero_one_loss is defined, in the code,
as the complement of accuracy_score. So I think the rest of your concerns,
about how hamming_loss relates to zero_one_loss (and perhaps about how the
available metrics relate to the set of metrics you might want), need to be
clarified further, ideally with an example.
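That complement relationship is easy to verify (a minimal sketch; the toy labels are arbitrary):

```python
import numpy as np
from sklearn.metrics import accuracy_score, zero_one_loss

y_true = np.array([0, 1, 2, 1, 0])
y_pred = np.array([0, 2, 2, 1, 1])

# With the default normalize=True, zero_one_loss is exactly 1 - accuracy_score.
acc = accuracy_score(y_true, y_pred)   # 3 correct out of 5 -> 0.6
loss = zero_one_loss(y_true, y_pred)   # 2 wrong out of 5   -> 0.4
assert np.isclose(acc + loss, 1.0)
```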
The use of the labels parameter in hamming_loss is weird. It is used for
its length, but only in the multilabel case, where the number of labels
should be y_true.shape[1]. I suspect this is due to legacy code not being
changed correctly when we changed multilabel format. We should probably
deprecate it.
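For context, a sketch of what the multilabel computation reduces to (made-up indicator matrices, default sample_weight assumed):

```python
import numpy as np
from sklearn.metrics import hamming_loss

# Multilabel indicator format: rows are samples, columns are labels.
y_true = np.array([[1, 0, 1],
                   [0, 1, 0]])
y_pred = np.array([[1, 1, 1],
                   [0, 0, 0]])

# The multilabel Hamming loss is the fraction of label entries that
# disagree, so the number of labels is just y_true.shape[1]; a labels
# argument would only ever contribute its length.
print(hamming_loss(y_true, y_pred))  # 2 mismatches / 6 entries ≈ 0.333
print(np.mean(y_true != y_pred))     # same value
```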
normalize is available inconsistently. Okay.
Now why would we *need* a related accuracy metric corresponding to hamming
loss? We provide both accuracy_score and zero_one_loss only out of
convenience. We have no intention of maintaining complements of every loss.
You might also be interested in #11179, which defines generic tools for
set-wise scores.
Agreed with everything. To clarify further, I believe it would be worth having an accuracy metric which, in the multi-label setting, instead of computing subset accuracy (which is a strict metric) would just compute the average accuracy across samples and labels. One idea for implementing this would be to include an additional input argument.

The reason I am making this point is not because I believe that we need a complement/dyadic metric to hamming_loss, but because in some, though not all, applications this metric might actually be useful. I am currently working on an application where such a metric makes much more sense than the strict subset accuracy provided by accuracy_score. I know it is rather straightforward to implement, but I reckon it would be worth providing it off-the-shelf.
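To make the suggestion concrete, a rough sketch of such a metric; the name label_wise_accuracy and the function itself are hypothetical (not part of scikit-learn), and for multilabel indicator input it is simply the complement of the Hamming loss:

```python
import numpy as np

def label_wise_accuracy(y_true, y_pred):
    """Hypothetical metric: average accuracy over samples and labels.

    For multilabel indicator matrices this equals 1 - hamming_loss;
    for single-label (binary/multiclass) targets it coincides with
    accuracy_score.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return np.mean(y_true == y_pred)

y_true = np.array([[1, 0, 1],
                   [0, 1, 0]])
y_pred = np.array([[1, 1, 1],
                   [0, 1, 0]])
print(label_wise_accuracy(y_true, y_pred))  # 5/6, vs. the strict subset accuracy of 1/2
```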
It's not different enough from hamming loss for it to be interesting to me. I
think multilabel_confusion_matrix will also help users imagine variant
metrics.
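For reference, a sketch of how per-label variants could be derived from such a matrix (this assumes the multilabel_confusion_matrix function that was later added in scikit-learn 0.21; the toy data is made up):

```python
import numpy as np
from sklearn.metrics import multilabel_confusion_matrix

y_true = np.array([[1, 0, 1],
                   [0, 1, 0]])
y_pred = np.array([[1, 1, 1],
                   [0, 1, 0]])

# One 2x2 confusion matrix per label, laid out as [[tn, fp], [fn, tp]].
mcm = multilabel_confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = mcm[:, 0, 0], mcm[:, 0, 1], mcm[:, 1, 0], mcm[:, 1, 1]

# Per-label accuracy, then averaged; again just 1 - hamming_loss here.
per_label_acc = (tp + tn) / (tp + tn + fp + fn)
print(per_label_acc.mean())  # ≈ 0.833
```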
Agreed, but that's why I made the connection between zero_one_loss and accuracy_score in my first post. I mean, if we were to follow the same logic, then we wouldn't need accuracy_score since we already have zero_one_loss? Anyway, please feel free to close the issue if you think this is not needed (although it might be worth opening a new one for fixing hamming_loss).
[This issue is related to #7332].
I think there is some inconsistency with the current classification metrics.
Let's consider the following four metrics:
accuracy_score
zero_one_loss
jaccard_similarity_score
hamming_loss
From my understanding, accuracy_score and zero_one_loss represent exactly opposite things (i.e. there is a 1-to-1 relationship between the two), therefore they must always sum to 1. Makes sense. Additionally, in the multi-label setting, both metrics are strict; that is, for a given sample the set of labels in y_pred must exactly match the corresponding set of labels in y_true for either metric to consider this sample as accurate.

Now, am I right in understanding that hamming_loss is essentially the same as zero_one_loss, except that it is not strict in the multi-label setting, since in that case it penalises individual labels? In a single-label setting (either binary or multi-class, it doesn't matter), hamming_loss and zero_one_loss should be equivalent.

So far so good. Now, I would assume that, with respect to hamming_loss, we would need a related accuracy metric which looks at the individual labels in the multi-label setting. In a single-label setting (either binary or multi-class) this metric would be equivalent to accuracy_score. In other words, similarly to the pair accuracy_score / zero_one_loss, we would have another pair some_new_accuracy_score / hamming_loss so that the latter two always sum up to one, in the same way that accuracy_score and zero_one_loss sum up to one (again, this would hold for multi-class and multi-label classification alike).

Instead we have jaccard_similarity_score, which is defined as the intersection over union -- are we sure this is what we really want?

Furthermore, I am not sure I understand why hamming_loss does not have a normalize input argument, in the same way that zero_one_loss, accuracy_score and jaccard_similarity_score do. Equally, why is hamming_loss the only metric supporting the classes (to be deprecated in favour of labels) input argument?
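To see the relationships discussed above side by side, a small multilabel example (a sketch using the API as it existed around 0.20; jaccard_similarity_score was later replaced by jaccard_score):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, zero_one_loss,
                             hamming_loss, jaccard_similarity_score)

y_true = np.array([[1, 0, 1],
                   [0, 1, 0]])
y_pred = np.array([[1, 1, 1],
                   [0, 1, 0]])

print(accuracy_score(y_true, y_pred))            # strict subset accuracy: 0.5
print(zero_one_loss(y_true, y_pred))             # its complement: 0.5
print(hamming_loss(y_true, y_pred))              # per-label errors: 1/6
print(jaccard_similarity_score(y_true, y_pred))  # sample-wise intersection over union, averaged
```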