[MRG] FEA multilabel confusion matrix #11179
Conversation
Nb: In #5516 I preferred adding scorers, but not metric functions, for things like fallout. What do the benchmark results look like? |
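For context, a minimal sketch of what a scorer-only addition for fallout (false positive rate) could look like on a binary problem; the _fallout helper below is hypothetical, not code from #5516:

from sklearn.metrics import confusion_matrix, make_scorer

def _fallout(y_true, y_pred):
    # Hypothetical helper: fallout (false positive rate) = FP / (FP + TN),
    # read off the binary confusion matrix [[tn, fp], [fn, tp]].
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return fp / (fp + tn)

# Fallout is a loss (lower is better), hence greater_is_better=False.
fallout_scorer = make_scorer(_fallout, greater_is_better=False)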
Also, please prefix your PR title with [WIP] until tests and features are complete. |
The benchmarking result of … I have optimized the speed for … The … |
I'll have to take a good look at this some point soon, but tbh this is not critical for the coming release so it might take some time. Feel free to ping. |
Ok, I will ping you when I have finished. |
Re contributing to something critical... #3855 might be a good fit for you? |
That issue looks interesting, I am looking into it. |
Unless I'm mistaken, you're no longer using this in precision_recall_fscore_support due to poor benchmarks in the multilabel case. I consider its use in that function to be a key goal.
Is there a faster way to count true positives, false positives and false negatives for each class without using confusion_matrix?
Also, could you please change your benchmark plots to show comparable curves with the same colour but different markers on them depending on whether they are using mlcm or not?
Thanks a lot!
Btw, that issue I pointed you to is a long-term wish, not critical for the next release.
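One possible answer to the question above, as an illustrative sketch (not code from this thread): for multilabel indicator input, the per-class counts reduce to column sums, assuming y_true and y_pred are {0, 1} arrays of shape (n_samples, n_classes):

import numpy as np

def _per_class_counts(y_true, y_pred):
    # Illustrative helper: per-class TP/FP/FN via column sums over the
    # indicator matrices, with no per-class confusion matrix needed.
    tp = np.sum(y_true * y_pred, axis=0)
    fp = np.sum((1 - y_true) * y_pred, axis=0)
    fn = np.sum(y_true * (1 - y_pred), axis=0)
    return tp, fp, fn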
sklearn/metrics/classification.py
Outdated
labels=None, samplewise=False):
    """Returns a confusion matrix for each output of a multilabel problem

    Multiclass tasks will be treated as if binarised under a one-vs-rest
Perhaps say (i.e. where y is 1d)
sklearn/metrics/classification.py
Outdated
raise ValueError("Samplewise confusion is not useful outside of "
                 "multilabel classification.")
present_labels = unique_labels(y_true, y_pred)
C = confusion_matrix(y_true, y_pred, sample_weight=sample_weight,
Yes, we should try to do this with LabelEncoder and bincount. Or with confusion_matrix without its validation.
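A minimal sketch of that suggestion (hypothetical helper name, not the code that ended up in the PR): encode both label vectors to 0..n-1, then count every (true, pred) pair with a single bincount, skipping confusion_matrix's input validation:

import numpy as np
from sklearn.preprocessing import LabelEncoder

def _fast_confusion_matrix(y_true, y_pred):
    # Fit the encoder on both arrays so every label gets an index.
    le = LabelEncoder().fit(np.concatenate([y_true, y_pred]))
    yt, yp = le.transform(y_true), le.transform(y_pred)
    n = len(le.classes_)
    # Each (true, pred) pair maps to a unique bin n * yt + yp.
    return np.bincount(n * yt + yp, minlength=n * n).reshape(n, n)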
I tried using LabelEncoder and bincount as you said, but then it is pretty much the same as the original implementation, so there is no speedup in doing that. Interestingly, though, in the multiclass and binary cases, if I replace the call to confusion_matrix with the LabelEncoder and bincount implementation, it makes multilabel_confusion_matrix faster than confusion_matrix:
In [7]: y_true = np.random.randint(0, 2, (300,))
In [8]: y_pred = np.random.randint(0, 2, (300,))
In [9]: %timeit multilabel_confusion_matrix(y_true, y_pred)
308 µs ± 3.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [10]: %timeit confusion_matrix(y_true, y_pred)
488 µs ± 5.72 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
So there should be room for improvement here; I will let you know when I dig deeper.
I invented the name. The function is mostly there to help us implement arbitrary metrics based on set-wise/binary statistics: Precision, Recall, F1, Fβ, Jaccard, Specificity, etc. The point is that it calculates sufficient statistics for these metrics, just as … Because its needs are mostly internal, and it would mostly be used through the metric aggregates, it is unlikely to be referenced in the literature as such.

It makes sense, however, that utiml similarly implements something like this, because from it they can derive all the metrics they need for multilabel evaluation. Their "absolute matrix" is not identical to our …, though.

The binary case (at least with 1d input) needs to be handled with a 2x2x2 matrix, in order to support those metrics above. One clarifying alternative would be to make a separate function for multiclass input (which needs to be binarised) from that for multilabel input. I'd be okay calling this … |
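To make the sufficient-statistics point concrete, a short sketch (example data is illustrative) of how set-wise metrics fall out of the per-class 2x2 matrices:

import numpy as np
from sklearn.metrics import multilabel_confusion_matrix

y_true = [0, 1, 2, 2, 1]
y_pred = [0, 2, 2, 1, 1]

# Shape (n_classes, 2, 2); MCM[i] = [[tn, fp], [fn, tp]] for class i
# under one-vs-rest binarisation.
MCM = multilabel_confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = MCM[:, 0, 0], MCM[:, 0, 1], MCM[:, 1, 0], MCM[:, 1, 1]

precision = tp / (tp + fp)    # per-class precision
recall = tp / (tp + fn)       # per-class recall (sensitivity)
specificity = tn / (tn + fp)  # per-class specificity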
(1) Regarding the name: I'm fine with both … So the remaining things, @jnothman … |
And @jnothman, another annoying thing :) Do you think it's acceptable? |
I think that is consistent, and don't have a problem with it.
>>> multilabel_confusion_matrix([0, 0], [0, 0])
array([[[0, 0],
        [0, 2]]])
>>> multilabel_confusion_matrix([0, 0], [0, 0], labels=[0, 1])
array([[[0, 0],
        [0, 2]],

       [[2, 0],
        [0, 0]]]) |
I'm happy to rename to … |
I see, yes it's reasonable but a bit tricky. I think both names are fine and will follow your decision (binarized_confusion_matrix seems more straightforward, but multilabel_confusion_matrix seems to be consistent with R's utiml). I still want to know your opinion about the 4 reviews above (#11179 (review)), especially the second and the fourth one :) |
@TomDLT what do you think of the name …? |
Any further opinions on the name …? |
FYI the test is failing (apologies, I don't have time to investigate now). |
Both names are fine. I would be slightly in favor of … |
I think we can merge. Thanks all for the great work! |
And thanks @ShangwuYao for making this happen! |
This reverts commit b2b191f.
Reference Issues/PRs
Starts adding metrics for #5516; continues and closes #10628.
Fixes #3452.
What does this implement/fix? Explain your changes.