[MRG] FEA multilabel confusion matrix #11179


Merged

Conversation

ShangwuYao
Contributor

@ShangwuYao ShangwuYao commented May 31, 2018

Reference Issues/PRs

Start adding metrics for #5516, continue on and close #10628

Fixes #3452

What does this implement/fix? Explain your changes.

  • implement multilabel_confusion_matrix and fix up edge cases that fail tests
  • benchmark multiclass implementation against incumbent P/R/F/S
  • benchmark multilabel implementation with benchmarks/bench_multilabel_metrics.py extended to consider non-micro averaging, sample_weight and perhaps other cases
  • optimize speed based on line-profiling
  • directly test multilabel_confusion_matrix
  • document under model_evaluation.rst
  • document how to calculate fall-out, miss-rate, sensitivity, specificity from multilabel_confusion_matrix (a sketch follows this list)
  • refactor jaccard similarity implementation once [MRG] average parameter for jaccard_similarity_score #10083 is merged
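
A sketch for the fall-out/miss-rate/sensitivity/specificity documentation item above (this assumes the (n_classes, 2, 2) output layout where each 2x2 block is [[tn, fp], [fn, tp]]; the example data and variable names are illustrative):

import numpy as np
from sklearn.metrics import multilabel_confusion_matrix

y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 1], [0, 0, 1]])

# One 2x2 confusion matrix per label: [[tn, fp], [fn, tp]]
mcm = multilabel_confusion_matrix(y_true, y_pred)
tn, fp = mcm[:, 0, 0], mcm[:, 0, 1]
fn, tp = mcm[:, 1, 0], mcm[:, 1, 1]

sensitivity = tp / (tp + fn)  # recall / true positive rate
specificity = tn / (tn + fp)  # true negative rate
fall_out = fp / (fp + tn)     # false positive rate
miss_rate = fn / (fn + tp)    # false negative rate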

@ShangwuYao ShangwuYao changed the title from "Multilabel confusion append" to "Continue on multilabel confusion matrix" on May 31, 2018
@jnothman
Member

jnothman commented Jun 2, 2018

Nb: In #5516 I preferred adding scorers, but not metric functions, for things like fallout.

What are benchmark results looking like?

@jnothman
Member

jnothman commented Jun 2, 2018

Also, please prefix your PR title with [WIP] until tests and features are complete.

@ShangwuYao ShangwuYao changed the title from "Continue on multilabel confusion matrix" to "[WIP] Continue on multilabel confusion matrix" on Jun 2, 2018
@sklearn-lgtm

This pull request introduces 1 alert when merging cdd619d into a31a906 - view on LGTM.com

new alerts:

  • 1 for Wrong name for an argument in a call

Comment posted by LGTM.com

@sklearn-lgtm

This pull request introduces 1 alert when merging ec82be3 into a31a906 - view on LGTM.com

new alerts:

  • 1 for Wrong name for an argument in a call

Comment posted by LGTM.com

@ShangwuYao
Contributor Author

ShangwuYao commented Jun 9, 2018

In the multiclass case, the benchmark of precision_recall_fscore_support_with_multilabel_confusion_matrix is much slower than the original implementation (10 times slower in some cases) because of its use of confusion_matrix, as shown in #10628.

I have optimized the speed for the multilabel-indicator case (replacing bincount with point-wise multiplication and removing an unnecessary, expensive call to _check_targets). For that case, precision_recall_fscore_support_with_multilabel_confusion_matrix performs very close to the original implementation (slightly faster).

precision_recall_fscore_support_with_multilabel_confusion_matrix exists only for debugging and optimization purposes and will be removed; this implementation passes the tests for precision_recall_fscore_support.
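
For reference, the multilabel-indicator fast path described above can be sketched roughly as follows (a hedged sketch, not the exact code in this PR; the helper name and sample_weight handling are illustrative):

import numpy as np

def per_label_counts(y_true, y_pred, sample_weight=None):
    # y_true, y_pred: binary indicator arrays of shape (n_samples, n_labels).
    if sample_weight is None:
        sample_weight = np.ones(y_true.shape[0])
    w = np.asarray(sample_weight, dtype=float)[:, np.newaxis]
    tp = (y_true * y_pred * w).sum(axis=0)        # predicted 1, truly 1
    fp = ((1 - y_true) * y_pred * w).sum(axis=0)  # predicted 1, truly 0
    fn = (y_true * (1 - y_pred) * w).sum(axis=0)  # predicted 0, truly 1
    tn = w.sum() - tp - fp - fn                   # everything else
    return tp, fp, fn, tn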

@jnothman
Member

jnothman commented Jun 9, 2018 via email

@ShangwuYao
Contributor Author

OK, I will ping you when I've finished.
I hope to contribute to something critical; if you find anything suitable, I would love to give it a try.

@jnothman
Member

jnothman commented Jun 10, 2018 via email

@ShangwuYao
Contributor Author

That issue looks interesting; I am looking into it.
I think the work on multilabel_confusion_matrix is finished. Could you review it when you have the time, @jnothman? Thanks!

Member

@jnothman jnothman left a comment

Unless I'm mistaken, you're no longer using this in precision_recall_fscore_support due to poor benchmarks in the multilabel case. I consider its use in that function to be a key goal.

Is there a faster way to count true positives, false positives and false negatives for each class without using confusion_matrix?

Also, could you please change your benchmark plots to show comparable curves with the same colour but different markers on them depending on whether they are using mlcm or not?
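
For concreteness, the requested plot style could look roughly like this (a sketch with made-up timing data; only the shared-colour / distinct-marker convention matters):

import numpy as np
import matplotlib.pyplot as plt

sizes = np.array([1000, 5000, 10000])  # hypothetical benchmark sizes
incumbent = {"micro": [0.01, 0.05, 0.10], "macro": [0.02, 0.08, 0.15]}
with_mlcm = {"micro": [0.011, 0.052, 0.11], "macro": [0.021, 0.079, 0.16]}

for colour, avg in zip(["C0", "C1"], ["micro", "macro"]):
    # Same colour for the same averaging, different markers per implementation.
    plt.plot(sizes, incumbent[avg], color=colour, marker="o", label=avg + " (incumbent)")
    plt.plot(sizes, with_mlcm[avg], color=colour, marker="x", label=avg + " (mlcm)")
plt.xlabel("n_samples")
plt.ylabel("time (s)")
plt.legend()
plt.show()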

Thanks a lot!

Btw, that issue I pointed you to is a long-term wish, not critical for the next release.

labels=None, samplewise=False):
"""Returns a confusion matrix for each output of a multilabel problem

Multiclass tasks will be treated as if binarised under a one-vs-rest
Member

Perhaps say (i.e. where y is 1d)

raise ValueError("Samplewise confusion is not useful outside of "
"multilabel classification.")
present_labels = unique_labels(y_true, y_pred)
C = confusion_matrix(y_true, y_pred, sample_weight=sample_weight,
Member

Yes, we should try to do this with LabelEncoder and bincount, or with confusion_matrix without its validation.
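
A rough sketch of that LabelEncoder + bincount counting for 1d multiclass input (the helper name and signature are illustrative; this is close to how the incumbent precision_recall_fscore_support counts per-class statistics):

import numpy as np
from sklearn.preprocessing import LabelEncoder

def per_class_counts(y_true, y_pred, sample_weight=None):
    # Encode labels to 0..n_classes-1 so bincount can be used directly.
    le = LabelEncoder().fit(np.concatenate([y_true, y_pred]))
    t, p = le.transform(y_true), le.transform(y_pred)
    n = len(le.classes_)
    sw = None if sample_weight is None else np.asarray(sample_weight, dtype=float)

    correct = t == p
    tp = np.bincount(t[correct], minlength=n,
                     weights=None if sw is None else sw[correct])
    true_sum = np.bincount(t, weights=sw, minlength=n)
    pred_sum = np.bincount(p, weights=sw, minlength=n)
    fn = true_sum - tp  # truly this class, predicted as another
    fp = pred_sum - tp  # predicted as this class, truly another
    return tp, fp, fn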

Contributor Author

@ShangwuYao ShangwuYao Jun 15, 2018

I tried using LabelEncoder and bincount as you suggested, but it performs pretty much the same as the original implementation, so there is no speedup in doing that.

Interestingly, though, in the multiclass and binary cases of multilabel_confusion_matrix, replacing the call to confusion_matrix with the LabelEncoder and bincount implementation makes multilabel_confusion_matrix faster than confusion_matrix.

In [7]: y_true = np.random.randint(0, 2, (300,))

In [8]: y_pred = np.random.randint(0, 2, (300,))

In [9]: %timeit multilabel_confusion_matrix(y_true, y_pred)
308 µs ± 3.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [10]: %timeit confusion_matrix(y_true, y_pred)
488 µs ± 5.72 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

So there should be room for improvement here; I will let you know when I dig deeper.

@eamanu eamanu mentioned this pull request Oct 6, 2018
@jnothman
Member

jnothman commented Oct 7, 2018

Do you have some references for users?

I invented the name. The function is mostly there to help us implement, and to facilitate the implementation of, arbitrary metrics based on set-wise/binary metrics: Precision, Recall, F1, Fβ, Jaccard, Specificity, etc. The point is that it calculates sufficient statistics for these metrics, just as confusion_matrix calculates sufficient statistics for multiclass metrics (ignoring sample_weight perhaps), and contingency_matrix calculates sufficient statistics for clustering metrics.

Because the need for it is mostly internal, and it would mostly be used through the metric aggregates, it is unlikely to be referenced in the literature as such.

It makes sense, however, that utiml similarly implements something like this, because from it they can derive all the metrics they need for multilabel evaluation. Their "absolute matrix" is not identical to our confusion_matrix. It is merely the sum over axis 0 of the output of multilabel_confusion_matrix.
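
For illustration, that relationship is just (made-up indicator data):

import numpy as np
from sklearn.metrics import multilabel_confusion_matrix

y_true = np.array([[1, 0, 1], [0, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 1]])

mcm = multilabel_confusion_matrix(y_true, y_pred)  # shape (n_labels, 2, 2)
pooled = mcm.sum(axis=0)  # counts pooled over labels, like utiml's "absolute matrix"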

The binary case (at least with 1d input) needs to be handled with a 2x2x2 matrix, in order to support those metrics above. One clarifying alternative would be to make a separate function for multiclass input (which needs to be binarised) from that for multilabel input.

I'd be okay calling this binarized_confusion_matrix, in correspondence with LabelBinarizer (but without any affinity in meaning to Binarizer!).

@qinhanmin2014
Member

(1) Regarding the name: I'm fine with both multilabel_confusion_matrix and binarized_confusion_matrix, so I'll follow your final decision. Personally, I prefer binarized_confusion_matrix, because if I put binarized_confusion_matrix and confusion_matrix together, I can figure out some of the relationships and differences between them.
(2) Regarding the R package, apologies, I didn't read the doc carefully. I don't think we need to care about their Absolute Matrix and Proportional Matrix.
(3) Regarding the binary case, I'm fine with the current implementation.

So the remaining things, @jnothman:
(1) your final decision about the name
(2) the 4 minor review comments above

@qinhanmin2014
Member

And @jnothman another annoying thing :) Do you think it's acceptable?

multilabel_confusion_matrix([0, 0], [0, 0])
array([[[0, 0],
        [0, 2]]], dtype=int64)

@jnothman
Member

Do you think it's acceptable?

I think that is consistent, and don't have a problem with it.

>>> multilabel_confusion_matrix([0, 0], [0, 0])
array([[[0, 0],
        [0, 2]]])
>>> multilabel_confusion_matrix([0, 0], [0, 0], labels=[0, 1])
array([[[0, 0],
        [0, 2]],

       [[2, 0],
        [0, 0]]])

@jnothman
Member

I'm happy to rename to binarized_confusion_matrix, as long as you reckon that makes sense for the case that the input is already multiple labels.

@qinhanmin2014
Member

I think that is consistent, and don't have a problem with it.

I see, yes it's reasonable but a bit tricky.

I'm happy to rename to binarized_confusion_matrix

I think both names are fine and will follow your decision. (binarized_confusion_matrix seems more straightforward but multilabel_confusion_matrix seems to be consistent with R utiml).

I still want to know your opinion about the 4 review comments above (#11179 (review)), especially the second and the fourth ones :)

@jnothman
Member

@TomDLT what do you think of the name binarized_confusion_matrix vs multilabel_confusion_matrix vs other??

@jnothman
Member

Any further opinions on the name binarized_confusion_matrix vs multilabel_confusion_matrix vs other for a function which returns a 2x2 confusion matrix for each class in multilabel or multiclass data?

@qinhanmin2014
Member

FYI, a test is failing (apologies, I don't have time to investigate now):

=================================== FAILURES ===================================
___________________ test_multilabel_confusion_matrix_errors ____________________
    def test_multilabel_confusion_matrix_errors():
        y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
        y_pred = np.array([[1, 0, 0], [0, 1, 1], [0, 0, 1]])
    
        # Bad sample_weight
        assert_raise_message(ValueError, "inconsistent numbers of samples",
                             multilabel_confusion_matrix,
                             y_true, y_pred, sample_weight=[1, 2])
        assert_raise_message(ValueError, "could not be broadcast",
                             multilabel_confusion_matrix,
                             y_true, y_pred,
                             sample_weight=[[1, 2, 3],
                                            [2, 3, 4],
>                                           [3, 4, 5]])
y_pred     = array([[1, 0, 0],
       [0, 1, 1],
       [0, 0, 1]])
y_true     = array([[1, 0, 1],
       [0, 1, 0],
       [1, 1, 0]])
/home/travis/build/scikit-learn/scikit-learn/sklearn/metrics/tests/test_classification.py:484: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
exceptions = <type 'exceptions.ValueError'>, message = 'could not be broadcast'
function = <function multilabel_confusion_matrix at 0x7f3042ff0c80>
args = (array([[1, 0, 1],
       [0, 1, 0],
       [1, 1, 0]]), array([[1, 0, 0],
       [0, 1, 1],
       [0, 0, 1]]))
kwargs = {'sample_weight': [[1, 2, 3], [2, 3, 4], [3, 4, 5]]}
e = ValueError('a.shape[axis] != len(repeats)',)
error_message = 'a.shape[axis] != len(repeats)'
    def assert_raise_message(exceptions, message, function, *args, **kwargs):
        """Helper function to test the message raised in an exception.
    
        Given an exception, a callable to raise the exception, and
        a message string, tests that the correct exception is raised and
        that the message is a substring of the error thrown. Used to test
        that the specific message thrown during an exception is correct.
    
        Parameters
        ----------
        exceptions : exception or tuple of exception
            An Exception object.
    
        message : str
            The error message or a substring of the error message.
    
        function : callable
            Callable object to raise error.
    
        *args : the positional arguments to `function`.
    
        **kwargs : the keyword arguments to `function`.
        """
        try:
            function(*args, **kwargs)
        except exceptions as e:
            error_message = str(e)
            if message not in error_message:
                raise AssertionError("Error message does not include the expected"
                                     " string: %r. Observed error message: %r" %
>                                    (message, error_message))
E               AssertionError: Error message does not include the expected string: 'could not be broadcast'. Observed error message: 'a.shape[axis] != len(repeats)'
args       = (array([[1, 0, 1],
       [0, 1, 0],
       [1, 1, 0]]), array([[1, 0, 0],
       [0, 1, 1],
       [0, 0, 1]]))
e          = ValueError('a.shape[axis] != len(repeats)',)
error_message = 'a.shape[axis] != len(repeats)'
exceptions = <type 'exceptions.ValueError'>
function   = <function multilabel_confusion_matrix at 0x7f3042ff0c80>
kwargs     = {'sample_weight': [[1, 2, 3], [2, 3, 4], [3, 4, 5]]}
message    = 'could not be broadcast'

Member

@qinhanmin2014 qinhanmin2014 left a comment

@jnothman Do we need to wait for @TomDLT 's opinion about the name? If not, I guess we can merge.

@TomDLT
Member

TomDLT commented Oct 30, 2018

  • multilabel_confusion_matrix makes it clear on which problem you can use the function.
  • binarized_confusion_matrix makes it clear what is actually computed.

Both names are fine. I would be slightly in favor of multilabel_confusion_matrix, since this may help users find the function. What is actually computed can be found in the docstring: "Multiclass data will be treated as if binarized under a one-vs-rest transformation."

@qinhanmin2014
Member

I think we can merge. Thanks all for the great work!

@qinhanmin2014 qinhanmin2014 merged commit 6555631 into scikit-learn:master Oct 30, 2018
@jnothman
Member

And thanks @ShangwuYao for making this happen!
