[MRG+2] ENH multiclass balanced accuracy #10587
Conversation
Includes computationally simpler implementation and logically simpler description.
Ahh... passing tests.
@@ -1357,6 +1357,8 @@ functions or non-estimator constructors.
    equal weight by giving each sample a weight inversely related
    to its class's prevalence in the training data:
    ``n_samples / (n_classes * np.bincount(y))``.
    **Note** however that this rebalancing does not take the weight of
    samples in each class into account.
Perhaps we should have a "weight-balanced" option for `class_weight`. It would be interesting to see if that improved imbalanced boosting.
Apparently my phone wrote "weight-loss card" (!) there. Amended.
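For reference, the count-based `class_weight='balanced'` heuristic quoted in the diff above can be reproduced in a few lines (a minimal sketch; the toy labels are mine):

```python
import numpy as np

# The class_weight='balanced' heuristic quoted above: per-class weights
# inversely related to class prevalence, ignoring per-sample weights.
y = np.array([0, 0, 0, 1])
n_classes = len(np.unique(y))
class_weights = len(y) / (n_classes * np.bincount(y))
print(class_weights)  # [0.6667, 2.0] -- each class ends up with total weight 2
```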
.. math::

   \texttt{balanced-accuracy}(y, \hat{y}) = \frac{1}{2} \left(\frac{\sum_i 1(\hat{y}_i = 1 \land y_i = 1)}{\sum_i 1(y_i = 1)} + \frac{\sum_i 1(\hat{y}_i = 0 \land y_i = 0)}{\sum_i 1(y_i = 0)}\right)

   \hat{w}_i = \frac{w_i}{\sum_j{1(y_j = y_i) w_j}}
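Read literally, the general formula says: renormalize each sample's weight by the total weight of its class, then take the weighted accuracy. A minimal sketch transcribing that definition (the function name is mine, not the PR's code):

```python
import numpy as np

def balanced_accuracy_by_definition(y_true, y_pred, sample_weight=None):
    # w_hat_i = w_i / sum_j 1(y_j == y_i) w_j, then weighted accuracy.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    w = (np.ones(len(y_true)) if sample_weight is None
         else np.asarray(sample_weight, dtype=float))
    w_hat = np.empty_like(w)
    for c in np.unique(y_true):
        mask = y_true == c
        w_hat[mask] = w[mask] / w[mask].sum()  # each class sums to 1
    # sum(w_hat) == n_classes, so this is the mean per-class weighted recall
    return ((y_pred == y_true) * w_hat).sum() / w_hat.sum()
```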
Should I give the equation assuming w_i=1?
I think it's fine if we keep the general formula.
sklearn/metrics/classification.py
Outdated
sensitivity (true positive rate) and specificity (true negative rate),
or the average recall obtained on either class. It is also equal to the
ROC AUC score given binary inputs.
The balanced accuracy in binary and muitclass classification problems to
typo multiclass
While I'm interested in your critique of the docs and implementation, @maskani-moh, I'd mostly like you to verify that this interpretation of balanced accuracy, as accuracy with sample weights assigned to give equal total weight to each class, makes the choice of a multiclass generalisation clear.
sklearn/metrics/classification.py
Outdated
    minlength=n_classes)
if sample_weight is None:
    sample_weight = 1
sample_weight = class_weight.take(encoded_y_true) * sample_weight
What is the reason to apply `sample_weight` a second time? I thought it was already taken into account when computing the `class_weight`. Which paper should I check for references?
I don't think weighted balanced accuracy is reported anywhere, but:
- the PR's implementation matches the incumbent
- the implementation matches the invariance tests for weighting in metrics.tests.test_common that are really pretty good, if I must say so myself as the architect...

Generally when we have a value for `class_weight` as well as `sample_weight`, we weight the samples per the `class_weight` (i.e. `class_weight.take(y)`), and then we multiply by each sample's weight. Exactly what's happening here. However, currently our handling of `class_weight='balanced'` counts the number, not the total weight, of samples in each class, then assigns each sample the reciprocal as its weight. I initially used that in this implementation, and was not surprised to find that it failed the tests: repetition of samples was no longer equivalent to integer weights. So here we use the total weight (not the cardinality) in determining the class weight, reciprocate that, but still multiply in each sample's own weight so that we can correctly calculate the weighted confusion matrix.
Which makes me think: we should be able to implement this even more simply from the confusion matrix... I'll play with that another time soon.
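A sketch of the weight-based "balanced" scheme just described, using total weight per class rather than the count (`balanced_sample_weight` is a hypothetical helper, not library API):

```python
import numpy as np

def balanced_sample_weight(y, sample_weight=None):
    # Rescale so every class carries equal total weight, while each
    # sample keeps its weight relative to the others in its class.
    y = np.asarray(y)
    sw = (np.ones(len(y)) if sample_weight is None
          else np.asarray(sample_weight, dtype=float))
    classes, encoded = np.unique(y, return_inverse=True)
    # total weight (not cardinality) per class, then its reciprocal
    class_totals = np.bincount(encoded, weights=sw, minlength=len(classes))
    class_weight = 1.0 / class_totals
    # weight samples per the class weight, then by their own weight
    return class_weight.take(encoded) * sw
```

With this scheme, repeating a sample is equivalent to giving it an integer weight, which is the invariance the common tests check.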
The implementation with the confusion matrix seems really straightforward. It looks like an average of the TPR per class. The generalization from binary to multiclass looks good to me. I don't see a case where it would not be correct.
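A sketch of that confusion-matrix route (average of per-class recalls read off a weighted confusion matrix; this mirrors the idea discussed here, not necessarily the final merged code):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def balanced_accuracy_from_cm(y_true, y_pred, sample_weight=None,
                              adjusted=False):
    C = confusion_matrix(y_true, y_pred, sample_weight=sample_weight)
    with np.errstate(divide="ignore", invalid="ignore"):
        per_class = np.diag(C) / C.sum(axis=1)   # recall of each class
    per_class = per_class[~np.isnan(per_class)]  # drop classes absent from y_true
    score = per_class.mean()
    if adjusted:
        chance = 1 / len(per_class)  # score of any constant predictor
        score = (score - chance) / (1 - chance)
    return score
```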
doc/modules/model_evaluation.rst
Outdated
In contrast, if the conventional accuracy is above chance only because the
classifier takes advantage of an imbalanced test set, then the balanced
accuracy, as appropriate, will drop to 1/(number of classes).
Could we use a math environment?
:math:`\frac{1}{# classes}`
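To illustrate the drop described above, using the `balanced_accuracy_score` this PR adds: a majority-class predictor on a 90/10 test set looks strong under plain accuracy but falls to chance under balanced accuracy.

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score

y_true = np.array([0] * 90 + [1] * 10)  # imbalanced: 90% class 0
y_pred = np.zeros_like(y_true)          # always predict the majority class
print(accuracy_score(y_true, y_pred))           # 0.9 -- flattering
print(balanced_accuracy_score(y_true, y_pred))  # 0.5 == 1 / n_classes
```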
doc/modules/model_evaluation.rst
Outdated
accuracy, as appropriate, will drop to 1/(number of classes).

The score ranges from 0 to 1, or when ``adjusted=True`` is used, it is rescaled
to the range [1 / (1 - number of classes), 1] with performance at random being
I also find the range difficult to read in the doc. I would go for a math environment.
sklearn/metrics/classification.py
Outdated
adjusted : bool, default=False
    When true, the result is adjusted for chance, so that random
    performance would score 0, and perfect performance scores 1.

Returns
-------
balanced_accuracy : float
We might change the sensitivity/specificity explanation.
Good catch
doc/modules/model_evaluation.rst
Outdated
With ``adjusted=True``, balanced accuracy reports the relative increase from
:math:`\texttt{balanced-accuracy}(y, \mathbf{0}, w) =
\frac{1}{\text{n classes}}`. In the binary case, this is also known as
*Youden's J statistic*, or *informedness*.
maybe link to https://en.wikipedia.org/wiki/Youden%27s_J_statistic
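For the record, the binary equivalence is a one-line rescaling: with chance level :math:`\frac{1}{2}`,

.. math::

   \frac{\texttt{balanced-accuracy} - \frac{1}{2}}{1 - \frac{1}{2}} = \mathrm{TPR} + \mathrm{TNR} - 1,

which is exactly Youden's :math:`J` (informedness).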
LGTM. @maskani-moh Could you have a look and tell us WYT?
This should be quick to review if someone (other than @glemaitre who has given his +1) is keen to throw it into 0.20.
LGTM at a glance. I need (and promise) to double check the code and refs tomorrow.
Some small comments, feel free to ignore if you think current version is fine.
My LGTM on the PR is based on the fact that the function is there. Honestly, I don't like the idea of including such a function, which can simply be implemented using recall.
Tagging 0.20.
doc/modules/model_evaluation.rst
Outdated
In contrast, if the conventional accuracy is above chance only because the
classifier takes advantage of an imbalanced test set, then the balanced
accuracy, as appropriate, will drop to :math:`\frac{1}{\text{n\_classes}}`.
`\text{n\_classes}` -> `\text{n_classes}`? Or maybe some other way to get rid of the extra `\` here. Same comment for similar places below.
doc/modules/model_evaluation.rst
Outdated
The score ranges from 0 to 1, or when ``adjusted=True`` is used, it is rescaled to
the range :math:`\frac{1}{1 - \text{n\_classes}}` to 1, inclusive, with
performance at random scoring 0.
Seems strange. "Rescaled to the range A to B, with performance at random scoring 0". But 0 is actually not in [A, B]? I'd prefer a clearer explanation of the scaling strategy we use when `adjusted=True`.
Sorry. I realized that I'm wrong here.
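(Spelling out the arithmetic that resolves this: the test further down pins the mapping as `adjusted = (balanced - chance) / (1 - chance)` with `chance = 1 / n_classes`, so a raw score of 0 maps to :math:`\frac{1}{1 - \text{n\_classes}}` (e.g. -1 in the binary case), a raw score of :math:`\frac{1}{\text{n\_classes}}`, i.e. random performance, maps to 0, and 1 maps to 1. So 0 does lie inside the adjusted range.)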
doc/modules/model_evaluation.rst
Outdated
have a score of :math:`0` while perfect predictions have a score of :math:`1`.
One can compute the macro-average recall using ``recall_score(average="macro")`` in :func:`recall_score`.
* Our definition: [Mosley2013]_, [Kelleher2015]_ and [Guyon2015]_, where
  [Guyon2015]_ adopt the adjusted version to score chance as 0.
What is "score chance as 0"?
sklearn/metrics/classification.py
Outdated
ROC AUC score given binary inputs.
The balanced accuracy in binary and multiclass classification problems to
deal with imbalanced datasets. It is defined as the average of recall
obtained on each class.

The best value is 1 and the worst value is 0.
No longer the case when `adjusted=True`?
assert balanced == pytest.approx(macro_recall)
adjusted = balanced_accuracy_score(y_true, y_pred, adjusted=True)
chance = balanced_accuracy_score(y_true, np.full_like(y_true, y_true[0]))
assert adjusted == (balanced - chance) / (1 - chance)
Any reason we can't use `==` when `adjusted=False`?
doc/modules/model_evaluation.rst
Outdated
0.625
>>> roc_auc_score(y_true, y_pred)
0.625

.. math::

   \texttt{balanced-accuracy}(y, \hat{y}, w) = \frac{1}{\sum{\hat{w}_i}} \sum_i 1(\hat{y}_i == y_i) \hat{w}_i
Is it common to use `==` in the indicator function?
LGTM apart from the comments above.
doc/whats_new/v0.20.rst
Outdated
:issue:`8066` by :user:`xyguo` and :user:`Aman Dalmia <dalmia>`, and
:issue:`10587` by `Joel Nothman`_.

- Added :class:`multioutput.RegressorChain` for multi-target
This entry should be removed.
> Honestly, I don't like the idea of including such a function, which can be simply implemented by recall.

The adjusted metric can't just be implemented by recall. But really, we've had years of people asking for balanced accuracy, and not realising that they could implement it with recall....
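Spelling out that equivalence with a toy example (the 0.625 value matches the doctest quoted earlier):

```python
from sklearn.metrics import balanced_accuracy_score, recall_score

y_true = [0, 1, 0, 0, 1, 0]
y_pred = [0, 1, 0, 0, 0, 1]
print(balanced_accuracy_score(y_true, y_pred))        # 0.625
print(recall_score(y_true, y_pred, average="macro"))  # 0.625, identical
# adjusted=True is the part recall_score does not offer:
print(balanced_accuracy_score(y_true, y_pred, adjusted=True))  # 0.25
```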
I don't have time to fix these up right away...
@jnothman Do you mind if I push some cosmetic changes and merge this one?
I don't mind if you're confident about them.
LGTM, thanks @jnothman
Removing those backslashes broke CircleCI on master.
See also #10040. Ping @maskani-moh, @amueller.