
[MRG+2] ENH multiclass balanced accuracy #10587


Merged: 17 commits, Jul 27, 2018

Conversation

@jnothman (Member) commented Feb 5, 2018

Includes computationally simpler implementation and logically simpler description.

See also #10040. Ping @maskani-moh, @amueller.

@jnothman (Member Author) commented Feb 5, 2018

Ahh... passing tests.

@@ -1357,6 +1357,8 @@ functions or non-estimator constructors.
equal weight by giving each sample a weight inversely related
to its class's prevalence in the training data:
``n_samples / (n_classes * np.bincount(y))``.
**Note** however that this rebalancing does not take the weight of
samples in each class into account.
@jnothman (Member Author) Feb 5, 2018

Perhaps we should have a "weight-balanced" option for class_weight. It would be interesting to see if that improved imbalanced boosting.

@jnothman (Member Author)

Apparently my phone wrote "weight-loss card" (!) there. Amended.


.. math::

\texttt{balanced-accuracy}(y, \hat{y}) = \frac{1}{2} \left(\frac{\sum_i 1(\hat{y}_i = 1 \land y_i = 1)}{\sum_i 1(y_i = 1)} + \frac{\sum_i 1(\hat{y}_i = 0 \land y_i = 0)}{\sum_i 1(y_i = 0)}\right)
\hat{w}_i = \frac{w_i}{\sum_j{1(y_j = y_i) w_j}}
@jnothman (Member Author)

Should I give the equation assuming w_i=1?

Contributor

I think it's fine if we leave the general formula.
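
For concreteness, the general (weighted) formula quoted above could be sketched in NumPy as follows (a hypothetical helper for illustration, not the PR's actual code):

```python
import numpy as np

def weighted_balanced_accuracy(y_true, y_pred, sample_weight=None):
    """Sketch of the general formula: each sample's weight is normalised by
    the total weight of its class (the \\hat{w}_i above), then weighted
    accuracy is computed with those normalised weights."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    w = (np.ones(len(y_true)) if sample_weight is None
         else np.asarray(sample_weight, dtype=float))
    # Encode classes as 0..n-1 so weights can be summed per class.
    _, encoded = np.unique(y_true, return_inverse=True)
    class_totals = np.bincount(encoded, weights=w)
    w_hat = w / class_totals[encoded]
    return np.sum((y_pred == y_true) * w_hat) / np.sum(w_hat)

# Class 0 recall = 1, class 1 recall = 1/2, so balanced accuracy = 0.75.
print(weighted_balanced_accuracy([0, 1, 1], [0, 1, 0]))  # 0.75
```

With all weights equal this reduces to the unweighted (w_i = 1) case discussed above.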

sensitivity (true positive rate) and specificity (true negative rate),
or the average recall obtained on either class. It is also equal to the
ROC AUC score given binary inputs.
The balanced accuracy in binary and muitclass classification problems to
Contributor

typo multiclass

@jnothman (Member Author) commented Feb 6, 2018

While I'm interested in your critique of the docs and implementation, @maskani-moh, I'd mostly like you to verify that this interpretation of balanced accuracy, as accuracy with sample weights assigned to give equal total weight to each class, makes the choice of a multiclass generalisation clear.

minlength=n_classes)
if sample_weight is None:
sample_weight = 1
sample_weight = class_weight.take(encoded_y_true) * sample_weight
Member

What is the reason to apply sample_weight a second time? I thought it was already taken into account when computing the class_weight. Which paper should I check for references?

@jnothman (Member Author)

I don't think weighted balanced accuracy is reported anywhere, but:

  • the PR's implementation matches the incumbent
  • the implementation matches the invariance tests for weighting in metrics.tests.test_common, which are really pretty good, if I must say so myself as the architect...

Generally when we have a value for class_weight as well as sample_weight, we weight the samples per the class_weight (i.e. class_weight.take(y)), and then we multiply by each sample's weight. Exactly what's happening here. However, currently our handling of class_weight='balanced' counts the number not the total weight of samples in each class, then assigns each the reciprocal as each sample's weight. I initially used that in this implementation, and was not surprised to find that it failed the tests: repetition of samples was no longer equivalent to integer weights. So here we use the total weight (not the cardinality) in determining the class weight, reciprocate that, but still assign each sample its weight so that we can correctly calculate the weighted confusion matrix.

Which makes me think: we should be able to implement this even more simply from the confusion matrix... I'll play with that another time soon.
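
For illustration, that confusion-matrix route might look like this (a sketch, not the PR's actual code): balanced accuracy is the mean of per-class recalls, i.e. the diagonal of the confusion matrix divided by its row sums.

```python
import numpy as np

def balanced_accuracy_from_cm(y_true, y_pred):
    # Build a plain (unweighted) confusion matrix C[true, pred].
    classes = np.unique(np.concatenate([y_true, y_pred]))
    idx = {c: i for i, c in enumerate(classes)}
    C = np.zeros((len(classes), len(classes)))
    for t, p in zip(y_true, y_pred):
        C[idx[t], idx[p]] += 1
    # Per-class recall = diagonal / row sums; skip classes with no true
    # samples (zero support) to avoid dividing by zero.
    support = C.sum(axis=1)
    recalls = np.diag(C)[support > 0] / support[support > 0]
    return recalls.mean()

# Recalls: class 0 -> 1, class 1 -> 1, class 2 -> 1/2; mean = 5/6.
print(balanced_accuracy_from_cm([0, 1, 2, 2], [0, 1, 2, 0]))
```

A weighted variant would accumulate sample weights instead of counts into C.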

@glemaitre (Member)

The implementation with the confusion matrix seems really straightforward. It looks like an average of the TPR per class. The generalization from binary to multi-class looks good to me. I don't see a case where it would not be correct.


In contrast, if the conventional accuracy is above chance only because the
classifier takes advantage of an imbalanced test set, then the balanced
accuracy, as appropriate, will drop to 1/(number of classes).
Member

Could we use a math environment?

:math:`\frac{1}{# classes}`

accuracy, as appropriate, will drop to 1/(number of classes).

The score ranges from 0 to 1, or when ``adjusted=True`` is used, it is rescaled
to the range [1 / (1 - number of classes), 1] with performance at random being
Member

I also find the range difficult to read in the doc. I would go for an math environment.

adjusted : bool, default=False
When true, the result is adjusted for chance, so that random
performance would score 0, and perfect performance scores 1.

Returns
-------
balanced_accuracy : float.
Member

We might change the sensitivity/specificity explanation.

@jnothman (Member Author)

Good catch

With ``adjusted=True``, balanced accuracy reports the relative increase from
:math:`\texttt{balanced-accuracy}(y, \mathbf{0}, w) =
\frac{1}{\text{n classes}}`. In the binary case, this is also known as
*Youden's J statistic*, or *informedness*.
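
The rescaling behind ``adjusted=True`` can be written down directly (a sketch; the chance level for balanced accuracy is 1 / n_classes):

```python
def adjust_for_chance(balanced, n_classes):
    # Rescale so that chance-level performance scores 0 and perfect
    # performance scores 1; the worst score becomes 1 / (1 - n_classes).
    chance = 1.0 / n_classes
    return (balanced - chance) / (1.0 - chance)

print(adjust_for_chance(0.5, 2))  # 0.0: binary chance level maps to 0
print(adjust_for_chance(0.0, 3))  # -0.5, i.e. 1 / (1 - 3)
```

In the binary case this rescaled score coincides with Youden's J statistic.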

@glemaitre glemaitre changed the title [MRG] ENH multiclass balanced accuracy [MRG+1] ENH multiclass balanced accuracy Feb 13, 2018
@glemaitre (Member)

LGTM. @maskani-moh Could you have a look and tell us WYT?

@jnothman (Member Author)

This should be quick to review if someone (other than @glemaitre who has given his +1) is keen to throw it into 0.20.

@qinhanmin2014 (Member) left a comment

LGTM at a glance. I need (and promise) to double check the code and refs tomorrow.
Some small comments; feel free to ignore them if you think the current version is fine.
My LGTM on the PR is based on the fact that the function is there. Honestly, I don't like the idea of including such a function, which can simply be implemented using recall.
Tagging 0.20.


In contrast, if the conventional accuracy is above chance only because the
classifier takes advantage of an imbalanced test set, then the balanced
accuracy, as appropriate, will drop to :math:`\frac{1}{\text{n\_classes}}`.
Member

\text{n\_classes} -> \text{n_classes}? Or maybe some other way to get rid of the extra \ here.
Same comment for similar places below.


The score ranges from 0 to 1, or when ``adjusted=True`` is used, it is rescaled to
the range :math:`\frac{1}{1 - \text{n\_classes}}` to 1, inclusive, with
performance at random scoring 0.
Member

Seems strange. "Rescaled to the range A to B, with performance at random scoring 0". But 0 is actually not in [A, B]?
I'd prefer a clearer explanation for the scaling strategy we use when adjusted=True.

Member

Sorry. I realized that I'm wrong here.

have a score of :math:`0` while perfect predictions have a score of :math:`1`.
One can compute the macro-average recall using ``recall_score(average="macro")`` in :func:`recall_score`.
* Our definition: [Mosley2013]_, [Kelleher2015]_ and [Guyon2015]_, where
[Guyon2015]_ adopt the adjusted version to score chance as 0.
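
Assuming scikit-learn >= 0.20 (where this PR landed), the equivalence with macro-averaged recall noted above can be checked directly:

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, recall_score

y_true = np.array([0, 1, 2, 0, 1, 2])
y_pred = np.array([0, 2, 1, 0, 0, 1])

# With no sample weights, balanced accuracy equals macro-averaged recall.
bal = balanced_accuracy_score(y_true, y_pred)
macro = recall_score(y_true, y_pred, average="macro")
print(bal, macro)  # the two values agree
```

Sample weights are where the two diverge: ``balanced_accuracy_score`` folds them into the per-class weighting, while ``recall_score`` does not re-normalise per class in the same way.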
Member

What is "score chance as 0"?

ROC AUC score given binary inputs.
The balanced accuracy in binary and multiclass classification problems to
deal with imbalanced datasets. It is defined as the average of recall
obtained on each class.

The best value is 1 and the worst value is 0.
Member

No longer the case when adjusted=True?

@qinhanmin2014 qinhanmin2014 added this to the 0.20 milestone Jul 26, 2018
assert balanced == pytest.approx(macro_recall)
adjusted = balanced_accuracy_score(y_true, y_pred, adjusted=True)
chance = balanced_accuracy_score(y_true, np.full_like(y_true, y_true[0]))
assert adjusted == (balanced - chance) / (1 - chance)
Member

Any reason we can't use == when adjusted=False?

0.625
>>> roc_auc_score(y_true, y_pred)
0.625
\texttt{balanced-accuracy}(y, \hat{y}, w) = \frac{1}{\sum{\hat{w}_i}} \sum_i 1(\hat{y}_i == y_i) \hat{w}_i
Member

Is it common to use == in the indicator function?

@qinhanmin2014 (Member) left a comment

LGTM apart from the comments above.

:issue:`8066` by :user:`xyguo` and :user:`Aman Dalmia <dalmia>`, and
:issue:`10587` by `Joel Nothman`_.

- Added :class:`multioutput.RegressorChain` for multi-target
Member

This entry should be removed.

@qinhanmin2014 qinhanmin2014 changed the title [MRG+1] ENH multiclass balanced accuracy [MRG+2] ENH multiclass balanced accuracy Jul 27, 2018
@jnothman (Member Author) commented Jul 27, 2018 via email


@qinhanmin2014 (Member)

@jnothman Do you mind if I push some cosmetic changes and merge this one?

@jnothman (Member Author) commented Jul 27, 2018 via email

@qinhanmin2014 (Member) left a comment

LGTM, thanks @jnothman

@qinhanmin2014 qinhanmin2014 merged commit e888c0d into scikit-learn:master Jul 27, 2018
@jnothman (Member Author)

Removing those backslashes broke CircleCI on master.
