ENH add zero_division in balanced_accuracy_score #28038


Open

wants to merge 30 commits into main

Conversation

TengaCortal

Reference Issues/PRs

Fixes #26892

What does this implement/fix? Explain your changes.

This addresses an inconsistency in the balanced_accuracy_score function, where the calculated balanced accuracy was not equal to the macro-average recall score. The issue was traced to missing zero-division handling, which resulted in unexpected discrepancies. The implementation was modified so that zero division is handled explicitly and the balanced accuracy is consistent with the macro-average recall score. The test suite was updated to reflect the corrected behavior and to check the metric in various scenarios.
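For illustration, a minimal sketch of the reported discrepancy (the pre-fix behavior described in #26892; the data is made up):

    import warnings
    from sklearn.metrics import balanced_accuracy_score, recall_score

    y_true = [0, 0, 1, 1]
    y_pred = [0, 0, 1, 2]  # class 2 never appears in y_true

    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # silences "y_pred contains classes not in y_true"
        # pre-fix: the ill-defined recall of class 2 is silently dropped
        print(balanced_accuracy_score(y_true, y_pred))  # 0.75 = mean(1.0, 0.5)

    # macro-average recall counts the ill-defined recall as 0
    print(recall_score(y_true, y_pred, average="macro", zero_division=0))  # 0.5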

Any other comments?


github-actions bot commented Dec 31, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 54dd010.

@glemaitre self-requested a review January 11, 2024 21:27
@glemaitre changed the title from "Fix: Ensure Consistency Between Balanced Accuracy and Macro-Average Recall" to "ENH add zero_division in balanced_accuracy_score" Jan 16, 2024
@glemaitre (Member) left a comment

Thanks @TengaCortal.

Here is a review. I think that the check of `C.sum(axis=1)` is unnecessary; we already catch this case by checking for NaN.

The unit test should also not reimplement the same computation as the function (we could be wrong twice). It is better to check for the regression, and we also have the possibility to check that we are consistent with the averaged recall score.
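A minimal sketch of such a consistency test, assuming the zero_division parameter this PR adds (the test name and data are hypothetical):

    import numpy as np
    import pytest
    from sklearn.metrics import balanced_accuracy_score, recall_score

    def test_balanced_accuracy_consistent_with_macro_recall():
        y_true = [0, 0, 1, 1]
        y_pred = [0, 0, 1, 2]  # class 2 never appears in y_true -> zero division
        for zero_division in [0.0, 1.0, np.nan]:
            balanced = balanced_accuracy_score(
                y_true, y_pred, zero_division=zero_division
            )
            macro_recall = recall_score(
                y_true, y_pred, average="macro", zero_division=zero_division
            )
            assert balanced == pytest.approx(macro_recall, nan_ok=True)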

@glemaitre self-requested a review May 18, 2024 13:03
@glemaitre (Member) left a comment

It looks good. Only a couple of nitpicks for consistency. We are only missing the np.nan case, to be consistent with the other metrics.

    zero_division : {"warn", 0, 1}, default="warn"
        Sets the value to return when there is a zero division. If set to "warn",
        a warning will be raised and 0 will be returned. If set to 0, the metric
        will be 0, and if set to 1, the metric will be 1.

Member

Suggested change:

      will be 0, and if set to 1, the metric will be 1.
    +
    + .. versionadded:: 1.6

@@ -2419,6 +2426,11 @@ def balanced_accuracy_score(y_true, y_pred, *, sample_weight=None, adjusted=False):

        performance would score 0, while keeping perfect performance at a score
        of 1.

        zero_division : {"warn", 0, 1}, default="warn"

Member

Suggested change:

    - zero_division : {"warn", 0, 1}, default="warn"
    + zero_division : {"warn", 0.0, 1.0, np.nan}, default="warn"

Member

To be consistent with other metrics, we need to offer the np.nan option as well.

Comment on lines 2430 to 2433
        Sets the value to return when there is a zero division. If set to "warn",
        a warning will be raised and 0 will be returned. If set to 0, the metric
        will be 0, and if set to 1, the metric will be 1.

Member

Suggested change:

    - Sets the value to return when there is a zero division. If set to "warn",
    -     a warning will be raised and 0 will be returned. If set to 0, the metric
    -     will be 0, and if set to 1, the metric will be 1.
    + Sets the value to return when there is a zero division.
    +
    + Notes:
    + - If set to "warn", this acts like 0, but a warning is also raised.
    + - If set to `np.nan`, such values will be excluded from the average.

@glemaitre added this to the 1.6 milestone May 19, 2024
@glemaitre self-requested a review October 29, 2024 17:08
@glemaitre (Member)

@lucyleeow This one is also of interest if you can have a look for a second review.

@glemaitre (Member) left a comment

I pushed my changes directly to try to make it into 1.6.

@glemaitre (Member)

Also, @adrinjalali, you might have a look as well so this makes it into 1.6.

@adrinjalali (Member) left a comment

@glemaitre WDYT?

Comment on lines 2530 to 2538
    zero_division : {"warn", 0.0, 1.0, np.nan}, default="warn"
        Sets the value to return when there is a zero division.

        Notes:

        - If set to "warn", this acts like 0, but a warning is also raised.
        - If set to `np.nan`, such values will be excluded from the average.

        .. versionadded:: 1.6

Member

I think the docstring, as it is, is vague and doesn't give enough information.

For instance, this warning is now removed: "y_pred contains classes not in y_true".

The diff here seems more complicated than the other instances of adding zero_division, and this parameter is doing more than simply replacing a NaN from a division by zero.

With the actual behavior better explained, it would be clearer how to proceed.

Member

We could improve the current warning to give more info; both warnings (old and new) are correct: when y_pred contains classes not in y_true, tp + fn = 0 (the sum of true labels) for that class; this results in a zero division, and balanced accuracy is ill-defined in this case.

How about we improve the warning to give more info, e.g., that the score is ill-defined because y_pred contains classes not in y_true (resulting in a zero division when calculating recall)?

I appreciate it is not similar to precision/recall/F1, where there is explicit averaging/an explicit average parameter, so it's not so obvious that averaging is happening. We do say above: "It is defined as the average of recall obtained on each class", but we could more explicitly explain in the zero_division parameter that the zero division occurs when calculating the recall for each class (see the sketch after this comment).

Similar to what we do in cohen_kappa_score:

    zero_division : {"warn", 0.0, 1.0, np.nan}, default="warn"
        Sets the return value when there is a zero division. This is the case when
        both labelings `y1` and `y2` exclusively contain the 0 class (e.g.
        `[0, 0, 0, 0]`) or when both are empty. If set to "warn", returns `0.0`,
        but a warning is also raised.

(though cohen_kappa_score is even more different and there is no averaging)
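A small sketch of the zero-division case described above (illustrative data, not part of the PR): when y_pred contains a class absent from y_true, that class's recall is 0/0 and takes the zero_division value.

    import numpy as np
    from sklearn.metrics import recall_score

    y_true = [0, 0, 1, 1]
    y_pred = [0, 0, 1, 2]  # for class 2, tp + fn == 0, so its recall is 0/0

    # per-class recalls; the ill-defined entry takes the zero_division value
    print(recall_score(y_true, y_pred, average=None, zero_division=np.nan))
    # [1.  0.5  nan]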

@lucyleeow (Member) left a comment

I think I agree with @adrinjalali, in that the warning and zero_division docstring could be improved upon, but otherwise this looks good to me! Thanks!

    assert balanced_accuracy == pytest.approx(expected_score)

    # check the consistency with the averaged recall score per-class
    with warnings.catch_warnings(record=True):

Member

Why are we using record=True here?

Member

Indeed, we don't need to, because we don't use the recorded output.

@glemaitre self-requested a review November 5, 2024 16:46
@glemaitre (Member)

I pushed a commit to improve the documentation and acknowledge the definition of the balanced accuracy as the average of recalls, to make explicit what we mean by "average".
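For reference, a minimal sketch of that definition using public APIs (the data is illustrative):

    import numpy as np
    from sklearn.metrics import balanced_accuracy_score, confusion_matrix

    y_true = [0, 0, 1, 1, 1]
    y_pred = [0, 1, 1, 1, 0]

    C = confusion_matrix(y_true, y_pred)
    per_class_recall = np.diag(C) / C.sum(axis=1)  # recall of each true class
    # balanced accuracy is the average of the per-class recalls
    assert np.isclose(per_class_recall.mean(), balanced_accuracy_score(y_true, y_pred))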

Comment on lines +2539 to +2540
    - If set to `np.nan`, such values will be excluded from the average when
      computing the balanced accuracy as the average of the recalls.

Member

I'm confused by the semantics of this now. To me, zero_division=np.nan would mean "if there's a zero division, give me np.nan", but this is kind of the opposite. zero_division="ignore" would more easily mean what we have written here.

Member

Basically, we are consistent with the other metrics and with the previous definition:

    recall_score(y_true, y_pred, average="macro", zero_division=np.nan)

I don't think that we should change this behaviour here.

What makes the semantics weird here is that the balanced accuracy has this average without any keyword. However, I would find it weird if the result did not match the definition of recall_score. So if we don't like these semantics, we should change all the zero_division parameters around.
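Concretely, a sketch of the recall_score semantics being matched (illustrative data): with zero_division=np.nan, the ill-defined per-class recall is excluded from the macro average rather than propagated as NaN.

    import numpy as np
    from sklearn.metrics import recall_score

    y_true = [0, 0, 1, 1]
    y_pred = [0, 0, 1, 2]  # the recall of class 2 is 0/0

    print(recall_score(y_true, y_pred, average="macro", zero_division=np.nan))
    # 0.75, i.e. mean(1.0, 0.5): the NaN is excluded, the result is not NaN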

Member

I really feel like we're continuing down a wrong path here.

Reading the code

    recall_score(y_true, y_pred, average="macro", zero_division=np.nan)

without reading the docstring, I really expect to get np.nan if there's any zero division in the calculations, not to have them ignored in an aggregate calculation.

I'd be more in favor of deprecating that usage of zero_division=np.nan in favor of zero_division="ignore" or similar.

This is the situation now:

accuracy_score, classification_report, cohen_kappa_score, jaccard_score, matthews_corrcoef are all variations of this text:

    zero_division : {"warn", 0.0, 1.0, np.nan}, default="warn"
        Sets the value to return when there is a zero division,
        e.g. when `y_true` and `y_pred` are empty.
        If set to "warn", returns 0.0, but a warning is also raised.

        .. versionadded:: 1.6

However, classification_report calls precision_recall_fscore_support, which among other metrics has:

f1_score, fbeta_score, precision_recall_fscore_support, precision_score, recall_score:

    zero_division : {"warn", 0.0, 1.0, np.nan}, default="warn"
        Sets the value to return when there is a zero division, i.e. when all
        predictions and labels are negative.

        Notes:
        - If set to "warn", this acts like 0, but a warning is also raised.
        - If set to `np.nan`, such values will be excluded from the average.

        .. versionadded:: 1.3
           `np.nan` option was added.

I do think we need to change the status quo for the second batch, and make the code written by the user intuitive rather than very surprising.

Member

I agree that there is something wrong here.

I don't like the semantics of zero_division="ignore" because the exclusion happens at the aggregation level.

However, I'm wondering if we should not only return an aggregation that takes the np.nan into account. Practically speaking, returning an average that does not take a certain class into account is actually dangerous. So I need to go back, because I don't remember why this behaviour was chosen.

Member

I am more and more convinced that we need both np.nan and "ignore": np.nan would be taken into account in the final aggregation if there is one, while "ignore" would mean that the result of the zero division is ignored at the aggregation stage. My question now is: if we have the "ignore" option, what is the behaviour when we don't aggregate? (A toy sketch of the two semantics follows below.)
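A toy numpy sketch of the two proposed semantics (hypothetical, not current scikit-learn behavior): np.nan would propagate through the aggregate, while "ignore" would drop the ill-defined entries before averaging.

    import numpy as np

    per_class = np.array([1.0, 0.5, np.nan])  # NaN marks the zero-division class

    # zero_division=np.nan (proposed): the NaN propagates into the aggregate
    print(per_class.mean())  # nan

    # zero_division="ignore" (proposed): ill-defined entries are dropped
    print(np.nanmean(per_class))  # 0.75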

Member

Basically, what do we expect from the following example?

    recall_score([[0, 0], [0, 0]], [[1, 1], [1, 1]], average=None, zero_division="ignore")

Member

Oh yeah, I wasn't saying we should remove np.nan from all of them. As in my comment here (#28038 (comment)), to me the semantics of the first batch of the metrics (except classification_report) are okay as is. We only need to change the second batch.

Member

"np.nan will be taken into account in the final aggregation if there is one"

Can you clarify what this means?

Member

Clarification here: #29048 (comment)

@lucyleeow (Member) left a comment

LGTM!

@glemaitre (Member)

Moving this to 1.7 since we will need to revisit it.

Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

Balanced Accuracy Score is NOT equal to Recall Score
4 participants