ENH add zero_division in balanced_accuracy_score #28038
Conversation
Thanks @TengaCortal.
Here is a review. I think that the check of `C.sum(axis=1)` is unnecessary; we already catch this case by checking for `nan`.
The unit test should also not reimplement the same computation as the function (we could be wrong twice). It is better to check for the regression, and we also have the possibility to check that we are consistent with the averaged recall score, as sketched below.
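Something like the following (a minimal sketch, not the PR's actual test; the labels are made up for illustration) would express that consistency check:

import pytest
from sklearn.metrics import balanced_accuracy_score, recall_score

y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]

# balanced accuracy is defined as the macro average of per-class recall,
# so the two metrics should agree on a well-defined problem
assert balanced_accuracy_score(y_true, y_pred) == pytest.approx(
    recall_score(y_true, y_pred, average="macro")
)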
It looks good. Only a couple of nitpicks; we are only missing the `np.nan` case, for consistency.
sklearn/metrics/_classification.py (Outdated):

    zero_division : {"warn", 0, 1}, default="warn"
        Sets the value to return when there is a zero division. If set to "warn",
        a warning will be raised and 0 will be returned. If set to 0, the metric
        will be 0, and if set to 1, the metric will be 1.
Suggested change:

    will be 0, and if set to 1, the metric will be 1.

    .. versionadded:: 1.6
sklearn/metrics/_classification.py (Outdated):

    @@ -2419,6 +2426,11 @@ def balanced_accuracy_score(y_true, y_pred, *, sample_weight=None, adjusted=Fals
        performance would score 0, while keeping perfect performance at a score
        of 1.

        zero_division : {"warn", 0, 1}, default="warn"
zero_division : {"warn", 0, 1}, default="warn" | |
zero_division : {"warn", 0.0, 1.0, np.nan}, default="warn" |
To be consistent with the other metrics, we need to offer the `np.nan` option as well.
sklearn/metrics/_classification.py (Outdated):

    Sets the value to return when there is a zero division. If set to "warn",
    a warning will be raised and 0 will be returned. If set to 0, the metric
    will be 0, and if set to 1, the metric will be 1.
Suggested change:

    - Sets the value to return when there is a zero division. If set to "warn",
    - a warning will be raised and 0 will be returned. If set to 0, the metric
    - will be 0, and if set to 1, the metric will be 1.
    + Sets the value to return when there is a zero division.
    +
    + Notes:
    +
    + - If set to "warn", this acts like 0, but a warning is also raised.
    + - If set to `np.nan`, such values will be excluded from the average.

@lucyleeow This one is also of interest if you can have a look for a second review.
I pushed my changes directly to try to make it for 1.6.
Also @adrinjalali, you might want to have a look as well, to make it for 1.6.
@glemaitre WDYT?
sklearn/metrics/_classification.py (Outdated):

    zero_division : {"warn", 0.0, 1.0, np.nan}, default="warn"
        Sets the value to return when there is a zero division.

        Notes:

        - If set to "warn", this acts like 0, but a warning is also raised.
        - If set to `np.nan`, such values will be excluded from the average.

        .. versionadded:: 1.6
I think the docstring, as is, is vague and doesn't give enough information. For instance, this warning is now removed: `y_pred contains classes not in y_true`.
The diff here seems more complicated than the other instances of adding `zero_division`, and this parameter is doing more than simply replacing the nan from a division by zero.
With the actual behavior better explained, it would be clearer how to proceed.
We could improve the current warning to give more info. Both warnings (old and new) are correct: when `y_pred` contains classes not in `y_true`, tp + fn = 0 (the sum of true labels) for that class; this results in a zero division, and balanced accuracy is ill-defined in this case.
How about we improve the warning to give more info, e.g., that the score is ill-defined because `y_pred` contains classes not in `y_true` (resulting in a zero division when calculating recall)?
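To make the ill-defined case concrete (a hypothetical example, not taken from the PR), the per-class recalls show where the zero division comes from:

import numpy as np
from sklearn.metrics import recall_score

y_true = [0, 0, 1, 1]
y_pred = [0, 2, 1, 1]  # class 2 appears in y_pred but never in y_true

# class 2 has tp + fn == 0, so computing its recall divides by zero
print(recall_score(y_true, y_pred, labels=[0, 1, 2], average=None,
                   zero_division=np.nan))
# [0.5  1.  nan]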
I appreciate it is not similar to precision/recall/f1, where there is explicit averaging (an explicit `average` parameter), so it's not so obvious that averaging is happening. We do say above, "It is defined as the average of recall obtained on each class", but we could explain more explicitly in the `zero_division` parameter that the zero division occurs when calculating the recall for each class.
Similar to what we do in `cohen_kappa_score`:
scikit-learn/sklearn/metrics/_classification.py, lines 740 to 744 in 56bbb5a:

    zero_division : {"warn", 0.0, 1.0, np.nan}, default="warn"
        Sets the return value when there is a zero division. This is the case when
        both labelings `y1` and `y2` exclusively contain the 0 class (e.g.
        `[0, 0, 0, 0]`) (or if both are empty). If set to "warn", returns `0.0`, but
        a warning is also raised.
(though `cohen_kappa_score` is even more different and there is no averaging)
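For illustration, a minimal sketch of the case that docstring describes, assuming a scikit-learn version where `cohen_kappa_score` has the quoted `zero_division` parameter:

from sklearn.metrics import cohen_kappa_score

# both labelings exclusively contain the 0 class, so kappa is ill-defined:
# observed and expected agreement are both 1, making the denominator 1 - pe zero
print(cohen_kappa_score([0, 0, 0, 0], [0, 0, 0, 0], zero_division=0.0))  # 0.0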
I think I agree with @adrinjalali in that the warning and `zero_division` docstring could be improved upon, but otherwise this looks good to me! Thanks!
    assert balanced_accuracy == pytest.approx(expected_score)

    # check the consistency with the averaged recall score per-class
    with warnings.catch_warnings(record=True):
Why are we using `record=True` here?
Indeed, we don't need it because we don't use the output.
I pushed a commit to improve the documentation and acknowledge the definition of the balanced accuracy as the average of recalls, to make explicit what we mean by:
    - If set to `np.nan`, such values will be excluded from the average when
      computing the balanced accuracy as the average of the recalls.
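Concretely, under this semantic (a sketch reusing the hypothetical example above), the `nan` produced for a class absent from `y_true` is dropped from the macro average:

import numpy as np
from sklearn.metrics import recall_score

y_true = [0, 0, 1, 1]
y_pred = [0, 2, 1, 1]  # class 2 never occurs in y_true

# the nan recall of class 2 is excluded from the average:
# mean([0.5, 1.0]) == 0.75 rather than mean([0.5, 1.0, 0.0]) == 0.5
print(recall_score(y_true, y_pred, average="macro", zero_division=np.nan))  # 0.75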
I'm confused by the semantics of this now. To me, `zero_division=np.nan` would mean "if there's a zero division, give me `np.nan`", but this is kind of the opposite. `zero_division="ignore"` would more easily mean what we have written here.
Basically, we are consistent with the other metrics and with the previous definition:
recall_score(y_true, y_pred, average="macro", zero_division=np.nan)
I don't think that we should change this behaviour here.
What makes the semantics weird here is that the balanced accuracy has this average without any keyword. However, I would find it weird if the results did not match the definition of `recall_score`. So it would mean that if we don't like this semantic, we should change all `zero_division` parameters around.
I really feel like we're continuing down a wrong path here.
Reading the code
recall_score(y_true, y_pred, average="macro", zero_division=np.nan)
without reading the docstring, I really expect to get `np.nan` if there's any zero division in the calculations, not to have them ignored in an aggregate calculation.
I'd be more in favor of deprecating that usage of `zero_division=np.nan` in favor of `zero_division="ignore"` or similar.
This is the situation now:
`accuracy_score`, `classification_report`, `cohen_kappa_score`, `jaccard_score`, and `matthews_corrcoef` are all variations of this text:

    zero_division : {"warn", 0.0, 1.0, np.nan}, default="warn"
        Sets the value to return when there is a zero division,
        e.g. when `y_true` and `y_pred` are empty.
        If set to "warn", returns 0.0, but a warning is also raised.

        .. versionadded:: 1.6
However, `classification_report` calls `precision_recall_fscore_support`, which, among the other metrics (`f1_score`, `fbeta_score`, `precision_recall_fscore_support`, `precision_score`, `recall_score`), has:

    zero_division : {"warn", 0.0, 1.0, np.nan}, default="warn"
        Sets the value to return when there is a zero division, i.e. when all
        predictions and labels are negative.

        Notes:

        - If set to "warn", this acts like 0, but a warning is also raised.
        - If set to `np.nan`, such values will be excluded from the average.

        .. versionadded:: 1.3
           `np.nan` option was added.
I do think we need to change the status quo for the second batch, and make the code written by the user intuitive rather than very surprising.
I agree that there is something wrong here.
I don't like the semantics of `zero_division="ignore"`, because the exclusion happens at the aggregation level.
However, I'm wondering if we should not return only an aggregation that takes the `np.nan` into account. Practically speaking, returning an average that does not take a certain class into account is actually dangerous. So I need to go back, because I don't remember why this behaviour was chosen.
I'm more and more convinced that we need both `np.nan` and `"ignore"`. `np.nan` would be taken into account in the final aggregation if there is one; `"ignore"` would mean that the result from the zero division is ignored in the aggregation stage. My question now is: if we have the `"ignore"` option, what is the behaviour when we don't aggregate?
Basically, what do we expect from the following example?

    recall_score([[0, 0], [0, 0]], [[1, 1], [1, 1]], average=None, zero_division="ignore")
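For comparison (a hypothetical run; the `"ignore"` option does not exist at this point), the existing `np.nan` option in this multilabel case returns the per-label values, with no aggregation to exclude them from:

import numpy as np
from sklearn.metrics import recall_score

# neither column of y_true contains a positive label, so both
# per-label recalls are ill-defined
print(recall_score([[0, 0], [0, 0]], [[1, 1], [1, 1]], average=None,
                   zero_division=np.nan))
# [nan nan]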
Oh yeah, I wasn't saying we should remove `np.nan` from all of them. In my comment here (#28038 (comment)), to me the semantics of the first batch of metrics (except `classification_report`) are okay as is. We only need to change the second batch.
"np.nan will be taken into account in the final aggregation if there is one"
Can you clarify what this means?
Clarification here: #29048 (comment)
LGTM!
Co-authored-by: Lucy Liu <[email protected]>
Moving this to 1.7 since we will need to revisit it.
Reference Issues/PRs
Fixes #26892
What does this implement/fix? Explain your changes.
This addresses an inconsistency in the `balanced_accuracy_score` function, where the calculated balanced accuracy was not equal to the macro-averaged recall score. The issue was traced to the absence of zero-division handling, which resulted in unexpected discrepancies. To rectify this, the implementation was modified to handle zero division appropriately, so that the balanced accuracy is consistent with the macro-averaged recall score. The test suite was updated to reflect the corrected behavior and to check the metric in various scenarios.
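As a hypothetical illustration of the discrepancy (the printed values assume the pre-PR behaviour, where `balanced_accuracy_score` silently drops the ill-defined class via `nanmean` while `recall_score` counts it as 0 by default):

import numpy as np
from sklearn.metrics import balanced_accuracy_score, recall_score

y_true = [0, 0, 1, 1]
y_pred = [0, 2, 1, 1]  # class 2 never occurs in y_true

print(balanced_accuracy_score(y_true, y_pred))        # 0.75 (nan class dropped, with a warning)
print(recall_score(y_true, y_pred, average="macro"))  # 0.5 (nan treated as 0, with a warning)
print(recall_score(y_true, y_pred, average="macro",
                   zero_division=np.nan))             # 0.75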
Any other comments?