FIX show only accuracy when having a subset of labels in classification report #28399
Conversation
We also need a non-regression test to check that we have the expected behaviour, and an entry in the changelog to acknowledge the bug fix.
@glemaitre Updated as suggested.
glemaitre
left a comment
Here are some comments.
doc/whats_new/v1.5.rst
Outdated
:class:`~metrics.PrecisionRecallDisplay`, :class:`~metrics.DetCurveDisplay`,
:class:`~calibration.CalibrationDisplay`.
:pr:`28051` by :user:`Pierre de Fréminville <pidefrem>`.
- |Fix| :class:`metrics.classification_report` now shows only accuracy and not micro-average when input is a subset of labels.
This needs to be moved to 1.4, under the 1.4.2 changelog. We will include it in the next bug fix release.
You also need to make sure that your entry is less than 88 characters per line in the changelog.
def test_classification_report_input_subset_of_labels():
    y_true, y_pred = [0, 1], [0, 1]
I think that we want a slightly different test instead: we want to check whether different `labels` inputs show the "accuracy" entry or not. So we could parametrize the test by trying labels=([0, 1, 2], [0, 1], [0]).
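A minimal sketch of what such a parametrized test could look like (the test name and the expected flags below are illustrative assumptions, not the final merged test):

import pytest

from sklearn.metrics import classification_report


@pytest.mark.parametrize(
    "labels, expect_micro_avg",
    [
        ([0, 1, 2], False),  # superset of observed classes -> "accuracy"
        ([0, 1], False),  # exactly the observed classes -> "accuracy"
        ([0], True),  # strict subset -> "micro avg"
    ],
)
def test_classification_report_labels_subset(labels, expect_micro_avg):
    # y_true/y_pred only ever contain the classes {0, 1}.
    y_true, y_pred = [0, 1], [0, 1]
    report = classification_report(
        y_true, y_pred, labels=labels, output_dict=True, zero_division=0.0
    )
    if expect_micro_avg:
        assert "micro avg" in report and "accuracy" not in report
    else:
        assert "accuracy" in report and "micro avg" not in report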
glemaitre
left a comment
I simplified the test to just check for the presence of the key, since that is the reported issue, and I moved the entry to the right changelog as requested.
I moved the what's new entry to target 1.4.2.
LGTM. Thanks @vjoshi253
Reference Issues/PRs
This PR fixes issue #27927.
What does this implement/fix? Explain your changes.
There is an inconsistency between how the micro-average is computed in the code and what the documentation states.
The documentation says that the micro average is only shown for multi-label data, or multi-class data with a subset of classes, because it corresponds to accuracy otherwise.
But the code shows the micro-average even for superset cases.
The code fix is quite trivial and makes the code and documentation consistent.
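To make the intended condition concrete, here is a small illustrative helper (a hypothetical sketch, not the actual patched code) capturing when the micro average collapses to accuracy:

from sklearn.utils.multiclass import unique_labels


def micro_average_is_accuracy(y_true, y_pred, labels=None):
    """Hypothetical helper mirroring the condition behind the fix.

    The micro average collapses to plain accuracy whenever the requested
    labels cover every class observed in y_true/y_pred (no labels given,
    an equal set, or a superset). Only a strict subset of the observed
    classes warrants a genuine "micro avg" row in the report.
    """
    observed = set(unique_labels(y_true, y_pred))
    return labels is None or set(labels) >= observed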
Any other comments?
The author of the issue shared some reproduction steps and the expected output. This fix produces the expected output.
print(classification_report([0, 1], [1, 0], labels=[0, 1, 2], zero_division=0.0))

Output:
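The report below is reconstructed from the inputs (every prediction is wrong, so all scores are 0.0; the exact spacing may differ); the key point is that an accuracy row appears instead of micro avg:

              precision    recall  f1-score   support

           0       0.00      0.00      0.00         1
           1       0.00      0.00      0.00         1
           2       0.00      0.00      0.00         0

    accuracy                           0.00         2
   macro avg       0.00      0.00      0.00         2
weighted avg       0.00      0.00      0.00         2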