Incorrect calculation from sklearn.metrics.f1_score? #10812

Closed
Scoodood opened this issue Mar 14, 2018 · 4 comments · Fixed by #13151
Comments

@Scoodood

Description

The equation for the F1 score is shown here. I think the F1 score calculation from sklearn.metrics.f1_score is incorrect for the following case. This website also validates my calculation.

TruePositive, TP = 0
TrueNegative, TN = 10
FalsePositive, FP = 0
FalseNegative, FN = 0
Precision = TP / (TP + FP) = 0 / 0 = NaN (division by zero)
Recall = TP / (TP + FN) = 0 / 0 = NaN (division by zero)
F1-Score = 2 * Precision * Recall / (Precision + Recall) = NaN

But sklearn.metrics.f1_score gives an output of 0, which is incorrect.

Steps/Code to Reproduce

import numpy as np
import sklearn.metrics as skm

actual = np.zeros(10)
pred = np.zeros(10)

tn, fp, fn, tp = skm.confusion_matrix(actual, pred, labels=[0, 1]).ravel()
f1 = skm.f1_score(actual, pred)
print('TP=', tp)  # 0
print('TN=', tn)  # 10
print('FP=', fp)  # 0
print('FN=', fn)  # 0
print('F1=', f1)  # 0.0

Expected Results

f1_score should be NaN

Actual Results

But the f1_score calculated from sklearn.metrics.f1_score is 0.0

@jnothman
Member

NaN is not a score. For users blindly sorting scores and taking the last value, it is a thorn in the side. We have instead chosen to raise a warning and return 0. In what situation would NaN be more practically useful?
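
For reference, a minimal sketch of that behaviour, assuming scikit-learn 0.22 or later, where the zero_division parameter introduced by the fix linked above is available:

import warnings

import numpy as np
import sklearn.metrics as skm
from sklearn.exceptions import UndefinedMetricWarning

actual = np.zeros(10)
pred = np.zeros(10)

# Default behaviour: an UndefinedMetricWarning is emitted and 0.0 is returned.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    f1 = skm.f1_score(actual, pred)

print(f1)  # 0.0
print(any(issubclass(w.category, UndefinedMetricWarning) for w in caught))  # True

# The fallback value is configurable via zero_division.
print(skm.f1_score(actual, pred, zero_division=1))  # 1.0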

@jnothman
Member

In particular, with macro-averaging, it would be very unhelpful to return NaN if a system failed to ever predict a rare class.
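
As a sketch of that point, using a made-up three-class example (not from the original report) in which the rare class 2 is never predicted:

import numpy as np
import sklearn.metrics as skm

# Toy three-class problem; class 2 is rare and never predicted.
y_true = np.array([0, 0, 0, 1, 1, 1, 2])
y_pred = np.array([0, 0, 0, 1, 1, 1, 1])

per_class = skm.f1_score(y_true, y_pred, labels=[0, 1, 2], average=None)
print(per_class)  # [1.0, 0.857..., 0.0]; class 2 scores 0 (with a warning)

# Macro-F1 remains a finite number that penalises the missed class,
print(skm.f1_score(y_true, y_pred, labels=[0, 1, 2], average='macro'))  # ~0.62

# whereas a NaN for class 2 would turn the whole macro average into NaN.
print(np.mean([1.0, 0.857, np.nan]))  # nan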

@jnothman
Member

If you feel the documentation could be clearer, please offer a PR.

@e-pet

e-pet commented Mar 9, 2022

NaN is not a score. For users blindly sorting scores and taking the last value, it is a thorn in the side. We have instead chosen to raise a warning and return 0. In what situation would NaN be more practically useful?

A very late comment on this: I compute statistics of metrics across various groups and classifiers. For some of those groups and classifiers there may be no positive instances. Returning NaN would be very desirable for me, because otherwise the returned 0s (falsely) distort the computed statistics. Replacing all returned NaNs with 0 is easy if that is what the user wants (and it forces them to think about whether that actually makes sense, which is probably a good thing), but replacing all returned 0s with NaN may be wrong, since the value could genuinely be zero. So as far as I can tell, there is currently no easy way to get the behavior I actually need.

On a more general note, is it a good idea to support users in "blindly sorting scores" and sweeping actually occurring problems with those metrics under the rug?
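
For what it is worth, a sketch of one way to get that behaviour, assuming a scikit-learn version that accepts np.nan for zero_division (1.3 or later); the group names and data below are made up:

import numpy as np
import sklearn.metrics as skm

# Hypothetical per-group evaluation; group_b has no positive instances at all.
groups = {
    "group_a": (np.array([0, 1, 1, 0]), np.array([0, 1, 0, 0])),
    "group_b": (np.array([0, 0, 0, 0]), np.array([0, 0, 0, 0])),
}

scores = {
    name: skm.f1_score(y_true, y_pred, zero_division=np.nan)
    for name, (y_true, y_pred) in groups.items()
}
print(scores)  # group_b comes back as nan rather than a misleading 0.0

# Aggregate only over the groups where the metric is actually defined.
print(np.nanmean(list(scores.values())))  # ~0.667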
