Incorrect calculation from sklearn.metrics.f1_score? #10812

Closed
Scoodood opened this issue Mar 14, 2018 · 4 comments · Fixed by #13151
Comments

@Scoodood

Description

The equation for the F1 score is shown here. I think the F1 score calculation from sklearn.metrics.f1_score is incorrect for the following case. This website also validates my calculation.

TruePositive, TP = 0
TrueNegative, TN = 10
FalsePositive, FP = 0
FalseNegative, FN = 0
Precision = TP / (TP + FP) = 0 / 0 = NaN (division by zero)
Recall = TP / (TP + FN) = 0 / 0 = NaN (division by zero)
F1-Score = 2 * Precision * Recall / (Precision + Recall) = NaN

But sklearn.metrics.f1_score gives an output of 0, which is incorrect.

Steps/Code to Reproduce

import numpy as np
import sklearn.metrics as skm

actual = np.zeros(10)
pred = np.zeros(10)

tn, fp, fn, tp = skm.confusion_matrix(actual, pred, labels=[0, 1]).ravel()
f1 = skm.f1_score(actual, pred)
print('TP=', tp)  # 0
print('TN=', tn)  # 10
print('FP=', fp)  # 0
print('FN=', fn)  # 0
print('F1=', f1)  # 0.0

Expected Results

f1_score should be NaN

Actual Results

But the f1_score calculated from sklearn.metrics.f1_score is 0.0

@jnothman
Member

NaN is not a score. For users blindly sorting scores and taking the last value, it is a thorn in the side. We have instead chosen to raise a warning and return 0. In what situation would NaN be more practically useful?
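
For reference, a minimal sketch of that behaviour, assuming scikit-learn 0.22 or later, where the zero_division parameter introduced by the fix linked above is available:

import warnings

import numpy as np
import sklearn.metrics as skm
from sklearn.exceptions import UndefinedMetricWarning

actual = np.zeros(10)
pred = np.zeros(10)

# Default behaviour: an UndefinedMetricWarning is emitted and 0.0 is returned.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    f1 = skm.f1_score(actual, pred)

print(f1)  # 0.0
print(any(issubclass(w.category, UndefinedMetricWarning) for w in caught))  # True

# The fallback value is configurable via zero_division.
print(skm.f1_score(actual, pred, zero_division=1))  # 1.0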

@jnothman
Member

In particular, with macro-averaging, it would be very unhelpful to return NaN if a system failed to ever predict a rare class.
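
As a sketch of that point, using a made-up three-class example (not from the original report) in which the rare class 2 is never predicted:

import numpy as np
import sklearn.metrics as skm

# Toy three-class problem; class 2 is rare and never predicted.
y_true = np.array([0, 0, 0, 1, 1, 1, 2])
y_pred = np.array([0, 0, 0, 1, 1, 1, 1])

per_class = skm.f1_score(y_true, y_pred, labels=[0, 1, 2], average=None)
print(per_class)  # [1.0, 0.857..., 0.0]; class 2 scores 0 (with a warning)

# Macro-F1 remains a finite number that penalises the missed class,
print(skm.f1_score(y_true, y_pred, labels=[0, 1, 2], average='macro'))  # ~0.62

# whereas a NaN for class 2 would turn the whole macro average into NaN.
print(np.mean([1.0, 0.857, np.nan]))  # nan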

@jnothman
Member

If you feel the documentation could be clearer, please offer a PR.

@e-pet

e-pet commented Mar 9, 2022

NaN is not a score. For users blindly sorting scores and taking the last value, it is a thorn in the side. We have instead chosen to raise a warning and return 0. In what situation would NaN be more practically useful?

A very late comment on this: I compute statistics of metrics across various groups and classifiers. For some of those groups and classifiers there may be no positive instances. Returning NaN would be very desirable for me, because otherwise the returned 0s (falsely) distort the computed statistics. Replacing all returned NaNs with 0 is easy if that is what the user wants (and it forces them to think about whether that actually makes sense, which is probably a good thing), but replacing all returned 0s with NaN may be wrong, since the value could genuinely be zero. So as far as I can tell, there is currently no easy way to get the behavior I actually need.

On a more general note, is it a good idea to support users in "blindly sorting scores" and sweeping actually occurring problems with those metrics under the rug?
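
For what it is worth, a sketch of one way to get that behaviour, assuming a scikit-learn version that accepts np.nan for zero_division (1.3 or later); the group names and data below are made up:

import numpy as np
import sklearn.metrics as skm

# Hypothetical per-group evaluation; group_b has no positive instances at all.
groups = {
    "group_a": (np.array([0, 1, 1, 0]), np.array([0, 1, 0, 0])),
    "group_b": (np.array([0, 0, 0, 0]), np.array([0, 0, 0, 0])),
}

scores = {
    name: skm.f1_score(y_true, y_pred, zero_division=np.nan)
    for name, (y_true, y_pred) in groups.items()
}
print(scores)  # group_b comes back as nan rather than a misleading 0.0

# Aggregate only over the groups where the metric is actually defined.
print(np.nanmean(list(scores.values())))  # ~0.667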
