It seems that when `sample_weight` is specified in the `precision_recall_curve` function, every score in `y_score` is kept in the output thresholds, even when the corresponding sample weight is zero.
```python
from sklearn.metrics import precision_recall_curve

y_true1 = [0, 0, 1, 1, 1]
y_score1 = [0, 1, 0, 1, 2]
weight1 = [1, 1, 1, 2, 0]

y_true2 = [0, 0, 1, 1]
y_score2 = [0, 1, 0, 1]
weight2 = [1, 1, 1, 2]

precision1, recall1, thre1 = precision_recall_curve(y_true1, y_score1, sample_weight=weight1)
print('precisions1', precision1)
print('recalls1   ', recall1)
print('thresholds1', thre1)

precision2, recall2, thre2 = precision_recall_curve(y_true2, y_score2, sample_weight=weight2)
print('precisions2', precision2)
print('recalls2   ', recall2)
print('thresholds2', thre2)
```
Take the code above as an example. The weight assigned to score 2 is 0, so I would expect the precision-recall curves computed from input 1 and input 2 to be identical, but the actual results are:
```
precisions1 [0.6        0.66666667 0.         1.        ]
recalls1    [1.         0.66666667 0.         0.        ]
thresholds1 [0 1 2]
precisions2 [0.6        0.66666667 1.        ]
recalls2    [1.         0.66666667 0.        ]
thresholds2 [0 1]
```
The precision-recall curve obtained from input 1 looks very strange because of its sudden drop to 0.
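As a workaround on the caller's side (not a fix in scikit-learn itself), zero-weight samples can be filtered out before calling `precision_recall_curve`, so their scores cannot appear as thresholds. This is a sketch using the input-1 data from the snippet above; the masking step is my own addition, not part of the reported code.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Inputs from the report above: the last sample (score 2) has weight 0.
y_true1 = np.array([0, 0, 1, 1, 1])
y_score1 = np.array([0, 1, 0, 1, 2])
weight1 = np.array([1, 1, 1, 2, 0])

# Workaround: drop zero-weight samples up front. After masking, the
# arrays are exactly the input-2 data, so the curves should match.
mask = weight1 > 0
precision, recall, thresholds = precision_recall_curve(
    y_true1[mask], y_score1[mask], sample_weight=weight1[mask]
)

print("precision ", precision)
print("recall    ", recall)
print("thresholds", thresholds)  # score 2 no longer appears as a threshold
```

With the zero-weight sample removed, the output matches the input-2 result, confirming that the extra threshold (and the strange drop to 0) comes from the zero-weight score being retained.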