Description
The function ndcg_score from sklearn.metrics returns values outside the documented [0, 1] range when some of the true relevance scores are negative.
Steps/Code to Reproduce
import numpy as np
from sklearn.metrics import ndcg_score

y_true = np.array([-0.89, -0.53, -0.47, 0.39, 0.56]).reshape(1, -1)
y_score = np.array([0.07, 0.31, 0.75, 0.33, 0.27]).reshape(1, -1)

ndcg_score(y_true, y_score)  # Should be between 0 and 1 as per the docstring
# Output: 396.0329594603174

ndcg_score(y_true + 1, y_score + 1)
# Output: 0.7996030755957273
Expected Results
The documentation doesn't explicitly state that y_true or y_score should be non-negative. The cited Wikipedia article for DCG doesn't seem to mention a non-negativity assumption either. So either the method should be able to deal with scores regardless of sign, or the documentation should explicitly say otherwise.
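For what it's worth, the out-of-range value can be reproduced by hand. The sketch below assumes the usual logarithmic-discount definition of DCG (relevance divided by log2 of rank + 1) and no ties in y_score; `dcg` and `ndcg_parts` are hypothetical helpers written for illustration, not scikit-learn functions. It suggests the ideal DCG for this input is a tiny negative number, so the ratio blows up.

```python
import math


def dcg(relevances):
    """DCG of relevances given in ranked order, with a log2(rank + 1) discount."""
    # enumerate() is 0-based, so the 1-based rank is position + 1 and the discount is log2(position + 2)
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))


def ndcg_parts(y_true, y_score):
    """Return (actual DCG, ideal DCG, their ratio). Assumes no ties in y_score."""
    # Rank items by predicted score, highest first
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    actual = dcg([y_true[i] for i in order])
    # The ideal ranking sorts by true relevance, highest first
    ideal = dcg(sorted(y_true, reverse=True))
    return actual, ideal, actual / ideal


y_true = [-0.89, -0.53, -0.47, 0.39, 0.56]
y_score = [0.07, 0.31, 0.75, 0.33, 0.27]

actual, ideal, ratio = ndcg_parts(y_true, y_score)
print(actual, ideal, ratio)  # roughly -0.592, -0.0015, 396.03

# Shifting the relevances so they are all positive keeps the ratio in [0, 1]
shifted = ndcg_parts([v + 1 for v in y_true], y_score)
print(shifted[2])  # roughly 0.7996
```

With mixed-sign relevances the positive and negative terms of the ideal DCG nearly cancel, so the normalizer is no longer an upper bound on the achievable DCG.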
Disclosure/Question: I'm not an expert in ranking metrics, but it seems that there might be cases where one might want to compare lists of scores in (-∞,∞) based on their ordering alone (i.e., not MSE or related metrics). Is there any other metric in scikit-learn that is more appropriate for that use case?
Versions
numpy: 1.18.1
scipy: 1.4.1
sklearn: 0.22.1