Closed
5 changes: 5 additions & 0 deletions doc/whats_new/v1.1.rst
@@ -610,6 +610,11 @@ Changelog
- |Fix| Fixed a bug in :func:`metrics.normalized_mutual_info_score` which could return
unbounded values. :pr:`22635` by :user:`Jérémie du Boisberranger <jeremiedbb>`.

- |Fix| :func:`metrics.ndcg_score` will now trigger a warning when the y_true
value contains a negative value. It will allow the user to still use negative
values, but the result may not be between 0 and 1.
:pr:`22710` by :user:`Conroy Trinh <trinhcon>`
Comment on lines +613 to +616
Contributor

Try to use `backticks` for parameters like `y_true`. This should probably also mention the deprecation of negative values for `y_true`. I made a minor wording edit as well, but feel free to rephrase as you see fit :)

Suggested change
- |Fix| :func:`metrics.ndcg_score` will now trigger a warning when the y_true
value contains a negative value. It will allow the user to still use negative
values, but the result may not be between 0 and 1.
:pr:`22710` by :user:`Conroy Trinh <trinhcon>`
- |Fix| :func:`metrics.ndcg_score` will now trigger a warning when `y_true`
contains a negative value. The user may still use negative
values, but the result may not be between 0 and 1. Begins deprecation
of negative values in `y_true`.
:pr:`22710` by :user:`Conroy Trinh <trinhcon>`


:mod:`sklearn.model_selection`
..............................

19 changes: 18 additions & 1 deletion sklearn/metrics/_ranking.py
@@ -1538,7 +1538,9 @@ def ndcg_score(y_true, y_score, *, k=None, sample_weight=None, ignore_ties=False
----------
y_true : ndarray of shape (n_samples, n_labels)
True targets of multilabel classification, or true scores of entities
to be ranked.
to be ranked. Negative values in y_true may result in an output
Contributor

Suggested change
to be ranked. Negative values in y_true may result in an output
to be ranked. Negative values in `y_true` may result in an output

that is not between 0 and 1. These negative values are deprecated, and
may cause an error in the future.

y_score : ndarray of shape (n_samples, n_labels)
Target scores, can either be probability estimates, confidence values,
@@ -1616,11 +1618,26 @@ def ndcg_score(y_true, y_score, *, k=None, sample_weight=None, ignore_ties=False
... scores, k=1, ignore_ties=True)
0.5
"""

Contributor

For a cleaner git blame, try to revert any extraneous line additions/edits.

y_true = check_array(y_true, ensure_2d=False)
y_score = check_array(y_score, ensure_2d=False)
check_consistent_length(y_true, y_score, sample_weight)
_check_dcg_target_type(y_true)
gain = _ndcg_sample_scores(y_true, y_score, k=k, ignore_ties=ignore_ties)

if (isinstance(y_true, np.ndarray)):
    if (y_true.min() < 0):
        warnings.warn(
            "ndcg_score should not use negative y_true values",
            DeprecationWarning,
        )
else:
    for value in y_true:
        if (value < 0):
            warnings.warn(
                "ndcg_score should not use negative y_true values",
                DeprecationWarning,
            )
Comment on lines +1634 to +1640
Contributor

The documentation states that `y_true` is an ndarray of shape (n_samples, n_labels), so do we really need this clause?

Contributor Author

This clause was for one of the tests that, as far as I could tell, did not use NumPy arrays but regular ones. I found that removing the `else` clause causes these kinds of tests to fail. Unless I am misunderstanding the issue, it is necessary for some of the tests to pass.
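
For reference, one way to cover both kinds of input without a separate `else` branch is to convert the input once and run a single vectorized check. This is only a sketch, assuming conversion with `np.asarray` is acceptable at this point in the function. The helper names `warn_if_negative` and `count_warnings` are hypothetical and are not part of scikit-learn.

```python
import warnings

import numpy as np

MESSAGE = "ndcg_score should not use negative y_true values"


def warn_if_negative(y_true):
    """Hypothetical helper: emit one DeprecationWarning if any entry is negative."""
    # np.asarray leaves ndarrays untouched and converts lists or nested lists.
    y_true = np.asarray(y_true)
    if y_true.size > 0 and y_true.min() < 0:
        warnings.warn(MESSAGE, DeprecationWarning)


def count_warnings(y_true):
    """Hypothetical helper: return how many warnings warn_if_negative emits."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        warn_if_negative(y_true)
    return len(caught)
```

With this shape, a nested list and an ndarray go through the same code path, and at most one warning is emitted per call instead of one per negative entry.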

return np.average(gain, weights=sample_weight)


17 changes: 17 additions & 0 deletions sklearn/metrics/tests/test_ranking.py
@@ -1650,6 +1650,23 @@ def test_ndcg_ignore_ties_with_k():
ndcg_score(a, a, k=3, ignore_ties=True)
)

def test_ndcg_negative_ndarray_warn():
y_true = np.array([-0.89, -0.53, -0.47, 0.39, 0.56]).reshape(1,-1)
y_score = np.array([0.07,0.31,0.75,0.33,0.27]).reshape(1,-1)
Comment on lines +1654 to +1655
Contributor

If you're trying to organize this into an array of shape (1, 5) then this may be a cleaner way. Similarly for the other tests.

Suggested change
y_true = np.array([-0.89, -0.53, -0.47, 0.39, 0.56]).reshape(1,-1)
y_score = np.array([0.07,0.31,0.75,0.33,0.27]).reshape(1,-1)
y_true = np.array([[-0.89, -0.53, -0.47, 0.39, 0.56]])
y_score = np.array([[0.07,0.31,0.75,0.33,0.27]])
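
A quick check that the two spellings in the suggestion above build the same array, using the values from the test. The variable names are illustrative only.

```python
import numpy as np

values = [-0.89, -0.53, -0.47, 0.39, 0.56]

# Original spelling: build a flat array, then reshape it to one row.
reshaped = np.array(values).reshape(1, -1)

# Suggested spelling: write the single row as a nested list.
nested = np.array([values])

same_shape = reshaped.shape == nested.shape
same_values = np.array_equal(reshaped, nested)
```

Both forms give an array of shape (1, 5), so the nested literal is purely a readability change.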

expected_message = "ndcg_score should not use negative y_true values"
with pytest.warns(DeprecationWarning, match=expected_message):
ndcg_score(y_true, y_score)

def test_ndcg_negative_output():
y_true = np.array([-0.89, -0.53, -0.47, 0.39, 0.56]).reshape(1,-1)
y_score = np.array([0.07,0.31,0.75,0.33,0.27]).reshape(1,-1)
Comment on lines +1661 to +1662
Contributor

Suggested change
y_true = np.array([-0.89, -0.53, -0.47, 0.39, 0.56]).reshape(1,-1)
y_score = np.array([0.07,0.31,0.75,0.33,0.27]).reshape(1,-1)
y_true = np.array([[-0.89, -0.53, -0.47, 0.39, 0.56]])
y_score = np.array([[0.07,0.31,0.75,0.33,0.27]])

assert ndcg_score(y_true, y_score) == pytest.approx(396.0329)
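
To see why a value far above 1 can come out of the inputs in this test, here is a pure-Python sketch of the textbook NDCG ratio (DCG with a log2 discount, divided by the DCG of the ideal ordering). It is an illustration under simplifying assumptions (a single sample, no ties, no `k` cutoff, nonzero ideal DCG) and is not scikit-learn's implementation. The names `dcg` and `ndcg` are hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: rank r (0-based) is discounted by log2(r + 2)."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains))


def ndcg(y_true, y_score):
    """DCG of the predicted ordering divided by DCG of the ideal ordering."""
    # Order the true gains by descending predicted score.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    actual = dcg([y_true[i] for i in order])
    # The ideal ordering sorts the true gains themselves in descending order.
    ideal = dcg(sorted(y_true, reverse=True))
    return actual / ideal  # assumes ideal != 0


negative = ndcg([-0.89, -0.53, -0.47, 0.39, 0.56], [0.07, 0.31, 0.75, 0.33, 0.27])
positive = ndcg([0.11, 0.47, 0.53, 1.39, 1.56], [1.07, 1.31, 1.75, 1.33, 1.27])
```

With the negative gains, the ideal DCG is a small negative number, so the ratio blows up well past 1. With the all-positive inputs from the next test, the ratio stays between 0 and 1.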

def test_ndcg_positive_ndarray():
y_true = np.array([0.11, 0.47, 0.53, 1.39, 1.56]).reshape(1,-1)
y_score = np.array([1.07, 1.31, 1.75, 1.33, 1.27]).reshape(1,-1)
Comment on lines +1666 to +1667
Contributor

Suggested change
y_true = np.array([0.11, 0.47, 0.53, 1.39, 1.56]).reshape(1,-1)
y_score = np.array([1.07, 1.31, 1.75, 1.33, 1.27]).reshape(1,-1)
y_true = np.array([[0.11, 0.47, 0.53, 1.39, 1.56]])
y_score = np.array([[1.07, 1.31, 1.75, 1.33, 1.27]])

with pytest.warns(None):
ndcg_score(y_true, y_score)
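
Note that newer pytest releases deprecate passing `None` to `pytest.warns`. One alternative for asserting that no warning is raised is to turn warnings into errors with the standard library. The sketch below uses a hypothetical stand-in function, `noisy_sum`, rather than `ndcg_score`, so that it runs without scikit-learn. `raises_no_warning` is also a hypothetical helper.

```python
import warnings


def noisy_sum(values):
    """Hypothetical stand-in: warns on negative input, like the patched function."""
    if min(values) < 0:
        warnings.warn("negative values are deprecated", DeprecationWarning)
    return sum(values)


def raises_no_warning(func, *args):
    """Return True if calling func(*args) emits no warning at all."""
    with warnings.catch_warnings():
        # Promote every warning to an exception inside this block only.
        warnings.simplefilter("error")
        try:
            func(*args)
        except Warning:
            return False
    return True
```

In a test, the same idea can be written inline: wrap the call in `warnings.catch_warnings()` with `warnings.simplefilter("error")`, and any warning will fail the test.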

def test_ndcg_invariant():
y_true = np.arange(70).reshape(7, 10)