MNT Refactor `_average_weighted_percentile` to avoid double sort #31775

lucyleeow · 2025-07-17T11:21:00Z

Reference Issues/PRs

Supersedes #30945

What does this implement/fix? Explain your changes.

Refactor `_average_weighted_percentile` so that it no longer simply calls `_weighted_percentile` twice. This avoids sorting the array and computing the cumulative sum twice.

#30945 essentially reuses the sorted indices and calculates `_weighted_percentile(-array, 100 - percentile_rank)`. This was verbose and required computing the cumulative sum again on the negated array. (Symmetry could have been used to avoid recomputing the cumulative sum when the fraction above is greater than 0, i.e., g > 0 in Hyndman and Fan.)
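For comparison, the double-sort pattern described above can be sketched as follows. This is a minimal 1D NumPy sketch, not scikit-learn's code: `weighted_percentile_1d` and `averaged_weighted_percentile_two_sorts` are hypothetical helpers, and the lower percentile is assumed to use the inverted CDF (the first sorted sample whose cumulative weight reaches the target). The upper value is recovered from the negated array, which is why `argsort` and `cumsum` each run twice.

```python
import numpy as np


def weighted_percentile_1d(array, sample_weight, percentile_rank):
    """Hypothetical inverted-CDF weighted percentile (one argsort, one cumsum)."""
    order = np.argsort(array, kind="stable")
    weight_cdf = np.cumsum(sample_weight[order])
    target = percentile_rank / 100 * weight_cdf[-1]
    if target == 0:
        # Ignore leading zero-weight samples when asking for percentile 0.
        target = np.nextafter(target, 1.0)
    # First index whose cumulative weight reaches the target (clipped for safety).
    idx = min(np.searchsorted(weight_cdf, target, side="left"), len(array) - 1)
    return array[order][idx]


def averaged_weighted_percentile_two_sorts(array, sample_weight, percentile_rank):
    """Average of the lower and upper percentiles; the upper one uses symmetry.

    Calling the helper on -array sorts and accumulates the weights a second time.
    """
    array = np.asarray(array, dtype=float)
    sample_weight = np.asarray(sample_weight, dtype=float)
    lower = weighted_percentile_1d(array, sample_weight, percentile_rank)
    upper = -weighted_percentile_1d(-array, sample_weight, 100 - percentile_rank)
    return (lower + upper) / 2


print(averaged_weighted_percentile_two_sorts([1, 2, 3, 4], [1, 1, 1, 1], 50))
```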

I've followed the Hyndman and Fan computation more closely: I calculate g and then simply use j+1 (since we already know j). This did make it more complex to handle the case where j+1 has a sample weight of 0 (or where there are zero sample weights at the end of the array).
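A minimal 1D sketch of the single-sort idea, assuming NumPy and the averaged inverted CDF definition (Hyndman and Fan type 2): sort and take the cumulative sum once, locate the index with `searchsorted`, and only look for the next positive-weight sample when g = 0. `averaged_weighted_percentile_1d` is a hypothetical helper, not the PR's implementation (which works column-wise through the array API namespace `xp`), and the exact-equality test for g = 0 assumes the cumulative weights carry no rounding error.

```python
import numpy as np


def averaged_weighted_percentile_1d(array, sample_weight, percentile_rank=50.0):
    """Hypothetical sketch: averaged inverted-CDF weighted percentile, one sort."""
    array = np.asarray(array, dtype=float)
    sample_weight = np.asarray(sample_weight, dtype=float)
    n = array.shape[0]

    # Sort once and compute the cumulative sum of the weights once.
    order = np.argsort(array, kind="stable")
    sorted_array = array[order]
    weight_cdf = np.cumsum(sample_weight[order])

    target = percentile_rank / 100 * weight_cdf[-1]
    if target == 0:
        # Percentile 0: skip leading zero-weight samples.
        target = np.nextafter(target, 1.0)

    # First index with weight_cdf[i - 1] < target <= weight_cdf[i].
    i = min(np.searchsorted(weight_cdf, target, side="left"), n - 1)

    if weight_cdf[i] > target:
        # g > 0: sample i is already x_{j+1} in the paper's 1-based notation.
        return sorted_array[i]

    # g == 0: average sample i with the next sample that has positive weight.
    # side="right" returns the first index whose cumulative weight exceeds
    # the target, which skips any zero-weight samples in between.
    k = np.searchsorted(weight_cdf, target, side="right")
    if k >= n:
        # Only zero-weight samples (or nothing) follow i: no upper neighbour.
        return sorted_array[i]
    return (sorted_array[i] + sorted_array[k]) / 2


print(averaged_weighted_percentile_1d([1, 2, 3, 4, 5], [1, 1, 0, 1, 1], 50))
```

In this sketch, using `side="right"` for the upper neighbour is one way to handle a zero weight at j+1; the same call returns `n` when only zero-weight samples remain at the end of the array, and the sketch then falls back to sample i.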

Any other comments?

github-actions · 2025-07-17T11:21:58Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: ba57727. Link to the linter CI: here

lucyleeow · 2025-07-17T11:25:28Z

sklearn/utils/stats.py

+
+        result = xp.where(
+            is_fraction_above,
+            array[percentile_in_sorted, col_indices],


I initially thought this should be `percentile_plus_one_in_sorted`, as in the paper: when g > 0, $\gamma=1$. ~~but searchsorted defaults to left (equals is on the right), whereas the paper defined j <= pn < j+1~~ However, `searchsorted` effectively gives i-1 < pn <= i, whereas the paper has j <= pn < j+1. This means that when pn is strictly greater than j (i.e., g > 0), `searchsorted`'s i equals the paper's j+1.

When pn exactly matches an index (g = 0), `searchsorted`'s i equals the paper's j, because the equality sits on opposite sides of the inequality in the paper and in `searchsorted`.
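The index mapping above can be checked numerically. This small sketch assumes unit sample weights (so the cumulative weights are 1, 2, ..., n) and m = 0 in Hyndman and Fan's notation; `searchsorted_position` and `hyndman_fan_j_g` are hypothetical helpers for illustration only.

```python
import math

import numpy as np


def searchsorted_position(n, p):
    """1-based position found by searchsorted(side="left") with unit weights."""
    # With unit weights the cumulative sum is [1, 2, ..., n], so the
    # returned 0-based index i satisfies i < p * n <= i + 1.
    weight_cdf = np.arange(1, n + 1, dtype=float)
    return int(np.searchsorted(weight_cdf, p * n, side="left")) + 1


def hyndman_fan_j_g(n, p):
    """Hyndman and Fan's j and g with m = 0, where j <= p * n < j + 1."""
    j = math.floor(p * n)
    return j, p * n - j


for n, p in [(4, 0.375), (4, 0.5), (5, 0.5), (8, 0.75)]:
    j, g = hyndman_fan_j_g(n, p)
    print(f"n={n}, p={p}: j={j}, g={g}, position={searchsorted_position(n, p)}")
```

For the cases with g > 0 ((4, 0.375) and (5, 0.5)) the position is j + 1, and for the cases with g = 0 ((4, 0.5) and (8, 0.75)) it is j.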

lucyleeow added 3 commits July 14, 2025 14:58

try reverse cum sum · 8fe6ae2
initial implementation, wip tests · b9c0c7b
fix and add tests, update use · b56fab0

github-actions bot added module:metrics module:preprocessing module:utils labels Jul 17, 2025

lucyleeow mentioned this pull request Jul 17, 2025

Refactor weighted percentile functions to avoid redundant sorting #30945

Closed

lucyleeow added 2 commits July 18, 2025 23:51

fixes and add tests · f99366c
simplify zero sample code · ba57727
