MNT Refactor _average_weighted_percentile to avoid double sort #31775

Open: wants to merge 5 commits into base: main
lucyleeow
Member

Reference Issues/PRs

Supersedes #30945

What does this implement/fix? Explain your changes.

Refactor _average_weighted_percentile so that it no longer just performs _weighted_percentile twice; this avoids sorting and computing the cumulative sum twice.
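To illustrate the duplicated work, here is a minimal plain-NumPy sketch of the "call it twice" pattern for a 1-D input. The helper names `weighted_percentile_lower` and `averaged_percentile_two_calls` are hypothetical and are not scikit-learn's private functions; `weighted_percentile_lower` is assumed to mirror an inverted-CDF (Hyndman and Fan type 1) weighted percentile, and it assumes strictly positive weights, skipping the zero-weight and multi-column handling of the real code.

```python
import numpy as np


def weighted_percentile_lower(array, sample_weight, percentile_rank=50.0):
    """Hypothetical 1-D weighted inverted-CDF percentile (H&F type 1).

    Every call does its own argsort and its own cumulative sum.
    Assumes strictly positive weights.
    """
    array = np.asarray(array, dtype=float)
    sample_weight = np.asarray(sample_weight, dtype=float)
    sorter = np.argsort(array, kind="stable")
    sorted_array = array[sorter]
    weight_cdf = np.cumsum(sample_weight[sorter])
    target = percentile_rank / 100.0 * weight_cdf[-1]
    # First sorted position whose cumulative weight reaches the target.
    idx = np.searchsorted(weight_cdf, target, side="left")
    # Guard against floating-point overshoot at percentile_rank=100.
    return float(sorted_array[min(idx, len(sorted_array) - 1)])


def averaged_percentile_two_calls(array, sample_weight, percentile_rank=50.0):
    """Averaged inverted CDF via symmetry: two calls, so two sorts and two cumsums."""
    array = np.asarray(array, dtype=float)
    return (
        weighted_percentile_lower(array, sample_weight, percentile_rank)
        - weighted_percentile_lower(-array, sample_weight, 100.0 - percentile_rank)
    ) / 2
```

For example, `averaged_percentile_two_calls([1, 2, 3, 4], [1, 1, 1, 1], 50)` gives `2.5`, the usual median, but it sorts the data (and its negation) twice to get there.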

#30945 essentially reuses the sorted indices and computes _weighted_percentile(-array, 100 - percentile_rank). This was verbose and required computing the cumulative sum again on the negated array. (You could have used symmetry to avoid recomputing the cumulative sum when the fraction above is greater than 0, i.e., g > 0 in Hyndman and Fan.)
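For reference, the two Hyndman and Fan definitions involved, written for the unweighted case with $p \in [0, 1]$ instead of a percentile rank in $[0, 100]$ (in the weighted setting, $np$ is replaced by $p$ times the total weight and positions are found through cumulative weights). The last line is the symmetry identity behind the negated-array approach; it is a derivation added here for illustration, not taken from either PR.

```math
\begin{aligned}
Q(p) &= (1 - \gamma)\, x_{(j)} + \gamma\, x_{(j+1)},
  \qquad j = \lfloor np \rfloor,\quad g = np - j \\
\text{type 1 (inverted CDF):}\quad \gamma &= \begin{cases} 1 & g > 0 \\ 0 & g = 0 \end{cases} \\
\text{type 2 (averaged inverted CDF):}\quad \gamma &= \begin{cases} 1 & g > 0 \\ \tfrac{1}{2} & g = 0 \end{cases} \\
Q_2(x; p) &= \tfrac{1}{2}\bigl( Q_1(x; p) - Q_1(-x;\, 1 - p) \bigr)
\end{aligned}
```

The identity holds because negation reverses the order, so $(-x)_{(k)} = -x_{(n+1-k)}$, and $n(1 - p) = n - np$: when $g > 0$ both terms pick $x_{(j+1)}$, and when $g = 0$ the first term picks $x_{(j)}$ while the second picks $x_{(j+1)}$.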

I've instead followed the Hyndman and Fan computation more closely: calculate g and simply use j+1 (since we already know j). This did make it more complex to handle the case where j+1 has a sample weight of 0 (or where there are sample weights of 0 at the end of the array).
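A minimal single-sort sketch of this idea in plain NumPy for a 1-D array. The function name is hypothetical and this is not the PR's code (which works on 2-D inputs through the array API namespace `xp`). It shows one way to deal with the zero-weight complication: only positive-weight samples are ever selected, so "j+1" means the next sample with non-zero weight. The exact floating-point equality used to detect g == 0 is an assumption that is fine for the small integer-weight examples here.

```python
import numpy as np


def averaged_weighted_percentile_one_sort(array, sample_weight, percentile_rank=50.0):
    """Hypothetical 1-D weighted averaged inverted-CDF percentile (H&F type 2).

    One argsort and one cumsum. Assumes non-negative weights with a positive total.
    """
    array = np.asarray(array, dtype=float)
    sample_weight = np.asarray(sample_weight, dtype=float)
    sorter = np.argsort(array, kind="stable")
    sorted_array = array[sorter]
    sorted_weight = sample_weight[sorter]
    weight_cdf = np.cumsum(sorted_weight)  # computed only once
    target = percentile_rank / 100.0 * weight_cdf[-1]

    # Zero-weight samples must never be selected, so search only among
    # positive-weight positions (their cumulative weights strictly increase).
    positive = np.flatnonzero(sorted_weight > 0)
    k = int(np.searchsorted(weight_cdf[positive], target, side="left"))
    k = min(k, len(positive) - 1)  # guard against overshoot at 100
    i = positive[k]

    # g > 0: the target lies below sample i's cumulative weight, so sample i
    # is already the paper's x_(j+1).
    is_fraction_above = weight_cdf[i] != target
    if is_fraction_above or k + 1 == len(positive):
        return float(sorted_array[i])
    # g == 0: sample i is x_(j); average it with the next positive-weight
    # sample, skipping any zero-weight samples in between.
    return float((sorted_array[i] + sorted_array[positive[k + 1]]) / 2)
```

With `array=[1, 2, 3, 4, 5]`, `sample_weight=[1, 1, 0, 1, 1]` and `percentile_rank=50`, the target lands exactly on the boundary after `2`; naively taking the next sorted element would pick the zero-weight `3`, whereas skipping it gives `(2 + 4) / 2 = 3.0`.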

Any other comments?

github-actions bot commented Jul 17, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: ba57727.


```python
result = xp.where(
    is_fraction_above,
    array[percentile_in_sorted, col_indices],
```
Member Author

lucyleeow, Jul 17, 2025

I initially thought this should be percentile_plus_one_in_sorted, since in the paper $\gamma = 1$ when $g > 0$, i.e., the result is $x_{(j+1)}$. However, the equality sits on opposite sides in the two conventions: the paper defines $j$ by $j \le pn < j + 1$, whereas searchsorted (which defaults to side='left') effectively returns $i$ with $i - 1 < pn \le i$. This means that when $pn$ is strictly greater than the lower bound ($g > 0$), searchsorted's $i$ already equals the paper's $j + 1$.

When $pn$ exactly matches an index ($g = 0$), searchsorted's $i$ instead equals the paper's $j$, again because the equality is on opposite sides in the paper and in searchsorted.
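A small NumPy check of the index correspondence described above, for an unweighted sample where the cumulative weights are simply 1, 2, ..., n. The helper names `order_stat`, `paper_j_and_g` and `searchsorted_pick` are hypothetical; `np.searchsorted` is used with its default `side='left'`.

```python
import numpy as np

# Unweighted sample, n = 4, so the cumulative weights are 1, 2, ..., n.
sorted_array = np.array([10.0, 20.0, 30.0, 40.0])
weight_cdf = np.cumsum(np.ones_like(sorted_array))  # array([1., 2., 3., 4.])


def order_stat(k):
    """Paper's 1-based order statistic x_(k)."""
    return sorted_array[k - 1]


def paper_j_and_g(pn):
    """Hyndman and Fan: j is the integer with j <= pn < j + 1, and g = pn - j."""
    j = int(np.floor(pn))
    return j, pn - j


def searchsorted_pick(pn):
    """Default side='left' gives the 0-based i with cdf[i - 1] < pn <= cdf[i]."""
    i = int(np.searchsorted(weight_cdf, pn))
    return sorted_array[i]


# g > 0: searchsorted already lands on x_(j + 1).
j, g = paper_j_and_g(1.5)
assert g > 0 and searchsorted_pick(1.5) == order_stat(j + 1)

# g == 0: searchsorted lands on x_(j); the averaged method also needs x_(j + 1).
j, g = paper_j_and_g(2.0)
assert g == 0 and searchsorted_pick(2.0) == order_stat(j)
```

So in the $g > 0$ branch the element at percentile_in_sorted is already the one the paper asks for, and only the $g = 0$ branch needs the following element.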
