

Conversation

@cem-anyscale
Contributor

Description

Add TopKUnique aggregator that computes most frequent k values

* Add TopKUnique aggregator that computes most frequent k values

Signed-off-by: cem <[email protected]>
@cem-anyscale requested a review from a team as a code owner on December 18, 2025, 18:50
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new TopKUnique aggregator, which is a valuable addition for computing the most frequent unique values in a column. The implementation correctly builds upon the existing ValueCounter and utilizes heapq.nlargest for efficient top-k computation. The accompanying tests are thorough, covering basic functionality, global frequency aggregation across blocks, and various edge cases. I have a couple of suggestions for a minor code cleanup to improve readability and a recommendation to enhance test robustness by adding a case for frequency ties.
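For readers unfamiliar with the approach summarized above, the following is a minimal standalone sketch of the general technique: count values per block, merge the per-block counts into global frequencies, then select the k most frequent entries with `heapq.nlargest`. It does not use Ray Data or the PR's actual classes; the function names `count_block`, `merge_counts`, and `top_k` are hypothetical, and the real aggregator's interface may differ.

```python
import heapq
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def count_block(values: Iterable[Hashable]) -> Counter:
    # Hypothetical helper: count occurrences of each value within one block.
    return Counter(values)


def merge_counts(counts: Iterable[Counter]) -> Counter:
    # Hypothetical helper: sum per-block counts into global frequencies.
    total: Counter = Counter()
    for block_counts in counts:
        total.update(block_counts)  # Counter.update adds counts together
    return total


def top_k(counts: Counter, k: int) -> List[Tuple[Hashable, int]]:
    # Select the k (value, count) pairs with the highest counts.
    # Ties are not handled explicitly in this sketch.
    return heapq.nlargest(k, counts.items(), key=lambda item: item[1])


blocks = [["a", "b", "a"], ["b", "c", "a"]]
merged = merge_counts(count_block(block) for block in blocks)
print(top_k(merged, 2))
```

Because a value can be rare in every individual block yet frequent overall, the counts are merged before selecting the top k rather than selecting per block. This sketch leaves the ordering of equal-frequency values to `heapq.nlargest`; a test for frequency ties, as the review recommends, would pin down whatever tie-breaking rule the real aggregator chooses.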

@richardliaw
Contributor

Why is it called TopKUnique and not just TopK?

@cem-anyscale
Contributor Author

> Why is it called TopKUnique and not just TopK?

Yeah TopK would be better; will rename.

@ray-gardener bot added the data label (Ray Data-related issues) on Dec 18, 2025
@richardliaw changed the title from "[data] Add TopKUnique aggregator" to "[data] Add TopK aggregator" on Dec 18, 2025
* rename aggregator name
* rename default alias name

Signed-off-by: cem <[email protected]>
Signed-off-by: cem <[email protected]>
@cem-anyscale added the go label (add ONLY when ready to merge, run all tests) on Dec 18, 2025
Signed-off-by: cem <[email protected]>
@github-actions

github-actions bot commented Jan 2, 2026

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions bot added the stale label (The issue is stale. It will be closed within 7 days unless there are further conversation) on Jan 2, 2026
