Conversation

@cem-anyscale
Contributor

Description

Add a TopKUnique aggregator that computes the k most frequent values in a column.

* Add TopKUnique aggregator that computes most frequent k values

Signed-off-by: cem <[email protected]>
@cem-anyscale cem-anyscale requested a review from a team as a code owner December 18, 2025 18:50
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new TopKUnique aggregator, which is a valuable addition for computing the most frequent unique values in a column. The implementation correctly builds upon the existing ValueCounter and utilizes heapq.nlargest for efficient top-k computation. The accompanying tests are thorough, covering basic functionality, global frequency aggregation across blocks, and various edge cases. I have a couple of suggestions for a minor code cleanup to improve readability and a recommendation to enhance test robustness by adding a case for frequency ties.
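To make the review's points concrete, here is a minimal standalone sketch of the general technique it describes: count values per block, merge the partial counts, then pick the k most frequent with `heapq.nlargest`. This is not the code from the pull request. It uses `collections.Counter` as a stand-in for the `ValueCounter` mentioned above, and the function names `count_block`, `merge_counts`, and `top_k` are hypothetical.

```python
import heapq
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def count_block(values: Iterable[Hashable]) -> Counter:
    """Partial result for one block: value -> occurrences in that block."""
    return Counter(values)


def merge_counts(left: Counter, right: Counter) -> Counter:
    """Combine two partial results by summing the per-value counts."""
    merged = Counter(left)
    merged.update(right)
    return merged


def top_k(counts: Counter, k: int) -> List[Tuple[Hashable, int]]:
    """Return up to k (value, count) pairs, highest count first."""
    return heapq.nlargest(k, counts.items(), key=lambda item: item[1])


# "a" leads in the first block, but "b" is the most frequent value overall,
# so the top-k step has to run on the merged counts, not per block.
blocks = [["a", "b", "a"], ["b", "c", "b"], ["a", "b"]]
total = Counter()
for block in blocks:
    total = merge_counts(total, count_block(block))

print(top_k(total, 2))  # [('b', 4), ('a', 3)]
```

In this sketch, values with equal counts come back in whatever order the counter happens to iterate them, so a tie test of the kind the review recommends would be safer asserting on the returned counts, or on membership in the set of tied values, than on one specific value.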

@richardliaw
Contributor

Why is it called TopKUnique and not just TopK?

@cem-anyscale
Contributor Author

> Why is it called TopKUnique and not just TopK?

Yeah TopK would be better; will rename.

@ray-gardener ray-gardener bot added the "data" label (Ray Data-related issues) Dec 18, 2025
@richardliaw richardliaw changed the title from "[data] Add TopKUnique aggregator" to "[data] Add TopK aggregator" Dec 18, 2025
* rename aggregator name
* rename default alias name

Signed-off-by: cem <[email protected]>
Signed-off-by: cem <[email protected]>
@cem-anyscale cem-anyscale added the "go" label (add ONLY when ready to merge, run all tests) Dec 18, 2025
Signed-off-by: cem <[email protected]>

Labels

data: Ray Data-related issues
go: add ONLY when ready to merge, run all tests

3 participants