Init checks for Dask KMeans #7391

viclafargue · 2025-10-27T11:19:54Z

python/cuml/cuml/cluster/kmeans.pyx

jcrist · 2025-10-27T14:40:09Z

python/cuml/cuml/cluster/kmeans.pyx


        # Skip this check if running in multigpu mode. In that case we don't care if
        # a single partition has fewer rows than clusters
        if not multigpu and n_rows < self.n_clusters:


While you're here - I'm not 100% sure if this check should be skipped in multi-gpu execution. When refactoring I excluded it in multi-gpu since we weren't running it there before.

Is the multi-gpu implementation robust to a single node having fewer rows than the requested n_clusters? It doesn't seem to error when invoked in that setup, but I'm also not sure if it provides good results.

viclafargue · 2025-10-27 (edited)

I think the check was simply missing in the multi-GPU implementation. I'm not sure either, but this should be a rare case, and it probably would not yield very good results, especially with scalable/parallel kmeans++ initialization. Better safe than sorry: we should probably alert the user in this case.
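To make the condition under discussion concrete, here is a minimal sketch in plain Python of the row-count check applied unconditionally, that is, without the `not multigpu` guard. Only the comparison `n_rows < n_clusters` comes from the snippet above; the helper name `check_enough_rows`, the choice of `ValueError`, and the message text are assumptions for illustration and are not taken from cuML.

```python
def check_enough_rows(n_rows: int, n_clusters: int) -> None:
    """Raise if there are fewer rows than requested clusters.

    Hypothetical helper, not part of cuML. In a multi-GPU setting
    `n_rows` would be the row count of the local partition.
    """
    if n_rows < n_clusters:
        raise ValueError(
            f"n_rows={n_rows} must be >= n_clusters={n_clusters}"
        )


# A partition with enough rows passes silently.
check_enough_rows(n_rows=100, n_clusters=8)
```

Whether the real code should raise or only warn is not settled in this thread; the sketch raises because that is the simplest way to "alert the user".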

divyegala

Maybe also add a check that oversampling_factor > 0?
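A minimal sketch of what such a check might look like, assuming the value is validated before initialization starts. The helper name `check_oversampling_factor` and the error message are hypothetical; only the condition `oversampling_factor > 0` comes from the comment above.

```python
def check_oversampling_factor(oversampling_factor: float) -> None:
    """Reject non-positive oversampling factors.

    Hypothetical helper, not part of cuML.
    """
    if not oversampling_factor > 0:
        raise ValueError(
            f"oversampling_factor must be > 0, got {oversampling_factor}"
        )


# A positive factor passes silently.
check_oversampling_factor(2.0)
```

Writing the condition as `not oversampling_factor > 0` also rejects NaN, which a plain `<= 0` comparison would let through.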

viclafargue · 2025-10-31T15:20:32Z

/merge

During a recent refactor we removed the `KMeansMG` class, viewing it as internal. It turns out this class was used by a few external projects. Since we still need to support external users accessing the non-dask multi-gpu implementation, we'll want a public way to do so that isn't the private `_fit` method. Additionally, since we want to special case the `MG` case a little more, making it a separate class (even if as a thin shim) makes sense.

This PR:

- Brings back the `KMeansMG` class
- Adds a check that `random_state` is non-None in the `KMeansMG` case, ensuring external users also set `random_state` properly
- Removes mutation of kwargs in the dask `KMeans` case (as suggested [here](#7417 (comment)))
- Simplifies and moves the multi-gpu `kmeans++`/`oversampling_factor` check (as suggested [here](#7391 (comment)))

Fixes #7387. Fixes #7389.

Authors:
- Jim Crist-Harif (https://github.com/jcrist)

Approvers:
- Dante Gama Dessavre (https://github.com/dantegd)

URL: #7420
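The "thin shim" idea and the `random_state` check can be sketched as follows. This is an illustration under stated assumptions, not the code from #7420: the class names `KMeansSketch` and `KMeansMGSketch`, the constructor signature, and the error message are all hypothetical stand-ins.

```python
class KMeansSketch:
    """Stand-in for a single-GPU estimator (hypothetical, not cuML's class)."""

    def __init__(self, n_clusters=8, random_state=None):
        self.n_clusters = n_clusters
        self.random_state = random_state


class KMeansMGSketch(KMeansSketch):
    """Thin multi-GPU shim: same parameters, plus stricter validation."""

    def __init__(self, n_clusters=8, random_state=None):
        # The multi-GPU variant requires an explicit seed.
        if random_state is None:
            raise ValueError(
                "random_state must be set explicitly for the multi-GPU estimator"
            )
        super().__init__(n_clusters=n_clusters, random_state=random_state)


# The base class accepts a missing seed; the shim does not.
single = KMeansSketch(n_clusters=4)
multi = KMeansMGSketch(n_clusters=4, random_state=42)
```

Keeping the multi-GPU special cases in a subclass gives external callers a public entry point while leaving the single-GPU class untouched.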

Init checks for Dask KMeans

038c8f6

viclafargue requested a review from a team as a code owner October 27, 2025 11:19

viclafargue requested a review from dantegd October 27, 2025 11:19

github-actions bot added the Cython / Python Cython or Python issue label Oct 27, 2025

github-actions bot assigned viclafargue Oct 27, 2025

jcrist reviewed Oct 27, 2025

Answering review

82a6bb2

divyegala reviewed Oct 31, 2025

viclafargue added 2 commits October 31, 2025 14:18

Merge branch 'main' into init-checks-dask-kmeans

9e605ea

answer review

be8c66b

viclafargue force-pushed the init-checks-dask-kmeans branch from aaa89f5 to be8c66b Compare October 31, 2025 13:33

viclafargue added bug Something isn't working non-breaking Non-breaking change labels Oct 31, 2025

divyegala approved these changes Oct 31, 2025

rapids-bot bot merged commit 550aba7 into rapidsai:main Oct 31, 2025
103 checks passed

jcrist mentioned this pull request Oct 31, 2025

Bring back KMeansMG #7420

Merged
