fix: for balanced kmeans use grid.x for adjust_centers to avoid grid.y overflow #1
fix: for balanced kmeans use grid.x for adjust_centers to avoid grid.y overflow for large n_clusters
Problem: adjust_centers is launched with grid.y = ceil(n_clusters/4), which exceeds CUDA's 65,535 limit on the grid Y dimension once n_clusters > 262,140 (4 × 65,535), e.g., 263k or 1M clusters.
Fix: enumerate blocks along grid.x, whose limit is 2^31 − 1, and compute l from blockIdx.x instead of blockIdx.y. No algorithmic or performance changes; this only prevents the invalid launch configuration.
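The arithmetic behind the limit can be checked on the host. The following is a minimal sketch, not the code in this PR: the helper names (`blocks_needed`, `fits_grid_y`, `fits_grid_x`) and the constant `kClustersPerBlock` are hypothetical, and the value 4 is only inferred from the `ceil(n_clusters/4)` expression above. The grid limits are those CUDA documents for compute capability 3.0 and later.

```cpp
#include <cassert>
#include <cstdint>

// CUDA grid limits for compute capability 3.0 and later.
constexpr std::int64_t kMaxGridX = 2147483647;  // 2^31 - 1
constexpr std::int64_t kMaxGridY = 65535;

// Assumption: each block covers 4 clusters, inferred from ceil(n_clusters/4).
constexpr std::int64_t kClustersPerBlock = 4;

// Integer ceiling division for positive operands.
constexpr std::int64_t ceil_div(std::int64_t a, std::int64_t b)
{
  return (a + b - 1) / b;
}

// Number of blocks required to cover n_clusters (hypothetical helper).
constexpr std::int64_t blocks_needed(std::int64_t n_clusters)
{
  return ceil_div(n_clusters, kClustersPerBlock);
}

// True if the block count is a valid grid.y value (the pre-fix layout).
constexpr bool fits_grid_y(std::int64_t n_clusters)
{
  return blocks_needed(n_clusters) <= kMaxGridY;
}

// True if the block count is a valid grid.x value (the post-fix layout).
constexpr bool fits_grid_x(std::int64_t n_clusters)
{
  return blocks_needed(n_clusters) <= kMaxGridX;
}

// 262,140 is the largest n_clusters that still fits along grid.y.
static_assert(fits_grid_y(262140), "262,140 clusters need exactly 65,535 blocks");
static_assert(!fits_grid_y(262141), "one more cluster needs 65,536 blocks");
static_assert(fits_grid_x(262141), "the same count fits along grid.x");
```

Under these assumptions, 263,000 clusters need 65,750 blocks and 1,000,000 clusters need 250,000 blocks; both exceed the grid.y limit and are far below the grid.x limit.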
Repro: before the fix, n_clusters = 262k works and 263k fails; after the fix, both work. Training with 1M centroids proceeds past the balancing step.
Impact: zero regression; removes a hard cap on n_clusters.