FEA Add array API support to davies_bouldin_score (#32693)
Conversation
virchan left a comment
LGTM! Thanks, @jaffourt!
CUDA CI is green as well. @lucyleeow, @OmarManzoor, friendly ping: would you like to take a look?
OmarManzoor left a comment
LGTM. Thanks @jaffourt
```diff
-    intra_dists = np.zeros(n_labels)
-    centroids = np.zeros((n_labels, len(X[0])), dtype=float)
+    dtype = _max_precision_float_dtype(xp, device_)
```
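For context, a minimal sketch of what the array API-style allocations replacing the NumPy calls above might look like (names follow the diff; the PR's exact code may differ):

```python
# Hypothetical sketch, not the PR's verbatim code: allocate the intermediate
# arrays in the input's array namespace, on the input's device, using the
# highest float precision that device supports.
from sklearn.utils._array_api import (
    _max_precision_float_dtype,
    get_namespace_and_device,
)

def _allocate_intermediates(X, n_labels):
    xp, _, device_ = get_namespace_and_device(X)
    dtype = _max_precision_float_dtype(xp, device_)
    intra_dists = xp.zeros(n_labels, dtype=dtype, device=device_)
    centroids = xp.zeros((n_labels, X.shape[1]), dtype=dtype, device=device_)
    return intra_dists, centroids
```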
Just a quick question: why `_max_precision_float_dtype` and not `_find_matching_floating_dtype` here?
It's a good question and I don't have a solid answer. I have seen a number of different approaches for finding a common floating dtype in array API-compliant calculations, and it would be nice to have a more defined heuristic if needed :)
In this case, my reasoning for `_max_precision_float_dtype` was that `_find_matching_floating_dtype` finds a common float dtype among several existing arrays, whereas in this implementation we are building new floating arrays for intermediate calculations.
I.e., does it make sense to use the floating dtype returned from `_find_matching_floating_dtype(X, labels)`?
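For anyone following along, a quick illustrative contrast of the two private helpers in `sklearn.utils._array_api` (a sketch; exact behavior may vary by version):

```python
import numpy as np

from sklearn.utils._array_api import (
    _find_matching_floating_dtype,
    _max_precision_float_dtype,
    get_namespace_and_device,
)

X = np.asarray([[0.0, 1.0], [1.0, 0.0]], dtype=np.float32)
labels = np.asarray([0, 1])  # integer labels carry no floating dtype

xp, _, device_ = get_namespace_and_device(X, labels)

# Highest precision the device supports, regardless of input dtypes
# (float64 for NumPy on CPU):
print(_max_precision_float_dtype(xp, device_))

# Common floating dtype of the inputs (float32 here, taken from X):
print(_find_matching_floating_dtype(X, labels, xp=xp))
```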
Yes, we could have better consistency around the use of these two.
From my understanding, `_max_precision_float_dtype` is used when we require the highest precision, e.g. here:

scikit-learn/sklearn/metrics/_ranking.py, lines 969 to 974 at de38166:
```python
# Perform the weighted cumulative sum using float64 precision when possible
# to avoid numerical stability problem with tens of millions of very noisy
# predictions:
# https://github.com/scikit-learn/scikit-learn/issues/31533#issuecomment-2967062437
y_true = xp.astype(y_true, max_float_dtype)
tps = xp.cumulative_sum(y_true * weight, dtype=max_float_dtype)[threshold_idxs]
```
`cumulative_sum` is notorious for floating-point precision errors, so it's best to use the highest precision there.
I was just wondering if `intra_dists` had a particular requirement for higher precision.
Probably not worth worrying about here (I think?)
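As a side note, a tiny NumPy illustration (not from the thread) of the drift a plain float32 cumulative sum can accumulate:

```python
import numpy as np

x = np.full(10_000_000, 0.1, dtype=np.float32)

# Sequential float32 accumulation picks up rounding error at every step;
# the same sum carried out in float64 stays essentially exact (~1,000,000).
print(np.cumsum(x)[-1])
print(np.cumsum(x.astype(np.float64))[-1])
```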
Reference Issues/PRs
Towards #26024
What does this implement/fix? Explain your changes.
Add array API support to `davies_bouldin_score`.

Any other comments?
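For reference, a sketch of how the new dispatch path would typically be exercised (using `array_api_strict` as a stand-in for a GPU array library; this follows scikit-learn's general array API usage pattern, not code from this PR):

```python
import array_api_strict as xp

from sklearn import config_context
from sklearn.metrics import davies_bouldin_score

X = xp.asarray([[0.0, 1.0], [0.1, 1.1], [5.0, 5.0], [5.1, 4.9]])
labels = xp.asarray([0, 0, 1, 1])

# With dispatch enabled, the metric computes through the input's array
# namespace instead of converting to NumPy first.
with config_context(array_api_dispatch=True):
    score = davies_bouldin_score(X, labels)

print(float(score))
```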