
Conversation

@jaffourt
Contributor

Reference Issues/PRs

Towards #26024

What does this implement/fix? Explain your changes.

Add array API support to davies_bouldin_score

Any other comments?
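In practice, this lets davies_bouldin_score run directly on array API inputs such as PyTorch tensors. A minimal usage sketch, assuming a scikit-learn build that includes this change plus PyTorch and array-api-compat installed (illustration only, not the PR's test code):

import torch

from sklearn import config_context
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

# Build a small clustered dataset in NumPy, then move it into the torch namespace.
X_np, labels_np = make_blobs(n_samples=200, centers=3, random_state=0)
X = torch.asarray(X_np)
labels = torch.asarray(labels_np)

with config_context(array_api_dispatch=True):
    # With dispatch enabled, the metric is computed with the torch namespace
    # (and stays on the tensors' device) instead of converting to NumPy first.
    score = davies_bouldin_score(X, labels)

print(float(score))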


github-actions bot commented Nov 11, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 85805f9.

@lucyleeow lucyleeow moved this to In Progress in Array API Nov 12, 2025
@github-actions github-actions bot removed the CUDA CI label Nov 25, 2025
@virchan virchan (Member) left a comment

LGTM! Thanks, @jaffourt!

CUDA CI is green as well. @lucyleeow, @OmarManzoor, friendly ping: would you like to have a look?

@virchan virchan added the Waiting for Second Reviewer First reviewer is done, need a second one! label Nov 25, 2025
@OmarManzoor OmarManzoor (Contributor) left a comment

LGTM. Thanks @jaffourt

@OmarManzoor OmarManzoor merged commit 3a27f28 into scikit-learn:main Nov 25, 2025
52 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in Array API Nov 25, 2025

intra_dists = np.zeros(n_labels)
centroids = np.zeros((n_labels, len(X[0])), dtype=float)
dtype = _max_precision_float_dtype(xp, device_)
Member

Just a quick question, why _max_precision_float_dtype and not _find_matching_floating_dtype here?

Contributor Author

It's a good question and I don't have a solid answer. I have seen a number of different approaches for finding a common floating dtype in array API compliant calculations, and it would be nice to have a more defined heuristic if needed :)

In this case, my reasoning for _max_precision_float_dtype was that _find_matching_floating_dtype finds a common float type between several arrays, but in this implementation we are building new floating arrays for intermediate calculations.

I.e., does it make sense to use the floating dtype returned from _find_matching_floating_dtype(x, labels)?
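For reference, the pattern being described looks roughly like this, using scikit-learn's private array API helpers get_namespace_and_device and _max_precision_float_dtype (a sketch of the approach with a made-up helper name, not the exact diff):

from sklearn.utils._array_api import _max_precision_float_dtype, get_namespace_and_device

def _allocate_buffers(X, labels, n_labels):
    # Hypothetical helper for illustration. Infer the namespace and device from
    # the inputs, then build the new intermediate arrays in the widest float
    # dtype the device supports (float64 where available, otherwise float32).
    xp, _, device_ = get_namespace_and_device(X, labels)
    dtype = _max_precision_float_dtype(xp, device_)
    intra_dists = xp.zeros(n_labels, dtype=dtype, device=device_)
    centroids = xp.zeros((n_labels, X.shape[1]), dtype=dtype, device=device_)
    return intra_dists, centroids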

Member

Yes, we could have better consistency around the use of these two.

From my understanding, _max_precision_float_dtype is used when we require the highest precision e.g. here:

# Perform the weighted cumulative sum using float64 precision when possible
# to avoid numerical stability problem with tens of millions of very noisy
# predictions:
# https://github.com/scikit-learn/scikit-learn/issues/31533#issuecomment-2967062437
y_true = xp.astype(y_true, max_float_dtype)
tps = xp.cumulative_sum(y_true * weight, dtype=max_float_dtype)[threshold_idxs]

cumulative_sum is notorious for floating point precision error, so it's best to use the highest precision.

I was just wondering if intra_dists had a particular requirement for higher precision.

Probably not worth worrying about here (I think?)
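For reference, a self-contained NumPy illustration of that failure mode (not scikit-learn code): a sequential float32 cumulative sum stops making progress once the running total is so large that each increment rounds away.

import numpy as np

ones = np.ones(20_000_000, dtype=np.float32)
# In float32 the running sum stalls at 2**24 = 16777216, because adding 1.0 to
# a value that large rounds back to the same float32 number.
print(np.cumsum(ones)[-1])                    # 16777216.0
print(np.cumsum(ones, dtype=np.float64)[-1])  # 20000000.0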


Labels

Array API, module:metrics, Waiting for Second Reviewer (First reviewer is done, need a second one!)
