ENH Added dtype preservation to Birch #22968

Merged: 25 commits, May 2, 2022

Micky774
Contributor

Reference Issues/PRs

Addresses #11000

What does this implement/fix? Explain your changes.

Added dtype preservation to Birch

Any other comments?

Member

@thomasjpfan thomasjpfan left a comment

Thank you for the PR!

@Micky774 Micky774 requested a review from thomasjpfan March 27, 2022 23:12
@jeremiedbb
Member

@Micky774 can you run a quick benchmark comparing your PR and main on float32, to be sure it does not introduce any performance regression?

@Micky774
Contributor Author

Micky774 commented Apr 1, 2022

@jeremiedbb Benchmarking with:

# %%
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=1000)
brc = Birch(n_clusters=None)

# %%
%timeit brc.fit_transform(X)

branch: 45.1 ms ± 983 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
main: 44.5 ms ± 1.87 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

No (visible) performance regression.
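
Since the request was about float32 input and `make_blobs` returns float64 arrays, a variant of the benchmark that casts `X` explicitly may be useful. The following is only a sketch, not what was run above: the helper name `time_fit_transform`, the repeat counts, and the fixed `random_state` are assumptions made for illustration, and it uses the standard `timeit` module in place of the `%timeit` magic so it runs as a plain script.

```python
import timeit

import numpy as np
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

# Same data shape as the benchmark above; random_state is fixed here
# (an assumption) so that both dtypes see identical samples.
X64, _ = make_blobs(n_samples=1000, random_state=0)
X32 = X64.astype(np.float32)


def time_fit_transform(X, repeat=3, number=5):
    """Return the best observed time (seconds) for one fit_transform call."""
    brc = Birch(n_clusters=None)
    timings = timeit.repeat(lambda: brc.fit_transform(X), repeat=repeat, number=number)
    return min(timings) / number


for name, X in (("float64", X64), ("float32", X32)):
    print(f"{name}: {time_fit_transform(X) * 1e3:.1f} ms per call")
```

Running this once on the PR branch and once on main gives a float32 comparison alongside the float64 one.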

Member

@thomasjpfan thomasjpfan left a comment

Please add a changelog entry in doc/whats_new/v1.1.rst

Member

@thomasjpfan thomasjpfan left a comment

LGTM

- Replaced `_CFNode.dtype` with `_CFNode.init_centroids.dtype`
Member

@jeremiedbb jeremiedbb left a comment

Thanks @Micky774. Please add two tests: one that checks that the results of `transform` are close (rtol=1e-4) between float32 and float64 input, and one that checks that the `subcluster_centers_` attribute has the same dtype as X (parametrized over float32 and float64).

@Micky774
Contributor Author

@jeremiedbb Wanted to check if you had any other feedback on this PR -- thanks :)

Member

@jeremiedbb jeremiedbb left a comment

Just a few minor comments, otherwise LGTM.

@jeremiedbb jeremiedbb added this to the 1.2 milestone Apr 29, 2022
@jeremiedbb jeremiedbb merged commit c414538 into scikit-learn:main May 2, 2022
@jeremiedbb
Member

Thanks @Micky774!

3 participants