ENH Add dtype preservation for Isomap #24714

Merged: 7 commits merged into scikit-learn:main on Oct 27, 2022

Conversation

rprkh
Contributor

@rprkh rprkh commented Oct 21, 2022

Reference Issues/PRs

Towards #11000

What does this implement/fix? Explain your changes.

Isomap uses float32 if the input is float32, otherwise it uses float64.

Any other comments?

The common test now passes: pytest sklearn/tests/test_common.py -k "Isomap and check_transformer_preserve_dtypes" -v

Benchmark as per #11000 (comment):

import timeit
import warnings

import numpy as np
from sklearn import datasets  # provides make_blobs
from sklearn.manifold import Isomap

warnings.filterwarnings('ignore')

X, y = datasets.make_blobs(n_features=4, n_samples=4000, random_state=42)
iso = Isomap(n_neighbors=3)

# Time fit_transform on float32 input.
start1 = timeit.default_timer()
result = iso.fit_transform(X.astype(np.float32, copy=False), y)
end1 = timeit.default_timer()

# Time fit_transform on float64 input.
start2 = timeit.default_timer()
result = iso.fit_transform(X.astype(np.float64, copy=False), y)
end2 = timeit.default_timer()

# The first line printed is the float32 timing, the second the float64 timing.
print('Time taken: {} milliseconds'.format((end1 - start1) * 1000))
print('Time taken: {} milliseconds'.format((end2 - start2) * 1000))

Output:

Time taken: 6423.5296 milliseconds
Time taken: 6418.7789999999995 milliseconds

Comment on lines 302 to 303
if np.array(X).dtype == np.float32:
    G = G.astype(X.dtype, copy=False)
Member

I think this isn't needed. With G = self.dist_matrix_**2 and self.dist_matrix_ already of type float32, I think G will also be float32.
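The reviewer's point can be checked with plain NumPy. This is a minimal sketch with made-up values; `dist_matrix` stands in for `self.dist_matrix_` and is not taken from the PR.

```python
import numpy as np

# Stand-in for self.dist_matrix_ after it has been cast to float32.
dist_matrix = np.array([[0.0, 1.5], [1.5, 0.0]], dtype=np.float32)

# Squaring with a Python integer exponent keeps the array's dtype.
G = dist_matrix ** 2

print(G.dtype)
```

Because the exponent is a Python scalar, NumPy keeps the array's own dtype for the result, so no extra cast of `G` is required.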

Member

@betatim is right and this can be removed.

Member

Suggested change (remove these lines):
- if np.array(X).dtype == np.float32:
-     G = G.astype(X.dtype, copy=False)

Member

@jjerphan jjerphan left a comment

This LGTM modulo a few adaptations. Thank you, @rprkh!

Comment on lines 297 to 298
if np.array(X).dtype == np.float32:
    self.dist_matrix_ = self.dist_matrix_.astype(X.dtype, copy=False)
Member

With copy=False, you can remove this test as only np.float{32,64} are accepted.

Suggested change
- if np.array(X).dtype == np.float32:
-     self.dist_matrix_ = self.dist_matrix_.astype(X.dtype, copy=False)
+ self.dist_matrix_ = self.dist_matrix_.astype(X.dtype, copy=False)
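The suggestion relies on how `ndarray.astype` behaves with `copy=False`: when the array already has the requested dtype, NumPy returns the input array itself instead of copying it. A small sketch with illustrative array names:

```python
import numpy as np

a64 = np.zeros((3, 3), dtype=np.float64)

# Same dtype requested: with copy=False the very same array object comes back.
same = a64.astype(np.float64, copy=False)

# Different dtype requested: a new, converted array is created.
converted = a64.astype(np.float32, copy=False)

print(same is a64)
print(converted is a64, converted.dtype)
```

An unconditional cast therefore costs nothing when the dtypes already match. This only illustrates the NumPy behavior behind the suggestion, not whether the unconditional cast is appropriate for every input Isomap accepts.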

Member

(Ideally, scipy.sparse.shortest_path could support float32, I need to look into it.)
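To make the remark concrete, the sketch below runs `scipy.sparse.csgraph.shortest_path` on a tiny float32 graph and then casts the result back, which is the workaround this PR applies to `dist_matrix_`. The three-node graph and the variable names are assumptions for illustration. The dtype SciPy returns is printed rather than asserted, since it may depend on the SciPy version.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

# Tiny undirected chain graph 0 - 1 - 2 with float32 edge weights.
weights = np.array([[0, 1, 0], [1, 0, 2], [0, 2, 0]], dtype=np.float32)
graph = csr_matrix(weights)

# All-pairs shortest paths using Dijkstra's algorithm.
dist = shortest_path(graph, method="D", directed=False)
print(dist.dtype)  # dtype chosen by SciPy for the result

# Cast back to the input dtype, as the PR does for float32 input.
dist32 = dist.astype(np.float32, copy=False)
print(dist32.dtype)
```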

Contributor Author

I included all the changes except this one. pytest sklearn/tests/test_common.py -v -k Isomap fails 2 tests if I make this change.

@jjerphan jjerphan added the Quick Review For PRs that are quick to review label Oct 21, 2022
@rprkh
Contributor Author

rprkh commented Oct 22, 2022

I'm not entirely sure why macOS is failing when the checks are green for everything else.

@jjerphan
Member

Sporadic connection errors happen. Looking at the logs:

ERROR:root:HTTP errors are often intermittent, and a simple retry will get you on your way.

In such cases, the CI can be re-triggered by pushing an empty commit or by merging main into the branch (the preferred approach).

@glemaitre glemaitre self-requested a review October 26, 2022 12:29
Member

@glemaitre glemaitre left a comment

We are not fully preserving the dtype, but I don't think that we can do better in this PR. We should first make sure that NearestNeighbors preserves the dtype, so that the intermediate attribute is float32. We could then reuse that information, instead of self.nbrs_._fit_X.dtype, to cast dist_matrix_.

But for the moment, I think that we can go ahead.

@@ -403,6 +403,9 @@ Changelog
`eigen_tol="auto"` in version 1.3.
:pr:`23210` by :user:`Meekail Zain <micky774>`.

- |Enhancement| :class:`manifold.Isomap` now preserves
Member

Suggested change
- |Enhancement| :class:`manifold.Isomap` now preserves
- |Enhancement| :class:`manifold.Isomap` now preserves

Comment on lines 297 to 298
if np.array(X).dtype == np.float32:
    self.dist_matrix_ = self.dist_matrix_.astype(X.dtype, copy=False)
Member

Suggested change
- if np.array(X).dtype == np.float32:
-     self.dist_matrix_ = self.dist_matrix_.astype(X.dtype, copy=False)
+ if self.nbrs_._fit_X.dtype == np.float32:
+     self.dist_matrix_ = self.dist_matrix_.astype(self.nbrs_._fit_X.dtype, copy=False)

Member

We can avoid the conversion of X using the validated X stored by NearestNeighbors.
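The guarded cast in this suggestion can be sketched outside of scikit-learn. `cast_to_fit_dtype` is a hypothetical helper, not part of the library, and `fit_X` stands in for the validated array that NearestNeighbors stores as `_fit_X`.

```python
import numpy as np


def cast_to_fit_dtype(dist_matrix, fit_X):
    """Hypothetical helper mirroring the suggested condition."""
    # Only float32 input triggers a cast; everything else is left untouched.
    if fit_X.dtype == np.float32:
        return dist_matrix.astype(fit_X.dtype, copy=False)
    return dist_matrix


dist = np.ones((2, 2), dtype=np.float64)

out32 = cast_to_fit_dtype(dist, np.zeros((2, 3), dtype=np.float32))
out64 = cast_to_fit_dtype(dist, np.zeros((2, 3), dtype=np.float64))
out_int = cast_to_fit_dtype(dist, np.zeros((2, 3), dtype=np.int64))

print(out32.dtype, out64.dtype, out_int.dtype)
```

Reading the dtype from an array that has already been validated avoids building a second array with `np.array(X)` just to inspect its dtype. In this sketch the explicit float32 guard also leaves the distances as float64 when the fitted data is not floating point.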

@rprkh
Contributor Author

rprkh commented Oct 26, 2022

Included the suggested changes.

@glemaitre glemaitre merged commit 53234c5 into scikit-learn:main Oct 27, 2022
@rprkh rprkh deleted the preserve_dtype_for_isomap branch October 27, 2022 16:58
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Oct 31, 2022
andportnoy pushed a commit to andportnoy/scikit-learn that referenced this pull request Nov 5, 2022
Labels
module:manifold Quick Review For PRs that are quick to review