ENH Add dtype preservation for Isomap #24714
sklearn/manifold/_isomap.py
Outdated
if np.array(X).dtype == np.float32:
    G = G.astype(X.dtype, copy=False)
I think this isn't needed. With `G = self.dist_matrix_**2` and `self.dist_matrix_` already of type `float32`, I think `G` will also be `float32`.
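The dtype claim above can be checked with a minimal NumPy sketch. The array values are made up for illustration, and `dist_matrix` is only a stand-in for `self.dist_matrix_`:

```python
import numpy as np

# Stand-in for Isomap's geodesic distance matrix (values are made up).
dist_matrix = np.array([[0.0, 1.5], [1.5, 0.0]], dtype=np.float32)

# Squaring with a Python integer exponent keeps the array's dtype.
G = dist_matrix**2
print(G.dtype)  # float32

# Contrast: combining with a float64 array does upcast.
upcast = dist_matrix * np.ones(2, dtype=np.float64)
print(upcast.dtype)  # float64
```

Because the square stays `float32`, an extra `astype` call on `G` would be a no-op.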
@betatim is right and this can be removed.
- if np.array(X).dtype == np.float32:
-     G = G.astype(X.dtype, copy=False)
This LGTM modulo a few adaptations. Thank you, @rprkh!
sklearn/manifold/_isomap.py
Outdated
if np.array(X).dtype == np.float32:
    self.dist_matrix_ = self.dist_matrix_.astype(X.dtype, copy=False)
With `copy=False`, you can remove this test as only `np.float{32,64}` are accepted.
- if np.array(X).dtype == np.float32:
-     self.dist_matrix_ = self.dist_matrix_.astype(X.dtype, copy=False)
+ self.dist_matrix_ = self.dist_matrix_.astype(X.dtype, copy=False)
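The reasoning behind dropping the `if` can be illustrated with plain NumPy: with `copy=False`, `astype` returns the input array itself when the dtype already matches, so an unconditional cast costs nothing for `float64` input. A small sketch with a made-up array:

```python
import numpy as np

dist64 = np.zeros((3, 3), dtype=np.float64)

# Same dtype requested: no copy is made, the very same array comes back.
same = dist64.astype(np.float64, copy=False)
print(same is dist64)  # True

# Different dtype requested: a new float32 array is allocated.
cast = dist64.astype(np.float32, copy=False)
print(cast.dtype, cast is dist64)  # float32 False
```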
(Ideally, `scipy.sparse.shortest_path` could support float32; I need to look into it.)
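For context, the function is importable as `scipy.sparse.csgraph.shortest_path`. The sketch below, on a made-up three-node graph, prints the dtype it returns for `float32` input and then casts the result back to the input dtype, which is the workaround this PR applies to `dist_matrix_`. At the time of this review the returned matrix was not `float32`; treat the printed dtype as version-dependent.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

# Small undirected graph as a dense float32 matrix; 0 means "no edge".
graph = np.array(
    [[0, 1, 0],
     [1, 0, 2],
     [0, 2, 0]],
    dtype=np.float32,
)

# All-pairs shortest path lengths.
dist = shortest_path(graph, directed=False)
print(graph.dtype, dist.dtype)

# Cast back to the input dtype; a no-op if the dtypes already match.
dist32 = dist.astype(graph.dtype, copy=False)
print(dist32.dtype)
```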
I included all the changes except this one. `pytest sklearn/tests/test_common.py -v -k Isomap` fails 2 tests if I make this change.
I'm not entirely sure why macOS is failing when the checks are green for everything else.
Sporadic connection errors happen. Looking at the logs:
In such cases, the CI can be re-triggered by pushing an empty commit or by merging
We are not totally preserving the `dtype`, but I don't think that we can do better in this PR. We should make sure that `NearestNeighbors` preserves the dtype, so that the intermediate attribute will be `float32`. We could then reuse this information instead of `self.nbrs_._fit_X.dtype` to cast `dist_matrix_`.
But for the moment, I think that we can go ahead.
doc/whats_new/v1.2.rst
Outdated
@@ -403,6 +403,9 @@ Changelog
`eigen_tol="auto"` in version 1.3.
:pr:`23210` by :user:`Meekail Zain <micky774>`.

- |Enhancement| :class:`manifold.Isomap` now preserves
-- |Enhancement| :class:`manifold.Isomap` now preserves
+- |Enhancement| :class:`manifold.Isomap` now preserves
sklearn/manifold/_isomap.py
Outdated
if np.array(X).dtype == np.float32:
    self.dist_matrix_ = self.dist_matrix_.astype(X.dtype, copy=False)
- if np.array(X).dtype == np.float32:
-     self.dist_matrix_ = self.dist_matrix_.astype(X.dtype, copy=False)
+ if self.nbrs_._fit_X.dtype == np.float32:
+     self.dist_matrix_ = self.dist_matrix_.astype(self.nbrs_._fit_X.dtype, copy=False)
We can avoid the conversion of `X` using the validated `X` stored by `NearestNeighbors`.
…scikit-learn into preserve_dtype_for_isomap
Included the suggested changes.
Co-authored-by: Julien Jerphanion <[email protected]>
Reference Issues/PRs
Towards #11000
What does this implement/fix? Explain your changes.
Isomap uses `float32` if the input is `float32`, otherwise it uses `float64`.
Any other comments?
Test now passes
pytest sklearn/tests/test_common.py -k "Isomap and check_transformer_preserve_dtypes" -v
Benchmark as per #11000 (comment):
Output: