[MRG] Added inverse_transform for pls base object and test #15304
ping @NicolasHug @adrinjalali, since I have seen you both touch the _pls files recently.
adrinjalali
left a comment
Some nitpicks, otherwise looks good to me, thanks @jiwidi
    Notes
    This transformation will only be exact if n_components=n_features
    -----
    Notes
    -----
    This transformation will only be exact if n_components=n_features
sklearn/cross_decomposition/_pls_.py
Outdated
    check_is_fitted(self)
    X = check_array(X, copy=False, dtype=FLOAT_DTYPES)
    # From pls space to original space
    X_original = np.matmul(X, self.x_loadings_.T)
    - X_original = np.matmul(X, self.x_loadings_.T)
    + np.matmul(X, self.x_loadings_.T, out=X)
this way you can do it in place.
You should also add a test for this case.
So I came up with this:

    def inverse_transform(self, X, copy=True):
        check_is_fitted(self)
        X = check_array(X, copy=copy, dtype=FLOAT_DTYPES)
        # From pls space to original space
        np.matmul(X, self.x_loadings_.T, out=X)
        # Denormalize
        X *= self.x_std_
        X += self.x_mean_
        return X

And the extra test:

    # Check inplace usage
    plsca.inverse_transform(Xr, copy=False)
    assert_array_almost_equal(Xr, X,
                              err_msg="inverse_transform failed")

I tested it and it works in both modes, in place and with a copy. Just checking this is what you meant before committing it :)
yeah this looks good to me, I think.
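One caveat with the in-place variant can be illustrated with a minimal NumPy-only sketch. `np.matmul` requires `out` to already have the shape of the result, so writing the product back into the scores array can only work when n_components equals n_features. The arrays below are random stand-ins (`loadings` plays the role of `x_loadings_`), not a fitted model.

```python
import numpy as np

rng = np.random.RandomState(0)
n_samples, n_components, n_features = 5, 2, 3

# Random stand-ins: scores in PLS space, and a loadings matrix with the
# (n_features, n_components) layout of x_loadings_.
scores = rng.normal(size=(n_samples, n_components))
loadings = rng.normal(size=(n_features, n_components))

# Out-of-place product: always valid, shape (n_samples, n_features).
X_rec = np.matmul(scores, loadings.T)

# In-place product: `out` has shape (n_samples, n_components), which does
# not match the result when n_components != n_features.
try:
    np.matmul(scores, loadings.T, out=scores)
    inplace_ok = True
except ValueError:
    inplace_ok = False

print(X_rec.shape, inplace_ok)
```

The in-place test in this thread passes only because it uses as many components as features.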
@adrinjalali newbie question here. All of the latest commits have failed the tests, though I cannot reproduce the error locally (they pass here). Is there an easier way to debug this than committing and waiting for the tests to run every time? Thanks!
The failing tests are all on python 3.5. You should create a similar environment and see why they fail. You can see the environment on top of the failing tests. For example:
@adrinjalali I have been thinking about the in-place support for this function. It behaves differently from the in-place mode of transform(), where with copy=False the only change reflected in X is the normalization:

    def transform(self, X, Y=None, copy=True):
        """Apply the dimension reduction learned on the train data.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Training vectors, where n_samples is the number of samples and
            n_features is the number of predictors.
        Y : array-like of shape (n_samples, n_targets)
            Target vectors, where n_samples is the number of samples and
            n_targets is the number of response variables.
        copy : boolean, default True
            Whether to copy X and Y, or perform in-place normalization.

        Returns
        -------
        x_scores if Y is not given, (x_scores, y_scores) otherwise.
        """
        check_is_fitted(self)
        X = check_array(X, copy=copy, dtype=FLOAT_DTYPES)
        # Normalize
        X -= self.x_mean_
        X /= self.x_std_
        # Apply rotation
        x_scores = np.dot(X, self.x_rotations_)
        if Y is not None:
            Y = check_array(Y, ensure_2d=False, copy=copy, dtype=FLOAT_DTYPES)
            if Y.ndim == 1:
                Y = Y.reshape(-1, 1)
            Y -= self.y_mean_
            Y /= self.y_std_
            y_scores = np.dot(Y, self.y_rotations_)
            return x_scores, y_scores
        return x_scores

In inverse_transform, the denormalization is applied to the result of the matrix product and not to the X parameter:

    def inverse_transform(self, X, copy=True):
        """Transform data back to its original space.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_components)
            New data, where n_samples is the number of samples
            and n_components is the number of pls components.
        copy : bool, default=True
            Whether to copy X, or perform in-place normalization.

        Returns
        -------
        X_original array-like of shape (n_samples, n_features)

        Notes
        -----
        This transformation will only be exact if n_components=n_features
        """
        check_is_fitted(self)
        X = check_array(X, copy=copy, dtype=FLOAT_DTYPES)
        # From pls space to original space
        np.matmul(X, self.x_loadings_.T, out=X)
        # Denormalize
        X *= self.x_std_
        X += self.x_mean_
        return X

By contrast, PCA's inverse_transform does not seem to support in-place operation. It doesn't feel natural to support it here if other decomposition methods don't. What do you think? Or do you see value in this in-place feature? (I'd be happy to work on a PR to add it for PCA.)

Coming back to the tests, I was able to reproduce the test failure. This is happening because [...] With numpy==1.17 the tests pass, as this bug was fixed. Is there a specific reason why numpy==1.11 is being used? Should I push for a solution that works around that bug, or drop the in-place feature? I believe I have found a way to bypass this bug: simply hard-copying X.

And thanks again for the feedback, I'm learning some new stuff with this PR :)
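The difference in semantics described above can be sketched with plain NumPy. The helpers `normalize_then_project` and `project_back` are hypothetical stand-ins for the two methods, and the matrices are random rather than taken from a fitted model. With `copy=False` the forward step mutates its input only through the normalization and returns the scores as a new array; an out-of-place backward step leaves its input untouched.

```python
import numpy as np

def normalize_then_project(X, x_mean, x_std, x_rotations, copy=True):
    # Mirrors the quoted transform(): normalization happens on X itself
    # when copy=False; the projection always allocates a new array.
    X = X.copy() if copy else X
    X -= x_mean
    X /= x_std
    return np.dot(X, x_rotations)

def project_back(scores, x_mean, x_std, x_loadings):
    # Out-of-place backward step: new array, then denormalize that array.
    X_rec = np.matmul(scores, x_loadings.T)
    X_rec *= x_std
    X_rec += x_mean
    return X_rec

rng = np.random.RandomState(0)
X = rng.normal(size=(4, 3))
X_before = X.copy()
x_mean, x_std = X.mean(axis=0), X.std(axis=0, ddof=1)
rotations = rng.normal(size=(3, 2))  # hypothetical x_rotations_
loadings = rng.normal(size=(3, 2))   # hypothetical x_loadings_

scores = normalize_then_project(X, x_mean, x_std, rotations, copy=False)
scores_before = scores.copy()
X_rec = project_back(scores, x_mean, x_std, loadings)
```

Here `X` ends up normalized in place, while `scores` is unchanged by `project_back`, so the backward step has no in-place effect for a copy flag to control.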
Numpy 1.11 is the oldest numpy we support. We'll be dropping support for that version soon, but not yet. As you suggest, I recommend you remove the |
This will mean dropping inplace support as well. There will be no reason to keep the |
sklearn/cross_decomposition/_pls_.py
Outdated
      """
      check_is_fitted(self)
    - X = check_array(X, copy=copy, dtype=FLOAT_DTYPES)
    + X = check_array(X, copy=True, dtype=FLOAT_DTYPES)
you don't need to copy here. Just leave the copy as default.
Does it make sense to accept the copy parameter even if it will not affect the parameter X? I was thinking of removing copy from the function's argument list and only accepting X (always copying by default).
Whether copy is true or false will not affect the execution or leave any trace on the parameters.
yes that's what I meant.
@adrinjalali I believe the last commit should be the ready-to-merge version.
adrinjalali
left a comment
Otherwise LGTM. You also need a whats_new entry for this one.
Co-Authored-By: Adrin Jalali <[email protected]>
You mean a whats_new entry in the PR text? Doesn't the "What does this implement/fix? Explain your changes." section cover that?
Sorry for not being clear in my previous comment. I meant an entry in the |
…n into pls_inverse-transform
No problem. It should be ready to merge now :)
doc/whats_new/v0.22.rst
Outdated
    - |Feature| :class:`cross_decomposition._PLS` Has a new function :func:`cross_decomposition._PLS.inverse_transform` to transform
      data to the original space`. :pr:`15304` by :user:`Jaime Ferrando Huertas <jiwidi>`.
Please keep the line length <80.
Also, we should point to the public API here, i.e. list the classes which inherit from _PLS instead.
something like this maybe?
- |Feature| :class:`cross_decomposition.PLSCanonical` and
:class:`cross_decomposition.PLSRegression`
have a new function `inverse_transform`
:func:`cross_decomposition.PLSCanonical.inverse_transform`
:func:`cross_decomposition.PLSRegression.inverse_transform`
to transform data to the original space`.
:pr:`15304` by :user:`Jaime Ferrando Huertas <jiwidi>`.
Yeah, I'd do something like:
- |Feature| :class:`cross_decomposition.PLSCanonical` and
:class:`cross_decomposition.PLSRegression` have a new function
``inverse_transform`` to transform data to the original space`.
:pr:`15304` by :user:`Jaime Ferrando Huertas <jiwidi>`.
adrinjalali
left a comment
Thanks @jiwidi for your prompt responses. Now we need to wait for another maintainer to do a second review, and hopefully it'll get in soon :)
Thank you for all the feedback! :) |
jnothman
left a comment
Otherwise LGTM!
sklearn/cross_decomposition/_pls_.py
Outdated
    Returns
    -------
    x_reconstructed array-like of shape (n_samples, n_features)
    - x_reconstructed array-like of shape (n_samples, n_features)
    + x_reconstructed : array-like of shape (n_samples, n_features)
You removed this colon...
my bad, fixed.
Co-Authored-By: Joel Nothman <[email protected]>
…n into pls_inverse-transform
jnothman
left a comment
I don't think reverting the inclusion of the colon will fix CI... It seems something is wrong with installation at the moment, not your fault
sklearn/utils/_unittest_backport.py
Outdated
    @@ -0,0 +1,224 @@
    """
    This is a backport of assertRaises() and assertRaisesRegex from Python 3.5.4
Why has this been added?
This is my fault, I pulled from origin and messed things up... That's why I tried reverting to the commit before all of this happened.
I removed the files now, sorry for the mess
Reference Issues/PRs
This PR is the same as #15289 but with squashed commits, since I got the docstring syntax wrong in the earlier one.
What does this implement/fix? Explain your changes.
With this PR I'm adding an inverse_transform function to the base _PLS object, along with the corresponding test. This function is already implemented in other decomposition methods like PCA here.
The function transforms back to the original space by multiplying the transformed data with the x_loadings as X_original = np.matmul(X, self.x_loadings_.T) and denormalizes the result with the model std and mean.
This function allows you to transform data back to the original space (this transformation will only be exact if n_components=n_features). This transformation is widely used to compute the Squared Prediction Error of our _PLS model. This metric is well known for its use in industrial scenarios where PLS acts as a statistical model to control processes in which time plays a big role (papers on this: 1, 2).
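As a rough illustration of the two claims above (exact reconstruction when n_components equals n_features, and a squared prediction error otherwise), here is a NumPy-only sketch. It does not fit a PLS model: `W` and `P` are hypothetical weight and loading matrices, the rotations are assumed to be built as W (P^T W)^-1, and `squared_prediction_error` uses one common definition (per-sample sum of squared residuals), which may differ from what a given paper uses.

```python
import numpy as np

def inverse_transform(scores, x_loadings, x_mean, x_std):
    """Map scores back to feature space, then undo the normalization."""
    return np.matmul(scores, x_loadings.T) * x_std + x_mean

def squared_prediction_error(X, X_rec):
    """Per-sample sum of squared reconstruction residuals."""
    return np.sum((X - X_rec) ** 2, axis=1)

rng = np.random.RandomState(0)
n_samples, n_features = 8, 3
X = rng.normal(size=(n_samples, n_features))
x_mean, x_std = X.mean(axis=0), X.std(axis=0, ddof=1)
Xn = (X - x_mean) / x_std

# Hypothetical, well-conditioned weights W and loadings P (not fitted).
W = np.eye(n_features) + 0.1 * rng.normal(size=(n_features, n_features))
P = np.eye(n_features) + 0.1 * rng.normal(size=(n_features, n_features))

def reconstruct(k):
    # Keep the first k components and rebuild X from its scores.
    Wk, Pk = W[:, :k], P[:, :k]
    Rk = Wk @ np.linalg.pinv(Pk.T @ Wk)  # assumed form of the rotations
    return inverse_transform(Xn @ Rk, Pk, x_mean, x_std)

spe_full = squared_prediction_error(X, reconstruct(n_features))
spe_two = squared_prediction_error(X, reconstruct(2))
```

With all three components the residuals vanish up to floating-point error; with two components each sample gets a nonnegative error that a monitoring scheme could track.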
Following the scikit-learn _PLS example, this is how the function should be used:
And an example to showcase the correctness of the function with n_components==n_features:
Any other comments?
I have been developing software for multivariate statistical process control for some time now, and the scikit-learn implementation of PLS is widely used in this field. I always thought _PLS was missing this method while PCA had it, so I decided to contribute it :)