API Standardize X as inverse_transform input parameter #28756
The approach looks reasonable to me. It is based on how the deprecation of For me the big question is if it is worth the effort to do this. What is the problem we are fixing by doing this (except for solving a naming inconsistency)? |
I encountered it while trying to switch a Pipeline out with a StandardScaler and using kwargs.

# Has Xt arg
pipeline = make_pipeline(StandardScaler())
Xt = pipeline.fit_transform(X)
X_again = pipeline.inverse_transform(Xt=Xt)

# inverse_transform takes X instead of Xt
transformer = StandardScaler()
Xt = transformer.fit_transform(X)
X_again = transformer.inverse_transform(Xt=Xt)
# TypeError: inverse_transform() got an unexpected keyword argument 'Xt'

Pipelines might be the most used object, but I'd think having a uniform API with the transformers themselves would be good. What are your thoughts? |
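The mismatch described above can be reproduced without scikit-learn. The sketch below uses two hypothetical stand-in classes (`ToyPipeline` and `ToyScaler` are illustrative names, not scikit-learn classes) whose `inverse_transform` methods differ only in the parameter name. It suggests that a keyword call breaks when one object is swapped for the other, while a positional call works for both.

```python
class ToyPipeline:
    """Hypothetical stand-in whose inverse_transform parameter is named Xt."""

    def inverse_transform(self, Xt):
        return [v * 2 for v in Xt]


class ToyScaler:
    """Hypothetical stand-in whose inverse_transform parameter is named X."""

    def inverse_transform(self, X):
        return [v * 2 for v in X]


def call_with_keyword(obj, data):
    """Return the result, or the TypeError message if the keyword is rejected."""
    try:
        return obj.inverse_transform(Xt=data)
    except TypeError as exc:
        return f"TypeError: {exc}"


data = [1.0, 2.0]
objects = (ToyPipeline(), ToyScaler())

# The keyword call only works for the class that happens to name its parameter Xt.
keyword_results = [call_with_keyword(obj, data) for obj in objects]

# The positional call does not depend on the parameter name.
positional_results = [obj.inverse_transform(data) for obj in objects]
```

Passing the data positionally sidesteps the naming question entirely, which is why only keyword users would notice a rename.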
I agree it is a bit annoying that using/not using a The big question for me is how typical Maybe other people/maintainers have an opinion? @scikit-learn/core-devs |
I'm not really in favor of such a change:
|
Here's the list:
It's true that most estimators use I agree that we should not change for cosmetic reasons. But the inconsistency between most estimators and the main meta-estimators gives a bad user experience that we can improve. I don't know however how many users are concerned, i.e. using kwargs for X. Side note: only the users who pass X as kwarg would actually see the API change and the deprecation warnings and this change aims to improve the experience of these users. It would be transparent for other users. So overall I'm +0.5 here. |
I didn't count but from scrolling through the list of estimators it seems many of them use |
Maybe, if we do anything, we should standardise on X?
That's my feeling
|
mine too at first, but notice that the |
I feel like the impact is minimum because it only impacts users that write We'll still go through a deprecation cycle and the change is fairly simple for users to make. (Make the input positional or use |
Thanks everyone for the feedback! Thanks for making the list @jeremiedbb. Very helpful! Personally, I want to note that the goal is to be backwards compatible. That would mean:
I will fix all of the examples in docstrings if they use a keyword example. However, I don't think any do at first glance. https://github.com/search?q=repo%3Ascikit-learn%2Fscikit-learn%20%22inverse_transform(X%3D%22&type=code I didn't have None for Xt which might have caused some confusion. I've added that with the latest commit! |
+1 for making it consistent. For me, it's not a cosmetic change, but a real flaw. The sooner the fix, the better. |
🎉 If others have the same idea, then I will go through with the rest of the implementation! |
@wd60622 I'm just one voice and @GaelVaroquaux has a different view than me. So it's not decided yet. |
I am not a core dev but regarding the change I am aligned with @GaelVaroquaux. Regarding the impact I agree with @thomasjpfan
A public poll regarding the use of positional or keyword arguments usage in |
That's expensive and I would dedicate such a poll to much more impactful questions. |
@wd60622 I'm just one voice and @GaelVaroquaux has a different view than me. So it's not decided yet.
Consistency is a good thing. Hence given the summary above showing that X and Xt are used, I can be convinced :)
I'm slightly more in favor of converging to X than to Xt.
|
Totally agree. Hence the PR 😄
Is there an issue with the name |
+1. Makes the API simpler. I guess we have reached a decision. |
And that is X? |
X it is! |
sklearn/utils/deprecation.py
Outdated
def _deprecate_Xt_in_inverse_transform(X, Xt):
    """Helper to deprecate the `Xt` argument in favor of `X` in inverse_transform."""
    if X is not None and Xt is not None:
        raise ValueError("Cannot use both X and Xt. Use X only.")
Sorry for being late to the party with my comment. I'd use a TypeError here as well. It seems more consistent with the case below where the user forgot to pass any argument. ValueError makes me think that I passed the wrong value, not that I tried to use an argument I shouldn't.
sklearn/utils/deprecation.py
Outdated
        raise ValueError("Cannot use both X and Xt. Use X only.")

    if X is None and Xt is None:
        raise TypeError("Missing required positional argument: 'X'.")

    if Xt is not None:
        warnings.warn(
            "Xt was renamed X in version 1.5 and will be removed in 1.7.",
Nitpick: can we use 'X' and 'Xt' everywhere instead of mixing X and 'X'?
+1 for not mixing. However, I then prefer bare X because we usually don't put parameter names between quotes in other error/warning messages. Is that OK for you?
So in strings, variables should be surrounded by single quotes? That is, X -> 'X'.
Does this also apply to the docstring changes (if outside of the normal parameter definitions)?
I defer to Jeremie for what the correct formatting is. It sounds like it should be "Do not pass X and Xt at the same time". Not "Do not pass 'X' and 'Xt' at the same time".
Looks good to me. Just two small cosmetic comments. Good to merge after we resolve them.
sklearn/utils/deprecation.py
Outdated
    if X is None and Xt is None:
        raise TypeError("Missing required positional argument: 'X'.")
Does the TypeError make sense here, @betatim?
I'd say so. For me passing an argument or missing a required argument is a TypeError.
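The review fragments above can be pieced together into a small, self-contained sketch. Everything here is an assumption made for illustration: the names `resolve_inverse_transform_input` and `HalvingScaler` are hypothetical, the `FutureWarning` category and the return values are guesses (the thread does not show them), and the messages use bare `X` as the reviewers preferred. It is not the merged scikit-learn code.

```python
import warnings


def resolve_inverse_transform_input(X=None, Xt=None):
    """Hypothetical helper: accept X, or the deprecated Xt with a warning."""
    if X is not None and Xt is not None:
        # TypeError rather than ValueError, as suggested in the review.
        raise TypeError("Cannot use both X and Xt. Use X only.")

    if X is None and Xt is None:
        raise TypeError("Missing required positional argument: X.")

    if Xt is not None:
        # The warning category is an assumption; the thread does not show it.
        warnings.warn(
            "Xt was renamed X in version 1.5 and will be removed in 1.7.",
            FutureWarning,
        )
        return Xt

    return X


class HalvingScaler:
    """Hypothetical transformer: halves on transform, doubles on inverse."""

    def transform(self, X):
        return [v / 2 for v in X]

    def inverse_transform(self, X=None, *, Xt=None):
        # Both parameters default to None so the helper can tell which was passed.
        X = resolve_inverse_transform_input(X, Xt)
        return [v * 2 for v in X]
```

With this shape, `HalvingScaler().inverse_transform(data)` stays silent, `inverse_transform(Xt=data)` still works but warns, and passing both or neither raises `TypeError`.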
Head branch was pushed to by a user without write access
I have made the adjustments based on the latest feedback @betatim @jeremiedbb. Hope they're squashed 😆 |
@wd60622 I think that you misunderstood, the decision was not to put them between quotes. Can you please revert your latest changes, or I can do it if you prefer ? |
Sorry. I did put them between quotes though! Not sure what you mean then. Yeah, you can revert! |
"not to" 😄 . Don't worry, I'll revert. |
c2118a6 to 29d37a0
Reference Issues/PRs
closes #27654
related to #27666
What does this implement/fix? Explain your changes.
This changes all X to Xt in inverse_transform methods. This then makes the API:

Xt = transformer.transform(X)
X = transformer.inverse_transform(Xt)
Any other comments?
Hi @glemaitre . I am finally getting back to this!
It turns out that X is much more common as a parameter than Xt. I've started with a few in order to get some feedback before rinsing and repeating the changes. Please let me know if you have any feedback on the:
One thing I suspect is that tests might have used keyword arguments as well and thus cause warnings. I will switch if that is the case.
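One way to find test calls that still use a renamed keyword is to turn the deprecation warning into an error. The sketch below is a minimal illustration under stated assumptions: `inverse_transform` here is a hypothetical free function, `Xt` is taken as the deprecated name (matching where the discussion ended up), and `FutureWarning` is an assumed category.

```python
import warnings


def inverse_transform(X=None, *, Xt=None):
    """Hypothetical function that warns when the old keyword Xt is used."""
    if Xt is not None:
        warnings.warn("Xt was renamed X.", FutureWarning)
        X = Xt
    return X


def uses_deprecated_keyword(call):
    """Return True if calling `call` emits a FutureWarning."""
    with warnings.catch_warnings():
        # Escalate FutureWarning to an exception only inside this block.
        warnings.simplefilter("error", FutureWarning)
        try:
            call()
        except FutureWarning:
            return True
    return False


old_style = uses_deprecated_keyword(lambda: inverse_transform(Xt=[1, 2]))
new_style = uses_deprecated_keyword(lambda: inverse_transform([1, 2]))
```

Here `old_style` flags the keyword call and `new_style` does not, so the same filter applied across a test suite would surface the calls that need switching.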