FIX ensure consistency of column and feature names in FunctionTransformer #27801
        feature_names_out=feature_names_out, validate=validate
    )
-   transformer.fit_transform(X)
+   transformer.fit(X)
Calling `transform` here would trigger the new check, because the number of columns returned by the function used in this test does not match the feature names. Therefore, we call `fit` instead to avoid this check.
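To illustrate why replacing `fit_transform` with `fit` sidesteps the check, here is a minimal, self-contained sketch. `ToyFunctionTransformer` and `drop_b` are hypothetical stand-ins, not scikit-learn code: the data is a plain dict of columns, `feature_names_out` is simplified to a callable taking only the input names, and the sketch assumes the consistency check lives in `transform` while `fit` never calls `func`.

```python
class ToyFunctionTransformer:
    """Hypothetical stand-in for FunctionTransformer (not the scikit-learn class).

    `fit` only records the input column names; `transform` applies `func`
    and compares the output columns with `get_feature_names_out()`.
    """

    def __init__(self, func, feature_names_out):
        self.func = func
        # Simplification: a callable mapping input names -> output names.
        self.feature_names_out = feature_names_out

    def fit(self, X):
        # Assumption: X is a dict mapping column name -> list of values.
        self.feature_names_in_ = list(X)
        return self

    def get_feature_names_out(self):
        return list(self.feature_names_out(self.feature_names_in_))

    def transform(self, X):
        out = self.func(X)
        # The consistency check lives here, so only transform can trigger it.
        if list(out) != self.get_feature_names_out():
            raise ValueError(
                f"func produced columns {list(out)} but get_feature_names_out "
                f"returned {self.get_feature_names_out()}"
            )
        return out

    def fit_transform(self, X):
        return self.fit(X).transform(X)


def drop_b(X):
    # Hypothetical func that returns one column fewer than its input.
    return {k: v for k, v in X.items() if k != "b"}


X = {"a": [1, 2], "b": [3, 4]}
t = ToyFunctionTransformer(drop_b, feature_names_out=lambda names: names)
t.fit(X)  # succeeds: func is never called, so no check runs
try:
    t.fit_transform(X)  # transform runs func and the check fires
except ValueError as exc:
    print(exc)
```

In this sketch `fit` succeeds even though `func` and `feature_names_out` disagree, while `fit_transform` raises, which mirrors the reasoning for changing the test.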
So here, we only raise a better error message. There is no magic, but we explain what to do. I am not a big fan of a magical solution, and I am not sure that we would be able to return the expected type (NumPy vs. pandas) anyway, since it will depend on what
thomasjpfan left a comment
LGTM
lorentzenchr left a comment
LGTM
Needs sync with main before merging.
Thanks @lesteve for syncing the PR. Merging with the two approvals above.
closes #27695
Raise an explicit error when the column names of the container returned by `transform` are not consistent with the output of `get_feature_names_out` in `FunctionTransformer`. In #27695, the error raised is hard to understand when the `FunctionTransformer` is embedded within a `Pipeline`. Here, we also suggest how to resolve the problem.
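The kind of check described above can be sketched as a small standalone function. `check_output_columns` is a hypothetical helper, not a scikit-learn function, and the message wording is illustrative rather than the text scikit-learn actually emits; the point is that the error names both sets of columns and says what to change, so it stays understandable even when the transformer sits deep inside a `Pipeline`.

```python
def check_output_columns(out_columns, feature_names_out):
    """Hypothetical helper: raise if the container's column names disagree
    with the names returned by get_feature_names_out()."""
    out_columns = list(out_columns)
    feature_names_out = list(feature_names_out)
    # Ordered comparison: a reordering is treated as an inconsistency too.
    if out_columns != feature_names_out:
        raise ValueError(
            "The output of `func` has column names that differ from the ones "
            f"returned by `get_feature_names_out`: got {out_columns}, expected "
            f"{feature_names_out}. Either make `func` return columns named "
            "consistently with `feature_names_out`, or change "
            "`feature_names_out` to match what `func` returns."
        )


check_output_columns(["x0", "x1"], ["x0", "x1"])  # consistent: passes silently
try:
    check_output_columns(["x1", "x0"], ["x0", "x1"])
except ValueError as exc:
    print(exc)
```

Raising with an explanation, rather than silently renaming the columns, matches the "no magic" stance taken in the review discussion.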
Some tests in our test suite are failing, and I need to check whether they are legitimate. Some of the failures come from the fact that `feature_names_out` returns fewer names than the number of columns in `X_trans`.
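When triaging failures like these, it can help to separate a count mismatch from a pure naming mismatch. The sketch below is a hypothetical diagnostic (`describe_name_mismatch` is not part of scikit-learn) that compares the number of names from `feature_names_out` with the number of columns in `X_trans` and reports which side is short.

```python
def describe_name_mismatch(n_columns, feature_names_out):
    """Hypothetical diagnostic: return a message when the number of names
    from feature_names_out differs from the number of columns in X_trans,
    or None when the counts agree."""
    n_names = len(feature_names_out)
    if n_names == n_columns:
        return None  # counts agree; names may still differ, check separately
    kind = "fewer" if n_names < n_columns else "more"
    return (
        f"feature_names_out returned {n_names} names, {kind} than the "
        f"{n_columns} columns in X_trans"
    )


print(describe_name_mismatch(3, ["a", "b"]))
# → feature_names_out returned 2 names, fewer than the 3 columns in X_trans
```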