Conversation

jeremiedbb
Member

@jeremiedbb jeremiedbb commented Sep 4, 2025

Fixes #32104

@glemaitre or @thomasjpfan maybe ?

github-actions bot commented Sep 4, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: fe0ecb0. Link to the linter CI: here

Co-authored-by: Jérôme Dockès <[email protected]>
@jeremiedbb
Member Author

jeremiedbb commented Sep 5, 2025

Hmm, actually this fix is not enough. When verbose_feature_names_out=False, it doesn't use get_feature_names_out but keeps the current column names instead (i.e. the meaningless intermediate names that I added here).

So a proper fix would be to do something like what the ColumnTransformer does: if verbose_feature_names_out=False, obtain each transformer's output feature names and use those before stacking, and raise an informative error in case of duplicate column names.
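A minimal sketch of the idea described above, in plain Python rather than scikit-learn's actual code. The helper name `combine_feature_names`, its input format, and the error wording are assumptions made for illustration only.

```python
from collections import Counter


def combine_feature_names(names_per_transformer, verbose_feature_names_out=True):
    """Hypothetical helper: merge per-transformer output names into one list.

    names_per_transformer is a list of (transformer_name, feature_names) pairs.
    """
    if verbose_feature_names_out:
        # Prefix each name with its transformer name; the results are unique
        # as long as the transformer names themselves differ.
        return [
            f"{trans_name}__{feat}"
            for trans_name, feats in names_per_transformer
            for feat in feats
        ]

    # Without prefixes, the same name may come from several transformers,
    # so check for duplicates before the outputs are stacked.
    all_names = [feat for _, feats in names_per_transformer for feat in feats]
    duplicates = sorted(n for n, count in Counter(all_names).items() if count > 1)
    if duplicates:
        raise ValueError(
            f"Output feature names {duplicates} are not unique. "
            "Set verbose_feature_names_out=True to add prefixes."
        )
    return all_names


names = [("scaler", ["a", "b"]), ("pca", ["pca0", "a"])]
print(combine_feature_names(names))
# ['scaler__a', 'scaler__b', 'pca__pca0', 'pca__a']
```

Calling the same helper with verbose_feature_names_out=False on this input raises a ValueError, because "a" is produced by both transformers.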

EDIT: Actually that was a bug that I fixed in this PR, see #32106 (comment), so we can keep the simple fix

@jeremiedbb jeremiedbb marked this pull request as draft September 5, 2025 13:07
Comment on lines +30 to +34
    except AttributeError as e:
        if "does not provide get_feature_names_out" in str(e):
            return None
        else:
            raise
Member Author

Side note for reviewers: this was bypassing the check for duplicates in _add_prefix_for_feature_names_out (itself called in get_feature_names_out), explaining why the error did not happen with pandas even when verbose_feature_names_out was set to False.
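A toy illustration of the general pattern, not scikit-learn's code: if the swallowed AttributeError is raised before the duplicate check is reached, returning None skips that check entirely. The names `get_names`, `names_or_none`, `WithNames`, and `NoNames` are hypothetical, and the assumption that the caller then keeps the existing column names is taken from the discussion above.

```python
class WithNames:
    """Hypothetical transformer that can report its output names."""

    def __init__(self, names):
        self._names = list(names)

    def get_feature_names_out(self):
        return self._names


class NoNames:
    """Hypothetical transformer without get_feature_names_out."""


def get_names(transformers):
    """Toy stand-in for a union-level get_feature_names_out."""
    names = []
    for t in transformers:
        if not hasattr(t, "get_feature_names_out"):
            raise AttributeError(
                f"{type(t).__name__} does not provide get_feature_names_out"
            )
        names.extend(t.get_feature_names_out())
    # Duplicate check: only reached if no transformer raised above.
    if len(set(names)) != len(names):
        raise ValueError("Output feature names are not unique")
    return names


def names_or_none(transformers):
    """Mirrors the snippet under review: swallow one specific AttributeError."""
    try:
        return get_names(transformers)
    except AttributeError as e:
        if "does not provide get_feature_names_out" in str(e):
            # Assumption: the caller falls back to the existing column
            # names, which were never checked for duplicates.
            return None
        raise
```

With two WithNames(["a"]) transformers, names_or_none raises the ValueError; as soon as one NoNames transformer is in the list, it returns None and the duplicate check never runs.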

@jeremiedbb
Member Author

I took the opportunity to clean up the whole logic in the ColumnTransformer as well. There's no need to compute the feature names and check for duplicate columns here. That should be done by the set_output wrapper.

Development

Successfully merging this pull request may close these issues.

FeatureUnion with polars output can error due to duplicate column names