

Conversation

Member

@thomasjpfan thomasjpfan commented Oct 18, 2022

Reference Issues/PRs

Follow up to #23734

What does this implement/fix? Explain your changes.

On main, if an inner transformer does not define get_feature_names_out, ColumnTransformer raises an error even if all the transformers return DataFrames. This is because ColumnTransformer.get_feature_names_out is called to adjust the column names to follow verbose_feature_names_out.

This PR makes ColumnTransformer more lenient toward transformers that return DataFrames but do not define get_feature_names_out. Output feature names are prefixed according to verbose_feature_names_out. The prefixing logic is shared with get_feature_names_out and has been refactored into a _add_prefix_for_feature_names_out method.

Any other comments?

I think it is common to have third-party transformers that only expect dataframes and will always return DataFrames regardless of how set_output is configured.

@glemaitre glemaitre self-requested a review October 19, 2022 08:39
Member

@glemaitre glemaitre left a comment


I am fine with the proposed behaviour. I think that this is something that third-party libraries could find useful.

X_df = pd.DataFrame({"feat1": [1, 2, 3], "feat2": [3, 4, 5]})

X_wrapped = _wrap_in_pandas_container(X_df, columns=get_columns)
assert_array_equal(X_wrapped.columns, X_df.columns)
Member


Since the documentation mentions that raising an error is equivalent to None, I think we should also test the case where the columns callable raises an error and we pass something other than a DataFrame, to check that we return range(X.shape[1]).

Member Author


Added test in 0fc62e6 (#24699) and adjusted it slightly in 2fb935f (#24699)
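
The fallback discussed in this thread can be sketched as follows. This is a standalone illustration, not scikit-learn's private `_wrap_in_pandas_container`: the helper `wrap_in_dataframe` and the callable `failing_columns` are hypothetical names introduced here, and the logic is an assumption based on the behaviour described in the review (a callable that raises is treated like `columns=None`).

```python
import numpy as np
import pandas as pd


def wrap_in_dataframe(data, columns=None):
    """Hypothetical helper: wrap `data` in a DataFrame.

    `columns` may be a list of names, None, or a callable returning names.
    A callable that raises is treated the same as columns=None.
    """
    if callable(columns):
        try:
            columns = columns()
        except Exception:
            columns = None

    if isinstance(data, pd.DataFrame):
        # Keep the existing column names unless new ones were provided.
        if columns is not None:
            data.columns = columns
        return data

    # For array input, columns=None yields the default integer column labels.
    return pd.DataFrame(data, columns=columns)


def failing_columns():
    # Stand-in for a get_feature_names_out that is unavailable.
    raise ValueError("feature names are not available")


X = np.asarray([[1, 3], [2, 4], [3, 5]])
wrapped_array = wrap_in_dataframe(X, columns=failing_columns)

X_df = pd.DataFrame({"feat1": [1, 2, 3], "feat2": [3, 4, 5]})
wrapped_df = wrap_in_dataframe(X_df, columns=failing_columns)
```

In this sketch, the array input ends up with column labels equal to `range(X.shape[1])`, while the DataFrame input keeps its original column names.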

@cmarmo cmarmo added the Waiting for Second Reviewer label Nov 12, 2022
Member

@jeremiedbb jeremiedbb left a comment


LGTM

@jeremiedbb jeremiedbb merged commit 7dcb5ef into scikit-learn:main Nov 17, 2022

Labels

module:compose, module:utils, Waiting for Second Reviewer
