Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Conversation

@glemaitre
Copy link
Member

@glemaitre glemaitre commented Jan 25, 2024

closes #28260

The adapters in charge of renaming columns with set_output to a DataFrame-like object was failing in case that the original DataFrame-like as duplicated columns (that is possible in pandas.

We inverse the stacking and and renaming to avoid potential column duplicate that should not happen.

@github-actions
Copy link

github-actions bot commented Jan 25, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 4c16345. Link to the linter CI: here

@glemaitre
Copy link
Member Author

I could also use the code snippet of the original issue as an integration test. However, they weird stuff to be investigated in between:

  • Polars does not allow duplicated columns so it should be failing,
  • It does not fail on the issue snippet because the FunctionTransformer returns a pandas DataFrame while requesting a polars DataFrame.

So more to follow.

@glemaitre glemaitre added this to the 1.4.1 milestone Jan 25, 2024
Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion about the error message.

continue
names_out = feature_names_outs[names_idx : names_idx + X.shape[1]]
adapter.rename_columns(X, names_out)
names_idx += X.shape[1]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this code ugly but I don't see how to improve it without reshaping boilerplate

@glemaitre
Copy link
Member Author

I think it is ready for a review @ogrisel @adrinjalali

There quite some boilerplate code for generating the error but I think this is OK. For the rest, I'm just annoyed between having a flat list and a list of list. But I don't see how to simplify if I want to avoid touching more code that is not related to this issue.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more comments but otherwise LGTM.

raise ValueError(
"Concatenating DataFrames from the transformer's output lead to"
" an inconsistent number of samples. The output may have Pandas"
" Indexes that do not match."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: we could actually check if that last sentence hold to make to the error message more precise (and not use "may" when it's not the case).

But let's keep that for a follow-up PR ;)

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like there must be a better way to do it 😅

continue
names_out = feature_names_outs[names_idx : names_idx + X.shape[1]]
adapter.rename_columns(X, names_out)
names_idx += X.shape[1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_num_features and _num_samples instead of .shape?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we play with container that are dataframe-like container. So they will implement the shape.

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM

@adrinjalali adrinjalali enabled auto-merge (squash) January 31, 2024 16:25
@adrinjalali adrinjalali merged commit 4a28ba3 into scikit-learn:main Jan 31, 2024
glemaitre added a commit to glemaitre/scikit-learn that referenced this pull request Feb 10, 2024
glemaitre added a commit to glemaitre/scikit-learn that referenced this pull request Feb 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ColumnTransformer output unexpected prefixed feature names from FunctionTransformer() step

3 participants