@jeromedockes
Contributor

Reference Issues/PRs

closes #27533

What does this implement/fix? Explain your changes.

As described in issue #27533, this modifies the format of the columns in the last item of the ColumnTransformer's transformers_ attribute, i.e. the item that corresponds to the "remainder". They used to always be indices (integers); now they match the format that was used for the transformers parameter, if it was consistent across all transformers:

  • if all columns in inputs are provided as column names, so are remainder columns
  • if all columns in inputs are provided as boolean masks, so are remainder columns
  • otherwise remainder columns are int indices (as before)

This is controlled by the force_int_remainder_cols parameter (better name suggestions welcome :) ): when it is True, the old behavior is kept and a FutureWarning is emitted; when it is False, the new behavior is applied.
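
The three rules above can be sketched in plain Python. This is only an illustration of the described behavior, not scikit-learn's implementation: the function name `remainder_columns` and its arguments are hypothetical, and the sketch ignores slices, callables and the other selectors that ColumnTransformer accepts.

```python
def remainder_columns(specified, feature_names):
    """Return the columns not claimed by any transformer.

    Hypothetical helper: `specified` is a list of column selections, each a
    list of names (str), a boolean mask, or a list of integer positions.
    The result uses the same format as the inputs when they all agree,
    and integer positions otherwise.
    """
    n_features = len(feature_names)
    used = set()   # integer positions claimed by some transformer
    kinds = set()  # formats seen across all selections

    for cols in specified:
        # bool is a subclass of int, so masks must be checked first.
        if all(isinstance(c, bool) for c in cols):
            kinds.add("bool")
            used.update(i for i, keep in enumerate(cols) if keep)
        elif all(isinstance(c, str) for c in cols):
            kinds.add("str")
            used.update(feature_names.index(c) for c in cols)
        else:
            kinds.add("int")
            used.update(cols)

    remaining = [i for i in range(n_features) if i not in used]

    if kinds == {"str"}:
        return [feature_names[i] for i in remaining]
    if kinds == {"bool"}:
        return [i in remaining for i in range(n_features)]
    # Mixed or integer inputs: fall back to integer positions (old behavior).
    return remaining


names = ["a", "b", "c", "d"]
by_name = remainder_columns([["a"], ["c"]], names)
by_mask = remainder_columns(
    [[True, False, False, False], [False, False, True, False]], names
)
mixed = remainder_columns([["a"], [2]], names)
```

Here `by_name` is a list of column names, `by_mask` is a boolean mask over all four columns, and `mixed` falls back to integer positions because the inputs do not share a single format.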

@jeromedockes jeromedockes marked this pull request as draft October 24, 2023 13:54
@github-actions

github-actions bot commented Oct 24, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: e91fd78. Link to the linter CI: here

@jeromedockes
Contributor Author

As discussed IRL with @glemaitre, we want to avoid showing spurious warnings, as many users may never directly use the columns of the remainder entry in transformers_. The solution we wrote is to store the columns in a UserList subclass rather than a plain list. Its __getitem__ emits the warning, but only when the dtype of the columns would have been different with the new behavior.
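
A minimal sketch of that idea, using only the standard library. The class name `WarningColumnsList` and its `future_dtype` and `warning_enabled` attributes are assumptions made for illustration; they are not the names used in the PR.

```python
import warnings
from collections import UserList


class WarningColumnsList(UserList):
    """List of column indices that warns when an item is accessed.

    Hypothetical stand-in for the UserList subclass described above.
    `future_dtype` names the format the columns would have under the new
    behavior, or None when the format would not change.
    """

    def __init__(self, columns, *, future_dtype=None, warning_enabled=True):
        super().__init__(columns)
        self.future_dtype = future_dtype
        self.warning_enabled = warning_enabled

    def __getitem__(self, index):
        # Warn only if enabled and the format would actually differ.
        if self.warning_enabled and self.future_dtype is not None:
            warnings.warn(
                "The format of the remainder columns will change from int "
                f"to {self.future_dtype} in a future version.",
                FutureWarning,
                stacklevel=2,
            )
        return super().__getitem__(index)


cols = WarningColumnsList([1, 3], future_dtype="str")
print(len(cols), repr(cols))  # does not go through __getitem__, so no warning
```

Because the warning lives in `__getitem__`, code that merely stores, prints or measures the list stays silent, while code that indexes into it is told about the upcoming change. Note that in this sketch plain iteration also goes through `__getitem__` and therefore warns.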

@jeromedockes jeromedockes marked this pull request as ready for review October 26, 2023 11:42
@jeromedockes jeromedockes changed the title [WIP] change ColumnTransformer remainder columns format Change ColumnTransformer remainder columns format Oct 26, 2023
@glemaitre glemaitre self-requested a review October 26, 2023 13:24
Member

@glemaitre glemaitre left a comment

A first round of comments before looking at the tests.

@glemaitre glemaitre changed the title Change ColumnTransformer remainder columns format API improve the remander index dtype to be consistent with transformers Oct 26, 2023
@glemaitre glemaitre self-requested a review October 27, 2023 13:38
The remainder columns warning (if it exists) is disabled.
"""
return _with_dtype_warning_enabled_set_to(False, transformers)
Member

Can't we call this function directly in the code instead?

Contributor Author

I changed it to call this function directly. I had introduced the enabled and disabled wrappers because I thought they made it a bit easier to figure out what was going on without needing to check the _RemaindersColsList when reading the ColumnTransformer code.

Member

I see, but I think this is good enough now.

Member

@glemaitre glemaitre left a comment

Otherwise LGTM

@betatim betatim changed the title API improve the remander index dtype to be consistent with transformers API improve the remainder index dtype to be consistent with transformers Oct 30, 2023
@glemaitre glemaitre added this to the 1.5 milestone Dec 12, 2023
Member

@jeremiedbb jeremiedbb left a comment

Let's try to fit it in 1.5. Please change the target version to 1.5, the target version to change the default to 1.7 and the target version to remove the old behavior to 1.9.

Member

@jeremiedbb jeremiedbb left a comment

LGTM. Thanks @jeromedockes

@jeremiedbb jeremiedbb merged commit 2a2643f into scikit-learn:main Apr 24, 2024
Development

Successfully merging this pull request may close these issues.

Better inference of the columns remainder dtype in transformers_ from ColumnTransformer

3 participants