API add named_transformers attribute to FeatureUnion #20331

Merged: 15 commits merged into scikit-learn:main on Oct 25, 2022

Conversation

crflynn (Contributor) commented Jun 23, 2021

Reference Issues/PRs

What does this implement/fix? Explain your changes.

This provides an attribute named_transformers to FeatureUnion, allowing access to estimators in a similar way as named_steps on Pipeline.

e.g.

fs = FeatureUnion([("scaler", StandardScaler()), ("func", FunctionTransformer())])

# access via named attribute
scaler = fs.named_transformers.scaler

# indexing by name
scaler = fs.named_transformers["scaler"]

Any other comments?

ogrisel (Member) commented Jun 23, 2021

That sounds like a legit need and fix. However, I am not sure about the name. For the ColumnTransformer the matching attribute is named named_transformers_. I think we should use the same name for consistency.

ogrisel (Member) commented Jun 23, 2021

Note: the test failure is unrelated and has been fixed in the main branch.

crflynn (Contributor Author) commented Jun 23, 2021

I was going for consistency with Pipeline i.e. steps -> named_steps, although I think named_transformers (no trailing underscore) might be better.

ogrisel (Member) commented Jun 23, 2021

> I was going for consistency with Pipeline i.e. steps -> named_steps, although I think named_transformers (no trailing underscore) might be better.

Pipeline is a bit of an outlier. If the goal is to access the transformers after fitting them, then it makes sense to expose those as a fitted attribute with the trailing _ to be consistent with the scikit-learn API conventions.

crflynn (Contributor Author) commented Jun 23, 2021

As with Pipeline, I'd just like to navigate the hierarchy more easily in general, regardless of whether or not it's fitted. For further motivation, accessing estimators like this helps a lot when debugging in my experience.

ogrisel (Member) commented Jun 23, 2021

I am pretty sure the other reviewers will agree with me that we prefer consistency with the ColumnTransformer's named attribute in this case. The fact that the named steps of Pipeline are not published as a standard scikit-learn fitted attribute is a hard-to-fix historical artifact that we should not replicate.

crflynn (Contributor Author) commented Jun 23, 2021

I've made the requested changes, although I still find it a bit awkward to be consistent with ColumnTransformer over Pipeline and inconsistent with attributes in general.

ogrisel (Member) commented Jun 23, 2021

All the non-trailing-_ public attributes of scikit-learn estimators should be untouched constructor parameters which is not the case for this one.
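The convention described here can be illustrated with a small sketch. `MeanCenterer` is a hypothetical toy class, not part of scikit-learn; it only shows the naming rule under discussion: constructor parameters are stored unchanged without a trailing underscore, while anything learned in `fit` gets one.

```python
class MeanCenterer:
    """Hypothetical toy transformer following the naming convention."""

    def __init__(self, with_mean=True):
        # Constructor parameter: stored untouched, no trailing underscore.
        self.with_mean = with_mean

    def fit(self, X):
        # State learned from the data: trailing underscore.
        self.mean_ = sum(X) / len(X) if self.with_mean else 0.0
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


est = MeanCenterer(with_mean=True)
print(hasattr(est, "mean_"))           # False: nothing has been learned yet
est.fit([1.0, 2.0, 3.0])
print(est.mean_)                       # 2.0
print(est.transform([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]
```

An attribute derived from the step list, such as the one proposed in this PR, is not a constructor parameter itself, which is why it does not fit the no-underscore category cleanly.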

crflynn (Contributor Author) commented Jun 23, 2021

> All the non-trailing-_ public attributes of scikit-learn estimators should be untouched constructor parameters which is not the case for this one.

That makes more sense. Thanks for the clarification.

crflynn changed the title from "API add named_transformer_list and indexing to FeatureUnion" to "API add named_transformers_ attribute to FeatureUnion" on Jun 23, 2021
ogrisel (Member) left a comment

LGTM!

Thanks for your contribution and for bearing with me @crflynn!

thomasjpfan (Member) left a comment

Thank you for the PR @crflynn!

I added a comment for @ogrisel to discuss a potential API issue.

Comment on lines 880 to 881
@property
def named_transformers_(self):
Member

@ogrisel named_transformers_ uses the naming convention for a (fitted) attribute but is defined before calling fit. I am not sure what to do for FeatureUnion since it overwrites transformer_list during fit (unlike ColumnTransformer).
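A minimal sketch of the issue raised in this comment, using hypothetical stand-in classes (`ToyScaler`, `ToyUnion`) rather than the real scikit-learn code. It assumes, as the comment states, that `fit` overwrites the entries of `transformer_list`; `copy.deepcopy` is only a stand-in for whatever produces the fitted objects.

```python
import copy


class ToyScaler:
    """Hypothetical transformer that learns the range of the data."""

    def fit(self, X):
        self.scale_ = max(X) - min(X)  # learned state, trailing underscore
        return self


class ToyUnion:
    """Hypothetical union whose fit overwrites transformer_list in place."""

    def __init__(self, transformer_list):
        self.transformer_list = transformer_list

    @property
    def named_transformers(self):
        # Built from transformer_list on every access, fitted or not.
        return dict(self.transformer_list)

    def fit(self, X):
        # Assumption: fitted copies replace the original entries.
        fitted = [(name, copy.deepcopy(t).fit(X)) for name, t in self.transformer_list]
        self.transformer_list[:] = fitted
        return self


union = ToyUnion([("scaler", ToyScaler())])
before = union.named_transformers["scaler"]
union.fit([1.0, 3.0])
after = union.named_transformers["scaler"]

print(hasattr(before, "scale_"))  # False: the original object was never fitted
print(after.scale_)               # 2.0
print(after is before)            # False: the list entry was replaced
```

In this sketch the accessor is usable before `fit` and returns different objects afterwards, so it behaves neither like an untouched constructor parameter nor like a purely fitted attribute.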

crflynn (Contributor Author) commented Jun 24, 2021

I hadn't realized that. Would something like this be more conformant?

    @property
    def named_transformers_(self):
        try:
            check_is_fitted(self)
        except NotFittedError:
            return Bunch(**dict(self.transformer_list))
        return Bunch(**dict(self.transformer_list_))

    def _update_transformer_list(self, transformers):
        transformers = iter(transformers)
        # use a fit attrib, and reference this when fitting
        self.transformer_list_ = [(name, old if old == 'drop'
                                     else next(transformers))
                                    for name, old in self.transformer_list]

Returning something when unfitted might also be technically incorrect, however. This would also be a breaking change.

Member

@thomasjpfan do you think it would be better to have a named_transformers with the same semantics as in Pipeline? Our API is broken there and I am not sure that using the _ is actually a good idea then.

Member

> do you think it would be better to have a named_transformers with the same semantics as in Pipeline?

I would be okay with a named_transformers to be consistent with Pipeline's named_steps:

def named_steps(self):
    # Use Bunch object to improve autocomplete
    return Bunch(**dict(self.steps))
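For readers unfamiliar with the pattern, here is a rough, self-contained sketch of how such a property behaves. The `Bunch` below is a simplified stand-in written for this example (a dict with attribute access), not the real `sklearn.utils.Bunch`, and `ToyPipeline` with its string "steps" is hypothetical.

```python
class Bunch(dict):
    """Simplified stand-in: a dict whose keys are also readable as attributes."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key)


class ToyPipeline:
    """Hypothetical container holding (name, step) pairs."""

    def __init__(self, steps):
        self.steps = steps

    @property
    def named_steps(self):
        # A fresh Bunch is built on each access, so it always mirrors self.steps.
        return Bunch(**dict(self.steps))


pipe = ToyPipeline([("scale", "scaler-object"), ("clf", "classifier-object")])
print(pipe.named_steps.scale)   # scaler-object
print(pipe.named_steps["clf"])  # classifier-object

# Changes to the underlying list are visible on the next access.
pipe.steps.append(("extra", "another-object"))
print(sorted(pipe.named_steps))  # ['clf', 'extra', 'scale']
```

Because the mapping is rebuilt on every access, it works the same way before and after fitting, which matches the "regardless of whether or not it's fitted" use case raised earlier in the thread.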

Member

Agreed. @crflynn Would you mind making the change?

Contributor Author

I've updated to reflect this. Tangentially, any plan to correct the API on Pipeline/FeatureUnion?

Member

> Tangentially, any plan to correct the API on Pipeline/FeatureUnion?

The work was started here: #8350

(I also remember speaking with @glemaitre about this in 2019 😅 )

Member

Yep, it is even older than 2019. The difficulty in correcting the API is that we might have allowed some use cases that the correct API would not allow. So we need to think through all of them and provide the correct API while still supporting all previous use cases.

crflynn changed the title from "API add named_transformers_ attribute to FeatureUnion" to "API add named_transformers attribute to FeatureUnion" on Jul 23, 2021
glemaitre (Member) left a comment

LGTM. I will not merge since @ogrisel's +1 was given for a different API.

Co-authored-by: Guillaume Lemaitre
cmarmo (Contributor) commented May 10, 2022

@ogrisel would you be available for a final review? Thanks!

cmarmo added the "Waiting for Second Reviewer" label and removed the "Waiting for Reviewer" label on Oct 20, 2022
ogrisel (Member) left a comment

Re-reading the code of FeatureUnion, I realized I had completely forgotten that it suffers from the same historical design flaws as the Pipeline class.

Ok for being consistent with the Pipeline behavior for the time being...

ogrisel (Member) commented Oct 25, 2022

Let's wait for the CI to complete and then merge.

ogrisel merged commit ff33ffb into scikit-learn:main on Oct 25, 2022
glemaitre added a commit to glemaitre/scikit-learn that referenced this pull request Oct 31, 2022
andportnoy pushed a commit to andportnoy/scikit-learn that referenced this pull request Nov 5, 2022
Labels: module:pipeline, Waiting for Second Reviewer