API add named_transformers attribute to FeatureUnion #20331
That sounds like a legit need and fix. However I am not sure about the name. For the
Note: the test failure is unrelated and has been fixed in the main branch.
I was going for consistency with Pipeline i.e.
As with Pipeline, I'd just like to navigate the hierarchy more easily in general, regardless of whether or not it's fitted. For further motivation, accessing estimators like this helps a lot when debugging in my experience.
I am pretty sure the other reviewers will agree with me that we prefer consistency with the
I've made the requested changes, although I still find it a bit awkward to be consistent with ColumnTransformer over Pipeline and inconsistent with attributes in general.
All the non-trailing-
That makes more sense. Thanks for the clarification.
LGTM!
Thanks for your contribution and for bearing with me @crflynn!
sklearn/pipeline.py (outdated)
    @property
    def named_transformers_(self):
@ogrisel `named_transformers_` uses the naming convention for a (fitted) attribute but is defined before calling `fit`. I am not sure what to do for `FeatureUnion` since it overwrites `transformer_list` during `fit` (unlike `ColumnTransformer`).
I hadn't realized that. Would something like this be more conformant?
    @property
    def named_transformers_(self):
        try:
            check_is_fitted(self)
        except NotFittedError:
            return Bunch(**dict(self.transformer_list))
        return Bunch(**dict(self.transformer_list_))

    def _update_transformer_list(self, transformers):
        transformers = iter(transformers)
        # use a fit attrib, and reference this when fitting
        self.transformer_list_ = [(name, old if old == 'drop'
                                   else next(transformers))
                                  for name, old in self.transformer_list]
Returning something when unfitted might also be technically incorrect, however. This would also be a breaking change.
@thomasjpfan do you think that it could be better to have a `named_transformers` with the same semantics as in `Pipeline`? Our API is broken there and I am not sure that using the `_` is actually a good idea then.
> do you think that it could be better to have a named_transformers with the same semantics as in Pipeline?
I would be okay with a `named_transformers` to be consistent with `Pipeline`'s `named_steps`:
scikit-learn/sklearn/pipeline.py, lines 247 to 249 in 9580530:
    def named_steps(self):
        # Use Bunch object to improve autocomplete
        return Bunch(**dict(self.steps))
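The `named_steps` pattern quoted above rebuilds a `Bunch` from the list of `(name, estimator)` pairs on every access, which is why it works whether or not the object has been fitted. A minimal sketch of that idea follows. `TinyUnion` and the hand-written `Bunch` are hypothetical stand-ins (assumptions for illustration, not scikit-learn's actual classes), so the snippet runs without scikit-learn installed:

```python
class Bunch(dict):
    """Hypothetical minimal stand-in for sklearn.utils.Bunch (assumption)."""

    def __getattr__(self, key):
        # Expose dictionary keys as attributes.
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key)


class TinyUnion:
    """Hypothetical stand-in for FeatureUnion; it only stores its argument."""

    def __init__(self, transformer_list):
        self.transformer_list = transformer_list

    @property
    def named_transformers(self):
        # Rebuilt from transformer_list on every access, so no fit is needed.
        return Bunch(**dict(self.transformer_list))


# Plain strings stand in for estimator objects here.
union = TinyUnion([("pca", "pca-estimator"), ("svd", "svd-estimator")])
by_attr = union.named_transformers.pca
by_key = union.named_transformers["svd"]

# Nothing is cached, so replacing the list is reflected immediately.
union.transformer_list = [("pca", "other-estimator")]
after_update = union.named_transformers.pca
```

Because the property is derived from a constructor parameter rather than from state learned in `fit`, a name without the trailing underscore matches how it behaves.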
Agreed. @crflynn Would you mind making the change?
I've updated to reflect this. Tangentially, any plan to correct the API on Pipeline/FeatureUnion?
> Tangentially, any plan to correct the API on Pipeline/FeatureUnion?
The work was started here: #8350
(I also remember speaking with @glemaitre about this in 2019 😅 )
Yep, it is even older than 2019. The difficulty in correcting the API comes from the fact that we might have allowed some use cases that the correct API would not allow. So we need to think through all of them and provide the correct API while still supporting all previous use cases.
LGTM. I will not merge since @ogrisel's +1 was given for a different API.
Co-authored-by: Guillaume Lemaitre
@ogrisel would you be available for a final review? Thanks!
Re-reading the code of `FeatureUnion` I had completely forgotten that it was suffering from the same historical design flaws as the `Pipeline` class.
OK with being consistent with the `Pipeline` behavior for the time being...
Let's wait for the CI to complete and then merge.
Co-authored-by: Guillaume Lemaitre
Co-authored-by: Olivier Grisel
Reference Issues/PRs
What does this implement/fix? Explain your changes.
This provides an attribute `named_transformers` to `FeatureUnion`, allowing access to estimators in a similar way to `named_steps` on `Pipeline`, e.g.
Any other comments?