ENH Add metadata routing for FeatureUnion #28205

Merged
merged 21 commits into scikit-learn:main from StefanieSenger:metadata_FeatureUnion on Mar 5, 2024

Conversation

StefanieSenger (Contributor)

Reference Issues/PRs

Towards #22893

What does this implement/fix? Explain your changes.

This PR adds metadata routing to FeatureUnion and the corresponding tests.

I was confused about why we cannot do set_fit_transform_request. I know we can work around it, but from a user's perspective it's not self-explanatory.

github-actions bot commented Jan 21, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 0304593.

adrinjalali (Member) left a comment

set_fit_transform_request isn't well-defined, since fit_transform is a composite method; its routing therefore always has to be a combination of the routing for fit and transform.
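
For illustration, a minimal sketch of how a consumer declares its requests per simple method instead (assuming metadata routing is enabled and using a hypothetical WeightedTransformer that consumes sample_weight in both fit and transform); the composite fit_transform then derives its routing from both requests:

    import sklearn
    from sklearn.base import BaseEstimator, TransformerMixin

    sklearn.set_config(enable_metadata_routing=True)

    class WeightedTransformer(TransformerMixin, BaseEstimator):
        # hypothetical consumer accepting sample_weight in both methods
        def fit(self, X, y=None, sample_weight=None):
            return self

        def transform(self, X, sample_weight=None):
            return X

    # There is no set_fit_transform_request; the two simple requests are
    # combined for the composite method instead.
    trans = (
        WeightedTransformer()
        .set_fit_request(sample_weight=True)
        .set_transform_request(sample_weight=True)
    )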

@adrinjalali adrinjalali self-requested a review February 2, 2024 15:05
adrinjalali (Member) left a comment

Also, add the test making sure get_metadata_routing works on an unfitted estimator here.

    method_mapping=MethodMapping()
    .add(callee="fit", caller="fit")
    .add(callee="fit_transform", caller="fit")
    .add(callee="fit_transform", caller="fit_transform"),
adrinjalali (Member):

we don't do explicit fit_transform here, since it's a compound method.

StefanieSenger (Contributor, Author) commented Feb 12, 2024

Now I see:

From FeatureUnion's fit_transform we need to prepare for routing in three directions: to fit, to transform and to fit_transform. This is because of the way _fit_transform_one is designed:

        if hasattr(transformer, "fit_transform"):
            res = transformer.fit_transform(X, y, **params.get("fit_transform", {}))
        else:
            res = transformer.fit(X, y, **params.get("fit", {})).transform(
                X, **params.get("transform", {})
            )

It is designed this way because a transformer can use either fit and transform or fit_transform, and in fact metadata routing is only possible if it overrides TransformerMixin's fit_transform.

But yes, we don't route from the FeatureUnion's fit to the estimator's fit_transform.

I've changed the routing according to my new understanding.
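
For illustration, a minimal sketch of a method mapping that prepares all three directions out of fit_transform (an approximation of the idea described above, not the exact hunk from the PR):

    from sklearn.utils.metadata_routing import MethodMapping

    # caller: the FeatureUnion method that receives the metadata
    # callee: the sub-transformer method it may be routed to
    mapping = (
        MethodMapping()
        .add(caller="fit", callee="fit")
        .add(caller="fit_transform", callee="fit")
        .add(caller="fit_transform", callee="transform")
        .add(caller="fit_transform", callee="fit_transform")
    )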

StefanieSenger (Contributor, Author) commented Feb 12, 2024

We actually don't have a test for transformer.fit().transform(). I made one using a new class, ConsumingNoFitTransformTransformer, which is special because it doesn't inherit from TransformerMixin. It has a long name, but it's a descriptive one. :)

We need this new class to make transformer.fit().transform() possible.

ConsumingClassifier's fit and transform would not be used, because it already has a fit_transform that would be used instead.
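
For reference, a rough sketch of what such a test transformer can look like (the actual class in the PR may differ, e.g. in how it records the metadata it received):

    from sklearn.base import BaseEstimator

    class NoFitTransformTransformer(BaseEstimator):
        # Deliberately no TransformerMixin, hence no fit_transform: FeatureUnion
        # has to fall back to fit(...).transform(...).
        def fit(self, X, y=None, sample_weight=None, metadata=None):
            # a real test class would record the received metadata here
            return self

        def transform(self, X, sample_weight=None, metadata=None):
            return X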

@adrinjalali adrinjalali self-requested a review February 13, 2024 12:11
adrinjalali (Member) left a comment

Cool!

    routed_fit_transform_params = Bunch()
    for name, _ in self.transformer_list:
        routed_fit_transform_params[name] = Bunch(fit_transform={})
        routed_fit_transform_params[name].fit_transform = fit_params
Suggested change (Member):

    -routed_fit_transform_params[name].fit_transform = fit_params
    +routed_fit_transform_params[name].fit_transform = fit_params
    +routed_fit_transform_params[name].fit = fit

StefanieSenger (Contributor, Author):

I believe it needs to be like this:

                if hasattr(obj, "fit_transform"):
                    routed_params[name] = Bunch(fit_transform={})
                    routed_params[name].fit_transform = params
                else:
                    routed_params[name] = Bunch(fit={})
                    routed_params[name] = Bunch(transform={})
                    routed_params[name].fit = params
                    routed_params[name].transform = params

This makes sure the metadata is passed to the correct methods of the sub-transformer. I've also added a test for that (test_feature_union_fit_params_without_fit_transform).
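
As an end-to-end illustration, a minimal usage sketch (assuming a scikit-learn version that includes this PR and has metadata routing enabled; StandardScaler only consumes sample_weight in fit, so only the fit request is set):

    import numpy as np
    import sklearn
    from sklearn.pipeline import FeatureUnion
    from sklearn.preprocessing import StandardScaler

    sklearn.set_config(enable_metadata_routing=True)

    union = FeatureUnion(
        [("scaler", StandardScaler().set_fit_request(sample_weight=True))]
    )

    X = np.array([[0.0], [1.0], [2.0]])
    sw = np.array([1.0, 2.0, 3.0])

    # sample_weight is routed to StandardScaler.fit, both from FeatureUnion.fit
    # and from FeatureUnion.fit_transform.
    union.fit(X, y=None, sample_weight=sw)
    Xt = union.fit_transform(X, y=None, sample_weight=sw)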

StefanieSenger (Contributor, Author):

I've addressed all your review comments, @adrinjalali.
Would you have a fresh look to see if the PR is alright now?

adrinjalali (Member) left a comment

Otherwise LGTM.

StefanieSenger (Contributor, Author):

Okay, I've taken care of that, @adrinjalali.

@glemaitre glemaitre self-requested a review February 23, 2024 13:38
glemaitre (Member) left a comment

It looks good. Just some questions regarding transform and thus the renaming from fit_params to params.

adrinjalali (Member) left a comment

Otherwise LGTM.

glemaitre (Member):

@StefanieSenger I think we are missing forwarding the metadata in the transform method of FeatureUnion. From a quick chat with @adrinjalali, it seems this is something that we overlooked here.

StefanieSenger (Contributor, Author):

Oh yes, @glemaitre and @adrinjalali, sorry for the oversight.
I've added that now.
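
A minimal sketch of what forwarding the metadata in transform can look like (simplified: it ignores sparse stacking and transformer weights, and the merged code may differ):

    import numpy as np
    from sklearn.utils.metadata_routing import process_routing

    def transform(self, X, **params):
        # validate the incoming metadata and split it per sub-transformer
        routed_params = process_routing(self, "transform", **params)
        Xs = [
            transformer.transform(X, **routed_params[name].transform)
            for name, transformer in self.transformer_list
        ]
        return np.hstack(Xs)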

StefanieSenger (Contributor, Author):

I've also simplified the metadata router of FeatureUnion (by aligning the if and else conditions), because they are almost identical, and I think no harm is done by providing more routing possibilities than an object will actually use.
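
A sketch of what the simplified router can look like under that reasoning (an approximation: the same mapping is added for every sub-transformer, and the merged code may differ in details):

    from sklearn.utils.metadata_routing import MetadataRouter, MethodMapping

    def get_metadata_routing(self):
        router = MetadataRouter(owner=self.__class__.__name__)
        for name, transformer in self.transformer_list:
            # same mapping for every transformer, whether or not it defines
            # fit_transform; routes an object cannot use are never exercised
            router.add(
                **{name: transformer},
                method_mapping=MethodMapping()
                .add(caller="fit", callee="fit")
                .add(caller="fit_transform", callee="fit")
                .add(caller="fit_transform", callee="transform")
                .add(caller="fit_transform", callee="fit_transform")
                .add(caller="transform", callee="transform"),
            )
        return router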

glemaitre (Member) left a comment

LGTM. Thanks @StefanieSenger

@glemaitre glemaitre merged commit b1ce4c1 into scikit-learn:main Mar 5, 2024
@StefanieSenger StefanieSenger deleted the metadata_FeatureUnion branch March 6, 2024 07:51