Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Conversation

StefanieSenger
Copy link
Member

@StefanieSenger StefanieSenger commented Jul 3, 2025

What does this implement/fix? Explain your changes.

This PR de-couples the double-handling of MetadataRouter and MetadataRequest objects in metadata routing ( a bit), by not using the get_metadata_routing method in MetadataRequest anymore, but instead to directly use _get_metadata_request.

This change has affect in several parts of our metadata routing implementation and the tests and there simplifies the code and makes it more intuitive to read.

The goal of this PR is to increase maintainability.

While at it, I also:

  • re-named __metadata_request__* class attribute into __metadata_request__{method} and made clear to mention its a class attribute in our internal documentation to make it easier to distinguish it from the _metadata_request instance attribute present in consumers at first sight
  • made some other small improvements for the external or internal documentation in metadata routing

Copy link

github-actions bot commented Jul 3, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 6f97885. Link to the linter CI: here

@StefanieSenger StefanieSenger changed the title Re-factor MNT Refactor double-handling of MetadataRouter and MetadataRequest objects Jul 3, 2025
@StefanieSenger StefanieSenger added Metadata Routing all issues related to metadata routing, slep006, sample props No Changelog Needed and removed module:metrics module:model_selection labels Jul 3, 2025
Comment on lines +996 to +1002
with pytest.warns(UserWarning, match="`transform` method which consumes metadata"):
rct = RoutingCustomTransformer(
estimator=CustomTransformer()
.set_fit_request(metadata=True)
.set_transform_request(metadata=True)
)
rct.fit_transform(X=[[1]], metadata="metadata")
Copy link
Member Author

@StefanieSenger StefanieSenger Jul 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was added to make sure the warning is also shown when a transformer is not a pure consumer (but also a router). It refers to the change in sklearn.base.TransformerMixin.fit_transform.

def test_default_requests():
class OddEstimator(BaseEstimator):
def test_class_level_requests():
class StubEstimator(BaseEstimator):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm renaming OddEstimator because the old naming implies we are intentionally demonstrating wrong usage, but that's not our purpose here.

@@ -210,19 +210,19 @@ def test_request_type_is_valid(val, res):


@config_context(enable_metadata_routing=True)
def test_default_requests():
class OddEstimator(BaseEstimator):
def test_class_level_requests():
Copy link
Member Author

@StefanieSenger StefanieSenger Jul 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm renaming test_default_requests because the term "default" is not referring to default / auto routing here and could be mis-understood. What is meant are the class level requests.

@StefanieSenger
Copy link
Member Author

StefanieSenger commented Jul 5, 2025

Log of my though process (that hopefully helps me move forwards):

The de-coupeling of _get_metadata_request for consumers and get_metadata_routing that was previously used in both consumers and routers and is now only used in routers is a separate issue than removing the add_self_request in routers by using their _get_metadata_request method, that is still present in them.
I understand I should do it in the same PR nevertheless.

I wonder if we want to fully automate the setting of self requests, or if it should be integrated in MetadataRouter.add(). But if the latter, I wonder if it really makes things simpler, because we would force estimator developers to define a method_mapping in .add, that is not easily available for self requests (or only in a hacky way).
So I tend to fully automate, which means that I'd have to manipulate MetadataRouter.__init__(). I have hardly seen us add any functionality to any init method in scikit-learn, and I think for a good reason (predictability or so?). Still, I will make a suggestion to automate the setting of self requests for router, because otherwise, we force us and developers of custom meta-estimators to pass a complicated method mapping into MetadataRouter.add()...

Since then I need to pass an object into MetadataRouter.__init__(), so I can create a self request, I wonder if I should make the existing owner become an object type, instead of a string, or if I should add a new init param that would be the object. Maybe better to pass a new param, because then the owner string can still get a user-defined name?

But potentially, those could both be merged into the same param, since passing MetadataRouter(owner=self.__class__.__name__, owning_obj=self) seems a bit unnecessary.

@StefanieSenger
Copy link
Member Author

StefanieSenger commented Jul 5, 2025

I understood that this PR cannot do much about the double-handling of MetadataRouter and MetadataRequest objects (which I had thought it was intending in the beginning), because that depends on several other places and choices taken. But it does part of the job by taking care of certain methods. I will update the PR title to better fit that.

@StefanieSenger StefanieSenger changed the title MNT Refactor double-handling of MetadataRouter and MetadataRequest objects MNT Refactor get_metadata_routing method in MetadataRequest to directly use _get_metadata_request instead Jul 5, 2025
@StefanieSenger
Copy link
Member Author

I've experimented with my above idea to add a self-request automatically to routers, but it fails for routers that are pure routers but not also consumers. Given that, I no longer see a benefit in removing or refining the add_self_request method from MetadataRouter. We would still require some input from estimator developers on whether a self request should be added or not.

If that input is done by calling MetadataRouter().add_self_request(self) in get_metadata_routing() or through - in the simplest case, boolean paramater (like add_self_request=True) feels like an implementation detail.

Also, the original change of this PR, the de-tangling of _get_metadata_request and get_metadata_routing methods for consumers doesn't provide a new possibility here, because MetadataRouter().add_self_request had already internally used obj._get_metadata_request().

At this point, I am tempted to think this refactoring is complete, unless there is some specific thing that we want to do with add_self_request?

This PR makes the code more readable, because we can clearly see if we deal with a consumer based on the method called on an object. It also adds some improvements to the internal and the API docs and personally, it has given me more insights and more ideas for refactorings (which I will open separate PRs for).

What is left that I will go through our docs and examples to check whether the right method is used everywhere. I will do that and then un-draft this PR. Do you consider it complete, @adrinjalali?

@@ -46,15 +45,20 @@
MetadataRouter
~~~~~~~~~~~~~~
The ``MetadataRouter`` objects are constructed via a `get_metadata_routing` method,
Copy link
Member Author

@StefanieSenger StefanieSenger Jul 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MetadataRouter objects are constructed via a get_metadata_routing method

@adrinjalali, Do you think it makes sense to rename get_metadata_routing into get_metadata_router?
That would be more precise and we spare people to wonder about whether there is a difference between "metadata router" and "metadata routing". In fact, this method returns a MetadataRouter object.

That would be for a new PR and could be done without deprecation (because we are still in experimental stage), I believe?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't mind the rename, but needs to go through a deprecation cycle. Maybe a different PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the renaming would be very useful for people unfamiliar with metadata routing like me: _get_metadata_request returns a MetadataRequest, get_metadata_router returns a MetadataRouter, all is clear. So +1 for a follow-up PR with the renaming !

@StefanieSenger StefanieSenger marked this pull request as ready for review July 5, 2025 17:45
Comment on lines -548 to -568
Consumer-only classes such as simple estimators return a serialized
version of this class as the output of `get_metadata_routing()`.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not true (anymore).

get_routing_for_object(), _get_metadata_request() and get_metadata_routing() all return unserialised output.

@@ -1529,20 +1532,6 @@ def _get_metadata_request(self):

return requests

def get_metadata_routing(self):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A side effect of this change is that this method won't be displayed in the docs for the consumers anymore. I don't think it's too bad, since we want users to use get_routing_for_object anyways.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also do a single release deprecation cycle for this (shorter than usual).

@adrinjalali adrinjalali moved this to To Review in Metadata routing Aug 11, 2025
@StefanieSenger
Copy link
Member Author

@adrinjalali, I did the deprecation of get_metadata_routing for estimators that are consumers and would like to do the deprecation of the method name for routers (get_metadata_router instead, see #31695) in a separate PR after this PR got merged (but before the next release).

@StefanieSenger StefanieSenger changed the title MNT Refactor get_metadata_routing method in MetadataRequest to directly use _get_metadata_request instead MNT Refactor and partially deprecate get_metadata_routing method in _MetadataRequester Aug 17, 2025
@StefanieSenger StefanieSenger changed the title MNT Refactor and partially deprecate get_metadata_routing method in _MetadataRequester MNT Refactor and deprecate get_metadata_routing method in _MetadataRequester Aug 17, 2025
Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think otherwise LGTM.

hasattr(_obj, "get_metadata_routing")
or hasattr(_obj, "_get_metadata_request")
or isinstance(_obj, MetadataRouter)
):
raise AttributeError(
f"The given object ({_obj.__class__.__name__!r}) needs to either"
" implement the routing method `get_metadata_routing` or be a"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this error message is now wrong for simple consumers which do not inherit from BaseEstimator.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should'nt we limit the scope of process_routing to routers (eg a meta-estimator) only ? It seems here that we allow simple consumers and MetadataRouter instances as well. According to the docstring its only use is inside routers. Are they cases where we need to call it on simple consumers or MetadataRouter instances ? If not, maybe we should raise an error (which could help debugging).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh my, when I added that condition, it had made sense to me and now I cannot explain it anymore ... 😅

I now think you are right and I have removed the new condition, @antoinebaker.

On the question whether we ever call process_routing on MetadataRouter instances: It seems we do that in some of the tests in the test_metaestimators_metadata_routing.py module. But we shouldn't modify code just to match a test. Instead, I think the tests would need to be different / less abstract.

Copy link
Contributor

@antoinebaker antoinebaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @StefanieSenger for the PR, it does improve the code readability. Being unfamiliar with the metadata routing code, I had some difficulty understanding which object was returned by the get_metadata_routing methods, but it's much clearer now.

Here is a first round of comments, I'll do another pass later on, as I feel I don't fully grasp the metadata routing mechanics yet !

@@ -46,15 +45,20 @@
MetadataRouter
~~~~~~~~~~~~~~
The ``MetadataRouter`` objects are constructed via a `get_metadata_routing` method,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the renaming would be very useful for people unfamiliar with metadata routing like me: _get_metadata_request returns a MetadataRequest, get_metadata_router returns a MetadataRouter, all is clear. So +1 for a follow-up PR with the renaming !

@@ -91,8 +94,7 @@
which inherit from ``BaseEstimator``. This is done by attaching instances
of the ``RequestMethod`` descriptor to classes, which is done in the
``_MetadataRequester`` class, and ``BaseEstimator`` inherits from this mixin.
This mixin also implements the ``get_metadata_routing``, which meta-estimators
need to override, but it works for simple consumers as is.
This mixin also implements a ``_get_metadata_request`` method.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, meta-estimators who inherit from BaseEstimator will also have a _get_metadata_request method ? But I'm confused, what is the purpose of this method for routers, will it return a MetadataRequest ? Is it the self request for routers that are also consumers ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think you are right. However, returning the self requests is not the primary purpose of having _get_metadata_request for pure routers.

I will try to explain to the best of my current understanding:

All estimator objects have their own _get_metadata_request method. I think it is easier to just have the method than to define some distinction between routers and consumers, because many can be both, pure routers are seldom and being a consumer depends on whether a set_{method}_request method is set on them or not.

When set_fit_request is called for instance, _get_metadata_request method is used to check whether they could potentially consume metadata by checking the signatures of their methods and adding class level settings (see _get_class_level_metadata_request_values therefore). And thewhile, the estimator builds a _metadata_request attribute that stores all that information. If I understand correctly, this is identical to the self requests.

When we call _get_metadata_request on a router, it returns that stored object. It might have been updated via the user code in the meantime.

So _get_metadata_request can serve two different purposes, and the first one is needed for any estimator, the second one is actually only interesing for consumers.

I also find that very confusing. I wonder if a solution could be to make the if-else distinction in _get_metadata_request in other places instead and reduce the functionality of _get_metadata_request to either only build the MetadataRequest object or to retrieve it. In that case, we could also rename it. What do you think, @antoinebaker and @adrinjalali? (I am thinking about a future PR of cause.)

# for other reasons.
if hasattr(obj, "get_metadata_routing"):

# TODO(1.9): remove when get_metadata_routing is removed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_metadata_routing removed from where ? I guess from _MetadataRequester, but I'm not sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will add that. Thanks!

hasattr(_obj, "get_metadata_routing")
or hasattr(_obj, "_get_metadata_request")
or isinstance(_obj, MetadataRouter)
):
raise AttributeError(
f"The given object ({_obj.__class__.__name__!r}) needs to either"
" implement the routing method `get_metadata_routing` or be a"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should'nt we limit the scope of process_routing to routers (eg a meta-estimator) only ? It seems here that we allow simple consumers and MetadataRouter instances as well. According to the docstring its only use is inside routers. Are they cases where we need to call it on simple consumers or MetadataRouter instances ? If not, maybe we should raise an error (which could help debugging).

@antoinebaker
Copy link
Contributor

With this PR and after the deprecation of _MetadataRequester.get_metadata_routing, will we able to tell if an object if a router/consumer by duck typing ?

hasattr(obj, "_get_metadata_request") # consumer
hasattr(obj, "get_metadata_routing") # router

Copy link
Member Author

@StefanieSenger StefanieSenger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reviewing, @antoinebaker! I have modified a bit according to your review. Awaiting the next pass. :)

@@ -91,8 +94,7 @@
which inherit from ``BaseEstimator``. This is done by attaching instances
of the ``RequestMethod`` descriptor to classes, which is done in the
``_MetadataRequester`` class, and ``BaseEstimator`` inherits from this mixin.
This mixin also implements the ``get_metadata_routing``, which meta-estimators
need to override, but it works for simple consumers as is.
This mixin also implements a ``_get_metadata_request`` method.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think you are right. However, returning the self requests is not the primary purpose of having _get_metadata_request for pure routers.

I will try to explain to the best of my current understanding:

All estimator objects have their own _get_metadata_request method. I think it is easier to just have the method than to define some distinction between routers and consumers, because many can be both, pure routers are seldom and being a consumer depends on whether a set_{method}_request method is set on them or not.

When set_fit_request is called for instance, _get_metadata_request method is used to check whether they could potentially consume metadata by checking the signatures of their methods and adding class level settings (see _get_class_level_metadata_request_values therefore). And thewhile, the estimator builds a _metadata_request attribute that stores all that information. If I understand correctly, this is identical to the self requests.

When we call _get_metadata_request on a router, it returns that stored object. It might have been updated via the user code in the meantime.

So _get_metadata_request can serve two different purposes, and the first one is needed for any estimator, the second one is actually only interesing for consumers.

I also find that very confusing. I wonder if a solution could be to make the if-else distinction in _get_metadata_request in other places instead and reduce the functionality of _get_metadata_request to either only build the MetadataRequest object or to retrieve it. In that case, we could also rename it. What do you think, @antoinebaker and @adrinjalali? (I am thinking about a future PR of cause.)

# for other reasons.
if hasattr(obj, "get_metadata_routing"):

# TODO(1.9): remove when get_metadata_routing is removed
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will add that. Thanks!

hasattr(_obj, "get_metadata_routing")
or hasattr(_obj, "_get_metadata_request")
or isinstance(_obj, MetadataRouter)
):
raise AttributeError(
f"The given object ({_obj.__class__.__name__!r}) needs to either"
" implement the routing method `get_metadata_routing` or be a"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh my, when I added that condition, it had made sense to me and now I cannot explain it anymore ... 😅

I now think you are right and I have removed the new condition, @antoinebaker.

On the question whether we ever call process_routing on MetadataRouter instances: It seems we do that in some of the tests in the test_metaestimators_metadata_routing.py module. But we shouldn't modify code just to match a test. Instead, I think the tests would need to be different / less abstract.

@StefanieSenger
Copy link
Member Author

@antoinebaker:

With this PR and after the deprecation of _MetadataRequester.get_metadata_routing, will we able to tell if an object if a router/consumer by duck typing ?

hasattr(obj, "_get_metadata_request") # consumer
hasattr(obj, "get_metadata_routing") # router

No, unfortunately, we won't be able to do that, since both objects will have the _get_metadata_request method and its used for creating a request object as well as to retrieve it. (See this comment.)

It might be that pure routers (that do not have self requests), don't need it and in this case, they could maybe not inherit from the _MetadataRequest mixin class?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Metadata Routing all issues related to metadata routing, slep006, sample props module:utils
Projects
Status: To Review
Development

Successfully merging this pull request may close these issues.

3 participants