MNT Refactor and deprecate `get_metadata_routing` method in `_MetadataRequester` #31695
with pytest.warns(UserWarning, match="`transform` method which consumes metadata"):
    rct = RoutingCustomTransformer(
        estimator=CustomTransformer()
        .set_fit_request(metadata=True)
        .set_transform_request(metadata=True)
    )
    rct.fit_transform(X=[[1]], metadata="metadata")
This test was added to make sure the warning is also shown when a transformer is not a pure consumer (but also a router). It refers to the change in `sklearn.base.TransformerMixin.fit_transform`.
-def test_default_requests():
-    class OddEstimator(BaseEstimator):
+def test_class_level_requests():
+    class StubEstimator(BaseEstimator):
I'm renaming `OddEstimator` because the old naming implies we are intentionally demonstrating wrong usage, but that's not our purpose here.
@@ -210,19 +210,19 @@ def test_request_type_is_valid(val, res):


 @config_context(enable_metadata_routing=True)
-def test_default_requests():
-    class OddEstimator(BaseEstimator):
+def test_class_level_requests():
I'm renaming `test_default_requests` because the term "default" does not refer to default / auto routing here and could be misunderstood. What is meant are the class level requests.
Log of my thought process (that hopefully helps me move forward): The decoupling of I wonder if we want to fully automate the setting of self requests, or if it should be integrated in Since then I need to pass an object into But potentially, those could both be merged into the same param, since passing
I understood that this PR cannot do much about the double-handling of |
I've experimented with my above idea to add a self-request automatically to routers, but it fails for routers that are pure routers but not also consumers. Given that, I no longer see a benefit in removing or refining the If that input is done by calling Also, the original change of this PR, the de-tangling of At this point, I am tempted to think this refactoring is complete, unless there is some specific thing that we want to do with This PR makes the code more readable, because we can clearly see if we deal with a consumer based on the method called on an object. It also adds some improvements to the internal and the API docs and personally, it has given me more insights and more ideas for refactorings (which I will open separate PRs for). What is left is that I will go through our docs and examples to check whether the right method is used everywhere. I will do that and then un-draft this PR. Do you consider it complete, @adrinjalali?
@@ -46,15 +45,20 @@
MetadataRouter
~~~~~~~~~~~~~~
The ``MetadataRouter`` objects are constructed via a `get_metadata_routing` method,
> The `MetadataRouter` objects are constructed via a `get_metadata_routing` method

@adrinjalali, do you think it makes sense to rename `get_metadata_routing` into `get_metadata_router`?
That would be more precise and would spare people from wondering whether there is a difference between "metadata router" and "metadata routing". In fact, this method returns a `MetadataRouter` object.
That would be for a new PR and could be done without deprecation (because we are still in the experimental stage), I believe?
I wouldn't mind the rename, but it needs to go through a deprecation cycle. Maybe a different PR.
I think the renaming would be very useful for people unfamiliar with metadata routing like me: `_get_metadata_request` returns a `MetadataRequest`, `get_metadata_router` returns a `MetadataRouter`, all is clear. So +1 for a follow-up PR with the renaming!
Consumer-only classes such as simple estimators return a serialized
version of this class as the output of `get_metadata_routing()`.
That's not true (anymore). `get_routing_for_object()`, `_get_metadata_request()` and `get_metadata_routing()` all return unserialised output.
@@ -1529,20 +1532,6 @@ def _get_metadata_request(self):

         return requests

-    def get_metadata_routing(self):
A side effect of this change is that this method won't be displayed in the docs for the consumers anymore. I don't think it's too bad, since we want users to use `get_routing_for_object` anyways.
I think we should also do a single release deprecation cycle for this (shorter than usual).
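A deprecation cycle of this kind could look roughly like the sketch below. It is not the PR's code: `ToyRequester`, its placeholder return value, the warning text, and the choice of `FutureWarning` are all assumptions made for illustration.

```python
import warnings


class ToyRequester:
    """Hypothetical stand-in for a consumer mixin; not scikit-learn code."""

    def _get_metadata_request(self):
        # Placeholder value standing in for a request object.
        return {"fit": {}}

    def get_metadata_routing(self):
        # Old public entry point: keeps working for now, but warns and
        # delegates to the private method that replaces it.
        warnings.warn(
            "`get_metadata_routing` is deprecated on this toy class; "
            "use `_get_metadata_request` instead.",
            FutureWarning,
            stacklevel=2,
        )
        return self._get_metadata_request()
```

Callers of the old method still get the same object back during the cycle, so only the warning tells them to switch.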
@adrinjalali, I did the deprecation of |
I think otherwise LGTM.
sklearn/utils/_metadata_requests.py
Outdated
        hasattr(_obj, "get_metadata_routing")
        or hasattr(_obj, "_get_metadata_request")
        or isinstance(_obj, MetadataRouter)
    ):
        raise AttributeError(
            f"The given object ({_obj.__class__.__name__!r}) needs to either"
            " implement the routing method `get_metadata_routing` or be a"
This error message is now wrong for simple consumers which do not inherit from `BaseEstimator`.
Shouldn't we limit the scope of `process_routing` to routers (e.g. a meta-estimator) only? It seems here that we allow simple consumers and `MetadataRouter` instances as well. According to the docstring, its only use is inside routers. Are there cases where we need to call it on simple consumers or `MetadataRouter` instances? If not, maybe we should raise an error (which could help debugging).
Oh my, when I added that condition, it made sense to me and now I cannot explain it anymore ... 😅
I now think you are right and I have removed the new condition, @antoinebaker.
On the question of whether we ever call `process_routing` on `MetadataRouter` instances: it seems we do that in some of the tests in the `test_metaestimators_metadata_routing.py` module. But we shouldn't modify code just to match a test. Instead, I think the tests would need to be different / less abstract.
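To make the discussed guard concrete, here is a minimal sketch of the check without the extra `_get_metadata_request` condition. `ToyMetadataRouter`, `check_is_router_like` and the two toy classes are hypothetical names; the real check lives inside scikit-learn and may differ in detail.

```python
class ToyMetadataRouter:
    """Hypothetical placeholder for a router object; not scikit-learn's class."""


def check_is_router_like(obj):
    """Raise AttributeError unless ``obj`` looks like a router."""
    if not (
        hasattr(obj, "get_metadata_routing")
        or isinstance(obj, ToyMetadataRouter)
    ):
        raise AttributeError(
            f"The given object ({obj.__class__.__name__!r}) needs to either"
            " implement the routing method `get_metadata_routing` or be a"
            " `ToyMetadataRouter` instance."
        )


class ToyMetaEstimator:
    # A router: it exposes the public routing method.
    def get_metadata_routing(self):
        return ToyMetadataRouter()


class ToyPlainConsumer:
    # A simple consumer: it only exposes the private request method.
    def _get_metadata_request(self):
        return {}
```

With this guard, `ToyMetaEstimator()` and `ToyMetadataRouter()` pass, while `ToyPlainConsumer()` is rejected, which is the narrower scope suggested in the review.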
Thanks @StefanieSenger for the PR, it does improve the code readability. Being unfamiliar with the metadata routing code, I had some difficulty understanding which object was returned by the `get_metadata_routing` methods, but it's much clearer now.
Here is a first round of comments, I'll do another pass later on, as I feel I don't fully grasp the metadata routing mechanics yet!
@@ -91,8 +94,7 @@
 which inherit from ``BaseEstimator``. This is done by attaching instances
 of the ``RequestMethod`` descriptor to classes, which is done in the
 ``_MetadataRequester`` class, and ``BaseEstimator`` inherits from this mixin.
-This mixin also implements the ``get_metadata_routing``, which meta-estimators
-need to override, but it works for simple consumers as is.
+This mixin also implements a ``_get_metadata_request`` method.
If I understand correctly, meta-estimators that inherit from `BaseEstimator` will also have a `_get_metadata_request` method? But I'm confused: what is the purpose of this method for routers, will it return a `MetadataRequest`? Is it the self request for routers that are also consumers?
Yes, I think you are right. However, returning the self requests is not the primary purpose of having `_get_metadata_request` for pure routers.
I will try to explain to the best of my current understanding:
All estimator objects have their own `_get_metadata_request` method. I think it is easier to just have the method than to define some distinction between routers and consumers, because many can be both, pure routers are rare, and being a consumer depends on whether a `set_{method}_request` method is set on them or not.
When `set_fit_request` is called, for instance, the `_get_metadata_request` method is used to check whether they could potentially consume metadata by checking the signatures of their methods and adding class level settings (see `_get_class_level_metadata_request_values` for that). Meanwhile, the estimator builds a `_metadata_request` attribute that stores all that information. If I understand correctly, this is identical to the self requests.
When we call `_get_metadata_request` on a router, it returns that stored object. It might have been updated via the user code in the meantime.
So `_get_metadata_request` can serve two different purposes: the first one is needed for any estimator, the second one is actually only interesting for consumers.
I also find that very confusing. I wonder if a solution could be to make the if-else distinction in `_get_metadata_request` in other places instead and reduce the functionality of `_get_metadata_request` to either only build the `MetadataRequest` object or to retrieve it. In that case, we could also rename it. What do you think, @antoinebaker and @adrinjalali? (I am thinking about a future PR of course.)
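The two roles described above (build the requests from class level settings, or return the stored object) can be sketched with a toy class. This is only an illustration of the if-else pattern: `ToyConsumer`, `_class_level_requests` and the dict-based request are hypothetical simplifications, not scikit-learn's implementation.

```python
import copy


class ToyConsumer:
    # Hypothetical class level defaults for the "fit" method.
    _class_level_requests = {"fit": {"sample_weight": None}}

    def _get_metadata_request(self):
        if hasattr(self, "_metadata_request"):
            # Role 2: retrieve the stored, possibly user-modified requests.
            return copy.deepcopy(self._metadata_request)
        # Role 1: build fresh requests from the class level defaults.
        return copy.deepcopy(self._class_level_requests)

    def set_fit_request(self, **kwargs):
        # Start from whatever the getter returns, update it, then store the
        # result on the instance so that later calls take the "retrieve" path.
        requests = self._get_metadata_request()
        requests["fit"].update(kwargs)
        self._metadata_request = requests
        return self
```

Before `set_fit_request` is called, the getter builds from the class defaults; afterwards it returns the stored instance state, and the class defaults stay untouched.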
sklearn/utils/_metadata_requests.py
Outdated
    # for other reasons.
    if hasattr(obj, "get_metadata_routing"):

    # TODO(1.9): remove when get_metadata_routing is removed
`get_metadata_routing` removed from where? I guess from `_MetadataRequester`, but I'm not sure.
Yes, I will add that. Thanks!
With this PR and after the deprecation of
    hasattr(obj, "_get_metadata_request")  # consumer
    hasattr(obj, "get_metadata_routing")  # router
Co-authored-by: antoinebaker <[email protected]>
…/scikit-learn into refactor_router_requester
Thanks for reviewing, @antoinebaker! I have modified a bit according to your review. Awaiting the next pass. :)
No, unfortunately, we won't be able to do that, since both objects will have the It might be that pure routers (that do not have self requests), don't need it and in this case, they could maybe not inherit from the |
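A toy sketch of why an attribute check alone cannot separate the two roles. It assumes, as explained earlier in the thread, that the attribute shared by both kinds of object is `_get_metadata_request`; all class and function names here are hypothetical.

```python
class ToyRequesterMixin:
    """Hypothetical mixin that every toy estimator inherits from."""

    def _get_metadata_request(self):
        return {}


class ToyConsumer(ToyRequesterMixin):
    pass


class ToyRouter(ToyRequesterMixin):
    # A router adds the public routing method on top of the mixin.
    def get_metadata_routing(self):
        return "router"


def looks_like_consumer(obj):
    return hasattr(obj, "_get_metadata_request")


def looks_like_router(obj):
    return hasattr(obj, "get_metadata_routing")
```

Here `looks_like_consumer(ToyRouter())` is also true, because the router inherits the mixin; only the router check is selective.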
What does this implement/fix? Explain your changes.
This PR decouples the double-handling of `MetadataRouter` and `MetadataRequest` objects in metadata routing (a bit), by no longer using the `get_metadata_routing` method in `MetadataRequest`, and instead using `_get_metadata_request` directly.
This change affects several parts of our metadata routing implementation and the tests, where it simplifies the code and makes it more intuitive to read.
The goal of this PR is to increase maintainability.
While at it, I also:
- renamed the `__metadata_request__*` class attribute into `__metadata_request__{method}` and made sure our internal documentation mentions that it is a class attribute, to make it easier to distinguish at first sight from the `_metadata_request` instance attribute present in consumers
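The difference between the two attributes can be shown with a short sketch. It assumes one detail of plain Python: a name like `__metadata_request__fit` written in a class body is name-mangled, so a lookup has to scan `vars()` of each class in the MRO. `ToyBase`, `ToyChild` and `class_level_requests` are hypothetical, and scikit-learn's own lookup may differ.

```python
import inspect


class ToyBase:
    # Class attribute following the `__metadata_request__{method}` pattern.
    # Python stores it under the mangled name `_ToyBase__metadata_request__fit`.
    __metadata_request__fit = {"sample_weight": True}


class ToyChild(ToyBase):
    __metadata_request__fit = {"groups": None}


def class_level_requests(cls, method):
    """Collect class level request values for ``method`` along the MRO."""
    suffix = f"__metadata_request__{method}"
    collected = {}
    # Walk from the most generic base to ``cls`` so subclasses override bases.
    for klass in reversed(inspect.getmro(cls)):
        for name, value in vars(klass).items():
            if name.endswith(suffix):
                collected.update(value)
    return collected


est = ToyChild()
# Instance attribute: lives on the object, not on the class.
est._metadata_request = class_level_requests(ToyChild, "fit")
```

The class attributes describe defaults shared by all instances, while `_metadata_request` only exists on the instance it was set on.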