DOC improve metadata routing example #27357
# meta_est.fit(X, y, aliased_sample_weight=my_weights):
# ... # this estimator (est), expects aliased_sample_weight as seen above
# meta_meta_est.fit(X, y, aliased_sample_weight=my_weights):
# ... # this estimator (meta_est), expects aliased_sample_weight as seen above
# self.estimator_.fit(X, y, aliased_sample_weight=aliased_sample_weight):
# ... # now est passes aliased_sample_weight's value as sample_weight,
# ... # now meta_est passes aliased_sample_weight's value as sample_weight,
# # which is expected by the sub-estimator
# self.estimator_.fit(X, y, sample_weight=aliased_sample_weight)
# ...
I'm very confused about this part. Do the comments belong to the code lines below or above them? Have `est` and `meta_est` been used correctly? What are the `...` supposed to mean?
In the next push, I'll suggest changing that part to what I think was meant. Please let me know if this makes sense.
est.fit(X, y, sample_weight=my_weights, clf_sample_weight=my_other_weights)
meta_est.fit(
X, y, sample_weight=my_weights, clf_sample_weight=my_other_weights
) # <-- `meta_clf_sample_weight=my_weights`?
I find it very confusing why this would throw an error:
meta_est.fit(X, y, meta_clf_sample_weight=my_weights, clf_sample_weight=my_other_weights)
From the sentence above, I don't understand the reasoning. Its wording ("only needs") also doesn't seem so strict that it would be impossible to do otherwise.
> I find it very confusing why this would throw an error:
Yes, that is confusing, we agree. But consumers do no validation or routing, hence they need non-aliased arguments. The routing information is only for the router that they'll be a part of, not for themselves.
The "only needs" refers to aliased metadata as opposed to non-aliased metadata. That is, the aliases are relevant only for sub-estimators of the meta-estimator.
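To make the point about consumers concrete, here is a minimal plain-Python sketch. `ToyConsumer` and `ToyRouter` are hypothetical stand-ins, not scikit-learn classes, and the alias lookup is deliberately simplified. It only illustrates that the alias is resolved by the router, while the consumer's own `fit` knows nothing but the real parameter name.

```python
class ToyConsumer:
    """Toy consumer: its ``fit`` only knows the real parameter name."""

    def __init__(self):
        # Maps real parameter name -> name the *router* should look for.
        self.fit_request = {}

    def set_fit_request(self, **requests):
        self.fit_request.update(requests)
        return self

    def fit(self, X, y, sample_weight=None):
        # No validation or alias lookup happens here.
        self.used_weights_ = sample_weight
        return self


class ToyRouter:
    """Toy router: resolves the child's aliases before calling its ``fit``."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y, **metadata):
        routed = {
            real_name: metadata[alias]
            for real_name, alias in self.estimator.fit_request.items()
            if alias in metadata
        }
        self.estimator.fit(X, y, **routed)
        return self


X, y = [[0.0], [1.0]], [0, 1]
clf = ToyConsumer().set_fit_request(sample_weight="clf_sample_weight")

# Through the router, the alias is translated to the real name.
ToyRouter(clf).fit(X, y, clf_sample_weight=[1.0, 2.0])
weights_via_router = clf.used_weights_

# Called directly, the consumer does not know its own alias.
try:
    clf.fit(X, y, clf_sample_weight=[1.0, 2.0])
    direct_call_failed = False
except TypeError:
    direct_call_failed = True
```

In this toy, the direct call fails with Python's ordinary "unexpected keyword argument" `TypeError`, which mirrors why an aliased name is rejected when passed to the object that declared the alias.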
Okay, understood. Though I'm afraid this will be a point where every user stumbles.
# - Alias only on the sub-estimator. This is useful if we don't want the
# meta-estimator to use the metadata, and we only want the metadata to be used
# by the sub-estimator.
est = RouterConsumerClassifier(
# - Alias only on the sub-estimator:
#
# This is useful when we don't want the meta-estimator to use the metadata, but
# the sub-estimator should.
meta_est = RouterConsumerClassifier(
estimator=ExampleClassifier().set_fit_request(sample_weight="aliased_sample_weight")
).set_fit_request(sample_weight=True)
print_routing(est)
).set_fit_request(
sample_weight=True
) # <-- why is `sample_weight=True` if `meta_est` shouldn't consume it?
print_routing(meta_est)
I don't understand the intent here.
Why not simply set the sub-estimator's `set_fit_request` to `True` and the meta-estimator's to `False`? Why do we need an alias here at all? Why is `sample_weight=True` if the meta-estimator shouldn't consume it?
If we do `True` and `False` here, then in effect you're saying `sample_weight` should be passed to the meta-estimator, and also shouldn't. Note that the consumer doesn't do any validation / routing, so if it receives `sample_weight`, it would use it. If we want to pass something which only goes to the sub-estimator, then it needs to have another name. You could also set `sample_weight=False` for the meta-estimator, that's irrelevant here.
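The name collision described here can be sketched with a small plain-Python function. `toy_meta_fit` is a hypothetical stand-in for the `fit` of a meta-estimator that is also a consumer; it is not scikit-learn code and skips all request handling. It only reports which party ends up seeing which weights.

```python
def toy_meta_fit(child_request, sample_weight=None, **metadata):
    """Toy ``fit`` of a router that is also a consumer (hypothetical)."""
    # The meta-estimator is itself a consumer: whatever arrives under the
    # real name ``sample_weight`` is used, with no validation in between.
    seen = {"meta": sample_weight}
    # ``child_request`` is the name under which the child asked for weights.
    if child_request == "sample_weight":
        seen["child"] = sample_weight
    else:
        seen["child"] = metadata.get(child_request)
    return seen


# Same name on both levels: the weights cannot reach only the child.
same_name = toy_meta_fit("sample_weight", sample_weight=[1, 2])

# Alias on the child: the weights bypass the meta-estimator.
aliased = toy_meta_fit("aliased_sample_weight", aliased_sample_weight=[1, 2])
```

With the shared name, both `"meta"` and `"child"` receive `[1, 2]`; with the alias, only `"child"` does and `"meta"` stays `None`.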
Then let's have it like this:
meta_est = RouterConsumerClassifier(
    estimator=ExampleClassifier().set_fit_request(sample_weight="aliased_sample_weight")
)
This way, it's less confusing at first glance, and I'll add a sentence explaining the special case.
Thanks @StefanieSenger
# meta-estimators are responsible for validating the given metadata.
# `get_routing_for_object` is a safe way to construct a
# `MetadataRouter` or a `MetadataRequest` from the given object.
# `get_routing_for_object` returns a copy of the MetadataRouter
# `get_routing_for_object` returns a copy of the MetadataRouter
# `get_routing_for_object` returns a copy of the `MetadataRouter`
Also, I think it's important to mention that the function can return a `MetadataRequest` if the object is a consumer.
Yes, this comes a little below:
# First, the :meth:`~utils.metadata_routing.get_routing_for_object` takes a
# meta-estimator (``self``) and returns a
# :class:`~utils.metadata_routing.MetadataRouter` or, a
# :class:`~utils.metadata_routing.MetadataRequest` if the object is a consumer,
# based on the output of the estimator's ``get_metadata_routing`` method.
Here in this case, it's a MetadataRouter.
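As a rough illustration of the behaviour discussed in this thread (a copy is returned, and its type depends on whether the object is a consumer or a meta-estimator), here is a plain-Python toy. All names (`ToyRequest`, `ToyRouterInfo`, `toy_get_routing_for_object`, and the two estimator classes) are hypothetical and only mimic the idea; the real function lives in `sklearn.utils.metadata_routing` and handles more cases.

```python
from copy import deepcopy


class ToyRequest:
    """Toy counterpart of a consumer's routing info."""

    def __init__(self, requests=None):
        self.requests = dict(requests or {})


class ToyRouterInfo:
    """Toy counterpart of a meta-estimator's routing info."""

    def __init__(self, children=None):
        self.children = dict(children or {})


def toy_get_routing_for_object(obj):
    # Return a copy so that callers cannot mutate the object's own routing.
    return deepcopy(obj.get_metadata_routing())


class ToyConsumerEst:
    def __init__(self):
        self._request = ToyRequest({"sample_weight": True})

    def get_metadata_routing(self):
        return self._request


class ToyMetaEst:
    def __init__(self, estimator):
        self._router = ToyRouterInfo(
            {"estimator": estimator.get_metadata_routing()}
        )

    def get_metadata_routing(self):
        return self._router


consumer = ToyConsumerEst()
meta = ToyMetaEst(consumer)

consumer_routing = toy_get_routing_for_object(consumer)
meta_routing = toy_get_routing_for_object(meta)

# Mutating the copy leaves the consumer's own request untouched.
consumer_routing.requests["sample_weight"] = "changed"
```

Working on a copy is what makes the helper "safe" in the sense used above: the caller can inspect or adjust the result without changing the estimator's own routing information.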
Thank you for reviewing, @adrinjalali. Now, I've gone through the whole document again and resolved everything. There are no open questions from my side.
I'd be happy if you could take another look. :)
CI failing.
…Senger/scikit-learn into doc_developing_metadata_routing
This should fix the CI failure.
Pff, long one, nice. A few points, but mostly looks okay.
# metadata routing as a consumer and how a meta-estimator can be upgraded to be
# a router. Imagine a simple classifier accepting ``sample_weight`` as a
# metadata on its ``fit`` and ``groups`` in its ``predict`` method:
- I'm not sure if "upgrade" is the right notion here
- here we're only talking about a consumer, I don't understand why you've added the router part in the description here O_o
Agreed, I will move this sentence down.
# Meta-estimators are responsible for validating the given metadata.
# `MetadataRouter.validate_metadata` maps the given metadata to the
# metadata required by the underlying estimator. `method` refers to the
The mapping is done by `route_params`, not `validate_metadata`.
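The distinction between the two steps can be sketched in plain Python. The functions `toy_validate_metadata` and `toy_route_params` below are hypothetical simplifications, not the `MetadataRouter` methods themselves: one only checks that every passed name was requested, the other translates aliases back to real parameter names per child.

```python
def toy_validate_metadata(child_requests, params):
    """Only checks: every passed key must be requested by some child."""
    requested = {
        alias for reqs in child_requests.values() for alias in reqs.values()
    }
    extra = set(params) - requested
    if extra:
        raise TypeError(f"got unexpected metadata: {sorted(extra)}")


def toy_route_params(child_requests, params):
    """Maps: for each child, translate aliases to real parameter names."""
    return {
        child: {
            real: params[alias]
            for real, alias in reqs.items()
            if alias in params
        }
        for child, reqs in child_requests.items()
    }


# The child asked for ``sample_weight`` under an alias.
child_requests = {"estimator": {"sample_weight": "aliased_sample_weight"}}
params = {"aliased_sample_weight": [0.5, 1.5]}

toy_validate_metadata(child_requests, params)  # passes silently
routed = toy_route_params(child_requests, params)

# The non-aliased name was never requested, so validation rejects it.
try:
    toy_validate_metadata(child_requests, {"sample_weight": [0.5, 1.5]})
    rejected = False
except TypeError:
    rejected = True
```

Here `routed` holds the weights under the real name `sample_weight` for the `"estimator"` child, while the validation function never changes or returns the metadata.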
Yes, that makes sense.
# For demonstration purpose, only one sub-estimator is fitted and its
# classes are attributed to the meta-estimator.
Not really, only meta-estimators like `ColumnTransformer` and `Pipeline` have more than one sub-estimator. Most other ones take only one sub-estimator.
@adrinjalali
I've modified it as suggested.
Fingers crossed that the CI failure doesn't appear again.
Would you also like to have a look, @glemaitre?
# For demonstration purpose, only one sub-estimator is fitted and its
# classes are attributed to the meta-estimator.
Oh yes, now I can see that, too.
A couple of typos, but it looks good. We will need #28371 to remove the current failure.
parameter and forwards data and metadata, is also a router. A consumer, on the
other hand, is an object which accepts and uses a certain given metadata. For
instance, an estimator taking into account ``sample_weight`` in its :term:`fit`
method is a consumer of ``sample_weight``. It is possible for an object to be
I would also break the paragraph when explaining that an estimator can consume and route data.
Co-authored-by: Guillaume Lemaitre <[email protected]>
So now, it is green. @adrinjalali, do you want to have a last pass on it?
So we need to fix something before merging in
This PR is essentially waiting for #28517 to be merged. Then afterwards I will come back here. This is because I had originally tried to modify the The changes in
Ok, then I'll wait for that to happen. Thanks for letting me know, @StefanieSenger :)
@ArturoAmorQ: The other PR is now merged, and this one is cleaned up and up to date. You can now review this PR if you wish.
What does this implement/fix? Explain your changes.
This PR aims to improve the Metadata Routing example for clarity and readability. I've also enhanced two error messages in `sklearn/utils/_metadata_requests.py` to make them easier to understand for inexperienced users.
I feel that this example is already pretty comprehensible and contains all the information needed (though it's a slow read). I've mainly worked on wording and variable naming.
I'm looking forward to your reviews, @adrinjalali and @glemaitre. :)
Any other comments?
Edit: It was a draft, but now it's ready to be a real PR.