DOC improve metadata routing example #27357

Merged

Contributor

@StefanieSenger StefanieSenger commented Sep 13, 2023

What does this implement/fix? Explain your changes.

This PR aims to improve the Metadata Routing example for clarity and readability. I've also enhanced two error messages in sklearn/utils/_metadata_requests.py to make them easier for inexperienced users to understand.

I feel that this example is already pretty comprehensible and contains all the information needed (though it's a slow read). I've mainly worked on wording and variable naming.

I'm looking forward to your reviews, @adrinjalali and @glemaitre. :)

Any other comments?

Edit: It was a draft, but now it's ready to be a real PR.

github-actions bot commented Sep 13, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 7e53771. Link to the linter CI: here

Comment on lines 301 to 325
- # meta_est.fit(X, y, aliased_sample_weight=my_weights):
- # ... # this estimator (est), expects aliased_sample_weight as seen above
+ # meta_meta_est.fit(X, y, aliased_sample_weight=my_weights):
+ # ... # this estimator (meta_est), expects aliased_sample_weight as seen above
  # self.estimator_.fit(X, y, aliased_sample_weight=aliased_sample_weight):
- # ... # now est passes aliased_sample_weight's value as sample_weight,
+ # ... # now meta_est passes aliased_sample_weight's value as sample_weight,
  # # which is expected by the sub-estimator
  # self.estimator_.fit(X, y, sample_weight=aliased_sample_weight)
  # ...
Contributor Author

@StefanieSenger StefanieSenger Sep 13, 2023

I'm very confused about this part. Do the comments belong to the code lines below or above them? Have est and meta_est been used correctly? What are the ... supposed to mean?

In the next push, I'll suggest changing that part to what I think was meant. Please let me know if this makes sense.

- est.fit(X, y, sample_weight=my_weights, clf_sample_weight=my_other_weights)
+ meta_est.fit(
+     X, y, sample_weight=my_weights, clf_sample_weight=my_other_weights
+ )  # <-- `meta_clf_sample_weight=my_weights`?
Contributor Author

@StefanieSenger StefanieSenger Sep 13, 2023

I find it very confusing why this would throw an error:

meta_est.fit(X, y, meta_clf_sample_weight=my_weights, clf_sample_weight=my_other_weights)

From the sentence above, I don't understand the reasoning. Also, its wording ("only needs") doesn't sound so strict that doing it otherwise would be impossible.

Member

> I find it very confusing why this would throw an error:

Yes, that is confusing, we agree. But consumers do no validation or routing, hence they need non-aliased arguments. The routing information is only for the router that they'll be a part of, not for themselves.

The "only needs" refers to aliased metadata instead of non-aliased ones. As in, the aliases are relevant only for sub-estimators of the meta-estimator.

Contributor Author

Okay, understood. Though I'm afraid this will be a point where every user stumbles.
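
The point discussed in this thread can be illustrated with a small sketch. It deliberately does not use scikit-learn: `ToyConsumer`, `ToyRouter` and the `requests` dict are hypothetical stand-ins, and the translation loop is a simplified assumption about what a router does, not the library's implementation. It only shows why an alias works when the consumer sits inside a router but not when the consumer's `fit` is called directly.

```python
class ToyConsumer:
    """Hypothetical consumer: uses ``sample_weight`` in ``fit``, does no routing."""

    def __init__(self):
        # Maps the consumer's own parameter name to True (same name) or to
        # the alias under which an enclosing router should look it up.
        self.requests = {}

    def set_fit_request(self, *, sample_weight):
        self.requests["sample_weight"] = sample_weight
        return self

    def fit(self, X, y, sample_weight=None):
        # The signature only knows the real parameter name, never the alias.
        self.seen_weights_ = sample_weight
        return self


class ToyRouter:
    """Hypothetical router: translates aliases before calling the consumer."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y, **metadata):
        routed = {}
        for name, request in self.estimator.requests.items():
            if request is True:
                key = name
            elif isinstance(request, str):
                key = request
            else:
                continue  # not requested
            if key in metadata:
                routed[name] = metadata[key]
        self.estimator.fit(X, y, **routed)
        return self


X, y = [[0.0], [1.0]], [0, 1]
consumer = ToyConsumer().set_fit_request(sample_weight="clf_sample_weight")

# Inside a router, the alias is translated back to ``sample_weight``.
ToyRouter(consumer).fit(X, y, clf_sample_weight=[1.0, 2.0])
routed_weights = consumer.seen_weights_

# Called directly, nothing translates the alias, so the call is rejected.
try:
    consumer.fit(X, y, clf_sample_weight=[1.0, 2.0])
    direct_call_failed = False
except TypeError:
    direct_call_failed = True
```

In this toy model the alias is information for whoever calls the consumer, which is why the outermost object always has to be called with the non-aliased names.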

Comment on lines 410 to 445
- # - Alias only on the sub-estimator. This is useful if we don't want the
- # meta-estimator to use the metadata, and we only want the metadata to be used
- # by the sub-estimator.
- est = RouterConsumerClassifier(
+ # - Alias only on the sub-estimator:
+ #
+ # This is useful when we don't want the meta-estimator to use the metadata, but
+ # the sub-estimator should.
+ meta_est = RouterConsumerClassifier(
      estimator=ExampleClassifier().set_fit_request(sample_weight="aliased_sample_weight")
- ).set_fit_request(sample_weight=True)
- print_routing(est)
+ ).set_fit_request(
+     sample_weight=True
+ )  # <-- why is `sample_weight=True` if `meta_est` shouldn't consume it?
+ print_routing(meta_est)
Contributor Author

I don't understand the intent here.

Why not simply set the sub-estimator's set_fit_request to True and the meta-estimator's to False? Why do we need an alias here at all? Why is sample_weight=True if the meta-estimator shouldn't consume it?

Member

If we do True and False here, then in effect you're saying sample_weight should be passed to the meta-estimator, and also shouldn't. Note that the consumer doesn't do any validation / routing, so if it receives sample_weight, it would use it. If we want to pass something which only goes to the sub-estimator, then it needs to have another name. You could also set sample_weight=False for the meta-estimator, that's irrelevant here.

Contributor Author

Then let's have it like this:

meta_est = RouterConsumerClassifier(
    estimator=ExampleClassifier().set_fit_request(sample_weight="aliased_sample_weight")
)

This way, it's less confusing at first glance, and I'll add a sentence explaining the special case.
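
The reasoning in this thread (a router that is also a consumer uses `sample_weight` whenever it receives it, so metadata meant only for the sub-estimator needs a different name) can be sketched in plain Python. The functions below are hypothetical and only model the behaviour described above; they are not scikit-learn code, and `toy_router_consumer_fit` simply reports who would end up with the weights.

```python
def route_to_child(child_requests, metadata):
    """Map user-facing metadata keys to the child's own parameter names."""
    routed = {}
    for name, request in child_requests.items():
        if request is True:
            key = name              # requested under its own name
        elif isinstance(request, str):
            key = request           # requested under an alias
        else:
            continue                # not requested
        if key in metadata:
            routed[name] = metadata[key]
    return routed


def toy_router_consumer_fit(child_requests, sample_weight=None, **fit_params):
    """Hypothetical router-consumer ``fit``: report who receives weights."""
    metadata = dict(fit_params)
    if sample_weight is not None:
        # The meta-estimator consumes ``sample_weight`` itself and also
        # offers it to the child under the same name.
        metadata["sample_weight"] = sample_weight
    child_kwargs = route_to_child(child_requests, metadata)
    return {
        "meta_estimator_uses_weights": sample_weight is not None,
        "sub_estimator_uses_weights": "sample_weight" in child_kwargs,
    }


weights = [1.0, 2.0]

# Same name on both levels: passing ``sample_weight`` reaches both objects.
same_name = toy_router_consumer_fit(
    {"sample_weight": True}, sample_weight=weights
)

# Alias on the sub-estimator only: the weights bypass the meta-estimator.
aliased = toy_router_consumer_fit(
    {"sample_weight": "aliased_sample_weight"}, aliased_sample_weight=weights
)
```

Under these assumptions, `same_name` reports that both objects use the weights, while `aliased` reports that only the sub-estimator does.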

Member

@adrinjalali adrinjalali left a comment

# meta-estimators are responsible for validating the given metadata.
# `get_routing_for_object` is a safe way to construct a
# `MetadataRouter` or a `MetadataRequest` from the given object.
# `get_routing_for_object` returns a copy of the MetadataRouter
Member

Suggested change
- # `get_routing_for_object` returns a copy of the MetadataRouter
+ # `get_routing_for_object` returns a copy of the `MetadataRouter`

also, I think it's important to mention that the function can return a MetadataRequest if the object is a consumer.

Contributor Author

Yes, this comes a little below:

# First, the :meth:`~utils.metadata_routing.get_routing_for_object` takes a
# meta-estimator (``self``) and returns a
# :class:`~utils.metadata_routing.MetadataRouter` or, a
# :class:`~utils.metadata_routing.MetadataRequest` if the object is a consumer,
# based on the output of the estimator's ``get_metadata_routing`` method.

Here in this case, it's a MetadataRouter.

Contributor Author

@StefanieSenger StefanieSenger left a comment

Thank you for reviewing, @adrinjalali. Now, I've gone through the whole document again and resolved everything. There are no open questions from my side.
I'd be happy if you could take another look. :)

@StefanieSenger StefanieSenger marked this pull request as ready for review January 4, 2024 12:09
@adrinjalali
Member

CI Failing.

@StefanieSenger
Contributor Author

This should fix the CI failure.
@adrinjalali, please have a look at this PR.

Member

@adrinjalali adrinjalali left a comment

Pff, long one, nice. A few points, but mostly looks okay.

Comment on lines 88 to 90
# metadata routing as a consumer and how a meta-estimator can be upgraded to be
# a router. Imagine a simple classifier accepting ``sample_weight`` as a
# metadata on its ``fit`` and ``groups`` in its ``predict`` method:
Member

  • I'm not sure if "upgrade" is the right notion here
  • here we're only talking about a consumer, I don't understand why you've added the router part in the description here O_o

Contributor Author

Agreed, I will move this sentence down.

Comment on lines 177 to 179
# Meta-estimators are responsible for validating the given metadata.
# `MetadataRouter.validate_metadata` maps the given metadata to the
# metadata required by the underlying estimator. `method` refers to the
Member

the mapping is done by route_params, not validate_metadata.

Contributor Author

Yes, that makes sense.
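
The distinction raised in this thread (validation checks the metadata, while a separate routing step does the mapping) can be sketched as two toy functions. They are hypothetical, named with a `toy_` prefix to avoid suggesting they mirror the real `validate_metadata` or `route_params` signatures, and the validation rule shown (reject any key that no child requested) is a simplifying assumption.

```python
def requested_keys(requests):
    """Return {user-facing key: child's parameter name} for one child."""
    keys = {}
    for name, request in requests.items():
        if request is True:
            keys[name] = name        # requested under its own name
        elif isinstance(request, str):
            keys[request] = name     # requested under an alias
    return keys


def toy_validate_metadata(requests, params):
    """Validation step: reject metadata that the child has not requested."""
    unexpected = sorted(set(params) - set(requested_keys(requests)))
    if unexpected:
        raise TypeError(f"got unexpected metadata: {unexpected}")


def toy_route_params(requests, params):
    """Mapping step: rename user-facing keys to the child's parameter names."""
    keys = requested_keys(requests)
    return {keys[key]: value for key, value in params.items() if key in keys}


requests = {"sample_weight": "aliased_sample_weight"}
params = {"aliased_sample_weight": [1.0, 2.0]}

toy_validate_metadata(requests, params)      # requested key: passes silently
routed = toy_route_params(requests, params)  # alias renamed for the child

# The non-aliased name was not requested, so validation rejects it.
try:
    toy_validate_metadata(requests, {"sample_weight": [1.0, 2.0]})
    rejected = False
except TypeError:
    rejected = True
```

Keeping the two steps separate means validation never changes the metadata; only the routing step produces the renamed keyword arguments for the sub-estimator.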

Comment on lines 188 to 189
# For demonstration purpose, only one sub-estimator is fitted and its
# classes are attributed to the meta-estimator.
Member

not really, only metaestimators like ColumnTransformer and Pipeline have more than one subestimator. Most other ones take only one subestimator.

Contributor Author

@StefanieSenger StefanieSenger left a comment

@adrinjalali
I've made the changes as suggested.
Fingers crossed that the CI failure doesn't appear again.

Would you also like to have a look, @glemaitre?

Comment on lines 188 to 189
# For demonstration purpose, only one sub-estimator is fitted and its
# classes are attributed to the meta-estimator.
Contributor Author

Oh yes, now I can see that, too.

Member

@glemaitre glemaitre left a comment

A couple of typos, but it looks good. We will need #28371 to remove the current failure.

parameter and forwards data and metadata, is also a router. A consumer, on the
other hand, is an object which accepts and uses a certain given metadata. For
instance, an estimator taking into account ``sample_weight`` in its :term:`fit`
method is a consumer of ``sample_weight``. It is possible for an object to be
Member

I would also break the paragraph when explaining that an estimator can consume and route data.

@glemaitre
Member

So now, it is green. @adrinjalali, do you want to have a last pass on it?

StefanieSenger added a commit to StefanieSenger/scikit-learn that referenced this pull request Feb 23, 2024
@glemaitre
Member

So we need to fix something before merging into main.

@ArturoAmorQ ArturoAmorQ self-requested a review March 7, 2024 09:37
@StefanieSenger
Contributor Author

This PR is essentially waiting for #28517 to be merged. Then afterwards I will come back here. This is because I had originally tried to modify the UnsetMetadataPassedError here already, then did something better on the other PR. So, the _metadata_requests.py file and some of the test files will change, after merging the other PR.

The changes in metadata_routing.rst can still be reviewed, @ArturoAmorQ (I have seen that you've self requested a review today).

@ArturoAmorQ
Member

Ok, then I'll wait for that to happen. Thanks for letting me know, @StefanieSenger :)

@StefanieSenger
Contributor Author

@ArturoAmorQ: The other PR is now merged and this one here is cleaned up and up to date. You can now review this PR if you wish.

StefanieSenger added a commit to StefanieSenger/scikit-learn that referenced this pull request Apr 12, 2024
@adrinjalali adrinjalali merged commit 9985c0b into scikit-learn:main Apr 23, 2024