MNT more informative error message for UnsetMetadataPassedError #28517

Merged

Contributor

@StefanieSenger commented Feb 23, 2024

What does this implement/fix? Explain your changes.

This PR aims to improve the information provided when UnsetMetadataPassedError is raised.

The error message now explicitly states which method of the meta-estimator calls the consumer method for which the user is asked to set a metadata request.

More detailed explanation

This is especially important when a method in the router is routing to several methods in a consumer. See #28261 for RANSACRegressor as an example, where RANSACRegressor.fit() routes to both estimator.fit() and estimator.score(). The user will receive the following error:

sklearn.exceptions.UnsetMetadataPassedError: [sample_weight] are passed but are not explicitly set as requested or not requested for LinearRegression.score, which is used within RANSACRegressor.fit. Call LinearRegression.set_score_request({metadata}=True) for each metadata.

when they have called LinearRegression.set_fit_request(sample_weight=True) but were not aware that they also need to call set_score_request(sample_weight=True). The error message they currently receive in such a case would confuse many people.

Additionally, for composed methods like fit_transform and fit_predict, the message now suggests using set_fit_request instead of setting the request for the composed method (which would result in an error).

It also works across several layers of nesting.

@adrinjalali :)

github-actions bot commented Feb 23, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 96e42a0. Link to the linter CI: here

@StefanieSenger changed the title from "ENH more informative error message for UnsetMetadataPassedError" to "MNT more informative error message for UnsetMetadataPassedError" on Feb 23, 2024
Member

@adrinjalali left a comment

And I would like to see a test where we test basically this case:

est = MetaEstimator1(
    MetaEstimator2(
        Consumer()
    )
).fit(X, y, metadata="blah")

to see what the error message looks like.

@StefanieSenger
Contributor Author

Thank you @adrinjalali.
I've added a test, please have a look.
Metadata names such as metadata or sample_weight are not spelled out individually in the suggested call. They appear as a list in the error message and can be one or many.

@adrinjalali self-requested a review February 27, 2024 11:07
Member

@adrinjalali left a comment

I think it's okay that the message only includes the immediate parent and not the whole tree, i.e. in the test, the message doesn't mention that the meta estimator was inside a pipeline.
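
The "immediate parent only" behaviour can be illustrated with a toy model. This is a sketch under stated assumptions, not scikit-learn's implementation: `ToyConsumer`, `ToyRouter`, and `ToyUnsetMetadataError` are hypothetical names, and the message wording is borrowed from the error quoted in the PR description.

```python
class ToyUnsetMetadataError(ValueError):
    """Hypothetical stand-in for UnsetMetadataPassedError."""


class ToyConsumer:
    """Leaf object that accepts metadata only if it was explicitly requested."""

    def __init__(self, name):
        self.name = name
        self.requested = set()  # metadata names explicitly requested for fit

    def fit(self, parent, caller, **metadata):
        unrequested = sorted(set(metadata) - self.requested)
        if unrequested:
            raise ToyUnsetMetadataError(
                f"[{', '.join(unrequested)}] are passed but are not explicitly "
                f"set as requested or not requested for {self.name}.fit, which "
                f"is used within {parent}.{caller}."
            )
        return self


class ToyRouter:
    """Router that forwards metadata to its child and names itself as parent."""

    def __init__(self, name, child):
        self.name = name
        self.child = child

    def fit(self, parent=None, caller=None, **metadata):
        # Each router passes down only its own name, so the consumer never
        # learns about routers further up the tree.
        self.child.fit(parent=self.name, caller="fit", **metadata)
        return self


def nested_error_message():
    """Fit a two-level nested router and return the resulting error text."""
    inner = ToyRouter("MetaEstimator2", ToyConsumer("Consumer"))
    outer = ToyRouter("MetaEstimator1", inner)
    try:
        outer.fit(metadata="blah")
    except ToyUnsetMetadataError as exc:
        return str(exc)
    return None
```

In this toy model the message names `MetaEstimator2.fit` but never `MetaEstimator1`, which mirrors the trade-off described in the comment above.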

@@ -403,7 +403,7 @@ def _check_warnings(self, *, params):
                     "warning, or to True to consume and use the metadata."
                 )

-    def _route_params(self, params):
+    def _route_params(self, params, parent, caller_method):
Member

The parent, caller order is not consistent here. I think parent, caller makes more sense than caller, parent. Also, caller would be enough; it doesn't need to be caller_method.

Contributor Author

True, I fixed that and now it's parent, caller everywhere.

Comment on lines 443 to 444
if self.method in COMPOSITE_METHODS:
    callee_method = COMPOSITE_METHODS[self.method][0]
Member

This doesn't seem right. The issue might be in the transform of a fit_transform; it's not always fit.

Also, the order of methods in COMPOSITE_METHODS[self.method] shouldn't matter.

Contributor Author

This was to prevent a message like this one:

sklearn.exceptions.UnsetMetadataPassedError: [metadata] are passed but are not explicitly set as requested or not requested for ConsumingTransformer.fit_transform, which is used within Pipeline.fit_transform. Call ConsumingTransformer.set_fit_transform_request({metadata}=True/False) for each metadata you want to request/ignore.

Here, set_fit_transform_request() is wrongly suggested even though it doesn't exist.

If only one of set_fit_request or set_transform_request is set, the user would get this error message:

raise ValueError(
    f"Conflicting metadata requests for {', '.join(conflicts)} while"
    f" composing the requests for {name}. Metadata with the same name"
    f" for methods {', '.join(COMPOSITE_METHODS[name])} should have the"
    " same request value."
)

So here, we have to deal with the case where neither of the component methods has a metadata request set. I've modified the message so that both method requests are now displayed in it, and I've also added a test for that.
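
As a rough sketch of the idea (not the code merged in this PR), a message builder can expand a composite method into its component setters so that a non-existent `set_fit_transform_request` is never suggested. The helper name `build_unset_metadata_message` and the contents of `COMPOSITE_METHODS` below are assumptions made for illustration; the wording follows the error messages quoted in this thread.

```python
# Assumed mapping of composite methods to the simple methods they combine.
COMPOSITE_METHODS = {
    "fit_transform": ["fit", "transform"],
    "fit_predict": ["fit", "predict"],
}


def build_unset_metadata_message(unrequested, owner, method, parent, caller):
    """Build an error message naming the caller and the valid setter methods.

    Hypothetical helper for illustration; not part of scikit-learn's API.
    """
    # A composite method has no setter of its own, so list its components.
    callee_methods = COMPOSITE_METHODS.get(method, [method])
    setters = " and ".join(
        f"`{owner}.set_{m}_request({{metadata}}=True/False)`"
        for m in callee_methods
    )
    return (
        f"[{', '.join(unrequested)}] are passed but are not explicitly set as "
        f"requested or not requested for {owner}.{method}, which is used "
        f"within {parent}.{caller}. Call {setters} for each metadata you want "
        "to request/ignore."
    )
```

For example, calling it with `method="fit_transform"` would mention both `set_fit_request` and `set_transform_request`, while a simple method such as `score` yields a single `set_score_request` suggestion.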

@@ -987,7 +1022,7 @@ def _route_params(self, *, params, method):
                 res.update(child_params)
         return res

-    def route_params(self, *, caller, params):
+    def route_params(self, *, caller, params, parent=None):
Member

I don't think we should have this default, then; the user should always pass it. If it's None, we'll fail.

Contributor Author

Alright, and I have adjusted the places where it was used in the code, especially where the scorer functions are tested.
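
The effect of dropping the default can be sketched in plain Python: a keyword-only parameter without a default makes the omission fail loudly at the call site. `route_params_sketch` is a hypothetical stand-in, not the real `MetadataRouter.route_params`.

```python
def route_params_sketch(*, caller, params, parent):
    """Toy stand-in: every argument is keyword-only and required."""
    return {"caller": caller, "parent": parent, "params": dict(params)}


def call_without_parent():
    """Return the TypeError text raised when `parent` is omitted."""
    try:
        route_params_sketch(caller="fit", params={})
    except TypeError as exc:
        return str(exc)
    return None
```

With this signature, callers such as tests for scorer functions have to state the parent explicitly instead of silently getting None.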

Contributor Author

@StefanieSenger left a comment

@adrinjalali Thanks for your review.
I've addressed all your suggestions, please have another look.

@StefanieSenger
Contributor Author

I have also slightly changed the error message raised in MetadataRouter.validate_metadata (which caused me a lot of confusion today in another PR). The reason this error is raised here is that no child method expects those metadata, not that the user forgot to call set_{method}_request.

CodeCov is failing, but I suspect that we can ignore that?

@glemaitre
Member

Yep codecov seems to be a false positive.

@glemaitre self-requested a review March 5, 2024 17:46
Member

@glemaitre left a comment

I'm happy with the improved error message.

However, I would like to make sure that the API is something that we want. I'm under the impression that parent usually represents self and is actually the request.owner. But I might have overlooked something.

@@ -434,12 +440,28 @@ def _route_params(self, params):
             elif alias in args:
                 res[prop] = args[alias]
         if unrequested:
+            if parent.__class__.__name__ != "str":
Member

This should be documented in the docstring. Do you have a concrete example where parent is a string? Also, what is the problem in this case?

Contributor Author

I know this part is awkward. It's there to make sure we only look into the parent's class name if we use an actual scikit-learn-made parent class.
I had encountered problems with the scoring functions when they are used for routing metadata. In sklearn/metrics/tests/test_score_objects.py, the name of the parent class would be str (not a string), if I remember correctly. I can't reproduce that anymore, though; it popped up when running tests, and I cannot figure out which one or how.

The change I made was this one.

At the moment, I don't know what to do about it.

Contributor Author

This distinction is no longer necessary, since MetadataRouter.owner is always a string.
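
For context on the check being discussed, here is a minimal sketch of what `parent.__class__.__name__ != "str"` distinguishes, written with the more idiomatic `isinstance`. The helper `parent_name` and the class `ToyMetaEstimator` are hypothetical and exist only for this illustration.

```python
class ToyMetaEstimator:
    """Hypothetical meta-estimator used only for this illustration."""


def parent_name(parent):
    """Return a display name for `parent`, given an object or a string."""
    # Roughly what `parent.__class__.__name__ != "str"` guards against;
    # unlike the name comparison, isinstance also accepts str subclasses.
    if isinstance(parent, str):
        return parent
    return parent.__class__.__name__
```

Once the name always arrives as a string, as it does with an owner attribute, the branch becomes unnecessary.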

if parent.__class__.__name__ != "str":
    parent = parent.__class__.__name__
if self.method in COMPOSITE_METHODS:
    callee_methods = list(COMPOSITE_METHODS[self.method])
Member

Do we really need to call the list constructor?

Contributor Author

You're right, it doesn't need to be wrapped in a list. I'm surprised, though, that list(COMPOSITE_METHODS[self.method]) and COMPOSITE_METHODS[self.method] behave the same way.
They're both ['fit', 'transform'], not [['fit', 'transform']] and ['fit', 'transform'] respectively ...

I've changed that in the code.
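
The behaviour that caused the surprise is standard Python: `list(iterable)` builds a new list from the iterable's elements (a shallow copy when given a list), whereas square brackets around a list nest it. A small sketch, with the `COMPOSITE_METHODS` contents assumed for illustration:

```python
COMPOSITE_METHODS = {"fit_transform": ["fit", "transform"]}  # assumed mapping

original = COMPOSITE_METHODS["fit_transform"]
copied = list(original)   # shallow copy: same elements, new list object
wrapped = [original]      # a list literal around a list is what nests it
```

The copy only matters if the result is mutated afterwards; for read-only use, indexing the mapping directly is enough.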

@@ -169,7 +169,9 @@ def fit(self, X, y, **fit_params):
         # we can use provided utility methods to map the given metadata to what
         # is required by the underlying estimator. Here `method` refers to the
         # parent's method, i.e. `fit` in this example.
-        routed_params = request_router.route_params(params=fit_params, caller="fit")
+        routed_params = request_router.route_params(
+            params=fit_params, caller="fit", parent=self.__class__
Member

I'm confused here. I thought that parent should be self directly?

Member

Another thing that I'm realising now: is parent different from router.owner? I'm under the impression that this is the same information, and thus we would not need parent.

Contributor Author

I'm not sure about this. I will have to come back to this next week.

Contributor Author

@StefanieSenger left a comment

> However, I would like to make sure that the API is something that we want. I'm under the impression that parent usually represents self and is actually the request.owner. But I might have overlooked something.

I am not sure. Right now, I'm passing parent through the call stack within _metadata_request.py via:

li. 1609 process_routing()
li. 1075 MetadataRouter.route_params()
li. 644 MetadataRequest._route_params()
li. 464 MethodMetadataRequest._route_params()

I have to check whether I can skip the first step and take the information from MetadataRouter.owner instead. That might well be possible. However, there will then be another problem for routers that are functions.

Contributor Author

@StefanieSenger left a comment

I went through it with fresh eyes and: yes! The parent param is the same as MetadataRouter.owner. We can take the information from there (which simplifies our test and example files), and as it's a string, we don't even need the distinction before raising the UnsetMetadataPassedError. This solves several problems at once. Good that you hinted at that, @glemaitre. Maybe you want to have another look now?

@glemaitre
Member

I'll have another look now.

@glemaitre self-requested a review March 15, 2024 16:04
Member

@glemaitre left a comment

LGTM. I might have chosen another name for parent, but it seems that this is the name we use throughout the documentation, so let's keep it.

Member

@adrinjalali left a comment

I'm still somewhat on the fence about this. But if y'all think it helps users, then okay.

Note that this only provides the parent info, and in nested cases that itself might not be enough. Example:

est = BaggingClassifier(Pipeline([("inner", LogisticRegression())]))
est.fit([[1], [1]], [1,1], sample_weight=[1])

...
UnsetMetadataPassedError: [sample_weight] are passed but are not explicitly set as requested or not requested for LogisticRegression.fit, which is used within Pipeline.fit. Call `LogisticRegression.set_fit_request({metadata}=True/False)` for each metadata you want to request/ignore.

The information about BaggingClassifier is not there. Happy to merge this, but I think we could do better.
