MNT more informative error message for UnsetMetadataPassedError
#28517
And I would like to see a test where we test basically this case:
est = MetaEstimator1(
MetaEstimator2(
Consumer()
)
).fit(X, y, metadata="blah")
to see what the error message looks like.
Co-authored-by: Adrin Jalali <[email protected]>
Thank you @adrinjalali.
I think it's okay that the message only includes the immediate parent and not the whole tree, i.e. in the test, the message doesn't mention that the meta estimator was inside a pipeline.
sklearn/utils/_metadata_requests.py
Outdated
@@ -403,7 +403,7 @@ def _check_warnings(self, *, params):
                 "warning, or to True to consume and use the metadata."
             )

-    def _route_params(self, params):
+    def _route_params(self, params, parent, caller_method):
The `parent`, `caller` order is not consistent here. I think `parent, caller` makes more sense than `caller, parent`. Also, `caller` would be enough; it doesn't need to be `caller_method`.
True, I fixed that and now it's `parent, caller` everywhere.
sklearn/utils/_metadata_requests.py
Outdated
if self.method in COMPOSITE_METHODS:
    callee_method = COMPOSITE_METHODS[self.method][0]
This doesn't seem right. The issue might be in `transform` of a `fit_transform`; it's not always `fit`.
Also, the order of the methods in `COMPOSITE_METHODS[self.method]` shouldn't matter.
This was to prevent a message like this one:
sklearn.exceptions.UnsetMetadataPassedError: [metadata] are passed but are not explicitly set as requested or not requested for ConsumingTransformer.fit_transform, which is used within Pipeline.fit_transform. Call `ConsumingTransformer.set_fit_transform_request({metadata}=True/False)` for each metadata you want to request/ignore.
Here, `set_fit_transform_request()` is wrongly suggested as existing.
If either `set_fit_request` or `set_transform_request` is set, the user would get this error message:
raise ValueError(
    f"Conflicting metadata requests for {', '.join(conflicts)} while"
    f" composing the requests for {name}. Metadata with the same name"
    f" for methods {', '.join(COMPOSITE_METHODS[name])} should have the"
    " same request value."
)
So here, we'd have to deal with the case where neither of the composite method's parts has a metadata request set. I've modified the message so that both method requests are now displayed in it, and I've also added a test for that.
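The composite-method handling described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `build_message` is a hypothetical helper, and the `COMPOSITE_METHODS` dict is a simplified stand-in for the table in `_metadata_requests.py`. It only shows how a message could name the setters of every simple method instead of a non-existent `set_fit_transform_request`.

```python
# Simplified stand-in for the COMPOSITE_METHODS table discussed above (assumption).
COMPOSITE_METHODS = {
    "fit_transform": ["fit", "transform"],
    "fit_predict": ["fit", "predict"],
}


def build_message(unrequested, owner, method, parent, caller):
    """Hypothetical helper: format an UnsetMetadataPassedError-style message."""
    # A composite method has no setter of its own, so list the setters of
    # every simple method it is composed of; otherwise use the method itself.
    callee_methods = COMPOSITE_METHODS.get(method, [method])
    setters = "".join(
        f".set_{m}_request({{metadata}}=True/False)" for m in callee_methods
    )
    return (
        f"[{', '.join(unrequested)}] are passed but are not explicitly set as "
        f"requested or not requested for {owner}.{method}, which is used "
        f"within {parent}.{caller}. Call `{owner}{setters}` for each metadata "
        "you want to request/ignore."
    )


message = build_message(
    ["metadata"], "ConsumingTransformer", "fit_transform", "Pipeline", "fit_transform"
)
print(message)
```

Because the helper iterates over all entries of the composite method, the order of the entries only affects the order in which the setters are listed, not which ones appear.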
sklearn/utils/_metadata_requests.py
Outdated
@@ -987,7 +1022,7 @@ def _route_params(self, *, params, method):
             res.update(child_params)
         return res

-    def route_params(self, *, caller, params):
+    def route_params(self, *, caller, params, parent=None):
I don't think we should have this default then; the user should always pass it. If it's `None`, we'll fail.
Alright, and I have adjusted the places where it was used in the code, especially where the scorer functions are tested.
@adrinjalali Thanks for your review.
I've addressed all your suggestions, please have another look.
I have also slightly changed the error message raised in
CodeCov is failing, but I suspect that we can ignore that?
Yep, codecov seems to be a false positive.
I'm happy with the improved error message.
However, I would like to make sure that the API is something that we want. I'm under the impression that `parent` usually represents `self` and is actually the `request.owner`. But I might have overlooked something.
sklearn/utils/_metadata_requests.py
Outdated
@@ -434,12 +440,28 @@ def _route_params(self, params):
             elif alias in args:
                 res[prop] = args[alias]
         if unrequested:
+            if parent.__class__.__name__ != "str":
This should be documented in the docstring. Do you have a concrete example where `parent` is a string? Also, what is the problem in this case?
I know this part is awkward. It's there to make sure we only look into the parent's class name if we use an actual scikit-learn-made parent class.
I had encountered problems with the scoring functions when used for routing metadata. In sklearn/metrics/tests/test_score_objects.py the name of the parent class would be `str` (not a string), if I remember correctly. I can't reproduce that anymore, though; it popped up when running tests, and I cannot figure out which one or how.
The change I made was this one.
At the moment, I don't know what to do about it.
Now this distinction is not necessary anymore, since `MetadataRouter.owner` is always a string.
sklearn/utils/_metadata_requests.py
Outdated
if parent.__class__.__name__ != "str":
    parent = parent.__class__.__name__
if self.method in COMPOSITE_METHODS:
    callee_methods = list(COMPOSITE_METHODS[self.method])
Do we really need to call the `list` constructor?
You're right, it doesn't need to be wrapped in a list. I'm super surprised, though, that both `list(COMPOSITE_METHODS[self.method])` and `COMPOSITE_METHODS[self.method]` behave in the same way.
They're both `['fit', 'transform']`, and not `[['fit', 'transform']]` and `['fit', 'transform']` respectively ...
I've changed that in the code.
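The behaviour discussed in this thread can be checked in isolation. A small sketch, assuming a simplified `COMPOSITE_METHODS` dict whose values are plain lists: calling `list()` on a list iterates over its elements and returns a shallow copy, whereas a list literal around it produces the nested form.

```python
# Simplified stand-in for the table in _metadata_requests.py (assumption).
COMPOSITE_METHODS = {"fit_transform": ["fit", "transform"]}

methods = COMPOSITE_METHODS["fit_transform"]
copied = list(methods)  # iterates over the elements: a shallow copy
wrapped = [methods]     # list literal: one element, the original list

print(copied == methods)  # compares contents
print(copied is methods)  # compares identity
print(wrapped)
```

So the `list(...)` call only matters if the caller intends to mutate the result without touching the shared table; for read-only use it is redundant.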
@@ -169,7 +169,9 @@ def fit(self, X, y, **fit_params):
         # we can use provided utility methods to map the given metadata to what
         # is required by the underlying estimator. Here `method` refers to the
         # parent's method, i.e. `fit` in this example.
-        routed_params = request_router.route_params(params=fit_params, caller="fit")
+        routed_params = request_router.route_params(
+            params=fit_params, caller="fit", parent=self.__class__
I'm confused here. I thought that `parent` should be `self` directly?
Another thing that I'm realising now: is `parent` different from `router.owner`? I'm under the impression that this is the same information and thus we would not need to have `parent`.
I'm not sure about this. I will have to come back to this next week.
> However, I would like to make sure that the API is something that we want. I'm under the impression that parent usually represents self and is actually the request.owner. But I might have overlooked something.
I am not sure. Right now, I'm passing `parent` through the call stack within `_metadata_requests.py` via:
li. 1609 `process_routing()`
li. 1075 `MetadataRouter.route_params()`
li. 644 `MetadataRequest._route_params()`
and
li. 464 `MethodMetadataRequest._route_params()`
I have to check whether I can skip the first step and pick the information from `MetadataRouter.self.owner`. That might well be possible. However, there will then be another problem for routers that are functions.
I went through it with fresh eyes and: yes! The `parent` param is the same as `MetadataRouter.owner`. We can take the information from there (which simplifies our test and example files), and as it's a string, we don't even need the distinction before raising the `UnsetMetadataPassedError`. This solves several problems at once. Good that you hinted at that, @glemaitre. Maybe you want to have another look now?
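To illustrate why a separate `parent` argument becomes unnecessary, here is a toy sketch. `ToyRouter`, `ToyRequest` and `UnsetMetadataError` are hypothetical stand-ins invented for this example and are not scikit-learn classes; the only assumption carried over from the discussion is that the router's `owner` is already a string.

```python
class UnsetMetadataError(ValueError):
    """Toy stand-in for UnsetMetadataPassedError (assumption)."""


class ToyRequest:
    """Hypothetical consumer-side request holding owner and method names."""

    def __init__(self, owner, method, requested=()):
        self.owner = owner  # class name of the consumer, as a string
        self.method = method
        self.requested = set(requested)

    def route_params(self, params, parent, caller):
        # Any passed key without an explicit request triggers the error.
        unrequested = [key for key in params if key not in self.requested]
        if unrequested:
            raise UnsetMetadataError(
                f"[{', '.join(unrequested)}] are passed but not set for "
                f"{self.owner}.{self.method}, which is used within "
                f"{parent}.{caller}."
            )
        return dict(params)


class ToyRouter:
    """Hypothetical router whose `owner` string doubles as the parent name."""

    def __init__(self, owner, request):
        self.owner = owner
        self.request = request

    def route_params(self, *, caller, params):
        # No separate `parent` argument: the router passes its own name down.
        return self.request.route_params(params, parent=self.owner, caller=caller)


router = ToyRouter("Pipeline", ToyRequest("LogisticRegression", "fit"))
try:
    router.route_params(caller="fit", params={"sample_weight": [1]})
except UnsetMetadataError as exc:
    error_text = str(exc)
    print(error_text)
```

Callers only supply `caller` and `params`, which is why test and example code no longer needs to pass the parent explicitly in this toy setup.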
I'll have another look now.
LGTM. I might have thought of giving `parent` another name, but it seems that this is the name we chose throughout the documentation, so let's keep it.
I'm still somewhat on the edge about this. But if y'all think it helps users, then okay.
Note that this only provides the parent info, and in nested cases that itself might not be enough. Example:
est = BaggingClassifier(Pipeline([("inner", LogisticRegression())]))
est.fit([[1], [1]], [1,1], sample_weight=[1])
...
UnsetMetadataPassedError: [sample_weight] are passed but are not explicitly set as requested or not requested for LogisticRegression.fit, which is used within Pipeline.fit. Call `LogisticRegression.set_fit_request({metadata}=True/False)` for each metadata you want to request/ignore.
The information about `BaggingClassifier` is not there. Happy to merge this, but I think we could do better.
What does this implement/fix? Explain your changes.
This PR aims to improve the information provided when `UnsetMetadataPassedError` is raised. The error message now explicitly states which method of the meta-estimator calls the consumer method for which the user is asked to set a metadata request.
More detailed explanation
This is especially important when a method in the router routes to several methods in a consumer. See #28261 for `RANSACRegressor` as an example, where `RANSACRegressor.fit()` routes to both `estimator.fit()` and `estimator.score()`. The user will receive the following error:
sklearn.exceptions.UnsetMetadataPassedError: [sample_weight] are passed but are not explicitly set as requested or not requested for LinearRegression.score, which is used within RANSACRegressor.fit. Call LinearRegression.set_score_request({metadata}=True) for each metadata.
when they have called `LinearRegression.set_fit_request(sample_weight=True)` but were not aware that they need to call `set_score_request(sample_weight=True)` as well. At the moment, the error message they would receive in such a case would confuse many people.
Additionally, for composed methods like `fit_transform` and `fit_predict`, the message now suggests using `set_fit_request` instead of setting the request for the composed method (which would result in an error).
It also works over several layers.
@adrinjalali :)