ENH Add metadata routing for RANSACRegressor
#28261
Conversation
…sume sample_weight
Not a complete review, but leaving the comments I worked on for now.
```python
    "y": y,
    "preserves_metadata": False,
    "estimator_routing_methods": ["fit", "predict", "score"],
    "requests_set_together": {"fit": ["predict", "score"]},
```
```diff
- "requests_set_together": {"fit": ["predict", "score"]},
+ "requests_set_together": {"fit": ["fit", "predict", "score"]},
```
Adding `fit` is unnecessary, because when `fit` is used, `set_fit_request` is already set.
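The point above can be sketched with plain Python: a hypothetical helper (`methods_to_set` is illustrative, not the actual test code) that derives which `set_{method}_request` calls the test needs from a `requests_set_together` entry. Listing `"fit"` inside its own entry would be redundant, because the method under test always gets its own request set.

```python
# Illustrative sketch, not the real test helper.
requests_set_together = {"fit": ["predict", "score"]}

def methods_to_set(method_name):
    # The method under test always gets its own request set first;
    # the dict only lists the extra methods tied to it.
    return [method_name] + requests_set_together.get(method_name, [])

print(methods_to_set("fit"))      # ['fit', 'predict', 'score']
print(methods_to_set("predict"))  # ['predict']
```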
Co-authored-by: Adrin Jalali <[email protected]>
…scikit-learn into metadata_RANSACRegressor
@adrinjalali Thanks for the review. I've implemented most of your suggestions and commented on the remaining few.
sklearn/linear_model/_ransac.py
Outdated
```python
        f"{e}, which is used internally by `RANSACRegressor.fit()`."
        f"Call `{estimator.__class__.__name__}.set_{{method}}_request("
        "{metadata}=True)` for each metadata."
    ),
```
`ConsumingRegressor.set_score_request(sample_weight=True)`
Do you mean I should add an example within the error message? I think this would clutter up people's consoles.
From my side, there is one remaining issue:
The reason for this is that … Actually, no regressor in scikit-learn currently seems to accept … Either this, or … what are the 100 reasons that speak against integrating …
Update: So, I have finally realized that I should never have put …
sklearn/linear_model/_ransac.py
Outdated
```python
        f"{e}, which is used internally by `RANSACRegressor.fit()`."
        f"Call `{estimator.__class__.__name__}.set_{{method}}_request("
        "{metadata}=True)` for each metadata."
    ),
```
I mean, the whole message, not just a part of it.
```python
#     )
#     return np.zeros(shape=(len(X),))
def score(self, X, y, sample_weight="default", metadata="default"):
    self.predict(X, sample_weight="default", metadata="default")
```
we don't need to add this, and our existing scorers certainly don't route metadata to predict.
No, the existing `score` methods don't route metadata to `predict`. I went through all of them (I believe) to confirm that. Thus, this line is really not needed here.

Out of curiosity: what's the problem with `sample_weight` in `predict`? I can see it in any kind of function except `predict`. I mean, if I modify my data before fitting, I need to modify it using the same pattern before predicting...
```python
with pytest.raises(UnsetMetadataPassedError, match=re.escape(msg)):
    method = getattr(instance, method_name)
    method(X, y, **method_kwargs)
if "fit" not in method_name:
    # set request on fit
    set_requests(estimator, methods=["fit"], metadata_name=key)
    # make sure error message corresponding to `method_name`
    # is used for test
    if method_name == "predict":
        set_requests(estimator, methods=["score"], metadata_name=key)
    if method_name == "score":
        set_requests(estimator, methods=["predict"], metadata_name=key)
    # fit before calling method
    fit_method = getattr(instance, "fit")
    fit_method(X, y, **method_kwargs)
    if method_name == "predict":
        method(X, **method_kwargs)
    else:
        method(X, y, **method_kwargs)
```
This section could be simplified as:

```python
method_requests = metaestimator.get("requests_set_together", {}).get(method_name, [])
set_requests(estimator, methods=method_requests, metadata_name=key)
with pytest.raises(UnsetMetadataPassedError, match=re.escape(msg)):
    method = getattr(instance, method_name)
    if "fit" not in method_name:
        instance.fit(X, y)
    try:
        # fit and partial_fit accept y, others don't.
        method(X, y, **method_kwargs)
    except TypeError:
        method(X, **method_kwargs)
```
we don't need to pass metadata to fit, when trying to check metadata routing in predict and score.
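The `try`/`except TypeError` dispatch used in the suggestion can be sketched in isolation with toy stand-ins (these functions are illustrative, not the actual test code):

```python
def call_method(method, X, y=None, **kwargs):
    """Call `method` with y if its signature accepts it, else without.

    Note: this also swallows a TypeError raised *inside* the method,
    which is a known caveat of this dispatch pattern.
    """
    try:
        return method(X, y, **kwargs)
    except TypeError:
        return method(X, **kwargs)

def fit(X, y, sample_weight=None):        # fit-like: accepts y
    return ("fit", len(X), y)

def predict(X, *, sample_weight=None):    # predict-like: rejects a second positional
    return ("predict", len(X))

print(call_method(fit, [1, 2], [0, 1]))  # ('fit', 2, [0, 1])
print(call_method(predict, [1, 2]))      # ('predict', 2)
```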
Yes, in the case that any other routing method is called from within `fit`, like `RANSACRegressor` calls `predict` and `score` from within `fit`, we need to pass metadata to `fit` as well. The goal is to make sure that an error is raised if the user sets the request on `fit`, but not on `score` for instance, and then tries to pass metadata (unaware that `score` is also consuming it from within `fit`). If no metadata is passed, the error cannot be raised. Your suggestion doesn't cover that case. This is the case I'm trying to modify this test for.

Note: When I wrote the test, I thought that we'd also prepare `estimator.predict` to be passed metadata (since `ConsumingRegressor.predict` was coded to receive metadata). This idea has changed since (it would be too large an endeavour for this PR) and I will adjust this test accordingly. Also, I've used some of your suggestions to simplify this test.
Though, I am still looking for a more generic version of it, that's not so specific for RANSAC. Something maybe like the following:
```python
if "fit" not in method_name:
    # set request on fit
    set_requests(estimator, methods=["fit"], metadata_name=key)
    # make sure error message corresponding to `method_name`
    # is used for test
    requests_set_together = metaestimator.get("requests_set_together", {})
    if method_name in requests_set_together.get("fit", {}):
        # rotate on setting requests, leaving out only method_name
        ...
    # fit before calling method
    instance.fit(X, y, **method_kwargs)
try:
    # `fit` and `partial_fit` accept y, others don't.
    method(X, y, **method_kwargs)
except TypeError:
    method(X, **method_kwargs)
```
But I need to think about it a little more. Next time it could be a method other than `fit` that internally routes metadata to more than one sub-estimator method.
```python
if method_name in ["predict", "score"]:
    # fit before calling method
    set_requests(estimator, methods=["fit"], metadata_name=key)
    if requests_set_together:
        set_requests(
            estimator,
            methods=requests_set_together["fit"],
            metadata_name=key,
        )
    fit_method = getattr(instance, "fit")
    fit_method(X, y, **method_kwargs, **extra_method_args)
if method_name == "predict":
    method(X, **method_kwargs, **extra_method_args)
else:
    method(X, y, **method_kwargs, **extra_method_args)
```
similar pattern applies here.
As for simpli…
sklearn/linear_model/_ransac.py
Outdated
```python
        f"{e}, which is used internally by `RANSACRegressor.fit()`."
        f"Call `{estimator.__class__.__name__}.set_{{method}}_request("
        "{metadata}=True)` for each metadata."
    ),
```
Yes, exactly. Because I'm confused as to why `e` is at the beginning of the message.
```python
set_requests(estimator, methods=["fit"], metadata_name=key)
# make sure error message corresponding to `method_name`
# is used for test
if method_name != "score":
    set_requests(estimator, methods=["score"], metadata_name=key)
```
The information on which metadata to set should be in the dictionary where we declare the tests. This:

```python
"requests_set_together": {"fit": ["fit", "score"]},
```

then we don't need to special-case here.
Yes, we still do need that special case here, because in this case we need to deviate from how the test flows. The test rotates through `fit`, `predict` and `score`, and if we're at anything other than `score` that we want to test for raising an error, we need to set the score request, not the ones set together with `fit`.
sklearn/linear_model/_ransac.py
Outdated
```python
        f"{e}, which is used internally by `RANSACRegressor.fit()`."
        f"Call `{estimator.__class__.__name__}.set_{{method}}_request("
        "{metadata}=True)` for each metadata."
    ),
```
Two things I'm not sure about here:

1. `metadata` and `method` are not replaced with actual values, and I'm not sure if it makes the error message more understandable.
2. This is a pattern repeated pretty much everywhere when we call `process_routing`; do we want to change all of them with this pattern? I don't think we should, it's really verbose. We can maybe improve the error message raised by validation inside `process_routing` though.
```
- requests_set_together: a dict that defines which set_{method}_requests need
  to be set together with the key; used in case a router routes to different
  methods of the sub-estimator from within the same meta-estimator's method.
  For instance, {"fit": ["score"]} would signal that
  `estimator.set_fit_request` requires `estimator.set_score_request` to be
  set as well.
```
This is basically the method mapping. I think when present, it should include all mappings instead of only adding what's on top of a one-to-one mapping, which would also make the test code easier and more straightforward. It can also be called `method_mapping`.
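The difference between the two encodings can be shown with plain dicts (illustrative values only; `derived` is a hypothetical helper expression, not code from this PR):

```python
# "Delta" form under discussion: only the extra methods tied to fit.
requests_set_together = {"fit": ["predict", "score"]}

# Full mapping as proposed: includes the one-to-one entry as well.
method_mapping = {"fit": ["fit", "predict", "score"]}

# The full form can be derived mechanically from the delta form:
derived = {m: sorted(set([m] + extra)) for m, extra in requests_set_together.items()}
print(derived)  # {'fit': ['fit', 'predict', 'score']}
```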
But if we include the one-to-one mapping, in this case fit to fit, then we would need a check whether `method_mapping` exists very early in the test and have a big divide in how to deal with the two different cases. I had thought about this before and came to the conclusion that it would make the test more complicated. After all, it's better to treat this further down in the test. This is why I designed it this way. Would you mind approving it as it is, @adrinjalali?
I went through the comments and reacted to all of them, @adrinjalali. Please have another look. I am wondering about the codecov warnings for l. 417 and 424 in … Do you also want to have a look, @glemaitre?
d3ac747 shows my idea of how to do the multiple mapping for methods.
sklearn/linear_model/_ransac.py
Outdated
```python
try:
    routed_params = process_routing(self, "fit", **fit_params)
except UnsetMetadataPassedError as e:
    raise UnsetMetadataPassedError(
        message=(
            f"{e}, which is used internally by `RANSACRegressor.fit()`. "
            f"Call `{estimator.__class__.__name__}.set_{{method}}_request("
            "{metadata}=True)` for each metadata."
        ),
        unrequested_params=e.unrequested_params,
        routed_params=e.routed_params,
    ) from e
```
I can't find the previous messages about this. I don't think we should modify the exception here. The exception, if it needs to be improved, needs to be handled for every estimator, and we shouldn't expect this much boilerplate from meta-estimators.
Okay, I have now reverted my re-raise of the …
Otherwise LGTM.
@OmarManzoor @glemaitre you could have a look here maybe?
Generally looks good. Just a minor change suggested. Also, code coverage shows two lines that are uncovered, which relate to the new ValueErrors that we are raising inside the test_metaestimators_metadata_routing file. Do we need to cover them, @adrinjalali?
Co-authored-by: Omar Salman <[email protected]>
I don't think we need to cover those two lines, but it's nice to make it explicit that we're not covering them.
Co-authored-by: Adrin Jalali <[email protected]>
Co-authored-by: Adrin Jalali <[email protected]>
LGTM. Thanks @StefanieSenger
Reference Issues/PRs

Towards #22893

What does this implement/fix? Explain your changes.

This PR implements metadata routing for `RANSACRegressor`.

Any other comments?
Resolved issue: `RANSACRegressor.fit()` routes to the sub-estimator's `fit`, `predict` and `score`. As a result, `fit` runs into `UnsetMetadataPassedError` when `set_fit_request` is set but `set_score_request` is not. The tests need to allow for several `set_{method}_request`s to be set at once, whatever is needed.