Description
Describe the issue linked to the documentation
From my understanding, there is no way to specify that some metadata is required with `set_*_request(...)`.
Doc: https://scikit-learn.org/stable/metadata_routing.html#api-interface
It is possible to specify that a method will error if some metadata is provided, but there is no converse option to error if it is not provided.
I've also read SLEP006 and the milestone issue #22893 and could not find a mention of this.
I am okay with this if that's the case but I would rather it be explicitly stated somewhere in the linked docs that this is not a feature that's supported.
Reason for clarification:
- Testing some custom evaluator that routes will fail if using a `GroupKFold` with a `ValueError`, i.e. `groups=` was never specified. I would have expected this to raise some specific `XXXMetaDataError`, similar to `UnsetMetadataPassedError` for when some metadata is passed which is not required.
def test_custom_evaluator_forwards_splitter_params_correctly():
    custom_evaluator = CustomEvaluator(..., splitter=GroupKFold(...), params={})

    # Failing due to required metadata (an exceptional case of metadata for Group* splitters)
    # However it's a generic `ValueError` and not a MetaData kind of error.
    with pytest.raises(ValueError, match="The 'groups' parameter should not be None."):
        custom_evaluator.blub(...)

    # All good, parameters specified
    custom_evaluator = CustomEvaluator(..., splitter=GroupKFold(...), params={"groups": groups})
    custom_evaluator.blub(...)
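The splitter behaviour itself can be reproduced without any custom evaluator. This is a minimal sketch, assuming scikit-learn and NumPy are installed; the toy arrays and the `split_error` helper are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Toy data, invented for this example: 6 samples in 3 groups.
X = np.arange(12).reshape(6, 2)
y = np.array([0, 1, 0, 1, 0, 1])
groups = np.array([0, 0, 1, 1, 2, 2])

splitter = GroupKFold(n_splits=3)


def split_error(groups):
    """Return the exception raised while splitting, or None on success."""
    try:
        # split() is a generator, so it has to be consumed before
        # the check on `groups` actually runs.
        list(splitter.split(X, y, groups=groups))
    except Exception as exc:
        return exc
    return None


err_without_groups = split_error(None)
err_with_groups = split_error(groups)

# The failure is a plain ValueError, not a metadata-routing error.
print(type(err_without_groups).__name__, err_without_groups)
```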
Naturally, I wanted to test if this works for an estimator too.
def test_custom_evaluator_forwards_estimator_params_correctly():
    estimator = DummyClassifier()

    # True, False, None, "sample_weight" can't indicate that this **needs** sample weights
    estimator.set_fit_request(sample_weight=...)

    custom_evaluator = CustomEvaluator(estimator, params={})

    # No error will be raised in any case, can't use this to test that sample_weight actually
    # got passed
    with pytest.raises(SomeMetaDataError):
        custom_evaluator.blub(...)

    # Will pass regardless, I just don't know if my CustomEvaluator actually did what it
    # was meant to
    custom_evaluator = CustomEvaluator(estimator, params={"sample_weight": sample_weight})
    custom_evaluator.blub(...)
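As long as the routing API has no "required" option, one possible workaround is for the custom evaluator to check for the metadata itself before routing anything. The following is only a sketch of that idea: `MissingMetadataError` and `check_required_metadata` are hypothetical names and are not part of scikit-learn.

```python
class MissingMetadataError(ValueError):
    """Hypothetical error type, NOT part of scikit-learn."""


def check_required_metadata(required, params):
    """Raise if any name in `required` is absent from `params` or is None.

    Both the function and the notion of "required" metadata are assumptions
    made for this sketch; scikit-learn's routing has no such concept.
    """
    missing = sorted(name for name in required if params.get(name) is None)
    if missing:
        raise MissingMetadataError(
            f"Required metadata not provided: {', '.join(missing)}"
        )


# Missing metadata produces a specific, testable error.
try:
    check_required_metadata({"groups"}, params={})
except MissingMetadataError as exc:
    message = str(exc)

# Provided metadata passes silently.
check_required_metadata({"groups"}, params={"groups": [0, 0, 1, 1]})
```

Subclassing `ValueError` keeps existing `pytest.raises(ValueError)` checks passing while allowing a test to target the more specific type.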
Suggest a potential alternative/fix
If there is no way to specify that some metadata is required, then this should be explicitly documented.
I would propose adding the following last bullet point to the documentation:
Here value can be:
* True: method requests a sample_weight. This means if the metadata is provided, it will be used, otherwise no error is raised.
* False: method does not request a sample_weight.
* None: router will raise an error if sample_weight is passed. This is in almost all cases the default value when an object is instantiated and ensures the user sets the metadata requests explicitly when a metadata is passed. The only exception are Group*Fold splitters.
* "param_name": if this estimator is used in a meta-estimator, the meta-estimator should forward "param_name" as sample_weight to this estimator. This means the mapping between the metadata required by the object, e.g. sample_weight and what is provided by the user, e.g. my_weights is done at the router level, and not by the object, e.g. estimator, itself.
# This line
# ------------------------
* It is not possible to indicate that a method **requires** metadata to be provided.
# ------------------------