[Question, Documentation] Metadata Routing, indicate metadata is required by a method #28324

Closed
@eddiebergman

Describe the issue linked to the documentation

As far as I understand, set_*_request(...) offers no way to specify that some metadata is required.

Doc: https://scikit-learn.org/stable/metadata_routing.html#api-interface

It is possible to specify that a method should raise an error if some metadata is provided, but there is no converse option to raise an error if it is not provided.

I've also read SLEP006 and the milestone issue #22893 and could not find a mention of this.

I am okay with this if that's the case, but I would rather the linked docs state explicitly that this feature is not supported.


Reason for clarification:

  • Testing a custom evaluator that routes metadata: when it uses a GroupKFold splitter and groups= is never specified, it fails with a generic ValueError. I would have expected a specific XXXMetaDataError, similar to the UnsetMetadataPassedError raised when some metadata is passed without being requested.

import pytest
from sklearn.model_selection import GroupKFold

def test_custom_evaluator_forwards_splitter_params_correctly():
    custom_evaluator = CustomEvaluator(..., splitter=GroupKFold(...), params={})

    # Failing due to required metadata (an exceptional case of metadata for Group* splitters)
    # However it's a generic `ValueError` and not a MetaData kind of error.
    with pytest.raises(ValueError, match="The 'groups' parameter should not be None."):
        custom_evaluator.blub(...)

    # All good, parameters specified
    custom_evaluator = CustomEvaluator(..., splitter=GroupKFold(...), params={"groups": groups})
    custom_evaluator.blub(...)
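
For illustration, here is a minimal sketch of the kind of dedicated error meant above, raised by a router-side check before the splitter is ever called. `MissingRequiredMetadataError` and `check_required_metadata` are hypothetical names invented for this sketch and are not part of scikit-learn.

```python
class MissingRequiredMetadataError(ValueError):
    """Hypothetical error: required metadata was not passed to a router."""

    def __init__(self, missing, owner):
        self.missing = sorted(missing)
        self.owner = owner
        super().__init__(
            f"{owner} requires metadata {self.missing}, but it was not provided."
        )


def check_required_metadata(params, required, owner):
    """Raise if any name in `required` is absent from `params` or is None."""
    missing = {name for name in required if params.get(name) is None}
    if missing:
        raise MissingRequiredMetadataError(missing, owner)


# A Group* splitter needs `groups`, so the check runs before splitting.
try:
    check_required_metadata(params={}, required={"groups"}, owner="GroupKFold.split")
except MissingRequiredMetadataError as exc:
    error_message = str(exc)

# Passing the metadata satisfies the check and nothing is raised.
check_required_metadata(
    params={"groups": [0, 0, 1, 1]}, required={"groups"}, owner="GroupKFold.split"
)
```

Subclassing ValueError is a design choice in this sketch: a test that already uses pytest.raises(ValueError) would keep passing, while new tests could match the more specific type.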

Naturally, I wanted to test if this works for an estimator too.

from sklearn.dummy import DummyClassifier

def test_custom_evaluator_forwards_estimator_params_correctly():
    estimator = DummyClassifier()

    # None of True, False, None or an alias string can indicate that fit **needs** sample weights
    estimator.set_fit_request(sample_weight=...)

    custom_evaluator = CustomEvaluator(estimator, params={})

    # No error will be raised in any case, can't use this to test that sample_weight actually
    # got passed
    with pytest.raises(SomeMetaDataError):
        custom_evaluator.blub(...)

    # Will pass regardless, I just don't know if my CustomEvaluator actually did what it
    # was meant to
    custom_evaluator = CustomEvaluator(estimator, params={"sample_weight": sample_weight})
    custom_evaluator.blub(...)
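
One possible way to check forwarding without a "required" flag is a test double that records what fit actually received. The sketch below is dependency-free and makes several assumptions: `RecordingEstimator` and `ForwardingEvaluator` are hypothetical stand-ins, the evaluator simply unpacks `params` into fit instead of using scikit-learn's routing, and the estimator is not cloned.

```python
class RecordingEstimator:
    """Duck-typed stand-in estimator that records the metadata `fit` receives."""

    def __init__(self):
        self.fit_calls = []

    def fit(self, X, y, sample_weight=None):
        # Store what was actually received so a test can inspect it later.
        self.fit_calls.append({"sample_weight": sample_weight})
        return self


class ForwardingEvaluator:
    """Hypothetical stand-in for the custom evaluator: forwards `params` to fit."""

    def __init__(self, estimator, params):
        self.estimator = estimator
        self.params = params

    def blub(self, X, y):
        self.estimator.fit(X, y, **self.params)


X, y = [[0.0], [1.0], [2.0]], [0, 1, 0]
weights = [1.0, 2.0, 1.0]

# Case 1: nothing is passed, so the estimator sees the default None.
without = RecordingEstimator()
ForwardingEvaluator(without, params={}).blub(X, y)

# Case 2: the weights are passed and must show up in the recorded call.
with_weights = RecordingEstimator()
ForwardingEvaluator(with_weights, params={"sample_weight": weights}).blub(X, y)
```

A test can then assert on `fit_calls` directly. If the evaluator clones its estimator, the record would have to live somewhere shared (for example a class-level list), since a clone is a new instance.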

Suggest a potential alternative/fix

If there is no way to specify that some metadata is required, then the documentation should state this explicitly.

I would propose adding the following last bullet point to the documentation:

Here value can be:

    * True: method requests a sample_weight. This means if the metadata is provided, it will be used, otherwise no error is raised.

    * False: method does not request a sample_weight.

    * None: router will raise an error if sample_weight is passed. This is in almost all cases the default value when an object is instantiated and ensures the user sets the metadata requests explicitly when a metadata is passed. The only exceptions are Group*Fold splitters.

    * "param_name": if this estimator is used in a meta-estimator, the meta-estimator should forward "param_name" as sample_weight to this estimator. This means the mapping between the metadata required by the object, e.g. sample_weight and what is provided by the user, e.g. my_weights is done at the router level, and not by the object, e.g. estimator, itself.
    
    # This line
    # ------------------------
    * It is not possible to indicate that a method **requires** metadata to be provided.
    # ------------------------
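
The request values listed above can be read as a small decision table. The following is a toy model of those bullet points for a single method and a single metadata name, written only to make the gap visible; `resolve_request` is a hypothetical helper and does not reproduce scikit-learn's actual routing code.

```python
def resolve_request(request, name, provided):
    """Toy model of the request values listed above for one metadata `name`.

    Returns the keyword arguments that would be forwarded to the method.
    """
    if request is True:
        # Forward the metadata if present; stay silent if it is absent.
        return {name: provided[name]} if name in provided else {}
    if request is False:
        # Not requested: the metadata is not forwarded to this method.
        return {}
    if request is None:
        # Unset: passing the metadata is an error, omitting it is fine.
        # (scikit-learn raises UnsetMetadataPassedError here; a plain
        # ValueError keeps this sketch dependency-free.)
        if name in provided:
            raise ValueError(f"{name!r} was passed but its request is unset.")
        return {}
    # A string is an alias: look the metadata up under that other name.
    return {name: provided[request]} if request in provided else {}


weights = [1.0, 2.0]
forwarded_true = resolve_request(True, "sample_weight", {"sample_weight": weights})
missing_true = resolve_request(True, "sample_weight", {})
forwarded_alias = resolve_request("my_weights", "sample_weight", {"my_weights": weights})
missing_alias = resolve_request("my_weights", "sample_weight", {})
```

In this model no branch raises when the metadata is absent, which is the behaviour the proposed bullet point would document.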
