[Question, Documentation] Metadata Routing, indicate metadata is required by a method #28324

Closed
eddiebergman opened this issue Jan 31, 2024 · 1 comment
Labels: Documentation, Needs Triage

@eddiebergman
Contributor

eddiebergman commented Jan 31, 2024

Describe the issue linked to the documentation

From my understanding, there is no way to specify that some metadata is required with set_*_request(...).

Doc: https://scikit-learn.org/stable/metadata_routing.html#api-interface

It is possible to specify that a method will error if some metadata is provided, but there is no converse option to error if it is not provided.

I've also read SLEP006 and the milestone issue #22893 and could not find a mention of this.

I am okay with this if that's the case but I would rather it be explicitly stated somewhere in the linked docs that this is not a feature that's supported.


Reason for clarification:

  • A test of a custom evaluator that routes metadata fails with a ValueError when it uses a GroupKFold and groups= was never specified. I would have expected this to raise some specific XXXMetaDataError, similar to UnsetMetadataPassedError, which is raised when some metadata is passed that was not requested.

import pytest
from sklearn.model_selection import GroupKFold

def test_custom_evaluator_forwards_splitter_params_correctly():
    custom_evaluator = CustomEvaluator(..., splitter=GroupKFold(...), params={})

    # Fails because required metadata is missing (an exceptional case for Group* splitters).
    # However, it's a generic `ValueError` and not a metadata-specific error.
    with pytest.raises(ValueError, match="The 'groups' parameter should not be None."):
        custom_evaluator.blub(...)

    # All good, parameters specified
    custom_evaluator = CustomEvaluator(..., splitter=GroupKFold(...), params={"groups": groups})
    custom_evaluator.blub()
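
A possible workaround is for the custom evaluator to check for required metadata itself before calling the splitter. The sketch below is only an illustration under that assumption: `MissingRequiredMetadataError` and `check_required_metadata` are hypothetical names, not part of scikit-learn. The error subclasses `ValueError` so that an existing `pytest.raises(ValueError)` check would still pass.

```python
class MissingRequiredMetadataError(ValueError):
    """Hypothetical error: metadata that a consumer needs was not provided."""


def check_required_metadata(params, required, owner):
    """Raise if any name in `required` is absent from `params` or is None.

    `params` is the metadata dict given to the evaluator, `required` the
    names the consumer cannot work without (e.g. "groups" for GroupKFold),
    and `owner` a label used only in the error message.
    """
    missing = sorted(name for name in required if params.get(name) is None)
    if missing:
        raise MissingRequiredMetadataError(
            f"{owner} requires metadata {missing}, but it was not provided."
        )


# Usage: validate before handing `params` to the splitter.
check_required_metadata({"groups": [0, 0, 1, 1]}, {"groups"}, "GroupKFold")

try:
    check_required_metadata({}, {"groups"}, "GroupKFold")
except MissingRequiredMetadataError as exc:
    error_message = str(exc)
```

A test could then assert on the dedicated error type instead of matching the splitter's generic message.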

Naturally, I wanted to test if this works for an estimator too.

import pytest
from sklearn.dummy import DummyClassifier

def test_custom_evaluator_forwards_estimator_params_correctly():
    estimator = DummyClassifier()

    # None of True, False, None, or a string alias can indicate that this **needs** sample weights
    estimator.set_fit_request(sample_weight=...)

    custom_evaluator = CustomEvaluator(estimator, params={})

    # No error will be raised in any case, so this can't be used to test that
    # sample_weight actually got passed
    with pytest.raises(SomeMetaDataError):
        custom_evaluator.blub(...)

    # Will pass regardless, I just don't know if my CustomEvaluator actually did what it
    # was meant to
    custom_evaluator = CustomEvaluator(estimator, params={"sample_weight": sample_weight})
    custom_evaluator.blub(...)
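
One way to check that the evaluator really forwarded `sample_weight` is to hand it a spy estimator that records what `fit` received. The sketch below is plain Python with no scikit-learn dependency: `RecordingEstimator` and `toy_evaluate` are hypothetical stand-ins (the latter for `CustomEvaluator.blub`). In a real test the spy would presumably subclass `sklearn.base.BaseEstimator` so that `set_fit_request` is available.

```python
class RecordingEstimator:
    """Hypothetical spy: remembers the metadata passed to `fit`."""

    def __init__(self):
        self.fit_metadata = None

    def fit(self, X, y, sample_weight=None):
        # Record what arrived so a test can assert on it afterwards.
        self.fit_metadata = {"sample_weight": sample_weight}
        return self


def toy_evaluate(estimator, X, y, params):
    """Stand-in for the evaluator under test; forwards `params` to `fit`."""
    return estimator.fit(X, y, **params)


X, y = [[0.0], [1.0]], [0, 1]
weights = [0.25, 0.75]

# With metadata supplied, the spy should have seen the same object.
spy = toy_evaluate(RecordingEstimator(), X, y, params={"sample_weight": weights})
forwarded = spy.fit_metadata["sample_weight"]

# Without metadata, the spy records None and no error is raised.
spy_without = toy_evaluate(RecordingEstimator(), X, y, params={})
not_forwarded = spy_without.fit_metadata["sample_weight"]
```

Asserting `forwarded is weights` gives the positive check that the request values alone cannot provide.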

Suggest a potential alternative/fix

If there is no way to specify that some metadata is required, then this should be documented explicitly.

I would propose adding the following last bullet point to the documentation:

Here value can be:

    * True: method requests a sample_weight. This means if the metadata is provided, it will be used, otherwise no error is raised.

    * False: method does not request a sample_weight.

    * None: router will raise an error if sample_weight is passed. This is in almost all cases the default value when an object is instantiated and ensures the user sets the metadata requests explicitly when a metadata is passed. The only exception are Group*Fold splitters.

    * "param_name": if this estimator is used in a meta-estimator, the meta-estimator should forward "param_name" as sample_weight to this estimator. This means the mapping between the metadata required by the object, e.g. sample_weight and what is provided by the user, e.g. my_weights is done at the router level, and not by the object, e.g. estimator, itself.
    
    # This line
    # ------------------------
    * It is not possible to indicate that a method **requires** metadata to be provided.
    # ------------------------
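
The four values above can be summarised with a small model. This is a simplified sketch for a single consumer, not scikit-learn's implementation: `resolve` and `UnrequestedMetadataError` are hypothetical names, and router-level validation across several consumers is left out. It shows that none of the four values turns missing metadata into an error.

```python
class UnrequestedMetadataError(ValueError):
    """Hypothetical stand-in for scikit-learn's UnsetMetadataPassedError."""


def resolve(name, request, provided):
    """Return the kwargs a router would forward for one metadata `name`.

    `request` is one of True, False, None or an alias string; `provided` is
    the dict of metadata the user passed to the router.
    """
    if request is None:
        # Unset request: passing the metadata is an error, omitting it is fine.
        if name in provided:
            raise UnrequestedMetadataError(
                f"{name!r} was passed but not explicitly requested."
            )
        return {}
    if request is False:
        # Explicitly not requested: never forwarded to this consumer.
        return {}
    key = name if request is True else request  # a string acts as an alias
    return {name: provided[key]} if key in provided else {}


weights = [1.0, 2.0]
used = resolve("sample_weight", True, {"sample_weight": weights})
absent = resolve("sample_weight", True, {})  # requested but missing: no error
ignored = resolve("sample_weight", False, {"sample_weight": weights})
aliased = resolve("sample_weight", "my_weights", {"my_weights": weights})
```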
@eddiebergman eddiebergman added Documentation Needs Triage Issue requires triage labels Jan 31, 2024
@eddiebergman eddiebergman changed the title [Question, Documentation] Metadata Routing, required metadata [Question, Documentation] Metadata Routing, indicate metadata is **required** by a method Jan 31, 2024
@eddiebergman eddiebergman changed the title [Question, Documentation] Metadata Routing, indicate metadata is **required** by a method [Question, Documentation] Metadata Routing, indicate metadata is required by a method Jan 31, 2024
@adrinjalali
Member

This is basically a duplicate / variation on #23920, so closing, and let's continue the discussion there.


2 participants