MAINT Make param validation more lenient towards downstream dependencies #25088

jeremiedbb · 2022-12-01T15:03:52Z

Currently validate_params checks that all entries in the constraints dict are actual parameters of the class. This can break third-party estimators (it was actually caught when trying imbalanced-learn with 1.2.0rc1). For instance, if someone does:

from sklearn.cluster import KMeans

class MyKMeans(KMeans):
    # this estimator does not expose "algorithm"                      should be here v
    def __init__(self, n_clusters, init, n_init, max_iter, tol, verbose, random_state):
        super().__init__(...)

    def fit(self, X):
        super().fit(X)
        return self

then it would raise an error because "algorithm" is in the constraints dict of the base class but is not a param of the new class.

I think we should not enforce that for third party estimators. For scikit-learn estimators we do want to enforce it but we already check that we have a 1 to 1 matching in the common tests:

scikit-learn/sklearn/utils/estimator_checks.py

Lines 4047 to 4055 in 2b34dfd

    
    validation_params = estimator_orig._parameter_constraints.keys()
    unexpected_params = set(validation_params) - set(estimator_params)
    missing_params = set(estimator_params) - set(validation_params)
    err_msg = (
        f"Mismatch between _parameter_constraints and the parameters of {name}."
        f"\nConsider the unexpected parameters {unexpected_params} and expected but"
        f" missing parameters {missing_params}"
    )
    assert validation_params == estimator_params, err_msg
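To make the 1 to 1 matching concrete, here is a minimal, self-contained sketch of the same set arithmetic applied to a subclass that hides a parameter. `Base`, `Sub`, `init_param_names` and `constraint_mismatch` are hypothetical names made up for illustration, and the constraint values are simplified placeholders; this is not the scikit-learn implementation.

```python
import inspect


class Base:
    # Hypothetical stand-in for a library estimator with declared constraints.
    _parameter_constraints = {"n_clusters": [int], "algorithm": [str]}

    def __init__(self, n_clusters=8, algorithm="lloyd"):
        self.n_clusters = n_clusters
        self.algorithm = algorithm


class Sub(Base):
    # Hypothetical third-party subclass that does not expose "algorithm".
    def __init__(self, n_clusters=8):
        super().__init__(n_clusters=n_clusters)


def init_param_names(cls):
    """Return the names of the __init__ parameters of cls, excluding self."""
    signature = inspect.signature(cls.__init__)
    return {name for name in signature.parameters if name != "self"}


def constraint_mismatch(cls):
    """Return (unexpected_params, missing_params) for cls."""
    validation_params = set(cls._parameter_constraints)
    estimator_params = init_param_names(cls)
    unexpected_params = validation_params - estimator_params
    missing_params = estimator_params - validation_params
    return unexpected_params, missing_params


base_mismatch = constraint_mismatch(Base)  # both sets are empty
sub_mismatch = constraint_mismatch(Sub)    # "algorithm" shows up as unexpected
print(base_mismatch, sub_mismatch)
```

Under these assumptions the base class passes a strict check, while the subclass inherits a constraint for a parameter it no longer has, which is the situation the description above is about.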

cc/ @glemaitre

betatim

LGTM and I like the reasoning.

Worth adding a test that makes sure this behaviour works also in the future?

betatim · 2022-12-01T15:36:55Z

An alternative could be to check if the estimator that is being checked lives inside the sklearn namespace and be more lenient for estimators outside. However, that seems like it would require quite a bit of re-jigging of things (like passing in the estimator itself). So I think I like the solution in this PR more.
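As a rough illustration of that alternative, a namespace test could look at the module in which the estimator's class is defined. This is only a sketch: `is_first_party` is a hypothetical helper, and the classes below fake their `__module__` so the example runs without scikit-learn installed.

```python
def is_first_party(estimator, package="sklearn"):
    """Return True if the estimator's class is defined inside `package`."""
    module = type(estimator).__module__
    # Compare on the dotted prefix so that e.g. "sklearnex" does not match.
    return module == package or module.startswith(package + ".")


class LibraryEstimator:
    pass


class DownstreamEstimator:
    pass


class LookalikeEstimator:
    pass


# Simulate where each class was defined (for illustration only).
LibraryEstimator.__module__ = "sklearn.cluster._kmeans"
DownstreamEstimator.__module__ = "imblearn.over_sampling._smote"
LookalikeEstimator.__module__ = "sklearnex.cluster"

for est in (LibraryEstimator(), DownstreamEstimator(), LookalikeEstimator()):
    print(type(est).__name__, is_first_party(est))
```

The sketch also shows why this route needs the estimator itself to be passed to the validation code, which is the re-jigging mentioned above.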

jeremiedbb · 2022-12-01T15:45:07Z

> Worth adding a test that makes sure this behaviour works also in the future?

Yes that's a good idea. Done

thomasjpfan · 2022-12-01T18:24:52Z

> An alternative could be to check if the estimator that is being checked lives inside the sklearn namespace and be more lenient for estimators outside.

This is the current behavior. check_param_validation itself is not a part of check_estimator and is only tested for scikit-learn estimators:

scikit-learn/sklearn/tests/test_common.py

Lines 460 to 466 in 04f3298

    
@pytest.mark.parametrize(
    "estimator", _tested_estimators(), ids=_get_check_estimator_ids
)
def test_check_param_validation(estimator):
    name = estimator.__class__.__name__
    _set_checking_parameters(estimator)
    check_param_validation(name, estimator)

thomasjpfan

Minor comment, otherwise LGTM

thomasjpfan · 2022-12-01T18:28:00Z

sklearn/utils/tests/test_param_validation.py

+    # does not raise, niether because "b" is not in the constraints dict, neither
+    # because "a" is not a parameter of the estimator.


Nit: I think this is a little easier to parse:

Suggested change

-    # does not raise, niether because "b" is not in the constraints dict, neither
-    # because "a" is not a parameter of the estimator.
+    # does not raise, even tho "b" is not in the constraints dict and
+    # "a" is not a parameter of the estimator.

Alternatively:

Suggested change

-    # does not raise, niether because "b" is not in the constraints dict, neither
-    # because "a" is not a parameter of the estimator.
+    # Does not raise an error, despite the fact that:
+    # 1) "a" is not a parameter of the estimator
+    # 2) "b" is not present in the constraints dict
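The behaviour this comment describes can be sketched with a simplified validator that only checks names present in both the params and the constraints dict. `validate_lenient` is a hypothetical helper and its constraints are plain tuples of allowed types, an assumption made for brevity that is much simpler than scikit-learn's constraint objects.

```python
def validate_lenient(constraints, params, caller_name):
    """Validate only the params that have a constraint.

    Params without a constraint are skipped, and constraints for names that
    are not params are never visited, so neither case raises.
    """
    for name, value in params.items():
        if name not in constraints:
            continue  # no constraint declared for this param: nothing to check
        allowed_types = constraints[name]
        if not isinstance(value, allowed_types):
            raise TypeError(
                f"The {name!r} parameter of {caller_name} must be an instance "
                f"of {allowed_types}, got {value!r} instead."
            )


# "a" is constrained but is not a param; "b" is a param but is unconstrained.
validate_lenient({"a": (int,)}, {"b": 0}, caller_name="ThirdPartyEstimator")

# A param that does have a constraint is still validated.
error_message = None
try:
    validate_lenient({"a": (int,)}, {"a": "zero"}, caller_name="ThirdPartyEstimator")
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

The point of the sketch is that leniency here only drops the requirement that the two sets of names match; values that are covered by a constraint are still checked.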

Micky774

LGTM pending some wording update on the inline comment

betatim · 2022-12-02T09:47:14Z

> This is the current behavior.

I'm confused. Why do we need this change then? This PR changes validate_parameter_constraints which is used in BaseEstimator._validate_params(), which is different from what happens in the tests.

My "alternative proposal" idea was to add logic to validate_parameter_constraints that would check if it is being called from an estimator that is part of scikit-learn proper or a third party estimator.


jeremiedbb · 2022-12-02T11:46:56Z

Merging with 3 approvals, thanks


do not force 1 to 1 matching

b50f64e

jeremiedbb added the To backport, No Changelog Needed, Quick Review, and Validation labels Dec 1, 2022

jeremiedbb added this to the 1.2 milestone Dec 1, 2022

github-actions bot added the module:utils label Dec 1, 2022

betatim approved these changes Dec 1, 2022


add test

b580050

thomasjpfan approved these changes Dec 1, 2022


Micky774 approved these changes Dec 1, 2022


jeremiedbb added 2 commits December 2, 2022 11:28

Merge remote-tracking branch 'upstream/main' into param-val-downstream-dep

91aa15d

reword

3748023

betatim approved these changes Dec 2, 2022


jeremiedbb merged commit 5e25f8e into scikit-learn:main Dec 2, 2022

jeremiedbb added a commit to jeremiedbb/scikit-learn that referenced this pull request Dec 6, 2022

MAINT Make param validation more lenient towards downstream dependencies (scikit-learn#25088)

5d594f3
