
MAINT Make param validation more lenient towards downstream dependencies #25088


Merged: 4 commits, Dec 2, 2022

Conversation

jeremiedbb
Member

@jeremiedbb jeremiedbb commented Dec 1, 2022

Currently, validate_params checks that every entry in the constraints dict is an actual parameter of the class. This can break third-party estimators (it was caught when trying imbalanced-learn with 1.2.0rc1). For instance, if someone does:

from sklearn.cluster import KMeans

class MyKMeans(KMeans):
    # this estimator does not expose "algorithm"                      should be here v
    def __init__(self, n_clusters, init, n_init, max_iter, tol, verbose, random_state):
        super().__init__(...)

    def fit(self, X):
        super().fit(X)
        return self

then it would raise an error because "algorithm" is in the constraints dict of the base class but is not a parameter of the new class.
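To make the failure mode concrete, here is a minimal, self-contained sketch that does not use scikit-learn's real code. The function name `validate_constraints_sketch`, the `strict` flag, and the use of plain type tuples as constraints are all assumptions made for illustration; the real validation uses richer constraint objects.

```python
def validate_constraints_sketch(constraints, params, caller_name, strict=False):
    """Toy stand-in for parameter validation (hypothetical, not sklearn's code).

    constraints: dict mapping a parameter name to a tuple of accepted types
    params: dict mapping a parameter name to the value the estimator exposes
    strict: if True, mimic the stricter behaviour described above and reject
        constraint entries that are not parameters of the estimator
    """
    if strict:
        unexpected = set(constraints) - set(params)
        if unexpected:
            raise ValueError(
                f"{caller_name}: constraints declared for unknown "
                f"parameters {sorted(unexpected)}"
            )

    for name, value in params.items():
        # Lenient part: parameters without a constraint are skipped, and
        # constraints without a matching parameter are never looked at.
        if name not in constraints:
            continue
        if not isinstance(value, constraints[name]):
            raise TypeError(
                f"{caller_name}: parameter {name!r} must be an instance of "
                f"{constraints[name]}, got {value!r}"
            )


# Constraints inherited from a base class that exposes "algorithm".
base_constraints = {"n_clusters": (int,), "algorithm": (str,)}
# A subclass that no longer exposes "algorithm".
subclass_params = {"n_clusters": 8}

# Lenient mode accepts the subclass; strict=True would raise ValueError.
validate_constraints_sketch(base_constraints, subclass_params, "MyKMeans")
```

In this sketch the lenient mode still validates every parameter that the estimator does expose, so dropping the extra check does not weaken validation of real parameters.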

I think we should not enforce that for third-party estimators. For scikit-learn estimators we do want to enforce it, but we already check for a one-to-one match in the common tests:

validation_params = estimator_orig._parameter_constraints.keys()
unexpected_params = set(validation_params) - set(estimator_params)
missing_params = set(estimator_params) - set(validation_params)
err_msg = (
    f"Mismatch between _parameter_constraints and the parameters of {name}."
    f"\nConsider the unexpected parameters {unexpected_params} and expected but"
    f" missing parameters {missing_params}"
)
assert validation_params == estimator_params, err_msg
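The one-to-one check quoted above can be tried in isolation with the sketch below. It is a standalone approximation, not the actual common test: the function name `check_constraints_match_params` and the example dictionaries are hypothetical, and both inputs are converted to sets before comparison.

```python
def check_constraints_match_params(name, constraints, estimator_params):
    """Raise AssertionError unless constraint keys and parameter names match.

    Hypothetical helper mirroring the set arithmetic of the snippet above.
    """
    validation_params = set(constraints)
    estimator_params = set(estimator_params)
    # Constraint entries that do not correspond to any estimator parameter.
    unexpected_params = validation_params - estimator_params
    # Estimator parameters that have no constraint entry.
    missing_params = estimator_params - validation_params
    if validation_params != estimator_params:
        raise AssertionError(
            f"Mismatch between _parameter_constraints and the parameters of "
            f"{name}.\nConsider the unexpected parameters {unexpected_params} "
            f"and expected but missing parameters {missing_params}"
        )


# Matching names: passes silently.
check_constraints_match_params(
    "KMeansLike",
    {"n_clusters": (int,), "algorithm": (str,)},
    ["n_clusters", "algorithm"],
)
```

Because this strict check lives in the test suite rather than in the validation performed at fit time, it can stay strict for scikit-learn's own estimators without affecting downstream subclasses.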

cc/ @glemaitre

@jeremiedbb jeremiedbb added the To backport, No Changelog Needed, Quick Review, and Validation labels Dec 1, 2022
@jeremiedbb jeremiedbb added this to the 1.2 milestone Dec 1, 2022
Member

@betatim betatim left a comment


LGTM and I like the reasoning.

Worth adding a test that makes sure this behaviour works also in the future?

@betatim
Member

betatim commented Dec 1, 2022

An alternative could be to check if the estimator that is being checked lives inside the sklearn namespace and be more lenient for estimators outside. However, that seems like it would require quite a bit of re-jigging of things (like passing in the estimator itself). So I think I like the solution in this PR more.

@jeremiedbb
Member Author

> Worth adding a test that makes sure this behaviour works also in the future?

Yes, that's a good idea. Done.

@thomasjpfan
Member

> An alternative could be to check if the estimator that is being checked lives inside the sklearn namespace and be more lenient for estimators outside.

This is the current behavior. check_param_validation itself is not part of check_estimator and is only run on scikit-learn estimators:

@pytest.mark.parametrize(
    "estimator", _tested_estimators(), ids=_get_check_estimator_ids
)
def test_check_param_validation(estimator):
    name = estimator.__class__.__name__
    _set_checking_parameters(estimator)
    check_param_validation(name, estimator)

Member

@thomasjpfan thomasjpfan left a comment


Minor comment, otherwise LGTM

Comment on lines 638 to 639
# does not raise, niether because "b" is not in the constraints dict, neither
# because "a" is not a parameter of the estimator.
Member


Nit: I think this is a little easier to parse:

Suggested change
- # does not raise, niether because "b" is not in the constraints dict, neither
- # because "a" is not a parameter of the estimator.
+ # does not raise, even tho "b" is not in the constraints dict and
+ # "a" is not a parameter of the estimator.

Contributor


Alternatively:

Suggested change
- # does not raise, niether because "b" is not in the constraints dict, neither
- # because "a" is not a parameter of the estimator.
+ # Does not raise an error, despite the fact that:
+ # 1) "a" is not a parameter of the estimator
+ # 2) "b" is not present in the constraints dict

Contributor

@Micky774 Micky774 left a comment


LGTM pending some wording update on the inline comment

@betatim
Member

betatim commented Dec 2, 2022

> This is the current behavior.

I'm confused. Why do we need this change then? This PR changes validate_parameter_constraints, which is used in BaseEstimator._validate_params(), and that is different from what happens in the tests.

My "alternative proposal" idea was to add logic to validate_parameter_constraints that would check if it is being called from an estimator that is part of scikit-learn proper or a third party estimator.
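As a rough illustration of this alternative (which is not what the PR implements), a namespace check could look like the sketch below. The helper name `is_scikit_learn_estimator` is hypothetical, and relying on the class's `__module__` attribute is an assumption about how such a check might be written.

```python
def is_scikit_learn_estimator(estimator):
    """Hypothetical helper: True if the estimator's class is defined in sklearn."""
    module = type(estimator).__module__
    # Match the top-level package exactly so that names such as
    # "sklearn_extra" are not mistaken for scikit-learn itself.
    return module == "sklearn" or module.startswith("sklearn.")


class ThirdPartyEstimator:
    """Stand-in for an estimator defined outside the sklearn namespace."""


# A class defined in user code is not reported as a scikit-learn estimator.
print(is_scikit_learn_estimator(ThirdPartyEstimator()))
```

A validation routine could then apply the strict check only when this helper returns True, but it would need access to the estimator object, which is the extra plumbing mentioned earlier in the thread.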

@jeremiedbb
Member Author

Merging with 3 approvals, thanks

@jeremiedbb jeremiedbb merged commit 5e25f8e into scikit-learn:main Dec 2, 2022
jeremiedbb added a commit to jeremiedbb/scikit-learn that referenced this pull request Dec 6, 2022
Labels: module:utils, No Changelog Needed, Quick Review, To backport, Validation

4 participants