MNT Param validation: Add a common test for param validation of public functions #23514

Merged

jeremiedbb
Member

This PR adds a test for checking param validation of public functions, similar to the one used for testing estimators:

def check_param_validation(name, estimator_orig):

Usually we define a list of all the functions/estimators we want tested and comment them all out. Here, however, it was not clear to me which functions are really considered public, so I chose to define an empty list that we can fill incrementally. We can still switch to the other option later.
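
A minimal sketch of how an incrementally filled list could drive such a check, assuming the decorator stores its constraints on the function under `_skl_parameter_constraints` (the attribute name used in this PR). The names `PARAM_VALIDATION_FUNCTION_LIST`, `resolve`, and `functions_missing_constraints` are hypothetical illustrations, not the PR's actual test code.

```python
from importlib import import_module

# Hypothetical registry of dotted function names. It starts empty and is
# filled as functions gain parameter validation.
PARAM_VALIDATION_FUNCTION_LIST = []


def resolve(dotted_name):
    """Import and return the object named by 'package.module.func'."""
    module_name, func_name = dotted_name.rsplit(".", 1)
    return getattr(import_module(module_name), func_name)


def functions_missing_constraints(dotted_names):
    """Return the names whose function carries no constraints attribute."""
    missing = []
    for name in dotted_names:
        func = resolve(name)
        if not hasattr(func, "_skl_parameter_constraints"):
            missing.append(name)
    return missing
```

With an empty list the check passes trivially, which is what makes the incremental approach possible: each function is added only once it has been decorated.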

I did not find an obvious existing test file for this test, so I added a new one in sklearn/tests. Maybe I missed an obvious location?

@glemaitre
Member

I did not find an obvious existing test file for this test, so I added a new one in sklearn/tests. Maybe I missed an obvious location?

This is something that we should probably address when refactoring the common tests. For the moment, IMO it is fine.

Member

@glemaitre glemaitre left a comment

A couple of comments but overall I think this is fine.

Member

@glemaitre glemaitre left a comment

LGTM

@jeremiedbb
Member Author

@glemaitre the code has changed significantly after IRL discussions. You might want to take another look.

Member

@jjerphan jjerphan left a comment

LGTM after resolving the coverage.

# The dict of parameter constraints is set as an attribute of the function
# to make it possible to dynamically introspect the constraints for
# automatic testing.
setattr(func, "_skl_parameter_constraints", parameter_constraints)
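
A toy sketch of why storing the constraints on the function helps automatic testing: a generic check can read the attribute and probe each constrained parameter without knowing the function in advance. Everything here is an assumption for illustration. The constraint format (parameter name mapped to a tuple of allowed types), the keyword-only wrapper, and `check_rejects_bad_values` are simplified stand-ins, not scikit-learn's real constraint objects or test.

```python
import functools


def validate_params(parameter_constraints):
    """Toy decorator: constraints map a parameter name to allowed types."""

    def decorator(func):
        # As in the PR, the constraints are attached to the function itself.
        # functools.wraps then copies func.__dict__ onto the wrapper, so the
        # attribute is visible on the decorated callable.
        setattr(func, "_skl_parameter_constraints", parameter_constraints)

        @functools.wraps(func)
        def wrapper(**kwargs):
            for name, value in kwargs.items():
                allowed = parameter_constraints.get(name)
                if allowed is not None and not isinstance(value, allowed):
                    raise TypeError(f"Invalid value for parameter {name!r}")
            return func(**kwargs)

        return wrapper

    return decorator


@validate_params({"n": (int,)})
def repeat(*, text="a", n=1):
    return text * n


def check_rejects_bad_values(func):
    """Generic check: every constrained parameter must reject a bad value."""
    for name in func._skl_parameter_constraints:
        try:
            func(**{name: object()})
        except TypeError:
            continue
        raise AssertionError(f"{name!r} accepted an invalid value")
```

Calling `check_rejects_bad_values(repeat)` completes silently because the toy wrapper rejects a bare `object()` for `n`.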
Member

Now that we are storing the constraints on the function itself, I'd prefer that the function become a callable class, i.e. a class that defines __call__.

To me, using setattr on a function feels like a hack.

Member Author

@jeremiedbb jeremiedbb Jun 8, 2022

I think there's an issue with that approach. The decorator works with both functions and methods. If the decorator returns an object, I don't think it can replace a method as expected. But maybe I just don't know how to do it. Here's what I came up with:

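import functools
from inspect import signature

# validate_parameter_constraints is the existing helper this decorator calls.
def validate_params(parameter_constraints):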

    def decorator(func):

        @functools.wraps(func, updated=())
        class wrapper:
            def __init__(self):
                self._skl_parameter_constraints = parameter_constraints

            def __call__(self, *args, **kwargs):
                func_sig = signature(func)

                # Map *args/**kwargs to the function signature
                params = func_sig.bind(*args, **kwargs)
                params.apply_defaults()

                # ignore self/cls and positional/keyword markers
                to_ignore = [
                    p.name
                    for p in func_sig.parameters.values()
                    if p.kind in (p.VAR_POSITIONAL, p.VAR_KEYWORD)
                ]
                to_ignore += ["self", "cls"]
                params = {k: v for k, v in params.arguments.items() if k not in to_ignore}

                validate_parameter_constraints(
                    self._skl_parameter_constraints, params, caller_name=func.__qualname__
                )

                return func(*args, **kwargs)

        return wrapper()

    return decorator

Maybe someone has an idea?

Member Author

Also, I'm not sure it's any less hackish than setting an attribute on a function 😄

Member Author

My understanding is that replacing a method with a callable object doesn't make it a method; it becomes a class attribute that happens to be callable. Thus, when it is called, self is not passed as the first argument.
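
A small sketch of the behavior described here, using only plain Python. Functions are descriptors, so looking them up on an instance produces a bound method. An instance of an ordinary class that only defines `__call__` is not a descriptor, so it is returned unchanged and `self` is never supplied. `CallableWrapper`, `function_wrapper`, and `Estimator` are hypothetical names for illustration.

```python
class CallableWrapper:
    """Wraps a function in an object that defines __call__ only."""

    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)


def function_wrapper(func):
    """Wraps a function in another plain function."""

    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)

    return wrapper


class Estimator:
    @function_wrapper
    def bound(self):
        return "got self"

    @CallableWrapper
    def unbound(self):
        return "got self"


est = Estimator()
est.bound()  # works: the wrapper function is bound to est

try:
    est.unbound()  # self is not passed, so the wrapped function fails
except TypeError as exc:
    error = exc

est.unbound(est)  # works only when the instance is passed explicitly
```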

Member

Discussing IRL with @jeremiedbb, it seems that making it callable makes things more complex. I would be inclined to keep setting the attribute on the function.

Member

It is also possible to use the decorator on methods. For instance, we could validate kernels (of Gaussian processes) or splitters, which are not proper estimators. So it could be handy to keep parameter validation possible for such classes.

Member

I'm okay with leaving this as is. I think that to actually use a callable class with a decorator, it would end up looking something like available_if, using descriptors:

def available_if(check):
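
A minimal sketch of the descriptor idea for methods, under stated assumptions: this is not the implementation of available_if, and `ValidatedMethod` and `validated` are hypothetical names. The class defines `__get__` so that instance lookup binds the instance, which is the step a plain callable object lacks. The validation itself is omitted and marked with a comment.

```python
from types import MethodType


class ValidatedMethod:
    """Descriptor that stores constraints and binds the instance on access."""

    def __init__(self, func, parameter_constraints):
        self.func = func
        self._skl_parameter_constraints = parameter_constraints

    def __get__(self, obj, owner=None):
        if obj is None:
            # Accessed on the class: return the descriptor itself.
            return self
        # Accessed on an instance: bind it so it is passed as first argument.
        return MethodType(self, obj)

    def __call__(self, *args, **kwargs):
        # Parameter validation against the constraints would happen here.
        return self.func(*args, **kwargs)


def validated(parameter_constraints):
    def decorator(func):
        return ValidatedMethod(func, parameter_constraints)

    return decorator


class Scaler:
    def __init__(self, factor):
        self.factor = factor

    @validated({"x": (int, float)})
    def scale(self, x):
        return self.factor * x
```

Here `Scaler(3).scale(2)` receives the instance through the bound method, and the constraints stay reachable through `Scaler.scale._skl_parameter_constraints`.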

On that note, can you check to make sure that the current implementation does not run into issues like #21344 which was fixed in #23077?

Member Author

@jeremiedbb jeremiedbb Jun 13, 2022

Thanks for the hint, I forgot about available_if!
Let me try to implement this and we'll choose the best solution.
EDIT: this would only work on methods but no longer on functions. I guess there's no simple way to make it work on both; it would require implementing two versions of the decorator, making the whole thing a lot more complex. So I'm also keen on leaving the PR as is 😄

Member

I also think the attribute solution is the easiest even if it populates the callables' namespaces.

@jeremiedbb jeremiedbb added the Validation related to input validation label Jun 13, 2022
Member

@jjerphan jjerphan left a comment

Thank you, @jeremiedbb.

@glemaitre glemaitre merged commit cc6806b into scikit-learn:main Jun 20, 2022
@glemaitre
Member

lgtm

ogrisel pushed a commit to ogrisel/scikit-learn that referenced this pull request Jul 11, 2022