FIX Param validation: fix generating invalid param when 2 interval constraints #23513
    # there exists an int between the 2 intervals
    return int_left - 1
else:
    raise NotImplementedError
It would be nice to add an error message explaining which case we are in, in case the helper is used incorrectly.
Then we could add a check in the test for this case.
It's not that the helper is wrongly used. It means that there exists no invalid value for the constraint. For instance, all lists are valid for the constraint `_InstancesOf(list)`, so you can't generate a list that does not satisfy the constraint.
I added a comment to explain that, and a test to cover this (although it is already covered by the common test).
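The situation described in this thread can be illustrated with a minimal sketch. It is not the scikit-learn implementation: `int_outside` and its `left`/`right` arguments are hypothetical names, bounds are assumed to be closed, and `None` is assumed to mean "unbounded on that side".

```python
def int_outside(left, right):
    """Return an integer outside the closed interval [left, right].

    `None` means the interval is unbounded on that side (an assumption
    of this sketch, not necessarily scikit-learn's convention).
    """
    if left is not None:
        # Anything strictly below a finite left bound is invalid.
        return left - 1
    if right is not None:
        # Anything strictly above a finite right bound is invalid.
        return right + 1
    # Unbounded on both sides: every integer is valid, so no invalid
    # value exists. This is not a misuse of the helper.
    raise NotImplementedError(
        "no invalid value exists: the interval covers all integers"
    )
```

Raising with an explicit message, as the first comment asks for, makes it clear to the caller that the constraint simply admits no invalid value.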
    # there exists an int between the 2 intervals
    return int_right + 1
else:
    raise NotImplementedError
same here
@@ -415,7 +415,7 @@ def __str__(self):
     )


-def generate_invalid_param_val(constraint):
+def generate_invalid_param_val(constraint, constraints=None):
I don't know if it would be more explicit to have `other_constraints` instead of only `constraints`?
It's not the other constraints, it's all the constraints, including `constraint`.
But do we need the current constraint?
constraints is the list of all constraints for the parameter. It would just be more complex to try to extract a specific constraint from the list before calling the function.
I also just found that |
@@ -415,7 +415,7 @@ def __str__(self):
     )


-def generate_invalid_param_val(constraint):
+def generate_invalid_param_val(constraint, constraints=None):
But do we need the current constraint?
I think it was what we had in mind with @thomasjpfan.
That was intended since the beginning but bugged and not well tested :)
LGTM
sklearn/utils/_param_validation.py
    Parameters
    ----------
    interval : Interval
I think that we should specify that they are instances of `Interval`, `Constraint`, etc.
For now we can correct only the code touched here and make a small follow-up PR touching only the docstrings.
LGTM. Thank you, @jeremiedbb.
sklearn/utils/_param_validation.py
    constraints : list of _Constraint instances or None, default=None
        The list of all constraints for this parameter. If None, the list containt only
        the constraint is used.
-    constraints : list of _Constraint instances or None, default=None
-        The list of all constraints for this parameter. If None, the list containt only
-        the constraint is used.
+    constraints : list of _Constraint instances or None, default=None
+        The list of all constraints for this parameter. If None, the list only
+        contains the constraints which is used.
My sentence had a typo and was not clear at all. Here's what I meant. Is it clearer?
-    constraints : list of _Constraint instances or None, default=None
-        The list of all constraints for this parameter. If None, the list containt only
-        the constraint is used.
+    constraints : list of _Constraint instances or None, default=None
+        The list of all constraints for this parameter. If None, the list only
+        containing `constraint` is used.
Co-authored-by: Julien Jerphanion <[email protected]>
Let's merge. Thanks for the reviews.
…nstraints (scikit-learn#23513) Co-authored-by: Julien Jerphanion <[email protected]> Co-authored-by: Guillaume Lemaitre <[email protected]>
Use case seen in #23499.
It's possible that a parameter accepts float and int with different ranges (usually an `int >= 1` meaning an absolute value or a `float in [0, 1]` meaning a fraction). In that case, generating an invalid param (for automatic testing) must take both constraints into account, since we must find a value that is in neither of the intervals.
This PR fixes it but assumes that there will be at most 1 integer interval constraint and 1 real interval constraint. I don't think we ever need to have constraints be unions of more intervals in scikit-learn.
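To make the problem concrete, here is a rough sketch of looking for a value that lies in neither interval. It does not reproduce the code in this PR: `is_valid`, `find_invalid` and the `(low, high)` tuples are hypothetical, bounds are assumed closed with `None` meaning unbounded, and the candidate search is a simple heuristic rather than an exhaustive one.

```python
from numbers import Integral, Real


def _within(val, low, high):
    """True if val lies in the closed range [low, high]; None = unbounded."""
    return (low is None or val >= low) and (high is None or val <= high)


def is_valid(val, int_bounds, real_bounds):
    """A value is valid if it satisfies at least one of the two constraints."""
    in_int = isinstance(val, Integral) and _within(val, *int_bounds)
    # Python ints are also Real, so the real constraint can accept them too.
    in_real = isinstance(val, Real) and _within(val, *real_bounds)
    return in_int or in_real


def find_invalid(int_bounds, real_bounds):
    """Try values near each finite bound until one fails both constraints."""
    candidates = []
    for bound in (*int_bounds, *real_bounds):
        if bound is not None:
            candidates += [int(bound) - 1, int(bound) + 1, bound - 0.5, bound + 0.5]
    for cand in candidates:
        if not is_valid(cand, int_bounds, real_bounds):
            return cand
    raise NotImplementedError("no invalid value found for these constraints")


# int >= 1 (absolute value) or float in [0, 1] (fraction)
invalid = find_invalid(int_bounds=(1, None), real_bounds=(0.0, 1.0))
assert not is_valid(invalid, (1, None), (0.0, 1.0))
```

Looking at one constraint alone is not enough here: `0` is outside `int >= 1` but still satisfies the `[0, 1]` interval, which is why both constraints have to be checked together.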