FIX Param validation: fix generating invalid param when 2 interval constraints #23513

Merged
merged 5 commits into from
Jun 9, 2022

jeremiedbb
Member

Use case seen in #23499

It's possible that a parameter accepts float and int with different ranges (usually an int >= 1 meaning an absolute value, or a float in [0, 1] meaning a fraction). In that case, generating an invalid param (for automatic testing) must take both constraints into account, since we must find a value that is in neither of the intervals.

This PR fixes it but assumes that there will at most be 1 integer interval constraint and 1 real interval constraint. I don't think we ever need to have constraints be unions of more intervals in scikit-learn.
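For illustration only, the situation described above (an int >= 1 or a float in [0, 1]) can be sketched as follows. The helper names `in_interval`, `is_valid` and `find_invalid` are hypothetical, the intervals are assumed to be closed, and probing candidates around the bounds is an assumption made for this sketch, not the implementation in this PR.

```python
from numbers import Integral, Real


def in_interval(value, low, high):
    """Closed-interval membership; a bound of None means unbounded."""
    return (low is None or value >= low) and (high is None or value <= high)


def is_valid(value, int_bounds, real_bounds):
    """A value is valid if it satisfies at least one of the two constraints."""
    if isinstance(value, Integral) and in_interval(value, *int_bounds):
        return True
    return isinstance(value, Real) and in_interval(value, *real_bounds)


def find_invalid(int_bounds, real_bounds):
    """Return a number rejected by both constraints, or None if no candidate works."""
    candidates = []
    for low, high in (int_bounds, real_bounds):
        # Probe just outside each finite bound (hypothetical strategy).
        if low is not None:
            candidates += [low - 1, low - 0.5]
        if high is not None:
            candidates += [high + 1, high + 0.5]
    for candidate in candidates:
        if not is_valid(candidate, int_bounds, real_bounds):
            return candidate
    return None


# int >= 1 (absolute value) or float in [0, 1] (fraction)
invalid = find_invalid((1, None), (0.0, 1.0))
```

Note that `0` is outside the integer interval but inside the real one, so looking at the integer constraint alone would produce a value that is actually valid.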

    # there exists an int between the 2 intervals
    return int_left - 1
else:
    raise NotImplementedError
Member

It would be nice to add an error message explaining which case we are in, in case the helper is used incorrectly.

Member

Then we could add a check in the test for this case.

Member Author

It's not that the helper is wrongly used. It means that there exists no invalid value for the constraint. For instance, all lists are valid for the constraint _InstancesOf(list), so you can't generate a list that does not satisfy the constraint.

I added a comment to explain that, and a test to cover this (although it is already covered by the common test).
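As a rough sketch of that convention (hypothetical names, closed integer intervals only, not the code from this PR), a generator can raise NotImplementedError when no invalid value exists, and the caller can treat that as "nothing to test" rather than as a misuse:

```python
def invalid_value_for_interval(low, high):
    """Return an int outside the closed interval [low, high]; None means unbounded."""
    if low is not None:
        return low - 1
    if high is not None:
        return high + 1
    # Unbounded on both sides: every integer satisfies the constraint,
    # so there is no invalid value to generate.
    raise NotImplementedError("no invalid value exists for an unbounded interval")


def collect_invalid_values(intervals):
    """Gather one invalid value per interval, skipping those that admit everything."""
    values = []
    for low, high in intervals:
        try:
            values.append(invalid_value_for_interval(low, high))
        except NotImplementedError:
            continue
    return values


values = collect_invalid_values([(1, None), (None, None), (None, 5)])
```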

    # there exists an int between the 2 intervals
    return int_right + 1
else:
    raise NotImplementedError
Member

same here

@@ -415,7 +415,7 @@ def __str__(self):
 )


-def generate_invalid_param_val(constraint):
+def generate_invalid_param_val(constraint, constraints=None):
Member

I wonder whether it would be more explicit to name it other_constraints instead of just constraints?

Member Author

It's not the other constraints, it's all the constraints, including constraint.

Member

But do we need the current constraint?

Member Author

constraints is the list of all constraints for the parameter. It would just be more complex to try to extract a specific constraint from the list before calling the function.
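A small sketch of the calling pattern being discussed, with a hypothetical `normalize_constraints` helper and plain strings standing in for constraint objects: the caller passes the same full list every time, and the fallback to a one-element list is only used when no list is given.

```python
def normalize_constraints(constraint, constraints=None):
    """Return the full list of constraints to consider for a parameter."""
    # Assumed fallback: with no list given, only `constraint` is considered.
    if constraints is None:
        constraints = [constraint]
    return constraints


# Placeholder constraints for one parameter (strings used for illustration).
param_constraints = ["int >= 1", "float in [0, 1]"]

# The caller never has to filter the current constraint out of the list.
calls = [(c, normalize_constraints(c, param_constraints)) for c in param_constraints]
```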

@jeremiedbb
Member Author

I also just found that np.inf and np.nan were always considered to be in the interval. np.nan should never be in any interval, and np.inf should only be valid if the bound is None and closed. I fixed that and added more tests.
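The rule stated above can be sketched as a standalone membership check. This is an assumption-laden illustration (a hypothetical `contains` function using the standard `math` module rather than NumPy), not the Interval code changed in this PR.

```python
import math


def contains(value, left, right, closed):
    """Check membership in an interval.

    `left`/`right` of None mean unbounded; `closed` is one of
    "left", "right", "both", "neither".
    """
    # NaN is never a member of any interval.
    if math.isnan(value):
        return False

    left_closed = closed in ("left", "both")
    right_closed = closed in ("right", "both")

    if left is None:
        # -inf is accepted only when the unbounded left side is closed.
        if value == -math.inf and not left_closed:
            return False
    elif value < left or (value == left and not left_closed):
        return False

    if right is None:
        # +inf is accepted only when the unbounded right side is closed.
        if value == math.inf and not right_closed:
            return False
    elif value > right or (value == right and not right_closed):
        return False

    return True
```

With this sketch, `contains(math.inf, 0, None, "both")` is accepted while `contains(math.inf, 0, None, "left")` and `contains(math.nan, None, None, "both")` are rejected.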

@glemaitre
Member

np.inf should only be valid if the bound is None and closed

I think it was what we had in mind with @thomasjpfan.

@jeremiedbb
Member Author

np.inf should only be valid if the bound is None and closed

I think it was what we had in mind with @thomasjpfan.

That was the intent from the beginning, but it was buggy and not well tested :)

Member

@glemaitre glemaitre left a comment

LGTM


Parameters
----------
interval : Interval
Member

I think that we should specify that they are instances of Interval, Constraint, etc.

For now, we can fix only the code touched here and make a small follow-up PR that touches only the docstrings.

Member

@jjerphan jjerphan left a comment

LGTM. Thank you, @jeremiedbb.

Comment on lines 436 to 438
constraints : list of _Constraint instances or None, default=None
The list of all constraints for this parameter. If None, the list containt only
the constraint is used.
Member

Suggested change
-constraints : list of _Constraint instances or None, default=None
-The list of all constraints for this parameter. If None, the list containt only
-the constraint is used.
+constraints : list of _Constraint instances or None, default=None
+The list of all constraints for this parameter. If None, the list only
+contains the constraints which is used.

Member Author

@jeremiedbb jeremiedbb Jun 9, 2022

My sentence had a typo, and it was not clear at all either. Here's what I meant. Is it clearer?

Suggested change
-constraints : list of _Constraint instances or None, default=None
-The list of all constraints for this parameter. If None, the list containt only
-the constraint is used.
+constraints : list of _Constraint instances or None, default=None
+The list of all constraints for this parameter. If None, the list only
+containing `constraint` is used.

@jeremiedbb
Member Author

Let's merge. Thanks for the reviews

@jeremiedbb jeremiedbb merged commit 02cbe01 into scikit-learn:main Jun 9, 2022
ogrisel pushed a commit to ogrisel/scikit-learn that referenced this pull request Jul 11, 2022
…nstraints (scikit-learn#23513)

Co-authored-by: Julien Jerphanion <[email protected]>
Co-authored-by: Guillaume Lemaitre <[email protected]>