`check_fit2d_1sample` and `check_fit2d_1feature` expect very specific error message #12734
Comments
Is it meaningful to pass only 1 sample to `train_test_split`?
No. But my point is that in the context of my estimator it is just as (not) meaningful to pass one sample as to pass, e.g., test_size=99.9999999%, or any other input combination that would result in an empty train set. In my situation it makes more sense to have a single error message that handles all those cases than one error message per case.
Just saw issue #11028 about passing one sample to `train_test_split`.
I see... FWIW if you are using
I think the argument is that the check forces the code to validate in a particular order of precedence, which is a bit silly. Not sure it's a big deal given `ensure_min_samples`...
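For reference, here is roughly how the `ensure_min_samples` path plays out (assuming current `check_array` behavior; the exact wording of the message may vary between versions):

```python
import numpy as np
from sklearn.utils import check_array

# Raises ValueError: "Found array with 1 sample(s) (shape=(1, 3)) while a
# minimum of 2 is required." -- this message happens to contain "1 sample",
# one of the substrings the check accepts.
check_array(np.zeros((1, 3)), ensure_min_samples=2)
```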
Closing early since this will be addressed by #18582.
`check_fit2d_1sample` and `check_fit2d_1feature` are part of the `check_estimator` test suite. To pass those checks, the estimator either has to run gracefully, or raise a `ValueError` with an error message containing one of a few predefined substrings (a different list for each check).
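For context, here is a rough sketch of what `check_fit2d_1sample` does (paraphrased; the real implementation and the full substring list live in `sklearn.utils.estimator_checks` and may differ between versions):

```python
import numpy as np

def check_fit2d_1sample_sketch(estimator):
    # Fit on a single sample: the check passes if fit() runs gracefully,
    # or if it raises a ValueError whose message contains one of the
    # predefined substrings.
    rng = np.random.RandomState(0)
    X = rng.uniform(size=(1, 10))
    y = X[:, 0].astype(int)
    msgs = ["1 sample", "n_samples = 1", "n_samples=1"]  # illustrative subset
    try:
        estimator.fit(X, y)
    except ValueError as exc:
        if not any(msg in repr(exc) for msg in msgs):
            raise
```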
This is quite restrictive. For example:
I have a custom estimator that does early stopping, where `X` is split into train and validation data with `train_test_split`. Passing only 1 sample (as in `check_fit2d_1sample`) to `train_test_split` will cause the train data to be empty. Of course an exception should be raised, but the appropriate message here is something along the lines of: "Not enough training data to perform early stopping. Use more training data or adjust 'test_size'".
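A minimal sketch of this situation (the class name and split logic are hypothetical, not from an actual implementation):

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.model_selection import train_test_split

class EarlyStoppingEstimator(BaseEstimator):
    """Hypothetical estimator that holds out validation data for early stopping."""

    def __init__(self, test_size=0.1):
        self.test_size = test_size

    def fit(self, X, y):
        # Depending on the scikit-learn version, train_test_split may itself
        # raise here; older versions could return an empty train split.
        X_train, X_val, y_train, y_val = train_test_split(
            X, y, test_size=self.test_size
        )
        if len(X_train) == 0:
            # The natural message says nothing specific about n_samples == 1:
            raise ValueError(
                "Not enough training data to perform early stopping. "
                "Use more training data or adjust 'test_size'."
            )
        # ... actual training loop with early stopping on (X_val, y_val) ...
        return self
```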
There are many ways to get empty train data from `train_test_split`; passing only one training sample is just one of them. Another way would be to set the `test_size` param to e.g. `.99` when `n_samples < 100`. Using one of the required substrings would not make sense here.
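With the hypothetical estimator sketched above, both inputs end up in the same empty-train condition:

```python
est = EarlyStoppingEstimator(test_size=0.99)
cases = [
    (np.zeros((50, 2)), np.zeros(50)),  # test_size=.99 with n_samples < 100
    (np.zeros((1, 2)), np.zeros(1)),    # the check_fit2d_1sample scenario
]
for X, y in cases:
    try:
        est.fit(X, y)
    except ValueError as exc:
        print(exc)  # the same empty-train failure, two different causes
```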
So if I want to pass `estimator_checks`, I'm bound to have a very specific check in my code that detects when `n_samples == 1` and raises a message with one of the appropriate substrings. I guess in my case I could add something like "got n_samples={n_samples}" to the message, but I'm still not sure that forcing a given substring makes sense in all situations.

TLDR: the reason the estimator fails in the context of those checks may be much more general than just because we passed 1 sample (or 1 feature). To pass the tests, though, the error message is restricted to a very small subset of causes.
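For the record, that workaround could look like this inside the hypothetical `fit` sketched earlier; when `n_samples == 1` the message then contains `n_samples=1`, one of the accepted substrings:

```python
if len(X_train) == 0:
    raise ValueError(
        "Not enough training data to perform early stopping "
        f"(got n_samples={X.shape[0]}). "
        "Use more training data or adjust 'test_size'."
    )
```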
Happy to submit a PR.