
check_fit2d_1sample and check_fit2d_1feature expect very specific error message #12734


Closed
NicolasHug opened this issue Dec 7, 2018 · 6 comments


@NicolasHug
Member

NicolasHug commented Dec 7, 2018

check_fit2d_1sample and check_fit2d_1feature are part of the check_estimator test suite.

To pass those checks, the estimator either has to run gracefully or raise a ValueError whose message contains one of a few predefined substrings (the matching is sketched below):

msgs = ["1 sample", "n_samples = 1", "n_samples=1", "one sample", "1 class", "one class"]

and

msgs = ["1 feature(s)", "n_features = 1", "n_features=1"]

This is quite restrictive. For example:

I have a custom estimator that does early stopping: X is split into train and validation sets with train_test_split.

Passing only 1 sample (as in check_fit2d_1sample) to train_test_split leaves the train set empty. An exception should of course be raised, but the appropriate message here is something along the lines of:
"Not enough training data to perform early stopping. Use more training data or adjust 'test_size'".

There are many ways to get an empty train set out of train_test_split; passing a single sample is just one of them. Another is setting test_size to e.g. 0.99 when n_samples < 100.
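
A small reproduction of both paths (depending on the scikit-learn version, train_test_split either returns an empty train set or raises its own ValueError about the train set being empty; either way, none of the required substrings appear):

import numpy as np
from sklearn.model_selection import train_test_split

# One sample, as in check_fit2d_1sample:
X, y = np.zeros((1, 10)), np.zeros(1)
train_test_split(X, y, test_size=0.5)   # train set is empty

# Same outcome with plenty of samples:
X, y = np.zeros((50, 10)), np.zeros(50)
train_test_split(X, y, test_size=0.99)  # n_test = ceil(49.5) = 50, so n_train = 0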

Using one of the required substrings would not make sense here.

So if I want to pass the estimator checks, I'm bound to add a very specific check in my code for n_samples == 1 that raises a message containing one of the appropriate substrings. In my case I could probably append something like "got n_samples={n_samples}" to the message, but I'm still not sure forcing a given substring makes sense in all situations.
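
A hypothetical excerpt of such a fit method (the variable names and message are illustrative, not actual library code). Because of the f-string, a single-sample input yields the literal substring "n_samples=1", which the check accepts:

n_samples = X.shape[0]
if X_train.shape[0] == 0:
    raise ValueError(
        "Not enough training data to perform early stopping. Use more "
        "training data or adjust 'test_size' "
        f"(got n_samples={n_samples})."
    )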


TL;DR: the reason the estimator fails in the context of these checks may be much more general than the fact that it was passed 1 sample (or 1 feature). To pass the checks, though, the error message must point to a very small subset of the possible causes.

Happy to submit a PR.

@albertcthomas
Contributor

Is it meaningful to pass only 1 sample to train_test_split?

@NicolasHug
Member Author

No.

But my point is that, in the context of my estimator, passing one sample is just as (non-)meaningful as passing, e.g., test_size=0.9999999, or any other input combination that results in an empty train set.

In my situation it makes more sense to have one unique error message that handles all those cases than one error message per case.

@albertcthomas
Contributor

Is it meaningful to pass only 1 sample to train_test_split?

Just saw issue #11028 about passing one sample to train_test_split.

@albertcthomas
Contributor

I see... FWIW if you are using check_array you can set ensure_min_samples to 2.
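
A minimal sketch of that approach inside the estimator's fit (check_array is scikit-learn's input validation helper):

from sklearn.utils import check_array

# With a single row this raises a ValueError along the lines of
# "Found array with 1 sample(s) ... while a minimum of 2 is required",
# and "1 sample" is one of the substrings check_fit2d_1sample accepts.
X = check_array(X, ensure_min_samples=2)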

@jnothman
Member

jnothman commented Dec 9, 2018 via email

@NicolasHug
Member Author

Closing early since this will be addressed by #18582.
