[RFC] Stateless transformers requiring fit? #12616

amueller · 2018-11-19T18:26:59Z

Right now there are some estimators that don't require calling "fit"; the two I'm aware of are Normalizer and FunctionTransformer. They do input validation if fit is called.
There's one estimator that is stateless but requires calling fit for no real reason I can see: AdditiveChi2Sampler.
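
To illustrate the first point, here is a minimal sketch, assuming a recent scikit-learn and NumPy are installed. The array `X` is made-up example data. Both transformers are used on fresh, unfitted instances.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer, Normalizer

# Made-up example data, not taken from the issue.
X = np.array([[3.0, 4.0], [1.0, 0.0]])

# Neither transformer learns anything from X, so transform can be
# called on an instance that was never fitted.
X_norm = Normalizer(norm="l2").transform(X)
X_log = FunctionTransformer(np.log1p).transform(X)

print(X_norm)
print(X_log)
```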

My questions are:

Should we remove the requirement to call fit if it can be avoided?
If fit is called, should we ensure that the number of features is the same in fit and transform to avoid user errors, even though the algorithm doesn't require it?
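
The second question can be sketched in plain Python. `AbsTransformerSketch` is a hypothetical class invented for illustration, not a scikit-learn estimator: fit is optional, but if it was called, transform rejects input with a different number of features.

```python
class AbsTransformerSketch:
    """Hypothetical stateless transformer; not a scikit-learn class."""

    def fit(self, X, y=None):
        # Nothing is learned; only the input width is recorded so that
        # transform can catch user errors later.
        self.n_features_in_ = len(X[0])
        return self

    def transform(self, X):
        expected = getattr(self, "n_features_in_", None)
        # The check only applies when fit was called beforehand.
        if expected is not None and any(len(row) != expected for row in X):
            raise ValueError(
                f"X has a different number of features than seen in fit ({expected})."
            )
        return [[abs(v) for v in row] for row in X]


# Works without fit.
unfitted_out = AbsTransformerSketch().transform([[-1, 2, -3]])

# After fit on 2 features, 3-feature input is rejected.
fitted = AbsTransformerSketch().fit([[1, 2]])
try:
    fitted.transform([[-1, 2, -3]])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```

Whether this extra check is worth the inconsistency between fitted and unfitted use is exactly the open question here.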

jnothman · 2018-11-19T22:58:18Z

Also consider HashingVectorizer, and CountVectorizer with given vocabulary, and OneHotEncoder and OrdinalEncoder with given categories...
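
For the two vectorizers, a minimal sketch of the unfitted usage, assuming a recent scikit-learn; the documents and vocabulary are made up for the example:

```python
from sklearn.feature_extraction.text import CountVectorizer, HashingVectorizer

# Made-up example documents.
docs = ["apple banana apple", "banana cherry"]

# HashingVectorizer has no vocabulary to learn, so transform works unfitted.
hashed = HashingVectorizer(n_features=16).transform(docs)

# With a user-supplied vocabulary, CountVectorizer can also transform
# without a prior call to fit.
counts = CountVectorizer(vocabulary=["apple", "banana"]).transform(docs)

print(hashed.shape)
print(counts.toarray())
```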

qinhanmin2014 · 2018-11-20T01:42:05Z

Should we remove the requirement to call fit if it can be avoided?

+1. This would avoid awkward things like #12514 and would probably be more user-friendly.

itanvir · 2022-04-25T22:19:09Z

This issue is still open. Any workaround when we have "known" categories in OneHotEncoder?
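
One possible workaround, offered as an assumption and not as an answer given in this thread: since the categories are passed in explicitly, fit can be called on a tiny placeholder input built from those same categories. The category values below are made up.

```python
from sklearn.preprocessing import OneHotEncoder

# Made-up known categories for a single feature.
known = ["a", "b", "c"]

# fit still has to be called, but it can be given a one-row placeholder
# drawn from the known categories; the encoding comes from `categories`.
enc = OneHotEncoder(categories=[known]).fit([[known[0]]])

encoded = enc.transform([["b"], ["c"]]).toarray()
print(encoded)
```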

glemaitre · 2022-08-30T14:41:37Z

We are running into this issue now with parameter validation: some stateless estimators need to validate their input parameters, which is usually done at fit, even though nothing is learned from X.

I would be in favour of always requiring a call to fit to validate those parameters, while keeping "stateless" to mean that the estimator extracts no information from the training X that is needed to transform any other X.

That way these estimators would behave like all the others, while the tag would still record that they are mathematically stateless.

glemaitre · 2023-01-31T17:55:03Z

This issue can be closed since #25190 solves it and defines the behaviour of stateless estimators. #24230 also defines a new stateless transformer.

glemaitre · 2023-01-31T17:56:25Z

I see that I forgot to mention that in one of the meetings we proposed to always do parameter validation in fit, but we don't want to require calling it. We added a common test to ensure that this behavior is consistent across scikit-learn.
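
The agreed behaviour can be sketched in plain Python. `PowerTransformerSketch` and its `exponent` parameter are hypothetical names for illustration only, not scikit-learn's implementation or its common test: fit validates parameters and learns nothing, and transform works without fit.

```python
class PowerTransformerSketch:
    """Hypothetical stateless transformer; not part of scikit-learn."""

    def __init__(self, exponent=2):
        self.exponent = exponent

    def _validate_params(self):
        if not isinstance(self.exponent, int) or self.exponent < 1:
            raise ValueError(
                f"exponent must be a positive int, got {self.exponent!r}"
            )

    def fit(self, X, y=None):
        # fit learns nothing from X; it only validates constructor parameters.
        self._validate_params()
        return self

    def transform(self, X):
        # No fitted state is needed, so nothing here demands a prior fit.
        return [[v ** self.exponent for v in row] for row in X]


# transform works on an unfitted instance.
cubed = PowerTransformerSketch(exponent=3).transform([[2, 3]])

# Invalid parameters are reported when fit is called.
try:
    PowerTransformerSketch(exponent=-1).fit([[1]])
    invalid_raised = False
except ValueError:
    invalid_raised = True
```

One consequence of this design is that a user who skips fit also skips parameter validation, so a bad parameter only surfaces as whatever error transform happens to raise.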

amueller added the API label Nov 19, 2018

amueller mentioned this issue Nov 19, 2018

[MRG] Estimator tags #8022

Merged

4 tasks

qinhanmin2014 mentioned this issue Jul 12, 2019

OneHotEncoder: Fit required even if defining the categories manually #14310

Closed

smyskoff mentioned this issue Feb 28, 2020

[MRG] Encoders: make it optional to fit if categories are given (Issue #12616) #16591

Closed

cmarmo added the Needs Decision - API label Feb 6, 2022

glemaitre mentioned this issue Aug 24, 2022

API make PatchExtractor being a real scikit-learn transformer #24230

Merged

Vincent-Maladiere mentioned this issue Dec 14, 2022

MAINT make AdditiveChi2Sampler stateless and check that stateless Transformers don't raise NotFittedError #25190

Merged

glemaitre closed this as completed Jan 31, 2023
