
API Freezing estimators #8370

Closed
jnothman opened this issue Feb 16, 2017 · 36 comments

@jnothman (Member) commented Feb 16, 2017

TODO: motivate freezing: pipeline components, calibration, transfer/semi-supervised learning

This should probably be a SLEP, but I just want it saved somewhere.

Features required for estimator freezing (a rough sketch of the wrapper follows the list):

  • clone must have if isinstance(obj, FrozenModel): return obj (or do so via class polymorphism / singledispatch)
  • FrozenModel delegates all attribute access (get, set, del) to its wrapped estimator (except where specified)
    • hence its estimator cannot be accessible at FrozenModel().estimator but at some more munged name.
  • FrozenModel has def fit(self, *args, **kwargs): return self
  • FrozenModel has def fit_transform(self, *args, **kwargs): return self.fit(*args, **kwargs).transform(args[0]) (and similarly for fit_predict?)
  • isinstance(freeze(obj), type(obj)) == True and isinstance(freeze(obj), FrozenModel) == True
    • since this is determined from type(freeze(obj)) (excluding __instancecheck__, which seems irrelevant), this appears to be the hardest criterion to fulfill
    • seems to entail use of a mixin, class created in closure, setting __class__ (!), overloading of __reduce__, help! I think I've gone down the wrong path!!
  • must behave nicely with pickle and copy.[deep]copy
  • freeze(some_list) will freeze every element of the list
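
For illustration, a minimal sketch of such a wrapper. It makes no attempt at the isinstance, pickle, or copy criteria, and the munged attribute name is invented:

class FrozenModel:
    def __init__(self, estimator):
        # store under a munged name so attribute delegation does not shadow it
        object.__setattr__(self, '_frozen_estimator', estimator)

    def __getattr__(self, name):
        # only called when normal lookup fails, so fit/fit_transform below win
        return getattr(object.__getattribute__(self, '_frozen_estimator'), name)

    def __setattr__(self, name, value):
        setattr(object.__getattribute__(self, '_frozen_estimator'), name, value)

    def __delattr__(self, name):
        delattr(object.__getattribute__(self, '_frozen_estimator'), name)

    def fit(self, *args, **kwargs):
        return self  # freezing: fit is a no-op

    def fit_transform(self, *args, **kwargs):
        # fit does nothing, so this reduces to transform on the wrapped model
        return self.fit(*args, **kwargs).transform(args[0])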
@jnothman (Member Author)

We could give up on the isinstance criterion and require that users do not use isinstance with estimator types on the RHS except in testing and similar. Uses of isinstance with estimator types in the repo include:


sklearn/ensemble/partial_dependence.py:    if not isinstance(gbrt, BaseGradientBoosting):
sklearn/ensemble/partial_dependence.py:    if not isinstance(gbrt, BaseGradientBoosting):
sklearn/ensemble/weight_boosting.py:                isinstance(self.base_estimator, (BaseDecisionTree,
sklearn/ensemble/weight_boosting.py:            if isinstance(self, ClassifierMixin):
sklearn/ensemble/weight_boosting.py:                isinstance(self.base_estimator,
sklearn/linear_model/coordinate_descent.py:        if isinstance(self, ElasticNetCV) or isinstance(self, LassoCV):
sklearn/neighbors/graph.py:    if not isinstance(X, KNeighborsMixin):
sklearn/neighbors/graph.py:    if not isinstance(X, RadiusNeighborsMixin):
sklearn/neural_network/multilayer_perceptron.py:        if not isinstance(self, ClassifierMixin):
sklearn/neural_network/multilayer_perceptron.py:            if isinstance(self, ClassifierMixin):
sklearn/tree/tree.py:        is_classification = isinstance(self, ClassifierMixin)
sklearn/tree/tree.py:        if isinstance(self, ClassifierMixin):

@jnothman (Member Author)

I've not checked whether any of these are reasonable use contexts for frozen models.

@GaelVaroquaux (Member) commented Feb 16, 2017 via email

@GaelVaroquaux (Member) commented Feb 16, 2017 via email

@jnothman (Member Author) commented Feb 16, 2017 via email

@glemaitre (Member)

I was checking how transfer learning is achieved in Keras.
Porting their design into sklearn, the base estimator would have a trainable attribute which can be set to True/False, enabling or disabling the fit method.
In this case, a freezing function would just toggle trainable.

In this manner, you would not require a wrapper that delegates via __dict__.

What are the drawbacks of such a design?
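
For illustration, a minimal sketch of that flag-based design (the trainable attribute and _fit helper are hypothetical, neither Keras nor sklearn API):

class TrainableMixin:
    """Hypothetical mixin: fit becomes a no-op when trainable is False."""

    trainable = True

    def fit(self, X, y=None):
        if not self.trainable:
            return self  # frozen: keep the previously fitted state
        return self._fit(X, y)  # the real fitting logic, defined per estimator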

@GaelVaroquaux (Member) commented Feb 22, 2017 via email

@jnothman (Member Author)

"Every estimator needs to be coded to check in fit whether to fit."

I don't think that's reasonable. More simply, it can be the responsibility of meta-estimators; each could support or not support trainable in the interim.

The disadvantage is then that meta-estimators become more complicated, and their designers have another thing to remember...

@jnothman (Member Author)

And I'm leaning towards that approach. I don't think we should modify BaseEstimator; just clone, the meta-estimators, and a new freeze function which sets the flag. Is trainable appropriate, or does it have subtly different semantics from frozen (e.g. wrt cloning)?

@jnothman (Member Author)

Since it needs to be handled in each meta-estimator, it is open to buggy implementations (remembering to check for it in one place and not another), so we need to be careful.

@jnothman (Member Author) commented Jun 7, 2017

In discussion earlier, @GaelVaroquaux and I came to some agreement that a simple design goes as follows (more or less what I said in February):

  • user may set estimator.frozen = True (or estimator.trainable = False as per Keras) at any point after fitting
  • supporting meta-estimators will check for the attribute and avoid running fit{,_transform,_predict} where appropriate (see the sketch at the end of this comment).

I would further suggest that:

  • the fact that freezing is supported be marked clearly and consistently in the estimator parameter description on the meta-estimator
  • we deprecate the prefit parameters in CalibratedClassifierCV and SelectFromModel
  • we can suggest to users that if they want to use freezing with meta-estimators that do not support it, they need only wrap their estimator in a Pipeline (which will support freezing from early on).

Main problem with this design is that meta-estimators become more complicated. We have already burdened them with things like caching.
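
For concreteness, a sketch of the flag and of the guard a supporting meta-estimator would apply (helper names are made up for illustration):

def freeze(estimator):
    # mark a fitted estimator; supporting meta-estimators skip refitting it
    estimator.frozen = True
    return estimator

def _fit_if_not_frozen(estimator, X, y=None):
    # the guard a supporting meta-estimator (e.g. Pipeline) would apply
    # before fitting each sub-estimator
    if getattr(estimator, 'frozen', False):
        return estimator  # reuse the existing fitted state
    return estimator.fit(X, y)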

@jnothman (Member Author) commented Jun 8, 2017

#8374 may remain conceptually simpler for users (because it should work in all contexts), though it is a little more magical: it overwrites the fit method of frozen estimators with a special implementation.
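
Roughly, the idea is along these lines (a sketch only, not the PR's actual implementation, which also has to cooperate with clone and pickling):

def freeze(estimator):
    # shadow the class's fit with a no-op bound to this instance
    estimator.fit = lambda *args, **kwargs: estimator
    return estimator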

@luizgh commented Sep 19, 2018

Hi @jnothman - I saw that this issue has been inactive for a while - do you know if this is being addressed elsewhere? We have a use case in a Dynamic Classifier Ensemble library that could be solved by this implementation.

@jnothman (Member Author) commented Sep 19, 2018 via email

@luizgh commented Sep 20, 2018

Sure @jnothman. I am one of the developers of a library for Dynamic Ensemble Selection (DES) methods (the library is called DESlib). We are having problems making the classes compatible with GridSearchCV and other CV functions.

One of the main use cases of this library is to facilitate research in the field of Dynamic Selection of classifiers, and this led to a design decision where the base classifiers are fit by the user, and the DES methods receive a pool of base classifiers that were already fit - this allows users to compare many DES techniques with the same base classifiers. Note that this could also be helpful for any ensemble technique, as it gives users more freedom in how they create the pool of classifiers (see the example code below).

However, this approach is creating an issue with GridSearchCV, since the clone function (defined in sklearn.base) is not cloning the classes as we would like: it produces unfitted copies of the parameters, but we would like the pool of base classifiers to be deep-copied.

I analyzed this issue and I could not find a solution that does not require changes on the scikit-learn code. Here is the sequence of steps that cause the problem:

  • GridSearchCV calls "clone" on the DES estimator
  • The clone function calls the "get_params" function of the DES estimator. We don't re-implement this function, so it gets all the parameters, including the pool of classifiers (at this point, they are still "fitted")
  • The clone function then clones each parameter with safe=False. When cloning the pool of classifiers, the result is a pool that is no longer "fitted".

The problem is that, to my knowledge, there is no way for my classifier to inform "clone" that a parameter should always be deep-copied. I see that other ensemble methods in sklearn always fit the base classifiers within the "fit" method of the ensemble, so this problem does not happen there.

It seems to me that this GitHub issue could address our problem by allowing us to "freeze" the pool of classifiers inside the DES classes, such that they are always deep-copied / re-used (i.e. they retain the trained parameters of the pool of classifiers).

Here is a short code that reproduces the issue:

from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.ensemble import BaggingClassifier
from sklearn.datasets import load_iris


class MyClassifier(BaseEstimator, ClassifierMixin):
    def __init__(self, base_classifiers, k):
        self.base_classifiers = base_classifiers  # Base classifiers that are already trained
        self.k = k  # Simulate a parameter that we want to do a grid search on

    def fit(self, X_dsel, y_dsel):
        # Here we would fit any parameters for the dynamic selection method,
        # not the base classifiers
        return self

    def predict(self, X):
        return self.base_classifiers.predict(X)  # In practice the methods would do something with the predictions of each classifier


X, y = load_iris(return_X_y=True)
X_train, X_dsel, y_train, y_dsel = train_test_split(X, y, test_size=0.5)

base_classifiers = BaggingClassifier()
base_classifiers.fit(X_train, y_train)

clf = MyClassifier(base_classifiers, k=1)

params = {'k': [1, 3, 5, 7]}
grid = GridSearchCV(clf, params)

grid.fit(X_dsel, y_dsel)  # Raises error that the bagging classifiers are not fitted

This code fails because the grid search clones clf, which in turn produces a copy of base_classifiers that is no longer fitted.

FYI, the actual failing test on our library is this: https://github.com/Menelau/DESlib/blob/sklearn-estimators/deslib/tests/test_des_integration.py#L36

@jnothman (Member Author)

One variant we have considered is where the meta-estimator can mark some of its parameters as not to be cloned. Would that work?
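
Something along these lines, purely for illustration (the _dont_clone_params attribute and the helper are invented, not a proposed API):

from sklearn.base import clone

def clone_respecting_frozen_params(estimator):
    # Parameters listed in the (hypothetical) _dont_clone_params attribute
    # are passed through by reference; everything else is cloned as usual.
    keep = set(getattr(estimator, '_dont_clone_params', ()))
    params = estimator.get_params(deep=False)
    new_params = {name: value if name in keep else clone(value, safe=False)
                  for name, value in params.items()}
    return type(estimator)(**new_params)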

@luizgh commented Sep 24, 2018

Just to confirm: would this mean that the clone would simply refer to the same object? (e.g. something like clone.parameter = estimator.parameter)

This should work for our use case, since we make no changes on the base classifiers after they are fit.

@jnothman (Member Author) commented Sep 26, 2018 via email

@luizgh commented Sep 26, 2018

Thanks for confirming. This would solve the problem for our use case.

@samwisehawkins

I also have a potential use case for this. I work with forecasts from weather models. For training models, I have access to historic weather data (wind speed, say) to use as features. However, when making a real-time forecast, that information is not available; I only have forecasts of those features. I want to be able to fit an estimator on the historical features, freeze it, and then use it within a pipeline to make predictions, without it being refit.

@jnothman (Member Author)

At the sprint today we discussed four solutions for this, with a focus on transfer learning:

  1. a freezing wrapper like [WIP] ENH estimator freezing to stop it being cloned/refit #8374, [WIP] ENH magic implementation of estimator freezing #8372, [WIP] ENH estimator FreezeWrap to stop it being cloned/refit #9464
  2. labelling a base estimator with some marker that it is frozen, so that clone avoids cloning it, while meta-estimators explicitly avoid fitting it (ENH estimator freezing #9397)
  3. having a parameter in each meta-estimator that supports frozen base estimators (e.g. Pipeline.set_params(frozen=['pca'])), and having clone ask the meta-estimator which parameters should not be cloned.
  4. recommending users serialise the models whose parameters are to be transferred, and advising the use of some kind of StaticTransformer which takes the serialised model (or a path thereto), decodes it when fitting and transforms using it.

We have resolved to take the fourth option, since it involves no changes to clone, meta-estimators, etc., and to implement an example of it (serialising with joblib and/or ONNX) as a prototype. The third was otherwise seen as the best solution available.

@NicolasHug (Member)

Discussing this with @thomasjpfan: why do we want isinstance(freeze(obj), type(obj)) == True?

If I freeze an SVC, I shouldn't expect it to behave like an SVC... so why do we want this?

(asking because this seems to be the main reason for rejecting one of the PRs, which is otherwise pretty solid)

@NicolasHug (Member)

@GaelVaroquaux @jnothman :) ?

@adrinjalali (Member)

Do we really want a frozen SVC to pretend it's an SVC and yet have its fit do nothing?

@jnothman (Member Author) commented Oct 22, 2019 via email

@adrinjalali (Member)

I'm not sure what you mean by that @jnothman

@jnothman (Member Author) commented Oct 23, 2019 via email

@jonathan-taylor commented Oct 7, 2021

I can't quite tell from this discussion whether it is possible to freeze estimators. I gather the "open" status means it is still under discussion.

How badly can I be led astray by using check_is_fitted (noting the Pipeline exception as a special case)?

Clearly the individual steps of a Pipeline could be manipulated, so this would not be true freezing. My main use case is that I do not want to refit in my own use of the estimators within a few functions/methods; I clearly cannot control what other users do offline. If freezing were possible, I might clone and freeze, but this would lose the reference to the original estimator, which a user might want to inspect.

@nxorable (Contributor) commented Dec 6, 2021

Regarding a static transformer (option 4 referenced above), maybe something like this?

import joblib
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class StaticTransformer(TransformerMixin, BaseEstimator):
    """Predict using a pre-fitted model, acting as a transformer.

    No refitting is done.
    """

    def __init__(self, base_model):
        # path to a serialised, already-fitted model
        self.base_model = base_model
        self.__base_model_object = joblib.load(self.base_model)

    def fit(self, X, y=None):
        # no-op: the wrapped model is already fitted
        return self

    def transform(self, X, y=None):
        # expose the frozen model's predictions as a single feature column
        base_preds = self.__base_model_object.predict(X)
        return np.expand_dims(base_preds, axis=1)

The base model that is loaded (frozen model) does not change, and it seems to survive a clone.

@adrinjalali (Member)

In terms of API, now that we have __sklearn_clone__, this is trivial. Should we close this? Do we want to have a FrozenEstimator inside sklearn?

@lorentzenchr (Member)

+1 to close, as we even have the FrozenTransformer as an example in the docs: https://scikit-learn.org/dev/developers/develop.html#cloning.
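
For reference, a minimal sketch along the lines of that documented example (not verbatim; the wrapped transformer is assumed to be fitted already):

from sklearn.base import BaseEstimator

class FrozenTransformer(BaseEstimator):
    def __init__(self, fitted_transformer):
        self.fitted_transformer = fitted_transformer

    def __getattr__(self, name):
        # delegate attribute access (e.g. transform, fitted attributes)
        # to the wrapped, already-fitted transformer
        return getattr(self.fitted_transformer, name)

    def __sklearn_clone__(self):
        return self  # clone() returns this very instance, fitted state intact

    def fit(self, X, y=None):
        return self  # no-op

    def fit_transform(self, X, y=None):
        # fit is a no-op, so just transform with the frozen transformer
        return self.fitted_transformer.transform(X)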

@glemaitre (Member)

I recall a discussion IRL with users for whom it would make life much easier to have such an estimator to import directly. I'm wondering if it was during the OpenML workshop and linked with some AutoML use cases.

@adrinjalali (Member)

I'm also happy to have it inside the library. I've opened #29705 for this.

@adrinjalali (Member)

Closing as done in #29705, w/o the constraint of isinstance(FrozenEstimator(est), type(est)) == True. I think we can skip that part 😉 unless @jnothman feels strongly about it.
