Description
`sklearn.base.clone` is defined to reconstruct an object of the argument's type with its constructor parameters (from `get_params(deep=False)`) recursively cloned and other attributes removed.
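For concreteness, that contract can be sketched roughly as follows. This is a simplified stand-in, not the real `sklearn.base.clone` (which also recurses into params, validates them, and so on); `TinyEstimator` is an illustrative toy, not a scikit-learn class.

```python
class TinyEstimator:
    """Minimal stand-in for a scikit-learn estimator (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def get_params(self, deep=False):
        return {"alpha": self.alpha}

    def fit(self, X, y=None):
        self.coef_ = 42  # fitted state; cloning is expected to flush this
        return self


def simple_clone(estimator):
    # Re-instantiate from constructor params; fitted attributes are discarded.
    return type(estimator)(**estimator.get_params(deep=False))


est = TinyEstimator(alpha=0.5).fit(None)
copy = simple_clone(est)
assert copy.alpha == 0.5 and not hasattr(copy, "coef_")
```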
There are cases where I think the One Obvious Way to provide an API entails allowing polymorphic overriding of clone behaviour. In particular, my longstanding implementation of wrappers for memoized and frozen estimators relies on this, and I would like that library of utilities not to have to monkey-patch `sklearn.base`; hence this proposal to change `clone` itself.
Let me try to explain. Let's say we want a way to freeze a model: cloning it should not flush its fitted attributes, and calling `fit` again should not affect it. A syntax like the following seems far and away the clearest:
```python
est = freeze_model(MyEstimator().fit(special_X, special_Y))
```
It should be obvious that the standard definition of `clone` won't make this work easily: we need to keep more state than `get_params` returns, unless `MyEstimator().__dict__` becomes a param of the `freeze_model` instance, which is pretty hacky.
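To illustrate the failure mode, here is a hypothetical `FrozenModel` wrapper (all names illustrative, not sklearn API) together with a recursive, `get_params`-based clone in the spirit of the standard one. Cloning rebuilds the inner estimator from its constructor params alone, so exactly the fitted state the wrapper exists to preserve is lost:

```python
class TinyEstimator:
    """Illustrative toy estimator, not a scikit-learn class."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def get_params(self, deep=False):
        return {"alpha": self.alpha}

    def fit(self, X, y=None):
        self.coef_ = 42  # fitted state we would like to keep
        return self


class FrozenModel:
    """Hypothetical wrapper: holds a fitted estimator and refuses to refit."""

    def __init__(self, estimator):
        self.estimator = estimator

    def get_params(self, deep=False):
        return {"estimator": self.estimator}

    def fit(self, X, y=None):
        return self  # frozen: refitting is a no-op


def simple_clone(obj):
    # Recursive, get_params-based reconstruction, as the standard clone does.
    if not hasattr(obj, "get_params"):
        return obj
    params = {k: simple_clone(v) for k, v in obj.get_params(deep=False).items()}
    return type(obj)(**params)


frozen = FrozenModel(TinyEstimator(alpha=0.5).fit(None))
copy = simple_clone(frozen)
# The wrapper survives, but its inner estimator was rebuilt from params,
# so the fitted coef_ is gone -- the freeze is silently undone.
assert not hasattr(copy.estimator, "coef_")
```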
Alternative syntaxes could be class decoration (`freeze_model(MyEstimator)()`) or a mixin (`class MyFrozenEstimator(MyEstimator, FrozenModel): pass`) such that the first call to `fit` sets the frozen model. These are not only uglier, but run into the same problems.
Ideally this sort of estimator wrapper should pass through `{set,get}_params` of the wrapped estimator without adding underscored prefixes (not that this matters much for a frozen model, but it does for other applications of similar wrappers). It should also delegate all attributes to the wrapped estimator. Without making a mess of `freeze_model.__init__`, this is, IMO, not possible without redefining `clone`.
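A sketch of the desired pass-through behaviour, again with a hypothetical `FrozenModel` wrapper and a toy estimator: params are exposed unprefixed, and attribute access is delegated. Note that precisely because `get_params` now reports the inner estimator's params, the standard `clone` would end up calling `FrozenModel(alpha=...)` and fail, which is why some redefinition of `clone` seems unavoidable:

```python
class TinyEstimator:
    """Illustrative toy estimator, not a scikit-learn class."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def get_params(self, deep=False):
        return {"alpha": self.alpha}

    def set_params(self, **params):
        for k, v in params.items():
            setattr(self, k, v)
        return self


class FrozenModel:
    """Hypothetical wrapper exposing the wrapped estimator's params unprefixed."""

    def __init__(self, estimator):
        self.estimator = estimator

    # Pass params straight through -- no "estimator__" prefix.
    def get_params(self, deep=False):
        return self.estimator.get_params(deep=deep)

    def set_params(self, **params):
        self.estimator.set_params(**params)
        return self

    def fit(self, X, y=None):
        return self  # frozen: refitting is a no-op

    def __getattr__(self, name):
        # Called only when normal lookup fails: delegate to the wrapped estimator.
        return getattr(self.estimator, name)


frozen = FrozenModel(TinyEstimator(alpha=0.5))
assert frozen.get_params() == {"alpha": 0.5}
assert frozen.alpha == 0.5  # delegated attribute access
# A get_params-based clone would now attempt FrozenModel(alpha=0.5) -> TypeError.
```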
So. Can we agree:

- that it would not be a Bad Thing to allow polymorphism in cloning?
- on a name for the polymorphic clone method: `clone`, `clone_params`, or `sklearn_clone`?
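For what it's worth, a minimal sketch of what polymorphic cloning could look like, using `sklearn_clone` purely as a placeholder for whichever of the candidate names is chosen (the classes are illustrative toys, not sklearn code):

```python
class TinyEstimator:
    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def get_params(self, deep=False):
        return {"alpha": self.alpha}


def clone(estimator):
    # Defer to the estimator's own clone method when it defines one...
    if hasattr(estimator, "sklearn_clone"):
        return estimator.sklearn_clone()
    # ...otherwise fall back to the standard param-based reconstruction.
    return type(estimator)(**estimator.get_params(deep=False))


class FrozenModel:
    """Hypothetical wrapper that opts out of being rebuilt."""

    def __init__(self, estimator):
        self.estimator = estimator

    def sklearn_clone(self):
        return self  # frozen models clone to themselves, keeping fitted state


frozen = FrozenModel(TinyEstimator())
assert clone(frozen) is frozen
assert isinstance(clone(TinyEstimator(alpha=2.0)), TinyEstimator)
```

The dispatch shown here (an `hasattr` check) is only one possible mechanism; a registry or single-dispatch function would serve equally well.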