-
-
Notifications
You must be signed in to change notification settings - Fork 26.6k
Closed
Description
Description
Let us say we have a custom transform A that has some arguments. When the A is instantiated, these arguments are set in the init.
When we clone A (as happens in cross_val_score, for example), the arguments get copied successfully.
However, if the arguments are sent to a structure such as a dictionary, the clone replaces them with None.
In cases where None does not cause errors, this creates a silent error, as the cloned version of A will run, producing different results from its original version (which is how I ran into this problem in the first place).
Fully replicable example follows.
Steps/Code to Reproduce
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.base import clone
class MyTransformA(BaseEstimator, TransformerMixin):
def __init__(self, n_cols_to_keep):
self.cols_to_keep_dict = {'n_cols': n_cols_to_keep}
def fit(self, X, *_):
return self
def transform(self, X, *_):
return X
class MyTransformB(BaseEstimator, TransformerMixin):
def __init__(self, n_cols_to_keep):
self.n_cols_to_keep = n_cols_to_keep # <--- this time we save the input immediately
self.cols_to_keep_dict = {'n_cols': self.n_cols_to_keep}
def fit(self, X, *_):
return self
def transform(self, X, *_):
return X
my_transform_a = MyTransformA(n_cols_to_keep=5)
my_transform_a_clone = clone(my_transform_a)
my_transform_b = MyTransformB(n_cols_to_keep=5)
my_transform_b_clone = clone(my_transform_b)
print('Using MyTransformA:')
print(' my_transform_a.cols_to_keep_dict: %s' % str(my_transform_a.cols_to_keep_dict))
print(' my_transform_a_clone.cols_to_keep_dict: %s <------ ?' % str(my_transform_a_clone.cols_to_keep_dict))
print('\nUsing MyTransformB:')
print(' my_transform_b.cols_to_keep_dict: %s' % str(my_transform_b.cols_to_keep_dict))
print(' my_transform_b_clone.cols_to_keep_dict): %s' % str(my_transform_b_clone.cols_to_keep_dict))
Expected Results
Using MyTransformA:
my_transform_a.cols_to_keep_dict: ('n_cols', 5)
my_transform_a_clone.cols_to_keep_dict: ('n_cols', 5) <------ Does not happen
Using MyTransformB:
my_transform_b.cols_to_keep_dict: {'n_cols': 5}
my_transform_b_clone.cols_to_keep_dict): {'n_cols': 5}
Actual Results
Using MyTransformA:
my_transform_a.cols_to_keep_dict: ('n_cols', 5)
my_transform_a_clone.cols_to_keep_dict: ('n_cols', None) <------ ?
Using MyTransformB:
my_transform_b.cols_to_keep_dict: {'n_cols': 5}
my_transform_b_clone.cols_to_keep_dict): {'n_cols': 5}
Versions
System:
python: 3.7.3 (default, Mar 27 2019, 16:54:48) [Clang 4.0.1 (tags/RELEASE_401/final)]
executable: /anaconda3/bin/python
machine: Darwin-18.6.0-x86_64-i386-64bit
BLAS:
macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
lib_dirs: /anaconda3/lib
cblas_libs: mkl_rt, pthread
Python deps:
pip: 19.0.3
setuptools: 40.8.0
sklearn: 0.20.3
numpy: 1.16.2
scipy: 1.2.1
Cython: 0.29.6
pandas: 0.24.2
Python 3.7.3 (default, Mar 27 2019, 16:54:48)
[Clang 4.0.1 (tags/RELEASE_401/final)]
NumPy 1.16.2
SciPy 1.2.1
Scikit-Learn 0.20.3
Metadata
Metadata
Assignees
Labels
No labels