Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Cloning custom transform replaces values in __init__ dictionary #14461

@PedroGFonseca

Description

@PedroGFonseca

Description

Let us say we have a custom transform A that has some arguments. When the A is instantiated, these arguments are set in the init.

When we clone A (as happens in cross_val_score, for example), the arguments get copied successfully.

However, if the arguments are sent to a structure such as a dictionary, the clone replaces them with None.

In cases where None does not cause errors, this creates a silent error, as the cloned version of A will run, producing different results from its original version (which is how I ran into this problem in the first place).

Fully replicable example follows.

Steps/Code to Reproduce

from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.base import clone


class MyTransformA(BaseEstimator, TransformerMixin):
    
    def __init__(self, n_cols_to_keep):
        self.cols_to_keep_dict = {'n_cols': n_cols_to_keep}  
    
    def fit(self, X, *_):
        return self 
        
    def transform(self, X, *_):
        return X
    
    
class MyTransformB(BaseEstimator, TransformerMixin):

    def __init__(self, n_cols_to_keep):
        self.n_cols_to_keep = n_cols_to_keep  # <--- this time we save the input immediately 
        self.cols_to_keep_dict = {'n_cols': self.n_cols_to_keep}  
    
    def fit(self, X, *_):
        return self 
        
    def transform(self, X, *_):
        return X

my_transform_a = MyTransformA(n_cols_to_keep=5)
my_transform_a_clone = clone(my_transform_a)

my_transform_b = MyTransformB(n_cols_to_keep=5)
my_transform_b_clone = clone(my_transform_b)

print('Using MyTransformA:')
print('  my_transform_a.cols_to_keep_dict:        %s' % str(my_transform_a.cols_to_keep_dict))
print('  my_transform_a_clone.cols_to_keep_dict:  %s  <------ ?' % str(my_transform_a_clone.cols_to_keep_dict))

print('\nUsing MyTransformB:')
print('  my_transform_b.cols_to_keep_dict:        %s' % str(my_transform_b.cols_to_keep_dict))
print('  my_transform_b_clone.cols_to_keep_dict): %s' % str(my_transform_b_clone.cols_to_keep_dict))

Expected Results

Using MyTransformA:
  my_transform_a.cols_to_keep_dict:        ('n_cols', 5)
  my_transform_a_clone.cols_to_keep_dict:  ('n_cols', 5)  <------ Does not happen

Using MyTransformB:
  my_transform_b.cols_to_keep_dict:        {'n_cols': 5}
  my_transform_b_clone.cols_to_keep_dict): {'n_cols': 5}

Actual Results

Using MyTransformA:
  my_transform_a.cols_to_keep_dict:        ('n_cols', 5)
  my_transform_a_clone.cols_to_keep_dict:  ('n_cols', None)  <------ ?

Using MyTransformB:
  my_transform_b.cols_to_keep_dict:        {'n_cols': 5}
  my_transform_b_clone.cols_to_keep_dict): {'n_cols': 5}

Versions

System:
    python: 3.7.3 (default, Mar 27 2019, 16:54:48)  [Clang 4.0.1 (tags/RELEASE_401/final)]
executable: /anaconda3/bin/python
   machine: Darwin-18.6.0-x86_64-i386-64bit

BLAS:
    macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
  lib_dirs: /anaconda3/lib
cblas_libs: mkl_rt, pthread

Python deps:
       pip: 19.0.3
setuptools: 40.8.0
   sklearn: 0.20.3
     numpy: 1.16.2
     scipy: 1.2.1
    Cython: 0.29.6
    pandas: 0.24.2
Python 3.7.3 (default, Mar 27 2019, 16:54:48) 
[Clang 4.0.1 (tags/RELEASE_401/final)]
NumPy 1.16.2
SciPy 1.2.1
Scikit-Learn 0.20.3

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions