ColumnTransformer.transformers_ should store indices rather than a function #12097

jnothman · 2018-09-17T10:53:05Z

When a column is specified as a function, the function itself should not be stored in transformers_. The indices it computes should be stored instead. Storing the function risks selecting a different set of columns in transform than in fit, because the function may be evaluated again on different data.

>>> import numpy as np
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.compose import ColumnTransformer
>>> def get_all(X):
...     return np.arange(X.shape[1])
...
>>> trans = ColumnTransformer([('foobar', StandardScaler(), get_all)])
>>> trans.fit(np.array([[1., 2, 3]]))

Expected:

>>> trans.transformers_
[('foobar', StandardScaler(copy=True, with_mean=True, with_std=True), array([0, 1, 2]))]

Actual:

>>> trans.transformers_
[('foobar', StandardScaler(copy=True, with_mean=True, with_std=True), <function get_all at 0x1811fa3048>)]
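To make the risk concrete, here is a small pure-Python illustration (not scikit-learn code). It assumes a hypothetical data-dependent selector, nonconstant_columns, that keeps only columns whose values vary. If only the callable is kept, re-evaluating it on new data can pick different columns than were seen during fit.

```python
# Hypothetical illustration (not scikit-learn code): a data-dependent column
# selector returns different indices depending on the array it is given.

def nonconstant_columns(X):
    """Return indices of columns whose values are not all identical."""
    n_cols = len(X[0])
    return [j for j in range(n_cols) if len({row[j] for row in X}) > 1]

X_fit = [[1.0, 2.0, 3.0],
         [4.0, 2.0, 6.0]]
X_new = [[1.0, 5.0, 3.0],
         [1.0, 7.0, 3.0]]

# If only the callable is stored, it is re-evaluated on each call.
cols_at_fit = nonconstant_columns(X_fit)        # [0, 2]
cols_at_transform = nonconstant_columns(X_new)  # [1]
print(cols_at_fit, cols_at_transform)
```

A transformer fitted on two columns would then receive a single column at transform time.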


jorisvandenbossche · 2018-09-17T13:23:45Z

Apparently we already discussed this some time ago, when support for function column specifiers was added: see the comment in #11592.

jnothman · 2018-09-18T03:10:17Z

@janvanrijn asked about this on another issue: "So at what time will the list be formed?"

During fitting, the column function should be applied to X and the result stored.
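A minimal sketch of that behaviour, in plain Python rather than scikit-learn. The class ResolvedColumnSelector and the selector nonconstant_columns are hypothetical names used only for illustration. The callable is applied to X once in fit, the resulting indices are stored in columns_, and transform reuses them without calling the function again.

```python
# Hypothetical sketch (not the scikit-learn implementation): resolve a callable
# column specifier once during fit and reuse the stored indices in transform.

class ResolvedColumnSelector:
    def __init__(self, columns):
        # `columns` may be a list of indices or a callable taking X.
        self.columns = columns

    def fit(self, X):
        if callable(self.columns):
            # Apply the callable to the training data and store the result.
            self.columns_ = list(self.columns(X))
        else:
            self.columns_ = list(self.columns)
        return self

    def transform(self, X):
        # Use the indices computed during fit; the callable is not called again.
        return [[row[j] for j in self.columns_] for row in X]


def nonconstant_columns(X):
    """Hypothetical selector: indices of columns that are not constant."""
    return [j for j in range(len(X[0])) if len({row[j] for row in X}) > 1]


sel = ResolvedColumnSelector(nonconstant_columns).fit([[1.0, 2.0, 3.0],
                                                        [4.0, 2.0, 6.0]])
print(sel.columns_)                       # [0, 2]
print(sel.transform([[1.0, 5.0, 3.0]]))   # [[1.0, 3.0]]
```

Because columns_ is fixed at fit time, inspecting the fitted object shows concrete indices rather than a function, which is what the issue asks transformers_ to report.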

jnothman added the Bug and help wanted labels on Sep 17, 2018

jorisvandenbossche mentioned this issue Sep 18, 2018

[MRG +1] ColumnTransformer: store evaluated function column specifier during fit #12107

Merged

janvanrijn mentioned this issue Sep 19, 2018

ColumnTransformer generalization to work on empty lists #12084

Merged

ogrisel closed this as completed in #12107 Sep 21, 2018
