ColumnTransformer.transformers_ should store indices rather than a function #12097


Closed
jnothman opened this issue Sep 17, 2018 · 2 comments

@jnothman
Member

When a column is specified as a function, the function should not be stored in transformers_. Rather, the calculated indices should be stored. The current approach risks the function returning different sets of indices when fit and transform are called.

>>> import numpy as np
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.compose import ColumnTransformer
>>> def get_all(X):
...     return np.arange(X.shape[1])
...
>>> trans = ColumnTransformer([('foobar', StandardScaler(), get_all)])
>>> trans.fit(np.array([[1., 2, 3]]))

Expected:

>>> trans.transformers_
[('foobar', StandardScaler(copy=True, with_mean=True, with_std=True), array([0, 1, 2]))]

Actual:

>>> trans.transformers_
[('foobar', StandardScaler(copy=True, with_mean=True, with_std=True), <function get_all at 0x1811fa3048>)]
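To illustrate the risk described above, here is a minimal sketch with a data-dependent selector. The function `nonconstant_columns` is hypothetical (not part of scikit-learn), and plain Python lists stand in for NumPy arrays so the snippet has no dependencies. If the function is re-evaluated at transform time, a transformer fitted on one set of columns can silently be applied to a different set.

```python
def nonconstant_columns(X):
    """Hypothetical selector: indices of columns whose values are not all equal."""
    n_cols = len(X[0])
    return [j for j in range(n_cols) if len({row[j] for row in X}) > 1]

X_fit = [[1.0, 2.0, 3.0],
         [4.0, 2.0, 6.0]]  # column 1 is constant in the training data
X_new = [[7.0, 8.0, 9.0],
         [1.0, 5.0, 9.0]]  # column 2 is constant in the new data

fit_cols = nonconstant_columns(X_fit)        # evaluated during fit
transform_cols = nonconstant_columns(X_new)  # re-evaluated during transform

print(fit_cols)        # [0, 2]
print(transform_cols)  # [0, 1]
```

A transformer fitted on columns [0, 2] would here be fed columns [0, 1] at transform time, which is exactly the mismatch that storing the computed indices avoids.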
@jorisvandenbossche
Member

Apparently we already discussed this some time ago, when the function (callable) column feature was added: #11592 (comment)

@jnothman
Member Author

@janvanrijn asked on another issue about this topic:

So at what time will the list be formed?

During fitting, the column function should be applied to X and the result stored.
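A minimal sketch of that behaviour, assuming a simplified stand-in for ColumnTransformer: the class `ColumnSelectorSketch` and helper `resolve_columns` are hypothetical, only select columns (the transformer slot is an ignored placeholder), and use lists instead of arrays. Callable column specs are evaluated once in `fit`, the resulting indices are stored in `transformers_`, and `transform` reuses those stored indices without calling the function again.

```python
def resolve_columns(transformers, X):
    """Hypothetical helper: replace callable column specs with the indices
    they return on the fit data. Entries are (name, transformer, columns)."""
    resolved = []
    for name, trans, columns in transformers:
        if callable(columns):
            columns = list(columns(X))  # evaluate once, at fit time
        resolved.append((name, trans, columns))
    return resolved


class ColumnSelectorSketch:
    """Hypothetical stand-in for ColumnTransformer that only selects columns."""

    def __init__(self, transformers):
        self.transformers = transformers

    def fit(self, X):
        # Store computed indices, not the functions themselves.
        self.transformers_ = resolve_columns(self.transformers, X)
        return self

    def transform(self, X):
        # Use the indices stored during fit; the column functions are not re-called.
        out = []
        for row in X:
            new_row = []
            for _, _, columns in self.transformers_:
                new_row.extend(row[j] for j in columns)
            out.append(new_row)
        return out


def get_all(X):
    # List-based analogue of get_all from the issue.
    return range(len(X[0]))


def nonconstant_columns(X):
    # Hypothetical data-dependent selector.
    return [j for j in range(len(X[0])) if len({row[j] for row in X}) > 1]


sel = ColumnSelectorSketch([("foobar", "passthrough", get_all)])
sel.fit([[1.0, 2.0, 3.0]])
print(sel.transformers_)  # [('foobar', 'passthrough', [0, 1, 2])]

sel2 = ColumnSelectorSketch([("scaled", "passthrough", nonconstant_columns)])
sel2.fit([[1.0, 2.0, 3.0], [4.0, 2.0, 6.0]])           # resolves to [0, 2]
result = sel2.transform([[7.0, 8.0, 9.0], [1.0, 5.0, 9.0]])
print(result)  # [[7.0, 9.0], [1.0, 9.0]]
```

Even though `nonconstant_columns` would return [0, 1] on the new data, the sketch keeps using the [0, 2] computed at fit time, so the columns seen in transform match those seen in fit.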
