Describe the bug
Calling `transform` on a `FunctionTransformer` that has `feature_names_out` configured raises an error advising to use `set_output(transform='pandas')`. However, calling `set_output(transform='pandas')` does not change anything: the same error is still raised.
Steps/Code to Reproduce
```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import FunctionTransformer

my_transformer = FunctionTransformer(
    # func: names each new column "<col> <power>", e.g. "feature 1 2"
    lambda X: pd.concat(
        [
            X[col].rename(f"{col} {str(power)}") ** power
            for col in X
            for power in range(2, 4)
        ],
        axis=1,
    ),
    # feature_names_out: declares names such as "feature 1 square"
    feature_names_out=(
        lambda transformer, input_features: [
            f"{feature} {power_str}"
            for feature in input_features
            for power_str in ["square", "cubic"]
        ]
    ),
)
# I specified transform=pandas
my_transformer.set_output(transform='pandas')
sample_size = 10
X = pd.DataFrame({
    "feature 1": [1, 2, 3, 4, 5],
    "feature 2": [3, 4, 5, 6, 7],
})
my_transformer.fit(X)
my_transformer.transform(X)  # raises ValueError (see Actual Results)
```
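The two naming rules used in the reproducer can be evaluated with pandas alone, without scikit-learn, which makes the disagreement reported in the error message easy to see. In this sketch the reproducer's lambdas are given the hypothetical names `func` and `feature_names_out` so they can be called directly; passing `None` as the transformer argument works only because this particular callable ignores it.

```python
import pandas as pd


def func(X):
    # Same logic as the reproducer's lambda: columns named "<col> <power>".
    return pd.concat(
        [X[col].rename(f"{col} {power}") ** power for col in X for power in range(2, 4)],
        axis=1,
    )


def feature_names_out(transformer, input_features):
    # Same logic as the reproducer's lambda; `transformer` is not used here.
    return [
        f"{feature} {power_str}"
        for feature in input_features
        for power_str in ["square", "cubic"]
    ]


X = pd.DataFrame({"feature 1": [1, 2, 3, 4, 5], "feature 2": [3, 4, 5, 6, 7]})

func_columns = list(func(X).columns)
declared_columns = feature_names_out(None, list(X.columns))
print(func_columns)      # ['feature 1 2', 'feature 1 3', 'feature 2 2', 'feature 2 3']
print(declared_columns)  # ['feature 1 square', 'feature 1 cubic', 'feature 2 square', 'feature 2 cubic']
```

These are exactly the two lists quoted in the `ValueError` under Actual Results.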
Expected Results
A `pandas.DataFrame` like the following:

| | feature 1 square | feature 1 cubic | feature 2 square | feature 2 cubic |
|---|---|---|---|---|
| 0 | 1 | 1 | 9 | 27 |
| 1 | 4 | 8 | 16 | 64 |
| 2 | 9 | 27 | 25 | 125 |
| 3 | 16 | 64 | 36 | 216 |
| 4 | 25 | 125 | 49 | 343 |
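As a sanity check on the expected values, the same frame can be computed directly in pandas by pairing each power with its `feature_names_out` label. This sketch does not involve `FunctionTransformer` at all; it only shows what the transformed output should contain (for example, 4 cubed is 64).

```python
import pandas as pd

X = pd.DataFrame({"feature 1": [1, 2, 3, 4, 5], "feature 2": [3, 4, 5, 6, 7]})

# Build each output column under the name `feature_names_out` declares:
# "square" for power 2, "cubic" for power 3. Dict order is preserved.
expected = pd.DataFrame({
    f"{col} {label}": X[col] ** power
    for col in X
    for label, power in [("square", 2), ("cubic", 3)]
})
print(expected)
```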
Actual Results
ValueError: The output generated by `func` have different column names than the ones provided by `get_feature_names_out`. Got output with columns names: ['feature 1 2', 'feature 1 3', 'feature 2 2', 'feature 2 3'] and `get_feature_names_out` returned: ['feature 1 square', 'feature 1 cubic', 'feature 2 square', 'feature 2 cubic']. The column names can be overridden by setting `set_output(transform='pandas')` or `set_output(transform='polars')` such that the column names are set to the names provided by `get_feature_names_out`.
Versions
```
System:
    python: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
    executable: /usr/bin/python3
    machine: Linux-6.5.0-14-generic-x86_64-with-glibc2.35

Python dependencies:
    sklearn: 1.4.1.post1
    pip: 24.0
    setuptools: 68.2.2
    numpy: 1.24.2
    scipy: 1.11.1
    Cython: None
    pandas: 2.2.1
    matplotlib: 3.7.1
    joblib: 1.3.1
    threadpoolctl: 3.1.0

Built with OpenMP: True

threadpoolctl info:
    user_api: blas
    internal_api: openblas
    prefix: libopenblas
    filepath: /home/fedor/.local/lib/python3.10/site-packages/numpy.libs/libopenblas64_p-r0-15028c96.3.21.so
    version: 0.3.21
    threading_layer: pthreads
    architecture: Haswell
    num_threads: 12

    user_api: openmp
    internal_api: openmp
    prefix: libgomp
    filepath: /home/fedor/.local/lib/python3.10/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
    version: None
    num_threads: 12

    user_api: blas
    internal_api: openblas
    prefix: libopenblas
    filepath: /home/fedor/.local/lib/python3.10/site-packages/scipy.libs/libopenblasp-r0-23e5df77.3.21.dev.so
    version: 0.3.21.dev
    threading_layer: pthreads
    architecture: Haswell
    num_threads: 12
```