Closed
Description
Describe the bug
I have a StandardScaler configured to output pandas DataFrames via the set_output API. When fit_transform is called with a plain Python list as input, it raises the error shown below.
The error does not occur when the line my_stand_scale.set_output(transform='pandas') in the MWE below is commented out.
Digging into the traceback, the bug occurs because the index argument in the call return pd.DataFrame(data_to_wrap, index=index, columns=columns, copy=False) is the bound method index of the list object X (the input passed to fit_transform), rather than the index of a pandas DataFrame.
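The mechanism can be illustrated with plain Python, NumPy and pandas. This is a minimal sketch that assumes the output wrapper looks up the input's index with something like getattr(X, "index", None); the helper get_input_index is hypothetical and is not scikit-learn's actual code or fix.

```python
import numpy as np
import pandas as pd

X_list = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]

# A plain list has an `index` *method*, so a getattr-based lookup "succeeds"
# and returns a bound method instead of a pandas Index.
looked_up = getattr(X_list, "index", None)
print(callable(looked_up))  # True

# A NumPy array has no `index` attribute, so the same lookup falls back to None.
print(getattr(np.asarray(X_list), "index", None))  # None


def get_input_index(original_input):
    """Hypothetical guarded lookup: only reuse the index of pandas objects."""
    if isinstance(original_input, (pd.DataFrame, pd.Series)):
        return original_input.index
    return None


print(get_input_index(X_list))  # None
print(get_input_index(pd.DataFrame(X_list, index=["a", "b"])).tolist())  # ['a', 'b']
```

Under the same assumption, converting the input with np.asarray(X) before calling fit_transform should avoid the error, since an ndarray has no index attribute for the lookup to pick up.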
Steps/Code to Reproduce
import pandas as pd
from sklearn import preprocessing
from sklearn.datasets import make_blobs

my_stand_scale = preprocessing.StandardScaler()
my_stand_scale.set_output(transform='pandas')

X, y = make_blobs(
    n_samples=30,
    centers=[[0, 0, 0], [1, 1, 1]],
    random_state=0,
    n_features=2,  # ignored when centers is an array: X gets 3 features from centers
    cluster_std=0.1,
)

my_stand_scale.fit_transform(X=X.tolist())
Expected Results
No error is thrown when fit_transform is called, and a pandas DataFrame is returned.
Actual Results
Traceback (most recent call last):
  File "/home/nihal/issue_check.py", line 16, in <module>
    my_stand_scale.fit_transform(X=X.tolist())
  File "/home/nihal/scikit-learn/sklearn/utils/_set_output.py", line 140, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/home/nihal/scikit-learn/sklearn/base.py", line 948, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "/home/nihal/scikit-learn/sklearn/utils/_set_output.py", line 153, in wrapped
    return _wrap_data_with_container(method, data_to_wrap, X, self)
  File "/home/nihal/scikit-learn/sklearn/utils/_set_output.py", line 128, in _wrap_data_with_container
    return _wrap_in_pandas_container(
  File "/home/nihal/scikit-learn/sklearn/utils/_set_output.py", line 60, in _wrap_in_pandas_container
    return pd.DataFrame(data_to_wrap, index=index, columns=columns, copy=False)
  File "/home/nihal/miniconda3/envs/sklearn-env/lib/python3.9/site-packages/pandas/core/frame.py", line 722, in __init__
    mgr = ndarray_to_mgr(
  File "/home/nihal/miniconda3/envs/sklearn-env/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 345, in ndarray_to_mgr
    index, columns = _get_axes(
  File "/home/nihal/miniconda3/envs/sklearn-env/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 748, in _get_axes
    index = ensure_index(index)
  File "/home/nihal/miniconda3/envs/sklearn-env/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 7333, in ensure_index
    return Index._with_infer(index_like, copy=copy)
  File "/home/nihal/miniconda3/envs/sklearn-env/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 716, in _with_infer
    result = cls(*args, **kwargs)
  File "/home/nihal/miniconda3/envs/sklearn-env/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 565, in __new__
    subarr = com.asarray_tuplesafe(data, dtype=_dtype_obj)
  File "/home/nihal/miniconda3/envs/sklearn-env/lib/python3.9/site-packages/pandas/core/common.py", line 238, in asarray_tuplesafe
    values = list(values)
TypeError: 'builtin_function_or_method' object is not iterable
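The failing pandas call at the bottom of the traceback can be reproduced without scikit-learn. This sketch passes the list's bound index method directly as the index argument, standing in for what the wrapper ends up receiving; the exact TypeError message depends on the pandas version (the text above is from pandas 1.5.3), so the code only checks the exception type.

```python
import numpy as np
import pandas as pd

X_list = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
data_to_wrap = np.asarray(X_list)  # stands in for the transformed output

# Passing the bound method list.index as `index` fails inside pandas
raised = False
try:
    pd.DataFrame(data_to_wrap, index=X_list.index, copy=False)
except TypeError as exc:
    raised = True
    print(f"TypeError: {exc}")

# With index=None, pandas falls back to a default RangeIndex
df = pd.DataFrame(data_to_wrap, index=None, copy=False)
print(type(df.index).__name__, df.shape)
```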
Versions
System:
    python: 3.9.16 | packaged by conda-forge | (main, Feb 1 2023, 21:39:03) [GCC 11.3.0]
    executable: /home/nihal/miniconda3/envs/sklearn-env/bin/python
    machine: Linux-5.15.0-76-generic-x86_64-with-glibc2.31

Python dependencies:
    sklearn: 1.4.dev0
    pip: 23.2.1
    setuptools: 68.0.0
    numpy: 1.25.2
    scipy: 1.11.1
    Cython: 3.0.0
    pandas: 1.5.3
    matplotlib: None
    joblib: 1.3.1
    threadpoolctl: 3.2.0

Built with OpenMP: True

threadpoolctl info:
    user_api: blas
    internal_api: openblas
    num_threads: 8
    prefix: libopenblas
    filepath: /home/nihal/miniconda3/envs/sklearn-env/lib/libopenblasp-r0.3.23.so
    version: 0.3.23
    threading_layer: pthreads
    architecture: Haswell

    user_api: openmp
    internal_api: openmp
    num_threads: 8
    prefix: libgomp
    filepath: /home/nihal/miniconda3/envs/sklearn-env/lib/libgomp.so.1.0.0
    version: None