StandardScaler fit_transform() does not work with list as input data when output is configured to 'pandas' #27037

@nihal-rao

Describe the bug

I have a StandardScaler configured to output pandas DataFrames via the set_output API. When fit_transform is called with a list as input data, it raises the TypeError shown below.

The error does not occur when the line my_stand_scale.set_output(transform='pandas') in the MWE below is commented out.

Looking at the traceback, the bug occurs because the index argument in the line return pd.DataFrame(data_to_wrap, index=index, columns=columns, copy=False) is the bound method index of the list object X (the input passed to fit_transform) rather than a pandas DataFrame.index.
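The diagnosis above can be checked without scikit-learn or pandas. The sketch below assumes the wrapper reads the index with an attribute lookup along the lines of getattr(X, "index", None). That lookup is an assumption inferred from the traceback, not quoted from the source. The helper safe_index is hypothetical and only illustrates one possible guard. It is not the library's fix.

```python
# A plain list of rows, like the result of X.tolist() in the MWE.
X = [[0.0, 1.0], [2.0, 3.0]]

# Assumed lookup: every list has an `index` attribute, but it is the
# list.index method, not a row index like DataFrame.index.
index = getattr(X, "index", None)
print(index is not None)  # True
print(callable(index))    # True


def safe_index(original_input):
    """Hypothetical guard: return the input's index only if it is not a method."""
    candidate = getattr(original_input, "index", None)
    return None if callable(candidate) else candidate


print(safe_index(X))  # None
```

With a guard of this kind, a list input would fall back to a default index (None), while an object whose index attribute holds data would keep it.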

Steps/Code to Reproduce

import pandas as pd
from sklearn import preprocessing
from sklearn.datasets import make_blobs

my_stand_scale = preprocessing.StandardScaler()
my_stand_scale.set_output(transform='pandas')

X, y = make_blobs(
    n_samples=30,
    centers=[[0, 0, 0], [1, 1, 1]],
    random_state=0,
    n_features=2,  # ignored here: centers is an array, so the data has 3 features
    cluster_std=0.1,
)

my_stand_scale.fit_transform(X=X.tolist())  # list input triggers the error

Expected Results

No error is thrown when fit_transform is called with a list as input.

Actual Results

Traceback (most recent call last):
  File "/home/nihal/issue_check.py", line 16, in <module>
    my_stand_scale.fit_transform(X=X.tolist())
  File "/home/nihal/scikit-learn/sklearn/utils/_set_output.py", line 140, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/home/nihal/scikit-learn/sklearn/base.py", line 948, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "/home/nihal/scikit-learn/sklearn/utils/_set_output.py", line 153, in wrapped
    return _wrap_data_with_container(method, data_to_wrap, X, self)
  File "/home/nihal/scikit-learn/sklearn/utils/_set_output.py", line 128, in _wrap_data_with_container
    return _wrap_in_pandas_container(
  File "/home/nihal/scikit-learn/sklearn/utils/_set_output.py", line 60, in _wrap_in_pandas_container
    return pd.DataFrame(data_to_wrap, index=index, columns=columns, copy=False)
  File "/home/nihal/miniconda3/envs/sklearn-env/lib/python3.9/site-packages/pandas/core/frame.py", line 722, in __init__
    mgr = ndarray_to_mgr(
  File "/home/nihal/miniconda3/envs/sklearn-env/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 345, in ndarray_to_mgr
    index, columns = _get_axes(
  File "/home/nihal/miniconda3/envs/sklearn-env/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 748, in _get_axes
    index = ensure_index(index)
  File "/home/nihal/miniconda3/envs/sklearn-env/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 7333, in ensure_index
    return Index._with_infer(index_like, copy=copy)
  File "/home/nihal/miniconda3/envs/sklearn-env/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 716, in _with_infer
    result = cls(*args, **kwargs)
  File "/home/nihal/miniconda3/envs/sklearn-env/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 565, in __new__
    subarr = com.asarray_tuplesafe(data, dtype=_dtype_obj)
  File "/home/nihal/miniconda3/envs/sklearn-env/lib/python3.9/site-packages/pandas/core/common.py", line 238, in asarray_tuplesafe
    values = list(values)
TypeError: 'builtin_function_or_method' object is not iterable

Versions

System:
    python: 3.9.16 | packaged by conda-forge | (main, Feb  1 2023, 21:39:03)  [GCC 11.3.0]
executable: /home/nihal/miniconda3/envs/sklearn-env/bin/python
   machine: Linux-5.15.0-76-generic-x86_64-with-glibc2.31

Python dependencies:
      sklearn: 1.4.dev0
          pip: 23.2.1
   setuptools: 68.0.0
        numpy: 1.25.2
        scipy: 1.11.1
       Cython: 3.0.0
       pandas: 1.5.3
   matplotlib: None
       joblib: 1.3.1
threadpoolctl: 3.2.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libopenblas
       filepath: /home/nihal/miniconda3/envs/sklearn-env/lib/libopenblasp-r0.3.23.so
        version: 0.3.23
threading_layer: pthreads
   architecture: Haswell

       user_api: openmp
   internal_api: openmp
    num_threads: 8
         prefix: libgomp
       filepath: /home/nihal/miniconda3/envs/sklearn-env/lib/libgomp.so.1.0.0
        version: None
