Support other dataframes like polars and pyarrow not just pandas #25896
While scikit-learn does not currently support Polars or PyArrow dataframes out of the box, there are some possible workarounds to use these dataframes with scikit-learn. One possible solution would be to convert the Polars or PyArrow dataframe to a pandas dataframe before passing it to scikit-learn's `ColumnTransformer`:

```python
import polars as pl
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer

# Load data into a Polars dataframe
X_pl = pl.DataFrame({...})

# Convert the Polars dataframe to a pandas dataframe
X_pd = X_pl.to_pandas()

# Create a ColumnTransformer
preprocessor = ColumnTransformer(
    [
        ("scaler", StandardScaler(), ["sepal length (cm)", "sepal width (cm)"]),
    ]
)

# Fit and transform using the ColumnTransformer
X_transformed = preprocessor.fit_transform(X_pd)
```

Another possible solution would be to write a custom transformer that can directly handle Polars or PyArrow dataframes. This transformer would need to implement the `fit` and `transform` methods:
```python
import polars as pl
from sklearn.base import BaseEstimator, TransformerMixin


class PolarsTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, pl_transformer):
        self.pl_transformer = pl_transformer

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        # Convert the incoming pandas dataframe to Polars
        X_pl = pl.from_pandas(X)
        # Apply the wrapped transformer on the Polars dataframe
        X_transformed_pl = self.pl_transformer.fit_transform(X_pl)
        # Convert the result back to pandas for downstream steps
        X_transformed_pd = X_transformed_pl.to_pandas()
        return X_transformed_pd
```

With this custom transformer, you can pass it directly to scikit-learn's `ColumnTransformer`:
```python
import polars as pl
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer

# Load data into a Polars dataframe
X_pl = pl.DataFrame({...})

# Create a ColumnTransformer using the PolarsTransformer
preprocessor = ColumnTransformer(
    [
        ("scaler", PolarsTransformer(StandardScaler()), ["sepal length (cm)", "sepal width (cm)"]),
    ]
)

# Fit and transform using the ColumnTransformer
X_transformed = preprocessor.fit_transform(X_pl)
```

---
We definitely should fix this; I'm not sure if @thomasjpfan already has plans for it.

---
I think it would make a lot of sense to support other popular data frames, especially if they support the data frame protocol.
If people have plans to work on things like this, it would be great to share them before they start working on it. Seems like a good opportunity to get collaboration going.

---
I see three features with dataframes + a default option. TLDR: The engineering to get other DataFrames to work is doable. Implementation-wise, I prefer to lean as much as we can on the DataFrame exchange protocol to support other dataframes.
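For context, a minimal sketch of what leaning on the DataFrame exchange protocol can look like; this assumes pandas >= 1.5, which ships `pandas.api.interchange.from_dataframe`:

```python
import polars as pl
from pandas.api.interchange import from_dataframe

X_pl = pl.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})

# Any dataframe exposing __dataframe__() can be converted to pandas,
# regardless of which library produced it.
if hasattr(X_pl, "__dataframe__"):
    X_pd = from_dataframe(X_pl)
```

---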
@thomasjpfan Hello Thomas, has this matter been left open for further discussion? May I take it?

---
@jiawei-zhang-a this is by far not a good first issue, and we need to discuss it further. I suggest starting with other, simpler issues. But happy that you're looking to contribute here :)

---
@adrinjalali Your words are greatly appreciated, and I am excited at the opportunity to contribute to the project. Thank you for your encouragement!

---
Or do we magically convert internally to pandas? If we have a full pipeline with a predictor at the end, then I don't find it too much of a hassle. If we have a …

---
Until data-apis/dataframe-api#42 is decided, could we at least support the ones with …? Or could we use https://github.com/apache/arrow-nanoarrow to support arrow arrays in general?

---
As an FYI, it looks like VegaFusion just took the interchange approach for Polars integration; consequently they got Vaex, pyarrow Tables, cuDF, and Polars working with the same update, which seems like good bang for the buck 🤔 https://vegafusion.io/posts/2023/2023-03-25_Release_1.1.0.html

---
Now that we have more or less the infrastructure for it, we shouldn't be too shy about supporting these.

---
@lorentzenchr do you have some example code or a link to something that shows how people use duckdb and scikit-learn now? A super quick Google search got me to https://duckdb.org/docs/api/python/overview.html#result-conversion, which is a bit too basic(?). I'd like to see what some real-world(ish) code looks like today.

---
For libraries that implement the dataframe exchange protocol, a workaround to support other DataFrame input is to convert to pandas with `from_dataframe` in a `FunctionTransformer`:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler, KBinsDiscretizer
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer
import polars as pl
from pandas.api.interchange import from_dataframe

X, y = load_iris(as_frame=True, return_X_y=True)
sepal_cols = ["sepal length (cm)", "sepal width (cm)"]
petal_cols = ["petal length (cm)", "petal width (cm)"]

X_pl = pl.from_pandas(X)

preprocessor = make_pipeline(
    FunctionTransformer(from_dataframe, feature_names_out="one-to-one"),
    ColumnTransformer(
        [
            ("scaler", StandardScaler(), sepal_cols),
            ("kbin", KBinsDiscretizer(encode="ordinal"), petal_cols),
        ],
        verbose_feature_names_out=False,
    ),
)
preprocessor.set_output(transform="pandas")
preprocessor.fit_transform(X_pl)
```

I opened #26115 as an implementation of this idea. As an update, the Polars …

---
I think supporting other dataframes via …

---
They simply convert to pandas before passing the data to scikit-learn. My personal summary:
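To make the duckdb-to-pandas hand-off above concrete, here is a minimal sketch (assuming duckdb's Python API, where a relation's `.df()` materializes the query result as a pandas DataFrame):

```python
import duckdb
from sklearn.linear_model import LinearRegression

# Run a query with duckdb and materialize the result as pandas
rel = duckdb.sql("SELECT 1.0 AS x, 2.0 AS y UNION ALL SELECT 2.0 AS x, 4.0 AS y")
df = rel.df()

# The pandas dataframe is then fed to scikit-learn as usual
model = LinearRegression().fit(df[["x"]], df["y"])
```

---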
Thanks. I wasn't sure whether it was as simple as that or not. I don't think we need an example.

---
Here are my thoughts, since I work with all the dataframe libraries above, Spark, and other frameworks. I'll just list the PROs only:

- substrait.io plan
- custom transformer

---
I think most of the work is done for polars. But the … We also would need …

---
Maybe we could have one such issue per dataframe library we want to support, either for input only or for input/output (e.g. at least pyarrow, I think).

---
FYI, the above code snippet now works, I guess since #26464. So I'm inclined to close.

---
It works for Polars, but not for PyArrow, right? At least:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler, KBinsDiscretizer
from sklearn.compose import ColumnTransformer

X, y = load_iris(as_frame=True, return_X_y=True)
sepal_cols = ["sepal length (cm)", "sepal width (cm)"]
petal_cols = ["petal length (cm)", "petal width (cm)"]

preprocessor = ColumnTransformer(
    [
        ("scaler", StandardScaler(), sepal_cols),
        ("kbin", KBinsDiscretizer(encode="ordinal"), petal_cols),
    ],
    verbose_feature_names_out=False,
)

import pyarrow as pa

X_pa = pa.table(X)

preprocessor.fit_transform(X_pa)
```

raises

```
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[2], line 21
18 import pyarrow as pa
19 X_pa = pa.table(X)
---> 21 preprocessor.fit_transform(X_pa)
File ~/scratch/.310venv/lib/python3.10/site-packages/sklearn/utils/_set_output.py:319, in _wrap_method_output.<locals>.wrapped(self, X, *args, **kwargs)
317 @wraps(f)
318 def wrapped(self, X, *args, **kwargs):
--> 319 data_to_wrap = f(self, X, *args, **kwargs)
320 if isinstance(data_to_wrap, tuple):
321 # only wrap the first output for cross decomposition
322 return_tuple = (
323 _wrap_data_with_container(method, data_to_wrap[0], X, self),
324 *data_to_wrap[1:],
325 )
File ~/scratch/.310venv/lib/python3.10/site-packages/sklearn/base.py:1389, in _fit_context.<locals>.decorator.<locals>.wrapper(estimator, *args, **kwargs)
1382 estimator._validate_params()
1384 with config_context(
1385 skip_parameter_validation=(
1386 prefer_skip_nested_validation or global_skip_validation
1387 )
1388 ):
-> 1389 return fit_method(estimator, *args, **kwargs)
File ~/scratch/.310venv/lib/python3.10/site-packages/sklearn/compose/_column_transformer.py:1001, in ColumnTransformer.fit_transform(self, X, y, **params)
998 else:
999 routed_params = self._get_empty_routing()
-> 1001 result = self._call_func_on_transformers(
1002 X,
1003 y,
1004 _fit_transform_one,
1005 column_as_labels=False,
1006 routed_params=routed_params,
1007 )
1009 if not result:
1010 self._update_fitted_transformers([])
File ~/scratch/.310venv/lib/python3.10/site-packages/sklearn/compose/_column_transformer.py:902, in ColumnTransformer._call_func_on_transformers(self, X, y, func, column_as_labels, routed_params)
897 else: # func is _transform_one
898 extra_args = {}
899 jobs.append(
900 delayed(func)(
901 transformer=clone(trans) if not fitted else trans,
--> 902 X=_safe_indexing(X, columns, axis=1),
903 y=y,
904 weight=weight,
905 **extra_args,
906 params=routed_params[name],
907 )
908 )
910 return Parallel(n_jobs=self.n_jobs)(jobs)
912 except ValueError as e:
File ~/scratch/.310venv/lib/python3.10/site-packages/sklearn/utils/_indexing.py:270, in _safe_indexing(X, indices, axis)
268 return _polars_indexing(X, indices, indices_dtype, axis=axis)
269 elif hasattr(X, "shape"):
--> 270 return _array_indexing(X, indices, indices_dtype, axis=axis)
271 else:
272 return _list_indexing(X, indices, indices_dtype)
File ~/scratch/.310venv/lib/python3.10/site-packages/sklearn/utils/_indexing.py:36, in _array_indexing(array, key, key_dtype, axis)
34 if isinstance(key, tuple):
35 key = list(key)
---> 36 return array[key, ...] if axis == 0 else array[:, key]
File ~/scratch/.310venv/lib/python3.10/site-packages/pyarrow/table.pxi:1693, in pyarrow.lib._Tabular.__getitem__()
File ~/scratch/.310venv/lib/python3.10/site-packages/pyarrow/table.pxi:1779, in pyarrow.lib._Tabular.column()
File ~/scratch/.310venv/lib/python3.10/site-packages/pyarrow/table.pxi:1725, in pyarrow.lib._Tabular._ensure_integer_index()
TypeError: Index must either be string or integer
```

Given that the original issue also mentioned PyArrow, may I suggest either reopening until PyArrow support is completed, or making a separate issue for PyArrow support? Just to avoid ambiguity: I'm not requesting that PyArrow be required in scikit-learn (far from it!), but that it be supported as input.

Related issue: #31019

---
@scikit-learn/core-devs Should we make pyarrow tables work within scikit-learn (without requiring it as a dependency, just like pandas and polars)?

---
Yes, that would be great IMO. I need to look into it more, but are there any major API incompatibilities?

---
Not that I know of. The API calls like …
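As a point of reference for the API question: the traceback above fails because `pyarrow.Table` does not support numpy-style `[:, key]` indexing. A minimal sketch of the equivalent selection calls, using only documented `pyarrow.Table` methods:

```python
import pyarrow as pa

tbl = pa.table({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

col = tbl.column("a")         # single column, by name or integer index
sub = tbl.select(["a", "b"])  # column subset, returns a new Table
rows = tbl.take([0, 2])       # row subset, by integer indices
```

---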
I meant on the Arrow side, to operate internally in fit, predict, etc. Some related discussion: #25450. Would we use the dataframe interchange protocol? https://data-apis.org/dataframe-protocol/latest/purpose_and_scope.html#stakeholders

---
I'm not sure about the dataframe interchange protocol, really. I'd need to see what @MarcoGorelli thinks about it. At some point, in order to support multiple dataframe-like objects, we'd better simply use narwhals.

---
I think the dataframe interchange protocol, at least the one that is similar to the array API, is not going to get widespread adoption. At least that is my impression.

---
There are several different things to fix for an implementation: …

---
You could go full pyarrow with a pyarrow dataset instead of a pyarrow table. Leveraging pyarrow compute to apply calculations is pretty powerful when backed by GPUs.
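For a sense of what "pyarrow compute" means here, a minimal sketch that standardizes a column with `pyarrow.compute` kernels (the column name is made up for illustration):

```python
import pyarrow as pa
import pyarrow.compute as pc

tbl = pa.table({"x": [1.0, 2.0, 3.0]})

# Standardize column "x" entirely with pyarrow.compute kernels
mean = pc.mean(tbl["x"])
std = pc.stddev(tbl["x"])
scaled = pc.divide(pc.subtract(tbl["x"], mean), std)
```

---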
Since the dataframe interchange API is unlikely to become widely adopted and feature-rich enough for scikit-learn use cases, I wouldn't mind considering the inclusion of narwhals. I would still keep custom code to support pandas without narwhals in the short to medium term, to avoid introducing a new dependency for the pandas users, though.

---
I'd be okay adding narwhals as a dependency since it's a very lightweight dependency and doesn't bring in any transitive dependencies. However, I don't mind having two paths for now, one for pandas and one for the others, while making sure we do NOT maintain the pandas path too much, and just leave it as is for now and mostly maintain the narwhals path.

---
PyArrow is used by pandas, polars, and cuDF (RAPIDS), making it a good choice of interface for scikit-learn. Importing narwhals is better than reinventing the wheel in the short term, but an additional dependency may sometimes cause trouble.
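To illustrate the narwhals option discussed above, a minimal sketch of dispatch-free column selection; it assumes `narwhals.from_native` and `to_native`, which work for pandas, Polars, and pyarrow tables alike:

```python
import narwhals as nw

def select_columns(df_native, columns):
    # Wrap whatever dataframe we were given (pandas, Polars, pyarrow, ...)
    df = nw.from_native(df_native, eager_only=True)
    # Select columns with a single code path, then unwrap to the original type
    return df.select(*columns).to_native()

# Example usage with Polars; the same call works for pandas and pyarrow.Table
import polars as pl
X_pl = pl.DataFrame({"a": [1, 2], "b": [3, 4]})
print(select_columns(X_pl, ["a"]))
```

---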
Describe the workflow you want to enable

Currently, scikit-learn nowhere claims to support pyarrow or polars. And indeed, it errors with …

Describe your proposed solution

scikit-learn should support those dataframes, maybe via the Python dataframe interchange protocol. In that regard, a new option like `set_output(transform="dataframe")` would be nice.

Describe alternatives you've considered, if relevant
No response
Additional context
Some related discussion came up in #25813.
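As a follow-up to the proposed `set_output(transform="dataframe")` option: scikit-learn's set_output API did later gain a Polars variant. A minimal sketch, assuming scikit-learn >= 1.4, where `set_output(transform="polars")` is supported:

```python
import polars as pl
from sklearn.preprocessing import StandardScaler

X_pl = pl.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})

# Request polars output from the transformer
scaler = StandardScaler().set_output(transform="polars")
X_out = scaler.fit_transform(X_pl)  # a polars DataFrame
```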