
Support other dataframes like polars and pyarrow not just pandas #25896


Open
lorentzenchr opened this issue Mar 17, 2023 · 39 comments

@lorentzenchr
Member

lorentzenchr commented Mar 17, 2023

Describe the workflow you want to enable

Currently, scikit-learn does not claim anywhere to support pyarrow or polars. And indeed,

import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler, KBinsDiscretizer
from sklearn.compose import ColumnTransformer

X, y = load_iris(as_frame=True, return_X_y=True)
sepal_cols = ["sepal length (cm)", "sepal width (cm)"]
petal_cols = ["petal length (cm)", "petal width (cm)"]

preprocessor = ColumnTransformer(
    [
        ("scaler", StandardScaler(), sepal_cols),
        ("kbin", KBinsDiscretizer(encode="ordinal"), petal_cols),
    ],
    verbose_feature_names_out=False,
)

import polars as pl  # or import pyarrow as pa
X_pl = pl.from_pandas(X)  # or X_pa = pa.table(X)

preprocessor.fit_transform(X_pl)
# preprocessor.set_output(transform="pandas").fit_transform(X_pl)

errors with

AttributeError: 'numpy.ndarray' object has no attribute 'columns'

During handling of the above exception, another exception occurred:

ValueError: Specifying the columns using strings is only supported for pandas DataFrames

Describe your proposed solution

scikit-learn should support those dataframes, maybe via the Python dataframe interchange protocol.

In that regard, a new option like set_output(transform="dataframe") would be nice.

Describe alternatives you've considered, if relevant

No response

Additional context

Some related discussion came up in #25813.

@lorentzenchr lorentzenchr added New Feature Needs Triage Issue requires triage labels Mar 17, 2023
@Vishal-sys-code

While scikit-learn does not currently support Polars or PyArrow dataframes out of the box, there are some possible workarounds for using these dataframes with scikit-learn.

One possible solution is to convert the Polars or PyArrow dataframe to a pandas dataframe before passing it to scikit-learn's ColumnTransformer. This can be done with the to_pandas() method in Polars or the pa.Table.to_pandas() method in PyArrow.

import polars as pl
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer

# Load data into a Polars dataframe
X_pl = pl.DataFrame({...})

# Convert Polars dataframe to Pandas dataframe
X_pd = X_pl.to_pandas()

# Create ColumnTransformer
preprocessor = ColumnTransformer(
    [
        ("scaler", StandardScaler(), ["sepal length (cm)", "sepal width (cm)"]),
    ]
)

# Fit and transform using ColumnTransformer
X_transformed = preprocessor.fit_transform(X_pd)
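
For PyArrow the workaround has the same shape; here is a minimal sketch with illustrative data, reusing the preprocessor defined above:

import pyarrow as pa

# Load data into a PyArrow table (illustrative columns)
X_pa = pa.table({"sepal length (cm)": [5.1, 4.9], "sepal width (cm)": [3.5, 3.0]})

# Convert the PyArrow table to a pandas dataframe
X_pd = X_pa.to_pandas()

# Fit and transform using the same ColumnTransformer
X_transformed = preprocessor.fit_transform(X_pd)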

Another possible solution would be to write a custom transformer that can directly handle Polars or PyArrow dataframes. Such a transformer needs to implement fit() and transform() (fit_transform() then comes from TransformerMixin) and should be compatible with scikit-learn's ColumnTransformer.

import polars as pl
from sklearn.base import BaseEstimator, TransformerMixin

class PolarsTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, pl_transformer):
        self.pl_transformer = pl_transformer

    def fit(self, X, y=None):
        # Fit the wrapped transformer once, on a Polars view of the data
        self.pl_transformer.fit(pl.from_pandas(X))
        return self

    def transform(self, X):
        # Convert pandas -> Polars, apply the already-fitted transformer,
        # then convert back to pandas for downstream steps
        X_pl = pl.from_pandas(X)
        X_transformed_pl = self.pl_transformer.transform(X_pl)
        return X_transformed_pl.to_pandas()

With this custom transformer, you can pass it directly to scikit-learn's ColumnTransformer:

import polars as pl
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer

# Load data into a Polars dataframe
X_pl = pl.DataFrame({...})

# Create PolarsTransformer
preprocessor = ColumnTransformer(
    [
        ("scaler", PolarsTransformer(StandardScaler()), ["sepal length (cm)", "sepal width (cm)"]),
    ]
)

# Fit and transform using ColumnTransformer
X_transformed = preprocessor.fit_transform(X_pl)

@adrinjalali
Member

We should definitely fix this. I'm not sure if @thomasjpfan already has plans for it.

@adrinjalali adrinjalali added RFC and removed Needs Triage Issue requires triage labels Mar 21, 2023
@betatim
Member

betatim commented Mar 21, 2023

I think it would make a lot of sense to support other popular data frames, especially if they support the data frame protocol.

I'm not sure if @thomasjpfan already has plans for it.

If people have plans to work on things like this, it would be great to share them before they start working on it. Seems like a good opportunity to get collaboration going.

@thomasjpfan
Member

thomasjpfan commented Mar 21, 2023

I see three features with dataframes + a default option.

TLDR: The engineering to get other DataFrames to work is doable. Implementation-wise, I prefer to lean as much as we can on the DataFrame exchange protocol.

1. Support other dataframes as input in ColumnTransformer

If we want to support Polars directly, we need to extend ColumnTransformer to recognize it. Although it's not too hard to add polars as an optional dependency, I'd prefer to use the dataframe exchange protocol to get the data out of the input DataFrame.

2. Support other dataframes for output in set_output

When designing set_output, I left the API open so that we can have the following API:

def construct_polars_df(data, columns, index):
    # ignore index since polars does not have an index
    return pl.from_numpy(data, columns=columns)

# API does not work now, but not hard to enable.
transformer.set_output(transform=construct_polars_df)

The above API would configure scikit-learn to output polars DataFrames. The other piece is to get check_array to work with polars dataframes, which currently has some issues: #25813 (comment). Note that even if we get polars to work in a pipeline, it will have to go through many copies, because the polars <-> NumPy round trips are not free. Pandas does not have this issue because it can be backed by a NumPy array through pandas's BlockManager.

3. Generic set_output(transform="dataframe")

Assuming this means "dataframe in -> dataframe out", I think it's best to enable this with the dataframe exchange protocol once data-apis/dataframe-api#42 is decided. As with the above, we'll need to update check_array to work with the exchange protocol. If we do not want to wait for data-apis/dataframe-api#42, we can have optional dependencies on the dataframe libraries.

Default option: Do not extend support for other DataFrames

Given that Pandas 2.0 DataFrames can be backed by arrow, Polars can now go from polars -> pandas with zero copy. As stated in #25896 (comment), one can convert the polars dataframe into a pandas one before passing it to ColumnTransformer. This gives us the option of "Do not extend support for other DataFrames and recommend converting DataFrames into pandas because the conversion is zero copy".
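
For illustration, a sketch of that zero-copy route (assuming polars' to_pandas with use_pyarrow_extension_array, which requests Arrow-backed pandas columns instead of NumPy copies; the data is a placeholder):

import polars as pl

X_pl = pl.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
# Arrow-backed pandas columns reuse the underlying Arrow buffers,
# avoiding a materialized NumPy copy of the data
X_pd = X_pl.to_pandas(use_pyarrow_extension_array=True)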

@jiawei-zhang-a
Contributor

@thomasjpfan Hello Thomas, has this been left open for further discussion? May I take it on?

@adrinjalali
Member

@jiawei-zhang-a this is far from a good first issue, and we need to discuss it further. I suggest starting with other, simpler issues. But I'm happy that you're looking to contribute here :)

@jiawei-zhang-a
Contributor

@adrinjalali Your words are greatly appreciated, and I am excited at the opportunity to contribute to the project. Thank you for your encouragement!

@glemaitre
Member

This gives us the option of "Do not extend support for other DataFrames and recommend converting DataFrames into pandas because the conversion is zero copy".

Or do we magically convert to pandas internally? If we have a full pipeline with a predictor at the end, then I don't find it too much of a hassle. If the Pipeline is itself used as a transformer, then we will be expected to output the same DataFrame type as what came in.

@lorentzenchr
Member Author

Until data-apis/dataframe-api#42 is decided, could we at least support the ones that implement __dataframe__ (quite a few already) by means of pandas.api.interchange.from_dataframe (pandas v1.5.0)? I would like to avoid users having to call X.to_pandas() themselves.

Or could we use https://github.com/apache/arrow-nanoarrow to support arrow arrays in general?
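
For concreteness, a minimal sketch of that __dataframe__ route (the data is a placeholder):

import polars as pl
from pandas.api.interchange import from_dataframe

X_pl = pl.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
# Any object implementing __dataframe__ can be converted like this,
# without the user calling X.to_pandas() themselves
X_pd = from_dataframe(X_pl)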

@alexander-beedie

alexander-beedie commented Mar 28, 2023

Until data-apis/dataframe-api#42 is decided, could we at least support the ones that implement __dataframe__ (quite a few already) by means of pandas.api.interchange.from_dataframe (pandas v1.5.0)? I would like to avoid users having to call X.to_pandas() themselves.

As an FYI, it looks like VegaFusion just took the interchange approach for Polars integration; consequently they got Vaex, pyarrow Tables, cuDF, and Polars working with the same update, which seems like good bang for the buck 🤔

https://vegafusion.io/posts/2023/2023-03-25_Release_1.1.0.html

@adrinjalali
Member

Now that we have more or less the infrastructure for it, we shouldn't be too shy about supporting these.

@betatim
Member

betatim commented Mar 30, 2023

@lorentzenchr do you have some example code or a link to something that shows how people use duckdb and scikit-learn today? A super quick Google search got me to https://duckdb.org/docs/api/python/overview.html#result-conversion, which is a bit too basic(?). I'd like to see what some real-world(ish) code looks like today.

@thomasjpfan
Member

For libraries that implement the dataframe exchange protocol, a workaround to support other DataFrame inputs in ColumnTransformer is to use a FunctionTransformer that converts the DataFrame into a pandas one:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler, KBinsDiscretizer
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer
import polars as pl
from pandas.api.interchange import from_dataframe

X, y = load_iris(as_frame=True, return_X_y=True)
sepal_cols = ["sepal length (cm)", "sepal width (cm)"]
petal_cols = ["petal length (cm)", "petal width (cm)"]
X_pl = pl.from_pandas(X)

preprocessor = make_pipeline(
    FunctionTransformer(from_dataframe, feature_names_out="one-to-one"),
    ColumnTransformer(
        [
            ("scaler", StandardScaler(), sepal_cols),
            ("kbin", KBinsDiscretizer(encode="ordinal"), petal_cols),
        ],
        verbose_feature_names_out=False,
    ),
)
preprocessor.set_output(transform="pandas")
preprocessor.fit_transform(X_pl)

Or do we magically convert to pandas internally? If we have a full pipeline with a predictor at the end, then I don't find it too much of a hassle. If the Pipeline is itself used as a transformer, then we will be expected to output the same DataFrame type as what came in.

I opened #26115 as an implementation of this idea.

As an update, the Polars np.asarray(polars_df) issue was resolved: pola-rs/polars#7961. When the bug fix is released, Polars DataFrames will work out of the box with estimators that assume homogeneous float data. I opened a similar issue for PyArrow: apache/arrow#34886.
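
For illustration, a sketch of what that fix enables (the column names are arbitrary):

import numpy as np
import polars as pl

X_pl = pl.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
# With the fix, a homogeneous float frame converts to a plain 2D array,
# which is all many estimators need from their input
X_np = np.asarray(X_pl)  # shape (2, 2), dtype float64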

@betatim
Member

betatim commented Apr 11, 2023

I think supporting other dataframes via FunctionTransformer and the like feels very much like a clever hack. For the average user it is probably way too time-consuming to figure out that this is the way to make it work; it probably doesn't even cross their mind that it is possible. For me this means we should work on getting to "passing a foobar-dataframe just works".

@lorentzenchr
Member Author

Do you have some example code or a link to something that shows how people use duckdb and scikit-learn today?

They simply convert to pandas before passing the data to fit (I can write some SQL-like data prep example if you like). This means that they have to have pandas installed.

My personal summary:

  1. I'd like to make (or better, find a volunteer to make) data objects that support __dataframe__ magically work, or fail clearly if pandas is not installed.
  2. Further discuss other set_output options.
  3. I'm thinking a lot about an arrow native ML library...

@betatim
Member

betatim commented Apr 13, 2023

They simply convert to pandas before passing the data to fit (I can write some SQL-like data prep example if you like). This means that they have to have pandas installed.

Thanks. I wasn't sure if it was as simple as that or not. Don't think we need an example.

@davlee1972

davlee1972 commented Jun 7, 2023

Here are my thoughts, since I work with all the dataframe libraries above, plus Spark and other frameworks. I'll list the PROs only.

substrait.io plan
https://substrait.io/
If you can define a set of logical operations, the plan can be executed on any substrait-compatible dataframe / engine.
This currently includes R, Presto, Spark, Clickhouse and PyArrow. You get native dataframe execution as the list of substrait-supported dataframes / engines grows.
This also lets developers code something in one library and execute it in production using another library that is more robust and scalable.

custom transformer
With regards to pypolars: Polars supports a lazy execution model which looks at all your transformations and optimizes them. Filters can all be moved to execute first, aggregations can be combined, etc. This requires everything to execute in Polars without converting back and forth to pandas.
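
For illustration, a minimal sketch of that lazy model (the file name and columns are hypothetical):

import polars as pl

# Build a lazy query; scan_csv reads nothing until collect() runs the plan
lazy = (
    pl.scan_csv("data.csv")
    .with_columns((pl.col("x") * 2).alias("x2"))
    .filter(pl.col("y") > 0)  # the optimizer can push this filter down
    .select(["x2", "g"])
)
result = lazy.collect()  # the optimized plan executes only here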

transformer
OK, here is a con: converting to pandas should be replaced with converting to Arrow instead. Pandas 2.0 has added support for pyarrow-backed columns vs numpy-backed columns. There are real issues with numpy-backed pandas, like variable-length string columns being stored in memory as dtype object instead of real strings, or integer columns not allowing NULLs. Pandas, Polars, R dataframes, DuckDB, etc. all already support Arrow under the hood for moving data in and out, which could be processed by scikit-learn.
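
A sketch of that Arrow-backed route (assuming pandas >= 2.0; the columns are illustrative):

import pandas as pd
import pyarrow as pa

# Nullable integers and real string storage survive the round trip
table = pa.table({"s": ["a", None, "c"], "i": [1, None, 3]})
# types_mapper=pd.ArrowDtype keeps the pandas columns Arrow-backed
df = table.to_pandas(types_mapper=pd.ArrowDtype)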

@ogrisel
Member

ogrisel commented Dec 8, 2023

I think most of the work is done for polars. But the ColumnTransformer (and maybe OrdinalEncoder and LabelEncoder) might still need work to support pyarrow properly.

We also would need .set_output(transform="pyarrow") in the transformer mixins.

@ogrisel
Member

ogrisel commented Dec 8, 2023

Maybe we could have one such issue per dataframe library we want to support, either for input only or input/output (e.g. at least pyarrow, I think).

@lorentzenchr
Member Author

FYI, the above code snippet now works, I guess since #26464. So I'm inclined to close.

@github-project-automation github-project-automation bot moved this from Discussion to Done in Dataframe interoperability Jan 10, 2025
@MarcoGorelli
Contributor

MarcoGorelli commented Mar 20, 2025

FYI, the above code snippet now works, I guess since #26464. So I'm inclined to close.

It works for Polars, but not for PyArrow, right?

At least:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler, KBinsDiscretizer
from sklearn.compose import ColumnTransformer

X, y = load_iris(as_frame=True, return_X_y=True)
sepal_cols = ["sepal length (cm)", "sepal width (cm)"]
petal_cols = ["petal length (cm)", "petal width (cm)"]

preprocessor = ColumnTransformer(
    [
        ("scaler", StandardScaler(), sepal_cols),
        ("kbin", KBinsDiscretizer(encode="ordinal"), petal_cols),
    ],
    verbose_feature_names_out=False,
)

import pyarrow as pa
X_pa = pa.table(X)

preprocessor.fit_transform(X_pa)

raises

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[2], line 21
     18 import pyarrow as pa
     19 X_pa = pa.table(X)
---> 21 preprocessor.fit_transform(X_pa)

File ~/scratch/.310venv/lib/python3.10/site-packages/sklearn/utils/_set_output.py:319, in _wrap_method_output.<locals>.wrapped(self, X, *args, **kwargs)
    317 @wraps(f)
    318 def wrapped(self, X, *args, **kwargs):
--> 319     data_to_wrap = f(self, X, *args, **kwargs)
    320     if isinstance(data_to_wrap, tuple):
    321         # only wrap the first output for cross decomposition
    322         return_tuple = (
    323             _wrap_data_with_container(method, data_to_wrap[0], X, self),
    324             *data_to_wrap[1:],
    325         )

File ~/scratch/.310venv/lib/python3.10/site-packages/sklearn/base.py:1389, in _fit_context.<locals>.decorator.<locals>.wrapper(estimator, *args, **kwargs)
   1382     estimator._validate_params()
   1384 with config_context(
   1385     skip_parameter_validation=(
   1386         prefer_skip_nested_validation or global_skip_validation
   1387     )
   1388 ):
-> 1389     return fit_method(estimator, *args, **kwargs)

File ~/scratch/.310venv/lib/python3.10/site-packages/sklearn/compose/_column_transformer.py:1001, in ColumnTransformer.fit_transform(self, X, y, **params)
    998 else:
    999     routed_params = self._get_empty_routing()
-> 1001 result = self._call_func_on_transformers(
   1002     X,
   1003     y,
   1004     _fit_transform_one,
   1005     column_as_labels=False,
   1006     routed_params=routed_params,
   1007 )
   1009 if not result:
   1010     self._update_fitted_transformers([])

File ~/scratch/.310venv/lib/python3.10/site-packages/sklearn/compose/_column_transformer.py:902, in ColumnTransformer._call_func_on_transformers(self, X, y, func, column_as_labels, routed_params)
    897         else:  # func is _transform_one
    898             extra_args = {}
    899         jobs.append(
    900             delayed(func)(
    901                 transformer=clone(trans) if not fitted else trans,
--> 902                 X=_safe_indexing(X, columns, axis=1),
    903                 y=y,
    904                 weight=weight,
    905                 **extra_args,
    906                 params=routed_params[name],
    907             )
    908         )
    910     return Parallel(n_jobs=self.n_jobs)(jobs)
    912 except ValueError as e:

File ~/scratch/.310venv/lib/python3.10/site-packages/sklearn/utils/_indexing.py:270, in _safe_indexing(X, indices, axis)
    268     return _polars_indexing(X, indices, indices_dtype, axis=axis)
    269 elif hasattr(X, "shape"):
--> 270     return _array_indexing(X, indices, indices_dtype, axis=axis)
    271 else:
    272     return _list_indexing(X, indices, indices_dtype)

File ~/scratch/.310venv/lib/python3.10/site-packages/sklearn/utils/_indexing.py:36, in _array_indexing(array, key, key_dtype, axis)
     34 if isinstance(key, tuple):
     35     key = list(key)
---> 36 return array[key, ...] if axis == 0 else array[:, key]

File ~/scratch/.310venv/lib/python3.10/site-packages/pyarrow/table.pxi:1693, in pyarrow.lib._Tabular.__getitem__()

File ~/scratch/.310venv/lib/python3.10/site-packages/pyarrow/table.pxi:1779, in pyarrow.lib._Tabular.column()

File ~/scratch/.310venv/lib/python3.10/site-packages/pyarrow/table.pxi:1725, in pyarrow.lib._Tabular._ensure_integer_index()

TypeError: Index must either be string or integer

Given that the original issue also mentioned PyArrow, may I suggest either reopening until PyArrow support is completed, or making a separate issue for PyArrow support?

Just to avoid ambiguity: I'm not requesting that PyArrow be required in scikit-learn (far from it!), but that pyarrow.Table be supported in the same way that polars.DataFrame is.

Related issue: #31019

@lorentzenchr
Member Author

@scikit-learn/core-devs Should we make pyarrow tables work within scikit-learn (without requiring it as dependency, just like pandas and polars)?

@adam2392
Member

Yes that would be great imo. I need to look into it more, but are there any major API incompatibilities?

@lorentzenchr
Member Author

are there any major API incompatibilities?

Not that I know of. API calls like fit(X, y) stay the same; we would just allow more kinds of objects X, y to be passed.

@adam2392
Member

I meant on the Arrow side, to operate internally in fit, predict, etc.

Some related discussion: #25450

Would we use the dataframe interchange protocol? https://data-apis.org/dataframe-protocol/latest/purpose_and_scope.html#stakeholders

@adrinjalali
Member

I'm not sure about the dataframe interchange protocol, really.

I'd need to see what @MarcoGorelli thinks about it. At some point, in order to support multiple dataframe-like objects, we'd be better off simply using narwhals.

@betatim
Member

betatim commented Mar 20, 2025

I think the dataframe interchange protocol, at least the one that is similar to the array API, is not going to get widespread adoption. At least that is my impression.

@lorentzenchr
Member Author

There are several different things to fix for an implementation:

  1. set_output
    This requires a PyArrowTablesAdapter in sklearn/utils/_set_output.py (see the sketch after this list). Here we could, sooner or later, think about using narwhals.
  2. feature_names_in_, see the release highlights for 1.0
    This is done in sklearn/utils/validation.py and makes use of the dataframe interchange protocol, which is supported by pyarrow as of version 11.0.0.
  3. Internal indexing tools in sklearn/utils/_indexing.py
    This is where the error reported in #25896 (comment) stems from, i.e. _safe_indexing. It currently uses a mix of the dataframe protocol and pandas & polars specific code. Here, too, narwhals could help.
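
For concreteness, a rough sketch of what such an adapter could look like. The class name comes from item 1 above; the method names are assumptions modeled on the existing pandas/polars adapters, not scikit-learn's actual internal API:

import numpy as np
import pyarrow as pa

class PyArrowTablesAdapter:
    # Hypothetical sketch; the real adapter protocol may differ.
    container_lib = "pyarrow"

    def is_supported_container(self, X):
        return isinstance(X, pa.Table)

    def create_container(self, X_output, X_original, columns, inplace=False):
        # pyarrow tables are immutable, so `inplace` is ignored here
        data = np.asarray(X_output)
        return pa.table({name: data[:, i] for i, name in enumerate(columns)})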

@davlee1972

You could go full pyarrow with pyarrow dataset instead of pyarrow table.

Leveraging pyarrow compute to apply calculations is pretty powerful when backed by GPUs.

@ogrisel
Member

ogrisel commented Mar 21, 2025

Since the dataframe interchange API is unlikely to become widely adopted and feature-rich enough for scikit-learn's use cases, I wouldn't mind considering the inclusion of narwhals as a soft dependency to simplify support for polars / pyarrow tables in the future.

I would still keep custom code to support pandas without narwhals in the short to medium term, though, to avoid introducing a new dependency for pandas users.
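
For illustration, a tiny sketch of the narwhals approach (the helper name is hypothetical):

import narwhals as nw

def feature_names(df_native):
    # narwhals wraps pandas, polars and pyarrow objects behind one API,
    # so column names can be read without per-library branching
    return nw.from_native(df_native).columns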

@adrinjalali
Member

I'd be okay with adding narwhals as a dependency since it's very lightweight and doesn't bring in any transitive dependencies. However, I don't mind having two paths for now, one for pandas and one for the others, as long as we do NOT actively maintain the pandas path: leave it as is for now and mostly maintain the narwhals path.

@YuanfengZhang

YuanfengZhang commented Apr 20, 2025

PyArrow is used by pandas, polars and cudf (RAPIDS) alike, making it a good choice of interface for scikit-learn.
How about trying the pyarrow approach and falling back to numpy if it fails?
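
A minimal sketch of that fallback idea (the helper name is hypothetical):

import numpy as np

def coerce_input(X):
    # Hypothetical: prefer an Arrow table, fall back to a NumPy array
    try:
        import pyarrow as pa
        return pa.table(X)
    except Exception:
        return np.asarray(X)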

Importing narwhals is better than reinventing the wheel in the short term, but an additional dependency may sometimes cause trouble.
