Pandas in, Pandas out? #5523

At the moment, it's possible to use a pandas DataFrame as input for most sklearn fit/predict/transform methods, but you get a numpy array out. It would be really nice to be able to get data out in the same format you put it in.

This isn't perfectly straightforward: if your DataFrame contains columns that aren't numeric, the intermediate numpy arrays will be `dtype=object` instead of `dtype=float`, which causes sklearn to fail. This can be solved with a DataFrame->ndarray transformer that maps the non-numeric data to numeric data (e.g. integers representing classes/categories). [sklearn-pandas](https://github.com/paulgb/sklearn-pandas/) already does this, although it currently [doesn't have an `inverse_transform`](https://github.com/paulgb/sklearn-pandas/issues/41). That shouldn't be hard to add.
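
To make the idea concrete, here's a rough sketch of what I mean by a DataFrame->ndarray transformer with an `inverse_transform`. The class name `DataFrameCodec` and its attributes are made up for illustration; this is not how sklearn-pandas does it, and it only follows the fit/transform naming convention without depending on sklearn itself. It assumes every non-numeric column holds sortable values and that no unseen values turn up after `fit`.

```python
import numpy as np
import pandas as pd


class DataFrameCodec:
    """Hypothetical helper: maps a DataFrame to a float ndarray and back."""

    def fit(self, df, y=None):
        self.columns_ = list(df.columns)
        self.dtypes_ = df.dtypes.to_dict()
        # Remember the sorted unique values of every non-numeric column.
        self.categories_ = {
            col: pd.Index(sorted(df[col].unique()))
            for col in self.columns_
            if not pd.api.types.is_numeric_dtype(df[col])
        }
        return self

    def transform(self, df):
        out = np.empty((len(df), len(self.columns_)), dtype=float)
        for i, col in enumerate(self.columns_):
            if col in self.categories_:
                # Integer code = position in the sorted categories.
                # Values not seen during fit would get -1 (not handled here).
                out[:, i] = self.categories_[col].get_indexer(df[col])
            else:
                out[:, i] = df[col].to_numpy(dtype=float)
        return out

    def inverse_transform(self, X, index=None):
        data = {}
        for i, col in enumerate(self.columns_):
            if col in self.categories_:
                codes = X[:, i].astype(int)
                data[col] = self.categories_[col].to_numpy()[codes]
            else:
                data[col] = X[:, i]
        restored = pd.DataFrame(data, columns=self.columns_, index=index)
        # Restore the original dtypes of the numeric columns.
        numeric = {c: t for c, t in self.dtypes_.items()
                   if c not in self.categories_}
        return restored.astype(numeric)


df = pd.DataFrame({"age": [31, 45, 27], "city": ["Oslo", "Lima", "Oslo"]})
codec = DataFrameCodec().fit(df)
X = codec.transform(df)  # float ndarray that an estimator can consume
restored = codec.inverse_transform(X, index=df.index)
```

Integer codes impose an arbitrary ordering on the categories, so for many estimators a one-hot encoding would be the better choice. The point here is only that the mapping is reversible, which is what makes "pandas out" possible.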

I feel like a transformer like this would be _really_ useful to have in sklearn - it's the kind of thing that anyone working with datasets of mixed data types would benefit from. What would it take to get something like this into sklearn?