Pandas in, Pandas out? #5523

@naught101

At the moment, most sklearn fit/predict/transform methods accept a pandas DataFrame as input, but they return a numpy array. It would be really nice to be able to get data out in the same format you put it in.
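To make the round trip concrete, here is a minimal sketch. It assumes scikit-learn's default output configuration and uses `StandardScaler` purely as an example transformer. The helper `transform_keep_frame` is hypothetical (it is not part of sklearn or pandas) and only makes sense for transformers that keep the number and order of columns unchanged.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame(
    {"height": [1.0, 2.0, 3.0], "weight": [10.0, 20.0, 30.0]},
    index=["a", "b", "c"],
)

scaler = StandardScaler()
# With default settings the result is a plain ndarray:
# the index and the column labels are gone.
out = scaler.fit_transform(df)


def transform_keep_frame(transformer, frame):
    """Hypothetical helper: transform, then re-attach the input's labels.

    Assumes the transformer returns one output column per input column,
    in the same order.
    """
    values = transformer.transform(frame)
    return pd.DataFrame(values, index=frame.index, columns=frame.columns)


restored = transform_keep_frame(scaler, df)
```

Re-wrapping by hand like this works for simple cases, but it breaks as soon as a transformer adds, drops, or reorders columns, which is part of why built-in support would be preferable.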

This isn't perfectly straightforward: if your DataFrame contains columns that aren't numeric, the intermediate numpy arrays will be dtype=object instead of dtype=float, and that will cause sklearn to fail. This can be solved by having a DataFrame->ndarray transformer that maps the non-numeric data to numeric data (e.g. integers representing classes/categories). sklearn-pandas already does this, although it currently doesn't have an inverse_transform, but that shouldn't be hard to add.
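A rough sketch of what such a transformer could look like, using only pandas and numpy. The class name `FrameToArray` and its attributes are hypothetical; this is not how sklearn-pandas is implemented. It follows the fit/transform/inverse_transform naming convention but does not subclass sklearn's base classes, which a real contribution would probably need to do.

```python
import numpy as np
import pandas as pd


class FrameToArray:
    """Hypothetical DataFrame -> float ndarray transformer with an inverse."""

    def fit(self, X, y=None):
        self.columns_ = list(X.columns)
        self.dtypes_ = X.dtypes.to_dict()
        # Remember the categories of every non-numeric column.
        self.categories_ = {
            col: pd.Categorical(X[col]).categories
            for col in self.columns_
            if not pd.api.types.is_numeric_dtype(X[col])
        }
        return self

    def transform(self, X):
        out = np.empty((len(X), len(self.columns_)), dtype=float)
        for j, col in enumerate(self.columns_):
            if col in self.categories_:
                # Integer codes; values not seen during fit become -1.
                cat = pd.Categorical(X[col], categories=self.categories_[col])
                out[:, j] = cat.codes
            else:
                out[:, j] = X[col].to_numpy(dtype=float)
        return out

    def inverse_transform(self, A, index=None):
        data = {}
        for j, col in enumerate(self.columns_):
            if col in self.categories_:
                codes = A[:, j].astype(int)
                cat = pd.Categorical.from_codes(
                    codes, categories=self.categories_[col]
                )
                data[col] = np.asarray(cat)
            else:
                # Restore the original numeric dtype (e.g. int64).
                data[col] = A[:, j].astype(self.dtypes_[col])
        return pd.DataFrame(data, index=index)


df = pd.DataFrame(
    {"color": ["red", "green", "red"], "size": [1, 2, 3], "price": [0.5, 1.5, 2.5]}
)
mapper = FrameToArray().fit(df)
arr = mapper.transform(df)  # float array, safe to pass to an estimator
back = mapper.inverse_transform(arr, index=df.index)
```

Integer codes impose an arbitrary ordering on the categories, so for many estimators a one-hot encoding would be a better intermediate representation; codes are used here only because they make the inverse mapping trivial.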

I feel like a transform like this would be really useful to have in sklearn - it's the kind of thing that anyone working with datasets with multiple data types would find useful. What would it take to get something like this into sklearn?
