-
-
Notifications
You must be signed in to change notification settings - Fork 26.3k
Description
At the moment, it's possible to use a pandas dataframe as an input for most sklearn fit/predict/transform methods, but you get a numpy array out. It would be really nice to be able to get data out in the same format you put it in.
This isn't perfectly straightforward, because if your Dataframe contains columns that aren't numeric, then the intermediate numpy arrays will cause sklearn to fail, because they wil be dtype=object
, instead of dtype=float
. This can be solved by having a Dataframe->ndarray transformer, that maps the non-numeric data to numeric data (e.g. integers representing classes/categories). sklearn-pandas already does this, although it currently doesn't have an inverse_transform
, but that shouldn't be hard to add.
I feel like a transform like this would be really useful to have in sklearn - it's the kind of thing that anyone working with datasets with multiple data types would find useful. What would it take to get something like this into sklearn?