Allow column names to pass through when fitting narwhals dataframes #31019
Comments
Hi @ryansheabla, scikit-learn can accept pandas or polars (only most recent versions), but narwhals dataframes are not passable and you need to call

I think narwhals is meant to be used internally by libraries, so that users can then reliably pass polars, pandas or pyarrow frames.

That said, scikit-learn does not use dataframes much internally. Input validation, especially

So the design - if scikit-learn would decide to use narwhals, which has not been discussed yet - would be something like a) store attributes from inputs and if the user has requested to have a different return type than numpy with

Narwhals could be used in steps b) and c), but maybe in step a) it is not necessary to have it.
Thanks @ryansheabla for opening the request, and Stefanie for your reply! 🙏
True, but I think here @ryansheabla is making a library which internally uses Narwhals and would like to pass that around. Also, although Narwhals supports PyArrow tables, scikit-learn doesn't: #25896 (comment)

Even though Narwhals was originally designed with tool builders in mind, I've anecdotally been hearing from users working with it directly as a friendly and unified interface to different engines.

I don't think it would be too much of a lift to generalise the Polars code in scikit-learn to also handle Narwhals input, given that the API is very similar. Happy to work towards this if you'd be open to it! 🙌
From what I can tell scikit-learn does not have a problem converting narwhals DataFrames to numpy arrays, given my provided example runs except for the

I added the changes to the code I outlined above in my virtual environment and the code runs as expected. You can even pass pandas/polars frames to the narwhals-fitted scaler and vice-versa. I'd be willing to open a PR but this feels like it's part of a larger discussion, especially if there's a possibility of a).
Describe the workflow you want to enable
Currently when fitting with a narwhals DataFrame, the feature names do not pass through because it does not implement a __dataframe__ method.
Example:
Expected output
Actual output
All other attributes on s_nw are what I'd expect.
Describe your proposed solution
This should be easy enough to implement by adding another check within sklearn.utils.validation._get_feature_names:
_is_narwhals_df method, borrowing logic from _is_pandas_df
_get_feature_names:
Describe alternatives you've considered, if relevant
No response
Additional context
narwhals-dev/narwhals#355 (comment)