FIX ColumnTransformer.fit_transform for polars.DataFrame missing a .size attribute in sparse stacking #32188
…ize attribute in sparse stacking
…r polars/pandas DataFrame
…e="pandas" is a valid input
adrinjalali left a comment
Please add a changelog entry based on our guidelines: https://github.com/scikit-learn/scikit-learn/blob/main/doc/whats_new/upcoming_changes/README.md
Otherwise LGTM.
Thanks @adrinjalali, I added the changelog entry!
jeremiedbb left a comment
Thanks for the PR @ph-ll-pp. I just pushed a commit with a few nitpicks. LGTM
@jeremiedbb Thanks for the nitpicks, much appreciated!
…arse output (scikit-learn#32188) Co-authored-by: Jérémie du Boisberranger <[email protected]>
Reference Issues/PRs
Fixes #32155
What does this implement/fix?
ColumnTransformer.fit_transform failed on polars.DataFrame inputs when some, but not all, transformers returned a sparse matrix/array. In that case, the function tried to access a .size attribute, which polars.DataFrame does not implement. The fix relies on the .shape attribute instead, which is implemented across numpy, pandas, and polars. The tests in test_column_transformer_sparse_stacking were updated to cover stacking of sparse outputs from polars and pandas inputs, in addition to numpy inputs.
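To illustrate the idea behind the fix, here is a minimal sketch that counts elements from .shape instead of .size. It is not the scikit-learn code: `ShapeOnlyFrame` and `n_elements` are hypothetical names made up for this example, and the class only stands in for a container that exposes .shape but no .size.

```python
from math import prod


class ShapeOnlyFrame:
    """Hypothetical stand-in for a dataframe that has .shape but no .size."""

    def __init__(self, n_rows, n_cols):
        self.shape = (n_rows, n_cols)


def n_elements(X):
    """Return the number of elements of X using only its .shape attribute."""
    return prod(X.shape)


frame = ShapeOnlyFrame(3, 2)

# Accessing frame.size would raise AttributeError, as .size is not defined.
assert not hasattr(frame, "size")

# The element count is still available through .shape.
print(n_elements(frame))  # 6
```

Any container with a tuple-like .shape works with this helper, which is why relying on .shape avoids special-casing individual dataframe libraries.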
Any other comments?
I also improved the docstring of the internal _convert_container helper function to indicate that constructor_name="pandas" is a valid input.