DOC use polars in plot_digits_pipe example #28576

Merged
merged 1 commit into from
Mar 7, 2024

Conversation

MarcoGorelli
Contributor

Reference Issues/PRs

Related to #28341 - if you want to diversify your examples to show a bit of pandas and a bit of Polars, then this one might be a good one to use Polars in?

What does this implement/fix? Explain your changes.

Any other comments?

@MarcoGorelli MarcoGorelli changed the title use polars in plot_digits_pipe example DOC use polars in plot_digits_pipe example Mar 4, 2024

github-actions bot commented Mar 4, 2024

βœ”οΈ Linting Passed

All linting checks passed. Your pull request is in excellent shape! β˜€οΈ

Generated for commit: fa4c590. Link to the linter CI: here

@MarcoGorelli MarcoGorelli marked this pull request as ready for review March 4, 2024 19:20
@ArturoAmorQ
Member

Hi @MarcoGorelli, thanks for the PR :) Though I agree that we can change some examples to use polars, maybe this particular example doesn't really show an advantage with respect to using pandas. Instead we can address #28341 (comment), where @glemaitre mentions that time lagged feature engineering seems to be a more natural place to introduce polars.

@MarcoGorelli
Contributor Author

Thanks @ArturoAmorQ for your review!

The advantage I was thinking of here is that Polars is strict about dtypes and, unlike pandas, doesn't let you do arithmetic on object dtype. The result of using pandas here was that it silently produced a "nonsense" plot:

image

This has since been addressed (#28345, #28571, #28352), but my point is that it was by trying to use Polars that the issue came up and was resolved. So here the advantage isn't in speed or memory usage, but in strictness.
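
As a rough illustration of the pandas side of this point (a minimal sketch with made-up data, not the code from the example or from the linked issues): arithmetic on an object-dtype column is applied element by element with plain Python semantics, so it can run without an error while producing something that is not numeric at all.

```python
import pandas as pd

# Hypothetical data: numbers that ended up stored as strings in an
# object-dtype column (not taken from the plot_digits_pipe example).
values = pd.Series(["1", "2", "3"], dtype=object)

# pandas applies the Python operator to each element, so "*" repeats each
# string instead of multiplying a number, and no error is raised.
doubled = values * 2

print(doubled.tolist())
print(doubled.dtype)
```

The result is still an object-dtype column, so nothing downstream signals that the arithmetic was not numeric.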

Anyway, happy to close if you think the time lagged feature engineering example is a better fit

@adrinjalali
Member

I was actually happy with this PR. I think it's worth the change, I just need to dig into the changed code a bit.

@adrinjalali adrinjalali reopened this Mar 6, 2024
@adrinjalali adrinjalali self-requested a review March 6, 2024 07:16
@ArturoAmorQ
Member

Just to be clear, I am not against this PR. If @adrinjalali is happy with it, then maybe we can highlight the benefit with a comment in the code or as narrative text.

Member

@adrinjalali adrinjalali left a comment

I prefer the semantics here, since it's weird to group by a column and then include other non-aggregated columns in the result in the first place. Thanks @MarcoGorelli
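
To make the semantics point concrete, here is a small sketch of selecting the best row per group explicitly instead of relying on non-aggregated columns being carried through a group-by. It uses pandas and invented column values purely for illustration; it is not the code merged in this PR, and the column names are assumptions modeled on typical grid-search results.

```python
import pandas as pd

# Hypothetical grid-search results: two candidates per n_components value.
results = pd.DataFrame(
    {
        "n_components": [5, 5, 15, 15],
        "mean_test_score": [0.80, 0.85, 0.90, 0.88],
        "std_test_score": [0.02, 0.03, 0.01, 0.04],
    }
)

# Find the row label of the best score within each group, then select those
# whole rows. The group-by itself only ever sees the aggregated column.
best_idx = results.groupby("n_components")["mean_test_score"].idxmax()
best = results.loc[best_idx].reset_index(drop=True)

print(best)
```

Written this way, the aggregation step and the row selection step are separate, so it is clear which columns were aggregated and which were simply looked up afterwards.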

3 participants