Polars Support #735

Open
firmai opened this issue May 30, 2024 · 5 comments
Labels: api: bigquery (Issues related to the googleapis/python-bigquery-dataframes API.); type: feature request ('Nice-to-have' improvement, new feature or different behavior or design.)

Comments

@firmai

firmai commented May 30, 2024

It would be great to offer Polars support. Polars is currently about half as popular as pandas and generally works better for large datasets. It is bound to take over most day-to-day data science operations within the next five years.

Thanks for developing bigframes, it is very useful.

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. label May 30, 2024
@TrevorBergeron
Contributor

What kind of polars support would you find useful? Would you want BigQuery DataFrames to have a polars-like DataFrame API (as an alternative to the current pandas-like one), or simply to interoperate with polars objects more easily?

@tswast tswast added the type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. label Jun 3, 2024
@lmmx

lmmx commented Jun 13, 2024

I would like automatic schema supply. This is currently the limiting step in automatically uploading Polars DataFrames: write_ndjson seems to be the only way I can upload list dtypes (Parquet does not seem to be viable, see this issue), but NDJSON requires the schema to be passed. I'm really looking for something that will just let me put my Polars DataFrame in a BQ table without fiddling with schemas; there should already be enough information in the DataFrame to do that for me.
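To illustrate the kind of schema inference being asked for, here is a minimal sketch that maps Polars dtype names to BigQuery JSON schema fields. It is not part of bigframes or Polars: the helper names `bigquery_field` and `bigquery_schema` are hypothetical, the dtype names are assumed to be what `str(dtype)` returns for a Polars schema entry (for example `Int64` or `List(String)`), and only a handful of scalar types are covered.

```python
import re

# Assumed mapping from Polars dtype names to BigQuery field types.
# Deliberately incomplete: UInt64, Datetime, Struct, etc. are not handled.
_SCALAR_TYPES = {
    "Int8": "INTEGER", "Int16": "INTEGER", "Int32": "INTEGER", "Int64": "INTEGER",
    "UInt8": "INTEGER", "UInt16": "INTEGER", "UInt32": "INTEGER",
    "Float32": "FLOAT", "Float64": "FLOAT",
    "Boolean": "BOOLEAN",
    "String": "STRING", "Utf8": "STRING",  # "Utf8" is the older Polars name
    "Date": "DATE",
}

_LIST_PATTERN = re.compile(r"^List\((.+)\)$")


def bigquery_field(name, dtype_name):
    """Build one BigQuery schema field dict from a column name and dtype name."""
    mode = "NULLABLE"
    match = _LIST_PATTERN.match(dtype_name)
    if match:
        # A list column becomes a REPEATED field of the inner type.
        mode = "REPEATED"
        dtype_name = match.group(1)
    if dtype_name not in _SCALAR_TYPES:
        # Also reached for nested lists, which BigQuery arrays do not allow.
        raise ValueError(f"No BigQuery mapping for column {name!r}: {dtype_name}")
    return {"name": name, "type": _SCALAR_TYPES[dtype_name], "mode": mode}


def bigquery_schema(dtype_names):
    """Build a BigQuery JSON schema from a {column name: dtype name} mapping."""
    return [bigquery_field(name, dtype) for name, dtype in dtype_names.items()]


schema = bigquery_schema({"id": "Int64", "tags": "List(String)"})
```

With a real Polars DataFrame `df`, the input mapping could presumably be built as `{name: str(dtype) for name, dtype in df.schema.items()}`, and the resulting list passed wherever a JSON schema is expected for an NDJSON load.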

@tswast
Collaborator

tswast commented Jun 26, 2024

For going from BigQuery DataFrames to polars, I'm adding a to_arrow method in #807, as well as an example of how to create a polars DataFrame from the results.

@tswast
Collaborator

tswast commented Jan 3, 2025

For uploading to BigQuery, I have updated the polars docs to indicate how to get BigQuery to correctly handle list types: pola-rs/polars#20292.

I think that read_polars and to_polars methods would be reasonable requests for bigframes. I have done some refactoring of our I/O recently that might make this a bit easier, but it would probably require a little more refactoring to use pyarrow tables/record batches as the intermediate format instead of pandas DataFrames. The other thing to be careful about is that polars would need to be an optional "extra" dependency in setup.py, to avoid a hard dependency on the polars package.

Edit: Or at the very least, a read_arrow(...) method to correspond to the to_arrow() I implemented in #807. There are fewer concerns with depending on pyarrow in bigframes because we already have that as a required dependency.
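The optional-dependency concern could be handled along these lines. This is only a sketch under stated assumptions, not how bigframes implements it: the extra name `polars`, the helper `import_optional`, and the standalone `to_polars` wrapper are all hypothetical. The only existing call it relies on from this thread is `to_arrow()`.

```python
import importlib

# In setup.py, the extra might be declared roughly like this (assumed name):
#     extras_require={"polars": ["polars"]}


def import_optional(module_name, extra):
    """Import a module at call time, with an install hint if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install it with: pip install 'bigframes[{extra}]'"
        ) from exc


def to_polars(bf_df):
    """Hypothetical wrapper: convert via Arrow, importing polars only when used."""
    pl = import_optional("polars", extra="polars")
    return pl.from_arrow(bf_df.to_arrow())
```

Because the import happens inside the function, users who never call `to_polars` are not affected when polars is absent, while pyarrow remains the only hard requirement for the Arrow path.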

@lmmx

lmmx commented Feb 13, 2025

Amazing! Should this issue be closed now?

4 participants