feat: add dry_run parameter to read_gbq(), read_gbq_table() and read_gbq_query() #1674

Merged
merged 17 commits into from
May 5, 2025

Conversation

sycai
Contributor

@sycai sycai commented Apr 30, 2025

If a table reference is fed to read_gbq() with dry_run set to True, we will use SELECT * FROM {table_ref} for dry run

For read_gbq() and read_gbq_table() calls that do not ultimately lead to SQL conversion, we use the table metadata for the dry run stats report.
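
To make the first rule in the description concrete, here is a minimal sketch of how an input string might be turned into the SQL used for a dry run. The helper names `looks_like_query` and `sql_for_dry_run` are hypothetical, and the whitespace heuristic for telling a query from a table reference is an assumption, not the library's confirmed logic.

```python
def looks_like_query(query_or_table: str) -> bool:
    """Hypothetical heuristic: SQL text contains whitespace, a table ID does not."""
    return any(ch.isspace() for ch in query_or_table.strip())


def sql_for_dry_run(query_or_table: str) -> str:
    """Return the SQL text that a dry run would be issued for."""
    text = query_or_table.strip()
    if looks_like_query(text):
        # Already a SQL query: dry-run it unchanged.
        return text
    # Table reference: use the SELECT * form quoted in the description.
    return f"SELECT * FROM {text}"
```

For example, `sql_for_dry_run("my-project.my_dataset.my_table")` yields a `SELECT *` statement over that table, while a string that already contains SQL is passed through unchanged.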

@product-auto-label product-auto-label bot added the labels size: l (Pull request size is large.) and api: bigquery (Issues related to the googleapis/python-bigquery-dataframes API.) Apr 30, 2025
@sycai sycai requested a review from tswast April 30, 2025 22:27
@sycai sycai marked this pull request as ready for review April 30, 2025 22:27
@sycai sycai requested review from a team as code owners April 30, 2025 22:27
Collaborator

tswast commented May 1, 2025

If a table reference is fed to read_gbq() with dry_run set to True, we will use SELECT * FROM {table_ref} for dry run

👎 That's a bit misleading. There are some code paths that do fall back to a query (e.g. if max_results is set). Those should have a dry run because they immediately run a query. But for a deferred operation, I don't think a dry run makes sense. Instead, let's populate what we can from the table metadata and have some indicator that no query is actually run.
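
The split proposed in this comment could be sketched roughly as follows. Everything here is illustrative: `TableInfo`, `dry_run_report`, and the `runs_query` indicator are hypothetical names, and the only fallback condition modeled is `max_results`, which is the one example the comment gives.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class TableInfo:
    """Hypothetical stand-in for the table metadata fetched from BigQuery."""
    table_id: str
    num_rows: int
    num_bytes: int


def dry_run_report(table: TableInfo, max_results: Optional[int] = None) -> Dict[str, Any]:
    """Report what a read would do without actually running it."""
    if max_results is not None:
        # This path falls back to an immediate query, so a real dry run applies.
        # A real implementation would submit this SQL as a dry-run job.
        return {"runs_query": True, "sql": f"SELECT * FROM {table.table_id}"}
    # Deferred read: no query runs yet, so report table metadata only
    # and flag explicitly that no query is executed.
    return {
        "runs_query": False,
        "table_rows": table.num_rows,
        "table_bytes": table.num_bytes,
    }
```

The explicit flag is the "indicator that no query is actually run": a caller can inspect it instead of guessing which path was taken.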

Contributor Author

sycai commented May 1, 2025

If a table reference is fed to read_gbq() with dry_run set to True, we will use SELECT * FROM {table_ref} for dry run

👎 That's a bit misleading. There are some code paths that do fall back to a query (e.g. if max_results is set). Those should have a dry run because they immediately run a query. But for a deferred operation, I don't think a dry run makes sense. Instead, let's populate what we can from the table metadata and have some indicator that no query is actually run.

Sounds good. Code updated. Now read_gbq_table dry run looks like this: https://screenshot.googleplex.com/AHaxiSsafniVFRN

col_dtypes = dtypes.bf_type_from_type_kind(table.schema)
index.append("tableColumnCount")
values.append(len(col_dtypes))
index.append("tableColumnTypes")
Collaborator

It's not super easy for an end user to predict if something will result in a query or just read the table directly. Could we try to align these names so that they don't need as much logic to handle one case over the other?

Contributor Author

SG. Done
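
One way to read the naming concern in this thread: if the metadata path and the query path share the same labels for the fields they have in common, a caller needs no branching to consume either report. The sketch below follows the index/values pattern from the diff excerpt, but the label names (`columnCount`, `columnDtypes`, `totalBytesProcessed`) and function names are hypothetical, not the names the PR settled on.

```python
from typing import Any, Dict


def stats_from_table_metadata(schema: Dict[str, str]) -> Dict[str, Any]:
    """Build stats for a direct table read, using labels shared with the query path."""
    index, values = [], []
    index.append("columnCount")
    values.append(len(schema))
    index.append("columnDtypes")
    values.append(dict(schema))
    return dict(zip(index, values))


def stats_from_query_dry_run(schema: Dict[str, str], total_bytes_processed: int) -> Dict[str, Any]:
    """Build stats for a query dry run: the same shared labels plus query-only fields."""
    stats = stats_from_table_metadata(schema)
    stats["totalBytesProcessed"] = total_bytes_processed
    return stats


def describe_columns(stats: Dict[str, Any]) -> str:
    """Consume either report without checking which path produced it."""
    return f"{stats['columnCount']} columns: {', '.join(stats['columnDtypes'])}"
```

Because both reports expose the shared labels, `describe_columns` works on either one without knowing whether a query would run.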

@sycai sycai requested a review from tswast May 1, 2025 22:46
@sycai sycai changed the title from "feat: add dry_run parameter to read_gbq() and read_gbq_query()" to "feat: add dry_run parameter to read_gbq(), read_gbq_table() and read_gbq_query()" May 1, 2025
Collaborator

@tswast tswast left a comment

Thanks!

@sycai sycai merged commit 4c5dee5 into main May 5, 2025
24 checks passed
@sycai sycai deleted the sycai_dry_run branch May 5, 2025 20:05
Labels
api: bigquery (Issues related to the googleapis/python-bigquery-dataframes API.)
size: l (Pull request size is large.)
3 participants