feat: add dry_run parameter to read_gbq(), read_gbq_table() and read_gbq_query() #1674
Conversation
👎 That's a bit misleading. There are some code paths that do fall back to a query (e.g. if max_results is set). Those should have a dry run because they do immediately run a query. But for a deferred operation, I don't think dry run makes sense. Instead, let's populate what we can from the table metadata and have some indicator that no query is actually run.
Co-authored-by: release-please[bot] <55107282+release-please[bot]@users.noreply.github.com>
Sounds good. Code updated. Now read_gbq_table dry run looks like this: https://screenshot.googleplex.com/AHaxiSsafniVFRN
bigframes/session/dry_runs.py
col_dtypes = dtypes.bf_type_from_type_kind(table.schema)
index.append("tableColumnCount")
values.append(len(col_dtypes))
index.append("tableColumnTypes")
It's not super easy for an end user to predict whether something will result in a query or just read the table directly. Could we try to align these names so that they don't need as much logic to handle one case over the other?
SG. Done
Thanks!
If a table reference is fed to read_gbq() with dry_run set to True, we will use SELECT * FROM {table_ref} for the dry run.
For read_gbq() and read_gbq_table() calls that do not ultimately lead to SQL conversions, we use the table metadata for the dry run stats report.
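
To make the two cases concrete, here is a rough sketch of the decision as I read it from the discussion above. It is not the code in this PR: plan_dry_run and DryRunPlan are hypothetical names, the "whitespace means SQL" check is a simplifying assumption, and max_results is used only because the review mentions it as one path that falls back to a query.

```python
from typing import NamedTuple, Optional


class DryRunPlan(NamedTuple):
    """Hypothetical result type: what the dry run stats would be based on."""
    kind: str            # "query" or "table_metadata"
    sql: Optional[str]   # SQL that would be dry-run, or None if no query runs


def plan_dry_run(query_or_table: str, max_results: Optional[int] = None) -> DryRunPlan:
    """Hypothetical helper, not part of bigframes."""
    text = query_or_table.strip()

    # Assumption: input containing whitespace is SQL text; anything else is
    # treated as a table reference such as "project.dataset.table".
    if any(ch.isspace() for ch in text):
        return DryRunPlan("query", text)

    # A table read that falls back to a query (here: max_results is set)
    # is dry-run as SELECT * FROM {table_ref}.
    if max_results is not None:
        return DryRunPlan("query", f"SELECT * FROM {text}")

    # Otherwise no query would run, so stats come from table metadata only.
    return DryRunPlan("table_metadata", None)
```

With this sketch, plan_dry_run("proj.ds.tbl") reports table metadata, while plan_dry_run("proj.ds.tbl", max_results=10) reports a query dry run. Returning the kind alongside the SQL is one way to give callers the "no query is actually run" indicator requested in review.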