Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Conversation

aholyoke
Copy link
Contributor

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

cte_follows_insert is a dialect flag which dictates where the CTE appears relative to an insert-from-select statement.

For example, when cte_follows_insert=True, then SA will generate

INSERT INTO table
WITH cte_name AS (
  ...
)
SELECT * FROM cte_name

and when cte_follows_insert=False, then SA will generate

WITH cte_name AS (
  ...
)
INSERT INTO table
SELECT * FROM cte_name

If we unskip the compliance test test_insert_from_select_round_trip, the test will fail with

FAILED tests/sqlalchemy_dialect_compliance/test_dialect_compliance.py::CTETest_bigquery+bigquery::test_insert_from_select_round_trip - sqlalchemy.exc.DatabaseError: (google.cloud.bigquery.dbapi.exceptions.DatabaseError) 400 POST https://bigquery.googleapis.com/bigquery/v2/projects/flywheel-dev-328120/queries?prettyPrint=false: Syntax error: Unexpected keyword INSERT at [5:2]
[SQL: WITH `some_cte` AS
(SELECT `some_table`.`id` AS `id`, `some_table`.`data` AS `data`, `some_table`.`parent_id` AS `parent_id`
FROM `some_table`
WHERE `some_table`.`data` IN UNNEST(%(data_1:STRING)s))
 INSERT INTO `some_other_table` (`id`, `data`, `parent_id`) SELECT `some_cte`.`id`, `some_cte`.`data`, `some_cte`.`parent_id`
FROM `some_cte`]
[parameters: {'data_1': ['d2', 'd3', 'd4']}]

This is because by default SA uses cte_follows_insert=False, but BQ syntax expects CTEs to follow inserts.

Here's a snippet that can reproduce the issue:

import sqlalchemy as sa

engine = sa.create_engine("bigquery://")
conn = engine.connect()

meta = sa.MetaData()
source_table = sa.Table(
    "source_table",
    meta,
    sa.Column("id", sa.Integer()),
    sa.Column("value", sa.Integer()),
    schema=DATASET_NAME,
)
dest_table = sa.Table(
    "dest_table",
    meta,
    sa.Column("id", sa.Integer()),
    sa.Column("value", sa.Integer()),
    schema=DATASET_NAME,
)
meta.create_all(bind=conn)

conn.execute(source_table.insert().values([
    {"id": 1, "value": 5},
    {"id": 2, "value": 2},
    {"id": 3, "value": 4},
    {"id": 4, "value": 9},
    {"id": 5, "value": 7},
    {"id": 6, "value": 1},
]))

cte_expr = sa.select(source_table.columns).where(source_table.c.value >= 5).cte("cte")
insert_stmt = dest_table.insert().from_select(cte_expr.columns, cte_expr)

conn.execute(insert_stmt)

Previously, the insert statement would fail with:

[SQL: WITH `cte` AS
(SELECT `source_table`.`id` AS `id`, `source_table`.`value` AS `value`
FROM `holyoke_test`.`source_table`
WHERE `source_table`.`value` >= %(value_1:INT64)s)
 INSERT INTO `holyoke_test`.`dest_table` (`id`, `value`) SELECT `cte`.`id`, `cte`.`value`
FROM `cte`]
[parameters: {'value_1': 5}]
(Background on this error at: https://sqlalche.me/e/20/4xp6)

But now it succeeds. We can see the query that it generates now correctly writes the cte after the insert statement.

In [5]: from sqlalchemy_bigquery import BigQueryDialect; print(str(insert_stmt.compile(dialect=BigQueryDialect())))
INSERT INTO `holyoke_test`.`dest_table` (`id`, `value`) WITH `cte` AS
(SELECT `source_table`.`id` AS `id`, `source_table`.`value` AS `value`
FROM `holyoke_test`.`source_table`
WHERE `source_table`.`value` >= :value_1)
 SELECT `cte`.`id`, `cte`.`value`
FROM `cte`

@aholyoke aholyoke requested review from a team as code owners July 14, 2024 13:35
@aholyoke aholyoke requested a review from mrfaizal July 14, 2024 13:35
@product-auto-label product-auto-label bot added size: xs Pull request size is extra small. api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API. labels Jul 14, 2024
@tswast tswast assigned Linchin and unassigned tswast Jul 16, 2024
@tswast tswast added the kokoro:run Add this label to force Kokoro to re-run the tests. label Jul 16, 2024
@tswast
Copy link
Collaborator

tswast commented Jul 16, 2024

@Linchin please take a look

@yoshi-kokoro yoshi-kokoro removed the kokoro:run Add this label to force Kokoro to re-run the tests. label Jul 16, 2024
@Linchin Linchin added the owlbot:run Add this label to trigger the Owlbot post processor. label Jul 17, 2024
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Jul 17, 2024
@Linchin
Copy link
Contributor

Linchin commented Jul 18, 2024

@tswast Thanks for the heads up! Sorry for the delay, I got pulled into other projects.

@aholyoke Thanks a lot for helping fix this bug! Also the description is very thorough, which really helps with reviewing.

@Linchin Linchin merged commit 9e0b117 into googleapis:main Jul 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API. size: xs Pull request size is extra small.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants