feat: add BigQuery dialect #895
|
This looks like a great start. I don't see any obvious red flags. Throwing an error for unsupported cases is good; we could consider adding a specialized error type for that. As for testing, I'd recommend leaving the existing tests alone for now, but creating a new dedicated file/folder for BigQuery tests. We could then follow suit for DuckDB later on if it makes sense. That said, this package is of course DuckDB-centric, so I'm OK with creating separate test folders for other dialects and leaving the DuckDB variants as the "core" use case for now. |
|
Regarding untyped strings:
|
I'll go ahead and add a new folder. Should I copy 100% of the tests, or only the tests for the overridden methods? |
|
I think for now we should test only the functions that are different. Otherwise, we will have a lot of duplication. |
|
I'll mark this as a draft since it needs tests and some more integration work, but I'm excited to eventually get this in. |
|
Are you still planning to work on this? |
|
Yes, we're finishing up our frontend PoC with Mosaic. Once we have that in a good place, we'll start using BigQuery to see how it performs. |
This isn't tested yet; I just wanted to make sure I'm heading in the right direction. It may not cover every language difference, but it should be pretty close, at least for the major functions.
Function node
The function node is probably going to be the hairiest to support, since it relies on an untyped string match and requires some arg reordering, but it wasn't bad. Do you see any issues with the direct arg references?
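To make the question concrete, here is a minimal sketch of the kind of name match and arg reordering involved. It is not the code in this PR: `toBigQueryCall`, `rewrites`, `unsupported`, and the placeholder function name are all hypothetical, and the arguments are assumed to be already-rendered SQL strings. It also assumes the `date_trunc` argument is a timestamp; a date argument would need `DATE_TRUNC` instead.

```typescript
// Hypothetical sketch: none of these names come from the Mosaic codebase.
type Rewrite = (args: string[]) => string;

// Turn a DuckDB date part literal such as 'month' into a BigQuery keyword such as MONTH.
function datePart(literal: string): string {
  return literal.replace(/^'|'$/g, "").toUpperCase();
}

// A Map avoids accidental hits on Object.prototype keys such as "constructor".
const rewrites = new Map<string, Rewrite>([
  // DuckDB: date_trunc('month', ts) -> BigQuery: TIMESTAMP_TRUNC(ts, MONTH)
  ["date_trunc", ([part, ts]) => `TIMESTAMP_TRUNC(${ts}, ${datePart(part)})`],
  // DuckDB: regexp_matches(str, pattern) -> BigQuery: REGEXP_CONTAINS(str, pattern)
  ["regexp_matches", ([str, pattern]) => `REGEXP_CONTAINS(${str}, ${pattern})`],
]);

// Placeholder entry only; a real list would name actual DuckDB-only functions.
const unsupported = new Set<string>(["some_duckdb_only_fn"]);

function toBigQueryCall(name: string, args: string[]): string {
  const key = name.toLowerCase();
  if (unsupported.has(key)) {
    throw new Error(`Function not supported in BigQuery: ${name}`);
  }
  const rewrite = rewrites.get(key);
  // Functions without a special case pass through with an uppercased name.
  return rewrite ? rewrite(args) : `${name.toUpperCase()}(${args.join(", ")})`;
}

console.log(toBigQueryCall("date_trunc", ["'month'", "ts"]));
```

The direct positional references (`[part, ts]`) are the part I'm asking about: they are compact, but they silently produce `undefined` in the output if a caller passes fewer args than expected.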
Types
There were a few untyped strings I ran across that would be easier to translate if they were a string union. I should've kept a better list, but these are the ones I remember:
Maybe the function names in the function node could be too?
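As an illustration of why a string union helps, here is a small sketch. The `JoinKind` union and `bigQueryJoinKeyword` are hypothetical stand-ins, not the actual fields I ran into; the point is only that a union lets the compiler check that a dialect override handles every value.

```typescript
// Hypothetical union; the real set of values would come from the AST node definitions.
type JoinKind = "INNER" | "LEFT" | "RIGHT" | "FULL" | "SEMI" | "ANTI";

function bigQueryJoinKeyword(kind: JoinKind): string {
  switch (kind) {
    case "INNER":
      return "INNER JOIN";
    case "LEFT":
      return "LEFT JOIN";
    case "RIGHT":
      return "RIGHT JOIN";
    case "FULL":
      return "FULL JOIN";
    case "SEMI":
    case "ANTI":
      throw new Error(`${kind} JOIN is not supported by BigQuery`);
    default: {
      // If a new member is added to JoinKind, this assignment stops compiling.
      const exhaustive: never = kind;
      throw new Error(`Unhandled join kind: ${String(exhaustive)}`);
    }
  }
}

console.log(bigQueryJoinKeyword("LEFT"));
```

With a plain `string` field, a typo or a newly added value would only surface at runtime, if at all.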
Casing
DuckDB by convention uses lowercase; BigQuery uses uppercase. I wrote the override functions in uppercase, but I'm not sure whether they should all stay lowercase. I don't feel strongly about it.
Unsupported errors
Right now I'm checking for unsupported cases and throwing errors, following the example of a single throw in the DuckDB visitor. Not sure if there's a pattern otherwise to follow.
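One possible pattern, sketched here as an assumption rather than anything that exists in the package today, is a dedicated error class so callers can tell "this dialect can't express that" apart from other failures. The name `UnsupportedFeatureError` and its fields are hypothetical.

```typescript
// Hypothetical error type; not part of the current package.
class UnsupportedFeatureError extends Error {
  readonly dialect: string;
  readonly feature: string;

  constructor(dialect: string, feature: string) {
    super(`${feature} is not supported by the ${dialect} dialect`);
    this.name = "UnsupportedFeatureError";
    this.dialect = dialect;
    this.feature = feature;
  }
}

// Example of a visitor method rejecting an unsupported construct.
function visitSemiJoin(): string {
  throw new UnsupportedFeatureError("BigQuery", "SEMI JOIN");
}

try {
  visitSemiJoin();
} catch (err) {
  if (err instanceof UnsupportedFeatureError) {
    // Callers could fall back to another strategy or surface a clearer message.
    console.log(`${err.name}: ${err.message}`);
  } else {
    throw err;
  }
}
```

This assumes a compile target where subclassing `Error` preserves the prototype chain (ES2015 or later).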
Testing
We moved the visitor out, but not the tests. Should these SQL tests be moved into per-visitor files? I can add specific BigQuery tests once we decide on that organization.
https://github.com/uwdata/mosaic/tree/c381a4422193464aad796d741ad371dd460863ca/packages/mosaic/sql/test
AST manipulation
There are a few operations that could be supported, but not while walking through toString. One example is SEMI|ANTI JOIN. BigQuery and Snowflake don't support them, but they can be trivially added by adding a WHERE condition. The database engine shouldn't leak into the AST creation, where individual nodes have to run a supportsSemiJoin type of scenario, which is the other place I can think to add it. Maybe an optional method like manipulateAst could be exposed, and executed if the visitor needed to. There's nothing urgent on this, as Mosaic just barely got join support yesterday, so I wouldn't spend any time on it unless there was a materialized view use case for the coordinator. In the meantime, I filed an issue with Google to add syntax support. (https://issuetracker.google.com/issues/446157721)
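A rough sketch of what an optional manipulateAst hook could look like, using a deliberately simplified query shape rather than Mosaic's real AST. Everything here (`Query`, `Join`, `DialectVisitor`, `render`, `generate`) is hypothetical, and `render` stands in for the toString walk. The rewrite turns SEMI and ANTI joins into EXISTS and NOT EXISTS conditions in the WHERE clause.

```typescript
// Simplified, hypothetical query shape; not Mosaic's actual AST.
interface Join {
  kind: "INNER" | "SEMI" | "ANTI";
  table: string;
  on: string;
}

interface Query {
  select: string[];
  from: string;
  joins: Join[];
  where: string[];
}

interface DialectVisitor {
  // Optional pre-pass, run only if the dialect defines it.
  manipulateAst?(query: Query): Query;
  // Stand-in for the toString walk.
  render(query: Query): string;
}

// Rewrite SEMI/ANTI joins into EXISTS / NOT EXISTS conditions.
function rewriteSemiAnti(query: Query): Query {
  const joins: Join[] = [];
  const where = [...query.where];
  for (const join of query.joins) {
    if (join.kind === "SEMI" || join.kind === "ANTI") {
      const prefix = join.kind === "ANTI" ? "NOT " : "";
      where.push(`${prefix}EXISTS (SELECT 1 FROM ${join.table} WHERE ${join.on})`);
    } else {
      joins.push(join);
    }
  }
  return { ...query, joins, where };
}

const bigQueryVisitor: DialectVisitor = {
  manipulateAst: rewriteSemiAnti,
  render(q: Query): string {
    const joins = q.joins.map((j) => ` ${j.kind} JOIN ${j.table} ON ${j.on}`).join("");
    const where = q.where.length > 0 ? ` WHERE ${q.where.join(" AND ")}` : "";
    return `SELECT ${q.select.join(", ")} FROM ${q.from}${joins}${where}`;
  },
};

function generate(query: Query, visitor: DialectVisitor): string {
  const prepared = visitor.manipulateAst ? visitor.manipulateAst(query) : query;
  return visitor.render(prepared);
}

const example: Query = {
  select: ["users.id"],
  from: "users",
  joins: [{ kind: "SEMI", table: "orders", on: "orders.user_id = users.id" }],
  where: [],
};

console.log(generate(example, bigQueryVisitor));
```

The AST builder stays dialect-agnostic in this arrangement: it still produces a SEMI join node, and only visitors that can't emit one opt into the rewrite.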