Add direct GCS export to DatabricksSqlOperator with Parquet/Avro support #60543
jason810496
left a comment
Nice! Thanks for the PR, LGTM overall.
After this PR, the Databricks provider will depend on the GCP provider. Eventually, the Databricks provider will depend on all three cloud providers (AWS, Azure, and GCP), right?
I'm wondering whether we could move this kind of common serialization logic to
Yes, I agree. I think this could be opened as a separate issue?
Yep, that's right.
I'm not quite sure why the Docker build test is failing. The errors show Microsoft's apt repository returning 403 Forbidden during apt-get update, which appears unrelated to the Databricks provider changes.
Restarted; it was a problem in the backend. Assuming CI will turn green in a moment.
jason810496
left a comment
Yes, I agree. I think this could be opened as a separate issue?
Yes, it's non-blocking. We could just create an issue to track it as a follow-up.
I'm getting a build failure on #60719: https://github.com/apache/airflow/actions/runs/21098174150/job/60678759100?pr=60719#step:8:1265
@jason810496 For the build issue Elad mentioned, the error shows that fastavro 1.9.4 uses deprecated C APIs that were removed in Python 3.13. Would bumping fastavro to >=1.10.0 work?
Attempted fix PR: #60732
Adds direct GCS export capability to DatabricksSqlOperator with Parquet and Avro format support.
closes: #55128
Changes
- Add parquet and avro to supported output_format values
- Support GCS paths (gs://bucket/path) in output_path parameter
- Add gcp_conn_id, gcs_impersonation_chain parameters
- Add [gcs] dependency for Google provider
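To illustrate the gs://bucket/path form that output_path accepts, here is a minimal, standard-library-only sketch of splitting such a path into a bucket and an object name. The helper names split_gcs_path and is_gcs_path are hypothetical and are not taken from this PR; the operator's actual parsing may differ.

```python
from urllib.parse import urlsplit


def is_gcs_path(output_path: str) -> bool:
    """Return True when the path uses the gs:// scheme (hypothetical helper)."""
    return output_path.startswith("gs://")


def split_gcs_path(output_path: str) -> tuple[str, str]:
    """Split 'gs://bucket/path/to/object' into (bucket, object_name).

    Hypothetical helper for illustration only; not the PR's implementation.
    """
    parts = urlsplit(output_path)
    if parts.scheme != "gs" or not parts.netloc:
        raise ValueError(f"Not a GCS path: {output_path!r}")
    # The URL path keeps its leading slash; GCS object names do not have one.
    return parts.netloc, parts.path.lstrip("/")


if __name__ == "__main__":
    bucket, object_name = split_gcs_path("gs://bucket/path/export.parquet")
    print(bucket, object_name)
```

A caller could use is_gcs_path to decide between writing to the local filesystem and uploading to GCS, which is presumably where gcp_conn_id and gcs_impersonation_chain would come into play.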