[SQL] Support per-table 'skip_unused_columns' property #5458
Pull request overview
This PR adds support for a per-table SQL property skip_unused_columns to address issue #5436, which required taking the skip_unused_fields setting into account when computing pipeline diffs.
Changes:
- Added `skip_unused_columns` as a new table-level SQL property that can be set either directly on the table or through connector configuration
- Modified the `TableMetadata` class to include the `skip_unused_columns` flag, ensuring it's included in JSON serialization so it affects the hash computation used for pipeline diffs
- Added validation and test coverage for the new property
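A minimal sketch of why serializing the flag matters for pipeline diffs, assuming the diff compares a hash of the serialized metadata. The class `TableMetadataSketch`, the JSON key spelling, and the choice of SHA-256 are illustrative assumptions, not the actual `TableMetadata` implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical stand-in for table metadata; not the real TableMetadata class.
class TableMetadataSketch {
    final String name;
    final boolean skipUnusedColumns;

    public TableMetadataSketch(String name, boolean skipUnusedColumns) {
        this.name = name;
        this.skipUnusedColumns = skipUnusedColumns;
    }

    // Emit the flag only when it is true, so metadata of tables that
    // never set the property serializes exactly as before.
    public String toJson() {
        StringBuilder sb = new StringBuilder("{\"name\":\"").append(name).append("\"");
        if (skipUnusedColumns) {
            sb.append(",\"skip_unused_columns\":true"); // key name is an assumption
        }
        return sb.append("}").toString();
    }

    // SHA-256 is an assumed hash; the real diff may use another one.
    public static String sha256Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Under these assumptions, flipping the property changes the serialized form and therefore the hash, while tables that leave it unset keep their previous hash.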
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| TableMetadata.java | Added `skipUnusedColumns` field to table metadata, updated constructor, and added JSON serialization/deserialization support (only emitted when true) |
| CreateTableStatement.java | Defined constants for predefined property names and added `skipUnusedColumns()` method to read the property value |
| CreateViewStatement.java | Extracted property name constants to public fields for reusability |
| CalciteToDBSPCompiler.java | Enhanced table compilation logic to read `skip_unused_columns` from both table properties and connector configuration, applying OR logic when both are present |
| SqlToRelCompiler.java | Added validation for the `skip_unused_columns` property as a boolean value and replaced string literals with constants |
| UnusedFields.java | Updated metadata propagation to preserve the `skipUnusedColumns` flag when trimming unused fields |
| ExpandMetadataCasts.java | Updated metadata propagation to preserve the `skipUnusedColumns` flag when expanding casts |
| CalciteTableDescription.java | Replaced string literal with constant for consistency |
| MetadataTests.java | Added tests for validation and functionality of the new property, replaced string literals with constants |
| grammar.md | Added documentation explaining the `skip_unused_columns` property and its use cases |
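The boolean validation mentioned for SqlToRelCompiler.java could look roughly like the sketch below. The class and method names, the accepted spellings, the case-insensitive matching, the default of `false` for an unset property, and the exception type are all assumptions for illustration; the real compiler presumably reports a compilation error instead.

```java
import java.util.Locale;

// Hypothetical helper; not part of the actual compiler sources.
class BooleanPropertySketch {
    // Returns the parsed value, or false when the property is unset
    // (assumed default). Rejects anything other than true/false.
    public static boolean parseBooleanProperty(String name, String value) {
        if (value == null) {
            return false;
        }
        String normalized = value.trim().toLowerCase(Locale.ROOT);
        if (normalized.equals("true")) {
            return true;
        }
        if (normalized.equals("false")) {
            return false;
        }
        throw new IllegalArgumentException(
            "Property '" + name + "' must be 'true' or 'false', got '" + value + "'");
    }
}
```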
...bsp-compiler/SQL-compiler/src/test/java/org/dbsp/sqlCompiler/compiler/sql/MetadataTests.java
    }

    public boolean skipUnusedColumns() {
        String mat = this.getPropertyValue(SKIP_UNUSED_COLUMNS);
For the delta connector (the only one that has it), this property is under transport.config. Example:
'connectors' = '[{
"transport": {
"name": "delta_table_input",
"config": {
"uri": "s3://feldera-fraud-detection-data/transaction_train",
"mode": "cdc",
"cdc_order_by": "trans_num",
"aws_skip_signature": "true",
"timestamp_column": "trans_date_trans_time",
"aws_region": "us-east-1",
"version": 0,
"verbose": 1,
"skip_unused_columns": true
}
}
}
]');
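To illustrate the nesting described in this comment, here is a minimal sketch that looks the flag up under `transport.config` in an already-parsed connector object. Representing the parsed JSON as nested `Map`s and the names `ConnectorConfigSketch` and `connectorSkipsUnusedColumns` are assumptions; the compiler's actual JSON handling may differ.

```java
import java.util.Map;

// Hypothetical helper operating on a connector parsed into nested maps.
class ConnectorConfigSketch {
    // Returns true only if transport.config.skip_unused_columns is the
    // boolean true; a missing level or any other value yields false.
    public static boolean connectorSkipsUnusedColumns(Map<String, Object> connector) {
        Object transport = connector.get("transport");
        if (!(transport instanceof Map)) {
            return false;
        }
        Object config = ((Map<?, ?>) transport).get("config");
        if (!(config instanceof Map)) {
            return false;
        }
        Object flag = ((Map<?, ?>) config).get("skip_unused_columns");
        return Boolean.TRUE.equals(flag);
    }
}
```

A flag placed at the top level of the connector object, rather than under `transport.config`, is ignored by this lookup.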
docs.feldera.com/docs/sql/grammar.md
only consumer of the data. In some circumstances, the views defined
may not need all the columns present in the data sources. This
annotation instructs the connectors to avoid ingesting columns that
are currently not used by the pipeline, if the storage layer makes it
This has nothing to do with the storage layer; it's a feature of the connector.
6a7f72c to 2537de7
Signed-off-by: Mihai Budiu <[email protected]>
The Delta connector `skip_unused_columns` attribute introduces unhealthy coupling between connector and table definitions, where the table definition can implicitly change when a connector is added, removed, or modified.

We make `skip_unused_columns` a table-level property instead. It advises connectors that they don't have to ingest unused columns if they support this optimization.

We partially keep backward compatibility by computing `skip_unused_columns` as a logical OR of per-connector and per-table attributes:

* Existing tables that don't have Delta connectors with `skip_unused_columns` set to true are not affected.
* Existing tables that have Delta connectors with `skip_unused_columns` set to true will behave as if the table definition has changed when the program is recompiled with the new runtime version. These tables will be backfilled from scratch.

Signed-off-by: Leonid Ryzhyk <[email protected]>
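The backward-compatibility rule in this commit message can be sketched as follows. The class `EffectiveSkipSketch` and its method are hypothetical names, and passing the per-connector flags as a list of booleans is an assumption about how the values are collected.

```java
import java.util.List;

// Hypothetical illustration of the logical OR described above.
class EffectiveSkipSketch {
    // The table skips unused columns if the table-level property is set
    // or if any attached connector sets its own flag to true.
    public static boolean effectiveSkipUnusedColumns(boolean tableProperty,
                                                     List<Boolean> connectorFlags) {
        boolean anyConnector = connectorFlags.stream().anyMatch(Boolean.TRUE::equals);
        return tableProperty || anyConnector;
    }
}
```

With this rule, a table without the property and without any connector flag evaluates to `false`, matching the first bullet, while a table whose Delta connector sets the flag evaluates to `true` even though its own definition never mentions the property, matching the second.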
Part of #5436