@mihaibudiu mihaibudiu commented Jan 18, 2026

Part of #5436

Copilot AI review requested due to automatic review settings January 18, 2026 20:22
Copilot AI left a comment

Pull request overview

This PR adds support for a per-table SQL property skip_unused_columns to address issue #5436, which required taking the skip_unused_fields setting into account when computing pipeline diffs.

Changes:

  • Added skip_unused_columns as a new table-level SQL property that can be set either directly on the table or through connector configuration
  • Modified TableMetadata class to include the skip_unused_columns flag, ensuring it's included in JSON serialization so it affects the hash computation used for pipeline diffs
  • Added validation and test coverage for the new property
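
The effect of emitting the flag only when it is true can be illustrated with a minimal sketch. The class `TableMetadataSketch`, its hand-written JSON, and the SHA-256 digest are all assumptions made for illustration; the real `TableMetadata` class and the hash used for pipeline diffs may differ. The sketch shows that a table without the flag serializes exactly as before, so its hash is unchanged, while setting the flag changes the serialized form and therefore the hash.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical stand-in for table metadata; not the project's TableMetadata class.
final class TableMetadataSketch {
    private final String name;
    private final boolean skipUnusedColumns;

    TableMetadataSketch(String name, boolean skipUnusedColumns) {
        this.name = name;
        this.skipUnusedColumns = skipUnusedColumns;
    }

    // Serialize to JSON, emitting skip_unused_columns only when it is true,
    // so metadata for tables that never set the flag is byte-for-byte unchanged.
    String toJson() {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"name\":\"").append(name).append("\"");
        if (skipUnusedColumns) {
            sb.append(",\"skip_unused_columns\":true");
        }
        sb.append("}");
        return sb.toString();
    }

    // Hash the serialized form; SHA-256 is an assumption made for this sketch.
    String hash() {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(toJson().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }
}
```

Under these assumptions, `new TableMetadataSketch("t", false)` and `new TableMetadataSketch("t", true)` produce different hashes, which is what lets a diff notice that the flag was toggled.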

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| TableMetadata.java | Added skipUnusedColumns field to table metadata, updated constructor, and added JSON serialization/deserialization support (only emitted when true) |
| CreateTableStatement.java | Defined constants for predefined property names and added skipUnusedColumns() method to read the property value |
| CreateViewStatement.java | Extracted property name constants to public fields for reusability |
| CalciteToDBSPCompiler.java | Enhanced table compilation logic to read skip_unused_columns from both table properties and connector configuration, applying OR logic when both are present |
| SqlToRelCompiler.java | Added validation for the skip_unused_columns property as a boolean value and replaced string literals with constants |
| UnusedFields.java | Updated metadata propagation to preserve the skipUnusedColumns flag when trimming unused fields |
| ExpandMetadataCasts.java | Updated metadata propagation to preserve the skipUnusedColumns flag when expanding casts |
| CalciteTableDescription.java | Replaced string literal with constant for consistency |
| MetadataTests.java | Added tests for validation and functionality of the new property, replaced string literals with constants |
| grammar.md | Added documentation explaining the skip_unused_columns property and its use cases |

}

public boolean skipUnusedColumns() {
    String mat = this.getPropertyValue(SKIP_UNUSED_COLUMNS);

For the delta connector (the only one that has it), this property is under transport.config. Example:

  'connectors' = '[{
    "transport": {
      "name": "delta_table_input",
      "config": {
        "uri": "s3://feldera-fraud-detection-data/transaction_train",
        "mode": "cdc",
        "cdc_order_by": "trans_num",
        "aws_skip_signature": "true",
        "timestamp_column": "trans_date_trans_time",
        "aws_region": "us-east-1",
        "version": 0,
        "verbose": 1,
        "skip_unused_columns": true
      }
    }
  }
]');

only consumer of the data. In some circumstances, the views defined
may not need all the columns present in the data sources. This
annotation instructs the connectors to avoid ingesting columns that
are currently not used by the pipeline, if the storage layer makes it

This has nothing to do with the storage layer; it's a feature of the connector.

@ryzhyk ryzhyk marked this pull request as draft January 18, 2026 22:19
@mihaibudiu mihaibudiu force-pushed the issue5436 branch 2 times, most recently from 6a7f72c to 2537de7 Compare January 19, 2026 06:27
mihaibudiu and others added 2 commits January 19, 2026 07:59
The Delta connector `skip_unused_columns` attribute introduces unhealthy
coupling between connector and table definitions, where the table definition
can implicitly change when a connector is added, removed, or modified.

We make skip_unused_columns a table-level property instead. It advises
connectors that they don't have to ingest unused columns if they support this
optimization.

We partially keep backward compatibility by computing `skip_unused_columns` as
a logical OR of per-connector and per-table attributes:

* Existing tables that don't have Delta connectors with `skip_unused_columns`
  set to true are not affected.
* Existing tables that have Delta connectors with `skip_unused_columns` will
  behave as if the table definition has changed when the program is recompiled
  with the new runtime version. These tables will be backfilled from scratch.

Signed-off-by: Leonid Ryzhyk <[email protected]>
@ryzhyk ryzhyk marked this pull request as ready for review January 19, 2026 20:43
@ryzhyk ryzhyk enabled auto-merge January 19, 2026 20:45
@ryzhyk ryzhyk added this pull request to the merge queue Jan 19, 2026
Merged via the queue into main with commit 576dd12 Jan 19, 2026
1 check passed
@ryzhyk ryzhyk deleted the issue5436 branch January 19, 2026 21:57