[SQL] Option to discover correlated table columns #5455
Pull request overview
This PR adds a new --correlatedColumns compiler flag that performs lineage analysis to discover table columns that are directly compared in equijoin operations. The feature outputs correlated column sets in the format [Correlated:] [table0.column1, table2.column0, ...].
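To illustrate what a "correlated column set" means here, the following is a minimal sketch and not the PR's `Lineage` visitor. It assumes the equijoin comparisons have already been extracted as pairs of `table.column` names, and merges them into sets with a union-find. The class `CorrelatedColumnsSketch` and its methods are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical helper (not from the PR): groups columns linked by equality comparisons. */
public class CorrelatedColumnsSketch {
    // Union-find forest: each column name maps to its parent; a root maps to itself.
    private final Map<String, String> parent = new HashMap<>();

    /** Returns the representative of the set containing the column, registering it if new. */
    private String find(String column) {
        parent.putIfAbsent(column, column);
        String root = column;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        // Path compression: point every node on the walked path directly at the root.
        String current = column;
        while (!current.equals(root)) {
            String next = parent.get(current);
            parent.put(current, root);
            current = next;
        }
        return root;
    }

    /** Records that two columns are compared for equality, e.g. in a join condition. */
    public void addEquality(String left, String right) {
        String a = find(left);
        String b = find(right);
        if (!a.equals(b)) {
            parent.put(a, b);
        }
    }

    /** Returns every set with at least two columns, rendered like "[t0.a, t1.b]", sorted. */
    public List<String> groups() {
        Map<String, TreeSet<String>> byRoot = new HashMap<>();
        // Iterate over a copy of the keys because find() updates the map while compressing paths.
        for (String column : new ArrayList<>(parent.keySet())) {
            byRoot.computeIfAbsent(find(column), k -> new TreeSet<>()).add(column);
        }
        List<String> result = new ArrayList<>();
        for (TreeSet<String> group : byRoot.values()) {
            if (group.size() > 1) {
                result.add(group.toString());
            }
        }
        Collections.sort(result);
        return result;
    }
}
```

With this sketch, equalities that share a column collapse into one set, so a chain of joins on `t0.a = t1.b` and `t1.b = t2.c` yields a single three-column group. The actual analysis in the PR works on the compiler's dataflow graph and may group or report columns differently.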
Changes:
- Added new `Lineage` visitor class (~846 lines) implementing dataflow lineage analysis
- Added `--correlatedColumns` command-line flag to enable the feature
- Added test coverage for both inner lineage analysis and the end-to-end flag functionality
- Updated documentation to describe the new option
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| Lineage.java | New visitor implementing lineage analysis through dataflow graph traversal and symbolic interpretation |
| LineageTests.java | New test file with unit tests for inner lineage analysis and integration test for the correlatedColumns flag |
| CompilerOptions.java | Added correlatedColumns boolean option and reorganized toString output alphabetically |
| CircuitOptimizer.java | Integrated Lineage visitor into optimization pipeline when flag is enabled |
| using.md | Added documentation for the new --correlatedColumns flag with usage examples |
| MetadataTests.java | Updated help message test to include new flag |
| DBSPAssignmentExpression.java | Added clarifying documentation comment |
| ToDotNodesVisitor.java | Refactored join operator coloring to use DBSPJoinBaseOperator check |
Signed-off-by: Mihai Budiu <[email protected]>
swanandx left a comment:
Thank you so much! 🚀
Fixes #5441
@swanandx please try this and approve if the output is what you need.
See the file using.md for some brief documentation.