[dbsp] TopN and BottomN GC. #5460
Compiler changes required: GC for
I have implemented MIN, MAX, ARG_MIN, and ARG_MAX for now; I plan to leave the others for a separate PR.
Other features still to be built are also tracked in #1850.
Pull request overview
This PR implements two new garbage collection (GC) strategies for DBSP: Top-N and Bottom-N. These strategies enable efficient memory management for aggregate operations like MAX, MIN, and top-k/bottom-k transformers by retaining only the N largest or smallest values below the waterline, without assuming that values are sorted by timestamp.
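The retention rule described above can be sketched for a single group (all values of one key). This is a minimal illustration under stated assumptions, not the DBSP `GroupFilter` or cursor code: `Retain` and `retain_group` are hypothetical names, each value is assumed to carry its own timestamp as a `(value, timestamp)` pair, and entries at or above the waterline are assumed to be kept unconditionally.

```rust
/// Hypothetical retention strategies for values below the waterline.
/// Illustrative only; not the actual DBSP `GroupFilter` API.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Retain {
    TopN(usize),
    BottomN(usize),
}

/// Sketch: filter one group's `(value, timestamp)` entries.
/// Entries at or above the waterline are always kept (assumption);
/// among entries below it, only the N largest (TopN) or N smallest
/// (BottomN) values survive. Values need not be sorted by timestamp.
pub fn retain_group<V: Ord + Clone, T: Ord>(
    group: &[(V, T)],
    waterline: &T,
    strategy: Retain,
) -> Vec<V> {
    // Entries that may still change: keep all of them.
    let mut kept: Vec<V> = group
        .iter()
        .filter(|(_, ts)| ts >= waterline)
        .map(|(v, _)| v.clone())
        .collect();

    // Entries below the waterline, ordered by value rather than by timestamp.
    let mut below: Vec<V> = group
        .iter()
        .filter(|(_, ts)| ts < waterline)
        .map(|(v, _)| v.clone())
        .collect();
    below.sort();

    // Keep only the N extreme values from the below-waterline part.
    let extra: Vec<V> = match strategy {
        Retain::TopN(n) => below.iter().rev().take(n).cloned().collect(),
        Retain::BottomN(n) => below.iter().take(n).cloned().collect(),
    };
    kept.extend(extra);

    // Sort only so the result is deterministic for display.
    kept.sort();
    kept
}

fn main() {
    // (value, timestamp) pairs; values are deliberately not sorted by timestamp.
    let group = vec![(7, 1), (3, 2), (9, 3), (5, 10), (1, 4)];
    // A MAX aggregate only needs the largest value below the waterline.
    println!("{:?}", retain_group(&group, &5, Retain::TopN(1)));
    // A MIN aggregate only needs the smallest value below the waterline.
    println!("{:?}", retain_group(&group, &5, Retain::BottomN(1)));
}
```

With a waterline of 5, `TopN(1)` keeps `[5, 9]` and `BottomN(1)` keeps `[1, 5]`: the value with timestamp 10 survives regardless, plus the single extreme value below the waterline. Sorting the below-waterline part by value, not by timestamp, is what separates this from a Last-N filter.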
Changes:
- Renamed `DBSPIntegrateTraceRetainValuesLastNOperator` to `DBSPIntegrateTraceRetainNValuesOperator` to support multiple retention strategies (LastN, TopN, BottomN)
- Added new `GroupFilter` variants (`TopN` and `BottomN`) with corresponding cursor implementations
- Integrated the new GC strategies into the monotonicity analysis pipeline to automatically apply them for MIN/MAX aggregates
- Added comprehensive test coverage, including property-based tests
Reviewed changes
Copilot reviewed 31 out of 31 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| sql-to-dbsp-compiler/SQL-compiler/src/test/java/org/dbsp/sqlCompiler/compiler/sql/tools/CompilerCircuitStream.java | Added overloaded `step` method accepting `Change` objects directly |
| sql-to-dbsp-compiler/SQL-compiler/src/test/java/org/dbsp/sqlCompiler/compiler/sql/tools/Change.java | Updated comment to clarify the shuffle parameter |
| sql-to-dbsp-compiler/SQL-compiler/src/test/java/org/dbsp/sqlCompiler/compiler/sql/tools/BaseSQLTests.java | Changed visibility of `currentTestInformation` to protected |
| sql-to-dbsp-compiler/SQL-compiler/src/test/java/org/dbsp/sqlCompiler/compiler/sql/streaming/StreamingTests.java | Updated references to renamed operator class |
| sql-to-dbsp-compiler/SQL-compiler/src/test/java/org/dbsp/sqlCompiler/compiler/sql/streaming/LatenessTests.java | Added comprehensive test suite for MIN/MAX with lateness using new GC strategies |
| sql-to-dbsp-compiler/SQL-compiler/src/test/java/org/dbsp/sqlCompiler/compiler/sql/simple/IncrementalRegressionTests.java | Updated references to renamed operator class |
| sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/ir/expression/DBSPZSetExpression.java | Added `@CheckReturnValue` annotations and improved multi-line formatting |
| sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/ir/expression/DBSPExpression.java | Added `@CheckReturnValue` annotation to `add` method |
| sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/visitors/outer/recursive/ValidateRecursiveOperators.java | Updated references to renamed operator class |
| sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/visitors/outer/monotonicity/InsertLimiters.java | Added logic to determine which GC strategy to use based on aggregate type and integrated TopN/BottomN for MIN/MAX |
| sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/visitors/outer/monotonicity/CheckRetain.java | Updated references to renamed operator class |
| sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/visitors/outer/StrayGC.java | Updated references to renamed operator class |
| sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/visitors/outer/LowerAsof.java | Updated references and comments for renamed operator class |
| sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/visitors/outer/FindDeadCode.java | Updated references to renamed operator class |
| sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/visitors/outer/CircuitVisitor.java | Updated method signatures for renamed operator class |
| sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/visitors/outer/CircuitRewriter.java | Updated method signatures and field references for renamed operator class |
| sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/visitors/outer/CircuitCloneVisitor.java | Updated method signatures for renamed operator class |
| sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/visitors/monotone/PartiallyMonotoneTuple.java | Changed `mayBeNull` field visibility to public |
| sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/frontend/calciteCompiler/optimizer/CalciteOptimizer.java | Added Calcite optimization rules for single values |
| sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/backend/rust/ToRustVisitor.java | Updated references to renamed operator class |
| sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/backend/dot/ToDotNodesVisitor.java | Simplified color logic for retain operators |
| sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/backend/ToJsonOuterVisitor.java | Updated serialization to use the `WhichN` enum instead of a boolean flag |
| sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/circuit/operator/DBSPIntegrateTraceRetainValuesLastNOperator.java | Renamed class to `DBSPIntegrateTraceRetainNValuesOperator` and added `WhichN` enum to support multiple strategies |
| crates/dbsp/src/trace/test/test_batch.rs | Added test implementations for TopN and BottomN filters |
| crates/dbsp/src/trace/filter.rs | Added TopN and BottomN variants to `GroupFilter` enum |
| crates/dbsp/src/trace/cursor.rs | Implemented cursor logic for TopN and BottomN filters |
| crates/dbsp/src/operator/trace.rs | Added public API for top-n trace retention |
| crates/dbsp/src/operator/dynamic/trace.rs | Added dynamic API and tests for top-n trace retention |
| crates/dbsp/src/operator/dynamic/aggregate.rs | Added property-based tests for MAX with retention and fixed a typo |
| crates/dbsp/src/operator/dynamic/accumulate_trace.rs | Added dynamic APIs for top-n and bottom-n accumulate trace retention |
| crates/dbsp/src/operator/accumulate_trace.rs | Added public APIs for top-n and bottom-n accumulate trace retention |
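The strategy selection summarized in the `InsertLimiters.java` row can be sketched as a mapping from aggregate kind to a `(strategy, N)` pair. This is a hypothetical Rust illustration, not the compiler's Java code: `Aggregate`, `WhichN`, and `gc_strategy` are illustrative names (`WhichN` only mirrors the name of the enum added in this PR), and choosing TopN for ARG_MAX, BottomN for ARG_MIN, and N = 1 as the default are assumptions. The N = 2 for MinSome follows the commit message.

```rust
/// Hypothetical mirror of the retention strategies named in this PR.
#[allow(dead_code)] // `LastN` is listed for completeness but not selected below.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum WhichN {
    LastN,
    TopN,
    BottomN,
}

/// Hypothetical aggregate kinds; names are illustrative only.
#[derive(Clone, Copy, Debug)]
enum Aggregate {
    Max,
    ArgMax,
    Min,
    ArgMin,
    MinSome,
}

/// Pick a retention strategy and the number N of values to keep
/// below the waterline for a given aggregate (assumed mapping).
fn gc_strategy(agg: Aggregate) -> (WhichN, usize) {
    match agg {
        // Only the largest value below the waterline can affect the result.
        Aggregate::Max | Aggregate::ArgMax => (WhichN::TopN, 1),
        // Only the smallest value below the waterline can affect the result.
        Aggregate::Min | Aggregate::ArgMin => (WhichN::BottomN, 1),
        // One extra slot in case the smallest retained value is None (NULL).
        Aggregate::MinSome => (WhichN::BottomN, 2),
    }
}

fn main() {
    let aggs = [
        Aggregate::Max,
        Aggregate::ArgMax,
        Aggregate::Min,
        Aggregate::ArgMin,
        Aggregate::MinSome,
    ];
    for agg in aggs {
        let (which, n) = gc_strategy(agg);
        println!("{agg:?}: {which:?}, N = {n}");
    }
}
```

Returning N alongside the strategy keeps the special case for MinSome local to one match arm instead of patching the limit after the strategy is chosen.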
.../src/main/java/org/dbsp/sqlCompiler/compiler/visitors/outer/monotonicity/InsertLimiters.java
```java
}
case ArgMinSome -> DBSPIntegrateTraceRetainNValuesOperator.WhichN.BottomN;
case MinSome1 -> {
    limit = 2;
```
A comment explaining why this needs to be 2 may be helpful.
I thought there was some explanation somewhere in DBSP, but I could not find it.
Branch updated from 8eb1e53 to 5d4ec58.
We implement two new GC strategies:
- Top-N: Keep the N largest values below the waterline. Unlike Last-N, this strategy doesn't assume that values are sorted by timestamp. It can be used to GC the Max aggregate and the top-k group transformer.
- Bottom-N: Keep the N smallest values below the waterline. It doesn't assume that values are sorted by timestamp either. It can be used to implement bottom-k and Min. It can also be used to implement MinSome, but this requires setting N to 2 to account for the potential None value.

Signed-off-by: Leonid Ryzhyk <[email protected]>
Signed-off-by: Mihai Budiu <[email protected]>
Partially addresses #1975
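Why MinSome needs N = 2 can be shown with a small example. This sketch assumes that `None` sorts before every `Some(_)` (as with Rust's derived ordering for `Option`), that MinSome ignores `None` inputs, and that a group holds distinct values, so at most one `None` is present. The waterline is omitted for brevity; `bottom_n` and `min_some` are hypothetical helpers, not DBSP APIs.

```rust
use std::collections::BTreeSet;

/// Keep the `n` smallest distinct values of a group (a Bottom-N filter).
fn bottom_n<V: Ord + Clone>(values: &BTreeSet<V>, n: usize) -> Vec<V> {
    // BTreeSet iterates in ascending order, so the first n are the smallest.
    values.iter().take(n).cloned().collect()
}

/// MIN over non-None values only (assumed MinSome semantics).
fn min_some<T: Ord + Clone>(values: &[Option<T>]) -> Option<T> {
    // `flatten` skips `None` and yields references to the inner values.
    values.iter().flatten().min().cloned()
}

fn main() {
    // `None` orders before every `Some(_)`, so it is always the smallest value.
    let group: BTreeSet<Option<i32>> =
        [None, Some(4), Some(2), Some(8)].into_iter().collect();

    let kept1 = bottom_n(&group, 1);
    let kept2 = bottom_n(&group, 2);
    println!("N = 1 keeps {:?}, MinSome = {:?}", kept1, min_some(&kept1));
    println!("N = 2 keeps {:?}, MinSome = {:?}", kept2, min_some(&kept2));
}
```

With N = 1 the filter retains only `None`, so MinSome would find no value; with N = 2 it retains `[None, Some(2)]` and MinSome returns `Some(2)`. Since at most one `None` can occupy a slot, one extra slot is enough.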
Requires compiler changes.