
@ryzhyk (Contributor) commented Jan 19, 2026

Partially addresses #1975

We implement two new GC strategies:

  • Top-N: Keep the N largest values below the waterline. Unlike Last-N, this strategy doesn't assume that values are sorted by timestamp. Can be used to GC the Max aggregate and the top-k group transformer.
  • Bottom-N: Keep the N smallest values below the waterline. Doesn't assume that values are sorted by timestamp. Can be used to implement bottom-k and Min. Can also be used to implement MinSome, but this requires setting N to 2 to account for the potential None value.
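A minimal sketch of the Top-N and Bottom-N semantics described above, in Rust (the language of the DBSP core), under simplifying assumptions: one group is modeled as a plain slice of hypothetical `(value, timestamp)` pairs, and the waterline is compared against the timestamp. `Strategy` and `retain_n` are illustrative names, not part of the DBSP API; the real implementation works on trace cursors and will differ.

```rust
/// Hypothetical selector for the two strategies (not the DBSP `GroupFilter` type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Strategy {
    TopN,
    BottomN,
}

/// Sketch of waterline-based GC for one group of `(value, timestamp)` pairs.
/// Entries at or above the waterline are always kept. Among entries below the
/// waterline, only the `n` largest (TopN) or smallest (BottomN) values are kept;
/// their timestamps are NOT assumed to be sorted.
fn retain_n<V: Ord + Clone>(
    group: &[(V, u64)],
    waterline: u64,
    n: usize,
    strategy: Strategy,
) -> Vec<(V, u64)> {
    // Split into entries below the waterline and entries at or above it.
    let (mut below, above): (Vec<(V, u64)>, Vec<(V, u64)>) =
        group.iter().cloned().partition(|(_, ts)| *ts < waterline);
    // Order the below-waterline entries by value only (stable sort).
    below.sort_by(|a, b| a.0.cmp(&b.0));
    let mut kept: Vec<(V, u64)> = match strategy {
        // Smallest values come first after sorting.
        Strategy::BottomN => below.into_iter().take(n).collect(),
        // Largest values come last after sorting.
        Strategy::TopN => {
            let skip = below.len().saturating_sub(n);
            below.into_iter().skip(skip).collect()
        }
    };
    kept.extend(above);
    kept
}

fn main() {
    // One group; timestamps are deliberately not in value order.
    let group = [(7, 3), (2, 1), (9, 2), (5, 12), (4, 4)];
    // With waterline 10, only (5, 12) is above it.
    println!("{:?}", retain_n(&group, 10, 1, Strategy::TopN)); // [(9, 2), (5, 12)]
    println!("{:?}", retain_n(&group, 10, 2, Strategy::BottomN)); // [(2, 1), (4, 4), (5, 12)]
}
```

Assuming updates below the waterline are no longer accepted, keeping the single largest old value (N = 1) is enough for MAX, while entries at or above the waterline must stay because they may still change.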

Requires compiler changes.

@ryzhyk added the DBSP core label (Related to the core DBSP library) Jan 19, 2026
@mihaibudiu (Contributor) commented Jan 19, 2026

Compiler changes required: GC for

  • MIN
  • MAX
  • TopK
  • ARG_MIN
  • ARG_MAX
  • LAG?
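The pairing between the aggregates listed above and the two strategies can be summarized as a lookup. This is a hedged sketch assembled from the PR description (Top-N for MAX and top-k; Bottom-N for MIN, bottom-k, and MinSome with N = 2) and from the `ArgMinSome -> BottomN` case in the compiler excerpt. Mapping ARG_MAX to Top-N is assumed by symmetry, N = 1 for ARG_MIN/ARG_MAX is an assumption, LAG is left out, and the `Aggregate`, `Strategy`, and `gc_policy` names are hypothetical.

```rust
/// Hypothetical aggregate kinds, named after the list above (not compiler types).
#[derive(Clone, Copy, Debug)]
enum Aggregate {
    Max,
    Min,
    MinSome,
    ArgMax,
    ArgMin,
    TopK(usize),
    BottomK(usize),
}

/// Hypothetical retention strategy selector.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Strategy {
    TopN,
    BottomN,
}

/// Returns the strategy and the N assumed sufficient to compute each aggregate after GC.
fn gc_policy(agg: Aggregate) -> (Strategy, usize) {
    match agg {
        Aggregate::Max => (Strategy::TopN, 1),
        // Assumed by symmetry with ARG_MIN; not stated in the PR.
        Aggregate::ArgMax => (Strategy::TopN, 1),
        Aggregate::Min | Aggregate::ArgMin => (Strategy::BottomN, 1),
        // One of the two retained slots may be occupied by the None value.
        Aggregate::MinSome => (Strategy::BottomN, 2),
        Aggregate::TopK(k) => (Strategy::TopN, k),
        Aggregate::BottomK(k) => (Strategy::BottomN, k),
    }
}

fn main() {
    let aggs = [
        Aggregate::Max,
        Aggregate::Min,
        Aggregate::MinSome,
        Aggregate::ArgMax,
        Aggregate::ArgMin,
        Aggregate::TopK(3),
        Aggregate::BottomK(3),
    ];
    for agg in aggs {
        println!("{:?} -> {:?}", agg, gc_policy(agg));
    }
}
```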

@gz added the marketing label (Relevant for marketing content) Jan 20, 2026
@mihaibudiu added the SQL compiler label (Related to the SQL compiler) Jan 21, 2026
@mihaibudiu marked this pull request as ready for review January 21, 2026 23:48
Copilot AI review requested due to automatic review settings January 21, 2026 23:48
@mihaibudiu (Contributor) commented Jan 21, 2026

I have implemented MIN, MAX, ARG_MIN, and ARG_MAX for now; I am thinking of leaving the others for a separate PR.

@mihaibudiu (Contributor) commented:

Other features to build are also tracked in #1850.

Copilot AI left a comment

Pull request overview

This PR implements two new garbage collection (GC) strategies for DBSP: Top-N and Bottom-N. These strategies enable efficient memory management for aggregate operations like MAX, MIN, and top-k/bottom-k transformers by retaining only the N largest or smallest values below the waterline, without assuming that values are sorted by timestamp.

Changes:

  • Renamed DBSPIntegrateTraceRetainValuesLastNOperator to DBSPIntegrateTraceRetainNValuesOperator to support multiple retention strategies (LastN, TopN, BottomN)
  • Added new GroupFilter variants (TopN and BottomN) with corresponding cursor implementations
  • Integrated new GC strategies into the monotonicity analysis pipeline to automatically apply them for MIN/MAX aggregates
  • Added comprehensive test coverage including property-based tests
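The overview mentions cursor implementations for the new `GroupFilter` variants. One possible way to apply such a filter to a group whose values are visited in ascending value order is sketched below: a first pass counts the below-waterline values, and a second pass marks which ones survive. The ascending-order assumption, the `(value, timestamp)` layout, and the `keep_mask` helper are all hypothetical; this is not the code in `cursor.rs`.

```rust
/// For each position of a group's values (sorted ascending by value), returns
/// whether the entry survives GC. `top == true` means Top-N, otherwise Bottom-N.
fn keep_mask<V: Ord>(sorted: &[(V, u64)], waterline: u64, n: usize, top: bool) -> Vec<bool> {
    debug_assert!(sorted.windows(2).all(|w| w[0].0 <= w[1].0));
    // First pass: how many entries are below the waterline.
    let below_total = sorted.iter().filter(|(_, ts)| *ts < waterline).count();
    // Half-open range [lo, hi) of ascending ranks among below-waterline entries to keep.
    let (lo, hi) = if top {
        (below_total.saturating_sub(n), below_total)
    } else {
        (0, n.min(below_total))
    };
    // Second pass: entries at or above the waterline are always kept.
    let mut rank = 0;
    sorted
        .iter()
        .map(|(_, ts)| {
            if *ts >= waterline {
                true
            } else {
                let keep = rank >= lo && rank < hi;
                rank += 1;
                keep
            }
        })
        .collect()
}

fn main() {
    // Values ascending; timestamps unordered. Only (5, 12) is above waterline 10.
    let group = [(2, 1), (4, 4), (5, 12), (7, 3), (9, 2)];
    println!("{:?}", keep_mask(&group, 10, 1, true)); // [false, false, true, false, true]
    println!("{:?}", keep_mask(&group, 10, 2, false)); // [true, true, true, false, false]
}
```

Bottom-N could in principle stop tracking ranks after the first N below-waterline entries, whereas Top-N needs the total count (or a buffer of N entries), because the values it keeps come last in ascending order.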

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 2 comments.

Summary per file (file path, then description):
sql-to-dbsp-compiler/SQL-compiler/src/test/java/org/dbsp/sqlCompiler/compiler/sql/tools/CompilerCircuitStream.java Added overloaded step method accepting Change objects directly
sql-to-dbsp-compiler/SQL-compiler/src/test/java/org/dbsp/sqlCompiler/compiler/sql/tools/Change.java Updated comment to clarify shuffle parameter
sql-to-dbsp-compiler/SQL-compiler/src/test/java/org/dbsp/sqlCompiler/compiler/sql/tools/BaseSQLTests.java Changed visibility of currentTestInformation to protected
sql-to-dbsp-compiler/SQL-compiler/src/test/java/org/dbsp/sqlCompiler/compiler/sql/streaming/StreamingTests.java Updated references to renamed operator class
sql-to-dbsp-compiler/SQL-compiler/src/test/java/org/dbsp/sqlCompiler/compiler/sql/streaming/LatenessTests.java Added comprehensive test suite for MIN/MAX with lateness using new GC strategies
sql-to-dbsp-compiler/SQL-compiler/src/test/java/org/dbsp/sqlCompiler/compiler/sql/simple/IncrementalRegressionTests.java Updated references to renamed operator class
sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/ir/expression/DBSPZSetExpression.java Added @CheckReturnValue annotations and improved multi-line formatting
sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/ir/expression/DBSPExpression.java Added @CheckReturnValue annotation to add method
sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/visitors/outer/recursive/ValidateRecursiveOperators.java Updated references to renamed operator class
sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/visitors/outer/monotonicity/InsertLimiters.java Added logic to determine which GC strategy to use based on aggregate type and integrated TopN/BottomN for MIN/MAX
sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/visitors/outer/monotonicity/CheckRetain.java Updated references to renamed operator class
sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/visitors/outer/StrayGC.java Updated references to renamed operator class
sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/visitors/outer/LowerAsof.java Updated references and comments for renamed operator class
sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/visitors/outer/FindDeadCode.java Updated references to renamed operator class
sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/visitors/outer/CircuitVisitor.java Updated method signatures for renamed operator class
sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/visitors/outer/CircuitRewriter.java Updated method signatures and field references for renamed operator class
sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/visitors/outer/CircuitCloneVisitor.java Updated method signatures for renamed operator class
sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/visitors/monotone/PartiallyMonotoneTuple.java Changed mayBeNull field visibility to public
sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/frontend/calciteCompiler/optimizer/CalciteOptimizer.java Added Calcite optimization rules for single values
sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/backend/rust/ToRustVisitor.java Updated references to renamed operator class
sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/backend/dot/ToDotNodesVisitor.java Simplified color logic for retain operators
sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/compiler/backend/ToJsonOuterVisitor.java Updated serialization to use which enum instead of boolean flag
sql-to-dbsp-compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/circuit/operator/DBSPIntegrateTraceRetainValuesLastNOperator.java Renamed class to DBSPIntegrateTraceRetainNValuesOperator and added WhichN enum to support multiple strategies
crates/dbsp/src/trace/test/test_batch.rs Added test implementations for TopN and BottomN filters
crates/dbsp/src/trace/filter.rs Added TopN and BottomN variants to GroupFilter enum
crates/dbsp/src/trace/cursor.rs Implemented cursor logic for TopN and BottomN filters
crates/dbsp/src/operator/trace.rs Added public API for top-n trace retention
crates/dbsp/src/operator/dynamic/trace.rs Added dynamic API and tests for top-n trace retention
crates/dbsp/src/operator/dynamic/aggregate.rs Added property-based tests for MAX with retention and fixed typo
crates/dbsp/src/operator/dynamic/accumulate_trace.rs Added dynamic APIs for top-n and bottom-n accumulate trace retention
crates/dbsp/src/operator/accumulate_trace.rs Added public APIs for top-n and bottom-n accumulate trace retention

    }
    case ArgMinSome -> DBSPIntegrateTraceRetainNValuesOperator.WhichN.BottomN;
    case MinSome1 -> {
        limit = 2;
Contributor Author

A comment explaining why this needs to be 2 may be helpful.

Contributor

I thought there was some explanation somewhere in DBSP, but I could not find it.
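Why MinSome needs N = 2 can be seen from Rust's `Option` ordering, where `None` sorts before every `Some(_)`. The sketch below models a group's below-waterline values as a slice of `Option<i64>` in which `None` is assumed to occur at most once; `bottom_n` and `min_some` are illustrative helpers, not DBSP functions. With N = 1 the single retained value may be `None`, which discards the actual minimum; with N = 2 the smallest `Some` value always survives.

```rust
/// Minimum over the non-None values (the MinSome semantics assumed here).
fn min_some(values: &[Option<i64>]) -> Option<i64> {
    values.iter().flatten().copied().min()
}

/// Hypothetical Bottom-N: keep the n smallest values under `Option`'s ordering.
fn bottom_n(values: &[Option<i64>], n: usize) -> Vec<Option<i64>> {
    let mut v = values.to_vec();
    v.sort(); // None sorts before every Some(_)
    v.truncate(n);
    v
}

fn main() {
    assert!(None < Some(i64::MIN));
    let below_waterline = [Some(5), None, Some(3), Some(8)];
    let kept1 = bottom_n(&below_waterline, 1);
    let kept2 = bottom_n(&below_waterline, 2);
    println!("N=1: {:?} -> MinSome = {:?}", kept1, min_some(&kept1)); // [None] -> None
    println!("N=2: {:?} -> MinSome = {:?}", kept2, min_some(&kept2)); // [None, Some(3)] -> Some(3)
}
```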

@mihaibudiu added this pull request to the merge queue Jan 22, 2026
@mihaibudiu removed this pull request from the merge queue due to a manual request Jan 22, 2026
@mihaibudiu force-pushed the top-k-gc branch 2 times, most recently from 8eb1e53 to 5d4ec58 January 22, 2026 05:33
@mihaibudiu enabled auto-merge January 22, 2026 05:33
@mihaibudiu added this pull request to the merge queue Jan 22, 2026
@mihaibudiu removed this pull request from the merge queue due to a manual request Jan 22, 2026
@mihaibudiu enabled auto-merge January 22, 2026 06:17
@mihaibudiu added this pull request to the merge queue Jan 22, 2026
@github-merge-queue (bot) removed this pull request from the merge queue due to failed status checks Jan 22, 2026
ryzhyk and others added 2 commits January 22, 2026 09:12
We implement two new GC strategies:

- Top-N: Keep the N largest values below the waterline. Unlike Last-N, this
  strategy doesn't assume that values are sorted by timestamp. Can be used to
  GC the Max aggregate and the top-k group transformer.
- Bottom-N: Keep the N smallest values below the waterline. Doesn't assume that
  values are sorted by timestamp. Can be used to implement bottom-k and Min.
  Can also be used to implement MinSome, but this requires setting N to 2 to
  account for the potential None value.

Signed-off-by: Leonid Ryzhyk <[email protected]>
@mihaibudiu added this pull request to the merge queue Jan 22, 2026
Merged via the queue into main with commit 164f3af Jan 22, 2026
1 check passed
@mihaibudiu deleted the top-k-gc branch January 22, 2026 18:32

Labels: DBSP core, marketing, SQL compiler


4 participants