storage: better checkpoint GC during S3 sync #5469
Conversation
Pull request overview
This PR enhances the checkpoint garbage collection mechanism to better handle concurrent S3 checkpoint syncs. Previously, checkpoint GC was limited to removing one checkpoint at a time and was completely blocked during S3 sync operations. The updated implementation allows GC to remove multiple old checkpoints in a single pass while protecting only those checkpoints that are actively being synced to S3.
Changes:
- Modified `gc_checkpoint` to accept an exception list of checkpoint UUIDs that should be preserved
- Changed GC behavior from single-checkpoint removal to bulk removal of all eligible old checkpoints
- Updated the main call site to pass UUIDs of checkpoints currently being synced to S3 as exceptions
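In rough terms, the new selection logic can be sketched as follows. This is a minimal sketch under stated assumptions, not the PR's code: `eligible_for_gc`, `CheckpointMeta`, and the `u64` stand-in for `Uuid` are all hypothetical names, and the retention threshold lives in the real checkpointer.

```rust
use std::collections::HashSet;

// Hypothetical stand-ins: the real code uses `Uuid` and richer metadata.
type CheckpointId = u64;
struct CheckpointMeta {
    uuid: CheckpointId,
}

// Retention threshold mentioned in the review (the 2 most recent checkpoints).
const MIN_CHECKPOINT_THRESHOLD: usize = 2;

/// Bulk GC selection: every checkpoint older than the retention threshold is
/// eligible for removal, except those whose UUID is in `except` (i.e., those
/// with an S3 sync still in flight).
fn eligible_for_gc(
    checkpoint_list: &[CheckpointMeta], // ordered oldest -> newest
    except: &HashSet<CheckpointId>,
) -> Vec<CheckpointId> {
    let retained_from = checkpoint_list
        .len()
        .saturating_sub(MIN_CHECKPOINT_THRESHOLD);
    checkpoint_list[..retained_from]
        .iter()
        .filter(|c| !except.contains(&c.uuid))
        .map(|c| c.uuid)
        .collect()
}

fn main() {
    let list: Vec<CheckpointMeta> =
        (1u64..=5).map(|uuid| CheckpointMeta { uuid }).collect();
    let except: HashSet<CheckpointId> = [2].into_iter().collect();
    // Checkpoints 4 and 5 are retained by the threshold, 2 by the except
    // list; 1 and 3 are removed in a single GC pass.
    assert_eq!(eligible_for_gc(&list, &except), vec![1, 3]);
}
```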
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `crates/dbsp/src/circuit/dbsp_handle.rs` | Updated `gc_checkpoint` signature to accept exception list parameter and updated documentation |
| `crates/dbsp/src/circuit/checkpointer.rs` | Implemented bulk checkpoint removal with exception list support, added informative logging |
| `crates/adapters/src/controller.rs` | Updated `gc_checkpoint` call to pass active sync checkpoint UUIDs as exceptions, added `Itertools` import for deduplication |
This needs @blp's review. Is there any risk of getting stuck in local GC for too long because of a large number of accumulated local checkpoints?
Mostly, there should be just one checkpoint to GC. But in cases where an older checkpoint was being synced and that prevented GC (so many checkpoints have accumulated), we want to clean up all the older ones.
@abhizer reminder to add state machine tests for this.
If we're syncing an old checkpoint, I believe that this keeps all the checkpoints newer than that. Why not just keep MIN_CHECKPOINT_THRESHOLD newer ones? Then we won't accumulate many new checkpoints if it takes a long time to sync an old one. |
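The difference between the two retention policies under discussion can be sketched like this. All names and the `u64` checkpoint IDs are hypothetical; this is an illustration, not the PR's code.

```rust
use std::collections::HashSet;

/// Policy A (what the reviewer describes above): keep the oldest syncing
/// checkpoint and everything newer than it.
fn keep_syncing_and_newer(checkpoints: &[u64], syncing: &HashSet<u64>) -> Vec<u64> {
    match checkpoints.iter().position(|c| syncing.contains(c)) {
        Some(i) => checkpoints[i..].to_vec(),
        // Nothing syncing: placeholder; the real code GCs old checkpoints.
        None => checkpoints.to_vec(),
    }
}

/// Policy B (the reviewer's suggestion): keep only the syncing checkpoints
/// plus the newest `threshold` checkpoints.
fn keep_syncing_and_threshold(
    checkpoints: &[u64],
    syncing: &HashSet<u64>,
    threshold: usize,
) -> Vec<u64> {
    let newest_from = checkpoints.len().saturating_sub(threshold);
    checkpoints
        .iter()
        .copied()
        .enumerate()
        .filter(|&(i, c)| i >= newest_from || syncing.contains(&c))
        .map(|(_, c)| c)
        .collect()
}

fn main() {
    // Checkpoint 1 (the oldest) has a slow S3 sync in flight.
    let checkpoints = [1u64, 2, 3, 4, 5, 6];
    let syncing: HashSet<u64> = [1].into_iter().collect();
    // Policy A retains all six checkpoints while the sync runs; policy B
    // retains a bounded set: the syncing checkpoint plus the newest two.
    assert_eq!(keep_syncing_and_newer(&checkpoints, &syncing).len(), 6);
    assert_eq!(keep_syncing_and_threshold(&checkpoints, &syncing, 2), vec![1, 5, 6]);
}
```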
```rust
// Find the first checkpoint in checkpoint list that is not in `except`.
self.checkpoint_list
    .iter()
    .filter(|c| !except.contains(&c.uuid))
    .take(1)
    .filter_map(|c| self.backend.gather_batches_for_checkpoint(c).ok())
    .for_each(|batches| {
        for batch in batches {
            batch_files_to_keep.insert(batch);
        }
    });
```
It looks like we're only keeping batches from the first checkpoint in the list? I don't understand why.
The assumption is that the checkpoints are incremental and that:
- checkpoint `n + 1` will only depend on checkpoint `n`'s batch files plus new batch files
- so, when deleting the batch files of checkpoint `n - 1` or older, if we keep all the batch files of `n`, we implicitly keep all the batch files of `n + 1`

Basically, checkpoint `n + 1` cannot depend on a batch file that is in `n - 1` but not in `n`.
This should be the same behavior as the current implementation of GC.
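The invariant described above can be checked with a toy example. The batch file names and the `unsafe_to_delete` helper are hypothetical; the point is only that, under the incremental invariant, no file that `n + 1` needs lives exclusively in `n - 1`.

```rust
use std::collections::HashSet;

/// Batches that checkpoint n+1 still needs but that only n-1 (not n) would
/// preserve. Under the incremental invariant this set is always empty.
fn unsafe_to_delete(
    n_minus_1: &HashSet<&'static str>,
    n: &HashSet<&'static str>,
    n_plus_1: &HashSet<&'static str>,
) -> Vec<&'static str> {
    n_plus_1
        .iter()
        .filter(|b| n_minus_1.contains(*b) && !n.contains(*b))
        .copied()
        .collect()
}

fn main() {
    // Hypothetical batch-file sets for three incremental checkpoints.
    let n_minus_1: HashSet<_> = ["b0", "b1"].into_iter().collect();
    let n: HashSet<_> = ["b1", "b2"].into_iter().collect(); // b0 merged away
    let n_plus_1: HashSet<_> = ["b1", "b2", "b3"].into_iter().collect();
    // Empty: GCing n-1 while keeping n's batch files cannot break n+1,
    // which is why keeping only the first non-excepted checkpoint suffices.
    assert!(unsafe_to_delete(&n_minus_1, &n, &n_plus_1).is_empty());
}
```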
I think that's true. Thanks.
Previously:
- During S3 sync, we would prevent the GC of existing local checkpoints.
- Only one checkpoint would be GCed at a time.

This commit updates the `gc_checkpoint` method so that all *old* checkpoints (i.e., checkpoints older than the retention threshold, currently the 2 most recent) are GCed, except for any checkpoint in the `except` list or newer. This except list is populated from currently active requests for checkpoint syncs.

Signed-off-by: Abhinav Gyawali <[email protected]>

checkpointer: only preserve checkpoints in except list

This commit updates the checkpointer to only preserve the checkpoints in the except list, instead of preserving any checkpoint that is newer. Also adds tests for the Checkpointer to ensure that it works correctly.

Signed-off-by: Abhinav Gyawali <[email protected]>

py: add tests with sync GC count: 1, age: 0

Tests for potential regressions where we only want to keep 1 checkpoint in the object store. Previously, this introduced a bug by also cleaning up the local checkpoint directory for this checkpoint, while the pipeline still expected it to be available.

Signed-off-by: Abhinav Gyawali <[email protected]>
Previously:
- During S3 sync, we would prevent the GC of existing local checkpoints.
- Only one checkpoint would be GCed at a time.

This commit updates the `gc_checkpoint` method so that all old checkpoints (i.e., checkpoints older than the retention threshold, currently the 2 most recent) are GCed, except for any checkpoint in the `except` list or newer. This except list is populated from currently active requests for checkpoint syncs.