[FLINK-34936][Checkpointing] Register reused shared state handle to FileMergingSnapshotManager #24644
@@ -459,13 +470,28 @@ public void notifyCheckpointSubsumed(SubtaskKey subtaskKey, long checkpointId)
         uploadedStates.headMap(checkpointId, true).entrySet().iterator();
 while (uploadedStatesIterator.hasNext()) {
     Map.Entry<Long, Set<LogicalFile>> entry = uploadedStatesIterator.next();
-    if (discardLogicalFiles(subtaskKey, entry.getKey(), entry.getValue())) {
+    if (discardLogicalFiles(subtaskKey, checkpointId, entry.getValue())) {
FYI: This is a bug found by newly added UTs.
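To illustrate why the argument matters, here is a minimal sketch under stated assumptions. It assumes that discardLogicalFiles drops a file only when the file's last-used checkpoint is at or below the ID it is given; that rule is not shown in this hunk. SubsumeSketch, FileRecord, and discard are hypothetical stand-ins, not Flink classes.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Simplified, hypothetical model of subsumption; not Flink's actual classes. */
final class SubsumeSketch {

    /** Hypothetical stand-in for a logical file that tracks its last-used checkpoint. */
    static final class FileRecord {
        final String name;
        long lastUsedCheckpointId;

        FileRecord(String name, long lastUsedCheckpointId) {
            this.name = name;
            this.lastUsedCheckpointId = lastUsedCheckpointId;
        }
    }

    /** Files grouped by the checkpoint that first uploaded them. */
    final TreeMap<Long, Set<FileRecord>> uploadedStates = new TreeMap<>();

    /** Assumed rule: drop a file once its last use is at or below the given checkpoint. */
    static boolean discard(long upToCheckpointId, Set<FileRecord> files) {
        files.removeIf(f -> f.lastUsedCheckpointId <= upToCheckpointId);
        return files.isEmpty();
    }

    /** useSubsumedId=false reproduces the old call that passed entry.getKey(). */
    void notifyCheckpointSubsumed(long checkpointId, boolean useSubsumedId) {
        Iterator<Map.Entry<Long, Set<FileRecord>>> it =
                uploadedStates.headMap(checkpointId, true).entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Set<FileRecord>> entry = it.next();
            long threshold = useSubsumedId ? checkpointId : entry.getKey();
            if (discard(threshold, entry.getValue())) {
                it.remove();
            }
        }
    }

    /** Uploads a file at checkpoint 1, reuses it at checkpoint 3, then subsumes checkpoint 3. */
    static int remainingAfterSubsume(boolean useSubsumedId) {
        SubsumeSketch manager = new SubsumeSketch();
        Set<FileRecord> files = new HashSet<>();
        files.add(new FileRecord("sst-1", 1L));
        manager.uploadedStates.put(1L, files);
        files.iterator().next().lastUsedCheckpointId = 3L; // reused by checkpoint 3
        manager.notifyCheckpointSubsumed(3L, useSubsumedId);
        return manager.uploadedStates.size();
    }

    public static void main(String[] args) {
        System.out.println("old call leaves entries: " + remainingAfterSubsume(false));
        System.out.println("new call leaves entries: " + remainingAfterSubsume(true));
    }
}
```

In this model, passing the entry key compares a reused file against the checkpoint that first uploaded it, so the file is never released. Passing the subsumed checkpoint ID releases it once no later checkpoint uses it.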
@Zakelly Thanks for the PR, overall LGTM, I left a minor comment.
...java/org/apache/flink/contrib/streaming/state/snapshot/RocksIncrementalSnapshotStrategy.java
Thanks for this PR! Overall, LGTM. I have one question:
If a subtask completes its checkpoint but the job's overall checkpoint fails, the FileMergingSnapshotManager will still delete the successfully uploaded files upon checkpoint abort. Does this imply that, for FileMergingSnapshotManager, there is no feasible way to reuse the checkpoint files which were "partially uploaded successfully" within a checkpoint?
...ain/java/org/apache/flink/runtime/checkpoint/filemerging/FileMergingSnapshotManagerBase.java
For now, yes, they cannot be reused.
…ileMergingSnapshotManager
What is the purpose of the change
This is a sub-task of FLIP-306. To maintain the life cycle of the underlying files, the file-merging manager needs to know when a state handle is reused. This PR provides a callback for state handle reuse in CheckpointStreamFactory and integrates it with RocksDBKeyedStateBackend.
Brief change log
Add reusePreviousStateHandle in CheckpointStreamFactory, and implement this in FsMergingCheckpointStorageLocation
Call reusePreviousStateHandle during uploading.
Verifying this change
testReuseCallbackAndAdvanceWatermark
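The callback flow described in the change log can be sketched roughly as follows. This is an illustration only. Apart from the method name reusePreviousStateHandle, every type and member here (StreamFactory, MergingManager, MergingLocation, the String handles) is a hypothetical stand-in, and the signature is not taken from Flink. Reading the "watermark" in the test name as the last-used checkpoint of a file is also an assumption.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a reuse callback; only the method name comes from the PR. */
final class ReuseCallbackSketch {

    /** Stand-in for a stream factory; by default a reuse notification does nothing. */
    interface StreamFactory {
        default void reusePreviousStateHandle(Collection<String> previousHandles) {
            // No life-cycle bookkeeping is needed by default.
        }
    }

    /** Stand-in for a file-merging manager that tracks when each handle was last used. */
    static final class MergingManager {
        final Map<String, Long> lastUsedCheckpoint = new HashMap<>();

        void upload(String handle, long checkpointId) {
            lastUsedCheckpoint.put(handle, checkpointId);
        }

        void reuse(Collection<String> handles, long checkpointId) {
            for (String handle : handles) {
                // Only advance the last-used checkpoint; never move it backwards.
                lastUsedCheckpoint.computeIfPresent(
                        handle, (h, last) -> Math.max(last, checkpointId));
            }
        }
    }

    /** Stand-in for a per-checkpoint storage location that forwards reuse to the manager. */
    static final class MergingLocation implements StreamFactory {
        private final MergingManager manager;
        private final long checkpointId;

        MergingLocation(MergingManager manager, long checkpointId) {
            this.manager = manager;
            this.checkpointId = checkpointId;
        }

        @Override
        public void reusePreviousStateHandle(Collection<String> previousHandles) {
            manager.reuse(previousHandles, checkpointId);
        }
    }

    /** Uploads a handle at checkpoint 1, then reports it as reused by checkpoint 3. */
    static long lastUsedAfterReuse() {
        MergingManager manager = new MergingManager();
        manager.upload("sst-1", 1L);
        StreamFactory location = new MergingLocation(manager, 3L);
        location.reusePreviousStateHandle(List.of("sst-1"));
        return manager.lastUsedCheckpoint.get("sst-1");
    }

    public static void main(String[] args) {
        System.out.println("last used checkpoint: " + lastUsedAfterReuse());
    }
}
```

A default no-op on the interface keeps factories that do not merge files unchanged, so only the merging location needs to override the callback.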
Does this pull request potentially affect one of the following parts:
@Public(Evolving): no
Documentation