[FLINK-34936][Checkpointing] Register reused shared state handle to FileMergingSnapshotManager #24644

Merged
merged 1 commit into from
Apr 17, 2024

Zakelly (Contributor) commented Apr 9, 2024

What is the purpose of the change

This is a sub-task of FLIP-306. To maintain the life cycle of the underlying files, the file-merging manager needs to know which state handles are reused. This PR provides a callback for state handle reuse in CheckpointStreamFactory and integrates it with RocksDBKeyedStateBackend.

Brief change log

  • Add the method reusePreviousStateHandle to the CheckpointStreamFactory interface and implement it in FsMergingCheckpointStorageLocation.
  • Call reusePreviousStateHandle during state upload.
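
The callback described in the change log can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the code from this PR: SketchStateHandle, SketchStreamFactory, SketchMergingFactory, and SketchUploader are hypothetical stand-ins, and the real reusePreviousStateHandle may take different parameter types. The sketch assumes an incremental upload step that uploads only new files and reports already-uploaded ones through the callback.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a state handle that points at an uploaded file. */
final class SketchStateHandle {
    final String fileName;

    SketchStateHandle(String fileName) {
        this.fileName = fileName;
    }
}

/** Simplified stand-in for a checkpoint stream factory (assumed shape). */
interface SketchStreamFactory {
    /** No-op by default, so factories that do not merge files are unaffected. */
    default void reusePreviousStateHandle(Collection<SketchStateHandle> previousHandles) {}
}

/** File-merging flavour: records which files the current checkpoint still references. */
final class SketchMergingFactory implements SketchStreamFactory {
    final long checkpointId;
    final List<String> reusedFiles = new ArrayList<>();

    SketchMergingFactory(long checkpointId) {
        this.checkpointId = checkpointId;
    }

    @Override
    public void reusePreviousStateHandle(Collection<SketchStateHandle> previousHandles) {
        for (SketchStateHandle handle : previousHandles) {
            reusedFiles.add(handle.fileName);
        }
    }
}

/** Incremental upload step: new files are uploaded, known files are reported as reused. */
final class SketchUploader {
    static List<String> upload(
            Set<String> currentFiles,
            Map<String, SketchStateHandle> previouslyUploaded,
            SketchStreamFactory factory) {
        List<String> toUpload = new ArrayList<>();
        List<SketchStateHandle> reused = new ArrayList<>();
        for (String file : currentFiles) {
            SketchStateHandle previous = previouslyUploaded.get(file);
            if (previous == null) {
                toUpload.add(file); // never uploaded before
            } else {
                reused.add(previous); // already uploaded, only referenced again
            }
        }
        // Tell the factory which existing handles this checkpoint keeps alive.
        factory.reusePreviousStateHandle(reused);
        return toUpload;
    }
}
```

Making the callback a default no-op is a design choice of this sketch: only a file-merging implementation needs to react, while other factories keep their behavior.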

Verifying this change

  • New unit test testReuseCallbackAndAdvanceWatermark

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: yes
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable


Zakelly (Contributor Author) commented Apr 11, 2024

@ljz2051 @fredia Would you please take a look?

@@ -459,13 +470,28 @@ public void notifyCheckpointSubsumed(SubtaskKey subtaskKey, long checkpointId)
         uploadedStates.headMap(checkpointId, true).entrySet().iterator();
 while (uploadedStatesIterator.hasNext()) {
     Map.Entry<Long, Set<LogicalFile>> entry = uploadedStatesIterator.next();
-    if (discardLogicalFiles(subtaskKey, entry.getKey(), entry.getValue())) {
+    if (discardLogicalFiles(subtaskKey, checkpointId, entry.getValue())) {
Contributor Author

FYI: This is a bug found by the newly added unit tests.
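
To see why the argument passed to discardLogicalFiles matters, here is a minimal sketch. It assumes, which the thread does not spell out, that a file becomes discardable once the last checkpoint that used it is at or below the ID handed to the discard step. SketchLogicalFile and SketchSubsumption are hypothetical stand-ins, and the boolean flag exists only to contrast the two variants in the diff.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical stand-in for a logical file tracked by the manager. */
final class SketchLogicalFile {
    final String name;
    long lastUsedCheckpointId;
    boolean discarded;

    SketchLogicalFile(String name, long createdInCheckpoint) {
        this.name = name;
        this.lastUsedCheckpointId = createdInCheckpoint;
    }
}

/** Simplified subsumption bookkeeping, keyed by the checkpoint that uploaded each file. */
final class SketchSubsumption {
    private final TreeMap<Long, Set<SketchLogicalFile>> uploadedStates = new TreeMap<>();

    SketchLogicalFile upload(long checkpointId, String name) {
        SketchLogicalFile file = new SketchLogicalFile(name, checkpointId);
        uploadedStates.computeIfAbsent(checkpointId, k -> new HashSet<>()).add(file);
        return file;
    }

    /** A later checkpoint reuses the file, which advances its last-used checkpoint ID. */
    void reuse(SketchLogicalFile file, long checkpointId) {
        file.lastUsedCheckpointId = Math.max(file.lastUsedCheckpointId, checkpointId);
    }

    /** Returns how many files were discarded when checkpointId is subsumed. */
    int notifySubsumed(long checkpointId, boolean compareAgainstEntryKey) {
        int discardedCount = 0;
        Iterator<Map.Entry<Long, Set<SketchLogicalFile>>> entries =
                uploadedStates.headMap(checkpointId, true).entrySet().iterator();
        while (entries.hasNext()) {
            Map.Entry<Long, Set<SketchLogicalFile>> entry = entries.next();
            // Old variant: compare against the uploading checkpoint (the map key).
            // New variant: compare against the subsumed checkpoint ID.
            long threshold = compareAgainstEntryKey ? entry.getKey() : checkpointId;
            Iterator<SketchLogicalFile> files = entry.getValue().iterator();
            while (files.hasNext()) {
                SketchLogicalFile file = files.next();
                if (file.lastUsedCheckpointId <= threshold) {
                    file.discarded = true;
                    files.remove();
                    discardedCount++;
                }
            }
            if (entry.getValue().isEmpty()) {
                entries.remove();
            }
        }
        return discardedCount;
    }
}
```

In this model, a file uploaded in checkpoint 1 and reused in checkpoint 2 is never discarded by the key-based comparison when checkpoint 2 is subsumed, because its last-used ID (2) exceeds the key (1). Comparing against the subsumed ID releases it.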

fredia (Contributor) left a comment

@Zakelly Thanks for the PR, overall LGTM, I left a minor comment.

ljz2051 (Contributor) left a comment

Thanks for this pr! Overall, LGTM. I have one question:

If a subtask completes its checkpoint but the job's overall checkpoint fails, the FileMergingSnapshotManager will still delete the successfully uploaded files upon checkpoint abort. Does this imply that, for FileMergingSnapshotManager, there is no feasible way to reuse the checkpoint files which were "partially uploaded successfully" within a checkpoint?

Zakelly (Contributor Author) commented Apr 16, 2024

> Thanks for this pr! Overall, LGTM. I have one question:
>
> If a subtask completes its checkpoint but the job's overall checkpoint fails, the FileMergingSnapshotManager will still delete the successfully uploaded files upon checkpoint abort. Does this imply that, for FileMergingSnapshotManager, there is no feasible way to reuse the checkpoint files which were "partially uploaded successfully" within a checkpoint?

For now, yes, they cannot be reused.
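
The consequence can be illustrated with a toy model. This is only a sketch of the behavior described in the exchange above, not the manager's actual bookkeeping: SketchAbortTracker and its methods are hypothetical, and it assumes files are tracked per checkpoint and dropped as a group on abort.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical tracker: files are grouped by the checkpoint that uploaded them. */
final class SketchAbortTracker {
    private final Map<Long, Set<String>> filesByCheckpoint = new HashMap<>();

    void recordUpload(long checkpointId, String file) {
        filesByCheckpoint.computeIfAbsent(checkpointId, k -> new HashSet<>()).add(file);
    }

    /** On abort, every file uploaded for that checkpoint is dropped, even if its upload succeeded. */
    Set<String> notifyAborted(long checkpointId) {
        Set<String> removed = filesByCheckpoint.remove(checkpointId);
        return removed == null ? new HashSet<>() : removed;
    }

    /** A later checkpoint can only reuse files that are still tracked. */
    boolean canReuse(String file) {
        for (Set<String> files : filesByCheckpoint.values()) {
            if (files.contains(file)) {
                return true;
            }
        }
        return false;
    }
}
```

Under this model, a file uploaded successfully by one subtask for checkpoint N stops being reusable as soon as checkpoint N is aborted, so the next checkpoint has to upload it again.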

@fredia fredia merged commit 31ea1a9 into apache:master Apr 17, 2024
4 participants