Add pod-level checkpoint support to containerd (KEP-5823) #12932
adrianreber wants to merge 3 commits into containerd:main
Update github.com/checkpoint-restore/checkpointctl from v1.5.0 to v1.5.1-0.20260212150839-06e23e6d1f24. This pulls in the PodOptionsFile constant and updated checkpoint metadata definitions needed for pod-level checkpoint/restore support.

Also updates transitive dependencies:
- github.com/checkpoint-restore/go-criu/v8 v8.1.0
- github.com/containers/storage v1.59.1
- github.com/klauspost/pgzip v1.2.6
- github.com/spf13/cobra v1.10.2

Generated with Claude Code (https://claude.ai/code)

Signed-off-by: Adrian Reber <[email protected]>
Workaround: the vendored CRI API files are manually copied from the Kubernetes repository and need to be updated via Go's module mechanism once the upstream changes are released.

Add CheckpointPod and RestorePod RPCs to the CRI RuntimeService, including request/response message definitions, gRPC bindings, and client implementations. Stub implementations in containerd return Unimplemented for now.

This is part of KEP-5823: Pod-Level Checkpoint/Restore.

Generated with Claude Code (https://claude.ai/code)

Signed-off-by: Adrian Reber <[email protected]>
Add CheckpointPod RPC implementation that checkpoints all running containers in a pod sandbox to a tar archive.

The checkpoint flow:
1. Pause all containers for a consistent snapshot
2. Checkpoint each container via task.Checkpoint (CRIU)
3. Extract checkpoint data (CRIU images, rootfs diff, spec, config, status) into per-container directories
4. Write pod metadata (pod.options, pod.dump)
5. Create tar archive at the requested path
6. Resume containers if leaveRunning option is set

Add stub RestorePod implementation (not yet functional) and non-Linux stubs for both operations. Add pod_checkpoint metric to track checkpoint duration.

This is part of KEP-5823: Pod-Level Checkpoint/Restore.

Generated with Claude Code (https://claude.ai/code)

Signed-off-by: Adrian Reber <[email protected]>
// Phase 1: Pause all containers before checkpointing.
// This ensures all containers are frozen at roughly the same time
// for a consistent snapshot.
pausedTasks := make(map[string]client.Task)
This should be wired through the sandbox API.
Kube/CRI-specific code should remain here; implementation-specific code (pausing containers, criu, etc.) should go into the podsandbox/ package. Communication between the two should happen via the Controller interface (see here: https://github.com/containerd/containerd/blob/main/api/runtime/sandbox/v1/sandbox.proto#L32)
What does your "This" exactly refer to? I guess you mean pausing of containers, right?
The function implementations need to be split (both CheckpointPod and RestorePod).
With the sandbox API, different runtimes may provide their own implementations (which is often the case for VM-based containers).
This way we don't force everyone to use criu, but allow runtimes to decide what's the best way to handle CRI requests.
Thanks for the explanation. Now it all makes sense. Will rework it.
@mxpv Thank you for your comment! We received similar feedback from several people in the Kubernetes community. Our goal with KEP-5823 is to support checkpoint/restore with multiple runtimes (e.g., gVisor, runc/crun with criu). I am updating this KEP at the moment to finalize the design, and this pull request is not ready for review.
This implements the CheckpointPod CRI RPC which checkpoints all running containers in a pod sandbox to a tar archive. The checkpoint flow pauses all containers for a consistent snapshot, checkpoints each container via CRIU, collects
checkpoint data (CRIU images, rootfs diff, spec, config, status) into per-container directories, writes pod metadata (pod.options, pod.dump), and creates a tar archive at the requested path. Containers are resumed if the leaveRunning
option is set.
The CRI API changes (CheckpointPodRequest/CheckpointPodResponse, RestorePodRequest/RestorePodResponse) are manually vendored from the Kubernetes repository until the upstream changes are released. RestorePod is stubbed out for now.
Based on the CRI-O implementation: cri-o/cri-o#9758
See:
Changes