
adrianreber · 2026-02-23T16:18:46Z

This implements the CheckpointPod CRI RPC, which checkpoints all running containers in a pod sandbox to a tar archive. The checkpoint flow pauses all containers for a consistent snapshot, checkpoints each container via CRIU, collects checkpoint data (CRIU images, rootfs diff, spec, config, status) into per-container directories, writes pod metadata (pod.options, pod.dump), and creates a tar archive at the requested path. Containers are resumed if the leaveRunning option is set.

The CRI API changes (CheckpointPodRequest/CheckpointPodResponse, RestorePodRequest/RestorePodResponse) are manually vendored from the Kubernetes repository until the upstream changes are released. RestorePod is stubbed out for now.

Based on the CRI-O implementation: cri-o/cri-o#9758

See:

Changes

Bump checkpointctl to 06e23e6d1f24 for pod checkpoint metadata types (CheckpointedPodOptions, PodOptionsFile, PodDumpFile)
Add CheckpointPod and RestorePod CRI API types (vendored from Kubernetes)
Implement CheckpointPod in internal/cri/server/sandbox_checkpoint_linux.go
Add instrumented service forwarding for both new RPCs
Add pod_checkpoint metric
Add non-Linux stubs and stub RestorePod implementation

Update github.com/checkpoint-restore/checkpointctl from v1.5.0 to v1.5.1-0.20260212150839-06e23e6d1f24. This pulls in the PodOptionsFile constant and updated checkpoint metadata definitions needed for pod-level checkpoint/restore support.

Also updates transitive dependencies:
- github.com/checkpoint-restore/go-criu/v8 v8.1.0
- github.com/containers/storage v1.59.1
- github.com/klauspost/pgzip v1.2.6
- github.com/spf13/cobra v1.10.2

Generated with Claude Code (https://claude.ai/code)

Signed-off-by: Adrian Reber <[email protected]>

Workaround: the vendored CRI API files are manually copied from the Kubernetes repository and need to be updated via Go's module mechanism once the upstream changes are released.

Add CheckpointPod and RestorePod RPCs to the CRI RuntimeService, including request/response message definitions, gRPC bindings, and client implementations. Stub implementations in containerd return Unimplemented for now.

This is part of KEP-5823: Pod-Level Checkpoint/Restore.

Generated with Claude Code (https://claude.ai/code)

Signed-off-by: Adrian Reber <[email protected]>

Add CheckpointPod RPC implementation that checkpoints all running containers in a pod sandbox to a tar archive. The checkpoint flow:

1. Pause all containers for a consistent snapshot
2. Checkpoint each container via task.Checkpoint (CRIU)
3. Extract checkpoint data (CRIU images, rootfs diff, spec, config, status) into per-container directories
4. Write pod metadata (pod.options, pod.dump)
5. Create tar archive at the requested path
6. Resume containers if leaveRunning option is set

Add stub RestorePod implementation (not yet functional) and non-Linux stubs for both operations. Add pod_checkpoint metric to track checkpoint duration.

This is part of KEP-5823: Pod-Level Checkpoint/Restore.

Generated with Claude Code (https://claude.ai/code)

Signed-off-by: Adrian Reber <[email protected]>
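The pause, checkpoint, and resume ordering in the flow above can be sketched as follows. The `Task` interface, `fakeTask`, and `checkpointPod` are hypothetical stand-ins and not the containerd client API. Resuming already-paused containers when a step fails is an assumption of this sketch, as is leaving them paused when leaveRunning is false.

```go
package main

import (
	"errors"
	"fmt"
)

// Task is a hypothetical stand-in for a container task (not the containerd API).
type Task interface {
	ID() string
	Pause() error
	Checkpoint() error
	Resume() error
}

// checkpointPod pauses every task first, then checkpoints each one, so all
// containers are frozen at roughly the same time. Paused tasks are resumed
// when leaveRunning is set or when any step fails (assumption of this sketch).
func checkpointPod(tasks []Task, leaveRunning bool) (err error) {
	var paused []Task
	defer func() {
		if err == nil && !leaveRunning {
			return // success without leaveRunning: this sketch leaves tasks paused
		}
		for _, t := range paused {
			if rerr := t.Resume(); rerr != nil {
				err = errors.Join(err, fmt.Errorf("resume %s: %w", t.ID(), rerr))
			}
		}
	}()

	// Phase 1: pause all containers before checkpointing any of them.
	for _, t := range tasks {
		if perr := t.Pause(); perr != nil {
			return fmt.Errorf("pause %s: %w", t.ID(), perr)
		}
		paused = append(paused, t)
	}
	// Phase 2: checkpoint each paused container.
	for _, t := range paused {
		if cerr := t.Checkpoint(); cerr != nil {
			return fmt.Errorf("checkpoint %s: %w", t.ID(), cerr)
		}
	}
	return nil
}

// fakeTask records the calls made on it so the ordering can be inspected.
type fakeTask struct {
	id             string
	failCheckpoint bool
	log            *[]string
}

func (f *fakeTask) ID() string { return f.id }

func (f *fakeTask) Pause() error {
	*f.log = append(*f.log, "pause "+f.id)
	return nil
}

func (f *fakeTask) Checkpoint() error {
	if f.failCheckpoint {
		return errors.New("simulated checkpoint failure")
	}
	*f.log = append(*f.log, "checkpoint "+f.id)
	return nil
}

func (f *fakeTask) Resume() error {
	*f.log = append(*f.log, "resume "+f.id)
	return nil
}

func main() {
	var log []string
	tasks := []Task{
		&fakeTask{id: "a", log: &log},
		&fakeTask{id: "b", log: &log},
	}
	if err := checkpointPod(tasks, true); err != nil {
		panic(err)
	}
	fmt.Println(log)
}
```

Pausing every container before the first checkpoint starts is what keeps the per-container snapshots close in time. Interleaving pause and checkpoint per container would let later containers keep running while earlier ones are already dumped.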

mxpv · 2026-02-24T19:12:27Z

internal/cri/server/sandbox_checkpoint_linux.go

+	// Phase 1: Pause all containers before checkpointing.
+	// This ensures all containers are frozen at roughly the same time
+	// for a consistent snapshot.
+	pausedTasks := make(map[string]client.Task)


This should be wired through the sandbox API.
Kube/CRI-specific code should remain here; implementation-specific code (pausing containers, criu, etc.) should go into the podsandbox/ package. Communication between the two should happen via the Controller interface (see here: https://github.com/containerd/containerd/blob/main/api/runtime/sandbox/v1/sandbox.proto#L32)

adrianreber · 2026-02-25

What exactly does your "This" refer to? I guess you mean the pausing of containers, right?

mxpv · 2026-02-25

The function implementations need a split (both CheckpointPod and RestorePod).
With the sandbox API, different runtimes may provide their own implementations (which is often the case for VM-based containers).

This way we don't force everyone to use criu, but allow runtimes to decide what's the best way to handle CRI requests.

adrianreber · 2026-02-26

Thanks for the explanation. Now it all makes sense. Will rework it.

rst0git · 2026-03-01T10:09:11Z

This way we don't force everyone to use criu, but allow runtimes to decide what's the best way to handle CRI requests.

@mxpv Thank you for your comment! We received similar feedback from several people in the Kubernetes community. Our goal with KEP-5823 is to support checkpoint/restore with multiple runtimes (e.g., gVisor, runc/crun with criu). I am updating this KEP at the moment to finalize the design, and this pull request is not ready for review.

adrianreber added 3 commits February 23, 2026 11:15

github-project-automation bot added this to Pull Request Review Feb 23, 2026

k8s-ci-robot added the do-not-merge/work-in-progress label Feb 23, 2026

github-project-automation bot moved this to Needs Triage in Pull Request Review Feb 23, 2026

k8s-ci-robot added the size/XXL label Feb 23, 2026

mxpv self-requested a review February 24, 2026 19:08

mxpv reviewed Feb 24, 2026


samuelkarp added the area/cri (Container Runtime Interface), area/criu (checkpoint/resume), and area/runtime (Runtime) labels Feb 24, 2026


Add pod-level checkpoint support to containerd (KEP-5823) #12932
adrianreber wants to merge 3 commits into containerd:main from adrianreber:2026-02-23-KEP-5823


mxpv Feb 24, 2026

adrianreber Feb 25, 2026

mxpv Feb 25, 2026

adrianreber Feb 26, 2026
