zeropod - pod that scales down to zero

Zeropod is a Kubernetes runtime (more specifically, a containerd shim) that automatically checkpoints containers to disk once a certain amount of time has passed since the last TCP connection. While in scaled down state, it listens on the same port the application inside the container was listening on and restores the container on the first incoming connection. Depending on the memory size of the checkpointed program, this takes from tens to a few hundred milliseconds, virtually unnoticeable to the user. As all memory contents are stored to disk during checkpointing, the full state of the application is restored. If the cluster supports it, zeropod adjusts resource requests in-place while a pod is scaled down. To prevent huge resource usage spikes when draining a node, scaled down pods can be migrated between nodes without needing to start up. There is also a more experimental live migration feature that extends the scaling use cases of zeropod.

Use cases

Low traffic sites
Dev/Staging environments
Providing a small tier on Heroku-like platforms
"Mostly static" sites that still need some server component

How it works

First off, what is this containerd shim? The shim sits between containerd and the container sandbox. Each pod has such a long-running process that calls out to runc to manage the lifecycle of all containers of a pod.


There are several components that make zeropod work but here are the most important ones:

Checkpointing is done using CRIU.
After checkpointing, a userspace TCP proxy (activator) is created on a random port and an eBPF program is loaded to redirect packets destined to the checkpointed container to the activator. The activator then accepts the connection, restores the process, signals to disable the eBPF redirect and then proxies the initial request(s) to the restored application. See activation sequence for more details.
All subsequent connections go directly to the application without any proxying or performance impact.
The redirector eBPF program is also used to track the last TCP activity targeting the running application. This helps zeropod delay checkpointing if there is recent activity. This avoids too much flapping on a service that is frequently used.
To the container runtime (e.g. Kubernetes), the container appears to be running even though the process is technically not. This is required to prevent the runtime from trying to restart the container.
When running kubectl exec against a scaled down container, it will be restored and the exec should work just as with any normal Kubernetes container.
Metrics are recorded continuously within each shim. The zeropod-manager process, which runs once per node (DaemonSet), is responsible for collecting and merging the metrics from the different shim processes. Each shim exposes a unix socket for the manager to connect to, and the manager exposes the merged metrics on an HTTP endpoint.

Activation sequence

This diagram shows what happens when a user initiates a connection to a checkpointed container.


sequenceDiagram
    actor User
    participant Redirector
    participant Activator
    participant Container
    Note over Container: checkpointed
    Note over Activator: listening on port 41234
    User->>Redirector: TCP connect to port 80
    Note right of User: local port 12345
    Redirector->>Redirector: redirect to port 41234
    Redirector->>Activator: TCP connect
    Activator->>Activator: TCP accept
    Activator->>Container: restore
    loop every millisecond
        Activator->>Container: TCP connect to port 80
    end
    Note over Container: restored
    Container-->>Activator: TCP accept
    Activator-->>Redirector: TCP accept
    Redirector-->>Redirector: redirect to port 12345
    Redirector-->>User: TCP accept
    Note right of User: connection between user<br>and container established
    User->>Container: TCP connect to port 80
    Note over Redirector: pass
    Container-->>User: TCP accept
    Note over Redirector: pass

Compatibility

Most programs should just work with zeropod out of the box. The examples directory contains a variety of software that has been tested successfully. If something fails, the containerd logs can prove useful for figuring out what went wrong, as they include the CRIU log on checkpoint/restore failure. Some arm64 workloads running in a Linux VM on top of macOS have proven somewhat flaky. If you run into any issues with your software, please don't hesitate to create an issue.

Docs

For more resources and documentation, head to the docs.

