Real-time metrics for Kata Containers. A cAdvisor-compatible monitoring agent that collects metrics from Kata Containers environments.
kata-pulse is a lightweight Rust-based monitoring daemon that:
- Collects metrics from Kata Container sandboxes
- Aggregates metrics across all running sandboxes
- Maps Cloud Hypervisor metrics to cAdvisor-compatible format
- Discovers per-sandbox monitoring agents
- Integrates seamlessly with Kubernetes and Prometheus monitoring stacks
- Multi-Sandbox Metrics Collection - Monitor metrics from multiple Kata sandboxes simultaneously
- Automatic Sandbox Discovery - Detects new sandboxes from filesystem and CRI runtime
- Kubernetes Integration - Enriches metrics with pod names, namespaces, and UIDs from CRI
- cAdvisor Compatibility - Outputs metrics in cAdvisor-compatible Prometheus format
- Automatic Cleanup - Removes cached metrics when sandboxes are deleted
- Zero-Copy Architecture - Uses Arc for efficient memory management
- Async/Await - Tokio-based asynchronous operations for high throughput
- Caching Strategy - In-memory caches for sandbox metadata and metrics
- Configurable Intervals - Adjustable metrics collection frequency
- Layer Caching - Docker multi-stage build with dependency caching
- Prometheus Metrics - Exposes health and performance metrics
- Health Checks - HTTP endpoint for container health verification
- Structured Logging - Tracing-based logging with configurable levels
- Error Tracking - Comprehensive error handling and reporting
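As a rough illustration of the automatic cleanup feature listed above, the sketch below drops cached metrics for sandboxes that are no longer active. The `MetricsCache` alias and the `prune_stale` function are hypothetical names chosen for this example; they are not taken from the kata-pulse source.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical cache layout: sandbox ID -> last scraped metrics text.
type MetricsCache = HashMap<String, String>;

/// Removes every entry whose sandbox ID is not in `active` and
/// returns how many entries were dropped.
fn prune_stale(cache: &mut MetricsCache, active: &HashSet<String>) -> usize {
    let before = cache.len();
    cache.retain(|id, _| active.contains(id));
    before - cache.len()
}
```

Calling `prune_stale` after each discovery pass would keep the cache size proportional to the number of live sandboxes.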
# Pull from GitHub Container Registry
docker pull ghcr.io/kata-containers/kata-pulse:latest
# Run with defaults
docker run -d \
--name kata-pulse \
-p 8090:8090 \
-v /run/kata:/run/kata:ro \
-v /run/vc/sbs:/run/vc/sbs:ro \
-v /run/containerd/containerd.sock:/run/containerd/containerd.sock:ro \
ghcr.io/kata-containers/kata-pulse:latest
# Check metrics
curl http://localhost:8090/metrics
# Clone repository
git clone https://github.com/kata-containers/kata-pulse.git
cd kata-pulse
# Build release binary
cargo build --release
# Run
./target/release/kata-pulse
# Or with custom config
./target/release/kata-pulse \
--listen-address 0.0.0.0:8090 \
--runtime-endpoint /run/containerd/containerd.sock \
--log-level info
# HTTP server configuration
KATA_PULSE_LISTEN=127.0.0.1:8090 # Listen address (default)
RUST_LOG=info # Log level (trace/debug/info/warn/error)
# Container runtime
RUNTIME_ENDPOINT=/run/containerd/containerd.sock # CRI socket path
# Metrics collection
KATA_PULSE_METRICS_INTERVAL=60 # Interval in seconds (default: 60)
./target/release/kata-pulse --help
OPTIONS:
-l, --listen-address <LISTEN_ADDRESS>
HTTP server listen address
[default: 127.0.0.1:8090]
[env: KATA_PULSE_LISTEN]
-r, --runtime-endpoint <RUNTIME_ENDPOINT>
CRI runtime socket path
[default: /run/containerd/containerd.sock]
[env: RUNTIME_ENDPOINT]
-m, --metrics-interval-secs <METRICS_INTERVAL_SECS>
Metrics collection interval in seconds
[default: 60]
[env: KATA_PULSE_METRICS_INTERVAL]
-h, --help
Print help
Returns HTML or plain text index page (based on Accept header)
curl http://localhost:8090/
Aggregated metrics from all sandboxes in Prometheus format
curl http://localhost:8090/metrics
curl "http://localhost:8090/metrics?sandbox=sandbox-123" # Per-sandbox
List all running sandboxes
curl http://localhost:8090/sandboxes
[
{
"sandbox_id": "abc123...",
"pod_name": "my-pod",
"namespace": "default",
"uid": "12345-67890"
}
]
┌──────────────────────────────────────────────────────────┐
│                    kata-pulse Daemon                     │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  ┌──────────────────┐      ┌──────────────────┐          │
│  │ HTTP Server      │      │ Metrics Cache    │          │
│  │ (Axum)           │ ──── │ (Arc<RwLock>)    │          │
│  └──────────────────┘      └──────────────────┘          │
│           │                         │                    │
│    GET /metrics              Updated every               │
│    GET /sandboxes            60 seconds                  │
│                                                          │
│  ┌──────────────────┐      ┌──────────────────┐          │
│  │ Metrics Collector│      │ Sandbox Cache    │          │
│  │ (Tokio Task)     │ ──── │ (Arc<RwLock>)    │          │
│  └──────────────────┘      └──────────────────┘          │
│           │                         │                    │
│    Per-sandbox shim        CRI Sync Task (every 5s)      │
│    Unix domain sockets                                   │
│                                                          │
│  ┌──────────────────┐      ┌──────────────────┐          │
│  │ Sandbox Manager  │      │ Directory        │          │
│  │ (SandboxCache    │ ──── │ Watcher          │          │
│  │ + CRI Client)    │      │ (/run/vc/sbs,    │          │
│  └──────────────────┘      │ /run/kata)       │          │
│                            └──────────────────┘          │
│                                                          │
│  ┌──────────────────┐      ┌──────────────────┐          │
│  │ Metrics Converter│      │ Cloud Hypervisor │          │
│  │ (cAdvisor compat)│ ──── │ → cAdvisor       │          │
│  └──────────────────┘      │ format mapper    │          │
│                            └──────────────────┘          │
│                                                          │
└──────────────────────────────────────────────────────────┘
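The diagram shows both caches wrapped in `Arc<RwLock>`, written by background tasks and read by the HTTP server. A minimal sketch of that sharing pattern follows. It assumes the standard library's `RwLock` (the real daemon may use an async lock instead), and `SharedCache`, `publish`, and `render_all` are hypothetical names used only for illustration.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical shared cache: sandbox ID -> metrics text.
type SharedCache = Arc<RwLock<HashMap<String, String>>>;

/// Collector side: the caller builds `next` without holding the lock,
/// then this function swaps it in under a short write lock.
fn publish(cache: &SharedCache, next: HashMap<String, String>) {
    let mut guard = cache.write().expect("cache lock poisoned");
    *guard = next;
}

/// HTTP side: joins all cached metrics under a read lock,
/// ordered by sandbox ID so the output is deterministic.
fn render_all(cache: &SharedCache) -> String {
    let guard = cache.read().expect("cache lock poisoned");
    let mut entries: Vec<(&String, &String)> = guard.iter().collect();
    entries.sort();
    entries
        .into_iter()
        .map(|(_, text)| text.as_str())
        .collect::<Vec<_>>()
        .join("\n")
}
```

Replacing the whole map in one assignment keeps the write lock brief, so readers are blocked only for the duration of the swap rather than for the entire collection cycle.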
- HTTP Server (Axum) - High-performance async HTTP server for endpoints
  - GET / - Index page (HTML or plain text, based on the Accept header)
  - GET /metrics - Aggregated or per-sandbox metrics in Prometheus format
  - GET /sandboxes - List all running sandboxes with metadata
- Metrics Collector - Background task that periodically:
  - Queries active sandboxes from cache
  - Fetches metrics from per-sandbox shims via Unix sockets
  - Parses Prometheus format metrics
  - Stores metrics in thread-safe cache (double-buffered)
- Sandbox Cache Manager - Tracks sandbox lifecycle:
  - Watches /run/vc/sbs and /run/kata directories for additions/deletions
  - Syncs metadata with CRI runtime every 5 seconds
  - Maintains CRI metadata (pod name, namespace, UID)
  - Cleans up stale metrics when sandboxes terminate
- CRI Client - Kubernetes container runtime integration:
  - gRPC connection to containerd CRI endpoint
  - Enriches sandbox metadata with Kubernetes pod information
  - Handles retries and connection management
- Metrics Converter - Cloud Hypervisor format transformation:
  - Parses Prometheus metrics from shim (gauge format with labels)
  - Converts CPU time (microseconds → seconds), memory (KB), network (bytes), disk I/O
  - Enriches with Kubernetes labels (pod_name, namespace, uid)
  - Outputs cAdvisor-compatible format for Prometheus scraping
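The unit conversions performed by the Metrics Converter can be sketched as below. The function names are hypothetical, and treating 1 KB as 1024 bytes is an assumption made for this example; the source does not say which definition the shim uses.

```rust
/// Converts a CPU time counter from microseconds to seconds.
fn cpu_micros_to_seconds(micros: u64) -> f64 {
    micros as f64 / 1_000_000.0
}

/// Converts a memory figure from KB to bytes.
/// Assumption: 1 KB is treated as 1024 bytes here.
fn kb_to_bytes(kb: u64) -> u64 {
    kb.saturating_mul(1024)
}
```

Under that assumption, 524288 KB maps to 536870912 bytes, and 1234500000 microseconds of CPU time maps to 1234.5 seconds.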
Output is cAdvisor-compatible Prometheus format:
# CPU metrics
container_cpu_usage_seconds_total{container="",id="/kubepods/...",image="",name="my-pod",namespace="default",pod="my-pod",cpu="total"} 1234.5
# Memory metrics
container_memory_usage_bytes{container="",id="/kubepods/...",image="",name="my-pod",namespace="default",pod="my-pod"} 536870912
# Network metrics (per-interface)
container_network_receive_bytes_total{container="",id="/kubepods/...",image="",name="my-pod",namespace="default",pod="my-pod",interface="eth0"} 1024000
# Disk I/O metrics (per-device)
container_blkio_device_usage_total{container="",device="",id="/kubepods/...",image="",major="8",minor="0",name="my-pod",namespace="default",operation="Read",pod="my-pod"} 2000000
# Process/task metrics
container_processes_count{container="",id="/kubepods/...",image="",name="my-pod",namespace="default",pod="my-pod"} 42
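Lines like the ones above follow the Prometheus text exposition format: a metric name, a brace-enclosed label set, and a value. The sketch below shows one way such a line could be rendered. `format_sample` and `escape_label` are hypothetical helpers written for this example, not functions from kata-pulse.

```rust
/// Escapes backslash, double quote, and newline inside a label value,
/// as the Prometheus text format requires.
fn escape_label(value: &str) -> String {
    value
        .replace('\\', "\\\\")
        .replace('"', "\\\"")
        .replace('\n', "\\n")
}

/// Renders one sample line: name{k1="v1",k2="v2"} value
fn format_sample(name: &str, labels: &[(&str, &str)], value: f64) -> String {
    let rendered: Vec<String> = labels
        .iter()
        .map(|(key, val)| format!("{}=\"{}\"", key, escape_label(val)))
        .collect();
    format!("{}{{{}}} {}", name, rendered.join(","), value)
}
```

For instance, `format_sample("container_processes_count", &[("namespace", "default"), ("pod", "my-pod")], 42.0)` yields `container_processes_count{namespace="default",pod="my-pod"} 42`.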
# Debug build
cargo build
# Release build (optimized)
cargo build --release
# Check without building
cargo check
# Run all tests
cargo test
# Run specific test
cargo test test_name
# Run with output
cargo test -- --nocapture
# Check coverage
cargo tarpaulin --out Html
# Format code
cargo fmt
# Check formatting
cargo fmt -- --check
# Lint with clippy
cargo clippy -- -D warnings
# Security audit
cargo audit
The easiest way to deploy kata-pulse to Kubernetes is using the official Helm chart.
Prerequisites:
- Kubernetes 1.20+
- Helm 3.0+
- Prometheus Operator (for PodMonitor integration)
- Kata Containers runtime installed on nodes
Installation:
# Install from GHCR
helm install kata-pulse oci://ghcr.io/diverofdark/kata-pulse-chart
# Or to specific namespace
helm install kata-pulse oci://ghcr.io/diverofdark/kata-pulse-chart -n monitoring --create-namespace
# With custom values
helm install kata-pulse oci://ghcr.io/diverofdark/kata-pulse-chart \
--set config.logLevel=debug \
--set config.metricsIntervalSecs=30
Chart Configuration:
| Key | Default | Description |
|---|---|---|
| image.pullPolicy | Always | Image pull policy |
| config.runtimeEndpoint | /run/containerd/containerd.sock | CRI runtime socket |
| config.metricsIntervalSecs | 60 | Metrics collection interval (seconds) |
| config.logLevel | info | Log level (trace/debug/info/warn/error) |
| resources.requests.cpu | 50m | CPU request |
| resources.requests.memory | 100Mi | Memory request |
| resources.limits.cpu | 100m | CPU limit |
| resources.limits.memory | 200Mi | Memory limit |
| podMonitor.enabled | true | Enable Prometheus PodMonitor |
| podMonitor.interval | 30s | Scrape interval |
Uninstall:
helm uninstall kata-pulse
For detailed chart documentation, see helm/kata-pulse/README.md.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kata-pulse
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: kata-pulse
  template:
    metadata:
      labels:
        app: kata-pulse
    spec:
      hostNetwork: true
      containers:
        - name: kata-pulse
          image: ghcr.io/kata-containers/kata-pulse:latest
          ports:
            - name: metrics
              containerPort: 8090
              hostPort: 8090
          env:
            - name: RUST_LOG
              value: info
          volumeMounts:
            - name: sandbox-dir
              mountPath: /run/vc/sbs
              readOnly: true
            - name: kata-dir
              mountPath: /run/kata
              readOnly: true
            - name: containerd-socket
              mountPath: /run/containerd
              readOnly: true
          livenessProbe:
            httpGet:
              path: /
              port: metrics
            initialDelaySeconds: 5
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /
              port: metrics
            initialDelaySeconds: 5
            periodSeconds: 10
      volumes:
        - name: sandbox-dir
          hostPath:
            path: /run/vc/sbs
        - name: kata-dir
          hostPath:
            path: /run/kata
        - name: containerd-socket
          hostPath:
            path: /run/containerd
- Check logs
  docker logs kata-pulse
  RUST_LOG=debug ./target/release/kata-pulse
- Verify sandbox connectivity
  ls /run/vc/sbs # Should see sandbox directories
- Check CRI socket
  ls -la /run/containerd/containerd.sock
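The directory check above can also be done programmatically, which is roughly what a discovery pass over /run/vc/sbs has to do. The sketch below assumes each sandbox appears as one sub-directory named after its ID; `list_sandbox_ids` is a hypothetical helper, not part of kata-pulse.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns the names of the sub-directories of `dir`, sorted.
/// Assumption: each sandbox is one directory named after its ID.
fn list_sandbox_ids(dir: &Path) -> io::Result<Vec<String>> {
    let mut ids = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // Skip plain files; only directories are treated as sandboxes.
        if entry.file_type()?.is_dir() {
            ids.push(entry.file_name().to_string_lossy().into_owned());
        }
    }
    ids.sort();
    Ok(ids)
}
```

An empty result for `Path::new("/run/vc/sbs")` while pods are running suggests the host path is not mounted into the container, and an `Err` usually points to a missing directory or insufficient permissions.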
- Adjust metrics cache cleanup
- Reduce metrics collection frequency
- Monitor number of active sandboxes
- Check network connectivity to CRI
- Review Prometheus scrape interval
- Check system load and available resources
| Metric | Value |
|---|---|
| Memory per sandbox | ~2-5 MB |
| Metrics latency | <1 second |
| CPU overhead | <1% per 100 sandboxes |
| Typical startup time | <2 seconds |
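For capacity planning, the per-sandbox memory figure in the table can be turned into a quick estimate. This is only back-of-the-envelope arithmetic on the approximate 2-5 MB range stated above; `estimate_memory_mb` is a hypothetical helper and ignores the daemon's fixed baseline usage.

```rust
/// Estimates the (low, high) memory range in MB for `sandboxes`
/// monitored sandboxes, using the approximate 2-5 MB per-sandbox figure.
fn estimate_memory_mb(sandboxes: u64) -> (u64, u64) {
    (sandboxes * 2, sandboxes * 5)
}
```

For 100 sandboxes this gives roughly 200 to 500 MB, which exceeds the chart's default 200Mi memory limit at the upper end, so the limit may need raising on dense nodes.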
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feat/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feat/amazing-feature)
- Open a Pull Request
- Follow Rust naming conventions
- Write tests for new features
- Update documentation
- Run cargo fmt and cargo clippy
- Ensure all tests pass: cargo test
This project is licensed under the Apache License 2.0 - see individual source files for details.
For issues, questions, or suggestions:
- Check GitHub Issues
- Review documentation
- Open a new issue with detailed information
This project was inspired by kata-monitor, the original Go-based monitoring agent for Kata Containers. kata-pulse is a complete rewrite in Rust with significant improvements in architecture, performance, and maintainability.
Built with:
- Rust (Edition 2021)
- Tokio async runtime
- Axum HTTP framework
- Prometheus metrics format
- Docker multi-stage builds with distroless
- Cloud Hypervisor metrics integration
- Kubernetes CRI integration