kata-pulse

Real-time metrics for Kata Containers. kata-pulse is a cAdvisor-compatible monitoring agent that collects metrics from Kata Containers environments.

Overview

kata-pulse is a lightweight Rust-based monitoring daemon that:

📊 Collects metrics from Kata Container sandboxes
🔄 Aggregates metrics across all running sandboxes
🏷️ Maps Cloud Hypervisor metrics to cAdvisor-compatible format
🔗 Discovers per-sandbox monitoring agents
🎯 Integrates seamlessly with Kubernetes and Prometheus monitoring stacks

Features

Core Capabilities

Multi-Sandbox Metrics Collection - Monitor metrics from multiple Kata sandboxes simultaneously
Automatic Sandbox Discovery - Detects new sandboxes from the filesystem and the CRI runtime
Kubernetes Integration - Enriches metrics with pod names, namespaces, and UIDs from CRI
cAdvisor Compatibility - Outputs metrics in cAdvisor-compatible Prometheus format
Automatic Cleanup - Removes cached metrics when sandboxes are deleted
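
The automatic cleanup described above can be pictured as a set difference between the cached sandbox ids and the currently active ones. The following is a minimal sketch, not the project's actual code: the function name `evict_stale` and the choice of `HashMap<String, String>` for the cache are assumptions made for illustration.

```rust
use std::collections::{HashMap, HashSet};

/// Drop cached metrics for sandboxes that are no longer active.
/// Returns the evicted ids, sorted so the result is deterministic.
/// (Hypothetical helper; the real cache type may differ.)
fn evict_stale(cache: &mut HashMap<String, String>, active: &HashSet<String>) -> Vec<String> {
    // Collect first, then remove, so the map is not mutated while iterating.
    let mut evicted: Vec<String> = cache
        .keys()
        .filter(|id| !active.contains(*id))
        .cloned()
        .collect();
    evicted.sort();
    for id in &evicted {
        cache.remove(id);
    }
    evicted
}
```

Calling it after each discovery pass with the latest set of active ids would keep the cache from growing as sandboxes come and go.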

Performance

Zero-Copy Architecture - Shares cached data between tasks through Arc instead of cloning it
Async/Await - Tokio-based asynchronous operations for high throughput
Caching Strategy - In-memory caches for sandbox metadata and metrics
Configurable Intervals - Adjustable metrics collection frequency
Layer Caching - Docker multi-stage build with dependency caching

Monitoring

Prometheus Metrics - Exposes health and performance metrics
Health Checks - HTTP endpoint for container health verification
Structured Logging - Tracing-based logging with configurable levels
Error Tracking - Comprehensive error handling and reporting

Quick Start

Docker

# Pull from GitHub Container Registry
docker pull ghcr.io/kata-containers/kata-pulse:latest

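# Run with defaults
# (the documented listen default is 127.0.0.1:8090; bind to all interfaces
#  so the published port is reachable from outside the container)
docker run -d \
  --name kata-pulse \
  -p 8090:8090 \
  -e KATA_PULSE_LISTEN=0.0.0.0:8090 \
  -v /run/kata:/run/kata:ro \
  -v /run/vc/sbs:/run/vc/sbs:ro \
  -v /run/containerd/containerd.sock:/run/containerd/containerd.sock:ro \
  ghcr.io/kata-containers/kata-pulse:latest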

# Check metrics
curl http://localhost:8090/metrics

From Source

# Clone repository
git clone https://github.com/kata-containers/kata-pulse.git
cd kata-pulse

# Build release binary
cargo build --release

# Run
./target/release/kata-pulse

# Or with custom config (the log level is set through RUST_LOG)
RUST_LOG=info ./target/release/kata-pulse \
  --listen-address 0.0.0.0:8090 \
  --runtime-endpoint /run/containerd/containerd.sock

Configuration

Environment Variables

# HTTP server configuration
KATA_PULSE_LISTEN=127.0.0.1:8090              # Listen address (default)
RUST_LOG=info                                   # Log level (trace/debug/info/warn/error)

# Container runtime
RUNTIME_ENDPOINT=/run/containerd/containerd.sock  # CRI socket path

# Metrics collection
KATA_PULSE_METRICS_INTERVAL=60                # Interval in seconds (default: 60)
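
To illustrate how these variables and their defaults fit together, here is a minimal sketch of a loader. It is not the project's implementation: the `Config` struct, `load_config`, and the decision to fall back to 60 seconds on an unparsable interval are assumptions made for this example. Only the variable names and default values come from the list above.

```rust
/// Hypothetical configuration holder mirroring the documented variables.
#[derive(Debug, PartialEq)]
struct Config {
    listen: String,
    runtime_endpoint: String,
    metrics_interval_secs: u64,
}

/// Build a Config from a variable lookup, falling back to the documented defaults.
/// The lookup is injected so the logic can be exercised without touching
/// the real process environment.
fn load_config<F>(lookup: F) -> Config
where
    F: Fn(&str) -> Option<String>,
{
    Config {
        listen: lookup("KATA_PULSE_LISTEN").unwrap_or_else(|| "127.0.0.1:8090".to_string()),
        runtime_endpoint: lookup("RUNTIME_ENDPOINT")
            .unwrap_or_else(|| "/run/containerd/containerd.sock".to_string()),
        // An unparsable interval falls back to the default (a choice made for this sketch).
        metrics_interval_secs: lookup("KATA_PULSE_METRICS_INTERVAL")
            .and_then(|v| v.parse().ok())
            .unwrap_or(60),
    }
}

/// Convenience wrapper that reads the real process environment.
fn load_config_from_env() -> Config {
    load_config(|key| std::env::var(key).ok())
}
```

Injecting the lookup keeps the defaulting rules testable in isolation; `load_config_from_env` is the variant a binary would call at startup.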

Command Line Arguments

./target/release/kata-pulse --help

OPTIONS:
  -l, --listen-address <LISTEN_ADDRESS>
          HTTP server listen address
          [default: 127.0.0.1:8090]
          [env: KATA_PULSE_LISTEN]

  -r, --runtime-endpoint <RUNTIME_ENDPOINT>
          CRI runtime socket path
          [default: /run/containerd/containerd.sock]
          [env: RUNTIME_ENDPOINT]

  -m, --metrics-interval-secs <METRICS_INTERVAL_SECS>
          Metrics collection interval in seconds
          [default: 60]
          [env: KATA_PULSE_METRICS_INTERVAL]

  -h, --help
          Print help

API Endpoints

GET /

Returns HTML or plain text index page (based on Accept header)

curl http://localhost:8090/

GET /metrics

Aggregated metrics from all sandboxes in Prometheus format

curl http://localhost:8090/metrics
curl 'http://localhost:8090/metrics?sandbox=sandbox-123'  # Per-sandbox (quoted so the shell does not interpret "?")
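
As a rough illustration of the per-sandbox variant, the sketch below pulls the `sandbox` parameter out of a raw query string. The function name `sandbox_filter` is hypothetical, and the sketch deliberately skips percent-decoding; a real server would normally rely on its HTTP framework's query extraction instead.

```rust
/// Extract the value of the `sandbox` query parameter, if present and non-empty.
/// `query` is the part of the URL after `?`, for example "sandbox=sandbox-123".
/// No percent-decoding is performed in this sketch.
fn sandbox_filter(query: &str) -> Option<&str> {
    query
        .split('&')
        .filter_map(|pair| pair.split_once('='))
        .find(|(key, _)| *key == "sandbox")
        .map(|(_, value)| value)
        .filter(|value| !value.is_empty())
}
```

A `None` result would correspond to the aggregated view, and `Some(id)` to the metrics of a single sandbox.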

GET /sandboxes

List all running sandboxes

curl http://localhost:8090/sandboxes

[
  {
    "sandbox_id": "abc123...",
    "pod_name": "my-pod",
    "namespace": "default",
    "uid": "12345-67890"
  }
]

Architecture

┌──────────────────────────────────────────────────────────┐
│                   kata-pulse Daemon                      │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  ┌──────────────────┐         ┌──────────────────┐       │
│  │  HTTP Server     │         │  Metrics Cache   │       │
│  │  (Axum)          │──────→  │  (Arc<RwLock>)   │       │
│  └──────────────────┘         └──────────────────┘       │
│       ↓                              ↑                   │
│  GET /metrics                   Updated every           │
│  GET /sandboxes                 60 seconds              │
│                                                          │
│  ┌──────────────────┐         ┌──────────────────┐       │
│  │ Metrics Collector│         │ Sandbox Cache    │       │
│  │ (Tokio Task)     │──────→  │ (Arc<RwLock>)    │       │
│  └──────────────────┘         └──────────────────┘       │
│       ↓                             ↑                    │
│   Per-sandbox shim    ← CRI Sync Task (every 5s)         │
│   Unix domain sockets                                   │
│                                                          │
│  ┌──────────────────┐         ┌──────────────────┐       │
│  │ Sandbox Manager  │         │ Directory        │       │
│  │ (SandboxCache    │ ←────→  │ Watcher          │       │
│  │  + CRI Client)   │         │ (/run/vc/sbs,    │       │
│  └──────────────────┘         │  /run/kata)      │       │
│                               └──────────────────┘       │
│                                                          │
│  ┌──────────────────┐         ┌──────────────────┐       │
│  │ Metrics Converter│         │ Cloud Hypervisor │       │
│  │ (cAdvisor compat)│ ←────→  │ → cAdvisor       │       │
│  └──────────────────┘         │ format mapper    │       │
│                               └──────────────────┘       │
│                                                          │
└──────────────────────────────────────────────────────────┘
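
The `Arc<RwLock>` caches in the diagram follow a common pattern: one background writer replaces the contents periodically while many HTTP handlers read concurrently. The sketch below shows that pattern using only the standard library. It is an approximation under stated assumptions: the diagram shows Tokio tasks, whereas this example uses a plain thread, and the names `MetricsCache`, `publish`, `read_metrics`, and `collect_once` are illustrative.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

/// Sandbox id -> latest metrics text. The alias name is illustrative.
type MetricsCache = Arc<RwLock<HashMap<String, String>>>;

/// Writer side: the snapshot is built outside the lock, then swapped in,
/// so readers are blocked only for the duration of the assignment.
fn publish(cache: &MetricsCache, snapshot: HashMap<String, String>) {
    let mut guard = cache.write().expect("metrics cache lock poisoned");
    *guard = snapshot;
}

/// Reader side: clone a single entry under a short-lived read lock.
fn read_metrics(cache: &MetricsCache, sandbox_id: &str) -> Option<String> {
    let guard = cache.read().expect("metrics cache lock poisoned");
    guard.get(sandbox_id).cloned()
}

/// Simulate one collection cycle on a background thread.
fn collect_once(cache: &MetricsCache) {
    let writer = Arc::clone(cache);
    let handle = thread::spawn(move || {
        let mut snapshot = HashMap::new();
        snapshot.insert(
            "abc123".to_string(),
            "container_processes_count 42".to_string(),
        );
        publish(&writer, snapshot);
    });
    handle.join().expect("collector thread panicked");
}
```

Building the whole snapshot before taking the write lock keeps lock hold times short, which is in the spirit of the double-buffering mentioned for the metrics cache.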

Key Components

HTTP Server (Axum) - High-performance async HTTP server for endpoints
- GET / - Index page (HTML or plain text, based on the Accept header)
- GET /metrics - Aggregated or per-sandbox metrics in Prometheus format
- GET /sandboxes - List all running sandboxes with metadata
Metrics Collector - Background task that periodically:
- Queries active sandboxes from cache
- Fetches metrics from per-sandbox shims via Unix sockets
- Parses Prometheus format metrics
- Stores metrics in thread-safe cache (double-buffered)
Sandbox Cache Manager - Tracks sandbox lifecycle:
- Watches /run/vc/sbs and /run/kata directories for additions/deletions
- Syncs metadata with CRI runtime every 5 seconds
- Maintains CRI metadata (pod name, namespace, UID)
- Cleans up stale metrics when sandboxes terminate
CRI Client - Kubernetes container runtime integration:
- gRPC connection to containerd CRI endpoint
- Enriches sandbox metadata with Kubernetes pod information
- Handles retries and connection management
Metrics Converter - Cloud Hypervisor format transformation:
- Parses Prometheus metrics from shim (gauge format with labels)
- Converts CPU time (microseconds → seconds), memory (KB), network (bytes), disk I/O
- Enriches with Kubernetes labels (pod_name, namespace, uid)
- Outputs cAdvisor-compatible format for Prometheus scraping
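
The unit conversions performed by the Metrics Converter are simple arithmetic. The sketch below shows two of them; the function names are hypothetical, and the KB-to-bytes step assumes 1 KB = 1024 bytes, which the text above does not state explicitly.

```rust
/// CPU time reported in microseconds, converted to seconds.
fn cpu_usec_to_seconds(usec: u64) -> f64 {
    usec as f64 / 1_000_000.0
}

/// Memory reported in KB, converted to bytes.
/// Assumes 1 KB = 1024 bytes (an assumption of this sketch).
fn memory_kb_to_bytes(kb: u64) -> u64 {
    kb * 1024
}
```

For example, 1,234,500,000 microseconds of CPU time become 1234.5 seconds, and 524,288 KB become 536,870,912 bytes.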

Metrics Format

Output is cAdvisor-compatible Prometheus format:

# CPU metrics
container_cpu_usage_seconds_total{container="",id="/kubepods/...",image="",name="my-pod",namespace="default",pod="my-pod",cpu="total"} 1234.5

# Memory metrics
container_memory_usage_bytes{container="",id="/kubepods/...",image="",name="my-pod",namespace="default",pod="my-pod"} 536870912

# Network metrics (per-interface)
container_network_receive_bytes_total{container="",id="/kubepods/...",image="",name="my-pod",namespace="default",pod="my-pod",interface="eth0"} 1024000

# Disk I/O metrics (per-device)
container_blkio_device_usage_total{container="",device="",id="/kubepods/...",image="",major="8",minor="0",name="my-pod",namespace="default",operation="Read",pod="my-pod"} 2000000

# Process/task metrics
container_processes_count{container="",id="/kubepods/...",image="",name="my-pod",namespace="default",pod="my-pod"} 42
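
Each line above follows the Prometheus text exposition format: a metric name, a brace-enclosed list of `key="value"` labels, and a numeric value. A minimal sketch of rendering one such line is shown below. `format_sample` is a hypothetical helper, not a function from the project, and it omits optional parts of the format such as timestamps and `# HELP` / `# TYPE` lines.

```rust
/// Render one sample in the Prometheus text exposition format.
/// Label values are escaped for backslash, double quote and newline.
fn format_sample(name: &str, labels: &[(&str, &str)], value: f64) -> String {
    let rendered: Vec<String> = labels
        .iter()
        .map(|(key, val)| {
            // Backslashes must be escaped first so later escapes are not doubled.
            let escaped = val
                .replace('\\', "\\\\")
                .replace('"', "\\\"")
                .replace('\n', "\\n");
            format!("{}=\"{}\"", key, escaped)
        })
        .collect();
    format!("{}{{{}}} {}", name, rendered.join(","), value)
}
```

Labels are emitted in the order given, so a caller that wants the alphabetical ordering seen in the samples above would sort the slice first.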

Development

Build

# Debug build
cargo build

# Release build (optimized)
cargo build --release

# Check without building
cargo check

Testing

# Run all tests
cargo test

# Run specific test
cargo test test_name

# Run with output
cargo test -- --nocapture

# Check coverage
cargo tarpaulin --out Html

Code Quality

# Format code
cargo fmt

# Check formatting
cargo fmt -- --check

# Lint with clippy
cargo clippy -- -D warnings

# Security audit
cargo audit

Kubernetes Deployment

Helm Chart

The easiest way to deploy kata-pulse to Kubernetes is using the official Helm chart.

Prerequisites:

Kubernetes 1.20+
Helm 3.0+
Prometheus Operator (for PodMonitor integration)
Kata Containers runtime installed on nodes

Installation:

# Install from GHCR
helm install kata-pulse oci://ghcr.io/diverofdark/kata-pulse-chart

# Or to specific namespace
helm install kata-pulse oci://ghcr.io/diverofdark/kata-pulse-chart -n monitoring --create-namespace

# With custom values
helm install kata-pulse oci://ghcr.io/diverofdark/kata-pulse-chart \
  --set config.logLevel=debug \
  --set config.metricsIntervalSecs=30

Chart Configuration:

Key	Default	Description
`image.pullPolicy`	`Always`	Image pull policy
`config.runtimeEndpoint`	`/run/containerd/containerd.sock`	CRI runtime socket
`config.metricsIntervalSecs`	`60`	Metrics collection interval
`config.logLevel`	`info`	Log level (trace/debug/info/warn/error)
`resources.requests.cpu`	`50m`	CPU request
`resources.requests.memory`	`100Mi`	Memory request
`resources.limits.cpu`	`100m`	CPU limit
`resources.limits.memory`	`200Mi`	Memory limit
`podMonitor.enabled`	`true`	Enable Prometheus PodMonitor
`podMonitor.interval`	`30s`	Scrape interval

Uninstall:

helm uninstall kata-pulse

For detailed chart documentation, see helm/kata-pulse/README.md.

DaemonSet Example

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kata-pulse
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: kata-pulse
  template:
    metadata:
      labels:
        app: kata-pulse
    spec:
      hostNetwork: true
      containers:
      - name: kata-pulse
        image: ghcr.io/kata-containers/kata-pulse:latest
        ports:
        - name: metrics
          containerPort: 8090
          hostPort: 8090
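        env:
        - name: RUST_LOG
          value: info
        # Bind to all interfaces so the probes and hostPort can reach the server
        # (the documented default is 127.0.0.1:8090)
        - name: KATA_PULSE_LISTEN
          value: "0.0.0.0:8090"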
        volumeMounts:
        - name: sandbox-dir
          mountPath: /run/vc/sbs
          readOnly: true
        - name: kata-dir
          mountPath: /run/kata
          readOnly: true
        - name: containerd-socket
          mountPath: /run/containerd
          readOnly: true
        livenessProbe:
          httpGet:
            path: /
            port: metrics
          initialDelaySeconds: 5
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /
            port: metrics
          initialDelaySeconds: 5
          periodSeconds: 10
      volumes:
      - name: sandbox-dir
        hostPath:
          path: /run/vc/sbs
      - name: kata-dir
        hostPath:
          path: /run/kata
      - name: containerd-socket
        hostPath:
          path: /run/containerd

Troubleshooting

No metrics appearing

Check logs

docker logs kata-pulse
RUST_LOG=debug ./target/release/kata-pulse

Verify sandbox connectivity

ls /run/vc/sbs  # Should see sandbox directories

Check CRI socket

ls -la /run/containerd/containerd.sock
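
The sandbox directory check can also be done programmatically. The sketch below lists the subdirectories of a root such as /run/vc/sbs and treats each directory name as a sandbox id. Both the helper name `list_sandbox_ids` and the one-directory-per-sandbox interpretation are assumptions made for illustration.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Return the names of the subdirectories of `root`, sorted.
/// Each subdirectory is treated as one sandbox id (an assumption of this sketch).
fn list_sandbox_ids(root: &Path) -> io::Result<Vec<String>> {
    let mut ids = Vec::new();
    for entry in fs::read_dir(root)? {
        let entry = entry?;
        // Plain files are ignored; only directories count as sandboxes here.
        if entry.file_type()?.is_dir() {
            ids.push(entry.file_name().to_string_lossy().into_owned());
        }
    }
    ids.sort();
    Ok(ids)
}
```

An empty result on a node that is running Kata pods would suggest the directory is not mounted into the container or the path is wrong.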

High memory usage

Adjust metrics cache cleanup
Reduce metrics collection frequency (raise KATA_PULSE_METRICS_INTERVAL or --metrics-interval-secs)
Monitor number of active sandboxes

Slow metrics collection

Check connectivity to the CRI socket
Review Prometheus scrape interval
Check system load and available resources

Performance Considerations

Metric	Value
Memory per sandbox	~2-5 MB
Metrics latency	<1 second
CPU overhead	<1% per 100 sandboxes
Typical startup time	<2 seconds
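
The per-sandbox memory figure can be turned into a rough capacity estimate. The sketch below simply multiplies the ~2-5 MB range from the table by the sandbox count; it ignores any fixed baseline usage of the daemon itself, and the function name is hypothetical.

```rust
/// Estimated (low, high) memory range in MB for `sandboxes` sandboxes,
/// using the ~2-5 MB per sandbox figure. Baseline daemon usage is not included.
fn estimated_memory_mb(sandboxes: u64) -> (u64, u64) {
    (sandboxes * 2, sandboxes * 5)
}
```

For 100 sandboxes this gives a range of 200 to 500 MB, which is worth comparing against the configured memory limit on dense nodes.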

Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch (git checkout -b feat/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feat/amazing-feature)
Open a Pull Request

Development Guidelines

Follow Rust naming conventions
Write tests for new features
Update documentation
Run cargo fmt and cargo clippy
Ensure all tests pass: cargo test

License

This project is licensed under the Apache License 2.0 - see individual source files for details.

Support

For issues, questions, or suggestions:

Check GitHub Issues
Review documentation
Open a new issue with detailed information

Acknowledgments

This project was inspired by kata-monitor, the original Go-based monitoring agent for Kata Containers. kata-pulse is a complete rewrite in Rust with significant improvements in architecture, performance, and maintainability.

Built with:

🦀 Rust (Edition 2021)
⚡ Tokio async runtime
🌐 Axum HTTP framework
📊 Prometheus metrics format
🐳 Docker multi-stage builds with distroless
🏗️ Cloud Hypervisor metrics integration
☸️ Kubernetes CRI integration
