Nimbus

Nimbus is a self-hosted CI platform built around Firecracker microVMs, org-scoped storage, and end-to-end observability for AI evaluation workloads. It replaces GitHub-hosted runners while keeping execution on infrastructure you control.

Documentation

Architecture Overview

Control Plane (src/nimbus/control_plane): Validates GitHub workflow_job webhooks (HMAC, timestamp, delivery replay fence), enforces distributed per-org rate limits, issues agent/cache tokens, and brokers job leases over Redis + Postgres. It also exposes SAML SSO, SCIM provisioning, service accounts, and compliance export logging.
Host Agent (src/nimbus/host_agent): Runs Firecracker microVMs with snapshot boot and network fencing, plus Docker and GPU executors selected by capability labels. The agent layers in warm pools, resource/performance telemetry, offline egress enforcement, SBOM generation, and supply-chain allow/deny policies.
Executor System (src/nimbus/runners): Pluggable executors share a common Executor protocol, pool manager, resource tracker, and watchdogs for timeouts and lease renewal fencing.
Cache Proxy (src/nimbus/cache_proxy): Multi-tenant cache front-end with S3/local backends, org quotas, eviction metrics, and circuit-breakers to isolate backend failures.
Docker Layer Cache (src/nimbus/docker_cache): Minimal OCI registry enforcing org-prefixed repositories, blob accounting, and scoped cache tokens for push/pull.
Logging Pipeline (src/nimbus/logging_pipeline): ClickHouse-backed ingestion API with batched writes, scoped query filters, and hardened metrics endpoints.
Web Dashboard (web): React/Vite SPA that surfaces job queues, agent health, logs, and compliance metadata via the public API.

Security Highlights

Lease fencing with rotating fence tokens prevents duplicate job execution across agents.
Webhook signature + timestamp validation with replay tracking (x-github-delivery) blocks tampering and replays.
Agent/cache/service-account tokens are org scoped, versioned, and auditable through Postgres-backed ledgers.
Offline-mode egress enforcement combines metadata endpoint deny-lists, regex policy packs, and explicit registry allow-lists.
Rootfs attestation, cosign provenance checks, and per-job SBOM generation tighten host supply-chain posture.
Metrics and admin endpoints require bearer tokens and can be IP-filtered for additional hardening.

Quick Start

Install dependencies and bootstrap the environment (uv venv, uv pip install -e .).
Configure the required environment variables and secrets (see Configuration).
Follow the detailed setup in Getting Started to launch services and supporting infrastructure.
Run the automated checks with uv run pytest to validate control plane, host agent, caching, and executor integrations.

GitHub Actions Integration

Workflows can target Nimbus runners using capability-based labels:

# Secure isolation (default)
runs-on: [nimbus]  # Uses Firecracker microVMs

# Fast startup for CI/CD
runs-on: [nimbus, docker]  # ~200ms startup

# GPU acceleration for ML/AI
runs-on: [nimbus, gpu, pytorch, gpu-count:2]  # 2 GPUs

# Custom configurations
runs-on: [nimbus, docker, image:node:18-alpine]

The control plane verifies workflow_job signatures, enforces per-org rate limits, and dispatches jobs to agents based on capability matching.

Pre-built Runners

Nimbus publishes curated container images, such as nimbus/ai-eval-runner (Node.js, eval2otel, ollama client) for evaluation workloads. See Getting Started for example usage.

Operations & Hardening

Day-two procedures, monitoring, and ClickHouse schema live in the Operations Guide.
Firecracker jailer, seccomp, and capability dropping guidance is documented in Firecracker Security Hardening.

Repository Layout

src/nimbus/control_plane: FastAPI application, database models, RBAC/SCIM/SAML integrations, compliance tooling.
src/nimbus/host_agent: Firecracker launcher, multi-executor orchestration, warm pools, egress enforcement, and SSH utilities.
src/nimbus/runners: Executor implementations (Firecracker, Docker, GPU), pool manager, resource tracker, performance monitor.
src/nimbus/cache_proxy & src/nimbus/docker_cache: Artifact/cache services with metrics, quota enforcement, and S3/OCI backends.
src/nimbus/logging_pipeline: ClickHouse ingestion and querying service for job logs.
web: Vite/React dashboard for operational monitoring.
tests: Extensive pytest suite covering services, executors, security controls, and CLI tooling.

Roadmap Snapshot

Complete: Multi-tenant isolation, lease fencing, webhook replay protection, distributed rate limiting, metrics endpoint authentication, tenant analytics dashboard, multi-executor system with Firecracker/Docker/GPU support, warm pools, snapshot boot, comprehensive performance monitoring.
In Progress: Enhanced GPU scheduling, ARM64 support, advanced resource optimization.
Planned: Kubernetes executor, Windows containers, auto-scaling warm pools, cost optimization features.

Contributing

Nimbus is ready for production deployments with a mature multi-executor architecture. See the Executor System Guide for comprehensive usage documentation. Contributions welcome in:

New executor implementations (Kubernetes, ARM64, Windows)
Advanced GPU scheduling and optimization
Performance analysis and cost optimization
Extended warm pool strategies

Name		Name	Last commit message	Last commit date
Latest commit History 216 Commits
.github/workflows		.github/workflows
containers/ai-eval-runner		containers/ai-eval-runner
deploy		deploy
deployment		deployment
docs		docs
examples/github-actions-migration		examples/github-actions-migration
migrations		migrations
scripts		scripts
src		src
tests		tests
tools		tools
web		web
.gitignore		.gitignore
Dockerfile		Dockerfile
Makefile		Makefile
README.md		README.md
compose.env.sample		compose.env.sample
compose.yaml		compose.yaml
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Uh oh!

Repository files navigation

Nimbus

Documentation

Architecture Overview

Security Highlights

Quick Start

GitHub Actions Integration

Pre-built Runners

Operations & Hardening

Repository Layout

Roadmap Snapshot

Contributing

About

Uh oh!

Releases

Packages

Contributors 3

Uh oh!

Languages

Uh oh!

Uh oh!

evalops/nimbus

Folders and files

Latest commit

History

Repository files navigation

Nimbus

Documentation

Architecture Overview

Security Highlights

Quick Start

GitHub Actions Integration

Pre-built Runners

Operations & Hardening

Repository Layout

Roadmap Snapshot

Contributing

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Uh oh!

Languages

Packages