High-Performance S3 Ingress Controller (BGP/ECMP or VRRP)
Herr Ober ("Head Waiter") is a lightweight, high-throughput (50GB/s+) ingress controller designed for Ceph RGW clusters. It uses HAProxy 3.3 (AWS-LC) for SSL offloading with two HA modes:
- BGP/ECMP Mode - Uses ExaBGP for Layer 3 HA via BGP (requires BGP-capable router)
- Keepalived Mode - Uses VRRP for VIP failover (no BGP required)
Supported: Ubuntu, Debian, RHEL 10+ on Proxmox VMs (KVM)
For deep internals, kernel tuning, and failure recovery logic, see architecture.md.
Before installing, ensure the VM is configured for 50GB/s throughput:
- CPU: Type `host` (AES-NI passthrough)
- Network: `VirtIO` with Multiqueue enabled (Queues = vCPUs)
- Hardware Watchdog: Add device `Intel 6300ESB` → Action: `Reset`
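
The CPU setting above can be spot-checked from inside the guest. The following is a minimal sketch, not part of `ober` itself: the helper names `cpu_flags` and `has_aes_ni` are hypothetical, and it assumes an x86 Linux guest where `/proc/cpuinfo` lists features on a `flags` line.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # x86 kernels list CPU features on the "flags" line.
        if key.strip() == "flags":
            return set(value.split())
    return set()


def has_aes_ni(cpuinfo_text: str) -> bool:
    """True if the guest sees AES-NI (reported as the "aes" flag)."""
    return "aes" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print("AES-NI visible in guest:", has_aes_ni(cpuinfo.read_text()))
    else:
        print("/proc/cpuinfo not found (not a Linux guest?)")
```

If the flag is missing, the VM CPU type is most likely not set to `host`.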
One-liner (recommended):
curl -fsSL https://raw.githubusercontent.com/dirkpetersen/herr-ober/main/install.sh | sudo bash
sudo ober bootstrap
Manual install:
sudo su -
apt install -y pipx
pipx ensurepath
source ~/.bashrc
pipx install herr-ober
ober bootstrap
Interactive wizard to select HA mode (BGP or Keepalived) and configure VIPs, backends, and certificates:
sudo ober config
Note: Choose BGP mode if you have a BGP-capable router for ECMP load balancing. Choose Keepalived mode for VRRP-based failover without BGP.
# Check prerequisites and configuration
ober doctor
# View service status
ober status
For a 3-node Keepalived cluster without BGP:
Node 1:
sudo ober bootstrap
sudo ober config
# Select: Keepalived/VRRP mode
# Peers: node2,node3 (or 10.0.0.2,10.0.0.3)
# VIPs: 192.168.1.100,192.168.1.101,192.168.1.102 (one per node)
sudo ober start
Node 2 & 3: Repeat the same steps with the appropriate peer IPs.
DNS Round-Robin:
s3.example.com IN A 192.168.1.100
s3.example.com IN A 192.168.1.101
s3.example.com IN A 192.168.1.102
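
Once the records are published, it can be useful to confirm that the resolver returns every VIP. The following is a minimal sketch using only the Python standard library; the helper names `resolve_a_records` and `missing_vips` are hypothetical, and the hostname and addresses are the example values from above.

```python
import socket


def resolve_a_records(hostname: str) -> set[str]:
    """Return the IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, (address, port)).
    return {info[4][0] for info in infos}


def missing_vips(resolved: set[str], expected: set[str]) -> set[str]:
    """Return the expected VIPs that the resolver did not return."""
    return expected - resolved


if __name__ == "__main__":
    expected = {"192.168.1.100", "192.168.1.101", "192.168.1.102"}
    try:
        resolved = resolve_a_records("s3.example.com")
    except socket.gaierror as exc:
        print("Lookup failed:", exc)
    else:
        print("Missing VIPs:", sorted(missing_vips(resolved, expected)) or "none")
```

Caching resolvers may return only a subset of the records or reorder them, so a missing VIP here is a prompt to inspect the zone, not proof of a misconfiguration.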
ober bootstrap [path] # Install and set up everything
ober config [--dry-run] # Interactive configuration wizard (choose BGP or Keepalived mode)
ober sync # Update external system whitelists
ober status # Show current state (--json for scripting)
ober start|stop|restart # Service management (stop gracefully withdraws routes/releases VIPs)
ober logs [-f] [-n N] # View logs (--service http|bgp|ha|all to filter)
ober doctor # Diagnostic checks
ober test # Test BGP connectivity without starting services
ober upgrade # Check and install updates
ober uninstall # Clean removal
Update external system whitelists with Slurm hostlists or IP addresses:
# Update all whitelists (interactive prompts)
ober sync
# Update specific whitelist
ober sync --routers "switch[01-04]"
ober sync --frontend-http "weka[001-100]"
ober sync --backend-http "rgw[01-08].internal"
# Full status with systemd service info
ober status
# JSON output for monitoring integration
ober status --json
# Direct health endpoint
curl http://127.0.0.1:8404/health

BGP/ECMP mode:

| Event | Recovery |
|---|---|
| Node Crash | Traffic fails over via ECMP (instant) |
| OS Freeze | Proxmox Watchdog hard-resets VM (10s) |
| HAProxy Crash | BGP withdraws immediately (BindsTo=) |
| Network Cut | BFD detects and tears down route (~150ms) |

Keepalived mode:

| Event | Recovery |
|---|---|
| Node Crash | VRRP advertisements stop, VIP fails over to backup node (~3s) |
| OS Freeze | Proxmox Watchdog resets VM, VRRP timeout triggers failover (3-10s) |
| HAProxy Crash | Health check fails, priority drops, VIP fails over (4-6s) |
| Keepalived Crash | VRRP advertisements stop, VIP fails over (3s) |
| Network Partition | Risk of split-brain (both nodes may claim VIP) |
See architecture.md for detailed failure scenarios.
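
Several of the recovery paths above depend on noticing that HAProxy has stopped answering. The following is a minimal sketch of an external probe against the health endpoint shown earlier (`http://127.0.0.1:8404/health`). It is not the check `ober` itself runs: the function name `is_healthy` is hypothetical, and treating HTTP 200 as "healthy" is an assumption.

```python
import urllib.request

HEALTH_URL = "http://127.0.0.1:8404/health"  # endpoint from the status section


def is_healthy(url: str = HEALTH_URL, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses
        # (urllib's URLError and HTTPError are both OSError subclasses).
        return False


if __name__ == "__main__":
    print("healthy" if is_healthy() else "unhealthy")
```

A monitoring job could call this on an interval and alert on consecutive failures; `ober status --json` remains the richer source of state.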
# Clone and install dev dependencies
git clone https://github.com/dirkpetersen/herr-ober.git
cd herr-ober
pip install -e ".[dev]"
# Run tests
pytest
# Lint and format
ruff check .
ruff format .
# Type check
mypy ober/
License: MIT