Operational playbooks for on-call response. Each runbook targets a "fix in under 2 minutes" path for a known incident class.
Every runbook follows the same four sections:
- Symptômes — observable signals that should trigger this runbook
- Diagnostic — commands to confirm the issue and scope it
- Procédure — numbered, idempotent steps to restore service
- Post-action — checks, monitoring, post-mortem trigger

| Runbook | Purpose |
|---|---|
| severity-levels.md | P1/P2/P3 definitions (used by incident-create.yml) |
| worker-stuck.md | Messenger workers blocked or failing |
| database-disk-full.md | PostgreSQL disk pressure |
| redis-out-of-memory.md | Redis OOM evictions or refused writes |
| mercure-disconnected.md | SSE clients cannot reconnect |
| ollama-down.md | LLM dependency unreachable (hard dep per ADR-028) |
| valhalla-overpass-rebuild.md | Routing tiles or POI cache rebuild |
| oracle-vm-reclaimed.md | Oracle Always Free instance reclaimed |
| incident-template.md | Post-mortem template |
| release-rollback.md | Roll back a bad deploy via Coolify |
| release-checklist.md | Pre-release checklist |
| uptime-monitoring.md | Uptime Kuma + UptimeRobot configuration |
| secrets-inventory.md | Source of truth for every production secret |
| secrets-rotation.md | Rotation policy + per-class procedures |

- All commands assume the working directory is the repository root.
- `make php-shell` / `make pwa-shell` open a bash session inside the relevant container; `bin/console` is always called from there or via `docker compose exec php`.
- Production commands run on the Coolify host via SSH; the compose project is named after the Coolify application.
- Times are UTC unless stated otherwise.
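
The conventions above can be sketched as two small shell helpers. This is a minimal illustration, not part of the runbooks: the names `console` and `utc_now` and the `DRY_RUN` switch are hypothetical, and the `list` subcommand assumes `bin/console` is a Symfony Console application.

```shell
#!/bin/sh
# Hypothetical helper: run a bin/console command in the php container.
# Assumes the working directory is the repository root.
# With DRY_RUN=1 the command is printed instead of executed.
console() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "docker compose exec php bin/console $*"
  else
    docker compose exec php bin/console "$@"
  fi
}

# Hypothetical helper: UTC timestamp in ISO 8601 form, for incident notes.
utc_now() {
  date -u +"%Y-%m-%dT%H:%M:%SZ"
}
```

For example, `DRY_RUN=1 console list` prints the command that would run, and `echo "$(utc_now) restarted workers"` produces a timestamped note line. On the Coolify host, the same `docker compose exec` call would additionally need the compose project selected (for instance with `docker compose -p <project>`), where `<project>` is a placeholder for the Coolify application name.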