
Operations Runbook

Recurring checks for production Sovra operators.

- Verify core health endpoints:
  - Web: `GET /api/health` returns `status: "ok"` and reports no `missing_config` checks
  - Worker: `GET /health`
- Review error monitoring (Sentry) for new high-severity issues.
- Review security signals:
  - Secret scanning alerts
  - Code scanning alerts
  - Dependency alerts
- Confirm background workflows are green (CI, Security, Deploy).
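
The web health check above can be scripted. The following is a minimal sketch, not the project's actual tooling: it assumes the `/api/health` response is a JSON object with a top-level `status` field and a `checks` object mapping each check name to a result string. The `checks` shape, the helper names, and the host placeholder are all assumptions.

```python
import json
from urllib.request import urlopen


def web_health_problems(payload: dict) -> list[str]:
    """Return the problems found in a /api/health payload (empty list means healthy)."""
    problems = []
    if payload.get("status") != "ok":
        problems.append(f"status is {payload.get('status')!r}, expected 'ok'")
    # Assumed shape: "checks" maps a check name to a result string.
    for name, result in payload.get("checks", {}).items():
        if result == "missing_config":
            problems.append(f"check {name!r} reports missing_config")
    return problems


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """Fetch a URL and decode its JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

A typical call would be `web_health_problems(fetch_json("https://<web-host>/api/health"))`, with the host filled in per environment. Keeping the evaluation separate from the fetch makes the rule easy to test without network access.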

- Run dependency hygiene checks:
  - `pnpm audit --prod`
  - Review Dependabot updates
- Validate database migration and policy state in staging and production.
- Spot-check tenant isolation in critical read/write paths.
- Review worker logs for auth failures and broadcast errors.
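
The dependency audit can be summarized automatically. This sketch assumes `pnpm audit --prod --json` emits a report with severity counts under `metadata.vulnerabilities`; that layout should be confirmed against the installed pnpm version. The choice of `high` and `critical` as blocking severities is also an assumption, not a stated policy.

```python
import json
import subprocess

# Assumption: only these severities should block; adjust to the team's policy.
BLOCKING_SEVERITIES = ("high", "critical")


def blocking_vulnerabilities(report: dict) -> dict[str, int]:
    """Return the non-zero counts for blocking severities in an audit report."""
    counts = report.get("metadata", {}).get("vulnerabilities", {})
    return {sev: counts[sev] for sev in BLOCKING_SEVERITIES if counts.get(sev, 0) > 0}


def run_audit() -> dict:
    """Run the audit and parse its JSON output.

    No check=True: the audit command is expected to exit non-zero
    when it finds vulnerabilities, which is not a failure of the run itself.
    """
    result = subprocess.run(
        ["pnpm", "audit", "--prod", "--json"],
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Calling `blocking_vulnerabilities(run_audit())` from the repository root returns an empty dict when nothing at a blocking severity was reported.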

- Rotate and validate shared secrets:
  - `INTERNAL_API_SECRET`
  - `SUPABASE_JWT_SECRET`
- Review release process and rollback readiness.
- Run a recovery drill:
  - Redeploy from a clean commit
  - Validate health endpoints and core flows
- Check docs drift:
  - `README.md`
  - `docs/environment-variables.md`
  - `docs/deployment.md`
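
Part of the docs drift check can be automated by confirming that every required variable name appears in `docs/environment-variables.md`. This is a minimal sketch: the helper name is hypothetical, and the list of required names would need to come from wherever the application declares its configuration.

```python
import re


def undocumented_names(required: list[str], doc_text: str) -> list[str]:
    """Return the required names that never appear as a whole identifier in doc_text."""
    missing = []
    for name in required:
        # Lookarounds stop FOO_SECRET from matching inside FOO_SECRET_OLD.
        pattern = rf"(?<![A-Z0-9_]){re.escape(name)}(?![A-Z0-9_])"
        if not re.search(pattern, doc_text):
            missing.append(name)
    return missing
```

For example, reading the docs file with `pathlib.Path("docs/environment-variables.md").read_text()` and passing `["INTERNAL_API_SECRET", "SUPABASE_JWT_SECRET"]` as `required` returns the names that still need a documentation entry.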

| Priority | Typical impact | Target response |
| --- | --- | --- |
| `P1` | Full outage, security incident, or cross-tenant data risk | Immediate |
| `P2` | Major degradation with business impact | < 4 hours |
| `P3` | Partial degradation with workaround | < 1 business day |
| `P4` | Low-impact bug or docs issue | Next planned cycle |