This is a monorepo for my home infrastructure and Kubernetes cluster. It implements GitOps practices with Talos Linux, Kubernetes, ArgoCD, and GitHub Actions, and manages 60+ applications on a hyper-converged Kubernetes cluster, with dependency updates automated by Renovate.
My cluster is a 3-node high-availability setup running Talos Linux with Kubernetes. All three nodes are control plane nodes (no dedicated workers) that also run workloads and provide distributed storage via Rook-Ceph. This hyper-converged architecture maximizes resource utilization, with each node contributing:
- Compute: Kubernetes workload scheduling
- Storage: 1TB NVMe disk for Ceph distributed storage (block, filesystem, and object)
- Control Plane: etcd member and Kubernetes API server
The cluster uses 2-way replication for storage, tolerating one node failure while maintaining data availability. Initial bootstrap is managed with Helmfile and Just, with ArgoCD taking over for ongoing GitOps management using an app-of-apps pattern.
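The numbers behind this layout can be sketched with a little arithmetic. The snippet below is only a rough estimate: it assumes each of the three nodes contributes exactly one 1TB data disk and it ignores Ceph overhead such as metadata and near-full ratios. The function names are made up for illustration.

```python
# Back-of-the-envelope numbers for the 3-node layout described above.
# Assumptions: one 1 TB data disk per node, Ceph overhead ignored.

NODES = 3
DISK_TB_PER_NODE = 1.0
REPLICA_COUNT = 2  # 2-way replication: two copies of every object


def usable_capacity_tb(nodes: int, disk_tb: float, replicas: int) -> float:
    """Raw capacity divided by the number of copies kept of each object."""
    return nodes * disk_tb / replicas


def etcd_quorum(members: int) -> int:
    """Majority of members that etcd needs in order to accept writes."""
    return members // 2 + 1


def etcd_fault_tolerance(members: int) -> int:
    """Members that can fail while a majority still remains."""
    return members - etcd_quorum(members)


if __name__ == "__main__":
    capacity = usable_capacity_tb(NODES, DISK_TB_PER_NODE, REPLICA_COUNT)
    print(f"approx. usable storage: {capacity} TB")
    print(f"etcd quorum: {etcd_quorum(NODES)}")
    print(f"etcd members that may fail: {etcd_fault_tolerance(NODES)}")
```

Under these assumptions, both the storage layer and the control plane tolerate the loss of a single node, which matches the one-node failure tolerance described above.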
Networking & Ingress: Cilium provides eBPF-based networking with L2 announcements for LoadBalancer IPs. Envoy Gateway implements Gateway API, while Cloudflare Tunnel secures external access. External-DNS syncs DNS records to Cloudflare, UniFi, and internal DNS. Multus enables multi-network support with VLAN configurations.
Security & Secrets: Cert-Manager automates SSL/TLS certificate management using Cloudflare DNS-01 challenges. External Secrets Operator integrates with 1Password Connect to inject secrets from 1Password vaults into Kubernetes.
Storage & Backup: Rook-Ceph provides distributed storage with block (RBD), filesystem (CephFS), and object (S3) storage capabilities across the 3-node cluster. Volsync handles automated backup/restore using Restic. Spegel runs a stateless cluster-local OCI image mirror.
Data Management: Crunchy PostgreSQL Operator manages HA PostgreSQL clusters with 3 replicas and dual backup repositories (MinIO local + Cloudflare R2). Dragonfly Operator provides a Redis-compatible in-memory datastore.
Monitoring & Observability: Kube-Prometheus-Stack delivers comprehensive monitoring with Prometheus, Alertmanager, and Grafana. Victoria Logs aggregates logs via Fluent-Bit. Gatus provides uptime monitoring, Headlamp offers a Kubernetes dashboard, and specialized exporters (Blackbox, Smartctl, Unpoller) monitor infrastructure. KEDA enables event-driven autoscaling.
Automation & CI/CD: Actions Runner Controller runs self-hosted GitHub Actions runners in-cluster. Renovate Bot automatically creates PRs for dependency updates with custom auto-merge rules.
Cluster Utilities: Descheduler optimizes pod placement, Reloader auto-restarts pods on config changes, and Metrics Server provides resource metrics. AMD Device Plugin enables GPU workloads.
This Git repository contains the following directories.
📁 .github # GitHub workflows (Terraform, tagging, labels)
📁 .renovate # Renovate configuration (auto-merge, grouping)
📁 apps # 60+ applications organized by category
└─📁 media # Media automation (Sonarr, Radarr, qBittorrent, etc.) - 13 apps
└─📁 productivity # Productivity tools (n8n, Vikunja, Twenty, etc.) - 10 apps
└─📁 auth # Authentication (Authelia, LLDAP) - 2 apps
└─📁 communication # Communication apps (TheLounge) - 1 app
└─📁 monitoring # Monitoring apps (Seerr, Whoami) - 2 apps
└─📁 home-automation # Smart home (Homebridge) - 1 app
└─📁 data # Database clusters (PostgreSQL, Dragonfly) - 2 apps
📁 argocd # ArgoCD GitOps configuration
└─📁 applications # ApplicationSets for auto-discovery
└─📁 install # ArgoCD installation manifests
📁 bootstrap # Initial cluster bootstrap with Helmfile
📁 docs # Documentation and assets
📁 infra # Core infrastructure
└─📁 k8s # Kubernetes infrastructure by category
  └─📁 networking # CNI, Gateway API, DNS (Cilium, Envoy, etc.) - 6 apps
  └─📁 storage # Storage systems (Rook-Ceph, CSI, Volsync) - 5 apps
  └─📁 security # Security (Cert-Manager, External Secrets) - 3 apps
  └─📁 monitoring # Observability stack (Prometheus, Grafana, etc.) - 12 apps
  └─📁 cluster-management # Cluster utilities (Descheduler, Reloader) - 6 apps
  └─📁 operators # Kubernetes operators (Postgres, GPU, etc.) - 6 apps
└─📁 talos # Talos Linux configurations (Jinja2 templates)
└─📁 terraform # OpenTofu/Terraform for Cloudflare (DNS, R2, Tunnel)
📁 stacks # Docker Compose files for Unraid NAS
└─📁 * # Jellyfin, Lidarr, MinIO, Syncthing, etc.

| Name | CIDR |
|---|---|
| Server VLAN | 192.168.40.0/24 |
| Kubernetes pods (Cilium) | 10.244.0.0/16 |
| Kubernetes services | 10.96.0.0/12 |
| Gateway LoadBalancer IP | 192.168.60.10 |
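
As a quick sanity check on the ranges in this table, the following sketch uses Python's standard `ipaddress` module to confirm that the listed networks do not overlap and to show their sizes. The variable and function names are illustrative only; the values are copied from the table.

```python
import ipaddress
from itertools import combinations

# Values copied from the table above.
networks = {
    "Server VLAN": ipaddress.ip_network("192.168.40.0/24"),
    "Kubernetes pods (Cilium)": ipaddress.ip_network("10.244.0.0/16"),
    "Kubernetes services": ipaddress.ip_network("10.96.0.0/12"),
}
gateway_ip = ipaddress.ip_address("192.168.60.10")


def overlapping_pairs(nets: dict) -> list:
    """Return the names of every pair of networks that share addresses."""
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(nets.items(), 2)
        if net_a.overlaps(net_b)
    ]


if __name__ == "__main__":
    for name, net in networks.items():
        print(f"{name}: {net} ({net.num_addresses} addresses)")
    print("overlapping pairs:", overlapping_pairs(networks))
    print("gateway IP inside Server VLAN:", gateway_ip in networks["Server VLAN"])
```

The check also shows that the Gateway LoadBalancer IP is not part of the Server VLAN range listed here, so it has to come from a separate address range.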
While most of my infrastructure and workloads are self-hosted, I rely on cloud services for critical functions that need to remain available regardless of cluster status.
| Service | Use | Cost |
|---|---|---|
| 1Password | Secrets management with 1Password Connect | ~$65/yr |
| Cloudflare | Domain, DNS, R2 object storage, Zero Trust Tunnel | ~$30/yr |
| GitHub | Repository hosting, CI/CD workflows | Free |
| Tailscale | VPN service for secure remote access | Free |
| | Total | ~$7.90/mo |
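
The monthly total follows from the yearly figures in the table. A minimal sketch of the arithmetic, treating the approximate yearly prices as exact and the free tiers as zero:

```python
# Approximate yearly costs in USD, copied from the table above.
yearly_costs_usd = {
    "1Password": 65,
    "Cloudflare": 30,
    "GitHub": 0,
    "Tailscale": 0,
}


def monthly_total(costs: dict) -> float:
    """Sum the yearly costs and spread them evenly over 12 months."""
    return sum(costs.values()) / 12


if __name__ == "__main__":
    print(f"yearly total: ${sum(yearly_costs_usd.values())}")
    print(f"monthly total: ${monthly_total(yearly_costs_usd):.2f}")
```

The result lands within a couple of cents of the rounded total shown in the table.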
| Device | Count | OS Disk Size | Data Disk Size | RAM | CPU | Operating System | Purpose |
|---|---|---|---|---|---|---|---|
| Mini PC | 3 | 256GB NVMe | 1TB NVMe | 32GB | Ryzen 7 4800H | Talos Linux | K8s control plane nodes |
| AI PC | 1 | 256GB NVMe | 1TB NVMe | 32GB | Ryzen 5 5600X | Talos Linux | K8s AI Node |
| Asustor AS5404T | 1 | 32GB (USB) | 4x 1TB + 4x 12TB | 32GB | Intel Celeron | Unraid | NAS (external storage) |
| UniFi Cloud Gateway Max | 1 | - | - | - | - | - | Router |
| UniFi USW Enterprise 24 PoE | 1 | - | - | - | - | - | 2.5Gb PoE Switch |
Thanks to all the people who share their knowledge and experience on GitHub. I have learned a lot from the homelab and Kubernetes communities.
- k8s-at-home - Community of homelabbers running Kubernetes
- Christian Lempa - Excellent homelab content
- pi-cluster - Comprehensive homelab documentation
- Techno Tim - Homelab and infrastructure tutorials
See my awful commit history
See LICENSE