This is a monorepo for my home infrastructure and Kubernetes cluster. I try to adhere to Infrastructure as Code (IaC) and GitOps practices using tools such as Kubernetes, ArgoCD, and GitHub Actions.
My cluster is a 3-node high-availability setup running Talos Linux on bare metal. All three nodes are control plane nodes (there are no dedicated workers) and also provide compute and distributed storage via Rook-Ceph. This hyper-converged architecture maximizes resource utilization across all nodes, with each node contributing:
- Compute: Kubernetes workload scheduling
- Storage: 1TB NVMe disk for Ceph distributed storage (block, filesystem, and object)
- Control Plane: etcd member and Kubernetes API server
The cluster uses 2-way replication for storage, tolerating one node failure while maintaining data availability.
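As a rough illustration of what this layout implies, the sketch below works through the arithmetic for the node count, disk size, and replication factor given above. It assumes usable capacity is simply raw capacity divided by the replica count (ignoring Ceph overhead and free-space headroom) and uses the standard majority rule for etcd quorum; the function names are hypothetical and not part of this repository.

```python
# Back-of-the-envelope numbers for the 3-node hyper-converged layout.
# Values come from the description above; function names are illustrative only.
NODES = 3
DISK_TB_PER_NODE = 1.0  # one 1TB NVMe data disk per node
REPLICAS = 2            # 2-way replication


def usable_capacity_tb(nodes: int, disk_tb: float, replicas: int) -> float:
    """Raw capacity divided by the replica count.

    Assumption: ignores Ceph metadata overhead and free-space headroom.
    """
    return nodes * disk_tb / replicas


def etcd_quorum(members: int) -> int:
    """Number of etcd members that must be healthy (a strict majority)."""
    return members // 2 + 1


def etcd_fault_tolerance(members: int) -> int:
    """How many etcd members can fail while quorum is still reachable."""
    return members - etcd_quorum(members)


if __name__ == "__main__":
    print(f"usable storage: ~{usable_capacity_tb(NODES, DISK_TB_PER_NODE, REPLICAS)} TB")
    print(f"etcd quorum: {etcd_quorum(NODES)} of {NODES}")
    print(f"tolerated node failures: {etcd_fault_tolerance(NODES)}")
```

Under these assumptions, both the storage layer and the control plane tolerate the loss of a single node, which matches the one-node failure tolerance described above.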
Networking & Ingress: cilium provides eBPF-based networking with Gateway API support and L2 announcements for LoadBalancer IPs. cloudflare-ingress secures external access via Cloudflare Tunnel, while external-dns automatically syncs DNS records to Cloudflare and AdGuard Home.
Security & Secrets: cert-manager automates SSL/TLS certificate management using Cloudflare DNS-01 challenges. For secrets, external-secrets integrates with 1Password Connect to inject secrets into Kubernetes, while sealed-secrets stores encrypted secrets safely in Git.
Storage & Backup: rook-ceph provides distributed storage with block (RBD), filesystem (CephFS), and object (S3) storage capabilities across the 3-node cluster. volsync handles volume snapshots and replication for backup/restore. spegel improves reliability by running a stateless cluster-local OCI image mirror. crunchy-postgres-operator manages highly available PostgreSQL clusters.
Monitoring & Observability: kube-prometheus-stack delivers comprehensive monitoring with Prometheus, Alertmanager, and Grafana. metrics-server provides resource metrics for autoscaling and kubectl top commands.
Automation & CI/CD: actions-runner-controller runs self-hosted GitHub Actions runners directly in the cluster for continuous integration workflows.
Cluster Utilities: descheduler optimizes pod placement, while reloader automatically restarts pods when ConfigMaps or Secrets change.
This Git repository contains the following directories.
📁 .github # GitHub workflows for CI/CD
📁 apps # Applications deployed in the cluster
└─📁 {category} # Organized by function: media, data, auth, communication, etc.
📁 argocd # ArgoCD configuration and parent applications
└─📁 applications # Parent apps that discover and manage child apps
└─📁 install # ArgoCD installation manifests
📁 docs # Extra documentation and assets
📁 infra # Core infrastructure configurations
└─📁 ansible # Ansible playbooks for cluster bootstrap
└─📁 k8s # Kubernetes infrastructure by category
  └─📁 networking # CNI, ingress, DNS (Cilium, Envoy Gateway, etc.)
  └─📁 storage # Storage systems (Rook-Ceph, CSI drivers, etc.)
  └─📁 security # Security tools (cert-manager, sealed-secrets, etc.)
  └─📁 monitoring # Monitoring stack (Prometheus, Grafana)
  └─📁 cluster-mgmt # Cluster utilities (metrics-server, reloader, etc.)
  └─📁 operators # Kubernetes operators (Postgres, GPU, etc.)
└─📁 talos # Talos Linux node configurations
└─📁 terraform # Terraform configurations
📁 stacks # Docker Compose files for Asustor NAS
└─📁 media-stack # Media management stack
📁 terraform # Terraform for cloud resources (Cloudflare, etc.)

| Name | CIDR |
|---|---|
| Server VLAN | 192.168.40.0/24 |
| Kubernetes pods (Cilium) | 10.244.0.0/16 |
| Kubernetes services | 10.96.0.0/12 |
| Gateway LoadBalancer IP | 192.168.60.10 |
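A quick way to sanity-check an address plan like this is to confirm that the ranges do not overlap. The sketch below does that with Python's standard `ipaddress` module, using the values from the table; the helper names are hypothetical. It also checks where the Gateway LoadBalancer IP sits relative to the listed ranges, since the table does not say which subnet it is drawn from.

```python
import ipaddress
from itertools import combinations

# Ranges copied from the table above.
NETWORKS = {
    "Server VLAN": ipaddress.ip_network("192.168.40.0/24"),
    "Kubernetes pods (Cilium)": ipaddress.ip_network("10.244.0.0/16"),
    "Kubernetes services": ipaddress.ip_network("10.96.0.0/12"),
}
GATEWAY_IP = ipaddress.ip_address("192.168.60.10")


def overlapping_pairs(networks):
    """Return the names of every pair of networks whose ranges overlap."""
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


def containing_networks(ip, networks):
    """Return the names of the networks that contain the given address."""
    return [name for name, net in networks.items() if ip in net]


if __name__ == "__main__":
    for name, net in NETWORKS.items():
        print(f"{name}: {net} ({net.num_addresses} addresses)")
    print("overlaps:", overlapping_pairs(NETWORKS))
    print("gateway IP is inside:", containing_networks(GATEWAY_IP, NETWORKS))
```

With the values above, no ranges overlap and the gateway address falls outside all three listed ranges, so it presumably comes from a separate LoadBalancer range that is not shown in the table.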
While most of my infrastructure and workloads are self-hosted, I do rely on the cloud for certain key parts of my setup. This saves me from having to worry about two things: (1) chicken-and-egg scenarios and (2) services I critically need whether my cluster is online or not.
| Service | Use | Cost |
|---|---|---|
| 1Password | Secrets with 1Password Connect and Controller | ~$65/yr |
| Cloudflare | Domain and R2 | ~$30/yr |
| GitHub | Hosting this repository and continuous integration/deployments | Free |
| Tailscale | VPN Service | Free |
| Total | | ~$7.90/mo |
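The monthly total follows from the yearly figures in the table. As a minimal sketch of the arithmetic (the amounts are the approximate values listed above, and the function name is hypothetical):

```python
# Approximate yearly costs in USD, taken from the table above.
YEARLY_COST_USD = {
    "1Password": 65,
    "Cloudflare": 30,
    "GitHub": 0,
    "Tailscale": 0,
}


def monthly_total(yearly_costs: dict) -> float:
    """Sum the yearly costs and spread them evenly over 12 months."""
    return round(sum(yearly_costs.values()) / 12, 2)


if __name__ == "__main__":
    print(f"yearly: ${sum(YEARLY_COST_USD.values())}")
    print(f"monthly: ~${monthly_total(YEARLY_COST_USD)}")
```

That works out to $95 per year, or a little under $8 per month.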
| Device | Count | OS Disk Size | Data Disk Size | RAM | CPU | Operating System | Purpose |
|---|---|---|---|---|---|---|---|
| Mini PC | 3 | 256GB NVMe | 1TB NVMe | 16GB | Ryzen 7 4800H | Talos Linux 1.9 | K8s control plane nodes |
| Asustor AS5404T | 1 | 32GB (USB) | 4x 1TB + 4x 12TB | 32GB | Intel Celeron | Unraid 7.1.4 | NAS (external storage) |
- Ansible playbook for deploying the cluster
- Implement Terraform for managing cloud resources
Thanks to all the people who share their knowledge and experience on GitHub. I have learned a lot from reading blog posts and watching YouTube videos, and I have tried to link to the sources of my inspiration where possible.
See my awful commit history
See LICENSE