This is a monorepository for my home Kubernetes clusters. I try to adhere to Infrastructure as Code (IaC) and GitOps practices using tools like Terraform, Kubernetes, Flux, Renovate, and GitHub Actions.
The goal is to learn Kubernetes (k8s) while practicing GitOps.
My Kubernetes cluster runs Talos in a semi-hyper-converged setup: workloads and block storage share resources on the same nodes, while a separate NAS provides NFS/SMB shares and backups.
- actions-runner-controller: Self-hosted GitHub runners
- cilium: Internal Kubernetes networking (CNI)
- cert-manager: SSL certificate management
- envoy-gateway: Ingress controller (Gateway API)
- external-dns: Automatic DNS record management
- external-secrets: Kubernetes secrets from Bitwarden Secrets Manager
- rook-ceph: Distributed block storage
- spegel: Stateless, cluster-local OCI registry mirror
- tofu-controller: Terraform/OpenTofu controller for Flux
- volsync: Backup and recovery of PVCs
Flux continuously reconciles this Git repository with the cluster state. Renovate automatically creates PRs for dependency updates. Learn more about Flux at fluxcd.io/docs.
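To make "reconciles" concrete, here is a toy Python model of a single reconciliation pass: compare the desired state (rendered from Git) with the live state and derive the actions needed to converge. This is a simplified sketch, not how Flux is implemented. The resource names and version strings are hypothetical, and real Flux controllers track the objects they applied in an inventory and only prune them when `prune: true` is set.

```python
# Toy model of one GitOps reconciliation pass. This is NOT Flux's code:
# resource names and specs below are hypothetical, and real controllers
# compare far richer objects than these small dicts.

Spec = dict[str, str]


def plan_reconcile(desired: dict[str, Spec], live: dict[str, Spec]) -> dict[str, list[str]]:
    """Return the actions needed to make the live state match the desired state."""
    # In Git but not in the cluster: must be created.
    create = sorted(name for name in desired if name not in live)
    # In both, but the live spec has drifted from Git: must be updated.
    update = sorted(name for name in desired if name in live and live[name] != desired[name])
    # In the cluster but removed from Git: pruned. This assumes every live
    # object is managed by the reconciler and pruning is enabled.
    delete = sorted(name for name in live if name not in desired)
    return {"create": create, "update": update, "delete": delete}


if __name__ == "__main__":
    desired = {
        "HelmRelease/authentik": {"version": "2.0"},
        "HelmRelease/cilium": {"version": "1.0"},
    }
    live = {
        "HelmRelease/authentik": {"version": "1.0"},
        "HelmRelease/old-app": {"version": "1.0"},
    }
    print(plan_reconcile(desired, live))
    # {'create': ['HelmRelease/cilium'], 'update': ['HelmRelease/authentik'], 'delete': ['HelmRelease/old-app']}
```

Running a pass like this on a timer (Flux uses each object's `interval`) is what turns a one-off deploy into continuous reconciliation: a manual change in the cluster shows up as an `update` on the next pass and gets reverted to what Git says.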
This diagram shows how Flux handles complex application dependencies. In this example, the Authentik deployment waits for:
- PostgreSQL and Dragonfly operators to be installed
- Database and cache instances to be provisioned and healthy
```mermaid
graph TD
%% Operator Installation
A[Kustomization: crunchy-postgres-operator] -->|Creates| B[HelmRelease: crunchy-postgres-operator]
C[Kustomization: dragonfly-operator] -->|Creates| D[HelmRelease: dragonfly-operator]
%% Authentik Dependencies
E[Kustomization: authentik] -->|dependsOn| A
E -->|dependsOn| C
E -->|Creates| F[(PostgresCluster: authentik)]
E -->|Creates| G[(Dragonfly: authentik)]
E -->|Creates| H[[HelmRelease: authentik]]
%% Health Dependencies
H -->|Requires healthy| F
H -->|Requires healthy| G
%% Operator Management
B -.->|Manages| F
D -.->|Manages| G
%% External Dependencies
I[(rook-ceph storage)] -->|Provides PVC| F
```
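The diagram is a dependency graph, so a topological sort recovers the order in which its pieces can become ready. The Python sketch below uses the standard library's `graphlib.TopologicalSorter` to group the diagram's nodes into "waves" that could reconcile in parallel. It is an illustration only: Flux does not compute waves up front; the kustomize-controller simply retries a Kustomization until everything in its `dependsOn` list reports Ready. Folding the "Creates", "dependsOn", "Manages" and "Requires healthy" edges into one graph is also a simplification.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes that must be ready before it.
# Edges are read off the diagram; mixing edge kinds is a simplification.
DEPENDENCIES: dict[str, set[str]] = {
    "HelmRelease/crunchy-postgres-operator": {"Kustomization/crunchy-postgres-operator"},
    "HelmRelease/dragonfly-operator": {"Kustomization/dragonfly-operator"},
    "Kustomization/authentik": {
        "Kustomization/crunchy-postgres-operator",
        "Kustomization/dragonfly-operator",
    },
    "PostgresCluster/authentik": {
        "Kustomization/authentik",
        "HelmRelease/crunchy-postgres-operator",
        "rook-ceph storage",
    },
    "Dragonfly/authentik": {"Kustomization/authentik", "HelmRelease/dragonfly-operator"},
    "HelmRelease/authentik": {
        "Kustomization/authentik",
        "PostgresCluster/authentik",
        "Dragonfly/authentik",
    },
}


def rollout_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group nodes into waves; every node in a wave has all its dependencies met."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises CycleError if the graph has a cycle
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as reconciled and healthy
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(rollout_waves(DEPENDENCIES), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

The operators and storage come first, the database and cache instances next, and the Authentik HelmRelease last, which matches the waiting behaviour described above.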
Network infrastructure is managed via Terraform. See vrozaksen/mikrotik-terraform for details.
While most of my infrastructure and workloads are self-hosted, I do rely on the cloud for certain key parts of my setup. This saves me from having to worry about two things: (1) chicken-and-egg scenarios and (2) keeping services I critically need available whether or not my cluster is online.
The alternative solution to these two problems would be to host a Kubernetes cluster in the cloud and deploy applications like HCVault, Vaultwarden, ntfy, and Gatus. However, maintaining another cluster and monitoring another group of workloads is a lot more time and effort than I am willing to put in.
| Service | Use | Cost |
|---|---|---|
| Bitwarden | Secrets with External Secrets | ~$10/yr |
| Cloudflare | Domain, DNS, WAF and R2 bucket (S3-compatible endpoint) | ~$30/yr |
| GitHub | Hosting this repository and continuous integration/deployments | Free |
| Healthchecks.io | Monitoring internet connectivity and external facing applications | Free |
| Total | | ~$3.33/mo |

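
The monthly total comes from the two paid services, since GitHub and Healthchecks.io are free; the yearly prices are approximate:

$$
\frac{\$10 + \$30}{12\ \text{months}} = \frac{\$40}{12\ \text{months}} \approx \$3.33\ \text{per month}
$$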
| Name | Device | CPU | OS Disk | Data Disk | RAM | OS | Purpose |
|---|---|---|---|---|---|---|---|
| Alfheim | Lenovo M720q | i5-8500T | 480GB SSD | 500GB NVME | 64GB | Talos | k8s control |
| Alne | Lenovo M720q | i5-8500T | 480GB SSD | 500GB NVME | 32GB | Talos | k8s control |
| Ainias | Lenovo M720q | i5-8500T | 480GB SSD | 500GB NVME | 32GB | Talos | k8s control |
- Totals: 18 CPU threads, 128 GB RAM
- Network: Intel X710-DA2 (2x10 Gbps LACP bond, 802.3ad)
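
The totals follow from the table above. The i5-8500T has 6 cores without Hyper-Threading, so each node contributes 6 threads. The bond figure is an aggregate: LACP hashes each flow onto one member link, so a single TCP stream still tops out at 10 Gbps.

$$
\begin{aligned}
\text{CPU threads} &= 3 \times 6 = 18 \\
\text{RAM} &= 64 + 32 + 32 = 128\ \text{GB} \\
\text{bond bandwidth} &= 2 \times 10\ \text{Gbps} = 20\ \text{Gbps (aggregate)}
\end{aligned}
$$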
| Name | Device | CPU | OS Disk | Data Disk | RAM | OS | Purpose |
|---|---|---|---|---|---|---|---|
| Granzam | Lenovo M920q | i3-9100 | xxxxxxxxx | xxxxxxxxxx | 16GB | TBD | Game servers (Pterodactyl/AMP) |
- Infrastructure management: Ansible or Terraform (learning project)
| Name | CPU | RAM | OS | Storage | Purpose |
|---|---|---|---|---|---|
| Aincrad | i3-14100 | 32GB | Unraid | Array: 5x14TB + 5x4TB; ZFS Cache: 1TB M.2 SSD; Blaze Pool: 2x960GB SSD (RAID1) | NAS/NFS/S3/Backup |
- Components: ASRock B760M-H2/M.2, Corsair Vengeance DDR5 6000MHz, Inter-Tech 4U case, 2x ASM1166 HBA
- GPU: ASUS GeForce RTX 3060 Phoenix V2 LHR 12GB GDDR6 (ML/LLM)
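
For reference, the raw capacity of the Unraid array works out as below. This counts every drive and ignores parity: the number of parity drives is not stated here, and Unraid requires each parity drive to be at least as large as the largest data drive, so usable space will be lower.

$$
5 \times 14\ \text{TB} + 5 \times 4\ \text{TB} = 70\ \text{TB} + 20\ \text{TB} = 90\ \text{TB (raw)}
$$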
| Device | Purpose |
|---|---|
| MikroTik RB5009UPr+S+IN | Router |
| MikroTik CRS326-24S+2Q+RM | 10G Switch |
| HORACO 2.5GbE 5-Port + 10G SFP+ | 2.5G Switch |
| APC SMC1500I-2UC | UPS |
Big shout-out to onedr0p's cluster-template for the excellent foundation, and to the Home Operations Discord community for continuous inspiration and support.
Check out kubesearch.dev for ideas on deploying applications in your homelab.