-
Check the status of Cilium:
cilium status
-
Check the status of Flux and if the Flux resources are up-to-date and in a ready state:
Note: Run task reconcile to force Flux to sync your Git repository state.
flux check
flux get sources git flux-system
flux get ks -A
flux get hr -A
-
Check TCP connectivity to both the internal and external gateways:
Note: The variables are only placeholders; replace them with your actual values.
nmap -Pn -n -p 443 ${cluster_gateway_addr} ${cloudflare_gateway_addr} -vv
-
Check that you can resolve DNS for echo; this should resolve to ${cloudflare_gateway_addr}:
Note: The variables are only placeholders; replace them with your actual values.
dig @${cluster_dns_gateway_addr} echo.${cloudflare_domain}
-
Check the status of your wildcard Certificate:
kubectl -n kube-system describe certificates
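The DNS and certificate checks above can be wrapped in small shell helpers that return a pass/fail status, which is handy when re-running them after a change. This is only a minimal sketch and not part of the template: the function names dns_resolves_to and certs_ready are hypothetical, and it assumes dig and kubectl are on your PATH and that your Certificate resources live in the kube-system namespace, as in the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative and not part of the template.

# Succeeds if <name>, queried against <resolver>, returns <expected> as one of its answers.
# Example: dns_resolves_to "${cluster_dns_gateway_addr}" "echo.${cloudflare_domain}" "${cloudflare_gateway_addr}"
dns_resolves_to() {
  local resolver="$1" name="$2" expected="$3"
  # +short prints one answer per line; grep -xF matches a whole line literally.
  dig +short "@${resolver}" "${name}" | grep -qxF -- "${expected}"
}

# Waits until every Certificate in the namespace reports the Ready condition.
# Example: certs_ready kube-system 120s
certs_ready() {
  local namespace="${1:-kube-system}" timeout="${2:-60s}"
  kubectl -n "${namespace}" wait --for=condition=Ready certificates --all --timeout="${timeout}"
}
```

Both functions exit non-zero on failure, so they can be chained with && or used in an if statement.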
Tip
Use the envoy-external gateway on HTTPRoutes to make applications public to the internet. These are also accessible on your private network once you set up split DNS.
The external-dns application created in the network namespace will handle creating public DNS records. By default, echo and the flux-webhook are the only subdomains reachable from the public internet. In order to make additional applications public you must set the correct gateway like in the HelmRelease for echo.
Tip
Use the envoy-internal gateway on HTTPRoutes to make applications private to your network. If you're having trouble with internal DNS resolution check out this GitHub discussion.
k8s_gateway will provide DNS resolution to external Kubernetes resources (i.e. points of entry to the cluster) from any device that uses your home DNS server. For this to work, your home DNS server must be configured to forward DNS queries for ${cloudflare_domain} to ${cluster_dns_gateway_addr} instead of the upstream DNS server(s) it normally uses. This is a form of split DNS (aka split-horizon DNS / conditional forwarding).
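As an illustration of the conditional forwarding described above, the sketch below prints a forwarding rule in dnsmasq syntax. It assumes your home DNS server is dnsmasq or something built on it; other DNS servers use a different syntax, so treat this as an example rather than the required setup. The function name and the example values are hypothetical.

```shell
# Hypothetical helper: print a dnsmasq rule that forwards queries for one
# domain (and its subdomains) to a specific DNS server.
dnsmasq_forward_rule() {
  local domain="$1" dns_addr="$2"
  # dnsmasq syntax: server=/<domain>/<address>
  printf 'server=/%s/%s\n' "${domain}" "${dns_addr}"
}

# Example with placeholder values; substitute your own ${cloudflare_domain}
# and ${cluster_dns_gateway_addr}.
dnsmasq_forward_rule "example.com" "192.168.1.53"
# prints: server=/example.com/192.168.1.53
```

The resulting line would go into your dnsmasq configuration, and dnsmasq typically needs a restart to pick it up.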
... Nothing working? That is expected, this is DNS after all!
Tip
Ensure you have updated talconfig.yaml and any patches with your updated configuration. In some cases you need to not only apply the configuration but also upgrade Talos for the new configuration to take effect.
# (Re)generate the Talos config
task talos:generate-config
# Apply the config to the node
task talos:apply-node IP=? MODE=?
# e.g. task talos:apply-node IP=10.10.10.10 MODE=auto
Tip
Ensure the talosVersion and kubernetesVersion in talenv.yaml are up-to-date with the version you wish to upgrade to.
# Upgrade node to a newer Talos version
task talos:upgrade-node IP=?
# e.g. task talos:upgrade-node IP=10.10.10.10
# Upgrade cluster to a newer Kubernetes version
task talos:upgrade-k8s
# e.g. task talos:upgrade-k8s
Below is a general guide for debugging an issue with a resource or application, for example when a workload or resource is not showing up, or a pod has started but is stuck in a CrashLoopBackOff or Pending state. These steps do not include a fix, since the problem could be one of many different things.
-
Check if the Flux resources are up-to-date and in a ready state:
Note: Run task reconcile to force Flux to sync your Git repository state.
flux get sources git -A
flux get ks -A
flux get hr -A
-
Check whether you can see the pod of the workload you are debugging:
kubectl -n <namespace> get pods -o wide
-
Check the logs of the pod if it's there:
kubectl -n <namespace> logs <pod-name> -f
-
If a resource exists, try to describe it to see what problems it might have:
kubectl -n <namespace> describe <resource> <name>
-
Check the namespace events:
kubectl -n <namespace> get events --sort-by='.metadata.creationTimestamp'
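The steps above can be bundled into a single shell function so they run in the same order every time. This is a minimal sketch under a few assumptions: the function name debug_workload is hypothetical, flux and kubectl are on your PATH, and the resource being described is a pod. It uses --tail=50 instead of -f so that the function returns rather than streaming logs.

```shell
# Hypothetical helper that runs the debugging steps above in order.
# Usage: debug_workload <namespace> [pod-name]
debug_workload() {
  local namespace="$1" pod="${2:-}"
  if [ -z "${namespace}" ]; then
    echo "usage: debug_workload <namespace> [pod-name]" >&2
    return 2
  fi

  echo "== Flux resources =="
  flux get sources git -A
  flux get ks -A
  flux get hr -A

  echo "== Pods in ${namespace} =="
  kubectl -n "${namespace}" get pods -o wide

  if [ -n "${pod}" ]; then
    echo "== Description and recent logs for ${pod} =="
    kubectl -n "${namespace}" describe pod "${pod}"
    # --tail limits the output to the most recent lines and does not follow.
    kubectl -n "${namespace}" logs "${pod}" --tail=50
  fi

  echo "== Events in ${namespace} =="
  kubectl -n "${namespace}" get events --sort-by='.metadata.creationTimestamp'
}
```

Calling it with only a namespace skips the describe and logs steps, which is useful when the pod has not been created at all.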
Resolving your problems may take some tweaking of your YAML manifests to get things working; other times the cause is an external factor, such as permissions on an NFS server. If you are unable to figure out your problem, see the support sections below.
There's a lot to absorb here, especially if you're new to these tools. Take some time to familiarize yourself with the tooling and understand how all the components interconnect. Dive into the documentation of the various tools included; they are a valuable resource. This shouldn't be a production environment yet, so embrace the freedom to experiment. Move fast, break things intentionally, and challenge yourself to fix them.
Below are some optional considerations you may want to explore.
The template uses k8s_gateway to provide DNS for your applications; consider exploring external-dns as an alternative.
External-DNS offers broad support for various DNS providers, including but not limited to:
This flexibility allows you to integrate seamlessly with a range of DNS solutions to suit your environment and offload DNS from your cluster to your router, or external device.
SOPS is an excellent tool for managing secrets in a GitOps workflow. However, it can become cumbersome when rotating secrets or maintaining a single source of truth for secret items.
For a more streamlined approach to those issues, consider External Secrets. This tool allows you to move away from SOPS and leverage an external provider for managing your secrets. External Secrets supports a wide range of providers, from cloud-based solutions to self-hosted options.
If your workloads require persistent storage with features like replication or connectivity to NFS, SMB, or iSCSI servers, there are several projects worth exploring:
These tools offer a variety of solutions to meet your persistent storage needs, whether you're using cloud-native or self-hosted infrastructures.
Community member @whazor created Kubesearch to allow searching Flux HelmReleases across GitHub and GitLab repositories with the kubesearch topic.
- Make a post in this repository's GitHub Discussions.
- Start a thread in the #support or #cluster-template channels in the Home Operations Discord server.