Description
What happened:
If the Corefile uses imported files, a race condition may happen during reload, in the sense that the restarted instance may not run with the most recent configuration. Explanation attempt:
As of now, the `InstanceStartupEvent` event hook in https://github.com/coredns/coredns/blob/master/plugin/reload/reload.go#L76 reads and remembers an initial hash value of the currently parsed Corefile. Here, 'current' means that it uses (correctly, IMHO) the Corefile content that was loaded by the current instance's `Start()` or `Restart()` method. However, any imported files are read only later, when the reload plugin's `hook()` function executes and calls the `parse()` function.
This can create a race condition. Imagine the following sequence:
- A reload is triggered for some reason, e.g. by a change of the Corefile or one of the imported files.
- The currently running reload hook does its job and triggers a `Restart()` of the instance (https://github.com/coredns/coredns/blob/master/plugin/reload/reload.go#L108).
- Now imagine another change happens in one of the imported files, during or after executing the `Corefile` (https://github.com/coredns/caddy/blob/master/caddy.go#L246), but before emitting the `InstanceStartupEvent` (https://github.com/coredns/caddy/blob/master/caddy.go#L264).

Then some or all of the plugins might use the previous, outdated content of the imported file, but the new reload handler started in the `hook()` function will remember the SHA of the current content. As a consequence, no further reload will happen (unless another change occurs), but plugins might run with an outdated configuration.
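The sequence above can be modelled in a few lines of Go. This is only a sketch of the ordering problem, not CoreDNS or caddy code: `importedFile`, `instance`, `restart`, `needsReload` and `isStale` are hypothetical names, the file is an in-memory string, and SHA-512 is an arbitrary choice for the hash. The concurrent edit is injected through a callback so that the race window is hit deterministically.

```go
package main

import (
	"crypto/sha512"
	"encoding/hex"
	"fmt"
)

// importedFile stands in for a file pulled in via `import` (hypothetical).
type importedFile struct{ content string }

// instance is a hypothetical stand-in for a running server instance.
type instance struct {
	appliedConfig string // imported content the plugins were set up with
	watchedHash   string // hash remembered by the reload hook at startup
}

func hash(s string) string {
	sum := sha512.Sum512([]byte(s))
	return hex.EncodeToString(sum[:])
}

// restart models the problematic ordering: plugins are configured from the
// file first, the file may then change, and only afterwards does the startup
// hook re-read the file to compute its baseline hash.
func restart(f *importedFile, changeInBetween func()) *instance {
	inst := &instance{appliedConfig: f.content} // plugins are set up
	if changeInBetween != nil {
		changeInBetween() // edit landing inside the race window
	}
	inst.watchedHash = hash(f.content) // startup hook re-reads the file
	return inst
}

// needsReload mirrors the periodic check: reload only if the hash changed.
func needsReload(inst *instance, f *importedFile) bool {
	return hash(f.content) != inst.watchedHash
}

// isStale reports whether the plugins run with outdated imported content.
func isStale(inst *instance, f *importedFile) bool {
	return inst.appliedConfig != f.content
}

func main() {
	f := &importedFile{content: "rewrite v1"}
	inst := restart(f, func() { f.content = "rewrite v2" })
	// prints "stale: true reload pending: false"
	fmt.Println("stale:", isStale(inst, f), "reload pending:", needsReload(inst, f))
}
```

The instance ends up configured with `rewrite v1`, while the remembered hash already belongs to `rewrite v2`, so the periodic check never sees a difference.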
IMHO the actual problem is that the caddy.Instance stores the content of the current Corefile, but not the content of potentially imported files. Probably, changing this would be a bigger effort.
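To illustrate the idea of keeping the imported content together with the instance, here is a minimal Go sketch. It is an assumption-laden model and not the caddy API, and not necessarily what the pull request does: `snapshot`, `capture`, `digest`, `start` and `needsReload` are hypothetical names, and files are represented as an in-memory map. The point is that the baseline hash is computed from the same bytes the instance was started with, so an edit that lands during startup is still detected by the next check.

```go
package main

import (
	"crypto/sha512"
	"encoding/hex"
	"fmt"
	"sort"
)

// snapshot maps file name to content, captured in one pass at load time.
type snapshot map[string]string

// capture copies the current files so later edits cannot alter the snapshot.
func capture(files map[string]string) snapshot {
	s := make(snapshot, len(files))
	for name, content := range files {
		s[name] = content
	}
	return s
}

// digest hashes a snapshot in a stable order (sorted by file name).
// Length prefixes keep name/content boundaries unambiguous.
func digest(s snapshot) string {
	names := make([]string, 0, len(s))
	for name := range s {
		names = append(names, name)
	}
	sort.Strings(names)
	h := sha512.New()
	for _, name := range names {
		fmt.Fprintf(h, "%d:%s%d:%s", len(name), name, len(s[name]), s[name])
	}
	return hex.EncodeToString(h.Sum(nil))
}

// instance is a hypothetical stand-in that remembers what it was loaded from.
type instance struct {
	loaded   snapshot // Corefile plus imported files, as used for setup
	baseline string   // hash of exactly that snapshot
}

// start takes the snapshot first; an edit during startup (changeInBetween)
// no longer influences the baseline hash.
func start(files map[string]string, changeInBetween func()) *instance {
	snap := capture(files)
	if changeInBetween != nil {
		changeInBetween()
	}
	return &instance{loaded: snap, baseline: digest(snap)}
}

// needsReload compares the files as they are now against the baseline.
func needsReload(inst *instance, files map[string]string) bool {
	return digest(capture(files)) != inst.baseline
}

func main() {
	files := map[string]string{
		"Corefile":              "import *.override",
		"masquerading.override": "rewrite v1",
	}
	inst := start(files, func() { files["masquerading.override"] = "rewrite v2" })
	// prints "reload pending: true"
	fmt.Println("reload pending:", needsReload(inst, files))
}
```

In this model the instance still starts with the outdated `rewrite v1`, but the mismatch is noticed on the next check instead of being silently absorbed into the baseline.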
As an alternative solution, I created a pull request (#6244).
What you expected to happen:
Such a race condition should not happen.
How to reproduce it (as minimally and precisely as possible):
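Import a frequently changing file, let reload run at a high frequency (e.g. `reload 2s 1s`, as in the Corefile below), and wait until a restarted instance ends up running with outdated imported content.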
Anything else we need to know?: no
Environment:
- the version of CoreDNS: 1.10.1, master
- Corefile:

      .:62529 {
          bind 127.0.0.1
          errors
          log .
          prometheus
          kubernetes cluster.local in-addr.arpa ip6.arpa {
              kubeconfig /var/folders/qq/3flcldfx1x3gvszg3xfv_13m0000gn/T/2860259645/kubeconfig
              pods insecure
              fallthrough in-addr.arpa ip6.arpa
              ttl 1
          }
          forward . 8.8.8.8 {
              max_concurrent 1000
          }
          loop
          reload 2s 1s
          loadbalance
          import *.override
      }

  masquerading.override (frequently changing):

      rewrite name exact mhzavbbgti.gvdmt kubernetes.default.svc.cluster.local
      rewrite name exact jbveovgrmf.bwyke kubernetes.default.svc.cluster.local

- logs, if applicable: n/a
- OS (e.g: `cat /etc/os-release`): Darwin XXX 22.5.0 Darwin Kernel Version 22.5.0: Mon Apr 24 20:52:24 PDT 2023; root:xnu-8796.121.2~5/RELEASE_ARM64_T6000 arm6 (but happens on linux too)
- Others: n/a