
plugin/reload: potential race condition with imported files #6243

@cbarbian-sap

Description

What happened:

If the Corefile uses imported files, a race condition can occur during reload: the restarted instance may not run with the most recent configuration. An attempt at an explanation:

As of now, the InstanceStartupEvent event hook in https://github.com/coredns/coredns/blob/master/plugin/reload/reload.go#L76 reads and remembers an initial hash value of the currently parsed Corefile. Here, 'current' means that it (correctly, IMHO) uses the Corefile content that was loaded by the current instance's Start() or Restart() method. However, any imported files are read only when the reload's event hook() function executes, i.e. when it calls the parse() function.
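A minimal sketch of why the initial hash depends on *when* parse() runs rather than on what the instance was started with. It assumes a simplified model: the instance keeps only the Corefile text, and `import` lines are expanded by reading the imported files at parse time. `expandImports`, `hashAtParseTime` and the in-memory `files` map are hypothetical stand-ins for the real parser and file system, not CoreDNS or caddy APIs; SHA-512 is assumed as the digest.

```go
package main

import (
	"crypto/sha512"
	"fmt"
	"strings"
)

// expandImports is a hypothetical, much simplified stand-in for the Corefile
// parser: each line "import NAME" is replaced by the content of NAME as it is
// at call time. Globs such as "*.override" are not supported here.
func expandImports(corefile string, files map[string]string) string {
	lines := strings.Split(corefile, "\n")
	for i, line := range lines {
		trimmed := strings.TrimSpace(line)
		if strings.HasPrefix(trimmed, "import ") {
			name := strings.TrimSpace(strings.TrimPrefix(trimmed, "import "))
			lines[i] = files[name]
		}
	}
	return strings.Join(lines, "\n")
}

// hashAtParseTime models the pattern described above: parse now (and thereby
// read the imports now), then hash the result.
func hashAtParseTime(corefile string, files map[string]string) [sha512.Size]byte {
	return sha512.Sum512([]byte(expandImports(corefile, files)))
}

func main() {
	// The Corefile text kept by the instance; it does not change below.
	corefile := ".:53 {\n    reload 2s 1s\n    import masquerading.override\n}"
	files := map[string]string{
		"masquerading.override": "rewrite name exact a.example b.example",
	}

	before := hashAtParseTime(corefile, files)
	// Only the imported file changes on "disk".
	files["masquerading.override"] = "rewrite name exact c.example d.example"
	after := hashAtParseTime(corefile, files)

	fmt.Println("hash unchanged:", before == after)
}
```

Even though the stored Corefile text is identical in both calls, the two hashes differ, because the imported content is read at hashing time.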

This can create a race condition. Imagine the following sequence:

  1. A reload is triggered for some reason, e.g. by a change of the Corefile or of one of the imported files.
  2. The currently running reload hook does its job and triggers a Restart() of the instance (https://github.com/coredns/coredns/blob/master/plugin/reload/reload.go#L108).
  3. Now imagine that another change happens in one of the imported files during or after the Corefile is executed (https://github.com/coredns/caddy/blob/master/caddy.go#L246), but before the InstanceStartupEvent is emitted (https://github.com/coredns/caddy/blob/master/caddy.go#L264). Then some or all of the plugins might use the previous, outdated content of the imported file, while the new reload handler started in the hook() function remembers the SHA of the new content. As a consequence, no further reload happens (unless another change occurs), and the plugins might keep running with an outdated configuration.
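The sequence above can be simulated deterministically in a few lines. This is a sketch under simplifying assumptions: the `disk` type, `effective`, and `simulateRestart` are hypothetical names, "executing the Corefile" is reduced to capturing the imported content once, and the hook and the next poll tick are modeled as two hash computations over whatever is on disk at that moment.

```go
package main

import (
	"crypto/sha512"
	"fmt"
)

// disk is a hypothetical stand-in for the imported file on the file system.
type disk struct{ override string }

// effective is the configuration obtained from the Corefile plus the
// imported file as it exists on disk at the moment of reading.
func effective(corefile string, d *disk) string { return corefile + "\n" + d.override }

func sum(s string) [sha512.Size]byte { return sha512.Sum512([]byte(s)) }

// simulateRestart models steps 2 and 3. If changeInWindow is true, the
// imported file changes after the plugins have read their configuration
// but before the startup hook computes its baseline hash.
func simulateRestart(changeInWindow bool) (reloadOnNextPoll, stale bool) {
	corefile := ".:53 {\n    reload 2s 1s\n    import *.override\n}"
	d := &disk{override: "rewrite name exact a.example b.example"}

	// Executing the Corefile: the plugins capture the imported content now.
	running := effective(corefile, d)

	if changeInWindow {
		d.override = "rewrite name exact c.example d.example"
	}

	// InstanceStartupEvent hook: the baseline hash re-reads the imports now.
	baseline := sum(effective(corefile, d))

	// Next poll tick: nothing has changed on disk since the hook ran.
	reloadOnNextPoll = sum(effective(corefile, d)) != baseline
	stale = running != effective(corefile, d)
	return reloadOnNextPoll, stale
}

func main() {
	r, s := simulateRestart(false)
	fmt.Printf("no change in window: reload=%v stale=%v\n", r, s)
	r, s = simulateRestart(true)
	fmt.Printf("change in window:    reload=%v stale=%v\n", r, s)
}
```

In the second case the poll sees no difference (`reload=false`) although the running configuration no longer matches the file on disk (`stale=true`), which is exactly the situation described in step 3.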

IMHO, the actual problem is that caddy.Instance stores the content of the current Corefile, but not the content of any imported files. Changing this would probably be a bigger effort.
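To illustrate the direction hinted at above, here is a sketch of hashing a snapshot that holds the Corefile together with the imported file contents captured in the same pass that configures the plugins. The `snapshot` type is purely hypothetical; caddy.Instance offers nothing like it today, and this is not the approach taken in the pull request mentioned below.

```go
package main

import (
	"crypto/sha512"
	"fmt"
	"sort"
)

// snapshot is a hypothetical record of everything an instance was started
// with: the Corefile text plus each imported file's content at load time.
type snapshot struct {
	corefile string
	imports  map[string]string // file name -> content read at load time
}

// hash digests the Corefile and all imports. Names are sorted so that map
// iteration order does not affect the result, and every field is length
// prefixed so that boundaries between fields are unambiguous.
func (s snapshot) hash() [sha512.Size]byte {
	h := sha512.New()
	fmt.Fprintf(h, "%d:%s", len(s.corefile), s.corefile)
	names := make([]string, 0, len(s.imports))
	for n := range s.imports {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		fmt.Fprintf(h, "%d:%s%d:%s", len(n), n, len(s.imports[n]), s.imports[n])
	}
	var out [sha512.Size]byte
	copy(out[:], h.Sum(nil))
	return out
}

func main() {
	corefile := ".:53 {\n    reload 2s 1s\n    import *.override\n}"
	// Captured while the plugins were being configured.
	loaded := snapshot{corefile: corefile, imports: map[string]string{
		"masquerading.override": "rewrite name exact a.example b.example",
	}}
	baseline := loaded.hash()

	// What a later poll reads from disk after a change inside the window.
	onDisk := snapshot{corefile: corefile, imports: map[string]string{
		"masquerading.override": "rewrite name exact c.example d.example",
	}}
	fmt.Println("reload on next poll:", onDisk.hash() != baseline)
}
```

Because the baseline now reflects what the plugins actually loaded, a change that lands between executing the Corefile and emitting InstanceStartupEvent is still detected on the next poll.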

As an alternative solution, I created a pull request (#6244).

What you expected to happen:

Such a race condition should not happen.

How to reproduce it (as minimally and precisely as possible):

Import a frequently changing file, let reload run at a high frequency, and wait ...

Anything else we need to know?: no

Environment:

  • the version of CoreDNS: 1.10.1, master
  • Corefile:
    .:62529 {
        bind 127.0.0.1
        errors
        log .
        prometheus
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            kubeconfig /var/folders/qq/3flcldfx1x3gvszg3xfv_13m0000gn/T/2860259645/kubeconfig
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
            ttl 1
        }
        forward . 8.8.8.8 {
            max_concurrent 1000
        }
        loop
        reload 2s 1s
        loadbalance
        import *.override
    }
    
    masquerading.override (frequently changing):
    rewrite name exact mhzavbbgti.gvdmt kubernetes.default.svc.cluster.local
    rewrite name exact jbveovgrmf.bwyke kubernetes.default.svc.cluster.local
    
  • logs, if applicable: n/a
  • OS (e.g: cat /etc/os-release): Darwin XXX 22.5.0 Darwin Kernel Version 22.5.0: Mon Apr 24 20:52:24 PDT 2023; root:xnu-8796.121.2~5/RELEASE_ARM64_T6000 arm64 (but it happens on Linux too)
  • Others: n/a
