
@Kentzo
Contributor

@Kentzo Kentzo commented Sep 6, 2025

1. Why is this pull request needed and what does it do?

dnsserver.Server.Stop doesn't properly wait for connections: dnsserver.Server.dnsWg is a no-op
and dns.Server.ShutdownContext is not used.

2. Which issues (if any) are related?

None

3. Which documentation changes (if any) need to be made?

None

4. Does this introduce a backward incompatible change or deprecation?

No

@codecov

codecov bot commented Sep 6, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 62.62%. Comparing base (93c57b6) to head (1c3c9b5).
⚠️ Report is 1660 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7517      +/-   ##
==========================================
+ Coverage   55.70%   62.62%   +6.92%     
==========================================
  Files         224      274      +50     
  Lines       10016    18318    +8302     
==========================================
+ Hits         5579    11472    +5893     
- Misses       3978     6178    +2200     
- Partials      459      668     +209     


@thevilledev
Collaborator

Conflict caused by #7562 but I think this PR is definitely worth doing still. Please rebase when you have a moment, thanks!

@Kentzo
Contributor Author

Kentzo commented Sep 23, 2025

If I read correctly, dns.Server.Shutdown / ShutdownContext already have protections to be safe for concurrent calls: https://github.com/miekg/dns/blob/294d37389cccc53250740798dde72a0c1810be2a/server.go#L398-L448

Need to understand whether stopOnce is necessary.

@thevilledev
Collaborator

I ran similar tests as in #7314, which has the stacktrace visible. There's also TestReloadConcurrentRestartAndStop which you can use for validation, if the stopOnce appears unnecessary. 👍

@Kentzo
Contributor Author

Kentzo commented Sep 23, 2025

I'm struggling to see how to make TestStopIsIdempotent useful. It doesn't seem to test anything at the moment.

@thevilledev
Collaborator

Yeah +1 for removing it now with dnsWg gone

@thevilledev
Collaborator

How about a test that starts a server, sets a very short graceTimeout, calls Stop(), and asserts context.DeadlineExceeded? Should exercise the timeout path and verify the deadline is propagated properly.

@Kentzo
Contributor Author

Kentzo commented Sep 24, 2025

I'm not sure that can reliably work unless there is an active TCP connection. But assuming it does, how does this verify that Stop is idempotent?

@thevilledev
Collaborator

Yeah this would be a new test. I would remove the idempotency one with dnsWg now gone. And stopOnce structurally guarantees that Stop() is idempotent, so not much value in testing it anymore. Instead this new test would focus on observing the context timeout. I guess the test directory would be the best place for an e2e test like that.

@Kentzo
Contributor Author

Kentzo commented Sep 24, 2025

Which behavior is preferred for Stop's error: return the error on all calls or return the error on the first call only?

@Kentzo
Contributor Author

Kentzo commented Sep 25, 2025

@thevilledev Is this the test you had in mind? The biggest downside is that it uses the default value for graceTimeout, which is 5 seconds.

@thevilledev
Collaborator

Yeah, I think that's the right idea, although I missed the fact that we can't change graceTimeout. Using reflect to reach private fields is probably the most brittle part.

What do you think of an in-package test instead? I wrote the following test using your blocking struct in core/dnsserver/server_test.go, and it seems to work quite well:

func TestGracefulStopTimeout_Internal(t *testing.T) {
	p := new(blocking)
	cfg := testConfig("dns", p)

	s, err := NewServer("127.0.0.1:0", []*Config{cfg})
	if err != nil {
		t.Fatalf("NewServer failed: %v", err)
	}

	// Shorten the graceful timeout
	s.graceTimeout = 500 * time.Millisecond

	pc, err := net.ListenPacket("udp", "127.0.0.1:0")
	if err != nil {
		t.Fatalf("ListenPacket failed: %v", err)
	}
	defer pc.Close()

	go s.ServePacket(pc)
	udp := pc.LocalAddr().String()

	// Block the handler
	p.lock.Lock()
	defer p.lock.Unlock()

	m := new(dns.Msg)
	m.SetQuestion("example.com.", dns.TypeA)

	// Readiness loop to avoid flakiness
	deadline := time.Now().Add(2 * time.Second)
	for {
		_, err := dns.Exchange(m, udp)
		if err == nil || time.Now().After(deadline) {
			break
		}
		time.Sleep(10 * time.Millisecond)
	}

	err = s.Stop()

	if !errors.Is(err, context.DeadlineExceeded) {
		t.Fatalf("expected context.DeadlineExceeded, got %v", err)
	}
}

> Which behavior is preferred for Stop's error: return the error on all calls or return the error on the first call only?

IMO returning it only on the first call would hide failures from later callers, which complicates error handling. With sync.Once, subsequent calls get the stored error from the first Stop() call.

@Kentzo
Contributor Author

Kentzo commented Sep 25, 2025

IIUC the socket is ready and the only source of flakiness is go s.ServePacket(pc). Since the dns.Exchange already uses a timeout of 2 seconds, do you think this loop is necessary?

dnsserver.Server.dnsWg does nothing but dns.Server tracks its connections.

Signed-off-by: Ilya Kulakov <[email protected]>
Collaborator

@thevilledev thevilledev left a comment


You're right. Just me being too cautious for GitHub Actions 😁

LGTM - I'll give others a chance to review as well, but let's merge after a couple of days if nothing else pops up. Thanks @Kentzo!

@yongtang yongtang merged commit eafc352 into coredns:master Sep 27, 2025
12 of 13 checks passed
@Kentzo Kentzo deleted the dnsserver-stop branch October 3, 2025 19:40