Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Conversation

@cangming
Copy link
Contributor

@cangming cangming commented Nov 22, 2025

1. Why is this pull request needed and what does it do?

Add optional show_first flag to consolidate directive that logs
the first error immediately and then consolidates subsequent errors.

When show_first is enabled:

The first matching error is logged immediately with full details
(rcode, domain, type, error message) using the configured log level
Subsequent matching errors are consolidated during the period
At period end:
If only one error occurred, no summary is printed (already logged)
If multiple errors occurred, summary shows the total count

2. Which issues (if any) are related?

#7702

3. Which documentation changes (if any) need to be made?

Update errors plugin docs to include show_first settings and examples

4. Does this introduce a backward incompatible change or deprecation?

No breaking changes. The show_first option disable by default .

@cangming
Copy link
Contributor Author

cangming commented Nov 23, 2025

Test Failure Investigation

Summary

The Test e2e failure is not related to this PR. This is a pre-existing flaky test in the master branch.

Evidence

I tested the master branch (commit de010910e) locally with the same test setup:

docker run --rm -v /path/to/coredns:/workspace -w /workspace/test golang:1.25 \
  bash -c "go install github.com/fatih/faillint@c56e3ec6dbfc933bbeb884fd31f2bcd41f712657 && \
  go test -race -count=3 ./..."

Result: The same data race in TestReadme was reproduced on master branch.

image

Details

Test Status for This PR

All tests in plugin/errors pass successfully:
✅ Unit tests: PASS (with -race)
✅ Integration tests: PASS
✅ golangci-lint: 0 issues

The errors plugin changes are isolated and thoroughly tested. The e2e failure is a known flaky test issue that should be addressed separately.

if cnt > 0 {
h.patterns[i].logCallback("%d errors like '%s' occurred in last %s",
cnt, h.patterns[i].pattern.String(), h.patterns[i].period)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid duplicated code in different branches. Consider a single condition like

	if cnt > 1 || (cnt == 0 && !h.patterns[i].showFirst) {

Parentheses are probably not required, but they make the code clearer.
Alternatively, you can consider the code like

	if cnt == 0 {
		return
	}
	if cnt > 1 || !h.patterns[i].showFirst {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

for i := range h.patterns {
if h.patterns[i].pattern.MatchString(strErr) {
if h.inc(i) {
if h.recordError(i, rcode, state.Name(), state.Type(), strErr) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider just returning false from recordError when cnt == 1 and showFirst is configured. The error message will be printed by the log.Errorf below.
This way, you will not need to pass more parameters to recordError and to duplicate the error printing code.
The function recordError needs to have a clear description

Copy link
Contributor Author

@cangming cangming Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I make following changes.

  • Rename inc to consolidateError
  • Removed all parameters, keeping function not change
  • Returns false when showFirst is enabled, letting caller handle logging
  • Unified two log call sites using function pointer
  • First error now uses pattern's configured log level (not always ERROR)

expCnt := uint32(tt.errorCount)
actCnt := atomic.LoadUint32(&h.patterns[0].count)
if actCnt != expCnt {
t.Fatalf("Unexpected 'count', expected %d, actual %d", expCnt, actCnt)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider following to continue with other test cases

  t.Errorf(...)
  continue

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

// Check for show_first option
showFirst := false
if len(args) > 2+logLevelArgCount {
if args[2+logLevelArgCount] == "show_first" {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead, consider refactoring the parseLogLevel (rename to parseOptionalParams) to loop through remaining arguments and return logger and showFirst flag.

Copy link
Contributor Author

@cangming cangming Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I make following changes.

  • Loop through remaining arguments and return logger and showFirst flag.
  • Map-based dispatch replaces switch-case
  • Added parameter order validation (log level must precede show_first)
  • Added validation for multiple log levels
  • Clearer error messages

@cangming cangming force-pushed the errors-show-first branch 2 times, most recently from 5b4239f to 57fc718 Compare November 26, 2025 04:33
Copy link
Collaborator

@rdrozhdzh rdrozhdzh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, look good to me. See one minor note

if len(args) != 3 {
return log.Errorf, nil
// parseOptionalParams parses optional parameters (log level and show_first flag)
// from the remaining arguments starting at index 2.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead, you can pass args[2:] to this func and avoid checking/mentioning this magic number inside this func

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed as suggested. Thanks for the review!

Add optional show_first flag to consolidate directive that logs
the first error immediately and then consolidates subsequent errors.

When show_first is enabled:
- The first matching error is logged immediately with full details
  (rcode, domain, type, error message) using the configured log level
- Subsequent matching errors are consolidated during the period
- At period end:
  - If only one error occurred, no summary is printed (already logged)
  - If multiple errors occurred, summary shows the total count

Syntax:
  consolidate DURATION REGEXP [LEVEL] [show_first]

Example with 3 errors:
  [WARNING] 2 example.org. A: read udp 10.0.0.1:53->8.8.8.8:53: i/o timeout
  [WARNING] 3 errors like '^read udp .* i/o timeout$' occurred in last 30s

Example with 1 error:
  [WARNING] 2 example.org. A: read udp 10.0.0.1:53->8.8.8.8:53: i/o timeout

Implementation details:
- Add showFirst bool to pattern struct
- Rename inc() to consolidateError(), return false for showFirst case
- Use function pointer in ServeDNS to unify log calls with proper level
- Simplify logPattern() with single condition (cnt > 1 || !showFirst)
- Refactor parseLogLevel() to parseOptionalParams() with map-based dispatch
- Validate parameter order: log level must come before show_first
- Update README.md with show_first documentation and examples
- Add comprehensive test cases for show_first functionality

Signed-off-by: cangming <[email protected]>
@yongtang yongtang merged commit cad961f into coredns:master Dec 10, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants