plugin/errors: add show_first option to consolidate #7702

@cangming

What would you like to be added:

Add an optional show_first flag to the consolidate directive that logs
the first error immediately and then consolidates subsequent errors.

When show_first is enabled:

  • The first matching error is logged immediately with full details
    (rcode, domain, type, error message) using the configured log level
  • Subsequent matching errors are consolidated during the period
  • At period end:
    • If only one error occurred, no summary is printed (already logged)
    • If multiple errors occurred, summary shows the total count

Syntax:
consolidate DURATION REGEXP [LEVEL] [show_first]

Example with 3 errors:
[WARNING] 2 example.org. A: read udp 10.0.0.1:53->8.8.8.8:53: i/o timeout
[WARNING] 3 errors like '^read udp .* i/o timeout$' occurred in last 30s

Example with 1 error:
[WARNING] 2 example.org. A: read udp 10.0.0.1:53->8.8.8.8:53: i/o timeout

Why is this needed:

The current consolidate directive in the errors plugin effectively prevents log flooding by aggregating similar errors and showing only a summary. However, this approach has a significant limitation in production environments:

The consolidated summary lacks concrete error details needed for debugging.

For example, when you see:
[WARNING] 15 errors like '.*timeout.*' occurred in last 5m

You know errors occurred, but you don't know:

  • Which specific domain triggered the error
  • What was the exact error message
  • What query type was involved
  • Any other contextual information that could help diagnose the root cause

The Production Dilemma

In production environments, operators face a difficult choice:

  1. Keep consolidate enabled: Prevent log flooding, but lose debugging context
  2. Disable consolidate: Get full error details, but risk overwhelming the logging system and storage

This is especially problematic because many issues only manifest in production under real traffic patterns, making them impossible to reproduce in development or staging environments.

Solution: The show_first Option

The show_first flag provides the best of both worlds:

  • Maintains log hygiene: Only the first error is logged in detail, not all occurrences
  • Preserves debugging context: The first error includes full details (rcode, domain, query type, error message)
  • Enables production troubleshooting: Operators can identify the specific scenario causing errors
  • Minimal log volume increase: Only one additional log entry per consolidation period

Real-World Use Case

Consider a DNS server experiencing intermittent timeout errors to upstream resolvers:

Without show_first:
[WARNING] 247 errors like '.*i/o timeout.*' occurred in last 5m
→ You know there are timeout errors, but which upstream? Which domain? Hard to debug.

With show_first:
[WARNING] 2 example.org. A: read udp 10.0.0.1:53->8.8.8.8:53: i/o timeout
[WARNING] 247 errors like '.*i/o timeout.*' occurred in last 5m
→ Now you can see it's affecting queries to 8.8.8.8, and you can investigate network connectivity to that specific upstream.

Benefits

  1. Improves observability: Provides both metrics (error count) and examples (actual error)
  2. Enables faster incident response: No need to disable consolidate or wait for errors to reproduce
  3. Follows an established pattern: Similar to exemplars in observability tooling, where a sampled example is recorded alongside an aggregate metric
  4. Respects log level configuration: First error uses the configured log level (warning/error/info/debug)
  5. Backward compatible: Completely optional, existing configurations continue to work unchanged

This enhancement makes CoreDNS more production-ready by balancing operational needs (log volume control) with debugging requirements (contextual information).
