plugin/errors: add show_first option to consolidate

**What would you like to be added**:

Add an optional show_first flag to the consolidate directive that logs
the first error immediately and then consolidates subsequent errors.

When show_first is enabled:
- The first matching error is logged immediately with full details
  (rcode, domain, type, error message) using the configured log level
- Subsequent matching errors are consolidated during the period
- At period end:
  - If only one error occurred, no summary is printed (already logged)
  - If multiple errors occurred, summary shows the total count

Syntax:
  consolidate DURATION REGEXP [LEVEL] [show_first]

Example with 3 errors:
  [WARNING] 2 example.org. A: read udp 10.0.0.1:53->8.8.8.8:53: i/o timeout
  [WARNING] 3 errors like '^read udp .* i/o timeout$' occurred in last 30s

Example with 1 error:
  [WARNING] 2 example.org. A: read udp 10.0.0.1:53->8.8.8.8:53: i/o timeout


**Why is this needed**:

  The current `consolidate` directive in the errors plugin effectively prevents log flooding by aggregating similar errors and showing only a summary. However, this approach has a significant limitation in production environments:

  **The consolidated summary lacks concrete error details needed for debugging.**

  For example, when you see:
  `[WARNING] 15 errors like '.*timeout.*' occurred in last 5m`

  You know errors occurred, but you don't know:
  - Which specific domain triggered the error
  - What was the exact error message
  - What query type was involved
  - Any other contextual information that could help diagnose the root cause

  ### The Production Dilemma

  In production environments, operators face a difficult choice:
  1. **Keep consolidate enabled**: Prevent log flooding, but lose debugging context
  2. **Disable consolidate**: Get full error details, but risk overwhelming the logging system and storage

  This is especially problematic because **many issues only manifest in production** under real traffic patterns, making them impossible to reproduce in development or staging environments.

  ### Solution: The `show_first` Option

  The `show_first` flag provides the best of both worlds:
  - **Maintains log hygiene**: Only the first error is logged in detail, not all occurrences
  - **Preserves debugging context**: The first error includes full details (rcode, domain, query type, error message)
  - **Enables production troubleshooting**: Operators can identify the specific scenario causing errors
  - **Minimal log volume increase**: At most one additional log entry per consolidation period

  ### Real-World Use Case

  Consider a DNS server experiencing intermittent timeout errors to upstream resolvers:

  **Without show_first:**
  `[WARNING] 247 errors like '.*i/o timeout.*' occurred in last 5m`
  → You know there are timeout errors, but which upstream? Which domain? Hard to debug.

  **With show_first:**
  `[WARNING] 2 example.org. A: read udp 10.0.0.1:53->8.8.8.8:53: i/o timeout`
  `[WARNING] 247 errors like '.*i/o timeout.*' occurred in last 5m`
  → Now you can see it's affecting queries to 8.8.8.8, and you can investigate network connectivity to that specific upstream.

  ### Benefits

  1. **Improves observability**: Provides both metrics (error count) and examples (actual error)
  2. **Enables faster incident response**: No need to disable consolidate or wait for errors to reproduce
  3. **Follows best practices**: Similar to exemplars in metrics systems, where a sampled concrete event is attached to an aggregate metric
  4. **Respects log level configuration**: First error uses the configured log level (warning/error/info/debug)
  5. **Backward compatible**: Completely optional, existing configurations continue to work unchanged

  This enhancement makes CoreDNS more production-ready by balancing operational needs (log volume control) with debugging requirements (contextual information).

