@kevwan kevwan commented Dec 20, 2025

Fixes #3823

When etcd authentication failed with an 'invalid auth token' error, the watch()
function entered an infinite tight loop with no delay between retries, causing:

  • 100% CPU usage
  • Gigabytes of identical error logs in seconds
  • Server crashes due to resource exhaustion

Added a 1-second cooldown between retries using coolDownUnstable.AroundDuration()
to prevent this issue while maintaining retry capability for transient errors.

This approach is consistent with existing patterns in the codebase and adds
slight randomness (±5% deviation) to prevent thundering herd scenarios.

Copilot AI review requested due to automatic review settings December 20, 2025 13:50
@kevwan kevwan changed the title from "Fix: issue 3823" to "fix(discov): add retry cooldown to prevent CPU/disk exhaustion on auth errors" Dec 20, 2025
Copilot AI left a comment

Pull request overview

This PR addresses issue 3823 by making documentation improvements and adding resilience protection to the etcd watch retry loop.

  • Condenses and improves readability of README files (both English and Chinese versions)
  • Adds cooldown mechanism to etcd watch error retry loop to prevent CPU/disk resource exhaustion
  • Makes the AI tooling setup instructions more concise

Reviewed changes

Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| readme.md | Condensed background, design principles, features, and AI tooling sections for better readability; removed verbose explanations in favor of concise bullet points |
| readme-cn.md | Similar condensing and formatting improvements as English version, maintaining consistency between both language versions |
| core/discov/internal/registry.go | Added time.Sleep with cooldown after error logging in the watch function retry loop to prevent rapid retries from exhausting CPU/disk resources; follows the existing pattern used in the load function |

codecov bot commented Dec 20, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

@kevwan kevwan merged commit 39729f3 into zeromicro:master Dec 20, 2025
6 checks passed
@kevwan kevwan deleted the fix/issue-3823 branch December 20, 2025 14:04
Development

Successfully merging this pull request may close these issues.

Infinite output of ETCD logs, causing service crashes