Randomize cluster startup node order during topology refresh #4060
Merged
petyaslavova merged 13 commits into … on May 20, 2026
…logy_reinitialization
🛡️ Jit Security Scan Results: ✅ No security findings were detected in this PR.
Security scan by Jit
…ceive the last failed node as an argument, it is moved to be the last option for topology refresh
…etter maint notifications behaviour, the randomization is mocked to keep the original order
vladvildanov approved these changes on May 20, 2026
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 8fcae90.
petyaslavova added a commit that referenced this pull request on May 26, 2026
* Randomize cluster startup node order during topology refresh
* Fixing failing tests and adding a randomization improvement: if the last failed node is received as an argument, it is moved to be the last option for topology refresh
* Fixing linters
* Applying review comment
* Fixing flaky test: the flakiness appeared after the randomization. For better maint notifications behaviour, the randomization is mocked to keep the original order
* Fix flaky tests by introducing a mocked timer
* Fixing tests after bad conflict resolution
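The "last failed node goes last" improvement from the second commit bullet can be sketched as a small ordering helper. This is a minimal sketch, not redis-py's actual code: `defer_last_failed` is a hypothetical name, and node names are assumed to be `host:port` strings.

```python
from typing import List, Optional


def defer_last_failed(node_names: List[str], last_failed_node_name: Optional[str]) -> List[str]:
    """Return a copy of node_names with the last failed node moved to the end.

    Hypothetical helper for illustration; the real refresh path may differ.
    """
    ordered = list(node_names)  # copy so the caller's list is never mutated
    if last_failed_node_name is not None and last_failed_node_name in ordered:
        # Try every other node first; the one that just errored is the fallback.
        ordered.remove(last_failed_node_name)
        ordered.append(last_failed_node_name)
    return ordered


print(defer_last_failed(["a:7000", "b:7001", "c:7002"], "a:7000"))
# ['b:7001', 'c:7002', 'a:7000']
```

Deferring rather than dropping the failed node keeps it available as a last resort, for example when it was only briefly unreachable and is the sole node that still answers.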

Randomizes the startup node iteration order during cluster topology initialization for both sync and async clients. This prevents many clients from consistently querying the same first startup node when reinitializing cluster state.
The implementation copies `startup_nodes` to a list, shuffles it when multiple nodes are available, and then proceeds with the existing initialization flow. Sync behavior still includes any additional startup nodes after the shuffled startup node list, preserving the existing MOVED refresh path behavior.

Adds sync and async cluster tests that use the real cluster fixture and mock only `random.shuffle` to make the order deterministic. The tests verify that initialization queries the node that becomes first after shuffling.

Fixes #4049
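The ordering step and the testing approach described above can be sketched together. This is a hedged illustration, not the redis-py implementation: `refresh_order` is a hypothetical helper working on `host:port` name strings, and skipping additional nodes that already appear in the shuffled list is an assumption. The usage part patches `random.shuffle` with `unittest.mock.patch` so the "shuffle" becomes a deterministic reversal, mirroring how the PR's tests pin the order.

```python
import random
from typing import List, Sequence
from unittest import mock


def refresh_order(startup_nodes: Sequence[str], additional_startup_nodes: Sequence[str] = ()) -> List[str]:
    """Hypothetical sketch: the order in which nodes are tried during a topology refresh."""
    # Copy first so shuffling never mutates the client's own startup node collection.
    ordered = list(startup_nodes)
    if len(ordered) > 1:
        random.shuffle(ordered)  # in-place shuffle of the copy only
    # Additional startup nodes are tried after the shuffled list.
    # Assumption: names already present are not repeated.
    for name in additional_startup_nodes:
        if name not in ordered:
            ordered.append(name)
    return ordered


def reverse_in_place(seq: List[str]) -> None:
    # Deterministic stand-in for random.shuffle, used only in the test below.
    seq.reverse()


with mock.patch("random.shuffle", side_effect=reverse_in_place):
    order = refresh_order(["a:7000", "b:7001", "c:7002"], ["d:7003"])

print(order)  # ['c:7002', 'b:7001', 'a:7000', 'd:7003']
```

Mocking only `random.shuffle` leaves the rest of the initialization flow real, so a test can assert which node is queried first without depending on chance.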
Note: Medium Risk
Changes cluster topology refresh/initialization ordering (sync + asyncio) and retry behavior by deferring the last failed node, which could affect failover/reconnect paths in production clusters. The test updates reduce flakiness, but the new shuffle/defer logic may surface edge cases with dynamic startup nodes and additional startup nodes.
Overview
Randomizes Redis Cluster topology refresh startup node selection for both sync and asyncio clients by shuffling the `startup_nodes` iteration order when multiple nodes exist, reducing the chance that many clients hammer the same first node.

Propagates a `last_failed_node_name` hint through retry/refresh paths so that the node that just errored is tried after the other startup nodes and `additional_startup_nodes` during reinitialization (sync `redis/cluster.py`, async `redis/asyncio/cluster.py`).

Adds/adjusts tests to deterministically validate the shuffle behavior (mocking `random.shuffle`), updates maintenance-notification tests to pin startup-node ordering, and makes lock blocking-timeout tests deterministic by monkeypatching time/asyncio sleep.

Reviewed by Cursor Bugbot for commit caba14d. Bugbot is set up for automated code reviews on this repo.
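The "monkeypatching time" technique for blocking-timeout tests can be illustrated with a fake clock. This is a minimal sketch under stated assumptions: `acquire_with_timeout` is a hypothetical polling helper, not redis-py's `Lock.acquire`, and it is assumed to read `time.monotonic` and call `time.sleep` through the `time` module so both can be patched. The same idea would apply to `asyncio.sleep` in async tests.

```python
import time
from typing import Callable
from unittest import mock


def acquire_with_timeout(try_acquire: Callable[[], bool], blocking_timeout: float, sleep: float = 0.1) -> bool:
    """Hypothetical helper: poll try_acquire until it succeeds or the timeout elapses."""
    deadline = time.monotonic() + blocking_timeout
    while True:
        if try_acquire():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(sleep)


class FakeClock:
    """Virtual clock: sleeping advances time instantly instead of blocking."""

    def __init__(self) -> None:
        self.now = 0.0

    def monotonic(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


clock = FakeClock()
with mock.patch("time.monotonic", clock.monotonic), mock.patch("time.sleep", clock.sleep):
    acquired = acquire_with_timeout(lambda: False, blocking_timeout=1.0, sleep=0.25)

print(acquired, clock.now)  # False 1.0
```

Because virtual time only advances on `sleep`, the test runs instantly and the timeout boundary is hit at exactly the same step on every run, which removes timing-dependent flakiness.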