Fix flaky cluster tests by accepting either retry limit error (maxAttempts or maxTotalRetriesDuration) #4399
Tests expecting "No more cluster attempts left" sometimes got "Cluster retry deadline exceeded" because of randomized backoff jitter on the final attempt. On the last retry, backoff can sleep anywhere from 0 to millisLeft (the entire remaining time). High jitter exhausts the deadline first and low jitter exhausts the attempts first, so which error occurs is non-deterministic. Fixed by updating the assertions to accept either error message using the Hamcrest anyOf matcher.
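The race described above can be illustrated with a small deterministic model. This is a simplified sketch, not the client's actual retry loop: the class name `RetryLimitRace`, the simulated clock, the fixed per-attempt cost, and the choice to sleep only before the final attempt are all assumptions made for illustration. Only the two error messages and the "sleep 0 to millisLeft before the last attempt" behavior come from the description.

```java
// Hypothetical model of two competing retry limits; not the real executor code.
final class RetryLimitRace {
    static final String ATTEMPTS_EXHAUSTED = "No more cluster attempts left.";
    static final String DEADLINE_EXCEEDED = "Cluster retry deadline exceeded.";

    /**
     * @param maxAttempts       total attempts allowed
     * @param deadlineMillis    total retry budget, measured from time 0
     * @param attemptCostMillis assumed time each failing attempt takes
     * @param jitter            fraction in [0, 1] of the remaining time slept
     *                          before the final attempt (random in real code)
     * @return the error message of whichever limit is hit first
     */
    static String simulate(int maxAttempts, long deadlineMillis,
                           long attemptCostMillis, double jitter) {
        long now = 0; // simulated clock, so nothing actually sleeps
        for (int attemptsLeft = maxAttempts; attemptsLeft > 0; attemptsLeft--) {
            now += attemptCostMillis; // the attempt runs and fails
            if (now > deadlineMillis) {
                return DEADLINE_EXCEEDED;
            }
            if (attemptsLeft == 2) {
                // The next attempt is the last one: back off for anywhere
                // between 0 and the entire remaining time.
                long millisLeft = deadlineMillis - now;
                now += (long) (jitter * millisLeft);
            }
        }
        return ATTEMPTS_EXHAUSTED;
    }
}
```

With three attempts, a 1000 ms budget, and 10 ms per attempt, a jitter of 0.0 ends in the attempts error and a jitter of 1.0 ends in the deadline error, which is why a test asserting on a single message can flake.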
Test Results: 285 files ±0, 285 suites ±0, 11m 51s ⏱️ -26s. Results for commit 721269b. ± Comparison against base commit 9413149. This pull request skips 201 tests.
Pull request overview
This pull request fixes flaky cluster tests by updating test assertions to accept either of two possible error messages that can occur due to non-deterministic backoff behavior during cluster retries.
Changes:
- Updated test assertions to use Hamcrest's anyOf matcher to accept either "No more cluster attempts left." or "Cluster retry deadline exceeded." error messages
- Added Hamcrest imports to support the new assertion pattern
- Applied the fix consistently across three SSL cluster test classes
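
The assertion pattern summarized above might look roughly like the following. This is a sketch assuming Hamcrest is on the test classpath. The helper class `RetryLimitAssertions` and its method are hypothetical names, and the use of `containsString` as the inner matcher is an assumption; the diff itself may match the messages differently.

```java
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.anyOf;
import static org.hamcrest.Matchers.containsString;

// Hypothetical helper; the actual tests may inline this assertion instead.
final class RetryLimitAssertions {
    /** Passes if the message reports either of the two retry limits. */
    static void assertRetryLimitMessage(String message) {
        assertThat(message, anyOf(
            containsString("No more cluster attempts left"),
            containsString("Cluster retry deadline exceeded")));
    }
}
```

A test would pass the caught exception's message to this helper. Any other message still fails the assertion, so the test keeps guarding against unrelated failures.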
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| SSLRedisClusterClientTest.java | Updated assertions in connectToNodesFailsWithSSLParametersAndNoHostMapping and connectWithCustomHostNameVerifier tests to accept either error message; added Hamcrest imports |
| SSLOptionsRedisClusterClientTest.java | Updated assertions in connectToNodesFailsWithSSLParametersAndNoHostMapping and connectWithCustomHostNameVerifier tests to accept either error message; added Hamcrest imports |
| SSLACLRedisClusterClientTest.java | Updated assertions in connectToNodesFailsWithSSLParametersAndNoHostMapping and connectWithCustomHostNameVerifier tests to accept either error message; added Hamcrest imports |
Fix flaky cluster tests by accepting either retry limit error
Problem
Tests flaked when expecting "No more cluster attempts left" but got
"Cluster retry deadline exceeded" due to randomized backoff jitter.
Root Cause
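On the final retry attempt, backoff can sleep for 0 to millisLeft
(entire remaining time). Depending on random jitter:
- High jitter: the deadline is exhausted first ("Cluster retry deadline exceeded")
- Low jitter: the attempts are exhausted first ("No more cluster attempts left")
This makes it non-deterministic which limit is reached first.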
Solution
Updated assertions to accept either error message using anyOf matcher,
making tests resilient to backoff randomness.
Affected tests