Fix: Add defensive type validation to prevent silent cluster refresh failure. #6993

Open
ngyngcphu wants to merge 1 commit into redisson:master from ngyngcphu:master

@ngyngcphu

This PR fixes a critical bug in which a queue-based response binding error causes a CLUSTER NODES request to receive a PONG response instead of cluster topology data. The resulting ClassCastException is silently swallowed by CompletableFuture, so the cluster topology refresh stops permanently.
The fix detects type mismatches early and converts them into recoverable errors, allowing the existing retry mechanism to handle the situation gracefully instead of failing silently.

Fixes: #6992
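
To illustrate the failure mode described above, here is a minimal, self-contained sketch that uses only java.util.concurrent. It is not Redisson code: the class name SwallowedCastDemo and the refresh method are hypothetical, and the string "PONG" stands in for the misbound reply. When a callback attached with thenAccept throws, the exception is captured in the derived future instead of being propagated, so nothing is logged unless some code inspects that future.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class SwallowedCastDemo {

    // Hypothetical refresh step: assumes the reply is a list of node entries.
    // Returns the derived future so a caller can inspect it; in the failure
    // mode described above, nobody keeps or checks this future.
    @SuppressWarnings("unchecked")
    static CompletableFuture<Void> refresh(CompletableFuture<Object> reply) {
        return reply.thenAccept(r -> {
            List<String> nodes = (List<String>) r; // ClassCastException if r is a String
            System.out.println("nodes: " + nodes.size());
        });
    }

    public static void main(String[] args) {
        CompletableFuture<Object> reply = new CompletableFuture<>();
        CompletableFuture<Void> derived = refresh(reply);

        // A wrong reply is bound to the request.
        reply.complete("PONG");

        // No stack trace is printed; the error is visible only on the derived future.
        System.out.println("failed silently: " + derived.isCompletedExceptionally());
    }
}
```

Running main prints `failed silently: true` and no stack trace. This is consistent with the PR's approach of failing the promise explicitly before any handler attempts the cast.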

Copilot AI review requested due to automatic review settings March 1, 2026 14:07

Copilot AI left a comment

Pull request overview

Adds a defensive guard in the Redis protocol decoder to detect when a CLUSTER NODES request is incorrectly bound to a non-topology response (e.g., PONG), turning what would become a silent ClassCastException into an explicit failure that upstream retry/scheduling logic can handle (per issue #6992).

Changes:

  • Add a CLUSTER NODES response-type validation in CommandDecoder.completeResponse.
  • Convert mismatched response types into a RedisException via tryFailure instead of completing the promise with an invalid type.

&& "CLUSTER".equals(data.getCommand().getName())
&& "NODES".equals(data.getCommand().getSubName())
&& !(result instanceof List)) {
data.tryFailure(new RedisException("Response type mismatch for command: CLUSTER NODES"));

Copilot AI Mar 1, 2026

The new RedisException message is quite generic and drops the contextual details used elsewhere in this decoder (e.g., command info via LogHelper). Consider including the expected vs actual response type (and the command via LogHelper.toString(data)) so this failure is diagnosable from logs without additional instrumentation.

Suggested change
- data.tryFailure(new RedisException("Response type mismatch for command: CLUSTER NODES"));
+ String commandInfo = LogHelper.toString(data);
+ String actualType = (result == null) ? "null" : result.getClass().getName();
+ String message = "Response type mismatch for command " + commandInfo
+         + ": expected java.util.List (e.g. List<ClusterNodeInfo>), but got " + actualType;
+ data.tryFailure(new RedisException(message));

Comment on lines +561 to +562
&& "CLUSTER".equals(data.getCommand().getName())
&& "NODES".equals(data.getCommand().getSubName())

Copilot AI Mar 1, 2026

This check hard-codes command/subcommand names as string literals. Since RedisCommands already defines REDIS_CLUSTER_NODES, consider using its getName()/getSubName() values (or a shared constant) to avoid typos and keep this consistent if command naming ever changes in one place.

Suggested change
- && "CLUSTER".equals(data.getCommand().getName())
- && "NODES".equals(data.getCommand().getSubName())
+ && RedisCommands.REDIS_CLUSTER_NODES.getName().equals(data.getCommand().getName())
+ && RedisCommands.REDIS_CLUSTER_NODES.getSubName().equals(data.getCommand().getSubName())

Comment on lines +556 to +566
// Fix for: https://github.com/redisson/redisson/issues/6992
// CLUSTER NODES expects List<ClusterNodeInfo>, not String
// This prevents queue-based response binding errors from causing
// ClassCastException in CompletableFuture handlers
if (data.getCommand() != null
        && "CLUSTER".equals(data.getCommand().getName())
        && "NODES".equals(data.getCommand().getSubName())
        && !(result instanceof List)) {
    data.tryFailure(new RedisException("Response type mismatch for command: CLUSTER NODES"));
    return;
}

Copilot AI Mar 1, 2026

This change introduces new behavior (failing CLUSTER NODES when a non-List reply is bound) but there doesn’t appear to be a unit/integration test covering it. Adding a regression test that simulates a queued CLUSTER NODES receiving a +PONG response would help prevent the silent-refresh regression from reappearing.
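
As a rough illustration of what such a regression check could assert, the sketch below simulates FIFO reply binding with plain Java collections. Everything in it is an assumption made for illustration: the Pending class, the send and onReply methods, and the use of IllegalStateException are hypothetical stand-ins, not Redisson's CommandData, CommandDecoder, or RedisException, and a real test would need to drive the actual decoder.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class QueueBindingRegressionSketch {

    // Hypothetical stand-in for a queued command; not Redisson's CommandData.
    static final class Pending {
        final String name;
        final String subName; // may be null for commands without a subcommand
        final CompletableFuture<Object> promise = new CompletableFuture<>();

        Pending(String name, String subName) {
            this.name = name;
            this.subName = subName;
        }
    }

    // Replies are bound to pending commands strictly in FIFO order.
    private final Deque<Pending> queue = new ArrayDeque<>();

    Pending send(String name, String subName) {
        Pending pending = new Pending(name, subName);
        queue.addLast(pending);
        return pending;
    }

    // Binds the next reply to the head of the queue and applies a guard
    // modeled on the one in this PR: CLUSTER NODES must receive a List.
    void onReply(Object result) {
        Pending data = queue.pollFirst();
        if (data == null) {
            return; // nothing is waiting; drop the reply
        }
        if ("CLUSTER".equals(data.name)
                && "NODES".equals(data.subName)
                && !(result instanceof List)) {
            String actual = (result == null) ? "null" : result.getClass().getName();
            data.promise.completeExceptionally(new IllegalStateException(
                    "Response type mismatch for CLUSTER NODES: expected java.util.List, got " + actual));
            return;
        }
        data.promise.complete(result);
    }

    public static void main(String[] args) {
        QueueBindingRegressionSketch channel = new QueueBindingRegressionSketch();
        Pending nodes = channel.send("CLUSTER", "NODES");

        // A stray PONG reaches the slot that belongs to CLUSTER NODES.
        channel.onReply("PONG");

        System.out.println("failed explicitly: " + nodes.promise.isCompletedExceptionally());
    }
}
```

With the guard in place, the promise for CLUSTER NODES completes exceptionally as soon as the PONG is bound to it, so a caller that checks the promise sees an error it can retry on rather than a value of the wrong type.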


Development

Successfully merging this pull request may close these issues.

[Bug] CLUSTER NODES receiving PONG responses causes silent topology refresh failure.

2 participants