Added support for remapping the cluster's keyspace on a failover #2025

Merged
michael-grunder merged 1 commit into phpredis:develop from barshaul:develop on Nov 9, 2021

@barshaul

When phpredis is used with RedisCluster and a failover takes place on the cluster, the old primary is not removed from the cached list of masters.
As a result, if a user pings every master after a failover, an exception is thrown:

foreach ($obj_cluster->_masters() as $arr_master) {
    $obj_cluster->ping($arr_master);
}
 
Array PHP Fatal error:  Uncaught RedisClusterException: Unable to send command at the specified node in /home/ec2-user/clusterTest2/test.php:31
Stack trace:
#0 /home/ec2-user/clusterTest2/test.php(31): RedisCluster->ping(Array)
#1 {main}
  thrown in /home/ec2-user/clusterTest2/test.php on line 31

This PR adds support for remapping the cluster's keyspace when a failover occurs.
In the current implementation, when we get a MOVED error, we check whether the redirected address belongs to an existing primary; if it doesn't, we create a new node and add it to the cluster's primaries. After a failover, this leaves a stale primary in the _masters array.
In this fix, I added a check to detect a failover: if the redirected node was a replica of the master that currently owns the slot, then a failover has occurred. In that case the cluster's topology has changed, so we call cluster_map_keyspace() to reinitialize the cluster's nodes cache.
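
The decision described above can be sketched as a toy PHP model. This is only an illustration of the logic, not the actual C change in phpredis: the function names `classifyMoved` and `handleMoved`, the array layout, the slot numbers, and the `$fetchTopology` callback (standing in for `cluster_map_keyspace()`) are all hypothetical.

```php
<?php
// Toy model of the MOVED handling described in this PR.
// Assumption: $masters maps "host:port" => ['replicas' => [...]],
// and $slotOwner maps a slot number => "host:port" of its primary.

/** Decide how to react to a MOVED redirection for one slot. */
function classifyMoved(array $masters, array $slotOwner, int $slot, string $target): string
{
    if (isset($masters[$target])) {
        return 'repoint';   // target is already a known primary
    }
    $owner = $slotOwner[$slot] ?? null;
    if ($owner !== null && in_array($target, $masters[$owner]['replicas'], true)) {
        return 'remap';     // a replica of the slot's owner was promoted: failover
    }
    return 'add_node';      // unknown address: register it as a new primary
}

/** Apply the decision; $fetchTopology stands in for cluster_map_keyspace(). */
function handleMoved(array &$masters, array &$slotOwner, int $slot, string $target, callable $fetchTopology): string
{
    $action = classifyMoved($masters, $slotOwner, $slot, $target);
    switch ($action) {
        case 'remap':
            // Rebuild the whole cache so the stale primary disappears.
            [$masters, $slotOwner] = $fetchTopology();
            break;
        case 'add_node':
            $masters[$target] = ['replicas' => []];
            $slotOwner[$slot] = $target;
            break;
        default:
            $slotOwner[$slot] = $target;
    }
    return $action;
}

// Scenario from the test below: 6377 is a replica of 6378 and gets promoted.
$masters = [
    '127.0.0.1:6379' => ['replicas' => []],
    '127.0.0.1:6378' => ['replicas' => ['127.0.0.1:6377']],
];
$slotOwner = [0 => '127.0.0.1:6379', 12000 => '127.0.0.1:6378'];

// Hypothetical fresh topology, as a real remap would discover it.
$afterFailover = function () {
    return [
        ['127.0.0.1:6379' => ['replicas' => []], '127.0.0.1:6377' => ['replicas' => []]],
        [0 => '127.0.0.1:6379', 12000 => '127.0.0.1:6377'],
    ];
};

$action = handleMoved($masters, $slotOwner, 12000, '127.0.0.1:6377', $afterFailover);
echo $action, ': ', implode(', ', array_keys($masters)), PHP_EOL;
// prints "remap: 127.0.0.1:6379, 127.0.0.1:6377"
```

In this model, only the `add_node` branch can leave a stale entry behind, which is why a redirect to a known replica of the slot's owner is routed to a full remap instead.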

I ran the following test scenario:

  1. Created a cluster with 2 shards = [[primary=6379], [primary=6378,replica=6377]]
  2. Created a RedisCluster instance and printed its masters:
foreach ($obj_cluster->_masters() as $arr_master) {
    print_r($arr_master);
}

Output:

Array
(
    [0] => 127.0.0.1
    [1] => 6379
)
Array
(
    [0] => 127.0.0.1
    [1] => 6378
)
  3. Killed primary 6378 and executed CLUSTER FAILOVER TAKEOVER on replica 6377, so replica 6377 became the new primary of this shard
  4. Ran a GET command for a key in a slot of the failed primary (to trigger the MOVED error)
  5. Checked the _masters array: the newly promoted primary (6377) had been added, but the old primary (6378), which had been killed, had not been removed:
Array
(
    [0] => 127.0.0.1
    [1] => 6379
)

Array
(
    [0] => 127.0.0.1
    [1] => 6377
)

Array
(
    [0] => 127.0.0.1
    [1] => 6378
)

With the changes in this PR, the _masters array is updated correctly after the failover:

Array
(
   [0] => 127.0.0.1
   [1] => 6379
)

Array
(
   [0] => 127.0.0.1
   [1] => 6377
)

michael-grunder self-requested a review October 31, 2021 18:44
michael-grunder self-assigned this Oct 31, 2021
michael-grunder removed their request for review October 31, 2021 18:55
michael-grunder merged commit bce6929 into phpredis:develop Nov 9, 2021