Tags: Dentrax/serf
updating to memberlist v0.5.0 (hashicorp#664) * updating to memberlist v0.5.0 * running go mod tidy
Add metrics labels (hashicorp#658) * Add labels to metrics * upgrade memberlist to 0.4.0 Co-authored-by: R.B. Boyer
Add label for ping msg; config memberlist metrics label (hashicorp#654) - Add label for ping msg - config metrics for memberlist
finish the leave process if broadcasting leave times out (hashicorp#640) * finish the leave process if broadcasting leave times out * Log broadcast timeout as WARN and finish the Leave process. * increase timeout
Merge pull request hashicorp#614 from hashicorp/feature/per-node-reconnect-timeout
Avoid issue where two unique leave events for the same node could lead to infinite rebroadcast storms (hashicorp#606)
Fixes this scenario: when two leave events for the same node (with different ltimes) are received by two or more additional nodes without any intermediate state transitions, those nodes set up an infinite gossip echo chamber that keeps rebroadcasting the original node's leave.
Once in this situation, the only ways to recover are to either:
- Bring the node back so it can send out a fresh "alive" message.
- Use force-leave prune to purge the node's existence from the cluster.
There are two known ways to arrive in this situation:
1. Have a node leave the cluster and then initiate a force-leave from a different node. This is the easiest way to reproduce.
2. Have a node leave, rejoin, and leave again in quick succession. This requires winning a lot of races in just the right way: gossip from the first leave must still be rebroadcasting naturally WHILE the second round of leaves is rebroadcasting, AND those rounds of gossip must arrive at the target nodes before they see the rebroadcast of the intervening REJOIN.
The fix is simple: when we witness a node leave event, we update the Lamport time associated with that node before rebroadcasting, so that the ltime check rejects later copies of the same event instead of processing them again and again.
Fixes hashicorp/consul#7960
Fixes hashicorp/consul#8179
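The Lamport-time check behind the hashicorp#606 fix can be modelled with a simplified, hypothetical Go sketch. This is not serf's actual code: `member`, `handleLeave` and `simulate` are illustrative names, and the "cluster" is just two peers that echo every accepted leave message back to each other. With `fixed` set to false, an already-leaving member keeps its old `statusLTime`, so the leave with the higher ltime keeps passing the `ltime <= statusLTime` check and bounces forever (the run stops only at `maxSteps`). With the fix, every copy is eventually rejected and the queue drains.

```go
package main

import "fmt"

type LamportTime uint64

type MemberStatus int

const (
	StatusAlive MemberStatus = iota
	StatusLeaving
	StatusLeft
)

// member is a hypothetical, stripped-down view of a peer's state.
type member struct {
	status      MemberStatus
	statusLTime LamportTime
}

// handleLeave reports whether a leave message should be rebroadcast.
// fixed toggles whether an already-leaving member records the new ltime.
func handleLeave(m *member, ltime LamportTime, fixed bool) bool {
	if ltime <= m.statusLTime {
		return false // stale or duplicate event: drop it
	}
	switch m.status {
	case StatusAlive:
		m.status = StatusLeaving
		m.statusLTime = ltime
		return true
	case StatusLeaving, StatusLeft:
		if fixed {
			m.statusLTime = ltime // remember it so later copies are rejected
		}
		return true
	}
	return false
}

// simulate delivers leave messages between two peers that echo every
// accepted message to each other, and returns how many deliveries
// happened before the queue drained or maxSteps was reached.
func simulate(fixed bool, maxSteps int) int {
	peers := []*member{{status: StatusAlive, statusLTime: 1}, {status: StatusAlive, statusLTime: 1}}
	type msg struct {
		to    int
		ltime LamportTime
	}
	// Two distinct leave events (ltime 5 and 7) for the same node reach both peers.
	queue := []msg{{0, 5}, {1, 5}, {0, 7}, {1, 7}}
	steps := 0
	for len(queue) > 0 && steps < maxSteps {
		m := queue[0]
		queue = queue[1:]
		steps++
		if handleLeave(peers[m.to], m.ltime, fixed) {
			queue = append(queue, msg{1 - m.to, m.ltime}) // rebroadcast to the other peer
		}
	}
	return steps
}

func main() {
	fmt.Println("without fix:", simulate(false, 100))
	fmt.Println("with fix:", simulate(true, 100))
}
```

Running it prints `without fix: 100` (the step cap, i.e. the storm never ends) and `with fix: 8`. Note that a single leave event is not enough to trigger the loop in this model; it takes a second event with a higher ltime arriving after the first state transition, which matches the "two unique leave events" condition described above.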
[backport on v0.8.5] Avoid issue where two unique leave events for the same node could lead to infinite rebroadcast storms (hashicorp#607) Backport of hashicorp#606 to follow serf v0.8.5