CS4231 Reference
Logical clock:
- s happens before t ⇒ logical-clock(s) < logical-clock(t)

Vector clock:
- s happens before t ⇔ vector-clock(s) < vector-clock(t)
- where "v1 < v2" := every field in v1 is less than or equal to the corresponding field in v2, and at least one field is strictly less

(Figure: vector clock example with three users. user1's request events on process1 are timestamped (1,0,0), (2,0,0), (3,0,0); user2's events on process2 are (0,1,0), (0,2,2), (2,3,2); user3's events on process3 are (0,0,1), (0,0,2), (0,0,3).)
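As a concrete illustration of the update rules above, here is a minimal vector-clock sketch in Python; the class and method names (VectorClock, tick, on_receive) are my own, not from the course:

    # Minimal vector clock sketch (names are illustrative, not from the notes).
    class VectorClock:
        def __init__(self, n, i):
            self.v = [0] * n          # one entry per process
            self.i = i                # index of this process

        def tick(self):               # "local computation" or "send" event
            self.v[self.i] += 1

        def on_receive(self, w):      # w = vector piggybacked on the message
            self.v = [max(a, b) for a, b in zip(self.v, w)]
            self.v[self.i] += 1

    def happened_before(v1, v2):
        # v1 < v2 := every field <= the corresponding field, at least one strictly less
        return all(a <= b for a, b in zip(v1, v2)) and v1 != v2

For example, happened_before([2,0,0], [2,3,2]) is True, matching the figure.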
Matrix clock:
- Overview:
  — Each process maintains n vector clocks, one containing data from each process
  — The ith vector on process i is called process i's principal vector
  — The principal vector is the same as the vector clock
  — Non-principal vectors are just piggybacked on messages to update "knowledge"
- Increment C[i] at every "local computation" and "send" event (C := principal vector of process i)
- Receive event, principal vector C: C = pairwise-max(C, V); C[i]++ (V := principal vector of the sender)
- Receive event, non-principal vector C: C = pairwise-max(C, V) (V := corresponding vector of the sender)

(Figure: application of matrix clock with gossip. The principal vectors start as G1 = (1,0,0), G2 = (0,1,0), G3 = (0,0,1); after gossiping, user3's matrix shows that all 3 users have seen message D1.)

Vector clock: know what I have seen
Matrix clock: know what other people have seen
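A sketch of the same rules for the matrix clock, under the convention that row j of M is the local copy of process j's vector (names mine):

    # Matrix clock sketch: M[j] is this process's copy of process j's vector.
    class MatrixClock:
        def __init__(self, n, i):
            self.M = [[0] * n for _ in range(n)]
            self.i = i                # M[i] is the principal vector

        def tick(self):               # "local computation" or "send" event
            self.M[self.i][self.i] += 1

        def on_receive(self, T, s):   # T = sender's matrix, s = sender's index
            # Principal vector: pairwise-max with sender's principal vector, then ++.
            self.M[self.i] = [max(a, b) for a, b in zip(self.M[self.i], T[s])]
            self.M[self.i][self.i] += 1
            # Non-principal vectors: pairwise-max with the corresponding vectors.
            for j in range(len(self.M)):
                if j != self.i:
                    self.M[j] = [max(a, b) for a, b in zip(self.M[j], T[j])]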
Consistent global snapshot: Chandy & Lamport's protocol
- Protocol for a consistent global snapshot (assume FIFO delivery on each channel)
- Initiated by one process
- After each process takes a snapshot, it sends out a control message to all other processes
- If a process receives a control message but has not taken a snapshot, it takes a snapshot immediately
- Taking snapshots for messages on the fly:
  — For each pair of processes s and r, the messages received between r's snapshot time and the control message from s to r are considered to be on the fly, and they are recorded by r upon receipt
  — (In the example figure, process2 records all messages received in this window: these are exactly the set of messages that are on the fly)
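A per-process event-handler sketch of the protocol, assuming FIFO channels; the send primitive and the "CONTROL" marker name are assumptions of mine:

    # Chandy-Lamport style snapshot handler for one process (FIFO assumed).
    class SnapshotProcess:
        def __init__(self, pid, peers):
            self.pid, self.peers = pid, peers
            self.taken = False
            self.recording = {}               # peer -> on-the-fly messages recorded

        def take_snapshot(self, state, send): # send(dest, msg): assumed primitive
            self.taken = True
            self.local_state = state          # record own state
            self.recording = {p: [] for p in self.peers}
            for p in self.peers:
                send(p, "CONTROL")            # control message to all other processes

        def on_message(self, src, msg, state, send):
            if msg == "CONTROL":
                if not self.taken:            # take a snapshot immediately
                    self.take_snapshot(state, send)
                self.recording.pop(src, None) # channel from src is fully recorded
            elif self.taken and src in self.recording:
                self.recording[src].append(msg)  # message on the fly: record it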
Causal order
- If s1 happened before s2, and r1 and r2 are on the same process, then r1 must be before r2

Protocol for ensuring causal order:
- Each process maintains an n-by-n matrix M
- M[i,j] := num. of messages sent from i to j
- Before i sends a message to j, do M[i,j]++ before piggybacking M with the message
- On receiving a message carrying matrix T from process i, deliver the message and set the local matrix M = pairwise-max(M, T) if:
  — T[k,j] ≤ M[k,j] for all k ≠ i, and
  — T[i,j] = M[i,j] + 1
- Otherwise delay the message
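The delivery condition can be written directly as a predicate; a sketch (function names mine):

    # T: matrix piggybacked on a message sent from process i to process j.
    # M: process j's local matrix. Deliver iff both conditions hold.
    def can_deliver(M, T, i, j):
        if T[i][j] != M[i][j] + 1:            # must be the next message from i to j
            return False
        return all(T[k][j] <= M[k][j]         # j has seen every earlier message
                   for k in range(len(M)) if k != i)

    def on_deliver(M, T):
        for a in range(len(M)):               # M = pairwise-max(M, T)
            for b in range(len(M)):
                M[a][b] = max(M[a][b], T[a][b])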
Total order (when broadcasting messages)
- All messages delivered to all processes in exactly the same order
- Total order does not imply causal order, and causal order does not imply total order

Total order broadcast protocol using a designated coordinator:
- Each process maintains a logical clock and a message buffer for undelivered messages
- A message in the buffer is delivered / removed if:
  — All messages in the buffer have been assigned numbers, and
  — This message has the smallest number
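A sketch of the buffering logic on each receiver, with a coordinator assigning consecutive sequence numbers (the message plumbing is elided; names are mine):

    import itertools

    class Coordinator:
        def __init__(self):
            self.counter = itertools.count(1)
        def assign(self, msg_id):             # next global sequence number
            return next(self.counter)

    class Receiver:
        def __init__(self):
            self.buffer = {}                  # msg_id -> assigned number (or None)

        def on_broadcast(self, msg_id):       # message arrives, number unknown yet
            self.buffer[msg_id] = None

        def on_number(self, msg_id, num):     # coordinator's number arrives
            self.buffer[msg_id] = num
            delivered = []
            # Deliver while all buffered messages have numbers; smallest first.
            while self.buffer and all(x is not None for x in self.buffer.values()):
                m = min(self.buffer, key=self.buffer.get)
                delivered.append(m)
                del self.buffer[m]
            return delivered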
Spanning tree usages
- Broadcast
- Any aggregation (count, sum, average, max, min, etc.)
- Example (do-count, as in the sketch below): every process starts with value = 1; each node adds its children's totals to its own (e.g. 1+1 = 2 and 1+1+1 = 3), and the final value computed at the root is 1+2+3+1 = 7; no messages are sent along non-tree edges!
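A sketch of the do-count example, with the spanning tree given as a child map (the tree shape and names are mine, chosen to match the arithmetic above):

    # Count nodes by aggregating up the spanning tree: each node reports
    # 1 + (sum of its children's totals); no non-tree edges are used.
    def do_count(tree, root):
        return 1 + sum(do_count(tree, c) for c in tree.get(root, []))

    # Subtrees contribute 2, 3 and 1; the root adds its own 1: 1+2+3+1 = 7.
    tree = {"root": ["a", "b", "c"], "a": ["a1"], "b": ["b1", "b2"]}
    print(do_count(tree, "root"))     # 7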
Leader Election

Leader election in the anonymous ring:
- Anonymous in the sense that there are no unique identifiers assigned to the processes
- It is impossible using deterministic algorithms (by symmetry)
- For unknown ring size it is not possible to terminate (i.e. to be certain that there is a unique leader after a finite number of steps)
- Randomized algorithm that terminates with probability 1, for known ring size:
  — At each phase, run the Chang-Roberts algorithm (below) with a hop count
  — If a node receives its own message, then it is one of the winners, so it proceeds to the next phase
  — Losers only forward messages in future phases
  — If there is only a single winner (this can be detected by the winning node), then the algorithm stops

Leader election on a ring: Chang-Roberts algorithm (simulated in the sketch below)
- Each node has a unique id
- Each node sends an election message with its own id clockwise
- An election message is forwarded if the id in the message is larger than the node's own id
- Otherwise the message is discarded
- When a node receives its own election message, it becomes the leader

Best/worst performance:
- For distributed systems, communication is the bottleneck; performance is thus often described as message complexity
- Chang-Roberts message complexity: best case 2n-1 messages; worst case n(n-1)/2 messages; the average case can be proved to be O(n log n)

Leader election on a complete graph:
- Each node sends its id to all other nodes
- Wait until you receive n ids
- Biggest id wins
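A round-by-round simulation sketch of Chang-Roberts; the sequential harness is mine, and ids are listed clockwise around the ring:

    def chang_roberts(ids):
        n, sent = len(ids), 0
        msgs = list(ids)                     # each node sends its own id clockwise
        while True:
            nxt = [None] * n
            for pos in range(n):
                m = msgs[pos]
                if m is None:
                    continue
                dst = (pos + 1) % n          # clockwise neighbor
                sent += 1
                if m == ids[dst]:
                    return ids[dst], sent    # own id came back: dst is the leader
                if m > ids[dst]:
                    nxt[dst] = m             # forward larger ids; discard smaller
            msgs = nxt

    print(chang_roberts([3, 1, 4, 2]))       # (4, 8): node with id 4 wins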
Agreement / Distributed Consensus

Failure models:
- Crash failure: a node stops sending any more messages after crashing
- Byzantine failure: a node can send arbitrary messages after failing

Timing models:
- Synchronous: message delay has a known upper bound x, and node processing delay has a known upper bound y (so that it is possible to accurately detect crash failures)
  — Under the synchronous timing model, processes can always proceed in inter-locked rounds, where in each round:
  — Every process sends one message to every other process
  — Every process receives one message from every other process
  — Every process does some local computation
  — (Rounds can be implemented with accurate clocks, or with clocks with bounded error)
- Asynchronous: message delay is finite but unbounded

Goals:
- Termination: all non-failed nodes eventually decide
- Agreement: all non-failed nodes should decide on the same value
- Validity: if all nodes have the same initial input, then that value must be the decision value

Ver0: No node or link failures: trivial.

Ver1: Node crash failures, channels are reliable, synchronous
- Protocol: forward messages to all other nodes for (f+1) rounds to tolerate f failures
- f needs to be an input to the protocol: the user needs to indicate the maximum number of failures to be tolerated
- Need one extra round for each failure; it can be shown that any consensus protocol will take at least (f+1) rounds
- (Key idea for the correctness proof: if there is a round in which no node fails, then every non-failed node will have the same set of messages)

Pseudocode:
    S <- {my input};
    for (int i = 1; i <= f+1; i++) {      // do this for f+1 rounds
        send S to all other nodes;
        receive n-1 sets;
        for each set T received: S <- Union(S, T);
    }
    Decide on min(S);

(Figure: example with inputs 2, 1, and 3 and f = 1. After the round-1 all-to-all broadcast, one node may hold only {2, 3} while another holds {1, 2, 3}; after round 2, all non-failed nodes hold {1, 2, 3} and decide.)
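A small simulation of the Ver1 protocol under one crash; the crash model (the crasher reaches a random subset of nodes mid-broadcast) is my assumption:

    import random

    def consensus_ver1(inputs, f, crash_round, crasher):
        n = len(inputs)
        S = [{v} for v in inputs]                  # S starts as {my input}
        alive = set(range(n))
        for rnd in range(1, f + 2):                # f+1 rounds
            outgoing = {i: set(S[i]) for i in alive}
            hearers = {i: set(alive) for i in alive}
            if rnd == crash_round and crasher in alive:
                # Crasher fails mid-broadcast: only a random subset hears it.
                hearers[crasher] = set(random.sample(sorted(alive), n // 2))
                alive.discard(crasher)
            for i, msg in outgoing.items():
                for j in hearers[i]:
                    if j in alive:
                        S[j] |= msg                # S <- Union(S, T)
        return {i: min(S[i]) for i in alive}       # decide on min(S)

    print(consensus_ver1([2, 1, 3], f=1, crash_round=1, crasher=1))
    # e.g. {0: 1, 2: 1} or {0: 2, 2: 2}; surviving nodes always agree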
Ver2: No node failures, channels may drop messages, synchronous
- It is impossible to reach the goals using a deterministic algorithm
- Impossibility proof via contradiction: consider the case where the communication channel can drop all messages (do we decide on 0 or 1?)

A randomized algorithm:
- For simplicity, consider two processes, P1 and P2 (this can be generalized to multiple processes)
- The algorithm has a predetermined number (r) of rounds
- The adversary determines which messages get lost, before seeing the random choices
- P1 picks a random integer bar in [1..r] at the beginning
- The protocol allows P1 and P2 to each maintain a level variable (L1 and L2), such that:
  — the level is influenced by the adversary, but
  — L1 and L2 can differ by at most 1

Simple algorithm to maintain level:
- bar, input, and current level are attached on all messages
- P1 sends a msg to P2 each round; P2 sends a msg to P1 each round
- Upon P2 receiving a msg with L1 attached, P2 sets L2 = L1 + 1; L1 is maintained similarly
- Lemma: L1 and L2 never decrease in any round
- (Figure: example run where the levels (L1, L2) evolve from (0, 0) to (0, 1), (2, 1), and (2, 3) over rounds 1 to 3.)

Decision rule:
- At the end of the r rounds, P1 decides on 1 iff: P1 knows that P1's input and P2's input are both 1, and L1 ≥ bar
  — (This implies that P1 will decide on 0 if it does not see P2's input.)
- At the end of the r rounds, P2 decides on 1 iff: P2 knows that P1's input and P2's input are both 1, P2 knows bar, and L2 ≥ bar
  — (This implies that P2 will decide on 0 if it does not see P1's input or if it does not see bar.)

When does an error occur?
- For P1 and P2 to decide on different values, one must decide on 1 while the other decides on 0; and for someone to decide on 1, P1's input and P2's input must both be 1
- Case 1: P1 sees P2's input, but P2 does not see P1's input or does not see bar
  — Then L1 = 1 and L2 = 0. An error occurs only when bar = 1.
- Case 2: P2 sees P1's input and bar, but P1 does not see P2's input
  — Then L1 = 0 and L2 = 1. An error occurs only when bar = 1.
- Case 3: P1 sees P2's input, and P2 sees P1's input and bar
  — Define Lmax = max(L1, L2). An error occurs only when bar = Lmax.

Correctness:
- Termination: obvious (r rounds)
- In every case the decisions can disagree for only one value of bar; since bar is uniform over [1..r], the error probability is at most 1/r
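A sketch of the whole randomized protocol against a fixed drop schedule; the within-round message ordering and all variable names are my assumptions:

    import random

    def run(input1, input2, r, drops):
        # drops: set of (round, sender) pairs the adversary fixes up front,
        # before seeing bar (the random choice).
        bar = random.randint(1, r)            # P1's random threshold
        L1 = L2 = 0
        p1_sees_input2 = False
        p2_sees_input1 = p2_sees_bar = False
        for rnd in range(1, r + 1):
            if (rnd, 1) not in drops:         # P1 -> P2, carrying bar/input/L1
                p2_sees_input1 = p2_sees_bar = True
                L2 = max(L2, L1 + 1)          # levels never decrease
            if (rnd, 2) not in drops:         # P2 -> P1, carrying input/L2
                p1_sees_input2 = True
                L1 = max(L1, L2 + 1)
        d1 = int(input1 == 1 and input2 == 1 and p1_sees_input2 and L1 >= bar)
        d2 = int(input1 == 1 and input2 == 1 and p2_sees_input1
                 and p2_sees_bar and L2 >= bar)
        return d1, d2

    print(run(1, 1, r=10, drops={(3, 1), (7, 2)}))
    # d1 != d2 only when bar hits one specific level: probability <= 1/r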
Ver3: Node Byzantine failures, channels are reliable, synchronous
- Protocol: n processes; at most f failures; f+1 phases; each phase has two rounds
- Key idea: each phase has a designated coordinator whose decision can overrule in that phase; a phase whose coordinator is nonfaulty is a deciding phase

Code for process i:
    V[1..n] = 0; V[i] = my input;
    for (k = 1; k <= f+1; k++) {                  // (f+1) phases
        send V[i] to all processes;               // round 1: all-to-all
        set V[1..n] to be the n values received;
        if (value x occurs (> n/2) times in V) decision = x;
        else decision = 0;
        if (k == i) send decision to all;         // round 2: I am the coordinator
        receive coordinatorDecision from the coordinator;
        if (value y occurs (> n/2 + f) times in V) V[i] = y;
        else V[i] = coordinatorDecision;
    }
    decide on V[i];

Correctness summary:
- Lemma 1: if all nonfaulty processes P_i have V[i] = y at the beginning of phase k, then this remains true at the end of phase k
- Lemma 2: if the coordinator in phase k is nonfaulty, then all nonfaulty processes P_i have the same V[i] at the end of phase k
- Termination: obvious (f+1 phases)
- Validity: follows from Lemma 1
- Agreement:
  — With f+1 phases, at least one of them is a deciding phase
  — (From Lemma 2) Immediately after the deciding phase, all nonfaulty processes P_i have the same V[i]
  — (From Lemma 1) In the following phases, V[i] on nonfaulty processes P_i does not change
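A simulation sketch of this protocol with Byzantine senders that send arbitrary, possibly different values to different receivers; the harness and the choice n = 5, f = 1 are mine, and the thresholds assume n is large enough relative to f:

    import random

    def byzantine_consensus(inputs, f, faulty):
        # ids 1..n; faulty = set of Byzantine ids; phase-k coordinator is node k.
        n = len(inputs)
        V = {i: inputs[i - 1] for i in range(1, n + 1) if i not in faulty}
        arb = lambda: random.randint(0, 1)         # arbitrary Byzantine value
        for k in range(1, f + 2):                  # f+1 phases
            heard = {i: [V[j] if j not in faulty else arb()
                         for j in range(1, n + 1)]
                     for i in V}                   # round 1: all-to-all
            decision = {}
            for i, vals in heard.items():
                x = max(set(vals), key=vals.count)
                decision[i] = x if vals.count(x) > n / 2 else 0
            for i, vals in heard.items():          # round 2: coordinator overrules
                coord = decision[k] if k not in faulty else arb()
                y = max(set(vals), key=vals.count)
                V[i] = y if vals.count(y) > n / 2 + f else coord
        return V                                   # decisions of nonfaulty nodes

    print(byzantine_consensus([1, 1, 0, 1, 1], f=1, faulty={3}))
    # all nonfaulty nodes end with the same value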
Self-stabilization: correctness proof
- Theorem (closure): if the system is in a legal state, then it will stay in legal states.
  — Our assumption of instantaneous actions simplifies this proof: we can enumerate all possible actions.
  — For the red process, the only action that can change the system state happens when V == L, and that action will update V to be (V+1) % k. As shown earlier, when V == L, the only legal state is for all n values to be the same. Hence updating V to be (V+1) % k will result in two bands of values, which is also a legal state.
  — For the green process, the only action that can change the system state happens when V ≠ L, and that action will update V to L. As shown earlier, when V ≠ L, the only legal state is to have two bands of values. Updating V to be L will either result in all n values being the same (if the green process is the clockwise neighbor of the red process), or still result in two bands of values (otherwise). In either case, the system state is still legal.
- Theorem (convergence): regardless of the initial state of the system, the system will eventually reach a legal state.
  — Proof: from Lemma 5 and Lemma 6.
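A simulation sketch matching the proof's setup, under my reading that each process's L is its anticlockwise neighbor's value, process 0 is the red process, and k ≥ n:

    import random

    def enabled(vals, i):
        # Red process (i = 0): enabled when V == L; green: enabled when V != L.
        same = vals[i] == vals[i - 1]          # vals[-1] is process 0's neighbor
        return same if i == 0 else not same

    def step(vals, k):
        # Fire one enabled process's action (a central-daemon scheduler).
        i = random.choice([i for i in range(len(vals)) if enabled(vals, i)])
        vals[i] = (vals[i] + 1) % k if i == 0 else vals[i - 1]

    def legal(vals):
        # Legal state := exactly one process is enabled (holds the "token").
        return sum(enabled(vals, i) for i in range(len(vals))) == 1

    n, k = 5, 6
    vals = [random.randrange(k) for _ in range(n)]
    while not legal(vals):                     # convergence
        step(vals, k)
    print(vals)                                # closure: further steps stay legal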