Module 5: Mutual Exclusion
Already Discussed
• In an asynchronous system we can make no
timing assumptions.
• In a synchronous system, we shall assume
that there are bounds
– on the maximum message transmission delay,
– on the time taken to execute each step of a process, and
– on clock drift rates
• The synchronous assumptions allow us to
use timeouts to detect process crashes.
SCOPE, VIT Chennai
Aim
• To consider failures, and how to deal
with them when designing
algorithms
To learn
• Distributed mutual exclusion
• How to ‘elect’ one of a collection of
processes to perform a special role
• Group communication
• Consensus and agreement
Why is it Difficult?
• Centralised solutions not appropriate
– communications bottleneck, single point of failure
• Fixed master-slave arrangements not appropriate
– process crashes
• Varying network topologies
– ring, tree, arbitrary; connectivity problems
• Failures must be tolerated if possible
– link failures
– process crashes
• Impossibility results
– in the presence of failures, especially in the asynchronous model
– impossibility of “coordinated attack”
Synchronous vs. Asynchronous Interaction
• Synchronous distributed system
– Time to execute a step has lower and upper bounds
– Each message is received within a given time
– Each process has a local clock with a bounded drift rate
– Hence failure detection by timeout is possible
• Asynchronous distributed system
– No bounds on process execution time
– No bounds on message reception time
– Arbitrary clock drift rates (the common case in practice)
Co-ordination Problems
• Leader election
– after crash failure has occurred
– after network reconfiguration
• Mutual exclusion
– distributed form of the synchronized access problem
– must use message passing
• Consensus (also called Agreement)
– similar to coordinated attack
– some based on multicast communication
– variants depending on the type of failure, network, etc.
Network Partition
Failure assumptions and models
• We assume reliable communication channels for simplicity (link failures are masked by a reliable communication protocol)
• Detecting that a process has failed
can be reliable or unreliable
Failure Detector
• Service that processes queries about
whether a particular process has
failed.
• The object local to each process is
called a local failure detector.
Section 15.2, Distributed Mutual Exclusion, pages 631-633
Properties of failure detectors
• Unreliable: replies Unsuspected or Suspected
• Reliable: replies Unsuspected or Failed
Unsuspected: a message was recently received from the process, so it is not suspected; it may nevertheless have failed since then
Suspected: no message has been received from the process within the bound; it may have failed or may just be slow
Failed: the process has crashed
• Practical solution: Use Timeouts
Key Notes on Failures
• Assume reliable links, but possible process crashes
• Failure detection service:
– answers queries about whether a process has failed
– how?
• processes send ‘I am here’ messages every T seconds
• the failure detector records these messages
– unreliable, especially in asynchronous systems
• Observations of failures:
– Suspected: no recent communication, but the process could just be slow
– Unsuspected: recent communication, but no guarantee it has not failed since
– Failed: a crash has been determined
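The heartbeat scheme above can be sketched as a local failure detector. This is a minimal, single-process Python sketch, not a production design: the class and method names are hypothetical, time is passed in explicitly so the example is deterministic, and D is an assumed bound on message transmission delay. A process is reported Suspected if no ‘I am here’ message has arrived within T + D seconds of the last one.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LocalFailureDetector:
    """Hypothetical unreliable failure detector driven by heartbeats."""
    period_t: float     # T: interval between 'I am here' messages
    max_delay_d: float  # D: assumed maximum transmission delay
    last_seen: Dict[str, float] = field(default_factory=dict)

    def on_heartbeat(self, process: str, now: float) -> None:
        # Record when the latest 'I am here' message from `process` arrived.
        self.last_seen[process] = now

    def query(self, process: str, now: float) -> str:
        last = self.last_seen.get(process)
        if last is None or now - last > self.period_t + self.max_delay_d:
            # Nothing heard within T + D: crashed, or merely slow.
            return "Suspected"
        # Heard from recently, but it may have failed since then.
        return "Unsuspected"


fd = LocalFailureDetector(period_t=1.0, max_delay_d=0.5)
fd.on_heartbeat("p1", now=10.0)
print(fd.query("p1", now=11.2))  # Unsuspected (1.2 <= 1.5)
print(fd.query("p1", now=12.0))  # Suspected (2.0 > 1.5)
```

The answers are only hints: in an asynchronous system there is no true bound D, so a Suspected process may still be alive. In a synchronous system, where D is a real bound, Suspected could be reported as Failed, which makes the detector reliable.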
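Distributed Mutual Exclusion
• provide a critical region (critical section) in a distributed environment
• coordination is based solely on message passing
• for example, file locking: UNIX provides the lockd daemon for locking files accessed over NFS (NFS servers are stateless, so there is no file locking at the NFS level itself)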
Algorithms for mutual exclusion
Known as the critical section problem:
• Only one process can be in critical section – the
process with the token that allows access to the
resource
• Operations include the following:
– enter() – requests access; can be granted or blocked
– resourceAccess() – access resource in critical section
– exit() – leave critical section – other processes may now enter
• Conditions:
– ME1 (safety) – at most one process in critical section
– ME2 (liveness) – requests eventually succeed
– ME3 (ordering) – if one request to enter happened-before another, entry is granted in that order
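The safety condition ME1 can be checked mechanically on a trace of events. The sketch below is illustrative only: the trace format (a list of (process, event) pairs in a single global order, which a real distributed system does not directly provide) and the function name are assumptions, not part of the algorithms in these notes.

```python
from typing import Iterable, Optional, Tuple


def satisfies_me1(trace: Iterable[Tuple[str, str]]) -> bool:
    """Check ME1 (safety) over a globally ordered trace of events.

    Each event is (process, "enter") when entry is granted, or
    (process, "exit") when the process leaves the critical section.
    """
    inside: Optional[str] = None  # process currently in the critical section
    for process, event in trace:
        if event == "enter":
            if inside is not None:
                return False  # a second process entered: ME1 violated
            inside = process
        elif event == "exit":
            if inside != process:
                raise ValueError(f"{process} exited without having entered")
            inside = None
        else:
            raise ValueError(f"unknown event {event!r}")
    return True


print(satisfies_me1([("p1", "enter"), ("p1", "exit"), ("p2", "enter")]))  # True
print(satisfies_me1([("p1", "enter"), ("p2", "enter")]))                  # False
```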
Algorithms for mutual exclusion
• Performance evaluation:
– overhead and bandwidth consumption: number of messages sent in each entry and exit operation
– client delay incurred by a process at each entry and exit operation
– throughput, measured by the synchronization delay: the delay between one process exiting the critical section and the next process entering it
Distributed mutual exclusion
Various algorithms:
• Central server algorithm
• Ring-based algorithm
• Algorithm using multicast and logical
clocks
• Voting algorithm
• Others
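A central server algorithm
• The server keeps track of a token, the permission to enter the critical section
• A process requests the token from the server
• The server grants the token if it currently holds it (no process is in the critical section); otherwise it queues the request
• A process enters once it gets the token; otherwise it waits
• When done, a process sends a release message to the server and exits; the server then grants the token to the oldest queued request, if any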
Server managing a mutual exclusion token
for a set of processes (Figure: 15.2)
• In the figure, p2’s request has been appended to the queue, which already contained p4’s request.
• p3 exits the critical section, and the server removes p4’s entry and grants p4 permission to enter by replying to it.
• Process p1 does not currently require entry to the critical section.
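The server's behaviour, including the queueing shown in the figure, can be sketched as a single-threaded simulation. Message passing is replaced by method calls, and the names (`LockServer`, `request`, `release`) are hypothetical; a real server would receive request and release messages over the network and reply with grant messages.

```python
from collections import deque
from typing import Deque, Optional


class LockServer:
    """Hypothetical central server holding the single mutual exclusion token."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None    # process holding the token
        self.queue: Deque[str] = deque()     # waiting requests, oldest first

    def request(self, pid: str) -> bool:
        """Handle a request message; True means a grant is sent immediately."""
        if self.holder is None:
            self.holder = pid                # token is free: grant it
            return True
        self.queue.append(pid)               # token is held: queue, no reply yet
        return False

    def release(self, pid: str) -> Optional[str]:
        """Handle a release message; return the process granted next, if any."""
        if self.holder != pid:
            raise ValueError(f"{pid} does not hold the token")
        # Grant to the oldest queued request, or keep the token at the server.
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder


# The situation in Figure 15.2: p3 holds the token, p4 then p2 are queued.
server = LockServer()
server.request("p3")           # True: p3 enters
server.request("p4")           # False: queued
server.request("p2")           # False: queued behind p4
print(server.release("p3"))    # p4
```

Grants follow arrival order at the server, so this sketch does not by itself enforce happened-before ordering (ME3).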
Evaluating the performance
• Properties:
– safety, why?
– liveness, why?
– happened-before ordering not guaranteed, why? [hint: vector clocks in the processes vs. arrival order at the server]
• Performance:
– enter overhead: two messages (request and grant)
– enter delay: time between request and grant (one round trip when the token is free)
– exit overhead: one message (release)
– exit delay: none
– synchronization delay: a release message to the server followed by a grant message (one round trip)
– the centralized server is the bottleneck
A ring-based algorithm
• Arrange the N processes in a logical ring.
• No additional server process is needed to provide mutual exclusion.
• Process pi has a communication channel to the next process in the ring, p(i+1) mod N.
• Mutual exclusion is achieved by obtaining a token in the form of a message passed from process to process in a single direction, e.g. clockwise around the ring.
• The arrangement resembles a ring network topology.
• If a process does not need to enter the critical section when it receives the token, it immediately forwards the token to its neighbour.
• A process that needs the token waits until it receives it, then retains it while in the critical section.
• To exit the critical section, the process sends the token on to its neighbour.
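The token-passing rules above can be simulated in a few lines. This is a sequential sketch with hypothetical function and parameter names: forwarding the token is modelled as moving an index round the ring, and each process that wants entry uses the critical section as soon as the token reaches it.

```python
from typing import Iterable, List, Tuple


def circulate_token(n: int, wanting: Iterable[int], start: int = 0) -> Tuple[List[int], int]:
    """Pass the token clockwise round a ring of processes 0 .. n-1.

    `wanting` holds the processes that want the critical section; the token
    starts at `start`. Returns the order in which processes entered and the
    number of token messages sent. Unlike the real algorithm, the simulation
    stops once every request has been served instead of circulating forever.
    """
    pending = set(wanting)
    if any(not 0 <= p < n for p in pending):
        raise ValueError("process ids must be in range(n)")
    order: List[int] = []
    messages = 0
    pos = start
    while pending:
        if pos in pending:
            order.append(pos)        # holds the token: enter, then exit
            pending.discard(pos)
            if not pending:
                break
        pos = (pos + 1) % n          # forward the token to the neighbour
        messages += 1
    return order, messages


print(circulate_token(6, {4, 1}, start=2))  # ([4, 1], 5)
```

Requests are served in ring order starting from the token's position, regardless of when they were made.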
A ring-based algorithm
(Figure: 15.3)
Performance of the Ring-Based Algorithm
• Continuously consumes network bandwidth: the processes send messages around the ring even when no process requires entry to the critical section.
• The delay experienced by a process requesting entry to the
critical section is between 0 messages (when it has just
received the token) and N messages (when it has just
passed on the token).
• The synchronization delay between one process’s exit from
the critical section and the next process’s entry is anywhere
from 1 to N message transmissions.
An algorithm using multicast and logical clocks
• Ricart and Agrawala's algorithm implements mutual exclusion between N peer processes using multicast.
• Processes that require entry to a critical section multicast a request message, and can enter it only when all the other processes have replied to this message.
• The processes p1 to pN have distinct numeric identifiers.
• Processes are assumed to have communication channels to one another.
• Each process pi keeps a Lamport clock; requests carry timestamps of the form (T, pi), so all requests are totally ordered (ties on T are broken by process identifier).
• Each process records its state in a variable state:
– RELEASED: outside the critical section
– WANTED: wanting entry
– HELD: in the critical section
An algorithm using multicast and logical clocks
• If a process requests entry and the state of all other
processes is RELEASED, then all processes will reply
immediately to the request and the requester will
obtain entry.
• If some process is in the state HELD, then that
process will not reply to requests until it has finished
with the critical section, and so the requester cannot
gain entry in the meantime.
• If two or more processes request entry at the same
time, then whichever process’s request bears the
lowest timestamp will be the first to collect N – 1
replies, granting it entry next.
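The reply rule behind these cases can be sketched as a sequential simulation of Ricart and Agrawala's algorithm. Assumptions, for illustration only: message delivery is modelled as direct method calls, all requesters multicast their requests before any request is delivered (so the requests are concurrent), replies do not carry timestamps, and the class and function names are hypothetical. A process defers its reply if it is HELD, or if it is WANTED and its own request (T, pi) is smaller than the incoming one.

```python
from typing import List, Tuple

RELEASED, WANTED, HELD = "RELEASED", "WANTED", "HELD"
Stamp = Tuple[int, int]  # (Lamport time T, process id)


class Peer:
    def __init__(self, pid: int, n: int) -> None:
        self.pid, self.n = pid, n
        self.clock = 0                   # Lamport clock
        self.state = RELEASED
        self.request_ts: Stamp = (0, pid)
        self.replies = 0
        self.deferred: List[int] = []    # requesters to answer on exit

    def make_request(self) -> Stamp:
        # Move to WANTED and timestamp the request to be multicast.
        self.state = WANTED
        self.clock += 1
        self.request_ts = (self.clock, self.pid)
        self.replies = 0
        return self.request_ts

    def on_request(self, ts: Stamp) -> bool:
        # Lamport receive rule, then decide: reply now (True) or defer (False).
        self.clock = max(self.clock, ts[0]) + 1
        if self.state == HELD or (self.state == WANTED and self.request_ts < ts):
            self.deferred.append(ts[1])
            return False
        return True

    def on_reply(self) -> None:
        self.replies += 1
        if self.replies == self.n - 1:   # all N - 1 peers replied: enter
            self.state = HELD

    def release(self) -> List[int]:
        # Leave the critical section and reply to every deferred request.
        self.state = RELEASED
        waiting, self.deferred = self.deferred, []
        return waiting


def run(n: int, requesters: List[int]) -> List[int]:
    """Return the order in which concurrent requesters enter."""
    peers = [Peer(i, n) for i in range(n)]
    stamps = {p: peers[p].make_request() for p in requesters}
    for p, ts in stamps.items():          # deliver each multicast request
        for q in range(n):
            if q != p and peers[q].on_request(ts):
                peers[p].on_reply()
    order: List[int] = []
    while True:
        inside = [peer.pid for peer in peers if peer.state == HELD]
        if not inside:
            return order
        assert len(inside) == 1, "ME1 violated"
        order.append(inside[0])
        for q in peers[inside[0]].release():
            peers[q].on_reply()


print(run(3, [2, 1]))  # [1, 2]: equal T, so the lower id wins
```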
Multicast synchronization
(Figure: 15.5)
An algorithm using multicast and logical clocks
• The advantage of this algorithm is
that its synchronization delay is only
one message transmission time.
• Both the previous algorithms
incurred a round-trip synchronization
delay.
Maekawa’s Voting Algorithm
• Maekawa observed that for a process to enter a critical section, it is not necessary for all of its peers to grant it access.
• Processes have to only obtain permission to enter from
subsets of their peers, as long as the subsets used by any two
processes overlap.
• Processes vote for one another to enter the critical section.
• A ‘candidate’ process must collect sufficient votes to enter.
• Processes in the intersection of two sets of voters ensure that
at most one process can enter the critical section, by casting
their votes for only one candidate.
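The slides do not say how the overlapping subsets are chosen. One well-known construction, assumed here, places N = k × k processes in a k × k grid and gives each process the union of its row and column as its voting set: the row of any process meets the column of any other, so any two sets intersect, and each set has 2k − 1 members, roughly 2√N. The function names are hypothetical.

```python
from itertools import combinations
from math import isqrt
from typing import List, Set


def grid_voting_sets(n: int) -> List[Set[int]]:
    """Voting set of each process 0 .. n-1: its row plus its column in a grid."""
    k = isqrt(n)
    if k * k != n:
        raise ValueError("this sketch assumes n is a perfect square")
    sets = []
    for p in range(n):
        row, col = divmod(p, k)
        same_row = {row * k + c for c in range(k)}
        same_col = {r * k + col for r in range(k)}
        sets.append(same_row | same_col)
    return sets


def pairwise_overlap(sets: List[Set[int]]) -> bool:
    """True if every two voting sets share at least one voter."""
    return all(a & b for a, b in combinations(sets, 2))


sets = grid_voting_sets(9)
print(sorted(sets[0]))          # [0, 1, 2, 3, 6]
print(pairwise_overlap(sets))   # True
```

The overlap is what lets the shared voters enforce ME1; this sketch only builds the sets and does not implement the voting protocol itself.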
Figure 15.6
Maekawa’s voting algorithm
Elections
• Choosing a unique process for a particular role
• for example, choosing the server in distributed algorithms that require a central server
• a process calls (initiates) an election, e.g. when it detects that the central server has failed; each process calls at most one election at a time
• multiple concurrent elections can be called by different processes
• participant: a process engaged in an election
• the process with the largest id wins
• each process pi has a variable electedi, initially ⊥ (undefined, i.e. "don't know")
Elections
• Election algorithms:
• Ring-based algorithm
• Bully algorithm
A ring-based algorithm
• logical ring, could be unrelated to the physical
configuration
• pi sends messages to p(i+1) mod N
• assumes no failures occur
• elect the coordinator with the largest id
• initially, every process is a non-participant
• any process can call an election:
– marks itself as participant
– places its id in an election message
– sends the message to its neighbour
Ring-based algorithm
• receiving an election message:
– if id > myid: forward the message, mark participant
– if id < myid:
• non-participant: replace id with myid, forward the message, mark participant
• participant: stop forwarding (why? think about multiple concurrent elections)
– if id = myid: coordinator found; mark non-participant, set electedi := id, send an elected message with myid
• receiving an elected message:
– if id != myid: mark non-participant, set electedi := id, forward the message
– if id = myid: stop forwarding
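The election rules above can be traced with a sequential simulation. Assumptions: a single initiator, no failures, distinct ids listed in clockwise ring order, and a hypothetical function name; message sends are counted rather than actually transmitted.

```python
from typing import List, Optional, Tuple


def ring_election(ids: List[int], initiator: int) -> Tuple[Optional[int], int, int, List[Optional[int]]]:
    """Simulate one election started by process index `initiator`.

    `ids` lists the process ids in clockwise ring order. Returns the
    coordinator id, the number of election and elected messages, and each
    process's final elected_i value.
    """
    n = len(ids)
    participant = [False] * n
    elected: List[Optional[int]] = [None] * n  # None plays the role of "don't know"

    # The initiator marks itself participant and sends its own id clockwise.
    participant[initiator] = True
    msg_id, pos, election_msgs = ids[initiator], (initiator + 1) % n, 1

    while msg_id != ids[pos]:
        if msg_id < ids[pos]:
            if participant[pos]:
                return None, election_msgs, 0, elected  # message swallowed
            msg_id = ids[pos]                # substitute the larger own id
        participant[pos] = True              # forward and mark participant
        pos = (pos + 1) % n
        election_msgs += 1

    # ids[pos] received its own id back: it becomes the coordinator.
    coord = pos
    participant[coord], elected[coord] = False, ids[coord]
    elected_msgs, nxt = 0, (coord + 1) % n
    while True:
        elected_msgs += 1                    # elected message arrives at nxt
        if nxt == coord:
            break                            # back at the coordinator: stop
        participant[nxt], elected[nxt] = False, ids[coord]
        nxt = (nxt + 1) % n
    return ids[coord], election_msgs, elected_msgs, elected


print(ring_election([9, 3, 5, 1], 0)[:3])  # (9, 4, 4): initiator has the largest id
print(ring_election([3, 5, 1, 9], 0)[:3])  # (9, 7, 4): its anticlockwise neighbour does
```

With N = 4, the two calls give totals of 8 and 11 messages, matching the 2N and 3N - 1 figures in the performance discussion.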
A ring-based election in progress
Ring-based algorithm
• Properties:
– safety: only the process with the largest id
can send an elected message
– liveness: every process in the ring
eventually participates in the election; extra
elections are stopped
Ring-based algorithm
• Performance:
– one election, best case, when?
• N election messages
• N elected messages
• turnaround: 2N messages
– one election, worst case, when?
• 2N - 1 election messages
• N elected messages
• turnaround: 3N - 1 messages
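One way to account for these counts, assuming a single initiator and no failures: the best case is when the initiator itself has the largest id, and the worst case is when its anticlockwise neighbour (the last process its message reaches) has the largest id.

```latex
\begin{aligned}
\text{best case:}\quad & \underbrace{N}_{\text{election}} + \underbrace{N}_{\text{elected}} = 2N \\
\text{worst case:}\quad & \underbrace{(N-1)}_{\text{reach largest id}}
  + \underbrace{N}_{\text{largest id circulates}}
  + \underbrace{N}_{\text{elected}} = 3N-1
\end{aligned}
```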
The bully algorithm
• Processes can crash, and crashes can be detected by other processes (by timeouts, so the system is assumed to be synchronous)
• Each process knows all the other
processes and can communicate with
them
• Messages: election, answer,
coordinator
The bully algorithm
• start an election
– when a process detects that the coordinator has failed
– it sends an election message to all processes with higher ids (except the failed coordinator) and waits for answers
– if no answers arrive within time T,
• it is the coordinator
• it sends a coordinator message (with its id) to all processes with lower ids
– else
• it waits for a coordinator message
• it starts another election if that times out
The bully algorithm
The bully algorithm
• Receiving an election message
– send an answer message back
– start an election if it hasn't started one: send election messages to all higher-id processes (including the “failed” coordinator, which might be up by now)
• Receiving a coordinator message
– set electedi to the new coordinator
• To become the coordinator, a process has to start an election
• When a crashed process is replaced
– the new process starts an election and
– can replace the current coordinator if it has the highest id (hence “bully”)
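The bully rules can be traced with a sequential sketch. Assumptions: the system is synchronous, so "no answer within T" is modelled simply by checking whether the higher process is in a hypothetical `alive` set; message passing is replaced by a work list of processes that must start an election; and the function name is hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def bully_election(ids: Iterable[int], alive: Set[int], initiator: int) -> Tuple[int, Dict[int, int]]:
    """Return the new coordinator and the elected value at each live process."""
    all_ids = sorted(ids)
    if initiator not in alive:
        raise ValueError("the initiator must be alive")
    started: Set[int] = set()
    to_start = [initiator]
    coordinator = initiator
    while to_start:
        p = to_start.pop()
        if p in started:
            continue                     # p has already started an election
        started.add(p)
        # p sends election messages to every higher id; crashed ones never answer.
        answers = [q for q in all_ids if q > p and q in alive]
        if answers:
            to_start.extend(answers)     # each answerer starts its own election
        else:
            coordinator = p              # no answer within T: p is coordinator
    # The coordinator message reaches every live process with a lower id.
    elected = {q: coordinator for q in alive if q <= coordinator}
    return coordinator, elected


print(bully_election(range(1, 6), alive={1, 2, 3, 4}, initiator=2)[0])     # 4
print(bully_election(range(1, 6), alive={1, 2, 3, 4, 5}, initiator=5)[0])  # 5
```

The second call models a replacement process with the highest id calling an election: it receives no answers and takes over from the current coordinator.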
Consensus Introduction
• Reaching agreement in a distributed manner
– Mutual exclusion: who can enter the critical region
– Totally ordered multicast: the order of message delivery
– Byzantine generals: attack or retreat?
• Consensus problem
– Agree on a value after one or more of the processes has
proposed what the value should be
Consensus Introduction
• To reach consensus, every process pi begins in the undecided state and proposes a single value vi, drawn from a set D (i = 1, 2, ..., N)
• The processes communicate with one another, exchanging values; each process then sets its decision variable di and enters the decided state.
• Example: two processes propose ‘proceed’ and a third proposes ‘abort’ but then crashes. The two processes that remain correct each decide ‘proceed’.
Consensus Problem Example
Consensus
• Consensus Requirements:
• Termination: eventually each correct process sets its decision variable
• Agreement: the decision value of all correct processes is the same:
– if pi and pj are correct and have entered the decided state, then di = dj (i, j = 1, ..., N)
• Integrity: if the correct processes all proposed the same value, then any correct process in the decided state has chosen that value. Otherwise, a majority function can be applied to the proposed values; for ordered values, the minimum or maximum function can also be used.
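The decision functions mentioned under Integrity can be sketched as follows. This is illustrative only: it assumes every correct process has already collected the same list of proposed values (guaranteeing that despite crashes is the hard part of consensus and is not shown), and the function name and tie-breaking rule are hypothetical. Because every correct process applies the same deterministic function to the same values, they all reach the same decision.

```python
from collections import Counter
from typing import Any, List


def decide(values: List[Any], ordered: bool = False) -> Any:
    """Deterministic decision over the proposals a process has collected.

    If all proposals are equal, that value is returned (integrity). Otherwise
    the minimum is used for ordered values, else the majority value, with
    ties broken by taking the smallest tied value (an arbitrary choice that
    requires the values to be comparable).
    """
    if not values:
        raise ValueError("no proposals collected")
    if ordered:
        return min(values)
    counts = Counter(values)
    top = max(counts.values())
    return min(v for v, c in counts.items() if c == top)


# If the crashed process's 'abort' reached no one:
print(decide(["proceed", "proceed"]))             # proceed
print(decide(["proceed", "abort", "proceed"]))    # proceed (majority)
print(decide([7, 3, 5], ordered=True))            # 3
```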
Consensus
• Interactive consistency problem: variant of the
consensus problem
• Objective: correct processes must agree on a
vector of values, one for each process
• Properties to satisfy:
– Termination: eventually each correct process sets its decision variable
– Agreement: the decision vector of all correct processes is the same
– Integrity: if pi is correct, then all correct processes decide on vi as the ith component of their vector
Consensus
• Byzantine generals problem: variant of the consensus problem
• Objective: a distinguished process (the commander) supplies a value that the others must agree upon
• Properties to satisfy:
– Termination: eventually each correct process sets its decision variable
– Agreement: the decision value of all correct processes is the same; if pi and pj are correct, then di = dj (i, j = 1, ..., N)
– Integrity: if the commander is correct, then all correct processes decide on the value that the commander proposed
Consensus
Byzantine agreement in a synchronous system:
Example : a system composed of three processes (must agree on a
binary value 0 or 1)
process j is faulty
Commander
1 1
Nodei Nodej
0
References
• Coulouris, G. et al., Distributed Systems: Concepts and Design, Pearson, 2001.