Chapter 7 - Consistency and Replication
UNIVERSITY OF GONDAR DEPARTMENT OF COMPUTER SCIENCE 1
Introduction
data are generally replicated to enhance reliability and
improve performance
but replication may create inconsistency
consistency models for shared data are often hard to
implement in large-scale distributed systems; hence simpler
models such as client–centric consistency models are used
Objectives of the Chapter
we discuss
why replication is useful and its relation with
scalability; in particular object-based replication
consistency models for shared data designed for
parallel computers which are also useful in distributed
shared memory systems
client–centric consistency models
how consistency and replication are implemented
7.1 Reasons for Replication and Object Replication
two major reasons: reliability and performance
reliability
if a file is replicated, we can switch to other replicas if
there is a crash on our replica
we can provide better protection against corrupted
data; similar to mirroring in non-distributed systems
performance
replication helps if the system has to scale in size and geographical area
place a copy of the data in the proximity of the processes
using it, reducing access time and increasing
performance; for example a Web server is accessed
by thousands of clients from all over the world
caching is strongly related to replication; it is normally done by
clients
Object Replication
consider a distributed object shared by multiple clients
organization of a distributed remote object shared by two different clients
before replicating an object, we must decide how to protect it
against simultaneous access by multiple clients; there are two
methods
1. the object itself can handle concurrent invocations on its
own; e.g., a Java object can be constructed as a monitor by
declaring the object’s methods to be synchronized (only one
thread is allowed to proceed while the others are blocked until
further notice)
2. the server is responsible for concurrency control using an
object adapter, e.g., using a single thread per object; the
single thread serializes all incoming invocations
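The two methods above can be sketched in Python rather than Java. This is a minimal illustration, not a real object adapter: the class names MonitorCounter, PlainCounter and SingleThreadAdapter and the helper run_clients are hypothetical. The first class protects itself with a lock (the monitor approach); the adapter uses a single server thread that takes invocations from a queue one at a time.

```python
import queue
import threading


class MonitorCounter:
    """Method 1: the object protects itself, like a Java monitor."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        with self._lock:  # one thread proceeds; the others block here
            self.value += 1


class PlainCounter:
    """An object with no concurrency control of its own."""

    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1


class SingleThreadAdapter:
    """Method 2: one server thread serializes all incoming invocations."""

    def __init__(self, obj):
        self._obj = obj
        self._requests = queue.Queue()
        self._worker = threading.Thread(target=self._serve)
        self._worker.start()

    def _serve(self):
        while True:
            method, reply = self._requests.get()
            if method is None:  # shutdown marker
                return
            reply.put(getattr(self._obj, method)())

    def invoke(self, method):
        reply = queue.Queue(maxsize=1)
        self._requests.put((method, reply))
        return reply.get()  # wait until the server thread has run the call

    def shutdown(self):
        self._requests.put((None, None))
        self._worker.join()


def run_clients(call, clients=4, calls_each=500):
    """Start several client threads that all invoke `call` repeatedly."""
    threads = [
        threading.Thread(target=lambda: [call() for _ in range(calls_each)])
        for _ in range(clients)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


monitor = MonitorCounter()
run_clients(monitor.increment)

plain = PlainCounter()
adapter = SingleThreadAdapter(plain)
run_clients(lambda: adapter.invoke("increment"))
adapter.shutdown()

print(monitor.value, plain.value)
```

In both cases no increment is lost, but the work is placed differently: in the first the object carries the locking, in the second the server does and the object stays unaware of concurrency.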
when objects are replicated, the replicas need additional
synchronization to ensure that concurrent invocations are
performed in the correct order at each of the replicas; e.g.,
our previous example of the bank account database
two approaches can be used to handle synchronization
1. the object is aware of the replication and ensures that the
replicas stay consistent; this allows object-specific
replication strategies to be constructed
a distributed system for replication-aware distributed objects
2. the distributed system manages the replication
it ensures that concurrent invocations are passed to the
various replicas in the correct order
simpler for application developers
difficult to implement object-specific solutions
a distributed system responsible for replica management
Replication as Scaling Technique
replication and caching are widely applied as scaling
techniques
processes can use local copies and limit access time and
traffic
however, we need to keep the copies consistent; but this
may
1. require more network bandwidth
if the copies are refreshed more often than they are used (low
access-to-update ratio), the cost (bandwidth) outweighs
the benefits; many updates are propagated without ever being
read
2. itself be subject to serious scalability problems
intuitively, a read operation made on any copy should
return the same value (the copies are always the same)
thus, when an update operation is performed on one
copy, it should be propagated to all copies before a
subsequent operation takes place
this is sometimes called tight consistency (a write is
performed at all copies in a single atomic operation or
transaction)
difficult to implement since it means that all replicas
first need to reach agreement on when exactly an
update is to be performed locally, say by deciding a
global ordering of operations using Lamport
timestamps and this takes a lot of communication time
dilemma
scalability problems can be alleviated by applying
replication and caching, leading to a better performance
but, keeping copies consistent requires global
synchronization, which is generally costly in terms of
performance
solution: loosen the consistency constraints
updates do not need to be executed as atomic operations
(no more instantaneous global synchronization); but copies
may not be always the same everywhere
to what extent the consistency can be loosened depends on
the specific application (the purpose of data as well as
access and update patterns)
7.2 Data-Centric Consistency Models
consistency has always been discussed
in terms of read and write operations on shared data
available by means of (distributed) shared memory, a
(distributed) shared database, or a (distributed) file
system
we use the broader term data store, which may be
physically distributed across multiple machines
assume also that each process has a local copy of the data
store and write operations are propagated to the other
copies
the general organization of a logical data store, physically distributed and replicated across multiple
processes
a consistency model is a contract between processes and the
data store
processes agree to obey certain rules
then the data store promises to work correctly
ideally, a process that reads a data item expects a value that
shows the results of the last write operation on the data
in a distributed system and in the absence of a global clock
and with several copies, it is difficult to know which is the last
write operation
to simplify the implementation, each consistency model
restricts what read operations return
data-centric consistency models to be discussed
1. strict consistency
2. sequential consistency
3. linearizability
4. causal consistency
5. FIFO consistency
6. weak consistency
7. release consistency
8. entry consistency
1. Strict Consistency
the most stringent consistency model and is defined by the
following condition:
Any read on a data item x returns a value corresponding
to the result of the most recent write on x.
this relies on absolute global time
sometimes it is physically impossible; for example
x is stored only on machine B
a process on machine A reads x at time T1, i.e., a
message is sent to B
a process on machine B does a write on x at
time T2 (T1 < T2)
if T2 - T1 is 1 nanosecond, and if the machines are 3
meters apart, the read request can reach B before the
new write operation only if the signal travels at 10 times the
speed of light
the requirement is too stringent to demand
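The factor of 10 in the example can be checked with a few lines of arithmetic. The variable names are illustrative; the only outside fact used is the speed of light in vacuum.

```python
SPEED_OF_LIGHT = 299_792_458   # meters per second
distance = 3.0                 # meters between machines A and B
interval = 1e-9                # T2 - T1, one nanosecond

# Speed the read request would need to cover the distance within the interval.
required_speed = distance / interval
ratio = required_speed / SPEED_OF_LIGHT
print(f"required speed is about {ratio:.1f} times the speed of light")
```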
the following notations and assumptions will be used
Wi(x)a means a write by Pi to data item x with the value a has
been done
Ri(x)b means a read by Pi of data item x returning the value
b has been done
the index may be omitted when there is no confusion as to
which process is accessing data
assume that initially each data item is NIL
consider the following example; write operations are done
locally and later propagated to other replicas
behavior of two processes operating on the same data item
a) a strictly consistent data store
b) a data store that is not strictly consistent; P2’s first read may be, for example, after 1 nanosecond of P1’s write
the solution is to relax absolute time and consider time
intervals
2. Sequential Consistency
strict consistency is the ideal but impossible to implement
fortunately, most programs do not need strict consistency
sequential consistency is a slightly weaker consistency
a data store is said to be sequentially consistent when it
satisfies the following condition:
The result of any execution is the same as if the (read and
write) operations by all processes on the data store were
executed in some sequential order and the operations of
each individual process appear in this sequence in the
order specified by its program
i.e., all processes see the same interleaving of operations
time does not play a role; no reference to the “most recent”
write operation
example: four processes operating on the same data item x
a) a sequentially consistent data store; the write operation of P2 appears
to have taken place before that of P1, and this holds for all processes
b) a data store that is not sequentially consistent; not all processes see the same
interleaving of write operations: to P3, it appears as if the data item
has first been changed to b, and later to a, but P4 will conclude
that the final value is b
3. Linearizability
weaker than strict consistency but stronger than sequential
consistency
operations are assumed to receive a timestamp using a
globally available clock, but one with finite precision; for
example processes use loosely synchronized clocks
let tsOP(x) denote the timestamp assigned to operation OP
that is performed on data item x, where OP is either a read
or write, then
a data store is said to be linearizable when each operation
is timestamped and the following condition holds:
The result of any execution is the same as if the (read and
write) operations by all processes on the data store were
executed in some sequential order and the operations of
each individual process appear in this sequence in the
order specified by its program. In addition, if tsOP1(x) <
tsOP2(y), then OP1(x) should precede OP2(y) in this
sequence.
a linearizable data store is also sequentially consistent
but linearizability is more expensive to implement because
of the additional requirement
in the case of transactions, sequential consistency is
comparable to serializability (recall: a collection of
concurrently executing transactions is serializable if the
final result is the same as if the transactions were executed
one after the other in some specific order)
the main difference is in granularity: sequential consistency
is defined in terms of read and write operations, whereas
serializability is defined in terms of transactions, which
aggregate such operations
to understand sequential consistency better consider the
following example
assume three concurrently executing processes and three
data items (integers) stored in a sequentially consistent
data store
each variable is assumed to be initialized to 0
Process P1 Process P2 Process P3
x = 1; y = 1; z = 1;
print (y, z); print (x, z); print (x, y);
three concurrently executing processes
assignments are write operations and prints are read
operations; all statements are assumed to be indivisible
there are 720 = 6! possible execution sequences
from the 120 (5!) sequences that begin with x = 1, some of
them have print(x, z) before y = 1 and violate program
order; some also have print(x, y) before z = 1 and violate
program order
only 1/4 (=30) of the 120 sequences are valid
also considering those that start with y = 1 and z = 1,
there are only a total of 90 valid execution sequences
x = 1;           x = 1;           y = 1;           y = 1;
print (y, z);    y = 1;           z = 1;           x = 1;
y = 1;           print (x, z);    print (x, y);    z = 1;
print (x, z);    print (y, z);    print (x, z);    print (x, z);
z = 1;           z = 1;           x = 1;           print (y, z);
print (x, y);    print (x, y);    print (y, z);    print (x, y);
Prints: 001011   Prints: 101011   Prints: 010111   Prints: 111111
Signature:       Signature:       Signature:       Signature:
001011           101011           110101           111111
four valid execution sequences for the processes of the previous
slide; the vertical axis is time
if we concatenate the outputs of P1, P2 and P3 in that order, we get a 6-bit
signature of the execution
there are a total of 64 = 2^6 signatures, where 6 is the number of bits of the
signature
not all 64 (2^6) signatures are valid; for example
000000 is not valid; it means all prints are done before all
assignments; it violates the requirement that statements
are executed in program order
001001 is impossible; 00 means P1 ran both its statements before P2 and P3
did their assignments, while 01 means P3 printed before P1's assignment x = 1
the 90 different valid statement orderings produce a variety of
results (< 64) that are allowed under the assumption of
sequential consistency
all processes must accept these as valid results and work
correctly, which is the contract between them and the data
store
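The counts in this example can be checked by brute force. The sketch below is illustrative only (the names PROGRAMS, valid_interleavings and signature are made up for it): it enumerates all orderings of the six statements, keeps those that respect program order, and computes the signature of each.

```python
from itertools import permutations

# Each process runs one assignment followed by one print.
PROGRAMS = {
    "P1": [("write", "x"), ("print", ("y", "z"))],
    "P2": [("write", "y"), ("print", ("x", "z"))],
    "P3": [("write", "z"), ("print", ("x", "y"))],
}


def valid_interleavings():
    statements = [(p, i) for p in PROGRAMS for i in range(2)]
    for order in permutations(statements):
        # Program order: each process's assignment precedes its print.
        if all(order.index((p, 0)) < order.index((p, 1)) for p in PROGRAMS):
            yield order


def signature(order):
    memory = {"x": 0, "y": 0, "z": 0}
    output = {}
    for process, index in order:
        kind, arg = PROGRAMS[process][index]
        if kind == "write":
            memory[arg] = 1
        else:
            output[process] = "".join(str(memory[v]) for v in arg)
    # Concatenate the outputs of P1, P2 and P3 in that order.
    return output["P1"] + output["P2"] + output["P3"]


orders = list(valid_interleavings())
signatures = {signature(order) for order in orders}

print(len(orders))                                   # valid statement orderings
print("000000" in signatures, "001001" in signatures)
```

The enumeration finds the 90 valid orderings and confirms that neither 000000 nor 001001 can be produced by any of them.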
4. Causal Consistency
it is a weakening of sequential consistency
it distinguishes between events that are potentially causally
related and those that are not
example: a write on y that follows a read on x; the writing
of y may have depended on the value of x; e.g., y = x+5
otherwise the two events are concurrent
two processes write two different variables
if event B is caused or influenced by an earlier event, A,
causality requires that everyone else must first see A, then
B
a data store is said to be causally consistent, if it obeys the
following condition:
Writes that are potentially causally related must be seen
by all processes in the same order. Concurrent writes
may be seen in a different order on different machines.
example
W2(x)b and W1(x)c are concurrent; processes are not required
to see them in the same order
this sequence is allowed with a causally consistent store, but not with a
sequentially or strictly consistent store
a) a violation of a causally consistent store
b) a correct sequence of events in a causally consistent store
implementing causal consistency requires keeping track of which
processes have seen which writes; a dependency graph must be
constructed and maintained, say by means of vector timestamps
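As a rough sketch of how vector timestamps separate causally related writes from concurrent ones, consider the comparison below. The function names and the concrete timestamp values are assumptions made for illustration; they model a case where P2 writes b after reading a, while P1 writes c without having seen b.

```python
def happened_before(a, b):
    """True if vector timestamp a causally precedes vector timestamp b."""
    return a != b and all(x <= y for x, y in zip(a, b))


def concurrent(a, b):
    """True if neither timestamp causally precedes the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)


# Hypothetical timestamps with one entry per process (P1, P2).
w1_a = (1, 0)   # W1(x)a
w2_b = (1, 1)   # W2(x)b, issued after P2 has read a
w1_c = (2, 0)   # W1(x)c, issued by P1 without seeing b

print(happened_before(w1_a, w2_b))   # a and b are potentially causally related
print(concurrent(w2_b, w1_c))        # b and c are concurrent
```

A causally consistent store must deliver a before b everywhere, but is free to deliver b and c in different orders at different replicas.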
5. FIFO Consistency
in causal consistency, causally-related operations must be
seen in the same order by all machines
FIFO consistency relaxes this
necessary condition for FIFO consistency:
Writes done by a single process are seen by all other
processes in the order in which they were issued, but
writes from different processes may be seen in a different
order by different processes
a valid sequence of events for FIFO consistency; it is not valid under the other models discussed so far
FIFO consistency is easy to implement; tag each write
operation with a (process, sequence number) pair, and
perform writes per process in the order of their sequence
number
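The tagging scheme just described can be sketched as follows. The class FifoReplica and its fields are hypothetical; the sketch assumes sequence numbers start at 0 for each process and that a write arriving early is simply buffered until its predecessors have been performed.

```python
class FifoReplica:
    """Applies each process's writes in sequence-number order."""

    def __init__(self):
        self.store = {}
        self.next_seq = {}   # process -> next sequence number expected
        self.pending = {}    # process -> {sequence number: (item, value)}
        self.applied = []    # (process, sequence number) in order of application

    def receive(self, process, seq, item, value):
        waiting = self.pending.setdefault(process, {})
        waiting[seq] = (item, value)
        expected = self.next_seq.get(process, 0)
        # Perform every buffered write that is now in order.
        while expected in waiting:
            key, val = waiting.pop(expected)
            self.store[key] = val
            self.applied.append((process, expected))
            expected += 1
        self.next_seq[process] = expected


replica = FifoReplica()
replica.receive("P1", 1, "x", "b")   # arrives early, so it is buffered
replica.receive("P2", 0, "y", "c")   # other processes are not held back
replica.receive("P1", 0, "x", "a")   # unblocks P1's buffered write
print(replica.store, replica.applied)
```

Nothing is coordinated across processes: the replica orders writes only within each sender's own stream, which is why the scheme is cheap.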
consider the following three processes again
Process P1 Process P2 Process P3
x = 1; y = 1; z = 1;
print (y, z); print (x, z); print (x, y);
x = 1; x = 1; y = 1;
print (y, z); y = 1; print (x, z);
y = 1; print (x, z); z = 1;
print (x, z); print (y, z); print (x, y);
z = 1; z = 1; x = 1;
print (x, y); print (x, y); print (y, z);
Prints: 00 Prints: 10 Prints: 01
(a) (b) (c)
statement execution as seen by the three processes: (a) as seen by P1, (b) as seen by P2,
and (c) as seen by P3; the output shown is generated by print (y, z) in (a), print (x, z) in (b), and print (x, y) in (c)
concatenating the output of the three processes gives
001001, which is impossible in sequential consistency
problem of FIFO consistency
assume two concurrent processes P1 and P2 and let the
integer variables x and y be initialized to 0
Process P1                        Process P2
x = 1;                   // a     y = 1;                   // c
if (y == 0) kill (P2);   // b     if (x == 0) kill (P1);   // d
two concurrent processes
one would expect three possible outcomes: P1 is killed,
P2 is killed or neither is killed (if the two assignments go
first)
with a sequentially consistent data store, there are six
possible statement interleavings, and none of them
results in both processes being killed (abcd - kills P2;
cdab - kills P1; cadb, cabd, acbd, acdb - neither is killed )
but, both can be killed in FIFO consistency if P1 reads
R1(y)0 before it sees P2’s W2(y)1 and P2 reads R2(x)0
before it sees P1’s W1(x)1
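The difference between the two models in this example can be reproduced with a small simulation. It is only a sketch: run_sequential executes the statements a to d against one shared memory, while run_fifo_delayed assumes each process works on its own copy and that neither remote write has arrived when the tests are made. Both function names are made up for this illustration.

```python
from itertools import permutations

OWNER = {"a": "P1", "b": "P1", "c": "P2", "d": "P2"}


def run_sequential(order):
    """Execute statements a-d against ONE shared memory; return who is killed."""
    memory = {"x": 0, "y": 0}
    killed = set()
    for stmt in order:
        if OWNER[stmt] in killed:
            continue  # a killed process executes nothing further
        if stmt == "a":
            memory["x"] = 1
        elif stmt == "c":
            memory["y"] = 1
        elif stmt == "b" and memory["y"] == 0:
            killed.add("P2")
        elif stmt == "d" and memory["x"] == 0:
            killed.add("P1")
    return killed


# The interleavings that respect program order (a before b, c before d).
interleavings = [
    "".join(p) for p in permutations("abcd")
    if p.index("a") < p.index("b") and p.index("c") < p.index("d")
]
outcomes = {order: run_sequential(order) for order in interleavings}


def run_fifo_delayed():
    """Each process uses its own copy; remote writes have not arrived yet."""
    copy_p1 = {"x": 0, "y": 0}
    copy_p2 = {"x": 0, "y": 0}
    copy_p1["x"] = 1          # a, applied at P1's copy only
    copy_p2["y"] = 1          # c, applied at P2's copy only
    killed = set()
    if copy_p1["y"] == 0:     # b: P1 has not yet seen W2(y)1
        killed.add("P2")
    if copy_p2["x"] == 0:     # d: P2 has not yet seen W1(x)1
        killed.add("P1")
    return killed


print(outcomes)
print(run_fifo_delayed())
```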
Models with synchronization operations
6. Weak Consistency
FIFO consistency is still unnecessarily restrictive for many
applications; it requires that writes originating in a single
process be seen everywhere in order
not all applications require even seeing all writes, let alone
seeing them in order
for example, there is no need to worry about intermediate
results in a critical section since other processes will not see
the data until it leaves the critical section; only the final
result needs to be seen by other processes
this can be done by a synchronization variable, S, that has
only a single associated operation synchronize(S), which
synchronizes all local copies of the data store
a process performs operations only on its locally available
copy of the store
when the data store is synchronized, all local writes by
process P are propagated to the other copies and writes by
other processes are brought in to P’s copy
this leads to weak consistency models which have three
properties
1. Accesses to synchronization variables associated with
a data store are sequentially consistent (all processes
see all operations on synchronization variables in the
same order)
2. No operation on a synchronization variable is allowed to
be performed until all previous writes have been
completed everywhere (synchronization flushes the
pipeline: all partially completed - or in progress - writes
are guaranteed to be completed when the
synchronization is done)
3. No read or write operation on data items is allowed to
be performed until all previous operations to
synchronization variables have been performed (when a
process accesses a data item (for reading or writing) all
previous synchronization will have been completed; by
doing a synchronization a process can be sure of
getting the most recent values)
weak consistency enforces consistency on a group of
operations, not on individual reads and writes
in the figure, S stands for synchronize; it means that a local copy
of a data store is brought up to date
a) a valid sequence of events for weak consistency
b) an invalid sequence for weak consistency; P2 should get b
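A minimal sketch of synchronize(S) is given below. It assumes, purely for illustration, that all copies are reconciled through one shared dictionary (SharedStore.master); a real store would exchange updates between replicas. The class and method names are hypothetical.

```python
class SharedStore:
    """Stands in for the state every copy agrees on after synchronizing."""

    def __init__(self):
        self.master = {}


class Process:
    def __init__(self, store):
        self.store = store
        self.local = {}   # this process's local copy of the data store
        self.dirty = {}   # local writes not yet propagated

    def write(self, item, value):
        self.local[item] = value
        self.dirty[item] = value

    def read(self, item):
        return self.local.get(item)   # None plays the role of NIL

    def synchronize(self):
        # Push local writes out, then bring in writes made by others.
        self.store.master.update(self.dirty)
        self.dirty.clear()
        self.local = dict(self.store.master)


store = SharedStore()
p1, p2 = Process(store), Process(store)

p1.write("x", "a")
p1.write("x", "b")            # the intermediate value a is never propagated
before = p2.read("x")         # P2 has not synchronized: any old value is allowed
p1.synchronize()
p2.synchronize()
after = p2.read("x")          # after synchronizing, P2 must get b
print(before, after)
```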
7. Release Consistency
with the weak consistency model, when a synchronization
variable is accessed, the data store does not know whether it
is done because the process has finished writing the shared
data or is about to start reading
if we can separate the two (entering a critical section and
leaving it), a more efficient implementation might be possible
the idea is to selectively guard shared data; the shared data
that are kept consistent are said to be protected
release consistency provides mechanisms to separate the
two kinds of operations or synchronization variables
an acquire operation is used to tell that a critical region is
about to be entered
a release operation is used to tell that a critical region has
just been exited
when a process does an acquire, the store will ensure that
all local copies of the protected data are brought up to date to be
consistent with the remote ones; it does not guarantee that
locally made changes will be sent to other local copies
immediately
when a release is done, protected data that have been
changed are propagated out to other local copies of the
store; it does not necessarily import changes from other
copies
a valid event sequence for release consistency
a distributed data store is release consistent if it obeys the
following rules
Before a read or write operation on shared data is
performed, all previous acquires done by the process
must have completed successfully.
Before a release is allowed to be performed, all previous
reads and writes by the process must have been
completed.
Accesses to synchronization variables are FIFO
consistent (sequential consistency is not required).
implementation algorithm (eager release consistency)
to do an acquire, a process sends a message to a central
synchronization manager requesting an acquire on a
particular lock
if there is no competition, the request is granted
then, the process does reads and writes on the shared
data, locally
when the release is done, the modified data are sent to the
other copies that use them
after each copy has acknowledged receipt of the data, the
synchronization manager is informed of the release
but possibly not all processes need to see the new changes
a variant is the lazy release consistency
at the time of release, nothing is sent anywhere
instead, when an acquire is done, the process trying to do
an acquire has to get the most recent values of the data
this avoids sending values to processes that don’t need
them, thereby reducing wastage of bandwidth
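The trade-off between the eager and lazy variants can be illustrated by counting data transfers. This is a simplified sketch under stated assumptions: the names Node, eager_release and LazyLock are hypothetical, locking itself is not modeled, and the lazy variant assumes the acquirer fetches values from whichever node released last.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}       # local copy of the protected data
        self.modified = {}   # writes made since the last release

    def write(self, key, value):
        self.data[key] = value
        self.modified[key] = value


def eager_release(node, others):
    """Push modified values to every other copy; return the number of transfers."""
    for other in others:
        other.data.update(node.modified)
    node.modified.clear()
    return len(others)


class LazyLock:
    def __init__(self):
        self.last_releaser = None

    def release(self, node):
        self.last_releaser = node   # no data is sent at release time
        return 0

    def acquire(self, node):
        """Pull the most recent values from the last releaser, if there is one."""
        if self.last_releaser is None or self.last_releaser is node:
            return 0
        node.data.update(self.last_releaser.data)
        return 1


# Eager: every other copy receives the update, whether it needs it or not.
p1, p2, p3, p4 = (Node(n) for n in ("P1", "P2", "P3", "P4"))
p1.write("x", 1)
eager_transfers = eager_release(p1, [p2, p3, p4])

# Lazy: only the process that later acquires the lock fetches the values.
q1, q2, q3 = (Node(n) for n in ("Q1", "Q2", "Q3"))
lock = LazyLock()
q1.write("x", 1)
lazy_transfers = lock.release(q1) + lock.acquire(q2)

print(eager_transfers, lazy_transfers)
```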
8. Entry Consistency
like release consistency, it requires an acquire and release
to be used at the start and end of a critical section
however, it requires each ordinary shared data item to
be associated with some synchronization variable such as
a lock
if it is desired that elements of an array be accessed
independently in parallel, then different array elements may
be associated with different locks
synchronization variable ownership
each synchronization variable has a current owner, the
process that acquired it last
the owner may enter and exit critical sections
repeatedly without sending messages
other processes must send a message to the current
owner asking for ownership and the current values of
the data associated with that synchronization variable
several processes can also simultaneously own a
synchronization variable, but only for reading
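Ownership of synchronization variables can be sketched as below. The class SyncVariable is hypothetical and models exclusive access only: the current owner re-enters without any message, while another process pays one message to obtain ownership together with the guarded values. One variable per array element is used, as suggested above.

```python
class SyncVariable:
    def __init__(self, owner, guarded):
        self.owner = owner        # process that acquired the variable last
        self.guarded = guarded    # data items protected by this variable
        self.messages = 0         # ownership requests sent so far

    def acquire(self, process):
        if process != self.owner:
            # Ask the current owner for ownership and the current values.
            self.messages += 1
            self.owner = process
        return self.guarded


# One lock per array element, all initially owned by P1.
locks = [SyncVariable("P1", {f"a[{i}]": 0}) for i in range(3)]

for _ in range(5):
    locks[0].acquire("P1")    # the owner enters repeatedly without messages
locks[1].acquire("P2")        # P2 must contact the owner once
locks[1].acquire("P2")        # now P2 is the owner, so no further message

print([lock.messages for lock in locks])
```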
a data store exhibits entry consistency if it meets all the
following conditions:
An acquire access of a synchronization variable is not
allowed to perform with respect to a process until all
updates to the guarded shared data have been performed
with respect to that process. (at an acquire, all remote
changes to the guarded data must be made visible)
Before an exclusive mode access to a synchronization
variable by a process is allowed to perform with respect to
that process, no other process may hold the
synchronization variable, not even in nonexclusive mode.
After an exclusive mode access to a synchronization
variable has been performed, any other process's next
nonexclusive mode access to that synchronization variable
may not be performed until it has performed with respect to
that variable's owner. (it must first fetch the most recent
copies of the guarded shared data)
a valid event sequence for entry consistency
when an acquire is done only those variables guarded by that
synchronization variable are made consistent
therefore, only a few shared data items have to be synchronized
when there is a release
Summary of Data-Centric Consistency Models
a) consistency models not using synchronization operations
b) models with synchronization operations
consistency models differ
in complexity of implementation
ease of programming
performance
strict consistency: most restrictive; never implemented,
implementation in a distributed system is impossible
linearizability: hardly ever used; but facilitates reasoning
about the correctness of parallel programs
sequential consistency: widely used, but poor performance;
so relax conditions by having causal consistency and FIFO
consistency
weak consistency, release consistency, and entry
consistency: require additional programming constructs;
allow programmers to pretend that a data store is
sequentially consistent when in fact it is not; may provide the
best performance depending on applications
7.3 Client-Centric Consistency Models
with many applications, updates happen very rarely
for these applications, data-centric models, which give high
importance to updates, are not suitable
very weak consistency is generally sufficient for such
systems
Eventual Consistency
there are many applications where few processes (or a
single process) update the data while many read it and
there are no write-write conflicts; we need to handle
only read-write conflicts; e.g., DNS server, Web site
for such applications, it is even acceptable for readers
to see old versions of the data (e.g., cached versions of
a Web page) until the new version is propagated
with eventual consistency, it is only required that
updates are guaranteed to gradually propagate to all
replicas
data stores that are eventually consistent have the property
that in the absence of updates, all replicas converge toward
identical copies of each other
write-write conflicts are rare and are handled separately
the problem with eventual consistency is when different
replicas are accessed, e.g., a mobile client accessing a
distributed database may acquire an older version of data
when it uses a new replica as a result of changing location
the principle of a mobile user accessing different replicas of a distributed database
the solution is to introduce client-centric consistency
it provides guarantees for a single client concerning the
consistency of accesses to a data store by that client; no
guarantees are given concerning concurrent accesses by
different clients
there are four client-centric consistency models
consider a data store that is physically distributed across
multiple machines
a process reads and writes to a locally available copy and
updates are propagated
assume that data items have an associated owner, the only
process permitted to modify that item, hence write-write
conflicts are avoided
the following notations are used
xi[t] denotes the version of the data item x at local copy
Li at time t
version xi[t] is the result of a series of write operations at
Li that took place since initialization; denote this set by
WS(xi[t])
if operations in WS(xi[t1]) have also been performed at
local copy Lj at a later time t2, we write WS(xi[t1];xj[t2]); it
means that WS(xi[t1]) is part of WS(xj[t2])
the time index may be omitted if the ordering of operations is
clear from the context
1. Monotonic Reads
a data store is said to provide monotonic-read consistency
if the following condition holds:
If a process reads the value of a data item x, any
successive read operation on x by that process will
always return that same value or a more recent value
i.e., a process never sees a version of data older than what
it has already seen
the read operations performed by a single process P at two different local
copies of the same data store
a) a monotonic-read consistent data store
b) a data store that does not provide monotonic reads; there is no
guarantee that when R(x2) is executed WS(x2) also contains WS(x1)
2. Monotonic Writes
it may be required that write operations propagate in the
correct order to all copies of the data store
in a monotonic-write consistent data store the following
condition holds:
A write operation by a process on a data item x is
completed before any successive write operation on x by
the same process
completing a write operation means that the copy on which
a successive operation is performed reflects the effect of a
previous write operation by the same process, no matter
where that operation was initiated
monotonic-write consistency resembles data-centric FIFO
consistency; here we consider consistency only for a
single process (instead of for a collection of concurrent
processes)
may not be necessary if a later write operation completely
overwrites the present value
x = 78;
x = 90;
no need to make sure that x has been first changed to 78
it is important only if part of the state of the data item
changes
e.g., a software library, where one or more functions are
replaced, leading to a new version
the write operations performed by a single process P at two different local copies of the
same data store
a) a monotonic-write consistent data store
b) a data store that does not provide monotonic-write consistency
3. Read Your Writes
a data store is said to provide read-your-writes
consistency, if the following condition holds:
The effect of a write operation by a process on data item
x will always be seen by a successive read operation on x
by the same process
i.e., a write operation is always completed before a
successive read operation by the same process, no matter
where that read operation takes place
the absence of read-your-writes consistency is often
experienced when a Web page is modified using an editor
and the modification is not seen on the browser due to
caching; read-your-writes consistency guarantees that the
cache is invalidated when the page is updated
a) a data store that provides read-your-writes consistency
b) a data store that does not
4. Writes Follow Reads
updates are propagated as the result of previous read
operations
a data store is said to provide writes-follow-reads
consistency, if the following condition holds:
A write operation by a process on a data item x following
a previous read operation on x by the same process, is
guaranteed to take place on the same or a more recent
value of x that was read
i.e., any successive write operation by a process on a data
item x will be performed on a copy of x that is up to date
with the value most recently read by that process
this guarantees, for example, that users of a newsgroup see
a posting of a reaction to an article only after they have
seen the original article; if B is a response to message A,
writes-follow-reads consistency guarantees that B will be
written to any copy only after A has been written
a) a writes-follow-reads consistent data store
b) a data store that does not provide writes-follow-reads consistency
Naive Implementation of Client-Centric Consistency
each write operation is given a globally unique identifier,
assigned by the server that accepts the operation for the
first time
then for each client, keep track of two sets of identifiers:
the read set consists of the write identifiers relevant for
the read operations performed by a client
the write set consists of the write identifiers performed by
the client
monotonic-read consistency is implemented as follows
when a client performs a read operation at a server, the
server is handed the client’s read set to check if all the
identified writes have taken place locally
if not, the server contacts the other servers to ensure that it
is brought up to date before carrying out the read operation
(or the read operation is forwarded to a server where the
write operations took place)
after the read operation, the relevant write operations that
have taken place at the selected servers are added to the
client’s read set
monotonic-write consistency is implemented as follows
when a client initiates a new write operation to a server, the
server is handed the client’s write set
it then ensures that the identified write operations are done
first and in the correct order
after performing the write, that operation’s write identifier is
added to the write set
read-your-writes consistency is implemented as follows
it requires that the server where the read operation is
performed has seen all the write operations in the client’s
write set
the writes can be fetched from the other servers before the
read operation is performed (this may result in a poor response
time)
alternatively, the client-side software can search for a server
where the identified write operations in the client’s write set
have already been performed
writes-follow-reads consistency is implemented as follows
first bring the selected server up to date with the write
operations in the client’s read set
then add the identifier of the write operation to the write set,
along with the identifiers in the read set (which have now
become relevant for the write operation just performed)
problem: in the naive implementation, the read and write sets can
become very large
to improve efficiency, read and write operations can be
grouped into sessions, clearing the sets when the session
ends
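the naive implementation described above can be sketched in Python as follows; this is a minimal illustration, not a definitive implementation. The names Server, Session, catch_up and the shared counter _next_id are assumptions made for the sketch; a single global counter stands in for "globally unique identifiers", and a read adds only the write that produced the returned value to the read set.

```python
import itertools

_next_id = itertools.count(1)  # assumption: one shared counter yields globally unique write ids


class Server:
    def __init__(self):
        self.applied = {}  # write id -> (key, value) for every write seen here
        self.data = {}     # key -> (write id, value) of the newest write applied

    def apply(self, wid, key, value):
        self.applied[wid] = (key, value)
        if wid > self.data.get(key, (0, None))[0]:
            self.data[key] = (wid, value)

    def accept_write(self, key, value):
        wid = next(_next_id)
        self.apply(wid, key, value)
        return wid

    def catch_up(self, wids, peers):
        """Fetch and apply, in id order, the identified writes missing locally."""
        for wid in sorted(wids - set(self.applied)):
            for peer in peers:
                if wid in peer.applied:
                    self.apply(wid, *peer.applied[wid])
                    break


class Session:
    def __init__(self):
        self.read_set = set()
        self.write_set = set()

    def read(self, server, key, peers):
        # monotonic reads need the read set, read-your-writes the write set
        server.catch_up(self.read_set | self.write_set, peers)
        wid, value = server.data.get(key, (0, None))
        if wid:
            self.read_set.add(wid)
        return value

    def write(self, server, key, value, peers):
        # monotonic writes need the write set, writes-follow-reads the read set
        server.catch_up(self.write_set | self.read_set, peers)
        wid = server.accept_write(key, value)
        self.write_set |= self.read_set | {wid}
        return wid

    def close(self):
        # ending the session clears both sets so they cannot grow without bound
        self.read_set.clear()
        self.write_set.clear()
```

for example, a session that writes x at one server and then reads x at another server gets its own value back, because the second server is first brought up to date with the session's write set.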
7.4 Replica Management
there are different ways of propagating, i.e., distributing
updates to replicas, independent of the consistency
model
we will discuss
replica placement
update propagation
epidemic protocols
a. Replica Placement
a major design issue for distributed data stores is
deciding where, when, and by whom copies of the data
store are to be placed
three types of copies:
permanent replicas
server-initiated replicas
client-initiated replicas
the logical organization of different kinds of copies of a data store into three concentric rings
1. Permanent Replicas
the initial set of replicas that constitute a distributed
data store; normally a small number of replicas
e.g., a Web site: two forms
the files that constitute a site are replicated across a
limited number of servers on a LAN; a request is
forwarded to one of the servers
mirroring: a Web site is copied to a limited number
of servers, called mirror sites, which are
geographically spread across the Internet; clients
choose one of the mirror sites
2. Server-Initiated Replicas (push caches)
Web Hosting companies dynamically create replicas to
improve performance (e.g., create a replica near hosts
that use the Web site very often)
3. Client-Initiated Replicas (client caches or simply caches)
to improve access time
a cache is a local storage facility used by a client to
temporarily store a copy of the data it has just received
placed on the same machine as its client or on a
machine shared by clients on a LAN
managing the cache is left entirely to the client; the
data store from which the data have been fetched has
nothing to do with keeping cached data consistent
b. Update Propagation
updates are initiated at a client, forwarded to one of the
copies, and propagated to the replicas ensuring
consistency
some design issues in propagating updates
state versus operations
pull versus push protocols
unicasting versus multicasting
1. State versus Operations
what is actually to be propagated? three possibilities
send notification of update only (for invalidation
protocols - useful when read/write ratio is small); use of
little bandwidth
transfer the modified data (useful when read/write ratio
is high)
transfer the update operation (also called active
replication); it assumes that each machine knows how
to do the operation; use of little bandwidth, but more
processing power needed from each replica
2. Pull versus Push Protocols
push-based approach (also called server-based protocols):
propagate updates to other replicas without those replicas
even asking for the updates (used when high degree of
consistency is required and there is a high read/write ratio)
pull-based approach (also called client-based protocols):
often used by client caches; a client or a server requests
updates from another server whenever needed (used when
the read/write ratio is low)
a comparison between push-based and pull-based
protocols; for simplicity assume multiple clients and a
single server
Issue                   | Push-based                               | Pull-based
State of server         | List of client replicas and caches       | None
Messages sent           | Update (and possibly fetch update later) | Poll and update
Response time at client | Immediate (or fetch-update time)         | Fetch-update time
3. Unicasting versus Multicasting
multicasting can be combined with push-based
approach; the underlying network takes care of sending a
message to multiple receivers
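unicasting is usually the better fit for the pull-based approach,
since only a single client or server asks for an update; when
unicasting is used to push updates, the server sends a separate
message to each receiver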
c. Epidemic Protocols
update propagation in eventual consistency is often
implemented by a class of algorithms known as epidemic
protocols
updates are aggregated into a single message and then
exchanged between two servers
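one possible shape of such an exchange is sketched below in Python, under stated assumptions: each server's store is a dict mapping a key to a (timestamp, value) pair, the newest timestamp wins, and both servers send everything they hold (a push-pull exchange). The function names exchange and gossip_round and the use of timestamps are illustrative assumptions, not details taken from the slides.

```python
import random


def exchange(a, b):
    """Push-pull exchange: both stores end up with the newest version of each key."""
    merged = dict(a)
    for key, (ts, value) in b.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    a.clear()
    a.update(merged)
    b.clear()
    b.update(merged)


def gossip_round(stores, rng):
    """Every server picks one random partner and exchanges updates with it."""
    for store in stores:
        partner = rng.choice([s for s in stores if s is not store])
        exchange(store, partner)


stores = [{"x": (1, "old")}, {"x": (2, "new")}, {}]
gossip_round(stores, random.Random(0))
```

repeating gossip_round spreads the newest value of x to more and more servers; how many rounds are needed depends on the random choice of partners.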
7.5 Consistency Protocols
so far we have concentrated on various consistency
models and general design issues
consistency protocols describe an implementation of a
specific consistency model
there are three types
primary-based protocols
   remote-write protocols
   local-write protocols
replicated-write protocols
   active replication
   quorum-based protocols
cache-coherence protocols
1. Primary-Based Protocols
each data item x in the data store has an associated
primary, which is responsible for coordinating write
operations on x
two approaches: remote-write protocols, and local-write
protocols
a. Remote-Write Protocols
all read and write operations are carried out at a
(remote) single server; in effect, data are not
replicated; traditionally used in client-server systems,
where the server may possibly be distributed
primary-based remote-write protocol with a fixed server to which all read and write operations are
forwarded
another approach is primary-backup protocols where reads
can be made from local backup servers while writes should
be made directly on the primary server
the backup servers are updated each time the primary is
updated
the principle of primary-backup protocol
primary-backup protocols may lead to performance problems since
it may take time before the process that initiated the write
operation is allowed to continue - updates are blocking
primary-backup protocols provide a straightforward
implementation of sequential consistency; the primary can
order all incoming writes
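the blocking behaviour of a primary-backup protocol can be illustrated with a small Python sketch; the classes Primary and Backup and their methods are hypothetical names, and ordinary method calls stand in for network messages and acknowledgements.

```python
class Backup:
    def __init__(self):
        self.data = {}

    def update(self, key, value):
        self.data[key] = value
        return True  # acknowledgement sent back to the primary

    def read(self, key):
        # reads are served from the local backup copy
        return self.data.get(key)


class Primary:
    def __init__(self, backups):
        self.data = {}
        self.backups = backups

    def write(self, key, value):
        # the primary applies writes in the order it receives them
        self.data[key] = value
        # blocking update: wait for every backup before letting the writer continue
        acks = [backup.update(key, value) for backup in self.backups]
        return all(acks)


backups = [Backup(), Backup()]
primary = Primary(backups)
primary.write("x", 42)
```

once write returns, every backup already holds the new value, so a local read at any backup sees it; the price is that the writer waits for all acknowledgements.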
b. Local-Write Protocols
two approaches
i. there is a single copy; no replicas
when a process wants to perform an operation on some
data item, the single copy of the data item is transferred
to the process, after which the operation is performed
primary-based local-write protocol in which a single copy is migrated between processes
consistency is straightforward
keeping track of the current location of each data item is a
major problem
ii. primary-backup local-write protocol
the primary migrates between processes that wish to
perform a write operation
multiple, successive write operations can be carried out
locally, while (other) reading processes can still access their
local copy
such improvement is possible only if a nonblocking protocol
is followed
primary-backup protocol in which the primary migrates to the process wanting to perform an update
2. Replicated-Write Protocols
unlike primary-based protocols, write operations can be
carried out at multiple replicas; two approaches: Active
Replication and Quorum-Based Protocols
a. Active Replication
each replica has an associated process that carries out update
operations
updates are generally propagated by means of the write operations
that cause them (the operation is propagated); it is also possible
to send the updated data
the operations need to be done in the same order everywhere;
totally-ordered multicast
two possibilities to ensure that the order is followed
Lamport’s timestamps, or
use of a central sequencer that assigns a unique sequence
number for each operation; the operation is first sent to the
sequencer then the sequencer forwards the operation to all
replicas
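the central-sequencer variant can be sketched in Python as follows; this is a minimal illustration under assumptions: Sequencer and Replica are hypothetical names, an operation is a function from the old value to the new value, and each replica buffers operations that arrive ahead of their turn.

```python
import itertools


class Replica:
    def __init__(self):
        self.value = 0
        self.next_seq = 1   # sequence number this replica expects next
        self.pending = {}   # operations that arrived ahead of their turn

    def deliver(self, seq, op):
        self.pending[seq] = op
        # apply operations strictly in sequence-number order
        while self.next_seq in self.pending:
            self.value = self.pending.pop(self.next_seq)(self.value)
            self.next_seq += 1


class Sequencer:
    def __init__(self, replicas):
        self._numbers = itertools.count(1)
        self.replicas = replicas

    def submit(self, op):
        seq = next(self._numbers)  # unique sequence number for this operation
        for replica in self.replicas:
            replica.deliver(seq, op)


replicas = [Replica(), Replica()]
sequencer = Sequencer(replicas)
sequencer.submit(lambda v: v + 10)
sequencer.submit(lambda v: v * 2)
```

because addition and multiplication do not commute, the replicas stay identical only if both apply the operations in the same order; the sequence numbers enforce that order even when messages arrive out of order.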
a problem is replicated invocations
suppose object A invokes B, and B invokes C; if object B is
replicated, each replica of B will invoke C independently
this may create inconsistency and other unwanted effects; e.g., if
the operation on C is to transfer $10, the transfer is carried out
once for every replica of B
the problem of replicated invocations
one solution is to have a replication-aware communication
layer that avoids the same invocation being sent more than
once
when a replicated object B invokes another replicated object C,
the invocation request is first assigned the same, unique
identifier by each replica of B
a coordinator of the replicas of B forwards its request to all
replicas of object C; the other replicas of object B hold back;
hence only a single request is sent to each replica of C
the same mechanism is used to ensure that only a single reply
message is returned to the replicas of B
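the forwarding step can be sketched in Python as follows; the sketch is only illustrative and rests on assumptions: ReplicaB and ReplicaC are hypothetical classes, the replica of B with rank 0 acts as coordinator, and the invocation identifier is derived from a call counter that advances identically on every replica of B. Returning the reply to the other replicas of B is left out.

```python
class ReplicaC:
    def __init__(self):
        self.balance = 0

    def transfer(self, amount):
        self.balance += amount
        return self.balance


class ReplicaB:
    def __init__(self, rank, targets):
        self.rank = rank        # rank 0 is assumed to be the coordinator
        self.targets = targets  # all replicas of object C
        self.calls = 0

    def invoke(self):
        self.calls += 1
        invocation_id = ("B", self.calls)  # identical on every replica of B
        if self.rank != 0:
            return invocation_id, None     # non-coordinators hold back
        # the coordinator sends exactly one request to each replica of C
        replies = [c.transfer(10) for c in self.targets]
        return invocation_id, replies[0]


c_replicas = [ReplicaC(), ReplicaC()]
b_replicas = [ReplicaB(rank, c_replicas) for rank in range(3)]
results = [b.invoke() for b in b_replicas]
```

although three replicas of B each execute invoke, every replica of C performs the $10 transfer only once.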
a) forwarding an invocation request from a replicated object
b) returning a reply to a replicated object
b. Quorum-Based Protocols
use of voting: clients are required to request and acquire
the permission of multiple servers before either reading or
writing a replicated data item
e.g., assume a distributed file system where a file is
replicated on N servers
a client must first contact at least half + 1 of the servers
(a majority) and get them to agree to do an update
the new update will be done and the file will be given a
new version number
to read a file, a client must also first contact at least half +
1 and ask them to send version numbers; if all version
numbers agree, this must be the most recent version
a more general approach is to arrange a read quorum (a
collection of any NR servers, or more) for reading and a
write quorum (of at least NW servers) for updating
the values of NR and NW are subject to the following two
constraints
NR + NW > N ; to prevent read-write conflicts
NW > N/2 ; to prevent write-write conflicts
three examples of the voting algorithm (N = 12)
a) a correct choice of read and write sets; any subsequent read quorum of three servers will have to
contain at least one member of the write set which has a higher version number
b) a choice that may lead to write-write conflicts; if a client chooses {A,B,C,E,F,G} as its write set
and another client chooses {D,H,I,J,K,L} as its write set, the two updates will both be accepted
without detecting that they actually conflict
c) a correct choice, known as ROWA (read one, write all)
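the two constraints can be checked mechanically; the Python sketch below uses a hypothetical helper check_quorum. The captions give only part of the quorum sizes (a read quorum of three in a, write sets of six servers in b, read one and write all in c), so NW = 10 for case a and NR = 7 for case b are assumed values chosen to satisfy NR + NW > N.

```python
def check_quorum(n, n_r, n_w):
    """Report which of the two quorum constraints a choice of NR and NW satisfies."""
    return {
        "read_write_safe": n_r + n_w > n,  # NR + NW > N
        "write_write_safe": n_w > n / 2,   # NW > N/2
    }


N = 12
case_a = check_quorum(N, n_r=3, n_w=10)  # NW = 10 is an assumed value
case_b = check_quorum(N, n_r=7, n_w=6)   # NR = 7 is an assumed value
case_c = check_quorum(N, n_r=1, n_w=12)  # ROWA: read one, write all
```

case b fails only the second constraint: with NW = 6 and N = 12, two disjoint write sets of six servers can exist at the same time, which is the write-write conflict described in the caption.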
3. Cache-Coherence Protocols
caches form a special case of replication as they are
controlled by clients instead of servers
cache-coherence protocols ensure that a cache is
consistent with the server-initiated replicas
two design issues in implementing caches: coherence
detection and coherence enforcement
coherence detection strategy: when inconsistencies are
actually detected
static solution: prior to execution, a compiler performs
the analysis to determine which data may lead to
inconsistencies if cached and inserts instructions that
avoid inconsistencies
dynamic solution: at runtime, a check is made with the
server to see whether the cached data have been
modified since they were cached
Thank you!