Consistency
• One of the biggest changes in moving from a centralized
relational database to a cluster-oriented NoSQL
database concerns consistency.
• Relational databases aim for strong consistency by
avoiding the various kinds of inconsistency described
below.
• Consistency comes in several forms, so we’re going to
discuss the shapes it can take in turn.
Update Consistency
• Example: Martin and Pramod are looking at
the company website and notice that the
phone number is out of date. Both have
update access, so they both go in at the same
time to update the number.
• This issue is called a write-write conflict: two
people updating the same data item at the
same time.
Update Consistency
• When the writes reach the server, the server will
serialize them—decide to apply one, then the
other (say, in alphabetical order).
• Martin’s update would be applied and
immediately overwritten by Pramod’s. In this
case Martin’s is a lost update.
• This is a failure of consistency because Pramod’s
update was based on the state before Martin’s
update, yet was applied after it.
Update Consistency
• There are two approaches for maintaining consistency:
pessimistic and optimistic.
• A pessimistic approach works by preventing conflicts
from occurring; an optimistic approach lets conflicts
occur, but detects them and takes action to sort them
out.
• The most common pessimistic approach is a write lock: to
change a value, a client must first acquire the lock on it.
So Martin and Pramod would both attempt to acquire
the write lock, but only Martin (the first one) would
succeed. Pramod would then see the result of Martin’s
write before deciding whether to make his own update.
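• A minimal sketch of the write-lock idea, using an in-process lock (phone_book, phone_lock, and update_phone are illustrative names, not taken from any particular database):

```python
import threading

# Hypothetical shared record and its write lock (illustrative only).
phone_book = {"company_phone": "555-0100"}
phone_lock = threading.Lock()

def update_phone(new_number):
    """Pessimistic approach: acquire the write lock, look at the current value,
    and only then decide whether to apply the update."""
    with phone_lock:
        current = phone_book["company_phone"]
        if current == new_number:
            return current          # someone already wrote the value we wanted
        phone_book["company_phone"] = new_number
        return new_number

# Martin and Pramod both call update_phone; the lock forces one to wait, so the
# second caller sees the first caller's result before its own write applies.
```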
Update Consistency
• A common optimistic approach is a
conditional update, where any client that does
an update tests the value just before updating
it to see if it has changed since their last read
(sketched below).
• Martin’s update would succeed but Pramod’s
would fail. The error would let Pramod know
that he should look at the value again and
decide whether to attempt a further update.
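• As a minimal sketch of such a conditional update, assume a simple in-memory key-value store with a compare-and-set operation (the store and function names are illustrative):

```python
# Illustrative in-memory store offering a conditional (compare-and-set) update.
store = {"company_phone": "555-0100"}

def compare_and_set(key, expected, new_value):
    """Apply the write only if the stored value is still what this client last read."""
    if store.get(key) != expected:
        return False               # value changed since the read: conflict detected
    store[key] = new_value
    return True

# Both Martin and Pramod read "555-0100". Martin's conditional update succeeds;
# Pramod's then fails, telling him to re-read before deciding what to do next.
assert compare_and_set("company_phone", "555-0100", "555-0199") is True
assert compare_and_set("company_phone", "555-0100", "555-0142") is False
```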
Update Consistency
• Both the pessimistic and optimistic approaches
rely on a consistent serialization of the updates.
• With a single server this is easy: it applies one
update, then the other.
• But with more than one server, as in peer-to-peer
replication, two nodes might apply the updates in
different orders, resulting in a different value for
the telephone number on each peer.
Update Consistency
• There is another optimistic way to handle a write-write
conflict—save both updates and record that they are in
conflict. This approach is familiar to many programmers from
version control systems (sketched below).
• Any automated merge of write-write conflicts is highly
domain-specific and needs to be programmed for each
particular case.
• Concurrent programming involves a fundamental tradeoff
between safety (avoiding errors such as update conflicts) and
liveness (responding quickly to clients). Pessimistic
approaches often severely degrade the responsiveness of a
system to the degree that it becomes unfit for its purpose.
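• A minimal sketch of the record-the-conflict idea, keeping conflicting values side by side as siblings until something (or someone) merges them (VersionedStore and its methods are made-up names):

```python
class VersionedStore:
    """Illustrative store that keeps conflicting writes side by side."""

    def __init__(self):
        self._siblings = {}                # key -> list of candidate values

    def write(self, key, value, based_on=None):
        current = self._siblings.get(key, [])
        if not current or based_on in current:
            self._siblings[key] = [value]             # supersedes what it was based on
        else:
            self._siblings[key] = current + [value]   # concurrent write: record the conflict

    def read(self, key):
        values = self._siblings.get(key, [])
        return values, len(values) > 1                # values plus a conflict flag

store = VersionedStore()
store.write("phone", "555-0199", based_on="555-0100")   # Martin's update
store.write("phone", "555-0142", based_on="555-0100")   # Pramod's concurrent update
values, in_conflict = store.read("phone")                # both kept, flagged as conflicting
```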
Update Consistency
• Replication makes it much more likely to run into
write-write conflicts.
• If different nodes have different copies of some
data which can be independently updated, then
you’ll get conflicts unless specific measures are
taken to avoid them.
• Using a single node as the target for all writes for
some data makes it much easier to maintain
update consistency.
Read Consistency
• We have an order with line items and a shipping charge.
The shipping charge is calculated based on the line items in
the order. If we add a line item, we thus also need to
recalculate and update the shipping charge. In a relational
database, the shipping charge and line items will be in
separate tables. The danger of inconsistency is that Martin
adds a line item to his order, Pramod then reads the line
items and shipping charge, and then Martin updates the
shipping charge.
• This is an inconsistent read or read-write conflict
• We refer to this type of consistency as logical consistency:
ensuring that different data items make sense together.
Read Consistency
• To avoid a logically inconsistent read-write
conflict, relational databases support the
notion of transactions. Provided Martin wraps
his two writes in a transaction, the system
guarantees that Pramod will either read both
data items before the update or both after the
update.
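• For illustration, here is what that might look like in Python with SQLite (the orders and order_lines tables are made up): wrapping both writes in one transaction means another reader sees either neither change or both.

```python
import sqlite3

conn = sqlite3.connect("orders.db")
conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id INTEGER PRIMARY KEY, shipping REAL)")
conn.execute("CREATE TABLE IF NOT EXISTS order_lines (order_id INTEGER, item TEXT, price REAL)")
conn.execute("INSERT OR IGNORE INTO orders (order_id, shipping) VALUES (1, 5.00)")
conn.commit()

# Martin's two writes go into a single transaction, so Pramod reads either the
# old line items with the old shipping charge, or the new ones with the new
# charge -- never a mixture of the two.
with conn:   # commits on success, rolls back if an exception is raised
    conn.execute("INSERT INTO order_lines (order_id, item, price) VALUES (1, 'NoSQL Distilled', 25.00)")
    conn.execute("UPDATE orders SET shipping = 7.50 WHERE order_id = 1")
```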
Read Consistency
• A common claim is that NoSQL databases don’t support
transactions and thus can’t be consistent.
• This claim is mostly wrong, for two reasons.
• First, the lack of transactions usually applies only to some
NoSQL databases, in particular the aggregate-oriented ones.
In contrast, graph databases tend to support ACID transactions
just as relational databases do.
• Second, aggregate-oriented databases do support atomic
updates, but only within a single aggregate. This means that
you will have logical consistency within an aggregate but not
between aggregates. So in the example, you could avoid
running into that inconsistency if the order, the delivery charge,
and the line items are all part of a single order aggregate.
Read Consistency
• Of course not all data can be put in the same
aggregate, so any update that affects multiple
aggregates leaves open a time when clients
could perform an inconsistent read.
• The length of time an inconsistency is present is
called the inconsistency window.
• A NoSQL system may have a quite short
inconsistency window: As one data point,
Amazon’s documentation says that the
inconsistency window for its SimpleDB service is
usually less than a second.
Read Consistency
• Let’s imagine there’s one last hotel room for a desirable
event. The hotel reservation system runs on many nodes.
Martin and Cindy are a couple considering this room, but
they are discussing this on the phone because Martin is in
London and Cindy is in Boston. Meanwhile Pramod, who is in
Mumbai, goes and books that last room. That updates the
replicated room availability, but the update gets to Boston
quicker than it gets to London. When Martin and Cindy fire
up their browsers to see if the room is available, Cindy sees it
booked and Martin sees it free.
• This is another inconsistent read—but it’s a breach of a
different form of consistency we call replication consistency:
ensuring that the same data item has the same value when
read from different replicas
Read Consistency
• Eventually, the updates will propagate fully,
and Martin will see the room is fully booked.
Therefore this situation is generally referred to
as eventually consistent, meaning that at any
time nodes may have replication
inconsistencies but, if there are no further
updates, eventually all nodes will be updated
to the same value.
• Data that is out of date is generally referred to
as stale
Read Consistency
• Although replication consistency is
independent of logical consistency,
replication can worsen a logical inconsistency
by lengthening its inconsistency window.
• Two different updates on the master may be
performed in rapid succession, leaving an
inconsistency window of milliseconds.
• But delays in networking could mean that the
same inconsistency window lasts for much
longer on a slave.
Read Consistency
• The presence of an inconsistency window means that
different people will see different things at the same time.
• Inconsistency windows can be particularly problematic when
you get inconsistencies with yourself.
• Example: Posting comments on a blog entry. Few people are
going to worry about inconsistency windows of even a few
minutes while people are typing in their latest thoughts.
Often, systems handle the load of such sites by running on a
cluster and load-balancing incoming requests to different
nodes.
• Therein lies a danger: You may post a message using one
node, then refresh your browser, but the refresh goes to a
different node which hasn’t received your post yet—and it
looks like your post was lost.
Read Consistency
• You can tolerate reasonably long inconsistency
windows, but you need read-your-writes consistency,
which means that, once you’ve made an update,
you’re guaranteed to continue seeing that update.
• One way to get this in an otherwise eventually
consistent system is to provide session consistency:
Within a user’s session there is read-your-writes
consistency.
• This does mean that the user may lose that
consistency should their session end for some reason
or should the user access the same system
simultaneously from different computers
Read Consistency
• Techniques to provide session consistency.
• sticky session: a session that’s tied to one node (this is also called session
affinity). A sticky session allows you to ensure that as long as you keep read-
your-writes consistency on a node, you’ll get it for sessions too. The downside
is that sticky sessions reduce the ability of the load balancer to do its job.
• version stamps to ensure every interaction with the data store includes the
latest version stamp seen by a session. The server node must then ensure that
it has the updates that include that version stamp before responding to a
request (sketched below).
• Session consistency with sticky sessions and master-slave replication can be
awkward if you want to read from the slaves to improve read performance
but still need to write to the master.
• One way of handling this is for writes to be sent to the slave, which then takes
responsibility for forwarding them to the master while maintaining session
consistency for its client. Another approach is to switch the session to the
master temporarily when doing a write, just long enough that reads are done
from the master until the slaves have caught up with the update.
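• To make the version-stamp idea concrete, a minimal sketch (ReplicaNode and its methods are illustrative; a real system would use its own stamp format):

```python
import time

class ReplicaNode:
    """Illustrative replica that remembers the newest version stamp it has applied."""

    def __init__(self):
        self.latest_stamp = 0
        self.data = {}

    def apply(self, key, value, stamp):
        self.data[key] = value
        self.latest_stamp = max(self.latest_stamp, stamp)

    def read(self, key, min_stamp, timeout=1.0):
        # Read-your-writes: only answer once this node has caught up to the
        # version stamp the session saw on its last write.
        deadline = time.monotonic() + timeout
        while self.latest_stamp < min_stamp:
            if time.monotonic() > deadline:
                raise TimeoutError("replica lagging behind stamp %d" % min_stamp)
            time.sleep(0.01)
        return self.data.get(key)

# The client keeps the stamp returned by its last write and sends it with every
# read, so a lagging replica blocks (or is skipped) rather than serving stale data.
```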
Relaxing Consistency
• Consistency is a Good Thing—but, sadly,
sometimes we have to sacrifice it.
• It is always possible to design a system to avoid
inconsistencies, but often impossible to do so
without making unbearable sacrifices in other
characteristics of the system.
• As a result, we often have to trade off consistency
for something else.
• Different domains have different tolerances for
inconsistency, and we need to take this tolerance
into account as we make our decisions.
Relaxing Consistency
• The principal tool for enforcing consistency is the
transaction.
• Transaction systems usually come with the
ability to relax isolation levels, allowing queries
to read data that hasn’t been committed yet,
and in practice most applications relax
consistency down from the highest isolation
level (serializable) in order to get acceptable
performance.
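• As a sketch of what relaxing the isolation level looks like in practice, assuming a PostgreSQL database accessed with psycopg2 (the connection string and the orders table are placeholders):

```python
import psycopg2

# Placeholder connection details; any PostgreSQL database would do.
conn = psycopg2.connect("dbname=shop user=app")

with conn, conn.cursor() as cur:
    # Drop from the strictest isolation level (SERIALIZABLE) to READ COMMITTED
    # for this transaction: fewer aborts and better throughput, weaker consistency.
    cur.execute("SET TRANSACTION ISOLATION LEVEL READ COMMITTED")
    cur.execute("SELECT shipping FROM orders WHERE order_id = %s", (1,))
    print(cur.fetchone())
```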
The CAP Theorem
• In the NoSQL world it’s common to refer to the
CAP theorem as the reason why you may need to
relax consistency.
• The basic statement of the CAP theorem is that,
given the three properties of Consistency,
Availability, and Partition tolerance, you can only
get two.
• Consistency is pretty much as we’ve defined it so
far. Availability has a particular meaning in the
context of CAP—it means that if you can talk to a
node in the cluster, it can read and write data.
The CAP Theorem
• Partition tolerance means that the cluster can survive
communication breakages that separate it into multiple
partitions unable to communicate with each other
(a situation known as a split brain).
The CAP Theorem
• A single-server system is the obvious example
of a CA system—a system that has Consistency
and Availability but not Partition tolerance.
• A single machine can’t partition, so it does not
have to worry about partition tolerance.
• There’s only one node—so if it’s up, it’s
available.
• Being up and keeping consistency is
reasonable. This is the world that most
relational database systems live in.
The CAP Theorem
• It is theoretically possible to have a CA cluster.
• However, this would mean that if a partition ever
occurs in the cluster, all the nodes in the cluster
would go down so that no client can talk to a node.
• But CAP uses “availability” in a special sense.
• CAP defines “availability” to mean “every request
received by a non-failing node in the system must
result in a response”. So a failed, unresponsive
node doesn’t imply a lack of CAP availability.
The CAP Theorem
• This does imply that you can build a CA
cluster, but you have to ensure it will only
partition rarely and completely.
• Remember that in order to bring down all the
nodes in a cluster on a partition, you also have
to detect the partition in a timely manner—
which itself is no small feat.
The CAP Theorem
• So clusters have to be tolerant of network partitions.
And here is the real point of the CAP theorem.
• Although the CAP theorem is often stated as “you can
only get two out of three,” in practice what it says is that
in a system that may suffer partitions, as distributed
systems do, you have to trade off consistency versus
availability.
• This isn’t a binary decision; often, you can trade off a
little consistency to get some availability. The resulting
system would be neither perfectly consistent nor
perfectly available—but would have a combination that
is reasonable for your particular needs.
The CAP Theorem
• Example : Martin and Pramod are both trying to book
the last hotel room on a system that uses peer-to-
peer distribution with two nodes (London for Martin
and Mumbai for Pramod). If we want to ensure
consistency, then when Martin tries to book his room
on the London node, that node must communicate
with the Mumbai node before confirming the
booking. And both nodes must agree on the
serialization of their requests.
• This gives us consistency—but should the network link
break, then neither system can book any hotel room,
sacrificing availability.
The CAP Theorem
• One way to improve availability is to designate
one node as the master for a particular hotel and
ensure all bookings are processed by that master.
• If the master is Mumbai, then Mumbai can still
process hotel bookings for that hotel, and Pramod
will get the last room.
• If we use master-slave replication, London users
can see the inconsistent room information but
cannot make a booking and thus cause an update
inconsistency.
The CAP Theorem
• We still can’t book a room on the London
node for the hotel whose master is in Mumbai
if the connection goes down.
• In CAP terminology, this is a failure of
availability in that Martin can talk to the
London node but the London node cannot
update the data.
• To gain more availability, we might allow both
systems to keep accepting hotel reservations
even if the network link breaks down.
The CAP Theorem
• The classic example of allowing inconsistent writes is the
shopping cart
• In this case you are always allowed to write to your
shopping cart, even if network failures mean you end up
with multiple shopping carts. The checkout process can
merge the shopping carts by putting the union of the
items from the carts into a single cart and returning that.
• Almost always that’s the correct answer—but if not, the
user gets the opportunity to look at the cart before
completing the order.
• These situations are closely tied to the domain and
require domain knowledge to resolve.
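• A minimal sketch of the union merge described above (item names and quantities are made up; whether max is the right way to combine quantities is itself a domain decision):

```python
# Two divergent copies of the same cart, written on different nodes while the
# network link between them was down.
cart_on_node_a = {"nosql-distilled": 1, "refactoring": 1}
cart_on_node_b = {"nosql-distilled": 2}

def merge_carts(*carts):
    """Union merge: keep every item seen in any copy, taking the larger quantity."""
    merged = {}
    for cart in carts:
        for item, quantity in cart.items():
            merged[item] = max(merged.get(item, 0), quantity)
    return merged

print(merge_carts(cart_on_node_a, cart_on_node_b))
# {'nosql-distilled': 2, 'refactoring': 1}
```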
The CAP Theorem
• A similar logic applies to read consistency.
• If you are trading financial instruments over a
computerized exchange, you may not be able to tolerate
any data that isn’t right up to date.
• However, if you are posting a news item to a media
website, you may be able to tolerate old pages for minutes.
• In these cases you need to know how tolerant you are of
stale reads, and how long the inconsistency window can be
—often in terms of the average length, worst case, and
some measure of the distribution for the lengths.
• Different data items may have different tolerances for
staleness, and thus may need different settings in your
replication configuration.
Relaxing Durability
• The key to Consistency is serializing requests by
forming Atomic, Isolated work units.
• There are cases where you may want to trade off some
durability for higher performance.
• If a database can run mostly in memory, apply updates
to its in-memory representation, and periodically flush
changes to disk, then it may be able to provide
substantially higher responsiveness to requests.
• The cost is that, should the server crash, any updates
since the last flush will be lost.
Relaxing Durability
• Example : A big website may have many users and keep
temporary information about what each user is doing in
some kind of session state.
• There’s a lot of activity on this state, creating lots of
demand, which affects the responsiveness of the website.
The vital point is that losing the session data isn’t too much
of a tragedy—it will create some annoyance, but maybe less
than a slower website would cause.
• This makes it a good candidate for nondurable writes. Often,
you can specify the durability needs on a call-by-call basis,
so that more important updates can force a flush to disk.
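• A minimal sketch of this tradeoff: an in-memory store that flushes to disk only periodically, while letting an important write force a flush (the class and parameter names are illustrative):

```python
import json
import time

class MostlyInMemoryStore:
    """Illustrative store: updates hit memory; disk is only written on flush."""

    def __init__(self, path, flush_interval=5.0):
        self._path = path
        self._data = {}
        self._flush_interval = flush_interval
        self._last_flush = time.monotonic()

    def put(self, key, value, durable=False):
        self._data[key] = value
        # Ordinary writes are flushed lazily; important ones force a flush now.
        if durable or time.monotonic() - self._last_flush > self._flush_interval:
            self.flush()

    def flush(self):
        with open(self._path, "w") as f:
            json.dump(self._data, f)
        self._last_flush = time.monotonic()

store = MostlyInMemoryStore("state.json")
store.put("session:42", {"cart": ["nosql-distilled"]})   # fast; lost if we crash before a flush
store.put("order:99", {"total": 30.0}, durable=True)     # forces the flush to disk
```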
Relaxing Durability
• Another class of durability tradeoffs comes up
with replicated data.
• A failure of replication durability occurs when
a node processes an update but fails before
that update is replicated to the other nodes.
Relaxing Durability
• In a master-slave distribution model, the slaves may appoint a
new master automatically should the existing master fail.
• Any update written to the master but not yet passed on to the
replicas will effectively be lost.
• If the master comes back online, those updates will conflict
with updates that have happened since.
• We think of this as a durability problem because you think
your update has succeeded since the master acknowledged
it, but a master node failure caused it to be lost.
• If the master can be recovered rapidly, there may be no need
to fail over automatically to a slave.
Relaxing Durability
• Otherwise, one can improve replication
durability by ensuring that the master waits
for some replicas to acknowledge the update
before the master acknowledges it to the
client.
• Obviously, however, that will slow down
updates and make the cluster unavailable if
slaves fail—so, again, we have a tradeoff,
depending upon how vital durability is.
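• A minimal sketch of that acknowledge-after-replication idea (the replica objects and their apply method are assumptions; a real master would contact replicas in parallel):

```python
def replicated_write(key, value, replicas, min_acks=1):
    """Apply a write to the replicas and only report success to the client once
    at least min_acks of them have acknowledged it."""
    acks = 0
    for replica in replicas:
        try:
            replica.apply(key, value)      # assumed to raise ConnectionError on failure
            acks += 1
        except ConnectionError:
            continue                       # a slow or failed replica doesn't stop the write
    if acks < min_acks:
        raise RuntimeError("write not replicated: only %d acknowledgment(s)" % acks)
    return acks    # now it is safe for the master to acknowledge the client
```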
Quorums
• When you’re trading off consistency or
durability, it’s not an all-or-nothing proposition.
• The more nodes you involve in a request, the
higher the chance of avoiding an
inconsistency.
• This naturally leads to the question: How
many nodes need to be involved to get strong
consistency?
Quorums
• write quorum – expressed as W > N/2, meaning the
number of nodes participating in the write (W) must be
more than half the number of nodes involved in
replication (N).
• The number of replicas is often called the replication
factor.
• Example: with data replicated over three nodes, you don’t
need all nodes to acknowledge a write to ensure strong
consistency; all you need is two of them—a majority.
• If you have conflicting writes, only one can get a majority.
Quorums
• read quorum: How many nodes you need to contact to be
sure you have the most up-to-date change.
• It’s a bit more complicated because it depends on how many
nodes need to confirm a write.
• Example:
Let’s consider a replication factor of 3. If all writes need two
nodes to confirm (W = 2) then we need to contact at least two
nodes to be sure we’ll get the latest data. If, however, writes
are only confirmed by a single node (W = 1) we need to talk to
all three nodes to be sure we have the latest updates.
• In this case, since we don’t have a write quorum, we may
have an update conflict, but by contacting enough readers we
can be sure to detect it. Thus we can get strongly consistent
reads even if we don’t have strong consistency on our writes.
Quorums
• The relationship between the number of nodes you
need to contact for a read (R), those confirming
a write (W), and the replication factor (N) can
be captured in an inequality: You can have a
strongly consistent read if R + W > N.
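• A small sketch that checks both quorum conditions for a given configuration (the function name is made up):

```python
def quorum_properties(n, w, r):
    """For replication factor n, write quorum w, and read quorum r, report
    whether writes and reads are strongly consistent."""
    return {
        "strongly_consistent_writes": w > n / 2,   # two conflicting writes can't both win
        "strongly_consistent_reads": r + w > n,    # every read overlaps the latest write
    }

print(quorum_properties(n=3, w=2, r=2))   # the usual setup: both conditions hold
print(quorum_properties(n=3, w=1, r=3))   # weak writes, but reads still find the latest value
print(quorum_properties(n=3, w=3, r=1))   # read one node, at the cost of writing to all three
```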
Key Points
• Write-write conflicts occur when two clients try to write the same data at the same time.
• Read-write conflicts occur when one client reads inconsistent data in the middle of another client’s
write.
• Pessimistic approaches lock data records to prevent conflicts. Optimistic approaches detect
conflicts and fix them.
• Distributed systems see read-write conflicts due to some nodes having received updates while
other nodes have not.
• Eventual consistency means that at some point the system will become consistent once all the
writes have propagated to all the nodes.
• Clients usually want read-your-writes consistency, which means a client can write and then
immediately read the new value. This can be difficult if the read and the write happen on different
nodes.
• To get good consistency, you need to involve many nodes in data operations, but this increases
latency. So you often have to trade off consistency versus latency.
• The CAP theorem states that if you get a network partition, you have to trade off availability of data
versus consistency.
• Durability can also be traded off against latency, particularly if you want to survive failures with
replicated data.
• You do not need to contact all replicas to preserve strong consistency with replication; you just
need a large enough quorum.