Consistency in NoSQL

Dr. Dipali P. Meher
MCS, M.Phil, NET. Ph.D
Modern College of Arts, Science and Commerce, Ganeshkhind, Pune 411016
mailtomeher@gmail.com/dipalimeher@moderncollegegk.org
Source: NoSQL Distilled by Pramod J. Sadalage & Martin Fowler

Relational databases try to exhibit strong consistency by
avoiding all the various inconsistencies.
In NOSQL “CAP theorem” & “eventual consistency”

Consistency
01 Read Consistency
02 Update Consistency
03 Relaxing Consistency
04 The CAP Theorem
05 Relaxing Durability
06 Quorums

ACID Properties

Consistency
Consistency in database systems refers to the
requirement that any given database transaction must
change affected data only in allowed ways.
Any data written to the database must be valid according
to all defined rules,
including constraints, cascades, triggers, and any
combination thereof.
This does not guarantee correctness of the transaction in
all ways the application programmer might have wanted
(that is the responsibility of application-level code) but
merely that any programming errors cannot result in the
violation of any defined database constraints.

Consistency
Database consistency states that only valid data will
be written to the database.
If a transaction is executed that violates the
database's consistency rules, the entire transaction
will be rolled back and the database will be
restored to its original state.
On the other hand, if a transaction successfully
executes, it will take the database from one state
that is consistent with the rules to another state that
is also consistent with the rules.
Database consistency doesn't mean that the transaction is correct,
only that the transaction didn't break the defined rules.
You may have experience with consistency rules about
leaving a field on a web page form empty.
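
The rollback behaviour described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the `account` table, its CHECK rule and the `transfer` helper are hypothetical names that do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # the consistency rule
)
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; undo both writes if a rule is broken."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


ok_small = transfer(conn, "alice", "bob", 30)   # valid: both balances stay >= 0
ok_large = transfer(conn, "alice", "bob", 500)  # would make alice negative
final_balances = balances(conn)
```

The second transfer violates the CHECK rule on its second statement, so the credit already applied to `bob` is rolled back too, and the database stays in the state left by the first transfer.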

Consistency
An RDBMS gives strong consistency.
In NoSQL, consistency is discussed in terms of the CAP theorem and eventual consistency.

Update Consistency
Person x and person y both want to update a telephone number on a
website, and both have update rights.
They use different formats for the number, and they submit their
updates at the same time.
This is called a write-write conflict: two people updating the
same data item at the same time.
Write-write conflict
When the writes reach the server, the server serializes them, applying one after the other.
Assume the server uses alphabetical order: person x's update is applied first, then person y's.
Person x's update is therefore immediately overwritten by person y's.
The phone number written by person x is known as a lost update.
This is a failure of consistency. (Here the lost update is not a big problem, but often it is.)
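
A minimal sketch of the lost update, assuming a server with no concurrency control that simply applies writes in alphabetical order of the client name. The `Server` class and the phone numbers are made up for illustration.

```python
class Server:
    """A single server with no concurrency control: the last write wins."""

    def __init__(self, phone):
        self.phone = phone
        self.log = []  # (client, value before, value written)

    def apply_in_alphabetical_order(self, pending):
        # `pending` maps client name -> new value; serialize by client name.
        for client in sorted(pending):
            self.log.append((client, self.phone, pending[client]))
            self.phone = pending[client]


server = Server(phone="020 2563 0000")
server.apply_in_alphabetical_order({
    "y": "(020) 2563-1111",  # person y's format
    "x": "020-2563-1111",    # person x's format
})

# Any client whose written value is no longer stored has lost its update.
lost = [client for client, _, new in server.log if new != server.phone]
```

After both writes are applied, only person y's value is stored, and `lost` names person x.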

Approaches for maintaining consistency
1) Pessimistic
2) Optimistic
1) Pessimistic
A pessimistic approach works by preventing conflicts from occurring.
Common approach: write locks. In order to change a value you need to acquire a lock,
and the system ensures that only one client can get a lock at a time.
In the previous scenario, both person x and person y would attempt to acquire
the write lock, but only person x (the first one) would succeed.
Person y would then see the result of person x's write before deciding
whether to make his own update.
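
The write-lock idea can be sketched as follows. This is a single-process toy, not a real lock manager: `LockingStore` and its methods are hypothetical names, and there are no timeouts or deadlock handling.

```python
class LockingStore:
    """Toy store where a client must hold the write lock before updating."""

    def __init__(self, value):
        self.value = value
        self.lock_holder = None

    def acquire(self, client):
        if self.lock_holder is None:
            self.lock_holder = client
            return True
        return False  # someone else holds the lock

    def write(self, client, value):
        if self.lock_holder != client:
            raise PermissionError(f"{client} does not hold the write lock")
        self.value = value

    def release(self, client):
        if self.lock_holder == client:
            self.lock_holder = None


store = LockingStore("old number")
x_got_lock = store.acquire("x")  # x is first and succeeds
y_got_lock = store.acquire("y")  # y is refused while x holds the lock
store.write("x", "x's number")
store.release("x")

# y can now take the lock, see x's result, and decide whether to update at all.
y_retry = store.acquire("y")
seen_by_y = store.value
store.release("y")
```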

2) Optimistic
• Lets conflicts occur, but detects them and takes action to sort them out.
• Common approach: a conditional update.
• Any client that does an update tests the value just before updating it
to see if it has changed since its last read.
• In the previous scenario, person x's update would succeed but person y's would fail.
• The error would let person y know that he should look at the value again
and decide whether to attempt a further update.
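
A conditional update can be sketched like this. The sketch assumes a version counter is used to detect changes instead of comparing the value itself. `VersionedStore` and `update_if_unchanged` are illustrative names, not an API from the source.

```python
class VersionedStore:
    def __init__(self, value):
        self.value = value
        self.version = 1

    def read(self):
        return self.value, self.version

    def update_if_unchanged(self, expected_version, new_value):
        """Apply the write only if nobody has written since `expected_version`."""
        if self.version != expected_version:
            return False  # conflict detected: the caller must re-read
        self.value = new_value
        self.version += 1
        return True


store = VersionedStore("old number")
_, version_seen_by_x = store.read()
_, version_seen_by_y = store.read()

x_ok = store.update_if_unchanged(version_seen_by_x, "x's number")
y_ok = store.update_if_unchanged(version_seen_by_y, "y's number")  # stale read
```

Person y's write is rejected because the version moved on after his read. He can read again and decide whether to retry.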

2) Optimistic (continued)
Another optimistic way to handle a write-write conflict is to save both
updates and record that they are in conflict.
This approach is familiar to many programmers from version control systems,
particularly distributed version control systems, where conflicting commits
are common.
As in version control, the next step is to merge the two updates.
Maybe you show both values to the user and ask them to sort it out.
Or the computer may be able to perform the merge itself.

Both the pessimistic and the optimistic approach rely on a consistent serialization of the updates.
1) Single server: it has to choose one update, then the other, so the order is unambiguous.
2) Peer-to-peer (multiple servers): two nodes might apply the updates
in a different order, resulting in a different value for the telephone number
on each peer.
When people talk about concurrency in distributed systems,
they often talk about sequential consistency:
ensuring that all nodes apply operations in the same order.
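
The effect of ordering can be shown with a small sketch. It assumes each update simply overwrites the stored value, and it uses sorting by client name as a stand-in for whatever ordering rule a real system agrees on.

```python
def apply_updates(initial, updates):
    value = initial
    for _, new_value in updates:
        value = new_value  # each update overwrites the previous one
    return value


update_x = ("x", "020-2563-1111")
update_y = ("y", "(020) 2563-1111")

# Two peers receive the same updates but in different orders.
peer_a = apply_updates("old", [update_x, update_y])
peer_b = apply_updates("old", [update_y, update_x])

# With an agreed order (here: sort by client name) both peers converge.
agreed = sorted([update_y, update_x])
peer_a_seq = apply_updates("old", agreed)
peer_b_seq = apply_updates("old", agreed)
```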

When people first encounter these issues, they often prefer pessimistic
concurrency control because they want to avoid conflicts, although
pessimistic approaches often degrade responsiveness.
Using a single node as the target for all writes for
some data makes it much easier to maintain update consistency.

Read Consistency
A data store that maintains update consistency does not by itself guarantee
that all readers (transactions/users) will always get consistent responses
to their read requests.
Example: we have an order with line items and a shipping charge. The
shipping charge is calculated based on the line items in the order. If we
add a line item, we thus also need to recalculate and update the
shipping charge.
Danger of inconsistency: Martin adds a line item to his order, Pramod then
reads the line items and shipping charge, and then Martin updates the
shipping charge. Pramod has done a read in the middle of Martin's write.
This is called an inconsistent read or read-write conflict.
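
The interleaving can be replayed step by step in a short sketch. The shipping rule (a flat 5 per line item) and the item names are assumptions made up for the example.

```python
def shipping_for(line_items):
    # Hypothetical rule: a flat charge of 5 per line item.
    return 5 * len(line_items)


order = {"line_items": ["book"], "shipping": 5}

# Step 1: Martin adds a line item (the first of his two writes).
order["line_items"].append("lamp")

# Step 2: Pramod reads between Martin's two writes.
pramod_view = {
    "line_items": list(order["line_items"]),
    "shipping": order["shipping"],
}

# Step 3: Martin updates the shipping charge (his second write).
order["shipping"] = shipping_for(order["line_items"])

pramod_consistent = (
    pramod_view["shipping"] == shipping_for(pramod_view["line_items"])
)
final_consistent = order["shipping"] == shipping_for(order["line_items"])
```

Pramod's copy has two line items but the shipping charge for one, so the data he read does not make sense together, even though the stored order ends up consistent.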

Inconsistent read / read-write conflict
In the field of databases, a write-read conflict, also known as reading
uncommitted data, is a computational anomaly associated with interleaved
execution of transactions.
Given a schedule S, T2 could read a database object A that has been
modified by T1, which has not yet committed. This is a dirty read.
The property that such reads violate is known as logical consistency.

Inconsistent read / read-write conflict
A common claim we hear is that NoSQL databases don't support transactions
and thus can't be consistent.
1) Any statement about lack of transactions usually applies only to some
NoSQL databases, in particular the aggregate-oriented ones. Graph databases
tend to support ACID transactions.
2) Aggregate-oriented databases do support atomic updates, but only within
a single aggregate. This gives logical consistency within an aggregate but
not between aggregates.
Logical consistency: ensuring that different data items make sense together.
To avoid a logically inconsistent read-write conflict, relational databases
support the notion of transactions. Provided Martin wraps his two writes in
a transaction, the system guarantees that Pramod will read either both data
items before the update or both after the update.
In the order example, an aggregate-oriented database can avoid the
inconsistency by storing the order, its line items and its shipping charge
in a single order aggregate.

Inconsistency Window
The inconsistency window is the length of time an inconsistency is present:
the period between an update and the moment when any observer is guaranteed
to see the updated value.
A NoSQL system may have a quite short inconsistency window. As one data
point, NoSQL Distilled cites Amazon's documentation as saying that the
window for SimpleDB is usually less than a second.

Inconsistency Window
The presence of an inconsistency window means that different
people will see different things at the same time.
If Martin and Cindy are looking at rooms while on a transatlantic call,
it can cause confusion. It is more common for users to act
independently, and then this is not a problem.
But inconsistency windows can be particularly problematic when you get
inconsistencies with yourself. Consider the example of posting comments
on a blog entry. Few people are going to worry about inconsistency
windows of even a few minutes while people are typing in their latest
thoughts. But if the site load-balances requests across a cluster, you may
post a comment through one node and then refresh the page from another node
that has not yet received it, so it looks as if your post was lost.

Replication Inconsistency
[Figure: an original database replicated to five replicas, R1 to R5, each holding A=5. A client issues Read(A) against replica R1: is A=5?]
Replicas are placed in different geographical areas.

Eventually the updates will propagate fully, and Martin will see that the
room is booked. This situation is therefore generally referred to as
eventually consistent: at any time nodes may have replication
inconsistencies but, if there are no further updates, eventually all
nodes will be updated to the same value.
Data that is out of date is generally referred to as stale, which reminds
us that a cache is another form of replication, essentially following
the master-slave distribution model.

When replication is introduced in NoSQL, a new kind of
consistency comes into the picture.
Example
There is one last hotel room to be booked for an event. The hotel reservation system runs on many nodes.
Martin and Cindy are a couple considering this room, but they are discussing this on the phone
because Martin is in London and Cindy is in Boston. Meanwhile Pramod, who is in Mumbai, goes
and books that last room. That updates the replicated room availability, but the update gets to
Boston quicker than it gets to London. When Martin and Cindy fire up their browsers to see if the
room is available, Cindy sees it booked and Martin sees it free. This is another inconsistent read,
but it is a breach of a different form of consistency that we call replication consistency.
Replication consistency: ensuring that the same data item has the same value
when read from different replicas. That does not hold here, so this is
a replication inconsistency.
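
The hotel example can be replayed as a small simulation. It is only a sketch: the `Replica` class and the `pending` queue are illustrative, and real replication is asynchronous rather than a list that is drained by hand.

```python
class Replica:
    def __init__(self, name, rooms_free):
        self.name = name
        self.rooms_free = rooms_free


replicas = {
    city: Replica(city, rooms_free=1)
    for city in ("Mumbai", "Boston", "London")
}

# Pramod books the last room in Mumbai; the update is queued for the others.
replicas["Mumbai"].rooms_free = 0
pending = ["Boston", "London"]  # replicas still waiting for the update

# The update reaches Boston first.
replicas[pending.pop(0)].rooms_free = 0
cindy_sees = replicas["Boston"].rooms_free   # room is booked
martin_sees = replicas["London"].rooms_free  # room still looks free

# Later the update reaches London too, and all replicas agree again.
replicas[pending.pop(0)].rooms_free = 0
values_after = {replica.rooms_free for replica in replicas.values()}
```

Between the two propagation steps, Cindy and Martin read different values for the same data item. Once no updates remain pending, every replica holds the same value.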

Although replication consistency is independent from logical
consistency, replication can exacerbate (worsen) a logical inconsistency
by lengthening its inconsistency window.
Two different updates on the master may be performed in rapid
succession, leaving an inconsistency window of milliseconds. But
delays in networking could mean that the same inconsistency window
lasts for much longer on a slave.

In situations like this, you can tolerate reasonably long inconsistency
windows, but you need read-your-writes consistency, which means that,
once you've made an update, you're guaranteed to continue seeing that
update. One way to get this in an otherwise eventually consistent
system is to provide session consistency: within a user's session there is
read-your-writes consistency.
This does mean that the user may lose that consistency should their session
end for some reason or should the user access the same system
simultaneously from different computers, but these cases are relatively rare.

Sticky Session
There are a couple of techniques to provide session
consistency. A common way, and often the easiest way, is to
have a sticky session: a session that’s tied to one node (this is
also called session affinity). A sticky session allows you to
ensure that as long as you keep read-your-writes consistency
on a node, you’ll get it for sessions too. The downside is that
sticky sessions reduce the ability of the load balancer to do its
job.

Another approach for session consistency is to use
version stamps and ensure every interaction with
the data store includes the latest version stamp
seen by a session. The server node must then
ensure that it has the updates that include that
version stamp before responding to a request.
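
The version-stamp approach can be sketched as follows, assuming a simple integer counter as the version stamp. `Node`, `Session` and the `min_version` parameter are hypothetical names. In this toy a refused read returns None, so a missing key and a lagging node look the same.

```python
class Node:
    def __init__(self):
        self.version = 0
        self.data = {}

    def apply(self, version, key, value):
        self.data[key] = value
        self.version = version

    def read(self, key, min_version):
        """Answer only if this node has caught up to the session's stamp."""
        if self.version < min_version:
            return None  # caller should wait or try another node
        return self.data.get(key)


class Session:
    def __init__(self):
        self.last_seen = 0  # latest version stamp seen by this session


fresh, lagging = Node(), Node()
session = Session()

# The session writes through `fresh`; replication to `lagging` is delayed.
fresh.apply(1, "comment", "first post")
session.last_seen = 1

stale_attempt = lagging.read("comment", session.last_seen)  # refused
good_read = fresh.read("comment", session.last_seen)

lagging.apply(1, "comment", "first post")  # replication arrives
later_read = lagging.read("comment", session.last_seen)
```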

Maintaining session consistency with sticky sessions and master-slave
replication can be awkward if you want to read from the slaves to
improve read performance but still need to write to the master.
There are two ways of handling this:
1) Send writes to the slave, which then takes responsibility for forwarding
them to the master while maintaining session consistency for its client.
2) Switch the session to the master temporarily when doing a write, just long
enough that reads are done from the master until the slaves have caught up with
the update.

Trading off consistency is a familiar concept even in single-server
relational database systems.
There, the principal tool to enforce consistency is the transaction.
Transaction systems usually come with the ability to relax
isolation levels, allowing queries to read data that hasn't been
committed yet, and in practice we see most applications relax
consistency down from the highest isolation level (serializable)
in order to get effective performance.

Consistency is a good thing, but, sadly, sometimes we have to
sacrifice it.
It is always possible to design a system to avoid
inconsistencies, but often impossible to do so without
making unbearable sacrifices in other characteristics of
the system.
As a result, we often have to trade off consistency for
something else.

CAP THEOREM
Proposed by Eric Brewer in 2000.
The basic statement of the CAP theorem is that,
given the three properties of Consistency,
Availability, and Partition tolerance, you can only
get two.
Consistency: all nodes see the same data item at the same time.
Availability: every request received by a non-failing database node
should result in a response.
Partition tolerance: the system continues to operate despite
arbitrary partitioning due to network failures.

CAP Theorem
• The CAP theorem is a tool used to make system designers aware of the trade-offs while designing
networked shared-data systems.
• The theorem states that networked shared-data systems can only guarantee (strongly support)
two of the following three properties:
Consistency: a guarantee that every node in a distributed cluster returns the same, most recent,
successful write. Consistency refers to every client having the same view of the data. There are
various types of consistency models. Consistency in CAP (as used to prove the theorem) refers to
linearizability or sequential consistency, a very strong form of consistency.
Availability: every non-failing node returns a response for all read and write requests in a
reasonable amount of time. The key word here is "every": to be available, every node (on either side of
a network partition) must be able to respond in a reasonable amount of time.
Partition tolerance: the system continues to function and upholds its consistency guarantees in
spite of network partitions. Network partitions are a fact of life. Distributed systems guaranteeing
partition tolerance can gracefully recover from partitions once the partition heals.

CAP Theorem
The C and A in ACID represent different concepts than the C and A in the CAP theorem.
CP (consistent and partition tolerant): at first glance, the CP category is confusing, since
it seems to describe a system that is consistent and partition tolerant but never available.
CP actually refers to a category of systems where availability is sacrificed only in the
case of a network partition.
CA (consistent and available): CA systems are consistent and available in the
absence of any network partition. Single-node DB servers are often categorized as CA
systems: they do not need to deal with partition tolerance and are thus considered CA.
The only hole in this theory is that single-node DB systems are not a network of
shared-data systems and thus do not fall under the purview of CAP.
AP (available and partition tolerant): these are systems that are available and
partition tolerant but cannot guarantee consistency.

CAP Theorem
The correct way to think about CAP is that in case of a network partition
(a rare occurrence) one needs to choose between availability
and consistency.

Relaxing Durability
What is the point of a data store if it can
lose updates?

In some cases the user may want to trade off
some durability for higher performance.
If a database can run mostly in memory, it can apply updates to its
in-memory representation and flush the changes to disk only periodically.
This gives substantially higher responsiveness to requests.
The cost is that, if the server crashes,
any updates since the last flush will be lost.
More important updates can force a flush to disk.

Example:
A big website may have many users and keep temporary
information about what each user is doing in some kind of
session state.
There’s a lot of activity on this state, creating lots of demand,
which affects the responsiveness of the website.
The vital point is that losing the session data isn’t too much of
a tragedy—it will create some annoyance, but maybe less than
a slower website would cause.
Durability can also be specified on a call-by-call basis, so that more important
updates can force a flush to disk.

Example:
Another case for relaxing durability is capturing telemetric data
from physical devices.
It may be that you’d rather capture data at a
faster rate, at the cost of missing the last
updates should the server go down.
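
The flush-based trade-off described in this section can be sketched with a toy store. Everything here is an assumption made for illustration: `BufferedStore` is a hypothetical class, a plain dict stands in for the disk, and the `durable` flag models call-by-call durability.

```python
class BufferedStore:
    """Applies updates in memory; `disk` stands in for durable storage."""

    def __init__(self):
        self.memory = {}
        self.disk = {}

    def put(self, key, value, durable=False):
        self.memory[key] = value
        if durable:
            self.flush()  # important update: force a flush now

    def flush(self):
        self.disk = dict(self.memory)

    def crash_and_restart(self):
        # Everything not flushed is gone; reload what was on disk.
        self.memory = dict(self.disk)


store = BufferedStore()
store.put("order:42", "paid", durable=True)  # call-by-call durability
store.put("session:abc", "viewing cart")     # fast, not yet on disk
store.crash_and_restart()
```

After the simulated crash only the order survives. The session entry is lost, which is the annoyance the website example is willing to accept in return for responsiveness.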

Example: Relaxing Durability with Replicated Data
A failure of replication durability occurs
when a node processes an update but
fails before that update is replicated to
the other nodes.
[Figure: an update to database item Z is applied at Node A, which synchronizes updates to replicas B, C and D. Node A fails before the update is replicated to the other nodes.]

Example: Relaxing Durability in Replicated Data (master-slave replication)
• Assume the slaves appoint a new master automatically should the existing master fail.
• If the master fails, any writes not yet passed on to the slaves (replicas) are lost.
• When the master comes back online, those updates will conflict with updates
that have happened since.
• This is known as a durability problem: the user thinks the update succeeded
because the master acknowledged it, but the master's failure caused it to be lost.

Solution to Durability Problem
Improve replication durability by ensuring that the
master waits for some replicas to acknowledge the
update before the master acknowledges it to the client.
This slows down updates and can make the cluster unavailable if slaves fail,
so it is a trade-off that depends on how vital durability is.
Individual calls can therefore indicate what level of durability they need.
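
Waiting for replica acknowledgements can be sketched as below. This toy makes several assumptions: replication is synchronous, `Master`, `Slave` and the `min_acks` parameter are hypothetical names, and the master keeps its local write even when too few slaves confirm, which a real system would have to handle.

```python
class Slave:
    def __init__(self, up=True):
        self.up = up
        self.data = {}

    def replicate(self, key, value):
        if not self.up:
            return False  # no acknowledgement from a failed slave
        self.data[key] = value
        return True


class Master:
    def __init__(self, slaves):
        self.slaves = slaves
        self.data = {}

    def write(self, key, value, min_acks):
        """Acknowledge to the client only after `min_acks` slaves confirm."""
        self.data[key] = value
        acks = sum(slave.replicate(key, value) for slave in self.slaves)
        return acks >= min_acks


master = Master([Slave(up=True), Slave(up=False)])
fast_ack = master.write("z", 1, min_acks=0)    # no slave confirmation required
safe_ack = master.write("z", 2, min_acks=1)    # one replica also holds the update
strict_ack = master.write("z", 3, min_acks=2)  # cannot be met while a slave is down
```

The `min_acks` argument plays the role of a per-call durability level: higher values survive more failures but make writes slower, or impossible, when slaves are down.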

Quorums
• Trading off consistency or durability is not an all-or-nothing choice.
• The more nodes you involve in a request, the higher the chance of
avoiding an inconsistency.
• But how many nodes need to be involved to get strong consistency?

Write Quorum
• Suppose some data is replicated over three nodes.
• Not all nodes need to acknowledge a write to ensure strong consistency.
• Two of them, a majority, are enough.
• If there are conflicting writes (a write-write conflict), only one of them can get a majority.
W > N/2
W: number of nodes participating in the write
N: number of nodes involved in replication
Meaning: the number of nodes participating in the write must be
more than half the number of nodes involved in replication.
Replication factor: the number of replicas is often
called the replication factor.
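
The write quorum rule W > N/2 is easy to check in code. The function names below are illustrative only.

```python
def has_write_quorum(w, n):
    """True when W > N/2, i.e. a majority of the N replicas confirm the write."""
    return 2 * w > n


def smallest_write_quorum(n):
    """Smallest W that satisfies W > N/2."""
    return n // 2 + 1


# Replication factor 3: which values of W form a write quorum?
checks = {w: has_write_quorum(w, 3) for w in (1, 2, 3)}
quorum_sizes = {n: smallest_write_quorum(n) for n in (3, 4, 5)}
```

Because two majorities of the same N nodes must share at least one node, two conflicting writes cannot both reach a write quorum.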

Read Quorum
• The number of nodes you need to contact to be sure you have the most
up-to-date change.
• The read quorum is more complicated, because it depends on how many nodes
need to confirm a write.

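Example of Write Quorum and Read Quorum
Replication factor = 3
R: number of nodes contacted for a read
W: number of nodes participating in the write
N: number of nodes involved in replication
N = 3
If all writes are confirmed by two nodes (W = 2), a reader needs to contact
at least two nodes to be sure of getting the latest (consistent) data.
If writes are confirmed by only one node (W = 1), a reader has to talk to all
three nodes to be sure of seeing the latest update. With W = 1 there is no
write quorum, so write-write conflicts can occur, but contacting enough nodes
on a read lets us detect them.
So we can get strongly consistent reads even without strong consistency on
writes, as long as enough nodes are contacted, i.e. R + W > N.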
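
The inequality R + W > N can be turned into a small helper that reproduces the numbers in the example. The function names are illustrative.

```python
def strongly_consistent_read(r, w, n):
    """True when R + W > N, so every read set overlaps every write set."""
    return r + w > n


def min_read_nodes(w, n):
    """Fewest nodes a read must contact for a given W and N."""
    return n - w + 1


N = 3
table = {w: min_read_nodes(w, N) for w in (1, 2, 3)}
```

For N = 3 this gives R = 3 when W = 1, R = 2 when W = 2, and R = 1 when W = 3.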

Example of Write Quorum and Read Quorum (continued)
Peer-to-peer distribution model: a read is strongly consistent if the inequality R + W > N holds.
Master-slave replication: write to the master only to avoid write-write
conflicts, and read from the master only to avoid read-write conflicts.
A replication factor of 3 is enough to get good resilience.
For fast, strongly consistent reads, you can require writes to be acknowledged
by all the nodes, thus allowing reads to contact only one:
N = 3
W = 3
R = 1

Key Points to Remember
• Relational databases try to exhibit strong consistency by avoiding all the various inconsistencies.
• In NoSQL, the discussion is framed by the CAP theorem and eventual consistency.
• Pessimistic approaches lock data records to prevent conflicts. Optimistic approaches detect
conflicts and fix them.
• To get good consistency, you need to involve many nodes in data operations, but this increases
latency. So you often have to trade off consistency versus latency.
• The CAP theorem states that if you get a network partition, you have to trade off availability of
data versus consistency.
• Durability can also be traded off against latency, particularly if you want to survive failures with
replicated data.
• You do not need to contact all replicas to preserve strong consistency with
replication; you just need a large enough quorum.
