1) Relational databases try to maintain strong consistency by avoiding inconsistencies, while NoSQL databases accept some inconsistencies due to the CAP theorem and eventual consistency.
2) Consistency in databases refers to only allowing valid data transactions according to the defined rules to prevent violations. NoSQL databases sacrifice some consistency for availability and partition tolerance.
3) Eventual consistency means replicas may show temporary inconsistencies but will converge to the same state once no further updates arrive. This can cause problems for applications that require strong consistency.
Dr. Dipali P. Meher introduces NoSQL consistency concepts. Key authors: Pramod J. Sadalage & Martin Fowler.
Relational databases focus on strong consistency while NoSQL embraces the CAP theorem emphasizing eventual consistency.
Covers several aspects of consistency, including update and read consistency, relaxing consistency, the CAP theorem, and relaxing durability.
Introduces ACID properties relevant to database consistency as per Sadalage and Fowler’s work.
Consistency means valid data changes per defined rules; transactions violating consistency are rolled back.
RDBMS provides strong consistency; NoSQL focuses on CAP theorem and eventual consistency.
Write-write conflicts arise when simultaneous updates occur, leading to lost updates and failure in consistency.
Pessimistic and optimistic concurrency control methods are detailed for avoiding and handling conflicts.
Optimistic approaches allow conflicts, detecting and handling them akin to version control systems.
Pessimistic concurrency control prefers single-node writes for easier consistency management.
Inconsistencies occur during reads when updates are happening concurrently, posing risks of conflicting data.
Dirty reads & logical consistency issues relate to interleaved transaction execution causing data inconsistency.
Defines an inconsistency window; the period during which updates aren't visible to all users, potentially leading to confusion.
Demonstrates read consistency across replicas, explaining eventual consistency and how stale data arises.
Illustrates a scenario demonstrating replication inconsistency when updates arrive at different times to users.
Replication can worsen logical inconsistency by extending inconsistency windows, complicating data correctness.
Methods like sticky sessions maintain read-your-writes consistency within user sessions to ensure data correctness.
Discusses using version stamps and managing session consistency amidst replication for user-specific data accuracy.
Explains how systems often relax consistency for performance, highlighting challenges in maintaining data correctness.
CAP theorem states that distributed systems can guarantee only two of the three properties: consistency, availability, and partition tolerance.
Explores various CAP theorem scenarios; emphasizes trade-offs between consistency, availability, and partition tolerance.
Discusses instances where durability can be sacrificed for performance, impacting data recovery during failures.
Highlights replication durability failures and issues with lost updates during node failures in master-slave systems.
Describes solutions to enhance durability while balancing the need for performance and data integrity.
Explains write and read quorum significance in balanced data operations ensuring consistency across nodes.
Summarizes relational vs. NoSQL consistency, the CAP theorem, and practical implications in data management.
Consistency in NoSQL
Dr. Dipali P. Meher
MCS, M.Phil, NET, Ph.D
Modern College of Arts, Science and Commerce, Ganeshkhind, Pune 411016
mailtomeher@gmail.com/dipalimeher@moderncollegegk.org
Source: NoSQL Distilled by Pramod J. Sadalage & Martin Fowler
2.
Relational databases try to exhibit strong consistency by
avoiding all the various inconsistencies.
In NoSQL, the key ideas are the “CAP theorem” and “eventual consistency”.
Source: NoSQL Distilled by
Pramod J. Sadalage
Martin Fowler
Prepared by Dr. Dipali Meher
ACID Properties
5.
Consistency
Consistency in database systems refers to the
requirement that any given database transaction must
change affected data only in allowed ways.
Any data written to the database must be valid according
to all defined rules,
including constraints, cascades, triggers, and any
combination thereof.
This does not guarantee correctness of the transaction in
all ways the application programmer might have wanted
(that is the responsibility of application-level code) but
merely that any programming errors cannot result in the
violation of any defined database constraints.
6.
Consistency
Database consistency states that only valid data will
be written to the database.
If a transaction is executed that violates the
database's consistency rules, the entire transaction
will be rolled back and the database will be
restored to its original state.
On the other hand, if a transaction successfully
executes, it will take the database from one state
that is consistent with the rules to another state that
is also consistent with the rules.
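The rollback behaviour described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the account table, the CHECK rule, and the transfer helper are assumptions made up for the example, not part of the slides.

```python
import sqlite3

# Hypothetical schema: the CHECK constraint is the "defined rule".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('x', 100), ('y', 0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; undo both updates if a rule is violated."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


ok = transfer(conn, "x", "y", 30)         # valid: both updates are kept
rejected = transfer(conn, "x", "y", 500)  # would make x negative: rolled back
print(ok, rejected, balances(conn))
```

The second transfer credits y before the debit on x fails, yet the credit is undone as well, so the database moves only between states that satisfy the rule.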
Database consistency doesn't mean that the transaction is correct,
only that the transaction didn't break the rules defined by the program.
You may have experience with consistency rules about
leaving a field on a web page form empty.
7.
Consistency
An RDBMS gives strong consistency.
In NoSQL: the CAP theorem and eventual consistency.
8.
Update Consistency
Person X and person Y both want to update a telephone number on a
website. Both have update rights.
They use different formats, but they update the phone number
at the same time.
This issue is called a write-write conflict: two people updating
the same data item at the same time.
Write-write conflict
When these writes reach the server, the server serializes them, applying one after the other.
1) Suppose the server uses alphabetical order: person X's update is applied first, then person Y's.
So person X's update is overwritten by person Y's update.
The phone number written by person X is therefore a lost update.
This is a failure of consistency (in this example the lost update is not a big problem).
9.
Approaches for maintaining consistency
1) Pessimistic
2) Optimistic
The pessimistic approach works by preventing conflicts from occurring.
Common approach: write locks.
In order to change a value you need to acquire a lock,
and the system ensures that only one client can get a
lock at a time.
In the previous scenario, both person X and person Y
attempt to acquire the write lock, but only person X (the first one)
would succeed.
Person Y would then see the result of person X's write and
decide whether to make his own update or not.
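A minimal sketch of the write-lock idea, assuming a single process where a threading.Lock stands in for the database's lock manager. The LockedRecord class and the phone values are hypothetical.

```python
import threading


class LockedRecord:
    """A value that can only be changed by whoever holds its write lock."""

    def __init__(self, value):
        self.value = value
        self._write_lock = threading.Lock()

    def try_acquire(self):
        # Non-blocking: returns True only for the one client that gets the lock.
        return self._write_lock.acquire(blocking=False)

    def write_and_release(self, new_value):
        self.value = new_value
        self._write_lock.release()


phone = LockedRecord("5550100")
x_got_lock = phone.try_acquire()       # person X gets the lock
y_got_lock = phone.try_acquire()       # person Y is refused while X holds it
phone.write_and_release("(555) 0100")  # X writes, then releases the lock
y_sees = phone.value                   # Y reads X's result before deciding
print(x_got_lock, y_got_lock, y_sees)
```

A real system would usually make person Y wait for the lock rather than fail immediately; the non-blocking call is used here only to keep the example deterministic.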
10.
Approaches for maintaining consistency
2) Optimistic
Lets conflicts occur, but detects them and
takes action to sort them out.
Common approach: conditional update.
Any client that does an update tests the value
just before updating it to see if it has changed
since the client's last read.
In the previous scenario, person X's update would
succeed but person Y's would fail.
The error would let person Y know that he
should look at the value again and decide
whether to attempt a further update.
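The conditional update can be sketched as a compare-and-set on a version counter. The VersionedRecord class and its method names are assumptions for illustration; real stores expose this idea in different ways.

```python
class VersionedRecord:
    """A value plus a version number that changes on every successful write."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def conditional_update(self, new_value, expected_version):
        # Refuse the write if someone else changed the value since our read.
        if self.version != expected_version:
            return False
        self.value = new_value
        self.version += 1
        return True


phone = VersionedRecord("5550100")
_, x_seen = phone.read()  # person X reads version 0
_, y_seen = phone.read()  # person Y reads version 0 as well
x_ok = phone.conditional_update("555-0100", x_seen)    # succeeds
y_ok = phone.conditional_update("(555) 0100", y_seen)  # fails: version moved on
print(x_ok, y_ok, phone.read())
```

After the failure, person Y would read the record again and decide whether a further update is still needed.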
11.
Approaches for maintaining consistency
2) Optimistic
Another way to handle a write-write conflict: save both updates and record that they are
in conflict.
This approach is familiar to many programmers
from version control systems, particularly
distributed version control systems (as there are
often conflicting commits).
As in version control, the next step is to merge the two updates.
Maybe you show both values to the user and
ask them to sort it out.
Or the computer may be able to perform the merge itself.
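The "save both and merge" idea might look like the sketch below. The SiblingStore class and the merge_phone rule (two phone numbers are the same if their digits match) are hypothetical choices made for this example.

```python
class SiblingStore:
    """Keeps conflicting writes side by side instead of losing one of them."""

    def __init__(self, value):
        self.siblings = [value]
        self.version = 0

    def write(self, new_value, based_on_version):
        if based_on_version == self.version:
            self.siblings = [new_value]
            self.version += 1
        else:
            # The writer had not seen the latest value: record the conflict.
            self.siblings.append(new_value)

    def in_conflict(self):
        return len(self.siblings) > 1

    def resolve(self, merge):
        self.siblings = [merge(self.siblings)]
        self.version += 1


def merge_phone(values):
    """Automatic merge: accept if all versions contain the same digits."""
    digits = {"".join(ch for ch in v if ch.isdigit()) for v in values}
    if len(digits) != 1:
        raise ValueError("different numbers: ask the user to sort it out")
    return digits.pop()


store = SiblingStore("5550100")
store.write("555-0100", based_on_version=0)    # person X
store.write("(555) 0100", based_on_version=0)  # person Y, same starting point
conflict_detected = store.in_conflict()
store.resolve(merge_phone)
print(conflict_detected, store.siblings)
```

When the digits differ, merge_phone raises instead of guessing, which corresponds to showing both values to the user.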
12.
Both the pessimistic and optimistic approaches rely on a consistent serialization of the updates.
1) Single server: it has to choose one, then the other (synchronization is done).
2) Peer-to-peer (multi-server): two nodes might apply the updates
in a different order, resulting in a different value for the telephone number
on each peer.
When people talk about concurrency in distributed systems,
they often talk about sequential consistency:
ensuring that all nodes apply operations in the same order.
13.
Approaches for maintaining consistency
When people first encounter these issues, they often prefer
pessimistic concurrency control (they try to avoid conflicts).
Using a single node as the target for all writes for
some data makes it much easier to maintain update consistency.
14.
Read Consistency
A data store that maintains update consistency does not guarantee that all readers (transactions/users) will always get consistent responses to their read requests.
Example
We have an order with line items and a shipping charge. The shipping charge is calculated based on the line items in the order. If we add a line item, we thus also need to recalculate and update the shipping charge.
Danger of inconsistency
Martin adds a line item to his order, Pramod then reads the line items and shipping charge, and then Martin updates the shipping charge. Pramod has done a read in the middle of Martin's write.
This is called an inconsistent read or read-write conflict.
Inconsistent read / read-write conflict
In the field of databases, a write-read conflict, also known as reading uncommitted data, is a computational anomaly associated with interleaved execution of transactions.
Given a schedule S, T2 could read a database object A that has been modified by T1, which hasn't committed. This is a dirty read.
Conflicts of this kind are failures of logical consistency.
17.
Inconsistent read / read-write conflict
Logical consistency: ensuring that different data items make sense together.
To avoid a logically inconsistent read-write conflict, relational databases support the notion of transactions.
Provided Martin wraps his two writes in a transaction, the system guarantees that Pramod will read either both data items before the update or both after the update.
A common claim we hear is that NoSQL databases don't support transactions and thus can't be consistent.
1) Any statement about lack of transactions usually applies only to some NoSQL databases, in particular the aggregate-oriented ones. Graph databases, in contrast, tend to support ACID transactions.
2) Aggregate-oriented databases do support atomic updates, but only within a single aggregate. This gives logical consistency within an aggregate but not between aggregates.
In the situation above, the inconsistency can be avoided by making the order, its line items, and the shipping charge part of a single order aggregate.
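The single-aggregate idea can be sketched as a store that replaces a whole order in one step, so a reader never sees new line items with an old shipping charge. The OrderStore class and the flat SHIPPING_PER_ITEM rate are assumptions for illustration only.

```python
import threading

SHIPPING_PER_ITEM = 5  # hypothetical flat rate per line item


class OrderStore:
    """Each order is one aggregate that is read and replaced as a unit."""

    def __init__(self):
        self._lock = threading.Lock()
        self._orders = {}

    def put(self, order_id, line_items):
        items = list(line_items)
        aggregate = {"line_items": items,
                     "shipping": SHIPPING_PER_ITEM * len(items)}
        with self._lock:
            self._orders[order_id] = aggregate

    def add_item(self, order_id, item):
        with self._lock:
            items = self._orders[order_id]["line_items"] + [item]
            # Line items and shipping charge change together, atomically.
            self._orders[order_id] = {
                "line_items": items,
                "shipping": SHIPPING_PER_ITEM * len(items),
            }

    def get(self, order_id):
        with self._lock:
            agg = self._orders[order_id]
            return {"line_items": list(agg["line_items"]),
                    "shipping": agg["shipping"]}


store = OrderStore()
store.put("order-1", ["book"])
store.add_item("order-1", "pen")   # Martin's update
seen = store.get("order-1")        # Pramod's read
print(seen)
```

This only protects data inside one aggregate; an update that spans two orders would still need a transaction or application-level handling.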
18.
Inconsistency Window
The period between the update and the moment when it is
guaranteed that any observer will always see the updated value is
dubbed the inconsistency window.
The length of time an inconsistency is
present is called the inconsistency window.
A NoSQL system may have quite a short
inconsistency window, usually less than a
second.
19.
Inconsistency Window
The presence of an inconsistency window means that different
people will see different things at the same time.
If Martin and Cindy are looking at rooms while on a transatlantic call,
it can cause confusion. It's more common for users to act
independently, and then this is not a problem.
But inconsistency windows can be particularly problematic when you get
inconsistencies with yourself. Consider the example of posting comments
on a blog entry. Few people are going to worry about inconsistency
windows of even a few minutes while people are typing in their latest
thoughts. But if you post a comment and your next page refresh is served by a
node that has not yet received it, it looks as if your post was lost.
20.
Replication Inconsistency
Original database: A=5
Replicas R1, R2, R3, R4 and R5: each holds A=5
Read(A) from replica R1, R2, R3, R4 or R5: does every read return A=5?
Replicas are placed in different geographical areas.
Replication Inconsistency
Eventually the updates will propagate fully, and Martin will see that the room is fully
booked. This situation is therefore generally referred to as eventually
consistent, meaning that at any time nodes may have replication
inconsistencies but, if there are no further updates, eventually all
nodes will be updated to the same value.
Data that is out of date is generally referred to as stale, which reminds
us that a cache is another form of replication—essentially following
the master-slave distribution model.
23.
Replication Inconsistency
Once replication is introduced in NoSQL, a new kind of
consistency comes into the picture.
Example
There is one last hotel room to be booked for an event. The hotel reservation system runs on many nodes.
Martin and Cindy are a couple considering this room, but they are discussing this on the phone
because Martin is in London and Cindy is in Boston. Meanwhile Pramod, who is in Mumbai, goes
and books that last room. That updates the replicated room availability, but the update gets to
Boston quicker than it gets to London. When Martin and Cindy fire up their browsers to see if the
room is available, Cindy sees it booked and Martin sees it free. This is another inconsistent read,
but it's a breach of a different form of consistency we call replication consistency.
Replication consistency: ensuring that the same data item has the same value
when read from different replicas. Here that does not
happen, so this is an inconsistency.
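The hotel scenario can be replayed as a toy simulation. The Replica class, its inbox queue, and the city names as variable names are assumptions for this sketch; real replication protocols are considerably more involved.

```python
from collections import deque


class Replica:
    """A node that applies replicated updates only when they arrive."""

    def __init__(self, name, rooms_free):
        self.name = name
        self.rooms_free = rooms_free
        self.inbox = deque()  # updates in flight over the network

    def deliver(self, rooms_free):
        self.inbox.append(rooms_free)

    def apply_pending(self):
        while self.inbox:
            self.rooms_free = self.inbox.popleft()


mumbai, boston, london = (Replica(n, 1) for n in ("Mumbai", "Boston", "London"))

# Pramod books the last room on the Mumbai node; the update is sent onwards.
mumbai.rooms_free = 0
boston.deliver(0)
london.deliver(0)

boston.apply_pending()            # the update reaches Boston first
cindy_sees = boston.rooms_free    # booked
martin_sees = london.rooms_free   # stale: still shows the room as free

london.apply_pending()            # no further updates: the nodes converge
martin_sees_later = london.rooms_free
print(cindy_sees, martin_sees, martin_sees_later)
```

The gap between the booking and the moment London applies it is the inconsistency window for this update.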
24.
Replication Inconsistency
Although replication consistency is independent from logical
consistency, replication can exacerbate (worsen) a logical inconsistency
by lengthening its inconsistency window.
Two different updates on the master may be performed in rapid
succession, leaving an inconsistency window of milliseconds. But
delays in networking could mean that the same inconsistency window
lasts for much longer on a slave.
25.
In situations like this, you can tolerate reasonably long inconsistency
windows, but you need read-your-writes consistency, which means that
once you've made an update, you're guaranteed to continue seeing that
update. One way to get this in an otherwise eventually consistent
system is to provide session consistency: within a user's session there is
read-your-writes consistency.
This does mean that the user may lose that consistency should their session
end for some reason or should the user access the same system
simultaneously from different computers, but these cases are relatively rare.
26.
Sticky Session
There are a couple of techniques to provide session
consistency. A common way, and often the easiest way, is to
have a sticky session: a session that’s tied to one node (this is
also called session affinity). A sticky session allows you to
ensure that as long as you keep read-your-writes consistency
on a node, you’ll get it for sessions too. The downside is that
sticky sessions reduce the ability of the load balancer to do its
job.
27.
Another approach for session consistency is to use
version stamps and ensure every interaction with
the data store includes the latest version stamp
seen by a session. The server node must then
ensure that it has the updates that include that
version stamp before responding to a request.
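A rough sketch of session consistency with version stamps follows. The Node class, its pending list (standing in for replication traffic that has arrived but not been applied), and the integer version stamp are all assumptions for illustration.

```python
class Node:
    """A replica that tracks the highest version stamp it has applied."""

    def __init__(self):
        self.version = 0
        self.data = {}
        self.pending = []  # (version, key, value) updates not yet applied

    def write(self, key, value):
        self.version += 1
        self.data[key] = value
        return self.version  # the session remembers this stamp

    def catch_up(self):
        for version, key, value in sorted(self.pending):
            self.data[key] = value
            self.version = max(self.version, version)
        self.pending.clear()

    def read(self, key, min_version):
        # Make sure this node has everything the session has already seen.
        if self.version < min_version:
            self.catch_up()
        if self.version < min_version:
            raise RuntimeError("node cannot serve this session yet")
        return self.data.get(key), self.version


node_a, node_b = Node(), Node()
stamp = node_a.write("comment", "hello")      # the session writes via node A
node_b.pending.append((stamp, "comment", "hello"))

stale = node_b.read("comment", min_version=0)      # no stamp: old view
fresh = node_b.read("comment", min_version=stamp)  # stamp forces a catch-up
print(stale, fresh)
```

Unlike a sticky session, the request can be served by any node, at the cost of a possible wait while that node catches up.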
28.
Maintaining session consistency with sticky sessions and master-slave
replication can be awkward if you want to read from the slaves to
improve read performance but still need to write to the master.
There are two ways of handling this:
1) Send writes to the slave, which then takes responsibility for forwarding
them to the master while maintaining session consistency for its client.
2) Switch the session to the master temporarily when doing a write, just long
enough that reads are done from the master until the slaves have caught up with
the update.
29.
Relaxing Consistency
Trading off consistency is a familiar concept even in single-
server relational database systems.
Here, the principal tool to enforce consistency is the
transaction.
Transaction systems usually come with the ability to relax
isolation levels, allowing queries to read data that hasn't been
committed yet, and in practice we see most applications relax
consistency down from the highest isolation level (serializable)
in order to get effective performance.
30.
Relaxing Consistency
Consistency is a Good Thing, but, sadly, sometimes we have to
sacrifice it.
It is always possible to design a system to avoid
inconsistencies, but often impossible to do so without
making unbearable sacrifices in other characteristics of
the system.
As a result, we often have to trade off consistency for
something else.
31.
CAP THEOREM
Proposed by Eric Brewer in 2000
The basic statement of the CAP theorem is that,
given the three properties of Consistency,
Availability, and Partition tolerance, you can only
get two.
Consistency: all the nodes see the same data item at the same time.
Availability: every request received by a non-failing database node
should result in a response.
Partition tolerance: the system continues to operate despite
arbitrary partitioning due to network failures.
36.
CAP Theorem
The CAP theorem is a tool used to make system designers aware of the trade-offs involved in designing
networked shared-data systems.
The theorem states that networked shared-data systems can only guarantee/strongly support
two of the following three properties:
Consistency: a guarantee that every node in a distributed cluster returns the same, most recent,
successful write. Consistency refers to every client having the same view of the data. There are
various types of consistency models. Consistency in CAP (used to prove the theorem) refers to
linearizability or sequential consistency, a very strong form of consistency.
Availability: every non-failing node returns a response for all read and write requests in a
reasonable amount of time. The key word here is every. To be available, every node (on either side of
a network partition) must be able to respond in a reasonable amount of time.
Partition tolerance: the system continues to function and upholds its consistency guarantees in
spite of network partitions. Network partitions are a fact of life. Distributed systems guaranteeing
partition tolerance can gracefully recover from partitions once the partition heals.
37.
CAP Theorem
The C and A in ACID represent different concepts than the C and A in the CAP theorem.
CP (consistent and partition tolerant): at first glance, the CP category is confusing, i.e.,
a system that is consistent and partition tolerant but never available. CP refers to a
category of systems where availability is sacrificed only in the case of a network
partition.
CA (consistent and available): CA systems are consistent and available in the
absence of any network partition. Single-node DB servers are often categorized as CA
systems, since they do not need to deal with partition tolerance.
The only hole in this theory is that single-node DB systems
are not a network of shared-data systems and thus do not fall under the purview of CAP.
AP (available and partition tolerant): these are systems that are available and
partition tolerant but cannot guarantee consistency.
38.
CAP Theorem
The correct way to think about CAP is that in case of a network partition
(a rare occurrence) one needs to choose between availability
and consistency.
39.
Relaxing Durability
What is the point of a data store if it can
lose updates?
40.
In some cases a user may want to trade off
some durability for higher performance.
The database runs mostly in memory.
Updates are applied to the in-memory representation and flushed to disk periodically.
This gives substantially higher responsiveness to requests.
But if there is a server crash,
any updates since the last flush will be lost.
More important updates can force a flush to disk.
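The trade-off can be sketched with a store that writes to memory first and copies changes to "disk" only on a flush. The BufferedStore class is hypothetical, and a plain dictionary stands in for durable storage.

```python
class BufferedStore:
    """Applies updates in memory and flushes them to durable storage later."""

    def __init__(self):
        self.memory = {}
        self.disk = {}       # stand-in for durable storage
        self._dirty = set()  # keys changed since the last flush

    def put(self, key, value, force_flush=False):
        self.memory[key] = value
        self._dirty.add(key)
        if force_flush:      # an important update pays the cost of durability
            self.flush()

    def flush(self):
        for key in self._dirty:
            self.disk[key] = self.memory[key]
        self._dirty.clear()

    def crash_and_restart(self):
        # Only what reached the disk survives the crash.
        self.memory = dict(self.disk)
        self._dirty.clear()


store = BufferedStore()
store.put("order:7", "paid", force_flush=True)  # important: flushed at once
store.put("session:42", "viewing cart")         # fast, but not yet durable
store.crash_and_restart()
print(store.memory)
```

In this sketch a forced flush also writes any other pending changes; a periodic timer calling flush() would play the role of the regular flush mentioned above.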
41.
Example:
A big website may have many users and keep temporary
information about what each user is doing in some kind of
session state.
There’s a lot of activity on this state, creating lots of demand,
which affects the responsiveness of the website.
The vital point is that losing the session data isn’t too much of
a tragedy—it will create some annoyance, but maybe less than
a slower website would cause.
Durability needs can be specified on a call-by-call basis, so that more important
updates can force a flush to disk.
42.
Example:
Another example of relaxing durability is capturing telemetric data
from physical devices.
It may be that you’d rather capture data at a
faster rate, at the cost of missing the last
updates should the server go down.
43.
Example: Relaxing Durability with Replicated Data
A failure of replication durability occurs
when a node processes an update but
fails before that update is replicated to
the other nodes.
[Figure: Node A receives an update to database item Z and should synchronize it to replicas B, C and D. Node A fails before the update is replicated to the other nodes.]
44.
Example: Relaxing Durability in Replicated Data
Master-slave replication
The slaves appoint a new master
automatically should the existing master fail.
If the master fails, any updates not yet passed on to the
slaves (replicas) are lost.
When the master comes back, its updates will
conflict with updates made since the failure.
This is known as a durability
problem: the user thinks the update
succeeded because the master acknowledged it, but
the update was lost when the master node failed.
45.
Solution to Durability Problem
Improve replication durability by ensuring that the
master waits for some replicas to acknowledge the
update before the master acknowledges it to the client.
This solution slows down the update process,
so the user has to decide how vital durability is.
The required level of durability is chosen accordingly.
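A small sketch of a master that acknowledges a write only after enough replicas have confirmed it. The Master and ReplicaNode classes and the required_acks parameter are assumptions for illustration, and replication is shown as a synchronous call for simplicity.

```python
class ReplicaNode:
    def __init__(self, up=True):
        self.up = up
        self.log = []

    def replicate(self, update):
        """Store the update and acknowledge it, unless the node is down."""
        if not self.up:
            return False
        self.log.append(update)
        return True


class Master:
    def __init__(self, replicas, required_acks):
        self.replicas = replicas
        self.required_acks = required_acks  # the chosen level of durability
        self.log = []

    def write(self, update):
        self.log.append(update)
        acks = sum(1 for replica in self.replicas if replica.replicate(update))
        # Tell the client "done" only if enough replicas hold the update.
        return acks >= self.required_acks


healthy, failed = ReplicaNode(up=True), ReplicaNode(up=False)
relaxed = Master([healthy, failed], required_acks=1).write("Z=1")
strict = Master([healthy, failed], required_acks=2).write("Z=2")
print(relaxed, strict)
```

With required_acks=2 the write is not acknowledged while one replica is down, which shows the price of the stricter setting.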
46.
Quorums
Trading off consistency/durability
The more nodes are involved in a request, the higher the chance of
avoiding an inconsistency.
But how many nodes need to be involved to get strong consistency?
47.
Write Quorum
Suppose some data is replicated over three nodes.
Not all the nodes need to acknowledge a write
to ensure strong consistency.
Two of them, a majority, are enough.
With conflicting writes (write-write conflict), only one can get a majority.
W > N/2
W: number of nodes participating in the write
N: number of nodes involved in replication
Meaning: the number of nodes participating in the write must be
more than half the number of nodes involved in replication.
Replication factor: the number of replicas is often
called the replication factor.
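The write quorum rule W > N/2 can be checked with two small helper functions. The function names are made up for this sketch.

```python
def has_write_quorum(w, n):
    """True if w acknowledging nodes form a majority of n replicas."""
    return w > n / 2


def min_write_quorum(n):
    """Smallest W that satisfies W > N/2."""
    return n // 2 + 1


for n in (3, 4, 5):
    print(n, min_write_quorum(n))
```

Because two majorities of the same N nodes must share at least one node, two conflicting writes cannot both reach a write quorum.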
48.
Read Quorum
The read quorum is the number of nodes you need to contact to be sure you have the most
up-to-date change.
This quorum is a bit more complicated, because it depends on how many nodes need
to confirm a write.
49.
Example of Write Quorum and Read Quorum
Replication factor = 3
R: number of nodes
contacted for a read
W: number of nodes participating in the write
N: number of nodes involved in replication
N = 3
If every write must be confirmed by two nodes (W = 2),
a reader needs to contact at least two nodes to be sure of getting the latest
(consistent) data.
If W = 1, the reader has to contact all three nodes to be sure of seeing the latest
updates. With W = 1 there is no write quorum, so a write-write
conflict is possible, but contacting enough nodes on reads lets us detect it.
So strongly consistent reads are possible even without strongly consistent writes, as long as enough nodes are read,
i.e. R + W > N
50.
Example of Write Quorum and Read Quorum
Peer-to-peer distribution model: the inequality R + W > N gives strongly consistent reads.
Master-slave replication: write only to the master to avoid write-write conflicts,
and read only from the master to avoid read-write conflicts.
To get good resilience: replication factor = 3.
For fast, strongly consistent reads, require writes to be acknowledged
by all the nodes, thus allowing reads to contact only one:
N = 3
W = 3
R = 1
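The inequality R + W > N can be sanity-checked by brute force: a read is guaranteed to see the latest write exactly when every possible set of R nodes overlaps every possible set of W nodes. The function names below are assumptions for this sketch.

```python
from itertools import combinations


def strongly_consistent_read(r, w, n):
    """The quorum rule: reads are strongly consistent if R + W > N."""
    return r + w > n


def every_read_sees_latest(r, w, n):
    """Brute force: does every read set share a node with every write set?"""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


n = 3
for w in range(1, n + 1):
    for r in range(1, n + 1):
        print(w, r, strongly_consistent_read(r, w, n),
              every_read_sees_latest(r, w, n))
```

For N = 3 this covers the cases discussed above: W = 2 needs R = 2, W = 1 needs R = 3, and W = 3 allows R = 1.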
51.
Key Points to Remember
Relational databases try to exhibit strong consistency by avoiding all the various inconsistencies.
In NoSQL, the key ideas are the “CAP theorem” and “eventual consistency”.
Pessimistic approaches lock data records to prevent conflicts. Optimistic approaches detect
conflicts and fix them.
To get good consistency, you need to involve many nodes in data operations, but this increases
latency. So you often have to trade off consistency versus latency.
The CAP theorem states that if you get a network partition, you have to trade off availability of
data versus consistency.
Durability can also be traded off against latency, particularly if you want to survive failures with
replicated data.
You do not need to contact all replicas to preserve strong consistency with
replication; you just need a large enough quorum.
Editor's Notes
#40 So far we have talked about consistency, which is most of what people mean when they talk about the ACID properties of database transactions.
The key to Consistency is serializing requests by forming Atomic, Isolated work units.
But most people would scoff at relaxing durability—after all, what is the point of a data store if it can lose updates?
#42 Session state allows a developer to store data about a user as he/she navigates through web pages in a web application.
Annoyance: the feeling of being slightly irritated or troubled
#45 Trade-off: an act of balancing between two opposing situations, qualities or things, both of which you want and need
#47 Quorum: the smallest number of people that must be at a meeting before it can make official decisions.
(Translated from Marathi) The minimum number of members whose presence at a meeting is required before an important decision is taken; quorum, sufficient number.
A quorum is the minimum number of members of an organization who must be present in order
for their meeting to be legal or official.