Module 5 - DBMS
Module 5 - DBMS
Chapter 1
CONCURRENCY CONTROL
1.1 INTRODUCTION:
Concurrent execution refers to the simultaneous execution of more than one transaction. This is a common
scenario in multi-user database environments where many users or applications might be accessing or modifying
the database at the same time.
Advantages of Concurrent Execution
1. Increased System Throughput: Multiple transactions can be in progress at the same time, but at different
stages
2. Maximized Processor Utilization: If one transaction is waiting for I/O operations, another transaction can
utilize the processor.
3. Decreased Wait Time: Transactions no longer have to wait for other long transactions to complete.
4. Improved Transaction Response Time: Transactions get processed faster because they can be executed in
parallel.
Concurrency Control Protocols: The concurrency control protocols ensure the atomicity, consistency, isolation,
durability and serializability of the concurrent execution of the database transactions.
Therefore, these protocols are categorized as:
1. Lock Based Concurrency Control Protocol
2. Time Stamp Concurrency Control Protocol
3. Validation Based Concurrency Control Protocol
Purpose of Concurrency control
• To ensure that Isolation property is maintained while allowing transactions to execute concurrently.
• To preserve database consistency by ensuring that the schedules of executing transactions are serializable.
• To resolve read-write and write-write conflicts among transactions.
• Several types of locks are used in concurrency control. To introduce locking concepts gradually, first we
discuss binary locks, which are simple but are also too restrictive for database concurrency control purposes
and so are not used much.
• Then we discuss shared/exclusive locks—also known as read/write locks—which provide more general
locking capabilities and are used in database locking schemes.
1. Binary Locks. A binary lock can have two states or values: locked and unlocked (or 1 and 0, for simplicity).
• A distinct lock is associated with each database item X. If the value of the lock on X is 1, item X cannot be
accessed by a database operation that requests the item.
• If the value of the lock on X is 0, the item can be accessed when requested, and the lock value is changed to
1. We refer to the current value (or state) of the lock associated with item X as lock(X).
• Two operations, lock_item and unlock_item, are used with binary locking. A transaction requests access to
an item X by first issuing a lock_item(X) operation.
• If LOCK(X) = 1, the transaction is forced to wait. If LOCK(X) = 0, it is set to 1 (the transaction locks the
item) and the transaction is allowed to access item X.
• When the transaction is through using the item, it issues an unlock_item(X) operation, which sets LOCK(X)
back to 0 (unlocks the item) so that X may be accessed by other transactions. Hence, a binary lock
enforces mutual exclusion on the data item. A description of the lock_item(X) and unlock_item(X)
operations is shown below
• Notice that the lock_item and unlock_item operations must be implemented as indivisible units (known
as critical sections in operating systems); that is, no interleaving should be allowed once a lock or unlock
operation is started until the operation terminates or the transaction waits.
• In Figure the wait command within the lock_item(X) operation is usually implemented by putting the
transaction in a waiting queue for item X until X is unlocked and the transaction can be granted access to it.
Other transactions that also want to access X are placed in the same queue. Hence, the wait command is
considered to be outside the lock_item operation.
• It is quite simple to implement a binary lock; all that is needed is a binary-valued variable, LOCK, associated
with each data item X in the database. In its simplest form, each lock can be a record with three fields:
• <Data_item_name, LOCK, Locking_transaction> plus a queue for transactions that are waiting to
access the item.
• The system needs to maintain only these records for the items that are currently locked in a lock table, which
could be organized as a hash file on the item name. Items not in the lock table are considered to be unlocked.
The DBMS has a lock manager sub-system to keep track of and control access to locks.
If the simple binary locking scheme described here is used, every transaction must obey the following rules:
1. A transaction T must issue the operation lock_item(X) before any read_item(X) or write_item(X) operations
are performed in T.
2. A transaction T must issue the operation unlock_item(X) after all read_item(X) and write_item(X) operations
are completed in T.
3. A transaction T will not issue a lock_item(X) operation if it already holds the lock on item X.1
A transaction T will not issue an unlock_item(X) operation unless it already holds the lock on item X.
This is because read operations on the same item by different transactions are not conflicting .However, if a
transaction is to write an item X, it must have exclusive access to X.
• For this purpose, a different type of lock called a multiple-mode lock is used. In this scheme—
called shared/exclusive or read/write locks—there are three locking
operations: read_lock(X), write_lock(X), and unlock(X).
• A lock associated with an item X, LOCK(X), now has three possible states: read-locked, write-locked,
or unlocked.
• A read-locked item is also called share-locked because other transactions are allowed to read the item,
whereas a write-locked item is called exclusive-locked because a single transaction exclusively holds the
lock on the item.
• One method for implementing the preceding operations on a read/write lock is to keep track of the number
of transactions that hold a shared (read) lock on an item in the lock table.
• Again, to save space, the system needs to maintain lock records only for locked items in the lock table. The
value (state) of LOCK is either read-locked or write-locked, suitably coded (if we assume no records are kept
in the lock table for unlocked items).
• If LOCK(X)=read-locked, the value of locking transaction(s) is a list of one or more transactions that hold
the shared (read) lock on X. The three operations read_lock(X), write_lock(X), and unlock(X).
When we use the shared/exclusive locking scheme, the system must enforce the following rules:
1. A transaction T must issue the operation read_lock(X) or write_lock(X) before any read_item(X)
operation is performed in T.
2. A transaction T must issue the operation write_lock(X) before any write_item(X) operation is performed
in T.
• Such a transaction can be divided into two phases: an expanding or growing (first) phase, during which
new locks on items can be acquired but none can be released; and a shrinking (second) phase, during which
existing locks can be released but no new locks can be acquired.
• If lock conversion is allowed, then upgrading of locks (from read-locked to write-locked) must be done during
the expanding phase, and downgrading of locks (from write-locked to read-locked) must be done in the
shrinking phase. Hence, a read_lock(X) operation that downgrades an already held write lock on X can appear
only in the shrinking phase.
• Transactions T1 and T2 in Figure 22.3(a) do not follow the two-phase locking protocol because
the write_lock(X) operation follows the unlock(Y) operation in T1, and similarly the write_lock(Y) operation
follows the unlock(X) operation in T2.
• If we enforce two-phase locking, the transactions can be rewritten as T1 and T2 , as shown in Figure 22.4.
Now, the schedule shown in Figure 22.3(c) is not permitted for T1 and T2 (with their modified order of locking
and unlocking operations) under the rules of locking described in Section 22.1.1 because T1 will issue
its write_lock(X) before it unlocks item Y; consequently, when T2 issues its read_lock(X), it is forced to wait
until T1 releases the lock by issuing an unlock (X) in the schedule.
• It can be proved that, if every transaction in a schedule follows the two-phase locking protocol, the schedule
is guaranteed to be serializable, obviating the need to test for serializability of schedules. The locking
protocol, by enforcing two-phase locking rules, also enforces serializability.
• Two-phase locking may limit the amount of concurrency that can occur in a schedule because a
transaction T may not be able to release an item X after it is through using it if T must lock an additional
item Y later; or conversely, T must lock the additional item Y before it needs it so that it can release X.
• Hence, X must remain locked by T until all items that the transaction needs to read or write have been
locked; only then can X be released by T.
• Meanwhile, another transaction seeking to access X may be forced to wait, even though T is done with X;
conversely, if Y is locked earlier than it is needed, another transaction seeking to access Y is forced to
wait even though T is not using Y yet.
• This is the price for guaranteeing serializability of all schedules without having to check the schedules
themselves.
• Although the two-phase locking protocol guarantees serializability (that is, every schedule that is
permitted is serializable), it does not permit all possible serializable schedules (that is, some serializable
schedules will be prohibited by the protocol).
1. A variation known as conservative 2PL (or static 2PL) requires a transaction to lock all the items it
accesses before the transaction begins execution, by predeclaring its read-set and write-set. Recall from Section
21.1.2 that the read-set of a transaction is the set of all items that the transaction reads, and the write-set is the
set of all items that it writes.
• If any of the predeclared items needed cannot be locked, the transaction does not lock any item; instead,
it waits until all the items are available for locking.
• Conservative 2PL is a deadlock-free protocol, we can discuss the deadlock problem. However, it is
difficult to use in practice because of the need to predeclare the read-set and write-set, which is not
possible in many situations.
2. In practice, the most popular variation of 2PL is strict 2PL, which guarantees strict .
In this variation, a transaction T does not release any of its exclusive (write) locks until after it commits or aborts.
Hence, no other transaction can read or write an item that is written by T unless T has committed, leading to a
strict schedule for recoverability. Strict 2PL is not deadlock-free.
3.A more restrictive variation of strict 2PL is rigorous 2PL, which also guarantees strict schedules. In this
variation, a transaction T does not release any of its locks (exclusive or shared) until after it commits or aborts,
and so it is easier to implement than strict 2PL.
Notice the difference between conservative and rigorous 2PL: the former must lock all its items before it starts,
so once the transaction starts it is in its shrinking phase; the latter does not unlock any of its items until after it
terminates (by committing or aborting), so the transaction is in its expanding phase until it ends.
In many cases, the concurrency control subsystem itself is responsible for generating
the read_lock and write_lock requests.
• For example, suppose the system is to enforce the strict 2PL protocol. Then, whenever
transaction T issues a read_item(X), the system calls the read_lock(X) operation on behalf of T.
• If the state of LOCK(X) is write_locked by some other transaction T , the system places T in the waiting
queue for item X; otherwise, it grants the read_lock(X) request and permits the read_item(X) operation
of T to execute.
• On the other hand, if transaction T issues a write_item(X), the system calls the write_lock(X) operation
on behalf of T. If the state of LOCK(X) is write_locked or read_locked by some other transaction T , the
system places T in the waiting queue for item X;
• if the state of LOCK(X) is read_locked and T itself is the only transaction holding the read lock on X, the
system upgrades the lock to write_locked and permits the write_item(X) operation by T.
• Finally, if the state of LOCK(X) is unlocked, the system grants the write_lock(X) request and permits
the write_item(X) operation to execute. After each action, the system must update its lock table
appropriately.
The use of locks can cause two additional problems: deadlock and starvation.
• A simple example is shown in Figure 22.5(a), where the two transactions T1 and T2 are deadlocked in a partial
schedule; T1 is in the waiting queue for X, which is locked by T2 , while T2 is in the waiting queue for Y, which
is locked by T1 . Meanwhile, neither T1 nor T2 nor any other transaction can access items X and Y.
Deadlock Prevention
• A Transaction Locks all the data items it refers to before it begins execution.
• This way of locking prevents the deadlock since a transaction never waits for a data item.
• The conservative two phase locking uses this type of approach.
Deadlock detection and resolution
In this approach , deadlocks are allowed to happen. The scheduler maintains a wait for graph for detecting cycle.
If a cycle exists, then one of the transactions involved in the cycle is selected as victim and roll back.
A wait for graph is created using lock table. As soon as a transaction is blocked, it is added to the graph. When a
chain like : Ti waits for Tj waits for Tk waits for Ti or Tj occurs , then this creates a cycle.
Deadlock avoidance
There are many variations of two-phase locking algorithm. Some avoid deadlock by not letting cycle to complete.
That is as soon as the algorithm discovers that blocking a transaction is likely to create a cycle , it rolls back the
transaction.
• Starvation occurs when a particular transaction consistently waits or restarted and never gets a chance to
proceed further.
• In a deadlock resolution it is possible that the same transaction may consistently be selected as victim and
roll back.
• In wound wait scheme a younger transaction may always be wounded(aborted)by a long running older
transaction which creates starvation.
• Timestamp based algorithm uses timestamp to serialize the execution of concurrent transactions.
o If true, a younger transaction has already read or written to the data item, so:
o If false:
o If true, a younger transaction has already written to the data item, so:
o If false:
This process ensures that transactions are executed in a manner that maintains consistency based on their
timestamps.
A variation of basic TO called strict TO ensures that the schedules are both strict (for easy recoverability) and
(conflict) serializable.
In this variation, a transaction T issues a read_item(X) or write_item(X) such that TS(T) > write_TS(X) has its
read or write operation delayed until the transaction T′ that wrote the value of X (hence TS(T′) = write_TS(X))
has committed or aborted.
To implement this algorithm, it is necessary to simulate the locking of an item X that has been written by
transaction T′ until T′ is either committed or aborted. This algorithm does not cause deadlock, since T waits for T′
only if TS(T) > TS(T′).
A modification of the basic TO algorithm, known as Thomas’s write rule, does not enforce conflict serializability,
but it rejects fewer write operations by modifying the checks for the write_item(X) operation as follows:
1. If read_TS(X) > TS(T), then abort and roll back T and reject the operation.
2. If write_TS(X) > TS(T), then do not execute the write operation but continue processing. This is because some
transaction with timestamp greater than TS(T)—and hence after T in the timestamp ordering—has already written
the value of X. Thus, we must ignore the write_item(X) operation of T because it is already outdated and obsolete.
Notice that any conflict arising from this situation would be detected by case (1).
3. If neither the condition in part (1) nor the condition in part (2) occurs, then execute the write_item(X) operation
of T and set write_TS(X) to TS(T).
• This approach maintains a number of versions of a data item and allocates the right version to a read operation
of a transaction.
• Thus unlike other mechanisms a read operation in this mechanism is never rejected.
• Side effect: Significantly more storage (RAM & Disk)is required to maintain multiple versions. To check
unlimited growth of versions, a garbage collection is run when some criteria is satisfied.
In this method, several versions X1, X2, … , Xk of each data item X are maintained. For each version, the value
of version Xi and the following two timestamps associated with version Xi are kept:
1. read_TS(Xi). The read timestamp of Xi is the largest of all the timestamps of transactions that have
successfully read version Xi.
2. write_TS(Xi). The write timestamp of Xi is the timestamp of the transaction that wrote the value of version Xi.
• Whenever a transaction T is allowed to execute a write_item(X) operation, a new version Xk+1 of item
X is created, with both the write_TS(Xk+1) and the read_TS(Xk+1) set to TS(T).
• Correspondingly, when a transaction T is allowed to read the value of version Xi , the value of
read_TS(Xi ) is set to the larger of the current read_TS(Xi ) and TS(T).
To ensure serializability, the following rules are used
Rules for Write Operation (write_item(X)):
1. Condition Checking:
When transaction T issues a write_item(X) operation:
Determine if there exists a version Xi of item X such that:
write_TS(Xi) (timestamp of when Xi was last written) is the highest among all versions of X.
write_TS(Xi) <= TS(T) (timestamp of transaction T).
read_TS(Xi) > TS(T) (timestamp of when Xi was last read is greater than TS(T)).
2. Abort Condition:
If such a version Xi exists (meeting all conditions above), then:
Abort and roll back transaction T.
This condition ensures that transaction T does not write to an item X that has been read by another transaction
after T started, which could violate serializability.
3. Create New Version:
If no such version Xi exists (i.e., all existing versions of X either have read_TS(Xi) <= TS(T) or there are no
versions of X at all):
Create a new version Xj of X with:read_TS(Xj) = write_TS(Xj) = TS(T).
This ensures that transaction T writes to a version of X that is consistent with its own timestamp, maintaining
MVCC principles.
Rules for Read Operation (read_item(X)):
1. Version Selection:
When transaction T issues a read_item(X) operation:Identify the version Xi of item X that has the highest
write_TS(Xi) among all versions of X such that: write_TS(Xi) <= TS(T).
2. Read Operation Execution:
Return the value of Xi to transaction T.Update read_TS(Xi) to the larger of TS(T) and its current
read_TS(Xi).This update ensures that the transaction T reads a consistent version of X that is at least as recent
as T's own timestamp, preventing it from reading outdated data.
concurrent access to data needs to be managed efficiently while ensuring transaction isolation and consistency.
Here’s an explanation of how MV2PL with Certify Locks works:
Overview of MV2PL with Certify Locks:
1. Multiversion Concurrency Control (MVCC):
o Versioning: Each data item in the database has multiple versions (snapshots) corresponding to
different points in time when it was modified.
o Read Consistency: Transactions can read from a version of a data item that is consistent with their
start time (TS(T)), ensuring they do not see intermediate changes made by other transactions.
2. Two-Phase Locking (2PL):
o Lock Acquisition: Transactions acquire locks (read or write) on data items before accessing them.
o Strict Two-Phase Locking: Transactions acquire all necessary locks before they start executing
(growing phase) and release all locks when they finish (shrinking phase).
3. Certify Locks:
o Purpose: Used to validate the correctness of read operations in a multiversion environment.
o Ensuring Read Consistency: Certify locks are acquired during the read phase to ensure that the
version of the data being read by a transaction is consistent with the transaction's timestamp (TS(T)).
Steps in MV2PL with Certify Locks:
1. Transaction Initialization:
o When a transaction T starts, it is assigned a timestamp TS(T).
2. Read Phase:
o Read_item(X):
▪ Transaction T requests to read data item X.
▪ The system identifies the version Xi of X that satisfies the following conditions:
▪ write_TS(Xi) <= TS(T): The write timestamp of Xi is less than or equal to
TS(T), ensuring T is allowed to read this version.
▪ Transaction T acquires a certify lock on version Xi to ensure that no conflicting writes
occur after TS(T) began.
▪ T reads the value from Xi and continues its processing.
3. Write Phase:
o Write_item(X):
▪ Transaction T requests to write to data item X.
▪ T first acquires appropriate write locks (exclusive locks) on X to prevent other
transactions from concurrently reading or writing to X.
▪ T then creates a new version Xj of X with write_TS(Xj) = TS(T) and commits the
updated value.
4. Commit Phase:
o After completing its operations, transaction T commits its changes to the database.
o Release all locks acquired during the transaction (both certify locks and write locks).
CHAPTER 2
NOSQL DATABASES AND BIG DATA STORAGE
SYSTEMS
2.Introduction to NOSQL Systems
NoSQL is a type of database management system (DBMS) that is designed to handle and store large volumes of
unstructured and semi-structured data. Unlike traditional relational databases that use tables with pre-defined
schemas to store data, NoSQL databases use flexible data models that can adapt to changes in data structures and
are capable of scaling horizontally to handle growing amounts of data.
NoSQL databases are often used in applications where there is a high volume of data that needs to be processed
and analysed in real-time, such as social media analytics, e-commerce, and gaming. They can also be used for
other applications, such as content management systems, document management, and customer relationship
management.
However, NoSQL databases may not be suitable for all applications, as they may not provide the same level of
data consistency and transactional guarantees as traditional relational databases. It is important to carefully
evaluate the specific needs of an application when choosing a database management system.
2. Horizontal Scalability
Distributed Architecture: Many NoSQL systems are designed to scale out horizontally, meaning they can distribute
data across multiple servers or nodes. This helps handle increased load by adding more servers rather than
upgrading existing hardware.
3. Variety of Data Models
Document Stores: Data is stored in documents (e.g., JSON or BSON), and each document can have a different
structure. Examples include MongoDB and CouchDB.
Key-Value Stores: Data is stored as key-value pairs, where each key is unique and maps to a value. Examples
include Redis and DynamoDB.
Column-Family Stores: Data is stored in columns rather than rows, which allows for efficient querying and storage
of sparse data. Examples include Cassandra and HBase.
Graph Databases: Data is stored as nodes, edges, and properties, which is ideal for handling complex relationships.
Examples include Neo4j and ArangoDB.
4. Eventual Consistency
Consistency Models: Many NoSQL systems adopt an "eventual consistency" model rather than strict ACID
(Atomicity, Consistency, Isolation, Durability) transactions. This means that the system guarantees that, given
enough time, all replicas of the data will converge to the same value, but consistency is not guaranteed in real-
time.
5. High Availability
Fault Tolerance: NoSQL databases are often designed with high availability in mind. They typically provide
mechanisms for replication and automatic failover to ensure that the system remains operational even if some
components fail.
6. Partitioning and Sharding
Data Distribution: To handle large volumes of data and high throughput, NoSQL systems often use partitioning
(or sharding) techniques to distribute data across multiple servers. This improves performance and scalability.
7. Optimized for Specific Use Cases
Tailored Designs: Different NoSQL databases are optimized for specific use cases, such as high-speed read/write
operations, complex queries, or handling large volumes of data. This specialization can provide performance
benefits for particular applications.
8. Flexible Querying
Varied Query Languages: NoSQL databases often use their own query languages or APIs tailored to their data
models. For example, MongoDB uses the MongoDB Query Language (MQL), while Cassandra uses CQL
(Cassandra Query Language).
9. Large Volume Handling
Big Data Support: Many NoSQL systems are designed to handle large volumes of data, making them suitable for
big data applications where traditional relational databases might struggle.
10. Performance Optimization
High Throughput: Many NoSQL systems are optimized for high-throughput operations and low-latency
responses, making them suitable for real-time applications.
4. Graph-based NOSQL systems: Data is represented as graphs, and related nodes can be found by traversing
the edges using path expressions.
Additional categories can be added as follows to include some systems that are not easily categorized into the
above four categories, as well as some other types of systems that have been available even before the term
NOSQL became widely used.
5. Hybrid NOSQL systems: These systems have characteristics from two or more of the above four categories.
6. Object databases & XML databases
Even keyword-based search engines store large amounts of data with fast search access, so the stored data can be
considered as large NOSQL big data stores.
C -: Consistency
In a distributed system, consistency means that all nodes or replicas in the system have the same data at the
same time. When a client reads data, it receives the most recent write or an error. In other words, there is no
divergence in the data observed by different nodes. Suppose we are working on a distributed system having client
node and two database nodes say d1 and d2 . Now let’s say we have generated an update request to d1 and at the
same time we have generated a read request at d2 . So here due to replication of data between d1 and d2 we are
able to access latest data. This is called consistency.
A -: Availability
Availability refers to the system’s ability to respond to client requests, even in the presence of node failures or
network partitions. An available system ensures that every request eventually receives a response, though it
doesn’t guarantee that the response contains the most recent data. In short availability ensures that the system is
always available.
P -: Partition Tolerance
Partition tolerance deals with the system’s ability to continue functioning even when network partitions occur.
Network partitions can cause nodes to lose contact with one another, making communication and synchronization
difficult.
CAP theorem says that we cannot have all three properties i.e. C A P at same time we can have at most two at
once . So let’s understand this .
All possible combinations of consistency , availability and partition tolerance are
1.CA (consistency + availability)
Here complete system is consistent and is always available . If we break the connection between systems in
order to make it partition tolerant we will lose consistency of system.
The CAP theorem is important because it forces developers to think carefully about the trade-offs they’re making
when building a distributed system. When designing a distributed system, you have to decide which two properties
are most important for your use case.
Example:
2. Read: Documents can be read from the database. The API or query language allows developers to query for
documents using their unique identifiers or field values. Indexes can be added to the database in order to increase
read performance.
• db.collection.find()
Example:
3. Update:Update operations modify existing documents in a collection. MongoDB provides the following
methods to update documents of a collection.
• db.collection.updateOne()
• db.collection.updateMany()
• db.collection.replaceOne()
4. Delete Operations:Delete operations remove documents from a collection. MongoDB provides the
following methods to delete documents of a collection.
• db.collection.deleteOne()
• db.collection.deleteMany()
for example, structured data rows (tuples) similar to relational data, or semi structured data using JSON or
some other self-describing data format.
• Different key-value stores can thus store unstructured, semi structured, or structured data items.
Characteristics of Key-value stores:
• The main characteristic of key-value stores is the fact that every value (data item) must be associated with a
unique key, and that retrieving the value by supplying the key must be very fast.
• There are many systems that fall under the key-value store label, so rather than provide a lot of details on one
particular system.
• DynamoDB data model. The basic data model in DynamoDB uses the concepts of tables, items, and attributes.
• A table in DynamoDB does not have a schema; it holds a collection of self-describing items.
• Each item will consist of a number of (attribute, value) pairs, and attribute values can be single-valued or
multivalued. So basically, a table will hold a collection of items, and each item is a self-describing record (or
object).
• DynamoDB also allows the user to specify the items in JSON format, and the system will convert them to the
internal storage format of DynamoDB.
• When a table is created, it is required to specify a table name and a primary key; the primary key will be used
to rapidly locate the items in the table.
• The primary key attribute must exist in every item in the table. The primary key can be one of the following
two types:
• A single attribute: The DynamoDB system will use this attribute to build a hash index on the items in the
table. This is called a hash type primary key. The items are not ordered in storage on the value of the hash
attribute.
• A pair of attributes. This is called a hash and range type primary key. The primary key will be a pair of
attributes (A, B): attribute A will be used for hashing, and because there will be multiple items with the same
value of A, the B values will be used for ordering the records with the same A value. A table with this type of
key can have additional secondary indexes defined on its attributes. For example, if we want to store multiple
versions of some type of items in a table, we could use ItemID as hash and Date or Timestamp (when the
version was created) as range in a hash and range type primary key.
formats can also be specified if the application provides the conversion (also known as serialization) between
the user format and the storage format as a Serializer class.
• The Serializer class must be provided by the user and will include operations to convert the user format into
a string of bytes for storage as a value, and to convert back a string (array of bytes) retrieved via s.get(k) into
the user format. Voldemort has some built-in serializers for formats other than JSON.
• Consistent hashing for distributing (key, value) pairs-A variation of the data distribution algorithm known
as consistent hashing is used in Voldemort for data distribution among the nodes in the distributed cluster of
nodes.
• A hash function h(k) is applied to the key k of each (k, v) pair, and h(k) determines where the item will be
stored. The method assumes that h(k) is an integer value, usually in the range 0 to Hmax = 2n−1, where n is
chosen based on the desired range for the hash values.
• This method is best visualized by considering the range of all possible integer hash values 0 to Hmax to be
evenly distributed on a circle (or ring).
• The nodes in the distributed system are then also located on the same ring; usually each node will have several
locations on the ring.
• The positioning of the points on the ring that represent the nodes is done in a psuedorandom manner.
• An item (k, v) will be stored on the node whose position in the ring follows the position of h(k) on the ring
in a clockwise direction.
• In Figure 24.2(a), we assume there are three nodes in the distributed cluster labeled A, B, and C, where node
C has a bigger capacity than nodes A and B. In a typical system, there will be many more nodes. On the circle,
two instances each of A and B are placed, and three instances of C (because of its higher capacity), in a
pseudorandom manner to cover the circle. Figure 24.2(a) indicates which (k, v) items are placed in which
nodes based on the h(k) values.
• Consistency and versioning. Voldemort uses a method similar to the one developed for DynamoDB for
consistency in the presence of replicas.
• Basically, concurrent write operations are allowed by different processes so there could exist two or more
different values associated with the same key at different nodes when items are replicated.
• Consistency is achieved when the item is read by using a technique known as versioning and read repair.
Concurrent writes are allowed, but each write is associated with a vector clock value.
• When a read occurs, it is possible that different versions of the same value (associated with the same key) are
read from different nodes.
• If the system can reconcile to a single final value, it will pass that value to the read; otherwise, more than one
version can be passed back to the application, which will reconcile the various versions into one version based
on the application semantics and give this reconciled value back to the nodes.
3. Apache Cassandra. Cassandra is a NOSQL system that is not easily categorized into one category; it is
sometimes listed in the column-based NOSQL category (see Section 24.5) or in the key-value category. If offers
features from several NOSQL categories and is used by Facebook as well as many other customers.
o Timestamp-based Retrieval: You can use timestamps to access specific versions of data. For
example, you might query data as it existed at a specific point in time by using a timestamp
filter.
4. Use Cases:
o Audit Trails: Versioning is useful for maintaining audit trails where changes to data need to be
tracked over time.
o Historical Analysis: It enables historical analysis by allowing queries against past states of data.
o Data Recovery: If data is accidentally overwritten or deleted, previous versions can be retrieved
for recovery.
• Labels and properties: A node label can be declared/specified when a node is created.
o It is also possible to create nodes without any labels.
• Relationships and relationship types: “->” specifies the direction of the relationship.
o The relationship can be traversed in either direction.
• Paths: A path specifies a traversal of part of the graph. Typically it is used as part of a query to specify a
pattern, where the query will retrieve from the graph data that matches the pattern.
o A path is typically specified by a start node, followed by one or more relationships, leading to one
or more end nodes that satisfy the pattern.
• Indexing and node identifiers: The Neo4j system creates an internal unique system-defined identifier for
each node, when a node is created. To retrieve individual nodes using other properties of the nodes efficiently,
the user can create indexes for the collection of nodes that have a particular label.
2.9.1 The Cypher Query Language of Neo4j
Cypher is a declarative graph query language that is used by developers worldwide. Created by Neo4j, Cypher
provides expressive and efficient queries for property graphs. The property graph data model is increasingly
popular across a wide variety of application domains, with growing adoption in multiple products and projects.
Cypher is the most established and intuitive query language to learn for working with property graphs.Neo4j uses
Cypher, a high-level query language for interacting with graph databases. Here's a brief overview of its features:
1. Clauses: Cypher queries are made up of different clauses that can pass results from one to the next.
2. Main Clauses:
o CREATE: Adds nodes and relationships.
o MATCH: Finds nodes and relationships based on patterns.
o RETURN: Specifies which results to retrieve.
o WHERE: Filters results based on conditions.
o ORDER BY: Sorts results.
o LIMIT: Restricts the number of results.
o WITH: Separates parts of a query, often used with aggregation.
3. Examples:
o Query 1: Finds locations for department number 5 using MATCH and RETURN.
o Query 2: Gets projects and hours per week for the employee with Empid = 2.
o Query 3: Lists employees and hours per week for those working on project Pno = 2.
o Query 4: Retrieves employees and their projects, sorted by employee name.
o Query 5: Returns the first 10 results.
o Query 6: Shows employees working on more than two projects and counts those projects. Uses
WITH and WHERE.
o Query 7: Displays nodes and relationships as a graph.
o Query 8: Adds a new property to employee nodes.
■ Enterprise edition vs. community edition. Both editions support the Neo4j graph data model and storage system,
as well as the Cypher graph query language, and several other interfaces, including a high-performance native
API, language drivers for several popular programming languages, such as Java, Python, PHP, and the REST
(Representational State Transfer) API. In addition, both editions support ACID properties. The enterprise edition
supports additional features for enhancing performance, such as caching and clustering of data and locking.
■ Graph visualization interface. Neo4j has a graph visualization interface, so that a subset of the nodes and edges
in a database graph can be displayed as a graph. This tool can be used to visualize query results in a graph
representation.
■ Master-slave replication. Neo4j can be configured on a cluster of distributed system nodes (computers), where
one node is designated the master node. The data and indexes are fully replicated on each node in the cluster.
Various ways of synchronizing the data between master and slave nodes can be configured in the distributed
cluster.
■ Caching. A main memory cache can be configured to store the graph data for improved performance.