Advanced Database Management System
Week 7: Module 4: NoSQL
Faculty Name :
Mrs. Aditi Chhabria, Mrs. Rajashree Shedge, Mr. Tushar Ghorpade
Index - Module 4: NoSQL
Lecture 17: Introduction to NoSQL, NoSQL Business Drivers
Lecture 18: CAP Theorem, BASE Properties, NoSQL Business Drivers
Lecture 19: NoSQL Data Architecture Patterns: Key-Value Stores, Graph Stores, Column Family (Bigtable) Stores
Lecture 20: Document Stores, Variations of NoSQL Architectural Patterns
2 Module 3: NoSQL
Lecture 17
Introduction to NoSQL
History of Databases
Flat File System. Problem: no standard definition. Solution: Relational Database.
Relational Databases. Problem: could not handle big data. Solution: NoSQL Databases.
Definition
NoSQL database stands for “Not Only SQL” or “NOT SQL”
Traditional RDBMS uses SQL syntax and queries to analyze and get the
data for further insights.
NoSQL is a Database Management System that provides a mechanism for
storage and retrieval of massive amounts of unstructured data in a distributed
environment.
Database Management Systems: RDBMS (Relational), OLAP, NoSQL
Why NoSQL?
The concept of NoSQL databases became popular with Internet giants like
Google, Facebook, Amazon, etc. who deal with huge volumes of data. The
system response time becomes slow when you use RDBMS for massive
volumes of data.
To resolve this problem, we could "scale up" our systems by upgrading our
existing hardware. This process is expensive.
The alternative is to distribute the database load across multiple hosts
whenever the load increases. This method is known as "scaling out."
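As a rough illustration of scaling out, the sketch below spreads keys over several hosts by hashing each key. The host names and the helper functions `host_for` and `distribute` are hypothetical and only serve this example; real NoSQL systems usually use more elaborate placement schemes (for example consistent hashing) so that adding a host moves fewer keys.

```python
import hashlib

# Hypothetical host names, assumed for illustration only.
HOSTS = ["db-host-0", "db-host-1", "db-host-2"]


def host_for(key, hosts):
    """Pick a host by hashing the key (simple modulo placement)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return hosts[int(digest, 16) % len(hosts)]


def distribute(keys, hosts):
    """Group keys by the host that would store them."""
    placement = {host: [] for host in hosts}
    for key in keys:
        placement[host_for(key, hosts)].append(key)
    return placement


placement = distribute([f"user:{i}" for i in range(12)], HOSTS)
for host, stored in placement.items():
    print(host, stored)
```

Each host ends up holding only part of the data, so reads and writes for different keys can be served by different machines.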
Further Challenges with Traditional RDBMS
Not optimized for horizontal scaling
Data size has increased tremendously to the range of petabytes.
Schema-less data
Majority of data comes in a semi-structured or unstructured format
Cost
High licensing cost for data analysis
High Velocity of data ingestion
RDBMSs struggle with high-velocity ingestion because they are designed for steady data
retention rather than rapid growth
Performance
Database Management Systems: RDBMS (Relational), OLAP, NoSQL
RDBMS end of the spectrum: more functionality, less performance
NoSQL end of the spectrum: less functionality, more performance
Database Management Systems
RDBMS (Relational): structured data, stored in tables
OLAP: structured data, stored in cubes
NoSQL: structured or unstructured data, stored in collections
Brief History of NoSQL
1998- Carlo Strozzi uses the term NoSQL for his lightweight, open-source
relational database
2000- Graph database Neo4j is launched
2004- Google BigTable is launched
2005- CouchDB is launched
2007- The research paper on Amazon Dynamo is released
2008- Facebook open-sources the Cassandra project
2009- The term NoSQL was reintroduced
Features of NoSQL
1. Non-relational
NoSQL databases never follow the relational model
Never provide tables with flat fixed-column records
Work with self-contained aggregates or BLOBs
Do not require object-relational mapping or data normalization
No complex features like query languages, query planners, referential
integrity, joins, ACID
2. Schema-free
NoSQL databases are either schema-free or have relaxed schemas
Do not require any sort of definition of the schema of the data
Offers heterogeneous structures of data in the same domain
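A minimal sketch of what "heterogeneous structures in the same domain" can look like, using a plain Python list of dictionaries as a stand-in for a collection. The sample records and the helpers `find` and `fields_in_use` are made up for illustration and are not the API of any particular NoSQL product.

```python
# A "collection" modelled as a list of dicts; no schema is declared anywhere.
customers = [
    {"id": 1, "name": "Asha"},
    {"id": 2, "name": "Ravi", "email": "ravi@example.com"},
    {"id": 3, "name": "Meera", "loyalty": {"tier": "gold", "points": 1200}},
]


def find(collection, **criteria):
    """Return documents whose top-level fields match all criteria."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


def fields_in_use(collection):
    """Union of the field names that appear anywhere in the collection."""
    return sorted({field for doc in collection for field in doc})


print(find(customers, name="Ravi"))
print(fields_in_use(customers))
```

Each record carries only the fields it needs, and a query for a field that some records lack simply does not match those records.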
3. Simple API
Offers easy-to-use interfaces for storing and querying data
APIs allow low-level data manipulation & selection methods
Text-based protocols, mostly HTTP REST with JSON
Mostly no standards-based query language is used
Web-enabled databases running as internet-facing services
4. Distributed
Multiple NoSQL databases can be executed in a distributed fashion
Offers auto-scaling and fail-over capabilities
ACID guarantees are often sacrificed for scalability and throughput
Shared Nothing Architecture. This enables less coordination and higher
distribution.
Lecture 18
CAP Theorem, BASE Properties, NoSQL Business Drivers
What is CAP theorem?
The CAP theorem is also called Brewer's theorem. It states that it is impossible
for a distributed data store to offer more than two out of three guarantees:
1. Consistency
2. Availability
3. Partition Tolerance
Consistency: The data should remain consistent even after the execution of
an operation. This means once data is written, any future read request should
return that data. For example, after the order status is updated, all the clients
should be able to see the same data.
Availability:
The database should always be available and responsive. It should not have
any downtime.
Partition Tolerance:
Partition Tolerance means that the system should continue to function even if
the communication among the servers is not stable. For example, the servers
can be partitioned into multiple groups which may not communicate with each
other. Here, if part of the database is unavailable, other parts are always
unaffected.
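The trade-off between the three guarantees can be sketched with a toy two-replica store. The class `PartitionedStore` and its "CP" and "AP" modes are assumptions made for this illustration, not the behaviour of a specific database: during a partition the CP variant rejects writes (consistent but not available), while the AP variant accepts them locally (available but temporarily inconsistent).

```python
class PartitionedStore:
    """Toy store with two replicas, "a" and "b", and a switchable network link."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = {"a": None, "b": None}
        self.partitioned = False

    def write(self, replica, value):
        if not self.partitioned:
            # Healthy network: every replica receives the write.
            for name in self.replicas:
                self.replicas[name] = value
            return
        if self.mode == "CP":
            # Keep consistency by giving up availability.
            raise RuntimeError("write rejected: replicas cannot coordinate")
        # AP: stay available by accepting the write on one side only.
        self.replicas[replica] = value

    def read(self, replica):
        return self.replicas[replica]


ap = PartitionedStore("AP")
ap.write("a", "order placed")
ap.partitioned = True
ap.write("a", "order shipped")
print(ap.read("a"), "|", ap.read("b"))
```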
CAP Theorem
NoSQL databases are meant for distributed storage
Duplicate copies of the same data are maintained on multiple machines. This
increases availability but decreases consistency.
If duplicate copies of the same data are not maintained, consistency is better, but
availability decreases.
If data on one machine changes, the update has to propagate to the other machines. Until it does, the
system is inconsistent, but it eventually becomes consistent.
Eventual Consistency
The term "eventual consistency" arises from keeping copies of data on
multiple machines to get high availability and scalability. Changes
made to any data item on one machine have to be propagated to the other
replicas.
Data replication may not be instantaneous: some copies are updated
immediately, while others are updated in due course of time.
These copies may be mutually inconsistent for a while, but in due course of time they become
consistent. Hence the name eventual consistency.
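The propagation described above can be simulated in a few lines. This is only a sketch under simplifying assumptions: the class `EventuallyConsistentStore` is hypothetical, every write lands on the first replica, and pending updates are delivered one at a time from a queue.

```python
from collections import deque


class EventuallyConsistentStore:
    def __init__(self, replica_count=3):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = deque()  # (replica_index, key, value) updates in flight

    def write(self, key, value):
        # The first replica is updated immediately...
        self.replicas[0][key] = value
        # ...while the other replicas receive the update later.
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, key, replica_index):
        return self.replicas[replica_index].get(key)

    def propagate_one(self):
        """Deliver a single pending update; return False when none remain."""
        if not self.pending:
            return False
        index, key, value = self.pending.popleft()
        self.replicas[index][key] = value
        return True

    def is_consistent(self):
        return all(replica == self.replicas[0] for replica in self.replicas)


store = EventuallyConsistentStore()
store.write("order:1", "shipped")
print(store.read("order:1", 0), store.read("order:1", 2), store.is_consistent())
while store.propagate_one():
    pass
print(store.read("order:1", 2), store.is_consistent())
```

Right after the write, a read from the last replica still misses the new value; once all pending updates have been delivered, every replica agrees.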
CAP Theorem
Pick 2:
Availability: each client can always read and write
Consistency: all clients always have the same view of the data
Partition Tolerance: the system works well despite physical network partitions
BASE – in NoSQL Systems
BASE: Basically Available, Soft state, Eventual consistency
Basically Available means the database is available all the time, in the sense of availability in the CAP theorem
Soft state means the system state may change over time, even without input
Eventual consistency means that the system will become consistent over time
NoSQL business drivers
Volume
Velocity
Variability
Agility
Volume:
There are two ways to look at data processing to improve performance:
If the key factor is only speed, a faster processor could be used.
If the processing involves complex computations, a GPU could be used along with the CPU.
But the volume of data is then limited to the on-board GPU memory.
• The main reason for organizations to look at an alternative to their current RDBMSs is the need to query big data.
• The need for horizontal scaling made organizations move from serial to distributed parallel processing, where big data is fragmented and processed using clusters of commodity machines.
• This is made possible by technologies such as Apache Hadoop, MapR, HBase, etc.
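The idea of fragmenting data and processing the fragments independently can be sketched on a single machine. This is not Hadoop code: the helpers `fragment`, `count_words` and `parallel_word_count` are hypothetical, and worker threads merely stand in for the commodity machines of a cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def fragment(records, parts):
    """Split records into roughly equal fragments, one per worker."""
    return [records[i::parts] for i in range(parts)]


def count_words(lines):
    """Work done independently on one fragment."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(records, workers=3):
    fragments = fragment(records, workers)
    # Each fragment is processed by its own worker.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, fragments))
    # The partial results are merged into one answer.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


logs = ["nosql scales out", "sql scales up", "nosql is distributed"]
print(parallel_word_count(logs))
```

The two phases (independent work per fragment, then a merge of partial results) are the part that carries over to a real cluster; the threads here only keep the example self-contained.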
Velocity
Many single-processor RDBMSs are unable to keep up with the demands of real-time inserts and online queries made by public-facing websites.
RDBMSs frequently index many columns of every new row, a process that decreases system performance.
When single-processor RDBMSs are used as a back end to a web storefront, random bursts in web traffic slow down response for everyone, and tuning these systems can be costly when both high read and write throughput are desired.
Variability
• Companies that want to capture and
report on exception data struggle when
attempting to use rigid database schema
structures imposed by RDBMSs.
• For example, if a business unit wants
to capture a few custom fields for a
particular customer, all customer rows
within the database need to store this
information even though it doesn’t apply.
• Adding new columns to an RDBMS requires ALTER TABLE commands to be run,
and in some systems the database has to be taken offline while they run.
When a database is large, this process
can impact system availability, costing
time and money.
Agility
The most complex part of building
applications using RDBMSs is the
process of putting data into and getting
data out of the database.
If your data has nested and repeated
subgroups of data structures, you need
to include an object-relational mapping
layer.
The responsibility of this layer is to
generate the correct combination of
INSERT, UPDATE, DELETE, and
SELECT SQL statements to move
object data to and from the RDBMS
persistence layer.
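To make the mapping work concrete, the sketch below stores one nested order twice: once through hand-written SQL against an in-memory SQLite database, and once as a single JSON document in a dictionary. The table names, the sample order and the helper functions are assumptions for illustration; a real object-relational mapping layer such as Hibernate generates comparable statements automatically.

```python
import json
import sqlite3

# Made-up sample aggregate with a nested, repeated subgroup ("items").
order = {
    "order_number": "1234",
    "customer": "Arun",
    "items": [
        {"game": "Pokemon Red", "quantity": 1},
        {"game": "Super Smash Bros.", "quantity": 1},
    ],
}


def save_relational(conn, order):
    """One parent INSERT plus one INSERT per nested item."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders "
                 "(order_number TEXT PRIMARY KEY, customer TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS order_items "
                 "(order_number TEXT, game TEXT, quantity INTEGER)")
    conn.execute("INSERT INTO orders VALUES (?, ?)",
                 (order["order_number"], order["customer"]))
    conn.executemany(
        "INSERT INTO order_items VALUES (?, ?, ?)",
        [(order["order_number"], item["game"], item["quantity"])
         for item in order["items"]],
    )


def load_relational(conn, order_number):
    """Reassemble the nested object from two SELECT statements."""
    number, customer = conn.execute(
        "SELECT order_number, customer FROM orders WHERE order_number = ?",
        (order_number,),
    ).fetchone()
    rows = conn.execute(
        "SELECT game, quantity FROM order_items "
        "WHERE order_number = ? ORDER BY rowid",
        (order_number,),
    )
    items = [{"game": game, "quantity": quantity} for game, quantity in rows]
    return {"order_number": number, "customer": customer, "items": items}


def save_document(store, order):
    """Document style: the whole aggregate is stored under one key."""
    store[order["order_number"]] = json.dumps(order)


def load_document(store, order_number):
    return json.loads(store[order_number])


conn = sqlite3.connect(":memory:")
save_relational(conn, order)
documents = {}
save_document(documents, order)
print(load_relational(conn, "1234") == load_document(documents, "1234"))
```

Both paths return the same object, but the relational path needs two tables and several statements that must be kept in step with the object's shape.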
This process isn’t simple and is associated
with the largest barrier to rapid change
when developing new or modifying
existing applications.
Generally, object-relational mapping
requires experienced software developers
who are familiar with object-relational
frameworks such as Hibernate for Java (or
NHibernate for .NET systems).
Even with experienced staff, small
change requests can cause slowdowns in
development and testing schedules.
Lecture 19
NoSQL Data Architecture Patterns: Key-Value Stores, Column Family Stores, Document Stores
Types of NoSQL Databases
Relational databases generally strive toward normalization: making sure
every piece of data is stored only once.
Types of NoSQL databases : Column-Oriented Database
Traditional relational databases are row-oriented, with each row having a row-id
and each field within the row stored together in a table.
Every time you look something up in a row-oriented database without a suitable index, every row is
scanned, regardless of which columns you require. Let’s say you only want a list
of birthdays in September. The database will scan the table from top to bottom
and left to right.
Column databases store each column separately, allowing for quicker scans
when only a small number of columns are involved
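The September-birthday lookup can be compared across both layouts with a small in-memory model. The sample people and the two lookup functions are made up for illustration; the count of values read is a simplified stand-in for the amount of data a real engine would have to scan.

```python
# Row-oriented layout: each record keeps all of its fields together.
# Field order: (row_id, name, birthday, hobby)
rows = [
    (1, "Asha", "1990-09-14", "chess"),
    (2, "Ravi", "1988-03-02", "cricket"),
    (3, "Meera", "1995-09-30", "painting"),
]

# Column-oriented layout: each column is stored as its own sequence.
columns = {
    "row_id": [row[0] for row in rows],
    "name": [row[1] for row in rows],
    "birthday": [row[2] for row in rows],
    "hobby": [row[3] for row in rows],
}


def september_birthdays_row_store(rows):
    """Reads every field of every row, although only one field is needed."""
    values_read = 0
    hits = []
    for row in rows:
        values_read += len(row)
        if row[2][5:7] == "09":
            hits.append(row[2])
    return hits, values_read


def september_birthdays_column_store(columns):
    """Reads only the birthday column."""
    birthdays = columns["birthday"]
    hits = [birthday for birthday in birthdays if birthday[5:7] == "09"]
    return hits, len(birthdays)


print(september_birthdays_row_store(rows))
print(september_birthdays_column_store(columns))
```

Both functions find the same birthdays, but the column layout touches one value per record instead of all four.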
When should you use a row-oriented database and when should you use a
column-oriented database?
In a column-oriented database it’s easy to add another column
because none of the existing columns are affected by it. But adding an
entire record requires adapting all tables. This makes the row-oriented
database preferable over the column-oriented database for online
transaction processing (OLTP) because this implies adding or changing
records constantly.
Column Family Stores:
Apache HBase
Facebook’s Cassandra
Hypertable
Google Bigtable
Types of NoSQL databases : Key-Value Stores
Key-value stores are the least complex of the NoSQL databases. They are,
as the name suggests, a collection of key-value pairs.
This simplicity makes them the most scalable of the NoSQL database types,
capable of storing huge amounts of data.
The value in a key-value store can be anything: a string, a number, but also
an entire new set of key-value pairs encapsulated in an object. The figure
shows a slightly more complex nested key-value structure.
Examples:
Redis
Voldemort
Riak
Amazon’s Dynamo
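A key-value store with simple and nested values can be sketched with a thin wrapper around a dictionary. The class `KeyValueStore` and the sample keys are hypothetical; products such as Redis or Riak expose their own client APIs, which are not reproduced here.

```python
class KeyValueStore:
    """In-memory stand-in for a key-value store: access is by key only."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()
store.put("visits", 42)                  # a number
store.put("greeting", "hello")           # a string
store.put("user:1", {                    # a nested set of key-value pairs
    "name": "Asha",
    "address": {"city": "Mumbai", "pin": "400001"},
})

print(store.get("user:1")["address"]["city"])
```

The store itself never inspects the value; it only maps a key to whatever was put under it, which is what keeps this model simple to distribute.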
Lecture 20
Document Stores, Graph Stores
Types of NoSQL databases : Document Stores
Document stores are one step up in complexity from key-value stores.
Document stores appear the most natural among the NoSQL database
types because they’re designed to store everyday documents as is, and
they allow for complex querying and calculations on this often already
aggregated form of data.
The way things are stored in a relational database makes sense from a
normalization point of view: everything should be stored only once and
connected via foreign keys. Document stores care little about
normalization as long as the data is in a structure that makes sense.
Newspapers or magazines, for example, contain articles. To store these in a
relational database, you need to chop them up first: the article text goes in
one table, the author and all the information about the author in another,
and comments on the article when published on a website go in yet another.
Examples of document stores are MongoDB and CouchDB.
game::1
{
  "name": "Pokemon Red",
  "price": "29.99"
}
game::2
{
  "name": "Super Smash Bros.",
  "price": "49.99"
}
person::agupta
{
  "first_name": "Arun",
  "last_name": "Gupta",
  "email": "arun@test.com"
}
transaction::1
{
  "order_number": "1234",
  "date": "07/08/2016",
  "person_id": "person::nraboy",
  "game_id": "game::1",
  "quantity": "1"
}
transaction::2
{
  "order_number": "1234",
  "date": "07/08/2016",
  "person_id": "person::nraboy",
  "game_id": "game::2",
  "quantity": "1"
}
Embedded
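Assuming the "Embedded" label contrasts documents that reference each other by key with documents that nest the related data directly, the difference can be sketched as follows. The dictionary `documents` reuses the game and transaction keys from the slides (with some fields omitted for brevity), and the helper `embed` is hypothetical.

```python
# Referenced style: a transaction points at its game through "game_id".
documents = {
    "game::1": {"name": "Pokemon Red", "price": "29.99"},
    "game::2": {"name": "Super Smash Bros.", "price": "49.99"},
    "transaction::1": {"order_number": "1234", "game_id": "game::1", "quantity": "1"},
    "transaction::2": {"order_number": "1234", "game_id": "game::2", "quantity": "1"},
}


def embed(documents, transaction_key):
    """Return a copy of the transaction with the referenced game nested inside."""
    transaction = dict(documents[transaction_key])
    game_key = transaction.pop("game_id")
    transaction["game"] = dict(documents[game_key])
    return transaction


print(embed(documents, "transaction::2"))
```

The embedded form can be read with a single lookup, at the cost of duplicating the game data in every transaction that mentions it.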
Types of NoSQL databases : Graph Databases
The last big NoSQL database type is the most complex one, geared toward
storing relations between entities in an efficient manner.
When the data is highly interconnected, such as for social networks,
scientific paper citations, or capital asset clusters, graph databases are the
answer.
Graph or network data has two main components:
Node: The entities themselves. In a social network this could be people.
Edge: The relationship between two entities. This relationship is
represented by a line and has its own properties. An edge can have a
direction, for example, if the arrow indicates who is whose boss.
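Nodes and directed edges with their own properties can be modelled in a few lines. The `Graph` and `Edge` classes, the sample people and the `BOSS_OF` label are made up for illustration; a graph database such as Neo4j has its own data model and query language, which this sketch does not attempt to mirror.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    source: str
    target: str
    label: str
    properties: dict = field(default_factory=dict)


class Graph:
    def __init__(self):
        self.nodes = {}  # node id -> properties of the entity
        self.edges = []  # directed edges between node ids

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, source, target, label, **properties):
        self.edges.append(Edge(source, target, label, properties))

    def outgoing(self, node_id, label):
        """Targets reachable from node_id over edges with the given label."""
        return [edge.target for edge in self.edges
                if edge.source == node_id and edge.label == label]


graph = Graph()
graph.add_node("asha", role="manager")
graph.add_node("ravi", role="engineer")
# The edge is directed (asha is the boss of ravi) and has its own property.
graph.add_edge("asha", "ravi", "BOSS_OF", since=2020)

print(graph.outgoing("asha", "BOSS_OF"))
print(graph.outgoing("ravi", "BOSS_OF"))
```

Because the edge has a direction, asking who Asha is the boss of and who Ravi is the boss of gives different answers.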
Graphs can become incredibly complex given enough relation and entity
types. The figure already shows that complexity with only a limited number of
entities. Graph databases like Neo4j also claim to uphold ACID, whereas
document stores and key-value stores generally adhere to BASE.
Thank You