Relational databases
• Benefits of relational databases:
🡺 Designed for OLTP
🡺 ACID properties
🡺 Strong consistency, concurrency, and recovery
🡺 Solid mathematical background (relational algebra)
🡺 Standard query language (SQL)
🡺 Lots of tools to use with them, e.g. reporting services, entity frameworks, ...
NoSQL: why, what and when?
But...
❑ Relational databases were not built for distributed applications.
Because...
❑ Joins are expensive
❑ Hard to scale horizontally (adding more machines)
❑ Impedance (object-relational) mismatch occurs
❑ Expensive (product cost, hardware, maintenance)
And...
They are also weak in:
❑ Speed (performance)
❑ High availability
❑ Partition tolerance
Why NoSQL now? Driving trends
What is NoSQL?
❑ A NoSQL database provides a mechanism for storage and retrieval of data that employs less constrained models than traditional relational databases
❑ NoSQL systems are also referred to as "Not only SQL" to emphasize that they do in fact allow SQL-like query languages to be used
Motivations of NoSQL databases
o Simplicity of design
o Simpler "horizontal" scaling to clusters of machines (which is a problem for relational databases)
o Finer control over availability: servers can be added or removed without application downtime
o Limiting the object-relational impedance mismatch
Characteristics of NoSQL databases
NoSQL avoids:
▶ Overhead of ACID transactions
▶ Complexity of SQL queries
▶ Burden of up-front schema design
▶ DBA presence
▶ Transactions (these should be handled at the application layer)
Provides:
▶ Easy and frequent changes to the DB
▶ Fast development
▶ Large data volumes (e.g. Google)
▶ Schema-less data
What do we need?
• We need a distributed database system with the following features:
– Fault tolerance
– High availability
– Consistency
– Scalability
Which is impossible, according to the CAP theorem!
CAP Theorem
■ Three properties of a system:
❑ Consistency (all copies have the same value)
❑ Availability (the system can run even if parts have failed)
❑ Via replication
❑ Partition tolerance (the network can break into two or more parts, each with active systems that can't talk to the other parts)
■ Brewer's CAP "Theorem": you can have at most two of these three properties for any system
■ Very large systems will partition at some point
❑ 🡺 Choose one of consistency or availability
❑ Traditional databases choose consistency
❑ Most web applications choose availability
■ Except for specific parts such as order processing
Availability
■ Traditionally thought of as the server/process being available five 9's (99.999%) of the time.
■ However, for a large multi-node system, at almost any point in time there's a good chance that a node is either down or there is a network disruption among the nodes.
❑ We want a system that is resilient in the face of network disruption
Eventual Consistency
■ When no updates occur for a long period of time, eventually all updates will propagate through the system and all the nodes will be consistent
■ For a given accepted update and a given node, eventually either the update reaches the node or the node is removed from service
■ Known as BASE (Basically Available, Soft state, Eventual consistency), as opposed to ACID
❑ Soft state: copies of a data item may be inconsistent
❑ Eventually consistent: copies become consistent at some later time if there are no more updates to that data item
CAP theorem
We cannot achieve all three of these properties in a distributed database system.
NoSQL: when?
o To handle a huge volume of structured, semi-structured and unstructured data.
o When you need to follow modern software development practices like Agile and Scrum, and to deliver prototypes or applications fast.
o If you prefer object-oriented programming.
o If your relational database cannot scale up to your traffic at an acceptable cost.
o If you want an efficient, scale-out architecture in place of an expensive, monolithic architecture.
o If you have local data transactions that need not be very durable.
o If you are going with schema-less data and want to include new fields without any ceremony.
o When your priority is easy scalability and availability.
NoSQL: when not?
o If you are required to perform complex and dynamic querying and reporting, you should avoid NoSQL, as it has limited query functionality. For such requirements, prefer SQL.
o NoSQL also lacks the ability to perform dynamic operations and cannot guarantee ACID properties. For cases like financial transactions, you should go with SQL databases.
o You should also avoid NoSQL if your application needs run-time flexibility.
o If consistency is a must and there are not going to be any large-scale changes in data volume, then going with a SQL database is the better option.
NoSQL is getting more and more popular
What is a schema-less data model?
In relational databases:
▶ You can't add a record that does not fit the schema
▶ You need to add NULLs for unused items in a row
▶ You have to consider the data types, e.g. you can't add a string to an integer field
▶ You can't add multiple items in a field (you would have to create another table: primary key, foreign key, joins, normalization, ... !!!)
In NoSQL databases:
▶ There is no schema to consider
▶ There are no unused cells
▶ There are no explicit data types
▶ Most such considerations are handled in the application layer
▶ We gather all items in an aggregate (document)
Aggregate Data Models
NoSQL databases are classified into four major data models:
• Key-value
• Document
• Column family (or wide column)
• Graph
Each DB has its own query language.
Example systems per data model:
Column family: Azure Cosmos DB, Accumulo, Cassandra, Scylla, HBase
Document: Azure Cosmos DB, Apache CouchDB, ArangoDB, BaseX, Clusterpoint, Couchbase, eXist-db, IBM Domino, MarkLogic, MongoDB, OrientDB, Qizx, RethinkDB
Key-value: Azure Cosmos DB, Aerospike, Apache Ignite, ArangoDB, Berkeley DB, Couchbase, Dynamo, FoundationDB, InfinityDB, MemcacheDB, MUMPS, Oracle NoSQL Database, OrientDB, Redis, Riak, SciDB, SDBM/Flat File dbm
Graph: Azure Cosmos DB, AllegroGraph, ArangoDB, InfiniteGraph, Apache Giraph, MarkLogic, Neo4J, AgensGraph, OrientDB, Virtuoso
Key-value data model
🡺 The simplest NoSQL databases
🡺 The main idea is the use of a hash table
🡺 Access data (values) by strings called keys
🡺 Data has no required format: it may have any format
🡺 Data model: (key, value) pairs
🡺 Basic operations: Insert(key, value), Fetch(key), Update(key, value), Delete(key)
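The four basic operations above map directly onto a hash table. A minimal in-memory sketch in Java (an illustration of the model only, not the API of any particular key-value store):

    import java.util.HashMap;
    import java.util.Map;

    // Minimal in-memory key-value store: every basic operation is a
    // single hash-table lookup, and values are opaque byte arrays.
    public class KeyValueStore {
        private final Map<String, byte[]> table = new HashMap<>();

        public void insert(String key, byte[] value) { table.put(key, value); }
        public byte[] fetch(String key)              { return table.get(key); }
        public void update(String key, byte[] value) { table.put(key, value); }
        public void delete(String key)               { table.remove(key); }

        public static void main(String[] args) {
            KeyValueStore store = new KeyValueStore();
            store.insert("user:42", "{\"name\":\"Alice\"}".getBytes());
            System.out.println(new String(store.fetch("user:42")));
        }
    }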
Column family data model
🡺 Based on Google's Bigtable
🡺 The column is the lowest/smallest unit of data
🡺 The names and format of the columns can vary from row to row in the same table
🡺 Each column family typically contains multiple columns that are used together
🡺 Within a given column family, all data is stored in a row-by-row fashion, such that the columns for a given row are stored together, rather than each column being stored separately
🡺 A wide-column store can be interpreted as a two-dimensional key-value store
🡺 Each column is a tuple that contains a name, a value and a timestamp
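A toy sketch of that two-dimensional view in Java (purely illustrative, not any product's API): the table is a sorted map from row key, through column family and column name, down to timestamped values.

    import java.util.TreeMap;

    // Toy wide-column layout: rowKey -> family -> column -> timestamp -> value
    public class WideColumnSketch {
        public static void main(String[] args) {
            TreeMap<String, TreeMap<String, TreeMap<String, TreeMap<Long, String>>>> table = new TreeMap<>();

            // Write one cell of row "Minsk", family "geo", column "country"
            table.computeIfAbsent("Minsk", r -> new TreeMap<>())
                 .computeIfAbsent("geo", f -> new TreeMap<>())
                 .computeIfAbsent("country", c -> new TreeMap<>())
                 .put(2011L, "Belarus");

            // A read addresses the cell by (row, family, column, timestamp)
            System.out.println(table.get("Minsk").get("geo").get("country").get(2011L));
        }
    }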
Some statistics about Facebook Search (using Cassandra):
❖ MySQL, > 50 GB of data:
🡺 Average write: ~300 ms
🡺 Average read: ~350 ms
❖ Rewritten with Cassandra, > 50 GB of data:
🡺 Average write: 0.12 ms
🡺 Average read: 15 ms
Graph data model
🡺 Similar to the network data model at a high level of abstraction
🡺 Based on graph theory
🡺 You can apply graph algorithms easily
🡺 Graph query languages: Gremlin, Cypher, SPARQL
🡺 The underlying storage mechanism of graph databases can vary: a relational, key-value store or document-oriented database
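As a toy illustration of the model itself (plain Java, not a graph database API): nodes and edges both carry properties, and traversing edges directly replaces relational joins.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Toy property graph: nodes and edges, both able to carry properties.
    public class PropertyGraphSketch {
        record Node(String id, Map<String, String> props) {}
        record Edge(String from, String label, String to) {}

        public static void main(String[] args) {
            Map<String, Node> nodes = new HashMap<>();
            List<Edge> edges = new ArrayList<>();

            nodes.put("alice", new Node("alice", Map.of("age", "34")));
            nodes.put("bob",   new Node("bob",   Map.of("age", "29")));
            edges.add(new Edge("alice", "FRIENDS_WITH", "bob"));

            // Traversal instead of a join: follow matching edges directly
            edges.stream()
                 .filter(e -> e.from().equals("alice") && e.label().equals("FRIENDS_WITH"))
                 .forEach(e -> System.out.println("alice -> " + e.to()));
        }
    }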
Document-based data model
• The central concept of a document-oriented database is the notion of a document
• Documents in a document store are roughly equivalent to the programming concept of an object
• While each document-oriented database implementation differs on the details of this definition, in general they all assume that documents encapsulate and encode data (or information) in some standard format or encoding
• Encodings in use include XML, YAML and JSON, as well as binary forms like BSON
• Different types of documents are allowed in a single store
• Documents are addressed in the database via a unique key that represents that document. This key is a simple identifier (or ID), typically a string, a URI, or a path
• Each key is paired with a complex data structure known as a document
• Documents can contain many different key-value pairs, key-array pairs, or even nested documents
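A small sketch of one such document, modeled here with plain Java collections (illustrative only; a real document store would accept the same structure as JSON or BSON):

    import java.util.List;
    import java.util.Map;

    // One document behind a single key, mixing key-value pairs,
    // a key-array pair, and a nested sub-document.
    public class DocumentSketch {
        public static void main(String[] args) {
            Map<String, Object> document = Map.of(
                "name", "Alice",                          // key-value pair
                "tags", List.of("admin", "beta-tester"),  // key-array pair
                "address", Map.of(                        // nested document
                    "city", "Minsk",
                    "country", "Belarus"));

            Map<String, Map<String, Object>> store = Map.of("user:42", document);
            System.out.println(store.get("user:42").get("address"));
        }
    }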
SQL vs NoSQL
Common advantages of NoSQL systems
■ Cheap and easy to implement (open source)
■ Data are replicated to multiple nodes (therefore identical and fault-tolerant) and can be partitioned
❑ When data is written, the latest version is on at least one node and is then replicated to other nodes
❑ No single point of failure
■ Easy to distribute
■ Don't require a schema
What does NoSQL not provide?
■ Joins
■ Group by
❑ But PNUTS (a massively parallel and geographically distributed database system for Yahoo!'s web applications) provides a materialized-view approach to joins/aggregation
■ ACID transactions
■ SQL
■ Integration with applications that are based on SQL
What: HBase is...
An open-source, non-relational, distributed column-family database modeled after Google's Bigtable.
Think of it as a sparse, consistent, distributed, multidimensional, sorted map:
labeled tables of rows
rows consisting of key-value cells:
(row key, column family, column, timestamp) -> value
HBase
Provides random, real-time read/write access to Big Data.
The goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware.
HDFS vs HBase
HBase
Tables in HBase can serve as the input and output for MapReduce jobs run in Hadoop.
HBase may be accessed through the Java API, but also through REST, Avro or Thrift gateway APIs.
HBase runs on top of HDFS and is well-suited for fast read and write operations on large datasets with high throughput and low input/output latency.
Phoenix
HBase is not a direct replacement for a classic SQL database; however, the Apache Phoenix project provides a SQL layer for HBase.
Apache Phoenix is an open-source, massively parallel, relational database engine supporting OLTP for Hadoop, using Apache HBase as its backing store.
Phoenix provides a JDBC driver that hides the intricacies of the NoSQL store, enabling users to create, delete, and alter SQL tables, views, indexes, and sequences; insert and delete rows singly and in bulk; and query data through SQL.
Phoenix compiles queries and other statements into native NoSQL store APIs.
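Because Phoenix is exposed through JDBC, ordinary JDBC code runs against HBase. A minimal sketch (the ZooKeeper host and the CITIES table are illustrative placeholders; note that Phoenix uses UPSERT rather than INSERT, and autocommit is off by default):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    // Minimal Phoenix-over-HBase session through plain JDBC.
    public class PhoenixExample {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host");
                 Statement stmt = conn.createStatement()) {

                stmt.execute("CREATE TABLE IF NOT EXISTS CITIES ("
                           + "NAME VARCHAR PRIMARY KEY, COUNTRY VARCHAR)");

                // Phoenix uses UPSERT (insert-or-update) instead of INSERT
                stmt.executeUpdate("UPSERT INTO CITIES VALUES ('Minsk', 'Belarus')");
                conn.commit();

                try (ResultSet rs = stmt.executeQuery(
                         "SELECT COUNTRY FROM CITIES WHERE NAME = 'Minsk'")) {
                    while (rs.next()) System.out.println(rs.getString(1));
                }
            }
        }
    }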
Usage
HBase now serves several data-driven websites:
Facebook elected to implement its new messaging platform using HBase in November 2010, but migrated away from HBase in 2018 (to MyRocks).
Twitter runs HBase across its entire Hadoop cluster.
HP IceWall SSO is a web-based single sign-on solution that uses HBase to store user data for authenticating users.
Adobe currently has about 30 nodes running HDFS, Hadoop and HBase in clusters ranging from 5 to 14 nodes, in both production and development.
See "Powered By Apache HBase" at http://hbase.apache.org/poweredbyhbase.html
Enterprises that use HBase
What: Part of the Hadoop ecosystem
Provides real-time random read/write access to data stored in HDFS
[Diagram: a Consumer reads data from HBase and a Producer writes data to it; HBase itself reads from and writes to HDFS]
Hive vs. HBase
o Unlike Hive, HBase operations run in real time on its database rather than as MapReduce jobs
o Apache Hive is a data warehouse system built on top of Hadoop; Apache HBase is a NoSQL key/value store on top of HDFS
o Apache Hive provides SQL features for Spark/Hadoop data; HBase can store or process Hadoop data with near real-time read/write needs
o Hive should be used for analytical querying of data collected over a period of time; HBase is primarily used to store and process unstructured Hadoop data
o HBase is perfect for real-time querying of Big Data; Hive should not be used for real-time querying
What: Features-1
Linear scalability, capable of storing hundreds of terabytes of data
Automatic and configurable sharding of tables
Automatic failover support
Strictly consistent reads and writes
What: Features-2
Integrates nicely with Hadoop MapReduce (both as source and destination)
Easy Java API for client access
Thrift gateway and REST APIs
Bulk import of large amounts of data
Replication across clusters & backup options
Block cache and Bloom filters for real-time queries
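A minimal sketch of that Java API (assumes an HBase 1.0+ client on the classpath and a reachable cluster; the "cities" table and its "geo" family are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;

    // Connect, write one cell, read it back, and clean up.
    public class HBaseClientSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("cities"))) {

                Put put = new Put(Bytes.toBytes("Minsk"));    // row key
                put.addColumn(Bytes.toBytes("geo"),           // column family
                              Bytes.toBytes("country"),       // column qualifier
                              Bytes.toBytes("Belarus"));      // value
                table.put(put);

                Result result = table.get(new Get(Bytes.toBytes("Minsk")));
                System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("geo"), Bytes.toBytes("country"))));
            }
        }
    }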
How to use HBase?
HBase Table
How: the Data
Row keys are uninterpreted byte arrays
Columns are grouped into column families (CFs)
CFs are defined statically upon table creation
A cell is an uninterpreted byte array plus a timestamp; all values are stored as byte arrays
Rows are ordered and accessed by row key; different kinds of data are separated into CFs
Rows can have different columns, a cell can have multiple versions, and data can be very "sparse"

Row Key         Data
Minsk           geo:{'country':'Belarus','region':'Minsk'}
                demography:{'population':'1,937,000'@ts=2011}
New_York_City   geo:{'country':'USA','state':'NY'}
                demography:{'population':'8,175,133'@ts=2010,
                            'population':'8,244,910'@ts=2011}
Suva            geo:{'country':'Fiji'}
How: Writing the Data
Row updates are atomic
Updates across multiple rows are NOT atomic; there is no transaction support out of the box
HBase stores N versions of a cell (default 3)
Tables are usually "sparse": not all columns are populated in every row
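A hedged sketch of versioned writes with a 2.x-era client (reusing the illustrative "cities"-style Table from the earlier sketch): explicit timestamps create two versions of one cell, and the read asks for up to the default three.

    import java.util.List;
    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.CellUtil;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    // Write two versions of the same cell, then read them both back.
    public class VersionedWriteSketch {
        static void demo(Table table) throws Exception {
            byte[] row = Bytes.toBytes("New_York_City");
            byte[] cf  = Bytes.toBytes("demography");
            byte[] col = Bytes.toBytes("population");

            // Explicit timestamps create two versions of one cell
            table.put(new Put(row).addColumn(cf, col, 2010L, Bytes.toBytes("8,175,133")));
            table.put(new Put(row).addColumn(cf, col, 2011L, Bytes.toBytes("8,244,910")));

            // Ask for up to 3 versions (the default retention) on read
            Result result = table.get(new Get(row).readVersions(3));
            List<Cell> versions = result.getColumnCells(cf, col);
            for (Cell cell : versions) {
                System.out.println(cell.getTimestamp() + " -> "
                    + Bytes.toString(CellUtil.cloneValue(cell)));
            }
        }
    }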
How: Reading the Data
A reader will always read the last written (and committed) values
Reading a single row: Get
Reading multiple rows: Scan (very fast)
A Scan usually defines a start key and a stop key
Rows are ordered, so it is easy to do a partial key scan
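For example, a key-range scan against the same illustrative table (with a 2.x client the start row is inclusive and the stop row exclusive):

    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    // Because rows are sorted by key, all rows from "M" up to
    // (but not including) "O" come back in order.
    public class ScanSketch {
        static void demo(Table table) throws Exception {
            Scan scan = new Scan()
                .withStartRow(Bytes.toBytes("M"))   // inclusive
                .withStopRow(Bytes.toBytes("O"));   // exclusive

            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result row : scanner) {
                    System.out.println(Bytes.toString(row.getRow()));
                }
            }
        }
    }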
How: MapReduce Integration
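The original slide here is a diagram; as a code-level sketch of the same integration (table name illustrative), HBase's TableMapReduceUtil wires a table in as MapReduce input, feeding the mapper one row per call -- essentially what HBase's bundled RowCounter tool does:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    // Map-only job that counts the rows of an HBase table via a counter.
    public class RowCountSketch {
        static class RowMapper extends TableMapper<ImmutableBytesWritable, Result> {
            @Override
            protected void map(ImmutableBytesWritable rowKey, Result columns, Context context) {
                context.getCounter("sketch", "rows").increment(1);
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(HBaseConfiguration.create(), "hbase-row-count");
            job.setJarByClass(RowCountSketch.class);
            // The table serves as the MapReduce input source
            TableMapReduceUtil.initTableMapperJob(
                "cities", new Scan(), RowMapper.class,
                ImmutableBytesWritable.class, Result.class, job);
            job.setNumReduceTasks(0);                         // map-only
            job.setOutputFormatClass(NullOutputFormat.class); // counter is the output
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }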
How: Sharding the Data
Automatic and configurable sharding of tables:
Tables are partitioned into Regions
A Region is defined by its start & end row keys
Regions are the "atoms" of distribution
Regions are assigned to RegionServers (HBase cluster slaves)
How: Setup: Components
[Diagram: HBase components, including ZooKeeper and clients]
How: Setup: Hadoop Cluster
[Diagram: typical Hadoop+HBase setup. The master node runs the HDFS NameNode, the MapReduce JobTracker and the HBase HMaster; each slave node runs a DataNode, a TaskTracker and a RegionServer]
How: Setup: Automatic Failover
When to Use HBase?
When: What HBase is good at
Serving large amounts of data: built to scale from the get-go
Fast random access to the data
Write-heavy applications*
Append-style writing (inserting/overwriting new data) rather than heavy read-modify-write operations
When: HBase vs ...
General commands (HBase shell)
• status: Provides the status of HBase, for example, the number of servers.
• version: Provides the version of HBase being used.
• table_help: Provides help for table-reference commands.
• whoami: Provides information about the current user.
HBase DDL commands
• create: Creates a table.
• list: Lists all the tables in HBase.
• disable: Disables a table.
• is_disabled: Verifies whether a table is disabled.
• enable: Enables a table.
• is_enabled: Verifies whether a table is enabled.
• describe: Provides the description of a table.
• alter: Alters a table.
• exists: Verifies whether a table exists.
• drop: Drops a table from HBase.
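The same DDL operations are also available programmatically through the Java Admin API; a hedged sketch using HBase 2.x-style builders (the "cities" table and "geo" family are illustrative):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

    // create / exists / disable / drop, via the Java Admin API.
    public class AdminSketch {
        public static void main(String[] args) throws Exception {
            try (Connection connection =
                     ConnectionFactory.createConnection(HBaseConfiguration.create());
                 Admin admin = connection.getAdmin()) {

                TableName name = TableName.valueOf("cities");  // illustrative
                admin.createTable(TableDescriptorBuilder.newBuilder(name)
                        .setColumnFamily(ColumnFamilyDescriptorBuilder.of("geo"))
                        .build());                             // shell: create 'cities', 'geo'

                System.out.println(admin.tableExists(name));   // shell: exists
                admin.disableTable(name);                      // shell: disable
                admin.deleteTable(name);                       // shell: drop
            }
        }
    }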
HBase data manipulation commands
• put: Puts a cell value at a specified column in a specified row in a particular table.
• get: Fetches the contents of a row or a cell.
• delete: Deletes a cell value in a table.
• deleteall: Deletes all the cells in a given row.
• scan: Scans and returns the table data.
• count: Counts and returns the number of rows in a table.
• truncate: Disables, drops, and recreates a table.