Types of No SQL Databases
pm jat @ daiict
Recap
• Aggregation Oriented Databases
– Object Aggregates are saved rather than normalized tables
– In other words, the data is “de-normalized” by embedding related objects
– “Partition strategies” can be defined better
– Storing together the data that are queried together makes data manipulation
operations very efficient on distributed data
• The design of “Aggregation Oriented Databases” is dictated by the query load: one
set of queries executes faster on a given design, while another set of queries
performs poorly.
– Note that this is not the case with relational systems!
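As a minimal sketch of the recap above (the order data and field names are hypothetical, not from the lecture), the same order can be written once as normalized tables and once as a single aggregate with its items embedded. The last function is a query the aggregate was not designed for, so it has to open every aggregate.

```python
# Hypothetical data: one order with two line items.

# Normalized design: two "tables", related through order_no.
orders = [{"order_no": 1234, "customer": "Asha"}]
order_items = [
    {"order_no": 1234, "item": "pen", "qty": 2},
    {"order_no": 1234, "item": "ink", "qty": 1},
]

# Aggregate-oriented design: the items are embedded in the order.
order_aggregate = {
    "order_no": 1234,
    "customer": "Asha",
    "items": [
        {"item": "pen", "qty": 2},
        {"item": "ink", "qty": 1},
    ],
}


def items_normalized(order_no):
    """Needs a scan (or join) over a second table."""
    return [r["item"] for r in order_items if r["order_no"] == order_no]


def items_aggregate(aggregate):
    """Everything arrives with the single aggregate read."""
    return [r["item"] for r in aggregate["items"]]


def orders_containing(item, aggregates):
    """Not the designed access path: every aggregate must be opened."""
    return [a["order_no"] for a in aggregates
            if any(r["item"] == item for r in a["items"])]
```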
05-10-2023 Types of No SQL Databases 2
What do we discuss here
• Types of No-SQL databases with some examples
– Key-Value
– Document Oriented
– “Column Family” or “Wide Column Databases”
– Graph Databases
Key Value Databases
• The concept of key-value is not new in the programming world.
• Hopefully, you have used one of the following in your programming. Different languages
give it different names: Maps, Dictionaries, Associative Arrays
• Operations are done only on the key, quite similar to a Dictionary
– Example: Get, Put, and Remove
• For Example:
items.put( 313, item_x );
Item a = items.get( 123 );
• And so forth
Key Value Databases
• A key-value database applies the concept of a “Dictionary” or “Map” to databases
• A key-value database can be defined as a “Persistent Dictionary” or
“Persistent Map”, where
– value is an “Entity”
– key is “primary key” of that entity
• Key-Value databases are also called “Key Value Stores”
• Make sure that you clearly understand the meaning of “Key” and “Value” in
key-value databases. This is important.
• These are aggregation-oriented databases, where the key is the key of the aggregate
and the value is the aggregate itself.
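The “Persistent Dictionary” idea can be tried out with Python's standard `shelve` module, which behaves like a dictionary backed by a file. This is only a sketch: the file location, the key "313", and the item fields are assumptions made for illustration, and `shelve` is a single-machine stand-in, not a distributed key-value database.

```python
import os
import shelve
import tempfile

# A temporary directory keeps the sketch self-contained; a real store
# would use a durable location.
path = os.path.join(tempfile.mkdtemp(), "items")

# First "session": put an aggregate under its key (shelve keys are strings).
with shelve.open(path) as items:
    items["313"] = {"name": "item_x", "price": 40}

# Second "session": the pair survived closing the store.
with shelve.open(path) as items:
    restored = items["313"]
    missing = items.get("123")  # no such key, so None
```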
Key Value Databases Operations
• In a key-value database, the collection of key-value pairs may be called a “Table” or
“Collection”
• Example: we have a database “empdb”; read/write operations are then performed as
follows:
• Write: empdb.employees.put("10011", '{"empno":"10011",
"name":"Michael", "salary":60000}')
• Read: emp1 = empdb.employees.get("10005")
• Remove: emp2 = empdb.employees.remove("10001")
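The three operations can be sketched with a small in-memory class. `KeyValueCollection` is a hypothetical stand-in written for this example and is not the API of any particular product; a real store would add persistence and distribution behind the same three calls.

```python
import json


class KeyValueCollection:
    """Hypothetical in-memory stand-in for one collection of a KV store."""

    def __init__(self):
        self._pairs = {}

    def put(self, key, value):
        self._pairs[key] = value           # insert or overwrite

    def get(self, key):
        return self._pairs.get(key)        # None when the key is absent

    def remove(self, key):
        return self._pairs.pop(key, None)  # returns the removed value


employees = KeyValueCollection()
employees.put(
    "10011",
    json.dumps({"empno": "10011", "name": "Michael", "salary": 60000}),
)

emp1 = employees.get("10011")     # the stored string, unchanged
emp2 = employees.remove("10011")  # removes the pair and returns its value
emp3 = employees.get("10011")     # the key is gone now
```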
Key Value Stores
[Figure: a key (“empno”: “1234”) and its value]
• In pure “Key Value” databases, the content
of the value is not “visible” to the DBMS
• The value is just read and written as a
chunk
• DynamoDB, Redis, and Riak are popular
key-value stores.
• Since key-value systems always use
primary-key access, they generally have
greater performance and can be easily
scaled.
Key Value Stores
[Figure: a key (“orderNo”: 1234) and its value]
• The value in a KV DB could be unstructured or loosely
structured, such as a byte array, a String, or CSV,
or
• it could be fully structured,
represented as XML or JSON
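A short sketch of what “the value is just a chunk” means in practice. The dictionary `store` and the key names are hypothetical; the point is that the store holds bytes in whatever encoding the client picked, and only the client knows how to decode them.

```python
import csv
import io
import json

store = {}  # hypothetical KV store: key -> bytes, never inspected by the store

# The same order written in two different encodings chosen by the client.
store["order:1234:json"] = json.dumps({"orderNo": 1234, "total": 99.5}).encode("utf-8")
store["order:1234:csv"] = b"1234,99.5\r\n"

# Reading: the store hands back bytes; decoding is the client's job.
as_json = json.loads(store["order:1234:json"].decode("utf-8"))
as_csv = next(csv.reader(io.StringIO(store["order:1234:csv"].decode("utf-8"))))
```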
Key Value Stores - Summarized
• The database here is a “Persistent Collection”
of “Key-Value pairs”
• The value is typically a “Data Object” or an “Aggregated Data Object”
• The key is typically the “Primary Key”
• The value part is opaque: the “DBMS does not see and does not process it”
• Values can be stored internally in any data representation format
– String, Tuple, CLOB, BLOB, XML, JSON.
• By definition KV databases are schema-less
KV database – Pros and Cons
• Pros
– Efficient queries (very predictable performance) as operations are always based
on Key.
– Easy to distribute across a cluster.
– No (object-relational) impedance mismatch
• Cons
– No complex query filters (WHERE(Predicate) part is limited to Keys only)
– All joins must be done manually through code
– No foreign key constraints
– No triggers
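The first two cons can be made concrete with a sketch. The two collections, the `dept` reference, and the function names are hypothetical; the values are opaque JSON strings, so both the join and the attribute filter have to be written in client code.

```python
import json

# Hypothetical KV collections; values are opaque JSON strings.
employees = {
    "10011": json.dumps({"name": "Michael", "dept": "D1"}),
    "10005": json.dumps({"name": "Sara", "dept": "D2"}),
}
departments = {
    "D1": json.dumps({"dname": "Research"}),
    "D2": json.dumps({"dname": "Sales"}),
}


def employee_with_department(empno):
    """Manual join: two key lookups, stitched together by the client."""
    emp = json.loads(employees[empno])
    # Nothing in the store enforces that this department key exists.
    dept = json.loads(departments[emp["dept"]])
    return {"name": emp["name"], "department": dept["dname"]}


def employees_in(dept_id):
    """No WHERE on value attributes: read and decode every value."""
    return sorted(k for k, v in employees.items()
                  if json.loads(v)["dept"] == dept_id)
```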
Document Oriented Databases
• Document databases are also built on top of the “Key-Value” strategy
• A KV database does not know anything about the value part, and does not perform
any operation on the value. It just puts and gets the value as a block.
• In document databases, on the other hand, we specify structural information (partial
or full) about the value part.
• We can use attributes from the value part in the SELECT and predicate (WHERE) parts
of a query
• Note that there is a thin boundary between KV and Document databases!
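The difference can be sketched as follows. `DocumentCollection` and its `find` method are hypothetical names for this example and do not mirror any product's API; the point is that the store can look inside the value, so it can filter on and project value attributes, while plain key access still works.

```python
class DocumentCollection:
    """Hypothetical in-memory collection; not any product's API."""

    def __init__(self):
        self._docs = {}  # _id -> document (a dict the store can inspect)

    def insert(self, doc):
        self._docs[doc["_id"]] = doc

    def get(self, _id):
        return self._docs.get(_id)  # plain key-value access still works

    def find(self, predicate, fields):
        """Filter on value attributes and project the requested fields."""
        return [{f: d.get(f) for f in fields}
                for d in self._docs.values() if predicate(d)]


employees = DocumentCollection()
employees.insert({"_id": "10011", "name": "Michael", "salary": 60000})
employees.insert({"_id": "10005", "name": "Sara", "salary": 45000})

# Predicate on an attribute of the value part, projection of one field.
well_paid = employees.find(lambda d: d["salary"] > 50000, fields=["name"])
```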
[Figure: Document DB and Key-Value DB]
Document Oriented Databases
• Here the term “Document” refers to a record as shown below.
• A document is analogous to a row in an RDB
• Here is a snapshot from the MongoDB docs.
• MongoDB calls a set of documents
a “Collection”,
which is analogous to a
table in an RDB
Source: https://docs.mongodb.com/manual/core/databases-and-collections/
Document Oriented Databases
• Here are the MongoDB correspondences with an RDB:
RDB                 Doc DB
Database/Schema     Database
Table               “Collection” of Documents
Row/Tuple           “Document”
Row ID              _id
Document DB and Schema
• Flexible Schema
– MongoDB allows the schema to range from “No Schema” to “Full Schema”, including type
and cardinality constraints
– We may define a set of required attributes and some constraints on them.
– You can define an ID or a shard key
– The schema remains flexible: documents may carry any extra set of attribute-values.
• The DB design can be “Aggregated”, “Normalized”, or anything in between
• You can define an index on attributes from the “Value Part”
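A partial schema and an attribute index can be sketched in a few lines. The `REQUIRED` rule set, the `validate` function, and the sample documents are assumptions for illustration; this is not how MongoDB expresses its validation rules, only the idea of constraining some attributes while leaving the rest open.

```python
from collections import defaultdict

# Hypothetical partial schema: only these attributes are constrained.
REQUIRED = {"_id": str, "name": str}


def validate(doc):
    """Check required attributes and types; extra attributes are allowed."""
    for field, expected_type in REQUIRED.items():
        if not isinstance(doc.get(field), expected_type):
            raise ValueError(f"missing or badly typed attribute: {field}")
    return doc


docs = [
    validate({"_id": "10011", "name": "Michael", "salary": 60000}),
    validate({"_id": "10005", "name": "Sara", "skills": ["SQL", "Go"]}),
]

# Secondary index on an attribute of the value part: name -> list of _id.
name_index = defaultdict(list)
for d in docs:
    name_index[d["name"]].append(d["_id"])


def rejected(doc):
    """True when the document violates the partial schema."""
    try:
        validate(doc)
    except ValueError:
        return True
    return False
```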
Mongo DB “Sharded”
• Recall that modern distributed databases
are “partitioned” and “replicated”
• “Sharding” is a special type of
“partitioning” where data records are
partitioned “horizontally”
• Each partition has the same schema.
• Each partition has a local DB engine:
the “WiredTiger” storage engine!
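Horizontal partitioning can be sketched with a hash of the shard key. This is an illustration under stated assumptions, not MongoDB's implementation: the number of shards, the choice of `_id` as shard key, and the use of SHA-256 are all made up for the example, and hash-based placement is only one possible strategy.

```python
import hashlib

NUM_SHARDS = 3  # hypothetical cluster size


def shard_for(shard_key):
    """Map a shard key to a shard number with a stable hash."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS


# Each shard holds whole records of the same shape (horizontal partitioning).
shards = [dict() for _ in range(NUM_SHARDS)]


def put(record):
    shards[shard_for(record["_id"])][record["_id"]] = record


def get(_id):
    return shards[shard_for(_id)].get(_id)  # only one shard is consulted


for empno in ["10001", "10005", "10011", "10020"]:
    put({"_id": empno, "name": "emp-" + empno})
```

A lookup by shard key touches a single shard, which is why key-based access scales well; a query on any other attribute would have to visit every shard.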
Popular “Document Databases”
• Popular Document databases are
– Mongo DB (more towards “Consistency”)
– Couch DB (more towards “Availability”)
• Different Document databases would have different features in terms of
– Data Types, and other building blocks
– API
– Partitioning, replication, indexing
– “Consistency Models”
“Column Family” or “Wide Column” Databases
• Why do we call them wide-column databases?
– They can have a huge number of columns, making a row very wide!
– Note that the row here is not the same as the row in relational systems!
• Why do we call them “Column Family Databases”?
– We do not define the names of columns while creating tables. The only thing we
do is define “column families” (we shall learn about them shortly)
“Column Family” or “Wide Column” Databases
• They do use the concepts of “Table”, “Row”, and “Column”, though slightly differently,
and additionally “Column Family” [4]
• Column: an “attribute”; it may not be atomic
• Row: represents a “Data Record”
– typically an aggregate and certainly not a normalized row
• Table: a collection of “rows”
• Column Family: a group of columns. A table has a fixed set of column families
but not a fixed set of columns. A column family can have any number of columns,
and their names are not known upfront!
“Table”, “Row”, “Column Family”, “Column” [5]
“Column Family” or “Wide Column” Databases
• Google's Bigtable was the first column family database
• Then “HBase” at Hadoop and “Cassandra” at Facebook followed!
– HBase is the Hadoop implementation of Bigtable, whereas
– Cassandra is a “Column Family” adaptation of Amazon's “Dynamo” [3], a key-value store
• The Bigtable article [2] defines a “Bigtable” as
– a sparse, distributed, persistent multidimensional sorted “map”.
– The map is indexed by a row key, column key, and a timestamp; each value in the
map is an uninterpreted array of bytes
“Column Family” or “Wide Column” Databases [2]
• “sparse”:
– The total number of columns in a table could be very large; however, individual rows
have values for only a few of the columns
– This is also the reason that these systems are called “Wide Column” databases
• “distributed”: partitioned
• “persistent”
• “multidimensional sorted map”.
– “Map” tells us that this is a key-value store
– Rows are kept “sorted” on the key
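The “sparse” and “sorted” properties can be sketched with plain Python structures. The row keys and column names below are hypothetical. Each row stores only the columns it actually has, and keeping the row keys sorted lets a range of rows be located with binary search instead of a full scan.

```python
import bisect

# Hypothetical table: row key -> {column name: value}. Rows are sparse,
# each storing only the columns it actually has.
rows = {
    "user#003": {"info:name": "Chen"},
    "user#001": {"info:name": "Asha", "info:city": "Pune"},
    "user#002": {"visits:2023-10-05": "3"},
}

# Keep the row keys sorted, as the store would.
sorted_keys = sorted(rows)


def scan(start_key, stop_key):
    """Return row keys in [start_key, stop_key) using binary search."""
    lo = bisect.bisect_left(sorted_keys, start_key)
    hi = bisect.bisect_left(sorted_keys, stop_key)
    return sorted_keys[lo:hi]


# The table as a whole is "wide" even though each row is small.
all_columns = sorted({c for cols in rows.values() for c in cols})
```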
“Table”, “Row”, “Column Family”, “Column” [5]
Row as a multidimensional sorted map! [2]
Also a “key-value store” [5]
• Multi Level Key-Value store: [row key, column family, column qualifier, timestamp]
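The four-part key can be sketched as a tuple used directly as a dictionary key. The function names and sample cells are assumptions for illustration; real stores organise these keys on disk quite differently, but the logical view is the same: one cell may hold several timestamped versions, and a read normally returns the latest one.

```python
# Hypothetical cell store keyed by the four-part key:
# (row key, column family, column qualifier, timestamp) -> value bytes.
cells = {}


def put(row, family, qualifier, timestamp, value):
    cells[(row, family, qualifier, timestamp)] = value


def get_latest(row, family, qualifier):
    """Return the value with the highest timestamp for one cell, or None."""
    versions = [(ts, v) for (r, f, q, ts), v in cells.items()
                if (r, f, q) == (row, family, qualifier)]
    return max(versions)[1] if versions else None


def get_row(row):
    """Rebuild the nested view: family -> qualifier -> latest value."""
    result = {}
    # Sorted keys put timestamps in ascending order, so later versions
    # overwrite earlier ones.
    for (r, f, q, ts) in sorted(cells):
        if r == row:
            result.setdefault(f, {})[q] = cells[(r, f, q, ts)]
    return result


put("user#001", "info", "city", 1, b"Pune")
put("user#001", "info", "city", 2, b"Surat")
put("user#001", "visits", "2023-10-05", 1, b"3")
```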
“Column Family Databases” and Schema
• While defining a table
– Row Key is Required
– Column Family is Required
– There can be any number of columns in Column Family
• A row is required to have a “Row Key” and can have any number of columns in any
column family
• A row need not have columns in all column families!
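These schema rules can be sketched as a small class. `WideColumnTable` and the table contents are hypothetical; the sketch only enforces what the slide lists: a row key is required, the column family must have been declared with the table, and any column name is accepted inside a declared family.

```python
class WideColumnTable:
    """Hypothetical table: column families are fixed, columns are not."""

    def __init__(self, column_families):
        self.column_families = set(column_families)
        self.rows = {}  # row key -> {family: {column: value}}

    def put(self, row_key, family, column, value):
        if not row_key:
            raise ValueError("a row key is required")
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        # Any column name is accepted inside a declared family.
        self.rows.setdefault(row_key, {}).setdefault(family, {})[column] = value


users = WideColumnTable(["info", "visits"])
users.put("user#001", "info", "name", "Asha")
users.put("user#001", "visits", "2023-10-05", 3)
users.put("user#002", "info", "nickname", "cz")  # new column, no schema change


def put_fails(*args):
    """True when the table rejects the write."""
    try:
        users.put(*args)
    except (KeyError, ValueError):
        return True
    return False
```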
Why do we have three different types of “KV databases”?
• Key Value
– Highly Scalable, Highly Efficient on Key based Access
• Document
– More friendly with “OOP”, Aggregation Oriented Store.
– Have no Impedance mismatch problem
– Tries to offer the same functionality as “traditional databases” at scale
– Access can be based on many attributes of an object, even attributes of embedded objects
• Wide Column or Column Family
– Key-value beyond a mere “KEY”
– Multi-dimensional map
– We can have a large number of attributes (theoretically millions), yet “look up” is
still very fast, even at scale.
Graph Databases
• A node represents an Entity, whereas
• An edge represents a Relationship
• Operations are performed by Graph Manipulations
• Graph Databases are popularly used in RDF-based Linked Open Data, Social
networks, and other information networks.
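A minimal sketch of the node and edge model, with a two-hop traversal as an example of a graph manipulation. The names, the relationship labels, and the triple layout are hypothetical and chosen only for illustration; graph databases offer much richer models and query languages.

```python
from collections import defaultdict

# Hypothetical graph: nodes are entity names, edges are
# (source, relationship, target) triples.
edges = [
    ("asha", "FRIEND_OF", "chen"),
    ("chen", "FRIEND_OF", "dev"),
    ("asha", "WORKS_AT", "acme"),
]

# Adjacency index: (source, relationship) -> list of targets.
outgoing = defaultdict(list)
for source, relationship, target in edges:
    outgoing[(source, relationship)].append(target)


def neighbours(node, relationship):
    """Targets reachable from node over one edge of the given type."""
    return outgoing.get((node, relationship), [])


def friends_of_friends(node):
    """Two-hop traversal along FRIEND_OF edges, excluding the start node."""
    result = set()
    for friend in neighbours(node, "FRIEND_OF"):
        for fof in neighbours(friend, "FRIEND_OF"):
            if fof != node:
                result.add(fof)
    return result
```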
Graph Databases [1]
“Works that made the impact”
• GFS
• Map Reduce
• Google Big Table [2]
• Amazon Dynamo [3]
• Column Oriented Storage
References/Further Readings
[1] Sadalage, Pramod J., and Martin Fowler. NoSQL Distilled: A Brief Guide to the
Emerging World of Polyglot Persistence. Pearson Education, 2013. Chapter 2.
[2] Chang, Fay, et al. "Bigtable: A distributed storage system for structured data." ACM Transactions on
Computer Systems (TOCS) 26.2 (2008): 1-26.
[3] DeCandia, Giuseppe, et al. "Dynamo: Amazon's highly available key-value store." ACM SIGOPS
operating systems review 41.6 (2007): 205-220.
[4] Lakshman, Avinash, and Prashant Malik. "Cassandra: a decentralized structured storage system."
ACM SIGOPS Operating Systems Review 44.2 (2010): 35-40.
[5] Khurana, Amandeep. "Introduction to HBase schema design." White Paper, Cloudera (2012).
[6] Sasaki, Bryce Merkl, Joy Chao, and Rachel Howard. "Graph databases for beginners." Neo4j (2018).