NoSQL
• NoSQL stands for “Not Only SQL”.
• Relational databases use SQL as their primary language for data access, store data in related tables, and require a predefined schema.
• NoSQL databases do not store data in related tables and do not require a predefined schema.
• NoSQL databases are known for their flexibility and are widely used in real-time web apps and for storing big data.
• Advantages of NoSQL over RDBMS
• Handles big data easily
• No predefined schema, which makes it very flexible
• Often cheaper to manage
• Scales horizontally, which helps applications and web apps that hold a lot of data
• There are various types of NoSQL databases:
• Document Databases (MongoDB, CouchDB):
• They store data in a JSON-like format, similar to JavaScript objects.
• They need no fixed schema, which suits fast-changing web app development.
• Column Databases (Apache Cassandra):
• Optimized for reading and writing data column-wise instead of row-wise.
• Key-Value Stores (Redis, Couchbase Server):
• Used to store large amounts of data.
• The simplest kind of NoSQL database, similar to an associative array.
• NoSQL databases are mainly used to manage huge sets of unstructured data, where the data is not stored in tabular relations as it is in relational databases.
• Most existing relational databases struggle with some complex modern problems, such as:
• The continuously changing nature of data: structured, semi-structured, unstructured and polymorphic data.
• Applications now serve millions of users in different geo-locations and time zones and have to be up and running all the time, with data integrity maintained.
• Applications are becoming more distributed, with many moving towards cloud computing.
• NoSQL plays a vital role in enterprise applications that need to access and analyze massive data sets spread across multiple remote virtual servers in the cloud, especially when the data is not structured.
• Hence, NoSQL databases are designed to overcome the performance, scalability, data modelling and distribution limitations seen in relational databases.
• What is Structured Data?
• Structured data is usually text files, with defined column titles and data in
rows.
• Such data can easily be visualized in form of charts and can be processed
using data mining tools.
• What is Unstructured Data?
• Unstructured data can be anything: a video file, an image file, a PDF, emails, etc. These files have no common structure.
• Structured information can be extracted from unstructured data, but the process is time consuming.
• As more and more modern data is unstructured, growing applications needed a way to store it, which paved the way for NoSQL.
• NoSQL Database Types
• The following are the NoSQL database types:
• Document Databases: each key is paired with a complex data structure called a document. Example: MongoDB
• Graph stores: usually used to store networked data, where records are related to other records through explicit connections.
• Key-Value stores:
• These are the simplest NoSQL databases.
• Each item is stored with a key that identifies it. Some key-value databases, such as Redis, also store the data type of the value.
• Wide-column stores:
• Used to store large data sets by storing columns of data together. Examples: Cassandra (originally developed at Facebook), HBase, etc.
• Some Advantages of NoSQL Databases
• This section discusses some of the main advantages of NoSQL databases, with examples.
• Dynamic Schemas
• In relational databases such as Oracle or MySQL, we define table structures up front. For example, to save student records we create a table named Student and add columns such as student_id and student_name. This is called a defined schema: the structure is defined before any data is saved.
• If we later want to store more related data in the Student table, we have to add a new column. That is easy when the table holds little data, but much harder when it holds millions of records.
• Migrating to the updated schema can then be a laborious job.
• NoSQL databases avoid this problem because a schema definition is not required.
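The Student example above can be sketched with plain Python dictionaries standing in for documents. This is only an in-memory illustration, not a real database; the `email` field and the sample names are assumptions made up for the example.

```python
# In-memory stand-in for a schema-less collection (not a real database).
students = []

# Early records carry only the fields known at the time.
students.append({"student_id": 1, "student_name": "Asha"})

# A later record adds a new field (hypothetical "email").
# Existing records are left untouched: no migration step is needed.
students.append(
    {"student_id": 2, "student_name": "Ravi", "email": "ravi@example.com"}
)

# Readers must handle the missing field themselves, here with a default.
emails = [s.get("email", "unknown") for s in students]
print(emails)
```

The flexibility has a cost: because nothing enforces the structure, the application code has to cope with records that lack a field.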
• Sharding
• In sharding, a large database is partitioned into smaller, faster and more easily managed databases (shards).
• Classic relational databases follow a vertical architecture in which a single server holds the data, because all the data is related.
• Relational databases do not usually provide sharding by default. Achieving it takes a lot of effort, because transactional integrity (inserting/updating data in transactions), multi-table JOINs, etc. are hard to achieve in a distributed architecture.
• Many NoSQL databases provide sharding as a built-in feature that needs little additional effort.
• They spread the data across servers automatically and route each request to a server that holds the data, while maintaining the integrity of the data.
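One simple way to picture sharding is hash-based partitioning, sketched below with in-memory dictionaries as shards. The helper names (`shard_for`, `put`, `get`) and the shard count are assumptions for illustration. Real systems often use range-based or consistent hashing schemes, so this is not how any particular database does it.

```python
import hashlib

NUM_SHARDS = 3
# Each dict stands in for one database server (shard).
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def put(key: str, value) -> None:
    # The key alone decides which shard stores the record.
    shards[shard_for(key, NUM_SHARDS)][key] = value


def get(key: str):
    # Reads go straight to the one shard that can hold the key.
    return shards[shard_for(key, NUM_SHARDS)].get(key)


for i in range(100):
    put(f"user:{i}", {"id": i})

print([len(s) for s in shards])  # how many records landed on each shard
```

Because the shard is computed from the key, no lookup table is needed. The drawback of plain modulo hashing is that changing `NUM_SHARDS` moves most keys to a different shard.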
• Replication
• Many NoSQL databases also support automatic data replication by default.
• Hence, if one DB server goes down, the data can be served or restored from a copy kept on another server in the network.
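The idea of surviving a server failure through copies can be sketched as follows. `Node` and `ReplicatedStore` are hypothetical classes for this illustration. The sketch copies every write to all reachable nodes synchronously, which is a simplification of what real databases do.

```python
class Node:
    """One database server holding a full copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True


class ReplicatedStore:
    def __init__(self, nodes):
        self.nodes = nodes

    def write(self, key, value):
        # Copy the write to every node that is currently reachable.
        for node in self.nodes:
            if node.up:
                node.data[key] = value

    def read(self, key):
        # Serve the read from the first reachable node.
        for node in self.nodes:
            if node.up:
                return node.data.get(key)
        raise RuntimeError("no replica available")


store = ReplicatedStore([Node("a"), Node("b"), Node("c")])
store.write("order:1", "shipped")

store.nodes[0].up = False          # simulate one server going down
result = store.read("order:1")     # still answered by a surviving copy
print(result)
```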
• Integrated Caching
• Many NoSQL databases support integrated caching, in which frequently requested data is kept in a cache to make queries faster.
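The effect of keeping frequently requested data in a cache can be sketched with a small read-through cache. `CachedStore` is a hypothetical class, the least-recently-used eviction policy is an assumption, and a plain dictionary stands in for the slower backing storage.

```python
from collections import OrderedDict


class CachedStore:
    """Read-through cache in front of a slower backing store (sketch)."""

    def __init__(self, backing, capacity=2):
        self.backing = backing
        self.capacity = capacity
        self.cache = OrderedDict()
        self.backing_reads = 0     # counts how often the slow path is taken

    def get(self, key):
        if key in self.cache:
            # Cache hit: mark the key as most recently used.
            self.cache.move_to_end(key)
            return self.cache[key]
        # Cache miss: read from the backing store and remember the value.
        self.backing_reads += 1
        value = self.backing[key]
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            # Evict the least recently used entry.
            self.cache.popitem(last=False)
        return value


store = CachedStore({"a": 1, "b": 2, "c": 3}, capacity=2)
store.get("a")
store.get("a")   # served from the cache
store.get("b")
print(store.backing_reads)
```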
MongoDB
• MongoDB is a document database. It stores data in BSON, a binary format modelled on JSON.
• A record in MongoDB is a document: a data structure composed of key-value pairs, similar in structure to a JSON object.
• A MongoDB Document:
• Field values may include numbers, strings, booleans, arrays, or even nested documents.
• MongoDB is an open-source, cross-platform, document-oriented NoSQL database written in C++.
• It provides high performance, high availability, and automatic scaling.
• Because MongoDB stores data as documents, it is known as a document-oriented database. Document-oriented storage is a key feature of MongoDB and is simple to program against.
• Example: two documents with different structures:
{ FirstName: "John", Address: "Detroit", Spouse: [{ Name: "Angela" }] }
{ FirstName: "John", Address: "Wick" }
• Features of MongoDB:
These are some important features of MongoDB:
1. Support for ad hoc queries
• In MongoDB, you can search by field, run range queries, and search with regular expressions.
2. Indexing
• You can index any field in a document.
3. Replication
• MongoDB supports replication through replica sets (one primary with secondaries). The older master-slave replication is no longer available in current versions.
4. Duplication of data
• MongoDB can run over multiple servers. The data is duplicated to keep the system up and running in case of hardware failure.
5. Load balancing
• Data placed in shards is balanced across servers automatically.
6. Supports map-reduce and aggregation tools.
7. Uses JavaScript instead of stored procedures.
8. It is a schema-less database written in C++.
9. Provides high performance.
10. Stores files of any size easily without complicating your stack.
Example Document
{
title: "Post Title 1",
body: "Body of post.",
category: "News",
likes: 1,
tags: ["news", "events"],
date: new Date()
}
> use jayshree
switched to db jayshree
> show dbs
admin 0.000GB
config 0.000GB
local 0.000GB
> db.createcollection('student')
2018-10-19T00:27:10.536+0530 E QUERY [js] TypeError: db.createcollection is not a function :
@(shell):1:1
> db.createCollection('student')
{ "ok" : 1 }
> db.student.insert({rollno:1,fname:"Jayshree"})
WriteResult({ "nInserted" : 1 })
> db.student.insert([{rollno:1,fname:"Jayshree"},{rollno:2,fname:"Manisha",lname:"Ghahirwal"}])
BulkWriteResult({
"writeErrors" : [ ],
"writeConcernErrors" : [ ],
"nInserted" : 2,
"nUpserted" : 0,
"nMatched" : 0,
"nModified" : 0,
"nRemoved" : 0,
"upserted" : [ ]
})
> db.student.find()
{ "_id" : ObjectId("5bc8db4537069ea1f9e85044"), "rollno" : 1, "fname" : "Jayshree" }
{ "_id" : ObjectId("5bc8dbdf37069ea1f9e85045"), "rollno" : 1, "fname" : "Jayshree" }
{ "_id" : ObjectId("5bc8dbdf37069ea1f9e85046"), "rollno" : 2, "fname" : "Manisha", "lname" : "Ghahirwal" }
> show
• What is NoSQL?
• A NoSQL database is a non-relational data management system that does not require a fixed schema.
• It avoids joins and is easy to scale.
• NoSQL databases are mainly used for distributed data stores with very large storage needs.
• NoSQL is used for big data and real-time web apps.
• For example, companies like Twitter, Facebook and Google collect terabytes of user data every single day.
• NoSQL stands for “Not Only SQL” or “Not SQL.”
• Though a better term would be “NoREL”, NoSQL caught on. Carlo Strozzi introduced the name NoSQL in 1998.
• A traditional RDBMS uses SQL syntax to store and retrieve data.
• A NoSQL database system, by contrast, encompasses a wide range of database technologies that can store structured, semi-structured, unstructured and polymorphic data.
• Why NoSQL?
• The concept of NoSQL databases became popular with Internet giants like
Google, Facebook, Amazon, etc. who deal with huge volumes of data.
• The system response time becomes slow when you use RDBMS for massive
volumes of data.
• To resolve this problem, we could “scale up” our systems by upgrading the existing hardware, but this is expensive.
• The alternative is to distribute the database load across multiple hosts whenever the load increases.
• This method is known as “scaling out.”
• Features of NoSQL
• Non-relational
• NoSQL databases do not follow the relational model
• They do not provide tables with flat fixed-column records
• They work with self-contained aggregates or BLOBs
• They do not require object-relational mapping or data normalization
• They typically omit complex features such as query languages, query planners, referential integrity, joins and ACID
• Schema-free
• NoSQL databases are either schema-free or have relaxed schemas
• They do not require any definition of the schema of the data
• They allow heterogeneous structures of data in the same domain
• Simple API
• Offers easy-to-use interfaces for storing and querying data
• APIs allow low-level data manipulation and selection methods
• Text-based protocols, mostly HTTP REST with JSON
• Mostly no standards-based NoSQL query language
• Web-enabled databases running as internet-facing services
• Distributed
• Multiple NoSQL databases can be run in a distributed fashion
• Offers auto-scaling and fail-over capabilities
• ACID guarantees are often sacrificed for scalability and throughput
• Mostly no synchronous replication between distributed nodes; asynchronous multi-master replication, peer-to-peer replication or HDFS replication is used instead
• Often provides only eventual consistency
• Shared-nothing architecture, which enables less coordination and higher distribution
• Types of NoSQL Databases
• NoSQL databases are mainly categorized into four types: key-value pair, column-oriented, graph-based and document-oriented.
• Every category has its unique attributes and limitations.
• No single type is the best fit for every problem.
• Users should select the database based on their product needs.
• Types of NoSQL Databases:
• Key-value pair based
• Column-oriented
• Graph-based
• Document-oriented
• NoSQL Data Architecture Patterns
• Architecture Pattern is a logical way of categorizing data that will be stored on the
Database.
• NoSQL databases help to store big data in a suitable format and perform operations on it.
• They are widely used because of their flexibility and wide variety of services.
• Architecture Patterns of NoSQL:
• Data is stored in NoSQL databases in one of the following four data architecture patterns.
1. Key-Value Store Database
2. Column Store Database
3. Document Database
4. Graph Database
• Key-Value Pair Based
• Data is stored in key/value pairs.
• It is designed to handle lots of data and heavy load.
• Key-value pair storage databases store data as a hash table where each key is unique, and the value can be JSON, a BLOB (Binary Large Object), a string, etc.
• 1. Key-Value Store Database:
• This model is one of the most basic models of NoSQL databases.
• As the name suggests, the data is stored in the form of key-value pairs.
• The key is usually a string, an integer or a sequence of characters, but can also be a more advanced data type.
• The value is the data associated with the key.
• This pattern is commonly used in shopping websites and e-commerce applications.
• Advantages:
• Can handle large amounts of data and heavy load.
• Easy retrieval of data by key.
• Limitations:
• Complex queries may involve many key-value pairs, which can hurt performance.
• Many-to-many relationships are hard to represent with plain key-value pairs.
• Examples:
• DynamoDB
• Berkeley DB
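The key-value pattern described above can be sketched with a Python dictionary acting as the hash table. The shopping-cart key and fields are assumptions invented for the example, and `put`, `get` and `carts_containing` are hypothetical helpers rather than the API of any product listed above.

```python
import json

store = {}  # hash table: unique key -> opaque value (bytes)


def put(key: str, value) -> None:
    # Serialize so that the store treats the value as an opaque blob.
    store[key] = json.dumps(value).encode("utf-8")


def get(key: str):
    raw = store.get(key)
    return None if raw is None else json.loads(raw.decode("utf-8"))


def carts_containing(sku: str):
    """Query by value: the store cannot help, so every pair is scanned."""
    keys = []
    for key in store:
        cart = get(key)
        if any(item["sku"] == sku for item in cart["items"]):
            keys.append(key)
    return keys


put("cart:user42", {"items": [{"sku": "A1", "qty": 2}]})
put("cart:user43", {"items": [{"sku": "B7", "qty": 1}]})

cart = get("cart:user42")            # fast: direct lookup by key
matches = carts_containing("B7")     # slow: full scan over all values
print(cart, matches)
```

The two lookups show both sides of the trade-off: retrieval by key is a single hash lookup, while any question about the contents of values needs a scan.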
• Column-based
• Column-oriented databases work on columns and are based on the BigTable paper by Google.
• Every column is treated separately.
• The values of a single column are stored contiguously.
• They deliver high performance on aggregation queries like SUM, COUNT, AVG, MIN, etc., as the data is readily available in a column.
• Column-based NoSQL databases are widely used to manage data warehouses, business intelligence, CRM and library card catalogs.
• HBase, Cassandra and Hypertable are examples of column-based databases.
2. Column Store Database:
• Rather than storing data in relational tuples, the data is stored in individual cells which are further grouped into columns.
• Column-oriented databases store large amounts of data together in columns.
• The format and titles of the columns can differ from one row to another.
• Still, an individual column may group multiple other columns.
• In short, the column is the unit of storage in this type.
• Advantages:
• Data is readily available.
• Queries like SUM, AVERAGE and COUNT can be performed easily on columns.
• Examples:
• HBase
• Bigtable by Google
• Cassandra
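Why aggregates are cheap in a column layout can be seen by storing the same small data set both ways. The sales records below are made up for illustration, and the sketch ignores on-disk details such as compression and column families.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "region": "north", "amount": 120},
    {"id": 2, "region": "south", "amount": 80},
    {"id": 3, "region": "north", "amount": 200},
]

# Column-oriented layout: one contiguous list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate touches only the one column it needs.
total = sum(columns["amount"])
count = len(columns["amount"])
smallest = min(columns["amount"])
print(total, count, smallest)
```

In the row layout the same SUM would have to read every record, including the `id` and `region` fields it does not use.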
• Document-Oriented:
• A document-oriented NoSQL database stores and retrieves data as key-value pairs, but the value part is stored as a document.
• The document is stored in JSON or XML format.
• The value is understood by the DB and can be queried.
• 3. Document Database:
• The document database stores and retrieves data in the form of key-value pairs, where the values are called documents.
• A document can be described as a complex data structure.
• A document can hold text, arrays, strings, JSON, XML or any similar format.
• Nested documents are also very common.
• This is effective because much of the data created today is in JSON form and is unstructured.
• Advantages:
• This format is well suited to semi-structured data.
• Storing, retrieving and managing documents is easy.
• Limitations:
• Handling operations that span multiple documents is challenging.
• Aggregation operations can be harder to get right than in a relational database.
• Examples:
• MongoDB
• CouchDB
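Because a document database understands the value, it can answer queries on fields inside a document, including nested ones. The sketch below imitates that with dictionaries. `get_path` and `find` are hypothetical helpers, the dotted-path notation is only borrowed as a convention, and the `author` sub-document is invented for the example.

```python
posts = [
    {
        "_id": 1,
        "title": "Post Title 1",
        "tags": ["news", "events"],
        "author": {"name": "A. Writer", "city": "Pune"},  # nested document
    },
    {"_id": 2, "title": "Post Title 2", "tags": ["sports"]},  # no author
]


def get_path(doc, path):
    """Follow a dotted path such as 'author.city'; return None if absent."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection, path, value):
    """Return the documents whose field at `path` equals `value`."""
    return [doc for doc in collection if get_path(doc, path) == value]


matches = find(posts, "author.city", "Pune")
print([doc["_id"] for doc in matches])
```

The second post has no `author` field at all, which the query tolerates. That is the schema flexibility at work.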
• Graph-Based
• A graph database stores entities as well as the relations amongst those entities.
• An entity is stored as a node, and relationships are stored as edges.
• An edge gives a relationship between nodes.
• Every node and edge has a unique identifier.
• Compared to a relational database, where tables are loosely connected, a graph database is multi-relational in nature.
• Traversing relationships is fast because they are already captured in the DB and do not need to be calculated.
• Graph databases are mostly used for social networks, logistics and spatial data.
• Neo4J, Infinite Graph, OrientDB and FlockDB are some popular graph-based databases.
4. Graph Databases:
• This architecture pattern deals with the storage and management of data in graphs.
• Graphs are structures that depict connections between two or more objects in some data.
• The objects or entities are called nodes and are joined together by relationships called edges.
• Each edge has a unique identifier.
• Each node serves as a point of contact for the graph.
• This pattern is very commonly used in social networks, where there are a large number of entities and each entity has one or many characteristics which are connected by edges.
• The relational database pattern has tables that are loosely connected, whereas in a graph the connections are explicit and strongly defined.
• Advantages:
• Fast traversal because connections are stored directly.
• Spatial data can be easily handled.
• Limitations:
• Cyclic or incorrect connections can cause a naive traversal to loop indefinitely.
• Examples:
• Neo4J
• FlockDB (used by Twitter)
Figure – Graph model format of NoSQL Databases
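Traversal over stored edges, and the looping risk listed under limitations, can be sketched with an adjacency list. The "follows" relationships are invented for the example. The `visited` set is what keeps the breadth-first traversal from cycling forever.

```python
from collections import deque

# Adjacency list: node -> nodes it points to (hypothetical "follows" edges).
follows = {
    "alice": ["bob", "carol"],
    "bob": ["carol", "alice"],   # edge back to alice creates a cycle
    "carol": ["dave"],
    "dave": [],
}


def reachable(graph, start):
    """Breadth-first traversal; the visited set prevents looping on cycles."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return order


print(reachable(follows, "alice"))  # ['alice', 'bob', 'carol', 'dave']
```

Each step only follows edges that are already stored with the node, which is why relationship-heavy queries avoid the join computation a relational database would need.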
• Advantages of NoSQL
• Can be used as a primary or analytic data source, including as the primary data source for online applications
• Big data capability: handles data velocity, variety, volume and complexity
• No single point of failure
• Easy replication
• No need for a separate caching layer
• Provides fast performance and horizontal scalability
• Can handle structured, semi-structured and unstructured data equally well
• Fits object-oriented programming, which makes it easy to use and flexible
• Does not need a dedicated high-performance server
• Supports key developer languages and platforms
• Simpler to implement than an RDBMS
• Excels at distributed database and multi-data-center operations
• Offers a flexible schema design which can easily be altered without downtime or service disruption
• Query Mechanism tools for NoSQL
• The most common data retrieval mechanism is REST-based retrieval of a value by its key/ID with a GET request on the resource.
• Document store databases offer richer queries because they understand the value in a key-value pair.
• For example, CouchDB allows defining views with MapReduce.
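The MapReduce pattern behind such views can be sketched in a few lines. This is a plain Python illustration of the idea, not CouchDB's actual view API; `map_fn`, `reduce_fn` and `run_view` are hypothetical names, and the documents are sample data.

```python
from collections import defaultdict

docs = [
    {"category": "News", "likes": 1},
    {"category": "News", "likes": 4},
    {"category": "Sports", "likes": 2},
]


def map_fn(doc):
    # Map step: emit (key, value) pairs for each document.
    yield doc["category"], doc["likes"]


def reduce_fn(key, values):
    # Reduce step: combine all values emitted under one key.
    return sum(values)


def run_view(documents):
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


view = run_view(docs)
print(view)  # {'News': 5, 'Sports': 2}
```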
• What is the CAP Theorem?
• The CAP theorem is also called Brewer’s theorem.
• It states that it is impossible for a distributed data store to offer more than two out of three guarantees:
• Consistency
• Availability
• Partition Tolerance
• Consistency:
• The data should remain consistent even after the execution of an
operation.
• This means that once data is written, any future read request should return that data.
• For example, after updating the order status, all the clients should be able
to see the same data.
• Availability:
• The database should always be available and responsive.
• It should not have any downtime.
• Partition Tolerance:
• Partition tolerance means that the system should continue to function even if communication among the servers is unreliable.
• For example, the servers can be partitioned into multiple groups which may not communicate with each other.
• Here, if part of the database is unreachable, the other parts continue to operate.
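The trade-off can be made concrete with a toy cluster of two replicas that lose contact with each other. `Replica`, `Cluster` and the "CP"/"AP" mode labels are assumptions for this sketch. Real systems sit on a spectrum and use far more elaborate protocols.

```python
class Replica:
    def __init__(self):
        self.value = "pending"


class Cluster:
    """Two replicas; clients write through replica `a` (toy model)."""

    def __init__(self, mode):
        self.mode = mode              # "CP" or "AP", labels for this sketch
        self.a, self.b = Replica(), Replica()
        self.partitioned = False

    def write(self, value):
        if not self.partitioned:
            # Normal operation: both replicas receive the write.
            self.a.value = self.b.value = value
            return True
        if self.mode == "CP":
            # Refuse the write: replicas stay consistent, availability is lost.
            return False
        # Accept the write locally: stay available, replicas diverge.
        self.a.value = value
        return True


cp, ap = Cluster("CP"), Cluster("AP")
for cluster in (cp, ap):
    cluster.partitioned = True        # the network between a and b fails

cp_ok = cp.write("shipped")
ap_ok = ap.write("shipped")
print(cp_ok, cp.a.value, cp.b.value)
print(ap_ok, ap.a.value, ap.b.value)
```

During the partition, the CP cluster rejects the order-status update, while the AP cluster accepts it and leaves a reader on replica `b` with stale data until the replicas can synchronize again.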