NoSQL Database
Group paper presentation
INFSYS 6849 Group 1
Vijaya Madhuri Puli
Xinyi Xu
Ashma Singh
What is NoSQL?
A NoSQL database is a non-relational data management system that does not require a fixed schema.
The name is usually read as "Not Only SQL (Structured Query Language)" or "Not SQL." NoSQL
databases are used mainly for distributed data stores with very large storage needs, and they give
developers considerable flexibility to store huge volumes of unstructured data. A SQL database
stores data in tables with rows and columns, while NoSQL databases use several models: document
(JSON documents), key-value (key-value pairs), wide-column (tables with rows and dynamic
columns), and graph (nodes and edges). NoSQL schemas are flexible, whereas SQL schemas are
rigid. NoSQL generally avoids joins and is easy to scale. SQL follows schema on write, while
NoSQL follows schema on read.
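The difference between schema on write and schema on read can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the standard-library sqlite3 module stands in for a SQL database, a plain list of JSON strings stands in for a NoSQL store, and the table, field names, and the helper read_nicknames are hypothetical.

```python
import json
import sqlite3

# Schema on write: the table structure is fixed before any row is stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "Ada"))

try:
    # 'nickname' is not part of the declared schema, so the write is rejected.
    conn.execute(
        "INSERT INTO users (id, name, nickname) VALUES (?, ?, ?)",
        (2, "Grace", "Amazing Grace"),
    )
    write_rejected = False
except sqlite3.OperationalError:
    write_rejected = True

# Schema on read: each record is stored as-is (hypothetical in-memory store);
# the reader decides how to interpret the structure.
stored_documents = [
    json.dumps({"id": 1, "name": "Ada"}),
    json.dumps({"id": 2, "name": "Grace", "nickname": "Amazing Grace"}),
]


def read_nicknames(raw_documents):
    """Interpret documents at read time, tolerating missing fields."""
    return [json.loads(raw).get("nickname") for raw in raw_documents]


nicknames = read_nicknames(stored_documents)
```

In this sketch the relational insert fails because the column was never declared, while the document store accepts both records and leaves it to the reader to handle the missing field.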
Brief History of NoSQL Databases
1998- Carlo Strozzi uses the term NoSQL for his lightweight, open-source relational
database
2000- Graph database Neo4j is launched
2004- Google BigTable is launched
2005- CouchDB is launched
2007- The research paper on Amazon Dynamo is released
2008- Facebook open-sources the Cassandra project
2009- The term NoSQL is reintroduced
The BASE Model:
SQL databases are known for their ACID (Atomicity, Consistency, Isolation, and Durability)
properties. NoSQL relies on the BASE model, which stands for Basically Available, Soft state,
Eventually consistent.
Basically Available:
Basically available refers to the perceived availability of the data. If one node fails, part
of the data won't be available, but the rest of the data layer stays operational. NoSQL databases
achieve this with a highly distributed approach to database management. Rather
than maintaining one large data store and concentrating on the fault tolerance of that store,
they spread data across many storage systems with a high degree of replication. In the worst
case, where a failure disrupts access to a segment of the data, this does not necessarily result
in a complete database outage.
Soft state:
Because of eventual consistency, the state of the data can change without any application
interaction. Without immediate consistency, data values may change over time as updates
propagate. Stores do not have to be write-consistent, and different replicas do not have to be
mutually consistent all the time. In short, soft state means the system state may change even
without new input.
Eventually Consistent:
Eventual consistency means that the system will become consistent over time, provided no new
input arrives. After the application writes data, the data is replicated to different nodes and
eventually reaches a consistent state. Consistency is not guaranteed at the transaction level,
however.
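The three BASE properties can be illustrated with a small simulation. The sketch below is a toy model, not the replication protocol of any particular database: the class names Replica and EventuallyConsistentStore, the node names, and the explicit replicate() step are all assumptions made for illustration. A write lands on one replica first, and the other replicas catch up later.

```python
from collections import deque


class Replica:
    """One storage node holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class EventuallyConsistentStore:
    """Toy store: writes hit the first replica, others are updated later."""

    def __init__(self, replica_names):
        self.replicas = [Replica(name) for name in replica_names]
        self.pending = deque()  # (replica, key, value) updates not yet applied

    def write(self, key, value):
        primary, *others = self.replicas
        primary.data[key] = value
        for replica in others:
            self.pending.append((replica, key, value))

    def read(self, replica_index, key):
        # Reads never block, even if this replica is behind (basically available).
        return self.replicas[replica_index].data.get(key)

    def replicate(self):
        # Background propagation: replica state changes without new input (soft state).
        while self.pending:
            replica, key, value = self.pending.popleft()
            replica.data[key] = value


store = EventuallyConsistentStore(["node-a", "node-b", "node-c"])
store.write("cart:42", ["book"])
stale_read = store.read(1, "cart:42")   # node-b has not received the update yet
store.replicate()
fresh_read = store.read(1, "cart:42")   # all replicas now agree
```

Between the write and the replicate() call, a read from the second node returns nothing, which is the window in which consistency is not guaranteed.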
Types of NoSQL:
Key Value Pair Based:
A key-value store works in a very different way from a relational database. Every data
element in the database is stored as a key-value pair consisting of an attribute name (key) and a
value. In a sense, a key-value store is like a relational database with only two columns: the key or
attribute name and the value.
Example: Redis
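A minimal sketch of the key-value model, assuming an in-memory Python dictionary as the storage layer. The class KeyValueStore and its put/get/delete methods are hypothetical names chosen for illustration; they are not the Redis API.

```python
class KeyValueStore:
    """Toy key-value store: every element is a (key, value) pair."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        # The value is opaque to the store; it can be any Python object.
        self._items[key] = value

    def get(self, key, default=None):
        return self._items.get(key, default)

    def delete(self, key):
        """Remove a key and report whether it was present."""
        if key in self._items:
            del self._items[key]
            return True
        return False


store = KeyValueStore()
store.put("user:1001:name", "Ada")
store.put("user:1001:visits", 3)

name = store.get("user:1001:name")
missing = store.get("user:9999:name", default="unknown")
removed = store.delete("user:1001:visits")
```

All access goes through the key, which is why lookups are simple and fast but queries on the contents of the values are not directly supported in this model.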
Column-based:
Every column is treated separately, and the values of a single column are stored together.
Column-based databases use tables, rows, and columns, but unlike relational databases, the names
and formats of the columns can change from row to row within the same table, which makes them
more flexible.
They deliver high performance on aggregation queries such as SUM, COUNT, AVG, and MIN because
the data for a column is readily available in one place.
Column-based NoSQL databases are widely used for data warehouses, business
intelligence, CRM, and library card catalogs.
Example: Cassandra
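The reason aggregation queries benefit from a column layout can be sketched with plain Python lists. The order data below is made up, and the layout is a simplified in-memory illustration rather than how Cassandra actually stores data on disk.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120},
    {"order_id": 2, "region": "US", "amount": 80},
    {"order_id": 3, "region": "EU", "amount": 50},
]

# Column-oriented layout: one list per column, values stored side by side.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# With rows, an aggregate has to touch every record and pick out one field.
total_from_rows = sum(row["amount"] for row in rows)

# With columns, the aggregate reads a single contiguous list.
amounts = columns["amount"]
total_from_column = sum(amounts)
count = len(amounts)
smallest = min(amounts)
average = total_from_column / count
```

Both layouts give the same totals; the difference is that the column layout only needs to read the one column involved in the query.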
Document-Oriented:
A document database stores data in JSON, BSON, or XML documents. In a document database,
documents can be nested. Specific elements can be indexed for faster querying.
A document-oriented NoSQL database stores and retrieves data as key-value pairs, but the value
part is stored as a document. Document databases are commonly used for content management
systems (CMS), blogging platforms, real-time analytics, and e-commerce applications.
Example: MongoDB
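As a rough illustration of nesting and indexing, the sketch below stores two JSON documents under their keys and builds an index on a nested element. It uses only the standard-library json module; the collection, the blog-post fields, and the helper build_tag_index are hypothetical and do not represent MongoDB's API.

```python
import json

# Two documents with nested structure; the second has a field the first lacks.
posts = [
    {
        "_id": "post-1",
        "title": "Intro to NoSQL",
        "author": {"name": "Ada", "email": "ada@example.com"},
        "tags": ["nosql", "databases"],
        "comments": [{"user": "bob", "text": "Nice overview"}],
    },
    {
        "_id": "post-2",
        "title": "Graph basics",
        "author": {"name": "Grace"},
        "tags": ["nosql", "graphs"],
        "pinned": True,
    },
]

# Key -> serialized document, as in a key-value pair whose value is a document.
collection = {post["_id"]: json.dumps(post) for post in posts}


def build_tag_index(documents):
    """Index one element ('tags') so lookups avoid scanning every document."""
    index = {}
    for doc_id, raw in documents.items():
        for tag in json.loads(raw).get("tags", []):
            index.setdefault(tag, set()).add(doc_id)
    return index


tag_index = build_tag_index(collection)
author_name = json.loads(collection["post-1"])["author"]["name"]
```

A lookup such as tag_index["graphs"] then returns the matching document keys without reading the whole collection.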
Graph-Based:
A graph database stores entities as well as the relationships among those entities. Each entity
is stored as a node, and each relationship as an edge connecting two nodes.
Every node and edge has a unique identifier. Only a few real-world business systems can rely
solely on graph queries. As a result, graph databases are usually run in conjunction with other,
more traditional databases.
Use cases include fraud detection, social networks, and knowledge graphs.
Example: Neo4J
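The node-and-edge model can be sketched with plain dictionaries. The people, the FRIEND edge type, and the helpers neighbors and friends_of_friends are invented for illustration; a real graph database such as Neo4j would express this with its own query language instead.

```python
# Every node and edge carries a unique identifier.
nodes = {
    "n1": {"label": "Person", "name": "Ana"},
    "n2": {"label": "Person", "name": "Ben"},
    "n3": {"label": "Person", "name": "Cy"},
    "n4": {"label": "Person", "name": "Di"},
}
edges = [
    {"id": "e1", "source": "n1", "target": "n2", "type": "FRIEND"},
    {"id": "e2", "source": "n2", "target": "n3", "type": "FRIEND"},
    {"id": "e3", "source": "n3", "target": "n4", "type": "FRIEND"},
]


def neighbors(node_id, edge_type):
    """Nodes connected to node_id by an edge of the given type (undirected)."""
    found = set()
    for edge in edges:
        if edge["type"] != edge_type:
            continue
        if edge["source"] == node_id:
            found.add(edge["target"])
        elif edge["target"] == node_id:
            found.add(edge["source"])
    return found


def friends_of_friends(node_id):
    """People two FRIEND hops away who are not already direct friends."""
    direct = neighbors(node_id, "FRIEND")
    second_hop = set()
    for friend in direct:
        second_hop |= neighbors(friend, "FRIEND")
    return second_hop - direct - {node_id}


suggestions = [nodes[n]["name"] for n in sorted(friends_of_friends("n1"))]
```

Traversals of this kind, following relationships hop by hop, are the typical query in social network and fraud detection use cases.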
When and why should we use NoSQL:
Response times slow down when relational database management systems handle massive volumes
of data. One way to address this is to scale up by upgrading the existing hardware, but this is
expensive. The alternative is to distribute the database load across multiple hosts whenever the
load increases. This method is known as “scaling out.”
NoSQL databases were developed during the Internet era in response to the inability of SQL
databases to address the requirements of web-scale applications that handle huge volumes of
data and traffic.
• Development with NoSQL databases can be much faster than with a SQL database.
• The structure of many different forms of data is more easily handled and evolved with a
NoSQL database.
• The amount of data in many applications cannot be served at reasonable cost by a SQL
database.
• The scale of traffic and the need for zero downtime are difficult to handle with a SQL database.
• New application paradigms can be more easily supported.
Features of NoSQL:
Flexible data models
NoSQL databases typically have very flexible schemas. A flexible schema allows you to
easily make changes to your database as requirements change. You can iterate quickly
and continuously integrate new application features to provide value to your users faster.
Horizontal scaling
Most NoSQL databases allow you to scale out horizontally, meaning you can add
cheaper, commodity servers whenever you need to. SQL databases, by contrast, typically
require you to scale up vertically when you exceed the capacity of your
current server.
Fast queries
Data in SQL databases is typically normalized, so queries for a single object or entity
require you to join data from multiple tables. As the volume of data increases, these joins
across multiple tables become complex and expensive. Data in NoSQL databases, however, is
typically stored in a way that is optimized for queries: data that is accessed
together is stored together. Such queries do not require joins, so they are typically very
fast.
Easy for developers
Some NoSQL databases, such as MongoDB, map their data structures to those of popular
programming languages. This mapping lets developers store data in the same
form they use in application code, so they write less code,
which leads to quicker development and fewer bugs.
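The "fast queries" and "easy for developers" points can be illustrated by comparing a normalized layout with an embedded document. The sketch uses in-memory Python structures with made-up customer and order data; the two function names are hypothetical, and the comparison is conceptual rather than a benchmark.

```python
# Normalized layout: customers and orders live in separate "tables".
customers = {1: {"name": "Ada"}}
orders = [
    {"order_id": 10, "customer_id": 1, "total": 30},
    {"order_id": 11, "customer_id": 1, "total": 45},
]


def customer_with_orders_normalized(customer_id):
    """Assemble the object by joining the two tables on customer_id."""
    customer = customers[customer_id]
    matching = [o for o in orders if o["customer_id"] == customer_id]  # join step
    return {
        "name": customer["name"],
        "orders": [{"order_id": o["order_id"], "total": o["total"]} for o in matching],
    }


# Document layout: data that is accessed together is stored together.
customer_documents = {
    1: {
        "name": "Ada",
        "orders": [
            {"order_id": 10, "total": 30},
            {"order_id": 11, "total": 45},
        ],
    }
}


def customer_with_orders_embedded(customer_id):
    """One lookup by key returns the whole object; no join is needed."""
    return customer_documents[customer_id]


from_join = customer_with_orders_normalized(1)
from_document = customer_with_orders_embedded(1)
```

Both paths return the same object, and the embedded form already has the shape the application code works with. The trade-off is that embedded data may be duplicated across documents and must be kept in sync by the application.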
New technical issues that NoSQL addresses:
Relational databases were introduced in the era of mainframes and business applications – long
before the internet, the cloud, big data, mobile, and today’s massively interactive enterprise.
The internet is connecting everything:
Supporting many diverse things with different data structures and different hardware/software
updates, which generate a variety of data and continuous streams of real-time data.
More customers are going online:
Scaling to support millions of users, meeting UX requirements with consistently
high performance, and maintaining availability 24/7.
Big data is getting bigger:
Storing customer-generated semi-structured and unstructured data, storing different types of data
from multiple sources together, and storing data generated by millions of customers and products.
Applications are moving to the cloud:
Scaling on demand to support a growing number of customers and store additional data, operating
applications on a global scale for customers worldwide, minimizing infrastructure costs, and
achieving a rapid time to market.
The world has gone mobile:
Creating offline apps that do not require a network connection, synchronizing mobile data
with remote databases in the cloud, and supporting multiple mobile platforms with a single
backend.
Scaling in NoSQL
Database scaling refers to the ability to scale a database up or out so that it can hold more data
without losing efficiency. Scaling out usually entails sharding the database across several
database servers in a distributed cluster, while scaling up entails increasing the computing power
and resources of a single database server.
NoSQL uses horizontal scaling: it scales by increasing the number of servers. Horizontal scaling
is based on partitioning the data so that each node holds part of it. One reason users prefer
horizontal scaling is that servers can be added in parallel as needed. As a result, users can store
more data, but they also inherit the problems of a distributed system. The lack of a shared
address space complicates data sharing in distributed computing. It also increases the cost of
exchanging, passing, or modifying data because copies of the data must be passed between nodes.
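Partitioning data across nodes can be sketched with hash-based sharding. This is one common placement scheme, shown here as an assumption rather than as the method of any specific NoSQL product; the names shard_for and ShardedStore are hypothetical, and each "server" is just a Python dictionary.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard number using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Toy horizontally scaled store: each shard holds part of the data."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(shard_count=4)
for i in range(100):
    store.put(f"user:{i}", {"id": i})

shard_sizes = [len(shard) for shard in store.shards]
user_seven = store.get("user:7")
```

Each key lives on exactly one shard, so capacity grows with the number of shards. With this simple modulo placement, changing shard_count sends most keys to a different shard, which is a small example of the data-movement cost that comes with a distributed system.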
Difference between SQL and NoSQL
The main difference between SQL and NoSQL databases is that the former are relational and the
latter are non-relational or distributed. SQL databases use Structured Query Language (SQL) to
define and manipulate data. SQL is one of the most versatile and widely used options, making it a
safe choice, especially for large, complex queries. It can be restrictive at the same time. For
instance, SQL requires predefined schemas that determine the structure of the data before the
user can work with it. In addition, all data must follow the same structure to avoid difficulties
or disruption in the system. NoSQL, on the other hand, has a dynamic schema for unstructured
data. Data can be stored in a variety of ways, including document-oriented, column-oriented,
graph-based, or as a key-value store. Because of this flexibility, documents can be created
without first defining their structure, and each document may have its own structure.
In terms of scalability, SQL databases are vertically scalable in most cases, whereas
NoSQL databases are horizontally scalable. This means that adding more servers and
sharding a NoSQL database lets it handle more traffic. A NoSQL database can therefore grow
very large, which makes it a preferred choice for large data sets.
SQL databases are usually table-based, while NoSQL databases come in multiple structures,
such as key-value pairs, document-based, graph databases, or wide-column stores. Relational
SQL databases are a safer choice for applications that include multi-row transactions,
such as accounting systems, or for legacy systems that were designed with a relational structure
in mind.
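SQL databases follow the ACID properties (Atomicity, Consistency, Isolation, and Durability),
while NoSQL databases are usually described in terms of Consistency, Availability, and Partition
tolerance (CAP). CAP is also known as Brewer's CAP theorem, which states that a distributed
system can guarantee at most two of these three properties at the same time.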
SQL databases have excellent vendor support available, whereas NoSQL databases must often
depend on community support and a limited pool of outside experts for setting up
large-scale deployments. MySQL, Microsoft SQL Server, Oracle, and PostgreSQL
are a few examples of SQL databases, whereas RavenDB, Cassandra, MongoDB, Neo4j,
CouchDB, Redis, Bigtable, and HBase are examples of NoSQL databases.
SQL vs NoSQL Terminology
The picture below shows a few terms used by NoSQL alongside their SQL counterparts.
Challenges of NoSQL
The promise of the NoSQL database has generated a lot of enthusiasm, but there are many
obstacles to overcome before they can appeal to mainstream enterprises. Here are a few of the
top challenges.
Maturity
RDBMS systems have been around for a long time. NoSQL advocates will argue that their
advancing age is a sign of their obsolescence, but for most CIOs, the maturity of the RDBMS is
reassuring. For the most part, RDBMS systems are stable and richly functional. In comparison,
most NoSQL alternatives are in pre-production versions with many key features yet to be
implemented. Living on the technological leading edge is an exciting prospect for many
developers, but enterprises should approach it with extreme caution.
Expertise
There are literally millions of developers throughout the world, and in every business segment,
who are familiar with RDBMS concepts and programming. In contrast, almost every NoSQL
developer is in a learning mode. This situation will resolve itself naturally over time, but for
now, it's far easier to find experienced RDBMS programmers or administrators than a NoSQL expert.
Analytics and Business Intelligence
Business intelligence (BI) is a key IT issue for all medium to large companies. NoSQL databases
offer few facilities for ad-hoc query and analysis. Even a simple query requires significant
programming expertise, and commonly used BI tools do not provide connectivity to NoSQL.
Some relief is provided by the emergence of solutions which can provide easier access to data
held in Hadoop clusters and perhaps eventually, other NoSQL databases. Quest Software has
developed a product -- Toad for Cloud Databases -- that can provide ad-hoc query capabilities to
a variety of NoSQL databases.
Support and Administration
The design goals for NoSQL may be to provide a zero-admin solution, but the current reality
falls well short of that goal. NoSQL today requires a lot of skill to install and a lot of effort to
maintain.
Enterprises want the reassurance that if a key system fails, they will be able to get timely and
competent support. All RDBMS vendors go to great lengths to provide a high level of enterprise
support.
In contrast, each NoSQL database tends to be open-source, with just one or two firms
handling the support angle. Many of them have been developed by smaller startups that lack
the resources to fund support on a global scale, as well as the credibility that established
RDBMS vendors like Oracle, IBM, and Microsoft enjoy.
Advantages
Regardless of these obstacles, NoSQL databases have been widely adopted in many enterprises
for the following reasons:
Scalability and Economic Advantage
RDBMSs are not as easy to scale out on commodity clusters, whereas NoSQL databases are
made for transparent expansion, taking advantage of new nodes. These databases are designed
for use with low-cost commodity hardware. For years, database administrators have relied on
scale up - buying bigger servers as database load increases, rather than scale out - distributing the
database across multiple hosts as load increases. As transaction rates and availability
requirements increase, and as databases move into the cloud or onto virtualized environments,
the economic advantages of scaling out on commodity hardware become irresistible. NoSQL
databases typically use clusters of cheap commodity servers to manage the exploding data and
transaction volumes. The result is that the cost per gigabyte or transaction/second for NoSQL can
be many times less than the cost for RDBMS, allowing you to store and process more data at a
much lower price.
Big Data Applications
Given that transaction rates are growing beyond recognition, there is a need to store massive
volumes of data. While RDBMSs have grown to match these needs, it is difficult to realistically
use one RDBMS to manage such data volumes. NoSQL databases, however, handle these volumes
easily.
Flexible Data Model
Change management is a big headache for large production RDBMS. Even minor changes to the
data model of an RDBMS have to be carefully managed and may necessitate downtime or
reduced service levels.
NoSQL databases have far more relaxed -- or even nonexistent -- data model restrictions.
NoSQL Key Value stores and document databases allow the application to store virtually any
structure it wants in a data element. Even the more rigidly defined BigTable-based NoSQL
databases (Cassandra, HBase) typically allow new columns to be created without too much fuss.
The result is that application changes and database schema changes do not have to be managed
as one complicated change unit. In theory, this will allow applications to iterate faster, though,
clearly, there can be undesirable side effects if the application fails to manage data integrity.
Database Administration
The best RDBMSs require the services of expensive administrators to design, install and
maintain the systems. On the other hand, NoSQL databases require much less hands-on
management, with data distribution and auto repair capabilities, simplified data models and
fewer tuning and administration requirements. However, in practice, someone will always be
needed to take care of performance and availability of databases.
Broad Functionality
Most relational databases support the same features but in a slightly different way, so they are all
similar.
NoSQL databases, in contrast, come in four core types: key-value, column based, document, and
graph based. Within these types, you can find a database to suit your particular needs. With so
much choice, you’re bound to find a NoSQL database that will solve your application needs.
NoSQL databases provide support for a range of data structures.
• Simple binary values, lists, maps, and strings can be handled at high speed in key-value
stores.
• Related information values can be grouped in column families within Bigtable clones.
• Highly complex parent-child hierarchical structures can be managed within document
databases.
• A web of interrelated information can be described flexibly in graph based stores.
Popular NoSQL Databases Examples
The following list describes popular NoSQL databases:
MongoDB: The most popular open-source NoSQL system. MongoDB is a document-oriented
database that stores JSON-like documents in dynamic schemas. Craigslist, eBay, and Foursquare
use MongoDB.
CouchDB: An open source, web-oriented database developed by Apache. CouchDB uses the
JSON data exchange format to store its documents; JavaScript for indexing, combining, and
transforming documents; and HTTP for its API.
HBase: An open source Apache project that was developed as a part of Hadoop. HBase is a
column store database written in Java.
Oracle NoSQL Database: Oracle’s NoSQL database, different from its relational database
product.
Cassandra DB: A distributed database that excels at handling extremely large amounts of
structured data. Cassandra DB is also highly scalable. Cassandra DB was created at Facebook. It
is used by Instagram, Comcast, Apple, and Spotify.
Riak: An open-source key-value store database. Riak has built-in fault-tolerant
replication and automatic data distribution that enable it to offer excellent performance.
InfoGrid: An open source web graph database for creating RESTful Web applications that use
graphed data.
InfiniteGraph: A highly specialized graph database that focuses on graph data structures.
InfiniteGraph is useful for finding hidden relationships in big data. It is implemented in Java.
Conclusion
NoSQL databases have one important thing in common: they do not rely on the traditional row-
and-column schema that relational databases use. NoSQL databases are becoming an
increasingly important part of the database landscape, and when used appropriately, can offer
real benefits. However, enterprises should proceed with caution with full awareness of the
legitimate limitations and issues that are associated with these databases.