Introduction to NoSQL Database
What is RDBMS
RDBMS: relational database management system.
Relation: a relation is a 2D table which has the following features:
Name
Attributes
Tuple
Issues with RDBMS- Scalability
Scaling up is a problem when the dataset is simply too big, e.g. Big Data.
RDBMS were not designed to be distributed.
The alternative is a multi-node database solution, known as 'horizontal scaling'.
Different approaches include:
Master-slave
Sharding
Scaling RDBMS
Master-Slave
• All writes are written to the master. All reads are performed against the replicated slave databases.
• Critical reads may be incorrect as writes may not have been propagated down.
• Large data sets can pose problems as the master needs to duplicate data to slaves.
Sharding
• Scales well for both reads and writes.
• Not transparent: the application needs to be partition-aware.
• Can no longer have relationships or joins across partitions.
• Loss of referential integrity across shards.
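To make "partition-aware" concrete, here is a minimal sketch of hash-based shard routing. The class name ShardedStore and the use of in-memory maps as stand-ins for database nodes are assumptions made for illustration, not part of any real product.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: each "shard" is an in-memory map standing in for a database node.
class ShardedStore {
    private final List<Map<String, String>> shards = new ArrayList<>();

    ShardedStore(int shardCount) {
        for (int i = 0; i < shardCount; i++) {
            shards.add(new HashMap<>());
        }
    }

    // The application itself decides which shard owns a key (partition-awareness).
    int shardFor(String key) {
        return Math.floorMod(key.hashCode(), shards.size());
    }

    void put(String key, String value) {
        shards.get(shardFor(key)).put(key, value);
    }

    String get(String key) {
        return shards.get(shardFor(key)).get(key);
    }

    public static void main(String[] args) {
        ShardedStore store = new ShardedStore(4);
        store.put("user:1", "Alice");
        store.put("user:2", "Bob");
        System.out.println("user:1 is stored on shard " + store.shardFor("user:1"));
        System.out.println("user:2 -> " + store.get("user:2"));
    }
}
```

Because every key is looked up on exactly one shard, a query that needs rows from two shards has to be assembled by the application, which is why joins and referential integrity across shards are lost.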
What is NoSQL
Stands for Not Only SQL. The term was coined by Carlo Strozzi and later redefined by Eric Evans.
Class of non-relational data storage systems.
Do not require a fixed table schema, nor do they use the concept of joins.
Relax one or more of the ACID properties (Atomicity, Consistency, Isolation, Durability), with the trade-offs described by the CAP theorem.
Need of NoSQL
Explosion of social media sites (Facebook, Twitter, Google etc.)
with large data needs. (Sharding is a problem)
Rise of cloud-based solutions such as Amazon S3 (Simple Storage Service).
Just as development moved to dynamically-typed languages (Ruby/Groovy), there is a shift to dynamically-typed data with frequent schema changes.
Expansion of the open-source community.
A NoSQL solution is more acceptable to a client now than a year ago.
NoSQL Types
NoSQL databases are classified into four types:
• Key Value pair based
• Column based
• Document based
• Graph based
Key Value Pair Based
Designed for processing dictionaries. A dictionary contains a collection of records, each having fields containing data.
Records are stored and retrieved using a key that
uniquely identifies the record, and is used to quickly
find the data within the database.
Examples: Oracle NoSQL Database, Riak, etc.
We use it for storing session information, user profiles,
preferences, shopping cart data.
We would avoid it when we need to query data having
relationships between entities.
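The access pattern described above, where a record is stored and fetched only by its unique key, can be sketched with an in-memory map. The class SessionStore and its method names are hypothetical; a real key-value database would add persistence and distribution on top of the same idea.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for a key-value store holding session data.
class SessionStore {
    private final Map<String, String> data = new ConcurrentHashMap<>();

    // The value is opaque to the store; only the key is used for lookup.
    void put(String sessionId, String payload) {
        data.put(sessionId, payload);
    }

    Optional<String> get(String sessionId) {
        return Optional.ofNullable(data.get(sessionId));
    }

    void delete(String sessionId) {
        data.remove(sessionId);
    }

    public static void main(String[] args) {
        SessionStore store = new SessionStore();
        store.put("session-42", "cart=book,pen");
        System.out.println(store.get("session-42").orElse("no such session"));
        store.delete("session-42");
        System.out.println(store.get("session-42").isPresent());
    }
}
```

There is no way to ask this store "which sessions contain a pen" without scanning every value, which illustrates why key-value stores are a poor fit for queries over relationships.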
Column based
It stores data as column families containing rows that have many columns associated with a row key.
Each row can have different columns.
Column families are groups of related data that
is accessed together.
Example: Cassandra, HBase, Hypertable,
and Amazon DynamoDB.
We use it for content management systems,
blogging platforms, log aggregation.
We would avoid it for systems that are in early development or have changing query patterns.
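One way to picture a column family is as a map from row key to a map of column names and values, so that rows need not share the same columns. The sketch below assumes that simplified model; the class ColumnFamily and the sample keys are made up for illustration and do not mirror any specific database API.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of one column family: row key -> (column name -> value).
class ColumnFamily {
    private final Map<String, Map<String, String>> rows = new TreeMap<>();

    // Adds or overwrites a single column of a row; other rows are unaffected.
    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    String get(String rowKey, String column) {
        return row(rowKey).get(column);
    }

    // Returns all columns stored for a row (empty if the row does not exist).
    Map<String, String> row(String rowKey) {
        return rows.getOrDefault(rowKey, Collections.emptyMap());
    }

    public static void main(String[] args) {
        ColumnFamily posts = new ColumnFamily();
        posts.put("post:1", "title", "Intro to NoSQL");
        posts.put("post:1", "tags", "nosql,intro");
        posts.put("post:2", "title", "Hadoop basics"); // this row has no "tags" column
        System.out.println(posts.row("post:1"));
        System.out.println(posts.row("post:2"));
    }
}
```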
Document Based
The database stores and retrieves documents. It stores documents in the value part of the key-value store.
Documents are self-describing, hierarchical tree data structures consisting of maps, collections, and scalar values.
Examples: Lotus Notes, MongoDB, CouchDB, OrientDB, RavenDB.
We use it for content management systems, blogging platforms, web analytics, real-time analytics, e-commerce applications.
We would avoid it for systems that need complex
transactions spanning multiple operations or queries
against varying aggregate structures.
Graph Based
Store entities and relationships between
these entities as nodes and edges of a graph
respectively. Entities have properties.
Traversing the relationships is very fast as
relationship between nodes is not
calculated at query time but is actually
persisted as a relationship.
Examples: Neo4j, InfiniteGraph, OrientDB, FlockDB.
It is well suited for connected data, such as
social networks, spatial data, routing
information for goods and supply.
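The point that relationships are persisted rather than computed at query time can be sketched with an adjacency structure: each node keeps its edges, so a traversal only follows stored links. The class SocialGraph and the sample names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory graph: nodes are people, edges are stored friendships.
class SocialGraph {
    private final Map<String, Set<String>> edges = new HashMap<>();

    // Persist the relationship in both directions when it is created.
    void addFriendship(String a, String b) {
        edges.computeIfAbsent(a, k -> new LinkedHashSet<>()).add(b);
        edges.computeIfAbsent(b, k -> new LinkedHashSet<>()).add(a);
    }

    // Two-hop traversal: friends of friends who are not already direct friends.
    Set<String> friendsOfFriends(String person) {
        Set<String> direct = edges.getOrDefault(person, Collections.emptySet());
        Set<String> result = new TreeSet<>();
        for (String friend : direct) {
            for (String candidate : edges.getOrDefault(friend, Collections.emptySet())) {
                if (!candidate.equals(person) && !direct.contains(candidate)) {
                    result.add(candidate);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SocialGraph graph = new SocialGraph();
        graph.addFriendship("Ann", "Bob");
        graph.addFriendship("Bob", "Cid");
        graph.addFriendship("Cid", "Dee");
        System.out.println(graph.friendsOfFriends("Ann"));
    }
}
```

In a relational model the same question would typically need a self-join on a friendship table, whereas here the traversal cost depends only on the edges actually followed.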
Features of NoSQL
• Design simplicity
• Simpler horizontal scaling to clusters of machines. This was a
problem in relational databases
• More control over data availability.
Advantages of NoSQL
• Data Storage
• Support for unstructured text
• Ability to handle change over time
• No reliance on SQL magic
• Ability to scale horizontally on commodity hardware
• Support for multiple data structures
• Big data application
• Economy
What is not provided by NoSQL
• Joins
• Group by
• ACID transactions
• SQL
• Integration with applications that are based on SQL
Where to use NoSQL
• NoSQL data storage systems make sense for applications that process very large semi-structured data, like log analysis, social networking feeds, time-based data.
• To improve programmer productivity by using a database that
better matches an application's needs.
• To improve data access performance via some combination of
handling larger data volumes, reducing latency, and improving
throughput.
Summary
• All the choices provided by the rise of NoSQL databases do not mean the demise of RDBMS, as relational databases are a powerful tool.
• We are entering an era of Polyglot persistence, a technique that
uses different data storage technologies to handle varying data
storage needs. It can apply across an enterprise or within an
individual application.
MongoDB
MongoDB is a NoSQL database. It is an open-source, cross-platform, document-oriented database written in C++.
What is MongoDB
• MongoDB is a scalable, open-source, high-performance, document-oriented database.
• It provides high performance, high availability, and automatic scaling.
Purpose of building MongoDB
"What was the need for MongoDB when there were already many databases in use?"
• Scalability
• Performance
• High Availability
• Scaling from single server deployments to large,
complex multi-site architectures.
Key points of MongoDB
• Develop Faster
• Deploy Easier
• Scale Bigger
Example of document oriented database
Document 1:
FirstName = "John",
Address = "Detroit",
Spouse = [{Name: "Angela"}]
Document 2:
FirstName = "John",
Address = "Wick"
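The two documents above share a collection even though only one has a Spouse field. A rough sketch of that idea using plain Java maps follows; this is not the MongoDB driver API, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: documents modelled as nested maps, with no fixed schema.
class DocumentSketch {
    static Map<String, Object> johnFromDetroit() {
        Map<String, Object> spouse = new LinkedHashMap<>();
        spouse.put("Name", "Angela");
        List<Map<String, Object>> spouses = new ArrayList<>();
        spouses.add(spouse);

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("FirstName", "John");
        doc.put("Address", "Detroit");
        doc.put("Spouse", spouses); // nested array of sub-documents
        return doc;
    }

    static Map<String, Object> johnFromWick() {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("FirstName", "John");
        doc.put("Address", "Wick"); // no Spouse field at all
        return doc;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> collection = new ArrayList<>();
        collection.add(johnFromDetroit());
        collection.add(johnFromWick());
        for (Map<String, Object> doc : collection) {
            System.out.println(doc.keySet());
        }
    }
}
```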
Features of MongoDB
• Supports ad hoc queries
• Indexing
• Replication
• Duplication of data
• Load balancing
• Supports map reduce and aggregation tools.
• Uses JavaScript instead of Procedures.
• It is a schema-less database written in C++.
• Provides high performance
• Stores files of any size easily without complicating
your stack.
MongoDB advantages over RDBMS
MongoDB Advantages
Distinctive features of MongoDB
Where MongoDB should be used?
Performance analysis of MongoDB and RDBMS
XML
• XML stands for Extensible Markup Language
• XML tags identify the data and are used to store
and organize the data
Characteristics of XML
• XML is extensible
• XML carries the data, does not present it
• XML is a public standard
• XML can be used to create new internet languages
Advantages of XML
• XML separates data from HTML
• XML simplifies data sharing
• XML simplifies data transport
• XML simplifies platform change
• XML increases data availability
• XML can be used to create new internet
languages
XML - Tags
• XML tags form the foundation of XML.
• They define the scope of an element in
XML.
• They can also be used to insert comments, declare settings required for the parsing environment, and insert special instructions.
Start Tag End Tag
<address> </address>
XML - Elements
XML elements can be defined as the building blocks of an XML document. Elements can behave as containers to hold text, elements, attributes, media objects or all of these.
<element-name attribute1 attribute2>
....content
</element-name>
XML DTD
DTD stands for Document Type Definition. It
defines the legal building blocks of an XML
document. It is used to define document structure
with a list of legal elements and attributes
Purpose of DTD
Its main purpose is to define the structure of an XML document. It contains a list of legal elements and defines the structure with the help of them.
employee.xml
<?xml version="1.0"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
<firstname>vimal</firstname>
<lastname>jaiswal</lastname>
<email>vimal@javatpoint.com</email>
</employee>
employee.dtd
<!ELEMENT employee (firstname,lastname,email)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT email (#PCDATA)>
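As a rough illustration of how a parser enforces these declarations, the sketch below validates an employee document with the JDK's built-in DOM parser. To stay self-contained it embeds the same element declarations as an internal DTD subset instead of reading employee.dtd from disk; the class name DtdCheck is an assumption.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical helper: validates an <employee> document against the DTD rules above.
class DtdCheck {
    // Same declarations as employee.dtd, embedded as an internal subset.
    static final String DOCTYPE =
        "<!DOCTYPE employee [\n"
        + "<!ELEMENT employee (firstname,lastname,email)>\n"
        + "<!ELEMENT firstname (#PCDATA)>\n"
        + "<!ELEMENT lastname (#PCDATA)>\n"
        + "<!ELEMENT email (#PCDATA)>\n"
        + "]>\n";

    static boolean isValid(String employeeXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Treat validity errors as failures instead of the default "log and continue".
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXParseException { throw e; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            String document = "<?xml version=\"1.0\"?>\n" + DOCTYPE + employeeXml;
            builder.parse(new InputSource(new StringReader(document)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String complete = "<employee><firstname>vimal</firstname>"
            + "<lastname>jaiswal</lastname><email>vimal@javatpoint.com</email></employee>";
        String incomplete = "<employee><firstname>vimal</firstname></employee>";
        System.out.println("complete document valid: " + isValid(complete));
        System.out.println("incomplete document valid: " + isValid(incomplete));
    }
}
```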
XML Schema
XML Schema is a language used for expressing constraints on XML documents.
An XML schema is used to define the structure of
an XML document. It is like DTD but provides
more control on XML structure.
XML Schema Example
employee.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.javatpoint.com"
xmlns="http://www.javatpoint.com"
elementFormDefault="qualified">
<xs:element name="employee">
<xs:complexType>
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
<xs:element name="email" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
XML Schema Example
employee.xml
<?xml version="1.0"?>
<employee
xmlns="http://www.javatpoint.com"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.javatpoint.com employee.xsd">
<firstname>vimal</firstname>
<lastname>jaiswal</lastname>
<email>vimal@javatpoint.com</email>
</employee>
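A minimal sketch of checking an instance document against this schema with the JDK's javax.xml.validation API is shown below. The schema text is inlined as a string so the example runs without employee.xsd on disk; the class name XsdCheck is an assumption.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

// Hypothetical helper: validates XML text against the employee schema shown above.
class XsdCheck {
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\""
        + " targetNamespace=\"http://www.javatpoint.com\""
        + " xmlns=\"http://www.javatpoint.com\""
        + " elementFormDefault=\"qualified\">"
        + "<xs:element name=\"employee\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"firstname\" type=\"xs:string\"/>"
        + "<xs:element name=\"lastname\" type=\"xs:string\"/>"
        + "<xs:element name=\"email\" type=\"xs:string\"/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no custom error handler, validation errors are thrown as exceptions.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String inOrder = "<employee xmlns=\"http://www.javatpoint.com\">"
            + "<firstname>vimal</firstname><lastname>jaiswal</lastname>"
            + "<email>vimal@javatpoint.com</email></employee>";
        String outOfOrder = "<employee xmlns=\"http://www.javatpoint.com\">"
            + "<email>vimal@javatpoint.com</email><firstname>vimal</firstname>"
            + "<lastname>jaiswal</lastname></employee>";
        System.out.println("in order: " + isValid(inOrder));
        System.out.println("out of order: " + isValid(outOfOrder));
    }
}
```

The second document fails because xs:sequence requires the child elements in the declared order.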
DTD vs XSD
No. | DTD | XSD
1) | DTD stands for Document Type Definition. | XSD stands for XML Schema Definition.
2) | DTDs are derived from SGML syntax. | XSDs are written in XML.
3) | DTD doesn't support datatypes. | XSD supports datatypes for elements and attributes.
4) | DTD doesn't support namespaces. | XSD supports namespaces.
5) | DTD offers only limited control over the order and occurrence of child elements. | XSD offers finer control over the order and occurrence of child elements.
6) | DTD is not extensible. | XSD is extensible.
7) | DTD is not simple to learn. | XSD is simple to learn because you don't need to learn a new language.
8) | DTD provides less control over XML structure. | XSD provides more control over XML structure.
XQuery
XQuery is a functional query language built on XPath expressions.
It is used to retrieve information stored in XML format and was designed to query XML data. XQuery is to XML what SQL is to relational databases.
"XQuery is a standardized language for combining
documents, databases, Web pages and almost anything
else. It is very widely implemented. It is powerful and
easy to learn. XQuery is replacing proprietary
middleware languages and Web Application
development languages. XQuery is replacing complex
Java or C++ programs with a few lines of code.
XQuery is simpler to work with and easier to maintain
than many other alternatives."
XQuery is a functional language which is responsible
for finding and extracting elements and attributes from
XML documents.
It can be used for the following:
To extract information for use in a web service.
To generate summary reports.
To transform XML data to XHTML.
To search web documents for relevant information.
XQuery Features
• XQuery is a functional language. It is used to retrieve and query XML-based data.
• XQuery is an expression-oriented programming language with a simple type system.
• XQuery is analogous to SQL: as SQL is the query language for relational databases, XQuery is the query language for XML.
• XQuery is XPath based and uses XPath expressions to navigate through XML documents.
• XQuery is a W3C standard and is supported by many major databases.
Advantages of XQuery
• XQuery can be used to retrieve both hierarchical and tabular data.
• XQuery can also be used to query tree and graph structures.
• XQuery can be used to build web pages.
• XQuery can be used to query web pages.
• XQuery is best for XML-based databases and
object-based databases. Object databases are much
more flexible and powerful than purely tabular
databases.
• XQuery can be used to transform XML documents
into XHTML documents.
XQuery vs XPath
No. | XQuery | XPath
1) | XQuery is a functional programming and query language that is used to query a group of XML data. | XPath is an XML path language that is used to select nodes from an XML document using queries.
2) | XQuery is used to extract and manipulate data from either XML documents or relational databases and MS Office documents that support an XML data source. | XPath is used to compute values like strings, numbers and boolean types from other XML documents.
3) | XQuery is represented in the form of a tree model with seven kinds of nodes, namely processing instructions, elements, document nodes, attributes, namespaces, text nodes, and comments. | XPath is represented as a tree structure, navigated by selecting different nodes.
4) | XQuery supports XPath and extended relational models. | XPath is still a component of the query language.
5) | XQuery helps to create syntax for new XML documents. | XPath was created to define a common syntax and behavior model for XPointer and XSLT.
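The JDK standard library has no XQuery engine, so the sketch below shows only the XPath side of the comparison: selecting a node and computing a number from an XML document with javax.xml.xpath. The class name XPathSketch and the second employee record are made-up sample data.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper: evaluates XPath expressions against a small in-memory document.
class XPathSketch {
    static final String XML =
        "<employees>"
        + "<employee><firstname>vimal</firstname><lastname>jaiswal</lastname></employee>"
        + "<employee><firstname>rahul</firstname><lastname>kumar</lastname></employee>"
        + "</employees>";

    // Returns the string value of the first node selected by the expression.
    static String selectString(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath expression: " + expression, e);
        }
    }

    // Evaluates an expression that yields a number, such as count(...).
    static double selectNumber(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Double) xpath.evaluate(
                expression, new InputSource(new StringReader(XML)), XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath expression: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(selectString("/employees/employee[lastname='jaiswal']/firstname"));
        System.out.println(selectNumber("count(/employees/employee)"));
    }
}
```

An XQuery processor would accept the same path expressions and additionally allow constructing new XML from the selected nodes.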
XQuery vs XSLT
• XQuery is program driven while XSLT is document-driven.
• XQuery is declarative while XSLT is functional.
• XSLT is written in XML while XQuery is not written in XML.
• XQuery is used only for simple transformations while XSLT is a language that
was especially designed to process tree structures.
• XQuery is not as powerful and sophisticated as XSLT, which is still best for retrieving results in a tree structure.
• XQuery is good to access XML database and extract the necessary XML nodes
while XSLT is used to transform XML documents.
• XQuery is designed for retrieving and interpreting information according to the
specification. It is very flexible to query a broad spectrum of XML information
sources, like XML databases and XML documents while XSLT is mainly designed
for transforming the XML documents.
• XQuery is considered easier to learn while XSLT is comparatively difficult.
• XQuery is shorter, faster and more elegant for huge data jobs while XSLT may be difficult to maintain unless you carefully design your stylesheet.
JSON
• JSON stands for JavaScript Object Notation.
• JSON is a lightweight data-interchange format.
• JSON is easier to read and write than XML.
• JSON is language independent.
• JSON supports arrays, objects, strings, numbers, booleans and null.
first.json
{"employees":[
{"name":"Sonoo", "email":"sonoojaiswal1987@gmail.com"},
{"name":"Rahul", "email":"rahul32@gmail.com"},
{"name":"John", "email":"john32bob@gmail.com"}
]}
Features of JSON
• Simplicity
• Openness
• Self Describing
• Internationalization
• Extensibility
• Interoperability
JSON vs XML
No. | JSON | XML
1) | JSON stands for JavaScript Object Notation. | XML stands for eXtensible Markup Language.
2) | JSON is simple to read and write. | XML is less simple than JSON.
3) | JSON is easy to learn. | XML is less easy than JSON.
4) | JSON is data-oriented. | XML is document-oriented.
5) | JSON doesn't provide display capabilities. | XML provides the capability to display data because it is a markup language.
6) | JSON supports arrays. | XML doesn't support arrays.
7) | JSON is less secure than XML. | XML is more secure.
8) | JSON files are more human readable than XML. | XML files are less human readable.
9) | JSON values are limited to strings, numbers, booleans, null, arrays and objects. | XML can describe many kinds of data such as text, numbers, images, charts, graphs etc. Moreover, XML offers options for transferring the format or structure of the data with the actual data.
JSON Example
A JSON object contains data in the form of key/value pairs. The keys are strings and the values are JSON types. Keys and values are separated by a colon. Each entry (key/value pair) is separated by a comma.
{
"employee": {
"name": "sonoo",
"salary": 56000,
"married": true
}
}
PHP JSON
PHP json_encode
string json_encode ( mixed $value [, int $options = 0 [, int $depth = 512 ]] )
<?php
$arr = array('a' => 1, 'b' => 2, 'c' => 3, 'd' => 4, 'e' => 5);
echo json_encode($arr);
?>
Output
{"a":1,"b":2,"c":3,"d":4,"e":5}
PHP JSON
PHP json_decode
mixed json_decode ( string $json [, bool $assoc = false [, int $depth = 512 [, int $options = 0 ]]] )
<?php
$json = '{"a":1,"b":2,"c":3,"d":4,"e":5}';
// true means the returned object will be converted into an associative array
var_dump(json_decode($json, true));
?>
Output
array(5) {
["a"] => int(1)
["b"] => int(2)
["c"] => int(3)
["d"] => int(4)
["e"] => int(5)
}
Java JSON
The json.simple library allows us to read and write JSON data in Java. In other words, we can encode and decode JSON objects in Java using the json.simple library.
The org.json.simple package contains the main classes of the JSON API:
JSONValue
JSONObject
JSONArray
Java JSON Encode
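import org.json.simple.JSONObject;

public class JsonExample1 {
    @SuppressWarnings("unchecked") // JSONObject extends the raw HashMap type
    public static void main(String[] args) {
        // Build a JSON object from key/value pairs
        JSONObject obj = new JSONObject();
        obj.put("name", "sonoo");
        obj.put("age", Integer.valueOf(27));
        obj.put("salary", Double.valueOf(600000));
        // Printing the object serializes it to JSON text
        System.out.print(obj);
    }
}
Output
{"name":"sonoo","salary":600000.0,"age":27}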
Java JSON Decode
import org.json.simple.JSONObject;
import org.json.simple.JSONValue;

public class JsonDecodeExample1 {
    public static void main(String[] args) {
        String s = "{\"name\":\"sonoo\",\"salary\":600000.0,\"age\":27}";
        // Parse the JSON text; a JSON object comes back as a JSONObject
        Object obj = JSONValue.parse(s);
        JSONObject jsonObject = (JSONObject) obj;
        // json.simple maps decimals to Double and integers to Long
        String name = (String) jsonObject.get("name");
        double salary = (Double) jsonObject.get("salary");
        long age = (Long) jsonObject.get("age");
        System.out.println(name + " " + salary + " " + age);
    }
}
Output
sonoo 600000.0 27
Hadoop
The most well-known technology used for Big Data is Hadoop.
It is a large-scale batch data processing system.
Famous Hadoop users
HADOOP ARCHITECTURE
Advantages
• Huge amounts of any kind of data can be stored and processed quickly
• Computing Power
• Fault Tolerance
• Flexibility
• Low Cost
• Scalability
Hadoop MAPREDUCE
Map reduce implementation:
Job Tracker:
Splitting into map and reduce tasks
Scheduling tasks on a cluster node
Task Tracker:
Runs Map Reduce tasks
periodically
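To show what the map and reduce tasks actually compute, here is a word count written in plain Java with the phases kept separate. It does not use the Hadoop API and runs in a single process; the class and method names are assumptions chosen for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process imitation of the MapReduce phases for word counting.
class WordCountSketch {
    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse the values of one key into a single count.
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line)); // on a cluster, each line split would go to a map task
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("big data");
        lines.add("Big Hadoop");
        System.out.println(wordCount(lines));
    }
}
```

In Hadoop the Job Tracker would split the input and schedule the map and reduce calls on cluster nodes, while the Task Trackers would run them; here a simple loop plays both roles.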
What is HDFS?
• Distributed file system
• Traditional hierarchical file organization
• Single namespace for the entire cluster
• Write-once-read-many access model
• Aware of the network topology
HDFS