Document Databases
MODULE 4, CLASS 1
• Documents are the main concept in document databases.
• The database stores and retrieves documents, which can be XML, JSON,
BSON, and so on.
• These documents are self-describing, hierarchical tree data structures
which can consist of maps, collections, and scalar values.
• The documents stored are similar to each other but do not have to be
exactly the same.
• Document databases store documents in the value part of the key-value
store; think about document databases as key-value stores where the value
is examinable.
Oracle and MongoDB
• The _id is a special field that
is found on all documents in
Mongo, just like ROWID in
Oracle.
• In MongoDB, _id can be
assigned by the user, as long
as it is unique.
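• For instance, a minimal Mongo shell sketch (collection and field names here are illustrative, not from the slides):
db.customer.insert({ _id: "martin@example.com", firstname: "Martin" })  // user-assigned _id
db.customer.insert({ firstname: "Pramod" })  // _id is auto-generated as an ObjectId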
What Is a Document Database?
[Figure: two example documents, DOCUMENT-1 and DOCUMENT-2]
• Looking at the documents, we can see that they are similar, but have
differences in attribute names.
• This is allowed in document databases.
• The schema of the data can differ across documents, but these documents can still belong to the same collection—unlike an RDBMS, where every row in a table has to follow the same schema.
• We represent a list of cities visited as an array, or a list of addresses as a list of documents embedded inside the main document.
• Embedding child documents as sub objects inside documents provides for easy
access and better performance.
• If you look at the documents, you will see that some of the attributes are similar,
such as firstname or city.
• At the same time, there are attributes in the second document which do not exist
in the first document, such as addresses, while likes is in the first document but
not the second.
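• A sketch of what the two documents might look like (all field names and values are illustrative, not taken from the original figure):
DOCUMENT-1
{ "firstname": "Martin",
  "lastcity": "Boston",
  "likes": [ "Biking", "Photography" ] }
DOCUMENT-2
{ "firstname": "Pramod",
  "lastcity": "Chicago",
  "citiesvisited": [ "Chicago", "London", "Pune", "Bangalore" ],
  "addresses": [ { "state": "AK", "city": "DILLINGHAM" },
                 { "state": "MH", "city": "PUNE" } ] }
• Both share firstname and lastcity; likes appears only in the first document and addresses only in the second, yet both can live in the same collection.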
Popular document databases
• MongoDB [MongoDB]
• CouchDB [CouchDB]
• Terrastore [Terrastore]
• OrientDB [OrientDB]
• RavenDB [RavenDB]
• Lotus Notes [Notes Storage Facility], which uses document storage
Features of a document database (MongoDB): how MongoDB works
1. Consistency
2. Availability
3. Transactions
4. Query Features
5. Scaling
1. Consistency
• Consistency in a MongoDB database is configured by using replica sets and choosing to wait for the writes to be replicated to all the slaves or a given number of slaves.
• Every write can specify the number of
servers the write has to be propagated to
before it returns as successful.
• A command like db.runCommand({ getlasterror : 1 , w : "majority" }) tells the database how strong a consistency guarantee you want.
• For example, if you have one server and specify the w as majority, the write
will return immediately since there is only one node.
• If you have three nodes in the replica set and specify w as majority, the write
will have to complete at a minimum of two nodes before it is reported as a
success.
• You can increase the w value for stronger consistency but you will suffer on
write performance, since now the writes have to complete at more nodes.
• Replica sets also allow you to increase read performance by allowing reads from slaves when slaveOk is set;
• this parameter can be set on the connection, or database, or collection, or
individually for each operation:
Mongo mongo = new Mongo("localhost:27017");
mongo.slaveOk();
• The code above sets slaveOk for the whole connection. Alternatively, we can set slaveOk per operation, so that we can decide which operations can work with data from the slave node:
DBCollection collection = getOrderCollection();
BasicDBObject query = new BasicDBObject();
query.put("name", "Martin");
DBCursor cursor = collection.find(query).slaveOk();
• Similar to various options available for read, you can change the settings to
achieve strong write consistency, if desired.
WriteConcern
• By default, a write is reported successful once the database
receives it; you can change this so as to wait for the writes to be
synced to disk or to propagate to two or more slaves.
• This is known as WriteConcern:
• You make sure that certain writes are written to the master and
some slaves by setting WriteConcern to REPLICAS_SAFE.
Code: setting the WriteConcern for all writes to a collection
DBCollection shopping = database.getCollection("shopping");
shopping.setWriteConcern(REPLICAS_SAFE);
• WriteConcern can also be set per operation by specifying it on the save command:
WriteResult result = shopping.insert(order, REPLICAS_SAFE);
• There is a tradeoff that you need to think about carefully, based on your application needs and business requirements, to decide what settings make sense for slaveOk during reads and what safety level you desire during writes with WriteConcern.
END OF M4,C1
Availability, Transactions, Query Features
M4,C2
2. Availability
• Recall: CAP theorem.
• Document databases try to improve on availability by replicating data using the master-slave setup.
• The same data is available on multiple nodes and the clients can get to the data even when the primary node is down.
• Usually, the application code does not have to
determine if the primary node is available or not.
• MongoDB implements replication, providing high
availability using replica sets.
REPLICA SET
• In a replica set, there are two or more
nodes participating in an asynchronous
master-slave replication.
• The replica-set nodes elect the master,
or primary, among themselves.
• Assuming all the nodes have equal voting
rights, some nodes can be favoured for
being closer to the other servers, for
having more RAM, and so on;
• users can affect this by assigning a priority—a number between 0 and 1000—to a node.
• All requests go to the master node, and the data is replicated to the slave
nodes.
• If the master node goes down, the remaining nodes in the replica set vote
among themselves to elect a new master;
• all future requests are routed to the new master, and the slave nodes start
getting data from the new master.
• When the node that failed comes back online, it joins in as a slave and catches
up with the rest of the nodes by pulling all the data it needs to get current.
Example: Replica set configuration with higher priority assigned to nodes in the same datacenter
• We have two nodes, mongo A and mongo B, running the MongoDB
database in the primary datacenter, and mongo C in the secondary
datacenter.
• If we want nodes in the primary datacenter to be elected as primary nodes, we can assign them a higher priority than the other nodes, as sketched after this list.
• More nodes can be added to the replica sets without having to take them
offline.
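A minimal Mongo shell sketch of such a configuration (hostnames, ports, and priority values are assumptions):
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongoA:27017", priority: 10 },  // primary datacenter
    { _id: 1, host: "mongoB:27017", priority: 10 },  // primary datacenter
    { _id: 2, host: "mongoC:27017", priority: 1 }    // secondary datacenter
  ]
})
With these priorities, mongo A or mongo B will win the election as long as either is reachable.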
• The application writes or reads from the primary (master) node.
• When the connection is established, the application only needs to connect to one node (primary or not, it does not matter) in the replica set; the rest of the nodes are discovered automatically.
• When the primary node goes down, the driver talks to the new primary elected by the
replica set.
• The application does not have to manage any of the communication failures or node
selection criteria.
• Using replica sets gives you the
ability to have a highly available
document data store.
• Replica sets are generally used for
data redundancy, automated
failover, read scaling, server
maintenance without downtime, and
disaster recovery.
3. Transactions
• Transactions, in the traditional RDBMS sense, mean that you can start modifying the database with insert, update, or delete commands over different tables and then decide if you want to keep the changes or not by using commit or rollback.
• These constructs are generally not available in NoSQL solutions—a write either succeeds or fails.
• Transactions at the single-document level are known as atomic transactions.
• Transactions involving more than one operation are not possible, although there are products such as RavenDB that do support transactions across multiple operations.
• By default, all writes are reported as
successful.
• A finer control over the write can be achieved by using the WriteConcern parameter.
• We ensure that the order is written to more than one node before it's reported successful by using WriteConcern.REPLICAS_SAFE.
• Different levels of WriteConcern let you
choose the safety level during writes;
• for example, when writing log entries, you can use the lowest level of safety, WriteConcern.NONE.
final Mongo mongo = new Mongo(mongoURI);
mongo.setWriteConcern(REPLICAS_SAFE);
DBCollection shopping = mongo.getDB(orderDatabase).getCollection(shoppingCollection);
try {
  WriteResult result = shopping.insert(order, REPLICAS_SAFE);
  // Writes made it to primary and at least one secondary
} catch (MongoException writeException) {
  // Writes did not make it to minimum of two nodes including primary
  dealWithWriteFailure(order, writeException);
}
4. Query Features
• CouchDB allows you to query via views—complex queries on documents
which can be either materialized or dynamic (think of them as RDBMS views
which are either materialized or not).
• With CouchDB, if you need to aggregate the number of reviews for a product as
well as the average rating, you could add a view implemented via map-reduce
to return the count of reviews and the average of their ratings.
• When there are many requests, you don’t want to compute the count
and average for every request;
• instead you can add a materialized view that precomputes the values
and stores the results in the database.
• These materialized views are updated when queried, if any data was
changed since the last update.
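• A hedged sketch of such a map-reduce view (the document structure and field names are assumptions, not from the slides). The map function emits one rating per review; the reduce keeps a running count and sum, from which the average is sum/count:
// map
function (doc) {
  if (doc.type === "review") {
    emit(doc.productId, doc.rating);
  }
}
// reduce (rereduce combines partial results from different key ranges)
function (keys, values, rereduce) {
  if (rereduce) {
    var count = 0, sum = 0;
    for (var i = 0; i < values.length; i++) {
      count += values[i].count;
      sum += values[i].sum;
    }
    return { count: count, sum: sum };
  }
  var total = 0;
  for (var j = 0; j < values.length; j++) {
    total += values[j];
  }
  return { count: values.length, sum: total };
}
• CouchDB's built-in _stats reduce would also return the count and sum directly.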
• One of the good features of document
databases, as compared to key-value
stores, is that we can query the data
inside the document without having to
retrieve the whole document by its key
and then introspect the document.
• This feature brings these databases closer
to the RDBMS query model.
MongoDB has a query language which is expressed
via JSON and has constructs such as
$query for the where clause,
$orderby for sorting the data, or
$explain to show the execution plan
of the query.
There are many more constructs like these
that can be combined to create a MongoDB
query.
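A hedged Mongo shell sketch combining these constructs (collection and field names are assumptions):
db.order.find({ "$query": { customerId: "883c2c5b4e5b" },
                "$orderby": { orderDate: -1 } })          // where clause plus sort
db.order.find({ customerId: "883c2c5b4e5b" }).explain()  // show the execution plan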
END OF M4,C2
Query Features (cont), Scaling, Suitable Use Cases
M4,C3
SQL and equivalent MongoDB queries
1. We want to return all the documents in the order collection (all rows in the order table).
• The SQL for this would be:
• SELECT * FROM order
• The equivalent query in Mongo shell would be:
• db.order.find()
2. Selecting the orders for a single customerId of 883c2c5b4e5b
• SQL Query would be:
• SELECT * FROM order WHERE customerId =
"883c2c5b4e5b"
• The equivalent query in Mongo to get all orders for a
single customerId of 883c2c5b4e5b:
• db.order.find({"customerId":"883c2c5b4e5b"})
3. Selecting orderId and orderDate for one customer
• SQL would be:
• SELECT orderId,orderDate FROM order WHERE
customerId = "883c2c5b4e5b"
• The equivalent in Mongo would be:
• db.order.find({customerId:"883c2c5b4e5b"},{orderId:1,orderDate:1})
• Similarly, queries to count, sum, and so on are all
available.
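• For instance, a hedged sketch (the orderValue field is an assumption; the aggregation framework requires MongoDB 2.2 or later):
db.order.count({ customerId: "883c2c5b4e5b" })  // count of matching documents
db.order.aggregate([ { $group: { _id: null, total: { $sum: "$orderValue" } } } ])  // sum across orders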
• Since the documents are aggregated objects, it is really easy to query for documents using the fields of embedded child objects.
4. Query for all the orders where one of the items ordered has a name like "Refactoring".
• The SQL for this would be:
• SELECT * FROM customerOrder, orderItem, product WHERE customerOrder.orderId = orderItem.customerOrderId AND orderItem.productId = product.productId AND product.name LIKE '%Refactoring%'
• The equivalent Mongo query would be:
• db.orders.find({"items.product.name": /Refactoring/})
The query for MongoDB is simpler because the objects
are embedded inside a single document and you can
query based on the embedded child documents.
5. SCALING
• Scaling for heavy-read loads can be achieved by adding more read slaves, so
that all the reads can be directed to the slaves.
• Given a heavy-read application, with our 3-node replica-set cluster, we can
add more read capacity to the cluster as the read load increases just by
adding more slave nodes to the replica set to execute reads with the slaveOk
flag
• This is horizontal scaling for reads.
Adding a new node, mongo D, to an existing replica-set cluster
Once the new node, mongo D, is started, it needs
to be added to the replica set.
rs.add("mongod:27017");
When a new node is added, it will sync up with the existing nodes, join the replica set as a secondary node, and start serving read requests.
An advantage of this setup is that we do not
have to restart any other nodes, and there is no
downtime for the application either.
SCALING FOR WRITE
When we want to scale for write, we can start
sharding the data.
Sharding is similar to partitions in RDBMS where
we split data by value in a certain column, such as
state or year.
With RDBMS, partitions are usually on the same node, so the
client application does not have to query a specific partition
but can keep querying the base table;
the RDBMS takes care of finding the right partition
for the query and returns the data
• In sharding, the data is also split by a certain field, but then moved to different Mongo nodes.
• The data is dynamically moved between nodes to ensure that shards are
always balanced.
• We can add more nodes to the cluster and increase the number of writable
nodes, enabling horizontal scaling for writes.
db.runCommand({ shardcollection : "ecommerce.customer", key : { firstname : 1 } })
• Splitting the data on the first name of the customer ensures that the
data is balanced across the shards for optimal write performance;
• furthermore, each shard can be a replica set, ensuring better read performance within the shard; a sketch of such a setup follows below.
MongoDB sharded setup where each shard is a replica set
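A hedged shell sketch of assembling such a cluster (replica-set names and hostnames are assumptions), run against a mongos router:
sh.addShard("rs1/mongoA1:27017,mongoA2:27017")
sh.addShard("rs2/mongoB1:27017,mongoB2:27017")
sh.enableSharding("ecommerce")
db.runCommand({ shardcollection : "ecommerce.customer", key : { firstname : 1 } })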
• When we add a new shard to this existing sharded cluster, the
data will now be balanced across four shards instead of three.
• As all this data movement and infrastructure refactoring is happening, the application will not experience any downtime, although the cluster may not perform optimally when large amounts of data are being moved to rebalance the shards.
• The shard key plays an important role.
• You may want to place your MongoDB database shards closer to
their users, so sharding based on user location may be a good idea.
• When sharding by customer location, all user data for the East
Coast of the USA is in the shards that are served from the East
Coast, and all user data for the West Coast is in the shards that are
on the West Coast.
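• A hedged sketch of location-based sharding using tag-aware sharding (the location field, its values, and the shard names are assumptions; tag ranges are half-open, so this assumes location is exactly "east" or "west"):
db.runCommand({ shardcollection : "ecommerce.customer", key : { location : 1 } })
sh.addShardTag("shardEast", "EAST")
sh.addShardTag("shardWest", "WEST")
sh.addTagRange("ecommerce.customer", { location: "east" }, { location: "west" }, "EAST")
sh.addTagRange("ecommerce.customer", { location: "west" }, { location: MaxKey }, "WEST")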
Suitable Use Cases
• Event Logging
• Content Management Systems, Blogging Platforms
• Web Analytics or Real-Time Analytics
• E-Commerce Applications
Event Logging
• Applications have different event logging needs;
• within the enterprise, there are many different
applications that want to log events.
• Document databases can store all these different types
of events and can act as a central data store for event
storage.
• This is especially true when the type of data being
captured by the events keeps changing.
• Events can be sharded by the name of the application
where the event originated or by the type of event such
as order_processed or customer_logged.
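• A hedged sketch of sharding such an event store by originating application (database, collection, and field names are assumptions):
db.runCommand({ shardcollection : "logging.events", key : { applicationName : 1 } })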
Content Management Systems, Blogging Platforms
• Since document databases have no predefined
schemas and usually understand JSON
documents, they work well in content
management systems or applications for
publishing websites, managing user comments, user registrations, profiles, and web-facing documents.
Web Analytics or Real-Time Analytics
• Document databases can store data for real-
time analytics;
• since parts of the document can be updated,
it’s very easy to store page views or unique
visitors, and new metrics can be easily
added without schema changes.
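• A hedged sketch of such an in-place partial update (collection and field names are assumptions); $inc bumps a counter inside the document, and upsert creates the document the first time the page is seen:
db.analytics.update(
  { page: "/products/refactoring" },
  { $inc: { pageViews: 1 } },
  { upsert: true }
)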
E-Commerce Applications
• E-commerce applications often need to have a flexible schema for products and orders, as well as the ability to evolve their data models without expensive database refactoring or data migration.
When Not to Use
• Complex Transactions Spanning Different Operations
• Queries against Varying Aggregate Structure
Complex Transactions Spanning Different Operations
• If you need to have atomic cross-document operations, then document databases may not be for you.
• However, there are some document databases that do support these kinds of operations, such as RavenDB.
Queries against Varying Aggregate Structure
• Flexible schema means that the database does not enforce any restrictions on
the schema.
• Data is saved in the form of application entities. If you need to query these entities ad hoc, your queries will keep changing (in RDBMS terms, this would mean that the join criteria, and hence the tables to join, keep changing).
• Since the data is saved as an aggregate, if the design of the aggregate is
constantly changing, you need to save the aggregates at the lowest level of
granularity—basically, you need to normalize the data.
• In this scenario, document databases may not work.
END OF MODULE 4