KEMBAR78
Nosql Module 4 | PDF | Mongo Db | Databases
0% found this document useful (0 votes)
17 views61 pages

Nosql Module 4

Document databases store and manage documents in formats like XML and JSON, allowing for flexible schemas where documents can vary in structure. MongoDB, a popular document database, utilizes replica sets for high availability and offers features like WriteConcern for managing data consistency. The document model supports complex queries and can scale horizontally for both read and write operations, making it suitable for applications like event logging and content management systems.

Uploaded by

Raghavendra gs
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
17 views61 pages

Nosql Module 4

Document databases store and manage documents in formats like XML and JSON, allowing for flexible schemas where documents can vary in structure. MongoDB, a popular document database, utilizes replica sets for high availability and offers features like WriteConcern for managing data consistency. The document model supports complex queries and can scale horizontally for both read and write operations, making it suitable for applications like event logging and content management systems.

Uploaded by

Raghavendra gs
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 61

Document

Databases
MODULE 4,CLASS 1
• Documents are the main concept in document databases.

• The database stores and retrieves documents, which can be XML, JSON,

BSON, and so on.

• These documents are self-describing, hierarchical tree data structures

which can consist of maps, collections, and scalar values.

• The documents stored are similar to each other but do not have to be

exactly the same.

• Document databases store documents in the value part of the key-value

store; think about document databases as key-value stores where the value

is examinable.
Oracle and
MongoDB
• The _id is a special field that
is found on all documents in
Mongo, just like ROWID in
Oracle.

• In MongoDB, _id can be


assigned by the user, as long
as it is unique.
What Is a Document Database?
DOCUMENT-2
DOCUMENT- 1
• Looking at the documents, we can see that they are similar, but have
differences in attribute names.

• This is allowed in document databases.

• The schema of the data can differ across documents, but these documents
can still belong to the same collection—unlike an RDBMS where every row
in a table has to follow the same schema
• We represent a list of cities visited as an array, or a list of addresses as list of
documents embedded inside the main document.

• Embedding child documents as sub objects inside documents provides for easy
access and better performance.

• If you look at the documents, you will see that some of the attributes are similar,
such as firstname or city.

• At the same time, there are attributes in the second document which do not exist
in the first document, such as addresses, while likes is in the first document but
not the second.
• MongoDB [MongoDB],
• CouchDB [CouchDB],
• Terrastore [Terrastore],
popular • OrientDB [OrientDB],
document • RavenDB [RavenDB], and
• Lotus Notes [Notes Storage
databases Facility] that uses document
storage
Consistency

Transactions
Features of
document
Availability
database(MongoDB)

Query Features

Scaling
how MongoDB works
• Consistency in MongoDB database is
configured by using the replica sets and

1. Consistency choosing to wait for the writes to be


replicated to all the slaves or a given
number of slaves.

• Every write can specify the number of


servers the write has to be propagated to
before it returns as successful.
• A command like db.runCommand({ getlasterror : 1 , w : "majority" }) tells
the database how strong is the consistency you want.

• For example, if you have one server and specify the w as majority, the write
will return immediately since there is only one node.

• If you have three nodes in the replica set and specify w as majority, the write
will have to complete at a minimum of two nodes before it is reported as a
success.

• You can increase the w value for stronger consistency but you will suffer on
write performance, since now the writes have to complete at more nodes.
• Replica sets also allow you to increase the read performance by allowing
reading from slaves by setting slaveOk;

• this parameter can be set on the connection, or database, or collection, or


individually for each operation:
• Mongo mongo = new Mongo("localhost:27017");
• mongo.slaveOk();

• Here we are setting slaveOk per operation, so that we can decide which
operations can work with data from the slave node.
1. DBCollection collection = getOrderCollection();

2. BasicDBObject query = new BasicDBObject();

3. query.put("name", "Martin");

4. DBCursor cursor = collection.find(query).slaveOk();

• Similar to various options available for read, you can change the settings to
achieve strong write consistency, if desired.
WriteConcern

• By default, a write is reported successful once the database


receives it; you can change this so as to wait for the writes to be
synced to disk or to propagate to two or more slaves.

• This is known as WriteConcern:


• You make sure that certain writes are written to the master and
some slaves by setting WriteConcern to REPLICAS_SAFE.
Code: setting the WriteConcern for all writes to a
collection

• DBCollection shopping = database.getCollection("shopping");

• shopping.setWriteConcern(REPLICAS_SAFE);

• WriteConcern can also be set per operation by specifying it on the

savecommand:

• WriteResult result = shopping.insert(order, REPLICAS_SAFE);


• There is a tradeoff that you need to

carefully think about, based on your

application needs and business

requirements, to decide what settings

make sense for slaveOk during read or

what safety level you desire during write

with WriteConcern
END OF M4,C1
Availability,
Transactions, Query
Features
M4,C2
• Recall : CAP theorm

• Document databases try to improve on availability


by replicating data using the master-slave setup.
2. Availability
• The same data is available on multiple nodes and
the clients can get to the data even when the
primary node is down.

• Usually, the application code does not have to


determine if the primary node is available or not.

• MongoDB implements replication, providing high


availability using replica sets.
REPLICA SET

• In a replica set, there are two or more


nodes participating in an asynchronous
master-slave replication.

• The replica-set nodes elect the master,


or primary, among themselves.

• Assuming all the nodes have equal voting


rights, some nodes can be favoured for
being closer to the other servers, for
having more RAM, and so on;

• users can affect this by assigning a


priority—a number between 0 and 1000—
to a node
• All requests go to the master node, and the data is replicated to the slave
nodes.

• If the master node goes down, the remaining nodes in the replica set vote
among themselves to elect a new master;

• all future requests are routed to the new master, and the slave nodes start
getting data from the new master.

• When the node that failed comes back online, it joins in as a slave and catches
up with the rest of the nodes by pulling all the data it needs to get current.
Example:
Replica set configuration
with higher priority
assigned to nodes in the
same datacenter
• We have two nodes, mongo A and mongo B, running the MongoDB
database in the primary datacenter, and mongo C in the secondary
datacenter.

• If we want nodes in the primary datacenter to be elected as primary


nodes, we can assign them a higher priority than the other nodes.

• More nodes can be added to the replica sets without having to take them
offline.
• The application writes or reads from the primary (master) node.

• When connection is established, the application only needs to connect to one node
(primary or not, does not matter) in the replica set, and the rest of the nodes are
discovered automatically.

• When the primary node goes down, the driver talks to the new primary elected by the
replica set.

• The application does not have to manage any of the communication failures or node
selection criteria.
• Using replica sets gives you the
ability to have a highly available
document data store.

• Replica sets are generally used for


data redundancy, automated
failover, read scaling, server
maintenance without downtime, and
disaster recovery.
Transactions, in the traditional RDBMS sense,
mean that you can start modifying the database
with insert, update, or delete commands over
different tables and then decide if you want to
keep the changes or not by using commit or

3. rollback.

Transactions
These constructs are generally not available in
NoSQL solutions—a write either succeeds or
fails.
Transactions involving more
than one operation are not
Transactions at the single-
possible, although there are
document level are known as
products such as RavenDB that
atomic transactions.
do support transactions across
multiple operations.
• By default, all writes are reported as
successful.

• A finer control over the write can be


achieved by using WriteConcern parameter.

• We ensure that order is written to more than


one node before it’s reported successful by
using WriteConcern.REPLICAS_SAFE.

• Different levels of WriteConcern let you


choose the safety level during writes;
• for example, when writing log entries, you can use lowest level of
safety, WriteConcern.NONE.

• final Mongo mongo = new Mongo(mongoURI);

• mongo.setWriteConcern(REPLICAS_SAFE);
• DBCollection shopping =
mongo.getDB(orderDatabase).getCollection(shoppingCollection);
• try {

• WriteResult result = shopping.insert(order, REPLICAS_SAFE);

• //Writes made it to primary and at least one secondary

• } catch (MongoException writeException) {

• //Writes did not make it to minimum of two nodes including primary

• dealWithWriteFailure(order, writeException);

• }
4. Query Features

• CouchDB allows you to query via views—complex queries on documents

which can be either materialized or dynamic (think of them as RDBMS views

which are either materialized or not).

• With CouchDB, if you need to aggregate the number of reviews for a product as

well as the average rating, you could add a view implemented via map-reduce

to return the count of reviews and the average of their ratings.


• When there are many requests, you don’t want to compute the count

and average for every request;

• instead you can add a materialized view that precomputes the values

and stores the results in the database.

• These materialized views are updated when queried, if any data was

changed since the last update.


• One of the good features of document

databases, as compared to key-value

stores, is that we can query the data

inside the document without having to

retrieve the whole document by its key

and then introspect the document.

• This feature brings these databases closer

to the RDBMS query model.


MongoDB has a query language which is expressed
via JSON and has constructs such as

$query for the where clause,


$orderby for sorting the data, or
$explain to show the execution plan
of the query.

There are many more constructs like these


that can be combined to create a MongoDB
query.
END OF M4,C2
Query Features (cont) ,
Scaling , Suitable Use
Cases
M4,C3
SQL and equivalent MongoDB
queries
1. want to return all the documents in an order collection (all
rows in the order table).

• The SQL for this would be:

• SELECT * FROM order


• The equivalent query in Mongo shell would be:

• db.order.find()
2. Selecting the orders for a single customerId of 883c2c5b4e5b

• SQL Query would be:

• SELECT * FROM order WHERE customerId =


"883c2c5b4e5b"

• The equivalent query in Mongo to get all orders for a

single customerId of 883c2c5b4e5b:

• db.order.find({"customerId":"883c2c5b4e5b"})
3. selecting orderId and orderDate for one customer in
• SQL would be:

• SELECT orderId,orderDate FROM order WHERE


customerId = "883c2c5b4e5b"
• the equivalent in Mongo would be:
• db.order.find({customerId:"883c2c5b4e5b"},{order
Id:1,orderDate:1})

• Similarly, queries to count, sum, and so on are all


available.

• Since the documents are aggregated objects, it is really


easy to query for documents that have to be matched using
the fields with child objects.
4. QUERY FOR ALL THE ORDERS WHERE ONE OF THE ITEMS
ORDERED HAS A NAME LIKE REFACTORING.

SQL EQUIVALENT MONGO QUERY

• SELECT * FROM customerOrder, • db.orders.find({"items.product.name":/Ref


orderItem, product WHERE actoring/})
customerOrder.orderId =
orderItem.customerOrderId AND
orderItem.productId =
product.productId AND product.name
LIKE '%Refactoring%'
The query for MongoDB is simpler because the objects
are embedded inside a single document and you can
query based on the embedded child documents.
5. SCALING

• Scaling for heavy-read loads can be achieved by adding more read slaves, so
that all the reads can be directed to the slaves.

• Given a heavy-read application, with our 3-node replica-set cluster, we can


add more read capacity to the cluster as the read load increases just by
adding more slave nodes to the replica set to execute reads with the slaveOk
flag

• This is horizontal scaling for reads.


Adding a new node, mongo D, to an existing replica-set cluster
Once the new node, mongo D, is started, it needs
to be added to the replica set.
rs.add("mongod:27017");

When a new node is added, it will sync up with


the existing nodes, join the replica set as
secondary node, and start serving read requests.

An advantage of this setup is that we do not


have to restart any other nodes, and there is no
downtime for the application either.
SCALING FOR WRITE

When we want to scale for write, we can start


sharding the data.

Sharding is similar to partitions in RDBMS where


we split data by value in a certain column, such as
state or year.

With RDBMS, partitions are usually on the same node, so the


client application does not have to query a specific partition
but can keep querying the base table;

the RDBMS takes care of finding the right partition


for the query and returns the data
• In sharding, the data is also split by certain field, but then moved to
different Mongo nodes.

• The data is dynamically moved between nodes to ensure that shards are
always balanced.

• We can add more nodes to the cluster and increase the number of writable
nodes, enabling horizontal scaling for writes.
• db.runCommand( { shardcollection : "ecommerce.customer",

• key : {firstname : 1} } )

• Splitting the data on the first name of the customer ensures that the
data is balanced across the shards for optimal write performance;

• furthermore, each shard can be a replica set ensuring better read


performance within the shard
MongoDB sharded setup where each shard is a replica set
• When we add a new shard to this existing sharded cluster, the
data will now be balanced across four shards instead of three.

• As all this data movement and infrastructure refactoring is


happening, the application will not experience any downtime,
although the cluster may not perform optimally when large
amounts of data are being moved to rebalance the shards
• The shard key plays an important role.
• You may want to place your MongoDB database shards closer to
their users, so sharding based on user location may be a good idea.
• When sharding by customer location, all user data for the East
Coast of the USA is in the shards that are served from the East
Coast, and all user data for the West Coast is in the shards that are
on the West Coast.
EVENT LOGGING CONTENT MANAGEMENT
SYSTEMS, BLOGGING

Suitable Use PLATFORMS

Cases

WEB ANALYTICS OR E-COMMERCE


REAL-TIME ANALYTICS APPLICATIONS
Event Logging
• Applications have different event logging needs;

• within the enterprise, there are many different


applications that want to log events.

• Document databases can store all these different types


of events and can act as a central data store for event
storage.

• This is especially true when the type of data being


captured by the events keeps changing.

• Events can be sharded by the name of the application


where the event originated or by the type of event such
as order_processed or customer_logged.
Content Management
Systems, Blogging Platforms
• Since document databases have no predefined
schemas and usually understand JSON
documents, they work well in content
management systems or applications for
publishing websites, managing user comments,
user registrations, profiles, web-facing
documents.
Web Analytics or Real-Time
Analytics

• Document databases can store data for real-


time analytics;

• since parts of the document can be updated,


it’s very easy to store page views or unique
visitors, and new metrics can be easily
added without schema changes.
• E-commerce applications often

need to have flexible schema for

E-Commerce products and orders, as well as the


Applications ability to evolve their data models

without expensive database

refactoring or data migration


When Not to Use

Complex
Queries against
Transactions
Varying Aggregate
Spanning Different
Structure
Operations
Complex Transactions Spanning
Different Operations

If you need to have However, there are some


atomic cross-document document databases that
operations, then do support these kinds of
document databases operations, such as
may not be for you. RavenDB
Queries against Varying Aggregate Structure

• Flexible schema means that the database does not enforce any restrictions on
the schema.

• Data is saved in the form of application entities. If you need to query these
entities ad hoc, your queries will be changing (in RDBMS terms, this would
mean that as you join criteria between tables, the tables to join keep changing).

• Since the data is saved as an aggregate, if the design of the aggregate is


constantly changing, you need to save the aggregates at the lowest level of
granularity—basically, you need to normalize the data.

• In this scenario, document databases may not work.


END OF MODULE 4

You might also like