NOSQL

DIPLOMA LETERAL ENTRY : LATE BUT G REAT

1. Explain the concept of relationships in graph databases and provide a clear diagram
to illustrate it?
Ans:

o Graph databases store entities (nodes) and their relationships (edges).


o Nodes represent objects/entities with properties (e.g., a "name"), while edges represent the
relationships between nodes and can also have properties.
1. Directional Relationships:
o Edges have directional significance, allowing nodes to be organized and connected in meaningful ways.
o This organization helps uncover patterns and interpret the same data differently based on relationships.
2. Properties of Nodes and Edges:
o Nodes have attributes, such as a "name" property.
o Relationships (edges) can also have attributes and support various types of connections, such as
hierarchical or temporal relationships.
3. Efficiency in Relationship Traversal:
o Relationships are stored directly in the graph database, making traversal of relationships much faster
compared to calculating them at query time.
4. Flexible Representation of Relationships:
o Nodes can have multiple and diverse types of relationships, supporting complex scenarios like spatial
indexing, categorization, and sorted access.
5. Unified Graph Representation:
o Graph databases allow all relationships and nodes to coexist in a single, flexible structure, enabling a
holistic view of data and its interconnections.
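
The points above can be sketched in a few lines of Python. This is only an illustrative in-memory model, not how any particular graph database stores data; the class name `PropertyGraph`, the edge types, and the sample names are assumptions made for the example. It shows nodes and directed edges that both carry properties, and edges stored with their endpoints so that traversal is a lookup.

```python
from collections import defaultdict

class PropertyGraph:
    """Tiny in-memory property graph: nodes and directed edges both carry properties."""

    def __init__(self):
        self.nodes = {}                      # node id -> property dict
        self.outgoing = defaultdict(list)    # node id -> [(edge type, target id, edge properties)]
        self.incoming = defaultdict(list)    # node id -> [(edge type, source id, edge properties)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, edge_type, target, **props):
        # The edge is stored with both endpoints, so traversal is a lookup
        # rather than a join computed at query time.
        self.outgoing[source].append((edge_type, target, props))
        self.incoming[target].append((edge_type, source, props))

    def neighbours(self, node_id, edge_type, direction="out"):
        index = self.outgoing if direction == "out" else self.incoming
        return [other for kind, other, _ in index[node_id] if kind == edge_type]


graph = PropertyGraph()
graph.add_node("anna", name="Anna")
graph.add_node("barbara", name="Barbara")
graph.add_node("bigco", name="BigCo")
graph.add_edge("anna", "FRIEND", "barbara", since=2010)
graph.add_edge("anna", "EMPLOYEE", "bigco", hired=2012)
graph.add_edge("barbara", "EMPLOYEE", "bigco", hired=2015)

anna_friends = graph.neighbours("anna", "FRIEND")            # follows the edge direction
bigco_staff = graph.neighbours("bigco", "EMPLOYEE", "in")    # walks the same edges backwards
```

Because the FRIEND edge is directional, asking for Barbara's outgoing FRIEND edges returns nothing, while the same EMPLOYEE edges answer both "where does Anna work?" and "who works at BigCo?".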

2. Explain with a neat diagram, the partitioning and combining in Map reduce.
Ans:
o MapReduce is a programming model for processing large data sets in parallel across a cluster.
o Partitioning divides the mapper output by key so that several reducers can work in parallel, and
combining pre-aggregates each mapper's output before it is sent to the reducers.

1. Single Reduce Function in MapReduce:


o In its simplest form, a MapReduce job uses a single reduce function that processes all outputs from the
mappers concatenated together.
o While functional, this approach lacks parallelism and increases data transfer, which can be optimized.
2. Partitioning for Parallelism:
o To improve efficiency, mapper outputs are partitioned based on keys. Multiple keys are grouped into
partitions, and each partition is sent to a specific reducer.
o This allows multiple reducers to operate in parallel, improving the overall processing speed.
3. Grouping Keys into Partitions:
o The framework collects data from all nodes for a given partition and combines it into a single group.
o Each partition is then processed by a dedicated reducer, allowing distributed and parallel processing of
data.
4. Combiner for Reducing Data Transfer:


o A combiner function operates at the mapper level to aggregate repetitive data, reducing the volume of
key-value pairs sent to reducers. This minimizes data transfer and improves performance.
5. Combinable Reducers:
o The combiner function often mirrors the final reducer function but requires the reducer’s output to
match its input format.
o When this condition is met, the reducer can also act as a combiner to pre-aggregate data effectively.
6. Non-Combinable Reducers:
o Not all reducers are combinable. For instance, if the reducer’s input and output formats differ (e.g.,
counting unique customers for a product), a separate combining function is used.
o This function may perform basic pre-aggregation tasks, such as eliminating duplicates, but differs from
the final reducer.
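
The flow described above (map, combine on each node, partition by key, reduce per partition) can be sketched in plain Python. This is a single-process illustration under assumed data, not a real MapReduce framework; the function names, the order format, and the toy hash used by `partition` are all hypothetical.

```python
from collections import defaultdict

def map_order(order):
    """Map: emit one (product, quantity) pair per order line."""
    return [(line["product"], line["qty"]) for line in order["lines"]]

def combine(pairs):
    """Combiner: pre-aggregate one mapper's output so fewer pairs cross the network."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())

def partition(key, num_reducers):
    """Partitioner: every occurrence of a key must land on the same reducer."""
    return sum(key.encode("utf-8")) % num_reducers   # deterministic toy hash

def reduce_totals(key, values):
    """Reduce: input and output have the same shape, which makes it combinable."""
    return key, sum(values)

def run_job(orders_per_node, num_reducers=2):
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for orders in orders_per_node:
        # Map and combine run on each node independently.
        mapped = [pair for order in orders for pair in map_order(order)]
        for key, value in combine(mapped):
            # Shuffle: route each pair to its partition and group by key.
            partitions[partition(key, num_reducers)][key].append(value)
    result = {}
    for part in partitions:
        # Each partition would be reduced in parallel on a real cluster.
        for key, values in part.items():
            reduced_key, total = reduce_totals(key, values)
            result[reduced_key] = total
    return result

node_a = [{"lines": [{"product": "tea", "qty": 2}, {"product": "coffee", "qty": 1}]},
          {"lines": [{"product": "tea", "qty": 3}]}]
node_b = [{"lines": [{"product": "coffee", "qty": 4}]}]
totals = run_job([node_a, node_b])
```

Here the combiner does the same summing as the reducer, which works because a sum of partial sums is still the total. A "count unique customers" reducer could not be reused this way; its combiner would only remove duplicates.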
3. Describe some example queries to use with document databases.
Ans:
1. Retrieve a Document by ID:
{ "_id": "12345" }: Fetches the document with _id equal to "12345".
2. Query with Filters:
{ "category": "electronics", "price": { "$lt": 500 } }: Finds documents where the category is "electronics"
and the price is less than 500.
3. Projection Query:
{ "category": "books" }, { "title": 1, "author": 1, "_id": 0 }: Retrieves documents in the "books" category,
returning only the "title" and "author" fields, excluding _id.
4. Update Query:
{ "status": "pending" }, { "$set": { "status": "completed" } }: Updates documents with a "pending" status,
changing it to "completed".
5. Array Query:
{ "tags": { "$in": ["technology", "innovation"] } }: Finds documents where the tags array contains either
"technology" or "innovation".

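To make the meaning of these filters concrete, the following sketch evaluates them against a list of Python dictionaries. It is not MongoDB's implementation and covers only equality, `$lt`, `$in`, and inclusion-style projection; the helper names `matches`, `project`, and `find`, and the sample documents, are assumptions for illustration.

```python
def matches(document, query):
    """Return True if the document satisfies every condition in the filter."""
    for field, condition in query.items():
        value = document.get(field)
        if isinstance(condition, dict):          # operator form, e.g. {"$lt": 500}
            for operator, operand in condition.items():
                if operator == "$lt":
                    if value is None or not value < operand:
                        return False
                elif operator == "$in":
                    # For array fields, any element found in the operand list matches.
                    candidates = value if isinstance(value, list) else [value]
                    if not any(item in operand for item in candidates):
                        return False
                else:
                    raise ValueError(f"operator {operator} not covered by this sketch")
        elif value != condition:                 # plain equality
            return False
    return True

def project(document, fields):
    """Keep only the fields marked with 1 (inclusion-style projection)."""
    return {name: document[name] for name, keep in fields.items() if keep and name in document}

def find(collection, query, projection=None):
    found = [doc for doc in collection if matches(doc, query)]
    return [project(doc, projection) for doc in found] if projection else found

products = [
    {"_id": "1", "category": "electronics", "price": 450, "tags": ["technology"]},
    {"_id": "2", "category": "electronics", "price": 900, "tags": ["innovation", "luxury"]},
    {"_id": "3", "category": "books", "title": "Example Title", "author": "A. Author", "tags": []},
]

cheap = find(products, {"category": "electronics", "price": {"$lt": 500}})
books = find(products, {"category": "books"}, {"title": 1, "author": 1, "_id": 0})
tagged = find(products, {"tags": {"$in": ["technology", "innovation"]}})
```

Conditions on different fields are combined with an implicit AND, which is why the first query returns only the electronics item priced under 500.
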
4. Explain the three methods for scaling graph databases, along with a clear diagram
to illustrate each approach
Ans:
Graph databases can be scaled using three primary methods: vertical scaling, horizontal scaling, and
hybrid scaling. Each approach has its strengths and is suited for specific use cases.
1. Vertical Scaling (Scaling Up):
o Involves upgrading the hardware of a single machine where the graph database is hosted. This
includes adding more CPU, RAM, or storage capacity.
o Simple to implement.
o No changes required to the database architecture.
o Limited by the maximum hardware capacity.
o Not cost-effective for extremely large graphs.
2. Horizontal Scaling (Sharding or Scaling Out):
o Involves distributing the graph data across multiple machines (nodes). The graph is partitioned
based on nodes, edges, or subgraphs, and each partition is stored on a separate server.
o Allows handling of very large graphs.
o Increases fault tolerance and redundancy.
o Complex to implement due to the need for efficient partitioning and querying.
o Traversals across partitions can be slower.
3. Hybrid Scaling (Combined Approach):
o Combines both vertical and horizontal scaling. Initially, hardware is upgraded (vertical scaling) to
optimize performance, followed by distributing the graph across multiple nodes (horizontal scaling)
to handle larger data sets.
o Balances the simplicity of vertical scaling with the capacity of horizontal scaling.
o Provides a more flexible solution for handling both small and large-scale graphs.
o Higher cost and complexity compared to individual scaling methods

5. Explain the concepts of scaling and application-level sharding of nodes, and provide
a clear diagram to illustrate these concepts.

Ans:
o Unlike aggregate-oriented NoSQL databases, graph databases are relationship-oriented, making sharding
difficult. Since any node can connect to any other node, storing related nodes on the same server is crucial
for efficient graph traversal.
o Traversing graphs across different machines leads to performance bottlenecks.
o One technique involves increasing the RAM on a single server to store the working set of nodes and
relationships entirely in memory.
o This is effective only when the dataset size can realistically fit within the available RAM on a single
machine.
o For large datasets, read scaling can be achieved by using a master-slave architecture. All write operations
are handled by the master, while multiple slaves are used for read-only access. This ensures that read
queries are distributed across the slaves, reducing the load on the master.
o Adding more slaves improves the database's read performance. Each slave contains a replica of the data,
allowing the system to handle higher read workloads efficiently.
o This is especially useful when the dataset cannot fit into a single machine’s memory but can be replicated
across multiple machines.
o Slaves contribute to system availability by providing read-only access even if the master node experiences
downtime. These slaves are configured to never become masters, ensuring stable read access without
conflicting write operations.
o These scaling techniques depend on the size of the dataset and infrastructure limitations. If the dataset fits
in RAM, vertical scaling (adding more RAM) is sufficient. For larger datasets, the master-slave pattern
provides a balance between availability and scalability, leveraging replication effectively.

o When the dataset is too large for replication to be practical, the data can be sharded from the
application side using domain-specific knowledge.
o For example, nodes that relate to North America can be created on one server, while nodes that
relate to Asia are created on another.
o With application-level sharding, the application must be aware that nodes are stored on physically
different databases.
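
A minimal sketch of application-level sharding follows, assuming the region of each node is known to the application. Each Python dictionary stands in for a separate graph database server; the class `RegionShardedGraph`, the region names, and the node ids are hypothetical.

```python
class RegionShardedGraph:
    """Application-side router: the databases themselves are unaware of the sharding."""

    def __init__(self, regions):
        # One dict per region stands in for a separate graph database server.
        self.servers = {region: {} for region in regions}

    def create_node(self, node_id, region, **props):
        # Domain knowledge (the node's region) picks the server.
        self.servers[region][node_id] = {"region": region, **props}

    def get_node(self, node_id, region):
        # The caller must know which shard to ask; there is no global index here.
        return self.servers[region].get(node_id)

    def find_everywhere(self, node_id):
        # Cross-shard lookups fan out to every server, which is the costly case.
        for server in self.servers.values():
            if node_id in server:
                return server[node_id]
        return None


cluster = RegionShardedGraph(["north_america", "asia"])
cluster.create_node("customer:1", "north_america", name="Alice")
cluster.create_node("customer:2", "asia", name="Bao")
```

The routing decision lives entirely in application code, so a relationship between a North America node and an Asia node cannot be traversed inside one database and has to be resolved by the application.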
6. Explain a two-stage Map-Reduce process with an example, and include a clear, detailed
diagram. (Skip)
7. With a neat diagram, explain the three ways in which graph databases can be
scaled. (Same as 4)

1. How do consistency, transactions, and availability apply to graph databases?


Ans:
1. Consistency:
o Graph databases make sure that all the connections between nodes (data
points) are correct and valid.
o On a single server, data is always consistent, meaning everything is up-to-date.
o In a cluster of servers, consistency is eventual. For example, in
Neo4j, a write made on the master server is eventually propagated to
all other servers.
o Nodes can’t be deleted if they are still connected to other nodes, making sure
data integrity is preserved.
2. Transactions:
o In graph databases, changes like adding or updating nodes must be done inside
a transaction, which is like a set of instructions that must be completed fully.
o If a transaction is not successful, the changes are not saved, keeping the data
safe from errors.
o To make sure the changes are final, the transaction has to be marked as
"successful" before it ends.
o Some operations, like reading data, don't need a transaction, but writing or
changing data does.
3. Availability:
o Graph databases like Neo4j keep data highly available by replicating it
to multiple servers (slaves).
o These slave servers can handle read requests and even write data, which is
then updated on the master server.
o If the master server goes down, the database automatically chooses a new
master to ensure the system keeps running without downtime.
o Tools like Apache ZooKeeper help manage these servers and make sure
everything stays in sync.
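
The transaction rules above (writes happen inside a transaction, nothing is saved unless the transaction is marked successful, and a node with relationships cannot be deleted) can be illustrated with a toy model. This is not Neo4j's API; `ToyGraph`, `Transaction`, and their methods are hypothetical names for a sketch of the behaviour.

```python
class ToyGraph:
    def __init__(self):
        self.nodes = {}          # node id -> properties
        self.edges = set()       # (source id, target id)

    def begin(self):
        return Transaction(self)


class Transaction:
    """Buffers writes and applies them only if success() was called before exit."""

    def __init__(self, graph):
        self.graph = graph
        self.pending = []
        self.marked_successful = False

    def create_node(self, node_id, **props):
        self.pending.append(("create_node", node_id, props))

    def create_edge(self, source, target):
        self.pending.append(("create_edge", source, target))

    def delete_node(self, node_id):
        self.pending.append(("delete_node", node_id, None))

    def success(self):
        self.marked_successful = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None and self.marked_successful:
            self._apply()
        return False             # never swallow exceptions

    def _apply(self):
        # Work on copies so that a failing step leaves the graph untouched.
        nodes, edges = dict(self.graph.nodes), set(self.graph.edges)
        for action, a, b in self.pending:
            if action == "create_node":
                nodes[a] = b
            elif action == "create_edge":
                if a not in nodes or b not in nodes:
                    raise ValueError("an edge needs both a start and an end node")
                edges.add((a, b))
            elif action == "delete_node":
                if any(a in edge for edge in edges):
                    raise ValueError("cannot delete a node that still has relationships")
                nodes.pop(a, None)
        self.graph.nodes, self.graph.edges = nodes, edges


graph = ToyGraph()
with graph.begin() as tx:
    tx.create_node("martin", name="Martin")
    tx.create_node("pramod", name="Pramod")
    tx.create_edge("martin", "pramod")
    tx.success()

with graph.begin() as tx:
    tx.create_node("forgotten")      # success() is never called, so this is discarded
```

After both blocks run, the graph holds only the two nodes and one edge from the first transaction. A later attempt to delete "martin" inside a successful transaction would raise an error, because the node still has a relationship.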
2. List and explain use cases where graph databases are very useful.

Ans:

Connected Data:

a. Graph databases work well when data is highly connected, like social
networks or business relationships.
b. They can link data from different areas (social, commerce, location) to make
relationships more meaningful.
c. Example: Connecting people, companies, or products across various fields and
finding useful connections.

Routing, Dispatch, and Location-Based Services:

d. Locations (like addresses or places) are represented as nodes in a graph, and
relationships between them can represent paths.
e. Distance and location data help optimize delivery routes or recommend nearby
places like restaurants or stores.
f. Example: Delivery services can find the best route, and location-based apps
can notify users when they are near interesting places.
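
As a sketch of the routing use case, the example below treats locations as nodes and roads as weighted relationships, then finds the shortest route with Dijkstra's algorithm. The algorithm choice, the function name `shortest_route`, and the sample road network are assumptions for illustration; a graph database would typically expose its own path-finding facilities.

```python
import heapq

def shortest_route(roads, start, goal):
    """Dijkstra's algorithm over a dict: location -> {neighbour: distance}."""
    queue = [(0, start, [start])]          # (distance so far, location, path taken)
    visited = set()
    while queue:
        distance, location, path = heapq.heappop(queue)
        if location == goal:
            return distance, path
        if location in visited:
            continue
        visited.add(location)
        for neighbour, leg in roads.get(location, {}).items():
            if neighbour not in visited:
                heapq.heappush(queue, (distance + leg, neighbour, path + [neighbour]))
    return None                            # goal is unreachable

roads = {
    "depot":    {"market": 4, "station": 2},
    "station":  {"market": 1, "customer": 7},
    "market":   {"customer": 3},
    "customer": {},
}
route = shortest_route(roads, "depot", "customer")
```

The direct depot-to-market road costs 4, but going through the station costs 3, so the best delivery route is depot, station, market, customer with a total distance of 6.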

Recommendation Engines:

g. Graph databases suggest products, services, or places based on relationships,
like "friends also bought this" or "people who visited this place also liked
another."
h. As more data is added, recommendations become faster and more accurate by
analyzing relationships.
i. Example: Online stores can suggest products based on customer behavior, and
travel apps can recommend tourist spots based on others' visits.
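
A "customers who bought this also bought" recommendation is a two-step traversal: from the product to its buyers, then from those buyers to their other purchases. The sketch below shows the idea with plain dictionaries; the function name `also_bought` and the purchase data are hypothetical.

```python
from collections import Counter

def also_bought(purchases, product, limit=3):
    """Rank other products by how many buyers of `product` also bought them."""
    buyers = {customer for customer, items in purchases.items() if product in items}
    counts = Counter()
    for customer in buyers:
        # Walk customer -> product edges, skipping the product we started from.
        counts.update(item for item in purchases[customer] if item != product)
    # Sort by count (descending), then by name, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:limit]]

purchases = {                      # customer -> set of products (BOUGHT edges)
    "anna":    {"laptop", "mouse", "bag"},
    "barbara": {"laptop", "mouse"},
    "carol":   {"laptop", "bag", "dock"},
    "dawn":    {"phone", "case"},
}
suggestions = also_bought(purchases, "laptop")
```

Adding more purchase relationships changes the counts directly, which is why recommendations improve as more data is connected.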

3. Elaborate on the suitable use cases of document databases. When are document
databases not suitable? Explain.
Ans:
Suitable Use Cases of Document Databases:
1. Event Logging:
o Can store diverse event types from various applications.


o Suitable for capturing evolving event data over time.
o Allows events to be categorized by type (e.g., order_processed,
customer_logged).
2. Content Management Systems (CMS) and Blogging Platforms:
o Ideal for managing dynamic, unstructured content like blogs, articles, and
comments.
o Handles flexible, changing content models (e.g., user profiles, metadata).
o Works well with JSON-like documents to store content and related data.
3. Web Analytics or Real-Time Analytics:
o Perfect for tracking and analyzing real-time data (e.g., page views, unique
visitors).
o Easily accommodates new metrics or data updates without schema changes.
o Supports dynamic querying for real-time reporting and analysis.
4. E-Commerce Applications:
o Allows flexible schema for storing product catalogs, customer orders, and
transactions.
o Easily adapts to evolving product attributes or business requirements.
o Avoids the need for expensive schema changes or data migrations.

When Document Databases Are Not Suitable:


1. Complex Transactions Spanning Different Operations:
o Not ideal for applications needing atomic, cross-document transactions.
o Some databases like RavenDB support multi-document transactions, but it's
not common.
o Incompatible with systems requiring full ACID compliance for complex
operations (e.g., banking or financial applications).
2. Queries Against Varying Aggregate Structures:
o Difficult to perform efficient queries when document structures change
frequently.
o Complex aggregations or joins across documents can be inefficient.
o Not suitable for applications requiring strict, consistent query structures or
schema enforcement.

4. Explain graph database. With a neat diagram, explain relationship with properties in a
graph. (Skip)

5. Explain Query features in detail with examples


6. Illustrate the differences between SQL queries and their equivalent commands
in the MongoDB shell?
| Operation | SQL Command | MongoDB Command | Explanation |
|---|---|---|---|
| Select All Data | SELECT * FROM users; | db.users.find(); | Fetches all records from the "users" table/collection. |
| Select Specific Columns | SELECT name, age FROM users; | db.users.find({}, {name: 1, age: 1}); | Selects only the name and age fields from "users" (MongoDB also returns _id unless it is excluded with _id: 0). |
| Insert Data | INSERT INTO users (name, age) VALUES ('John', 30); | db.users.insertOne({name: 'John', age: 30}); | Inserts a new user into the "users" table/collection. |
| Update Data | UPDATE users SET age = 31 WHERE name = 'John'; | db.users.updateOne({name: 'John'}, {$set: {age: 31}}); | Updates age for a user named "John" (updateOne changes only the first match; updateMany changes every match, as the SQL statement does). |
| Delete Data | DELETE FROM users WHERE name = 'John'; | db.users.deleteOne({name: 'John'}); | Deletes a user with the name "John" (deleteOne removes only the first match; deleteMany removes every match). |
| Count Records | SELECT COUNT(*) FROM users; | db.users.countDocuments({}); | Counts the number of records in the "users" table/collection. |
| Find with Condition | SELECT * FROM users WHERE age > 25; | db.users.find({age: {$gt: 25}}); | Finds all users where age is greater than 25. |
| Sorting Data | SELECT * FROM users ORDER BY age DESC; | db.users.find().sort({age: -1}); | Sorts the users by age in descending order. |
• SQL queries are based on relational databases and follow a structured format using
tables.
• MongoDB queries are used with non-relational, document-based databases and
follow a more flexible, JSON-like syntax
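
To see the two styles side by side in runnable form, the sketch below executes the SQL statements against an in-memory SQLite table and mirrors the corresponding MongoDB operations on a plain list of dictionaries. The list is only a stand-in for a collection (no MongoDB server or driver is used), and the extra sample users are assumptions for the example.

```python
import sqlite3

# Relational side: an in-memory SQLite table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT, age INTEGER)")
db.executemany("INSERT INTO users (name, age) VALUES (?, ?)",
               [("John", 30), ("Asha", 22), ("Maria", 41)])
db.execute("UPDATE users SET age = 31 WHERE name = 'John'")
sql_rows = db.execute(
    "SELECT name, age FROM users WHERE age > 25 ORDER BY age DESC").fetchall()
sql_count = db.execute("SELECT COUNT(*) FROM users").fetchone()[0]

# Document side: a list of dicts stands in for the "users" collection.
users = [{"name": "John", "age": 30}, {"name": "Asha", "age": 22}, {"name": "Maria", "age": 41}]
for doc in users:                      # like updateOne({name: 'John'}, {$set: {age: 31}})
    if doc["name"] == "John":
        doc["age"] = 31
        break                          # updateOne stops at the first match
matched = [doc for doc in users if doc["age"] > 25]                    # find({age: {$gt: 25}})
doc_rows = sorted(matched, key=lambda doc: doc["age"], reverse=True)   # .sort({age: -1})
doc_count = len(users)                                                 # countDocuments({})
```

Both sides end up with Maria (41) followed by John (31) and a total count of 3. The difference is that the table needed its columns declared first, while the documents carry their own field names.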

7. What are graph databases, and can you explain them with an example of a graph
structure?

Ans:
What is Graph Data?
Graph data represents information as a collection of entities (known as nodes) and
relationships between those entities (known as edges). In graph databases, these nodes
and edges are used to represent complex, interconnected data. The main advantage of
graph databases is their ability to easily model and explore the relationships between
data points.

• Nodes (Entities): These are individual objects or entities. For example, in a social
network, nodes could represent users.
• Edges (Relationships): These represent connections between nodes. For example, in
the social network, edges could represent friendships or interactions between users.
• Properties: Both nodes and edges can have properties, which are additional details
about them. For example, a node representing a user could have properties like name,
age, and location. An edge connecting two users could have properties like since
(indicating when they became friends).

8. Briefly Explain scaling feature in document databases, with a neat diagram.


Scaling in Document Databases (MongoDB)
1. Horizontal Scaling for Reads:
o To handle heavy read loads, more read slaves (secondary nodes) are added to
a replica set.
o Example: Adding a new node (mongo D) to a 3-node replica set to handle
more reads.
o No downtime: The application continues running as new nodes are added and
sync with the existing ones.
2. Horizontal Scaling for Writes (Sharding):
o To scale for writes, sharding is used to distribute data across multiple nodes.
o Sharding splits data by a specific field (e.g., customer firstname) and
distributes it across different nodes.
o Example:
db.runCommand({ shardCollection: "ecommerce.customer", key: {firstname: 1} })
3. Data Balancing:
o MongoDB automatically balances data between nodes to optimize
performance.
o When new nodes are added, data is rebalanced across them.
4. Replica Sets in Sharded Clusters:
o Each shard in a sharded cluster can be a replica set to ensure better read
performance within the shard.
5. Sharding Based on Location:
o Data can be sharded based on user location, so data is closer to the users (e.g.,
East Coast users' data in East Coast shards).
o This ensures low latency and better performance for geographically distributed
users.
6. No Downtime During Scaling:
o Adding nodes and rebalancing data can be done without taking the application
offline, though performance may temporarily decrease during large data
movements.

9. Define Key-value stores. Explain the data storage in Riak with limitations and
solutions to overcome the limitation.
Ans:
Key-Value Stores:

• A key-value store is a simple NoSQL database where each data element is stored as a
key-value pair.

• The key is a unique identifier, and the value is the data associated with that key.

• Operations in key-value stores generally involve three basic commands:

o GET: Retrieve the value for a given key.

o PUT: Insert or update the value for a given key.

o DELETE: Remove the key-value pair.

• Data in the value part can be of any format, such as text, JSON, or binary data, with
the database not concerned with the contents of the value.

Data Storage in Riak:

• Buckets: Riak organizes data into buckets, which act as namespaces for keys.

• Keys and Values: Within each bucket, data is stored as key-value pairs. The key is
unique within the bucket, and the value can be any data type, such as text, JSON, or
binary data.

• Replication: Riak replicates data across multiple nodes to ensure availability and
fault tolerance. The number of replicas is configurable.

• Consistency: Riak offers tunable consistency levels, allowing users to balance
between consistency and availability based on application needs.

• Conflict Resolution: In scenarios where multiple versions of a value exist due to
concurrent writes, Riak provides mechanisms to handle conflicts, such as vector
clocks and sibling resolution.

• Sharding: Riak automatically distributes data across nodes using consistent hashing,
ensuring even data distribution and scalability.

• Secondary Indexes: Riak supports secondary indexes, enabling efficient querying
based on non-primary key attributes.

• MapReduce: Riak provides MapReduce capabilities for performing complex queries
and data processing tasks.
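
The combination of consistent hashing and replication can be sketched as a hash ring on which each bucket/key pair is stored on the next few distinct nodes. This is a simplified illustration, not Riak's actual implementation (production systems typically place many virtual nodes per server on the ring); the class `HashRing`, the node names, and the replica count are assumptions.

```python
import hashlib
from bisect import bisect_right

def ring_position(text):
    """Map a string to a point on the ring using a stable hash."""
    return int(hashlib.sha1(text.encode("utf-8")).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        self.ring = sorted((ring_position(node), node) for node in nodes)
        self.points = [position for position, _ in self.ring]

    def preference_list(self, bucket, key):
        """Return the nodes that should hold replicas of bucket/key."""
        start = bisect_right(self.points, ring_position(f"{bucket}/{key}"))
        owners = []
        for step in range(len(self.ring)):
            # Walk clockwise from the key's position, wrapping around the ring.
            node = self.ring[(start + step) % len(self.ring)][1]
            if node not in owners:
                owners.append(node)
            if len(owners) == self.replicas:
                break
        return owners

ring = HashRing(["node-a", "node-b", "node-c", "node-d", "node-e"], replicas=3)
owners = ring.preference_list("users", "john")
```

Because the key's position depends only on its hash, any client can compute the same owner list without a central lookup, and losing one node still leaves the other replicas reachable.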

Limitations of Riak:

1. Eventual Consistency: Data may not be consistent across all nodes immediately after
a write.

2. Limited Querying: Access is primarily by key. The store does not filter by the contents of a
value, so anything beyond key lookup relies on secondary indexes or MapReduce.

3. No Full ACID Transactions: Lacks full support for ACID transactions like relational
databases.

4. Conflict Resolution: Resolving write conflicts can be complex and requires manual
handling.

5. Sharding Issues: Data can become unavailable if a specific node in the cluster fails.

6. Performance Overhead: Replication and consistency checks can slow down write
operations.
