2-Spring24 NoSQL Systems

Uploaded by

sabakabha

NoSQL Systems

RDBMS Databases

● Good for handling transactional workloads involving small amounts of data with random read/write properties.
● Are ACID-compliant: they guarantee atomicity, consistency, isolation, and durability.
○ They are generally restricted to a single node.
○ They do not provide out-of-the-box redundancy and fault tolerance.
● To handle large volumes of data, RDBMSs employ vertical scaling, which is a more costly scaling strategy than horizontal scaling.
○ This makes RDBMSs less than ideal for long-term storage of data that accumulates over time.
● Relational databases need to be manually sharded, mostly using application logic.
○ This means that the application logic needs to know which shard to query in order to get the required data.
○ This further complicates data processing when data from multiple shards is required.
● Application logic must then be used to join the data retrieved from multiple shards.
● Relational databases generally require data to adhere to a schema.
○ Semi-structured and unstructured data are not directly supported.
● A traditional RDBMS is generally not useful as the primary storage device in a Big Data solution environment.
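The manual sharding described above can be sketched in plain Python. This is only an illustration under stated assumptions: the shards are in-memory dictionaries, and the routing rule (`key % NUM_SHARDS`) and every function name are hypothetical, not part of any real RDBMS API. It shows that the application has to know which shard holds a key and has to perform the cross-shard join itself.

```python
# Hypothetical in-memory "shards"; a real deployment would use separate database servers.
NUM_SHARDS = 3
customer_shards = [dict() for _ in range(NUM_SHARDS)]
order_shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(key):
    """Application-side routing rule (an assumption): pick a shard from the key."""
    return key % NUM_SHARDS


def put_customer(customer_id, name):
    customer_shards[shard_for(customer_id)][customer_id] = {"id": customer_id, "name": name}


def put_order(order_id, customer_id, total):
    order_shards[shard_for(order_id)][order_id] = {
        "id": order_id, "customer_id": customer_id, "total": total}


def orders_with_customer_names():
    """Cross-shard join performed entirely in application code."""
    joined = []
    for shard in order_shards:
        for order in shard.values():
            # The application must route the lookup to the right customer shard.
            customers = customer_shards[shard_for(order["customer_id"])]
            customer = customers[order["customer_id"]]
            joined.append((order["id"], customer["name"], order["total"]))
    return sorted(joined)


put_customer(1, "Ali")
put_customer(2, "Zaid")
put_order(10, 1, 25.0)
put_order(11, 2, 40.0)
put_order(12, 1, 15.5)
print(orders_with_customer_names())
```

Every query that spans shards needs code like `orders_with_customer_names`, which is the complication the slides point out.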
Types of NoSQL Systems

1. Key-value Database
2. Document-oriented Database
3. Column-oriented Database
4. Graph Database
Key-value Database

● One of the simplest NoSQL databases.
● Data is represented as a collection of <key,value> pairs.
● It works by storing buckets of <key,value> pairs in a logical way in which all relevant data relating to an item are stored within that item.
● A key can have a dynamic set of attributes attached to it.
● Fast response time.
● Ability to store an enormous number of records with extremely low latency.
● Typically provides built-in maintenance and failover services.
● Some examples of this type of database are Redis, Riak, Amazon DynamoDB, and Voldemort.
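As a rough illustration of the key-value model, the sketch below keeps buckets of <key,value> pairs in nested Python dictionaries. The `KeyValueStore` class and its methods are hypothetical and do not mirror the API of Redis, Riak, DynamoDB, or Voldemort. The value is stored as opaque bytes, so the store can only look records up by key.

```python
import json


class KeyValueStore:
    """Toy key-value store: bucket name -> {key: opaque value}."""

    def __init__(self):
        self._buckets = {}

    def put(self, bucket, key, value):
        # The store never inspects the value; it only files it under the key.
        self._buckets.setdefault(bucket, {})[key] = value

    def get(self, bucket, key, default=None):
        return self._buckets.get(bucket, {}).get(key, default)

    def delete(self, bucket, key):
        # Returns True if the key existed and was removed.
        return self._buckets.get(bucket, {}).pop(key, None) is not None


store = KeyValueStore()
# All data relating to the item is serialized into the single value.
store.put("users", "u1", json.dumps({"name": "Ali", "age": 25}).encode("utf-8"))

raw = store.get("users", "u1")
user = json.loads(raw.decode("utf-8"))  # only the application can interpret the value
print(user["name"])
```

Because the value is just bytes to the store, a question such as "all users older than 20" cannot be answered without reading and decoding every value in the application.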
DOCUMENT-ORIENTED DATABASE

● A document-oriented database extends the concept of a key-value database by employing flexible data structures.
● Stores records as "documents".
● Supports nested and complex document structures to define subcategories of information.
● The data values in a key-value database are opaque to the store, whereas the data values in a document-oriented database are transparent to the store.
● Strengths
○ Lower cost of scaling out compared to a SQL database.
○ Can index the fields of documents, which allows the user to query not only by the primary key but also by a document's contents.
○ Schemaless: the user is completely free to define the contents of a document.
● Limitations
○ Generally not suitable for business transaction applications.
○ Does not offer any referential integrity support.
○ Generally does not offer joins across collections.
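Because document values are transparent to the store, it can filter on their contents, including nested fields. The following plain-Python sketch illustrates the idea with a list of dictionaries; the helper names `get_field` and `find` and the sample documents are hypothetical and are not the implementation of any real document database.

```python
def get_field(doc, dotted_path):
    """Follow a dotted path such as 'address.city' into nested dictionaries."""
    current = doc
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None  # simplification: a missing field is treated as None
        current = current[part]
    return current


def find(docs, criteria):
    """Return the documents whose fields equal every value in `criteria`."""
    return [d for d in docs
            if all(get_field(d, path) == value for path, value in criteria.items())]


# Schemaless: the two documents do not share the same set of fields.
documents = [
    {"_id": 1, "name": "Ali", "address": {"city": "AMMAN", "number": 77}},
    {"_id": 2, "name": "Sara", "phones": ["0790000000"]},
]

print(find(documents, {"address.city": "AMMAN"}))  # query by content, not by key
```

A real document store would normally answer such queries from an index on the field rather than by scanning every document as this sketch does.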
MongoDB

● Data Representation:
○ MongoDB is a document-style database.
○ A document is analogous to a row in an RDBMS.
○ A collection is a group of documents, analogous to a table in an RDBMS.
○ Documents in MongoDB are represented in a JavaScript Object Notation (JSON)-like format and stored internally as BSON (binary JSON).
● Indexing and Sharding:
○ Documents can be indexed on their fields for faster access and retrieval.
○ Sharding (or index sharding) is the process of splitting a database across multiple machines.
○ MongoDB incorporates auto-sharding, through which a MongoDB cluster can split data and re-balance automatically.
● Automatic sharding benefits:
○ Automatic balancing of data.
○ Scaling out with minimal downtime, i.e., new hosts can be added.
○ Replication to avoid a single point of failure.
● A shard consists of one or more servers that contain the subset of data that it is responsible for.
● If there is more than one server in a shard, the shard may also contain replicated data.

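Example
MongoDB + Python

# Install the driver (notebook shell command)
! python -m pip install pymongo==3.7.2

import pymongo
from pymongo import MongoClient

# Connect to a MongoDB server on the default host and port (localhost:27017)
client = MongoClient()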
# Select database db1 (MongoDB creates it lazily on the first write)
mydb = client["db1"]

# Create the collection explicitly
mydb.create_collection('addressbook')

# Set the collection to work with
collection = mydb.addressbook

# Insert one document
collection.insert_one({'name': 'Ali'})

# Show the documents in the collection
list(collection.find())
# Insert a document with a nested sub-document and an array
data = {
    'name': "Ali",  # String
    'age': 25,  # Integer
    'gender': "M",  # String
    'address': {  # Nested document
        'street': "ahmad tarawnwh",  # String
        'number': 77,  # Integer
        'city': "AMMAN",  # String
        'floor': None,  # Null
        'postalcode': "11910",  # String containing a number
    },
    'favouriteFruits': ['banana', 'pineapple', 'orange'],  # Array
}
collection.insert_one(data)
# Find all documents
list(collection.find())

# Find the documents whose name is "Ali"
list(collection.find({'name': "Ali"}))

# Projection: select only some fields (_id is included by default)
list(collection.find({}, {'name': 1, 'age': 1}))

# Projection: exclude some fields
list(collection.find({}, {'name': 0, 'age': 0}))

# Projection: select only some fields and exclude _id
list(collection.find({}, {'name': 1, 'address.city': 1, '_id': 0}))
Comparison Query Operators

# Age less than 30
list(collection.find({'age': {'$lt': 30}}))

# Age less than 30, returning only name and age
list(collection.find({'age': {'$lt': 30}}, {'name': 1, 'age': 1, '_id': 0}))

# Age greater than or equal to 25
list(collection.find({'age': {'$gte': 25}}, {'name': 1, 'age': 1, '_id': 0}))

# $in operator: age equals one of the listed values (20 or 30)
list(collection.find({'age': {'$in': [20, 30]}}, {'name': 1, 'age': 1, '_id': 0}))

# $nin operator: age equals none of the listed values
list(collection.find({'age': {'$nin': [20, 30]}}, {'name': 1, 'age': 1, '_id': 0}))
Logical Query Operators

# $and: name is "Ali" and age is less than 30
list(collection.find(
    {'$and': [{'name': "Ali"}, {'age': {'$lt': 30}}]},
    {'name': 1, 'age': 1, '_id': 0}))

# $and: age is greater than 15 and less than 30
list(collection.find(
    {'$and': [{'age': {'$gt': 15}}, {'age': {'$lt': 30}}]},
    {'name': 1, 'age': 1, '_id': 0}))

# Same condition written as an implicit AND on a single field
list(collection.find(
    {'age': {'$gt': 15, '$lt': 30}},
    {'name': 1, 'age': 1, '_id': 0}))
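To make the semantics of the comparison and logical operators above concrete without a running server, here is a small plain-Python matcher. It is a sketch, not MongoDB's implementation: the `matches` function, the `COMPARISONS` table, and the sample `people` list are hypothetical, and documents that lack the queried field are simply treated as non-matching (MongoDB itself behaves differently for `$ne` and `$nin`).

```python
import operator

# Operator name -> Python comparison, covering the operators used in the examples.
COMPARISONS = {
    "$eq": operator.eq, "$ne": operator.ne,
    "$gt": operator.gt, "$gte": operator.ge,
    "$lt": operator.lt, "$lte": operator.le,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches(doc, query):
    """Return True if `doc` satisfies every condition in `query`."""
    for key, condition in query.items():
        if key == "$and":
            if not all(matches(doc, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(doc, sub) for sub in condition):
                return False
        elif key not in doc:
            return False  # simplification, see the note above
        elif isinstance(condition, dict):
            # Several operators on one field are combined with an implicit AND.
            if not all(COMPARISONS[op](doc[key], operand)
                       for op, operand in condition.items()):
                return False
        elif doc[key] != condition:  # plain equality such as {'name': 'Ali'}
            return False
    return True


people = [{"name": "Ali", "age": 25}, {"name": "Sara", "age": 30}, {"name": "Omar", "age": 15}]

explicit = [p["name"] for p in people
            if matches(p, {"$and": [{"age": {"$gt": 15}}, {"age": {"$lt": 30}}]})]
implicit = [p["name"] for p in people if matches(p, {"age": {"$gt": 15, "$lt": 30}})]
print(explicit, implicit)
```

Both query forms select the same documents, which is why the explicit `$and` can be dropped when the conditions apply to a single field.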
Sorting

# Sort by age in descending order
list(collection.find({}, {'name': 1, 'age': 1, '_id': 0}).sort('age', -1))

# Sort by name ascending, then by age descending
list(collection.find({}, {'name': 1, 'age': 1, '_id': 0}).sort(
    [('name', pymongo.ASCENDING), ('age', pymongo.DESCENDING)]))

# Equivalent form using numeric directions:
# .sort([('name', 1), ('age', -1)])


Aggregation Operations

● You can use aggregation operations to:
○ Group values from multiple documents together.
○ Perform operations on the grouped data to return a single result.
○ Analyze data changes over time.
● To perform aggregation operations, you can use:
○ Aggregation pipelines
○ Single-purpose aggregation methods
Aggregation Pipeline

A pipeline consists of one or more stages that process documents. Each stage performs an operation on its input documents and passes the result to the next stage.

Sample stage operations:
● $project – select fields for the output documents.
● $match – select documents to be processed.
● $sort – sort documents.
● $group – group documents by a specified key.
● …
Example

# Create a collection of student records
mydb.create_collection('stdinfo')
std_collection = mydb.stdinfo
data = [
    {'name': 'ali', 'gpa': 90, 'prog': "CS"},
    {'name': 'zaid', 'gpa': 88, 'prog': "DS"},
    {'name': 'ahmed', 'gpa': 70, 'prog': "SE"},
    {'name': 'maryam', 'gpa': 68, 'prog': "SE"},
    {'name': 'fatema', 'gpa': 87, 'prog': "DS"},
    {'name': 'kareem', 'gpa': 77, 'prog': "CS"},
]

std_collection.insert_many(data)
list(std_collection.find())
# Average GPA per program
list(std_collection.aggregate([
    {'$group': {'_id': '$prog', 'avgGPA': {'$avg': "$gpa"}}},
]))

# Average GPA per program, highest average first
list(std_collection.aggregate([
    {'$group': {'_id': '$prog', 'avgGPA': {'$avg': "$gpa"}}},
    {'$sort': {'avgGPA': -1}},
]))

# Keep only the programs whose average GPA is above 70
list(std_collection.aggregate([
    {'$group': {'_id': '$prog', 'avgGPA': {'$avg': "$gpa"}}},
    {'$match': {'avgGPA': {'$gt': 70}}},
    {'$sort': {'avgGPA': -1}},
]))

# Filter the documents first ($match), then group and sort
list(std_collection.aggregate([
    {'$match': {'prog': {'$in': ['CS', 'SE']}}},
    {'$group': {'_id': '$prog', 'avgGPA': {'$avg': "$gpa"}}},
    {'$sort': {'avgGPA': -1}},
]))
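The result of the last pipeline can be checked without a MongoDB server by emulating its three stages in plain Python. This is only a sketch of what `$match`, `$group`, and `$sort` compute on the sample data, not how MongoDB executes them; the variable names are illustrative, and the output field is named `avgGPA` here.

```python
from collections import defaultdict

students = [
    {'name': 'ali', 'gpa': 90, 'prog': "CS"},
    {'name': 'zaid', 'gpa': 88, 'prog': "DS"},
    {'name': 'ahmed', 'gpa': 70, 'prog': "SE"},
    {'name': 'maryam', 'gpa': 68, 'prog': "SE"},
    {'name': 'fatema', 'gpa': 87, 'prog': "DS"},
    {'name': 'kareem', 'gpa': 77, 'prog': "CS"},
]

# Stage 1, $match: keep only the CS and SE students.
matched = [s for s in students if s['prog'] in ('CS', 'SE')]

# Stage 2, $group with $avg: collect the GPAs per program and average them.
gpas_by_prog = defaultdict(list)
for s in matched:
    gpas_by_prog[s['prog']].append(s['gpa'])
grouped = [{'_id': prog, 'avgGPA': sum(gpas) / len(gpas)}
           for prog, gpas in gpas_by_prog.items()]

# Stage 3, $sort: order by average GPA, descending.
result = sorted(grouped, key=lambda d: d['avgGPA'], reverse=True)
print(result)  # [{'_id': 'CS', 'avgGPA': 83.5}, {'_id': 'SE', 'avgGPA': 69.0}]
```

Each stage consumes the output of the previous one, which is why placing `$match` first reduces the number of documents that `$group` has to process.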
COLUMN-ORIENTED DATABASE
A column-oriented database stores its content by column as opposed to by row and serializes all of the values of a column together. A columnar database aims to read and write data efficiently to and from hard disk storage in order to reduce the time it takes to return a query result.
Strengths

● High data compression, which helps storage capacity to be used more efficiently.
● Can achieve high query performance on aggregation queries such as AVG, SUM, MAX, MIN, and COUNT.
● More efficient for inserting the values of a single column at once, as these can be written efficiently without affecting any other columns of the rows.
● The quick searching, scanning, and aggregation abilities of column-oriented database storage are highly efficient for analytics.
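The difference between row and column layouts, and why the latter suits aggregation and compression, can be sketched in plain Python. The sample rows and the `run_length_encode` helper are hypothetical illustrations; real column stores use their own on-disk formats and compression schemes.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "prog": "CS", "gpa": 90},
    {"id": 2, "prog": "CS", "gpa": 77},
    {"id": 3, "prog": "DS", "gpa": 88},
    {"id": 4, "prog": "DS", "gpa": 87},
    {"id": 5, "prog": "SE", "gpa": 70},
    {"id": 6, "prog": "SE", "gpa": 68},
]

# Column-oriented layout: all values of one column are serialized together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate such as AVG reads a single column and ignores the others.
avg_gpa = sum(columns["gpa"]) / len(columns["gpa"])


def run_length_encode(values):
    """Compress runs of repeated values into [value, count] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


# Values within one column tend to repeat, so they compress well.
compressed_prog = run_length_encode(columns["prog"])
print(avg_gpa, compressed_prog)
```

In the row layout the same average would require reading every full record, including the fields that the query never uses.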
GRAPH DATABASE
