
B.Sc Data Science V Semester Lab Record

NISHITHA COMMERCE AND SCIENCE COLLEGE

B.Sc Data Science V Semester

No SQL Databases
Lab Manual

Prepared by
S.Jayanth Reddy

No SQL Databases Page 1



B.Sc. III Year V Semester (CBCS) : Data Science Syllabus


(With Mathematics Combination)
(Examination at the end of Semester - V)
Practical – 5(B) : NoSQL Data Bases (Lab)
[3 HPW :: 1 Credit :: 50 Marks]

Objective: The main objective of this lab is to become familiar with the four NoSQL
databases:

1. Redis for key-value databases,
2. MongoDB for document databases,
3. Cassandra for column-family databases, and
4. Neo4j for graph databases
NoSQL Databases:
1. Redis (http://redis.io)
2. MongoDB (http://www.mongodb.org)
3. Cassandra (http://cassandra.apache.org)
4. Neo4j (http://neo4j.com)
Exercises

1. Installation of NoSQL Databases: Redis, MongoDB, Cassandra, Neo4j on Windows & Linux

2. Practice CRUD (Create, Read, Update, and Delete) operations on the four databases: Redis, MongoDB, Cassandra, Neo4j

3. Usage of Where Clause equivalent in MongoDB

4. Usage of operations in MongoDB – AND in MongoDB, OR in MongoDB, Limit Records and Sort Records. Usage of operations in MongoDB – Indexing, Advanced Indexing, Aggregation and Map Reduce.

5. Write a program to count the number of occurrences of a word using MapReduce

6. Practice with 'MacDonald’s' collection data for document oriented database. Import restaurants collection and apply some queries to get specified output.


1. Installation of NoSQL Databases: Redis, MongoDB, Cassandra, Neo4j on Windows & Linux

Installation of Redis:

You can run Redis on Windows 10 using the Windows Subsystem for Linux (WSL2).
WSL2 runs Linux binary executables natively on Windows 10 and Windows Server 2019,
so developers can use a GNU/Linux environment (including command-line tools,
utilities, and applications) directly on Windows.

Follow these instructions to run a Redis database on Microsoft Windows 10.

Step 1: Turn on Windows Subsystem for Linux


In Windows 10, Microsoft replaced Command Prompt with PowerShell as the default
shell. Open PowerShell as Administrator and run this command to enable Windows
Subsystem for Linux (WSL):

Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux

Reboot Windows after making the change — note that you only need to do this once.

Step 2: Launch Microsoft Windows Store


start ms-windows-store:

Then search for Ubuntu, or your preferred distribution of Linux, and download the
latest version.

Step 3: Install Redis server

Installing Redis is simple and straightforward. The following example works with
Ubuntu (you'll need to wait for initialization and create a login upon first use):

sudo apt-add-repository ppa:redislabs/redis


sudo apt-get update
sudo apt-get upgrade
sudo apt-get install redis-server

NOTE

The sudo command may or may not be required based on the user configuration of
your system.


Step 4: Restart the Redis server


Restart the Redis server as follows:

sudo service redis-server restart

Step 5: Verify if your Redis server is running

Use the redis-cli command to test connectivity to the Redis database.

$ redis-cli
127.0.0.1:6379> set user:1 "Jane"
127.0.0.1:6379> get user:1
"Jane"

NOTE

By default, Redis provides 16 databases, indexed 0 to 15. You can change that number
with the databases NUMBER directive in redis.conf.

Step 6: Stop the Redis Server


sudo service redis-server stop


Installation of MongoDB:

Step 1 — Download the MongoDB MSI Installer Package

Go to the MongoDB download page and download the current version of MongoDB. Make sure you select MSI as the package type.

Step 2 — Install MongoDB with the Installation Wizard

A. Make sure you are logged in as a user with Admin privileges. Then navigate to your Downloads folder and double-click the .msi package you just downloaded. This will launch the installation wizard.


B. Click Next to start installation.


C. Accept the licence agreement then click Next.

D. Select the Complete setup.


E. Select “Run service as Network Service user” and make a note of the data directory; we’ll need this later.

F. We won’t need MongoDB Compass, so deselect it and click Next.


G. Click Install to begin installation.


H. Click Finish to complete the installation.

Step 3— Create the Data Folders to Store our Databases


A. Navigate to the C Drive on your computer using Explorer and create a new folder
called data here.


B. Inside the data folder you just created, create another folder called db.

Step 4 — Setup Alias Shortcuts for Mongo and Mongod


Once installation is complete, we’ll need to set up MongoDB on the local system.

A. Open up your Hyper terminal running Git Bash.

B. Change directory to your home directory with the following command:

cd ~

C. Here, we’re going to create a file called .bash_profile using the following
command:

touch .bash_profile

D. Open the newly created .bash_profile with vim using the following command:

vim .bash_profile

E. In vim, hit the I key on the keyboard to enter insert mode.


F. In your explorer go to C → Program Files → MongoDB → Server

Now you should see the version of your MongoDB.


G. Paste the following lines into vim, replacing 4.0 with the version that you see in Explorer:

alias mongod="/c/Program\ Files/MongoDB/Server/4.0/bin/mongod.exe"

alias mongo="/c/Program\ Files/MongoDB/Server/4.0/bin/mongo.exe"

Note: MongoDB 6.0 and later no longer ship mongo.exe. On those versions, install the separate mongosh shell and use it in place of mongo.

H. Press the Escape key to leave insert mode. Then type

:wq!

to save and exit vim.

Step 5 — Verify That Setup was Successful


A. Close down the current Hyper terminal and quit the application.

B. Re-launch Hyper.

C. Type the following commands into the Hyper terminal:

mongo --version

Once you’ve hit enter, you should see something like this:


This means that you have successfully installed and set up MongoDB on your local system!


Installation of Cassandra:

Step 1) Run the DataStax Community Edition setup.

After you run the setup, the following page is displayed. The screenshot shows the
64-bit version being installed. A 32-bit version is also available if you need it,
but the 64-bit version is recommended.

This page gives information about the Cassandra version you are going to install.
Press the ‘Next’ button.

Step 2) Accept the license agreement.

After you press ‘Next’, the license agreement page is displayed. Mark the checkbox
and press the ‘Next’ button.

Step 3) Choose the installation location.

The next page asks for the installation location.

1. The default location is C:\Program Files. You can change it, but keeping the
default location is recommended.
2. After setting the installation location, press the ‘Next’ button.


Step 4) Start Cassandra and OpsCenter.

After you press ‘Next’ in the previous step, the following page is displayed. It asks
whether you want Cassandra and OpsCenter to start automatically.

1. Mark the checkboxes if you want Cassandra and OpsCenter to start automatically.
2. After providing this information, press the ‘Next’ button.

Step 5) Click the Install button.

The setup has collected all the necessary information and is now ready to
install. Press the ‘Install’ button.


Step 6) Click Next, then Finish.

After you press ‘Install’, the following page is displayed while DataStax Community
Edition is being installed. When the installation completes, click the ‘Next’ button,
then press the ‘Finish’ button.

From the Windows Start menu, search for Cassandra CQL Shell and run it. The shell
opens with the cqlsh command-line prompt.

Now you can create a keyspace, create tables, and write queries.


Installation of Neo4j:

Before you install Neo4j on Windows, check System Requirements to see if your
setup is suitable.

Windows console application

1. If it is not already installed, get OpenJDK 17 or Oracle Java 17.
2. Download the latest release from Neo4j Download Center.
   Select the appropriate ZIP distribution.
3. Make sure to download Neo4j from Neo4j Download Center and always check
   that the SHA hash of the downloaded file is correct:
   a. To find the correct SHA hash, go to Neo4j Download Center and click on SHA-256,
      which is located below your downloaded file.
   b. Using the appropriate commands for your platform, display the SHA-256 hash
      for the file that you downloaded.
   c. Ensure that the two are identical.
4. Right-click the downloaded file and click Extract All.
5. Place the extracted files in a permanent home on your server, for
   example, D:\neo4j\. The top-level directory is referred to as NEO4J_HOME.
6. From Neo4j v5.4 onwards, you are required to accept either the commercial or the
   evaluation license agreement before running the Neo4j Enterprise Edition. If you are
   using Community Edition, you can skip this step.
   • Use one of the following options to accept the commercial license agreement.
     See the Neo4j licensing page for details on the available agreements.
     o Set it as an environment variable using set NEO4J_ACCEPT_LICENSE_AGREEMENT=yes.
     o Run <NEO4J_HOME>\bin\neo4j-admin server license --accept-commercial
   • Use one of the following options to accept the Neo4j Evaluation Agreement
     for Neo4j Software.
     o Set it as an environment variable using set NEO4J_ACCEPT_LICENSE_AGREEMENT=eval.
     o Run <NEO4J_HOME>\bin\neo4j-admin server license --accept-evaluation.
Start Neo4j:
• To run Neo4j as a console application, use: <NEO4J_HOME>\bin\neo4j console.
• To install Neo4j as a service use: <NEO4J_HOME>\bin\neo4j windows-service install.
  For additional commands and to learn about the Windows PowerShell
  module included in the Zip file, see Windows PowerShell module.
Open http://localhost:7474 in your web browser.
Connect using the username neo4j with the default password neo4j. You will then be
prompted to change the password.
Stop the server by typing Ctrl-C in the console.
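
Step 3 of the procedure above asks you to display the SHA-256 hash of the downloaded file and compare it with the published one. As a minimal sketch, assuming Node.js is available on the machine, the comparison could be scripted as follows. The function names are illustrative and are not part of any Neo4j tooling.

```javascript
const crypto = require("crypto");
const fs = require("fs");

// Returns the SHA-256 digest of a file as lowercase hex.
function sha256OfFile(path) {
  // Reading the whole file at once is fine for a one-off check.
  const data = fs.readFileSync(path);
  return crypto.createHash("sha256").update(data).digest("hex");
}

// Compares the computed digest with the published one,
// ignoring letter case and surrounding whitespace in the published value.
function matchesPublishedHash(path, publishedHash) {
  return sha256OfFile(path) === publishedHash.trim().toLowerCase();
}
```

Call matchesPublishedHash with the path of the ZIP file you downloaded and the hash copied from the download page. A result of false means the file should not be used.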


Windows service
Neo4j can also be run as a Windows service. Install the service with bin\neo4j windows-service install, and start it with bin\neo4j start.
The available commands for bin\neo4j are: version, help, console, start, stop, restart, status, and windows-service.
When installing a new release of Neo4j, you must first run bin\neo4j windows-service uninstall on any previously installed versions.

Java options

When Neo4j is installed as a service, Java options are stored in the service
configuration. Changes to these options after the service is installed will not take
effect until the service configuration is updated. For example, changing the
setting server.memory.heap.initial_size in neo4j.conf will not take effect until the
service is updated and restarted. To update the service, run bin\neo4j windows-service
update. Then restart the service to run it with the new configuration.
The same applies to the path to where Java is installed on the system. If the path
changes, for example when upgrading to a new version of Java, it is necessary to run
the windows-service update command and restart the service. Then the new Java location
will be used by the service.
Example 1. Update service example

1. Install service
   bin\neo4j windows-service install
2. Change memory configuration
   echo server.memory.heap.initial_size=8g >> conf\neo4j.conf
   echo server.memory.heap.initial_size=16g >> conf\neo4j.conf
3. Update service
   bin\neo4j windows-service update
4. Restart service
   bin\neo4j restart

Windows PowerShell module


The Neo4j PowerShell module allows administrators to:
• Install, start and stop Neo4j Windows® Services.
• Start tools, such as Neo4j Admin and Cypher Shell.

The PowerShell module is installed as part of the ZIP file distributions of Neo4j.

System requirements

• Requires PowerShell v2.0 or above.
• Supported on either 32 or 64 bit operating systems.

Managing Neo4j on Windows

On Windows, it is sometimes necessary to Unblock a downloaded ZIP file before you
can import its contents as a module. If you right-click on the ZIP file and choose
"Properties" you will get a dialog which includes an "Unblock" button, which will
enable you to import the module.
Running scripts has to be enabled on the system. This can, for example, be achieved
by executing the following from an elevated PowerShell prompt:
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned
For more information, see About execution policies.
The PowerShell module will display a warning if it detects that you do not have
administrative rights.

How do I import the module?

The module file is located in the bin directory of your Neo4j installation, i.e. where
you unzipped the downloaded file. For example, if Neo4j was installed
in C:\Neo4j then the module would be imported like this:
Import-Module C:\Neo4j\bin\Neo4j-Management.psd1
This will add the module to the current session.
Once the module has been imported you can start an interactive console version of a
Neo4j Server like this:
Invoke-Neo4j console
To stop the server, issue Ctrl-C in the console window that was created by the
command.

How do I get help about the module?

Once the module is imported you can query the available commands like this:
Get-Command -Module Neo4j-Management
The output should be similar to the following:
CommandType Name Version Source
----------- ---- ------- ------
Function Invoke-Neo4j 5.11.0 Neo4j-Management
Function Invoke-Neo4jAdmin 5.11.0 Neo4j-Management
Function Invoke-Neo4jBackup 5.11.0 Neo4j-Management
Function Invoke-Neo4jImport 5.11.0 Neo4j-Management
Function Invoke-Neo4jShell 5.11.0 Neo4j-Management


2. Practice CRUD (Create, Read, Update, and Delete) operations on the four databases: Redis, MongoDB, Cassandra, Neo4j

CRUD Operations Using Redis

Create a Spring Data Redis CRUD application

Create a Spring Boot application with the required dependencies. Add the
spring-boot-starter-web, spring-boot-starter-data-redis, and Lombok dependencies to
the pom.xml file of the Spring Boot application.

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
    <groupId>org.projectlombok</groupId>
    <artifactId>lombok</artifactId>
    <optional>true</optional>
</dependency>

Creating REST endpoints


Let us create the REST endpoints to perform the CRUD operations with Redis.

Create an entity class


Create a Java class named Student. This class is similar to a JPA entity class, and
its instances are persisted in the Redis server.

package com.asbnotebook.entity;

import org.springframework.data.annotation.Id;
import org.springframework.data.redis.core.RedisHash;
import org.springframework.data.redis.core.index.Indexed;
import lombok.Data;

@Data
@RedisHash(value = "student")
public class Student {
    @Id
    @Indexed
    private String id;
    private String name;
    private String grade;
}

• @Data: Lombok annotation that provides the required constructor and getter/setter
methods.
• @RedisHash: Marks objects as aggregate roots to be stored in a Redis hash.
• @Id: Indicates that this is the Id field of the entity class.
• @Indexed: Creates an index in Redis for the annotated field, which helps improve
performance when retrieving data.

Create repository layer

Similar to the JPA repositories, Spring Boot provides built-in support for basic data
operations for Redis as well.

Create an interface named StudentRepository and extend CrudRepository to make use
of the basic out-of-the-box data functionality provided by Spring Boot.

package com.asbnotebook.repository;

import org.springframework.data.repository.CrudRepository;
import com.asbnotebook.entity.Student;

public interface StudentRepository extends CrudRepository<Student, String> {
}

Create REST endpoints


Now, create a java class with the name StudentController.

This class will have all the CRUD endpoints required for our application.

Also, notice that we have autowired the repository instance into the controller class
and used its methods to perform the CRUD operations on our Student object.

package com.asbnotebook.controller;

import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.DeleteMapping;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.PutMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;
import com.asbnotebook.entity.Student;
import com.asbnotebook.repository.StudentRepository;

@RestController
public class StudentController {

    @Autowired
    private StudentRepository studentRepository;

    // Create: persist a new student as a Redis hash.
    @PostMapping("/students")
    public ResponseEntity<Student> createStudent(@RequestBody Student student) {
        Student savedStudent = studentRepository.save(student);
        return new ResponseEntity<>(savedStudent, HttpStatus.CREATED);
    }

    // Update: overwrite the name and grade of an existing student.
    @PutMapping("/student/{id}")
    public ResponseEntity<Student> updateStudent(@PathVariable(name = "id") String id,
            @RequestBody Student student) {
        Optional<Student> std = studentRepository.findById(id);
        if (std.isPresent()) {
            Student studentDB = std.get();
            studentDB.setGrade(student.getGrade());
            studentDB.setName(student.getName());
            Student updatedStudent = studentRepository.save(studentDB);
            return new ResponseEntity<>(updatedStudent, HttpStatus.CREATED);
        }
        // No student with this id: answer 404 instead of returning null.
        return new ResponseEntity<>(HttpStatus.NOT_FOUND);
    }

    // Read: return every student stored in Redis.
    @GetMapping("/students")
    public ResponseEntity<List<Student>> getStudents() {
        List<Student> students = new ArrayList<>();
        studentRepository.findAll().forEach(students::add);
        return new ResponseEntity<>(students, HttpStatus.OK);
    }

    // Delete: remove the student with the given id.
    @DeleteMapping("/student/{id}")
    public ResponseEntity<String> deleteStudent(@PathVariable(name = "id") String id) {
        studentRepository.deleteById(id);
        return new ResponseEntity<>("Student with id:" + id + " deleted successfully", HttpStatus.OK);
    }
}

Configuring Redis
Add the Redis configuration below to the Spring Boot application's
application.properties file under the /src/main/resources/ directory. In Spring
Boot 3 and later, these properties are named spring.data.redis.host and
spring.data.redis.port.

spring.redis.host=localhost
spring.redis.port=6379

Testing the CRUD operation


Run the Spring Boot application.


Create

Send the student details as a JSON request body to the POST /students endpoint.

The new student is then stored as a Redis hash in the Redis server.

Update

Send a JSON request body with the updated student name to the PUT /student/{id} endpoint.

Get

Call the GET /students endpoint to retrieve all the students stored in the Redis
server.

Delete

Pass the id to the DELETE /student/{id} endpoint to delete the student object stored
in the Redis server.
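
The four calls described above can be sketched as request descriptions that a client such as fetch() could send. This is only an illustration: the base URL assumes the Spring Boot default port 8080, and the sample student values are made up. The endpoint paths are the ones defined in StudentController.

```javascript
// Assumption: the Spring Boot application listens on its default port 8080.
const BASE_URL = "http://localhost:8080";

// Builds a fetch()-style request description for each StudentController endpoint.
function buildRequest(action, student) {
  const jsonHeaders = { "Content-Type": "application/json" };
  switch (action) {
    case "create":
      return { url: `${BASE_URL}/students`, method: "POST", headers: jsonHeaders, body: JSON.stringify(student) };
    case "update":
      return { url: `${BASE_URL}/student/${student.id}`, method: "PUT", headers: jsonHeaders, body: JSON.stringify(student) };
    case "list":
      return { url: `${BASE_URL}/students`, method: "GET" };
    case "delete":
      return { url: `${BASE_URL}/student/${student.id}`, method: "DELETE" };
    default:
      throw new Error(`Unknown action: ${action}`);
  }
}

// Hypothetical sample record; the field names match the Student entity.
const sampleStudent = { id: "1", name: "Arun", grade: "A" };
const createReq = buildRequest("create", sampleStudent);
// With the application running (Node 18+ or a browser):
//   fetch(createReq.url, createReq)
```

Any REST client can send the same requests; only the method, path, and JSON body matter.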


CRUD Operations Using MongoDB

MongoDB can be used for many purposes, such as building web and mobile
applications, analysing data, or administering a MongoDB database. In all these
cases we need to interact with the MongoDB server to perform operations such as
inserting new data, updating existing data, deleting data, and reading data.
MongoDB provides a set of basic but essential operations for this, and these
operations are known as CRUD operations.

Create Operations –
The create or insert operations are used to add new documents to a collection. If
the collection does not exist, MongoDB creates it in the database. You can perform
create operations using the following methods provided by MongoDB:

Method                        Description
db.collection.insertOne()     Inserts a single document into the collection.
db.collection.insertMany()    Inserts multiple documents into the collection.
db.createCollection()         Creates an empty collection.

Example 1: In this example, we insert the details of a single student as a document
in the student collection using the db.collection.insertOne() method.

Example 2: In this example, we insert the details of multiple students as documents
in the student collection using the db.collection.insertMany() method.
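
The two insert examples might look like the sketch below. The documents and their field names are hypothetical, since the manual's screenshots are not reproduced here. The function takes the collection as a parameter, so it works with db.student in mongosh or with a Node.js driver Collection (where the calls return Promises).

```javascript
// Hypothetical sample data; the field names are assumptions, not taken from the manual.
const oneStudent = { name: "Sumit", age: 20, branch: "CSE", year: 2023 };
const moreStudents = [
  { name: "Asha", age: 21, branch: "ECE", year: 2023 },
  { name: "Ravi", age: 19, branch: "CSE", year: 2023 },
];

// `collection` is a mongosh collection such as db.student, or a Node.js driver Collection.
function createStudents(collection) {
  // insertOne adds a single document; MongoDB assigns _id when it is absent.
  const first = collection.insertOne(oneStudent);
  // insertMany takes an array and inserts every element.
  const rest = collection.insertMany(moreStudents);
  return [first, rest];
}
```

In mongosh this would be run as createStudents(db.student), which is equivalent to calling db.student.insertOne(...) and db.student.insertMany([...]) directly.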


Read Operations –
Read operations are used to retrieve documents from a collection; in other words,
they query a collection for documents. You can perform read operations using the
following method provided by MongoDB:

Method                  Description
db.collection.find()    Retrieves documents from the collection.

.pretty(): this method formats the result so that it is easy to read.

Example: In this example, we retrieve the details of the students from the
student collection using the db.collection.find() method.
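
A read could be sketched as follows. The branch field is an assumed field name used only for illustration. Both functions return whatever find() returns, which is a cursor in mongosh and in the Node.js driver.

```javascript
// Returns a cursor over every document in the collection (empty filter).
function findAllStudents(collection) {
  return collection.find({});
}

// Returns a cursor over the students of one branch.
// `branch` is an assumed field name, not one confirmed by the manual.
function findStudentsByBranch(collection, branch) {
  return collection.find({ branch: branch });
}
```

In mongosh, findAllStudents(db.student).pretty() corresponds to db.student.find().pretty().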


Update Operations –
The update operations are used to modify existing documents in a collection. You
can perform update operations using the following methods provided by MongoDB:

Method                        Description
db.collection.updateOne()     Updates a single document in the collection that satisfies the given criteria.
db.collection.updateMany()    Updates all documents in the collection that satisfy the given criteria.
db.collection.replaceOne()    Replaces a single document in the collection that satisfies the given criteria.

Example 1: In this example, we update the age of Sumit in the student
collection using the db.collection.updateOne() method.


Example 2: In this example, we are updating the year of course in all the
documents in the student collection using db.collection.updateMany() method.
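
The two update examples could be written roughly as below. The field names age and year and the new values are assumptions made for illustration. The $set operator changes only the listed fields and leaves the rest of each document untouched.

```javascript
// `collection` is a mongosh collection such as db.student, or a Node.js driver Collection.
function updateExamples(collection) {
  // updateOne changes only the first document that matches the filter.
  const one = collection.updateOne({ name: "Sumit" }, { $set: { age: 21 } });
  // An empty filter matches every document, so all of them get the new year.
  const many = collection.updateMany({}, { $set: { year: 2024 } });
  return [one, many];
}
```

In mongosh, the first call corresponds to db.student.updateOne({name: "Sumit"}, {$set: {age: 21}}).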


Delete Operations –
The delete operations are used to remove documents from a collection. You can
perform delete operations using the following methods provided by MongoDB:

Method                        Description
db.collection.deleteOne()     Deletes a single document from the collection that satisfies the given criteria.
db.collection.deleteMany()    Deletes all documents from the collection that satisfy the given criteria.


Example 1: In this example, we are deleting a document from the student
collection using db.collection.deleteOne() method.


Example 2: In this example, we are deleting all the documents from the student
collection using db.collection.deleteMany() method.
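
As a sketch of the two delete examples, assuming a student document with the name "Sumit" exists (an illustrative value):

```javascript
// `collection` is a mongosh collection such as db.student, or a Node.js driver Collection.
function deleteExamples(collection) {
  // deleteOne removes at most one document matching the filter.
  const one = collection.deleteOne({ name: "Sumit" });
  // An empty filter matches everything, so this empties the collection.
  const all = collection.deleteMany({});
  return [one, all];
}
```

Note that deleteMany({}) removes the documents but keeps the collection itself and its indexes.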


CRUD Operations Using Cassandra:

CRUD in Cassandra stands for Create, Read, Update, and Delete (or Drop).
These operations are used to manipulate data in Cassandra. In addition to the CRUD
operations, a user can verify the result of a command by reading the data back.
a. Create Operation

A user can insert data into a table using the Create operation. The data is
stored in the columns of a row in the table. The operation is performed with the
INSERT command and the proper syntax.

Syntax of the Create operation-

INSERT INTO <table name> (<column1>, <column2>, ...)
VALUES (<value1>, <value2>, ...)
USING <option>;

Let’s use a table of sample data to illustrate the operation. The example consists
of a table with information about students in a college. The following table gives
the details of the students.

Table 1. Cassandra CRUD Operation – Create Operation

EN   NAME   BRANCH                  PHONE       CITY
001  Ayush  Electrical Engineering  9999999999  Boston
002  Aarav  Computer Engineering    8888888888  New York City
003  Kabir  Applied Physics         7777777777  Philadelphia

EXAMPLE 1: Creating a table and inserting the data into a table:

INPUT:

cqlsh:keyspace1> INSERT INTO student(en, name, branch, phone, city)
                 VALUES(001, 'Ayush', 'Electrical Engineering', 9999999999, 'Boston');

cqlsh:keyspace1> INSERT INTO student(en, name, branch, phone, city)
                 VALUES(002, 'Aarav', 'Computer Engineering', 8888888888, 'New York City');

cqlsh:keyspace1> INSERT INTO student(en, name, branch, phone, city)
                 VALUES(003, 'Kabir', 'Applied Physics', 7777777777, 'Philadelphia');
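
The INSERT statements above need the keyspace and the student table to exist, but the manual does not show how they were created. The sketch below gives one schema that would accept those values. The column types, the replication settings, and the function name are assumptions inferred from the data: phone needs bigint because 9999999999 does not fit in a 32-bit int, and an int column for en does not preserve the leading zeros shown in the tables. The client parameter is expected to expose execute(query), as the Client class of the cassandra-driver package does.

```javascript
// Assumed schema: the column types are inferred from the INSERT values, not given in the manual.
const CREATE_KEYSPACE =
  "CREATE KEYSPACE IF NOT EXISTS keyspace1 " +
  "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}";

const CREATE_TABLE =
  "CREATE TABLE IF NOT EXISTS keyspace1.student (" +
  "en int PRIMARY KEY, name text, branch text, phone bigint, city text)";

// Runs both statements in order; `client` must provide execute(query) returning a Promise.
async function ensureStudentTable(client) {
  await client.execute(CREATE_KEYSPACE);
  await client.execute(CREATE_TABLE);
}
```

The two strings can also be pasted into cqlsh directly, each followed by a semicolon.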

Table 2. Cassandra CRUD Operation – OUTPUT After Verification (READ operation)

EN   NAME   BRANCH                  PHONE       CITY
001  Ayush  Electrical Engineering  9999999999  Boston
002  Aarav  Computer Engineering    8888888888  New York City
003  Kabir  Applied Physics         7777777777  Philadelphia

Update Operation
The next Cassandra CRUD operation is the UPDATE operation, performed with the
UPDATE command. It uses the following keywords while updating the table.
• Where: specifies the row to be updated. The WHERE clause must include all the
columns composing the primary key.
• Set: specifies the column to change and its new value.
Furthermore, if the row being updated does not exist, Cassandra creates a new row
for it.

Syntax of the Update operation-

UPDATE <table name>
SET <column name>=<new value>, <column name>=<value>...
WHERE <condition>;

EXAMPLE 2: Let’s change a few details in the table ‘student’. In this example, we will
update Aarav’s city from ‘New York City’ to ‘San Fransisco’.

INPUT:
cqlsh:keyspace1> UPDATE student SET city='San Fransisco' WHERE en=002;


Table 3. Cassandra CRUD Operation – OUTPUT After Verification

EN   NAME   BRANCH                  PHONE       CITY
001  Ayush  Electrical Engineering  9999999999  Boston
002  Aarav  Computer Engineering    8888888888  San Fransisco
003  Kabir  Applied Physics         7777777777  Philadelphia

Read Operation
The Read operation lets a user read either the whole table or selected columns.
To read data from a table, use the SELECT command. This command is also used to
verify the table after every operation.

SYNTAX to read the whole table-

SELECT * FROM <table name>;


EXAMPLE 3: To read the whole table ‘student’.

INPUT:
cqlsh:keyspace1> SELECT * FROM student;

Table 4. Cassandra CRUD Operation – OUTPUT After Verification

EN   NAME   BRANCH                  PHONE       CITY
001  Ayush  Electrical Engineering  9999999999  Boston
002  Aarav  Computer Engineering    8888888888  San Fransisco
003  Kabir  Applied Physics         7777777777  Philadelphia

SYNTAX to read selected columns-


SELECT <column name1>,<column name2>.... FROM <table name>;


EXAMPLE 4: To read columns of name and city from table ‘student’.

INPUT:

cqlsh:keyspace1> SELECT name, city FROM student;

Table 5. Cassandra CRUD Operation – OUTPUT After Verification

NAME   CITY
Ayush  Boston
Aarav  San Fransisco
Kabir  Philadelphia

Delete Operation
The Delete operation is the last Cassandra CRUD operation; it allows a user to
delete data from a table. Use the DELETE command for this operation.
Syntax of the Delete operation (deleting a column value)-

DELETE <identifier> FROM <table name> WHERE <condition>;

EXAMPLE 5: In the ‘student’ table, let us delete the ‘phone’ value (the phone number)
from the row with en=003.

cqlsh:keyspace1> DELETE phone FROM student WHERE en=003;


Table 6. Cassandra CRUD Operation – OUTPUT After Verification

EN   NAME   BRANCH                  PHONE       CITY
001  Ayush  Electrical Engineering  9999999999  Boston
002  Aarav  Computer Engineering    8888888888  San Fransisco
003  Kabir  Applied Physics         null        Philadelphia

SYNTAX for deleting the entire row-

DELETE FROM <table name> WHERE <condition>;


EXAMPLE 6: In the ‘student’ table, let us delete the entire third row.

cqlsh:keyspace1> DELETE FROM student WHERE en=003;

Table 7. Cassandra CRUD Operation – OUTPUT After Verification

EN   NAME   BRANCH                  PHONE       CITY
001  Ayush  Electrical Engineering  9999999999  Boston
002  Aarav  Computer Engineering    8888888888  San Fransisco

CRUD Operations Using Neo4j:
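
The snippets in this section call session.run(...) but never show where session comes from. A minimal sketch of that setup, assuming the official neo4j-driver package for Node.js, is given below. The URI is the default Bolt address of a local server, and the credentials are placeholders that you must replace with your own.

```javascript
// `neo4j` is the object returned by require("neo4j-driver").
// The URI and credentials are placeholder assumptions, not values from the manual.
function openSession(neo4j, uri = "bolt://localhost:7687", user = "neo4j", password = "<your-password>") {
  // One driver per application; sessions are cheap and created from it.
  const driver = neo4j.driver(uri, neo4j.auth.basic(user, password));
  const session = driver.session();
  return { driver, session };
}

// Typical use (requires the neo4j-driver package and a running server):
//   const { driver, session } = openSession(require("neo4j-driver"));
//   ... session.run(...) ...
//   await session.close();
//   await driver.close();
```

Passing the driver module as a parameter keeps the sketch free of hard dependencies; in an application you would normally require it at the top of the file.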

SAVE: To store data in Neo4j, use the Cypher CREATE statement, which creates a node
with the specified label and sets its properties. For example, to store movie data,
create a node with the label Movie (or any label you prefer) and pass the property
values as query parameters.

session
  // $name and $releaseDate are query parameters; the older {name} form was removed in Neo4j 4.0.
  .run("CREATE (movie:Movie {name: $name, releaseDate: $releaseDate}) RETURN movie",
    { name: "MY MOVIE", releaseDate: "22-04-18" })
  .then(function (result) {
    console.log(result);
  })
  .catch(function (error) {
    console.log(error);
  });

GET: To read data from Neo4j using the Bolt driver, use the Cypher MATCH statement.
It takes the label to search for, finds the nodes with that label in the database,
and returns them with their properties.

session
  .run("MATCH (movie:Movie) RETURN movie")
  .then(function (result) {
    console.log(result);
  })
  .catch(function (error) {
    console.log(error);
  });


Note: If the label is omitted, for example MATCH (n) RETURN n, the query returns all the nodes of the database.

UPDATE: To update a node, use the MATCH statement to find it and the SET clause to
change its properties.

session
  .run("MATCH (movie:Movie) WHERE id(movie) = $id SET movie.name = $name RETURN movie",
    { id: 121, name: "MOVIE-2" })
  .then(function (result) {
    console.log(result);
  })
  .catch(function (error) {
    console.log(error);
  });

Note: Searching for nodes is faster when the label of the node is specified.

DELETE: To delete a node, use the DELETE statement. If the node is connected to other
nodes by relationships, those relationships must be removed first; DETACH DELETE
deletes the node together with its relationships.

session
  .run("MATCH (movie:Movie) WHERE id(movie) = $id DETACH DELETE movie", { id: 121 })
  .then(function (result) {
    console.log(result);
  })
  .catch(function (error) {
    console.log(error);
  });


3. Usage of Where Clause equivalent in MongoDB

The MongoDB $where operator matches documents that satisfy a JavaScript
expression. Either a string containing a JavaScript expression or a JavaScript function
can be passed to $where. Inside the expression or function, the current document is
referred to as this or obj.

Our database name is 'myinfo' and our collection name is 'table3'. The collection
is shown below.

Sample collection "table3"


{
"_id" : ObjectId("52873b364038253faa4bbc0e"),
"student_id" : "STU002",
"sem" : "sem1",
"english" : "A",
"maths" : "A+",
"science" : "A"
}
{
"_id" : ObjectId("52873b5d4038253faa4bbc0f"),
"student_id" : "STU001",
"sem" : "sem1",
"english" : "A+",
"maths" : "A+",
"science" : "A"
}
{
"_id" : ObjectId("52873b7e4038253faa4bbc10"),
"student_id" : "STU003",
"sem" : "sem1",
"english" : "A+",
"maths" : "A",
"science" : "A+"
}


Example of MongoDB Evaluation Query operator - $where

To select all documents from the collection "table3" that satisfy the
condition -

The grade of english must be the same as science

the following MongoDB command can be used:

>db.table3.find( { $where: function() { return (this.english == this.science) }}).pretty();

N.B. The find() method displays the documents in an unformatted way; to display
the results in a formatted way, use the pretty() method.

Output:
{
"_id" : ObjectId("52873b364038253faa4bbc0e"),
"student_id" : "STU002",
"sem" : "sem1",
"english" : "A",
"maths" : "A+",
"science" : "A"
}
{
"_id" : ObjectId("52873b7e4038253faa4bbc10"),
"student_id" : "STU003",
"sem" : "sem1",
"english" : "A+",
"maths" : "A",
"science" : "A+"
}

The same output can also be obtained with either of the following statements -

>db.table3.find( { $where: function() { return (obj.english == obj.science) }}).pretty();

>db.table3.find( { $where: "this.english == this.science" } ).pretty();
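
As a rough illustration of what the $where condition computes, the sketch below applies the same comparison to the three sample documents in plain Python. It does not connect to MongoDB; the list table3 and the variable matching are names chosen only for this sketch.

```python
# The three sample documents from "table3" (the _id field is omitted for brevity).
table3 = [
    {"student_id": "STU002", "sem": "sem1", "english": "A", "maths": "A+", "science": "A"},
    {"student_id": "STU001", "sem": "sem1", "english": "A+", "maths": "A+", "science": "A"},
    {"student_id": "STU003", "sem": "sem1", "english": "A+", "maths": "A", "science": "A+"},
]

# Same test as the JavaScript expression this.english == this.science.
matching = [doc for doc in table3 if doc["english"] == doc["science"]]

for doc in matching:
    print(doc["student_id"])
```

Only STU002 and STU003 have equal english and science grades, which agrees with the output shown above.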


4. Usage of operations in MongoDB – AND in MongoDB, OR in MongoDB, Limit Records
and Sort Records. Usage of operations in MongoDB – Indexing, Advanced Indexing,
Aggregation and Map Reduce.

MongoDB AND operator ( $and )

MongoDB provides several logical query operators, and $and is one of them. It
performs a logical AND on an array of one or more expressions and selects only those
documents that match all the expressions in the array. You can use this operator in
methods like find(), update(), etc. according to your requirements.
• This operator performs short-circuit evaluation: if the first expression of $and
  evaluates to false, MongoDB does not evaluate the remaining expressions in the array.
• You can also perform an AND operation implicitly by separating the conditions with
  a comma (,).
• You need the explicit form ($and) when the same field or operator has to be
  specified in more than one expression.

Syntax:
{ $and: [ { Expression1 }, { Expression2 }, ..., { ExpressionN } ] }
or
{ <field1>: <condition1>, <field2>: <condition2>, ..., <fieldN>: <conditionN> }
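
The two syntax forms can be compared by writing the query documents as Python dictionaries, which is how a driver such as PyMongo would receive them (an assumption; no driver is used here). This is only a sketch: the field names branch, year and age are hypothetical. It also shows why the explicit form is needed when the same field appears twice.

```python
# Field names (branch, year, age) are hypothetical.

# Implicit AND: conditions on different fields in one query document.
implicit = {"branch": "CSE", "year": 3}

# Explicit AND carrying the same two conditions.
explicit = {"$and": [{"branch": "CSE"}, {"year": 3}]}

# A key can appear only once in a document: the second "age" replaces the first,
# so the "$gt" condition is silently lost.
collapsed = {"age": {"$gt": 18}, "age": {"$lt": 30}}

# With $and, both conditions on the same field are kept.
safe = {"$and": [{"age": {"$gt": 18}}, {"age": {"$lt": 30}}]}

print(collapsed)
print(safe)
```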

Examples: In the following examples, we are working with:


MongoDB OR operator ( $or )

MongoDB provides several logical query operators, and $or is one of them. It
performs a logical OR on an array of two or more expressions and selects only those
documents that match at least one of the expressions in the array.
• You can use this operator in methods like find(), update(), etc. according to
  your requirements.
• You can also use this operator with text queries, geospatial queries, and sort
  operations.
• To evaluate the clauses of an $or expression, MongoDB either performs a
  collection scan or, if all the clauses are supported by indexes, performs
  index scans.
• $or operations can be nested.

Syntax:
{ $or: [ { Expression1 }, { Expression2 }, ..., { ExpressionN } ] }
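
To show what "match at least one expression" means, here is a toy evaluator in Python. It is a sketch under simplifying assumptions: it handles only equality conditions plus $and and $or, it is not how MongoDB evaluates queries internally, and the function name matches and the sample documents are invented for this illustration.

```python
def matches(doc, query):
    """Toy evaluator for equality conditions, $and and $or (not MongoDB's engine)."""
    for key, condition in query.items():
        if key == "$and":
            # all() stops at the first sub-expression that fails
            if not all(matches(doc, sub) for sub in condition):
                return False
        elif key == "$or":
            # any() stops at the first sub-expression that succeeds
            if not any(matches(doc, sub) for sub in condition):
                return False
        elif doc.get(key) != condition:
            return False
    return True


# Hypothetical documents
students = [
    {"name": "Asha", "branch": "CSE", "year": 3},
    {"name": "Ravi", "branch": "ECE", "year": 3},
    {"name": "Meena", "branch": "ECE", "year": 1},
]

query = {"$or": [{"branch": "CSE"}, {"year": 1}]}
selected = [s["name"] for s in students if matches(s, query)]
print(selected)
```

Because matches() calls itself for every sub-expression, nested $or and $and operations work without extra code.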
In the following examples, we are working with:


MongoDB – sort() Method

The sort() method specifies the order in which the query returns the matching
documents from the given collection. You must apply this method to the cursor
before retrieving any documents from the database. It takes a document as a
parameter that contains a field: value pair defining the sort order of the result
set. The value is 1 or -1, specifying an ascending or descending sort respectively.
• If a sort returns the same result every time it is performed on the same data,
  it is known as a stable sort.
• If a sort may return a different result each time it is performed on the same
  data, it is known as an unstable sort.
• MongoDB generally performs a stable sort unless it is sorting on a field that
  holds duplicate values; documents with equal sort values may be returned in
  any order.
• The limit() method can be used together with sort(); the query then returns the
  first m documents of the sorted result, where m is the given limit.
• MongoDB can obtain the result of a sort operation from an index.
• If MongoDB cannot obtain the sort order from an index scan, it uses a top-k
  sort algorithm.

Syntax:
db.Collection_Name.find().sort({field_name: 1 or -1})

Parameter:
The parameter contains a field: value pair that defines the sort order of the result
set. The value is 1 or -1 that specifies an ascending or descending sort respectively.
The type of parameter is a document.
Return:
It returns the documents in sorted order.
Examples:
In the following examples, we are working with:


 Return all the documents in ascending order of the age:


db.student.find().sort({age:1})
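
The ordering rules described for sort() can be tried out on a plain Python list. This is a sketch only: the documents are hypothetical, and Python's sorted() stands in for the server-side sort. It shows ascending and descending order, a tie broken by a unique field, and the effect of combining a sort with a limit.

```python
# Hypothetical documents; two students share the same age.
students = [
    {"_id": 3, "name": "Kabir", "age": 22},
    {"_id": 1, "name": "Ayush", "age": 20},
    {"_id": 2, "name": "Aarav", "age": 22},
]

# Like sort({age: 1}) and sort({age: -1})
ascending = sorted(students, key=lambda d: d["age"])
descending = sorted(students, key=lambda d: d["age"], reverse=True)

# Like sort({age: 1, _id: 1}): the unique _id breaks ties,
# so the order of the two 22-year-olds is fully determined.
consistent = sorted(students, key=lambda d: (d["age"], d["_id"]))

# Like sort({age: 1, _id: 1}).limit(2): first two documents of the sorted result
first_two = consistent[:2]

print([d["name"] for d in consistent])
```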

MongoDB – limit() Method


In MongoDB, the limit() method limits the number of records or documents that
are returned. In other words, it is applied to a cursor to specify the maximum number
of documents the cursor will return. It is used after the find() method, which on its
own returns all the documents in the collection that match the query; you can also
pass conditions to find() to narrow the result.
• This method accepts only numeric values.
• Its behaviour is undefined for values less than -2^31 and greater than 2^31.
• Passing 0 to this method (limit(0)) is equivalent to setting no limit.

Syntax:
cursor.limit(<number>)

Or
db.collectionName.find(<query>).limit(<number>)
Examples:
In the following examples, we are working with:


Indexing in MongoDB

MongoDB is a leading NoSQL database written in C++. It is highly scalable and
provides high performance and availability. It works on the concept of collections
and documents. A collection in MongoDB is a group of related documents. A collection
does not enforce a schema, which is one of the notable features of MongoDB.

Indexing in MongoDB :

MongoDB uses indexes to make query processing more efficient. Without an index,
MongoDB must scan every document in the collection to find the documents that match
the query. Indexes are special data structures that store a small part of the
collection's data, namely the values of the indexed fields, in a form that makes it
easy for MongoDB to locate the right documents. The index entries are ordered by the
value of the field specified in the index.
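
The difference between scanning every document and using an ordered index can be sketched in a few lines of Python. This is a toy model under stated assumptions: a dictionary stands in for the collection, a sorted list of (value, _id) pairs stands in for the index, and the names docs, index, scan and lookup are invented here. MongoDB's real indexes are B-tree structures, not Python lists.

```python
import bisect

# Hypothetical collection: _id -> document
docs = {
    1: {"age": 25},
    2: {"age": 19},
    3: {"age": 25},
    4: {"age": 31},
}

# Toy "index" on age: (age, _id) pairs kept in sorted order.
index = sorted((doc["age"], _id) for _id, doc in docs.items())


def scan(age):
    """Collection scan: look at every document."""
    return [_id for _id, doc in docs.items() if doc["age"] == age]


def lookup(age):
    """Index lookup: binary search for the range of entries with this age."""
    lo = bisect.bisect_left(index, (age,))
    hi = bisect.bisect_right(index, (age, float("inf")))
    return [_id for _, _id in index[lo:hi]]


print(scan(25), lookup(25))
```

Both functions return the same documents, but lookup() only touches the entries for the requested value instead of the whole collection.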

Creating an Index :
MongoDB provides a method called createIndex() that allows the user to create an
index.
Syntax –

db.COLLECTION_NAME.createIndex({KEY:1})
The key is the field on which you want to create the index, and 1 (or -1) specifies
the order in which the index entries are arranged (ascending or descending).
Example –

db.mycol.createIndex({"age":1})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
The createIndex() method also has a number of optional parameters.
These include:

• background (Boolean)
• unique (Boolean)
• name (string)
• sparse (Boolean)
• expireAfterSeconds (integer)
• hidden (Boolean)
• storageEngine (Document)


Drop an index:
To drop an index, MongoDB provides the dropIndex() method.
Syntax –

db.NAME_OF_COLLECTION.dropIndex({KEY:1})
The dropIndex() method can delete only one index at a time. To drop several indexes
from a collection, MongoDB provides the dropIndexes() method. Called without
arguments, it drops all indexes except the one on _id; from MongoDB 4.2 onwards it
also accepts an array of index names.
Syntax –

db.NAME_OF_COLLECTION.dropIndexes()
db.NAME_OF_COLLECTION.dropIndexes(["<index name 1>", "<index name 2>"])


Get description of all indexes:
The getIndexes() method in MongoDB returns a description of all the indexes that
exist in the given collection.
Syntax –

db.NAME_OF_COLLECTION.getIndexes()

Advanced Indexing in MongoDB


MongoDB provides different types of indexes that are used according to the data
type or queries. The indexes supported by MongoDB are as follows:
1. Single field Index: A single field index is an index on a single field of a
document. It is helpful for fetching data in ascending as well as descending
order.
Syntax:
db.students.createIndex({"<fieldName>" : <1 or -1>});
Here 1 specifies ascending order for the field and -1 specifies descending
order.
Example:
db.students.createIndex({studentsId:1})
In this example we create a single field index on the studentsId field in
ascending order.


2. Compound Index: A compound index combines multiple fields in one index, which
helps when searching or filtering documents on those fields together. In other
words, a compound index is a single index structure that holds references to
multiple fields.
Syntax:
db.<collection>.createIndex( { <field1>: <type>, <field2>: <type2>, ... } )
Here, the required fields are combined in this pattern. The value for each
field is 1 (for ascending order) or -1 (for descending order).
Note: A compound index may contain at most one hashed field; a hashed field stores
the hash of the field's value, computed by MongoDB's hashing function.
Example:
Here, we create a compound index on studentAge: 1, studentName: 1
db.students.createIndex({studentAge: 1, studentName:1})


db.students.find().sort({"studentAge":1,"studentName":1}).pretty()
Here the documents are sorted on “studentAge” and then on “studentName”. Two
documents match studentAge = 25; because studentName is the second sort key, the
document with studentName “Geek40” is displayed second and the document with
studentName “GeeksForGeeksbest” is displayed third. A compound index is therefore
useful when queries filter or sort on more than one field.

3. Multikey Index: MongoDB uses multikey indexes to index the values
stored in arrays. When we index a field that holds an array value, MongoDB
automatically creates a separate index entry for each value present in that array.
Using a multikey index we can easily find the documents whose array contains
the items we are matching. You do not need to request a multikey index explicitly:
MongoDB automatically creates one if the indexed field contains an array value.
Syntax:
db.<collection>.createIndex( { <field>: <type>} )
Here, the value of the field is 1 (for ascending order) or -1 (for descending order).
Example:
In the students collection, we have three documents that contain array fields.


Now we create a multikey index:


db.students.createIndex({skillsets:1})

Now we view the document that holds skillsets:[“Java”, “Android”]


db.students.find({skillsets:["Java", "Android"]}).pretty()
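
The idea of one index entry per array element can be sketched in Python as a mapping from each element to the documents that contain it. This is a toy model, not MongoDB's storage format, and the three documents below are hypothetical stand-ins for the students collection.

```python
from collections import defaultdict

# Hypothetical stand-ins for the three documents with array fields.
students = [
    {"_id": 1, "skillsets": ["Java", "Android"]},
    {"_id": 2, "skillsets": ["Python", "Java"]},
    {"_id": 3, "skillsets": ["C"]},
]

# Toy multikey index: one entry per array element, pointing at the owning documents.
index = defaultdict(set)
for doc in students:
    for skill in doc["skillsets"]:
        index[skill].add(doc["_id"])

# Which documents have "Java" anywhere in their skillsets array?
print(sorted(index["Java"]))
```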


4. Geospatial Indexes: Geospatial indexing is an important feature of MongoDB.
MongoDB provides two geospatial index types, 2d indexes and 2dsphere indexes, which
are used to query geospatial data. 2d indexes support queries on data stored as
points on a two-dimensional plane; they support only data stored as legacy
coordinate pairs. 2dsphere indexes support queries on data stored in spherical
geometry; they support data stored as legacy coordinate pairs as well as GeoJSON
objects, and they support queries for inclusion, intersection, and proximity.
Syntax of 2dsphere indexes:
db.<collection>.createIndex( { <Locationfield>: "2dsphere"} )
Example:
Let us assume the available data for “industries”

Now, let us create a 2dsphere index on the location field:


db.industries.createIndex({location:"2dsphere"})


Now, on executing the query below, we get


db.industries.find(
  {
    location:
      {$near:
        {
          $geometry:{type: "Point", coordinates:[-73.9667, 40.78]},
          $minDistance:1000,
          $maxDistance: 5000
        }
      }
  }
).pretty()
Here, the “$near” operator returns the documents that lie at least 1000 meters and
at most 5000 meters from the specified GeoJSON point, and hence we get only
Tidal Park as output. Besides $near, geospatial queries can use $nearSphere,
$geoWithin, $geoIntersects, $geoNear, etc.
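
The distance filter applied by $near with $minDistance and $maxDistance can be approximated in Python with the haversine formula. This is a sketch under assumptions: the Earth is treated as a sphere of radius 6,371 km, the place names and coordinates are hypothetical, and the functions haversine_m and near are invented for this illustration rather than taken from MongoDB.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; an approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (longitude, latitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def near(places, center, min_m, max_m):
    """Names of places whose distance from center is within [min_m, max_m], nearest first."""
    lon, lat = center
    hits = []
    for place in places:
        d = haversine_m(lon, lat, *place["coordinates"])
        if min_m <= d <= max_m:
            hits.append((d, place["name"]))
    return [name for _, name in sorted(hits)]


# Hypothetical places directly north of the query point (GeoJSON order: longitude, latitude).
places = [
    {"name": "Site A", "coordinates": [-73.9667, 40.785]},  # roughly 0.56 km away
    {"name": "Site C", "coordinates": [-73.9667, 40.81]},   # roughly 3.3 km away
    {"name": "Site B", "coordinates": [-73.9667, 40.80]},   # roughly 2.2 km away
    {"name": "Site D", "coordinates": [-73.9667, 40.88]},   # roughly 11 km away
]

print(near(places, (-73.9667, 40.78), 1000, 5000))
```

Site A is closer than the minimum distance and Site D is beyond the maximum, so only the two sites in between are returned, nearest first.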


5. Text Index: MongoDB supports query operations that perform a text search of
string content. A text index allows us to search the string content in the specified
collection. It can include any field that contains string content or an array of string
items. A collection can have at most one text index. A text index can be part of a
compound index.
Syntax:
db.<collection>.createIndex( { <field>: "text"} )
We can also search for an exact phrase by enclosing the search terms in
escaped double quotes
db.<collectionname>.find( { $text: { $search: "\"<Exact search term>\"" } } )
Because the phrase is enclosed in double quotes, the search results contain only
documents with the exact phrase.
To exclude some terms from the search, prefix them with a - character
db.<collectionname>.find( { $text: { $search: "<search terms> -<not required search terms>" } } )
Documents that contain a term prefixed with - are left out of the results; the
remaining terms are searched as usual.
A text search returns its results in unsorted order. To return them sorted by
relevance score, project the $meta textScore field and sort on it.
Example:
db.singers.find(
{ $text: { $search: "Annisten" } },
{ score: { $meta: "textScore" } }
).sort( { score: { $meta: "textScore" } } )
Example:
In accessories collection we create text index:
db.accessories.createIndex({name: "text", description: "text"})

Now we display those documents that contain the string “Input”:


db.accessories.find({$text:{$search: "Input"}})
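
The behaviour of search terms and negated terms can be imitated with a small Python function. This is a deliberately simplified sketch: it splits text on whitespace and ignores stemming, stop words, phrases and scoring, all of which a real text index handles. The function text_search and the sample documents are hypothetical.

```python
def text_search(docs, search, fields):
    """Toy text search: match any plain term, drop documents containing a -negated term."""
    terms = search.split()
    include = {t.lower() for t in terms if not t.startswith("-")}
    exclude = {t[1:].lower() for t in terms if t.startswith("-")}
    hits = []
    for doc in docs:
        words = set()
        for field in fields:
            words.update(str(doc.get(field, "")).lower().split())
        if words & include and not words & exclude:
            hits.append(doc)
    return hits


# Hypothetical accessories documents
accessories = [
    {"name": "Mouse", "description": "Wireless input device"},
    {"name": "Keyboard", "description": "Wired input device"},
    {"name": "Monitor", "description": "Output display"},
]

fields = ["name", "description"]
print([d["name"] for d in text_search(accessories, "Input", fields)])
print([d["name"] for d in text_search(accessories, "Input -wired", fields)])
```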


6. Hashed Index: A hashed index maintains entries with hashes of the values of the
indexed field (often the _id field). This kind of index is mainly used to distribute
data evenly via sharding: hashed keys help to partition the data across the sharded
cluster.
Syntax:
db.<collection>.createIndex( { _id: "hashed" } )
From version 4.4 onwards, compound hashed indexes are also supported.
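
Why hashing spreads data evenly can be seen in a short Python sketch. It assigns keys to shards by hashing them; SHA-256 is used only because it is in the standard library and is not the hash function MongoDB uses, and the function shard_for and the choice of four shards are assumptions made for this illustration.

```python
import hashlib
from collections import Counter


def shard_for(key, num_shards):
    """Map a key to a shard number using a hash of its value (toy model)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Monotonically increasing keys, as with generated ids (hypothetical).
keys = range(1, 1001)
distribution = Counter(shard_for(k, 4) for k in keys)

# Number of keys that landed on each shard
print(sorted(distribution.items()))
```

Neighbouring keys such as 1, 2 and 3 end up on unrelated shards, so inserts with increasing keys do not all land on the same shard.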

7. Wildcard Index: MongoDB supports creating an index on a field or set of fields
whose names are not known in advance; such an index is called a wildcard index.
By default, a wildcard index does not include the _id field; if you want to include
the _id field in the wildcard index, you have to specify it explicitly. MongoDB
allows you to create multiple wildcard indexes in the given collection. Wildcard
indexes support queries on unknown or arbitrary fields.
Syntax:
To create a wildcard index on the specified field:
db.<collection>.createIndex( { "field.$**":1 } )
To create a wildcard index on all the fields:
db.<collection>.createIndex( { "$**":1 } )
To create a wildcard index on multiple specified fields:
db.<collection>.createIndex(
{ "$**":1 },
{"wildcardProjection":
{"field1": 1, "field2": 1}
})
Example:
In book collection we create the wildcard index:


Let us create an index for “authorTags” field


db.book.createIndex( { "authorTags.$**" : 1 } )
Since the index covers all the fields under authorTags, we can query any of them in the following way
db.book.find( { "authorTags.inclusions" : "RDBMS" } )
db.book.find( { "authorTags.usedin" : "Multipurpose" } )


Aggregation in MongoDB
In MongoDB, aggregation operations process data records/documents and
return computed results. An aggregation collects values from various documents,
groups them together, and then performs operations on the grouped data, such as
sum, average, minimum, and maximum, to return a computed result. It is
similar to the aggregate functions of SQL.
MongoDB provides three ways to perform aggregation
• Aggregation pipeline
• Map-reduce function
• Single-purpose aggregation
Aggregation pipeline
In MongoDB, the aggregation pipeline consists of stages, and each stage transforms
the documents. In other words, the aggregation pipeline is a multi-stage pipeline:
each stage takes documents as input and produces a resultant set of documents, which
the next stage (if there is one) takes as its input, and so on until the last stage.
The basic pipeline stages provide filters that work like queries and document
transformations that modify the form of the output documents; other stages provide
tools for grouping and sorting documents. You can also use the aggregation
pipeline on sharded collections.
Let us discuss the aggregation pipeline with the help of an example:


This example uses a collection of train fares. In the first stage, the $match stage
filters the documents by the value of the class field, i.e. class: “first-class”,
and passes the matching documents to the second stage. In the second stage, the
$group stage groups the documents by the id field to calculate the sum of fare for
each unique id. Here, the aggregate() function is used to perform the aggregation;
it works with three kinds of operators: stages, expressions, and accumulators.

Stages: Each stage starts with a stage operator. The stage operators are:

• $match: It filters the documents, which can reduce the number of
  documents that are given as input to the next stage.
• $project: It selects some specific fields from the documents.
• $group: It groups documents based on some value.
• $sort: It sorts the documents, i.e. rearranges them.
• $skip: It skips the first n documents and passes on the remaining
  documents.
• $limit: It passes on the first n documents, thus limiting them.
• $unwind: It unwinds documents that use arrays, i.e. it
  deconstructs an array field in the documents to return one document for each
  element.
• $out: It writes the resulting documents to a new collection.
Expressions: An expression refers to the name of a field in the input documents, e.g. in
{ $group : { _id : "$id", total : { $sum : "$fare" } } }, $id and $fare are expressions.
Accumulators: These are mainly used in the $group stage
• $sum: It sums the numeric values of the documents in each group
• $count: It counts the number of documents in each group
• $avg: It calculates the average of the given values across the documents in each group
• $min: It gets the minimum value from the documents in each group
• $max: It gets the maximum value from the documents in each group
• $first: It gets the value from the first document in each group
• $last: It gets the value from the last document in each group
Note:
• In $group, _id is a mandatory field
• $out must be the last stage in the pipeline
• $sum:1 counts the number of documents, and $sum:"$fare" gives the
  total fare per id.
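
The two-stage train fare pipeline described above ($match on class, then $group with $sum) can be mimicked in plain Python to see what each stage hands to the next. This is a sketch with made-up fare documents; the variable names matched, totals and result are chosen for this illustration only.

```python
from collections import defaultdict

# Hypothetical train fare documents
fares = [
    {"id": "181", "class": "first-class", "fare": 1200},
    {"id": "181", "class": "first-class", "fare": 1000},
    {"id": "181", "class": "second-class", "fare": 1000},
    {"id": "167", "class": "first-class", "fare": 1200},
    {"id": "167", "class": "second-class", "fare": 1500},
]

# Stage 1, like {$match: {class: "first-class"}}: keep only matching documents.
matched = [doc for doc in fares if doc["class"] == "first-class"]

# Stage 2, like {$group: {_id: "$id", total: {$sum: "$fare"}}}: sum fare per id.
totals = defaultdict(int)
for doc in matched:
    totals[doc["id"]] += doc["fare"]

result = [{"_id": key, "total": value} for key, value in totals.items()]
print(result)
```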
Examples:
In the following examples, we are working with:


 Displaying the total number of students in one section only

db.students.aggregate([{$match:{sec:"B"}},{$count:"Total student in sec:B"}])

In this example, to count the number of students in section B, we first
filter the documents using the $match stage and then use the $count stage
to count the documents that remain after filtering.

• Displaying the total number of students in each section and the maximum age in
  each section
db.students.aggregate([{$group: {_id:"$sec", total_st: {$sum:1}, max_age:{$max:"$age"} } }])
In this example, we use $group to group the documents by section, so that we can
count the students in every section. Here $sum:1 counts the documents in each group,
and the $max accumulator is applied to the age expression to find the maximum
age in each group.

• Displaying details of students whose age is greater than 30 using the $match stage
db.students.aggregate([{$match:{age:{$gt:30}}}])
In this example, we display the students whose age is greater than 30, using
the $match stage to filter the documents.


 Sorting the students on the basis of age


db.students.aggregate([{'$sort': {'age': 1}}])
In this example, we use the $sort stage. To sort in ascending order we provide
'age':1; to sort in descending order we simply change 1 to -1, i.e. 'age':-1.

• Displaying details of the oldest student in section B
db.students.aggregate([{$match:{sec:"B"}},{'$sort': {'age': -1}},{$limit:1}])
In this example, we first select only the documents in section B using the $match
stage, then sort them in descending order of age using $sort with 'age':-1, and
finally use $limit to show only the topmost result.


 Unwinding students on the basis of subject


Unwinding works on arrays. In our collection each document has an array of subjects
(containing different subjects such as math, physics, English, etc.), so unwinding
is done on that field: the array is deconstructed, and each output document holds a
single subject instead of the original array of subjects.
db.students.aggregate([{$unwind:"$subject"}])
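
What $unwind produces can be sketched with a small Python generator. This is a simplified model with hypothetical documents: it emits one copy of the document per array element and, like the default behaviour of $unwind, produces nothing for a document whose array is empty or missing. It does not handle non-array values.

```python
def unwind(docs, field):
    """Yield one copy of each document per element of its array field."""
    for doc in docs:
        for item in doc.get(field, []):
            yield {**doc, field: item}


# Hypothetical student documents
students = [
    {"name": "Asha", "subject": ["math", "physics"]},
    {"name": "Ravi", "subject": []},
]

unwound = list(unwind(students, "subject"))
for doc in unwound:
    print(doc)
```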

Map Reduce
Map reduce is used for aggregating results over large volumes of data. It has two
main functions: the map function, which emits a key-value pair for each document so
that the values are grouped by key, and the reduce function, which performs an
operation on the grouped values.
Syntax:
db.collectionName.mapReduce(mappingFunction, reduceFunction, {out:'Result'});
Example:
In the following example, we are working with:


var mapfunction = function(){emit(this.age, this.marks)}


var reducefunction = function(key, values){return Array.sum(values)}
db.studentsMarks.mapReduce(mapfunction, reducefunction, {'out':'Result'})
Here we group the documents by age and find the total marks in each age group. We
create two variables: mapfunction, which emits age as the key (shown as "_id" in
the output) and marks as the value; and reducefunction, which receives each key
together with the array of values emitted for it and performs the operation on them
(here, a sum). After the reduction, the results are stored in a collection, in this
case the collection 'Result'.
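
The same map and reduce steps can be traced in Python to see how emitted pairs are grouped before the reduce function runs. This is a sketch of the idea rather than of MongoDB's implementation; the sample documents and the helper map_reduce are hypothetical, and the output mimics the _id/value shape of the result collection.

```python
from collections import defaultdict


def map_fn(doc):
    """Like emit(this.age, this.marks)."""
    yield doc["age"], doc["marks"]


def reduce_fn(key, values):
    """Like Array.sum(values)."""
    return sum(values)


def map_reduce(docs, map_fn, reduce_fn):
    # Group every emitted value under its key, then reduce each group.
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return [{"_id": key, "value": reduce_fn(key, values)}
            for key, values in sorted(groups.items())]


# Hypothetical studentsMarks documents
students_marks = [
    {"name": "Asha", "age": 18, "marks": 80},
    {"name": "Ravi", "age": 18, "marks": 70},
    {"name": "Meena", "age": 19, "marks": 90},
]

result = map_reduce(students_marks, map_fn, reduce_fn)
print(result)
```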

Single Purpose Aggregation


Single-purpose aggregation is used when we need simple access to the documents, such
as counting the number of documents or finding all the distinct values of a field. It
provides access to common aggregation operations through the count(), distinct(),
and estimatedDocumentCount() methods, and therefore lacks the flexibility and
capabilities of the aggregation pipeline.
Example:
In the following example, we are working with:


• Displaying distinct names (non-repeating)


db.studentsMarks.distinct("name")
Here, we use a distinct() method that finds distinct values of the specified field(i.e.,
name).

 Counting the total numbers of documents


db.studentsMarks.count()
Here, we use count() to find the total number of documents; unlike the find()
method, it does not return the documents but counts them and returns a
number.


5. Write a program to count the number of occurrences of a word using MapReduce

Word Count Program(in Java & Python)

Example

The word count program is like the "Hello World" program in MapReduce.

Hadoop MapReduce is a software framework for easily writing applications which
process vast amounts of data (multi-terabyte data-sets) in parallel on large clusters
(thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.

A MapReduce job usually splits the input data-set into independent chunks which
are processed by the map tasks in a completely parallel manner. The framework sorts
the outputs of the maps, which are then input to the reduce tasks. Typically both the
input and the output of the job are stored in a file-system. The framework takes care
of scheduling tasks, monitoring them, and re-executing the failed tasks.

Word Count Example:

The WordCount example reads text files and counts how often words occur. The input is
text files and the output is text files, each line of which contains a word and the
count of how often it occurred, separated by a tab.

Each mapper takes a line as input and breaks it into words. It then emits a key/value
pair of the word and 1. Each reducer sums the counts for each word and emits a single
key/value pair with the word and the sum.

As an optimization, the reducer is also used as a combiner on the map outputs. This
reduces the amount of data sent across the network by combining each word into a
single record.
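
The roles of mapper, combiner and reducer described above can be traced in plain Python without a Hadoop cluster. This is a sketch only: the two input splits are made up, the functions map_words and combine are names chosen for this illustration, and the "network" between mappers and the reducer is simply a Python list.

```python
from collections import Counter


def map_words(line):
    """Mapper: emit a (word, 1) pair for every word in the line."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Combiner/reducer: sum the counts for each word."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return counts


# Two hypothetical input splits, each handled by its own mapper.
splits = [
    ["the quick fox", "the lazy dog"],
    ["the end"],
]

# Each mapper's output is combined locally, so a word leaves the mapper only once.
partials = [combine(pair for line in split for pair in map_words(line))
            for split in splits]

# The reduce step merges the partial counts from all mappers.
totals = Counter()
for partial in partials:
    totals.update(partial)

print(totals["the"])
```

Without the combiner, the first mapper would send two separate ("the", 1) records; with it, a single ("the", 2) record is sent.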

Word Count Code:

package org.myorg;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;


public class WordCount {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // Emits (word, 1) for every token in the input line.
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        // Sums the counts received for one word.
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        Job job = new Job(conf, "wordcount");
        // Lets Hadoop locate the jar that contains this class when run on a cluster.
        job.setJarByClass(WordCount.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(Map.class);
        // The reducer doubles as the combiner, as described above.
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);
    }
}


To run the example, the command syntax is:

bin/hadoop jar hadoop-*-examples.jar wordcount [-m <#maps>] [-r <#reducers>] <in-dir> <out-dir>

All of the files in the input directory (called in-dir in the command line above) are
read and the counts of words in the input are written to the output directory (called
out-dir above). It is assumed that both inputs and outputs are stored in HDFS. If your
input is not already in HDFS, but is rather in a local file system somewhere, you need
to copy the data into HDFS using commands like these:

bin/hadoop dfs -mkdir <hdfs-dir> //not required in hadoop 0.17.2 and later
bin/hadoop dfs -copyFromLocal <local-dir> <hdfs-dir>


Word Count example in Python:

mapper.py

import sys
for line in sys.stdin:
# remove leading and trailing whitespace
line = line.strip()
# split the line into words
words = line.split()
# increase counters
for word in words:
print '%s\t%s' % (word, 1)
reducer.py

import sys
current_word = None
current_count = 0
word = None
for line in sys.stdin:
# remove leading and trailing whitespaces
line = line.strip()
# parse the input we got from mapper.py
word, count = line.split('\t', 1)
# convert count (currently a string) to int
try:
count = int(count)
except ValueError:
# count was not a number, so silently
# ignore/discard this line
continue
if current_word == word:
current_count += count
else:
if current_word:
print '%s\t%s' % (current_word, current_count)
current_count = count
current_word = word
if current_word == word:
print '%s\t%s' % (current_word, current_count)

The above program can be run locally, without Hadoop, using:

cat filename.txt | python mapper.py | sort -k1,1 | python reducer.py
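The shell pipeline above has three stages: map, sort, reduce. The following is a minimal single-process sketch of the same three stages, offered only to illustrate what the pipeline computes; it is not part of the lab record. The function names and the sample lines are assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter


def map_words(lines):
    """Mapper step: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.strip().split():
            yield word, 1


def reduce_counts(sorted_pairs):
    """Reducer step: sum the counts of each run of identical words.

    Like reducer.py, this relies on the pairs arriving sorted by word.
    """
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Run map, sort and reduce in one process (no Hadoop involved)."""
    # sorted() stands in for the `sort -k1,1` stage of the pipeline
    pairs = sorted(map_words(lines), key=itemgetter(0))
    return list(reduce_counts(pairs))


# Hypothetical sample input, not taken from the lab record.
sample = ["the quick brown fox", "the lazy dog", "the end"]
for word, count in word_count(sample):
    print('%s\t%s' % (word, count))
```

The sort step matters: the reducer only compares each word with the previous one, so equal words must be adjacent before they can be summed.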


6. Import the restaurants collection and apply queries to get the specified output.

Structure of 'restaurants' collection:

{
"address": {
"building": "1007",
"coord": [ -73.856077, 40.848447 ],
"street": "Morris Park Ave",
"zipcode": "10462"
},
"borough": "Bronx",
"cuisine": "Bakery",
"grades": [
{ "date": { "$date": 1393804800000 }, "grade": "A", "score": 2 },
{ "date": { "$date": 1378857600000 }, "grade": "A", "score": 6 },
{ "date": { "$date": 1358985600000 }, "grade": "A", "score": 10 },
{ "date": { "$date": 1322006400000 }, "grade": "A", "score": 9 },
{ "date": { "$date": 1299715200000 }, "grade": "B", "score": 14 }
],
"name": "Morris Park Bake Shop",
"restaurant_id": "30075445"
}
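
In the structure above, each "$date" value is a number of milliseconds since the Unix epoch, which is how MongoDB Extended JSON represents dates in an exported file. As a small sketch (the helper name is an assumption, not part of any MongoDB API), such a value can be converted to a readable date in Python like this:

```python
from datetime import datetime, timezone


def from_extended_json_date(value):
    """Convert {"$date": <milliseconds since the Unix epoch>} to a UTC datetime."""
    return datetime.fromtimestamp(value["$date"] / 1000, tz=timezone.utc)


# First entry of the "grades" array from the sample document.
grade = {"date": {"$date": 1393804800000}, "grade": "A", "score": 2}
grade_date = from_extended_json_date(grade["date"])
print(grade_date.date().isoformat())  # 2014-03-03
```
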
1. Write a MongoDB query to display all the documents in the collection
restaurants.

Query:

db.restaurants.find();

2. Write a MongoDB query to display the fields restaurant_id, name, borough and
cuisine for all the documents in the collection restaurants.

Query:

db.restaurants.find({}, {"restaurant_id": 1, "name": 1, "borough": 1, "cuisine": 1});

3. Write a MongoDB query to display the fields restaurant_id, name, borough and
cuisine, but exclude the field _id, for all the documents in the collection restaurants.

Query:

db.restaurants.find({}, {"restaurant_id": 1, "name": 1, "borough": 1, "cuisine": 1, "_id": 0});
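
Queries 2 and 3 differ only in how _id is handled: an inclusion projection returns _id as well unless it is set to 0. The sketch below emulates that rule on a plain Python dict, purely to illustrate the behaviour; it does not talk to MongoDB, and the function name and the _id value are assumptions.

```python
def project(document, fields, include_id=True):
    """Emulate an inclusion projection on one document held as a dict.

    Only the named top-level fields are kept. As in MongoDB, _id is
    returned as well unless it is excluded explicitly.
    """
    wanted = list(fields)
    if include_id:
        wanted.append("_id")
    return {key: document[key] for key in wanted if key in document}


# The _id value is a made-up placeholder; the other values come from the
# sample document in the structure of the 'restaurants' collection.
restaurant = {
    "_id": "placeholder-id",
    "borough": "Bronx",
    "cuisine": "Bakery",
    "grades": [],
    "name": "Morris Park Bake Shop",
    "restaurant_id": "30075445",
}
fields = ["restaurant_id", "name", "borough", "cuisine"]

print(project(restaurant, fields))                    # like query 2: _id is kept
print(project(restaurant, fields, include_id=False))  # like query 3: _id is dropped
```
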


4. Write a MongoDB query to display all the restaurants that are in the borough Bronx.

Query:

db.restaurants.find({"borough": "Bronx"});

5. Write a MongoDB query to display the next 5 restaurants, after skipping the first 5,
that are in the borough Bronx.

Query:

db.restaurants.find({"borough": "Bronx"}).skip(5).limit(5);
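
The query above filters first, then skips, then limits. A rough in-memory equivalent is sketched below on a hypothetical list of documents; the function name and the generated data are assumptions for illustration only and do not involve MongoDB.

```python
from itertools import islice


def find_in_borough(documents, borough, skip=0, limit=None):
    """Filter by borough, then drop `skip` matches, then keep at most `limit`."""
    matches = (doc for doc in documents if doc.get("borough") == borough)
    stop = None if limit is None else skip + limit
    return list(islice(matches, skip, stop))


# Hypothetical data: restaurants R0..R19, every third one in Queens.
documents = [
    {"name": "R%d" % i, "borough": "Bronx" if i % 3 else "Queens"}
    for i in range(20)
]

page = find_in_borough(documents, "Bronx", skip=5, limit=5)
print([doc["name"] for doc in page])  # ['R8', 'R10', 'R11', 'R13', 'R14']
```

Note that skip and limit count matching documents, not all documents. Since the query has no sort(), the order in which MongoDB returns the matches is not guaranteed, so the "first 5" may vary.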

6. Write a MongoDB query to find the restaurants that achieved a score of more than 90.

Query:

db.restaurants.find({grades : { $elemMatch:{"score":{$gt : 90}}}});
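
$elemMatch selects a document when at least one element of the grades array satisfies the condition. The sketch below expresses the same test in plain Python to make that "at least one element" rule explicit; the function name and the second document are assumptions for illustration.

```python
def has_score_above(document, threshold):
    """Return True if any entry of document["grades"] has a score > threshold."""
    for grade in document.get("grades", []):
        score = grade.get("score")
        # skip entries whose score is missing or not a number
        if isinstance(score, (int, float)) and score > threshold:
            return True
    return False


# Scores taken from the sample document in the structure section.
bake_shop = {
    "name": "Morris Park Bake Shop",
    "grades": [{"score": 2}, {"score": 6}, {"score": 10}, {"score": 9}, {"score": 14}],
}
# Hypothetical document with one high score.
other = {"name": "Hypothetical Diner", "grades": [{"score": 12}, {"score": 92}]}

matches = [doc["name"] for doc in (bake_shop, other) if has_score_above(doc, 90)]
print(matches)  # ['Hypothetical Diner']
```

With a single condition like this one, the dot-notation form {"grades.score": {$gt: 90}} selects the same documents; $elemMatch becomes necessary when several conditions must hold for the same array element.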
