Project 2 Mongo: Francesco Valente

The document summarizes a project analyzing a movie database using MongoDB. It describes ingesting and curating the data by correcting data types. Standard queries and aggregation queries are then performed on the database. Some key points: - The movie database contains attributes like year, votes, and last updated that require data type corrections for proper analysis. - Standard queries are run to find movies by criteria like ratings, genres, countries. Aggregation queries are also used, such as finding average ratings by year. - The project involves ingesting the data, correcting data types, then performing various queries like filtering, sorting, averaging and grouping on the MongoDB database.

Uploaded by

Luigi Ferrettino

Project 2 Mongo

Francesco Valente
Politecnico di Torino
Student id: s291486
s291486@studenti.polito.it

Abstract: This report covers the Advanced Data Bases project on MongoDB. It first describes the ingestion and curation of a data collection, then presents all the queries performed on the MongoDB database built from the curated collection.

I. PROJECT OVERVIEW

The data collection proposed for the project contains information about movies and their associated reviews on IMDB and Rotten Tomatoes. Some attributes of the collection use data types that are not suitable for analysis, so they must be converted after importing the collection into MongoDB and before querying the database.
After data curation, the project consists of running different queries on MongoDB: standard queries, and queries based on the aggregation framework and the map-reduce framework.

II. DATA INGESTION AND CURATION

The database created for the project is called "imdb" and contains a single collection named "movies".
The movies collection contains several attributes, each associated with a data type. Three of these attributes have an incorrect data type: imdb.votes, year and lastupdated.
An incorrect data type can produce wrong results in later queries, or make certain kinds of queries impossible. For this reason it is important to run functions on the database that update all documents and convert those attributes to the correct type.
The functions used to update the collection are:

A. imdb.votes

    // Convert imdb.votes to a 32-bit integer in every document that has it
    db.movies.find({
        "imdb.votes": {$exists: true}
    }).forEach(
        function(doc) {
            var int_value = new NumberInt(doc.imdb.votes);
            db.movies.updateOne({
                "_id": doc._id
            }, {
                "$set": {
                    "imdb.votes": int_value
                }
            });
        }
    );

B. year

    // Convert year to a 32-bit integer in every document that has it
    db.movies.find({
        "year": {$exists: true}
    }).forEach(
        function(doc) {
            var int_value = new NumberInt(doc.year);
            db.movies.updateOne({
                "_id": doc._id
            }, {
                "$set": {
                    "year": int_value
                }
            });
        }
    );

C. lastupdated

    // Convert lastupdated to a Date in every document that has it
    db.movies.find({
        "lastupdated": {$exists: true}
    }).forEach(
        function(doc) {
            var date_value = new Date(doc.lastupdated);
            db.movies.updateOne({
                "_id": doc._id
            }, {
                "$set": {
                    "lastupdated": date_value
                }
            });
        }
    );

III. STANDARD QUERIES RESOLUTION

All four standard queries were executed in the graphical interface offered by MongoDB Compass. For each query, only the code of the individual parts that make up the query tab (FILTER, PROJECT, SORT) is reported below.

A. Standard query 1

Find all the movies that scored higher than 4.5 on Rotten Tomatoes. Sort the results in ascending order of release date.

    FILTER {"tomatoes": {$exists: true},
            "released": {$exists: true},
            "tomatoes.viewer.rating": {$gt: 4.5}}
    SORT {released: 1}

B. Standard query 2

Find the movies that have been written by 3 writers and directed by 2 directors.

    FILTER {directors: {$size: 2},
            writers: {$size: 3}}

C. Standard query 3

For the movies that belong to the "Drama" genre and to the USA, show their plot, duration (runtime), and title. Order the results by descending duration.

    FILTER {genres: {
                $elemMatch: {$eq: "Drama"}
            },
            countries: {
                $elemMatch: {$eq: "USA"}
            }
           }
    PROJECT {plot: 1, runtime: 1, title: 1}
    SORT {runtime: -1}

D. Standard query 4

Find the movies satisfying all the following conditions: they were published between 1900 and 1910, have an IMDB rating higher than 9.0, and contain the fullplot attribute. In the results, show the publication year and the length of the full plot in number of characters. Sort the results in ascending order of IMDB rating.

    FILTER {released: {
                $gte: new Date("1900"),
                $lt: new Date("1911")},
            "imdb.rating": {$gt: 9.0},
            fullplot: {$exists: true}}
    PROJECT {year: 1,
             fullplotLen: {
                 $strLenCP: "$fullplot"}
            }
    SORT {"imdb.rating": 1}

IV. AGGREGATION FRAMEWORK AND MAP-REDUCE

This section presents the solutions to the queries that required the aggregation pipeline and map-reduce.
The aggregation queries are listed first:

A. Aggregation query 1

    [{
        $match: {
            "tomatoes.viewer.rating": {
                $exists: true
            }
        }
    },{
        $group: {
            _id: "$year",
            avg_rating: {
                $avg: "$tomatoes.viewer.rating"
            }
        }
    }]

B. Aggregation query 2

    [{
        $match: {
            "countries": {
                $elemMatch: {
                    $eq: "Italy"
                }
            },
            "directors": {
                $exists: true
            }
        }
    },{
        $group: {
            _id: null,
            total_directors: {
                $avg: {
                    $size: "$directors"
                }
            }
        }
    }]

C. Aggregation query 3

    [{
        $match: {
            "imdb.rating": {
                $exists: true,
                $type: "number"
            }
        }
    },{
        $unwind: {
            path: "$genres"
        }
    },{
        $group: {
            _id: "$genres",
            avg_pub_year: {
                $avg: "$year"
            },
            max_imdb_score: {
                $max: "$imdb.rating"
            }
        }
    }]
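As an illustration of what the $match, $unwind and $group stages of aggregation query 3 compute, the sketch below emulates them in plain JavaScript on an in-memory array. The sample documents and the helper name groupByGenre are hypothetical and not part of the project; the sketch only mirrors the stage semantics and does not talk to MongoDB.

```javascript
// Hypothetical sample documents; field names follow the movies collection.
const movies = [
  { title: "A", year: 1990, genres: ["Drama", "Comedy"], imdb: { rating: 7.0 } },
  { title: "B", year: 2000, genres: ["Drama"], imdb: { rating: 8.5 } },
  { title: "C", year: 2010, genres: ["Comedy"], imdb: { rating: "" } }, // non-numeric rating
];

// $match: keep documents whose imdb.rating exists and is a number.
const matched = movies.filter(
  (m) => m.imdb !== undefined && typeof m.imdb.rating === "number"
);

// $unwind on "genres": one output document per array element
// (documents without a genres array produce no output).
const unwound = matched.flatMap((m) =>
  (m.genres || []).map((g) => ({ ...m, genres: g }))
);

// $group by genre, with $avg of year and $max of imdb.rating.
function groupByGenre(docs) {
  const acc = new Map();
  for (const d of docs) {
    const g = acc.get(d.genres) || { yearSum: 0, count: 0, maxRating: -Infinity };
    g.yearSum += d.year;
    g.count += 1;
    g.maxRating = Math.max(g.maxRating, d.imdb.rating);
    acc.set(d.genres, g);
  }
  return [...acc].map(([genre, g]) => ({
    _id: genre,
    avg_pub_year: g.yearSum / g.count,
    max_imdb_score: g.maxRating,
  }));
}

const result = groupByGenre(unwound);
console.log(result);
```

Movie "A" contributes to both the Drama and the Comedy group because $unwind duplicates it once per genre, while movie "C" is discarded by the $match stage before any grouping takes place.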
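D. Aggregation query 4

    [{
        $unwind: {
            path: "$directors"
        }
    },{
        $group: {
            _id: "$directors",
            tot_films: {
                $sum: 1
            }
        }
    },{
        $sort: {
            tot_films: -1
        }
    }]

MongoDB provides not only the aggregation pipeline but also another method to perform aggregation: map-reduce.
The solutions to the proposed queries are the following:

E. Map-reduce query 1

    db.movies.mapReduce(
        function() {
            if (this.year) {
                // Emit a count of 1 per movie, keyed by year
                emit(this.year, 1)
            }
        },
        function(key, values) {
            // Summing stays correct when reduce is re-applied to its own
            // output or skipped for keys with a single value
            return Array.sum(values)
        },
        {out: {inline: 1}}  // return the results inline
    )

F. Map-reduce query 2

    db.movies.mapReduce(
        function() {
            if (this.writers && this.title) {
                // Key: number of writers; value: partial sum and count
                // of the words in the title
                emit(this.writers.length, {
                    sum: String(this.title).split(" ").length,
                    count: 1
                })
            }
        },
        function(key, values) {
            // Same shape as the emitted value, so re-reducing is safe
            return values.reduce(
                (a, b) => ({sum: a.sum + b.sum, count: a.count + b.count}),
                {sum: 0, count: 0})
        },
        {
            out: {inline: 1},
            // Compute the average once, after all reductions
            finalize: function(key, reduced) {
                return reduced.sum / reduced.count
            }
        }
    )

G. Map-reduce query 3

    db.movies.mapReduce(
        function() {
            if (this.languages) {
                // Emit a count of 1 for each language of the movie
                this.languages.forEach(
                    language => emit(language, 1)
                )
            }
        },
        function(key, values) {
            return Array.sum(values)
        },
        {out: {inline: 1}}  // return the results inline
    )

V. INTEREST QUERY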