KEMBAR78
Aggregation Framework in MongoDB Overview Part-1 | ODP
Aggregation Framework
in MongoDB
Overview - Part-1
What is Aggregation
English Definition:- The act of gathering
something together.
Database Definition:- Aggregation are
operations that process on the data sets,
group some data records, do some
computation on that records and return
computed results.
Aggregration Approach in
MongoDB
● Aggregation Pipeline
● Map-Reduce
● Single Purpose Aggregation Operations
● Hadoop Connector
Aggregation-Pipeline
Similar to pipeline in UNIX.
In Unix -
Cat file.txt | grep abc | wc -l
In MongoDB -
Group Limit SortCollection Output
MapReduce
● Two Phase process – Mapper & Reducer.
● Use JavaScript function for map-reduce
job.
● Less efficient as it run over only one thread.
● Finalise stage to make final modification.
Aggregarion Vs. MapReduce
Easy to use, just need to use the build
in operators.
Complex and steep learning curve.
Supports non-sharded and sharded
input collections.
Supports non-sharded and sharded
input collections.
Returns results inline. Return result in inline, new collection,
merge, replace, reduce.
Limited to the operators and
expressions supported by the
aggregation pipeline.
Custom map, reduce and finalize
JavaScript functions offer flexibility to
aggregation logic.
Single Purpose Aggregation
Operation
● Applicable for single purpose of
aggregation, like count, distinct, group.
● Limited scope as compared to aggregation
pipeline and map-reduce.
● group does not support data in sharded
collections, and result of operation should
not be more than 16 MB.
Aggregation Pipeline Operators
● $group
● $match
● $project
● $limit
● $sort
● $unwind
● $skip
SQL to Aggregation Mapping Chart
SQL Operators Aggregation Operators
WHERE $match
GROUP BY $group
SELECT $project
LIMIT $limit
ORDER BY $sort
SUM() $sum
COUNT() $sum
JOIN $unwind (Not exact operator as JOIN
works, but unwinds the array
embedded in the document)
Examples
SQL Query Aggregation Query
SELECT COUNT(*) AS COUNT
FROM EMPLOYEE
db.employee.aggregate([
{$group: {_id:null, count: { $sum:1} } }
])
Explaination :
Group by : on nothing
Sum : Just add 1 to count field for each
record
SELECT SUM(SALARY) AS TOTAL
FROM EMPLOYEE
db.employee.aggregate([
{$group: {_id:null, total:
{ $sum:”$salary”} } }
])
Explaination :
Group by : on nothing
Sum: do the sum of value of salary
field of each doc and put result in total.
Example Cont.
SELECT DEPARTMENT_ID,
SUM(SALARY) AS TOTAL FROM
EMPLOYEE GROUP BY
DEPARTMENT_ID ORDER BY
TOTAL
db.employee.aggregate( [
{ $group: { _id: “$department_id”, total:
{ $sum:”$salary” } },
{ $sort: { total : 1 } } }
] )
Explanation:
Group by : department,
Sum : salary,
Order by : total of salary for each
department
Example Cont.
SELECT DEPT_ID, SUM(SALARY)
AS TOTAL FROM EMPLOYEE
WHERE AGE>25 GROUB BY
DEPT_ID, HAVING TOTAL > 5000
db.employee.aggregate( [
{ $match : { age : {$gt: 25 } } }
{ $group: { _id: “$dept_id”, total:
{ $sum: ”$salary” } } },
{ $match: { total: {$gt:5000 } } }
] )
Explanation:
Group by : department id,
Sum : salary,
Having: on total of each department
salary
Example Cont.
SELECT DEPT_ID, SUM(SALARY) AS
TOTAL FROM EMPLOYEE WHERE
AGE>25 GROUB BY DEPT_ID,
HAVING TOTAL > 5000
db.employee.aggregate( [
{ $match : { age : {$gt: 25 } } }
{ $group: { _id: “$dept_id”, total:
{ $sum: ”$salary” } } },
{ $match: { total: {$gt:5000 } } }
] )
Explanation:
Group by : department id,
Sum : salary,
Having: on total of each department
salary
$unwind Operator
Decompose the embedded array into flat
document and relate each entry in the
array with outer fields.
Example :-
{ _id: “blog”, tags: [ “social”, “economic” ] }
{ _id: “blog”, { _id: “blog”
tags: “social” } tags: “economic” }
Mongo Aggregation Optimization
MongoDB re-arranged the pipeline
operations to optimize the aggregation
performance.
● Pipeline Sequence Optimization.
● Projection Optimization.
Mongo Aggregation Optimization
MongoDB re-arranged the pipeline
operations to optimize the aggregation
performance.
● Pipeline Sequence Optimization.
● Projection Optimization.
Pipeline Sequence Optimization
● $sort + $skip + $limit
{ $sort: { salary : -1 } }, { $sort: { salary: -1 } },
{ $skip: 10 }, { $limit: 20 },
{ $limit: 10 } { $skip: 10}
Projection Optimization
● $project
● Reduce the amount of data passing
through channels of operation and will help
in performace improvement.
● Below example will only emit salary.
db.employee.aggregate(
[ {$match: {“name”: “xyz” } }
{ $project: { salary:1, _id:0} } ]
)
Restriction
● Output BSON document cannot exceed the
16 MB of data. If exceed will throw error.
● If single aggregation operation consumes
more than 10 percent of system RAM, the
operation will produce error.
Aggregation Examples
● Download data from the below link:
http://media.mongdb.org/zips.json
● Practice sample problem from the below
links:
http://docs.mongodb.org/manual/tutorial/aggregatio-zip-code-data-set/
References
● http://docs.mongodb.org/
Good Examples:-
● http://rubayeet.wordpress.com/2013/12/29/web-analytics
-using-mongodb-aggregation-framework/
● http://derickrethans.nl/aggregation-framework.html
● http://architects.dzone.com/articles/using-mongodb
-aggregation
Anuj Jain
● ajain@equalexperts.com
● anuj2jain@gmail.com

Aggregation Framework in MongoDB Overview Part-1

  • 1.
  • 2.
    What is Aggregation EnglishDefinition:- The act of gathering something together. Database Definition:- Aggregation are operations that process on the data sets, group some data records, do some computation on that records and return computed results.
  • 3.
    Aggregration Approach in MongoDB ●Aggregation Pipeline ● Map-Reduce ● Single Purpose Aggregation Operations ● Hadoop Connector
  • 4.
    Aggregation-Pipeline Similar to pipelinein UNIX. In Unix - Cat file.txt | grep abc | wc -l In MongoDB - Group Limit SortCollection Output
  • 5.
    MapReduce ● Two Phaseprocess – Mapper & Reducer. ● Use JavaScript function for map-reduce job. ● Less efficient as it run over only one thread. ● Finalise stage to make final modification.
  • 7.
    Aggregarion Vs. MapReduce Easyto use, just need to use the build in operators. Complex and steep learning curve. Supports non-sharded and sharded input collections. Supports non-sharded and sharded input collections. Returns results inline. Return result in inline, new collection, merge, replace, reduce. Limited to the operators and expressions supported by the aggregation pipeline. Custom map, reduce and finalize JavaScript functions offer flexibility to aggregation logic.
  • 8.
    Single Purpose Aggregation Operation ●Applicable for single purpose of aggregation, like count, distinct, group. ● Limited scope as compared to aggregation pipeline and map-reduce. ● group does not support data in sharded collections, and result of operation should not be more than 16 MB.
  • 9.
    Aggregation Pipeline Operators ●$group ● $match ● $project ● $limit ● $sort ● $unwind ● $skip
  • 10.
    SQL to AggregationMapping Chart SQL Operators Aggregation Operators WHERE $match GROUP BY $group SELECT $project LIMIT $limit ORDER BY $sort SUM() $sum COUNT() $sum JOIN $unwind (Not exact operator as JOIN works, but unwinds the array embedded in the document)
  • 11.
    Examples SQL Query AggregationQuery SELECT COUNT(*) AS COUNT FROM EMPLOYEE db.employee.aggregate([ {$group: {_id:null, count: { $sum:1} } } ]) Explaination : Group by : on nothing Sum : Just add 1 to count field for each record SELECT SUM(SALARY) AS TOTAL FROM EMPLOYEE db.employee.aggregate([ {$group: {_id:null, total: { $sum:”$salary”} } } ]) Explaination : Group by : on nothing Sum: do the sum of value of salary field of each doc and put result in total.
  • 12.
    Example Cont. SELECT DEPARTMENT_ID, SUM(SALARY)AS TOTAL FROM EMPLOYEE GROUP BY DEPARTMENT_ID ORDER BY TOTAL db.employee.aggregate( [ { $group: { _id: “$department_id”, total: { $sum:”$salary” } }, { $sort: { total : 1 } } } ] ) Explanation: Group by : department, Sum : salary, Order by : total of salary for each department
  • 13.
    Example Cont. SELECT DEPT_ID,SUM(SALARY) AS TOTAL FROM EMPLOYEE WHERE AGE>25 GROUB BY DEPT_ID, HAVING TOTAL > 5000 db.employee.aggregate( [ { $match : { age : {$gt: 25 } } } { $group: { _id: “$dept_id”, total: { $sum: ”$salary” } } }, { $match: { total: {$gt:5000 } } } ] ) Explanation: Group by : department id, Sum : salary, Having: on total of each department salary
  • 14.
    Example Cont. SELECT DEPT_ID,SUM(SALARY) AS TOTAL FROM EMPLOYEE WHERE AGE>25 GROUB BY DEPT_ID, HAVING TOTAL > 5000 db.employee.aggregate( [ { $match : { age : {$gt: 25 } } } { $group: { _id: “$dept_id”, total: { $sum: ”$salary” } } }, { $match: { total: {$gt:5000 } } } ] ) Explanation: Group by : department id, Sum : salary, Having: on total of each department salary
  • 15.
    $unwind Operator Decompose theembedded array into flat document and relate each entry in the array with outer fields. Example :- { _id: “blog”, tags: [ “social”, “economic” ] } { _id: “blog”, { _id: “blog” tags: “social” } tags: “economic” }
  • 16.
    Mongo Aggregation Optimization MongoDBre-arranged the pipeline operations to optimize the aggregation performance. ● Pipeline Sequence Optimization. ● Projection Optimization.
  • 17.
    Mongo Aggregation Optimization MongoDBre-arranged the pipeline operations to optimize the aggregation performance. ● Pipeline Sequence Optimization. ● Projection Optimization.
  • 18.
    Pipeline Sequence Optimization ●$sort + $skip + $limit { $sort: { salary : -1 } }, { $sort: { salary: -1 } }, { $skip: 10 }, { $limit: 20 }, { $limit: 10 } { $skip: 10}
  • 19.
    Projection Optimization ● $project ●Reduce the amount of data passing through channels of operation and will help in performace improvement. ● Below example will only emit salary. db.employee.aggregate( [ {$match: {“name”: “xyz” } } { $project: { salary:1, _id:0} } ] )
  • 20.
    Restriction ● Output BSONdocument cannot exceed the 16 MB of data. If exceed will throw error. ● If single aggregation operation consumes more than 10 percent of system RAM, the operation will produce error.
  • 21.
    Aggregation Examples ● Downloaddata from the below link: http://media.mongdb.org/zips.json ● Practice sample problem from the below links: http://docs.mongodb.org/manual/tutorial/aggregatio-zip-code-data-set/
  • 22.
    References ● http://docs.mongodb.org/ Good Examples:- ●http://rubayeet.wordpress.com/2013/12/29/web-analytics -using-mongodb-aggregation-framework/ ● http://derickrethans.nl/aggregation-framework.html ● http://architects.dzone.com/articles/using-mongodb -aggregation
  • 23.