Map reduce advantages over parallel databases


Outline
• Introduction
• Heterogeneous Systems
• Complex Functions
• Fault Tolerance
• Performance
• Conclusion

MapReduce: A Flexible Data Processing Tool
• MapReduce is a programming model composed of:
  • Map function: turns input key/value pairs into intermediate key/value pairs
  • Reduce function: merges the values that share an intermediate key
• Processes many terabytes of data
• The system is easy to use
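
The Map and Reduce roles above can be sketched with a word count. This is a minimal single-process illustration, not a real MapReduce runtime: `run_mapreduce`, `map_fn`, and `reduce_fn` are hypothetical names chosen for this example, and the in-memory grouping stands in for the shuffle step a real system would perform.

```python
from collections import defaultdict
from typing import Iterable, Iterator

def map_fn(key: str, value: str) -> Iterator[tuple[str, int]]:
    """Map: emit an intermediate (word, 1) pair for every word in one record."""
    for word in value.split():
        yield word.lower(), 1

def reduce_fn(key: str, values: Iterable[int]) -> tuple[str, int]:
    """Reduce: merge all values that share the same intermediate key."""
    return key, sum(values)

def run_mapreduce(records, map_fn, reduce_fn):
    """Hypothetical driver: group intermediate pairs by key, then reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))

records = [("doc1", "the quick fox"), ("doc2", "the lazy dog")]
counts = run_mapreduce(records, map_fn, reduce_fn)
```

The user writes only `map_fn` and `reduce_fn`; everything else belongs to the framework, which is the sense in which the model is easy to use.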

Paper
• By Andrew Pavlo et al.
• A comparison paper: MapReduce versus parallel databases
• Related claim: "MapReduce is a major step backwards"

Heterogeneous Systems
• Data is often spread across heterogeneous storage systems, such as relational databases and file systems
• MapReduce provides a simple model for analyzing data in such heterogeneous systems
• In a parallel database:
  • the input must first be copied into the database
  • only then can it be analyzed
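
One way to picture analysis across heterogeneous storage is a set of small reader functions that all yield key/value records, so the same analysis code runs over every source without copying the data into one system first. The sketch below is an assumption-laden illustration: the `notes` table, the `read_sqlite` and `read_lines` helpers, and the sample data are all hypothetical, and an in-memory SQLite database and a `StringIO` object stand in for a relational database and a file system.

```python
import io
import sqlite3
from collections import Counter
from itertools import chain

def read_sqlite(conn):
    """Hypothetical reader: yield (key, text) records from a relational table."""
    for row_id, body in conn.execute("SELECT id, body FROM notes ORDER BY id"):
        yield f"db:{row_id}", body

def read_lines(fileobj):
    """Hypothetical reader: yield (key, text) records from a plain-text file."""
    for lineno, line in enumerate(fileobj):
        yield f"file:{lineno}", line.rstrip("\n")

def count_words(records):
    """The same analysis runs over records from any reader."""
    counts = Counter()
    for _key, text in records:
        counts.update(text.split())
    return counts

# Stand-in for a relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO notes (body) VALUES (?)", [("alpha beta",), ("beta",)])

# Stand-in for a file on a file system.
log_file = io.StringIO("beta gamma\nalpha\n")

totals = count_words(chain(read_sqlite(conn), read_lines(log_file)))
```

Neither source is loaded into the other before the analysis starts, which is the contrast with the copy-then-analyze path of a parallel database.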

Complex Functions
• Map and Reduce functions are often simple and have straightforward SQL equivalents
• Pavlo et al. pointed out that some functions are very complicated to express in SQL
• User-defined functions (UDFs) are the alternative in parallel databases
  • UDF support is sometimes buggy
• MapReduce is a better framework for such functions
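
As an illustration of a Map function that is awkward to write in plain SQL, the sketch below parses HTML, extracts outgoing links, and groups them by target page. The task, the `LinkExtractor` class, the function names, and the sample pages are assumptions chosen for this example; they do not come from the slides.

```python
from collections import defaultdict
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a document."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def map_links(url, html):
    """Map: emit (target, source) for each outgoing link in one page."""
    parser = LinkExtractor()
    parser.feed(html)
    for target in parser.links:
        yield target, url

def reduce_links(target, sources):
    """Reduce: list the distinct pages that link to one target."""
    return target, sorted(set(sources))

# Hypothetical input pages.
pages = {
    "a.html": '<p><a href="b.html">B</a> and <a href="c.html">C</a></p>',
    "b.html": '<a href="c.html">C again</a>',
}

inbound = defaultdict(list)
for url, html in pages.items():
    for target, source in map_links(url, html):
        inbound[target].append(source)
result = dict(reduce_links(t, s) for t, s in inbound.items())
```

The parsing logic lives in ordinary code, where it can be tested and debugged like any other function, instead of inside a database UDF.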

Fault Tolerance (1/2)
• Two models to transfer data between mappers and reducers:
  • Pull model (move)
  • Push model (write)
• Pavlo et al.: the pull model creates many small files and many disk seeks
• MapReduce implementations reduce these costs with:
  • batching, sorting, and grouping of intermediate data
  • smart scheduling of reads

Fault Tolerance (2/2)
• MapReduce does not use the push model because of fault-tolerance requirements
• Fault tolerance becomes more important for processing large data sets efficiently
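
The fault-tolerance argument for the pull model can be sketched as follows, under the assumption that mappers persist their output locally and reducers fetch it on demand: when a reducer fails, only that reducer is retried and no map task runs again. Everything here is a hypothetical single-process simulation; `local_disk` is a dictionary standing in for mapper-local storage, and the partition function is a toy deterministic hash.

```python
from collections import defaultdict

stats = {"map_runs": 0}  # counts how many times a map task executes

def run_map_task(task_id, words, num_reducers, local_disk):
    """Map task: write partitioned (word, 1) pairs to mapper-local storage."""
    stats["map_runs"] += 1
    for word in words:
        partition = sum(word.encode()) % num_reducers  # toy deterministic hash
        local_disk[(task_id, partition)].append((word, 1))

def run_reduce_task(partition, num_map_tasks, local_disk, fail=False):
    """Reduce task: pull its partition from every mapper, then sum counts."""
    if fail:
        raise RuntimeError("simulated reducer crash")
    counts = defaultdict(int)
    for task_id in range(num_map_tasks):
        for word, n in local_disk[(task_id, partition)]:
            counts[word] += n
    return dict(counts)

inputs = [["a", "b", "a"], ["b", "c"]]
local_disk = defaultdict(list)
for task_id, words in enumerate(inputs):
    run_map_task(task_id, words, 2, local_disk)

results = {}
for partition in range(2):
    try:
        # The first attempt at partition 0 crashes on purpose.
        results[partition] = run_reduce_task(
            partition, len(inputs), local_disk, fail=(partition == 0)
        )
    except RuntimeError:
        # Retry only the failed reducer; the map output is still available.
        results[partition] = run_reduce_task(partition, len(inputs), local_disk)

merged = {**results[0], **results[1]}
```

In a push design, the intermediate data would have gone straight to the crashed reducer, so recovering it would mean running the map tasks again.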

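Performance (1/2)
• Cost of merging results:
  • Merging isn't necessary, whether the next consumer of the MapReduce output is:
    • another MapReduce, which can read the separate output files directly
    • not another MapReduce, since reducers can write directly to a merged destination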
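
The point about skipping the merge can be sketched as a consumer that reads every reducer's output file in turn, with no step that concatenates them into one file first. The `part-00000` style file names, the tab-separated layout, and the helper names are assumptions made for this illustration.

```python
import tempfile
from pathlib import Path

def write_shards(out_dir, shards):
    """Each reducer writes its own output file; nothing is merged afterwards."""
    for i, pairs in enumerate(shards):
        path = out_dir / f"part-{i:05d}"
        path.write_text("".join(f"{k}\t{v}\n" for k, v in pairs))

def read_shards(out_dir):
    """The next consumer iterates over all shards as one logical input."""
    for path in sorted(out_dir.glob("part-*")):
        for line in path.read_text().splitlines():
            key, value = line.split("\t")
            yield key, int(value)

with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)
    # Hypothetical output of two reducers.
    write_shards(out_dir, [[("a", 2), ("c", 1)], [("b", 2)]])
    total = sum(v for _k, v in read_shards(out_dir))
    keys = [k for k, _v in read_shards(out_dir)]
```

Because the consumer treats the directory as its input, the cost of producing a single merged file never has to be paid.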

Performance (2/2)
• Data loading:
  • Hadoop can finish analyzing the data 5 to 50 times faster than a parallel database can load it

Conclusion
• MapReduce is a highly effective and efficient tool for large-scale, fault-tolerant data analysis
• MapReduce is very useful in heterogeneous systems
• MapReduce provides a good framework for functions that are too complex to express easily in SQL
