MapReduce advantages over parallel databases
MAPREDUCE
Outline
• Introduction
• Heterogeneous Systems
• Complex Functions
• Fault Tolerance
• Performance
• Conclusion
MapReduce: A Flexible Data Processing Tool
• MapReduce is a programming model composed of:
• Map function: turns each input key/value pair into intermediate key/value pairs
• Reduce function: merges all intermediate values that share the same key
• Processes many terabytes of data
• The system is easy to use
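The two functions can be sketched as a single-process word count. This is a minimal illustration, not a real MapReduce runtime. The names `map_fn`, `reduce_fn`, `run_mapreduce` and the sample documents are hypothetical, and a plain dictionary stands in for the distributed grouping step.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(doc_id: str, text: str) -> Iterator[tuple[str, int]]:
    """Map: one input key/value pair -> zero or more intermediate key/value pairs."""
    for word in text.split():
        yield word.lower(), 1


def reduce_fn(word: str, counts: Iterable[int]) -> tuple[str, int]:
    """Reduce: one key plus all of its intermediate values -> one output pair."""
    return word, sum(counts)


def run_mapreduce(inputs: dict[str, str]) -> dict[str, int]:
    # Map phase: apply map_fn to every input record and group values by key.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in inputs.items():
        for k, v in map_fn(key, value):
            groups[k].append(v)
    # Reduce phase: one reduce_fn call per distinct key, in sorted key order.
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))


docs = {"d1": "the quick fox", "d2": "The lazy dog and the fox"}
print(run_mapreduce(docs))
# {'and': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

The user writes only `map_fn` and `reduce_fn`. Grouping, and in a real system distribution and retries, belong to the framework.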
Paper
• By Andrew Pavlo et al.
• Comparison paper
• Claim: MapReduce is a major step backwards
Heterogeneous Systems
• Data is often spread across heterogeneous storage systems, such as relational databases and file systems
• MapReduce provides a simple model for analyzing data in such heterogeneous systems
• In a parallel database:
• the input must first be copied (loaded) into the database
• only then can it be analyzed
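The sketch below illustrates the heterogeneous case: one job reads a CSV "file" and a relational table in place, and nothing is loaded into a central database first. It is a small assumption-laden illustration. The CSV text, the `sales` table and the helpers `read_csv`, `read_table` and `run` are hypothetical, and an in-memory SQLite database stands in for a real relational store.

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Hypothetical source 1: a CSV "file" on a file system (simulated with StringIO).
CSV_TEXT = "region,amount\neu,10\nus,5\neu,7\n"

# Hypothetical source 2: a relational table (in-memory SQLite stands in for a DBMS).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("us", 20), ("asia", 3)])


def read_csv():
    # Each source only needs a small reader that yields (region, amount) records.
    for row in csv.DictReader(io.StringIO(CSV_TEXT)):
        yield row["region"], int(row["amount"])


def read_table():
    yield from conn.execute("SELECT region, amount FROM sales")


def map_fn(region, amount):
    yield region, amount


def reduce_fn(region, amounts):
    return region, sum(amounts)


def run(readers):
    groups = defaultdict(list)
    for reader in readers:  # each source is read where it lives; no copy step
        for record in reader():
            for k, v in map_fn(*record):
                groups[k].append(v)
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))


print(run([read_csv, read_table]))
# {'asia': 3, 'eu': 17, 'us': 25}
```

Adding a new storage system means writing one more reader. The map and reduce logic stays the same.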
Complex Functions
• Simple Map and Reduce functions have straightforward SQL equivalents
• Pavlo et al. pointed out that some computations are very complicated to express in SQL
• Such computations need User Defined Functions (UDFs)
• UDF support is sometimes buggy
• MapReduce is a better framework for such computations
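Some jobs have an obvious SQL equivalent. Summing values per key, for example, corresponds to `SELECT key, SUM(value) FROM t GROUP BY key`. Others are just ordinary code in a map function. The sketch below is a hedged example of the second kind: it parses free-form log lines with a regular expression and a URL parser, then computes a median per host. In SQL this would typically need string functions or a UDF. The log format, `LINE_RE` and the function names are hypothetical.

```python
import re
from collections import defaultdict
from statistics import median
from urllib.parse import urlparse

# Hypothetical free-form log lines: "GET <url> <latency>ms".
LOGS = [
    "GET https://a.example.com/x 120ms",
    "GET http://b.example.org/y?q=1 80ms",
    "GET https://a.example.com/z 40ms",
    "malformed line",
]

LINE_RE = re.compile(r"^GET (\S+) (\d+)ms$")


def map_fn(line):
    # Arbitrary parsing code: regex match, then URL parsing; bad lines are skipped.
    m = LINE_RE.match(line)
    if m:
        yield urlparse(m.group(1)).hostname, int(m.group(2))


def reduce_fn(host, latencies):
    # Median latency per host.
    return host, median(latencies)


def run(lines):
    groups = defaultdict(list)
    for line in lines:
        for k, v in map_fn(line):
            groups[k].append(v)
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))


print(run(LOGS))
```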
Fault Tolerance (1/2)
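• Two models for transferring data between mappers and reducers:
• Pull model: reducers fetch (move) intermediate data from the mappers
• Push model: mappers write intermediate data directly to the reducers
• Pavlo et al.: the pull model creates many small files and many disk seeks
• To reduce this cost, MapReduce uses:
• batching, sorting and grouping of intermediate data
• smart scheduling of reads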
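The pull model with batching and sorting can be sketched in a single process as follows. Each mapper batches its output into one sorted run per reducer and keeps it locally. Each reducer then pulls its own run from every mapper, merges the already-sorted runs and groups by key. This is a simplified illustration, not Google's or Hadoop's implementation. `NUM_REDUCERS`, the CRC32 partitioner and the input splits are hypothetical choices.

```python
import heapq
import zlib
from itertools import groupby
from operator import itemgetter

NUM_REDUCERS = 2  # hypothetical number of reduce tasks


def partition(key: str) -> int:
    # Deterministic hash partitioning: every mapper sends a given key to the same reducer.
    return zlib.crc32(key.encode()) % NUM_REDUCERS


def run_mapper(lines):
    """Map task: emit (word, 1), batch the pairs per reducer, and sort each batch.

    The sorted batches stay with the mapper (here, a list per reducer) until pulled.
    """
    batches = [[] for _ in range(NUM_REDUCERS)]
    for line in lines:
        for word in line.split():
            batches[partition(word)].append((word, 1))
    return [sorted(b) for b in batches]


def run_reducer(r, mapper_outputs):
    """Reduce task r: pull batch r from every mapper, merge the sorted runs, group by key."""
    runs = [out[r] for out in mapper_outputs]  # one read per mapper: the "pull"
    merged = heapq.merge(*runs)                # inputs are sorted, so this is a streaming merge
    return {k: sum(v for _, v in grp) for k, grp in groupby(merged, key=itemgetter(0))}


splits = [["a b a"], ["b c"], ["a c c"]]  # hypothetical input split per mapper
mapper_outputs = [run_mapper(s) for s in splits]
result = {}
for r in range(NUM_REDUCERS):
    result.update(run_reducer(r, mapper_outputs))
print(dict(sorted(result.items())))
# {'a': 3, 'b': 2, 'c': 3}
```

Sorting on the mapper side turns each reducer's work into a sequential merge of a few large runs, which is the idea behind the batching and read-scheduling optimizations.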
Fault Tolerance (2/2)
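• MapReduce does not use the push model because of fault tolerance: with pulling, map output is kept on disk, so a failed reducer can re-read it instead of forcing the map tasks to re-run
• Fault tolerance will become more important as larger data sets are processed on larger clusters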
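The fault-tolerance argument can be illustrated with a toy simulation. Map output is materialized, so when a reducer crashes partway through, the retry simply pulls the saved output again and no map task runs twice. With a push model, the map output would be streamed into the reducer. Unless it were also saved on the mapper side, losing the reducer would mean re-running the maps. The failure injection, `ReducerCrash` and the data are hypothetical.

```python
from collections import Counter

map_runs = Counter()  # how many times each map task has executed


def run_map_task(task_id, lines):
    map_runs[task_id] += 1
    # The finished task's intermediate output is materialized (kept) for reducers to pull.
    return [(w, 1) for line in lines for w in line.split()]


class ReducerCrash(Exception):
    """Hypothetical failure injected into a reduce attempt."""


def run_reduce_task(materialized, fail=False):
    total = Counter()
    for task_id, pairs in materialized.items():  # pull each map task's saved output
        for word, n in pairs:
            total[word] += n
        if fail:
            raise ReducerCrash(f"reducer died after reading map task {task_id}")
    return dict(total)


splits = {0: ["x y"], 1: ["y z y"]}
materialized = {tid: run_map_task(tid, lines) for tid, lines in splits.items()}

try:
    run_reduce_task(materialized, fail=True)  # first attempt crashes midway
except ReducerCrash:
    pass
result = run_reduce_task(materialized)  # retry re-reads the saved map output

print(result, dict(map_runs))
# {'x': 1, 'y': 3, 'z': 1} {0: 1, 1: 1}
```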
Performance (1/2)
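• Merging results (cost):
• MapReduce writes one output file per reducer
• Merging these files isn't necessary when the next consumer of the MapReduce output is:
• another MapReduce, which can read the per-reducer files directly as its input
• not another MapReduce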
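Chaining without a merge step can be sketched in a few lines. Job 1 leaves one output part per reducer. Job 2 takes that list of parts directly as its input splits and never concatenates them. This is a single-process sketch: `NUM_REDUCERS`, the CRC32 partitioner, both job functions and the data are hypothetical, and job 2's map and reduce are folded into one `Counter` for brevity.

```python
import zlib
from collections import Counter

NUM_REDUCERS = 3  # hypothetical; job 1 therefore produces 3 output parts


def job1_word_count(lines):
    """Job 1: word count. Returns one output part per reducer as sorted (word, count) pairs."""
    parts = [Counter() for _ in range(NUM_REDUCERS)]
    for line in lines:
        for w in line.split():
            parts[zlib.crc32(w.encode()) % NUM_REDUCERS][w] += 1
    return [sorted(p.items()) for p in parts]  # like part-0, part-1, part-2


def job2_count_histogram(part_files):
    """Job 2: how many distinct words occur exactly n times.

    Its input splits are job 1's parts as-is. Map emits (n, 1) per word and
    reduce sums per n; both are combined into one Counter here.
    """
    hist = Counter()
    for part in part_files:  # each part could be handled by a different mapper
        for _word, n in part:
            hist[n] += 1
    return dict(sorted(hist.items()))


parts = job1_word_count(["a b a c", "b a d"])
print(job2_count_histogram(parts))
# {1: 2, 2: 1, 3: 1}
```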
Performance (2/2)
• Data loading:
• Loading data into a parallel database took 5 to 50 times longer than analyzing the same data with Hadoop
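A quick calculation shows what the loading figure implies. Measure time in units of one Hadoop run, and let the load take k of those units (k between 5 and 50, per the slide). For a one-off analysis, Hadoop finishes before the load alone completes. Loading pays off only after repeated analyses, and only if each database query is faster than a Hadoop run. The sketch assumes a hypothetical speedup factor `db_speedup` (r), which is not from the slide, and solves n = k + n/r for the break-even number of runs n = k·r/(r − 1).

```python
import math


def break_even_runs(load_factor: float, db_speedup: float) -> float:
    """Number of analysis runs after which loading into the database pays off.

    load_factor: load time divided by the time of one Hadoop run (5 to 50 on the slide).
    db_speedup:  hypothetical factor by which one query on the loaded database is
                 faster than one Hadoop run (an assumption, not a measured value).

    In units of one Hadoop run, n runs cost n on Hadoop and
    load_factor + n / db_speedup on the database; solving n = load_factor + n / db_speedup
    gives n = load_factor * db_speedup / (db_speedup - 1).
    """
    if db_speedup <= 1:
        return math.inf  # the database never catches up
    return load_factor * db_speedup / (db_speedup - 1)


for k in (5, 50):
    print(k, break_even_runs(k, db_speedup=2))
# 5 10.0
# 50 100.0
```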
Conclusion
• MapReduce is a highly effective and efficient tool for
large-scale fault-tolerant data analysis
• MapReduce is very useful in heterogeneous systems
• MapReduce provides a good framework for complex functions that are hard to express in SQL