MapReduce: Simplified Data Processing on Large Clusters
By: Jeffrey Dean & Sanjay Ghemawat
In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI '04)
Also appears in Communications of the ACM (2008)
Presented by: Warunika Ranaweera
Supervised by: Dr. Nalin Ranasinghe
Jeffrey Dean
Ph.D. in Computer Science, University of Washington
Google Fellow in the Systems and Infrastructure Group
ACM Fellow
Research areas: distributed systems and parallel computing
Sanjay Ghemawat
Ph.D. in Computer Science, Massachusetts Institute of Technology
Google Fellow
Research areas: distributed systems and parallel computing
Calculate 30*50
Easy?
30*50 + 31*51 + 32*52 + 33*53 + .... + 40*60
A little bit harder?
Simple computation, but a huge data set
A real-world example of a large computation:
20+ billion web pages × 20 KB per page ≈ 400+ TB
One computer can read 30-35 MB/s from disk
Nearly four months to read the web on one machine
Parallelize the tasks in a distributed computing environment
The web-page problem is then solved in about 3 hours with 1,000 machines
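The arithmetic behind those figures, as a quick Python check (using the slide's 20 KB page size and a 35 MB/s read rate):

pages = 20e9          # 20+ billion web pages
page_bytes = 20e3     # 20 KB per page
disk_rate = 35e6      # ~35 MB/s sequential read from disk

total = pages * page_bytes                       # 4e14 bytes = 400 TB
days_one_machine = total / disk_rate / 86400     # ~132 days, i.e. ~4 months
hours_1000_machines = days_one_machine * 24 / 1000
print(f"{total / 1e12:.0f} TB, {days_one_machine:.0f} days on one machine, "
      f"{hours_1000_machines:.1f} h on 1,000 machines")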
Complexities in Distributed Computing
o How to parallelize the computation?
o Coordinate with other nodes
o Handling failures
o Preserve bandwidth
o Load balancing
A platform to hide the messy details of distributed computing, namely:
Parallelization
Fault tolerance
Data distribution
Load balancing
MapReduce is both:
A programming model
An implementation
Example: Word count

Document:
  the quick brown fox
  the fox ate the mouse

Mapped:
  (the, 1) (quick, 1) (brown, 1) (fox, 1) (the, 1) (fox, 1) (ate, 1) (the, 1) (mouse, 1)

Reduced:
  (the, 3) (quick, 1) (brown, 1) (fox, 2) (ate, 1) (mouse, 1)
Eg: Word count using MapReduce

Input (one split per Map worker):
  "the quick brown fox"    "the fox ate the mouse"

Map:
  Map 1 emits (the, 1) (quick, 1) (brown, 1) (fox, 1)
  Map 2 emits (the, 1) (fox, 1) (ate, 1) (the, 1) (mouse, 1)

Reduce (after grouping by word):
  (the, 3) (quick, 1) (brown, 1) (fox, 2) (ate, 1) (mouse, 1)
Map
Input: a text file (key: document name, value: document contents)
Output: intermediate key/value pairs, e.g. (fox, 1)

map(String key, String value):
  // key: document name
  // value: document contents
  for each word w in value:
    EmitIntermediate(w, "1");
Reduce
Input: a word and the list of counts emitted by Map, e.g. (fox, {1, 1})
Output: the accumulated count, e.g. (fox, 2)

reduce(String key, Iterator values):
  // key: a word
  // values: a list of counts
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  Emit(AsString(result));
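To make the pseudocode concrete, here is a minimal runnable Python sketch of the same word count, with an in-memory dictionary standing in for MapReduce's shuffle/grouping step (the function names are illustrative, not Google's API):

from collections import defaultdict

def map_fn(key, value):
    # key: document name; value: document contents
    for word in value.split():
        yield (word, 1)

def reduce_fn(key, values):
    # key: a word; values: the list of counts for that word
    return (key, sum(values))

def map_reduce(documents):
    groups = defaultdict(list)          # the "shuffle": group values by key
    for name, contents in documents.items():
        for word, count in map_fn(name, contents):
            groups[word].append(count)
    return [reduce_fn(word, counts) for word, counts in groups.items()]

docs = {"doc1": "the quick brown fox", "doc2": "the fox ate the mouse"}
print(map_reduce(docs))   # [('the', 3), ('quick', 1), ..., ('fox', 2), ...]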
Reverse Web-Link Graph
[Figure: source web pages 1-5, each containing a link to a target page (my web page)]
Reverse Web-Link Graph
Map: for each link in a source page, emit (target, source)
  (My Web, Source 1)
  (Not My Web, Source 2)
  (My Web, Source 3)
  (My Web, Source 4)
  (My Web, Source 5)
Reduce: collect all sources pointing to the same target
  (My Web, {Source 1, Source 3, .....})
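A similar runnable Python sketch for the reverse web-link graph (the pages and links here are invented for illustration):

from collections import defaultdict

def map_fn(source, targets):
    # Emit (target, source) for every outgoing link in the source page.
    for target in targets:
        yield (target, source)

def reduce_fn(target, sources):
    # Concatenate all sources that link to the same target.
    return (target, sources)

links = {                                  # page -> pages it links to
    "source1": ["my_web"],
    "source2": ["not_my_web"],
    "source3": ["my_web"],
}
groups = defaultdict(list)
for source, targets in links.items():
    for target, src in map_fn(source, targets):
        groups[target].append(src)
print([reduce_fn(t, s) for t, s in groups.items()])
# [('my_web', ['source1', 'source3']), ('not_my_web', ['source2'])]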
Execution Overview
(1) Fork: the user program forks the master and the workers.
(2) Assign: the master assigns map tasks and reduce tasks to idle workers.
(3) Read: each map worker reads its input split (Split 0 ... Split 4).
(4) Local write: map output is written to intermediate files on local disk.
(5) Remote read: reduce workers read the intermediate files across the network.
(6) Write: each reduce worker writes its output file (O/P File 0, O/P File 1).
Layers: input → map → intermediate files → reduce → output
Complexities in Distributed Computing, to be solved
o Automatic parallelization using Map & Reduce
o Coordinate with other nodes
o Handling failures
o Preserve bandwidth
o Load balancing
Restricted programming model
User-specified Map & Reduce functions
1000s of workers, different data sets
[Figure: user-defined Map/Reduce instructions dispatched to Worker 1, Worker 2, and Worker 3, each over a different part of the data]
Complexities in Distributed Computing, solving...
o Automatic parallelization using Map & Reduce
o Coordinate nodes using a master node
o Handling failures
o Preserve bandwidth
o Load balancing
Master data structure
The master pushes information (meta-data, such as the locations of intermediate files) from Map workers to Reduce workers
[Figure: Map worker → information → Master → information → Reduce worker]
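A minimal sketch of the kind of per-task state the paper describes the master keeping; the class and field names here are illustrative, not from Google's implementation:

from dataclasses import dataclass, field

@dataclass
class Task:
    state: str = "idle"                 # idle, in-progress, or completed
    worker: str = ""                    # identity of the assigned worker machine
    intermediate_files: list = field(default_factory=list)

class Master:
    def __init__(self, n_map, n_reduce):
        self.map_tasks = [Task() for _ in range(n_map)]
        self.reduce_tasks = [Task() for _ in range(n_reduce)]

    def on_map_completed(self, task_id, file_locations):
        # Record where the map task left its intermediate files, so the
        # locations can be pushed incrementally to the reduce workers.
        task = self.map_tasks[task_id]
        task.state = "completed"
        task.intermediate_files = file_locations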
Complexities in Distributed Computing, solving...
o Automatic parallelization using Map & Reduce
o Coordinate nodes using a master node
o Fault tolerance (re-execution) & backup tasks
o Preserve bandwidth
o Load balancing
No response from a worker task?
If an ongoing Map or Reduce task: re-execute it
If a completed Map task: re-execute it (its output sits on the failed machine's local disk)
If a completed Reduce task: leave it untouched (its output is already in the global file system)
Master failure (unlikely)
Restart the computation
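These rules are simple enough to encode directly; a hypothetical sketch (the state and return labels are mine):

def on_worker_timeout(task_type, task_state):
    # Decide what to do when a worker stops responding.
    if task_state == "in-progress":
        return "re-execute"        # ongoing map or reduce work is lost
    if task_type == "map" and task_state == "completed":
        return "re-execute"        # output was on the dead machine's local disk
    return "keep"                  # completed reduce output is in the global file system

assert on_worker_timeout("map", "completed") == "re-execute"
assert on_worker_timeout("reduce", "completed") == "keep"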
Straggler: a machine that takes an unusually long time to complete one of the last tasks in the computation
Solution: redundant execution
Near the end of a phase, spawn backup copies of the remaining in-progress tasks
The copy that finishes first "wins"
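"First copy wins" can be sketched as follows; the task representation is invented for illustration:

def finish(task, worker):
    # Primary and backup copies race; the first to finish "wins",
    # and later completions of the same task are simply discarded.
    if task["state"] == "completed":
        return False               # another copy already won
    task["state"] = "completed"
    task["winner"] = worker
    return True

task = {"state": "in-progress"}
print(finish(task, "backup-worker"))   # True: the backup won
print(finish(task, "primary-worker"))  # False: result discarded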
Complexities in Distributed Computing, solving...
o Automatic parallelization using Map & Reduce
o Coordinate nodes using a master node
o Fault tolerance (re-execution) & backup tasks
o Saves bandwidth through locality
o Load balancing
The same data set is stored (replicated) on several machines
If a task runs on a machine that holds its input data locally, it need not fetch the data from other nodes
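A toy sketch of locality-aware task assignment under that scheme (replica placement and worker names are invented):

def assign_map_task(split_replicas, idle_workers):
    # Prefer an idle worker that already stores a replica of the input split.
    for worker in idle_workers:
        if worker in split_replicas:
            return worker          # local read, no network transfer
    return idle_workers[0]         # otherwise any idle worker will do

replicas = {"workerA", "workerC"}  # machines holding this input split
print(assign_map_task(replicas, ["workerB", "workerC"]))   # -> workerC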
Complexities in Distributed Computing, solved
o Automatic parallelization using Map & Reduce
o Coordinate nodes using a master node
o Fault tolerance & backup tasks
o Saves bandwidth through locality
o Load balancing through granularity
Fine-grained tasks: many more map tasks than machines
One worker runs several tasks over the course of a job
Idle workers are quickly assigned new work
Refinements
Partitioning
Combining
Skipping bad records
Local execution for debugging
Counters
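Partitioning and combining are easy to illustrate; a sketch assuming the paper's default hash(key) mod R partitioning function and a word-count combiner (the Python code itself is mine):

import zlib
from collections import Counter

R = 4  # number of reduce tasks

def partition(key):
    # Default-style partitioning: hash(key) mod R.  CRC32 is used here
    # because Python's built-in hash() is salted per process.
    return zlib.crc32(key.encode()) % R

def combine(pairs):
    # Combiner: pre-aggregate map output locally before it crosses the
    # network, e.g. many ("the", 1) pairs collapse into one ("the", n).
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return list(counts.items())

print(partition("fox"))                               # reduce task index for "fox"
print(combine([("the", 1), ("fox", 1), ("the", 1)]))  # [('the', 2), ('fox', 1)]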
Performance (sort benchmark from the paper)
o Normal execution: 891 s
o No backup tasks: 1283 s (44% increase in time; very long tail, stragglers take >300 s to finish)
o 200 processes killed: 933 s (5% increase over normal execution; quick failure recovery)
MapReduce applications at Google
Clustering for Google News and Google Product Search
Google Maps
Locating addresses
Rendering map tiles
Google PageRank
Localized Search
Apache Hadoop MapReduce
Hadoop Distributed File System (HDFS)
Used in:
Yahoo! Search
Facebook
Amazon
Twitter
Google
Amazon Elastic MapReduce
Available to the general public
Process data in the cloud
Higher-level languages/systems based on Hadoop: Pig and Hive
A large variety of problems can be expressed as Map & Reduce computations
The restricted programming model makes it easy to hide the details of distributed computing
MapReduce achieves both scalability and programming efficiency