
Introduction To MapReduce

MapReduce is a programming model for processing large datasets across clusters of computers. It splits the data, distributes the pieces across nodes, and processes them in parallel using user-defined map and reduce functions; the results are then aggregated and written out. It offers a high degree of parallelism and fault tolerance, and it is transparent to programmers, who do not need to deal with the details of the underlying distributed system. Common examples are distributed grep, word count, and sorting of large datasets.


Introduction to Google MapReduce

WING Group Meeting, 13 Oct 2006
Hendra Setiawan

What is MapReduce?
A programming model (and its associated implementation)
For processing large data sets
Exploits a large set of commodity computers
Executes processes in a distributed manner
Offers a high degree of transparency
In other words: simple, and maybe suitable for your tasks!

Distributed Grep
[Diagram: very big data is divided into splits; grep runs on each split in parallel and produces matches; cat concatenates them into all matches.]

Distributed Word Count

[Diagram: very big data is divided into splits; count runs on each split in parallel and produces a count; merge combines them into the merged count.]

Map Reduce
[Diagram: very big data -> MAP -> partitioning function -> REDUCE -> result]

Map:
  Accepts an input key/value pair
  Emits intermediate key/value pairs

Reduce:
  Accepts an intermediate key together with all values for that key (key/value*)
  Emits output key/value pairs

Partitioning Function

Partitioning Function (2)

Default: hash(key) mod R
Guarantee:
  Relatively well-balanced partitions
  Ordering guarantee within partition
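
The default rule above, hash(key) mod R, can be illustrated with a small in-memory sketch. This is not Google's or Hadoop's partitioner; the class name PartitionSketch and the methods partition and assign are hypothetical, and String.hashCode stands in for whatever hash function a real implementation uses. The ordering guarantee within a partition is not modeled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of "hash(key) mod R"; not a real library class.
class PartitionSketch {
    // Pick the reduce task for a key. Math.floorMod keeps the result in
    // [0, numReducers) even when hashCode() is negative.
    static int partition(String key, int numReducers) {
        return Math.floorMod(key.hashCode(), numReducers);
    }

    // Distribute keys over R buckets, one bucket per reduce task.
    static List<List<String>> assign(List<String> keys, int numReducers) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            buckets.get(partition(key, numReducers)).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("apple", "banana", "cherry", "apple", "date");
        List<List<String>> buckets = assign(keys, 3);
        for (int r = 0; r < buckets.size(); r++) {
            System.out.println("reducer " + r + ": " + buckets.get(r));
        }
    }
}
```

Because the partition depends only on the key, every occurrence of the same key lands on the same reduce task, which is what lets a reducer see all values for that key.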

Distributed Sort
Map:
emit(key,value)

Reduce (with R=1):
emit(key,value)
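
Both functions are identities here; the sorting comes from the ordering guarantee within a partition, and R=1 means there is only one partition. A minimal in-memory sketch of that idea follows. The class SortSketch and its sort method are hypothetical names, and a TreeMap is used as a stand-in for the framework's per-partition key ordering, which is an assumption about how to model it rather than how any real implementation works.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process model of the distributed sort slide.
class SortSketch {
    // Each record is a {key, value} pair.
    static List<String> sort(List<String[]> records) {
        // Map: emit(key, value) unchanged. With R = 1 every pair goes to the
        // same partition; the TreeMap models the key ordering inside it.
        TreeMap<String, List<String>> partition = new TreeMap<>();
        for (String[] kv : records) {
            partition.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        // Reduce: emit(key, value) unchanged, visiting keys in sorted order.
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : partition.entrySet()) {
            for (String value : entry.getValue()) {
                output.add(entry.getKey() + "\t" + value);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String[]> records = Arrays.asList(
            new String[] {"pear", "3"},
            new String[] {"apple", "1"},
            new String[] {"mango", "2"});
        for (String line : sort(records)) {
            System.out.println(line);
        }
    }
}
```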

MapReduce
Distributed Grep
Map:
if match(value,pattern) emit(value,1)

Reduce:
emit(key,sum(value*))

Distributed Word Count

Map:
for all w in value do emit(w,1)

Reduce:
emit(key,sum(value*))
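
The grep and word count pseudocode above can be tried out in a single process. The sketch below is only a model of the data flow (map, group by key, reduce), with no distribution or fault tolerance. The class MapReduceSketch and all of its method names are hypothetical, and match(value, pattern) is assumed to mean a plain substring test, which the slides do not specify.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical single-process model of the two examples; not a Hadoop API.
class MapReduceSketch {
    // Word count map: for all w in value do emit(w, 1).
    // Only the keys are returned because every emitted value is 1.
    static List<String> mapWordCount(String value) {
        List<String> keys = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(value);
        while (itr.hasMoreTokens()) {
            keys.add(itr.nextToken());
        }
        return keys;
    }

    // Grep map: if match(value, pattern) emit(value, 1).
    // Assumption: "match" is a substring test.
    static List<String> mapGrep(String value, String pattern) {
        List<String> keys = new ArrayList<>();
        if (value.contains(pattern)) {
            keys.add(value);
        }
        return keys;
    }

    // Group by key and reduce: emit(key, sum(value*)), each value being 1.
    static Map<String, Integer> reduceSum(List<String> emittedKeys) {
        Map<String, Integer> output = new TreeMap<>();
        for (String key : emittedKeys) {
            output.merge(key, 1, Integer::sum);
        }
        return output;
    }

    static Map<String, Integer> wordCount(List<String> lines) {
        List<String> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(mapWordCount(line));
        }
        return reduceSum(emitted);
    }

    static Map<String, Integer> grep(List<String> lines, String pattern) {
        List<String> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(mapGrep(line, pattern));
        }
        return reduceSum(emitted);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or not to be", "to do is to be");
        System.out.println(wordCount(lines));
        System.out.println(grep(lines, "not"));
    }
}
```

Both jobs share the same reduce step, as on the slides; only the map function differs.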

MapReduce Transparencies
Plus Google Distributed File System:
  Parallelization
  Fault-tolerance
  Locality optimization
  Load balancing

Suitable for your task if

You have a cluster
You are working with a large dataset
You are working with independent data (or data assumed to be independent)
The task can be cast into map and reduce

MapReduce outside Google

Hadoop (Java)
  Emulates MapReduce and GFS

The architecture of Hadoop MapReduce and DFS is master/slave:

             Master       Slave
  MapReduce  jobtracker   tasktracker
  DFS        namenode     datanode

Example Word Count (1)

Map
    public static class MapClass extends MapReduceBase implements Mapper {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // Emit (word, 1) for every whitespace-separated token in the line.
        public void map(WritableComparable key, Writable value,
                        OutputCollector output, Reporter reporter)
                throws IOException {
            String line = ((Text) value).toString();
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                output.collect(word, one);
            }
        }
    }

Example Word Count (2)

Reduce
    public static class Reduce extends MapReduceBase implements Reducer {
        // Sum the counts collected for one word and emit (word, total).
        public void reduce(WritableComparable key, Iterator values,
                           OutputCollector output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += ((IntWritable) values.next()).get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

Example Word Count (3)

Main
    public static void main(String[] args) throws IOException {
        // checking goes here
        JobConf conf = new JobConf();
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(MapClass.class);
        // The reducer doubles as a combiner because summing is associative.
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);
        conf.setInputPath(new Path(args[0]));
        conf.setOutputPath(new Path(args[1]));
        JobClient.runJob(conf);
    }

One time setup

Set hadoop-site.xml and slaves
Initiate namenode
Run Hadoop MapReduce and DFS
Upload your data to DFS
Run your process
Download your data from DFS

Summary
A simple programming model for processing large datasets on large clusters of computers
Fun to use: focus on the problem, and let the library deal with the messy details

References
Original paper (http://labs.google.com/papers/mapreduce.html)
On Wikipedia (http://en.wikipedia.org/wiki/MapReduce)
Hadoop MapReduce in Java (http://lucene.apache.org/hadoop/)
Starfish - MapReduce in Ruby (http://rufy.com/starfish/)
