BigData MapReduce

MapReduce is a programming model used to process large datasets across clusters of computers. It works by having a master node divide input data into smaller subproblems and distribute them to worker nodes. Each worker node then processes its subset and returns results to the master node, which combines the results into the final output. Key aspects of MapReduce include mapping functions to divide the work, a partitioning function to group output data, and reduce functions to combine results from each partition.

Uploaded by

arjuncchaudhary

Big Data

Map Reduce

Table of Contents
Key approach to work with Big Data
Mapping
  The Map Step
Applying the Reduce Step
  Reduce Step
Map Reduce Data Flow
A Closer Look at the map and partition Step

Key approach to work with Big Data

• MapReduce is a programming model for processing large data sets, and also the name of an implementation of the model by Google.

• MapReduce is typically used for distributed computing on large datasets across clusters of computers.

[Figure: Map. The Master Node sends problem data to Worker Node 1, Worker Node 2, and Worker Node 3.]

Mapping

The Map Step

• The master node takes the input, divides it into smaller sub-problems, and distributes them to worker nodes.

• A worker node may repeat this division in turn, which can lead to a multi-level tree structure.

• Each worker node processes its sub-problem and hands the result back to its parent node.
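
The map step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `split_input`, `worker_map`, and `run_map_step` are hypothetical, the "workers" are threads on one machine rather than cluster nodes, and the mapping function (word to word length) is an arbitrary stand-in.

```python
from concurrent.futures import ThreadPoolExecutor


def split_input(items, num_workers):
    """Master side: divide the input into roughly equal sub-problems."""
    chunk_size = max(1, -(-len(items) // num_workers))  # ceiling division
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def worker_map(chunk):
    """Worker side: apply the mapping function to every element of one chunk.

    The mapping used here (word -> length) is a hypothetical example.
    """
    return [len(word) for word in chunk]


def run_map_step(items, num_workers=3):
    """Distribute the chunks to workers and collect their partial results."""
    chunks = split_input(items, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map returns results in the same order as the chunks.
        return list(pool.map(worker_map, chunks))


partial_results = run_map_step(["big", "data", "map", "reduce", "hadoop"])
print(partial_results)  # [[3, 4], [3, 6], [6]]
```

Each inner list is the result one worker hands back to its parent; combining them is the job of the reduce step.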

[Figure: Input List -> Mapping Function -> Output List]

Applying the Reduce Step

Reduce Step

The master node then collects the answers from all the child nodes and combines them in a meaningful way to form the primary output, which is the answer to the problem that was put to the system.
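
As a rough sketch of this combining step, assume each child node returns a dictionary of partial word counts. The master can then fold them into one result. The function names `combine` and `reduce_step` and the sample data are hypothetical.

```python
from collections import Counter
from functools import reduce


def combine(total, partial):
    """Merge one child node's partial counts into the running total."""
    merged = Counter(total)
    merged.update(partial)  # adds counts key by key
    return merged


def reduce_step(partial_counts):
    """Master side: fold all partial results into the final output."""
    return dict(reduce(combine, partial_counts, Counter()))


# Hypothetical partial results handed back by three worker nodes.
partials = [{"big": 1, "data": 2}, {"data": 1, "map": 1}, {"big": 1}]
print(reduce_step(partials))  # {'big': 2, 'data': 3, 'map': 1}
```

Because addition is associative, the same `combine` function could also be applied at intermediate levels of a multi-level tree before the results reach the master.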

[Figure: Input List -> Mapping Function -> Output List]

Map Reduce Data Flow

[Figure: MapReduce data flow. Files -> Input Format -> Splits -> RR -> Map -> Partitioner -> (Sort) -> Reduce -> Output Format]

• If we zoom in on each part of the MapReduce framework, we see that it is essentially a large distributed sort. The most important steps, each supplied by the application, are as follows.

• An input function

• A map function

• A partition function

• A compare/sort function

• A reduce function

• An output writer
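
To make the six steps concrete, here is a minimal single-process sketch of a word-count job in which each step is a separate, replaceable function. All function names are hypothetical, and `run_job` is only a stand-in for the framework, not a real MapReduce API.

```python
from collections import defaultdict
from functools import cmp_to_key


def input_reader(text):
    """Input function: yield (line_number, line) key/value pairs."""
    for number, line in enumerate(text.splitlines()):
        yield number, line


def map_fn(key, value):
    """Map function: emit (word, 1) for every word in the line."""
    for word in value.lower().split():
        yield word, 1


def partition_fn(key, num_reducers):
    """Partition function: choose a reducer index for a key.

    Summing code points is an assumed, deliberately simple scheme that gives
    the same answer on every run (unlike the built-in hash() on strings).
    """
    return sum(ord(ch) for ch in key) % num_reducers


def compare_fn(a, b):
    """Compare/sort function: plain ascending order of keys."""
    return (a > b) - (a < b)


def reduce_fn(key, values):
    """Reduce function: sum the counts for one key."""
    yield key, sum(values)


def output_writer(pairs):
    """Output writer: format results (a real job would write to storage)."""
    return [f"{key}\t{value}" for key, value in pairs]


def run_job(text, num_reducers=2):
    """Stand-in for the framework: wires the six functions together."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in input_reader(text):
        for out_key, out_value in map_fn(key, value):
            index = partition_fn(out_key, num_reducers)
            partitions[index][out_key].append(out_value)
    results = []
    for partition in partitions:
        for key in sorted(partition, key=cmp_to_key(compare_fn)):
            results.extend(reduce_fn(key, partition[key]))
    return output_writer(results)


for line in run_job("Big data\nbig map"):
    print(line)
```

The order of the printed lines depends on which reducer each key is assigned to; within one reducer, keys appear in the order defined by `compare_fn`.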

A Closer Look at the map and partition Step

• The map function takes a series of key/value pairs and processes each one, producing zero or more output key/value pairs.

• Each map output is assigned to a particular reducer by the application's partition function for sharding purposes.

• The partition function is given the key and the number of reducers, and returns the index of the desired reducer.

• The input for each reducer is pulled from the machine where the map ran and is sorted using the application's comparison function.

• The framework calls the application's reduce function once for each unique key, in sorted order. The reduce function can iterate through the values that are associated with that key and produce zero or more outputs.

• The output writer writes the output of the reduce step to stable storage, usually a distributed file system.
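
The partition, sort, and reduce behaviour described above can be sketched as follows. This is an illustration under assumptions: `zlib.crc32` is used only as a convenient hash that is stable across machines and runs, the comparison function is plain ascending key order, and the reducer's rule (emit only keys seen at least twice) is a hypothetical example of a reduce that may produce zero outputs.

```python
import zlib
from itertools import groupby


def partition_fn(key, num_reducers):
    """Given a key and the number of reducers, return the reducer's index."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(map_outputs, num_reducers):
    """Route every (key, value) pair from every map node to its reducer."""
    reducer_inputs = [[] for _ in range(num_reducers)]
    for node_output in map_outputs:
        for key, value in node_output:
            reducer_inputs[partition_fn(key, num_reducers)].append((key, value))
    return reducer_inputs


def reduce_fn(key, values):
    """Called once per unique key; may emit zero or more outputs."""
    total = sum(values)
    if total >= 2:  # hypothetical rule: drop keys that occur only once
        yield key, total


def run_reducer(pairs):
    """Sort one reducer's input by key, then reduce each group of values."""
    ordered = sorted(pairs, key=lambda pair: pair[0])
    outputs = []
    for key, group in groupby(ordered, key=lambda pair: pair[0]):
        outputs.extend(reduce_fn(key, (value for _, value in group)))
    return outputs


# Hypothetical output of two map nodes.
map_outputs = [[("big", 1), ("data", 1)], [("big", 1), ("map", 1)]]
for index, pairs in enumerate(shuffle(map_outputs, num_reducers=2)):
    print(index, run_reducer(pairs))
```

Because the partition function depends only on the key, every pair for a given key reaches the same reducer, which is what allows `reduce_fn` to see all of that key's values in one call.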
