# load Hadoop module
--------------------
module load Hadoop/2.6.0-cdh5.8.0-native
# find out where Hadoop is installed (variable $HADOOP_HOME)
echo $HADOOP_HOME
#/opt/apps/software/Hadoop/2.6.0-cdh5.8.0-native/share/hadoop/mapreduce
# find the streaming library
find /opt/apps/software/Hadoop/2.6.0-cdh5.8.0-native -name "hadoop-streaming*jar"
# . . .
#/opt/apps/software/Hadoop/2.6.0-cdh5.8.0-native/share/hadoop/tools/lib/hadoop-streaming-2.6.0-cdh5.8.0.jar
# save library in the variable $STREAMING
export STREAMING=/opt/apps/software/Hadoop/2.6.0-cdh5.8.0-native/share/hadoop/tools/lib/hadoop-streaming-2.6.0-cdh5.8.0.jar
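# sanity check: the jar should exist
ls -l $STREAMING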
# start a simple MapReduce job
#-----------------------------
# Simple job
############
# remove the output directory if it already exists (the job fails if it does)
hdfs dfs -rm -r output
# copy the file to HDFS
hdfs dfs -put wiki_1k_lines
# launch MapReduce job
hadoop jar $STREAMING \
-input wiki_1k_lines \
-output output \
-mapper /bin/cat \
-reducer '/bin/wc -l'
# check if job was successful (output should contain a file named _SUCCESS)
hdfs dfs -ls output
# check the result (the line count produced by wc -l)
hdfs dfs -cat output/part-00000
# Simple job with 4 mappers
###########################
hdfs dfs -rm -r output
# launch MapReduce job
hadoop jar $STREAMING \
-D mapreduce.job.maps=4 \
-input wiki_1k_lines \
-output output \
-mapper /bin/cat \
-reducer '/bin/wc -l'
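# note: mapreduce.job.maps is only a hint to the framework; the actual
# number of map tasks also depends on the number of input splits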
# Wordcount with MapReduce
##########################
# use mapper.py and reducer.py
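# (assumed sketch) if mapper.py and reducer.py are not at hand, the following
# minimal versions implement the classic streaming word count; the scripts
# shipped with the course material may differ:
cat > mapper.py <<'EOF'
#!/usr/bin/env python
# emit "word<TAB>1" for every word on stdin
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print('%s\t%d' % (word, 1))
EOF
cat > reducer.py <<'EOF'
#!/usr/bin/env python
# sum the counts per word; assumes the input is sorted by key,
# which the MapReduce shuffle phase guarantees
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip('\n').split('\t', 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print('%s\t%d' % (current_word, current_count))
        current_word, current_count = word, int(count)
if current_word is not None:
    print('%s\t%d' % (current_word, current_count))
EOF
chmod +x mapper.py reducer.py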
# mini-test of mapper and reducer
echo "carrot carrot apple carrot" | ./mapper.py | sort -k1 | ./reducer.py
# run wordcount job
# upload the file to HDFS (skip if it was already uploaded above)
hdfs dfs -put data/wiki_1k_lines
# remove output directory
hdfs dfs -rm -r output
hadoop jar $STREAMING \
-files mapper.py,reducer.py \
-mapper mapper.py \
-reducer reducer.py \
-input wiki_1k_lines \
-output output
# check if output contains _SUCCESS
hdfs dfs -ls output
# check result
hdfs dfs -cat output/part-00000 | head
# sort output by frequency
hdfs dfs -cat output/part-00000 | sort -k2nr | head
# use swap_keyval.py to swap each line from "word<TAB>count" to "count<TAB>word"
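# (assumed sketch) a minimal swap_keyval.py, in case it is not provided:
cat > swap_keyval.py <<'EOF'
#!/usr/bin/env python
# swap "word<TAB>count" into "count<TAB>word"
import sys

for line in sys.stdin:
    key, value = line.rstrip('\n').split('\t', 1)
    print('%s\t%s' % (value, key))
EOF
chmod +x swap_keyval.py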
# remove output2 if it already exists (not needed on the first run)
hdfs dfs -rm -r output2
hadoop jar $STREAMING \
-files swap_keyval.py \
-input output \
-output output2 \
-mapper swap_keyval.py
# check if output2 contains _SUCCESS
hdfs dfs -ls output2
# check result
hdfs dfs -cat output2/part-00000 | head
# 10021 his
# 1005 per
# 101 merely
# . . .
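# note: the keys above are sorted as plain text ("10021" < "1005" < "101");
# to sort numerically and in descending order, use KeyFieldBasedComparator
# with the option -nr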
hdfs dfs -rm -r output2
comparator_class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator
hadoop jar $STREAMING \
-D mapreduce.job.output.key.comparator.class=$comparator_class \
-D mapreduce.partition.keycomparator.options=-nr \
-files swap_keyval.py \
-input output \
-output output2 \
-mapper swap_keyval.py
hdfs dfs -cat output2/part-00000 | head
# 193778 the
# 117170 of
# 89966 and
# 69186 in
# Run MapReduce examples
########################
# list all examples
hadoop jar $HADOOP_HOME/hadoop-mapreduce-examples-2.6.0-cdh5.8.0.jar
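# e.g. estimate pi with 4 map tasks and 1000 samples per map
hadoop jar $HADOOP_HOME/hadoop-mapreduce-examples-2.6.0-cdh5.8.0.jar pi 4 1000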