Big Data Lab Guide for CS Students
Big Data Lab Guide for CS Students
EXPERIMENT FILE
B.TECH
(IV YEAR – VII SEM)
(2022‐2023)
DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING
Course Objectives:
1. Get familiar with Hadoop distributions, configuring Hadoop and performing File
management tasks
2. Experiment MapReduce in Hadoop frameworks
3. Implement MapReduce programs in variety applications
4. Explore MapReduce support for debugging
5. Understand different approaches for building Hadoop MapReduce programs for real-time
applications
Experiments:
2. Develop a MapReduce program to calculate the frequency of a given word in agiven file.
6. Develop a MapReduce to find the maximum electrical consumption in each year given
electrical consumption for each month in each year.
7. Develop a MapReduce to analyze weather data set and print whether the day is shinny or
cool day.
8. Develop a MapReduce program to find the number of products sold in each country by
considering sales data containing fields like
The data is coming in log files and looks like as shown below.
UserId | TrackId | Shared | Radio | Skip
111115 | 222 | 0 | 1 | 0
111113 | 225 | 1 | 0 | 0
111117 | 223 | 0 | 1 | 1
111115 | 225 | 1 | 0 | 0
11. Develop a MapReduce program to find the frequency of books published eachyear and find
in which year maximum number of books were published usingthe following data.
Title Author Published Author Language No of pages
year country
12. Develop a MapReduce program to analyze Titanic ship data and to find the average age of
the people (both male and female) who died in the tragedy. How many persons are survived
in each class.
The titanic data will be..
Column 1 :PassengerI d Column 2 : Survived (survived=0
&died=1)
Column 3 :Pclass Column 4 : Name
Column 5 : Sex Column 6 : Age
Column 7 :SibSp Column 8 :Parch
Column 9 : Ticket Column 10 : Fare
Column 11 :Cabin Column 12 : Embarked
13. Develop a MapReduce program to analyze Uber data set to find the days on which each
basement has more trips using the following dataset.
16. Develop a Java application to find the maximum temperature using Spark. Text Books:
1. Tom White, “Hadoop: The Definitive Guide” Fourth Edition, O’reilly Media, 2015.
Reference Books:
1. Glenn J. Myatt, Making Sense of Data , John Wiley & Sons, 2007 Pete Warden, Big Data
Glossary, O’Reilly, 2011.
2. Michael Berthold, David J.Hand, Intelligent Data Analysis, Spingers, 2007.
3. Chris Eaton, Dirk DeRoos, Tom Deutsch, George Lapis, Paul Zikopoulos, Uderstanding Big
Data : Analytics for Enterprise Class Hadoop and Streaming Data, McGrawHill Publishing,
2012.
4. AnandRajaraman and Jeffrey David UIIman, Mining of Massive Datasets Cambridge
University Press, 2012.
Course Outcomes:
EXP NO: 1
Install Apache Hadoop
Date:
AIM: To Install Apache Hadoop.
Hadoop is a Java-based programming framework that supports the processing and storage of
extremely large datasets on a cluster of inexpensive machines. It was the first major open
source project in the big data playing field and is sponsored by the Apache Software
Foundation.
Hadoop Common is the collection of utilities and libraries that support other Hadoop
modules.
HDFS, which stands for Hadoop Distributed File System, is responsible for persisting
data to disk.
YARN, short for Yet Another Resource Negotiator, is the "operating system" for HDFS.
MapReduce is the original processing model for Hadoop clusters. It distributes work
within the cluster or map, then organizes and reduces the results from the nodes into a
response to a query. Many other processing models are available for the 2.x version of
Hadoop.
Hadoop clusters are relatively complex to set up, so the project includes a stand-alone mode
which is suitable for learning about Hadoop, performing simple operations, and debugging.
Procedure:
we'll install Hadoop in stand-alone mode and run one of the example example MapReduce
programs it includes to verify the installation.
Prerequisites:
Department of CSE
BIG DATA LABORATORY
Department of CSE
BIG DATA LABORATORY
2
Procedure to Run Hadoop
If Apache Hadoop 2.2.0 is not already installed then follow the post Build, Install,
Configure and Run Apache Hadoop 2.2.0 in Microsoft Windows OS.
2. Start HDFS (Namenode and Datanode) and YARN (Resource Manager and Node
Manager)
Namenode, Datanode, Resource Manager and Node Manager will be started in few
minutes and ready to execute Hadoop MapReduce job in the Single Node
(pseudo-distributed mode) cluster.
Department of CSE
BIG DATA LABORATORY
Department of CSE
BIG DATA LABORATORY
Create a text file with some content. We'll pass this file as input to the
wordcount MapReduce job for counting words.
C:\file1.txt
Install Hadoop
Create a directory (say 'input') in HDFS to keep all the text files (say 'file1.txt') to be used for counting
words.
C:\Users\abhijitg>cd c:\hadoop
C:\hadoop>bin\hdfs dfs -mkdir input
Copy the text file(say 'file1.txt') from local disk to the newly created 'input' directory in HDFS.
Department of CSE
BIG DATA LABORATORY
Department of CSE
BIG DATA LABORATORY
Department of CSE
BIG DATA LABORATORY
Result: We've installed Hadoop in stand-alone mode and verified it by running an example
program it provided.
EXP NO: 2
MapReduce program to calculate the frequency
Date:
AIM: To Develop a MapReduce program to calculate the frequency of a given word in agiven
file Map Function – It takes a set of data and converts it into another set of data, where
individual elements are broken down into tuples (Key-Value pair).
Input
Set of data
Bus, Car, bus, car, train, car, bus, car, train, bus, TRAIN,BUS, buS, caR, CAR, car, BUS, TRAIN
Output
Convert into another set of data
(Key,Value)
(Bus,1), (Car,1), (bus,1), (car,1), (train,1), (car,1), (bus,1), (car,1), (train,1), (bus,1),
8
Department of CSE
BIG DATA LABORATORY
Department of CSE
BIG DATA LABORATORY
1. Splitting – The splitting parameter can be anything, e.g. splitting by space, comma,
semicolon, or even by a new line (‘\n’).
2. Mapping – as explained above
3. Intermediate splitting – the entire process in parallel on different clusters. In order to
group them in “Reduce Phase” the similar KEY data should be on same cluster.
4. Reduce – it is nothing but mostly group by phase
5. Combining – The last phase where all the data (individual result set from each cluster) is
combine together to form a Result
Make sure that Hadoop is installed on your system with java idk
Steps to follow
Step 1. Open Eclipse> File > New > Java Project > (Name it – MRProgramsDemo) >
Finish
Step 2. Right Click > New > Package ( Name it - PackageDemo) > Finish
Step 3. Right Click on Package > New > Class (Name it - WordCount)
Step 4. Add Following Reference Libraries –
Right Click on Project > Build Path> Add External Archivals
• /usr/lib/hadoop-0.20/hadoop-core.jar
• Usr/lib/hadoop-0.20/lib/Commons-cli-1.2.jar
10
Department of CSE
BIG DATA LABORATORY
Department of CSE
BIG DATA LABORATORY
{
sum += value.get();
}
con.write(word, new IntWritable(sum));
}
}
}
12
Department of CSE
BIG DATA LABORATORY
To Move this into Hadoop directly, open the terminal and enter the following commands:
13
Department of CSE
BIG DATA LABORATORY
Description: MapReduce is a programming model designed for processing large volumes of data
in parallel by dividing the work into a set of independent tasks.Our previous traversal has given
an introduction about MapReduce This traversal explains how to design a MapReduce program.
The aim of the program is to find the Maximum temperature recorded for each year of NCDC
data. The input for our program is weather data files for each year This weather data is collected
by National Climatic Data Center – NCDC from weather sensors at all over the world. You can
find weather data for each year from ftp://ftp.ncdc.noaa.gov/pub/data/noaa/.All files are zipped
by year and the weather station. For each year, there are multiple files for different weather
stations. Here is an example for 1990 (ftp://ftp.ncdc.noaa.gov/pub/data/noaa/1901/).
• 010080-99999-1990.gz
• 010100-99999-1990.gz
• 010150-99999-1990.gz
14
Department of CSE
BIG DATA LABORATORY
• …………………………………
MapReduce is based on set of key value pairs. So first we have to decide on the types for the
key/value pairs for the input.
Map Phase: The input for Map phase is set of weather data files as shown in snap shot. The
types of input key value pairs are LongWritable and Text and the types of output key value pairs
are Text and IntWritable. Each Map task extracts the temperature data from the given year file.
The output of the map phase is set of key value pairs. Set of keys are the years. Values are the
temperature of each year.
Reduce Phase: Reduce phase takes all the values associated with a particular key. That is all the
temperature values belong to a particular year is fed to a same reducer. Then each reducer finds
the highest recorded temperature for each year. The types of output key value pairs in Map phase
is same for the types of input key value pairs in reduce phase (Text and IntWritable). The types
of output key value pairs in reduce phase is too Text and IntWritable. So, in this example we
write three java classes:
• HighestMapper.java
• HighestReducer.java
• HighestDriver.java
Program: HighestMapper.java
import java.io.IOException; import
org.apache.hadoop.io.*; import
org.apache.hadoop.mapred.*;
public class HighestMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text,
IntWritable>
{ public static final int MISSING = 9999;
public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter
reporter) throws IOException
{
String line = value.toString();
String year =
line.substring(15,19); int
temperature; if
(line.charAt(87)=='+')
temperature = Integer.parseInt(line.substring(88, 92));
else
15
Department of CSE
BIG DATA LABORATORY
HighestReducer.java
import java.io.IOException; import
java.util.Iterator; import
org.apache.hadoop.io.*; import
org.apache.hadoop.mapred.*;
public class HighestReducer extends MapReduceBase implements Reducer<Text, IntWritable,
Text, IntWritable>
{
public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable>
output, Reporter reporter) throws IOException
{
int max_temp = 0;
;
while (values.hasNext())
{
int current=values.next().get();
if ( max_temp < current)
max_temp = current;
}
output.collect(key, new
IntWritable(max_temp/10)); }
HighestDriver.java
import org.apache.hadoop.fs.Path; import
org.apache.hadoop.conf.*; import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*; import
org.apache.hadoop.util.*; public class HighestDriver
extends Configured implements
Tool{public int run(String[] args) throws Exception
{
16
Department of CSE
BIG DATA LABORATORY
Output:
17
Department of CSE
BIG DATA LABORATORY
Result:
17
EXP NO: 4
MapReduce program to find the grades of student’s
Date:
AIM: To Develop a MapReduce program to find the grades of student’s.
import java.util.Scanner;
public class JavaExample
{ public static void main(String args[])
{
/* This program assumes that the student has 6 subjects,
* thats why I have created the array of size 6. You can
* change this as per the requirement.
*/ int marks[] = new
int[6]; int i; float
total=0, avg;
Scanner scanner = new Scanner(System.in);
for(i=0; i<6; i++) {
System.out.print("Enter Marks of Subject"+
(i+1)+":"); marks[i] = scanner.nextInt(); total = total +
marks[i];
}
scanner.close();
//Calculating average
here avg = total/6;
System.out.print("The student Grade is: ");
if(avg>=80)
{
System.out.print("A");
}
else if(avg>=60 && avg<80)
{
System.out.print("B");
}
else if(avg>=40 && avg<60)
{
Department of CSE
BIG DATA LABORATORY
System.out.print("C");
}
else
{
System.out.print("D");
}
}
}
Expected Output:
Result:
EXP NO: 5
MapReduce program to implement Matrix Multiplication
Date:
19
Department of CSE
BIG DATA LABORATORY
20
Department of CSE
BIG DATA LABORATORY
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path; import
org.apache.hadoop.io.DoubleWritable; import
org.apache.hadoop.io.IntWritable; import
org.apache.hadoop.io.Text; import
org.apache.hadoop.io.Writable; import
org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.*;
import
21
Department of CSE
BIG DATA LABORATORY
org.apache.hadoop.mapreduce.lib.output.*;
import org.apache.hadoop.util.ReflectionUtils;
Pair() { i =
0;
22
Department of CSE
BIG DATA LABORATORY
j = 0;
}
Pair(int i, int j)
{ this.i = i;
this.j = j;
}
@Override
public void readFields(DataInput input) throws IOException
{i = input.readInt();
j = input.readInt();
}
@Override
public void write(DataOutput output) throws IOException
{output.writeInt(i);
output.writeInt(j);
}
@Override
public int compareTo(Pair compare)
{if (i > compare.i)
{ return 1;
} else if ( i < compare.i)
{return -1;
} else { if(j > compare.j)
{ return 1;
} else if (j < compare.j)
{return -1;
}
} return
0;
}
public String toString()
{ return i + " " + j + "
";
}
}
public class Multiply { public static class MatriceMapperM extends
Mapper<Object,Text,IntWritable,Element> {
23
@Override
Department of CSE
BIG DATA LABORATORY
24
Department of CSE
BIG DATA LABORATORY
if (tempElement.tag == 0)
{ M.add(tempEleme
n
t);
} else if(tempElement.tag == 1)
{N.add(tempElement);
}
}
for(int i=0;i<M.size();i++) { for(int
j=0;j<N.size();j++) {
context.write(p, new
DoubleWritable(multiplyOutput)); }
}
}
}
public static class MapMxN extends Mapper<Object, Text, Pair, DoubleWritable>
{@Override
public void map(Object key, Text value, Context context)
throws IOException, InterruptedException {
String readLine = value.toString();
String[] pairValue = readLine.split(" ");
Pair p = new
Pair(Integer.parseInt(pairValue[0]),Integer.parseInt(pairValue[1]));
DoubleWritable val = new
DoubleWritable(Double.parseDouble(pairValue[2]));
context.write(p, val);
}
}
public static class ReduceMxN extends Reducer<Pair, DoubleWritable, Pair,
DoubleWritable>
{ @Override
public void reduce(Pair key, Iterable<DoubleWritable> values, Context context)
throws IOException, InterruptedException {
double sum = 0.0;
for(DoubleWritable value : values) {
25
Department of CSE
BIG DATA LABORATORY
sum += value.get();
}
context.write(key, new
DoubleWritable(sum)); }
}
public static void main(String[] args) throws Exception
{Job job = Job.getInstance();
job.setJobName("MapIntermediate");
job.setJarByClass(Project1.class);
MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class,
MatriceMapperM.class);
MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class,
MatriceMapperN.class); job.setReducerClass(ReducerMxN.class);
job.setMapOutputKeyClass(IntWritable.class);
job.setMapOutputValueClass(Element.class);
job.setOutputKeyClass(Pair.class);
job.setOutputValueClass(DoubleWritable.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileOutputFormat.setOutputPath(job, new Path(args[2]));
job.waitForCompletion(true); Job
job2 = Job.getInstance();
job2.setJobName("MapFinalOutput")
; job2.setJarByClass(Project1.class);
job2.setMapperClass(MapMxN.class);
job2.setReducerClass(ReduceMxN.class);
job2.setMapOutputKeyClass(Pair.class);
job2.setMapOutputValueClass(DoubleWritable.class);
job2.setOutputKeyClass(Pair.class);
job2.setOutputValueClass(DoubleWritable.class);
job2.setInputFormatClass(TextInputFormat.class);
job2.setOutputFormatClass(TextOutputFormat.class);
26
Department of CSE
BIG DATA LABORATORY
job2.waitForCompletion(true);
}
}
#!/bin/bash rm -rf
multiply.jar classes
module load
hadoop/2.6.0
echo "end"
stop-yarn.sh
stop-dfs.sh myhadoop-
cleanup.sh Expected Output:
27
Department of CSE
BIG DATA LABORATORY
Actual Output:
Result:
EXP NO: 6 MapReduce to find the maximum electrical consumption in
Date: each year
AIM: To Develop a MapReduce to find the maximum electrical consumption in each year
given electrical consumption for each month in each year.
28
Department of CSE
BIG DATA LABORATORY
Given below is the data regarding the electrical consumption of an organization. It contains the
monthly electrical consumption and the annual average for various years.
If the above data is given as input, we have to write applications to process it and produce
results such as finding the year of maximum usage, year of minimum usage, and so on. This is a
walkover for the programmers with finite number of records. They will simply write the logic
to produce the required output, and pass the data to the application written.
But, think of the data representing the electrical consumption of all the largescale industries of a
particular state, since its formation.
Input Data
The above data is saved as sample.txt and given as input. The input file looks as shown below.
1979 23 23 2 43 24 25 26 26 26 26 25 26 25
1980 26 27 28 28 28 30 31 31 31 30 30 30 29
1981 31 32 32 32 33 34 35 36 36 34 34 34 34
1984 39 38 39 39 39 41 42 43 40 39 38 38 40
1985 38 39 39 39 39 41 41 41 00 40 39 39 45
Source code:
import java.util.*; import
java.io.IOException; import
java.io.IOException; import
org.apache.hadoop.fs.Path; import
org.apache.hadoop.conf.*; import
org.apache.hadoop.io.*; import
org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;
public class ProcessUnits
{
//Mapper class
29
Department of CSE
BIG DATA LABORATORY
30
Department of CSE
BIG DATA LABORATORY
{
JobConf conf = new JobConf(ProcessUnits.class);
conf.setJobName("max_eletricityunits"); conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
conf.setMapperClass(E_EMapper.class);
conf.setCombinerClass(E_EReduce.class);
conf.setReducerClass(E_EReduce.class);
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
JobClient.runJob(conf);
}
Expected OUTPUT:
Input:
Kolkata,56
Jaipur,45
Delhi,43
Mumbai,34
Goa,45
Kolkata,35
Jaipur,34
Delhi,32
Output:
Kolkata 56
Jaipur 45
Delhi 43
Mumbai 34
Actual Output:
Result:
EXP NO: 7 MapReduce to analyze weather data set and print whether the
Date: day is shinny or cool
AIM: To Develop a MapReduce to analyze weather data set and print whether the day is shinny
or cool day.
31
Department of CSE
BIG DATA LABORATORY
NCDC provides access to daily data from the U.S. Climate Reference Network /
U.S. Regional Climate Reference Network (USCRN/USRCRN) via anonymous ftp at:
Dataset ftp://ftp.ncdc.noaa.gov/pub/data/uscrn/products/daily01
After going through wordcount mapreduce guide, you now have the basic idea of
how a mapreduce program works. So, let us see a complex mapreduce program
on weather dataset. Here I am using one of the dataset of year 2015 of Austin,
Texas . We will do analytics on the dataset and classify whether it was a hot day or
a cold day depending on the temperature recorded by NCDC.
NCDC gives us all the weather data we need for this
32
Department of CSE
BIG DATA LABORATORY
ftp://ftp.ncdc.noaa.gov/pub/data/uscrn/products/daily01/2015/CRND
0103-2015- TX_Austin_33_NW.txt
Step 1
https://drive.google.com/file/d/0B2SFMPvhXPQ5bUdoVFZsQjE2ZDA/view?
usp=sharing
import java.io.IOException; import
java.util.Iterator; import
org.apache.hadoop.fs.Path; import
org.apache.hadoop.io.LongWritable; import
org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; import
org.apache.hadoop.mapreduce.lib.output.TextOutputFormat; import
org.apache.hadoop.mapreduce.lib.input.TextInputFormat; import
org.apache.hadoop.mapreduce.Job; import
org.apache.hadoop.mapreduce.Mapper; import
org.apache.hadoop.mapreduce.Reducer; import
org.apache.hadoop.conf.Configuration; public class MyMaxMin {
33
Department of CSE
BIG DATA LABORATORY
//date
Department of CSE
BIG DATA LABORATORY
new Text(String.valueOf(temp_Min)));
}
}
}
}
//Reducer
*MaxTemperatureReducer class is static and extends Reducer abstract having
*/
public void reduce (Text Key, Iterator<Text> Values, Context context) throws
IOException, Interrupted Exception { String
temperature = Values.next().toString();
context.write(Key, new Text(temperature));
}
}
public static void main(String[] args) throws Exception
{Configuration conf = new Configuration();
Job job = new Job(conf, "weather example");
job.setJarByClass(MyMaxMin.class);
job.setMapOutputKeyClass(Text.class);
35
Department of CSE
BIG DATA LABORATORY
job.setMapOutputValueClass(Text.class);
job.setMapperClass(MaxTemperatureMapper.class);
job.setReducerClass(MaxTemperatureReducer.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
OutputPath.getFileSystem(conf).delete(OutputPath);
System.exit(job.waitForCompletion(true) ? 0 : 1);
Import the project in eclipse IDE in the same way it was told in earlier guide and
change the jar paths with the jar files present in the lib directory of this project.
When the project is not having any error, we will export it as a jar file, same as we
did in wordcount mapreduce guide. Right Click on the Project file and click on
Export. Select jar file.
36
Department of CSE
BIG DATA LABORATORY
37
Department of CSE
BIG DATA LABORATORY
38
Department of CSE
BIG DATA LABORATORY
39
Department of CSE
BIG DATA LABORATORY
You can download the jar file directly using below link temperature.jar
https://drive.google.com/file/d/0B2SFMPvhXPQ5RUlZZDZSR3FYVDA/view?us
p=sharing
link weather_data.txt
https://drive.google.com/file/d/0B2SFMPvhXPQ5aFVILXAxbFh6ejA/view?usp=s
haring
OUTPUT:
40
Department of CSE
BIG DATA LABORATORY
Result:
41
Department of CSE
BIG DATA LABORATORY
Source code:
public class Driver extends Configured implements Tool
{enum Counters { DISCARDED_ENTRY
}
public static void main(String[] args) throws Exception { ToolRunner.run(new Driver(), args);
} public int run(String[] args) throws Exception { Configuration configuration =
getConf();
job.setMapperClass(Mapper.class); job.setMapOutputKeyClass(LongWritable.class);
job.setMapOutputValueClass(Text.class);
job.setCombinerClass(Combiner.class); job.setReducerClass(Reducer.class);
job.setOutputKeyClass(LongWritable.class); job.setOutputValueClass(Text.class);
protected void
map( LongWritabl
e key,
Text value,
42
Department of CSE
BIG DATA LABORATORY
org.apache.hadoop.mapreduce.Mapper<
43
LongWritable,
Text,
LongWritable,
Text
>.Context context
.substring(values.get(2).length() - 4);
* 60 + Integer.parseInt(time.substring(2,4));
LongWritable(Integer.parseInt(year)), new
Text(Integer.toString(minutes) + ":1")
);
} else
{
Department of CSE
BIG DATA LABORATORY
context.getCounter(Driver.Counters.DISCARDED_ENTRY).increment(1);
}} protected boolean
isValid(ArrayList<String> values)
Department of CSE
BIG DATA LABORATORY
45
Department of CSE
BIG DATA LABORATORY
Department of CSE
BIG DATA LABORATORY
Expected Output:
Actual Output:
47
EXP NO: 9 MapReduce program to find the tags associated with each
Date: movie by analyzing movie lens data
Department of CSE
BIG DATA LABORATORY
AIM: To Develop a MapReduce program to find the tags associated with each movie by
analyzing movie lens data.
For this analysis the Microsoft R Open distribution was used. The reason for this was its
multithreaded performance as described here. Most of the packages that were used come from
the tidyverse - a collection of packages that share common philosophies of tidy data. The tidytext
and wordcloud packages were used for some text processing. Finally, the doMC package was
used to embrace the multithreading in some of the custom functions which will be described
later. doMC package is not available on Windows. Use doParallel package instead. Driver1.java
package KPI_1;
Department of CSE
BIG DATA LABORATORY
//use MultipleOutputs and specify different Record class and Input formats
MultipleInputs.addInputPath(job, firstPath, TextInputFormat.class,
movieDataMapper.class);
MultipleInputs.addInputPath(job, sencondPath, TextInputFormat.class,
ratingDataMapper.class); //set
Reducer class
job.setReducerClass(dataReducer.class);
FileOutputFormat.setOutputPath(job, outputPath_1);
job.waitForCompletion(true)
Job job1 = Job.getInstance(conf, "Most Viewed Movies2");
job1.setJarByClass(Driver1.class);
job1.setMapperClass(topTenMapper.class);
job1.setReducerClass(topTenReducer.class);
job1.setMapOutputKeyClass(Text.class);
job1.setMapOutputValueClass(LongWritable.class);
job1.setOutputKeyClass(LongWritable.class);
job1.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job1, outputPath_1);
FileOutputFormat.setOutputPath(job1, outputPath_2);
job1.waitForCompletion(true);
}
}
50
Department of CSE