Matrix Multiplication using Hadoop MapReduce
Step 1: Install Hadoop in Stand-Alone Mode
Step 2: Matrix Multiplication Using MapReduce Programming
1.1 Installing Java
Check the existing Java version by running the command:
java -version
1.2 Create hadoop home directory
We will use hadoop-3.1.2.tar.gz here.
Extract the Hadoop archive using the following command:
tar -xzvf hadoop-3.1.2.tar.gz
Move hadoop to /usr/local
sudo mv hadoop-3.1.2 /usr/local/hadoop
1.3 Configuring Hadoop's JAVA_HOME
Hadoop requires that you set the path to Java, either as an environment variable or
in the Hadoop configuration file.
The path to Java, /usr/bin/java, is a symlink to /etc/alternatives/java, which is in
turn a symlink to the default Java binary. We will use readlink with the -f flag to
follow every symlink in every part of the path, recursively. Then, we'll use sed to
trim bin/java from the output to give us the correct value for JAVA_HOME.
To find the default Java path:
readlink -f /usr/bin/java | sed "s:bin/java::"
Output :
/usr/lib/jvm/java-11-openjdk-amd64/
Use readlink to set the value dynamically. Open the Hadoop environment file:
sudo nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Add this line to set JAVA_HOME:
export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
1.4 Running Hadoop
Now we should be able to run Hadoop:
/usr/local/hadoop/bin/hadoop
The output is Hadoop's usage help, which means we've successfully configured Hadoop
to run in stand-alone mode.
We'll ensure that it is functioning properly by running the example MapReduce
program it ships with. To do so, create a directory called input in our home
directory and copy Hadoop's configuration files into it to use those files as our
data.
mkdir ~/input
cp /usr/local/hadoop/etc/hadoop/*.xml ~/input
Next, we can use the following command to run the MapReduce hadoop-mapreduce-examples
program, a Java archive with several options. We'll invoke its grep program, one of many
examples included in hadoop-mapreduce-examples, followed by the input directory, input and
the output directory grep_example. The MapReduce grep program will count the matches of a
literal word or regular expression. Finally, we'll supply a regular expression to find
occurrences of the word principal within or at the end of a declarative sentence. The
expression is case-sensitive, so we wouldn't find the word if it were capitalized at the
beginning of a sentence:
/usr/local/hadoop/bin/hadoop jar \
  /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar grep \
  ~/input ~/grep_example 'principal[.]*'
When the task completes, it provides a summary of what has been processed and any errors it
has encountered, but this summary doesn't contain the actual results.
Results are stored in the output directory and can be checked by running cat on the output
directory:
cat ~/grep_example/*
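To make the matching rule concrete, here is a minimal sketch in plain Java (no Hadoop required) that applies the same regular expression to a sample sentence. The class name GrepCountSketch and the sample text are made up for illustration. The sketch assumes the grep example reports a count per distinct matched string, which is how its output is usually read; it is not the source of the Hadoop program itself.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of hadoop-mapreduce-examples.
class GrepCountSketch {

    // Count how often each distinct matched string occurs in the text.
    static Map<String, Integer> countMatches(String regex, String text) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            counts.merge(matcher.group(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample text: one capitalized use, two mid-sentence uses,
        // and one use at the end of a sentence.
        String text = "Principal names vary. Each principal has a key, "
                    + "and a principal may own a principal.";
        System.out.println(countMatches("principal[.]*", text));
        // prints {principal=2, principal.=1}
    }
}
```

The capitalized "Principal" is not counted because the expression is case-sensitive, and the trailing [.]* lets a match absorb a period that ends a sentence.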
Step 2: Matrix Multiplication Using MapReduce Programming
2.1. In mathematics, matrix multiplication or the matrix product is a binary operation that
produces a matrix from two matrices. The definition is motivated by linear equations and linear
transformations on vectors, which have numerous applications in applied mathematics, physics, and
engineering. In more detail, if A is an n × m matrix and B is an m × p matrix, their matrix product
AB is an n × p matrix, in which the m entries across a row of A are multiplied with the m entries
down a column of B and summed to produce an entry of AB. When two linear transformations are
represented by matrices, then the matrix product represents the composition of the two
transformations.
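Written as a formula, the definition above reads as follows. The entry names a_ij and b_jk and the 2 × 2 example values are chosen here purely for illustration.

```latex
% Entry (i,k) of the product of an n x m matrix A and an m x p matrix B
(AB)_{ik} = \sum_{j=1}^{m} a_{ij}\, b_{jk},
\qquad 1 \le i \le n,\quad 1 \le k \le p

% Worked 2 x 2 example (illustrative values)
\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}
\begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix}
=
\begin{pmatrix}
1\cdot 5 + 2\cdot 7 & 1\cdot 6 + 2\cdot 8 \\
3\cdot 5 + 4\cdot 7 & 3\cdot 6 + 4\cdot 8
\end{pmatrix}
=
\begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix}
```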
Algorithm for Map Function.
a. for each element mij of M do
produce (key, value) pairs ((i,k), (M,j,mij)) for k = 1, 2, 3, ... up to the number of
columns of N
b. for each element njk of N do
produce (key, value) pairs ((i,k), (N,j,njk)) for i = 1, 2, 3, ... up to the number of rows
of M
c. return the set of (key, value) pairs in which each key (i,k) has a list of values (M,j,mij)
and (N,j,njk) for all possible values of j.
Algorithm for Reduce Function.
for each key (i,k) do
sort the values that begin with M by j into listM
sort the values that begin with N by j into listN
multiply mij and njk for the jth value of each list
sum up the products mij x njk
return ((i,k), Σj mij x njk), where the sum runs over all values of j
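The two algorithms above can be tried out without a cluster. The following is a minimal in-memory sketch in plain Java, not the Map.java and Reduce.java files this guide refers to. The class name MatrixMapReduceSketch and the Tagged helper are hypothetical, a TreeMap stands in for Hadoop's shuffle phase, and indices are 0-based rather than 1-based as in the pseudocode.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch; no Hadoop classes are used.
class MatrixMapReduceSketch {

    // One emitted value: source matrix tag ('M' or 'N'), shared index j, entry.
    static final class Tagged {
        final char matrix;
        final int j;
        final double value;
        Tagged(char matrix, int j, double value) {
            this.matrix = matrix;
            this.j = j;
            this.value = value;
        }
    }

    // Map step. The TreeMap plays the role of the shuffle: it groups values by key "i,k".
    static Map<String, List<Tagged>> map(double[][] m, double[][] n) {
        int rowsM = m.length, colsN = n[0].length;
        Map<String, List<Tagged>> grouped = new TreeMap<>();
        for (int i = 0; i < rowsM; i++)
            for (int j = 0; j < m[i].length; j++)
                for (int k = 0; k < colsN; k++)   // m_ij is needed for every column k of N
                    grouped.computeIfAbsent(i + "," + k, key -> new ArrayList<>())
                           .add(new Tagged('M', j, m[i][j]));
        for (int j = 0; j < n.length; j++)
            for (int k = 0; k < colsN; k++)
                for (int i = 0; i < rowsM; i++)   // n_jk is needed for every row i of M
                    grouped.computeIfAbsent(i + "," + k, key -> new ArrayList<>())
                           .add(new Tagged('N', j, n[j][k]));
        return grouped;
    }

    // Reduce step for one key (i,k): split by tag, sort by j, multiply pairwise, sum.
    static double reduce(List<Tagged> values) {
        List<Tagged> listM = new ArrayList<>();
        List<Tagged> listN = new ArrayList<>();
        for (Tagged t : values) (t.matrix == 'M' ? listM : listN).add(t);
        listM.sort(Comparator.comparingInt((Tagged t) -> t.j));
        listN.sort(Comparator.comparingInt((Tagged t) -> t.j));
        double sum = 0;
        for (int j = 0; j < listM.size(); j++)
            sum += listM.get(j).value * listN.get(j).value;
        return sum;
    }

    // Run map, then reduce every key, and place each result at position (i,k).
    static double[][] multiply(double[][] m, double[][] n) {
        double[][] product = new double[m.length][n[0].length];
        for (Map.Entry<String, List<Tagged>> entry : map(m, n).entrySet()) {
            String[] ik = entry.getKey().split(",");
            product[Integer.parseInt(ik[0])][Integer.parseInt(ik[1])] = reduce(entry.getValue());
        }
        return product;
    }

    public static void main(String[] args) {
        double[][] m = {{1, 2}, {3, 4}};
        double[][] n = {{5, 6}, {7, 8}};
        System.out.println(Arrays.deepToString(multiply(m, n)));
        // prints [[19.0, 22.0], [43.0, 50.0]]
    }
}
```

The sketch assumes non-empty rectangular matrices with matching inner dimensions. In a real job the grouping by key is done by the framework between the mapper and the reducer, so only the emit logic and the per-key sum would live in user code.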
2.2 Download the hadoop jar files with these links.
Download Hadoop Common Jar files :
wget https://goo.gl/G4MyHp -O hadoop-common-3.1.2.jar
Download Hadoop Mapreduce Jar File :
wget https://goo.gl/KT8yfB -O hadoop-mapreduce-client-core-3.1.2.jar
2.3 Creating Mapper file for Matrix Multiplication.
Refer Map.java
2.4 Creating Reduce.java file for Matrix Multiplication
Refer Reduce.java
2.5 Creating MatrixMultiply.java file
Refer MatrixMultiply.java
2.6 Compiling the program into a folder named operation/
javac -cp hadoop-common-3.1.2.jar:hadoop-mapreduce-client-core-3.1.2.jar:operation/:. \
  -d operation/ Map.java
javac -cp hadoop-common-3.1.2.jar:hadoop-mapreduce-client-core-3.1.2.jar:operation/:. \
  -d operation/ Reduce.java
javac -cp hadoop-common-3.1.2.jar:hadoop-mapreduce-client-core-3.1.2.jar:operation/:. \
  -d operation/ MatrixMultiply.java
2.7 Let’s list the contents of the directory after compilation.
ls -R operation/
2.8 Creating Jar file for the Matrix Multiplication.
jar -cvf MatrixMultiply.jar -C operation/ .
2.9 Uploading the M and N files, which contain the matrix multiplication data, to HDFS.
Refer File ‘M’
Refer File ‘N’
hadoop fs -mkdir Matrix/
hadoop fs -copyFromLocal M Matrix/
hadoop fs -copyFromLocal N Matrix/
2.10 Executing the jar file using the hadoop command, which fetches the records from
HDFS and stores the output in HDFS.
hadoop jar MatrixMultiply.jar MatrixMultiply Matrix result
NOTE: The output of the mapper and reducer will be generated here.
2.11 Getting Output from part-r-00000 that was generated after the execution of the
hadoop command.
hadoop fs -cat result/part-r-00000