SRM INSTITUTE OF SCIENCE AND TECHNOLOGY: VADAPALANI CAMPUS
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
Prerequisite
Ubuntu 16.04 on an Amazon EC2 instance
Enable password authentication on the EC2 instance:
Set a password for the default ubuntu user on the EC2 image: sudo passwd ubuntu
Step 1: JAVA 8 INSTALLATION
1. sudo add-apt-repository ppa:webupd8team/java
2. sudo apt-get update
3. sudo apt-get install oracle-java8-installer
4. sudo apt-get install oracle-java8-set-default
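Verify the JDK before moving on. Note: the webupd8team PPA has since been discontinued upstream; if step 3 fails, OpenJDK 8 (sudo apt-get install openjdk-8-jdk) is a drop-in alternative, with JAVA_HOME at /usr/lib/jvm/java-8-openjdk-amd64 instead of the Oracle path used below.
java -version
//expected output should mention java version "1.8.0_xx"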
Step 2: SSH SERVER INSTALLATION
5. sudo apt-get install openssh-server
6. sudo sed -i -e 's/PasswordAuthentication no/PasswordAuthentication yes/g' /etc/ssh/sshd_config
7. ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
//RSA is used instead of DSA because OpenSSH on Ubuntu 16.04 rejects DSA keys by default
8. sudo service ssh restart
9. ssh localhost
//passwordless login
10. exit
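If ssh localhost still prompts for a password, the usual cause is loose permissions on the key files; this standard fix (not part of the original steps) tightens them:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys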
Step 3: Download hadoop package
https://archive.apache.org/dist/hadoop/core/hadoop-2.7.3/hadoop-2.7.3.tar.gz
11. wget https://archive.apache.org/dist/hadoop/core/hadoop-2.7.3/hadoop-2.7.3.tar.gz
12. sudo tar -xzvf hadoop-2.7.3.tar.gz
sudo mkdir -p /usr/local/hadoop
sudo mv hadoop-2.7.3/* /usr/local/hadoop/
13. sudo chown -R ubuntu:ubuntu /usr/local/hadoop
//create the directory Hadoop uses for temporary and HDFS data
14. sudo mkdir -p /app/hadoop/tmp
set permission:
15. sudo chown -R ubuntu /app/hadoop/tmp
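To confirm the extraction and ownership changes worked, list the install directory; the standard Hadoop layout should appear, owned by ubuntu:
ls -l /usr/local/hadoop
//expect bin, sbin, etc, share and related folders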
Step 4: Configure Hadoop:
Check where your Java is installed:
16. readlink -f /usr/bin/java
If you get something like /usr/lib/jvm/java-8-oracle/jre/bin/java, then
/usr/lib/jvm/java-8-oracle is what you should use for JAVA_HOME.
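As a convenience, the same value can be derived in one line by stripping the trailing /jre/bin/java from the readlink output (this sed expression assumes the Oracle JDK layout shown above):
readlink -f /usr/bin/java | sed 's:/jre/bin/java::'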
Add to ~/.bashrc file:
17. sudo nano ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib/native"
Reload ~/.bashrc file:
18. source ~/.bashrc
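The hadoop binary should now be on your PATH; a quick sanity check:
hadoop version
//should report Hadoop 2.7.3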
Modify JAVA_HOME in
19. sudo nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
Modify
20. sudo nano /usr/local/hadoop/etc/hadoop/core-site.xml
to have something like:
<configuration>
...
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
...
</configuration>
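Note: fs.default.name is the older, deprecated spelling of fs.defaultFS; Hadoop 2.7.3 accepts both. The value points at the hostname master, which must resolve on this machine. On a single-node setup, one option (an assumption, not in the original steps) is to map it to the loopback address:
echo "127.0.0.1 master" | sudo tee -a /etc/hosts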
Modify
21. sudo nano /usr/local/hadoop/etc/hadoop/yarn-site.xml
to have something like:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8040</value>
</property>
</configuration>
Create /usr/local/hadoop/etc/hadoop/mapred-site.xml
from the template:
22. cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
Modify
23. sudo nano /usr/local/hadoop/etc/hadoop/mapred-site.xml
to have something like:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Modify
24. sudo nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml
to have something like:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
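With only dfs.replication set, HDFS keeps its NameNode and DataNode data under hadoop.tmp.dir (/app/hadoop/tmp). If you prefer explicit locations, properties like the following can optionally be added inside <configuration> (these paths are illustrative, not from the original guide):
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/app/hadoop/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/app/hadoop/tmp/dfs/data</value>
</property>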
Format the HDFS file system (do this only once; re-formatting erases all HDFS metadata):
25. hdfs namenode -format
Start Hadoop:
26. start-dfs.sh
27. start-yarn.sh
You might be asked to accept the machine’s host key.
Check if everything is running:
28. jps
You should get something like:
Jps
NodeManager
NameNode
ResourceManager
DataNode
SecondaryNameNode
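If DataNode or NameNode is missing from the list, a common cause is stale state left in the temporary directory from an earlier format. One recovery sequence (destructive to any data in HDFS, so suitable only on a fresh install):
stop-yarn.sh
stop-dfs.sh
rm -rf /app/hadoop/tmp/*
hdfs namenode -format
start-dfs.sh
start-yarn.sh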
TYPE IN WEB BROWSER
29. http://localhost:8088/cluster (YARN ResourceManager UI)
30. http://localhost:50070/ (HDFS NameNode UI)
From outside EC2, replace localhost with the instance's public DNS and open ports 8088 and 50070 in the instance's security group.
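As a final smoke test, run the bundled example job (the jar path follows the Hadoop 2.7.3 layout installed above); a successful pi estimate confirms HDFS and YARN are working together:
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 2 5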
HADOOP CLUSTER INSTALLED SUCCESSFULLY ON AMAZON EC2