Hadoop Installation on Ubuntu (Linux)
The following installation steps are adapted from
https://www.vultr.com/docs/install-and-configure-apache-hadoop-on-ubuntu-20-04
(author: Thomas Rakwach).
1. Install Java
Install the default JDK and JRE packages (OpenJDK 11 on Ubuntu 20.04).
$ sudo apt install default-jdk default-jre -y
Verify the installed version of Java.
$ java -version
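As an additional check, confirm that the Java compiler was installed as well, since its path is used later when setting JAVA_HOME.
$ javac -version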
2. Create Hadoop User and Configure Password-less SSH
Add a new user hadoop.
$ sudo adduser hadoop
Add the hadoop user to the sudo group.
$ sudo usermod -aG sudo hadoop
Install the OpenSSH server and client.
$ sudo apt install openssh-server openssh-client -y
If you are prompted about a modified configuration file during the installation, respond with:
keep the local version currently installed
Switch to the created user.
$ sudo su - hadoop
Generate public and private key pairs.
$ ssh-keygen -t rsa
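Press Enter at each prompt to accept the default key location and leave the passphrase empty; an empty passphrase is what allows the password-less login.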
Append the generated public key from id_rsa.pub to authorized_keys. No sudo is needed here: the files are in your own home directory, and sudo would not apply to the redirection anyway, since the shell performs it before sudo runs.
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Change the permissions of the authorized_keys file.
$ chmod 640 ~/.ssh/authorized_keys
Verify that the password-less SSH is functional.
$ ssh localhost
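The login should complete without asking for a password. As an optional extra check (not part of the original guide), you can force a non-interactive login, which fails instead of prompting if key authentication is broken.
$ ssh -o BatchMode=yes localhost 'echo password-less SSH works'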
3. Install Apache Hadoop
Log in as the hadoop user.
$ sudo su - hadoop
Download Hadoop. This guide uses version 3.3.1; to get the latest stable version, go to the Apache Hadoop official download page (older releases move to archive.apache.org once they are superseded).
$ sudo wget https://downloads.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
Extract the downloaded file.
$ tar -xvzf hadoop-3.3.1.tar.gz
Move the extracted directory to the /usr/local/ directory.
$ sudo mv hadoop-3.3.1 /usr/local/hadoop
Create a directory to store the Hadoop logs.
$ sudo mkdir /usr/local/hadoop/logs
Change the ownership of the hadoop directory.
$ sudo chown -R hadoop:hadoop /usr/local/hadoop
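As a quick sanity check (not part of the original guide), confirm the new ownership.
$ ls -ld /usr/local/hadoop /usr/local/hadoop/logs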
4. Configure Hadoop
Edit the ~/.bashrc file to configure the Hadoop environment variables (no sudo needed for your own dotfile).
$ nano ~/.bashrc
Add the following lines to the file. Save and close the file.
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
Activate the environment variables.
$ source ~/.bashrc
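To confirm the variables took effect, check that HADOOP_HOME is set and that the hadoop binary is now on your PATH (an optional check, not in the source guide).
$ echo $HADOOP_HOME
$ which hadoop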
5. Configure Java Environment Variables
Hadoop's components, such as YARN, HDFS, MapReduce, and the related project settings, read their Java configuration from the hadoop-env.sh file, so you need to define the Java environment variables there.
Find the path of the javac binary.
$ which javac
Resolve the symlink to find the OpenJDK installation directory.
$ readlink -f /usr/bin/javac
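The readlink output ends in /bin/javac, and JAVA_HOME is that path with the /bin/javac suffix removed. If you prefer to derive it in one step, a small sketch (GNU sed is available by default on Ubuntu):
$ readlink -f /usr/bin/javac | sed 's:/bin/javac::'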
Edit the hadoop-env.sh file.
$ sudo nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Add the following lines to the file. Then, save and close the file.
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_CLASSPATH+=" $HADOOP_HOME/lib/*.jar"
Browse to the hadoop lib directory.
$ cd /usr/local/hadoop/lib
Download the Javax activation file.
$ sudo wget https://jcenter.bintray.com/javax/activation/javax.activation-api/1.2.0/javax.activation-api-1.2.0.jar
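Note that the JCenter service has been sunset, so this URL may no longer resolve. The same artifact is published on Maven Central and should work as a drop-in replacement.
$ sudo wget https://repo1.maven.org/maven2/javax/activation/javax.activation-api/1.2.0/javax.activation-api-1.2.0.jar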
Verify the Hadoop version.
$ hadoop version
Edit the core-site.xml configuration file to specify the URL for your NameNode.
$ sudo nano $HADOOP_HOME/etc/hadoop/core-site.xml
Add the following lines (fs.defaultFS is the current name of the deprecated fs.default.name property). Save and close the file.
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://0.0.0.0:9000</value>
    <description>The default file system URI</description>
  </property>
</configuration>
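Because the PATH already includes the Hadoop binaries, you can confirm the value was picked up (an optional check, not part of the original guide).
$ hdfs getconf -confKey fs.defaultFS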
Create directories for storing node metadata and change their ownership to hadoop.
$ sudo mkdir -p /home/hadoop/hdfs/{namenode,datanode}
$ sudo chown -R hadoop:hadoop /home/hadoop/hdfs
Edit the hdfs-site.xml configuration file to define the location for storing node metadata and the fsimage file.
$ sudo nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml
Add the following lines (dfs.namenode.name.dir and dfs.datanode.data.dir are the current names of the deprecated dfs.name.dir and dfs.data.dir properties). Save and close the file.
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/hadoop/hdfs/datanode</value>
  </property>
</configuration>
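The same getconf check works for the HDFS settings, for example the replication factor.
$ hdfs getconf -confKey dfs.replication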
Edit mapred-site.xml configuration file to define MapReduce values.
$ sudo nano $HADOOP_HOME/etc/hadoop/mapred-site.xml
Add the following lines. Save and close the file.
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Edit the yarn-site.xml configuration file and define YARN-related settings.
$ sudo nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
Add the following lines. Save and close the file.
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
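Hadoop fails to start if any of these XML files is malformed. As an optional check (this assumes the libxml2-utils package, which provides xmllint, is installed; it is not part of the original guide), verify that all four files are well-formed.
$ xmllint --noout $HADOOP_HOME/etc/hadoop/{core-site,hdfs-site,mapred-site,yarn-site}.xml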
Log in as the hadoop user.
$ sudo su - hadoop
Validate the Hadoop configuration and format the HDFS NameNode.
$ hdfs namenode -format
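The format step lays out the metadata structure under the namenode directory configured earlier. As an assumed sanity check (not from the source guide), verify that the current directory was created.
$ ls /home/hadoop/hdfs/namenode/current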
6. Start the Apache Hadoop Cluster
Start the NameNode and DataNode.
$ start-dfs.sh
Start the YARN resource and node managers.
$ start-yarn.sh
Verify that all the components are running.
$ jps
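On a healthy single-node setup, jps should list the NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager daemons, plus Jps itself; the process IDs will differ on your machine.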
7. Access Apache Hadoop Web Interface
You can access the Hadoop NameNode in your browser via http://server-IP:9870. For
example:
http://192.0.2.11:9870
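The YARN ResourceManager serves its own web UI on port 8088 by default (for example, http://192.0.2.11:8088), and individual DataNodes serve theirs on port 9864.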