1. Installing Oracle JDK 1.7: Installing the JDK is a required step before installing Hadoop.
You can follow the steps in
my previous post.
1. Based on your Linux architecture, download the proper version from the Oracle website (Oracle
JDK 1.7)
2. Then, uncompress the JDK archive using the following command:
tar -xvf jdk-7u71-linux-i586.tar
Or use the following command for 64-bit systems:
tar -xvf jdk-7u71-linux-x64.tar
3. Create a folder named jvm under /usr/lib (if it does not already exist) using the following command:
sudo mkdir -p /usr/lib/jvm
4. Then, move the extracted directory to /usr/lib/jvm:
sudo mv ~/Downloads/jdk1.7.0_71 /usr/lib/jvm/
5. Run the following commands to update the execution alternatives:
sudo update-alternatives --install "/usr/bin/java" "java" "/usr/lib/jvm/jdk1.7.0_71/bin/java" 1
sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/lib/jvm/jdk1.7.0_71/bin/javac" 1
sudo update-alternatives --install "/usr/bin/javaws" "javaws" "/usr/lib/jvm/jdk1.7.0_71/bin/javaws" 1
6. Finally, you need to export JAVA_HOME variable:
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_71
It is better, however, to set JAVA_HOME persistently in .bashrc:
nano ~/.bashrc
then add the same line at the end of the file:
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_71
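The .bashrc step can be made idempotent so that running it again does not duplicate the export line. A minimal sketch (a temporary file stands in for ~/.bashrc here so the snippet is safe to try):

```shell
# Append the JAVA_HOME export to a shell profile only if it is not already there.
# A temporary file stands in for ~/.bashrc; the jdk1.7.0_71 path is the one used in this post.
profile="$(mktemp)"
line='export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_71'

# grep -qxF: quiet, whole-line, fixed-string match; append only when absent.
grep -qxF "$line" "$profile" || echo "$line" >> "$profile"
grep -qxF "$line" "$profile" || echo "$line" >> "$profile"   # second run is a no-op

echo "JAVA_HOME lines: $(grep -cxF "$line" "$profile")"      # prints: JAVA_HOME lines: 1
```

Running the snippet twice still leaves exactly one export line in the file.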
2. Adding a dedicated Hadoop system user: You will need a dedicated user for the Hadoop system you are going to install. To
create a new user "hduser" in a group called "hadoop", run the following commands in your terminal:
$sudo addgroup hadoop
$sudo adduser --ingroup hadoop hduser
3. Configuring SSH: In Michael's blog, he assumed that SSH is already installed. But if you didn't install an SSH
server before, you can run the following command in your terminal. With this command, you will have an SSH
server installed on your machine, listening on port 22 by default.
$sudo apt-get install openssh-server
We have installed SSH because Hadoop requires access to localhost (in the single-node cluster case) or
communicates with remote nodes (in the multi-node cluster case).
After this step, you will need to generate an SSH key for hduser (and for any other users who will administer
Hadoop) by running the following commands, but first you need to switch to hduser:
$su - hduser
$ssh-keygen -t rsa -P ""
To make sure that the SSH installation went well, you can open a new terminal and try to create an SSH session as
hduser with the following command:
$ssh localhost
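Before `ssh localhost` can log in without a password, the generated public key must also be listed in hduser's authorized_keys file. A sketch of that step (a temporary directory with a placeholder key stands in for ~/.ssh so the snippet is safe to run; in practice the paths would be $HOME/.ssh/id_rsa.pub and $HOME/.ssh/authorized_keys):

```shell
# Authorize the freshly generated key for password-less login.
ssh_dir="$(mktemp -d)"                                        # stand-in for ~/.ssh
echo 'ssh-rsa AAAAB3... hduser@ubuntu' > "$ssh_dir/id_rsa.pub"  # placeholder public key

cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"       # append key to authorized list
chmod 600 "$ssh_dir/authorized_keys"                          # sshd rejects overly open key files
```

The chmod matters: sshd ignores an authorized_keys file that is group- or world-writable.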
Installing Hadoop
Now we can download Hadoop to begin the installation. Go to Apache Downloads and download Hadoop version
0.20.2. To avoid permission issues, you can download the tar file into hduser's directory, for
example /home/hduser. Check the following snapshot:
Then you need to extract the tar file and rename the extracted folder to 'hadoop'. Open a new terminal and run the
following commands:
$ cd /home/hduser
$ sudo tar xzf hadoop-0.20.2.tar.gz
$ sudo mv hadoop-0.20.2 hadoop
Please note that if you want to grant access to another Hadoop admin user (e.g. hduser2), you have to
give that user ownership of the hadoop folder (while keeping it in the hadoop group) using the following command:
sudo chown -R hduser2:hadoop hadoop
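The extract-and-rename step above can be rehearsed end to end without touching a real Hadoop download. This self-contained sketch builds a tiny stand-in hadoop-0.20.2.tar.gz in a temporary directory, then unpacks and renames it exactly as done above:

```shell
# Rehearsal of the extract-and-rename step with a stand-in tarball.
work="$(mktemp -d)"
cd "$work"

# Build a minimal fake release archive (contents are placeholders).
mkdir -p hadoop-0.20.2/conf
echo '# placeholder' > hadoop-0.20.2/conf/hadoop-env.sh
tar czf hadoop-0.20.2.tar.gz hadoop-0.20.2
rm -r hadoop-0.20.2

tar xzf hadoop-0.20.2.tar.gz      # extract the release tarball
mv hadoop-0.20.2 hadoop           # rename to a version-independent path

ls hadoop/conf                    # the config files are now under hadoop/conf
```

The rename means later configuration (and $HADOOP_HOME) never has to mention the version number.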
Update $HOME/.bashrc
You will need to update .bashrc for hduser (and for every user who needs to administer Hadoop). To edit the .bashrc
file, you will need to open it as root:
$ sudo gedit /home/hduser/.bashrc
Then add the following configurations at the end of the .bashrc file:
# Set Hadoop-related environment variables
export HADOOP_HOME=/home/hduser/hadoop
# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on).
# Use the path that matches your Java installation; if you followed this post:
# export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_71
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export PATH=$JAVA_HOME/bin:$PATH
# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"
# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
    hadoop fs -cat "$1" | lzop -dc | head -1000 | less
}
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
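After reloading the file (e.g. with `source ~/.bashrc`), a quick sanity check confirms that Hadoop's bin directory really ended up on PATH. A sketch that sets the two variables exactly as above and then tests for the directory:

```shell
# Sanity check: is $HADOOP_HOME/bin on PATH after the .bashrc additions?
export HADOOP_HOME=/home/hduser/hadoop
export PATH=$PATH:$HADOOP_HOME/bin

# Wrap PATH in colons so the match works at either end of the list.
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) path_ok=yes ;;
  *)                      path_ok=no  ;;
esac
echo "hadoop on PATH: $path_ok"      # prints: hadoop on PATH: yes
```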
Hadoop Configuration
Now, we need to configure the Hadoop framework on the Ubuntu machine. The following are the configuration files we can
use to do the proper configuration. To know more about Hadoop configurations, you can visit this site.
hadoop-env.sh
We only need to update the JAVA_HOME variable in this file. Simply open the file in a text editor
using the following command:
$ sudo gedit /home/hduser/hadoop/conf/hadoop-env.sh
Then you will need to change the following line
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
to the path that matches your Java installation; if you followed this post, that is:
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_71
# or, if you are using OpenJDK 7:
# export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
Note: if you face an "Error: JAVA_HOME is not set" error while starting the services, it seems that you
forgot to uncomment the previous line (just remove the #).
core-site.xml
First, we need to create a temp directory for the Hadoop framework. If you need this environment for testing or for a quick
prototype (e.g. developing simple Hadoop programs for your personal tests), I suggest creating this folder
under the /home/hduser/ directory; otherwise, you should create this folder in a shared place (like
/usr/local/...), but you may then face some security issues. To avoid the exceptions that may be caused by security
settings (like java.io.IOException), I have created the tmp folder under hduser's space.
To create this folder, type the following command:
$ sudo mkdir /home/hduser/tmp
Please note that if you want to add another admin user (e.g. hduser2 in the hadoop group), you should grant him read
and write permission on this folder using the following commands:
$ sudo chown hduser2:hadoop /home/hduser/tmp
$ sudo chmod 755 /home/hduser/tmp
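The permission step can be rehearsed on a throwaway directory (chown needs root and an existing hduser2 account, so only the chmod part is exercised in this sketch):

```shell
# Rehearse the tmp-directory permission setup on a throwaway path.
tmp_demo="$(mktemp -d)/tmp"
mkdir -p "$tmp_demo"
chmod 755 "$tmp_demo"            # owner: rwx, group and others: r-x
stat -c '%a' "$tmp_demo"         # prints: 755
```

Mode 755 lets the owner write into the directory while other Hadoop admins in the group can still read and traverse it.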
Now, we can open hadoop/conf/core-site.xml to edit the hadoop.tmp.dir entry.
We can open core-site.xml using a text editor:
$ sudo gedit /home/hduser/hadoop/conf/core-site.xml
Then add the following configurations between the <configuration> .. </configuration> XML elements:
<!-- In: conf/core-site.xml -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
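For reference, once these properties are added, the whole conf/core-site.xml should look roughly like this (the XML prolog and the <configuration> wrapper are already present in the stock file; descriptions shortened here):

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hduser/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    <description>The name of the default file system.</description>
  </property>
</configuration>
```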
mapred-site.xml
We will open hadoop/conf/mapred-site.xml using a text editor and add the following configuration values (like
core-site.xml):
<!-- In: conf/mapred-site.xml -->
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
hdfs-site.xml
Open hadoop/conf/hdfs-site.xml using a text editor and add the following configurations:
<!-- In: conf/hdfs-site.xml -->
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
Formatting the NameNode
You should format the NameNode of your HDFS. You should not do this step while the system is running; it is
usually done only once, when you first install Hadoop.
Run the following command:
$ /home/hduser/hadoop/bin/hadoop namenode -format
NameNode Formatting
Starting Hadoop Cluster
You will need to navigate to the hadoop/bin directory and run the ./start-all.sh script.
Starting Hadoop Services using ./start-all.sh
There is a nice tool called jps. You can use it to ensure that all the services are up.
Using jps tool
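With Hadoop 0.20 in single-node mode, start-all.sh launches five daemons: NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker. A small sketch that checks a captured jps listing for all five (the sample listing and its PIDs below are illustrative; in practice capture real output with out="$(jps)"):

```shell
# Check a jps listing for the five daemons a single-node 0.20 cluster runs.
# Sample output used for illustration; replace with: out="$(jps)"
out='2287 NameNode
2418 DataNode
2555 SecondaryNameNode
2628 JobTracker
2761 TaskTracker
2850 Jps'

missing=0
for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
  echo "$out" | grep -q "$d" || { echo "missing: $d"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "all Hadoop daemons are running"
```

If any daemon is missing, its log under hadoop/logs is the first place to look.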
The key feature of a Writable is that the framework knows how to serialize and deserialize
a Writable object. The WritableComparable adds the compareTo interface so the framework
knows how to sort the WritableComparable objects.