BMCS2013 DATA ENGINEERING
PRACTICAL 2 Hadoop Distributed File System (HDFS)
1. Launch the Ubuntu-22.04-de distro:
In PowerShell (run as administrator), launch the distro for this course:
PS C:\Users\TARUMT> wsl ~
hduser@PC25:~$
~~~ As the user hduser ~~~
2. Start HDFS and YARN
2.1. Start the HDFS service
hduser@PC25:~$ start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [PC25]
To check the services currently running, use the jps command:
hduser@PC25:~$ jps
1392 Jps
1114 SecondaryNameNode
876 DataNode
685 NameNode
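If all three HDFS daemons (NameNode, DataNode, SecondaryNameNode) appear, the file system is up. As an optional sanity check (not required for this practical), the dfsadmin report summarises the cluster capacity and the live DataNodes:
hduser@PC25:~$ hdfs dfsadmin -report
# In this single-node setup, expect one live DataNode to be reported.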
2.2. Start the YARN service
hduser@PC25:~$ start-yarn.sh
Starting resourcemanager
Starting nodemanagers
hduser@PC25:~$ jps
2112 Jps
1696 NodeManager
1542 ResourceManager
1114 SecondaryNameNode
876 DataNode
685 NameNode
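As an optional check that YARN is ready (assuming the single-node setup used in this distro), list the NodeManagers registered with the ResourceManager:
hduser@PC25:~$ yarn node -list
# Expect a single node reported in the RUNNING state.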
FYI only, the following actions have already been completed in the distro:
# Create the directories named user and tmp in the distributed file system:
# The /user directory is where all Hadoop users’ home directories will be created later on.
hduser@PC25:~$ hdfs dfs -mkdir /user
hduser@PC25:~$ hdfs dfs -mkdir /tmp
# Give full permissions for all users to the tmp directory:
hduser@PC25:~$ hdfs dfs -chmod -R 777 /tmp
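If you wish to verify these FYI steps yourself, listing the HDFS root directory is a harmless read-only check:
hduser@PC25:~$ hdfs dfs -ls /
# Expect entries for /tmp (with permissions drwxrwxrwx) and /user.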
3. Create User Directories in HDFS
3.1. Create an HDFS user directory for the user student:
hduser@PC25:~$ hdfs dfs -mkdir /user/student
3.2. Change the ownership of the newly created directory:
hduser@PC25:~$ hdfs dfs -chown student:hduser /user/student
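To confirm the new ownership (an optional check), list /user and inspect the owner and group columns:
hduser@PC25:~$ hdfs dfs -ls /user
# The /user/student entry should show student as the owner and hduser as the group.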
Note (FYI only):
HDFS file permissions are similar to Linux file permissions.
E.g., to change the permission of the file shakespeare.txt to 664:
$ hdfs dfs -chmod 664 shakespeare.txt
where 664 is an octal representation of the flags to set for the permission triple.
The above statement changes the permissions to -rw-rw-r--:
● 6 is 110, which means read and write, but not execute.
● 7 is 111, which means complete permissions.
● 4 is 100, which means read-only.
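As an illustration (FYI only, assuming shakespeare.txt is already in your HDFS home directory), the effect of the chmod can be confirmed with a listing:
$ hdfs dfs -chmod 664 shakespeare.txt
$ hdfs dfs -ls shakespeare.txt
# The permissions column should now read -rw-rw-r--.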
~~~ As the user student ~~~
4. Switch user to student
hduser@PC25:~$ su - student
student@PC25:~$
5. HDFS Basic File System Operations
5.1. See the available commands in the dfs shell
student@PC25:~$ hdfs dfs -help
5.2. Download the shakespeare.txt file from Google Drive into your local file
system (Ubuntu 22.04)
student@PC25:~$ wget --no-check-certificate 'https://docs.google.com/uc?export=download&id=122PnuKaSaA_OyYOKnxQOdlMc5awdyf5v' -O shakespeare.txt
💡 Remember to confirm that the above action is successful.
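One simple way to confirm the download (a suggestion; any equivalent check is fine) is to list the file and peek at its first few lines in the local file system:
student@PC25:~$ ls -lh shakespeare.txt
student@PC25:~$ head -n 5 shakespeare.txt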
5.3. Copy the downloaded file shakespeare.txt from the local file system to HDFS
student@PC25:~$ hdfs dfs -put shakespeare.txt shakespeare.txt
💡 Remember to confirm that the above action is successful.
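To confirm that the copy reached HDFS (a suggested check), list the file in your HDFS home directory; its size should match the local copy:
student@PC25:~$ hdfs dfs -ls shakespeare.txt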
5.4. Read the contents of the file in HDFS using the cat command, piping the output to less so that the remote file can be viewed one screen at a time.
student@PC25:~$ hdfs dfs -cat shakespeare.txt | less
Note: use the arrow keys to navigate the file. Type q to quit.
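If you only need a quick look rather than the whole file, the tail command prints roughly the last kilobyte of the file (an alternative, not a required step):
student@PC25:~$ hdfs dfs -tail shakespeare.txt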
5.5. Copy the file from HDFS to the local file system and rename it as shakespeare-dfs.txt.
student@PC25:~$ hdfs dfs -get shakespeare.txt ./shakespeare-dfs.txt
💡 Remember to confirm that the above action is successful.
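To confirm the retrieval (a suggested check), list the local copy and, optionally, compare it with the original download; no diff output means the two files are identical:
student@PC25:~$ ls -lh shakespeare-dfs.txt
student@PC25:~$ diff shakespeare.txt shakespeare-dfs.txt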
6. To end your practical sessions
6.1. Logout from the student account
student@PC25:~$ exit
hduser@PC25:~$
~~~ As the user hduser ~~~
6.2. Terminate the YARN service
hduser@PC25:~$ stop-yarn.sh
6.3. Terminate the HDFS service
hduser@PC25:~$ stop-dfs.sh
6.4. Logout from the hduser account
hduser@PC25:~$ exit
PS C:\Users\TARUMT>
6.5. Terminate the WSL instance
PS C:\Users\TARUMT> exit
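Exiting PowerShell closes the window. If you also want to be sure the distro itself has stopped (an optional extra step; the distro name below is the one launched in step 1), it can be terminated explicitly:
PS C:\Users\TARUMT> wsl --terminate Ubuntu-22.04-de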
Other HDFS Commands
Recall that the HDFS shell commands resemble the familiar POSIX file commands and are invoked as:
$ hdfs dfs -<command> <args>
Other HDFS commands include:
cat chown ls rm
chgrp cp mkdir stat
chmod du mv tail
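For example (illustrative usage only, assuming shakespeare.txt is still present in your HDFS home directory):
$ hdfs dfs -du -h shakespeare.txt          # show the file size in human-readable form
$ hdfs dfs -stat "%n %r" shakespeare.txt   # print the file name and its replication factor
$ hdfs dfs -rm shakespeare.txt             # delete the file from HDFS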