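Week 1
In a terminal, update the package index:
$ sudo apt update
Install JDK 8:
$ sudo apt install openjdk-8-jdk -y
Check the version (Java 8 accepts -version, not --version):
$ java -version
Install SSH for client-server communication:
$ sudo apt install openssh-server openssh-client -y
Create the Hadoop user and switch to it:
$ sudo adduser hdoop
$ su - hdoop
Generate a passwordless SSH key, authorize it, and test the connection:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
$ ssh localhost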
Download and install Hadoop:
$ wget https://dlcdn.apache.org/hadoop/common/hadoop-3.4.0/hadoop-3.4.0.tar.gz
$ tar xzf hadoop-3.4.0.tar.gz
Single Node Hadoop Deployment (Pseudo-Distributed Mode)
Configure a Hadoop environment by editing a set of configuration files:
• .bashrc
• hadoop-env.sh
• core-site.xml
• hdfs-site.xml
• mapred-site.xml
• yarn-site.xml
$ nano .bashrc
Add the following content at the end of the file:
export HADOOP_HOME=/home/hdoop/hadoop-3.4.0
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
Saving procedure for all config files in nano: press Ctrl+X, then press 'y', then press Enter.
Run the command below to apply the changes to the current running environment:
$ source ~/.bashrc
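After sourcing ~/.bashrc, it can help to confirm that the new variables are visible in the current shell. The following is a minimal sketch, not part of Hadoop: check_hadoop_env is a hypothetical helper name, and it only checks that HADOOP_HOME is set, points to a directory, and has its bin directory on PATH.

```shell
# check_hadoop_env: hypothetical helper that sanity-checks the variables
# exported from ~/.bashrc. It prints one line and returns 0 on success, 1 otherwise.
check_hadoop_env() {
  if [ -z "${HADOOP_HOME:-}" ]; then
    echo "HADOOP_HOME is not set"
    return 1
  fi
  if [ ! -d "$HADOOP_HOME" ]; then
    echo "HADOOP_HOME does not point to a directory: $HADOOP_HOME"
    return 1
  fi
  # Wrap PATH in colons so the match works for the first and last entries too.
  case ":$PATH:" in
    *":$HADOOP_HOME/bin:"*) ;;
    *) echo "$HADOOP_HOME/bin is not on PATH"; return 1 ;;
  esac
  echo "Hadoop environment looks consistent: $HADOOP_HOME"
}
```

Calling check_hadoop_env right after the source command should print the consistency line; any other message points at the export line in .bashrc that needs attention.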
Use the previously created $HADOOP_HOME variable to open the hadoop-env.sh file:
$ nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Add the following line, then save and quit:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
To find the Java path on your machine, run:
$ which java
$ readlink -f $(which java)
The output starts with /usr/lib/jvm/java-8-openjdk-amd64; use that directory (everything before the trailing /jre/bin/java or /bin/java) as JAVA_HOME.
Open the core-site.xml file in a text editor:
$ nano $HADOOP_HOME/etc/hadoop/core-site.xml
Add the following configuration to override the default value for the temporary directory and to
replace the default local file system setting with your HDFS URL:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hdoop/tmpdata</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://127.0.0.1:9000</value>
</property>
</configuration>
Save and quit.
Before opening the hdfs-site.xml file, create the dfsdata directory in /home/hdoop, and create the
namenode and datanode directories inside it:
$ cd /home/hdoop
$ mkdir dfsdata
$ cd dfsdata
$ mkdir namenode
$ mkdir datanode
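The same directory layout can be created in one step with mkdir -p, which also succeeds when the directories already exist. This is a minimal sketch: make_dfs_dirs is a hypothetical helper name, and the base path is passed in as an argument instead of being hard-coded.

```shell
# make_dfs_dirs BASE: create BASE/namenode and BASE/datanode (hypothetical helper).
make_dfs_dirs() {
  local base="$1"
  # -p creates missing parent directories and does not fail on existing ones.
  mkdir -p "$base/namenode" "$base/datanode"
}

# Usage with the path used in this guide:
#   make_dfs_dirs /home/hdoop/dfsdata
```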
Open the hdfs-site.xml file for editing:
$ nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml
Add the following configuration to the file and, if needed, adjust the NameNode and DataNode
directories to your custom locations. The NameNode directory is set with dfs.name.dir and the
DataNode directory with dfs.data.dir:
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/home/hdoop/dfsdata/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hdoop/dfsdata/datanode</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Save and quit.
Use the following command to access the mapred-site.xml file and define MapReduce values:
$ nano $HADOOP_HOME/etc/hadoop/mapred-site.xml
Add the following configuration to change the default MapReduce framework name value to yarn:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Save and quit.
Open the yarn-site.xml file in a text editor:
$ nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
Append the following configuration to the file:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>127.0.0.1</value>
</property>
<property>
<name>yarn.acl.enable</name>
<value>0</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
Save and quit.
Format the NameNode:
$ hdfs namenode -format
Start the services:
$ start-all.sh
Alternatively, start HDFS and YARN individually (the sbin directory is already on PATH):
$ start-dfs.sh
The NameNode, SecondaryNameNode, and DataNodes get started.
$ start-yarn.sh
The ResourceManager and NodeManagers get started.
Now run the following command to check that all the daemons are active and running:
$ jps
Access the Hadoop NameNode from a browser at http://localhost:9870
Access individual DataNodes from a browser at http://localhost:9864
Access the YARN ResourceManager at http://localhost:8088
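The jps check can be made less error-prone by comparing its output against the five daemons named in the steps above. This is a minimal sketch under the assumption that jps prints one "<pid> <ClassName>" pair per line; missing_daemons is a hypothetical helper name, not a Hadoop command.

```shell
# missing_daemons: read jps output on stdin and report which expected
# Hadoop daemons are absent. Returns 0 when all five are present, 1 otherwise.
missing_daemons() {
  local listing daemon missing=""
  # Read the listing once so every name is checked against the same snapshot.
  listing=$(cat)
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # Compare the second column exactly, so NameNode does not match SecondaryNameNode.
    if ! printf '%s\n' "$listing" | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      missing="$missing $daemon"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "all daemons running"
}

# Usage on the cluster machine:
#   jps | missing_daemons
```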
Week 2
DESCRIPTION:
HDFS is a scalable distributed file system designed to scale to petabytes of data while running on top
of the underlying file system of the operating system. HDFS keeps track of where the data resides in a
network by associating the name of its rack (or network switch) with the dataset. This allows Hadoop
to efficiently schedule tasks to those nodes that contain data, or which are nearest to it, optimizing
bandwidth utilization. Hadoop provides a set of command line utilities that work similarly to the
Linux file commands, and serve as your primary interface with HDFS.
We're going to look at HDFS by interacting with it from the command line. We will cover
the most common file management tasks in Hadoop, which include:
• Adding files and directories to HDFS
• Retrieving files from HDFS to local file system
• Deleting files from HDFS
SYNTAX AND COMMANDS TO ADD, RETRIEVE AND DELETE DATA FROM HDFS
Step-1 Adding Files and Directories to HDFS
Before you can run Hadoop programs on data stored in HDFS, you must put the data into HDFS. Create a
directory and put a file in it. HDFS has a default working directory of /user/$USER, where $USER is
your login username. This directory is not created automatically, so create it with the mkdir command:
hadoop fs -mkdir -p /user/$USER
Then copy a file into HDFS. Without a destination, put copies into the working directory; a
destination can also be given explicitly:
hadoop fs -put example.txt
hadoop fs -put example.txt /user/
Step-2 Retrieving Files from HDFS
The Hadoop command get copies files from HDFS back to the local file system:
hadoop fs -get example.txt
To display the contents of example.txt instead, we can run the following command:
hadoop fs -cat example.txt
Step-3 Deleting Files from HDFS
hadoop fs -rm example.txt
The command for creating a directory in HDFS is "hdfs dfs -mkdir /cse".
Adding a local directory is done through the command "hdfs dfs -put cse /".
Step-4 Copying Data from NFS to HDFS
The command for copying from a local directory is
"hdfs dfs -copyFromLocal /home/Desktop/cse/".
View a file by using the command "hdfs dfs -cat /cse/<filename>"; cat needs the path of a file, not of a directory.
The command for listing items in Hadoop is "hdfs dfs -ls hdfs://localhost:9000/".
The command for deleting a directory and its files is "hdfs dfs -rm -r /cse".
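The add, retrieve, and delete steps above can be strung together into one round trip. This is a minimal sketch, assuming the hdfs command is on PATH and the daemons are running; hdfs_round_trip is a hypothetical helper name, and the demo subdirectory under /user/<username> is an assumed scratch location, not something the exercise prescribes.

```shell
# hdfs_round_trip FILE: copy a local file into HDFS, list and print it,
# copy it back under a new name, then remove the scratch directory.
hdfs_round_trip() {
  local src="$1"
  local user="${USER:-$(id -un)}"
  local dir="/user/$user/demo"   # assumed scratch directory
  local name
  name=$(basename "$src")
  hdfs dfs -mkdir -p "$dir" &&                          # create the target directory
  hdfs dfs -put -f "$src" "$dir/" &&                    # add the file (overwrite if present)
  hdfs dfs -ls "$dir" &&                                # list the directory
  hdfs dfs -cat "$dir/$name" &&                         # print the file contents
  hdfs dfs -get "$dir/$name" "./$name.copy" &&          # retrieve to the local file system
  hdfs dfs -rm -r "$dir"                                # delete the directory and its files
}

# Usage:
#   hdfs_round_trip example.txt
```

Chaining the commands with && stops the sequence at the first failure, so the final delete only runs when every earlier step worked.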
Week 5
PROGRAM LOGIC: STEPS FOR INSTALLING APACHE PIG
1) Extract pig-0.15.0.tar.gz and move it to the home directory.
2) Set the Pig environment in the .bashrc file.
3) Pig can run in two modes, Local Mode and Hadoop Mode: pig -x local and pig
4) Grunt shell: grunt>
5) Loading data in the Grunt shell: DATA = LOAD '<file>' USING PigStorage(DELIMITER) AS
(ATTRIBUTE1 : DataType1, ATTRIBUTE2 : DataType2, ...);
6) Describe data: DESCRIBE DATA;
7) Dump data: DUMP DATA;
INPUT/OUTPUT: Input as Website Click Count Data
Write Pig Latin scripts to sort, group, join, project, and filter your data.
PROGRAM LOGIC:
FILTER data: FDATA = FILTER DATA BY ATTRIBUTE == VALUE;
GROUP data: GDATA = GROUP DATA BY ATTRIBUTE;
Iterating over data: FOR_DATA = FOREACH GDATA GENERATE group AS GROUP_FUN,
<FUNCTION>(DATA.ATTRIBUTE);
Sorting data: SORT_DATA = ORDER DATA BY ATTRIBUTE [ASC | DESC];
LIMIT data: LIMIT_DATA = LIMIT DATA COUNT;
JOIN data: JOIN_DATA = JOIN DATA1 BY (ATTRIBUTE1, ATTRIBUTE2, ...), DATA2 BY
(ATTRIBUTE3, ..., ATTRIBUTEN);
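To make the templates above concrete, the sketch below writes a small click-count data file and a Pig Latin script that loads, filters, groups, aggregates, sorts, and limits it. Everything specific here is made up for illustration: the helper name write_click_demo, the file names clicks.csv and clicks.pig, the column names, and the sample rows.

```shell
# write_click_demo DIR: write sample click data and a Pig Latin script into DIR.
write_click_demo() {
  local dir="$1"
  mkdir -p "$dir" || return 1
  # Sample rows in the form url,visitor,hits (invented data).
  printf '%s\n' \
    '/home,alice,3' \
    '/home,bob,1' \
    '/cart,alice,2' \
    '/help,carol,5' \
    '/cart,bob,4' > "$dir/clicks.csv"
  # The quoted heredoc delimiter stops the shell from expanding anything in the script.
  cat > "$dir/clicks.pig" <<'EOF'
-- Load comma-separated click data with a typed schema.
clicks = LOAD 'clicks.csv' USING PigStorage(',') AS (url:chararray, visitor:chararray, hits:int);
-- Keep only rows with more than one hit.
busy = FILTER clicks BY hits > 1;
-- Group by URL and total the hits in each group.
by_url = GROUP busy BY url;
totals = FOREACH by_url GENERATE group AS url, SUM(busy.hits) AS total_hits;
-- Sort by total, highest first, and keep the top two rows.
sorted = ORDER totals BY total_hits DESC;
top2 = LIMIT sorted 2;
DUMP top2;
EOF
}

# Usage in Local Mode (the script reads clicks.csv from the current directory):
#   write_click_demo "$HOME/pigdemo" && cd "$HOME/pigdemo" && pig -x local clicks.pig
```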
Week 2 Steps
Step 1: Enter the EC2 Dashboard
Open the AWS Management Console in a new tab. Find EC2 under Compute and
click to open the Amazon EC2 console.
Step 2: Create and Configure Your Virtual Machine
a. You are now in the Amazon EC2 console. Click Launch Instance.
b. With Amazon EC2, you specify the software and specifications of the instance you want to use. Choose
an Amazon Machine Image (AMI), which is a template that contains the software configuration
required to launch your instance.
Click Select next to Microsoft Windows Server 2012 R2 Base.
c. Choose an instance type. Instance types comprise varying combinations of CPU, memory,
storage, and networking capacity so you can choose the appropriate mix for your applications. Select
the default option of t2.micro, then click Review and Launch at the bottom of the page.
d. Review the options that are selected for your instance, which include AMI Details, Instance Type,
Security Groups, Instance Details, Storage, and Tags.
Step 3: Create a Key Pair and Launch Your Instance
To connect to your virtual machine, you need a key pair. A key pair is used to log into your instance
(just like your house key is used to enter your home).
a. In the popover, select Create a new key pair and name it MyFirstKey. Then click Download Key Pair.
MyFirstKey.pem will be downloaded to your computer. Make sure to save this key pair in a safe
location on your computer.
b. After you have downloaded and saved your key pair, click Launch Instance to start your Windows
Server instance.
c. On the next screen, click View Instances to view the instance you have just created
and see its status.
Step 4: Connect to Your Instance
After launching your instance, it's time to retrieve the administrator password and connect to it using
a Remote Desktop Protocol (RDP) client.
The AWS documentation includes information on how to install an RDP client if you need one.
Select the Windows Server instance you just created and click Connect.
c. To retrieve the password, you will need to locate the key pair you created in Step 3. Click
Choose File and browse to the directory where you stored MyFirstKey.pem. Your key pair will appear in the
text box. Click Decrypt Password.
Step 5: Terminate Your Windows VM
You can easily terminate the Windows Server VM from the Amazon EC2 console. In fact, it is a best
practice to terminate instances you are no longer using so you don’t keep getting charged for them.
a. Back on the EC2 console, select the box next to the instance you created. Then click the Actions
button, navigate to Instance State, and click Terminate.
b. You will be asked to confirm the termination.
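The console steps for termination have a command-line counterpart. The sketch below assumes the AWS CLI is installed and configured with credentials, which these notes do not cover; terminate_instance is a hypothetical helper name and the instance ID in the usage line is a placeholder.

```shell
# terminate_instance ID: request termination of one EC2 instance and wait
# until EC2 reports it as terminated.
terminate_instance() {
  local id="$1"
  # Prints the state the instance is moving into (for example shutting-down).
  aws ec2 terminate-instances --instance-ids "$id" \
    --query 'TerminatingInstances[0].CurrentState.Name' --output text || return 1
  # Blocks until the instance reaches the terminated state or the waiter gives up.
  aws ec2 wait instance-terminated --instance-ids "$id"
}

# Usage (placeholder instance ID):
#   terminate_instance i-0123456789abcdef0
```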
Week 5
Go to Services and select SQS under Messaging services.
Click the Get Started button, give the queue a name, and select the type of queue.
Click the Quick Create Queue button to create the queue.
Go to Queue Actions and select the Send Message option.
Enter the message, select the delay check box, enter the delay time for the message,
click Send Message, and then click Close.
The message is displayed on the queue dashboard.
Click Queue Actions, select the View/Delete Message option, and then click the Start Polling for
Messages button.
When the message is displayed, select it and click Delete Message to delete it.
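The same create, send, poll, and delete cycle can be sketched with the AWS CLI. This assumes the CLI is installed and configured, which these notes do not cover; sqs_round_trip is a hypothetical helper name, and the queue name and message in the usage line are placeholders.

```shell
# sqs_round_trip NAME BODY [DELAY]: create a queue, send one delayed message,
# long-poll for it, and delete it.
sqs_round_trip() {
  local name="$1" body="$2" delay="${3:-0}"
  local url handle
  url=$(aws sqs create-queue --queue-name "$name" \
          --query QueueUrl --output text) || return 1
  aws sqs send-message --queue-url "$url" --message-body "$body" \
    --delay-seconds "$delay" >/dev/null || return 1
  # Long polling waits at most 20 seconds; a message whose delay is longer
  # than that will not be returned by this single poll.
  handle=$(aws sqs receive-message --queue-url "$url" --wait-time-seconds 20 \
             --query 'Messages[0].ReceiptHandle' --output text) || return 1
  if [ -z "$handle" ] || [ "$handle" = "None" ]; then
    echo "no message received from $url"
    return 1
  fi
  # Deleting requires the receipt handle from the receive call, not the message ID.
  aws sqs delete-message --queue-url "$url" --receipt-handle "$handle" || return 1
  echo "deleted one message from $url"
}

# Usage (placeholder names):
#   sqs_round_trip MyDemoQueue "hello from the CLI" 5
```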
2. Creating a CDN with video streaming in AWS.
Steps
Click on S3, give the bucket a unique name, select the region, and click the Create
button.
Open the CloudFront console and click Create Distribution, select the RTMP button,
select the S3 bucket for streaming video, select Continue, and then click Create Distribution
with the default values.
Upload a video to the S3 bucket, then select the video and make it public.
Paste the distribution URL and the video file name into the Amazon CloudFront Streaming
Diagnostic Client.
Click Play.
2. Creating a CDN with static web content in AWS.
Steps
Click on S3, give the bucket a unique name, select the region, and click the Create
button.
Open the CloudFront console and click Create Distribution, select the WEB button,
select the S3 bucket for static web content, select Continue, and then click Create Distribution
with the default values.
Upload the static content to the S3 bucket, then select the static content and make it public.
Create an HTML web page that references the static content through the distribution URL.
Open the web page; the static content is served in the page.
3. Check Elastic Beanstalk for deploying web-based applications in AWS.
Steps:
Create a web application in any web-based language.
Go to Services and, under Compute services, click Amazon Elastic Beanstalk.
Click the Get Started button, give the application a name, select the platform the
application runs on, choose how to upload the application code under Create a web project,
and then click the Create application button.
It then creates a platform to execute the application.
Click the URL of the machine to run your application.
Week 6 Study and Implement Cloud
Step 1: Log in to the AWS Management Console, then search for VPC and create a VPC.
Step 2: Configure two VPCs along with subnets.
Step 3: Configure the route tables, one as public and the other as private.
Step 4: Choose Peering Connections, and select Create Peering Connection.
Step 5: Configure the following information, and choose Create Peering Connection.
Peering connection name tag: Type the name of the VPC peering connection.
VPC (Requester): Select the VPC in your account with which you want to create the VPC
peering connection.
VPC (Accepter): Ensure My account is selected, and select the same region.
Step 6: In the confirmation dialog box, choose OK.
Step 7: Select the VPC peering connection that you have created.
Step 8: Choose Actions and select Accept Request.
Step 9: In the confirmation dialog box, select Yes, Accept.
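Steps 4 to 9 can also be sketched with the AWS CLI for two VPCs in the same account and region. This assumes the CLI is installed and configured, which these notes do not cover; peer_vpcs is a hypothetical helper name and the VPC IDs in the usage line are placeholders.

```shell
# peer_vpcs REQUESTER_VPC_ID ACCEPTER_VPC_ID: create a peering connection
# between two VPCs, accept it, and print the peering connection ID.
peer_vpcs() {
  local requester="$1" accepter="$2" pcx
  pcx=$(aws ec2 create-vpc-peering-connection \
          --vpc-id "$requester" --peer-vpc-id "$accepter" \
          --query 'VpcPeeringConnection.VpcPeeringConnectionId' --output text) || return 1
  # The request stays pending until the accepter side accepts it.
  aws ec2 accept-vpc-peering-connection \
    --vpc-peering-connection-id "$pcx" >/dev/null || return 1
  echo "$pcx"
}

# Usage (placeholder VPC IDs):
#   peer_vpcs vpc-0aaa1111 vpc-0bbb2222
```

Accepting the request only establishes the connection; each VPC's route table still needs a route to the other VPC's address range before traffic flows.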
Week 7 Building a “Hello world” app for the cloud by using AWS Lambda
Step 1. Log in to the AWS Lambda console.
Step 2. Go to Functions and click Create function.
Step 3. Select Author from scratch and give the basic information, such as function name, runtime, and path.
Step 4. Give the advanced settings and choose Create function.
Step 5. The editor shows the code of our program in the language selected as the runtime.
Step 6. Click Test.
Step 7. Give the event template name and the event name; if needed, change the key values,
then click Create.
Step 8. Click Test again; it shows the execution result.
The function has now been created.
Step 9. To delete the function, select it, go to Actions, and choose Delete.
Then click Delete to confirm.
Week 8
Steps
To create a Google App Engine (GAE) Java project (hello world example), run it
locally, and deploy it to a Google App Engine account.
1. Install Google Plugin for Eclipse
Follow the guide on how to install the Google Plugin for Eclipse. If you installed the Google
App Engine Java SDK together with the "Google Plugin for Eclipse", go to step 2.
Otherwise, get the Google App Engine Java SDK and extract it.
2. Create New Web Application Project
In the Eclipse toolbar, click on the Google icon, and select "New Web Application
Project…"
3. Hello World
Review the generated project directory.
4. Run it locally
Right-click on the project and run it as "Web Application". Eclipse console: //...
Access the URL http://localhost:8888/ to see the output,
and also the hello world servlet at http://localhost:8888/helloworld
5. Deploy to Google App Engine