
PROGRAM 7

i) Installation of Apache Pig.
ii) Write Pig Latin scripts to sort, group, join, project, and filter your data.

Installation of Apache Pig

First, download the latest version of Apache Pig from the official website:
https://pig.apache.org/

Step 1
Open the homepage of the Apache Pig website. Under the News section, click on the release
page link.

Step 2
Clicking this link redirects you to the Apache Pig Releases page. Under the Download section,
there are two links, namely Pig 0.8 and later and Pig 0.7 and before. Click the link Pig 0.8 and
later; you will then be redirected to a page with a set of download mirrors.

Step 3

Choose and click any one of these mirrors. The mirror takes you to the releases directory,
which contains the various versions of Apache Pig. Click the version you want to download.

Step 4

Within these folders, you will find the source and binary files of Apache Pig in various
distributions. Download the tar files of the source and binary distributions of Apache Pig 0.15.0,
pig-0.15.0-src.tar.gz and pig-0.15.0.tar.gz.

 Install Apache Pig

After downloading the Apache Pig software, install it in your Linux environment by
following the steps given below.

Step 1
Create a directory with the name Pig in the same directory where the installation directories
of Hadoop, Java, and other software were installed. (In this tutorial, we created the Pig
directory under the user named Hadoop.)

$ mkdir Pig

Step 2
Extract the downloaded tar files as shown below.

$ cd Downloads/
$ tar zxvf pig-0.15.0-src.tar.gz


$ tar zxvf pig-0.15.0.tar.gz

Step 3
Move the contents of the extracted pig-0.15.0 directory to the Pig directory created earlier, as
shown below.

$ mv pig-0.15.0/* /home/Hadoop/Pig/

Configure Apache Pig


After installing Apache Pig, we must configure it. To do so, we need to edit two files:
.bashrc and pig.properties.

.bashrc file
In the .bashrc file, set the following variables:

Set PIG_HOME to the Apache Pig installation folder, add Pig's bin folder to PATH, and point
PIG_CLASSPATH to the etc (configuration) folder of your Hadoop installation (the directory that
contains the core-site.xml, hdfs-site.xml and mapred-site.xml files).

export PIG_HOME=/home/Hadoop/Pig
export PATH=$PATH:/home/Hadoop/Pig/bin
export PIG_CLASSPATH=$HADOOP_HOME/conf

pig.properties file
In the conf folder of Pig, there is a file named pig.properties. In this file you can set various
parameters. The supported properties can be listed with the following command:

$ pig -h properties
Verifying the Installation
Verify the installation of Apache Pig by typing the version command. If the installation is
successful, you will get the version of Apache Pig as shown below.

$ pig -version
Apache Pig version 0.17.0 (r1797386) compiled Jun 02 2017, 15:41:58
Write Pig Latin scripts to sort, group, join, project, and filter your data.
Dataset Creation:

$ touch input1.csv
$ cat >> input1.csv
id,name,age
1,John,25
2,Alice,30
3,Bob,28
(press Ctrl+D to end the input)
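A minimal sketch of loading this file in a Pig script run in local mode (pig -x local); the field types are assumptions based on the columns above.

data = LOAD 'input1.csv' USING PigStorage(',') AS (id:int, name:chararray, age:int);
-- PigStorage does not skip the header line; it loads as a tuple with a null id and can be dropped with a FILTER
DUMP data;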

 SORT: Sorting in Pig Latin is done with the ORDER BY operator, which orders the tuples of
a relation on one or more fields. Here the records loaded from input1.csv are sorted by the
age field and the sorted relation is printed with DUMP.
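A sketch of such a sort, reusing the data relation loaded above:

-- order the records by the age column, smallest first
sorted_data = ORDER data BY age ASC;
DUMP sorted_data;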

 GROUP: The GROUP operator collects all tuples that share the same value of a chosen field
into a bag. Grouping the input1.csv records by age produces, for each distinct age, the group
key together with the bag of records having that age, which DUMP displays one group per line.
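A sketch, again reusing the data relation loaded earlier:

-- collect the records that share the same age into one group
grouped_data = GROUP data BY age;
DUMP grouped_data;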

 JOIN: The JOIN operator combines two relations on a common field. Joining the input1.csv
records with a second relation on the id field produces tuples containing the matching fields
from both inputs, one output tuple per matching pair.
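A sketch using a hypothetical second file, input2.csv, with id and city columns (the second file is assumed purely for illustration):

data2 = LOAD 'input2.csv' USING PigStorage(',') AS (id:int, city:chararray);
-- inner join of the two relations on the common id field
joined_data = JOIN data BY id, data2 BY id;
DUMP joined_data;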

 PROJECT: Projection in Pig Latin is done with the FOREACH ... GENERATE operator, which
selects (and optionally transforms) specific fields of a relation. Projecting only the name and
age columns of input1.csv yields a narrower relation that can then be sorted, grouped, joined,
or filtered like any other, so projection combines naturally with the other operations in this
program.
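A sketch of projecting two columns from the data relation:

-- keep only the name and age columns
projected_data = FOREACH data GENERATE name, age;
DUMP projected_data;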

 FILTER: The FILTER operator keeps only the tuples that satisfy a condition; after the
script runs, the resulting output contains only those records. You can customize the filter
condition based on your specific requirements, such as filtering on other attributes or using
more complex conditions. Pig Latin provides various operators and functions that can be used
within the FILTER clause to achieve different filtering tasks.
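A sketch that keeps only the people older than 26 (the threshold is an arbitrary illustration) and stores the result:

-- keep the rows whose age field exceeds 26 and write them out comma-separated
filtered_data = FILTER data BY age > 26;
STORE filtered_data INTO 'filtered_output' USING PigStorage(',');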

PROGRAM 8
 Run the Pig Latin scripts to find Word Count.
 Run the Pig Latin scripts to find the max temperature for every year.

Run the Pig Latin scripts to find Word Count

Word count input:-

Word count output:-
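A minimal Pig Latin word-count sketch; the input file name (wordcount_input.txt) and output directory are assumptions:

lines = LOAD 'wordcount_input.txt' AS (line:chararray);
-- split each line into words and flatten the resulting bag
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grouped = GROUP words BY word;
-- count how many times each word occurs
counts = FOREACH grouped GENERATE group AS word, COUNT(words) AS cnt;
STORE counts INTO 'wordcount_output';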

Run the Pig Latin scripts to find the max temperature for every year.

 Create temperature.csv and insert values into it.

 Then load the data into Pig.

 Output of the max temperature (a sketch of the script follows).
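A sketch of such a script, assuming temperature.csv has comma-separated year and temperature columns:

temps = LOAD 'temperature.csv' USING PigStorage(',') AS (year:int, temp:int);
-- group the readings by year and keep the highest temperature in each group
by_year = GROUP temps BY year;
max_temp = FOREACH by_year GENERATE group AS year, MAX(temps.temp) AS max_temp;
DUMP max_temp;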

PROGRAM 9

 Installation of HIVE
 Use Hive to create, alter, and drop databases, tables, views, functions and
indexes
Downloading Hive
We use hive-0.14.0 in this tutorial. You can download it by visiting the following link:
http://apache.petsads.us/hive/hive-0.14.0/. Let us assume it gets downloaded into the
/Downloads directory. Here, we download the Hive archive named "apache-hive-0.14.0-bin.tar.gz"
for this tutorial. The following command is used to verify the download:
$ cd Downloads
$ ls
On successful download, you get to see the following response:
apache-hive-0.14.0-bin.tar.gz

Installing Hive
The following steps are required for installing Hive on your system. Let us assume the Hive
archive is downloaded onto the /Downloads directory.

Extracting and verifying Hive Archive


The following commands are used to extract the Hive archive and verify the extraction:

$ tar zxvf apache-hive-0.14.0-bin.tar.gz


$ ls
On successful extraction, you get to see the following response:

apache-hive-0.14.0-bin apache-hive-0.14.0-bin.tar.gz

Copying files to the /usr/local/hive directory

We need to copy the files as the superuser (su -). The following commands are used to
copy the files from the extracted directory to the /usr/local/hive directory.

$ su -
passwd:
# cd /home/user/Downloads
# mv apache-hive-0.14.0-bin /usr/local/hive
# exit

Setting up environment for Hive


You can set up the Hive environment by appending the following lines to the ~/.bashrc file:

export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
export CLASSPATH=$CLASSPATH:/usr/local/Hadoop/lib/*:.
export CLASSPATH=$CLASSPATH:/usr/local/hive/lib/*:.

The following command is used to execute the ~/.bashrc file:

$ source ~/.bashrc

Configuring Hive
To configure Hive with Hadoop, you need to edit the hive-env.sh file, which is placed in the
$HIVE_HOME/conf directory. The following commands redirect to Hive config folder and
copy the template file:
$ cd $HIVE_HOME/conf
$ cp hive-env.sh.template hive-env.sh

Edit the hive-env.sh file by appending the following line:


export HADOOP_HOME=/usr/local/Hadoop

Creating and using a database, then creating a table and inserting values into it:
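A sketch of the HiveQL for this step; the database, table, and column names (company, employee, id, name, salary) are assumptions for illustration:

CREATE DATABASE IF NOT EXISTS company;
USE company;
CREATE TABLE employee (id INT, name STRING, salary DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
INSERT INTO TABLE employee VALUES (1, 'John', 45000.0), (2, 'Alice', 52000.0);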

Viewing a table:
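For example, assuming the employee table above:

SELECT * FROM employee;
DESCRIBE employee;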

Using ALTER to rename a table:
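For example, renaming the assumed employee table to emp:

ALTER TABLE employee RENAME TO emp;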

Creating a view from a table, then dropping the table and the database:
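A sketch, again with assumed table, view, and database names:

CREATE VIEW high_salary AS SELECT name, salary FROM employee WHERE salary > 50000;
DROP VIEW high_salary;
DROP TABLE employee;
DROP DATABASE company CASCADE;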

Using the count() function to count the total entries in a table:
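For example, on the assumed employee table:

SELECT COUNT(*) FROM employee;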

Fetching the maximum salary using the max() function:
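For example, assuming the employee table has a salary column:

SELECT MAX(salary) FROM employee;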

PROGRAM 10
Install HBase and perform CRUD operations using HBase Shell.

Installation of HBase:

1. Download the latest version of Apache HBase from the official website
(https://hbase.apache.org/), for example from https://dlcdn.apache.org/hbase/3.0.0-beta-1
(click hbase-3.0.0-beta-1-bin.tar.gz).

2. Extract the downloaded file to a directory of your choice.

3. Set the environment variable HBASE_HOME to the directory where you extracted the
files.

export HBASE_HOME="Desktop/hadoop-3.3.6/hbase/hbase-3.0.0-beta-1"
export PATH=$PATH:$HBASE_HOME/bin

4. Now create two folders, namely HBase_data and zookeeper:

mkdir HBase_data
mkdir zookeeper

5. With the HBase bin directory already on the PATH, configure HBase by editing the
conf/hbase-site.xml file so that hbase.rootdir and hbase.zookeeper.property.dataDir point at
the two folders created above:

<property>
<name>hbase.rootdir</name>
<value>/home/shivam/hbase_data</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/shivam/zookeeper</value>
</property>
6. Verify the installation by starting HBase and opening the HBase shell in the terminal:

bin/start-hbase.sh
hbase shell
bin/stop-hbase.sh

Performing CRUD operations using HBase Shell:

 Creating a table:
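For example, a hypothetical table named student with one column family, info:

create 'student', 'info'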

 Insert the data into the table:
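Putting a couple of cells into the assumed student table:

put 'student', '1', 'info:name', 'John'
put 'student', '1', 'info:age', '25'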

 Retrieve the data from the table:
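Reading a single row back:

get 'student', '1'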

 Update data in the table:
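In HBase an update is simply another put to the same row and column, which writes a newer version of the cell:

put 'student', '1', 'info:age', '26'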

 Delete data from the table:
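Deleting a single cell (deleteall would remove the whole row):

delete 'student', '1', 'info:age'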

 Scan the table:
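Scanning every row of the table:

scan 'student'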

 Disable and drop the table:
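A table must be disabled before it can be dropped:

disable 'student'
drop 'student'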

PROGRAM 11
Implement Spark Core processing with RDDs to run a Word Count program.
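A minimal PySpark sketch of an RDD word count, assuming an input text file named input.txt in the working directory:

from pyspark import SparkContext

sc = SparkContext("local", "WordCount")

# read the file, split each line into words, and count each word
counts = (sc.textFile("input.txt")
            .flatMap(lambda line: line.split())
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))

for word, count in counts.collect():
    print(word, count)

sc.stop()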

PROGRAM 12
Implement Spark Core processing with RDDs to read a table stored in a database and calculate
the number of people for every age.
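A sketch of one way to do this, reading the table through Spark's JDBC data source and then using the RDD API for the per-age count; the MySQL URL, credentials, and the people table with an age column are assumptions:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("AgeCount").getOrCreate()

# read the 'people' table over JDBC (the JDBC driver jar must be on the classpath)
people = (spark.read.format("jdbc")
          .option("url", "jdbc:mysql://localhost:3306/testdb")
          .option("dbtable", "people")
          .option("user", "root")
          .option("password", "password")
          .load())

# drop to the underlying RDD and count people per age
age_counts = (people.rdd
              .map(lambda row: (row["age"], 1))
              .reduceByKey(lambda a, b: a + b))

for age, count in age_counts.collect():
    print(age, count)

spark.stop()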
