Passing Parameters using File and Command Line
Other Functions and Passing Parameters Using Command Line or a File
Other Functions
• AVG (average):
grunt> grouped = group dataTransaction by CustomerName;
grunt> average = FOREACH grouped GENERATE group, AVG(dataTransaction.TransAmt1);
• COUNT: counts the tuples in a bag, skipping any tuple whose first field is NULL
grunt> cnt = foreach grouped generate group, COUNT(dataTransaction);
grunt> dump cnt;
• COUNT_STAR: counts every tuple, including those with NULL values (here $1 is the dataTransaction bag produced by the GROUP)
grunt> cntStar = foreach grouped GENERATE group, COUNT_STAR($1);
Rupak Roy
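The NULL handling above can be made concrete with a small plain-Python model of the three functions. This is a sketch, not Pig code. The helper names and the sample rows are hypothetical, None stands in for a Pig NULL, and the rows are assumed to be (TransID, CustomerName, TransAmt1) tuples, so the first field that COUNT inspects is TransID.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple

Row = Tuple[Any, ...]


def group_by(rows: List[Row], key_index: int) -> Dict[Any, List[Row]]:
    """Model of GROUP ... BY: map each key to the bag (list) of rows sharing it."""
    groups: Dict[Any, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row)
    return dict(groups)


def pig_avg(values: List[Optional[float]]) -> Optional[float]:
    """AVG skips nulls; this model returns None when nothing is left to average."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


def pig_count(bag: List[Row]) -> int:
    """COUNT skips tuples whose first field is null."""
    return sum(1 for row in bag if row[0] is not None)


def pig_count_star(bag: List[Row]) -> int:
    """COUNT_STAR counts every tuple, nulls included."""
    return len(bag)


# Hypothetical (TransID, CustomerName, TransAmt1) rows.
data_transaction = [
    (1, "Carl", 10.0),
    (None, "Carl", 20.0),
    (3, "Ann", None),
    (4, "Ann", 30.0),
]
grouped = group_by(data_transaction, 1)
for name, bag in sorted(grouped.items()):
    amounts = [row[2] for row in bag]
    print(name, pig_avg(amounts), pig_count(bag), pig_count_star(bag))
# Ann 30.0 2 2
# Carl 15.0 1 2
```

The Carl group shows the difference. COUNT drops the tuple whose first field is None, and COUNT_STAR keeps it.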
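• Concatenate (first load concat.csv into a relation; the alias con is only an example):
grunt> con = LOAD 'concat.csv' USING PigStorage(',');
grunt> c = foreach con GENERATE CONCAT($0,$1);
• Multiple concatenate (recent Pig versions accept more than two arguments; referring to Transaction_ID by name requires that con was loaded with a schema containing that field):
grunt> c = foreach con GENERATE CONCAT($0,'-',Transaction_ID);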
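• IsEmpty: checks whether a bag or map is empty, so its argument must be a bag or a map (for example, the bag produced by GROUP or COGROUP):
grunt> F = filter grouped by IsEmpty($1);
or, to keep only the non-empty ones:
grunt> F = filter grouped by NOT IsEmpty($1);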
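A plain-Python model of the two functions may help here. It is not Pig itself. CONCAT returns NULL as soon as any argument is NULL, and IsEmpty is true for a bag or map with no entries. The function names and sample values are hypothetical. Lists stand in for bags, dicts for maps, and None for NULL.

```python
from typing import Any, Dict, List, Optional, Tuple, Union


def pig_concat(*parts: Optional[str]) -> Optional[str]:
    """Model of CONCAT: join all arguments; any null argument makes the result null."""
    if any(part is None for part in parts):
        return None
    return "".join(parts)


def pig_is_empty(container: Union[List[Any], Dict[Any, Any]]) -> bool:
    """Model of IsEmpty: True for a bag (list) or map (dict) with no entries."""
    return len(container) == 0


print(pig_concat("Carl", "-", "T100"))  # Carl-T100
print(pig_concat("Carl", None))         # None

# Hypothetical (key, bag) records such as a COGROUP can produce; $1 is the bag.
records: List[Tuple[str, List[Tuple[Any, ...]]]] = [
    ("Carl", [(1, "Carl", 10.0)]),
    ("Ann", []),
]
non_empty = [rec for rec in records if not pig_is_empty(rec[1])]
print(non_empty)  # [('Carl', [(1, 'Carl', 10.0)])]
```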
• MAX/MIN
grunt> g = group dataTransaction by CustomerName;
grunt> m = foreach g generate group, MIN(dataTransaction.TransAmt1);
or
grunt> m = foreach g generate dataTransaction.CustomerName, MIN(dataTransaction.TransAmt1);
(dataTransaction.CustomerName returns a bag holding the name once per tuple in the group, whereas group returns it once.)
grunt> m = foreach g generate dataTransaction.CustomerName, MAX(dataTransaction.TransAmt1);
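The plain-Python sketch below is a model, not Pig code. It illustrates two points from the slide. First, MIN and MAX skip NULL values. Second, after a GROUP, projecting dataTransaction.CustomerName yields the name once per tuple in the bag, while group yields it once. The sample rows and helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, Optional[float]]  # hypothetical (CustomerName, TransAmt1)


def pig_min(values: List[Optional[float]]) -> Optional[float]:
    """MIN skips nulls; this model returns None if no value is left."""
    present = [v for v in values if v is not None]
    return min(present) if present else None


def pig_max(values: List[Optional[float]]) -> Optional[float]:
    """MAX skips nulls; this model returns None if no value is left."""
    present = [v for v in values if v is not None]
    return max(present) if present else None


rows: List[Row] = [("Carl", 10.0), ("Carl", None), ("Carl", 25.0), ("Ann", 7.5)]
g: Dict[str, List[Row]] = defaultdict(list)
for row in rows:
    g[row[0]].append(row)  # like GROUP dataTransaction BY CustomerName

for group, bag in sorted(g.items()):
    names = [r[0] for r in bag]    # like dataTransaction.CustomerName: one per tuple
    amounts = [r[1] for r in bag]  # like dataTransaction.TransAmt1
    print(group, names, pig_min(amounts), pig_max(amounts))
# Ann ['Ann'] 7.5 7.5
# Carl ['Carl', 'Carl', 'Carl'] 10.0 25.0
```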
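SIZE: returns the size of a value according to its Pig data type. For a chararray it is the number of characters, for a bytearray the number of bytes, for a tuple the number of fields, for a bag the number of tuples, for a map the number of key/value pairs, and for numeric types it is 1.
grunt> S = foreach dataTransaction generate SIZE($0), SIZE(CustomerName), SIZE($2);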
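The per-type rule can be sketched in plain Python. This is a model, not Pig's implementation. As an assumption of the sketch, Python types stand in for Pig types: int/float for numeric types, str for chararray, bytes for bytearray, tuple for a tuple, list for a bag, and dict for a map. The function name and sample row are hypothetical.

```python
from typing import Any, Optional


def pig_size(value: Any) -> Optional[int]:
    """Model of SIZE, dispatching on the Python stand-in for each Pig type."""
    if value is None:
        return None          # this model passes a null through as None
    if isinstance(value, (int, float)):
        return 1             # numeric types: always 1
    if isinstance(value, (str, bytes)):
        return len(value)    # chararray: characters; bytearray: bytes
    if isinstance(value, (tuple, list, dict)):
        return len(value)    # tuple: fields; bag: tuples; map: key/value pairs
    raise TypeError(f"no Pig type modelled for {type(value).__name__}")


row = (101, "Carl Jackson", b"\x01\x02")  # hypothetical ($0, CustomerName, $2)
print([pig_size(field) for field in row])  # [1, 12, 2]
```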
• SUM
grunt> g = group dataTransaction by CustomerName;
grunt> s = foreach g generate dataTransaction.CustomerName, SUM(dataTransaction.TransAmt1);
Note: SUM, MAX/MIN, COUNT, COUNT_STAR and AVG operate on a bag, so a GROUP statement (or GROUP ... ALL to aggregate the whole relation) must come before them.
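The note can be illustrated with a plain-Python model (not Pig code). An aggregate takes a bag, so the rows are first collected into bags. That can be one bag per customer, as with GROUP ... BY, or a single bag, as with GROUP ... ALL, whose group key in Pig is the string 'all'. SUM skips NULLs. The helper names and the sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, Optional[float]]  # hypothetical (CustomerName, TransAmt1)


def pig_sum(values: List[Optional[float]]) -> Optional[float]:
    """SUM skips nulls; this model returns None if no value is left."""
    present = [v for v in values if v is not None]
    return sum(present) if present else None


def group_by_name(rows: List[Row]) -> Dict[str, List[Row]]:
    """Model of GROUP ... BY CustomerName: one bag per customer."""
    bags: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        bags[row[0]].append(row)
    return dict(bags)


def group_all(rows: List[Row]) -> Dict[str, List[Row]]:
    """Model of GROUP ... ALL: a single bag under the key 'all'."""
    return {"all": list(rows)}


rows: List[Row] = [("Carl", 10.0), ("Carl", 5.0), ("Ann", None), ("Ann", 2.5)]
per_customer = {k: pig_sum([r[1] for r in bag]) for k, bag in group_by_name(rows).items()}
grand_total = pig_sum([r[1] for r in group_all(rows)["all"]])
print(per_customer)  # {'Carl': 15.0, 'Ann': 2.5}
print(grand_total)   # 17.5
```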
Flatten Operator
• FLATTEN changes the structure of tuples and bags by un-nesting them.
• For example, consider a tuple with the structure (a,(b,c)). GENERATE $0, FLATTEN($1) turns that tuple into (a,b,c).
• Similarly, for a tuple of the form (a,{(b,c),(d,e)}), such as one produced by the GROUP operator, GENERATE $0, FLATTEN($1) gives two tuples: (a,b,c) and (a,d,e).
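Here is a plain-Python model of applying GENERATE with FLATTEN on one field of a single row. It is a sketch, not Pig's implementation. As assumptions of the sketch, a Python tuple stands for a Pig tuple and a list of tuples stands for a bag, and other values are left unchanged. It also shows a consequence of the bag rule: a row whose bag is empty produces no output rows.

```python
from typing import Any, List, Tuple


def flatten_field(row: Tuple[Any, ...], index: int) -> List[Tuple[Any, ...]]:
    """Model of GENERATE with FLATTEN applied to field `index` of one row."""
    before, field, after = row[:index], row[index], row[index + 1:]
    if isinstance(field, tuple):
        # Tuple: splice its fields in place, giving exactly one output row.
        return [before + field + after]
    if isinstance(field, list):
        # Bag: one output row per inner tuple (none at all for an empty bag).
        return [before + inner + after for inner in field]
    return [row]  # any other value is left as it is in this model


print(flatten_field(("a", ("b", "c")), 1))                # [('a', 'b', 'c')]
print(flatten_field(("a", [("b", "c"), ("d", "e")]), 1))  # [('a', 'b', 'c'), ('a', 'd', 'e')]
print(flatten_field(("a", []), 1))                        # []
```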
Run Pig Scripts directly from a file
First create a file with a .pig extension by typing vi output.pig in the terminal.
Then write the Pig script:
A = LOAD '/home/hduser/datasets/store.csv' USING PigStorage(',');
B = foreach A generate $0,$2;
DUMP B;
Save the file as output.pig (any name with a .pig extension works) and execute it from a terminal:
[bob@localhost ~]$ pig -x local /home/hduser/output.pig
Note: to run against HDFS (MapReduce mode, the default), type just pig; for local mode, type pig -x local.
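If you launch scripts from another program, the command above can be assembled as an argument list. The Python sketch below is an illustration with a hypothetical helper name. It only builds and prints the command, and it assumes the pig launcher is on the PATH when you actually run it (for example with subprocess.run(cmd, check=True)).

```python
import shlex
from typing import List


def pig_command(script: str, local: bool) -> List[str]:
    """Build the argv for running a Pig script: add '-x local' for local mode,
    omit -x to use the default MapReduce (HDFS) mode."""
    cmd = ["pig"]
    if local:
        cmd += ["-x", "local"]
    cmd.append(script)
    return cmd


print(shlex.join(pig_command("/home/hduser/output.pig", local=True)))
# pig -x local /home/hduser/output.pig
print(shlex.join(pig_command("/home/hduser/output.pig", local=False)))
# pig /home/hduser/output.pig
```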
Pig gives you two options to pass parameters:
1. Using a file: -param_file followed by the path to the parameter file.
2. Using the command line: -p (or -param) followed by a key-value pair of the form param=val.
Passing Parameters
USING COMMAND LINE:
Create a new file:
vi output1.pig
A = LOAD '/home/hduser/datasets/store.csv' USING PigStorage(',') AS ( );
B = FILTER A BY Place == '$Place';
DUMP B;
(The schema inside AS ( ) is left to you; it must declare a Place field for the FILTER to work.)
Save the file as output1.pig (or any other name) and execute it from the terminal:
[bob@localhost ~]$ pig -x local -p Place='Alberta' output1.pig
To pass multiple parameters, repeat -p for each one:
pig -x local -p Place='Alberta' -p Age='29' -p Product='electronics' output1.pig
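Conceptually, Pig replaces each $name in the script with the supplied value before the script is parsed. The plain-Python sketch below models that step. It is not Pig's preprocessor, and the helper names are hypothetical. Pig parameter names start with a letter or underscore, so positional references such as $0 and $2 are left alone. This model raises KeyError for a parameter that was not supplied.

```python
import re
from typing import Dict, List

# A parameter reference: $ followed by an identifier (letter or underscore first).
PARAM = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def parse_p_args(pairs: List[str]) -> Dict[str, str]:
    """Turn -p values such as 'Place=Alberta' into a dict."""
    params: Dict[str, str] = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected param=val, got {pair!r}")
        params[key] = value
    return params


def substitute(script: str, params: Dict[str, str]) -> str:
    """Replace every $name with its value; a missing name raises KeyError."""
    return PARAM.sub(lambda m: params[m.group(1)], script)


params = parse_p_args(["Place=Alberta", "Age=29", "Product=electronics"])
print(substitute("B = FILTER A BY Place == '$Place';", params))
# B = FILTER A BY Place == 'Alberta';
print(substitute("B = foreach A generate $0,$2;", params))
# B = foreach A generate $0,$2;
```

The shell strips the quotes in Place='Alberta', so the argument modelled here is Place=Alberta.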
USING FILE:
Create a parameter file with vi pfile and type i to enter insert mode.
Then type one name = value pair per line:
CustomerName = 'Carl Jackson'
threshold = 5
To leave insert mode, press Esc, then type :wq! to save the contents.
Run the script with the parameter file:
pig -param_file pfile /home/hduser/displayoutput.pig
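A parameter file holds one name = value pair per line. The plain-Python sketch below models reading such a file and assembling the matching command. It is an illustration with hypothetical helper names. It treats lines starting with # as comments, and it keeps each value exactly as written, quotes included, rather than guessing how a given Pig version interprets quotes.

```python
import shlex
from typing import Dict, List


def parse_param_file(text: str) -> Dict[str, str]:
    """Read 'name = value' lines, skipping blank lines and '#' comments."""
    params: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a name = value line: {raw!r}")
        params[name.strip()] = value.strip()
    return params


def pig_param_file_command(param_file: str, script: str) -> List[str]:
    """argv for: pig -param_file <file> <script>"""
    return ["pig", "-param_file", param_file, script]


pfile = "CustomerName = 'Carl Jackson'\nthreshold = 5\n"
print(parse_param_file(pfile))
# {'CustomerName': "'Carl Jackson'", 'threshold': '5'}
print(shlex.join(pig_param_file_command("pfile", "/home/hduser/displayoutput.pig")))
# pig -param_file pfile /home/hduser/displayoutput.pig
```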
Next
• Flume, a distributed, reliable tool for collecting large amounts of streaming data.