Writing Perl Scripts

Boyce Thompson Institute for Plant
            Research
           Tower Road
  Ithaca, New York 14853-1801
             U.S.A.

             by
 Aureliano Bombarely Gomez
Writing Perl Scripts:
   1. Four mandatory lines.

   2. Useful modules I: Files.

   3. Useful modules II: Options.

   4. Documentation and being verbose.

   5. Exercise: Assembly stats.
1. Four mandatory lines.


   1.LINE:            #!/usr/bin/perl
      Where ?         On the first line of the script.
      Why ?           It tells the operating system which program
                      to use to execute the script.

   2.LINE:            use warnings;
      Where ?         Before declaring modules and variables
                      (the sooner the better).
      Why ?           It prints warnings about suspicious code, both at
                      compile time and at run time.

   3.LINE:            use strict;
      Where ?         Before declaring modules and variables
                      (the sooner the better).
      Why ?           It rejects unsafe constructs (for example undeclared
                      variables) and will not let such scripts run.
   4.LINE:            1;
      Where ?         At the end of the script.
      Why ?           It makes the file return a true value. This is
                      required in modules loaded with "use" or "require";
                      in a standalone script it is optional but harmless.


        #!/usr/bin/perl

        use strict;
        use warnings;

        ###########
        ## MY CODE
        ###########

        1;
2. Useful modules I: Files


   JUST A REMINDER: How to open, read, write and close files.

   1. OPEN FUNCTION.

          open (FILEHANDLE, MODE, REFERENCE);


          FILEHANDLE: an undefined scalar variable (autovivified by open).

          MODE:     read, input only:               <
                    write, output only:             >
                    append to a file:               >>
                    read/write update access:       +<
                    write/read update access:       +>   (truncates the file first)
                    read/append update access:      +>>

          REFERENCE: filename (or reference) to open.
   Example:

          open (my $ifh, '<', $input_filename);
          open (my $ofh, '>', $output_filename);


       SUGGESTION: "use autodie;" instead of "or die("my error")".
       Without autodie, check every open by hand:

          open (my $ifh, '<', $input_filename)
             or die("ERROR OPENING FILE: $!");
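The two error-handling styles above can be compared side by side. This is a minimal sketch: the helper names count_lines_checked and count_lines_autodie and the temporary file are hypothetical, introduced only for illustration.

```perl
#!/usr/bin/perl

use strict;
use warnings;
use File::Temp qw(tempfile);

## Style 1: check open() by hand with "or die".
sub count_lines_checked {
    my ($filename) = @_;
    open(my $ifh, '<', $filename)
        or die("ERROR OPENING FILE $filename: $!\n");
    my $n = 0;
    $n++ while (<$ifh>);
    close($ifh);
    return $n;
}

## Style 2: let autodie raise the error for us.
sub count_lines_autodie {
    my ($filename) = @_;
    use autodie;    ## lexically scoped: active until the end of this sub
    open(my $ifh, '<', $filename);
    my $n = 0;
    $n++ while (<$ifh>);
    close($ifh);
    return $n;
}

## Demo on a temporary file (hypothetical data).
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print $tmp_fh "line one\nline two\n";
close($tmp_fh);

print count_lines_checked($tmp_name), "\n";
print count_lines_autodie($tmp_name), "\n";

1;
```

With autodie every failed open, close and similar call throws an exception, so no check can be forgotten. The manual style allows a custom message for each call.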
   2. READING OPENED FILES.
          while (<$ifh>) {
              ## BLOCK USING $_ as LINE (don't forget chomp)
          }


       SUGGESTION: "Know the status of the file"

          ## Note: this loads the whole file into memory.
          my @filelines = <$ifh>;
          my $L = scalar(@filelines);
          my $l = 0;

          foreach my $line (@filelines) {
              $l++;
              print STDERR "Reading line $l of $L lines\r";
          }
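A runnable version of the progress counter might look like the sketch below. The function name read_with_progress is hypothetical, and an in-memory filehandle stands in for a real input file.

```perl
use strict;
use warnings;

## Read all lines from an open filehandle, reporting progress on STDERR.
## Returns a reference to an array of chomped lines.
sub read_with_progress {
    my ($fh, $verbose) = @_;
    my @filelines = <$fh>;              ## loads the whole file into memory
    my $total = scalar(@filelines);
    my $count = 0;
    my @clean;
    foreach my $line (@filelines) {
        $count++;
        chomp($line);
        push @clean, $line;
        ## \r returns to the start of the line, so the counter is overwritten
        print STDERR "Reading line $count of $total lines\r" if $verbose;
    }
    print STDERR "\n" if $verbose && $total;
    return \@clean;
}

## Demo: an in-memory filehandle with hypothetical data.
my $data = "ACGT\nTTGA\nCCA\n";
open(my $ifh, '<', \$data)
    or die("ERROR OPENING IN-MEMORY FILE: $!\n");
my $lines = read_with_progress($ifh, 1);
close($ifh);

print scalar(@$lines), " lines read\n";
```

Reading everything first gives the total needed for the "of N lines" message. For very large files the while loop is the safer choice because it keeps only one line in memory.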
   3. WRITING TO OPENED FILES.

          print $ofh "This is printed to the file\n";


   4. CLOSING FILES.

          close($ofh);
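Writing and closing can be put together as in the following sketch, which also shows the difference between the '>' and '>>' modes. The helpers write_lines and count_lines and the file name example.txt are hypothetical.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

## Write lines to a file using the given mode ('>' or '>>').
sub write_lines {
    my ($mode, $filename, @lines) = @_;
    open(my $ofh, $mode, $filename)
        or die("ERROR OPENING FILE $filename: $!\n");
    print {$ofh} "$_\n" foreach @lines;
    ## close can fail too (for example on a full disk), so check it.
    close($ofh)
        or die("ERROR CLOSING FILE $filename: $!\n");
    return;
}

## Count the lines of a file.
sub count_lines {
    my ($filename) = @_;
    open(my $ifh, '<', $filename)
        or die("ERROR OPENING FILE $filename: $!\n");
    my @lines = <$ifh>;
    close($ifh);
    return scalar(@lines);
}

## Demo in a temporary directory that is removed at exit.
my $dir     = tempdir(CLEANUP => 1);
my $outfile = File::Spec->catfile($dir, 'example.txt');

write_lines('>',  $outfile, 'first line', 'second line');   ## create/overwrite
write_lines('>>', $outfile, 'third line');                  ## append

print count_lines($outfile), " lines written\n";
```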
   a) File::Basename;

       Parse file paths into directory, filename and suffix.


           use File::Basename;

           my ($name, $path, $suffix) = fileparse($fullname, @suffixlist);

           ## In scalar context fileparse returns only the filename.
           $name = fileparse($fullname, @suffixlist);

           my $basename = basename($fullname, @suffixlist);

           my $dirname = dirname($fullname);
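A short sketch of these calls on a concrete path. The path is hypothetical and the results in the comments assume a Unix-like system.

```perl
use strict;
use warnings;
use File::Basename;

my $fullname = '/home/user/data/contigs.fasta';    ## hypothetical path

## fileparse treats each suffix as a regular expression;
## qr/\.[^.]*/ matches any final ".something".
my ($name, $path, $suffix) = fileparse($fullname, qr/\.[^.]*/);
print "name=$name path=$path suffix=$suffix\n";
## name=contigs path=/home/user/data/ suffix=.fasta

## basename takes literal suffixes.
my $basename = basename($fullname, '.fasta');      ## contigs
my $dirname  = dirname($fullname);                 ## /home/user/data
print "basename=$basename dirname=$dirname\n";

## Typical use: build an output name next to the input file.
my $outname = $path . $name . '.stats.txt';
print "output=$outname\n";
```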
   b) File::Spec;

       Portable operations on filenames.


          use File::Spec;


          my $curdir = File::Spec->curdir();
          my $tmpdir = File::Spec->tmpdir();

          my $path = File::Spec->catfile($curdir, $filename);
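The sketch below exercises the same methods plus rel2abs, which is also part of File::Spec. The directory and file names are hypothetical.

```perl
use strict;
use warnings;
use File::Spec;

my $curdir = File::Spec->curdir();    ## "." on Unix-like systems
my $tmpdir = File::Spec->tmpdir();    ## a writable temporary directory

## catfile joins the pieces with the separator of the current system.
my $relative = File::Spec->catfile('data', 'reads', 'sample.fastq');
my $absolute = File::Spec->rel2abs($relative);
my $scratch  = File::Spec->catfile($tmpdir, 'assembly_stats.tmp');

print "current dir : $curdir\n";
print "relative    : $relative\n";
print "absolute    : $absolute\n";
print "scratch file: $scratch\n";
```

Building paths this way avoids hard-coded "/" separators, so the same script can run on systems with different path conventions.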
3. Useful modules II: Options


   Usual way to pass options: Using @ARGV

          user@comp$ myscript.pl argument1 argument2



          #!/usr/bin/perl

          use strict;
          use warnings;
          use autodie;

          my ($arg1, $arg2) = @ARGV;


          1;
   PROBLEM:

      With multiple arguments, positional order becomes confusing.

      Mandatory arguments are difficult to check !!!

   SOLUTION:

      Use the modules Getopt::Std or Getopt::Long
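For the Getopt::Long alternative, a minimal sketch follows. The option names (input, output, verbose, help) and the function parse_options are assumptions chosen to mirror the switches used in these slides, and GetOptionsFromArray is used so the demo does not depend on the real command line.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

## Parse a list of arguments and check the mandatory ones.
sub parse_options {
    my @args = @_;
    my %opt  = (verbose => 0, help => 0);
    GetOptionsFromArray(
        \@args, \%opt,
        'input|i=s',     ## =s means the option expects a string value
        'output|o=s',
        'verbose|V',     ## no "=": a simple on/off switch
        'help|H',
    ) or die("ERROR: invalid command-line options.\n");

    return \%opt if $opt{help};    ## help needs no other arguments

    foreach my $required (qw(input output)) {
        die("ERROR: --$required was not supplied.\n")
            unless defined $opt{$required};
    }
    return \%opt;
}

## Demo with a hypothetical command line.
my $opt = parse_options('-i', 'contigs.fasta', '--output', 'stats.txt', '-V');
print "input=$opt->{input} output=$opt->{output} verbose=$opt->{verbose}\n";
```

In a real script the call would be GetOptions(\%opt, ...), which reads @ARGV directly.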
   Getopt::Std;

      Process single-character switches from the command line

          user@comp$ myscript.pl -i argument1 -o argument2 -V -H

          use Getopt::Std;

          our ($opt_i, $opt_o, $opt_V, $opt_H);
          getopts('i:o:VH');

          ## i: and o: expect something after the switch.
          my $input = $opt_i || die("ERROR: -i <input> was not supplied.\n");
          my $output = $opt_o || die("ERROR: -o <output> was not supplied.\n");

          ## V and H don't expect anything after the switch.
          if ($opt_H) {
               print $help;    ## $help holds the usage text
          }
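Getopt::Std can also store the switches in a hash instead of the $opt_x package variables. A minimal sketch, with the hypothetical helper parse_switches wrapping the call so that it can be tried on a sample argument list:

```perl
use strict;
use warnings;
use Getopt::Std;

## Parse single-character switches into a hash.
sub parse_switches {
    local @ARGV = @_;    ## getopts always reads @ARGV
    my %opts;
    getopts('i:o:VH', \%opts)
        or die("ERROR: unknown switch.\n");
    return \%opts;
}

## Demo with a hypothetical command line.
my $opts = parse_switches('-i', 'contigs.fasta', '-o', 'stats.txt', '-V');

my $input  = $opts->{i} || die("ERROR: -i <input> was not supplied.\n");
my $output = $opts->{o} || die("ERROR: -o <output> was not supplied.\n");

print "input=$input output=$output\n";
print "verbose is on\n" if $opts->{V};
```

The hash form keeps all options in one lexical variable, which works cleanly under "use strict" without any "our" declarations.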
4. Documentation and being verbose


   Three types of documentation:

      1) Document code with #.
            GOOD: Useful for developers.
            BAD: Inaccessible to users unless they open the script.

      2) Document using perldoc.
            GOOD: Clear and formatted information.
            BAD: perldoc is not always installed on the system.

      3) Document using a print function inside the script.
            GOOD: Usually easy to access. Intuitive.
            BAD: It increases the size of your script.
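Option 2 relies on POD (Plain Old Documentation) blocks embedded in the script. The sketch below is one possible layout, not the author's: the script name and options are taken from the examples in these slides, and the use of the core module Pod::Usage to print the POD on -H is an assumption added for illustration.

```perl
#!/usr/bin/perl

use strict;
use warnings;
use Getopt::Std;
use Pod::Usage;

=head1 NAME

myscript.pl - My program description.

=head1 SYNOPSIS

  myscript.pl [-H] [-V] -i <input> -o <output>

=head1 OPTIONS

=over 4

=item B<-i> I<input>

Input file (mandatory).

=item B<-o> I<output>

Output file (mandatory).

=item B<-V>

Be verbose.

=item B<-H>

Print help.

=back

=cut

my %opts;
getopts('i:o:VH', \%opts) or pod2usage(-exitval => 2);

## -H prints the SYNOPSIS and OPTIONS sections above, then exits.
pod2usage(-verbose => 1, -exitval => 0) if $opts{H};

1;
```

Running "perldoc myscript.pl" shows the same text as a formatted manual page, so the documentation is written only once.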
   Documenting through a function;
         sub help {

             ## <<~ strips the indentation (Perl 5.26+); on older versions
             ## use <<EOF and put the closing EOF at the start of the line.
             print STDERR <<~EOF;
             $0:
                 Description:
                     My program description.

                 Synopsis:
                    myscript.pl [-H] [-V] -i <input> -o <output>

                 Arguments:
                    -i <input>        input file (mandatory)
                    -o <output>       output file (mandatory)
                    -H                print help.
                    -V                be verbose

             EOF
             exit(1);
         }
   Calling help;


        use Getopt::Std;

        our ($opt_i, $opt_o, $opt_V, $opt_H);
        getopts('i:o:VH');

        ## V and H don't expect anything after the switch.
        ## Check -H first, so help works without -i and -o.
        if ($opt_H) {
             help();
        }

        ## i: and o: expect something after the switch.
        my $input = $opt_i || die("ERROR: -i <input> was not supplied.\n");
        my $output = $opt_o || die("ERROR: -o <output> was not supplied.\n");
   Being verbose;

        use Getopt::Std;

        our ($opt_i, $opt_o, $opt_V, $opt_H);
        getopts('i:o:VH');

        ## i: and o: expect something after the switch.
        my $input = $opt_i || die("ERROR: -i <input> was not supplied.\n");
        my $output = $opt_o || die("ERROR: -o <output> was not supplied.\n");

        if ($opt_V) {
             my $date = `date`;
             chomp($date);
             print STDERR "Step 1 [$date]:\n\tParsing -i $input file.\n";
        }
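When a script has several steps, the verbose message can be wrapped in a small helper. This is a sketch under assumptions: the function names format_step and verbose_msg are hypothetical, and POSIX strftime replaces the external date command so the script does not depend on the shell.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

## Build the message for one step, with a timestamp.
sub format_step {
    my ($step, $message) = @_;
    my $date = strftime('%Y-%m-%d %H:%M:%S', localtime);
    return "Step $step [$date]:\n\t$message\n";
}

## Print the message on STDERR only when verbose mode is on.
sub verbose_msg {
    my ($verbose, $step, $message) = @_;
    print STDERR format_step($step, $message) if $verbose;
    return;
}

## Demo: $verbose would normally come from the -V switch.
my $verbose = 1;
my $input   = 'contigs.fasta';    ## hypothetical input file

verbose_msg($verbose, 1, "Parsing -i $input file.");
verbose_msg($verbose, 2, "Calculating stats.");
```

Printing to STDERR keeps the progress messages separate from the real output on STDOUT, so the results can still be redirected to a file.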
   Being verbose while reading a file;



         my @filelines = <$ifh>;
         my $L = scalar(@filelines);
         my $l = 0;

         foreach my $line (@filelines) {
             $l++;
             if ($opt_V) {
                  print STDERR "Reading line $l of $L lines\r";
             }
         }
5. Exercise: Assembly Stats


    GOAL: Create a script to calculate:

       1) Number of sequences in a file.

       2) Total bp in a file.

       3) Longest sequence.

       4) Shortest sequence.

       5) Average length and standard deviation (SD).

       6) N25, N50, N75, N90, N95 (lengths and indexes)
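As a starting point for items 1 to 5, here is a minimal sketch. It assumes the input is in FASTA format (the slides do not name a format) and uses the population formula for the SD. The function names fasta_lengths and basic_stats and the demo sequences are hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(sum max min);

## Return the length of every sequence read from a FASTA filehandle.
sub fasta_lengths {
    my ($fh) = @_;
    my @lengths;
    while (my $line = <$fh>) {
        chomp($line);
        if ($line =~ /^>/) {
            push @lengths, 0;               ## header: a new sequence starts
        }
        elsif (@lengths) {
            $line =~ s/\s+//g;              ## ignore stray whitespace
            $lengths[-1] += length($line);
        }
    }
    return @lengths;
}

## Count, total, longest, shortest, mean and SD of a list of lengths.
sub basic_stats {
    my @lengths = @_;
    my $n = scalar(@lengths);
    die("ERROR: no sequences found.\n") unless $n;
    my $total = sum(@lengths);
    my $mean  = $total / $n;
    ## Population variance (divide by n); use n - 1 for the sample SD.
    my $var   = sum(map { ($_ - $mean) ** 2 } @lengths) / $n;
    return {
        count    => $n,
        total_bp => $total,
        longest  => max(@lengths),
        shortest => min(@lengths),
        mean     => $mean,
        sd       => sqrt($var),
    };
}

## Demo with an in-memory FASTA "file" (hypothetical data).
my $fasta = ">seq1\nACGTACGT\nAC\n>seq2\nACGT\n>seq3\nACGTAC\n";
open(my $ifh, '<', \$fasta)
    or die("ERROR OPENING IN-MEMORY FILE: $!\n");
my $stats = basic_stats(fasta_lengths($ifh));
close($ifh);

printf("%-9s %s\n", $_, $stats->{$_}) foreach sort keys %$stats;
```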
          Just a reminder for 6):

             N50 length: sort the sequences by decreasing length and add
                 them up until the running total reaches 50% of the size
                 of the file (in bp). The N50 length is the length of
                 the last (shortest) sequence added.

             N50 index is the number of sequences needed to reach that
                 50% of the size of the file (in bp).
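The definition translates almost directly into code. The sketch below is one possible implementation, with a hypothetical function name nxx and made-up lengths; it generalizes N50 to any percentage.

```perl
use strict;
use warnings;

## Return (length, index) of the Nxx statistic for a list of lengths.
## $percent is 50 for N50, 90 for N90, and so on.
sub nxx {
    my ($percent, @lengths) = @_;
    my @sorted = sort { $b <=> $a } @lengths;    ## decreasing length
    my $total = 0;
    $total += $_ foreach @sorted;
    my $threshold = $total * $percent / 100;

    my ($running, $index) = (0, 0);
    foreach my $length (@sorted) {
        $running += $length;
        $index++;
        ## The first sequence that reaches the threshold gives the answer.
        return ($length, $index) if $running >= $threshold;
    }
    return (undef, 0);    ## only reached for an empty list
}

## Demo: six hypothetical sequences, 290 bp in total.
my @lengths = (40, 80, 20, 70, 30, 50);

foreach my $percent (25, 50, 75, 90, 95) {
    my ($length, $index) = nxx($percent, @lengths);
    print "N$percent length=$length index=$index\n";
}
## N50: 50% of 290 is 145; 80 + 70 = 150 reaches it,
## so the N50 length is 70 and the N50 index is 2.
```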
