Perl Notes

Perl is a versatile high-level programming language known for its text manipulation capabilities, making it popular among various professionals including system administrators and web developers. It features three primary data types: scalars, arrays, and hashes, each with specific uses and syntax. The Comprehensive Perl Archive Network (CPAN) provides a wealth of resources and modules for Perl programming, enhancing its functionality and ease of use.

Uploaded by

sanbioinfo

What is Perl?

Perl is a high-level programming language with an eclectic heritage written by Larry Wall and a
cast of thousands. It derives from the ubiquitous C programming language and to a lesser
extent from sed, awk, the Unix shell, and at least a dozen other tools and languages. Perl's
process, file, and text manipulation facilities make it particularly well-suited for tasks involving
quick prototyping, system utilities, software tools, system management tasks, database access,
graphical programming, networking, and web programming. These strengths make it especially
popular with system administrators and web developers, but mathematicians, geneticists,
journalists, and even managers also use Perl. Maybe you should, too.

A good starting point for Perl information is http://www.perl.org/

Edit—Run—Revise (and Save)

The most important thing about programming is that it's a hands-on learning activity, like
dancing, playing music, cooking, or some other family-oriented activity. You can read about it,
but you can't really do it until you actually do it.
While learning to program in Perl, you need to read about how Perl works, as you will in the
chapters that follow. You also need to look at plenty of examples of programs. But you
especially need to attempt to write your own programs, as you are asked to do in the exercises
at the end of the later chapters. Only this kind of direct experience will make you a
programmer.

So I want to give you an overview of the most important tasks involved in writing programs, to
help you approach your first programs with a clearer idea of what's really involved.

What exactly will you be doing at the computer? The bulk of a programmer's work involves the
steps of writing or revising a program in an editor, then running the program and watching how
it behaves, and on the basis of that behavior going back and revising the program again. A
typical programmer spends more than half of his or her time editing the program.

Ease of Programming
Computer languages differ in which things they make easy. By "easy" I mean easy for a
programmer to program. Perl has certain features that simplify several common
bioinformatics tasks. It can deal with information in ASCII text files or flat files, which are exactly
the kinds of files in which much important biological data appears, in the GenBank and PDB
databases, among others. It makes it easy to process and manipulate long sequences such as DNA and
proteins. Perl makes it convenient to write a program that controls one or more other
programs. As a final example, Perl is used to put biology research labs, and their results, on
their own dynamic web sites. Perl does all this and more.
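To make the point about long sequences concrete, here is a minimal sketch that computes the reverse complement of a DNA string and counts its G and C bases using only built-ins (reverse and tr///). The sequence is made up for illustration, and the sketch assumes a plain string of A, C, G and T with no ambiguity codes. It uses use strict and my declarations, which Perl does not require but which catch many typos.

```perl
use strict;
use warnings;

# A made-up DNA sequence, used only for illustration.
my $dna = "ATGCGTACGTTAGC";

# Reverse the string, then swap each base for its complement with tr///.
my $revcomp = reverse $dna;
$revcomp =~ tr/ACGTacgt/TGCAtgca/;

# tr/// also returns the number of characters it matched; with an empty
# replacement list it only counts and leaves $dna unchanged.
my $gc_count = ($dna =~ tr/GCgc//);

print "Reverse complement: $revcomp\n";
print "GC count: $gc_count\n";
```

The same two built-ins work unchanged on sequences that are millions of bases long, which is part of why Perl became popular for this kind of work.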

Interpreters
An interpreter normally means a computer program that executes, i.e. performs, instructions
written in a programming language. An interpreter may be a program that either
1. executes the source code directly.
2. translates source code into some efficient intermediate representation (code) and
immediately executes this.
3. explicitly executes stored precompiled code made by a compiler which is part of the
interpreter system.
Perl, Python, MATLAB, and Ruby are examples of type 2, while UCSD Pascal and Java are type 3:
Source programs are compiled ahead of time and stored as machine independent code, which
is then linked at run-time and executed by an interpreter and/or compiler (for JIT systems).
Some systems, such as Smalltalk, BASIC and others, may also combine 2 and 3. While
interpreting and compiling are the two main means by which programming languages are
implemented, these are not fully distinct categories, one of the reasons being that most
interpreting systems also perform some translation work, just like compilers. The terms
"interpreted language" or "compiled language" merely mean that the canonical
implementation of that language is an interpreter or a compiler; a high level language is
basically an abstraction which is (ideally) independent of particular implementations.

Comprehensive Perl Archive Network

CPAN is the Comprehensive Perl Archive Network, a large collection of Perl software and
documentation. You can begin exploring from either http://www.cpan.org/ or any of the mirrors
listed at http://www.cpan.org/SITES.html.

CPAN is also the name of a Perl module, CPAN.pm, which is used to download and install Perl
software from the CPAN archive. These notes cover only a little about the CPAN module; you
can find its documentation by running perldoc CPAN on the command line, or on the web
at http://search.cpan.org/dist/CPAN/lib/CPAN.pm.

Perl Variables

Perl variables, and the techniques for handling them, are an important part of the Perl language.
As a scripting language, Perl was designed to handle large amounts of text data. Working with
variables is fairly straightforward, because it is not necessary to declare and allocate them, and
no special technique is needed to release the memory they occupy.

In general, Perl variable names consist of letters, digits and the underscore (_) character, and
they are case sensitive.

A distinctive feature of the language is that variable names carry a non-alphabetic prefix, which
makes the language look somewhat cryptic. The advantage, however, is that the prefix immediately
tells you the type of the variable and what can be done with it. Thus, according to the first
character of the variable name, we have:

scalar variables – starting with $

array variables – starting with @

hashes or associative arrays – starting with %

The $, @ and % prefixes determine the variable type in Perl. The Perl language also offers
some built-in predefined variables that help shorten your code.
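A short sketch of how the prefix selects the variable type. As an illustration only, it reuses one name for all three types to show that $name, @name and %name are three separate variables; in real code, distinct names are clearer. It uses use strict and my declarations, which are optional in Perl.

```perl
use strict;
use warnings;

# $name, @name and %name are three different variables that can coexist.
my $name = "Perl";                              # scalar: a single value
my @name = ("scalars", "arrays", "hashes");     # array: an ordered list
my %name = ("creator", "Larry Wall");           # hash: key/value pairs

print "$name\n";            # the scalar
print "$name[1]\n";         # one element of @name (note the $ prefix)
print "$name{creator}\n";   # one value from %name (note the $ prefix)
```

Notice that a single element of an array or hash is itself a scalar, so it is written with $ followed by [ ] or { }.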

Below are the most important characteristics of the three types of Perl variables and some
examples of their use.

Scalar Perl variables.

Scalars are simple variables that can contain a single element: a string, a number or a reference
to an object. Strings are sequences of characters, which may include any symbol, letter or digit.
Numbers may be integers or decimal values, and may be written with exponents. A reference is a
scalar value that contains the memory address where a scalar, array or hash variable is stored.
A reference to a variable is created by preceding the variable name with the \ character.

Below you can see some examples of how to use scalar variables:
$pi = 3.14;                    # a number
$height = "20 cm";             # a string
$message = "Hello World!\n";
print $message;
$message = "Welcome!\n";       # a scalar can be given a new value
print $message;
$ref_a = \$message;            # \ creates a reference to $message
print $$ref_a;                 # dereference it: prints "Welcome!"

Array Perl variables.
Arrays are ordered lists of scalars, where the first element of the list has the index 0. An
array can hold an unlimited number of elements (limited only by available memory), and its name
begins with the character @. An element of the array is written with the $ prefix, followed by
the array name and its index placed in square brackets.

Below you can see an example of an array variable:

@timeUnits = ("hours", "minutes", "seconds", "milliseconds");

Taking the example above, by @timeUnits we mean the entire array and by $timeUnits[0] we
mean the first element of the array @timeUnits, in our case "hours". In order to print all the
elements of the @timeUnits array, we could use a code snippet like this:
foreach $i (@timeUnits)
{
print "$i\n";
}

where $i holds the current element of the array on each pass through the loop.
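The sketch below shows a few more ways to reach into @timeUnits. Negative indices, scalar() and the $# form are standard Perl but are not covered in the text above.

```perl
use strict;
use warnings;

my @timeUnits = ("hours", "minutes", "seconds", "milliseconds");

my $first     = $timeUnits[0];       # "hours": indexing starts at 0
my $last      = $timeUnits[-1];      # "milliseconds": negative indices count from the end
my $count     = scalar(@timeUnits);  # 4: an array in scalar context gives its length
my $lastIndex = $#timeUnits;         # 3: $# gives the index of the last element

print "first=$first last=$last count=$count lastIndex=$lastIndex\n";
```

Because indexing starts at 0, $#timeUnits is always one less than scalar(@timeUnits).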

Hash Perl variables

Hashes, or associative arrays, consist of a group of pairs of elements: a key and a data value.
Whereas arrays are indexed by numbers, hashes are indexed by strings. Perl hash names
are prefixed by the % character. Let’s look at a hash variable example:

%animalColors = ("bear", "brown", "mouse", "gray",
"panther", "black", "panda", "white");

Here "bear", "mouse", "panther" and "panda" are the keys, and the colors are their values. If you
want to refer to the value stored under a key such as "bear", you must use the $ scalar
symbol and curly brackets, as in $animalColors{"bear"}.
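A sketch of the same hash written with the => operator (the "fat comma"), which behaves like a comma but makes key/value pairs easier to read, plus exists, which tests whether a key is present. Both are standard Perl; the "tiger" lookup is only an illustration of a missing key.

```perl
use strict;
use warnings;

# => works like a comma but pairs keys with values visibly,
# and it lets simple keys on its left be written without quotes.
my %animalColors = (
    bear    => "brown",
    mouse   => "gray",
    panther => "black",
    panda   => "white",
);

my $bearColor = $animalColors{"bear"};                   # "brown"
my $hasTiger  = exists $animalColors{"tiger"} ? 1 : 0;   # 0: no such key

print "The bear is $bearColor.\n";
print "Tiger listed? $hasTiger\n";
```

Using exists avoids confusing a missing key with a key whose value happens to be false or undefined.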

If you want to print the %animalColors hash, you could use a little code like this:

%animalColors = ("bear", "brown", "mouse", "gray",
"panther", "black", "panda", "white");
foreach $item (keys %animalColors) {
print "The $item is $animalColors{$item}.\n";
}

If you run it, you’ll get a result similar to this:

The bear is brown.
The mouse is gray.
The panda is white.
The panther is black.

Please note that keys %animalColors returns the keys in an unpredictable order, so if you want to
print the hash in a specific order, you must specify the keys and rewrite the foreach loop as in
the example below:

foreach $item ("bear", "mouse", "panther", "panda") {
print "The $item is $animalColors{$item}.\n";
}

And you’ll get as a result:

The bear is brown.
The mouse is gray.
The panther is black.
The panda is white.
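Instead of listing the keys by hand, a common alternative is to sort them. This sketch assumes alphabetical order is what you want; sort compares strings by default.

```perl
use strict;
use warnings;

my %animalColors = ("bear", "brown", "mouse", "gray",
                    "panther", "black", "panda", "white");

# sort puts the keys in string order, so the output order
# is the same every time the script runs.
my @output;
foreach my $item (sort keys %animalColors) {
    push @output, "The $item is $animalColors{$item}.";
}
print "$_\n" foreach @output;
```

Here "panda" comes before "panther" because the strings first differ at the fourth character, and "d" sorts before "t".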
But enough about hashes for now. Let’s talk a bit about:

Predefined Perl variables

The Perl language has a lot of special variables you should know. You can modify the values of
these variables, except those that are read-only. Like other variables, the predefined variables are
of scalar, array or hash type. One of the most important is the scalar variable $_, which is used
by default by many operators and functions, as in the following example:

$_ = "Hello World!\n";
print;

If you run this code, it will print the text "Hello World!".
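Many other built-ins fall back to $_ as well. This sketch, with made-up input strings, shows a foreach loop without a named loop variable, where s///, length and print all operate on $_. Note that inside foreach, $_ is an alias to each element, so the substitution changes @words itself.

```perl
use strict;
use warnings;

my @words = ("  Hello", "World  ");
my @lengths;

foreach (@words) {          # no loop variable: each element is placed in $_
    s/^\s+|\s+$//g;         # s/// works on $_ by default (trims spaces)
    push @lengths, length;  # length with no argument also uses $_
    print;                  # print with no argument prints $_
    print "\n";
}
```

After the loop, @words holds the trimmed strings, because $_ was an alias to each element rather than a copy.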

Perl Data Types

There are three built-in Perl data types: scalars, arrays and hashes (associative arrays), which
make the Perl language a powerful tool for text manipulation.

1. The first Perl data type is the scalar, which is Perl’s fundamental unit of data. It can
be a single string, a number or a reference to a specific object.

When we speak about scalars, we have two kinds in view:

• scalar literals (or constants), which don’t change over the life of a program; one simple
  example is the value of pi
• scalar variables, which let you hold data and manipulate it:
  o each variable is associated with a name that enables you to refer to the data, and with
    the address of a chunk of memory where the variable value is stored
  o the variable value can be changed during the execution of the program.

Scalar literals can be numbers or strings:

• Numbers can be integers or floating point decimal numbers expressed in different
  notations: 145.23, 22., 2.7E-2.
• Strings are sequences of characters and can contain any kind of data, including simple
  ASCII text and binary data. There is no fixed limit on the size of a string literal other than
  available memory. String literals are usually enclosed in single or double quotes: '12.5', "Hello World!"
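The choice between single and double quotes matters: double-quoted strings interpolate variables and escape sequences such as \n, while single-quoted strings are taken literally. A minimal sketch:

```perl
use strict;
use warnings;

my $count = 3;

# Double quotes: $count is replaced by its value and \n becomes a newline.
my $double = "There are $count items\n";

# Single quotes: every character is kept as typed, including $ and \n.
my $single = 'There are $count items\n';

print $double;   # There are 3 items, followed by a newline
print $single;   # There are $count items\n, printed literally
print "\n";
```

A practical rule of thumb is to use single quotes when a string contains no variables or escapes, so that a stray $ or @ is never interpolated by accident.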

Scalar variables are used for storing scalar literals. We represent a scalar variable with the dollar
sign $ followed by the name of the variable: $Company, $country, $count, $x.

The name of a variable can contain any alphabetic characters, digits or underscores. Note
that the first character of a scalar variable name can’t be a digit, and that variable names are
case sensitive, which means there is a significant difference between upper and lowercase
characters. In Perl it’s not necessary to declare a scalar variable (unless you enable use strict,
in which case you declare it with my); you just name and use it, as in this scalar variable
assignment example:

$pi = 3.14;

A scalar variable can also store the memory address of a chunk of memory. A scalar value that
contains a memory address is called a reference. Perl allocates and deallocates the memory for
references automatically. Look at the following code snippet to see how you
can use a reference variable:

$a="Hello World!";
$ref_a=\$a;
print $$ref_a;

In the first line of code, the variable named $a stores the string “Hello World!”. In the second
line, the scalar variable $ref_a is assigned a reference to the variable $a (note that $a is
preceded by the character \). The third line of the snippet is an example of how to print the
string stored in the variable $a using a reference: we say that we dereference the scalar
variable $ref_a before printing. It’s good practice to begin the name of a scalar reference with
the prefix ref_, because this tells you that it is a reference variable and that you must
dereference it before you can use its value.
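References work the same way for arrays and hashes. This sketch, with made-up data, takes references with \@ and \% and dereferences them either with the matching sigil or with the -> arrow, which is not covered above.

```perl
use strict;
use warnings;

my @things = (1, 'hello', 'world');
my %prices = (Toshiba => 650, HP => 550);

my $ref_things = \@things;   # reference to an array
my $ref_prices = \%prices;   # reference to a hash

# Dereference with the matching sigil, or use -> for a single element.
my $count  = scalar(@$ref_things);   # 3: the whole array, in scalar context
my $second = $ref_things->[1];       # 'hello'
my $hp     = $ref_prices->{HP};      # 550
my $same   = ${$ref_prices}{HP};     # 550, the same lookup in block form

print "count=$count second=$second hp=$hp same=$same\n";
```

The ref_ naming habit described above is just as helpful here, since $ref_things and @things are easy to tell apart.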

Another kind of constant is the literal list, which is used to initialize an array or a hash.
Creating a literal list is very easy: just put a set of parentheses around comma-separated scalar
values, as in the following example:
(1, 'hello', 'world', $a)

Here we have a list with 4 elements.

2. The second basic Perl data type is the array, which is indexed by a number. To create an
array, you simply put something into it. It is not necessary to declare it or specify its size.
The name of an array begins with the character @, and in the example below we assign
the literal list presented above to the array @things:

@things = (1, 'hello', 'world', $a);

3. The third basic Perl data type is the hash, or associative array, which, like the
array, contains a number of scalars. Hashes are indexed by strings, and each hash element has two
parts: a key that identifies the element and a value, which is the data associated with the
key. The name of a hash begins with the character %. In order to assign a value to a hash key,
we write it as in the following code line:
$hash{$key} = $value;

Here we have a short example of a hash structure called %NotebookPrices:

# We first assign some elements:
$NotebookPrices{"Toshiba"}=650;
$NotebookPrices{"HP"}=550;
$NotebookPrices{"Acer"}=750;

# Now we print the keys of the hash %NotebookPrices
foreach $item (keys %NotebookPrices) {
print "$item\n";
}

You can combine the three Perl data types enumerated above to get more complex data
structures (like array of arrays, array of hashes, and so on).
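As a sketch of such combinations (the data is illustrative), the anonymous array constructor [...] and the anonymous hash constructor {...} return references, which can be stored as array elements or hash values:

```perl
use strict;
use warnings;

# A hash of arrays: each value is a reference to an anonymous array [...].
my %unitsByType = (
    time   => ["hours", "minutes", "seconds"],
    length => ["m", "cm", "mm"],
);

# An array of hashes: each element is a reference to an anonymous hash {...}.
my @notebooks = (
    { brand => "Toshiba", price => 650 },
    { brand => "HP",      price => 550 },
);

my $secondTimeUnit = $unitsByType{time}[1];   # "minutes"
my $hpPrice        = $notebooks[1]{price};    # 550

foreach my $nb (@notebooks) {
    print "$nb->{brand}: $nb->{price}\n";     # $nb is a hash reference
}
```

Between two subscripts the -> arrow is optional, which is why $unitsByType{time}[1] can be written without it.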

Operators
In Perl, there are different versions of the operators for numbers and strings. For instance, if you want to
compare two numbers, you would use a traditional symbol such as <, >, and so on. However, when you
compare two strings, the less-than and greater-than signs are not used. Instead, a special version is used
to compare strings: less-than is written as the two letters lt and greater-than as gt. In the lists that
follow, there will be separate listings for numerical and string operators where this is necessary.

Arithmetic Operators
These operators are used to perform mathematical calculations on numbers. Keep in mind, though, that they
are not used to combine strings; there are special string operators for this. Note that the
assignment operator (=) works with both numbers and strings; we used it to assign values to variables in
the last section. Here is the list:
Operator   Function
+          Addition
-          Subtraction, Negative Numbers, Unary Negation
*          Multiplication
/          Division
%          Modulus
**         Exponent

To use these, you will place them in your statements like a mathematical expression. So, if you want to
store the sum of two variables in a third variable, you would write something like this:

$adrevenue=20;
$sales=10;
$total_revenue=$adrevenue+$sales;

You can use the other arithmetic operators in the same way; it is quite similar to other programming
languages.
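A quick sketch exercising each operator in the table. Note that / performs floating point division, so 7 / 2 gives 3.5 rather than 3; use int() if you need truncation.

```perl
use strict;
use warnings;

my $sum        = 7 + 2;    # 9
my $difference = 7 - 2;    # 5
my $product    = 7 * 2;    # 14
my $quotient   = 7 / 2;    # 3.5: / does not truncate to an integer
my $remainder  = 7 % 2;    # 1
my $power      = 7 ** 2;   # 49
my $negative   = -$sum;    # -9: unary negation

print "$sum $difference $product $quotient $remainder $power $negative\n";
```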

Assignment Operators
We have already used the = sign as an assignment operator to assign values to a variable. You can also
use the = sign with another arithmetic operator to perform a special type of assignment. You can precede
the = sign with the + operator, for example:

$revenue+=10;

What this does is create a shorthand for writing out the following statement:

$revenue=$revenue+10;

It takes the variable $revenue and assigns it the value of $revenue (itself) plus 10. So, if you had an initial
value for $revenue set at 5:

$revenue=5; $revenue=$revenue+10;

After these statements, $revenue is 15. It added 10 to the value it had before the new assignment.

The others work the same way, but perform the various different operations. Here is a list of the arithmetic
operators we used above, combined with the assignment operator:
Operator   Function
=          Normal Assignment
+=         Add and Assign
-=         Subtract and Assign
*=         Multiply and Assign
/=         Divide and Assign
%=         Modulus and Assign
**=        Exponent and Assign

Remember, these are used for the sake of typing less or cutting the file size of the code. You can write
the statements out the long way if it makes it more understandable when you read your code.
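The sketch below chains the compound assignment operators from the table on a single variable; each comment shows the value after that line.

```perl
use strict;
use warnings;

my $revenue = 5;
$revenue += 10;    # same as $revenue = $revenue + 10;  now 15
$revenue -= 3;     # 12
$revenue *= 2;     # 24
$revenue /= 4;     # 6
$revenue %= 4;     # 2
$revenue **= 3;    # 8

print "$revenue\n";
```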

Increment/Decrement
Another shorthand method is to use the increment and decrement operators, rather than writing out
something like this:

$revenue=$revenue+1;

You can simply write something like this:

$revenue++;

However, using these operators you must remember that you could also write something like this:

++$revenue;

If you place the ++ before the variable name, the variable adds one to itself before it is used or
evaluated. For example, if you write:

$revenue=5;
$total= ++$revenue + 10;

The $revenue variable is incremented before it is used in the calculation, so it is changed to 6 before 10 is
added to it. Thus, $total turns out to be 16 here.

If you want to increment the variable after it is used, you use the ++ after the variable name:

$revenue=5;
$total= $revenue++ + 10;

This way $total is only 15 because $revenue is used before being incremented, so it stays at 5 for this
expression. If you use $revenue again after this, it will have a value of 6.

With that in mind, here is the short list of the two operators:
Operator   Function
++         Increment (Add 1)
--         Decrement (Subtract 1)

The -- operator works the same way as ++, but it subtracts one from the value of the variable (decrements
it).

Adding Strings
As I mentioned at the beginning of this section, there are different operators for strings under certain
conditions. If you want to put two strings together (also called concatenating them), you will want to use
the dot operator. Unlike in C and JavaScript (where the dot is used to access members of structures and
objects), the dot operator in Perl concatenates two strings.

For example, if you want to place two strings together, you could do this:

$full_string="light" . "house";

This would make $full_string have the value of lighthouse. This is more useful if you are using variables
for this:

$word1="light";
$word2="house";
$full_string=$word1 . $word2;
print "If I had a $word1 and a $word2, would I be able to make a $full_string?";

Yes, it prints out this silly little sentence:

If I had a light and a house, would I be able to make a lighthouse?

This can also be used with the assignment operator to do what we did with numbers earlier. In this case,
it gives the string the value of itself put together with another string:

$full_string="light";
$full_string.="house";

Of course, we again get the value of lighthouse for the $full_string variable. Here is the list for these two
string operators:

Operator   Function
.          Concatenate Strings
.=         Concatenate and Assign

Numeric Comparison
These operators are used to compare two numbers, but not to compare strings. We'll get to those next.
These operators are typically used in some type of conditional statement that executes a block of code or
initiates a loop. We'll get to the conditional statements in the next section, but to introduce this we will use
the beginning of an if () condition. Before that, let's look at the list:

Operator   Function
==         Equal to
!=         Not Equal to
>          Greater than
<          Less than
>=         Greater than or Equal to
<=         Less than or Equal to

So, suppose you want to execute some code only if one number is equal to another. You would use the if ()
condition with the == operator above:

$money=5;
if ($money==5)
{
# ...more code...
}

Since $money is equal to 5, it would execute the code you place between the curly brackets. It works the
same way if you use one of the other operators:

$money=5;
if ($money<3)
{
# ...more code...
}

This time the code inside the block would not run, because the value of $money, 5, is not less than 3.

String Comparison
These are similar to the numerical comparisons, but they work with strings. We will note a few differences
in the way these work after the list below:

Operator   Function
eq         Equal to
ne         Not Equal to
gt         Greater than
lt         Less than
ge         Greater than or Equal to
le         Less than or Equal to

Strings are equal if they are exactly the same. So, "cool" and "cool" are equal, but "cool" and "coolz" are
not. Here is a string equality example:

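$i_am="cool";
if ($i_am eq "cool")
{
# ...more cool code...
}

The greater-than and less-than operators compare strings character by character, using the order of the
characters in the character set (ASCII order for plain English text). For lowercase words this is the same
as alphabetical order, but note that all uppercase letters sort before all lowercase letters. Here is a sample: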

$i_am="all right";
if ($i_am lt "cool")
{
print "You are not very cool, dude.";
}
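The reason Perl keeps two families of comparison operators shows up when numbers are compared as strings. In this sketch, "10" is greater than "9" numerically but less than it as a string, because the string comparison looks at the first character, "1", which sorts before "9".

```perl
use strict;
use warnings;

my ($x, $y) = ("10", "9");

# Numeric comparison: 10 is not less than 9.
my $numericLess = ($x < $y) ? "yes" : "no";      # "no"

# String comparison: "1" sorts before "9", so "10" lt "9" is true.
my $stringLess = ($x lt $y) ? "yes" : "no";      # "yes"

# == compares numeric values, eq compares the characters themselves.
my $numEqual = ("1.0" == 1)   ? "yes" : "no";    # "yes"
my $strEqual = ("1.0" eq "1") ? "yes" : "no";    # "no"

print "10 < 9: $numericLess; 10 lt 9: $stringLess\n";
print "1.0 == 1: $numEqual; 1.0 eq 1: $strEqual\n";
```

Picking the wrong family rarely causes an error message, only a wrong answer, so it is worth double-checking which one a condition needs.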

Logical Operators
These are often used when you need to check more than one condition. Here they are:

Operator Function
&& AND
|| OR
! NOT

So, if you want to see if a number is less than or equal to 10, and also greater than zero:

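$number=5;
if (($number <= 10) && ($number > 0))
{
# ...code...
}

Notice the nested parentheses. Since we are checking for two conditions, they make it clear that each
comparison is done first, before && combines the results. Strictly speaking they are optional, because
comparison operators have higher precedence than &&, but they make the code easier to read. The if ()
condition still needs its own outer set of parentheses.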
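A sketch of all three logical operators, with illustrative values. Here the inner parentheses are left out, which works because the comparison operators bind more tightly than && and ||.

```perl
use strict;
use warnings;

my $number = 5;

# && (AND): true only if both comparisons are true.
my $inRange = ($number > 0 && $number <= 10) ? 1 : 0;      # 1

# || (OR): true if at least one comparison is true.
my $outOfRange = ($number < 0 || $number > 100) ? 1 : 0;   # 0

# ! (NOT): inverts a true/false value.
my $notZero = !($number == 0) ? 1 : 0;                     # 1

print "inRange=$inRange outOfRange=$outOfRange notZero=$notZero\n";
```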
PERL functions

Perl's built-in functions are available to you at any place in your script, and they do not need any
declaration to use, because they are installed on your system together with Perl. The perlfunc
documentation (run perldoc perlfunc, or search for perlfunc on CPAN) gives an almost exhaustive presentation
of all the built-in Perl functions, grouped by category or alphabetically. If you want to write your own
function, you can define a subroutine and then call it anywhere within your Perl code.

It’s beyond the scope of this section to cover all the aspects of functions in a programming
language. In general, a function has a name and represents a block of code which we can call anywhere
we need it in a program. We use functions so that we don't have to repeat the same block of
code over and over by copying and pasting it.

PERL modules

When we talk about Perl modules, we mean well-structured components of the system, which have a
defined interface to the other components of the system. Starting with version 5, Perl supports modules,
and thousands of them are available through CPAN (Comprehensive Perl Archive Network) or other resources.

A module is a piece of code, often already written by others, that you can use and integrate in your
own Perl script in order to save a lot of time. You can design your own modules in order to structure and
reuse your code, or use modules made by others. Refactoring the source code by creating your own modules is
beyond the scope of this page, so we’ll discuss here only how you can use the modules already available
on different platforms.

It is very likely that, if you have a problem to solve in your script, somebody has already solved it
and made the solution available on CPAN, so reusing code through a module interface is usually a good
idea.

There are two types of Perl modules: those that are distributed with Perl, which you can use
immediately after the Perl installation, and those you can download from CPAN or other sources and
install yourself. Each module available on CPAN has documentation, which you can read before
downloading the entire package to see for yourself if it’s what you are looking for. Some modules
depend on other modules, so please read the associated documentation carefully before you download
and install them.

In order to use a module it’s not necessary to know how all of the behind-the-scenes magic works, it’s
only important to understand its interface and know its functions or subroutines.
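As an example of using a module that is distributed with Perl, the sketch below loads List::Util, a core module, and imports two of its functions, sum and max. The price list is made-up data.

```perl
use strict;
use warnings;
use List::Util qw(sum max);   # core module: no download from CPAN needed

my @prices = (650, 550, 750);

my $total   = sum(@prices);   # 1950
my $highest = max(@prices);   # 750

print "Total: $total, highest: $highest\n";
```

All you need to know here is the interface: the module name, the function names to import, and what each function returns.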

To find modules that are not distributed with Perl you can start browsing through the CPAN categories or
search Google or some other search engines directly.

PERL subroutines

Like any good programming language, Perl allows the user to define their own functions,
called subroutines. They may be placed anywhere in your program, but it's probably best to put them all at
the beginning or all at the end. A subroutine has the form
sub mysubroutine
{
print "Not a very interesting routine\n";
print "This does the same thing every time\n";
}

This subroutine does the same thing regardless of any parameters that we may want to pass to it. All of
the following will work to call this subroutine. Notice that here a subroutine is called with an & character
in front of the name (in modern Perl the & is optional when you use parentheses, as in mysubroutine(1+2, $_);):

&mysubroutine; # Call the subroutine

&mysubroutine($_); # Call it with a parameter
&mysubroutine(1+2, $_); # Call it with two parameters

Parameters

In the above case the parameters are acceptable but ignored. When the subroutine is called any
parameters are passed as a list in the special @_ list array variable. This variable has absolutely nothing
to do with the $_ scalar variable. The following subroutine merely prints out the list that it was called with.
It is followed by a couple of examples of its use.

sub printargs
{
print "@_\n";
}

&printargs("perly", "king"); # Example prints "perly king"

&printargs("frog", "and", "toad"); # Prints "frog and toad"

Just like any other list array the individual elements of @_ can be accessed with the square bracket
notation:

sub printfirsttwo
{
print "Your first argument was $_[0]\n";
print "and $_[1] was your second\n";
}

Again, it should be stressed that the indexed scalars $_[0], $_[1] and so on have nothing to do with the
scalar $_, which can also be used without fear of a clash.

Returning values

The result of a subroutine is the value of the last expression evaluated, unless you leave it earlier with
an explicit return. This subroutine returns the maximum of two input parameters. An example of its use follows.

sub maximum
{
if ($_[0] > $_[1])
{
$_[0];
}
else
{
$_[1];
}
}

$biggest = &maximum(37, 24); # Now $biggest is 37

The &printfirsttwo subroutine above also returns a value, in this case 1. This is because the last thing that
subroutine did was a print statement and the result of a successful print statement is always 1.
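Many Perl programmers prefer to copy @_ into named my variables and to return explicitly. This sketch rewrites the maximum idea that way; maximum_of_two is a hypothetical name chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical name; same idea as &maximum, with named parameters
# copied out of @_ and an explicit return.
sub maximum_of_two {
    my ($first, $second) = @_;
    return $first > $second ? $first : $second;
}

my $biggest = maximum_of_two(37, 24);   # 37; called without the & prefix
print "$biggest\n";
```

Named parameters make the body easier to read than $_[0] and $_[1], and copying them also means the subroutine cannot accidentally change the caller's variables through @_.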

Local variables

The @_ variable is local to the current subroutine, and so of course are $_[0], $_[1], $_[2], and so on.
Other variables can be made local too, and this is useful if we want to start altering the input parameters.
The following subroutine tests to see if one string is inside another, ignoring spaces. An example
follows.

sub inside
{
local($a, $b); # Make local variables
($a, $b) = ($_[0], $_[1]); # Assign values
$a =~ s/ //g; # Strip spaces from
$b =~ s/ //g; # local variables
($a =~ /$b/ || $b =~ /$a/); # Is $b inside $a
# or $a inside $b?
}

&inside("lemon", "dole money"); # true

In fact, it can even be tidied up by replacing the first two lines with

local($a, $b) = ($_[0], $_[1]);
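In current Perl, my is usually preferred over local for this: my creates truly private variables, whereas local temporarily changes global ones (here the globals $a and $b, which sort also uses). The sketch below is a hypothetical variant, inside_my, that also swaps the regex match for the built-in index function, so characters such as . or * in the input are matched literally.

```perl
use strict;
use warnings;

# Hypothetical variant of &inside: private my copies instead of local,
# and index() instead of a regex so the input is never treated as a pattern.
sub inside_my {
    my ($first, $second) = @_;    # private copies of the arguments
    $first  =~ s/ //g;            # strip spaces from the copies only
    $second =~ s/ //g;
    # index returns -1 when the substring is not found.
    return index($first, $second) >= 0 || index($second, $first) >= 0;
}

my $result = inside_my("lemon", "dole money") ? "true" : "false";
print "$result\n";   # "dolemoney" contains "lemon"
```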

Perl File Handling: open, read, write and close files

This article describes the facilities provided for Perl file handling.

Opening files
Opening a file in Perl is straightforward:

open FILE, "filename.txt" or die $!;

The command above will associate the FILE filehandle with the file filename.txt. You can use the filehandle to
read from the file. If the file doesn't exist, or you cannot read it for any other reason, then the script
will die with the appropriate error message stored in the $! variable.

What if you wanted to modify the file instead of just reading from it? Then you'd have to specify the
appropriate mode using the three-argument form of open:

open FILEHANDLE, MODE, EXPR

The available modes are the following:

mode          operand   create   truncate
read          <
write         >         ✓        ✓
append        >>        ✓

Each of the above modes can also be prefixed with the + character to allow for simultaneous reading
and writing.

mode          operand   create   truncate
read/write    +<
read/write    +>        ✓        ✓
read/append   +>>       ✓

Notice how both +< and +> open the file in read/write mode, but the latter also creates the file if it
doesn't exist, or truncates (empties) an existing file. So, if you wanted to open a file for writing,
creating it if it doesn't exist and truncating it first if it does, you'd do the following:

open FILE, ">", "filename.txt" or die $!;

This operation might fail if, for example, you don't have the appropriate
permissions. In this case $! will be set appropriately.

The mode and the filename can also be combined into a single string (the two-argument form of open),
so the above can also be written as:

open FILE, ">filename.txt" or die $!;

As you might have guessed already, if you just want read access you can skip the mode, just as we did
in the very first example above.
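In newer code you will often see open used with a lexical filehandle (a my variable) instead of a bareword such as FILE, always in the three-argument form. A minimal sketch; filename.txt is a placeholder, and the script writes it, reads its first line back and then deletes it.

```perl
use strict;
use warnings;

# Placeholder file name for this sketch; it is created and then deleted.
my $filename = "filename.txt";

# Three-argument open: the mode ('>' here) is kept separate from the name.
open(my $out, '>', $filename) or die "Cannot open $filename for writing: $!";
print $out "first line\n";    # no comma after the filehandle
close($out) or die "Cannot close $filename: $!";

# Reopen the same file for reading with '<'.
open(my $in, '<', $filename) or die "Cannot open $filename for reading: $!";
my $firstLine = <$in>;        # scalar context: reads one line
close($in);

unlink $filename;             # remove the demo file
print "Read back: $firstLine";
```

Keeping the mode separate means a file name that happens to start with > or < can never be mistaken for a mode, and a lexical filehandle is closed automatically when its variable goes out of scope.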

Reading files
If you want to read all the lines of a text file at once, you can do it as such:

my @lines = <FILE>;

The <FILE> operator, where FILE is a previously opened filehandle, returns all the unread lines of the
text file in list context, or a single line in scalar context. Hence, if you had a particularly large file
and you wanted to conserve memory, you could process it line by line:

while (<FILE>) {
    print $_;
}

The $_ variable is automatically set for you to the contents of the current line. If you wish, you may
name your line variable instead; the following loop sets $line to the contents of the current line:

while (my $line = <FILE>) { ... }

The newline character at the end of the line is not removed automatically. If you wish to remove it,
you can use the chomp function. After all lines have been read, the <FILE> operator returns undef
(a false value), causing the loop to terminate.
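Putting the pieces together, this sketch reads a file line by line, chomps each line and collects the results. To stay self-contained it first writes a small temporary file with File::Temp, a core module; the three lines of content are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);   # core module, used only to create a test file

# Create a small temporary text file (deleted automatically at exit).
my ($tmp, $path) = tempfile(UNLINK => 1);
print $tmp "alpha\nbeta\ngamma\n";
close($tmp);

open(my $fh, '<', $path) or die "Cannot open $path: $!";
my @names;
while (my $line = <$fh>) {    # one line per iteration; ends at end of file
    chomp $line;              # remove the trailing newline
    push @names, $line;
}
close($fh);

print join(", ", @names), "\n";   # alpha, beta, gamma
```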
There may be cases where you need to read a file only a few characters at a time instead of
line-by-line. This may be the case for binary data. To do just that you can use the read function:

open FILE, "picture.jpg" or die $!;
binmode FILE;
my ($buf, $data, $n);
while (($n = read FILE, $data, 4) != 0) {
    print "$n bytes read\n";
    $buf .= $data;
}
close(FILE);

There is a lot going on here, so let's take it step by step. In the first line of the above code fragment a
file is opened. As you can guess from the filename, it is a binary file. Binary files need to be treated
differently from text files on some operating systems (e.g., Windows). The reason is that on these
platforms a newline "character" is actually represented within text files by the two-character
sequence \cM\cJ (that's control-M, control-J). When reading a text file, Perl will convert the \cM\cJ
sequence into a single \n newline character. The converse also holds when writing files. Clearly,
when reading binary data this behavior is undesired, and calling binmode on the filehandle will make
sure that this conversion is avoided.

The read command takes either 3 or 4 arguments. The 3-argument form is:

read FILEHANDLE, SCALAR, LENGTH

while the 4-argument form is:

read FILEHANDLE, SCALAR, LENGTH, OFFSET

In the first case, up to LENGTH characters of data are read from FILEHANDLE into the variable
specified by SCALAR. The return value of read is the number of characters actually read, 0 at the end
of the file, or undef in the case of an error. Returning to our example above, the read call in the
while condition will read at most 4 characters of data into the $data variable, and the number of
characters read will be stored in $n. Each read operation leaves the current file position just before
the first unread character, so successive reads on the same filehandle continue where the previous one
stopped. Thus the code above will read the contents of the file picture.jpg and store them in $buf,
printing the number of characters read at every iteration.
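To see these return values without needing a real picture.jpg, here is a small sketch that reads from an in-memory filehandle (open is given a reference to a string instead of a filename). The 10-byte sample string and the variable names are illustrative assumptions, and the sketch uses a lexical filehandle ($fh) with three-argument open instead of the bareword FILE used above.

```perl
use strict;
use warnings;

# Ten bytes of sample data stand in for the contents of picture.jpg.
my $bytes = "0123456789";

# Open the string as if it were a file (in-memory filehandle).
open my $fh, '<', \$bytes or die "Cannot open in-memory file: $!";
binmode $fh;

my $buf = '';
my ($data, $n);
my @sizes;

# read returns the number of characters read, 0 at end of file,
# or undef on error; the loop stops on either of the last two.
while ($n = read $fh, $data, 4) {
    print "$n bytes read\n";
    push @sizes, $n;
    $buf .= $data;
}
close $fh;

print "chunk sizes: @sizes\n";
```

The last read returns only 2 because fewer than 4 characters remained; the call after that returns 0 and the loop ends.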

If OFFSET is specified, then the characters read will be placed at that position within the SCALAR.
Taking advantage of this, we could rewrite the loop above as follows:

my ($data, $n);
my $offset = 0;

while (($n = read FILE, $data, 4, $offset) != 0) {
    print "$n bytes read\n";
    $offset += $n;
}

Even though the example above demonstrates binary reading, the read command works just as well
on text files; just make sure to call binmode for binary files and to leave it out for text files.
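A minimal sketch of the OFFSET form, again assuming an in-memory string ("abcdefghij", chosen arbitrarily) in place of a real file. Because each chunk is written at position $offset inside $data, the chunks accumulate in $data itself and no separate $buf is needed.

```perl
use strict;
use warnings;

# Sample data standing in for a file's contents.
my $bytes = "abcdefghij";
open my $fh, '<', \$bytes or die "Cannot open in-memory file: $!";
binmode $fh;

my $data   = '';
my $offset = 0;
my $n;

# Each read places its characters at position $offset inside $data.
while ($n = read $fh, $data, 4, $offset) {
    $offset += $n;
}
close $fh;

print "$data\n";    # prints abcdefghij
```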

Writing files
Now that you know how to open and read files, learning how to write to them is straightforward. Take
a look at the following code:

open FILE, ">file.txt" or die $!;
print FILE $str;
close FILE;

Not much is new here. The only thing to observe is the form of print used here: the first argument is
the FILEHANDLE to write to and the rest is the expression to be written. Note that there is no comma
between the filehandle and the expression. The expression can be anything: a scalar, a list, a hash,
etc. Appending to a file can be accomplished in exactly the same manner, apart from specifying the
appropriate (>>) mode, of course.
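A runnable sketch of writing and then appending, assuming a temporary file created with the core File::Temp module in place of file.txt. It uses three-argument open with explicit '>' and '>>' modes and lexical filehandles, a common alternative to the two-argument form shown above, and reads the file back to show that appending kept the first line.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A temporary file stands in for file.txt (removed at program exit).
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# '>' creates or truncates the file and writes to it.
open my $out, '>', $path or die "Cannot write $path: $!";
print $out "first line\n";    # no comma after the filehandle
close $out or die "Cannot close $path: $!";

# '>>' opens the same file for appending; existing content is kept.
open my $app, '>>', $path or die "Cannot append to $path: $!";
print $app "second line\n";
close $app or die "Cannot close $path: $!";

# Read the file back to check what was written.
open my $in, '<', $path or die "Cannot read $path: $!";
my @lines = <$in>;
close $in;

print @lines;
```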

At this point you are probably thinking that a description of the write command is what follows.
Actually, as the manual page for write puts it:

Note that write is not the opposite of read. Unfortunately.

Instead, write is used to write formatted records to a file, a subject outside the scope of this article.

Closing files
Once you are done reading and writing, you should close any open filehandles.

open FILE1, "file.txt" or die $!;
open FILE2, "picture.jpg" or die $!;
...
close FILE2;
close FILE1;

If you forget to close a filehandle, Perl will do it for you before your script exits, but it
is good practice to close what you have opened yourself.

The close command may also fail and return false, e.g., if you try to close a filehandle that is
already closed. If you want to catch these errors, you can check the return value of close and report
the appropriate error message stored in $!, as is done in the following example:

close FILE or die $!;
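A small sketch of close reporting failure, assuming an in-memory filehandle in place of FILE. The first close succeeds; closing the same handle a second time returns false. Warnings are switched off around the second close only to keep the demonstration quiet, and $! is copied right away because later I/O can overwrite it.

```perl
use strict;
use warnings;

# An in-memory filehandle stands in for FILE.
my $text = "some data\n";
open my $fh, '<', \$text or die "Cannot open: $!";

# The first close succeeds and returns true.
my $first_close = close $fh;

# Closing an already closed handle fails and returns false.
my $second_close = do { no warnings; close $fh };
my $error = "$!";    # capture the error message immediately

print "first close: ", ($first_close ? "ok" : "failed"), "\n";
print "second close: ", ($second_close ? "ok" : "failed ($error)"), "\n";
```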

Here is a basic Perl program that reads a file.

#!/usr/local/bin/perl
# Program to open the password file, read it in,
# print it, and close it again.

$file = '/etc/passwd'; # Name the file

open(INFO, $file) or die "Cannot open $file: $!"; # Open the file
@lines = <INFO>; # Read it into an array
close(INFO); # Close the file
print @lines; # Print the array

The open function opens a file for input (i.e., for reading). The first parameter is
the filehandle, which allows Perl to refer to the file later on. The second parameter is
an expression denoting the filename. If the filename is given in quotes, then it is
taken literally without shell expansion, so the expression '~/notes/todolist' will not be
interpreted successfully. If you want to force expansion, use angled brackets (Perl's glob
operator): that is, use <~/notes/todolist> instead.

The close function tells Perl to finish with that file.

There are a few useful points to add to this discussion on file handling. First,
the open statement can also specify a file for output and for appending as well as for
input. To do this, prefix the filename with a > for output and a >> for appending:
open(INFO, $file); # Open for input
open(INFO, ">$file"); # Open for output
open(INFO, ">>$file"); # Open for appending
open(INFO, "<$file"); # Also open for input

Second, if you want to print something to a file you've already opened for output then
you can use the print statement with an extra parameter. To print a string to the file
with the INFO filehandle use
print INFO "This line goes to the file.\n";

Third, you can use the following to open the standard input (usually the keyboard) and
standard output (usually the screen) respectively:
open(INFO, '-'); # Open standard input
open(INFO, '>-'); # Open standard output

In the above program the information is read from a file. The filehandle is INFO, and
to read from it Perl uses angled brackets. So the statement

@lines = <INFO>;

reads the file denoted by the filehandle into the array @lines. Note that the <INFO>
expression reads in the file entirely in one go. This is because the reading takes place in
the context of an array variable (list context). If @lines is replaced by the scalar $lines,
then only the next line is read in. In either case, each line is stored complete with its
newline character at the end.
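A short sketch of the difference between scalar and list context, assuming a three-line in-memory string in place of /etc/passwd and a lexical filehandle $info in place of INFO. The last step shows chomp removing the newlines that each line keeps.

```perl
use strict;
use warnings;

# A three-line in-memory "file" stands in for /etc/passwd.
my $text = "alpha\nbeta\ngamma\n";
open my $info, '<', \$text or die "Cannot open: $!";

# Scalar context: <$info> returns just the next line.
my $first = <$info>;

# List (array) context: <$info> returns all remaining lines.
my @rest = <$info>;
close $info;

# Each line keeps its trailing newline until you chomp it.
chomp(my @clean = @rest);

print "first: $first";
print "rest has ", scalar(@rest), " lines: @clean\n";
```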

Flow control

Flow control is the order in which the statements of a program are executed. A
program executes from the first statement at the top of the program to the last
statement at the bottom, in order, unless told to do otherwise. There are two ways to
tell a program to do otherwise: conditional statements and loops. A conditional
statement executes a group of statements only if the conditional test succeeds;
otherwise, it just skips the group of statements. A loop repeats a group of statements
until an associated test fails.
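A minimal illustration of both kinds of flow control described above, using a made-up list of bases: the while loop repeats its block until its test fails, and the if statement runs its block only when its test succeeds.

```perl
use strict;
use warnings;

my @bases = qw(A C G T A);    # made-up data
my $count = 0;

# Loop: the block repeats while the test ($i < @bases) succeeds.
my $i = 0;
while ($i < @bases) {
    # Conditional: this statement runs only when the test succeeds.
    if ($bases[$i] eq 'A') {
        $count++;
    }
    $i++;
}

print "Found $count A's\n";
```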

Finding motifs

One of the most common things we do in bioinformatics is to look for motifs, short
segments of DNA or protein that are of particular interest. They may be regulatory
elements of DNA or short stretches of protein that are known to be conserved across
many species. (The PROSITE web site at http://www.expasy.ch/prosite/ has extensive
information about protein motifs.)

The motifs you look for in biological sequences are usually not one specific sequence.
They may have several variants: for example, positions in which it doesn't matter
which base or residue is present. They may have variant lengths as well.
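As an illustration of a motif with variant positions, the N-glycosylation site pattern listed in PROSITE as N-{P}-[ST]-{P} (Asn, then anything except Pro, then Ser or Thr, then anything except Pro) can be written as a Perl regular expression. The protein fragment below is made up for the example.

```perl
use strict;
use warnings;

# A made-up protein fragment, for illustration only.
my $protein = 'MKTANKTSRGLLVNCSA';

# N-{P}-[ST]-{P}: [^P] means "any residue except P".
my $motif = qr/N[^P][ST][^P]/;

my @sites;
while ($protein =~ /$motif/g) {
    # $-[0] is the 0-based offset where the current match starts.
    push @sites, $-[0];
}

print "Motif found at position(s): @sites\n";
```

The offsets are 0-based; add 1 to each if you want positions numbered from 1.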

Perl has a handy set of features for finding things in strings. This, as much as
anything, has made it a popular language for bioinformatics. Example 5-3 introduces
this string-searching capability; it does something genuinely useful, and similar
programs are used all the time in biology research. It does the following:

1. Reads in protein sequence data from a file.
2. Puts all the sequence data into one string for easy searching.
3. Looks for motifs the user types in at the keyboard.

#!/usr/bin/perl -w
# Searching for motifs

# Ask the user for the filename of the file containing
# the protein sequence data, and collect it from the keyboard
print "Please type the filename of the protein sequence data: ";

$proteinfilename = <STDIN>;

# Remove the newline from the protein filename
chomp $proteinfilename;

# open the file, or exit
unless ( open(PROTEINFILE, $proteinfilename) ) {
    print "Cannot open file \"$proteinfilename\"\n\n";
    exit;
}

# Read the protein sequence data from the file, and store it
# into the array variable @protein
@protein = <PROTEINFILE>;

# Close the file - we've read all the data into @protein now.
close PROTEINFILE;

# Put the protein sequence data into a single string, as it's easier
# to search for a motif in a string than in an array of
# lines (what if the motif occurs over a line break?)
$protein = join( '', @protein);

# Remove whitespace
$protein =~ s/\s//g;

# In a loop, ask the user for a motif, search for the motif,
# and report if it was found.
# Exit if no motif is entered.
do {
    print "Enter a motif to search for: ";

    $motif = <STDIN>;

    # Remove the newline at the end of $motif
    chomp $motif;

    # Look for the motif
    if ( $protein =~ /$motif/ ) {
        print "I found it!\n\n";
    }
    else {
        print "I couldn\'t find it.\n\n";
    }

# exit on an empty user input
} until ( $motif =~ /^\s*$/ );

# exit the program
exit;
If you run it, you'll see that this program finds motifs that the user types in at the
keyboard. With such a program, you no longer have to search manually through
potentially huge amounts of data. The computer does the work and does it much faster
and more accurately than a human.

Counting Nucleotides

#!/usr/bin/perl -w
# Determining frequency of nucleotides, take 2

# Get the DNA sequence data
print "Please type the filename of the DNA sequence data: ";

$dna_filename = <STDIN>;

chomp $dna_filename;

# Does the file exist?
unless ( -e $dna_filename ) {    # -e checks for file existence
    print "File \"$dna_filename\" doesn\'t seem to exist!!\n";
    exit;
}

# Can we open the file?
unless ( open(DNAFILE, $dna_filename) ) {
    print "Cannot open file \"$dna_filename\"\n\n";
    exit;
}

@DNA = <DNAFILE>;

close DNAFILE;

$DNA = join( '', @DNA);

# Remove whitespace
$DNA =~ s/\s//g;

# Initialize the counts.
# Notice that we can use scalar variables to hold numbers.
$count_of_A = 0;
$count_of_C = 0;
$count_of_G = 0;
$count_of_T = 0;
$errors = 0;

# In a loop, look at each base in turn, determine which of the
# four types of nucleotides it is, and increment the
# appropriate count.
for ( $position = 0 ; $position < length $DNA ; ++$position ) {
    $base = substr($DNA, $position, 1);

    if    ( $base eq 'A' ) { ++$count_of_A; }
    elsif ( $base eq 'C' ) { ++$count_of_C; }
    elsif ( $base eq 'G' ) { ++$count_of_G; }
    elsif ( $base eq 'T' ) { ++$count_of_T; }
    else {
        print "!!!!!!!! Error - I don\'t recognize this base: $base\n";
        ++$errors;
    }
}

# print the results
print "A = $count_of_A\n";
print "C = $count_of_C\n";
print "G = $count_of_G\n";
print "T = $count_of_T\n";
print "errors = $errors\n";

# exit the program
exit;
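As an alternative technique to the substr loop in the program above (not the method the program uses), Perl's tr/// operator returns the number of characters it matched, which gives a compact way to count each base. The DNA string is a made-up example, and unlike the program above this sketch counts unrecognized bases without printing each one.

```perl
use strict;
use warnings;

# A made-up DNA sequence, for illustration only.
my $DNA = 'AGCTTAGCTAAN';

# With an empty replacement list, tr/// leaves the string unchanged
# and returns how many characters matched.
my $count_of_A = ($DNA =~ tr/A//);
my $count_of_C = ($DNA =~ tr/C//);
my $count_of_G = ($DNA =~ tr/G//);
my $count_of_T = ($DNA =~ tr/T//);

# The /c modifier complements the list: count everything except ACGT.
my $errors = ($DNA =~ tr/ACGT//c);

print "A = $count_of_A\n";
print "C = $count_of_C\n";
print "G = $count_of_G\n";
print "T = $count_of_T\n";
print "errors = $errors\n";
```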

( -e $dna_filename) explanation

Files have several attributes, such as size, permissions, location in the filesystem, and
type of file, and many of these can be tested easily with Perl's file test operators. The
-e operator checks whether a file exists at the specified location.
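A sketch of a few common file test operators, assuming a temporary directory and file created with the core File::Temp module; the name no_such_file.txt is hypothetical and is used only because it does not exist.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile tempdir);

# A temporary directory and file, created just for the demonstration.
my $dir = tempdir(CLEANUP => 1);
my ($fh, $file) = tempfile(DIR => $dir);
print $fh "ACGT\n";
close $fh;

# Hypothetical name that does not exist.
my $missing = "$dir/no_such_file.txt";

print "exists\n"     if -e $file;          # -e: file exists
print "plain file\n" if -f $file;          # -f: is a plain file
print "directory\n"  if -d $dir;           # -d: is a directory
print "size: ", -s $file, " bytes\n";      # -s: size in bytes
print "missing\n" unless -e $missing;
```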

Everything else is familiar, until you hit the for loop; it requires a little explanation:
for ( $position = 0 ; $position < length $DNA ; ++$position ) {

# the statements in the block

}

This for loop is the equivalent of this while loop:

$position = 0;

while( $position < length $DNA ) {

# the same statements in the block, plus ...

++$position;
}
Take a moment and compare these two loops. You'll see the same statements but in
different locations.

As you can see, the for loop brings the initialization and increment of a counter
($position) into the loop statement, whereas in the while loop, they are separate
statements. In the for loop, both the initialization and the increment statement are
placed between parentheses, whereas you find only the conditional test in
the while loop. In the for loop, you can put initializations before the first semicolon
and increment statements after the second semicolon. The initialization statement is
done just once before starting the loop, and the increment statement is done at the end
of each iteration through the block before going back to the conditional test. It's really
just a shorthand for the equivalent while loop as just shown.
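To check the equivalence concretely, the sketch below walks a made-up sequence with both loop forms and collects the same characters either way.

```perl
use strict;
use warnings;

my $DNA = 'ACGGT';    # made-up sequence

# for loop: initialization; test; increment, all in the parentheses.
my @from_for;
for (my $position = 0; $position < length $DNA; ++$position) {
    push @from_for, substr($DNA, $position, 1);
}

# Equivalent while loop: the same three parts, written separately.
my @from_while;
my $position = 0;
while ($position < length $DNA) {
    push @from_while, substr($DNA, $position, 1);
    ++$position;
}

print "for:   @from_for\n";
print "while: @from_while\n";
```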

The conditional test checks to see if the position reached in the string is less than the
length of the string. It uses the length Perl function. Obviously, you don't want to
check characters beyond the length of the string. But a word is in order here about
the numbering of positions in strings and arrays.

By default, Perl assumes that a string begins at position 0 and its last character is at a
position that's numbered one less than the length of the string. Why do it this way
instead of numbering the positions from 1 up to and including the length of the string?
There are reasons, but they're somewhat abstruse; see the documentation for
enlightenment. If it's any comfort, many other programming languages make the same
choice.

This way of numbering is important to biologists because they are used to numbering
sequences beginning with 1, not with 0 the way Perl does it. You sometimes have to
add 1 to a position before printing out results so they'll make sense to
nonprogrammers. It's mildly annoying, but you'll get used to it.

The same holds true for numbering the elements of an array. The first element of an
array is element 0; the last is element $length-1, where $length is the number of elements.

Anyway, you see that the conditional test evaluates to true while the value
of $position is length-1 or less and fails when $position reaches the same value as
the length of the string. For example, say you have a string that contains the text
"seeing." This has a length of six characters. The "s" is at position 0, and the "g" is at
position 5, which is one less than the string length 6.
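The "seeing" example in code: positions start at 0, the last character is at length-1, and adding 1 converts a Perl position into the 1-based numbering biologists expect. The array @bases is a made-up example of the same rule for arrays.

```perl
use strict;
use warnings;

my $word   = 'seeing';
my $length = length $word;                    # 6

my $first = substr($word, 0, 1);              # 's' is at position 0
my $last  = substr($word, $length - 1, 1);    # 'g' is at position 5

# Report the 'g' the way a biologist would count: starting from 1.
my $g_position = index($word, 'g');           # 0-based: 5
print "g is at position ", $g_position + 1, " (counting from 1)\n";

# Arrays work the same way: the last index is one less than the size.
my @bases      = qw(A C G T);
my $last_index = $#bases;                     # 3
print "last element: $bases[$last_index] at index $last_index\n";
```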

Back in the block, you call the substr function to look into the string:
$base = substr($DNA, $position, 1);
This is a fairly general-purpose function for working with strings; you can also insert
and delete things. Here, you look at just one character, so you call substr on the
string $DNA, ask it to look in position $position for one character, and save the result
in scalar variable $base.
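Since substr can also insert and delete, here is a short sketch on a made-up sequence using the four-argument form substr(STRING, POSITION, LENGTH, REPLACEMENT): a LENGTH of 0 inserts, and an empty REPLACEMENT deletes.

```perl
use strict;
use warnings;

my $DNA = 'AACCGGTT';    # made-up sequence

# Look: one character at position 2 (0-based).
my $base = substr($DNA, 2, 1);              # 'C'

# Insert: replacing 0 characters at position 4 inserts 'TTT' there.
my $inserted = $DNA;
substr($inserted, 4, 0, 'TTT');             # 'AACCTTTGGTT'

# Delete: replace 2 characters at position 0 with the empty string.
my $deleted = $DNA;
substr($deleted, 0, 2, '');                 # 'CCGGTT'

print "$base $inserted $deleted\n";
```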

Conversion of sequences into FASTA format.

http://www.bioon.com/book/biology/Beginning%20Perl%20for%20Bioinformatics/44.htm
