An introduction to Perl
Dr. Vipin Singh,
Asst. Professor,
Amity Institute of Biotechnology,
Amity University, Noida.
Perl Basics
• Practical Extraction and Report Language: Perl is a high-level, general-purpose, interpreted scripting language.
• Particularly strong at text processing
• Descendant of
– C, Lisp, shell scripting (sh), …
• Created by Larry Wall, 1987
Links
www.perl.org
documentation: perldoc.perl.org
software download: www.activestate.com
Perl – scripting language
A program written in a high-level language is called source code. The source code must be converted into machine code, and this is done by compilers and interpreters.
Interpreter vs. compiler
• Translation
– Interpreter: translates the program one statement at a time.
– Compiler: scans the entire program and translates it as a whole into machine code.
• Speed
– Interpreter: takes less time to analyze the source code, but overall execution is slower.
– Compiler: takes a long time to analyze the source code, but overall execution is comparatively faster.
• Memory
– Interpreter: generates no intermediate object code, so it is more memory efficient.
– Compiler: generates intermediate object code that must then be linked, so it needs more memory.
• Errors
– Interpreter: continues translating the program until the first error is met, and then stops. Hence debugging is easier.
– Compiler: reports errors only after scanning the whole program. Hence debugging is comparatively harder.
• Examples
– Interpreter: languages such as Perl and Python use interpreters. (Strictly, perl first compiles the whole script into an internal form and then runs it, so syntax errors are reported before any statement executes.)
– Compiler: languages such as C and C++ use compilers.
Perl – scripting language
A scripting language is a programming language whose programs are usually interpreted rather than compiled: a script is interpreted one command at a time. Scripting languages are
• often used to control an application
• typically not strongly typed
• flexible: scripts can be created, modified and executed at run time
Perl is a scripting language.
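As a small illustration of the last point, the sketch below builds a piece of Perl code as a string and runs it with a string eval while the program is already executing. The code string and variable names are illustrative only; saved to a file (for example a hypothetical demo.pl), it runs with perl demo.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A piece of Perl code held in an ordinary string...
my $code = 'my $x = 6; $x * 7';

# ...is compiled and run while the program is executing.
# A string eval returns the value of the last expression it evaluated.
my $result = eval $code;
die "eval failed: $@" if $@;

print "Result: $result\n";   # prints "Result: 42"
```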
Perl Applications
Perl is a general-purpose programming language originally developed for text manipulation and now used for a wide range of tasks, including system administration, web development, network programming, GUI development, and more.
• Used for
– text handling
– parsing
– data management
• Applications
– system administration
– server-side scripting in web applications, e.g. CGI scripts (Common Gateway Interface)
– network programming
– GUI (Graphical User Interface) development
Design Principles
• Stated goals
– practical (easy to use, efficient, complete) rather than beautiful (tiny, elegant, minimal)
– make easy tasks easy and difficult tasks possible
– things that are different should look different
• Well-known mottos
– "There's more than one way to do it"
– "The Swiss Army Chainsaw of Programming Languages"
– "No unnecessary limits"
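The motto "There's more than one way to do it" can be seen in a small sketch that counts the G bases in a made-up sequence in three different ways: with tr///, with a foreach loop over split, and with a global regular-expression match. All three give the same answer; which one to use is largely a matter of taste.

```perl
use strict;
use warnings;

my $dna = 'AGGTCGGA';   # made-up example sequence

# Way 1: tr/// returns how many characters it matched
my $count_tr = ($dna =~ tr/G//);

# Way 2: split into single characters and test each one
my $count_loop = 0;
foreach my $base (split //, $dna) {
    $count_loop++ if $base eq 'G';
}

# Way 3: a global match assigned to an empty list; the list
# assignment in scalar context yields the number of matches
my $count_re = () = $dna =~ /G/g;

print "$count_tr $count_loop $count_re\n";   # prints "4 4 4"
```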
Perl Features
• Procedural programming: routines, subroutines and functions
• Object-oriented programming (OOP): a programming model organized around "objects" rather than "actions", and data rather than logic
• Powerful built-in text processing with regular expressions
• A very large collection of third-party modules, distributed through CPAN (the Comprehensive Perl Archive Network)
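A minimal sketch of the first two styles, assuming a hypothetical gc_content subroutine and a hypothetical Sequence class (neither is a standard module): the subroutine is plain procedural code, while the class uses Perl's built-in package and bless mechanism to create objects.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Procedural style: a subroutine returning the GC content (%) of a sequence.
sub gc_content {
    my ($seq) = @_;                   # arguments arrive in the array @_
    return 0 unless length($seq);     # avoid dividing by zero
    my $gc = ($seq =~ tr/GCgc//);     # tr/// returns how many G/C it saw
    return 100 * $gc / length($seq);
}

# Object-oriented style: a minimal class built from a package and bless.
{
    package Sequence;

    sub new {
        my ($class, %args) = @_;
        my $self = { id => $args{id}, seq => $args{seq} };
        return bless $self, $class;   # associate the hash with the class
    }

    sub length_bp {
        my ($self) = @_;
        return length $self->{seq};
    }
}

my $pct = gc_content('ATGCGCAT');
my $obj = Sequence->new(id => 'seq1', seq => 'ATGCGCAT');
print "GC content: $pct%\n";                          # prints "GC content: 50%"
print "$obj->{id} is ", $obj->length_bp, " bp\n";     # prints "seq1 is 8 bp"
```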
Perl Data Types
Perl has three main variable types: scalars, arrays and hashes.
Scalars
A scalar represents a single value; scalar variable names begin with a $ sign:
my $animal = "camel";
my $answer = 42;
Scalar values can be strings, integers or floating-point numbers.
There is no need to declare a variable's type in advance, but under use strict you must declare each variable with the my keyword the first time you use it.
Scalar values can be used in various ways:
print $animal;
print "The animal is $animal\n";
print "The square of $answer is ", $answer * $answer, "\n";
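The statements above can be combined into one runnable script. The extra lines (the $text, $sum and $join variables are illustrative additions) show that Perl decides whether a scalar is treated as a number or as a string from the operator applied to it.

```perl
use strict;
use warnings;

my $animal = "camel";
my $answer = 42;

# The same scalar behaves as a number or a string depending on the operator.
my $text = "10";
my $sum  = $text + 5;    # numeric addition: 15
my $join = $text . 5;    # string concatenation: "105"

print "The animal is $animal\n";
print "The square of $answer is ", $answer * $answer, "\n";   # 1764
print "sum=$sum join=$join\n";                                # prints "sum=15 join=105"
```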
Arrays
An array represents a list of values; array variable names begin with an @ sign:
my @animals = ("camel", "llama", "owl");
my @numbers = (23, 42, 69);
my @mixed = ("camel", 42, 1.23);
Arrays are zero-indexed and represent an ordered collection of scalars.
An individual element of an array is a scalar, so it is accessed with the $ prefix:
print $animals[0]; # prints "camel"
print $animals[1]; # prints "llama"
Arrays
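my @dna = ("AAAA", "TTTT", "CCCC", "GGGG");
Arrays have no fixed size: they do not have to be predeclared, and they grow or shrink as necessary. An array used in scalar context gives its number of elements, so each of these sets $count to 4:
my $count = @dna;
$count = (@dna);
$count = scalar @dna;
Assigning beyond the end of an array extends it:
@dna = ();
$dna[1000] = "AAAA";
@dna is now 1001 elements long (indices 0 to 1000), of which only the last is populated; the other 1000 are undef.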
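The size rules above can be checked with a short sketch. It also uses $#dna, which gives the index of the last element (one less than the element count), and grep to count how many elements of the grown array are actually defined; @sparse is an illustrative name.

```perl
use strict;
use warnings;

my @dna = ("AAAA", "TTTT", "CCCC", "GGGG");

my $count      = @dna;    # scalar context: number of elements (4)
my $last_index = $#dna;   # index of the last element (3)

# Assigning to index 1000 of an empty array grows it to 1001 elements.
my @sparse;
$sparse[1000] = "AAAA";
my $size    = @sparse;                    # 1001
my $defined = grep { defined } @sparse;   # only 1 element is not undef

print "$count $last_index $size $defined\n";   # prints "4 3 1001 1"
```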
shift – removes and returns the first element of an array
pop – removes and returns the last element of an array
unshift – adds elements to the beginning of an array
push – adds elements to the end of an array
splice – removes a given number of elements starting at a given index (and can optionally insert replacement elements there)
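A short sketch of these five functions applied to the @dna array; the added sequences ATAT, CGCG and GCGC are made-up values. The comments track the contents of the array after each step.

```perl
use strict;
use warnings;

my @dna = ("AAAA", "TTTT", "CCCC", "GGGG");

my $first = shift @dna;        # removes "AAAA"; @dna = (TTTT, CCCC, GGGG)
my $last  = pop @dna;          # removes "GGGG"; @dna = (TTTT, CCCC)
unshift @dna, "ATAT";          # @dna = (ATAT, TTTT, CCCC)
push @dna, "CGCG", "GCGC";     # @dna = (ATAT, TTTT, CCCC, CGCG, GCGC)

# splice ARRAY, OFFSET, LENGTH: remove 2 elements starting at index 1
my @removed = splice @dna, 1, 2;   # @removed = (TTTT, CCCC); @dna = (ATAT, CGCG, GCGC)

print "first=$first last=$last\n";
print "removed: @removed\n";
print "now: @dna (", scalar @dna, " elements)\n";
```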
Hashes – Associative Arrays
A hash represents a set of key/value pairs; hash variable names begin with a % sign:
my %sequences = (
    'Vipin' => 'ATGC', 'Sachin' => 'AAGC', 'Sudeep' => 'AAAA',
);
print "$sequences{Sachin}\n"; # prints AAGC
Hashes represent an unordered collection of scalars, indexed by strings.
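A minimal sketch of common hash operations on %sequences: adding a pair (the name Rahul and the sequence GGCC are made up), deleting a pair, testing for a key with exists, and looping over the keys. Because the order of keys in a hash is not defined, the loop sorts them first.

```perl
use strict;
use warnings;

my %sequences = ('Vipin' => 'ATGC', 'Sachin' => 'AAGC', 'Sudeep' => 'AAAA');

$sequences{Rahul} = 'GGCC';     # add a pair (Rahul/GGCC is a made-up example)
delete $sequences{Sudeep};      # remove a pair by its key

# exists checks whether a key is present
print "Vipin is present\n" if exists $sequences{Vipin};

# Hashes are unordered, so sort the keys for a predictable order
foreach my $name (sort keys %sequences) {
    print "$name => $sequences{$name}\n";
}
```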
Variables
• The prefix (sigil) identifies the data type of a variable
– $ scalar
my $DNA = 'ATGCATGC';
– @ array
my @names = ("Vipin", "Sachin", "Sudeep");
my @sequences = ("ATGC", "AAGC", "AAAA");
print "$names[1],$sequences[1]\n"; # prints Sachin,AAGC
– % hash
my %sequences = ('Vipin' => 'ATGC', 'Sachin' => 'AAGC', 'Sudeep' => 'AAAA');
• my declares a lexically scoped variable; without my, a variable is a global (package) variable, which use strict rejects unless it is declared (e.g. with our) or fully qualified
• Variables "interpolate" into double-quoted strings: print "$DNA"; prints the value of $DNA, whereas print '$DNA'; prints the literal text $DNA
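The sketch below shows both points: a my variable declared inside a block is visible only in that block (the outer $DNA is untouched), and only double-quoted strings interpolate variables. The variable names $double and $single are illustrative.

```perl
use strict;
use warnings;

my $DNA = 'ATGCATGC';

my $double = "Sequence: $DNA";   # double quotes interpolate: "Sequence: ATGCATGC"
my $single = 'Sequence: $DNA';   # single quotes do not: the text stays "$DNA"

{
    my $DNA = 'TTTT';               # a new lexical $DNA, visible only in this block
    print "inside block: $DNA\n";   # prints "inside block: TTTT"
}
print "outside block: $DNA\n";      # prints "outside block: ATGCATGC"
print "$double\n$single\n";
```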
Operators
• Arithmetic and relational
– Like Java: +, -, *, /, ==, !=, <, >, <=, >= (the comparisons treat their operands as numbers)
• Boolean
– Like Java: !, &&, ||
– Also: not, and, or (same meaning, but lower precedence)
• String
– Comparisons: eq, ne, lt, gt, le, ge
– Concatenation: .
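A small sketch of why Perl has separate numeric and string comparison operators: the same two values can be equal as numbers but different as strings. The values and variable names are illustrative.

```perl
use strict;
use warnings;

my $x = 10;        # a number
my $y = "10.0";    # a string that looks like the same number

# Numeric operators compare values as numbers...
my $num_equal = ($x == $y) ? "yes" : "no";    # "yes": 10 == 10.0
# ...string operators compare them character by character.
my $str_equal = ($x eq $y) ? "yes" : "no";    # "no": "10" is not "10.0"

my $codon = "AT" . "G";                                      # concatenation: "ATG"
my $both  = ($x > 5 && $codon eq "ATG") ? "true" : "false";  # Boolean AND

print "numeric equal: $num_equal, string equal: $str_equal\n";
print "codon: $codon, both: $both\n";
```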
Selections
• If-clause
if (condition)
{...}
elsif (another_condition)
{...}
else {...}
• If-not-clause
unless (condition) {...}   # same as if (!condition) {...}
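A minimal sketch using both forms on a made-up sequence; the length thresholds 5 and 10 are arbitrary choices for illustration.

```perl
use strict;
use warnings;

my $dna = 'ATGCGT';        # made-up example sequence
my $len = length $dna;
my $category;

if ($len < 5) {
    $category = "short";
}
elsif ($len < 10) {
    $category = "medium";
}
else {
    $category = "long";
}

# unless runs its block when the condition is false
unless ($dna =~ /N/) {
    print "No ambiguous (N) bases found\n";
}

print "$dna is $category ($len bp)\n";   # prints "ATGCGT is medium (6 bp)"
```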
Loops
Loops are used when part of a program is to be executed multiple times, until a condition is satisfied. Perl has three kinds of loop:
• while and until loops
while (condition) {...}   # repeats while the condition is true
until (condition) {...}   # repeats while the condition is false
• for loop
for (my $i = 0; $i <= $max; $i++) {...}
• foreach loop
foreach (@array) {   # the default variable $_ contains the current element
    print "This element is $_\n";
}
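The sketch below runs each kind of loop over a small, made-up list of codons. In the foreach loop it uses a named loop variable (my $codon) instead of $_, which is often clearer.

```perl
use strict;
use warnings;

my @codons = ("ATG", "GCC", "TAA");

# while: repeat as long as the condition is true
my $i = 0;
while ($i < @codons) {        # @codons in numeric context is its length (3)
    $i++;
}

# until: repeat as long as the condition is false
my $j = 0;
until ($j >= 3) {
    $j++;
}

# C-style for loop over the indices
my $joined = "";
for (my $k = 0; $k <= $#codons; $k++) {   # $#codons is the last index (2)
    $joined .= $codons[$k];
}

# foreach with an explicit loop variable instead of $_
my $count = 0;
foreach my $codon (@codons) {
    $count++ if $codon ne "TAA";   # count codons other than the stop codon TAA
}

print "$i $j $joined $count\n";    # prints "3 3 ATGGCCTAA 2"
```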
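Split – converting a string into an array
split /PATTERN/, EXPR
split breaks the string EXPR apart wherever PATTERN matches and returns the list of pieces, which can be assigned to an array.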
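A few illustrative uses of split on made-up data: splitting on a comma, splitting a sequence into single bases with an empty pattern, and the special pattern ' ', which splits on runs of whitespace and ignores leading whitespace.

```perl
use strict;
use warnings;

my $line   = "Vipin,ATGC,4";
my @fields = split /,/, $line;     # ("Vipin", "ATGC", "4")

my $dna   = "ATGC";
my @bases = split //, $dna;        # empty pattern: ("A", "T", "G", "C")

my $text  = "  gene1   gene2 gene3";
my @genes = split ' ', $text;      # special case ' ': ("gene1", "gene2", "gene3")

print "fields: @fields\n";
print "bases: @bases\n";
print scalar(@genes), " genes\n";  # prints "3 genes"
```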
FILE HANDLING – Reading from a file
In Perl, a file is accessed through a filehandle: a named connection to the file through which it can be read, written or appended.
A filehandle can have any name; by convention, bareword filehandles are written in upper case.
The file is first opened in read mode, e.g.
open (FILE_HANDLE, "DNA.txt");
or open (FH, "DNA.txt");
or open (VIPIN1, "DNA.txt");
It is always good practice to check that the file was opened successfully and to stop with a message if it was not (the special variable $! holds the reason), e.g.
open (FILE_HANDLE, "DNA.txt") || die "Cannot open DNA.txt: $!";
READING A FILE
open (FILE_HANDLE, "DNA.txt") || die "Cannot open DNA.txt: $!";
Once the file is open, it can be read through the filehandle line by line or all at once.
Reading a file – line by line
while (my $line = <FILE_HANDLE>)
{
    print $line;
}
Reading a file in full (in list context, <FILE_HANDLE> returns all remaining lines)
my @lines = <FILE_HANDLE>;
When you are finished, close the file with close (FILE_HANDLE);
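A self-contained sketch of both reading styles. So that it runs without an existing DNA.txt, it first writes a small temporary file with made-up content using the core File::Temp module. It also uses the three-argument form of open with a lexical filehandle ($in), a widely recommended alternative to the two-argument form with bareword handles shown above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small throwaway input file so the example runs on its own
# (in practice this would be an existing file such as DNA.txt).
my ($tmp, $path) = tempfile(UNLINK => 1);
print $tmp "ATGC\nGGCC\nTTAA\n";
close $tmp;

# Line by line: <$in> returns one line per call, and undef at end of file.
open(my $in, '<', $path) or die "Cannot open $path: $!";
my $total = 0;
while (my $line = <$in>) {
    chomp $line;                  # remove the trailing newline
    $total += length $line;
}
close $in;

# All at once: in list context <$in> returns every remaining line.
open($in, '<', $path) or die "Cannot open $path: $!";
my @lines = <$in>;
close $in;

print scalar(@lines), " lines, $total bases\n";   # prints "3 lines, 12 bases"
```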
WRITING TO A FILE
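Files can be written in two modes – 'write' and 'append'
1. Write mode – the file is opened with '>' to write the results; a new file is created, or the contents of an existing file are erased
open (FILE_HANDLE, ">RESULTS.txt") || die "Cannot open file: $!";
2. Append mode – the file is opened with '>>' and new output is added to the end of its existing contents (the file is created if it does not exist)
open (FILE_HANDLE, ">>RESULTS.txt") || die "Cannot open file: $!";
To write or append to a file, print to the filehandle (note that there is no comma after the filehandle name):
print FILE_HANDLE "………….";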
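A sketch of write and append modes, using a temporary directory from File::Temp so that no real RESULTS.txt is overwritten; the text written is made up. Reading the file back at the end shows that the appended line follows the original one.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);    # scratch directory, removed at exit
my $file = "$dir/RESULTS.txt";

# Write mode '>': creates the file, or empties it if it already exists.
open(my $out, '>', $file) or die "Cannot open $file: $!";
print $out "GC content: 50%\n";      # no comma between filehandle and text
close $out;

# Append mode '>>': keeps the existing contents and adds to the end.
open($out, '>>', $file) or die "Cannot open $file: $!";
print $out "Length: 8 bp\n";
close $out;

# Read the result back to check what was written.
open(my $in, '<', $file) or die "Cannot open $file: $!";
my @lines = <$in>;
close $in;
print @lines;
```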
Regular Expressions
A regular expression is a pattern (a template) to be matched against a string.
Matching a regular expression against a string either succeeds or
fails.
Regular expressions let you easily manipulate strings of all sorts,
such as DNA and protein sequence data.
Regular expressions can be as simple as a word, which matches the
word itself, or they can be complex and made to match a large set of
different words (or even every word!).
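A minimal sketch of a match succeeding and failing on a made-up sequence. The first pattern is a literal word (GAATTC, the recognition site of the restriction enzyme EcoRI); the second uses a character class and a repeat count to describe a whole set of strings.

```perl
use strict;
use warnings;

my $dna = "CCGAATTCGG";   # made-up example sequence

# A simple pattern: the literal text GAATTC
my $has_site = ($dna =~ /GAATTC/) ? 1 : 0;       # succeeds: 1

# A more general pattern: ATG followed by any three bases
my $has_atg = ($dna =~ /ATG[ACGT]{3}/) ? 1 : 0;  # fails: there is no ATG, so 0

print "EcoRI site: $has_site, ATG+3: $has_atg\n";
```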
Regular Expressions
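There are three regular expression operators within Perl:
Match m//
Substitute s///
Transliterate tr///
The forward slashes in each case act as delimiters for the regular expression (or, for tr///, the lists of characters) that you are specifying.
The binding operator =~ applies one of these operators to the string on its left (without it, they act on the default variable $_); !~ does the same for a match but returns the logical negation, i.e. it is true when the pattern does not match.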
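A short sketch of the binding operators on a made-up sequence, together with tr///, which here both counts characters and produces the complementary strand. The idiom (my $comp = $dna) =~ ... copies $dna first, so the original string is left unchanged.

```perl
use strict;
use warnings;

my $dna = "ATGCTTAGT";   # made-up example sequence

# =~ binds a match to $dna; !~ is true when the pattern does NOT match
my $has_ctt = ($dna =~ m/CTT/) ? "yes" : "no";   # "yes"
my $no_n    = ($dna !~ m/N/)   ? "yes" : "no";   # "yes": there is no N

# tr/// works on lists of characters, not full regular expressions
my $t_count = ($dna =~ tr/T//);         # counts T's without changing $dna: 4
(my $comp = $dna) =~ tr/ACGT/TGCA/;     # copy, then complement each base

print "$has_ctt $no_n $t_count $comp\n";   # prints "yes yes 4 TACGAATCA"
```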
match Operator
The match operator, m//, tests whether a string matches a regular expression. The m can be omitted when forward slashes are used as delimiters, e.g. $dna =~ /ATG/
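A sketch of two common uses of m// on a made-up sequence: capturing part of the match into $1 with parentheses, and finding every occurrence with the /g modifier, using pos() to get the offset at which each match ended.

```perl
use strict;
use warnings;

my $dna = "ATGAAATGCCATG";   # made-up example sequence

# Parentheses in the pattern capture the matched text into $1
my $first_hit = "";
if ($dna =~ m/(ATG[ACGT]{3})/) {
    $first_hit = $1;                   # "ATGAAA"
}

# With slash delimiters the m is optional; /g in a while loop finds every match
my @positions;
while ($dna =~ /ATG/g) {
    push @positions, pos($dna) - 3;    # pos() is the offset just after the match
}

print "First ATG + codon: $first_hit\n";   # prints "First ATG + codon: ATGAAA"
print "ATG found at: @positions\n";        # prints "ATG found at: 0 5 10"
```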
substitute Operator
The substitution operator, s///, is an extension of the match operator that replaces the matched text with new text.
The basic form of the operator is –
s/PATTERN/REPLACEMENT/
e.g. $dna =~ s/T/U/; – the first T in the string is replaced by U
or $dna =~ s/T/U/g; – all Ts are replaced by U (g = global)
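A sketch contrasting s/// with and without /g on a made-up sequence. It also shows that s///g returns the number of substitutions it made, and that the i modifier makes the match case-insensitive.

```perl
use strict;
use warnings;

my $dna = "ATTGCT";   # made-up example sequence

(my $first = $dna) =~ s/T/U/;     # only the first T: "AUTGCT"
(my $all   = $dna) =~ s/T/U/g;    # every T (g = global): "AUUGCU"

# s///g returns the number of substitutions made
my $copy = $dna;
my $n = ($copy =~ s/T/U/g);       # 3

# The i modifier makes the match case-insensitive
(my $mixed = "attGct") =~ s/t/U/gi;   # "aUUGcU"

print "$first $all $n $mixed\n";
```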
substitute Operator