Indian Institute of Technology Kharagpur
CGI Scripts
Prof. Indranil Sen Gupta
Dept. of Computer Science & Engg.
I.I.T. Kharagpur, INDIA
Lecture 19: CGI Scripts
On completion, the student will be able to:
• Explain the basic structure of CGI scripts and how they work.
• Explain the different ways in which form data
can be passed on to a CGI script program.
• Illustrate the URL encoding/decoding issues.
• State the standard environment variables
used in CGI scripts.
• Explain how the response is sent back to the web client after the CGI script finishes execution.
Introduction
• CGI stands for Common Gateway Interface.
  - Allows interactive web pages to be written: each page is created dynamically, based on the user's request.
  - CGI programs are called “scripts” because the first CGI programs were written as UNIX shell scripts and in Perl. They can, however, be written in almost any language.
  - CGI programs usually reside in a special directory on the web server (typically “cgi-bin”).
• Apache directory structure: a case study
  - cgi-bin
    Most of the interactive programs reside here. They may be written in Perl, Java, or any other programming language.
  - conf
    Contains the configuration files.
  - htdocs
    Contains the actual HTML documents, typically organized into many subdirectories. This directory is known as the DocumentRoot.
  - icons
    Contains the icons that Apache uses when displaying information or error messages.
  - images
    Contains the image files used in the web site.
  - logs
    Contains the log files: access_log and error_log.
Structure of CGI Script
• When a CGI script is invoked, the server passes the request information to the script in one of two ways, depending on the request method:
  a) GET
  b) POST
• The request method used is passed to the script via the environment variable REQUEST_METHOD.
“GET” Request Method
• The GET method sends request information as parameters appended to the end of the URL, after a ‘?’:
http://myserver.edu/cgi-bin/myprog.pl?name=niloy&rollno=7312&age=24
• The parameters are passed to the CGI program via the environment variable QUERY_STRING.
  - For the above example, QUERY_STRING will contain
    name=niloy&rollno=7312&age=24
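To make the GET case concrete, here is a minimal sketch in bash (an assumption; the lecture's own examples use the Bourne shell) that splits QUERY_STRING on ‘&’ and prints each name with its value. The function name `print_pairs` is hypothetical, values are left URL-encoded, and every pair is assumed to contain an ‘=’.

```bash
#!/bin/bash
# Sketch: split a GET query string into name=value pairs.
# Assumes bash; print_pairs is a hypothetical helper name.
# Values are printed still URL-encoded; each pair is assumed to contain '='.

print_pairs() {
    local pair pairs
    # split on '&' into an array (read -a avoids glob expansion of values)
    IFS='&' read -r -a pairs <<< "$1"
    for pair in "${pairs[@]}"; do
        # text before the first '=' is the name, the rest is the value
        printf '%s -> %s\n' "${pair%%=*}" "${pair#*=}"
    done
}

echo "Content-type: text/plain"
echo ""                         # blank line ends the header
print_pairs "$QUERY_STRING"     # set by the web server for GET requests
```

With QUERY_STRING set to name=niloy&rollno=7312&age=24, the body would contain the three lines `name -> niloy`, `rollno -> 7312` and `age -> 24`.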
“POST” Request Method
• The form data is passed from the server to the CGI script through STDIN.
• The environment variable CONTENT_LENGTH gives the size of the incoming data in bytes.
• The format of the POST-ed data is:
  var1=value1&var2=value2&...
• The script must examine the REQUEST_METHOD environment variable to know whether or not to read from STDIN.
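For POST, the script should read exactly CONTENT_LENGTH bytes rather than read until end-of-file. A minimal bash sketch, assuming a hypothetical helper name `read_post_body` and using the POSIX `dd` utility to copy a fixed number of bytes from STDIN:

```bash
#!/bin/bash
# Sketch: read a POST body of exactly CONTENT_LENGTH bytes from STDIN.
# Assumes bash; read_post_body is a hypothetical helper name.

read_post_body() {
    local len=${CONTENT_LENGTH:-0}
    # reject anything that is not a plain non-negative integer
    case $len in
        ''|*[!0-9]*) return 1 ;;
    esac
    # copy len bytes (one byte per block) and stop there,
    # instead of waiting for end-of-file
    dd bs=1 count="$len" 2>/dev/null
}
```

A script would typically capture the result with `body=$(read_post_body)` after confirming that REQUEST_METHOD is POST.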
To Summarize
• For GET
  - Data are read from the QUERY_STRING environment variable.
• For POST
  - Data are read from STDIN.
  - The number of bytes to be read is obtained from CONTENT_LENGTH.
• In both cases, the data are available in the same format:
  var1=value1&var2=value2&...
  e.g. name=niloy&rollno=7312&age=24
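The summary can be folded into one function that chooses the source by REQUEST_METHOD and returns the raw, still URL-encoded string. This is a sketch assuming bash; `get_form_data` is a hypothetical name.

```bash
#!/bin/bash
# Sketch: fetch the raw form data for either request method.
# Assumes bash; get_form_data is a hypothetical helper name.

get_form_data() {
    case $REQUEST_METHOD in
        GET)
            printf '%s' "$QUERY_STRING" ;;            # data from the URL
        POST)
            local len=${CONTENT_LENGTH:-0}
            case $len in ''|*[!0-9]*) return 1 ;; esac
            dd bs=1 count="$len" 2>/dev/null ;;       # exactly len bytes from STDIN
        *)
            return 1 ;;                               # missing or unsupported method
    esac
}
```

Because both branches yield the same var1=value1&... format, the rest of the script can parse the result without caring which method was used.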
URL Encoding
• For platform independence, all form data passed to the server are URL-encoded.
  - Variables are separated by ‘&’.
  - Special characters (including ‘&’ itself when it occurs inside a value) are escaped as ‘%’ followed by a 2-digit hex number, e.g.
    %25 → ‘%’
    %20 → ‘ ’
  - A ‘+’ sign is interpreted as a space character.
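The encoding rules can be sketched as a small bash function. This is a conservative variant (an assumption, not the only valid choice): it leaves only letters, digits and ‘- _ . ~’ unescaped, turns a space into ‘+’, and writes every other byte as %XX. The name `url_encode` is hypothetical, and setting LC_ALL=C makes bash handle multi-byte characters one byte at a time.

```bash
#!/bin/bash
# Sketch of form-style URL encoding. Assumes bash; url_encode is a
# hypothetical helper name. LC_ALL=C makes ${#s} and ${s:i:1} work on
# bytes, so multi-byte characters are encoded byte by byte.
export LC_ALL=C

url_encode() {
    local s=$1 out='' c i
    for (( i = 0; i < ${#s}; i++ )); do
        c=${s:i:1}
        case $c in
            [A-Za-z0-9._~-]) out+=$c ;;                  # safe: copied as-is
            ' ')             out+='+' ;;                 # space becomes '+'
            *)               printf -v c '%%%02X' "'$c"  # byte value as %XX
                             out+=$c ;;
        esac
    done
    printf '%s\n' "$out"
}
```

For example, `url_encode 'a b&c=100%'` prints `a+b%26c%3D100%25`.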
• The process of decoding back:
  - Separate out the variables.
  - Replace all ‘+’ signs by spaces.
  - Replace every %## with the corresponding character.
• Which characters are encoded?
  - Control characters: 0x00 through 0x1F, and 0x7F.
  - 8-bit characters: 0x80 through 0xFF.
  - The space character (0x20), sent as ‘+’ or %20.
  - Characters given special meaning within URLs: ; / ? : @ & = + $ ,
  - Characters often used to delimit URLs: < > # % "
  - Characters considered unsafe because they may have special meaning for other protocols: { } | \ ^ [ ] `
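The decoding steps above can be sketched in bash as follows. The function (hypothetical name `url_decode`) handles one name or value at a time, so it assumes the string has already been split on ‘&’; splitting first matters, because a decoded %26 would otherwise be mistaken for a separator. A malformed escape such as %zz is kept literally, which is a design choice of this sketch.

```bash
#!/bin/bash
# Sketch: decode one URL-encoded name or value.
# Assumes bash and that the data was already split on '&'.
# url_decode is a hypothetical helper name.

url_decode() {
    local s="${1//+/ }" out='' hex ch    # every '+' becomes a space
    while [[ $s == *%* ]]; do
        out+=${s%%\%*}                   # copy the text before the next '%'
        s=${s#*%}                        # drop that text and the '%'
        hex=${s:0:2}
        if [[ $hex == [[:xdigit:]][[:xdigit:]] ]]; then
            printf -v ch '%b' "\\x$hex"  # %XX -> the byte with hex value XX
            out+=$ch
            s=${s:2}
        else
            out+=%                       # malformed escape: keep the '%'
        fi
    done
    printf '%s\n' "$out$s"
}
```

For example, `url_decode 'Hello+World%21%20100%25'` prints `Hello World! 100%`.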
• A point to note:
  - When the server passes data using the POST method, the script checks the environment variable CONTENT_TYPE.
  - If the value of CONTENT_TYPE is
    application/x-www-form-urlencoded
    the data needs to be decoded before use.
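A minimal bash sketch of this check, with the hypothetical name `is_urlencoded`. It strips any parameters after a ‘;’ (for example a charset) before comparing; forms that upload files are sent as multipart/form-data instead, which this test correctly rejects.

```bash
#!/bin/bash
# Sketch: is the POST body URL-encoded? Assumes bash;
# is_urlencoded is a hypothetical helper name.

is_urlencoded() {
    # keep only the media type, dropping parameters such as "; charset=UTF-8"
    case ${CONTENT_TYPE%%;*} in
        application/x-www-form-urlencoded) return 0 ;;
        *)                                 return 1 ;;
    esac
}
```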
Basic Structure of CGI Script
• Step 1: Initialization
  - Check REQUEST_METHOD.
  - Read the input string (from QUERY_STRING for “GET”, from STDIN for “POST”) and extract the variables.
  - Check CONTENT_TYPE to find out whether the string is URL-encoded.
• Step 2: Processing
  - Process the input data.
  - Output the results (a MIME-type header, followed by the contents).
• Step 3: Termination
  - Release the system resources.
  - Terminate the program.
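The three steps can be laid out as a skeleton script. This is a bash sketch under stated assumptions: the function and variable names (`init`, `process`, `cleanup`, `DATA`, `ENCODED`) are hypothetical, and the "processing" is only a placeholder that reports what was received.

```bash
#!/bin/bash
# Skeleton of the three steps (sketch; assumes bash). All names are
# hypothetical; the processing step only reports what was received.

init() {                                   # Step 1: initialization
    DATA=''
    case $REQUEST_METHOD in
        GET)  DATA=$QUERY_STRING ;;
        POST) local len=${CONTENT_LENGTH:-0}
              case $len in
                  *[!0-9]*) ;;             # malformed length: leave DATA empty
                  *) DATA=$(dd bs=1 count="$len" 2>/dev/null) ;;
              esac ;;
    esac
    case ${CONTENT_TYPE%%;*} in            # GET requests carry no CONTENT_TYPE
        ''|application/x-www-form-urlencoded) ENCODED=yes ;;
        *)                                    ENCODED=no ;;
    esac
}

process() {                                # Step 2: processing and output
    printf 'Content-Type: text/plain\n\n'  # MIME header, then a blank line
    printf 'Method: %s\nURL-encoded: %s\nData: %s\n' \
        "${REQUEST_METHOD:-none}" "$ENCODED" "$DATA"
}

cleanup() {                                # Step 3: termination
    unset DATA ENCODED                     # nothing else to release here
}

init
process
cleanup
```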
Environment Variables Used
• CONTENT_LENGTH
  - Length of the URL-encoded data in bytes (POST).
• CONTENT_TYPE
  - MIME type of the data sent with the request, e.g. application/x-www-form-urlencoded.
• QUERY_STRING
  - Information at the end of the URL, after the ‘?’.
• REMOTE_ADDR
  - IP address of the client making the request.
• REMOTE_HOST
  - Resolved host name of the client.
• REQUEST_METHOD
  - “GET” or “POST”.
• SERVER_NAME
  - The web server's host name or IP address.
• SERVER_PROTOCOL
  - Name and version of the protocol used by the request, e.g. HTTP/1.0.
• SERVER_PORT
  - Port number on the server that received the HTTP request.
• SCRIPT_NAME
  - Name (URL path) of the CGI script being run.
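A quick way to see these variables is a diagnostic script that prints them back as plain text. A bash sketch, assuming a hypothetical function name `show_cgi_env`; it uses bash indirect expansion `${!var}` to look up each variable by name.

```bash
#!/bin/bash
# Sketch: report the standard CGI variables listed above.
# Assumes bash; show_cgi_env is a hypothetical helper name.

show_cgi_env() {
    local var
    for var in CONTENT_LENGTH CONTENT_TYPE QUERY_STRING REMOTE_ADDR \
               REMOTE_HOST REQUEST_METHOD SERVER_NAME SERVER_PROTOCOL \
               SERVER_PORT SCRIPT_NAME; do
        printf '%s=%s\n' "$var" "${!var}"   # ${!var}: value of the variable named in $var
    done
}

echo "Content-type: text/plain"
echo ""
show_cgi_env
```

Variables the server did not set are printed with an empty value; REMOTE_HOST, for instance, is often empty when the server does not perform reverse DNS lookups.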
Response Header
• The most common response header is Content-Type, whose value is a MIME type.
• Typical values are:
  Content-Type: text/plain
  Content-Type: text/html
  Content-Type: image/gif
  Content-Type: video/avi
• A complete MIME header looks like this:
  Content-Type: text/plain; charset=US-ASCII
  Content-Transfer-Encoding: 7bit
  Content-Description: Postscript
• The header lines are followed by a blank line, after which the contents are sent.
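The header-then-blank-line rule can be captured in a tiny bash helper. This is a sketch: `send_header` is a hypothetical name, and it defaults to text/html when no type is given.

```bash
#!/bin/bash
# Sketch: emit a CGI response header followed by the required blank line.
# Assumes bash; send_header is a hypothetical helper name.

send_header() {
    local type=${1:-text/html}        # default type is an assumption
    printf 'Content-Type: %s\n' "$type"
    printf '\n'                       # blank line: end of header, start of body
}

send_header 'text/plain; charset=US-ASCII'
echo "Hello from a CGI script"
```

Forgetting the blank line is a common cause of "malformed header" errors, because the server then reads the first line of the body as another header.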
CGI Real-life Examples
• Search Engine
• Page-hit Counter
• Student Registration
• On-line Booking of Tickets
• On-line Purchase of Items
• E-mail Gateways
• Feedback Scripts
• Web-based Games
Security with CGI Scripts
• A CGI script is a program that anyone in the world can run on your machine.
• Do not trust the user input.
  - In particular, do not put user data in a shell command without verifying the data carefully.
  - An example follows.
• An example
  - Suppose that you have a CGI script that lets users run the “finger” command on your host.
  - In Perl, there may be a line such as:
    system("finger $username");
  - A malicious user may enter
    isg; rm -r /
    as the username.
  - The shell then runs finger isg followed by rm -r /, which tries to delete every file that the web server's user is allowed to delete.
[Figure: a form field labelled “Enter UserId” containing isg; rm -r /]
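One way to defend the finger example is to check the input against an allow-list before using it, and to pass it as a single argument that no shell re-parses (in Perl, the list form `system("finger", $username)` has the same effect). A bash sketch, assuming hypothetical names `valid_username` and `safe_finger`, and assuming that valid user names contain only letters, digits, ‘_’ and ‘-’ and do not start with ‘-’; adjust the rule to the local policy.

```bash
#!/bin/bash
# Sketch of a safer finger wrapper. Assumes bash; valid_username and
# safe_finger are hypothetical names, and the allowed-character rule
# is an assumption about local user names.

valid_username() {
    case $1 in
        ''|-*|*[!a-zA-Z0-9_-]*) return 1 ;;   # empty, option-like, or bad character
        *)                      return 0 ;;
    esac
}

safe_finger() {
    valid_username "$1" || { echo "Invalid user name"; return 1; }
    # "$1" reaches finger as one argument; no shell re-parses it,
    # so ';' could not start a second command
    finger "$1"
}
```

With the malicious input shown above, `safe_finger 'isg; rm -r /'` prints `Invalid user name` and never runs finger.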
An Example CGI Program
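• Using a shell script:
#!/bin/sh
# Send a file from the server, sorted line by line, as plain text.
# The file name arrives as the first command-line argument ($1).
CAT=/bin/cat
echo "Content-type: text/plain"
echo ""                        # blank line ends the header
if [ -x "$CAT" ]
then
    "$CAT" "$1" | sort
else
    echo "Cannot find command on this system."
fi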
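• What does this program do?
  - Sends the contents of a file residing on the server, sorted line by line, back to the browser.
• How to invoke?
<A HREF="/cgi-bin/test1.sh?/home/user1/public_html/text-file.txt">
Click here to activate</A>
  - The text after ‘?’ contains no ‘=’, so servers such as Apache pass it to the script as a command-line argument, which the script reads as $1.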
Another Example
#!/bin/sh
# Wrap a text file in an HTML page and send it to the browser.
echo "Content-type: text/html"
echo ""
/bin/cat << EOM
<HTML>
<HEAD>
<TITLE>File Output: /home/user1/public_html/text-file.txt
</TITLE>
</HEAD>
<BODY bgcolor="#cccccc" text="#000000">
<HR SIZE=5>
<H1>File Output: /home/user1/public_html/text-file.txt </H1>
<HR SIZE=5> <P>
<SMALL>
<PRE>
EOM
/bin/cat /home/user1/public_html/text-file.txt
/bin/cat << EOM
</PRE>
</SMALL> <P>
</BODY>
</HTML>
EOM
• What does this program do?
  - Outputs the contents of the file “text-file.txt” as an HTML page.
• How to invoke?
  - Through a dummy HTML form.
  - Through the following link:
    <A HREF="/cgi-bin/test2.sh">Click here</A>
E-mail Gateways: an Example
• E-mail gateways are very popular on the web.
• They allow users to send and receive mail without having to worry about managing a mail server.
• They can be designed using CGI scripts, or other similar technologies.
• Popular e-mail gateways: Yahoo! Mail, Rediffmail, Hotmail, Gmail, etc.
[Figure: Browser ↔ Email Gateway ↔ Mail Server]
Writing CGI Scripts using Perl
• Will be discussed later,
  - after discussing the syntax and semantics of Perl.
  - We will see how the form data can be extracted and processed; this requires string manipulation.
SOLUTIONS TO QUIZ QUESTIONS ON LECTURE 18
Quiz Solutions on Lecture 18
1. What is a hot spot?
A hot spot is a defined region on an
image map which, when clicked,
hyperlinks to a specified URL.
2. What is the essential difference between
client-side and server-side image maps?
In a server-side image map, the mouse click is processed on the server side. In a client-side image map, all the information is in the HTML file itself, so the click can be processed locally by the browser.
3. What information does the image map
configuration file contain?
Default URL, an optional base URL,
and the geometries of the hot spots.
4. What is the purpose of the default URL
in case of server-side image map?
It specifies the URL where the user will
be taken if he/she clicks on a region
which is not a hot spot.
5. Why is client-side image map faster and
puts less load on the server?
Because all processing is done locally on
the browser.
6. Why is the ISMAP attribute used?
To indicate that the included image is a clickable (server-side) image map.
7. Why is the USEMAP attribute used?
To associate the image with a client-side image map, i.e. to link it to the <MAP> element that defines the hot spots.
8. Show a client-side image map configuration
specification where there are four triangular
shaped areas joined together to form a
square shaped structure.
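[Figure: a 50×50 square with corners (0,0), (50,0), (0,50) and (50,50), divided at its centre (25,25) into four triangles labelled TOP, LEFT, RIGHT and BOTTOM.]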
<MAP NAME="demo_map">
<AREA SHAPE=POLY COORDS="0,0,0,50,25,25"
HREF="left.html">
<AREA SHAPE=POLY COORDS="50,0,50,50,25,25"
HREF="right.html">
<AREA SHAPE=POLY COORDS="0,0,50,0,25,25"
HREF="top.html">
<AREA SHAPE=POLY COORDS="0,50,50,50,25,25"
HREF="bottom.html">
</MAP>
QUIZ QUESTIONS ON LECTURE 19
Quiz Questions on Lecture 19
1. What does the REQUEST_METHOD
environment variable specify?
2. How does the form data get accessed in
GET, and in what form?
3. How does the form data get accessed in
POST?
4. Why is the POST method more desirable as
compared to GET in general?
5. Perform URL encoding on the following
string:
http://xyz.com?name=Subir Das
6. How does the CGI script know that the
form data as received has been URL
encoded?
7. What is the function of the UNIX command
“finger”?
8. Write a CGI program using shell script
which will send back the message
“THANK YOU FOR SUBMITTING” every
time a form is submitted to it.