Python Network Programming

David M. Beazley
http://www.dabeaz.com

Edition: Thu Jun 17 19:49:58 2010

Copyright (C) 2010


David M Beazley
All Rights Reserved

Python Network Programming : Table of Contents

1. Network Fundamentals
2. Client Programming
3. Internet Data Handling
4. Web Programming Basics
5. Advanced Networks

Slide Title Index

0. Introduction
Introduction
Support Files
Python Networking
This Course
Standard Library
Prerequisites

1. Network Fundamentals
Network Fundamentals
The Problem
Two Main Issues
Network Addressing
Standard Ports
Using netstat
Connections
Client/Server Concept
Request/Response Cycle
Using Telnet
Data Transport
Sockets
Socket Basics
Socket Types
Using a Socket
TCP Client
Exercise 1.1
Server Implementation
TCP Server
Exercise 1.2
Advanced Sockets
Partial Reads/Writes
Sending All Data
End of Data
Data Reassembly
Timeouts
Non-blocking Sockets
Socket Options
Sockets as Files
Exercise 1.3
Odds and Ends
UDP : Datagrams
UDP Server
UDP Client
Unix Domain Sockets
Raw Sockets
Sockets and Concurrency
Threaded Server
Forking Server (Unix)
Asynchronous Server
Utility Functions
Omissions
Discussion

2. Client Programming
Client Programming
Overview
urllib Module
urllib protocols
HTML Forms
Web Services
Parameter Encoding
Sending Parameters
Response Data
Response Headers
Response Status
Exercise 2.1
urllib Limitations
urllib2 Module
urllib2 Example
urllib2 Requests
Requests with Data
Request Headers
urllib2 Error Handling
urllib2 Openers
urllib2 build_opener()
Example : Login Cookies
Discussion
Exercise 2.2
Limitations
ftplib
Upload to a FTP Server
httplib
smtplib
Exercise 2.3


3. Internet Data Handling


Internet Data Handling
Overview
CSV Files
Parsing HTML
Running a Parser
HTML Example
XML Parsing with SAX
Brief XML Refresher
SAX Parsing

Exercise 3.1
XML and ElementTree
etree Parsing Basics
Obtaining Elements
Iterating over Elements
Element Attributes
Search Wildcards
cElementTree
Tree Modification
Tree Output
Iterative Parsing
Exercise 3.2
JSON
Sample JSON File
Processing JSON Data
Exercise 3.3


4. Web Programming
Web Programming Basics
Introduction
Overview
Disclaimer
HTTP Explained
HTTP Client Requests
HTTP Responses
HTTP Protocol
Content Encoding
Payload Packaging
Exercise 4.1
Role of Python
Typical Python Tasks
Content Generation
Example : Page Templates
Commentary
Exercise 4.2
HTTP Servers
A Simple Web Server
Exercise 4.3
A Web Server with CGI
CGI Scripting
CGI Example
CGI Mechanics
Classic CGI Interface
CGI Query Variables
cgi Module
CGI Responses
Note on Status Codes
CGI Commentary
Exercise 4.4
WSGI
WSGI Interface

WSGI Example
WSGI Applications
WSGI Environment
Processing WSGI Inputs
WSGI Responses
WSGI Content
WSGI Content Encoding
WSGI Deployment
WSGI and CGI
Exercise 4.5
Customized HTTP
Exercise 4.6
Web Frameworks
Commentary


5. Advanced Networking
Advanced Networking
Overview
Problem with Sockets
SocketServer
SocketServer Example
Execution Model
Exercise 5.1
Big Picture
Concurrent Servers
Server Mixin Classes
Server Subclassing
Exercise 5.2
Distributed Computing
Discussion
XML-RPC
Simple XML-RPC
XML-RPC Commentary
XML-RPC and Binary
Exercise 5.3
Serializing Python Objects
pickle Module
Pickling to Strings
Example
Miscellaneous Comments
Exercise 5.4
multiprocessing
Connections
Connection Use
Example
Commentary
What about...
Network Wrap-up
Exercise 5.5


Section 0

Introduction

Support Files
Course exercises:
http://www.dabeaz.com/python/pythonnetwork.zip

This zip file should be downloaded and extracted someplace on your machine

All of your work will take place in the "PythonNetwork" folder

Copyright (C) 2010, http://www.dabeaz.com

Python Networking
Network programming is a major use of Python
Python standard library has wide support for network protocols, data encoding/decoding, and other things you need to make it work
Writing network programs in Python tends to be substantially easier than in C/C++

This Course
This course focuses on the essential details of network programming that all Python programmers should probably know
    Low-level programming with sockets
    High-level client modules
    How to deal with common data encodings
    Simple web programming (HTTP)
    Simple distributed computing

Standard Library
We will only cover modules supported by the Python standard library
These come with Python by default
Keep in mind, much more functionality can be found in third-party modules
Will give links to notable third-party libraries as appropriate

Prerequisites
You should already know Python basics
However, you don't need to be an expert on all of its advanced features (in fact, none of the code to be written is highly sophisticated)
You should have some prior knowledge of systems programming and network concepts

Section 1

Network Fundamentals

The Problem
Communication between computers
(Diagram: two computers connected through a network)
It's just sending/receiving bits

Two Main Issues
Addressing
    Specifying a remote computer and service
Data transport
    Moving bits back and forth

Network Addressing
Machines have a hostname and IP address
Programs/services have port numbers

    foo.bar.com                       www.python.org
    205.172.13.4    <-- Network -->   82.94.237.218
    port 4521                         port 80

Standard Ports
Ports for common services are preassigned

    21     FTP
    22     SSH
    23     Telnet
    25     SMTP (Mail)
    80     HTTP (Web)
    110    POP3 (Mail)
    119    NNTP (News)
    443    HTTPS (Web)

Other port numbers may just be randomly assigned to programs by the operating system

Using netstat
Use 'netstat' to view active network connections

    shell % netstat -a
    Active Internet connections (servers and established)
    Proto Recv-Q Send-Q Local Address           Foreign Address  State
    tcp        0      0 *:imaps                 *:*              LISTEN
    tcp        0      0 *:pop3s                 *:*              LISTEN
    tcp        0      0 localhost:mysql         *:*              LISTEN
    tcp        0      0 *:pop3                  *:*              LISTEN
    tcp        0      0 *:imap2                 *:*              LISTEN
    tcp        0      0 *:8880                  *:*              LISTEN
    tcp        0      0 *:www                   *:*              LISTEN
    tcp        0      0 192.168.119.139:domain  *:*              LISTEN
    tcp        0      0 localhost:domain        *:*              LISTEN
    tcp        0      0 *:ssh                   *:*              LISTEN
    ...

Note: Must execute from the command shell on both Unix and Windows

Connections
Each endpoint of a network connection is always represented by a host and port #
In Python you write it out as a tuple (host,port)
    ("www.python.org",80)
    ("205.172.13.4",443)
In almost all of the network programs you'll write, you use this convention to specify a network address

Client/Server Concept
Each endpoint is a running program
Servers wait for incoming connections and provide a service (e.g., web, mail, etc.)
Clients make connections to servers

    Client                    Server
    browser    --------->     web (Port 80)
                              www.bar.com
                              205.172.13.4

Request/Response Cycle
Most network programs use a request/response model based on messages
Client sends a request message (e.g., HTTP)
    GET /index.html HTTP/1.0
Server sends back a response message
    HTTP/1.0 200 OK
    Content-type: text/html
    Content-length: 48823

    <HTML>
    ...
The exact format depends on the application

Using Telnet
As a debugging aid, telnet can be used to directly communicate with many services
    telnet hostname portnum
Example:
    shell % telnet www.python.org 80
    Trying 82.94.237.218...
    Connected to www.python.org.
    Escape character is '^]'.
    GET /index.html HTTP/1.0          <- type this and press return a few times
    HTTP/1.1 200 OK
    Date: Mon, 31 Mar 2008 13:34:03 GMT
    Server: Apache/2.2.3 (Debian) DAV/2 SVN/1.4.2 mod_ssl/2.2.3 OpenSSL/0.9.8c
    ...

Data Transport
There are two basic types of communication
Streams (TCP): Computers establish a connection with each other and read/write data in a continuous stream of bytes---like a file. This is the most common.
Datagrams (UDP): Computers send discrete packets (or messages) to each other. Each packet contains a collection of bytes, but each packet is separate and self-contained.

Sockets
Programming abstraction for network code
Socket: A communication endpoint

    socket  <-- network -->  socket

Supported by socket library module
Allows connections to be made and data to be transmitted in either direction

Socket Basics
To create a socket
    import socket
    s = socket.socket(addr_family, type)
Address families
    socket.AF_INET        Internet protocol (IPv4)
    socket.AF_INET6       Internet protocol (IPv6)
Socket types
    socket.SOCK_STREAM    Connection based stream (TCP)
    socket.SOCK_DGRAM     Datagrams (UDP)
Example:
    from socket import *
    s = socket(AF_INET,SOCK_STREAM)

Socket Types
Almost all code will use one of the following
    from socket import *
    s = socket(AF_INET, SOCK_STREAM)
    s = socket(AF_INET, SOCK_DGRAM)
Most common case: TCP connection
    s = socket(AF_INET, SOCK_STREAM)

Using a Socket
Creating a socket is only the first step
    s = socket(AF_INET, SOCK_STREAM)
Further use depends on application
Server
    Listen for incoming connections
Client
    Make an outgoing connection

TCP Client
How to make an outgoing connection
    from socket import *
    s = socket(AF_INET,SOCK_STREAM)
    s.connect(("www.python.org",80))           # Connect
    s.send("GET /index.html HTTP/1.0\n\n")     # Send request
    data = s.recv(10000)                       # Get response
    s.close()
s.connect(addr) makes a connection
    s.connect(("www.python.org",80))
Once connected, use send(),recv() to transmit and receive data
close() shuts down the connection
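The client steps above can be sketched as a small helper. This is only an illustration written for Python 3 (the slides use Python 2), where data sent on a socket must be bytes. The fetch() helper and the throwaway loopback server standing in for a remote host are assumptions made so the sketch runs offline.

```python
import socket
import threading

def fetch(addr, request):
    """Connect to addr, send request (bytes), and return the whole reply."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.connect(addr)                    # Connect
        s.sendall(request)                 # Send request
        chunks = []
        while True:
            chunk = s.recv(4096)           # Get response, piece by piece
            if not chunk:                  # empty result: peer closed
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        s.close()

# Stand-in server on the loopback interface (hypothetical, for the demo only).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))            # port 0: let the OS pick a free port
listener.listen(1)

def serve_once():
    c, a = listener.accept()
    c.recv(4096)                           # read the request
    c.sendall(b"HTTP/1.0 200 OK\r\n\r\nhello")
    c.close()

t = threading.Thread(target=serve_once)
t.start()
reply = fetch(listener.getsockname(), b"GET /index.html HTTP/1.0\r\n\r\n")
t.join()
listener.close()
```

Pointing fetch() at ("www.python.org", 80) would exercise a real server, but the reply then depends on that site.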

Exercise 1.1

Time : 10 Minutes


Server Implementation
Network servers are a bit more tricky
Must listen for incoming connections on a well-known port number
Typically run forever in a server-loop
May have to service multiple clients

TCP Server
A simple server
    from socket import *
    s = socket(AF_INET,SOCK_STREAM)
    s.bind(("",9000))
    s.listen(5)
    while True:
        c,a = s.accept()
        print "Received connection from", a
        c.send("Hello %s\n" % a[0])
        c.close()
Send a message back to a client
    % telnet localhost 9000
    Connected to localhost.
    Escape character is '^]'.
    Hello 127.0.0.1                        <- Server message
    Connection closed by foreign host.
    %
Each statement of this program is examined in turn below.

TCP Server
Address binding
    s.bind(("",9000))                      # binds the socket to a specific address
Addressing
    s.bind(("",9000))                      # "" binds to all available interfaces
    s.bind(("localhost",9000))             # binds to localhost
    s.bind(("192.168.2.1",9000))           # If system has multiple IP addresses,
    s.bind(("104.21.4.2",9000))            # can bind to a specific address

TCP Server
Start listening for connections
    s.listen(5)                            # Tells operating system to start listening
                                           # for connections on the socket
s.listen(backlog)
backlog is # of pending connections to allow
Note: not related to max number of clients

TCP Server
Accepting a new connection
    c,a = s.accept()                       # Accept a new client connection
s.accept() blocks until connection received
Server sleeps if nothing is happening

TCP Server
Client socket and address
    c,a = s.accept()
Accept returns a pair (client_socket,addr)
    c    <socket._socketobject object at 0x3be30>
         This is a new socket that's used for data
    a    ("104.23.11.4",27743)
         This is the network/port address of the client that connected

TCP Server
Sending data
    c.send("Hello %s\n" % a[0])            # Send data to client
Note: Use the client socket for transmitting data. The server socket is only used for accepting new connections.

TCP Server
Closing the connection
    c.close()                              # Close client connection
Note: Server can keep client connection alive as long as it wants
Can repeatedly receive/send data

TCP Server
Waiting for the next connection
    while True:
        c,a = s.accept()                   # Wait for next connection
        ...
Original server socket is reused to listen for more connections
Server runs forever in a loop like this
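The server loop can be tried end to end without telnet. The following is a minimal sketch for Python 3 (the slides use Python 2), assuming a loopback address, an OS-chosen port, and a hypothetical hello_server() that stops after a fixed number of clients instead of looping forever.

```python
import socket
import threading

def hello_server(listener, max_clients):
    """Greet max_clients connections, then return."""
    for _ in range(max_clients):
        c, a = listener.accept()                   # blocks until a client connects
        c.sendall(("Hello %s\n" % a[0]).encode("ascii"))
        c.close()                                  # closes this client only

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))                    # loopback only, free port
listener.listen(5)
addr = listener.getsockname()

t = threading.Thread(target=hello_server, args=(listener, 2))
t.start()

greetings = []
for _ in range(2):
    # Each client reads one line through a file wrapper, then disconnects.
    with socket.create_connection(addr) as client, client.makefile("rb") as f:
        greetings.append(f.readline())
t.join()
listener.close()
```

The listening socket is reused for both clients, while each accept() call hands back a fresh socket for the data exchange.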

Exercise 1.2

Time : 20 Minutes


Advanced Sockets
Socket programming is often a mess
    Huge number of options
    Many corner cases
    Many failure modes/reliability issues
Will briefly cover a few critical issues

Partial Reads/Writes
Be aware that reading/writing to a socket may involve partial data transfer
send() returns actual bytes sent
recv() length is only a maximum limit
    >>> len(data)
    1000000
    >>> s.send(data)
    37722                        <- Sent partial data
    >>>
    >>> data = s.recv(10000)
    >>> len(data)
    6420                         <- Received less than max
    >>>

Partial Reads/Writes
Be aware that for TCP, the data stream is continuous---no concept of records, etc.
    # Client
    ...
    s.send(data)
    s.send(moredata)
    ...

    # Server
    ...
    data = s.recv(maxsize)
    ...
This recv() may return data from both of the sends combined or less data than even the first send
A lot depends on OS buffers, network bandwidth, congestion, etc.

Sending All Data
To wait until all data is sent, use sendall()
    s.sendall(data)
Blocks until all data is transmitted
For most normal applications, this is what you should use
Exception: You don't use this if networking is mixed in with other kinds of processing (e.g., screen updates, multitasking, etc.)
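To make the relationship between send() and sendall() concrete, here is a rough sketch of the loop that sendall() saves you from writing. It is written for Python 3, and send_all() is a hypothetical helper, not part of the socket module. A connected pair from socket.socketpair() stands in for a real network connection.

```python
import socket

def send_all(sock, data):
    """Call send() repeatedly until every byte has been accepted."""
    total = 0
    while total < len(data):
        sent = sock.send(data[total:])     # may accept only part of the data
        if sent == 0:
            raise RuntimeError("socket connection broken")
        total += sent
    return total

a, b = socket.socketpair()                 # two connected sockets for the demo
n = send_all(a, b"x" * 1000)
a.close()                                  # signals end of data to the reader

chunks = []
while True:
    chunk = b.recv(4096)
    if not chunk:
        break
    chunks.append(chunk)
received = b"".join(chunks)
b.close()
```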

End of Data
How to tell if there is no more data?
recv() will return empty string
    >>> s.recv(1000)
    ''
    >>>
This means that the other end of the connection has been closed (no more sends)

Data Reassembly
Receivers often need to reassemble messages from a series of small chunks
Here is a programming template for that
    fragments = []                       # List of chunks
    while not done:
        chunk = s.recv(maxsize)          # Get a chunk
        if not chunk:
            break                        # EOF. No more data
        fragments.append(chunk)

    # Reassemble the message
    message = "".join(fragments)
Don't use string concat (+=). It's slow.
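The template above reads until the connection closes. When several messages share one connection, the receiver needs some other way to know where each message ends. One common convention, assumed here and not taken from the slides, is to prefix every message with a 4-byte length. The sketch below is for Python 3; recv_exactly(), send_message(), and recv_message() are hypothetical helper names.

```python
import socket
import struct

def recv_exactly(sock, nbytes):
    """Collect exactly nbytes, joining the fragments once at the end."""
    fragments = []
    remaining = nbytes
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:                      # EOF before the message was complete
            raise EOFError("connection closed early")
        fragments.append(chunk)
        remaining -= len(chunk)
    return b"".join(fragments)

def send_message(sock, payload):
    # 4-byte big-endian length header, followed by the payload
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_message(sock):
    (length,) = struct.unpack("!I", recv_exactly(sock, 4))
    return recv_exactly(sock, length)

a, b = socket.socketpair()                 # connected pair for the demo
send_message(a, b"first")
send_message(a, b"second message")
messages = [recv_message(b), recv_message(b)]
a.close()
b.close()
```

Both messages arrive intact even though TCP itself keeps no record boundaries between the two sends.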

Timeouts
Most socket operations block indefinitely
Can set an optional timeout
    s = socket(AF_INET, SOCK_STREAM)
    ...
    s.settimeout(5.0)          # Timeout of 5 seconds
    ...
Will get a timeout exception
    >>> s.recv(1000)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    socket.timeout: timed out
    >>>
Disabling timeouts
    s.settimeout(None)
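In a program, the timeout shows up as an exception to catch. A minimal sketch for Python 3, using a connected pair from socket.socketpair() in place of a real connection and a short 0.2 second timeout chosen only to keep the demo quick:

```python
import socket

a, b = socket.socketpair()
b.settimeout(0.2)                 # give up on blocking calls after 0.2 seconds

try:
    b.recv(1000)                  # nothing has been sent, so this times out
    timed_out = False
except socket.timeout:
    timed_out = True

a.sendall(b"late data")
data = b.recv(1000)               # data is waiting now, so this returns promptly
a.close()
b.close()
```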

Non-blocking Sockets
Instead of timeouts, can set non-blocking
    >>> s.setblocking(False)
Future send(),recv() operations will raise an exception if the operation would have blocked
    >>> s.setblocking(False)
    >>> s.recv(1000)                     <- No data available
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    socket.error: (35, 'Resource temporarily unavailable')
    >>> s.recv(1000)                     <- Data arrived
    'Hello World\n'
    >>>
Sometimes used for polling
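The same session can be written as a script. This sketch assumes Python 3, where the "would block" condition is raised as BlockingIOError. It uses socket.socketpair() as a stand-in connection and select.select() as one possible way to poll for readiness.

```python
import select
import socket

a, b = socket.socketpair()
b.setblocking(False)

try:
    b.recv(1000)                  # nothing to read yet
    would_block = False
except BlockingIOError:           # raised instead of waiting for data
    would_block = True

a.sendall(b"Hello World\n")
# Poll until b is readable (at most one second), then read without blocking.
readable, _, _ = select.select([b], [], [], 1.0)
data = b.recv(1000) if readable else b""
a.close()
b.close()
```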

Socket Options
Sockets have a large number of parameters
Can be set using s.setsockopt()
Example: Reusing the port number
    >>> s.bind(("",9000))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<string>", line 1, in bind
    socket.error: (48, 'Address already in use')
    >>> s.setsockopt(socket.SOL_SOCKET,
    ...              socket.SO_REUSEADDR, 1)
    >>> s.bind(("",9000))
    >>>
Consult reference for more options
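In a server script the option is normally set once, before bind(). A short sketch for Python 3, assuming a loopback address and an OS-chosen port so it does not collide with anything already running:

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Ask for address reuse before binding.
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind(("127.0.0.1", 0))
# getsockopt() reads an option back; a nonzero value means it is enabled.
reuse = s.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR)
s.close()
```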

Sockets as Files
Sometimes it is easier to work with sockets represented as a "file" object
    f = s.makefile()
This will wrap a socket with a file-like API
    f.read()
    f.readline()
    f.write()
    f.writelines()
    for line in f:
        ...
    f.close()

Sockets as Files
Commentary : From personal experience, putting a file-like layer over a socket rarely works as well in practice as it sounds in theory.
Tricky resource management (must manage both the socket and file independently)
It's easy to write programs that mysteriously "freeze up" or don't operate quite like you would expect.

Exercise 1.3

Time : 15 Minutes


Odds and Ends
Other supported socket types
    Datagram (UDP) sockets
    Unix domain sockets
    Raw sockets/Packets
Sockets and concurrency
Useful utility functions

UDP : Datagrams
(Diagram: separate DATA packets travelling between two machines)
Data sent in discrete packets (Datagrams)
No concept of a "connection"
No reliability, no ordering of data
Datagrams may be lost, arrive in any order
Higher performance (used in games, etc.)

UDP Server
A simple datagram server
    from socket import *
    s = socket(AF_INET,SOCK_DGRAM)             # Create datagram socket
    s.bind(("",10000))                         # Bind to a specific port

    while True:
        data, addr = s.recvfrom(maxsize)       # Wait for a message
        resp = "Get off my lawn!"
        s.sendto(resp,addr)                    # Send response (optional)
No "connection" is established
It just sends and receives packets

UDP Client
Sending a datagram to a server
    from socket import *
    s = socket(AF_INET,SOCK_DGRAM)             # Create datagram socket

    msg = "Hello World"
    s.sendto(msg,("server.com",10000))         # Send a message
    data, addr = s.recvfrom(maxsize)           # Wait for a response (optional)
    # data is the returned data, addr is the remote address
Key concept: No "connection"
You just send a data packet
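The UDP server and client can be exercised together on one machine. The sketch below is for Python 3 (messages must be bytes) and assumes the loopback interface with an OS-chosen port. The 2 second timeouts are an added precaution, since a lost datagram would otherwise leave recvfrom() waiting forever.

```python
import socket

# "Server" side: bind a datagram socket to a free port on the loopback interface.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
server.settimeout(2.0)
server_addr = server.getsockname()

# "Client" side: no connect step, just send a packet to the server address.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(2.0)
client.sendto(b"Hello World", server_addr)

# The server receives the datagram along with the sender's address...
data, client_addr = server.recvfrom(8192)
# ...and uses that address to send a response.
server.sendto(b"Get off my lawn!", client_addr)

reply, addr = client.recvfrom(8192)
client.close()
server.close()
```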

Unix Domain Sockets
Available on Unix based systems. Sometimes used for fast IPC or pipes between processes
Creation:
    s = socket(AF_UNIX, SOCK_STREAM)
    s = socket(AF_UNIX, SOCK_DGRAM)
Address is just a "filename"
    s.bind("/tmp/foo")             # Server binding
    s.connect("/tmp/foo")          # Client connection
Rest of the programming interface is the same

Raw Sockets
If you have root/admin access, can gain direct access to raw network packets
Depends on the system
Example: Linux packet sniffing
    s = socket(AF_PACKET, SOCK_DGRAM)
    s.bind(("eth0",0x0800))                # Sniff IP packets
    while True:
        msg,addr = s.recvfrom(4096)        # get a packet
        ...

Sockets and Concurrency
Servers usually handle multiple clients
(Diagram: several browser clients, each connected to the web server on Port 80)

Sockets and Concurrency
Each client gets its own socket on server
    # server code
    s = socket(AF_INET, SOCK_STREAM)       # a connection point for clients
    ...
    while True:
        c,a = s.accept()                   # client data transmitted on a different socket
        ...

Sockets and Concurrency
New connections make a new socket
(Diagram: a browser connects to Port 80; accept() on the server creates a new socket, and send()/recv() traffic with that browser uses the new socket)

Sockets and Concurrency
To manage multiple clients,
    Server must always be ready to accept new connections
    Must allow each client to operate independently (each may be performing different tasks on the server)
Will briefly outline the common solutions

Threaded Server
Each client is handled by a separate thread
    import threading
    from socket import *

    def handle_client(c):
        ... whatever ...
        c.close()
        return

    s = socket(AF_INET,SOCK_STREAM)
    s.bind(("",9000))
    s.listen(5)
    while True:
        c,a = s.accept()
        t = threading.Thread(target=handle_client,
                             args=(c,))
        t.start()                  # The thread does not run until started
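A filled-in version of the threaded pattern might look as follows. This is a sketch for Python 3 with assumed details: handle_client() simply upper-cases whatever it receives, and the hypothetical serve() accepts a fixed number of clients so that the example terminates.

```python
import socket
import threading

def handle_client(c):
    data = c.recv(1024)                # per-client work happens in its own thread
    c.sendall(data.upper())
    c.close()

def serve(listener, count):
    threads = []
    for _ in range(count):
        c, a = listener.accept()
        t = threading.Thread(target=handle_client, args=(c,))
        t.start()                      # without start() the handler never runs
        threads.append(t)
    for t in threads:
        t.join()

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))        # loopback only, OS-chosen port
listener.listen(5)
server = threading.Thread(target=serve, args=(listener, 2))
server.start()

replies = []
for word in (b"spam", b"eggs"):
    with socket.create_connection(listener.getsockname()) as client:
        client.sendall(word)
        replies.append(client.recv(1024))
server.join()
listener.close()
```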

Forking Server (Unix)
Each client is handled by a subprocess
    import os
    from socket import *
    s = socket(AF_INET,SOCK_STREAM)
    s.bind(("",9000))
    s.listen(5)
    while True:
        c,a = s.accept()
        if os.fork() == 0:
            # Child process. Manage client
            ...
            c.close()
            os._exit(0)
        else:
            # Parent process. Clean up and go
            # back to wait for more connections
            c.close()
Note: Omitting some critical details

Asynchronous Server
Server handles all clients in an event loop
    import select
    from socket import *
    s = socket(AF_INET,SOCK_STREAM)
    ...
    clients = []    # List of all active client sockets
    while True:
        # Look for activity on any of my sockets
        input,output,err = select.select([s]+clients,
                                         clients, clients)
        # Process all sockets with input
        for i in input:
            ...
        # Process all sockets ready for output
        for o in output:
            ...
Frameworks such as Twisted build upon this
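To show what the elided parts of the event loop might contain, here is a small echo server built on select(). It is a sketch for Python 3 under simplifying assumptions: only readability is monitored, replies are written with sendall() (which could stall on a slow client), and the hypothetical echo_loop() checks a stop flag every 0.1 seconds so that the demo can shut down.

```python
import select
import socket
import threading

def echo_loop(listener, stop):
    clients = []                               # all active client sockets
    while not stop.is_set():
        readable, _, _ = select.select([listener] + clients, [], [], 0.1)
        for sock in readable:
            if sock is listener:
                c, a = listener.accept()       # new client: start watching it
                clients.append(c)
            else:
                data = sock.recv(4096)
                if data:
                    sock.sendall(data)         # echo back (simplified output path)
                else:
                    clients.remove(sock)       # empty read: client went away
                    sock.close()
    for c in clients:
        c.close()

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(5)
stop = threading.Event()
worker = threading.Thread(target=echo_loop, args=(listener, stop))
worker.start()

# Two clients are connected at the same time and served by one thread.
conns = [socket.create_connection(listener.getsockname()) for _ in range(2)]
replies = []
for i, c in enumerate(conns):
    c.sendall(b"ping %d" % i)
    replies.append(c.recv(4096))
    c.close()
stop.set()
worker.join()
listener.close()
```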

Utility Functions
Get the hostname of the local machine
    >>> socket.gethostname()
    'foo.bar.com'
    >>>
Get the IP address of a remote machine
    >>> socket.gethostbyname("www.python.org")
    '82.94.237.218'
    >>>
Get name information on a remote IP
    >>> socket.gethostbyaddr("82.94.237.218")
    ('dinsdale.python.org', [], ['82.94.237.218'])
    >>>

Omissions
socket module has hundreds of obscure socket control options, flags, etc.
Many more utility functions
IPv6 (Supported, but new and hairy)
Other socket types (SOCK_RAW, etc.)
More on concurrent programming (covered in advanced course)

Discussion
It is often unnecessary to directly use sockets
Other library modules simplify use
However, those modules assume some knowledge of the basic concepts (addresses, ports, TCP, UDP, etc.)
Will see more in the next few sections...

Section 2

Client Programming

Overview
Python has library modules for interacting with a variety of standard internet services
    HTTP, FTP, SMTP, NNTP, XML-RPC, etc.
In this section we're going to look at how some of these library modules work
Main focus is on the web (HTTP)

urllib Module
A high level module that allows clients to connect to a variety of internet services
    HTTP
    HTTPS
    FTP
    Local files
Works with typical URLs on the web...

urllib Module
Open a web page: urlopen()
    >>> import urllib
    >>> u = urllib.urlopen("http://www.python.org/index.html")
    >>> data = u.read()
    >>> print data
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML ...
    ...
    >>>
urlopen() returns a file-like object
Read from it to get downloaded data

urllib protocols
Supported protocols
    u = urllib.urlopen("http://www.foo.com")
    u = urllib.urlopen("https://www.foo.com/private")
    u = urllib.urlopen("ftp://ftp.foo.com/README")
    u = urllib.urlopen("file:///Users/beazley/blah.txt")
Note: HTTPS only supported if Python configured with support for OpenSSL

HTML Forms
One use of urllib is to automate forms
Example HTML source for the form
    <FORM ACTION="/subscribe" METHOD="POST">
    Your name: <INPUT type="text" name="name" size="30"><br>
    Your email: <INPUT type="text" name="email" size="30"><br>
    <INPUT type="submit" name="submit-button" value="Subscribe">
    </FORM>

HTML Forms
Within the form, you will find an action and named parameters for the form fields
    <FORM ACTION="/subscribe" METHOD="POST">
    Your name: <INPUT type="text" name="name" size="30"><br>
    Your email: <INPUT type="text" name="email" size="30"><br>
    <INPUT type="submit" name="submit-button" value="Subscribe">
    </FORM>
Action (a URL)
    http://somedomain.com/subscribe
Parameters:
    name
    email

Web Services
Another use of urllib is to access web services
    Downloading maps
    Stock quotes
    Email messages
Most of these are controlled and accessed in the same manner as a form
There is a particular request and expected set of parameters for different operations

Parameter Encoding
urlencode()
Takes a dictionary of fields and creates a URL-encoded string of parameters
    fields = {
        'name' : 'Dave',
        'email' : 'dave@dabeaz.com'
    }
    parms = urllib.urlencode(fields)
Sample result
    >>> parms
    'name=Dave&email=dave%40dabeaz.com'
    >>>
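For readers on Python 3, the same function lives in urllib.parse. A brief sketch (the import location is the only change from the slide; parse_qs() is added here to show the reverse direction):

```python
from urllib.parse import urlencode, parse_qs

fields = {
    "name": "Dave",
    "email": "dave@dabeaz.com",
}
parms = urlencode(fields)       # special characters such as "@" are escaped
decoded = parse_qs(parms)       # decoding gives each field as a list of values
```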

Sending Parameters
Case 1 : GET Requests
    <FORM ACTION="/subscribe" METHOD="GET">
    Your name: <INPUT type="text" name="name" size="30"><br>
    Your email: <INPUT type="text" name="email" size="30"><br>
    <INPUT type="submit" name="submit-button" value="Subscribe">
Example code:
    fields = { ... }
    parms = urllib.urlencode(fields)
    u = urllib.urlopen("http://somedomain.com/subscribe?"+parms)
You create a long URL by concatenating the request with the parameters
    http://somedomain.com/subscribe?name=Dave&email=dave%40dabeaz.com

Sending Parameters
Case 2 : POST Requests
    <FORM ACTION="/subscribe" METHOD="POST">
    Your name: <INPUT type="text" name="name" size="30"><br>
    Your email: <INPUT type="text" name="email" size="30"><br>
    <INPUT type="submit" name="submit-button" value="Subscribe">
Example code:
    fields = { ... }
    parms = urllib.urlencode(fields)
    u = urllib.urlopen("http://somedomain.com/subscribe", parms)
Parameters get uploaded separately as part of the request body
    POST /subscribe HTTP/1.0
    ...
    name=Dave&email=dave%40dabeaz.com
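The difference between the two cases can be inspected without contacting any server by building request objects and looking at them. This sketch assumes Python 3, where the relevant names are urllib.parse.urlencode and urllib.request.Request and where POST data must be bytes. The somedomain.com URL is the placeholder from the slides.

```python
from urllib.parse import urlencode
from urllib.request import Request

fields = {"name": "Dave", "email": "dave@dabeaz.com"}
parms = urlencode(fields)

# Case 1: GET. The parameters become part of the URL.
get_req = Request("http://somedomain.com/subscribe?" + parms)

# Case 2: POST. The parameters travel in the request body.
post_req = Request("http://somedomain.com/subscribe",
                   data=parms.encode("ascii"))

methods = (get_req.get_method(), post_req.get_method())
```

Passing either object to urllib.request.urlopen() would actually send it; supplying data is what switches the method to POST.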

Response Data
To read response data, treat the result of urlopen() as a file object
    >>> u = urllib.urlopen("http://www.python.org")
    >>> data = u.read()
    >>>
Be aware that the response data consists of the raw bytes transmitted
If there is any kind of extra encoding (e.g., Unicode), you will need to decode the data with extra processing steps.

Response Headers
HTTP headers are retrieved using .info()
    >>> u = urllib.urlopen("http://www.python.org")
    >>> headers = u.info()
    >>> headers
    <httplib.HTTPMessage instance at 0x1118828>
    >>> headers.keys()
    ['content-length', 'accept-ranges', 'server',
     'last-modified', 'connection', 'etag', 'date',
     'content-type']
    >>> headers['content-length']
    '13597'
    >>> headers['content-type']
    'text/html'
    >>>
A dictionary-like object

Response Status
urlopen() ignores HTTP status codes (i.e., errors are silently ignored)
Can manually check the response code
    u = urllib.urlopen("http://www.python.org/java")
    if u.code == 200:
        # success
        ...
    elif u.code == 404:
        # Not found!
        ...
    elif u.code == 403:
        # Forbidden
        ...
Unfortunately a little clumsy (fixed shortly)

Exercise 2.1

Time : 15 Minutes


urllib Limitations
urllib only works with simple cases
Does not support cookies
Does not support authentication
Does not report HTTP errors gracefully
Only supports GET/POST requests

urllib2 Module
urllib2 - The sequel to urllib
Builds upon and expands urllib
Can interact with servers that require cookies, passwords, and other details
Better error handling (uses exceptions)
Is the preferred library for modern code

urllib2 Example
urllib2 provides urlopen() as before
    >>> import urllib2
    >>> u = urllib2.urlopen("http://www.python.org/index.html")
    >>> data = u.read()
    >>>
However, the module expands functionality in two primary areas
    Requests
    Openers

urllib2 Requests
Requests are now objects
    >>> r = urllib2.Request("http://www.python.org")
    >>> u = urllib2.urlopen(r)
    >>> data = u.read()
Requests can have additional attributes added
    User data (for POST requests)
    Customized HTTP headers

Requests with Data
Create a POST request with user data
    data = {
        'name' : 'dave',
        'email' : 'dave@dabeaz.com'
    }
    r = urllib2.Request("http://somedomain.com/subscribe",
                        urllib.urlencode(data))
    u = urllib2.urlopen(r)
    response = u.read()
Note: You still use urllib.urlencode() from the older urllib library

Request Headers
Adding/Modifying client HTTP headers
headers = {
    # Adjacent string literals are joined, so the value can span two lines
    'User-Agent' : 'Mozilla/4.0 (compatible; MSIE 7.0; '
                   'Windows NT 5.1; .NET CLR 2.0.50727)'
}
r = urllib2.Request("http://somedomain.com/",
                    headers=headers)
u = urllib2.urlopen(r)
response = u.read()

This can be used if you need to emulate a

specific client (e.g., Internet Explorer, etc.)
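
As a rough Python 3 illustration (assuming urllib.request in place of urllib2), custom headers can be attached the same way and read back from the request object without contacting any server. The shortened User-Agent string is only an example value.

```python
from urllib.request import Request

headers = {'User-Agent': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'}
r = Request("http://somedomain.com/", headers=headers)

# Request stores header names via str.capitalize(),
# so the key to look up is 'User-agent'
agent = r.get_header('User-agent')
```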


2- 21


urllib2 Error Handling


HTTP Errors are reported as exceptions
>>> u = urllib2.urlopen("http://www.python.org/perl")
Traceback...
urllib2.HTTPError: HTTP Error 404: Not Found
>>>

Catching an error
try:
    u = urllib2.urlopen(url)
except urllib2.HTTPError,e:
    code = e.code      # HTTP error code

Note: urllib2 automatically tries to handle


redirection and certain HTTP responses
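
The same pattern can be sketched for Python 3, where the exception lives in urllib.error. The helper status_of() and the fake_opener() stand-in are hypothetical names introduced here so the example runs without network access.

```python
import io
from email.message import Message
from urllib.error import HTTPError
from urllib.request import urlopen

def status_of(url, opener=urlopen):
    """Return the HTTP status code for url, error or not."""
    try:
        u = opener(url)
        return u.getcode()
    except HTTPError as e:
        # HTTP errors arrive as exceptions carrying the code
        return e.code

# Hypothetical stand-in for urlopen() that always fails with a 404
def fake_opener(url):
    raise HTTPError(url, 404, "Not Found", Message(), io.BytesIO(b""))

code = status_of("http://www.python.org/perl", opener=fake_opener)
```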

2- 22


urllib2 Openers
The function urlopen() is an "opener"
It knows how to open a connection, interact
with the server, and return a response.

It only has a few basic features---it does not

know how to deal with cookies and passwords

However, you can make your own opener


objects with these features enabled

2- 23


urllib2 build_opener()
build_opener() makes a custom opener
# Make a URL opener with cookie support
opener = urllib2.build_opener(
    urllib2.HTTPCookieProcessor()
)
u = opener.open("http://www.python.org/index.html")

Can add a set of new features from this list


CacheFTPHandler
HTTPBasicAuthHandler
HTTPCookieProcessor
HTTPDigestAuthHandler
ProxyHandler
ProxyBasicAuthHandler
ProxyDigestAuthHandler
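
A small Python 3 sketch of the same idea, assuming urllib.request and http.cookiejar as the modern homes of these classes. Keeping a reference to the cookie jar makes it possible to inspect what the server has set; no request is made here.

```python
from http.cookiejar import CookieJar
from urllib.request import build_opener, HTTPCookieProcessor

# An explicit jar lets you look at the cookies later
jar = CookieJar()
opener = build_opener(HTTPCookieProcessor(jar))

# The opener keeps its installed handlers in a list
has_cookie_support = any(
    isinstance(h, HTTPCookieProcessor) for h in opener.handlers
)
```

Calling opener.open(url) would then send and store cookies automatically across requests.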

2- 24


Example : Login Cookies


fields = {
    'txtUsername' : 'dave',
    'txtPassword' : '12345',
    'submit_login' : 'Log In'
}
opener = urllib2.build_opener(
    urllib2.HTTPCookieProcessor()
)
request = urllib2.Request(
    "http://somedomain.com/login.asp",
    urllib.urlencode(fields))
# Login
u = opener.open(request)
resp = u.read()
# Get a page, but use cookies returned by initial login
u = opener.open("http://somedomain.com/private.asp")
resp = u.read()

2- 25


Discussion
urllib2 module has a huge number of options
Different configurations
File formats, policies, authentication, etc.
Will have to consult reference for everything

2- 26


Exercise 2.2

Time : 15 Minutes
Password: guido456

2- 27


Limitations
urllib and urllib2 are useful for fetching files
However, neither module provides support for
more advanced operations

Examples:
Uploading to an FTP server
File-upload via HTTP Post
Other HTTP methods (e.g., HEAD, PUT)
2- 28


ftplib
A module for interacting with FTP servers
Example : Capture a directory listing
>>> import ftplib
>>> f = ftplib.FTP("ftp.gnu.org","anonymous",
...                "dave@dabeaz.com")
>>> files = []
>>> f.retrlines("LIST",files.append)
'226 Directory send OK.'
>>> len(files)
15
>>> files[0]
'-rw-r--r--    1 0        0            1765 Feb 20 16:47 README'
>>>

2- 29


Upload to a FTP Server


host     = "ftp.foo.com"
username = "dave"
password = "1235"
filename = "somefile.dat"

import ftplib
ftp_serv = ftplib.FTP(host,username,password)
# Open the file you want to send
f = open(filename,"rb")
# Send it to the FTP server
resp = ftp_serv.storbinary("STOR "+filename, f)
# Close the connection
ftp_serv.close()

2- 30


httplib
A module for implementing the client side of an
HTTP connection

import httplib
c = httplib.HTTPConnection("www.python.org",80)
c.putrequest("HEAD","/tut/tut.html")
c.putheader("Someheader","Somevalue")
c.endheaders()
r = c.getresponse()
data = r.read()
c.close()

Low-level control over HTTP headers, methods,


data transmission, etc.

2- 31


smtplib
A module for sending email messages
import smtplib
serv = smtplib.SMTP()
serv.connect()
msg = """\
From: dave@dabeaz.com
To: bob@yahoo.com
Subject: Get off my lawn!

Blah blah blah"""
serv.sendmail("dave@dabeaz.com",['bob@yahoo.com'],msg)

Useful if you want to have a program send you a


notification, send email to customers, etc.
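
Rather than formatting the headers by hand, the message can also be assembled with the standard email package. The following is a sketch for Python 3; the sending step is left commented out because it assumes a mail server listening on localhost.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg['From'] = 'dave@dabeaz.com'
msg['To'] = 'bob@yahoo.com'
msg['Subject'] = 'Get off my lawn!'
msg.set_content('Blah blah blah')

# Headers, a blank line, then the body
text = msg.as_string()

# Sending (assumes an SMTP server on localhost):
# import smtplib
# with smtplib.SMTP('localhost') as serv:
#     serv.send_message(msg)
```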

2- 32


Exercise 2.3

Time : 15 Minutes

2- 33


Section 3

Internet Data Handling

Overview
If you write network clients, you will have to

worry about a variety of common file formats

CSV, HTML, XML, JSON, etc.


In this section, we briefly look at library
support for working with such data

3- 2


CSV Files
Comma Separated Values
Elwood,Blues,"1060 W Addison,Chicago 60637",110
McGurn,Jack,"4902 N Broadway,Chicago 60640",200

Parsing with the CSV module


import csv
f = open("schmods.csv","r")
for row in csv.reader(f):
    # Do something with items in row
    ...

Understands quoting, various subtle details
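
To see the quoting rules in action, here is a short sketch (Python 3) that parses the two sample lines from an in-memory buffer instead of a file. Note how the comma inside the quoted address does not split the field.

```python
import csv
import io

sample = io.StringIO(
    'Elwood,Blues,"1060 W Addison,Chicago 60637",110\n'
    'McGurn,Jack,"4902 N Broadway,Chicago 60640",200\n'
)

# Each row comes back as a list of strings
rows = list(csv.reader(sample))
address = rows[0][2]
```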


3- 3


Parsing HTML
Suppose you want to parse HTML (maybe
obtained via urlopen)

Use the HTMLParser module


A library that processes HTML using an
"event-driven" programming style

3- 4


Parsing HTML
Define a class that inherits from HTMLParser
and define a set of methods that respond to
different document features
from HTMLParser import HTMLParser
class MyParser(HTMLParser):
    def handle_starttag(self,tag,attrs):
        ...
    def handle_data(self,data):
        ...
    def handle_endtag(self,tag):
        ...

           starttag               data   endtag
<tag attr="value" attr="value">   data   </tag>

3- 5


Running a Parser
To run the parser, you create a parser object
and feed it some data

# Fetch a web page


import urllib
u = urllib.urlopen("http://www.example.com")
data = u.read()
# Run it through the parser
p = MyParser()
p.feed(data)

The parser will scan through the data and


trigger the various handler methods

3- 6


HTML Example
An example:

Gather all links

from HTMLParser import HTMLParser

class GatherLinks(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []
    def handle_starttag(self,tag,attrs):
        if tag == 'a':
            for name,value in attrs:
                if name == 'href':
                    self.links.append(value)

3- 7


HTML Example
Running the parser
>>> parser = GatherLinks()
>>> import urllib
>>> data = urllib.urlopen("http://www.python.org").read()
>>> parser.feed(data)
>>> for x in parser.links:
...     print x
...
/search/
/about
/news/
/doc/
/download/
...
>>>
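
The link gatherer can also be tried without a network connection. This sketch assumes Python 3, where the class lives in html.parser, and feeds the parser a small hand-written HTML fragment.

```python
from html.parser import HTMLParser

class GatherLinks(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == 'a':
            for name, value in attrs:
                if name == 'href':
                    self.links.append(value)

parser = GatherLinks()
parser.feed('<p><a href="/about">About</a> and '
            '<a class="x" href="/doc/">Docs</a></p>')
links = parser.links
```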

3- 8


XML Parsing with SAX


The event-driven style used by HTMLParser is
sometimes used to parse XML

Basis of the SAX parsing interface


An approach sometimes seen when dealing

with large XML documents since it allows for


incremental processing

3- 9


Brief XML Refresher


XML documents use structured markup
<contact>
<name>Elwood Blues</name>
<address>1060 W Addison</address>
<city>Chicago</city>
<zip>60616</zip>
</contact>

Documents made up of elements


<name>Elwood Blues</name>

Elements have starting/ending tags


May contain text and other elements
3- 10


SAX Parsing
Define a special handler class
import xml.sax
class MyHandler(xml.sax.ContentHandler):
    def startDocument(self):
        print "Document start"
    def startElement(self,name,attrs):
        print "Start:", name
    def characters(self,text):
        print "Characters:", text
    def endElement(self,name):
        print "End:", name

In the class, you define methods that capture


elements and other parts of the document

3- 11


SAX Parsing
To parse a document, you create an instance
of the handler and give it to the parser
# Create the handler object
hand = MyHandler()
# Parse a document using the handler
xml.sax.parse("data.xml",hand)

This reads the file and calls handler methods


as different document elements are
encountered (start tags, text, end tags, etc.)
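
A self-contained sketch of the same flow in Python 3: the handler below (EventRecorder, a name chosen for this example) records events in a list instead of printing them, and the document is parsed from a byte string with xml.sax.parseString() rather than from a file.

```python
import xml.sax

class EventRecorder(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(('start', name))

    def characters(self, text):
        # Text may arrive in several pieces; skip pure whitespace
        if text.strip():
            self.events.append(('text', text))

    def endElement(self, name):
        self.events.append(('end', name))

hand = EventRecorder()
xml.sax.parseString(b"<contact><name>Elwood Blues</name></contact>", hand)
events = hand.events
```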

3- 12


Exercise 3.1

Time : 15 Minutes

3- 13


XML and ElementTree


xml.etree.ElementTree module is one of
the easiest ways to parse XML

Let's look at the highlights

3- 14


etree Parsing Basics


Parsing a document
from xml.etree.ElementTree import parse
doc = parse("recipe.xml")

This builds a complete parse tree of the


entire document

To extract data, you will perform various


kinds of queries on the document object

3- 15


etree Parsing Basics


A mini-reference for extracting data
Finding one or more elements
elem = doc.find("title")
for elem in doc.findall("ingredients/item"):
    statements

Element attributes and properties

elem.tag                      # Element name
elem.text                     # Element text
elem.get(aname [,default])    # Element attributes

3- 16


Obtaining Elements
doc = parse("recipe.xml")
desc_elem = doc.find("description")
desc_text = desc_elem.text

or

doc = parse("recipe.xml")
desc_text = doc.findtext("description")

<?xml version="1.0" encoding="iso-8859-1"?>
<recipe>
<title>Famous Guacamole</title>
<description>
A southwest favorite!
</description>
<ingredients>
<item num="2">Large avocados, chopped</item>
<item num="1">Tomato, chopped</item>
<item num="1/2" units="C">White onion, chopped</item>
<item num="1" units="tbl">Fresh squeezed lemon juice</item>
<item num="1">Jalapeno pepper, diced</item>
<item num="1" units="tbl">Fresh cilantro, minced</item>
<item num="3" units="tsp">Sea Salt</item>
<item num="6" units="bottles">Ice-cold beer</item>
</ingredients>
<directions>
Combine all ingredients and hand whisk to desired consistency.
Serve and enjoy with ice-cold beers.
</directions>
</recipe>

3- 17


Iterating over Elements


doc = parse("recipe.xml")
for item in doc.findall("ingredients/item"):
    statements

<?xml version="1.0" encoding="iso-8859-1"?>
<recipe>
<title>Famous Guacamole</title>
<description>
A southwest favorite!
</description>
<ingredients>
<item num="2">Large avocados, chopped</item>
<item num="1">Tomato, chopped</item>
<item num="1/2" units="C">White onion, chopped</item>
<item num="1" units="tbl">Fresh squeezed lemon juice</item>
<item num="1">Jalapeno pepper, diced</item>
<item num="1" units="tbl">Fresh cilantro, minced</item>
<item num="3" units="tsp">Sea Salt</item>
<item num="6" units="bottles">Ice-cold beer</item>
</ingredients>
<directions>
Combine all ingredients and hand whisk to desired consistency.
Serve and enjoy with ice-cold beers.
</directions>
</recipe>

3- 18


Element Attributes
for item in doc.findall("ingredients/item"):
    num = item.get("num")
    units = item.get("units")

<?xml version="1.0" encoding="iso-8859-1"?>
<recipe>
<title>Famous Guacamole</title>
<description>
A southwest favorite!
</description>
<ingredients>
<item num="2">Large avocados, chopped</item>
<item num="1">Tomato, chopped</item>
<item num="1/2" units="C">White onion, chopped</item>
<item num="1" units="tbl">Fresh squeezed lemon juice</item>
<item num="1">Jalapeno pepper, diced</item>
<item num="1" units="tbl">Fresh cilantro, minced</item>
<item num="3" units="tsp">Sea Salt</item>
<item num="6" units="bottles">Ice-cold beer</item>
</ingredients>
<directions>
Combine all ingredients and hand whisk to desired consistency.
Serve and enjoy with ice-cold beers.
</directions>
</recipe>
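
Putting find(), findall(), and get() together: the sketch below parses a shortened copy of the recipe from a string (using fromstring() so no file is needed) and pulls out the title and the ingredient attributes. The "each" default for missing units is an assumption made for this example.

```python
from xml.etree.ElementTree import fromstring

recipe = fromstring(
    "<recipe>"
    "<title>Famous Guacamole</title>"
    "<ingredients>"
    '<item num="2">Large avocados, chopped</item>'
    '<item num="1/2" units="C">White onion, chopped</item>'
    "</ingredients>"
    "</recipe>"
)

title = recipe.find("title").text

# get() returns the default when the attribute is absent
items = [(i.get("num"), i.get("units", "each"), i.text)
         for i in recipe.findall("ingredients/item")]
```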

3- 19


Search Wildcards
Specifying a wildcard for an element name
items = doc.findall("*/item")
items = doc.findall("ingredients/*")

The * wildcard only matches a single element


Use multiple wildcards for nesting
<?xml version="1.0"?>
<top>
<a>
<b>
<c>text</c>
</b>
</a>
</top>

c = doc.findall("*/*/c")
c = doc.findall("a/*/c")
c = doc.findall("*/b/c")

3- 20


Search Wildcards
Wildcard for multiple nesting levels (//)
items = doc.findall("//item")

More examples
<?xml version="1.0"?>
<top>
<a>
<b>
<c>text</c>
</b>
</a>
</top>

c = doc.findall("//c")
c = doc.findall("a//c")
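
The wildcard rules can be checked on the small nested document from the slide. One caveat for this sketch: it searches from an Element (the result of fromstring()) rather than from a parsed document object, and on an Element a path may not begin with "//", so the relative form ".//c" is used instead.

```python
from xml.etree.ElementTree import fromstring

top = fromstring("<top><a><b><c>text</c></b></a></top>")

one_level = top.findall("*/c")       # c is not a grandchild of top
two_levels = top.findall("*/*/c")    # one wildcard per nesting level
any_depth = top.findall(".//c")      # searches all descendants
```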

3- 21


cElementTree
There is a C implementation of the library
that is significantly faster

import xml.etree.cElementTree
doc = xml.etree.cElementTree.parse("data.xml")

For all practical purposes, you should use


this version of the library given a choice

Note : The C version lacks a few advanced


customization features, but you probably
won't need them

3- 22


Tree Modification
ElementTree allows modifications to be
made to the document structure

To add a new child to a parent node


node.append(child)

To insert a new child at a selected position


node.insert(index,child)

To remove a child from a parent node


node.remove(child)

3- 23


Tree Output
If you modify a document, it can be rewritten
There is a method to write XML
doc = xml.etree.ElementTree.parse("input.xml")
# Make modifications to doc
...
# Write modified document back to a file
f = open("output.xml","w")
doc.write(f)

Individual elements can be turned into strings


s = xml.etree.ElementTree.tostring(node)
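
A short sketch of the modification methods followed by tostring(), using a tiny made-up ingredient list so the effect of each call is easy to follow.

```python
from xml.etree.ElementTree import Element, fromstring, tostring

node = fromstring(
    "<ingredients><item>avocado</item><item>onion</item></ingredients>"
)

salt = Element("item")
salt.text = "salt"
node.append(salt)                       # avocado, onion, salt

lime = Element("item")
lime.text = "lime"
node.insert(0, lime)                    # lime, avocado, onion, salt

node.remove(node.findall("item")[2])    # drop 'onion'

order = [child.text for child in node]
s = tostring(node)                      # bytes in Python 3
```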

3- 24


Iterative Parsing

An alternative parsing interface

from xml.etree.ElementTree import iterparse


parse = iterparse("file.xml", ('start','end'))
for event, elem in parse:
    if event == 'start':
        # Encountered a start <tag ...>
        ...
    elif event == 'end':
        # Encountered an end </tag>
        ...

This sweeps over an entire XML document


Result is a sequence of start/end events and
element objects being processed

3- 25


Iterative Parsing
If you combine iterative parsing and tree
modification together, you can process
large XML documents with almost no
memory overhead

Programming interface is significantly easier


to use than a similar approach using SAX

General idea : Simply throw away the

elements no longer needed during parsing


3- 26


Iterative Parsing
Programming pattern
from xml.etree.ElementTree import iterparse
parser = iterparse("file.xml",('start','end'))
for event,elem in parser:
    if event == 'start':
        if elem.tag == 'parenttag':
            parent = elem
    if event == 'end':
        if elem.tag == 'tagname':
            # process element with tag 'tagname'
            ...
            # Discard the element when done
            parent.remove(elem)

The last step is the critical part
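
Here is the pattern applied to a concrete (and deliberately tiny) document, read from an in-memory buffer. The tag names "ingredients" and "item" play the roles of 'parenttag' and 'tagname'; after the loop the parent holds no children, which is what keeps memory use flat on large inputs.

```python
import io
from xml.etree.ElementTree import iterparse

XML = (b"<ingredients>"
       b"<item num='2'>avocados</item>"
       b"<item num='1'>tomato</item>"
       b"</ingredients>")

names = []
parent = None
for event, elem in iterparse(io.BytesIO(XML), events=("start", "end")):
    if event == "start" and elem.tag == "ingredients":
        parent = elem
    elif event == "end" and elem.tag == "item":
        names.append(elem.text)     # process the finished element
        parent.remove(elem)         # then discard it

remaining = len(parent)
```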


3- 27


Exercise 3.2

Time : 15 Minutes

3- 28


JSON
JavaScript Object Notation
A data encoding commonly used on the
web when interacting with JavaScript

Sometimes preferred over XML because it's
less verbose and faster to parse

Syntax is almost identical to a Python dict


3- 29


Sample JSON File


{
  "recipe" : {
    "title" : "Famous Guacamole",
    "description" : "A southwest favorite!",
    "ingredients" : [
      {"num": "2", "item":"Large avocados, chopped"},
      {"num": "1/2", "units":"C", "item":"White onion, chopped"},
      {"num": "1", "units":"tbl", "item":"Fresh squeezed lemon juice"},
      {"num": "1", "item":"Jalapeno pepper, diced"},
      {"num": "1", "units":"tbl", "item":"Fresh cilantro, minced"},
      {"num": "3", "units":"tsp", "item":"Sea Salt"},
      {"num": "6", "units":"bottles","item":"Ice-cold beer"}
    ],
    "directions" : "Combine all ingredients and hand whisk to desired consistency. Serve and enjoy with ice-cold beers."
  }
}

3- 30


Processing JSON Data


Parsing a JSON document
import json
doc = json.load(open("recipe.json"))

Result is a collection of nested dict/lists


ingredients = doc['recipe']['ingredients']
for item in ingredients:
    # Process item
    ...

Dumping a dictionary as JSON


f = open("file.json","w")
json.dump(doc,f)
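
The string-based variants json.loads() and json.dumps() are handy for trying this out without files. The sketch below uses a cut-down version of the recipe and checks that a dump/load round trip gives back the same nested structure.

```python
import json

text = ('{"recipe": {"title": "Famous Guacamole", "ingredients": ['
        '{"num": "2", "item": "Large avocados, chopped"}, '
        '{"num": "3", "units": "tsp", "item": "Sea Salt"}]}}')

doc = json.loads(text)

# Plain dict/list indexing navigates the result
names = [item["item"] for item in doc["recipe"]["ingredients"]]

round_trip = json.loads(json.dumps(doc))
```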

3- 31


Exercise 3.3

Time : 15 Minutes

3- 32


Section 4

Web Programming Basics

Introduction
The web is so pervasive that knowing how to
write simple web-based applications is basic
knowledge for every programmer

In this section, we cover the absolute
basics of how to make a Python program
accessible through the web

4- 2


Overview
Some basics of Python web programming
HTTP Protocol
CGI scripting
WSGI (Web Server Gateway Interface)
Custom HTTP servers
4- 3


Disclaimer
Web programming is a huge topic that
could span an entire multi-day class

It might mean different things


Building an entire website
Implementing a web service
Our focus is on some basic mechanisms

found in the Python standard library that all


Python programmers should know about
4- 4


HTTP Explained
HTTP is the underlying protocol of the web
Consists of requests and responses
Browser                                 Web Server
   |-------- GET /index.html ------------>|
   |<------- 200 OK ... <content> --------|

4- 5


HTTP Client Requests


Client (Browser) sends a request

GET /index.html HTTP/1.1


Host: www.python.org
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.3)
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,tex
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
<blank line>

Request line followed by headers that provide


additional information about the client

4- 6


HTTP Responses
Server sends back a response
HTTP/1.1 200 OK
Date: Thu, 26 Apr 2007 19:54:01 GMT
Server: Apache/2.0.54 (Debian GNU/Linux) DAV/2 SVN/1.1.4 mod_python/3.1.3
Last-Modified: Thu, 26 Apr 2007 18:40:24 GMT
Accept-Ranges: bytes
Content-Length: 14315
Connection: close
Content-Type: text/html
<blank line>
<HTML>
...

Response line followed by headers that

further describe the response contents


4- 7


HTTP Protocol
There are a small number of request types
GET
POST
HEAD
PUT

There are standardized response codes


200    OK
403    Forbidden
404    Not Found
501    Not implemented
...

But, this isn't an exhaustive tutorial


4- 8


Content Encoding
Content is described by these header fields:
Content-type:
Content-length:

Example:
Content-type: image/jpeg
Content-length: 12422

Of these, Content-type is the most critical


Length is optional, but it's polite to include it if
it can be determined in advance

4- 9


Payload Packaging
Responses must follow this formatting
Headers         ...
                Content-type: image/jpeg
                Content-length: 12422
                ...
(Blank Line)    \r\n
Content         (12422 bytes)
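
The layout can be sketched in a few lines of Python 3. The HTML content here is a made-up example; the point is the header lines, each ending in \r\n, followed by one empty line and then the raw bytes.

```python
content = b"<html><body>Hello</body></html>"

headers = [
    ("Content-type", "text/html"),
    ("Content-length", str(len(content))),
]

# Each header line ends with \r\n; an extra \r\n marks the end of headers
head = "".join("%s: %s\r\n" % pair for pair in headers)
payload = head.encode("ascii") + b"\r\n" + content

# A receiver splits at the first blank line to recover both parts
raw_headers, _, body = payload.partition(b"\r\n\r\n")
```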

4- 10


Exercise 4.1

Time : 10 Minutes

4- 11


Role of Python
Most web-related Python programming
pertains to the operation of the server
Browser (Firefox, Safari, Internet Explorer, etc.)
   |
   |  GET /index.html
   v
Web Server (Apache, Python, MySQL, etc.)

Python scripts used on the server to create,


manage, or deliver content back to clients

4- 12


Typical Python Tasks


Static content generation. One-time
generation of static web pages to be served
by a standard web server such as Apache.

Dynamic content generation. Python scripts
that produce output in response to requests
(e.g., form processing, CGI scripting).

4- 13


Content Generation
It is often overlooked, but Python is a useful
tool for simply creating static web pages

Example : Taking various pages of content,


adding elements, and applying a common
format across all of them.

Web server simply delivers all of the generated


content as normal files

4- 14


Example : Page Templates


Create a page "template" file
<html>
<body>
<table width=700>
<tr><td>
  Your Logo : Navigation Links
  <hr>
</td></tr>
<tr><td>
  $content
  <hr>
  <em>Copyright (C) 2008</em>
</td></tr>
</table>
</body>
</html>

Note the special $variable

4- 15


Example : Page Templates


Use template strings to render pages
from string import Template
# Read the template string
pagetemplate = Template(open("template.html").read())
# Go make content
page = make_content()
# Render the template to a file
f = open(outfile,"w")
f.write(pagetemplate.substitute(content=page))

Key idea : If you want to change the

appearance, you just change the template
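
A compact sketch of the substitution step, with the template held in a string rather than read from template.html. The $year placeholder is an addition for this example; it shows the difference between substitute(), which requires every variable, and safe_substitute(), which leaves unknown ones in place.

```python
from string import Template

pagetemplate = Template(
    "<html><body>\n"
    "$content\n"
    "<hr>\n"
    "<em>Copyright (C) $year</em>\n"
    "</body></html>"
)

# All placeholders supplied
page = pagetemplate.substitute(content="<h1>Hello</h1>", year=2010)

# Missing placeholders are left untouched instead of raising KeyError
partial = pagetemplate.safe_substitute(content="<h1>Hello</h1>")
```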


4- 16


Commentary
Using page templates to generate static
content is extremely common

For simple things, just use the standard library


modules (e.g., string.Template)

For more advanced applications, there are


numerous third-party template packages

4- 17


Exercise 4.2

Time : 10 Minutes

4- 18


HTTP Servers
Python comes with libraries that implement
simple self-contained web servers

Very useful for testing or special situations

where you want web service, but don't want


to install something larger (e.g., Apache)

Not high performance, sometimes "good


enough" is just that

4- 19


A Simple Web Server


Serve files from a directory
from BaseHTTPServer import HTTPServer
from SimpleHTTPServer import SimpleHTTPRequestHandler
import os
os.chdir("/home/docs/html")
serv = HTTPServer(("",8080),SimpleHTTPRequestHandler)
serv.serve_forever()

This creates a minimal web server


Connect with a browser and try it out
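
In Python 3 both modules were folded into http.server. The sketch below builds an equivalent file server; make_file_server() is a hypothetical helper, and it binds to 127.0.0.1 on a system-chosen port (port 0) so it can be created and closed safely. It uses the handler's directory argument (Python 3.7+) instead of os.chdir().

```python
import functools
from http.server import HTTPServer, SimpleHTTPRequestHandler

def make_file_server(directory, port=0):
    """Create (but do not start) a server for files under directory."""
    handler = functools.partial(SimpleHTTPRequestHandler,
                                directory=directory)
    return HTTPServer(("127.0.0.1", port), handler)

serv = make_file_server(".")
bound_port = serv.server_address[1]   # the port actually assigned

# Call serv.serve_forever() here to actually serve requests
serv.server_close()
```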
4- 20


Exercise 4.3

Time : 10 Minutes

4- 21


A Web Server with CGI


Serve files and allow CGI scripts
from BaseHTTPServer import HTTPServer
from CGIHTTPServer import CGIHTTPRequestHandler
import os
os.chdir("/home/docs/html")
serv = HTTPServer(("",8080),CGIHTTPRequestHandler)
serv.serve_forever()

Executes scripts in "/cgi-bin" and "/htbin"

directories in order to create dynamic content

4- 22


CGI Scripting
Common Gateway Interface
A common protocol used by existing web
servers to run server-side scripts, plugins

Example: Running Python, Perl, Ruby scripts


under Apache, etc.

Classically associated with form processing,


but that's far from the only application

4- 23


CGI Example
A web-page might have a form on it
Here is the underlying HTML code
<FORM ACTION="/cgi-bin/subscribe.py" METHOD="POST">
Your name: <INPUT type="text" name="name" size="30"><br>
Your email: <INPUT type="text" name="email" size="30"><br>
<INPUT type="submit" name="submit-button" value="Subscribe">

Specifies a CGI program on the server


4- 24


CGI Example
Forms have submitted fields or parameters
<FORM ACTION="/cgi-bin/subscribe.py" METHOD="POST">
Your name: <INPUT type="text" name="name" size="30"><br>
Your email: <INPUT type="text" name="email" size="30"><br>
<INPUT type="submit" name="submit-button" value="Subscribe">

A request will include both the URL (cgi-bin/


subscribe.py) along with the field values

4- 25


CGI Example
Request encoding looks like this:
Request

POST /cgi-bin/subscribe.py HTTP/1.1
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS
Accept: text/xml,application/xml,application/xhtml
Accept-Language: en-us,en;q=0.5
...

Query String

name=David+Beazley&email=dave%40dabeaz.com&submit-button=Subscribe

Request tells the server what to run


Query string contains encoded form fields
4- 26


CGI Mechanics
CGI was originally implemented as a scheme
for launching processing scripts as a subprocess
to a web server

/cgi-bin/subscribe.py

HTTP Server  ---- stdin ---->  Python subscribe.py
HTTP Server  <--- stdout ----  Python subscribe.py

Script will decode the request and carry out
some kind of action

4- 27


Classic CGI Interface


Server populates environment variables with
information about the request
import os
os.environ['SCRIPT_NAME']
os.environ['REMOTE_ADDR']
os.environ['QUERY_STRING']
os.environ['REQUEST_METHOD']
os.environ['CONTENT_TYPE']
os.environ['CONTENT_LENGTH']
os.environ['HTTP_COOKIE']
...

stdin/stdout provide I/O link to server


sys.stdin      # Read to get data sent by client
sys.stdout     # Write to create the response

4- 28


CGI Query Variables


For GET requests, an env. variable is used
query = os.environ['QUERY_STRING']

For POST requests, you read from stdin


if os.environ['REQUEST_METHOD'] == 'POST':
    size = int(os.environ['CONTENT_LENGTH'])
    query = sys.stdin.read(size)

This yields the raw query string


name=David+Beazley&email=dave%40dabeaz.com&submit-button=Subscribe
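
Decoding the raw query string by hand is rarely necessary. As a sketch in Python 3, urllib.parse.parse_qs() turns it into a dictionary, undoing the "+" and "%40" escapes along the way; each value is a list because a field name may appear more than once.

```python
from urllib.parse import parse_qs

raw = "name=David+Beazley&email=dave%40dabeaz.com&submit-button=Subscribe"

fields = parse_qs(raw)

# Take the first (and here only) value of each field
name = fields["name"][0]
email = fields["email"][0]
```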

4- 29


cgi Module
A utility library for decoding requests
Major feature: Getting the passed parameters
#!/usr/bin/env python
# subscribe.py
import cgi
form = cgi.FieldStorage()         # Parse parameters

# Get various field values
name = form.getvalue('name')
email = form.getvalue('email')

All CGI scripts start like this


FieldStorage parses the incoming request into
a dictionary-like object for extracting inputs

4- 30


CGI Responses

CGI scripts respond by simply printing

response headers and the raw content


name = form.getvalue('name')
email = form.getvalue('email')
... do some kind of processing ...
# Output a response
print "Status: 200 OK"
print "Content-type: text/html"
print
print "<html><head><title>Success!</title></head><body>"
print "Hello %s, your email is %s" % (name,email)
print "</body>"

Normally you print HTML, but any kind of

data can be returned (for web services, you


might return XML, JSON, etc.)
4- 31


Note on Status Codes


In CGI, the server status code is set by

including a special "Status:" header field


import cgi
form = cgi.FieldStorage()
name = form.getvalue('name')
email = form.getvalue('email')
...
print "Status: 200 OK"
print "Content-type: text/html"
print
print "<html><head><title>Success!</title></head><body>"
print "Hello %s, your email is %s" % (name,email)
print "</body>"

This is a special server directive that sets the


response status

4- 32


CGI Commentary
There are many more minor details (consult
a reference on CGI programming)

The basic idea is simple


Server runs a script
Script receives inputs from

environment variables and stdin

Script produces output on stdout

It's old-school, but sometimes it's all you get


4- 33


Exercise 4.4

Time : 25 Minutes

4- 34


WSGI
Web Server Gateway Interface (WSGI)
This is a standardized interface for creating
Python web services

Allows one to create code that can run under a


wide variety of web servers and frameworks as
long as they also support WSGI (and most do)

So, what is WSGI?


4- 35


WSGI Interface
WSGI is an application programming interface
loosely based on CGI programming

In CGI, there are just two basic features


Getting values of inputs (env variables)
Producing output by printing
WSGI takes this concept and repackages it into
a more modular form

4- 36


WSGI Example
With WSGI, you write an "application"
An application is just a function (or callable)
def hello_app(environ, start_response):
    status = "200 OK"
    response_headers = [ ('Content-type','text/plain')]
    response = []
    start_response(status,response_headers)
    response.append("Hello World\n")
    response.append("You requested :"+environ['PATH_INFO'])
    return response

This function encapsulates the handling of some


request that will be received

4- 37


WSGI Applications
Applications always receive just two inputs
def hello_app(environ, start_response):
    status = "200 OK"
    response_headers = [ ('Content-type','text/plain')]
    response = []
    start_response(status,response_headers)
    response.append("Hello World\n")
    response.append("You requested :"+environ['PATH_INFO'])
    return response

environ - A dictionary of input parameters


start_response - A callable (e.g., function)
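
Because an application is just a callable taking these two inputs, it can be exercised directly without any server. The sketch below is a Python 3 adaptation (the body is returned as bytes); fake_start_response() is a hypothetical stand-in for the server's callback, and wsgiref.util.setup_testing_defaults() fills in a plausible environ.

```python
from wsgiref.util import setup_testing_defaults

def hello_app(environ, start_response):
    status = "200 OK"
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    body = "Hello World\nYou requested :" + environ['PATH_INFO']
    return [body.encode('utf-8')]

# Stand-in for the server side: just remember what the app reported
captured = {}
def fake_start_response(status, headers):
    captured['status'] = status
    captured['headers'] = headers

environ = {}
setup_testing_defaults(environ)
environ['PATH_INFO'] = '/index.html'

result = b"".join(hello_app(environ, fake_start_response))
```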
4- 38


WSGI Environment
The environment contains CGI variables
def hello_app(environ, start_response):
    status = "200 OK"
    response_headers = [ ('Content-type','text/plain')]
    response = []
    start_response(status,response_headers)
    response.append("Hello World\n")
    response.append("You requested :"+environ['PATH_INFO'])
    return response

environ['REQUEST_METHOD']
environ['SCRIPT_NAME']
environ['PATH_INFO']
environ['QUERY_STRING']
environ['CONTENT_TYPE']
environ['CONTENT_LENGTH']
environ['SERVER_NAME']
...

The meaning and values are exactly the same as


in traditional CGI programs

4- 39


WSGI Environment
Environment also contains some WSGI variables
def hello_app(environ, start_response):
    status = "200 OK"
    response_headers = [ ('Content-type','text/plain')]
    response = []
    start_response(status,response_headers)
    response.append("Hello World\n")
    response.append("You requested :"+environ['PATH_INFO'])
    return response

environ['wsgi.input']
environ['wsgi.errors']
environ['wsgi.url_scheme']
environ['wsgi.multithread']
environ['wsgi.multiprocess']
...

wsgi.input - A file-like object for reading data


wsgi.errors - File-like object for error output
4- 40


Processing WSGI Inputs


Parsing of query strings is similar to CGI
import cgi
def sample_app(environ,start_response):
    fields = cgi.FieldStorage(environ['wsgi.input'],
                              environ=environ)
    # fields now has the CGI query variables
    ...

You use FieldStorage() as before, but give it

extra parameters telling it where to get data
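
On recent Python 3 releases the cgi module is no longer available (it was removed in Python 3.13), so here is a sketch of an alternative that needs only urllib.parse. The helper read_fields() is a hypothetical name; it handles simple URL-encoded forms only, reading POST data from wsgi.input and GET data from QUERY_STRING.

```python
import io
from urllib.parse import parse_qs

def read_fields(environ):
    """Return form fields as a dict of lists (URL-encoded forms only)."""
    if environ.get('REQUEST_METHOD') == 'POST':
        size = int(environ.get('CONTENT_LENGTH') or 0)
        raw = environ['wsgi.input'].read(size).decode('ascii')
    else:
        raw = environ.get('QUERY_STRING', '')
    return parse_qs(raw)

# Hand-built environments standing in for what a server would supply
body = b"name=David+Beazley&email=dave%40dabeaz.com"
post_env = {
    'REQUEST_METHOD': 'POST',
    'CONTENT_LENGTH': str(len(body)),
    'wsgi.input': io.BytesIO(body),
}
get_env = {'REQUEST_METHOD': 'GET', 'QUERY_STRING': 'name=Guido'}

post_fields = read_fields(post_env)
get_fields = read_fields(get_env)
```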

4- 41


WSGI Responses

The second argument is a function that is called


to initiate a response

def hello_app(environ, start_response):
    status = "200 OK"
    response_headers = [ ('Content-type','text/plain')]
    response = []
    start_response(status,response_headers)
    response.append("Hello World\n")
    response.append("You requested :"+environ['PATH_INFO'])
    return response

You pass it two parameters


A status string (e.g., "200 OK")
A list of (header, value) HTTP header pairs

4- 42

WSGI Responses
start_response() is a hook back to the server
Gives the server information for formulating
the response (status, headers, etc.)

Prepares the server for receiving content data

4- 43


WSGI Content
Content is returned as a sequence of byte strings
def hello_app(environ, start_response):
    status = "200 OK"
    response_headers = [ ('Content-type','text/plain')]
    response = []
    start_response(status,response_headers)
    response.append("Hello World\n")
    response.append("You requested :"+environ['PATH_INFO'])
    return response

Note: This differs from CGI programming


where you produce output using print.

4- 44


WSGI Content Encoding

WSGI applications must always produce bytes


If working with Unicode, it must be encoded
def hello_app(environ, start_response):
    status = "200 OK"
    response_headers = [ ('Content-type','text/html')]
    start_response(status,response_headers)
    return [u"That's a spicy Jalape\u00f1o".encode('utf-8')]

This is a little tricky--if you're not anticipating Unicode, everything can break if a Unicode string is returned (be aware that certain modules such as database modules may do this)
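
One possible defense, sketched here for Python 3, is a small wrapper that encodes any text chunks before they reach the server. The names encode_output and sloppy_app are hypothetical, and UTF-8 is an assumed default rather than anything WSGI mandates.

```python
def encode_output(app, encoding='utf-8'):
    # Wrap a WSGI app so that any text chunks are encoded to bytes
    def wrapper(environ, start_response):
        for chunk in app(environ, start_response):
            if isinstance(chunk, str):
                chunk = chunk.encode(encoding)
            yield chunk
    return wrapper

def sloppy_app(environ, start_response):
    start_response("200 OK", [('Content-type', 'text/html; charset=utf-8')])
    # Mixes bytes with a text string, e.g. a value from a database module
    return [b"That's a spicy ", "Jalape\u00f1o"]

safe_app = encode_output(sloppy_app)
pieces = list(safe_app({}, lambda status, headers: None))
result = b"".join(pieces)
```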
4- 45


WSGI Deployment
The main point of WSGI is to simplify
deployment of web applications

You will notice that the interface depends on no third party libraries, no objects, or even any standard library modules

That is intentional. WSGI apps are supposed to be small self-contained units that plug into other environments

4- 46


WSGI Deployment
Running a simple stand-alone WSGI server
from wsgiref import simple_server
httpd = simple_server.make_server("",8080,hello_app)
httpd.serve_forever()

This runs an HTTP server for testing


You probably wouldn't deploy anything using this, but if you're developing code on your own machine, it can be useful
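
For quick experiments, the test server can also be driven from the same script. The sketch below assumes Python 3; it binds to port 0 so the operating system picks a free port, serves exactly one request in a background thread, and fetches it with http.client. QuietHandler is a hypothetical subclass that only silences logging.

```python
import http.client
import threading
from wsgiref.simple_server import make_server, WSGIRequestHandler

def hello_app(environ, start_response):
    start_response("200 OK", [('Content-type', 'text/plain')])
    return [b"Hello World\n"]

class QuietHandler(WSGIRequestHandler):
    def log_message(self, format, *args):
        pass    # Suppress per-request logging on stderr

# Port 0 asks the OS for any free port
httpd = make_server("127.0.0.1", 0, hello_app, handler_class=QuietHandler)
port = httpd.server_address[1]

# Serve exactly one request in the background, then stop
t = threading.Thread(target=httpd.handle_request)
t.start()

conn = http.client.HTTPConnection("127.0.0.1", port)
conn.request("GET", "/")
reply = conn.getresponse().read()
conn.close()

t.join()
httpd.server_close()
```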

4- 47


WSGI and CGI


WSGI applications can run on top of standard
CGI scripting (which is useful if you're
interfacing with traditional web servers).
#!/usr/bin/env python
# hello.py
def hello_app(environ,start_response):
    ...

import wsgiref.handlers
wsgiref.handlers.CGIHandler().run(hello_app)

4- 48


Exercise 4.5

Time : 20 Minutes

4- 49


Customized HTTP
Can implement customized HTTP servers
Use BaseHTTPServer module
Define a customized HTTP handler object
Requires some knowledge of the underlying
HTTP protocol

4- 50


Customized HTTP

Example: A Hello World Server

from BaseHTTPServer import BaseHTTPRequestHandler,HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/hello':
            self.send_response(200,"OK")
            # The body below is HTML, so label it as such
            self.send_header('Content-type','text/html')
            self.end_headers()
            self.wfile.write("""<HTML>
<HEAD><TITLE>Hello</TITLE></HEAD>
<BODY>Hello World!</BODY></HTML>""")
        else:
            self.send_error(404)    # Any other path: not found

serv = HTTPServer(("",8080),HelloHandler)
serv.serve_forever()

do_GET() defines the handler for "GET" requests
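
A Python 3 rendition of this server might look like the sketch below. It assumes the renamed http.server module, writes bytes to wfile, and answers unknown paths with 404. The fetch() helper, the log_message() override, and the use of port 0 (any free port) are additions for the sake of a self-checking demo.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/hello':
            body = b"<HTML><BODY>Hello World!</BODY></HTML>"
            self.send_response(200, "OK")
            self.send_header('Content-type', 'text/html')
            self.send_header('Content-length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)      # wfile expects bytes in Python 3
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass    # Keep the demo quiet

serv = HTTPServer(("127.0.0.1", 0), HelloHandler)
port = serv.server_address[1]
t = threading.Thread(target=serv.serve_forever)
t.start()

def fetch(path):
    # Issue one GET request and return (status, body)
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request("GET", path)
    resp = conn.getresponse()
    data = resp.read()
    conn.close()
    return resp.status, data

hello = fetch('/hello')
missing = fetch('/other')

serv.shutdown()
t.join()
serv.server_close()
```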


4- 51


Customized HTTP

A more complex server

from BaseHTTPServer import BaseHTTPRequestHandler,HTTPServer

class MyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        ...
    def do_POST(self):
        ...
    def do_HEAD(self):
        ...
    def do_PUT(self):
        ...

serv = HTTPServer(("",8080),MyHandler)
serv.serve_forever()

Redefine the behavior of the server by defining code for all of the standard HTTP request types

Can customize everything (requires work)


4- 52


Exercise 4.6

Time : 15 Minutes

4- 53


Web Frameworks

Python has a huge number of web frameworks


Zope
Django
Turbogears
Pylons
CherryPy
Google App Engine
Frankly, there are too many to list here.
4- 54


Web Frameworks
Web frameworks build upon previous concepts
Provide additional support for
Form processing
Cookies/sessions
Database integration
Content management
Usually require their own training course
4- 55


Commentary
If you're building small self-contained components or middleware for use on the web, you're probably better off with WSGI

The programming interface is minimal

The components you create will be self-contained if you're careful with your design

Since WSGI is an official part of Python, virtually all web frameworks will support it


4- 56


Section 5

Advanced Networking

Overview
An assortment of advanced networking topics
The Python network programming stack
Concurrent servers
Distributed computing
Multiprocessing
5- 2


Problem with Sockets


In part 1, we looked at low-level programming
with sockets

Although it is possible to write applications based on that interface, most of Python's network libraries use a higher level interface

For servers, there's the SocketServer module


5- 3


SocketServer
A module for writing custom servers
Supports TCP and UDP networking
The module aims to simplify some of the low-level details of working with sockets and put all of that functionality in one place

5- 4


SocketServer Example
To use SocketServer, you define handler
objects using classes

Example: A time server


import SocketServer
import time

class TimeHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        self.request.sendall(time.ctime()+"\n")

serv = SocketServer.TCPServer(("",8000),TimeHandler)
serv.serve_forever()

5- 5


SocketServer Example
Handler Class
Server is implemented by a handler class

import SocketServer
import time

class TimeHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        self.request.sendall(time.ctime()+"\n")

serv = SocketServer.TCPServer(("",8000),TimeHandler)
serv.serve_forever()

5- 6


SocketServer Example
Handler Class
Must inherit from BaseRequestHandler

import SocketServer
import time

class TimeHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        self.request.sendall(time.ctime())

serv = SocketServer.TCPServer(("",8000),TimeHandler)
serv.serve_forever()

5- 7


SocketServer Example
handle() method
Define handle() to implement the server action

import SocketServer
import time

class TimeHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        self.request.sendall(time.ctime())

serv = SocketServer.TCPServer(("",8000),TimeHandler)
serv.serve_forever()

5- 8


SocketServer Example
Client socket connection

import SocketServer
import time

class TimeHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        # self.request: socket object for client connection
        self.request.sendall(time.ctime())

serv = SocketServer.TCPServer(("",8000),TimeHandler)
serv.serve_forever()

This is a bare socket object


5- 9


SocketServer Example
Creating and running the server

import SocketServer
import time

class TimeHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        self.request.sendall(time.ctime())

# Creates a server and connects a handler
serv = SocketServer.TCPServer(("",8000),TimeHandler)
# Runs the server forever
serv.serve_forever()

5- 10


Execution Model
Server runs in a loop waiting for requests
On each connection, the server creates a new instantiation of the handler class

The handle() method is invoked to handle the logic of communicating with the client

When handle() returns, the connection is closed and the handler instance is destroyed
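
The execution model can be observed with a short experiment. This is a sketch for Python 3, where the module is named socketserver and sockets carry bytes. The instances list and the ask_time() helper are hypothetical additions; keeping the handlers in a list is only there to show that each connection received its own object.

```python
import socket
import socketserver
import threading
import time

instances = []

class TimeHandler(socketserver.BaseRequestHandler):
    def handle(self):
        instances.append(self)     # Remember which handler object ran
        self.request.sendall((time.ctime() + "\n").encode('ascii'))

serv = socketserver.TCPServer(("127.0.0.1", 0), TimeHandler)
port = serv.server_address[1]
t = threading.Thread(target=serv.serve_forever)
t.start()

def ask_time():
    # Connect, then read until the server closes the connection
    with socket.create_connection(("127.0.0.1", port)) as s:
        chunks = []
        while True:
            data = s.recv(1024)
            if not data:           # Empty read: connection closed
                break
            chunks.append(data)
    return b"".join(chunks)

first = ask_time()
second = ask_time()

serv.shutdown()
t.join()
serv.server_close()
```

The client loop ends on its own because the server closes the connection as soon as handle() returns.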


5- 11


Exercise 5.1

Time : 15 Minutes

5- 12


Big Picture
A major goal of SocketServer is to simplify the task of plugging different server handler objects into different kinds of server implementations

For example, servers with different implementations of concurrency, extra security features, etc.

5- 13


Concurrent Servers
SocketServer supports different kinds of
concurrency implementations

TCPServer          - Synchronous TCP server (one client)
ForkingTCPServer   - Forking server (multiple clients)
ThreadingTCPServer - Threaded server (multiple clients)

Just pick the server that you want and plug the handler object into it

# Forking server
serv = SocketServer.ForkingTCPServer(("",8000),TimeHandler)
serv.serve_forever()

# Or, alternatively, a threaded server
serv = SocketServer.ThreadingTCPServer(("",8000),TimeHandler)
serv.serve_forever()
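
To illustrate what "multiple clients" means in practice, the following Python 3 sketch (socketserver is the Python 3 module name) connects one client that stays silent and a second client that is served anyway. EchoHandler and the uppercase reply are made up for the demonstration.

```python
import socket
import socketserver
import threading

class EchoHandler(socketserver.BaseRequestHandler):
    def handle(self):
        data = self.request.recv(1024)      # Blocks until the client sends
        self.request.sendall(data.upper())

serv = socketserver.ThreadingTCPServer(("127.0.0.1", 0), EchoHandler)
serv.daemon_threads = True
port = serv.server_address[1]
t = threading.Thread(target=serv.serve_forever)
t.start()

# The first client connects but stays silent, tying up one handler
slow = socket.create_connection(("127.0.0.1", port))

# A second client is still served: each connection gets its own thread
fast = socket.create_connection(("127.0.0.1", port), timeout=5)
fast.sendall(b"hello")
fast_reply = fast.recv(1024)
fast.close()

slow.sendall(b"finally")
slow_reply = slow.recv(1024)
slow.close()

serv.shutdown()
t.join()
serv.server_close()
```

With the plain synchronous TCPServer, the second client would be expected to wait until the first handler returned.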

5- 14


Server Mixin Classes


SocketServer defines these mixin classes
ForkingMixIn
ThreadingMixIn

These can be used to add concurrency to other server objects (via multiple inheritance)

from BaseHTTPServer import HTTPServer
from SimpleHTTPServer import SimpleHTTPRequestHandler
from SocketServer import ThreadingMixIn

class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
    pass

serv = ThreadedHTTPServer(("",8080),
                          SimpleHTTPRequestHandler)
serv.serve_forever()

5- 15


Server Subclassing
SocketServer objects are also subclassed to
provide additional customization

Example: Security/Firewalls
from SocketServer import TCPServer

class RestrictedTCPServer(TCPServer):
    # Restrict connections to loopback interface
    def verify_request(self,request,addr):
        host, port = addr
        if host != '127.0.0.1':
            return False
        else:
            return True

serv = RestrictedTCPServer(("",8080),TimeHandler)
serv.serve_forever()
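
The verify_request() hook can be exercised without any network traffic. In this Python 3 sketch (module name socketserver), the server is created with bind_and_activate=False so no port is opened, and the check is called by hand. The address 192.0.2.10 is just a documentation-range placeholder.

```python
import socketserver

class RestrictedTCPServer(socketserver.TCPServer):
    # Restrict connections to loopback interface
    def verify_request(self, request, client_address):
        host, port = client_address
        return host == '127.0.0.1'

# bind_and_activate=False creates the server object without opening a port,
# which is enough to exercise the check by hand
serv = RestrictedTCPServer(("127.0.0.1", 0), socketserver.BaseRequestHandler,
                           bind_and_activate=False)
local_ok = serv.verify_request(None, ('127.0.0.1', 50000))
remote_ok = serv.verify_request(None, ('192.0.2.10', 50000))
serv.server_close()
```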

5- 16


Exercise 5.2

Time : 15 Minutes

5- 17


Distributed Computing
It is relatively simple to build Python applications that span multiple machines or operate on clusters

5- 18


Discussion
Keep in mind: Python is a "slow" interpreted
programming language

So, we're not necessarily talking about high performance computing in Python (e.g., number crunching, etc.)

However, Python can serve as a very useful distributed scripting environment for controlling things on different systems

5- 19


XML-RPC
Remote Procedure Call
Uses HTTP as a transport protocol
Parameters/Results encoded in XML
Supported by languages other than Python

5- 20


Simple XML-RPC
How to create a stand-alone server

from SimpleXMLRPCServer import SimpleXMLRPCServer

def add(x,y):
    return x+y

s = SimpleXMLRPCServer(("",8080))
s.register_function(add)
s.serve_forever()

How to test it (xmlrpclib)

>>> import xmlrpclib
>>> s = xmlrpclib.ServerProxy("http://localhost:8080")
>>> s.add(3,5)
8
>>> s.add("Hello","World")
'HelloWorld'
>>>

5- 21
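
For reference, a Python 3 sketch of the same server and client in one script. It assumes the renamed modules xmlrpc.server and xmlrpc.client, runs the server in a background thread, and uses port 0 so any free port is chosen.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

def add(x, y):
    return x + y

# logRequests=False keeps the server from printing each request
s = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
s.register_function(add)
port = s.server_address[1]
t = threading.Thread(target=s.serve_forever)
t.start()

proxy = xmlrpc.client.ServerProxy("http://127.0.0.1:%d" % port)
total = proxy.add(3, 5)
joined = proxy.add("Hello", "World")

s.shutdown()
t.join()
s.server_close()
```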


Simple XML-RPC
Adding multiple functions
from SimpleXMLRPCServer import SimpleXMLRPCServer
s = SimpleXMLRPCServer(("",8080))
s.register_function(add)
s.register_function(foo)
s.register_function(bar)
s.serve_forever()

Registering an instance (exposes all methods)


from SimpleXMLRPCServer import SimpleXMLRPCServer
s = SimpleXMLRPCServer(("",8080))
obj = SomeObject()
s.register_instance(obj)
s.serve_forever()

5- 22


XML-RPC Commentary
XML-RPC is extremely easy to use
Almost too easy--you might get the perception that it's extremely limited or fragile

I have encountered a lot of major projects that are using XML-RPC for distributed control

Users seem to love it (I concur)


5- 23


XML-RPC and Binary


One word of caution...
XML-RPC assumes all strings are UTF-8 encoded Unicode

Consequence: You can't shove a string of raw binary data through an XML-RPC call

For binary: must base64 encode/decode

base64 module can be used for this
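
A minimal sketch of the base64 round trip, assuming Python 3. Rather than starting a server, it marshals and unmarshals a call locally with xmlrpc.client.dumps() and loads(); the method name "store" is hypothetical.

```python
import base64
import xmlrpc.client

raw = bytes(range(256))                        # Arbitrary binary data
text = base64.b64encode(raw).decode('ascii')   # Safe to send as a string

# Marshal and unmarshal an XML-RPC call locally, without a server
wire = xmlrpc.client.dumps((text,), methodname='store')
params, method = xmlrpc.client.loads(wire)

recovered = base64.b64decode(params[0])
```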
5- 24


Exercise 5.3

Time : 15 Minutes

5- 25


Serializing Python Objects


In distributed applications, you may want to pass various kinds of Python objects around (e.g., lists, dicts, sets, instances, etc.)

Libraries such as XML-RPC support simple data types, but not anything more complex

However, serializing arbitrary Python objects into byte-strings is quite simple

5- 26


pickle Module
A module for serializing objects
Serializing an object onto a "file"
import pickle
...
pickle.dump(someobj,f)

Unserializing an object from a file


someobj = pickle.load(f)

Here, a file might be a file, a pipe, a wrapper around a socket, etc.

5- 27


Pickling to Strings
Pickle can also turn objects into byte strings
import pickle
# Convert to a string
s = pickle.dumps(someobj, protocol)
...
# Load from a string
someobj = pickle.loads(s)

This can be used if you need to embed a Python object into some other messaging protocol or data encoding
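
As one example of embedding pickles in another protocol, the sketch below frames each pickled object with a 4-byte length header so several objects can share one byte stream. The framing scheme and the names pack_message and unpack_message are assumptions for illustration, not part of the pickle module.

```python
import pickle
import struct

def pack_message(obj):
    # 4-byte big-endian length header followed by the pickled object
    payload = pickle.dumps(obj)
    return struct.pack("!I", len(payload)) + payload

def unpack_message(data):
    # Returns (object, remaining bytes)
    (size,) = struct.unpack("!I", data[:4])
    payload = data[4:4 + size]
    return pickle.loads(payload), data[4 + size:]

# Two messages back to back, as they might arrive on a socket
stream = pack_message([1, 2, 3]) + pack_message({'x': 4.5})
first, rest = unpack_message(stream)
second, rest = unpack_message(rest)
```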

5- 28


Example
Using pickle with XML-RPC
# addserv.py
import pickle

def add(px,py):
    x = pickle.loads(px)
    y = pickle.loads(py)
    return pickle.dumps(x+y)

from SimpleXMLRPCServer import SimpleXMLRPCServer
serv = SimpleXMLRPCServer(("",15000))
serv.register_function(add)
serv.serve_forever()

Notice: All input arguments and return values are encoded/decoded with pickle

5- 29


Example
Passing Python objects from the client

>>> import pickle
>>> import xmlrpclib
>>> serv = xmlrpclib.ServerProxy("http://localhost:15000")
>>> a = [1,2,3]
>>> b = [4,5]
>>> r = serv.add(pickle.dumps(a),pickle.dumps(b))
>>> c = pickle.loads(r)
>>> c
[1, 2, 3, 4, 5]
>>>

Again, all input and return values are processed through pickle

5- 30


Miscellaneous Comments
Pickle is really only useful if used in a Python-only environment

Would not use if you need to communicate with other programming languages

There are also security concerns

Never use pickle with untrusted clients (malformed pickles can be used to execute arbitrary system commands)
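
One common mitigation, sketched here as an assumption rather than anything from the course, is to sign each pickle with a shared secret and verify the signature before unpickling. This only helps when both ends already trust each other and share a key; it does not make pickle safe for genuinely untrusted clients. SECRET, sign_pickle, and load_signed are hypothetical names.

```python
import hashlib
import hmac
import pickle

SECRET = b"shared-secret"      # Hypothetical key known to both ends

def sign_pickle(obj, key=SECRET):
    # Prefix the pickled data with a 32-byte HMAC-SHA256 digest
    payload = pickle.dumps(obj)
    digest = hmac.new(key, payload, hashlib.sha256).digest()
    return digest + payload

def load_signed(data, key=SECRET):
    digest, payload = data[:32], data[32:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(digest, expected):
        raise ValueError("bad signature; refusing to unpickle")
    return pickle.loads(payload)

message = sign_pickle([1, 2, 3])

# Flip one bit in the payload to simulate tampering
tampered = message[:-1] + bytes([message[-1] ^ 1])
try:
    load_signed(tampered)
    rejected = False
except ValueError:
    rejected = True
```

The tampered message is rejected before pickle.loads() ever sees it.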
5- 31


Exercise 5.4

Time : 15 Minutes

5- 32


multiprocessing
Python 2.6/3.0 include a new library module
(multiprocessing) that can be used for
different forms of distributed computation

It is a substantial module that also addresses interprocess communication, parallel computing, worker pools, etc.

Will only show a few network features here


5- 33


Connections
Creating a dedicated connection between
two Python interpreter processes

Listener (server) process


from multiprocessing.connection import Listener
serv = Listener(("",16000),authkey="12345")
c = serv.accept()

Client process
from multiprocessing.connection import Client
c = Client(("servername",16000),authkey="12345")

On the surface, this looks similar to a TCP connection


5- 34


Connection Use
Connections allow bidirectional message
passing of arbitrary Python objects
c.send(obj)
obj = c.recv()

Underneath the covers, everything routes through the pickle module

Similar to a network connection except that you just pass objects through it
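
Both ends of a connection can be tried out in a single script by running the listener in a thread. This is a Python 3 sketch: the authkey must be bytes there, and port 0 lets the operating system choose a free port, which is then read back from the listener's address attribute. The serve_one() helper is hypothetical.

```python
import threading
from multiprocessing.connection import Client, Listener

AUTHKEY = b"12345"          # Must be bytes in Python 3

serv = Listener(("127.0.0.1", 0), authkey=AUTHKEY)

def serve_one():
    c = serv.accept()
    x, y = c.recv()         # Receive a pair
    c.send(x + y)           # Send back the combined result
    c.close()

t = threading.Thread(target=serve_one)
t.start()

# serv.address holds the (host, port) actually bound
client = Client(serv.address, authkey=AUTHKEY)
client.send(([1, 2, 3], [4, 5]))
result = client.recv()
client.close()

t.join()
serv.close()
```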

5- 35


Example
Example server using multiprocessing
# addserv.py
def add(x,y):
    return x+y

from multiprocessing.connection import Listener
serv = Listener(("",16000),authkey="12345")
c = serv.accept()
while True:
    x,y = c.recv()          # Receive a pair
    c.send(add(x,y))        # Send result of add(x,y)

Note: Omitting a variety of error checking/exception handling

5- 36


Example
Client connection with multiprocessing

>>> from multiprocessing.connection import Client
>>> client = Client(("",16000),authkey="12345")
>>> a = [1,2,3]
>>> b = [4,5]
>>> client.send((a,b))
>>> c = client.recv()
>>> c
[1, 2, 3, 4, 5]
>>>

Even though pickle is being used underneath the covers, you don't see it here

5- 37


Commentary
Multiprocessing module already does the work related to pickling, error handling, etc.

Can use it as the foundation for something more advanced

There are many more features of multiprocessing not shown here (e.g., features related to distributed objects, parallel processing, etc.)
5- 38


Commentary
Multiprocessing is a good choice if you're
working strictly in a Python environment

It will be faster than XML-RPC


It has some security features (authkey)
More flexible support for passing Python
objects around

5- 39


What about...
CORBA? SOAP? Others?
There are third party libraries for this
Honestly, most Python programmers aren't
into big heavyweight distributed object
systems like this (too much trauma)

However, if you're into distributed objects, you should probably look at the Pyro project (http://pyro.sourceforge.net)
5- 40


Network Wrap-up
Have covered the basics of network support
that's bundled with Python (standard lib)

Possible directions from here...


Concurrent programming techniques (often needed for server implementation)

Parallel computing (scientific computing)


Web frameworks
5- 41


Exercise 5.5

Time : 15 Minutes

5- 42


Python Network Programming Index

A
accept() method, of sockets, 1-19, 1-22
Address binding, TCP server, 1-20
Addressing, network, 1-4
Asynchronous network server, 1-52

B
BaseRequestHandler, SocketServer module, 5-5
bind() method, of sockets, 1-19, 1-20, 1-42
Browser, emulating in HTTP requests, 2-21
build_opener() function, urllib2 module, 2-24

C
cElementTree module, 3-22
cgi module, 4-30
CGI scripting, 4-23, 4-24, 4-25, 4-26, 4-27
CGI scripting, and WSGI, 4-48
CGI scripting, creating a response, 4-31, 4-32
CGI scripting, environment variables, 4-28
CGI scripting, I/O model, 4-28
CGI scripting, parsing query variables, 4-30
CGI scripting, query string, 4-26
CGI scripting, query variables, 4-29
CherryPy, 4-54
Client objects, multiprocessing module, 5-34
Client/Server programming, 1-8
close() method, of sockets, 1-16, 1-25
Concurrency, and socket programming, 1-46
connect() method, of sockets, 1-16
Connections, network, 1-7
Content encoding, HTTP responses, 4-9
Cookie handling and HTTP requests, 2-25
Cookies, and urllib2 module, 2-17
CORBA, 5-40
Creating custom openers for HTTP requests, 2-24
csv module, 3-3

D
Datagram, 1-43
Distributed computing, 5-18, 5-19
Django, 4-54
dump() function, pickle module, 5-27
dumps() function, pickle module, 5-28

E
ElementTree module, modifying document structure, 3-23
ElementTree module, performance, 3-22
ElementTree module, xml.etree package, 3-14
ElementTree, attributes, 3-19
ElementTree, incremental XML parsing, 3-25
ElementTree, wildcards, 3-20
ElementTree, writing XML, 3-24
End of file, of sockets, 1-32
environ variable, os module, 4-28
Error handling, HTTP requests, 2-22

F
FieldStorage object, cgi module, 4-30
File upload, via urllib, 2-28
Files, creating from a socket, 1-37
Forking server, 1-51
ForkingMixIn class, SocketServer module, 5-15
ForkingTCPServer, SocketServer module, 5-14
ForkingUDPServer, SocketServer module, 5-14
Form data, posting in an HTTP request, 2-10, 2-11, 2-20
FTP server, interacting with, 2-29
FTP, uploading files to a server, 2-30
ftplib module, 2-29

G
gethostbyaddr() function, socket module, 1-53
gethostbyname() function, socket module, 1-53
gethostname() function, socket module, 1-53
Google AppEngine, 4-54

H
Hostname, 1-4
Hostname, obtaining, 1-53
HTML, parsing of, 3-4, 3-7
HTMLParser module, 3-5, 3-7
HTTP cookies, 2-25
HTTP protocol, 4-5
HTTP request, with cookie handling, 2-25
HTTP status code, obtaining with urllib, 2-14
HTTP, client-side protocol, 2-31
HTTP, methods, 4-8
HTTP, request structure, 4-6
HTTP, response codes, 4-8
HTTP, response content encoding, 4-9
HTTP, response structure, 4-7, 4-10, 4-12
httplib module, 2-31

I
Interprocess communication, 1-44
IP address, 1-4
IPC, 1-44
IPv4 socket, 1-13
IPv6 socket, 1-13

J
JSON, 3-29
json module, 3-31

L
Limitations, of urllib module, 2-28
listen() method, of sockets, 1-19, 1-21
Listener objects, multiprocessing module, 5-34
load() function, pickle module, 5-27
loads() function, pickle module, 5-28

M
makefile() method, of sockets, 1-37
multiprocessing module, 5-33

N
netstat, 1-6
Network addresses, 1-4, 1-7
Network programming, client-server concept, 1-8
Network programming, standard port assignments, 1-5

O
Objects, serialization of, 5-26
Opener objects, urllib2 module, 2-23
OpenSSL, 2-5

P
Parsing HTML, 3-7
Parsing, JSON, 3-29
Parsing, of HTML, 3-5
pickle module, 5-27
POST method, of HTTP requests, 2-6, 2-7
Posting form data, HTTP requests, 2-10, 2-11, 2-20
Pylons, 4-54

Q
Query string, and CGI scripting, 4-26

R
Raw Sockets, 1-45
recv() method, of sockets, 1-16
recvfrom() method, of sockets, 1-42, 1-43
Request objects, urllib2 module, 2-19
Request-response cycle, network programming, 1-9
RFC-2822 headers, 4-6

S
sax module, xml package, 3-11
select module, 1-52
select() function, select module, 1-52
send() method, of sockets, 1-16, 1-24
sendall() method, of sockets, 1-31
Sending email, 2-32
sendto() method, of sockets, 1-42, 1-43
Serialization, of Python objects, 5-26
serve_forever() method, SocketServer, 5-5
setsockopt() method, of sockets, 1-36
settimeout() method, of sockets, 1-34
SimpleXMLRPCServer module, 5-21
simple_server module, wsgiref package, 4-46, 4-47
smtplib module, 2-32
SOAP, 5-40
socket module, 1-13
socket() function, socket module, 1-13
Socket, using for server or client, 1-15
Socket, wrapping with a file object, 1-37
Sockets, 1-12, 1-13
Sockets, and concurrency, 1-46
Sockets, asynchronous server, 1-52
Sockets, end of file indication, 1-32
Sockets, forking server example, 1-51
Sockets, partial reads and writes, 1-29
Sockets, setting a timeout, 1-34
Sockets, setting options, 1-36
Sockets, threaded server, 1-50
SocketServer module, 5-4
SocketServer, subclassing, 5-16
Standard port assignments, 1-5

T
TCP, 1-13, 1-14
TCP, accepting new connections, 1-22
TCP, address binding, 1-20
TCP, client example, 1-16
TCP, communication with client, 1-23
TCP, example with SocketServer module, 5-5
TCP, listening for connections, 1-21
TCP, server example, 1-19
TCPServer, SocketServer module, 5-10
Telnet, using with network applications, 1-10
Threaded network server, 1-50
ThreadingMixIn class, SocketServer module, 5-15
ThreadingTCPServer, SocketServer module, 5-14
ThreadingUDPServer, SocketServer module, 5-14
Threads, and network servers, 1-50
Timeout, on sockets, 1-34
Turbogears, 4-54
Twisted framework, 1-52

U
UDP, 1-13, 1-41
UDP, client example, 1-43
UDP, server example, 1-42
UDPServer, SocketServer module, 5-14
Unix domain sockets, 1-44
Uploading files, to an FTP server, 2-30
URL, parameter encoding, 2-6, 2-7
urlencode() function, urllib module, 2-9
urllib module, 2-3
urllib module, limitations, 2-28
urllib2 module, 2-17
urllib2 module, error handling, 2-22
urllib2 module, Request objects, 2-19
urlopen() function, obtaining response headers, 2-13
urlopen() function, obtaining status code, 2-14
urlopen() function, reading responses, 2-12
urlopen() function, urllib module, 2-4
urlopen() function, urllib2 module, 2-18
urlopen(), posting form data, 2-10, 2-11, 2-20
urlopen(), supported protocols, 2-5
User-agent, setting in HTTP requests, 2-21

V
viewing open network connections, 1-6

W
Web frameworks, 4-54, 4-55
Web programming, and WSGI, 4-35, 4-36
Web programming, CGI scripting, 4-23, 4-24, 4-25, 4-26, 4-27
Web services, 2-8
Webdav, 2-28
WSGI, 4-36
WSGI (Web Server Gateway Interface), 4-35
WSGI, and CGI environment variables, 4-39
WSGI, and wsgi.* variables, 4-40
WSGI, application inputs, 4-38
WSGI, applications, 4-37
WSGI, parsing query string, 4-41
WSGI, producing content, 4-44
WSGI, response encoding, 4-45
WSGI, responses, 4-42
WSGI, running a stand-alone server, 4-46, 4-47
WSGI, running applications within a CGI script, 4-48
WWW, see HTTP, 4-5

X
XML, element attributes, 3-19
XML, element wildcards, 3-20
XML, ElementTree interface, 3-15, 3-16
XML, ElementTree module, 3-14
XML, finding all matching elements, 3-18
XML, finding matching elements, 3-17
XML, incremental parsing of, 3-25
XML, modifying document structure with ElementTree, 3-23
XML, parsing with SAX, 3-9
XML, writing to files, 3-24
XML-RPC, 5-20

Z
Zope, 4-54
