Distributed Systems
Principles and Paradigms
Chapter 12
(version October 15, 2007)
Maarten van Steen
Vrije Universiteit Amsterdam, Faculty of Science
Dept. Mathematics and Computer Science
Room R4.20. Tel: (020) 598 7784
E-mail:steen@cs.vu.nl, URL: www.cs.vu.nl/∼steen/
01 Introduction
02 Architectures
03 Processes
04 Communication
05 Naming
06 Synchronization
07 Consistency and Replication
08 Fault Tolerance
09 Security
10 Distributed Object-Based Systems
11 Distributed File Systems
12 Distributed Web-Based Systems
13 Distributed Coordination-Based Systems
00 – 1 /
Distributed Web-Based Systems
Essence: The WWW is a huge client-server system
with millions of servers; each server hosting thousands
of hyperlinked documents:
2. Server fetches
Client machine Server machine document from
local file
Browser Web server
OS
3. Response
1. Get document request (HTTP)
• Documents are generally represented in text (plain
text, HTML, XML)
• Alternative types: images, audio, video, but also
applications (PDF, PS)
• Documents may contain scripts that are executed
by the client-side software
12 – 1 Distributed Web-Based Systems/12.1 Architecture
Multi-tiered Architectures
Observation: Already very soon, Web sites were or-
ganized into three tiers:
3. Start process to fetch document
1. Get request 4. Database interaction
HTTP CGI
request program
6. Return result handler
5. HTML document
created
Web server CGI process Database server
12 – 2 Distributed Web-Based Systems/12.1 Architecture
Web Services
Observation: At a certain point, people started rec-
ognizing that it is was more than just user ↔ site in-
teraction: sites could offer services to other sites ⇒
standardization is then badly needed.
Client machine Server machine
Look up
a service Client Server Publish service
application application
Stub Stub
Communication SOAP Communication
subsystem subsystem
Generate stub Generate stub
from WSDL from WSDL
description description
Servicedescription
Service description(WSDL)
(WSDL)
Service description (WSDL)
Directory service (UDDI)
12 – 3 Distributed Web-Based Systems/12.1 Architecture
Clients: Web browsers
Observation: browsers form the Web’s most impor-
tant client-side sofware. They used to be simple, but
that is long ago.
Display back end
User interface
Browser engine
Rendering engine
Client-side
Network script HTML/XML
comm. interpreter parser
12 – 4 Distributed Web-Based Systems/12.2 Processes
Apache Web Server
Observation: More than 70% of all Web sites are
based on Apache. The server is internally organized
more or less according to the steps needed to process
an HTTP request:
Module Module Function Module
... ... ...
Link between
function and hook
Hook Hook Hook Hook
Apache core
Functions called per hook
Request Response
12 – 5 Distributed Web-Based Systems/12.2 Processes
Server Clusters (1/2)
Essence: To improve performance and availability,
WWW servers are often clustered in a way that is
transparent to clients:
Web Web Web Web
server server server server
LAN
Front end handles
Front all incoming requests
end and outgoing responses
Request Response
Problem: The front end may easily get overloaded,
so that special measures need to be taken.
Transport-layer switching: Front end simply passes
the TCP request to one of the servers, taking some
performance metric into account.
Content-aware distribution: Front end reads the con-
tent of the HTTP request and then selects the
best server.
12 – 6 Distributed Web-Based Systems/12.2 Processes
Server Clusters (2/2)
Question: Why can content-aware distribution be so
much better?
6. Server responses
Web
5. Forward server 3. Hand of
other f
TCP connection
messages Distributor
Other messages
Dis-
Client Switch 4. Inform patcher
Setup request switch
1. Pass setup request Distributor 2. Dispatcher selects
to a distributor server
Web
server
12 – 7 Distributed Web-Based Systems/12.2 Processes
Communication (1/2)
Essence: Communication in the Web is generally based
on HTTP; a relatively simple client-server transfer pro-
tocol having the following request messages:
Operation
Description
Head Request to return the header of a document
Get Request to return a document to the client
Put Request to store a document
Post Provide data that are to be added to a docu-
ment (collection)
Delete Request to delete a document
12 – 8 Distributed Web-Based Systems/12.3 Communication
Communication (2/2)
C/S
Header Contents
Accept C The type of documents the client can handle
Accept-Charset C The character sets are acceptable for the client
Accept- C The document encodings the client can handle
Encoding
Accept- C The natural language the client can handle
Language
Authorization C A list of the client’s credentials
WWW- S Security challenge the client should respond to
Authenticate
Date C+S Date and time the message was sent
ETag S The tags associated with the returned document
Expires S The time for how long the response remains valid
From C The client’s e-mail address
Host C The TCP address of the document’s server
If-Match C The tags the document should have
If-None-Match C The tags the document should not have
If-Modified- C Tells the server to return a document only if it has
Since been modified since the specified time
If-Unmodified- C Tells the server to return a document only if it has
Since not been modified since the specified time
Last-Modified S The time the returned document was last modified
Location S A document reference to which the client should
redirect its request
Referer C Refers to client’s most recently requested document
Upgrade C+S The application protocol sender wants to switch to
Warning C+S Information about status of the data in the message
12 – 9 Distributed Web-Based Systems/12.3 Communication
SOAP
Simple Object Access Protocol: Based on XML,
this is the standard protocol for communication be-
tween Web services.
• SOAP is bound to an underlying protocol (i.e., it
is not independent from its carrier)
• Conversational exchange style: Send a docu-
ment one way, get a filled-in response back.
• RPC-style exchange: Used to invoke a Web ser-
vice.
12 – 10 Distributed Web-Based Systems/12.3 Communication
A Note on XML
Observation: XML has the advantage of allowing self-
describing documents. Full stop (i.e., it introduces
performance problems and is not meant to be read
by human beings)
env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
<env:Header>
<n:alertcontrol xmlns:n="http://example.org/alertcontrol">
<n:priority>1</n:priority>
<n:expires>2001-06-22T14:00:00-05:00</n:expires>
</n:alertcontrol>
</env:Header>
<env:Body>
<m:alert xmlns:m="http://example.org/alert">
<m:msg>Pick up Mary at school at 2pm</m:msg>
</m:alert>
</env:Body>
</env:Envelope>
12 – 11 Distributed Web-Based Systems/12.3 Communication
Naming: URL
URL: Uniform Resource Locator tells how and where
to access a resource.
Scheme Host name Pathname
http :// www.cs.vu.nl /home/steen/mbox
(a)
Scheme Host name Port Pathname
http :// www.cs.vu.nl : 80 /home/steen/mbox
(b)
Scheme Host name Port Pathname
http :// 130.37.24.11 : 80 /home/steen/mbox
(c)
Examples:
http HTTP http://www.cs.vu.nl:80/globe
mailto Mail mailto:steen@cs.vu.nl
ftp FTP ftp://ftp.cs.vu.nl/pub/minix/README
file Local file file:/edu/book/work/chp/11/11
data Inline data data:text/plain;charset=iso-8859-7,
%e1%e2%e3
telnet Remote login telnet://flits.cs.vu.nl
tel Telephone tel:+31201234567
modem Modem modem:+31201234567;type=v32
12 – 12 Distributed Web-Based Systems/12.4
Synchronization: WebDAV
Problem: There is a growing need for collaborative
auditing of Web documents, but bare-bones HTTP can’t
help here. Solution: Web Distributed Authoring and
Versioning.
• Supports exclusive and shared write locks, which
operate on entire documents
• A lock is passed by means of a lock token; the
server registers the client(s) holding the lock
• Clients modify the document locally and post it
back to the server along with the lock token
Note: There is no specific support for crashed clients
holding a lock.
12 – 13 Distributed Web-Based Systems/12.5 Synchronization
Web Proxy Caching
Basic idea: Sites install a separate proxy server that
handles all outgoing requests. Proxies subsequently
cache incoming documents. Cache-consistency pro-
tocols:
• Always verify validity by contacting server
• Age-based consistency:
Texpire = α · ( Tcached − Tlast modi f ied ) + Tcached
• Cooperative caching, by which you first check your
neighbors on a cache miss:
Web
server
3. Forward request
to Web server
1. Look in
local cache
Web 2. Ask neighboring proxy caches Web
Cache proxy proxy Cache
Client Client Client Client Client Client
Web
HTTP Get request proxy Cache
Client Client Client
12 – 14 Distributed Web-Based Systems/12.6 Consistency and Replication
Replication in Web Hosting
Systems
Observation: By-and-large, Web hosting systems are
adopting replication to increase performance. Much
research is done to improve their organization. Fol-
lows the lines of self-managing systems:
Uncontrollable parameters (disturbance / noise)
Initial configuration Corrections Observed output
Web hosting system
+/- +/- +/-
Reference input
Replica Consistency Request Metric
placement enforcement routing estimation
Analysis
Adjustment triggers Measured output
12 – 15 Distributed Web-Based Systems/12.6 Consistency and Replication
Handling Flash Crowds
Observation: We need dynamic adjustment to bal-
ance resource usage. Flash crowds introduce a se-
rious problem:
2 days 2 days
(a) (b)
6 days 2.5 days
(c) (d)
12 – 16 Distributed Web-Based Systems/12.6 Consistency and Replication
Server Replication
Content Delivery Network: CDNs act as Web host-
ing services to replicate documents across the Inter-
net providing their customers guarantees on high avail-
ability and performance (example: Akamai).
6. Get embedded documents
CDN (if not already cached)
Cache server
5. Get embedded
documents
Return IP address 7. Embedded documents
client-best server
1. Get base document
CDN DNS 4 Origin
Client
server server
2. Document with refs
DNS lookups 3 to embedded documents
Regular
DNS system
Question: How would consistency be maintained in
this system?
12 – 17 Distributed Web-Based Systems/12.6 Consistency and Replication
Replication of Web Apps. (1/3)
Observation: Replication becomes more difficult when
dealing with databses and such. No single best solu-
tion.
Edge-server side Origin-server side
Client query
Server Server
response
Content-blind Database
cache copy
full/partial data replication
Content-aware Authoritative
full schema replication/
cache database
Schema query templates Schema
Assumption: Updates are carried out at origin server,
and propagated to edge servers.
12 – 18 Distributed Web-Based Systems/12.6 Consistency and Replication
Replication of Web Apps. (2/3)
Edge-server side Origin-server side
Client query
Server Server
response
Content-blind Database
cache copy
full/partial data replication
Content-aware full schema replication/ Authoritative
cache database
Schema query templates Schema
• Full replication: high read/write ratio, often in
combination with complex queries. Note: replica-
tion may possibly speed-down performance when
R/W ratio goes down.
• Partial replication: high read/write ratio, but in
combination with simple queries
12 – 19 Distributed Web-Based Systems/12.6 Consistency and Replication
Replication of Web Apps. (3/3)
Edge-server side Origin-server side
Client query
Server Server
response
Content-blind Database
cache copy
full/partial data replication
Content-aware full schema replication/ Authoritative
cache query templates database
Schema Schema
• Content-aware caching: Check for queries at lo-
cal database, and subscribe for invalidations at
the server. Works good with range queries and
complex queries.
• Content-blind caching: Simply cache the result
of previous queries. Works great with simple queries
that address unique results (e.g., no range queries).
12 – 20 Distributed Web-Based Systems/12.6 Consistency and Replication
Security: TLS (SSL)
Transport Layer Security: Modern version of the
the Secure Socket Layer (SSL), which “sits” between
transport layer and application protocols. Relatively
simple protocol that can support mutual authentica-
tion using certificates:
1
Possibilities
2
Choices
Server
Client
3
[ K+S ] CA
4
[ K+C ] CA
5
K +S ([ R ] C)
12 – 21 Distributed Web-Based Systems/12.6 Consistency and Replication