Unit-5_3 PPT on Distributed Web based System.pdf
PPT On Unit-5
Distributed Web-Based Systems
Outline
• WWW
• URL
• Web Documents
• HTTP
– Connections
– Methods
– Messages
– Caching
• Content Distribution Network
• Web Service
– Terminology
• Architecture
– Traditional Web Based Systems
– Multi-tiered Web Based Systems
• Web Server Clusters
• Web Security
– SSL
• References
World Wide Web

It is a very large distributed system with millions of clients and servers for accessing
linked documents.



Servers maintain collections of documents while clients provide users an easy-
to-use interface for presenting and accessing those documents.



A document is fetched from a server, transferred to a client, and presented on
the screen.



For any user, there is conceptually no difference between a document stored locally and
one stored in another part of the world.



The Web has now become more than just a simple document-based system.



With the emergence of Web Services, it is becoming a system of distributed
services rather than just documents offered to any user or machine.

Uniform Resource Locator

A reference called a Uniform Resource Locator (URL) is used to refer to a
document.





A URL specifies the DNS name of the server that holds the document, along
with a file name on that server.



Example:



http://www.example.sharif.edu/notes/WebBasedDistributedSystem.ppt
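
The parts of the example URL can be pulled apart with Python's standard urllib.parse module. This is only a small illustration of the "DNS name plus file name" structure described above; the variable names are chosen for this sketch.

```python
from urllib.parse import urlsplit

url = "http://www.example.sharif.edu/notes/WebBasedDistributedSystem.ppt"
parts = urlsplit(url)

scheme = parts.scheme    # protocol used to fetch the document
server = parts.hostname  # DNS name of the associated server
path = parts.path        # file name (path) of the document on that server

print(scheme, server, path)
```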

WEB DOCUMENTS

A Web document is not limited to text; it can also include

all kinds of dynamic features such as
audio, video, and animations.

In many cases special helper applications (interpreters) are
needed, and they are integrated into the browser.




Most Web documents are written in a markup
language, such as



HyperText Markup Language (HTML) and



eXtensible Markup Language (XML)

WEB DOCUMENTS

HTML and XML documents can include tags that refer to embedded
documents, that is, references to other files.






An embedded document can be a complete program executed on-
the-fly as part of displaying information.






Multipurpose Internet Mail Extensions (MIME) types are used to specify
the type of an embedded document.






MIME was originally developed to provide information on the
content of e-mail messages.
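
As a rough illustration of how a MIME type is attached to a document, the sketch below uses Python's standard mimetypes module to guess a type from a file extension. The helper name mime_type_for and the file names are made up for this example; a real server may use other information as well.

```python
import mimetypes


def mime_type_for(filename):
    """Return the MIME type guessed from the file extension, or a generic default."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"


html_type = mime_type_for("index.html")         # top-level type "text", subtype "html"
jpeg_type = mime_type_for("photo.jpeg")         # top-level type "image", subtype "jpeg"
unknown_type = mime_type_for("data.zzzunknown")  # no known extension: generic binary type
```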

WEB DOCUMENTS
Six top-level Multipurpose Internet Mail Extensions (MIME) types and some common subtypes.
HTTP

All communication between clients and servers is based
on the Hypertext Transfer Protocol (HTTP). Servers listen on port 80 by default.



HTTP is a simple protocol; a client sends a request to a
server and waits for a response.



HTTP is based on TCP; whenever a client issues a
request to a server, it first sets up a TCP connection and
sends the message on that connection. The same
connection is used for receiving the response.
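
The request/response exchange over a single TCP connection can be sketched with Python's standard library. To stay self-contained, the example starts a tiny local server on the loopback interface instead of contacting a real site on port 80; HelloHandler and fetch are names invented for this sketch.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Minimal server side: answer every GET with a short text body."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the example quiet
        pass


def fetch(host, port, path="/"):
    """Set up a TCP connection, send one request, read the reply on the same connection."""
    with socket.create_connection((host, port), timeout=5) as conn:
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        conn.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server closed the connection: response is complete
                break
            chunks.append(data)
    return b"".join(chunks)


server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
response = fetch("127.0.0.1", server.server_address[1])
server.shutdown()
server.server_close()

status_line = response.split(b"\r\n", 1)[0]
print(status_line.decode("ascii"))
```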



One of the problems with the first versions of HTTP was its
inefficient use of TCP connections.


HTTP 1.0 vs. HTTP 1.1

HTTP CONNECTIONS

A Web document is often constructed from a collection of different files,
typically from the same server.



In HTTP version 1.0 and older, each request to a server required
setting up a separate connection. Once the server had responded, the
connection was torn down. These connections are referred to as
non-persistent.



In HTTP version 1.1, several requests and their responses can be
exchanged over the same connection, without setting up a separate
connection for each. These connections are referred to as persistent.



Furthermore, a client can issue several requests in a row without
waiting for the response to the first request. This is referred to as
pipelining.
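
A persistent connection can be observed with a small loopback experiment, sketched below with Python's standard library. The server records the client port of every request; if all requests share one TCP connection, only one distinct port shows up. CountingHandler and the file paths are invented for this sketch, and it does not demonstrate pipelining, since http.client waits for each response before sending the next request.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep the connection open
    client_ports = []              # source port of the connection carrying each request

    def do_GET(self):
        CountingHandler.client_ports.append(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the example quiet
        pass


server = HTTPServer(("127.0.0.1", 0), CountingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# One HTTPConnection object corresponds to one TCP connection that is reused.
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
for path in ("/a.html", "/b.png", "/c.css"):
    conn.request("GET", path)
    conn.getresponse().read()  # drain the body before reusing the connection
conn.close()
server.shutdown()
server.server_close()

distinct_connections = len(set(CountingHandler.client_ports))
print(len(CountingHandler.client_ports), "requests over", distinct_connections, "connection(s)")
```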

HTTP CONNECTIONS
(a) Using non-persistent connections. (b) Using persistent connections.
HTTP Operations
HTTP MESSAGES (Request)
HTTP MESSAGES (Response)
Status code (Phrase): 200 (OK), 400 (Bad Request),
403 (Forbidden), and 404 (Not Found).
HTTP MESSAGES (Response)

There are also various message headers that the client
can send to the server to explain what it is able to
accept as a response.

HTTP MESSAGES (Response)
HTTP Caching
• Clients often cache documents
– Challenge: update of documents
– If-Modified-Since requests to check
• HTTP 1.0 used just the date
• HTTP 1.1 has an opaque “entity tag” (could be a file signature, etc.) as
well
• When/how often should the original be checked for changes?
– Check every time?
– Check each session? Day? Etc?
– Use “Expires” header
• If no Expires, often use Last-Modified as estimate
Example Cache Check Request
GET / HTTP/1.1
Accept: */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
If-Modified-Since: Mon, 29 Jan 2001 17:54:18 GMT
If-None-Match: "7a11f-10ed-3a75ae4a"
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
Host: www.intel-iris.net
Connection: Keep-Alive
Example Cache Check Response
HTTP/1.1 304 Not Modified
Date: Tue, 27 Mar 2001 03:50:51 GMT
Server: Apache/1.3.14 (Unix) (Red-Hat/Linux) mod_ssl/2.7.1 OpenSSL/0.9.5a DAV/1.0.2 PHP/4.0.1pl2 mod_perl/1.24
Connection: Keep-Alive
Keep-Alive: timeout=15, max=100
ETag: "7a11f-10ed-3a75ae4a"
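
The decision behind the 304 response above can be sketched as a small function. This is a simplified illustration, not how any particular server implements it: cache_check_status is a hypothetical helper, entity tags are compared as plain strings, and the entity tag is given precedence over the date when both headers are present, as HTTP/1.1 specifies.

```python
from email.utils import parsedate_to_datetime


def cache_check_status(request_headers, current_etag, last_modified):
    """Return 304 if the client's cached copy is still valid, otherwise 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # The entity tag takes precedence over the date when both are sent.
        client_tags = [tag.strip() for tag in if_none_match.split(",")]
        if current_etag in client_tags or "*" in client_tags:
            return 304
        return 200
    since = request_headers.get("If-Modified-Since")
    if since is not None:
        # Unchanged since the date the client remembers: the cached copy is fine.
        if last_modified <= parsedate_to_datetime(since):
            return 304
        return 200
    return 200  # unconditional request: send the full document


request = {
    "If-Modified-Since": "Mon, 29 Jan 2001 17:54:18 GMT",
    "If-None-Match": '"7a11f-10ed-3a75ae4a"',
}
unchanged_on = parsedate_to_datetime("Mon, 29 Jan 2001 17:54:18 GMT")
edited_on = parsedate_to_datetime("Tue, 30 Jan 2001 09:00:00 GMT")

status_same = cache_check_status(request, '"7a11f-10ed-3a75ae4a"', unchanged_on)
status_new_tag = cache_check_status(request, '"changed-tag"', edited_on)
date_only = {"If-Modified-Since": "Mon, 29 Jan 2001 17:54:18 GMT"}
status_date_only = cache_check_status(date_only, '"changed-tag"', edited_on)
```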
Problems
• Over 50% of all HTTP objects are uncacheable.
• Not easily solvable
– Dynamic data: stock prices, scores, web cams
– CGI scripts: results based on passed parameters
– SSL: encrypted data is not cacheable
• Most web clients don’t handle mixed pages well: many
generic objects are transferred with SSL
– Cookies: results may be based on passed data
– Hit metering: owner wants to measure the number of hits for
revenue, etc.
Server Selection
• Lowest load:
– to balance load on servers
• Best performance:
– to improve client performance
• Any alive node:
– to provide fault tolerance
• How to direct clients to a specific server?
– Cluster load balancing: TCP hand-off
– As part of application: HTTP redirect
– As part of naming: DNS
Application-Based Redirection
• HTTP supports a simple way to indicate that a Web page
has moved (3xx responses)
• Server receives a GET request from the client
– Decides which server is best suited for the particular client and
object
– Returns an HTTP redirect to that server
• May introduce additional overhead:
– multiple connection setups, name lookups, etc.
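
Application-based redirection can be sketched as a function that builds a 302 response pointing at the chosen replica. The selection rule here (a lookup by client region) and the host names under example.com are assumptions made for illustration; a real system could decide on load, location, or content.

```python
def redirect_response(path, client_region, replicas, default_host):
    """Build a 302 response that sends the client to the replica chosen for it."""
    host = replicas.get(client_region, default_host)
    return (
        "HTTP/1.1 302 Found\r\n"
        f"Location: http://{host}{path}\r\n"
        "Content-Length: 0\r\n"
        "\r\n"
    )


# Hypothetical replica map: region name -> server host name.
replicas = {"eu": "eu.example.com", "asia": "asia.example.com"}

eu_reply = redirect_response("/index.html", "eu", replicas, "www.example.com")
other_reply = redirect_response("/index.html", "africa", replicas, "www.example.com")
print(eu_reply)
```

The client must then open a new connection to the host in the Location header, which is the extra connection setup and name lookup mentioned in the last bullet.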
Naming Based
• Client does a name lookup for the service
• Name server chooses an appropriate server address
– The A record returned is the “best” one for the client
• Name server could base its decision on
– Server load/location (must be collected)
– Information in the name lookup request
• Name service client:
– typically the local name server for the client
Web Proxy Caches
[Figure: clients, a proxy server, and origin servers.]
• User configures browser: Web accesses via cache
• Browser sends all HTTP requests to cache
– Object in cache: cache returns object
– Else cache requests object from origin server, then returns object to
client
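
The two cases in the bullets above (object in cache, or fetch from the origin server) can be sketched as follows. ProxyCache and fake_origin are names invented for this sketch, and it deliberately ignores expiry and validation, which a real proxy cache has to handle.

```python
class ProxyCache:
    """Toy proxy cache: serve from the local store, else fetch from the origin."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # callable: url -> bytes
        self._store = {}
        self.origin_requests = 0

    def get(self, url):
        if url in self._store:           # object in cache: return it directly
            return self._store[url]
        self.origin_requests += 1        # else request the object from the origin server,
        body = self._fetch(url)
        self._store[url] = body          # keep a copy,
        return body                      # then return the object to the client


def fake_origin(url):
    """Stand-in for a real origin server (assumption for this sketch)."""
    return f"content of {url}".encode("utf-8")


cache = ProxyCache(fake_origin)
first = cache.get("http://www.example.com/index.html")
second = cache.get("http://www.example.com/index.html")
print(cache.origin_requests)
```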
Content Distribution Networks
(CDNs)
• The content providers are the CDN customers.
Content replication
• CDN company installs hundreds of CDN servers throughout the Internet
– Close to users
• CDN replicates its customers’ content in CDN servers. When the provider updates content, the CDN updates its servers.
[Figure: origin server in North America, CDN distribution node, and CDN servers in the U.S.A., Asia, and Europe.]
Content Distribution Networks
• Replicate content on many servers
The general organization of a CDN as a feedback-control system
Web Service
• Web Service:
– “software that makes services available on a network using technologies
such as XML and HTTP”
• Service-Oriented Architecture (SOA):
– “development of applications from distributed collections of smaller
loosely coupled service providers”
Web Services Terminology
• SOAP
– Simple Object Access Protocol
– exchanging XML messages on a network
• WSDL
– Web Services Description Language
– describing interfaces of Web services
• UDDI
– Universal Description, Discovery and Integration
– managing registries of Web services
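
To make "exchanging XML messages" concrete, the sketch below builds a minimal SOAP 1.1 envelope with Python's standard xml.etree module. The operation name GetQuote, the service namespace http://example.com/stock, and the symbol parameter are all hypothetical; in practice they would come from the service's WSDL description.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def build_soap_request(operation, service_ns, params):
    """Wrap an operation call and its parameters in a SOAP Envelope/Body."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{service_ns}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{service_ns}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


message = build_soap_request("GetQuote", "http://example.com/stock", {"symbol": "ACME"})
print(message)
```

Such a message would typically be sent as the body of an HTTP POST to the service endpoint.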
Web Services Framework
Why a New Framework?
• CORBA, DCOM, Java/RMI, ... already exist
• XML+HTTP: platform/language neutral, widely
accepted and utilized
Web service interoperability
Servlets/CGI vs. Web Services
[Figure: two setups side by side. Labels in the servlets/CGI setup: Browser, HTTP GET/POST, Web Server, JDBC, DB. Labels in the Web services setup: Browser, GUI Client, SOAP, WSDL, Web Server, JDBC, DB.]
TRADITIONAL WEB-BASED SYSTEMS

Many Web-based systems are still organized as simple client-
server architectures.



The core of a Web site: a process that has access to a local file
system storing documents.



A client interacts with Web servers through a special application
known as a browser.



What’s the key function of a browser?



Responsible for displaying documents.

TRADITIONAL WEB-BASED SYSTEMS
MULTITIERED ARCHITECTURES

Web documents can be built in two ways:

Static

The server locates and returns the object identified in the request.

This includes predefined HTML pages and JPEG or GIF files.

The Web server does not need to communicate with any server-side
application.

Dynamic

The request is forwarded to an application system where the resulting
reply is generated dynamically. (server-side program execution)

Although the Web started as a simple two-tiered client-server architecture
for static Web documents, this architecture has been extended to
support more advanced types of documents.
MULTITIERED ARCHITECTURES

One of the first enhancements was the Common Gateway Interface
(CGI), which lets a Web server run a program with user data as input.
The user data comes from an HTML form, which specifies the
program to run and its parameters.
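
A CGI program receives the form parameters through environment variables such as QUERY_STRING and writes headers, a blank line, and the document to its standard output. The sketch below imitates that contract with a plain function so it can run without a Web server; run_cgi_program and the name parameter are invented for this example.

```python
from html import escape
from urllib.parse import parse_qs


def run_cgi_program(environ):
    """Produce CGI-style output (header, blank line, body) from CGI-style variables."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["world"])[0]
    # Escape user input before placing it in the generated HTML.
    body = f"<html><body><p>Hello, {escape(name)}!</p></body></html>"
    return "Content-Type: text/html\r\n\r\n" + body


# The server would normally fill these in from the submitted HTML form.
output = run_cgi_program({"REQUEST_METHOD": "GET", "QUERY_STRING": "name=Alice&lang=en"})
print(output)
```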

MULTITIERED ARCHITECTURES

Because of this server-side processing, many Web sites are now
organized as three-tiered architectures consisting of a Web server,
an application server, and a database server.






Server-side scripting technologies are used to generate dynamic
content:



Microsoft: Active Server Pages (ASP.NET)



Sun: JavaServer Pages (JSP)



Netscape: JavaScript



Open source (The PHP Group): PHP



Most popular Web server software

– Apache. As of March 2007, 58% of all websites were using
it.
WEB SERVER CLUSTERS
• Web servers are replicated and combined with a front end
to improve performance.
WEB SERVER CLUSTERS

The front end can be designed in two ways:



Transport-layer switch



simply passes data sent along the TCP connection to one of the servers,
depending on some measurement of each server’s load.



Content-aware request distribution



it first inspects the HTTP request and decides which server it should
forward that request to.



For example, if the front end always forwards requests for the same
document to the same server, the server may cache the document
resulting in better response times.
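
The two front-end designs differ in what they look at when choosing a server, which the sketch below tries to show. Both functions and the server names are hypothetical; hashing the request path is just one simple way to make the same document always map to the same server.

```python
import hashlib


def pick_by_load(server_loads):
    """Transport-layer switch: choose the least-loaded server, ignoring the request."""
    return min(server_loads, key=server_loads.get)


def pick_by_content(path, servers):
    """Content-aware distribution: the same document always goes to the same server."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]


servers = ["s1", "s2", "s3"]
least_loaded = pick_by_load({"s1": 0.7, "s2": 0.2, "s3": 0.5})
first_choice = pick_by_content("/notes/lecture1.html", servers)
second_choice = pick_by_content("/notes/lecture1.html", servers)
print(least_loaded, first_choice, second_choice)
```

Because repeated requests for one path land on one server, that server's cache is more likely to already hold the document.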

WEB SERVER CLUSTERS
A scalable content-aware cluster of Web servers.
WEB SERVER CLUSTERS

An alternative way to set up a Web server cluster is to use
round-robin DNS:



a single domain name is associated with multiple IP addresses.




When resolving a host name, a browser receives a list of multiple
addresses, each address corresponding to a server.




Normally, browsers choose the first address on the list, but most DNS
servers rotate the order of the entries between responses.




As a result, simple distribution of requests over the servers in the
cluster is achieved.
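
Round-robin DNS can be imitated with a rotating list, as in the sketch below. RoundRobinZone is an invented class, and the addresses come from the 192.0.2.0/24 range reserved for documentation; a real DNS server also deals with caching and TTLs, which this ignores.

```python
from collections import deque


class RoundRobinZone:
    """One domain name associated with several addresses, rotated on every lookup."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def resolve(self):
        answer = list(self._addresses)
        self._addresses.rotate(-1)  # the next lookup starts with the next address
        return answer


zone = RoundRobinZone(["192.0.2.1", "192.0.2.2", "192.0.2.3"])

# Each "browser" takes the first address of the list it receives.
chosen = [zone.resolve()[0] for _ in range(4)]
print(chosen)
```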

Web Security Issues
• The Web has become the visible interface of the Internet

Many corporations now use the Web for advertising, marketing and sales


• Web servers might be easy to use but

Complicated to configure correctly and difficult to build without security flaws



They can serve as a security hole by which an adversary might be able to access other data and computer
systems

                 Threats                Consequences                Countermeasures
Integrity        Modification of Data   Loss of Information         MACs and Hashes
                 Trojan horses          Compromise of Machine
Confidentiality  Eavesdropping          Loss of Information         Encryption
                 Theft of Information   Privacy Breach
DoS              Stopping               Stopped Transactions
                 Filling up Disks and
                 Resources
Authentication   Impersonation          Misrepresentation of User   Signatures, MACs
                 Data Forgery           Accept false Data
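
As an illustration of the "MACs and Hashes" countermeasure against modification of data, the sketch below computes and checks a message authentication code with Python's standard hmac module. The key and messages are made up, and HMAC-SHA256 is just one common choice of MAC.

```python
import hashlib
import hmac


def make_tag(key, message):
    """Compute a MAC over the message with a key shared by sender and receiver."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key, message, tag):
    """Recompute the MAC and compare in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


key = b"shared-secret-key"  # hypothetical key, agreed on out of band
message = b"transfer 100 to account 42"
tag = make_tag(key, message)

intact = verify(key, message, tag)
tampered = verify(key, b"transfer 900 to account 42", tag)
print(intact, tampered)
```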
Secure the Web
• There are several strategies for securing the Web
1. We may attempt to secure the IP Layer of the TCP/IP Stack: This may
be accomplished using IPSec, for example.
2. We may leave IP alone and secure on top of TCP: This may be
accomplished using the Secure Sockets Layer (SSL) or Transport
Layer Security (TLS)
3. We may seek to secure specific applications by using application-
specific security solutions: For example, we may use Secure
Electronic Transaction (SET)
• The first two provide generic solutions, while the third
provides for more specialized services
Securing the TCP/IP Stack
At the Network Level (layers from top to bottom):
HTTP, FTP, SMTP
TCP
IP/IPSec
At the Transport Level (layers from top to bottom):
HTTP, FTP, SMTP
SSL/TLS
TCP
IP
At the Application Level (layers from top to bottom):
S/MIME, PGP, SET
Kerberos, SMTP, HTTP
UDP, TCP
IP
Secure Sockets Layer (SSL)
• Originally developed (1994) by Netscape in order to secure HTTP
communications
• A slight variation became Transport Layer Security (TLS)
– Backward compatible with SSL
• Runs on top of TCP, which provides a reliable end-to-end service
• Consists of two sublayers:
– SSL Record Protocol (where all the action takes place)
– SSL Management (Handshake / Change Cipher Spec / Alert Protocols)
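
The layering (a TCP connection first, then the secure layer on top of it) shows up directly in client code. The sketch below uses Python's standard ssl module, which negotiates TLS, the successor to SSL. open_tls_connection is a helper written for this example and is only defined here, not called, since calling it needs network access.

```python
import socket
import ssl

# Default client settings: verify the server certificate and check the host name.
context = ssl.create_default_context()


def open_tls_connection(host, port=443):
    """Set up the TCP connection, then run the TLS handshake on top of it."""
    raw = socket.create_connection((host, port), timeout=5)
    try:
        # wrap_socket performs the handshake; application data sent afterwards
        # is protected by the record layer.
        return context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
```

An application would then send its HTTP request over the returned socket exactly as it would over a plain TCP socket.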
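[Figure: SSL sits between the application and TCP; layers from top to bottom: Application, SSL, TCP, IP.]
Protocol Structure
[Figure: the Handshake, Change Cipher Spec, Alert, and Application Data protocols run on top of the Record Layer, which runs on top of TCP.]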
