HTTP: Hypertext Transfer Protocol
• The World Wide Web (WWW), also called the Web, is an information space where documents and other web
  resources are identified by Uniform Resource Locators (URLs), interlinked by hypertext links, and accessible via
  the Internet.
• The original goal of the Web is to organize and retrieve information, drawing on ideas about hypertext:
  interlinked documents.
• The Web can be viewed as a set of cooperating clients and servers, all of which speak the same language: HTTP.
     ARCHITECTURE
• The WWW today is a distributed client-server service, in which
  a client using a browser can access a service using a server.
• However, the service provided is
  distributed over many locations called
  sites.
• Each site holds one or more documents,
  referred to as Web pages.
• Each Web page, however, can contain
  some links to other Web pages in the
  same or other sites.
• A Web page can be simple or composite.
  A simple Web page has no link to other
  Web pages; a composite Web page has
  one or more links to other Web pages.
• Each Web page is a file with a name and
  address.
WEBSITE
• A website[1] is a collection of related web pages, including
  multimedia content, typically identified with a common domain
  name, and published on at least one web server.
• Examples are wikipedia.org, google.com, and amazon.com.
• In short, a website is a collection of webpages.
WEB PAGES
        hypertext
• The core idea of hypertext is that one document can link to
  another document, and the protocol (HTTP) and document
  language (HTML) were designed to meet that goal.
• A system of interlinked documents is known as hypertext.
  Web Client (Browser)
• Most people are exposed to the Web
  through a graphical client program or
  web browser like Safari, Chrome,
  Firefox, or Internet Explorer.
• A web browser has a function that
  allows the user to obtain an object by
  opening a URL.
Web Server
• The Web page is stored at the server. Each time a client request
  arrives, the corresponding document is sent to the client.
• To improve efficiency, servers normally store requested files in a
  cache in memory; memory is faster to access than disk.
• Some popular Web servers include Apache and Microsoft Internet
  Information Server.
      Uniform Resource Locators (URLs)
• Uniform Resource Locators (URLs) provide information that
  allows objects on the Web to be located.
• E.g.: http://www.cs.princeton.edu/index.html
• Every Web page has an address so that browsers, and you, can
  find it. Every Web page has a URL.
• Examples:
  http://airline.travel:80/index.phtml   (scheme :// host : port / path)
  https://www.firstpost.com/india/anna-university-results-2018-april-may-re-evaluation-results-released-on-aucoe-annauniv-edu-5085411.html
• The first item in a URL, the letters that appear before the
  colon, is the scheme, which describes the way a browser can
  get to the resource.
• Following the colon are two slashes (always forward slashes,
  never backslashes) and the name of the host computer on which
  the resource lives; in this case, airline.travel.
• Then comes another slash and a path, which gives the name
  of the resource on that host; in this case, a file named
  index.phtml.
• Web URLs allow a few other optional parts. They can include
  a port number, which specifies which of several programs
  running on that host should handle the request. The port
  number goes after a colon after the host name, e.g.:
  http://airline.travel:80/index.phtml
• The standard HTTP port number is 80 (HTTPS uses 443).
• Finally, a Web URL can have a query part at the end,
  following a question mark, e.g.:
  http://airline.travel:80/index.phtml?chickens
• When a URL has a query part, it tells the host computer more
  specifically what you want the page to display.
• When you type a URL into your Web browser, you can leave out
  the http:// part because the browser adds it for you.
• Another useful URL scheme is mailto. A mailto URL looks like
  this: mailto:internet12@gurus.org
• That is, a mailto link is an e-mail address.
• Clicking a mailto URL runs your e-mail program and creates a
  new message addressed to the address in the link.
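The URL parts described above can be pulled apart with Python's standard `urllib.parse.urlsplit`. This is a minimal sketch: the helper name `describe_url` and the `DEFAULT_PORTS` table are illustrative assumptions, not part of any standard API. It fills in the scheme's standard port when the URL omits one.

```python
from urllib.parse import urlsplit

# Illustrative table (an assumption): standard ports used only when the URL names none.
DEFAULT_PORTS = {"http": 80, "https": 443}


def describe_url(url):
    """Split a URL into scheme, host, port, path and query."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return {
        "scheme": parts.scheme,   # how the browser gets to the resource
        "host": parts.hostname,   # host computer the resource lives on (None for mailto:)
        "port": port,             # explicit port, else the scheme's standard port
        "path": parts.path,       # name of the resource on that host
        "query": parts.query,     # text after the '?', without the '?'
    }


print(describe_url("http://airline.travel:80/index.phtml?chickens"))
# {'scheme': 'http', 'host': 'airline.travel', 'port': 80, 'path': '/index.phtml', 'query': 'chickens'}
print(describe_url("mailto:internet12@gurus.org")["path"])
# internet12@gurus.org
```

A mailto URL has no host part, so `urlsplit` puts the whole address into the path.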
     Types of URL
• A URL specifies the location of a target (file, directory,
  HTML page, image, program, and so on) stored on a local or
  networked computer.
• An absolute URL contains all the information necessary to
  locate a resource.
• An absolute URL uses the following format:
  scheme://server/path/resource
• E.g.: http://www.cs.princeton.edu/index.html
• A relative URL typically consists only of the path and,
  optionally, the resource, but no scheme or server, because
  it assumes the files are located in a folder or on a server
  that’s relative to the originating document.
• E.g.: index.html
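A browser turns a relative URL into an absolute one by resolving it against the URL of the document that contains it. The sketch below uses the standard `urllib.parse.urljoin` for that; the helper names and the linked files `courses.html` and `faculty.html` are hypothetical examples, and `is_absolute` checks only for the scheme://server form described above.

```python
from urllib.parse import urljoin, urlsplit


def is_absolute(url):
    """True for the scheme://server/path form; relative URLs lack scheme and server."""
    parts = urlsplit(url)
    return bool(parts.scheme and parts.netloc)


def resolve(base_url, link):
    """Turn a link found in the document at base_url into an absolute URL."""
    return urljoin(base_url, link)


base = "http://www.cs.princeton.edu/index.html"
print(is_absolute(base), is_absolute("index.html"))  # True False
print(resolve(base, "courses.html"))  # http://www.cs.princeton.edu/courses.html
```

The relative link replaces only the last path segment of the base document, so a link inside `/people/index.html` resolves within `/people/`.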
• If you opened this URL, your web browser would open a TCP
  connection to the web server at a machine called
  www.cs.princeton.edu and immediately retrieve and display the
  file called index.html.
• Most files on the Web contain images and text, and many have
  other objects such as audio and video clips, pieces of code, etc.
• They also frequently include URLs that point to other files that
  may be located on other machines, which is the core of the
  “hypertext” part of HTTP and HTML.
• A web browser has some way in which you can recognize URLs
  (often by highlighting or underlining some text). These
  embedded URLs are called hypertext links.
• When you ask your web browser to open one of these
  embedded URLs (e.g., by pointing and clicking on it with a
  mouse), it will open a new connection and retrieve and display
  a new file. This is called following a link.
• It thus becomes very easy to hop from one machine to another
  around the network, following links to all sorts of information.
• The ability to embed a link in a document and allow a user to
  follow that link to get another document is the basis of a
  hypertext system.
• When you ask your browser to view a page, your browser (the
  client) fetches the page from the server using HTTP running
  over TCP.
• Like SMTP, HTTP is a text-oriented protocol. HTTP is a
  request/response protocol, where every message has the
  general form:

    START_LINE <CRLF>
    MESSAGE_HEADER <CRLF>
    <CRLF>
    MESSAGE_BODY <CRLF>

• <CRLF> stands for carriage-return+line-feed.
• The first line (START_LINE) indicates whether this is a
  request message or a response message.
• The next set of lines specifies zero or more MESSAGE_HEADER
  lines, and the set is terminated by a blank line. These
  headers are a collection of options and parameters that
  qualify the request or response.
• HTTP defines many possible header types, some of
  which pertain to request messages, some to
  response messages, and some to the data carried in
  the message body.
• Finally, after the blank line comes the contents of the
  requested message (MESSAGE_BODY); this part of the
  message is where a server would place the requested
  page when responding to a request, and it is typically
  empty for request messages.
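The general message form above can be sketched as two small helpers. This is a minimal sketch under stated assumptions: the names `build_message` and `parse_message` are hypothetical, headers are held in an ordinary dict (real HTTP header names are case-insensitive and may repeat, which this ignores), and, following real HTTP rather than the diagram, no CRLF is appended after the body, since a real message marks the end of its body with Content-Length or chunked encoding.

```python
CRLF = "\r\n"


def build_message(start_line, headers, body=""):
    """Assemble START_LINE, the MESSAGE_HEADER lines, the blank line, then the body."""
    head = [start_line] + [f"{name}: {value}" for name, value in headers.items()]
    # Each line ends in CRLF; one extra CRLF forms the blank line before the body.
    return CRLF.join(head) + CRLF + CRLF + body


def parse_message(raw):
    """Split a raw message back into (start_line, headers, body)."""
    head, _, body = raw.partition(CRLF + CRLF)  # the first blank line ends the headers
    start_line, *header_lines = head.split(CRLF)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")  # split at the first colon only
        headers[name.strip()] = value.strip()
    return start_line, headers, body


request = build_message("GET /index.html HTTP/1.1", {"Host": "www.cs.princeton.edu"})
print(repr(request))  # 'GET /index.html HTTP/1.1\r\nHost: www.cs.princeton.edu\r\n\r\n'
```

Splitting each header at the first colon only keeps values such as `localhost:8080` intact.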
• Request Messages
• The first line of an HTTP request message specifies
  three things:
• the operation to be performed,
• the Web page the operation should be performed
  on, and
• the version of HTTP being used.
• Although HTTP defines a wide assortment of
  possible request operations, including
  write operations that allow a Web page to be
  posted on a server, the two most common
  operations are
• GET (fetch the specified Web page) and
• HEAD (fetch status information about the specified
  Web page).
• GET is used when your browser wants to retrieve and
  display a Web page.
• HEAD is used to test the validity of a hypertext link or
  to see if a particular page has been modified since
  it was last fetched.
• For example, the START_LINE
  GET http://www.cs.princeton.edu/index.html HTTP/1.1
  asks the server on host www.cs.princeton.edu to return
  the page named index.html, using HTTP version 1.1.
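The three fields of a request START_LINE can be separated as in the sketch below. The function names are hypothetical, and the sketch assumes the fields are separated by single spaces and that anything not starting with `HTTP/` is an unknown version. The second helper reflects the GET/HEAD difference: a reply to HEAD carries only the status line and headers.

```python
def parse_request_line(line):
    """Split a request START_LINE into (operation, page, version)."""
    parts = line.split(" ")
    if len(parts) != 3:
        raise ValueError(f"malformed request line: {line!r}")
    method, target, version = parts
    if not version.startswith("HTTP/"):
        raise ValueError(f"unexpected protocol version: {version!r}")
    return method, target, version


def response_has_body(method):
    """As far as the method is concerned: a reply to HEAD never carries a body."""
    return method != "HEAD"


print(parse_request_line("GET http://www.cs.princeton.edu/index.html HTTP/1.1"))
# ('GET', 'http://www.cs.princeton.edu/index.html', 'HTTP/1.1')
```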
      Conditional Request
• A client can add a condition in its
  request. In this case, the server will
  send the requested Web page if the
  condition is met or inform the client
  otherwise.
• One of the most common conditions
  imposed by the client is the time and
  date the Web page was last modified.
• The If-Modified-Since header gives the
  client a way to conditionally request
  a Web page: the server returns the page
  only if it has been modified since the
  time specified in that header line;
  otherwise it replies with 304 Not Modified.
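The server's side of a conditional GET can be sketched as a single decision. This is an illustrative sketch, not a real server: the function name is hypothetical and the page's modification time is assumed to be a timezone-aware `datetime`. Date parsing uses the standard `email.utils.parsedate_to_datetime`; an unparseable header is ignored, as HTTP requires, and sub-second precision is dropped because HTTP dates have one-second resolution.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_get_status(last_modified, if_modified_since):
    """Return 200 (send the page) or 304 (Not Modified).

    last_modified: timezone-aware datetime of the page's last change.
    if_modified_since: raw header value, or None if the client sent no condition.
    """
    if if_modified_since is None:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):  # unparseable date: ignore the condition
        return 200
    if since.tzinfo is None:  # HTTP dates are always GMT
        since = since.replace(tzinfo=timezone.utc)
    if last_modified.replace(microsecond=0) <= since:
        return 304  # unchanged since the client's copy
    return 200


page_modified = datetime(2015, 10, 21, 7, 28, tzinfo=timezone.utc)
print(conditional_get_status(page_modified, "Wed, 21 Oct 2015 07:28:00 GMT"))  # 304
print(conditional_get_status(page_modified, "Tue, 20 Oct 2015 07:28:00 GMT"))  # 200
```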
• Response Messages
• Like request messages, response
  messages begin with a single START
  LINE.
• In this case, the line specifies the
  version of HTTP being used,
• a three-digit code indicating whether
  or not the request was successful, and
  a text string giving the reason for the
  response.
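• HTTP/1.1 202 Accepted - the server accepted the request for
  processing (a request that has been fully satisfied, such as
  a successful page fetch, typically gets HTTP/1.1 200 OK).
• HTTP/1.1 404 Not Found - the server was not able to satisfy
  the request because the page was not found.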
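A response START_LINE can be split the same way, keeping in mind that the reason phrase may itself contain spaces. In this sketch the function names are hypothetical. The class table follows the standard convention that the first digit of the three-digit code gives its general meaning.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line):
    """Split a response START_LINE into (version, code, reason)."""
    parts = line.split(" ", 2)  # at most 2 splits: the reason phrase may contain spaces
    if len(parts) == 2:
        parts.append("")  # tolerate a missing reason phrase
    if len(parts) != 3:
        raise ValueError(f"malformed status line: {line!r}")
    version, code, reason = parts
    if not version.startswith("HTTP/") or len(code) != 3 or not code.isdigit():
        raise ValueError(f"malformed status line: {line!r}")
    return version, int(code), reason


def status_class(code):
    """Map a three-digit code to its class by its first digit."""
    return STATUS_CLASSES.get(code // 100, "unknown")


print(parse_status_line("HTTP/1.1 404 Not Found"))  # ('HTTP/1.1', 404, 'Not Found')
print(status_class(404))  # client error
```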
      Uniform Resource Identifiers
• A URI is a character string that identifies a resource, where a resource can be anything
  that has identity, such as a document, an image, or a service.
• The format of URIs:
• The first part of a URI is a scheme that names a particular way of identifying a certain
  kind of resource, such as mailto for email addresses or file for file names.
• The second part of a URI, separated from the first part by a colon, is the scheme-specific
  part.
• It is a resource identifier consistent with the scheme in the first part, as in the URIs
  mailto:santa@northpole.org and file:///C:/foo.html
• A resource doesn’t have to be retrievable or accessible.
• For example, Extensible Markup Language (XML) namespaces are identified by URIs.
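The two-part URI format can be sketched by splitting at the first colon. This is a minimal sketch; `split_uri` is a hypothetical helper. It checks the scheme against the RFC 3986 rule (a letter followed by letters, digits, `+`, `-` or `.`) and leaves the scheme-specific part uninterpreted, since its meaning depends on the scheme.

```python
import re

# RFC 3986 scheme syntax: a letter, then letters, digits, '+', '.' or '-'.
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def split_uri(uri):
    """Split a URI at its first colon into (scheme, scheme-specific part)."""
    scheme, sep, rest = uri.partition(":")
    if not sep or not SCHEME_RE.fullmatch(scheme):
        raise ValueError(f"no valid scheme in {uri!r}")
    return scheme.lower(), rest  # scheme names are case-insensitive


print(split_uri("mailto:santa@northpole.org"))  # ('mailto', 'santa@northpole.org')
print(split_uri("file:///C:/foo.html"))  # ('file', '///C:/foo.html')
```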
Nonpersistent vs. Persistent Connections
          TCP Connections:
• The original version of HTTP (1.0) established a
   separate TCP connection for each data item
   retrieved from the server.
• It’s a very inefficient mechanism:
• connection setup and teardown messages had to
   be exchanged between the client and server even
   if all the client wanted to do was verify that it had
   the most recent copy of a page. Thus, retrieving a
   page that included some text and a dozen icons or
   other small graphics would result in 13 separate
   TCP connections being established and closed.
• The sequence of events for fetching a page that has just a
  single embedded object:
• colored lines indicate TCP messages,
• while black lines indicate the HTTP requests and
  responses.
• Disadvantage:
• You can see that two round-trip times are spent setting
  up TCP connections.
• Besides this latency impact, there is also a processing cost
  on the server to handle the extra TCP connection
  establishment and termination.
• To overcome this situation, HTTP version 1.1 introduced
  persistent connections: the client and server can exchange
  multiple request/response messages over the same TCP
  connection.
• Advantages:
• First, persistent connections eliminate the connection setup
  overhead, thereby reducing the load on the server,
  the load on the network caused by the additional TCP
  packets, and the delay perceived by the user.
• Second, because a client can send multiple request messages
  down a single TCP connection, TCP’s congestion window
  mechanism is able to operate more efficiently: the
  connection does not have to go through slow start again
  for every object.
• Persistent connections also cover the case where the
  connection is already open (presumably due to some prior
  access of the same server), in which case no setup round
  trip is needed at all.
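The round-trip cost of the two approaches can be compared with a small model. This sketch rests on simplifying assumptions: each TCP handshake costs one RTT, each request/response costs one RTT, requests are sent one after another without pipelining, and transmission time and connection teardown are ignored. Both function names are hypothetical.

```python
def page_fetch_rtts(embedded_objects, persistent, already_open=False):
    """Round trips to fetch a page plus its embedded objects, one after another."""
    requests = 1 + embedded_objects  # the base page plus each embedded object
    if not persistent:
        return 2 * requests  # HTTP/1.0 style: handshake + request for every object
    handshake = 0 if already_open else 1  # HTTP/1.1: one connection, reused
    return handshake + requests


def connections_opened(embedded_objects, persistent):
    """TCP connections set up to fetch the page (ignoring an already-open one)."""
    return 1 if persistent else 1 + embedded_objects


# A page with some text and a dozen icons:
print(connections_opened(12, persistent=False), page_fetch_rtts(12, persistent=False))  # 13 26
print(connections_opened(12, persistent=True), page_fetch_rtts(12, persistent=True))  # 1 14
```

For a page with a single embedded object the model gives 4 RTTs without persistence (2 of them spent on connection setup, as noted above), 3 with a persistent connection, and 2 if that connection is already open.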
• Disadvantages:
• Neither the client nor the server necessarily
  knows how long to keep a particular TCP
  connection open.
• This is especially critical on the server, which
  might be asked to keep connections open
  on behalf of thousands of clients.
• Solution: the server must time out and close
  a connection if it has received no requests on
  the connection for a period of time.
• Also, both the client and server must watch to
  see whether the other side has elected to close the
  connection, and they must use that
  information as a signal that they should close
  their side of the connection as well. Both
  sides must close a TCP connection before it is
  fully terminated.
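The server-side timeout rule can be sketched as a small bookkeeping class. This is a hypothetical illustration: the class name, the string connection IDs, and the injectable `clock` parameter are assumptions made so the logic can be tested without real sockets. Actually closing the sockets is left to the caller.

```python
import time


class IdleConnectionTracker:
    """Remember when each connection last carried a request; report idle ones."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock    # injectable so the logic can be tested
        self.last_seen = {}   # connection id -> time of its last request

    def on_request(self, conn_id):
        self.last_seen[conn_id] = self.clock()

    def on_peer_closed(self, conn_id):
        # The other side closed; stop tracking, and the caller closes its side too.
        self.last_seen.pop(conn_id, None)

    def expired(self):
        """Return (and stop tracking) connections idle for at least timeout_s."""
        now = self.clock()
        stale = [c for c, t in self.last_seen.items() if now - t >= self.timeout_s]
        for c in stale:
            del self.last_seen[c]
        return stale
```

A server loop would call `expired()` periodically and close each returned connection.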
• This added complexity may be one reason why
  persistent connections were not used from
  the outset, but today it is widely accepted
  that the benefits of persistent connections
  more than offset the drawbacks.
• Caching is the temporary storage of web documents such as
  HTML pages and images. A web browser stores copies of
  recently visited web pages to reduce bandwidth usage, server
  load, and lag.
• A cache is a hardware or software component that stores
  data so that future requests for that data can be served
  faster.
• Advantages:
• From the client’s perspective, a page that can be
  retrieved from a nearby cache can be displayed much
  more quickly than if it has to be fetched from across the
  world.
• From the server’s perspective, having a cache intercept
  and satisfy a request reduces the load on the server.
• Caching can be implemented in many different places.
• A user’s browser can cache recently accessed pages and
  simply display the cached copy if the user visits the same
  page again.
• A site can support a single site-wide cache. This allows
  users to take advantage of pages previously downloaded
  by other users.
• Closer to the middle of the Internet, Internet Service
  Providers (ISPs) can cache pages.
• Note that, in the second case, the users within the site
  most likely know what machine is caching pages on behalf
  of the site, and they configure their browsers to connect
  directly to the caching host.
• This node is sometimes called a proxy.
• In contrast, the sites that connect to the ISP are probably
  not aware that the ISP is caching pages.
• It simply happens to be the case that HTTP requests
  coming out of the various sites pass through a common
  ISP router. This router can peek inside the request
  message and look at the URL for the requested page. If it
  has the page in its cache, it returns it. If not, it forwards
  the request to the server and watches for the response to
  fly by in the other direction.
• When it does, the router saves a copy in the hope that it
  can use it to satisfy a future request.
• HTTP supports proxy servers.
• A proxy server is a computer that keeps
  copies of responses to recent requests.
• The HTTP client sends a request to the
  proxy server. The proxy server checks its
  cache. If the response is not stored in
  the cache, the proxy server sends the
  request to the corresponding server.
• Incoming responses are sent to the
  proxy server and stored for future
  requests from other clients.
• The proxy server reduces the load on the
  original server, decreases traffic, and
  improves latency.
• However, to use the proxy server, the
  client must be configured to access the
  proxy instead of the target server.
  [Figure: HTTP client -> proxy server (checks cache) -> target
  server; the request is forwarded only if the response is not
  stored in the cache, and responses are stored in the cache.]
• Note that the proxy server acts
  both as a server and client.
• When it receives a request from
  a client for which it has a
  response, it acts as a server and
  sends the response to the client.
• When it receives a request from
  a client for which it does not
  have a response, it first acts as a
  client and sends a request to
  the target server. When the
  response has been received, it
  acts again as a server and sends
  the response to the client.
  [Figure: proxy server acting as a server (client request,
  cached response) and as a client (no cached response: the
  request is forwarded to the target server and the response
  relayed back).]
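The proxy's dual role can be sketched in a few lines. This sketch assumes `fetch_from_origin` is any callable that returns the target server's response for a URL (a hypothetical stand-in for a real HTTP request), and it ignores expiry entirely; freshness is a separate concern for real caches.

```python
class CachingProxy:
    """Serve from cache when possible, otherwise fetch from the target server."""

    def __init__(self, fetch_from_origin):
        self.fetch_from_origin = fetch_from_origin  # callable: url -> response
        self.cache = {}  # url -> stored response
        self.hits = 0
        self.misses = 0

    def handle(self, url):
        if url in self.cache:
            self.hits += 1           # acting as a server: answer from the cache
            return self.cache[url]
        self.misses += 1             # acting as a client: ask the target server
        response = self.fetch_from_origin(url)
        self.cache[url] = response   # keep it for future requests from other clients
        return response


calls = []


def fake_origin(url):
    """Hypothetical target server that records each request it receives."""
    calls.append(url)
    return f"<page for {url}>"


proxy = CachingProxy(fake_origin)
proxy.handle("http://www.cs.princeton.edu/index.html")
proxy.handle("http://www.cs.princeton.edu/index.html")
print(proxy.hits, proxy.misses, len(calls))  # 1 1 1
```

The second request never reaches the target server, which is how a proxy reduces server load and traffic.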
• No matter where pages are cached, the
  ability to cache Web pages is important
  enough that HTTP has been designed to
  make the job easier.
• The trick is that the cache needs to make
  sure it is not responding with an out-of-date
  version of the page.
• For example, the server assigns an
  expiration date (the Expires header field) to
  each page it sends back to the client (or to a
  cache between the server and client).
• The cache remembers this date and knows
  that it need not reverify the page each time
  it is requested until after that expiration
  date has passed.
• After that time the cache can use the HEAD
  or conditional GET operation (GET with an
  If-Modified-Since header line) to verify that it
  has the most recent copy of the page.
• More generally, there is a set of cache
  directives (carried in the Cache-Control header)
  that must be obeyed by all caching
  mechanisms along the request/response
  chain.
• These directives specify whether or not a
  document can be cached, how long it can be
  cached, how fresh a document must be, and
  so on.
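The Expires-based check can be sketched as the decision a cache makes on each request. This is a sketch under assumptions: the function name and return values are hypothetical, the raw Expires and Last-Modified header values are assumed to be stored alongside the cached page, and `now` must be timezone-aware. Treating an invalid Expires value (such as "0") as already expired follows the HTTP caching rules.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def freshness_decision(expires, last_modified, now):
    """Return ("serve-from-cache", {}) or ("revalidate", extra request headers)."""
    try:
        expires_at = parsedate_to_datetime(expires)
    except (TypeError, ValueError):
        expires_at = None  # invalid dates such as "0" count as already expired
    if expires_at is not None and expires_at.tzinfo is None:
        expires_at = expires_at.replace(tzinfo=timezone.utc)  # HTTP dates are GMT
    if expires_at is not None and now < expires_at:
        return "serve-from-cache", {}
    # Past the expiration date: ask the server with a conditional GET.
    return "revalidate", {"If-Modified-Since": last_modified}


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(freshness_decision("Mon, 01 Jan 2024 13:00:00 GMT", "Sun, 31 Dec 2023 00:00:00 GMT", now))
# ('serve-from-cache', {})
```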
• Cache Update
• How long should a response remain in the proxy server
  before being deleted and replaced?
• Different strategies:
• Store the list of sites whose information remains the
  same for a while.
• For example, a news agency may change its news page
  every morning. This means that a proxy server can get the
  news early in the morning and keep it until the next day.
• Add some headers to show the last modification time of
  the information. The proxy server can then use the
  information in this header to guess how long the
  information would be valid.
• HTTP Security
• HTTP per se does not provide security.
• HTTP can be run over the Secure Sockets Layer (SSL), or its
  successor, Transport Layer Security (TLS).
• In this case, HTTP is referred to as HTTPS.
• HTTPS provides confidentiality, server authentication (and
  optionally client authentication), and data integrity.
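The last-modification strategy for cache updates can be sketched as a heuristic: a page that had not changed for a long time before it was fetched is guessed to stay unchanged for a proportional while longer. The 10% fraction is an assumption (RFC 9111 mentions it as a typical setting, but each cache may choose its own), and the function name is hypothetical.

```python
from datetime import datetime, timedelta

# Assumed fraction of the "unchanged so far" interval; caches choose their own.
DEFAULT_FRACTION = 0.1


def heuristic_lifetime(fetched_at, last_modified, fraction=DEFAULT_FRACTION):
    """Guess how long a response without an explicit expiry stays valid."""
    unchanged_for = fetched_at - last_modified
    if unchanged_for <= timedelta(0):
        return timedelta(0)  # modified just now (or clock skew): do not guess
    return unchanged_for * fraction


# A page last modified 10 days before it was fetched is kept for about 1 day.
lifetime = heuristic_lifetime(datetime(2024, 1, 11), datetime(2024, 1, 1))
```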