02 HTTP

The document provides an overview of the Hypertext Transfer Protocol (HTTP), detailing its function as a communication protocol between web servers and client browsers. It covers key concepts such as HTTP messages, methods, headers, and the stateless nature of HTTP, along with the use of cookies for session management. Additionally, it discusses the structure of URLs, HTTP transactions, and the various response status codes associated with HTTP requests.


In the name of God

HTTP
Internet Engineering
Autumn 2019
Presented by: Seyed Hossein Ahmadpanah
CE & IT Department, Dr. Shariaty Technical College

Original by: Bahador Bakhshi


CE & IT Department, Amirkabir University of Technology
Questions
 Q1) How do web server and client browser talk to
each other?
 Q1.1) What is the common protocol?
 Q1.2) How are resources identified?
 Q1.3) What are requests & responses?
 Q1.4) Can/Should server know its clients?
 Q1.5) Who can influence the communication between
server & client?
 Q1.6) Is everything public?

3
Outline
Introduction
 Messages
 Methods
 Headers

Cookie
Proxy & Cache
Authentication

4
Introduction
 The transfer protocol for web applications is HTTP
 HTTP: Hyper Text Transfer Protocol
 HTTP/1.0 (RFC 1945), HTTP/1.1 (RFC 2068, revised by RFC 2616 and later by RFC 7230-7235)
 In fact, it can be used to transfer everything (not only hyper text)
 Text documents: HTML, XML, …
 Multimedia (JPG, GIF, Video, …), Applications (pdf, zip, …)
 HTTP uses the client/server paradigm
 HTTP server provides resource
 HTTP client (usually web browser) gets resource
 But not pure client/server communication
 Proxies, caches, …

6
Introduction (cont’d)
 HTTP is an application layer protocol
 HTTP assumes reliable communication
 TCP, default (server) port: 80

 HTTP is stateless
 Server does not keep history/state of clients
 If a client asks for an object 10 times, the server returns it each time
 High performance & Low complexity
 Problematic in some applications (sessions)

7
Resources
 HTTP is the protocol to transfer data between server and client (usually from
server to client)
 Which data?
 It can be anything
 In web, usually, it is a resource/object on server
 Each resource must be identified uniquely
 URI (Uniform Resource Identifier)
 Common practical implementation of URI is URL

8
URL
 URL requirements
 Basic requirements:
 Destination machine identification
 Transport layer protocol & port identification
 Application layer protocol identification
 Resource address in the destination machine
 Additional requirements:
 Security/Authentication
 Sending data from client to server
 Partial access to resource

9
URL (cont’d)
 <protocol(scheme)>://<user>:<pass>@<host>:<port>/<path>?<query>#<frag>

 Some examples:

http://www.aut.ac.ir
ftp://me:123@kernel.org/private
http://www.bing.com/search?q=web&go=&qs=n&form=QBLH&pq=web&sc=0-0&sp=-1
file:///c:/windows/
file:///home/bahador/work
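The URL template above can be checked against real strings with a small sketch. It uses Python's standard `urllib.parse.urlsplit`; the helper name `split_url` and the dictionary keys (chosen to mirror the placeholders on the slide) are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Return the URL components using the names from the slide's template."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,   # None when the URL carries no user info
        "pass": parts.password,
        "host": parts.hostname,
        "port": parts.port,       # None when the URL does not state a port
        "path": parts.path,
        "query": parts.query,
        "frag": parts.fragment,
    }


if __name__ == "__main__":
    for key, value in split_url("ftp://me:123@kernel.org/private").items():
        print(f"{key:7}{value}")
```

A missing port comes back as `None`; the client then falls back to the default port of the scheme (80 for HTTP).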

10
URL (cont’d)
 Scheme: the application layer protocol
 HTTP: The web protocol
 HTTPS: Secure HTTP
 FTP: File transfer protocol
 File: Access to a local file
 mailto: Send email to given address
 javascript: Run javascript code
…

11
URL (cont’d)
 Path: the path of the object on host filesystem
 It is with respect to web server (document) root directory

 E.g. web server root directory: /var/www/


 http://www.example.com/1.html
 /var/www/1.html
 http://www.example.com/1/2/3.jpg
 /var/www/1/2/3.jpg
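A minimal sketch of this mapping, assuming the document root `/var/www` from the example. The function name `url_to_file` is hypothetical, and real web servers apply many more rules (aliases, index files, rewrites); the only extra step shown here is rejecting paths that climb out of the root with `..`.

```python
import posixpath
from urllib.parse import unquote, urlsplit

DOCUMENT_ROOT = "/var/www"  # root directory taken from the slide example


def url_to_file(url, root=DOCUMENT_ROOT):
    """Map the path part of a URL to a file name under the document root."""
    path = unquote(urlsplit(url).path) or "/"
    # normpath collapses "." and ".." so the result can be compared to root
    full = posixpath.normpath(posixpath.join(root, path.lstrip("/")))
    if full != root and not full.startswith(root + "/"):
        raise ValueError("path escapes the document root")
    return full


if __name__ == "__main__":
    print(url_to_file("http://www.example.com/1.html"))
    print(url_to_file("http://www.example.com/1/2/3.jpg"))
```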

12
URL (cont’d)
 Query: a mechanism to pass information from client to active
pages or forms, e.g.,
 Fill information in a university registration form
 Ask Google to search a phrase

 Starts with “?”


 name=value format
 “&” is the border between multiple parameters
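The name=value format can be tried out with the standard library. This is a sketch only: the base URL `http://www.example.com/search` and the parameter names are made up for the example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(base, params):
    """Append params to base as a ?name=value&name=value query string."""
    return base + "?" + urlencode(params)


def read_query(url):
    """Return the query parameters of url as {name: [values]}."""
    return parse_qs(urlsplit(url).query)


if __name__ == "__main__":
    url = build_search_url("http://www.example.com/search",
                           {"q": "web", "lang": "en"})
    print(url)
    print(read_query(url))
```

`parse_qs` returns a list per name because a name may appear several times in one query; by default it also drops parameters with an empty value.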

13
URL (cont’d)
 Frag: A name for a part of resource
 A section in a document

http://www.example.com/paper.html#results

 Handled by browser
 Browser gets the whole resource (doc) from the server
 At display time, it jumps to the specified part

14
URL (cont’d)
 URL is encoded by client before transmission
 How: Each byte is split into two 4-bit groups; the hexadecimal digits of the two groups are prefixed by %
 Example: ~  126 (ASCII)  01111110  %7E

 What & Why?


 Non-ASCII (e.g., Persian characters)
 Reserved characters when they are not used in their special role
 E.g. @, :, $, …
 Unsafe characters, e.g. space, %, …
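The encoding rule can be reproduced by hand and compared with the standard library. In this sketch `percent_encode_byte` is a hypothetical helper that follows the two-nibble rule literally, while `urllib.parse.quote` is the real library call, which encodes non-ASCII text as UTF-8 bytes first.

```python
from urllib.parse import quote, unquote

HEX_DIGITS = "0123456789ABCDEF"


def percent_encode_byte(byte):
    """Encode one byte as % followed by its two hexadecimal digits."""
    high, low = byte >> 4, byte & 0x0F   # the two 4-bit groups
    return "%" + HEX_DIGITS[high] + HEX_DIGITS[low]


if __name__ == "__main__":
    print(percent_encode_byte(ord("~")))   # %7E
    print(quote("a b"))                    # space is unsafe, so it is encoded
    print(quote("سلام"))                    # Persian text, UTF-8 then % encoding
    print(unquote("%7E"))
```

Note that current versions of `quote` leave `~` untouched, because newer URI rules treat it as an unreserved character; the slide's example still shows the mechanism correctly.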

15
URL in Action
 User asks the browser to retrieve a resource (e.g. webpage,
image, pdf, …)
 Enter the URL in address bar in browser window
 Click on a link  browser extracts the corresponding URL

 Browser finds the IP address of <host> (DNS lookup)


 Browser creates a TCP connection to the IP address and the
<port>
 Browser sends HTTP requests through the connection
 The “Path” identifies the resource on the server

 Browser gets the response and processes it
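The steps above can be sketched with a raw TCP socket. To stay self-contained, the sketch starts a throwaway server on 127.0.0.1 with Python's `http.server`; the names `HelloHandler`, `fetch` and `demo` are assumptions for illustration, and a real browser does far more (redirects, caching, TLS, parsing).

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit


class HelloHandler(BaseHTTPRequestHandler):
    """Tiny stand-in web server that answers every GET with 'hello'."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(url):
    """Follow the slide: look up host, open TCP, send request, read response."""
    parts = urlsplit(url)
    ip = socket.gethostbyname(parts.hostname)            # name lookup
    port = parts.port or 80                               # default HTTP port
    with socket.create_connection((ip, port), timeout=5) as conn:
        request = (f"GET {parts.path or '/'} HTTP/1.1\r\n"
                   f"Host: {parts.netloc}\r\n"
                   "Connection: close\r\n\r\n")
        conn.sendall(request.encode("ascii"))
        chunks = []
        while True:                                       # read until server closes
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def demo():
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)   # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch(f"http://127.0.0.1:{server.server_port}/index.html")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo().decode("iso-8859-1"))
```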

16
How does HTTP work? Transactions
 HTTP data transfer is a collection of transactions
 Each transaction is composed of 2 HTTP messages
 Client  Server: HTTP Request message
 Server  Client: HTTP Response message

 Requests are identified by methods


 Method: The action that client asks from server

 Responses are identified by status codes


 Status: The result of the requested action

17
HTTP Transaction (cont’d)
[Figure: timeline between Client and Server; each Request from the client is followed by a Response from the server, three transactions in a row]

18
HTTP Transaction in Web
 (Typically) each web page contains multiple resources
 The main skeleton HTML page
 Some linked materials: figures, videos, JS, CSS, …

 Displaying a web page by browser


 Get the HTML page (first transaction)
 Try to display the page (rendering)
 Other resources are linked to the page

19
HTTP Transaction in Web (cont’d)
 HTTP Transactions & TCP connections
 1) Non-persistent
 A new TCP connection per object
 Networking overhead + Connection establishment delay + Resource intensive (especially on the server side)
 Parallel connections speed up browsing
 2) Persistent
 Get multiple objects using single TCP connection
 No extra processing & networking overhead
 Poor performance if implemented in serial manner
 Pipeline requests speed up browsing
 Added in HTTP/1.1

20
HTTP Transaction in Web: Example
Get a HTML page from a server
Capture the packets
Investigate the transactions

21
HTTP Messages
 HTTP is a text-based protocol
 Human readable headers
 The header is composed of some lines!!!


Start line: specifies the type of message

Header: depends on message type

An Empty Line

Message body: Data/payload

23
HTTP Messages (cont’d)
Request message format

Method<sp>Path<sp>version<CRLF>
<Header field>:<value><CRLF>

<Header field>:<value><CRLF>
<CRLF>
<Entity body>

24
HTTP Messages (cont’d)
E.g. HTTP request message

GET /index.html HTTP/1.1
Host: www.aut.ac.ir
User-Agent: Mozilla/6.0
Accept-Language: en-us
Connection: keep-alive
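The request format can be assembled in a few lines. This is a sketch; `build_request` is a hypothetical helper, not part of any library, and it does no validation of header names or values.

```python
def build_request(method, path, headers, body=b"", version="HTTP/1.1"):
    """Return the bytes of an HTTP request message.

    Layout: start line, one line per header, an empty line, then the body.
    Every line ends with CRLF.
    """
    lines = [f"{method} {path} {version}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"   # the empty line ends the header
    return head.encode("ascii") + body


if __name__ == "__main__":
    message = build_request("GET", "/index.html", {
        "Host": "www.aut.ac.ir",
        "User-Agent": "Mozilla/6.0",
        "Accept-Language": "en-us",
        "Connection": "keep-alive",
    })
    print(message.decode("ascii"))
```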

25
HTTP Messages (cont’d)
Response message format

Version<sp>code<sp>Reason<CRLF>
<Header field>:<value><CRLF>

<Header field>:<value><CRLF>
<CRLF>
<Entity body>

26
HTTP Messages (cont’d)
 E.g. HTTP response message

HTTP/1.1 200 OK
Date: Sun, 02 Oct 2011 20:30:40 GMT
Server: Apache/2.2.2
Last-Modified: Sun, 03 May 2009 10:20:22 GMT
Connection: keep-alive
Content-Length: 3000

data data data …
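Reading a response message is the reverse operation. The sketch below assumes the whole message is already in memory; `parse_response` is a hypothetical helper that ignores details such as repeated headers and chunked bodies.

```python
def parse_response(raw):
    """Split a raw HTTP response into (version, code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")     # empty line separates the body
    lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = lines[0].split(" ", 2)  # status line
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")        # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


if __name__ == "__main__":
    sample = (b"HTTP/1.1 200 OK\r\n"
              b"Date: Sun, 02 Oct 2011 20:30:40 GMT\r\n"
              b"Server: Apache/2.2.2\r\n"
              b"Content-Length: 4\r\n"
              b"\r\n"
              b"data")
    print(parse_response(sample))
```

Header names are lower-cased in the result because they are case-insensitive in HTTP.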

27
HTTP Methods
 Methods are the actions that the client asks the server to perform on the specified resource (given by the path parameter)
 Which actions?
 Basic data communication operations
 Safe operations
 Get a resource from server
 Send data to server

 Unsafe operations
 Delete a resource on server
 Create/Replace a resource on server

 Debugging and troubleshooting


 Get information about a resource
 Check what server got from client
 Get List of operations which can be applied on a resource

28
HTTP Methods (cont’d)
 GET (must be implemented by server): Retrieve resource
from server
 HEAD (must be implemented by server): Similar to GET but
the resource itself is not retrieved, just the HTTP response
header
 Useful for debugging or some other applications

 POST: Submit data to be processed by the specified resource


 Data itself is enveloped in message body

29
HTTP Methods (cont’d)
 DELETE: Remove the resource!!!
 Not popular in web, can be used in other
applications
 PUT: Add message body as the specified resource
(a file with given path) to server!!!
 TRACE: Server echoes back the received message
 For troubleshooting & debugging
 OPTIONS: Request the list of supported methods
by server on the resource

30
HTTP Responses
 The message for the result/response of the
requested action
 Which responses?
 Basic responses
 Success
 Failure
 Bad client request
 Server problem

 Others
 E.g., Redirection to other resources

31
HTTP Responses (cont’d)
 2xx: Successful operation
 200: OK
 201: Created

 4xx: Client error


 400: Bad request
 401: Unauthorized (Authorization required)
 403: Forbidden
 404: Not found
 405: Method not allowed

32
HTTP Responses (cont’d)
 5xx: Server error
 500: Internal server error
 501: Not implemented
 503: Service unavailable

 3xx: Redirection
 301: Moved Permanently & 307: Temporary Redirect
 Resource has been moved, Redirection
 Location header  the new location of resource
 304: Not modified
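The first digit of the status code gives the class of the response. A small sketch using Python's standard `http.HTTPStatus` table; the function `describe` and the `CLASSES` mapping are assumptions for illustration and cover only the classes listed on these slides.

```python
from http import HTTPStatus

# Response classes from the slides, keyed by the first digit of the code
CLASSES = {2: "success", 3: "redirection", 4: "client error", 5: "server error"}


def describe(code):
    """Return 'code phrase (class)' for a numeric status code."""
    kind = CLASSES.get(code // 100, "other")
    try:
        phrase = HTTPStatus(code).phrase    # standard reason phrase
    except ValueError:
        phrase = "unknown"                  # code not in the standard table
    return f"{code} {phrase} ({kind})"


if __name__ == "__main__":
    for code in (200, 201, 301, 304, 307, 401, 404, 405, 500, 503):
        print(describe(code))
```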

33
HTTP Messages Examples
Connect to a web server
 telnet can create TCP socket

Play with the server by sending HTTP methods and checking the responses
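The same experiment can be scripted instead of typed into telnet. The sketch below assumes a throwaway local server built with Python's `http.server` (the handler `PageHandler` and the function `try_methods` are hypothetical names) and uses `http.client` to send a few methods and record the status code and body size of each response.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class PageHandler(BaseHTTPRequestHandler):
    """Test server that implements only GET and HEAD."""

    BODY = b"<html><body>test page</body></html>"

    def _send_head(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(self.BODY)))
        self.end_headers()

    def do_GET(self):
        self._send_head()
        self.wfile.write(self.BODY)

    def do_HEAD(self):
        self._send_head()          # same headers as GET, but no body

    def log_message(self, format, *args):
        pass                       # keep the demo quiet


def try_methods(methods=("GET", "HEAD", "DELETE")):
    """Send each method to the test server; return {method: (status, body size)}."""
    server = HTTPServer(("127.0.0.1", 0), PageHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    results = {}
    try:
        for method in methods:
            conn = http.client.HTTPConnection("127.0.0.1", server.server_port,
                                              timeout=5)
            conn.request(method, "/")
            response = conn.getresponse()
            results[method] = (response.status, len(response.read()))
            conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return results


if __name__ == "__main__":
    for method, (status, size) in try_methods().items():
        print(method, status, size)
```

Since the handler defines no `do_DELETE`, the library's base class answers DELETE with 501 (Not implemented).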

34
HTTP Headers
 Headers are additional information sent by client
to server and vice versa
 Most (almost all) are optional

 Which headers?
 Information about client
 Information about server
 Information about the requested resource
 Information about the response
 Security/Authentication

35
HTTP Headers
 General headers
 Appear both on request & response messages
 Request headers
 Information about request
 Response headers
 Information about response
 Entity headers
 Information about body (size, …)
 Extension headers
 New headers (not standard)

36
General Headers
 Date: date & time that message is created
 Connection: close or keep-alive
 Close: Non-persistent connection
 Keep-alive: Persistent connection

 Via: Information about the intermediate nodes between the two sides
 Proxy servers

37
Request Headers
 Host: The name of the server (required, why?)
 Referer: URL of the page that contains the link to the requested URL
 Information about client
 User-Agent: The client program
 UA-OS: The OS of client program
 UA-Disp: Information about display of client
 Accept: The acceptable media types
 Accept-Encoding: Acceptable encoding
 Accept-Language: The acceptable languages

38
Request Headers (cont’d)
 Range: Specific range (in bytes) of the resource
 Authorization: Response to the server's authentication challenge
 Will be discussed when studying Authentication
 If-Modified-Since: Request is processed only if the object has been modified since the specified time
 Used in Web Caching
 When the client has a copy of the object and wants to check its freshness

39
Response Headers
 Server: Information about server
 WWW-Authenticate: Used to specify authentication parameters by
server
 Will be discussed when studying Authentication

 Proxy-Authenticate: Used to specify authentication parameters by


proxy
 Will be discussed when studying Authentication

 Set-Cookie: To send a cookie to client


 Will be discussed when studying Cookies

40
Entity Headers
 Content-Length: The length of the body (in bytes)
 Content-Type: The type of entity
 MIME types: text/html, image/gif

 Allow: The request methods that can be performed on the entity
 This is sent in response to the OPTIONS method

 Location: The new location of entity to redirect client

41
Entity Headers (cont’d)
 Content-Range: Range of this entity within the entire resource
 Expires: The date and time at which the entity expires
 Last-Modified: The date and time of the last modification of the entity

42
Stateless Problem
 HTTP is a stateless protocol
 Server does not remember its clients

 How to personalize pages (personal portal)?


 Use http header: Client-ip, From, …
 Is not usually sent by browsers
 Find client IP address from TCP connection
 The problem is NAT

44
Solution of Stateless Problem: Cookie
 Cookies: Pieces of information (e.g., unique identifiers) sent by the server to the user (browser), which are returned to the server
 How it works
 Server asks client to remember the information
 Set-Cookie header in response message
 Client gives back the information to server in every request
 Cookie header in request messages
 Server customizes responses according to the cookie
 Types
 Session cookies: To identify a session
 Persistent cookies: To identify a client (browser)
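The two headers can be produced and read with Python's standard `http.cookies` module. This is a sketch of the exchange only; the helper names and the cookie name `sid` are assumptions made for the example.

```python
from http.cookies import SimpleCookie


def make_set_cookie(name, value):
    """Server side: value of the Set-Cookie header in the response."""
    cookie = SimpleCookie()
    cookie[name] = value
    return cookie[name].OutputString()


def read_cookie_header(header_value):
    """Server side: parse the Cookie header of a later request."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {name: morsel.value for name, morsel in cookie.items()}


if __name__ == "__main__":
    # 1) the server asks the browser to remember an identifier
    print("Set-Cookie:", make_set_cookie("sid", "abc123"))
    # 2) the browser returns it with every later request
    print(read_cookie_header("sid=abc123; theme=dark"))
```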

45
Cookies (cont’d)
[Figure: cookie exchange between client and server, steps 1) to 3)]

46
Cookies (cont’d)
 Limitation (cannot be used to store large data)
 Typically x1 total cookies, x2 cookies per domain, x3 data per
cookie
 Cookies are text files
 No virus spread

 The server never sends a request to read cookies
 By default cookies are sent by the browser
 The browser checks the URL and finds the appropriate cookies

47
Cookies (cont’d)
 Client can control cookies
 Disable cookies: no cookie is saved & used
 View & Delete cookies

 Server can control cookies by its attributes


 Expiration time
 Domain
 Path
 Security

48
Cookies Attributes
 Expires & Max-Age: The lifetime of the cookie
 Expires: An absolute time at which the cookie is deleted
 Max-Age: The maximum lifetime (in seconds) of the cookie
 If present  persistent cookie
 Send a past time (or a zero/negative Max-Age) to delete a cookie
 Secure: Cookie is sent only if the channel is secure
 Especially useful for login-session cookies
 HttpOnly: Cookie is exposed only to HTTP requests
 JavaScript cannot access the cookie
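A sketch of how a server could attach these attributes, again with the standard `http.cookies` module. The function `login_cookie` and its defaults are assumptions for illustration; leaving `max_age` unset yields a session cookie, setting it yields a persistent one.

```python
from http.cookies import SimpleCookie


def login_cookie(name, value, max_age=None):
    """Return a Set-Cookie value with Path, Secure and HttpOnly set."""
    cookie = SimpleCookie()
    cookie[name] = value
    morsel = cookie[name]
    morsel["path"] = "/"
    morsel["secure"] = True        # only sent over a secure channel
    morsel["httponly"] = True      # hidden from JavaScript
    if max_age is not None:
        morsel["max-age"] = max_age   # lifetime in seconds
    return morsel.OutputString()


if __name__ == "__main__":
    print(login_cookie("sid", "abc123"))              # session cookie
    print(login_cookie("sid", "abc123", max_age=3600))  # persistent for one hour
```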

49
Cookies Attributes: Domain & Path
 Domain & Path determine the scope of the cookie
 For which path and domain, the cookie is saved & returned
back by browser
 If these attributes are absent  browser assumes the current host & current path
 Browser returns the cookie for that host & path and also for all sub-paths
 If present  browser checks validity
 If they are valid  Browser returns the cookie for that domain & path and also for all subdomains and sub-paths

50
Cookies Attributes: Domain & Path
 Validity check by major browsers
 Domain names must start with dot
 Some browsers accept names without dot as domain
 Don’t accept cookies for domains other than the base domain
 Don’t accept cookies for sub-domains
 Accept cookies for higher domains
 Except the top level domains, e.g., .com, .ac.ir
 Accept cookies for other (sub or higher) paths

51
Cookies Attributes: Domain & Path Examples
 A php script sets some cookies
 The script can be run using different domains
 Check out which cookies are accepted
 Check out which cookies are sent back

 In summary, cookies are filtered two times


 Which cookies must be accepted
 Which cookies are sent back
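The second filter (which cookies are sent back) can be sketched as a domain test plus a path test. This is a simplified illustration, not what any particular browser implements: the function names are hypothetical, and the sketch ignores IP addresses, top-level domain lists, the Secure flag and expiry.

```python
def domain_matches(request_host, cookie_domain):
    """True if the host is the cookie's domain or one of its subdomains."""
    cookie_domain = cookie_domain.lstrip(".").lower()
    request_host = request_host.lower()
    return (request_host == cookie_domain
            or request_host.endswith("." + cookie_domain))


def path_matches(request_path, cookie_path):
    """True if the request path is the cookie's path or one of its sub-paths."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # "/shop" must match "/shop/cart" but not "/shopping"
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False


def should_send(cookie, host, path):
    """Decide whether a stored cookie is returned for a request to host/path."""
    return (domain_matches(host, cookie["domain"])
            and path_matches(path, cookie["path"]))


if __name__ == "__main__":
    cookie = {"domain": ".example.com", "path": "/shop"}
    print(should_send(cookie, "www.example.com", "/shop/cart"))
    print(should_send(cookie, "www.example.com", "/shopping"))
    print(should_send(cookie, "badexample.com", "/shop"))
```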

52
Proxy
 Proxies sit between client and server
 Act as server for client
 Act as client for server

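[Figure: Client, Proxy, Server; the proxy box contains a Server part facing the client, a Process stage, and a Client part facing the server]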

54
HTTP Proxy Applications
 Authentication
 Client side: Authenticate clients before they access web
 Server side: Authenticate clients before they access the server

 Accounting: Log client activities


 Security: Analyze request before sending it to server
 Integrated in modern firewalls

 Filtering: Limit access to specified contents


 Anonymizer: Anonymous web browsing

55
Caching
 Caching: save a copy of a resource and use it instead of requesting it from the server again
 Browser has its own local cache
 A cache server is a special proxy for caching
 Benefits
 Reduce redundant data transfer
 Reduce network bottleneck
 Reduce load on server

56
Caching Algorithm
 If the object is not cached, it is fetched from the server, saved in the cache, and sent to the client
 Else, if the object is in the cache
 Cache server must return only fresh objects
 Freshness check
 Object lifetime is specified by the server
 Expires header: Absolute expiration time
 Cache-Control: max-age: Relative expiration time
 If the requested object has not expired
 Cache server gives it to the client

57
Caching Algorithm (cont’d)
 If requested object is expired
 Its freshness must be checked

 Freshness is checked by conditional request


 If-Modified-Since: the last-modified time of the cached copy

 Server responses
 304 Not modified response + new expire time
 Cached copy is valid until the specified time
 200 OK
 Server provides a new version of the object
 Cache server updates cached copy
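The whole algorithm fits in a small in-memory simulation. Everything here is an assumption made for illustration: `Origin` stands in for the web server, `Cache` for the cache server, a single object is cached, and times are plain numbers in seconds rather than HTTP dates.

```python
import time


class Origin:
    """Stand-in for the origin server holding one object."""

    def __init__(self, body, last_modified, max_age):
        self.body = body
        self.last_modified = last_modified
        self.max_age = max_age
        self.full_responses = 0      # counts 200 responses that carry a body

    def get(self, if_modified_since=None):
        if if_modified_since is not None and self.last_modified <= if_modified_since:
            return 304, None, self.last_modified, self.max_age
        self.full_responses += 1
        return 200, self.body, self.last_modified, self.max_age


class Cache:
    """Cache server for a single object, following the slide's algorithm."""

    def __init__(self, origin, clock=time.time):
        self.origin = origin
        self.clock = clock
        self.entry = None

    def get(self):
        now = self.clock()
        if self.entry and now < self.entry["expires"]:
            return self.entry["body"]                  # cached and still fresh
        # not cached, or expired: ask the server (conditionally if we have a copy)
        since = self.entry["last_modified"] if self.entry else None
        status, body, last_modified, max_age = self.origin.get(since)
        if status == 304:
            self.entry["expires"] = now + max_age      # copy valid until new time
        else:
            self.entry = {"body": body, "last_modified": last_modified,
                          "expires": now + max_age}    # store the new version
        return self.entry["body"]
```

Passing a fake `clock` function makes it possible to step through expiry without waiting.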

58
HTTP Authentication
 Not all resources on the web are public; e.g.,
 Financial documents, Customer information, …
 HTTP has two (similar) authentication schemes
 Basic: Base64 encoded “user:pass”
 Digest: Plain username + Digest of pass
 Steps are the same
 Client requests resource, Server challenges, Client asks the user for User/Pass, Client responds, Server authenticates and allows
 Authentication information is sent with every request until the end of the current session (why?!!)

60
HTTP Authentication (cont’d)
Basic authentication

1) Client  Server

2) Server  Client

3) Client  Server

4) Server  Client
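The credential sent in step 3 of Basic authentication is just the Base64 form of “user:pass”. A minimal sketch with the standard `base64` module; the function names are assumptions, and the user and password are sample values.

```python
import base64


def basic_authorization(user, password):
    """Value of the Authorization header for Basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def decode_basic(header_value):
    """What the server (or an eavesdropper) recovers from the header."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


if __name__ == "__main__":
    header = basic_authorization("me", "123")
    print("Authorization:", header)      # Authorization: Basic bWU6MTIz
    print(decode_basic(header))
```

Decoding needs no key at all, which is why Basic authentication offers no secrecy on an unencrypted connection.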

61
Digest Authentication
 Basic authentication is insecure
 Password is sent in base64 encoding
 Attacker can easily decode it
 Digest authentication: Don’t send the password
 Send its digest (hash)
 Digest/hash function
 One-way function, irreversible
 Attacker cannot recover the password
 But! Replay attack
 Attacker resends the same digest  the attacker is authenticated

62
Digest Authentication
 1) Client requests a private resource
 2) Server creates a nonce
 WWW-Authenticate: Digest nonce=39X9s#! …

 3) Client computes digest of password and nonce


 Authorization: username, hash(pass, nonce)

 4) Server looks up the password of username and computes hash(pass, nonce)
 If this value and Authorization are the same  Ok
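The four steps can be sketched with a plain hash over password and nonce. This follows the simplified scheme on the slide, not the real Digest computation, which also hashes the realm, the method and the URI. The user table `USERS`, the choice of SHA-256 and all function names are assumptions made for illustration.

```python
import hashlib
import hmac
import secrets

USERS = {"alice": "secret"}   # hypothetical password table on the server


def new_nonce():
    """Step 2: the server creates a fresh random nonce for the challenge."""
    return secrets.token_hex(16)


def digest(password, nonce):
    """hash(pass, nonce) from the slide, here SHA-256 over 'pass:nonce'."""
    return hashlib.sha256(f"{password}:{nonce}".encode("utf-8")).hexdigest()


def client_response(username, password, nonce):
    """Step 3: the client sends the username and the digest, never the password."""
    return {"username": username, "response": digest(password, nonce)}


def server_verify(authorization, nonce):
    """Step 4: the server recomputes the digest and compares."""
    password = USERS.get(authorization["username"])
    if password is None:
        return False
    expected = digest(password, nonce)
    return hmac.compare_digest(expected, authorization["response"])


if __name__ == "__main__":
    nonce = new_nonce()
    auth = client_response("alice", "secret", nonce)
    print(server_verify(auth, nonce))          # same nonce: accepted
    print(server_verify(auth, new_nonce()))    # replay against a new nonce: rejected
```

Because the nonce changes with every challenge, a captured digest is useless against a later challenge.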

63
Real Example
 Basic & Digest Authentication using the .htaccess mechanism in Apache server
 If .htaccess is located in a directory, it protects that directory and all its subdirectories

64
Security
 Digest authentication protects the password only
 Data is completely insecure
 No mechanism in HTTP to protect data
 HTTP over SSL/TLS is the popular solution
 An encrypted tunnel between client & server

65
Answers
 Q1.1) What is the common protocol?
 HTTP, Message = Header + Body
 Q1.2) How are resources identified?
 URL
 Q1.3) What are requests & responses?
 Method (GET, HEAD, …)
 Status (2xx, 3xx, 4xx, 5xx)
 Q1.4) Can/Should server know its clients?
 Yes, using cookies
 Q1.5) Who can influence the communication between server &
client?
 Proxy servers, e.g., cache servers
 Q1.6) Is everything public in server?
 No, HTTP basic/digest authentication

66
What is Next?! HTTP/2
 The second major version of HTTP
 Based on Google’s SPDY
 Published as RFC 7540 in May 2015


 ~33% of all websites support HTTP/2 (Feb. 2019)
 Main features
 Binary, instead of textual
 Can use one connection for parallelism
 Header compression
 Allows servers to “push” responses proactively

67
