In the Name of God
HTTP
Internet Engineering
Autumn 2019
Presented by: Seyed Hossein Ahmadpanah
CE & IT Department, Dr. Shariaty Technical College
Originally by: Bahador Bakhshi
CE & IT Department, Amirkabir University of Technology
Questions
Q1) How do a web server and a client browser talk to each other?
Q1.1) What is the common protocol?
Q1.2) How are resources identified?
Q1.3) What are requests & responses?
Q1.4) Can/Should the server know its clients?
Q1.5) Who can influence the communication between server & client?
Q1.6) Is everything public?
Outline
Introduction
Messages
Methods
Headers
Cookie
Proxy & Cache
Authentication
Introduction
The transfer protocol for web applications is HTTP
HTTP: Hypertext Transfer Protocol
HTTP 1.0 (RFC 1945), HTTP 1.1 (RFC 2068, revised as RFC 2616 and later as RFC 7230 to 7235)
In fact, it can transfer any kind of data (not only hypertext)
Text documents: HTML, XML, …
Multimedia (JPG, GIF, Video, …), Applications (pdf, zip, …)
HTTP uses the client/server paradigm
HTTP server provides resources
HTTP client (usually a web browser) gets resources
But the communication is not purely client/server
Proxies, caches, …
Introduction (cont’d)
HTTP is an application layer protocol
HTTP assumes reliable communication
TCP, default (server) port: 80
HTTP is stateless
Server does not keep history/state of clients
If a client asks for an object 10 times, the server sends it back each time
Pro: high performance & low complexity
Con: problematic for some applications (e.g., sessions)
Resources
HTTP is the protocol to transfer data between server and client (usually from server to client)
Which data?
It can be anything
On the web, it is usually a resource/object on the server
Each resource must be identified uniquely
URI (Uniform Resource Identifier)
The most common practical form of URI is the URL (Uniform Resource Locator)
URL
URL requirements
Basic requirements:
Destination machine identification
Transport layer protocol & port identification
Application layer protocol identification
Resource address in the destination machine
Additional requirements:
Security/Authentication
Sending data from client to server
Partial access to resource
URL (cont’d)
<protocol(scheme)>://<user>:<pass>@<host>:<port>/<path>?<query>#<frag>
Some examples:
http://www.aut.ac.ir
ftp://me:123@kernel.org/private
http://www.bing.com/search?q=web&go=&qs=n&form=QBLH&pq=web&sc=0-0&sp=-1
file://c:/windows/
file:///home/bahador/work
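To make the syntax concrete, Python's standard `urllib.parse.urlsplit` can split a URL into exactly these parts. The URL below is a made-up example (the host, user name, and password are hypothetical).

```python
from urllib.parse import urlsplit

# Hypothetical URL that exercises every component of the general syntax
url = "http://me:123@www.example.com:8080/docs/paper.html?lang=en&page=2#results"

parts = urlsplit(url)
print(parts.scheme)    # http
print(parts.username)  # me
print(parts.password)  # 123
print(parts.hostname)  # www.example.com
print(parts.port)      # 8080
print(parts.path)      # /docs/paper.html
print(parts.query)     # lang=en&page=2
print(parts.fragment)  # results
```

Only the path and query travel in the HTTP request line; the fragment stays in the browser.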
URL (cont’d)
Scheme: the application layer protocol
HTTP: The web protocol
HTTPS: Secure HTTP
FTP: File transfer protocol
File: Access to a local file
mailto: Send email to the given address
javascript: Run JavaScript code
…
URL (cont’d)
Path: the path of the object on the host filesystem
It is relative to the web server's document root directory
E.g., web server document root: /var/www/
http://www.example.com/1.html → /var/www/1.html
http://www.example.com/1/2/3.jpg → /var/www/1/2/3.jpg
URL (cont’d)
Query: a mechanism to pass information from the client to active pages or forms, e.g.,
Fill in a university registration form
Ask Google to search for a phrase
Starts with “?”
name=value format
“&” separates multiple parameters
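A minimal sketch of building and reading a query string with Python's standard `urllib.parse`; the parameter names `q` and `page` are hypothetical.

```python
from urllib.parse import parse_qs, urlencode

# Client side: turn name=value pairs into a query string ("&" separates pairs)
params = {"q": "web engineering", "page": "2"}
query = urlencode(params)
print(query)   # q=web+engineering&page=2

# Server side: parse the query string back into names and values
parsed = parse_qs(query)
print(parsed)  # {'q': ['web engineering'], 'page': ['2']}
```

`urlencode` uses form encoding, so the space becomes `+` rather than `%20`; `parse_qs` returns a list per name because a name may appear more than once.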
URL (cont’d)
Frag (fragment): a name for a part of the resource
A section in a document
http://www.example.com/paper.html#results
Handled by the browser
The browser gets the whole resource (document) from the server
At display time, it jumps to the specified part
URL (cont’d)
URL is encoded by client before transmission
How: each byte is split into two 4-bit groups; the hexadecimal digits of the two groups are prefixed by %
Example: ~ → 126 (ASCII) → 0111 1110 → %7E
What is encoded & why?
Non-ASCII characters (e.g., Persian characters)
Reserved characters when they are not used in their special role
E.g., @, :, $, …
Unsafe characters, e.g., space, %, …
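The encoding rule above can be sketched in a few lines. This is an illustrative implementation, not a standard library function: the set of characters left unencoded (`SAFE`) is an assumption chosen so that `~` is encoded as in the slide's example, whereas RFC 3986 treats `~` as unreserved.

```python
HEX = "0123456789ABCDEF"
# Characters sent as-is. Assumption: "~" is deliberately left out so that the
# slide's example (~ -> %7E) is reproduced; RFC 3986 treats "~" as unreserved.
SAFE = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._")

def encode_byte(b: int) -> str:
    """Split one byte into two 4-bit groups and write each as a hex digit after %."""
    return "%" + HEX[b >> 4] + HEX[b & 0x0F]

def percent_encode(text: str) -> str:
    out = []
    for b in text.encode("utf-8"):        # a non-ASCII character becomes several bytes
        ch = chr(b)
        out.append(ch if ch in SAFE else encode_byte(b))
    return "".join(out)

print(percent_encode("~"))          # %7E
print(percent_encode("a b%"))       # a%20b%25
print(percent_encode("\u0633"))     # %D8%B3 (Persian letter "seen", U+0633)
```

In practice, Python's `urllib.parse.quote` performs this encoding; recent Python versions leave `~` unencoded, following RFC 3986.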
URL in Action
User asks the browser to retrieve a resource (e.g., webpage, image, pdf, …)
Enter the URL in the browser's address bar
Click on a link → the browser extracts the corresponding URL
Browser finds the IP address of <host> (DNS lookup)
Browser creates a TCP connection to the IP address and the <port>
Browser sends HTTP requests through the connection
The “Path” identifies the resource on the server
Browser gets the response and processes it
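These steps can be sketched with Python's standard `socket` module. This is a simplified illustration assuming plain HTTP (no HTTPS) and IPv4; the helper names `build_request` and `fetch` are hypothetical, and `fetch` needs network access, so the usage below only builds the request.

```python
import socket
from urllib.parse import urlsplit

def build_request(url: str):
    """Split the URL and build a minimal GET request (plain HTTP only)."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80                   # default HTTP port
    target = parts.path or "/"                # the "Path" identifies the resource
    if parts.query:
        target += "?" + parts.query           # the fragment is never sent
    host_header = host if port == 80 else f"{host}:{port}"
    request = (f"GET {target} HTTP/1.1\r\n"
               f"Host: {host_header}\r\n"
               "Connection: close\r\n"        # ask the server to close after replying
               "\r\n")
    return host, port, request.encode("ascii")

def fetch(url: str) -> bytes:
    """DNS lookup, TCP connection, send the request, read until the server closes."""
    host, port, request = build_request(url)
    ip = socket.gethostbyname(host)           # DNS lookup (IPv4 only in this sketch)
    with socket.create_connection((ip, port), timeout=10) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:                      # empty read: connection closed
                break
            chunks.append(data)
    return b"".join(chunks)

host, port, req = build_request("http://www.example.com/a/b.html?x=1#top")
print(host, port)   # www.example.com 80
print(req.decode())
```

Sending `Connection: close` lets the sketch detect the end of the response simply by reading until the server closes the connection.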
How does HTTP work? Transactions
HTTP data transfer is a collection of transactions
Each transaction is composed of 2 HTTP messages
Client → Server: HTTP request message
Server → Client: HTTP response message
Requests are identified by methods
Method: the action that the client asks the server to perform
Responses are identified by status codes
Status: the result of the requested action
HTTP Transaction (cont’d)
(Figure: the client sends a request and the server returns a response; this repeats for each transaction)
HTTP Transaction in Web
(Typically) each web page contains multiple resources
The main skeleton HTML page
Some linked materials: figures, videos, JS, CSS, …
How a browser displays a web page:
Get the HTML page (first transaction)
Try to display the page (rendering)
Get the other resources linked to the page (further transactions)
HTTP Transaction in Web (cont’d)
HTTP Transactions & TCP connections
1) Non-persistent
A new TCP connection per object
Networking overhead + connection establishment delay + resource intensive (especially on the server side)
Parallel connections speed up browsing
2) Persistent
Get multiple objects using a single TCP connection
No extra processing & networking overhead
Poor performance if requests are sent serially (one at a time)
Pipelined requests speed up browsing
Added in HTTP/1.1
HTTP Transaction in Web: Example
Get an HTML page from a server
Capture the packets
Investigate the transactions
HTTP Messages
HTTP is a text-based protocol
Human-readable headers
The header is made of several text lines:
Start line: specifies the type of message
Header: depends on message type
An Empty Line
Message body: Data/payload
HTTP Messages (cont’d)
Request message format
Method<sp>Path<sp>version<CRLF>
<Header field>:<value><CRLF>
…
<Header field>:<value><CRLF>
<CRLF>
<Entity body>
HTTP Messages (cont’d)
E.g. HTTP request message
GET /index.html HTTP/1.1
Host: www.aut.ac.ir
User-Agent: Mozilla/6.0
Accept-Language: en-us
Connection: keep-alive
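On the server side, a request in this format can be parsed with a short sketch like the following. It assumes a well-formed message with one header per line; a repeated header simply overwrites the earlier one, which real servers do not do. `parse_request` is a hypothetical helper name.

```python
def parse_request(raw: bytes):
    """Split a raw HTTP request into (method, path, version, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")         # the empty line ends the header
    lines = head.decode("iso-8859-1").split("\r\n")
    method, path, version = lines[0].split(" ", 2)     # start line: Method SP Path SP Version
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")           # split at the first ":" only
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return method, path, version, headers, body

raw = (b"GET /index.html HTTP/1.1\r\n"
       b"Host: www.aut.ac.ir\r\n"
       b"User-Agent: Mozilla/6.0\r\n"
       b"Accept-Language: en-us\r\n"
       b"Connection: keep-alive\r\n"
       b"\r\n")
method, path, version, headers, body = parse_request(raw)
print(method, path, version)   # GET /index.html HTTP/1.1
print(headers["host"])         # www.aut.ac.ir
```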
HTTP Messages (cont’d)
Response message format
Version<sp>code<sp>Reason<CRLF>
<Header field>:<value><CRLF>
…
<Header field>:<value><CRLF>
<CRLF>
<Entity body>
HTTP Messages (cont’d)
E.g. HTTP response message
HTTP/1.1 200 OK
Date: Sun, 02 Oct 2011 20:30:40 GMT
Server: Apache/2.2.2
Last-Modified: Sun, 03 May 2009 10:20:22 GMT
Connection: keep-alive
Content-Length: 3000

data data data …
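When a persistent (keep-alive) connection carries several responses, the client relies on `Content-Length` to find where one body ends and the next response begins. The sketch below, with the hypothetical helper `read_responses`, assumes every response carries `Content-Length` (it treats a missing one as an empty body); real HTTP/1.1 also allows chunked transfer encoding, which is not handled here.

```python
def read_responses(stream: bytes):
    """Split back-to-back HTTP responses using Content-Length framing."""
    responses = []
    while stream:
        head, sep, rest = stream.partition(b"\r\n\r\n")  # the empty line ends the header
        if not sep:
            break                                         # incomplete header: stop
        lines = head.decode("iso-8859-1").split("\r\n")
        _, code, reason = lines[0].split(" ", 2)          # Version SP Code SP Reason
        headers = {}
        for line in lines[1:]:
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
        length = int(headers.get("content-length", "0"))
        responses.append((int(code), reason, headers, rest[:length]))
        stream = rest[length:]                            # the next response starts here
    return responses

stream = (b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: 5\r\n\r\nhello"
          b"HTTP/1.1 404 Not Found\r\nContent-Length: 4\r\n\r\nnope")
for code, reason, headers, body in read_responses(stream):
    print(code, reason, body)
# 200 OK b'hello'
# 404 Not Found b'nope'
```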
HTTP Methods
Methods are actions that the client asks the server to perform on the specified resource (given by the path parameter)
Which actions?
Basic data communication operations
Safe operations
Get a resource from server
Send data to server
Unsafe operations
Delete a resource on server
Create/Replace a resource on server
Debugging and troubleshooting
Get information about a resource
Check what server got from client
Get the list of operations that can be applied to a resource
HTTP Methods (cont’d)
GET (must be implemented by server): Retrieve a resource from the server
HEAD (must be implemented by server): Similar to GET, but only the HTTP response header is returned, not the resource itself
Useful for debugging and some other applications
POST: Submit data to be processed by the specified resource
The data itself is carried in the message body
HTTP Methods (cont’d)
DELETE: Remove the resource!!!
Not popular on the web, but can be used in other applications
PUT: Store the message body as the specified resource (a file with the given path) on the server!!!
TRACE: Server echoes back the received message
For troubleshooting & debugging
OPTIONS: Request the list of methods the server supports on the resource
HTTP Responses
The message carrying the result/response of the requested action
Which responses?
Basic responses
Success
Failure
Bad client request
Server problem
Others
E.g., Redirection to other resources
HTTP Responses (cont’d)
2xx: Successful operation
200: OK
201: Created
4xx: Client error
400: Bad request
401: Unauthorized (Authorization required)
403: Forbidden
404: Not found
405: Method not allowed
HTTP Responses (cont’d)
5xx: Server error
500: Internal server error
501: Not implemented
503: Service unavailable
3xx: Redirection
301: Moved Permanently & 307: Temporary Redirect
Resource has been moved → redirection
Location header → the new location of the resource
304: Not Modified
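A small helper can map a code to its class and standard reason phrase using Python's `http.HTTPStatus` enumeration; `describe` and the `CLASSES` table are hypothetical names, and 1xx (informational) codes are included only for completeness.

```python
from http import HTTPStatus

CLASSES = {1: "Informational", 2: "Success", 3: "Redirection",
           4: "Client error", 5: "Server error"}

def describe(code: int) -> str:
    """Return '<code> <reason phrase> [<class>]' for a status code."""
    status_class = CLASSES.get(code // 100, "Unknown class")  # the first digit gives the class
    try:
        phrase = HTTPStatus(code).phrase                      # standard reason phrase
    except ValueError:
        phrase = "(unregistered code)"
    return f"{code} {phrase} [{status_class}]"

for code in (200, 201, 301, 304, 307, 400, 401, 403, 404, 405, 500, 501, 503):
    print(describe(code))
```

Because clients look at the first digit, an unknown code such as 299 can still be treated as a success.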
HTTP Messages Examples
Connect to a web server
telnet can create a TCP connection
Play with the server by sending HTTP methods and checking the responses
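The same experiment can be scripted without telnet. This sketch, assuming a local test server is acceptable in place of a real website, starts a tiny server on 127.0.0.1 (the handler class `Hello` is hypothetical) and then sends raw request lines over a TCP socket, exactly as one would type them in telnet.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"hello"

class Hello(BaseHTTPRequestHandler):
    """Hypothetical test handler: answers GET and HEAD, nothing else."""
    def do_HEAD(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()                    # HEAD: headers only, no body

    def do_GET(self):
        self.do_HEAD()                        # same headers as HEAD
        self.wfile.write(BODY)                # plus the body

    def log_message(self, *args):             # silence the default request log
        pass

def raw_exchange(port: int, request: bytes) -> bytes:
    """What telnet does: open a TCP connection, type the request, read the reply."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as sock:
        sock.sendall(request)
        chunks = []
        while data := sock.recv(4096):        # empty read: server closed the connection
            chunks.append(data)
    return b"".join(chunks)

server = HTTPServer(("127.0.0.1", 0), Hello)  # port 0: let the OS choose a free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

get_reply = raw_exchange(port, b"GET / HTTP/1.0\r\n\r\n")
head_reply = raw_exchange(port, b"HEAD / HTTP/1.0\r\n\r\n")
delete_reply = raw_exchange(port, b"DELETE / HTTP/1.0\r\n\r\n")  # no do_DELETE: 501
server.shutdown()
server.server_close()

print(get_reply.decode())
print(delete_reply.split(b"\r\n")[0].decode())   # status line of the error reply
```

HTTP/1.0 requests are used so that the server closes the connection after each reply, which is how `raw_exchange` knows the response is complete.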
HTTP Headers
Headers are additional information sent by the client to the server and vice versa
Most (almost all) are optional
Which headers?
Information about client
Information about server
Information about the requested resource
Information about the response
Security/Authentication
HTTP Headers
General headers
Appear both on request & response messages
Request headers
Information about request
Response headers
Information about response
Entity headers
Information about body (size, …)
Extension headers
New headers (not standard)
General Headers
Date: date & time that message is created
Connection: close or keep-alive
Close: Non-persistent connection
Keep-alive: Persistent connection
Via: Information about the intermediate nodes between the two sides
E.g., proxy servers
Request Headers
Host: The name of the server (required, why?)
Referer (sic, the standard misspells “referrer”): URL of the page that contains the requested URL
Information about client
User-Agent: The client program
UA-OS: The OS of client program
UA-Disp: Information about display of client
Accept: The acceptable media types
Accept-Encoding: Acceptable encoding
Accept-Language: Acceptable languages
Request Headers (cont’d)
Range: A specific range (in bytes) of the resource
Authorization: Response to the authentication challenge
Will be discussed when studying Authentication
If-Modified-Since: The request is processed only if the object has been modified since the specified time
Used in Web Caching
When the client has a copy of the object and wants to check its freshness
Response Headers
Server: Information about server
WWW-Authenticate: Used by the server to specify authentication parameters
Will be discussed when studying Authentication
Proxy-Authenticate: Used by the proxy to specify authentication parameters
Will be discussed when studying Authentication
Set-Cookie: To send a cookie to client
Will be discussed when studying Cookies
Entity Headers
Content-Length: The length of the body (in bytes)
Content-Type: The type of the entity
MIME types: text/html, image/gif
Allow: The request methods allowed on the entity
Sent in response to an OPTIONS request
Location: The new location of the entity, to redirect the client
Entity Headers (cont’d)
Content-Range: Range of this entity within the entire resource
Expires: The date and time at which the entity expires
Last-Modified: The date and time of the last modification of the entity
Stateless Problem
HTTP is a stateless protocol
Server does not remember its clients
How to personalize pages (personal portal)?
Use HTTP headers: Client-ip, From, …
Usually not sent by browsers
Find the client IP address from the TCP connection
The problem is NAT (many clients can share one IP address)
Solution of Stateless Problem: Cookie
Cookies: information (e.g., unique identifiers) sent by the server to the user (browser), which is returned back to the server
How it works
Server asks client to remember the information
Set-Cookie header in response message
Client gives back the information to server in every request
Cookie header in request messages
Server customizes responses according to the cookie
Types
Session cookies: To identify a session
Persistent cookies: To identify a client (browser)
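The cookie round trip above can be sketched with Python's standard `http.cookies` module playing both roles; a real browser keeps its own cookie store, and the cookie name `sid` and its value are hypothetical.

```python
from http.cookies import SimpleCookie

# 1) Server: put a Set-Cookie header in the response
server_cookie = SimpleCookie()
server_cookie["sid"] = "abc123"
set_cookie_header = server_cookie["sid"].OutputString()   # value of the Set-Cookie header
print("Set-Cookie:", set_cookie_header)                   # Set-Cookie: sid=abc123

# 2) Browser: remember the cookie from the response
jar = SimpleCookie()
jar.load(set_cookie_header)

# 3) Browser: return it in the Cookie header of every later request
cookie_header = "; ".join(f"{name}={m.value}" for name, m in jar.items())
print("Cookie:", cookie_header)                           # Cookie: sid=abc123

# 4) Server: read the Cookie header and customize the response
incoming = SimpleCookie(cookie_header)
print(incoming["sid"].value)                              # abc123
```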
Cookies (cont’d)
(Figure: cookie exchange between browser and server, steps 1 to 3)
Cookies (cont’d)
Limitation (cannot be used to store large data)
Typically x1 total cookies, x2 cookies per domain, x3 data per cookie
Cookies are text files
They cannot spread viruses
The server never sends a request to read cookies
The browser sends cookies automatically
The browser checks the URL and selects the matching cookies
Cookies (cont’d)
Client can control cookies
Disable cookies: no cookie is saved & used
View & Delete cookies
Server can control cookies by its attributes
Expiration time
Domain
Path
Security
Cookies Attributes
Expires & Max-Age: the lifetime of the cookie
Expires: an absolute time at which to delete the cookie
Max-Age: the maximum lifetime (in seconds) of the cookie
If present → persistent cookie
Send a past time (or a zero/negative Max-Age) to delete a cookie
Secure: the cookie is sent only if the channel is secure
Especially useful for login-session cookies
HttpOnly: the cookie is used only in HTTP(S) requests
JavaScript cannot access the cookie
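These attributes can be attached to a cookie with Python's `http.cookies.SimpleCookie`; this is a sketch with a hypothetical cookie `sid`, and the attribute order in the printed header may differ between Python versions.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["sid"] = "abc123"              # hypothetical session cookie
cookie["sid"]["max-age"] = 3600       # persistent: kept for one hour
cookie["sid"]["path"] = "/"
cookie["sid"]["secure"] = True        # sent only over a secure (HTTPS) channel
cookie["sid"]["httponly"] = True      # not visible to JavaScript
print(cookie.output())
# e.g. Set-Cookie: sid=abc123; HttpOnly; Max-Age=3600; Path=/; Secure

# Deleting: send the same cookie again with a zero lifetime
expired = SimpleCookie()
expired["sid"] = "deleted"
expired["sid"]["max-age"] = 0
expired["sid"]["path"] = "/"
print(expired.output())
```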
Cookies Attributes: Domain & Path
Domain & Path determine the scope of the cookie
For which domain and path the cookie is saved & returned back by the browser
If these attributes are absent → the browser assumes the current host & current path
The browser returns the cookie for that host & path and also for all sub-paths
If present → the browser checks their validity
If they are valid → the browser returns the cookie for that domain & path and also for all subdomains and sub-paths
Cookies Attributes: Domain & Path (cont’d)
Validity check by major browsers
Domain names must start with a dot
Some browsers accept names without a dot as the domain
Don’t accept cookies for domains other than the base domain
Don’t accept cookies for sub-domains
Accept cookies for higher domains
Except top-level (public-suffix) domains, e.g., .com, .ac.ir
Accept cookies for other (sub or higher) paths
Cookies Attributes: Domain & Path Examples
A PHP script sets some cookies
The script can be accessed through different domain names
Check which cookies are accepted
Check which cookies are sent back
In summary, cookies are filtered twice
Which cookies must be accepted
Which cookies are sent back
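The two filters can be sketched as follows. This illustration follows the simpler RFC 6265 rules (a leading dot in the domain is ignored) rather than the older browser behavior described above, and the tiny `PUBLIC_SUFFIXES` set is a hypothetical stand-in for the real public-suffix list browsers use to block cookies for domains such as `.com` or `.ac.ir`.

```python
PUBLIC_SUFFIXES = {"com", "org", "ir", "ac.ir"}   # hypothetical, tiny stand-in list

def domain_match(host: str, domain: str) -> bool:
    """True if host equals domain or is a subdomain of it."""
    host, domain = host.lower(), domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)

def path_match(request_path: str, cookie_path: str) -> bool:
    """True if request_path equals cookie_path or lies below it."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # "/docs" covers "/docs/a.html" but not "/docsx"
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False

def accept(host: str, cookie_domain: str) -> bool:
    """Filter 1: may this host set a cookie for cookie_domain?"""
    domain = cookie_domain.lower().lstrip(".")
    if domain in PUBLIC_SUFFIXES:
        return False                    # e.g. a cookie for all of .ac.ir
    return domain_match(host, domain)   # own or higher domain, never a sub-domain

def send_back(host: str, path: str, cookie_domain: str, cookie_path: str) -> bool:
    """Filter 2: should this stored cookie go with a request to host + path?"""
    return domain_match(host, cookie_domain) and path_match(path, cookie_path)

print(accept("www.aut.ac.ir", ".aut.ac.ir"))        # True: higher domain
print(accept("www.aut.ac.ir", "ce.www.aut.ac.ir"))  # False: sub-domain
print(accept("www.aut.ac.ir", ".ac.ir"))            # False: public suffix
print(send_back("ce.aut.ac.ir", "/docs/a.html", ".aut.ac.ir", "/docs"))  # True
```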
Proxy
Proxies sit between client and server
Act as server for client
Act as client for server
(Figure: Client ↔ Proxy ↔ Server; the proxy processes each request, acting as a server toward the client and as a client toward the server)
HTTP Proxy Applications
Authentication
Client side: Authenticate clients before they access web
Server side: Authenticate clients before they access the server
Accounting: Log client activities
Security: Analyze request before sending it to server
Integrated in modern firewalls
Filtering: Limit access to specified contents
Anonymizer: Anonymous web browsing
Caching
Caching: save a copy of a resource and use it instead of requesting it from the server
Browsers have their own local caches
A cache server is a special proxy for caching
Benefits
Reduce redundant data transfer
Reduce network bottleneck
Reduce load on server
Caching Algorithm
If the object is not cached, it is fetched from the server, saved in the cache, and sent to the client
Else, if the object is in the cache
The cache server must return only fresh objects
Freshness check
Object lifetime is specified by the server
Expires header: absolute expiration time
Cache-Control: max-age: relative expiration time
If the requested object has not expired
The cache server gives it to the client
Caching Algorithm (cont’d)
If requested object is expired
Its freshness must be checked
Freshness is checked by conditional request
If-Modified-Since: the Last-Modified time of the cached copy
Server responses:
304 Not Modified response + new expiration time
Cached copy is valid until the specified time
200 OK
Server provides a new version of the object
Cache server updates cached copy
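The caching algorithm can be sketched as three small functions. This is a simplified model, not a full HTTP cache: times are plain numbers of seconds, the names `Entry`, `expiry`, `lookup`, and `on_revalidated` are hypothetical, and an object with no lifetime information is treated as immediately stale (real caches may apply heuristics instead).

```python
from dataclasses import dataclass
from email.utils import parsedate_to_datetime

@dataclass
class Entry:
    body: bytes
    last_modified: str   # Last-Modified value, echoed back in If-Modified-Since
    expires_at: float    # absolute expiry time, seconds since the epoch

def expiry(headers: dict, now: float) -> float:
    """Lifetime set by the server: Cache-Control max-age (relative) wins over Expires (absolute)."""
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return now + int(value)
    if "Expires" in headers:
        return parsedate_to_datetime(headers["Expires"]).timestamp()
    return now   # assumption: no lifetime given, so treat as already stale

def lookup(cache: dict, url: str, now: float):
    """Decide what the cache server does for a request."""
    entry = cache.get(url)
    if entry is None:
        return "miss", {}                    # fetch from server, store, forward
    if now < entry.expires_at:
        return "fresh", {}                   # serve the cached copy directly
    return "revalidate", {"If-Modified-Since": entry.last_modified}

def on_revalidated(cache: dict, url: str, status: int, headers: dict,
                   body: bytes, now: float) -> bytes:
    """Apply the origin server's answer to the conditional request."""
    if status == 304:                        # still valid: only renew the lifetime
        cache[url].expires_at = expiry(headers, now)
    elif status == 200:                      # new version: replace the cached copy
        cache[url] = Entry(body, headers.get("Last-Modified", ""), expiry(headers, now))
    return cache[url].body
```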
HTTP Authentication
Not all resources on the web are public; e.g.,
Financial documents, customer information, …
HTTP has two (similar) authentication schemes
Basic: Base64-encoded “user:pass”
Digest: plain username + digest of the password
The steps are the same
Client requests the resource, server challenges, client asks the user for user/pass, client responds, server authenticates and allows access
Authentication information is sent with every request until the end of the current session (why?!)
HTTP Authentication (cont’d)
Basic authentication
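1) Client → Server: request for the resource
2) Server → Client: 401 Unauthorized + WWW-Authenticate: Basic realm="…"
3) Client → Server: request + Authorization: Basic <base64 of user:pass>
4) Server → Client: 200 OK + the resource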
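Building the Basic `Authorization` header takes one line with Python's `base64` module. The credentials below are the well-known example from the HTTP authentication RFCs (RFC 2617 and RFC 7617), and `basic_authorization` is a hypothetical helper name; decoding the header shows why Basic authentication alone is insecure.

```python
import base64

def basic_authorization(user: str, password: str) -> str:
    """Value of the Authorization header for HTTP Basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token

header = basic_authorization("Aladdin", "open sesame")
print(header)   # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==

# Anyone who sees the header can reverse the encoding
print(base64.b64decode(header.split(" ", 1)[1]).decode())   # Aladdin:open sesame
```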
Digest Authentication
Basic authentication is insecure
Password is sent in base64 encoding
An attacker can easily decode it
Digest authentication: don’t send the password
Send its digest (hash)
Digest/hash function
One-way function, irreversible
Attacker cannot find the password
But! Replay attack
Attacker resends the same digest → he is authenticated
Digest Authentication
1) Client requests a private resource
2) Server creates a nonce
WWW-Authenticate: Digest nonce=39X9s#! …
3) Client computes digest of password and nonce
Authorization: username, hash(pass, nonce)
4) Server looks up the password of username and computes hash(pass, nonce)
If this value matches the one in Authorization → OK
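The four steps above can be sketched in Python. The slide's `hash(pass, nonce)` is made concrete here with the RFC 2617 formula without the optional `qop` extension (MD5 over `user:realm:password`, the nonce, and `method:uri`); the user, password, realm, and helper names are hypothetical. Making each nonce single-use is one simple defense against the replay attack mentioned earlier (RFC 2617 itself offers nonce counts for this), and MD5 is used only because the protocol specifies it.

```python
import hashlib
import hmac
import secrets

def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def digest_response(user, realm, password, method, uri, nonce):
    """Value sent in the Authorization header (RFC 2617 form without qop)."""
    ha1 = md5_hex(f"{user}:{realm}:{password}")   # depends on the secret password
    ha2 = md5_hex(f"{method}:{uri}")              # depends on the requested resource
    return md5_hex(f"{ha1}:{nonce}:{ha2}")        # binds both to the server's nonce

# Hypothetical server-side state
PASSWORDS = {"alice": "s3cret"}
REALM = "private area"
issued_nonces = set()

def challenge() -> str:
    """Step 2: create a fresh nonce for the WWW-Authenticate header."""
    nonce = secrets.token_hex(16)
    issued_nonces.add(nonce)
    return nonce

def verify(user, method, uri, nonce, response) -> bool:
    """Step 4: recompute the digest from the stored password and compare."""
    if nonce not in issued_nonces or user not in PASSWORDS:
        return False
    issued_nonces.discard(nonce)        # single-use nonce: a replayed digest fails
    expected = digest_response(user, REALM, PASSWORDS[user], method, uri, nonce)
    return hmac.compare_digest(expected, response)

nonce = challenge()                                                           # step 2 (server)
resp = digest_response("alice", REALM, "s3cret", "GET", "/private/", nonce)  # step 3 (client)
print(verify("alice", "GET", "/private/", nonce, resp))  # True
print(verify("alice", "GET", "/private/", nonce, resp))  # False (replayed)
```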
Real Example
Basic & Digest Authentication using the .htaccess mechanism in Apache server
If .htaccess is located in a directory, it protects that directory and all its subdirectories
Security
Digest authentication protects only the password
Data is sent completely unprotected
No mechanism in HTTP to protect data
HTTP over SSL/TLS (HTTPS) is the popular solution
An encrypted tunnel between client & server
Answers
Q1.1) What is the common protocol?
HTTP, Message = Header + Body
Q1.2) How are resources identified?
URL
Q1.3) What are requests & responses?
Method (GET, HEAD, …)
Status (2xx, 3xx, 4xx, 5xx)
Q1.4) Can/Should server know its clients?
Yes, using cookies
Q1.5) Who can influence the communication between server & client?
Proxy servers, e.g., cache servers
Q1.6) Is everything on the server public?
No, HTTP basic/digest authentication
What is the Next?! HTTP/2
The second major version of HTTP
Based on Google’s SPDY
Published as RFC 7540 in May 2015
~33% of all websites support HTTP/2 (Feb. 2019)
Main features
Binary, instead of textual
Multiplexes parallel requests over one connection
Header compression
Allows servers to “push” responses proactively