Module 1 Part 2

The document discusses the Application Layer of the TCP/IP protocol suite, highlighting its role in providing services to users and its distinction as the highest layer. It explores two paradigms: the traditional client-server model, which relies on a continuous server process, and the peer-to-peer model, where responsibilities are shared among peers. Additionally, it covers the use of APIs, socket communication, and various transport layer protocols (UDP, TCP, SCTP) that facilitate data exchange in network applications.

Uploaded by

karthikpm0412

Module 1-part 2

Application Layer
• The fifth layer of the TCP/IP protocol suite
• The application layer provides services to the user.
• The communication at the application layer is logical, not physical.
• The application layer differs from the other layers in that it is the highest
layer in the suite.
• The protocols in this layer do not provide services to any other protocol in
the suite; they only receive services from the protocols in the transport
layer.
• This means that protocols can be added or removed from this layer easily.
• Application-layer protocols can be either standard or nonstandard.
Application-Layer Paradigms
• client-server paradigm
• peer-to-peer paradigm
Traditional Paradigm: Client-Server
• In this paradigm, the service provider is an application program,
called the server process;
• It runs continuously, waiting for another application program, called
the client process, to make a connection through the Internet and ask
for service.
• There are normally some server processes that can provide a specific
type of service, but there are many clients that request service from
any of these server processes.
• The server process must be running all the time; the client process is
started when the client needs to receive service.
Example of a client-server paradigm
• One problem with this paradigm is that the communication load is
concentrated on the server, which means the server should be a powerful
computer.
• Even a powerful computer may become overwhelmed if a large
number of clients try to connect to the server at the same time.
• Another problem is that there should be a service provider willing to
accept the cost and create a powerful server for a specific service,
which means the service must always return some type of income for
the provider in order to encourage such an arrangement.
• The World Wide Web (WWW) with HyperText Transfer Protocol (HTTP), File
Transfer Protocol (FTP), Secure Shell (SSH), e-mail, and so on use this
paradigm.
New Paradigm: Peer-to-Peer
• P2P paradigm
• In this paradigm, there is no need for a server process to be running
all the time and waiting for the client processes to connect.
• The responsibility is shared between peers.
• A computer connected to the Internet can provide service at one time
and receive service at another time.
• A computer can even provide and receive services at the same time.
• Internet telephony and file sharing are peer-to-peer activities.
Example of a peer-to-peer paradigm
• The paradigm is easily scalable and cost-effective because it eliminates
the need for expensive servers.
• The main challenge has been security; it is more difficult to create
secure communication between distributed services.
• The other challenge is applicability; it appears that not all
applications can use this new paradigm.
• BitTorrent, Skype, IPTV, and Internet telephony use this paradigm.
Mixed Paradigm
• A mixed paradigm combines the advantages of both.
• For example, a light-load client-server communication can be used to
find the address of the peer that can offer a service.
• When the address of the peer is found, the actual service can be
received from the peer by using the peer-to-peer paradigm.
CLIENT-SERVER PARADIGM working
• In a client-server paradigm, communication at the application layer is
between two running application programs called processes: a client
and a server.
• A client is a running program that initiates the communication by
sending a request; a server is another application program that waits
for a request from a client.
• The server handles the request received from a client, prepares a
result, and sends the result back to the client.
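• The lifetime of a server is infinite: it should be started and run
forever, waiting for clients.
• The lifetime of a client is finite: it normally sends a finite number of
requests to the server and then stops.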
How can a client process communicate with a server process?
• We need a new set of instructions to tell the lowest four layers of the
TCP/IP suite to open the connection, send and receive data from the
other end, and close the connection.
• A set of instructions of this kind is normally referred to as an
Application Programming Interface (API).
• A computer manufacturer needs to build the first four layers of the
suite in the operating system and include an API.
• The operating system encapsulates the first four layers.
• The socket interface, the Transport Layer Interface (TLI), and STREAMS
are some common APIs.
Socket interface
• It is an API (Application Programming Interface)
• Socket interface started in the early 1980s at UC Berkeley as part of a
UNIX environment.
• The socket interface is a set of instructions that provide
communication between the application layer and the operating
system.
• It is a set of instructions that can be used by a process to
communicate with another process.
• The idea of sockets allows us to reuse the instructions that a
programming language already provides for other sources and sinks, such
as files.
Sockets
• A socket behaves like a terminal or a file, but it is not a physical entity
like them; it is an abstraction.
• It is a data structure that is created and used by the application
program.
Position of the socket interface
• Communication between a client process and server process is
communication between two sockets, created at two ends.
• The client thinks that the socket is the entity that receives the request
and gives the response;
• the server thinks that the socket is the one that has a request and
needs the response.
• If we create two sockets, one at each end, and define the source and
destination addresses correctly, we can use the available instructions
to send and receive data.
• The rest is the responsibility of the operating system and the
embedded TCP/IP protocol.
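The exchange described above can be sketched with Python's standard socket module. This is only a minimal illustration, not a production server: it uses a stream (TCP) socket on the loopback address, the server answers a single request and then stops (a real server would loop forever), and the "service" of upper-casing the request is an assumption made for the example.

```python
import socket
import threading

def serve_once(server_sock: socket.socket) -> None:
    """Accept one client, read its request, and send back a result."""
    conn, _client_addr = server_sock.accept()
    with conn:
        request = conn.recv(1024)        # wait for the client's request
        conn.sendall(request.upper())    # prepare a result and return it

# Server side: create a socket, give it a local address, and wait.
server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_sock.bind(("127.0.0.1", 0))       # port 0: let the OS choose a free port
server_sock.listen()
server_addr = server_sock.getsockname()
worker = threading.Thread(target=serve_once, args=(server_sock,))
worker.start()

# Client side: create a socket, connect to the server's socket address,
# send a request, and read the response.
with socket.create_connection(server_addr, timeout=5) as client_sock:
    client_sock.sendall(b"hello")
    reply = client_sock.recv(1024)

worker.join()
server_sock.close()
print(reply)
```

Both programs only create sockets and call send and receive instructions; opening the connection and moving the bytes is left to the operating system.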
Socket Addresses
• The interaction between a client and a server is two-way
communication.
• In a two-way communication, we need a pair of addresses: local
(sender) and remote (receiver).
• The local address in one direction is the remote address in the other
direction and vice versa.
• Since communication in the client-server paradigm is between two
sockets, we need a pair of socket addresses for communication: a
local socket address and a remote socket address.
• A socket address should first define the computer on which a client or
a server is running.
• A computer in the Internet is uniquely defined by its IP address, a
32-bit integer in IPv4.
• However, several client or server processes may be running at the
same time on a computer, which means that we need another
identifier to define the specific client or server involved in the
communication.
• An application program can be defined by a port number, a 16-bit
integer.
• This means that a socket address should be a combination of an IP
address and a port number.
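As a small illustration of the sizes involved, the sketch below packs an IPv4 address and a port number into 6 bytes (32 bits plus 16 bits). The function names and the sample address 192.0.2.10 are assumptions chosen for the example; real socket APIs hide this layout behind their own address structures.

```python
import ipaddress
import struct

def pack_socket_address(ip: str, port: int) -> bytes:
    """Pack an IPv4 address (32 bits) and a port (16 bits) into 6 bytes."""
    ip_as_int = int(ipaddress.IPv4Address(ip))    # the 32-bit integer
    return struct.pack("!IH", ip_as_int, port)    # network byte order, no padding

def unpack_socket_address(raw: bytes) -> tuple[str, int]:
    """Recover the dotted-decimal IP address and the port number."""
    ip_as_int, port = struct.unpack("!IH", raw)
    return str(ipaddress.IPv4Address(ip_as_int)), port

raw = pack_socket_address("192.0.2.10", 80)
print(len(raw) * 8)                  # total number of bits in the pair
print(unpack_socket_address(raw))
```

In Python's own socket calls the same pair is simply written as a tuple such as ("192.0.2.10", 80).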
Finding Socket Addresses
• The situation is different for each site.
• The server needs a local (server) and a remote (client) socket address for
communication.
• The client also needs a local (client) and a remote (server) socket address
for communication.
• The local (server) socket address is provided by the operating system.
• The remote socket address for a server is the socket address of the client
that makes the connection.
• The local (client) socket address is also provided by the operating system.
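• The remote (server) socket address must be found by the client: the
server's port number is usually well known, and DNS maps the server name
to the IP address of the computer running that server.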
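A rough sketch of how a program obtains both addresses in Python. Binding to port 0 asks the operating system to pick a temporary port; getaddrinfo is the standard resolver call that consults DNS for real host names. The name "localhost" and port 80 are placeholders so that the example needs no network access.

```python
import socket

# Local socket address: bind to port 0 and let the operating system choose.
local = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
local.bind(("127.0.0.1", 0))
local_ip, local_port = local.getsockname()
local.close()

# Remote socket address: resolve the server name, pair it with the known port.
# "localhost" and 80 are placeholder values for this sketch.
candidates = socket.getaddrinfo(
    "localhost", 80, family=socket.AF_INET, type=socket.SOCK_STREAM
)
remote_ip, remote_port = candidates[0][4]

print((local_ip, local_port), (remote_ip, remote_port))
```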
Using Services of the Transport Layer
• Since there is no physical communication at the application layer,
application processes need to use the services provided by the transport
layer.
• There are three common transport-layer protocols in the TCP/IP suite:
UDP, TCP, and SCTP.
• Most standard applications have been designed to use the services of
one of these protocols.
• The choice of the transport layer protocol seriously affects the
capability of the application processes.
UDP Protocol
• User Datagram Protocol provides connectionless, unreliable,
datagram service.
• Connectionless service means that there is no logical connection
between the two ends exchanging messages.
• Each message is an independent entity encapsulated in a packet
called a datagram.
• UDP does not see any relation (connection) between consecutive
datagrams coming from the same source and going to the same
destination.
• UDP is not a reliable protocol.
• Although it may check that the data is not corrupted during
transmission, it does not ask the sender to resend a corrupted or lost
datagram.
• For some applications, UDP has an advantage: it is message-oriented.
• It gives boundaries to the messages exchanged.
• An application program may be designed to use UDP if it sends small
messages and simplicity and speed are more important to it than
reliability.
• Examples include some management and multimedia applications.
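The message-oriented behaviour can be seen in a short loopback sketch. No connection is set up, and each receive call returns exactly one datagram. The message contents are made up for the example. UDP itself does not guarantee delivery or ordering; on the loopback interface both datagrams normally arrive.

```python
import socket

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))      # the OS picks a free port
receiver.settimeout(5)               # do not wait forever if a datagram is lost
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# No handshake: each sendto() is an independent datagram.
sender.sendto(b"first message", receiver.getsockname())
sender.sendto(b"second", receiver.getsockname())

# Each recvfrom() returns one whole datagram, so boundaries are preserved.
first, _ = receiver.recvfrom(2048)
second, _ = receiver.recvfrom(2048)

sender.close()
receiver.close()
print(first, second)
```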
TCP Protocol
• Transmission Control Protocol provides connection-oriented, reliable,
byte-stream service.
• TCP requires that two ends first create a logical connection between
themselves by exchanging some connection-establishment packets.
• This phase is sometimes called handshaking.
• Handshaking establishes some parameters between the two ends, including
the size of the data packets to be exchanged, the size of the buffers used
for holding the chunks of data until the whole message arrives, and so
on.
• After the handshaking process, the two ends can send chunks of data
in segments in each direction.
• By numbering the bytes exchanged, the continuity of the bytes can be
checked.
• For example, if some bytes are lost or corrupted, the receiver can
request the resending of those bytes, which makes TCP a reliable
protocol.
• TCP can also provide flow control and congestion control.
• One problem with the TCP protocol is that it is not message-oriented;
it does not put boundaries on the messages exchanged.
• Most of the standard applications that need to send long messages
and require reliability may benefit from the services of TCP.
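Because TCP delivers a byte stream without message boundaries, an application that needs messages has to mark them itself. One common approach, shown here only as a sketch, is to prefix each message with its length. The 4-byte length prefix and the helper names are assumptions for this example, not part of TCP.

```python
import socket
import struct

def send_message(sock: socket.socket, payload: bytes) -> None:
    """Prefix the payload with its length as a 4-byte big-endian integer."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_exactly(sock: socket.socket, count: int) -> bytes:
    """Keep reading until exactly `count` bytes have arrived."""
    received = bytearray()
    while len(received) < count:
        chunk = sock.recv(count - len(received))
        if not chunk:
            raise ConnectionError("connection closed in the middle of a message")
        received += chunk
    return bytes(received)

def recv_message(sock: socket.socket) -> bytes:
    """Read the length prefix, then read that many payload bytes."""
    (length,) = struct.unpack("!I", recv_exactly(sock, 4))
    return recv_exactly(sock, length)

# Loopback demo: connecting performs the handshake before any data is sent.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen()
client = socket.create_connection(listener.getsockname(), timeout=5)
server_side, _ = listener.accept()
server_side.settimeout(5)

send_message(client, b"first message")
send_message(client, b"second")
messages = [recv_message(server_side), recv_message(server_side)]

for s in (client, server_side, listener):
    s.close()
print(messages)
```

Without the length prefix, a single receive call could return part of one message or both messages joined together.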
SCTP Protocol
• Stream Control Transmission Protocol provides a service which is a
combination of TCP and UDP.
• Like TCP, SCTP provides a connection-oriented, reliable service, but it
is not byte-stream oriented.
• It is a message-oriented protocol like UDP.
• In addition, SCTP can provide multistream service by providing
multiple network-layer connections.
• SCTP is normally suitable for any application that needs reliability and
at the same time needs to remain connected, even if a failure occurs
in one network-layer connection.
STANDARD CLIENT-SERVER APPLICATIONS
• World Wide Web
• HTTP
• File transfer and electronic mail applications
• TELNET and SSH protocols.
• DNS
• Dynamic Host Configuration Protocol (DHCP)
• Simple Network Management Protocol (SNMP)
World Wide Web
• The idea of the Web was first proposed by Tim Berners-Lee in 1989
• The commercial Web started in the early 1990s.
• The Web is a repository of information in which the documents are called
Web pages.
• They are distributed all over the world and related documents are linked
together.
• Each web server in the world can add a new web page to the repository.
• Linking, achieved using a concept called hypertext, allows one web page
to refer to another web page stored on another server.
• Hypertext allows the linked document to be retrieved when the user clicks
the link.
• With hypermedia, the linked document can be a text document, an image,
an audio file, or a video file.
• The purpose of the Web has gone beyond the simple retrieving of
linked documents.
• Today, the Web is used to provide electronic shopping and gaming.
• One can use the Web to listen to radio programs or view television
programs.
Architecture of web
• The WWW is a distributed client-server service, in which a client using
a browser can access a service using a server.
• However, the service provided is distributed over many locations
called sites.
• Each site holds one or more documents, referred to as web pages.
Each web page, however, can contain some links to other web pages
in the same or other sites.
• A web page can be simple or composite. A simple web page has no
links to other web pages; a composite web page has one or more
links to other web pages.
• Each web page is a file with a name and address.
Web Client (Browser)
• Browsers interpret and display a web page.
• Each browser usually consists of three parts: a controller, client
protocols, and interpreters
• The controller receives input from the keyboard or the mouse and
uses the client programs to access the document.
• After the document has been accessed, the controller uses one of the
interpreters to display the document on the screen.
• The client protocol can be HTTP or FTP.
• The interpreter can be HTML, Java, or JavaScript, depending on the
type of document.
• Some commercial browsers include Internet Explorer, Netscape
Navigator, and Firefox.
Web Server
• The web page is stored at the server. Each time a request arrives, the
corresponding document is sent to the client.
• To improve efficiency, servers normally store requested files in a
cache in memory; memory is faster to access than disk.
• A server can also become more efficient through multithreading or
multiprocessing.
• In this case, a server can answer more than one request at a time.
• Some popular web servers include Apache and Microsoft Internet
Information Server.
Uniform Resource Locator (URL)
• A web page, as a file, needs to have a unique identifier to distinguish it
from other web pages.
• To define a web page, we need four identifiers: protocol, host, port, and
path.
• The first identifier is the abbreviation for the client-server program that we
need in order to access the web page.
• The host identifier can be the IP address or the domain name of the server.
• The port, a 16-bit integer, is normally predefined. For example, the HTTP
port number is 80.
• The path identifies the location and the name of the file.
• To combine these four pieces together, the uniform resource locator (URL)
has been designed.
• Example: http://www.mhhe.com/compsci/forouzan/
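The four identifiers can be pulled out of a URL with Python's standard urllib.parse module. This is a small sketch: the table of default ports and the function name are assumptions added for the example, used when the URL does not state a port explicitly.

```python
from urllib.parse import urlsplit

# Assumed defaults, used only when the URL carries no explicit port.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}

def url_identifiers(url: str) -> tuple[str, str, int, str]:
    """Return (protocol, host, port, path) for a URL."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    return parts.scheme, parts.hostname, port, parts.path

print(url_identifiers("http://www.mhhe.com/compsci/forouzan/"))
print(url_identifiers("http://www.mhhe.com:8080/compsci/forouzan/"))
```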
Web Documents
• The documents in the WWW can be grouped into three broad categories:
static, dynamic, and active.
• Static Documents
• Static documents are fixed-content documents that are created and stored
in a server.
• The client can get a copy of the document only.
• The contents of the file are determined when the file is created, not when it
is used.
• The contents on the server can be changed, but the user cannot change
them.
• Static documents are prepared using one of several languages: Hypertext
Markup Language (HTML), Extensible Markup Language (XML), Extensible Style
Language (XSL), and Extensible Hypertext Markup Language (XHTML).
• Dynamic Documents
• A dynamic document is created by a web server whenever a browser
requests the document.
• When a request arrives, the web server runs an application program
or a script that creates the dynamic document.
• The server returns the result of the program or script as a response to
the browser that requested the document.
• Because a fresh document is created for each request, the contents of
a dynamic document may vary from one request to another.
• A very simple example of a dynamic document is the retrieval of the
time and date from a server.
• Technologies such as JavaServer Pages (JSP), Active Server Pages (ASP),
and ColdFusion are used to create dynamic documents.
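The date-and-time example can be sketched in Python with the standard http.server module, used here instead of JSP or ASP purely for illustration. The handler class and function names are made up for this sketch; the point is that the body is generated when the request arrives, so two requests may receive different contents.

```python
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler

def build_time_document() -> bytes:
    """Create a fresh document whose content depends on when it is requested."""
    now = datetime.now(timezone.utc).isoformat()
    return f"<html><body><p>Server time: {now}</p></body></html>".encode("utf-8")

class TimeHandler(BaseHTTPRequestHandler):
    """Runs build_time_document() for every GET request it receives."""

    def do_GET(self) -> None:
        body = build_time_document()
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it, the handler could be passed to http.server.HTTPServer together with a local address such as ("127.0.0.1", 8000), followed by a call to serve_forever().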
• Active Documents
• For many applications, we need a program or a script to be run at the client
site.
• These are called active documents.
• For example, suppose we want to run a program that creates animated
graphics on the screen or a program that interacts with the user.
• The program definitely needs to be run at the client site where the
animation or interaction takes place.
• When a browser requests an active document, the server sends a copy of
the document or a script.
• The document is then run at the client (browser) site.
• One way to create an active document is to use Java applets: programs
written in Java on the server, compiled and ready to be run. The document
is in bytecode (binary) format.
• Another way is to use JavaScript, with the script downloaded and run at
the client site.
HyperText Transfer Protocol (HTTP)
• The HyperText Transfer Protocol (HTTP) is a protocol that is used to define
how the client-server programs can be written to retrieve web pages from
the Web.
• An HTTP client sends a request; an HTTP server returns a response.
• The server uses the port number 80; the client uses a temporary port
number.
• HTTP uses the services of TCP, a connection-oriented and reliable protocol.
• This means that, before any transaction between the client and the server
can take place, a connection needs to be established between them.
• After the transaction, the connection should be terminated.
• Nonpersistent and Persistent Connections.
• If the objects to be retrieved are located on different servers, we have
no choice but to create a new TCP connection for retrieving each object.
• If some of the objects are located on the same server, we have two
choices:
• retrieve each object using a new TCP connection or make a TCP
connection and retrieve them all.
• The first method is referred to as nonpersistent connections, the
second as persistent connections.
• HTTP, prior to version 1.1, specified nonpersistent connections, while
persistent connections are the default in version 1.1.
Nonpersistent Connections
• One TCP connection is made for each request/response.
• The following lists the steps in this strategy:
1. The client opens a TCP connection and sends a request.
2. The server sends the response and closes the connection.
3. The client reads the data until it encounters an end-of-file marker; it
then closes the connection.
• In this strategy, if a file contains links to N different pictures in
different files (all located on the same server), the connection must
be opened and closed N + 1 times.
• The nonpersistent strategy imposes high overhead on the server
because the server needs N + 1 different buffers each time a
connection is opened.
Persistent Connections
• The server leaves the connection open for more requests after sending a
response.
• The server can close the connection at the request of a client or if a time-out has
been reached.
• The sender usually sends the length of the data with each response.
• However, there are some occasions when the sender does not know the length of
the data.
• In these cases, the server informs the client that the length is not known and
closes the connection after sending the data so the client knows that the end of
the data has been reached.
• Time and resources are saved using persistent connections.
• Only one set of buffers and variables needs to be set for the connection at each
site.
• The round trip time for connection establishment and connection termination is
saved.
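A back-of-the-envelope comparison of the two strategies for a page with N embedded objects on the same server. The round-trip model is a simplifying assumption made for this sketch, not a statement from the slides: one round trip to set up each connection, one round trip per request/response, no pipelining, and transmission and termination time ignored.

```python
def connections_needed(embedded_objects: int, persistent: bool) -> int:
    """TCP connections needed for one page plus its embedded objects."""
    total_objects = embedded_objects + 1      # the page itself plus N objects
    return 1 if persistent else total_objects

def round_trips(embedded_objects: int, persistent: bool) -> int:
    """Round trips under the assumed model: one per setup, one per request."""
    total_objects = embedded_objects + 1
    setups = connections_needed(embedded_objects, persistent)
    return setups + total_objects

for n in (0, 3, 10):
    print(n, round_trips(n, persistent=False), round_trips(n, persistent=True))
```

Under these assumptions the nonpersistent strategy costs 2(N + 1) round trips, while a persistent connection costs N + 2.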
[Figures: nonpersistent connection; persistent connection]
Message Formats
• The HTTP protocol defines the format of the request and response
messages.
• Each message is made of four sections.
• The first section in the request message is called the request line;
• the first section in the response message is called the status line.
• The other three sections have the same names in the request and
response messages.
Request Message
• There are three fields in the request line (method, URL, and version),
separated by one space and terminated by two characters (carriage return
and line feed).
• The method field defines the request types.
• In version 1.1 of HTTP, several methods are defined.
• Most of the time, the client uses the GET method to send a request.
• In this case, the body of the message is empty.
• The POST method is used to send some information to the server to
be added to the web page or to modify the web page.
• After the request line, we can have zero or more request header lines.
• Each header line sends additional information from the client to the
server.
• For example, the client can request that the document be sent in a
special format.
• Each header line has a header name, a colon, a space, and a header
value.
• The value field defines the values associated with each header name.
• The body can be present in a request message.
• Usually, it contains the comment to be sent or the file to be
published on the website when the method is PUT or POST.
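The layout of a request message can be sketched by assembling one by hand. The helper name is an assumption for this example, and the path and host are taken from the URL example used earlier in the deck. Note that HTTP/1.1 expects a Host header in every request.

```python
def build_request(method: str, path: str, headers: dict[str, str], body: bytes = b"") -> bytes:
    """Assemble the request line, header lines, blank line, and body."""
    lines = [f"{method} {path} HTTP/1.1"]             # three fields, one space apart
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"            # a blank line ends the headers
    return head.encode("ascii") + body

request = build_request(
    "GET",
    "/compsci/forouzan/",
    {"Host": "www.mhhe.com", "Accept": "text/html"},
)
print(request.decode("ascii"))
```

With GET the body is left empty; with POST or PUT the bytes to be sent would be passed as the body argument.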
Response Message
• A response message consists of a status line, header lines, a blank line, and sometimes a
body.
• The first line in a response message is called the status line.
• There are three fields in this line separated by spaces and terminated by a carriage
return and line feed.
• The first field defines the version of HTTP protocol, currently 1.1.
• The status code field defines the status of the request.
• It consists of three digits.
• The codes in the 100 range are only informational, the codes in the 200 range indicate a
successful request.
• The codes in the 300 range redirect the client to another URL, and the codes in the 400
range indicate an error at the client site.
• Finally, the codes in the 500 range indicate an error at the server site.
• The status phrase explains the status code in text form.
• After the status line, we can have zero or more response header lines.
• Each header line sends additional information from the server to the client.
• For example, the sender can send extra information about the document.
• Each header line has a header name, a colon, a space, and a header value.
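A short sketch of reading a status line and classifying the code by its first digit, following the ranges listed above. The function name and the sample status line are assumptions for the example.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def parse_status_line(line: str) -> tuple[str, int, str, str]:
    """Split a status line into version, code, phrase, and class of the code."""
    version, code_text, phrase = line.rstrip("\r\n").split(" ", 2)
    code = int(code_text)
    return version, code, phrase, STATUS_CLASSES[code // 100]

print(parse_status_line("HTTP/1.1 404 Not Found\r\n"))
```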
Example
Cookies
• The World Wide Web was originally designed as a stateless entity.
• A client sends a request; a server responds. Their relationship is over.
• Today the Web has other functions that need to remember some
information about the clients:
• Websites are being used as electronic stores that allow users to
browse through the store, select wanted items, put them in an
electronic cart, and pay at the end.
• Websites may allow access to registered clients only.
• Websites are used as portals or by advertising agencies.
• For these purposes, the cookie mechanism was devised.
Creating and Storing Cookies
• When a server receives a request from a client, it stores information
about the client in a file or a string.
• The stored information may include the contents of the cookie
(information the server has gathered about the client such as name,
registration number, and so on), a timestamp, and other information
depending on the implementation.
• The server includes the cookie in the response that it sends to the
client.
• When the client receives the response, the browser stores the cookie
in the cookie directory, which is sorted by the server domain name.
Using Cookies
• When a client sends a request to a server, the browser looks in the
cookie directory to see if it can find a cookie sent by that server.
• If found, the cookie is included in the request.
• When the server receives the request, it knows that this is an old
client, not a new one.
• Note that the contents of the cookie are never read by the browser or
disclosed to the user.
• It is a cookie made by the server and eaten by the server.
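The browser side of this mechanism can be sketched with Python's standard http.cookies module. The dictionary standing in for the cookie directory, the function names, the domain shop.example, and the cookie value are all assumptions for this example.

```python
from http.cookies import SimpleCookie

# Browser side: the "cookie directory", keyed by server domain name.
cookie_directory: dict[str, SimpleCookie] = {}

def store_cookie(domain: str, set_cookie_value: str) -> None:
    """Save what the server sent in the Set-Cookie header of its response."""
    jar = cookie_directory.setdefault(domain, SimpleCookie())
    jar.load(set_cookie_value)

def cookie_header_for(domain: str) -> str:
    """Build the Cookie header value for the next request to this server."""
    jar = cookie_directory.get(domain)
    if jar is None:
        return ""                                  # new client: nothing to send
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())

store_cookie("shop.example", "customer=12343; Max-Age=3600")
print(cookie_header_for("shop.example"))     # included in the next request
print(cookie_header_for("other.example"))    # no cookie from this server
```

The browser only stores the value and hands it back; interpreting it is left to the server that created it.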
Example
• An electronic store (e-commerce) can use a cookie for its client
shoppers.
• When a client selects an item and inserts it in a cart, a cookie that
contains information about the item, such as its number and unit
price, is sent to the browser.
• If the client selects a second item, the cookie is updated with the
new selection information, and so on.
• When the client finishes shopping and wants to check out, the last
cookie is retrieved and the total charge is calculated.
Web Caching: Proxy Server
• HTTP supports proxy servers.
• A proxy server is a computer that keeps copies of responses to recent requests.
• The HTTP client sends a request to the proxy server.
• The proxy server checks its cache.
• If the response is not stored in the cache, the proxy server sends the request to
the corresponding server.
• Incoming responses are sent to the proxy server and stored for future requests
from other clients.
• The proxy server reduces the load on the original server, decreases traffic, and
improves latency.
• However, to use the proxy server, the client must be configured to access the
proxy instead of the target server.
• The proxy server acts as both server and client.
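The caching behaviour can be sketched in a few lines. The class, the stand-in origin function, and the example.com URLs are hypothetical; a real proxy would also limit how long entries stay valid.

```python
from typing import Callable

class CachingProxy:
    """Answers from its cache when it can; otherwise asks the origin server."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._cache: dict[str, bytes] = {}

    def get(self, url: str) -> bytes:
        if url not in self._cache:               # cache miss: act as a client
            self._cache[url] = self._fetch(url)
        return self._cache[url]                  # act as a server to the client

origin_requests: list[str] = []

def fake_origin(url: str) -> bytes:
    """Stand-in for the real server; records every request it receives."""
    origin_requests.append(url)
    return f"content of {url}".encode()

proxy = CachingProxy(fake_origin)
proxy.get("http://www.example.com/a")
proxy.get("http://www.example.com/a")    # answered from the cache
proxy.get("http://www.example.com/b")
print(len(origin_requests))              # requests that reached the origin server
```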
Proxy Server Location
• The proxy servers are normally located at the client site.
• This means that we can have a hierarchy of proxy servers.
1. A client computer can also be used as a proxy server, in a small
capacity, that stores responses to requests often invoked by the client.
2. In a company, a proxy server may be installed on the computer LAN
to reduce the load going out of and coming into the LAN.
3. An ISP with many customers can install a proxy server to reduce the
load going out of and coming into the ISP network.
Example of a proxy server
Cache Update
• How long should a response remain in the proxy server?
• Several different strategies are used for this purpose.
• One solution is to store the list of sites whose information remains
the same for a while.
• For example, a news agency may change its news page every
morning.
• This means that a proxy server can get the news early in the morning
and keep it until the next day.
• Another recommendation is to add some headers to show the last
modification time of the information.
• The proxy server can then use the information in this header to guess
how long the information would be valid.
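One way such a guess could be made is to treat a response as fresh for a fraction of the time that had passed since its last modification when it was fetched. The one-tenth fraction below is an assumed setting for this sketch, not a rule given in the slides, and the dates are made up.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

HEURISTIC_DIVISOR = 10   # assumption: fresh for one tenth of the document's age

def estimated_freshness(last_modified: str, fetched_at: datetime) -> timedelta:
    """Guess how long a cached response stays valid from its Last-Modified header."""
    modified_at = parsedate_to_datetime(last_modified)
    age_when_fetched = max(fetched_at - modified_at, timedelta(0))
    return age_when_fetched // HEURISTIC_DIVISOR

fetched = datetime(2024, 1, 11, 8, 0, tzinfo=timezone.utc)
print(estimated_freshness("Mon, 01 Jan 2024 08:00:00 GMT", fetched))
```

A document that had not changed for ten days when it was fetched would be kept for one more day before the proxy asks the server again.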
HTTP Security
• HTTP by default does not provide security.
• However, HTTP can be run over the Secure Sockets Layer (SSL).
• In this case, HTTP is referred to as HTTPS.
• HTTPS provides confidentiality, client and server authentication, and
data integrity.
FTP
• File Transfer Protocol (FTP) is the standard protocol provided by
TCP/IP for copying a file from one host to another.
• Two systems may use different file name conventions.
• Two systems may have different ways to represent data.
• Two systems may have different directory structures.
• All of these problems have been solved by FTP.
Basic model of FTP
• The client has three components: user interface, client control
process, and the client data transfer process.
• The server has two components: the server control process and the
server data transfer process.
• The control connection is made between the control processes.
• The data connection is made between the data transfer processes.
• Separation of commands and data transfer makes FTP more efficient.
• The control connection uses very simple rules of communication.
• The data connection, on the other hand, needs more complex rules
due to the variety of data types transferred.
• The control connection remains connected during the entire
interactive FTP session.
• The data connection is opened and then closed for each file transfer
activity.
• When a user starts an FTP session, the control connection opens.
• While the control connection is open, the data connection can be
opened and closed multiple times if several files are transferred.
• FTP uses two well-known TCP ports:
• port 21 is used for the control connection, and port 20 is used for the
data connection.
• Communication is achieved through commands and responses.
• During this control connection, commands are sent from the client to
the server and responses are sent from the server to the client.
• Commands, which are sent from the FTP client control process, are in
the form of ASCII uppercase, which may or may not be followed by an
argument.
Some FTP commands
• Every FTP command generates at least one response.
• A response has two parts: a three-digit number followed by text.
• The numeric part defines the code; the text part defines needed parameters
or further explanations.
• The first digit defines the status of the command.
• The second digit defines the area in which the status applies.
• The third digit provides additional information.
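A sketch of splitting a single-line FTP response into its code and text and interpreting the first digit. The first-digit meanings follow the FTP specification (RFC 959); the function name and the sample line are assumptions, and multi-line responses are not handled here.

```python
FIRST_DIGIT = {
    "1": "positive preliminary reply",
    "2": "positive completion reply",
    "3": "positive intermediate reply",
    "4": "transient negative completion reply",
    "5": "permanent negative completion reply",
}

def parse_ftp_response(line: str) -> tuple[int, str, str]:
    """Return (code, meaning of the first digit, text) for a one-line response."""
    code_text, _, text = line.rstrip("\r\n").partition(" ")
    if len(code_text) != 3 or not code_text.isdigit():
        raise ValueError(f"not an FTP response: {line!r}")
    return int(code_text), FIRST_DIGIT[code_text[0]], text

print(parse_ftp_response("226 Closing data connection\r\n"))
```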
Some responses
• The creation of a data connection is different from the control
connection.
Steps:
1. The client issues a passive open using an ephemeral port; the client
does this because it is the client that issues the commands for
transferring files.
2. The client sends this port number to the server using the PORT
command.
3. The server receives the port number and issues an active open using
the well-known port 20 and the received ephemeral port number.
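In step 2, the PORT command carries the client's IP address and ephemeral port as six decimal numbers: four for the address bytes and two for the high and low bytes of the port, as defined in RFC 959. The sketch below encodes and decodes that argument; the function names, address, and port are made up for the example.

```python
def port_command(ip: str, port: int) -> str:
    """Build the PORT command for an IPv4 address and a 16-bit port."""
    h1, h2, h3, h4 = ip.split(".")
    p1, p2 = divmod(port, 256)           # high byte and low byte of the port
    return f"PORT {h1},{h2},{h3},{h4},{p1},{p2}"

def parse_port_argument(argument: str) -> tuple[str, int]:
    """Recover the socket address the server should connect to."""
    h1, h2, h3, h4, p1, p2 = argument.split(",")
    return ".".join((h1, h2, h3, h4)), int(p1) * 256 + int(p2)

command = port_command("192.0.2.10", 50000)
print(command)
print(parse_port_argument(command[len("PORT "):]))
```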
• The client must define the type of file to be transferred, the structure
of the data, and the transmission mode.
• Before sending the file through the data connection, we prepare for
transmission through the control connection.
• The heterogeneity problem is resolved by defining three attributes of
communication:
• file type, data structure, and transmission mode.
• FTP can transfer one of the following file types across the data
connection:
• ASCII file, EBCDIC file, or image file.
• The structure of the data may be file structure, record structure, or page structure.
• The file structure format (used by default) has no structure.
• It is a continuous stream of bytes.
• In the record structure, the file is divided into records. This can be used only
with text files.
• In the page structure, the file is divided into pages, with each page having a
page number and a page header.
• The pages can be stored and accessed randomly or sequentially.
• FTP can transfer a file using one of three transmission modes: stream
mode, block mode, or compressed mode.
• The stream mode is the default mode; data are delivered from FTP to
TCP as a continuous stream of bytes.
• In the block mode, data can be delivered from FTP to TCP in blocks.
• In this case, each block is preceded by a 3-byte header.
• The first byte is called the block descriptor; the next two bytes define
the size of the block in bytes.
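The 3-byte block header can be sketched with Python's struct module. The function names are assumptions for this example; the end-of-file descriptor value 64 comes from the FTP specification (RFC 959), not from the slides.

```python
import struct

END_OF_FILE = 64   # descriptor code for "end of data block is EOF" (RFC 959)

def make_block(data: bytes, descriptor: int = 0) -> bytes:
    """Prefix data with a 1-byte descriptor and a 2-byte big-endian size."""
    return struct.pack("!BH", descriptor, len(data)) + data

def read_block(block: bytes) -> tuple[int, bytes]:
    """Read the 3-byte header, then return the descriptor and the data."""
    descriptor, size = struct.unpack("!BH", block[:3])
    return descriptor, block[3:3 + size]

block = make_block(b"hello", END_OF_FILE)
print(block[:3].hex())       # descriptor byte followed by the two size bytes
print(read_block(block))
```

Because the size field has two bytes, a single block can carry at most 65,535 bytes of data.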
• File transfer occurs over the data connection under the control of the
commands sent over the control connection.
• File transfer in FTP means one of three things:
• Retrieving a file (server to client),
• Storing a file (client to server),
• Directory listing (server to client).
Security for FTP
• Although FTP requires a password, the password is sent in plaintext
(unencrypted), which means it can be intercepted and used by an
attacker.
• The data transfer connection also transfers data in plaintext, which is
insecure.
• To be secure, one can add a Secure Sockets Layer between the FTP
application layer and the TCP layer.
• In this case FTP is called SSL-FTP.
Electronic Mail
• Electronic mail (or e-mail) allows users to exchange messages.
• In an application such as HTTP or FTP, the server program is running all the
time, waiting for a request from a client.
• When the request arrives, the server provides the service.
• There is a request and there is a response.
• In the case of electronic mail, the situation is different.
• E-mail is considered a one-way transaction.
• The recipient may or may not respond.
• It is neither feasible nor logical to run a server program and wait until
someone sends an e-mail.
• The idea of client/server programming should be implemented in another
way:
• using some intermediate computers (servers).
• The users run only client programs when they want and the
intermediate servers apply the client/server paradigm.
Architecture of e-mail
• In the common scenario, the sender and the receiver of the e-mail, Alice and Bob
respectively, are connected via a LAN or a WAN to two mail servers.
• The administrator has created one mailbox for each user where the received
messages are stored.
• A mailbox is part of a server hard drive, a special file with permission
restrictions.
• Only the owner of the mailbox has access to it.
• The administrator has also created a queue (spool) to store messages waiting to be
sent.
• A simple e-mail from Alice to Bob takes nine different steps.
• Alice and Bob use three different agents: a User Agent (UA), a Mail Transfer
Agent (MTA), and a Message Access Agent (MAA).
• When Alice needs to send a message to Bob, she runs a UA program to
prepare the message and send it to her mail server.
• The mail server at her site uses a queue (spool) to store messages waiting to be
sent.
• The message, however, needs to be sent through the Internet from Alice’s site to
Bob’s site using an MTA.
• Here two message transfer agents are needed: one client and one server.
• The server MTA needs to run all the time because it does not know when a client
will ask for a connection. The client, on the other hand, can be triggered by the
system when there is a message in the queue to be sent.
• The user agent at Bob’s site allows Bob to read the received message.
• Bob later uses an MAA client to retrieve the message from an MAA server
running on the second server.
• Bob cannot bypass the mail server and use the MTA server directly.
• Bob needs another pair of client-server programs: message access programs.
• An MTA client-server program is a push program: the client pushes the message to the server.
• A message access program is a pull program: the client pulls the message from the server.
• The electronic mail system needs two UAs, two pairs of MTAs (client and
server), and a pair of MAAs (client and server).
User Agent
• The first component of an electronic mail system.
• It provides service to the user to make the process of sending and receiving a
message easier.
• A user agent is a software package (program) that composes, reads, replies to,
and forwards messages.
• It also handles local mailboxes on the user computers.
• There are two types of user agents: command-driven and GUI-based.
• Some examples of command-driven user agents are mail, pine, and elm.
• Some examples of GUI-based user agents are Eudora and Outlook.
Sending Mail
• To send mail, the user, through the UA, creates mail that looks very similar to
postal mail.
• It has an envelope and a message.
• The envelope usually contains the sender address, the receiver address,
and other information.
• The message contains the header and the body.
• The header of the message defines the sender, the receiver, the subject of
the message, and some other information.
• The body of the message contains the actual information to be read by the
recipient.
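The envelope, header, and body described above can be illustrated with Python's standard `email.message.EmailMessage` class. This is only a sketch: the addresses, the subject, and the `envelope` dictionary are made-up values for the example, and the envelope is kept separate because it is handed to the transfer agent rather than stored in the message itself.

```python
from email.message import EmailMessage

# Envelope: addressing information used by the mail system (hypothetical values).
envelope = {
    "mail_from": "alice@example.com",
    "rcpt_to": ["bob@example.org"],
}

# Message: a header (sender, receiver, subject) plus a body.
message = EmailMessage()
message["From"] = "alice@example.com"
message["To"] = "bob@example.org"
message["Subject"] = "Lunch"
message.set_content("Are you free at noon?")

# Serialized form: header lines, one blank line, then the body.
wire_text = message.as_string()
```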
Receiving Mail
• The user agent is triggered by the user (or a timer).
• If a user has mail, the UA informs the user with a notice.
• If the user is ready to read the mail, a list is displayed in which each line
contains a summary of the information about a particular message in the
mailbox.
• The summary usually includes the sender mail address, the subject, and the
time the mail was sent or received.
• The user can select any of the messages and display its contents on the screen.
Email Address
• To deliver mail, a mail handling system must use an addressing system with unique
addresses.
• In the Internet, the address consists of two parts: a local part and a domain
name, separated by an @ sign.

• An organization usually selects one or more hosts to receive and send e-mail; they
are sometimes called mail servers or exchangers.
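The two-part address format can be shown with a small Python helper. This is a simplified sketch: `split_address` is a hypothetical name, and the check is far looser than the full address syntax accepted by real mail systems.

```python
def split_address(address):
    """Split an e-mail address into (local part, domain name)."""
    # Split on the last "@" so that the domain name is everything after it.
    local_part, separator, domain = address.rpartition("@")
    if not separator or not local_part or not domain:
        raise ValueError(f"not a valid e-mail address: {address!r}")
    return local_part, domain
```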
Mailing List or Group List
• Electronic mail allows one name, an alias, to represent several different
e-mail addresses;
• this is called a mailing list.
• Every time a message is to be sent, the system checks the recipient’s name
against the alias database;
• if there is a mailing list for the defined alias, separate messages, one for each
entry in the list, must be prepared and handed to the MTA.
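The alias lookup described above might look like the following in Python. The alias database is modeled as a plain dictionary, and all names and addresses are invented for the illustration.

```python
# Hypothetical alias database: one alias maps to several e-mail addresses.
ALIASES = {
    "staff@example.com": ["alice@example.com", "bob@example.com"],
}


def expand_recipient(recipient, aliases):
    """Return one address per message that must be handed to the MTA."""
    # If the recipient is an alias, expand it; otherwise deliver as given.
    return list(aliases.get(recipient, [recipient]))
```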
Protocols used in electronic mail
Message Transfer Agent: SMTP
• The formal protocol that defines the MTA client and server in the Internet is
called Simple Mail Transfer Protocol (SMTP).
• SMTP is used two times:
• Between the sender and the sender’s mail server.
• Between the two mail servers.
• SMTP simply defines how commands and responses must be sent back and forth.
• SMTP uses commands and responses to transfer messages between an MTA
client and an MTA server.
• The command is from an MTA client to an MTA server; the response is from an
MTA server to the MTA client.
• Each command or reply is terminated by a two-character (carriage return and
line feed) end-of-line token.
SMTP Commands
• A command consists of a keyword followed by zero or more arguments. SMTP defines 14
commands.
SMTP Responses
• A response is a three-digit code that may be followed by additional textual
information.
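The command and response formats can be sketched in Python as below. The helper names are hypothetical, and the parser assumes a single-line response in which one separator character follows the three-digit code.

```python
def format_command(keyword, argument=""):
    """Build one SMTP command line terminated by carriage return + line feed."""
    line = f"{keyword} {argument}".strip()
    return line.encode("ascii") + b"\r\n"


def parse_response(raw):
    """Split a single-line response into (three-digit code, text)."""
    line = raw.decode("ascii").rstrip("\r\n")
    # The first three characters are the code; any text follows a separator.
    return int(line[:3]), line[4:]
```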
Mail Transfer Phases
• The process of transferring a mail message occurs in three phases:
• Connection establishment
• Mail transfer
• Connection termination
Connection Establishment
• After a client has made a TCP connection to the well-known port 25,
the SMTP server starts the connection phase.
• This phase involves the following three steps:
• 1. The server sends code 220 (service ready) to tell the client that it is
ready to receive mail. If the server is not ready, it sends code 421
(service not available).
• 2. The client sends the HELO message to identify itself, using its
domain name address to inform the server of the domain name of
the client.
• 3. The server responds with code 250 (request command completed)
or some other code depending on the situation.
Message Transfer
• This phase involves eight steps.
• Steps 3 and 4 are repeated if there is more than one recipient.
• 1. The client sends the MAIL FROM message to introduce the sender of the
message. It includes the mail address of the sender (mailbox and the domain
name). This step is needed to give the server the return mail address for
returning errors and reporting messages.
• 2. The server responds with code 250 or some other appropriate code.
• 3. The client sends the RCPT TO (recipient) message, which includes the mail
address of the recipient.
• 4. The server responds with code 250 or some other appropriate code.
• 5. The client sends the DATA message to initialize the message transfer.
• 6. The server responds with code 354 (start mail input) or some other
appropriate message.
• 7. The client sends the contents of the message in consecutive lines. The message
is terminated by a line containing just one period.
• 8. The server responds with code 250 (OK) or some other appropriate code.
Connection Termination
• After the message is transferred successfully, the client terminates
the connection.
• This phase involves two steps.
• 1. The client sends the QUIT command.
• 2. The server responds with code 221 or some other appropriate
code.
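The client side of the three phases can be summarized as an ordered list of command lines. The Python sketch below only builds that list and does not open a connection; `smtp_client_commands` is a hypothetical helper, the addresses are invented, and details such as escaping body lines that begin with a period are left out.

```python
def smtp_client_commands(client_domain, sender, recipients, body_lines):
    """Return the lines an SMTP client sends, in order, for one message."""
    commands = [f"HELO {client_domain}"]          # connection establishment
    commands.append(f"MAIL FROM:<{sender}>")      # introduce the sender
    for recipient in recipients:                  # repeated once per recipient
        commands.append(f"RCPT TO:<{recipient}>")
    commands.append("DATA")                       # start the message transfer
    commands.extend(body_lines)                   # message contents, line by line
    commands.append(".")                          # a line with just one period
    commands.append("QUIT")                       # connection termination
    return commands
```

After each command the client would wait for the server's response code (for example 250 or 354) before sending the next line. Python's standard `smtplib` module carries out this exchange against a real server.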
Message Access Agent: POP and IMAP
• SMTP is a push protocol; it pushes the message from the client to the
server.
• The receiver needs a pull protocol; the client must pull messages from
the server.
• Two message access protocols are available: Post Office Protocol
version 3 (POP3) and Internet Message Access Protocol version 4 (IMAP4).
POP3
• Post Office Protocol, version 3 (POP3) is simple but limited in
functionality.
• The client POP3 software is installed on the recipient computer; the
server POP3 software is installed on the mail server.
• Mail access starts with the client when the user needs to download her
e-mail from the mailbox on the mail server.
• The client opens a connection to the server on TCP port 110.
• It then sends its user name and password to access the mailbox.
• The user can then list and retrieve the mail messages, one by one.
• POP3 has two modes: the delete mode and the keep mode.
• In the delete mode, the mail is deleted from the mailbox after each
retrieval.
• In the keep mode, the mail remains in the mailbox after retrieval.
• The delete mode is normally used when the user is working at her
permanent computer and can save and organize the received mail
after reading or replying.
• The keep mode is normally used when the user accesses her mail
away from her primary computer (for example, from a laptop). The
mail is read but kept in the system for later retrieval and organizing.
• POP3 is deficient in several ways.
• It does not allow the user to organize her mail on the server;
• the user cannot have different folders on the server.
• In addition, POP3 does not allow the user to partially check the
contents of the mail before downloading.
IMAP4
• Internet Message Access Protocol, version 4 (IMAP4).
• IMAP4 is similar to POP3, but it has more features;
• IMAP4 is more powerful and more complex.
• IMAP4 provides the following extra functions:
• A user can check the e-mail header prior to downloading.
• A user can search the contents of the e-mail for a specific string of
characters prior to downloading.
• A user can partially download e-mail. This is especially useful if bandwidth
is limited and the e-mail contains multimedia with high bandwidth
requirements.
• A user can create, delete, or rename mailboxes on the mail server.
• A user can create a hierarchy of mailboxes in a folder for e-mail storage.
MIME
• Multipurpose Internet Mail Extensions (MIME) is a supplementary protocol that
allows non-ASCII data to be sent through e-mail.
• MIME is a set of software functions that transforms non-ASCII data to ASCII
data and vice versa.

• Normally, e-mail can send messages only in NVT 7-bit ASCII format.
• It cannot be used for languages other than English.
• It cannot be used to send binary files or video or audio data.

• MIME transforms non-ASCII data at the sender site to NVT ASCII data and
delivers it. The message at the receiving site is transformed back to the original
data.
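The round trip from non-ASCII data to ASCII and back can be demonstrated with Base64, one of the encodings MIME can use. The sketch below relies on Python's standard `base64` module; the function names `to_ascii` and `from_ascii` are hypothetical.

```python
import base64


def to_ascii(data):
    """Sender side: turn arbitrary bytes into ASCII-only text."""
    return base64.b64encode(data).decode("ascii")


def from_ascii(text):
    """Receiver side: recover the original bytes from the ASCII text."""
    return base64.b64decode(text)


original = "naïve café".encode("utf-8")   # contains non-ASCII bytes
encoded = to_ascii(original)              # safe to carry in a 7-bit message
restored = from_ascii(encoded)
```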
• NVT stands for Network Virtual Terminal, the standard character representation that TELNET defines for exchanging text across a network.
MIME Headers
• MIME defines five headers.
• MIME-Version: This header defines the version of MIME used. The current
version is 1.0.
• Content-ID: This header uniquely identifies the whole message in a multiple
message environment.
• Content-Description: This header defines whether the body is image, audio, or
video.
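Content-Type
• MIME allows seven different types of data: text, multipart, message, image,
video, audio, and application.
Content-Transfer-Encoding
• The five types of encoding methods are 7bit, 8bit, binary, base64, and
quoted-printable.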
Web-Based Mail
• Common sites are Hotmail, Yahoo, and Google mail.
E-Mail Security
• E-mail protocols do not provide any security provisions by default.
• However, e-mail exchanges can be secured using two application-layer
security protocols:
• Pretty Good Privacy (PGP)
• Secure/Multipurpose Internet Mail Extensions (S/MIME)
TELNET
• One of the original remote logging protocols is TELNET, which is an
abbreviation for TErminaL NETwork.
• Client/server programs that allow a user on the client site to log into
the computer at the server site and use the services available there.
• For example, if a student needs to use the Java compiler program at
her university lab, the student can use a client logging program to log into the
university server and use the compiler program at the university.
• These generic client/server pairs are remote logging applications.
• TELNET requires a logging name and password.
• But it is vulnerable to hacking because it sends all data including the
password in plaintext (not encrypted).
• A hacker can eavesdrop and obtain the logging name and password.
• Because of this security issue, the use of TELNET has diminished in
favour of another protocol, Secure Shell (SSH).
• Network administrators often use TELNET for diagnostic and
debugging purposes.
Local Logging
• When a user logs into a local system, it is called local logging.
• As a user types at a terminal the keystrokes are accepted by the
terminal driver.
• The terminal driver passes the characters to the operating system.
• The operating system, in turn, interprets the combination of
characters and invokes the desired application program or utility.
Remote Logging
• When a user wants to access an application program or utility located on a
remote machine, it is called remote logging.
• Here the TELNET client and server programs are used.
• The user sends the keystrokes to the terminal driver where the local operating
system accepts the characters but does not interpret them.
• The characters are sent to the TELNET client, which transforms the characters
into a universal character set called Network Virtual Terminal (NVT) characters
and delivers them to the local TCP/IP stack.
• The commands or text, in NVT form, travel through the Internet and arrive at
the TCP/IP stack at the remote machine.
• Here the characters are delivered to the operating system and passed to the
TELNET server, which changes the characters to the corresponding characters
understandable by the remote computer.
• A piece of software called a pseudoterminal driver passes these characters to the operating system.
• The operating system then passes the characters to the appropriate application
program.
Network Virtual Terminal (NVT)
• Every computer and its operating system accepts a special combination of
characters as tokens.
• For example, the end-of-file token in a computer running the DOS
operating system is Ctrl+z, while the UNIX operating system recognizes
Ctrl+d.
• TELNET solves this problem by defining a universal interface called the
Network Virtual Terminal (NVT) character set.
• Via this interface, the client TELNET translates characters (data or
commands) that come from the local terminal into NVT form and delivers
them to the network.
• The server TELNET, on the other hand, translates data and commands from
NVT form into the form acceptable by the remote computer.
• NVT uses two sets of characters, one for data and one for control.
• Both are 8-bit bytes.
• For data, NVT normally uses what is called NVT ASCII.
• This is an 8-bit character set in which the seven lowest order bits are
the same as ASCII and the highest order bit is 0.
• To send control characters between computers (from client to server
or vice versa), NVT uses an 8-bit character set in which the highest
order bit is set to 1.
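The role of the highest-order bit can be checked with a single bit mask. The Python sketch below is illustrative only; `classify_nvt_byte` is a hypothetical name.

```python
def classify_nvt_byte(value):
    """Return "control" if the highest-order bit is 1, otherwise "data"."""
    if not 0 <= value <= 0xFF:
        raise ValueError("an NVT character is a single 8-bit byte")
    # 0x80 is binary 1000 0000: it isolates the highest-order bit.
    return "control" if value & 0x80 else "data"
```

For example, the byte for the letter A (65) is classified as data, while 255 (the TELNET "interpret as command" byte) is classified as control.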
TELNET Commands
• TELNET lets the client and server negotiate options before or during
the use of the service.
• Options are extra features available to a user.
• The operating system (UNIX, for example) defines an interface with
user-friendly commands.
Secure Shell (SSH)
• Secure Shell (SSH) is a secure application program that can be used
for several purposes such as remote logging and file transfer.
• It was originally designed to replace TELNET.
• There are two versions of SSH: SSH-1 and SSH-2, which are totally
incompatible.
• SSH is an application-layer protocol with three components.
Components of SSH
SSH Transport-Layer Protocol (SSH-TRANS)
• Since TCP is not a secured transport-layer protocol, SSH first uses a
protocol that creates a secured channel on top of the TCP.
• This new layer is an independent protocol referred to as SSH-TRANS.
• The client and server first use the TCP protocol to establish an insecure
connection.
• Then they exchange several security parameters to establish a secure channel on
top of the TCP.
• The services provided by this protocol are:
• 1. Privacy or confidentiality of the message exchanged
• 2. Data integrity, it is guaranteed that the messages exchanged between the
client and server are not changed by an intruder
• 3. Server authentication, the client is now sure that the server is the one that it
claims to be
• 4. Compression of the messages, which improves the efficiency of the
system and makes attack more difficult
SSH Authentication Protocol (SSH-AUTH)
• After a secure channel is established between the client and the
server and the server is authenticated for the client, SSH can call
another procedure that can authenticate the client for the server.
• This layer defines a number of authentication tools.
• Authentication starts with the client, which sends a request message
to the server.
• The request includes the user name, server name, the method of
authentication, and the required data.
• The server responds with either a success message, which confirms
that the client is authenticated, or a failed message, which means
that the process needs to be repeated with a new request message.
SSH Connection Protocol (SSH-CONN)

• After the secured channel is established and both server and client are
authenticated for each other, SSH can call a piece of software that
implements the third protocol, SSH-CONN.
• One of the services provided by the SSH-CONN protocol
is multiplexing.
• SSH-CONN takes the secure channel established by the two previous
protocols and lets the client create multiple logical channels over it.
• Each channel can be used for a different purpose, such as remote
logging, file transfer, and so on.
Applications of SSH
• SSH is, in fact, a general-purpose protocol that provides a secure connection
between a client and server.
• Several free and commercial applications use SSH for remote logging. PuTTY and
Tectia are SSH programs that can be used for remote logging.
• The Secure File Transfer Program (sftp) uses one of the channels provided
by the SSH to transfer files.
• Another common application is called Secure Copy (scp).
• This application uses the same format as the UNIX copy command, cp, to
copy files.
Port Forwarding
• One of the services provided by the SSH protocol is port forwarding.
• The SSH port forwarding mechanism creates a tunnel through which
the messages belonging to other protocols can travel.
• For this reason, this mechanism is sometimes referred to as SSH
tunneling.
Format of the SSH Packets

• The data field is the data transferred by the packet in different protocols.
• The length field defines the length of the packet but does not include the
padding.
• One to eight bytes of padding is added to the packet to make the attack on
the security provision more difficult.
• The cyclic redundancy check (CRC) field is used for error detection.
• The type field designates the type of the packet used in different SSH
protocols.
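The field layout above can be sketched in Python. Several details here are assumptions made for the illustration and are not stated in the text: a 4-byte length field, a 1-byte type field, a 4-byte CRC, padding chosen so that everything after the length field is a multiple of 8 bytes, and `zlib.crc32` standing in for the protocol's own checksum. `build_packet` is a hypothetical name.

```python
import os
import struct
import zlib


def build_packet(packet_type, data):
    """Assemble length | padding | type | data | CRC (field widths assumed)."""
    # The length counts type + data + CRC, but not the padding.
    length = 1 + len(data) + 4
    # Choose 1 to 8 padding bytes so padding + length is a multiple of 8.
    pad_len = 8 - (length % 8)
    padding = os.urandom(pad_len)
    body = padding + bytes([packet_type]) + data
    # zlib.crc32 is only a stand-in for the real error-detection code.
    crc = zlib.crc32(body) & 0xFFFFFFFF
    return struct.pack("!I", length) + body + struct.pack("!I", crc)
```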
Domain Name System (DNS)
• This is a client-server application program designed to help other
application programs.
• To identify an entity, TCP/IP protocols use the IP address, which
uniquely identifies the connection of a host to the Internet.
• However, people prefer to use names instead of numeric addresses.
• Therefore, the Internet needs to have a directory system that can
map a name to an address.
• Since the Internet is so huge today, a central directory system cannot
hold all the mapping.
• In addition, if the central computer fails, the whole communication
network will collapse.
• A better solution is to distribute the information among many
computers in the world.
• In this method, the host that needs mapping can contact the closest
computer holding the needed information.
• This method is used by the Domain Name System (DNS).
Steps : Hostname to IP address
• The following six steps map the host name to an IP address:
• 1. The user passes the host name to the file transfer client.
• 2. The file transfer client passes the host name to the DNS client.
• 3. The DNS client sends a message to a DNS server with a query that
gives the file transfer server name using the known IP address of the
DNS server.
• 4. The DNS server responds with the IP address of the desired file
transfer server.
• 5. The DNS client passes the IP address to the file transfer client.
• 6. The file transfer client now uses the received IP address to access
the file transfer server.
Name Space
• The names assigned to machines must be unique because the addresses
are unique.
• A name space that maps each address to a unique name can be organized
in two ways: flat or hierarchical.
• In a flat name space, a name is a sequence of characters without structure.
• The main disadvantage of a flat name space is that it cannot be used in a
large system such as the Internet because it must be centrally controlled to
avoid ambiguity and duplication.
• In a hierarchical name space, each name is made of several parts.
• The first part can define the nature of the organization, the second part can
define the name of an organization, the third part can define departments
in the organization, and so on.
• Examples: ceasar.first.com and ceasar.second.com.
Domain Name Space
• To have a hierarchical name space, a domain name space was
designed.
• In this design the names are defined in an inverted-tree structure
with the root at the top.
• The tree can have only 128 levels: level 0 (root) to level 127.
• Each node in the tree has a label, which is a string with a maximum of 63
characters.
• The root label is a null string (empty string).
• DNS requires that children of a node have different labels, which guarantees the
uniqueness of the domain names.
• Each node in the tree has a domain name.
• A full domain name is a sequence of labels separated by dots (.).
• The domain names are always read from the node up to the root.
• This means that a full domain name always ends in a null label, which means the
last character is a dot because the null string is nothing.
• If a domain name is terminated by a null string (that is, it ends with a dot), it is
called a fully qualified domain name (FQDN).
• If a domain name is not terminated by a null string, it is called a partially qualified
domain name (PQDN).
• A PQDN starts from a node, but it does not reach the root.
• It is used when the name to be resolved belongs to the same site as the client.
• Here the resolver can supply the missing part, called the suffix, to create an
FQDN.
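The FQDN and PQDN rules can be expressed in a few lines of Python. This is a sketch with hypothetical helper names and made-up host names; the 63-character limit on labels is the one stated earlier.

```python
def is_fqdn(name):
    """A fully qualified name ends with a dot (the null root label)."""
    return name.endswith(".")


def qualify(name, suffix):
    """Turn a PQDN into an FQDN by appending the resolver's suffix."""
    if is_fqdn(name):
        return name
    return f"{name}.{suffix.strip('.')}."


def labels_are_valid(name):
    """Check that every label is non-empty and at most 63 characters long."""
    labels = name.rstrip(".").split(".")
    return all(0 < len(label) <= 63 for label in labels)
```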
Example for domain name
• A domain is a subtree of the domain name space.
• The name of the domain is the name of the node at the top of the
subtree.
• A domain may itself be divided into domains.
Distribution of Name Space
• It is very inefficient and also not reliable to have just one computer
store such a huge amount of information.
• It is inefficient because responding to requests from all over the world
places a heavy load on the system.
• It is not reliable because any failure makes the data inaccessible.
• The solution to these problems is to distribute the information among
many computers called DNS servers.
• This creates a hierarchy of servers in the same way that we have a hierarchy of
names.
• Since the complete domain name hierarchy cannot be stored on a
single server, it is divided among many servers.
• What a server is responsible for or has authority over is called a zone.
• We can define a zone as a contiguous part of the entire tree.
• A root server is a server whose zone consists of the whole tree.
Primary and Secondary Servers
• DNS defines two types of servers: primary and secondary.
• A primary server is a server that stores a file about the zone for which it is an
authority.
• It is responsible for creating, maintaining, and updating the zone file.
• It stores the zone file on a local disk.
• A secondary server is a server that transfers the complete information about a
zone from another server (primary or secondary) and stores the file on its local
disk.
• The secondary server neither creates nor updates the zone files.
• If updating is required, it must be done by the primary server, which sends the
updated version to the secondary.
• This is to create redundancy for the data so that if one server fails, the other can
continue serving clients.
• A primary server loads all information from the disk file;
• the secondary server loads all information from the primary server.
DNS in the Internet
• In the Internet, the domain name space (tree) was originally divided
into three different sections: generic domains, country domains, and
the inverse domain.
• Due to the rapid growth of the Internet, it became extremely difficult
to keep track of the inverse domains, which could be used to find the
name of a host when given the IP address.
Generic Domains
• The generic domains define registered hosts according to their
generic behavior. Each node in the tree defines a domain, which is an
index to the domain name space database.
Country Domains
• The country domains section uses two-character country abbreviations (e.g., us for United States).

• Second labels can be organizational, or they can be more specific, national designations.
• For example, uci.ca.us. can be translated to University of California, Irvine, in the state of California in the United States.
What is Resolution
• Mapping a name to an address is called name-address resolution.
• DNS is designed as a client-server application.
• A host that needs to map an address to a name or a name to an address
calls a DNS client called a resolver.
• The resolver accesses the closest DNS server with a mapping request.
• If the server has the information, it satisfies the resolver;
• otherwise, it either refers the resolver to other servers or asks other
servers to provide the information.
• After the resolver receives the mapping, it interprets the response to see if
it is a real resolution or an error, and finally delivers the result to the
process that requested it.
• A resolution can be either recursive or iterative.
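Recursive Resolution
• In recursive resolution, a server that does not know the mapping
forwards the query to another server and passes the answer it
receives back toward the resolver.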
Iterative Resolution
• In iterative resolution, each server that does not know the mapping
sends the IP address of the next server back to the one that
requested it.
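Iterative resolution can be simulated without any network traffic. In the Python sketch below, each server is modeled as a dictionary of the records it holds and the referrals it can give; the server names, the host name, and the address are all invented for the example.

```python
# Hypothetical servers: what each one knows and whom it refers to.
SERVERS = {
    "root": {"records": {}, "referrals": {"com": "com-server"}},
    "com-server": {"records": {}, "referrals": {"example.com": "example-server"}},
    "example-server": {
        "records": {"www.example.com": "192.0.2.10"},
        "referrals": {},
    },
}


def iterative_resolve(name, servers, start="root"):
    """Ask one server at a time, following each referral ourselves."""
    server = start
    contacted = [server]
    while True:
        info = servers[server]
        if name in info["records"]:
            return info["records"][name], contacted
        # The server does not know the mapping: look for a referral.
        next_server = None
        for suffix, target in info["referrals"].items():
            if name == suffix or name.endswith("." + suffix):
                next_server = target
                break
        if next_server is None:
            raise LookupError(f"cannot resolve {name}")
        server = next_server
        contacted.append(server)
```

The returned list of contacted servers shows that the requester, not the servers, does the work of walking down the hierarchy.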
Caching
• When a server asks for a mapping from another server and receives
the response, it stores this information in its cache memory.
• Caching speeds up resolution, but it can also be problematic.
• If a server caches a mapping for a long time, it may send an outdated
mapping to the client.
• To solve this, the server always adds information to the mapping
called time to live (TTL).
• It defines the time in seconds that the receiving server can cache the
information.
• After that time, the mapping is invalid.
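A cache that honors the TTL can be sketched as follows. The class name `DnsCache` and the injectable clock are choices made for this illustration so that expiry can be demonstrated without waiting.

```python
import time


class DnsCache:
    """Keep each mapping only for the number of seconds given by its TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}

    def put(self, name, address, ttl):
        # Store the address together with the moment it stops being valid.
        self._entries[name] = (address, self._clock() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            # The TTL has elapsed, so the mapping is no longer valid.
            del self._entries[name]
            return None
        return address
```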
DNS Resource Records
• The zone information associated with a server is implemented as a
set of resource records.
• A name server stores a database of resource records.
• A resource record is a 5-tuple structure: (Domain Name, Type, Class, TTL, Value).
• The domain name field is what identifies the resource record.
• The value defines the information kept about the domain name.
• The TTL defines the number of seconds for which the information is
valid.
• The class defines the type of network.
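A resource record maps naturally onto a named tuple with five fields. In the Python sketch below the field names are hypothetical, and the sample record (an address record of type A in class IN, the Internet class) uses invented values.

```python
from typing import NamedTuple


class ResourceRecord(NamedTuple):
    """One resource record: domain name, type, class, TTL, and value."""
    name: str      # identifies the resource record
    rtype: str     # how the value should be interpreted
    rclass: str    # type of network
    ttl: int       # seconds for which the information is valid
    value: str     # information kept about the domain name


record = ResourceRecord("www.example.com.", "A", "IN", 3600, "192.0.2.10")
```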
DNS Messages
• To retrieve information about hosts, DNS uses two types of messages:
query and response. Both types have the same format:
• The identification field is used by the client to match the response with the
query.
• The flag field defines whether the message is a query or response. It also
includes status of error.
• The next four fields in the header define the number of each record type in
the message.
• The question section, which is included in the query and repeated in the
response message, consists of one or more question records.
• It is present in both query and response messages.
• The answer section consists of one or more resource records. It is present
only in response messages.
• The authoritative section gives information (domain name) about one or
more authoritative servers for the query.
• The additional information section provides additional information that
may help the resolver.
• In UNIX and Windows, the nslookup utility can be used to retrieve
address/name mapping.
• For example, when a domain name is given, nslookup returns the
corresponding address.
Encapsulation in DNS
• DNS can use either UDP or TCP.
• In both cases the well-known port used by the server is port 53.
• UDP is used when the size of the response message is less than 512 bytes.
• If the size of the response message is more than 512 bytes, a TCP
connection is used.
• If the resolver has prior knowledge that the size of the response message is
more than 512 bytes, it uses the TCP connection.
• If the resolver does not know the size of the response message, it can use
the UDP port.
• However, if the size of the response message is more than 512 bytes, the
server truncates the message and turns on the TC bit.
• The resolver now opens a TCP connection and repeats the request to get a
full response from the server.
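The fallback from UDP to TCP can be modeled as a small simulation. Nothing below touches the network: `server_reply` and `resolve` are hypothetical functions, and the boolean returned by the server stands for the TC bit.

```python
UDP_LIMIT = 512  # maximum response size, in bytes, carried over UDP


def server_reply(response, transport):
    """Return (payload, tc_bit) as the server would send it."""
    if transport == "udp" and len(response) > UDP_LIMIT:
        # Too large for UDP: truncate the message and turn on the TC bit.
        return response[:UDP_LIMIT], True
    return response, False


def resolve(response):
    """Try UDP first; repeat the request over TCP if the reply is truncated."""
    payload, truncated = server_reply(response, "udp")
    if truncated:
        payload, _ = server_reply(response, "tcp")
        return payload, "tcp"
    return payload, "udp"
```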
How are new domains added to DNS?
• This is done through a registrar, a commercial entity accredited by ICANN
(Internet Corporation for Assigned Names and Numbers).
• A registrar first verifies that the requested domain name is unique and
then enters it into the DNS database.
• A fee is charged. There are many registrars; their names and addresses can
be found at: http://www.internic.net
• To register, the organization needs to give the name of its server and the IP
address of the server.
• For example, a new commercial organization named wonderful with a
server named ws and IP address 200.200.200.5 needs to give the following
information to one of the registrars:
• Domain name: ws.wonderful.com IP address: 200.200.200.5
DDNS
• Dynamic Domain Name System.
• In DNS, when there is a change, such as adding a new host, removing a
host, or changing an IP address, the change must be made to the DNS
master file.
• These types of changes involve a lot of manual updating.
• The size of today’s Internet does not allow for this kind of manual
operation.
• The DNS master file must be updated dynamically.
• The Dynamic Domain Name System (DDNS) therefore was devised.
• In DDNS, when a binding between a name and an address is determined,
the information is sent, usually by DHCP (Dynamic Host Configuration
Protocol), to a primary DNS server.
• The primary server updates the zone.
• To provide security and prevent unauthorized changes in the DNS records,
DDNS can use an authentication mechanism.
Security of DNS
• DNS is one of the most important systems in the Internet infrastructure.
• To protect DNS, IETF has devised a technology named DNS Security
(DNSSEC).
• DNS can be attacked in several ways:
• 1. The attacker may read the response of a DNS server to find the nature or
names of sites the user mostly accesses. This type of information can be
used to find the user’s profile.
• To prevent this attack, DNS messages need to be confidential.
• 2. The attacker may intercept the response of a DNS server and change it to
direct the user to another site. This type of attack can be prevented using
message origin authentication and message integrity.
• 3. The attacker may flood the DNS server to overwhelm it or eventually
crash it. This type of attack can be prevented using the provision against
denial-of-service attack.
PEER-TO-PEER PARADIGM
• History
• The first instance of peer-to-peer file sharing goes back to December
1987 when Wayne Bell created WWIVnet.
• Freenet 1999
• Napster (1999–2001)
• Gnutella 2000
• FastTrack (used by Kazaa), BitTorrent, WinMX, and GNUnet in
2001.
P2P Networks
• Internet users that are ready to share their resources become peers and
form a network.
• When a peer in the network has a file (for example, an audio or video file)
to share, it makes it available to the rest of the peers.
• An interested peer can connect itself to the computer where the file is
stored and download it.
• After a peer downloads a file, it can make it available for other peers to
download.
• As more peers join and download that file, more copies of the file become
available to the group.
• The P2P networks can be divided into two categories: centralized and
decentralized.
Centralized Networks
• In a centralized P2P network, the directory system (a listing of the
peers and what they offer) uses the client-server paradigm, but the
storing and downloading of the files are done using the peer-to-peer
paradigm.
• For this reason, a centralized P2P network is sometimes referred to as
a hybrid P2P network.
• Napster was an example of a centralized P2P.
• In this type of network, a peer first registers itself with a central
server.
• The peer then provides its IP address and a list of files it has to share.
• A peer, looking for a particular file, sends a query to a central server.
• The server searches its directory and responds with the IP addresses
of nodes that have a copy of the file.
• The peer contacts one of the nodes and downloads the file. The
directory is constantly updated.
• Centralized networks make the maintenance of the directory simple
but have several drawbacks.
• Accessing the directory can generate huge traffic and slow down the
system.
• The central servers are vulnerable to attack, and if all of them fail, the
whole system goes down.
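• The register/query steps above can be sketched as a small in-memory directory. The class CentralDirectory, its method names, and the addresses and file names are assumptions for illustration, not Napster's actual protocol.

```python
class CentralDirectory:
    """Toy central server: maps each file name to the peers that offer it."""

    def __init__(self) -> None:
        self.files: dict[str, set[str]] = {}

    def register(self, peer_ip: str, file_names: list[str]) -> None:
        """A peer registers its IP address and the list of files it shares."""
        for name in file_names:
            self.files.setdefault(name, set()).add(peer_ip)

    def unregister(self, peer_ip: str) -> None:
        """Remove a departing peer so the directory stays up to date."""
        for holders in self.files.values():
            holders.discard(peer_ip)

    def query(self, file_name: str) -> list[str]:
        """Return the IP addresses of peers that have a copy of the file."""
        return sorted(self.files.get(file_name, set()))


directory = CentralDirectory()
directory.register("198.51.100.7", ["song.mp3", "talk.mp4"])
directory.register("203.0.113.9", ["song.mp3"])

holders = directory.query("song.mp3")   # both peers offer this file
missing = directory.query("other.bin")  # nobody offers this file

directory.unregister("198.51.100.7")
after_leave = directory.query("song.mp3")
```

• Only the lookup goes through the server; the download itself would happen directly between the two peers.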
Decentralized Network
• A decentralized P2P network does not depend on a centralized
directory system.
• In this model, peers arrange themselves into an overlay network,
which is a logical network made on top of the physical network.
• Depending on how the nodes in the overlay network are linked, a
decentralized P2P network is classified as either unstructured or
structured.
Unstructured Networks
• In an unstructured P2P network, the nodes are linked randomly.
• A search in an unstructured P2P network is not very efficient because a
query to find a file must be flooded through the network, which produces
significant traffic, and even then the query may not be resolved.
• Two examples of this type of network are Gnutella and Freenet.
Example : Gnutella
• The Gnutella network is an example of a peer-to-peer network that is decentralized but
unstructured.
• It is unstructured in the sense that the directory is randomly distributed among the nodes.
• When node A wants to access an object (such as a file), it contacts one of its neighbors.
• A neighbor, in this case, is any node whose address is known to node A.
• Node A sends a query message to the neighbor, node W.
• The query includes the identity of the object (for example, file name).
• If node W knows the address of node X, which has the object, it sends a response
message that includes the address of node X.
• Node A can now use the commands defined in a transfer protocol such as HTTP to get a
copy of the object from node X.
• If node W does not know the address of node X, it floods the request from A to all its
neighbors.
• One of the reasons Gnutella does not scale well is the flooding.
• Gnutella adopted techniques such as Query Routing Protocol (QRP) and Dynamic
Querying (DQ) to reduce traffic overhead.
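• A simplified sketch of the query flooding described above. The topology, the per-node directories, the function flood_query, and the hop limit ttl are assumptions; a real Gnutella node would also avoid echoing a query back to its sender.

```python
from collections import deque


def flood_query(neighbors, directory, start, obj, ttl):
    """Flood a query for obj from start.

    neighbors: node -> list of nodes whose address it knows
    directory: node -> {object name: address of a node holding it}
    ttl:       maximum number of hops the query may travel (assumed limit)
    Returns (addresses found, number of query messages sent).
    """
    seen = {start}
    queue = deque([(start, ttl)])
    found = []
    messages = 0
    while queue:
        node, hops_left = queue.popleft()
        if hops_left == 0:
            continue  # hop limit reached, stop forwarding
        for nxt in neighbors[node]:
            messages += 1  # every forwarded copy counts as traffic
            if nxt in seen:
                continue  # duplicate query, dropped by the receiver
            seen.add(nxt)
            known = directory.get(nxt, {})
            if obj in known:
                found.append(known[obj])  # response carries the holder's address
            else:
                queue.append((nxt, hops_left - 1))  # flood to its neighbors
    return found, messages


# Hypothetical overlay: A knows W; W knows A, B, C; B knows W, X.
neighbors = {"A": ["W"], "W": ["A", "B", "C"], "B": ["W", "X"],
             "C": ["W"], "X": ["B"]}
# Only node B knows that node X holds the file.
directory = {"B": {"file.txt": "X"}}

resolved = flood_query(neighbors, directory, "A", "file.txt", ttl=3)
unresolved = flood_query(neighbors, directory, "A", "file.txt", ttl=1)
```

• With a hop limit of 3 the query reaches B and returns the address of X after five messages; with a hop limit of 1 it stops at W and is not resolved.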
Structured Networks
• A structured network uses a predefined set of rules to link nodes so
that a query can be effectively and efficiently resolved.
• The most common technique used for this purpose is the Distributed
Hash Table (DHT).
• DHT is used in many applications including Distributed Data Structure
(DDS), Content Distributed Systems (CDS), Domain Name System
(DNS), and P2P file sharing.
• One popular P2P file sharing protocol that uses the DHT is BitTorrent.
Distributed Hash Table (DHT)
• A Distributed Hash Table (DHT) distributes data (or references to data)
among a set of nodes according to some predefined rules.
• Each peer in a DHT-based network becomes responsible for a range
of data items.
• To avoid the flooding overhead, DHT-based networks allow each peer
to have a partial knowledge about the whole network.
• This knowledge can be used to route the queries about the data items
to the responsible nodes using effective and scalable procedures.
Address Space in DHT
• In a DHT-based network, each data item and each peer is mapped to a
point in a large address space of size 2^m.
• The address space is designed using modular arithmetic, which means
that the points in the address space are distributed evenly on a circle
with 2^m points (0 to 2^m − 1) in the clockwise direction.
• Most of the DHT implementations use m = 160.
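• A short sketch of the modular arithmetic on the ring, using a small m = 5 (32 points) for readability instead of m = 160. The helper names next_point and clockwise_distance are assumptions.

```python
M = 5               # illustrative value; kept small so the numbers are easy to follow
RING_SIZE = 2 ** M  # the ring has points 0 .. 2^m - 1


def next_point(p: int) -> int:
    """Move one step clockwise; the point after 2^m - 1 wraps to 0."""
    return (p + 1) % RING_SIZE


def clockwise_distance(a: int, b: int) -> int:
    """Number of clockwise steps needed to go from point a to point b."""
    return (b - a) % RING_SIZE


wrapped = next_point(31)            # 31 is the last point on a 32-point ring
short = clockwise_distance(30, 2)   # crosses the 31 -> 0 boundary
long_way = clockwise_distance(2, 30)
```

• Distances on the ring are directional: going from 30 to 2 takes 4 clockwise steps, while going from 2 to 30 takes 28.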
Hashing Peer Identifier
• The first step in creating the DHT system is to place all peers on the
address space ring.
• This is normally done by using a hash function that hashes the peer
identifier, normally its IP address, to an m-bit integer, called a node
ID.
• node ID = hash (Peer IP address)
• A hash function is a mathematical function that creates an output
from an input.
• DHT uses some of the cryptographic hash functions such as Secure
Hash Algorithm (SHA) that are collision resistant.
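• A minimal sketch of node ID = hash (Peer IP address), assuming SHA-1 from Python's hashlib (its 160-bit digest matches m = 160). The function name hash_to_ring and the sample IP address are assumptions.

```python
import hashlib


def hash_to_ring(identifier: str, m: int = 160) -> int:
    """Hash an identifier to an m-bit integer (a point on the ring)."""
    digest = hashlib.sha1(identifier.encode()).digest()  # 20 bytes = 160 bits
    # For m < 160 the value is reduced modulo 2^m so it fits the smaller ring.
    return int.from_bytes(digest, "big") % (2 ** m)


node_id = hash_to_ring("192.0.2.44")            # full 160-bit node ID
small_id = hash_to_ring("192.0.2.44", m=5)      # same peer on a 32-point ring
same_again = hash_to_ring("192.0.2.44")         # hashing is deterministic
```

• The same function can be applied to any string identifier, which is what allows different kinds of identifiers to share one address space.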
Hashing Object Identifier
• The name of the object (for example, a file) to be shared is also
hashed to an m-bit integer in the same address space.
• The result in DHT parlance is called a key.
• key = hash (Object name)
• In the DHT an object is normally related to the pair (key, value) in
which the key is the hash of the object name and the value is the
object or a reference to the object.
Storing the Object
• There are two strategies for storing the object: the direct method and the
indirect method.
• In the direct method, the object is stored in the node whose ID is
somehow closest to the key in the ring.
• Most DHT systems use the indirect method for efficiency.
• In the indirect method, the peer that owns the object keeps the object,
but a reference to the
object is created and stored in the node whose ID is closest to the key
point.
• The physical object and the reference to the object are stored in two
different locations.
• In the direct strategy, we create a relationship between the node ID that
stores the object and the key of the object; in the indirect strategy, we
create a relationship between the reference (pointer) to the object and the
node that stores that reference.
Example
• The node N5 with IP address 110.34.56.20 has a file named Liberty
that it wants to share with its peers.
• The node makes a hash of the file name, “Liberty,” to get the key = 14.
• Since the closest node to key 14 is node N17, N5 creates a reference
containing the file name (key), its IP address, and the port number (and
possibly some other information about the file) and sends this reference
to be stored in node N17.
• In other words, the file is stored in N5, the key of the file is k14 (a
point in the DHT ring), but the reference to the file is stored in node
N17.
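• The example can be reproduced in a few lines. Only N5, N17, key 14, and the IP address come from the example; the other node IDs, the ring size m = 5, the port number, and the reading of "closest" as the first node at or clockwise after the key are assumptions.

```python
from bisect import bisect_left


def responsible_node(key: int, node_ids: list[int], m: int) -> int:
    """Return the first node ID at or clockwise after the key (assumed rule)."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key % (2 ** m))
    return ring[i % len(ring)]  # wrap to the smallest ID past the end of the ring


M = 5                              # assumed ring with 32 points
node_ids = [2, 5, 10, 17, 20, 25]  # assumed peers; only 5 and 17 are in the example
key = 14                           # hash of "Liberty", as given in the example

owner = responsible_node(key, node_ids, M)

# Indirect method: N5 keeps the file, and only a reference is stored at the owner.
reference = {"key": key, "ip": "110.34.56.20", "port": 6881}  # port is hypothetical
stores = {n: {} for n in node_ids}   # per-node reference tables
stores[owner][key] = reference

wrapped_owner = responsible_node(28, node_ids, M)  # no node at or after 28, so wrap
```

• Node 17 ends up holding the reference for key 14, while a key such as 28 wraps around the ring to node 2.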
Routing in DHT Networks
• The main function of DHT is to route a query to the node which is
responsible for storing the reference to an object.
• Each DHT implementation uses a different strategy for routing, but all
follow the idea that each node needs to have a partial
knowledge about the ring to route a query to a node that is closest to
the responsible node.
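• A sketch of routing with the smallest possible partial knowledge: each node knows only the next node clockwise on the ring. This is an assumed strategy for illustration, not the routing of any particular DHT; the node IDs, ring size, and function names are hypothetical.

```python
def in_arc(x: int, start: int, end: int, size: int) -> bool:
    """True if x lies on the clockwise arc (start, end] of a ring with size points."""
    offset = (x - start) % size or size    # x == start counts as a full circle away
    arc_len = (end - start) % size or size  # start == end means the whole ring
    return offset <= arc_len


def route(start: int, key: int, successor: dict[int, int], size: int) -> list[int]:
    """Forward a query node by node until the responsible node is reached.

    successor maps each node ID to the next node ID clockwise (its only
    knowledge of the ring). Returns the list of nodes the query visits.
    """
    path = [start]
    node = start
    # Keep forwarding while the key is not between this node and its successor.
    while not in_arc(key, node, successor[node], size):
        node = successor[node]
        path.append(node)
    path.append(successor[node])  # the successor is responsible for the key
    return path


SIZE = 32                          # assumed ring with m = 5
ring = [2, 5, 10, 17, 20, 25]      # assumed node IDs, sorted
successor = {n: ring[(i + 1) % len(ring)] for i, n in enumerate(ring)}

path_to_14 = route(5, 14, successor, SIZE)   # query from node 5 for key 14
path_to_1 = route(20, 1, successor, SIZE)    # query that wraps past point 31
```

• With one pointer per node the number of hops grows linearly with the number of nodes; designs such as Chord keep a few extra pointers per node so that a query needs far fewer hops.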
Arrival and Departure of Nodes
• In a P2P network, each peer can be a desktop or a laptop computer,
which can be turned on or off.
• When a computer peer launches the DHT software, it joins the
network; when the computer is turned off or the peer closes the
software, it leaves the network.
