
Module 2

The document discusses network communication in distributed systems, focusing on the TCP/IP model and its four layers: data link, internet, transport, and application. It explains how messages are transmitted between machines, emphasizing the importance of protocols like HTTP for communication between compute nodes. Additionally, it highlights the structure of HTTP requests and the significance of methods like GET and POST in distributed environments.

CSC 610

DISTRIBUTED SYSTEMS
Dr. S. Pitchumani Angayarkanni
Module 2
Network Communication
Building Distributed Document Search
Introduction to Network Communication
• Learn the basics of network
communication, which allows our
machines to communicate with each
other within the cluster.
• We'll start by talking about the TCP/IP
network model, and later we will walk
through a full example of a message
traveling between two machines over
the network.
Introduction
• Network communication is particularly important in distributed systems.
• So let's start by talking about network communication and why it's so important.
• In multithreading, passing a message from one thread to another thread was a very easy
task.
• Since all our threads ran in the context of the same application, they all had a
shared memory space, so they could pass data to each other.
• Of course, we had to do it carefully to avoid race conditions, so we used
locks for protection and semaphores or condition variables for signaling.
• In distributed systems, we don't have the shared-memory luxury
anymore, so the only efficient way for our
nodes to communicate is using the
network.
• There is a lot of work that happens behind
the scenes that allows our computers to
exchange messages with each other.
TCP/IP Network Model
• All that work can be described through four layers of
abstraction defined in the TCP/IP network model.
• In this model, each layer gets service from the
layer underneath it and communicates with the
same layer on the other machine or station using
the relevant protocols.
• Those protocols are agreements between two
points on the network on how to exchange data.
• The higher the abstraction layer, the closer and more
relevant it is to us, the software developers and the users, and
the lower the abstraction layer, the closer it is to the
physical hardware.
• The concepts of networking are very similar to those of a postal service,
as at the end of the day, both services are there to help us deliver
items from a source to a destination.
• The first and lowest network abstraction layer is the data link
layer.
• The data link layer is concerned with all the details of interfacing with
the hardware, and is in charge of the physical delivery of the data
between two points connected by a single link.
• Those can be either two different hosts or a host and a router.
• This abstraction layer is in charge of encapsulating and encoding
our data. It takes care of flow control, error detection, error
correction, and many other lower-level details.
• The most important protocol in this layer is the Ethernet protocol.
• The Ethernet protocol wraps our data packets into
frames and uses the devices' MAC addresses to
deliver those frames from one device to another
device.
• The data link layer is equivalent to the post office's
logistics department, which takes care of the postal
service's planes, trucks, and postman schedules,
and makes sure that the letter moves from one point
to another point on its way to its destination.
• The internet layer gets service from the
data link layer and takes on a higher-level
role.
• It is in charge of delivering the data,
potentially across multiple networks, by
routing the packets from the source
computer to the destination computer.
• The protocol that allows us to
accomplish this is the Internet
Protocol (IP).
• Each device in the network is assigned an IP
address, and this address allows the
packets to travel across the networks from
the source host to the destination host.
• The only thing we really care about in this layer is
obtaining the IP address of the computer we want to
communicate with, and getting our own IP address
if we want to share it with others through the
service registry.
• In the postal service analogy, we
can think of the IP address as the address of the
building where the recipient is located.
• Knowing only the building address is not enough, however,
as the internet layer only takes care of delivering a
packet from one computer to another; it doesn't
know which application process the packet is
intended for, nor does it know which process sent
it and where the response should go.
• For that purpose, we have the transport layer.
• This abstraction layer takes care of delivering messages
end to end, from a process on one machine to another process on
the other machine.
• In the transport layer, each endpoint, or socket, identifies itself by a 16-bit port number.
• The listening port is chosen ahead of time by the destination
application, but the source port is generated on the fly by the sender,
depending on the ports available at the moment. Using the port,
once the packet arrives at the destination computer, the operating
system knows which process the packet belongs to.
• Carrying on with the postal service analogy, the port number would be
our exact apartment number and name, which ensures that a packet
that arrives at the apartment complex door ends up in your hands
instead of your neighbors' or roommates'.
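To see the two kinds of ports on a real machine, here is a minimal Java sketch (Java is assumed because the course compares concepts to Java; the class and method names are hypothetical). It opens a TCP listening socket and connects to it over the loopback interface. One assumption to flag: the demo passes port 0 so the OS picks any free listening port and the example can't collide with something already running, whereas a real server would fix its listening port ahead of time.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class PortDemo {

    /** Opens one local TCP connection; returns {listening port, client source port}. */
    static int[] connectOnce() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 = "any free port" (demo only); a real server fixes its port in advance.
        try (ServerSocket server = new ServerSocket(0, 50, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            int listeningPort = server.getLocalPort();
            // The client never chose a source port: the OS assigned one on the fly.
            int sourcePort = client.getLocalPort();
            // The server sees that same port as the remote end of the accepted socket.
            if (accepted.getPort() != sourcePort) {
                throw new IllegalStateException("source port mismatch");
            }
            return new int[] {listeningPort, sourcePort};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] ports = connectOnce();
        System.out.println("server listening port: " + ports[0]);
        System.out.println("client source port:    " + ports[1]);
    }
}
```

Both values fit in 16 bits (0 to 65535); running the program twice will usually print a different source port each time.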
• In the transport layer, there are two primary protocols:
the User Datagram Protocol (UDP) and the Transmission Control
Protocol (TCP).
• UDP is a connectionless protocol that guarantees only best-effort delivery,
which means it's unreliable and messages can be lost,
duplicated, or reordered.
• The basic building block of the UDP protocol is a unit called a
datagram, which is limited in size (at most 65,507 bytes of payload over IPv4).
• In distributed systems, UDP is not very
common, but it is used where the speed
and simplicity of message delivery
are more important than reliability.
• One use case can be sending debug
information to a distributed logging
service.
• If some log messages get lost, it's
generally not a big deal, as long as it's not
financial data.
• Another perfect use case for UDP is a real-time
video or audio streaming service.
• For example, if our distributed system is getting a live stream of a
game or a show and is delivering it to users, it's better to lose a
few frames than to slow down the entire stream until that
last frame is redelivered.
• Also, if our distributed system connects users playing an online
game, it's better to keep the latency as low as possible,
and most past events, such as a character's position,
can be easily extrapolated from newer events even if lost.
• UDP also allows for broadcasting, which basically allows a single
computer to send a message to all the computers in the local
network without knowing anything about who those
computers are.
• This allows for full decoupling between the sender and the
receivers.
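A minimal Java sketch of UDP's connectionless, datagram-based delivery, assuming both endpoints run on the loopback interface and using hypothetical class and method names. There is no connection setup: the sender simply addresses a single datagram, and the receiver sets a timeout because nothing guarantees the datagram will ever arrive. Broadcasting is not shown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

class UdpDemo {

    /** Sends one datagram to a local receiver and returns the text it received. */
    static String sendAndReceive(String text) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket receiver = new DatagramSocket(0, loopback);
             DatagramSocket sender = new DatagramSocket()) {
            // Best effort only: never wait forever for a datagram that may be lost.
            receiver.setSoTimeout(2000);

            byte[] payload = text.getBytes(StandardCharsets.UTF_8);
            // No handshake: each datagram carries its own destination address and port.
            sender.send(new DatagramPacket(payload, payload.length, loopback, receiver.getLocalPort()));

            // Any bytes beyond the buffer size would be silently truncated.
            byte[] buffer = new byte[1024];
            DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
            receiver.receive(packet); // throws SocketTimeoutException if nothing arrives
            return new String(packet.getData(), 0, packet.getLength(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sendAndReceive("debug: node-1 heartbeat"));
    }
}
```

On a real network, a lost datagram would surface here as a timeout rather than a retransmission, which is exactly the trade-off the logging and streaming use cases accept.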
• TCP, on the other hand, is used when reliable delivery of data is more important to us, even at the cost
of some latency.
• TCP is connection-oriented: it is based on a connection between
exactly two points.
• Even if we have two sources that connect to the same
destination IP and port, the data flow will be split into
two separate sockets, and it will be handled
completely separately by the operating system and the
application.
• That's because each TCP connection is uniquely
identified by the four-tuple of source IP, source port, destination
IP, and destination port.
• This feature allows us to build web servers very
easily.
• We simply need to listen on a single well-known port
using TCP, and every host that wants to communicate
with our web server simply needs to connect to that
port.
• Since each client connects from a different source
IP and port number, it will create a separate
connection and a separate stream of data.
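The four-tuple rule can be observed directly. This sketch (Java, loopback only, hypothetical names) opens two client connections to the same server IP and port; the OS gives each client a different source port, so the server ends up with two independent sockets.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class FourTupleDemo {

    /** Formats a client connection as "(source IP:port -> destination IP:port)". */
    static String fourTuple(Socket s) {
        return "(" + s.getLocalAddress().getHostAddress() + ":" + s.getLocalPort()
                + " -> " + s.getInetAddress().getHostAddress() + ":" + s.getPort() + ")";
    }

    /** Opens two client connections to the same server port and returns their four-tuples. */
    static String[] twoConnections() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, loopback);
             Socket clientA = new Socket(loopback, server.getLocalPort());
             Socket clientB = new Socket(loopback, server.getLocalPort());
             // The server gets one accepted socket (one byte stream) per connection.
             Socket accepted1 = server.accept();
             Socket accepted2 = server.accept()) {
            // Same destination IP and port; only the source port differs.
            return new String[] {fourTuple(clientA), fourTuple(clientB)};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String tuple : twoConnections()) {
            System.out.println(tuple);
        }
    }
}
```

The destination half of both tuples is identical, which is why a web server only needs one well-known listening port.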
• The only problem is that TCP works as a plain stream of bytes, not distinguishing
which byte belongs to which message.
• However, in our case, we want to send and receive commands, requests, or
messages between nodes in the cluster, and parsing messages out of a stream of
bytes is very hard.
• We need to know where a message starts, where it ends, and how the data
is formatted.
• For that purpose, we have the final layer, which is the
application layer. This layer is the most
important and relevant to us,
the software engineers who build
distributed applications that communicate
with each other through the network.
• In the application layer, we have many
different protocols already defined for us
for different purposes.
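To make the "where does a message start and end" problem concrete, here is a minimal sketch of length-prefix framing over a byte stream. This is a swapped-in illustration, not HTTP's exact mechanism (HTTP/1.1 marks the end of its headers with a blank line and typically uses a Content-Length header for the body). Assumptions: Java, in-memory byte arrays standing in for a TCP socket's streams, and a hypothetical 4-byte length prefix.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class FramingDemo {

    /** Writes each message as a 4-byte length followed by its UTF-8 bytes. */
    static byte[] encode(List<String> messages) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (String message : messages) {
                byte[] body = message.getBytes(StandardCharsets.UTF_8);
                out.writeInt(body.length); // tells the reader where this message ends
                out.write(body);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Splits a raw byte stream back into the original messages. */
    static List<String> decode(byte[] stream) {
        List<String> messages = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(stream))) {
            while (true) {
                int length;
                try {
                    length = in.readInt();
                } catch (EOFException endOfStream) {
                    break; // no further length prefix: the stream is exhausted
                }
                byte[] body = new byte[length];
                in.readFully(body); // reads exactly `length` bytes, EOFException if truncated
                messages.add(new String(body, StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return messages;
    }

    public static void main(String[] args) {
        byte[] stream = encode(List.of("GET_STATUS", "RUN_TASK 42"));
        // Without the prefixes, these would be one undifferentiated run of bytes.
        System.out.println(decode(stream)); // [GET_STATUS, RUN_TASK 42]
    }
}
```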
Client Server Request Example

• Let's walk through a full example of how a message
would travel between two applications on the network.
• Let's assume we have two nodes, the client and the server,
both of them located in different networks, which are
connected by a router.
• First, in the application layer, our application will create a
message using HTTP, which contains the message and
the HTTP header.
• In the transport layer, a TCP header will be added to the
message containing the server's port and the client's port.
• Typically, at this point, the original message would also
be broken into multiple segments, but since this
message is very small, a single TCP segment is enough.
• In the next stage, the IP protocol in the internet layer
will add an IP header containing the server's IP
address as well as the client's.
• The sender's IP and port are necessary
so that the server will know how to
respond to it in the future.
• Finally, on the data link layer,
the Ethernet protocol adds the MAC address of the
router in its network as the destination, and that of the client as the source.
• All of those layers together now form a frame. This
entire frame is then sent through the network in the form of
individual bits until it arrives at the router.
• When the router receives the frame, it removes the
Ethernet header,
looks at the destination IP address, and, using a
routing table, figures out which network this message
needs to be sent to next
so it will arrive at the destination server. Then the
router adds a new Ethernet header with the server's MAC
address as the destination and the router's MAC address
as the source.
• After that, it sends the frame over the link straight to the
server. When the frame finally arrives at the server, all
those layers are peeled off like an onion. First, the
Ethernet header is removed, and the server
validates that this packet actually belongs to this
particular server.
• Next, the IP header is removed, and the source IP address is saved in memory so that the server can use it
to respond to the client.
• Later, on the transport layer, the destination port is inspected
to see which application and which socket is listening on that port.
• After the application instance is identified by the OS, the TCP header is removed and the message is handed to
the application.
• At this point, the code we write in the application will handle the message, perform the action, and respond to the
client using the exact same process.
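The walk-through above can be mimicked with a toy model in Java. This is purely illustrative: the headers are plain strings rather than real Ethernet, IP, or TCP wire formats, and every address, port, MAC label, and class name is hypothetical. It shows the order in which headers are added on the client, swapped at the router, and peeled off on the server.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class EncapsulationDemo {

    /** Toy frame: a stack of text headers around an application message (not real formats). */
    static final class Frame {
        private final Deque<String> headers = new ArrayDeque<>(); // outermost header first
        private final String message;

        Frame(String message) { this.message = message; }

        void wrap(String header) { headers.addFirst(header); }   // add an outer layer
        String peel() { return headers.removeFirst(); }          // remove the outer layer
        String outer() { return headers.peekFirst(); }           // inspect without removing
        String message() { return message; }

        @Override
        public String toString() {
            List<String> parts = new ArrayList<>(headers);
            parts.add(message);
            return String.join(" | ", parts);
        }
    }

    /** Walks one request from client to server and returns a snapshot after each hop. */
    static List<String> walkThrough() {
        List<String> trace = new ArrayList<>();

        // Client: each layer prepends its own header (all values are hypothetical).
        Frame frame = new Frame("GET /status HTTP/1.1");         // application layer
        frame.wrap("TCP src=54321 dst=8080");                    // transport layer: ports
        frame.wrap("IP src=10.0.1.5 dst=10.0.2.7");              // internet layer: IP addresses
        frame.wrap("ETH src=client-mac dst=router-mac");         // data link layer: MACs
        trace.add("client sends:   " + frame);

        // Router: drop the Ethernet header, route on the IP header, re-frame for the next link.
        frame.peel();
        String routedOn = frame.outer();
        frame.wrap("ETH src=router-mac dst=server-mac");
        trace.add("router forwards (routed on " + routedOn + "): " + frame);

        // Server: peel the layers like an onion.
        frame.peel();                          // Ethernet header: frame is for this host
        String ipHeader = frame.peel();        // source IP kept so the server can reply
        String tcpHeader = frame.peel();       // destination port selects the listening socket
        trace.add("server app gets: " + frame.message()
                + " (reply to " + ipHeader + ", " + tcpHeader + ")");
        return trace;
    }

    public static void main(String[] args) {
        walkThrough().forEach(System.out::println);
    }
}
```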
• We have now learned the essentials of network communication.
• We broke it down into four layers based on the TCP/IP network model, and went through a full example of a
message traveling from an application on one machine to another application on another machine.
• It's also clear to us why, in the service registry, we published the address in this form.
• It clearly states our address on the internet layer,
the port we listen on at the transport layer, and the protocol we're using in the application layer, which in the case
of HTTP also dictates TCP being used in the transport layer.
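As an illustration of how one registry entry maps onto the layers, here is a short Java sketch using `java.net.URI`. It assumes the entry has the form `http://<ip>:<port>` (the exact form used in the course isn't reproduced here), and the address and class name are hypothetical.

```java
import java.net.URI;

class RegistryAddressDemo {

    /** Splits a registry entry into {protocol, IP address, port}. */
    static String[] layersOf(String registryAddress) {
        URI uri = URI.create(registryAddress);
        return new String[] {
            uri.getScheme(),              // application layer protocol (HTTP implies TCP)
            uri.getHost(),                // internet layer: IP address
            String.valueOf(uri.getPort()) // transport layer: listening port (-1 if missing)
        };
    }

    public static void main(String[] args) {
        // Hypothetical address, as a node might publish it in the service registry.
        String[] parts = layersOf("http://192.168.1.10:8081");
        System.out.println("protocol=" + parts[0] + " ip=" + parts[1] + " port=" + parts[2]);
        // prints: protocol=http ip=192.168.1.10 port=8081
    }
}
```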
• Our next areas of focus are going to be HTTP, which is the most popular and most important protocol used to
connect devices on the web, including in distributed systems, and we'll talk specifically about how to package our data
to send it efficiently between different nodes in the cluster.
View Detailed Data of the Current Node
Details:
1) cZxid: the zxid of the transaction that created the znode.
Each modification of ZooKeeper state generates a ZooKeeper transaction ID (zxid), which reflects
the total order of all modifications in ZooKeeper: if zxid1 < zxid2, then zxid1 occurred before zxid2.
2) ctime: the time the znode was created, in milliseconds since the Unix epoch (January 1, 1970)
3) mZxid: the zxid of the transaction that last modified the znode
4) mtime: the time the znode was last modified, in milliseconds since the Unix epoch (January 1, 1970)
5) pZxid: the zxid of the last change to the znode's children
6) cversion: the number of changes to the znode's children
7) dataVersion: the number of changes to the znode's data
8) dataLength: the length of the znode's data
9) numChildren: the number of child nodes of the znode
• https://programmer.ink/think/zookeeper-command-line-operation.html
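Two of the details above are easy to get wrong in code: zxids are compared numerically to establish order, and ctime/mtime are epoch milliseconds, not formatted dates. The Java sketch below works on plain `long` values (hypothetical ones, written in the hex form the ZooKeeper CLI prints) rather than calling the ZooKeeper client library. It also assumes the standard zxid layout, where the high 32 bits hold the leader epoch and the low 32 bits a per-epoch counter; the class and method names are hypothetical.

```java
import java.time.Instant;

class ZnodeStatDemo {

    /** High 32 bits of a zxid: the leader epoch (assumed standard layout). */
    static long epochOf(long zxid) { return zxid >>> 32; }

    /** Low 32 bits of a zxid: the counter within that epoch. */
    static long counterOf(long zxid) { return zxid & 0xFFFFFFFFL; }

    /** Total order: the change with the smaller zxid happened first. */
    static boolean happenedBefore(long zxidA, long zxidB) { return zxidA < zxidB; }

    public static void main(String[] args) {
        long cZxid = Long.decode("0x100000002");   // hypothetical creation zxid
        long mZxid = Long.decode("0x100000005");   // hypothetical last-modification zxid
        long ctimeMillis = 1700000000000L;         // hypothetical ctime

        System.out.println("created before last modification? " + happenedBefore(cZxid, mZxid));
        System.out.println("cZxid epoch=" + epochOf(cZxid) + " counter=" + counterOf(cZxid));
        // ctime/mtime count milliseconds since 1970-01-01T00:00:00Z.
        System.out.println("created at " + Instant.ofEpochMilli(ctimeMillis));
    }
}
```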
HTTP COMMUNICATION IN DISTRIBUTED SYSTEMS
Why HTTP?
• HTTP is the most convenient, popular, and, most
importantly, flexible protocol to communicate between
different computers.
• The HTTP protocol allows us to almost entirely
remove ourselves from any layer underneath
the application layer and focus on building our
application's logic as well as sending messages
between computers with ease.
• HTTP was initially designed to serve content
from web servers to users' web browsers.
• However, in our course we'll focus
on the use of HTTP in distributed systems for
communication between compute nodes rather
than for serving content to the end client.
• Every HTTP transaction has two parts: a
request that is sent from the client to the server,
and a response sent from the server back to
the client.
• Even if the client doesn't need any data from the
server, the transaction is never considered
complete until the client receives a response from the server,
even an empty one, with a
status indicating the outcome of the request.
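A minimal request/response round trip, sketched in Java with the JDK's built-in `com.sun.net.httpserver.HttpServer` and `java.net.http.HttpClient` (Java 11+ assumed). The `/task` path, the request body, and the class name are hypothetical. The server has no data to send back, but it still completes the transaction with a status code and an empty body, which is what the client waits for.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class HttpTransactionDemo {

    /** Sends one POST to a local server and returns the response's status code. */
    static int roundTrip() {
        HttpServer server;
        try {
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        server.createContext("/task", exchange -> {
            exchange.getRequestBody().readAllBytes(); // consume the request body
            exchange.sendResponseHeaders(200, -1);    // 200 OK; -1 means "no response body"
            exchange.close();
        });
        server.start();
        try {
            HttpClient client = HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/task");
            HttpRequest request = HttpRequest.newBuilder(uri)
                    .POST(HttpRequest.BodyPublishers.ofString("do-work"))
                    .build();
            // The transaction is complete only once this response (even an empty one) arrives.
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) {
        System.out.println("status: " + roundTrip());
    }
}
```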
HTTP REQUEST STRUCTURE
• Let's first focus on the structure of the HTTP request
sent from the client to the server.
• Every HTTP request consists of these five parts.
• The first part is the method, which gives the server a hint
about the action it needs to take. Then comes the relative
path, which helps in routing the request to the
appropriate request handler in our code. After that comes
the HTTP protocol version, which is very important in how
the communication is managed between the two entities.
• Later, we have the optional set of HTTP headers, which are
just key-value pairs containing additional information
about the content of the message or the connection
between the peers. Finally, we have the optional
message body, which can be in almost any format or size.
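To make the five parts concrete, this Java sketch assembles the text of an HTTP/1.1 request by hand. It's for illustration only (in practice an HTTP client library does this); the class name, path, and header values are hypothetical. The Content-Length header it adds tells the receiver where the body ends in the TCP byte stream.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class RawRequestDemo {

    /** Assembles the five parts of an HTTP/1.1 request into its text form. */
    static String build(String method, String path, Map<String, String> headers, String body) {
        StringBuilder request = new StringBuilder();
        // Parts 1-3: method, relative path, and protocol version form the request line.
        request.append(method).append(' ').append(path).append(" HTTP/1.1\r\n");
        // Part 4: optional headers, one "Key: value" pair per line.
        headers.forEach((key, value) -> request.append(key).append(": ").append(value).append("\r\n"));
        if (!body.isEmpty()) {
            request.append("Content-Length: ")
                   .append(body.getBytes(StandardCharsets.UTF_8).length).append("\r\n");
        }
        // An empty line separates the headers from part 5: the optional body.
        request.append("\r\n").append(body);
        return request.toString();
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Host", "10.0.2.7:8080");      // hypothetical node address
        headers.put("Content-Type", "text/plain");
        System.out.print(build("POST", "/task", headers, "compute:42"));
    }
}
```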
• The HTTP method can be any of a set of standard methods,
such as GET, POST, PUT, and DELETE.
• Each of those methods has precise semantics
and definitions, and it's particularly important to
adhere to those semantics
if we intend to communicate with entities
outside of our cluster.
• We're going to primarily
focus on two of those methods, which are the
most common when communicating within a
distributed system: GET and
POST. The GET method handler implementation
should have these two properties:
• It's safe, which means the only action a server
receiving the GET request should take is to
retrieve some data or report its internal state,
with no side effects, just like a simple getter
method in Java.
• The GET method should also be idempotent, which means that performing this operation N times gives the same
result or has the same effect as performing it only once.
• The GET request typically does not contain a message body, which is in line with the purpose of the request.
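The difference between a safe, idempotent GET handler and a POST handler with side effects can be sketched without any networking. The `WorkerNode` class and its two handlers below are hypothetical stand-ins for HTTP request handlers on a cluster node, written in Java; the "computation" is just the body's length.

```java
import java.util.concurrent.atomic.AtomicInteger;

class MethodSemanticsDemo {

    /** Hypothetical worker-node state exposed through two HTTP-style handlers. */
    static final class WorkerNode {
        private final AtomicInteger completedTasks = new AtomicInteger();

        /** GET /status: safe (only reads state) and idempotent (repeatable, same effect). */
        String handleGet() {
            return "completedTasks=" + completedTasks.get();
        }

        /** POST /task: takes a body and has a side effect (updates the node's state). */
        String handlePost(String body) {
            int result = body.length();       // stand-in for a real computation
            completedTasks.incrementAndGet(); // side effect: every call changes state
            return "result=" + result;
        }
    }

    public static void main(String[] args) {
        WorkerNode node = new WorkerNode();
        System.out.println(node.handleGet());         // completedTasks=0
        System.out.println(node.handleGet());         // completedTasks=0 (no side effects)
        System.out.println(node.handlePost("abcde")); // result=5
        System.out.println(node.handleGet());         // completedTasks=1
    }
}
```

Calling the GET handler any number of times leaves the node unchanged, while each POST changes it; a health-check service can therefore poll with GET freely.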
• A GET request can be used, for example, by a service that periodically sends a request to check the health of each
node in the cluster; if one of those nodes stops responding, the health check service may restart the faulty node
and bring it back to life.
• Another use case can be data retrieval that spans multiple microservices. For example, a
user can send a GET request from the browser to a front-facing service, requesting data about all the purchases they
made in the last month. That service, instead of having to talk to different databases, which all have their own
protocols and structures, can simply send an HTTP GET request to the users microservice, which completely
abstracts away the users' data storage. Then it can make a similar HTTP GET request to the purchases microservice,
which may use a completely different type of database. In the end, the user-facing service would aggregate
the data from both services, build up the visual representation of the data, and send it back to the user. This way,
each service is concerned with its own logic, and all the communication is done through simple HTTP requests.
• The POST method is almost the opposite of the GET method. Unlike the GET request, it contains a message body. The
operation performed by the POST request can have side effects, and specifically in the case of distributed systems,
we expect the response to carry the result of a complex computation. Because POST requests can
carry a message body, they will be more useful for us to send messages between different nodes.
• The next part of the request is the relative path. The relative path is the part of the URL that comes after the host and
port. This makes it easy for our application to decide what part of the code needs to handle this particular request. As
part of the relative path, we may also provide a query string, which may contain a few helpful parameters to perform the operation.
• Going back to the distributed data retrieval
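The relative path and query string can be pulled out of a request URL with `java.net.URI`. In this Java sketch, the `/purchases` path, the `userId` and `lastDays` parameters, and the class name are hypothetical, loosely modeled on the purchases example; a handler would use the path to pick the code to run and the parameters to perform the operation.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class RelativePathDemo {

    /** Returns the path used to route the request, e.g. "/purchases". */
    static String pathOf(String url) {
        return URI.create(url).getPath();
    }

    /** Parses "key=value&key2=value2" into a map, decoding percent-escapes. */
    static Map<String, String> queryOf(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        String rawQuery = URI.create(url).getRawQuery();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params; // no query string: no parameters
        }
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    public static void main(String[] args) {
        String url = "http://10.0.2.7:8080/purchases?userId=17&lastDays=30"; // hypothetical
        System.out.println(pathOf(url));  // /purchases
        System.out.println(queryOf(url)); // {userId=17, lastDays=30}
    }
}
```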
