Chapter-5 MSC Cs
Distributed System:
A distributed operating system (DOS) manages a collection of independent computers and presents them to users as a single system. Distributed
systems use many central processors to serve multiple real-time applications and users. As a
result, data-processing jobs are distributed among the processors.
Client-Server System
This type of system requires the client to request a resource, after which the
server gives the requested resource. When a client connects to a server, the
server may serve multiple clients at the same time.
Client-Server Systems are also referred to as "Tightly Coupled Operating
Systems". This model is primarily intended for multiprocessors and
homogeneous multicomputers. Client-Server Systems function as a centralized
server, since the server handles all requests issued by client systems.
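The request/response pattern described above can be sketched with a small socket program (a hypothetical example; names such as `serve_once` and the echo-style "resource" reply are illustrative, not a real protocol). The server handles each client in its own thread, so it can serve several clients at the same time:

```python
# Minimal client-server sketch: the client requests a resource, the
# server provides it. Each client is handled in its own thread.
import socket
import threading

def handle_client(conn):
    # Read one request and send back the "resource" (here, an echo).
    with conn:
        data = conn.recv(1024)
        conn.sendall(b"resource: " + data)

def run_server(host="127.0.0.1", port=0):
    # port=0 asks the OS for a free port.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, port))
    srv.listen()
    return srv

def serve_once(srv):
    # Accept one connection and hand it to a worker thread.
    conn, _addr = srv.accept()
    threading.Thread(target=handle_client, args=(conn,)).start()

def request(host, port, payload):
    # Client side: connect, send a request, wait for the response.
    with socket.create_connection((host, port)) as c:
        c.sendall(payload)
        return c.recv(1024)
```

In a real deployment the server would loop over `serve_once` forever; one accept is enough to show the pattern.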
Peer-to-Peer System
The nodes play an important role in this system. The task is evenly distributed
among the nodes. Additionally, these nodes can share data and resources as
needed. Once again, they require a network to connect.
The Peer-to-Peer System is known as a "Loosely Coupled System". This concept
is used in computer network applications since they contain a large number of
processors that do not share memory or clocks. Each processor has its own local
memory, and they interact with one another via a variety of communication
methods like telephone lines or high-speed buses.
Middleware
Middleware enables the interoperability of applications running on different
operating systems. Using these services, those programs can exchange data
with one another.
Three-tier
The information about the client is saved in the intermediate tier rather than in
the client, which simplifies development. This type of architecture is most
commonly used in online applications.
N-tier
When a server or application has to transmit requests to other enterprise services
on the network, n-tier systems are used.
Network Structure:
A network consists of nodes and links that form dyads, triads, groups, or a larger system of interconnected
actors and objects, depending on the specific types of relationships between them. In a
dyad, the two nodes are tied together by the type of relation, creating a system of interdependence.
There are basically two types of networks: local-area networks (LAN) and wide-area networks
(WAN). The main difference between the two is the way in which they are geographically distributed.
Local-Area Networks
Local-area networks emerged in the early 1970s as a substitute for large mainframe computer
systems. For many enterprises, it is more economical to have a number of small computers,
each with its own self-contained applications, than to have a single large system. Because
each small computer is likely to need a full complement of peripheral devices (such as disks
and printers), and because some form of data sharing is likely to occur in a single enterprise,
it was a natural step to connect these small systems into a network.
Wide-Area Networks
Wide-area networks emerged in the late 1960s, mainly as an academic research project to
provide efficient communication among sites, allowing hardware and software to be shared
conveniently and economically by a wide community of users. The first WAN to be
designed and developed was the Arpanet. Begun in 1968, the Arpanet has grown from a four-
site experimental network to a worldwide network of networks, the Internet, comprising
millions of computer systems.
Network Topology:
A network topology is the physical and logical arrangement of nodes and
connections in a network. Nodes usually include devices such as switches,
routers and software with switch and router features. Network topologies
are often represented as a graph.
Types of Network Topology:
Point to Point Topology
Mesh Topology
Star Topology
Bus Topology
Ring Topology
Tree Topology
Hybrid Topology
Mesh Topology
In Mesh Topology, every device is connected to every other device via a dedicated
point-to-point link. For N devices, N(N-1)/2 links are required, which makes the
topology robust but expensive to cable.
Star Topology
In Star Topology, all the devices are connected to a single hub through a cable. This
hub is the central node, and all other nodes are connected to it. The hub can be
passive in nature, i.e., a non-intelligent broadcasting device, or it can be intelligent,
in which case it is known as an active hub. Active hubs have repeaters in them.
Coaxial cables or RJ-45 cables are used to connect the computers. In Star Topology,
popular Ethernet LAN protocols such as CSMA/CD (Carrier Sense Multiple Access
with Collision Detection) are used.
Bus Topology
Bus Topology is a network type in which every computer and network device is
connected to a single cable. It is bi-directional. It is a multi-point connection and a
non-robust topology, because if the backbone fails the whole topology crashes. In
Bus Topology, various MAC (Media Access Control) protocols are followed by LAN
Ethernet connections, such as TDMA, Pure Aloha, Slotted Aloha, and CSMA/CD.
Advantages of Bus Topology
If N devices are connected in a bus topology, only one cable, known as the
backbone cable, is required to connect them, along with N drop lines.
Coaxial or twisted-pair cables are mainly used in bus-based networks,
supporting up to 10 Mbps.
The cost of the cable is lower than in other topologies, but it is typically
used to build small networks.
Bus topology is a familiar technology, so installation and troubleshooting
techniques are well known.
CSMA is the most common access method for this type of topology.
Drawbacks of Bus Topology
A bus topology is quite simple, but it still requires a lot of cabling.
If the common cable fails, the whole system crashes.
Heavy network traffic increases collisions in the network. To avoid this,
various MAC-layer protocols such as Pure Aloha, Slotted Aloha, and
CSMA/CD are used.
Adding new devices slows down the network.
Security is very low.
A common example of bus topology is the Ethernet LAN, where all devices are
connected to a single coaxial cable or twisted pair cable. This topology is also used
in cable television networks.
Ring Topology
A Ring Topology forms a ring in which each device is connected to exactly two
neighboring devices. A number of repeaters are used in ring topologies with a large
number of nodes: if someone wants to send data to the last node in a ring of 100
nodes, the data has to pass through 99 nodes to reach the 100th node. Hence, to
prevent data loss, repeaters are used in the network.
Data flows in one direction, i.e., it is unidirectional, but it can be made
bidirectional by having two connections between each network node; this is called
Dual Ring Topology. In Ring Topology, a token-passing protocol (as in Token Ring)
is used by the workstations to transmit the data.
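The unidirectional data flow described above can be sketched with a toy routing function (hypothetical node numbering, starting at 0): a message travels from each node to its single successor until it reaches the destination.

```python
# Toy model of data flow in a unidirectional ring of n_nodes nodes.
def hops_in_ring(n_nodes, src, dst):
    # Data travels one direction only: src -> src+1 -> ... -> dst,
    # wrapping around at the end of the ring.
    return (dst - src) % n_nodes

def route(n_nodes, src, dst):
    # The ordered list of nodes the message visits after leaving src.
    path = []
    node = src
    while node != dst:
        node = (node + 1) % n_nodes
        path.append(node)
    return path
```

With 100 nodes, sending from node 0 to node 99 takes 99 hops, matching the example in the text.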
Tree Topology
In Tree Topology, the nodes are arranged in a hierarchy: a central root node is
connected to one or more second-level nodes, which in turn connect to nodes at
lower levels. It combines characteristics of bus and star topologies.
Hybrid Topology
Hybrid Topology is a combination of the various types of topologies described
above. It is used when the constituent networks are free to take any form: the
individual parts may be Ring or Star topologies, or combinations of the types seen
above. Each individual topology uses the protocols discussed earlier.
Communication Protocols:
Communication protocol is a system of rules that allows two or more entities of a
communications system to transmit information via any kind of variation of a
physical quantity.
There are 3 types of communication that occur across services that help us in
building quality scalable systems:
1. Synchronous Communication
2. Asynchronous Communication
3. Message-based Communication
1. Synchronous Communication
Synchronous communication is a form of communication in which two or more
parties exchange information in real time, from beginning to end.
An example of synchronous communication, or a blocking call, is when the client
makes a request and waits until it is fulfilled.
Consider the three steps below:
1. Data is fetched from the database.
2. The data received is updated.
3. The updated data is returned.
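The three blocking steps can be sketched as one synchronous function (a minimal sketch; the in-memory `FAKE_DB` dictionary and the `+ 10` update are stand-ins for a real database and real business logic):

```python
# Synchronous (blocking) request handling: the caller cannot proceed
# until fetch, update, and return all complete.
FAKE_DB = {"balance": 100}   # hypothetical in-memory "database"

def fetch(key):
    return FAKE_DB[key]      # 1. data is fetched from the database

def update(value):
    return value + 10        # 2. the data received is updated

def handle_request(key):
    data = fetch(key)
    new_value = update(data)
    FAKE_DB[key] = new_value
    return new_value         # 3. the updated data is returned
```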
Message-based Communication
In message-based communication, the parties do not call each other directly;
instead they exchange discrete messages, typically through a queue or broker,
which decouples the sender from the receiver.

The synchronous and asynchronous styles can be contrasted as follows:

Synchronous Communication:
Costly.
Users have to wait until the transmission is complete before getting a
response back from the server.

Asynchronous Communication:
Economical.
Users do not have to wait for the completion of transmission in order to get
a response from the server.
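For contrast, the asynchronous style can be sketched with a future: the client submits the request without blocking, keeps doing other work, and collects the result later (`slow_server` is an invented stand-in for a remote service):

```python
# Asynchronous request: submit the call, keep working, collect later.
from concurrent.futures import ThreadPoolExecutor
import time

def slow_server(x):
    time.sleep(0.05)   # stands in for transmission time
    return x * 2

def async_client():
    with ThreadPoolExecutor() as pool:
        future = pool.submit(slow_server, 21)  # does not block
        other_work = "done while waiting"      # client keeps going
        return other_work, future.result()     # collect the reply later
```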
Robustness:
A distributed system may suffer from various types of hardware failure. The failure of a link, the
failure of a site, and the loss of a message are the most common types. To ensure that the system
is robust, we must detect any of these failures, reconfigure the system so that computation can
continue, and recover when a site or a link is repaired.
Failure Detection
In an environment with no shared memory, we are generally unable to differentiate among link
failure, site failure, and message loss. We can usually detect only that one of these failures has
occurred. Once a failure has been detected, appropriate action must be taken. What action is
appropriate depends on the particular application.
Reconfiguration
Suppose that site A has discovered, through the mechanism described in the previous section, that a
failure has occurred. It must then initiate a procedure that will allow the system to reconfigure and to
continue its normal mode of operation.
• If a direct link from A to B has failed, this information must be broadcast to every site in the
system, so that the various routing tables can be updated accordingly.
• If the system believes that a site has failed (because that site can no longer be reached), then all
sites in the system must be so notified, so that they will no longer attempt to use the services of the
failed site. The failure of a site that serves as a central coordinator for some activity (such as
deadlock detection) requires the election of a new coordinator. Similarly, if the failed site is part of a
logical ring, then a new logical ring must be constructed.
Design Issues:
Design issues of the distributed system –
1. Heterogeneity: Heterogeneity is applied to the network, computer
hardware, operating system, and implementation of different developers.
A key component of the heterogeneous distributed system client-server
environment is middleware. Middleware is a set of services that enables
applications and end-user to interact with each other across a
heterogeneous distributed system.
2. Openness: The openness of the distributed system is determined primarily
by the degree to which new resource-sharing services can be made
available to the users. Open systems are characterized by the fact that their
key interfaces are published. It is based on a uniform communication
mechanism and published interface for access to shared resources. It can
be constructed from heterogeneous hardware and software.
3. Scalability: The scalability of the system should remain efficient even
with a significant increase in the number of users and resources connected.
It shouldn’t matter if a program has 10 or 100 nodes; performance
shouldn’t vary. A distributed system’s scaling requires consideration of a
number of elements, including size, geography, and management.
4. Security: The security of an information system has three components:
confidentiality, integrity, and availability. Encryption protects shared
resources and keeps sensitive information secret when it is transmitted.
5. Failure Handling: When faults occur in hardware or software, programs
may produce incorrect results or may stop before completing the intended
computation, so corrective measures should be implemented to handle such
cases. Failure handling is difficult in distributed systems because failures
are partial, i.e., some components fail while others continue to function.
6. Concurrency: There is a possibility that several clients will attempt to
access a shared resource at the same time. Multiple users make requests on
the same resources, i.e. read, write, and update. Each resource must be
safe in a concurrent environment. Any object that represents a shared
resource in a distributed system must ensure that it operates correctly in a
concurrent environment.
7. Transparency: Transparency ensures that the distributed system is
perceived as a single entity by users and application programmers, rather
than as a collection of cooperating autonomous systems. The user should be
unaware of where services are located, and the transfer from a local
machine to a remote one should be transparent.
The following are the various kinds of transparency that exist in distributed systems:
Access Transparency
Location Transparency
Concurrency Transparency
Replication Transparency
Failure Transparency
Mobility Transparency
Performance Transparency
Scaling Transparency
Parallelism Transparency
1. Access Transparency: Access Transparency allows the same operations to be
used to access both local and remote resources. The distribution of files must be
hidden from the clients: data may be stored on physically separate servers, yet a
common set of operations should be available for accessing both remote and local
files. Applications written for local files should be able to run on remote files as
well.
Examples – The File system in Network File System (NFS), SQL queries, and web
navigation exhibits the feature of access transparency.
2. Location Transparency: Location Transparency permits access to resources
regardless of their physical or network location. Clients should see a consistent file
namespace, and files should be able to move without their pathnames being
affected. A location-transparent name carries no information about the object's
physical location. This is a vital feature for facilitating resource movement and
service availability. Location and Access Transparency together make up network
transparency.
Examples – NFS file system and the pages of the web.
3. Concurrency Transparency: Concurrency Transparency permits many processes
to run in parallel using shared resources without interfering with one another. As we
know distributed systems exhibit concurrent environments so the shareable items are
all accessed at the same time. It is hard to control Concurrency and implementation.
Example – Automatic Teller Machine (ATM) network.
4. Replication Transparency: Replication Transparency ensures that multiple
instances of a resource can exist to improve reliability and performance, without
the user having to know about the replication. This type of transparency applies
primarily to distributed file systems, where data is replicated across two or more
sites for increased reliability. The existence of a mirrored copy of the data must be
unknown to the client.
Example- Distributed DBMS (Database Management System).
5. Failure Transparency: Failure Transparency permits fault abstraction in the
background, allowing users and application programs to execute tasks even when
hardware and software components fail. The fault tolerance property is exhibited by
the procedures that deal with access transparency. The main concern in the
distributed system is that they are more prone to failure since any of the components
could fail, resulting in a degraded or non-existent/unavailable service. It is quite
difficult to tell the difference between a failed and a slow-running operation since
the complexities are hidden.
Examples – Database Management Systems (DBMS).
6. Mobility (Migration) Transparency: Mobility Transparency lets a system or
resources move around without disrupting user or software processes. It also
bolsters the load balancing of any client that may be overburdened.
Examples – Network File System (NFS) and Web pages.
7. Performance Transparency: Performance Transparency enables system
reconfiguration to increase or enhance performance.
8. Scaling (Size) Transparency: Scaling Transparency enables systems and
applications to scale up without requiring changes to the system architecture or
application techniques. The resources that are not of any relevance to the user are
hidden from the user and application programmers. The smooth development and
evolution with this type of transparency are critical for most businesses. A system
should be able to scale down to small surroundings when necessary, as well as be
space and/or time-efficient when required.
Example – World Wide Web (WWW)
9. Parallelism Transparency: Parallelism Transparency enables parallel activities
to run without users knowing how, where, and when the systems make this
possible.
Degree of Transparency:
The degree of transparency refers to how much of the system's distribution is
hidden from users; full transparency is not always desirable, since hiding
everything can cost performance.

Stateless Protocol:
Does not require the server to retain server information or session details.
Easy to implement on the Internet.
Requests are self-contained and do not depend on state held on the server side.

Stateful Protocol:
Requires the server to save status and session information.
Logically heavy to implement on the Internet.
Requests always depend on state held on the server side.
File Replication:
Replication is the practice of keeping several copies of data in different places.
Types of Replication
Active Replication
Passive Replication
Active Replication:
In active replication, the client request goes to all the replicas at the same time.
Each replica processes the request in the same order and returns a response, so no
takeover step is needed when a replica fails.
Passive Replication:
The client request goes to the primary replica, also called the main replica.
There are more replicas that act as backup for the primary replica.
Primary replica informs all other backup replicas about any modification
done.
The response is returned to the client by a primary replica.
Periodically primary replica sends some signal to backup replicas to let
them know that it is working perfectly fine.
In case of failure of the primary replica, a backup replica becomes the
primary replica.
Advantages:
Resource consumption is lower, as backup servers only come into play
when the primary server fails.
The time complexity is also lower, since there is no need to update all the
replica nodes, unlike in active replication.
Disadvantages:
If some failure occurs, the response time is delayed.
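The passive-replication steps above can be sketched as a small primary-backup group (an illustrative sketch; the class and method names are invented for this example):

```python
# Primary-backup (passive) replication: writes go to the primary, which
# informs the backups; on primary failure a backup takes over with the
# replicated state intact.
class Replica:
    def __init__(self):
        self.state = {}
        self.alive = True

class PassiveGroup:
    def __init__(self, n_backups=2):
        self.primary = Replica()
        self.backups = [Replica() for _ in range(n_backups)]

    def write(self, key, value):
        # Client request goes to the primary replica only.
        self.primary.state[key] = value
        # Primary informs all backup replicas about the modification.
        for b in self.backups:
            b.state[key] = value
        return "ok"   # response is returned to the client by the primary

    def fail_primary(self):
        # On failure, a backup replica becomes the new primary.
        self.primary.alive = False
        self.primary = self.backups.pop(0)
```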
Distributed Coordination:
Event Ordering:
Event ordering is essential in distributed systems because it determines the order in which events
occur. In a distributed system, events can happen concurrently on different nodes, and
maintaining a consistent order of these events is crucial for the proper functioning of the system.
A typical example is a bank transaction system, where the system must ensure that withdrawals
and deposits are processed in the correct order. If the system does not maintain proper ordering,
inconsistencies such as double withdrawals or deposits could result.
Total ordering
Partial ordering
Total Ordering
Total ordering guarantees that all events in the system occur in the same order on all nodes. In
other words, there is a total order of events that is consistent across all nodes in the system. Total
ordering is critical in systems where the order of events matters, such as financial transactions,
where the order of events determines the account balance.
Partial Ordering
Partial ordering only guarantees that some events are ordered with respect to each other. In other
words, there is no guarantee that all events will be ordered in the same way on all nodes in the
system. Partial ordering is useful in systems where the order of events is not critical, such as
social media applications, where the exact order of posts is not crucial.
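Partial (causal) ordering is commonly implemented with Lamport's logical clocks. A minimal sketch: each node increments a counter on local and send events, and on receipt jumps ahead of the sender's timestamp, so a message's send always gets a smaller timestamp than its receipt.

```python
# Lamport logical clock: orders causally related events across nodes.
class LamportClock:
    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1
        return self.time

    def send(self):
        self.time += 1
        return self.time           # timestamp carried by the message

    def receive(self, msg_time):
        # Jump past the sender's timestamp, then tick.
        self.time = max(self.time, msg_time) + 1
        return self.time
```

Note that two concurrent events on different nodes may still get equal timestamps; Lamport clocks give a partial, not total, order.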
Mutual Exclusion:
Mutual exclusion is a concurrency-control property introduced to prevent race
conditions. It is the requirement that a process cannot enter its critical section
while another concurrent process is executing in its critical section, i.e., only one
process is allowed to execute the critical section at any given instant of time.
Requirements of Mutual exclusion Algorithm:
No Deadlock: Two or more sites should not endlessly wait for messages
that will never arrive.
No Starvation: Every site that wants to execute the critical section should
get an opportunity to do so in finite time. No site should wait indefinitely
to execute the critical section while other sites repeatedly execute it.
Fairness: Each site should get a fair chance to execute the critical section.
Requests to execute the critical section must be served in the order they
are made, i.e., in the order of their arrival in the system.
Fault Tolerance: In case of a failure, the algorithm should be able to
recognize it by itself and continue functioning without disruption.
1. Token Based Algorithm:
A unique token is shared among all the sites.
A site is allowed to enter its critical section only if it possesses the unique
token.
This approach uses sequence numbers to order requests for the critical
section.
Each request for the critical section contains a sequence number, which is
used to distinguish old requests from current ones.
This approach ensures mutual exclusion, since the token is unique.
Example: Suzuki–Kasami Algorithm
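The token-based idea can be sketched in a few lines (a simplified toy, not the full Suzuki–Kasami algorithm; the class and method names are invented): only the holder of the unique token may enter the critical section, and each request carries a fresh sequence number.

```python
# Toy token-based mutual exclusion: one unique token among all sites.
class Site:
    def __init__(self, site_id, has_token=False):
        self.site_id = site_id
        self.has_token = has_token
        self.seq = 0             # sequence number of this site's requests

    def request_cs(self):
        self.seq += 1            # fresh number distinguishes this request
        return (self.site_id, self.seq)

    def enter_cs(self):
        return self.has_token    # allowed only while holding the token

    def pass_token(self, other):
        # The token moves; at most one site holds it at any instant.
        assert self.has_token
        self.has_token = False
        other.has_token = True
```

Because exactly one site holds the token at any time, mutual exclusion follows directly from token uniqueness.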
2. Non-token based approach:
A site communicates with other sites to determine which site should
execute the critical section next. This requires the exchange of two or
more successive rounds of messages among sites.
This approach uses timestamps instead of sequence numbers to order
requests for the critical section.
Whenever a site makes a request for the critical section, it gets a
timestamp. The timestamp is also used to resolve any conflicts between
critical-section requests.
All algorithms that follow the non-token-based approach maintain a
logical clock, updated according to Lamport's scheme.
Example: Ricart–Agrawala Algorithm
3. Quorum based approach:
Instead of requesting permission to execute the critical section from all
other sites, each site requests permission only from a subset of sites,
called a quorum.
Any two quorums contain at least one common site.
This common site is responsible for ensuring mutual exclusion.
Example: Maekawa's Algorithm
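One standard way to build such quorums (used in Maekawa-style schemes) is a grid: arrange the N sites in a √N × √N grid and let each site's quorum be its own row plus its own column. Any two quorums then share at least one site, as required. A sketch, assuming N is a perfect square:

```python
# Grid quorums: each site's quorum is its grid row plus its grid column.
import math

def grid_quorum(site, n):
    k = math.isqrt(n)
    assert k * k == n, "sketch assumes N is a perfect square"
    row, col = divmod(site, k)
    row_sites = {row * k + c for c in range(k)}   # the whole row
    col_sites = {r * k + col for r in range(k)}   # the whole column
    return row_sites | col_sites
```

For N = 9 each quorum has 2√N - 1 = 5 sites, far fewer than asking all 8 other sites, yet any two quorums intersect (site i's row crosses site j's column).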
DeadLock Handling:
Deadlock is a situation where a process or a set of processes is blocked, waiting for
some other resource that is held by some other waiting process. It is an undesirable
state of the system.
Methods of handling deadlocks:
1. Deadlock Prevention
2. Deadlock avoidance (Banker's Algorithm)
3. Deadlock detection & recovery
4. Deadlock Ignorance (Ostrich Method)
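Deadlock avoidance (method 2) rests on the safety check at the heart of the Banker's Algorithm: a request is granted only if some ordering lets every process finish with the resources that would remain available. A minimal sketch of that check:

```python
# Banker's Algorithm safety check: the state is safe if every process
# can finish in some order, each releasing its allocation when done.
def is_safe(available, allocation, need):
    work = list(available)
    finished = [False] * len(allocation)
    progress = True
    while progress:
        progress = False
        for i, done in enumerate(finished):
            if not done and all(n <= w for n, w in zip(need[i], work)):
                # Process i can finish; it releases what it holds.
                work = [w + a for w, a in zip(work, allocation[i])]
                finished[i] = True
                progress = True
    return all(finished)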
Reaching Agreement:
For a system to be reliable, we need a mechanism that allows a set of processes to agree on a
common value. Such an agreement may not take place, for several reasons. First, the
communication medium may be faulty, resulting in lost or garbled messages. Second, the
processes themselves may be faulty, resulting in unpredictable process behavior.
The best we can hope for in this case is that processes fail in a clean way, stopping their
execution without deviating from their normal execution pattern. In the worst case,
processes may send garbled or incorrect messages to other processes or even collaborate
with other failed processes in an attempt to destroy the integrity of the system.
The Byzantine generals problem provides an analogy for this situation. Several divisions of
the Byzantine army, each commanded by its own general, surround an enemy camp. The
Byzantine generals must reach agreement on whether or not to attack the enemy at dawn. It
is crucial that all generals agree, since an attack by only some of the divisions would result
in defeat.
The various divisions are geographically dispersed, and the generals can communicate with
one another only via messengers who run from camp to camp. The generals may not be able
to reach agreement for at least two major reasons:
1. Messengers may get caught by the enemy and thus may be unable to deliver their
messages. This situation corresponds to unreliable communication in a computer system and
is discussed further in Section 18.7.1.
2. Generals may be traitors, trying to prevent the loyal generals from reaching an agreement.
This situation corresponds to faulty processes in a computer system and is discussed further
in Section 18.7.2.
Unreliable Communications
Let us first assume that, if processes fail, they do so in a clean way and that the
communication medium is unreliable. Suppose that process Pi at site S1, which has sent a
message to process Pj at site S2, needs to know whether Pj has received the message, so that
it can decide how to proceed with its computation. For example, Pi may decide to compute a
function foo if Pj has received its message, or to compute a function boo if Pj has not
received the message (because of some hardware failure).
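One common remedy is a timeout-and-retransmission scheme: Pi resends its message until an acknowledgement from Pj arrives, and only then decides how to proceed. A toy sketch (the lossy channel is simulated, and `send_with_ack` / `lossy_channel` are invented names for this example):

```python
# Retransmit until an acknowledgement arrives, or give up after a bound.
def send_with_ack(channel, message, max_retries=5):
    # channel(message) returns True if an ack came back, False if the
    # message (or its ack) was lost.
    for attempt in range(1, max_retries + 1):
        if channel(message):
            return attempt       # number of tries before the ack arrived
    raise TimeoutError("no acknowledgement received")

def lossy_channel(drop_first_n):
    # Simulated unreliable medium that loses the first n transmissions.
    state = {"calls": 0}
    def channel(message):
        state["calls"] += 1
        return state["calls"] > drop_first_n
    return channel
```

Note this only lets Pi learn that Pj received the message; as the following discussion shows, no finite protocol lets both sides be certain of each other's knowledge over an unreliable medium.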