Distributed Systems and Cloud
Computing
                                                      Examination Scheme
 Course                           Contact         Theory
              Course Name                                                   End
  Code                             Hours
                                                                           Sem.   Total
                                                              Total        Exam
                                            CA   MSE
                                                           (CA + MSE)
ME-MCA-3    Distributed Systems
                                    03      25   20            50           50    100
   15      and Cloud Computing
                               Syllabus and Course Outcomes
No.                          Module                                Course Outcomes(CO)
      Introduction to Distributed Computing Concepts:
                                                            CO1: Explain the architecture, design issues
 1    Inter Process Communication, Remote Communication     and communication mechanisms in distributed
                                                            systems
 2    Time & Coordination in Distributed Systems            CO2: Apply synchronization techniques and
                                                            shared memory management in distributed
 3    Distributed Shared Memory & Data Consistency          environments.
 4    Distributed System Management: Resource Management,   CO3: Analyze process and resource
      Process Management, Distributed File System           management in distributed systems.
 5    Cloud Computing Foundations & Architectures - Cloud
      Computing Architecture
                                                            CO4: Analyze Cloud computing and cloud
 6    Cloud Platforms, Security & Challenges - Cloud        models
      Service Providers, Implementation, Security and
      Compliance
Reference Books
● Pradeep K. Sinha , Distributed Operating System: Concepts and Design
● Dr. Sunita Mahajan , Seema Shah, Distributed Computing
Module 1 - Introduction to Distributed
       Computing Concepts
     Module 1 - Introduction to Distributed Computing
                         Concepts
No.                                                 Topics
        Introduction to Distributed Computing Concepts:
        Basic concepts of distributed systems, distributed computing models, issues in designing
 1
        distributed systems. CAP Theorem: Trade-offs between Consistency, Availability, and Partition
        Tolerance. Scalability & Fault Tolerance in Distributed Systems
        Inter Process Communication:
 2      Fundamental concepts related to inter process communication including message passing
        mechanism, Concepts of group communication
 3      Remote Communication:
        Remote Procedure Call (RPC), Remote Method Invocation (RMI)
                 Basic Terminology of Computing
●   Uses of a network
    — Information  sharing
    — Resource sharing
    — Higher reliability
    — All of above
●   Response time is
    — Time taken to complete task
    — Time taken to give first response to request
                  Basic Terminology of Computing
●   Centralized Multi user system will have
    —One server – multiple terminals
    —Multiple servers- multiple terminals
●   Throughput is
    —No. of tasks processed in per unit of time
    —Total no. of tasks executed
●   Extensibility of a system means
    —To expand the system over a network
    —Ability to modify or add more components
                   Basic Terminology of Computing
●   Fault tolerance means
    —System gives correct result in spite of wrong program
    —Continue operating properly even if components fail
●   Mobility means
    —Can access the system from any place
    —System can be accessed from any device
●   System availability means
    —System will not fail when it is needed
    —System will be ready to use all the time
Centralized Multi-user System
●   All processing and data storage is managed by a single, central server.
●   Most commonly used architecture, where a client sends a request to the
    company/organization server and receives the response.
Problems:
●   Single point of failure                    Network
●   Difficult to expand
Distributed Computing System
●   Distributed computing refers to a system where processing and data storage
    is distributed across multiple devices or systems, rather than being handled
    by a single central device.
●   A collection of independent computers that appears to its users as a single
    coherent system. (Andrew Tanenbaum).
Distributed Computing System
●   A collection of heterogeneous nodes connected by one or more interconnection
    networks which provides access to system-wide shared resources and services.
●   It is basically a collection of interconnected processors covering wide geographical
    area in which each processor has its own local memory and other peripherals.
●   The communication between any two processor takes place by message passing
    over communication network.
Example of Distributed Computing System
Distributed Computing System
●   A type of computing in which different components and objects comprising an
    application can be located on different computers connected to a network.
●   Ex, a word processing application might consist of an editor component on one
    computer, a spell-checker object on a second computer, and a thesaurus on a third
    computer. In some distributed computing systems, each of the three computers could
    even be running a different operating system.
●   The data used in a distributed processing environment is also distributed across
    platforms.
Distributed Applications
●   Automated banking systems
●   Tracking roaming cellular phones
●   Global positioning systems
●   Passenger reservation system: railways and airlines
●   Amazon, Flipkart
●   Air-traffic control
●   Avionics (fly-by-wire)
●   Research Institutions
●   The World Wide Web
Hardware Considerations
●   Architecture of interconnected multiple processors are of two types:
    ○   Tightly Coupled System
    ○   Loosely Coupled System
Parallel vs Distributed Architecture
Parallel vs Distributed Architecture
●   Tightly Coupled System (Shared Memory Architecture)
     ○ Single system wide primary memory.
     ○ Communication takes place through shared memory.
     ○ No. of systems are limited by bandwidth of shared memory.
     ○ Sometimes called Parallel Processing Systems.
●   Loosely Coupled System (Distributed Memory architecture)
     ○ Each processor has its own local memory.
     ○ Communication is done by passing message across the network.
     ○ Scalable - It can have unlimited number of processors.
     ○ Processors may be geographically separated.
     ○ It is known as Distributed Computing System.
                  Parallel Systems           Distributed Systems
Memory            Tightly coupled shared     Distributed memory
                  memory                     Message passing, RPC, and/or used of
                                             distributed shared memory
Control           Global clock control       No global clock control. Synchronization
                                             algorithms needed.
Processor         Order of Tbps              Order of Gbps
interconnection   Bus, mesh, tree, mesh of   Ethernet(bus), token ring and
                  tree, and hypercube        SCI (ring)
                  network
Main focus        Performance - Scientific   Performance - cost and scalability,
                  computing                  Reliability, Information/resource sharing
Network Operating System
Distributed Operating System
                 Network OS                                 Distributed OS
Single System    ●   User is aware of the fact that         ●    YES. Provides virtual uniprocessor image to
Image                multiple computers are being used.          the user. Ex. DOS          dynamically and
                     Ex.     Selection of machine for            automatically   allocates jobs to various
                     executing a job     is manual via           machines.
                     remote login.
                 ●   Has to know location of resource to    ●    Single set of globally valid system calls.
                     access it & use different system
                     calls to access local & remote
                     resource
Autonomy         ●   High. Local OS at each computer &      ●    Low. A single system-wide OS.
                     communicate       via     common       ●    Processes & resources managed globally
                     communication protocol. Shared file
                     system.
                 ●   Each      computer       functions
                     independently & manages its own
                     processes & resources.
Fault           Unavailability grows as faulty machines    Unavailability remains little even if fault machines
Tolerance       increase.                                  increase.
Distributed Computing System
            Models
1. Minicomputer Model
MiniComputer Model
●   Simple extension of centralized time-sharing system.
●   Consists of a few minicomputers interconnected by a communication
    network where each minicomputer usually has multiple users simultaneously
    logged on to it.
●   Several interactive terminals are connected to each minicomputer. Each user
    logs on to one specific minicomputer that has remote access to other
    minicomputers. Does not reflect uniprocessor image.
●   Network allows a user to access remote resources that are available on
    some machine other than the one on to which the user is currently logged.
●   Used for resource sharing with remote users.
●   Ex. ARPA Net
2. Workstation Model
Workstation Model
●   WorkStation
     ○ A powerful, single-user computer, like a personal computer, but has a
       more powerful microprocessor.
     ○ Each has its own local disk and a local file system – diskful workstation.
●   Process migration
     ○ Users first log on his/her personal workstation.
     ○ If there are idle remote workstations, one or more processes are
        migrated to one of them.
     ○ Result of execution migrated back to user’s workstation.
Workstation Model
●   Issues to be resolved:
     ○ How to find an idle workstation
     ○ How to migrate a job
     ○ What if a user logs on the remote machine executing process of another
        machine – run two processes simultaneously, kill remote process,
        migrate process back to its home workstation ?
●   Examples – Sprite System, Xerox PARC
3. Workstation-Server Model
Workstation-Server Model
●   Client WorkStation
     ○ Network of personal workstations each with its own disk and local file
         system.(diskful workstations).
     ○ Local disk of diskful workstation used for storage of temporary files, etc.
     ○ Workstation without local disk is called diskless workstation.
●   Server minicomputers
     ○ Each minicomputer is dedicated to one or more different types of services,
         for managing & providing access to to shared resources.
     ○ Multiple servers used for a service for better scalability and higher reliability.
●   User logs on to his machine. Normal computation activities carried at home
    workstation but services provided by special servers.
●   No process migration involved. Request Response Protocol implemented.
Workstation-Server Model
●   Advantages
     ○ Cheaper - few minicomputers vs. large no. of diskful workstations
     ○ Backup and hardware maintenance easier
     ○ Flexibility to access files from any file server.
     ○ No process migration
     ○ Guaranteed response time(no remote process execution)
●   Disadvantage
     ○ Does not exploit idle workstations
●   Client-Server model of communication
     ○ RPC (Remote Procedure Call) & RMI (Remote Method Invocation)
●   Most widely user model for building distributed Systems.
    Example: V system.
4. Processor-Pool Model
Processor-Pool Model
●   Based on the observation that most of the time a user does not need any computing
    power but once in a while the user may need a very large amount of computing power
    for a short time.
●   Processors (microcomputers & minicomputers ) are pooled together to be shared by
    users as needed.
●   Each processor has its own memory to load and run a system program or an
    application program of the DCS.
●   Servers- Necessary number of processors are allocated to each user from the pool by
    run server.
Processor-Pool Model
●   No concept of home machine. User logs on to system as whole.
●   Better utilization of processing power but less interactivity
●   Greater flexibility – processors can act as extra servers
●   Unsuitable for high performance interactive application as communication slow between
    processor & terminal
●   Example – Amoeba, Cambridge Distributed System
Hybrid Model
●   Combines advantage of both the workstation – server and processor - pool model
●   Based on workstation – server model with additional pool of processors
●   The processor in the pool can perform large computations
●   Workstation-server model can perform user interactive jobs.
●   Hybrid model is more expensive to implement.
Distributed Computing System Models
Think!!
Suppose a component of a distributed system suddenly crashes. How will this
inconvenience the users when one of the following happens:
1.   The system uses processor-pool model and crashed component is a processor in the
     pool.
2.   In processor-pool model, a user terminal crashes.
3.   The system uses a workstation-server model and server crashes.
4.   In the workstation-server model, one of the client crashes.
Issues - Answer!!!
●   Processor-Pool Model and Crashed Component is a Processor in the Pool
     ○   Reduced Processing Power
     ○   Task Reassignment
     ○   Load Balancing Issues
●   Processor-Pool Model and a User Terminal Crashes
     ○   Loss of Session Data
     ○   Reconnection Efforts.
     ○   Minimal Impact on Others.
●   Workstation-Server Model and Server Crashes
     ○   System-Wide Downtime.
     ○   Data Loss
     ○   Recovery Time.
●   Workstation-Server Model and a Client Crashes
     ○   Loss of Session Data
     ○   Minimal Impact on Server and Other Clients
     ○   Reconnection Required
Issues in Distributed Computing
            Systems.
Transparency
●   How to achieve the single-system image, i.e., how to make a collection of
    computers appear as a single computer.
●   8 transparencies identified -
     ○   Access Transparency
     ○   Location Transparency
     ○   Replication Transparency
     ○   Failure Transparency
     ○   Migration Transparency
     ○   Concurrency Transparency
     ○   Performance Transparency
     ○   Scaling Transparency
Transparency in a Distributed System
●   Access Transparency
     ○   Allow uniform access to resources(local or remote).
     ○   Use global set of system calls & global resource naming facility ( ex. URL).
●   Location Transparency
     ○   Hide where a resource is located. Require system-wide, global unique resource naming facility.
     ○   Name transparency – Name of resource should not reveal its physical location. Allow
         resource migration without changing name.
     ○   User Mobility – User should be able to freely log on to any machine in the system and access
         a resource with same name.
Transparency in a Distributed System
●   Replication Transparency
     ○ Naming of replicas – name various replicas & map user supplied name of resource
         to appropriate replica.
     ○ Replication control – How many replicas, where place them, when create/delete
         decision to be taken by system.
●   Migration Transparency
     ○ Movement of object is handled automatically by system & following issues are
         taken care of –
          ■ Migration decision made automatically by system.
          ■ Name of resource remains same on migration from one node to another
          ■ IPC ensures proper receipt of message by process, even if it further
              migrates.
Transparency in a Distributed System
●   Failure Transparency
     ○ Partial failure - masking partial failures in the system from the user. Ex. machine
          failure/storage crash
     ○ Complete failure - Not achievable with the current state of the art DOS. Ex. Failure
          of communication network often disrupts the work and is noticeable by the user.
●   Concurrency Transparency
     ○ Each user feels that he is sole user of system. Resource sharing mechanism
          should follow -
            ■ Event ordering property- All access requests to various system resources are
                properly ordered to provide a consistent view to all users
            ■ Mutual exclusion property – At any time at most one process accesses a
                shared resource, not to be used simultaneously by multiple processes.
            ■ No starvation property – Every process releases the resource eventually
            ■ No deadlock property
Transparency in a Distributed System
●   Performance Transparency
     ○ System is automatically configured to improve performance as per load
        varying in the system.
     ○ Processing capability of the system should be uniformly distributed
        among the current jobs available.
●   Scaling Transparency
     ○ System can expand in scale without disrupting activities of users
     ○ Open system architecture and scalable algorithms.
Reliability
●   Faults
     ○ Fail stop
          ■ system stops functioning after changing to a state in which failure
            is detected.
     ○ Byzantine failure
          ■ system continues to function but produces wrong result.
Reliability
●   Fault avoidance
     ○ Occurrence of faults is minimized by making components more reliable & testing
         thoroughly.
●   Fault tolerance
     ○ Redundancy techniques
           ■ K-fail stop failures needs K + 1 replicas
           ■ K-Byzantine failures needs 2K + 1 replicas.
     ○ Distributed control
           ■ Many of algorithms or protocols are use to avoid a single point of failure.
●   Fault detection and recovery
     ○ Atomic transaction - all operations execute or no effect occurs.
     ○ Stateless servers -
     ○ Acknowledge & timeout based retransmissions of messages
CAP Theorem
Introduction to CAP Theorem
●   The CAP Theorem (also known as Brewer’s Theorem) is a fundamental principle
    in distributed systems.
●   It states that in the event of a network partition, a distributed system can
    guarantee only two out of the following three properties:
     ○   Consistency
     ○   Availability
     ○   Partition Tolerance
●   This theorem was first introduced by Eric Brewer in 2000 and formally proved
    by Gilbert and Lynch in 2002.
●   It serves as a crucial guideline for architects and engineers when designing
    distributed databases and systems, especially in environments where
    network failures are possible or even likely.
What is CAP in CAP Theorem?
 Consistency                                         Availability
                                                   All the clients can
   Each client        CAP Theorem                   always read and
 always has the
  same view of                                     write , but it might
     data.                                          not be the most
                                                         recent.
                            Partition
                  The system continues cooperate
                      despite network failure.
What is 🧢 in CAP Theorem?
Each letter in CAP stands for a specific property:
Property              Description
                      Every read receives the most recent write or an error. All nodes see the same data at
Consistency (C)
                      the same time.
                      Every request receives a (non-error) response, without guarantee that it contains the
Availability (A)
                      most recent write.
Partition Tolerance   The system continues to operate despite arbitrary network partitioning or failures
(P)                   between nodes.
Consistency
●   Consistency defines that all clients see the same data simultaneously, no
    matter which node they connect to in a distributed system.
●   For eventual consistency, the guarantees are a bit loose. Eventual consistency
    guarantee means client will eventually see the same data on all the nodes at
    some point of time in the future.
●   All nodes in the system see the same data at the same time. This is because
    the nodes are constantly communicating with each other and sharing
    updates.
●   Any changes made to the data on one node are immediately propagated to all
    other nodes, ensuring that everyone has the same up-to-date information.
Availability
●   Availability defines that all non-failing nodes in a distributed system return a
    response for all read and write requests in a bounded amount of time, even if
    one or more other nodes are down.
●   User send requests, even though we don't see specific network components.
    This implies that the system is available and functioning.
●   Every request receives a response, whether successful or not. This is a crucial
    aspect of availability, as it guarantees that users always get feedback.
Partition Tolerance
●   Partition Tolerance defines that the system continues to operate despite
    arbitrary message loss or failure in parts of the system. Distributed systems
    guaranteeing partition tolerance can gracefully recover from partitions once
    the partition heals.
●   Addresses network failures, a common cause of partitions. It suggests that
    the system is designed to function even when parts of the network become
    unreachable.
●   The system can adapt to arbitrary partitioning, meaning it can handle
    unpredictable network failures without complete failure.
Trade-Offs in the CAP Theorem
 Trade-Offs in the CAP Theorem
 According to the CAP theorem, a distributed system can only guarantee two out of the three properties at
 the same time—not all three, especially during network partitions:
Model   Properties Guaranteed       Trade-off Example
                                    Not partition tolerant. Works only if the network is reliable. Rare in
 CA     Consistency, Availability
                                    distributed systems.
        Consistency, Partition      May sacrifice availability during a partition. Example: Some banking
 CP
        Tolerance                   systems, HBase, MongoDB.
        Availability, Partition     May sacrifice consistency during a partition. Example: NoSQL databases
 AP
        Tolerance                   like Cassandra, DynamoDB.
CA (Consistency + Availability)
●   The system guarantees that all nodes see the same data at the same time
    (consistency) and that every request receives a response (availability)—but
    only as long as the network is reliable and there are no partitions.
●   Design strategies:
     ○   Use a centralized data store or tightly coupled cluster, ensuring all updates are immediately
         visible to all nodes.
     ○   Master-slave replication or distributed locking can be used to prevent conflicting updates.
CA (Consistency + Availability) (cont.)
●   Limitations: If a network partition occurs, the system cannot guarantee both
    properties. It may become unavailable or inconsistent until the partition is
    resolved.
●   Use cases: Rare in large-scale distributed systems due to the inevitability of
    network partitions. More common in single-site databases or tightly
    controlled environments where network failures are extremely rare.
●   Example: Traditional relational databases deployed on a single server or
    within a single, highly reliable data center.
CP (Consistency + Partition Tolerance)
●   What it means: The system ensures that all nodes see the same data, even if
    some parts of the network cannot communicate (partition tolerance). To
    maintain this, the system may become unavailable to some requests during a
    partition.
●   Design strategies:
     ○   Use distributed consensus protocols (like Paxos or Raft) to ensure all nodes agree on the data
         state, even across partitions.
     ○   If a partition occurs, nodes may refuse to process requests that could compromise
         consistency, effectively sacrificing availability.
CP (Consistency + Partition Tolerance) (cont.)
●   Limitations: Users may experience downtime or errors during network
    failures, as the system prefers to deny requests rather than risk
    inconsistency.
●   Use cases: Critical systems where data accuracy is paramount, and
    temporary unavailability is preferable to serving incorrect data.
●   Example: Banking systems (to prevent double-spending), databases like
    MongoDB, HBase, and Redis in certain configurations.
AP (Availability + Partition Tolerance)
●   What it means: The system remains available and continues to operate even if
    network partitions occur (partition tolerance). However, it may serve stale or
    inconsistent data during partitions.
●   Design strategies:l
     ○   Use eventual consistency: updates are propagated to all nodes over time, but not necessarily
         immediately.
     ○   Implement conflict resolution mechanisms to reconcile differences when partitions heal.
AP (Availability + Partition Tolerance)
1.   Limitations: There is a window of time when different nodes may return
     different data, leading to temporary inconsistencies.
2.   Use cases: Applications where high availability is more important than
     immediate consistency, and users can tolerate seeing slightly outdated data.
3.   Example: Social media newsfeeds, shopping carts during browsing, NoSQL
     databases like Cassandra, DynamoDB, and CouchDB
System   Consistency   Availability   Partition   Typical Use Cases             Limitation During Partition
Type                                  Tolerance
                                                   Single-site databases,            Fails or becomes
CA           Yes            Yes           No
                                                        reliable LANs                   inconsistent
                                                  Banking, financial, critical         Some requests
CP           Yes            No            Yes
                                                           records                  denied/unavailable
                                                        Social media,                    May serve
AP           No             Yes           Yes
                                                   e-commerce browsing            stale/inconsistent data
Stateful vs. Stateless File Server
Stateful File Server
Stateless File Server
Flexibility - Monolithic vs. Micro Kernel
Flexibility
●   Choosing appropriate kernel
     ○ Monolithic kernel
        ■ Kernel where the entire operating system is working in the kernel
            space and in supervisor mode.
     ○ Micro kernel
        ■ Kernel is reduced to contain minimal facilities necessary, and the
            other system services are implemented as user level server
            processes. Each server process has its own address space & can be
            programmed separately.
        ■ Highly modular, thus easy to design, modify & add new services.
        ■ Requires message passing & context switching
Advantages of Micro kernel
●   Modular hence very flexible
●   Each server is an independent process having its own address space.
●   Simple & flexible.
●   Ease of design, maintenance & portability.
●   Requires message passing & context switching.
●   Slower than monolithic kernel.
Performance
●   Performance of a distributed system should be at least as good as
    centralized system.
●   Various performance metrics - response time, throughput, system
    utilization, network capacity utilization.
Design Issues to Increase Performance
●   Batch if possible
     ○ Transfer data across network in large chunks rather than as individual pages.
     ○ Piggybacking of acknowledgement.
●   Cache data on client side whenever possible
     ○ Reduces contention on centralized resources.
     ○ Saves computation time & network bandwidth.
●   Minimize copying of data
●   Minimize network traffic
     ○ Migrate process closer to its resource.
     ○ Cluster processes that communicate frequently.
●   Fine grain parallelism; simultaneously service requests from several clients. Use
    threads
Scalability
●   Capability of a system to adapt to increased service load without disruption
    of service or significant loss of performance.
     ○ Avoid centralized entities
     ○ Avoid centralized algorithms
     ○ Perform most operations on client workstations
 Concept                            Example
 Centralized Services               A single server for all users
 Centralized Data                   A single online telephone book
 Centralized Algorithms             Doing routing based on complete
                                    information
Scalability
●   Avoid centralized entities
     ○ Failure of centralized entity brings entire system down
     ○ Performance of centralized entity becomes system bottleneck
     ○ Capacity of network that connects centralized entity gets saturated
●   Avoid centralized algorithms
     ○ Collects information from all nodes, process it on single node, then distributes
         result to other nodes
●   Perform most operations on client workstation
     ○ Server is common resource for several clients.
Heterogeneity
●   Caused by interconnected sets of dissimilar hardware or software systems (Ex. different
    topologies, protocols, word lengths(32/64) etc).
●   Data and instruction formats depend on each machine architecture.
●   If a system consists of K different machine types, we need K–1 translation software's,
    at sender/receiver. Adding new data format becomes difficult.
●   Instead use intermediate standard data format.
Security
●   Difficult due to lack of a single point of control & use of insecure networks
    for data communication
●   Security concerns:
     ○ Messages may be stolen, plagiarized or changed by an intruder.
     ○ Ensure message received by intended receiver & sent by genuine
        sender.
●   Cryptography used for security.
Middleware
●   Middleware is an additional layer of software that is used in NOS to more or
    less hide the heterogeneity of the collection of underlying platforms but also
    to improve distribution transparency.
●   It offers a higher level of abstraction.
●   It is placed in the middle between applications & NOS.
Distributed System as Middleware
Distributed Computing Environment (DCE)
●   It is an integrated set of services and tools that can be installed as a
    coherent environment on top of existing OS and serve as a platform for
    building and running distributed application.
●   It runs on many different kinds of computers , OS , and network produced
    by different vendors.
●   It hides differences between machines by automatically performing data-
    type conversions, thus making heterogeneous nature of system transparent
    to application programmers.
DCE Based Distributed System
                      DCE Applications
                        DCE Software
               Operating system and networking
Questions
1.   List & explain various Distributed Computing System Models.
2.   Differentiate –Network, Distributed
3.   Difference between service, server, client. Discuss relative advantages/
     disadvantages of using single / multiple servers.
4.   Replication Transparency
5.   Explain CAP Theorem and it’s Trade-offs.
6.   Distributed Computing Environment – advantages
7.   How existence of multiple computers in a distributed system is made
     invisible & provide uniprocessor image.
8.   Short note – DOS, processor pool model
9.   Issues in designing DOS