RNS Institute of Technology, Bangalore – 98
BIS613D – Cloud Computing
For all VI semester Students
Module – 1
Distributed System Models and Enabling Technologies
RNSIT Vision and Mission
Vision
Building RNSIT into a World Class Institution
Mission
To impart high quality education in Engineering, Technology and Management
with a difference, enabling students to excel in their career by
Attracting quality Students and preparing them with a strong foundation in fundamentals
so as to achieve distinctions in various walks of life leading to outstanding contributions
Imparting value based, need based, choice based and skill based professional education
to the aspiring youth and carving them into disciplined, World class Professionals with
social responsibility
Promoting excellence in Teaching, Research and Consultancy that galvanizes academic
consciousness among Faculty and Students
Exposing Students to emerging frontiers of knowledge in various domains and make
them suitable for Industry, Entrepreneurship, Higher studies, and Research &
Development
Providing freedom of action and choice for all the Stakeholders with better visibility
Department of ISE
Vision
Building Information Technology Professionals by Imparting Quality Education and Inculcating
Key Competencies
Mission
Provide strong fundamentals through learner centric approach
Instil technical, interpersonal, interdisciplinary skills and logical thinking for holistic
development
Train to excel in higher education, research, and innovation with global perspective
Develop leadership and entrepreneurship qualities with societal responsibilities
Syllabus
Distributed System Models and Enabling Technologies: Scalable Computing Over the
Internet, Technologies for Network Based Systems, System Models for Distributed and
Cloud Computing, Software Environments for Distributed Systems and Clouds,
Performance, Security and Energy Efficiency.
Textbook 1: Chapter 1: 1.1 to 1.5
Module-01
Distributed System Models and Enabling Technologies
Scalable Computing Over the Internet
Evolution of Computing Technology
Over the last 60 years, computing has evolved through multiple platforms and
environments.
Shift from centralized computing to parallel and distributed systems.
Modern computing relies on data-intensive and network-centric architectures.
The Age of Internet Computing
High-performance computing (HPC) systems cater to large-scale computational
needs.
High-throughput computing (HTC) focuses on handling a high number of
simultaneous tasks.
Performance measurement is shifting from the Linpack Benchmark, which targets HPC, toward metrics suited to HTC systems.
Platform Evolution
First Generation (1950-1970): Mainframes like IBM 360 and CDC 6400.
Second Generation (1960-1980): Minicomputers like DEC PDP 11 and VAX.
Third Generation (1970-1990): Personal computers with VLSI microprocessors.
Fourth Generation (1980-2000): Portable and wireless computing devices.
Fifth Generation (1990-present): HPC and HTC systems in clusters, grids, and cloud
computing.
High-Performance Computing (HPC)
Focused on raw speed, measured in floating-point operations per second (FLOPS).
Used mainly in scientific, engineering, and industrial applications.
Limited to a small number of specialized users.
High-Throughput Computing (HTC)
Shift from HPC to HTC for market-oriented applications.
Used in Internet searches, web services, and enterprise computing.
Emphasis on cost reduction, energy efficiency, security, and reliability.
Emerging Computing Paradigms
Service-Oriented Architecture (SOA): Enables Web 2.0 services.
Virtualization: Key technology for cloud computing.
Internet of Things (IoT): Enabled by RFID, GPS, and sensor technologies.
Cloud Computing: Evolution of computing as a utility.
Computing Paradigm Distinctions
Centralized Computing: All resources in one system.
Parallel Computing: Processors work simultaneously in a shared-memory or
distributed-memory setup.
Distributed Computing: Multiple autonomous computers communicate over a
network.
Cloud Computing: Uses both centralized and distributed computing over data
centers.
Distributed System Families
Clusters: Homogeneous compute nodes working together.
Grids: Wide-area distributed computing infrastructures.
P2P Networks: Client machines globally distributed for file sharing and content
delivery.
Cloud Computing: Utilizes clusters, grids, and P2P technologies.
Future Computing Needs and Design Objectives
Efficiency: Maximizing parallelism, job throughput, and power efficiency.
Dependability: Ensuring reliability and Quality of Service (QoS).
Adaptation: Scaling to billions of requests over vast data sets.
Flexibility: Supporting both HPC (scientific/engineering) and HTC (business)
applications.
Scalable Computing Trends and New Paradigms
Computing Trends and Parallelism
Technological progress drives computing applications, as seen in Moore's law
(processor speed doubling every 18 months) and Gilder's law (network bandwidth
doubling yearly).
Commodity hardware advancements, driven by personal computing markets, have
influenced large-scale computing.
Degrees of Parallelism (DoP):
o Bit-level (BLP): Transition from 4-bit to 64-bit CPUs.
o Instruction-level (ILP): Techniques like pipelining, superscalar processing, and
multithreading.
o Data-level (DLP): SIMD and vector processing for efficient parallel execution.
o Task-level (TLP): Parallel tasks on multicore processors, though challenging
to program.
o Job-level (JLP): High-level parallelism in distributed systems, integrating fine-
grain parallelism.
Innovative Applications of Parallel and Distributed Systems
Transparency in data access, resource allocation, job execution, and failure recovery is
essential.
Application domains:
o Banking and finance: Distributed transaction processing and data consistency
challenges.
o Science, engineering, healthcare, and web services: Demand scalable and
reliable computing.
Challenges include network saturation, security threats, and lack of software
support.
Utility Computing and Cloud Adoption
Utility computing: Provides computing resources as a paid service (grid/cloud
platforms).
Cloud computing extends utility computing, leveraging distributed resources and
virtualized environments.
Challenges: Efficient processors, scalable memory/storage, distributed OS,
middleware, and new programming models.
Hype Cycle of Emerging Technologies
New technologies go through five stages:
o Innovation trigger → Peak of inflated expectations → Disillusionment →
Enlightenment → Productivity plateau.
2010 Predictions:
o Cloud computing was expected to mature in 2-5 years.
o 3D printing was 5-10 years from mainstream adoption.
o Mesh network sensors were more than 10 years from maturity.
o Broadband over power lines was expected to become obsolete.
Promising technologies: Cloud computing, biometric authentication, interactive
TV, speech recognition, predictive analytics, and media tablets.
The Internet of Things and Cyber-Physical Systems
The Internet of Things (IoT)
IoT extends the Internet to everyday objects, interconnecting devices, tools, and
computers via sensors, RFID, and GPS.
History: Introduced in 1999 at MIT, IoT enables communication between objects and
people.
IPv6 Impact: With 2¹²⁸ IP addresses, IoT can assign unique addresses to all objects,
tracking up to 100 trillion static or moving objects.
Communication Models:
o H2H (Human-to-Human)
o H2T (Human-to-Thing)
o T2T (Thing-to-Thing)
Development & Challenges:
o IoT is in its early stages, mainly advancing in Asia and Europe.
o Cloud computing is expected to enhance efficiency, intelligence, and scalability in
IoT interactions.
Smart Earth Vision: IoT aims to create intelligent cities, clean energy, better healthcare,
and sustainable environments.
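The address arithmetic behind the IPv6 point above can be checked directly. This is a minimal sketch, assuming the standard 128-bit IPv6 address width (2^128 addresses) and taking the 100 trillion object figure from the text; the variable names are illustrative only.

```python
import ipaddress

# Total IPv6 address space: every address in ::/0 (128-bit addresses).
total_addresses = ipaddress.ip_network("::/0").num_addresses

# Figure quoted in the text: up to 100 trillion tracked objects.
tracked_objects = 100 * 10**12

# Whole addresses available per object if the space were split evenly.
addresses_per_object = total_addresses // tracked_objects

print(f"IPv6 addresses: {total_addresses:.3e}")
print(f"Addresses per tracked object: {addresses_per_object:.3e}")
```

Even with 100 trillion objects, the address space is far from exhausted, which is why IPv6 is treated as an enabler for IoT.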
Cyber-Physical Systems (CPS)
CPS integrates computation, communication, and control (3C) into a closed intelligent
feedback system between the physical and digital worlds.
Features:
o IoT vs. CPS: IoT focuses on networked objects, while CPS focuses on VR applications
in the real world.
o CPS enhances automation, intelligence, and interactivity in physical environments.
Development:
o Actively researched in the United States.
o Expected to revolutionize real-world interactions just as the Internet transformed
virtual interactions.
Technologies for Network Based Systems
Introduction to Distributed Computing Technologies
Discusses hardware, software, and network technologies for distributed computing.
Focuses on designing distributed operating systems for handling massive
parallelism.
Advances in CPU Processors
Modern CPUs use multicore architecture (dual, quad, six, or more cores).
Instruction-Level Parallelism (ILP) and Thread-Level Parallelism (TLP) improve
performance.
Processor speed evolution:
o 1 MIPS (VAX 780, 1978) → 1,800 MIPS (Intel Pentium 4, 2002) → 22,000
MIPS (Sun Niagara 2, 2008).
Moore's law holds for CPU growth, but clock rates are limited (~5 GHz max) due to
heat and power constraints.
Modern CPU technologies include:
o Superscalar architecture, dynamic branch prediction, speculative execution.
o Multithreaded CPUs (e.g., Intel i7, AMD Opteron, Sun Niagara, IBM Power 6).
Multicore CPU and Many-Core GPU Architectures
CPUs may scale to hundreds of cores but face memory wall limitations.
GPUs (Graphics Processing Units) are designed for massive parallelism and data-
level parallelism (DLP).
x86-based processors dominate HPC and HTC systems.
Trend towards heterogeneous processors combining CPU and GPU cores on a
single chip.
Multithreading Technology
Types of processor architectures:
o Superscalar: Single-threaded with multiple functional units.
o Fine-grain multithreading: Switches between threads per cycle.
o Coarse-grain multithreading: Executes multiple instructions per thread before
switching.
o Chip Multiprocessor (CMP): Multicore processor executing multiple threads.
o Simultaneous Multithreading (SMT): Executes instructions from different threads in
parallel.
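The difference between fine-grain and coarse-grain multithreading can be seen in the order in which threads get issue slots. The sketch below is a toy model, not a processor simulator: it assumes a fixed `run_length` for coarse-grain switching (real designs typically switch on long-latency stalls), and the function names are hypothetical.

```python
from itertools import cycle, islice

def fine_grain(num_threads, cycles):
    """Switch to the next thread on every cycle (round robin)."""
    return list(islice(cycle(range(num_threads)), cycles))

def coarse_grain(num_threads, cycles, run_length):
    """Issue `run_length` consecutive instructions from one thread before switching."""
    return [(c // run_length) % num_threads for c in range(cycles)]

# Each list entry is the thread ID that issues in that cycle.
print("fine-grain:  ", fine_grain(3, 9))
print("coarse-grain:", coarse_grain(3, 9, run_length=3))
```

Fine-grain interleaving hides short stalls at the cost of slowing each individual thread, while coarse-grain keeps one thread running until a switch is worthwhile.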
GPU Computing and Exascale Systems
GPUs were initially graphics accelerators, now widely used for HPC and AI.
First GPU: NVIDIA GeForce 256 (1999).
Modern GPUs have hundreds of cores, e.g., NVIDIA CUDA Tesla.
GPGPU (General-Purpose GPU Computing) enables parallel processing beyond
graphics.
How GPUs Work
Early GPUs functioned as CPU coprocessors.
Modern GPUs have 128+ cores, each handling multiple threads.
GPUs optimize throughput, CPUs optimize latency.
Used in supercomputers, AI, deep learning, gaming, and mobile devices.
GPU Programming Model
CPU offloads floating-point computations to GPU.
CUDA programming (by NVIDIA) enables large-scale parallel computing.
Example 1.1: The NVIDIA Fermi GPU Chip with 512 CUDA Cores
Power Efficiency of GPUs
GPUs offer better performance per watt than CPUs.
Energy consumption:
o CPU: ~2 nJ per instruction.
o GPU: ~200 pJ per instruction (10x more power-efficient).
Challenges in future computing:
o Power consumption constraints.
o Optimization of storage hierarchy and memory management.
o Need for self-aware OS, locality-aware compilers, and auto-tuners for GPU-
based computing.
Memory, Storage, and Wide-Area Networking
Memory Technology
DRAM capacity growth: 16 KB (1976) → 64 GB (2011), increasing 4× every 3 years.
Memory wall problem: CPU speed increases faster than memory access speed,
creating a performance gap.
Hard drive capacity growth: 260 MB (1981) → 250 GB (2004) → 3 TB (2011),
increasing 10× every 8 years.
Challenge: Faster CPUs and larger memory lead to CPU-memory bottlenecks.
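The growth rates quoted above can be sanity-checked from the endpoint capacities. The sketch below assumes steady exponential growth between the first and last data points, binary units for DRAM, and decimal units for disks; the helper name `years_per_factor` is hypothetical.

```python
import math

def years_per_factor(start, end, elapsed_years, factor):
    """Years needed for one `factor`-fold increase, assuming steady exponential growth."""
    return elapsed_years * math.log(factor) / math.log(end / start)

# DRAM: 16 KB (1976) to 64 GB (2011), in bytes with binary prefixes.
dram_period = years_per_factor(16 * 2**10, 64 * 2**30, 2011 - 1976, factor=4)

# Disk: 260 MB (1981) to 3 TB (2011), in bytes with decimal prefixes.
disk_period = years_per_factor(260 * 10**6, 3 * 10**12, 2011 - 1981, factor=10)

print(f"DRAM: 4x every {dram_period:.1f} years")
print(f"Disk: 10x every {disk_period:.1f} years")
```

Under these assumptions the endpoints imply roughly 3.2 years per 4× for DRAM and about 7.4 years per 10× for disks, in line with the rounded figures in the notes.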
Disks and Storage Technology
Disk arrays exceeded 3 TB in capacity beyond 2011.
Flash memory & SSDs are revolutionizing HPC (High-Performance Computing) and HTC
(High-Throughput Computing).
SSD lifespan: 300,000 to 1 million write cycles per block, making them durable for years.
Storage trends:
o Tape units obsolete
o Disks function as tape units
o Flash storage replacing traditional disks
o Memory functions as cache
Challenges: Power consumption, cooling, and cost of large storage systems.
System-Area Interconnects
Small clusters use Ethernet switches or Local Area Networks (LANs).
Types of storage networks:
o Storage Area Network (SAN) connects servers to network storage.
o Network Attached Storage (NAS) allows client hosts direct disk access.
Smaller clusters use Gigabit Ethernet with copper cables.
Wide-Area Networking
Ethernet speed evolution:
10 Mbps (1979) → 1 Gbps (1999) → 40-100 Gbps (2011) → projected 1 Tbps
(2013).
Network performance grows 2× per year, surpassing Moore's law for CPUs.
High-bandwidth networking enables large-scale distributed computing.
IDC 2010 report: InfiniBand & Ethernet will dominate HPC interconnects.
Most data centers use Gigabit Ethernet for server clusters.
Virtual Machines and Virtualization Middleware
Traditional computers have a single OS, tightly coupling applications to hardware.
Virtual Machines (VMs) provide flexibility, resource utilization, software
manageability, and security.
Virtualization enables access to computing, storage, and networking resources
dynamically.
Middleware like Virtual Machine Monitors (VMMs) or hypervisors manage VMs.
VM Architectures
Host Machine: Physical hardware runs an OS (e.g., Windows).
Native VM (Bare-Metal): A hypervisor directly manages hardware, running guest OS
(e.g., XEN on Linux).
Host VM: The VMM runs in non-privileged mode without modifying the host OS.
Hybrid VM: VMM operates at both user and supervisor levels, requiring host OS
modifications.
Advantages: OS independence, application portability, hardware abstraction.
VM Primitive Operations
Multiplexing: Multiple VMs run on a single hardware machine.
Suspension & Storage: VM state is saved for later use.
Resumption: Suspended VM can be restored on a different machine.
Migration: VM moves across platforms, enhancing flexibility.
Benefits: Improved resource utilization, reduced server sprawl, increased efficiency
(VMware reports 60-80% server utilization).
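The four primitive operations can be pictured with a toy model. This is only an illustrative sketch: the `VM` and `Host` classes are hypothetical, guest state is reduced to a small dictionary, and migration is modeled as suspend followed by resume, whereas real hypervisors often migrate running VMs without stopping them.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class VM:
    name: str
    memory: dict = field(default_factory=dict)  # stand-in for guest state

@dataclass
class Host:
    name: str
    running: dict = field(default_factory=dict)  # VM name -> VM

    def start(self, vm):
        # Multiplexing: several VMs share one physical host.
        self.running[vm.name] = vm

    def suspend(self, vm_name):
        # Suspension: remove the VM and serialize its state for storage.
        vm = self.running.pop(vm_name)
        return json.dumps(asdict(vm))

    def resume(self, saved_state):
        # Resumption: rebuild the VM from stored state, possibly on another host.
        vm = VM(**json.loads(saved_state))
        self.running[vm.name] = vm
        return vm

def migrate(vm_name, source, target):
    # Migration modeled here as suspend on the source, then resume on the target.
    return target.resume(source.suspend(vm_name))

host_a, host_b = Host("host-a"), Host("host-b")
host_a.start(VM("web", {"requests_served": 42}))
host_a.start(VM("db", {"rows": 7}))
moved = migrate("web", host_a, host_b)
print(sorted(host_a.running), sorted(host_b.running), moved.memory)
```

Because the VM state is self-contained, the same saved state can be resumed on any host that understands the format, which is what decouples software from a particular machine.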
Virtual Infrastructures
Maps physical resources (compute, storage, networking) to virtualized applications.
Separates hardware from software, reducing costs and increasing efficiency.
Supports cloud computing through dynamic resource mapping.
Data Center Virtualization for Cloud Computing
Cloud Architecture: Uses commodity hardware (x86 processors, low-cost storage,
Gigabit Ethernet).
Design Priorities: Cost-efficiency over raw performance, focusing on storage and
energy savings.
Data Center Growth & Cost Breakdown
Large data centers contain thousands of servers.
Cost Distribution (2009 IDC Report):
o 30% for IT equipment (servers, storage).
o 60% for maintenance and management (cooling, power, etc.).
o Electricity & cooling costs increased from 5% to 14% in 15 years.
Low-Cost Design Philosophy
Uses commodity x86 servers and Ethernet networks instead of expensive hardware.
Software manages network traffic, fault tolerance, and scalability.
Convergence of Technologies Enabling Cloud Computing
1. Hardware Virtualization & Multi-Core Chips: Allows dynamic configurations.
2. Utility & Grid Computing: Forms the foundation of cloud computing.
3. SOA, Web 2.0, and Mashups: Advances in web technologies drive cloud adoption.
4. Autonomic Computing & Data Center Automation: Enhances efficiency.
Impact of Cloud Computing on Data Science & E-Research
Data Deluge: Massive data from sensors, web, simulations, requiring advanced data
management.
E-Science Applications: Used in biology, chemistry, physics, and social sciences.
MapReduce & Iterative MapReduce: Enable parallel processing of big data.
Multicore & GPU Clusters: Boost computational power for scientific research.
Cloud Computing & Data Science Convergence: Revolutionizes computing
architecture and programming models.
System Models for Distributed and Cloud Computing
Distributed and Cloud Computing Systems
Built over a large number of autonomous computer nodes.
Nodes are interconnected using SANs, LANs, or WANs in a hierarchical manner.
Clusters of clusters can be created using WANs for large-scale systems.
These systems are highly scalable, supporting web-scale connectivity.
Classification of Massive Systems
Four major types: Clusters, P2P Networks, Computing Grids, and Internet Clouds.
Involves hundreds, thousands, or even millions of participating nodes.
Clusters: Popular in supercomputing applications.
P2P Networks: Used in business applications but face copyright concerns.
Grids: Underutilized due to middleware and application inefficiencies.
Cloud Computing: Cost-effective and simple for providers and users.
Clusters of Cooperative Computers
A computing cluster consists of interconnected computers working as a single unit.
Handles heavy workloads and large datasets efficiently.
Cluster Architecture
Uses low-latency, high-bandwidth interconnection networks (e.g., SAN, LAN).
Built using Gigabit Ethernet, Myrinet, or InfiniBand switches.
Connected to the Internet via VPN gateways.
Cluster nodes often run under different OS, leading to multiple system images.
Single-System Image (SSI)
Ideal cluster design merges multiple system images into a single-system image.
SSI makes a cluster appear as a single machine to users.
Achieved using middleware or OS extensions.
Hardware, Software, and Middleware Support
Cluster nodes include PCs, workstations, servers, or SMP.
MPI and PVM used for message passing.
Most clusters run on Linux OS.
Middleware is essential for SSI, high availability (HA), and distributed shared memory
(DSM).
Virtualization allows dynamic creation of virtual clusters.
Major Cluster Design Issues
No unified cluster-wide OS for resource sharing.
Middleware is required for cooperative computing and high performance.
Benefits of clusters: Scalability, efficient message passing, fault tolerance, and job
management.
Grid Computing Infrastructures
Evolution: Internet → Web → Grid computing.
Enables interaction among applications running on distant computers.
Grid computing supports a rapidly growing IT-based economy.
Computational Grids
Similar to an electric power grid, integrates computers, software, middleware, and
users.
Constructed across LANs, WANs, or the Internet at various scales.
Virtual platforms for supporting virtual organizations.
Computers used: Workstations, servers, clusters, supercomputers.
Personal devices (PCs, laptops, PDAs) can also access grid systems.
Example of a Computational Grid
Built over multiple resource sites owned by different organizations.
Offers diverse computing resources (e.g., workstations, large servers, Linux clusters).
Uses broadband IP networks (LANs, WANs) to integrate computing, communication,
and content.
Special applications:
o SETI@Home (search for extraterrestrial life).
o Astrophysics research (pulsars).
Examples of large grids:
o NSF TeraGrid (USA).
o EGEE (Europe).
o ChinaGrid (China).
Grid Families
Definition: Grid computing integrates distributed resources to solve large-scale
computing problems.
Types:
Computational/Data Grids: Built at a national level for large-scale computing and data
sharing.
P2P Grids: Decentralized and self-organizing without a central control.
Peer-to-Peer (P2P) Network Families
Client-Server Model: Traditional architecture where clients connect to a central server
for computing resources.
P2P Architecture: Decentralized, with each node acting as both a client and a server.
P2P Systems
Decentralized Control: No master-slave relationship; no global view of the system.
Self-Organizing: Peers join and leave voluntarily.
Ad Hoc Network: Uses the Internet (TCP/IP, NAI protocols).
Overlay Networks
Virtual network over a physical P2P network.
Types:
Unstructured Overlay: Random connections, flooding-based search.
Structured Overlay: Organized topology, efficient routing.
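The two overlay types lead to very different lookup procedures. The sketch below is a simplified illustration under stated assumptions: the unstructured search floods a query with a hop limit (`ttl`), and the structured lookup places nodes and keys on a hashed identifier ring in the style of a distributed hash table. The notes do not name a specific scheme, so the function names, the five-node topology, and the 16-bit ring are all hypothetical.

```python
import hashlib
from bisect import bisect_left
from collections import deque

def flood_search(neighbors, start, holder, ttl):
    """Unstructured overlay: forward the query to all neighbors until the TTL runs out.
    Returns (found, messages_sent)."""
    visited, queue, messages = {start}, deque([(start, ttl)]), 0
    while queue:
        node, hops_left = queue.popleft()
        if node == holder:
            return True, messages
        if hops_left == 0:
            continue
        for nxt in neighbors[node]:
            messages += 1  # every forwarded copy of the query costs one message
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, hops_left - 1))
    return False, messages

def ring_id(name, bits=16):
    """Map a node name or key to a position on a 2**bits identifier ring."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** bits)

def structured_lookup(node_names, key):
    """Structured overlay: a key belongs to the first node at or after its ring position."""
    ring = sorted((ring_id(n), n) for n in node_names)
    ids = [i for i, _ in ring]
    idx = bisect_left(ids, ring_id(key)) % len(ring)  # wrap around the ring
    return ring[idx][1]

neighbors = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
             "D": ["B", "C", "E"], "E": ["D"]}
print(flood_search(neighbors, "A", "E", ttl=3))
print(structured_lookup(list(neighbors), "song.mp3"))
```

Flooding needs no global organization but its message count grows with the network and the TTL, whereas the structured lookup computes the responsible node directly from the key.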
P2P Application Families
File Sharing: E.g., BitTorrent, Napster.
Collaboration: E.g., Skype, MSN.
Distributed Computing: E.g., SETI@home.
P2P Platforms: E.g., JXTA, .NET.
P2P Computing Challenges
Heterogeneity: Hardware, software, and network incompatibilities.
Scalability: Handling increased workloads.
Security & Privacy: Lack of trust, copyright concerns.
Performance Issues: Data location, routing efficiency, load balancing.
Fault Tolerance: Replication prevents single points of failure.
Cloud Computing Over the Internet
Cloud computing is revolutionizing computational science by enabling large-scale data
processing with efficient resource allocation.
It shifts computing and data storage from desktops to centralized data centers, offering
on-demand services.
Cloud Computing
Moves computing from desktops to large data centers.
Enables on-demand software, hardware, and data as a service.
Supports scalability, redundancy, and self-recovering systems.
Internet Clouds
Uses virtualized platforms with dynamic resource provisioning.
Provides cost-effective solutions for both users and providers.
Ensures security, trust, and dependability in cloud operations.
The Cloud Landscape
Cloud computing addresses challenges in traditional distributed computing systems,
such as maintenance, poor utilization, and high costs.
It provides scalable, on-demand computing resources through various service models.
Challenges in Traditional Systems
Require constant maintenance.
Suffer from poor resource utilization.
Have high costs for hardware and software upgrades.
Cloud Computing as a Solution
Provides an on-demand computing paradigm.
Offers scalable and cost-efficient alternatives to traditional systems.
Cloud Service Models
Infrastructure as a Service (IaaS): Provides virtualized computing resources (servers,
storage, networking). Users manage applications but not infrastructure.
Platform as a Service (PaaS): Offers a virtualized development platform with
middleware, databases, and APIs (e.g., Java, Python, Web 2.0).
Software as a Service (SaaS): Delivers software applications via browsers (e.g., CRM,
ERP, HR systems) without upfront infrastructure investment.
Cloud Deployment Models
Private Cloud: Exclusive use by a single organization, ensuring high security.
Public Cloud: Services available to multiple users with lower costs.
Hybrid Cloud: Combines private and public cloud benefits.
Managed Cloud: Maintained by third-party providers for efficient management.
Security Considerations
Different service level agreements (SLAs) define security responsibilities.
Security is shared among cloud providers, users, and third-party software providers.
Benefits of Cloud Computing
1. Efficient location-based data centers with better energy management.
2. Better resource utilization through peak-load sharing.
3. Reduces infrastructure maintenance efforts.
4. Significantly lowers computing costs.
5. Facilitates cloud-based programming and development.
6. Enhances service and data discovery, along with content distribution.
7. Addresses privacy, security, and reliability challenges.
8. Supports flexible service agreements, business models, and pricing policies.
Software Environments for Distributed Systems and Clouds
This section explores the software environments used in distributed and cloud
computing, focusing on Service-Oriented Architecture (SOA), web services, REST, and
the evolving relationship between grids and clouds.
Service-Oriented Architecture (SOA) in Distributed Systems
Entities in SOA:
o Grids/Web Services → Services
o Java → Java Objects
o CORBA → Distributed Objects
SOA builds on the OSI networking model, using middleware like .NET, Apache Axis, and
Java Virtual Machine.
Higher-level environments handle entity interfaces and inter-entity communication,
rebuilding the OSI layers at the software level.
Layered Architecture for Web Services and Grids
Entity interfaces include Web Services Description Language (WSDL), Java methods,
and CORBA IDL.
Communication systems: SOAP (Web services), RMI (Java), IIOP (CORBA).
Middleware support: WebSphere MQ, Java Message Service (JMS) for messaging, fault
tolerance, and security.
Service discovery models: JNDI (Java), UDDI, LDAP, ebXML, CORBA Trading Service.
Management services: CORBA Life Cycle, Enterprise JavaBeans, Jini lifetime model, and
web services frameworks.
Web Services vs. REST Architecture
Web Services (SOAP-based):
o Fully specifies service behavior and environment.
o Uses SOAP messages for universal distributed OS functionality.
o Implementation challenges due to complexity.
REST (Representational State Transfer):
o Focuses on simplicity and lightweight communication.
o Uses "XML over HTTP" for rapid technology environments.
o Suitable for modern web applications.
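The contrast in message weight can be made concrete by building the same request both ways. This sketch only constructs the messages and sends nothing over the network; the `GetTemperature` operation, the `city` parameter, and the `example.com` URL are hypothetical, while the envelope namespace is the SOAP 1.1 one.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace

def soap_request(operation, params):
    """Wrap an operation call in a SOAP envelope (normally POSTed to a single endpoint)."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, operation)
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")

def rest_request(base_url, resource, params):
    """Express the same query as an HTTP verb plus a resource URL."""
    return "GET", f"{base_url}/{resource}?{urlencode(params)}"

print(soap_request("GetTemperature", {"city": "Bangalore"}))
print(rest_request("http://example.com/api", "temperature", {"city": "Bangalore"}))
```

The SOAP form carries the operation inside an XML envelope, while the REST form relies on the HTTP method and the URL themselves, which is the simplicity the notes refer to.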
Evolution of SOA
SOA enables integration across grids, clouds, interclouds, and IoT.
Sensors (SS) collect raw data, which is processed through compute, storage,
filter, and discovery clouds.
Filter clouds eliminate unnecessary data to refine information for decision-
making.
Portals (e.g., OGFCE, HUBzero) serve as access points for users.
Grids vs. Clouds
Grid Computing:
o Uses static resources allocated in advance.
o Focuses on distributed computing with a defined structure.
Cloud Computing:
o Uses elastic resources that scale dynamically.
o Supports virtualization and autonomic computing.
Hybrid Approach:
o Grids can be built out of multiple clouds for better resource allocation.
o Models include cloud of clouds, grid of clouds, and interclouds.
Workflow Coordination in Distributed Systems:
o Technologies like BPEL Web Services, Pegasus, Taverna, Kepler, Trident, and Swift
help manage distributed services.
Trends Toward Distributed Operating Systems
Distributed Operating Systems (DOS)
Distributed systems have multiple system images due to independent OS on each
node.
A distributed OS enhances resource sharing and fast communication using
message passing and RPCs.
It improves performance, efficiency, and flexibility of distributed applications.
Approaches to Distributed Resource Management (Tanenbaum's Classification)
1. Network OS: Built over multiple heterogeneous OS platforms; offers lowest
transparency, mainly used for file sharing.
2. Middleware-based OS: Provides limited resource sharing (e.g., MOSIX/OS for
clusters).
3. Truly Distributed OS: Provides higher transparency and better resource
management.
Comparison of Distributed OS (Amoeba vs. DCE)
Amoeba (Netherlands) and DCE (Open Software Foundation) are research
prototypes.
No commercial OS has succeeded in following these systems.
Future trends focus on web-based OS for virtualization and lightweight microkernel
designs.
MOSIX2 for Linux Clusters
MOSIX2 is a distributed OS with a virtualization layer for Linux.
Provides single-system image, supports sequential and parallel applications.
Enables resource migration across Linux nodes in clusters and grids.
Used in Linux clusters, GPU clusters, grid computing, and cloud environments.
Transparency in Programming Environments
Computing infrastructure is divided into four levels:
1. User Data (separated from applications).
2. Applications (runs on multiple OSes).
3. Operating Systems (provide standard interfaces).
4. Hardware (standardized across OSes).
Future cloud computing will allow users to switch OS and applications easily.
Parallel and Distributed Programming Models
Message-Passing Interface (MPI)
A standard library for parallel programming in C and FORTRAN.
Used in clusters, grid systems, and P2P networks.
Alternative: Parallel Virtual Machine (PVM).
MapReduce
Scalable data processing model for large clusters (Google).
Uses Map (key/value generation) and Reduce (merging values) functions.
Handles terabytes of data across thousands of machines.
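The Map and Reduce roles described above can be sketched with a word count, the usual introductory example. This is a single-process illustration of the programming model only, not of Google's or Hadoop's implementation; the function names and the sample documents are hypothetical, and the `shuffle` step stands in for grouping work that a real framework performs across machines.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) key/value pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group intermediate values by key (handled by the framework in a real system)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: merge all values that share the same key."""
    return key, sum(values)

documents = ["cloud computing scales", "grid computing and cloud computing"]
intermediate = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(counts)
```

Because each `map_phase` call touches only its own document and each `reduce_phase` call touches only one key, both phases can be spread over many machines without coordination beyond the shuffle.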
Hadoop
Open-source version of MapReduce, initially developed by Yahoo!.
Enables massive data processing over distributed storage.
Provides high parallelism, reliability, and scalability.
Open Grid Services Architecture (OGSA)
A standard for grid computing.
Supports distributed execution, security policies, and trust management.
Genesis II is an OGSA-based implementation.
Globus Toolkits
Middleware for resource allocation, security, and authentication in grid computing.
Developed by Argonne National Lab and USC.
IBM extended Globus for business applications.
Performance, Security and Energy Efficiency
Performance Metrics and Scalability Analysis
Distributed system performance depends on several factors, including CPU speed
(MIPS), network bandwidth (Mbps), system throughput (Tflops, TPS), job response time,
and network latency.
High-performance interconnection networks require low latency and high bandwidth.
Other key metrics include OS boot time, compile time, I/O data rate, system availability,
dependability, and security resilience.
Dimensions of Scalability
1. Size Scalability: Increasing the number of processors, memory, or I/O channels to
improve performance.
2. Software Scalability: Upgrading OS, compilers, and application software to work
efficiently in large systems.
3. Application Scalability: Adjusting problem size to match machine scalability.
4. Technology Scalability: Adapting to hardware and networking advancements while
ensuring compatibility with existing systems.
Scalability versus OS Image Count
Scalability is affected by OS image count. SMP systems have a single OS image, limiting
scalability, whereas NUMA, clusters, and cloud environments support multiple OS
images, enabling higher scalability.
Amdahl's Law states that system speedup is limited by the sequential portion of a
program. Even with infinite processors, speedup is constrained by non-parallelizable
code.
Problem with Fixed Workload
Gustafson's Law addresses this limitation by scaling the workload along with system
size, resulting in better efficiency in large distributed systems.
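The fixed-workload limit and the scaled-workload alternative described above are commonly written as Amdahl's Law, S = 1 / (α + (1 − α)/n), and Gustafson's Law, S' = α + (1 − α)n, where α is the sequential fraction and n the number of processors. The sketch below evaluates both; the values α = 0.25 and n = 256 are chosen for illustration.

```python
def amdahl_speedup(alpha, n):
    """Fixed workload: alpha is the sequential fraction, n the processor count."""
    return 1.0 / (alpha + (1.0 - alpha) / n)

def gustafson_speedup(alpha, n):
    """Scaled workload: the parallel part of the work grows in proportion to n."""
    return alpha + (1.0 - alpha) * n

alpha, n = 0.25, 256
results = {
    "fixed": amdahl_speedup(alpha, n),
    "scaled": gustafson_speedup(alpha, n),
}
for label, speedup in results.items():
    # Efficiency is the speedup divided by the number of processors used.
    print(f"{label} workload: speedup {speedup:.2f}, efficiency {speedup / n:.3f}")
```

With a fixed workload the speedup can never exceed 1/α = 4 no matter how many processors are added, while scaling the workload keeps most of the 256 processors usefully busy.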
System Availability and Fault Tolerance
High Availability (HA) is crucial in clusters, grids, P2P networks, and clouds. It is defined
as: System Availability = MTTF / (MTTF + MTTR), where MTTF is the mean time to failure
and MTTR is the mean time to repair.
Fault Tolerance Strategies include redundancy, component reliability, and failover
mechanisms.
As system size increases, availability decreases due to higher failure probability. Grids
and clusters have better fault isolation than SMP and MPP systems, while P2P
networks have the lowest availability.
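The effect of system size on availability can be illustrated numerically. The sketch assumes the common definition of availability as MTTF / (MTTF + MTTR), independent node failures, and a system that is only available when every node is up (no redundancy); the MTTF and MTTR values are hypothetical.

```python
def availability(mttf, mttr):
    """Fraction of time a component is up, from mean time to failure and to repair."""
    return mttf / (mttf + mttr)

def all_nodes_up(node_availability, nodes):
    """Availability when every one of `nodes` independent nodes must be up."""
    return node_availability ** nodes

# Hypothetical node: fails on average every 1000 hours, takes 2 hours to repair.
single = availability(mttf=1000.0, mttr=2.0)
for nodes in (1, 10, 100):
    print(nodes, round(all_nodes_up(single, nodes), 4))
```

Redundancy and failover change this picture, since the system can then stay up when individual nodes fail, which is the motivation for the fault tolerance strategies listed above.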
Security Challenges in Distributed Systems
Common Threats:
o Information leaks (loss of confidentiality)
o Data integrity breaches (Trojan horses, user alterations)
o Denial of Service (DoS) attacks (disrupting system operation)
o Unauthorized access (exploiting open computing resources)
Security Responsibilities in Cloud Models:
o SaaS (Software as a Service): Provider handles all security.
o PaaS (Platform as a Service): Provider ensures data integrity and availability, but
users manage confidentiality.
o IaaS (Infrastructure as a Service): Users handle most security functions, while
providers ensure availability.
Copyright Protection in P2P Networks:
Collusive Piracy: Paid clients (colluders) share copyrighted content with unpaid clients
(pirates), affecting commercial content delivery.
Content Poisoning Scheme: Proactively detects and prevents piracy using identity-based
signatures and timestamped tokens, protecting legitimate clients while stopping
colluders and pirates.
Reputation Systems: Essential in detecting and addressing piracy in P2P networks and
digital content sharing.
System Defense Technologies:
First Generation: Tools focused on preventing intrusions through access control
policies, cryptography, and tokens.
Second Generation: Tools for detecting intrusions (e.g., firewalls, IDS, reputation
systems) and triggering remedial actions.
Third Generation: Intelligent systems that respond to intrusions and adapt to security
threats.
Data Protection Infrastructure:
Security Infrastructure: Involves trust negotiation, reputation aggregation, and intrusion
detection against viruses and DDoS attacks.
Cloud Security: Cloud service models (IaaS, PaaS, SaaS) divide security responsibilities:
o IaaS: Users handle most security functions; providers ensure availability.
o PaaS: Providers ensure data integrity and availability; users manage
confidentiality.
o SaaS: Providers handle all security functions.
Piracy Prevention: Measures against online piracy and copyright violations in digital
content.
Energy Efficiency in Distributed Computing:
1. Energy Consumption Challenges: Systems face rising energy costs, especially in
large-scale data centers and HPC systems (e.g., Earth Simulator, Petaflop).
2. Unused Servers: Many servers in data centers are left powered on without use,
leading to significant energy waste (e.g., 4.7 million idle servers globally).
o Potential Savings: Estimated savings of $3.8 billion in energy costs and $24.7
billion in operational costs from turning off unused servers.
3. Energy in Active Servers: Techniques needed to reduce energy consumption without
affecting performance.
Energy Management in Distributed Systems (Four Layers):
Application Layer:
o Focus on energy-aware applications that balance energy consumption with
performance.
o Key factors: Instruction count and storage transactions affect energy use.
Middleware Layer:
o Manages energy-efficient scheduling and task management.
o Incorporates energy-aware techniques to optimize power usage during task
scheduling.
Resource Layer:
o Manages hardware (e.g., CPU) and operating systems to optimize energy usage.
o Dynamic Power Management (DPM): Switches between idle and lower-power states.
o Dynamic Voltage-Frequency Scaling (DVFS): Controls power consumption by
adjusting voltage and frequency.
Network Layer:
o Focuses on energy-efficient network routing and protocols.
o New energy-efficient routing algorithms and models are needed for optimized
performance and reduced energy consumption.
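The trade-off behind DVFS at the resource layer can be sketched with the standard dynamic power relation for CMOS circuits, P = C_eff · V² · f. The example below is a simplified model that ignores static (leakage) power; the task size, capacitance, voltages, and frequencies are hypothetical values chosen for easy arithmetic.

```python
def dynamic_power(c_eff, voltage, frequency):
    """Dynamic CMOS power in watts: P = C_eff * V^2 * f."""
    return c_eff * voltage ** 2 * frequency

def run_task(cycles, c_eff, voltage, frequency):
    """Return (time in seconds, energy in joules) for a task of `cycles` clock cycles."""
    time_s = cycles / frequency
    return time_s, dynamic_power(c_eff, voltage, frequency) * time_s

CYCLES, C_EFF = 2e9, 1e-9  # hypothetical task size and effective switched capacitance

fast = run_task(CYCLES, C_EFF, voltage=1.2, frequency=2e9)  # full speed
slow = run_task(CYCLES, C_EFF, voltage=0.9, frequency=1e9)  # scaled down

print(f"fast: {fast[0]:.1f} s, {fast[1]:.2f} J")
print(f"slow: {slow[0]:.1f} s, {slow[1]:.2f} J")
```

Lowering the frequency alone does not save energy for a fixed number of cycles; the saving comes from the lower voltage that the reduced frequency permits, at the cost of a longer run time.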
Energy Efficiency Techniques: