Week 5 PDC

Shared Memory (SM)

Shared Memory (SM) is a parallel memory model where all processors have direct access to the
same physical memory. This model simplifies programming by providing a unified view of
memory, making it easier to share data and coordinate tasks between processors.

Key Characteristics:

- Shared Memory Space: All processors share a common memory space, allowing them to access and modify the same data directly (see the sketch after this list).
- Uniform Access: In the basic model, every processor has equal access time to any memory location, regardless of physical proximity (the UMA case discussed later).
- Cache Coherence: Because each processor caches shared data, cache coherence protocols are used to keep the caches and main memory consistent.
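
To make the unified view concrete, here is a minimal sketch in C with OpenMP (the language and API are illustrative choices; the notes name no specific one): every thread reads and writes the same array directly, with no copies and no messages.

    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static double a[N];          /* one array, visible to every thread */

        #pragma omp parallel for     /* threads share 'a': no copies, no messages */
        for (int i = 0; i < N; i++)
            a[i] = 2.0 * i;

        printf("a[42] = %.1f (written by whichever thread owned i == 42)\n", a[42]);
        return 0;
    }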

Types of SM Systems:

- Symmetric Multiprocessing (SMP): Multiple identical processors share a common bus and memory, forming a tightly coupled system.
- Massively Parallel Processing (MPP): Many interconnected nodes, each with its own processor and memory. When such systems present a shared logical address space, the physical memory is still distributed across the nodes.

Advantages of SM:

- Simplicity: SM simplifies programming by providing a unified memory space, making it easier to share data and coordinate tasks.
- Efficiency: SM can be very efficient for applications that exhibit a high degree of data locality, as data can be accessed directly from the cache.
- Synchronization: SM provides built-in mechanisms for synchronization, such as locks and semaphores, which can simplify the coordination of concurrent tasks (a minimal sketch follows this list).
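
As an illustration of the synchronization point above, a minimal sketch using POSIX threads (an illustrative choice; any lock or semaphore API would serve): two threads increment one shared counter, and the mutex makes each read-modify-write atomic.

    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;                       /* shared data */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);             /* enter critical section */
            counter++;
            pthread_mutex_unlock(&lock);           /* leave critical section */
        }
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld\n", counter);        /* always 200000 with the lock */
        return 0;
    }

Without the mutex the two increments race and the final count is typically below 200000.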

Disadvantages of SM:

- Scalability: SM can be limited in scalability due to bus contention and cache coherence issues, especially for large-scale systems.
- Cache Coherence Overhead: Maintaining cache coherence can introduce overhead, especially for frequently written shared data.
- Bus Contention: As the number of processors increases, contention for the shared memory bus can become a bottleneck.

Use Cases:

- General-Purpose Computing: SM is widely used in general-purpose computing systems, such as servers and workstations.
- Scientific Computing: SM is suitable for many scientific computing applications, especially those that exhibit a high degree of data locality.
- Embedded Systems: SM is used in embedded systems where a small number of processors share a common memory space.

In summary, shared memory is a powerful parallel memory model that simplifies programming
and can offer high performance for certain types of applications. However, its scalability and
performance can be limited by factors such as bus contention and cache coherence overhead.

Distributed Memory (DM)


Distributed Memory (DM) is a parallel memory model where each processor has its own private memory, and data is exchanged through explicit message passing. This model is well suited to large-scale parallel systems and to applications with a high degree of data locality, since communication between processors carries an explicit cost.

Key Characteristics:

- Private Memory: Each processor has its own local memory, which is not directly accessible by other processors.
- Message Passing: Data is exchanged between processors using explicit message-passing mechanisms: sending and receiving messages containing data or control information.
- Scalability: DM is highly scalable, as it can accommodate a large number of processors without suffering from bus contention or cache coherence issues.
- Locality: DM is well suited to applications with a high degree of data locality, where most data accessed by a processor is stored in its local memory.
- Communication Overhead: While DM offers scalability, it can introduce significant communication overhead, especially for applications that require frequent data transfers between processors.

Programming Models:

- Message Passing Interface (MPI): MPI is a widely used standard for message passing in distributed memory systems. It provides a rich set of routines for sending, receiving, and managing messages (a combined sketch follows this list).
- Point-to-Point Communication: Direct communication between two processes, using functions like MPI_Send and MPI_Recv.
- Collective Communication: Coordinated communication among multiple processes, using functions like MPI_Bcast, MPI_Gather, and MPI_Reduce.
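
A combined sketch in C of the routines named above (the program structure is illustrative; the MPI calls themselves are standard): rank 1 sends one value to rank 0 with MPI_Send/MPI_Recv, then every rank contributes to a sum with MPI_Reduce. Build with mpicc and launch with mpirun.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Point-to-point: rank 1 -> rank 0 (needs at least 2 ranks). */
        if (rank == 1) {
            int payload = 42;
            MPI_Send(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        } else if (rank == 0 && size > 1) {
            int payload;
            MPI_Recv(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 0 received %d\n", payload);
        }

        /* Collective: sum every rank's id onto rank 0. */
        int local = rank, total = 0;
        MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("sum of ranks = %d\n", total);

        MPI_Finalize();
        return 0;
    }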

Advantages of DM:

- Scalability: DM can handle large-scale parallel systems with thousands of processors.
- Flexibility: DM provides a high degree of flexibility in terms of data distribution and communication patterns.
- Performance: DM can achieve high performance for applications that exhibit a high degree of data locality and keep communication overhead low.

Disadvantages of DM:

- Complexity: DM requires more complex programming techniques than shared memory models.
- Communication Overhead: Frequent message passing can introduce significant overhead, especially for applications with fine-grained communication patterns.
- Data Distribution: Careful data distribution is essential to optimize performance and minimize communication overhead (a block-distribution sketch follows this list).
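
As an illustration of data distribution, a small sketch of the common block scheme (one possible scheme among many): n elements are split as evenly as possible, and each rank derives its own index range locally, with no communication at all.

    #include <stdio.h>

    int main(void) {
        long n = 10;      /* total elements (example value) */
        int  size = 4;    /* number of ranks (example value) */
        for (int rank = 0; rank < size; rank++) {
            long lo = rank * n / size;          /* inclusive start */
            long hi = (rank + 1) * n / size;    /* exclusive end   */
            printf("rank %d owns [%ld, %ld)\n", rank, lo, hi);
        }
        return 0;
    }

For n = 10 and size = 4 this yields the ranges [0,2), [2,5), [5,7), [7,10): every rank gets either floor(n/size) or ceil(n/size) elements.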

Use Cases:

- Scientific Computing: DM is widely used in scientific computing applications such as simulations, data analysis, and numerical methods.
- High-Performance Computing (HPC): DM is a common choice for HPC systems, where large-scale parallel processing is required.
- Distributed Systems: DM is used in distributed systems, such as clusters and grids, where multiple interconnected computers work together to solve problems.

In summary, distributed memory is a powerful parallel memory model that offers scalability
and flexibility for large-scale parallel applications. However, it requires careful programming
and attention to communication overhead to achieve optimal performance.

Distributed Shared Memory (DSM)

Distributed Shared Memory (DSM) is a parallel memory model that combines the advantages
of shared memory (SM) and distributed memory (DM). It provides a shared memory abstraction
to programmers, while utilizing the underlying distributed memory architecture for scalability
and performance.

Key Characteristics:

- Shared Memory Abstraction: DSM presents a unified memory space to applications, similar to SM. This simplifies programming by eliminating the need for explicit message passing.
- Distributed Memory Implementation: DSM is implemented with distributed memory techniques, such as message passing or remote memory access (RMA), to handle data sharing and coherence.
- Data Coherence: DSM systems employ various techniques to ensure that multiple processors see consistent copies of shared data, typically through hardware-based or software-based coherence protocols.

DSM Implementation Techniques:

- Hardware-Based DSM: Relies on hardware mechanisms, such as directory-based coherence or snooping protocols, to maintain data coherence.
- Software-Based DSM: Uses software techniques, such as page-based or object-based coherence, to manage data consistency (the sketch after this list shows the page-fault trick behind page-based schemes).
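
The core mechanism behind page-based software DSM can be sketched as follows (a simplified, single-process illustration in C for Linux; a real DSM runtime would fetch the latest copy of the page from its remote owner inside the handler, and production systems handle signal-safety more carefully than this sketch does): a page is kept read-only, and the fault raised by a write gives the runtime a chance to intervene before the write proceeds.

    #include <signal.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static char *page;
    static long  pagesize;

    /* A real DSM runtime would pull the current page contents from the
       owning node here before re-enabling writes. */
    static void on_fault(int sig, siginfo_t *info, void *ctx) {
        (void)sig; (void)ctx;
        uintptr_t base = (uintptr_t)info->si_addr & ~(uintptr_t)(pagesize - 1);
        mprotect((void *)base, pagesize, PROT_READ | PROT_WRITE);
    }

    int main(void) {
        pagesize = sysconf(_SC_PAGESIZE);
        page = mmap(NULL, pagesize, PROT_READ,      /* read-only: writes fault */
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        struct sigaction sa = {0};
        sa.sa_sigaction = on_fault;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);

        page[0] = 'x';   /* faults once; the handler grants write access */
        printf("wrote %c after one trapped fault\n", page[0]);
        munmap(page, pagesize);
        return 0;
    }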

DSM Coherence Protocols:

- Directory-Based Coherence: A centralized directory tracks the location and state of each memory block, allowing efficient coherence enforcement.
- Snooping: Processors monitor the bus for memory accesses and update their caches accordingly to maintain coherence.
- Page-Based Coherence: Coherence is maintained at the page level, which reduces bookkeeping overhead but can cause false sharing when unrelated data lands on the same page.
- Object-Based Coherence: Coherence is maintained at the object level, providing finer-grained control but potentially increasing overhead.

Advantages of DSM:

- Shared Memory Abstraction: DSM simplifies programming by providing a shared memory interface.
- Scalability: DSM can be highly scalable, as it leverages distributed memory techniques for efficient data sharing.
- Performance: DSM can achieve good performance for applications that exhibit a high degree of data locality and can effectively manage coherence overhead.

Disadvantages of DSM:

- Complexity: Implementing DSM systems can be more complex than SM or DM due to the need for coherence protocols and data distribution.
- Overhead: Coherence protocols can introduce overhead, especially for frequently accessed data.
- Performance Limitations: DSM may not always achieve the same performance as SM for applications that require frequent synchronization or access to shared data.

Use Cases:

- Scientific Computing: DSM is well suited to scientific applications that want a shared memory abstraction but also need the scalability of distributed memory.
- Parallel Programming Languages: Some parallel programming models present a shared address space over distributed hardware, such as UPC; OpenMP programs can also run on clusters via software DSM implementations.
- Cluster Computing: DSM can provide a shared memory abstraction across the multiple nodes of a cluster.

In summary, distributed shared memory offers a hybrid approach that combines the benefits of
shared memory and distributed memory. By providing a shared memory abstraction while
utilizing distributed memory techniques, DSM can enable efficient and scalable parallel
computing.

Uniform Memory Access (UMA)

Uniform Memory Access (UMA) is a parallel memory model where all processors have equal
access time to any memory location. This means that there is no performance difference between
accessing data from different memory modules or locations.

Key Characteristics:

- Equal Access Time: All processors can access any memory location with the same latency.
- Shared Bus or Interconnect: UMA systems typically use a shared bus or interconnect to connect the processors to memory, ensuring uniform access.
- Simplicity: UMA is relatively simple to program and manage, as there are no performance disparities across memory locations.

Advantages of UMA:

- Simplicity: UMA is easy to program and reason about, with no complexities related to memory access latency or data placement.
- Performance: UMA can offer good performance even for applications whose memory accesses show little locality, since every location costs the same to reach.
- Synchronization: UMA simplifies synchronization between processors, as there is no need to account for varying memory access latencies.

Disadvantages of UMA:

- Scalability: UMA can be limited in scalability, especially for large-scale systems, as the shared bus can become a bottleneck.
- Bus Contention: As the number of processors increases, contention for the shared bus can degrade performance.
- Cache Coherence: UMA systems still require cache coherence protocols to keep data consistent across the processors' caches.

Typical UMA Architectures:

- Symmetric Multiprocessing (SMP): SMP systems are the common example of UMA architectures, where multiple processors share a common bus and memory.
- NUMA with Uniform Access: Some NUMA systems can be configured (for example, with memory interleaving across nodes) to present roughly uniform access times, effectively behaving as UMA systems.

Use Cases:

- Small-Scale Parallel Systems: UMA is well suited to smaller-scale parallel systems with a limited number of processors.
- General-Purpose Computing: UMA is commonly used in general-purpose computing systems, such as servers and workstations.
- Embedded Systems: UMA can be used in embedded systems where a small number of processors share a common memory space.

In summary, UMA is a simple and efficient parallel memory model that provides uniform
access to memory for all processors. However, its scalability can be limited by bus contention
and cache coherence overhead, especially for larger-scale systems.

Non-Uniform Memory Access (NUMA)

Non-Uniform Memory Access (NUMA) is a parallel memory model where processors have
different access times to memory locations based on their proximity. This means that accessing
data from memory modules that are closer to a processor will be faster than accessing data from
memory modules that are farther away.

Key Characteristics:

- Non-Uniform Access Time: Access latency depends on the distance between a processor and the memory module holding the data.
- Local Memory: Each processor (or socket) typically has its own local memory module, which it can reach fastest (a placement sketch follows this list).
- Remote Memory Access: Processors can access data in remote memory modules, but this involves additional latency and interconnect traffic.
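
As a concrete illustration of local placement, a sketch using Linux's libnuma (an assumption about the platform and library; link with -lnuma): the buffer is allocated from a chosen node's memory, so threads running on that node see local latency rather than remote accesses.

    #include <numa.h>     /* Linux libnuma; link with -lnuma */
    #include <stdio.h>

    int main(void) {
        if (numa_available() < 0) {
            fprintf(stderr, "NUMA API not available on this system\n");
            return 1;
        }
        int    node = 0;                             /* target node (example) */
        size_t len  = 1 << 20;                       /* 1 MiB */
        double *buf = numa_alloc_onnode(len, node);  /* memory backed by node 0 */
        if (buf != NULL) {
            buf[0] = 3.14;    /* local-latency access for threads on node 0 */
            printf("nodes 0..%d available\n", numa_max_node());
            numa_free(buf, len);
        }
        return 0;
    }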

Advantages of NUMA:

- Scalability: NUMA can be highly scalable, as it can accommodate a large number of processors without suffering from excessive bus contention.
- Locality: NUMA can improve performance by exploiting data locality, as processors can access data in their local memory more quickly.
- Flexibility: NUMA offers flexibility in memory allocation and placement, allowing efficient utilization of resources.

Disadvantages of NUMA:

- Complexity: NUMA can be more complex to program and manage than UMA, as developers need to consider memory access patterns and placement.
- Performance Overhead: Remote memory access can introduce overhead, especially for applications that require frequent data transfers between nodes.
- Synchronization: Synchronization between processors can be more challenging in NUMA systems due to the non-uniform access times.

NUMA Architectures:

- Hierarchical NUMA: Processors are organized in a hierarchical structure, with multiple levels of memory modules.
- Flat NUMA: Processors have direct access to all memory modules, but the access times vary with distance.

Use Cases:

- Large-Scale Parallel Systems: NUMA is well suited to large-scale parallel systems that require high performance and scalability.
- Scientific Computing: NUMA can improve the performance of scientific computing applications that exhibit a high degree of data locality.
- High-Performance Computing (HPC): NUMA is a common choice for HPC systems, where large-scale parallel processing is required.

In summary, NUMA is a parallel memory model that offers scalability and performance
benefits by exploiting data locality. However, it can introduce complexity and overhead due to
non-uniform access times and remote memory access.

Feature               UMA                              NUMA
Access Time           Equal for all processors         Varies with proximity
Memory Organization   Single shared memory (bus)       Local memory module per node
Scalability           Limited                          Better
Performance           Can suffer from bus contention   Improves with data locality
Programming Model     Simpler                          More complex

Cache Coherence Protocols


In parallel and distributed computing systems, cache coherence ensures that multiple processors
have consistent copies of shared data. This is crucial to maintain the correctness of parallel
applications. Cache coherence protocols are mechanisms used to achieve this consistency.

Basic Concepts

- Cache: A small, high-speed memory that stores frequently accessed data to reduce memory access latency.
- Shared Data: Data that is accessed by multiple processors.
- Cache Line: The unit of data transfer between the cache and main memory; coherence is tracked at this granularity (see the false-sharing sketch after this list).
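
Because coherence operates on whole lines, two unrelated variables in the same line make the line ping-pong between caches ("false sharing"). A sketch in C with POSIX threads (the 64-byte line size is a typical, assumed value): padding gives each counter its own line, so the two threads stop invalidating each other.

    #include <pthread.h>
    #include <stdio.h>

    #define LINE 64                     /* typical cache-line size, assumed */

    struct padded { long v; char pad[LINE - sizeof(long)]; };
    static struct padded counters[2];   /* one line per counter in practice */

    static void *bump(void *arg) {
        long idx = (long)arg;
        for (long i = 0; i < 10000000; i++)
            counters[idx].v++;          /* each thread touches only its own line */
        return NULL;
    }

    int main(void) {
        pthread_t t[2];
        for (long i = 0; i < 2; i++)
            pthread_create(&t[i], NULL, bump, (void *)i);
        for (int i = 0; i < 2; i++)
            pthread_join(t[i], NULL);
        printf("%ld %ld\n", counters[0].v, counters[1].v);
        return 0;
    }

Removing the pad field typically makes this loop markedly slower, even though the program's result is unchanged.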

Common Cache Coherence Protocols

Write-Invalidate

- Principle: When a processor writes to a shared location, all other cached copies are invalidated; other processors must re-fetch the data on their next read.
- Advantages: Simple to implement; once other copies are invalidated, repeated writes by the same processor generate no further traffic.
- Disadvantages: Other processors take read misses after every invalidation, which hurts when the data is actively read by many caches.

Write-Update

- Principle: When a processor writes to a shared location, the new value is propagated to all other cached copies.
- Advantages: Other processors always hold fresh copies, avoiding the read misses that invalidation causes.
- Disadvantages: Every write is broadcast, which can consume substantial bandwidth and increase latency, especially when other caches no longer need the data.

Directory-Based

- Principle: A centralized directory keeps track of which caches hold each block and its coherence state (a sketch of an entry follows this list).
- Advantages: Scalable for large systems, since requests go only to the caches that hold a copy instead of being broadcast.
- Disadvantages: Requires additional hardware or software overhead, plus a directory lookup on each miss.
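
A directory entry can be pictured as follows (an illustrative layout in C, not any particular machine's): one entry per memory block records the coherence state, a bit per cache marking who holds a copy, and the current owner.

    #include <stdint.h>

    enum dir_state { DIR_UNCACHED, DIR_SHARED, DIR_MODIFIED };

    /* One entry per memory block. */
    struct dir_entry {
        enum dir_state state;  /* who may read/write the block right now     */
        uint64_t sharers;      /* bit i set => cache i holds a copy          */
        int owner;             /* meaningful only when state == DIR_MODIFIED */
    };

    /* On a read miss from cache 'c': record the new sharer. A real protocol
       would first force a write-back if the block is DIR_MODIFIED. */
    static void on_read_miss(struct dir_entry *e, int c) {
        e->sharers |= (uint64_t)1 << c;
        e->state = DIR_SHARED;
    }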

MSI (Modified-Shared-Invalid)

- States: Modified, Shared, Invalid.
- Transitions:
  - Read miss: Load the block (from memory, or from a cache holding it in Modified state, which must write it back) and transition to Shared. MSI has no Exclusive state, so every read fill lands in Shared.
  - Write miss: Invalidate all other cached copies, load the block, and transition to Modified.
  - Write hit: If the block is Shared, invalidate the other copies first; in all cases the line ends in Modified.

MESI (Modified-Exclusive-Shared-Invalid)

- States: Modified, Exclusive, Shared, Invalid.
- Transitions (a state-machine sketch follows this list):
  - Read miss: If the block is in another cache, load it and transition to Shared. Otherwise, load from memory and transition to Exclusive.
  - Write miss: Invalidate all other cached copies, load the block, and transition to Modified.
  - Write hit: If the block is Exclusive, transition to Modified silently, with no bus traffic. If it is Shared, invalidate the other caches first, then transition to Modified.
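
The MESI transitions above can be written down as a small state machine (an illustrative simulation in C, not hardware; the event names are ours):

    /* Next state of one cache line, given an event seen by this cache. */
    enum mesi  { INVALID, SHARED, EXCLUSIVE, MODIFIED };
    enum event {
        READ_MISS_SHARED,  /* our read miss; another cache holds the line */
        READ_MISS_ALONE,   /* our read miss; no other cache holds it      */
        WRITE_HIT,         /* we write a line we already hold             */
        WRITE_MISS,        /* we write a line we do not hold              */
        REMOTE_READ,       /* another cache reads our line off the bus    */
        REMOTE_WRITE       /* another cache writes our line               */
    };

    enum mesi next_state(enum mesi s, enum event e) {
        switch (e) {
        case READ_MISS_SHARED: return SHARED;
        case READ_MISS_ALONE:  return EXCLUSIVE;
        case WRITE_MISS:       return MODIFIED;  /* after invalidating others  */
        case WRITE_HIT:        return MODIFIED;  /* silent when s == EXCLUSIVE */
        case REMOTE_READ:      /* downgrade; if MODIFIED, also supply the data */
            return (s == MODIFIED || s == EXCLUSIVE) ? SHARED : s;
        case REMOTE_WRITE:     return INVALID;   /* our copy becomes stale */
        }
        return s;
    }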

MOESI (Modified-Owner-Exclusive-Shared-Invalid)

- States: Modified, Owned, Exclusive, Shared, Invalid.
- Transitions:
  - Read miss: If another cache holds the block in Modified state, that cache supplies the data and moves to Owned, and the requester receives the block in Shared state. Otherwise, load from memory and transition to Exclusive.
  - Write miss: Invalidate all other cached copies, obtain the block, and transition to Modified.
  - Write hit: If the block is Exclusive, transition silently to Modified. If it is Shared or Owned, invalidate the other copies and transition to Modified.

Factors Affecting Choice of Protocol

- Number of processors: Directory-based protocols scale better for larger systems, since snooping relies on broadcast.
- Communication patterns: Write-update can help when values written by one processor are promptly read by others; write-invalidate is usually cheaper when a processor writes repeatedly or the data is not actively shared.
- Performance requirements: The choice of protocol can significantly impact performance, so careful consideration is needed.

By understanding these cache coherence protocols, you can make informed decisions when
designing and implementing parallel and distributed computing systems.
