PARALLEL ALGORITHM AND PROGRAMMING
1. Parallel Computing Fundamentals
1.1 Motivating Parallelism
1.2 Scope of Parallel Computing
1.3 Parallel Computing Platforms
1.4 Setting up Parallel Computing Environment
2. Parallel Programming Models
2.1 Shared Address Space
2.2 Distributed memory
2.3 Data Parallelism
2.4 Hybrid
2.5 SPMD and MPMD
3. Parallel Algorithm Design
3.1 Methodical Design
3.2Problem Understanding
3.3 Partitioning
3.4 Communication
3.5 Synchronization
3.6 Dependencies
3.7 Mapping
4. Shared Address Space Programming Paradigm
4.1 Thread Lifecycle
4.2 Low-level APIs
4.3 Synchronization Construct
4.4 Liveness problems
4.5 High-level APIs
5. Message Passing Programming Paradigm
5.1 MPI: Message Passing Interface
5.2 Basic Operations
5.3 Communication Protocols
5.4 Point-to-point communication using API
5.5 Collective communication using API
6. Performance Analysis
6.1 Source of Overhead
6.2 Performance Metrics
6.3 Scalability
6.4 Analysis Techniques
Definition of Parallel Programming
Complex problems require complex solutions. Instead of waiting hours for a program to finish
running, why not utilize parallel programming? Parallel programming helps developers break
down the tasks that a program must complete into smaller segments of work that can be done in
parallel. While parallel programming can be a more time intensive effort up front for developers
to create efficient parallel algorithms and code, it overall saves time by leveraging parallel
processing power by running the program across multiple compute nodes and CPU cores at the
same time.
Parallel programming, in simple terms, is the process of decomposing a problem into smaller
tasks that can be executed at the same time using multiple compute resources. The term parallel
programming may be used interchangeable with parallel processing or in conjunction with
parallel computing, which refers to the systems that enable the high efficiency of parallel
programming.
In parallel programming, tasks are parallelized so that they can be run at the same time by using
multiple computers or multiple cores within a CPU. Parallel programming is critical for large
scale projects in which speed and accuracy are needed. It is a complex task, but allows
developers, researchers, and users to accomplish research and analysis quicker than with a
program that can only process one task at a time.
Parallel programming works by assigning tasks to different nodes or cores. In High Performance
Computing (HPC) systems, a node is a self-contained unit of a computer system contains
memory and processors running an operating system. Processors, such as central processing units
(CPUs) and graphics processing units (GPUs), are chips that contain a set of cores. Cores are the
units executing commands; there can be multiple cores in a processor and multiple processors in
a node. With parallel programming, a developer writes code with specialized software to make it
easy for them to run their program across on multiple nodes or processors. A simple example of
where parallel programming could be used to speed up processing is recoloring an image. A
developer writes the code to break up the overall task of to change the individual aspects of an
image by segmenting the image into equal parts and then assigns the recoloring of each part to a
different parallel task, each running on their own compute resources. Once the parallel tasks have
completed, the full image is reassembled.
Parallel processing techniques can be utilized on devices ranging from embedded, mobile,
laptops, and workstations to the world’s largest supercomputers. Different computer languages
provide various technologies to enable parallelism. For C, C++ and Fortran, OpenMP, open
multi-processing, provides a cross-platform API for developing parallel applications that enable
running parallel tasks across cores of a CPU. When processes need to communicate between
different computers or nodes, a technology such as MPI, message passing interface, is typically
used. There are benefits to both models. Multiple cores on a single node share memory. Shared
memory is typically faster for exchanging information than message passing between nodes over
a network. However, there’s a limit to how many cores a single node can have. As projects get
larger, developers may use both types of parallelism together. One of the challenges that
developers face though is properly decomposing their algorithm and parallelizing across multiple
nodes and multiple cores for maximum performance and debugging their parallel application
when it does not work correctly.
Relevance of Parallel Programming
Parallel programming’s ability to decompose tasks makes it a suitable solution for complex
problems involving large quantities of data, complex calculations or large simulations.
Previously unsolvable problems have been decomposed using parallel programming, such as
weather simulations, vaccine development, and astrophysics research.
Parallel programming use cases include:
      Advanced graphics in the entertainment industry
      Applied physics
      Climate research
      Electrical engineering
      Financial and economic modeling
      Molecular modeling
      National defense and nuclear weaponry
      Oil and gas exploration
      Quantum mechanics
Different parallel algorithms and techniques
Parallel algorithms and complexity are essential topics in computer engineering, as they can help
optimize the performance and scalability of various applications and systems. In this article, you
will learn how to compare and contrast different parallel algorithms and techniques, and how to
evaluate their efficiency and suitability for different problems.
Reasons for parallel algorithms
Parallel algorithms are algorithms that can be executed by multiple processors or threads
simultaneously, in order to solve a problem faster or with less resources than a sequential
algorithm. Parallel algorithms can be classified into different models, such as shared-memory,
distributed-memory, message-passing, or hybrid models, depending on how the processors
communicate and access data. Parallel algorithms can also be categorized by their design
patterns, such as divide-and-conquer, map-reduce, pipeline, or task parallelism, depending on
how the problem is decomposed and coordinated. Parallel algorithms are important because they
can exploit the parallelism inherent in many problems and applications, such as image
processing, machine learning, cryptography, or scientific computing. Parallel algorithms can
improve the performance, scalability, and reliability of these applications, by reducing the
execution time, increasing the throughput, or enhancing the fault tolerance. Parallel algorithms
can also take advantage of the advances in hardware and software technologies, such as
multicore processors, cloud computing, or parallel programming languages and frameworks.
Comparism among existing parallel algorithms
One way to compare parallel algorithms is to use complexity analysis, which measures the time
and space requirements of an algorithm as a function of the input size. Complexity analysis can
be applied to both sequential and parallel algorithms, but for parallel algorithms, there are
additional metrics to consider, such as speedup, efficiency, scalability, and overhead. Speedup is
the ratio of the sequential execution time to the parallel execution time, and it indicates how
much faster the parallel algorithm is. Efficiency is the ratio of speedup to the number of
processors, and it indicates how well the parallel algorithm utilizes the processors. Scalability is
the ability of the parallel algorithm to maintain speedup and efficiency as the input size or the
number of processors increases. Overhead is the extra time or space required by the parallel
algorithm due to communication, synchronization, or load balancing. Another way to contrast
parallel algorithms is to use empirical evaluation, which involves running the parallel algorithms
on real or simulated hardware and measuring their performance and behavior. Empirical
evaluation can provide more realistic and accurate results than complexity analysis, as it can
account for the actual characteristics and constraints of the hardware and software platforms,
such as processor speed, memory size, network bandwidth, or operating system. Empirical
evaluation can also reveal the trade-offs and limitations of different parallel algorithms, such as
load imbalance, communication bottleneck, or synchronization overhead.
Choosing parallel algorithms
The choice of parallel algorithms depends on several factors, such as the nature of the problem,
the characteristics of the data, the features of the hardware and software platforms, and the goals
and constraints of the application. Some problems are more amenable to parallelization than
others, depending on their degree of independence, granularity, regularity, or locality. Some data
structures are more suitable for parallel manipulation than others, depending on their size, shape,
distribution, or access pattern. Some platforms are more supportive of parallel execution than
others, depending on their architecture, communication model, programming language, or
framework. Some applications have more stringent requirements than others, such as accuracy,
reliability, security, or energy efficiency. Therefore, choosing parallel algorithms requires a
careful analysis and comparison of these factors, as well as a good understanding of the strengths
and weaknesses of different parallel algorithms and techniques.