Introduction to
High Performance Computing
Gregory G. Howes
Department of Physics and Astronomy
University of Iowa
PHYS 5905: Numerical Simulation of Plasmas
Spring 2019
Thank you
This presentation borrows heavily from information freely available on the web by
Ian Foster and Blaise Barney
(see references)
Outline
• Introduction
• Thinking in Parallel
• Parallel Computer Architectures
• Parallel Programming Models
• References
Introduction
Disclaimer: High Performance Computing (HPC) is valuable to
a variety of applications over a very wide range of fields.
Many of my examples will come from the world of physics,
but I will try to present them in a general sense.
Why Use Parallel Computing?
• Single processor speeds are reaching their ultimate limits
• Multi-core processors and multiple processors are the most
promising paths to performance improvements
Definition of a parallel computer:
A set of independent processors that can work cooperatively
to solve a problem.
Introduction
The March towards Petascale Computing
• Computing performance is defined in terms of
FLoating-point OPerations per Second (FLOPS)
GigaFLOP   1 GF = 10^9 FLOPS
TeraFLOP   1 TF = 10^12 FLOPS
PetaFLOP   1 PF = 10^15 FLOPS
• Petascale computing also refers to extremely large data sets
PetaByte   1 PB = 10^15 Bytes
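For scale (illustrative numbers, not from the slide): a machine with 100,000 cores each sustaining 10 GigaFLOPS delivers 10^5 x 10^10 = 10^15 FLOPS, i.e. 1 PetaFLOP.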
Introduction
Performance improves by a factor of ~10 every 4 years!
Outline
• Introduction
• Thinking in Parallel
• Parallel Computer Architectures
• Parallel Programming Models
• References
Thinking in Parallel
DEFINITION Concurrency: The property of a parallel algorithm
that a number of operations can be performed by separate
processors at the same time.
Concurrency is the key concept in the design of parallel algorithms:
• Requires a different way of looking at the strategy to solve a
problem
• May require a very different approach from a serial program to
achieve high efficiency
Thinking in Parallel
DEFINITION Scalability: The ability of a parallel algorithm to
demonstrate a speedup proportional to the number of processors
used.
DEFINITION Speedup: The ratio of the serial wallclock time to the
parallel wallclock time required for execution.
S = wallclock time_serial / wallclock time_parallel
• An algorithm that has good scalability will take half the time with
double the number of processors
• Parallel Overhead, the time required to coordinate parallel tasks
and communicate information between processors, degrades
scalability.
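For example (illustrative numbers): a run that takes 100 s of wallclock time on 1 processor and 12.5 s on 16 processors has speedup S = 100/12.5 = 8, only half of the ideal speedup of 16; parallel overhead accounts for the difference.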
Example: Numerical Integration
Numerical Integration: Monte Carlo Method
• Choose N points within the box of total area A
• Determine the number of points n falling below f (x)
• Integral value is I = A (n/N)
[Figure: random sample points in a box enclosing the curve f(x), plotted versus x]
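A minimal serial sketch of this estimate in Fortran (the curve f(x) = x**2 and a unit box of area A = 1 are illustrative assumptions, not part of the slide):

! Serial Monte Carlo sketch: estimate the integral of f(x) over [0,1]
! using a unit box of area A = 1 (f(x) = x**2 is an illustrative choice)
program mc_serial
  implicit none
  integer, parameter :: N = 1000000       ! number of random points
  integer :: i, n_below
  real :: x, y
  call random_seed()
  n_below = 0
  do i = 1, N
     call random_number(x)                ! random point (x,y) in the unit box
     call random_number(y)
     if (y < x**2) n_below = n_below + 1  ! count points falling below f(x)
  end do
  print *, 'I = A*n/N = ', real(n_below)/real(N)
end program mc_serial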
How do we do this computation in parallel?
Example: Numerical Integration
Strategies for Parallel Computation of the Numerical Integral:
1) Give different ranges of x to different processors and sum results
2) Give N/4 points to each processor and sum results
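A sketch of strategy 2 using MPI (MPI itself is introduced later in these slides; f(x) = x**2, a unit box, and simple per-task seeding are illustrative assumptions):

! Parallel Monte Carlo sketch (strategy 2): each task samples N/nprocs
! points; MPI_REDUCE sums the counts on task 0 (f(x) = x**2 assumed)
program mc_parallel
  use mpi
  implicit none
  integer, parameter :: N = 1000000
  integer :: ierr, rank, nprocs, i, n_local, n_total
  real :: x, y
  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
  call random_seed()                      ! in practice, give each task a distinct seed
  n_local = 0
  do i = 1, N/nprocs                      ! each task handles its share of the points
     call random_number(x)
     call random_number(y)
     if (y < x**2) n_local = n_local + 1
  end do
  call MPI_REDUCE(n_local, n_total, 1, MPI_INTEGER, MPI_SUM, 0, &
                  MPI_COMM_WORLD, ierr)
  if (rank == 0) print *, 'I = A*n/N = ', real(n_total)/real(nprocs*(N/nprocs))
end program mc_parallel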
Example: Fibonacci Series
The Fibonacci series is defined by:
f (k + 2) = f (k + 1) + f (k) with f (1) = f (2) = 1
The Fibonacci series is therefore (1, 1, 2, 3, 5, 8, 13, 21, . . .)
The Fibonacci series can be calculated using the loop
f(1)=1
f(2)=1
do i=3, N
f(i)=f(i-1)+f(i-2)
enddo
How do we do this computation in parallel?
This calculation cannot be made parallel.
- We cannot calculate f (k + 2) until we have f (k + 1) and f (k)
- This is an example of data dependence that results in a non-parallelizable problem
Example: Protein Folding
• Protein folding problems involve a large number of independent
calculations that do not depend on data from other calculations
• Concurrent calculations with no dependence on the data from
other calculations are termed Embarrassingly Parallel
• These embarrassingly parallel problems are ideal for solution by
HPC methods, and can realize nearly ideal concurrency and
scalability
Unique Problems Require Unique Solutions
• Each scientific or mathematical problem will, in general, require a
unique strategy for efficient parallelization
Thus, each of you may require a different parallel implementation
of your numerical problem to achieve good performance.
• Flexibility in the way a problem is solved is beneficial to finding a
parallel algorithm that yields a good parallel scaling.
• Often, one has to employ substantial creativity in the way a
parallel algorithm is implemented to achieve good scalability.
Understand the Dependencies
• One must understand all aspects of the problem to be solved, in
particular the possible dependencies of the data.
• It is important to understand fully all parts of a serial code that
you wish to parallelize.
Example: Pressure Forces (Local) vs. Gravitational Forces (Global)
Rule of Thumb
When designing a parallel algorithm, always remember:
Computation is FAST
Communication is SLOW
Input/Output (I/O) is INCREDIBLY SLOW
Other Issues
In addition to concurrency and scalability, there are a number of
other important factors in the design of parallel algorithms:
Locality
Granularity
Modularity
Flexibility
Load balancing
We’ll learn about these when we discuss the design of parallel
algorithms.
Outline
• Introduction
• Thinking in Parallel
• Parallel Computer Architectures
• Parallel Programming Models
• References
The Von Neumann Architecture
Virtually all computers follow this basic design
• Memory stores both instructions and data
• Control unit fetches instructions from memory, decodes them, and then
sequentially executes the operations needed to perform the programmed task
• Arithmetic Unit performs mathematical operations
• Input/Output is interface to the user
Flynn’s Taxonomy
• SISD: This is a standard serial computer: one set of instructions, one data stream
• SIMD: All units execute same instructions on different data streams (vector)
- Useful for specialized problems, such as graphics/image processing
- Old Vector Supercomputers worked this way, as do modern GPUs
• MISD: Single data stream operated on by different sets of instructions, not
generally used for parallel computers
• MIMD: Most common parallel computer, each processor can execute different
instructions on different data streams
-Often constructed of many SIMD subcomponents
Parallel Computer Memory Architectures
• Shared Memory
• Distributed Memory
• Hybrid Distributed-Shared Memory
Relation to Parallel Programming Models
• OpenMP: Multi-threaded calculations occur within shared-memory components
of systems, with different threads working on the same data.
• MPI: Based on a distributed-memory model; data associated with another
processor must be communicated over the network connection.
• GPUs: Graphics Processing Units (GPUs) incorporate many (hundreds of)
computing cores with a single control unit, so this is a shared-memory model.
• Processors vs. Cores: A modern processor contains multiple cores that share
memory, so a single multi-core processor is effectively a small shared-memory
parallel computer; separate processors (or nodes) communicate as in the
distributed-memory model.
Outline
• Introduction
• Thinking in Parallel
• Parallel Computer Architectures
• Parallel Programming Models
• References
Parallel Programming Models
• Embarrassingly Parallel
• Master/Slave
• Threads
• Message Passing
• Single Program-Multiple Data (SPMD)
vs. Multiple Program-Multiple Data (MPMD)
• Other Parallel Implementations: GPUs and CUDA
Embarrassingly Parallel
• Refers to an approach that involves solving many similar but independent
tasks simultaneously
• Little to no coordination (and thus no communication) between tasks
• Each task can be a simple serial program
• This is the “easiest” type of problem to implement in a parallel manner:
it essentially requires only farming out many independent calculations
and possibly collating the results.
• Examples:
- Computer Graphics and Image Processing
- Protein Folding Calculations in Biology
- Geographic Land Management Simulations in Geography
- Data Mining in numerous fields
- Event simulation and reconstruction in Particle Physics
Master/Slave
• Master task assigns jobs to a pool of slave tasks
• Each slave task performs its job independently
• When completed, each slave returns its results to the master and awaits a new job
• Embarrassingly parallel problems are often well suited to this parallel programming model
[Figure: master task connected to a pool of slave tasks]
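A compact sketch of this pattern in MPI Fortran (MPI is introduced later in these slides; the "job" here is just an integer id and the "work" is squaring it, both illustrative assumptions; assumes at least two tasks and at least as many jobs as slaves):

! Master/slave sketch: the master hands out NJOBS integer job ids;
! each slave "works" by squaring the id it receives (illustrative only)
program master_slave
  use mpi
  implicit none
  integer, parameter :: NJOBS = 100, WORKTAG = 1, DIETAG = 2
  integer :: ierr, rank, nprocs, job, res, nsent, nrecv, s
  integer :: status(MPI_STATUS_SIZE)
  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
  if (rank == 0) then                                  ! master task
     nsent = 0
     do s = 1, nprocs-1                                ! give every slave a first job
        nsent = nsent + 1
        call MPI_SEND(nsent, 1, MPI_INTEGER, s, WORKTAG, MPI_COMM_WORLD, ierr)
     end do
     do nrecv = 1, NJOBS                               ! collect all results
        call MPI_RECV(res, 1, MPI_INTEGER, MPI_ANY_SOURCE, MPI_ANY_TAG, &
                      MPI_COMM_WORLD, status, ierr)
        s = status(MPI_SOURCE)
        if (nsent < NJOBS) then                        ! hand that slave the next job
           nsent = nsent + 1
           call MPI_SEND(nsent, 1, MPI_INTEGER, s, WORKTAG, MPI_COMM_WORLD, ierr)
        else                                           ! no work left: dismiss the slave
           job = 0
           call MPI_SEND(job, 1, MPI_INTEGER, s, DIETAG, MPI_COMM_WORLD, ierr)
        end if
     end do
  else                                                 ! slave task
     do
        call MPI_RECV(job, 1, MPI_INTEGER, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &
                      status, ierr)
        if (status(MPI_TAG) == DIETAG) exit            ! no more work
        res = job*job                                  ! do the assigned job
        call MPI_SEND(res, 1, MPI_INTEGER, 0, WORKTAG, MPI_COMM_WORLD, ierr)
     end do
  end if
  call MPI_FINALIZE(ierr)
end program master_slave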
Multi-Threading
• Threading involves a single process that can have multiple, concurrent
execution paths
• Works in a shared memory architecture
• Most common implementation is OpenMP (Open Multi-Processing)
serial code
.
.
.
!$OMP PARALLEL DO
do i = 1,N
   A(i)=B(i)+C(i)
enddo
!$OMP END PARALLEL DO
.
.
.
serial code
• Relatively easy to make inner loops of a serial code parallel and achieve
substantial speedups with modern multi-core processors
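A self-contained sketch of the same loop as a complete program (the array size and contents are illustrative assumptions; compile with an OpenMP flag such as gfortran -fopenmp):

! Complete OpenMP sketch of the loop above (illustrative values for B and C)
program omp_add
  implicit none
  integer, parameter :: N = 100000
  integer :: i
  real :: A(N), B(N), C(N)
  B = 1.0                                ! illustrative data
  C = 2.0
  !$OMP PARALLEL DO
  do i = 1, N
     A(i) = B(i) + C(i)                  ! iterations are shared among threads
  end do
  !$OMP END PARALLEL DO
  print *, 'A(1) = ', A(1)
end program omp_add

The number of threads is typically set with the OMP_NUM_THREADS environment variable.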
Message Passing
• The most widely used model for parallel programming
• Message Passing Interface (MPI) is the most widely used implementation
• A set of tasks have their own local memory during the computation
(distributed-memory, but can also be used on shared-memory machines)
• Tasks exchange data by sending and receiving messages, which requires the
programmer to explicitly coordinate all sends and receives.
• One aim of this course is to learn to use MPI to write parallel programs.
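A minimal MPI sketch in Fortran of explicit sends and receives (the integer payload is an illustrative assumption; compile with an MPI wrapper such as mpif90 and launch with mpirun):

! Minimal MPI sketch: every task sends its rank to task 0, which sums them
program mpi_sum_ranks
  use mpi
  implicit none
  integer :: ierr, rank, nprocs, i, val, total
  integer :: status(MPI_STATUS_SIZE)
  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
  if (rank /= 0) then
     val = rank                          ! each task's message is just its rank
     call MPI_SEND(val, 1, MPI_INTEGER, 0, 0, MPI_COMM_WORLD, ierr)
  else
     total = 0
     do i = 1, nprocs-1                  ! task 0 receives one message per task
        call MPI_RECV(val, 1, MPI_INTEGER, i, 0, MPI_COMM_WORLD, status, ierr)
        total = total + val
     end do
     print *, 'Sum of ranks received by task 0:', total
  end if
  call MPI_FINALIZE(ierr)
end program mpi_sum_ranks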
SPMD vs. MPMD
Single Program-Multiple Data (SPMD)
• A single program executes on all tasks simultaneously
• At a single point in time, different tasks
may be executing the same or different
instructions (logic allows different tasks to execute different parts of the code)
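A minimal SPMD sketch: one executable runs on every task, and branching on the task rank selects different code paths (the branch bodies are hypothetical placeholders):

! SPMD sketch: the same program runs everywhere; rank selects the path
program spmd_branch
  use mpi
  implicit none
  integer :: ierr, rank
  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  if (rank == 0) then
     print *, 'Task 0: e.g., read input and distribute it'
  else
     print *, 'Task', rank, ': e.g., work on this task''s share of the data'
  end if
  call MPI_FINALIZE(ierr)
end program spmd_branch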
Multiple Program-Multiple Data (MPMD)
• Each task may execute the same program as, or a different program from, the other tasks
• The different executable programs may
communicate to transfer data
Other Parallel Programming Models
• GPUs (Graphics Processing Units) contain many (hundreds) of processing
cores, allowing for rapid vector processing (Single Instruction, Multiple Data)
• CUDA (Compute Unified Device Architecture) programming allows one to
call on this powerful computing engine from codes written in C, Fortran,
Python, Java, and Matlab.
• This is an exciting new way to achieve massive computing power for little
hardware cost, but memory access bandwidth limitations constrain the possible
applications.
Parting Thoughts
• Part of the challenge of parallel computing is that the most efficient
parallelization strategy for each problem generally requires a unique solution.
• It is generally worthwhile spending significant time considering alternative
algorithms to find an optimal one, rather than just implementing the first thing
that comes to mind.
• But, consider the time required to code a given parallel implementation
- You can use a less efficient method if the implementation is much easier.
- You can always improve the parallelization scheme later. Just focus on making
the code parallel first.
TIME is the ultimate factor in choosing a parallelization strategy: Your Time!
References
Introductory Information on Parallel Computing
• Designing and Building Parallel Programs, Ian Foster
http://www.mcs.anl.gov/~itf/dbpp/
-Somewhat dated (1995), but an excellent online textbook with detailed discussion about
many aspects of HPC. This presentation borrowed heavily from this reference
• Introduction to Parallel Computing, Blaise Barney
https://computing.llnl.gov/tutorials/parallel_comp/
-Up to date introduction to parallel computing with excellent links to further information
• MPICH2: Message Passing Interface (MPI) Implementation
http://www.mcs.anl.gov/research/projects/mpich2/
-The most widely used Message Passing Interface (MPI) implementation
• OpenMP
http://openmp.org/wp/
-Application Program Interface (API) supports multi-platform shared-memory parallel
programming in C/C++ and Fortran
• Numerical Recipes
http://www.nr.com/
-Incredibly useful reference for a wide range of numerical methods, though not focused on
parallel algorithms.
• The Top 500 Computers in the World
http://www.top500.org/
-Updated semi-annually list of the Top 500 Supercomputers
References
Introductory Information on Parallel Computing
• Message Passing Interface (MPI), Blaise Barney
https://computing.llnl.gov/tutorials/mpi/
-Excellent tutorial on the use of MPI, with both Fortran and C example code
• OpenMP, Blaise Barney
https://computing.llnl.gov/tutorials/openMP/
-Excellent tutorial on the use of OpenMP, with both Fortran and C example code
• High Performance Computing Training Materials, Lawrence Livermore National Lab
https://computing.llnl.gov/?set=training&page=index
-An excellent online set of webpages with detailed tutorials on many aspects of high
performance computing.