Independent University, Bangladesh
Department of Computer Science and Engineering
Course Title: Introduction to High Performance Computing
Course Code: Autumn-2024-CSC471
SECTION 1: (T) 06:30 PM – 09:30 PM
Presented by
Dr. Rubaiyat Islam
Adjunct Faculty, IUB.
Omdena Bangladesh Chapter Lead
Crypto-economist Consultant
Sifchain Finance, USA.
WHAT IS APPLICATION TIMING?
• Analysis of a program’s behavior using information gathered
as the program runs
• Why do it?
• Good way to improve efficiency of scripts
• Identify performance problems
• Often required for allocation requests on shared systems
HOW WOULD YOU DO IT?
• Measure time of execution of an entire program or simply
a code snippet
• Loops
• Timing functions within programs
• Python, Fortran, C++, R have functions that allow you to
measure the execution time of small code snippets
• Can also do this with the Linux “time” command
• Can make changes to code to improve efficiency
• Or…it’s just informational
THE LINUX TIME UTILITY
• The first place to start when profiling your program
time mpirun -np 4 ./prog.mpi

real  0m17.801s   wall-clock time
user  0m58.125s   CPU time summed over all threads (≈ threads × wall-clock)
sys   0m0.081s    system (kernel) overhead
FINE-GRAINED TIMING
• Often useful to time portions of a program
• Good idea when developing your own code
• Tough when it’s 3rd-party software
• Useful functions:
• Fortran: system_clock()
• C++: clock()
• Python: time.time()
• R: Sys.time()
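A minimal Python sketch of fine-grained timing with time.time(); the snippet being timed (a sum of squares) is purely illustrative:

```python
import time

start = time.time()
total = sum(i * i for i in range(100_000))  # the code snippet being timed
elapsed = time.time() - start
print(f"snippet took {elapsed:.6f} s")
```

For very short snippets, time.perf_counter() offers higher resolution than time.time().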
SERIAL VS. PARALLEL PROCESSING
• Serial processing
• A problem is broken into a set of discrete instructions
• These instructions are carried out sequentially on a single
processor
• Only one instruction is executed at a time
• Parallel processing
• Idea where many instructions are carried out simultaneously
across a computing system
• Can divide a large problem up into many smaller problems
WHY PARALLELIZE?
• Single core too slow for solving the problem in a
“reasonable” time
• “Reasonable” time: overnight, over lunch, duration of a PhD
thesis
• Memory requirements
• Larger problem
• More physics
• More particles
• Larger images
• Larger neural networks
BASIC COMPUTER ARCHITECTURE
• Old computers: one unit to execute instructions
• New computers: 4 or more CPU cores
SERIAL PROCESSING – THOUGHT EXPERIMENT
• Let’s say you own a lawn service company
• You have one hundred clients, each of whom wants their lawn mowed in a particular pattern
• Each wants their lawn mowed by the end of the week
• A serial process would be for you to mow all one hundred lawns yourself
• You cannot mow lawn 2 until you finish lawn 1, etc.
• Let’s say doing this takes you the full 7 days, working 16-hour days
SERIAL PROCESSING
• Instructions are executed on
one core
• The other cores sit idle
• Tasks run one at a time: Task 2 waits for Task 1 to complete, etc.
• Wasting resources
• Want to instead parallelize and
use all cores
PARALLEL PROCESSING – THOUGHT EXPERIMENT
• Let’s say that you decide that 100 lawns is too many for one
person to mow in a week
• Or you want to finish it faster
• Therefore you hire one additional person to help you
• How long (in theory) should it take you to finish the lawns?
• Either 3.5 days working 16-hour days, or 7 days working 8-hour days
• You could accomplish this either by both working on one lawn
together or each of you working on a different lawn at the
same time (more on this later)
PARALLEL PROCESSING – THOUGHT EXPERIMENT
• Similarly, you could hire three more people
• Now five total
• How long should it take you to finish?
• In theory, one-fifth the time (five times faster)
• However, it doesn’t actually work out this way. Why?
• Overhead
• Communication
• Who is mowing which lawn?
• If you split a lawn, who mows which parts?
• How do you make sure the patterns match up?
PARALLEL PROCESSING – THOUGHT EXPERIMENT (CONT.)
• However, it doesn’t actually work out this way. Why?
• Resource contention
• Fights over who gets to use the best lawn mower
• So maybe instead of five times as fast it’s four times as fast
• Still faster
• More people?
• Too many people slows down the process too much to make it
worthwhile
• Diminishing return
• 100 might be too many
PARALLEL OVERHEAD
• Should you convert your serial code to parallel?
• Usually done to speed up execution
• But you need to weigh the overhead
• Overhead comes from:
• Startup time
• Synchronization
• Communication
• Libraries and compilers
• Termination time
https://computing.llnl.gov/tutorials/parallel_comp/#ModelsShared
PROGRAMMING TO USE PARALLELISM
• Parallelism across cores/threads within a node – OpenMP
• Parallelism across multiple nodes – MPI
PARALLEL PROCESSING MUSTS AND TRICKS
• Need to be able to break the problem up into parts that
can work independently of each other
• Can’t have the results from one CPU depend on another at
each time step
• Do loops are a great place to start looking for bottlenecks
in your code
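The independence requirement above can be sketched in Python (arrays are illustrative):

```python
a = [1, 2, 3, 4]
b = [10, 20, 30, 40]

# Parallelizable: each iteration touches only its own index,
# so iterations could run on different cores independently.
c = [0] * len(a)
for i in range(len(a)):
    c[i] = a[i] + b[i]

# Not directly parallelizable: iteration i needs the result of
# iteration i - 1 (a loop-carried dependence).
s = [0] * len(a)
s[0] = a[0]
for i in range(1, len(a)):
    s[i] = s[i - 1] + a[i]

print(c)  # [11, 22, 33, 44]
print(s)  # [1, 3, 6, 10]
```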
MEMORY MODELS
• There are three common kinds of parallel memory models
• Shared
• Distributed
• Hybrid
SHARED MEMORY MODEL
SHARED MEMORY MODEL
• All cores share the same pool of memory
• HPC Architecture – we talked about the memory available on one node
• Memory changes are seen by all processors
THOUGHT EXPERIMENT
• Let’s go back to our lawn mowing example
• From the serial vs. parallel processing
• In this example, the lawns are the memory
• The workers are the cores
• When all the workers are working on one lawn, they are sharing the memory
• Every “core” is impacted by changes to “memory”
BENEFITS AND DRAWBACK
• Benefit:
• Data sharing is fast
• Drawback:
• Adding more processors may lead to performance issues when accessing the same
shared memory resource (memory contention)
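A small Python threading sketch of the shared memory model: every thread sees the same counter, and a lock is needed precisely because the memory is shared (the counter and loop counts are illustrative):

```python
import threading

counter = 0              # one pool of memory, visible to every thread
lock = threading.Lock()

def work(n):
    global counter
    for _ in range(n):
        with lock:       # synchronization: threads contend for shared data
            counter += 1

threads = [threading.Thread(target=work, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 4000: all updates landed in the one shared pool
```

The lock is also where the drawback shows up: the more threads fight over it, the more time is lost to contention.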
DISTRIBUTED MEMORY MODEL
DISTRIBUTED MEMORY MODEL
• In a distributed memory model, each core has its own memory
• Processors share data only through a network connection and/or communication protocol (e.g., MPI)
• Changes to a processor’s local memory have no impact on other processors
• Remote-memory access must be explicitly managed by the programmer
THOUGHT EXPERIMENT
• Let’s go back to our lawn mowing example
• From the serial vs. parallel processing
• In this example, the lawns are the memory
• The workers are the cores
• When each worker is working on a different lawn, it is distributed memory
BENEFITS AND DRAWBACKS
• Biggest benefit is scalability
• Adding more processors doesn’t result in resource contention as far as memory is
concerned
• Biggest Drawback
• Can be tedious to program for distributed memory models
• All data relocation must be programmed by hand
HYBRID MEMORY MODEL
HYBRID MEMORY MODEL
• As the name implies, the hybrid memory model is a combination of the shared
and distributed memory models
• Most large, fast clusters today use a hybrid memory model
• A certain number of cores share the memory on one node, but are connected to
the cores sharing memory on other nodes through a network
THOUGHT EXPERIMENT
• Let’s go back to our lawn mowing example
• From the serial vs. parallel processing
• In this example, the lawns are the memory
• The workers are the cores
• Several workers share one lawn – like cores sharing memory within a node
• Workers on different lawns must communicate to coordinate the work – like nodes exchanging messages over the network
• E.g., to make the patterns match up
BENEFITS AND DRAWBACKS
• Benefit:
• Scalability
• Drawback
• Must know how to program communication between
nodes (e.g., MPI)
DATA AND TASK PARALLELISM
• Earlier we discussed parallel memory models
• One of them was distributed memory, wherein each core accesses its own pool of memory
• Data and task parallelism are related concepts
• Data parallelism
• Distribute the data across processors
• Task parallelism
• Distribute the compute tasks across processors
DATA PARALLELISM
• Different parts of a dataset are distributed across nodes
array1 = [a, b, c, d]

NODE 1: a, b        NODE 2: c, d
TASK PARALLELISM
• Each processor executes a different task on the same
dataset
• Tasks (code, instructions) are spread out among the cores
• Might be same instructions/code or different
• Distributed programming
• Example: Calculating wind speed from vector components
across a geographic area. Divide vector calculation among
processors
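The wind example above can be sketched in Python with a thread pool; the component values and the worker split are illustrative, not from the slides:

```python
import math
from concurrent.futures import ThreadPoolExecutor

u = [3.0, 6.0, 0.0, 5.0]   # east-west wind components
v = [4.0, 8.0, 2.0, 12.0]  # north-south wind components

def wind_speed(pair):
    ui, vi = pair
    return math.hypot(ui, vi)  # sqrt(ui**2 + vi**2)

# Divide the per-point calculation among worker threads
with ThreadPoolExecutor(max_workers=4) as pool:
    speeds = list(pool.map(wind_speed, zip(u, v)))

print(speeds)  # [5.0, 10.0, 2.0, 13.0]
```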
DATA PARALLELISM - SIMD
• Two types of data parallelism we’ll discuss here
• SIMD – Single Instruction, Multiple Data
• SPMD – Single Program, Multiple Data
• SIMD
• Carry out the same instruction simultaneously multiple times across
different elements of a dataset
• Vector operation
• Addition, subtraction, multiplication, division
• Have to prepare your data to be vectorized
VECTORIZATION
• Simply put, performing multiple math operations at once

Non-vectorized:

    a = rand(1,4)
    b = rand(1,4)
    for i = 1:length(a)
        c(i) = a(i) + b(i)
    end

Vectorized:

    a = rand(1,4)
    b = rand(1,4)
    c = a + b

• Interpreted languages (Python, R, etc.): call vectorized functions
• Compiled languages: the compiler can often vectorize loops for you
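The same contrast in Python, assuming NumPy is installed (the fixed seed is only for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(4)
b = rng.random(4)

# Non-vectorized: explicit element-by-element loop
c_loop = np.empty_like(a)
for i in range(len(a)):
    c_loop[i] = a[i] + b[i]

# Vectorized: one array operation replaces the whole loop
c_vec = a + b

assert np.allclose(c_loop, c_vec)
```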
DATA PARALLELISM - SPMD
• SPMD
• Carry out the same program multiple times on different elements of a dataset
• Calculate the wind direction from wind components
(a1, b1) → Program → c1
(a2, b2) → Program → c2
WHY DO THIS?
• Cleaner code
• Faster execution time
• Eliminating loops!
• Usually not too challenging
• Many languages have functions that make this easy to perform
HIGH THROUGHPUT COMPUTING
• Thus far: High Performance Computing (HPC)
• Typical HPC: employ multiple processors to
• Solve a problem faster
• Solve larger problems
• Today: High Throughput Computing (HTC)
• Typical HTC:
• Multiple small jobs spread across many processors
HIGH THROUGHPUT COMPUTING
• HTC useful when have many small jobs that require little computational power
or memory
• Jobs are typically serial, not parallel
• HTC advantage:
• Small serial jobs can fill in the “gaps” left by large parallel jobs
• E.g., Open Science Grid
• Effectively parallel: batch of jobs completes faster when spread across multiple cores
• Example: Image analysis
ADVANTAGES AND DISADVANTAGES
• Advantages
• Simplicity
• Much easier to match one task to one CPU rather than many at
once
• Doesn’t require the programmer to know parallel programming
• Disadvantage
• Your HPC center might not be set up ideally for HTC
• Might not allow for node sharing
• No batch submission system to manage multiple small jobs
• Possibly requires heavy scripting to manage workflow
HTC MECHANICS
• No real tricks
• Break down your problem and then submit a lot of smaller jobs
• For example, if you are analyzing 1 million images, rather than submitting
one job to analyze all 1 million images, submit one thousand jobs that
analyze 1000 images each
• The resource manager (e.g., Slurm) should take care of the rest
• Is HTC appropriate? Problem dependent!
• Is the serial execution time reasonable?
• Can the problem fit into one core’s worth of memory?
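The image-analysis split above can be sketched in Python (file names and batch size are illustrative):

```python
# Split 1,000,000 image names into 1,000 batches of 1,000 each;
# one job per batch would then be submitted to the scheduler (e.g., Slurm).
images = [f"img_{i:07d}.png" for i in range(1_000_000)]
batch_size = 1000
batches = [images[i:i + batch_size]
           for i in range(0, len(images), batch_size)]
print(len(batches))     # 1000 jobs
print(len(batches[0]))  # 1000 images per job
```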
THANK YOU