Lecture 3
Introduction to Parallelization
August 7, 2023
Logistics
• Office hours: 5 – 6 PM
– Chetna (RM) chetna@cse
– Muzafar (KD-213) muzafarwan@cse
– Vishal Singh (RM-505) vshlsng@cse
• Group formation
– Email by August 9
– Include names, roll numbers, email-ids
– vdeka@cse, chetna@cse
2
Parallelism Everywhere
Source: https://gilmour.com/
Why Parallel?
Task: Find the average age of Indians
India's 2020 population was estimated at 1,380,004,385 people at mid-year, according to UN data.
Time (1 human): > 40 years
Time (1 CPU): 10 s
Time (2 CPUs): 5 s
Time (4 CPUs): 3 s
Parallelism
A parallel computer is a collection of processing
elements that communicate and cooperate to solve
large problems fast.
– Almasi and Gottlieb (1989)
5
Why Fast?
6
Basic Computing Unit
[Figure: a processing unit (CPU/core) connected to memory – example: Intel Core i7. Courtesy: www.intel.com]
System – Simplified View
[Figure: CPU cores with fast memory (cache), main memory, and disk]
8
Multicore Era
• Intel 4004 (1971) – single core, single chip
• Cray X-MP (1982) – single core, multiple chips
• Hydra (1996) – multiple cores, single chip
• IBM POWER4 (2001) – multiple cores, multiple chips
9
Moore’s Law
The number of CPU cores per node has increased.
Gordon Moore
[Source: Wikipedia]
Parallel Computing
[Figure: a supercomputer/cluster/data center – compute nodes with disks connected by a network]
Network is the backbone for data communication
Parallel Computer
Compute nodes
Domain Decomposition
Compute nodes
Discretization
Gridded mesh for a global model [Credit: Tompkins, ICTP]
15
Data Bottleneck
[Figure: compute nodes reading/writing to shared storage; congestion builds up between the nodes and the storage]
Example: 'Age' is the data here.
Average – Serial vs. Parallel
Serial:
    for i = 1 to N
        sum += a[i]
    avg = sum / N
Parallel (P processes):
    for i = 1 to N/P
        sum += a[i]
    collect the partial sums and compute the average
17
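As a concrete reference point, here is a minimal C sketch of the serial version, assuming the ages are stored in an array a of length N (the function name serial_average is illustrative, not from the slides):

double serial_average(const int *a, long N)
{
    long long sum = 0;
    for (long i = 0; i < N; i++)
        sum += a[i];               // accumulate all ages
    return (double) sum / N;       // average over the whole array
}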
Parallel Computer
A parallel computer is a collection of processing
elements that communicate and cooperate to solve
large problems fast.
– Almasi and Gottlieb (1989)
18
Parallel Average
[Figure: four processes (0–3), each on its own core with its own memory]
Process 0: for i = 0 to N/P: sum += a[i]
Process 1: for i = N/P to 2N/P: sum += a[i]
Process 2: for i = 2N/P to 3N/P: sum += a[i]
Process 3: for i = 3N/P to N: sum += a[i]
19
Parallel Code Example
// local computation at every process/thread
for (i = N/P * id; i < N/P * (id + 1); i++)
    localsum += a[i]
// collect localsum at one of the ranks, add up, and compute the average
20
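A slightly fleshed-out C version of the same local computation, where id is the process/thread index and P the number of workers (both assumed to be supplied by the programming model; the function name local_sum is illustrative):

long long local_sum(const int *a, long N, int id, int P)
{
    long chunk = N / P;            // assume N is divisible by P for simplicity
    long begin = chunk * id;
    long end   = chunk * (id + 1);
    long long sum = 0;
    for (long i = begin; i < end; i++)
        sum += a[i];               // each worker sums only its own block
    return sum;                    // partial sums are combined afterwards
}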
Performance Measure
• Speedup: S_P = Time(1 processor) / Time(P processors)
• Efficiency: E_P = S_P / P
21
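Worked example (using the timings on the next slide): S_2 = 0.025 / 0.013 ≈ 1.9, so E_2 = 1.9 / 2 ≈ 0.95; with 8 processes, S_8 = 0.025 / 0.009 ≈ 2.8 and E_8 = 2.8 / 8 ≈ 0.35.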
Parallel Performance (Parallel Sum)
Parallel efficiency of summing 10^7 doubles
#Processes Time (sec) Speedup Efficiency
1 0.025 1 1.00
2 0.013 1.9 0.95
4 0.010 2.5 0.63
8 0.009 2.8 0.35
12 0.007 3.6 0.30
22
Ideal Speedup
[Figure: speedup vs. number of processors, with linear, superlinear, and sublinear speedup curves]
23
Scalability Bottleneck
Performance of weather simulation application
24
Programming
25
Parallel Programming Models
• Libraries – MPI, TBB, Pthreads, OpenMP, …
• New languages – Haskell, X10, Chapel, …
• Extensions – Coarray Fortran, UPC, Cilk, OpenCL, …
• Shared memory
– OpenMP, Pthreads, CUDA, …
• Distributed memory
– MPI, UPC, …
• Hybrid
– MPI + OpenMP, MPI + CUDA
26
Sharing Data
27
Parallel Programming Models
Shared memory programming – OpenMP, Pthreads
Distributed memory programming – MPI
[Figure: cores with caches; one process/thread runs on each core]
28
Shared Memory Programming
• Shared address space
• Memory access time can vary with the location of the data (NUMA: non-uniform memory access)
• Programming paradigms – Pthreads, OpenMP
• Need to worry about concurrent access
Memory
CPU cores
29
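To see why concurrent access matters, consider many threads updating one shared variable. A minimal sketch using OpenMP (introduced on the next slides; not code from the lecture): without protection the updates interleave and increments are lost, while an atomic update (or a critical section/reduction) keeps the result correct at some cost.

long shared_sum = 0;

void accumulate(const int *a, long N)
{
    #pragma omp parallel for
    for (long i = 0; i < N; i++) {
        #pragma omp atomic
        shared_sum += a[i];    // protected update; without the atomic this is a data race
    }
}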
Threads
From Tim Mattson’s slides
30
OpenMP (Open Multiprocessing)
• Standard for shared memory programming
– Compiler directives
– Runtime routines
– Environment variables
• OpenMP Architecture Review Board
• First released in Nov’97
• Current version 5.1 (Nov’20)
31
OpenMP Example
• Thread-based
• Fork-join model
#pragma omp parallel   // fork: spawn a default number of threads
{
    …
}                      // join
32
OpenMP
$ gcc -fopenmp -o foo foo.c
33
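The source of foo.c is not reproduced in these notes; a minimal OpenMP program of the kind compiled with the command above might look like this sketch:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    // fork: one thread per core by default, or as set by OMP_NUM_THREADS
    #pragma omp parallel
    {
        printf("Hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }   // join: all threads synchronize here
    return 0;
}

Run as, for example, OMP_NUM_THREADS=4 ./foo to request four threads.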
OpenMP
34
OpenMP
35
Output
36
OpenMP – Parallel Sum
Work on distinct data concurrently
37
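One common way to express this in OpenMP (a sketch, not necessarily the exact code on the slide) is a reduction: each thread gets a private partial sum over its share of the iterations, and the partial sums are combined at the join, so no two threads update the same variable concurrently.

double parallel_average(const int *a, long N)
{
    long long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < N; i++)
        sum += a[i];               // each thread works on distinct iterations
    return (double) sum / N;       // partial sums already combined by the reduction
}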
OpenMP Timing
double stime = omp_get_wtime();
#pragma omp parallel
{
…
}
double etime = omp_get_wtime();
38
Multiple Systems
[Figure: multiple systems, each with cores, caches, and processes]
39
Distributed Memory Systems
• Networked systems
• Distributed memory
• Local memory
• Remote memory
• Parallel file system
[Figure: a cluster of networked nodes, 64–192 GB RAM per node, each holding its own code and data]
40
MPI (Message Passing Interface)
• Standard for message passing in a distributed
memory environment (most widely used
programming model in supercomputers)
• Efforts began in 1991 by Jack Dongarra, Tony
Hey, and David W. Walker
• MPI Forum formed in 1993
– Version 1.0: 1994
– Version 4.0: 2021
41
Process - Distinct Address Space
[Figure: four processes, each on its own core with its own memory holding its local data]
42
Multiple Processes on a Single Node
From N. Karanjkar’s slides
43
Multiple Processes on Multiple Nodes
Node 1
Node 2
44
Communication using Messages
[Figure: four processes, each with its own memory and local data, exchanging messages]
Process 0: for i = 0 to N/P: sum += a[i]
Process 1: for i = N/P to 2N/P: sum += a[i]
Process 2: for i = 2N/P to 3N/P: sum += a[i]
Process 3: for i = 3N/P to N: sum += a[i]
45
Communication using Messages
[Figure: four processes, each with its own memory and local data]
Each process executes the same instruction stream (Instruction 1, Instruction 2, …) on its own local data – SIMD.
46
Message Passing
[Figure: timeline of a message sent from Process 0 to Process 1]
47
MPI Programming
48
MPI Programming
mpicc -o program.x program.c
49
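The source of program.c is not reproduced here; a minimal MPI program of the kind compiled with the command above might look like this sketch:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);                  // start the MPI runtime
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    // my id within the communicator
    MPI_Comm_size(MPI_COMM_WORLD, &size);    // total number of processes
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();                          // shut down MPI
    return 0;
}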
Communication using Messages
[Figure: four processes, each with its own memory and local data]
Process 0: for i = 0 to N/P: sum += a[i]
Process 1: for i = N/P to 2N/P: sum += a[i]
Process 2: for i = 2N/P to 3N/P: sum += a[i]
Process 3: for i = 3N/P to N: sum += a[i]
for i = N/P * rank ; i < N/P * (rank+1) ; i++
localsum += a[i]
Collect localsum, add up at one of the ranks
50
Communication using Messages
[Figure: processes on separate cores and memories exchanging messages]
51
Simplest Communication Primitives
• MPI_Send
• MPI_Recv
52
MPI Programming
SENDER:
int MPI_Send (const void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
RECEIVER:
int MPI_Recv (void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)
53
MPI Programming
char message[20];
int myrank;
MPI_Status status;
MPI_Comm_rank (MPI_COMM_WORLD, &myrank);
// Sender process
if (myrank == 0) /* code for process 0 */
{
    strcpy (message, "Hello, there");
    MPI_Send (message, strlen(message)+1, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
}
// Receiver process
else if (myrank == 1) /* code for process 1 */
{
    MPI_Recv (message, 20, MPI_CHAR, 0, 99, MPI_COMM_WORLD, &status);
    printf ("received :%s\n", message);
}
54
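To try this with two processes, something like mpirun -np 2 ./program.x works with most MPI installations (the exact launcher, e.g. mpiexec or srun, depends on the system).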
MPI – Parallel Sum
Assume the data array resides in the memory of process 0 initially
MPI_Comm_rank (MPI_COMM_WORLD, &myrank);
// Sender process
if (myrank == 0) /* code for process 0 */
{
    for (int rank = 1; rank < size; rank++) {
        start = rank * (N / size);   // element index where this rank's block begins
        MPI_Send (data + start, N/size, MPI_INT, rank, 99, MPI_COMM_WORLD);
    }
}
else /* code for processes 1 … size-1 */
{
    MPI_Recv (data, N/size, MPI_INT, 0, 99, MPI_COMM_WORLD, &status);
}
55
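Once the blocks are distributed, each rank can sum its own portion and return the partial result to rank 0, roughly as in this sketch (rank 0 keeps the first block for itself; the message tag 100 and the variable names total/remote are illustrative):

// every rank (including 0) sums the N/size elements it now holds in data[0 .. N/size-1]
long long localsum = 0;
for (int i = 0; i < N/size; i++)
    localsum += data[i];

if (myrank == 0) {
    long long total = localsum, remote;
    for (int rank = 1; rank < size; rank++) {
        MPI_Recv (&remote, 1, MPI_LONG_LONG, rank, 100, MPI_COMM_WORLD, &status);
        total += remote;             // accumulate the partial sums
    }
    printf ("average = %f\n", (double) total / N);
} else {
    MPI_Send (&localsum, 1, MPI_LONG_LONG, 0, 100, MPI_COMM_WORLD);
}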
MPI Timing
double stime = MPI_Wtime();
…
…
…
double etime = MPI_Wtime();
56
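Each rank measures its own elapsed time (etime - stime); since ranks finish at slightly different moments, one common convention (a sketch, not from the slides; MPI_Reduce is not covered in this lecture) is to report the maximum across ranks:

double elapsed = etime - stime, maxtime;
MPI_Reduce (&elapsed, &maxtime, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
if (myrank == 0)
    printf ("time = %f s\n", maxtime);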
Parallelization
57
Interpolation
58
Interpolation
59
Range/Value Query
Input: File
Output: File
60
Query on a Million Processes
Compute nodes
Unstructured Mesh
Source: COMSOL
62
Unstructured Mesh
Obayashi et al., Multi-objective Design Exploration Using Efficient Global Optimization
63
Thank You
64