CS3210
Introduction
Lecture 1
Outline
Parallel Computing
Motivation
Serial vs Parallel Computing
Serial vs Parallel Processing Units
Basics of Parallel Computing
Decompose, Schedule and Map
Performance
Challenges
Landscape of Parallel Computing
Applications
Systems
Cheetah vs Hyena
Cheetah:
  Weighs 60 kg
  Fastest land animal: top speed of 120 km/h, 0-100 km/h in 3 seconds
  Hunts alone; the hunt is usually over in 1 minute
  50% chance of losing its catch to other predators

Hyena:
  Weighs 50 kg
  Top speed of 65 km/h
  Hunts in groups of 4-40
  Can take down large animals like buffalo, rhino, lion
Observations
Cheetah: optimized for speed only
Hyena: optimized to cooperate and work in parallel

Single Processor: performance has failed to keep up with Moore's Law since the early 2000s
Parallel Computer: a collection of processing elements that cooperate to solve problems quickly, offering a huge amount of computation power
Parallel Computing - Motivation
In the first issue of CACM (1958), Saul Gorn notes that:
“We know that the so-called parallel computers are somewhat faster
than the serial ones, and that no matter how fast we make access and
arithmetic serially, we can do correspondingly better in parallel.
However access and arithmetic speeds seem to be approaching a
definite limit ... Does this mean that digital speeds are reaching a limit,
that digital computation of multi-variate partial differential systems must
accept it, and that predicting tomorrow’s weather must require
more than one day? Not if we have truly-parallel machines, with say,
a thousand instruction registers.”
Hitting the Wall
‘70s: “Supercomputers” for Scientific Computing
Early ‘90s: Databases
Inflection point in 2004: Intel hits the Power Density Wall
“obtaining more computing power by stamping multiple processors
on a single chip rather than straining to increase the speed of a
single processor.”
John Markoff, New York Times, May 17, 2004
*Cooking aware computing
How do you make your program run faster?
Answer before 2004:
Just wait 6 months and buy a new machine!
Answer after 2004:
Write parallel software
Parallel Computing - Challenges
Gorn also recognized the challenges that would be posed to
the system designer by the new parallel computers:
“But visualize what it would be like to program for such a machine! If a
thousand subroutines were in progress simultaneously, one must program
the recognition that they have all finished, if one is to use their results
together. The programmer must not only synchronize his subroutines, but
schedule all his machine units, unless he is willing to have most of them
sitting idle most of the time. Not only would programming techniques
be vastly different from the serial ones in the serial languages we now
use, but they would be humanly impossible without machine intervention.”
Serial Computing
[Figure: a problem is turned into a single stream of instructions i1, i2, ..., in that feeds one processing unit]
Traditionally, a problem is divided into a discrete
series of instructions
Instructions are executed one after another
Only one instruction is executed at any moment in time
Example: Evaluate a = (b+1) * (b-c)
Program (single control flow):
    i1: + b 1 t1     Add 1 to b, store result in t1
    i2: - b c t2     Subtract c from b, store result in t2
    i3: * t1 t2 a    Multiply t1 and t2, store result in a

Memory locations: b (4), c (2), t1 (5), t2 (2), a (10)

Sequential Computation Model
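The same three instructions written as a minimal C fragment (an illustration only, not the slide's own code; the slide's "+ b 1 t1" notation is a pseudo-assembly form of the same steps):

    int b = 4, c = 2;      /* initial memory contents             */
    int t1 = b + 1;        /* i1: + b 1 t1   ->  t1 = 5           */
    int t2 = b - c;        /* i2: - b c t2   ->  t2 = 2           */
    int a  = t1 * t2;      /* i3: * t1 t2 a  ->  a  = 10          */

Each statement must finish before the next one starts: a single control flow.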
Processing Unit: von Neumann Model
[Figure: von Neumann model: a Processor, consisting of a Control Unit (instruction counter) and an Arithmetic Logic Unit (registers), connected to a Memory that holds both instructions and data, with input and output]
von Neumann Computation Model
Processor:
Performs instructions
Memory:
Stores both the instructions and data in cells with unique addresses
Control Scheme:
Fetches instructions one after another from memory
Shuttles data between memory and processor
Memory wall
disparity between processor speed (< 1 nanosecond or 10^-9 s) and memory speed (100-1,000 nanoseconds)
Ways to improve performance
1. Work hard: higher clock frequency
2. Work smart: pipelining, superscalar processor
3. Get help: replication – multicore, cluster, ...
Parallel Computing
Simultaneous use of multiple processing units to solve a
problem fast / solve a larger problem
Processing units could be:
i. A single processor with multiple cores
ii. A single computer with multiple processors
iii. A number of computers connected by a network
iv. Combinations of the above
Ideally, a problem (application) is partitioned into a sufficient number of independent parts for execution on parallel processing elements
Parallel Computing
[Figure: a problem is divided into tasks task0 ... taskm; each task is a stream of instructions i1, i2, ..., in, and the tasks execute on processing units unit0 ... unitp-1]
1. A problem is divided into m discrete parts (tasks) that can be solved concurrently
2. Each part is further broken down to a series of instructions (i)
3. Instructions from each part execute in parallel on different processing units (p)
Example: Evaluate a = (b+1) * (b-c)
fork creates a parallel executing "thread"
join blocks the executing threads until n threads reach the same point

    fork i2
    i1: + b 1 t1          i2: - b c t2
    join 2                (wait until join)
    i3: * t1 t2 a

multiple control flows

Parallel Computing (Shared-Memory)
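A minimal sketch of this fork/join flow using POSIX threads (an illustration of the shared-memory model, not code from the lecture; the function and variable names are my own):

    #include <pthread.h>
    #include <stdio.h>

    int b = 4, c = 2;   /* inputs, as in the sequential example */
    int t1, t2, a;      /* shared temporaries and result        */

    /* i2: - b c t2, executed by the forked thread */
    void *compute_t2(void *arg) {
        t2 = b - c;
        return NULL;
    }

    int main(void) {
        pthread_t tid;
        pthread_create(&tid, NULL, compute_t2, NULL);  /* fork i2                */
        t1 = b + 1;                                    /* i1 runs in main thread */
        pthread_join(tid, NULL);                       /* join: wait for i2      */
        a = t1 * t2;                                   /* i3 uses both results   */
        printf("a = %d\n", a);                         /* prints a = 10          */
        return 0;
    }

Compile with gcc -pthread. The join guarantees that t2 is ready before i3 uses it, mirroring the "wait until join" in the diagram.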
Why Parallel Computing
Primary reasons:
1. Overcome limits of serial computing
2. Save (wall-clock) time
3. Solve larger problems
Other reasons:
1. Take advantage of non-local resources
2. Cost saving – use multiple “cheaper” (commoditized) computing
resources
3. Overcome memory constraints
Performance - exploits a large collection of processing units that
can communicate and cooperate to solve large problems fast
BASICS OF
PARALLEL COMPUTING
Computational Model Attributes
[Operation Mechanism]
• The primitive units of computation or basic actions of the computer (data types and
operations defined by the instruction set)
[Data Mechanism]
• The definition of address spaces available to the computation (how data are
accessed and stored)
[Control Mechanism]
• The schedulable units of computation (rules for partitioning and scheduling the
problem for computation using the primitive units)
[Communication Mechanism]
• The modes and patterns of communication among processing elements working
in parallel, so that they can exchange needed information
[Synchronization Mechanism]
• Mechanism to ensure needed information arrives at the right time
Basics of Parallel Computing
Application / Problem
    ↓ decompose
Tasks
    ↓ schedule & map
Physical Cores & Processors
Decomposition
One problem can have many possible decompositions
Complicated and laborious process
Potential parallelism of an application dictates how it should be split
into tasks
Size of tasks is called granularity – can choose different task sizes
Defining the tasks is challenging and difficult to automate (why?)
Tasks
Tasks are coded in a parallel programming language
Scheduling: Assignment of tasks to processes or threads
Dictates the task execution order
Manually defined? Static? Dynamic?
Mapping: Assignment of processes/threads to physical
cores/processors for execution
Tasks may depend on each other, resulting in data or control dependencies
Dependencies impose an execution order on the parallel tasks
Dependences and Coordination
Dependences among tasks impose constraints on scheduling
To execute correctly, processes and threads need
synchronization and coordination
depends on information exchange between processes and threads,
which in turn depends on the hardware memory organization
Memory organizations: shared-memory (threads) and
distributed-memory (processes)
An illusion?
Concurrency vs. Parallelism
Concurrency:
  Two or more tasks can start, run, and complete in overlapping time periods
  They might not be running (executing on the CPU) at the same instant
  Two or more execution flows make progress at the same time, by interleaving their executions or by executing instructions (on the CPU) at exactly the same time

Parallelism:
  Two or more tasks can run (execute) simultaneously, at the exact same time
  Tasks do not only make progress, they also actually execute simultaneously
Example – Serial Solution
Find the sum of n numbers:
sum = 0;
for (i = 0; i < n; i++) {
x = Compute_next_value(. . .);
sum += x;
}
Serial Code
Example – Parallel Solution (1/3)
Suppose we have p cores ( p < n )
Each core can perform a partial sum of n/p values
my_sum = 0;
my_start = ……;
my_end = ……;
for (my_i = my_start; my_i < my_end; my_i++){
my_x = Compute_next_value(. . .);
my_sum += my_x;
}
Each core uses its own private variables and executes this
block of code independently
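One common way to fill in my_start and my_end is a block distribution. A minimal sketch, assuming each core also knows its own rank my_rank (0 to p-1) and that p divides n evenly (both the my_rank variable and the divisibility assumption are mine, not the slide's):

    int chunk    = n / p;             /* values handled per core                  */
    int my_start = my_rank * chunk;   /* first index this core is responsible for */
    int my_end   = my_start + chunk;  /* one past the last index for this core    */

If p does not divide n, the leftover n % p values would have to be spread over some of the cores.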
Example – Parallel Solution (2/3)
Upon completion, each core contains:
the partial sum in my_sum
E.g., if n = 24 and p = 8
Values generated:
1 4 3 9 2 8 5 1 1 5 3 7 2 5 0 4 1 8 6 5 1 2 3 9
Partial sum calculated in each core:
Core 0 1 2 3 4 5 6 7
my_sum 8 19 7 15 7 13 12 14
Example – Parallel Solution (3/3)
Next, a core designated as the master core adds up all values
of my_sum to form the global sum (8 + 19 + 7 + 15 + 7 + 13 +
12 + 14 = 95)
if (I'm the master core) {
    sum = my_sum;
    for each core other than myself {
        receive value from other core;
        sum += value;
    }
} else {  // not master core
    send my_sum to the master;
}
Parallel Code
Core      0        1    2    3    4    5    6    7
my_sum    8 → 95   19   7    15   7    13   12   14
(after the gather, the master core holds the global sum 95)
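One concrete way to realise the "send"/"receive" in the pseudocode is message passing, e.g. with MPI; the following is a hedged sketch (the lecture's pseudocode is not tied to any particular library, and the placeholder partial sum stands in for the loop on the previous slide):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, p;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        int my_sum = rank + 1;   /* placeholder: in practice, the partial sum computed earlier */

        if (rank == 0) {                              /* master core */
            int sum = my_sum;
            for (int src = 1; src < p; src++) {
                int value;
                MPI_Recv(&value, 1, MPI_INT, src, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                sum += value;                         /* add each partial sum as it arrives */
            }
            printf("global sum = %d\n", sum);
        } else {                                      /* every other core */
            MPI_Send(&my_sum, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }

Note that the master receives and adds the p-1 partial sums one after another, which is exactly why this version needs p-1 steps.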
Better Parallel Algorithm
Share the work of the global summing too:
Cores     0    1    2    3    4    5    6    7
my_sum    8    19   7    15   7    13   12   14
step 1    8+19=27    7+15=22    7+13=20    12+14=26
step 2    27+22=49              20+26=46
step 3    49+46=95
(time flows downward)
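A minimal sequential sketch of this pairwise (tree) reduction, assuming the partial sums already sit in an array my_sum[0..p-1] and that p is a power of two (the array name and the power-of-two restriction are my assumptions):

    for (int stride = 1; stride < p; stride *= 2) {
        for (int i = 0; i + stride < p; i += 2 * stride) {
            my_sum[i] += my_sum[i + stride];   /* stride 1: 8+19, 7+15, 7+13, 12+14 */
        }
        /* in the parallel version, the additions within one stride run on
           different cores, with synchronization between strides */
    }
    /* the global sum (95 in the example) ends up in my_sum[0] */

Each pass of the outer loop corresponds to one row of the diagram above.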
Comparison: Two Parallel Solutions
8 cores:     1st version = 7 steps,   2nd version = 3 steps   (factor of ~2 improvement)
1000 cores:  1st version = 999 steps, 2nd version = 10 steps  (factor of ~100 improvement)
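The step counts follow from the structure of the two algorithms: with p cores, the master core in the first version adds the p-1 other partial sums one after another, while the tree version halves the number of active partial sums at every step (the T notation below is mine, not the slide's):

    T_{\text{master}}(p) = p - 1, \qquad T_{\text{tree}}(p) = \lceil \log_2 p \rceil

    p = 8:\; 7 \text{ vs } 3 \text{ steps}, \qquad p = 1000:\; 999 \text{ vs } \lceil \log_2 1000 \rceil = 10 \text{ steps}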
Parallel Performance
Two perspectives:
Execution time versus Throughput
Parallel execution time = computation time + parallelization
overheads
Parallelization overheads:
distribution of work (tasks) to processors
information exchange or synchronization
idle time
etc
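The same statement in symbols (the notation is mine; the slide states it only in words):

    T_{\text{parallel}} = T_{\text{computation}} + T_{\text{overhead}}, \qquad
    T_{\text{overhead}} = T_{\text{work distribution}} + T_{\text{communication/synchronization}} + T_{\text{idle}} + \dots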
Parallel Computing is HARD
“…when we start talking about parallelism and ease of use of truly parallel
computers, we're talking about a problem that's as hard as any that computer
science has faced. …
I would be panicked if I were in industry.”
John Hennessy, President, Stanford University, 2007
Compiler Challenges
Heterogeneous processors
Increase in the design space for code optimization
Auto-tuners:
Optimizing code at runtime
Software-controlled memory management
Parallel Programming Challenges
Finding enough parallelism (Amdahl's Law! see the formula after this list)
Granularity
Locality
Load balance
Coordination and synchronization
Debugging
Performance modeling / Monitoring
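For the first item above, Amdahl's Law puts a number on "finding enough parallelism": if a fraction f of the work can be parallelized and the rest is serial, the speedup on p processors is at most

    S(p) = \frac{1}{(1 - f) + f/p} \;\le\; \frac{1}{1 - f}

so, for example, a program that is 90% parallelizable (f = 0.9) can never run more than 10 times faster, no matter how many cores are available.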
LANDSCAPE OF
PARALLEL COMPUTING
Uses of Parallel Computing
Data mining, web search engines, web-based business
services, pharmaceutical design, financial and economic
modeling, AI, etc.
http://computing.llnl.gov/tutorials/parallel_comp
Types of Computer Systems
Computer Systems
  Single System: PC/Workstation, SMP/NUMA, Vector, Mainframe
  Distributed Systems (multiple systems): Clusters, Client-Server, Grids, Peer-to-Peer, Cloud

Control and Management: Centralised or Decentralised
Parallel Computing and Parallel Machines
In your pocket:
  ARM big.LITTLE 64-bit (Cortex-A53 + Cortex-A73), 4-8 cores, >2 GHz
Intel Skylake architecture:
  Quad-core CPU + multi-core GPU integrated on one chip
Fastest Supercomputer
2022-24 Fastest Supercomputer
Supercomputer Frontier
1,206 petaflops
8,699,904 cores
AMD Optimized 3rd Gen EPYC 64C 2GHz
22 Mwatts
No. 7* in GREEN500
Video
2020-21 Fastest Supercomputer
Supercomputer
Fugaku
415 petaflops
~7,300,000 cores (ARM A64FX)
29 Mwatts
No. 33 in GREEN500
Video
Virtual tour
Usage of Frontier
Energy
Fusion energy studies
Combustion energy improvements
Innovative clean energy systems
Materials study
Social and scientific issues
Innovative drug discovery
Personalized and preventive medicine
Disaster prevention and global climate problems
Meteorological and global environmental predictions
Evolution of the universe
Deconstructing Frontier
Node:
1 HPC- and AI-optimized AMD EPYC CPU
4 purpose-built AMD Radeon Instinct GPUs
74 cabinets:
Over 9000 nodes
Interconnect:
Multiple Slingshot NICs
Slingshot dragonfly network
Deconstructing Fugaku
Processor (CPU):
• Fujitsu’s 48-core A64FX SoC (optional assistant cores)
• 158,976 nodes (CPUs):
396 racks of 384 nodes
36 racks of 192 nodes
• Fujitsu Tofu (Torus Fusion) interconnect
TOP500 Performance over Time
name   prefix   multiplier
exa    E        10^18
peta   P        10^15
tera   T        10^12
giga   G        10^9
mega   M        10^6
kilo   K        10^3
source: top500.org
Summary
Serial versus parallel computing
Parallel computing - what, why and how
Performance and challenges
Landscape of parallel computing
Evolution of Prominent Computers
year location name first
1834 Cambridge Difference engine Programmable computer
1943 Bletchley Colossus Electronic computer
1948 Manchester SSEM (Baby) Stored-program computer
1951 MIT Whirlwind 1 Real-time I/O computer
1953 Manchester Transistor computer Transistorized computer
1971 California Intel 4004 Mass-market CPU & IC
1979 Cambridge Sinclair ZX-79 Mass-market home computer
1981 New York IBM PC Personal computer
1987 Cambridge Acorn A400 series High-street RISC PC sales
1990 New York IBM RS6000 Superscalar RISC processor
1998 California Sun picoJava Computer based on a language
Progression of Computation Speed
year Floating point operations per second (FLOPS)
1941 1
1945 100
1949 1,000 (1 KiloFLOPS, kFLOPS)
1951 10,000
1961 100,000
1964 1,000,000 (1 MegaFLOPS, MFLOPS)
1968 10,000,000
1975 100,000,000
1987 1,000,000,000 (1 GigaFLOPS, GFLOPS)
1992 10,000,000,000
1993 100,000,000,000
1997 1,000,000,000,000 (1 TeraFLOPS, TFLOPS)
2000 10,000,000,000,000
2007 478,000,000,000,000 (478 TFLOPS)
2009 1,100,000,000,000,000 (1.1 PetaFLOPS)
Computing Units
International System of Units (SI):
  name    prefix   multiplier
  exa     E        10^18
  peta    P        10^15
  tera    T        10^12
  giga    G        10^9
  mega    M        10^6
  kilo    K        10^3
  milli   m        10^-3
  micro   µ        10^-6
  nano    n        10^-9
  pico    p        10^-12

International Electrotechnical Commission (IEC):
  name    prefix   multiplier
  exbi    Ei       2^60
  pebi    Pi       2^50
  tebi    Ti       2^40
  gibi    Gi       2^30
  mebi    Mi       2^20
  kibi    Ki       2^10

Example of disk capacity:
  20 Mibytes = 20 MiB = 20 mebibytes = 20 × 2^20 bytes = 20,971,520 bytes
Reference & Readings
Fugaku technical details:
https://www.r-ccs.riken.jp/en/fugaku/
Reading
Robert Robey, Yuliana Zamora, "Parallel and High-Performance Computing", Manning Publications – Chapter 1.
A View of the Parallel Computing Landscape, CACM Vol. 52 No. 10,
pp. 56-67, October 2009. [Technical Report: The Landscape of
Parallel Computing Research: A View from Berkeley, Dec 2006]