Parallel Computing
Parallel performance
Thorsten Grahs, 01.06.2015
Overview
Performance
Speedup & efficiency
Amdahl's Law
Gustafson's Law
Efficiency and scalability metrics
Timing models
Examples: Inner product
Parallel Performance
Primary purpose of parallelization
The primary purpose of parallelizing a system or a program is performance.
So what is performance?
Usually it is about one of the following
Reducing the total time it takes to compute a single result
Increasing the rate at which a series of results can be
computed
Reducing the power consumption of a computation
Speed up by parallelization
Question
Could one build a TeraFlop computer by using 1000
GigaFlop machines?
Yes, . . . but
Technical restrictions
Influenced by CPU speed, memory, network
Not all program parts can be parallelized
Speed-Up?
Efficiency?
Parallel run time
Parallel execution time T_p(n)
Time between the start of the parallel program and the end of all involved parallel processes.
p: # processors
n: model size
The run time of a parallel program (distributed system) consists of
Local computation
Computation of one processor on its local data
Data exchange
Necessary communication for the data exchange
Waiting time
e.g. due to unbalanced loads on the processors
Synchronisation
Coordination between the involved processes
Cost of a parallel program
Costs C_p(n)
The costs (work, processor-time product) of a parallel program are defined as
C_p(n) = p · T_p(n)
The costs C_p(n)
take into account the time that all involved processors need to fulfil their tasks,
are a measure of the work performed by all processors,
are optimal when the parallel program performs as many operations as the fastest sequential algorithm with run time T(n), i.e.
C_p(n) = T(n).
Speed-up
Speed-up S_p(n)
The speed-up of a parallel program with run time T_p(n) is defined as
S_p(n) = T(n) / T_p(n)
T(n): run time of the optimal serial implementation.
A measure for
the comparison between serial and parallel implementation,
the relative gain in speed.
Ideal case (linear speed-up): T = p · T_p, i.e. S_p = p
Efficiency
Efficiency E_p(n)
The efficiency of a parallel program is defined as
E_p(n) = S_p(n) / p = T(n) / (p · T_p(n)) = T(n) / C_p(n)
A measure for the proportion of the run time that a processor spends on computation which is also present in the serial program.
Low efficiency = large parallel overhead
Linear speed-up S_p = p corresponds to E_p = 1
E_100(n) = 0.4 (i.e. S_100(n) = 40 on 100 processors) means that each of the 100 processors spends 60% of its time on parallel overhead such as communication
Amdahl's objection
Gene Amdahl, 1967
"... the effort expended on achieving high parallel processing rates is wasted unless it is accompanied by achievements in sequential processing rates of nearly the same magnitude."
What does this mean?
Inclusion of the sequential proportions
In general, a parallel algorithm always has inherent sequential components. This has implications for the achievable speed-up.
We assume
f: relative proportion of the serial tasks
1 - f: relative proportion of ideally parallelizable tasks
Amdahl's Law
Gene Amdahl (born 1922)
Computer pioneer & businessman
Famous for his work on large-capacity computers at IBM
Founder of the Amdahl Corp.
Assume
T(n) = T_ser + T_par          sequential execution
T_p(n) = T_ser + T_par / p    parallel execution
Amdahl's Law
S_p(n) = (T_ser + T_par) / (T_ser + T_par / p)
Amdahl's Law | Example
Assume
a program needs 10 hours on a single processor,
one hour is sequential, the remaining 9 hours can be parallelized.
Consequences
The program cannot run in less than one hour.
The speed-up is at most 10.
Speed-up is limited by the sequential part.
Note: the problem size is fixed!
Amdahl's Law: relative formulation
If an implementation has a
relative proportion f (0 ≤ f ≤ 1) of inherently serial tasks,
relative proportion 1 - f of ideally parallelizable tasks,
the overall run time of the program consists of the
run time for the serial part: f · T(n)
run time for the parallel part: (1 - f)/p · T(n)
Amdahl's Law (relative)
S_p(n) = T(n) / (f · T(n) + (1 - f)/p · T(n)) = 1 / (f + (1 - f)/p) ≤ 1/f
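As a small numerical illustration (a minimal sketch, not part of the original slides), the relative formulation can be evaluated directly; the serial fraction f = 0.1 corresponds to the 10-hour example above:

#include <stdio.h>

/* Amdahl's Law in the relative formulation: S_p = 1 / (f + (1 - f)/p) */
static double amdahl_speedup(double f, int p)
{
    return 1.0 / (f + (1.0 - f) / (double)p);
}

int main(void)
{
    const double f = 0.1;                   /* serial fraction: 1 of 10 hours */
    const int procs[] = {1, 10, 100, 1000};

    for (int i = 0; i < 4; i++)
        printf("p = %4d  S_p = %6.2f\n", procs[i], amdahl_speedup(f, procs[i]));
    /* S_p approaches the limit 1/f = 10 for large p */
    return 0;
}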
Amdahl's Law: Example (figure)
Amdahl's Law: Speed-up (figure)
Amdahl's Law: Consequence
The serial part is excluded from massively parallel speed-up.
For a sequential proportion of 1%
the maximal achievable speed-up is 100,
regardless of how many processors are used.
For a sequential proportion of 20%
the maximal achievable speed-up is 5,
regardless of how many processors are used.
Conclusion
Massively parallel systems were for a long time considered inefficient and only interesting from a theoretical point of view.
Further sources of limitations
Further delays due to
Load balancing
i.e. unequal work packages (loads)
Communication
e.g. waiting for data/processes
Parallelization-induced communication
Consider the communication time T_pc(n) due to the necessary data exchange between the participating processors:
T_p(n) = T(n) · [f + (1 - f)/p] + T_pc(n)
T_pc(n) is monotonically increasing with # p.
Communication Master-Worker
Assumptions
Load balancing is excellent.
Communication takes place between master and workers.
T_pc(n) is a linear function of p.
A processor does not send messages to itself.
T_2c(n): minimal communication time (master & one worker), i.e.
T_pc(n) = T_2c(n) · (p - 1)
With
r = T_c(2) / T(1)    (min. comm. time / serial comp. time)
one obtains
S(p) = 1 / (f - r + (1 - f)/p + r·p)
Comm. Master-Worker: Maximal speed-up
New situation
For a large number of processors p the speed-up can decline.
Maximal speed-up
S(p*) = 1 / (f - r + 2·sqrt((1 - f)·r))    with    p* = sqrt((1 - f)/r)
Pay-off area
Only for p ≪ p* does S(p) ≈ p hold. This means that parallel performance can be assured only for processor numbers much smaller than p*.
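A minimal numerical sketch of this model (the values of f and r below are illustrative assumptions, not from the slides); it evaluates S(p) including the communication term and the optimum p*:

#include <stdio.h>
#include <math.h>   /* sqrt; compile e.g. with: cc file.c -lm */

/* Master-worker speed-up with a communication term linear in p */
static double mw_speedup(double f, double r, int p)
{
    return 1.0 / (f - r + (1.0 - f) / (double)p + r * (double)p);
}

int main(void)
{
    const double f = 0.05;   /* assumed serial fraction                            */
    const double r = 0.001;  /* assumed ratio: min. comm. time / serial comp. time */

    for (int p = 1; p <= 2048; p *= 2)
        printf("p = %4d  S = %7.2f\n", p, mw_speedup(f, r, p));
    printf("p* = %.1f  S(p*) = %.2f\n",
           sqrt((1.0 - f) / r),
           1.0 / (f - r + 2.0 * sqrt((1.0 - f) * r)));
    return 0;
}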
Scaling
An example
One decorator wallpapers a room in 60 minutes.
Two decorators wallpaper a room in 30 minutes.
How long does it take when 60 decorators work in the room?
Remedy
Use the 60 decorators for a hotel with 60 rooms ...
Problem scaling
With increasing # processors the problem size should be
increased, in order to guarantee parallel
performance/efficiency.
Scalability
Saturation of speed-up
According to Amdahl's Law, for a fixed problem size n and an increasing number of processors p there is a saturation of the speed-up.
60 decorators in one room ...
In scientific computing one is quite often interested in solving a bigger problem in the same time.
Growing problem size n combined with an increasing # processors p.
Instead of decorating one room, 60 decorators can renovate the whole hotel (60 rooms).
This behaviour (speed-up for increasing n) is not captured by Amdahl's Law!
Scaled speed-up
Assumption
The sequential proportion of a parallel program decreases with increasing model size, i.e. it is not a constant fraction of the total computation as assumed in Amdahl's Law.
For each number of processors p the maximum speed-up S_p(n) ≤ p can be achieved,
namely by a correspondingly large model size.
The behaviour of the run time T for larger problem sizes and a correspondingly increased number of processors is described by Gustafson's Law.
Gustafson's Law
John Gustafson (born 1955)
"... speed-up should be measured by scaling the problem to the number of processors, not by fixing the problem size."
J. L. Gustafson: Reevaluating Amdahl's Law, Comm. of the ACM, 31:532-533, 1988
This implies:
inclusion of the model size n,
sequential proportion f constant.
Serial run time
T_s(n) = f + n·(1 - f)
Parallel run time
T_p(n) = f + n·(1 - f)/p
Gustafson's Law: scaled speed-up
Gustafson's Law (scaled speed-up)
If the problem size increases while the serial portion grows slowly or remains fixed, speed-up grows as workers are added:
S_p(n) = T_s(n) / T_p(n)
       = (f + n·(1 - f)) / (f + n·(1 - f)/p)
       = p·(f + n·(1 - f)) / (f·p + n·(1 - f))
lim_{n→∞} S_p(n) = p
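A short sketch (with an assumed serial portion f, not from the slides) that evaluates the scaled speed-up for n = p, i.e. when the problem grows with the number of processors:

#include <stdio.h>

/* Gustafson's scaled speed-up: S_p(n) = (f + n(1-f)) / (f + n(1-f)/p) */
static double gustafson_speedup(double f, double n, int p)
{
    return (f + n * (1.0 - f)) / (f + n * (1.0 - f) / (double)p);
}

int main(void)
{
    const double f = 0.1;    /* assumed serial portion */

    for (int p = 1; p <= 1024; p *= 2)
        printf("p = %4d  n = p  S_p = %7.2f\n", p, gustafson_speedup(f, (double)p, p));
    /* with n = p the speed-up keeps growing roughly like (1-f)·p;
       there is no saturation as in Amdahl's Law with fixed n */
    return 0;
}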
Gustafson's Law: constant run time
Constant run time
To keep the run time constant while the problem size increases, one has to choose n = p. This means: for double the model size (e.g. # grid points) one has to double the # processors.
Karp-Flatt metric
Both Amdahl's and Gustafson's Laws ignore the parallel overhead of the parallelization.
This may lead to an overestimated speedup.
Karp & Flatt therefore introduce an
experimentally determined serial fraction
Karp-Flatt metric
f = (1/S - 1/p) / (1 - 1/p)
S: measured speedup
p: # processors
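The metric is straightforward to compute from measurements; a minimal sketch in C (the sample measurement S = 4.7 on p = 8 is taken from the examples on the following slides):

#include <stdio.h>

/* Experimentally determined serial fraction (Karp-Flatt metric):
   f = (1/S - 1/p) / (1 - 1/p) for measured speedup S on p processors. */
static double karp_flatt(double S, int p)
{
    return (1.0 / S - 1.0 / (double)p) / (1.0 - 1.0 / (double)p);
}

int main(void)
{
    printf("S = 4.7 on p = 8 processors: f = %.3f\n", karp_flatt(4.7, 8));
    return 0;
}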
Karp-Flatt metric
Advantage
Takes into account parallel overhead
Detects other sources of overhead or inefficiency
ignored in the speedup model:
Process start-up time
Process synchronization time
Imbalanced workload
Architectural overhead
Karp-Flatt metric Example 1
Measured speedup
p   2    3    4    5    6    7    8
S   1.8  2.5  3.1  3.6  4.0  4.4  4.7
What is the primary reason for a speedup of only 4.7 on 8 CPUs?
Karp-Flatt metric
f   0.1  0.1  0.1  0.1  0.1  0.1  0.1
Since f is constant (i.e. not increasing with growing # p),
the limiting constraint is limited opportunity for parallelism
(a large fraction that is inherently sequential).
Karp-Flatt metric Example 2
Measured speedup
p   2    3    4    5    6    7    8
S   1.9  2.6  3.2  3.7  4.1  4.5  4.7
What is the primary reason for a speedup of only 4.7 on 8 processors?
Karp-Flatt metric
f   0.070  0.075  0.080  0.085  0.090  0.095  0.100
Since f is steadily increasing, parallel overhead is the primary reason (e.g. time spent in process start-up, communication, synchronization, or an architectural constraint).
Scalability again
Scalability
The scalability of a parallel system
(i.e. parallel program executing on a parallel computer)
is a measure of its ability to increase performance as the
number of processors increases.
Speedup (and hence efficiency) is typically an increasing
function of the problem size.
This is called the Amdahl effect.
In order to maintain efficiency when processors are added,
we can increase the problem size.
This idea is formulated by the
isoefficiency relation.
Efficiency relation
T_O: the total overhead, i.e. the total amount of time spent by all processes doing work not done by the sequential algorithm.
T_S: the sequential execution time. We have
p·T_P = T_S + T_O   ⇒   T_P = (T_S + T_O) / p
Speed-up
S = T_S / T_P = p·T_S / (T_S + T_O)
Efficiency
E = S / p = T_S / (T_S + T_O) = 1 / (1 + T_O/T_S)
Isoefficiency relation
E(n, p) ≤ 1 / (1 + T_O(n, p)/T_S(n))
⇒  T_O(n, p)/T_S(n) ≤ (1 - E(n, p)) / E(n, p)
⇒  T_S(n) ≥ E(n, p) / (1 - E(n, p)) · T_O(n, p)
If we wish to maintain a constant level of efficiency,
(1 - E(n, p)) / E(n, p) has to be a constant.
Consequently, the isoefficiency relation simplifies to
T_S(n) ≥ C·T_O(n, p)
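A brief worked example (the functional forms are assumptions for illustration, not from the slides): suppose the sequential time is T_S(n) = c1·n and the total overhead of a reduction-style algorithm is T_O(n, p) = c2·p·log2 p. The isoefficiency relation T_S(n) ≥ C·T_O(n, p) then demands
c1·n ≥ C·c2·p·log2 p,
i.e. the problem size n must grow at least like p·log2 p to keep the efficiency constant.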
Scalability function
Suppose a parallel system has the isoefficiency relation
n ≥ f(p)
M(n) is the memory required to store a problem of size n.
From this we can estimate the required amount of memory, i.e.
M(n) ≥ M(f(p)).
Thus the function
M(f(p)) / p
shows how the memory usage per processor must increase to maintain the same efficiency.
This function is called the scalability function.
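Continuing the worked example from the isoefficiency slide (again an assumption for illustration): with f(p) = C'·p·log2 p and a memory requirement M(n) = c·n, the scalability function becomes
M(f(p)) / p = c·C'·log2 p,
i.e. the memory per processor has to grow logarithmically with p; the system is scalable, but not perfectly scalable.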
Meaning of scalability function
To maintain the same efficiency when increasing p, we must increase n.
The maximum problem size is limited by the maximum available amount of memory, which is linear in p.
The scalability function shows how the memory usage per processor must grow to maintain efficiency.
If the scalability function is constant, the parallel system is perfectly scalable.
Performance models
Analysing performance
Amdahl's & Gustafson's Laws make simple statements about the behaviour of a model with changing
# processors p
model/problem size n
There is a need to analyse the behaviour of parallel algorithms in more detail.
Performance models
Abstract machine models for the design and performance analysis of parallel algorithms.
They allow detailed statements about the run-time behaviour (time, memory, efficiency, ...).
Timing model
Timing model (Van de Velde)
Simple performance model for run time analysis
Assumptions
All tasks run independently from each other and start simultaneously.
All processors are identical.
All arithmetic operations take the same time t_a.
Messages are exchanged in data words of unit length (16 bit).
Communication and calculation do not overlap.
Communication tasks do not interfere with each other.
No global, only point-to-point communication.
Timing model: Modelling data exchange
Data exchange
Linear model for the message exchange (send & receive) of a message M of length l (i.e. consisting of l data words):
t_k(l) = α + β·l
α: start-up time (latency). Here: the time needed to send a message, i.e.
pack the message,
send the message,
& unpack it on the recipient's side.
β: reciprocal bandwidth of the bus/network
M: message of length l to be sent
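A small numerical sketch of this model (the values for t_a, α and β are illustrative assumptions only):

#include <stdio.h>

/* Linear communication model: t_k(l) = alpha + beta * l */
static double t_comm(double alpha, double beta, int l)
{
    return alpha + beta * (double)l;
}

int main(void)
{
    const double ta    = 1.0e-9;   /* assumed time per arithmetic operation [s] */
    const double alpha = 1.0e-6;   /* assumed start-up time (latency) [s]       */
    const double beta  = 1.0e-8;   /* assumed time per data word [s]            */

    for (int l = 1; l <= 4096; l *= 8)
        printf("l = %5d words: t_k = %.3e s (= %.0f arithmetic ops)\n",
               l, t_comm(alpha, beta, l), t_comm(alpha, beta, l) / ta);
    return 0;
}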
Timing model: Modelling data exchange
In general α, β ≫ t_a
t_a: arithmetic time,
i.e. the time for processing one standard arithmetic operation
Properties
The communication time depends linearly on the message length l.
The communication time depends on the system/hardware (network).
The slope of the line is the reciprocal bandwidth β.
Timing model: Example (figure: time response depending on the message length)
Example: Inner product
for(s=0.,i=0;i<N;i++) s += x[i] * y[i];
N: vector length (problem size)
p: # processors
n = N/p: uniform distribution of the vectors over the processors
Constraint
s should be known on all processors.
Serial work
T_1 = 2·N·t_a
To consider for parallelization
Distribution of data
Distribution of work
Algorithm
Example: Inner product Version I
Version I: local inner products

s=0.;
for(i=0;i<n;i++) s += x[i] * y[i];
for(i=0;i<p;i++) if(i != myself) send(s to proc i);
for(i=0;i<p;i++) if(i != myself) s += receive(from proc i);

2·n·t_a          accumulation of the local sum
(p - 1)·t_k(1)   send and receive of the (p - 1) subtotals
(p - 1)·t_a      computation of the total sum

T_p = 2·n·t_a + (p - 1)·t_k(1) + (p - 1)·t_a
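For comparison, a possible MPI realisation of Version I could look as follows (a sketch under the assumption that MPI is already initialised and the local vector parts x and y of length n are already distributed; the function name is made up for illustration):

#include <mpi.h>

/* Version I: every process computes its local inner product and exchanges
   the partial sum with every other process ((p-1) sends and receives). */
double inner_product_v1(const double *x, const double *y, int n, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    double s = 0.0;
    for (int i = 0; i < n; i++)          /* local inner product: 2·n·t_a */
        s += x[i] * y[i];

    /* send the own partial sum to all other processes ... */
    for (int i = 0; i < size; i++)
        if (i != rank)
            MPI_Send(&s, 1, MPI_DOUBLE, i, 0, comm);

    /* ... and accumulate the (p-1) partial sums of the others.
       Note: this relies on the tiny single-double messages being buffered
       ("eager" protocol); a robust code would use non-blocking calls or
       simply MPI_Allreduce. */
    for (int i = 0; i < size; i++) {
        if (i != rank) {
            double t;
            MPI_Recv(&t, 1, MPI_DOUBLE, i, 0, comm, MPI_STATUS_IGNORE);
            s += t;
        }
    }
    return s;
}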
Example: Inner product Version I
Speed-up
S(p) = T_1 / T_p
     = 2N·t_a / [(2n + p - 1)·t_a + (p - 1)·t_k(1)]
     = 2N / [(2n + p - 1) + (p - 1)·t_r]
with t_r = t_k(1) / t_a (smaller is better).
Speed-up Version I
S(p) = 2N / [(2n + p - 1) + (p - 1)·t_r]
Example: Inner product Version II
Version II: recursive doubling
Pairs of local sums are formed recursively.
2·n·t_a          formation of the local sums
t_a + 2·t_k(1)   receiving and adding two partial sums (per level)
log2 p           # inner nodes (propagation levels)
T_p = 2·n·t_a + log2 p · [t_a + 2·t_k(1)]
Example: Inner product Version II
Speed-up
S(p) = T_1 / T_p
     = 2N·t_a / [(2n + log2 p)·t_a + 2·log2 p·t_k(1)]
     = 2N / [2n + log2 p·(1 + 2·t_r)]
with t_r = t_k(1) / t_a (smaller is better).
Speed-up Version II
S(p) = 2N / [2n + log2 p·(1 + 2·t_r)]
Only logarithmic dependence on the number of processors!
Example: Inner product Version III
Version III: Butterfly

s=0.;
for(i=0; i<n; i++) s += x[i] * y[i];
for(i=0; i<(int)log2(p); i++) {
  k = 1<<i;       /* bit shift to the left, i.e. the sequence 1,2,4,... */
  j = myself^k;   /* exclusive OR */
  send(s to proc j);
  s += recv(from proc j);
}

log2(p) steps
In each step, different pairs of processors exchange their data s.
Each processor forms the subtotal.
Algorithm known from the Fast Fourier Transform (FFT).
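A possible MPI sketch of the butterfly exchange (assuming p is a power of two and the local parts of length n are already distributed; MPI_Sendrecv replaces the explicit send/receive pair of the pseudocode, and the function name is illustrative):

#include <mpi.h>

/* Version III: butterfly summation of the local inner products.
   After log2(p) exchange steps every process holds the global sum. */
double inner_product_butterfly(const double *x, const double *y, int n, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += x[i] * y[i];                  /* local inner product */

    for (int k = 1; k < size; k <<= 1) {   /* k = 1, 2, 4, ...: log2(p) steps */
        int partner = rank ^ k;            /* pairing via exclusive OR        */
        double t;
        MPI_Sendrecv(&s, 1, MPI_DOUBLE, partner, 0,
                     &t, 1, MPI_DOUBLE, partner, 0,
                     comm, MPI_STATUS_IGNORE);
        s += t;                            /* both partners add the subtotal  */
    }
    return s;
}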
Example: Inner product Version III
Version III: Butterfly
Pairs of local sums are formed recursively.
2·n·t_a        formation of the local sums
t_a + t_k(1)   summation of two subtotals (per step)
log2 p         # inner nodes (steps)
Notation: s(0:3) means the sum over s(0), ..., s(3), i.e. Σ_{i=0}^{3} s(i)
T_p = 2·n·t_a + log2 p·(t_a + t_k(1))
Example: Inner product Version III
Speed-up
S(p) = T_1 / T_p
     = 2N·t_a / [(2n + log2 p)·t_a + log2 p·t_k(1)]
     = 2N / [2n + log2 p·(1 + t_r)]
with t_r = t_k(1) / t_a (smaller is better).
Speed-up Version III
S(p) = 2N / [2n + log2 p·(1 + t_r)]
Only logarithmic dependence on the number of processors!
Half the communication time compared to recursive doubling.
Example: Inner product Comparison
Comparison of Versions I, II & III for N = 100,000 (figure)
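The comparison shown in the figure can be reproduced from the model formulas; a small sketch (the ratio t_r is an illustrative assumption, N = 100,000 as on the slide):

#include <stdio.h>
#include <math.h>   /* log2; compile e.g. with: cc file.c -lm */

int main(void)
{
    const double N  = 100000.0;  /* problem size as in the comparison       */
    const double tr = 50.0;      /* assumed ratio t_k(1)/t_a (illustrative) */

    printf("    p   Version I  Version II  Version III\n");
    for (int p = 1; p <= 1024; p *= 2) {
        double n  = N / p;                 /* local vector length */
        double l2 = log2((double)p);
        double s1 = 2.0 * N / (2.0 * n + p - 1 + (p - 1) * tr);
        double s2 = 2.0 * N / (2.0 * n + l2 * (1.0 + 2.0 * tr));
        double s3 = 2.0 * N / (2.0 * n + l2 * (1.0 + tr));
        printf("%5d  %10.2f  %10.2f  %11.2f\n", p, s1, s2, s3);
    }
    return 0;
}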
Example: Inner product Conclusions
A simple performance model is helpful:
it can be used to analyse different realisations of parallel algorithms,
it takes the network speed into account.
Local inner products (Version I)
The linear dependence of the run time on the processor number p prevents massive parallelism.
Recursive doubling (Version II)
The logarithmic dependence of the run time leads to better speed-up.
Butterfly (Version III)
In addition, lower communication overhead.
Further readings
Literature
G. M. Amdahl: Validity of the single-processor approach to achieving large scale computing capabilities. In Proc. AFIPS Spring Joint Computer Conference, AFIPS Press, 1967, pp. 483-485.
J. L. Gustafson: Reevaluating Amdahl's Law. Comm. of the ACM, 31:532-533, 1988.
E. F. Van de Velde: Concurrent Scientific Computing. Springer Texts in Applied Mathematics 16, 1994.
M. McCool, A. D. Robison, J. Reinders: Structured Parallel Programming. Morgan Kaufmann, 2012.