COMPUTER APPLICATION FOR BUSINESS
UNIVERSITY OF HUMAN DEVELOPMENT
COMPUTER SCIENCE DEPARTMENT
COMPUTER ARCHITECTURE
MEASURING PERFORMANCE
Azhi A. Faraj
Performance
2
When we say one computer is faster than another
is, what do we mean?
The computer user is interested in reducing response
time
Performance: Latency vs. Throughput
3
Latency(execution time): time to finish a fixed task
Throughput (bandwidth): number of tasks per unit time
Example: move people 10 miles
Car: capacity = 5, speed = 60 miles/hour
Bus: capacity = 60, speed = 20 miles/hour
Latency:
car = 10 min,
bus = 30 min
Throughput:
car= 30 PPH ,
bus = 120 PPH
3
Measuring Performance
4
Execution time:
Time between start and completion of a task (including
disk accesses, memory accesses )
Throughput:
Total amount of work done in a given time
4
Performance of a Computer
5
Two Computer X and Y;
Performance of (X) > Performance of (Y)
Execution Time (Y) > Execution Time (X)
5
Performance of difference 2 Computer
6
X is n Times faster than Y
6
CPU Time
7
Time CPU spends on a task
User CPU time
CPU time spent in the program
System CPU time
CPU time spent in OS performing tasks on behalf of the
program
7
CPU Time (Example)
8
User CPU time = 90.7s
System CPU time 12.9s
Execution time 2m 39 s 159s
% of CPU time =
User CPU Time + System CPU Time
X 100 %
Execution time
8
CPU Time
9
% CPU time = (90.7 + 12.9 ) x 100
159
= 65 %
9
Amdahls law
10
Performance improvement that can be gained from
some faster mode of execution is limited by fraction
of the time the faster mode can be used
10
Amdahls law
11
Speedup depends on
Fraction of computation time in original machine that
can be converted to take advantage of the
enhancement
(Fraction Enhanced)
Improvement gains by enhanced execution mode
(Speedup Enhanced)
11
Example
12
Total execution time of a Program = 50 s
Execution time that can be enhanced = 30 s
FractionEnhanced = 30 /50 = 0.6
12
Speedup
13
13
Example
14
Normal mode execution time for some portion of a
program = 6s
Enhances mode execution time for the same program
= 2s
Speedup Enhanced = 6/2
=3
14
Execution Time
15
15
Example
16
Suppose we consider an enhancement to the processor of a
server system used for Web serving.
New CPU is 10 times faster on computation in Web application
than original CPU. Assume original CPU is busy with
computation 40% of the time and is waiting for I/O 60% of
time.
What is the overall speedup gained from
enhancement?
16
Answer
17
17
Remark
18
Ifan enhancement is only usable for fraction of a
task, we cannot speedup by more than
18
Example
19
A common transformation required in graphics
engines is square root implementation of floating-
point (FP). square root vary significantly in
performance, especially among processors designed
graphics
Suppose FP square root (FPSQR) is responsible for
20% of execution time of a critical graphics program
Design alternative
1. Enhance FPSQR hardware and speed up this operation by
a factor of 10
2. Make all FP instruction run faster by a factor of 1.6
19
Example
20
FP instruction are responsible for a total of 50% of
execution time. Design team believes they can make
all fp instruction run 1.6 times faster with same
effort as required for fast square root.
Compare these two design alternatives
20
21
21
Computer Clock Times
22
Computers run according to a clock that runs at a
steady rate
The time interval is called a clock cycle (eg, 10ns).
The clock rate is the reciprocal of clock cycle - a
frequency, how many cycles per sec (eg, 100MHz).
10 ns = 1/100,000,000 (clock cycle), same as:-
1/10ns = 100,000,000 = 100MHz (clock rate).
Cycles Per Instruction
23
The execution time of a program clearly must depend on the
number of instructions
but different instructions take different times
An expression that includes this is:-
CPU clock cycles = N * CPI
N = number of instructions
CPI = average clock cycles per instruction
CPU performance equation
24
CPU time = CPU clock cycles for a program x Clock cycle time
= CPU clock cycles / Clock rate
24
Example
25
A program runs in 10s (CPU time) on computer A
having 400 MHz clock rate.
A new machine B, which could run the same
program in 6s, has to be designed.
Further, B should have 1.2 times as many clock
cycles as A.
What should be the clock rate of B?
25
Answer
26
26
CPU Clock Cycles
27
CPI (clock cycles per instruction)
average no. of clock cycles each instruction takes to
execute
IC (instruction count)
no. of instructions executed in the program
CPU clock cycles = CPI x IC
Note: CPI can be used to compare two different
implementations of the same instruction set architecture
(as IC required for a program is same)
27
Example
28
Consider two implementations of same instruction set
architecture. For a certain program, details of time
measurements of two machines are given below
Which machine is faster for this program and by how
much?
28
Answer
29
29
Measuring components of CPU performance
equation
30
CPU Time: by running the program
Clock Cycle Time: published in documentation
IC: by a software tools/simulator of the architecture
((more difficult to obtain)
CPI: by simulation of an implementation (more
difficult to obtain)
30
CPU clock cycles
31
Suppose n different types of instruction
Let
ICi No. of times instruction i is executed in a program
CPIi Avg. no. of clock cycles for instruction i
31
Example
32
Suppose we have made the following measurements:
Frequency of all FP operation = 25%
Average CPI of FP operations excluding FPSQR = 4.0
Average CPI of all other (non-FP) instructions = 1.33
Frequency of FPSQR = 3%
CPI of FPSQR = 20
Design alternatives:
1. decrease CPI of FPSQR to 2
2. decrease average CPI of all FP operation to 2.5
Compare these two design alternatives using CPU
performance equation
32
Answers
33
CPI original = 1.33 * (0.75) + 4 * (0.25 - 0.03) + 20 * (0.03)= 2.4775
CPI new FPSQR = 1.33 * (0.75) + 4 * (0.22) + 2 * (0.03) = 1.9375
CPI new FP = 1.33 * (1- 0.25) + 2 * (0.22) + 2 * (0.03) = 1.4975
33