Dependability
◼Module reliability is a measure of time to failure from a reference
initial instant
◼MTTF (mean time to failure) is a reliability measure
◼The reciprocal of MTTF is the failure rate, usually reported as FIT (failures in
time): failures per billion hours of operation
◼Service interruption is measured as mean time to repair (MTTR)
◼Mean time between failures (MTBF) is widely used and is the sum of
MTTF and MTTR
◼Overall failure rate of a module equals the sum of the failure rates
of its components
◼Module availability is the ratio of MTTF to MTBF: MTTF / (MTTF + MTTR)
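A minimal sketch of the availability formula above. The MTTF and MTTR values are hypothetical and are not taken from the slides.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Availability = MTTF / MTBF, where MTBF = MTTF + MTTR."""
    mtbf_hours = mttf_hours + mttr_hours
    return mttf_hours / mtbf_hours


# Hypothetical module: fails on average every 1,000,000 hours, takes 24 hours to repair.
example = availability(mttf_hours=1_000_000, mttr_hours=24)
print(f"availability = {example:.6f}")
```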
Example
❑ Failure rate of component (failures in time) = 1/MTTF of component
❑ Failure rate of system = sum of failure rates of components
❑ Using values in given Table,
❑ Failure rate of Computer
= (10^-6) x [(1/8) + 3 x (1/6) + 2 x (1) + (1/4)]
= (10^-6) x [0.125 + 0.5 + 2 + 0.25]
= 2.875 x 10^-6 per hour
= 2.875 x 10^-6 x 24 x 365 per year
= 2.5185 x 10^-2 per year
MTTF of computer = 1/(failure rate) = 39.71 years
◼ Failure rate of cluster of 144 computers
= 2.5185 x 10^-2 x 144
= 3.63 computer failures per year
About 140 computers are still working at the end of the first year (assuming failed machines are not repaired)
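The arithmetic above can be checked with a short sketch. The table itself is not reproduced here, so the component MTTFs below are read back from the bracketed terms (one at 8 million hours, three at 6 million, two at 1 million, one at 4 million); that reading is an assumption.

```python
HOURS_PER_YEAR = 24 * 365

# Component MTTFs in hours, inferred from the terms (1/8), 3 x (1/6), 2 x (1), (1/4).
component_mttf_hours = [8e6, 6e6, 6e6, 6e6, 1e6, 1e6, 4e6]

# Failure rate of the system = sum of the component failure rates.
rate_per_hour = sum(1.0 / mttf for mttf in component_mttf_hours)
rate_per_year = rate_per_hour * HOURS_PER_YEAR
mttf_years = 1.0 / rate_per_year

# A cluster of 144 identical computers fails 144 times as often.
cluster_failures_per_year = 144 * rate_per_year

print(f"computer: {rate_per_year:.5f} failures/year, MTTF {mttf_years:.2f} years")
print(f"cluster:  {cluster_failures_per_year:.2f} failures/year")
```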
Relative Performance
◼ Define Performance = 1/Execution Time
◼ “X is n times faster than Y”
$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$
◼ Example: time taken to run a program
◼ 10s on A, 15s on B
◼ Execution TimeB / Execution TimeA
= 15s / 10s = 1.5
◼ So A is 1.5 times faster than B
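A small sketch of the same ratio; the function name `times_faster` is illustrative only.

```python
def times_faster(time_x: float, time_y: float) -> float:
    """Return n such that X is n times faster than Y (Performance = 1 / Execution Time)."""
    return time_y / time_x


# Slide example: 10 s on A, 15 s on B.
n = times_faster(time_x=10.0, time_y=15.0)
print(f"A is {n} times faster than B")
```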
CPI in more detail
◼If different instruction classes take different numbers of cycles
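$$\text{Clock Cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \text{Instruction Count}_i\right)$$
◼Weighted average CPI
$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}}\right)$$
where Instruction Count_i / Instruction Count is the relative frequency of class i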
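One way to sketch the two sums in code. The class names, CPIs, and instruction counts are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def total_cycles(cpi_by_class: dict, count_by_class: dict) -> float:
    """Clock cycles = sum over classes of CPI_i x Instruction Count_i."""
    return sum(cpi_by_class[c] * count_by_class[c] for c in cpi_by_class)


def average_cpi(cpi_by_class: dict, count_by_class: dict) -> float:
    """Weighted average CPI = Clock cycles / total Instruction Count."""
    return total_cycles(cpi_by_class, count_by_class) / sum(count_by_class.values())


# Hypothetical instruction classes (not from the slides).
cpi_by_class = {"A": 1, "B": 2, "C": 3}
count_by_class = {"A": 2, "B": 1, "C": 2}

cycles = total_cycles(cpi_by_class, count_by_class)   # 1*2 + 2*1 + 3*2
cpi = average_cpi(cpi_by_class, count_by_class)       # cycles / 5 instructions
print(cycles, cpi)
```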
Performance Summary
$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$
◼Performance depends on (IC = instruction count, CPI = cycles per instruction, Tc = clock cycle time)
◼Algorithm: affects IC, possibly CPI
◼Programming language: affects IC, CPI
◼Compiler: affects IC, CPI
◼Instruction set architecture: affects IC, CPI, Tc
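The CPU time equation as a sketch. The input values are hypothetical and serve only to show how the three factors multiply.

```python
def cpu_time_seconds(instruction_count: float, cpi: float, cycle_time_s: float) -> float:
    """CPU Time = IC x CPI x Tc."""
    return instruction_count * cpi * cycle_time_s


# Hypothetical program: 1e9 instructions, CPI of 2.0, 0.5 ns clock cycle (2 GHz).
t = cpu_time_seconds(instruction_count=1e9, cpi=2.0, cycle_time_s=0.5e-9)
print(f"{t:.2f} s")
```

Halving any one factor halves the CPU time, which is why the algorithm, language, compiler, and ISA all matter.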
Example Problem 1
◼ Consider three different processors P1, P2, and P3 executing the
same instruction set. P1 has a 3 GHz clock rate and a CPI of 1.5. P2
has a 2.5 GHz clock rate and a CPI of 1.0. P3 has a 4.0 GHz clock
rate and has a CPI of 2.2.
a. Which processor has the highest performance expressed in instructions
per second?
b. If the processors each execute a program in 10 seconds, find the number
of cycles and the number of instructions.
c. We are trying to reduce the execution time by 30%, but this leads to an
increase of 20% in the CPI. What clock rate should we have to get this time
reduction?
1a
IPS = Cycles per Second/Cycles per Instruction
A measure of throughput – or rate of doing work
◼ Performance of P1: 3 GHz/1.5 = 2 x 10^9 Inst/sec
◼ Performance of P2: 2.5 GHz/1.0 = 2.5 x 10^9 Inst/sec
◼ Performance of P3: 4 GHz/2.2 = 1.82 x 10^9 Inst/sec
◼ P2 has the highest performance
◼ CPI can be as relevant to processor performance as clock
frequency
◼ Faster clocks may not always be a good thing – higher power
dissipation, worse reliability, worse coupling noise… and not the
best rate of processing instructions!
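A quick check of part 1a, using the clock rates and CPIs from the problem statement. The dictionary layout is just one convenient choice.

```python
# (clock rate in Hz, CPI) for each processor in Example Problem 1.
processors = {"P1": (3.0e9, 1.5), "P2": (2.5e9, 1.0), "P3": (4.0e9, 2.2)}

# Instructions per second = cycles per second / cycles per instruction.
ips = {name: clock_hz / cpi for name, (clock_hz, cpi) in processors.items()}
fastest = max(ips, key=ips.get)

for name, value in ips.items():
    print(f"{name}: {value / 1e9:.2f} x 10^9 inst/sec")
print("highest throughput:", fastest)
```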
1b
◼ # of cycles = Cycles per second x time (in seconds)
◼ Cycles of P1: 3 GHz x 10 s = 30 B cycles
◼ Cycles of P2: 2.5GHz x 10 s = 25 B cycles
◼ Cycles of P3: 4 GHz x 10s = 40 B cycles
Assuming the metric is wall clock time to execute a given benchmark program:
P3 consumed the most clock cycles, and so more power, to do the same work
P2 consumed the fewest clock cycles to do the same work
A lower CPI translates into higher productivity and, as a result, higher energy efficiency
1b contd..
◼ # of Instructions = Cycles / CPI
◼ # of Instructions of P1: 30 B cycles/1.5 Cycles per Instruction = 20B
◼ # of Instructions of P2: 25 B cycles/1 Cycles per Instruction = 25B
◼ # of Instructions of P3: 40 B cycles/2.2 Cycles per Instruction =
18.18B
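Part 1b as a sketch, again with the values from the problem statement; the variable names are illustrative.

```python
# (clock rate in Hz, CPI) for each processor in Example Problem 1.
processors = {"P1": (3.0e9, 1.5), "P2": (2.5e9, 1.0), "P3": (4.0e9, 2.2)}
RUN_TIME_S = 10.0

# Cycles = cycles per second x time; Instructions = cycles / CPI.
cycles = {name: clock_hz * RUN_TIME_S for name, (clock_hz, _) in processors.items()}
instructions = {name: cycles[name] / cpi for name, (_, cpi) in processors.items()}

for name in processors:
    print(f"{name}: {cycles[name] / 1e9:.0f} B cycles, "
          f"{instructions[name] / 1e9:.2f} B instructions")
```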
1c
◼ Lower execution time trades off with higher CPI & higher FCLK
◼ Assuming 30% reduction in execution time requires 20% higher CPI
Fclk_new = # Instructions x CPI_new / ET_new, with CPI_new = 1.2 x CPI and ET_new = 0.7 x 10 s = 7 s
Fclk_new P1 = 20 B x 1.8 / 7 s = 5.14 GHz
Fclk_new P2 = 25 B x 1.2 / 7 s = 4.29 GHz
Fclk_new P3 = 18.18 B x 2.64 / 7 s = 6.86 GHz
High-CPI processors require even higher clock rates to get the same %
improvement in execution time
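A sketch of part 1c under the stated assumptions (instruction count unchanged, CPI up 20%, execution time down 30%). With the unrounded CPI of 2.2 x 1.2 = 2.64, P3 comes out near 6.86 GHz.

```python
# (clock rate in Hz, CPI) for each processor in Example Problem 1.
processors = {"P1": (3.0e9, 1.5), "P2": (2.5e9, 1.0), "P3": (4.0e9, 2.2)}

OLD_TIME_S = 10.0
NEW_TIME_S = OLD_TIME_S * (1 - 0.30)   # 30% shorter execution time
CPI_GROWTH = 1.20                      # CPI rises by 20%

new_clock_hz = {}
for name, (clock_hz, cpi) in processors.items():
    instructions = clock_hz * OLD_TIME_S / cpi   # assumed unchanged
    new_clock_hz[name] = instructions * (cpi * CPI_GROWTH) / NEW_TIME_S

for name, clock_hz in new_clock_hz.items():
    print(f"{name}: {clock_hz / 1e9:.2f} GHz")
```

Every processor needs its clock scaled by the same factor, 1.2 / 0.7 (about 1.71), so the one starting from the highest clock rate ends up with the highest absolute target.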
Example Problem 2
Compilers can have a profound impact on the performance of an
application. Assume that for a program, compiler A results in a
dynamic instruction count of 1.0E9 and has an execution time of 1.1
s, while compiler B results in a dynamic instruction count of 1.2E9
and an execution time of 1.5 s.
a. Find the average CPI for each program given that the processor has a
clock cycle time of 1 ns.
b. Assume the compiled programs run on two different processors. If the
execution times on the two processors are the same, how much faster is the
clock of the processor running compiler A’s code versus the clock of the
processor running compiler B’s code?
c. A new compiler is developed that uses only 6.0E8 instructions and has an
average CPI of 1.1. What is the speedup of using this new compiler versus
using compiler A or B on the original processor?
2a
◼ CPI = ETime x Fclk / Instr Count
◼ Compiler A CPI = 1.1s x 1GHz / 1 B = 1.1
◼ Compiler B CPI = 1.5s x 1GHz / 1.2 B = 1.25
On a given machine with a given clock frequency, different
compilers generating code for the same instruction set
architecture can achieve different CPIs, and therefore
different performance
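Part 2a as a sketch, using the instruction counts, execution times, and 1 ns cycle time from the problem statement.

```python
CYCLE_TIME_S = 1e-9   # 1 ns clock cycle, i.e. a 1 GHz clock


def average_cpi(exec_time_s: float, instruction_count: float) -> float:
    """CPI = (execution time / cycle time) / instruction count."""
    return exec_time_s / CYCLE_TIME_S / instruction_count


cpi_a = average_cpi(exec_time_s=1.1, instruction_count=1.0e9)
cpi_b = average_cpi(exec_time_s=1.5, instruction_count=1.2e9)
print(f"compiler A: CPI {cpi_a:.2f}, compiler B: CPI {cpi_b:.2f}")
```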
2b
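◼ Assume the processors are different and the execution times
are now the same
◼ How much faster is the clock running B’s code vs. the clock running
A’s code?
Fclk_B / Fclk_A = [IC_B x CPI_B] / [IC_A x CPI_A]
= [1.2 B x 1.25] / [1 B x 1.1]
= 1.36
So the clock running A’s code is not faster: it can be about 27% slower
(1/1.36 = 0.73) for the same execution time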
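The ratio in part 2b follows from setting the two execution times equal, so the clock rates must scale with the cycle counts. A minimal sketch, using the CPIs found in part 2a:

```python
# Instruction counts from the problem statement, CPIs from part 2a.
ic_a, cpi_a = 1.0e9, 1.1
ic_b, cpi_b = 1.2e9, 1.25

# Equal execution time: IC_A x CPI_A / F_A = IC_B x CPI_B / F_B.
clock_ratio_b_over_a = (ic_b * cpi_b) / (ic_a * cpi_a)
clock_ratio_a_over_b = 1.0 / clock_ratio_b_over_a

print(f"F_B / F_A = {clock_ratio_b_over_a:.2f}")
print(f"F_A / F_B = {clock_ratio_a_over_b:.2f}")
```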
2c
◼ New compiler uses only 0.6 B instructions, CPI = 1.1
◼ Speedup Versus A or B on the original processor:
◼ Instr Count x CPI = # cycles on the original processor; new compiler: 0.6 B x 1.1 = 0.66 B cycles
• TA / Tnew = speedup vs. A: 1 B x 1.1 = 1.1 B cycles vs. 0.66 B cycles
= 1.1 / 0.66 = 1.67
• TB / Tnew = speedup vs. B: 1.2 B x 1.25 = 1.5 B cycles vs. 0.66 B cycles
= 1.5 / 0.66 = 2.27
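A sketch of part 2c. Since all three programs run on the same 1 GHz processor, the speedup is simply the ratio of cycle counts.

```python
def cycles(instruction_count: float, cpi: float) -> float:
    """Cycles on the original processor = instruction count x CPI."""
    return instruction_count * cpi


cycles_a = cycles(1.0e9, 1.1)
cycles_b = cycles(1.2e9, 1.25)
cycles_new = cycles(6.0e8, 1.1)

speedup_vs_a = cycles_a / cycles_new
speedup_vs_b = cycles_b / cycles_new
print(f"vs A: {speedup_vs_a:.2f}, vs B: {speedup_vs_b:.2f}")
```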
Example Problem 3
Consider two different implementations of the same instruction set
architecture. The instructions can be divided into four classes according to
their CPI (classes A, B, C, and D). P1 with a clock rate of 2.5 GHz and CPIs of 1,
2, 3, and 3, and P2 with a clock rate of 3 GHz and CPIs of 2, 2, 2, and 2.
a. What is the global CPI for each implementation?
b. Given a program with a dynamic instruction count of 1.0E6 instructions
divided into classes as follows: 10% class A, 20% class B, 50% class C, and 20%
class D, which is faster: P1 or P2?
c. Find the clock cycles required in both cases.
3a,b
INSTR CLASS →   A: 10%   B: 20%   C: 50%   D: 20%
P1 CPI          1        2        3        3
P2 CPI          2        2        2        2
◼ CPI of P1 = 0.1 X 1 + 0.2 X 2 + 0.5 X 3 + 0.2 X 3 = 2.6
◼ CPI of P2 = 0.1 X 2 + 0.2 X 2 + 0.5 X 2 + 0.2 X 2 = 2.0
ET_P1 = [ CPI / FCLK ] X IC = [2.6 / 2.5G ] X 1M = 1.04 ms
ET_P2 = [ CPI / FCLK ] X IC = [2.0 / 3G ] X 1M = 0.67 ms
P2 is faster!
3c
◼ Clock cycles required = CPI x IC
◼ P1: 2.6 x 1M = 2.6M cycles
◼ P2: 2.0 x 1M = 2.0M cycles
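Problem 3 end to end, as a sketch. The dictionary layout and names are illustrative; the numbers are from the problem statement.

```python
INSTRUCTION_COUNT = 1.0e6
mix = {"A": 0.10, "B": 0.20, "C": 0.50, "D": 0.20}

designs = {
    "P1": {"clock_hz": 2.5e9, "cpi": {"A": 1, "B": 2, "C": 3, "D": 3}},
    "P2": {"clock_hz": 3.0e9, "cpi": {"A": 2, "B": 2, "C": 2, "D": 2}},
}

results = {}
for name, d in designs.items():
    cpi = sum(mix[c] * d["cpi"][c] for c in mix)   # weighted average CPI
    total_cycles = cpi * INSTRUCTION_COUNT         # part c
    exec_time_s = total_cycles / d["clock_hz"]     # part b
    results[name] = {"cpi": cpi, "cycles": total_cycles, "time_s": exec_time_s}

faster = min(results, key=lambda n: results[n]["time_s"])
for name, r in results.items():
    print(f"{name}: CPI {r['cpi']:.1f}, {r['cycles'] / 1e6:.1f}M cycles, "
          f"{r['time_s'] * 1e3:.2f} ms")
print("faster:", faster)
```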
Example Problem 4
The Pentium 4 Prescott processor, released in 2004, had a clock rate of 3.6
GHz and voltage of 1.25 V. Assume that, on average, it consumed 10 W of
static power and 90 W of dynamic power. The Core i5 Ivy Bridge, released in
2012, has a clock rate of 3.4 GHz and voltage of 0.9 V. Assume that, on
average, it consumed 30 W of static power and 40 W of dynamic power.
a. For each processor find the average capacitive loads.
b. Find the percentage of the total dissipated power comprised by static
power and the ratio of static power to dynamic power for each technology.
c. If the total dissipated power is to be reduced by 10%, how much should the
voltage be reduced to maintain the same leakage current? Note: power is
defined as the product of voltage and current.
4a, b
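◼ DP = F x ½ CV^2
◼ C = 2 DP / [V^2 F]
P4: 2 x 90 W / [1.5625 V^2 x 3.6 GHz] = 3.2 x 10^-8 F
i5: 2 x 40 W / [0.81 V^2 x 3.4 GHz] = 2.9 x 10^-8 F
◼ Static power as a share of total power
P4: 10 W / 100 W = 10%
i5: 30 W / 70 W = 42.9%
◼ Ratio of static power to dynamic power
P4: 10 W / 90 W = 0.11
i5: 30 W / 40 W = 0.75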
Leakage current increases as # of transistors on chip increases
exponentially
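Parts 4a and 4b can be sketched as below, using the dynamic power relation DP = F x ½ CV^2 from the slide. The field names are illustrative.

```python
def capacitive_load(dynamic_power_w: float, voltage_v: float, frequency_hz: float) -> float:
    """Solve DP = F x 1/2 x C x V^2 for C, in farads."""
    return 2.0 * dynamic_power_w / (voltage_v ** 2 * frequency_hz)


chips = {
    "Pentium 4 Prescott": {"f": 3.6e9, "v": 1.25, "static": 10.0, "dynamic": 90.0},
    "Core i5 Ivy Bridge": {"f": 3.4e9, "v": 0.9, "static": 30.0, "dynamic": 40.0},
}

results = {}
for name, c in chips.items():
    results[name] = {
        "C": capacitive_load(c["dynamic"], c["v"], c["f"]),
        "static_share": c["static"] / (c["static"] + c["dynamic"]),
        "static_to_dynamic": c["static"] / c["dynamic"],
    }

for name, r in results.items():
    print(f"{name}: C = {r['C']:.2e} F, static share = {r['static_share']:.1%}, "
          f"static/dynamic = {r['static_to_dynamic']:.2f}")
```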
Example Problem 5
Instruction                  Frequency   Cycles per Instruction
ALU operations               30%         1
Load                         20%         2
Store                        10%         2
Branches                     20%         3
Floating point operations    20%         5
a. What is the overall CPI of this machine?
b. If the CPU runs at 750MHz, what is the MIPS rating of this
machine? For this question, count floating point operations in
the MIPS rating.
c. Consider improving this computer’s performance by
enhancing the speed of the floating point instructions. What is
the best possible overall speedup that we could obtain?
5a, b, c
◼ CPI overall
= 0.3 x 1 + 0.2 x 2 + 0.1 x 2 + 0.2 x 3 + 0.2 x 5
= 0.3 + 0.4 + 0.2 + 0.6 + 1.0 = 2.5
◼ MIPS (millions of instructions per second)
= Clock rate/CPI = 750 x 10^6 / 2.5 = 3 x 10^8 instructions per second
= 300 MIPS
◼ Floating-point instructions make the largest contribution to CPI (they need the most cycles per
instruction)
The best case is an infinite speedup of the floating-point instructions, i.e., CPI of FP instructions → 0
How much does the CPI of the machine improve?
Speedup = CPI old / CPI new
CPI new =
0.3 x 1 + 0.2 x 2 + 0.1 x 2 + 0.2 x 3 + 0.2 x 0 = 1.5
Speedup = 2.5/1.5 = 1.667
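A sketch of all three parts of Problem 5. The best case in part c is modeled, as in the slide, by letting the floating-point CPI go to zero.

```python
CLOCK_HZ = 750e6

# (instruction class, frequency, cycles per instruction) from the table.
mix = [
    ("ALU", 0.30, 1),
    ("Load", 0.20, 2),
    ("Store", 0.10, 2),
    ("Branch", 0.20, 3),
    ("FP", 0.20, 5),
]

cpi_old = sum(freq * cpi for _, freq, cpi in mix)
mips = CLOCK_HZ / cpi_old / 1e6

# Best case: floating-point instructions take zero cycles.
cpi_new = sum(freq * (0 if name == "FP" else cpi) for name, freq, cpi in mix)
best_speedup = cpi_old / cpi_new

print(f"CPI = {cpi_old:.2f}, MIPS = {mips:.0f}, best speedup = {best_speedup:.3f}")
```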
Example Problem 6
Two enhancements, E1 and E2, with the following speedups are proposed for a new
architecture:
Speedup1 = 10
Speedup2 = 5
Only one of the enhancements is usable at any point in time (maybe because they use
some of the same hardware).
a. If E1 can be used 20% of the time and E2 can be used 10% of the time, what would be
the overall speedup?
b. If the percentage of time that E1 can be used decreased to 15%, what percentage of the
time would the use of E2 have to be to get the same overall speedup as in part (a)?
c. Suppose we are free to choose between E1 or E2, whenever we want (the percentages
of time for using E1 or E2 can be varied as desired, but in total cannot be more than 100%
of the time). What would be the maximum achievable overall speedup?
6 a, b, c
a. Speedup = Te old / Te new
= Te / [20% x (Te/10) + 10% x (Te/5) + 70% x Te]
= 1 / [0.02 + 0.02 + 0.7] = 1/0.74 = 1.35
b. 1 / [0.15/10 + x/5 + (0.85 - x)] = 1 / 0.74
0.15/10 + x/5 + (0.85 - x) = 0.74
0.865 - 0.8x = 0.74
x = 0.125 / 0.8 = 0.15625
Enhancement 2 would need to increase its percentage time from 10% to
15.625% to make up for a decrease in time of Enhancement 1 from 20% to
15%
c. Use E1, the larger speedup, 100% of the time: speedup = Te / [100% x (Te/10)] = 10
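A sketch of Problem 6 using the same time accounting as above: each enhancement shrinks its share of the original time by its speedup, and the rest runs unenhanced. The helper name `overall_speedup` is illustrative.

```python
def overall_speedup(enhancements):
    """enhancements: list of (fraction_of_original_time, speedup) pairs."""
    unenhanced = 1.0 - sum(f for f, _ in enhancements)
    new_time = unenhanced + sum(f / s for f, s in enhancements)
    return 1.0 / new_time


S1, S2 = 10.0, 5.0

# a. E1 usable 20% of the time, E2 usable 10% of the time.
speedup_a = overall_speedup([(0.20, S1), (0.10, S2)])

# b. E1 drops to 15%; solve for the E2 fraction x that gives the same new time.
#    (1 - 0.15 - x) + 0.15/S1 + x/S2 = 1/speedup_a
target_time = 1.0 / speedup_a
x = (1.0 - 0.15 + 0.15 / S1 - target_time) / (1.0 - 1.0 / S2)

# c. Best case: use the larger speedup (E1) 100% of the time.
speedup_c = overall_speedup([(1.0, S1)])

print(f"a: {speedup_a:.2f}, b: x = {x:.5f}, c: {speedup_c:.1f}")
```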
Example Problem 7
◼ Suppose a program (or a program task) takes 1 billion instructions to
execute on a processor running at 2 GHz. Suppose also that 50% of the
instructions execute in 3 clock cycles, 30% execute in 4 clock cycles, and
20% execute in 5 clock cycles. What is the execution time for the program
or task?
Instruction   Frequency   Cycles per Instruction
A             50%         3
B             30%         4
C             20%         5
Problem 7
Average Cycles Per Instruction (CPI) of the Program
= 0.5 x 3 + 0.3 x 4 + 0.2 x 5 = 3.7
1 billion instructions x CPI = number of cycles required by Program =
3.7 x 10^9
At 2 GHz, one clock cycle takes 1 / [2 x 10^9] seconds = 0.5 x
10^-9 seconds = 0.5 nanoseconds
So, 3.7 x 10^9 cycles take 3.7 x 10^9 x 0.5 x 10^-9 seconds
= 3.7 x 0.5 = 1.85 seconds
Example Problem 8
◼ Suppose the processor in the previous example is redesigned so that all
instructions that initially executed in 5 cycles now execute in 4 cycles. Due to
changes in the circuitry, the clock rate has to be decreased from 2.0 GHz to
1.9 GHz. What is the overall percentage improvement?
Instruction   Frequency   Cycles per Instruction
A             50%         3
B             30%         4
C             20%         4
P8
Now, the Average Cycles Per Instruction (CPI) of the Program
= 0.5 x 3 + 0.3 x 4 + 0.2 x 4 = 3.5
So,
1 billion instructions x CPI = number of cycles required by Program = 3.5 x 10^9
At 1.9 GHz,
one clock cycle takes 1 / [1.9 x 10^9] seconds = 0.526316 x 10^-9 seconds
So,
3.5 x 10^9 cycles take 3.5 x 10^9 x 0.526316 x 10^-9 seconds = 1.842105 seconds
So the improvement is 1.85 / 1.842105 = 1.0043, or about 0.43%
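Problems 7 and 8 side by side, as a sketch. The helper `execution_time_s` is illustrative; the instruction mixes and clock rates are from the two problem statements.

```python
def execution_time_s(instruction_count: float, mix, clock_hz: float) -> float:
    """mix: list of (frequency, cycles) pairs; time = IC x weighted CPI / clock rate."""
    cpi = sum(freq * cycles for freq, cycles in mix)
    return instruction_count * cpi / clock_hz


INSTRUCTIONS = 1e9

# Problem 7: original design at 2.0 GHz.
t_before = execution_time_s(INSTRUCTIONS, [(0.5, 3), (0.3, 4), (0.2, 5)], 2.0e9)

# Problem 8: 5-cycle instructions now take 4 cycles, clock drops to 1.9 GHz.
t_after = execution_time_s(INSTRUCTIONS, [(0.5, 3), (0.3, 4), (0.2, 4)], 1.9e9)

improvement_pct = (t_before / t_after - 1.0) * 100.0
print(f"before: {t_before:.4f} s, after: {t_after:.4f} s, "
      f"improvement: {improvement_pct:.2f}%")
```

The lower CPI (3.5 vs. 3.7) is almost entirely cancelled by the slower clock, which is why the net gain is well under 1%.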