Designing for performance, Performance Assessment
If you were running a program on two different desktop
computers, you’d say that the faster one is the desktop
computer that gets the job done first.
If you were running a datacenter that had several servers
running jobs submitted by many users, you’d say that the
faster computer was the one that completed the most
jobs during a day.
As an individual computer user, you are interested in
reducing response time—the time between the start and
completion of a task—also referred to as execution time.
Datacenter managers often care about increasing
throughput or bandwidth—the total amount of work done
in a given time.
Hence, in most cases, we will need different performance
metrics as well as different sets of applications to
benchmark personal mobile devices, which are more
focused on response time, versus servers, which are
more focused on throughput.
Response time
Also called execution time. The total time required for the
computer to complete a task, including disk accesses,
memory accesses, I/O activities, operating system
overhead, CPU execution time, and so on.
Throughput
Also called bandwidth. Another measure of performance,
it is the number of tasks completed per unit time.
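To maximize performance, we want to minimize response
time or execution time for some task. Thus, we can relate
performance and execution time for a computer X:
$$\text{Performance}_X = \frac{1}{\text{Execution time}_X}$$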
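This means that for two computers X and Y, if the
performance of X is greater than the performance of Y,
we have
$$\text{Performance}_X > \text{Performance}_Y$$
$$\frac{1}{\text{Execution time}_X} > \frac{1}{\text{Execution time}_Y}$$
$$\text{Execution time}_Y > \text{Execution time}_X$$
That is, the execution time on Y is longer than that on
X if X is faster than Y.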
In discussing a computer design, we often want to relate
the performance of two different computers
quantitatively. We will use the phrase “X is n times faster
than Y”—or equivalently “X is n times as fast as Y”—to
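mean
$$\frac{\text{Performance}_X}{\text{Performance}_Y} = n$$
If X is n times as fast as Y, then the execution time on
Y is n times as long as it is on X:
$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$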
Example
If computer A runs a program in 10 seconds and
computer B runs the same program in 15 seconds, how
much faster is A than B?
Answer
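We know that A is n times as fast as B if
$$\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = n$$
Thus the performance ratio is
$$\frac{15\ \text{s}}{10\ \text{s}} = 1.5$$
and A is therefore 1.5 times as fast as B.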
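The same calculation can be sketched in a few lines of Python. The function name `relative_performance` is made up for this illustration; it simply divides the two execution times, on the assumption that both were measured on the same program.

```python
def relative_performance(time_x: float, time_y: float) -> float:
    """Return n such that X is n times as fast as Y.

    Performance is the reciprocal of execution time, so
    n = Performance_X / Performance_Y = time_y / time_x.
    """
    if time_x <= 0 or time_y <= 0:
        raise ValueError("execution times must be positive")
    return time_y / time_x


# Computer A takes 10 s and computer B takes 15 s on the same program.
n = relative_performance(time_x=10.0, time_y=15.0)
print(f"A is {n} times as fast as B")  # prints: A is 1.5 times as fast as B
```

Note that the slower machine's time goes in the numerator: a larger execution time means lower performance.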
Measuring Performance
Time is the measure of computer performance: the
computer that performs the same amount of work in the
least time is the fastest.
Program execution time is measured in seconds per
program.
The most straightforward definition of time is
called wall clock time, response time, or elapsed time.
These terms mean the total time to complete a task,
including disk accesses, memory accesses, input/output
(I/O) activities, operating system overhead—everything.
Computers are often shared, however, and a processor
may work on several programs simultaneously. In such
cases, the system may try to optimize throughput rather
than attempt to minimize the elapsed time for one
program. Hence, we often want to distinguish between
the elapsed time and the time over which the processor
is working on our behalf.
CPU execution time or simply CPU time, which
recognizes this distinction, is the time the CPU spends
computing for this task and does not include time spent
waiting for I/O or running other programs.
CPU execution time
Also called CPU time. The actual time the CPU spends
computing for a specific task.
User CPU time
The CPU time spent in a program itself.
System CPU time
The CPU time spent in the operating system performing
tasks on behalf of the program.
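As a rough illustration of the difference between elapsed time and CPU time, the sketch below times a small workload with Python's standard `time.perf_counter` (wall clock) and `os.times` (user and system CPU time of the current process). The helper `measure` and the workload `example_task` are hypothetical names invented for this example, and the exact numbers will vary from machine to machine.

```python
import os
import time


def measure(task):
    """Run task() and return (elapsed, user_cpu, system_cpu) in seconds."""
    start_wall = time.perf_counter()
    start_cpu = os.times()
    task()
    end_cpu = os.times()
    end_wall = time.perf_counter()
    return (
        end_wall - start_wall,               # elapsed (wall clock) time
        end_cpu.user - start_cpu.user,       # user CPU time
        end_cpu.system - start_cpu.system,   # system CPU time
    )


def example_task():
    # Hypothetical workload: a little computation, then a wait that
    # adds to elapsed time while using almost no CPU time.
    sum(i * i for i in range(200_000))
    time.sleep(0.1)


elapsed, user_cpu, system_cpu = measure(example_task)
print(f"elapsed time:    {elapsed:.3f} s")
print(f"user CPU time:   {user_cpu:.3f} s")
print(f"system CPU time: {system_cpu:.3f} s")
```

On a typical run the elapsed time should exceed the sum of user and system CPU time, because the time spent sleeping counts toward elapsed time only.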
Although as computer users we care about time, when
we examine the details of a computer it’s convenient to
think about performance in other metrics. In particular,
computer designers may want to think about a computer
by using a measure that relates to how fast the hardware
can perform basic functions. Almost all computers are
constructed using a clock that determines when events
take place in the hardware. These discrete time
intervals are called clock cycles (or ticks, clock ticks,
clock periods, clocks, cycles). Designers refer to the
length of a clock period both as the time for a complete
clock cycle (e.g., 250 picoseconds, or 250 ps) and as the
clock rate (e.g., 4 gigahertz, or 4 GHz), which is the
inverse of the clock period.
The unit of measurement called a hertz (Hz), which is
technically one cycle per second, is used to measure
clock speed.
Clock cycle
Also called tick, clock tick, clock period, clock, or cycle.
The time for one clock period, usually of the processor
clock, which runs at a constant rate.
Clock period
The length of each clock cycle.
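Because clock rate and clock period are inverses, converting between them is a single division. The short sketch below checks the 250 ps and 4 GHz figures quoted above; the function names are illustrative only.

```python
def clock_rate_from_period(period_s: float) -> float:
    """Clock rate in hertz (cycles per second) for a clock period in seconds."""
    return 1.0 / period_s


def clock_period_from_rate(rate_hz: float) -> float:
    """Clock period in seconds for a clock rate in hertz."""
    return 1.0 / rate_hz


rate = clock_rate_from_period(250e-12)   # 250 ps period
period = clock_period_from_rate(4e9)     # 4 GHz clock

print(f"{rate / 1e9:.2f} GHz")     # prints: 4.00 GHz
print(f"{period * 1e12:.0f} ps")   # prints: 250 ps
```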
CPU Performance and Its Factors
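The basic measure of CPU performance is CPU execution
time. A simple formula relates it to the number of clock
cycles and the clock cycle time (or, equivalently, the
clock rate):
$$\text{CPU execution time} = \text{CPU clock cycles} \times \text{Clock cycle time} = \frac{\text{CPU clock cycles}}{\text{Clock rate}}$$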
This formula makes it clear that the hardware
designer can improve performance by reducing
the number of clock cycles required for a program
or the length of the clock cycle.
Example
Our favorite program runs in 10 seconds on
computer A, which has a 2 GHz clock.
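Let’s first find the number of clock cycles required for
the program on A:
$$\text{CPU clock cycles}_A = \text{CPU time}_A \times \text{Clock rate}_A = 10\ \text{s} \times 2 \times 10^9\ \frac{\text{cycles}}{\text{s}} = 20 \times 10^9\ \text{cycles}$$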
Example
Our favorite program runs in 10 seconds on computer A,
which has a 2 GHz clock. We are trying to help a computer
designer build a computer, B, which will run this program in
6 seconds. The designer has determined that a substantial
increase in the clock rate is possible, but this increase will
affect the rest of the CPU design, causing computer B to
require 1.2 times as many clock cycles as computer A for this
program. What clock rate should we tell the designer to
target?
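Answer
The number of clock cycles for computer A was computed
earlier: 10 s × 2 × 10^9 cycles/s = 20 × 10^9 cycles.
CPU time for B can be found using this equation:
$$\text{CPU time}_B = \frac{1.2 \times \text{CPU clock cycles}_A}{\text{Clock rate}_B}$$
$$6\ \text{s} = \frac{1.2 \times 20 \times 10^9\ \text{cycles}}{\text{Clock rate}_B}$$
$$\text{Clock rate}_B = \frac{1.2 \times 20 \times 10^9\ \text{cycles}}{6\ \text{s}} = 4 \times 10^9\ \frac{\text{cycles}}{\text{s}} = 4\ \text{GHz}$$
To run the program in 6 seconds, B must have twice
the clock rate of A.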
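The arithmetic in this example can be checked with a short script. This is only a sketch of the calculation; the function name `required_clock_rate` and its parameters are invented for the illustration.

```python
GHZ = 1e9  # cycles per second in one gigahertz


def required_clock_rate(time_a_s, clock_rate_a_hz, cycle_ratio, target_time_b_s):
    """Clock rate B needs in order to finish in target_time_b_s.

    cycle_ratio is how many times as many clock cycles B needs
    compared with A for the same program.
    """
    cycles_a = time_a_s * clock_rate_a_hz   # CPU time x clock rate
    cycles_b = cycle_ratio * cycles_a
    return cycles_b / target_time_b_s       # clock cycles / CPU time


rate_b = required_clock_rate(
    time_a_s=10.0, clock_rate_a_hz=2 * GHZ, cycle_ratio=1.2, target_time_b_s=6.0
)
print(f"Target clock rate for B: {rate_b / GHZ:.1f} GHz")  # prints: 4.0 GHz
```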
Instruction Performance
The performance equations above did not include any
reference to the number of instructions needed for the
program. However, since the compiler clearly
generated instructions to execute, and the computer
had to execute the instructions to run the program, the
execution time must depend on the number of
instructions in a program. One way to think about
execution time is that it equals the number of
instructions executed multiplied by the average time
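per instruction. Therefore, the number of clock cycles
required for a program can be written as:
$$\text{CPU clock cycles} = \text{Instructions for a program} \times \text{Average clock cycles per instruction}$$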
The term clock cycles per instruction, which is the
average number of clock cycles each instruction takes
to execute, is often abbreviated as CPI.
Example
Suppose we have two implementations of the same
instruction set architecture. Computer A has a clock cycle
time of 250 ps and a CPI of 2.0 for some program, and
computer B has a clock cycle time of 500 ps and a CPI of 1.2
for the same program. Which computer is faster for this
program and by how much?
Answer
We know that each computer executes the same
number of instructions for the program; let’s call this
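number I. First, find the number of processor clock
cycles for each computer:
$$\text{CPU clock cycles}_A = I \times 2.0$$
$$\text{CPU clock cycles}_B = I \times 1.2$$
Now we can compute the CPU time for each computer:
$$\text{CPU time}_A = \text{CPU clock cycles}_A \times \text{Clock cycle time}_A = I \times 2.0 \times 250\ \text{ps} = 500 \times I\ \text{ps}$$
Likewise, for B:
$$\text{CPU time}_B = I \times 1.2 \times 500\ \text{ps} = 600 \times I\ \text{ps}$$
Clearly, computer A is faster. The amount faster is
given by the ratio of the execution times:
$$\frac{\text{CPU performance}_A}{\text{CPU performance}_B} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = \frac{600 \times I\ \text{ps}}{500 \times I\ \text{ps}} = 1.2$$
We can conclude that computer A is 1.2 times as fast
as computer B for this program.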
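A quick numeric check of this comparison is sketched below. It multiplies instruction count, CPI, and clock cycle time to get CPU time. The instruction count is an arbitrary placeholder, since it cancels out of the ratio, and the function name `cpu_time_ps` is illustrative.

```python
def cpu_time_ps(instruction_count: float, cpi: float, cycle_time_ps: float) -> float:
    """CPU time in picoseconds: instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_ps


# Arbitrary instruction count (an assumption); both machines run the
# same program, so the same value is used for A and B.
instruction_count = 1_000_000

time_a = cpu_time_ps(instruction_count, cpi=2.0, cycle_time_ps=250)
time_b = cpu_time_ps(instruction_count, cpi=1.2, cycle_time_ps=500)

speedup = time_b / time_a
print(f"A is {speedup:.2f} times as fast as B")  # prints: A is 1.20 times as fast as B
```

Changing `instruction_count` leaves the ratio unchanged, which is why the worked answer can leave I as a symbol.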
The Classic CPU Performance Equation
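We can now write this basic performance equation in terms
of instruction count (the number of instructions executed by
the program), CPI, and clock cycle time:
$$\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}$$
or, since the clock rate is the inverse of clock cycle
time:
$$\text{CPU time} = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}$$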
Example
A compiler designer is trying to decide between two code
sequences for a computer. The hardware designers have
supplied the following facts:

                   CPI for each instruction class
                   A      B      C
CPI                1      2      3

For a particular high-level language statement, the
compiler writer is considering two code sequences
that require the following instruction counts:

                   Instruction counts for each instruction class
Code sequence      A      B      C
1                  2      1      2
2                  4      1      1

Which code sequence executes the most instructions?
Which will be faster? What is the CPI for each sequence?
Answer
Which code sequence executes the most instructions?
Sequence 1 executes 2 + 1 + 2 = 5 instructions.
Sequence 2 executes 4 + 1 + 1 = 6 instructions.
Therefore, sequence 1 executes fewer instructions.
Which code sequence will be faster?
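We can use the equation for CPU clock cycles based on
instruction count and CPI to find the total number of clock
cycles for each sequence, where C_i is the number of
instructions of class i executed and CPI_i is the CPI of
that class (1, 2, and 3 for classes A, B, and C):
$$\text{CPU clock cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times C_i)$$
This yields
$$\text{CPU clock cycles}_1 = (2 \times 1) + (1 \times 2) + (2 \times 3) = 2 + 2 + 6 = 10\ \text{cycles}$$
$$\text{CPU clock cycles}_2 = (4 \times 1) + (1 \times 2) + (1 \times 3) = 4 + 2 + 3 = 9\ \text{cycles}$$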
So code sequence 2 is faster, even though it executes
one extra instruction.
What is the CPI for each sequence?
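The CPI values can be computed by
$$\text{CPI} = \frac{\text{CPU clock cycles}}{\text{Instruction count}}$$
$$\text{CPI}_1 = \frac{\text{CPU clock cycles}_1}{\text{Instruction count}_1} = \frac{10}{5} = 2.0$$
$$\text{CPI}_2 = \frac{\text{CPU clock cycles}_2}{\text{Instruction count}_2} = \frac{9}{6} = 1.5$$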
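The whole example can be reproduced with a small script. The sketch below assumes per-class CPIs of 1, 2, and 3 for instruction classes A, B, and C; treat those values as an assumption and substitute the hardware designers' figures if they differ. The instruction counts are the ones used in the answer above, and all names are illustrative.

```python
# Assumed CPI for each instruction class (not taken from measurements).
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class for each code sequence.
SEQUENCES = {
    1: {"A": 2, "B": 1, "C": 2},
    2: {"A": 4, "B": 1, "C": 1},
}


def instruction_count(counts):
    """Total number of instructions executed by a sequence."""
    return sum(counts.values())


def clock_cycles(counts):
    """Total clock cycles: sum over classes of CPI_i x C_i."""
    return sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())


def average_cpi(counts):
    """Average CPI: total clock cycles / total instruction count."""
    return clock_cycles(counts) / instruction_count(counts)


for seq, counts in SEQUENCES.items():
    print(
        f"sequence {seq}: {instruction_count(counts)} instructions, "
        f"{clock_cycles(counts)} cycles, CPI = {average_cpi(counts)}"
    )
# prints:
# sequence 1: 5 instructions, 10 cycles, CPI = 2.0
# sequence 2: 6 instructions, 9 cycles, CPI = 1.5
```

Under these assumptions sequence 2 needs fewer cycles despite its higher instruction count, because it uses fewer of the expensive class C instructions.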