COMPUTER ORGANIZATION AND DESIGN
5th Edition
The Hardware/Software Interface
Chapter 1
Computer Abstractions and Technology
§1.1 Introduction
The Computer Revolution
Progress in computer technology
Underpinned by Moore’s Law
The number of transistors in an IC doubles every 18-24 months
Makes novel applications feasible
Cell phones
World Wide Web
Search Engines
Self-driving cars, drones
VR games
Computers are pervasive
Chapter 1 — Computer Abstractions and Technology — 2
Classes of Computers
Desktop computers
General purpose, variety of software
Subject to cost/performance tradeoff
Server computers
Network based
High capacity, performance, reliability
Range from small servers to building sized
Embedded computers
Hidden as components of systems
Stringent power/performance/cost constraints
Processor Market - The PostPC Era
Understanding Performance
Algorithm
Determines number of operations executed
Programming language, compiler
Determine number of machine instructions for each
source-level statement
Processor and memory system
Determine how fast machine instructions are executed
I/O system (including OS)
Determines how fast I/O operations are executed
§1.2 Eight Great Ideas in Computer Architecture
Eight Great Ideas
Design for Moore’s Law
Use abstraction to simplify design
Make the common case fast
Performance via parallelism
Performance via pipelining
Performance via prediction
Hierarchy of memories
Dependability via redundancy
§1.2 Below Your Program
Below Your Program
Application software
Written in high-level language
System software
Compiler: translates HLL code to
machine code
Operating System: service code
Handling input/output
Managing memory and storage
Scheduling tasks & sharing resources
Hardware
Processor, memory, I/O controllers
Levels of Program Code
High-level language
Level of abstraction closer
to problem domain
Provides for productivity
and portability
Assembly language
Textual representation of
instructions
Hardware representation
Binary digits (bits)
Encoded instructions and
data
§1.3 Under the Covers
Components of a Computer
The BIG Picture
Same components for all kinds of computer
Desktop, server, embedded
Input/output includes
User-interface devices
Display, keyboard, mouse
Storage devices
Hard disk, CD/DVD, flash
Network adapters
For communicating with other computers
[Figure labels: Input, Output, Processor, Memory]
Anatomy of a Computer
[Figure labels: output device, network cable, two input devices]
Opening the Box
Inside the Processor (CPU)
AMD Barcelona: 4 processor cores
Cache memory: small, fast SRAM memory for immediate access to data
Abstractions
The BIG Picture
Abstraction helps us deal with complexity
Hide lower-level detail
Instruction set architecture (ISA)
The hardware/software interface
Instruction set, register set, addressing mode,
interrupt, etc.
Technology Trend
Year   Technology                    Relative performance/cost
1951   Vacuum tube                   1
1965   Transistor                    35
1975   Integrated circuit (IC)       900
1995   Very large scale IC (VLSI)    2,400,000
2005   Ultra large scale IC          6,200,000,000
Technology continues to evolve
Increased capacity and performance
Reduced cost
[Figure: DRAM capacity]
§1.7 Real Stuff: The AMD Opteron X4
Manufacturing ICs
Yield: proportion of working dies per wafer
AMD Opteron X2 Wafer and Intel Core i7 Wafer
AMD Opteron X2: 12-inch wafer, 117 chips, 90nm technology
Intel Core i7: 12-inch wafer, 280 chips, 32nm technology; each chip is 20.7 x 10.5 mm
§1.4 Performance
Defining Performance
Which airplane has the best performance?
[Figure: four bar charts comparing the Boeing 777, Boeing 747, BAC/Sud Concorde, and Douglas DC-8-50 by passenger capacity, cruising range (miles), cruising speed (mph), and passengers x mph]
Response Time and Throughput
Response time
How long it takes to do a task
Throughput
Total work done per unit time
e.g., tasks/transactions/… per hour
How are response time and throughput affected
by
Replacing the processor with a faster version?
Adding more processors?
We’ll focus on response time for now…
Relative Performance
Define Performance = 1/Execution Time
“X is n times faster than Y”
$\dfrac{\text{Performance}_X}{\text{Performance}_Y} = \dfrac{\text{Execution time}_Y}{\text{Execution time}_X} = n$
Example: time taken to run a program
10s on A, 15s on B
$\text{Execution Time}_B / \text{Execution Time}_A = 15\,\text{s} / 10\,\text{s} = 1.5$
So A is 1.5 times faster than B
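The ratio in this example can be checked with a few lines of Python. This is only a minimal sketch; the function name `relative_performance` is an illustrative assumption and does not come from the slides.

```python
def relative_performance(exec_time_x: float, exec_time_y: float) -> float:
    """Return n such that X is n times faster than Y.

    Performance is defined as 1 / execution time, so
    Performance_X / Performance_Y = exec_time_y / exec_time_x.
    """
    if exec_time_x <= 0 or exec_time_y <= 0:
        raise ValueError("execution times must be positive")
    return exec_time_y / exec_time_x


# Slide example: the program takes 10 s on A and 15 s on B.
n = relative_performance(exec_time_x=10.0, exec_time_y=15.0)
print(f"A is {n} times faster than B")
```

Note that the faster machine's time goes in the denominator, which is why the ratio is greater than 1.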
Measuring Execution Time
Elapsed time
Total response time, including all aspects
CPU processing, I/O, OS overhead, idle time
Determines system performance
CPU time (the time we focus on)
Comprises user CPU time and system CPU time
We focus on user CPU time, i.e., the CPU time spent executing our application’s own code
CPU Clocking
Digital hardware governed by a constant-rate clock
[Figure: clock timing diagram showing the clock period, clock cycles, data transfer/calculation, and state update]
Clock period (cycle time): duration of a clock cycle
e.g., 250ps = 0.25ns = $250 \times 10^{-12}$ s
Note: s, ms ($10^{-3}$), us ($10^{-6}$), ns ($10^{-9}$), ps ($10^{-12}$)
Clock rate (frequency): cycles per second, the reciprocal of the clock period
e.g., 4.0GHz = 4000MHz = $4.0 \times 10^{9}$ Hz
Note: Hz, kHz ($10^{3}$), MHz ($10^{6}$), GHz ($10^{9}$), THz ($10^{12}$)
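Since clock period and clock rate are reciprocals, converting between them is a single division. The sketch below uses the two values from this slide; the helper names `period_from_rate` and `rate_from_period` are illustrative assumptions.

```python
def period_from_rate(rate_hz: float) -> float:
    """Clock period in seconds for a clock rate given in Hz."""
    return 1.0 / rate_hz


def rate_from_period(period_s: float) -> float:
    """Clock rate in Hz for a clock period given in seconds."""
    return 1.0 / period_s


# 4.0 GHz clock rate and 250 ps clock period, the slide's two examples.
period_ps = period_from_rate(4.0e9) * 1e12   # seconds -> picoseconds
rate_ghz = rate_from_period(250e-12) / 1e9   # Hz -> GHz
print(f"4.0 GHz -> {period_ps:.0f} ps, 250 ps -> {rate_ghz:.1f} GHz")
```

The two examples on the slide describe the same clock: a 250 ps period corresponds to a 4.0 GHz rate.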
CPU Time
$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \dfrac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$
Performance improved by
Reducing number of clock cycles
Increasing clock rate
Hardware designer must often trade off clock
rate against cycle count
CPU Time Example
Computer A: 2GHz clock, 10s CPU time
Designing Computer B
Aim for 6s CPU time
A faster clock is possible, but it requires 1.2 × as many clock cycles as Computer A
How fast must Computer B’s clock be (i.e., what clock rate)?
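One way to work this example is to apply CPU Time = Clock Cycles / Clock Rate twice: first to recover Computer A's cycle count, then to solve for Computer B's clock rate. The sketch below does that; the function and variable names are illustrative assumptions.

```python
def required_clock_rate(cycles: float, target_cpu_time_s: float) -> float:
    """Clock rate (Hz) needed to execute `cycles` clock cycles in the target time."""
    return cycles / target_cpu_time_s


# Computer A: 2 GHz clock, 10 s CPU time.
clock_rate_a_hz = 2.0e9
cpu_time_a_s = 10.0
cycles_a = cpu_time_a_s * clock_rate_a_hz  # Clock Cycles = CPU Time x Clock Rate

# Computer B: needs 1.2x as many cycles and must finish in 6 s.
cycles_b = 1.2 * cycles_a
clock_rate_b_hz = required_clock_rate(cycles_b, target_cpu_time_s=6.0)
print(f"Computer B clock rate: {clock_rate_b_hz / 1e9:.1f} GHz")
```

Computer A executes 20 × 10^9 cycles, so Computer B must execute 24 × 10^9 cycles in 6 s, which works out to a 4 GHz clock.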
Instruction Count and CPI
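$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$
$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \dfrac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$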
Instruction Count for a program
Determined by program (language/algorithm), compiler, ISA
Average cycles per instruction
Determined by CPU hardware
If different instructions have different CPI
Average CPI affected by instruction mix
CPI Example
Computer A: Cycle Time = 250ps, CPI = 2.0
Computer B: Cycle Time = 500ps, CPI = 1.2
A and B implement the same ISA, so the same program has the same instruction count on both
Same program running on A and B, which is faster,
and by how much?
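$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$ (A is faster…)
$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$
$\dfrac{\text{CPU Time}_B}{\text{CPU Time}_A} = \dfrac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$ (…by this much)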
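The comparison can be reproduced numerically. In this sketch the instruction count is an arbitrary, hypothetical value, because it cancels in the ratio; the function name `cpu_time` is likewise an illustrative assumption.

```python
def cpu_time(instruction_count: float, cpi: float, cycle_time_s: float) -> float:
    """CPU Time = Instruction Count x CPI x Clock Cycle Time."""
    return instruction_count * cpi * cycle_time_s


# Hypothetical instruction count; any positive value gives the same ratio
# because both computers run the same program on the same ISA.
instruction_count = 1_000_000

time_a = cpu_time(instruction_count, cpi=2.0, cycle_time_s=250e-12)
time_b = cpu_time(instruction_count, cpi=1.2, cycle_time_s=500e-12)
ratio = time_b / time_a
print(f"A is {ratio:.1f} times faster than B")
```

Computer A wins despite its higher CPI because its cycle time is half that of Computer B.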
CPI in More Detail
If different instruction classes take different
numbers of cycles
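$\text{Clock Cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times \text{Instruction Count}_i)$
Weighted average CPI
$\text{CPI} = \dfrac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \dfrac{\text{Instruction Count}_i}{\text{Instruction Count}} \right)$
The ratio $\text{Instruction Count}_i / \text{Instruction Count}$ is the relative frequency of class $i$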
CPI Example
Alternative compiled code sequences using
instructions in classes A, B, C
Class               A    B    C
CPI for class       1    2    3
IC in sequence 1    2    1    2
IC in sequence 2    4    1    1
Sequence 1: IC = 5
Clock Cycles = 2×1 + 1×2 + 2×3 = 10
Avg. CPI = 10/5 = 2.0
Sequence 2: IC = 6
Clock Cycles = 4×1 + 1×2 + 1×3 = 9
Avg. CPI = 9/6 = 1.5
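The same calculation can be sketched in Python as a weighted sum over instruction classes. The dictionary layout and function names here are illustrative assumptions; the per-class CPI values and instruction counts are the ones given in this example.

```python
# CPI for each instruction class, as given in the example.
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}


def clock_cycles(counts: dict) -> int:
    """Sum of CPI_i x Instruction Count_i over all classes."""
    return sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())


def average_cpi(counts: dict) -> float:
    """Weighted average CPI = Clock Cycles / total Instruction Count."""
    return clock_cycles(counts) / sum(counts.values())


sequence_1 = {"A": 2, "B": 1, "C": 2}
sequence_2 = {"A": 4, "B": 1, "C": 1}

for name, seq in (("Sequence 1", sequence_1), ("Sequence 2", sequence_2)):
    print(name, "cycles:", clock_cycles(seq), "avg CPI:", average_cpi(seq))
```

Sequence 2 executes more instructions but takes fewer clock cycles, so instruction count alone does not predict which sequence is faster.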
Performance Summary
The BIG Picture
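$\text{CPU Time} = \dfrac{\text{Instructions}}{\text{Program}} \times \dfrac{\text{Clock cycles}}{\text{Instruction}} \times \dfrac{\text{Seconds}}{\text{Clock cycle}}$
The three factors are IC, CPI, and Tcycle (= 1/CR, the reciprocal of the clock rate)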
Performance depends on
Algorithm: affects IC, possibly CPI
Programming language: affects IC, CPI
Compiler: affects IC, CPI
Instruction set architecture: affects IC, CPI, Tc
§1.5 The Power Wall
Power Trends
Power consumption goes with clock rate
In CMOS IC technology
$\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$
Trend annotations: Power ×22, Voltage 5V → 1V, Frequency ×300
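Because voltage enters the power relation squared, lowering it has an outsized effect. The sketch below computes the ratio of new to old power from scaling factors for each term; the function name and the choice to hold the other two terms fixed are illustrative assumptions, and only the 5V → 1V change comes from the slide.

```python
def relative_power(cap_scale: float, voltage_scale: float, freq_scale: float) -> float:
    """Ratio of new to old power, using
    Power = Capacitive load x Voltage^2 x Frequency.
    Each argument is new_value / old_value for that term.
    """
    return cap_scale * voltage_scale ** 2 * freq_scale


# Voltage drops from 5 V to 1 V; capacitive load and frequency are held
# fixed here (an assumption made only to isolate the voltage term).
ratio = relative_power(cap_scale=1.0, voltage_scale=1.0 / 5.0, freq_scale=1.0)
print(f"Power falls to {ratio:.2f} of its original value")
```

Under these assumptions a 5× voltage reduction alone cuts power by a factor of 25, which helps explain how frequency could rise so much with a comparatively modest increase in power.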
Reducing Power
The power wall
We can’t reduce voltage further
We can’t remove more heat
How else can we improve performance?
§1.6 The Sea Change: The Switch to Multiprocessors
Uniprocessor Performance
Constrained by power, instruction-level parallelism, and memory latency
So designers turned to multiple processors (cores) per chip
Multiprocessors
Multicore microprocessors
More than one processor per chip
Requires explicitly parallel programming
Rewrite programs for parallelism
Load balancing
Optimizing communication and synchronization
Compare with instruction-level parallelism (pipelining)
Hardware executes multiple instructions at once
Hidden from the programmer
Note: in the past, programmers could rely on hardware innovations to double performance every 18 months without changing a line of code
SPEC CPU Benchmark
Programs used to measure performance
Supposedly typical of actual workload
Standard Performance Evaluation Corp (SPEC)
Develops benchmarks for CPU, I/O, Web, …
SPEC CPU2006
Elapsed time to execute a selection of programs
Negligible I/O, so focuses on CPU performance
Normalize relative to reference machine
Summarize as geometric mean of performance ratios
CINT2006 (integer) and CFP2006 (floating-point)
$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$
CINT2006 for Opteron X4 2356
Name         Description                     IC×10^9   CPI     Tc (ns)   Exec time (s)   Ref time (s)   SPECratio
perl         Interpreted string processing   2,118     0.75    0.40      637             9,777          15.3
bzip2        Block-sorting compression       2,389     0.85    0.40      817             9,650          11.8
gcc          GNU C Compiler                  1,050     1.72    0.40      724             8,050          11.1
mcf          Combinatorial optimization      336       10.00   0.40      1,345           9,120          6.8
go           Go game (AI)                    1,658     1.09    0.40      721             10,490         14.6
hmmer        Search gene sequence            2,783     0.80    0.40      890             9,330          10.5
sjeng        Chess game (AI)                 2,176     0.96    0.40      837             12,100         14.5
libquantum   Quantum computer simulation     1,623     1.61    0.40      1,047           20,720         19.8
h264avc      Video compression               3,102     0.80    0.40      993             22,130         22.3
omnetpp      Discrete event simulation       587       2.94    0.40      690             6,250          9.1
astar        Games/path finding              1,082     1.79    0.40      773             7,020          9.1
xalancbmk    XML parsing                     1,058     2.70    0.40      1,143           6,900          6.0
Geometric mean                                                                                          11.7
Ref time: provided by SPEC
SPECratio: Ref-time/Exec-time (The bigger the SPECratio, the faster the CPU is).
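The SPECratio and its geometric-mean summary can be recomputed from the table. This sketch takes the perl row's execution and reference times and the SPECratio column as printed; the function names are illustrative assumptions, and because the inputs are the rounded table values the result only approximately matches the published summary.

```python
import math


def spec_ratio(ref_time_s: float, exec_time_s: float) -> float:
    """SPECratio = reference time / measured execution time."""
    return ref_time_s / exec_time_s


def geometric_mean(values: list) -> float:
    """n-th root of the product of n values, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# perl row of the table: 637 s execution time, 9,777 s reference time.
perl_ratio = spec_ratio(ref_time_s=9777, exec_time_s=637)

# SPECratio column of the table, in row order (rounded values as printed).
spec_ratios = [15.3, 11.8, 11.1, 6.8, 14.6, 10.5, 14.5, 19.8, 22.3, 9.1, 9.1, 6.0]
summary = geometric_mean(spec_ratios)
print(f"perl SPECratio: {perl_ratio:.1f}, geometric mean: {summary:.1f}")
```

Summing logarithms instead of multiplying the ratios directly avoids building a very large intermediate product.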
§1.8 Fallacies and Pitfalls
Amdahl’s Law
Pitfall: Improving an aspect of a computer and expecting a proportional improvement in overall performance
This expectation is incorrect
Amdahl’s Law: $T_{\text{improved}} = \dfrac{T_{\text{affected}}}{\text{improvement factor}} + T_{\text{unaffected}}$
Example: multiply accounts for 80s/100s
How much improvement in multiply performance to
get 4× overall performance?
$\dfrac{100}{4} = \dfrac{80}{n} + 20 \Rightarrow n = 16$
How about 5×?
$\dfrac{100}{5} = \dfrac{80}{n} + 20$: can’t be done!
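Amdahl's Law and the multiply example can be sketched as two small functions: one that applies the law, and one that solves it for the improvement factor. The names are illustrative assumptions, and returning `None` for an unreachable target is a design choice of this sketch.

```python
def improved_time(t_affected: float, t_unaffected: float, factor: float) -> float:
    """Amdahl's Law: T_improved = T_affected / factor + T_unaffected."""
    return t_affected / factor + t_unaffected


def required_factor(t_affected: float, t_unaffected: float, target_time: float):
    """Improvement factor needed to reach target_time, or None if no
    finite factor can do it (the unaffected time alone is too large)."""
    remaining = target_time - t_unaffected
    if remaining <= 0:
        return None
    return t_affected / remaining


# Multiply accounts for 80 s of a 100 s run, so 20 s is unaffected.
factor_for_4x = required_factor(80.0, 20.0, target_time=100.0 / 4)
factor_for_5x = required_factor(80.0, 20.0, target_time=100.0 / 5)
print("4x overall needs factor:", factor_for_4x)
print("5x overall needs factor:", factor_for_5x)
```

A 5× overall speedup needs a total time of 20 s, which the unaffected 20 s already uses up, so no improvement to multiply alone can reach it.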
Example
Suppose we enhance a machine making all floating-point
instructions run five times faster. If the execution time of
some benchmark before the floating-point enhancement is 10
sec, what will the speedup be if half of the 10 sec is spent
executing floating-point instructions?
We are looking for a benchmark to show off the new floating-
point unit described above, and want the overall benchmark
to show a speedup of 2. One benchmark we are considering
runs for 100 sec with the old floating-point hardware. How
much of the execution time would floating-point instructions
have to account for in this program in order to yield the desired
speedup on this benchmark?
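Both questions follow from Amdahl's Law, with the floating-point time as the affected part. The sketch below is one possible way to work them; the function names are illustrative assumptions.

```python
def speedup(total_s: float, fp_s: float, fp_factor: float) -> float:
    """Overall speedup when only the floating-point time is divided by fp_factor."""
    new_time = (total_s - fp_s) + fp_s / fp_factor
    return total_s / new_time


def fp_time_for_speedup(total_s: float, fp_factor: float, target: float) -> float:
    """Floating-point time needed in the original run to hit the target speedup.

    Solves total / target = (total - fp) + fp / fp_factor for fp.
    """
    return (total_s - total_s / target) / (1.0 - 1.0 / fp_factor)


# Question 1: 10 s benchmark, half of it floating point, FP runs 5x faster.
q1 = speedup(total_s=10.0, fp_s=5.0, fp_factor=5.0)

# Question 2: 100 s benchmark, want an overall speedup of 2.
q2 = fp_time_for_speedup(total_s=100.0, fp_factor=5.0, target=2.0)
print(f"Q1 speedup: {q1:.2f}, Q2 floating-point time: {q2:.1f} s")
```

For the first question the new time is 5/5 + 5 = 6 s, a speedup of 10/6, or about 1.67. For the second, floating-point instructions would have to account for 62.5 s of the original 100 s.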
Fallacy: Low Power at Idle
Look back at i7 power benchmark
At 100% load: 258W
At 50% load: 170W (66%)
At 10% load: 121W (47%)
Google data center
Mostly operates at 10% – 50% load
At 100% load less than 1% of the time
Consider designing processors to make
power proportional to load
Pitfall: MIPS as a Performance Metric
MIPS: Millions of Instructions Per Second
Doesn’t account for
Differences in ISAs between computers
Differences in complexity between instructions
Even with the same ISA, different programs on the same CPU have different MIPS ratings
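$\text{MIPS} = \dfrac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \dfrac{\text{Instruction count}}{\dfrac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^6} = \dfrac{\text{Clock rate}}{\text{CPI} \times 10^6}$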
CPI varies between programs on a given CPU
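A short sketch shows why MIPS depends on the program as well as the CPU. The 4 GHz clock rate, the two CPI values, and the instruction count below are hypothetical numbers chosen for illustration; the function names are assumptions too.

```python
def mips_from_rate(clock_rate_hz: float, cpi: float) -> float:
    """MIPS = Clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


def mips_from_time(instruction_count: float, exec_time_s: float) -> float:
    """MIPS = Instruction count / (Execution time x 10^6)."""
    return instruction_count / (exec_time_s * 1e6)


clock_rate_hz = 4.0e9  # hypothetical CPU

# Two hypothetical programs on the same CPU, with different average CPI.
mips_program_1 = mips_from_rate(clock_rate_hz, cpi=1.0)
mips_program_2 = mips_from_rate(clock_rate_hz, cpi=2.0)

# Cross-check the two forms of the formula for program 2, using a
# hypothetical instruction count of 10^9.
instruction_count = 1.0e9
exec_time_s = instruction_count * 2.0 / clock_rate_hz
mips_check = mips_from_time(instruction_count, exec_time_s)
print(mips_program_1, mips_program_2, mips_check)
```

The same CPU rates twice as many MIPS on the first program as on the second, even though nothing about the hardware changed.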
§1.9 Concluding Remarks
Concluding Remarks
Cost/performance is improving
Due to underlying technology development
Hierarchical layers of abstraction
In both hardware and software
Instruction set architecture
The hardware/software interface
Execution time
The best performance measure
Power is a limiting factor
Use parallelism to improve performance