COMPUTER ORGANIZATION AND DESIGN
Chapter 1
Computer Abstractions
and Technology
Lecture slides are adapted from slides provided with the textbook
Computer Organization and Design by David A. Patterson and John L. Hennessy
(publisher: Morgan Kaufmann)
The Computer Revolution
◼ Progress in computer technology
◼ Underpinned by Moore’s Law: the number of transistors per chip doubles every two years
◼ Makes novel applications feasible
◼ Computers in automobiles
◼ Cell phones
◼ Human genome project
◼ World Wide Web
◼ Search Engines
◼ Computers are pervasive
Chapter 1 — Computer Abstractions and Technology — 2
Classes of Computers
◼ Personal computers
◼ General purpose, variety of software
◼ Subject to cost/performance tradeoff
◼ Server computers
◼ Network based
◼ High capacity, performance, reliability
◼ Range from small servers to building-sized installations
Classes of Computers
◼ Supercomputers
◼ High-end scientific and engineering
calculations
◼ Highest capability but represent a small
fraction of the overall computer market
◼ Embedded computers
◼ Hidden as components of systems
◼ Stringent power/performance/cost constraints
The PostPC Era
◼ Personal Mobile Device (PMD)
◼ Battery operated
◼ Connects to the Internet
◼ Hundreds of dollars
◼ Smart phones, tablets, electronic glasses
◼ Cloud computing
◼ Warehouse Scale Computers (WSC)
◼ Software as a Service (SaaS) (web search, social
networking)
◼ A portion of the software runs on a PMD and a
portion runs in the Cloud
◼ Amazon and Google
Cloud Computing
Cloud computing refers to
(1) a large collection of servers that
provide services over the Internet, and
(2) a dynamically varying number of
servers made available as a utility.
SaaS: a portion of the code runs on a PMD
and a portion runs in the Cloud.
What You Will Learn
◼ How programs are translated into the
machine language
◼ And how the hardware executes them
◼ The hardware/software interface
◼ What determines program performance
◼ And how it can be improved
◼ How hardware designers improve
performance
◼ What parallel processing is
Understanding Performance
◼ Algorithm
◼ Determines number of operations executed
◼ Programming language, compiler, architecture
◼ Determine number of machine instructions executed
per operation
◼ Processor and memory system
◼ Determine how fast instructions are executed
◼ I/O system (including OS)
◼ Determines how fast I/O operations are executed
Eight Great Ideas
◼ Design for Moore’s Law
◼ Use abstraction to simplify design
◼ Make the common case fast
◼ Performance via parallelism
◼ Performance via pipelining
◼ Performance via prediction
◼ Hierarchy of memories
◼ Dependability via redundancy
Below Your Program
◼ Application software
◼ Written in high-level language
◼ System software
◼ Compiler: translates HLL code to
machine code
◼ Operating System: service code
◼ Handling input/output
◼ Managing memory and storage
◼ Scheduling tasks & sharing resources
◼ Hardware
◼ Processor, memory, I/O controllers
Levels of Program Code
◼ High-level language
◼ Level of abstraction closer
to problem domain
◼ Provides for productivity
and portability
◼ Assembly language
◼ Textual representation of
instructions
◼ Hardware representation
◼ Binary digits (bits)
◼ Encoded instructions and
data
Components of a Computer
The BIG Picture
◼ Same components for all kinds of computer
◼ Desktop, server, embedded
◼ Input/output includes
◼ User-interface devices
◼ Display, keyboard, mouse
◼ Storage devices
◼ Hard disk, CD/DVD, flash
◼ Network adapters
◼ For communicating with
other computers
Touchscreen
◼ PostPC device
◼ Supersedes keyboard
and mouse
◼ Resistive and
Capacitive types
◼ Most tablets, smart
phones use capacitive
◼ Capacitive allows
multiple touches
simultaneously
Through the Looking Glass
◼ LCD screen: picture elements (pixels)
◼ Mirrors content of frame buffer memory
Opening the Box
Capacitive multitouch LCD screen
3.8 V, 25 Watt-hour battery
Computer board
Inside the Processor (CPU)
◼ Datapath: performs operations on data
◼ Control: sequences datapath, memory, ...
◼ Cache memory
◼ Small fast SRAM memory for immediate
access to data
Inside the Processor
◼ Apple A5
Abstractions
The BIG Picture
◼ Abstraction helps us deal with complexity
◼ Hide lower-level detail
◼ Instruction set architecture (ISA)
◼ The hardware/software interface
◼ Application binary interface
◼ The ISA plus system software interface
◼ Implementation
◼ The details underlying an interface
A Safe Place for Data
◼ Volatile main memory
◼ Loses instructions and data when power is off
◼ Non-volatile secondary memory
◼ Magnetic disk
◼ Flash memory
◼ Optical disk (CDROM, DVD)
Networks
◼ Communication, resource sharing,
nonlocal access
◼ Local area network (LAN): Ethernet
◼ Wide area network (WAN): the Internet
◼ Wireless network: WiFi, Bluetooth
§1.5 Technologies for Building Processors and Memory
Technology Trends
◼ Electronics
technology
continues to evolve
◼ Increased capacity
and performance
◼ Reduced cost
[Figure: DRAM capacity]

Year   Technology                    Relative performance/cost
1951   Vacuum tube                   1
1965   Transistor                    35
1975   Integrated circuit (IC)       900
1995   Very large scale IC (VLSI)    2,400,000
2013   Ultra large scale IC          250,000,000,000
Semiconductor Technology
◼ Silicon: semiconductor
◼ Add materials to transform properties:
◼ Conductors
◼ Insulators
◼ Switch
Manufacturing ICs
◼ Yield: proportion of working dies per wafer
Intel Core i7 Wafer
◼ 300mm wafer, 280 chips, 32nm technology
◼ Each chip is 20.7 x 10.5 mm
Integrated Circuit Cost
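$$\text{Cost per die} = \frac{\text{Cost per wafer}}{\text{Dies per wafer} \times \text{Yield}}$$
$$\text{Dies per wafer} \approx \frac{\text{Wafer area}}{\text{Die area}}$$
$$\text{Yield} = \frac{1}{\left(1 + \text{Defects per area} \times \text{Die area}/2\right)^{2}}$$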
◼ Nonlinear relation to area and defect rate
◼ Wafer cost and area are fixed
◼ Defect rate determined by manufacturing process
◼ Die area determined by architecture and circuit design
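The cost formulas above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the wafer cost and defect density are hypothetical values, not figures from the slides. Only the 300mm wafer and the 20.7 x 10.5 mm die size come from the Core i7 wafer slide.

```python
import math


def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> float:
    """Approximate dies per wafer as wafer area / die area (ignores edge loss)."""
    wafer_area_mm2 = math.pi * (wafer_diameter_mm / 2) ** 2
    return wafer_area_mm2 / die_area_mm2


def die_yield(defects_per_mm2: float, die_area_mm2: float) -> float:
    """Yield = 1 / (1 + defects per area * die area / 2)^2."""
    return 1.0 / (1.0 + defects_per_mm2 * die_area_mm2 / 2.0) ** 2


def cost_per_die(wafer_cost: float, wafer_diameter_mm: float,
                 die_area_mm2: float, defects_per_mm2: float) -> float:
    """Cost per die = cost per wafer / (dies per wafer * yield)."""
    dies = dies_per_wafer(wafer_diameter_mm, die_area_mm2)
    return wafer_cost / (dies * die_yield(defects_per_mm2, die_area_mm2))


if __name__ == "__main__":
    die_area = 20.7 * 10.5     # mm^2, die size from the Core i7 wafer slide
    wafer_cost = 5000.0        # hypothetical cost per wafer
    defect_density = 0.001     # hypothetical defects per mm^2
    print(f"dies per wafer (approx.): {dies_per_wafer(300, die_area):.0f}")
    print(f"yield: {die_yield(defect_density, die_area):.3f}")
    print(f"cost per die: {cost_per_die(wafer_cost, 300, die_area, defect_density):.2f}")
```

The area-ratio estimate comes out higher than the 280 chips quoted for the Core i7 wafer, presumably because the approximation ignores partial dies lost at the wafer edge.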
§1.6 Performance
Defining Performance
◼ Which airplane has the best performance?
[Figure: four bar charts comparing the Boeing 777, Boeing 747, BAC/Sud Concorde, and Douglas DC-8-50 by Passenger Capacity, Cruising Range (miles), Cruising Speed (mph), and Passengers x mph]
Response Time and Throughput
◼ Response time
◼ How long it takes to do a task
◼ Throughput
◼ Total work done per unit time
◼ e.g., tasks/transactions/… per hour
◼ How are response time and throughput affected
by
◼ Replacing the processor with a faster version?
◼ Adding more processors?
◼ We’ll focus on response time for now…
Relative Performance
◼ Define Performance = 1/Execution Time
◼ “X is n times faster than Y”
$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$
◼ Example: time taken to run a program
◼ 10s on A, 15s on B
◼ $\text{Execution Time}_B / \text{Execution Time}_A = 15\,\text{s} / 10\,\text{s} = 1.5$
◼ So A is 1.5 times faster than B
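The same ratio can be checked with a one-line function. A minimal sketch; the function name `times_faster` is chosen here for illustration only.

```python
def times_faster(exec_time_x: float, exec_time_y: float) -> float:
    """Return n such that X is n times faster than Y (n = time_Y / time_X)."""
    return exec_time_y / exec_time_x


# Program takes 10 s on A and 15 s on B
print(times_faster(10.0, 15.0))  # → 1.5
```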
Measuring Execution Time
◼ Elapsed time
◼ Total response time, including all aspects
◼ Processing, I/O, OS overhead, idle time
◼ Determines system performance
◼ CPU time
◼ Time spent processing a given job
◼ Discounts I/O time, other jobs’ shares
◼ Comprises user CPU time and system CPU
time
◼ Different programs are affected differently by
CPU and system performance
CPU Clocking
◼ Operation of digital hardware governed by a
constant-rate clock
[Figure: clock timing diagram showing the clock period, clock cycles, data transfer and computation, and state update]
◼ Clock period: duration of a clock cycle
◼ e.g., 250ps = 0.25ns = 250×10⁻¹² s
◼ Clock frequency (rate): cycles per second
◼ e.g., 4.0GHz = 4000MHz = 4.0×10⁹ Hz
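Clock period and clock rate are reciprocals, so the two examples above describe the same clock. A small sketch of the conversion, with illustrative function names:

```python
def period_from_rate(clock_rate_hz: float) -> float:
    """Clock period in seconds for a given clock rate in Hz."""
    return 1.0 / clock_rate_hz


def rate_from_period(clock_period_s: float) -> float:
    """Clock rate in Hz for a given clock period in seconds."""
    return 1.0 / clock_period_s


print(period_from_rate(4.0e9))    # period of a 4.0 GHz clock, in seconds
print(rate_from_period(250e-12))  # rate of a 250 ps clock, in Hz
```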
CPU Time
$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$
◼ Performance improved by
◼ Reducing number of clock cycles
◼ Increasing clock rate
◼ Hardware designer must often trade off clock
rate against cycle count
CPU Time Example
◼ Computer A: 2GHz clock, 10s CPU time
◼ Designing Computer B
◼ Aim for 6s CPU time
◼ Can do faster clock, but causes 1.2 × clock cycles
◼ How fast must Computer B clock be?
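$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}$$
$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^{9}$$
$$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^{9}}{6\,\text{s}} = \frac{24 \times 10^{9}}{6\,\text{s}} = 4\,\text{GHz}$$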
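The worked example can be reproduced numerically. The sketch below follows the same two steps (cycles on A, then the rate B needs); the function and parameter names are illustrative, not from the textbook.

```python
def required_clock_rate(cpu_time_a_s: float, clock_rate_a_hz: float,
                        cycle_factor: float, target_time_b_s: float) -> float:
    """Clock rate B needs to hit its target CPU time.

    cycle_factor is how many times more clock cycles B needs than A.
    """
    cycles_a = cpu_time_a_s * clock_rate_a_hz   # Clock Cycles = CPU Time x Clock Rate
    cycles_b = cycle_factor * cycles_a
    return cycles_b / target_time_b_s           # Clock Rate = Clock Cycles / CPU Time


rate_b = required_clock_rate(10.0, 2.0e9, 1.2, 6.0)
print(f"Computer B needs about {rate_b / 1e9:.1f} GHz")
```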
Instruction Count and CPI
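$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$
$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$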
◼ Instruction Count for a program
◼ Determined by program, ISA and compiler
◼ Average cycles per instruction
◼ Determined by CPU hardware
◼ If different instructions have different CPI
◼ Average CPI affected by instruction mix
CPI Example
◼ Computer A: Cycle Time = 250ps, CPI = 2.0
◼ Computer B: Cycle Time = 500ps, CPI = 1.2
◼ Same ISA
◼ Which is faster, and by how much?
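$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$$
$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$
A is faster…
$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$$
…by this much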
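A short numerical check of this comparison, as a sketch. The instruction count `I` cancels in the ratio, so the value used here (one million) is an arbitrary, hypothetical choice.

```python
def cpu_time(instruction_count: float, cpi: float, cycle_time_s: float) -> float:
    """CPU Time = Instruction Count x CPI x Clock Cycle Time."""
    return instruction_count * cpi * cycle_time_s


I = 1_000_000                          # hypothetical instruction count (same ISA, same program)
time_a = cpu_time(I, 2.0, 250e-12)     # Computer A: CPI 2.0, 250 ps cycle
time_b = cpu_time(I, 1.2, 500e-12)     # Computer B: CPI 1.2, 500 ps cycle

print(f"CPU time A: {time_a:.6e} s")
print(f"CPU time B: {time_b:.6e} s")
print(f"A is faster by a factor of {time_b / time_a:.2f}")
```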
CPI in More Detail
◼ If different instruction classes take different
numbers of cycles
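$$\text{Clock Cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \text{Instruction Count}_i\right)$$
◼ Weighted average CPI
$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}}\right)$$
where $\text{Instruction Count}_i / \text{Instruction Count}$ is the relative frequency of class $i$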
CPI Example
◼ Alternative compiled code sequences using
instructions in classes A, B, C
Class               A   B   C
CPI for class       1   2   3
IC in sequence 1    2   1   2    (2+1+2 = 5 instructions)
IC in sequence 2    4   1   1    (4+1+1 = 6 instructions)
◼ Sequence 1: IC = 5
◼ Clock Cycles = 2×1 + 1×2 + 2×3 = 10
◼ Avg. CPI = 10/5 = 2.0
◼ Sequence 2: IC = 6
◼ Clock Cycles = 4×1 + 1×2 + 1×3 = 9
◼ Avg. CPI = 9/6 = 1.5
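The two code sequences can be compared with a small weighted-CPI calculation. This is a sketch using the class CPIs and instruction counts from the slide; the dictionary layout and function names are choices made for this example.

```python
# Cycles per instruction for each instruction class (from the slide)
CPI_PER_CLASS = {"A": 1, "B": 2, "C": 3}


def clock_cycles(instruction_mix: dict) -> int:
    """Sum of CPI_i x Instruction Count_i over all classes."""
    return sum(CPI_PER_CLASS[cls] * count for cls, count in instruction_mix.items())


def average_cpi(instruction_mix: dict) -> float:
    """Weighted average CPI = Clock Cycles / Instruction Count."""
    return clock_cycles(instruction_mix) / sum(instruction_mix.values())


sequence_1 = {"A": 2, "B": 1, "C": 2}
sequence_2 = {"A": 4, "B": 1, "C": 1}

for name, mix in (("Sequence 1", sequence_1), ("Sequence 2", sequence_2)):
    print(name, "cycles:", clock_cycles(mix), "avg CPI:", average_cpi(mix))
```

Sequence 2 executes more instructions but needs fewer clock cycles, which is why instruction count alone does not determine performance.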
Performance Summary
The BIG Picture
$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$
◼ Performance depends on
◼ Algorithm: affects IC, possibly CPI
◼ Programming language: affects IC, CPI
◼ Compiler: affects IC, CPI
◼ Instruction set architecture: affects IC, CPI, Tc
§1.7 The Power Wall
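Power Trends
[Figure: power trends plot; annotations: "Simpler pipeline", "More complex pipeline", "Core 2"]
◼ In CMOS IC technology, the primary energy consumption is dynamic energy, consumed when transistors switch on → off and off → on, at a rate controlled by the clock frequency
$$\text{Dynamic power} = 0.5 \times \text{Capacitive load} \times \text{Voltage}^{2} \times \text{Frequency}$$
◼ Annotated changes: power ×30, voltage 5V → 1V, frequency ×1000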
Reducing Power
◼ Suppose a new CPU has
◼ 85% of capacitive load of old CPU
◼ 15% voltage and 15% frequency reduction
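$$\frac{P_{\text{new}}}{P_{\text{old}}} = \frac{C_{\text{old}} \times 0.85 \times (V_{\text{old}} \times 0.85)^{2} \times F_{\text{old}} \times 0.85}{C_{\text{old}} \times V_{\text{old}}^{2} \times F_{\text{old}}} = 0.85^{4} = 0.52$$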
◼ The power wall
◼ We can’t reduce voltage further
◼ We can’t remove more heat
◼ How else can we improve performance?
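The power reduction example can be verified with the dynamic power formula. A minimal sketch; the function names are illustrative, and the absolute values passed to `dynamic_power` are placeholders because only the ratio matters.

```python
def dynamic_power(capacitive_load: float, voltage: float, frequency: float) -> float:
    """Dynamic power = 0.5 x capacitive load x voltage^2 x frequency."""
    return 0.5 * capacitive_load * voltage ** 2 * frequency


def power_ratio(cap_scale: float, volt_scale: float, freq_scale: float) -> float:
    """P_new / P_old when load, voltage, and frequency are each scaled."""
    return cap_scale * volt_scale ** 2 * freq_scale


# New CPU: 85% of the capacitive load, 15% lower voltage, 15% lower frequency
ratio = power_ratio(0.85, 0.85, 0.85)
print(f"P_new / P_old = {ratio:.2f}")
```

Voltage enters squared, so the three 15% reductions combine to 0.85 to the fourth power, roughly half the original power.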
§1.8 The Sea Change: The Switch to Multiprocessors
Uniprocessor Performance
Constrained by power, instruction-level parallelism,
memory latency
Multiprocessors
◼ Multicore microprocessors
◼ More than one processor per chip
◼ Requires explicitly parallel programming
◼ Compare with instruction level parallelism
◼ Hardware executes multiple instructions at once
◼ Hidden from the programmer
◼ Hard to do
◼ Programming for performance
◼ Load balancing
◼ Optimizing communication and synchronization
SPEC CPU Benchmark
◼ Programs used to measure performance
◼ Supposedly typical of actual workload
◼ Standard Performance Evaluation Corp (SPEC)
◼ Develops benchmarks for CPU, I/O, Web, …
◼ SPEC CPU2006
◼ Elapsed time to execute a selection of programs
◼ Negligible I/O, so focuses on CPU performance
◼ Normalize relative to reference machine
◼ Summarize as geometric mean of performance ratios
◼ CINT2006 (integer) and CFP2006 (floating-point)
$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$
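The geometric mean summary can be sketched as follows. The ratios in the example are hypothetical placeholders, not SPEC results, and computing the mean through logarithms is an implementation choice made here to avoid very large intermediate products.

```python
import math


def geometric_mean(ratios: list) -> float:
    """n-th root of the product of n execution time ratios."""
    if not ratios:
        raise ValueError("need at least one ratio")
    # exp(mean(log r)) equals the n-th root of the product for positive ratios
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical ratios (reference time / measured time) for four programs
example_ratios = [10.0, 20.0, 15.0, 30.0]
print(f"summary score: {geometric_mean(example_ratios):.2f}")
```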
CINT2006 for Intel Core i7 920
SPEC Power Benchmark
◼ Power consumption of server at different
workload levels
◼ Performance: ssj_ops
◼ Power: Watts (Joules/sec)
$$\text{Overall ssj\_ops per Watt} = \left(\sum_{i=0}^{10} \text{ssj\_ops}_i\right) \Big/ \left(\sum_{i=0}^{10} \text{power}_i\right)$$
ssj_ops/watt (server side Java operations per second per watt)
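The overall metric divides total operations by total power across the measured load levels, rather than averaging the per-level ratios. A sketch with made-up measurements (the numbers below are hypothetical, not benchmark results):

```python
def overall_ssj_ops_per_watt(ssj_ops: list, power_watts: list) -> float:
    """Sum of ssj_ops over all load levels divided by the sum of power readings."""
    if len(ssj_ops) != len(power_watts):
        raise ValueError("need one power reading per load level")
    return sum(ssj_ops) / sum(power_watts)


# Hypothetical readings at three load levels: 100%, 50%, and active idle
ops = [100_000.0, 50_000.0, 0.0]
watts = [200.0, 150.0, 100.0]
print(f"overall ssj_ops per Watt: {overall_ssj_ops_per_watt(ops, watts):.1f}")
```

The idle level contributes no operations but still adds power to the denominator, so idle consumption lowers the overall score.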
SPECpower_ssj2008 for Xeon X5650
Pitfall: Amdahl’s Law
◼ Improving an aspect of a computer and
expecting a proportional improvement in
overall performance
$$T_{\text{improved}} = \frac{T_{\text{affected}}}{\text{improvement factor}} + T_{\text{unaffected}}$$
◼ Example: multiply accounts for 80s/100s
◼ How much improvement in multiply performance to
get 5× overall?
$$20 = \frac{80}{n} + 20$$
◼ Can’t be done!
◼ Corollary: make the common case fast
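The multiply example can be explored numerically. This sketch uses illustrative function names; `required_factor` returns `None` when the target time is not reachable, which is the case for the 5× goal above.

```python
def improved_time(t_affected: float, t_unaffected: float, factor: float) -> float:
    """T_improved = T_affected / improvement factor + T_unaffected."""
    return t_affected / factor + t_unaffected


def required_factor(t_affected: float, t_unaffected: float, target_time: float):
    """Improvement factor needed to reach target_time, or None if impossible."""
    remaining = target_time - t_unaffected
    if remaining <= 0:
        return None  # the unaffected part alone already uses up the target time
    return t_affected / remaining


# Multiply takes 80 s of a 100 s run; 5x overall means a 20 s target
print(required_factor(80.0, 20.0, 100.0 / 5))   # → None
print(required_factor(80.0, 20.0, 100.0 / 2))   # factor needed for 2x overall
print(improved_time(80.0, 20.0, 1000.0))        # even 1000x multiply leaves over 20 s
```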
Fallacy: Low Power at Idle
◼ Look back at the power benchmark (SPECpower_ssj2008 for Xeon X5650)
◼ At 100% load: 258W
◼ At 50% load: 170W (66%)
◼ At 10% load: 121W (47%)
◼ Google data center
◼ Mostly operates at 10% – 50% load
◼ At 100% load less than 1% of the time
◼ Consider designing processors to make
power proportional to load
Pitfall: MIPS as a Performance Metric
◼ MIPS: Millions of Instructions Per Second
◼ Doesn’t account for
◼ Differences in ISAs between computers
◼ Differences in complexity between instructions
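$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^{6}} = \frac{\text{Instruction count}}{\dfrac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^{6}} = \frac{\text{Clock rate}}{\text{CPI} \times 10^{6}}$$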
◼ CPI varies between programs on a given CPU
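Both forms of the MIPS formula can be checked against each other. In this sketch the clock rate, CPI values, and instruction count are hypothetical; the point is that the same CPU reports different MIPS ratings when the CPI of the program changes.

```python
def mips_from_time(instruction_count: float, execution_time_s: float) -> float:
    """MIPS = instruction count / (execution time x 10^6)."""
    return instruction_count / (execution_time_s * 1e6)


def mips_from_cpi(clock_rate_hz: float, cpi: float) -> float:
    """MIPS = clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


clock_rate = 4.0e9                 # hypothetical 4 GHz CPU
for cpi in (1.0, 2.0):             # two hypothetical programs with different CPI
    print(f"CPI {cpi}: {mips_from_cpi(clock_rate, cpi):.0f} MIPS")
```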
Concluding Remarks
◼ Cost/performance is improving
◼ Due to underlying technology development
◼ Hierarchical layers of abstraction
◼ In both hardware and software
◼ Instruction set architecture
◼ The hardware/software interface
◼ Execution time: the best performance
measure
◼ Power is a limiting factor
◼ Use parallelism to improve performance