Computer Abstractions and Technology
Computer Organization
502044
Acknowledgement
This slide show is intended for use in class and is not a complete document.
Students need to refer to the textbook for further lessons and exercises.
Students may download and store the lecture slides for reference purposes;
do not redistribute them or use them for purposes outside of the course.
[2] David A. Patterson, John L. Hennessy (2014), Computer Organization and
Design: The Hardware/Software Interface, 5th edition, Elsevier, Amsterdam.
[3] John L. Hennessy, David A. Patterson (2012), Computer Architecture: A
Quantitative Approach, 5th edition, Elsevier, Amsterdam.
📧 trantrungtin.tdtu.edu.vn
2
Syllabus
● 6.1 Introduction
● 6.2 Great Ideas in Computer Architecture
● 6.3 Below Your Program
● 6.4 Under the Covers
● 6.5 Technologies for Building Processors and Memory
● 6.6 Performance
● 6.7 The Power Wall
● 6.8 The Switch from Uniprocessors to Multiprocessors
3
HOW DO THE PIECES FIT TOGETHER?
(Figure: layered stack of abstraction levels)
● Software
  ○ Application (IE, Excel, etc.)
  ○ Operating system (Windows XP)
  ○ Compiler, assembler
● Instruction set architecture
● Hardware
  ○ Processor, memory, I/O system
  ○ Datapath & control
  ○ Digital design
  ○ Transistors
● Computer architecture covers the ISA down through datapath and control; digital logic design covers the levels below
◼ Coordination of many levels of abstraction
◼ Under a rapidly changing set of forces
◼ Design, measurement, and evaluation
6.1 Introduction
● Computers are digital systems, implemented as personal computers, servers, and embedded computers.
● Binary numbers and codes suit digital circuits, which operate on LOW/HIGH signals.
● Decimal, binary, octal, and hexadecimal are the radixes used in computer science.
● Numbers are used for calculation; codes are used for transferring information.
● Registers and memory are the physical devices that store binary information.
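A minimal Python sketch of the radix idea above (the value 202 is an arbitrary example, not from the slides):

```python
# One value, four radixes: decimal, binary, octal, hexadecimal.
value = 202  # arbitrary example value, not from the slides

print(f"decimal:     {value}")      # 202
print(f"binary:      {value:b}")    # 11001010
print(f"octal:       {value:o}")    # 312
print(f"hexadecimal: {value:x}")    # ca

# Parsing goes the other way: int() accepts an explicit radix.
assert int("11001010", 2) == int("312", 8) == int("ca", 16) == 202
```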
6
The Computer Revolution
● Progress in computer technology
○ Underpinned by Moore’s Law
● Makes novel applications feasible
○ Computers in automobiles
○ Cell phones
○ Human genome project
○ World Wide Web
○ Search Engines
● Computers are pervasive
7
Classes of Computers
● Personal computers
○ General purpose, variety of software
○ Subject to cost/performance tradeoff
● Server computers
○ Network based
○ High capacity, performance, reliability
○ Range from small servers to building sized
8
Classes of Computers
● Supercomputers
○ High-end scientific and engineering calculations
○ Highest capability but represent a small fraction of the overall computer market
● Embedded computers
○ Hidden as components of systems
○ Stringent power/performance/cost constraints
9
The PostPC Era
10
The PostPC Era
● Personal Mobile Device (PMD)
○ Battery operated
○ Connects to the Internet
○ Hundreds of dollars
○ Smart phones, tablets, electronic glasses
● Cloud computing
○ Warehouse Scale Computers (WSC)
○ Software as a Service (SaaS)
○ A portion of the software runs on a PMD and a portion runs in the Cloud
○ Amazon and Google
11
What You Will Learn
● How programs are translated into the machine language
○ And how the hardware executes them
● The hardware/software interface
● What determines program performance
○ And how it can be improved
● How hardware designers improve performance
● What is parallel processing
12
Understanding Performance
● Algorithm
○ Determines number of operations executed
● Programming language, compiler, architecture
○ Determine number of machine instructions executed per operation
● Processor and memory system
○ Determine how fast instructions are executed
● I/O system (including OS)
○ Determines how fast I/O operations are executed
13
6.2 Eight Great Ideas in Computer Architecture
14
Some Great Ideas
● Design for Moore’s Law
● Use abstraction to simplify design
● Make the common case fast
● Performance via parallelism
● Performance via pipelining
● Performance via prediction
● Hierarchy of memories
● Dependability via redundancy
15
6.3 Below Your Program
16
Below Your Program
● Application software
○ Written in high-level language
● System software
○ Compiler: translates HLL code to machine code
○ Operating System: service code
■ Handling input/output
■ Managing memory and storage
■ Scheduling tasks & sharing resources
● Hardware
○ Processor, memory, I/O controllers
17
Levels of Program Code
● High-level language
○ Level of abstraction closer to problem domain
○ Provides for productivity and portability
● Assembly language
○ Textual representation of instructions
● Hardware representation
○ Binary digits (bits)
○ Encoded instructions and data
18
6.4 Under the Covers
19
Components of a Computer
The BIG Picture
● Same components for all kinds of computer
  ○ Desktop, server, embedded
● Input/output includes
  ○ User-interface devices
    ■ Display, keyboard, mouse
  ○ Storage devices
    ■ Hard disk, CD/DVD, flash
  ○ Network adapters
    ■ For communicating with other computers
20
Touchscreen
● PostPC device
● Supersedes keyboard and mouse
● Resistive and Capacitive types
○ Most tablets, smart phones use capacitive
○ Capacitive allows multiple touches simultaneously
21
Through the Looking Glass
● LCD screen: picture elements (pixels)
○ Mirrors content of frame buffer memory
22
Opening the Box
(Figure: opened device, showing the capacitive multitouch LCD screen, the 3.8 V, 25 watt-hour battery, and the computer board.)
23
Inside the Processor (CPU)
● Datapath: performs operations on data
● Control: sequences datapath, memory, ...
● Cache memory
○ Small fast SRAM memory for immediate access to data
24
Inside the Processor
● Apple A5
25
Abstractions
The BIG Picture
● Abstraction helps us deal with complexity
○ Hide lower-level detail
● Instruction set architecture (ISA)
○ The hardware/software interface
● Application binary interface
○ The ISA plus system software interface
● Implementation
○ The details underlying the interface
26
A Safe Place for Data
● Volatile main memory
○ Loses instructions and data when power is off
● Non-volatile secondary memory
○ Magnetic disk
○ Flash memory
○ Optical disk (CDROM, DVD)
27
Networks
● Communication, resource sharing, nonlocal access
● Local area network (LAN): Ethernet
● Wide area network (WAN): the Internet
● Wireless network: WiFi, Bluetooth
28
6.5 Technologies for Building Processors
and Memory
29
Technology Trends
● Electronics technology continues to evolve
○ Increased capacity and performance
○ Reduced cost
(Figure: DRAM capacity growth over time)

Year   Technology                    Relative performance/cost
1951   Vacuum tube                   1
1965   Transistor                    35
1975   Integrated circuit (IC)       900
1995   Very large scale IC (VLSI)    2,400,000
2013   Ultra large scale IC          250,000,000,000
30
Semiconductor Technology
● Silicon: semiconductor
● Add materials to transform properties:
○ Conductors
○ Insulators
○ Switch
31
Manufacturing ICs
● Yield: proportion of working dies per wafer
32
Intel Core i7 Wafer
● 300mm wafer, 280 chips, 32nm technology
● Each chip is 20.7 x 10.5 mm
33
Integrated Circuit Cost
● Nonlinear relation to area and defect rate
  ○ Wafer cost and area are fixed
  ○ Defect rate determined by manufacturing process
  ○ Die area determined by architecture and circuit design

Cost per die = Cost per wafer / (Dies per wafer × Yield)
Dies per wafer ≈ Wafer area / Die area
Yield = 1 / (1 + Defects per area × Die area / 2)²
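These cost equations as a small Python sketch. The wafer cost and defect rate are assumed (hypothetical) values chosen only to exercise the formulas; the die size echoes the 20.7 × 10.5 mm Core i7 die from the next slide:

```python
# Die-cost model from the slides; wafer cost and defect rate are assumed.
wafer_cost = 5000.0       # dollars per wafer (hypothetical)
wafer_area = 70_686.0     # mm^2 for a 300 mm wafer (pi * 150^2)
die_area = 20.7 * 10.5    # mm^2, about 217
defects_per_area = 0.002  # defects per mm^2 (hypothetical)

dies_per_wafer = wafer_area / die_area  # simplified: ignores dies lost at the wafer edge
die_yield = 1.0 / (1.0 + defects_per_area * die_area / 2.0) ** 2
cost_per_die = wafer_cost / (dies_per_wafer * die_yield)

print(f"dies per wafer: {dies_per_wafer:.0f}")  # ~325 (the slides count 280 after edge losses)
print(f"yield:          {die_yield:.2f}")       # ~0.67
print(f"cost per die:   ${cost_per_die:.2f}")   # ~$23
```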
34
6.6 Performance
35
Defining Performance
● Which airplane has the best performance?
(Figure: four bar charts comparing the Boeing 777, Boeing 747, BAC/Sud Concorde, and Douglas DC-8-50 on passenger capacity, cruising range (miles), cruising speed (mph), and passengers × mph. Which airplane is "best" depends on the metric chosen.)
36
Response Time and Throughput
● Response time
○ How long it takes to do a task
● Throughput
○ Total work done per unit time
■ e.g., tasks/transactions/… per hour
● How are response time and throughput affected by
○ Replacing the processor with a faster version?
○ Adding more processors?
● We’ll focus on response time for now…
37
Relative Performance
● Define Performance = 1/Execution Time
● “X is n times faster than Y”
Performance_X / Performance_Y = Execution time_Y / Execution time_X = n
◼ Example: time taken to run a program
◼ 10s on A, 15s on B
◼ Execution Time_B / Execution Time_A = 15s / 10s = 1.5
◼ So A is 1.5 times faster than B
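The same example as a one-line Python check:

```python
# "X is n times faster than Y" when Performance_X / Performance_Y
# = Execution time_Y / Execution time_X = n.
time_a = 10.0  # seconds on A
time_b = 15.0  # seconds on B

n = time_b / time_a
print(f"A is {n:.1f} times faster than B")  # 1.5
```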
38
Measuring Execution Time
● Elapsed time
○ Total response time, including all aspects
■ Processing, I/O, OS overhead, idle time
○ Determines system performance
● CPU time
○ Time spent processing a given job
■ Discounts I/O time, other jobs’ shares
○ Comprises user CPU time and system CPU time
○ Different programs are affected differently by CPU and system performance
39
CPU Clocking
● Operation of digital hardware governed by a constant-rate clock
(Figure: clock waveform. Each clock period covers one round of data transfer and computation, followed by a state update.)
◼ Clock period: duration of a clock cycle
◼ e.g., 250 ps = 0.25 ns = 250×10⁻¹² s
◼ Clock frequency (rate): cycles per second
◼ e.g., 4.0 GHz = 4000 MHz = 4.0×10⁹ Hz
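The reciprocal relationship, sketched in Python:

```python
# Clock period and clock rate are reciprocals.
period_s = 250e-12                 # 250 ps
rate_hz = 1.0 / period_s
print(f"{rate_hz / 1e9:.1f} GHz")  # 4.0 GHz
```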
40
CPU Time
● Performance improved by
○ Reducing number of clock cycles
○ Increasing clock rate
○ Hardware designer must often trade off clock rate against cycle count
CPU Time = CPU Clock Cycles × Clock Cycle Time
         = CPU Clock Cycles / Clock Rate
41
CPU Time Example
● Computer A: 2GHz clock, 10s CPU time
● Designing Computer B
○ Aim for 6s CPU time
○ Can use a faster clock, but that causes 1.2 × as many clock cycles
● How fast must Computer B clock be?
Clock Rate_B = Clock Cycles_B / CPU Time_B = (1.2 × Clock Cycles_A) / 6s
Clock Cycles_A = CPU Time_A × Clock Rate_A = 10s × 2 GHz = 20×10⁹
Clock Rate_B = (1.2 × 20×10⁹) / 6s = (24×10⁹) / 6s = 4 GHz
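The same calculation in Python:

```python
# Find Computer B's required clock rate (worked example above).
time_a = 10.0       # seconds on Computer A
rate_a = 2e9        # 2 GHz
time_b = 6.0        # target seconds on Computer B
cycle_factor = 1.2  # the faster clock costs 1.2x as many cycles

cycles_a = time_a * rate_a                       # 20e9 cycles
rate_b = cycle_factor * cycles_a / time_b
print(f"Clock rate B = {rate_b / 1e9:.0f} GHz")  # 4 GHz
```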
42
Instruction Count and CPI
● Instruction Count for a program
○ Determined by program, ISA and compiler
● Average cycles per instruction
○ Determined by CPU hardware
○ If different instructions have different CPI
■ Average CPI affected by instruction mix
Clock Cycles = Instruction Count × Cycles per Instruction
CPU Time = Instruction Count × CPI × Clock Cycle Time
         = Instruction Count × CPI / Clock Rate
43
CPI Example
● Computer A: Cycle Time = 250ps, CPI = 2.0
● Computer B: Cycle Time = 500ps, CPI = 1.2
● Same ISA
● Which is faster, and by how much?
CPU Time_A = Instruction Count × CPI_A × Cycle Time_A = I × 2.0 × 250 ps = I × 500 ps  (A is faster…)
CPU Time_B = Instruction Count × CPI_B × Cycle Time_B = I × 1.2 × 500 ps = I × 600 ps
CPU Time_B / CPU Time_A = (I × 600 ps) / (I × 500 ps) = 1.2  (…by this much)
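In Python; the instruction count I cancels because both computers run the same program on the same ISA:

```python
# Same ISA, same program: compare CPI x cycle time per instruction.
cpi_a, cycle_a_ps = 2.0, 250.0
cpi_b, cycle_b_ps = 1.2, 500.0

per_instr_a = cpi_a * cycle_a_ps  # 500 ps
per_instr_b = cpi_b * cycle_b_ps  # 600 ps
print(f"A is faster by {per_instr_b / per_instr_a:.1f}x")  # 1.2x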
44
CPI in More Detail
● If different instruction classes take different numbers of cycles
Clock Cycles = Σ_i (CPI_i × Instruction Count_i), summed over instruction classes i = 1…n

◼ Weighted average CPI
CPI = Clock Cycles / Instruction Count = Σ_i CPI_i × (Instruction Count_i / Instruction Count)
where Instruction Count_i / Instruction Count is the relative frequency of class i
45
CPI Example
● Alternative compiled code sequences using instructions in classes A, B, C
Class             A   B   C
CPI for class     1   2   3
IC in sequence 1  2   1   2
IC in sequence 2  4   1   1

◼ Sequence 1: IC = 5
  Clock Cycles = 2×1 + 1×2 + 2×3 = 10
  Avg. CPI = 10/5 = 2.0
◼ Sequence 2: IC = 6
  Clock Cycles = 4×1 + 1×2 + 1×3 = 9
  Avg. CPI = 9/6 = 1.5
46
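The two sequences above as a short Python sketch:

```python
# Weighted-average CPI for the two code sequences above.
cpi = {"A": 1, "B": 2, "C": 3}
sequences = {
    "sequence 1": {"A": 2, "B": 1, "C": 2},
    "sequence 2": {"A": 4, "B": 1, "C": 1},
}

for name, counts in sequences.items():
    ic = sum(counts.values())
    cycles = sum(cpi[cls] * n for cls, n in counts.items())
    print(f"{name}: IC = {ic}, cycles = {cycles}, avg CPI = {cycles / ic:.1f}")
# sequence 1: IC = 5, cycles = 10, avg CPI = 2.0
# sequence 2: IC = 6, cycles = 9, avg CPI = 1.5
```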
Performance Summary
● Performance depends on
○ Algorithm: affects IC, possibly CPI
○ Programming language: affects IC, CPI
○ Compiler: affects IC, CPI
○ Instruction set architecture: affects IC, CPI, Tc
CPU Time = (Instructions / Program) × (Clock cycles / Instruction) × (Seconds / Clock cycle)
47
6.7 The Power Wall
48
Power Trends
● In CMOS IC technology
Power = Capacitive load × Voltage² × Frequency

(Figure: over three decades, clock rate rose roughly ×1000 while supply voltage fell from 5 V to 1 V, so power rose only about ×30.)
49
Reducing Power
● Suppose a new CPU has
○ 85% of capacitive load of old CPU
○ 15% voltage and 15% frequency reduction
P_new / P_old = (C_old × 0.85) × (V_old × 0.85)² × (F_old × 0.85) / (C_old × V_old² × F_old) = 0.85⁴ ≈ 0.52
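The same scaling computed in Python:

```python
# New CPU: 85% capacitance, 85% voltage, 85% frequency of the old one.
scale = 0.85
power_ratio = scale * scale**2 * scale       # C x V^2 x F, each scaled by 0.85
print(f"P_new / P_old = {power_ratio:.2f}")  # 0.52
```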
◼ The power wall
◼ We can’t reduce voltage further
◼ We can’t remove more heat
◼ How else can we improve performance?
50
6.8 The Switch from Uniprocessors to
Multiprocessors
51
Uniprocessor Performance
Constrained by power, instruction-level parallelism, memory latency
52
Multiprocessors
● Multicore microprocessors
○ More than one processor per chip
● Requires explicitly parallel programming
○ Compare with instruction level parallelism
■ Hardware executes multiple instructions at once
■ Hidden from the programmer
○ Hard to do
■ Programming for performance
■ Load balancing
■ Optimizing communication and synchronization
53
SPEC CPU Benchmark
● Programs used to measure performance
○ Supposedly typical of actual workload
● Standard Performance Evaluation Corp (SPEC)
○ Develops benchmarks for CPU, I/O, Web, …
● SPEC CPU2006
○ Elapsed time to execute a selection of programs
■ Negligible I/O, so focuses on CPU performance
○ Normalize relative to reference machine
○ Summarize as geometric mean of performance ratios
■ CINT2006 (integer) and CFP2006 (floating-point)
Geometric mean = ( ∏_{i=1…n} Execution time ratio_i )^(1/n)
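A sketch of the geometric-mean summary in Python; the SPECratio values below are made up for illustration:

```python
import math

# Geometric mean of n execution-time ratios (hypothetical SPECratios).
ratios = [12.3, 9.8, 15.1, 11.0, 13.7]

geo_mean = math.prod(ratios) ** (1.0 / len(ratios))
print(f"geometric mean: {geo_mean:.2f}")
```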
54
CINT2006 for Intel Core i7 920
55
SPEC Power Benchmark
● Power consumption of server at different workload levels
○ Performance: ssj_ops/sec
○ Power: Watts (Joules/sec)
Overall ssj_ops per watt = ( Σ_{i=0…10} ssj_ops_i ) / ( Σ_{i=0…10} power_i )
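A sketch of the metric in Python; the per-load-level values are hypothetical placeholders, not measured data:

```python
# Overall ssj_ops per watt: total ops over total power across the 11 target
# load levels (100%, 90%, ..., 0%). Both lists are made-up placeholders.
ssj_ops = [300e3, 270e3, 240e3, 210e3, 180e3, 150e3, 120e3, 90e3, 60e3, 30e3, 0]
power_w = [260, 245, 230, 215, 200, 185, 170, 155, 140, 125, 110]

overall = sum(ssj_ops) / sum(power_w)
print(f"overall ssj_ops per watt: {overall:.0f}")
```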
56
SPECpower_ssj2008 for Xeon X5650
57
Pitfall: Amdahl’s Law
● Improving an aspect of a computer and expecting a proportional
improvement in overall performance
T_improved = T_affected / improvement factor + T_unaffected

◼ Example: multiply accounts for 80s out of 100s total
◼ How much improvement in multiply performance to get 5× overall?
  20 = 80/n + 20 → Can't be done!
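A quick numerical check in Python: however large n gets, total time never drops below the unaffected 20s, so overall speedup stays below 5×:

```python
# Amdahl's Law: T_improved = T_affected / factor + T_unaffected.
t_affected, t_unaffected = 80.0, 20.0  # seconds: multiply vs everything else
t_original = t_affected + t_unaffected

for factor in (2, 10, 100, 1000):
    t_new = t_affected / factor + t_unaffected
    print(f"multiply {factor:>4}x faster -> {t_new:6.2f}s, overall {t_original / t_new:.2f}x")
# Overall speedup approaches, but never reaches, 100/20 = 5x.
```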
◼ Corollary: make the common case fast
58
Fallacy: Low Power at Idle
● Look back at i7 power benchmark
○ At 100% load: 258W
○ At 50% load: 170W (66%)
○ At 10% load: 121W (47%)
● Google data center
○ Mostly operates at 10% – 50% load
○ At 100% load less than 1% of the time
● Consider designing processors to make power proportional to load
59
Pitfall: MIPS as a Performance Metric
● MIPS: Millions of Instructions Per Second
○ Doesn’t account for
■ Differences in ISAs between computers
■ Differences in complexity between instructions
MIPS = Instruction count / (Execution time × 10⁶)
     = Instruction count / ((Instruction count × CPI / Clock rate) × 10⁶)
     = Clock rate / (CPI × 10⁶)
◼ CPI varies between programs on a given CPU
60
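A small Python illustration of why MIPS can mislead; all numbers are invented, and the two instruction counts stand in for the same program compiled to two different ISAs:

```python
# Why MIPS misleads: X posts the higher MIPS, yet Y finishes sooner.
clock_hz = 4e9

machines = {
    "X": (8e9, 1.0),  # (instruction count, CPI): many cheap instructions
    "Y": (2e9, 2.5),  # fewer, more complex instructions
}

for name, (count, cpi) in machines.items():
    exec_time = count * cpi / clock_hz
    mips = count / (exec_time * 1e6)
    print(f"{name}: time = {exec_time:.2f}s, MIPS = {mips:.0f}")
# X: time = 2.00s, MIPS = 4000  (higher MIPS)
# Y: time = 1.25s, MIPS = 1600  (but faster)
```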