COMPUTER ARCHITECTURE
Chapter 1: Technology & Performance evaluation
Computer Architecture – CSE – HCMIU
TECHNOLOGY REVIEW
The computer revolution
• The third revolution along with agriculture and industry
• Progress in computer technology
– Underpinned by Moore’s Law
• Makes novel applications feasible
– Computers in automobiles
– Cell phones
– Human genome project
– World Wide Web
– Search Engines
[Figure: abacus]
• Computers are pervasive
Computer Architecture (c) Cuong Pham-Quoc@HCMUT 3
Moore’s Law
Gordon Moore
Intel co-founder
The number of transistors integrated on a chip doubles roughly every 18-24 months (1975)
Intel processors
• As of Q1/2023
– Raptor Lake: 13th generation Core
– Intel 7 process (10 nm-class technology)
History…
• The first computer in the world
• ENIAC: Electronic Numerical Integrator and Computer
• Facts about ENIAC:
– 30+ tons
– 1,500+ square feet (140 square meters)
– 18,000+ vacuum tubes
– 140+ kW power
– 5,000+ additions per second
A Brief History of Computers
• The first generation
– Vacuum tubes
– 1946 – 1955
• The second generation
– Transistors
– 1955 – 1965
• The third generation
– Integrated circuits
– 1965 – 1980
• The current generation
– Personal computers
– 1980 – …
• What’s next?
– Quantum computers?
– Memristors?
Classes of Computers
• Personal computers
– General purpose, variety of software
– Subject to cost/performance tradeoff
• Server computers
– Network based
– High capacity, performance, reliability
– Range from small servers to building-sized installations
• Supercomputers
– High-end scientific and engineering calculations
– Highest capability but represent a small fraction of the overall computer market
• Embedded computers
– Hidden as components of systems
– Stringent power/performance/cost constraints
Post PC era
The number of devices (millions) shipped - source: statista.com
Modern computer components
• Same components for all kinds
• Components
– Processor
• Datapath
• Controller
– Memory
• Main memory
• Cache
– Input/Output
• User-interface
• Network
• Storage
Below your program
• Application software
– Written in high-level language
• System software
– Compiler: translates HLL code to machine code
– Operating System: service code
• Handling input/output
• Managing memory and storage
• Scheduling tasks & sharing resources
• Hardware
– Processor, memory, I/O controllers
Levels of Program Code
• High-level language
– Level of abstraction closer to problem domain
– Provides for productivity and portability
• Assembly language
– Textual representation of instructions
• Hardware representation
– Binary digits (bits)
– Encoded instructions and data
Technology trends
• Driven by advances in electronics and materials technology
– Increased capacity and performance
– Reduced cost
[Figure: DRAM capacity]

Year   Technology                   Relative performance/cost
1951   Vacuum tube                  1
1965   Transistor                   35
1975   Integrated circuit (IC)      900
1995   Very large scale IC (VLSI)   2,400,000
2013   Ultra large scale IC         250,000,000,000
PERFORMANCE EVALUATION
Defining performance
• Which airplane has the best performance?
[Figure: four bar charts comparing the Boeing 777, Boeing 747, BAC/Sud Concorde, and Douglas DC-8-50 on passenger capacity, cruising range (miles), cruising speed (mph), and passengers × mph]
Response Time and Throughput
• Response time
– How long it takes to do a task
• Throughput
– Total work done per unit time
• e.g., tasks/transactions/… per hour
• How are response time and throughput affected by
– Replacing the processor with a faster version?
– Adding more processors?
• We’ll focus on response time for now…
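The two questions above can be explored with a toy model, sketched here in Python. It assumes a batch of independent, equal-length tasks with no queueing or communication overhead; the function name `batch_metrics` and all numbers are hypothetical.

```python
import math

def batch_metrics(num_tasks, task_seconds, processors=1, speedup=1.0):
    """Toy model (assumption): independent, equal-length tasks, no overhead.

    Returns (response time of one task in seconds, throughput in tasks/second).
    """
    # A faster processor shortens every individual task.
    response_time = task_seconds / speedup
    # Tasks are processed in rounds of `processors` tasks at a time.
    rounds = math.ceil(num_tasks / processors)
    total_time = rounds * response_time
    throughput = num_tasks / total_time
    return response_time, throughput

baseline = batch_metrics(8, 2.0)
faster_cpu = batch_metrics(8, 2.0, speedup=2.0)
more_cpus = batch_metrics(8, 2.0, processors=4)

for name, result in [("baseline", baseline),
                     ("faster CPU", faster_cpu),
                     ("more CPUs", more_cpus)]:
    print(f"{name}: response time = {result[0]} s, throughput = {result[1]} tasks/s")
```

In this simplified model a faster processor improves both metrics, while adding processors improves only throughput. In real systems with queueing, adding processors can also shorten response time.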
Relative performance
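$$\text{Performance} = \frac{1}{\text{Execution time}}$$
• Computer X is n times faster than Computer Y
$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$
• Example: time taken to run a program
– 10 s on A and 15 s on B
– A is 1.5 × faster than B because $\frac{\text{Execution time}_B}{\text{Execution time}_A} = \frac{15\,\text{s}}{10\,\text{s}} = 1.5$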
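The ratio above is easy to check numerically. This is a minimal Python sketch; the helper name `relative_performance` is an illustrative assumption, not part of the slides.

```python
def relative_performance(time_x, time_y):
    """Return n such that X is n times faster than Y.

    Performance is 1 / execution time, so
    Performance_X / Performance_Y = time_y / time_x.
    """
    if time_x <= 0 or time_y <= 0:
        raise ValueError("execution times must be positive")
    return time_y / time_x

# Values from the example: 10 s on A, 15 s on B.
n = relative_performance(time_x=10.0, time_y=15.0)
print(f"A is {n:.1f} x faster than B")
```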
Measuring time
• Elapsed time
– Total response time, including all aspects
• Processing, I/O, OS overhead, idle time
– Determines system performance
• CPU time
– Time spent processing a given job
• Discounts I/O time, other jobs’ shares
– Comprises user CPU time and system CPU time
– Different programs are affected differently by CPU and system performance
Measuring CPU time
• Operations of digital hardware (including the CPU/processor) are governed by a constant-rate clock
[Figure: clock timing diagram; labels: clock period, clock (cycles), data transfer and computation, update state]
• Clock period (T): duration of a clock cycle
– Units: s, ms, μs, ns
• Clock rate/frequency (F = 1/T): the number of cycles per second
– Units: Hz, kHz, MHz, GHz
CPU time
• Performance improved by
– Reducing number of clock cycles
– Increasing clock rate
• Hardware designers must often trade off clock rate against cycle count
$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$
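The CPU time relation can be sketched as a small Python helper. The function names and the two design points below are hypothetical and chosen only to illustrate the trade-off between clock rate and cycle count.

```python
def clock_period(clock_rate_hz):
    """Clock period T in seconds for a clock rate F in Hz (T = 1 / F)."""
    return 1.0 / clock_rate_hz

def cpu_time(clock_cycles, clock_rate_hz):
    """CPU time in seconds = clock cycles / clock rate."""
    return clock_cycles / clock_rate_hz

# Hypothetical design points: the second design raises the clock rate
# but needs 20% more cycles for the same program.
design_1 = cpu_time(clock_cycles=3.0e9, clock_rate_hz=1.5e9)
design_2 = cpu_time(clock_cycles=3.6e9, clock_rate_hz=2.0e9)

print(f"design 1: {design_1:.2f} s, design 2: {design_2:.2f} s")
print(f"clock period at 2 GHz: {clock_period(2.0e9) * 1e9:.2f} ns")
```

Here the higher clock rate still wins overall, but only because the extra cycles cost less than the faster clock gains.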
CPU time example
• Computer A: 2GHz clock, 10s CPU time
• Designing Computer B
– Aim for 6s CPU time
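– A faster clock is possible, but it causes 1.2 × as many clock cycles
• How fast must Computer B’s clock be?
$$\text{CPU Time}_A = \frac{\text{CPU Clock Cycles}_A}{\text{Clock Rate}_A} = \frac{\text{CPU Clock Cycles}_A}{2.0\,\text{GHz}} = 10\,\text{s} \Rightarrow \text{CPU Clock Cycles}_A = 10\,\text{s} \times 2.0\,\text{GHz} = 20 \times 10^9$$
$$\text{CPU Time}_B = \frac{\text{CPU Clock Cycles}_B}{\text{Clock Rate}_B} = \frac{1.2 \times \text{CPU Clock Cycles}_A}{\text{Clock Rate}_B} = 6\,\text{s}$$
$$\Rightarrow \text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^9}{6\,\text{s}} = 4.0\,\text{GHz}$$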
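The same calculation can be reproduced step by step in Python. The variable names are illustrative; the numbers come from the example.

```python
# Given values from the example.
clock_rate_a = 2.0e9      # Hz
cpu_time_a = 10.0         # seconds
target_time_b = 6.0       # seconds
cycle_factor_b = 1.2      # B needs 1.2 x the cycles of A

# Step 1: recover A's cycle count from CPU time = cycles / clock rate.
cycles_a = cpu_time_a * clock_rate_a

# Step 2: B executes 1.2 x as many cycles.
cycles_b = cycle_factor_b * cycles_a

# Step 3: solve CPU time_B = cycles_B / clock rate_B for the clock rate.
clock_rate_b = cycles_b / target_time_b

print(f"Computer B needs a {clock_rate_b / 1e9:.1f} GHz clock")
```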
Instruction count & CPI
• Instruction Count for a program
– Determined by program, ISA and compiler
• Average cycles per instruction
– Determined by CPU hardware
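– If different instructions have different CPIs, the average CPI depends on the instruction mix
$$\text{Clock Cycles} = \text{Instruction count} \times \text{Cycles per Instruction}$$
$$\text{CPU Time} = \text{Instruction count} \times \text{Cycles per Instruction} \times \text{Clock Cycle Time} = \frac{\text{Instruction count} \times \text{Cycles per Instruction}}{\text{Clock Rate}} = \frac{\text{IC} \times \text{CPI}}{\text{Clock Rate}}$$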
Example
• Which is faster, and by how much?
– Computer A: Cycle Time = 250ps, CPI = 2.0
– Computer B: Cycle Time = 500ps, CPI = 1.2
– Same ISA, compiler
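$$\text{CPU Time}_A = \text{IC}_A \times \text{CPI}_A \times \text{Cycle Time}_A = \text{IC} \times 2.0 \times 250\,\text{ps} = \text{IC} \times 500\,\text{ps}$$
$$\text{CPU Time}_B = \text{IC}_B \times \text{CPI}_B \times \text{Cycle Time}_B = \text{IC} \times 1.2 \times 500\,\text{ps} = \text{IC} \times 600\,\text{ps}$$
$$\Rightarrow \frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{\text{IC} \times 600\,\text{ps}}{\text{IC} \times 500\,\text{ps}} = 1.2$$
• A is faster, by a factor of 1.2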
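Because both computers run the same instruction count, the comparison reduces to the average time per instruction. This is a short Python sketch with a hypothetical helper name; the values are taken from the example.

```python
def time_per_instruction_ps(cpi, cycle_time_ps):
    """Average time per instruction in picoseconds = CPI x cycle time."""
    return cpi * cycle_time_ps

# Same ISA and compiler, so the instruction count IC cancels in the ratio.
per_instr_a = time_per_instruction_ps(cpi=2.0, cycle_time_ps=250)
per_instr_b = time_per_instruction_ps(cpi=1.2, cycle_time_ps=500)

ratio = per_instr_b / per_instr_a
faster = "A" if per_instr_a < per_instr_b else "B"
print(f"{faster} is faster by a factor of {max(ratio, 1 / ratio):.1f}")
```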
Mixed instructions CPI
• CPI for instructions/operations may vary
– e.g., multiplication takes more cycles than addition
• A more precise count of CPU clock cycles takes instruction types into account
$$\text{Clock cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times \text{Instruction count}_i)$$
• Weighted average CPI
$$\text{CPI} = \frac{\text{Clock cycles}}{\text{Instruction count}} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \frac{\text{Instruction count}_i}{\text{Instruction count}}\right)$$
Example
• Question: two implementations of an application use instructions from classes A, B, and C as follows. Which one is better?
– Implementation 1 uses 2 A, 1 B, and 2 C
– Implementation 2 uses 4 A, 1 B, and 1 C
– CPIs for A, B, and C are 1, 2, and 3, respectively
• Answer:
– Implementation 1: clock cycles1 = 2 × 1 + 1 × 2 + 2 × 3 = 10
• IC = 5, wCPI = 2.0
– Implementation 2: clock cycles2 = 4 × 1 + 1 × 2 + 1 × 3 = 9
• IC = 6, wCPI = 1.5
– At the same clock rate, Implementation 2 is better: it needs fewer clock cycles (9 vs. 10) even though it executes more instructions
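The clock-cycle sum and the weighted average CPI can be checked with a few lines of Python. The dictionary layout and the `analyze` helper are illustrative assumptions; the numbers are those of the example.

```python
# CPI of each instruction class, from the example.
CPI = {"A": 1, "B": 2, "C": 3}

def analyze(mix):
    """Return (clock cycles, instruction count, weighted CPI) for a mix.

    `mix` maps an instruction class to how many such instructions are used.
    """
    cycles = sum(CPI[cls] * count for cls, count in mix.items())
    instruction_count = sum(mix.values())
    return cycles, instruction_count, cycles / instruction_count

impl_1 = analyze({"A": 2, "B": 1, "C": 2})
impl_2 = analyze({"A": 4, "B": 1, "C": 1})

for name, (cycles, ic, wcpi) in [("Implementation 1", impl_1),
                                 ("Implementation 2", impl_2)]:
    print(f"{name}: cycles = {cycles}, IC = {ic}, wCPI = {wcpi}")
```

Comparing the cycle counts rather than the instruction counts is what decides the winner when both implementations run at the same clock rate.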
Exercise
• A program is executed on a 2 GHz CPU. The program consists of 1000 instructions, of which:
– 30% are load/store instructions, CPI = 2.5
– 10% are jump instructions, CPI = 1
– 20% are branch instructions, CPI = 1.5
– The rest are arithmetic instructions, CPI = 2.0
a) What is the execution time (CPU time) of the program?
b) What is the weighted average CPI of the program?
c) If load/store instructions are improved so that their execution time is reduced by a factor of 2, what is the speed-up of the system?
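One way to check your answers is the Python sketch below. It assumes that part (c) means the load/store CPI is halved while the clock rate and instruction mix stay unchanged; that reading, and all helper names, are assumptions rather than part of the exercise.

```python
CLOCK_RATE_HZ = 2e9
INSTRUCTION_COUNT = 1000

# (fraction of instructions, CPI) per class, from the exercise statement.
mix = {
    "load/store": (0.30, 2.5),
    "jump": (0.10, 1.0),
    "branch": (0.20, 1.5),
    "arithmetic": (0.40, 2.0),  # the rest: 1 - 0.30 - 0.10 - 0.20
}

def weighted_cpi(mix):
    """Weighted average CPI = sum of (fraction_i x CPI_i)."""
    return sum(fraction * cpi for fraction, cpi in mix.values())

def cpu_time(mix):
    """CPU time in seconds = IC x CPI / clock rate."""
    return INSTRUCTION_COUNT * weighted_cpi(mix) / CLOCK_RATE_HZ

# Assumed reading of part (c): load/store CPI is halved.
improved = dict(mix)
improved["load/store"] = (0.30, 2.5 / 2)

cpi = weighted_cpi(mix)
time_s = cpu_time(mix)
speedup = cpu_time(mix) / cpu_time(improved)

print(f"a) CPU time = {time_s * 1e9:.1f} ns")
print(f"b) weighted CPI = {cpi:.2f}")
print(f"c) speed-up = {speedup:.3f}")
```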
Performance summary
• The BIG picture (take home message)
$$\text{CPU time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$
• Performance depends on
– Algorithm: IC, possibly CPI
– Programming language: IC, CPI
– Compiler: IC, CPI
– Instruction set architecture: IC, CPI, T
Power trends
• In CMOS technology
$$\text{Power}\,(P) = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Clock rate}$$
[Figure: clock frequency (MHz) and power (W) of Intel processors from the 80286 (1982) to the Core i5 Skylake (2015)]
Reducing power
• Suppose a new CPU has
– 85% of capacitive load of old CPU
– 15% voltage and 15% frequency reduction
$$\frac{P_{new}}{P_{old}} = \frac{C_{old} \times 0.85 \times (V_{old} \times 0.85)^2 \times F_{old} \times 0.85}{C_{old} \times V_{old}^2 \times F_{old}} = 0.85^4 = 0.52$$
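The power ratio can be verified numerically with the CMOS power relation from the previous slide. In this Python sketch the old CPU's values are normalized to 1.0, which is an assumption made only to expose the ratio.

```python
def dynamic_power(capacitive_load, voltage, clock_rate):
    """Power = capacitive load x voltage^2 x clock rate."""
    return capacitive_load * voltage ** 2 * clock_rate

# Old CPU normalized to 1.0 for every factor (assumption for illustration).
p_old = dynamic_power(1.0, 1.0, 1.0)

# New CPU: 85% of the capacitive load, 15% lower voltage and frequency.
p_new = dynamic_power(0.85, 0.85, 0.85)

ratio = p_new / p_old
print(f"P_new / P_old = {ratio:.2f}")
```

Voltage enters squared, so a voltage reduction contributes two of the four factors of 0.85.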
• The power wall
– We can’t reduce voltage further
– We can’t remove more heat
• How else can we improve performance?
Multiprocessors
Benchmark
• Programs used to measure performance
– Supposedly typical of actual workload
• Standard Performance Evaluation Corp (SPEC)
– Develops benchmarks for CPU, I/O, Web, …
• SPEC CPU2006
– Elapsed time to execute a selection of programs
• Negligible I/O, so focuses on CPU performance
– Normalize relative to reference machine
– Summarize as geometric mean of performance ratios
• CINT2006 (integer) and CFP2006 (floating-point)
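The normalization and geometric-mean steps can be sketched in Python as follows. The run times below are hypothetical placeholders, not real SPEC results, and the helper names are illustrative.

```python
import math

def performance_ratio(reference_seconds, measured_seconds):
    """Ratio relative to the reference machine (higher is better)."""
    return reference_seconds / measured_seconds

def geometric_mean(values):
    """n-th root of the product of n values, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical (reference time, measured time) pairs in seconds.
runs = [(1000.0, 100.0), (2000.0, 500.0), (900.0, 90.0)]

ratios = [performance_ratio(ref, measured) for ref, measured in runs]
score = geometric_mean(ratios)

print("ratios:", ratios)
print(f"geometric mean: {score:.2f}")
```

A geometric mean is used for ratios because the ranking of two machines then does not depend on which machine is chosen as the reference.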
Example
• Intel Core i7 920 results with CINT2006
Concluding remarks
• Cost/performance is improving
– Due to underlying technology development
• Hierarchical layers of abstraction
– In both hardware and software
• Instruction set architecture
– The hardware/software interface
• Execution time: the best performance measure
• Power is a limiting factor
– Use parallelism to improve performance