Computer Architecture
Ngo Lam Trung & Pham Ngoc Hung
Faculty of Computer Engineering
School of Information and Communication Technology (SoICT)
Hanoi University of Science and Technology
E-mail: [trungnl, hungpn]@soict.hust.edu.vn
IT3030E Fall 2022 1
Course administration
❑ Instructor: Ngo Lam Trung/Pham Ngoc Hung
803 B1, SoICT, HUST
❑ Text: [Required] Computer Organization and
Design, 5th edition revised printing
Patterson & Hennessy 2014.
[Optional] Computer Organization and
Architecture, 10th Edition, William Stallings
❑ Slides: pdf
❑ Schedule: as in timetable
Course content
❑ Chapter 1: Introduction
❑ Chapter 2: Computer Arithmetic
❑ Chapter 3: Instruction Set Architecture
❑ Chapter 4: CPU
❑ Chapter 5: Memory
❑ Chapter 6: I/O system
❑ Chapter 7: Multicores and multiprocessors
Computers are so important
❑ Current modern life
l Industrial revolutions, the 3rd (Automation) and the 4th (Digital
revolution).
l Cell phones, the Internet, Grab, Google Maps...
l WWW, search engines, social networks, e-commerce…
l Robotics, EV, UAV, self-driving cars,…
❑ Future
l Tailored medical care based on an individual's genome.
l Super-human: transferring a human brain into a mechanical body
(robot) for interstellar travel (The Matrix franchise; Michio
Kaku, Physics of the Future, 2011, and The Future of the Mind,
2015).
l …many more
Outcomes from this course
❑ Computer Architecture and Organization
l Understanding of basic computer system organization.
l Abstraction and instruction set architecture: how high-level
language programs are translated into machine language programs,
and how hardware executes those programs.
l Hardware/software interface, and how software instructs
hardware to perform functions.
❑ Computer performance
l How to evaluate performance
l Basic techniques to improve computer performance.
Study guide
❑ Do read the textbook!
❑ Attend class regularly, stay focused.
❑ Work through all exercises and homework until you understand them.
❑ Old-school approach: pen and paper for doing exercises
and taking notes.
❑ Experience in C/C++ will be useful.
❑ Code of conduct:
l No web surfing, music, videos, or games in class.
l Food is not allowed (water/soft drink OK).
❑ Mid-term exam, Final exam
Homework/exercises
❑ MIPS assembly programming
❑ MARS simulator
Chapter 1: Introduction
1. Computer Abstraction and Technology
2. Performance Evaluation
[with materials from Computer Organization and Design, 5th Edition,
Patterson & Hennessy, ©2014, MK,
and M.J. Irwin's presentation, PSU 2008]
1. Computer Abstraction and Technology
❑ What is a computer?
❑ Computer classification
❑ Computer generations
❑ The key to computer evolution: IC making technology
❑ Computer organization
1. Computer Abstraction and Technology
❑ What is a computer?
❑ A machine that
l Accepts input data
l Processes data by executing a stored program
l Produces output
❑ Which of these is a computer?
Classes of Computers
❑ Supercomputers
l Extremely fast and expensive, for high-end applications
❑ Server
l Network based
l High capacity, performance, reliability
l Range from small servers to building-sized ones
❑ Desktop computers (Personal Computers)
l General-purpose, variety of software
l Subject to cost/performance tradeoff
❑ Embedded computers
l Hidden as components of systems
l Stringent power/performance/cost constraints
Dominant look and feel of computer classes
[Figure: typical embedded computer, PC, server, and supercomputer]
Price/performance of computer classes
❑ Supercomputer: $millions
❑ Mainframe: $100Ks
❑ Server: $10Ks
❑ Workstation: $1000s
❑ Personal computer: $100s
❑ Embedded: $10s
Differences in scale, not in substance
A brief history of computers
❑ 1st generation: Vacuum tubes
l ENIAC: first general-purpose electronic computer
l UNIVAC I: first commercial computer in the US
A brief history of computers
❑ 2nd generation: transistor
❑ Computers became smaller and faster
IBM System/360
A brief history of computers
❑ Later generations: IC and VLSI
❑ Steadily improving performance per unit cost
❑ Moore's law
W. Stallings, COA, 10th edition
Post-PC era
❑ PDA, smartphone, tablet…
❑ Smart TV, set-top box…
❑ Cloud computing (Amazon EC2, cloud gaming…)
[Figure: the number of tablets and smartphones manufactured per year]
Eight important ideas
❑ Design for Moore's law
❑ Simplification via abstraction
❑ Make the common case fast
❑ Performance via parallelism
❑ Performance via pipelining
❑ Performance via prediction
❑ Memory hierarchy
❑ Dependability via redundancy
What’s below your program?
❑ High-level language program (in C)
    void swap(int v[], int k)
    {
        int temp;
        temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;
    }
→ C compiler (one-to-many: one C statement becomes several instructions)
❑ Assembly language program (for MIPS CPU)
    swap: sll $2, $5, 2
          add $2, $4, $2
          lw  $15, 0($2)
          lw  $16, 4($2)
          sw  $16, 0($2)
          sw  $15, 4($2)
          jr  $31
→ assembler (one-to-one: each instruction becomes one binary word)
❑ Machine (object, binary) code (for MIPS CPU)
    000000 00000 00101 0001000010000000
    000000 00100 00010 0001000000100000
    . . .
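The first two MIPS instructions compute the address of v[k]: with v in $4 and k in $5 (the first two argument registers), sll $2, $5, 2 multiplies k by 4 and add adds the base address of v. A minimal C sketch that spells out the same byte arithmetic is below; swap_bytes is a hypothetical name, and the sketch assumes a 4-byte int, as on MIPS32.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: int is 4 bytes, as on MIPS32. */
_Static_assert(sizeof(int) == 4, "this sketch assumes 4-byte int");

/* Hypothetical byte-level rewrite of swap(v, k): exchanges v[k] and v[k+1]. */
void swap_bytes(int v[], int k)
{
    assert(k >= 0);
    size_t offset = (size_t)k << 2;   /* sll $2, $5, 2 : offset = k * 4          */
    char *addr = (char *)v + offset;  /* add $2, $4, $2: &v[k] = v + offset       */
    int *word = (int *)addr;
    int temp = word[0];               /* lw $15, 0($2)                            */
    int next = word[1];               /* lw $16, 4($2): v[k+1] is 4 bytes further */
    word[0] = next;                   /* sw $16, 0($2)                            */
    word[1] = temp;                   /* sw $15, 4($2)                            */
}                                     /* jr $31: return to the caller             */
```

For example, with v = {10, 20, 30}, calling swap_bytes(v, 1) leaves v = {10, 30, 20}, the same result as the C swap above.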
Levels of Program Code
❑ High-level language
l Level of abstraction closer to
problem domain
l Provides for productivity and
portability
❑ Assembly language
l Textual representation of
instructions
❑ Hardware representation
l Binary digits (bits)
l Encoded instructions and
data
Hardware/software interface: below your program
❑ Application software
l Written in high-level language (HLL)
❑ System software
l Compiler: translates HLL code to
machine code
l Operating System: service code
- Handling input/output
- Managing memory and storage
- Scheduling tasks & sharing resources
❑ Hardware
l Processor, memory, I/O controllers
Computer Organization
❑ Computer’s basic operation
l Input data
l Process data by executing stored program
l Output data
❑ What components does a computer require?
l For data input:
l For storing information:
l For program execution and data processing:
l For data output:
Computer Organization
❑ Five classic components of a computer – input, output,
memory, datapath, and control
❑ Datapath + control = processor (CPU)
A similar view
❑ Usually, the Link unit is hidden
[Figure: processor (CPU = control + datapath), memory, and input/output connected through a link; I/O also connects to/from the network]
Opening the box: anatomy of a computer
Inside the Processor (CPU)
❑ Datapath: performs operations on data
❑ Control: sequences datapath, memory, ...
❑ Cache memory
AMD Barcelona: 4-core processor
Key to computer evolution: IC making technology
The chip manufacturing process
Video: How an IC is made
Moore’s Law
How do we benefit from this?
Key to computer evolution: IC making technology
❑ Electronics technology continues to evolve
l Increased capacity and performance
l Reduced cost
[Textbook]
2. Computer performance evaluation
❑ What is performance?
❑ A storage system
l How much time to find a file/object?
l How much time to transfer a file?
l How many files can be served simultaneously?
❑ A web server
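l How fast can a request be served?
l How many requests can be served per second?
❑ Different criteria to define performance
l Throughput: total amount of work done per unit time
l Response time (execution time): time between the start and completion of a task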
❑ We focus on response time
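To see why the two criteria differ, consider a hypothetical server model (not from the slides): every request needs a fixed service time, several requests are processed in parallel by independent workers, and there is no queueing. Under these assumptions, adding workers raises throughput but leaves response time unchanged, while a faster server improves both. The names server_perf and server_model are illustrative only.

```c
#include <assert.h>

/* Result of the hypothetical model. */
typedef struct {
    double response_time_ms;    /* time to complete one request    */
    double throughput_per_sec;  /* requests completed per second   */
} server_perf;

/* Hypothetical model: `workers` requests run in parallel, each takes
   `service_ms` milliseconds, and no request ever waits in a queue. */
server_perf server_model(int workers, double service_ms)
{
    assert(workers > 0 && service_ms > 0.0);
    server_perf p;
    p.response_time_ms = service_ms;                      /* independent of worker count */
    p.throughput_per_sec = workers * (1000.0 / service_ms);
    return p;
}
```

With 4 workers and 200 ms per request, throughput is 20 requests/s; doubling the workers gives 40 requests/s at the same 200 ms response time, whereas halving the service time to 100 ms gives 40 requests/s and a 100 ms response time.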
2. Computer performance evaluation
❑ Response time:
l System performance: elapsed time on an unloaded system
l CPU performance: user CPU time, the time the CPU actually
spends executing the user program.
❑ To maximize performance, we need to minimize execution
time:
$$\text{performance}_X = \frac{1}{\text{execution\_time}_X}$$
If computer X is n times faster than Y, then
$$\frac{\text{performance}_X}{\text{performance}_Y} = \frac{\text{execution\_time}_Y}{\text{execution\_time}_X} = n$$
Relative Performance Example
❑ If computer A runs a program in 10 seconds and
computer B runs the same program in 15 seconds, how
much faster is A than B?
We know that A is n times faster than B if
$$\frac{\text{performance}_A}{\text{performance}_B} = \frac{\text{execution\_time}_B}{\text{execution\_time}_A} = n$$
The performance ratio is
$$\frac{15}{10} = 1.5$$
so A is 1.5 times faster than B. If the performance of B is taken as 1, the performance of A is 1.5.
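The two definitions translate directly into code. This is a minimal sketch (function names are hypothetical) showing that the ratio of performances equals the inverse ratio of execution times.

```c
#include <assert.h>

/* performance_X = 1 / execution_time_X */
double performance(double exec_time_s)
{
    assert(exec_time_s > 0.0);
    return 1.0 / exec_time_s;
}

/* n such that X is n times faster than Y: execution_time_Y / execution_time_X */
double times_faster(double exec_time_x_s, double exec_time_y_s)
{
    assert(exec_time_x_s > 0.0 && exec_time_y_s > 0.0);
    return exec_time_y_s / exec_time_x_s;
}
```

For the example, times_faster(10.0, 15.0) returns 1.5, and performance(10.0) / performance(15.0) gives the same value up to floating-point rounding.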
Performance Factors
❑ CPU execution time (CPU time) – time the CPU spends
working on a task
Does not include time waiting for I/O or running other programs
$$\text{CPU execution time for a program} = \text{CPU clock cycles for a program} \times \text{clock cycle time} = \frac{\text{CPU clock cycles for a program}}{\text{clock rate}}$$
❑ Performance can be improved by reducing either the length
of the clock cycle or the number of clock cycles required
for a program
Review: Machine Clock Rate
❑ Clock rate (clock cycles per second, in MHz or GHz) is the
inverse of clock cycle time (clock period)
CC = 1 / CR (CC: clock cycle time, CR: clock rate)
[Figure: clock signal with one clock period marked]
10 nsec clock cycle => 100 MHz clock rate
5 nsec clock cycle => 200 MHz clock rate
2 nsec clock cycle => 500 MHz clock rate
1 nsec (10^-9 s) clock cycle => 1 GHz (10^9 Hz) clock rate
500 psec clock cycle => 2 GHz clock rate
250 psec clock cycle => 4 GHz clock rate
200 psec clock cycle => 5 GHz clock rate
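The conversions above can be reproduced with a small helper. This sketch works in integer picoseconds and megahertz (the function names are hypothetical): since 1 ps = 10^-12 s and 1 MHz = 10^6 Hz, the rate in MHz is 10^6 divided by the period in ps. Integer division is exact for every row listed, but would truncate for periods that do not divide 10^6 ps evenly.

```c
#include <assert.h>

/* Clock rate (MHz) = 1 / clock cycle time = 10^6 / period in picoseconds. */
long clock_rate_mhz(long period_ps)
{
    assert(period_ps > 0);
    return 1000000L / period_ps;   /* integer division: exact for the values listed */
}

/* Inverse direction: clock cycle time (ps) = 10^6 / clock rate in MHz. */
long clock_period_ps(long rate_mhz)
{
    assert(rate_mhz > 0);
    return 1000000L / rate_mhz;
}
```

For instance, clock_rate_mhz(250) returns 4000 (4 GHz), and clock_period_ps(4000) returns 250.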
Improving Performance Example
❑ A program runs on computer A with a 2 GHz clock in 10
seconds. What clock rate must computer B run at to run
this program in 6 seconds? Assume that computer B
will require 1.2 times as many clock cycles as computer
A to run the program.
$$\text{CPU time}_A = \frac{\text{CPU clock cycles}_A}{\text{clock rate}_A}$$
$$\text{CPU clock cycles}_A = 10\ \text{s} \times 2 \times 10^9\ \text{cycles/s} = 20 \times 10^9\ \text{cycles}$$
$$\text{CPU time}_B = \frac{1.2 \times 20 \times 10^9\ \text{cycles}}{\text{clock rate}_B}$$
$$\text{clock rate}_B = \frac{1.2 \times 20 \times 10^9\ \text{cycles}}{6\ \text{s}} = 4\ \text{GHz}$$
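The same steps can be written as a function: recover A's cycle count from its time and clock rate, scale it for B, then divide by B's target time. This is a sketch; required_clock_rate_hz and its parameters are hypothetical names.

```c
#include <assert.h>

/* Hypothetical helper: the clock rate B needs to finish in target_time_b_s,
   given that B executes cycle_factor times as many clock cycles as A. */
double required_clock_rate_hz(double time_a_s, double clock_rate_a_hz,
                              double cycle_factor, double target_time_b_s)
{
    assert(time_a_s > 0.0 && clock_rate_a_hz > 0.0 && target_time_b_s > 0.0);
    double cycles_a = time_a_s * clock_rate_a_hz;   /* CPU clock cycles_A = time_A x rate_A */
    double cycles_b = cycle_factor * cycles_a;      /* B needs more cycles for the program  */
    return cycles_b / target_time_b_s;              /* clock rate_B = cycles_B / time_B     */
}
```

With the example's values, required_clock_rate_hz(10.0, 2e9, 1.2, 6.0) evaluates to 4 x 10^9 Hz (4 GHz), up to floating-point rounding.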
Clock Cycles per Instruction
❑ Not all instructions take the same amount of time to
execute
l Average execution time ~ average clock cycles per instruction
$$\text{CPU clock cycles for a program} = \text{instructions for a program} \times \text{average clock cycles per instruction}$$
❑ Clock cycles per instruction (CPI): the average number of
clock cycles each instruction takes to execute
A way to compare two different implementations of the same ISA
Instruction class:               A   B   C
CPI for this instruction class:  1   2   3
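When instruction classes have different CPIs, the cycle count is a weighted sum over the classes. The sketch below uses the CPI values from the table (A = 1, B = 2, C = 3); the per-class instruction counts in the usage note are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* CPU clock cycles = sum over instruction classes of (count_i x CPI_i). */
long total_cycles(const long count[], const long cpi[], size_t n_classes)
{
    long cycles = 0;
    for (size_t i = 0; i < n_classes; i++)
        cycles += count[i] * cpi[i];
    return cycles;
}

/* Average CPI = CPU clock cycles / number of instructions executed. */
double average_cpi(const long count[], const long cpi[], size_t n_classes)
{
    long instructions = 0;
    for (size_t i = 0; i < n_classes; i++)
        instructions += count[i];
    assert(instructions > 0);
    return (double)total_cycles(count, cpi, n_classes) / (double)instructions;
}
```

A hypothetical sequence with counts {2, 1, 2} for classes A, B, C takes 10 cycles (CPI 2.0), while one with counts {4, 1, 1} takes 9 cycles (CPI 1.5): the second executes more instructions yet needs fewer cycles, so instruction count alone does not determine performance.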
Using the Performance Equation
❑ Computers A and B implement the same ISA. Computer
A has a clock cycle time of 250 ps and an effective CPI of
2.0 for some program and computer B has a clock cycle
time of 500 ps and an effective CPI of 1.2 for the same
program. Which computer is faster and by how much?
Each computer executes the same number of
instructions, I, so
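$$\text{CPU time}_A = I \times 2.0 \times 250\ \text{ps} = 500 \times I\ \text{ps}$$
$$\text{CPU time}_B = I \times 1.2 \times 500\ \text{ps} = 600 \times I\ \text{ps}$$
Clearly, A is faster, by the ratio of execution times:
$$\frac{\text{performance}_A}{\text{performance}_B} = \frac{\text{execution\_time}_B}{\text{execution\_time}_A} = \frac{600 \times I\ \text{ps}}{500 \times I\ \text{ps}} = 1.2$$
So A is 1.2 times faster than B.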
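Because both computers run the same program on the same ISA, the instruction count I cancels out of the ratio. The sketch below (cpu_time_ps is a hypothetical name) evaluates CPU time = I x CPI x clock cycle time with an arbitrary, hypothetical count of 10^9 instructions to show that the result does not depend on I.

```c
#include <assert.h>

/* CPU time = instruction count x CPI x clock cycle time (all times in ps). */
double cpu_time_ps(double instruction_count, double cpi, double cycle_time_ps)
{
    assert(instruction_count >= 0.0 && cpi > 0.0 && cycle_time_ps > 0.0);
    return instruction_count * cpi * cycle_time_ps;
}
```

With I = 10^9, computer A takes 5 x 10^11 ps and computer B about 6 x 10^11 ps; their ratio is 1.2 for any choice of I.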
The Performance Equation
❑ Our basic performance equation is then
$$\text{CPU time} = \text{Instruction\_count} \times \text{CPI} \times \text{clock\_cycle} = \frac{\text{Instruction\_count} \times \text{CPI}}{\text{clock\_rate}}$$
❑ Key factors that affect performance (CPU execution time)
l Clock rate: given in the CPU specification
l CPI: varies by instruction type and ISA implementation
l Instruction count: measured using profilers or simulators
Dynamic Instruction Count
❑ How many instructions are executed in this program fragment?
    250 instructions
    for i = 1, 100 do
        20 instructions
        for j = 1, 100 do
            40 instructions
            for k = 1, 100 do
                10 instructions
            endfor
        endfor
    endfor
❑ Each "for" consists of two instructions: increment index, check exit condition
l Inner k loop: (2 + 10) instructions × 100 iterations = 1,200 instructions in all
l Middle j loop: (2 + 40 + 1,200) instructions × 100 iterations = 124,200 instructions in all
l Outer i loop: (2 + 20 + 124,200) instructions × 100 iterations = 12,422,200 instructions in all
l Total dynamic count: 250 + 12,422,200 = 12,422,450 instructions
❑ Static count = 326 (250 + 20 + 40 + 10 + 3 × 2 instructions in the program text)
❑ For loops such as "for i = 1, n" or "while x > 0", the dynamic count depends on run-time values
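The hand count can be checked by running the same loop structure and tallying instructions instead of executing them. This sketch assumes, as the slide does, that each for loop costs 2 instructions per iteration; the constant and function names are hypothetical. It also takes the iteration count n as a parameter, which shows how the dynamic count depends on run-time values while the static count does not.

```c
#include <assert.h>

/* Instruction counts from the fragment; each "for" costs 2 instructions per iteration. */
enum { PROLOGUE = 250, BODY_I = 20, BODY_J = 40, BODY_K = 10, LOOP_OVERHEAD = 2 };

/* Static count: instructions that appear in the program text. */
long static_count(void)
{
    return PROLOGUE + BODY_I + BODY_J + BODY_K + 3 * LOOP_OVERHEAD;
}

/* Dynamic count: instructions actually executed when every loop runs n times. */
long dynamic_count(long n)
{
    assert(n >= 0);
    long count = PROLOGUE;
    for (long i = 1; i <= n; i++) {
        count += LOOP_OVERHEAD + BODY_I;
        for (long j = 1; j <= n; j++) {
            count += LOOP_OVERHEAD + BODY_J;
            for (long k = 1; k <= n; k++)
                count += LOOP_OVERHEAD + BODY_K;
        }
    }
    return count;
}
```

With n = 100 the tally is 12,422,450, matching the hand calculation; with n = 1 every instruction executes exactly once, so the dynamic count equals the static count of 326.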
Improving performance by CPI
Op       Freq   CPI_i   Freq × CPI_i
ALU      50%    1
Load     20%    5
Store    10%    3
Branch   20%    2
$$\text{Avg CPI} = \sum_i \text{freq}_i \times \text{CPI}_i = \ ?$$
❑ How much faster would the machine be if a better data cache
reduced the average load time to 2 cycles?
❑ What if branch instructions took only one cycle?
❑ What if two ALU instructions could be executed at once?
Improving performance by CPI
Op       Freq   CPI_i   Freq × CPI_i
                        Base   Load = 2   Branch = 1   2 ALU at once
ALU      50%    1       .5     .5         .5           .25
Load     20%    5       1.0    .4         1.0          1.0
Store    10%    3       .3     .3         .3           .3
Branch   20%    2       .4     .4         .2           .4
Avg CPI                 2.2    1.6        2.0          1.95
$$\text{Avg CPI} = \sum_i \text{freq}_i \times \text{CPI}_i$$
❑ How much faster would the machine be if a better data cache
reduced the average load time to 2 cycles?
CPU time_new = 1.6 × IC × CC, so 2.2/1.6 = 1.375, i.e., 37.5% faster
❑ What if branch instructions took only one cycle?
CPU time_new = 2.0 × IC × CC, so 2.2/2.0 = 1.1, i.e., 10% faster
❑ What if two ALU instructions could be executed at once?
CPU time_new = 1.95 × IC × CC, so 2.2/1.95 ≈ 1.128, i.e., 12.8% faster
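The table can be recomputed by evaluating the weighted sum for each scenario. This is a sketch with hypothetical names (op_class, mix_average_cpi, cpi_speedup); following the table, executing two ALU instructions at once is modeled as an effective ALU CPI of 0.5, and because IC and CC are unchanged, the speedup is simply the ratio of average CPIs.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the instruction mix. */
typedef struct {
    const char *op;
    double freq;   /* fraction of executed instructions           */
    double cpi;    /* clock cycles per instruction for this class  */
} op_class;

/* Avg CPI = sum_i freq_i x CPI_i */
double mix_average_cpi(const op_class mix[], size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += mix[i].freq * mix[i].cpi;
    return sum;
}

/* Same IC and clock cycle, so speedup = old CPI / new CPI. */
double cpi_speedup(double old_cpi, double new_cpi)
{
    assert(old_cpi > 0.0 && new_cpi > 0.0);
    return old_cpi / new_cpi;
}
```

Starting from {ALU 0.5/1, Load 0.2/5, Store 0.1/3, Branch 0.2/2}, the base average CPI is 2.2; setting the load CPI to 2 gives 1.6 and a speedup of 1.375, matching the 37.5% figure above.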
How to improve performance?
❑ Shorter clock cycle = faster clock rate
→ latest CPU technology
❑ Smaller CPI
→ optimizing the Instruction Set Architecture and its implementation
❑ Smaller instruction count
→ optimizing algorithm and compiler
❑ To get the best performance, multiple criteria are combined
and considered at design time
→ a specific CPU for a specific class of computation problems
Faster Clock ≠ Shorter Running Time
❑ Suppose addition takes 1 ns
l 1 GHz clock: clock period = 1 ns, so addition takes 1 cycle
l 2 GHz clock: clock period = ½ ns, so addition takes 2 cycles
❑ In this example, addition time does not improve in going from
a 1 GHz to a 2 GHz clock
[Figure: 1 GHz, 4 steps vs. 2 GHz, 20 steps]
Faster steps do not necessarily mean shorter travel time.
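A simple model makes the point concrete. Assuming an operation has a fixed latency and must occupy a whole number of clock cycles, its time is the cycle count rounded up, multiplied by the clock period. The sketch below uses integer picoseconds so the rounding is exact; the function names are hypothetical.

```c
#include <assert.h>

/* Cycles an operation of fixed latency occupies: ceil(latency / period). */
long cycles_needed(long latency_ps, long period_ps)
{
    assert(latency_ps > 0 && period_ps > 0);
    return (latency_ps + period_ps - 1) / period_ps;   /* integer ceiling division */
}

/* Time the operation actually takes: whole cycles x clock period. */
long operation_time_ps(long latency_ps, long period_ps)
{
    return cycles_needed(latency_ps, period_ps) * period_ps;
}
```

For the 1 ns addition, a 1000 ps period gives 1 cycle and a 500 ps period gives 2 cycles, both 1000 ps. With a hypothetical 400 ps (2.5 GHz) clock, not on the slide, the addition needs 3 cycles and takes 1200 ps, slower than at 1 GHz.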
Measuring/benchmarking PC performance
❑ SPEC CPU benchmark
l Started in 1989
l SPEC CPU2006: 12 integer, 17 floating point benchmarks
l Reference machine: Sun Ultra Enterprise 2 (1997) with a
296 MHz UltraSPARC II CPU.
FIGURE 1.18 SPECINTC2006 benchmarks running on a 2.66 GHz Intel Core i7 920.
End of chapter 1