ECE 5367
4436
Introduction to Computer Architecture and Design
Ji Chen
Section: T TH 1:00PM-2:30PM
Prerequisites: ECE 4436
Instructor:
Ji Chen
Email: jchen18@uh.edu
Tel: (713)-743-4423
Office: W328
Office Hours: T TH 2:30-3:30 or by appointment
TA:
None
Course Contents
1. Introduction, basic computer organization
2. Instruction formats, instruction sets and their design
3. ALU design: Adders, subtracters, logic operations
4. Multiplication, division, floating point arithmetic
5. Datapath design
6. Control design: Hardwired control, microprogrammed control
7. Pipelining
8. Memory systems
9. I/O
Web: http://www.egr.uh.edu/courses/ece/ECE5367/
Grading
HW/Quiz/Lab   10 %
Project       15 %
Exam 1        25 %
Exam 2        25 %
Exam 3        25 %
Academic Honesty Statement
Computer Organization and Design: The Hardware/Software Interface, by David A. Patterson and John L. Hennessy, 3rd edition
Required
NOT REQUIRED
Homework/quiz: There will be several graded homework/lab assignments. Homework turned in late will be accepted only under extraordinary circumstances.
Labs: Laboratory assignments may be worked in teams of two (2); however, there should be no collaboration between teams. Lab assignments turned in late will be penalized 25 points for each calendar day. Both students in a team will receive the same grade for the project.
Projects: Teams of four (4) describe the computer architecture of a modern technology.
Exams: You have two mid-term exams and one final exam. A missed exam will result in a grade of zero. Let me know immediately if any situation arises.
Final Exam - TBD
Grading: Your final grade will be computed as follows:
HW/Quiz/Lab   10 %
Project       15 %
Exam 1        25 %
Exam 2        25 %
Exam 3        25 %
Since 1946 all computers have had 5 components:
Processor
  Control
  Datapath
Memory
Input
Output
TI SuperSPARC™ TMS390Z50 in Sun SPARCstation20
[Figure: system block diagram. Labels: MBus Module containing SuperSPARC (Floating-point Unit, Integer Unit, Inst Cache, Data Cache, Ref MMU, Store Buffer, Bus Interface), L2 $, and CC; Message Bus (MBus); L64852 MBus control; M-S Adapter; SBus with DMA, SBus Cards, SCSI, and Ethernet; DRAM Controller; STDIO (serial, kbd, mouse, audio, RTC, Floppy).]
Computer Architecture
Application
Operating System
Compiler
Firmware
Instr. Set Proc., I/O system
Instruction Set Architecture
Datapath & Control
Digital Design
Circuit Design
Layout
Coordination of many levels of abstraction
Under a rapidly changing set of forces
Design, Measurement, and Evaluation
Forces on Computer Architecture
[Figure: forces acting on Computer Architecture]
Technology
Programming Languages
Applications
Operating Systems
Cleverness
History
Mixed-Signal
Where are We Going??
ECE 5367, Spring 08
[Figure: course roadmap. Panel titles: Arithmetic; Single/multicycle Datapaths; Pipelining; Memory Systems; I/O.]
[Figure: multiplier datapath. Labels: Multiplier and Multiplicand inputs (32 bits), Multiplicand Register, sign extension from 32 to 34 bits, 34-bit ALU (Sub/Add), Booth Encoder, Control Logic, HI and LO registers (16x2 bits), Result[HI], Result[LO].]
[Figure: pipelined execution. Stages: IFetch, Dcd, Exec, Mem, WB.]
[Figure: Processor-Memory Performance Gap, 1980 to 2000. Axes: Performance (ticks at 10, 100, 1000) vs. Time. CPU ("Moore's Law"): 60%/yr (2X/1.5 yr). DRAM: 9%/yr (2X/10 yrs). The gap grows 50%/year.]
Purchasing perspective
  Given a collection of machines, which has the
    Best performance?
    Least cost?
    Best performance / cost?
Design perspective
  Faced with design options, which has the
    Best performance improvement?
    Least cost?
    Best performance / cost?
Both require
  basis for comparison
  metric for evaluation
Our goal: understand cost & performance implications of architectural choices
Two Notions of Performance
Plane        DC to Paris   Speed      Passengers   Throughput (pmph)
Boeing 747   6.5 hours     610 mph    470          286,700
Concorde     3 hours       1350 mph   132          178,200
Which has higher performance?
Time to do the task (Execution Time)
  execution time, response time, latency
Tasks per day, hour, week, sec, ns, ... (Performance)
  throughput, bandwidth
Response time and throughput are often in opposition
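The throughput figures for the two planes are passengers times speed (passenger-miles per hour, pmph). The following is a minimal Python sketch that reproduces them from the numbers on the slide and picks the winner under each notion of performance. The `Plane` class and its field names are illustrative and not part of the course material.

```python
from dataclasses import dataclass


@dataclass
class Plane:
    name: str
    trip_hours: float   # DC to Paris flying time (latency)
    speed_mph: int
    passengers: int

    def throughput_pmph(self) -> int:
        """Passenger-miles per hour: passengers carried times speed."""
        return self.passengers * self.speed_mph


planes = [
    Plane("Boeing 747", 6.5, 610, 470),
    Plane("Concorde", 3.0, 1350, 132),
]

# Lower execution time wins on latency; higher pmph wins on throughput.
fastest = min(planes, key=lambda p: p.trip_hours)
highest_throughput = max(planes, key=lambda p: p.throughput_pmph())

for p in planes:
    print(f"{p.name}: {p.trip_hours} h, {p.throughput_pmph():,} pmph")
print("Lowest latency:", fastest.name)
print("Highest throughput:", highest_throughput.name)
```

The two metrics pick different planes, which is the point of the slide: the answer depends on which notion of performance matters for the task.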
Definitions
Performance is in units of things-per-second
bigger is better
If we are primarily concerned with response time
$$\text{performance}(x) = \frac{1}{\text{execution\_time}(x)}$$
"X is n times faster than Y" means
$$n = \frac{\text{Performance}(X)}{\text{Performance}(Y)}$$
Example
Time of Concorde vs. Boeing 747?
  Concorde is 1350 mph / 610 mph = 2.2 times faster
  = 6.5 hours / 3 hours
Throughput of Concorde vs. Boeing 747?
  Concorde is 178,200 pmph / 286,700 pmph = 0.62 times faster
  Boeing is 286,700 pmph / 178,200 pmph = 1.61 times faster
Boeing is 1.6 times (60%) faster in terms of throughput
Concorde is 2.2 times (120%) faster in terms of flying time
We will focus primarily on execution time for a single job
Lots of instructions in a program => Instruction throughput important!
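As a rough check of the ratios in this example, here is a small Python sketch that applies the "n times faster" definition to both metrics. The function names are hypothetical; the input numbers are the ones from the plane comparison.

```python
def times_faster_by_time(time_x: float, time_y: float) -> float:
    """n such that X is n times faster than Y, from execution times.

    Since performance = 1 / execution_time, n = time_y / time_x.
    """
    return time_y / time_x


def times_faster_by_throughput(tput_x: float, tput_y: float) -> float:
    """n such that X is n times faster than Y, from throughputs."""
    return tput_x / tput_y


# Flying time: Concorde (3 h) vs. Boeing 747 (6.5 h)
concorde_time_ratio = times_faster_by_time(3.0, 6.5)

# Throughput: Boeing 747 (286,700 pmph) vs. Concorde (178,200 pmph)
boeing_tput_ratio = times_faster_by_throughput(286_700, 178_200)

print(f"Concorde vs. Boeing, flying time: {concorde_time_ratio:.2f}x")
print(f"Boeing vs. Concorde, throughput:  {boeing_tput_ratio:.2f}x")
```

Note that a ratio is turned into a "percent faster" figure by subtracting 1, so 1.6 times corresponds to 60% and 2.2 times to 120%.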
CPU Performance
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
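The three-factor product can be sketched in Python as follows. The function name and the example inputs (one billion instructions, a CPI of 2.2, a 1 GHz clock) are assumptions chosen for illustration only; they do not describe any particular machine.

```python
def cpu_time_seconds(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """Seconds/Program = Instructions/Program x Cycles/Instruction x Seconds/Cycle.

    Seconds/Cycle is the reciprocal of the clock rate.
    """
    return instruction_count * cpi / clock_rate_hz


# Hypothetical workload: 1e9 instructions, CPI of 2.2, 1 GHz clock
baseline = cpu_time_seconds(1e9, 2.2, 1e9)

# Halving any one factor halves CPU time if the other two stay fixed
faster_clock = cpu_time_seconds(1e9, 2.2, 2e9)

print(f"baseline: {baseline:.2f} s, with 2 GHz clock: {faster_clock:.2f} s")
```

In practice the three factors are not independent: a change that lowers CPI may lengthen the cycle time or raise the instruction count, so all three have to be evaluated together.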
Amdahl's Law
Speedup due to enhancement E:
$$\text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E}$$
Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected. Then
$$\text{ExTime}(\text{with } E) = \left((1-F) + \frac{F}{S}\right) \times \text{ExTime}(\text{without } E)$$
$$\text{Speedup}(\text{with } E) = \frac{1}{(1-F) + \frac{F}{S}}$$
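A minimal Python sketch of the speedup formula, using F and S as defined above. The function name and the example values (F = 0.4, S = 10) are hypothetical and chosen only to show how the unaffected fraction limits the overall gain.

```python
def amdahl_speedup(fraction_enhanced: float, enhancement_factor: float) -> float:
    """Overall speedup when a fraction F of the task is sped up by a factor S."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced (F) must be between 0 and 1")
    if enhancement_factor <= 0.0:
        raise ValueError("enhancement_factor (S) must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / enhancement_factor)


# Hypothetical case: 40% of the task runs 10x faster
example = amdahl_speedup(0.4, 10.0)

# Upper bound as S grows without limit: 1 / (1 - F)
limit = 1.0 / (1.0 - 0.4)

print(f"speedup: {example:.4f}, upper bound: {limit:.4f}")
```

Even with an arbitrarily large S, the speedup in this hypothetical case cannot exceed 1 / (1 - F), because the remaining 60% of the task is unchanged.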
Base Machine (Typical Mix)
Op       Freq   Cycles   CPI(i)   % Time
ALU      50%    1        0.5      23%
Load     20%    5        1.0      45%
Store    10%    3        0.3      14%
Branch   20%    2        0.4      18%
Total                    2.2
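The CPI(i) and % Time columns follow from the Freq and Cycles columns: each class contributes frequency times cycles, the contributions sum to the overall CPI, and each class's share of time is its contribution divided by that sum. A short Python sketch of the calculation, with variable names that are illustrative only:

```python
# (frequency, cycles) per instruction class, taken from the Base Machine table
mix = {
    "ALU": (0.50, 1),
    "Load": (0.20, 5),
    "Store": (0.10, 3),
    "Branch": (0.20, 2),
}

# CPI(i) = Freq(i) x Cycles(i)
cpi_contribution = {op: freq * cycles for op, (freq, cycles) in mix.items()}

# Overall CPI is the sum of the per-class contributions
total_cpi = sum(cpi_contribution.values())

# Share of execution time spent in each class
time_share = {op: c / total_cpi for op, c in cpi_contribution.items()}

for op in mix:
    print(f"{op:<7} CPI(i)={cpi_contribution[op]:.1f}  time={time_share[op]:.0%}")
print(f"Total CPI = {total_cpi:.1f}")
```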
How much faster would the machine be if a better data cache
reduced the average load time to 2 cycles?
How does this compare with using branch prediction to save a
cycle off the branch time?
What if two ALU instructions could be executed at once?
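One way to approach these three questions is to recompute the CPI with the changed cycle count and compare it with the base CPI of 2.2. The sketch below assumes the instruction count and clock rate are unchanged by each enhancement, and it models "two ALU instructions at once" as an ideal halving of the effective ALU cycle count (every ALU instruction pairs perfectly). Both are simplifying assumptions, and the function names are hypothetical.

```python
# (frequency, cycles) per instruction class, from the Base Machine table
BASE_MIX = {
    "ALU": (0.50, 1),
    "Load": (0.20, 5),
    "Store": (0.10, 3),
    "Branch": (0.20, 2),
}


def cpi(mix: dict) -> float:
    """Average cycles per instruction for an instruction mix."""
    return sum(freq * cycles for freq, cycles in mix.values())


def speedup_with(op: str, new_cycles: float, base: dict = BASE_MIX) -> float:
    """Speedup from changing one class's cycle count.

    Assumes instruction count and clock rate stay the same,
    so speedup = CPI(base) / CPI(changed).
    """
    changed = dict(base)
    freq, _ = changed[op]
    changed[op] = (freq, new_cycles)
    return cpi(base) / cpi(changed)


better_cache = speedup_with("Load", 2)         # loads take 2 cycles instead of 5
branch_prediction = speedup_with("Branch", 1)  # one cycle saved per branch
dual_alu = speedup_with("ALU", 0.5)            # assumed ideal pairing of ALU ops

print(f"Better data cache:  {better_cache:.3f}x")
print(f"Branch prediction:  {branch_prediction:.3f}x")
print(f"Two ALU ops at once: {dual_alu:.3f}x")
```

Under these assumptions the data cache improvement gives the largest gain, which is consistent with loads accounting for the largest share of execution time in the base table.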