Computer Architecture

Lecture 01 - Introduction

Pengju Ren
Institute of Artificial Intelligence and Robotics
Xi’an Jiaotong University

http://gr.xjtu.edu.cn/web/pengjuren
Course Administration
Instructor: Pengju Ren
TA: Gelin Fu (Ph.D. candidate)
Lectures: Two 100-minute lectures a week
Textbook: Computer Architecture: A Quantitative Approach, 6th Edition (2019)
Prerequisite: Digital System Structure and Design

Preface

“The most beautiful thing we can experience is the mysterious. It is the source of all true art and science.”
--- Albert Einstein, What I Believe, 1930

What is Computer Architecture

Application
  (gap too large to bridge in one step)
Physics

In its broadest definition, computer architecture is the design of the abstraction/implementation layers that allow us to implement information processing applications efficiently using available manufacturing technologies.

What is Computer Architecture

Abstraction layers between Application and Physics (the gap is too large to bridge in one step):

Application
Algorithm
Programming Language
Operating System/Virtual Machines
Instruction Set Architecture (ISA)
Microarchitecture  [This course]
Register-Transfer Level (RTL)
Gates
Circuits
Devices
Physics

What is Computer Architecture

Application Requirements (top of the layer stack):
• Suggest how to improve architecture
• Provide revenue to fund development

Architecture provides feedback to guide application and technology research directions

Technology Constraints (bottom of the layer stack):
• Restrict what can be done efficiently
• New technologies make new architectures possible

Computing Devices Then…

[Figure: EDSAC, University of Cambridge, UK, 1949]

Computing Devices Now

Modern computing is as much about enhancing capabilities as data processing!

Architecture continually changing

Applications and Technology drive each other:
• Applications suggest how to improve technology, provide revenue to fund development
• Improved technologies make new applications possible

Cost of software development makes compatibility a major force in market

Single-Thread(Sequential) Processor Performance
[Figure: single-thread processor performance; labels: RISC, Multicore or ManyCore. Source: Hennessy & Patterson, 2017]

Moore’s Law Scaling with Cores

Global Semiconductor Market

The global semiconductor market is estimated at $450 billion USD in revenue for 2020. Products using these semiconductors represent global revenues of $2 trillion USD, or around 3.5% of global gross domestic product (GDP).
[ISSCC 2021, Feb.]

Advanced Tech Nodes Continue to Provide Value

Steady progress in two-dimensional transistor scaling and a variety of device enhancement techniques have sustained energy-efficiency improvement and device density gains from one technology generation to the next.
[ISSCC 2021, Feb.]

Upheaval in Computer Design
• Most of last 50 years, Moore’s Law ruled
– Technology scaling allowed continual performance/energy improvements without changing software model
• Last decade, technology scaling slowed/stopped
– Dennard (voltage) scaling over (supply voltage ~fixed)
– Moore’s Law (cost/transistor) over?
– No competitive replacement for CMOS anytime soon
– Energy efficiency constrains everything
• No “free lunch” for software developers, must consider:
– Parallel systems
– Heterogeneous systems

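A rough sketch of why the end of Dennard (voltage) scaling makes energy the limiting factor. It assumes the textbook switching-power model P = alpha * C * V^2 * f and idealized scaling by a factor k (capacitance falls by k, frequency rises by k, transistors per unit area rise by k^2). These scaling rules and all function names are illustrative assumptions, not taken from the slide.

```python
def dynamic_power(c, v, f, alpha=1.0):
    """Switching power of one transistor: alpha * C * V^2 * f (arbitrary units)."""
    return alpha * c * v * v * f


def power_density_ratio(k, voltage_scales):
    """Power per unit area after shrinking by factor k, relative to before.

    Assumed idealized rules: C -> C/k, f -> k*f, density -> k^2.
    With Dennard scaling V -> V/k; otherwise V stays fixed.
    """
    base = dynamic_power(1.0, 1.0, 1.0)
    v = 1.0 / k if voltage_scales else 1.0
    scaled = dynamic_power(1.0 / k, v, k)
    transistors_per_area = k * k
    return scaled * transistors_per_area / base


dennard = power_density_ratio(2.0, voltage_scales=True)
fixed_v = power_density_ratio(2.0, voltage_scales=False)
print(f"power density with voltage scaling:  x{dennard:.2f}")
print(f"power density with fixed voltage:    x{fixed_v:.2f}")
```

Under these assumptions power density stays constant while voltage scales, and grows with k^2 once the supply voltage is fixed, which is one way to read "energy efficiency constrains everything".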
Today’s Dominant Target Systems
• Mobile (smartphone/tablet)
– >1 billion sold/year
– Market dominated by ARM-ISA-compatible general-purpose processor in system-on-a-chip (SoC)
– Plus sea of custom accelerators (radio, image, video, graphics, audio, motion, location, security, etc.)
• Warehouse-Scale Computers (WSCs)
– 100,000s of cores per warehouse
– Market dominated by x86-compatible server chips
– Dedicated apps, plus cloud hosting of virtual machines
– Now seeing increasing use of GPUs, FPGAs, custom hardware to accelerate workloads
• Embedded computing
– Wired/wireless network infrastructure, printers
– Consumer TV/Music/Games/Automotive/Camera/MP3
– Internet of Things!

Course Content Computer Architecture
• Instruction Level Parallelism
– Superscalar
– Very Long Instruction Word (VLIW)
• Advanced Memory and Caches
• Data Level Parallelism
– Vector Machine
– GPU
• Thread Level Parallelism
– Multithreading
– Multiprocessor/Multicore/ManyCore
• Warehouse-Scale Computers (Request Level Parallelism)
• Domain-Specific Architectures (DNN Accelerator)

[Figure: Intel Nehalem Processor, Core i7]

Architecture vs. Microarchitecture

“Architecture”/Instruction Set Architecture:
• Programmer visible state (Memory & Register)
• Operations (Instructions and how they work)
• Execution Semantics (interrupts)
• Input/Output
• Data Types/Sizes

Microarchitecture/Organization:
• Tradeoffs on how to implement ISA for some metric (Speed, Energy, Cost)
• Examples: Pipeline depth, number of pipelines, cache size, silicon area, peak power, execution ordering, bus widths, ALU widths

Same Architecture Diff Micro-Architecture

Diff Architecture Diff Micro-Architecture

Where do Operands come from and
Where do Results Go ?

[Figure: a Processor containing an ALU, connected to MEMORY]

Where do Operands come from and
Where do Results Go ?
Machine model                        Stack   Accumulator   Reg-Mem   Reg-Reg
Number of explicitly named operands  0       1             2 or 3    2 or 3

[Figure: one processor diagram per model, each with an ALU and MEMORY]

Stack-Based Instruction Set Architecture (ISA)

• Burroughs B5000 (1960)
• Burroughs B6700
• HP 3000
• ICL 2900
• Symbolics 3600
• Inmos Transputer

Modern
• Forth machines
• Java Virtual Machine
• Intel x87 Floating Point Unit

Evaluation of Expressions

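The worked figures for this slide did not survive extraction. As a stand-in, here is a minimal sketch of how a stack machine evaluates an expression: operands are pushed, and each arithmetic instruction names no operands because it implicitly uses the top two stack entries. The expression a + b * c, the instruction names, and the `run_stack_program` helper are hypothetical choices for illustration, not the example used in the lecture.

```python
def run_stack_program(program, memory):
    """Execute a tiny stack-machine program and return the final memory."""
    memory = dict(memory)
    stack = []
    binary_ops = {
        "add": lambda x, y: x + y,
        "sub": lambda x, y: x - y,
        "mul": lambda x, y: x * y,
        "div": lambda x, y: x / y,
    }
    for instr in program:
        op, *args = instr.split()
        if op == "push":                 # memory -> top of stack
            stack.append(memory[args[0]])
        elif op == "pop":                # top of stack -> memory
            memory[args[0]] = stack.pop()
        elif op in binary_ops:           # zero explicit operands
            y = stack.pop()
            x = stack.pop()
            stack.append(binary_ops[op](x, y))
        else:
            raise ValueError(f"unknown instruction: {instr}")
    return memory


# r = a + b * c, written in postfix order: a b c * +
program = ["push a", "push b", "push c", "mul", "add", "pop r"]
result = run_stack_program(program, {"a": 2, "b": 3, "c": 4})
print(result["r"])
```

Note that the program order is simply the postfix (reverse Polish) form of the expression, which is why stack ISAs are easy compilation targets.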
Hardware Organization of the Stack

• Stack is part of the processor state
– stack must be bounded and small
– ≈ number of Registers, not the size of main memory
• Conceptually stack is unbounded
– a part of the stack is included in the processor state; the rest is kept in the main memory

Stack Operations/Implicit Memory References

Suppose the top 2 elements of the stack are kept in registers and the rest is kept in the memory.

Each push operation: 1 memory reference
Each pop operation: 1 memory reference

Better performance by keeping the top N elements in registers, so that memory references are made only when the register stack overflows or underflows.

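A minimal sketch of the overflow/underflow policy described above: up to N stack entries live in registers, a push into a full register stack spills one entry to memory, and a pop from an empty register stack refills one entry from memory. The lazy spill/refill policy and the name `count_memory_refs` are assumptions made for illustration; real designs differ in when they spill and refill.

```python
def count_memory_refs(ops, num_regs):
    """Return (stores, fetches) implied by a sequence of 'push'/'pop' operations."""
    in_regs = 0          # stack entries currently held in registers
    in_mem = 0           # stack entries spilled to memory
    stores = fetches = 0
    for op in ops:
        if op == "push":
            if in_regs == num_regs:      # overflow: spill the deepest register entry
                in_regs -= 1
                in_mem += 1
                stores += 1
            in_regs += 1
        elif op == "pop":
            if in_regs == 0:             # underflow: refill one entry from memory
                if in_mem == 0:
                    raise IndexError("pop from empty stack")
                in_mem -= 1
                in_regs += 1
                fetches += 1
            in_regs -= 1
        else:
            raise ValueError(f"unknown operation: {op}")
    return stores, fetches


ops = ["push"] * 4 + ["pop"] * 4
small = count_memory_refs(ops, num_regs=2)
large = count_memory_refs(ops, num_regs=4)
print("2 registers:", small)
print("4 registers:", large)
```

With enough registers to hold the whole working stack, the implicit memory traffic disappears, which is the point of keeping the top N elements in registers.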
Stack Size and Memory References

[Figure: stack size and memory references. Label: Four Store and Fetch]

Where do Operands come from and
Where do Results Go ?
Code sequences for C = A + B:

Stack     Accumulator   Reg-Mem          Reg-Reg
Push A    Load A        Load R1, A       Load R1, A
Push B    Add B         Add R3, R1, B    Load R2, B
Add       Store C       Store R3, C      Add R3, R1, R2
Pop C                                    Store R3, C

Classes of Instructions

• Data Transfer
– LD, ST, MFC1, MTC1, MFC0, MTC0
• ALU
– ADD, SUB, AND, OR, XOR, MUL, DIV, SLT, LUI
• Control Flow
– BEQZ, JR, JAL, TRAP, ERET
• Floating Point
– ADD.D, SUB.S, MUL.D, C.LT.D, CVT.S.W
• Multimedia (SIMD)
– ADD.PS, SUB.PS, MUL.PS, C.LT.PS
• String
– REP MOVSB (x86)

Addressing Modes: How to get operands from
Memory (MIPS)

[Table: MIPS addressing modes]
** May not actually access memory

ISA Encoding

Case study: X86(IA-32) Instruction Encoding

RISC-V Instruction Encoding(1)

RISC-V Instruction Encoding(2)

New open-source, license-free ISA spec
• Supported by growing shared software ecosystem
• Appropriate for all levels of computing system, from microcontrollers to supercomputers
• 32-bit, 64-bit, and 128-bit variants (we’re using 32-bit in class, textbook uses 64-bit)

http://www-inst.eecs.berkeley.edu/~cs61c/fa18/img/riscvcard.pdf
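
The encoding figures on these slides were lost in extraction. As a small illustration of fixed-width encoding, the sketch below packs and unpacks a 32-bit RISC-V R-type instruction using the field layout from the RISC-V base ISA (funct7, rs2, rs1, funct3, rd, opcode). The helper names `encode_r_type` and `decode_r_type` are hypothetical.

```python
def encode_r_type(funct7, rs2, rs1, funct3, rd, opcode):
    """Pack R-type fields into a 32-bit word: funct7|rs2|rs1|funct3|rd|opcode."""
    fields = ((funct7, 7), (rs2, 5), (rs1, 5), (funct3, 3), (rd, 5), (opcode, 7))
    for value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError("field out of range")
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def decode_r_type(word):
    """Split a 32-bit word back into its R-type fields."""
    return {
        "funct7": (word >> 25) & 0x7F,
        "rs2": (word >> 20) & 0x1F,
        "rs1": (word >> 15) & 0x1F,
        "funct3": (word >> 12) & 0x7,
        "rd": (word >> 7) & 0x1F,
        "opcode": word & 0x7F,
    }


OP = 0b0110011  # opcode shared by register-register integer instructions
add_word = encode_r_type(0b0000000, 2, 1, 0b000, 3, OP)   # add x3, x1, x2
sub_word = encode_r_type(0b0100000, 2, 1, 0b000, 3, OP)   # sub x3, x1, x2
print(f"add x3, x1, x2 -> 0x{add_word:08X}")
print(f"sub x3, x1, x2 -> 0x{sub_word:08X}")
```

Because every field sits at a fixed bit position, decoding is a handful of shifts and masks, in contrast to the variable-length x86 encoding.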
Real World Instruction Sets

Why the Diversity in ISAs?

Application Influenced ISA
• Instructions for Applications
– DSP instructions
• Compiler Technology has improved
– SPARC Register Windows no longer needed
– Compiler can register allocate effectively

Technology Influenced ISA
• Storage is expensive, tight encoding important
• Reduced Instruction Set Computer
– Remove instructions until whole computer fits on die
• Multicore/Manycore
– Transistors not turning into sequential performance

Recap

• ISA vs Micro-Architecture
• ISA Characteristics
– Machine Models
– Encoding
– Data Types
– Instructions
– Addressing Modes

[Figure: the abstraction layer stack from Application down to Physics, with the gap too large to bridge in one step]

And in conclusion …
• Computer Architecture >> ISAs and RTL
• Computer Architecture is about interaction of hardware and software, and design of appropriate abstraction layers
• Computer architecture is shaped by technology and applications
– History provides lessons for the future
• Computer Science at the crossroads from sequential to parallel computing
– Salvation requires innovation in many fields, including computer architecture
• Read Chapter 1 & Appendix A for next time! (6th edition)

Next Lecture: RISC-V ISA, Datapath & Control
(ISA and Micro-Architecture)

Acknowledgements
• Some slides contain material developed and copyright by:
– Arvind (MIT)
– Krste Asanovic (MIT/UCB)
– Joel Emer (Intel/MIT)
– James Hoe (CMU)
– David Patterson (UCB)
– David Wentzlaff (Princeton University)
• MIT material derived from course 6.823
• UCB material derived from course CS252 and CS 61C

Idealized Uniprocessor Model
Processor names variables:
– Integers, floats, pointers, arrays, structures, etc.
– These are really words, e.g., 64-bit double, 32-bit INTs, bytes, etc.
Processor performs operations on those variables:
– Arithmetic, logical operations, etc.
– Only performs these operations on values in registers
Processor controls the order, as specified by program
– Branches (if), loops (while), function calls, etc.
Idealized Cost
– Each operation has roughly the same cost: add, multiply, etc.

Six Great Ideas in Computer Architecture

New School Machine Architecture

Software
• Parallel Requests
– Assigned to computer
– e.g., Search “Cats”
• Parallel Threads
– Assigned to core
– e.g., Lookup, Ads
• Parallel Instructions
– >1 instruction @ one time
– e.g., 5 pipelined instructions
• Parallel Data
– >1 data item @ one time
– e.g., Add of 4 pairs of words
• Hardware descriptions
– All gates working in parallel at same time

Harness Parallelism & Achieve High Performance

Hardware
• Warehouse Scale Computer, Smart Phone
• Computer: Core … Core, Memory (Cache), Input/Output
• Core: Instruction Unit(s), Functional Unit(s) (A0+B0, A1+B1, A2+B2, A3+B3)
• Main Memory
• Logic Gates

Great Idea #1: Abstraction
(Levels of Representation/Interpretation)

High Level Language Program (e.g., C)
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
  Compiler
Assembly Language Program (e.g., RISC-V)
    lw $t0, 0($2)
    lw $t1, 4($2)
    sw $t1, 0($2)
    sw $t0, 4($2)
  Assembler
Machine Language Program (RISC-V)
    1000 1101 1110 0010 0000 0000 0000 0000
    1000 1110 0001 0000 0000 0000 0000 0100
    1010 1110 0001 0010 0000 0000 0000 0000
    1010 1101 1110 0010 0000 0000 0000 0100
  Machine Interpretation
Hardware Architecture Description (e.g., block diagrams)
  Architecture Implementation
Logic Circuit Description (Circuit Schematic Diagrams)

Anything can be represented as a number, i.e., data or instructions

Great Idea #2: Moore’s law
(Technique driven development)

Great Idea #3: Principle of Locality/
Memory Hierarchy

Great Idea #4: Parallelism

Amdahl’s Law
Gene Amdahl, Computer Pioneer

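The slide only names Amdahl’s Law, so as a reminder here is the standard statement in code form: if a fraction p of the execution time can be sped up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). The function name and the sample numbers are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction, factor):
    """Overall speedup when `parallel_fraction` of the time is sped up by `factor`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / factor)


# 90% of the work parallelized across 10 cores, then across 1000 cores.
ten_cores = amdahl_speedup(0.9, 10)
many_cores = amdahl_speedup(0.9, 1000)
print(f"10 cores:   {ten_cores:.2f}x")
print(f"1000 cores: {many_cores:.2f}x")
```

The serial 10% caps the speedup below 10x no matter how many cores are added, which is why the sequential part of a program matters so much on multicore machines.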
Great Idea #5: Performance Measurement
and Improvement

• Matching application to underlying hardware to exploit:
– Locality
– Parallelism
– Special hardware features, like specialized instructions (e.g., matrix manipulation)
• Latency
– How long to set the problem up
– How much faster does it execute once it gets going
– It is all about time to finish

Great Idea #6: Dependability via Redundancy

• Redundancy so that a failing piece doesn’t make the whole system fail

Three units compute 1+1: two answer 2 and one answers 1 (FAIL!). 2 of 3 agree, so the result is 1+1=2.

• Increasing transistor density reduces the cost of redundancy

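The "2 of 3 agree" picture can be sketched as a majority vote over replicated results. This is a minimal illustration with a hypothetical `majority_vote` helper; it says nothing about how any particular hardware implements voting.

```python
from collections import Counter


def majority_vote(results):
    """Return the value reported by a strict majority of replicas, else raise."""
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise ValueError("no majority among replicas")
    return value


# Three replicas compute 1 + 1; the third one is faulty.
replica_outputs = [2, 2, 1]
voted = majority_vote(replica_outputs)
print(voted)
```

One faulty replica is outvoted by the other two; if all three disagreed, the vote would fail instead of silently returning a wrong answer.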
Great Idea #6: Dependability via Redundancy

• Applies to everything from datacenters to storage to memory to instructors
– Redundant datacenters, so that we can lose 1 datacenter but Internet service stays online
– Redundant disks, so that we can lose 1 disk but not lose data (Redundant Arrays of Independent Disks/RAID)
– Redundant memory bits, so that we can lose 1 bit but no data (Error Correcting Code/ECC Memory)

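One common way to survive the loss of a single disk is to store the XOR of the data blocks on an extra parity disk. The sketch below assumes that single-parity scheme; the slide does not say which RAID level it has in mind, and the block contents and the `xor_blocks` helper are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Three data disks plus one parity disk.
data = [b"comp", b"arch", b"2021"]
parity = xor_blocks(data)

# Disk 1 fails: XOR the surviving data blocks with the parity block.
recovered = xor_blocks([data[0], data[2], parity])
print(recovered)
```

XOR-ing the survivors with the parity cancels every block except the missing one, so any single lost disk can be rebuilt at the cost of one extra disk.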
