CS-506 ADVANCED COMPUTER
SYSTEMS ARCHITECTURE
Lecture #5
Gul Munir Ujjan
Assistant Professor
CISE Department, NEDUET
Karachi
Memory Hierarchy
&
Storage Types
Memory Systems
Flat Memory vs. Memory Hierarchy
Memory hierarchy: multiple levels of storage, each optimized for its assigned task.
In a memory hierarchy:
performance should approach that of the fastest component,
cost per bit should approach that of the cheapest component,
and energy consumption per access should approach that of the least power-hungry component.
Memory Hierarchy System
[Figure: the processor (control + datapath) connected to a chain of memory levels.
Speed: fastest (nearest the processor) to slowest.
Size: smallest to biggest.
Cost: highest to lowest.]
Memory Hierarchy Principles
DRAM and the CPU complement each other: DRAM offers capacity, the CPU offers speed.
Organize memory in a hierarchy, based on:
the concept of caching (to be discussed later under cache memory), and
the principle of locality.
Principle of Locality
Advantages of the principle of locality:
an average access speed very close to that offered by the fastest technology, and
a cost per bit very close to that of the cheapest technology.
Memory Hierarchy System
Locality of Reference
We think linearly (in steps), and so we program the computer to solve problems by working in steps.
The practical implication is that a computer's use of the memory system tends to be non-random and highly predictable.
The concept of locality of reference is so named because memory references tend to be localized in time and space:
If you use something once, you are likely to use it again (temporal locality).
If you use something once, you are likely to use its neighbor (spatial locality).
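The two kinds of locality show up in even the simplest code. An illustrative sketch (not from the slides):

```python
# Summing an array exhibits both kinds of locality.
def sum_array(a):
    total = 0             # 'total' is reused every iteration: temporal locality
    for i in range(len(a)):
        total += a[i]     # a[0], a[1], ... are neighbors in memory: spatial locality
    return total

print(sum_array([1, 2, 3, 4]))  # 10
```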
Levels of the Memory Hierarchy
(Upper level: faster, smaller, costlier. Lower level: slower, larger, cheaper. Each level lists capacity, access time, cost per bit, who stages data into it, and the transfer unit.)
Registers: 100s of bytes, <1 ns, $0.1-0.01/bit; staged by the program/compiler; transfer unit: instruction operands (1-8 bytes).
Cache: K bytes, 1-10 ns, $0.01-0.001/bit; staged by the cache controller; transfer unit: blocks (8-128 bytes).
Main memory: M bytes, 10-100 ns, $0.001-0.0001/bit; staged by the OS; transfer unit: pages (512-4K bytes).
Disk: G bytes, 1-10 ms, fractions of a cent per bit; staged by the user/operator; transfer unit: files (Mbytes).
Tape: effectively infinite capacity, access time in seconds; the cheapest storage per bit.
Memory Hierarchy Pyramid
(Smaller, faster, and costlier per-byte storage devices sit toward the top; larger, slower, and cheaper per-byte storage devices toward the bottom.)
L0: CPU registers hold words retrieved from the L1 cache.
L1: on-chip L1 cache (SRAM) holds cache lines retrieved from the L2 cache.
L2: off-chip L2 cache (SRAM) holds cache lines retrieved from main memory.
L3: main memory (DRAM) holds disk blocks retrieved from local disks.
L4: local secondary storage (local disks) holds files retrieved from disks on remote network storage servers.
L5: remote secondary storage (distributed file systems, Web servers).
Hierarchy List
Level 0: Register file (datapath)
Level 1: L1 cache (on chip)
Level 2: L2 cache (external)
Level 3: Main memory (system-board DRAM)
Level 4: Disk cache (on the disk drive)
Level 5: Disk (magnetic disk)
Level 6: Optical (CDs etc., bulk storage)
Level 7: Tape (huge, cheapest storage)
Storage Types in Memory System
Memory systems are classified by different attributes of their design.
Attributes:
Material: semiconductor, magnetic, optical
Accessing: random, sequential, hybrid
Store/retrieve: ROM, RAM, and RWM
Data retention: volatile and non-volatile
Storage Types in Memory System
Semiconductor memories such as registers, static RAM, and dynamic RAM have fast access speeds but are expensive.
They are used in small sizes and placed inside or closest to the processor.
Secondary storage devices, which offer huge capacity at the lowest cost per bit, are placed farthest from the processor.
Random-Access Memory (RAM)
Key features
RAM is packaged as a chip.
Basic storage unit is a cell (one bit per cell).
Multiple RAM chips form a memory module.
Types of RAM
Static RAM (SRAM): used for caches
Dynamic RAM (DRAM)
Static RAM: Basic Cell
[Figure: 6-transistor SRAM cell. The word (row select) line gates two access transistors that connect a cross-coupled inverter pair, storing 0/1, to the bit and bit-bar lines. In a 4-transistor variant, the inverter pull-ups are replaced with resistive pull-ups to save area.]
SRAM Organization: 16-word x 4-bit
[Figure: a 16 x 4 array of SRAM cells. An address decoder on A0-A3 drives word lines Word 0 through Word 15. Each of the four columns has a write driver and precharger (fed by Din 3..Din 0, gated by WrEn and Precharge) at the top and a sense amp (driving Dout 3..Dout 0) at the bottom, sharing that column's bit lines.]
Q: Which is longer: word line or bit line?
Dynamic Random Access Memory (DRAM)
Each cell stores a bit with one capacitor and one transistor.
The value must be refreshed every 10-100 ms.
Sensitive to disturbances.
Slower and cheaper than SRAM.
Basic DRAM Cell
DRAM Organization: 16 words x 8 bits
16 supercells (words) of size 8 bits.
[Figure: a 16 x 8 DRAM chip organized as 4 rows x 4 cols of supercells. The memory controller (to the CPU) sends a 2-bit address (addr) and transfers 8 bits of data; supercell (2,1) is highlighted; an internal row buffer holds the active row.]
Reading DRAM Supercell (2,1)
Step 1(a): Row access strobe (RAS) selects row 2.
Step 1(b): Row 2 copied from DRAM array to row buffer.
[Figure: RAS = 2 is placed on the 2-bit address lines; row 2 of the 16 x 8 DRAM array is copied into the internal row buffer.]
Reading DRAM Supercell (2,1)
Step 2(a): Column access strobe (CAS) selects column 1.
Step 2(b): Supercell (2,1) copied from buffer to data lines, and
eventually back to the CPU.
[Figure: CAS = 1 is placed on the address lines; supercell (2,1) is copied from the internal row buffer onto the 8-bit data lines, back to the memory controller and eventually the CPU.]
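The two-step RAS/CAS access amounts to splitting a supercell number into a (row, col) pair. A minimal sketch for the 16 x 8 chip above (4 x 4 supercell array; the function name is illustrative):

```python
# Split a supercell number into the (row, col) pair sent as RAS then CAS.
NUM_COLS = 4                      # the 16 x 8 chip has a 4 x 4 supercell array

def ras_cas(supercell):
    row = supercell // NUM_COLS   # sent first, with RAS asserted
    col = supercell % NUM_COLS    # sent second, with CAS asserted
    return row, col

# Supercell (2,1) in the figure is supercell number 2*4 + 1 = 9.
print(ras_cas(9))  # (2, 1)
```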
64 MB DRAM Memory Module [8 x 8 MB DRAM Chips]
[Figure: the memory controller broadcasts addr (row = i, col = j) to eight chips, DRAM 0 through DRAM 7; each returns its supercell (i,j). Chip 0 supplies bits 0-7 of the result, chip 1 bits 8-15, ..., chip 7 bits 56-63, forming the 64-bit doubleword at main memory address A.]
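How the controller assembles the doubleword can be sketched as follows (an illustrative sketch: chip k supplies bits 8k to 8k+7, as in the figure):

```python
# Assemble a 64-bit doubleword from the bytes returned by eight DRAM chips.
def assemble_doubleword(bytes_from_chips):
    word = 0
    for k, b in enumerate(bytes_from_chips):   # chip k -> bits 8k..8k+7
        word |= b << (8 * k)
    return word

# Chip 0 supplies the low-order byte, chip 7 the high-order byte.
print(hex(assemble_doubleword([0x11, 0, 0, 0, 0, 0, 0, 0x77])))
# 0x7700000000000011
```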
Nonvolatile Memories
DRAM and SRAM are volatile memories: they lose their information if powered off.
Nonvolatile memories retain their values even when powered off.
Their generic name is read-only memory (ROM), which is misleading because some ROMs can be both read and modified.
Nonvolatile Memories
Types of ROMs
Programmable ROM (PROM)
Erasable Programmable ROM (EPROM)
Electrically Erasable PROM (EEPROM)
Flash memory
Firmware
A program stored in a ROM.
Examples: boot-time code such as the BIOS (basic input/output system), and firmware for graphics cards and disk controllers.
Disk Storage - Disk Geometry
[Figure: a disk surface consists of concentric tracks; each track (e.g., track k) is divided into sectors separated by gaps; the platters rotate about a central spindle.]
Disk Geometry (Multiple-Platter View)
Aligned tracks form a cylinder.
[Figure: three platters (0-2) provide six surfaces (0-5); the aligned tracks across all surfaces form cylinder k; all platters rotate on one spindle.]
Disk Operation (Multi-Platter View)
[Figure: the read/write heads, one per surface, are mounted on a common arm and move in unison from cylinder to cylinder; the platters rotate about the spindle.]
Concept of Caching
A cache is a staging area or temporary place used to:
store a frequently used subset of the data or instructions from the relatively cheaper, larger, and slower memory; and
avoid having to go to the main memory every time this information is needed.
Caching and Memory Hierarchy
A memory device of a different type is used at each value of k, the device level:
the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1.
Programs tend to access the data or instructions at level k more often than they access the data at level k+1.
Caching and Memory Hierarchy
Storage at level k+1 can be slower, but larger and cheaper per bit.
The net effect is a large pool of memory that costs as little as the cheap storage at the highest level (near the bottom of the hierarchy),
yet serves data or instructions at the rate of the fast storage at the lowest level (near the top of the hierarchy).
Intel Processor Cache
80386: no on-chip cache
80486: 8 KB on-chip cache
Pentium (all versions): two on-chip L1 caches (data & instructions)
Pentium 4:
two 8 KB L1 caches
256 KB L2 cache, feeding both L1 caches
Cache Devices
The cache device is a small SRAM made directly accessible to the processor.
DRAM, which is accessible by the cache as well as by the user or programmer, is placed at the next level as the main memory.
Larger storage, such as disk, is placed beyond the main memory.
Cache Organization
Caching in a Memory Hierarchy
[Figure: level k is the smaller, faster, more expensive device; it caches a subset of the blocks (say 4 blocks, e.g., blocks 4, 9, 10, and 14) from level k+1. Level k+1 is the larger, slower, cheaper storage device, partitioned into blocks (say 16 blocks, numbered 0-15). Data is copied between levels in block-sized transfer units.]
Memory Hierarchy Basics
Four Memory Hierarchy Questions
Q1: Where can a block be placed in the upper level?
(block placement)
Q2: How is a block found if it is in the upper level?
(block identification)
Q3: Which block should be replaced on a miss?
(block replacement)
Q4: What happens on a write?
(write strategy)
Memory Hierarchy Basics
Q1: Where can a block be placed in a cache? (block placement)
• Direct mapped: only one place.
• Fully associative: anywhere in the cache.
• Set associative: a block is mapped to a set and can then be placed anywhere within that set.
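The three placement policies reduce to simple index calculations. A sketch with assumed cache parameters (8 block frames, 2-way sets):

```python
# Where may block address B be placed? (assumed: 8-block cache, 2-way sets)
NUM_BLOCKS = 8
ASSOC = 2
NUM_SETS = NUM_BLOCKS // ASSOC   # 4 sets

def direct_mapped_frame(block_addr):
    return block_addr % NUM_BLOCKS    # exactly one legal frame

def set_index(block_addr):
    return block_addr % NUM_SETS      # any of the ASSOC ways in this set

# Fully associative: any of the NUM_BLOCKS frames is legal.
print(direct_mapped_frame(12), set_index(12))  # 4 0
```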
Memory Hierarchy Basics
Q2: How is a block found if it is in the cache?
Caches have an address tag on each block frame that
gives the block address.
The tag of every cache block that might contain the
desired information is checked to see if it matches the
block address from the processor.
All possible tags are searched in parallel because speed is
critical.
A valid bit is added to the tag to tell whether or not the entry contains a valid address.
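Concretely, the cache splits each address into a tag, a set index, and a block offset, then compares the stored tag at that index with the address tag. A sketch with assumed parameters (64-byte blocks, 128 sets):

```python
# Split a byte address into (tag, index, offset) for a cache lookup.
BLOCK_SIZE = 64   # bytes per block -> 6 offset bits (assumed)
NUM_SETS = 128    #                 -> 7 index bits (assumed)

def split_address(addr):
    offset = addr % BLOCK_SIZE
    index = (addr // BLOCK_SIZE) % NUM_SETS
    tag = addr // (BLOCK_SIZE * NUM_SETS)
    return tag, index, offset

# A hit: the tag stored at 'index' matches and its valid bit is set.
print(split_address(0x12345))  # (9, 13, 5)
```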
Memory Hierarchy Basics
Q3: Which block should be replaced on a cache miss? (block replacement)
For direct-mapped caches the choice is trivial: only one place.
For set-associative and fully associative caches:
Random
► Candidate blocks are randomly selected. Some systems generate pseudorandom block numbers to get reproducible behavior.
Least Recently Used (LRU)
► The block replaced is the one that has been unused for the longest time. If recently used blocks are likely to be used again, then a good candidate for disposal is the LRU block.
Other policies: First In First Out (FIFO), Most Recently Used (MRU), and Least Frequently Used (LFU).
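The LRU policy can be simulated in a few lines. A sketch for one fully associative cache (the capacity and reference string are illustrative):

```python
# Count misses under LRU replacement for a fully associative cache.
def lru_misses(refs, capacity=3):
    cache = []                     # least recently used block at the front
    misses = 0
    for block in refs:
        if block in cache:
            cache.remove(block)    # hit: will re-append as most recent
        else:
            misses += 1
            if len(cache) == capacity:
                cache.pop(0)       # evict the LRU block
        cache.append(block)        # mark as most recently used
    return misses

print(lru_misses([1, 2, 3, 1, 4, 1, 2]))  # 5
```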
Memory Hierarchy Basics
What happens on a write?
There are two basic options when writing to the cache:
Write-through
The information is written to both the block in the cache
and to the block in the lower-level memory.
Write-back
The information is written only to the block in the cache.
The modified cache block is written to main memory
only when it is replaced.
Reads dominate processor cache accesses.
All instruction accesses are reads, and most
instructions don’t write to memory.
Write Strategy: Pros and Cons of Each
Write Back
No writes to the lower level for repeated writes to the same cache block.
A dirty bit is commonly used to indicate the status of the cache block: modified (dirty) or not modified (clean).
Reduces memory-bandwidth requirements, and hence memory power requirements.
Write Strategy: Pros and Cons of Each
Write Through
Simplifies the replacement procedure
The block is always clean.
When the processor must wait for writes to complete
during write-through, the processor is said to write stall.
A common optimization to reduce write stalls is a write
buffer, which allows the processor to continue as soon as
the data are written to the buffer, thereby overlapping
processor execution with memory updating.
Write Buffer for Write Through
[Figure: the processor writes into the cache and into a write buffer; the write buffer drains into DRAM.]
Write Buffer for Write Through
The write buffer is just a FIFO; a typical number of entries is 4.
Once the data is written into the write buffer (and assuming a cache hit), the CPU is done with the write.
The memory controller then moves the write buffer's contents to the real memory behind the scenes.
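That behavior can be sketched as follows (illustrative class and method names; a real buffer drains asynchronously while the CPU keeps running):

```python
from collections import deque

# A 4-entry FIFO write buffer between the cache and DRAM.
class WriteBuffer:
    def __init__(self, entries=4):
        self.fifo = deque()
        self.entries = entries
        self.memory = {}                  # stands in for DRAM

    def cpu_write(self, addr, value):
        if len(self.fifo) == self.entries:
            self.drain_one()              # buffer full: the CPU would stall here
        self.fifo.append((addr, value))   # otherwise the CPU continues at once

    def drain_one(self):
        addr, value = self.fifo.popleft() # memory controller empties the FIFO
        self.memory[addr] = value

wb = WriteBuffer()
for a in range(5):                        # the 5th write forces one drain
    wb.cpu_write(a, a * 10)
print(len(wb.fifo), wb.memory)  # 4 {0: 0}
```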
Cache Optimizations
Average memory access time
= Hit time + (Miss rate × Miss penalty)
Six Basic Cache Optimizations
Reducing the miss rate
larger block size,
larger cache size, and
higher associativity
Reducing the miss penalty
multilevel caches and
giving reads priority over writes
Reducing the time to hit in the cache
avoiding address translation when indexing the cache
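The average-memory-access-time formula above is easy to check with a quick calculation (the numbers are assumed for illustration):

```python
# Average memory access time (AMAT) for a single-level cache.
def amat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

# 1 ns hit time, 5% miss rate, 100 ns miss penalty:
print(amat(1.0, 0.05, 100.0))  # about 6 ns
```

Each of the six optimizations attacks one of these three terms: larger blocks, larger caches, and higher associativity lower the miss rate; multilevel caches and read priority lower the miss penalty; avoiding address translation lowers the hit time.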