Cache Memory
Cache
• Small amount of fast memory
• Sits between normal main memory and CPU
• May be located on CPU chip or module
Cache and Main Memory
Levels of Memory
• Level 1 or Registers: memory located inside the CPU itself, in which data is stored and operated on immediately. Commonly used registers include the Accumulator, Program Counter, and Address Register.
• Level 2 or Cache memory: very fast memory with a short access time, where data is temporarily stored for faster access.
• Level 3 or Main Memory: the memory on which the computer currently works. It is small in size and volatile: once power is off, data no longer stays in this memory.
• Level 4 or Secondary Memory: external memory that is not as fast as main memory, but data stays in it permanently.
Characteristics of Cache Memory
• Cache memory is an extremely fast memory type that acts
as a buffer between RAM and the CPU.
• Cache Memory holds frequently requested data and
instructions so that they are immediately available to the
CPU when needed.
• Cache memory is costlier than main memory or disk
memory but more economical than CPU registers.
• Cache memory is used to speed up memory access and keep pace with a high-speed CPU.
Cache operation – overview
• CPU requests contents of a memory location
• Check cache for this data
• If present, get from cache (fast)
• If not present, read required block from main memory to cache
• Then deliver from cache to CPU
• Cache includes tags to identify which block of main memory is in each cache slot
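A minimal sketch of this flow in C, assuming a direct-mapped organization (detailed later) with the 64 KByte cache / 4-byte block / 16 MByte memory geometry used throughout these slides; the arrays and function names are illustrative, not a real hardware interface.

```c
#include <stdint.h>
#include <string.h>

#define NUM_LINES  16384                /* 16K cache slots              */
#define BLOCK_SIZE 4                    /* 4-byte blocks                */

static uint32_t tags[NUM_LINES];        /* which block is in each slot  */
static int      valid[NUM_LINES];
static uint8_t  cache_data[NUM_LINES][BLOCK_SIZE];
static uint8_t  main_memory[1 << 24];   /* 16 MByte main memory         */

uint8_t cache_read(uint32_t addr)
{
    uint32_t word = addr & 0x3;                     /* byte within block */
    uint32_t line = (addr >> 2) & (NUM_LINES - 1);  /* cache slot        */
    uint32_t tag  = addr >> 16;                     /* identifies block  */

    if (!valid[line] || tags[line] != tag) {
        /* not present: read required block from main memory to cache */
        memcpy(cache_data[line], &main_memory[addr & ~0x3u], BLOCK_SIZE);
        tags[line]  = tag;
        valid[line] = 1;
    }
    /* present (now): deliver from cache to CPU */
    return cache_data[line][word];
}

int main(void)
{
    main_memory[0x123456] = 42;
    return cache_read(0x123456) == 42 ? 0 : 1;   /* miss, fill, then hit */
}
```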
Cache Read Operation – Flowchart
Cache Addressing
• Where does cache sit?
– Between processor and virtual memory management unit (MMU)
– Between MMU and main memory
• Logical cache (virtual cache) stores data using virtual addresses
– Processor accesses cache directly, without going through the MMU
– Cache access is faster, before MMU address translation
– Virtual addresses use the same address space for different applications
• Must flush cache on each context switch
• Physical cache stores data using main memory physical addresses
Size does matter
• Cost
– More cache is expensive
• Speed
– More cache is faster (up to a point)
– Checking cache for data takes time
Cache/Main Memory Structure
Typical Cache Organization
Comparison of Cache Sizes
Processor         Type                            Year of Introduction   L1 cache        L2 cache         L3 cache
IBM 360/85        Mainframe                       1968                   16 to 32 KB     —                —
PDP-11/70         Minicomputer                    1975                   1 KB            —                —
VAX 11/780        Minicomputer                    1978                   16 KB           —                —
IBM 3033          Mainframe                       1978                   64 KB           —                —
IBM 3090          Mainframe                       1985                   128 to 256 KB   —                —
Intel 80486       PC                              1989                   8 KB            —                —
Pentium           PC                              1993                   8 KB/8 KB       256 to 512 KB    —
PowerPC 601       PC                              1993                   32 KB           —                —
PowerPC 620       PC                              1996                   32 KB/32 KB     —                —
PowerPC G4        PC/server                       1999                   32 KB/32 KB     256 KB to 1 MB   2 MB
IBM S/390 G4      Mainframe                       1997                   32 KB           256 KB           2 MB
IBM S/390 G6      Mainframe                       1999                   256 KB          8 MB             —
Pentium 4         PC/server                       2000                   8 KB/8 KB       256 KB           —
IBM SP            High-end server/supercomputer   2000                   64 KB/32 KB     8 MB             —
CRAY MTA          Supercomputer                   2000                   8 KB            2 MB             —
Itanium           PC/server                       2001                   16 KB/16 KB     96 KB            4 MB
SGI Origin 2001   High-end server                 2001                   32 KB/32 KB     4 MB             —
Itanium 2         PC/server                       2002                   32 KB           256 KB           6 MB
IBM POWER5        High-end server                 2003                   64 KB           1.9 MB           36 MB
CRAY XD-1         Supercomputer                   2004                   64 KB/64 KB     1 MB             —
Mapping Function
• Cache of 64 KByte
• Cache block of 4 bytes
– i.e. cache is 16K (2^14) lines of 4 bytes
• 16 MBytes main memory
• 24-bit address
– (2^24 = 16M)
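The field widths used in the mapping examples that follow can be derived from these sizes; a small sketch (the variable names are ours, not from the slides):

```c
#include <stdio.h>

int main(void)
{
    unsigned cache_bytes = 64 * 1024;   /* 64 KByte cache       */
    unsigned block_bytes = 4;           /* 4-byte blocks        */
    unsigned addr_bits   = 24;          /* 16 MBytes = 2^24     */

    unsigned lines = cache_bytes / block_bytes;   /* 16384 = 2^14   */
    unsigned w = 0, r = 0;
    while ((1u << w) < block_bytes) w++;          /* word bits:  2  */
    while ((1u << r) < lines)       r++;          /* line bits:  14 */
    unsigned s = addr_bits - w;                   /* block bits: 22 */

    printf("lines=%u w=%u r=%u s=%u tag=%u\n", lines, w, r, s, s - r);
    return 0;
}
```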
Direct Mapping
• Each block of main memory maps to only one cache line
– i.e. if a block is in cache, it must be in one specific place
• Address is in two parts
• Least significant w bits identify a unique word
• Most significant s bits specify one memory block
• The MSBs are split into a cache line field r and a tag of s–r (most significant)
Direct Mapping Address Structure

Tag (s–r)   Line or Slot (r)   Word (w)
8 bits      14 bits            2 bits
• 24-bit address
• 2-bit word identifier (4-byte block)
• 22-bit block identifier
– 8-bit tag (= 22 – 14)
– 14-bit slot or line
• No two blocks that map to the same line have the same Tag field
• Check contents of cache by finding the line and comparing the Tag
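A one-screen C sketch of this 8/14/2 split; the address 0x16339C is just an example 24-bit value:

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t addr = 0x16339C;               /* any 24-bit address */
    unsigned word = addr & 0x3;             /* low 2 bits         */
    unsigned line = (addr >> 2) & 0x3FFF;   /* next 14 bits       */
    unsigned tag  = (addr >> 16) & 0xFF;    /* top 8 bits         */

    /* prints: tag=16 line=0CE7 word=0 */
    printf("tag=%02X line=%04X word=%X\n", tag, line, word);
    return 0;
}
```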
Direct Mapping Cache Line Table

Cache line   Main memory blocks held
0            0, m, 2m, 3m, …, 2^s – m
1            1, m+1, 2m+1, …, 2^s – m + 1
…            …
m–1          m–1, 2m–1, 3m–1, …, 2^s – 1
Direct Mapping Cache Organization
Direct Mapping Example
Direct Mapping Summary
• Address length = (s + w) bits
• Number of addressable units = 2^(s+w) words or bytes
• Block size = line size = 2^w words or bytes
• Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
• Number of lines in cache = m = 2^r
• Size of tag = (s – r) bits
Direct Mapping pros & cons
• Simple
• Inexpensive
• Fixed location for given block
– If a program repeatedly accesses 2 blocks that map to the same line, cache misses are very high (see the sketch below)
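A sketch of that conflict: under the 8/14/2 layout above, two addresses 64 KB apart differ only in the tag bits, so they compete for the same cache line and evict each other on every alternating access. The example addresses are arbitrary.

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t a = 0x000004, b = 0x010004;   /* 64 KB apart          */
    printf("line(a)=%04X line(b)=%04X\n",  /* both print 0001      */
           (unsigned)((a >> 2) & 0x3FFF),
           (unsigned)((b >> 2) & 0x3FFF));
    return 0;
}
```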
Associative Mapping
• A main memory block can load into any line of cache
• Memory address is interpreted as tag and word
• Tag uniquely identifies a block of memory
• Every line's tag is examined for a match
• Cache searching gets expensive
Fully Associative Cache Organization
Associative Mapping Example
Associative Mapping Address Structure

Tag (22 bits)   Word (2 bits)

• 22-bit tag stored with each 32-bit block of data
• Compare tag field with tag entry in cache to check for hit
• Least significant 2 bits of address identify which byte is required from the 32-bit (4-byte) data block
• e.g.
  Address   Tag      Data       Cache line
  FFFFFC    3FFFFF   24682468   3FFF
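A sketch of the fully associative check for this example: every stored tag is compared against the 22-bit tag of the address (in hardware this happens in parallel; the loop is only a software stand-in). The array names are illustrative.

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_LINES 16384

static uint32_t tags[NUM_LINES];
static int      valid[NUM_LINES];

int find_line(uint32_t addr)
{
    uint32_t tag = addr >> 2;              /* top 22 bits of address */
    for (int i = 0; i < NUM_LINES; i++)
        if (valid[i] && tags[i] == tag)
            return i;                      /* hit                    */
    return -1;                             /* miss                   */
}

int main(void)
{
    valid[0x3FFF] = 1;
    tags[0x3FFF]  = 0xFFFFFC >> 2;         /* tag 3FFFFF, as above   */
    printf("line=%X\n", (unsigned)find_line(0xFFFFFC));  /* 3FFF     */
    return 0;
}
```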
Associative Mapping Summary
• Address length = (s + w) bits
• Number of addressable units = 2^(s+w) words or bytes
• Block size = line size = 2^w words or bytes
• Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
• Number of lines in cache = undetermined
• Size of tag = s bits
Set Associative Mapping
• Cache is divided into a number of sets
• Each set contains a number of lines
• A given block maps to any line in a given set
– e.g. Block B can be in any line of set i
• e.g. 2 lines per set
– 2-way associative mapping
– A given block can be in one of 2 lines in only one set
Set Associative Mapping Example
• 13-bit set number
• Block number in main memory is modulo 2^13
• Addresses 000000, 008000, 010000, 018000 … map to the same set (verified in the sketch below)
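A quick check of that claim, assuming 4-byte blocks as in the running example: set number = block number mod 2^13, so addresses 0x8000 bytes apart land in the same set.

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t addrs[] = { 0x000000, 0x008000, 0x010000, 0x018000 };
    for (int i = 0; i < 4; i++) {
        uint32_t block = addrs[i] >> 2;        /* block number       */
        printf("addr=%06X set=%04X\n",         /* all print set=0000 */
               (unsigned)addrs[i], (unsigned)(block & 0x1FFF));
    }
    return 0;
}
```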
Two Way Set Associative Cache Organization
Set Associative Mapping Address Structure

Tag (9 bits)   Set (13 bits)   Word (2 bits)

• Use set field to determine cache set to look in
• Compare tag field to see if we have a hit
• e.g.
  Address    Tag   Data       Set number
  1FF 7FFC   1FF   12345678   1FFF
  001 7FFC   001   11223344   1FFF
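A sketch of the 9/13/2 split; 0xFFFFFC is the 24-bit address written field-by-field as "1FF 7FFC" in the first example row (tag 1FF, set 1FFF, word 0).

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t addr = 0xFFFFFC;               /* "1FF 7FFC" above */
    unsigned word = addr & 0x3;             /* low 2 bits       */
    unsigned set  = (addr >> 2) & 0x1FFF;   /* next 13 bits     */
    unsigned tag  = (addr >> 15) & 0x1FF;   /* top 9 bits       */

    /* prints: tag=1FF set=1FFF word=0 */
    printf("tag=%03X set=%04X word=%X\n", tag, set, word);
    return 0;
}
```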
Two Way Set Associative Mapping Example
Set Associative Mapping Summary
• Address length = (s + w) bits
• Number of addressable units = 2^(s+w) words or bytes
• Block size = line size = 2^w words or bytes
• Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
• Number of lines in set = k
• Number of sets = v = 2^d
• Number of lines in cache = kv = k × 2^d
• Size of tag = (s – d) bits
Replacement Algorithms (1)
Direct mapping
• No choice
• Each block only maps to one line
• Replace that line
Replacement Algorithms (2)
Associative & Set Associative
• Hardware implemented algorithm (for speed)
• Least Recently Used (LRU)
• e.g. in 2-way set associative
– Which of the 2 blocks is LRU? (see the sketch below)
• First In First Out (FIFO)
– replace block that has been in cache longest
• Least Frequently Used (LFU)
– replace block which has had fewest hits
• Random
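For the 2-way case, a single "use" bit per set is enough to answer that question; a minimal sketch with illustrative names, using the 2^13 sets from the earlier example:

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_SETS 8192                    /* 2^13 sets              */

static uint8_t use_bit[NUM_SETS];        /* line (0/1) used last   */

void touch(unsigned set, unsigned line)  /* call on every hit/fill */
{
    use_bit[set] = (uint8_t)line;
}

unsigned victim(unsigned set)            /* line to replace on miss */
{
    return use_bit[set] ^ 1u;            /* the least recently used */
}

int main(void)
{
    touch(5, 0);                              /* line 0 of set 5 just used */
    printf("evict line %u\n", victim(5));     /* prints: evict line 1      */
    return 0;
}
```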
Write Policy
• Must not overwrite a cache block unless main memory is up to date
• Multiple CPUs may have individual caches
• I/O may address main memory directly
Write through
• All writes go to main memory as well as cache
• Multiple CPUs can monitor main memory traffic to keep local (to CPU) cache up to date
• Lots of traffic
• Slows down writes
• Remember bogus write-through caches!
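A minimal sketch of the write-through policy, reusing the illustrative direct-mapped structures from the read sketch earlier: a store updates the cache on a hit and is always forwarded to main memory, so memory never holds stale data.

```c
#include <stdint.h>

#define NUM_LINES 16384

static uint32_t tags[NUM_LINES];
static int      valid[NUM_LINES];
static uint8_t  cache_data[NUM_LINES][4];
static uint8_t  main_memory[1 << 24];

void cache_write(uint32_t addr, uint8_t value)
{
    uint32_t line = (addr >> 2) & (NUM_LINES - 1);
    uint32_t tag  = addr >> 16;

    if (valid[line] && tags[line] == tag)       /* hit: update cache   */
        cache_data[line][addr & 0x3] = value;
    main_memory[addr] = value;                  /* always write memory */
}

int main(void)
{
    cache_write(0x123456, 42);   /* memory is updated even on a miss */
    return 0;
}
```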
Line Size
• Retrieve not only the desired word but a number of adjacent words as well
• Increased block size will increase hit ratio at first
– the principle of locality
• Hit ratio will decrease as block becomes even bigger
– Probability of using newly fetched information becomes less than probability of reusing the replaced data
• Larger blocks
– Reduce number of blocks that fit in cache
– Data overwritten shortly after being fetched
– Each additional word is less local, so less likely to be needed
• No definitive optimum value has been found
• 8 to 64 bytes seems reasonable
• For HPC systems, 64- and 128-byte blocks are most common
Multilevel Caches
• High logic density enables caches on chip
– Faster than bus access
– Frees bus for other transfers
• Common to use both on-chip and off-chip cache
– L1 on chip, L2 off chip in static RAM
– L2 access much faster than DRAM or ROM
– L2 often uses separate data path
– L2 may now be on chip
– Resulting in L3 cache
• Bus access, or now on chip…
Hit Ratio (L1 & L2)
For 8-KByte and 16-KByte L1 caches
Unified v Split Caches
• One cache for data and instructions, or two: one for data and one for instructions
• Advantages of unified cache
– Higher hit rate
• Balances load of instruction and data fetches
• Only one cache to design & implement
• Advantages of split cache
– Eliminates cache contention between instruction fetch/decode unit and execution unit
• Important in pipelining