Introduction to Cache Memory
• Let us consider a single-level cache, i.e. the part of the memory hierarchy
consisting of the cache memory and main memory.
• Cache memory is logically divided into blocks or lines, where every block
(line) typically contains 8 to 256 bytes.
• When the CPU wants to access a word in memory, a special hardware first
checks whether it is present in cache memory.
o If so (called cache hit), the word is directly accessed from the cache memory.
o If not (called a cache miss), the block containing the requested word is
brought from main memory into the cache.
o For writes, the CPU may sometimes also write directly to main memory.
• The objective is to keep the commonly used blocks in the cache memory.
o This results in significantly improved performance due to the property of
locality of reference.
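• To make the hit/miss flow concrete, the following is a minimal C sketch (not
part of the original notes) of a toy one-block cache; all names (cpu_read,
main_mem, etc.) are illustrative.

    #include <stdint.h>

    #define BLOCK_WORDS 32             /* words per block (line)        */
    #define MM_WORDS    1024           /* toy main memory for the demo  */

    static uint32_t main_mem[MM_WORDS];
    static uint32_t cache_line[BLOCK_WORDS];
    static int32_t  cached_block = -1; /* which MM block the line holds */

    /* On a hit the word comes straight from the cache line; on a miss
       the whole 32-word block is first copied in from main memory. */
    uint32_t cpu_read(uint32_t addr)
    {
        uint32_t block = addr / BLOCK_WORDS;
        if ((int32_t)block != cached_block) {      /* cache miss */
            for (int w = 0; w < BLOCK_WORDS; w++)
                cache_line[w] = main_mem[block * BLOCK_WORDS + w];
            cached_block = block;
        }
        return cache_line[addr % BLOCK_WORDS];     /* hit path   */
    }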
Q1. Where can a block be placed in the cache?
• This is determined by a block mapping algorithm.
o Specifies which main memory blocks can reside in which cache memory blocks.
o At any given time, only a small subset of the main memory blocks can be held
in the cache.
• Three common block mapping techniques are used:
a) Direct Mapping
b) Associative Mapping
c) (N-way) Set Associative Mapping
• These algorithms will be explained with the help of an example.
Example: A 2-level memory hierarchy
• Consider a 2-level cache memory / main memory hierarchy.
o The cache memory consists of 256 blocks (lines) of 32 words each.
▪ Total cache size is 8192 (8K) words.
o Main memory is addressable by a 24-bit address.
▪ Total size of the main memory is 2^24 = 16M words.
▪ Number of 32-word blocks in main memory = 16 M / 32 = 512K
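• As a quick check of the arithmetic, the following C sketch (illustrative,
not from the notes) derives the TAG widths used by the three mappings below:

    #include <stdio.h>

    int main(void)
    {
        int addr_bits  = 24;  /* main memory: 2^24 = 16M words    */
        int word_bits  = 5;   /* 32 words per block  = 2^5        */
        int block_bits = 8;   /* 256 cache blocks    = 2^8        */
        int set_bits   = 6;   /* 64 sets of 4 blocks = 2^6        */

        printf("Direct mapping TAG      = %d bits\n",
               addr_bits - block_bits - word_bits);          /* 11 */
        printf("Associative mapping TAG = %d bits\n",
               addr_bits - word_bits);                       /* 19 */
        printf("4-way set assoc. TAG    = %d bits\n",
               addr_bits - set_bits - word_bits);            /* 13 */
        return 0;
    }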
(a) Direct Mapping
• Each main memory block can be placed in only one block in the cache.
• The mapping function is:
Cache Block = (Main Memory Block) % (Number of cache blocks)
• For the example,
Cache Block = (Main Memory Block) % 256
• Some example mappings:
0 -> 0, 1 -> 1, 255 -> 255, 256 -> 0, 257 -> 1, 512 -> 0, 513 -> 1, etc.
• Block replacement algorithm is trivial, as there is no choice.
• More than one MM block is mapped onto the same cache block.
o May lead to contention even if the cache is not full.
o New block will replace the old block.
o May lead to poor performance if both the blocks are frequently used.
• The MM address is divided into three fields: TAG, BLOCK and WORD.
o When a new block is loaded into the cache, the 8-bit BLOCK field determines the
cache block where it is to be stored.
o The high-order 11 bits are stored in a TAG register associated with the cache block.
o When accessing a memory word, the TAG field of the address is compared with
the TAG register of the selected cache block.
▪ A match implies a HIT.
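• A minimal C sketch of the direct-mapped lookup, using the field widths from
the example (WORD = 5 bits, BLOCK = 8 bits, TAG = 11 bits); names like tag_reg
are illustrative, and the valid bit discussed under Q2 is omitted for brevity:

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_BLOCKS 256                 /* cache blocks (lines)       */

    static uint16_t tag_reg[NUM_BLOCKS];   /* 11-bit TAG per cache block */

    /* The 8-bit BLOCK field selects exactly one cache block, so a single
       TAG comparison decides hit or miss. */
    bool direct_mapped_hit(uint32_t addr)  /* addr: 24-bit word address  */
    {
        uint32_t block = (addr >> 5) & 0xFF;     /* skip WORD, take BLOCK */
        uint16_t tag   = (uint16_t)(addr >> 13); /* high-order 11 bits    */
        return tag_reg[block] == tag;
    }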
(b) Associative Mapping
• Here, a MM block can potentially reside in any cache block position.
• The memory address is divided into two fields: TAG and WORD.
o When a block is loaded into the cache from MM, the high-order 19 bits of the
address are stored in the TAG register corresponding to the cache block.
o When accessing memory, the 19-bit TAG field of the address is compared with
all the TAG registers corresponding to all the cache blocks.
• Requires associative memory for storing the TAG values.
o High cost / lack of scalability.
• Because of complete freedom in block positioning, a wide range of replacement
algorithms is possible.
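• A minimal C sketch of the associative lookup; note that the sequential loop
only models what the hardware does in parallel over all TAG registers:

    #include <stdint.h>

    #define NUM_BLOCKS 256

    static uint32_t tag_reg[NUM_BLOCKS];   /* 19-bit TAG per cache block */

    /* The 19-bit TAG of the address must be compared against the TAG
       registers of ALL blocks; hardware uses an associative memory to do
       this in parallel, so the loop is only a software model. */
    int associative_find(uint32_t addr)
    {
        uint32_t tag = addr >> 5;          /* drop the 5-bit WORD field */
        for (int i = 0; i < NUM_BLOCKS; i++)
            if (tag_reg[i] == tag)
                return i;                  /* hit: block position       */
        return -1;                         /* miss                      */
    }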
(c) N-way Set Associative Mapping
• A group of N consecutive blocks in the cache is called a set.
• This algorithm strikes a balance between direct mapping and associative mapping.
o Like direct mapping, a MM block is mapped to a set.
Set Number = (MM Block Number) % (Number of Sets in Cache)
o The block can be placed anywhere within the set (there are N choices)
• The value of N is a design parameter:
o N = 1 :: same as direct mapping.
o N = number of cache blocks :: same as associative mapping.
o Typical values of N used in practice are: 2, 4 or 8.
• Illustration for N = 4:
o Number of sets in cache memory = 64.
o Memory blocks are mapped to a set using modulo-64 operation.
o Example: MM blocks 0, 64, 128, etc. all map to set 0, where they can occupy
any of the four available positions.
• MM address is divided into three fields: TAG, SET and WORD.
o The TAG field of the address must be associatively compared to the TAG fields
of the 4 blocks of the selected set.
o Thus, instead of requiring a single large associative memory, we need a
number of very small associative memories, only one of which is used at a time.
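• A minimal C sketch of the 4-way set-associative lookup (SET = 6 bits,
TAG = 24 - 6 - 5 = 13 bits); names are illustrative:

    #include <stdint.h>

    #define NUM_SETS 64
    #define WAYS      4

    static uint32_t tag_reg[NUM_SETS][WAYS];   /* 13-bit TAG per block */

    /* The 6-bit SET field selects one set; only the 4 TAGs of that set
       are compared associatively. */
    int set_assoc_find(uint32_t addr)
    {
        uint32_t set = (addr >> 5) & 0x3F;     /* 6-bit SET field      */
        uint32_t tag = addr >> 11;             /* high-order 13 bits   */
        for (int w = 0; w < WAYS; w++)
            if (tag_reg[set][w] == tag)
                return w;                      /* hit: way within set  */
        return -1;                             /* miss                 */
    }

Note that the inner comparison involves only 4 TAGs, which is why each per-set
associative memory can be very small.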
Q2. How is a block found if present in cache?
• Caches include a TAG associated with each cache block.
o The TAG of every cache block that may hold the requested block needs to be
compared with the TAG field of the MM address.
o All the possible tags are compared in parallel, as speed is important.
• Comparisons required by the mapping algorithms:
o Direct mapping requires a single comparison.
o Associative mapping requires a full associative search over all the TAGs
corresponding to all cache blocks.
o Set associative mapping requires a limited associative search over the TAGs
of only the selected set.
• Use of valid bit:
o There must be a way to know whether a cache block contains valid or garbage
information.
o A valid bit can be added to the TAG, which indicates whether the block
contains valid data.
o If the valid bit is not set, there is no need to match the corresponding TAG.
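• The earlier set-associative sketch can be extended with a valid bit (again
illustrative, with hypothetical names):

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_SETS 64
    #define WAYS      4

    static uint32_t tag_reg[NUM_SETS][WAYS];
    static bool     valid[NUM_SETS][WAYS];     /* garbage until first fill */

    /* Same lookup as before, but a TAG is matched only when its block
       actually holds valid data. */
    int set_assoc_find_valid(uint32_t addr)
    {
        uint32_t set = (addr >> 5) & 0x3F;
        uint32_t tag = addr >> 11;
        for (int w = 0; w < WAYS; w++)
            if (valid[set][w] && tag_reg[set][w] == tag)
                return w;
        return -1;
    }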
Q3. Which block should be replaced on a cache miss?
• With fully associative or set associative mapping, there can be several blocks to
choose from for replacement when a miss occurs.
• Two primary strategies are used:
a) Random: The candidate block is selected randomly for replacement. This
simple strategy tends to spread allocation uniformly.
b) Least Recently Used (LRU): The block replaced is the one that has not been
used for the longest period of time.
o Makes use of a corollary of temporal locality:
“If recently used blocks are likely to be used again, then the best candidate
for replacement is the least recently used block”
• To implement the LRU algorithm, the cache controller must track the LRU block as the
computation proceeds.
• Example: Consider a 4-way set associative cache.
o For tracking the LRU block within a set, we use a 2-bit counter with every block.
o When hit occurs:
▪ Counter of the referenced block is reset to 0.
▪ Counters with values originally lower than the referenced one are incremented by
1, and all others remain unchanged.
o When miss occurs:
▪ If the set is not full, the counter associated with the new block loaded is set to 0,
and all other counters are incremented by 1.
▪ If the set is full, the block with counter value 3 is removed, the new block put in
its place, and the counter set to 0. The other three counters are incremented by 1.
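• A minimal C sketch of the 2-bit counter scheme for one set (illustrative; it
assumes the set is already full, so the miss path always evicts the block
whose counter is 3):

    #include <stdint.h>

    #define WAYS 4

    static uint8_t ctr[WAYS];   /* 2-bit counters: 0 = MRU, 3 = LRU */

    /* Hit at way h: counters lower than ctr[h] are incremented by 1,
       ctr[h] is reset to 0, and all others stay unchanged. */
    void lru_hit(int h)
    {
        for (int w = 0; w < WAYS; w++)
            if (ctr[w] < ctr[h])
                ctr[w]++;
        ctr[h] = 0;
    }

    /* Miss in a full set: the block whose counter is 3 is replaced, its
       counter becomes 0, and the other three counters are incremented. */
    int lru_miss(void)
    {
        int victim = 0;
        for (int w = 0; w < WAYS; w++)
            if (ctr[w] == 3)
                victim = w;
        for (int w = 0; w < WAYS; w++)
            if (w != victim)
                ctr[w]++;
        ctr[victim] = 0;        /* new block is now most recently used */
        return victim;
    }

Once the set is full, the four counters always hold a permutation of
{0, 1, 2, 3}, so exactly one block has counter value 3 at any time.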