Cache Memory

The document provides an overview of cache memory, explaining its structure, operation, and mapping techniques. It details the three common mapping methods: Direct Mapping, Associative Mapping, and N-way Set Associative Mapping, along with their advantages and disadvantages. Additionally, it discusses strategies for block replacement on cache misses, emphasizing the importance of locality of reference for performance improvement.

Introduction to Cache Memory

• Let us consider a single-level cache, i.e., the part of the memory hierarchy
consisting of cache memory and main memory.
• Cache memory is logically divided into blocks or lines, where every block
(line) typically contains 8 to 256 bytes.

• When the CPU wants to access a word in memory, dedicated hardware first
checks whether it is present in cache memory.
o If so (a cache hit), the word is accessed directly from the cache memory.
o If not (a cache miss), the block containing the requested word is brought from
main memory into the cache.
o For writes, the CPU can sometimes also write directly to main memory.

• The objective is to keep the most commonly used blocks in the cache memory.


o This results in significantly improved performance due to the property of locality of
reference.
Q1. Where can a block be placed in the cache?

• This is determined by some mapping algorithms.


o Specifies which main memory blocks can reside in which cache memory blocks.
o At any given time, only a small subset of the main memory blocks can be held in
the cache.

• Three common block mapping techniques are used:


a) Direct Mapping
b) Associative Mapping
c) (N-way) Set Associative Mapping

• The algorithms will be explained with the help of an example.


Example: A 2-level memory hierarchy

• Consider a 2-level cache memory / main memory hierarchy.


o The cache memory consists of 256 blocks (lines) of 32 words each.
▪ Total cache size is 8192 (8K) words.

o Main memory is addressable by a 24-bit address.


▪ Total size of the main memory is 2^24 = 16M words.
▪ Number of 32-word blocks in main memory = 16M / 32 = 512K.
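The arithmetic above can be verified with a short sketch (word-addressable main memory assumed; the constant names are illustrative):

```python
# Verifying the example's sizes (word-addressable main memory).
CACHE_BLOCKS = 256       # cache blocks (lines)
WORDS_PER_BLOCK = 32     # words per block
ADDR_BITS = 24           # main memory address width in bits

cache_words = CACHE_BLOCKS * WORDS_PER_BLOCK   # total cache size in words
mm_words = 2 ** ADDR_BITS                      # total main memory size in words
mm_blocks = mm_words // WORDS_PER_BLOCK        # 32-word blocks in main memory

print(cache_words, mm_words, mm_blocks)        # 8192 16777216 524288
```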
(a) Direct Mapping

• Each main memory block can be placed in only one block in the cache.

• The mapping function is:


Cache Block = (Main Memory Block) % (Number of cache blocks)

• For the example,


Cache Block = (Main Memory Block) % 256

• Some example mappings:


0 -> 0, 1 -> 1, 255 -> 255, 256 -> 0, 257 -> 1, 512 -> 0, 513 -> 1, etc.
• Block replacement algorithm is trivial, as there is no choice.

• More than one MM block is mapped onto the same cache block.
o May lead to contention even if the cache is not full.
o New block will replace the old block.
o May lead to poor performance if both the blocks are frequently used.

• The MM address is divided into three fields: TAG, BLOCK and WORD.
o When a new block is loaded into the cache, the 8-bit BLOCK field determines the
cache block where it is to be stored.
o The high-order 11 bits are stored in a TAG register associated with the cache block.
o When accessing a memory word, the corresponding TAG fields are compared.
▪ Match implies HIT.
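The field split described above can be sketched as follows (5-bit WORD, 8-bit BLOCK, 11-bit TAG, matching the example; the function name is illustrative):

```python
def split_direct(addr):
    """Split a 24-bit word address into the direct-mapping fields."""
    word  = addr & 0x1F           # low 5 bits: word within a 32-word block
    block = (addr >> 5) & 0xFF    # next 8 bits: cache block index (0..255)
    tag   = addr >> 13            # high 11 bits: stored in the TAG register
    return tag, block, word

# MM block 256 starts at word address 256 * 32 = 8192; it maps to cache
# block 0 with TAG = 1, distinguishing it from MM block 0 (TAG = 0).
print(split_direct(8192))   # (1, 0, 0)
```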
(b) Associative Mapping

• Here, a MM block can potentially reside in any cache block position.

• The memory address is divided into two fields: TAG and WORD.
o When a block is loaded into the cache from MM, the higher order 19 bits of the
address are stored into the TAG register corresponding to the cache block.
o When accessing memory, the 19-bit TAG field of the address is compared with
all the TAG registers corresponding to all the cache blocks.

• Requires associative memory for storing the TAG values.


o High cost / lack of scalability.

• Because of complete freedom in block positioning, a wide range of replacement
algorithms is possible.
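A minimal sketch of the associative lookup (19-bit TAG, 5-bit WORD, as above); the hardware compares all TAG registers in parallel, which is modelled here with a sequential search:

```python
def split_assoc(addr):
    word = addr & 0x1F    # low 5 bits: word within the block
    tag  = addr >> 5      # high 19 bits: the MM block number itself
    return tag, word

def assoc_lookup(addr, tags):
    """tags: the TAG registers of all 256 cache blocks (None = empty).
    In hardware, all comparisons happen in parallel."""
    tag, word = split_assoc(addr)
    for i, t in enumerate(tags):
        if t == tag:
            return i      # hit: cache block i holds the requested word
    return None           # miss

tags = [None] * 256
tags[7] = 512             # suppose MM block 512 resides in cache block 7
print(assoc_lookup(512 * 32, tags))   # 7
```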
(c) N-way Set Associative Mapping

• A group of N consecutive blocks in the cache is called a set.

• This algorithm strikes a balance between direct mapping and associative mapping.


o Like direct mapping, a MM block is mapped to a set.
Set Number = (MM Block Number) % (Number of Sets in Cache)
o The block can be placed anywhere within the set (there are N choices)

• The value of N is a design parameter:


o N = 1 :: same as direct mapping.
o N = number of cache blocks :: same as associative mapping.
o Typical values of N used in practice are: 2, 4 or 8.
• Illustration for N = 4:
o Number of sets in cache memory = 64.
o Memory blocks are mapped to a set using modulo-64 operation.
o Example: MM blocks 0, 64, 128, etc. all map to set 0, where they can occupy
any of the four available positions.

• MM address is divided into three fields: TAG, SET and WORD.


o The TAG field of the address must be associatively compared to the TAG fields
of the 4 blocks of the selected set.
o Thus, instead of requiring a single large associative memory, we need a number
of very small associative memories, only one of which will be used at a time.
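The 4-way field split described above can be sketched as follows (5-bit WORD, 6-bit SET for the 64 sets, 13-bit TAG):

```python
def split_4way(addr):
    word = addr & 0x1F          # low 5 bits: word within the block
    s    = (addr >> 5) & 0x3F   # next 6 bits: SET field (64 sets)
    tag  = addr >> 11           # high 13 bits: TAG
    return tag, s, word

# MM blocks 0, 64, 128 all map to set 0 but carry different TAGs.
for blk in (0, 64, 128):
    print(split_4way(blk * 32))   # (0, 0, 0), (1, 0, 0), (2, 0, 0)
```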
Q2. How is a block found if present in cache?

• Caches include a TAG associated with each cache block.


o The TAG of every cache block where the block being requested may be present
needs to be compared with the TAG field of the MM address.
o All the possible tags are compared in parallel, as speed is important.

• Mapping Algorithms?
o Direct mapping requires a single comparison.
o Associative mapping requires a full associative search over all the TAGs
corresponding to all cache blocks.
o Set associative mapping requires a limited associative search over the TAGs of
only the selected set.
• Use of valid bit:
o There must be a way to know whether a cache block contains valid or garbage
information.
o A valid bit can be added to the TAG, which indicates whether the block
contains valid data.
o If the valid bit is not set, there is no need to match the corresponding TAG.
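The tag check with a valid bit can be sketched for one selected 4-way set (the entry layout here is illustrative, not a hardware description):

```python
def find_in_set(addr_tag, entries):
    """entries: (valid, tag) pairs for the 4 blocks of the selected set.
    A TAG is matched only when its valid bit is set; in hardware the
    comparisons run in parallel."""
    for way, (valid, tag) in enumerate(entries):
        if valid and tag == addr_tag:
            return way        # hit: the matching way within the set
    return None               # miss

entries = [(1, 5), (0, 5), (1, 9), (0, 0)]
print(find_in_set(5, entries))   # 0  (way 1 also holds tag 5 but is invalid)
print(find_in_set(3, entries))   # None
```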
Q3. Which block should be replaced on a cache miss?

• With fully associative or set associative mapping, there can be several blocks to
choose from for replacement when a miss occurs.

• Two primary strategies are used:


a) Random: The candidate block is selected randomly for replacement. This
simple strategy tends to spread allocation uniformly.
b) Least Recently Used (LRU): The block replaced is the one that has not been
used for the longest period of time.
o Makes use of a corollary of temporal locality:
“If recently used blocks are likely to be used again, then the best candidate
for replacement is the least recently used block”
• To implement the LRU algorithm, the cache controller must track the LRU block as the
computation proceeds.

• Example: Consider a 4-way set associative cache.


o For tracking the LRU block within a set, we use a 2-bit counter with every block.
o When hit occurs:
▪ Counter of the referenced block is reset to 0.
▪ Counters with values originally lower than the referenced one are incremented by
1, and all others remain unchanged.
o When miss occurs:
▪ If the set is not full, the counter associated with the new block loaded is set to 0,
and all other counters are incremented by 1.
▪ If the set is full, the block with counter value 3 is removed, the new block put in
its place, and the counter set to 0. The other three counters are incremented by 1.
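The 2-bit counter rules above can be sketched as follows (counters only; the hit rule and the full-set miss rule, with the not-full case omitted for brevity):

```python
def lru_hit(counters, way):
    """Hit rule: reset the referenced block's counter to 0; increment
    counters that were originally lower than it; leave the rest unchanged."""
    ref = counters[way]
    return [0 if i == way else c + 1 if c < ref else c
            for i, c in enumerate(counters)]

def lru_miss_full(counters):
    """Miss in a full set: evict the block whose counter is 3, give the
    new block counter 0, and increment the other three counters."""
    victim = counters.index(3)
    return victim, [0 if i == victim else c + 1
                    for i, c in enumerate(counters)]

print(lru_hit([0, 1, 2, 3], 2))      # [1, 2, 0, 3]
print(lru_miss_full([1, 2, 0, 3]))   # (3, [2, 3, 1, 0])
```

Note that the four counter values within a set always remain a permutation of 0..3, so a 2-bit counter per block suffices.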
