Replacement Algorithms
Once the cache has been filled, when a new block is brought into the
cache, one of the existing blocks must be replaced.
For direct mapping, there is only one possible line for any particular
block, and no choice is possible.
For the associative and set-associative techniques, a replacement
algorithm is needed.
Least Recently Used (LRU)
First-In-First-Out (FIFO)
Least Frequently used (LFU)
Random Replacement
Replacement Algorithms
Least Recently Used (LRU)
• Replace that block in the set that has been in the cache longest
with no reference to it.
• For two-way set associative, this is easily implemented. Each line
includes a USE bit. When a line is referenced, its USE bit is set to 1
and the USE bit of the other line in that set is set to 0. When a
block is to be read into the set, the line whose USE bit is 0 is used.
• LRU should give the best hit ratio.
• LRU is also relatively easy to implement for a fully associative
cache.
• Because of its simplicity of implementation, LRU is the most
popular replacement algorithm.
Replacement Algorithms
First-In-First-Out (FIFO)
Replace that block in the set that has been in the cache longest.
Least Frequently used (LFU)
Replace that block in the set that has experienced the fewest
references.
LFU could be implemented by associating a counter with each line.
Random Replacement
pick a line at random from among the candidate lines.
random replacement provides only slightly inferior performance to
an algorithm based on usage.
Write Policy
When a block that is resident in the cache is to be replaced, there
are two cases to consider. If the old block in the cache has not been
altered, then it may be over written with a new block without first
writing out the old block. If at least one write operation has been
performed on a word in that line of the cache, then main memory
must be updated by writing the line of cache out to the block of
memory before bringing in the new block.
Write Policy
Problems
First, more than one device may have access to main memory. For
example, an I/O module may be able to read-write directly to
memory. If a word has been altered only in the cache, then the
corresponding memory word is invalid.
Further, if the I/O device has altered main memory, then the cache
word is invalid.
A more complex problem occurs when multiple processors are
attached to the same bus and each processor has its own local
cache. Then, if a word is altered in one cache, it could conceivably
invalidate a word in other caches.
Write Policy
1- write through
all write operations are made to main memory as well as to the
cache, ensuring that main memory is always valid. Any other
processor–cache module can monitor traffic to main memory to
maintain consistency within its own cache.
The main disadvantage of this technique is that it generates
substantial memory traffic and may create a bottleneck.
Write Policy
2- write back
minimizes memory writes.
updates are made only in the cache. When an update occurs, a
dirty bit, or use bit, associated with the line is set. Then, when a
block is replaced, it is written back to main memory if and only if
the dirty bit is set.
The problem with write back is that portions of main memory are
invalid, and hence accesses by I/O modules can be allowed only
through the cache. This makes for complex circuitry and a
potential bottleneck.
Number of Caches
When caches were originally introduced, the typical system had
a single cache. More recently, the use of multiple caches has
become the norm.
Multilevel Caches
the on-chip cache (cache on the same chip as the processor)
reduces the processor’s external bus activity and therefore
speeds up execution times and increases overall system
performance. When the requested instruction or data is
found in the on-chip cache, the bus access is eliminated.
Multilevel Caches
The inclusion of an on-chip cache leaves open the question of
whether an off-chip, or external, cache is still desirable. Typically,
the answer is yes.
most contemporary designs include both on-chip and external
caches.
The simplest such organization is known as a two-level cache, with
the internal cache designated as level 1 (L1)and the external cache
designated as level 2 (L2).
Multilevel Caches
The reason for including an L2 cache is the following:
If there is no L2 cache and the processor makes an access request
for a memory location not in the L1 cache, then the processor must
access DRAM or ROM memory across the bus. Due to the typically
slow bus speed and slow memory access time, this results in poor
performance.
On the other hand, if an L2 SRAM (static RAM) cache is used, then
frequently the missing information can be quickly retrieved. If the
SRAM is fast enough to match the bus speed, then the data can be
accessed using a zero-wait state transaction, the fastest type of bus
transfer.
Associative Memory – Hardware Organization
each entry of the page table has two fields: a valid bit and a frame
number.
Additional fields are often added to relay more information.
A dirty bit (or a modify bit) could be added to indicate whether the
page has been changed. This makes returning the page to disk more
efficient, because if it is not modified, it does not need to be
rewritten to disk.
Another bit (the usage bit) can be added to indicate the page usage.
This bit is set to 1 whenever the page is accessed. After a certain
time period, the usage bit is set to 0. If the page is referenced again,
the usage bit is set to 1. However, if the bit remains 0, this indicates
that the page has not been used for a period of time, and the system
might benefit by sending this page out to disk.