KEMBAR78
6.module 2 - Part 2 | PDF | Cpu Cache | Cache (Computing)
0% found this document useful (0 votes)
29 views39 pages

6.module 2 - Part 2

The document discusses cache memory in computer architecture, highlighting its importance in bridging the speed gap between the CPU and main memory. It covers concepts such as locality of reference, cache operations, mapping functions, and various cache types and policies. Additionally, it addresses performance considerations and techniques to enhance cache memory efficiency, including interleaving, prefetching, and lockup-free caches.

Uploaded by

My Creations
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
29 views39 pages

6.module 2 - Part 2

The document discusses cache memory in computer architecture, highlighting its importance in bridging the speed gap between the CPU and main memory. It covers concepts such as locality of reference, cache operations, mapping functions, and various cache types and policies. Additionally, it addresses performance considerations and techniques to enhance cache memory efficiency, including interleaving, prefetching, and lockup-free caches.

Uploaded by

My Creations
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 39

COMPUTER

ARCHITECTURE
PCC-CS 402
(Module 2 - Part 2)

Department of Computer Science & Engineering


2nd Year,4th Semester
2022
Memory

• Concept of Cache Memory


• Techniques for reducing cache misses
• Cache Coherence
• Memory Interleaving
CACHE MEMORIES
• Processor is much faster than the main memory
− As a result, processor spends much of its time waiting for instructions
and data being fetched from main memory
− Major obstacle towards achieving good performance
• Speed of main memory cannot be increased beyond a certain point
• Cache memory is an architectural arrangement which makes main
memory appear faster to the processor than it really is
• Cache memory design is based on property of computer programs
known as “Locality of Reference”
Locality of Reference
• Analysis of address space of a program indicates
− Many instructions in localized areas of a program are executed
repeatedly during a period of time
− Other instructions are accessed relatively less frequently
• Similar property also observed for data access
− A Data Set in a specific address is accessed more often than data in
other locations
• This property of program execution focused on a specific address
space is called Locality of Reference
Locality of Reference
• Temporal locality of reference:
− Locality of Reference in Time
− Recently executed instructions and data is likely to be executed
again in near future
• Spatial locality of reference:
− Locality of Reference in Space
− Instructions and data with addresses close to a recently accessed
address are likely to be accessed soon
CACHE-OVERVIEW
 Small amount of fast memory
 Sits between normal main memory and CPU
 May be located on CPU chip or module
 An entire blocks of data is copied from memory to the cache because the principle
of locality tells us that once a byte is accessed, it is likely that a nearby data
element will be needed soon.
CACHE OPERATION - OVERVIEW
• CPU requests contents of a word in memory
• Memory Controller checks cache for this word
• If present in cache, send to processor (fast access)
• If not present, copies the block containing the requested word
from main memory to cache and delivers the memory content
to CPU (slow access)
• Cache includes tags to identify which block of main memory
is in each cache slot
CACHE HIT
• Existence of a cache is transparent to the processor. The processor issues
Read and Write requests in the same manner.

• If the data is in the cache it is called a Read or Write hit.

• Read hit:
 The data is obtained from the cache.

• Write hit:
 Cache has a replica of the contents of the main memory.
 Contents of the cache and the main memory may be updated simultaneously.
This is the write-through protocol.
 Update the contents of the cache, and mark it as updated by setting a bit known
as the dirty bit or modified bit. The contents of the main memory are updated
when this block is replaced. This is write-back or copy-back protocol.
CACHE MISS
• If the data is not present in the cache, then a Read miss or Write miss occurs.

• Read miss:
 Block of words containing this requested word is transferred from the memory.
 After the block is transferred, the desired word is forwarded to the processor.
 The desired word may also be forwarded to the processor as soon as it is transferred
without waiting for the entire block to be transferred. This is called load-through or
early-restart.

• Write-miss:
 Write-through protocol is used, then the contents of the main memory are updated
directly.
 If write-back protocol is used, the block containing the addressed
word is first brought into the cache. The desired word is overwritten with new
information.
Mapping Functions
• Mapping functions determine how main memory blocks are placed in
the cache
• A simple processor example:
− Cache consisting of 128 blocks of 16 words, total 2048 (2K) words
− Main memory is addressable by 16-bit address
− Main memory has 64K words, with 4028 blocks of 16 words each
• Three mapping functions:
− Direct mapping
− Associative mapping
− Set-associative mapping
Direct Mapping
Main
• Block j of the main memory maps to j modulo 128
memory
Block 0

tag
Cache Block 1
of the cache; 0 maps to 0, 129 maps to 1
• Each memory block can be placed in only one
Block 0
tag
Block 1
position in the cache
Block 127
• More than one memory block can be mapped onto
tag
Block 128
same position in cache
Block 127 Block 129
• Memory address is divided into three fields:
− Low order 4 bits determine one of the 16 words
in a block
Tag Block Word
Block 255 − Next 7 bits determine location of cache block
5 7 4 Block 256
− High order 5 bits determine which of possible
Main memory address Block 257 32 blocks is currently present in cache; these
are Tag bits which get stored along with cache
block
Block 4095
Direct Mapping
Main
memory
Block 0 • Locating an address in cache
Cache Block 1 − Block Number derived from Main Memory
tag
Block 0 Block
tag
Block 1 − Upper 5 bits of address matched with Tag of
Block 127 the specific cache block
Block 128 − If match, cache hit, else, cache miss
tag
Block 127 Block 129

• Advantages:
− Simple to implement
Block 255 − Replacement method also simple
Tag Block Word
5 7 4 Block 256 • Disadvantages:
Main memory address Block 257 − Cache Hit Ratio is not high
− Not very flexible

Block 4095
Associative Mapping
Main
memory
Block 0 • Main memory block can be placed into any
Cache Block 1 cache position
tag
Block 0 • Memory address is divided into two fields:
tag
Block 1
− Low order 4 bits identify the word within a
Block 127 block
Block 128 − High order 12 bits or tag bits identify a
tag
Block 127 Block 129 memory block when it is resident in the
cache
• Advantages:
Tag Word
Block 255 − Flexible and uses cache space efficiently
12 4 Block 256
• Disadvantages:
Main memory address Block 257
− More complex, as all tag addresses must be
checked to locate a memory block in cache
− Requires associative memory access
Block 4095
Set-Associative Mapping
Cache Block 0
Main

tag Block 0
memory
Block 1
• Blocks of cache are grouped into sets; Mapping
tag Block 1
function allows a block of the main memory to
tag Block 2
reside in any block of a specific set
tag Block 3 Block 127
• Hence, this mapping is associative between all
blocks in the same set (Set-Associative)
Block 128

Block 129
• In this example, the cache is divided into 64 sets,
tag
Block 126
with two blocks per set; called 2-way set-
tag
Block 127 associative
• Memory block 0, 64, 128 etc. map to set 0, and
they can occupy either of the two positions
Block 255
Tag Set Word
Block 256
6 6 4
• Other possible combinations are 32 sets with 4
Block 257
Main memory address blocks each (4-way) or 16 sets with 8 blocks (8-
way)
• K-way set-associative cache stands for k blocks
Block 4095
per set
Set-Associative Mapping
Cache
Main
memory
Block 0 • Memory address is divided into three parts:
tag Block 0 Block 1 − 4 bits for words in block
tag Block 1
− 6 bit field determines the set number
tag Block 2
− High order 6 bits determines tag bits for both
tag Block 3 Block 127
blocks in a set
• Set-associative mapping is combination of direct
Block 128

and associative mapping and most common


tag Block 129
Block 126

tag
Block 127 • Reduces block conflict of direct mapping and
complex tag search in associative mapping
Tag Set Word
Block 255
• Number of blocks per set is a design parameter
If all blocks are in one set, same as associative
Block 256
6 6 4 •
mapping
Block 257
Main memory address

• If there is only one block per set (k=1), it is the


same as direct mapping
Block 4095
REPLACEMENT POLICY
In case of a Cache Miss, an existing block must be replaced to put in the
new block. Typical Replacement Policies are:
• Random:
− Any existing block can be discarded
• Least Recently Used (LRU):
− Discard block that is unused for the longest time
− Identification is very complex for higher associative cache
• First in First Out (FIFO):
− Discard block that was copied earliest from main memory
CACHE TYPES
• Multi-Level Cache:
− Cache may be organized in 2 or 3 levels
− Level 1 (L1) is typically closest to processor, small but fast
− Level 2 (L2) is larger but slower than L1
− Multi-processor systems use L3, larger and slower than L2
• Unified or Split Cache:
− Unified cache stores both Instructions & Data
− Split Cache are separate units for Instructions & Data
− Typically, L1 is a Split cache and others are Unified caches
CACHE CONTROL
• Valid Bit:
− It is set to ‘1’ when data in cache block is valid, otherwise ‘0’
− At system startup, valid bit is set to ‘0’
− When a memory block is loaded into cache, it is set to ‘1’
− Also, data transfer may occur between main memory and disk
directly bypassing the cache
− If main memory block is updated and the data is also resident in
the cache, the cache valid bit is set to ‘0’
• Dirty Bit:
− Set to ‘1’ the block contains data that has not been written into
memory, otherwise ‘0’
CACHE MISS CATEGORY
• Compulsory (Cold) Misses:
− Very first access to a cache block always results in a miss, when
cache block must be loaded from main memory
• Conflict (Collision) Misses:
− Cache block discarded when another block maps to same place
− Applicable for Direct and Set-Associative Mapping
• Capacity Misses:
− Cache not large enough to store all required data for program
execution
CACHE COHERENCE
• In case of multi-processor systems, each processor has its own
cache and a shared memory environment
• Same data may exist in different caches and main memory,
• In case any of the copies get updated, other copies become
invalid. This is called Cache Coherence problem.
• Cache inconsistencies can caused by data sharing, process
migration or in I/O activity.
CACHE COHERENCE PROTOCOL
• Cache Coherence protocols are applied to maintain data
consistency between multiple caches and main memory
− In Software, using compiler techniques
− In Hardware, such as Snoopy and Directory Protocols
CACHE MEMORY

Performance considerations
PERFORMANCE CONSIDERATIONS
 A key design objective of a computer system is to achieve the best
possible performance at the lowest possible cost.
 Price/performance ratio is a common measure of success.
 Performance of a processor depends on:
 How fast machine instructions can be brought into the processor for
execution.
 How fast the instructions can be executed.
PERFORMANCE OF CACHE MEMORY
TECHNIQUES TO IMPROVE THE CACHE MEMORY PERFORMANCE
TECHNIQUES TO IMPROVE THE CACHE MEMORY PERFORMANCE CONT…
TECHNIQUES TO IMPROVE THE CACHE MEMORY PERFORMANCE CONT…
TECHNIQUES TO IMPROVE THE CACHE MEMORY PERFORMANCE
CONT…
TECHNIQUES TO IMPROVE THE CACHE MEMORY PERFORMANCE CONT…
Solution:
INTERLEAVING
 Divides the memory system into a number of memory
modules. Each module has its own address buffer register (ABR) and data
buffer register (DBR).
 Arranges addressing so that successive words in the
address space are placed in different modules.
 When requests for memory access involve consecutive
addresses, the access will be to different modules.
 Since parallel access to these modules is possible, the
average rate of fetching words from the Main Memory
can be increased.
OTHER PERFORMANCE ENHANCEMENTS (CONTD.,)

Prefetching
• New data are brought into the processor when they are first needed.
• Processor has to wait before the data transfer is complete.
• Prefetch the data into the cache before they are actually needed, or a before
a Read miss occurs.
• Prefetching can be accomplished through software by including a special
instruction in the machine language of the processor.
 Inclusion of prefetch instructions increases the length of the programs.
• Prefetching can also be accomplished using hardware:
 Circuitry that attempts to discover patterns in
memory references and then prefetches according
to this pattern.
OTHER PERFORMANCE ENHANCEMENTS (CONTD.,)
Lockup-Free Cache
• Prefetching scheme does not work if it stops other accesses
to the cache until the prefetch is completed.
• A cache of this type is said to be “locked” while it services
a miss.
• Cache structure which supports multiple outstanding
misses is called a lockup free cache.
• Since only one miss can be serviced at a time, a lockup
free cache must include circuits that keep track of all the
outstanding misses.
• Special registers may hold the necessary
information about these misses.
OTHER PERFORMANCE ENHANCEMENTS (CONTD.,)
WRITE BUFFER
• Each write operation involves writing to the main memory
• If the processor has to wait for the write operation to complete, it
slows down the processor
• Processor does not depend on the results of the write operation
• Write buffer can be included for temporary storage of write requests
• Processor places each write request into the buffer and continues
execution
• If a subsequent Read request references data which is still in the write
buffer, then this data is referenced in the write buffer
• Applies to both Write-Through and Write-Back techniques
OTHER PERFORMANCE ENHANCEMENTS (CONTD.,)

PREFETCHING
• New data are brought into the processor when they are first needed
• The processor has to wait before the data transfer is complete
• Prefetching loads data into the cache before they are actually needed
• Prefetching can be accomplished through software:
− Including a special instruction in the machine language of the
processor
• Prefetching can also be accomplished using hardware:
− Special circuitry used to discover patterns in memory references
and then prefetching according to this pattern
OTHER PERFORMANCE ENHANCEMENTS (CONTD.,)

LOCKUP FREE CACHE


• Prefetching scheme does not work if it stops other accesses to the
cache until the prefetch is completed
• A cache of this type is said to be “locked” while it services a miss
• Cache structure which supports multiple outstanding misses is called
a Lockup Free cache
• Since only one miss can be serviced at a time, a lockup free cache
must include circuits that keep track of all the outstanding misses
• Special registers may hold the necessary information about these
misses

You might also like