Cache Memory
Cache Memory
• Cache memory is a small amount of
  high-speed memory that is placed       Processor
  between the Processor and the
  Main Memory unit.
• Due to locality of reference
  phenomenon the references to the           Cache
  memory locations are confined to a
  small locality.
• The cache memory tries to
  accommodate the content of this          Main
  small locality so that the processor
  can access these contents with a        Memory
  significantly lower access time.
                                 Cache Memory
• When the processor issues a memory access
  request it is first referred to the cache.
                                                            Processor
• If the cache contains the location, a hit is said to
  occur, and the operation is performed from the cache
  at the cost of access from the cache.
• Else a miss is said to occur and in that case the
  operation is performed on the main memory at the              Cache
  cost of access from the main memory.
• As per the phenomenon of Temporal Locality, a
  memory location referred to once is likely to be
  referred to again in the near future.
                                                              Main
• To exploit this, after a miss, a copy of the content of
  the location is accessed from main memory is saved         Memory
  in the cache so that the next time a reference to this
  location occurs it can be accessed at the cost of
  access from the cache.
                               Cache Memory
• As per the phenomenon of Spatial Locality, if a
  memory location is referred to, it is likely that the
  locations adjacent to it will also be referred to in the   Processor
  near future.
• To exploit this, after a miss, while fetching the
  content of the location that was referred to, the
  contents of the adjacent locations are also fetched,           Cache
  in a small block (called Cache Line) transfer, and
  saved in the cache.
• You will remember that block transfers from main
  memory cost much less than transfer of the location
  contents as individual bytes.
                                                              Main
• A reference to any of these adjacent locations, in         Memory
  the near future, results in hit and the access is
  made at the cost of access from the cache.
Memory
Access
with Cache
                           Performance of Cache
• Define Hit Ratio (h) as the fraction of memory references that are located in
  the cache.
      •   If 8 out of 10 references are found in the in the cache, then h=0.8;
      •   These 8 Hit accesses will cost 8Tc, where Tc is the cache access time.
      •   Rest 2 Miss accesses will cost 2Tm, where Tm is the main memory access time.
      •   The total access time for the 10 memory accesses will be- 8Tc + 2Tm
           And average access time per memory access will be- 0.8Tc + 0.2Tm
• In general, the Average Access Time will be,
            Ta = h Tc + (1 – h) Tm
• For two level cache Average Access Time will be,
           Ta = h1 Tc1 + (1- h1) h2 Tc2 + (1 - h1)(1 - h2)Tm
                 where
                 • Tc1, Tc2 are the level1 & level2 cache access times
                 • Tm is the main memory access time for a Cache Line
                 • h1, h2 are the Hit Ratios level1 & level2 cache respectively
Note: The Hit Ratios of cache depend on the design and size of the cache as well
      as the behavior of the program in terms of locality of reference. Instances
      of Hit Ratios of exceeding 0.9 is not uncommon. Generally higher-level
      caches have higher Hit Ratios due to the larger space.
Example Scenarios of Single Level Cache:
        For single level cache Average Access Time, Ta = h Tc + (1 – h) Tm
                • Ta is the average access time
                • Tc is the cache access time
                • Tm is the memory access time
                • h is the Hit ratio
     Ex-1.   Tc = 5 ns, Tm = 100 ns,        h = 0.95
             Ta = 0.95 x 5 + 0.05 x 100 ns =4.75 + 5 ns = 9.75ns
     Ex-2.   Tc = 5 ns, Tm = 100 ns,        h = 0.99
             Ta = 0.99 x 5 + 0.01 x 100 ns =4.95 + 1 ns = 5.95ns
Example Scenarios of Two-Level Cache:
       • For two level cache Average Access Time,
               Ta = h1 Tc1 + (1- h1) h2 Tc2 + (1 - h1)(1 - h2)Tm
       • Tc1, Tc2 are the level1 & level2 cache access times
       • Tm is the main memory access times
       • h1, h2 are the Hit ratios
   Ex-3.   Tc1 = 5 ns, Tc2 = 10 ns, Tm = 100 ns, h1 = 0.95, h2 = 0.99
           Ta = 0.95x5 + 0.05x0.99x10 + 0.05x0.01x100 ns
                    =4.75 + 0.495 + 0.05 ns = 5.295 ns
Questions?
Cache Organization
Main Memory Blocks:
• Main memory consists
  of up to 2n addressable
  words, with each word
  having a unique n-bit
  address.
• For mapping purposes,
  grouped into a number
  of fixed-length blocks
  of K words each.
• No. of blocks in main
  memory, M = (2n)/K.
                                      Cache Organization
• Cache cannot be organized as Random
  Access like in Main Memory as same
  cache location has to hold contents of
  different memory locations at different
  times.
• It is organized as Content Addressable
  Memory (CAM).
• A part of the memory address of the
  data, referred to as Tag, is stored as part
  of the content in the cache
• The cache is organized as an array of C
  Cache Lines.
• Each cache line has space for holding K
  words, a Tag (of a few bits) and a
  couple of Control bits
• Length of a line, not including tag and control bits, is line size.
• Tag identifies the particular block in memory that is currently being
  stored in the Cache Line.
• No. of Cache Lines in the cache, C << M, the number of blocks in main
  memory.
                                                                       Cache Memory
MSB                0011000 0101101101000 011
A Memory Address
                                                                                      Block j
                                                       Block No.
                                               (Word bits)
LSB
               Cache Memory - Mapping
• There are far fewer cache lines than main memory blocks
  • A method for mapping blocks to cache lines is needed
      i.e. To determine which block goes to which line
• Based on the type of mapping technique, the cache
  organization differs
   • Three mapping techniques
     • Direct mapping
     • Associative mapping
     • Set Associative mapping
                    Cache Memory – Direct Mapping
• Maps each block of main memory into only one specific Cache Line.
• The mapping is expressed as-
   • i = j modulo C , where i = cache line number,
                            j = main memory block number,
                           C = no. of lines in the cache
• Thus first C blocks map to the C cache lines in a one-to-one manner; similarly
  next C blocks map to the C cache lines ...
• Mapping function is easily implemented using the main memory address
   • A main memory address can be viewed as consisting of three fields.
      • w LS bits identify a unique word or byte within a block of main memory
      • t MS bits form the tag and the l bits in the middle specify the cache line
      • l + t MS bits specify which block of memory
                                (t bits)                    (l bits)                 (w bits)
                                                             Cache Memory – Direct Mapping
MSB                0011000 0101101101000 011
                                               (Tag bits)
A Memory Address
                                               (Line bits)
                                                                                             Block j
                                                               Block No.
                                               (Word bits)
LSB
                               Cache Memory – Direct Mapping
Memory Address length =
           (t + l + w) bits
No. of addressable units=
         2t+l+w words or bytes
Length of Block =
           2w words or bytes          BC-1                                                           LC-1
No. of blocks in main memory,
           M = 2t+l
No. of cache lines, C = 2l
Size of cache =                                             Cache Line   Main Memory Blocks Mapped
           2l+w words or bytes                                  0        0, C, 2C, …, 2 L-C
   (Excluding Tags & Control bits)                              1        1, C+1, 2C+1, …, 2 L-C+1
No. of blocks mapped to per                                     2        2, C+2, 2C+2, …, 2 L-C+2
cache line = M/C = 2t
                                                                ...
Length of tag = t bits               B2L -1
                                                               C-1       C-1, 2C-1, 3C-1, …, 2 L-1
                                              Main Memory
                       Cache Memory – Direct Mapping
• Example
                                               Mapping of Memory Blocks to Cache Lines
 • 14-bit line number is used as an index       Cache Starting Memory Addresses of Blocks
   into the cache to access a specific line,     Line          Mapped to the line
                                                   0   000000, 010000, 020000, …, FF0000
 • 8-bit tag number matches the tag
                                                   1   000004, 010004, 020008, …, FF0004
   number currently stored in that line,           2   000008, 010008, 020008, …, FF0008
 • 2-bit word number is used to select
                                                 ...
   one of the 4 bytes in that line               C-1    00FFFC, 01FFFC, 02FFFC, …, FFFFFC
        Cache Memory – Direct Mapping
             t+l+w
t   l
                                        t+l+w
               Cache Memory – Direct Mapping
• Advantage: The direct mapping technique is simple and
  inexpensive to implement.
• Disadvantage: there is a fixed cache location for any given
  block
  • If two Localities of the program, required simultaneously during
    execution, occupy memory blocks that map into the same cache
    line, then the blocks will be repeatedly swapped between the
    Cache and the Main Memory – leading to Thrashing
  • Hit ratio will suffer
Questions?
     Cache Memory – Associative Mapping
• Associative mapping overcomes the disadvantage of direct mapping by permitting a
  main memory block to be loaded into any of the Cache Lines
• Memory address is interpreted as a Tag and a Word field
      • Memory Block No. itself is the Tag; i.e. the Tag field uniquely identifies a block
          of main memory
• Since any memory block j can be mapped to any cache line i, to determine whether a
  block is in the cache, the cache control logic must compare the tag field of every
  Cache Line for the tag field of the main memory address.
• To achieve the desired speed of access these comparisons must be done in parallel.
                                                  (t bits)                 (w bits)
        • Address length, L = (t + w) bits
        • Number of Tag bits, t = b, where, 2b blocks in main memory
    Cache Memory – Associative Mapping
       t+w
                                    t+w
             t
             Cache Memory – Associative Mapping
• Advantage:
  • Can support large number of simultaneous Localities in the program,
  • Flexibility as to which block to replace when a new block is read into the cache -
• Disadvantage:
  • Requires expensive circuitry for parallel examination of the tags in all the cache
    lines,
  • Requires hardware implementation of suitable Cache Replacement Algorithm to
    maximize the hit ratio.
   Cache Memory – Set Associative Mapping
• Combines the strengths of both, Direct and Associative mapping
  approaches while overcoming their disadvantages
• The Cache Lines are grouped into ν Sets, each consisting of α Lines
  such that-
       Total Cache Line, C = ν x α
                  and i = j modulo ν
                where, i = Set no. of cache lines
                        j = Main memory block no.
                       C = No. of lines in cache
                       ν = No. of cache line sets
                       α = No. of lines in each set
• This is a α-way set associative mapping
• A block Bi can be loaded into any of the α lines in set j
Cache Memory – Set Associative Mapping
                                                α lines
                                         Lα-1
                   Cache Memory – Set Associative Mapping
• Cache control logic interprets a memory address as three fields: Tag, Set, and Word
• Tag in a memory address is much smaller and is only compared to the α tags within
  a single set
• Address length, l = (t+s+w) bits
• No. of addressable units= 2(t+s+w) word or bytes
• Block size= Line size= 2w
                                                                                  (w bits)
• No. of blocks in main memory=       2t+s           (t bits)          (s bits)
• No. of lines in a set= α
• No. of sets, γ = 2s
• No. of lines in cache, C = αγ = α x 2s
• Size of cache= α x 2(s+w) word or bytes
• Length of tag, t= l - (s+w) bits
• Total tag size= t x C = t x α x 2s bits
    Cache Memory – Set Associative Mapping
           t+s+w
t   s
                   t
                                      t+s+w
Example: Direct Mapping
     • A 64KB size cache memory with 8-byte cache lines uses Direct Mapping.
     • The main memory size is 64MB
    • The memory address length is- 26 bits                   [226 = 64M]
    • The Word field length is- 3 bits                        [23 = 8]
    • No. of cache lines = 64 KBytes/ 8 Bytes = 8K
    • The Line field length is- 13 bits                       [213 = 8K]
    • The Tag field length= 26 – 13 – 3 = 10 bits
                                                      Tag                    Line      Word
                                                    (16-25)                 (3-15)     (0-2)
    • Total Tag size is- Tag field length x No. of Cache lines = 10 x 1 K bits = 10 K bits
    • Comparator size= Tag length = 10 bits
Example: Associative Mapping
     • A 64KB size cache memory with 8-byte cache lines uses Associative mapping.
     • The main memory size is 64MB
    •   The memory address length is- 26 bits           [226 = 64M]
    •   The Word field length is- 3 bits                [23 = 8]
    •   No. of cache lines = 64 KBytes/ 8 Bytes = 8K
    •   The Tag field length= 26 – 3 = 23bits
                                                Tag                   Word
                                               (3-25)                 (0-2)
    • Total Tag size is- Tag field length x No. of Cache lines = 23 x 8 K bits = 23 KB
    • Comparator size= Tag length x No. of cache lines = 23 x 8K bits = 1,844,416 bits
Example: Set-Associative Mapping
     • A 64KB size 8-way set-associative cache memory uses 8-byte cache lines.
     • The main memory size is 64MB
    • The memory address length is- 26 bits              [226 = 64M]
    • The Word field length is- 3 bits                   [23 = 8]
    • No. of cache lines = 64 KBytes/ 8 Bytes = 8K
    • No. of cache line sets = 8K/ 8 = 1K
    • The Set field length is- 10 bits                   [210 = 1K]
    • The Tag field length= 26 – 10 – 3 = 13bits
                                              Tag                     Set     Word
                                            (13-25)                 (3-12)    (0-2)
    • Total Tag size is- Tag field length x No. of Cache lines = 13 x 8K bits = 13 KB
    • Comparator size= Tag length x No. of cache lines per set = 13 x 8 bits = 104 bits
          A Comparison of the Mapping Techniques
         Parameter           Direct         Associative          Set-Associative
      No. of Localities     Limited          Very Large           Large Enough
        Supported
       Replacement        Not Required   Need selection of      Need selection of
        Algorithm                           1 among C              1 among α
       No. of Parallel         1         As many as the no.    Equal to no. of lines
       Comparisons                        of cache lines (C)       per Set (α)
       Tag Length (t)        Short           Very Long                Short
      Comparator Size        Small            Very Big               Not Big
• Direct Mapping is simple and less costly, but does not support multiple simultaneous
  localities – prone to Thrashing
• Associative Mapping supports very large number of localities, but the Comparator
  Complexity and Tag Space requirements are Very High.
• Set-Associative Mapping supports sufficient number of localities and the Comparator
  Complexity and Tag Space requirements are within acceptable limits.
Questions?
                 Cache Replacement Algorithm
• The size of cache memory is generally much smaller than the main memory.
• Only a portion of a program, which may contain one or more localities in it,
  can be accommodated in the cache.
• As the program execution shifts from one locality to another there is need for
  accommodating the new localities in the cache replacing the older localities.
• For this the cache uses Replacement Algorithms.
• The commonly used cache replacement algorithm - Least Recently Used
  (LRU) - assumes that the location referred to farthest-in-time is least likely to
  be referred to in the near future.
• The memory location content that was referred to
  farthest-in-time, among those contained in the cache, is
  replaced with the newly referred location content.
        Cache Memory – Replacement Algorithms
• When the cache is full and a new block is required to be
  bought into the cache, one of the existing blocks must be
  replaced
  • For direct mapping, there is only one possible line for any
    particular block – no choice is possible
  • For the associative and set-associative mapping, a replacement
    algorithm is needed to choose the cache line
• Some common replacement algorithms are:
       • Least Recently Used (LRU)
       • First in First Out (FIFO)
       • Least Frequently Used (LFU)
       • Random
        Replacement Algorithm - LRU
• Least Recently Used Algorithm
  • Replace that block in the cache/set that has not been referred
    to for the longest period of time
• To implement:
  • Cache maintains a separate list of indexes of all the cache lines
  • When a line is referenced, the index of that cache line is moved
    to the front of the list
  • Replace the block in the last cache line in the list
• Simple, popular and effective.
      Replacement Algorithm – FIFO, LFU, Random
• First In First Out Algorithm
   • Replace that block in the Cache/ Set that has been in the cache the
     longest
   • In other words, the block that came in first also goes out first
• Least Frequently Used Algorithm
   • Replace that block in the Cache/ Set that has experienced the fewest
     references
   • Implemented by associating a counter with each line; replace the block
     in the line which has the lowest counter value
• Random
   • Not a usage-based algorithm
   • Pick a line at random for replacement
   • Inferior in performance to algorithms based on usage
Questions?
              Cache Memory – Write Policy
• A line in the cache may contain Instructions or Data Block. A Block of
  Instructions is never modified. However, a Block of Data may be
  modified.
• When a Data Block resident in the Cache is to be replaced –
   • If the Data Block has been Modified (i.e. a write operation has been
     performed on one or more words in the cache line), then the
     corresponding Main Memory block must be updated before the
     Replacement.
   • If the Data Block in the Cache has not been modified, then it may be
     overwritten with a new block without first writing the old data block back
     in the Main Memory.
• Write Policy determines – when a data block in the
  Cache is modified should be updated (or written) in
  the main memory.
             Cache Memory – Write Policy
• Several modules/devices may be connected to the main
  memory –e.g., I/O module may directly read from memory
  • Then, if the block that has been changed in the cache is not updated
    in main memory, then the main memory may contain invalid data
  • Invalid data may in turn be read by other modules
• The simplest solution is the write policy called write through
  • All write operation done on the cache are also immediately written
    into main memory
  • Data in main memory are always valid
  • A lot of write operations, memory traffic
                Cache Memory – Write Policy
• Alternative policy - write back
  • Updates are made only in the cache
  • When an update occurs, a Control bit called ‘dirty bit’, or ‘used
    bit’, associated with the cache line is set.
  • When the block is to be replaced, it is written back to main
    memory if and only if the dirty bit is set.
• Disadvantage:
  • The Data Block in Main Memory is invalid till the Write Back
    operation is performed => Any access by I/O modules can be
    allowed only through the cache. Else, I/O modules will receive
    invalid data.
  • Requires complex circuitry and a may lead bottleneck.
            Cache Memory – Write Policy
• Another alternative policy – Buffered Write
  • Write operations to the main memory are performed whenever
    a Data Block in Cache is modified.
  • A queue of write buffers are maintained where the main
    memory write requests are queued and the main memory gets
    updated from the queue.
                         Cache              Main
       Processor                           Memory
                        Write Queue
• Advantage:
  • Processor does not have to wait for the main memory write
    operation to complete.
Cache Memory – Control Bits             Cache with the Control Bits
                                     Tags     Valid Dirty
                                                              Cache
                                               Bit   Bit
 Each cache line has a set of
 Control Bits; Two such bits
 commonly found are-
      • Valid bit: To indicate
        that the contents in the
        cache line are valid in
        current program
        context.
      • Dirty bit: Used in case of
        Write Back policy
Questions?
                      Multilevel Cache
• On-chip cache – when cache and processor are on the
  same chip
  • Compared to cache reachable via bus – reduces the processor’s
    external bus activity therefore speeds up execution times and
    increases overall system performance
• Simplest multilevel cache
  • A two-level cache – internal cache, level 1 (L1) + external cache,
    level 2 (L2)
  • If there is no L2 cache and the processor makes an access
    request for a memory location not in the L1 cache, then the
    processor must access RAM via bus – slow RAM access time
    results in poor performance
  • Using a cache improves performance
                     Multilevel Cache
• Two features multilevel cache design
  • For an off-chip L2 cache, use a separate data path instead of
    system bus, so as to reduce the burden on the system bus.
  • With the further shrinkage of processor components, a number
    of processors now incorporate the L2 cache on the processor
    chip, improving performance
  • Complicates design issues – size, replacement algorithm, and
    write policy
• Potential gains due to the use of an L2 cache depends on
  the hit rates in both the L1 and L2 caches
                  Unified & Split Cache
• When caches are split into two parts – one part dedicated
  to instructions and other to data – it is a split cache
• Both exist at the same level, typically as two L1 caches
• Advantage
  • Eliminates contention for the cache between the instruction
    fetch/decode unit and the execution unit
                    Unified & Split Cache
• When no such splitting is done it is a unified cache
• Advantages of unified cache:
  • For a given size, a unified cache has a higher hit rate than split
    caches because it balances the load between instruction and
    data fetches automatically
  • Only one cache needs to be designed and implemented
• Usually split caches are used for L1 and unified caches for
  higher levels
               Cache Memory Operation
• Cache connects to the processor via
  data, control, and address lines.
• Data and address lines also attach
  to data and address buffers, which
  attach to a system bus to the main
  memory.
• Cache hit occurs, the data and
  address buffers are disabled and
  communication is only between
  processor and cache, with no
  system bus traffic.
• Cache miss occurs, the desired
  address is loaded onto the system
  bus and the data are returned
  through the data buffer to both the
  cache and the processor
                 Cache Memory Addresses
• A physical cache stores data using main memory physical
  addresses
  • Hardware memory management unit (MMU) translates the Virtual/ Logical
    Address into a Physical Address in main memory
                 Cache Memory Addresses
• A logical cache (or virtual cache) stores data using virtual
  addresses
Questions?
Thank You