Lecture 12: Memory Hierarchy
and Introduction to Cache
                    Rose Gomar
   Department of Systems and Computer Engineering
Textbook/Copyright
 • Hennessy, John L., and David A. Patterson. Computer architecture: a
   quantitative approach. Elsevier, 6th edition, 2017, Chapter 2.
 • Hennessy, John L., and David A. Patterson. Computer architecture: a
   quantitative approach. Elsevier, 6th edition, Appendix B.
 • Patterson, David A., and John L. Hennessy. Computer Organization and
   Design: RISC-V edition, Chapter 5.
 • Part of the slides are provided by Elsevier (Copyright © 2019, Elsevier
   Inc. All rights reserved)
What will we learn in this lecture?
•   Memory technologies
•   Memory Hierarchy
•   Motivation for cache
•   Caches
The Three Main Memory Categories
 • Generally, there are three categories
   of memory inside a computer system:
    ➢ CPU memory
    ➢ Main memory
    ➢ Secondary memory
 • CPU memory is the collection of
   registers inside the CPU. They are very
   fast because they are manufactured on
   the same silicon die (chip) as the CPU.
   However, they are also very expensive,
   so only a small number of registers
   are provided.
 • Main memory is the system RAM and
   ROM. Compared to CPU registers, it is
   larger and less expensive but
   slower to access.
 • Secondary memory, for example a
   hard disk, is larger and cheaper than
   main memory, but much slower.
Memory Technologies
 • Random Access Memory (RAM)
    • SRAM (Static RAM)
    • DRAM (Dynamic RAM)
 • Read Only Memory (ROM)
 • Flash Storage
    • A type of EEPROM
 • Magnetic Disk
 • Why do we need different memory technologies?
    •   Designers always want an unlimited amount of the fastest memory!
         • SRAM is the fastest but also the most expensive
         • DRAM is slower but less expensive
         • Flash and magnetic disk provide even more capacity
A Simple Memory Model
 • Not a good model for big capacities
 • Very slow (why?)
[Figure: simple memory model. A clocked memory block with Write and Read controls, a decoder driven by the write address and read address, and write data / read data ports]
Main Memory Arrays
 •   Efficiently store large amounts of data
 •   3 common types:
         –    Dynamic random-access memory (DRAM)
         –    Static random-access memory (SRAM)
         –    Read only memory (ROM)
 •   M-bit data value read/written at each unique N-bit
     address
 •   2-dimensional array of bit cells
 •   Each bit cell stores one bit
 •   N address bits and M data bits:
     –       2^N rows and M columns
     –       Depth: number of rows (number of words)
     –       Width: number of columns (size of word)
     –       Array size: depth × width = 2^N × M
[Figure: memory array with an N-bit Address input and an M-bit Data port. Example with N = 2 and M = 3 (depth 4, width 3):]
     Address   Data
       11      0 1 0
       10      1 0 0
       01      1 1 0
       00      0 1 1
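The depth/width arithmetic can be checked with a few lines of Python. This is only a sketch; the function name `array_geometry` and the 10-bit/32-bit example are made up for illustration.

```python
def array_geometry(n_address_bits: int, m_data_bits: int) -> dict:
    """Return depth, width, and total size (in bits) of a memory array."""
    depth = 2 ** n_address_bits      # number of rows (words)
    width = m_data_bits              # number of columns (bits per word)
    return {"depth": depth, "width": width, "size_bits": depth * width}

# The 2-bit address, 3-bit data example from the slide.
small = array_geometry(2, 3)
print(small)                 # {'depth': 4, 'width': 3, 'size_bits': 12}

# A hypothetical larger array: 10 address bits, 32 data bits.
big = array_geometry(10, 32)
print(big["size_bits"])      # 32768
```

Adding one address bit doubles the depth; adding one data bit adds one column.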
Memory Cell Access
•   Cells are accessed using wordline and bitline
•   When the wordline is enabled, the bitline takes the stored cell value
•   When the wordline is disabled, the bitline is in a high-impedance (open-circuit) state
[Figure: a bit cell connected to a wordline and a bitline]
    (a) wordline = 1:  stored bit = 0 gives bitline = 0;  stored bit = 1 gives bitline = 1
    (b) wordline = 0:  stored bit = 0 gives bitline = Z;  stored bit = 1 gives bitline = Z
Memory Array Layout
• Memory Array with Address Decoder
[Figure: a 2:4 decoder turns the 2-bit Address into wordline3..wordline0; three bitlines drive Data2, Data1, Data0]
    Address   Wordline    bitline2   bitline1   bitline0
      11      wordline3      0          1          0
      10      wordline2      1          0          0
      01      wordline1      1          1          0
      00      wordline0      0          1          1
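A rough behavioural model of this array, assuming the stored bits shown in the figure. The names `ROWS`, `decode_2to4`, and `read_bitlines` are illustrative only; real arrays are circuits, not dictionaries.

```python
# Contents of the 4 x 3 array from the figure, indexed by wordline number.
ROWS = {
    3: [0, 1, 0],   # address 11
    2: [1, 0, 0],   # address 10
    1: [1, 1, 0],   # address 01
    0: [0, 1, 1],   # address 00
}

def decode_2to4(address: int) -> list:
    """2:4 decoder: return one-hot wordlines [wordline0, ..., wordline3]."""
    return [1 if i == address else 0 for i in range(4)]

def read_bitlines(wordlines: list) -> list:
    """Each bitline takes the value of the enabled cell, otherwise 'Z'."""
    bitlines = ["Z", "Z", "Z"]            # high impedance by default
    for row, enabled in enumerate(wordlines):
        if enabled:
            bitlines = list(ROWS[row])    # bitline2, bitline1, bitline0
    return bitlines

print(read_bitlines(decode_2to4(0b10)))   # [1, 0, 0]
print(read_bitlines([0, 0, 0, 0]))        # ['Z', 'Z', 'Z']
```

Because the decoder output is one-hot, at most one cell drives each bitline at a time.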
Memory Arrays
[Figure: memory array addressed by a row decoder and a column decoder, with one memory cell highlighted]
  Main Memory: Static RAM
• Static RAM is the type of
  RAM in which data is held until power is
  removed from it. One memory cell (bit) of
  SRAM consists of at least 6 transistors (6T
  memory cell).
• SRAM data is organized into cells.
• Cells are organized into arrays where
  address decoders determine the row and
  column of the desired information.
[Figures: One SRAM cell; General logic of SRAM]
  Main Memory: Static RAM
• Advantages: SRAMs are fast enough to keep up with typical CPUs
  because they use the same technology, so their
  most important use is as ‘cache memory’.
• Because SRAM is expensive, caches have
  much less storage capacity
  than the regular main memory.
• Disadvantages: SRAMs are more expensive
  because each cell needs at least six transistors,
  and they are less dense than DRAMs.
Static RAM Read Cycles
• The steps of a read cycle of SRAM:
   ➢ Place the address to be read on the address bus.
   ➢ Ensure that the chip is activated by making CS low.
   ➢ Activate the OE pin. This enables the output so that the data can be read.
   ➢ The required data then appears on the data bus.
• TAA is the read access time: the time from the instant
  the address is placed on the address bus to the
  point when the required data is available on the data
  bus.
• TRC is the read cycle time: the minimum
  time between two read cycles.
[Figure: A read cycle of SRAM]
Static RAM Write Cycles
 • The steps of a write cycle of SRAM:
    ➢ Place the address to be written to on the address
      bus.
    ➢ Ensure that the chip is activated by making CS low.
    ➢ Place the data to be written on the data bus.
    ➢ Activate the WR line. The data is stored only once WR is active.
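The read and write steps can be mimicked with a toy model. This is a sketch under assumptions: the class `SramChip` is hypothetical, all three control pins are treated as active-low (the slides only state this for CS), and timing (TAA, TRC) is not modelled.

```python
class SramChip:
    """Toy SRAM. Control pins are assumed active-low (0 = asserted)."""

    def __init__(self, n_address_bits: int):
        self.cells = [0] * (2 ** n_address_bits)

    def cycle(self, address, cs_n, oe_n=1, wr_n=1, data_in=None):
        """Perform one bus cycle and return the data bus value or 'Z'."""
        if cs_n != 0:
            return "Z"                      # chip not selected
        if wr_n == 0:
            self.cells[address] = data_in   # write: store the data on the bus
            return "Z"
        if oe_n == 0:
            return self.cells[address]      # read: outputs are driven
        return "Z"                          # selected, but outputs disabled

chip = SramChip(n_address_bits=4)
chip.cycle(address=5, cs_n=0, wr_n=0, data_in=0xA7)   # write cycle
print(chip.cycle(address=5, cs_n=0, oe_n=0))          # read cycle, prints 167
print(chip.cycle(address=5, cs_n=1, oe_n=0))          # deselected, prints Z
```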
DRAM Technology
•   Data stored as a charge in a capacitor
     • Single transistor used to access the charge
     • Must periodically be refreshed
         • Read contents and write back
         • Performed on a DRAM “row”
•   DRAMs are organized in banks (up to 16 for DDR4)
[Figure: A DRAM cell. Very economical compared to SRAM, which has 6 or more transistors per cell]
DRAM Technology
• Dynamic RAM: It is called dynamic because its content
  does not remain unchanged (static) as in SRAM, so
  frequent ‘refreshing’ is necessary.
• One problem with this arrangement is that the capacitors do
  not hold their charge indefinitely and need to be recharged. This
  is done by ‘refreshing’ the cell at regular intervals.
• One important merit of DRAM is that its packing density is very
  high compared to SRAM.
Main Memory: Dynamic RAM
• Read cycle of DRAM: when addressing memory,
  the processor sends the complete
  address on its address pins.
• Between the processor and a DRAM chip,
  there is a memory controller whose function
  is to split the address into two parts: a row
  address and a column address.
• A DRAM chip has fewer address pins
  than the number of address bits supplied by the processor,
  because the address lines of the DRAM chip
  are multiplexed (in time) between the row and
  column addresses.
• DRAM chips are large, rectangular arrays of
  memory cells with support logic that is used
  for reading and writing data in the arrays and
  refresh circuitry to maintain the integrity of
  stored data.
[Figure: Memory controller for a DRAM]
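The address split performed by the memory controller can be sketched as below. The function name `split_address`, the 8/8 bit split, and the choice of putting the row in the upper bits are assumptions for illustration; real controllers use various mappings.

```python
def split_address(address: int, row_bits: int, col_bits: int):
    """Split a flat address into (row, column); the row uses the upper bits."""
    row = (address >> col_bits) & ((1 << row_bits) - 1)
    col = address & ((1 << col_bits) - 1)
    return row, col

# Hypothetical 16-bit address split into 8 row bits and 8 column bits.
row, col = split_address(0xBEEF, row_bits=8, col_bits=8)
print(hex(row), hex(col))    # 0xbe 0xef

# With time multiplexing the chip needs only max(row_bits, col_bits) pins.
pins_needed = max(8, 8)
print(pins_needed)           # 8 instead of 16
```

The row part is sent first, then the column part is sent on the same pins.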
DRAM Read Cycle
• Dynamic RAM Timing. The steps of a read cycle of
  DRAM are given below:
   1) The row address is placed on the address lines and given sufficient time
      to stabilize and be latched.
   2) The Row Address Strobe (RAS) signal is then activated.
   3) The Row Address Decoder selects the proper row.
   4) Next, the column address is placed on the same address lines
      and allowed to stabilize and be latched.
   5) The Column Address Strobe (CAS) signal is then activated.
   6) The CAS pin also serves as the Output Enable; so, once the
      CAS signal has stabilized, the sense amps place the data
      from the selected row and column on the data bus.
   7) With this, the data in the selected address is available at
      the output buffers of the chip, and it is transferred to the data bus.
   8) Before the read cycle can be considered complete, CAS and
      RAS must return to their previous state.
• Note that this is a conventional asynchronous
  read, because the timing signals are not tied to a
  common system clock.
[Figure: The addressing structure of DRAM]
DRAM Timing
• Dynamic RAM Timing: The access time (tRAC) is the
  time from the instant the RAS signal is activated to
  the instant the data is available on the data bus. The
  access time is also referred to as latency.
• The read cycle time (tRC) is also shown in the
  diagram.
• Observe that another time, tRP, is included within
  this read cycle time. The total read cycle time is the
  sum of the ‘RAS active time’ and the ‘RAS pre-charge
  time’ (tRP). The first is the time
  during which the RAS signal is active (low).
• tRP is the additional time needed before a new read
  (or write) cycle can be started by lowering the RAS
  signal. This is because each cell has a parasitic
  capacitance, which must be pre-charged high before
  any operation can begin.
[Figure: A read cycle of DRAM]
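The relation between the timing parameters can be worked through numerically. The values below are illustrative assumptions, not taken from any datasheet, and the names `t_ras`, `t_rp`, and `read_cycle_time_ns` are made up for this sketch.

```python
def read_cycle_time_ns(t_ras_ns: float, t_rp_ns: float) -> float:
    """Read cycle time = RAS active time + RAS pre-charge time."""
    return t_ras_ns + t_rp_ns

# Illustrative numbers only (assumed, not from a real part).
t_rac = 50.0   # access time (latency) in ns
t_ras = 60.0   # RAS active time in ns
t_rp = 40.0    # RAS pre-charge time in ns

t_rc = read_cycle_time_ns(t_ras, t_rp)
max_reads_per_second = 1e9 / t_rc   # back-to-back reads, one per cycle

print(t_rc)                    # 100.0
print(max_reads_per_second)    # 10000000.0
print(t_rac < t_rc)            # True: data arrives before the cycle ends
```

The cycle time, not the access time, limits how often a new read can start.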
DRAM Refreshing
• DRAM Refreshing Hints:
• Rate: It varies, but manufacturers typically specify
  that each row should be refreshed every 64 ms.
• How is refreshing done? By activating each
  row using the RAS signal.
• When is refreshing done? The DRAM controller
  schedules the refreshes and
  makes sure that they do not interfere with
  regular reads and writes.
• So, to keep the data in the DRAM chip from leaking
  away, the DRAM controller periodically sweeps
  through all of the rows by
  placing a series of row addresses on the address
  bus.
• To reduce the number of refresh cycles, one
  method is to split the address such that there are
  fewer rows and more columns.
[Figure: The addressing structure of DRAM]
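A back-of-the-envelope estimate of the refresh cost, assuming the 64 ms figure above. The row counts and the 50 ns per-row refresh time are assumptions chosen for illustration, and `refresh_stats` is a hypothetical helper.

```python
def refresh_stats(n_rows: int, retention_ms: float = 64.0,
                  row_refresh_ns: float = 50.0):
    """Return (interval between row refreshes in us, fraction of time refreshing)."""
    interval_us = retention_ms * 1000 / n_rows
    busy_fraction = (n_rows * row_refresh_ns) / (retention_ms * 1e6)
    return interval_us, busy_fraction

# Same capacity organized with 8192 rows versus 4096 rows (more columns).
interval_8k, busy_8k = refresh_stats(8192)
interval_4k, busy_4k = refresh_stats(4096)

print(interval_8k, interval_4k)   # 7.8125 15.625
print(busy_4k < busy_8k)          # True: fewer rows, fewer refresh cycles
```

Halving the number of rows halves the number of refresh cycles per 64 ms window.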
DRAM Advantages/ Disadvantages
• Advantage: the packing
  density of DRAM is very high compared to SRAM.
• Advantage: it is cheaper than SRAM.
• Disadvantage: sensing a small charge on the memory cell
  capacitor is challenging due to noise from “coupling
  capacitance”.
[Figure: A DRAM cell]
Advanced DRAM Organization
•   Bits in a DRAM are organized as a rectangular array
     • DRAM accesses an entire row
•   Synchronous DRAM
     • Allows for consecutive accesses in bursts without needing to send
         each address
     • Improves bandwidth
•   Double data rate (DDR) DRAM
     • Transfer on rising and falling clock edges
•   Quad data rate (QDR) DRAM
     • Separate DDR inputs and outputs
    An animated video on DRAM:
    https://www.youtube.com/watch?v=7J7X7aZvMXQ
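Why transferring on both clock edges improves bandwidth can be seen from a peak-bandwidth estimate. The 200 MHz clock and 64-bit bus below are assumed example values, and `peak_bandwidth_bytes_per_s` is a hypothetical helper.

```python
def peak_bandwidth_bytes_per_s(clock_hz: float, bus_width_bits: int,
                               transfers_per_clock: int) -> float:
    """Peak bandwidth = clock rate x transfers per clock x bytes per transfer."""
    return clock_hz * transfers_per_clock * bus_width_bits / 8

# Assumed example: 200 MHz clock, 64-bit data bus.
sdr = peak_bandwidth_bytes_per_s(200e6, 64, transfers_per_clock=1)
ddr = peak_bandwidth_bytes_per_s(200e6, 64, transfers_per_clock=2)

print(sdr / 1e9)   # 1.6 (GB/s, one transfer per clock)
print(ddr / 1e9)   # 3.2 (GB/s, rising and falling edges)
```

This is a peak figure; it ignores row activation, pre-charge, and refresh.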
Dynamic vs. Static RAM
• SRAM vs. DRAM Summary
• Typical Memory Types and corresponding Latencies
ROM (Read Only Memory)
 • ROM (Read Only Memory): A ROM does not lose its
   contents when power is switched off. ROM is a
   type of ‘programmable’ memory. It has internal
   fuses which, when blown, create a permanent bit pattern
   that can be read
   whenever needed. If it is an OTP (one
   time programmable) ROM, its contents can
   never be changed once programmed.
 • EPROM: This is ‘Erasable and Programmable’ ROM;
   it is erased by exposing it to ultraviolet radiation.
 • EEPROM: This is ‘Electrically Erasable’ PROM,
   and erasure can be done while on the circuit board.
   EEPROM technology is used for BIOS ROMs in personal computers.
 • Flash ROM: This is a special type of EEPROM
   that can be erased and reprogrammed in blocks
   instead of one byte at a time. This feature gives
   flash memory a speed advantage over
   EEPROM. Flash ROM technology is used in
   microcontrollers and embedded computer systems.
[Figure: Categorization of main memory types]
 Flash Storage
• Nonvolatile semiconductor storage
   • 100× – 1000× faster than disk
   • Smaller, lower power, more robust
   • But more $/GB (between disk and DRAM)
   • Popular in personal mobile devices
    Flash Types
•    NOR flash: bit cell like a NOR gate
     •   Random read/write access
     •   Used for instruction memory in embedded systems
•    NAND flash: bit cell like a NAND gate
     •   Denser (bits/area), but block-at-a-time access
     •   Cheaper per GB
     •   Used for USB keys, media storage, …
•    Flash bits wear out after thousands of accesses
     •   Not suitable for direct RAM or disk replacement
     •   Wear leveling: remap data to less used blocks
    More information on: https://www.youtube.com/watch?v=YtBysgPOKx4
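The idea of wear leveling can be illustrated with a toy remapping table. This is a sketch, not how any particular flash controller works: the class `WearLeveler` is hypothetical and it assumes there are more physical blocks than logical blocks in use.

```python
class WearLeveler:
    """Toy wear leveling: each write goes to the least-erased available block."""

    def __init__(self, n_physical_blocks: int):
        self.erase_counts = [0] * n_physical_blocks
        self.mapping = {}    # logical block -> physical block
        self.data = {}       # physical block -> contents

    def write(self, logical_block: int, contents: bytes) -> int:
        in_use = set(self.mapping.values())
        old = self.mapping.get(logical_block)
        # Candidates: blocks not holding another logical block's data.
        free = [p for p in range(len(self.erase_counts))
                if p not in in_use or p == old]
        target = min(free, key=lambda p: self.erase_counts[p])
        self.erase_counts[target] += 1        # erase before programming
        if old is not None and old != target:
            self.data.pop(old, None)          # old copy is now stale
        self.mapping[logical_block] = target
        self.data[target] = contents
        return target

    def read(self, logical_block: int) -> bytes:
        return self.data[self.mapping[logical_block]]

flash = WearLeveler(n_physical_blocks=4)
for i in range(8):
    flash.write(0, bytes([i]))      # rewrite the same logical block 8 times

print(flash.erase_counts)           # [2, 2, 2, 2]
print(flash.read(0))                # b'\x07'
```

Without remapping, all 8 erases would land on one physical block.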
Disk Storage
• Nonvolatile, rotating magnetic storage
   • Big capacity
   • Awfully slow
      • Average read time about 6.2 ms
More information on :
https://www.youtube.com/watch?v=NtPc0jI21i0
A bit on SSD: https://www.youtube.com/watch?v=5Mh3o886qpg
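One way to arrive at an average read time of about 6.2 ms is the usual breakdown into seek time, rotational latency, transfer time, and controller overhead. The parameter values below are assumptions chosen for illustration; the slide itself does not state them.

```python
def avg_disk_read_ms(seek_ms: float, rpm: float, sector_bytes: int,
                     transfer_mb_per_s: float, controller_ms: float) -> float:
    """Average time to read one sector from a rotating disk, in ms."""
    rotational_ms = 0.5 * 60_000 / rpm            # half a rotation on average
    transfer_ms = sector_bytes / (transfer_mb_per_s * 1e6) * 1e3
    return seek_ms + rotational_ms + transfer_ms + controller_ms

# Assumed disk: 4 ms average seek, 15,000 rpm, 512-byte sector,
# 100 MB/s transfer rate, 0.2 ms controller overhead.
t = avg_disk_read_ms(seek_ms=4.0, rpm=15_000, sector_bytes=512,
                     transfer_mb_per_s=100, controller_ms=0.2)
print(round(t, 1))   # 6.2
```

Seek and rotation dominate; the actual data transfer takes only about 0.005 ms.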
 Summary
• Memory technologies
• RAM
   SRAM
   DRAM
• ROM
• Secondary storage
 Future lecture
• Principles of locality
• Caches