CSE2213 Lecture 8, Chapter 5: The Memory System
[Figure: processor-memory connection. The processor's MAR drives a k-bit address bus and its MDR connects to an n-bit data bus; control lines (R/W̄, MFC, etc.) coordinate transfers. Up to 2^k locations are addressable.]
Recall that data transfers between the processor and memory involve two
registers, MAR and MDR.
If the address bus is k bits wide, then the length of MAR is k bits.
If the word length is n bits, then the length of MDR is n bits.
Control lines include R/W̄ and MFC.
For a Read operation R/W̄ = 1, and for a Write operation R/W̄ = 0.
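The transfer sequence can be sketched as a toy simulation (register widths and the MFC handshake are simplified here; this is not any particular processor):

```python
# Toy model of processor-memory transfers through MAR and MDR.
# R/W = 1 means Read, R/W = 0 means Write, matching the control lines above.

class Memory:
    def __init__(self, k, n):
        self.words = {}              # up to 2**k addressable locations
        self.max_addr = 2**k - 1
        self.word_mask = 2**n - 1    # each word is n bits

    def access(self, mar, mdr, rw):
        assert 0 <= mar <= self.max_addr
        if rw == 1:                          # Read: memory -> MDR
            mdr = self.words.get(mar, 0)
        else:                                # Write: MDR -> memory
            self.words[mar] = mdr & self.word_mask
        return mdr                           # MFC would be signalled here

mem = Memory(k=16, n=32)
mem.access(mar=0x0040, mdr=12345, rw=0)       # write 12345 to address 0x0040
value = mem.access(mar=0x0040, mdr=0, rw=1)   # read it back
print(value)                                  # 12345
```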
Basic concepts (contd..)
❑ Measures for the speed of a memory:
◆ The elapsed time between the initiation of an operation and its completion
is the memory access time (e.g., the time between the Read
and MFC signals).
◆ The minimum time between the initiation of two successive memory
operations is the memory cycle time (e.g., the time between two successive
Read operations).
◆ Memory cycle time is slightly longer than memory access time.
❑ In general, the faster a memory system, the costlier it is and the
smaller it is.
❑ An important design issue is to provide a computer system with as
large and fast a memory as possible, within a given cost target.
❑ Several techniques to increase the effective size and speed of the
memory:
◆ Cache memory (to increase the effective speed).
◆ Virtual memory (to increase the effective size).
Semiconductor RAM memories
Internal organization of a memory chip: size = 128 bits, organized as 16 × 8 (16 words, 8 bits each).
[Figure: a 4-bit address (A0-A3) drives an address decoder that selects one of 16 word lines (W0-W15); each word line connects a row of 8 memory cells (flip-flops, FF) to the data lines b7-b0.]
Semiconductor RAM memories (contd..)
❑ Static RAMs (SRAMs):
◆ Consist of circuits that are capable of retaining their state as long as
power is applied.
◆ Volatile memories: their contents are lost when power is interrupted.
◆ Access times are in the range of a few nanoseconds.
◆ Capacity is low (each cell consists of 6 transistors).
◆ However, the cost is usually high.
❑ Dynamic RAMs (DRAMs):
◆ Do not retain their state indefinitely.
◆ Contents must be refreshed periodically to retain their state (or else they
are lost).
◆ Contents may be refreshed while they are being accessed for reading.
◆ Access time is longer than SRAM.
◆ Capacity is higher than SRAM (each cell consists of 1 transistor).
◆ Cost is lower than SRAM (as it has fewer transistors).
Semiconductor RAM memories (contd..)
Bandwidth
❑ Bandwidth measures how much time is needed to transfer a block of data.
❑ Since blocks can be variable in size, it is useful to define a performance
measure in terms of the number of bits or bytes transferred in
one second.
❑ This performance measure is referred to as the memory bandwidth.
❑ Bandwidth of a memory unit depends on:
◆ Speed of access to the chip.
◆ How many bits can be accessed in parallel.
❑ Bandwidth of data transfer between the processor and the memory unit also
depends on:
◆ The transfer capability of the links that connect the processor and memory,
i.e., the speed of the bus.
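As a worked example (the 64-bit access width and 10 ns access time below are illustrative assumptions, not figures from the slide):

```python
# Memory bandwidth: number of bits (or bytes) transferred in one second.
# It depends on how many bits are accessed in parallel and how fast.

def bandwidth_bits_per_sec(bits_per_access, access_time_sec):
    return bits_per_access / access_time_sec

# A hypothetical memory delivering 64 bits in parallel every 10 ns:
bw = bandwidth_bits_per_sec(64, 10e-9)
print(bw / 1e9)        # about 6.4 Gbit/s
print(bw / 8 / 1e9)    # about 0.8 GB/s
```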
Read-Only Memories (ROMs)
❑ SRAM and SDRAM chips are volatile:
◆ They lose their contents when the power is turned off.
❑ Many applications need memory devices that retain their contents after the
power is turned off.
◆ For example, when a computer is turned on, the operating system must be
loaded from the disk into the memory.
◆ This requires storing the instructions that load the OS from the disk.
◆ These instructions must not be lost when the power is turned off.
◆ So we need to store them in a non-volatile memory.
❑ Non-volatile memory is read in the same manner as volatile memory.
◆ A separate writing process is needed to place information in this memory.
◆ Since normal operation involves only reading of data, this type of memory is
called Read-Only Memory (ROM).
Read-Only Memories (contd..)
❑ Read-Only Memory (ROM):
◆ Data are written into a ROM when it is manufactured.
❑ Programmable Read-Only Memory (PROM):
◆ Allows the data to be loaded by the user.
◆ The process of inserting the data is irreversible.
◆ Storing user-specific information in a factory-programmed ROM is expensive,
so providing programming capability to the user may be better.
❑ Erasable Programmable Read-Only Memory (EPROM):
◆ Allows stored data to be erased and new data to be loaded.
◆ This flexibility is useful during the development phase of digital systems.
◆ An erasable, reprogrammable ROM.
◆ Erasure requires exposing the chip to UV light.
Flash memory
Acronyms
RAM -- Random Access Memory: the time taken to access any arbitrary location in memory is
constant (cf. disks).
ROM -- Read-Only Memory: ROMs are RAMs that can be written only once; thereafter
they can only be read. Older uses included storage of bootstrap code.
SRAM -- Static RAM: RAM that retains its contents without refreshing as long as power is
applied (still volatile: contents are lost when power is turned off).
SDRAM -- Synchronous DRAM: DRAM whose memory operations are synchronized with a clock
signal.
Memory hierarchy
[Figure: memory hierarchy pyramid - processor registers at the top, then primary (L1) cache, secondary (L2) cache, main memory, and magnetic disk at the bottom; moving down the hierarchy, size increases while speed and cost per bit decrease.]
• Fastest access is to the data held in processor registers. Registers are at
the top of the memory hierarchy.
• A relatively small amount of memory can be implemented on the processor
chip. This is the processor cache.
• Two levels of cache:
Level 1 (L1) cache is on the processor chip.
Level 2 (L2) cache is in between the main memory and the processor.
• The next level is main memory, implemented as SIMMs. Much larger, but much
slower than cache memory.
• The next level is magnetic disk (secondary memory). Huge amount of
inexpensive storage.
• Since the speed of memory access is critical, the idea is to bring the
instructions and data that will be used in the near future as close to the
processor as possible.
Cache memories
• The processor is much faster than the main memory.
– As a result, the processor has to spend much of its
time waiting while instructions and data are being
fetched from the main memory.
– This is a major obstacle to achieving good performance.
• Speed of the main memory cannot be increased beyond a
certain point.
• Cache memory is an architectural arrangement which
makes the main memory appear faster to the processor
than it really is.
• Cache memory is based on the property of computer
programs known as “locality of reference”.
Locality of reference
• Analysis of programs indicates that many instructions in localized
areas of a program are executed repeatedly during some period of
time, while the others are accessed relatively less frequently.
– These instructions may be the ones in a loop, nested loop or few
procedures calling each other repeatedly.
– This is called “locality of reference”.
• Temporal locality of reference:
– Recently executed instruction is likely to be executed again very
soon.
• Spatial locality of reference:
– Instructions with addresses close to a recently executed
instruction are likely to be executed soon.
Temporal locality of reference example: the loop

for (i = 1; i <= 10; i++) { Instruction X; Instruction Y; Instruction Z; Instruction C }

executes the same instructions repeatedly, so once brought into the cache they are reused:

Time accessed (when brought to cache) [hh:mm:ss] | Content in cache | Actual memory location (where the data is stored in main memory)
02:00:00 | Instruction X | 1000
02:00:01 | Instruction Y | 2004
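The reuse can be made concrete with a short sketch (the instruction names are just the placeholders from the example):

```python
# Temporal locality: the loop body's instructions are fetched on every
# one of the 10 iterations, so each is referenced many times.

from collections import Counter

body = ["X", "Y", "Z", "C"]                       # instructions in the loop body
trace = [instr for _ in range(10) for instr in body]  # 10 iterations

counts = Counter(trace)
print(counts["X"])   # 10 -- instruction X is referenced on every iteration
```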
Hit and Miss
• Focus on any two adjacent levels – called upper (closer to the CPU) and lower (farther
from the CPU) – in the memory hierarchy, because each block copy always moves between
two adjacent levels.
• Terminology:
– block: minimum unit of data moved between levels (usually one word).
– hit: the requested data is in the upper level (e.g., data found in the cache).
– miss: the requested data is not in the upper level (e.g., data not in the cache, so go
to the lower level (main memory or hard disk) to find it).
– hit rate: fraction of memory accesses that are hits (i.e., found at the upper level, say
the cache).
– miss rate: fraction of memory accesses that are not hits.
• miss rate = 1 – hit rate
– hit time: time to determine whether the access is a hit + time to get and deliver the
data from the upper level (e.g., cache) to the CPU.
– miss penalty: time to determine that the access is a miss + time to replace a block at the
upper level (say, cache) with the corresponding block from the lower level (e.g., RAM) + time
to deliver the block to the CPU.
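A minimal hit-rate counter over a made-up access trace (block size of one word and arbitrary eviction, purely for illustration):

```python
# Count hits and misses for a toy cache modeled as a set of block numbers.

def hit_rate(accesses, cache_capacity):
    cache, hits = set(), 0
    for addr in accesses:
        if addr in cache:
            hits += 1                    # hit: found in the upper level
        else:
            if len(cache) >= cache_capacity:
                cache.pop()              # evict an arbitrary block (no policy)
            cache.add(addr)              # miss: bring block from lower level
    return hits / len(accesses)

trace = [1, 2, 1, 3, 1, 2, 1, 2]         # repeated references -> locality
rate = hit_rate(trace, cache_capacity=4)
print(rate)          # 0.625: 5 hits out of 8, so miss rate = 1 - 0.625
```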
Cache memories
[Figure: Processor ↔ Cache ↔ Main memory.]
•When the processor issues a Read request, a block of words is transferred from the main memory to the
cache, one word at a time.
•Subsequent references to the data in this block of words are found in the cache.
•At any given time, only some blocks in the main memory are held in the cache.
•Which blocks in the main memory are in the cache is determined by a “mapping function”.
•When the cache is full, and a block of words needs to be transferred from the main memory,
some block of words in the cache must be replaced. This is determined by a “replacement
algorithm”.
Cache memories (contd..)
•Existence of a cache is transparent to the processor. The processor issues Read and
Write requests in the same manner.
•Read hit:
- The data is obtained from the cache.
•Write hit: Whenever the processor wants to write a word, it checks whether the
address it wants to write the data to is present in the cache. If the address
is present in the cache, it is called a write hit. Two protocols are used:
- Contents of the cache and the main memory may be updated simultaneously. This
is called the write-through protocol.
- Update the contents of the cache, and mark it as updated by setting a bit known as
the dirty bit or modified bit. The contents of the main memory are updated when this
block is replaced (i.e. removed from cache and replaced with another block). This is
write-back or copy-back protocol.
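The two write-hit policies can be sketched as follows (a simplified toy model; names like `CacheLine` are our own, and a real cache would track addresses per line):

```python
# Sketch of the two write-hit policies. The word is assumed to already be
# in the cache (i.e., a write hit has occurred).

class CacheLine:
    def __init__(self, data):
        self.data = data
        self.dirty = False                 # the "dirty"/modified bit

def write_through(line, main_memory, addr, value):
    line.data = value
    main_memory[addr] = value              # memory is updated immediately

def write_back(line, value):
    line.data = value
    line.dirty = True                      # memory updated only on eviction

def evict(line, main_memory, addr):
    if line.dirty:                         # copy back only if modified
        main_memory[addr] = line.data
        line.dirty = False

memory = {0x10: 1}
line = CacheLine(1)
write_back(line, 99)
print(memory[0x10])    # still 1: main memory is stale until eviction
evict(line, memory, 0x10)
print(memory[0x10])    # 99
```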
Cache memories (contd..)
•If the data is not present in the cache, then a Read miss or Write miss occurs.
•Read miss:
- Block of words containing this requested word is transferred from the memory.
- After the block is transferred, the desired word is forwarded to the processor.
- The desired word may also be forwarded to the processor as soon as it is
transferred without waiting for the entire block to be transferred. This is called
load-through or early-restart.
•Write-miss:
- If Write-through protocol is used, then the contents of the main memory are
updated directly.
- If the write-back protocol is used, the block containing the addressed word is first
brought into the cache; the desired word is then overwritten with the new information.
Mapping functions
• Mapping functions determine how memory
blocks are placed in the cache.
• Three mapping functions:
– Direct mapping
– Associative mapping
– Set-associative mapping.
Direct Mapped Cache
•Cache block address = memory block address mod number of blocks in the cache.
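A one-line sketch of the rule (the 128-block cache below is an assumed example size):

```python
# Direct mapping: each memory block maps to exactly one cache block.
# cache block = memory block number mod number of cache blocks.

def direct_map(memory_block, num_cache_blocks):
    return memory_block % num_cache_blocks

# With a 128-block cache, memory blocks 0, 128, 256, ... all compete
# for cache block 0:
print(direct_map(0, 128), direct_map(128, 128), direct_map(256, 128))  # 0 0 0
print(direct_map(129, 128))                                            # 1
```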
Set-Associative mapping
[Figure: cache organized as 64 sets of two blocks each (each block with a tag field), with main-memory blocks 0-4095 mapping into the sets.]
Set-associative mapping is a combination of direct and associative mapping.
Blocks of the cache are grouped into sets.
The mapping function allows a block of the main memory to reside in any block of a specific set.
Assume that the main memory has 4096 blocks and the cache has 128 blocks, divided into 64 sets with two blocks per set.
Memory blocks 0, 64, 128, etc. then map to set 0, and each can occupy either of the two positions in that set.
The 16-bit memory address is divided into three fields:
- The low-order 4 bits identify the word within a block.
- The middle 6-bit field determines the set number.
- The high-order 6 bits are compared to the tag fields of the two blocks in the set.
Number of memory blocks mapping to each set = 4096 / 64 = 64 = 2^6, so the tag field is 6 bits.
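The field extraction can be sketched directly with bit operations (a hypothetical helper, matching the 4/6/6-bit split in the example):

```python
# Splitting the 16-bit address from the set-associative example:
# [ tag: 6 bits | set: 6 bits | word: 4 bits ]

def split_address(addr):
    word = addr & 0xF              # low-order 4 bits: word within block
    set_no = (addr >> 4) & 0x3F    # middle 6 bits: set number
    tag = (addr >> 10) & 0x3F      # high-order 6 bits: tag
    return tag, set_no, word

# Memory blocks 0 and 64 both map to set 0 (block 64 starts at address 64*16),
# but they carry different tags:
print(split_address(0))        # (0, 0, 0)
print(split_address(64 * 16))  # (1, 0, 0)
```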
Replacement algorithms
When the cache is full and a block of words needs to be transferred from the
main memory, some block of words in the cache must be replaced. This is
determined by a “replacement algorithm”.
When a new block is to be transferred to the cache, and all the positions it
may occupy are full, which block in the cache should be replaced?
Locality of reference suggests that it may be okay to replace the block that
has gone the longest time without being referenced.
This block is called Least Recently Used (LRU) block, and the replacement
strategy is called LRU replacement algorithm.
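A minimal sketch of LRU replacement using Python's OrderedDict (illustrative, for one fully tracked set; the capacity of two matches the two-blocks-per-set example):

```python
from collections import OrderedDict

# Minimal LRU replacement for one cache set (or a fully associative cache).

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()          # key order tracks recency

    def access(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)   # mark as most recently used
            return "hit"
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
        self.blocks[block] = True
        return "miss"

c = LRUCache(2)
results = [c.access(b) for b in [1, 2, 1, 3, 2]]
print(results)   # ['miss', 'miss', 'hit', 'miss', 'miss']
```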
MEASURING CACHE PERFORMANCE
• The extra time needed to bring the desired information into the cache is
called the miss penalty.
• This penalty is ultimately reflected in the time that the processor is stalled
because the required instructions or data are not available for execution.
• In general, the miss penalty is the time needed to bring a block of data
from a slower unit in the memory hierarchy to a faster unit.
• The miss penalty is reduced if efficient mechanisms for transferring data
between the various units of the hierarchy are implemented.
MEASURING CACHE PERFORMANCE
• Consider now the impact of the cache on the overall performance of the
computer.
• Let h be the hit rate, M the miss penalty, that is, the time to access
information in the main memory, and C the time to access information in the
cache. The average access time (tave) experienced by the processor is:
tave = h*C + (1 - h)*M
If the computer has no cache, then, using a fast processor and a typical DRAM
main memory, suppose it takes 10 clock cycles for each memory read access.
Suppose, 17 cycles are needed to load a block into the cache.
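Plugging the slide's figures into the formula (the hit rate h = 0.95 and cache access time C = 1 cycle are assumed here for illustration; the slide does not give them):

```python
# Average access time t_ave = h*C + (1 - h)*M, using M = 17 cycles (block
# load) from the slide, plus an assumed h = 0.95 and C = 1 cycle.

def t_ave(h, C, M):
    return h * C + (1 - h) * M

with_cache = t_ave(h=0.95, C=1, M=17)
no_cache = 10                        # 10 cycles per access without a cache
print(with_cache)                    # about 1.8 cycles
print(no_cache / with_cache)         # about 5.6x: roughly "five times better"
```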
MEASURING CACHE PERFORMANCE
This result suggests that the computer with the cache performs five times
better.
MEASURING CACHE PERFORMANCE
This means that the actual cache provides an environment in which the
processor effectively works with a large DRAM-based main memory that
appears to be only two times slower than the circuits in the cache.
IMPROVING CACHE PERFORMANCE
• From the speed point of view, the optimal place for a cache is on the
processor chip.
• Unfortunately, space on the processor chip is needed for many other
functions; this limits the size of the cache that can be accommodated.
• All high-performance processor chips include some form of a cache.
• Some manufacturers have chosen to implement two separate caches, one for
instructions and another for data, as in the 68040, Pentium III, and Pentium 4
processors.
• Others have implemented a single cache for both instructions and data, as in
the ARM710T processor.
IMPROVING CACHE PERFORMANCE
• The L2 cache can be slower, but it should be much larger to ensure a high hit
rate.
• Its speed is less critical because it only affects the miss penalty of the L1
cache.
• A workstation computer may include an L1 cache with the capacity of tens of
kilobytes and an L2 cache of several megabytes. Including an L2 cache further
reduces the impact of the main memory speed on the performance of a
computer.
IMPROVING CACHE PERFORMANCE
• In most modern computer systems, the physical main memory is not as large
as the address space spanned by an address issued by the processor.
• For example, a processor that issues 32-bit addresses has an addressable
space of 4G bytes.
• The size of the main memory in a typical computer ranges from a few
hundred megabytes to 1G bytes.
• When a program does not completely fit into the main memory, the parts of
it not currently being executed are stored on secondary storage devices,
such as magnetic disks.
• Of course, all parts of a program that are eventually executed are first
brought into the main memory.
VIRTUAL MEMORY
• Programs, and hence the processor, reference an instruction and data space
that is independent of the available physical main memory space.
• The binary addresses that the processor issues for either instructions or
data are called virtual or logical addresses. These addresses are translated
into physical addresses by a combination of hardware and software
components.
• If a virtual address refers to a part of the program or data space that is
currently in the physical memory, then the contents of the appropriate location
in the main memory are accessed immediately.
• On the other hand, if the referenced address is not in the main memory, its
contents must be brought into a suitable location in the memory before they
can be used.
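A toy address-translation sketch under the scheme just described (the page size, page-table contents, and page-fault handling are illustrative assumptions, and a real MMU would also cache entries in a TLB):

```python
# Toy virtual-to-physical address translation with a page table.

PAGE_SIZE = 4096     # 4 KB pages -> the low 12 bits are the page offset

page_table = {0: 5, 1: 9}     # virtual page -> physical frame (resident pages)

def translate(vaddr):
    vpage, offset = divmod(vaddr, PAGE_SIZE)
    if vpage not in page_table:
        # Referenced address not in main memory: bring it in from disk first.
        raise LookupError("page fault")
    return page_table[vpage] * PAGE_SIZE + offset

print(hex(translate(0x0010)))   # virtual page 0 -> frame 5, i.e. 0x5010
```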
VIRTUAL MEMORY