KEMBAR78
CPUlogic Design 11 Cache | PDF | Cpu Cache | Cache (Computing)
0% found this document useful (0 votes)
49 views27 pages

CPUlogic Design 11 Cache

This document discusses CPU logic design and cache memory. It describes the memory hierarchy from fastest to slowest components and their typical sizes and access times. It shows a diagram of the processor, instruction cache, and data cache. The rest of the document discusses key cache parameters like size, block size, placement, mapping algorithms, replacement algorithms, and write strategies. It provides examples to explain direct mapping, set associative mapping and fully associative placement. It also describes how blocks are identified in the cache using address tags and valid bits and how CPU addresses are divided for caching.

Uploaded by

sharoofy
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
49 views27 pages

CPUlogic Design 11 Cache

This document discusses CPU logic design and cache memory. It describes the memory hierarchy from fastest to slowest components and their typical sizes and access times. It shows a diagram of the processor, instruction cache, and data cache. The rest of the document discusses key cache parameters like size, block size, placement, mapping algorithms, replacement algorithms, and write strategies. It provides examples to explain direct mapping, set associative mapping and fully associative placement. It also describes how blocks are identified in the cache using address tags and valid bits and how CPU addresses are divided for caching.

Uploaded by

sharoofy
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 27

CPU Logic Design

Li Yamin
Department of Computer Science
Faculty of Computer and Information Sciences
Hosei University, Tokyo 184-8584 Japan
http://cis.k.hosei.ac.jp/∼yamin/

LYM, Dept. of CS, Hosei University Cache – p.1/27


Performance

100,000

10,000

1,000

100

10

1
19
80
19
81
19
82
CPU Logic Design

19
83
19
84

LYM, Dept. of CS, Hosei University


19
85
19
86
19
87
19
88
19
89
19
90
19
91
19
92
19
CPU

93
19
94
19
95
19
Memory

96
19
97
19
98
19
99
20
00
20
Cache – p.2/27

02
CPU Logic Design

Memory hierarchy
Speed Size

Fastest Registers smallest


On-chip cache
Off-chip cache

Main memory
Disk
Slowest Tape Biggest

LYM, Dept. of CS, Hosei University Cache – p.3/27


CPU Logic Design

Capacity Access Time


Registers ∼ 1KB 0.5 ∼ 1ns
On-chip cache 16 ∼ 128KB 1 ∼ 10ns
Off-chip cache 128K ∼ 4MB 10 ∼ 20ns
Main memory 256MB ∼ 4GB 40 ∼ 80ns
Hard disk 20 ∼ 500GB 1 ∼ 10ms
Tape ∼ 10TB 10ms ∼ 10s

LYM, Dept. of CS, Hosei University Cache – p.4/27


CPU Logic Design

Processor
DO
IR
DI Instruction Memory
Cache
PC A

DI D

DO
DO
DI Data
Cache
DA A

LYM, Dept. of CS, Hosei University Cache – p.5/27


CPU Logic Design

 
  Cache Size
Size






  Block Size

 

  Physical Address Cache

Cache Placement






  Virtual Address Cache

 







 Direct Mapping

Mapping Algorithms Set Associate Mapping




 

Parameters Full Associate Mapping


 


  Write Back


 Memory Update Mechanism


  Write Through

 

  Write Allocate




 Cache Write Miss


  No-Write Allocate

 

  Random Replacement Algorithms


 Replacement Algorithms 



LRU Replacement Algorithms

LYM, Dept. of CS, Hosei University Cache – p.6/27


CPU Logic Design

Q1: Block placement – Where can a block be placed


in the upper level?

Q2: Block identification – How is a block found if it is in


the upper level?

Q3: Block replacement – Which block should be


replaced on a miss?

Q4: Write strategy – What happens on a write?

LYM, Dept. of CS, Hosei University Cache – p.7/27


CPU Logic Design

Q1: Where can a block be placed in the upper level?


If each block has only one place it can appear in the
cache, the cache is said to be direct mapped.
If a block can be placed anywhere in the cache, the
cache is said to be fully associative.
If a block can be placed in a restricted set of places in
the cache, the cache is said to be set associative. A
set is a group of blocks in the cache. A block is first
mapped onto a set, and then the block can be placed
anywhere within that set. If there are n blocks in a set,
the cache placement is called n-way set associative.

LYM, Dept. of CS, Hosei University Cache – p.8/27


CPU Logic Design

Fully associative Direct mapped Set associative


block 12 can go block 12 can go block 12 can go
anywhere only into block 4 anywhere in set 0
(12 mod 8) (12 mod 4)
Block Block Block
0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
number number number

Cache

Set Set Set Set


0 1 2 3

Block 0 1 2 3 4 5 6 7 8 9 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
number

Memory

LYM, Dept. of CS, Hosei University Cache – p.9/27


CPU Logic Design

Q2: How is a block found if it is in the upper level?


Caches have an address tag on each block frame that
gives the block address.
The tag is checked to see if it matches the block
address from the CPU.
There must be a way to know that a cache block does
not have valid information. The most common
procedure is to add a valid bit to the tag to say
whether or not this entry contains a valid address.
The figure in the next page shows how a CPU
address is divided.

LYM, Dept. of CS, Hosei University Cache – p.10/27


CPU Logic Design

The three portions of an address in a set-associative


or direct-mapped cache:
Block address Block
Tag Index offset

The first division is between the block address and


the block offset. The block frame address can be
further divided into the tag field and the index field.
The block offset field selects the desired data from
the block, the index field selects the set, and the tag
field is compared against it for a hit.

LYM, Dept. of CS, Hosei University Cache – p.11/27


CPU Logic Design

Block
Block address
offset
<21> <8> <5> Address
Tag Index Data in
2
Data out
Valid Tag Data
<1> <21> <256>

256
blocks Write
buffer
mux
=? 64

Lower level memory


Hit

LYM, Dept. of CS, Hosei University Cache – p.12/27


CPU Logic Design

Block
Block address
offset
<22> <7> <5> Address
Tag Index Data in
2
Data out

Valid Tag Data


<1> <22> <256>

128
blocks Write
buffer
mux mux
=? =?

Lower
mux level
64 memory

LYM, Dept. of CS, Hosei University Cache – p.13/27


CPU Logic Design

Q3: Which block should be replaced on a miss?


When a miss occurs, the cache controller must select
a block to be replaced with the desired data.
A benefit of direct-mapped placement is that
hardware decisions are simplified — in fact, so simple
that there is no choice: Only one block is selected for
a hit, and one that block can be replaced.
With fully associative or set associative placement,
there are many blocks to choose from on a miss.
There are two primary strategies employed for
selecting which block to replace — random and
least-recently used (LRU).

LYM, Dept. of CS, Hosei University Cache – p.14/27


CPU Logic Design

Random Strategy:
To spread allocation uniformly, candidate blocks are
randomly selected.
Some system generate pseudo-random block
numbers to get reproducible behavior, which is
partially useful when debugging hardware.
A virtue of random replacement is that it is simple to
build in hardware.

LYM, Dept. of CS, Hosei University Cache – p.15/27


CPU Logic Design

Least-recently used (LRU) Strategy:


To reduce the chance of throwing out information that
will be needed soon, accesses to block are recorded.
The block replaced is the one that has been unused
for the longest time.
LRU makes use of a corollary of locality: If recently
used blocks are likely to be used again, then the best
candidate for disposal is the least-recently used block.

LYM, Dept. of CS, Hosei University Cache – p.16/27


CPU Logic Design

When a miss occurs in a cache, the incoming data


has to replace something already in the cache.
The least recently used , or LRU, replacement
algorithm tries to find which block was used least
recently. We must keep track of the most recent time
that a block is used.

LYM, Dept. of CS, Hosei University Cache – p.17/27


CPU Logic Design

Consider a set size of 8 (8-way set associate cache).


LRU could be realized by assigning a 3-bit
downcounter to each block. When the cache is
flushed, all counters are reset. The rules are:

Replace at count 0 (any 0 will do).


Upon a hit (including replacement), set the hit block
to 7 and decrement all other counters, but the
counter cannot go below zero.
On cache reset, reset all counters as well as valid
bits.

LYM, Dept. of CS, Hosei University Cache – p.18/27


CPU Logic Design

If the set size is 4 (4-way set associate cache), LRU


could be realized by assigning a 2-bit downcounter to
each block.
The counter value 3 means that set is used most
recently; 2 means that set is used more recently; ...;
and 0 means that set is used least recently.

LYM, Dept. of CS, Hosei University Cache – p.19/27


CPU Logic Design

If the set size is 2 (2-way set associate cache), the


one which was not most recently used must be the
least recently used. So it is not necessary to use two
1-bit counters in the two sets. A 1-bit counter is
enough. The counter value 1 means the set 1 is used
most recently, for instance, it can be understand that
the set 0 must be used least recently.
By assigning a 4-bit downcounter to each block can
realize LRU for a 16-way set associate cache. But the
16-way set associate cache was not found in real
design. Assigning a 4-bit downcounter to each block
will increase the hardware burden greatly.

LYM, Dept. of CS, Hosei University Cache – p.20/27


CPU Logic Design

(1) Replace S3: (2) Replace S1:


S3 S2 S1 S0 S3 S2 S1 S0
0 3 0 0 0 2 0 3 0
1 0 0 0 0 0 0 0 0
LRU counter ... ... ... ... ... ... ... ... ...
(2-bit)
7 0 0 0 0 0 0 0 0

(3) Replace S0: (4) Replace S2:


S3 S2 S1 S0 S3 S2 S1 S0

1 0 2 3 0 3 1 2
0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ...

0 0 0 0 0 0 0 0

LYM, Dept. of CS, Hosei University Cache – p.21/27


CPU Logic Design

2-way set-associate cache Tag 3 2

S1
2AE1 4 1 Word address S0

d3 d2 d1 d0 V Tag LRU Tag V d3 d2 d1 d0


0 0 0 1 0
1 1 1 0 1
2 1 0 1 2
3 0 0 3
4 1 2AE1 1 15D3 1 4
5 0 0 1 5
6 1 1 0 6
7 1 1 1 7

COMP COMP
Select Select
MUX MUX
OutputEnable OutputEnable

Data out Hit Data out

LYM, Dept. of CS, Hosei University Cache – p.22/27


CPU Logic Design

Q4: What happens on a write?


There are two basic options when writing to the
cache:
1. Write-through (or store through) — The information
is written to both the block in the cache and the
block in the lower-level memory.
2. Write back (or copy back or store in) — The
information is written only to the block in the cache.
The modified cache block is written to main
memory only when it is replaced.

LYM, Dept. of CS, Hosei University Cache – p.23/27


CPU Logic Design

There are two options on a write miss:


1. Write allocate — The block is loaded on a write
miss. This is similar to a read miss.
2. No-write allocate — The block is modified in the
lower level and not loaded into the cache.
Although either write-miss policy could be used with
write through or write back, write-back caches
generally use write allocate (hopping that subsequent
writes to that block will be captured by the cache) and
write-through caches often use no-write allocate
(since subsequent writes to that block will still have to
go to memory).

LYM, Dept. of CS, Hosei University Cache – p.24/27


CPU Logic Design

A valid bit is added to units smaller than the full block,


called sub-blocks.
Only a single sub-block need be read on a miss.

Tag V V V V
100 1 1 1 1
300 1 1 0 0
200 0 1 0 1
204 0 0 0 0

SUb-block

LYM, Dept. of CS, Hosei University Cache – p.25/27


CPU Logic Design

Tc : cache access time


Tm : main memory access time
h: hit ratio; 1 − h: miss ratio
Average memory access time
T = hTc + (1 − h)(Tc + Tm ) = Tc + (1 − h)Tm

Example:
Tc = 1ns, Tm = 50ns, h = 95%
T = Tc + mTm = 1 + 0.05 × 50 = 1 + 2.5 = 3.5ns
Speedup = 50/3.5 = 14.29 = 1429%

LYM, Dept. of CS, Hosei University Cache – p.26/27


CPU Logic Design

First-level caches in the Pentium Pro and PowerPC 604


Characteristic Intel Pentium Pro PowerPC 604
Cache organization Split instruction Split instruction
and data caches and data caches
Cache size 8KB each for 16KB each for
instructions/data instructions/data
Cache associativity Four-way set Four-way set
associative associative
Replacement Approximated LRU LRU
Block size 32 bytes 32 bytes
Write policy Write-back Write-back or Write-through

LYM, Dept. of CS, Hosei University Cache – p.27/27

You might also like