CENG 3420
Computer Organization & Design
Lecture 17: Cache-3 Discussions
Bei Yu
CSE Department, CUHK
byu@cse.cuhk.edu.hk
(Textbook: Chapters 5.3–5.4)
Spring 2022
Overview
1 Example 1
2 Example 2
3 Example 3
4 Performance Issues
2/28
Example 1
Cache Example
short A[10][4];
int sum = 0;
int j, i;
double mean;
// forward loop
for (j = 0; j <= 9; j++)
    sum += A[j][0];
mean = sum / 10.0;
// backward loop
for (i = 9; i >= 0; i--)
    A[i][0] = A[i][0] / mean;

• Assume separate instruction and data caches, so we consider only the data cache
• The cache has space for 8 blocks
• A block contains one word (byte)
• A[10][4] is an array of words located at 7A00-7A27 in row-major order
4/28
Cache Example
Memory word address (hex)   Memory word address (binary)   Array contents (40 elements)
(7A00)   0111 1010 0000 0000   A[0][0]
(7A01)   0111 1010 0000 0001   A[0][1]
(7A02)   0111 1010 0000 0010   A[0][2]
(7A03)   0111 1010 0000 0011   A[0][3]
(7A04)   0111 1010 0000 0100   A[1][0]
  ...
(7A24)   0111 1010 0010 0100   A[9][0]
(7A25)   0111 1010 0010 0101   A[9][1]
(7A26)   0111 1010 0010 0110   A[9][2]
(7A27)   0111 1010 0010 0111   A[9][3]

• Tag for direct mapped: 8 blocks in cache, so 3 bits encode the cache block number; the remaining 13 bits form the tag
• Tag for set-associative: 4 blocks/set, 2 cache sets, so 1 bit encodes the cache set number; the remaining 15 bits form the tag
• Tag for associative: the entire 16-bit address forms the tag
To simplify discussion: 16-bit word (byte) address; i.e. 1 word = 1 byte.
5/28
Direct Mapping
• The least significant 3 bits of the address determine the cache block
• No replacement algorithm is needed in direct mapping
• We get cache hits only when i == 9 and i == 8 (2 hits in total)
• Only 2 out of the 8 cache positions are used
• Very inefficient cache utilization
Content of data cache after each loop pass (time line); the columns are
j=0 j=1 j=2 j=3 j=4 j=5 j=6 j=7 j=8 j=9 i=9 i=8 i=7 i=6 i=5 i=4 i=3 i=2 i=1 i=0,
and each row begins at the pass in which that cache block is first filled:

Block 0: A[0][0] A[0][0] A[2][0] A[2][0] A[4][0] A[4][0] A[6][0] A[6][0] A[8][0] A[8][0] A[8][0] A[8][0] A[8][0] A[6][0] A[6][0] A[4][0] A[4][0] A[2][0] A[2][0] A[0][0]
Block 4: A[1][0] A[1][0] A[3][0] A[3][0] A[5][0] A[5][0] A[7][0] A[7][0] A[9][0] A[9][0] A[9][0] A[7][0] A[7][0] A[5][0] A[5][0] A[3][0] A[3][0] A[1][0] A[1][0]
Blocks 1-3 and 5-7: never used
Tags not shown but are needed.
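As a quick check, here is a minimal C sketch (not part of the lecture material) that maps each address accessed in the forward loop to its direct-mapped block number; it prints only blocks 0 and 4, which is why the remaining six blocks stay unused.

#include <stdio.h>

/* Sketch: with 8 one-word blocks, the direct-mapped block number is
 * simply the low 3 bits of the word address.                        */
int main(void)
{
    for (int j = 0; j <= 9; j++) {
        unsigned addr = 0x7A00 + 4 * j;          /* address of A[j][0] */
        printf("A[%d][0] at %04X -> block %u\n", j, addr, addr % 8);
    }
    return 0;   /* prints only blocks 0 and 4, alternating */
}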
6/28
Associative Mapping
• LRU replacement policy: we get cache hits for i = 9, 8, ..., 2 (8 hits in total)
• If the i loop were a forward loop, we would get no hits!
Content of data cache after each loop pass (time line); the columns are
j=0 j=1 j=2 j=3 j=4 j=5 j=6 j=7 j=8 j=9 i=9 i=8 i=7 i=6 i=5 i=4 i=3 i=2 i=1 i=0,
and each row begins at the pass in which that cache block is first filled:

Block 0: A[0][0] A[0][0] A[0][0] A[0][0] A[0][0] A[0][0] A[0][0] A[0][0] A[8][0] A[8][0] A[8][0] A[8][0] A[8][0] A[8][0] A[8][0] A[8][0] A[8][0] A[8][0] A[8][0] A[0][0]
Block 1: A[1][0] A[1][0] A[1][0] A[1][0] A[1][0] A[1][0] A[1][0] A[1][0] A[9][0] A[9][0] A[9][0] A[9][0] A[9][0] A[9][0] A[9][0] A[9][0] A[9][0] A[1][0] A[1][0]
Block 2: A[2][0] A[2][0] A[2][0] A[2][0] A[2][0] A[2][0] A[2][0] A[2][0] A[2][0] A[2][0] A[2][0] A[2][0] A[2][0] A[2][0] A[2][0] A[2][0] A[2][0] A[2][0]
Block 3: A[3][0] A[3][0] A[3][0] A[3][0] A[3][0] A[3][0] A[3][0] A[3][0] A[3][0] A[3][0] A[3][0] A[3][0] A[3][0] A[3][0] A[3][0] A[3][0] A[3][0]
Block 4: A[4][0] A[4][0] A[4][0] A[4][0] A[4][0] A[4][0] A[4][0] A[4][0] A[4][0] A[4][0] A[4][0] A[4][0] A[4][0] A[4][0] A[4][0] A[4][0]
Block 5: A[5][0] A[5][0] A[5][0] A[5][0] A[5][0] A[5][0] A[5][0] A[5][0] A[5][0] A[5][0] A[5][0] A[5][0] A[5][0] A[5][0] A[5][0]
Block 6: A[6][0] A[6][0] A[6][0] A[6][0] A[6][0] A[6][0] A[6][0] A[6][0] A[6][0] A[6][0] A[6][0] A[6][0] A[6][0] A[6][0]
Block 7: A[7][0] A[7][0] A[7][0] A[7][0] A[7][0] A[7][0] A[7][0] A[7][0] A[7][0] A[7][0] A[7][0] A[7][0] A[7][0]
Tags not shown but are needed; LRU Counters not shown but are needed.
7/28
Set Associative Mapping
• Since all accessed blocks have even addresses (7A00, 7A04, 7A08, ...), they all map to
set 0, so only half of the cache is used
• LRU replacement policy: we get hits for i = 9, 8, 7 and 6 (4 hits in total)
• Random replacement would have better average performance here
• If the i loop were a forward loop, we would get no hits!
Content of data cache after each loop pass (time line); the columns are
j=0 j=1 j=2 j=3 j=4 j=5 j=6 j=7 j=8 j=9 i=9 i=8 i=7 i=6 i=5 i=4 i=3 i=2 i=1 i=0,
and each row begins at the pass in which that cache block is first filled:

Set 0, Block 0: A[0][0] A[0][0] A[0][0] A[0][0] A[4][0] A[4][0] A[4][0] A[4][0] A[8][0] A[8][0] A[8][0] A[8][0] A[8][0] A[8][0] A[8][0] A[4][0] A[4][0] A[4][0] A[4][0] A[0][0]
Set 0, Block 1: A[1][0] A[1][0] A[1][0] A[1][0] A[5][0] A[5][0] A[5][0] A[5][0] A[9][0] A[9][0] A[9][0] A[9][0] A[9][0] A[5][0] A[5][0] A[5][0] A[5][0] A[1][0] A[1][0]
Set 0, Block 2: A[2][0] A[2][0] A[2][0] A[2][0] A[6][0] A[6][0] A[6][0] A[6][0] A[6][0] A[6][0] A[6][0] A[6][0] A[6][0] A[6][0] A[6][0] A[2][0] A[2][0] A[2][0]
Set 0, Block 3: A[3][0] A[3][0] A[3][0] A[3][0] A[7][0] A[7][0] A[7][0] A[7][0] A[7][0] A[7][0] A[7][0] A[7][0] A[7][0] A[3][0] A[3][0] A[3][0] A[3][0]
Set 1 (Blocks 4-7): never used
Tags not shown but are needed; LRU Counters not shown but are needed.
8/28
Comments on the Example
• In this example, associative mapping performs best, then set-associative, and direct mapping performs worst.
• What are the advantages and disadvantages of each scheme?
• In practice,
  • hit rates as low as in this example are very rare;
  • set-associative mapping with an LRU replacement scheme is usually used;
  • larger blocks and more blocks (i.e., more cache memory) greatly improve the cache hit rate.
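The three mappings above can be reproduced with a small simulation. The following C sketch (illustrative, not from the slides) models the 8-block cache with a configurable associativity and an LRU policy, replays the 20 accesses of the example (10 forward, 10 backward), and reports 2, 4 and 8 hits for direct-mapped, 4-way set-associative and fully associative mapping respectively.

#include <stdio.h>

#define NUM_BLOCKS 8    /* total cache blocks                       */
#define NUM_ACCESS 20   /* 10 forward-loop + 10 backward-loop reads */

/* Simulate one organization and return the number of hits.
 * assoc = blocks per set: 1 = direct mapped, 8 = fully associative. */
static int simulate(int assoc)
{
    int num_sets = NUM_BLOCKS / assoc;
    unsigned tag[NUM_BLOCKS];
    int valid[NUM_BLOCKS] = {0};
    unsigned age[NUM_BLOCKS] = {0};   /* LRU counters: smaller = older */
    unsigned clock = 0;
    int hits = 0;

    /* access pattern of the example: A[j][0] forward, then A[i][0] backward */
    unsigned addr[NUM_ACCESS];
    for (int j = 0; j <= 9; j++) addr[j] = 0x7A00 + 4 * j;
    for (int i = 9; i >= 0; i--) addr[19 - i] = 0x7A00 + 4 * i;

    for (int a = 0; a < NUM_ACCESS; a++) {
        int set = addr[a] % num_sets;       /* low bits select the set     */
        unsigned t = addr[a] / num_sets;    /* remaining bits form the tag */
        int base = set * assoc, victim = base, hit = 0;

        for (int w = base; w < base + assoc; w++) {
            if (valid[w] && tag[w] == t) { hit = 1; age[w] = ++clock; break; }
            if (!valid[w] || age[w] < age[victim]) victim = w; /* empty or LRU */
        }
        if (hit) { hits++; continue; }
        valid[victim] = 1; tag[victim] = t; age[victim] = ++clock;
    }
    return hits;
}

int main(void)
{
    printf("direct mapped     : %d hits out of 20\n", simulate(1)); /* 2 */
    printf("4-way set assoc.  : %d hits out of 20\n", simulate(4)); /* 4 */
    printf("fully associative : %d hits out of 20\n", simulate(8)); /* 8 */
    return 0;
}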
9/28
Example 2
Question:
How many total bits are required for a direct-mapped cache with 16 KiB of data and
4-word blocks, assuming a 32-bit address?
Answer:
• In a 32-bit address CPU, 16 KiB is 4096 words.
• With a block size of 4 words, there are 1024 blocks.
• Each block has 4 × 32 or 128 bits of data plus a tag, which is (32 − 10 − 2 − 2) = 18
bits, plus a valid bit.
• Thus, the total cache size is 2^10 × (4 × 32 + 18 + 1) = 2^10 × 147 bits.
• The total number of bits in the cache is about 147 / (32 × 4) ≈ 1.15 times as many as
needed just for the storage of the data.
11/28
Question:
How many total bits are required for a fully associative cache with 16 KiB of data and
4-word blocks, assuming a 32-bit address?
Answer:
• In a 32-bit address CPU, 16 KiB is 4096 words.
• With a block size of 4 words, there are 1024 blocks.
• Each block has 4 × 32 or 128 bits of data plus a tag, which is (32 − 2 − 2) = 28 bits,
plus a valid bit.
• Thus, the total cache size is 2^10 × (4 × 32 + 28 + 1) = 2^10 × 157 bits.
• The total number of bits in the cache is about 157 / (32 × 4) ≈ 1.23 times as many as
needed just for the storage of the data.
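Both bit counts follow the same formula, so they can be verified with a small C sketch (a hypothetical helper, not from the textbook): total_bits multiplies the number of blocks by the data, tag and valid bits per block, given the number of index bits of the organization.

#include <stdio.h>

/* Sketch: total storage bits of a cache holding num_blocks blocks of
 * words_per_block 32-bit words, with index_bits index bits, for a
 * 32-bit byte address.
 * tag bits = 32 - index bits - word-offset bits - 2 byte-offset bits. */
static unsigned long total_bits(unsigned num_blocks, unsigned index_bits,
                                unsigned words_per_block)
{
    unsigned word_offset_bits = 0;
    for (unsigned w = words_per_block; w > 1; w >>= 1)
        word_offset_bits++;                          /* log2(words_per_block) */

    unsigned data_bits = 32 * words_per_block;       /* data per block        */
    unsigned tag_bits  = 32 - index_bits - word_offset_bits - 2;
    return (unsigned long)num_blocks * (data_bits + tag_bits + 1); /* + valid */
}

int main(void)
{
    /* 16 KiB of data = 1024 blocks of 4 words */
    printf("direct mapped    : %lu bits\n", total_bits(1024, 10, 4)); /* 2^10 x 147 */
    printf("fully associative: %lu bits\n", total_bits(1024,  0, 4)); /* 2^10 x 157 */
    return 0;
}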
12/28
Example 3
Question
We have designed a 64-bit address direct-mapped cache, and the bits of address used to
access the cache are as shown below:
Table: Bits of the address used to access the cache
Tag: 63-10    Index: 9-5    Offset: 4-0
1 What is the block size of the cache in words?
2 Find the ratio between total bits required for such a cache design implementation
over the data storage bits.
3 Beginning from power on, the following byte-addressed cache references are
recorded as shown below.
Table: Recorded byte-addressed cache references
Hex 00 04 10 84 E8 A0 400 1E 8C C1C B4 884
Dec 0 4 16 132 232 160 1024 30 140 3100 180 2180
Find the hit ratio.
14/28
1 Each cache block consists of four 8-byte words, so the block size is 4 words. The total
offset is 5 bits: three of those bits are the byte offset within an 8-byte word, and the
remaining two bits are the word offset within the block. Two bits enumerate 2^2 = 4 words.
2 The ratio is about 1.21. The cache stores a total of
32 lines × 4 words/block × 8 bytes/word = 1024 bytes = 8192 bits of data. In addition to
the data, each line contains 54 tag bits and 1 valid bit. Thus, the total number of bits
required is 8192 + 54 × 32 + 1 × 32 = 9952 bits, and 9952 / 8192 ≈ 1.21.
3 The hit ratio is 4 / 12 ≈ 33%.
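The hit count in part 3 can be checked with a short C sketch (not part of the official solution) that models the 32-line direct-mapped cache of this design and replays the recorded references.

#include <stdio.h>

/* Sketch: direct-mapped cache with 32 lines of 32 bytes
 * (offset bits 4-0, index bits 9-5, tag bits 63-10), replaying
 * the recorded byte addresses and counting hits.               */
int main(void)
{
    unsigned long refs[] = { 0x00, 0x04, 0x10, 0x84, 0xE8, 0xA0,
                             0x400, 0x1E, 0x8C, 0xC1C, 0xB4, 0x884 };
    int n = sizeof refs / sizeof refs[0];

    unsigned long tag[32];
    int valid[32] = {0}, hits = 0;

    for (int i = 0; i < n; i++) {
        unsigned idx = (refs[i] >> 5) & 0x1F;   /* index: bits 9-5     */
        unsigned long t = refs[i] >> 10;        /* tag: bits 63-10     */
        if (valid[idx] && tag[idx] == t) {
            hits++;                             /* hit                 */
        } else {
            valid[idx] = 1;                     /* miss: fill the line */
            tag[idx] = t;
        }
    }
    printf("hits: %d / %d\n", hits, n);         /* prints 4 / 12       */
    return 0;
}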
15/28
Performance Issues
Q1: Where Can a Block Be Placed in the Upper Level?
Scheme name         # of sets                       Blocks per set
Direct mapped       # of blocks                     1
Set associative     (# of blocks) / Associativity   Associativity
Fully associative   1                               # of blocks
Q2: How Is an Entry Found?
Scheme name Location method # of comparisons
Direct mapped Index 1
Set associative Index the set; compare set’s tags Degree of associativity
Fully associative Compare all tags # of blocks
17/28
Q3: Which Entry Should Be Replaced on a Miss?
• Direct mapped: only one choice
• Set associative or fully associative:
• Random
• LRU (Least Recently Used)
Note that:
• For a 2-way set-associative cache, random replacement has a miss rate about 1.1× that of LRU
• For high associativity (e.g., 4-way and beyond), true LRU is too costly to implement
18/28
Q4: What Happens on a Write?
• Write-Through:
  • The information is written to both the block in the cache and the block in the lower level
    of memory
  • Combined with a write buffer, write waits can be eliminated
  • Pros: read misses don't result in writes to the lower level; easier to implement
• Write-Back:
  • The information is written only to the block in the cache
  • The modification is written to the lower level only when the block is replaced
  • Needs a dirty bit: tracks whether the block is clean or not
  • Virtual memory always uses write-back
  • Pros: writes occur at the speed of the cache; repeated writes to a block require only one
    write to the lower level
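A minimal C sketch of the two store paths may make the difference concrete. The structure and names are illustrative only (a one-line cache whose tag is the full word address), and write-allocate on a write miss is assumed.

#include <stdio.h>
#include <string.h>

enum policy { WRITE_THROUGH, WRITE_BACK };

struct line { int valid, dirty; unsigned tag; unsigned data; };

unsigned memory[1024];                       /* word-addressed "lower level" */

static void store(struct line *l, enum policy p, unsigned addr, unsigned value)
{
    if (!l->valid || l->tag != addr) {       /* write miss: fetch the block   */
        if (p == WRITE_BACK && l->valid && l->dirty)
            memory[l->tag] = l->data;        /* write the dirty victim back   */
        l->valid = 1; l->dirty = 0;
        l->tag = addr; l->data = memory[addr];
    }
    l->data = value;                         /* update the cached copy        */
    if (p == WRITE_THROUGH)
        memory[addr] = value;                /* and the lower level, always   */
    else
        l->dirty = 1;                        /* defer until replacement       */
}

int main(void)
{
    struct line l; memset(&l, 0, sizeof l);
    store(&l, WRITE_BACK, 7, 42);            /* repeated writes to one block  */
    store(&l, WRITE_BACK, 7, 43);            /* ... no memory traffic yet     */
    printf("memory[7]=%u, cached=%u, dirty=%d\n", memory[7], l.data, l.dirty);
    return 0;
}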
19/28
Performance Consideration
Performance
How fast machine instructions can be brought into the processor and how fast they can be
executed.
• Two key factors are performance and cost, i.e., price/performance ratio.
• For a hierarchical memory system with cache, the processor is able to access
instructions and data more quickly when the data wanted are in the cache.
• Therefore, the impact of a cache on performance is dependent on the hit and miss
rates.
20/28
Cache Hit Rate and Miss Penalty
• High hit rates over 0.9 are essential for high-performance computers.
• A penalty is incurred because extra time is needed to bring a block of data from a
slower unit to a faster one in the hierarchy.
• During that time, the processor is stalled.
• The waiting time depends on the details of the cache operation.
Miss Penalty
Total access time seen by the processor when a miss occurs.
21/28
Miss Penalty
Example: Consider a computer with the following parameters:
Access times to the cache and the main memory are t and 10t respectively. When a cache
miss occurs, a block of 8 words will be transferred from the MM to the cache. It takes 10t
to transfer the first word of the block and the remaining 7 words are transferred at a rate
of one word per t seconds.
• Miss penalty = t + 10t + 7 × t + t = 19t
• First t: Initial cache access that results in a miss.
• Last t: Move data from the cache to the processor.
22/28
Average Memory Access Time
h × C + (1 − h) × M
• h: hit rate
• M: miss penalty
• C: cache access time
• High cache hit rates (> 90%) are essential
• Miss penalty must also be reduced
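For instance, a small C sketch (with illustrative numbers) evaluating the formula, using the 19t miss penalty of the previous example as M and C = t:

#include <stdio.h>

/* Sketch: average memory access time = h*C + (1-h)*M */
static double avg_access_time(double h, double C, double M)
{
    return h * C + (1.0 - h) * M;
}

int main(void)
{
    double t = 1.0;                            /* cache access time             */
    double M = 19.0 * t;                       /* miss penalty from the example */
    printf("h=0.95: %.2ft\n", avg_access_time(0.95, t, M));  /* 1.90t */
    printf("h=0.99: %.2ft\n", avg_access_time(0.99, t, M));  /* 1.18t */
    return 0;
}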
23/28
Question: Memory Access Time Example
• Assume 8 cycles to read a single memory word;
• 15 cycles to load an 8-word block from main memory (cf. the previous example);
• cache access time = 1 cycle
• For every 100 instructions, statistically 30 instructions perform a data read/write
• Instruction fetch: 100 memory accesses; assume hit rate = 0.95
• Data read/write: 30 memory accesses; assume hit rate = 0.90
Calculate: (1) Execution cycles without cache; (2) Execution cycles with cache.
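One possible way to set up the calculation is sketched in C below, under explicit assumptions: only memory-access cycles are counted, a hit costs 1 cycle, and a miss costs the 1-cycle cache probe plus the 15-cycle block load. This is not the official solution; adjust the assumptions to the convention used in class.

#include <stdio.h>

/* Sketch of the calculation for 100 instructions, under the stated assumptions. */
int main(void)
{
    int ifetch = 100, dacc = 30;             /* accesses per 100 instructions */
    double h_i = 0.95, h_d = 0.90;           /* hit rates                     */
    double miss_cost = 1 + 15;               /* assumed miss cost in cycles   */

    double no_cache   = (ifetch + dacc) * 8.0;
    double with_cache = ifetch * (h_i * 1 + (1 - h_i) * miss_cost)
                      + dacc   * (h_d * 1 + (1 - h_d) * miss_cost);

    printf("without cache: %.0f cycles\n", no_cache);
    printf("with cache   : %.0f cycles\n", with_cache);
    return 0;
}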
24/28
Caches on Processor Chips
• In high-performance processors, two levels of caches are normally used, L1 and L2.
• L1 must be very fast, as it determines the memory access time seen by the processor.
• L2 cache can be slower, but it should be much larger than the L1 cache to ensure a
high hit rate. Its speed is less critical because it only affects the miss penalty of the L1
cache.
• Average access time on such a system:
h1 · C1 + (1 − h1 ) · [h2 · C2 + (1 − h2 ) · M]
• h1 (h2 ): the L1 (L2) hit rate
• C1 : the access time of the L1 cache
• C2 : the miss penalty to transfer data from the L2 cache to L1
• M: the miss penalty to transfer data from MM to L2 and then to L1.
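A one-line C sketch of the two-level formula, with illustrative (assumed) numbers:

#include <stdio.h>

/* Sketch: two-level average access time
 * t_avg = h1*C1 + (1-h1)*(h2*C2 + (1-h2)*M)            */
int main(void)
{
    double h1 = 0.95, h2 = 0.90;     /* assumed L1 / L2 hit rates      */
    double C1 = 1, C2 = 10, M = 100; /* assumed cycles: L1, L2, memory */
    double t_avg = h1 * C1 + (1 - h1) * (h2 * C2 + (1 - h2) * M);
    printf("average access time: %.2f cycles\n", t_avg);   /* 1.90 */
    return 0;
}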
25/28
Larger Block Size
• Take advantage of spatial locality.
• Pro: if all items in a larger block are needed in a computation, it is better to load these
items into the cache in a single miss.
• Con: larger blocks are effective only up to a certain size, beyond which too many items
will remain unused before the block is replaced.
• Con: larger blocks take a longer time to transfer and thus increase the miss penalty.
• Block sizes of 16 to 128 bytes are most popular.
26/28
Miss Rate v.s. Block Size v.s. Cache Size
Miss rate goes up if the block size becomes a significant fraction of the cache size,
because the number of blocks that can be held in a cache of the same size becomes smaller
(increasing capacity misses).
27/28
Enhancement
Write buffer:
• Read requests are served first.
• Write requests are stored in the write buffer first and sent to memory whenever there is no
pending read request.
• The address of a read request should be compared with the addresses in the write buffer.
Prefetch:
• Prefetch data into the cache before they are needed, while the processor is busy
executing instructions that do not result in a read miss.
• Prefetch instructions can be inserted by the programmer or the compiler.
Load-through Approach
• Instead of waiting for the whole block to be transferred, the processor resumes execution
as soon as the required word is loaded into the cache.
28/28