SOLUTION QUESTION 1 [40]
a) DIRECT MAPPED CACHE
i) Memory mapping
Main memory size is 1 KB = 2^10 bytes, hence the address is 10 bits.
Offset: one cache line holds one 32-bit word = 4 bytes, so the offset requires 2 bits. Note that memory is byte addressable.
Index: there are 8 cache lines, so the index requires 3 bits.
Tag: 10 - (3 + 2) = 10 - 5 = 5 bits.
Tag  Index  Offset
5    3      2
[5]
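The field widths above can be checked with a short script. This is only a minimal sketch: the constant names and the helper `split_address` are illustrative assumptions, not part of the question.

```python
# Field widths for the direct-mapped cache in part a) i).
MEMORY_BYTES = 1024   # 1 KB main memory
LINE_BYTES = 4        # one 32-bit word per cache line
NUM_LINES = 8         # number of cache lines

ADDRESS_BITS = (MEMORY_BYTES - 1).bit_length()
OFFSET_BITS = (LINE_BYTES - 1).bit_length()
INDEX_BITS = (NUM_LINES - 1).bit_length()
TAG_BITS = ADDRESS_BITS - INDEX_BITS - OFFSET_BITS


def split_address(address):
    """Return (tag, index, offset) of a byte address (hypothetical helper)."""
    offset = address & (LINE_BYTES - 1)
    index = (address >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


print("tag/index/offset bits:", TAG_BITS, INDEX_BITS, OFFSET_BITS)
print("fields of $054:", split_address(0x054))
```

For address $054 the helper returns tag 2, index 5 and offset 0, i.e. the binary fields 00010, 101 and 00.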
ii) Cache contents
Direct Mapping
Original structure of main memory
Size: 2^10 = 1024 bytes.
Arranged in block or word structure: 2^10 / 4 = 2^8 = 256 blocks or lines of 4 bytes.
Address: memory is byte addressable; block start addresses run from $000 to $3FC (last byte at $3FF).

Address  Contents                                      Block
$000     Byte 0     Byte 1     Byte 2     Byte 3       Block 0
$004     Byte 4     Byte 5     Byte 6     Byte 7       Block 1
$008     Byte 8     Byte 9     .          .            Block 2
.        .          .          .          .            .
$3FC     Byte 1020  Byte 1021  Byte 1022  Byte 1023    Block 255
M = miss, H = hit, LB = load block, R = replace block.
Hex  Binary        Tag    Index  Offset  Block (tag+index, decimal)  Access  Action
54   00 0101 0100  00010  101    00      21                          M       LB 21
58   00 0101 1000  00010  110    00      22                          M       LB 22
104  01 0000 0100  01000  001    00      65                          M       LB 65
5C   00 0101 1100  00010  111    00      23                          M       LB 23
108  01 0000 1000  01000  010    00      66                          M       LB 66
60   00 0110 0000  00011  000    00      24                          M       LB 24
F0   00 1111 0000  00111  100    00      60                          M       LB 60
64   00 0110 0100  00011  001    00      25                          M       LB 25, R 65
54   00 0101 0100  00010  101    00      21                          H
58   00 0101 1000  00010  110    00      22                          H
10C  01 0000 1100  01000  011    00      67                          M       LB 67
5C   00 0101 1100  00010  111    00      23                          H
110  01 0001 0000  01000  100    00      68                          M       LB 68, R 60
60   00 0110 0000  00011  000    00      24                          H
F0   00 1111 0000  00111  100    00      60                          M       LB 60, R 68
64   00 0110 0100  00011  001    00      25                          H
[10]
Cache contents
Line index  Status (D V)  Address tag  Block/word
0           ?  ?          00011        Block 24
1           ?  ?          00011        Block 25
2           ?  ?          01000        Block 66
3           ?  ?          01000        Block 67
4           ?  ?          00111        Block 60
5           ?  ?          00010        Block 21
6           ?  ?          00010        Block 22
7           ?  ?          00010        Block 23
[3]
[2]
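The direct-mapped access trace can be replayed in a few lines of code. The following is a minimal sketch, assuming the 16 addresses are accessed in the order listed in the trace and the cache starts empty; the function name `simulate_direct_mapped` is hypothetical.

```python
# Direct-mapped cache: 8 lines of 4 bytes, 10-bit byte addresses.
ACCESSES = [0x054, 0x058, 0x104, 0x05C, 0x108, 0x060, 0x0F0, 0x064,
            0x054, 0x058, 0x10C, 0x05C, 0x110, 0x060, 0x0F0, 0x064]
NUM_LINES = 8
LINE_BYTES = 4


def simulate_direct_mapped(accesses):
    """Return ('H'/'M' per access, final memory block held by each line)."""
    lines = [None] * NUM_LINES           # block number stored in each line
    results = []
    for address in accesses:
        block = address // LINE_BYTES    # memory block number (tag + index)
        index = block % NUM_LINES        # line selected by the index bits
        if lines[index] == block:
            results.append("H")
        else:
            results.append("M")
            lines[index] = block         # load block, replacing any old one
    return results, lines


results, lines = simulate_direct_mapped(ACCESSES)
print("".join(results))
print(lines)
print(results.count("H"), "hits,", results.count("M"), "misses")
```

Under these assumptions the sketch gives 5 hits and 11 misses, and the final line contents are blocks 24, 25, 66, 67, 60, 21, 22, 23 for lines 0 to 7.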
b) Consider a 2-way set associative cache.
Main memory size is 1 KB = 2^10 bytes, hence the address is 10 bits.
Offset: one cache line holds one 32-bit word = 4 bytes, so the offset requires 2 bits. Note that memory is byte addressable.
Index: the 8 cache lines are grouped 2 per set, giving 4 sets, so the index requires 2 bits.
Tag: 10 - (2 + 2) = 10 - 4 = 6 bits.
Tag  Index  Offset
6    2      2
[5]
Refer to this class slide
2-way set associative – 2 blocks in a set.
Set number of a cache line = cache line number / blocks per set (integer division).
Number of sets = cache blocks / blocks per set
= 8/2 = 4
Set selection = memory block number MOD number of sets in cache
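As a quick check of the set selection rule, the sketch below maps a few byte addresses to their memory block and set. The helper name `block_and_set` is an illustrative assumption.

```python
# Set selection for the 2-way set-associative cache.
NUM_CACHE_BLOCKS = 8
BLOCKS_PER_SET = 2
LINE_BYTES = 4
NUM_SETS = NUM_CACHE_BLOCKS // BLOCKS_PER_SET


def block_and_set(address):
    """Return (memory block number, set number) for a byte address."""
    block = address // LINE_BYTES
    return block, block % NUM_SETS


for address in (0x054, 0x104, 0x0F0):
    print(hex(address), block_and_set(address))
```

Addresses $054 and $104 (blocks 21 and 65) both select set 1, while $0F0 (block 60) selects set 0.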
M = miss, H = hit, LB = load block, RB = replace block.
Hex  Binary        Tag     Index  Offset  Block (tag+index, decimal)  Set  Access  Action
54   00 0101 0100  000101  01     00      21                          1    M       LB 21
58   00 0101 1000  000101  10     00      22                          2    M       LB 22
104  01 0000 0100  010000  01     00      65                          1    M       LB 65 [SET 1 FULL]
5C   00 0101 1100  000101  11     00      23                          3    M       LB 23
108  01 0000 1000  010000  10     00      66                          2    M       LB 66 [SET 2 FULL]
60   00 0110 0000  000110  00     00      24                          0    M       LB 24
F0   00 1111 0000  001111  00     00      60                          0    M       LB 60 [SET 0 FULL]
64   00 0110 0100  000110  01     00      25                          1    M       LB 25, RB 21 (LRU)
54   00 0101 0100  000101  01     00      21                          1    M       LB 21, RB 65 (LRU)
58   00 0101 1000  000101  10     00      22                          2    H
10C  01 0000 1100  010000  11     00      67                          3    M       LB 67 [SET 3 FULL]
5C   00 0101 1100  000101  11     00      23                          3    H
110  01 0001 0000  010001  00     00      68                          0    M       LB 68, RB 24 (LRU)
60   00 0110 0000  000110  00     00      24                          0    M       LB 24, RB 60 (LRU)
F0   00 1111 0000  001111  00     00      60                          0    M       LB 60, RB 68 (LRU)
64   00 0110 0100  000110  01     00      25                          1    H
[10]
WAY 1                                                WAY 2
Line index  Status (D V)  Address tag  Block/word    Line index  Status (D V)  Address tag  Block/word
0           ?  ?          001111       Block 60      0           ?  ?          000110       Block 24
1           ?  ?          000110       Block 25      1           ?  ?          000101       Block 21
2           ?  ?          000101       Block 22      2           ?  ?          010000       Block 66
3           ?  ?          000101       Block 23      3           ?  ?          010000       Block 67
Note: the cache can also be arranged as in the slide. [3]
Cache line number  Set number  Status (D V)  Address tag  Block/word
0                  0           ?  ?          001111       Block 60
1                  0           ?  ?          000110       Block 24
2                  1           ?  ?          000110       Block 25
3                  1           ?  ?          000101       Block 21
4                  2           ?  ?          000101       Block 22
5                  2           ?  ?          010000       Block 66
6                  3           ?  ?          000101       Block 23
7                  3           ?  ?          010000       Block 67
[2]
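The 2-way trace with LRU replacement can be reproduced with a small simulation. This is a minimal sketch, assuming an initially empty cache and the same 16 accesses in the listed order; `simulate_two_way_lru` is a hypothetical name, and the sketch tracks which blocks are in each set rather than which way holds them.

```python
from collections import OrderedDict

ACCESSES = [0x054, 0x058, 0x104, 0x05C, 0x108, 0x060, 0x0F0, 0x064,
            0x054, 0x058, 0x10C, 0x05C, 0x110, 0x060, 0x0F0, 0x064]
NUM_SETS = 4
WAYS = 2
LINE_BYTES = 4


def simulate_two_way_lru(accesses):
    """Return ('H'/'M' per access, sorted block numbers left in each set)."""
    # One OrderedDict per set; keys are block numbers, least recent first.
    sets = [OrderedDict() for _ in range(NUM_SETS)]
    results = []
    for address in accesses:
        block = address // LINE_BYTES
        cache_set = sets[block % NUM_SETS]
        if block in cache_set:
            results.append("H")
            cache_set.move_to_end(block)        # now most recently used
        else:
            results.append("M")
            if len(cache_set) == WAYS:
                cache_set.popitem(last=False)   # evict least recently used
            cache_set[block] = True
    return results, [sorted(s) for s in sets]


results, final_sets = simulate_two_way_lru(ACCESSES)
print("".join(results))
print(final_sets)
print(results.count("H"), "hits,", results.count("M"), "misses")
```

Under these assumptions the sketch gives 3 hits and 13 misses, with sets 0 to 3 ending up holding blocks {24, 60}, {21, 25}, {22, 66} and {23, 67}.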
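Conclusion: for this access sequence the direct-mapped cache gives 5 hits out of 16 accesses and the 2-way set-associative cache with LRU gives 3 hits out of 16, so direct mapping performs better on this particular trace, even though set-associative caches usually reduce conflict misses.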
SOLUTION QUESTION 2 [10]
a) Bandwidth = number of bytes transferred per unit time.
i) Block is 32 bytes [2]
Access time: 1 clock at 200 MHz
Number of caches: 2
200 MHz * 2 caches * 32 bytes / 1 clock = 12.8 * 10^9 bytes/sec = 11.92 GB/sec (taking 1 GB = 2^30 bytes)
ii) Block is 32 bytes [2]
Access time: 3 clocks at 200 MHz
200 MHz * 1 cache * 32 bytes / 3 clocks = 2.13 * 10^9 bytes/sec = 1.99 GB/sec (taking 1 GB = 2^30 bytes)
iii) Block is 32 bytes [2]
Access time: 24 clocks at 200 MHz
200 MHz * 32 bytes / 24 clocks = 266.7 * 10^6 bytes/sec = 254 MB/sec (taking 1 MB = 2^20 bytes)
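The three bandwidth figures follow from one formula, sketched below. The function name `bandwidth_bytes_per_sec` is hypothetical, and the conversion assumes binary units (1 GB = 2^30 bytes, 1 MB = 2^20 bytes), which is what reproduces the 11.92 and 254 figures.

```python
CLOCK_HZ = 200_000_000   # 200 MHz
BLOCK_BYTES = 32
GB = 2 ** 30             # binary gigabyte (assumption)
MB = 2 ** 20             # binary megabyte (assumption)


def bandwidth_bytes_per_sec(caches, clocks_per_access):
    """Bytes per second when each cache moves one block per access."""
    return CLOCK_HZ * caches * BLOCK_BYTES / clocks_per_access


bw_i = bandwidth_bytes_per_sec(caches=2, clocks_per_access=1)
bw_ii = bandwidth_bytes_per_sec(caches=1, clocks_per_access=3)
bw_iii = bandwidth_bytes_per_sec(caches=1, clocks_per_access=24)

print(f"i)   {bw_i / GB:.2f} GB/sec")
print(f"ii)  {bw_ii / GB:.2f} GB/sec")
print(f"iii) {bw_iii / MB:.0f} MB/sec")
```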
b) Execution time [4]
Number of instructions = 7362210 instructions
Clocks per instruction = 1 clock
Number of clocks = 7362210 * 1 = 7362210
Number of misses = 52206
Access penalty = 3 clock cycles per miss
Total clock delay penalty = 52206 * 3 = 156618
Total clock cycles = 7362210 + 156618 = 7518828
Average CPI = 7518828 / 7362210 = 1.02 clocks
At a frequency of 200 MHz, 200,000,000 cycles take 1 sec, so one cycle takes 5 ns.
Average delay per instruction = 1.02 * 5 ns = 5.1 ns
Total execution time = 7518828 * 5 ns = 37.6 ms
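The arithmetic above can be verified with a short script. This is a sketch only; the variable names are illustrative, and it assumes every miss costs exactly 3 extra cycles on top of a base CPI of 1.

```python
INSTRUCTIONS = 7_362_210
BASE_CPI = 1
MISSES = 52_206
MISS_PENALTY_CYCLES = 3
CLOCK_HZ = 200_000_000

base_cycles = INSTRUCTIONS * BASE_CPI
penalty_cycles = MISSES * MISS_PENALTY_CYCLES
total_cycles = base_cycles + penalty_cycles

average_cpi = total_cycles / INSTRUCTIONS
cycle_time_ns = 1e9 / CLOCK_HZ                   # 5 ns per cycle at 200 MHz
time_per_instruction_ns = average_cpi * cycle_time_ns
execution_time_ms = total_cycles / CLOCK_HZ * 1e3

print("total cycles:", total_cycles)
print(f"average CPI: {average_cpi:.2f}")
print(f"time per instruction: {time_per_instruction_ns:.1f} ns")
print(f"execution time: {execution_time_ms:.1f} ms")
```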