Advanced Computer Architecture I
SRAM Technology
• SRAM: static RAM
• Static: bits directly connected to power/ground
• Naturally/continuously “refreshed”, never decay (contrast DRAM)
• Designed for speed
• Implements all storage arrays in real processors
• Register file, caches, branch predictor, etc.
• Everything except pipeline latches
• Latches vs. SRAM
• Latches: singleton word, always read/write same one
• SRAM: array of words, can read/write different ones
• Address indicates which one
(CMOS) Memory Components
• Interface: address in → data out
  • N-bit address bus (on an N-bit machine)
  • Data bus
  • Can have read/write on the same data bus
  • Or dedicated read and write buses
  • Can have multiple ports: address/data bus pairs
SRAM: First Cut
• 4×2 (four 2-bit words) RAM
• 2-bit address
• First cut: bits are D-latches
• Write port
  • Address decodes to one-hot enable signals
• Read port
  • Address decodes to mux selectors
  – 1024-input OR gate?
  – Physical layout of output wires
  • RAM width ∝ M
  • Wire delay ∝ wire length
[Figure: 4×2 latch-based RAM; write-addr enables select a row for write-data1/write-data0, read-addr muxes select read-data1/read-data0; stored words 00, 11, 10, 01]
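To make the decode-and-mux behavior concrete, here is a minimal behavioral sketch of the 4×2 first-cut RAM in Python (the class and method names are mine, purely illustrative):

```python
class TinyRAM:
    """Behavioral model of the 4x2 first-cut RAM."""
    def __init__(self, num_words=4, width=2):
        self.words = [0] * num_words   # each entry models one row of D-latches
        self.num_words = num_words
        self.width = width

    def write(self, addr, data):
        # The write address decodes to one-hot enables; only the
        # enabled row of latches captures write-data.
        for i in range(self.num_words):
            if i == addr:
                self.words[i] = data & ((1 << self.width) - 1)

    def read(self, addr):
        # The read address drives mux selectors: an M-to-1 mux per bit.
        return self.words[addr]

ram = TinyRAM()
ram.write(0b10, 0b10)          # store "10" in word 2
assert ram.read(0b10) == 0b10  # mux selects word 2
```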
SRAM: Second Cut
• Second cut: tri-state wired-OR
  • Read mux built from tri-states
  + Scalable, distributed “muxes”
  + Better layout of output wires
  + RAM width independent of M
• Standard RAM
  • Bits in a word connected by a wordline
  • One-hot decode of the address
  • Bits in a position connected by a bitline
  • Shared input/output wires
  • Port: one set of wordlines/bitlines
  • Grid-like design
[Figure: the 4×2 RAM with tri-state read buffers driving shared read bitlines; write-addr and read-addr each one-hot decoded; stored words 00, 11, 10, 01]
SRAM: Third Cut
• Third cut: replace the latches with …
  – Latches: 28 transistors per bit
• Cross-coupled inverters (CCI)
  + Only 4 transistors per bit
• Convention
  • Right node is bit, left node is ~bit
• Non-digital interface
  • What is the input and what is the output?
  • Where is the write enable?
• Implement ports in an “analog” way
  • Transistors, not full gates
[Figure: D-latch (IN, WE, OUT) beside a cross-coupled inverter pair with ~bit and bit nodes; the IN?/OUT? labels mark the ambiguous interface]
SRAM: Register Files and Caches
• Two different SRAM port styles
• Regfile style
• Modest size: <4KB
• Many ports: some read-only, some write-only
• Write and read both take half a cycle (write first, read second)
• Cache style
• Larger size: >8KB
• Few ports: read/write in a single port
• Write and read can both take full cycle
Regfile-Style Read Port
• Two-phase read
• Phase I: clk = 0
  • Pre-charge bitlines to 1
  • Negated bitlines are 0
• Phase II: clk = 1
  • One wordline goes high
  • All “1” bits in that row discharge their bitlines to 0
  • Negated bitlines go to 1
[Figure: two-word read port; raddr decodes to wordline0/wordline1, cells pull bitline1/bitline0 low, inverters drive rdata1/rdata0; CLK gates the pre-charge p-transistors; stored words 01 and 10]
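A behavioral sketch of the two-phase read in Python; only the pre-charge/discharge logic follows the slides, the names are mine:

```python
def regfile_read(words, raddr, width):
    """Model one regfile-style read: pre-charge, then discharge."""
    # Phase I (clk = 0): pre-charge every bitline to 1;
    # the negated bitlines (which become rdata) are 0.
    bitlines = [1] * width

    # Phase II (clk = 1): the wordline for raddr goes high, and
    # every stored "1" bit opens a path from its bitline to ground.
    for b in range(width):
        if (words[raddr] >> b) & 1:
            bitlines[b] = 0

    # rdata is the negation of the bitlines.
    return [1 - bl for bl in bitlines]

words = [0b01, 0b10]               # two 2-bit words
print(regfile_read(words, 1, 2))   # [0, 1], i.e. rdata1=1, rdata0=0
```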
Read Port In Action: Phase I
• CLK = 0
  • p-transistors conduct
  • Bitlines “pre-charge” to 1
  • rdata1 and rdata0 are 0
[Figure: the same port with CLK = 0; both bitlines at 1, rdata1 = rdata0 = 0]
Read Port In Action: Phase II
• CLK = 1
  • p-transistors close
• raddr = 1
  • wordline1 = 1
• “1” bits on wordline1 create a path from bitline to ground
  • Here, the bits of SRAM[1]
• Corresponding bitlines discharge
  • Here, bitline1
• Corresponding rdata bits go to 1
  • Here, rdata1
• That’s a read
[Figure: CLK = 1, wordline1 high; word 1 holds “10”, so bitline1 discharges and rdata1 = 1, rdata0 = 0]
Regfile-Style Write Port
• Two-phase write
• Phase I: clk = 1
  • Stabilize one wordline high
• Phase II: clk = 0
  • Open the pass-transistors
  • “Overwhelm” the bits in the selected word
• Actually, two clocks here: both phases fit in the first half-cycle
• Pass transistor: behaves like a tri-state buffer
[Figure: two-word write port; waddr decodes to wordlines, wdata1/wdata0 drive the cells through CLK-gated pass transistors]
A 2-Read Port 1-Write Port Regfile
[Figure: 2-read/1-write regfile array; one write port (RD wordline, wdata1/wdata0, CLK) and two read ports (RS1, RS2 wordlines) per SRAM cell, producing rdata11/rdata10 and rdata21/rdata20]
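A behavioral sketch of this 2R/1W regfile in Python. Per the earlier port-style slide, the write uses the first half-cycle and the reads the second, so reading the register written in the same cycle returns the new value (names like rs1/rs2/rd are mine):

```python
class Regfile2R1W:
    """Behavioral 2-read/1-write register file: write first, read second."""
    def __init__(self, num_regs=4, width=2):
        self.regs = [0] * num_regs
        self.width = width

    def cycle(self, we, rd, wdata, rs1, rs2):
        if we:                          # first half-cycle: write port
            self.regs[rd] = wdata & ((1 << self.width) - 1)
        # second half-cycle: the two read ports
        return self.regs[rs1], self.regs[rs2]

rf = Regfile2R1W()
r1, r2 = rf.cycle(we=True, rd=1, wdata=0b11, rs1=1, rs2=0)
assert r1 == 0b11   # read sees the value written earlier in the cycle
```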
Cache-Style Read/Write Port
• Double-ended bitlines
  • Connect to both sides of each bit
• Two-phase write
  • Just like a register file
• Two-phase read
  • Phase I: clk = 1
    • Equalize the bitline pair voltages
  • Phase II: clk = 0
    • One wordline goes high
    • The “1 side” bitline swings up
    • The “0 side” bitline swings down
    • A sense-amplifier translates the swing into a digital output
[Figure: read/write port with wdata/~wdata drivers gated by WE & ~CLK / WE & CLK, read enables RE & CLK / RE & ~CLK, and a sense-amplifier per bitline pair driving rdata1/rdata0]
Read/Write Port in Read Action: Phase I
• Phase I: clk = 1
  • Equalize the voltage on each bitline pair
  • To (nominally) 0.5
[Figure: RE & CLK asserted; all bit/~bit bitlines equalized at 0.5, sense-amplifiers not yet resolving]
Read/Write Port in Read Action: Phase II
• Phase II: clk = 0
  • wordline1 goes high
  • “1 side” bitlines swing high (to ~0.6)
  • “0 side” bitlines swing low (to ~0.4)
  • Sense-amps interpret the swing
[Figure: RE & ~CLK asserted, wordline1 high; the bit/~bit pairs split to 0.6/0.4 and 0.4/0.6 and the sense-amplifiers output rdata1 = 1, rdata0 = 0]
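A behavioral sketch of the differential read in Python; the 0.1 swing and the stored contents are illustrative, and the sense-amp is modeled as a bare comparator:

```python
def cache_read(words, addr, width, swing=0.1):
    """Model one cache-style read: equalize, swing, sense."""
    rdata = []
    for b in range(width):
        bit, nbit = 0.5, 0.5                  # Phase I: equalized pair
        if (words[addr] >> b) & 1:            # Phase II: small swing
            bit, nbit = bit + swing, nbit - swing
        else:
            bit, nbit = bit - swing, nbit + swing
        rdata.append(1 if bit > nbit else 0)  # sense-amp: compare the sides
    return rdata

words = [0b11, 0b10]             # illustrative contents
print(cache_read(words, 1, 2))   # [0, 1], i.e. rdata1=1, rdata0=0
```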
Cache-Style SRAM Latency
• Assume
  • M words of N bits each
  • Some minimum wire spacing L
  • CCIs occupy no space
• 4 major latency components, incurred in series
  • Decoder: ∝ log2(M)
  • Wordlines: ∝ 2NL (each crosses 2N bitlines)
  • Bitlines: ∝ ML (each crosses M wordlines)
  • Muxes + sense-amps: constant
• 32KB SRAM: the wire components (wordlines, bitlines) contribute about equally
• Latency: ∝ (2N + M)L
• To maximize storage for a given maximum latency, make SRAMs as square as possible: minimize 2N + M
• Latency: ∝ √(#total bits)
[Figure: M×N cell array with the decoder on the left and sense-amps at the bottom]
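A toy latency model in Python built from the slide’s proportionalities; the constants are invented, so only the scaling trends are meaningful:

```python
from math import log2

def sram_latency(M, N, L=1.0):
    """Latency proxy: decoder + wordline + bitline + sense-amp."""
    decoder   = log2(M)      # ∝ log2(M)
    wordline  = 2 * N * L    # crosses 2N bitlines
    bitline   = M * L        # crosses M wordlines
    sense_amp = 1.0          # roughly constant
    return decoder + wordline + bitline + sense_amp

# For a fixed capacity M*N, a near-square array minimizes 2N + M:
bits = 32 * 1024 * 8         # 32KB
for N in (32, 128, 512, 2048):
    M = bits // N
    print(f"N={N:5d} M={M:6d} latency ~ {sram_latency(M, N):8.1f}")
# Best near 2N == M, consistent with latency ∝ sqrt(#total bits).
```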
Multi-Ported Cache-Style SRAM Latency
• The previous calculation had a hidden constant
  • The number of ports, P
• Recalculate the latency components
  • Decoder: ∝ log2(M) (unchanged)
  • Wordlines: ∝ 2NLP (each crosses 2NP bitlines)
  • Bitlines: ∝ MLP (each crosses MP wordlines)
  • Muxes + sense-amps: constant (unchanged)
• Latency: ∝ (2N + M)LP
• Latency: ∝ √(#bits) × #ports
• How does latency scale? Linearly, with P
• How does power scale? With P²: both wire length and the number of active wires increase
[Figure: each cell now crossed by two sets of wordlines/bitlines, with sense-amps per port]
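Extending the same toy model with the port count P (again, invented constants):

```python
from math import log2

def sram_latency_ports(M, N, P, L=1.0):
    """Wordline and bitline terms each stretch by P."""
    return log2(M) + (2 * N * L * P) + (M * L * P) + 1.0

for P in (1, 2, 4):
    print(f"P={P}: latency ~ {sram_latency_ports(512, 512, P):7.1f}")
# Latency grows ~linearly in P; energy grows ~P^2, since the wires are
# P times longer AND P times as many of them switch.
```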
Multi-Porting an SRAM
• Why multi-porting?
• Multiple accesses per cycle
• True multi-porting (physically adding a port) not good
+ Any combination of accesses will work
– Increases access latency and energy ∝ P, area ∝ P²
• Another option: pipelining
• Timeshare single port on clock edges (wave pipelining: no latches)
+ Negligible area, latency, energy increase
– Not scalable beyond 2 ports
• Yet another option: replication
• Don’t laugh: used for register files, even caches (Alpha 21164)
• Smaller and faster than true multi-porting: two P-ported copies cost 2·P² area, vs. (2P)² for one 2P-ported array
+ Adds read bandwidth, any combination of reads will work
– Doesn’t add write bandwidth, not really scalable beyond 2 ports
Banking an SRAM
• Divide the SRAM into banks, interleave the addresses
• Allow parallel access to different banks
• Two accesses to the same bank? Bank conflict: one waits
[Figure: words 1020–1023 spread across four banks]
• Low area/latency overhead for routing requests to banks
• Few bank conflicts given a sufficient number of banks
  • Rule of thumb: N simultaneous accesses → 2N banks
• How to divide words among banks?
  • Round-robin, using the address LSBs (least significant bits)
  • Example: 16-word RAM divided into 4 banks
    • b0: 0,4,8,12; b1: 1,5,9,13; b2: 2,6,10,14; b3: 3,7,11,15
• Why? Spatial locality
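A sketch of LSB bank interleaving and conflict detection in Python, matching the 16-word/4-bank example (function names are mine):

```python
NUM_BANKS = 4   # power of two, so bank select is a cheap bit slice

def bank_of(addr):
    return addr & (NUM_BANKS - 1)   # low 2 bits pick the bank

def index_in_bank(addr):
    return addr >> 2                # remaining bits index within the bank

# Round-robin assignment: bank 1 holds 1, 5, 9, 13
assert [a for a in range(16) if bank_of(a) == 1] == [1, 5, 9, 13]

def has_conflict(addrs):
    # Same-cycle accesses to the same bank conflict; one must wait.
    banks = [bank_of(a) for a in addrs]
    return len(banks) != len(set(banks))

print(has_conflict([3, 7]))   # True: both map to bank 3
print(has_conflict([3, 4]))   # False: banks 3 and 0
```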
Full-Associativity with CAMs
• CAM: content-addressable memory
  • Array of words with built-in comparators
  • Matchlines instead of bitlines
  • Output is a one-hot encoding of the match
• Fully-associative (FA) cache?
  • Tags in a CAM
  • Data in an ordinary RAM
• Hardware is not software
  • There is no such thing as a software CAM
[Figure: 1024-entry tag CAM ([31:2]) whose one-hot match output selects the corresponding data RAM row; any match raises the cache-hit signal]
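A behavioral sketch of a CAM-backed fully-associative lookup in Python: every stored tag is compared in parallel, yielding a one-hot match vector that selects a row of an ordinary data RAM (tags and data below are invented):

```python
def cam_lookup(tags, key):
    """One comparator per word; output is a one-hot match vector."""
    return [1 if t == key else 0 for t in tags]

tags = [0x12, 0x34, 0x56, 0x78]   # the CAM contents
data = ["A", "B", "C", "D"]       # the companion RAM

match = cam_lookup(tags, 0x56)    # all comparisons happen "at once"
hit = any(match)                  # fast OR for hit detection
value = data[match.index(1)] if hit else None
print(hit, value)                 # True C
```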
CAM Circuit
• CAM: a reverse RAM
  • Bitlines are inputs
    • Called matchlines
  • Wordlines are outputs
• Two-phase match
  • Phase I: clk = 0
    • Pre-charge the wordlines
  • Phase II: clk = 1
    • Enable the matchlines
    • Non-matching bits discharge their wordlines
[Figure: two-word CAM array driven by match1/~match1 and match0/~match0, with CLK-gated pre-charge on the wordlines; stored words 01 and 10]
CAM Circuit In Action: Phase I
• Phase I: clk = 0
  • Pre-charge the wordlines
[Figure: search key “01” on the matchlines (match1/~match1/match0/~match0 = 0/1/1/0); both wordlines pre-charged high]
CAM Circuit In Action: Phase II
• Phase II: clk = 1
  • Enable the matchlines
  • Note: the bits are flipped (each cell compares against the complemented matchline)
  • A non-matching bit discharges its wordline
    • ANDs the matches
    • NORs the non-matches
  • A similar technique gives a fast OR for hit detection
[Figure: with key “01”, word 0 (“01”) keeps its wordline at 1 (match) while word 1 (“10”) has its wordline discharged to 0]
CAM Upshot
• CAMs: effective but expensive
– Matchlines are very expensive (for nasty EE reasons)
• Used, but only for 16- or 32-way (max) associativity
• Not for 1024-way associativity
Bonus
Multi-Ported Cache-Style SRAM Power
• Same four components for power
  • Decoder: ∝ log2(M)
  • Wordlines: ∝ 2NLP
    – Huge capacitance (C) per wordline (drives 2N gates)
    + But only one is ever high at a time (overall consumption low)
  • Bitlines: ∝ MLP
    – C lower than the wordlines’, but still large
    + Vswing << VDD (power ∝ C · Vswing² · f)
  • Muxes + sense-amps: constant
• 32KB SRAM: sense-amps are 60–70% of the power
• How does power scale?
[Figure: the multi-ported array again, with the sense-amps highlighted]
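A quick illustration of why the small bitline swing saves power, using dynamic power ≈ C · Vswing² · f (all numbers invented):

```python
def dynamic_power(C, vswing, f):
    """Dynamic switching power of one wire."""
    return C * vswing**2 * f

C, f, VDD = 1e-12, 2e9, 1.0                 # 1 pF, 2 GHz, normalized VDD
full  = dynamic_power(C, VDD, f)            # full-swing wire
small = dynamic_power(C, 0.1 * VDD, f)      # bitline with 10% swing
print(f"full: {full:.1e} W  reduced: {small:.1e} W  "
      f"ratio: {full / small:.0f}x")        # quadratic in the swing: 100x
```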
A Banked Cache
• Banking a cache
• Simple: bank SRAMs
• Which address bits determine the bank? The LSBs of the index
• Bank network assigns accesses to banks, resolves conflicts
– Adds some latency too
[Figure: two-banked two-way cache; each bank has its own tag compare (tag [31:12], index [11:3], bank bit, offset [1:0]) and way mux, producing address0/data0/hit0? and address1/data1/hit1?]
SRAM Summary
• Large storage arrays are not implemented “digitally”
• SRAM implementation exploits analog transistor properties
• Inverter pair bits much smaller than latch/flip-flop bits
• Wordline/bitline arrangement gives simple “grid-like” routing
• Basic understanding of read, write, read/write ports
• Wordlines select words
• Overwhelm inverter-pair to write
• Drain pre-charged line or swing voltage to read
• Latency proportional to √(#bits) × #ports
Aside: Physical Cache Layout I
• Logical layout
  • Data and tags mixed together
• Physical layout
  • Data and tags in separate RAMs
[Figure: two-way cache drawn with separate tag and data RAMs (rows 0–511 and 512–1023); tag [31:11] and index [10:2] drive the comparators and way mux that produce hit?, address, data]
Physical Cache Layout II
• Logical layout
  • Data array is monolithic
• Physical layout
  • Each data “way” in a separate array
[Figure: the same cache with the data array split into one array per way]
Physical Cache Layout III
• Logical layout
  • Data blocks are contiguous
• Physical layout
  • Contiguous only if the full block is needed on a read
  • E.g., I$ (reads consecutive words)
  • E.g., L2 (reads a whole block to fill the D$ or I$)
• For the D$ (access size is one word) …
  • Words in the same data block are bit-interleaved
  • word0.bit0 adjacent to word1.bit0, and so on (see the sketch below)
  + Builds word-selection logic into the array
  + Avoids duplicating sense-amps/muxes
[Figure: a block’s columns interleaved as word3/word2/word1/word0; address split [31:11], [10:2], [1:0]]
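A toy model of the bit-interleaved layout in Python: physical column b·W + w holds bit b of word w, so selecting a word is just picking every W-th column (sizes are illustrative):

```python
W, BITS = 4, 8   # words per block, bits per word

def interleave(block):
    """Physical column order: [w0b0, w1b0, w2b0, w3b0, w0b1, ...]."""
    cols = []
    for b in range(BITS):
        for w in range(W):
            cols.append((block[w] >> b) & 1)
    return cols

def select_word(cols, w):
    """Word select = every W-th column starting at offset w."""
    bits = cols[w::W]
    return sum(bit << i for i, bit in enumerate(bits))

block = [0x11, 0x22, 0x33, 0x44]
cols = interleave(block)
assert select_word(cols, 2) == 0x33   # the shared muxes pick word 2
```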
Physical Cache Layout IV
• Logical layout
  • Arrays are vertically contiguous
• Physical layout
  • Vertical partitioning to minimize wire lengths
  • H-tree: horizontal/vertical partitioning layout
    • Applied recursively
    • Each node looks like an H
[Figure: arrays split vertically (rows 0–255 and 256–511 per way) and joined by an H-tree to the address/data ports]
Physical Cache Layout
• Arrays and H-trees make caches easy to spot in die micrographs
Full-Associativity
[Figure: all 1024 tags ([31:2]) compared in parallel against the address; every comparator output feeds the data read]
• How to implement full (or at least high) associativity?
• 1K tag matches? Unavoidable, but at least tags are small
• 1K data reads? Terribly inefficient