DRAM Memory
System: Lecture 2
Spring 2003
Bruce Jacob
David Wang
University of
Maryland
DRAM Circuit and
Architecture Basics
Overview
Terminology
Access Protocol
Architecture
[Figure: DRAM cell. The word line controls a switching element that connects the storage element (capacitor) to the bit line.]
DRAM Circuit Basics
DRAM Cell
[Figure: DRAM organization. A memory array of cells (switching element plus storage capacitor) sits at the intersections of word lines and bit lines; the row decoder drives the word lines, and the bit lines feed the sense amps, column decoder, and data in/out buffers.]
Row, Bitlines and Wordlines
DRAM Circuit Basics
Row Defined
Bit Lines
Word Line
Row of DRAM
Row Size: 8 Kb @ 256 Mb SDRAM node
4 Kb @ 256 Mb RDRAM node
DRAM Circuit Basics
Sense Amplifier I
[Figure: 6 rows shown, with the Sense and Amplify circuit]
DRAM Circuit Basics
Sense Amplifier II : Precharged
[Figure: bit lines precharged to Vcc/2, midway between Vcc (logic 1) and Gnd (logic 0), with the Sense and Amplify circuit]
DRAM Circuit Basics
Sense Amplifier III : Destructive Read
[Figure: wordline driven; the bit line moves away from Vcc/2 toward Vcc (logic 1) or Gnd (logic 0) and is resolved by the Sense and Amplify circuit]
DRAM Access Protocol
ROW ACCESS
AKA: OPEN a DRAM Page/Row
or ACT (Activate a DRAM Page/Row)
or RAS (Row Address Strobe)
[Figure: CPU, memory bus controller, and DRAM (row decoder, word lines, bit lines, memory array, sense amps, column decoder, data in/out buffers)]
Once the data is valid on ALL of the bit lines, you can select a
subset of the bits and send them to the output buffers ... CAS
picks one of the bits.
Big point: cannot do another RAS or precharge of the lines
until finished reading the column data ... can't change the values
on the bit lines or the output of the sense amps until it has been
read by the memory controller.
DRAM Circuit Basics
Column Defined
Column: Smallest addressable quantity of DRAM on chip
SDRAM*: column size == chip data bus width (4, 8, 16, 32)
RDRAM: column size != chip data bus width (128 bit fixed)
SDRAM*: get n columns per access. n = (1, 2, 4, 8)
RDRAM: get 1 column per access.
4 bit wide columns
#0 #1 #2 #3 #4 #5
One Row of DRAM
* SDRAM means SDRAM and variants, e.g. DDR SDRAM
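The column definitions above can be sketched as simple arithmetic. This is a minimal illustration, not part of the original slides: the function names are hypothetical, and "8 Kb" is assumed to mean 8192 bits.

```python
def bits_per_access(column_width_bits: int, columns_per_access: int) -> int:
    """Bits delivered by one column access (column width times columns per access)."""
    if column_width_bits <= 0 or columns_per_access <= 0:
        raise ValueError("widths and counts must be positive")
    return column_width_bits * columns_per_access


def columns_per_row(row_size_bits: int, column_width_bits: int) -> int:
    """Number of addressable columns in one row."""
    if row_size_bits % column_width_bits != 0:
        raise ValueError("row size must be a multiple of the column width")
    return row_size_bits // column_width_bits


# SDRAM-style device: 16-bit columns, n = 8 columns per access.
sdram_bits = bits_per_access(16, 8)
# RDRAM-style device: one fixed 128-bit column per access.
rdram_bits = bits_per_access(128, 1)
# The 4-bit-wide columns drawn on the slide, in an 8 Kb row (assumed 8192 bits).
cols = columns_per_row(8192, 4)
print(sdram_bits, rdram_bits, cols)
```

Under these assumptions a x16 SDRAM with n = 8 and an RDRAM with its single 128-bit column both move 128 bits per access.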
DRAM Access Protocol
COLUMN ACCESS I
READ Command
or CAS: Column Address Strobe
[Figure: CPU, memory bus controller, and DRAM (row decoder, word lines, bit lines, memory array, sense amps, column decoder, data in/out buffers)]
DRAM Access Protocol
Column Access II
DRAM
Then the data is valid on the data bus ... depending on what you
are using for in/out buffers, you might be able to overlap a little
or a lot of the data transfer with the next CAS to the same page
(this is PAGE MODE).
Data Out
... with optional additional CAS: Column Address Strobe
[Figure: CPU, memory bus controller, and DRAM (row decoder, word lines, bit lines, memory array, sense amps, column decoder, data in/out buffers)]
Note: page mode enables overlap with CAS
DRAM Speed Part I
How fast can I move data from DRAM cell to
sense amp?
tRCD
RCD (Row Command Delay)
[Figure: CPU, memory bus controller, and DRAM (row decoder, word lines, bit lines, memory array, sense amps, column decoder, data in/out buffers)]
DRAM Speed Part II
How fast can I get data out of sense amps
back into memory controller?
tCAS aka
tCASL aka
tCL
CAS: Column Address Strobe
CASL: Column Address Strobe Latency
CL: Column Address Strobe Latency
[Figure: CPU, memory bus controller, and DRAM (row decoder, word lines, bit lines, memory array, sense amps, column decoder, data in/out buffers)]
DRAM Speed Part III
How fast can I move data from DRAM cell into
memory controller?
tRAC = tRCD + tCAS
RAC (Random Access Delay)
[Figure: CPU, memory bus controller, and DRAM (row decoder, word lines, bit lines, memory array, sense amps, column decoder, data in/out buffers)]
DRAM Speed Part IV
How fast can I precharge DRAM array so I can
engage another RAS?
tRP
RP (Row Precharge Delay)
[Figure: CPU, memory bus controller, and DRAM (row decoder, word lines, bit lines, memory array, sense amps, column decoder, data in/out buffers)]
DRAM Speed Part V
How fast can I read from different rows?
tRC = tRAS + tRP
RC (Row Cycle Time)
[Figure: CPU, memory bus controller, and DRAM (row decoder, word lines, bit lines, memory array, sense amps, column decoder, data in/out buffers)]
DRAM Speed Summary I
What do I care about?
tRCD
tCAS
Seen in ads.
Easy to explain
Easy to sell
tRP
tRC = tRAS + tRP
tRAC = tRCD + tCAS
Embedded systems designers
DRAM manufacturers
Computer Architect:
Latency-bound code,
e.g. linked list traversal
RAS: Row Address Strobe
CAS: Column Address Strobe
RCD: Row Command Delay
RAC: Random Access Delay
RP: Row Precharge Delay
RC: Row Cycle Time
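The two derived parameters, tRAC = tRCD + tCAS and tRC = tRAS + tRP, can be sketched as follows. The class name, field names, and the sample values in nanoseconds are illustrative assumptions, not figures from any data sheet or from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DramTiming:
    """Primary timing parameters, all in nanoseconds (hypothetical container)."""
    t_rcd: float  # row command delay: cell -> sense amps
    t_cas: float  # column access latency: sense amps -> controller
    t_ras: float  # time the row must stay active
    t_rp: float   # row precharge delay

    @property
    def t_rac(self) -> float:
        """Random access delay: tRAC = tRCD + tCAS."""
        return self.t_rcd + self.t_cas

    @property
    def t_rc(self) -> float:
        """Row cycle time: tRC = tRAS + tRP."""
        return self.t_ras + self.t_rp


# Made-up example values, chosen only to exercise the formulas.
example = DramTiming(t_rcd=20.0, t_cas=20.0, t_ras=45.0, t_rp=20.0)
print(f"tRAC = {example.t_rac} ns, tRC = {example.t_rc} ns")
```

tRAC bounds how quickly a single random read can return, while tRC bounds how often different rows in the same bank can be opened.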
DRAM Speed Summary II
DRAM Type     Frequency   Data Bus Width     Peak Data Bandwidth   Random Access   Row Cycle
              (MHz)       (bits, per chip)   (per chip)            Time (tRAC)     Time (tRC)
PC133 SDRAM   133         16                 200 MB/s              45 ns           60 ns
DDR 266       133 * 2     16                 532 MB/s              45 ns           60 ns
PC800 RDRAM   400 * 2     16                 1.6 GB/s              60 ns           70 ns
FCRAM         200 * 2     16                 0.8 GB/s              25 ns           25 ns
RLDRAM        300 * 2     32                 2.4 GB/s              25 ns           25 ns
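The peak-bandwidth column follows from clock frequency, transfers per clock, and bus width. The sketch below is an assumption about how those figures are derived (the slides do not state the formula), and the function name is hypothetical. It reproduces the DDR 266, PC800 RDRAM, FCRAM, and RLDRAM entries; for PC133 SDRAM the same product gives 266 MB/s rather than the 200 MB/s listed, and the sketch does not try to explain that difference.

```python
def peak_bandwidth_mb_per_s(clock_mhz: float, transfers_per_clock: int,
                            bus_width_bits: int) -> float:
    """Peak per-chip bandwidth in MB/s: transfers per second times bytes per transfer."""
    return clock_mhz * transfers_per_clock * bus_width_bits / 8


# (name, clock in MHz, transfers per clock, data bus width in bits)
devices = [
    ("DDR 266", 133, 2, 16),
    ("PC800 RDRAM", 400, 2, 16),
    ("FCRAM", 200, 2, 16),
    ("RLDRAM", 300, 2, 32),
]

for name, mhz, per_clock, width in devices:
    print(f"{name}: {peak_bandwidth_mb_per_s(mhz, per_clock, width):.0f} MB/s")
```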
DRAM is slow
But doesn't have to be
tRC < 10 ns achievable
Higher die cost
Not commodity
Not adopted in standard
Expensive
DRAM latency
DRAM latency isn't deterministic because of CAS or
RAS+CAS, and there may be significant queuing delays within
the CPU and the memory controller.
Each transaction has some overhead. Some types of
overhead cannot be pipelined. This means that in general,
longer bursts are more efficient.
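[Figure: transaction path from the CPU through the memory controller to the DRAM and back, with the stages labeled A through F (E1, E2/E3 at the DRAM)]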
A: Transaction request may be delayed in Queue
B: Transaction request sent to Memory Controller
C: Transaction converted to Command Sequences
(may be queued)
D: Command/s Sent to DRAM
E1: Requires only a CAS or
E2: Requires RAS + CAS or
E3: Requires PRE + RAS + CAS
F: Transaction sent back to CPU
DRAM Latency = A + B + C + D + E + F
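The latency sum above can be sketched as a small model in which the DRAM component E depends on the state of the target row. The row-state names ("hit", "closed", "conflict"), the function names, and all numeric values are illustrative assumptions, not terms or data from the slides.

```python
def dram_component_ns(row_state: str, t_cas: float, t_rcd: float, t_rp: float) -> float:
    """Component E of the latency sum, selected by the state of the target row."""
    if row_state == "hit":        # E1: row already open, only a CAS is needed
        return t_cas
    if row_state == "closed":     # E2: bank precharged, RAS + CAS
        return t_rcd + t_cas
    if row_state == "conflict":   # E3: another row is open, PRE + RAS + CAS
        return t_rp + t_rcd + t_cas
    raise ValueError(f"unknown row state: {row_state!r}")


def transaction_latency_ns(a: float, b: float, c: float, d: float,
                           e: float, f: float) -> float:
    """DRAM Latency = A + B + C + D + E + F."""
    return a + b + c + d + e + f


# Made-up stage delays in nanoseconds, used only to show the spread between cases.
for state in ("hit", "closed", "conflict"):
    e = dram_component_ns(state, t_cas=20.0, t_rcd=20.0, t_rp=20.0)
    total = transaction_latency_ns(a=5.0, b=5.0, c=5.0, d=5.0, e=e, f=10.0)
    print(f"{state}: E = {e} ns, total = {total} ns")
```

With identical queueing and transfer delays, the three cases still differ by a factor of three in the DRAM component, which is one reason the overall latency is not deterministic.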
DRAM Architecture Basics
PHYSICAL ORGANIZATION
[Figure: x2 DRAM, x4 DRAM, and x8 DRAM organizations, each with a row decoder, memory array, bit lines, sense amps, column decoder, and data buffers]
NOTE: This is per bank. Typical DRAMs have 2+ banks.
DRAM Architecture Basics
Read Timing for Conventional DRAM
Let's look at the interface another way: the way the data sheets
portray it.
[explain]
Main point: the RAS\ and CAS\ signals directly control the
latches that hold the row and column addresses ...
[Figure: read timing diagram with RAS, CAS, Address (row address, column address), and DQ (valid dataout) traces, divided into Row Access, Column Access, and Data Transfer phases]
DRAM Evolutionary Tree
Since DRAM's inception, there has been a stream of changes
to the design, from FPM to EDO to Burst EDO to SDRAM. The
changes are largely structural modifications -- minor -- that
target THROUGHPUT.
[discuss FPM up to SDRAM]
Everything up to and including SDRAM has been relatively
inexpensive, especially when considering the pay-off (FPM
was essentially free, EDO cost a latch, PBEDO cost a counter,
SDRAM cost a slight re-design). However, we've run out of free
ideas, and now all changes are considered expensive ... thus
there is no consensus on new directions, and a myriad of choices
has appeared.
[do LATENCY mods starting with ESDRAM ... and then the
INTERFACE mods]
[Figure: DRAM evolutionary tree. Conventional DRAM; (mostly) structural modifications targeting throughput: FPM, EDO, P/BEDO, SDRAM; structural modifications targeting latency: ESDRAM, VCDRAM, FCRAM, MOSYS; interface modifications targeting throughput: Rambus, DDR/2; future trends.]
DRAM Evolution
Read Timing for Conventional DRAM
[Figure: read timing diagram with RAS, CAS, Address (row address, column address), and DQ (valid dataout) traces, divided into Row Access, Column Access, Transfer Overlap, and Data Transfer phases]
DRAM Evolution
Read Timing for Fast Page Mode
FPM allows you to keep the sense amps active for multiple
CAS commands ... much better throughput.
Problem: cannot latch a new value in the column address
buffer until the read-out of the data is complete.
[Figure: read timing diagram with RAS, CAS, Address (one row address followed by several column addresses), and DQ (valid dataout for each column address) traces, divided into Row Access, Column Access, Transfer Overlap, and Data Transfer phases]
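The throughput benefit of keeping the row open can be sketched with a simplified timing model. Everything here is an assumption made for illustration: the function names, the page-mode cycle time parameter t_pc (a name not used in the slides), and the sample values in nanoseconds. Conventional DRAM is modeled as paying a full row cycle per read; FPM is modeled as paying one full random access followed by one page-mode cycle per additional column.

```python
def conventional_read_time_ns(n_reads: int, t_rc: float) -> float:
    """Every read re-opens the row, so each costs a full row cycle."""
    if n_reads < 1:
        raise ValueError("need at least one read")
    return n_reads * t_rc


def fpm_read_time_ns(n_reads: int, t_rac: float, t_pc: float) -> float:
    """First read pays the random access delay; later reads to the same row pay only t_pc."""
    if n_reads < 1:
        raise ValueError("need at least one read")
    return t_rac + (n_reads - 1) * t_pc


# Illustrative values only (ns), not taken from a specific data sheet.
n = 4
print(conventional_read_time_ns(n, t_rc=110.0))        # 440.0
print(fpm_read_time_ns(n, t_rac=60.0, t_pc=35.0))      # 165.0
```

The model ignores precharge after the last read and any controller overhead; it is only meant to show why repeated CAS commands to an open row improve throughput.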
DRAM Evolution
Read Timing for Extended Data Out
Solution to that problem: instead of simple tri-state
buffers, use a latch as well.
By putting a latch after the column mux, the next column
address command can begin sooner.
[Figure: read timing diagram with RAS, CAS, Address (one row address followed by several column addresses), and DQ (valid dataout for each column address) traces, divided into Row Access, Column Access, Transfer Overlap, and Data Transfer phases]
DRAM Evolution
Read Timing for Burst EDO
By driving the col-addr latch from an internal counter rather than
an external signal, the minimum cycle time for driving the output
bus was reduced by roughly 30%.
[Figure: read timing diagram with RAS, CAS, Address (row address, then a single column address), and DQ (four beats of valid data) traces, divided into Row Access, Column Access, Transfer Overlap, and Data Transfer phases]
DRAM Evolution
Read Timing for Pipeline Burst EDO
"Pipeline" refers to the setting up of the read pipeline ... the
first CAS\ toggle latches the column address, and all following
CAS\ toggles drive data out onto the bus. Therefore data stops
coming when the memory controller stops toggling CAS\.
[Figure: read timing diagram with RAS, CAS, Address (row address, then a single column address), and DQ (four beats of valid data) traces, divided into Row Access, Column Access, Transfer Overlap, and Data Transfer phases]
DRAM Evolution
Read Timing for Synchronous DRAM
Main benefit: frees up the CPU or memory controller from
having to control the DRAM's internal latches directly ... the
controller/CPU can go off and do other things during the idle
cycles instead of waiting ... even though the time-to-first-word
latency actually gets worse, the scheme increases system
throughput.
[Figure: SDRAM read timing diagram with Clock, RAS, CAS, Command (ACT, READ), Address (row addr, col addr), and DQ (four beats of valid data) traces, divided into Row Access, Column Access, Transfer Overlap, and Data Transfer phases]
(RAS + CAS + OE ... == Command Bus)
Output latch on EDO allowed you to start CAS sooner for the next
access (to the same row).
Latching the whole row in ESDRAM allows you to start precharge &
RAS sooner for the next page access -- HIDE THE
PRECHARGE OVERHEAD.
DRAM Evolution
Inter-Row Read Timing for ESDRAM
[Figure: Regular CAS-2 SDRAM, R/R to same bank. Clock, Command (ACT, READ, PRE, ACT, READ), Address (row addr, col addr, bank, row addr, col addr), and DQ (valid data) traces.]
[Figure: ESDRAM, R/R to same bank. Clock, Command (ACT, READ, PRE, ACT, READ), Address (row addr, col addr, bank, row addr, col addr), and DQ (valid data) traces.]
neat feature of this type of
buffering: write-around
DRAM Evolution
Write-Around in ESDRAM
[Figure: Regular CAS-2 SDRAM, R/W/R to same bank, rows 0/1/0. Clock, Command (ACT, READ, PRE, ACT, WRITE, PRE, ACT, READ), Address, and DQ (valid data) traces.]
[Figure: ESDRAM, R/W/R to same bank, rows 0/1/0. Clock, Command (ACT, READ, PRE, ACT, WRITE, READ), Address, and DQ (valid data) traces.]
(can second READ be this aggressive?)
DRAM Evolution
Main thing ... it is like having a bunch of open row buffers (a la
Rambus), but the problem is that you must deal with the cache
directly (move into and out of it), not the DRAM banks ... adds an
extra couple of cycles of latency ... however, you get good
bandwidth if the data you want is in the cache, and you can prefetch
into the cache ahead of when you want it ... originally targeted at
reducing latency; now that SDRAM is CAS-2 and RCD-2,
this makes sense only in a throughput way.
Internal Structure of Virtual Channel
[Figure: Bank A and Bank B, each with a row decoder and sense amps; 16 channels (segments) of 2 Kb each sit between the banks and the input/output buffer and DQs, with Sel/Dec logic. Labeled operations: Activate, Prefetch, Restore, Read, Write.]
Segment cache is software-managed, reduces energy
DRAM Evolution
Internal Structure of Fast Cycle RAM
8K rows requires 13 bits to select ... FCRAM uses 15
(assuming the array is 8K x 1K ... the data sheet does not specify).
FCRAM opts to break up the data array ... only activate a
portion of the word line.
[Figure: SDRAM: 13-bit row decoder, 8M array (8Kr x 1Kb), sense amps, tRCD = 15 ns (two clocks). FCRAM: 15-bit row decoder, 8M array (?), sense amps, tRCD = 5 ns (one clock).]
Reduces access time and energy/access
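The address-bit arithmetic in the notes can be sketched as follows. The reading that the two extra FCRAM row-address bits select a segment of the word line is an assumption layered on the notes' own assumption of an 8K x 1K array (the data sheet does not specify); the function names are hypothetical.

```python
def address_bits(num_entries: int) -> int:
    """Minimum number of address bits needed to select one of num_entries."""
    if num_entries < 1:
        raise ValueError("need at least one entry")
    return max(1, (num_entries - 1).bit_length())


def word_line_segments(row_bits_used: int, row_bits_needed: int) -> int:
    """Segments per word line if the extra row bits pick a portion of the line (assumed)."""
    if row_bits_used < row_bits_needed:
        raise ValueError("fewer bits than needed to select a row")
    return 2 ** (row_bits_used - row_bits_needed)


rows, bits_per_row = 8 * 1024, 1024          # assumed 8K x 1K array
needed = address_bits(rows)                  # 13 bits, as in the SDRAM case
segments = word_line_segments(15, needed)    # FCRAM uses 15 bits
print(needed, segments, bits_per_row // segments)
```

Under these assumptions each activation would drive a quarter of the word line (256 of 1024 bits), which is consistent with the claimed reduction in access time and energy per access.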
DRAM Evolution
Internal Structure of MoSys 1T-SRAM
MoSys takes this one step further ... DRAM with an SRAM
interface & speed but DRAM energy.
[physical partitioning: 72 banks]
Auto refresh -- how to do this transparently? The logic moves
through the arrays, refreshing them when not active.
But what if one bank gets repeated access for a long
duration? All other banks will be refreshed, but that one will not.
Solution: they have a bank-sized CACHE of lines ... in theory,
should never have a problem (magic).
[Figure: addr feeding Bank Select across the banks, with Auto Refresh logic, a cache ($), and DQs]