Introduction to DRAM Interface
Won-Joo Yun
2011. 11. 18
What is Memory?
1. the mental capacity or faculty of retaining and reviving facts,
events, impressions, etc., or of recalling or recognizing
previous experiences.
Also called computer memory, storage.
a. the capacity of a computer to store information subject to recall.
b. the components of the computer in which such information is stored.
[dictionary.com]
What is DRAM?
Dynamic Random Access Memory
RAM
Unlike magnetic tape or disk, it allows stored data to be accessed in
any order (i.e., at random)
Random refers to the idea that any piece of data can be returned in a
constant time, regardless of its physical location and whether it is related to
the previous piece of data [wikipedia.com]
Dynamic (vs. static)
Needs refresh: the charge stored on the input capacitance will leak off over time
[3Tr Cell of 1k DRAM]
What is DRAM?
Sequential access : long access time; access time differs with location
Random access : constant access time, regardless of location
Semiconductor memory
Volatile (RAM)
  DRAM  : 1 Tr. + 1 Cap., dynamic (needs refresh)
  SRAM  : 4 Tr. or 6 Tr., static
  FeRAM : 1 Tr. + 1 Cap., almost static
Non-Volatile (ROM)
  Mask ROM : 1 Tr. (single poly), not erasable
  EPROM    : 1 Tr. (dual poly), erasable by UV
  EEPROM   : 1 Tr. (dual poly), electrically erasable (by bit)
  FLASH    : 1 Tr. (dual poly), electrically erasable (by block)
Memory comparison
Mechanism for data storage : DRAM = charge/discharge of capacitor; SRAM = switching of cross-coupled inverters; FLASH = charge/discharge of floating gate (F.G.); FeRAM = dipole switching of ferroelectric capacitor; MRAM = resistivity with magnetic polarization state; PRAM = resistivity with chalcogenide phase change
Access time : DRAM < 100ns; SRAM < 50ns; FLASH < 100ns; FeRAM < 100ns; MRAM < 50ns; PRAM < 100ns
Write time : DRAM < 100ns; SRAM < 50ns; FLASH < 10us; FeRAM < 100ns; MRAM < 50ns; PRAM < 500ns
Erase time : ~ms for FLASH; not needed for the others
# of RD/WR operations : DRAM, SRAM, MRAM effectively infinite (> 10^15); FLASH 10^6 ~ 10^10; FeRAM 10^12 ~ 10^16; PRAM 10^9 ~ 10^11
Data retention : DRAM needs refresh; SRAM needs no refresh; FLASH, FeRAM, MRAM, PRAM ~ 10 years
Operating current : DRAM, SRAM ~ 100mA; FLASH, FeRAM, MRAM, PRAM ~ 10mA
Standby current : DRAM ~ 200uA; the others ~ 10uA
[Hynix]
Memory cell structure
DRAM : Dynamic Random Access Memory
SRAM : Static Random Access Memory
NVM : Non-Volatile Memory, Flash
[Hynix]
Comparisons
Intel Penryn Dual Core
process : 45nm
die area : 107mm2
6MB L2 cache
48Mb/38.5mm2 = 1.25Mb/mm2
Micron DDR3 SDRAM
process : 42nm
die area : 49.2mm2
4Gb
4Gb/43.3mm2 = 92Mb/mm2
Intel-Micron (IM) Flash
process : 25nm
die area : 167mm2
64Gb
64Gb/141mm2 = 454Mb/mm2
[Intel, Micron]
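The bit-density figures above follow directly from capacity divided by the quoted area (which is smaller than the die area, presumably the cell-array area). A quick sketch reproducing them, taking 1Gb as 1000Mb so the results match the slide's ratios:

```python
# Reproduces the Mb/mm2 density figures quoted above (illustrative arithmetic only).
def density_mb_per_mm2(capacity_mb, area_mm2):
    return capacity_mb / area_mm2

print(f"Penryn L2 SRAM : {density_mb_per_mm2(48, 38.5):7.2f} Mb/mm2")       # ~1.25
print(f"Micron DDR3    : {density_mb_per_mm2(4 * 1000, 43.3):7.2f} Mb/mm2")  # ~92
print(f"IM NAND Flash  : {density_mb_per_mm2(64 * 1000, 141):7.2f} Mb/mm2")  # ~454
```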
Standard DRAM genealogy
Asynchronous
Synchronous
DRAM technology evolution
Density         : 1K    4K    16K   64K   256K  1M    4M    16M   64M   256M  2G
Year            : 1971  1975  1979  1982  1985  1988  1991  1994  1997  2000  2005
Chip Size (mm2) : 10    13    26    30    35    50    70    110   140   160   200
Cell Size (um2) : 3000  860   400   180   65    25    10    2.5   0.72  0.26  0.05
Design Rule (um) : 10 -> 0.8 -> 0.5 -> 0.30 -> 0.18 -> 0.08
Power Supply (V) : 20 -> 3.3 -> 2.5 -> 1.5
Gate Oxide (nm)  : 120 -> 100 -> 75 -> 35 -> 30 -> 20 -> 16 -> 12 -> high-k
Cell Type        : 3Tr -> 1Tr planar capacitor -> 1Tr 3-D capacitor
Operation Mode   : SRAM -> Page Mode -> Fast Page Mode -> EDO -> SDR -> DDR -> DRD -> DDR3
[Hynix]
DRAM voltage trend
[Hynix]
Lower voltage means slower device speed
[Hynix]
DRAM Cell
DRAM unit cell : 1 Cell Transistor + 1 Capacitor
Invented in 1968 by R. H. Dennard (IBM), US Patent 3,387,286
DRAM core structure
1) Memory Cell : 1T, 1C
2) X Decoder & Word Line
3) Bit Line
4) Sense Amp
5) Column Select
DRAM core operation
1) Bit line floating
2) Word line select : charge sharing
3) DRAM sensing : write recovery
4) Column select : data read (or write)
Charge sharing
[Figure : bit line and cell capacitor before/after charge sharing, for a stored '1' and a stored '0']
Charge sharing
Before word-line turn-on (standby), the bit line is precharged to VBLP and the cell holds VCELL (VCC for '1', 0 for '0'):
  Q = Cb*VBLP + Cs*VCELL
After word-line turn-on, the charge redistributes over Cb + Cs:
  Q = (Cb + Cs)*Vout, so VBL = VCELL = Vout
  Vout = (Cb*VBLP + Cs*VCELL) / (Cb + Cs)
where Cb is the bit-line capacitance, Cs is the cell storage capacitance, and VBLP is the bit-line precharge voltage.
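As a quick numerical illustration (not from the slides), a minimal sketch of the charge-sharing signal, assuming typical-order values Cs ~ 25 fF, Cb ~ 250 fF and a half-VCC bit-line precharge:

```python
# Charge-sharing voltage on the bit line after word-line turn-on:
#   Vout = (Cb*VBLP + Cs*VCELL) / (Cb + Cs)
# All component values below are illustrative assumptions.

def charge_share(vblp, vcell, cb, cs):
    """Return the bit-line voltage after charge sharing."""
    return (cb * vblp + cs * vcell) / (cb + cs)

VCC  = 1.5        # supply voltage (V), assumed
Cb   = 250e-15    # bit-line capacitance (F), assumed
Cs   = 25e-15     # cell storage capacitance (F), assumed
VBLP = VCC / 2    # half-VCC precharge, a common choice

for stored, vcell in (("1", VCC), ("0", 0.0)):
    vout = charge_share(VBLP, vcell, Cb, Cs)
    dv = vout - VBLP
    print(f"stored {stored}: Vout = {vout*1000:.1f} mV, dV = {dv*1000:+.1f} mV")
```

The sense amplifier only sees a few tens of millivolts of signal, which is why the cross-coupled sense amp on the next slide is needed.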
BL SA operation
Cross-coupled sense amp
[Hynix]
Memory I/O Interface
Why Synchronous?
Asynchronous DRAM
Page Mode DRAM
Fast Page Mode DRAM
EDO(Extended Data Out) DRAM
Synchronous DRAM
SDRAM
DDR SDRAM
Rambus DRAM
Synchronous DRAM can output more data
[Hynix]
Conventional DRAM circuits
Cell Array
  (Sub) matrix array : cell (1T1C), word lines, bit lines (folded bit-line architecture)
  Cell capacitor determines data retention; the core part of DRAM technology
Sense Amplifier Array
  Performs DRAM sensing; refresh restores all cells of the selected page
Decoder / Mux, Address Input
  Pre-decoding and decoding
  Redundancy, internal refresh counter
  Row address path, column address path
Data I/O
  Read/write path + data-bus sense amp, block write driver
  DRAM variants differ mainly by column (address/data path) control : Fast Page, EDO, SDRAM, DDR, DDR2, ...
  Package options : x4, x8, x16
Control Circuits
  Read, write, refresh operation (timing & selection) according to /RAS, /CAS, /WE
Internal Bias Voltages
  Vbb, Vpp, Vblp, Vcp, Vint, Vref
SDRAM features +
Pipeline
  In previous (asynchronous) DRAMs, the column address path time determined the data frequency
  By partitioning the internal path, data are output every clock cycle after a latency of 2 or 3 clocks
Clock input
  Up to EDO, input signals were directly controlled by /RAS, /CAS, /WE
  Changed to commands referenced to the rising clock edge : various operations with a simpler spec
Multi bank (2/4)
  Multiple banks with independent row access increase the effective page size
  Continuous operation is possible by hiding the precharge time
Mode register set
  Programmable /CAS latency and burst length, suited to the system environment (clock frequency)
Internal address generator
  Internally generates sequential column addresses for burst (fast column access) operation; see the sketch after this list
I/O power
  Dedicated data power pins (VCCQ, VSSQ) for stable operation
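As an illustration of the internal burst address generator (a sketch of the JEDEC sequential and interleaved burst orderings, not the actual Hynix logic), the column addresses of one burst are derived from the starting column address:

```python
def burst_addresses(start_col, burst_length=8, interleaved=False):
    """Generate the column addresses of one burst.

    Sequential mode wraps within the burst boundary; interleaved mode
    XORs the starting offset with the burst counter.
    """
    base = start_col & ~(burst_length - 1)      # burst-aligned base address
    offset = start_col & (burst_length - 1)     # position inside the burst
    for i in range(burst_length):
        if interleaved:
            yield base | (offset ^ i)
        else:
            yield base | ((offset + i) % burst_length)

print([hex(a) for a in burst_addresses(0x15, 8)])        # sequential : 0x15,0x16,0x17,0x10,...,0x14
print([hex(a) for a in burst_addresses(0x15, 8, True)])  # interleaved: 0x15,0x14,0x17,0x16,0x11,0x10,0x13,0x12
```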
Pipeline
-. Separate the signal paths with long access times so that new commands can be accepted faster (pipelining)
Multi-bank Architecture
-. A bank is a unit that can be activated independently and has the same data-bus width as the external output bus
-. Interleaved bank operation : while one bank is accessed, another can be activated
[Hynix]
DRAM clock speed trends
[K-h Kim, et al. JSSC 2007]
[Hynix]
DDR features +
DDR data I/O
  Double data rate : data on both rising & falling clock edges
  Twice the performance compared to SDR SDRAM
  DDR performance enabled by 2n-bit prefetch
On-chip clock by DLL
  Data frequency is not limited by the internal access time
SSTL interface
  Input reference voltage : guarantees the DOUT data window; termination
  Differential input : referenced to VREF
Data strobe (DQS)
  Edge-aligned, bi-directional, source-synchronous
Differential clock
  CLK, /CLK
EMRS control
  DOUT driver size & DLL
SDR/DDR/DDR2/DDR3 operation
Data Rate
SDR/DDR/DDR2/DDR3 operation
DDR  : 2-bit prefetch
DDR2 : 4-bit prefetch
DDR3 : 8-bit prefetch
SDR/DDR/DDR2/DDR3 operation
High bandwidth concept : pre-fetch
-. A fetch cycle is one column cycle executed by a read or write command issue; see the sketch below
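A back-of-the-envelope sketch (not from the slides) of how n-bit prefetch raises the pin data rate while the core column-cycle rate stays the same:

```python
# n-bit prefetch: the core fetches n bits per column cycle in parallel and the
# I/O serializes them at n times the core frequency. Numbers are assumptions.

def pin_data_rate(core_freq_mhz, prefetch_bits):
    """Per-pin data rate in Mb/s for a given core column frequency and prefetch depth."""
    return core_freq_mhz * prefetch_bits

core = 200  # MHz column (core) frequency, assumed the same for all generations
for gen, n in (("SDR", 1), ("DDR", 2), ("DDR2", 4), ("DDR3", 8)):
    print(f"{gen:4s}: {n}-bit prefetch -> {pin_data_rate(core, n)} Mb/s/pin")
```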
Memory I/O interface
Memory clocking system
  Source synchronous scheme
  DLL support
  Impedance control
Design trends on graphics memory
  Low power techniques (input / clock / output)
  Low cost techniques (clock)
  Low jitter & high performance techniques (clock / output / power distribution network)
Common clock scheme
Data transfer is performed relative to a single master clock signal (synchronously)
[Hynix]
Common clock scheme
Timing budget
[Hynix]
Source Synchronous
Common (master) clock is not used for data transfer
Devices have an additional strobe pin
Minimizing differences in routed length & layer
characteristics between strobe and data signals is
required
Data / STB are synchronized at driver
Device speed (fast, slow) is irrelevant since
data & STB are supplied by the same device
The significant issue is the accumulated
skew between data & STB as the signals
travel between devices
Source Synchronous
Timing budget
Ideal case (tSTB = tDATA) : maximum speed is limited only by setup + hold time
Real case : maximum speed is limited by setup + hold + |tSTB - tDATA|max (the accumulated strobe-to-data skew)
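A minimal numeric sketch of this timing budget (all values are illustrative assumptions, not figures from the slides):

```python
# Source-synchronous timing budget:
#   bit time >= setup + hold + maximum strobe-to-data skew

def max_data_rate_gbps(t_setup_ps, t_hold_ps, skew_ps):
    bit_time_ps = t_setup_ps + t_hold_ps + skew_ps
    return 1e3 / bit_time_ps  # Gb/s per pin

print(f"ideal (no skew)   : {max_data_rate_gbps(150, 150, 0):.2f} Gb/s/pin")
print(f"real (100ps skew) : {max_data_rate_gbps(150, 150, 100):.2f} Gb/s/pin")
```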
DLL supports SS scheme
Without a DLL : the internal clock lags the external clock by the input-buffer delay tD1, and DQ lags the internal clock by the output-path delay tD2, so data appear tD1 + tD2 after the external clock edge.
With a DLL : the internal clock is advanced by tCK - (tD1 + tD2), so the output data (Data 1 ... Data 4) on DQ align with the external clock edges, as desired.
tD1 + tD2 have large P.V.T. variations.
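A sketch of the delay the DLL must insert so that DQ lands on an external clock edge, under the simple delay model above (all numbers are assumptions):

```python
# The DLL inserts tDLL so that the total forward delay (tD1 + tDLL + tD2)
# equals an integer number of clock periods.

def dll_delay(t_ck, t_d1, t_d2):
    """Delay the DLL inserts: (tCK - (tD1 + tD2)) modulo tCK, same units as t_ck."""
    return (-(t_d1 + t_d2)) % t_ck

t_ck, t_d1, t_d2 = 2.5, 0.9, 0.7   # ns, illustrative values
t_dll = dll_delay(t_ck, t_d1, t_d2)
total = t_d1 + t_dll + t_d2
print(f"tDLL = {t_dll:.2f} ns, total forward delay = {total:.2f} ns = {total/t_ck:.0f} x tCK")
```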
DRAM interface on channel
[Hynix]
TX driver (with impedance control)
[Hynix]
Read data eye measurement (DDR)
[Hynix]
DDR2 On-Die Termination (ODT)
On-board termination resistance is integrated inside the DRAM
[Samsung]
ODT value selection and on/off ctrl.
[Samsung]
ODT case study @DDR2-667 writes
[Samsung]
For a two-slot population, 50 ohm seems better than 75 ohm in terms of signal integrity
For a one-slot population, 150 ohm seems OK
Signal integrity
[Hynix]
Interface for Graphics memory
GDDR3 applications
[Figure : DDR data rate (Gbps) by application segment : Game Consoles, Laptop / Mobile, High-End / D-T]
Interface for Graphics memory
Challenges for Graphics Memory
  High speed : over 2 Gbps for GDDR3, 4 to 7 Gbps for GDDR5
  Low voltage, moving under 1.35V : 1.5V for GDDR5, 1.8V for GDDR3
  Low current consumption
  Good quality of the clock itself
  Robust operation against various noisy environments
    Guarantee of operation under various power-down modes
Design trends
Low Power
  - Reduce operating current : low heat, low voltage drop; DVS (Dynamic Voltage Swing) in mobile applications
  - Guarantee operation at low voltage
  - Data output : Data Bus Inversion
Low Cost
  - Small area : die cost down
  - Design for testability : test cost down
High Performance
  - Robust, low-jitter DLL
  - Good quality of DCC (duty-cycle correction)
  - Low SSO noise
  - Wide data valid window
Clocking systems for DRAM interface
Input : input clock buffer
  Robust clock generation from a poor input signal
  Support for low power modes
Clocks : DLL / PLL
  Delay (phase) compensation
  Wide operating range (voltage / frequency)
  Good clock quality : low jitter, duty-corrected clock
Output : clock control & driver
  Output enable
  Impedance matching
  Multi slew-rate
  Data Bus Inversion
Low power techniques
General concept
  Power consumption = V (supply voltage) x I (current)
Input : buffer
  In mobile : just inverters; in graphics : low-current two-stage amplifiers
  Buffer with low power mode
    Guarantees the low power function in mobile applications
    Stable operation under off-terminated environments
Clock : DLL
  Architecture for low power consumption
  Systematically low power operation
Output
  Data Bus Inversion DC mode
Low power in clock (DLL)
Architecture
  Compact circuits and architecture : in digital DLLs, dual-loop -> single-loop
  Lower VDD than external VDD : Vperi using regulated power
  Decrease internal frequency [GDDR4]
  Minimize voltage drop
  Smart power-down control [GDDR3]
Power consumption vs. tCK
  [Figure : DLL power consumption vs. tCK at VDD = 1.5V, 25 C; proposed DLL 4.2mW vs. previous 20mW, a 79% reduction] [ISSCC 08]
Low power in output
Pseudo-open-drain I/O system
  Only data '0' consumes termination current
DBI DC mode [GDDR4]
  Limits the maximum number of '0's per byte to 4; see the sketch below
[S. J. Bae, JSSC 08]
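A minimal sketch of DC-mode data bus inversion as described in [3] (simplified; the exact GDDR4 encoding and flag handling may differ):

```python
def dbi_dc_encode(byte):
    """DC-mode DBI: if a byte would drive more than 4 zeros, transmit it
    inverted and assert the DBI flag, so at most 4 lines pull low."""
    zeros = 8 - bin(byte & 0xFF).count("1")
    if zeros > 4:
        return (~byte) & 0xFF, 1   # inverted data, DBI flag set
    return byte & 0xFF, 0          # data as-is, DBI flag clear

for b in (0x00, 0x0F, 0xF8):
    enc, flag = dbi_dc_encode(b)
    print(f"in=0x{b:02X} -> out=0x{enc:02X}, DBI={flag}, zeros on bus={8 - bin(enc).count('1')}")
```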
Receiver type comparison
Pseudo open drain (GDDR3) vs. Push-Pull (GDDR2)
[Samsung]
Assumption : same channel condition for both cases.
This doesn't represent the absolute ODT power difference between the pseudo-open-drain and push-pull cases.
Low jitter / High performance
Clock : DLL
  Low-jitter operation with dual-loop architecture
  Power-noise-tolerant replica
  Dual DCC for stable duty-error correction
  Dual-mode DLL and PLL : DLL for phase lock, PLL for jitter reduction [ISSCC 09]
  Meshed power plane
Output : driver
  Data Bus Inversion AC mode : reduces SSO noise
Low jitter in output
DBI AC mode [GDDR4]
  Reduces SSO (power supply) noise
  Within a data byte sequence, the maximum number of changing bits per transfer is limited to 4; see the sketch below
[S. J. Bae, JSSC 08]
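A minimal sketch of AC-mode DBI as described in [3] (simplified; the DBI flag and bus-state handling in the real device may differ):

```python
def dbi_ac_encode(stream):
    """AC-mode DBI: if more than 4 bits would toggle relative to the previous
    bus state, transmit the inverted byte so at most 4 lines switch."""
    prev, out = 0x00, []
    for byte in stream:
        toggles = bin((byte ^ prev) & 0xFF).count("1")
        if toggles > 4:
            byte, flag = (~byte) & 0xFF, 1
        else:
            flag = 0
        out.append((byte, flag))
        prev = byte
    return out

# First byte 0xFF would toggle all 8 lines from the idle 0x00 bus -> inverted.
print(dbi_ac_encode([0xFF, 0x00, 0xF0]))
```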
GDDR5
JEDEC GDDR SGRAM comparison
[AMD(ATi)]
JEDEC GDDR SGRAM comparison
[AMD(ATi)]
Industry signal interface trend
[AMD(ATi)]
GDDR5 key elements for reliable high-speed data transmission
[AMD(ATi), Qimonda]
Comparison GDDR3 vs. GDDR5
GDDR3 : synchronization issues on every pin -> length-matching combs on the PCB
GDDR5 : no need for combs on the PCB -> cheaper solution with higher performance
[AMD(ATi)]
Clamshell mode (x16 mode)
[AMD(ATi)]
Recent researches on DRAM I/F
Ref   Application   Conf.     Year   Issues
[6]   GDDR3         ISSCC     2006   Latency control, 2.5Gbps
[5]   GDDR3         ASSCC     2006   Low-power / wide-range DLL architecture, 3Gbps
[2]   GDDR3         ISSCC     2008   Dual DCC, 3Gbps
[10]  GDDR3         ISSCC     2008   Multi-slew-rate output driver, impedance control, 3Gbps
[1]   GDDR3         ISSCC     2009   Dual PLL/DLL, pseudo-rank, 3.3Gbps
[9]   Graphics      ISSCC     2009   Pseudo-differential, common-mode rejection, referenceless, 6Gbps
[7]   GDDR5         ESSCIRC   2009   CML CDN, 5.2Gbps
[8]   GDDR5         VLSI      2009   Fast DCC, 7Gbps
[11]  GDDR5         ISSCC     2010   GDDR5 architecture, bank control, 7Gbps
[12]  GDDR5         VLSI      2010   Jitter and ISI reduction, 7Gbps
DDR4
DDR4
[PCwatch]
Summary
DRAM Introduction
DRAM Evolutions
Memory Interface
Interface for graphics memory
GDDR3
Low power, low cost, low jitter / high performance
GDDR5
CDR for read (data training), external VPP, error correction, clamshell mode
DDR4 preview
References
Web sites and published data from Hynix, Samsung, Rambus, Elpida, Micron, AMD(ATi),
Intel, nVidia, SONY, Nintendo, Microsoft, Pcwatch, JEDEC
DRAM Circuit Design, B. Keeth, R. J. Baker, B. Johnson, F. Lin, IEEE Press
[1] H. W. Lee, et al., A 1.6V 3.3Gb/s GDDR3 DRAM with Dual-Mode Phase- and Delay-Locked Loop Using Power-Noise Management with Unregulated Power Supply in 54nm CMOS, ISSCC 2009
[2] W. J. Yun, et al., A 0.1-to-1.5GHz 4.2mW All-Digital DLL with Dual Duty-Cycle Correction Circuit and Update Gear Circuit for DRAM in 66nm CMOS Technology, ISSCC 2008
[3] S. J. Bae, et al., An 80nm 4Gb/s/pin 32-bit 512Mb GDDR4 Graphics DRAM with Low Power and Low Noise Data Bus Inversion, JSSC 2008
[4] K. H. Kim, et al., An 8Gb/s/pin 9.6ns Row-Cycle 288Mb Deca-Data Rate SDRAM with an I/O Error Detection Scheme, JSSC 2007
[5] W. J. Yun, et al., A Low Power Digital DLL with Wide Locking Range for 3Gbps 512Mb GDDR3 SDRAM, ASSCC 2006
References
[6] D. U. Lee, et al., A 2.5Gb/s/pin 256Mb GDDR3 SDRAM with Series Pipelined CAS Latency Control and Dual-Loop Digital DLL, ISSCC 2006
[7] K. H. Kim, et al., A 5.2Gb/s GDDR5 SDRAM with CML Clock Distribution Network, ESSCIRC 2009
[8] D. Shin, et al., Wide-Range Fast-Lock Duty-Cycle Corrector with Offset-Tolerant Duty-Cycle Detection Scheme for 54nm 7Gb/s GDDR5 DRAM Interface, VLSI 2009
[9] K. S. Ha, et al., A 6Gb/s/pin Pseudo-Differential Signaling Using Common-Mode Noise Rejection Techniques Without Reference Signal for DRAM Interface, ISSCC 2009
[10] D. U. Lee, et al., Multi-Slew-Rate Output Driver and Optimized Impedance-Calibration Circuit for 66nm 3.0Gb/s/pin DRAM Interface, ISSCC 2008
[11] T. Y. Oh, et al., A 7Gb/s/pin GDDR5 SDRAM with 2.5ns Bank-to-Bank Active Time and No Bank-Group Restriction, ISSCC 2010
[12] S. J. Bae, et al., A 40nm 7Gb/s/pin Single-ended Transceiver with Jitter and ISI Reduction Techniques for High-Speed DRAM Interface, VLSI 2010
Thank you
Appendix
SDRAM categorization
by Speed / Applications
  DDR1 / DDR2 / DDR3 / DDR4
  GDDR1 / GDDR2 / GDDR3 / GDDR4 / GDDR5 / GDDR5+ / GDDR6
  mDDR / LPDDR2
by Density
  256Mb / 512Mb / 1Gb / 2Gb / 4 ~ 8Gb
by Bus-Width
  x4 / x8 / x16 / x32
DRAM density & bus-width
bus-width
# of data output pins
determined by applications
for example,
  PC : x64
  Server : x64
  Graphics card : x64 / x128 / x256 / x512
  Game consoles : x32 / x128
DRAM total density
DRAM device density x number of devices (see the sketch below)
for servers
  x4 configurations to increase total memory capacity
  x4 4Gb, 16 devices on a 64-bit bus : 4Gb x 16 = 8GB (32 devices : 16GB)
for laptops
  x16 4Gb, 4 devices on a 64-bit bus : 4Gb x 4 = 2GB
for PCs
  x4 / x8 / x16 configurations
for graphics applications
  x16 / x32 configurations
  wide bus-width matters more than the total amount of memory
[Hynix]
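A quick sketch of the capacity arithmetic above (the device count per rank follows from the 64-bit module bus divided by the device width):

```python
def module_capacity_gb(device_gb, device_width, bus_width=64, ranks=1):
    """Total module capacity in GB (gigabytes) from device density in Gb (gigabits)."""
    devices = (bus_width // device_width) * ranks
    return device_gb * devices / 8, devices

for label, dens, width, ranks in (("server x4", 4, 4, 1),
                                  ("server x4, 2 ranks", 4, 4, 2),
                                  ("laptop x16", 4, 16, 1)):
    cap, n = module_capacity_gb(dens, width, ranks=ranks)
    print(f"{label}: {n} devices -> {cap:.0f} GB")
```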
DRAM total density
Conventional memory (module bus = x64, i.e. 8 bytes)
  Component            # of devices    Module organization   Module density   Application
  1Gb (256M x4)        16ea.           256M x64              2GB              Server
  1Gb (128M x8)        8ea.            128M x64              1GB              PC
  2Gb DDP (512M x4)    32ea.           1024M x64             8GB              Server
  1Gb (64M x16)        4ea.            64M x64               512MB            Notebook

Graphics memory
  Component            # of devices    Bus-width             Total density    Application
  512Mb (16M x32)      8ea.            128bit (mirror)       512MB            XBOX 360
  512Mb (16M x32)      16ea. / 12ea.   512bit / 384bit       1GB / 768MB      High-End
  512Mb (16M x32)      4ea.            128bit                256MB            PS3
  512Mb (16M x32)      1ea.            32bit                 64MB             Nintendo Wii
Data bandwidth
Example : GDDR3 on PS3
  700MHz clock per pin -> 1.4Gb/s per data pin
  Each device has a 32-bit data I/O : 1.4Gb/s x 32 = 44.8Gb/s per component
  4-component configuration (32bit x 4 = 128bit) : 44.8Gb/s x 4 = 179.2Gb/s
  Data bandwidth is 22.4GB/s [SONY]
To increase data bandwidth (see the sketch below)
  Clock speed
  Wide I/O
  More components (in other words, wide I/O in total)
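A small helper that reproduces this arithmetic (an illustrative sketch; the only inputs are the numbers quoted above):

```python
def bandwidth_gbytes(pin_rate_gbps, io_bits, components):
    """Total bandwidth in GB/s = per-pin rate x I/O width x number of devices / 8."""
    return pin_rate_gbps * io_bits * components / 8

# PS3 GDDR3 example from the slide: 1.4Gb/s/pin, x32 devices, 4 components
print(bandwidth_gbytes(1.4, 32, 4))   # 22.4 GB/s
```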
Data bandwidth
Increasing clock speed per pin : 700MHz -> 1GHz
  2.0Gb/s x 32 x 4 / 8 = 32GB/s
  ex) high-end graphics cards use 1.3GHz (2.6Gbps) [GDDR3]
  ex) 3.6 ~ 4.8Gbps [GDDR5] / up to 7Gbps (@ES)
Increasing I/O bits per component (wide I/O per component) : 32bit -> 64bit
  1.4Gb/s x 64 x 4 / 8 = 44.8GB/s
  32bit is the maximum in mass production; x4 -> x128 (x512) with TSV
Increasing # of components (in other words, wider I/O in total) : 4 -> 8
  1.4Gb/s x 32 x 8 / 8 = 44.8GB/s
  ex) high-end graphics cards increase bus-width with 16 components (= 512bit)
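Using the bandwidth_gbytes() sketch from the previous slide, the three scaling options give:

```python
print(bandwidth_gbytes(2.0, 32, 4))   # faster pins       : 32.0 GB/s
print(bandwidth_gbytes(1.4, 64, 4))   # wider device I/O  : 44.8 GB/s
print(bandwidth_gbytes(1.4, 32, 8))   # more components   : 44.8 GB/s
```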
Mirror function
To increase total density without increasing data bus-width
an example of 512Mb GDDR3
[ISSCC 09]
[Hynix]
XBOX 360
8 x 32b devices on a 128b bus (using the mirror function)
[Microsoft]
Prefetch operation
DDR2/3 Architecture
DDR2 block diagram
Simulation schematic
[Hynix]
Measurement vs. Simulation
[Hynix]
DRAM core speed path
Internal voltages
ZQ Cal
Design trends
Receiver type comparison