Clocking Part 2
6.371 Fall 2002
11/6/02
L18 Clocks Part 2
Clocking
For modern processors, cycle time is around 16-20 FO4 delays, of which registers take 2-4
FO4 delays
Power consumption dominated by clock load, both
distribution network and end loads (latches,
prechargers)
70% of total power in IBM POWER4 design
Simple single-edge triggered registers are fine
for most ASIC designs. This lecture we'll
examine what is happening in high performance
designs.
Edge Triggered Timing Constraints
[Figure: register clocked by CLK1 driving combinational logic (delay TPmin/TPmax) into a register clocked by CLK2]
Slow path timing constraint
$T_{cyc} \ge T_{CQmax} + T_{Pmax} + T_{setup} + T_{skew}$
worst case is when CLK2 is earlier than CLK1
Fast path timing constraint
$T_{CQmin} + T_{Pmin} \ge T_{hold} + T_{skew}$
worst case is when CLK2 is later than CLK1
Fast path constraint cannot be fixed by slowing the
clock, so a violation is fatal to the chip design
Skew reduces the cycle time available for logic
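
The two constraints can be checked numerically. The sketch below is only an illustration: the function names and the delay values (in picoseconds) are assumptions made for this example, not figures from the lecture. It evaluates the slow-path bound Tcyc >= TCQmax + TPmax + Tsetup + Tskew and the fast-path bound TCQmin + TPmin >= Thold + Tskew.

```python
def min_cycle_time(t_cq_max, t_p_max, t_setup, t_skew):
    """Smallest cycle time that satisfies the slow-path (setup) constraint."""
    return t_cq_max + t_p_max + t_setup + t_skew


def hold_slack(t_cq_min, t_p_min, t_hold, t_skew):
    """Fast-path slack; a negative value means a hold time violation."""
    return (t_cq_min + t_p_min) - (t_hold + t_skew)


# Hypothetical delays in picoseconds, chosen only for illustration.
print(min_cycle_time(t_cq_max=60, t_p_max=800, t_setup=40, t_skew=50))  # 950
print(hold_slack(t_cq_min=30, t_p_min=20, t_hold=25, t_skew=50))        # -25
```

The hold slack has no Tcyc term, which is why a fast-path violation cannot be repaired by slowing the clock.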
Two Phase Latch Based Design
[Figure: latch pipeline with a CLK1 latch, Combinational Logic 1, a CLK2 latch, Combinational Logic 2, and a CLK1 latch; CLK1 and CLK2 waveforms showing the non-overlap times]
Divide cycle into two phases
phase 2 latches can only sample values generated from
phase 1 latch outputs, and vice versa.
Latches driven by two non-overlapping clocks
Can guarantee no fast path problems with larger
non-overlap
Two Phase Timing
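[Figure: two-phase timing diagram for signal A through Combinational Logic 1 (CLK1 latch) and Combinational Logic 2 (CLK2 latch), marking Tx, Ty, Tz, the non-overlap times TNO, and the delays TDQmax, TP1max, TDQmax, TP2max]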
In steady state, $T_z \le T_x$, therefore minimum cycle time
$T_{cyc} \ge T_{P1max} + T_{P2max} + 2T_{DQmax}$
Non-overlap time, TNO, can be adjusted such that no hold time
violations are possible:
$T_{NO} + T_{CQmin} - T_{skew} \ge T_{hold}$
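
As a rough sketch of how these two relations are used, the snippet below computes the minimum cycle time Tcyc >= TP1max + TP2max + 2 TDQmax and the smallest non-overlap time satisfying TNO + TCQmin - Tskew >= Thold. The function names and the picosecond values are assumptions for illustration only.

```python
def two_phase_min_cycle(t_p1_max, t_p2_max, t_dq_max):
    """Minimum cycle time: both logic blocks plus two latch D-to-Q delays."""
    return t_p1_max + t_p2_max + 2 * t_dq_max


def min_non_overlap(t_hold, t_skew, t_cq_min):
    """Smallest TNO with TNO + TCQmin - Tskew >= Thold (never negative)."""
    return max(0, t_hold + t_skew - t_cq_min)


# Hypothetical delays in picoseconds.
print(two_phase_min_cycle(t_p1_max=400, t_p2_max=350, t_dq_max=50))  # 850
print(min_non_overlap(t_hold=25, t_skew=50, t_cq_min=30))            # 45
```

Note that the required non-overlap does not depend on the logic delays at all, so it can be chosen conservatively without knowing the fastest path.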
Time Borrowing
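[Figure: time borrowing example with signal A through Combinational Logic 1 (CLK1 latch) to C, then C.L. 2 (CLK2 latch) to D; waveforms mark Tx, the non-overlap times TNO, Tsetup, TCQmax and TP1max]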
Can place latches where convenient in logic path
Maximum time in one combinational logic block is
$T_{P1max} \le T_{cyc} - T_{CQmax} - T_{setup} - T_{NO} - T_{skew}$
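
To see how much of the cycle one block can use, here is a small sketch that assumes the bound reads TP1max <= Tcyc - TCQmax - Tsetup - TNO - Tskew. The function name and the picosecond values are hypothetical.

```python
def max_block_delay(t_cyc, t_cq_max, t_setup, t_no, t_skew):
    """Longest delay one combinational block may have with time borrowing."""
    return t_cyc - t_cq_max - t_setup - t_no - t_skew


# Hypothetical values in picoseconds.
t_cyc = 1000
limit = max_block_delay(t_cyc, t_cq_max=60, t_setup=40, t_no=45, t_skew=50)
print(limit)              # 805
print(limit > t_cyc / 2)  # True: one block may take well over half the cycle
```

With these numbers a single block can occupy most of the cycle, provided the other block in the loop is correspondingly short.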
Single Clock Latch Based Design
[Figure: latch pipeline with Combinational Logic 1 and Combinational Logic 2 separated by latches driven from a single CLK]
Two phase non-overlapping system requires distribution of two clocks.
Can distribute single clock signal, and invert locally at latch.
Clock skew can cause overlap between transparent phases of CLK and
inverted CLK, so must check for fast path hold time violations.
Very common clocking scheme for full custom chips, works well with
pipelined domino logic.
Pipelining Domino Logic
Domino circuits require monotonic change in input signal during
evaluation phase - cannot easily guarantee this with most edge
triggered devices.
Transparent latches allow setup of logic inputs before clock edge.
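[Figure: input X feeding a domino gate (NMOS network) through an edge-triggered register: the CLK-Q delay discharges the precharge node, giving a degraded level during eval.]
[Figure: input X feeding a domino gate (NMOS network) through a transparent latch: Q is set up before the clock edge; waveforms show the precharge and eval. phases of CLK]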
Pulse Latches
By using narrow clock pulses, can have only a single latch in any
combinational loop.
Used in Cray-1, and in many high-performance (Pentium-4) and
low-power microprocessors (XScale).
[Figure: pulse latches A and B around combinational logic, clocked by CLK pulses of width Tw; waveforms mark Tsetup, Thold, TCQmin, TPmin and TPmax]
Cycle time, $T_{cyc,min} \ge T_{DQmax} + T_{Pmax} + T_{setup} + T_{skew} - T_w$
Tw is pulse width, and gives maximum time borrowing for
previous cycle
Two-sided timing constraint on pulse width
$T_{setup} < T_w < T_{CQmin} + T_{Pmin} - T_{hold} - T_{skew}$
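
The two-sided constraint can leave no legal pulse width at all when the fastest path is too short. The following sketch, with an assumed function name and made-up picosecond values, returns the allowed window for Tw or `None` when the window is empty.

```python
def pulse_width_window(t_setup, t_cq_min, t_p_min, t_hold, t_skew):
    """Return (lower, upper) bounds on Tw, or None if no width is legal."""
    lower = t_setup
    upper = t_cq_min + t_p_min - t_hold - t_skew
    return (lower, upper) if lower < upper else None


# Hypothetical delays in picoseconds.
print(pulse_width_window(t_setup=40, t_cq_min=30, t_p_min=100,
                         t_hold=25, t_skew=50))  # (40, 55)
print(pulse_width_window(t_setup=40, t_cq_min=30, t_p_min=20,
                         t_hold=25, t_skew=50))  # None
```

In the second case the short path would need extra delay before any pulse width could satisfy both bounds.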
Double-Edge Triggered Registers
Clock load of flip-flops is a significant fraction of total chip
power. Can halve the clock frequency by using a
double-edge triggered flip-flop.
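[Figure: double-edge triggered flip-flop built from two latches A and B feeding output Q; while CLK is in one phase A latches and B samples, and in the other phase A samples and B latches]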
Pentium-4 Pulse Latches
Pentium-4 distributes 50% duty cycle global clock at
advertised frequency (e.g., 2.8GHz Pentium-4 has 2.8GHz
clock)
Fast ALU section of Pentium-4 runs at twice advertised
clock frequency using pulse latches driven from both edges
of the distributed clock. Clock buffers have duty cycle
correction circuitry to ensure 50% duty cycle.
[Figure: waveforms of GCLK (2.8 GHz) and PCLK (5.6 GHz)]
Flip-Flops Timing
[ Stojanovic and Oklobdzija ]
Crossing Time Domains
Common to have to communicate between logic
blocks running at unrelated clock frequencies
[Figure: signals crossing between the TCLK clock domain and the RCLK clock domain; TCLK and RCLK waveforms showing a possible setup time violation and a possible hold time violation]
If setup and hold times are violated, flip-flops
might hang in a metastable state.
Metastability
[Figure: sampling latch with a CLK-gated input and a CLK-gated feedback loop; plot of voltage against time showing the metastable level and the observation interval, t]
Probability of failure (i.e., not valid 1 or 0) when observed
time t after clock edge:
$F(t) = k e^{-t/\tau_r}$
Parameters k and $\tau_r$ are functions of latch design. $\tau_r$ is called the
time constant of resolution and is primarily controlled by the
gain-bandwidth product of the feedback loop (don't use
dynamic latches as synchronizers!). Error probability decreases
exponentially with t, but there is always some chance of failure.
Metastability Failure Calculations
$f_f = t_W f_T f_C e^{-t/\tau_r}$
Frequency of failures for sampling window (setup+hold) of tW,
sampling frequency fC and input transition frequency of fT
For 1 GHz sampling clock, 100 MHz transitions, 50 ps
setup+hold, 50 ps time constant, 950 ps observation time:
$f_f$ = 0.03 Hz (Mean Time Between Failures: 33 seconds)
Increase observation time to 1950 ps (two cycles):
$f_f = 5.8 \times 10^{-11}$ Hz (MTBF 550 years)
Increase observation time to 2950 ps (three cycles):
$f_f = 1.2 \times 10^{-19}$ Hz (MTBF 266 billion years)
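
These figures can be reproduced with a few lines of Python. The sketch below evaluates the failure frequency as the product of sampling window, transition frequency, sampling frequency and exp(-t / time constant), for the three observation times above. The function name and the 365.25-day year are choices made here, not part of the lecture.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length


def failure_rate(t_w, f_t, f_c, t_obs, tau):
    """Failure frequency in Hz: window * transition rate * clock rate * exp(-t/tau)."""
    return t_w * f_t * f_c * math.exp(-t_obs / tau)


for t_obs_ps in (950, 1950, 2950):
    ff = failure_rate(t_w=50e-12, f_t=100e6, f_c=1e9,
                      t_obs=t_obs_ps * 1e-12, tau=50e-12)
    mtbf_s = 1.0 / ff
    print(f"{t_obs_ps} ps: ff = {ff:.2g} Hz, "
          f"MTBF = {mtbf_s:.3g} s = {mtbf_s / SECONDS_PER_YEAR:.3g} years")
```

The unrounded first result is about 0.028 Hz, so the quoted 33 seconds corresponds to the rounded 0.03 Hz figure.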
Synchronizers
Use pipelined registers to give full RCLK
cycle to resolve asynchronous input.
[Figure: signal from the TCLK domain sampled by pipelined registers clocked by RCLK]
Use N interleaved registers, each
clocked at 1/N of RCLK rate, to
increase resolution interval by factor of
N without decreasing signal bandwidth.
[Figure: signal from the TCLK domain sampled by interleaved registers clocked by CLKA, CLKB and CLKC, with a rotating select clocked by RCLK; waveforms of RCLK, CLKA, CLKB and CLKC marking the observation interval and the repeat interval]
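
One way to choose how many RCLK cycles of resolution time to allow is to increase the count until the mean time between failures meets a target. The sketch below does that, assuming the failure frequency is window * transition rate * clock rate * exp(-t / time constant) and that the observation time is n clock periods minus the setup+hold window. The function name, the `max_cycles` limit and the example targets are hypothetical.

```python
import math


def cycles_to_resolve(target_mtbf_s, t_w, f_t, f_c, tau, max_cycles=16):
    """Smallest number of clock cycles of resolution time meeting the MTBF target."""
    t_clk = 1.0 / f_c
    for n in range(1, max_cycles + 1):
        t_obs = n * t_clk - t_w  # assumption: window is lost from the interval
        ff = t_w * f_t * f_c * math.exp(-t_obs / tau)
        if 1.0 / ff >= target_mtbf_s:
            return n
    return None  # target not reachable within max_cycles


seconds_per_year = 365.25 * 24 * 3600
# 1 GHz clock, 100 MHz transitions, 50 ps window, 50 ps time constant.
print(cycles_to_resolve(10 * seconds_per_year, 50e-12, 100e6, 1e9, 50e-12))    # 2
print(cycles_to_resolve(1000 * seconds_per_year, 50e-12, 100e6, 1e9, 50e-12))  # 3
```

Because the failure rate falls exponentially with observation time, each extra cycle buys many orders of magnitude of MTBF.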