Clock Tree Synthesis
October 06, 2012
SmartPlay Overview
“To be a leading service provider of End to End Solutions
enabled by Innovative Business Models that provide Value,
Quality and Execution excellence to our Customers”
Semiconductor
Digital Analog Wireless Software System Design
World-wide Sales
Common Support Functions (HR/Staffing/Ops/Finance)
Common Infrastructure
Confidential
Agenda
Introduction To CTS
Objective
Basic Terminologies
Clock Routing Algorithms
Clock distribution Techniques
Checklist before doing CTS
Inputs Required for CTS
General Steps for CTS
ICC commands for performing CTS
Effect of CTS
Checklist after CTS
Hands Off
SmartPlay Proprietary & Confidential
Introduction to CTS
In the VLSI physical design flow, CTS is performed after placement and
before the routing of signal nets.
Cont..
The clock is propagated after placement because the exact physical
locations of cells and modules are needed for clock propagation, and
these locations in turn determine the accurate clock delays and the
achievable operating frequency.
The clock is propagated before signal routing so that the clock router
can make optimum use of the routing resources (before signal nets occupy
them), which leads to minimum skew as well as low dynamic power dissipation.
Introduction to CTS
Within most VLSI circuits, data transfer between sequential
elements is synchronized by the processing clock.
Before CTS, all clock pins are driven by a single clock source, which
therefore has a very high fan-out and load.
Cont..
CTS is the process of inserting buffers/inverters along the clock
path of the ASIC design to balance the clock delay to all clock inputs.
CTS is performed to balance the skew and minimize the insertion delay.
CTS Goals
Given a clock source and n sinks, with $t_i$ the clock arrival time at sink $i$:
Connect all sinks to the clock source by an interconnect network
(tree or non-tree) so as to minimize:
• Clock skew = $\max_{i,j} |t_i - t_j|$
• Delay = $\max_i t_i$
• Power dissipation
• Total wirelength
• Noise and coupling effects
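The two timing metrics can be computed directly from the sink arrival times. A minimal sketch, assuming the arrival times t_i are already known; the function names and numbers below are hypothetical:

```python
def clock_skew(arrivals):
    """Clock skew = max over i, j of |t_i - t_j|, which equals max(t) - min(t)."""
    return max(arrivals) - min(arrivals)


def clock_delay(arrivals):
    """Clock delay = max over i of t_i (the latest arrival at any sink)."""
    return max(arrivals)


# Hypothetical clock arrival times t_i (ns) at four sinks
t = [1.20, 1.25, 1.18, 1.31]
print(f"skew = {clock_skew(t):.2f} ns, delay = {clock_delay(t):.2f} ns")
```

Computing max(t) - min(t) avoids the O(n²) loop over all sink pairs while giving the same value as the pairwise definition.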
Clock Skew
Clock skew is the maximum difference in the arrival time of a clock
signal at pins of two different sequential elements.
Figure showing both local skew and global skew
Cont..
There are two types of clock skew:
Local skew: the difference in clock arrival time at the clock pins
of related flops (flops connected by a timing path) in the same clock domain.
Global skew: the difference in clock arrival time at the clock pins
of non-related flops in the same clock domain.
It is also defined as the difference between the longest and the
shortest clock path delays reaching any two sequential elements of
the same clock domain in the overall design.
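The difference between the two definitions can be sketched as below, assuming the clock arrival time at every flop and the list of launch/capture pairs (timing paths) are already known. The flop names, arrival times and function names are hypothetical.

```python
def global_skew(arrival):
    """Latest minus earliest clock arrival over all sinks of one clock domain."""
    return max(arrival.values()) - min(arrival.values())


def local_skew(arrival, timing_paths):
    """Worst arrival difference over related flop pairs, i.e. pairs joined by a timing path."""
    return max(abs(arrival[launch] - arrival[capture]) for launch, capture in timing_paths)


# Hypothetical clock arrival times (ns) and launch -> capture timing paths
arrival = {"ff1": 1.00, "ff2": 1.05, "ff3": 1.40, "ff4": 1.38}
paths = [("ff1", "ff2"), ("ff3", "ff4")]
print(f"local = {local_skew(arrival, paths):.2f} ns, global = {global_skew(arrival):.2f} ns")
```

Here the local skew is only about 0.05 ns because each timing path connects flops with similar arrival times, even though the global skew across the domain is about 0.40 ns.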
Cont..
Clock skew is also classified as positive (+ve) or negative (–ve):
Positive skew: the capture clock arrives later than the launch clock.
This typically occurs when data and clock are routed in the same direction.
+ve skew improves setup slack but can lead to hold violations.
Negative skew: the capture clock arrives earlier than the launch clock.
This typically occurs when data and clock are routed in opposite directions.
–ve skew improves hold slack but can lead to setup violations.
• Beneficial skew (also called useful skew): clock skew introduced
intentionally to resolve timing violations
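The setup/hold trade-off can be checked with the usual single-cycle slack equations. This is a sketch assuming one launch flop, one capture flop, and skew defined as capture clock arrival minus launch clock arrival (so positive skew means a late capture clock). All delays and function names are hypothetical.

```python
def setup_slack(period, skew, clk_to_q, comb, t_setup):
    """Setup slack; skew = capture clock arrival - launch clock arrival."""
    # The capture edge is one period later, shifted by the skew
    return (period + skew) - (clk_to_q + comb + t_setup)


def hold_slack(skew, clk_to_q, comb, t_hold):
    """Hold slack; new data must not reach the capture flop before its (skewed) edge + t_hold."""
    return (clk_to_q + comb) - (skew + t_hold)


# Hypothetical values in ns: one long and one short path between the same two flops
for skew in (0.0, 0.10, -0.10):
    s = setup_slack(period=1.0, skew=skew, clk_to_q=0.10, comb=0.80, t_setup=0.15)
    h = hold_slack(skew=skew, clk_to_q=0.10, comb=0.02, t_hold=0.05)
    print(f"skew {skew:+.2f}: setup slack {s:+.2f}, hold slack {h:+.2f}")
```

With zero skew the long path fails setup. A +0.10 ns skew fixes it but makes the short path fail hold. A –0.10 ns skew does the opposite.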
Cont..
Figure showing both +ve skew and –ve skew
Clock Latency
Clock latency is the delay assumed to exist between the clock source
and the flip-flop clock pin during the pre-CTS stage, i.e. before the
clock tree is built and routed, when the clock is ideal.
It is not an actual delay but a user-specified delay that accounts for
the clock delay the clock tree will introduce once it is built.
The timing analyzer uses this value to determine clock arrival times
in the absence of propagated clocks, i.e. during pre-CTS.
Cont..
There are two terms associated with latency:
Source Latency: It is the time taken by the clock signal to
propagate from its ideal waveform origin point to the clock
definition point in the design.
Network Latency: It is the time taken by the clock signal to
propagate from the clock definition point in the design to the
clock pin of the sequential device.
Figure showing source latency and network latency
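A minimal sketch of how the two components combine before CTS. It assumes an ideal clock, where the timing analyzer simply adds the user-specified source and network latencies to the clock edge. The function name and values are hypothetical.

```python
def ideal_clock_arrival(edge_time, source_latency, network_latency):
    """Pre-CTS (ideal clock) arrival time at a flop clock pin.

    Both latencies are user-specified estimates, not delays of a real clock tree."""
    return edge_time + source_latency + network_latency


# Hypothetical 2 ns clock with 0.25 ns source latency and 0.5 ns estimated network latency
period = 2.0
first_edge = ideal_clock_arrival(0.0, 0.25, 0.5)
second_edge = ideal_clock_arrival(period, 0.25, 0.5)
print(first_edge, second_edge)   # 0.75 2.75
```

When the same latency values are applied to every flop of the clock, every flop sees the same arrival time, so there is no skew between them before CTS.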
Insertion Delay
Once CTS is complete (post-CTS), the actual delay from the clock
source to each clock sink can be calculated. This actual delay is
called the insertion delay.
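A minimal sketch of how insertion delays follow from a built tree. It assumes each edge carries a lumped stage delay (buffer/inverter delay plus wire delay) and that the tree is given as a hypothetical adjacency dictionary. Real tools compute these delays from extracted parasitics.

```python
# Hypothetical post-CTS clock tree: node -> list of (child, stage delay in ns).
# Each stage delay lumps a buffer/inverter delay with the wire delay to the child.
clock_tree = {
    "CLK": [("buf1", 0.10), ("buf2", 0.12)],
    "buf1": [("ff1", 0.08), ("ff2", 0.09)],
    "buf2": [("ff3", 0.07)],
}


def insertion_delays(tree, root):
    """Accumulate stage delays from the clock source down to every sink (leaf)."""
    delays = {}
    stack = [(root, 0.0)]
    while stack:
        node, delay = stack.pop()
        children = tree.get(node, [])
        if not children:                    # a leaf: a flop clock pin
            delays[node] = delay
        for child, stage in children:
            stack.append((child, delay + stage))
    return delays


d = insertion_delays(clock_tree, "CLK")
print({k: round(v, 2) for k, v in sorted(d.items())})
print("skew:", round(max(d.values()) - min(d.values()), 2))
```

Once the insertion delays are known, the post-CTS skew is simply their spread.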
Uncertainty
To be written
Jitter
To be written
Clock-gating
The clock tree can consume more than 50% of the dynamic power of a design.
So the clock is turned off when it is not needed, using clock-gating
cells.
There are two clock-gating styles:
1) Latch-based clock gating
2) Latch-free clock gating
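The saving from gating can be estimated with the classic switching-power relation P = α·C·V²·f. This sketch assumes a clock net toggles every cycle (α = 1) and that a gated branch toggles only while its enable is active. It ignores the gating cells' own power, short-circuit power and leakage. The capacitances, enable ratio and function names are hypothetical.

```python
def dynamic_power(alpha, cap_farads, vdd, freq_hz):
    """Switching power P = alpha * C * V^2 * f, in watts."""
    return alpha * cap_farads * vdd ** 2 * freq_hz


def clock_power(c_ungated, c_gated, enable_ratio, vdd, freq_hz):
    """An ungated clock net toggles every cycle (alpha = 1); gated branches toggle only while enabled."""
    return (dynamic_power(1.0, c_ungated, vdd, freq_hz)
            + dynamic_power(enable_ratio, c_gated, vdd, freq_hz))


# Hypothetical clock network: 100 pF total, 80 pF of it behind gates enabled 25% of the time
no_gating = clock_power(100e-12, 0.0, 0.25, vdd=1.0, freq_hz=500e6)
with_gating = clock_power(20e-12, 80e-12, 0.25, vdd=1.0, freq_hz=500e6)
print(f"{no_gating * 1e3:.1f} mW -> {with_gating * 1e3:.1f} mW")
```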
Latch free Clock-gating
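It uses a simple AND or OR gate.
The enable signal must therefore stay stable during the active phase of
the clock (the high phase for an AND gate); otherwise the gated clock
output can terminate prematurely or generate multiple clock pulses.
In single-clock, flip-flop based designs the enable typically changes
just after the rising clock edge, i.e. while the clock is high, so this
restriction makes latch-free gating inappropriate for such designs.
Latch-free clock gating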
Latch Based Clock-gating
This style adds a level-sensitive latch to the design to hold the
enable signal from the active edge of the clock until the inactive
edge of the clock.
Since the latch captures the state of the enable signal and holds it
until the complete clock pulse has been generated, the enable signal
need only be stable around the rising edge of the clock.
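To make the difference between the two styles concrete, here is a minimal cycle-sampled simulation. It assumes an AND-type gate and a positive-edge clock sampled four times per period. The latch is modeled as transparent while the clock is low, as in a typical integrated clock-gating cell. The waveforms and function names are hypothetical.

```python
def and_gated_clock(clk, en):
    """Latch-free gating: the gated clock is simply clk AND en at every sample."""
    return [c & e for c, e in zip(clk, en)]


def latch_gated_clock(clk, en):
    """Latch-based gating: a latch, transparent while clk is low, freezes en during the high phase."""
    gclk, q = [], 0
    for c, e in zip(clk, en):
        if c == 0:              # latch transparent: output follows the enable
            q = e
        gclk.append(c & q)      # while clk is high the latch holds q
    return gclk


def count_pulses(wave):
    """Count rising (0 -> 1) transitions in a sampled waveform."""
    return sum(1 for prev, cur in zip([0] + wave, wave) if prev == 0 and cur == 1)


# Four samples per clock period; the enable changes while the clock is high
clk = [0, 0, 1, 1] * 3
en = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(and_gated_clock(clk, en), count_pulses(and_gated_clock(clk, en)))
print(latch_gated_clock(clk, en), count_pulses(latch_gated_clock(clk, en)))
```

The AND gate produces two clipped, half-width pulses because the enable changes during the high phase. The latch-based version produces a single full-width pulse in the cycle where the enable was already set before the rising edge.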
Clock Routing Algorithms
How to minimize skew
Distribute the clock signal so that the interconnections carrying
it to each functional sub-block are equal in length.
Several clock routing algorithms exist that try to achieve this
goal:
• H-Tree based algorithm
• X-Tree based algorithm
• MMM algorithm
• Fish-Bone algorithm
H-Tree Clock Routing
H-tree Algorithm
Minimize skew by making the interconnections to sequential elements
equal in length
• Symmetric pattern
• Skew is zero, assuming delay is directly proportional to wire
length
Can be used only when terminals are evenly distributed
• However, this is rarely the case in practice (due to blockages,
etc.)
• So strict (pure) H-trees are seldom used
• However, they are still popular for top-level clock network design
• It uses a lot of routing resources.
• Power dissipation is also high.
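A minimal recursive sketch of H-tree construction. It assumes a square region, evenly spread sinks, and wire length as the only delay measure. The function `h_tree` and its parameters are hypothetical. Each level draws a horizontal bar and two vertical bars, and the four bar ends become the centers of half-size sub-H-trees. Every sink ends up with the same wire length from the root, which is why skew is zero under this assumption.

```python
def h_tree(x, y, half_w, half_h, levels, wire=0.0):
    """Recursively build an H-tree centered at (x, y).

    Returns a list of (sink_x, sink_y, wire_length_from_root)."""
    if levels == 0:
        return [(x, y, wire)]
    sinks = []
    # Horizontal bar of half-length half_w, then vertical bars of half-length half_h
    for dx in (-half_w, half_w):
        for dy in (-half_h, half_h):
            # Each bar end is the center of a half-size H one level down
            sinks += h_tree(x + dx, y + dy, half_w / 2, half_h / 2,
                            levels - 1, wire + half_w + half_h)
    return sinks


sinks = h_tree(4.0, 4.0, 2.0, 2.0, levels=2)
lengths = {round(w, 9) for _, _, w in sinks}
print(len(sinks), lengths)   # 16 {6.0}
```

For an 8 x 8 region centered at (4, 4), two levels give 16 sinks on a uniform grid, each 6 units of wire from the root.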
X-tree Algorithm
An alternative tree structure with smaller delay,
assuming non-rectilinear (diagonal) routing is possible.
Although apparently better than the H-tree, it may cause crosstalk
because the wires run in close proximity.
Like the H-tree, it is applicable only to very special structures,
not in general.
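Why the X-tree has a smaller delay can be seen from the wire needed to reach one quadrant center. This sketch assumes delay is simply proportional to wire length and that the target lies an equal distance d away in x and y. The function names are hypothetical.

```python
import math


def h_arm(d):
    """H-tree: reach a point (d, d) away with one horizontal and one vertical segment."""
    return d + d


def x_arm(d):
    """X-tree: reach the same point with a single diagonal segment."""
    return math.hypot(d, d)


ratio = x_arm(1.0) / h_arm(1.0)
print(f"X-tree arm is {ratio:.3f} of the H-tree arm")   # 0.707
```

Under this assumption, each X-tree arm is about 29% shorter. The price is diagonal wiring, which standard rectilinear routing layers do not support.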
X-tree Algorithm
Method of Means and Medians (MMM)
Follows a strategy very similar to the H-tree.
Recursively partition the terminals into two sets of equal size
(median). Then, connect the center of mass of the whole circuit to
the centers of mass of the two sub-circuits (mean).
Clock skew is only minimized heuristically; the resulting tree may
not have zero skew.
The basic algorithm ignores blockages and produces a non-rectilinear
tree. Some wires may also intersect.
• In a second phase, each wire can be converted so that it
consists only of rectilinear segments and avoids blockages.
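A minimal sketch of the basic (first-phase) MMM recursion. It assumes point sinks and cuts that alternate between the x and y axes. The function names are hypothetical, and the second (rectilinearization) phase is not shown. The tree is returned as straight, possibly diagonal, edges between centers of mass.

```python
from statistics import fmean


def centroid(points):
    """Center of mass (mean) of a list of (x, y) points."""
    return (fmean(p[0] for p in points), fmean(p[1] for p in points))


def mmm_edges(points, cut_on_x=True):
    """Basic MMM: split at the median, connect the mean of the whole set to the mean of each half.

    Returns straight (possibly diagonal) edges ((x1, y1), (x2, y2))."""
    if len(points) <= 1:
        return []
    axis = 0 if cut_on_x else 1
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2                        # median cut: two halves of (nearly) equal size
    root = centroid(points)
    edges = []
    for half in (ordered[:mid], ordered[mid:]):
        edges.append((root, centroid(half)))       # mean of whole set -> mean of this half
        edges += mmm_edges(half, not cut_on_x)     # recurse, alternating the cut direction
    return edges


sinks = [(0, 0), (0, 4), (4, 0), (4, 4)]           # hypothetical sink locations
for edge in mmm_edges(sinks):
    print(edge)
```

Because the connection points are means rather than geometric bar ends, equal wire lengths are not guaranteed for irregular sink sets. This is why MMM minimizes skew only heuristically.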
Method of Means and Medians (MMM)
Fish-Bone Algorithm
The clock driver drives all the clock pins directly.
Skew is caused by differing interconnect lengths and loads.
If the clock driver delay is much larger than the interconnect delays,
the skew will be minimal, but the insertion delay will be large.
Implementation of the fish-bone algorithm in a design
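One way to read the driver-delay trade-off is with a simple additive model. Each sink's insertion delay is the shared driver delay plus its own interconnect delay. The wire delays and function names are hypothetical. In this model the absolute skew is set only by the interconnect spread. A dominant driver delay therefore makes the skew small relative to the insertion delay, while the insertion delay itself grows.

```python
def fishbone_delays(driver_delay, wire_delays):
    """Each sink's insertion delay = shared driver delay + its own interconnect delay."""
    return [driver_delay + w for w in wire_delays]


def skew_stats(delays):
    """Return (absolute skew, skew as a fraction of the insertion delay)."""
    skew = max(delays) - min(delays)
    return skew, skew / max(delays)


wires = [0.02, 0.05, 0.03]                    # hypothetical interconnect delays (ns)
for driver in (0.1, 1.0):                     # weak vs. dominant driver delay (ns)
    d = fishbone_delays(driver, wires)
    skew, frac = skew_stats(d)
    print(f"driver {driver} ns: insertion {max(d):.2f} ns, skew {skew:.2f} ns ({frac:.1%})")
```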
Conventional CTS Distribution
It is the most widely used approach for dealing with design complexity.
Both buffer and clock-gating levels can be
very deep.
Most sinks in the design share very little
of their paths back to the clock root.
The impact of on-chip variation (OCV)
is therefore very high.
Clock-Mesh Distribution
It has extremely shallow logic depth below the mesh, usually just a
single buffer or clock gate directly driving the sinks.
It has a large shared path from the
clock root to the mesh.
The impact of on-chip variation
is minimal.
It uses a very dense mesh fabric.
Ultra-low skew values can be
achieved.
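Why the amount of shared path matters can be sketched with a simple OCV model. The model assumes a launch/capture pair with equal nominal clock delay, flat late/early derates of 1.05/0.95, and common path pessimism removal. Under these assumptions, only the non-shared part of the clock path is derated apart. The derates, delays and function name are hypothetical.

```python
def ocv_skew_pessimism(clock_delay, shared_delay, late=1.05, early=0.95):
    """Worst-case skew added by OCV derating for a launch/capture pair whose clock paths
    both have nominal delay `clock_delay` and share the first `shared_delay` of it.

    The launch path is derated late and the capture path early; the shared segment
    cannot be fast and slow at the same time, so its contribution cancels."""
    return (late - early) * (clock_delay - shared_delay)


# Hypothetical 2.0 ns clock paths
conventional = ocv_skew_pessimism(2.0, 0.4)  # deep tree, little shared path
mesh = ocv_skew_pessimism(2.0, 1.8)          # most of the path is shared above the mesh
print(f"conventional: {conventional:.2f} ns, mesh: {mesh:.2f} ns")
```

The same derates cost about eight times more skew margin in the conventional tree because most of each sink's path is not shared.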
Clock-Mesh Distribution
It exhibits high power dissipation.
The design logic attached to the mesh fabric is organized into relatively
small bins, each containing a cluster or sub-cluster of logic. The clock
is then connected from the mesh to this logic by fish-bone or comb structures.
It is not well suited to designs
having RAMs, ROMs and other
hard blockages.
Clock routing in a sub-cluster by
fish-bone. The dark black net is
the clock mesh.
Multi-Source CTS Distribution
It has a moderate depth for both buffer and clock-gating levels.
The multiple clock sources are
located at the bottom of the mesh
grid, and all the structure above
the mesh forms a shared path back
to the root clock buffer.
The impact of on-chip variation
is greater than for a clock mesh but less
than for conventional CTS.
Multi-Source CTS Distribution
The mesh fabric is one to two orders of magnitude less dense than in
clock-mesh distribution.
Its power dissipation is similar to that of conventional CTS. It
allows greater clock-gating depth than a clock mesh, thus saving more power.
It offers much larger logic
groupings that are themselves small
clock trees, so each logic grouping
can have its own clock tree
structure.
Checklist before doing CTS
Placement – Completed
Power/ground nets – Pre-routed
Estimated congestion – Acceptable
Estimated timing – Acceptable (setup slack should be ~0 ns or better)
Estimated max transition/capacitance – No violations
Inputs Required for CTS
Detailed placement database
Latency and skew targets, if specified
Buffers/inverters for building the clock tree
Clock tree DRCs (max transition, max capacitance, max fanout, number of
buffer levels)
Steps used by CTS algorithms
Create virtual clusters by identifying leaf cells that are in close
proximity to each other.
Leaf cells that are far from any cluster are assigned to the nearest
cluster.
The number of leaf cells per cluster is user-defined.
Once the clusters and their locations are determined, buffer
insertion begins such that the clock propagation delay to each
cluster is equal and the clock skew within each cluster is minimized.
The smaller the cluster, the lower the skew, but the more clock
buffering levels are required.
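The clustering and delay-balancing steps can be sketched as follows. The sketch assumes point sinks, a greedy proximity clustering (seed with the leftmost unassigned sink, then take its nearest unassigned neighbours up to the user-defined cluster size), and a linear Manhattan-length delay model. It is an illustrative stand-in, not the algorithm of any particular tool. The function names are hypothetical, and outlier reassignment is not modeled. The returned padding is the extra delay each cluster's buffer branch would need to match the slowest cluster.

```python
import math


def cluster_sinks(sinks, max_per_cluster):
    """Greedy proximity clustering of clock sinks (points) into groups of at most max_per_cluster."""
    unassigned = list(sinks)
    clusters = []
    while unassigned:
        unassigned.sort()                       # leftmost (then lowest) remaining sink seeds a cluster
        seed = unassigned.pop(0)
        unassigned.sort(key=lambda s: math.dist(seed, s))   # nearest neighbours first
        take = max_per_cluster - 1
        clusters.append([seed] + unassigned[:take])
        unassigned = unassigned[take:]
    return clusters


def balance_padding(root, clusters, delay_per_unit=1.0):
    """Extra delay each cluster branch needs so every cluster sees the same delay from the root.

    Uses a linear model: delay = delay_per_unit * Manhattan distance from root to the cluster center."""
    delays = []
    for members in clusters:
        cx = sum(x for x, _ in members) / len(members)
        cy = sum(y for _, y in members) / len(members)
        delays.append(delay_per_unit * (abs(cx - root[0]) + abs(cy - root[1])))
    target = max(delays)
    return [target - d for d in delays]


sinks = [(0, 0), (1, 0), (10, 0), (11, 0)]   # hypothetical sink locations
clusters = cluster_sinks(sinks, max_per_cluster=2)
print(clusters)                               # [[(0, 0), (1, 0)], [(10, 0), (11, 0)]]
print(balance_padding((0, 0), clusters))      # [10.0, 0.0]
```

Re-running with `max_per_cluster=1` gives one cluster per sink (no skew inside any cluster) but more branches to buffer and balance. This mirrors the trade-off stated above.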
ICC commands for performing CTS
As explained in the accompanying text file
Effect of CTS
Clock buffers are added
Congestion may increase
Non-clock cells may be displaced to non-ideal locations
Timing and max cap/max transition violations can be introduced
Checklist After CTS
Skew Report
Clock tree Report
Timing report for Setup and Hold
Power and Area Report
Output of CTS
Database with a properly built clock tree in the design
Any Questions
Thank You