Timing Closure
Lecture 2
Jignesh Shah
UCSC Extension, Silicon Valley
Spring 2024
Agenda
• Quick review of lecture 1
• Overview of STA for setup & hold check
• Interconnect / Wire delay
• Lab #2
Review of lecture 1
• Understood CMOS operation
• Standard cell library : combinational and sequential cells.
• Rise/fall time (aka slew)
• Measured 10% - 90% or 20% - 80%
• F(input slew, output load)
• Delay arc from input pin to output pin
• Tcomb = Delay from 50% of Input to 50% Output
• Tcomb = F(input slew, output load)
• Setup, hold, and Tcq of flip flops
• Setup = Data (D input) must be stable before clock arrives
• Hold = Data (D input) must be stable after clock transitions
• Setup, Hold = F( clock slew, data slew)
• Tcq = Delay from 50% of Clock to 50% of Q (output)
• Tcq = F(input slew, output load)
Lab1 (i.e. Timing Characterization)
• Inverter delay
• Setup time, clock-to-Q within acceptable range
• Setup time, with large clock-to-Q delay
• Flop output is not latched or is metastable. The setup time for the flop needs to increase.
Types of Timing Analysis
● Dynamic Timing Analysis
○ Vector-based simulation
○ Input stimulus must activate the longest timing path from input to output
○ Uses transistor-level models
○ Most accurate
● Static Timing Analysis
○ No vectors needed
○ Finds the worst-case delay by propagating delays through the design
○ Uses abstract delay models for the building blocks
○ Depends on good constraints
What is STA
• Static Timing Analysis (STA) is a method of determining if a circuit meets timing
constraints without simulating clock cycles
• How it works:
• Identify timing startpoints and endpoints
• Input/output ports, registers/latches
• Trace delays through timing paths from startpoints to endpoints
• Compare path delays to clock period to see if constraints are met
• Also check hold violations (races) and transition violations (from library design
rules)
• STA is:
• Exhaustive
• Fast
• Not dependent on simulation vectors
STA steps
• Circuit is broken into timing paths
• Delay of each path is calculated
Fig 1: Path groups, including the async reset path (clock gating not shown)
• Each path delay is checked against timing constraints
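The path-delay step above can be sketched as a worst-case arrival time propagation over a graph of delay arcs. This is a minimal sketch, assuming the timing graph is acyclic and delays are in picoseconds. The function name `latest_arrivals`, the pin names, and all delay values are hypothetical; a real STA tool also tracks slew, rise/fall edges, and clocks.

```python
from collections import defaultdict, deque

def latest_arrivals(arcs, startpoints):
    """Propagate the latest arrival time through a timing graph.

    arcs: list of (from_pin, to_pin, delay_ps) delay arcs forming a DAG.
    startpoints: dict mapping startpoint pin -> arrival time in ps.
    Returns a dict pin -> latest arrival time in ps.
    """
    fanout = defaultdict(list)
    indegree = defaultdict(int)
    pins = set(startpoints)
    for src, dst, delay in arcs:
        fanout[src].append((dst, delay))
        indegree[dst] += 1
        pins.update((src, dst))

    arrival = dict(startpoints)
    ready = deque(p for p in pins if indegree[p] == 0)
    while ready:                      # visit pins in topological order
        pin = ready.popleft()
        for dst, delay in fanout[pin]:
            if pin in arrival:        # pins not reached from a startpoint stay unconstrained
                candidate = arrival[pin] + delay
                arrival[dst] = max(arrival.get(dst, candidate), candidate)
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)
    return arrival

# Hypothetical example: FF1 launches data through two reconverging branches into FF2.
arcs = [
    ("FF1/CK", "FF1/Q", 150),   # clock-to-Q
    ("FF1/Q", "U1/Z", 200),
    ("FF1/Q", "U2/Z", 350),
    ("U1/Z", "U3/Z", 300),
    ("U2/Z", "U3/Z", 100),
    ("U3/Z", "FF2/D", 50),
]
arrival = latest_arrivals(arcs, {"FF1/CK": 0})
print(arrival["FF2/D"])              # 700

# Check against an assumed constraint: 1000 ps clock period, 50 ps setup time.
setup_slack = (1000 - 50) - arrival["FF2/D"]
print(setup_slack)                   # 250
```

No input vectors are needed: every arc is visited once, which is why STA is both exhaustive and fast.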
Inputs & Outputs for STA
• Inputs to the STA tool / flow
• Design gate netlist (*.v)
• Design RC netlist (*.SPEF)
• Delay models (i.e. .lib/.db)
• Design constraint or intent (*.sdc)
• Outputs of the STA tool / flow
• Timing report
• Design sanity report / log
Cell & Wire delay in STA
• Delay of combinational cell is characterized through
spice simulation for a specified range of output loads
& input slew.
• Ditto for setup & hold constraint of sequential cells.
• Delay, constraint, and input capacitance of library
cells are written in an abstract delay model.
• The STA tool interpolates or extrapolates cell delay from
the abstract model using a proprietary algorithm.
• Wire R & C are extracted by an extraction tool.
• Net delay is computed by the STA tool using an Elmore
delay type of method.
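As an illustration of how a cell delay can be read from a characterized table, here is a minimal sketch that uses bilinear interpolation over an (input slew, output load) grid. Bilinear interpolation is an assumption made for illustration, since the tools' own algorithms are proprietary. The function names and every table value below are hypothetical.

```python
from bisect import bisect_right

def _bracket(axis, x):
    """Index i such that axis[i] and axis[i+1] are used for x.

    The index is clamped to the table edges, so points outside the
    characterized range are extrapolated linearly from the last interval.
    """
    i = bisect_right(axis, x) - 1
    return min(max(i, 0), len(axis) - 2)

def lookup_delay(slews, loads, table, slew, load):
    """Bilinear interpolation of table[slew_index][load_index]."""
    i = _bracket(slews, slew)
    j = _bracket(loads, load)
    ts = (slew - slews[i]) / (slews[i + 1] - slews[i])
    tl = (load - loads[j]) / (loads[j + 1] - loads[j])
    return (table[i][j] * (1 - ts) * (1 - tl)
            + table[i][j + 1] * (1 - ts) * tl
            + table[i + 1][j] * ts * (1 - tl)
            + table[i + 1][j + 1] * ts * tl)

# Hypothetical delay table (ps) indexed by input slew (ps) and output load (fF).
slews = [10.0, 50.0, 100.0]
loads = [2.0, 10.0, 20.0]
delay_table = [
    [20.0, 60.0, 110.0],
    [30.0, 70.0, 120.0],
    [45.0, 85.0, 135.0],
]

print(lookup_delay(slews, loads, delay_table, 30.0, 6.0))    # 45.0 (midway between four grid points)
print(lookup_delay(slews, loads, delay_table, 50.0, 10.0))   # 70.0 (exactly on a grid point)
```

The same lookup shape applies to setup and hold constraint tables, indexed by clock slew and data slew instead.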
Cell delay (aka gate delay)
• Cell delay is a function of
both input slew and
output load
• A slower input slew raises
Vgs more gradually, so the
transistor turns on more slowly
and cell delay increases
• Gate delay is proportional
to its output capacitance
(output load which is a
combination of metal
routes and input gates)
Cell delay of cascaded gates
Timing Models (.lib) for cell delay
NLDM
• Voltage source based
• STA uses a single load capacitance (Ceff)
• Inaccurate delay modeling if Rd << Znet (a large drive strength cell driving a large cap)
• Cannot account for Miller capacitance
• Smaller file size (i.e. faster to load)
• Less accurate for the non-linear waveforms of advanced process nodes with FinFET
CCS / ECSM
• Current source based
• STA uses a two-capacitance model
• Accurate for Rd << Znet
• Can account for Miller capacitance
• Huge file size (i.e. slower to load)
• Better accuracy for the non-linear waveforms of advanced process nodes with FinFET
Interconnect
• Net is a wire connecting pins of standard cells and blocks
• Has only one driver
• Can drive a number of fan out cells or blocks
• Can travel on multiple metal layers of the chip
• Can be broken up into segments and modeled with LRC elements. L is mostly ignored.
• Interconnect parasitic
• Interconnect resistance – resistance between the output pin of a cell and the input pins of
the fan out cells
• Interconnect capacitance – comprises capacitance to ground and coupling capacitance to neighboring signal routes
• Interconnect inductance – arises due to current loops; the effect of inductance can be ignored
at low frequencies
• Before layout, the R & C of a wire are estimated from fanout using a wireload model.
Timing checks before layout are referred to as pre-layout STA.
• After physical implementation, R & C of all nets are generated by a RC extractor
tool.
Interconnect Extraction & Delay
Interconnect Extraction inside a tool
Elmore delay for Interconnect
* Distributed RC trees
- Have single Driver.
- Do not have resistive loops
- Have capacitances only
coupled to ground
Fig 1: Distributed RC Tree
(Note: typo in the drawing; the second R1 should be R2)
* Arnoldi algorithm and Asymptotic Waveform Evaluation (aka AWE)
method are also used to calculate wire delay.
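A minimal sketch of the Elmore delay on an RC tree that meets the three conditions above (single driver, no resistive loops, grounded capacitances only). The delay to a node is the sum, over each resistor on the path from the driver, of that resistance times all capacitance downstream of it. The function name `elmore_delays`, the node names, and the R and C values are hypothetical. Units are ohms and fF, so delays come out in femtoseconds.

```python
def elmore_delays(parent, res, cap):
    """Elmore delay from the root (driver) to every node of an RC tree.

    parent: dict node -> parent node (the root maps to None).
    res:    dict node -> resistance (ohm) of the branch from its parent to the node.
    cap:    dict node -> grounded capacitance (fF) at the node.
    Returns dict node -> delay in fs.
    """
    children = {node: [] for node in parent}
    root = None
    for node, par in parent.items():
        if par is None:
            root = node
        else:
            children[par].append(node)

    # Pre-order traversal: every parent appears before its children.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children[node])

    # Downstream capacitance = own capacitance plus that of the whole subtree.
    downstream = dict(cap)
    for node in reversed(order):
        if parent[node] is not None:
            downstream[parent[node]] += downstream[node]

    delay = {root: 0.0}
    for node in order[1:]:
        delay[node] = delay[parent[node]] + res[node] * downstream[node]
    return delay

# Hypothetical net: the driver feeds n1, which branches to sinks n2 and n3.
parent = {"drv": None, "n1": "drv", "n2": "n1", "n3": "n1"}
res = {"n1": 100.0, "n2": 200.0, "n3": 300.0}
cap = {"drv": 0.0, "n1": 10.0, "n2": 20.0, "n3": 30.0}

delays = elmore_delays(parent, res, cap)
print(delays["n2"])   # 10000.0 fs = 100*(10+20+30) + 200*20
print(delays["n3"])   # 15000.0 fs = 100*(10+20+30) + 300*30
```

The shared resistor (drv to n1) charges the capacitance of both branches, so a heavy load on one sink slows the other sink as well.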
Distributed Interconnect RC model
• T-model
• π model
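To show why a wire is modeled with T or π segments rather than one lumped RC, the sketch below compares the Elmore delay to the far end of a wire for the three models. It assumes an ideal driver (zero driver resistance) and no load at the far end. The function names and the wire values (1000 ohm, 100 fF) are hypothetical.

```python
def ladder(r_total, c_total, n, model):
    """Return (resistors, caps) for an n-segment wire model.

    resistors[k] is the series resistance feeding the node that holds caps[k].
    """
    r, c = r_total / n, c_total / n
    if model == "lumped":   # all R followed by all C, regardless of n
        return [r_total], [c_total]
    if model == "pi":       # C/2 - R - C/2 per segment
        # The C/2 at the driver node sees no wire resistance, so it is omitted.
        return [r] * n, [c] * (n - 1) + [c / 2]
    if model == "T":        # R/2 - C - R/2 per segment
        return [r / 2] + [r] * (n - 1) + [r / 2], [c] * n + [0.0]
    raise ValueError(f"unknown model: {model}")

def elmore_to_far_end(resistors, caps):
    """Sum of each resistance times the capacitance downstream of it."""
    delay, downstream = 0.0, 0.0
    for r_k, c_k in zip(reversed(resistors), reversed(caps)):
        downstream += c_k
        delay += r_k * downstream
    return delay

R_WIRE, C_WIRE = 1000.0, 100.0   # ohm, fF (so delays are in fs)
for model in ("lumped", "pi", "T"):
    print(model, elmore_to_far_end(*ladder(R_WIRE, C_WIRE, 4, model)))
```

With these values the lumped model gives R·C, while both the π and T ladders give R·C/2, which is the Elmore delay of a uniformly distributed RC line. The lumped model is therefore pessimistic by a factor of two for long wires.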
Timing check of a Path in STA
FF1 = Launch flop, FF2 = Capture flop
Tskew = Tclk_launch - Tclk_capture
Thold , Tsetup = Hold , Setup of a flop
Tcq = Clock to Q delay of a flop
Setup Check (aka max delay check):
Tcycle >= Tcq + Tcomb + Tskew + Tsetup
Hold Check (aka min delay check):
Tcq + Tcomb >= Thold - Tskew
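A minimal sketch of both checks, written directly in terms of the clock arrival times at the launch and capture flops so that the sign of the skew is explicit. Setup compares data launched on one edge against the next capture edge; hold compares it against the same capture edge. The function names and all numbers (in ps) are hypothetical.

```python
def setup_slack(t_cycle, t_clk_launch, t_clk_capture, t_cq, t_comb, t_setup):
    """Max delay check: data must arrive t_setup before the NEXT capture edge."""
    arrival = t_clk_launch + t_cq + t_comb
    required = t_cycle + t_clk_capture - t_setup
    return required - arrival

def hold_slack(t_clk_launch, t_clk_capture, t_cq, t_comb, t_hold):
    """Min delay check: data must not change until t_hold after the SAME capture edge."""
    arrival = t_clk_launch + t_cq + t_comb
    required = t_clk_capture + t_hold
    return arrival - required

# Hypothetical path: the capture clock arrives 50 ps later than the launch clock.
print(setup_slack(t_cycle=1000, t_clk_launch=100, t_clk_capture=150,
                  t_cq=80, t_comb=600, t_setup=50))      # 320 -> setup met
print(hold_slack(t_clk_launch=100, t_clk_capture=150,
                 t_cq=80, t_comb=20, t_hold=30))         # 20  -> hold met

# Delaying the capture clock further helps setup but hurts hold.
print(hold_slack(t_clk_launch=100, t_clk_capture=200,
                 t_cq=80, t_comb=20, t_hold=30))         # -30 -> hold violation
```

Note that the clock period does not appear in the hold check, so a hold violation cannot be fixed by slowing the clock.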
Sections of a Timing Path
• Launch Side (aka release side)
Arrival Time = Delay from clock Origination to Source flop
+
Delay through Combinational logic up to Destination flop
• Capture Side
Required Time = Delay from clock Origination to Destination flop
+
Constraint of Sequential cell
Fig 1: Logic between two flops
Fig 2: Data can arrive in green window
Basics of STA flow
• Set up design environment, read and link design
• Search path, link path
• Read designs, libraries, then link
• Set operating conditions (i.e. PVT), wireload models
• Specify timing constraints
• Clock period, waveform, uncertainty and latency
• Input/output delays
• External drive and load for I/O
• Specify timing exceptions
• Multi cycle and false paths
• Min/Max delays, segmentation, disabled arcs
• Perform timing analysis, create reports
• Check timing constraints
• Generate timing and constraints reports
Pre and post layout timing
Slack and critical path
• Slack is the difference between the required time and the arrival time
• Negative slack → violation
• Positive slack → timing met
• Critical path is a path in the design that has the smallest slack
• Basically the path that defines the operating freq (speed) of the circuit
Fig 1: Positive Setup slack
Fig 2: Negative Setup slack
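A minimal sketch of the slack definition above: slack is required time minus arrival time, and the critical path is the one with the smallest slack. The function name `slack_report`, the path names, and the times (in ps) are hypothetical.

```python
def slack_report(paths):
    """paths: dict path_name -> (arrival_time_ps, required_time_ps) for setup checks.

    Returns (slack per path, name of the critical path, sorted list of violating paths).
    """
    slack = {name: required - arrival
             for name, (arrival, required) in paths.items()}
    critical = min(slack, key=slack.get)
    violators = sorted(name for name, s in slack.items() if s < 0)
    return slack, critical, violators

paths = {
    "FF1->FF2": (780, 1100),
    "FF2->FF3": (1150, 1100),
    "FF3->FF4": (400, 1100),
}
slack, critical, violators = slack_report(paths)
print(slack)       # {'FF1->FF2': 320, 'FF2->FF3': -50, 'FF3->FF4': 700}
print(critical)    # FF2->FF3
print(violators)   # ['FF2->FF3']
```

In this example the critical path misses timing by 50 ps, so the design would need either a 50 ps longer clock period or 50 ps removed from that path.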
Max & Min Analysis in STA
• STA tool calculates the worst possible slack for each timing path.
• Max analysis (aka setup check, Recovery check for Reset type signal):
Longest launch delay (aka late delay) - shortest capture delay (aka early delay)
• Min analysis (aka hold check, Removal check for Reset type signal):
Shortest launch delay (aka early delay) - Longest capture delay (aka late delay)
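A minimal sketch of how the early and late delays are paired in each analysis. Each delay is given as an (early, late) pair: the launch pair covers the clock path to the source flop plus the data path, and the capture pair covers the clock path to the destination flop. The function names and all values (in ps) are hypothetical.

```python
def max_analysis_slack(t_cycle, launch, capture, t_setup):
    """Setup check: latest launch-side arrival against earliest capture clock."""
    launch_late = launch[1]
    capture_early = capture[0]
    return (t_cycle + capture_early - t_setup) - launch_late

def min_analysis_slack(launch, capture, t_hold):
    """Hold check: earliest launch-side arrival against latest capture clock."""
    launch_early = launch[0]
    capture_late = capture[1]
    return launch_early - (capture_late + t_hold)

launch = (240, 810)    # (early, late) launch clock + Tcq + combinational delay
capture = (90, 110)    # (early, late) capture clock delay

print(max_analysis_slack(1000, launch, capture, t_setup=50))   # 230
print(min_analysis_slack(launch, capture, t_hold=30))          # 100
```

Pairing late with early (and early with late) gives the worst possible slack for each check, which is what the tool reports.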
Gate dominated versus metal dominated
paths
• Gate dominated paths
• The predominant elements in the path are logic gates
• Interconnects are short
• PVT has major impact on path delay
• Metal dominated paths
• The predominant elements in the path are interconnects
• Few gates in the path driving long distances
• PVT has less impact on path delay
Comprehensive timing checks
• Many types of timing and design rule checks
• Timing checks can be delay calculated or SDF annotated
• Setup and hold can depend on Slew(data), Slew(clock) and Cap(Q)
Perform timing analysis
• To identify problems with design or assertions before spending time
on detailed reports
• Use check_timing -verbose
• timing_check_defaults: list of timing checks performed by check_timing
• Unconstrained endpoints
• Combinational loops
• Missing clock definitions
• Multiple clock fan in
• Latch fan out problems
• Generated clocks consistency
• Ignored timing exceptions
• Ports with missing input delay
• Etc…
Generate reports
• Summarize timing results
• report_constraint <options>
• -all_violators -verbose may cause long run times
• Instead use -max_delay (for setup) or -min_delay (for hold) to focus on a specific issue
• Display worst violators
• report_timing <options>
• Large values for -nworst or -max_paths options may cause long run times
• Unconstrained paths are not listed by default
• timing_report_unconstrained_paths = "false" (set to true to display unconstrained paths)
• Report bottleneck cells contributing to multiple violations
• report_bottleneck <options>
Abstraction of STA for SoC build
Fig: Abstraction flow (labels: gate netlist & RC netlist, SDC, signoff methods)
Lab #2 (Get familiar with STA tool)
Fig 1: Design of accumulator (signals: a[31:0], b[31:0], a_bit[31:0], b_bit[31:0], Mux_b_bit[31:0], Sum[31:0], C_bit[31:0]; cells: three DFFX1 registers, a mux, and a 32 bit adder)
Lab #2 – Get familiar with PrimeTime
• Get familiar with PrimeTime, an STA tool
• Generate timing reports
• Try various commands and options
• Identify setup/hold violations
• Copy files from /home/jdshah/fall_2023_tc_labs/lab2/*
• Two ways to run PT
-> Type pt_shell to start pt_shell
Each command in accumulator.tcl can then be executed one by one in pt_shell
-> Type pt_shell -f <file_name> (e.g. accumulator.tcl)
This will execute all commands in accumulator.tcl
Thank You
Backup Slides
Parasitic Extraction of Device
REF: https://www.semiwiki.com/forum/content/3084-handel-jones-predicts-process-roadmap-slips.html
Intro to PrimeTime (PT)
• PrimeTime (PT) is a full chip gate level static timing analyzer
• Static timing sign off tool
• Fast and memory efficient
• Delay calculator
• Custom block modeling solution
• Advanced analysis functionality
Fig: Full-chip block diagram (labels: memory created by memory compilers, hand-crafted core, mega-cells, A/D, D/A, uP, uC, synthesized logic)
Sequential Check & Delay
FinFETs → Gate capacitance increases
→ Leakage improvement
→ FinFETs
→ ~40% faster
→ Less than ½ dynamic power
→ Cuts static leakage current by > 60%
Gate
Source Drain
Structure of Transistor
Structure of a planar NMOS
Structure of a FinFet NMOS
Width of Channel = 2 X Fin Height + Fin Width
Structure of a Multi FinFet NMOS