Timing Closure
Lecture 8
Jignesh Shah
UCSC Extension, Silicon Valley
Spring 2024
Agenda
• Logistics
• Quick review of lecture 7
• Time Borrowing & Useful Skew
• Clock Gating, Min Pulse Width & Min Period checks
• Path Based Analysis, Graph Based Analysis, DMSA
• IO delay constraint
• Engineering Change Order (aka ECO)
• Lab 7
Logistics
• A list of reference books on interview questions is posted in the Canvas
discussion forum.
Review of lecture 7
• Multiple Clock Domain
• Synchronizers
• FIFO
• Asynchronous Handshake
• Signal Integrity
• Crosstalk Delay
• Crosstalk Noise
• Crosstalk Fix
Sequential Cells
Flops | Latches
Data passes through at the positive or negative clock edge | Transparent during either the high or the low clock pulse
Slower compared to latches | Faster compared to flops
More power & area compared to latches | Less power & area compared to flops
Setup & Hold defined at the triggering edge | Setup & Hold defined at the closing edge of the transparent pulse
Design analysis is easy | Design analysis is complex
Susceptible to noise | Robust against glitch noise
[Figure: symbol of a positive edge triggered flop; Setup & Hold check at the rising clock edge of a flop]
[Figure: symbol of a positive level latch; Setup & Hold check at the closing edge of the transparent pulse, with the opening edge of the transparent pulse marked]
Time borrowing with latch (aka cycle stealing)
• A path ending at a latch can borrow time from the next path in the pipeline:
data may arrive during the transparent window because it is captured up to the closing edge.
Fig 1. Data at ULAT arrives after the opening edge
• The borrowed time is subtracted from the datapath time available to the next stage.
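Example (sketch): the borrowing arithmetic above, worked with numbers. It assumes a positive level latch that opens at the rising edge and closes half a cycle later, and it ignores latch setup time and clock skew. The function name and all values are illustrative, not taken from the lab design.

```python
def time_borrowed(arrival, open_edge, close_edge):
    """Time borrowed by data arriving at a latch D pin (all values in ns).

    Data arriving before the opening edge borrows nothing. Data arriving
    inside the transparent window borrows (arrival - open_edge). Data
    arriving after the closing edge cannot be captured.
    """
    if arrival <= open_edge:
        return 0.0
    if arrival > close_edge:
        raise ValueError("data arrives after the closing edge: setup violation")
    return arrival - open_edge


# Hypothetical numbers: 10 ns clock, latch transparent from 10 ns to 15 ns.
period = 10.0
open_edge, close_edge = 10.0, 15.0
arrival_at_latch = 11.5  # first stage is 1.5 ns late

borrowed = time_borrowed(arrival_at_latch, open_edge, close_edge)
# The next stage nominally has one full period; the borrowed time is subtracted.
next_stage_budget = period - borrowed

print(f"borrowed = {borrowed:.2f} ns, next stage budget = {next_stage_budget:.2f} ns")
```

The first stage gets 1.5 ns extra, and the stage after the latch has to close timing in 8.5 ns instead of 10 ns.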
PT report with time borrowing
Fig 1. Report of First Stage
Fig 2. Report of Second Stage
Lockup latch to avoid Hold violation during Scan
• A latch clocked on the opposite clock polarity is inserted where a scan chain
crosses clock domains, to avoid hold violations caused by large clock skew.
Useful skew
• Skew that is added intentionally to clock paths in order to meet timing is called
useful skew.
Fig 1. Setup violation at FF2
Fig 2. Setup can be fixed through useful skew by adding a clock buffer at FF2
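Example (sketch): how delaying the capture clock at FF2 turns a setup violation into positive slack. It uses the convention Tskew = Tclk_launch - Tclk_capture. All delays are hypothetical and given in ps.

```python
def setup_slack(t_cycle, t_cq, t_comb, t_setup, t_launch, t_capture):
    """Setup slack with Tskew = Tclk_launch - Tclk_capture."""
    t_skew = t_launch - t_capture
    return t_cycle - (t_cq + t_comb + t_skew + t_setup)


def hold_slack(t_cq, t_comb, t_hold, t_launch, t_capture):
    """Hold slack: data must arrive after the capture edge plus hold time."""
    return (t_launch + t_cq + t_comb) - (t_capture + t_hold)


# Hypothetical path FF1 -> FF2, balanced clock tree (both latencies 0 ps).
path = dict(t_cycle=1000, t_cq=100, t_comb=950, t_setup=50)
before = setup_slack(**path, t_launch=0, t_capture=0)

# Useful skew: add a 150 ps clock buffer in front of FF2/CK.
after = setup_slack(**path, t_launch=0, t_capture=150)
hold_after = hold_slack(t_cq=100, t_comb=950, t_hold=30, t_launch=0, t_capture=150)

print(f"setup slack before = {before} ps, after = {after} ps, hold slack after = {hold_after} ps")
```

The 150 ps given to this path is taken from whatever FF2 launches into, and it also eats into hold margin at FF2, so both have to be rechecked after the buffer is added.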
Clock divider
[Figure: clock divider circuit with waveforms of CLK, Div2_CLK and Div4_CLK]
• create_clock -name CLK -period 10 [get_ports CLK]
• create_generated_clock -name Div2_CLK -divide_by 2
-source [get_pins FF1/CK] -master_clock CLK [get_pins FF1/Q]
• create_generated_clock -name Div4_CLK -divide_by 2
-source [get_pins FF2/CK] -master_clock Div2_CLK [get_pins FF2/Q]
• The feedback path of the divider circuit (i.e. FF1/QN -> FF1/D) has to be
timed.
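Example (sketch): the periods and ideal edge times that result from chaining two divide-by-2 stages on a 10 ns master clock. Clock-to-Q delay of the divider flops is ignored and a 50% duty cycle is assumed. The helper name is illustrative.

```python
def divided_edges(source_period, divide_by, n_cycles):
    """Ideal (rise, fall) edge times of a clock divided from `source_period`."""
    period = source_period * divide_by
    return [(k * period, k * period + period / 2) for k in range(n_cycles)]


clk_period = 10.0
div2_period = clk_period * 2   # Div2_CLK: divide CLK by 2
div4_period = div2_period * 2  # Div4_CLK: divide Div2_CLK by 2 again

print("Div2_CLK period:", div2_period, "edges:", divided_edges(clk_period, 2, 2))
print("Div4_CLK period:", div4_period, "edges:", divided_edges(div2_period, 2, 2))
```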
Discrete Clock gating
• For discrete clock gating, a combinational cell such as an AND or OR gate is inserted in the
clock path, with the control signal and the clock as its inputs.
• The output of the clock gating cell is treated as a clock signal.
• A clock gating check ensures that the control signal remains stable during the active pulse.
• The user has to specify clock gating setup/hold requirements for every discrete clock gater:
• set_clock_gating_check -setup 0.25 -hold 0.1 [get_cells Gating_cell]
[Figure: gating signal and CK waveforms at a gater]
• The clock gating hold check is at the inactive edge.
• The clock gating setup check is at the active edge.
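Example (sketch): the setup and hold checks above for an AND-type gater with an active-high clock. The enable may toggle only while the clock is low, at least `hold` after the falling edge and at least `setup` before the next rising edge. The 0.25/0.1 values come from the constraint above. The 10 ns clock and the toggle times are hypothetical.

```python
def and_gater_slacks(enable_change, fall, next_rise, setup, hold):
    """Return (setup_slack, hold_slack) in ns for one enable transition."""
    setup_slack = (next_rise - setup) - enable_change  # checked at the active (rising) edge
    hold_slack = enable_change - (fall + hold)         # checked at the inactive (falling) edge
    return setup_slack, hold_slack


# Hypothetical 10 ns clock: high from 0 to 5 ns, next rising edge at 10 ns.
fall, next_rise = 5.0, 10.0
setup, hold = 0.25, 0.1

for t in (7.0, 9.9, 5.05):
    s, h = and_gater_slacks(t, fall, next_rise, setup, hold)
    status = "OK" if s >= 0 and h >= 0 else "VIOLATION"
    print(f"enable toggles at {t:5.2f} ns: setup slack {s:+.2f}, hold slack {h:+.2f} -> {status}")
```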
PT report for clock gating group
Fig 1. Setup check Report Fig 2. Hold check Report
Integrated Clock Gating (aka ICG) cells
• To prevent glitches on the clock signal, a latch-based clock gating style
can be used to hold the enable signal steady from the active edge to the inactive
edge of the clock.
• The STA tool can infer setup & hold checks from the timing model of the ICG, so the user
does not need to specify “set_clock_gating_check” for it.
• An ICG can take more area than a discrete clock gater.
• Design test can be enabled through the ICG by adding a test control signal.
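Example (sketch): a sampled-waveform model showing why the latch in an ICG removes glitches. The latch is assumed to be transparent while the clock is low, and the gated clock is the AND of the clock and the latched enable. The waveforms are made up: the enable rises and falls in the middle of a high phase.

```python
def and_gate(clk, en):
    """Discrete AND gater: the enable reaches the gated clock directly."""
    return [c & e for c, e in zip(clk, en)]


def icg(clk, en):
    """Latch-based ICG: enable is latched while clk is 0, then ANDed with clk."""
    latched = 0
    out = []
    for c, e in zip(clk, en):
        if c == 0:  # latch is transparent only during the low phase
            latched = e
        out.append(c & latched)
    return out


clk = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
en  = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # toggles while clk is high

print("AND gater:", and_gate(clk, en))  # chopped (partial) clock pulses
print("ICG      :", icg(clk, en))       # one full-width pulse
```

With the plain AND gate, the output carries two half-width pulses. With the latch, the enable change is held back until the clock is low, so the output is a single full pulse.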
ICG use in design
Logic without clock gating Logic with clock gating using ICG
Timing waveform at output of ICG
Clock Gating for STA purpose
• The STA tool infers a clock gating check at the inputs of a combinational cell if
- one of the inputs is a clock signal and another input is a data signal, and
- the output of the combinational cell fans out to clock pins of sequential cells.
• The Liberty model of an ICG cell has a timing arc between the enable and clock pins.
• The clock gating timing check ensures that the enable signal changes only during the inactive
phase of the clock.
Min pulse width check
• The minimum pulse width check ensures that the high & low pulses of a clock-like
signal are wide enough for the internal operation of a cell.
• The minimum pulse width limit is defined in the library model of a cell or can be
defined through a constraint.
• Min pulse width is defined for sequential cells, clock enable signals or hard IPs.
• Min pulse width violations occur due to imbalanced rise and fall times as
well as different variation on NMOS & PMOS.
• Min pulse width violations can be fixed by using balanced clock cells and
paired inverters in clock networks.
Fig. Imbalanced rise and fall delays of a buffer reduce the pulse width.
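Example (sketch): how a high pulse shrinks through a chain of buffers with unequal rise and fall delays, and why paired inverters preserve it. Every cell is assumed to have the same output rise delay `d_rise` and output fall delay `d_fall`. All numbers are hypothetical, in ps.

```python
def buffer_chain(rise, fall, n, d_rise, d_fall):
    """High pulse width after n identical buffers."""
    for _ in range(n):
        rise, fall = rise + d_rise, fall + d_fall
    return fall - rise


def inverter_pair_chain(rise, fall, n_pairs, d_rise, d_fall):
    """High pulse width after n_pairs of back-to-back identical inverters."""
    for _ in range(n_pairs):
        mid_fall = rise + d_fall  # 1st inverter: input rise -> output fall
        mid_rise = fall + d_rise  # 1st inverter: input fall -> output rise
        rise = mid_fall + d_rise  # 2nd inverter: input fall -> output rise
        fall = mid_rise + d_fall  # 2nd inverter: input rise -> output fall
    return fall - rise


# 500 ps high pulse, cells with 60 ps rise delay and 40 ps fall delay.
w_buf = buffer_chain(0, 500, n=5, d_rise=60, d_fall=40)
w_inv = inverter_pair_chain(0, 500, n_pairs=5, d_rise=60, d_fall=40)
min_pulse_width = 450  # hypothetical library limit

print(f"buffers: {w_buf} ps, inverter pairs: {w_inv} ps, limit: {min_pulse_width} ps")
```

Each buffer removes 20 ps from the high pulse, so five buffers fall below the 450 ps limit. In an inverter pair, each edge sees one rise delay and one fall delay, and the width is unchanged.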
Minimum period check
• The minimum period defines the maximum frequency at which a memory-type
IP can operate.
• The minimum period of an SRAM instance is equal to the sum of the minimum
pulse width high and the minimum pulse width low.
• The minimum period limit is defined in the timing model.
Fig 1. Example of Clock Min period report
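Example (sketch): the min period arithmetic for a memory instance. The pulse width limits and the clock period are hypothetical values in ps, not from a real SRAM model.

```python
def min_period_ps(mpw_high_ps, mpw_low_ps):
    """Minimum period as the sum of the high and low minimum pulse widths."""
    return mpw_high_ps + mpw_low_ps


limit = min_period_ps(400, 350)   # hypothetical limits from the timing model
f_max_mhz = 1e6 / limit           # maximum operating frequency in MHz
clock_period_ps = 700             # hypothetical clock driving the memory
slack = clock_period_ps - limit   # negative slack means a min period violation

print(f"min period = {limit} ps, f_max = {f_max_mhz:.0f} MHz, slack = {slack} ps")
```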
Multi VT & Channel length cells
• Multi-Vt & multi-channel-length cells are used to trade off speed against leakage.
• The cells are footprint compatible.
• Typically foundries offer a few flavors of multi-Vt stdcells:
  • High VT cells (HVT) – least leakage (because VT is high) but slowest
  • Standard VT cells (SVT) – these are nominal
  • Low VT cells (LVT) – more leakage (because VT is low) but faster than SVT
  • Ultra low VT cells (ULVT) – most leakage (VT is very low) but fastest
• Typically foundries offer a few flavors of channel length stdcells:
  • Nominal channel length
  • Longer channel length – slower & less leaky than cells of nominal channel length
• Typical design approach
  • Synthesize/PnR with SVT & nominal channel length cells
  • Run STA and determine slack
  • For paths with positive slack, swap SVT to HVT and nominal channel length to longer channel length using ECO
  • For critical paths with negative slack, swap SVT to LVT or to ULVT (limited use) using ECO
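Example (sketch): the swap rule from the design approach above as a tiny decision function. It looks at one cell and one slack value only. A real ECO flow has to consider every path through the cell. The `margin` parameter and the function name are assumptions made for illustration.

```python
VT_ORDER = ["HVT", "SVT", "LVT", "ULVT"]  # slowest/least leaky -> fastest/leakiest


def choose_vt_swap(current_vt, slack, margin):
    """Suggest the next VT flavor for a cell given its worst slack (ns)."""
    i = VT_ORDER.index(current_vt)
    if slack < 0 and i < len(VT_ORDER) - 1:
        return VT_ORDER[i + 1]   # critical: move one step faster
    if slack > margin and i > 0:
        return VT_ORDER[i - 1]   # comfortable slack: recover leakage
    return current_vt            # leave the cell alone


for slack in (-0.05, 0.05, 0.30):
    print(f"SVT cell, slack {slack:+.2f} ns -> {choose_vt_swap('SVT', slack, margin=0.10)}")
```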
ECOs (engineering change order)
• Functional ECOs change the functionality of the design
  • Usually the result of a design bug
  • Examples:
    • Adding flops
    • Changing an AND to an OR
  • Use spare gates
  • Functional ECOs originate from RTL
• Timing ECOs
  • Timing ECOs do not change functionality
  • The main objective is to meet setup/hold time
  • Size cells up/down, add/remove buffers
  • Swap VT cells
  • Swap channel length cells
  • Do not need to be back-annotated to RTL
  • fix_eco_* commands in PT
What if analysis
• “What if” analysis allows minor editing of the netlist
• Any “real” change needs to go back to the source netlist (or RTL)
• Commands for netlist/timing modification
  • size_cell – upsize or downsize a cell
  • insert_buffer/remove_buffer – add/remove a buffer in a data path
  • create_cell/remove_cell – add/remove a cell (not buffers)
  • create_net/remove_net – add/remove a net
  • swap_cell – swap an existing cell for another
• write_changes command
  • Writes a text-based description of the changes
  • Writes a PT command file to recreate the changes
Path based analysis
• By default the STA tool propagates the worst slew & arrival time through each multi-input gate, which
adds some pessimism (i.e. Graph Based Analysis).
• PBA reduces this pessimism by propagating slew and arrival time from the
actual input pin of each cell in a timing path.
• With the PBA option the STA tool re-calculates cell and net delays on user-selected paths
  • Uses path-specific arrival times and slews
  • Re-calculates the corresponding clock path
  • Re-calculates setup/hold constraint arcs
  • Uses the revised (victim) timing edge
• Run time & memory intensive
• Recommended at the end of the design process for a few critical violating paths
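Example (sketch): where the GBA pessimism comes from. A 2-input gate produces a different output slew for each input arc. GBA forwards the worst one to the next cell, while PBA uses the slew of the arc that is actually on the path. The linear delay model and all numbers (in ps) are hypothetical.

```python
def next_cell_delay(input_slew_ps):
    """Hypothetical delay model of the next cell: intrinsic delay + slew term."""
    return 100 + input_slew_ps // 2


# Output slew at pin Z of the 2-input gate, per input arc.
slew_at_z = {"A": 50, "B": 300}
path_pin = "A"  # the path being analyzed enters the gate through pin A

gba_delay = next_cell_delay(max(slew_at_z.values()))  # worst slew of any arc
pba_delay = next_cell_delay(slew_at_z[path_pin])      # slew of the real arc

print(f"GBA delay = {gba_delay} ps, PBA delay = {pba_delay} ps, "
      f"pessimism removed = {gba_delay - pba_delay} ps")
```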
Distributed Multi-Scenario Analysis (DMSA)
• The DMSA feature of PrimeTime can analyze multiple scenarios in parallel
in a single run and generate a unified result.
• Usually, ECOs are generated from DMSA timing sessions.
  • fix_eco_drc: used to fix transition violations
  • fix_eco_timing: used to fix setup/hold violations
  • fix_eco_power: used to optimize power
Constraints For input ports
• For each non-clock input port, specify the max & min arrival time using set_input_delay.
- set_input_delay -clock CLK1 -max <delay_value> [get_ports I1]
<delay_value> =~ Max (CLK1 latency up to F1 + time of C1 + time of C2)
- set_input_delay -clock CLK1 -min <delay_value> [get_ports I1]
<delay_value> =~ Min (CLK1 latency up to F1 + time of C1 + time of C2)
• Use a virtual clock if the reference clock of the interface is not known.
• For clock input ports, use the set_clock_latency command to specify the clock latency of
external logic.
• Use set_driving_cell or set_input_transition to specify the driver characteristics of a port.
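Example (sketch): computing the -max and -min values from the formula above and printing the resulting constraints. Each external component is given as a hypothetical (min, max) range in ps. The port and clock names are the ones used on this slide.

```python
def input_delay_ps(clk_latency_to_f1, c1, c2):
    """Return (min_delay, max_delay) in ps from (min, max) ranges."""
    lo = clk_latency_to_f1[0] + c1[0] + c2[0]
    hi = clk_latency_to_f1[1] + c1[1] + c2[1]
    return lo, hi


# Hypothetical external timing: (min, max) in ps.
lo, hi = input_delay_ps(clk_latency_to_f1=(200, 300), c1=(100, 150), c2=(250, 400))

print(f"set_input_delay -clock CLK1 -max {hi / 1000:.3f} [get_ports I1]")
print(f"set_input_delay -clock CLK1 -min {lo / 1000:.3f} [get_ports I1]")
```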
Input Path delay specification
Timing report of input interface
Hold check report
Setup check report
Constraints For output ports
• For non-clock output ports, use the set_output_delay command to specify the max & min time
the signal takes outside the block.
- set_output_delay -clock CLK1 -max <delay_value> [get_ports I1]
<delay_value> =~ Max (time of C2 + time of C3) – Min (latency of F2)
- set_output_delay -clock CLK1 -min <delay_value> [get_ports I1]
<delay_value> =~ Min (time of C2 + time of C3) – Max (latency of F2)
• Use a virtual clock if the reference clock of the interface is not known.
• Use set_load to specify the output capacitance of external logic.
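Example (sketch): the output delay formula above with numbers. Note the crossed terms: the max value pairs the longest external logic with the shortest capture clock latency, and the min value does the opposite. All ranges are hypothetical (min, max) values in ps.

```python
def output_delay_ps(c2, c3, latency_f2):
    """Return (min_delay, max_delay) in ps from (min, max) ranges."""
    max_delay = (c2[1] + c3[1]) - latency_f2[0]
    min_delay = (c2[0] + c3[0]) - latency_f2[1]
    return min_delay, max_delay


out_min, out_max = output_delay_ps(c2=(200, 350), c3=(100, 250), latency_f2=(150, 250))

print(f"-max value = {out_max / 1000:.3f} ns, -min value = {out_min / 1000:.3f} ns")
```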
Output Path delay specification
Timing report of output interface
Hold check report
Setup check report
IO delay with multiple clock
• Use the “-add_delay” option of set_(in|out)put_delay
• Push delays down from the top level to the block using a budgeting scheme.
Lab #7
• Copy directory “/home/jdshah/spring_2024_labs/lab7”
and follow the instructions in the lab_excercise.txt file
Thank You
Backup Slides
Corner Explosion
Operating modes: functional, scan shift, scan capture, bist, async
FE corners: SS, TT, FF
SSG SSGNP TT FFGNP FF
BE corners: C-worst, C-best, Typical, RC-worst, RC-best
Corner     ΔW        ΔT        ΔH
Typical    typical   typical   typical
C-best     min       min       max
C-worst    max       max       min
RC-best    max       max       max
RC-worst   min       min       min
Temp corners: cold, hot
Voltage: Vlow, Vnominal, Vhigh
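Example (sketch): counting the scenarios that result if every mode is crossed with every corner listed on this slide. In practice only a subset is signed off. The full cross product is shown only to illustrate the explosion.

```python
from itertools import product

modes = ["functional", "scan shift", "scan capture", "bist", "async"]
fe_corners = ["SS", "TT", "FF"]
be_corners = ["C-worst", "C-best", "Typical", "RC-worst", "RC-best"]
temps = ["cold", "hot"]
voltages = ["Vlow", "Vnominal", "Vhigh"]

# Every combination of mode, process, wire corner, temperature and voltage.
scenarios = list(product(modes, fe_corners, be_corners, temps, voltages))

print(len(scenarios), "scenarios, e.g.", scenarios[0])
```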
Worst case Corner
• Design for worst case
• Usually for setup, the worst delay corner is SS, low voltage, high and low
temp, with C-worst & RC-worst wire corners (depends on design and process).
• Usually for hold, the minimum delay corner is FF, high voltage, high and low
temp; for clock-skew-dominated paths it is SS, low voltage, high or low temp.
• Hold analysis needs to cover all wire corners (i.e. more pessimism is required).
• Robust, as it covers the process yield distribution
• Increases cost (larger die) and schedule (it is more difficult to fix setup/hold
violations across SS to FF)
[Figure: Design & Process Window bounded by the corners FF,Vhigh,hot; FF,Vhigh,cold; SS,Vlow,hot; SS,Vlow,cold]
CMOS
Many IPs in a typical design
Timing check of a Path in STA
FF1 = Launch flop, FF2 = Capture flop
Tskew = Tclk_launch - Tclk_capture
Thold, Tsetup = Hold, Setup time of a flop
Tcq = Clock to Q delay of a flop
Setup Check (aka max delay check):
Tcycle >= Tcq + Tcomb + Tskew + Tsetup
Hold Check (aka min delay check):
Tcq + Tcomb + Tskew >= Thold
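Example (sketch): the two checks evaluated from clock arrival times at the launch and capture flops, with Tskew = Tclk_launch - Tclk_capture. The hold check is written as data arrival versus data required time, so the sign of the skew term is explicit. All values are hypothetical, in ps.

```python
def min_cycle_time(t_cq, t_comb_max, t_setup, t_clk_launch, t_clk_capture):
    """Smallest Tcycle that satisfies the setup check."""
    t_skew = t_clk_launch - t_clk_capture
    return t_cq + t_comb_max + t_skew + t_setup


def hold_met(t_cq, t_comb_min, t_hold, t_clk_launch, t_clk_capture):
    """Hold check: new data must not arrive before capture edge + Thold."""
    data_arrival = t_clk_launch + t_cq + t_comb_min
    data_required = t_clk_capture + t_hold
    return data_arrival >= data_required


t_min = min_cycle_time(t_cq=100, t_comb_max=700, t_setup=50,
                       t_clk_launch=200, t_clk_capture=250)
ok = hold_met(t_cq=100, t_comb_min=20, t_hold=30,
              t_clk_launch=200, t_clk_capture=250)
# A later capture clock helps setup but can break hold on a short path.
ok_late = hold_met(t_cq=100, t_comb_min=20, t_hold=30,
                   t_clk_launch=200, t_clk_capture=350)

print(f"min Tcycle = {t_min} ps, hold met = {ok}, hold met with later capture clock = {ok_late}")
```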