STA Basic Concepts
(Manuscript)
Introduction
[p1] Title
Hi there, welcome to the ASIC Boot Camp! My name is Neil Jiang. As a group of engineers who have been working in the semiconductor industry for years, my team and I are here to show you what knowledge is truly important to know and which techniques are actually used in our day-to-day design work.
In particular, Static Timing Analysis is the focus of this course. STA is the foundation and the first priority of all backend design work. If you go job hunting for ASIC backend design, you will find that the STA engineer is absolutely the most visible role on a team. As a matter of fact, timing closure consumes more than 90% of the backend design cycle. Of the three major QoR metrics, we always say: “Area is important, Power is also important, but Timing is the King”.
During this basic-level course, you will learn all the common STA concepts systematically and comprehensively. Many of these concepts are not taught in school or anywhere else, even if you are a VLSI major. However, when you take your first job, the company often assumes you know these “basics”. That’s why we are here: to make sure you can understand most of the topics people talk about every day.
With this course, you can get a jump start on your career and quickly gain experience equivalent to what you would normally learn from 2 years of actual work. This is because we have sorted out all kinds of misleading and confusing concepts to make this knowledge system ready for you.
You will gain a good understanding and in-depth knowledge of STA, which allows you to get involved in real design problems and to debug constraint issues and fix timing violations more efficiently.
The target audience of this basic-level course is mainly new graduates and active job seekers. Entry- and intermediate-level engineers who want to consolidate their existing STA knowledge are also warmly welcome.
First, I will explain the most often used concepts in STA and help you establish the knowledge
system from scratch.
We will talk about background topics such as process variations, operating conditions, delay calculation of a timing arc, and how crosstalk & noise are calculated in an STA engine.
Once you understand these concepts, you will have already mastered the biggest chunk of the basics needed for the entire backend design.
Secondly, I will show you the most common SDC commands you should know, because constraint development is one of the most important aspects of timing a design. You can get valid results only after the design is correctly constrained. This helps you with constraint debugging later on. The topics will cover clock creation, tracing, debugging unconstrained endpoints, case analysis and multicycle paths.
Thirdly, a basic ability for any STA engineer is to read and understand an STA timing report really well. This means you will know the exact meaning of each part of the timing report, the meaning of each symbol, and the relationship between the numbers.
Lastly, as a cheat sheet to help you prepare for the most classic interview questions, I will show you some of the simplest and most effective timing closure approaches.
[p3] Agenda
Here is the agenda:
In this chapter we will discuss the definition of STA, why we need STA, and the limitations of STA. As the background of all analysis, I will introduce the concept of PVT corners and the different STA analysis modes used in practice.
STA issues are basically delay matching problems. In this chapter, I will show you how the backend tools calculate delays in a modern design for both standard cells and interconnects. We will learn the concepts of timing graphs, timing arcs, unateness, cell delay models, wire delay models and the RC tree delay calculation method.
In the end, the concepts of graph-based analysis and path-based analysis will also be introduced, which are used quite often in the real design world.
Next come the most famous checks: setup and hold. However, timing checks are not limited to setup/hold checks.
We will first talk about the detailed mechanism of why we need setup/hold checks. Then, I will show you the concept of the multicycle path, the multicycle version of the normal setup/hold check, which you are bound to see in real-life designs. Besides, I will extend the concept of setup/hold checks to another pair of essential checks in STA: the recovery check and the removal check.
In this chapter, we will talk about timing checks that cover more cases than the basic setup/hold checks. You will learn how max/min timing checks are performed when the datapath crosses multiple clock domains.
You will learn basic methods to deal with Clock Domain Crossing issues.
You will learn the concepts of latch timing, also known as the time borrowing technique.
Besides that, we will discuss the following high-frequency checks other than the setup/hold check, which are:
We will talk about two types of crosstalk effects, glitch and delta delay, and the techniques to keep noise problems from affecting your design.
This chapter is important because we can see how people reduce the pessimism inherent in their methodologies.
This helps you connect the real world with theoretical analysis.
I will first introduce the concepts of Statistical Static Timing Analysis (SSTA), and then the practical methods of modeling the statistical nature of the process in STA analysis.
Lastly, I will introduce other commonly adopted design methodologies: Multi-Corner Multi-Mode and also Merge Mode analysis.
We also go over 25 special topics, each right after a certain knowledge point, as complementary material to extend the concepts. These topics are designed to be easily understood and very practical.
For some difficult concepts, we introduce a set of rules to help you get to the point quickly. This aims to clarify common conceptual confusions. You can test and rely on these rules in your design work.
So basically, every timing problem is essentially checking a path delay against some threshold, or limit, to make sure the chip can operate at a certain speed. So here is the question: given a circuit consisting of logic gates and the wires between them, how do you calculate its delay? A traditional way is to flatten the design to the transistor level and run a SPICE simulation on it.
There is no question that SPICE simulation is the most accurate way to find out how much the delay is. However, with millions or even billions of transistors coexisting on the same ASIC chip nowadays, SPICE simulation is simply too slow to compute such a large amount of data. Or we can say the time cost of such a task is unacceptable for production.
Therefore, we truly need another way to find reasonably accurate delay results but with a much, much faster turnaround time. That is where STA comes into play. Let’s see how STA addresses these two issues.
The first task for STA is to maintain accuracy. We all know that a circuit consists of cells and wires. For cells, as long as STA can get accurate delays, it can maintain accuracy. This can be done by characterizing the cell delays with SPICE into a library ahead of the actual STA analysis. Once the library data is ready, STA can load in the cell delay values and get accurate results. For wire delay, STA can either estimate the delay before the actual layout is done, or extract parasitics and then calculate the delay after the layout is done.
The second task for STA is to be fast. To achieve this, we have to reduce the number of nodes involved in a timing problem. Instead of analyzing at the transistor level, STA first converts the circuit into a timing graph in which each gate is the lowest-level delay element. Also, the way to judge whether the circuit can operate successfully is not simulation-based but purely mathematical. This dramatically reduces the amount of time needed to handle a large design.
For a given timing path between a startpoint and an endpoint, the logic is never going to change once the chip is fabricated. So it is the same bunch of cells being exercised again and again every clock cycle. What STA does is find this pattern and collapse the entire timeline into one clock cycle, and just analyze that one cycle. STA does this for all the timing paths on the chip at the same time. It then uses a purely mathematical way to check path delays against timing requirements. This is why we call it Static.
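To make this concrete, here is a minimal sketch (illustrative numbers only, not any vendor's algorithm) of the purely mathematical check STA performs on a single-cycle setup path:

```python
# Toy sketch of the core STA setup check. All numbers are hypothetical,
# in nanoseconds; real tools handle many more effects (OCV, CRPR, etc.).

def setup_slack(launch_clock_arrival, data_path_delay,
                capture_clock_arrival, clock_period, setup_time):
    """Slack = required time - arrival time for a single-cycle setup check."""
    arrival = launch_clock_arrival + data_path_delay
    required = capture_clock_arrival + clock_period - setup_time
    return required - arrival

# A path meets setup timing when its slack is non-negative.
slack = setup_slack(0.2, 1.5, 0.2, 2.0, 0.1)
print(f"slack = {slack:.2f} ns, timing {'met' if slack >= 0 else 'VIOLATED'}")
```

No simulation and no input vectors are involved: the check is one subtraction per path, which is exactly why it scales to full-chip designs.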
The second feature of STA is that, unlike SPICE simulation, it does not depend on input stimulus. Being a mathematical check, it traverses all possible logic connections between any startpoint and endpoint, which in the end literally covers all possible scenarios that may occur in a design. This feature sounds good, but in some cases it impacts timing closure negatively. For example, STA will flag paths that are never used in the real application, which could potentially cause over-design if people fail to filter those paths out of the analysis. This is what we call “exhaustive”.
The third feature of STA is that it does NOT care about functionality. Yes, you heard that right: how can such a powerful tool not care about functionality? What if the design meets timing but has a functional failure? Well, STA is only responsible for making sure the path delay is within the target so that the value from the launch point can be captured without error at the capture side. It is not responsible for whether the logic is implemented correctly. So in order to really be ready to ship your design to the customer, you have to run another type of check called functional equivalence checking, where we assume the design meets timing unconditionally and then use logic operation algorithms to check whether its functionality has been correctly implemented.
The last feature I will mention about STA is that it can be very conservative and pessimistic... How can a tool be pessimistic? Well, STA assumes everything is at its worst when analyzing the chip. For example, it uses the slowest delay a cell can have for the max timing check, and it checks the most restrictive pair of launch and capture clock edges. It has to do so because we need a design that is robust under all conditions, taking manufacturing variation and application needs into account. So it has to have a lot of design margin, or redundancy, built into the design.
[p8] Rule of STA
As I said earlier, timing is the king of the backend design. 90% of the design cycle is to achieve
timing closure. This can also be explained by looking at the role of STA in the backend design.
If a certain function needs too many logic levels on the critical path, it is not possible to finish the operation within one clock cycle at a given technology node. The RTL coding must be tweaked for easier timing closure so that the functionality can be physically implemented.
Logic synthesis needs to optimize to reduce the logic levels and choose structures fast enough to operate at the target clock frequency;
Place and Route must do a lot of work to make sure the design can meet timing while being routable and manufacturable;
Even power integrity issues like IR drop need to take the clock frequency, timing windows and simultaneous switching into consideration.
This is why the STA engineer usually holds a core position in a backend team.
The most obvious thing is that since STA is our “safeguard” in this ASIC design game, it has to be very conservative. It won’t allow you to take any chance of failing timing and causing a functional failure. That’s why it considers so many delay variations and analysis modes, only to make you meet timing even in the most restrictive conditions. Some of these restrictive conditions can be purely mathematical combinations or permutations which very rarely exist in reality, so if we rely on STA, we have to spend effort closing timing on unrealistic cases.
Another point, as mentioned earlier in this chapter, is that STA may flag paths that are not functionally meaningful to an application. We call such timing paths “false paths”. Usually the architecture folks or the RTL team will be the ones to tell whether a path is truly meaningful in a design. So this is one of the items that should be communicated between the RTL team and the backend team. On the backend design side, we use SDC constraints to tell the STA tool whether the paths are real or not. We will talk more about SDC in upcoming lectures.
The last point is that STA is not suitable for asynchronous design. In STA, in order to collapse the design timeline into a minimal number of cycles, the launch clock edge and capture clock edge must have a deterministic relationship so the pattern can repeat itself.
If the timing path under analysis is between two asynchronous clocks, then the required time can be some arbitrary number, and we can’t rely on a static process to analyze it. That’s why all large-scale designs are synchronous designs. However, this doesn’t mean there are absolutely no asynchronous paths in a design; under some very predictable conditions, we have special constraints to check timing for asynchronous paths as well.
[p10] Operating Condition
So let’s learn the concept of the operating condition. Any STA analysis really doesn’t make any sense without stating under what operating condition the analysis is done. The chip works in the real physical world, where the cell delay and wire delay values are largely impacted by the surroundings. A cell can be much faster on certain wafers than on others. A wire could be faster when the temperature is high. Not everything is created equal, so we have to take all of this into consideration. Thus, the operating condition is the first thing we need to be aware of.
Normally, the operating conditions are also referred to as PVT corners. PVT stands for Process, Voltage and Temperature, which are the most dominant factors that impact cell/wire delay.
Process variation can come from many sources, such as oxide thickness variation, threshold voltage variation, and channel length variation.
The impact of process variation runs throughout the entire STA analysis. It impacts the analysis from every aspect even though it does not directly factor into the mathematical equations of the timing checks.
There are two types of process variation, namely global process variation and on-chip variation.
Global variation is also called inter-die variation, which refers to process effects that impact all devices on the same die but differ from die to die.
On-chip process variation is also called local variation or intra-die variation, which refers to process variations that can affect devices differently even on the same die, so two devices of the same type placed next to each other could behave differently.
For timing verification, which normally tends to be conservative and pessimistic, we choose the nominal case as well as extreme corner cases to guard-band our design. So for the impact of global process variation, we end up having three corners: fast, typical and slow.
Around the center line of each corner, on-chip variation kicks in, but with a much smaller standard deviation.
For transistor formation, where each gate consists of PMOS and NMOS devices, we have 4 extreme combinations in total: slowest NMOS + slowest PMOS, slowest NMOS + fastest PMOS, fastest NMOS + slowest PMOS, and fastest NMOS + fastest PMOS.
For metallization, we also have process variation due to manufacturing parameters such as the thicknesses of the metal and the dielectric, and the metal etch, which affects the width and spacing of the metal traces in the various metal layers.
The extreme process corners are defined as the +/- 3 sigma corners from the nominal of the process variation distribution, which models only the global process variation.
The cell delay can be roughly described by this equation, where μ (mu) is the electron mobility and Vth is the threshold voltage.
As temperature increases, both the mobility and Vth decrease. Generally, the mobility has the greater impact on the overall cell delay, so the cell becomes slower when temperature increases. But if the supply voltage is very low in some application, there is a range where the Vth decrease outweighs the mobility decrease, and the cell can actually get slower as the temperature goes down.
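As a toy illustration of this temperature inversion effect, we can plug assumed numbers into a simplified delay relation of the form delay ∝ Vdd / (μ(T) · (Vdd − Vth(T))^α). Every coefficient below is made up purely for illustration; real device models are far more detailed:

```python
# Toy temperature-inversion model. All coefficients are hypothetical.
# delay ~ Vdd / (mu(T) * (Vdd - Vth(T))**alpha), with alpha assumed 1.5.

def cell_delay(vdd, temp_c, alpha=1.5):
    mu  = 1.0 - 0.002 * (temp_c - 25)   # mobility drops as temperature rises
    vth = 0.40 - 0.001 * (temp_c - 25)  # threshold voltage also drops
    return vdd / (mu * (vdd - vth) ** alpha)

# At a high supply voltage the mobility term dominates: hotter => slower.
print(cell_delay(1.2, 125) > cell_delay(1.2, -40))   # True
# Near-threshold the (Vdd - Vth) term dominates: colder => slower (inversion).
print(cell_delay(0.6, -40) > cell_delay(0.6, 125))   # True
```

This is why low-voltage designs often need an extra cold-temperature corner in sign-off.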
Normally, among all PVT corners, worst-case slow and best-case fast are the two that need the most attention, since they are the 3 sigma extreme corners and guard-band the whole design.
The slow process models correspond to the +3 sigma corner condition for the inter-die variations. The fast process models correspond to the -3 sigma corner condition for the inter-die variations.
Voltage variation has a very simple relation: the higher the voltage, the faster the cell.
To simplify multi-voltage analysis, the STA tool supports voltage and temperature scaling. This scaling allows us to analyze timing at a voltage condition that was not characterized.
The STA tool uses interpolation to derive the appropriate timing, noise and power values to be used for the current voltage condition.
One-dimensional scaling only requires two sets of libraries, such that only one characterization parameter (voltage rail or temperature) varies and the remaining ones are held constant across all libraries.
Two-dimensional scaling needs at least three libraries to accommodate two parameters varying at the same time. Normally, to achieve good accuracy, with 2 libraries per dimension we need 4 libraries in total to support the two-dimensional scaling.
Three-dimensional scaling is not used very often because it involves a lot of library characterization work. It allows three independent parameters to vary at the same time at the cost of many more library sample points.
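A minimal sketch of how one-dimensional voltage scaling could work, assuming simple linear interpolation between two characterized libraries (the voltages and delays below are hypothetical; real tools may use more sophisticated fitting):

```python
# One-dimensional voltage scaling sketch: two libraries characterized at
# 0.72 V and 0.88 V (hypothetical), queried at an uncharacterized 0.80 V.

def interpolate_delay(v, v_lo, d_lo, v_hi, d_hi):
    """Linearly interpolate a delay between two characterized voltage points."""
    t = (v - v_lo) / (v_hi - v_lo)
    return d_lo + t * (d_hi - d_lo)

# Delay shrinks as voltage rises; 0.80 V lands midway between the corners,
# so the interpolated delay lands midway between the two library values.
print(interpolate_delay(0.80, 0.72, 120.0, 0.88, 80.0))  # ~100.0 ps
```

Two-dimensional scaling applies the same idea along two axes (e.g. voltage and temperature), which is why it needs at least one extra library per added dimension.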
First of all is the netlist, which represents our design. STA only works on designs that are already mapped to a gate-level netlist. In the pre-layout design phase, the Verilog comes from the logic synthesis tool or the Place and Route tool before the design is detail-routed. In the post-layout design phase, the Verilog is generated by the place and route tool after the design is detail-routed.
Then we need timing constraints to bring the STA analysis to life. The constraints are in the form of SDC, which stands for the Synopsys Design Constraints format. For multi-mode multi-corner analysis, we need design constraints for each individual scenario.
Then, after the design is routed, we need to back-annotate the wire resistance, capacitance, and inductance values into the design. This is called parasitic extraction. The data is usually generated by a layout extraction tool in a distributed RLC network format called the Standard Parasitic Exchange Format, or simply SPEF.
Once we have the parasitic values, the STA engine can begin timing the design by calculating the delay through each timing arc. It will use SPICE-simulated delay values stored in the timing model library. The timing model is in the Synopsys Liberty format, often simply called dotlib.
Lastly, in case we want STA to capture on-chip variation effects, we need to provide a table with derating factors to be used in the calculation. The most often used OCV method nowadays is the AOCV table.
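As a rough sketch of the AOCV idea (the table values below are hypothetical, and real tools also interpolate and consider distance), a depth-based derate table relaxes the derating factor as the path gets deeper, because random variation averages out along a long path:

```python
# Sketch of an advanced OCV (AOCV) depth-based derate lookup.
# Table values are made up for illustration only.
import bisect

DEPTHS  = [1, 2, 4, 8, 16]                 # path depth (number of stages)
DERATES = [1.12, 1.09, 1.06, 1.04, 1.02]   # late derate factors (assumed)

def aocv_derate(depth):
    """Return the derate for a given depth (nearest lower table entry)."""
    i = min(bisect.bisect_right(DEPTHS, depth) - 1, len(DEPTHS) - 1)
    return DERATES[max(i, 0)]

def derated_delay(nominal_delay, depth):
    return nominal_delay * aocv_derate(depth)

print(derated_delay(1.0, 1))   # shallow path: full 1.12 derate applies
print(derated_delay(1.0, 10))  # deep path: derate relaxes to 1.04
```

Compared with a flat OCV derate, this removes pessimism on long paths without giving up margin on short ones.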
A slightly more complicated case is where one cell drives multiple downstream cells. In such a case, the output of a cell fans out to multiple places. Multiple fanout is definitely OK and actually pretty common, but since it increases the load on the net, it can slow down signal propagation.
However, if two different sources drive the same input pin of a cell, it is a violation. If this happens, the two sources fight each other, so the value on the net is no longer deterministic. This situation must be eliminated from the design.
So the takeaway here is: a net can have only one driver but multiple loads.
A cell can have multiple drivers and also multiple loads.
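This one-driver-per-net rule is easy to check mechanically. Here is a toy sketch over a hypothetical mini netlist:

```python
# Sketch of checking the one-driver-per-net rule on a hypothetical netlist.
# Each connection is (net, pin, direction); direction "out" marks a driver.
from collections import defaultdict

connections = [
    ("n1", "u1/Y", "out"), ("n1", "u2/A", "in"), ("n1", "u3/A", "in"),   # legal fanout
    ("n2", "u4/Y", "out"), ("n2", "u5/Y", "out"), ("n2", "u6/A", "in"),  # contention!
]

drivers = defaultdict(list)
for net, pin, direction in connections:
    if direction == "out":
        drivers[net].append(pin)

for net, pins in sorted(drivers.items()):
    if len(pins) > 1:
        print(f"VIOLATION: net {net} has multiple drivers: {pins}")
```

Net n1 with one driver and two loads passes; net n2 with two drivers is flagged.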
Well, hierarchical objects refer to logic division boundaries; such an object could be a pin/port on a virtual design wrapper. Leaf objects refer to physical cell boundaries, usually meaning a pin/port on a library cell. The leaf is the lowest hierarchy in a design.
The immediate fanout means the leaf cells driven directly by the particular cell. They are the first cells encountered when you trace the netlist upstream or downstream.
The fan-in cone and fanout cone of a cell are not limited to the first encounter, though; they refer to all the cells that can be traced backward and forward from where the cell is located.
[p21] Topic 4: Basic query commands
get_cells / get_pins / get_nets are the most basic and frequently used commands to manipulate design objects. By default, these commands only return objects in the hierarchy of the current design scope. For example, when I just do get_cells *_reg, it returns all the cells whose names end with _reg in the current hierarchy. We can tell from the sizeof_collection command that we have 18 such cells.
The option -hierarchical can be used to broaden the search range of the command all the way from the current hierarchy down to the leaf-cell level. So with the -hier option, the same command returns 20 design objects. Compared to the previous result, icache_0/miss_outstanding_reg and if_stage_0/read_for_valid_reg now appear on the list.
We can then narrow down the search results if needed by applying the -filter switch. As we did here, once we use *outstanding* as a keyword to refine the results, the command returns only icache_0/miss_outstanding_reg, since no other cell names match our criterion.
If you know the full hierarchical name of a design object, you can directly use the get_cells/get_nets command on it.
The get_pins command can be used to set scope on a particular pin, or to find all available pins on a certain cell if -of_objects is used. Here, we have found all the pins of the cell miss_outstanding_reg, including set/reset, data input pin D, clock pin CLK, scan-related pins (scan in, scan enable), and output pins Q and QN. Since the cell doesn’t have a dedicated scan-out pin, most likely the Q pin is reused as the scan-out pin during scan mode.
The get_nets command can also be used to find all available nets hooked up to a particular pin. Here we find the net connected to the clock pin of miss_outstanding_reg is named icache_0/clock.
One useful switch along with get_nets is -segments; it returns all the hierarchical names referring to the same net. I mean the same net could have different names in different design scopes, so icache_0/clock is called net181 in the wb_stage_0 hierarchy. If the -top_net_of_hierarchical_group switch is used along with -segments, the command returns the net name in the top design; in our case, it’s simply clock.
[p22] (cont’d)
Once we get a design object, sometimes we would like to know some attributes on it. Depending on what the design object is, its attributes can be different. We can use the list_attributes command to list all the available attributes on the object. -class can be used to specify what kind of object it is, and -application means only listing the attributes defined by the tool. Customized attributes can also be defined by the user.
An alternative to show all the active attributes on an object is report_attribute. For example, miss_outstanding_reg has several attributes listed by this command. We can use get_attribute to specifically query one particular attribute, such as is_hierarchical in this case. It returning false means the cell under query is not a hierarchical cell, but a leaf cell.
[p23] (cont’d)
report_cell is another commonly used command to report attributes, especially for cell objects. Personally, I like to use the -connections and -verbose switches along with it to show the connectivity information on each pin of the cell.
If we apply all_connected -leaf on the driving net n323, it returns the driver pin u5322/Y and four sinks: U610/A1, u541/A1, u360/A and u361/A. The same holds true if we apply the command on a pin object: the command returns net n323 as the only net connected to pin u5322/Y.
[p25] (cont’d)
Finding the fanout cone is also easy. Let’s look at another cell on the timing path, icache_0/u65. We can use the all_fanout -flat -from command on the output pin Y to list all the cells that make up its fanout cone. In this example, the results returned are the pin Y itself and the input pin D of miss_outstanding_reg. The -level switch controls how many layers of fanout cells it returns. A commonly used switch is -endpoints_only, which makes the command return only the endpoints among all fanout cells. A minor tweak with the -only_cells switch returns the cell objects instead of pin objects.
[p26] (cont’d)
Very similarly, the all_fanin -flat -to command can be used to trace back the fan-in cone of a particular cell or pin. It also comes with a -level switch, but what makes it different is that since the results are fan-in, we need to use -startpoints_only to find the startpoints.
Another useful procedure for debugging script issues is proc_body. It displays the content of a procedure. If you are not sure where some procedure comes from and what it is doing, you can use this command to find the source code of the procedure.
As a side note, we have learned some basic terminology and commands in the Synopsys design tools. We got the idea of what hierarchical cells/pins are, the fan-in and fan-out logic cones, query commands like get_cells/get_pins, and tracing commands like all_connected and all_fanout.
STA converts the actual circuit into a map of nodes and calculates the delay between any two nodes. As the graph represents the entire design, all possible timing paths are contained within the graph.
The graph gets its name from its representation of the design as a node graph. The ports and pins
in the design become the nodes in the graph, and the timing arcs become the connections
between the nodes.
Different cells have different types of timing arcs. For a combinational cell, such as an inverter, buffer, AND, NAND, or OR gate, the timing arc is simply the connection between each input and its output.
For sequential cells, it is more complicated: there are clock-to-data arcs, clock-to-reset arcs, and clock-to-output arcs. The clock-to-data arc is actually where the setup and hold checks come from; the clock-to-output arc is the propagation path of the sequential cell.
Positive unate timing arcs are arcs where the input and output signals have the same transition direction. A rising transition on an input causes the output to rise or remain unchanged; a falling transition on an input causes the output to fall or remain unchanged. For example, if the B input of this AND gate is 1, the gate becomes transparent to the A input, so the output transition follows the A pin. If the B input of this AND gate is 0, the gate output is stuck at zero, so any change on the A pin won’t cause a change on the output.
Conversely, negative unate timing arcs are the ones where the output transition has the opposite direction from the input waveform. A rising transition on an input causes the output to fall or remain unchanged; a falling transition on an input causes the output to rise or remain unchanged.
Non-unate arcs are the ones whose output transition cannot be determined by a single input alone. The output transition also depends on the state of the other inputs. Take the XOR gate: if the B input is zero, then a rising transition on the A pin causes the output to rise as well; if the B input is one, then a rising transition on A causes the output to fall.
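For illustration, the unateness of a simple gate arc can be classified by brute-force enumeration of the other inputs (real tools read this from the library's timing_sense attribute; this is just a sketch):

```python
# Sketch: classify the unateness of an input->output arc of a small gate
# by enumerating all states of the other inputs.
from itertools import product

def unateness(func, arc_input=0, n_inputs=2):
    """Return 'positive', 'negative', or 'non-unate' for the given input arc."""
    causes_rise = causes_fall = False
    for others in product([0, 1], repeat=n_inputs - 1):
        def apply(bit):
            vals = list(others)
            vals.insert(arc_input, bit)   # place the arc input among the others
            return func(*vals)
        lo, hi = apply(0), apply(1)       # output before/after the input rises
        if lo == 0 and hi == 1:
            causes_rise = True
        if lo == 1 and hi == 0:
            causes_fall = True
    if causes_rise and not causes_fall:
        return "positive"
    if causes_fall and not causes_rise:
        return "negative"
    return "non-unate"

print(unateness(lambda a, b: a & b))        # positive  (AND)
print(unateness(lambda a, b: 1 - (a & b)))  # negative  (NAND)
print(unateness(lambda a, b: a ^ b))        # non-unate (XOR)
```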
What a timing path describes is that whenever data switches at the startpoint, the signal propagates through the cloud of logic gates, performing some logic operation, and then reaches the endpoint.
A valid path means the path represents a real functional transaction. The signal has to go through the cloud of logic and be captured at the endpoint in order for the design to function well. This type of path is the one we care about in STA.
A leakage path means that even though the path from the startpoint to the endpoint exists topologically (since STA does an exhaustive check on all possible path trace combinations), it is not required functionally for the chip to operate. This type of path can sometimes occur, and we should ignore it.
A don’t-care path is more of a constraint issue. For example, if a timing path is reported by the tool but the startpoint holds a static value, say some pre-programmed configuration register, then the startpoint won’t toggle during normal functional operation. There is no point in checking the timing path from this startpoint. We can set a false path from this startpoint, but dealing with this kind of path requires related design knowledge.
The actual path taken depends upon the state of the other inputs along the logic path.
Max path (longest path / late path) = the path with the largest delay between two endpoints. -> This type of path is usually used for the setup check.
Min path (shortest path / early path) = the path with the smallest delay between two endpoints. -> This type of path is usually used for the hold check.
For the same startpoint/endpoint pair, the max path and min path can be two completely different timing paths consisting of different sets of gates.
The max delay and min delay for a given cell refer to delays through the same timing arc of that cell. The delay difference is due to different stimuli applied in the analysis to model process variation and the worst slew merging scenario.
We know that due to the capacitive loading from the parasitics on the wire, which must be charged or discharged in order to raise or lower the voltage level on a net, the signal takes a finite time to transition.
That is where the rise and fall transition times, or slew rates, come from.
Transition time is traditionally measured as the time required to go from 10% to 90% of the transition (rise/fall).
Slew is the inverse of transition time: the larger the transition time, the slower the slew.
However, the slew thresholds are chosen to correspond to the linear portion of the transition waveform. So for relatively new technology nodes, the slew thresholds are usually chosen as 30% to 70%.
Propagation delay is the main component of gate delay. It stands for the time needed to propagate through the logic cell itself.
If the transition time were ideal, namely zero, the propagation delay would simply be the delay between the two transition edges.
However, since we have finite transition times, the propagation delay is defined as the delay between the 50% point of the input waveform and the 50% point of the output waveform.
The propagation delay of a cell is a function of the input transition at each input and the output load capacitance.
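Putting the two definitions together, here is a sketch of measuring slew (30%/70% thresholds, as used at newer nodes) and propagation delay (50% to 50%) from sampled waveforms. The piecewise-linear edges and all numbers are made up for illustration:

```python
# Sketch: measure slew and propagation delay from sampled waveforms.
# Waveforms are hypothetical piecewise-linear edges; times in ns, volts in V.

def crossing_time(times, volts, threshold):
    """Linearly interpolate the first time the waveform crosses threshold."""
    points = list(zip(times, volts))
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if (v0 - threshold) * (v1 - threshold) <= 0 and v0 != v1:
            return t0 + (threshold - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("threshold never crossed")

vdd = 1.0
t_in,  v_in  = [0.0, 1.0], [0.0, vdd]            # input: linear rise over 1 ns
t_out, v_out = [0.0, 0.5, 2.5], [0.0, 0.0, vdd]  # output: starts rising at 0.5 ns

# Slew measured between the 30% and 70% crossings of the output edge.
slew = crossing_time(t_out, v_out, 0.7 * vdd) - crossing_time(t_out, v_out, 0.3 * vdd)
# Propagation delay measured between the 50% points of input and output.
delay = crossing_time(t_out, v_out, 0.5 * vdd) - crossing_time(t_in, v_in, 0.5 * vdd)
print(round(slew, 3), round(delay, 3))
```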
Given an input slew or waveform at the driver input, the goal of delay calculation is to compute
the response at the driver output and at the input of the receiver pins.
The computed responses are then used to determine the cell delay for the driver and the input
transition time at the load pins.
The purpose of cell delay model is to provide a mathematic model for the STA tool to compute
propagation delay through the driving cell.
Remember that all these models used in STA try to correlate with real physics as closely as possible, but they do not aim to simulate the circuit, which is what SPICE simulation does.
So all these delay models are a trade-off between runtime and accuracy.
[p38] NLDM
The Non-Linear Delay Model (NLDM) is a voltage-based delay calculation model which is widely used to represent the response characteristics of the cells in the libraries. It is very simple and less time-consuming for the tools to obtain the response of the cells. This model uses two-dimensional tables to represent the cell delay, output slew and other timing checks. In this modeling method, the driver cell is modeled as a voltage source with a resistance in series (Thévenin model). The receiver is modeled as a load capacitor.
Notice that the NLDM table is characterized under the condition that the output wire resistance is zero, since we have no idea what the load will be at library creation time.
• Performing table lookup and interpolation in a cell delay table provided in the library (most
common)
For a given library, if cell delay tables are provided for a timing arc, then the propagation delay
tables must NOT be provided. The converse also holds: if the propagation tables are specified, then
the cell delay tables must NOT be provided. So the tool can only choose one of the methods for
delay calculation.
Here is an example of a timing arc and timing sense inside a normal library file. The pin section
identifies the pin being characterized; here the pin name is Y and it is an output pin.
It also tells you the associated power and ground, the direction, and the logic function generated
on this pin. This section also contains design constraints such as the max capacitance and max
transition specs, which are supposed to guide the physical design tool during optimization.
The timing section shows the related pin for this particular arc, A1, which means the following
table describes the timing behavior of the arc from A1 to Y. As we can see, for an AND gate this is a
positive unate arc, because the output Y rises along with A1 if A2 is one, or stays low if A2
is zero. Y will never fall when A1 is rising.
Note that two tables show up here: cell_rise and rise_transition.
cell_rise is the direct cell delay lookup table for a rising output signal. It uses input transition
and output load as indices to interpolate the cell delay through the cell.
The rise_transition table describes the output transition time based on input transition and output
load. It can be used as the input transition of the next stage after applicable slew degradation. If
propagation tables are provided in the library, the second method of calculating cell delay can be
used as well.
Say we are given an input transition of 0.09 and an output load of 0.67. The four nearest pre-
characterized points in the cell delay lookup table are highlighted in the rectangle. In a 3-
dimensional graph, the cell delay values of these four points form a surface, which we can model
with a simple equation: Z = A + B*X + C*Y + D*X*Y.
To interpolate the corresponding cell delay from these 4 points, the first step is to solve for the
hidden coefficients A, B, C and D. This is easily done by substituting the data points into the
equation. Once the coefficients are known, the cell delay can be calculated for the given
input transition / output load combination.
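The interpolation step described above can be sketched in Python. All the corner coordinates and delay values below are made-up numbers, not taken from any real library:

```python
def bilinear_coeffs(x1, x2, y1, y2, z11, z12, z21, z22):
    """Solve Z = A + B*X + C*Y + D*X*Y through four table corners.
    zij is the table value at (xi, yj); X is the input transition
    index and Y is the output load index."""
    d = (z11 - z12 - z21 + z22) / ((x2 - x1) * (y2 - y1))
    b = (z21 - z11) / (x2 - x1) - d * y1
    c = (z12 - z11) / (y2 - y1) - d * x1
    a = z11 - b * x1 - c * y1 - d * x1 * y1
    return a, b, c, d

def nldm_lookup(x, y, x1, x2, y1, y2, z11, z12, z21, z22):
    a, b, c, d = bilinear_coeffs(x1, x2, y1, y2, z11, z12, z21, z22)
    return a + b * x + c * y + d * x * y

# Hypothetical corner points around slew = 0.09, load = 0.67
delay = nldm_lookup(0.09, 0.67,
                    0.08, 0.10,    # the two nearest slew index points
                    0.60, 0.70,    # the two nearest load index points
                    0.110, 0.120,  # delays at (0.08, 0.60) and (0.08, 0.70)
                    0.125, 0.135)  # delays at (0.10, 0.60) and (0.10, 0.70)
print(delay)  # lands between the four corner values
```

At a table corner the lookup returns exactly the characterized value, and inside the rectangle the interpolated delay always stays between the four corner delays.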
In deep submicron technologies, the impedance of the net is comparable to the driver resistance.
Thus the resistive impact of the wire can no longer simply be ignored.
The cell output waveform with an RC load is very different from the waveform with a single
capacitive load.
Let's look at the example of an inverter: as the input waveform rises, the output waveform falls.
The parasitics of the wire and load pin can be modelled as a distributed RC network. This RC
network can be further reduced to an RC pi model consisting of a total capacitance of Cnear + Cfar,
but with Rwire in between. Thus, the near-end capacitance is charged more quickly than the far-end
capacitance because of the resistive wire.
The output waveform for the actual load crosses the 50% transition bar at a much earlier
timestamp than predicted when the total capacitance of the wire is used for delay calculation.
Recall that the NLDM table is characterized under the condition that the output wire resistance
is zero, so it cannot be used directly.
We have to modify the model in order to make the table useful again.
The idea of effective capacitance is to obtain an equivalent output capacitance Ceff which produces
the same delay through the driver cell as the original design with the actual RC load.
However, the effective capacitance only ensures that the delay to the 50% transition bar matches
the actual RC load; it does not provide a matching output waveform with the actual load. That
means the accuracy of the transition time obtained from this method is not guaranteed.
We can see that, because of the interconnect resistance, the capacitance seen from the driver
output is actually smaller than the total cap on the wire. This effect is called resistive shielding.
If the interconnect resistance is negligible, the effective capacitance is nearly equal to the total
capacitance;
if the interconnect resistance is very large, the effective capacitance is almost equal to the near-
end capacitance (the extreme case behaves like an open circuit).
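To make the two limiting cases concrete, here is a deliberately crude toy model in Python. The shielding factor used here is invented for illustration only; real tools compute Ceff iteratively by matching the driver current or delay:

```python
def toy_ceff(c_near, c_far, r_wire, r_drive):
    """Toy effective-capacitance sketch: the far-end cap is 'shielded'
    by the wire resistance relative to the driver resistance. This is
    only meant to reproduce the two limiting cases above, not any real
    tool's Ceff algorithm."""
    shielding = r_drive / (r_drive + r_wire)
    return c_near + c_far * shielding

print(toy_ceff(1.0, 2.0, 0.0, 5.0))  # no wire resistance: 3.0 (total cap)
print(toy_ceff(1.0, 2.0, 1e9, 5.0))  # huge wire resistance: ~1.0 (near-end only)
```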
Resistive shielding is a very important concept in advanced technology nodes. This phenomenon
is dominant for long wires and makes the wire delay calculation pessimistic.
Because of this, when we want to improve the RC delay of a wire, it is better to reduce the
resistance at the near end of the driver output, which both reduces the inaccuracy of the delay
model and improves the overall delay.
Then the signal goes through an RC network, which widens the transition time even further; this
is called slew degradation. When it reaches the input of the next stage, in this case a NAND
gate, the degraded waveform is used as the input transition of the NAND gate. Since the NAND gate
also has a negative unate arc, we use the cell_rise table and rise_transition table for the cell
delay and transition time calculation.
The composite current source model, or simply CCS model, consists of two parts.
These characterization experiments are repeated for a table of different input transition and
output load combinations.
The current through Cout is saved at every circuit simulation time step and then reduced to a
much smaller set of current and time (i, t) points.
When using this driver model on an actual circuit, the first step is to calculate an effective
capacitance from the reduced-order RC network. Then we can apply the output current table.
When we apply these currents to their respective capacitances, we can reconstruct the voltage
waveforms. If we are presented with an output capacitance that we did not pre-characterize, we
can interpolate between the currents to predict the resulting waveform. Similarly, if we are
presented with an input slew that was not used for pre-characterization, we can also interpolate.
Due to the interconnect RC and the non-linear capacitance of the input devices of the load, the
receiver capacitance value varies at different points of the transitioning waveform.
In the CCS model, this capacitance is modeled differently for the initial (or leading) portion of
the waveform versus the trailing portion. For each input slew and output load combination, the
model provides two different values, C1 and C2, to be used in delay calculation. This two-
capacitance approach enables a dynamic calculation that closely matches circuit simulation for
load inputs that have non-linear capacitance.
So far we know the CCS model is current versus time, namely in the I-T domain. But we can also
characterize the transition process using current versus voltage, namely in the I-V domain. Rather
than using time-stamped current and voltage curves, compact CCS models the current-versus-voltage
curve. The benefit of characterizing the transition in the I-V domain is that the transition curve
is much smoother than in the I-T or V-T domain, so we can reconstruct the curve much more easily
and reduce the storage space.
Notice that the I-V switching curves are usually convex and have no inflection point in the
middle, a feature that facilitates compact modeling. A given I-V curve can be split into two
halves, and each half is matched with a pre-characterized "base curve". To exactly match the I-V
curve, only 6 parameters are needed:
Compared with the traditional CCS method, which may need 20 to 30 sample data sets to describe
the same transition process, the compact CCS method consumes much less space to store the
data. The compact model uses indirectly shared base curves to model the shape of the switching
curves. By allowing each base curve to model multiple switching curves with similar shapes, the
modeling efficiency is improved and the library size is compressed.
What comes along with the base_curves group is the compact_lut_template. The template is a
lookup table describing the current-voltage waveform, with input net transition as the first
index and total output net capacitance as the second index. The waveform attribute is the third
index of the data and consists of the six essential parameters stated above, namely: initial
current, peak current, peak voltage, peak time, left base curve ID and right base curve ID.
So there are often several case descriptions for the same timing arc, one for each input
combination plus a default case. The rules for conditional timing arcs are as follows:
1. We use a when statement to specify the condition. If the condition expression evaluates to true,
the timing values in that case are active. At the same time, the default case is disabled.
2. If a state in the condition cannot be determined, causing the entire condition to be in an
undetermined state (X), the condition is still evaluated as true.
3. The timing engine picks the worst active timing value for this particular timing arc.
4. To disable a particular conditional timing, we must force its condition to a known false state.
1. There are three cases in parallel as shown on the left: when B is true, not true, or don't care.
2. If no case analysis is set for input B, then all three cases are active and taken into account
by the timing engine. It picks the worst case to use in the calculation; usually the worst case
is the default case.
3. If we set a case analysis to hold B constant at one value, then only one of the two conditional
cases above is picked and used during calculation. The default case is also disabled.
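The rules above can be condensed into a small Python sketch (a simplified model of the rules, not any tool's exact algorithm):

```python
def pick_arc_delay(cond_cases, default_delay):
    """cond_cases: list of (cond, delay), where cond is True, False, or
    None for an undetermined (X) condition. Per the rules above: an X
    condition still counts as active; the default case is disabled only
    when some condition is definitely true; the worst (max) active delay
    wins. A simplified sketch, not a tool's exact algorithm."""
    active = [delay for cond, delay in cond_cases if cond is not False]
    if not any(cond is True for cond, _ in cond_cases):
        active.append(default_delay)  # no condition is definitely true
    return max(active)

# No case analysis on B: both conditional cases (X) and the default compete
print(pick_arc_delay([(None, 0.30), (None, 0.25)], 0.40))  # -> 0.4
# Case analysis forces B = 1: only the matching case stays active
print(pick_arc_delay([(True, 0.30), (False, 0.25)], 0.40))  # -> 0.3
```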
Thus, the interconnect parasitics can be represented by an RC network. The parasitic values can
come from pre-layout estimation or post-layout extraction.
Pre-layout phase
Estimation of interconnect delay happens before the design has actually been routed. The
implementation tool has to estimate wire delays in order to come up with solutions for logic
optimization and placement.
Post-layout phase
After the design is fully routed, we know the exact topology and length of each route. Extraction
is then performed to obtain the wire parasitic values, which form a distributed RC network for
later delay calculation.
[p52] RC Tree
First of all, an important concept: all backend tools rely on an RC tree topology.
An RC tree is a reduced-order model the tool uses to calculate the interconnect delay through a
net.
RC values per unit length of the wire are obtained from the library. Extraction data from
already-routed designs is used to build a fanout-to-length table called the wireload model.
It represents formulas which relate interconnect delay to geometrical parameters such as wire
length, width and metal thickness.
All nets with the same fanout get the same estimated interconnect delay during front-end design,
which naturally does not coincide with reality.
To further correlate with the physical implementation, the industry also tries to bring some
place-and-route features into delay estimation.
Even though in the STA tool all we need are the parasitic values, it is worth knowing that some
implementation and optimization tools can estimate the wire based on an initial placer that takes
in physical constraints like the floorplan and technology file, performs some global routing, and
estimates the wire based on that. This way, the RC network topology is based on the actual
physical topology and the RC values are derived from the provided technology file.
For any fanout number not explicitly listed in the table, the wire length is obtained by linear
extrapolation with the specified slope.
Note that nets with the same fanout will get the same wire delay estimate, no matter how they are
eventually routed. Giving all nets with the same fanout the same estimated interconnect delay is
obviously not an accurate way to calculate wire delay.
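The fanout-to-length lookup plus extrapolation can be sketched in Python; all the table entries and the slope below are hypothetical numbers:

```python
def wlm_length(fanout, table, slope):
    """Estimate wire length from a fanout-to-length wireload table.
    table maps fanout -> length; a fanout not listed in the table is
    linearly extrapolated from the closest listed fanout using the
    model's slope. All numbers here are made up for illustration."""
    if fanout in table:
        return table[fanout]
    nearest = min(table, key=lambda f: abs(f - fanout))
    return table[nearest] + slope * (fanout - nearest)

wlm = {1: 10.0, 2: 16.0, 3: 21.0, 4: 25.0}  # hypothetical fanout -> length
print(wlm_length(2, wlm, slope=4.0))  # listed entry: 16.0
print(wlm_length(6, wlm, slope=4.0))  # extrapolated from fanout 4: 33.0
# The wire R and C would then be length times the per-unit values
# from the library.
```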
[p55] WLM in library
Here is an example of the fanout-length relation defined in a library. On the left-hand side, two
tables depict the fanout-length slope ratio; the unit RC as well as the length lookup table with
respect to fanout are listed out. On the right-hand side, since different wireload models may be
applied to different cells based on their area, the wire_load_selection group tells the tool which
wireload table to pick for a given design area.
First is the best-case tree, where the destination pin is physically adjacent to the driver. None
of the wire resistance is in the path to the destination pin.
The resistance contribution in this wireload model is set to 0, so the wireload contribution is
purely capacitive. The total delay value can be obtained by directly reading the library NLDM
table.
The second one is the balanced tree, where each destination pin is on a separate branch of the
interconnect wire. Each path to a destination sees an equal portion of the total wire RC.
The last one is the worst-case tree. All destination pins are clustered together at the far end of
the wire, so each destination pin sees the total RC of the wire.
In STA, we have to set up which topology to use according to our application needs. But remember
that all of these are just early estimation methods for delay calculation.
After the design gets into the post-layout phase, where the actual routing is done, we usually use
a layout extraction tool to generate more accurate design parasitic values in the SPEF format.
SPEF contains distributed RLC interconnect values and the corresponding (x, y) coordinates of
nodes.
For example, on the LHS is a snapshot of a sample SPEF file. It describes a net topology along
with the RC values between all nodes, as the figure on the RHS shows.
Even though the three "special net" categories above are not uncommon, they do have side effects
that cause partial annotation issues in timing analysis.
Floating metal pieces in the signal routing can make it difficult for the extraction tool to
calculate the RC, since there is an extra piece of geometry belonging to the same logical net. Say
that during the ECO phase a layout engineer reroutes a particular signal by creating a new route,
but for some reason forgets to remove all of the original route and leaves behind a small piece of
metal. In this situation, even a small piece of extra metal can cause a partial parasitic
extraction issue. We need to be careful to remove all dangling, unused route shapes.
Dangling ports and pins in the RTL can confuse the tool as well. Without knowing what the driver
and load are, timing analysis cannot be done correctly on those nets, so ideally they should be
eliminated from the RTL or optimized away by a logic synthesis tool like Design Compiler.
Constant nets are not that concerning since they are quite static, but we still need to verify
that they are expected. Usually we should not have too many constant nets in the design, since
most of the time a constant means redundancy, so the logic synthesis tool works hard to optimize
constant logic out of the design.
- The tool could assume unrealistic delay values and create false violations.
- Timing results cannot be trusted on problematic nets having annotation issues.
- Other nets might be impacted as well, so the accuracy of the whole timing analysis degrades.
Thus, we should try to eliminate as many annotation issues as possible to make the STA analysis
reliable.
Since all of the RC delay calculation methods are based on an RC tree, let's first review what an
RC tree is:
Thus, both WLMs and SPEF RC networks meet the RC tree requirements.
Once the parasitic values are obtained and the RC tree topology is determined, we can calculate
the wire delay based on the interconnect models.
Mathematically, it uses the first moment of the RC tree transfer function to estimate the RC delay.
It defines a shared path resistance between any two nodes in an RC tree, which is the sum of the
resistances on the common portion of the paths from the source to both nodes.
The Elmore delay to a node is the sum, over every node in the RC tree, of that node's capacitance
multiplied by its shared path resistance with the destination node. For example, the Elmore delay
to node 2 on the LHS is calculated as:
the cap on node 1 multiplied by the shared path resistance of nodes 1 and 2, which is R1; plus
the cap on node 2 multiplied by the shared path resistance of node 2 with itself, which is
(R1 + R2); plus
the cap on node 3 multiplied by the shared path resistance of nodes 3 and 2, which is (R1 + R2);
plus
the cap on node 4 multiplied by the shared path resistance of nodes 4 and 2, which is R1; plus
the cap on node 5 multiplied by the shared path resistance of nodes 5 and 2, which is (R1 + R2).
Elmore delay calculation is the simplest and most widely used method throughout the entire design
phase.
The Elmore delay model only works for RC trees. It provides an accurate result (close to the true
delay) for nodes that are far from the driving point, but it can be inaccurate by orders of
magnitude for nodes near the driving point. This inaccuracy in the Elmore delay calculation is
primarily due to the resistive shielding mentioned previously.
The main application of the Elmore delay calculation is in pre-route databases which don't have
extracted parasitics yet. It is a reasonable method when analysis time is a concern and people
want a fast turnaround.
A more accurate but more complicated approach is a higher-order algorithm (AWE / Arnoldi).
Although the Elmore delay inaccuracy can be improved by corrective factors (e.g. effective
capacitance), there are more accurate methods that use higher-order moments of the RC circuit
transfer function. However, these methods are all significantly more expensive to compute than
the Elmore delay, and that makes them difficult to utilize within ASIC design tools.
One of the more popular and accurate methods used in current physical design and timing
analysis environments for estimating wire delay is Asymptotic Waveform Evaluation (AWE), which
will not be discussed here.
As we can see on the left side, for the fall-transition cell delay, the cell u5070 has an input
transition of 0.041 and an output capacitance of 0.800, so the reported delay is calculated from
the fall delay table in the library. The report_delay_calculation command shows the relevant
portion of the lookup table it picks from the library and how it calculates the coefficients
mentioned earlier. From there the cell fall delay is calculated. We then multiply it by the
derating factor to create some extra margin, since this run uses a wireload model and is therefore
optimistic. The calculated delay of 0.063 multiplied by the derating factor of 1.35 matches the
0.085 in the timing report.
The output slew of a driver depends on the input slew at the driver and the load capacitance seen
from the driver's output.
The slew rate at a load input pin depends on both the output slew of the driver and the slew
degradation along the path due to the resistive nature of the wire.
If this is a multiple-fanout net, meaning one driver drives more than one load, each load pin can
have a different slew rate at its input.
It is worth noting that even in the zero-interconnect mode, the driver resistance and load pin
capacitance still exist and are taken into account in the delay calculation.
[p64] Slew Merge
Delay calculation is performed as the signal edge propagates forward across the logic path.
If we were computing the timing of a chain of buffers, we would derive the input slew of each
stage from the output slew of the previous stage, performing delay calculation and storing the
results on the timing graph as we propagate along.
However, when two slews arrive at the same point on the graph, static timing analysis tools like
PrimeTime must choose one of these slews to propagate forward so they can continue delay
calculation for the downstream logic. One common case is the output pin of a multi-fanin cell
where multiple timing arcs converge.
These points where a slew must be chosen are called slew merge points.
To ensure the min/max graph values always bound the fastest and slowest possible timing, the
worst slew must be chosen and propagated forward.
We can see that there is inherent pessimism in a graph representation of a design's timing. For
the real physical device, a timing arc can have multiple timing behaviors depending on the
upstream logic that sources its transition. However, a graph representation allows each timing arc
to have only a single min/max rise/fall timing behavior: if we tried to store a value for every
possible upstream path, we could have thousands or millions of values stored per arc, resulting in
impossible memory and runtime requirements. That is why, as stated above, the worst slew must be
chosen and propagated forward.
Vice versa, when calculating the delay of a min-delay arc, the tool picks:
the minimum annotated lumped capacitive wire load from the min SPEF;
the minimum pin capacitance or receiver model from the min timing library;
minimum slew propagation at slew merge points when calculating cell delay.
The simplest method is to use a single delay condition across the entire chip: every timing arc is
evaluated as a max-delay arc, and both setup and hold paths use the computed max-delay arcs. That
is, we use the same library, characterized at one operating condition, for both max and min timing
checks. The cell delay is deterministic and has only one possibility. However, this method does
not reflect the process variation which happens in reality, so its accuracy is very poor.
That is where the best-case / worst-case (bc_wc) mode comes in. When we set STA to this mode, it
reads in two sets of extreme delay values. The two sets of delay values can represent two PVT
(process/voltage/temperature) conditions which cannot physically coexist at the same time; the two
corners in bc_wc mode represent two completely independent PVT corners. Setup paths use the
longest path through the max-delay arcs for launch, and the shortest path through the max-delay
arcs for capture. Hold paths use the shortest path through the min-delay arcs for launch, and the
longest path through the min-delay arcs for capture. In other words, the bc_wc analysis mode only
checks setup at the max corner, and hold at the min corner. It is important to remember that setup
paths are not checked at the min corner, and hold paths are not checked at the max corner. This
could miss timing violations due to differences in how the launch and capture paths track the PVT
difference between the corners.
The single and bc_wc analysis modes both have a serious accuracy limitation: either the fast
launch/capture paths are computed by using max delay arcs (both single and bc_wc), or the slow
capture path is computed by using the min delay arcs (bc_wc). These modes were suitable for
designs in older technologies where slew sensitivity and slew variation were minimal. These
modes can, however, result in optimism when used on modern small-geometry designs.
We know that multiple different paths can reach one common node in a timing graph. They could go
through different types of logic gates and different numbers of gates, and experience different
process variation and crosstalk effects, so certainly their arrival times differ. The timing
window is simply the window between the earliest possible switching time of a node and the latest
possible switching time.
The calculation of timing windows is very important in graph-based analysis, and it affects the
noise and crosstalk calculation of a design too.
For example, among the 3 paths in the LHS figure, path 1 has the largest path delay and arrives
later than the other 2 paths. Path 3 goes through the fewest gates and has the shortest path
delay. So the timing window for the signal on the input net of the D pin is bounded by the path 3
delay and the path 1 delay.
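In code, a timing window is nothing more than the min/max over the per-path arrival times (the numbers below are hypothetical):

```python
def timing_window(arrival_times):
    """The timing window at a node spans from the earliest to the latest
    possible switching time over all paths reaching that node."""
    return min(arrival_times), max(arrival_times)

# Hypothetical arrivals: path 1 (longest), path 2, path 3 (shortest)
early, late = timing_window([1.8, 1.2, 0.7])
print(early, late)  # window is [0.7, 1.8]
```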
GBA mode is short for timing graph-based analysis. In this mode, a timing arc can have only one
single, most conservative set of rise and fall transition times, and the timing window is
propagated along the path.
The principle is to guard-band the entire path using worst-case values.
Let's assume the upper path is a long path before the slew merge point, but it has a
large-drive-strength inverter driving the NAND gate input, so it has a steeper transition.
On the other hand, the lower path is a short path, but its transition is slower.
In GBA mode, we can have only one slew value propagated through this NAND gate. So according to
the GBA principle, the slower one is picked for the max timing path.
This choice makes the upper path more pessimistic after the slew merge point. The signal arrival
time falls within the range of a timing window no matter which path it comes from.
In PBA mode, by contrast, the faster transition can be propagated through the NAND gate, so the
upper path sees its real transition even after the slew merge point.
The signal arrival time is calculated specifically for this path, and it has only a single
switching edge, so we can calculate noise more accurately from it.
Normally, PBA-mode timing QoR is more accurate for the above reasons and should always look the
same or better than GBA mode.
But since the tool has to isolate each path and store a unique slew rate and arrival time for each
timing node, it requires much more computing power to process this high volume of data.
The normal strategy is to use GBA for timing closure while the design still has a lot of failing
paths or timing violations. Then, after all major timing problems have been solved and only a few
tail paths are left, we can enable PBA mode for an accurate signoff check.
More information than that is needed when we try to analyze a path or do any kind of timing fix,
so we usually append some additional switches to the original command. As in the example shown on
the right-hand side, the timing report now shows the input nets as well as the fanout of each net,
the total capacitance on each wire, and the transition time of each signal rise and fall. Based on
this information, we can decide where there is potential to improve the timing results and fix
slow timing paths.
[p72] (cont’d)
The most common switches are listed here.
-net not only shows the nets between pin nodes, but also shows the number of fanouts of each net.
-input shows the input pin through which the path goes. It is useful when tracing a report through
multi-input cells. It also splits the delay associated with a cell into net delay and cell delay.
-tran shows both the input and output transition times used for, or calculated by, the delay
calculation.
-cap shows the total capacitance appearing on each net, including both the wire capacitance and
the input pin capacitance of the next stage.
If the timing constraints have been changed on the fly during debugging, the STA engine needs to
recalculate the timing graph. Even though report_timing re-times the design implicitly and
incrementally, it is always good practice to run an explicit update_timing before report_timing.
Last but not least, a few easily confused concepts have been clarified, such as slew merging,
min/max arc delays, analysis modes and timing windows. The GBA and PBA mode concepts have also
been mentioned. Overall, this is a big chunk of design knowledge that needs to be understood for
later study.
Chapter 3 Constraint Development
The trigger can happen at the positive or negative edge or both edges of the control signal.
Such a control signal which acts as a trigger for a synchronous design is called a clock and the edge
on which the design triggers is called the active edge of the clock.
The clock diagram is the first thing the STA engineer should get from the clock architecture
designer.
The clock scheme of a design largely depends on the functionality it wants to realize.
But normally all clock structures have something in common. The picture shown here is a very
generic clock diagram.
First, let's examine what kinds of elements are in this clock diagram.
You will see mainly three parts before the clock actually reaches the clock pin of the flip-flop:
clock generation, clock selection and clock gating.
For clock generation, the on-chip system clock is usually generated by an analog block called a
phase-locked loop. The PLL has a feedback loop which can raise the frequency of a low-frequency
reference clock source up to the real operating speed of the chip. But depending on the
application, the on-chip clock may not be the only clock source used. That's why we have clock
selection logic.
System clocks can either come from outside the chip as an external source, or be generated by the
PLL. Besides functional clocks, we can have test clocks targeted at debugging the chip. The clock
selection logic selects the proper clock to propagate to downstream logic for a given
functionality.
As power saving becomes more and more of a concern in ASIC design, the clock gating technique is
widely used. Clock gating allows the designer to disable the toggling of a clock when it is not
used by downstream logic, which saves a lot of power. Clock gates can be architecturally designed
and coded in the RTL Verilog, or inferred by the physical implementation tool.
We will explain more in a few slides.
The startpoint of a timing path can be an input port or the clock pin of a synchronous flop or
memory. Similarly, the endpoint of a timing path can be an output port or the data input pin of a
synchronous device.
Generally speaking, the paths of a design can be divided into external paths and internal paths.
External paths are paths talking to logic outside the design. For example, the launch clock of a
path may come from another partition while the path ends in the current partition.
1) Paths from an input data port to an output data port, which we call feedthrough paths.
2) Paths from an input data port to the data input of a flop/memory.
3) Paths from the clock pin of a flop to an output data port.
4) Paths from the clock pin of a flop to the data input of a flop/memory.
In most mainstream STA tools such as PrimeTime, the tool creates an internal path group for each
clock domain according to the endpoint clock. That means if a path goes from the clk_A domain to
the clk_B domain, it belongs to the clk_B path group.
The default path group includes all non-clocked paths such as asynchronous set/reset paths.
The default report_timing command dumps out the worst timing path of each of these path groups.
The first group is for timing paths from an internal register to an output port. It can be created
by simply specifying the -to option with all_outputs. all_outputs is a built-in command that
returns all the output ports of the current design.
Next, we find all the clock input ports by using all_fanout -clock_tree -levels 0, which returns
all the clock source nodes, including ports and pins. The returned objects are then filtered by
the get_ports command to keep only ports. Then we exclude these clock inputs from all input ports,
which gives us all the input data ports. By specifying paths from all input data ports, we create
the input-to-register path group.
Then we create another path group from all inputs to all outputs, which covers all the feedthrough
paths.
Anything left is a timing path from an internal register to an internal register.
On top of these 4 groups, users can create their own dedicated path groups targeting specific
timing paths. You can put more weight on any path group that needs more attention from the
optimization engine.
The industry standard is to use Synopsys Design Constraints, or simply SDC.
SDC is a Tcl-based text format with commands created by Synopsys for timing constraints. By the way, this is one of the reasons why job postings require experience and knowledge of the Tcl language.
The most essential commands to constrain a design cover clock creation, input/output delay constraints, environment setup, and timing exceptions.
By using these commands, the STA engineer guides the STA tool to look at the real design issues and not false paths. A good timing constraint set is valuable for timely design closure and critical for functional success.
Remember that we have divided all the timing paths into external paths and internal paths. After clock creation, ideally all the functional flop-to-flop paths will be clocked, so the internal timing paths can already be analyzed.
The way to create a clock source in STA is by describing its waveform. The -waveform option specifies the waveform within one clock period, which then repeats itself.
The first argument specifies the time at which the rising edge occurs; the second argument specifies the time at which the falling edge occurs. All the edges must be monotonically increasing and within one period. The edge times alternate starting from the first rising edge after time zero, then the falling edge, then the rising edge again, and so on. There must be an even number of edges specified.
When the -waveform option is not specified, the clock is assumed to have a 50% duty cycle. If there is more than one clock on the same clock source, we must use the -name and -add switches to make them coexist. Otherwise, the one defined later will override the one defined earlier.
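As a sketch, the waveform and -add semantics look like this in SDC; the port and clock names are made up for illustration:

```tcl
# 10 ns clock, rising edge at 0 ns, falling edge at 5 ns
# (identical to the default 50% duty cycle)
create_clock -name clk_func -period 10 -waveform {0 5} [get_ports clk_in]

# 10 ns clock with a 30% duty cycle: rise at 0, fall at 3
create_clock -name clk_skewed -period 10 -waveform {0 3} [get_ports clk_aux]

# A second clock on the same source port; without -add this would
# override clk_func instead of coexisting with it
create_clock -name clk_test -period 40 -add [get_ports clk_in]
```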
Defining clocks correctly is extremely important to the STA analysis. If even a single clock is specified incorrectly, the impact could be felt by millions of paths within the design. It may cause the block to fail timing. Even if the block meets timing, it may give a false sense of timing closure.
A missing clock constraint would also mean that a huge number of paths in the design may not be timed.
Since clock specifications impact the maximum number of paths, even a single incorrect or missing specification could be highly detrimental to the design.
In most cases, synchronous clocks originate from the same clock source.
When a new clock is generated in a design based on a master clock, which means it has a phase relationship with the master clock, it can be defined as a generated clock. This definition is needed because STA does not know that the clock period has changed at the output of a clock divider or multiplier, or what the new period should be.
Typical scenarios for a generated clock are when the signal comes out of clock divider logic, a clock multiplier, or clock gating logic. To distinguish the divided or gated version of the clock, a new generated clock needs to be created.
A source object can have more than one clock. If the master clock source pin has more than one clock in its fan-in cone, then the generated clock must indicate the master clock from which it is derived. This is specified using the -master_clock option, which takes the name of the SDC clock that has been defined to drive the master clock source pin.
Once a generated clock has been defined, clock attributes such as the waveform or period are derived by the tool based on the characteristics of the waveform at the source.
To describe the waveform relation between the master clock and the generated clock, the most commonly used switches are the following:
1. -edges
This is a list of integers that correspond to the edges of the source clock from which the generated clock is obtained. The edges indicate alternating rising and falling edges of the generated clock. The list must contain an odd number of integers, with at least 3 to represent one full cycle of the generated clock. The edge count starts at "1", which represents the first rising edge of the source clock.
2. -divide_by
This represents a generated clock whose frequency has been divided by the specified factor, which means the period is multiplied by the same factor.
3. -multiply_by
This represents a generated clock whose frequency has been multiplied by a factor, which means the period is divided by the same factor.
It should be noted that although clocks are defined using a period attribute, -multiply_by and -divide_by are specified with frequency in mind. When a generated clock defined using the -divide_by or -multiply_by options needs to be inverted, the -invert option can be specified to make the generated clock start with a falling transition instead of a rising transition.
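The three switches above can be sketched as follows; the instance and pin names are invented for illustration:

```tcl
# Divide-by-2 clock at a divider flop output
create_generated_clock -name clk_div2 -source [get_ports clk_in] \
    -divide_by 2 [get_pins u_div/q_reg/Q]

# Same waveform expressed with -edges: the generated clock rises on
# source edge 1, falls on edge 3, and rises again on edge 5
create_generated_clock -name clk_div2_e -source [get_ports clk_in] \
    -edges {1 3 5} [get_pins u_div/q_reg/Q]

# Inverted divide-by-2: starts with a falling transition
create_generated_clock -name clk_div2_n -source [get_ports clk_in] \
    -divide_by 2 -invert [get_pins u_div/qn_reg/Q]
```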
[p81] Topic 12: generated clock blockage
Another important thing about a generated clock is that it blocks other clocks' propagation, even if one of those clocks is its own master clock. When a generated clock is created at a pin, all other clocks arriving at that pin are blocked unless they too have generated clock versions created at that pin.
The definition of a generated clock acts like a breakpoint for all other clock paths. It can be explained with this example. Let's say we defined a master clock at the input port clock with a period of 8 ns. Then the clock goes through some dividing logic and produces a div-by-2 version and a div-by-4 version. The two divided versions merge at a MUX, leaving the user to select between them according to functional needs. Let's say for some reason we only defined the div-by-2 waveform at the output of the MUX, but we also want to analyze the div-by-4 clock. What should we do?
1. The divide-by-4 clock won't show up at node 4, since the definition of the div-by-2 generated clock has blocked the clock path of the divide-by-4 clock.
2. It also blocked the way for the master clock, but since there is no consumer of the master clock downstream, we do not care much about that.
3. To fix the issue, we could define the div-by-4 generated clock also at node 4 with the -add option to the create_generated_clock command, but it is not the preferred way. Here I will let you think about why.
4. The preferred way is to define the create_generated_clock for the divide-by-4 clock at the input of the MUX; the MUX is then able to propagate both divided clocks.
5. Depending on the design content downstream, we may need to set an exclusive relation between these two clocks, since timing paths between them may not be real.
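A sketch of the preferred fix in SDC, with made-up instance names; the idea is to define each generated clock upstream of the MUX so the MUX propagates both:

```tcl
# Define each divided clock at its divider output, before the MUX
create_generated_clock -name clk_div2 -source [get_ports clock] \
    -divide_by 2 [get_pins u_div2/q_reg/Q]
create_generated_clock -name clk_div4 -source [get_ports clock] \
    -divide_by 4 [get_pins u_div4/q_reg/Q]

# Only one of them is selected by the MUX at a time, so paths between
# them are not real; declare them mutually exclusive
set_clock_groups -logically_exclusive -group {clk_div2} -group {clk_div4}
```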
The first thing is to identify the loop by using check_timing -override generated_clock -verbose. This gives you the loops seen by PT. Note that every time after you fix a loop, run this command again to check whether the issue is completely gone. The tool may see new loops coming up once old loops are fixed. It's an onion-peeling process.
Obviously, the timing loop occurs on the left side. The flop on the left is generating the clock for the flop on the right.
The output of the flop on the right feeds back to the datapath of the flop on the left. Ideally, such feedback shouldn't be happening, so the real fix needs to come from an RTL change. But as a quick workaround, we can disable a timing arc at either node 1 or node 2. At node 1, we can break the select-to-output arc of the MUX; at node 2, we can break the data-input-to-data-output arc.
Again, it's not a good idea to let the STA tool, such as PrimeTime, break the loop automatically. Manually break the loop to maintain design consistency across design phases and tools.
Clock Skew
When a clock is generated by a source, it may not arrive at all the flops at the same time. The difference in arrival time at the various flops is due to different paths through the clock network, coupling capacitance from crosstalk, or PVT variations in the design. This causes the edges of the same clock not to align when they reach the various devices. This difference between clock arrivals at different points in the design is referred to as clock skew. Clock skew can be between different points of the same clock (intra-clock) or different (usually synchronous) clocks (inter-clock).
Clock Jitter
At the clock generating device (say, a PLL) itself, a clock's edge may not be deterministic on account of crosstalk, electromagnetic interference, or PLL characteristics.
The above two phenomena mean the clock period itself can vary. Thus, in STA we use clock uncertainty to take these variations into consideration.
Another important aspect of uncertainty is that its value varies between pre-layout and post-layout. In the pre-layout stage no CTS has been performed, so the uncertainty value must take into account the possible impact of the skew that will be inserted. However, post-CTS, the actual clock network has been built, so clock arrivals can be propagated through the network; the skew portion is then known and doesn't need to be specified as uncertainty. So, the clock uncertainty in the post-layout stage is generally less than in pre-layout.
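A common way to express this in SDC; the numbers are illustrative budgets, not recommendations:

```tcl
# Pre-layout: uncertainty covers estimated skew + jitter (+ margin)
set_clock_uncertainty -setup 0.30 [get_clocks clk_func]
set_clock_uncertainty -hold  0.10 [get_clocks clk_func]

# Post-CTS: real skew is in the propagated clock network, so the
# uncertainty shrinks toward jitter (+ margin)
set_clock_uncertainty -setup 0.12 [get_clocks clk_func]
set_clock_uncertainty -hold  0.05 [get_clocks clk_func]
```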
Network latency is the delay from the clock definition point (create_clock) to the clock pin of a
flip-flop.
The network latency is an estimate of the delay through the clock tree before the clock tree synthesis stage. After the clock tree is built, network latency will be replaced by the actual clock network delay.
Source latency, also called insertion delay, is the delay from the clock source to the clock definition
point.
It is recommended to use the set_propagated_clock command to direct the tool to compute clock network latency based on the actual circuit elements, including parasitics.
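A sketch of the latency-related commands; the clock name and values are placeholders:

```tcl
# Pre-CTS: estimate the clock tree with ideal latencies
set_clock_latency -source 1.2 [get_clocks clk_func] ;# insertion delay up to the create_clock point
set_clock_latency 0.8 [get_clocks clk_func]         ;# estimated network latency

# Post-CTS: switch to real, calculated delays through the built tree
set_propagated_clock [all_clocks]
```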
Say we have to generate two versions of a clock: one divided by 2, the other divided by 4. In order to generate the divide-by-4 clock, we used two cascaded divide-by-2 dividers. Thus, the clock path for the divide-by-4 clock will be longer than for the divide-by-2 clock. When we mux them together and feed the output clock to downstream logic, CTS could see a difference when balancing the clock tree.
We want to minimize the clock source latency difference. One way to do it is by having the divider logic on the datapath rather than the clock path, and then using a flop to capture the divided enable signal. This way the divider logic only modulates the free-running clock waveform. Whether it is divide-by-2 or divide-by-4, the clock source latency will always be the same.
A virtual clock has no source specified. In reality, it might have a source, but that source could be outside the block being constrained.
In case 1, we have a pure combinational feedthrough path between one input port and another output port. Their related clocks are not used anywhere else inside the partition. Instead of using a clock declared for this block, a virtual clock can be declared just for constraining the combinational path.
In case 2, in order to constrain an output port which goes to an external flop, we could specify a delay with respect to the real clock itself, but the flop outside the partition would then also get the same clock latency in the STA calculation. We could specify the output delay with the -source_latency_included option, but that would hard-code the delay values, and we would have to change them every time the clock latency changes.
Thus, in this situation, defining a virtual clock helps specify a unique clock source latency for the flop outside the partition boundary. And if the clock insertion delay outside changes, we can just modify the virtual clock latency for all the ports associated with it.
In practice, virtual clocks are not often kept; a good practice often seen is to remove all the virtual clocks in the end and replace them with real clocks for timing signoff.
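A sketch of a virtual clock constraining an output port; the period, latency, and delay values are invented for illustration:

```tcl
# Virtual clock mirrors the external capture clock; note: no source object
create_clock -name vclk_ext -period 10

# Model the external clock tree insertion delay on the virtual clock only
set_clock_latency -source 1.5 [get_clocks vclk_ext]

# Constrain the output port against the virtual capture clock
set_output_delay -clock vclk_ext 2.0 [get_ports proc2mem_command]
```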
[p87] Topic 15: report_clocks
During debug, we may need basic information about the clocks defined in the current timing run. We can check whether the create_clock or create_generated_clock commands have been interpreted by the tool correctly. report_clocks is the command to do this. It can return all clocks, or specified clocks, with their period, waveform, and clock root information. This command is usually the first thing we check when debugging any clock-related timing issue.
There is another command called report_clock_timing which does a similar job with some other optional switches. That command can also be used, depending on your preference.
But sometimes we want to debug unconstrained paths; turning this setting on allows you to check a timing path in most cases even if it is not constrained.
The criteria for propagating the ideal property, starting at the source pins and ports, are as follows:
In addition to disabling timing updates and timing optimizations, all cells and nets in the ideal network have the dont_touch attribute set.
The size_only attribute is set on all cells that are ideal network sources. If nets are specified, size_only is set on the cells of the specified nets' global driver pins. This guarantees that ideal network sources are not optimized away by compile.
[p91] cont’d
Let's look at the situation where the set_ideal_network command is applied at the clock source of a clock path. We use set_ideal_network [get_ports clock], which means all nets, cells, and pins in the transitive fanout of the clock port become ideal. In this case, the next pin hooked up to the clock net, u3479/A2, gets this ideal attribute. We notice that the transitions at both the clock port and the u3479/A2 pin are zero, the capacitance on the clock net is zero, and the incremental wire delay of the clock net is also zero.
Compared with the original clock path where nothing is ideal, the cell delay of u3479 decreased from 0.10 ns to 0.08 ns, which could be a result of the zero input transition at the input pin.
[p92] cont’d
However, if we look carefully, we can see the ideal property is not propagated through u3479 onward. The net N24 still has non-zero capacitance, and pin C3176/A1 still has a non-zero input transition. Recalling that a combinational cell is marked as ideal only if all of its input pins are either ideal or attached to a constant net, we notice that the cell u3479 could be the issue here.
Since u3479 is a 2-input AND gate, let's check the timing path on the other input of this cell. Through report_cell with the -connection option, we can see that pin A1 is hooked up to the tenable input port. The timing path from tenable through u3479 is reported on the right-hand side. Since tenable is not constrained initially, the cell u3479 is not marked as ideal, so it blocks the ideal network propagation on this path. Set tenable to be ideal, and now u3479 also becomes ideal, getting the zero transition and capacitance.
[p93] cont’d
Another thing worth mentioning is the usage of the -no_propagate option. It is sometimes desirable to set the ideal property only on a net segment rather than the entire fanout cone of the net. At the same time, we may still want the tool to do sizing optimization on the driver pin of this net. For example, on the left-hand side there is a reset path going through net n3490. Say we want to mark only n3490 as ideal but allow topological optimization on the other net segments. We can use set_ideal_network with the -no_propagate option on this net, so the wire capacitance and transition on this net are zero, and the driving pin u42/Y gets the size_only attribute as well.
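The two flavors can be sketched as follows, using the objects from the examples above:

```tcl
# Ideal property propagates through the whole transitive fanout of the port
set_ideal_network [get_ports clock]

# Ideal property confined to one net segment; downstream segments and the
# rest of the fanout cone stay optimizable
set_ideal_network -no_propagate [get_nets n3490]
```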
Input paths are launched externally but captured inside the partition, while output paths are launched locally but captured outside.
At the partition boundary, we only see a data port on the datapath and a clock port sending/receiving clock signals.
Thus, to correctly model how long it takes for the signal to travel outside the boundary, we can use the SDC commands set_input_delay and set_output_delay.
This value can come from interface budgeting at the beginning of the project, or be back-annotated from the actual full-chip database once people start to iterate and optimize the design.
The -clock option is used to specify the reference clock with respect to which the delay value is specified. This should usually be the name of the clock which triggers/samples the signal that reaches this input port. If the clock which samples the data does not enter the block of interest, we need to specify a virtual clock with the same characteristics and use that virtual clock.
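For instance, the external launch path of an input and the external capture path of an output might be budgeted like this (port names and values follow the example discussed next):

```tcl
# External flop -> our input port takes up to 0.4 ns after the clk edge
set_input_delay  -clock clk 0.4 [get_ports reset]

# Our output port -> external capture flop needs 0.3 ns before the clk edge
set_output_delay -clock clk 0.3 [get_ports proc2mem_command]
```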
[p95] cont’d
This is an example of how the specified input/output delays are shown in the timing report. Let's set the input delay on the reset port to 0.4 ns with respect to the clock clk, which means this reset signal is probably a synchronously generated reset sent from outside. The 0.4 ns input delay is shown as "input external delay" in the timing report below. This value is treated as an incremental delay and accounted towards the total path delay of the datapath.
Similarly, we set the output delay to 0.3 ns on output port proc2mem_command with respect to clock clk. The output delay value appears in the timing report as "output external delay" and is deducted from the data required time. The output delay mimics the path delay outside the output port.
If the launch flop and the capture flop are clocked by two different clocks, the input delay is usually
with respect to the launch clock and the output delay is usually with respect to capture clock.
There are two other methods of describing port drive capability: the set_drive and set_input_transition commands. The most recent drive command has precedence. If possible, always use the set_driving_cell command instead of the set_drive command, because set_driving_cell allows accurate calculation of port delay and transition time for library cells with nonlinear dependence on capacitance.
Similarly, the set_load command works on output ports, setting the capacitance to a specified value on the specified ports and nets in the current design.
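A typical boundary environment setup might look like this; the library cell and pin names are placeholders:

```tcl
# Model the external driver of an input port with a real library cell
set_driving_cell -lib_cell BUFX4 -pin Y [get_ports reset]

# Model the external load seen by an output port
# (capacitance units follow the library, e.g. pF)
set_load 0.05 [get_ports proc2mem_command]
```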
By using case analysis, the designer can control the right value to be propagated through the logic cone. There are two types of case values in the design. One is the user-specified value, where the case value is explicitly set by the user on a certain pin or port, so that the logic value of that node is forced to the value the designer wants. The other is a constant value the tool automatically derives through logic propagation from upstream logic. This usually happens on nodes that are not set by the user directly.
To correctly constrain the design, we need to make sure the case values follow the design intention and that derived constant values are not propagated to wrong places, causing trouble. We will talk more about this in the constraint debug course.
[p98] cont’d
This is an example of how case analysis affects clock propagation through a clock mux. The clock mux operates like this: when the test enable pin tenable is zero, it lets the functional clock propagate through; when the test enable pin is one, it lets the test clock go through. The corresponding clock paths for both the functional clock and the test clock are shown on the right side.
Initially, if no case analysis is set and we query the clocks attribute on the output pin of cell C3176, it returns both clocks, which means that from the STA perspective, both clocks will propagate through. If we set case analysis on tenable to 1 or 0, only one of the clocks will be returned. This is actually the way to set up functional-mode and test-mode STA runs.
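The two modes from this example translate directly to SDC, one setting per STA run:

```tcl
# Functional-mode run: force test enable low so only the functional
# clock propagates through the clock mux
set_case_analysis 0 [get_ports tenable]

# Test-mode run (a separate session/mode) would use instead:
# set_case_analysis 1 [get_ports tenable]
```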
For data signals, we can use the set_disable_timing command to break a timing arc of a cell.
For clock signals, we can use set_clock_sense -stop_propagation to block clock propagation through a certain timing arc.
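Sketches of both commands; the instance and pin names are illustrative (the MUX refers back to the loop-breaking example):

```tcl
# Data: break the select -> output arc of a MUX to cut a timing loop
set_disable_timing -from S -to Y [get_cells u_mux]

# Clock: stop a specific clock from propagating past a pin
set_clock_sense -stop_propagation -clocks [get_clocks clk_test] \
    [get_pins u_mux/Y]
```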
The advantage of identifying false paths is that the analysis space is reduced, thereby allowing the analysis to focus only on the real paths, which also cuts down the run time.
For example, CLK A and CLK B are two inputs of the same MUX. According to the design intention, only one of them can propagate through the MUX at a time.
However, if we somehow forget to set the case analysis and also say nothing about the relation between CLK A and CLK B, STA will propagate both clocks through the MUX and start analyzing timing paths between these two clocks.
From the STA perspective, this is the most comprehensive way to handle the case, so it won't miss any possible timing paths. But from the designer's perspective, such timing paths will never happen, so this analysis does not make any sense.
Thus, we can set a false path between these two clock domains.
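In SDC, this is declared per direction; note that both directions must be covered:

```tcl
# No real paths exist between the two domains, in either direction
set_false_path -from [get_clocks CLK_A] -to [get_clocks CLK_B]
set_false_path -from [get_clocks CLK_B] -to [get_clocks CLK_A]
```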
Given a list of clocks to be put in one clock group, the tool will assume these clocks belong to the same synchronous domain, and all other clocks that have not been specified in this clock group are asynchronous to all the clocks in that group.
The first case is that they are logically exclusive, which means both clocks coexist in the design, but they don't talk to each other (no timing path between them), such as CLK_A and CLK_B in the left figure.
The other case is that they are physically exclusive. That means only one clock can exist in the design at a time, when multiple clocks are defined on the same design object, such as GCLK_A and GCLK_B.
Note that CLK_A and CLK_B are not logically exclusive anymore in the right picture, since they have a timing path in between in the other part of the design.
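The corresponding clock-group declarations might read as follows; pick the one form that matches the design intent:

```tcl
# Asynchronous: no phase relation between the groups
set_clock_groups -asynchronous -group {CLK_A} -group {CLK_B}

# Logically exclusive: both clocks exist, but never talk to each other
set_clock_groups -logically_exclusive -group {CLK_A} -group {CLK_B}

# Physically exclusive: defined on the same object, only one alive at a time
set_clock_groups -physically_exclusive -group {GCLK_A} -group {GCLK_B}
```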
To guard-band the design, we usually use different derating factors for early paths and late paths. That means different derating factors are used for setup checks and hold checks, each representing the worst scenario for that specific check.
Since the clock and data paths can be affected differently by OCV, STA can model the OCV by making the PVT conditions for the launch and capture paths slightly different.
Cell delay and wire delay can undergo different derating factors. On a cell-dominant path, increasing the cell derate helps the logic optimization tool be pessimistic and consolidate logic depth. The -cell_check option allows us to derate setup/hold/recovery/removal time requirements.
Global OCV: a flat derating factor across all paths; computes worst-case early/late bounds; pessimistic at smaller process nodes.
AOCV: LUT (look-up table) based derating factors annotated on paths according to logic depth and path distance. Due to OCV, cells/wires on longer paths can have different delays than cells/wires on shorter paths. We will go into more detail on this topic in a later chapter.
[p103] Topic 18: set_timing_derate
Here is an example of set_timing_derate. We usually use a larger derating factor for more design margin when the delay is calculated with a wire load model. Here we derate the max cell delay by 1.35. From the library, we can calculate the basic delay value of the cell u5070 to be around 63 ps, which matches the value of 85 ps in the timing report when scaled by the 1.35 derating factor.
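The derate from this example, plus typical early-side companions, could be written as follows; only the 1.35 comes from the slide, the early values are illustrative:

```tcl
# Late (setup) side: inflate cell delays by 35%
set_timing_derate -late -cell_delay 1.35

# Early (hold) side: shrink delays for pessimism on the other bound
set_timing_derate -early -cell_delay 0.95
set_timing_derate -early -net_delay  0.95

# Check: 63 ps library delay * 1.35 = ~85 ps, matching the report
```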
In order to physically implement an ASIC chip, there are two categories of goals we need to achieve regarding the timing aspects of the design.
The first category is called design rules. They are technology-dependent rules designers must follow in order to make the chip work as intended. We know each device is characterized over a certain operating range. So if we want to make the chip reliable, we should design in a way that each device falls into the characterization range of the library. Otherwise the device is working in some unknown state and its behavior cannot be captured accurately, which may cause design failure.
The second category is called optimization goals. They are goals that define the performance, power, and area targets the designer wants to achieve. Say the designer wants a chip to work at a maximum frequency of 2 GHz; then each timing path needs to propagate within the clock period corresponding to 2 GHz, which is 500 ps. The optimization goals in the timing aspect translate to timing constraints for a design.
Normally, the design rules take precedence over timing constraints because they obviously have to be met in order to realize a functional ASIC design.
The max transition rule defines the longest time allowed for a pin in the design to change its logic value. Many logic libraries contain restrictions on the maximum transition time allowed for a pin, creating an implicit transition time limit for designs using that library. Transition times on nets are computed using timing data from the logic library.
1) Make sure delay calculations fall into the library characterization range so they can be accurate.
2) Reduce input transitions to reduce short-circuit current and power consumption.
The max transition limit can be on data signals as well as clock signals. Usually, clock propagation requires a much tighter transition time than data signals. Most designs will have separate specifications for clock transitions and data transitions.
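Design-rule limits are usually set on the whole design, with a tighter override for clocks; the numbers are placeholders:

```tcl
# Data signals: global limit on the current design
set_max_transition 0.50 [current_design]

# Clock network: a tighter limit applied to the clock paths
set_max_transition 0.15 -clock_path [get_clocks clk_func]
```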
The max cap rule defines the maximum total capacitive load that an output pin can drive. That is, the pin cannot connect to a net whose total capacitance exceeds the maximum capacitance requirement defined for the pin.
The total capacitance seen by a driving pin is found by adding the wire capacitance of the net to the capacitance of all the sink pins attached to the net.
Usually, a cell with a larger drive strength has a larger max capacitance threshold.
The max fanout rule defines fanout restrictions for each output driver. Fanout load is a dimensionless number set for each input pin by the library designers in the standard library. It doesn't stand for capacitance.
To evaluate the fanout for a driving pin, the tool calculates the sum of all the fanout_load attributes of the inputs driven by the driving pin and compares that number with the max_fanout attribute stored at the driving pin.
This is a soft limit to restrict the number of fanouts a gate can drive; it is defined to avoid max cap and max transition violations. But as long as you are meeting max cap and max transition, you can ignore max fanout violations.
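Both limits can be sketched as follows; the values are placeholders:

```tcl
# Max load an output pin may drive (units follow the library, e.g. pF)
set_max_capacitance 0.20 [current_design]

# Soft limit on the summed fanout_load a driver may see
set_max_fanout 16 [current_design]
```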
In a design, the width of each signal pulse needs to satisfy a certain threshold, defined either in the .lib or by set_min_pulse_width, in order to function properly. This is especially true for the clock signals to the sequential elements. Signal pulse width shrinkage happens mainly due to the non-equal rise and fall times of the cells. The difference between rise and fall could be caused by OCV. Sometimes the pulse width reduction is caused by large transition times, so the span from the mid-point of the rise transition to the mid-point of the fall transition has shrunk.
If the pulse width keeps decreasing along the path, at some point, if the width is less than the AC noise margin of the cell, the pulse won't be able to propagate through the cell and is absorbed. This phenomenon is called pulse absorption and should be avoided.
In physical implementation, we can use double inverters to replace unbalanced buffers so that both the rise and fall edges of the original waveform experience the same number of rise and fall transitions. We will talk more about this in the place and route course.
Another common reason for min pulse width violations is a wrongly set clock uncertainty. If the clock uncertainty value is too large, it will also eat up the clock pulse width in STA analysis.
[p108] Topic 19: Minimum Pulse Width
Let's take a closer look at the minimum pulse width requirement. The min pulse width requirement usually comes from the library when the cell is characterized. In the .lib, the tool uses fall_constraint for a low pulse and rise_constraint for a high pulse.
As we mentioned in the previous slide, there can be a difference in the rise and fall delays of the gates on the path. If the rise delay of a cell is greater than its fall delay, then the output clock has a smaller pulse width than the input.
Besides the non-equal rise and fall delays of the gates along the path, clock reconvergence pessimism also impacts the calculation of the pulse width window.
Dynamic CRP means clock reconvergence pessimism introduced by dynamic effects like signal integrity delta delays or dynamic clock source latency. Depending on the different derating set for the early and late clock paths, the dynamic CRP will reduce the calculated pulse width, because the path delay could deviate at different timestamps.
Static CRP refers to a CRP value computed from clock arrivals that do NOT include any dynamic effects. The static CRP can be removed from the reduction of the pulse width, so it gives credit back.
Besides all the above, clock uncertainty will also be subtracted from the calculated pulse width to account for the uncertain nature of the clock.
As the picture on the right shows, in the library the min pulse widths for the low pulse and high pulse are defined as 0.07 and 0.06, respectively. After we time the design and report the pulse width on the clock pin, we can see the actual calculated pulse width and the slack against the requirement.
[p109] (cont’d)
And here is another explanation of how the min pulse width is calculated in the tool. As you can see on the left-hand side, this is a min pulse width calculation for a high pulse. According to the notes on the right, the leading edge is calculated with the max_rise clock arrival time, which is the late latency path. The closing edge is calculated with the min_fall clock arrival, which is the early latency path. So the initial 10 ns pulse eventually becomes 8 ns wide.
The same holds for the low pulse width calculation. You can work through the details using the notes on the right-hand side.
For area, there is only one goal: minimize the die size required. Of course, the physical limitation on area is that you have to make sure the design is routable with no shorts and no physical DRC rule violations, such as minimum spacing between two metal traces or minimum width of a metal trace.
On the right side, there is a sample delay optimization process. It shows the time cost of each optimization step along with the different design PPA metrics. Usually this can be found in the run log files. It can serve as an early indicator of the current run quality.
Pre-layout STA
2) The clock tree is assumed to be ideal before CTS, so the STA focuses on datapath issues.
3) At this stage hold timing violations are usually ignored. (No real skew information)
4) For intra-clock uncertainty, clock skew estimation and clock jitter have to be modeled for setup analysis; only clock skew estimation has to be modeled for hold analysis, since the hold check is performed on the same edge of the same clock, so the contribution from jitter cancels out.
5) For inter-clock uncertainty, both clock skew estimation and clock jitter have to be modeled for setup and hold analysis, since the timing path crosses different clocks.
Post-layout STA
1) The first step is to extract parasitics from the actual layout.
2) The clock tree has been implemented, and the clock network delay is propagated with real delay values.
4) For intra-clock uncertainty, only clock jitter has to be modeled for setup analysis; clock skew comes from the propagated clock delay calculation for both setup and hold checks.
5) For inter-clock uncertainty, clock jitter has to be modeled for setup and hold analysis; clock skew comes from the propagated clock delay calculation for both setup and hold checks.
6) Additional recipes for robust verification, such as OCV derating factors, may be plugged in.
The core of a flip-flop consists of two back-to-back latches. Each latch consists of two transmission gates and two back-to-back inverters. The two latches are triggered by opposite clock edges.
At the same time, the second transmission gate is shut off, but the second latch holds the value and sends it to the output Q pin.
Notice that node 4 is where the sampling clock edge gets data into the first latch. The data has to come all the way from node D through a transmission gate and two inverters, so it has some path delay. This means that at the moment of the clock sampling edge, the data actually comes from some time back, and this time equals the propagation delay from the D pin to node 4. We call the propagation delay of the first latch the setup time. (Tpd_latch)
In order to have a steady input value for the clock edge to sample, we must ensure the data does not change value during the clock active edge. This means the data pin cannot change value for a certain amount of time even before the clock active edge. This is where the setup time comes from.
The total propagation delay from outside the flop to Node 4 is the propagation delay through interconnections or any elements before node D, plus the original propagation delay of the first latch. Meanwhile, the clock signal also needs some time to activate the latch. Thus, the real setup time will be Tpd_data + Tpd_latch – Tpd_clk. Depending on the value of these components, the setup time can be positive or negative. A flop with negative setup time means the data can arrive at the flop later than the clock edge, which can be used as a way to fix setup violations, but it comes with a larger hold time requirement.
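The decomposition just described is simple enough to check numerically. This is a sketch of the formula above with made-up delay values (in ns); the function name is my own.

```python
# Sketch of the setup-time decomposition described above:
# Tsetup = Tpd_data + Tpd_latch - Tpd_clk. All values are hypothetical (ns).
def flop_setup_time(tpd_data, tpd_latch, tpd_clk):
    return tpd_data + tpd_latch - tpd_clk

# Typical case: the data path into the latch dominates -> positive setup time.
print(flop_setup_time(0.02, 0.05, 0.03) > 0)  # True
# A slow internal clock path -> negative setup time: data may legally
# arrive after the clock edge, at the cost of a larger hold requirement.
print(flop_setup_time(0.01, 0.02, 0.06) < 0)  # True
```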
In actual circuitry, the clock signal is generated from the CLK bar signal through an inverter. The inverter has propagation delay, so while the clock edge is transitioning, there is a certain amount of time when both CLK and CLK bar are in transition and the first transmission gate is still transparent.
Thus, it takes some time for the transmission gate to completely shut off. If any new data change passes through this gate before it is completely closed, the original data is corrupted, and hence the correct data cannot come out of the RHS latch.
The hold time is determined by the inverter propagation delay needed to make the first transmission gate completely shut off.
The time it takes for data to travel from the first latch to the Q pin is the source of the clock-to-Q delay.
[p119] Metastability
From the analysis above, we know that due to the propagation delay inside the flip-flop, there is a certain time range in which the data pin cannot change value so that the clock active edge can sample the data steadily.
The metastability window must be positive; however, either the setup time or the hold time alone can be negative. This is because the propagation delays to the D pin and the clock pin differ, as mentioned in the previous analysis, so a flop can have a negative setup or a negative hold time requirement.
In some cases, we can use negative-setup flip-flops for max timing path fixes.
Metastability is another big topic. We will not discuss it in this course, but all of the following timing checks exist to ensure a flip-flop does not go into a metastable state.
The launch path is the path going through the clock tree to the clock pin of the launch flop and
then the data path between launch flop and capture flop.
The capture path is the path going through the clock tree directly to the clock pin of the capture
flop.
As we can see, the propagation delay on the launch path is the sum of the insertion delay of the launch clock, the clk-to-q delay of the launch flop, and the propagation delay of the data path.
The propagation delay on the capture path is simply the insertion delay of the capture clock.
For a setup timing check, we expect the data to be launched at cycle N and to be captured at the
next cycle (N+1). At the same time, the value on the data pin of capture flop must be stable for
the amount of setup time when the capture clock is sampling the data.
Thus, we can have following math equation to establish the relation between all these delay
values. Basically this simply mean the data launched must be stable before it gets sampled.
To make the requirement more restrictive, the clock uncertainty is also accounted for and subtracted from the RHS of the equation.
In symbols: T_launch_clock_path + T_clk2q + T_data_path ≤ T_cycle + T_capture_clock_path − T_setup − T_clock_uncertainty.
The slack is the difference between the RHS (the required time) and the LHS (the arrival time). Once we have the slack number, it is straightforward to derive the maximum effective clock period and frequency from this equation:
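The slack arithmetic just described can be sketched in a few lines. All delay numbers below are hypothetical (in ns), and the function name is my own illustration.

```python
# Minimal sketch of the setup-check equation above. Hypothetical ns values.
def setup_slack(t_cycle, t_launch_clk, t_clk2q, t_data, t_capture_clk,
                t_setup, t_uncertainty):
    arrival  = t_launch_clk + t_clk2q + t_data                    # LHS
    required = t_cycle + t_capture_clk - t_setup - t_uncertainty  # RHS
    return required - arrival  # positive slack means the setup check is met

slack = setup_slack(t_cycle=1.0, t_launch_clk=0.30, t_clk2q=0.10,
                    t_data=0.55, t_capture_clk=0.28, t_setup=0.05,
                    t_uncertainty=0.05)
# Minimum workable period = current period minus slack; max frequency follows.
t_min = 1.0 - slack
print(round(slack, 3), round(t_min, 3), round(1.0 / t_min, 3))  # 0.23 0.77 1.299
```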
The startpoint stands for the launch flop; notice that the flop is driven by CLK_A.
By default, this path belongs to the path group of the capture clock, in this case CLK_B.
“Point” means the following lines are the timing points along the path.
“cap” is the total capacitance seen by the driving pin, which is the sum of the wire capacitance and the load pin capacitance.
“derate” is a scaling factor for modeling any systematic margin, such as the on-chip variation effect.
“incr” is the propagation delay of the wire or the cell. The value on the line of input pin stands for
the wire delay to the input pin. The value on the line of output pin stands for the cell propagation
delay of the driving cell.
“voltage” is the voltage condition for this path; it should correspond to the PVT corner the analysis is done at.
“H” (letter H): the delay value is a hybrid annotation from multiple sources, such as a wire load model and SDF.
“*” (asterisk): the wire/cell delay value is directly annotated from a Standard Delay Format (SDF) file.
“&” (ampersand): the RC parasitics are back-annotated from a Standard Parasitic Exchange Format (SPEF) file.
If there is no symbol, the delay is estimated by a wire load model.
The path launches from time stamp 0 and is captured at 0.80 ns. Since we haven't applied any multicycle constraints on this path, this tells us the clock cycle is 800 ps.
The actual arrival time (path delay) of the launch path reaches 0.97 ns at the data pin of the capture flop, while the capture path required time is only 0.77 ns.
This means the launch path is too slow, so the data pin could still be changing values when the capture clock is sampling the data. Thus this path is violating the setup check requirement.
This slide lists several categories of common timing fix techniques. I am going to briefly introduce the most commonly used techniques here. More detailed explanations and examples will be provided in another, advanced STA course.
The first category fixes the setup violation by speeding up the cell delay. This can be done through cell adjustments.
The easiest way is to swap a high-VT cell for a low-VT one, since the drain current is much higher in a low-VT device, so the charge/discharge can happen faster. But the low-VT cell also consumes a lot more power.
This is a preferred way if you don’t want to disturb routing around the cell, especially in ECO phase.
We also swap low-drive-strength cells in critical positions for high-drive-strength ones. The cell size could be bigger, so it is a disturbance to placement and routing. A bigger cell also has a larger pin capacitance, so it will increase the total capacitance seen by the previous driver. We can use slew degradation to determine whether a cell is a good candidate for this type of swap. That is, if the cell's output transition is worse than its input transition, then the cell has not reshaped the waveform as expected, so we can size it up.
Physical implementation tools, such as logic synthesis or place and route tools, tend to insert chains of buffers into the design for various reasons. In many situations, there are more buffers than we actually need. This can contribute a lot of unnecessary cell delay to the total path delay. If a cell along the path has a large fanout, that usually results in a large load for the driver, so we can add a buffer to share part of the load and reduce the total capacitance seen by the original driver. If only a few endpoints of a large fanout cone have violations while the other endpoints have plenty of positive slack, we can create a dedicated buffer for the failing endpoints so the critical path is isolated and sees much less load on the net.
Place and route tools sometimes do a bad job routing critical nets. If we know a certain net needs to travel a long distance and its timing is critical, we can set net routing layer constraints to make the tool route that net on higher metal layers. Many of these wire routing improvements need control at the place and route stage, which will be covered in a later PnR course.
Logic manipulation can be interesting but dangerous as well. For example, we know that for a logic gate, the propagation delay from the input closer to the output is usually shorter than from the input farther from the output. If we have a signal on the critical path, we can put it on the input closer to the output. Logic replication is also an often-used technique; the idea is very similar to splitting the load with a buffer tree. The driver logic gate can be cloned to take some of the load away. For each logic change we make, we need to make sure it won't create a logic equivalence issue.
Lastly, clock tree manipulation is also a widely used fixing technique. If there are a lot of paths from the same clock source violating timing, we can try tying the downstream clock tree elements closer to the upstream driver to speed up the clock tree, or add more clock buffers to increase the clock insertion delay. Understand that a reduction in clock insertion delay can speed up the launch path of the current stage, but it also reduces the capture path delay of the previous stage, so it may create hold violations while fixing setup. The designer needs to check both stages to see whether there is enough positive margin on the other side after the change is done.
To meet the hold timing check, the following delay relation must be met.
To make the requirement more restrictive, the clock uncertainty is added to the RHS of the equation.
As we can see from the above equation, the hold check is independent of the clock period. This is because the hold check is performed on the same edge of the clock waveform.
Thus, we can say the hold check is more critical than the setup check: if we violate setup, the chip will still work at a slower frequency.
But if we violate hold, the chip won't work under any condition, which guarantees a functional failure.
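The hold relation can be sketched the same way as the setup relation; note that no clock period term appears, matching the point above. All values are hypothetical (ns) and the function name is my own.

```python
# Sketch of the hold check: the launch-side arrival must exceed the
# capture clock arrival plus the hold time plus the clock uncertainty
# (uncertainty is added to the RHS to make the check more restrictive).
def hold_slack(t_launch_clk, t_clk2q, t_data, t_capture_clk,
               t_hold, t_uncertainty):
    arrival  = t_launch_clk + t_clk2q + t_data
    required = t_capture_clk + t_hold + t_uncertainty
    return arrival - required  # positive slack means the hold check is met

print(round(hold_slack(0.30, 0.10, 0.05, 0.32, 0.03, 0.05), 3))  # 0.05, met
```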
The hold relations are determined according to the setup relation. There are two hold check scenarios:
1) Data from the source clock edge that follows the setup launch edge must not be captured by the setup latch edge. This one is depicted in the left-hand-side picture.
2) Data from the setup launch edge must not be captured by the destination clock edge that precedes the setup latch edge. This one is depicted in the right-hand-side picture.
Every time there is a hold check, the STA tool will choose the worse of the two scenarios. In this example, the two scenarios are essentially equal, but we will see cases where they are not equal in later slides.
This is because the setup check has to use the longest timing path while the hold check has to use the shortest timing path.
The path is launched at 0 ns and captured at 0 ns as well, indicating this is a same-cycle check. In this example, since the data arrival time is larger than the library hold time, the hold requirement has been met.
So where is the optimal location to add the hold buffer? Generally speaking, we can follow this order:
1. Find the timing paths with the worst hold slack across all PVT corners.
2. Choose the pins with the maximum number of violating paths going through them (bottlenecks) as fixing candidates, to minimize the number of buffers inserted.
3. Exclude pins with bad setup margin or negative setup slack; choose the ones with good setup slack to avoid setup/hold conflicts.
4. If other conditions are equal, it is preferable to fix at load pins rather than driver pins, because adding a delay cell at the load pin does not disturb the driving cell and wire delay too much and is more predictable.
If the path originating from the lower launch flop is timing critical, that means there is a setup and hold conflict beyond location #2. That's when we choose location #3 as the fixing point; otherwise, location #2 can also be used as an option.
[p130] Delay Calculation for Timing Path
In order to cover the worst corner case and guard-band the design, when the STA tool calculates the launch path delay, it uses the slowest possible delay a cell could have under the current operating condition; conversely, it uses the fastest cell delay values for the capture clock path. There are multiple sources of delay variation for a cell.
First comes the process variation mentioned in an earlier chapter. The same type of cell can have different delay values due to oxide thickness variation, voltage threshold variation, and channel length variation.
Secondly, cells in different locations can see different supply rail voltages and temperatures. The IR drop effect can depend on a couple of factors, such as how fast the cell is switching, how dense the power consumption is in that region, how resistive the supply net is, and how much decap exists nearby.
Thirdly, the variation can come from coupling with nearby nets. The cell transition can be slower or faster depending on the switching direction of the neighboring net. If the neighboring net switches in the same direction as the net connected to the cell, then the transition of the net is sped up; if it switches in the opposite direction, the net's transition slows down.
Static variation means the variation does not change during the time period of the timing check. For example, process variation is a source of static variation for both the setup check and the hold check, but crosstalk noise can be a source of static variation only for the hold check.
These early/late arrivals of the clock edge are introduced mainly for two reasons:
Most of the time, the design works on only one of the clock paths, so there shouldn't be different arrival times. The re-convergence point usually has selection logic to control which clock path is active. However, without case analysis, STA will propagate both clock paths and use them separately in the timing checks.
For max timing path checks, it will choose the longer path for launch and the shorter path for capture. Vice versa for min checks: the shorter path for launch and the longer path for capture.
One thing to notice is that the launch clock path can share a portion with the capture clock path. The way clock trees are built today creates scenarios where one clock signal travels through a few clock cells upstream but fans out to many endpoints downstream. So flops starting from the same clock tree branch do have some common portion along their clock paths.
For example, during max timing checks, the common path is assigned a large delay value during launch path calculation and a smaller value during capture path calculation.
But we know that the same cell cannot have two different delay values at the same time, so the delay difference on the common path doesn't exist in the real world. It is just artificial pessimism that needs to be removed from the timing check equations.
However, not all sources of delay variation on the common clock path can be removed. CRPR mainly refers to the removal of the process variation difference. The other two main sources, namely IR drop and crosstalk, need to be handled carefully.
For a max timing check like the setup check, since it spans two different clock cycles, the check happens at two time stamps, so the switching activity could be very different even for the same cell. Thus, we cannot simply remove the supply voltage variation and crosstalk effects blindly.
For a min timing check like the hold check, it is performed on the same clock edge, which means the check happens at a single time stamp, so the switching activity is exactly the same. Then the supply voltage variation and crosstalk effects can be taken out as well.
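The pessimism-removal idea can be sketched numerically: the credit given back is the max/min delay difference accumulated on the clock path segment shared by launch and capture. This is a simplified illustration with made-up values (ns), not how any particular tool implements CRPR.

```python
# Simplified sketch of clock reconvergence pessimism removal (CRPR):
# sum the max-minus-min delay difference over the shared clock cells
# and give that credit back to the raw slack. Hypothetical values (ns).
def crpr_credit(common_cells):
    # common_cells: list of (max_delay, min_delay) per shared clock cell
    return sum(dmax - dmin for dmax, dmin in common_cells)

common = [(0.12, 0.10), (0.08, 0.07)]  # two shared clock buffers
raw_setup_slack = -0.02                # slack before pessimism removal
print(round(raw_setup_slack + crpr_credit(common), 3))  # 0.01, now passing
```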
There are several choices for placing a clock gate: generally, it can be placed close to the clock source, near the leaf cells, or somewhere in between. In the first case, if the clock gate is placed near the source, what happens? Well, the clock is gated from the trunk, so fewer clock gates are enough to shut off a large area of flops in the design, and a large portion of the clock tree is gated off. This gives great power savings when the functional feature is not in use. But from the timing perspective, the clock gates are upstream, which means the clock tree diverges early, high up in the tree. The common path to the leaf cells is shorter, so cells in different clock subtrees can see larger source latency variation, since the common path pessimism credit is minimal.
[p135] (cont’d)
On the contrary, if the clock gates are placed near the leaf cells, more clock gates are needed to gate off the same number of flops as in the first case. Also, since the clock gates are near the leaf end, only a smaller portion of the clock tree is shut off. The power savings will not be as good as in the first scenario. However, from the timing perspective, since the clock tree diverges only when it reaches the leaves, a large portion of the clock tree is common to the downstream flops. Source latency variation on the clock paths of these flops will be less than in the first case, since the common path pessimism is at its maximum and can give timing credit back.
The time required before the active clock edge is called the recovery time, and the time required after the active clock edge is called the removal time. Similar to setup and hold checks, STA will use the max cell delay in the launch path and the min delay in the capture path for the recovery check, and the min cell delay in the launch path and the max delay in the capture path for the removal check.
The SDC command set_multicycle_path -setup can be used to either move the capture clock edge forward or move the launch clock edge backward by a specified number of cycles from the default check edge.
By default, for the setup check, the STA tool will move the check using the capture clock period, unless you explicitly force the tool to use the launch clock period with the –start option. In this case, since the launch clock runs at the same frequency as the capture clock, it makes no difference; later in the course we will see different behavior when the path crosses from a slow to a fast clock domain or from a fast to a slow clock domain.
[p139] (cont’d)
After the setup capture edge has moved N cycles, there are two scenarios for the corresponding hold checks. Case number one: the data launched from the next following cycle must not overwrite the data on the current setup capture edge. This case is shown on the left-hand side.
Case number two: the data launched from the current setup launch edge must not overwrite the data on the next following capture edge. This case is shown on the right-hand side.
The STA tool will pick the worst case of the two. Where the launch clock and capture clock run at the same frequency, they are the same. But for a slow/fast or fast/slow clock crossing path, the situation will be different.
[p140] (cont’d)
Keep in mind that since this is a multicycle setup path of N, the data shouldn't change for at least N cycles. Checking hold at the next cycle is no different from checking hold at cycle N, so we don't need to check hold for the most restrictive case. Actually, to maintain a valid data crossing, the hold check can be relaxed to a zero-cycle check. So the new hold edge is pulled back by a multicycle hold of N−1 cycles to align with the zero-cycle launch edge.
In this picture, the multicycle setup constraint pushes the capture edge 3 cycles away. The corresponding hold constraint pulls the hold checking edge back by 2 cycles, after which the hold checking edge is aligned with the launch edge in the same cycle.
From the waveform, we can tell that the new capture edge has been moved to the 5th cycle, which is 3 cycles away from the launch clock edge.
By default, STA is then going to find the most restrictive edge combination for the hold check. In this case it would launch from the 2nd cycle and capture at the 4th cycle.
But let's recall that the output value of this flop will only be used every 3 clock cycles. Since it is sampled on cycle 2 and cycle 5, there is no point in checking the value on cycle 3 or cycle 4. In other words, we don't care about the values in cycles 3 and 4 even if they get overwritten. So meeting the hold check requirement at the same cycle as the launch data is always good enough to ensure correct functionality. That's why we set set_multicycle_path –hold 2.
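The edge arithmetic for the same-frequency example above can be sketched as follows. The helper names and the cycle bookkeeping are my own illustration of the rule, not tool output: the default hold capture edge sits one cycle before the setup capture edge, and -hold H pulls it back H more cycles.

```python
# Sketch of same-frequency multicycle check edges (times in ns).
def setup_capture_edge(period, mcp_setup=1):
    # Default setup check captures at the very next edge (N = 1).
    return mcp_setup * period

def hold_capture_edge(period, mcp_setup=1, mcp_hold=0):
    # Default hold capture is one cycle before the setup capture edge;
    # set_multicycle_path -hold H pulls it back H additional cycles.
    return (mcp_setup - 1 - mcp_hold) * period

P = 1.0  # hypothetical clock period
print(setup_capture_edge(P))                          # 1.0: default setup edge
print(hold_capture_edge(P))                           # 0.0: default same-edge hold
print(setup_capture_edge(P, mcp_setup=3))             # 3.0: -setup 3
print(hold_capture_edge(P, mcp_setup=3))              # 2.0: restrictive if not relaxed
print(hold_capture_edge(P, mcp_setup=3, mcp_hold=2))  # 0.0: -hold 2 realigns with launch
```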
By default, PrimeTime assumes that data launched at a path startpoint is captured at the path
endpoint by the very next occurrence of a clock edge at the endpoint. For paths that are not
intended to operate in this manner, you need to specify a timing exception. Otherwise, the timing
analysis does not match the behavior of the real circuit.
There are three types of timing exceptions: 1) false paths, 2) min/max delays, and 3) multicycle paths.
A false path is a logic path that exists but should not be analyzed for timing. Declaring a
path to be false removes all timing constraints from the path. STA does not report it as a
violation no matter how long or short the path delay is.
Min/max delay constraints override the default maximum or minimum time with your own
specific time value. By default, PrimeTime calculates the maximum and minimum path
delays by considering the clock edge times. Using set_min_delay or set_max_delay forces
the tool to ignore the clock relationship.
A multicycle path needs to be specified when more than one clock cycle is required to
propagate data from the start of a path to the end of the path. It relaxes the default
timing check behavior of the STA tool.
If a timing exception is specified on a particular set of pins, the tool needs to keep track of exceptions on registers, pins, and nets. This makes the command less efficient. To specify timing exceptions efficiently and reduce analysis/run time, follow the methods below:
Before using false paths, consider using case analysis (set_case_analysis), declaring an exclusive relationship between clocks (set_clock_groups), or disabling analysis of part of the design (set_disable_timing). These alternatives can be more efficient than using the set_false_path command.
If false paths must be used, avoid specifying a large number of paths using the -through argument, using wildcards, or by listing the paths one at a time.
[p143] (cont’d)
There are some rules for applying the timing exceptions.
In case of conflicting exceptions for a particular path, the timing exception types have the
following order of priority, from highest to lowest:
set_false_path
That means if there are set_false_path and set_multicycle_path constraints set on the same pair of startpoint and endpoint, set_false_path takes precedence. The path becomes totally excluded from timing verification.
For the same type of constraint, the more restrictive one wins and overrides the less restrictive one.
For example, if we have a multicycle 5 and a multicycle 3 constraint working on the same path, then since multicycle 3 is more restrictive than 5, the path is constrained as a three-cycle multicycle path.
If two constraints work on the same path, the constraint with the more specific condition wins. For example, if we have applied a global multicycle constraint from the clock A domain to the clock B domain, but we also have a second constraint specifying a different multicycle number through a particular pin between the two clock domains, then the one with the -through pin takes priority.
It reports a list of the top cells with the highest bottleneck cost, where the bottleneck cost is defined as the number of violating paths through the cell. This means that if these cells had better delay, the path delay on a lot of timing paths could be improved. Addressing these cells first provides the quickest way to cut down the TNS of a design.
For example, in this picture, the NAND gate in the middle is on all four paths between the two sets of startpoints and endpoints. If these four paths have timing issues, fixing this common gate can benefit all of them. The report_bottleneck command can be used to create a sorted list of the common gates that should be analyzed first.
Secondly, the STA engine will find the most restrictive launch and capture edge combination to guard-band the design.
In case one clock's period is an integer multiple of the other's, STA only needs to expand the fast clock to align with the slow clock.
Then it looks for the most restrictive setup and hold check edges.
By default, STA treats every two clocks as synchronous, even when they are not. So one thing to note is that if you miss some timing exceptions between two asynchronous clocks, STA will still expand those clocks and come up with a very large and strange minimum base period. If you see this, most of the time it means you are missing a set_false_path or set_clock_groups command to disable those false timing paths.
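The expansion idea can be sketched as a least-common-multiple computation over the two clock periods; this is my own simplification of what a tool does internally, with hypothetical picosecond values.

```python
# Sketch: the common base period over which two clocks' edges realign
# is the least common multiple of their periods (values in ps).
from math import gcd

def common_base_period(p1_ps, p2_ps):
    return p1_ps * p2_ps // gcd(p1_ps, p2_ps)

print(common_base_period(800, 200))   # 800: integer ratio, expand the fast clock only
print(common_base_period(700, 300))   # 2100: non-integer ratio needs more expansion
print(common_base_period(999, 1000))  # 999000: near-identical "async" periods blow up
```

The last case illustrates the symptom described above: two unrelated clocks with no missing exception handling produce a huge base period.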
Let's assume a timing path travels from CLK_B to CLK_A. CLK_B is a slow clock whose period is 4 times the period of CLK_A. From what we have learned in previous slides, the STA tool first expands CLK_A 4 times to align CLK_A with CLK_B. By default, the most restrictive launch and capture pair for the setup check is from the first CLK_B launch edge to the second CLK_A capture edge. The most restrictive launch and capture pair for the hold check is from the first CLK_B edge to the first CLK_A edge.
[p151] (cont’d)
In this case, since the launch clock and capture clock have different data rates, the value from CLK_B may not be needed by CLK_A in the very next cycle. Ideally, the launch flop needs to drive the signal steadily for at least N cycles before it gets sampled. Thus, it is quite possible to relax the setup requirement and give more time to the data transmission.
Say we want to sample the data every 3 cycles; we can use set_multicycle_path –setup 3 –from CLK_B –to CLK_A. Then the setup capture edge is where we want it, but the hold capture edge has also moved along with it.
[p152] (cont’d)
The first thing to do with the hold check is to determine the right launching and capturing edges. Given that the setup multicycle is 3 cycles, we have two possibilities for the hold check.
Case #1: data launched from the source clock edge that follows the current setup launch edge must not be captured by the current setup capture edge.
Case #2: data launched from the current setup launch edge must not be captured by the destination clock edge that precedes the setup capture edge. This is more restrictive than case #1 for a slow-to-fast clock crossing.
[p153] (cont’d)
But since the driver is stable for at least 3 cycles, we don't need to enforce such a stringent hold requirement. In other words, we don't care if the value captured by CLK_A in cycle #2 or cycle #3 is overwritten by data launched from CLK_B in cycle #1.
We have to use set_multicycle_path –hold 2 –end to move the hold capture edge back to the first clock edge so the hold check remains a zero-cycle check.
Note there is a -end option in the hold MCP specification. –end means to use the endpoint clock as the reference when moving the edge. In this case, since the endpoint clock is CLK_A, this tells the STA tool to move the hold capture edge backward by 2 CLK_A edges.
Note that -end is the default for a multicycle setup constraint and -start is the default for a multicycle hold constraint.
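The slow-to-fast edge bookkeeping above can be sketched with the same cycle arithmetic used earlier; with -end, the edges move in capture-clock (CLK_A) periods. This helper is my own illustration, not tool behavior.

```python
# Sketch of slow-to-fast crossing check edges: CLK_B launches, the fast
# CLK_A captures, and -end moves edges by CLK_A periods. Hypothetical ns.
def sf_edges(cap_period, mcp_setup, mcp_hold):
    setup_capture = mcp_setup * cap_period                   # -setup N -end
    hold_capture = (mcp_setup - 1 - mcp_hold) * cap_period   # -hold H -end
    return setup_capture, hold_capture

print(sf_edges(cap_period=1.0, mcp_setup=1, mcp_hold=0))  # (1.0, 0.0) default
print(sf_edges(cap_period=1.0, mcp_setup=3, mcp_hold=2))  # (3.0, 0.0) relaxed setup, zero-cycle hold
```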
[p154] (cont’d)
Here is an example timing report for a slow-to-fast clock crossing. The clocks in discussion are clock M and clock P. Clock M has a period of 20 ns and clock P has a period of 5 ns. On the left-hand side, the setup path launches from the rising edge of clock M at time 0 to the rising edge of clock P at time 5 ns, which indicates this is a path without any multicycle relaxation. The right-hand side is the corresponding hold check timing report. The data launched from clock M at time 0 is captured by clock P at time 0. You can try applying some multicycle paths to both the setup and hold sides and think about how the timing reports would look.
Since the path travels from CLK_A to CLK_B, we use –start to indicate that the reference clock for moving the edge is CLK_A. From the above two examples, we can tell that the edge movement number is usually specified using the clock period of the faster clock, because this gives finer timing granularity to model the design's function. If we set up the MCP relative to the slow clock, the path will be too relaxed to reflect what the timing requirement really needs to be.
[p157] (cont’d)
For the corresponding hold check, we can find the worse of the two possibilities shown in the two pictures below, using the same method discussed previously. The conclusion is that for a fast-to-slow clock crossing, the worse hold scenario is that data launched from the source clock edge that follows the setup launch edge must not be captured by the current setup capture edge.
[p158] (cont’d)
Now, since the data will be held stable for 2 cycles, we can relax the hold requirement by moving the hold check edge one cycle later. In other words, we don't care if the value launched from CLK_A in cycle #4 overwrites data launched in cycle #3.
We can use set_multicycle_path –hold 1 -from CLK_A -to CLK_B. Note that the hold check moves the launch clock by default, so we don't need to specify the –start switch explicitly.
The new hold launch edge is now aligned with the capture edge, ensuring that data launched from 0 ns does not override the previously captured value.
[p159] (cont’d)
Here is an example of setup and hold check timing reports when there is no multicycle path applied between clock M and clock P. On the left side, the setup check happens between 15 ns and 20 ns, which indicates it launches from the 4th launch clock edge and is captured by the 2nd capture clock edge. The corresponding hold check is still a zero-cycle check. You can work out some multicycle constraints and see how the timing reports would change.
As I mentioned in the introduction of this course, this cannot be verified by the STA engine, since the datapath behavior is no longer deterministic.
This is usually called a clock domain crossing problem. If the clocks don't share a phase relationship, the arrivals of the launch clock and capture clock edges will not be deterministic relative to each other.
This means the setup and hold timing relationship could vary in every cycle. This can easily cause metastability, which needs to be resolved using synchronizers.
There are two typical scenarios. The first is when we need to transfer a single bit or only a few bits across two clock domains. In this case, the simplest way to avoid metastability is to place two flops clocked by the destination domain in series. The data captured by the first flop may go into a metastable state, but it has a very large chance to resolve the metastability and settle into a stable value after a while, usually within one clock cycle. Then, in the next cycle, the second synchronizer flop will see a steady input. If the metastability is still not resolved by the first flop, we can add more flops in series to reduce the chance of metastability at the cost of additional latency.
The second scenario is when we need to transfer a high volume of data between two clock domains. A handshaking protocol or an asynchronous FIFO structure is usually used in this circumstance.
For example, a simple handshaking mechanism is shown on the RHS. Domain 1 first puts a high volume of data onto the data bus, then sends a request signal to domain 2. The request signal can be synchronized using the 2-flop synchronizer. Once domain 2 receives this request, it stores the value on the data bus into local flops. Then domain 2 issues an acknowledge signal back to domain 1. The acknowledge signal can also be synchronized by a 2-flop synchronizer. After domain 1 receives the acknowledge signal, it can change the value on the data bus to send the next word.
The disadvantage of handshaking is that the time to synchronize the req and ack signals for each word adds up into the total latency of the data transfer. The FIFO technique can transfer a high volume of data while maintaining low latency. However, the FIFO structure is a bit complex and out of the scope of this course, so we will not talk about it here.
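The "add more flops to reduce the chance of metastability" trade-off is often quantified with the standard MTBF estimate, MTBF = exp(t_r / tau) / (T_w * f_clk * f_data), where tau and T_w are process-dependent flip-flop parameters. The sketch below uses entirely made-up parameter values; only the exponential trend is the point.

```python
# Hedged sketch of the classic synchronizer MTBF formula. tau (resolution
# time constant) and t_window (metastability window) are flop/process
# parameters; every number here is hypothetical.
from math import exp

def mtbf_seconds(t_resolve, tau, t_window, f_clk, f_data):
    return exp(t_resolve / tau) / (t_window * f_clk * f_data)

# Two flops give roughly one destination cycle (10 ns) to resolve;
# a third flop adds another cycle (20 ns total) and multiplies the
# MTBF by exp(cycle / tau).
two_flop = mtbf_seconds(10e-9, 0.2e-9, 100e-12, 100e6, 10e6)
three_flop = mtbf_seconds(20e-9, 0.2e-9, 100e-12, 100e6, 10e6)
print(three_flop > two_flop)  # True: each extra stage grows MTBF exponentially
```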
[p162] (cont’d)
The main issue with the system synchronous clocking scheme is that the clock uncertainty needs to be controlled reasonably well so the max timing requirement can be met. This is usually done in the clock tree synthesis stage during place and route. However, if we send the clock along with the data signal, it can reduce the max timing pressure, since the clock propagates in the same direction as the data. So it can potentially be used in faster clock designs. Because the clock nets and data signals are sent together and routed together, the clock path is exposed to the same on-chip variation sources as the data path.
Besides, this scheme can be used to transfer data between asynchronous clock domains, where synchronizers are still needed to move data from the source clock domain to the destination clock domain.
Note that if the data arrives after the clock open edge, the cell propagation delay is measured from the D pin to the Q pin. The latch just behaves like a buffer, and there is no clock uncertainty penalty across the latch.
If the data arrives before the clock open edge, the data needs to wait for the open edge, so the cell propagation delay is measured from the CLK pin to the Q pin.
1) When the data arrives before the clock open edge, the latch behaves just like a normal flip-flop. The path slack is positive.
2) If the data arrives during the transparency window, the data can still be captured correctly at the cost of eating into the transparency window. We say the path borrows time from the capture clock. The more it borrows, the less time is left for the next cycle to complete its operation.
3) If the data arrives after the closing edge of the latch, then the datapath is too slow; there is no way for the capturing latch to get the correct data value from the launch clock. The entire high phase has been borrowed by the current cycle.
From this explanation, the ultimate goal of using a latch is for it to work in transparent mode, but without borrowing so much that the next cycle fails.
We can see that this is in total a 1.5-cycle path, where the first segment is a full-cycle path between the first flop and the latch in the middle.
The second segment is between the latch and the last flop, and it is a half-cycle path since the capture flop is clocked by the inverted version of CLK_A.
Since the first segment contains more logic gates, let's assume it has a larger path delay, while the second segment is a very short path with minimal delay.
If the element in the middle were a hard-edged flop, the first segment would probably fail the setup check while the second segment would pass setup with plenty of positive margin.
Because it is a latch, the first segment can "borrow" into the latch and use the positive margin on the other side without creating any timing violation.
In the time borrowing information section, we can see the maximum time a latch is allowed to borrow is determined by the half phase of the clock period minus the library setup time of the latch, which is 0.33ns here.
In this case, we only need 0.11ns to make the path meet the setup requirement, so the actual time borrowed is 0.11ns. In the end, since the endpoint latch is in its transparent phase now, it behaves just like a buffer, so the STA tool gives the clock uncertainty deduction back as credit; the real time borrowed from the endpoint latch is only 60ps.
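If the methodology wants to keep part of the transparency window in reserve, the amount of borrowing can also be capped explicitly with the standard set_max_time_borrow command. A sketch (the pin name and value are placeholders, not from the report above):

```tcl
# Allow at most 0.2 ns of borrowing at this latch data pin,
# even if the transparency window would permit 0.33 ns.
set_max_time_borrow 0.2 [get_pins ULAT/D]
```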
[p166] Topic 27: De-skew / Lockup Latch
A de-skew latch, or lock-up latch, is an often-used technique for hold protection. It can be extremely useful in situations like slow-clock-domain transmission or communication between Intellectual Property blocks. Say there is a path as shown on the left-hand side. Notice there is a clock skew between the capture clock and the launch clock, and in this case the capture clock arrives much later than the launch clock. This makes the hold check harder to meet: the data transition must be slow enough to arrive later than the clock capture edge, which carries a large clock skew. To fix this hold violation, we can throw in buffers to delay the data path, but if the clock skew is too large, too many buffers would be needed, which is bad for power and area.
On the right-hand side, a de-skew latch is added onto the data path. Note that it is an active-low latch: it is transparent during the normal clock low phase, allowing data to pass through, but stays closed in the high phase. So any data transition from the launching flop has to wait for the latch to open. This essentially adds half a cycle of delay onto the datapath and greatly helps satisfy the hold requirement even if the clock skew is large.
If we analyze the logic carefully, it's not hard to see that we can move the enable from each individual datapath onto the clock path. This way all those mux structures can be eliminated and the area overhead is gone. The clock also stops toggling whenever the enable is off, which saves dynamic power.
Whether it is an architectural clock gate or an inferred clock gate, it must satisfy certain timing requirements in order to generate a good-quality clock for downstream logic. These timing requirements are arrival-time constraints between the enable signal and the free-running clock. For example, in the RHS picture, if the enable signal changes value during the low phase of the clock, the gated clock will be a clean new clock afterwards. But if the enable signal changes value during the high phase of the clock, there will be an extra pulse, a glitch. This extra pulse may cause unwanted reactions in the design and even functional failure.
If the output of the gating cell is enabled by a high-level control signal, as with an AND or NAND gate, we call it an active-high clock gating check.
The active-high clock gating setup check requires that the gating signal change before the clock goes high.
The active-high clock gating hold check requires that the gating signal change only after the falling edge of the clock.
One can see that the hold requirement is quite large; this can be resolved by using a different type of launch flip-flop, say a negative-edge-triggered flip-flop, to generate the gating signal.
The active-low clock gating setup check requires that the gating signal changes before the clock
goes low.
The active-low clock gating hold check requires that the gating signal changes only after the rising
edge of the clock.
Both active-low and active-high clock gating checks are basically saying that the enable signal should only change value when the clock level makes the gating cell inactive, so it won't cause unwanted extra pulses downstream.
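Most tools infer these checks automatically from the gate type, but they can also be declared or tightened explicitly with the standard set_clock_gating_check command. A sketch (values and the instance name are illustrative assumptions):

```tcl
# Require the enable to settle 0.1 ns before the clock's active level
# arrives and to stay stable 0.05 ns after it ends.
set_clock_gating_check -setup 0.1 -hold 0.05 [get_cells UCG0]
```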
However, if the enable signal itself has a glitch for some other reason, then traditional clock gating with a single gating cell could propagate this unwanted glitch downstream.
To prevent this, we normally use a glitch-free clock gating structure. Assuming the flops are rising-edge triggered, the enable signal now first goes into an active-low latch, and the output of this latch is used to control the gating cell.
If the glitch happens during the high phase of CLK_B, it will not be stored into the latch, since the latch is only transparent when CLK_B is low.
If the glitch happens during the low phase of CLK_B, it will pass through the latch, but can only change the gating cell input during the inactive phase of the clock signal.
So, by adding this latch, the final gated clock signal is glitch-free and only the useful clock pulses are left.
[p173] Data-to-Data Check
In cases where we need to monitor the arrival time of one signal with respect to another signal, a data-to-data check can be applied. The set_data_check command can be used between any two arbitrary data pins. Conceptually it is similar to a regular setup and hold check: one pin is the constrained pin, which acts like the data pin of a flip-flop; the other pin is the related pin, which acts like the clock pin of a flop. The main difference is that the data-to-data setup check is performed between the same edge of the launch and capture clocks.
For example, the data pin A of the AND gate is generated through a flop clocked by CLK_A followed by some combinational logic; pin B is generated through a flop clocked by CLK_B followed by another cone of combinational logic. Now if we want the data value on pin A to be held steady for some amount of time before and after the switching on pin B, a data check constraint can be applied as shown.
First, the A pin must be stable at least 0.2ns before the B pin switches. According to the definition, the B pin is like the clock pin and the A pin is like the data pin, so we can say set_data_check -setup 0.20 -from UAND/b -to UAND/a.
Second, the A pin must stay stable at least 0.1ns after the B pin changes value. So we can derive set_data_check -hold 0.10 -from UAND/b -to UAND/a.
By default, since the setup check is performed on the same edge of both launch and capture clocks, the default hold-check launch edge will be one cycle ahead of the default hold capture edge. Depending on the design intention, we can either move the hold launch edge back by specifying a -1 hold multicycle constraint, or use another data setup check to constrain pin A and pin B in the reverse direction.
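Collecting the pieces of this example in one place as they would appear in a constraints file (pin names as in the slide, the -1 hold multicycle per the text; exact hold-edge behavior can vary slightly between tools):

```tcl
# Pin A (constrained) must be stable 0.2 ns before and 0.1 ns after
# pin B (related) switches.
set_data_check -setup 0.20 -from [get_pins UAND/b] -to [get_pins UAND/a]
set_data_check -hold  0.10 -from [get_pins UAND/b] -to [get_pins UAND/a]
# Pull the default hold launch edge back so the hold check is
# performed on the same edge as the setup check.
set_multicycle_path -hold -1 -to [get_pins UAND/a]
```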
[p174] (cont’d)
Here is a timing report for the data-to-data check. Note that it is from the A2 pin to the A1 pin, so the A2 pin is treated as the related pin and the A1 pin as the constrained pin. The data check setup time is 0.1ns from the command and is deducted from the data arrival time on A2. On the other hand, the data arrival time on A1 is calculated as normal and then compared with the adjusted A2 arrival time. The slack is the delta between the two arrival times.
Specifying this check is very simple. If we want to constrain all paths starting from the first flop to be within 2ns, we can say set_max_delay 2.0 -from UDFF1/Q.
If we want to constrain the path delay from point A to point B, we can say set_max_delay 1.0 -from A -to B.
We can also specify the minimum time required between two points. For example, if we want to make all timing paths ending at the second flop at least 1ns long, we can say set_min_delay 1.0 -to UDFF2/d.
Normally, in a synchronous design, the assumption is that the design is completely constrained with respect to clocks, so set_max_delay and set_min_delay are not recommended in most situations. These two commands are mostly used on asynchronous signals.
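The three examples above, written out as they would appear in a constraints file (object accessors added for clarity; as noted, these are sketches intended for asynchronous or special paths, not general use):

```tcl
set_max_delay 2.0 -from [get_pins UDFF1/Q]   ;# cap every path leaving this flop
set_max_delay 1.0 -from A -to B              ;# cap one point-to-point path
set_min_delay 1.0 -to [get_pins UDFF2/d]     ;# floor on every path into this flop
```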
[p176] (cont’d)
This is the timing report for the point-to-point delay check. For comparison, the normal timing report is listed on the left. The original timing path is along the arc from CLK to D. With the new set_max_delay constraint we added, the tool now checks the arrival time only up to the A pin of cell u6561.
[p180] Category
SI issues result in two primary failure modes: functional failure due to glitch on a steady signal or
timing failure due to delta delay on a switching signal.
Usually when we talk about crosstalk, the one causing the unwanted switching is called aggressor
while the other is called the victim.
Note that they can swap roles: if we upsize the victim net driver too much, the victim can become a new aggressor to the original aggressor net.
[p181] Glitch
First, let’s look at the glitch issue on the steady victim net. Glitch issue could lead to functional
failure by sending wrong value to the downstream logic.
In this case, assuming a rising transition appears on the aggressor net, the node voltage on the ground capacitance of the victim net will be charged up through the coupling capacitance.
But since the victim net is driven to a steady value, eventually the injected charge will be drained and the voltage level restored.
The magnitude of the glitch is determined by the coupling capacitance, the relative drive strengths of the aggressor and victim, and the ground capacitance of the victim.
The larger the coupling cap, the more charge is transferred to the ground cap, and thus the taller the glitch becomes.
The larger the ground cap, the more charge is needed to build up the voltage, so the glitch becomes shorter.
A strong driver on the aggressor can cause a taller glitch, while a strong driver on the victim net improves its immunity to noise.
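A first-order way to see these dependencies is the classic charge-sharing estimate for a weakly held victim (this ignores the victim driver's restoring current, so it is an upper bound rather than any tool's exact model):

```latex
V_{glitch} \;\approx\; V_{DD}\,\frac{C_{c}}{C_{c} + C_{g}}
```

A larger coupling cap $C_c$ raises the glitch, a larger ground cap $C_g$ lowers it, and a stronger victim driver pulls the real glitch further below this bound.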
If the glitch magnitude is large enough to be captured as a different logic value by the downstream
cell inputs, such as the clock pin or asynchronous set/reset pin, it could result in real functional
failure.
If the glitch is wide enough to propagate through downstream cells and reach a sequential
cell input, it could also cause functional failure.
So how to determine whether or not a glitch can be tolerated by a design? In today’s STA, we use
two types of noise margin to check against the glitch.
A glitch below the DC noise margin limit will not cause a logic value change in downstream logic and also cannot propagate through the fanout, no matter how large the pulse width is.
For example, as shown on the LHS, the fanout of the victim net is an inverter. The output of the inverter will remain low as long as the input voltage is higher than VIH, and will remain high as long as the input voltage is lower than VIL.
Instead of such a clean-cut noise threshold, the AC noise margin takes the glitch width into consideration and defines a safe zone of glitches.
The base delay calculation assumes that the driving cell provides all the necessary charge for a rail-to-rail transition of the total capacitance of the net, where Ctotal = Cground + Ccoupling.
[p187] (cont’d)
An aggressor switching in the opposite direction increases the amount of charge required from the driving cell of the victim net, and increases the delays of the driving cell and the interconnect of the victim net.
The charge required to change the voltage difference across the coupling capacitance from +V to -V effectively doubles the coupling capacitance relative to the baseline delay calculation.
An aggressor switching in the same direction reduces the amount of charge required from the driving cell; the delays of the driving cell and interconnect are also reduced.
Since there is then no voltage difference across the coupling cap, no charge is needed for it, so the coupling cap is effectively cancelled out of the base delay calculation.
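This is often summarized as a Miller-style switching factor k applied to the coupling capacitance (a common textbook approximation, not any specific tool's exact model):

```latex
C_{eff} \;=\; C_{g} + k\,C_{c},\qquad
k =
\begin{cases}
0 & \text{aggressor switches in the same direction}\\
1 & \text{aggressor quiet (base delay)}\\
2 & \text{aggressor switches in the opposite direction}
\end{cases}
```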
Recall what we have talked about: in GBA mode, each signal has an arrival time window that is propagated along the timing path. The arrival time window, or simply the timing window, bounds the earliest and latest possible switching times a signal could have at that node. The worst-case crosstalk is then calculated simply by adding up the crosstalk effect from each individual aggressor.
The aggressor nets that can switch within the arrival time window of a victim net are all assumed
to switch in a direction that maximizes pessimism, as follows:
For minimum-timing analysis, the aggressors are all assumed to switch in the same direction as
the victim, making the delay of the victim net as small as possible.
For maximum-timing analysis, the aggressors are all assumed to switch in the opposite direction
as the victim net, making the delay of the victim net as large as possible.
1. Routing improvement
Keep aggressors away from the victim net to reduce the coupling capacitance. Use non-default routing (NDR) rules such as double spacing, double width, or ground shielding.
2. Gate resizing
Upsize the victim driver or downsize the aggressor driver. Since downsizing usually hurts max timing paths, in practice only upsizing the victim driver is used. But if the victim is overly upsized, it could become a new aggressor.
3. Buffer insertion / net splitting
This is the most effective way to fix noise violations. If the buffer is correctly selected, the noise problem can be resolved without creating a new aggressor. Net splitting works on nets that have more than one fanout.
4. HVT cell swapping
HVT cells have a higher threshold voltage and thus a higher noise margin, so a small glitch is simply filtered out. Replace the original receiver cells on the victim net with HVT devices if you have positive timing margin.
5. Guard ring
Usually applied at a partition boundary or in the area surrounding sensitive circuitry. The guard ring essentially serves as a shield for the portion of the circuit being protected.
Global variation is also called die-to-die variation. Die-to-die variations have a variation radius larger than the die size, including within-wafer, wafer-to-wafer, lot-to-lot, and fab-to-fab variation. These variations affect all the circuits within a die equally. The die-to-die variation of a parameter can be viewed as the deviation of the die-averaged parameter mean from its process mean target.
On-chip variation, on the other hand, refers to the variations that occur between circuit elements on the same die. They can be grouped into systematic and random variations.
So far, all the delay calculations mentioned in our timing verification are deterministic. In traditional corner-based STA, library characterization for one particular corner uses a single data set to represent the delay value under a given condition. We know the PVT corner is usually used to model the global process variation, but how can we use a single delay value to model local variation all across the chip?
If the variations come from the same source of process variation, they are additive, and the standard deviations of the delays can be directly added up. These variations are called systematic variations. They are design dependent, such as layout proximity effects, CMP-related variations, IR drop, temperature maps, etc.
Paths comprised of cells in close proximity exhibit less variation relative to one another. This phenomenon is called the spatial effect.
The random, or statistical, components are related to variations associated with the processing equipment.
As the number of path stages increases, the probability of all gates along that path being simultaneously fast or slow decreases.
Random variation averages out over logic stages. This phenomenon is called statistical cancellation.
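The cancellation can be made concrete with a simple model of N independent, identical stages, each with mean delay $\mu$ and standard deviation $\sigma$ (an idealized sketch; real stages are neither identical nor fully independent):

```latex
\mu_{path} = N\mu,\qquad
\sigma_{path} = \sqrt{N}\,\sigma,\qquad
\frac{\sigma_{path}}{\mu_{path}} = \frac{1}{\sqrt{N}}\cdot\frac{\sigma}{\mu}
```

So the relative spread of a 16-stage path is only a quarter of that of a single stage, which is exactly why a flat derate is pessimistic for deep paths.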
[p195] Statistical STA (SSTA)
As a matter of fact, all these timing parameters such as cell delay, wire delay, timing window, path
slacks are statistical in nature.
For cell delay, manufacturing parameters such as channel length and threshold voltage vary due to global and local process variation. The delay through each timing arc can be represented by a mean and a standard deviation.
For wire delay, electrical properties such as metal thickness and dielectrics can vary for each metal layer. The delay through interconnect can likewise be represented by a mean and a standard deviation.
What's more, since the metallization of each layer is done separately, the wire variation of each layer can be independent of the others; even adjacent layers can have totally different directions of delay variation. This adds another layer of complexity.
Since both cell delay and wire delay are statistical in nature, signal arrival times are also not deterministic. Timing windows can likewise be modelled statistically when calculating crosstalk.
So now the path slack is also represented as a mean and a standard deviation. The pass/fail criterion can be determined based upon the required statistical confidence.
Even though SSTA captures the statistical nature of manufacturing, it is technically complex and needs variation extraction support. The biggest knock against SSTA is that the characterization database is large and run times can be long; large full-chip SSTA really isn't feasible today. In practice, ASIC designers have enhanced deterministic STA tools to take the on-chip variation effect into account rather than using SSTA directly.
Global derating is very conservative. It is unaware of spatial correlation, treating even cells adjacent to each other as if they could have very different variation. It also ignores statistical cancellation: paths are derated exactly the same across the board no matter how deep the logic is. The global OCV method is a safe approach that applies the worst-case variation across the entire chip, but it can result in overdesign, reduced design performance, excessive design margins, and longer timing closure cycles.
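In practice the global OCV margin is applied with the standard set_timing_derate command. A minimal sketch (the percentages are illustrative assumptions, not recommended values):

```tcl
# Flat global OCV: slow down late paths and speed up early paths
# by a fixed margin, applied to every cell and net in the design.
set_timing_derate -late  1.08
set_timing_derate -early 0.92
```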
The AOCV method provides better accuracy and reduces pessimism on long paths (as well as reducing the risk of over-optimism on short paths). Of course, it comes with a cost: these tables have to be built, and if each cell in the library has to be characterized for, say, N path depths and M positions in the path, that's an N-by-M entry table for every cell, derived via SPICE.
Think of a personal computer: we can use it for regular tasks like web browsing, intensive jobs like overclocked gaming, an idle mode like sleep, or a debug mode when running a system diagnosis.
Similarly, an ASIC chip can have many different modes to run different tasks, and each mode requires its own clock configuration and timing constraints. The chip may run at different frequencies, or part of the design may be shut off while the other part remains on. A user could write the constraints for each mode individually, or write a set of constraints that are combined for multiple modes.
Meanwhile, even in the same mode, the chip could be running at different PVT corners depending on process variation, voltage supply variation, and temperature change.
We define the combination of one mode and one corner as a scenario. Since an STA tool can only analyze one scenario at a time, designers have to create a scenario for each functional mode. However, we can imagine that as designs become more and more complex with all kinds of functionality nowadays, the number of scenarios explodes. This increases the difficulty of timing closure dramatically, since after we close timing in one corner, new timing violations in another corner may pop up. So the design has to go through a long iterative process for final timing closure.
Therefore, people seek to merge some of the scenarios into one and create a super mode called the merged mode. How to merge different modes into a single mode is another big topic which will be covered in another course. While the merged-mode methodology can dramatically collapse many analysis modes together, a loss of information (and accuracy) must occur. That's because:
- A timing arc can only have a single set of min/max timing behaviors
- The only safe behavior is to keep the most pessimistic min/max timing across all modes, and then use that timing for every mode.
When each mode is created and analyzed separately, each operating mode has its own unique timing, and every analysis is accurate.
To better help you understand what merged mode means, let's see an example.
In the example shown, two different clock signals are present (CLK A and CLK B), selected by a SEL signal. This means the circuit operates in two modes: Mode 0 (when SEL equals 0 and CLK A drives the circuit) and Mode 1 (when SEL equals 1 and CLK B drives the circuit). Analyzed separately, case analysis is used to select the appropriate clock for each mode. Analyzed together, these modes can be combined. Combining the modes reduces the number of runs needed to get the same coverage. However, there are drawbacks to combining modes:
- Timing pessimism
- Increased memory/runtime
- Increased script complexity
When the modes are analyzed together, both clocks propagate into the network.
- The fast CLK A slew (red) is used for min-delay propagation.
- The slow CLK B slew (blue) is used for max-delay propagation.
Typically, CTS keeps tight control of clock slews. However, the non-constant mux select means the select's slew can also propagate into the clock network.
1) Natural to think of
2) Can be very runtime intensive, and designers may not get a meaningful result in a reasonable time
1) Would be pessimistic compared with MCMM, since it has to be conservative so as not to under-constrain any mode
2) Additional constraints (e.g. new generated clocks) may be needed to model the exclusive relation between merged clocks
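For the mux example, the two styles might look like this in SDC (clock and port names follow the slide; the generated-clock workaround mentioned above is one common option, not the only one):

```tcl
# --- Separate modes: one run per mode, clock selected by case analysis.
set_case_analysis 0 [get_ports SEL]   ;# Mode 0: only CLK_A propagates

# --- Merged mode: both clocks propagate; declare them exclusive so
#     no cross-clock paths are timed between them.
set_clock_groups -logically_exclusive \
    -group [get_clocks CLK_A] -group [get_clocks CLK_B]
```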
Chapter 8
Basically, each input port should have a driving cell specified, to let the STA tool know how much drive strength is driving the logic inside the design scope; each output port should have a load capacitance specified, so the tool has an estimate of how much load the design is going to drive.
Ideally, the driving cell and load capacitance are correlated from a top-level full-chip timing run, so we can see the real timing impact from a global perspective.
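A minimal sketch of such boundary constraints (the library cell name and load value are placeholders):

```tcl
# Model the external driver on each input and the external load on
# each output of the block.
set_driving_cell -lib_cell BUFX4 [all_inputs]
set_load 0.05 [all_outputs]   ;# in library capacitance units
```

Real scripts usually exclude the clock ports from the driving-cell list and give them their own transition constraints.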
Besides, any primary IO or block IO that is left unconstrained should have a valid reason.
In some cases, part of the clock tree from the top-level design passes through the design under analysis; we have to check that the clock network delay meets the requirement and try to minimize the clock latency if needed.
First, we need to make sure the extraction annotation quality is good, so we get accurate delay calculation results for the design. Then we need to make sure the general design rules are met: max transition, max capacitance, and clock pulse width rules. Next, the design should meet the performance targets, which are the setup, hold, recovery, and removal checks. Special timing checks like clock gating checks and data-to-data checks are the next items to look at. Signal integrity issues like glitch and crosstalk need to be addressed as well.
In the end, if some of the violations above have a valid reason not to be fixed, we need to document them and record them in a waiver file.
[p209] Recommendation