Module 2
FLOORPLANNING
AND
PLACEMENT
The input to the floorplanning step is the output of system partitioning and design
entry: a netlist. Floorplanning precedes placement, but we shall cover them
together. The output of the placement step is a set of directions for the routing
tools.
At the start of floorplanning we have a netlist describing circuit blocks, the logic
cells within the blocks, and their connections. For example, Figure 16.1 shows
the Viterbi decoder example as a collection of standard cells with no room set
aside yet for routing. We can think of the standard cells as a hod of bricks to be
made into a wall. What we have to do now is set aside spaces (we call these
spaces the channels ) for the interconnect, the mortar, and arrange the cells.
Figure 16.2 shows a finished wall, after the floorplanning and placement steps are
complete. We still have not completed any routing at this point (that comes later);
all we have done is place the logic cells in a fashion that we hope will minimize
the total interconnect length, for example.
FIGURE 16.1 The starting point for the floorplanning and placement steps for
the Viterbi decoder (containing only standard cells). This is the initial display of
the floorplanning and placement tool. The small boxes that look like bricks are
the outlines of the standard cells. The largest standard cells, at the bottom of the
display (labeled dfctnb), are 188 D flip-flops. The '+' symbols represent the
drawing origins of the standard cells; for the D flip-flops they are shifted to the
left of, and below, the logic cell's bottom left-hand corner. The large box
surrounding all the logic cells represents the estimated chip size. (This is a
screen shot from Cadence Cell Ensemble.)
FIGURE 16.2 The Viterbi decoder (from Figure 16.1 ) after floorplanning and
placement. There are 18 rows of standard cells separated by 17 horizontal
channels (labeled 2 to 18). The channels are routed as numbered. In this example,
the I/O pads are omitted to show the cell placement more clearly. Figure 17.1
shows the same placement without the channel labels. (A screen shot from
Cadence Cell Ensemble.)
16.1 Floorplanning
Figure 16.3 shows that both interconnect delay and gate delay decrease as we
scale down feature sizes, but at different rates. This is because interconnect
capacitance tends to a limit of about 2 pF/cm for a minimum-width wire, while
gate delay continues to decrease (see Section 17.4, Circuit Extraction and DRC).
Floorplanning allows us to predict this interconnect delay by estimating
interconnect length.
FIGURE 16.3 Interconnect and gate delays. As feature sizes decrease, both
average interconnect delay and average gate delay decrease, but at different
rates. This is because interconnect capacitance tends to a limit that is
independent of scaling. Interconnect delay now dominates gate delay.
Table 16.1 shows the estimated metal interconnect lengths, as a function of die
size and fanout, for a series of three-level metal gate arrays. In this case the
interconnect capacitance is about 2 pF/cm, a typical figure.
Figure 16.5 shows that, because we do not decrease chip size as we scale down
feature size, the worst-case interconnect delay increases. One way to measure the
worst-case delay uses an interconnect that completely crosses the chip, a
coast-to-coast interconnect . In certain cases the worst-case delay of a 0.25 µm
process may be worse than that of a 0.35 µm process, for example.
FIGURE 16.5 Worst-case interconnect delay. As we scale circuits, but avoid
scaling the chip size, the worst-case interconnect delay increases.
FIGURE 16.7 Congestion analysis. (a) The initial floorplan with a 2:1.5 die
aspect ratio. (b) Altering the floorplan to give a 1:1 chip aspect ratio. (c) A trial
floorplan with a congestion map. Blocks A and C have been placed so that we
know the terminal positions in the channels. Shading indicates the ratio of
channel density to the channel capacity. Dark areas show regions that cannot be
routed because the channel congestion exceeds the estimated capacity.
(d) Resizing flexible blocks A and C alleviates congestion.
FIGURE 16.8 Routing a T-junction between two channels in two-level metal.
The dots represent logic cell pins. (a) Routing channel A (the stem of the T) first
allows us to adjust the width of channel B. (b) If we route channel B first (the
top of the T), this fixes the width of channel A. We have to route the stem of a
T-junction before we route the top.
FIGURE 16.9 Defining the channel routing order for a slicing floorplan using a
slicing tree. (a) Make a cut all the way across the chip between circuit blocks.
Continue slicing until each piece contains just one circuit block. Each cut divides
a piece into two without cutting through a circuit block. (b) A sequence of cuts:
1, 2, 3, and 4 that successively slices the chip until only circuit blocks are left.
(c) The slicing tree corresponding to the sequence of cuts gives the order in
which to route the channels: 4, 3, 2, and finally 1.
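The routing order described in Figure 16.9 is a post-order traversal of the slicing tree: a cut (and its channel) is routed only after all the channels inside the two pieces it separates. A minimal sketch, using a made-up slicing tree rather than the one in the figure:

```python
# Post-order traversal of a slicing tree gives a valid channel routing order:
# route the channels inside each piece before the cut that separates them.

class Node:
    """A slicing-tree node: a cut with two children, or a leaf circuit block."""
    def __init__(self, label, left=None, right=None):
        self.label = label    # cut number, or a block name at a leaf
        self.left = left
        self.right = right

def routing_order(node, order=None):
    """Collect cut labels in the order their channels should be routed."""
    if order is None:
        order = []
    if node.left or node.right:          # an internal node is a cut
        routing_order(node.left, order)
        routing_order(node.right, order)
        order.append(node.label)         # the cut itself is routed last
    return order

# A hypothetical slicing sequence: cut 1 crosses the whole chip; cuts 2-4
# slice the resulting pieces down to single blocks A-E.
tree = Node(1,
            Node(2, Node("A"), Node("B")),
            Node(3, Node("C"),
                    Node(4, Node("D"), Node("E"))))

print(routing_order(tree))   # children before parents, so cut 1 comes last
```

Note that the earliest cut (cut 1, made first during slicing) is always routed last, matching the 4, 3, 2, 1 order in the figure.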
Figure 16.10 shows a floorplan that is not a slicing structure. We cannot cut the
chip all the way across with a knife without chopping a circuit block in two. This
means we cannot route any of the channels in this floorplan without routing all of
the other channels first. We say there is a cyclic constraint in this floorplan. There
are two solutions to this problem. One solution is to move the blocks until we
obtain a slicing floorplan. The other solution is to allow the use of L-shaped,
rather than rectangular, channels (or areas with fixed connectors on all sides: a
switch box ). We need an area-based router rather than a channel router to route
L-shaped regions or switch boxes (see Section 17.2.6, Area-Routing Algorithms).
Figure 16.11 (a) displays the floorplan of the ASIC shown in Figure 16.7 . We
can remove the cyclic constraint by moving the blocks again, but this increases
the chip size. Figure 16.11 (b) shows an alternative solution. We merge the
flexible standard cell areas A and C. We can do this by selective flattening of the
netlist. Sometimes flattening can reduce the routing area because routing between
blocks is usually less efficient than routing inside the row-based blocks.
Figure 16.11 (b) shows the channel definition and routing order for our chip.
FIGURE 16.11 Channel definition and ordering. (a) We can eliminate the cyclic
constraint by merging the blocks A and C. (b) A slicing structure.
16.1.5 I/O and Power Planning
Every chip communicates with the outside world. Signals flow onto and off the
chip and we need to supply power. We need to consider the I/O and power
constraints early in the floorplanning process. A silicon chip or die (plural die,
dies, or dice) is mounted on a chip carrier inside a chip package . Connections are
made by bonding the chip pads to fingers on a metal lead frame that is part of the
package. The metal lead-frame fingers connect to the package pins . A die
consists of a logic core inside a pad ring . Figure 16.12 (a) shows a pad-limited
die and Figure 16.12 (b) shows a core-limited die . On a pad-limited die we use
tall, thin pad-limited pads , which maximize the number of pads we can fit
around the outside of the chip. On a core-limited die we use short, wide
core-limited pads . Figure 16.12 (c) shows how we can use both types of pad to
change the aspect ratio of a die to be different from that of the core.
FIGURE 16.12 Pad-limited and core-limited die. (a) A pad-limited die. The
number of pads determines the die size. (b) A core-limited die: The core logic
determines the die size. (c) Using both pad-limited pads and core-limited pads
for a square die.
Special power pads are used for the positive supply, or VDD, power buses (or
power rails ) and the ground or negative supply, VSS or GND. Usually one set of
VDD/VSS pads supplies one power ring that runs around the pad ring and
supplies power to the I/O pads only. Another set of VDD/VSS pads connects to a
second power ring that supplies the logic core. We sometimes call the I/O power
dirty power since it has to supply large transient currents to the output transistors.
We keep dirty power separate to avoid injecting noise into the internal-logic
power (the clean power ). I/O pads also contain special circuits to protect against
electrostatic discharge ( ESD ). These circuits can withstand very short
high-voltage (several kilovolt) pulses that can be generated during human or
machine handling.
Depending on the type of package and how the foundry attaches the silicon die to
the chip cavity in the chip carrier, there may be an electrical connection between
the chip carrier and the die substrate. Usually the die is cemented in the chip
cavity with a conductive epoxy, making an electrical connection between
substrate and the package cavity in the chip carrier. If we make an electrical
connection between the substrate and a chip pad, or to a package pin, it must be
to VDD ( n -type substrate) or VSS ( p -type substrate). This substrate connection
(for the whole chip) employs a down bond (or drop bond) to the carrier. We have
several options:
● We can dedicate one (or more) chip pad(s) to down bond to the chip
carrier.
● We can make a connection from a chip pad to the lead frame and down
bond from the chip pad to the chip carrier.
● We can make a connection from a chip pad to the lead frame and down
bond from the lead frame.
● We can down bond from the lead frame without using a chip pad.
Depending on the package design, the type and positioning of down bonds may
be fixed. This means we need to fix the position of the chip pad for down
bonding using a pad seed .
A double bond connects two pads to one chip-carrier finger and one package pin.
We can do this to save package pins or reduce the series inductance of bond
wires (typically a few nanohenries) by parallel connection of the pads. A
multiple-signal pad or pad group is a set of pads. For example, an oscillator pad
usually comprises a set of two adjacent pads that we connect to an external
crystal. The oscillator circuit and the two signal pads form a single logic cell.
Another common example is a clock pad . Some foundries allow a special form
of corner pad (normal pads are edge pads ) that squeezes two pads into the area at
the corners of a chip using a special two-pad corner cell , to help meet bond-wire
angle design rules (see also Figure 16.13 b and c).
To reduce the series resistive and inductive impedance of power supply networks,
it is normal to use multiple VDD and VSS pads. This is particularly important
with the simultaneously switching outputs ( SSOs ) that occur when driving buses
off-chip [ Wada, Eino, and Anami, 1990]. The output pads can easily consume
most of the power on a CMOS ASIC, because the load on a pad (usually tens of
picofarads) is much larger than typical on-chip capacitive loads. Depending on
the technology it may be necessary to provide dedicated VDD and VSS pads for
every few SSOs. Design rules set how many SSOs can be used per VDD/VSS
pad pair. These dedicated VDD/VSS pads must follow groups of output pads as
they are seeded or planned on the floorplan. With some chip packages this can
become difficult because design rules limit the location of package pins that may
be used for supplies (due to the differing series inductance of each pin).
Using a pad mapping we translate the logical pad in a netlist to a physical pad
from a pad library . We might control pad seeding and mapping in the
floorplanner. The handling of I/O pads can become quite complex; there are
several nonobvious factors that must be considered when generating a pad ring:
● Ideally we would only need to design library pad cells for one orientation.
For example, an edge pad for the south side of the chip, and a corner pad
for the southeast corner. We could then generate other orientations by
rotation and flipping (mirroring). Some ASIC vendors will not allow
rotation or mirroring of logic cells in the mask file. To avoid these
problems we may need to have separate horizontal, vertical, left-handed,
and right-handed pad cells in the library with appropriate logical to
physical pad mappings.
● If we mix pad-limited and core-limited edge pads in the same pad ring, this
complicates the design of corner pads. Usually the two types of edge pad
cannot abut. In this case a corner pad also becomes a pad-format changer ,
or hybrid corner pad .
● In single-supply chips we have one VDD net and one VSS net, both global
power nets . It is also possible to use mixed power supplies (for example,
3.3 V and 5 V) or multiple power supplies ( digital VDD, analog VDD).
Figure 16.13 (a) and (b) are magnified views of the southeast corner of our
example chip and show the different types of I/O cells. Figure 16.13 (c) shows a
stagger-bond arrangement using two rows of I/O pads. In this case the design
rules for bond wires (the spacing and the angle at which the bond wires leave the
pads) become very important.
FIGURE 16.13 Bonding pads. (a) This chip uses both pad-limited and
core-limited pads. (b) A hybrid corner pad. (c) A chip with stagger-bonded pads.
(d) An area-bump bonded chip (or flip-chip). The chip is turned upside down
and solder bumps connect the pads to the lead frame.
FIGURE 16.14 Gate-array I/O pads. (a) Cell-based ASICs may contain pad
cells of different sizes and widths. (b) A corner of a gate-array base. (c) A
gate-array base with different I/O cell and pad pitches.
FIGURE 16.15 Power distribution. (a) Power distributed using m1 for VSS and
m2 for VDD. This helps minimize the number of vias and layer crossings needed
but causes problems in the routing channels. (b) In this floorplan m1 is run
parallel to the longest side of all channels, the channel spine. This can make
automatic routing easier but may increase the number of vias and layer
crossings. (c) An expanded view of part of a channel (interconnect is shown as
lines). If power runs on different layers along the spine of a channel, this forces
signals to change layers. (d) A closeup of VDD and VSS buses as they cross.
Changing layers requires a large number of via contacts to reduce resistance.
Figure 16.15 shows two possible power distribution schemes. The long direction
of a rectangular channel is the channel spine . Some automatic routers may
require that metal lines parallel to a channel spine use a preferred layer (either
m1, m2, or m3). Alternatively we say that a particular metal layer runs in a
preferred direction . Since we can have both horizontal and vertical channels, we
may have the situation shown in Figure 16.15 , where we have to decide whether
to use a preferred layer or the preferred direction for some channels. This may or
may not be handled automatically by the routing software.
Clock skew represents a fraction of the clock period that we cannot use for
computation. A clock skew of 500 ps with a 200 MHz clock means that we waste
500 ps of every 5 ns clock cycle, or 10 percent of performance. Latency can
cause a similar loss of performance at the system level when we need to
resynchronize our output signals with a master system clock.
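As a quick arithmetic check of the figures above:

```python
# Fraction of each clock cycle lost to skew, using the numbers in the text.
clock_freq = 200e6          # 200 MHz clock
skew = 500e-12              # 500 ps of clock skew

period = 1.0 / clock_freq   # 5 ns clock period
lost = skew / period        # fraction of the cycle unavailable for computation

print(f"period = {period * 1e9:.1f} ns, performance lost = {lost:.0%}")
# prints: period = 5.0 ns, performance lost = 10%
```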
Figure 16.16 (c) illustrates the construction of a clock-driver cell. The delay
through a chain of CMOS gates is minimized when the ratio between the input
capacitance C 1 and the output (load) capacitance C 2 is about 3 (exactly e ≈ 2.7,
an exponential ratio, if we neglect the effect of parasitics). This means that the
fastest way to drive a large load is to use a chain of buffers with their input and
output loads chosen to maintain this ratio, or taper (we use this as a noun and a
verb). This is not necessarily the smallest or lowest-power method, though.
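The number of stages follows from the taper: each stage multiplies the drivable capacitance by the taper t, so driving a load C L from an input capacitance C 1 takes roughly ln( C L / C 1 )/ln( t ) stages. A sketch (the 20 fF input capacitance is an assumed value, not from the text; the 800 pF load is the figure used later):

```python
import math

def chain_stages(c_in, c_load, taper=3.0):
    """Smallest n such that c_in * taper**n >= c_load."""
    return math.ceil(math.log(c_load / c_in) / math.log(taper))

c_in = 0.02e-12    # 20 fF input capacitance of the first buffer (assumed)
c_load = 800e-12   # 800 pF total clock load (figure used later in the text)

print(chain_stages(c_in, c_load), "stages at a taper of 3")
# prints: 10 stages at a taper of 3
```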
Suppose we have an ASIC with the following specifications:
● 40,000 flip-flops
The power dissipated charging the input capacitance of the flip-flop clock is
fCV^2 or, with the 200 MHz clock, 800 pF total load, and 3.3 V supply used
below, about (200 MHz)(800 pF)(3.3 V)^2 ≈ 1.7 W.
All of this power is dissipated in the clock-driver cell. The worst problem,
however, is the enormous peak current in the final inverter stage. If we assume
the needed rise time is 0.1 ns (with a 200 MHz clock whose period is 5 ns), the
peak current would have to approach
I = (800 pF)(3.3 V) / (0.1 ns) ≈ 25 A . (16.4)
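Plugging in the same numbers (800 pF load, 3.3 V supply, 200 MHz clock, 0.1 ns rise time) checks both the power and the peak current:

```python
# Clock-driver power and peak current from the figures quoted in the text.
f = 200e6        # clock frequency
C = 800e-12      # total clock-input capacitance
V = 3.3          # supply voltage
t_rise = 0.1e-9  # required clock rise time

power = f * C * V * V     # P = f C V^2, dissipated in the clock driver
i_peak = C * V / t_rise   # I = C dV/dt, approximated over the rise time

print(f"P = {power:.2f} W, I_peak = {i_peak:.1f} A")
```

This gives about 1.7 W and a little over 26 A, consistent with the roughly 25 A of Eq. 16.4.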
Designing a clock tree that balances the rise and fall times at the leaf nodes has
the beneficial side-effect of minimizing the effect of hot-electron wearout . This
problem occurs when an electron gains enough energy to become hot and jump
out of the channel into the gate oxide (the problem is worse for electrons in n
-channel devices because electrons are more mobile than holes). The trapped
electrons change the threshold voltage of the device and this alters the delay of
the buffers. As the buffer delays change with time, this introduces unpredictable
skew. The problem is worst when the n -channel device is carrying maximum
current with a high voltage across the channel; this occurs during the rise- and
fall-time transitions. Balancing the rise and fall times in each buffer means that
they all wear out at the same rate, minimizing any additional skew.
A phase-locked loop ( PLL ) is an electronic flywheel that locks in frequency to
an input clock signal. The input and output frequencies may differ in phase,
however. This means that we can, for example, drive a clock network with a PLL
in such a way that the output of the clock network is locked in phase to the
incoming clock, thus eliminating the latency of the clock network . A PLL can
also help to reduce random variation of the input clock frequency, known as jitter
, which, since it is unpredictable, must also be discounted from the time available
for computation in each clock cycle. Actel was one of the first FPGA vendors to
incorporate PLLs, and Actel's online product literature explains their use in ASIC
design.
Most ASICs currently use two or three levels of metal for signal routing. With two
layers of metal, we route within the rectangular channels using the first metal layer for
horizontal routing, parallel to the channel spine, and the second metal layer for the
vertical direction (if there is a third metal layer it will normally run in the horizontal
direction again). The maximum number of horizontal interconnects that can be placed
side by side, parallel to the channel spine, is the channel capacity .
FIGURE 16.19 Gate-array interconnect. (a) A small two-level metal gate array
(about 4.6 k-gate). (b) Routing in a block. (c) Channel routing showing channel
density and channel capacity. The channel height on a gate array may only be
increased in increments of a row. If the interconnect does not use up all of the
channel, the rest of the space is wasted. The interconnect in the channel runs in m1 in
the horizontal direction with m2 in the vertical direction.
Vertical interconnect uses feedthroughs (or feedthrus in the United States) to cross the
logic cells. Here are some commonly used terms with explanations (there are no
generally accepted definitions):
● An unused vertical track (or just track ) in a logic cell is called an uncommitted
feedthrough (also built-in feedthrough , implicit feedthrough , or jumper ).
● A vertical strip of metal that runs from the top to bottom of a cell (for
double-entry cells ), but has no connections inside the cell, is also called a
feedthrough or jumper.
● Two connectors for the same physical net are electrically equivalent connectors
(or equipotential connectors ). For double-entry cells these are usually at the top
and bottom of the logic cell.
● A dedicated feedthrough cell (or crosser cell ) is an empty cell (with no logic)
that can hold one or more vertical interconnects. These are used if there are no
other feedthroughs available.
● A feedthrough pin or feedthrough terminal is an input or output that has
connections at both the top and bottom of the standard cell.
● A spacer cell (usually the same as a feedthrough cell) is used to fill space in
rows so that the ends of all rows in a flexible block may be aligned to connect
to power buses, for example.
There is no standard terminology for connectors and the terms can be very confusing.
There is a difference between connectors that are joined inside the logic cell using a
high-resistance material such as polysilicon and connectors that are joined by
low-resistance metal. The high-resistance kind are really two separate alternative
connectors (that cannot be used as a feedthrough), whereas the low-resistance kind are
electrically equivalent connectors. There may be two or more connectors to a logic
cell, which are not joined inside the cell, and which must be joined by the router (
must-join connectors ).
There are also logically equivalent connectors (or functionally equivalent connectors,
sometimes also called just equivalent connectors, which is very confusing). The two
inputs of a two-input NAND gate may be logically equivalent connectors. The
placement tool can swap these without altering the logic (but the two inputs may have
different delay properties, so it is not always a good idea to swap them). There can
also be logically equivalent connector groups . For example, in an OAI22
(OR-AND-INVERT) gate there are four inputs: A1, A2 are inputs to one OR gate
(gate A), and B1, B2 are inputs to the second OR gate (gate B). Then group A = (A1,
A2) is logically equivalent to group B = (B1, B2): if we swap one input (A1 or A2)
from gate A to gate B, we must swap the other input in the group (A2 or A1).
In the case of channeled gate arrays and FPGAs, the horizontal interconnect areas
(the channels, usually on m1) have a fixed capacity (sometimes they are called
fixed-resource ASICs for this reason). The channel capacity of CBICs and channelless
MGAs can be expanded to hold as many interconnects as are needed. Normally we
choose, as an objective, to minimize the number of interconnects that use each
channel. In the vertical interconnect direction, usually m2, FPGAs still have fixed
resources. In contrast the placement tool can always add vertical feedthroughs to a
channeled MGA, channelless MGA, or CBIC. These problems become less important
as we move to three and more levels of interconnect.
16.2.2 Placement Goals and Objectives
The goal of a placement tool is to arrange all the logic cells within the flexible blocks
on a chip. Ideally, the objectives of the placement step are to
● Guarantee the router can complete the routing step
Objectives such as these are difficult to define in a way that can be solved with an
algorithm and even harder to actually meet. Current placement tools use more specific
and achievable criteria. The most commonly used placement objectives are one or
more of the following:
● Minimize the total estimated interconnect length
The minimum rectilinear Steiner tree ( MRST ) is the shortest interconnect using a
rectangular grid. The determination of the MRST is in general an NP-complete
problem, which means it is hard to solve. For small numbers of terminals heuristic
algorithms do exist, but they are expensive to compute. Fortunately we only need to
estimate the length of the interconnect. Two approximations to the MRST are shown
in Figure 16.21 .
The complete graph has connections from each terminal to every other terminal [
Hanan, Wolff, and Agule, 1973]. The complete-graph measure adds all the
interconnect lengths of the complete-graph connection together and then divides by
n/2, where n is the number of terminals. We can justify this since, in a graph with n
terminals, (n - 1) interconnects will emanate from each terminal to join the other
(n - 1) terminals in a complete graph connection. That makes n(n - 1) interconnects in
total. However, we have then counted each connection twice. So there are one-half
this many, or n(n - 1)/2, interconnects needed for a complete graph connection. Now
we actually only need (n - 1) interconnects to join n terminals, so we have n/2 times
as many interconnects as we really need. Hence we divide the total net length of the
complete-graph connection by n/2 to obtain a more reasonable estimate of minimum
interconnect length. Figure 16.21 (a) shows an example of the complete-graph
measure.
FIGURE 16.21 Interconnect-length
measures. (a) Complete-graph
measure. (b) Half-perimeter measure.
The bounding box is the smallest rectangle that encloses all the terminals (not to be
confused with a logic cell bounding box, which encloses all the layout in a logic cell).
The half-perimeter measure (or bounding-box measure) is one-half the perimeter of
the bounding box ( Figure 16.21 b) [ Schweikert, 1976]. For nets with two or three
terminals (corresponding to a fanout of one or two, which usually includes over 50
percent of all nets on a chip), the half-perimeter measure is the same as the minimum
Steiner tree. For nets with four or five terminals, the minimum Steiner tree is between
one and two times the half-perimeter measure [ Hanan, 1966]. For a circuit with m
nets, using the half-perimeter measure corresponds to minimizing the cost function

f = (1/2) Σ_{i=1}^{m} h_i , (16.5)

where h_i is the half-perimeter measure of net i.
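Both measures are easy to compute from terminal coordinates. A sketch using Manhattan distances and a made-up three-terminal net (for which, as noted above, the half-perimeter measure equals the minimum Steiner tree length):

```python
from itertools import combinations

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def complete_graph_measure(terminals):
    """Total length of all n(n-1)/2 pairwise connections, divided by n/2."""
    n = len(terminals)
    total = sum(manhattan(p, q) for p, q in combinations(terminals, 2))
    return total / (n / 2)

def half_perimeter(terminals):
    """Half the perimeter of the bounding box of the terminals."""
    xs = [p[0] for p in terminals]
    ys = [p[1] for p in terminals]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

net = [(0, 0), (2, 3), (4, 1)]   # terminal positions, made up for illustration
print(half_perimeter(net))           # prints: 7
print(complete_graph_measure(net))
```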
It does not really matter if our approximations are inaccurate if there is a good
correlation between actual interconnect lengths (after routing) and our
approximations. Figure 16.22 shows that we can adjust the complete-graph and
half-perimeter measures using correction factors [ Goto and Matsuda, 1986]. Now our
wiring length approximations are functions, not just of the terminal positions, but also
of the number of terminals, and the size of the bounding box. One practical example
adjusts a Steiner-tree approximation using the number of terminals [ Chao, Nequist,
and Vuong, 1990]. This technique is used in the Cadence Gate Ensemble placement
tool, for example.
FIGURE 16.22 Correlation between total length
of chip interconnect and the half-perimeter and
complete-graph measures.
One problem with the measurements we have described is that the MRST may only
approximate the interconnect that will be completed by the detailed router. Some
programs have a meander factor that specifies, on average, the ratio of the
interconnect created by the routing tool to the interconnect-length estimate used by the
placement tool. Another problem is that we have concentrated on finding estimates to
the MRST, but the MRST that minimizes total net length may not minimize net delay
(see Section 16.2.8 ).
FIGURE 16.24 Min-cut placement. (a) Divide the chip into bins using a grid.
(b) Merge all connections to the center of each bin. (c) Make a cut and swap
logic cells between bins to minimize the cost of the cut. (d) Take the cut pieces
and throw out all the edges that are not inside the piece. (e) Repeat the process
with a new cut and continue until we reach the individual bins.
Usually we divide the placement area into bins . The size of a bin can vary, from a bin
size equal to the base cell (for a gate array) to a bin size that would hold several logic
cells. We can start with a large bin size, to get a rough placement, and then reduce the
bin size to get a final placement.
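The essential min-cut move, swapping cells across a cut to reduce the number of crossing nets, can be sketched in a few lines (the netlist and initial split are made up; a production tool would use a group-migration method such as Kernighan-Lin rather than this simple greedy pass):

```python
# One greedy improvement pass for a single cut: try swapping cell pairs
# across the cut and keep a swap whenever it reduces the number of nets
# that cross the cut.

def cut_cost(nets, side):
    """Number of nets with cells on both sides of the cut."""
    return sum(len({side[cell] for cell in net}) > 1 for net in nets)

def improve(nets, side):
    """Try every cross-cut pair swap once; keep only improving swaps."""
    best = cut_cost(nets, side)
    cells = list(side)
    for a in cells:
        for b in cells:
            if side[a] != side[b]:
                side[a], side[b] = side[b], side[a]
                cost = cut_cost(nets, side)
                if cost < best:
                    best = cost
                else:
                    side[a], side[b] = side[b], side[a]   # undo
    return best

nets = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]   # made-up netlist
side = {"a": 0, "b": 1, "c": 0, "d": 1}                   # initial split
print(cut_cost(nets, side), "->", improve(nets, side))    # prints: 4 -> 2
```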
The eigenvalue placement algorithm uses the cost matrix or weighted connectivity
matrix ( eigenvalue methods are also known as spectral methods ) [Hall, 1970]. The
measure we use is a cost function f that we shall minimize, given by
f = (1/2) Σ_{i,j=1}^{n} c_ij d_ij^2 , (16.6)
f = x^T B x + y^T B y . (16.7)
In Eq. 16.7 , B is a symmetric matrix, the disconnection matrix (also called the
Laplacian).
We may express the Laplacian B in terms of the connectivity matrix C and a
diagonal matrix D (known as the degree matrix), defined as follows:
B = D - C ; (16.8)

d_ii = Σ_{j=1}^{n} c_ij , i = 1, ... , n ; d_ij = 0 , i ≠ j
must be a permutation of the fixed positions, p . We can show that requiring the logic
cells to be in fixed positions in this way leads to a series of n equations restricting the
values of the logic cell coordinates [ Cheng and Kuh, 1984]. If we impose all of these
constraint equations the problem becomes very complex. Instead we choose just one
of the equations:
Σ_{i=1}^{n} x_i^2 = Σ_{i=1}^{n} p_i^2 . (16.11)
Simplifying the problem in this way will lead to an approximate solution to the
placement problem. We can write this single constraint on the x -coordinates in matrix
form:
x^T x = P ; P = Σ_{i=1}^{n} p_i^2 , (16.12)
where P is a constant. We can now summarize the formulation of the problem, with
the simplifications that we have made, for a one-dimensional solution. We must
minimize a cost function, g (analogous to the cost function f that we defined for the
two-dimensional problem in Eq. 16.7 ), where
g = x T Bx . (16.13)
This last equation is called the characteristic equation for the disconnection matrix B
and occurs frequently in matrix algebra (this λ has nothing to do with scaling). The
solutions to this equation are the eigenvectors and eigenvalues of B . Multiplying Eq.
16.16 by x^T we get:

λ x^T x = x^T B x . (16.17)
The eigenvectors of the disconnection matrix B are the solutions to our placement
problem. It turns out that (because the rank of matrix B is n - 1) there is a
degenerate solution with all x -coordinates equal (λ = 0); this makes some sense
because putting all the logic cells on top of one another certainly minimizes the
interconnect. The smallest nonzero eigenvalue and the corresponding eigenvector
provide the solution that we want. In the two-dimensional placement problem, the x -
and y -coordinates are given by the eigenvectors corresponding to the two smallest
nonzero eigenvalues. (In the next section a simple example illustrates this
mathematical derivation.)
16.2.5 Eigenvalue Placement Example
Consider the following connectivity matrix C and its disconnection matrix B ,
calculated from Eq. 16.8 [ Hall, 1970]:
C = | 0 0 0 1 |
    | 0 0 1 1 |
    | 0 1 0 0 |
    | 1 1 0 0 |

B = | 1 0 0 0 |   | 0 0 0 1 |   |  1  0  0 -1 |
    | 0 2 0 0 | - | 0 0 1 1 | = |  0  2 -1 -1 |
    | 0 0 1 0 |   | 0 1 0 0 |   |  0 -1  1  0 |
    | 0 0 0 2 |   | 1 1 0 0 |   | -1 -1  0  2 |
(16.19)
Figure 16.25 (a) shows the corresponding network with four logic cells (1 to 4) and
three nets (A to C). We can use a MatLab script to find the eigenvalues and
eigenvectors of B .
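The same computation can be sketched in Python with NumPy in place of MatLab:

```python
import numpy as np

# Connectivity matrix C for the four-cell, three-net example.
C = np.array([[0, 0, 0, 1],
              [0, 0, 1, 1],
              [0, 1, 0, 0],
              [1, 1, 0, 0]])

D = np.diag(C.sum(axis=1))   # degree matrix: d_ii is the sum of row i of C
B = D - C                    # disconnection matrix (Laplacian), Eq. 16.8

# For a symmetric matrix, eigh returns eigenvalues in ascending order, so
# column 1 of the eigenvector matrix is the one we want: the eigenvector
# for the smallest nonzero eigenvalue, 2 - sqrt(2) = 0.5858.
eigenvalues, eigenvectors = np.linalg.eigh(B)
print(np.round(eigenvalues, 4))
print(np.round(eigenvectors[:, 1], 4))   # x-coordinates for the placement
```

(The overall sign of each eigenvector is arbitrary; NumPy may return either sense.)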
For a one-dimensional placement ( Figure 16.25 b), we use the eigenvector (0.6533,
-0.2706, -0.6533, 0.2706) corresponding to the smallest nonzero eigenvalue (which is
0.5858) to place the logic cells along the x -axis. The two-dimensional placement (
Figure 16.25 c) uses these same values for the x -coordinates and the eigenvector (0.5,
-0.5, 0.5, -0.5) that corresponds to the next largest eigenvalue (which is 2.0) for the y
-coordinates. Notice that the placement shown in Figure 16.25 (c), which shows
logic-cell outlines (the logic-cell abutment boxes), takes no account of the cell sizes,
and cells may even overlap at this stage. This is because, in Eq. 16.11 , we discarded
all but one of the constraints necessary to ensure valid solutions. Often we use the
approximate eigenvalue solution as an initial placement for one of the iterative
improvement algorithms that we shall discuss in Section 16.2.6 .
● The measurement criterion that decides whether to move the selected cells.
There are several interchange or iterative exchange methods that differ in their
selection and measurement criteria:
● pairwise interchange,
● force-directed interchange,
All of these methods usually consider only pairs of logic cells to be exchanged. A
source logic cell is picked for trial exchange with a destination logic cell. We have
already discussed the use of interchange methods applied to the system partitioning
step. The most widely used methods use group migration, especially the
Kernighan-Lin algorithm. The pairwise-interchange algorithm is similar to the
interchange algorithm used for iterative improvement in the system partitioning step:
1. Select the source logic cell at random.
2. Try all the other logic cells in turn as the destination logic cell.
3. Use any of the measurement methods we have discussed to decide whether
to accept the interchange.
4. The process repeats from step 1, selecting each logic cell in turn as a source
logic cell.
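The four steps above can be sketched directly, using the half-perimeter measure from Section 16.2.3 as the measurement method (the netlist and cell positions are made up for illustration):

```python
import random

def hpwl(net, pos):
    """Half-perimeter wirelength of one net."""
    xs = [pos[cell][0] for cell in net]
    ys = [pos[cell][1] for cell in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def total_length(nets, pos):
    return sum(hpwl(net, pos) for net in nets)

def pairwise_interchange(nets, pos, passes=4, seed=0):
    rng = random.Random(seed)
    cells = list(pos)
    for _ in range(passes):                  # repeated passes approximate step 4
        src = rng.choice(cells)              # step 1: pick a source cell
        for dst in cells:                    # step 2: try every destination
            if dst == src:
                continue
            before = total_length(nets, pos)
            pos[src], pos[dst] = pos[dst], pos[src]
            if total_length(nets, pos) >= before:          # step 3: measure
                pos[src], pos[dst] = pos[dst], pos[src]    # undo a bad swap
    return total_length(nets, pos)

nets = [("a", "b"), ("b", "c"), ("a", "c", "d")]           # made-up netlist
pos = {"a": (0, 0), "b": (3, 3), "c": (0, 3), "d": (3, 0)}
print(total_length(nets, pos), "->", pairwise_interchange(nets, pos))
# prints: 15 -> 12
```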
Figure 16.26 (a) and (b) show how we can extend pairwise interchange to swap more
than two logic cells at a time. If we swap l logic cells at a time and find a locally
optimum solution, we say that solution is l -optimum . The neighborhood exchange
algorithm is a modification to pairwise interchange that considers only destination
logic cells in a neighborhood (cells within a certain distance, e, of the source logic
cell). Limiting the search area for the destination logic cell to the e-neighborhood
reduces the search time. Figure 16.26 (c) and (d) show the one- and
two-neighborhoods (based on Manhattan distance) for a logic cell.
FIGURE 16.26 Interchange. (a) Swapping the source logic cell with a destination
logic cell in pairwise interchange. (b) Sometimes we have to swap more than two
logic cells at a time to reach an optimum placement, but this is expensive in
computation time. Limiting the search to neighborhoods reduces the search time.
Logic cells within a distance e of a logic cell form an e-neighborhood. (c) A
one-neighborhood. (d) A two-neighborhood.
The vector component x ij is directed from the center of logic cell i to the center of
logic cell j . The vector magnitude is calculated as either the Euclidean or Manhattan
distance between the logic cell centers. The c ij form the connectivity or cost matrix
(the matrix element c ij is the number of connections between logic cell i and logic
cell j ). If we want, we can also weight the c ij to denote critical connections.
Figure 16.27 illustrates the force-directed placement algorithm.
FIGURE 16.27 Force-directed placement. (a) A network with nine logic cells.
(b) We make a grid (one logic cell per bin). (c) Forces are calculated as if springs
were attached to the centers of each logic cell for each connection. The two nets
connecting logic cells A and I correspond to two springs. (d) The forces are
proportional to the spring extensions.
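As a small worked example of the spring model: with forces proportional to spring extension, the zero-force position of a movable logic cell is the connectivity-weighted mean of the centers of the cells it connects to. The coordinates and weights below are illustrative assumptions, not the network of Figure 16.27:

```python
# Force-directed placement sketch. Each connection behaves like a spring,
# so the net force on a movable cell vanishes at the c_ij-weighted mean
# of its neighbors' centers.

def zero_force_target(neighbors):
    """neighbors: list of ((x, y) center, c_ij weight) for connected cells."""
    wsum = sum(w for _, w in neighbors)
    x = sum(x * w for (x, _), w in neighbors) / wsum
    y = sum(y * w for (_, y), w in neighbors) / wsum
    return x, y

# A movable cell connected to a cell at (0, 0) by two nets (weight 2)
# and to a cell at (3, 3) by one net (weight 1):
print(zero_force_target([((0.0, 0.0), 2), ((3.0, 3.0), 1)]))  # (1.0, 1.0)
```

Iteratively moving each cell to (or toward) its zero-force target, then resolving the overlaps, is the core of force-directed relaxation schemes.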
Kirkpatrick, Gelatt, and Vecchi first described the use of simulated annealing applied
to VLSI problems [ 1983]. Experience since that time has shown that simulated
annealing normally requires the use of a slow cooling schedule and this means long
CPU run times [ Sechen, 1988; Wong, Leong, and Liu, 1988]. As a general rule,
experiments show that simple min-cut based constructive placement is faster than
simulated annealing but that simulated annealing is capable of giving better results at
the expense of long computer run times. The iterative improvement methods that we
described earlier are capable of giving results as good as simulated annealing, but they
use more complex algorithms.
While I am making wild generalizations, I will digress to discuss benchmarks of
placement algorithms (or any CAD algorithm that is random). It is important to
remember that the results of random methods are themselves random. Suppose the
results from two random algorithms, A and B, can each vary by ±10 percent for any
chip placement, but both algorithms have the same average performance. If we
compare single chip placements by both algorithms, they could falsely show
algorithm A to be better than B by up to 20 percent or vice versa. Put another way, if
we run enough test cases we will eventually find some for which A is better than B by
20 percent, a trick that Ph.D. students and marketing managers both know well. Even
single-run evaluations over multiple chips are hardly a fair comparison. The only way
to obtain meaningful results is to compare a statistically meaningful number of runs
for a statistically meaningful number of chips for each algorithm. This same caution
applies to any VLSI algorithm that is random. There was a Design Automation
Conference panel session whose theme was "Enough of algorithms claiming
improvements of 5%."
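The pitfall can be illustrated with a small simulation, a sketch under the stated assumption that both algorithms have the same average result with roughly ±10 percent run-to-run variation:

```python
import random

# Two hypothetical random placement algorithms with identical average
# quality (mean wire length 100 units) but +/-10 percent run-to-run noise.
rng = random.Random(0)

def run(mean=100.0, spread=0.10):
    return mean * (1 + rng.uniform(-spread, spread))

# A single-run, single-chip comparison can show a gap of nearly 20 percent
# between two algorithms that are in fact identical on average:
worst = max(abs(run() - run()) / 100.0 for _ in range(1000))
print(f"worst single-run gap seen: {worst:.1%}")

# Averaging a statistically meaningful number of runs recovers the truth:
a = sum(run() for _ in range(1000)) / 1000
b = sum(run() for _ in range(1000)) / 1000
print(f"gap between averages: {abs(a - b) / 100.0:.2%}")
```

The single-run gap can approach twice the individual spread, while the averaged gap shrinks toward zero, which is exactly the 20 percent illusion described above.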
16.2.8 Timing-Driven Placement Methods
Minimizing delay is becoming more and more important as a placement objective.
There are two main approaches: net based and path based. We know that we can use
net weights in our algorithms. The problem is to calculate the weights. One method
finds the n most critical paths (using a timing-analysis engine, possibly in the
synthesis tool). The net weights might then be the number of times each net appears in
this list. The problem with this approach is that as soon as we fix (for example) the
first 100 critical nets, suddenly another 200 become critical. This is rather like trying
to put worms in a can: as soon as we open the lid to put one in, two more pop out.
Another method to find the net weights uses the zero-slack algorithm [ Hauge et al.,
1987]. Figure 16.29 shows how this works (all times are in nanoseconds).
Figure 16.29 (a) shows a circuit with primary inputs at which we know the arrival
times (this is the original definition; some people use the term actual times) of each
signal. We also know the required times for the primary outputs, the points in time at
which we want the signals to be valid. We can work forward from the primary inputs
and backward from the primary outputs to determine arrival and required times at
each input pin for each net. The difference between the required and arrival times at
each input pin is the slack time (the time we have to spare). The zero-slack algorithm
adds delay to each net until the slacks are zero, as shown in Figure 16.29 (b). The net
delays can then be converted to weights or constraints in the placement. Notice that
we have assumed that all the gates on a net switch at the same time so that the net
delay can be placed at the output of the gate driving the net (a rather poor timing
model, but the best we can use without any routing information).
FIGURE 16.29 The zero-slack algorithm. (a) The circuit with no net delays. (b) The
zero-slack algorithm adds net delays (at the outputs of each gate, equivalent to
increasing the gate delay) to reduce the slack times to zero.
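The arrival, required, and slack times that drive the zero-slack algorithm can be computed with a forward pass and a backward pass over the netlist. Below is a minimal sketch for a hypothetical two-gate chain (the gate delays, names, and times are made up; net delays are not yet included):

```python
# A minimal slack computation on a two-gate chain; delays in nanoseconds.
# Each gate has a delay, input nets (fanin), and one output net (fanout).
gates = {'G1': {'delay': 2.0, 'fanin': ['in1'], 'fanout': 'n1'},
         'G2': {'delay': 3.0, 'fanin': ['n1'], 'fanout': 'out'}}
arrival = {'in1': 0.0}      # known arrival time at the primary input
required = {'out': 10.0}    # required time at the primary output

# Forward pass: arrival at a gate's output = latest input arrival + delay.
for g in ['G1', 'G2']:
    info = gates[g]
    arrival[info['fanout']] = max(arrival[n] for n in info['fanin']) + info['delay']

# Backward pass: required at a gate's input = output required - delay
# (taking the minimum over all fanout branches).
for g in ['G2', 'G1']:
    info = gates[g]
    for n in info['fanin']:
        required[n] = min(required.get(n, float('inf')),
                          required[info['fanout']] - info['delay'])

slack = {n: required[n] - arrival[n] for n in arrival}
print(arrival, required, slack)
```

Here every node on the path has the same slack (5 ns); the zero-slack algorithm would then distribute that 5 ns as added net delays along the path until all the slacks become zero.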
An important point to remember is that adjusting the net weight, even for every net on
a chip, does not theoretically make the placement algorithms any more complex; we
have to deal with the numbers anyway. It does not matter whether the net weight is 1
or 6.6, for example. The practical problem, however, is getting the weight information
for each net (usually in the form of timing constraints) from a synthesis tool or timing
verifier. These files can easily be hundreds of megabytes in size (see Section 16.4 ).
With the zero-slack algorithm we simplify but overconstrain the problem. For
example, we might be able to do a better job by making some nets a little longer than
the slack indicates if we can tighten up other nets. What we would really like to do is
deal with paths such as the critical path shown in Figure 16.29 (a) and not just nets .
Path-based algorithms have been proposed to do this, but they are complex and not all
commercial tools have this capability (see, for example, [ Youssef, Lin, and
Shragowitz, 1992]).
There is still the question of how to predict path delays between gates with only
placement information. Usually we still do not compute a routing tree but use simple
approximations to the total net length (such as the half-perimeter measure) and then
use this to estimate a net delay (the same to each pin on a net). It is not until the
routing step that we can make accurate estimates of the actual interconnect delays.
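The half-perimeter measure mentioned above is simple to compute: take the bounding box of a net's pins and add its width and height. A minimal sketch (the pin coordinates are illustrative):

```python
# Half-perimeter wire-length (HPWL) estimate for one net: the width plus
# the height of the smallest box enclosing all the net's pins.

def half_perimeter(pins):
    """pins: list of (x, y) pin locations for one net."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

print(half_perimeter([(0, 0), (3, 1), (2, 4)]))  # 3 wide + 4 high -> 7
```

The total over all nets, scaled by a per-unit-length delay, gives the kind of rough, pin-independent net-delay estimate described in the text.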
16.4.2 PDEF
The physical design exchange format ( PDEF ) is a proprietary file format used
by Synopsys to describe placement information and the clustering of logic cells.
Here is a simple, but complete PDEF file:
(CLUSTERFILE
(PDEFVERSION "1.0")
(DESIGN "myDesign")
(DATE "THU AUG 6 12:00 1995")
(VENDOR "ASICS_R_US")
(PROGRAM "PDEF_GEN")
(VERSION "V2.2")
(DIVIDER .)
( CLUSTER (NAME "ROOT")
(WIRE_LOAD "10mm x 10mm")
(UTILIZATION 50.0)
(MAX_UTILIZATION 60.0)
(X_BOUNDS 100 1000)
(Y_BOUNDS 100 1000)
(CLUSTER (NAME "LEAF_1")
(WIRE_LOAD "50k gates")
(UTILIZATION 50.0)
(MAX_UTILIZATION 60.0)
(X_BOUNDS 100 500)
(Y_BOUNDS 100 200)
(CELL (NAME L1.RAM01)
(CELL (NAME L1.ALU01)
)
)
)
This file describes two clusters:
● ROOT , which is the top-level (the whole chip). The file describes the size
( x - and y -bounds), current and maximum area utilization (i.e., leaving
space for interconnect), and the name of the wire-load table, '
10mm x 10mm ', to use for this block, chosen because the chip is expected
to be about 10 mm on a side.
● LEAF_1 , a block below the top level in the hierarchy. This block is to use
predicted capacitances from a wire-load table named '50k gates' (chosen
because we know there are roughly 50 k-gates in this block). The LEAF_1
block contains two logic cells: L1.RAM01 and L1.ALU01 .
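Because the PDEF example above is a fully parenthesized, Lisp-like format, a small s-expression reader is enough to load it into nested lists for inspection. This is a sketch adequate for the example file only, not a complete or official PDEF parser:

```python
import re

def parse(text):
    """Read one parenthesized PDEF-style expression into nested lists."""
    # Tokens: parentheses, quoted strings, or bare atoms.
    tokens = re.findall(r'\(|\)|"[^"]*"|[^\s()]+', text)
    pos = 0
    def read():
        nonlocal pos
        tok = tokens[pos]; pos += 1
        if tok == '(':
            items = []
            while tokens[pos] != ')':
                items.append(read())
            pos += 1          # consume the closing ')'
            return items
        return tok.strip('"')  # atoms: strip quotes if present
    return read()

tree = parse('(CLUSTER (NAME "ROOT") (UTILIZATION 50.0))')
print(tree)  # ['CLUSTER', ['NAME', 'ROOT'], ['UTILIZATION', '50.0']]
```

Walking the resulting nested lists recovers each cluster's name, bounds, utilization, and wire-load table.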
16.2 (Trees, 20 min.) For the network graph shown in Figure 16.32 (f), draw the
following trees and calculate their Manhattan lengths:
● a. The minimum Steiner tree.
Calculate:
● h. The complete-graph measure and the half-perimeter measure.
Figure 16.32 parts (a)-(e) illustrate the definitions of these trees. There is no known
solution to the minimum Steiner-tree problem for nets with more than five terminals.
FIGURE 16.32 Tree routing. (a) The minimum rectilinear Steiner tree (MRST).
(b) The minimum rectilinear spanning tree. (c) The minimum single-trunk rectilinear
Steiner tree (1-MRST). (d) The minimum rectilinear chain connection. (e) The
minimum source-to-sink connection. (f) Example net for Problem 16.2 .
16.3 (Eigenvalue placement constraints, 10 min. [ Cheng and Kuh, 1984]) Consider
the one-dimensional placement problem with a vector list of valid positions for the
logic cells p = [ p i ] and a vector list of x -coordinates for the logic cells x = [ x i ].
Show that for a valid placement x (where the vector elements x i are some permutation
of the vector elements p i ), the following equations hold:
Σ_{i=1}^{n} x_i = Σ_{i=1}^{n} p_i

Σ_{i=1}^{n} x_i^2 = Σ_{i=1}^{n} p_i^2

...

Σ_{i=1}^{n} x_i^n = Σ_{i=1}^{n} p_i^n    (16.22)
16.5 (Die size, 10 min.) Suppose the minimum spacing between pad centers is W mil
(1 mil = 10^-3 inch), there are N I/O pads on a chip, and the die area (assume a square
die) is A mil^2:
● a. Derive a relationship between W , N , and A that corresponds to the point at
which the die changes from being pad-limited to core-limited.
● b. Plot this relationship with N (ranging from 50 to 500 pads) on the x -axis, A
on the y -axis (for dies ranging in size from 1 mm to 20 mm on a side), and W
as a parameter (for W = 1, 2, 3, and 4 mil).
16.6 (Power buses, 20 min.) Assume aluminum metal interconnect has a resistance of
about 30 mΩ/square (a low value). Consider a power ring for the I/O pads. Suppose
you have a high-power chip that dissipates 5 W at V DD = 5 V, and assume that half of
the supply current (0.5 A) is due to I/O. Suppose the square die is L mil on a side, and
that the I/O current is equally distributed among the N VDD pads that are on the chip.
In the worst case, you want no more than 100 mV drop between any VDD pad and the
I/O circuits drawing power (notice that there will be an equal drop on the VSS side;
just consider the VDD drop).
● a. Model the power distribution as a ring of N equally spaced pads. Each pad is
connected by a resistor equal to the aluminum VDD power-bus resistance
between two pads. Assume the I/O circuits associated with each pad can be
considered to connect to just one point on the resistors between each pad. If the
resistance between each pad is R , what is the worst-case resistance between the
I/O circuits and the supply?
● b. Plot a graph showing L (in mil) on the x -axis, W (the required power-bus
width in microns) on the y -axis, with N as a parameter (with N = 1, 2, 5, 10).
● c. Comment on your results.
FIGURE 16.33 (For Problem 16.10 .) A problem with the min-cut algorithm is that it
ignores connections to logic cells outside the area being partitioned. (a) We perform a
vertical cut 1 producing the areas R 1 and R 2 . (b) Next we make a horizontal cut 2,
producing L 2 and L 3 , and a cut 2', producing R 2 and R 3 . (c) The min-cut algorithm
ignores the connection from L 2 and is equally likely to produce the arrangement
shown here when we make cut 2'.
16.11 (Benchmarks and statistics, 30 min.) Your boss asks you to compare two
placement programs from companies ABC and XYZ. You run five test cases for both
on a single netlist, P1. You get results (measured in arbitrary units) of 9, 8, 9, 7, 11 for
ABC; 6, 9, 10, 13, 8 for XYZ.
● a. Calculate the mean and standard deviations for these results.
● b. What confidence (in the statistical sense) do you have in these figures?
● c. What can you say about the relative performance of ABC and XYZ?
On average each test case takes about 0.5 hours (wall clock) for both ABC and XYZ.
Next you run six test cases on another netlist, P2 with the following results: 4, 6, 7, 8,
5, 7 for ABC, and 4, 5, 3, 6, 4, 3 for XYZ. These test cases take about 0.75 hours (wall
clock) each.
● d. What can you say about the P2 results?
● e. Given the P1 and P2 results together, what can you say about ABC and XYZ?
● f. How many P1 test cases should you run to get a result so that you can say
ABC is better or worse than XYZ with 90 percent confidence (i.e., you make the
right decision 9 out of 10 times)? How long would this take?
● g. Find the same figures for the P2 netlist. Comment on your answers.
● h. Suppose you had more netlists and information about the variation of results
from each netlist, together with the average time to run each netlist. How would
you use this information to get the most meaningful result in the shortest time?
16.12 (Linear and quadratic placement, 20 min.) [ Sigl, Doll, and Johannes, 1991]
Figure 16.34 (a) shows a simple network that we will place. Figure 16.34 (b) shows
the problem. The logic cells are all the same size: 1 grid unit wide by 1 grid unit high.
Logic cells 1 and 3 are fixed at the locations shown. Logic cell 2 is movable and
placed at coordinates (for the lower-left corner) of ( x 2 , y 2 ). The lower-left corners
of logic cells should be placed at grid locations and should not overlap.
● a. What is the connection matrix c ij for this network?
● b. Calculate and draw (showing the logic-cell coordinates) the placement that
minimizes the linear cost function (or objective function) f L ,
f_L = (1/2) Σ_{i,j=1}^{n} c_ij d_ij    (16.23)
16.13 (Placement interconnect lengths, 45 min.) Figure 16.30 (d) shows the actual
routing corresponding to a placement with an estimated routing length of 8 units (
Figure 16.30 b).
● a. Draw the layout (with routing) corresponding to the placement of
Figure 16.30 (c), which has a lower estimated total routing length of 7 units.
● b. Compare the actual total routing length for both layouts and explain why they
are different from the estimated lengths and describe the sources of the errors.
● c. Consider flipping both logic cells A and B about the y -axis in the layout
shown in Figure 16.30 (d). How much does this shorten the total interconnect
length? Some placement algorithms consider such moves.
16.14 (Zero-slack algorithm, 60 min.) For the circuit of Figure 16.35 :
● a. Find all of the arrival, required, and slack times (all delays are in
nanoseconds).
iv. Distribute a delay equal to the slack S_p along the path, assigning a fraction to
each net at the output pins of the gates on the path.
v. Work backward from p updating all the required times as necessary and
forward from p updating all the arrival times.
vi. Convert net delays to net lengths.
Hint: You can consult the original description of the zero-slack algorithm if this is not
clear [ Hauge et al., 1987].
16.15 (World planning, 60 min.) The seven continents are (with areas in millions of
square miles): Europe, strictly a peninsula of Asia (4.1), Asia (17.2), North America
(9.4), South America (6.9), Australia (3.0), Africa (11.7), and Antarctica (5.1).
Assume the continents are flexible blocks whose aspect ratio may be adjusted.
● a. Create a slicing floorplan of the world with a square aspect ratio.
● b. Draw a world connectivity graph with seven nodes and whose edges are
labeled with the distances between Moscow, Beijing, Chicago, Rio de Janeiro,
Sydney, Nairobi, and the South Pole.
● c. Suppose you want to floorplan the world so that the difference in distances
between the centers of the continental blocks and the corresponding edges in the
world connectivity graph is minimized. How would you measure the differences
in distance? Suggest a method to minimize your measure.
● d. Use an eigenvalue method to floorplan the world. Draw the result with
coordinates for each block and explain your approach.
16.7 Bibliography
There are no recent monographs or review articles on floorplanning modern
ASICs with interconnect delay dominating gate delay. Placement is a much more
developed topic. Perhaps the simplest place to dig deeper is the book by Preas
and Lorenzetti, which contains a chapter titled "Placement, assignment, and
floorplanning" [Preas and Karger, 1988]. The collection edited by Ohtsuki [1986]
contains a review paper by Yoshida titled "Partitioning, assignment, and
placement." Sangiovanni-Vincentelli's review article [1986] complements
Ohtsuki's edited book, but both are now dated. Sechen's book [1988] describes
simulated annealing and its application to placement and chip-planning for
standard cell and gate array ASICs. Part III of the IEEE Press book edited by Hu
and Kuh [ 1983] is a collection of papers on wireability, partitioning, and
placement covering some of the earlier and fundamental work in this area. For a
more recent and detailed look at the inner workings of floorplanning and
placement tools, Lengauer's [1990] book on algorithms contains a chapter on
graph algorithms and a chapter on placement, assignment, and floorplanning.
Most of these earlier book references deal with placement before the use of
timing as an additional objective. The tutorial paper by Benkoski and Strojwas
[1991] contains a number of references on performance-driven placement. Luk's
paper [1991] describes methods for estimating net delay during placement.
Papers and tutorials on all aspects of floorplanning and placement (with an
emphasis on algorithms) are published in IEEE Transactions on Computer-Aided
Design. The newest developments in floorplanning and placement appear every
year in the Proceedings of the ACM/IEEE Design Automation Conference
(DAC) and Proceedings of the IEEE International Conference on
Computer-Aided Design (ICCAD).
16.8 References
Page numbers in brackets after a reference indicate its location in the chapter
body.
Benkoski, J., and A. J. Strojwas. 1991. The role of timing verification in layout
synthesis. In Proceedings of the 28th ACM/IEEE Design Automation
Conference, San Francisco, pp. 612-619. Tutorial paper with 60 references. This
was an introduction to a session on Placement for Performance Optimization
containing five other papers on this topic. [ reference location ]
Hanan, M., P. K. Wolff Sr., and B. J. Agule. 1973. Some experimental results on
placement techniques. In Proceedings of the 13th Design Automation
Conference. Reference to complete graph wire measure. [ reference location ]
Hartoog, M. R., 1986. Analysis of placement procedures for VLSI standard cell
layout. In Proceedings of the 23rd Design Automation Conference. [ reference
location ]
Hu, T. C., and E. S. Kuh (Eds.). 1983. VLSI Circuit Layout: Theory and Design.
New York: IEEE Press. Contains 26 papers divided into six parts; Part 1:
Overview; Part II: General; Part III: Wireability, Partitioning and Placement; Part
IV: Routing; Part V: Layout Systems; Part VI: Module Generation. ISBN
0879421932. TK7874. V5573. [ reference location ]
Luk, W. K. 1991. A fast physical constraint generator for timing driven layout.
In Proceedings of the 28th ACM/IEEE Design Automation Conference.
Introduction to timing-driven placement and net- and path-based approaches.
Describes some different methods to estimate interconnect delay during
placement. ISBN 0-89791-395-7. [ reference location ].
Masleid, R. P. 1991. High-density central I/O circuits for CMOS. IEEE Journal
of Solid-State Circuits, Vol. 26, no. 3, pp. 431-435. An I/O circuit design that
reduces the percentage of chip area occupied by I/O circuits from roughly 22
percent to under 3 percent for a 256 I/O chip. Uses IBM C4 technology that
allows package connections to be located over chip circuitry. 10 references. [
reference location ]
Ohtsuki, T. (Ed.). 1986. Layout Design and Verification. New York: Elsevier.
Includes nine papers on CAD tools and algorithms: "Layout strategy,
standardisation, and CAD tools," Ueda, Kasai and Sudo; "Layout compaction,"
Mylynski and Sung; "Layout verification," Yoshida; "Partitioning, assignment
and placement," Goto and Matsuda; "Computational complexity of layout
problems," Shing and Hu; "Computational and geometry algorithms," Asano,
Sato and Ohtsuki; an excellent survey and tutorial paper by M. Burstein:
"Channel routing"; "Maze-running and line-search algorithms" an easily-readable
paper on detailed routing by Ohtsuki; and a mathematical paper, "Global
routing," by Kuh and Marek-Sadowska. ISBN 0444878947. TK7874. L318. [
reference location ]
Wong, D. F., H. W. Leong, and C. L. Liu. 1988. Simulated Annealing for VLSI
Design. Norwell, MA: Kluwer. Introduction; Placement; Floorplan Design;
Channel Routing; Permutation Channel Routing; PLA Folding; Gate Matrix
Layout; Array Optimization. ISBN 0898382564. TK7874. W65. [ reference
location ]
Youssef, H., R.-B. Lin, and E. Shragowitz. 1992. Bounds on net delays for VLSI
circuits. IEEE Transactions on Circuits and Systems II: Analog and Digital
Signal Processing, Vol. 39, no. 11, pp. 315-324. An alternative to the
weight-based approach is development of delay bounds on all nets. 21 references.
[ reference location ].
ROUTING
Once the designer has floorplanned a chip and the logic cells within the flexible
blocks have been placed, it is time to make the connections by routing the chip.
This is still a hard problem that is made easier by dividing it into smaller
problems. Routing is usually split into global routing followed by detailed
routing.
Suppose the ASIC is North America and some travelers in California need advice
on how to drive from Stanford (near San Francisco) to Caltech (near Los
Angeles). The floorplanner has decided that California is on the left (west) side of
the ASIC and the placement tool has put Stanford in Northern California and
Caltech in Southern California. Floorplanning and placement have defined the
roads and freeways. There are two ways to go: the coastal route (using Highway
101) or the inland route (using Interstate 5, which is usually faster). The global
router specifies the coastal route because the travelers are not in a hurry and I-5 is
congested (the global router knows this because it has already routed onto I-5
many other travelers that are in a hurry today). Next, the detailed router looks at a
map and gives directions from Stanford onto Highway 101 south through San
Jose, Monterey, and Santa Barbara to Los Angeles and then off the freeway to
Caltech in Pasadena.
Figure 17.1 shows the core of the Viterbi decoder after the placement step. This
implementation consists entirely of standard cells (18 rows). The I/O pads are not
included in this example; we can route the I/O pads after we route the core
(though this is not always a good idea). Figure 17.2 shows the Viterbi decoder
chip after global and detailed routing. The routing runs in the channels between
the rows of logic cells, but the individual interconnections are too small to see.
FIGURE 17.1 The core of the Viterbi decoder chip after placement (a screen
shot from Cadence Cell Ensemble). This is the same placement as shown in
Figure 16.2, but without the channel labels. You can see the rows of standard
cells; the widest cells are the D flip-flops.
FIGURE 17.2 The core of the Viterbi decoder chip after the completion of
global and detailed routing (a screen shot from Cadence Cell Ensemble). This
chip uses two-level metal. Although you cannot see the difference, m1 runs in
the horizontal direction and m2 in the vertical direction.
17.1 Global Routing
The details of global routing differ slightly between cell-based ASICs, gate
arrays, and FPGAs, but the principles are the same in each case. A global router
does not make any connections; it just plans them. We typically global route the
whole chip (or large pieces if it is a large chip) before detail routing the whole
chip (or the pieces). There are two types of areas to global route: inside the
flexible blocks and between blocks (the Viterbi decoder, although a cell-based
ASIC, only involved the global routing of one large flexible block).
● Maximize the probability that the detailed router can complete the routing.
FIGURE 17.3 Measuring the delay of a net. (a) A simple circuit with an inverter
A driving a net with a fanout of two. Voltages V 1 , V 2 , V 3 , and V 4 are the
voltages at intermediate points along the net. (b) The layout showing the net
segments (pieces of interconnect). (c) The RC model with each segment replaced
by a capacitance and resistance. The ideal switch and pull-down resistance R pd
model the inverter A.
The problem is to find the voltages at the inputs to logic cells B and C taking into
account the parasitic resistance and capacitance of the metal interconnect.
Figure 17.3 (c) models logic cell A as an ideal switch with a pull-down resistance
equal to R pd and models the metal interconnect using resistors and capacitors for
each segment of the interconnect.
The Elmore constant for node 4 (labeled V_4) in the network shown in
Figure 17.3 (c) is

τ_D4 = Σ_{k=1}^{4} R_k4 C_k    (17.1)

= R_14 C_1 + R_24 C_2 + R_34 C_3 + R_44 C_4 ,
where,
R_14 = R_pd + R_1    (17.2)
R_24 = R_pd + R_1
R_34 = R_pd + R_1 + R_3
R_44 = R_pd + R_1 + R_3 + R_4
Suppose we have the following parameters (from the generic 0.5 μm CMOS
process, G5) for the layout shown in Figure 17.3 (b):
● m2 resistance is 50 mΩ/square.
● m2 capacitance (for a minimum-width line) is 0.2 pF mm^-1.
● 4X inverter delay is 0.02 ns + 0.5 C_L ns (C_L is in picofarads).
● Delay is measured using 0.35/0.65 output trip points.
● m2 minimum width is 3λ = 0.9 μm.
● 1X inverter input capacitance is 0.02 pF (a standard load).
First we need to find the pull-down resistance, R_pd, of the 4X inverter. If we
model the gate with a linear pull-down resistor, R_pd, driving a load C_L, the
output waveform is exp(-t/(C_L R_pd)) (normalized to 1 V). The output completes
63 percent of its swing when t = C_L R_pd, because 1 - exp(-1) ≈ 0.63. Then,
because the delay is measured with a 0.65 trip point, the constant 0.5 ns pF^-1 =
0.5 kΩ is very close to the equivalent pull-down resistance. Thus, R_pd ≈ 500 Ω.
R_3 = (1 mm)(50 × 10^-3 Ω/square) / (0.9 μm) = 56 Ω

R_4 = (2 mm)(50 × 10^-3 Ω/square) / (0.9 μm) = 112 Ω    (17.3)

C_1 = (0.1 mm)(0.2 pF mm^-1) = 0.02 pF
C_2 = (0.1 mm)(0.2 pF mm^-1) + 0.02 pF = 0.04 pF
C_3 = (1 mm)(0.2 pF mm^-1) = 0.2 pF
C_4 = (2 mm)(0.2 pF mm^-1) + 0.02 pF = 0.42 pF    (17.4)
Finally, we can calculate Elmore's constants for node 4 and node 2 as follows:

τ_D4 = R_14 C_1 + R_24 C_2 + R_34 C_3 + R_44 C_4    (17.6)
= (506)(0.02) + (506)(0.04) + (562)(0.2) + (674)(0.42)
= 425 ps .
τ_D2 = R_12 C_1 + R_22 C_2 + R_32 C_3 + R_42 C_4    (17.7)
= (R_pd + R_1)(C_1 + C_3 + C_4) + (R_pd + R_1 + R_2) C_2
= (500 + 6)(0.02 + 0.2 + 0.42) + (500 + 6 + 6)(0.04)
= 344 ps .
Comparing Eqs. 17.6-17.8, we can see that the delay of the inverter can be
assigned as follows: 20 ps (the intrinsic delay, 0.02 ns, due to the cell output
capacitance), 340 ps (due to the pull-down resistance and the output capacitance),
4 ps (due to the interconnect from A to B), and 65 ps (due to the interconnect
from A to C). We can see that the error from neglecting interconnect resistance
can be important.
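The worked Elmore calculation above can be checked mechanically. The sketch below encodes the RC tree of Figure 17.3 (c) as a parent table (segment resistances in ohms, capacitances in picofarads, so each R × C product comes out directly in picoseconds) and reproduces Eqs. 17.6 and 17.7 to within rounding:

```python
# Elmore delay for the RC tree of Figure 17.3(c), using the values
# derived in the text. Node 0 is the driver output, reached from the
# ideal source through the pull-down resistance R_pd.
R_pd = 500.0
# parent[node] = (upstream node, segment resistance in ohms)
parent = {1: (0, 6.0),     # R1
          2: (1, 6.0),     # R2, to the input of gate B
          3: (1, 56.0),    # R3
          4: (3, 112.0)}   # R4, to the input of gate C
cap = {1: 0.02, 2: 0.04, 3: 0.2, 4: 0.42}   # C1..C4 in pF

def path_to_root(node):
    """Set of nodes on the path from `node` back to the driver output."""
    nodes = set()
    while node != 0:
        nodes.add(node)
        node = parent[node][0]
    return nodes

def shared_resistance(i, k):
    """R_ki: resistance common to the source-to-i and source-to-k paths."""
    common = path_to_root(i) & path_to_root(k)
    return R_pd + sum(parent[n][1] for n in common)

def elmore(i):
    """Elmore constant tau_Di = sum over k of R_ki * C_k (ohm * pF = ps)."""
    return sum(shared_resistance(i, k) * cap[k] for k in cap)

# Within a picosecond of the text's 425 ps and 344 ps:
print(f"tau_D4 = {elmore(4):.1f} ps, tau_D2 = {elmore(2):.1f} ps")
```

The same few lines extend to any RC tree: only the parent table and the capacitance list change, which is why the Elmore constant is such a popular delay estimate during global routing.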
Even using the Elmore constant we still made the following assumptions in
estimating the path delays:
● A step-function waveform drives the net.
Figure 17.5 shows an example of global routing for a net with five terminals,
labeled A1 through F1, for the cell-based ASIC shown in Figure 17.4 . If a
designer wishes to use minimum total interconnect path length as an objective,
the global router finds the minimum-length tree shown in Figure 17.5 (b). This
tree determines the channels the interconnects will use. For example, the shortest
connection from A1 to B1 uses channels 2, 1, and 5 (in that order). This is the
information the global router passes to the detailed router. Figure 17.5 (c) shows
that minimizing the total path length may not correspond to minimizing the path
delay between two points.
FIGURE 17.5 Finding paths in global routing. (a) A cell-based ASIC (from
Figure 17.4 ) showing a single net with a fanout of four (five terminals). We
have to order the numbered channels to complete the interconnect path for
terminals A1 through F1. (b) The terminals are projected to the center of the
nearest channel, forming a graph. A minimum-length tree for the net that uses
the channels and takes into account the channel capacities. (c) The
minimum-length tree does not necessarily correspond to minimum delay. If we
wish to minimize the delay from terminal A1 to D1, a different tree might be
better.
Global routing is very similar for cell-based ASICs and gate arrays, but there is a
very important difference between the types of channels in these ASICs. The size
of the channels in sea-of-gates arrays, channelless gate arrays, and cell-based
ASICs can be varied to make sure there is enough space to complete the wiring.
In channeled gate-arrays and FPGAs the size, number, and location of channels
are fixed. The good news is that the global router can allocate as many
interconnects to each channel as it likes, since that space is committed anyway.
The bad news is that there is a maximum number of interconnects that each
channel can hold. If the global router needs more room, even in just one channel
on the whole chip, the designer has to repeat the placement-and-routing steps and
try again (or use a bigger chip).
17.1.5 Global Routing Inside Flexible
Blocks
We shall illustrate global routing using a gate array. Figure 17.6 (a) shows the
routing resources on a sea-of-gates or channelless gate array. The gate array base
cells are arranged in 36 blocks, each block containing an array of 8-by-16
gate-array base cells, making a total of 4608 base cells.
The horizontal interconnect resources are the routing channels that are formed
from unused rows of the gate-array base cells, as shown in Figure 17.6 (b) and
(c). The vertical resources are feedthroughs. For example, the logic cell shown in
Figure 17.6 (d) is an inverter that contains two types of feedthrough. The inverter
logic cell uses a single gate-array base cell with terminals (or connectors ) located
at the top and bottom of the logic cell. The inverter input pin has two electrically
equivalent terminals that the global router can use as a feedthrough. The output of
the inverter is connected to only one terminal. The remaining vertical track is
unused by the inverter logic cell, so this track forms an uncommitted
feedthrough.
You may see any of the terms landing pad (because we say that we drop a via to
a landing pad), pick-up point , connector , terminal , pin , or port used for the
connection to a logic cell. The term pick-up point refers to the physical pieces of
metal (or sometimes polysilicon) in the logic cell to which the router connects. In
a three-level metal process, the global router may be able to connect to anywhere
in an area (an area pick-up point). In this book we use the term connector to refer
to the physical pick-up point. The term pin more often refers to the connection on
a logic schematic icon (a dot, square box, or whatever symbol is used), rather
than layout. Thus the difference between a pin and a connector is that we can
have multiple connectors for one pin. Terminal is often used when we talk about
routing. The term port is used when we are using text (EDIF netlists or HDLs, for
example) to describe circuits.
In a gate array the channel capacity must be a multiple of the number of
horizontal tracks in the gate-array base cell. Figure 17.6 (e) shows a gate-array
base cell with seven horizontal tracks (see Section 17.2 for the factors that
determine the track width and track spacing). Thus, in this gate array, we can
have a channel with a capacity of 7, 14, 21, ... horizontal tracks, but not
between these values.
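As a small illustration of this constraint (a Python sketch, not part of any routing tool; the 7-track figure is the one from Figure 17.6e):

```python
import math

# Channel capacity in this gate array must be a whole multiple of the
# 7 horizontal tracks provided by one row of gate-array base cells.
TRACKS_PER_BASE_CELL_ROW = 7

def channel_capacity(required_tracks: int) -> int:
    """Round a track requirement up to a whole number of base-cell rows."""
    rows = math.ceil(required_tracks / TRACKS_PER_BASE_CELL_ROW)
    return rows * TRACKS_PER_BASE_CELL_ROW

print(channel_capacity(10))  # prints 14: a 10-track need forces a 14-track channel
```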
FIGURE 17.6 Gate-array global routing. (a) A small gate array. (b) An enlarged
view of the routing. The top channel uses three rows of gate-array base cells; the
other channels use only one. (c) A further enlarged view showing how the
routing in the channels connects to the logic cells. (d) One of the logic cells, an
inverter. (e) There are seven horizontal wiring tracks available in one row of
gate-array base cells; the channel capacity is thus 7.
Figure 17.7 shows the inverter macro for the sea-of-gates array shown in
Figure 17.6 . Figure 17.7 (a) shows the base cell. Figure 17.7 (b) shows how the
internal inverter wiring on m1 leaves one vertical track free as a feedthrough in a
two-level metal process (connectors placed at the top and bottom of the cell). In a
three-level metal process the connectors may be placed inside the cell abutment
box ( Figure 17.7 c). Figure 17.8 shows the global routing for the sea-of-gates
array. We divide the array into nonoverlapping routing bins (or just bins , also
called global routing cells or GRCs ), each containing a number of gate-array
base cells.
FIGURE 17.7 The gate-array inverter from Figure 17.6 d. (a) An oxide-isolated
gate-array base cell, showing the diffusion and polysilicon layers. (b) The metal
and contact layers for the inverter in a 2LM (two-level metal) process. (c) The
router's view of the cell in a 3LM process.
We need an aside to discuss our use of the term cell . Be careful not to confuse
the global routing cells with gate-array base cells (the smallest element of a gate
array, consisting of a small number of n -type and p -type transistors), or with
logic cells (which are NAND gates, NOR gates, and so on).
A large routing bin reduces the size of the routing problem, and a small routing
bin allows the router to calculate the wiring capacities more accurately. Some
tools permit routing bins of different size in different areas of the chip (with
smaller routing bins helping in areas of dense routing). Figure 17.8 (a) shows a
routing bin that is 2-by-4 gate-array base cells. The logic cells occupy the lower
half of the routing bin. The upper half of the routing bin is the channel area,
reserved for wiring. The global router calculates the edge capacities for this
routing bin, including the vertical feedthroughs. The global router then
determines the shortest path for each net considering these edge capacities. An
example of a global-routing calculation is shown in Figure 17.8 (b). The path,
described by a series of adjacent routing bins, is passed to the detailed router.
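The bin-by-bin search described above can be sketched as a breadth-first search over the grid of routing bins, rejecting bin-to-bin edges with no remaining capacity (an illustration only; the grid size, capacities, and coordinates here are assumptions, and a production global router would also update edge capacities as each net is routed):

```python
from collections import deque

def global_route(grid_w, grid_h, edge_capacity, source, target):
    """Shortest path (in routing bins) from source to target, using only
    edges between adjacent bins that still have routing capacity left."""
    prev = {source: None}
    frontier = deque([source])
    while frontier:
        bin_xy = frontier.popleft()
        if bin_xy == target:
            break
        x, y = bin_xy
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nxt[0] < grid_w and 0 <= nxt[1] < grid_h
                    and nxt not in prev
                    and edge_capacity.get(frozenset((bin_xy, nxt)), 0) > 0):
                prev[nxt] = bin_xy
                frontier.append(nxt)
    if target not in prev:
        return None                      # net cannot be routed
    path, node = [], target
    while node is not None:              # trace the path back to the source
        path.append(node)
        node = prev[node]
    return path[::-1]

# A 3-by-3 array of bins; every edge starts with capacity 14 except one
# congested edge that is already full.
caps = {}
for x in range(3):
    for y in range(3):
        for nxt in ((x + 1, y), (x, y + 1)):
            if nxt[0] < 3 and nxt[1] < 3:
                caps[frozenset(((x, y), nxt))] = 14
caps[frozenset(((1, 0), (1, 1)))] = 0    # no tracks left on this edge

print(global_route(3, 3, caps, (0, 0), (2, 2)))  # a 5-bin path avoiding the full edge
```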
FIGURE 17.8 Global routing a gate array. (a) A single global-routing cell (GRC
or routing bin) containing 2-by-4 gate-array base cells. For this choice of routing
bin the maximum horizontal track capacity is 14, the maximum vertical track
capacity is 12. The routing bin labeled C3 contains three logic cells, two of
which have feedthroughs marked 'f'. This results in the edge capacities shown.
(b) A view of the top left-hand corner of the gate array showing 28 routing bins.
The global router uses the edge capacities to find a sequence of routing bins to
connect the nets.
FIGURE 17.10 (a) A large m1 to m2 via. The black squares represent the holes
(or cuts) that are etched in the insulating material between the m1 and m2 layers.
(b) An m1 to m2 via (a via1). (c) A contact from m1 to diffusion or polysilicon (a
contact). (d) A via1 placed over (or stacked over) a contact. (e) An m2 to m3 via
(a via2). (f) A via2 stacked over a via1 stacked over a contact. Notice that the
black squares in parts b through f do not represent the actual location of the cuts.
The black squares are offset so you can recognize stacked vias and contacts.
In a two-level metal CMOS ASIC technology we complete the wiring using the
two different metal layers for the horizontal and vertical directions, one layer for
each direction. This is Manhattan routing , because the results look similar to the
rectangular north-south and east-west layout of streets in New York City. Thus,
for example, if terminals are on the m2 layer, then we route the horizontal
branches in a channel using m2 and the vertical trunks using m1. Figure 17.11
shows that, although we may choose a preferred direction for each metal layer
(for example, m1 for horizontal routing and m2 for vertical routing), this may
lead to problems in cases that have both horizontal and vertical channels. In these
cases we define a preferred metal layer in the direction of the channel spine. In
Figure 17.11 , because the logic cell connectors are on m2, any vertical channel
has to use vias at every logic cell location. By changing the orientation of the
metal directions in vertical channels, we can avoid this, and instead we only need
to place vias at the intersection of horizontal and vertical channels.
Figure 17.12 shows an imaginary logic cell with connectors. Double-entry logic
cells intended for two-level metal routing have connectors at the top and bottom
of the logic cell, usually in m2. Logic cells intended for processes with three or
more levels of metal have connectors in the center of the cell, again usually on
m2. Logic cells may use both m1 and m2 internally, but the use of m2 is usually
minimized. The router normally uses a simplified view of the logic cell called a
phantom . The phantom contains only the logic cell information that the router
needs: the connector locations, types, and names; the abutment and bounding
boxes; enough layer information to be able to place cells without violating design
rules; and a blockage map (the locations of any metal inside the cell that blocks
routing).
FIGURE 17.12 The different types of connections that can be made to a cell.
This cell has connectors at the top and bottom of the cell (normal for cells
intended for use with a two-level metal process) and internal connectors (normal
for logic cells intended for use with a three-level metal process). The
interconnect and connections are drawn to scale.
Figure 17.13 illustrates some terms used in the detailed routing of a channel. The
channel spine in Figure 17.13 is horizontal with terminals at the top and the
bottom, but a channel can also be vertical. In either case terminals are spaced
along the longest edges of the channel at given, fixed locations. Terminals are
usually located on a grid defined by the routing pitch on that layer (we say
terminals are either on-grid or off-grid ). We make connections between
terminals using interconnects that consist of one or more trunks running parallel
to the length of the channel and branches that connect the trunk to the terminals.
If more than one trunk is used, the trunks are connected by doglegs . Connections
exit the channel at pseudoterminals .
FIGURE 17.13 Terms used in channel routing. (a) A channel with four
horizontal tracks. (b) An expanded view of the left-hand portion of the channel
showing (approximately to scale) how the m1 and m2 layers connect to the logic
cells on either side of the channel. (c) The construction of a via1 (m1/m2 via).
The trunk and branch connections run in tracks (equispaced, like railway tracks).
If the trunk connections use m1, the horizontal track spacing (usually just called
the track spacing for channel routing) is equal to the m1 routing pitch. The
maximum number of interconnects we need in a channel multiplied by the
horizontal track spacing gives the minimum height of a channel (see
Section 17.2.2 on how to determine the maximum number of interconnects
needed). Each terminal occupies a column . If the branches use m2, the column
spacing (or vertical track spacing ) is equal to the m2 routing pitch.
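The height calculation above can be sketched as follows (a Python illustration; the net intervals and the 8 λ m1 pitch are assumed example values, and computing the density itself is the subject of Section 17.2.2):

```python
def channel_height(net_intervals, m1_pitch):
    """Minimum channel height: channel density (the maximum number of trunks
    crossing any column) multiplied by the horizontal track spacing.
    net_intervals: (left, right) column span of each net's trunk."""
    endpoints = {c for left, right in net_intervals for c in (left, right)}
    density = max(sum(1 for left, right in net_intervals if left <= c <= right)
                  for c in endpoints)    # overlap is maximized at an endpoint
    return density * m1_pitch

nets = [(1, 4), (2, 6), (5, 8), (7, 9), (3, 5)]
print(channel_height(nets, 8))  # prints 24: density 3, so 3 tracks of 8 (in units of lambda)
```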
FIGURE 17.14 The definitions of local channel density and global channel
density. Lines represent the m1 and m2 interconnect in the channel to simplify
the drawing.
17.2.3 Algorithms
We start discussion of routing methods by simplifying the general
channel-routing problem. The restricted channel-routing problem limits each net
in a channel to use only one horizontal segment. In other words the channel
router uses only one trunk for each net. This restriction has the effect of
minimizing the number of connections between the routing layers. This is
equivalent to minimizing the number of vias used by the channel router in a
two-layer metal technology. Minimizing the number of vias is an important
objective in routing a channel, but it is not always practical. Sometimes
constraints will force a channel router to use jogs or other methods to complete
the routing (see Section 17.2.5 ). Next, though, we shall study an algorithm that
solves the restricted channel-routing problem.
Figure 17.15 illustrates the left-edge algorithm (LEA). The algorithm works as
long as none of the branches touch, which may occur if there are terminals in the same column
belonging to different nets. In this situation we have to make sure that the trunk
that connects to the top of the channel is placed above the lower trunk. Otherwise
two branches will overlap and short the nets together. In the next section we shall
examine this situation more closely.
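A minimal sketch of the restricted (one-trunk-per-net) left-edge algorithm follows; note that it ignores the vertical (branch-overlap) constraints just described, which a complete channel router must also honor:

```python
def left_edge(nets):
    """Assign one trunk per net to horizontal tracks, greedily reusing
    tracks in left-edge order. nets: name -> (left, right) column span."""
    order = sorted(nets, key=lambda n: nets[n][0])   # sort by left edge
    tracks = []       # each entry is the list of nets sharing one track
    right_end = []    # rightmost column occupied on each track so far
    for net in order:
        left, right = nets[net]
        for t, end in enumerate(right_end):
            if left > end:               # trunk fits after the last one here
                tracks[t].append(net)
                right_end[t] = right
                break
        else:                            # no track has room: open a new one
            tracks.append([net])
            right_end.append(right)
    return tracks

nets = {'A': (1, 4), 'B': (2, 6), 'C': (5, 8), 'D': (7, 9)}
print(left_edge(nets))  # prints [['A', 'C'], ['B', 'D']]: two tracks suffice
```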
Figure 17.18 illustrates the Lee maze-running algorithm . The goal is to find a
path from X to Y, i.e., from the start (or source) to the finish (or target), avoiding
any obstacles. The algorithm is often called wave propagation because it sends
out waves, which spread out like those created by dropping a stone into a pond.
Algorithms that use lines rather than waves to search for connections are more
efficient than algorithms based on the Lee algorithm. Figure 17.19 illustrates the
Hightower algorithm, a line-search algorithm (or line-probe algorithm ):
1. Extend lines from both the source and target toward each other.
2. When an extended line, known as an escape line , meets an obstacle,
choose a point on the escape line from which to project another escape line
at right angles to the old one. This point is the escape point .
3. Place an escape point on the line so that the next escape line just misses the
edge of the obstacle. Escape lines emanating from the source and target
intersect to form the path.
FIGURE 17.19 Hightower
area-routing algorithm. (a)
Escape lines are constructed
from source (X) and target (Y)
toward each other until they hit
obstacles. (b) An escape point
is found on the escape line so
that the next escape line
perpendicular to the original
misses the next obstacle. The
path is complete when escape
lines from source and target
meet.
The Hightower algorithm is faster and requires less memory than methods based
on the Lee algorithm.
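The Lee wave-propagation search can be sketched as a breadth-first search on a grid (an illustration only; real maze routers add cost functions and the coding tricks needed to reduce the algorithm's large memory use):

```python
from collections import deque

def lee_route(grid, start, finish):
    """Lee maze routing: a wave labels each reachable cell with its distance
    from the start; the path is then traced back from the finish.
    grid[y][x] == 1 marks an obstacle; cells are (x, y) tuples."""
    h, w = len(grid), len(grid[0])
    dist = {start: 0}
    wave = deque([start])
    while wave and finish not in dist:   # expand the wave outward
        x, y = wave.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < w and 0 <= ny < h and grid[ny][nx] == 0
                    and (nx, ny) not in dist):
                dist[(nx, ny)] = dist[(x, y)] + 1
                wave.append((nx, ny))
    if finish not in dist:
        return None                      # X and Y cannot be connected
    path, cell = [finish], finish
    while cell != start:                 # retrace via any neighbor one step closer
        x, y = cell
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if dist.get(nb) == dist[cell] - 1:
                cell = nb
                break
        path.append(cell)
    return path[::-1]

grid = [[0, 0, 0, 0],
        [0, 1, 1, 0],   # two obstacle cells between X and Y
        [0, 0, 0, 0]]
path = lee_route(grid, (0, 1), (3, 1))
print(len(path) - 1)  # prints 5: the shortest detour around the obstacle
```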
17.2.7 Multilevel Routing
Using two-layer routing , if the logic cells do not contain any m2, it is possible to
complete some routing in m2 using over-the-cell (OTC) routing. Sometimes poly
is used for short connections in the channel in a two-level metal technology; this
is known as 2.5-layer routing . Using a third level of metal in three-layer routing ,
there is a choice of approaches. Reserved-layer routing restricts all the
interconnect on each layer to flow in one direction in a given routing area (for
example, in a channel, either parallel or perpendicular to the channel spine).
Unreserved-layer routing moves in both horizontal and vertical directions on a
given layer. Most routers use reserved routing. Reserved three-level metal routing
offers another choice: Either use m1 and m3 for horizontal routing (parallel to the
channel spine), with m2 for vertical routing ( HVH routing ) or use VHV routing
. Since the logic cell interconnect usually blocks most of the area on the m1 layer,
HVH routing is normally used. It is also important to consider the pitch of the
layers when routing in the same direction on two different layers. Using HVH
routing it is preferable for the m3 pitch to be a simple multiple of the m1 pitch
(ideally they are the same). Some processes have more than three levels of metal.
Sometimes the upper one or two metal layers have a coarser pitch than the lower
layers and are used in multilevel routing for power and clock lines rather than for
signal interconnect.
Figure 17.20 shows an example of three-layer channel routing. The logic cells are
64 λ high, the m1 routing pitch is 8 λ , and the m2 and m3 routing pitch is 16 λ .
The channel in Figure 17.20 is the same as the channel using two-layer metal
shown in Figure 17.13 , but using three-level metal reduces the channel height
from 40 λ ( = 5 × 8 λ ) to 16 λ . Submicron processes try to use the same metal
pitch on all metal layers. This makes routing easier but processing more difficult.
FIGURE 17.20 Three-level channel routing. In this diagram the m2 and m3
routing pitch is set to twice the m1 routing pitch. Routing density can be
increased further if all the routing pitches can be made equal, a difficult process
challenge.
With three or more levels of metal routing it is possible to reduce the channel
height in a row-based ASIC to zero. All of the interconnect is then completed
over the cell. If all of the channels are eliminated, the core area (logic cells plus
routing) is determined solely by the logic-cell area. The point at which this
happens depends on not only the number of metal layers and channel density, but
also the routing resources (the blockages and feedthroughs) in the logic cell. This
is known as the cell porosity . Designing porous cells that help to minimize routing area is an
art. For example, it is quite common to be able to produce a smaller chip using
larger logic cells if the larger cells have more routing resources.
FIGURE 17.21 Clock routing. (a) A clock network for the cell-based ASIC
from Figure 16.11. (b) Equalizing the interconnect segments between CLK and
all destinations (by including jogs if necessary) minimizes clock skew.
The clock tree may contain multiply-driven nodes (more than one active element
driving a net). The net delay models that we have used break down in this case
and we may have to extract the clock network and perform circuit simulation,
followed by back-annotation of the clock delays to the netlist (for circuit
extraction, see Section 17.4 ) and the bus currents to the clock router. The sizes of
the clock buses depend on the current they must carry. The limits are set by
reliability issues to be discussed next.
Clock skew induced by hot-electron wearout was mentioned in Section 16.1.6,
Clock Planning. Another factor contributing to unpredictable clock skew is
changes in clock-buffer delays with variations in power-supply voltage due to
data-dependent activity. This activity-induced clock skew can easily be larger
than the skew achievable using a clock router. For example, there is little point in
using software capable of reducing clock skew to less than 100 ps if, due to
fluctuations in power-supply voltage when part of the chip becomes active, the
clock-network delays change by 200 ps.
The power buses supplying the buffers driving the clock spine carry direct
current ( unidirectional current or DC), but the clock spine itself carries
alternating current ( bidirectional current or AC). The difference between
electromigration failure rates due to AC and DC leads to different rules for sizing
clock buses. As we explained in Section 16.1.6, Clock Planning, the fastest way
to drive a large load in CMOS is to taper successive stages by approximately e ≈
3. This is not necessarily the smallest-area or lowest-power approach, however [
Veendrick, 1984].
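A quick sizing sketch for such a tapered buffer chain (the 100:1 load ratio is an assumed example):

```python
import math

def taper_chain(c_load_over_c_in):
    """Split a load ratio into n stages tapered by roughly e ~ 2.718 each,
    the fastest choice (though not the smallest or lowest-power one)."""
    n = max(1, round(math.log(c_load_over_c_in)))  # n stages of ratio ~e
    stage_ratio = c_load_over_c_in ** (1.0 / n)    # equal per-stage taper
    return n, stage_ratio

n, ratio = taper_chain(100.0)
print(n, round(ratio, 2))  # prints 5 2.51: five stages, each about 2.5x larger
```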
where J̄ is the time average of the current density J(t) , and | J | is the time
average of | J(t) |. The constant k AC/DC relates the relative effects of AC and
DC and is typically between 0.01 and 0.0001. Electromigration problems become
serious with an MTTF (mean time to failure) of less than 10^5 hours
(approximately 10 years) for DC current densities greater than 0.5 GAm^-2 at
temperatures above 150 °C.
Table 17.1 lists example metallization reliability rules (limits for the current
you can pass through a metal layer, contact, or via) for the typical 0.5 µm
three-level metal CMOS process, G5. The limit of 1 mA of current per square micron of
metal cross section is a good rule-of-thumb to follow for current density in
aluminum-based interconnect.
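A sizing sketch using the per-width limits of Table 17.1 (the 50 mA bus current is an assumed example):

```python
# Current limits per micron of metal width for the assumed G5-style process
# (Table 17.1): m1 and m2 carry 1 mA/um, the thicker m3 carries 2 mA/um.
CURRENT_LIMIT_MA_PER_UM = {'m1': 1.0, 'm2': 1.0, 'm3': 2.0}

def min_bus_width_um(current_ma, layer):
    """Minimum metal width (microns) for a given DC bus current."""
    return current_ma / CURRENT_LIMIT_MA_PER_UM[layer]

print(min_bus_width_um(50.0, 'm1'))  # prints 50.0: 50 mA needs 50 um of m1
print(min_bus_width_um(50.0, 'm3'))  # prints 25.0: or only 25 um of m3
```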
Some CMOS processes also have maximum metal-width rules (or fat-metal rules ).
This is because stress, which occurs during die attach (mounting the die on the
chip carrier) and is largest at the corners of the die, can cause large metal
areas to lift. A solution to this problem is to place slots in the wide metal
lines. These rules depend on the ASIC vendor's level of experience.
To determine the power-bus widths we need to determine the bus currents. The
largest problem is emulating the system's operating conditions. Input vectors to
test the system are not necessarily representative of actual system operation.
Clock-bus sizing depends strongly on the parameter k AC/DC in Eq. 17.10 , since
the clock spine carries alternating current. (For the sources of power dissipation
in CMOS, see Section 15.5, Power Dissipation.)
Gate arrays normally use a regular power grid as part of the gate-array base. The
gate-array logic cells contain two fixed-width power buses inside the cell,
running horizontally on m1. The horizontal m1 power buses are then strapped in
a vertical direction by m2 buses, which run vertically across the chip. The
resistance of the power grid is extracted and simulated with SPICE during the
base-array design to model the effects of IR drops under worst-case conditions.
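A back-of-the-envelope version of that IR-drop check (a sketch only; the 95 mΩ/square m1 sheet resistance is from Table 17.1, while the bus geometry and current are assumptions):

```python
M1_SHEET_RESISTANCE = 0.095   # ohms per square (Table 17.1)

def ir_drop(length_um, width_um, current_a):
    """Voltage drop along a uniform metal bus fed from one end."""
    squares = length_um / width_um          # resistance scales with squares
    return current_a * M1_SHEET_RESISTANCE * squares

drop = ir_drop(length_um=1000.0, width_um=10.0, current_a=0.05)
print(round(drop, 3))  # prints 0.475: far too much drop; widen the bus or add straps
```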
TABLE 17.1 Metallization reliability rules for a typical 0.5 µm ( λ = 0.25 µm )
CMOS process.

Layer/contact/via                       Current limit 1   Metal thickness 2   Resistance 3
m1                                      1 mA/µm           7000 Å              95 mΩ/square
m2                                      1 mA/µm           7000 Å              95 mΩ/square
m3                                      2 mA/µm           12,000 Å            48 mΩ/square
0.8 µm square m1 contact to diffusion   0.7 mA                                11 Ω
0.8 µm square m1 contact to poly        0.7 mA                                16 Ω
0.8 µm square m1/m2 via (via1)          0.7 mA                                3.6 Ω
0.8 µm square m2/m3 via (via2)          0.7 mA                                3.6 Ω
Standard cells are constructed in a similar fashion to gate-array cells, with power
buses running horizontally in m1 at the top and bottom of each cell. A row of
standard cells uses end-cap cells that connect to the VDD and VSS power buses
placed by the power router. Power routing of cell-based ASICs may include the
option to include vertical m2 straps at specified intervals. Alternatively the
number of standard cells that can be placed in a row may be limited during
placement. The power router forms an interdigitated comb structure, minimizing
the number of times a VDD or VSS power bus needs to change layers. This is
achieved by routing with a routing bias on preferred layers. For example, VDD
may be routed with a left-and-down bias on m1, with VSS routed using
right-and-up bias on m2.
Three-level metal processes either use an m3 with a thickness and pitch that is
comparable to m1 and m2 (which usually have approximately the same thickness
and pitch) or they use metal that is much thicker (up to twice as thick as m1 and
m2) with a coarser pitch (up to twice as wide as m1 and m2). The factor that
determines the m3/4/5 properties is normally the sophistication of the fabrication
process.
In a three-level metal process, power routing is similar to that in two-level metal ASICs.
Power buses inside the logic cells are still normally run on m1. Using HVH
routing it would be possible to run the power buses on m3 and drop vias all the
way down to m1 when power is required in the cells. The problem with this
approach is that it creates pillars of blockage across all three layers.
Using three or more layers of metal for routing, it is possible to eliminate some of
the channels completely. In these cases we complete all the routing in m2 and m3
on top of the logic cells using connectors placed in the center of the cells on m1.
If we can eliminate the channels between cell rows, we can flip rows about a
horizontal axis and abut adjacent rows together (a technique known as flip and
abut ). If the power buses are at the top (VDD) and bottom (VSS) of the cells in
m1 we can abut or overlap the power buses (joining VDD to VDD and VSS to
VSS in alternate rows).
Power distribution schemes are also a function of process and packaging
technology. Recall that flip-chip technology allows pads to be placed anywhere
on a chip (see Section 16.1.5, I/O and Power Planning, especially
Figure 16.13d). Four-level metal and aggressive stacked-via rules allow I/O pad
circuits to be placed in the core. The problems with this approach include placing
the ESD and latch-up protection circuits required in the I/O pads (normally kept
widely separated from core logic) adjacent to the logic cells in the core.
1. At 125 °C for unidirectional current. Limits for 110 °C are × 1.5 higher. Limits
for 85 °C are × 3 higher. Current limits for bidirectional current are × 1.5 higher
than the unidirectional limits.
2. 10,000 Å (ten thousand angstroms) = 1 µm.
3. Worst case at 110 °C.
17.4 Circuit Extraction and
DRC
After detailed routing is complete, the exact length and position of each
interconnect for every net is known. Now the parasitic capacitance and resistance
associated with each interconnect, via, and contact can be calculated. This data is
generated by a circuit-extraction tool in one of the formats described next. It is
important to extract the parasitic values that will be on the silicon wafer. The
mask data or CIF widths and dimensions that are drawn in the logic cells are not
necessarily the same as the final silicon dimensions. Normally mask dimensions
are altered from drawn values to allow for process bias or other effects that occur
during the transfer of the pattern from mask to silicon. Since this is a problem
that is dealt with by the ASIC vendor and not the design software vendor, ASIC
designers normally have to ask very carefully about the details of this problem.
Table 17.2 shows values for the parasitic capacitances for a typical 1 µm CMOS
process. Notice that the fringing capacitance is greater than the parallel-plate
(area) capacitance for all layers except poly. Next, we shall describe how the
parasitic information is passed between tools.
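The way the area and fringing components combine can be sketched as follows (an illustration; the coefficient values are invented for the example and are not the Table 17.2 data):

```python
def wire_capacitance(length_um, width_um, c_area, c_fringe):
    """Total capacitance (aF) of an isolated straight wire:
    a parallel-plate term on its area plus a fringing term on its perimeter.
    c_area is in aF/um^2, c_fringe in aF/um of edge."""
    area = length_um * width_um
    perimeter = 2 * (length_um + width_um)
    return area * c_area + perimeter * c_fringe

# 100 um of 1 um-wide metal with assumed 30 aF/um^2 and 80 aF/um coefficients:
total = wire_capacitance(100.0, 1.0, 30.0, 80.0)
print(total)  # prints 19160.0: the fringing term (16160) dominates the area term (3000)
```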
Here is an example regular SPF file for just one net that uses the PI segment
model shown in Figure 17.22 (e):
FIGURE 17.23 The detailed standard parasitic format (DSPF) for interconnect
representation. (a) An example network with two m2 paths connected to a logic
cell, INV1. The grid shows the coordinates. (b) The equivalent DSPF circuit
corresponding to the DSPF file in the text.
1. Fringing capacitances are per isolated line. Closely spaced lines will have
reduced fringing capacitance and increased interline capacitance, with increased
total capacitance.
2. NA = not applicable.