Logic Synthesis
Part 3 – ASIC Synthesis
Amr Adel Mohammady
/amradelm
Save The Palestinian Children
Israel has killed more than 13,000 children in Gaza since
October 7 while others are suffering from severe
malnutrition and do not “even have the energy to cry”,
says the United Nations Children’s Fund (UNICEF).
“Thousands more have been injured or we can’t even
determine where they are. They may be stuck under
rubble … We haven’t seen that rate of death
among children in almost any other conflict in
the world”
-UNICEF Executive Director
Till Nov,2023
Introduction
• In the previous parts we learned about the FPGA fabric and the FPGA synthesis flow.
• In this part we will discuss the ASIC synthesis flow.
The Inputs and Outputs
• The inputs to ASIC synthesis are:
o HDL: The Verilog or VHDL design files
o Constraints: The timing, power, and area constraints
o Timing Libraries: The standard cell libraries.
o Synthesis Commands:
▪ set target_library <STDCELL_LIBRARY>
set link_library "* $target_library io.db rams.db" ;# See note [1]
read_verilog <RTL FILES LIST>
current_design <TOP_MODULE_NAME>
link
source <TIMING_CONSTRAINTS>
compile ;# Synthesize the design
• The outputs are:
o Design netlist
o Various reports about the design such as timing, power or area reports
o Synthesis Commands:
▪ write -f verilog -o ./netlist.v
report_timing
report_power
report_area
[1] : The target_library variable specifies the library that Design Compiler uses to select cells for optimization and mapping. The link_library variable specifies every library that has cells referenced by the netlist, such as RAMs. The tool uses the libraries specified in the link_library variable for resolving references (linking).
Target Library
• Both FPGA and ASIC synthesis start by creating a technology-independent (generic) netlist.
• After that, the generic netlist is mapped to the target technology and optimized
to meet the constraints.
• The target in ASIC is called a standard cell library:
o It’s a collection of pre-designed and pre-characterized logic gates and other
digital functions used for VLSI design.
o The information stored for each cell includes timing, power consumption, physical
layout, logic functionality, etc.
o This information is spread across multiple files. For example, the timing and
power information is stored in the timing lib/db file, while the physical layout
is stored in LEF/GDS files.
o These files are sometimes called “Views” (e.g. timing view) as they represent
the cell info from a certain point of view.
[Figure: NAND cell, Schematic View and Layout View [1]]
Example Timing View [2]
Rows: load capacitance 𝑪𝑳. Columns: input transition time 𝒕.

𝑪𝑳 \ 𝒕    1.1     1.2     1.3     1.4
10        2.10    2.20    2.27    3.00
20        2.50    3.00    3.45    3.96
30        2.90    3.40    3.80    4.15
[1] : Reference: An Exploration of Applying Gate-Length-Biasing Techniques to Deeply-Scaled FinFETs Operating in Multiple Voltage Regimes. IEEE Transactions on Emerging Topics in Computing. PP. 1-1. 10.1109/TETC.2016.2640185.
[2] : These are arbitrary numbers for demonstration only
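
A timing view only stores values at a few grid points, so a value between them has to be interpolated. The sketch below shows one common way to do this, bilinear interpolation, using the arbitrary numbers from the example timing view. The function names are hypothetical, and a real tool's lookup method may differ; outside the grid this sketch simply extends the nearest table segment linearly.

```python
from bisect import bisect_right

# Axes and values copied from the example timing view (arbitrary demo numbers).
TRANSITIONS = [1.1, 1.2, 1.3, 1.4]   # input transition time t (columns)
LOADS = [10, 20, 30]                 # load capacitance C_L (rows)
DELAYS = [
    [2.10, 2.20, 2.27, 3.00],        # C_L = 10
    [2.50, 3.00, 3.45, 3.96],        # C_L = 20
    [2.90, 3.40, 3.80, 4.15],        # C_L = 30
]


def _bracket(axis, x):
    """Return i so that axis[i] and axis[i + 1] are the segment used for x."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def lookup_delay(transition, load):
    """Bilinear interpolation between the four surrounding table entries."""
    i = _bracket(LOADS, load)
    j = _bracket(TRANSITIONS, transition)
    u = (load - LOADS[i]) / (LOADS[i + 1] - LOADS[i])
    v = (transition - TRANSITIONS[j]) / (TRANSITIONS[j + 1] - TRANSITIONS[j])
    # Interpolate along the transition axis on both rows, then between rows.
    low_row = DELAYS[i][j] * (1 - v) + DELAYS[i][j + 1] * v
    high_row = DELAYS[i + 1][j] * (1 - v) + DELAYS[i + 1][j + 1] * v
    return low_row * (1 - u) + high_row * u


if __name__ == "__main__":
    # A point that lies halfway between four grid entries.
    print(round(lookup_delay(1.15, 15), 4))
```

A query that falls exactly on a grid point returns the stored table entry unchanged.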
Wire Load Model (WLM)
• For synthesis to know the cell delay and power, it needs to know the input transition and
capacitive load.
[Figure labels: OR Cap, INV Cap, Wire Cap]
• Both values depend on the cell type and also the wires connecting the cells.
• The cell information is known from the standard cell library. So, the missing info is the wire
values (resistance and capacitance).
• In older technologies, the wire values were estimated using a wire load model (WLM).
• This model estimates the length of a wire (and therefore its resistance and capacitance) based
on the number of fanouts and the block size, as shown in the diagrams.
• These estimates are based on results from previous designs.
[Figures: More Fanouts => More Wire Length; Larger Block Size => More Wire Length]
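
The idea can be sketched as a small lookup model: a table gives the estimated wire length for each fanout, a slope extends the table to larger fanouts, and per-unit-length factors convert the length into resistance and capacitance. The class name, field names, and every number below are hypothetical and chosen only for illustration; they do not come from any real library.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WireLoadModel:
    """Hypothetical wire load model; all values used with it are made up."""
    lengths: List[float]      # lengths[k] = estimated wire length for fanout k + 1
    slope: float              # extra length per fanout beyond the table
    res_per_length: float     # resistance per unit of wire length
    cap_per_length: float     # capacitance per unit of wire length

    def length(self, fanout: int) -> float:
        if fanout < 1:
            raise ValueError("fanout must be at least 1")
        if fanout <= len(self.lengths):
            return self.lengths[fanout - 1]
        # Beyond the table, extend the last entry linearly with the slope.
        return self.lengths[-1] + self.slope * (fanout - len(self.lengths))

    def parasitics(self, fanout: int) -> Tuple[float, float]:
        """Return the estimated (resistance, capacitance) of the net."""
        wire_length = self.length(fanout)
        return (wire_length * self.res_per_length,
                wire_length * self.cap_per_length)


# One model per block size: the larger block assumes longer wires.
SMALL_BLOCK = WireLoadModel([1.0, 2.0, 3.0, 4.0], 1.0, 0.5, 0.25)
LARGE_BLOCK = WireLoadModel([2.0, 4.0, 6.0, 8.0], 2.0, 0.5, 0.25)

if __name__ == "__main__":
    for fanout in (1, 4, 6):
        print(fanout, SMALL_BLOCK.parasitics(fanout), LARGE_BLOCK.parasitics(fanout))
```

Note that the model never looks at where the cells are actually placed, which is why its estimates depend entirely on how well previous designs predict the current one.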
Wire Load Model (WLM) – Example
Physical Synthesis
• In newer technology nodes the WLM produces inaccurate estimates, so tool vendors introduced another approach called Physical Synthesis.
• In this approach the floorplan and physical info (tech file, cell layout, parasitics, etc.) are passed to the synthesis tool.
• This allows the tool to do cell placement along with logic synthesis and optimization.
• Since it knows the distance between the cells, the tool can estimate the expected wire length more accurately.
• Physical synthesis produces much better results than the WLM approach but has a longer design time.
• Two-pass Synthesis: Tool vendors recommend doing physical synthesis in two steps:
1. Synthesize the design with an initial floorplan. The resulting netlist gives info about the cell count, total area, and congestion, which enables us to create a
better floorplan.
2. Create a new floorplan, then redo the synthesis with the physical info.
• In the next slides we will see the other inputs needed (along with the floorplan) to do
physical synthesis.
[Figures: One-Pass Synthesis (Not Recommended); Two-Pass Synthesis (Recommended)]
Physical Synthesis Inputs – Tech File
• The tech file contains various info about the
technology like:
o The units and precision.
o The coloring of the metals in the GUI.
o The minimum standard cell height and width.
o The design rules such as the layers’ default width
and spacing, etc.
o Via definitions
Example Tech File
Physical Synthesis Inputs – ITF & TLUPLUS
• The ITF (Interconnect Technology File) is a text-based file that contains raw information about
each technology layer such as the thicknesses, resistivity, and dielectric constants
• These values are further processed to generate the TLU+ file, which contains tables of R and C
values as functions of metal layer width and spacing. The effects of all adjacent layers are
taken into account.
• The TLU+ contents are binary, apart from a text header that shows which ITF was used to
generate the TLU+ file.
• Along with TLU+, we use a layer mapping file that maps the layer names between the tech file
and the TLU+
[Figures: Example ITF File; CMOS Cross Section [1]]
[1] : Reference : Okuno, Hanako & Fournier, Adeline & Quesnel, E. & Muffato, V. & Poche, Hélène & Fayolle, M. & Dijon, J.. (2010). CNT integration on different materials suitable for VLSI interconnects. Comptes Rendus Physique - C R PHYS. 11. 381-388. 10.1016/j.crhy.2010.06.008.
Physical Synthesis Inputs – LEF (Library Exchange Format)
• The GDS file contains full data about the design layout and masks and is sent to
the fabrication plant to fabricate the chip.
• From a runtime and memory usage point of view, we don’t need all the info of the
GDS when doing placement. We only care about the cell boundary, pin shapes and
locations.
• The LEF file contains only the necessary info needed to perform placement and is
used during physical synthesis and across the PnR stages.
• Once PnR is finished, the LEF views are replaced by the GDS views to produce the
final GDS file that contains all the info needed by the fabrication plant
[1] : Reference : Automated integrity checks stop out-of-sync data issues in parallel flows (techdesignforums.com)
ASIC Synthesis Options
Critical Range & TNS Optimization
• By default, the tool focuses on improving the worst negative slack (WNS).
• Besides the worst path, the tool can also work on paths whose slack is close to the WNS. How close is controlled by the critical range.
• A critical range of 0.0 means that only the most critical paths (the ones with the worst violation) are optimized. If you specify a nonzero critical range, near-critical paths within that amount of the worst path will also be optimized, if possible.
• You can also instruct the tool to focus on improving the total negative slack (TNS)
at the cost of additional runtime.
• Synthesis Commands:
o set_critical_range 2.5 top
o set compile_timing_high_effort_tns true
[Figure labels: TNS; WNS; With critical range of 2.5]
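
To make the three terms concrete, the sketch below computes the WNS, the TNS, and the set of paths that fall inside a given critical range from a list of path slacks. The slack values and function names are hypothetical, and the selection rule (slack within the range of the worst slack) is a simplified reading of the description above, not the tool's internal algorithm.

```python
def worst_negative_slack(slacks):
    """WNS: the slack of the single worst path."""
    return min(slacks)


def total_negative_slack(slacks):
    """TNS: the sum of the slacks of all violating paths."""
    return sum(s for s in slacks if s < 0)


def paths_in_critical_range(slacks, critical_range):
    """Paths whose slack is within critical_range of the worst slack."""
    limit = worst_negative_slack(slacks) + critical_range
    return [s for s in slacks if s <= limit]


# Hypothetical endpoint slacks, for demonstration only.
SLACKS = [-4.0, -3.0, -1.5, -0.5, 0.25]

if __name__ == "__main__":
    print("WNS:", worst_negative_slack(SLACKS))
    print("TNS:", total_negative_slack(SLACKS))
    print("range 0.0:", paths_in_critical_range(SLACKS, 0.0))
    print("range 2.5:", paths_in_critical_range(SLACKS, 2.5))
```

With a range of 0.0 only the worst path is selected, while a range of 2.5 also pulls in the paths at -3.0 and -1.5. The path at -0.5 still violates timing but lies outside the range, which is the gap that TNS-focused optimization is meant to cover.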
Arithmetic Blocks Architecture
• Digital blocks trade off speed against power and area. The designer might choose an implementation that consumes more power or has a larger area
but runs at a higher speed.
• For example, there are different ways to implement binary adders. One implementation is the ripple adder, which has a small area and power consumption but a
high 𝑇𝑐𝑜𝑚𝑏, while a carry-look-ahead (CLA) adder has a smaller 𝑇𝑐𝑜𝑚𝑏 but takes a larger area.
• The synthesis tool can automatically choose the best implementation to improve timing or area.
• Synthesis Commands:
o set_dp_smartgen_options -optimize_for [area | speed | area,speed]
[Figure: two adder implementations]
Ripple adder:            𝑇𝑐𝑜𝑚𝑏 = 700 ps, Area = 75 μm²
Carry-look-ahead adder:  𝑇𝑐𝑜𝑚𝑏 = 400 ps, Area = 130 μm²
Reference : Kamanga, Isaack. Design Optimization of the 64-Bit Carry Look-Ahead Adder Based on FPGA and Verilog HDL
Register Duplication
• By duplicating registers, the timing paths can be shortened, reducing the wire and
cell propagation delays.
• This also reduces the fanout on the register which may enhance the output delay of
the register
• Consider the example on the right:
o By duplicating the green register we managed to move each copy near one of
the blue registers.
o This reduces the wire length between the green and blue registers and
allows us to remove the buffers and inverter pairs on the nets. Both effects
reduce the total combinational delay.
o This method is therefore most useful when the capture registers
(the blue ones) are placed far away from each other in the chip.
o However, FF1 now drives double the fanout, so the delay of the timing path
between FF1 and FF2 is increased. We need to make sure this increase doesn’t
cause the path to violate setup timing.
• Duplication can be enabled globally or on a cell-by-cell basis
• Synthesis Commands:
o set compile_register_replication true
#When this variable is set to true, compile tries to identify registers in the current design that can be split to balance the loads for better QoR.
o set_register_replication -num_copies 3 <REGISTER>
#Duplicate a certain register 3 times.
[Figures: Before Duplication; After Duplication]
Register Merging
• Merging is the opposite of duplication and is done to reduce the area in the design
but might degrade timing.
• Merging can be enabled globally or on a cell-by-cell basis
• Synthesis Commands:
o set compile_enable_register_merging true
o set_register_merging <REGISTER_LIST> true
#Merge certain registers.
[Figures: Before Merging; After Merging]
Preferred MUX Implementation
• Standard cell libraries have the basic cells needed to build a MUX (2 AND gates, 1 OR gate, and 1 inverter) but also have
integrated MUX cells.
• It’s better to use the basic cells to build a MUX because each cell can be placed and optimized individually.
This gives more flexibility for placement and optimization, which produces better timing and area
results.
• The problem is that this approach increases the number of pins. For example, a 2:1 MUX will have 11 pins (6 pins
for the 2 ANDs, 2 for the inverter, 3 for the OR) compared to 4 pins for the integrated MUX (2 inputs, 1 output, 1
selection).
• This might create pin congestion and make routing difficult. In such cases, it’s better to use the MUX cells.
• ASIC tools allow you to tell the synthesis tool which implementation it should prefer.
• Synthesis Commands:
o set compile_prefer_mux true
#The default flow typically maps most multiplexers to and-or-invert (AOI)
logic in order to minimize area, but in some cases this can result in
congestion hotspots. With compile_prefer_mux enabled, multiplexing logic
that is likely to cause congestion is converted to MUX trees where possible.
o set hdlin_infer_mux all
set_size_only [get_cells -hier * -filter "@ref_name =~ *MUX_OP*"]
#These commands force the compiler to use MUX cells instead of the basic
gates. However, this restricts the tool and might degrade QoR.
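
The pin-count comparison for a 2:1 MUX can be checked with a few lines of arithmetic. The cell names below (AND2, OR2, INV, MUX2) are hypothetical labels for two-input gates, an inverter, and an integrated 2:1 MUX; the pin counts per cell follow the numbers given above.

```python
# Pins per cell: inputs plus one output (plus the select pin for the MUX).
PINS_PER_CELL = {
    "AND2": 3,   # 2 inputs + 1 output
    "OR2": 3,    # 2 inputs + 1 output
    "INV": 2,    # 1 input + 1 output
    "MUX2": 4,   # 2 data inputs + 1 select + 1 output
}


def total_pins(cells):
    """Sum the pins of an implementation given as {cell_name: count}."""
    return sum(PINS_PER_CELL[name] * count for name, count in cells.items())


# A 2:1 MUX built from basic gates versus one integrated MUX cell.
DISCRETE_MUX = {"AND2": 2, "OR2": 1, "INV": 1}
INTEGRATED_MUX = {"MUX2": 1}

if __name__ == "__main__":
    print("basic gates:", total_pins(DISCRETE_MUX))
    print("integrated :", total_pins(INTEGRATED_MUX))
```

The gap grows with every MUX in the design, which is why wide MUX structures built from basic gates are a typical source of pin congestion.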
Multi-Bit Banking
• ASIC standard cell libraries contain special flip-flops that can store more than one bit. These FFs are called multi-bit registers, and grouping single-bit registers into them is called multi-bit banking.
• The area of a multi-bit register is less than the total area of the same registers implemented individually.
• Also, the clock tree has fewer buffers (less area and power) when multi-bit banking is enabled.
• One disadvantage is the limited placement flexibility, since all the bits are forced to be placed at the same location.
• The other disadvantage is the limited CTS flexibility, since all bits are forced to have the same clock latency. This limits fixing timing violations with local skew
optimizations.
• Synthesis Commands:
o set hdlin_infer_multibit [never | default_all | default_none]
#The never setting prevents inference of multibit components from HDL regardless of directives (Verilog) or attributes (VHDL).
#The default_all setting infers multibit components on all bused registers except where directives or attributes indicate otherwise.
#The default_none setting specifies that only attributes or directives are used to infer multibit components. This is the default for the hdlin_infer_multibit variable.
Thank You!