But this isn't an architecture course?!
A lot of the CAD algorithms we will talk about are specific to FPGAs:
- FPGAs consume a large market share of all CAD
- Their structure provides unique CAD constraints and opportunities
- As technology advances, we are seeing more structured approaches, even for custom-chip implementations
- These will require new CAD techniques that may be inspired by those used for FPGAs today
Since we will spend a lot of time talking about CAD for FPGAs, it is important to talk about what an FPGA looks like inside (its architecture).
FPGA Architectures
Steve Wilton stevew@ece.ubc.ca Course web site: http://www.ece.ubc.ca/~eece583
What's Inside an FPGA?
Logic Blocks
- used to implement logic
- lookup tables and flip-flops
- Altera: LABs; Xilinx: CLBs
I/O Blocks
- interface off-chip
- can usually support many I/O standards
[Figure: FPGA fabric with labelled logic blocks, I/O blocks, connection blocks, switch blocks, horizontal routing tracks, and vertical routing channels]
Logic Block:
Basic logic gate: the lookup table (LUT)
[Figure: lookup table with its inputs and configuration bit-stream]
Logic blocks implement the functionality of the circuit.
The function of each lookup table can be configured by shifting in a bit-stream.
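A minimal sketch of a k-input LUT whose configuration bits are loaded through a shift register. The class name `LUT`, its methods, and the convention that the first bit shifted in ends up at the highest address are illustrative assumptions, not any vendor's configuration format.

```python
class LUT:
    """A k-input lookup table: 2**k configuration bits addressed by the inputs."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.bits = [0] * (2 ** k)  # configuration memory, initially cleared

    def shift_in(self, bitstream: list[int]) -> None:
        """Load the configuration serially, one bit at a time.

        Assumed convention: each new bit enters at address 0 and every stored
        bit moves up one address, so after 2**k shifts the first bit shifted
        in sits at the highest address.
        """
        for b in bitstream:
            self.bits = [b] + self.bits[:-1]

    def evaluate(self, *inputs: int) -> int:
        """Use the inputs as an address (first input = most significant bit)."""
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs")
        address = 0
        for x in inputs:
            address = (address << 1) | (x & 1)
        return self.bits[address]


# Configure a 2-LUT as AND: only address 3 (A = 1, B = 1) holds a 1.
lut = LUT(2)
lut.shift_in([1, 0, 0, 0])
print([lut.evaluate(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 0, 0, 1]
```

Because the LUT simply stores a truth table, a 4-LUT holds 16 bits and can be configured as any of the 2^16 = 65,536 functions of four inputs; only the bit-stream changes, never the hardware.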
Logic Block:
Quick Question: What function would this implement? The LUT has inputs A, B, C and contents 1 1 1 1 1 1 1 0.
Answer: F = A + B + C. The single 0 is the entry for A = B = C = 0, so the output is 1 whenever any input is 1.
[Figure: lookup table whose output feeds a D flip-flop (D, Q)]
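To check the answer, the LUT contents for any Boolean function can be generated by enumerating its truth table. This sketch assumes the bit-stream lists the entry for A = B = C = 1 first and A = B = C = 0 last (the order that makes the quick-question contents come out as shown); `lut_contents` is a hypothetical helper name.

```python
from itertools import product
from typing import Callable


def lut_contents(f: Callable[..., int], k: int) -> list[int]:
    """Truth table of f, listed from address 2**k - 1 down to address 0.

    Inputs are taken in order (A, B, C, ...) with A as the most significant
    address bit; the listing order is an assumption chosen to match the slide.
    """
    table = {}
    for inputs in product((0, 1), repeat=k):
        address = 0
        for x in inputs:
            address = (address << 1) | x
        table[address] = f(*inputs)
    return [table[a] for a in range(2 ** k - 1, -1, -1)]


print(lut_contents(lambda a, b, c: a | b | c, 3))  # F = A + B + C -> [1, 1, 1, 1, 1, 1, 1, 0]
print(lut_contents(lambda a, b, c: a & b & c, 3))  # F = ABC       -> [1, 0, 0, 0, 0, 0, 0, 0]
```

Only the all-zeros entry of the OR function is 0, which is exactly the pattern in the question.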
Xilinx Virtex II Logic Block
[Figure: Virtex II logic block with LUT/RAM/ROM and flip-flop/latch elements; labelled signals include G1-G4, WG1-WG4, BY, CE, CLK, SR, CIN, COUT, SHIFTIN, SHIFTOUT, SOPIN, SOPOUT, ALTDIG, DIG, X2, Y, YB, DY]
Stratix II Logic Block:
[Figure: Stratix II logic block]
Source: Stratix II Handbook, 2005
Logic Clusters
Logic blocks are grouped into clusters:
- Intra-cluster connections (through the local interconnect): fast
- Inter-cluster connections: slow
[Figure: cluster of logic blocks, each with a D flip-flop, joined by local interconnect]
There is a balance:
- Larger clusters mean more connections can be intra-cluster connections
- But larger clusters mean the intra-cluster connections are not as fast
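To make the first half of this trade-off concrete, the sketch below counts how many block-to-block connections of a small hypothetical netlist stay inside a cluster for two cluster sizes. The netlist, the cluster assignments, and the function name are illustrative assumptions; the sketch does not model the slower local interconnect of a larger cluster.

```python
def classify_connections(connections, cluster_of):
    """Count (intra-cluster, inter-cluster) connections.

    connections: list of (driver_block, sink_block) pairs (hypothetical netlist)
    cluster_of:  dict mapping each logic block to its cluster id
    """
    intra = inter = 0
    for src, dst in connections:
        if cluster_of[src] == cluster_of[dst]:
            intra += 1  # can use the fast local interconnect
        else:
            inter += 1  # must use the slower general-purpose routing
    return intra, inter


connections = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c")]
small = {"a": 0, "b": 0, "c": 1, "d": 1}  # two clusters of 2 blocks
large = {"a": 0, "b": 0, "c": 0, "d": 0}  # one cluster of 4 blocks
print(classify_connections(connections, small))  # (2, 3)
print(classify_connections(connections, large))  # (5, 0)
```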
Cluster Architecture:
[Figure: cluster of six 4-LUTs]
The cluster architecture will significantly impact the speed and routability of the device.
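Intra-cluster routing:
- Academic studies typically consider fully populated intra-cluster routing: every LUT input can connect to every cluster input and every LUT output in the cluster
- Commercial parts: depopulated (the example shown is 50% populated)
[Figure: fully populated and 50% depopulated intra-cluster routing feeding six 4-LUTs]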
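The cost of full population can be estimated by counting crossbar switches. This is a sketch under an assumed model: each LUT input pin can select from a `population` fraction of the sources (cluster inputs plus LUT outputs fed back). The 6 LUTs match the figure; the 14 cluster inputs are a hypothetical choice that happens to match the often-quoted I = K(N+1)/2 rule of thumb.

```python
def intra_cluster_switches(n_luts: int, k: int, cluster_inputs: int,
                           population: float = 1.0) -> int:
    """Programmable switches in the intra-cluster routing (assumed model).

    Each of the n_luts * k LUT input pins can be driven by a `population`
    fraction of the (cluster_inputs + n_luts) sources, where the n_luts
    sources are the LUT outputs fed back inside the cluster.
    """
    sources = cluster_inputs + n_luts
    per_pin = round(population * sources)
    return n_luts * k * per_pin


# Hypothetical cluster: six 4-LUTs and 14 cluster inputs
full = intra_cluster_switches(6, 4, 14)        # 6 * 4 * 20 = 480
half = intra_cluster_switches(6, 4, 14, 0.5)   # 6 * 4 * 10 = 240
print(full, half)
```

Halving the population halves the switch count (and the load on each source), at the price of giving the packer fewer choices for which LUT input a signal may use.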
Altera Stratix LAB (Logic Array Block):
[Figure: LAB of 10 LEs sharing a local interconnect, with general-purpose routing and connections to and from the left and right neighbouring LABs]
Altera APEX MegaLAB:
[Figure: MegaLAB of 16 LABs and 1 ESB (Embedded System Block), each LAB with its own local interconnect, joined by the MegaLAB interconnect]
- 10 Logic Elements in each LAB
- Two carry chains through each LAB
- Connections to general-purpose routing and neighbouring LABs
Routing is important!
[Figure: pie chart breaking the FPGA down into Logic (LUTs), LUT outputs, LUT inputs, CLB inputs, CLB input buffers, and routing switches (routing fabric)]
Source: Guy Lemieux
Mesh (Island-style) FPGA
What's Inside an FPGA?
[Figure: array of clustered logic blocks separated by routing channels, with switch blocks where the channels cross]
Reconfigurable Logic:
Connect logic blocks using fixed metal tracks and programmable switches.
Most of the FPGA area is due to routing:
- Fixed metal tracks arranged in horizontal and vertical channels
- Connected to each other using switch blocks
Programmable Switches
[Figure: an SRAM cell controlling an unbuffered connection, and an SRAM cell controlling a buffered connection]
Today, buffered connections are common.
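One common way to see why buffers help is a simplified Elmore-delay model: in a chain of n unbuffered switches, each segment's capacitance is charged through every switch upstream, so delay grows with n², while a buffer at each switch isolates the stages so delay grows linearly. The normalized values and the fixed buffer delay `t_buf` below are illustrative assumptions, not data from any real device.

```python
def unbuffered_delay(n: int, r: float, c: float) -> float:
    """Elmore delay through n pass switches in series.

    Segment i's capacitance c is charged through all i switch resistances
    upstream of it, so the total is r*c*(1 + 2 + ... + n) = r*c*n*(n+1)/2.
    """
    return sum(i * r * c for i in range(1, n + 1))


def buffered_delay(n: int, r: float, c: float, t_buf: float) -> float:
    """Same path with a buffer at every switch: each buffer isolates the
    downstream stage, so each stage costs its own r*c plus t_buf."""
    return n * (r * c + t_buf)


for n in (2, 10, 20):
    print(n, unbuffered_delay(n, 1.0, 1.0), buffered_delay(n, 1.0, 1.0, 2.0))
# 2 3.0 6.0
# 10 55.0 30.0
# 20 210.0 60.0
```

In this model short paths favour unbuffered switches, but once a route crosses many switches the quadratic term dominates and buffering wins.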
Switch Blocks:
Switch blocks connect horizontal and vertical channels.
Every possible connection?
- Too big
- Too slow
Many topologies are possible; Fs = 3 (each incoming track can connect to three other tracks) is common.
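A quick count shows why "every possible connection" is too big. This sketch assumes every track ends at the switch block and each switch is bidirectional, so one switch joins two track ends; W = 100 is a hypothetical channel width.

```python
def switch_block_switches(w: int, fs: int) -> int:
    """Bidirectional switches in one switch block with w tracks per side.

    Each of the 4*w track ends connects to fs track ends on the other sides;
    every switch joins two track ends, so it would be counted twice.
    """
    return 4 * w * fs // 2


def full_switch_block_switches(w: int) -> int:
    """Every track end connects to every track on the other three sides (Fs = 3w)."""
    return switch_block_switches(w, 3 * w)


W = 100
print(switch_block_switches(W, 3))     # 600
print(full_switch_block_switches(W))   # 60000
```

With Fs = 3 the switch count grows linearly in W (6W); full connectivity grows as 6W², and every extra switch also adds capacitance to the tracks, which makes it slow as well as big.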
Implementing the Switch Block:
Circuit-level design of these switch blocks will be considered later
Switch Block Topologies:
- Disjoint
- Universal
- Wilton
Advantage of Wilton Switch Block
[Figure: routing between CLBs through switch blocks]
Diversity means you can get to more routing tracks. It tends to provide slightly better routability, with no big impact on delay.
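The "diversity" argument can be checked by following a signal through a row of identical switch blocks and recording which track numbers it can reach. The disjoint pattern keeps a signal on one track number; a Wilton-style pattern changes the track number on turns. The exact Wilton offsets below are an assumption modelled on the form used in the VPR tool, and the model ignores chip boundaries.

```python
LEFT, RIGHT, TOP, BOTTOM = "L", "R", "T", "B"
OPPOSITE = {LEFT: RIGHT, RIGHT: LEFT, TOP: BOTTOM, BOTTOM: TOP}


def disjoint(from_side: str, to_side: str, t: int, w: int) -> int:
    """Disjoint pattern: track t only ever connects to track t."""
    return t


def wilton(from_side: str, to_side: str, t: int, w: int) -> int:
    """Wilton-style pattern (assumed offsets): straight connections keep the
    track number, turns change it."""
    if {from_side, to_side} in ({LEFT, RIGHT}, {TOP, BOTTOM}):
        return t
    turns = {
        (LEFT, TOP): (w - t) % w, (TOP, LEFT): (w - t) % w,
        (LEFT, BOTTOM): (w + t - 1) % w, (BOTTOM, LEFT): (t + 1) % w,
        (RIGHT, TOP): (w + t - 1) % w, (TOP, RIGHT): (t + 1) % w,
        (RIGHT, BOTTOM): (2 * w - 2 - t) % w, (BOTTOM, RIGHT): (2 * w - 2 - t) % w,
    }
    return turns[(from_side, to_side)]


def reachable_tracks(pattern, w: int, start_track: int = 0) -> set[int]:
    """Track numbers reachable from start_track (entering from the left)
    by passing through any number of identical switch blocks."""
    start = (LEFT, start_track)  # (side the signal enters from, track number)
    seen = {start}
    frontier = [start]
    while frontier:
        side, t = frontier.pop()
        for out in (LEFT, RIGHT, TOP, BOTTOM):
            if out == side:
                continue
            # leaving through `out` means entering the next block from the opposite side
            nxt = (OPPOSITE[out], pattern(side, out, t, w))
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return {t for _, t in seen}


print(reachable_tracks(disjoint, 4))  # {0}
print(reachable_tracks(wilton, 4))    # {0, 1, 2, 3}
```

With the disjoint block the channel splits into W independent "domains"; with the Wilton-style block a signal starting on track 0 can reach every track, which gives the router more options.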
Wiring Segments
Short segments are good for local connections; long segments are good for global connections.
Segmented Architecture
At each switch block, some tracks end and some tracks pass right through.
[Figure: row of logic blocks (LB) with single-length segments, medium-length segments, and long line segments]
Most FPGAs have a variety of segment lengths.
Segment Lengths
Typically, an FPGA contains a mix of segment lengths:
- Some wires that span only one logic block
- Some wires that span more than one logic block
- Some wires that span the whole chip
If a segment is too short, a connection must traverse many segments to reach its destination.
If a segment is too long, it wastes routing capacity and adds extra capacitance.
Academic work has suggested length-4 segments.
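The two failure modes can be quantified for a single straight connection. This is a simplified model (an assumption): all segments have one length, the connection starts exactly where a segment starts, and segments are not staggered, so the last segment may overshoot the destination.

```python
def segments_needed(distance: int, length: int) -> tuple[int, int]:
    """Return (segments used, unused wire length) to span `distance` logic blocks."""
    n = (distance + length - 1) // length  # ceiling division
    wasted = n * length - distance         # wire driven past the destination
    return n, wasted


for length in (1, 4, 16):
    print(length, segments_needed(6, length))
# 1 (6, 0)    six segments (and switches) but no waste
# 4 (2, 2)
# 16 (1, 10)  one segment, but most of it (and its capacitance) is unused
```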
Wilton block for Segmented Architectures
The Wilton block needs more switches for the pass-through case.
The Imran Switch Block
At each switch block, some tracks terminate and some tracks do not:
- Tracks that terminate: connect using the Wilton pattern
- Tracks that do not terminate: connect using the Disjoint pattern
[Figure: programmable switches at a switch block for terminating and non-terminating tracks]
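The two rules can be combined into one connection function. In this sketch the Wilton offsets are the same assumed VPR-style mapping (applied only within the subset of terminating tracks), and the simplification that the same tracks terminate on every side, as well as the example track set, are hypothetical.

```python
LEFT, RIGHT, TOP, BOTTOM = "L", "R", "T", "B"


def wilton_turn(from_side: str, to_side: str, i: int, m: int) -> int:
    """Wilton-style mapping on m tracks (assumed offsets, modelled on VPR's form)."""
    if {from_side, to_side} in ({LEFT, RIGHT}, {TOP, BOTTOM}):
        return i  # straight through keeps the track number
    turns = {
        (LEFT, TOP): (m - i) % m, (TOP, LEFT): (m - i) % m,
        (LEFT, BOTTOM): (m + i - 1) % m, (BOTTOM, LEFT): (i + 1) % m,
        (RIGHT, TOP): (m + i - 1) % m, (TOP, RIGHT): (i + 1) % m,
        (RIGHT, BOTTOM): (2 * m - 2 - i) % m, (BOTTOM, RIGHT): (2 * m - 2 - i) % m,
    }
    return turns[(from_side, to_side)]


def imran_connect(from_side: str, to_side: str, t: int, terminating: list[int]) -> int:
    """Track reached on to_side from track t on from_side.

    terminating: sorted track numbers whose segments end at this switch block
    (simplifying assumption: the same set on every side).
    """
    if t in terminating:
        # Terminating tracks: Wilton pattern applied within the terminating subset
        i = terminating.index(t)
        return terminating[wilton_turn(from_side, to_side, i, len(terminating))]
    # Non-terminating tracks: disjoint pattern, track t stays on track t
    return t


ends_here = [0, 4]  # hypothetical: 8 tracks of staggered length-4 segments, 2 end here
print(imran_connect(LEFT, BOTTOM, 4, ends_here))  # 0: turns onto a different terminating track
print(imran_connect(LEFT, BOTTOM, 5, ends_here))  # 5: pass-through track keeps its number
```

Keeping pass-through tracks on the disjoint pattern means a track that continues through the block needs no extra switches to reach its own track number on the far side, while the terminating tracks still get the Wilton block's diversity.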
Put the two together:
[Figure: combined Imran switch block]
This gives good results for segmented architectures.
Connection Blocks
Most of the FPGA area is due to routing:
- Fixed metal tracks arranged in horizontal and vertical channels
- Connected to each other using switch blocks
- Connected to logic blocks using connection blocks
Connection Block
[Figure: logic block pins connected to an adjacent routing channel through a connection block]
Detailed Routing Diagram (XC4000X)
[Figure: XC4000X CLB surrounded by Single, Double, Quad, Long, Direct, Global, and Feedback lines, with switch blocks]
Each pin can connect to a subset of the tracks in an adjacent channel. Dots represent programmable connections.
Yes, this is old, but it illustrates the parts. Today, vendors don't publish the routing details.
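The "subset of the tracks" is often described by the fraction of the W tracks each pin can reach (called Fc in the academic literature). The even-spacing pattern and the function name below are hypothetical; real devices use their own, unpublished patterns.

```python
def connection_block(num_pins: int, w: int, fc: float) -> dict[int, list[int]]:
    """Map each logic-block pin to the channel tracks it can connect to.

    Hypothetical pattern: each pin reaches round(fc * w) of the w tracks,
    spaced evenly across the channel and shifted by one track per pin so that
    neighbouring pins favour different tracks. Assumes 0 < fc <= 1.
    """
    per_pin = max(1, round(fc * w))
    step = w / per_pin  # >= 1, so the chosen tracks never collide
    pattern = {}
    for pin in range(num_pins):
        pattern[pin] = sorted({int(pin + j * step) % w for j in range(per_pin)})
    return pattern


# 8-track channel, each pin reaches half of the tracks (fc = 0.5)
for pin, tracks in connection_block(num_pins=4, w=8, fc=0.5).items():
    print(pin, tracks)
# 0 [0, 2, 4, 6]
# 1 [1, 3, 5, 7]
# 2 [0, 2, 4, 6]
# 3 [1, 3, 5, 7]
```

Each entry corresponds to one dot (programmable connection) in the diagram: 16 switches here instead of 32 if every pin could reach every track.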
Altera Stratix
Horizontal: R4 lines, R8 lines, R24 lines
Vertical: C4 lines, C8 lines, C16 lines
Local interconnects
[Figure: R4 lines]
Altera Stratix II
Horizontal: R4 lines, R24 lines
Vertical: C4 lines, C16 lines
Local interconnects
[Figure: R4 lines]
Altera found little benefit to the length-8 lines in Stratix, so Stratix II drops them.
Xilinx Virtex II:
Long lines: span the entire chip
- 24 in each channel (horizontal and vertical)
- Can connect to any logic block (actually through the neighbouring switch block)
Hex lines:
- 120 in each channel (horizontal and vertical)
- Can only be driven at one end
- Two connections to destination logic blocks
Double lines:
- 40 in each channel (horizontal and vertical)
- Driven at one end
Local interconnect between neighbouring logic blocks
[Figure: direct connections between neighbouring logic blocks]
Implementing Systems in an FPGA:
FPGA vendors embed fixed blocks to improve speed and density:
- Embedded memories (blocks of 2K-18K)
- Multiplier blocks
- High-speed I/Os
- Dedicated clock circuitry
- CPU (e.g., ARM, MIPS)
Summary
Two sources of flexibility in an FPGA:
1. Most FPGAs use lookup tables as their basic logic resource
   - A 4-LUT can implement any function of 4 inputs
2. Connections between logic blocks can be made using fixed metal tracks
   - These fixed tracks are connected to each other and to the logic blocks using programmable switches