Digital Systems Design Methodologies
High level synthesis
Introduction
Models: FSMD
Synthesis definition
Behavioral Structural
Architectural Architectural synthesis Control
For I=0 to I=15 Memory
Level Sum = Sum + array[I]
(register level)
+
0 State
Logic Logic synthesis
0 0
Level
0 Circuit synthesis
Circuit (Library)
Level
Geometric
-2-
Architectural Synthesis Problem
Specification
•An Intermediate representation
•A set of functional resources
•characterized by area and execution delay
Constraints
Objectives
Output:
•Data-path + controller
Tasks
•Place operations in time and space
•Determine detailed interconnection and control
Objectives/Constraints include: area, cycle time, latency, and throughput.
•Area: number of modules/resources available or size of your silicon die.
•Cycle time: how fast your clock runs
•Latency: number of cycles for input data to result in a solution or result.
•Throughput: Amount of data that can be processed in a given amount of time (usually involves pipelining)
-3-
Architectural Synthesis output
• Functional resources: Perform
external
operations on data (arithmetic
external
control and logic blocks).
Data-path:
data
inputs inputs • Memory resources: Store data
… … (memory and registers).
datapath
control • Interconnection resources:
controller inputs datapath (muxes, busses and ports).
datapath
control
… outputs …
external external • Finite state machine (FSM)
control data • Controller + microprogram
outputs outputs
Controller: • Sinchronization scheme (e.g.
global clock singol phase with
master-slave registers)
controller and datapath
-4-
Objective function
Main goals in classical approach
1. Minimum area
– Functional units, registers, memory, interconnect
2. Maximum speed
– latency in clock cycles
– cycle time
– throughput
Generally one parameter is set as a constrained
and the other one is optimized
-5-
More sophisticated Objective functions for
high-level and system design
Additional goals in modern approaches
❑ More accurate estimation, such as
– Size of operands
– Sharing of hardware for similar operations (e.g. + and -)
❑ Testability
❑ Low power
• Power down, clock disabling
❑ Reliability
• Fault tolerance, self-test
-6-
Finite State Machine + Datapath
model
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
… …
external external
control data controller datapath
inputs inputs
… …
datapath next-state registers
control and
controller inputs datapath control
logic
datapath
control state functional
outputs register units
… …
external external
control data
outputs outputs
… …
controller and datapath a view inside the controller and datapath
-7-
Example: greatest common divisor
!1
(a) black-box 1:
First create algorithm
(c) state
❑ view 1 !(!go_i) diagram
2:
❑ Convert algorithm to go_i x_i y_i
2-J:
!go_i
“complex” state GCD
3: x = x_i
machine
d_o
4: y = y_i
– Known as FSMD: finite- (b) desired !(x!=y)
5:
state machine with functionality
x!=y
datapath 0: int x, y;
1: while (1) {
6:
x<y !(x<y)
– Can use templates to 2: while (!go_i);
3: x = x_i; 7: y = y -x 8: x = x - y
perform such conversion 4: y = y_i;
5: while (x != y) {
6-J:
6: if (x < y)
5-J:
7: y = y - x;
else 9: d_o = x
8: x = x - y;
} 1-J:
9: d_o = x;
}
-8-
State diagram templates
Assignment statement Loop statement Branch statement
while (cond) { if (c1)
c1 stmts
a=b loop-body- else if c2
next statement statements c2 stmts
} else
other stmts
next statement next statement
!cond
a=b C: C:
cond c1 !c1*c2 !c1*!c2
next loop-body-
statements
c1 stmts c2 stmts others
statement
J: J:
next next
statement statement
-9-
Creating the datapath
❑ Create a register for any 1:
!1
declared variable 1 !(!go_i)
2:
Create a functional unit for
x_i y_i
❑ !go_i
Datapath
each arithmetic operation 2-J:
x_sel
n-bit 2x1 n-bit 2x1
3: x = x_i
❑ Connect the ports, registers y_sel
x_ld
and functional units 4: y = y_i 0: x 0: y
y_ld
– Based on reads and writes 5: !(x!=y)
!= < subtractor subtractor
Use multiplexors for
x!=y
– 6:
5: x!=y 6: x<y 8: x-y 7: y-x
x_neq_
multiple sources x<y !(x<y) y
x_lt_y 9: d
Create unique identifier 7: y = y -x 8: x = x - y d_ld
❑
d_o
for each datapath
6-J:
–
component control input 5-J:
and output 9: d_o = x
1-J:
- 10 -
Creating the controller’s FSM
!1 go_i
1:
1 !(!go_i)
Controller
0000 1:
!1 ❑ Same structure as FSMD
2:
!go_i
0001 2:
1 !(!go_i)
❑ Replace complex
!go_i
2-J:
0010 2-J: actions/conditions with
datapath configurations
3: x = x_i x_sel = 0
0011 3: x_ld = 1
4: y = y_i
y_sel = 0 x_i y_i
0100 4: y_ld = 1
!(x!=y) Datapath
5: !x_neq_y
0101 5: x_sel
x!=y n-bit 2x1 n-bit 2x1
x_neq_y y_sel
6: 0110 6:
x_ld
x<y !(x<y) x_lt_y !x_lt_y 0: x 0: y
y_ld
7: y = y -x 8: x = x - y 7: y_sel = 1 8: x_sel =1
y_ld = 1 x_ld = 1
6-J: 0111 1000
!= < subtractor subtractor
1001 6-J:
5: x!=y 6: x<y 8: x-y 7: y-x
5-J: 1010 5-J: x_neq_
y
x_lt_y 9: d
9: d_o = x 1011 9: d_ld = 1
d_ld
1-J: 1100 1-J: d_o
- 11 -
Splitting into a controller and
datapath
go_i
Controller implementation model Controller !1
0000 1: x_i y_i
go_i
x_sel 1 !(!go_i) (b) Datapath
Combinational y_sel 0001 2:
logic !go_i x_sel
x_ld n-bit 2x1 n-bit 2x1
y_ld 0010 2-J: y_sel
x_neq_y x_sel = 0 x_ld
0011 3: x_ld = 1 0: x 0: y
x_lt_y y_ld
d_ld
y_sel = 0
0100 4: y_ld = 1
!= < subtractor subtractor
x_neq_y=0 5: x!=y 6: x<y 8: x-y 7: y-x
0101 5: x_neq_
Q3 Q2 Q1 Q0 x_neq_y=1 y
0110 6: x_lt_y 9: d
State register d_ld
x_lt_y=1 x_lt_y=0
I3 I2 I1 I0
7: y_sel = 1 8: x_sel =1 d_o
y_ld = 1 x_ld = 1
0111 1000
1001 6-J:
1010 5-J:
1011 9: d_ld = 1
1100 1-J:
- 12 -
Controller state table for the GCD
example
Inputs Outputs
Q3 Q2 Q1 Q0 x_ne x_lt_ go_i I3 I2 I1 I0 x_sel y_sel x_ld y_ld d_ld
q_y y
0 0 0 0 * * * 0 0 0 1 X X 0 0 0
0 0 0 1 * * 0 0 0 1 0 X X 0 0 0
0 0 0 1 * * 1 0 0 1 1 X X 0 0 0
0 0 1 0 * * * 0 0 0 1 X X 0 0 0
0 0 1 1 * * * 0 1 0 0 0 X 1 0 0
0 1 0 0 * * * 0 1 0 1 X 0 0 1 0
0 1 0 1 0 * * 1 0 1 1 X X 0 0 0
0 1 0 1 1 * * 0 1 1 0 X X 0 0 0
0 1 1 0 * 0 * 1 0 0 0 X X 0 0 0
0 1 1 0 * 1 * 0 1 1 1 X X 0 0 0
0 1 1 1 * * * 1 0 0 1 X 1 0 1 0
1 0 0 0 * * * 1 0 0 1 1 X 1 0 0
1 0 0 1 * * * 1 0 1 0 X X 0 0 0
1 0 1 0 * * * 0 1 0 1 X X 0 0 0
1 0 1 1 * * * 1 1 0 0 X X 0 0 1
1 1 0 0 * * * 0 0 0 0 X X 0 0 0
1 1 0 1 * * * 0 0 0 0 X X 0 0 0
1 1 1 0 * * * 0 0 0 0 X X 0 0 0
1 1 1 1 * * * 0 0 0 0 X X 0 0 0
- 13 -
Completing the GCD custom single-
purpose processor design
❑ We finished the datapath … …
❑ We have a state table for controller datapath
the next state and control
registers
logic
next-state
and
control
– All that’s left is logic
combinational logic design
❑ This is not an optimized state
register
functional
units
design, but we see the
basic steps
… …
a view inside the controller and datapath
- 14 -
Optimizing single-purpose processors
Optimization is the task of making design
metric values the best possible
original program
Optimization FSMD
opportunities datapath
FSM
15
Optimizing the original program
❑ Analyze program attributes and look for areas of possible
improvement
– number of computations
– size of variable
– time and space complexity
– operations used
• multiplication and division very expensive
- 16 -
Optimizing the original program
(cont’)
original program optimized program
0: int x, y; 0: int x, y, r;
1: while (1) { 1: while (1) {
2: while (!go_i); 2: while (!go_i);
3: x = x_i; // x must be the larger number
4: y = y_i; 3: if (x_i >= y_i) {
5: while (x != y) { 4: x=x_i;
replace the subtraction
6: if (x < y) 5: y=y_i;
operation(s) with modulo
7: y = y - x; }
operation in order to speed
else 6: else {
up program
8: x = x - y; 7: x=y_i;
} 8: y=x_i;
9: d_o = x; }
} 9: while (y != 0) {
10: r = x % y;
11: x = y;
12: y = r;
}
13: d_o = x;
}
GCD(42, 8) - 9 iterations to complete the loop GCD(42,8) - 3 iterations to complete the loop
x and y values evaluated as follows : (42, 8), (43, x and y values evaluated as follows: (42, 8),
8), (26,8), (18,8), (10, 8), (2,8), (2,6), (2,4), (2,2). (8,2), (2,0)
- 17 -
Optimizing the FSMD
❑ Areas of possible improvements
– merge states
• states with constants on transitions can be eliminated, transition
taken is already known
• states with independent operations can be merged
– separate states
• states which require complex operations (a*b*c*d) can be broken
into smaller states to reduce hardware size
– scheduling
- 18 -
Optimizing the FSMD (cont.)
int x, y; !1 optimized FSMD
original FSMD
1:
int x, y;
1 !(!go_i) eliminate state 1 – transitions have constant values 2:
2:
!go_i go_i !go_i
2-J: x = x_i
3: y = y_i
merge state 2 and state 2J – no loop operation in
3: x = x_i between them
5:
4: y = y_i x<y x>y
merge state 3 and state 4 – assignment operations are
independent of one another 7: y = y -x 8: x = x - y
5: !(x!=y)
x!=y
9: d_o = x
6: merge state 5 and state 6 – transitions from state 6
x<y !(x<y) can be done in state 5
y = y -x 8: x = x - y
7:
eliminate state 5J and 6J – transitions from each
6-J: state can be done from state 7 and state 8,
respectively
5-J:
eliminate state 1-J – transition from state 1-J can be
d_o = x done directly from state 9
9:
1-J:
- 19 -
Optimizing the datapath
b R1
R1 R2
0 1 mux1
0 1
mux2
Sharing of functional
units
Multiplier, 0
Controller • one-to-one mapping, as done
FSM previously, is not necessary
• if same operation occurs in
R1
we_r1
different states, they can
R2
we_r2 share a single functional unit
Multi-functional units
• ALUs support a variety of
op operations, it can be shared
among operations occurring in
different states
20
Optimizing the FSM
❑ State encoding
– task of assigning a unique bit pattern to each state in an FSM
– size of state register and combinational logic vary
– can be treated as an ordering problem
❑ State minimization
– task of merging equivalent states into a single state
• state equivalent if for all possible input combinations the two states
generate the same outputs and transitions to the next same state
21
A modern design flow
- 22 -
HLS: front-end
- 23 -
HLS: middle-end
- 24 -
HLS: back-end
- 25 -