This full-text paper was peer-reviewed and accepted to be presented at the IEEE WiSPNET 2017 conference.
FPGA Implementation of Fast Running FIR Filters
S. Rengaprakash¹, M. Vignesh², N. Syed Anwar³, M. Pragadheesh⁴, E. Senthilkumar⁵, M. Sandhya⁶ and J. Manikandan⁷
¹,²,³,⁴ Department of EEE, Saranathan College of Engineering, Venkateswara Nagar, Trichy 620012, Tamil Nadu, INDIA
⁵,⁶,⁷ Department of ECE and Crucible of Research and Innovation (CORI), PES University, 100-Feet Ring Road, BSK Stage III, Bangalore 560085, Karnataka, INDIA
Email: ¹rsoundar555@gmail.com, ⁷manikandanj@pes.edu
Abstract: Finite Impulse Response (FIR) filters are predominantly used in applications pertaining to digital signal processing and wireless communication. Fast running FIR filters are designed using the concept of polyphase decomposition to provide a speed benefit. In this paper, a hardware implementation of a fast running FIR filter on a Virtex-5 FPGA is proposed and the design is compared with two different techniques: FIR filter design using the optimized transpose structure and FIR filter design using the Distributed Arithmetic (DA) approach. The fast running FIR filter designed in this paper is two times faster than the conventional FIR filter.

Index Terms: Digital filters, Distributed Arithmetic, Fast running filters, FIR, FPGA.

TABLE I
IMPULSE RESPONSE OF VARIOUS FIR FILTERS.

Filter type    Impulse response
LPF            $h_d(n) = \begin{cases} \dfrac{\sin[(n-M)\,\omega_c]}{\pi\,(n-M)}, & n \neq M \\ \dfrac{\omega_c}{\pi}, & n = M \end{cases}$
HPF            $h_d(n) = \begin{cases} -\dfrac{\sin[(n-M)\,\omega_c]}{\pi\,(n-M)}, & n \neq M \\ 1 - \dfrac{\omega_c}{\pi}, & n = M \end{cases}$
BPF            $h_d(n) = \begin{cases} \dfrac{\sin[(n-M)\,\omega_{c2}]}{\pi\,(n-M)} - \dfrac{\sin[(n-M)\,\omega_{c1}]}{\pi\,(n-M)}, & n \neq M \\ \dfrac{\omega_{c2} - \omega_{c1}}{\pi}, & n = M \end{cases}$
BSF            $h_d(n) = \begin{cases} \dfrac{\sin[(n-M)\,\omega_{c1}]}{\pi\,(n-M)} - \dfrac{\sin[(n-M)\,\omega_{c2}]}{\pi\,(n-M)}, & n \neq M \\ 1 - \dfrac{\omega_{c2} - \omega_{c1}}{\pi}, & n = M \end{cases}$
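The entries of Table I can be evaluated numerically. The following is a minimal sketch, not the authors' implementation: the function name `ideal_impulse_response` is hypothetical, the centre index is assumed to be M = (N − 1)/2 with odd N (the paper does not define M), and the Hamming window in the usage lines is only one possible choice of windowing function.

```python
import numpy as np

def ideal_impulse_response(kind, N, wc1, wc2=None):
    """Ideal impulse response h_d(n) of Table I for n = 0..N-1.

    kind is 'lpf', 'hpf', 'bpf' or 'bsf'; wc1 and wc2 are cut-off
    frequencies in rad/sample. ASSUMPTION: M = (N - 1) / 2 with odd N.
    """
    M = (N - 1) // 2
    k = np.arange(N) - M  # k = n - M

    def sinc_term(wc):
        # sin(k*wc) / (pi*k) for k != 0, and the limit wc/pi at k = 0
        safe_k = np.where(k == 0, 1, k)  # avoids division by zero
        return np.where(k == 0, wc / np.pi, np.sin(k * wc) / (np.pi * safe_k))

    delta = (k == 0).astype(float)  # unit impulse located at n = M
    if kind == "lpf":
        return sinc_term(wc1)
    if kind == "hpf":
        return delta - sinc_term(wc1)
    if kind == "bpf":
        return sinc_term(wc2) - sinc_term(wc1)
    if kind == "bsf":
        return delta - (sinc_term(wc2) - sinc_term(wc1))
    raise ValueError(f"unknown filter type: {kind}")

# Usage: a 13-tap low-pass response, then windowed coefficients.
# The Hamming window is an assumed example of a windowing function.
N = 13
hd = ideal_impulse_response("lpf", N, 0.4 * np.pi)
h = np.hamming(N) * hd
```

A quick consistency check on the table is that the LPF and HPF rows for the same cut-off sum to a unit impulse at n = M, as do the BPF and BSF rows.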
I. INTRODUCTION
Finite Impulse Response (FIR) filter design is an important topic in digital signal processing and wireless communication systems. FIR filters are used in various wireless applications such as software defined radios [1], satellite receivers [2], satellite payload filters [3], pre-processing for wireless sensor networks [4], 3G communication systems [5], RADAR applications [6] and many more. FIR filters have been implemented on various hardware platforms such as microcontrollers [7], DSPs [8], [9], FPGAs [10], [11], ASICs [5] and SoCs [12], [13]. FPGAs have come a long way from mere glue-logic operations [14] to controllers [15], system-on-chip based designs [16], [17] and reconfigurable systems [18], [19]. This motivated the implementation of the proposed filter design on an FPGA.

FIR filters are designed using various approaches. To list a few, the design of FIR filters using Distributed Arithmetic (DA) is reported in [10], using residue arithmetic in [20] and using particle swarm optimization in [21]. A fast-running FIR filter was first proposed in [22] and not much work has been carried out in this area. In this paper, the hardware implementation of a fast-running FIR filter on an FPGA device is proposed and the experimental results are reported, followed by a comparison between the proposed design and FIR filters designed using the optimized transpose structure and DA.

II. FIR FILTERS
In this section, the mathematical expressions for FIR filter design using the optimized transpose structure, distributed arithmetic and fast-running FIR filters are compared. The impulse responses of high-pass, low-pass, band-pass and band-stop filters are given in Table I.

A. Basic FIR Filter
The block diagram for FIR filter design using the optimized transpose structure is shown in Fig. 1 and the basic FIR filter equation is given as

$$y[n] = \sum_{k=0}^{N-1} h[k]\, x[n-k] \tag{1}$$

where $N$ is the FIR filter length, $x[n-k]$ is the $(n-k)$th instance of input data and $h[k]$ is the $k$th coefficient of the filter, computed as

$$h[n] = w[n] \cdot h_d[n] \tag{2}$$

where $w[n]$ and $h_d[n]$ are the windowing function and the impulse response of the filter, respectively.

Fig. 1. Optimized transpose structure of basic FIR filter.

B. FIR Filter Using Distributed Arithmetic
Distributed Arithmetic (DA) is an alternative and efficient technique for the multiply and accumulate (MAC) operation. DA
978-1-5090-4442-9/17/$31.00 ©2017 IEEE
technique reduces the MAC operation by distributing the inputs in bit-serial fashion. The FIR filter equation given in (1) is rewritten as

$$y[n] = h[0]\,x[n] + h[1]\,x[n-1] + h[2]\,x[n-2] + \dots + h[N-1]\,x[n-N+1] \tag{3}$$

and the input $x[n]$ at any instance is written as

$$x[n] = \sum_{b=0}^{B-1} x_b[n]\, 2^b \tag{4}$$

where $x_b[n] \in \{0, 1\}$ is the $b$th bit of input $x[n]$. The FIR filter equation is rewritten as

$$y = \sum_{n=0}^{N-1} h[n] \sum_{b=0}^{B-1} x_b[n]\, 2^b. \tag{5}$$

Rearranging (5) by grouping the sum of the products gives

$$\begin{aligned}
y ={}& h[0]\left(x_{B-1}[0]\,2^{B-1} + x_{B-2}[0]\,2^{B-2} + \dots + x_0[0]\,2^0\right) \\
&+ h[1]\left(x_{B-1}[1]\,2^{B-1} + x_{B-2}[1]\,2^{B-2} + \dots + x_0[1]\,2^0\right) \\
&\;\;\vdots \\
&+ h[N-1]\left(x_{B-1}[N-1]\,2^{B-1} + \dots + x_0[N-1]\,2^0\right)
\end{aligned} \tag{6}$$

$$\begin{aligned}
y ={}& \left(h[0]\,x_{B-1}[0] + \dots + h[N-1]\,x_{B-1}[N-1]\right) 2^{B-1} \\
&+ \left(h[0]\,x_{B-2}[0] + \dots + h[N-1]\,x_{B-2}[N-1]\right) 2^{B-2} \\
&\;\;\vdots \\
&+ \left(h[0]\,x_0[0] + h[1]\,x_0[1] + \dots + h[N-1]\,x_0[N-1]\right) 2^0.
\end{aligned} \tag{7}$$

Since the coefficients $h[n]$ are known constants and $x_b[n] \in \{0, 1\}$, the $h[n]\,x_b[n]$ multiplication in (7) is reduced to an addition of a set of $h[n]$ coefficients, based on the values of $x_b[n]$. These values are pre-computed for all the possible combinations of input bits and stored in a Look-up Table (LUT).

The block diagram of the FIR filter designed using DA with filter order 12 is shown in Fig. 2. The shift register unit retrieves bitwise information from every input sample $x[n]$ fed to the filter. The bitwise information obtained from the shift register unit acts as the LUT address, and the data read from the LUT are finally passed to the adder/shifter unit.

Fig. 2. Distributed arithmetic based FIR filter design.

C. Fast Running FIR Filter
Fast running FIR filter design is an outcome of polyphase decomposition, wherein the input signal $x[n]$ and filter $h[n]$ are decomposed into $R$ polyphase components. With $R = 2$, the polyphase decompositions of the input signal $X(z)$ and filter $H(z)$ into even and odd polyphase components are given as

$$X(z) = \sum_{n} x[n]\, z^{-n} = X_0(z^2) + z^{-1} X_1(z^2) \tag{8}$$

$$H(z) = \sum_{n} h[n]\, z^{-n} = H_0(z^2) + z^{-1} H_1(z^2). \tag{9}$$

The basic FIR filter equation in (1) can be rewritten in the time domain and the z-domain as

$$y[n] = x[n] * h[n] \tag{10}$$

$$Y(z) = X(z)\,H(z). \tag{11}$$

Substituting (8) and (9) in (11) yields

$$\begin{aligned}
Y(z) &= \left[X_0(z^2) + z^{-1} X_1(z^2)\right]\left[H_0(z^2) + z^{-1} H_1(z^2)\right] \\
&= X_0(z^2) H_0(z^2) + X_0(z^2)\, z^{-1} H_1(z^2) + z^{-1} X_1(z^2) H_0(z^2) + z^{-2} X_1(z^2) H_1(z^2) \\
&= X_0(z^2) H_0(z^2) + z^{-2} X_1(z^2) H_1(z^2) + z^{-1} X_1(z^2) H_0(z^2) + z^{-1} X_0(z^2) H_1(z^2) \\
&= Y_0(z^2) + z^{-1} Y_1(z^2)
\end{aligned} \tag{12}$$

where $Y_0(z)$ and $Y_1(z)$ are the polyphase components with

$$Y_0(z) = X_0(z)\,H_0(z) + z^{-1} X_1(z)\,H_1(z) \tag{13}$$

and

$$Y_1(z) = X_1(z)\,H_0(z) + X_0(z)\,H_1(z). \tag{14}$$

The architecture for fast-running FIR filter design using the above-mentioned equations is shown in Fig. 3.
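The three formulations of this section can be cross-checked numerically. The sketch below is illustrative only and is not the Xilinx System Generator design used in the paper: the function names are hypothetical, inputs are assumed to be unsigned B-bit integers with integer coefficients and zero initial conditions, and the fast-running version evaluates (13) and (14) literally with four sub-filter products, which may differ from the architecture of Fig. 3.

```python
import numpy as np

def fir_direct(h, x):
    """Eq. (1): y[n] = sum_k h[k] x[n-k], taking x[n] = 0 for n < 0."""
    y = np.zeros(len(x), dtype=np.int64)
    for n in range(len(x)):
        for k in range(len(h)):
            if n - k >= 0:
                y[n] += h[k] * x[n - k]
    return y

def fir_da(h, x, B):
    """Eq. (7): LUT-based distributed arithmetic, unsigned B-bit inputs."""
    N = len(h)
    # LUT[addr] holds the sum of h[k] over every tap k whose bit is set
    lut = np.zeros(2 ** N, dtype=np.int64)
    for addr in range(2 ** N):
        lut[addr] = sum(h[k] for k in range(N) if (addr >> k) & 1)
    y = np.zeros(len(x), dtype=np.int64)
    for n in range(len(x)):
        # window[k] = x[n-k], the N most recent input samples
        window = [int(x[n - k]) if n - k >= 0 else 0 for k in range(N)]
        acc = 0
        for b in range(B):
            # Bit b of every sample in the window forms the LUT address
            addr = 0
            for k in range(N):
                addr |= ((window[k] >> b) & 1) << k
            acc += int(lut[addr]) * (1 << b)  # weight by 2^b, then accumulate
        y[n] = acc
    return y

def fir_fast_2parallel(h, x):
    """Eqs. (13)-(14) with R = 2; len(x) even and len(h) >= 2 are assumed."""
    h0, h1 = h[0::2], h[1::2]  # even and odd coefficient phases
    x0, x1 = x[0::2], x[1::2]  # even and odd input phases
    L = len(x0)

    def conv(a, b):
        return np.convolve(a, b)[:L]  # keep the first L half-rate samples

    x1h1 = conv(x1, h1)
    # Y0 = X0 H0 + z^-1 X1 H1 (one half-rate delay on the second product)
    y0 = conv(x0, h0) + np.concatenate(([0], x1h1[:-1]))
    # Y1 = X1 H0 + X0 H1
    y1 = conv(x1, h0) + conv(x0, h1)
    y = np.empty(2 * L, dtype=np.int64)
    y[0::2], y[1::2] = y0, y1  # interleave even and odd output phases
    return y

# Usage: all three forms should give the same output sequence.
h = np.array([3, -1, 4, 2])
x = np.array([5, 0, 7, 255, 128, 1, 64, 9])
ref = fir_direct(h, x)
assert np.array_equal(fir_da(h, x, B=8), ref)
assert np.array_equal(fir_fast_2parallel(h, x), ref)
```

In the two-phase form every sub-filter works on half-rate sequences and two output samples are produced per half-rate step. The LUT in the DA form grows as 2^N, so this sketch is only practical for short filters.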
Fig. 3. Fast running FIR filter architecture.

III. FPGA IMPLEMENTATION OF THE FILTERS
The FIR filter designs using the various approaches discussed in Section II are implemented on a Virtex-5 FPGA based ML507 evaluation board. An FPGA can be programmed using VHDL, Verilog, SystemVerilog, SystemC, High Level Synthesis (HLS) or System Generator blocksets using MATLAB Simulink. The hardware implementation of the proposed work is carried out using Xilinx System Generator blocksets for FPGA, which is a model-based programming approach. The Xilinx models designed for the FIR filter using the optimized transpose structure, the FIR filter using the DA technique and the fast running FIR architecture are shown in Fig. 4(a)–(c).

(a) Model for optimized transpose structure based FIR filter.
(b) Model for distributed arithmetic based FIR filter.
(c) Model for fast-running FIR filter structure.
Fig. 4. FPGA model for FIR filter designs.

Fig. 5. Hardware setup for evaluating proposed work.
Fig. 6. Block diagram of the hardware setup established.

TABLE II
SIMULATION RESULTS OF FAST RUNNING FIR FILTER.
(Sim.: Simulink blocksets; Xil.: Xilinx System Generator blocksets)

Filter type →   LPF             HPF             BPF             BSF
                (fc = 2 kHz)    (fc = 3 kHz)    (fc1 = 2 kHz,   (fc1 = 3 kHz,
                                                fc2 = 6 kHz)    fc2 = 6 kHz)
fin ↓           Sim.    Xil.    Sim.    Xil.    Sim.    Xil.    Sim.    Xil.
1 kHz           4.57    4.51    0.71    0.55    0.82    0.77    4.00    4.13
2 kHz           3.19    3.04    1.65    1.37    2.4     2.42    2.90    2.75
3 kHz           1.38    1.32    2.86    2.75    3.87    3.85    1.62    1.82
4 kHz           0.17    0.17    4.67    4.81    4.26    4.40    0.40    1.65
5 kHz           0.50    0.44    5.50    5.50    3.60    3.58    0.90    2.20
6 kHz           0.44    0.39    5.72    5.50    1.90    2.20    2.30    3.63
7 kHz           0.33    0.28    5.50    5.50    0.97    0.99    4.00    5.50
8 kHz           0.28    0.28    5.33    5.22    0.53    0.49    5.40    6.38
9 kHz           0.50    0.07    5.28    5.28    0.20    0.28    6.30    6.20
10 kHz          0.33    0.39    5.39    5.39    0.88    1.10    6.40    5.90

TABLE III
HARDWARE RESULTS OF FAST RUNNING FIR FILTER.

fin (kHz)   LPF     HPF     BPF     BSF
1           4.56    0.80    1.04    4.40
2           3.20    1.68    2.60    3.60
3           1.52    5.60    4.24    2.00
4           0.48    5.36    4.60    2.30
5           0.72    5.20    4.08    2.30
6           0.80    5.76    2.48    4.10
7           0.64    5.60    1.28    5.52
8           0.64    5.44    1.04    5.60
9           0.48    5.92    0.64    5.52
10          0.72    5.84    1.28    5.44

IV. EXPERIMENTAL RESULTS
The hardware setup established and its block diagram for testing the proposed work are shown in Figs. 5 and 6, respectively. DIP switches on the FPGA board are used to assess the
performance of the filter designed by varying the filter cut-off frequencies and the input signal frequency.

The simulation results of the proposed fast-running FIR filter designed using Simulink blocksets and Xilinx System Generator blocksets are reported in Table II for all four types of filters. The hardware results of the fast-running FIR filter implemented on the FPGA for the same cut-off frequencies used for simulation are reported in Table III. It is observed from Tables II and III that the hardware results of the fast running FIR filter are on par with the simulation results. Satisfactory performance is also observed on testing the filter for different cut-off frequencies.

The hardware resources utilized to design fast-running FIR filters are compared in Table IV with the resources utilized to design FIR filters using the optimized transpose structure and the Distributed Arithmetic based approach. It is observed that the DA based approach consumes the fewest slice registers and slice LUTs, followed by the fast running FIR filter and the optimized transpose structure based FIR filter. The DA based approach also uses the fewest DSP48Es (2 in every case) but the most block memory (98 of the 148 available). Details about the power consumption for each design are obtained using Xilinx XPower Analyzer and the results are reported in Table V.

TABLE IV
COMPARISON OF HARDWARE RESOURCES UTILIZED BY VARIOUS FIR FILTER DESIGN APPROACHES.

Filter type →               LPF                   HPF                   BPF                   BSF
Resources ↓      Available  Opt. Tr.  DA    Fast  Opt. Tr.  DA    Fast  Opt. Tr.  DA    Fast  Opt. Tr.  DA    Fast
Slice registers  44800      4147      322   3013  4147      322   2386  8152      322   4571  8352      322   4545
Slice LUTs       44800      4739      1005  2571  4754      1005  3332  9250      1005  5668  9524      1005  5837
Block memory     148        0         98    10    0         98    16    0         98    16    0         98    16
DSP48Es          128        39        2     64    44        2     53    77        2     71    81        2     77
I/Os             640        23        24    21    23        24    21    27        24    21    27        24    21

TABLE V
POWER CONSUMED BY VARIOUS FIR FILTER DESIGNS.

Filter type ↓   Optimized Transpose   Distributed Arithmetic   Fast Running
LPF             1.444 W               1.635 W                  1.432 W
HPF             1.446 W               1.631 W                  1.432 W
BPF             1.473 W               1.632 W                  1.432 W
BSF             1.464 W               1.637 W                  1.432 W

V. CONCLUSION
The design and implementation of a fast running FIR filter on a Xilinx Virtex-5 FPGA is reported and its performance is compared with the optimized transpose structure based and Distributed Arithmetic based FIR filter approaches. The proposed filter is two times faster than the conventional FIR filter. The proposed system can be readily employed in various wireless communication applications for swift processing.

REFERENCES
[1] F. Harris, “Fixed length FIR filters with continuously variable bandwidth,” in 2009 1st International Conference on Wireless Communication, Vehicular Technology, Information Theory and Aerospace & Electronic Systems Technology, Aalborg, 2009, pp. 931–935.
[2] J. J. Patel, K. R. Parmar, and H. N. Mewada, “Design of FIR filter for burst mode demodulator of satellite Receiver,” in 2016 International Conference on Communication and Signal Processing (ICCSP), Melmaruvathur, 2016, pp. 0686–0690.
[3] A. Goel and A. Gupta, “Design of satellite payload filter emulator using hamming window,” in 2014 International Conference on Medical Imaging, m-Health and Emerging Communication Systems (MedCom), Greater Noida, 2014, pp. 202–205.
[4] Cheng Xu, Su Yin, Yunchuan Qin, and Hanzheng Zou, “A novel hardware efficient FIR filter for wireless sensor networks,” in 2013 Fifth International Conference on Ubiquitous and Future Networks (ICUFN), Da Nang, 2013, pp. 197–201.
[5] M. Ojail, S. Chevobbe, R. David, and D. Demigny, “A frequency domain FIR filter implementation method for 3G communication systems,” in 2008 The Third International Conference on Digital Telecommunications (icdt 2008), Bucharest, 2008, pp. 1–5.
[6] M. Lavanya and A. Kalaiselvi, “High speed FIR adaptive filter for RADAR applications,” in 2016 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), Chennai, 2016, pp. 2118–2122.
[7] S. Pujari, A. Yeotkar, V. Shingare, S. Momin, and B. Kokare, “Performance analysis of microcontroller and FPGA based Signal Processing a case study on FIR filter design and implementation,” in 2015 International Conference on Industrial Instrumentation and Control (ICIC), Pune, 2015, pp. 252–257.
[8] S. Vityazev, A. Kharin, A. Kalinkin, and V. Vityazev, “Parallel form of FIR filter for implementation on multicore DSP,” in 2014 3rd Mediterranean Conference on Embedded Computing (MECO), Budva, Montenegro, 2014, pp. 177–179.
[9] C. L. Hu, “Design and verification of FIR filter based on Matlab and DSP,” in 2012 International Conference on Image Analysis and Signal Processing, Hangzhou, 2012, pp. 1–4.
[10] P. Longa and A. Miri, “Area-efficient FIR filter design on FPGAs using distributed arithmetic,” in 2006 IEEE International Symposium on Signal Processing and Information Technology, Vancouver, BC, 2006, pp. 248–252.
[11] E. Ozpolat, B. Karakaya, T. Kaya, and A. Gulten, “FPGA-based digital Filter Design for Biomedical Signal,” in 2016 XII International Conference on Perspective Technologies and Methods in MEMS Design (MEMSTECH), Lviv, 2016, pp. 70–73.
[12] I. Steiner and G. A. Jullien, “A Fault-Tolerant Complex FIR Filter for SoC Communication Technologies,” in 2007 Conference Record of the Forty-First Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, 2007, pp. 2009–2013.
[13] A. T. Erdogan and T. Arslan, “High throughput FIR filter design for low power SoC applications,” in Proceedings of 13th Annual IEEE International ASIC/SOC Conference (Cat. No.00TH8541), Arlington, VA, 2000, pp. 374–378.
[14] K. J. Raut and S. S. Shiriramwar, “MP3 Portable Player System-Level Glue Logic on FPGA,” in 2007 IEEE International Conference on Microelectronic Systems Education (MSE’07), San Diego, CA, 2007, pp. 97–98.
[15] J. Manikandan, M. Jayaraman, and M. Jayachandran, “Design of an FPGA-based electronics flow regulator for spacecraft propulsion system,” Advances in Space Research, vol. 47, no. 3, pp. 488–495, Feb. 2011.
[16] A. Fridman and S. Semenov, “System-on-Chip FPGA-based GNSS receiver,” in Design & Test Symposium, 2013 East-West, Rostov-on-Don, 2013, pp. 1–7.
[17] J. Manikandan and B. Venkataramani, “System-on-programmable-chip implementation of diminishing learning based pattern recognition system,” International Journal of Machine Learning and Cybernetics, vol. 4, no. 4, pp. 347–363, Aug. 2013.
[18] N. Montealegre, D. Merodio, A. Fernández, and P. Armbruster, “In-flight reconfigurable FPGA-based space systems,” in Adaptive Hardware and Systems (AHS), 2015 NASA/ESA Conference on, Montreal, QC, 2015, pp. 1–8.
[19] J. Manikandan, S. Shruthi, S. J. Mangala, and V. K. Agrawal, “Design and implementation of reconfigurable coders for communication systems,” in Proc. Int. Conf. on VLSI Systems, Architectures, Technology and Applications (VLSI-SATA), Bengaluru, India, Jan. 2016, pp. 1–5.
[20] G. Loonawat and R. E. Siferd, “FPGA implementation of a FIR filter using residue arithmetic,” in Proceedings of the IEEE 1996 National Aerospace and Electronics Conference NAECON 1996, Dayton, OH, 1996, vol. 1, pp. 286–290.
[21] S. Mukherjee, R. Kar, D. Mandal, S. Mondal, and S. P. Ghoshal, “Linear phase low pass FIR filter design using Improved Particle Swarm Optimization,” in 2011 IEEE Student Conference on Research and Development, Cyberjaya, 2011, pp. 358–363.
[22] U. Meyer-Baese, Digital signal processing with field programmable gate arrays, 2nd ed., Berlin Heidelberg: Springer-Verlag, 2004.