Efficient FFT Implementations

This document discusses efficient implementations of the Fast Fourier Transform (FFT), focusing on both iterative and parallel approaches. It explains the iterative FFT algorithm, which operates in O(n log n) time, and introduces the concept of butterfly operations for combining elements. Additionally, it describes a parallel FFT circuit that leverages the iterative algorithm's structure to perform computations concurrently, achieving a depth of O(log n).

Uploaded by

kunalrastogi13

DFT is therefore a special case of the chirp transform, obtained by taking $z = \omega_n$.

Show how to evaluate the chirp transform in time $O(n \lg n)$ for any complex
number $z$. (Hint: Use the equation

$$y_k = z^{k^2/2} \sum_{j=0}^{n-1} \left( a_j z^{j^2/2} \right) \left( z^{-(k-j)^2/2} \right)$$

to view the chirp transform as a convolution.)
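As a quick sanity check on the hinted equation (not the requested algorithm), the exponents can be expanded directly. This sketch assumes the chirp transform is defined as $y_k = \sum_{j=0}^{n-1} a_j z^{kj}$, which is consistent with the DFT being the special case $z = \omega_n$; that definition lies outside this excerpt.

$$\frac{k^2}{2} + \frac{j^2}{2} - \frac{(k-j)^2}{2} = \frac{k^2 + j^2 - (k^2 - 2kj + j^2)}{2} = kj$$

Under that assumption, $z^{k^2/2} \cdot z^{j^2/2} \cdot z^{-(k-j)^2/2} = z^{kj}$, so the right-hand side of the hint equals $\sum_{j=0}^{n-1} a_j z^{kj}$ term by term.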

30.3 Efficient FFT implementations

Since the practical applications of the DFT, such as signal processing, demand the
utmost speed, this section examines two efficient FFT implementations. First, we
shall examine an iterative version of the FFT algorithm that runs in $\Theta(n \lg n)$ time
but can have a lower constant hidden in the $\Theta$-notation than the recursive version
in Section 30.2. (Depending on the exact implementation, the recursive version
may use the hardware cache more efficiently.) Then, we shall use the insights that
led us to the iterative implementation to design an efficient parallel FFT circuit.

An iterative FFT implementation

We first note that the for loop of lines 10–13 of RECURSIVE-FFT involves
computing the value $\omega_n^k y_k^{[1]}$ twice. In compiler terminology, we call such a value a
common subexpression. We can change the loop to compute it only once, storing
it in a temporary variable $t$.

for k = 0 to n/2 − 1
    t = ω y_k^[1]
    y_k = y_k^[0] + t
    y_{k+(n/2)} = y_k^[0] − t
    ω = ω ω_n

The operation in this loop, multiplying the twiddle factor $\omega = \omega_n^k$ by $y_k^{[1]}$, storing
the product into $t$, and adding and subtracting $t$ from $y_k^{[0]}$, is known as a butterfly
operation and is shown schematically in Figure 30.3.
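The butterfly operation described above can be sketched as a small function. This is a minimal illustration, not part of the original text: the name `butterfly` and its argument names are assumptions chosen to mirror $y_k^{[0]}$, $y_k^{[1]}$, and the twiddle factor.

```python
import cmath


def butterfly(y0, y1, omega):
    """Return (y0 + omega*y1, y0 - omega*y1), computing the product once."""
    t = omega * y1  # the common subexpression, evaluated a single time
    return y0 + t, y0 - t


# Example with n = 2 and k = 0, so the twiddle factor is omega_2^0 = 1.
omega = cmath.exp(2j * cmath.pi * 0 / 2)
top, bottom = butterfly(3 + 0j, 1 + 0j, omega)
print(top, bottom)
```

Storing the product in `t` is the same common-subexpression saving that the rewritten loop makes.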

Figure 30.3 A butterfly operation. (a) The two input values $y_k^{[0]}$ and $y_k^{[1]}$ enter from the left,
the twiddle factor $\omega_n^k$ is multiplied by $y_k^{[1]}$, and the sum $y_k^{[0]} + \omega_n^k y_k^{[1]}$ and
difference $y_k^{[0]} - \omega_n^k y_k^{[1]}$ are output on the right. (b) A simplified
drawing of a butterfly operation. We will use this representation in a parallel FFT circuit.

We now show how to make the FFT algorithm iterative rather than recursive
in structure. In Figure 30.4, we have arranged the input vectors to the recursive
calls in an invocation of RECURSIVE-FFT in a tree structure, where the initial
call is for $n = 8$. The tree has one node for each call of the procedure, labeled
by the corresponding input vector. Each RECURSIVE-FFT invocation makes two
recursive calls, unless it has received a 1-element vector. The first call appears in
the left child, and the second call appears in the right child.

(a0,a1,a2,a3,a4,a5,a6,a7)

(a0,a2,a4,a6) (a1,a3,a5,a7)

(a0,a4) (a2,a6) (a1,a5) (a3,a7)

(a0) (a4) (a2) (a6) (a1) (a5) (a3) (a7)

Figure 30.4 The tree of input vectors to the recursive calls of the RECURSIVE-FFT procedure. The
initial invocation is for $n = 8$.

Looking at the tree, we observe that if we could arrange the elements of the
initial vector $a$ into the order in which they appear in the leaves, we could trace
the execution of the RECURSIVE-FFT procedure, but bottom up instead of top
down. First, we take the elements in pairs, compute the DFT of each pair using
one butterfly operation, and replace the pair with its DFT. The vector then holds
$n/2$ 2-element DFTs. Next, we take these $n/2$ DFTs in pairs and compute the
DFT of the four vector elements they come from by executing two butterfly
operations, replacing two 2-element DFTs with one 4-element DFT. The vector then
holds $n/4$ 4-element DFTs. We continue in this manner until the vector holds two
$(n/2)$-element DFTs, which we combine using $n/2$ butterfly operations into the
final $n$-element DFT.
To turn this bottom-up approach into code, we use an array $A[0 \,..\, n-1]$ that
initially holds the elements of the input vector $a$ in the order in which they appear
in the leaves of the tree of Figure 30.4. (We shall show later how to determine this
order, which is known as a bit-reversal permutation.) Because we have to combine
DFTs on each level of the tree, we introduce a variable $s$ to count the levels, ranging
from 1 (at the bottom, when we are combining pairs to form 2-element DFTs)
to $\lg n$ (at the top, when we are combining two $(n/2)$-element DFTs to produce the
final result). The algorithm therefore has the following structure:
1  for s = 1 to lg n
2      for k = 0 to n − 1 by 2^s
3          combine the two 2^(s−1)-element DFTs in
               A[k .. k + 2^(s−1) − 1] and A[k + 2^(s−1) .. k + 2^s − 1]
               into one 2^s-element DFT in A[k .. k + 2^s − 1]
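To make the index arithmetic of this structure concrete, the following sketch enumerates which subarrays are combined at each level. It is an illustration only: the generator name `combine_schedule` is hypothetical, and it assumes $n$ is a power of 2.

```python
def combine_schedule(n):
    """Yield (s, k, left, right) for each combine step of the bottom-up pass.

    left and right are inclusive (start, end) index ranges into A.
    Assumes n is a power of 2.
    """
    levels = n.bit_length() - 1  # lg n
    for s in range(1, levels + 1):
        half, size = 2 ** (s - 1), 2 ** s
        for k in range(0, n, size):
            yield s, k, (k, k + half - 1), (k + half, k + size - 1)


for s, k, left, right in combine_schedule(8):
    print(f"s={s} k={k}: A[{left[0]}..{left[1]}] with A[{right[0]}..{right[1]}]")
```

For $n = 8$ this lists four combines at level 1, two at level 2, and one at level 3, matching the levels of the tree in Figure 30.4 read from the leaves upward.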
We can express the body of the loop (line 3) as more precise pseudocode. We
copy the for loop from the RECURSIVE-FFT procedure, identifying $y^{[0]}$ with
$A[k \,..\, k + 2^{s-1} - 1]$ and $y^{[1]}$ with $A[k + 2^{s-1} \,..\, k + 2^s - 1]$. The twiddle
factor used in each butterfly operation depends on the value of $s$; it is a power of $\omega_m$,
where $m = 2^s$. (We introduce the variable $m$ solely for the sake of readability.)
We introduce another temporary variable $u$ that allows us to perform the butterfly
operation in place. When we replace line 3 of the overall structure by the loop
body, we get the following pseudocode, which forms the basis of the parallel
implementation we shall present later. The code first calls the auxiliary procedure
BIT-REVERSE-COPY($a$, $A$) to copy vector $a$ into array $A$ in the initial order in
which we need the values.
ITERATIVE-FFT(a)
 1  BIT-REVERSE-COPY(a, A)
 2  n = a.length                 // n is a power of 2
 3  for s = 1 to lg n
 4      m = 2^s
 5      ω_m = e^(2πi/m)
 6      for k = 0 to n − 1 by m
 7          ω = 1
 8          for j = 0 to m/2 − 1
 9              t = ω A[k + j + m/2]
10              u = A[k + j]
11              A[k + j] = u + t
12              A[k + j + m/2] = u − t
13              ω = ω ω_m
14  return A
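The pseudocode above translates almost line for line into Python. The following is a sketch under stated assumptions rather than a tuned implementation: the helper `bit_reverse` and the function name `iterative_fft` are illustrative names, the input length is assumed to be a power of 2, and the sign convention $\omega_m = e^{2\pi i/m}$ follows the text (many numerical libraries use the opposite sign for the forward transform).

```python
import cmath


def bit_reverse(k, bits):
    """Reverse the low `bits` bits of the integer k."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r


def iterative_fft(a):
    """Return the DFT of a, using omega_n = e^(2*pi*i/n) as in the text."""
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    bits = n.bit_length() - 1  # lg n
    A = [0j] * n
    for k in range(n):  # BIT-REVERSE-COPY
        A[bit_reverse(k, bits)] = complex(a[k])
    for s in range(1, bits + 1):
        m = 1 << s
        omega_m = cmath.exp(2j * cmath.pi / m)
        for k in range(0, n, m):
            omega = 1 + 0j
            for j in range(m // 2):
                t = omega * A[k + j + m // 2]  # butterfly, done in place
                u = A[k + j]
                A[k + j] = u + t
                A[k + j + m // 2] = u - t
                omega *= omega_m
    return A


print(iterative_fft([1, 0, 0, 0]))
```

Updating `omega` by repeated multiplication mirrors line 13 of the pseudocode. It accumulates floating-point rounding error as `m` grows, which is one reason practical codes often precompute twiddle factors instead.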
How does BIT-REVERSE-COPY get the elements of the input vector $a$ into the
desired order in the array $A$? The order in which the leaves appear in Figure 30.4
is a bit-reversal permutation. That is, if we let $\mathrm{rev}(k)$ be the $\lg n$-bit integer
formed by reversing the bits of the binary representation of $k$, then we want to
place vector element $a_k$ in array position $A[\mathrm{rev}(k)]$. In Figure 30.4, for example,
the leaves appear in the order 0, 4, 2, 6, 1, 5, 3, 7; this sequence in binary is
000, 100, 010, 110, 001, 101, 011, 111, and when we reverse the bits of each value
we get the sequence 000, 001, 010, 011, 100, 101, 110, 111. To see that we want a
bit-reversal permutation in general, we note that at the top level of the tree, indices
whose low-order bit is 0 go into the left subtree and indices whose low-order bit
is 1 go into the right subtree. Stripping off the low-order bit at each level, we
continue this process down the tree, until we get the order given by the bit-reversal
permutation at the leaves.
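The argument above can be checked on a small case. The sketch below builds the leaf order by repeatedly splitting into even and odd positions, as the recursion tree does, and compares it with explicit bit reversal. The function names `leaf_order` and `rev` are illustrative, and the string-based reversal is chosen for clarity rather than speed.

```python
def leaf_order(indices):
    """Leaves of the recursion tree: even positions go left, odd go right."""
    if len(indices) == 1:
        return list(indices)
    return leaf_order(indices[0::2]) + leaf_order(indices[1::2])


def rev(k, bits):
    """Reverse the bits-bit binary representation of k."""
    return int(format(k, f"0{bits}b")[::-1], 2)


n, bits = 8, 3
print(leaf_order(list(range(n))))        # [0, 4, 2, 6, 1, 5, 3, 7]
print([rev(k, bits) for k in range(n)])  # the same sequence
```

The two lists agree because bit reversal is its own inverse: the leaf in position $p$ holds $a_{\mathrm{rev}(p)}$, which is the same as placing $a_k$ in position $\mathrm{rev}(k)$.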
Since we can easily compute the function $\mathrm{rev}(k)$, the BIT-REVERSE-COPY
procedure is simple:

BIT-REVERSE-COPY(a, A)
    n = a.length
    for k = 0 to n − 1
        A[rev(k)] = a_k
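One possible table-driven variant of this procedure is sketched below. The recurrence used to fill the table is a common technique and an assumption of this example, not a method given in the text: $\mathrm{rev}(k)$ is obtained from $\mathrm{rev}(\lfloor k/2 \rfloor)$ by shifting right one place and moving the low bit of $k$ to the top. The names `bit_reversal_table` and `bit_reverse_copy` are illustrative.

```python
def bit_reversal_table(n):
    """Return a list t with t[k] = rev(k) for 0 <= k < n; n is a power of 2."""
    bits = n.bit_length() - 1  # lg n
    table = [0] * n
    for k in range(1, n):
        # Shift rev(k >> 1) right by one and place k's low bit at the top.
        table[k] = (table[k >> 1] >> 1) | ((k & 1) << (bits - 1))
    return table


def bit_reverse_copy(a):
    """Return a new list A with A[rev(k)] = a[k]."""
    n = len(a)
    table = bit_reversal_table(n)
    A = [None] * n
    for k in range(n):
        A[table[k]] = a[k]
    return A


print(bit_reverse_copy(list(range(8))))
```

Each table entry costs a constant number of operations, so building the table and copying both take time linear in $n$.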

The iterative FFT implementation runs in time $\Theta(n \lg n)$. The call to
BIT-REVERSE-COPY($a$, $A$) certainly runs in $O(n \lg n)$ time, since we iterate $n$ times
and can reverse an integer between 0 and $n - 1$, with $\lg n$ bits, in $O(\lg n)$ time.
(In practice, because we usually know the initial value of $n$ in advance, we would
probably code a table mapping $k$ to $\mathrm{rev}(k)$, making BIT-REVERSE-COPY run in
$\Theta(n)$ time with a low hidden constant. Alternatively, we could use the clever
amortized reverse binary counter scheme described in Problem 17-1.) To complete the
proof that ITERATIVE-FFT runs in time $\Theta(n \lg n)$, we show that $L(n)$, the number
of times the body of the innermost loop (lines 8–13) executes, is $\Theta(n \lg n)$. The
for loop of lines 6–13 iterates $n/m = n/2^s$ times for each value of $s$, and the
innermost loop of lines 8–13 iterates $m/2 = 2^{s-1}$ times. Thus,

$$L(n) = \sum_{s=1}^{\lg n} \frac{n}{2^s} \cdot 2^{s-1} = \sum_{s=1}^{\lg n} \frac{n}{2} = \Theta(n \lg n).$$

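The count $L(n)$ can also be confirmed empirically by running only the loop structure of ITERATIVE-FFT and counting executions of the innermost body. This is a small sketch with a hypothetical function name, assuming $n$ is a power of 2; the sum above evaluates to exactly $(n/2) \lg n$.

```python
def innermost_iterations(n):
    """Count executions of the innermost loop body of ITERATIVE-FFT."""
    count = 0
    lg_n = n.bit_length() - 1  # assumes n is a power of 2
    for s in range(1, lg_n + 1):
        m = 2 ** s
        for k in range(0, n, m):
            for j in range(m // 2):
                count += 1
    return count


for n in (2, 8, 1024):
    print(n, innermost_iterations(n), (n // 2) * (n.bit_length() - 1))
```

The last two columns printed for each $n$ should agree.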
[Figure 30.5: circuit diagram. Inputs $a_0, \ldots, a_7$ enter on the left and outputs $y_0, \ldots, y_7$
leave on the right. Stage $s = 1$ uses twiddle factor $\omega_2^0$; stage $s = 2$ uses $\omega_4^0, \omega_4^1$;
stage $s = 3$ uses $\omega_8^0, \omega_8^1, \omega_8^2, \omega_8^3$.]

Figure 30.5 A circuit that computes the FFT in parallel, here shown on $n = 8$ inputs. Each
butterfly operation takes as input the values on two wires, along with a twiddle factor, and it produces
as outputs the values on two wires. The stages of butterflies are labeled to correspond to iterations
of the outermost loop of the ITERATIVE-FFT procedure. Only the top and bottom wires passing
through a butterfly interact with it; wires that pass through the middle of a butterfly do not affect
that butterfly, nor are their values changed by that butterfly. For example, the top butterfly in stage 2
has nothing to do with wire 1 (the wire whose output is labeled $y_1$); its inputs and outputs are only
on wires 0 and 2 (labeled $y_0$ and $y_2$, respectively). This circuit has depth $\Theta(\lg n)$ and performs
$\Theta(n \lg n)$ butterfly operations altogether.

A parallel FFT circuit

We can exploit many of the properties that allowed us to implement an efficient
iterative FFT algorithm to produce an efficient parallel algorithm for the FFT. We
will express the parallel FFT algorithm as a circuit. Figure 30.5 shows a parallel
FFT circuit, which computes the FFT on $n$ inputs, for $n = 8$. The circuit begins
with a bit-reverse permutation of the inputs, followed by $\lg n$ stages, each stage
consisting of $n/2$ butterflies executed in parallel. The depth of the circuit (the
maximum number of computational elements between any output and any input
that can reach it) is therefore $\Theta(\lg n)$.
The leftmost part of the parallel FFT circuit performs the bit-reverse permutation,
and the remainder mimics the iterative ITERATIVE-FFT procedure. Because
each iteration of the outermost for loop performs $n/2$ independent butterfly
operations, the circuit performs them in parallel. The value of $s$ in each iteration within
ITERATIVE-FFT corresponds to a stage of butterflies shown in Figure 30.5. For
$s = 1, 2, \ldots, \lg n$, stage $s$ consists of $n/2^s$ groups of butterflies (corresponding to
each value of $k$ in ITERATIVE-FFT), with $2^{s-1}$ butterflies per group (corresponding
to each value of $j$ in ITERATIVE-FFT). The butterflies shown in Figure 30.5
correspond to the butterfly operations of the innermost loop (lines 9–12 of
ITERATIVE-FFT). Note also that the twiddle factors used in the butterflies correspond to those
used in ITERATIVE-FFT: in stage $s$, we use $\omega_m^0, \omega_m^1, \ldots, \omega_m^{m/2-1}$, where $m = 2^s$.
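The correspondence between stages, groups, and butterflies can be sketched by listing, for each stage, which pair of wires every butterfly touches and which power of $\omega_m$ it uses. This is an illustration only: `stage_butterflies` is a hypothetical helper, and the wire numbering follows the positions in array $A$ after the bit-reverse permutation.

```python
def stage_butterflies(n, s):
    """List (top_wire, bottom_wire, twiddle_exponent) for stage s on n wires.

    The twiddle factor of each butterfly is omega_m ** twiddle_exponent,
    where m = 2 ** s.
    """
    m = 2 ** s
    return [(k + j, k + j + m // 2, j)
            for k in range(0, n, m)   # one group per value of k
            for j in range(m // 2)]   # one butterfly per value of j


n = 8
for s in (1, 2, 3):
    stage = stage_butterflies(n, s)
    wires = [w for top, bottom, _ in stage for w in (top, bottom)]
    # Each wire is touched exactly once, so the n/2 butterflies are independent.
    assert sorted(wires) == list(range(n))
    print(f"stage {s}: {stage}")
```

Because no two butterflies in a stage share a wire, all $n/2$ of them can run at the same time, which is what gives the circuit one level of depth per stage.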

Exercises

30.3-1
Show how I TERATIVE -FFT computes the DFT of the input vector .0; 2; 3; 1; 4;
5; 7; 9/.

30.3-2
Show how to implement an FFT algorithm with the bit-reversal permutation occur-
ring at the end, rather than at the beginning, of the computation. (Hint: Consider
the inverse DFT.)

30.3-3
How many times does ITERATIVE-FFT compute twiddle factors in each stage?
Rewrite ITERATIVE-FFT to compute twiddle factors only $2^{s-1}$ times in stage $s$.

30.3-4 ★
Suppose that the adders within the butterfly operations of the FFT circuit
sometimes fail in such a manner that they always produce a zero output, independent
of their inputs. Suppose that exactly one adder has failed, but that you don't know
which one. Describe how you can identify the failed adder by supplying inputs to
the overall FFT circuit and observing the outputs. How efficient is your method?

Problems

30-1 Divide-and-conquer multiplication

a. Show how to multiply two linear polynomials $ax + b$ and $cx + d$ using only
three multiplications. (Hint: One of the multiplications is $(a + b) \cdot (c + d)$.)

b. Give two divide-and-conquer algorithms for multiplying two polynomials of
degree-bound $n$ in $\Theta(n^{\lg 3})$ time. The first algorithm should divide the input
polynomial coefficients into a high half and a low half, and the second algorithm
should divide them according to whether their index is odd or even.
