KEMBAR78
Efficient FFT Implementations | PDF | Discrete Fourier Transform | Fast Fourier Transform
0% found this document useful (0 votes)
15 views6 pages

Efficient FFT Implementations

This document discusses efficient implementations of the Fast Fourier Transform (FFT), focusing on both iterative and parallel approaches. It explains the iterative FFT algorithm, which operates in O(n log n) time, and introduces the concept of butterfly operations for combining elements. Additionally, it describes a parallel FFT circuit that leverages the iterative algorithm's structure to perform computations concurrently, achieving a depth of O(log n).

Uploaded by

kunalrastogi13
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
15 views6 pages

Efficient FFT Implementations

This document discusses efficient implementations of the Fast Fourier Transform (FFT), focusing on both iterative and parallel approaches. It explains the iterative FFT algorithm, which operates in O(n log n) time, and introduces the concept of butterfly operations for combining elements. Additionally, it describes a parallel FFT circuit that leverages the iterative algorithm's structure to perform computations concurrently, achieving a depth of O(log n).

Uploaded by

kunalrastogi13
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 6

30.

3 Efficient FFT implementations 915

DFT is therefore a special case of the chirp transform, obtained by taking ´ D !n .


Show how to evaluate the chirp transform in time O.n lg n/ for any complex num-
ber ´. (Hint: Use the equation
n1 
X  
2 =2 2 2
yk D ´k aj ´j =2 ´.kj / =2
j D0

to view the chirp transform as a convolution.)

30.3 Efficient FFT implementations

Since the practical applications of the DFT, such as signal processing, demand the
utmost speed, this section examines two efficient FFT implementations. First, we
shall examine an iterative version of the FFT algorithm that runs in ‚.n lg n/ time
but can have a lower constant hidden in the ‚-notation than the recursive version
in Section 30.2. (Depending on the exact implementation, the recursive version
may use the hardware cache more efficiently.) Then, we shall use the insights that
led us to the iterative implementation to design an efficient parallel FFT circuit.

An iterative FFT implementation


We first note that the for loop of lines 10–13 of R ECURSIVE -FFT involves com-
puting the value !nk ykŒ1 twice. In compiler terminology, we call such a value a
common subexpression. We can change the loop to compute it only once, storing
it in a temporary variable t.

for k D 0 to n=2  1
t D ! ykŒ1
yk D ykŒ0 C t
ykC.n=2/ D ykŒ0  t
! D ! !n

The operation in this loop, multiplying the twiddle factor ! D !nk by ykŒ1 , storing
the product into t, and adding and subtracting t from ykŒ0 , is known as a butterfly
operation and is shown schematically in Figure 30.3.
We now show how to make the FFT algorithm iterative rather than recursive
in structure. In Figure 30.4, we have arranged the input vectors to the recursive
calls in an invocation of R ECURSIVE -FFT in a tree structure, where the initial
call is for n D 8. The tree has one node for each call of the procedure, labeled
916 Chapter 30 Polynomials and the FFT

ykŒ0 + ykŒ0 C !nk ykŒ1 ykŒ0 ykŒ0 C !nk ykŒ1

!nk !nk

ykŒ1 • – ykŒ0  !nk ykŒ1 ykŒ1 ykŒ0  !nk ykŒ1


(a) (b)

Figure 30.3 A butterfly operation. (a) The two input values enter from the left, the twiddle fac-
tor !nk is multiplied by yk , and the sum and difference are output on the right. (b) A simplified
Œ1

drawing of a butterfly operation. We will use this representation in a parallel FFT circuit.

(a0,a1,a2,a3,a4,a5,a6,a7)

(a0,a2,a4,a6) (a1,a3,a5,a7)

(a0,a4) (a2,a6) (a1,a5) (a3,a7)

(a0) (a4) (a2) (a6) (a1) (a5) (a3) (a7)

Figure 30.4 The tree of input vectors to the recursive calls of the R ECURSIVE -FFT procedure. The
initial invocation is for n D 8.

by the corresponding input vector. Each R ECURSIVE -FFT invocation makes two
recursive calls, unless it has received a 1-element vector. The first call appears in
the left child, and the second call appears in the right child.
Looking at the tree, we observe that if we could arrange the elements of the
initial vector a into the order in which they appear in the leaves, we could trace
the execution of the R ECURSIVE -FFT procedure, but bottom up instead of top
down. First, we take the elements in pairs, compute the DFT of each pair using
one butterfly operation, and replace the pair with its DFT. The vector then holds
n=2 2-element DFTs. Next, we take these n=2 DFTs in pairs and compute the
DFT of the four vector elements they come from by executing two butterfly oper-
ations, replacing two 2-element DFTs with one 4-element DFT. The vector then
holds n=4 4-element DFTs. We continue in this manner until the vector holds two
.n=2/-element DFTs, which we combine using n=2 butterfly operations into the
final n-element DFT.
To turn this bottom-up approach into code, we use an array AŒ0 : : n  1� that
initially holds the elements of the input vector a in the order in which they appear
30.3 Efficient FFT implementations 917

in the leaves of the tree of Figure 30.4. (We shall show later how to determine this
order, which is known as a bit-reversal permutation.) Because we have to combine
DFTs on each level of the tree, we introduce a variable s to count the levels, ranging
from 1 (at the bottom, when we are combining pairs to form 2-element DFTs)
to lg n (at the top, when we are combining two .n=2/-element DFTs to produce the
final result). The algorithm therefore has the following structure:
1 for s D 1 to lg n
2 for k D 0 to n  1 by 2s
3 combine the two 2s1 -element DFTs in
AŒk : : k C 2s1  1� and AŒk C 2s1 : : k C 2s  1�
into one 2s -element DFT in AŒk : : k C 2s  1�
We can express the body of the loop (line 3) as more precise pseudocode. We
copy the for loop from the R ECURSIVE -FFT procedure, identifying y Œ0 with
AŒk : : k C 2s1  1� and y Œ1 with AŒk C 2s1 : : k C 2s  1�. The twiddle fac-
tor used in each butterfly operation depends on the value of s; it is a power of !m ,
where m D 2s . (We introduce the variable m solely for the sake of readability.)
We introduce another temporary variable u that allows us to perform the butterfly
operation in place. When we replace line 3 of the overall structure by the loop
body, we get the following pseudocode, which forms the basis of the parallel im-
plementation we shall present later. The code first calls the auxiliary procedure
B IT-R EVERSE -C OPY .a; A/ to copy vector a into array A in the initial order in
which we need the values.
I TERATIVE -FFT.a/
1 B IT-R EVERSE -C OPY .a; A/
2 n D a:length // n is a power of 2
3 for s D 1 to lg n
4 m D 2s
5 !m D e 2 i=m
6 for k D 0 to n  1 by m
7 ! D1
8 for j D 0 to m=2  1
9 t D ! AŒk C j C m=2�
10 u D AŒk C j �
11 AŒk C j � D u C t
12 AŒk C j C m=2� D u  t
13 ! D ! !m
14 return A
How does B IT-R EVERSE -C OPY get the elements of the input vector a into the
desired order in the array A? The order in which the leaves appear in Figure 30.4
918 Chapter 30 Polynomials and the FFT

is a bit-reversal permutation. That is, if we let rev.k/ be the lg n-bit integer


formed by reversing the bits of the binary representation of k, then we want to
place vector element ak in array position AŒrev.k/�. In Figure 30.4, for exam-
ple, the leaves appear in the order 0; 4; 2; 6; 1; 5; 3; 7; this sequence in binary is
000; 100; 010; 110; 001; 101; 011; 111, and when we reverse the bits of each value
we get the sequence 000; 001; 010; 011; 100; 101; 110; 111. To see that we want a
bit-reversal permutation in general, we note that at the top level of the tree, indices
whose low-order bit is 0 go into the left subtree and indices whose low-order bit
is 1 go into the right subtree. Stripping off the low-order bit at each level, we con-
tinue this process down the tree, until we get the order given by the bit-reversal
permutation at the leaves.
Since we can easily compute the function rev.k/, the B IT-R EVERSE -C OPY pro-
cedure is simple:

B IT-R EVERSE -C OPY .a; A/


1 n D a:length
2 for k D 0 to n  1
3 AŒrev.k/� D ak

The iterative FFT implementation runs in time ‚.n lg n/. The call to B IT-
R EVERSE -C OPY.a; A/ certainly runs in O.n lg n/ time, since we iterate n times
and can reverse an integer between 0 and n  1, with lg n bits, in O.lg n/ time.
(In practice, because we usually know the initial value of n in advance, we would
probably code a table mapping k to rev.k/, making B IT-R EVERSE -C OPY run in
‚.n/ time with a low hidden constant. Alternatively, we could use the clever amor-
tized reverse binary counter scheme described in Problem 17-1.) To complete the
proof that I TERATIVE -FFT runs in time ‚.n lg n/, we show that L.n/, the number
of times the body of the innermost loop (lines 8–13) executes, is ‚.n lg n/. The
for loop of lines 6–13 iterates n=m D n=2s times for each value of s, and the
innermost loop of lines 8–13 iterates m=2 D 2s1 times. Thus,
lg n
X n s1
L.n/ D 2
sD1
2s
lg n
X n
D
sD1
2
D ‚.n lg n/ :
30.3 Efficient FFT implementations 919

a0 y0
!20
a1 y1
!40
a2 y2
!20 !41
a3 y3
!80
a4 y4
!20 !81
a5 y5
!40 !82
a6 y6
!20 !41 !83
a7 y7
stage s D 1 stage s D 2 stage s D 3

Figure 30.5 A circuit that computes the FFT in parallel, here shown on n D 8 inputs. Each
butterfly operation takes as input the values on two wires, along with a twiddle factor, and it produces
as outputs the values on two wires. The stages of butterflies are labeled to correspond to iterations
of the outermost loop of the I TERATIVE -FFT procedure. Only the top and bottom wires passing
through a butterfly interact with it; wires that pass through the middle of a butterfly do not affect
that butterfly, nor are their values changed by that butterfly. For example, the top butterfly in stage 2
has nothing to do with wire 1 (the wire whose output is labeled y1 ); its inputs and outputs are only
on wires 0 and 2 (labeled y0 and y2 , respectively). This circuit has depth ‚.lg n/ and performs
‚.n lg n/ butterfly operations altogether.

A parallel FFT circuit


We can exploit many of the properties that allowed us to implement an efficient
iterative FFT algorithm to produce an efficient parallel algorithm for the FFT. We
will express the parallel FFT algorithm as a circuit. Figure 30.5 shows a parallel
FFT circuit, which computes the FFT on n inputs, for n D 8. The circuit begins
with a bit-reverse permutation of the inputs, followed by lg n stages, each stage
consisting of n=2 butterflies executed in parallel. The depth of the circuit—the
maximum number of computational elements between any output and any input
that can reach it—is therefore ‚.lg n/.
The leftmost part of the parallel FFT circuit performs the bit-reverse permuta-
tion, and the remainder mimics the iterative I TERATIVE -FFT procedure. Because
each iteration of the outermost for loop performs n=2 independent butterfly opera-
tions, the circuit performs them in parallel. The value of s in each iteration within
920 Chapter 30 Polynomials and the FFT

I TERATIVE -FFT corresponds to a stage of butterflies shown in Figure 30.5. For


s D 1; 2; : : : ; lg n, stage s consists of n=2s groups of butterflies (corresponding to
each value of k in I TERATIVE -FFT), with 2s1 butterflies per group (corresponding
to each value of j in I TERATIVE -FFT). The butterflies shown in Figure 30.5 corre-
spond to the butterfly operations of the innermost loop (lines 9–12 of I TERATIVE -
FFT). Note also that the twiddle factors used in the butterflies correspond to those
used in I TERATIVE -FFT: in stage s, we use !m 0 1
; !m m=21
; : : : ; !m , where m D 2s .

Exercises

30.3-1
Show how I TERATIVE -FFT computes the DFT of the input vector .0; 2; 3; 1; 4;
5; 7; 9/.

30.3-2
Show how to implement an FFT algorithm with the bit-reversal permutation occur-
ring at the end, rather than at the beginning, of the computation. (Hint: Consider
the inverse DFT.)

30.3-3
How many times does I TERATIVE -FFT compute twiddle factors in each stage?
Rewrite I TERATIVE -FFT to compute twiddle factors only 2s1 times in stage s.

30.3-4 ?
Suppose that the adders within the butterfly operations of the FFT circuit some-
times fail in such a manner that they always produce a zero output, independent
of their inputs. Suppose that exactly one adder has failed, but that you don’t know
which one. Describe how you can identify the failed adder by supplying inputs to
the overall FFT circuit and observing the outputs. How efficient is your method?

Problems

30-1 Divide-and-conquer multiplication


a. Show how to multiply two linear polynomials ax C b and cx C d using only
three multiplications. (Hint: One of the multiplications is .a C b/  .c C d /.)

b. Give two divide-and-conquer algorithms for multiplying two polynomials of


degree-bound n in ‚.nlg 3 / time. The first algorithm should divide the input
polynomial coefficients into a high half and a low half, and the second algorithm
should divide them according to whether their index is odd or even.

You might also like