FFT For Adaptive Implementation 16
Computing Systems
26-27 September 2016, Offenburg, Germany
Abstract—The Fast Fourier Transform (FFT) is a fundamental operation in signal processing. Modern high-speed signal processing technologies, such as 4G LTE, 5G and the Internet of Things (IoT), require a high-throughput implementation of the FFT. Moreover, the FFT size required by these technologies tends to vary with the operating mode. It is therefore highly desirable to have an FFT implementation that not only meets the throughput demand but also scales to a selectable N-point FFT. Our contribution in this paper is two-fold. First, we propose a novel split-radix 4/8 FFT algorithm that is comparable in efficiency to the radix-8 algorithm but considerably less complex in design for large FFT sizes. Second, we developed an automation tool that gives high-level implementation details of an adaptive N-point split-radix FFT processor.

Index Terms—DFT, FFT, IoT, split-radix, FFT butterfly, parametrized design

I. INTRODUCTION

The Discrete Fourier Transform (DFT) represents a discrete-time signal as a series of sinusoidal functions. For a complex-valued discrete-time sequence x[n] of N points, the DFT is defined by eq. 1.

\[ X[k] = \sum_{n=0}^{N-1} x[n] \, W_N^{kn}, \quad k = 0, 1, 2, \ldots, N-1 \tag{1} \]

where the phase factor is given by eq. 2.

\[ W_N^{kn} = e^{-j 2 \pi k n / N} \tag{2} \]

The function X[k] is the frequency-domain value of the discrete-time input x[n]. Direct evaluation of a single DFT point requires N complex multiplications and (N − 1) complex additions. Consequently, an N-point DFT requires $N^2$ complex multiplications and N × (N − 1) complex additions, so the computational complexity of the DFT is $O(N^2)$. For large N, direct computation of the DFT becomes too computationally intensive. A collection of algorithms known as the Fast Fourier Transform exploits the symmetry in DFT calculations to speed up the operation. The FFT algorithm, first implemented by Cooley and Tukey, exploited this symmetry to reduce the arithmetic complexity from $O(N^2)$ to $O(N \log N)$ operations [1].

Since then, the Fast Fourier Transform (FFT) has become ubiquitous in a wide range of engineering applications. It is extensively used in spectrum analysis, high-speed image processing, speech recognition, data compression, radar processing, OFDM systems and many other applications. High-speed technologies such as 4G LTE, 5G communication systems and the Internet of Things (IoT) are currently receiving wide attention, and they require a high-throughput FFT for their operation. The FFT size required to implement these applications also varies. For 3G, 4G LTE and WiMAX communication systems, the FFT size depends on the operating mode of the system and ranges from 64 points to 2048 points [2]. Similarly, 5G technology is expected to be introduced in 2020, and its FFT size is expected to be even larger than that of 4G LTE. In view of the variable FFT size, it is highly desirable to have a parametrized and adaptive FFT design that can be scaled to the operating requirements without the laborious effort of designing each N-point FFT configuration separately.

The basic FFT design is based on the radix-2 [3] butterfly block, which was proposed by Cooley and Tukey [1]. For more efficient implementation, higher-radix and split-radix algorithms such as radix-4 [4], radix-8 [5] and split-radix 2/4 [6], [7] have been employed. All conventional FFT algorithms decompose a size-N FFT into two parts, namely an odd half and an even half, and effectively reduce the number of multiplications by using the symmetry properties of the FFT. The split-radix algorithms, on the other hand, employ a combination of different radix FFTs for the decomposition. A split-radix algorithm exploits the best of both radix algorithms in terms of points catered and stages required for FFT computation
[Figure: radix-8 FFT butterfly with inputs x[n], x[n+N/8], x[n+N/4], x[n+3N/8], x[n+N/2], x[n+5N/8], x[n+6N/8], x[n+7N/8], outputs X[8r] to X[8r+7], and output twiddle factors W^n to W^7n]
signal reordering. The split-radix 2/4 butterfly is shown in figure 4.

[Figure: two-stage butterfly (Stage 1, Stage 2) with inputs x[n], x[n+N/4], x[n+N/2], x[n+3N/4], outputs labelled x[4r], x[4r+1], x[4r+2], x[4r+3], and twiddle factors W^n and W^3n]

Fig. 4. Split Radix-2/4 FFT Butterfly [9]

Other split-radix algorithms also exist, such as the split-radix 2/8 and split-radix 2/4/8 algorithms [8]. The split-radix 2/8 algorithm uses radix-2 decomposition for the even outputs and radix-8 decomposition for the odd outputs. It reduces the arithmetic complexity further, but the structure becomes extremely non-uniform because it contains not only radix-2 and radix-8 butterflies but other radix butterflies as well. In [8], a split-radix 2/4/8 algorithm based on split-radix 2/8 was developed in order to produce a more uniform structure.

III. PROPOSED APPROACH

For large FFT sizes, the lower-radix FFT algorithms incur a large latency cost because the computation requires many stages to complete. As a result, these algorithms fail to meet the throughput demand of high-speed applications. The higher-radix algorithms, on the other hand, compute the FFT efficiently, but their structure becomes too complex as the FFT size increases.

To bridge this gap, we propose a new split radix-4/8 FFT algorithm which decomposes an N-point FFT using radix-4 and radix-8 decomposition algorithms. The proposed split radix-4/8 algorithm recursively decomposes an N-point DFT sequence into two N/4-point DFT sub-sequences and four N/8-point DFT sub-sequences. The Decimation in Frequency (DIF) equations for the proposed algorithm are given in figure 5.

The even-indexed outputs are decomposed using the radix-4 algorithm and the odd-indexed outputs are decomposed using the radix-8 algorithm. The proposed butterfly is shown in figure 6.

[Figure: inputs x(0) to x(15) feeding Radix 4, Radix 4/8 and Radix 2 blocks; outputs in the order X(0), X(8), X(4), X(12), ..., X(3), X(11), X(7), X(15)]

Fig. 7. 16-Point Radix-4/8 FFT Architecture

The proposed split radix-4/8 algorithm operates on $N = 4^n$ point sequences and takes $\log_8 N$ stages to compute the FFT of the sequence. As a result, we get an algorithm which is more efficient (in number of stages) than the radix-4 FFT algorithm and less complex as
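\[
\begin{aligned}
X[8r+1] &= \sum_{n=0}^{\frac{N}{8}-1} \Big\{ \big[ \big(x[n] - x[n+\tfrac{N}{2}]\big) - j \big(x[n+\tfrac{N}{4}] - x[n+\tfrac{3N}{4}]\big) \big] \\
&\qquad + W_N^{N/8} \big[ \big(x[n+\tfrac{N}{8}] - x[n+\tfrac{5N}{8}]\big) - j \big(x[n+\tfrac{3N}{8}] - x[n+\tfrac{7N}{8}]\big) \big] \Big\} \, W_N^{8nr} \, W_N^{n} \\[4pt]
X[8r+3] &= \sum_{n=0}^{\frac{N}{8}-1} \Big\{ \big[ \big(x[n] - x[n+\tfrac{N}{2}]\big) + j \big(x[n+\tfrac{N}{4}] - x[n+\tfrac{3N}{4}]\big) \big] \\
&\qquad + W_N^{3N/8} \big[ \big(x[n+\tfrac{N}{8}] - x[n+\tfrac{5N}{8}]\big) + j \big(x[n+\tfrac{3N}{8}] - x[n+\tfrac{7N}{8}]\big) \big] \Big\} \, W_N^{8nr} \, W_N^{3n} \\[4pt]
X[8r+5] &= \sum_{n=0}^{\frac{N}{8}-1} \Big\{ \big[ \big(x[n] - x[n+\tfrac{N}{2}]\big) - j \big(x[n+\tfrac{N}{4}] - x[n+\tfrac{3N}{4}]\big) \big] \\
&\qquad - W_N^{N/8} \big[ \big(x[n+\tfrac{N}{8}] - x[n+\tfrac{5N}{8}]\big) - j \big(x[n+\tfrac{3N}{8}] - x[n+\tfrac{7N}{8}]\big) \big] \Big\} \, W_N^{8nr} \, W_N^{5n} \\[4pt]
X[8r+7] &= \sum_{n=0}^{\frac{N}{8}-1} \Big\{ \big[ \big(x[n] - x[n+\tfrac{N}{2}]\big) + j \big(x[n+\tfrac{N}{4}] - x[n+\tfrac{3N}{4}]\big) \big] \\
&\qquad - W_N^{3N/8} \big[ \big(x[n+\tfrac{N}{8}] - x[n+\tfrac{5N}{8}]\big) + j \big(x[n+\tfrac{3N}{8}] - x[n+\tfrac{7N}{8}]\big) \big] \Big\} \, W_N^{8nr} \, W_N^{7n} \\[4pt]
X[4r] &= \sum_{n=0}^{\frac{N}{4}-1} \Big( \big[ x[n] + x[n+\tfrac{N}{4}] \big] + \big[ x[n+\tfrac{N}{2}] + x[n+\tfrac{3N}{4}] \big] \Big) \, W_N^{4nr}
\end{aligned}
\]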
Fig. 5. Decimation in Frequency (DIF) equations for the proposed split Radix-4/8 FFT
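As a numerical cross-check of the Fig. 5 equations, the following minimal Python sketch evaluates one decomposition stage and compares each output family with a direct evaluation of eq. 1. It is an illustration under stated assumptions, not the authors' implementation: the function names `dft_direct` and `split_radix_48_stage` are hypothetical, NumPy is assumed to be available, and the sub-sequence DFTs are evaluated directly rather than recursively.

```python
import numpy as np


def dft_direct(x):
    """Direct O(N^2) evaluation of eq. 1 with W_N^{kn} = exp(-j*2*pi*k*n/N)."""
    x = np.asarray(x, dtype=complex)
    N = x.size
    idx = np.arange(N)
    W = np.exp(-2j * np.pi * np.outer(idx, idx) / N)
    return W @ x


def split_radix_48_stage(x):
    """One DIF stage following the Fig. 5 equations (illustrative sketch).

    Returns a dict keyed by (stride, offset): key (4, 0) holds X[4r] and
    key (8, p) holds X[8r+p] for p in {1, 3, 5, 7}.
    """
    x = np.asarray(x, dtype=complex)
    N = x.size
    if N % 8:
        raise ValueError("N must be a multiple of 8")

    # Radix-4 branch: X[4r] is the N/4-point DFT of the four-term sums.
    n4 = np.arange(N // 4)
    a = x[n4] + x[n4 + N // 4] + x[n4 + N // 2] + x[n4 + 3 * N // 4]
    out = {(4, 0): dft_direct(a)}

    # Radix-8 branch: s[m] = x[n + m*N/8] for n = 0 .. N/8 - 1.
    n8 = np.arange(N // 8)
    s = [x[n8 + m * N // 8] for m in range(8)]
    d04, d26 = s[0] - s[4], s[2] - s[6]
    d15, d37 = s[1] - s[5], s[3] - s[7]
    w1 = np.exp(-2j * np.pi / 8)      # W_N^{N/8}
    w3 = np.exp(-2j * np.pi * 3 / 8)  # W_N^{3N/8}
    b = {
        1: (d04 - 1j * d26) + w1 * (d15 - 1j * d37),
        3: (d04 + 1j * d26) + w3 * (d15 + 1j * d37),
        5: (d04 - 1j * d26) - w1 * (d15 - 1j * d37),
        7: (d04 + 1j * d26) - w3 * (d15 + 1j * d37),
    }
    for p, bp in b.items():
        twiddle = np.exp(-2j * np.pi * p * n8 / N)  # W_N^{pn}
        # W_N^{8nr} = W_{N/8}^{nr}, so the sum over n is an N/8-point DFT.
        out[(8, p)] = dft_direct(bp * twiddle)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(16) + 1j * rng.standard_normal(16)
    reference = dft_direct(x)
    for (stride, offset), values in split_radix_48_stage(x).items():
        ok = np.allclose(values, reference[offset::stride])
        print(f"X[{stride}r+{offset}] matches direct DFT: {ok}")
```

The sketch covers only the output families that appear in Fig. 5. In a full implementation each sub-sequence would be decomposed again by the same rule instead of being passed to a direct DFT.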
[Figure: split radix-4/8 butterfly for inputs x(0) to x(7). Radix-4 branch sums: x(0)+x(2)+x(4)+x(6) and x(1)+x(3)+x(5)+x(7). Radix-8 branch intermediate terms: x(0)-x(4), x(1)-x(5), -j[x(2)-x(6)], -j[x(3)-x(7)], (x(1)-x(5) - j[x(3)-x(7)]) * W^{N/8} and (x(1)-x(5) + j[x(3)-x(7)]) * W^{3N/8}. Labelled outputs:
{(x(0)-x(4) - j[x(2)-x(6)]) - (x(1)-x(5) - j[x(3)-x(7)]) * W^{N/8}} * W^{5n}
{(x(0)-x(4) + j[x(2)-x(6)]) + (x(1)-x(5) + j[x(3)-x(7)]) * W^{3N/8}} * W^{3n}
{(x(0)-x(4) + j[x(2)-x(6)]) - (x(1)-x(5) + j[x(3)-x(7)]) * W^{3N/8}} * W^{7n}]
TABLE II
COMPARATIVE ANALYSIS OF ADDITIONS AND NON-TRIVIAL REAL MULTIPLICATIONS
[Figure: bar chart] Implementation Complexity Comparison of Radix-4/8 vs. Other Radices

Algorithm     Total Real Multiplications     Total Real Additions
Radix-2       264                            1032
Radix-4       208                            976
Radix-2/4     196                            964
Radix-8       204                            972
Radix-4/8     192                            960
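To make the differences between the bars easier to read, the short Python sketch below expresses each algorithm's operation counts as a percentage reduction relative to radix-2. The counts are the values printed in the chart; pairing them with the radices in bar order is an assumption made for this illustration, and the helper name `reduction_percent` is hypothetical.

```python
# Operation counts as printed in the chart. Pairing each value with a radix
# follows the left-to-right bar order, which is an assumption.
COUNTS = {
    "Radix-2": {"mults": 264, "adds": 1032},
    "Radix-4": {"mults": 208, "adds": 976},
    "Radix-2/4": {"mults": 196, "adds": 964},
    "Radix-8": {"mults": 204, "adds": 972},
    "Radix-4/8": {"mults": 192, "adds": 960},
}


def reduction_percent(baseline, candidate, key):
    """Percentage reduction of `candidate` relative to `baseline` for `key`."""
    base = COUNTS[baseline][key]
    return 100.0 * (base - COUNTS[candidate][key]) / base


if __name__ == "__main__":
    for name in COUNTS:
        mults = reduction_percent("Radix-2", name, "mults")
        adds = reduction_percent("Radix-2", name, "adds")
        print(f"{name:10s} multiplications: {mults:5.1f}%  additions: {adds:5.1f}%")
```

Read this way, the gap between the algorithms lies mainly in the multiplication counts, while the addition counts differ by only a few percent.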
[Figure: FFT architecture built from Rad-4, Rad-2 and Radix 4/8 blocks; the last inputs x(60) to x(63) map to outputs X(15), X(31), X(47), X(63)]