ADSP-21020 FFT User Notes
----------------------------
by Ronnin Yee
August 21, 1991
There are currently six FFT routines in the 21020 assembly code runtime library:
Complex FFT
-----------
fftrad2.asm A radix-2 DIT FFT.
fftrad4.asm A radix-4 DIF FFT.
Real FFT
--------
rfft2.asm A radix-2 real FFT (RFFT).
irfft2.asm A radix-2 inverse real FFT (IRFFT).
rfft4.asm A radix-4 real FFT (RFFT).
irfft4.asm A radix-4 inverse real FFT (IRFFT).
In general, a radix-4 FFT will run faster than radix-2 FFT but will take up
more space and has more restrictions on the length of the FFT. Specifically,
all radix-2 FFT routines will take data lengths that are any power of two
(>= 32 points) while complex radix-4 routines will only take data lengths that
are a power of four (>= 64) and real radix-4 routines will only take data
lengths that are (a power of four)*2 >= 128.
Complex inverse FFTs are not provided since they are very easy to implement
with just a forward FFT. To implement a inverse FFT, one just needs to swap
the real and imaginary parts of the data, perform the forward FFT, and then
swap the real and imaginary parts of the result.
To ease the confusion of which data goes where for each of the routines, the
following table of variables and their location is presented, where "N" is the
length of the FFT:
Input Output
Routine DM PM DM PM
-------------------------------------------------------------------------------
fftrad2: redata[N] refft[N] imfft[N]
imdata[N]
sine[N/2] cosine[N/2]
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
fftrad4: imdata[N] redata[N] refft[N]
imfft[N]
cosine[3N/4] sine[3N/4]
--------------------------------------------------------------------------------
rfft2: real[N] refft[N/2+1] imfft[N/2+1]
sine[N/4] cosine[N/4]
h_sine[N/4] h_cosine[N/4]
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
rfft4: evreal[N/2] odreal[N/2] refft[N/2+1] imfft[N/2+1]
cosine[3N/8] sine[3N/8]
h_sine[N/4] h_cosine[N/4]
--------------------------------------------------------------------------------
irfft2: refft[N/2+1] imfft[N/2+1] evreal[N/2] odreal[N/2]
sine[N/4] cosine[N/4]
n_sine[N/4] n_cosine[N/4]
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
irfft4: refft[N/2+1] imfft[N/2+1] real[N]
cosine[3N/8] sine[3N/8]
n_sine[N/4] n_cosine[N/4]
--------------------------------------------------------------------------------
Notes: This array must start on an address that is a multiple of N.
This array must start on an address == multiple of N/2.
This output buffer can overlap with the input buffer in the
same memory space.
redata = real part of time domain data
imdata = imaginary part of time domain data
refft = real part of frequency domain data
imfft = imaginary part of frequency domain data
real = real time domain data
evreal = real time domain data, even indices only { =x(2n) }
odreal = real time domain data, odd indices only { =x(2n+1) }
All input and output data are in normal order since the routines handle the
necessary bit- and digit- reversals.
The strange symmetry of how data is shuffled around in radix-2 and radix-4
routines is a result of the differences in structure between the radix-2
routines and the radix-4 routines. In practical terms, the radix-2 routines
perform their bitreversal before the FFT and the radix-4 routines perform their
bitreversal after the FFT. This affects where the data should be placed for
optimal performance.
Space Saving Ideas and Other Table Talk
---------------------------------------
If space is an issue and multiple FFT routines are being used, one may get the
urge to share tables between routines and thus save space. He should consider
the following points:
1) The "sine" and "cosine" tables of the radix-2 and radix-4 are NOT compatible.
The radix-4 routines reads in its sines and cosines in a sort of a bit-
reversed address order while the radix-2 tables uses a normal ordering.
2) The "sine" and "cosine" tables of any radix-2 routine is compatible with the
"sine" and "cosine" table of any other radix-2 FFT routine of the same
length and type (real or complex). The tables of complex and real routines
are different because the complex do N point FFTs while the real actually
do N/2 FFTs. The same holds true among radix-4 routines.
3) "sine" and "cosine" radix-4 tables compiled for a length N FFT can also be
used for FFTs of length less than N. This is a result of the bit-reversed
-like ordering of the tables.
4) All "h_sine" and "h_cosine" tables are the same for the same length FFT.
This also holds true for "n_sine" and "n_cosine".
5) "n_cosine" and "n_sine" are the same as "h_cosine" and "h_sine" multiplied
by a factor of (2/N). If you wish to share these tables also and don't mind
the scaling, use the "h_cosine" and "h_sine" tables and change
"f2=(1/2*HN);" to "f2=0.5;" in the beginning of the conversion stage.
This will cause the output of this routine to be (ifft)*(N/2).
Things to do when your FFT won't work
-------------------------------------
0) Take a coffee break. A refreshed perspective can do wonders.
1) Re-check preprocessor variables. Some of these variables are computed
slightly differently for different routines. For instance, "STAGES" is
log2(N) in fftrad2, log4(N) in fftrad4 and log4(N/2) in rfft4!
2) Recompute bit-reversing in bit-reversed variables. This can be quite a
pain.
3) Are the arrays that the bit-reversed variables are pointing to on the
correct boundries? Note that currently this may require a trip to the
architecture file (see explaination in the routine).
4) Have you given the right file names to all the tables and incoming data?
For that matter, are they the right length and did you use the right
program to create them?
5) Make sure all your data is going into the right memory space. The
assembler will NOT flag an error if you define a variable "foo" in PM and
use it to access data in DM.
6) Did you remember to re-compile and re-link?
7) Repeat steps 1) and 2) very carefully.
8) Use the simulator to verify all of the above.
Hopefully, this will solve most of your FFT problems.
The Conversion Stage in the Real FFTs
-------------------------------------
For a more complete discussion of the algorithm we used for the real FFTs, read
"The Fast Fourier Transform" by E. Oran Brigham (Prentice Hall:New Jersey,
1974).
Given a 2N point sequence, x(n), and having taken the FFT of x(2n)+jx(2n+1) for
n=0,1,...,N-1, we can now compute:
Given FFT(x(2n)+jx(2n+1)) = A(k)+jF(k),
let X(k) = R(k) + jI(k),
let c(k) = cos(pi*k/N),
let s(k) = sin(pi*k/N),
2R(k) = A(k)+A(N-k) + c(k)(F(k)+F(N-k)) - s(k)(A(k)-A(N-k))
2I(k) = F(k)-F(N-k) - s(k)(F(k)+F(N-k)) - c(k)(A(k)-A(N-k))
This will give us X(k). Notice what happens when we let k'=N-k (k'-> k for
convenience):
2R(N-k) = A(k)+A(N-k) - c(k)(F(k)+F(N-k)) + s(k)(A(k)-A(N-k))
2I(N-k) = -F(k)+F(N-k) - s(k)(F(k)+F(N-k)) - c(k)(A(k)-A(N-k))
because c(N-k) = -c(k) and s(N-k) = s(k).
Since the data needed to compute X(k) is the same as the data needed to compute
X(N-k), we decided to compute them simultaneously during each iteration of the
conversion stage loop. So the algorithm does the conversion in k, N-k pairs
(except for the mid-point).
In order to calculate the inverse, we realize that we have four unknowns and
four equations. Thus it is a simple matter to derive:
2A(k) = R(k)+R(N-k) - s(k)(R(k)-R(N-k)) - c(k)(I(k)+I(N-k))
2F(k) = I(k)-I(N-k) + c(k)(R(k)-R(N-k)) - s(k)(I(k)+I(N-k))
and
2A(N-k) = R(k)+R(N-k) + s(k)(R(k)-R(N-k)) + c(k)(I(k)+I(N-k))
2F(N-k) = -I(k)+I(N-k) + c(k)(R(k)-R(N-k)) - s(k)(I(k)+I(N-k))
We can, therefore, calculate A(k)+jF(k) from X(k), run the inverse FFT and
regain x(n).