{______________________________________________________________________________ FFTRAD4.ASM ADSP-21020 Radix 4 Fast Fourier Transform This routine performs a complex, radix 4 Fast Fourier Transform (FFT). The FFT length (N) must be a power of 4 and a minimum of 64 points. The real part of the input data is placed in DM and the complex part in PM. This data is destroyed during the course of the computation. The real and complex output of the FFT is placed in separate locations in DM. Since this routine takes care of all necessary address digit-reversals, the input and output data are in normal order. The digit reversal is accomplished by using a modified radix 4 butterfly throughout which swaps the inner two nodes resulting with bit reversed data. The digit reversal is completed by bit reversing the real data in the final stage and then bit reversing the imaginary so that it ends up in DM. For this routine to work correctly, the program "twidrad4.C" must be used to generate the special twiddle factor tables for this program. Author: Karl Schwarz & Raimund Meyer, Universitaet Erlangen Nuernberg Revision: 27-MAR-91, Ronnin Yee, Analog Devices, DSP div., (617) 461-3672 Calling Information: costwid table at DM : cosine length 3*N/4 sintwid table at PM : sine length 3*N/4 real input at DM : redata length N, normal order imag input at PM : imdata length N, normal order Results: real output at DM : refft length N, normal order imag output at DM : imfft length N, normal order (Note: Because the bit reversed addressing mode is used with the arrays refft and imfft, they must start at addresses that are integer multiples of the length (N) of the transform, (i.e. 0,N,2N,3N,...). This is accomplished by specifying two segments starting at those addresses in the architecture file and placing the variables alone in their respective segments. These addresses must also be reflected in the preprocessor variables ORE and OIM in bit reversed format.) Altered Registers: All I, M, L and R registers. Three levels of looping. Benchmarks: radix-4, complex with digit reversal FFT Length cycles time (ms @ 20MHz CLK) ---------- ------ --------------------- 64 920 0.046 256 4044 0.202 1024 19245 0.962 4096 90702 4.535 16384 419434 20.972 First Stage - 8 cycles per radix-4 butterfly Other Stages - 14 cycles per radix-4 butterfly (this is optimal!) Memory Usage: pm code = 192 words, pm data = 1.75*N words, dm data = 3.75*N words Assembler Preprocessor Variables: N Number of points in FFT. Must be a power of four, minimum of 64. STAGES Set to log4(N) or (log(N)/log(4)) OST = bitrev(32 bit N/2) ORE = bitrev(32 bit addr of output real in dm), addr is 0,N,2N,3N,... OIM = bitrev(32 bit addr of output imag. in dm), addr is 0,N,2N,3N,... _______________________________________________________________________________} { include Fast FIR header (specifies all preprocessor variables) } #include "ffir.h" .SEGMENT/DM dm_sram; .VAR cosine[3*N/4]="tc.dat"; {Cosine twiddle factors, from FFTTR4TBL program} .VAR redata[N]; { Input real data } .GLOBAL redata; .ENDSEG; .SEGMENT/DM dm_algn0; { this segment is an integer multiple of N } .VAR refft[N]; { Output real data } .GLOBAL refft; .ENDSEG; .SEGMENT/DM dm_algn1; { this segment is an integer multiple of N } .VAR imfft[N]; { Output imaginary data } .GLOBAL imfft; .ENDSEG; .SEGMENT/PM pm_sram; .VAR sine[3*N/4]="ts.dat"; { Sine twiddle factors, from FFTTR4TBL program} .VAR imdata[N]; { Input imaginary data } .GLOBAL imdata; .ENDSEG; .GLOBAL _fftrad4; .SEGMENT/PM pm_sram; _fftrad4: {_______first stage radix-4 butterfly without twiddles______} i0=redata; i1=redata+N/4; i2=redata+N/2; i3=redata+3*N/4; i4=i0; i5=i1; i6=i2; i7=i3; m0=1; m8=1; i8=imdata; i9=imdata+N/4; i10=imdata+N/2; i11=imdata+3*N/4; i12=i8; i13=i9; i14=i10; i15=i11; l0 = 0; l1 = l0; l2 = l0; l3 = l0; l4 = l0; l5 = l0; l6 = l0; l7 = l0; l8 = l0; l9 = l0; l10 = l0; l11 = l0; l12 = l0; l13 = l0; l14 = l0; l15 = l0; f0=dm(i0,m0), f1=pm(i8,m8); f2=dm(i2,m0), f3=pm(i10,m8); f0=f0+f2, f2=f0-f2, f4=dm(i1,m0), f5=pm(i9,m8); f1=f1+f3, f3=f1-f3, f6=dm(i3,m0), f7=pm(i11,m8); f4=f6+f4, f6=f6-f4; f5=f5+f7, f7=f5-f7; f8=f0+f4, f9=f0-f4; f10=f1+f5, f11=f1-f5; LCNTR=N/4, do fstage until LCE; { do N/4 simple radix-4 butterflies } f12=f2+f7, f13=f2-f7, f0=dm(i0,m0), f1=pm(i8,m8); f14=f3+f6, f15=f3-f6, f2=dm(i2,m0), f3=pm(i10,m8); f0=f0+f2, f2=f0-f2, f4=dm(i1,m0), f5=pm(i9,m8); f1=f1+f3, f3=f1-f3, f6=dm(i3,m0), f7=pm(i11,m8); f4=f6+f4, f6=f6-f4, dm(i4,m0)=f8, pm(i12,m8)=f10; f5=f5+f7, f7=f5-f7, dm(i5,m0)=f9, pm(i13,m8)=f11; f8=f0+f4, f9=f0-f4, dm(i6,m0)=f12, pm(i14,m8)=f14; fstage: f10=f1+f5, f11=f1-f5, dm(i7,m0)=f13, pm(i15,m8)=f15; {_____________Middle stages with radix-4 main butterfly___________________} { m0=1 and m8=1 is still preset } m1=-2; { reverse step for twiddles } m9=m1; m2=3; { forward step for twiddles } m10=m2; m5=4; { first there are 4 groups } r2=N/16; { with N/16 butterflies in each group } r3=N/16*3; { step to next group } LCNTR=STAGES-2, do mstage until LCE; { do STAGES-2 stages } i7=cosine; { first real twiddle } i15=sine; { first imag twiddle } r8=redata; r9=imdata; i0=r8; { upper real path } r10=r8+r2; i8=r9; { upper imaginary path } i1=r10; { second real input path } r10=r10+r2, i4=r10; { second real output path } i2=r10; { third real input path } r10=r10+r2, i5=r10; { third real output path } i3=r10; { fourth real input path } r10=r9+r2, i6=r10; { fourth real output path } i9=r10; { second imag input path } r10=r10+r2, i12=r10; { second imag output path } i10=r10; { third imag input path } r10=r10+r2, i13=r10; { third imag output path } i11=r10; { fourth imag input path } i14=r10; { fourth imag output path } m4=r3; m12=r3; r4=r3+1, m6=r2; m3=r4; r2=r2-1, m11=r4; m7=r2; LCNTR=m5, do mgroup until LCE; { do m5 groups } f0=dm(i7,m0), f5=pm(i9,m8); f8=f0*f5, f4=dm(i1,m0), f1=pm(i15,m8); f9=f0*f4; f12=f1*f5, f0=dm(i7,m0), f5=pm(i11,m8); f13=f1*f4, f12=f9+f12, f4=dm(i3,m0), f1=pm(i15,m8); f8=f0*f4, f2=f8-f13; f13=f1*f5; f9=f0*f5, f8=f8+f13, f0=dm(i7,m1), f5=pm(i10,m8); f13=f1*f4, f12=f8+f12, f14=f8-f12, f4=dm(i2,m0), f1=pm(i15,m9); f11=f0*f4; f13=f1*f5, f6=f9-f13; f9=f0*f5, f13=f11+f13, f11=dm(i0,0); f13=f1*f4, f8=f11+f13, f10=f11-f13; {___________Do m7 radix-4 butterflies___________} LCNTR=m7, do mr4bfly until LCE; f13=f9-f13, f4=dm(i1,m0), f5=pm(i9,m8); f2=f2+f6, f15=f2-f6, f0=dm(i7,m0), f1=pm(i15,m8); f8=f0*f4, f3=f8+f12, f7=f8-f12, f9=pm(i8,0); f12=f1*f5, f9=f9+f13, f11=f9-f13, f13=f2; f8=f0*f5, f12=f8+f12, f0=dm(i7,m0), f5=pm(i11,m8); f13=f1*f4, f9=f9+f13, f6=f9-f13, f4=dm(i3,m0), f1=pm(i15,m8); f8=f0*f4, f2=f8-f13, dm(i0,m0)=f3, pm(i8,m8)=f9; f13=f1*f5, f11=f11+f14, f7=f11-f14, dm(i4,m0)=f7, pm(i12,m8)=f6; f9=f0*f5, f8=f8+f13, f0=dm(i7,m1), f5=pm(i10,m8); f13=f1*f4, f12=f8+f12, f14=f8-f12, f4=dm(i2,m0), f1=pm(i15,m9); f11=f0*f4, f3=f10+f15, f8=f10-f15, pm(i13,m8)=f11; f13=f1*f5, f6=f9-f13, dm(i6,m0)=f8, pm(i14,m8)=f7; f9=f0*f5, f13=f11+f13, f11=dm(i0,0); mr4bfly: f13=f1*f4, f8=f11+f13, f10=f11-f13, dm(i5,m0)=f3; {___________End radix-4 butterfly_____________} { dummy for address update * * } f13=f9-f13, f0=dm(i7,m2), f1=pm(i15,m10); f2=f2+f6, f15=f2-f6, f0=dm(i1,m4), f1=pm(i9,m12); f3=f8+f12, f7=f8-f12, f9=pm(i8,0); f9=f9+f13, f11=f9-f13, f0=dm(i2,m4); f9=f9+f2, f6=f9-f2, f0=dm(i3,m4), f1=pm(i10,m12); dm(i0,m3)=f3, pm(i8,m11)=f9; f11=f11+f14, f7=f11-f14, dm(i4,m3)=f7, pm(i12,m11)=f6; f3=f10+f15, f8=f10-f15, pm(i13,m11)=f11; dm(i6,m3)=f8, pm(i14,m11)=f7; mgroup: dm(i5,m3)=f3, f1=pm(i11,m12); r3=m4; r1=m5; r2=m6; r3=ashift r3 by -2; { groupstep/4 } r1=ashift r1 by 2; { groups*4 } m5=r1; mstage: r2=ashift r2 by -2; { butterflies/4 } {____________________Last radix-4 stage__________________________________} { Includes bitreversal of the real data in dm } bit set mode1 BR0; { bitreversal in i0 } { with: m0=m8=1 preset } i4=redata; { input } i1=redata+1; i2=redata+2; i3=redata+3; i0=ORE; { real output array base must be an integer multiple of N } m2=OST; i7=cosine; i8=imdata; { input } i9=imdata+1; i10=imdata+2; i11=imdata+3; i12=imdata; { output } i15=sine; m1=4; m9=m1; f0=dm(i7,m0), f5=pm(i9,m9); f8=f0*f5, f4=dm(i1,m1), f1=pm(i15,m8); f9=f0*f4; f12=f1*f5, f0=dm(i7,m0), f5=pm(i11,m9); f13=f1*f4, f12=f9+f12, f4=dm(i3,m1), f1=pm(i15,m8); f8=f0*f4, f2=f8-f13; f13=f1*f5; f9=f0*f5, f8=f8+f13, f0=dm(i7,m0), f5=pm(i10,m9); f13=f1*f4, f12=f8+f12, f14=f8-f12, f4=dm(i2,m1), f1=pm(i15,m8); f11=f0*f4; f13=f1*f5, f6=f9-f13; f9=f0*f5, f13=f11+f13, f11=dm(i4,m1); f13=f1*f4, f8=f11+f13, f10=f11-f13; {________Do N/4-1 radix-4 butterflies_______} LCNTR=N/4-1, do lstage until LCE; f13=f9-f13, f4=dm(i1,m1), f5=pm(i9,m9); f2=f2+f6, f15=f2-f6, f0=dm(i7,m0), f1=pm(i15,m8); f8=f0*f4, f3=f8+f12, f7=f8-f12, f9=pm(i8,m9); f12=f1*f5, f9=f9+f13, f11=f9-f13, f13=f2; f8=f0*f5, f12=f8+f12, f0=dm(i7,m0), f5=pm(i11,m9); f13=f1*f4, f9=f9+f13, f6=f9-f13, f4=dm(i3,m1), f1=pm(i15,m8); f8=f0*f4, f2=f8-f13, dm(i0,m2)=f3, pm(i12,m8)=f9; f13=f1*f5, f11=f11+f14, f7=f11-f14, dm(i0,m2)=f7, pm(i12,m8)=f6; f9=f0*f5, f8=f8+f13, f0=dm(i7,m0), f5=pm(i10,m9); f13=f1*f4, f12=f8+f12, f14=f8-f12, f4=dm(i2,m1), f1=pm(i15,m8); f11=f0*f4, f3=f10+f15, f8=f10-f15, pm(i12,m8)=f11; f13=f1*f5, f6=f9-f13, dm(i0,m2)=f3, pm(i12,m8)=f7; f9=f0*f5, f13=f11+f13, f11=dm(i4,m1); lstage: f13=f1*f4, f8=f11+f13, f10=f11-f13, dm(i0,m2)=f8; f13=f9-f13; f2=f2+f6, f15=f2-f6; f3=f8+f12, f7=f8-f12, f9=pm(i8,m9); f9=f9+f13, f11=f9-f13, dm(i0,m2)=f3; f9=f9+f2, f6=f9-f2, dm(i0,m2)=f7; pm(i12,m8)=f9; f11=f11+f14, f7=f11-f14, pm(i12,m8)=f6; f3=f10+f15, f8=f10-f15, pm(i12,m8)=f11; dm(i0,m2)=f3, pm(i12,m8)=f7; dm(i0,m2)=f8; {________Do the bitreversal of the imaginary part from pm to dm___________} i8=imdata; i0=OIM; { image output array base must be an integer multiple of N } f0=pm(i8,m8); LCNTR=N-1, do pmbr until LCE; { do N-1 bitreversals } pmbr: dm(i0,m2)=f0, f0=pm(i8,m8); rts (db); dm(i0,m2)=f0; bit clr MODE1 BR0; { no bitreversal in i0 any more } .ENDSEG;