{______________________________________________________________________________ RFFT4.ASM ADSP-21020 Real Radix 4 Fast Fourier Transform This routine performs a real radix 4 Fast Fourier Transform (RFFT). The RFFT length must be a (power of 4)*2 and a minimum of 128 points. The input data must be placed in both DM and PM: the even indexed points in DM and the odd indexed points in PM. This data is destroyed during the course of the computation. The real and complex output of the RFFT is placed in separate locations in DM. The RFFT is accomplished by doing a N/2 (HN) complex FFT of y(n), where Re[y(n)] = x(2n) and Im[y(n)] = x(2n+1) for n=0,1,...,N/2-1. This is followed by some calculation to derive X(k) from Y(k) in the conversion stage. This method has been refered to as "Packing". Note that only (N/2+1) elements are necessary to define the real FFT spectra because of its Hermitian (complex conjugate) symmetry. Since this routine takes care of all necessary address digit-reversals, the input and output data are in normal order. The digit reversal is accomplished by using a modified radix 4 butterfly throughout which swaps the inner two nodes resulting with bit reversed data. Digit reversal is completed by the address bit-reversals during and after the last butterfly stage. For this routine to work correctly, the program "rfft4tbl.c" must be used to generate the special twiddle factor tables for this program. Based on: FFTRAD4 by Karl Schwarz & Raimund Meyer, Universitaet Erlangen Nuernberg Author: 25-APR-91, Ronnin Yee, Analog Devices, DSP div., (617) 461-3672 Calling Information: costwid table at DM : cosine length 3*N/4 sintwid table at PM : sine length 3*N/4 coscnv table at PM : h_cosine length N/2 sincnv table at DM : h_sine length N/2 real even indexed input at DM : evreal length N/2, normal order real odd indexed input at PM : odreal length N/2, normal order Results: real output at DM : refft length N/2+1, normal order imag output at PM : imfft length N/2+1, normal order (Note: Because the bit reversed addressing mode is used with the arrays refft and imfft, they must start at addresses that are integer multiples of the length (HN) of the transform, (i.e. 0,HN,2HN,3HN,...). This is accomplished by specifying two segments starting at those addresses in the architecture file and placing the variables alone in their respective segments. These addresses must also be reflected in the preprocessor variables ORE and OIM in bit reversed format.) Altered Registers: All I, M, L and R registers. Three levels of looping. Benchmarks: radix-4, real with digit reversal time (msec) FFT Length cycles 20MHz 25MHz ---------- ------ --------- -------- 128 1144 .057 .046 512 4969 .248 .199 2048 22858 1.14 .914 8196 105076 5.25 4.20 First Stage - 8 cycles per radix-4 butterfly Other Stages - 14 cycles per radix-4 butterfly (this is optimal!) Memory Usage: pm code = 228 words, pm data = 1.75*N+1 words, dm data = 2.75*N+1 words (Note: to lower pm data requirements, overlap "odreal" and "imfft") Assembler Preprocessor Variables: N Number of points in FFT. Must be a power of four, minimum of 64. STAGES Set to log4(N) or (log(N)/log(4)) OST = bitrev(32 bit N/2) IRE = bitrev(32 bit addr of input real in dm), addr is 0,HN,2HN,3HN,... ORE = bitrev(32 bit addr of output real in dm), addr is 0,HN,2HN,3HN,... _______________________________________________________________________________} #define N 128 #define HN N/2 #define STAGES 3 #define OSTH 0x04000000 #define ORE 0x00000000 #define IRE 0x00020000 { include for symbolic definition of system regster bits } #include "def21020.h" .SEGMENT/DM dm_data; .VAR cosine[3*HN/4]="tc4.dat"; {Cosine twiddle factors, from RFFTTR4TBL program} .VAR h_sine[HN/2]="ths.dat"; { Sine conversion coeff, from RFFTTR4TBL program} .ENDSEG; .SEGMENT/DM dm_rdat; { this segment is an integer multiple of HN } .VAR refft[HN+1]; { Output real data } .GLOBAL refft; .ENDSEG; .SEGMENT/DM dm_idat; { this segment is an integer multiple of HN } .VAR evreal[HN]="evreal.dat"; { Input even real data } .GLOBAL evreal; .ENDSEG; .SEGMENT/PM pm_data; .VAR odreal[HN]= "odreal.dat"; { Input odd imaginary data } .VAR imfft[HN+1]; { Output imaginary data } .VAR sine[3*HN/4]="ts4.dat"; { Sine twiddle factors, from RFFTTR4TBL program} .VAR h_cosine[HN/2]="thc.dat"; {Cosine conversion coeff, from RFFTTR4TBL program} .GLOBAL odreal; .GLOBAL imfft; .ENDSEG; .SEGMENT/PM rst_svc; { program starts at the reset vector } pmwait=0x0021; {pgsz=0,pmwtstates=0,intrn.wtstates only} dmwait=0x008421; {pgsz=0,dmwtstates=0,intrn.wtstates only} call rfft4; stop: idle; nop; nop; .ENDSEG; .SEGMENT/PM pm_code; rfft4: {_______first stage radix-4 butterfly without twiddles______} i0=evreal; i1=evreal+HN/4; i2=evreal+HN/2; i3=evreal+3*HN/4; i4=i0; i5=i1; i6=i2; i7=i3; m0=1; m8=1; i8=odreal; i9=odreal+HN/4; i10=odreal+HN/2; i11=odreal+3*HN/4; i12=i8; i13=i9; i14=i10; i15=i11; l0 = 0; l1 = l0; l2 = l0; l3 = l0; l4 = l0; l5 = l0; l6 = l0; l7 = l0; l8 = l0; l9 = l0; l10 = l0; l11 = l0; l12 = l0; l13 = l0; l14 = l0; l15 = l0; f0=dm(i0,m0), f1=pm(i8,m8); f2=dm(i2,m0), f3=pm(i10,m8); f0=f0+f2, f2=f0-f2, f4=dm(i1,m0), f5=pm(i9,m8); f1=f1+f3, f3=f1-f3, f6=dm(i3,m0), f7=pm(i11,m8); f4=f6+f4, f6=f6-f4; f5=f5+f7, f7=f5-f7; f8=f0+f4, f9=f0-f4; f10=f1+f5, f11=f1-f5; LCNTR=HN/4, do fstage until LCE; { do HN/4 simple radix-4 butterflies } f12=f2+f7, f13=f2-f7, f0=dm(i0,m0), f1=pm(i8,m8); f14=f3+f6, f15=f3-f6, f2=dm(i2,m0), f3=pm(i10,m8); f0=f0+f2, f2=f0-f2, f4=dm(i1,m0), f5=pm(i9,m8); f1=f1+f3, f3=f1-f3, f6=dm(i3,m0), f7=pm(i11,m8); f4=f6+f4, f6=f6-f4, dm(i4,m0)=f8, pm(i12,m8)=f10; f5=f5+f7, f7=f5-f7, dm(i5,m0)=f9, pm(i13,m8)=f11; f8=f0+f4, f9=f0-f4, dm(i6,m0)=f12, pm(i14,m8)=f14; fstage: f10=f1+f5, f11=f1-f5, dm(i7,m0)=f13, pm(i15,m8)=f15; {_____________Middle stages with radix-4 main butterfly___________________} { m0=1 and m8=1 is still preset } m1=-2; { reverse step for twiddles } m9=m1; m2=3; { forward step for twiddles } m10=m2; m5=4; { first there are 4 groups } r2=HN/16; { with HN/16 butterflies in each group } r3=HN/16*3; { step to next group } LCNTR=STAGES-2, do mstage until LCE; { do STAGES-2 stages } i7=cosine; { first real twiddle } i15=sine; { first imag twiddle } r8=evreal; r9=odreal; i0=r8; { upper real path } r10=r8+r2; i8=r9; { upper imaginary path } i1=r10; { second real input path } r10=r10+r2, i4=r10; { second real output path } i2=r10; { third real input path } r10=r10+r2, i5=r10; { third real output path } i3=r10; { fourth real input path } r10=r9+r2, i6=r10; { fourth real output path } i9=r10; { second imag input path } r10=r10+r2, i12=r10; { second imag output path } i10=r10; { third imag input path } r10=r10+r2, i13=r10; { third imag output path } i11=r10; { fourth imag input path } i14=r10; { fourth imag output path } m4=r3; m12=r3; r4=r3+1, m6=r2; m3=r4; r2=r2-1, m11=r4; m7=r2; LCNTR=m5, do mgroup until LCE; { do m5 groups } f0=dm(i7,m0), f5=pm(i9,m8); f8=f0*f5, f4=dm(i1,m0), f1=pm(i15,m8); f9=f0*f4; f12=f1*f5, f0=dm(i7,m0), f5=pm(i11,m8); f13=f1*f4, f12=f9+f12, f4=dm(i3,m0), f1=pm(i15,m8); f8=f0*f4, f2=f8-f13; f13=f1*f5; f9=f0*f5, f8=f8+f13, f0=dm(i7,m1), f5=pm(i10,m8); f13=f1*f4, f12=f8+f12, f14=f8-f12, f4=dm(i2,m0), f1=pm(i15,m9); f11=f0*f4; f13=f1*f5, f6=f9-f13; f9=f0*f5, f13=f11+f13, f11=dm(i0,0); f13=f1*f4, f8=f11+f13, f10=f11-f13; {___________Do m7 radix-4 butterflies___________} LCNTR=m7, do mr4bfly until LCE; f13=f9-f13, f4=dm(i1,m0), f5=pm(i9,m8); f2=f2+f6, f15=f2-f6, f0=dm(i7,m0), f1=pm(i15,m8); f8=f0*f4, f3=f8+f12, f7=f8-f12, f9=pm(i8,0); f12=f1*f5, f9=f9+f13, f11=f9-f13, f13=f2; f8=f0*f5, f12=f8+f12, f0=dm(i7,m0), f5=pm(i11,m8); f13=f1*f4, f9=f9+f13, f6=f9-f13, f4=dm(i3,m0), f1=pm(i15,m8); f8=f0*f4, f2=f8-f13, dm(i0,m0)=f3, pm(i8,m8)=f9; f13=f1*f5, f11=f11+f14, f7=f11-f14, dm(i4,m0)=f7, pm(i12,m8)=f6; f9=f0*f5, f8=f8+f13, f0=dm(i7,m1), f5=pm(i10,m8); f13=f1*f4, f12=f8+f12, f14=f8-f12, f4=dm(i2,m0), f1=pm(i15,m9); f11=f0*f4, f3=f10+f15, f8=f10-f15, pm(i13,m8)=f11; f13=f1*f5, f6=f9-f13, dm(i6,m0)=f8, pm(i14,m8)=f7; f9=f0*f5, f13=f11+f13, f11=dm(i0,0); mr4bfly: f13=f1*f4, f8=f11+f13, f10=f11-f13, dm(i5,m0)=f3; {___________End radix-4 butterfly_____________} { dummy for address update * * } f13=f9-f13, f0=dm(i7,m2), f1=pm(i15,m10); f2=f2+f6, f15=f2-f6, f0=dm(i1,m4), f1=pm(i9,m12); f3=f8+f12, f7=f8-f12, f9=pm(i8,0); f9=f9+f13, f11=f9-f13, f0=dm(i2,m4); f9=f9+f2, f6=f9-f2, f0=dm(i3,m4), f1=pm(i10,m12); dm(i0,m3)=f3, pm(i8,m11)=f9; f11=f11+f14, f7=f11-f14, dm(i4,m3)=f7, pm(i12,m11)=f6; f3=f10+f15, f8=f10-f15, pm(i13,m11)=f11; dm(i6,m3)=f8, pm(i14,m11)=f7; mgroup: dm(i5,m3)=f3, f1=pm(i11,m12); r3=m4; r1=m5; r2=m6; r3=ashift r3 by -2; { groupstep/4 } r1=ashift r1 by 2; { groups*4 } m5=r1; mstage: r2=ashift r2 by -2; { butterflies/4 } {____________________Last radix-4 stage__________________________________} { Includes bitreversal of the real data in dm } bit set mode1 BR0; { bitreversal in i0 } { with: m0=m8=1 preset } i4=evreal; { input } i1=evreal+1; i2=evreal+2; i3=evreal+3; i0=ORE; { real output array base must be an integer multiple of HN } m2=OSTH; i7=cosine; i8=odreal; { input } i9=odreal+1; i10=odreal+2; i11=odreal+3; i12=odreal; { output } i15=sine; m1=4; m9=m1; f0=dm(i7,m0), f5=pm(i9,m9); f8=f0*f5, f4=dm(i1,m1), f1=pm(i15,m8); f9=f0*f4; f12=f1*f5, f0=dm(i7,m0), f5=pm(i11,m9); f13=f1*f4, f12=f9+f12, f4=dm(i3,m1), f1=pm(i15,m8); f8=f0*f4, f2=f8-f13; f13=f1*f5; f9=f0*f5, f8=f8+f13, f0=dm(i7,m0), f5=pm(i10,m9); f13=f1*f4, f12=f8+f12, f14=f8-f12, f4=dm(i2,m1), f1=pm(i15,m8); f11=f0*f4; f13=f1*f5, f6=f9-f13; f9=f0*f5, f13=f11+f13, f11=dm(i4,m1); f13=f1*f4, f8=f11+f13, f10=f11-f13; {________Do HN/4-1 radix-4 butterflies_______} LCNTR=HN/4-1, do lstage until LCE; f13=f9-f13, f4=dm(i1,m1), f5=pm(i9,m9); f2=f2+f6, f15=f2-f6, f0=dm(i7,m0), f1=pm(i15,m8); f8=f0*f4, f3=f8+f12, f7=f8-f12, f9=pm(i8,m9); f12=f1*f5, f9=f9+f13, f11=f9-f13, f13=f2; f8=f0*f5, f12=f8+f12, f0=dm(i7,m0), f5=pm(i11,m9); f13=f1*f4, f9=f9+f13, f6=f9-f13, f4=dm(i3,m1), f1=pm(i15,m8); f8=f0*f4, f2=f8-f13, dm(i0,m2)=f3, pm(i12,m8)=f9; f13=f1*f5, f11=f11+f14, f7=f11-f14, dm(i0,m2)=f7, pm(i12,m8)=f6; f9=f0*f5, f8=f8+f13, f0=dm(i7,m0), f5=pm(i10,m9); f13=f1*f4, f12=f8+f12, f14=f8-f12, f4=dm(i2,m1), f1=pm(i15,m8); f11=f0*f4, f3=f10+f15, f8=f10-f15, pm(i12,m8)=f11; f13=f1*f5, f6=f9-f13, dm(i0,m2)=f3, pm(i12,m8)=f7; f9=f0*f5, f13=f11+f13, f11=dm(i4,m1); lstage: f13=f1*f4, f8=f11+f13, f10=f11-f13, dm(i0,m2)=f8; f13=f9-f13; f2=f2+f6, f15=f2-f6; f3=f8+f12, f7=f8-f12, f9=pm(i8,m9); f9=f9+f13, f11=f9-f13, dm(i0,m2)=f3; f9=f9+f2, f6=f9-f2, dm(i0,m2)=f7; pm(i12,m8)=f9; f11=f11+f14, f7=f11-f14, pm(i12,m8)=f6; f3=f10+f15, f8=f10-f15, pm(i12,m8)=f11; dm(i0,m2)=f3, pm(i12,m8)=f7; dm(i0,m2)=f8; {________Do the bitreversal of the imaginary part from pm to dm___________} i8=odreal; i0=IRE; { image output array base must be an integer multiple of HN } f0=pm(i8,m8); LCNTR=HN-1, do pmbr until LCE; { do HN-1 bitreversals } pmbr: dm(i0,m2)=f0, f0=pm(i8,m8); dm(i0,m2)=f0; bit clr MODE1 BR0; { no bitreversal in i0 any more } {compute the N point FFT of real from the packed N/2 transform} i0=refft; { real src pntr forward} i1=h_sine+1; { real sine pntr} i2=refft+HN-1; { real src pntr backward} i3=refft; { real dest pntr forward} i4=refft+HN; { real dest pntr backward} i5=evreal; i6=evreal+HN-1; i9=h_cosine+1; { imag cosine pntr} i11=imfft; { imag dest pntr forward} i12=imfft+HN; { imag dest pntr backward} m0 = 1; m1 =-1; m8 = 1; m9 =-1; f2 = 0.5; { this register facilitates div by 2} f15= 0.0; {do first pair, U(0)+jV(0) & U(HN)+jV(HN)} f12=dm(i0,m0); f9 =dm(i5,m0); f11=f12+f9, f3=f12-f9, f12=dm(i0,m0); f9 =dm(i5,m0); f10=PASS f15, f8 =dm(i2,m1); f13=dm(i6,m1); {do the rest in pairs (n,HN-n) except the mid-point} f4=f8+f12, f5=f8-f12, f0 =dm(i1,m0), f1 =pm(i9,m8); LCNTR = (HN/2)-1, DO trans UNTIL LCE; f8=f0*f5, f6=f9+f13, f7=f9-f13, dm(i3,m0)=f11; f12=f1*f6, f10=f10+f15,f15=f10-f15, dm(i4,m1)=f3; f13=dm(i6,m1); f11=f2*f4, f14=f8+f12, f12=dm(i0,m0), pm(i11,m8)=f10; f10=f1*f5, f11=f11+f14,f3=f11-f14, f8 =dm(i2,m1), pm(i12,m9)=f15; f14=f0*f6, f4=f8+f12, f5=f8-f12, f0 =dm(i1,m0), f1 =pm(i9,m8); trans: f15=f2*f7, f10=f10-f14, f9 =dm(i5,m0); {finish up and do the mid-point} dm(i3,m0)=f11; f10=f10+f15,f15=f10-f15, dm(i4,m1)=f3; f14=-f9, pm(i11,m8)=f10; rts (db); {finished} pm(i12,m9)=f15; dm(i3,m0)=f8, pm(i11,m8)=f14; {_______________________________________________________________________} .ENDSEG;