Is F Better than D? David Hansen and James Michelussi
Introduction • Discrete Fourier Transform (DFT) • Fast Fourier Transform (FFT) • FFT Algorithm – Applying the Mathematics • Implementations of DFT and FFT • Hardware Benchmarks • Conclusion
DFT • Builds on the Fourier analysis introduced by Jean Baptiste Joseph Fourier in 1807 • Transforms a sampled (discrete), periodic signal from the time domain to the frequency domain • Computed as the correlation between the time-domain signal and N cosine and N sine waves: $X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{kn}$ for $k = 0, 1, \dots, N-1$ • X(k) = DFT frequency-domain signal • N = number of sample points • x(n) = time-domain signal • $W_N = e^{-j 2\pi / N}$ = twiddle factor
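The "correlation with N cosine and N sine waves" can be made explicit by expanding the twiddle factor with Euler's formula. The block below is a short derivation, assuming the common sign convention $W_N = e^{-j 2\pi / N}$ (the slide's own equation image did not survive, so this convention is an assumption).

```latex
\begin{aligned}
X(k) &= \sum_{n=0}^{N-1} x(n)\, W_N^{kn},
\qquad W_N = e^{-j 2\pi / N} \\
     &= \sum_{n=0}^{N-1} x(n) \cos\!\left(\frac{2\pi k n}{N}\right)
      \;-\; j \sum_{n=0}^{N-1} x(n) \sin\!\left(\frac{2\pi k n}{N}\right),
\qquad k = 0, 1, \dots, N-1
\end{aligned}
```

For real input, the first sum is the real part of bin k and the second sum, with its minus sign, is the imaginary part.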
DFT (Walking Speed) • Why is this important? Where is it used? • Lets machines compute the frequency-domain representation of a signal • Turns the convolution of two signals into a point-by-point multiplication of their transforms • Used in digital spectral analysis for speech, imaging, and pattern recognition, as well as in signal manipulation using filters • But the DFT requires N^2 multiplications!
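The convolution property can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, it uses a direct O(N^2) DFT rather than the authors' code, and it relies on the fact that multiplying DFTs corresponds to circular (wrap-around) convolution of equal-length signals.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Direct O(N^2) DFT. With inverse = true the exponent sign flips and the
// result is scaled by 1/N, which gives the inverse transform.
std::vector<cplx> naive_dft(const std::vector<cplx>& x, bool inverse = false) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    const double sign = inverse ? 1.0 : -1.0;
    std::vector<cplx> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cplx acc(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            const double theta =
                sign * 2.0 * pi * static_cast<double>(k * t) / static_cast<double>(n);
            acc += x[t] * cplx(std::cos(theta), std::sin(theta));
        }
        out[k] = inverse ? acc / static_cast<double>(n) : acc;
    }
    return out;
}

// Circular convolution computed directly in the time domain.
std::vector<cplx> circular_convolve_direct(const std::vector<cplx>& a,
                                           const std::vector<cplx>& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    std::vector<cplx> y(n, cplx(0.0, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            y[(i + j) % n] += a[i] * b[j];
    return y;
}

// The same result via the frequency domain: transform, multiply, invert.
std::vector<cplx> circular_convolve_dft(const std::vector<cplx>& a,
                                        const std::vector<cplx>& b) {
    assert(a.size() == b.size());
    std::vector<cplx> fa = naive_dft(a);
    const std::vector<cplx> fb = naive_dft(b);
    for (std::size_t k = 0; k < fa.size(); ++k) fa[k] *= fb[k];
    return naive_dft(fa, true);
}
```

Calling both functions on the same pair of equal-length signals should give results that agree to within floating-point round-off.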
FFT (Jet Speed) • J. W. Cooley and J. W. Tukey are credited with bringing the FFT to the world in the 1960s • Simply an algorithm for calculating the DFT more efficiently • Exploits symmetry and periodicity in the twiddle factors and uses a divide-and-conquer method • Symmetry: $W_N^{r+N/2} = -W_N^{r}$ • Periodicity: $W_N^{r+N} = W_N^{r}$ • Requires only (N/2)·log2(N) multiplications! • Faster computation times • More precise results, because fewer operations accumulate less round-off error
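The two twiddle-factor identities and the multiplication counts can be sanity-checked with a few lines of code. This is a minimal sketch with hypothetical helper names, assuming $W_N^r = e^{-j 2\pi r / N}$ and N a power of two.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

using cplx = std::complex<double>;

// Twiddle factor W_N^r = exp(-j * 2*pi * r / N).
cplx twiddle(unsigned r, unsigned n) {
    assert(n > 0);
    const double pi = std::acos(-1.0);
    return std::polar(1.0, -2.0 * pi * static_cast<double>(r) / static_cast<double>(n));
}

// Complex multiplications for the direct DFT: N^2.
unsigned long long dft_multiplies(unsigned long long n) {
    return n * n;
}

// Complex multiplications for the radix-2 FFT: (N/2) * log2(N).
unsigned long long fft_multiplies(unsigned long long n) {
    assert(n != 0 && (n & (n - 1)) == 0);  // N must be a power of two
    unsigned long long stages = 0;
    for (unsigned long long m = n; m > 1; m >>= 1) ++stages;
    return (n / 2) * stages;
}
```

For N = 1024 the counts are 1,048,576 multiplications for the DFT against 5,120 for the FFT.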
FFT Algorithm • Several types of FFT algorithm exist (Radix-2, Radix-4; decimation in time (DIT) and decimation in frequency (DIF)) • Focus here: Radix-2 using the decimation-in-time (DIT) method • Breaks the DFT calculation down into a number of 2-point DFTs • Each 2-point DFT uses an operation called the butterfly • The 2-point results are then recombined pairwise, stage by stage, for log2(N) stages • With the DIT method, the input time-domain points must first be reordered using bit reversal
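The steps above (bit-reversal reordering, then log2(N) stages of butterflies) can be sketched as a compact in-place routine. This is an illustrative version using std::complex, not the authors' implementation; the function names are hypothetical and N is assumed to be a power of two.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cplx = std::complex<double>;

// Reverse the lowest `bits` bits of `value` (e.g. 001 -> 100 for bits = 3).
std::size_t reverse_bits(std::size_t value, unsigned bits) {
    std::size_t result = 0;
    for (unsigned b = 0; b < bits; ++b) {
        result = (result << 1) | (value & 1);
        value >>= 1;
    }
    return result;
}

// In-place radix-2 decimation-in-time FFT.
void fft_radix2_dit(std::vector<cplx>& data) {
    const std::size_t n = data.size();
    assert(n != 0 && (n & (n - 1)) == 0);  // power of two only

    unsigned bits = 0;
    while ((std::size_t{1} << bits) < n) ++bits;

    // Step 1: reorder the input by bit-reversed index.
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t j = reverse_bits(i, bits);
        if (i < j) std::swap(data[i], data[j]);
    }

    // Step 2: log2(N) stages; the block size doubles each stage.
    const double pi = std::acos(-1.0);
    for (std::size_t block = 2; block <= n; block <<= 1) {
        const std::size_t half = block / 2;
        for (std::size_t start = 0; start < n; start += block) {
            for (std::size_t k = 0; k < half; ++k) {
                // Twiddle factor W_block^k.
                const cplx w = std::polar(
                    1.0, -2.0 * pi * static_cast<double>(k) / static_cast<double>(block));
                // Butterfly: one multiply, one add, one subtract.
                const cplx top = data[start + k];
                const cplx bottom = w * data[start + k + half];
                data[start + k] = top + bottom;
                data[start + k + half] = top - bottom;
            }
        }
    }
}
```

Recomputing the twiddle factor inside the inner loop keeps the sketch short; a tuned version would read it from a precomputed table.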
Implementations of DFT and FFT David Hansen
DFT Implementation

    /* r, k, samples, PI and data_in are declared elsewhere (not shown on the slide). */
    for (r = 0; r <= samples / 2; r++) {        /* one pass per frequency bin */
        float re = 0.0f, im = 0.0f;
        float part = (float)r * -2.0f * PI / (float)samples;
        for (k = 0; k < samples; k++) {          /* correlate with cos and sin */
            float theta = part * (float)k;
            re += data_in[k] * cos(theta);
            im += data_in[k] * sin(theta);
        }
        /* re and im now hold bin r; the slide does not show where they are stored. */
    }

• Nested for loop, (N/2 + 1) × N iterations: O(N^2)
• 63027.41 cycles / sample (123 cycles per inner-loop iteration)
• Obvious inefficiency: the calls to the math.h cos and sin functions
• Efficient assembly coding could reduce the inner loop to 3 cycles per iteration (1,536 cycles / sample)
C++ FFT Implementation

    void fft_float(unsigned NumSamples,
                   float *RealIn, float *ImagIn,
                   float *RealOut, float *ImagOut)
    {
        unsigned i, j, n, BlockSize, BlockEnd;
        /* NumBits (the bit width passed to ReverseBits) is set up elsewhere; not shown on the slide. */

        /* Iterate over the samples and perform the bit reversal */
        for (i = 0; i < NumSamples; i++) {
            j = ReverseBits(i, NumBits);
        }

        BlockEnd = 1;
        /* The following loop iterates Log2(NumSamples) times */
        for (BlockSize = 2; BlockSize <= NumSamples; BlockSize <<= 1) {
            /* Perform angle calculations (using math.h sin/cos) */

            /* The following 2 loops together iterate NumSamples/2 times */
            for (i = 0; i < NumSamples; i += BlockSize) {
                for (j = i, n = 0; n < BlockEnd; j++, n++) {
                    /* Perform butterfly calculations */
                }
            }
            BlockEnd = BlockSize;
        }
    }
C++ FFT Implementation • Bit-reverse for loop: N iterations • Nested for loops • First outer loop: log2(N) iterations; uses the math.h sin/cos functions • Second outer loop: N / BlockSize iterations • Inner loop: BlockSize/2 iterations • Total: N + log2(N) × (N / BlockSize) × (BlockSize / 2) = N + (N/2)·log2(N) iterations • O(N·log2(N)) • 193.84 cycles / sample
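The iteration count above can be confirmed by walking the same loop structure and counting, without doing any arithmetic on samples. This is a minimal sketch: the struct and function names are hypothetical, and only the loop bounds mirror the fft_float skeleton.

```cpp
#include <cassert>
#include <cstddef>

// Iteration counts for the three loop levels of the radix-2 FFT skeleton.
struct LoopCounts {
    std::size_t bit_reverse_iterations;  // expected: N
    std::size_t stages;                  // expected: log2(N)
    std::size_t butterflies;             // expected: (N/2) * log2(N)
};

LoopCounts count_fft_iterations(std::size_t num_samples) {
    assert(num_samples != 0 && (num_samples & (num_samples - 1)) == 0);
    LoopCounts counts{0, 0, 0};

    // Bit-reversal pass: one iteration per sample.
    for (std::size_t i = 0; i < num_samples; ++i) ++counts.bit_reverse_iterations;

    std::size_t block_end = 1;
    for (std::size_t block_size = 2; block_size <= num_samples; block_size <<= 1) {
        ++counts.stages;
        // N / block_size blocks, each with block_size / 2 butterflies.
        for (std::size_t i = 0; i < num_samples; i += block_size)
            for (std::size_t n = 0; n < block_end; ++n)
                ++counts.butterflies;
        block_end = block_size;
    }
    return counts;
}
```

For N = 1024 this gives 1,024 bit-reversal iterations, 10 stages, and 5,120 butterflies, matching N + (N/2)·log2(N).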
Assembly FFT Implementation • Bit-reverse address generation • Hides the bit-reverse operation inside the first and second FFT stages • Sin and cos values stored in a look-up table (LUT) • 256 Kbyte LUT added to Data1 • The Data1 memory space had to be enlarged through the LDF file • Interleaved real and imaginary arrays • Quad reads load 2 complex points per cycle • Supports the real FFT for input signals with no imaginary component • 40% algorithm-based savings
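The look-up-table idea can be illustrated in portable C++. The sketch below is an assumption-laden stand-in, not the authors' assembly: the table layout (interleaved cos/sin pairs for N/2 twiddle factors), its size, and the function names are all hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Build a table of the N/2 twiddle factors W_N^k = cos(2*pi*k/N) - j*sin(2*pi*k/N),
// stored interleaved as [re0, im0, re1, im1, ...] so that real and imaginary
// parts of one factor sit next to each other in memory.
std::vector<float> build_twiddle_lut(std::size_t n) {
    assert(n >= 2 && (n & (n - 1)) == 0);  // power of two only
    const double pi = std::acos(-1.0);
    std::vector<float> lut(n);  // N/2 complex entries = N floats
    for (std::size_t k = 0; k < n / 2; ++k) {
        const double theta = -2.0 * pi * static_cast<double>(k) / static_cast<double>(n);
        lut[2 * k] = static_cast<float>(std::cos(theta));
        lut[2 * k + 1] = static_cast<float>(std::sin(theta));
    }
    return lut;
}

// A stage with the given block size needs W_block^k, which equals
// W_N^(k * N / block), so one table built for N serves every stage.
std::size_t twiddle_index(std::size_t k, std::size_t block_size, std::size_t n) {
    assert(block_size >= 2 && block_size <= n && k < block_size / 2);
    return k * (n / block_size);
}
```

A butterfly in a given stage would then read `lut[2 * idx]` and `lut[2 * idx + 1]` with `idx = twiddle_index(k, block_size, n)` instead of calling sin and cos.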
Assembly FFT Implementation • Special butterfly instruction • Performs the addition and subtraction in parallel in one compute block • Speeds up the innermost loop • VLIW and SIMD operations • Perform simultaneous operations in both compute blocks • Loop unrolling and instruction scheduling keep the entire processor busy with instructions • 11.35 cycles per sample
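The three reported benchmark figures can be turned into speedup ratios with simple division. The sketch below only restates the cycle counts given on the slides; the constant and function names are hypothetical.

```cpp
#include <cassert>

// Cycles per sample as reported on the slides.
constexpr double kDftCyclesPerSample = 63027.41;     // C DFT
constexpr double kCppFftCyclesPerSample = 193.84;    // C++ FFT
constexpr double kAsmFftCyclesPerSample = 11.35;     // assembly FFT

// Speedup of an improved implementation relative to a baseline.
double speedup(double baseline_cycles, double improved_cycles) {
    assert(improved_cycles > 0.0);
    return baseline_cycles / improved_cycles;
}
```

With these figures, the C++ FFT is roughly 325 times faster than the DFT, the assembly FFT is about 17 times faster than the C++ FFT, and the assembly FFT is over 5,500 times faster than the DFT.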
[Figure: DC FFT test. Panels: FFT source array; FFT output magnitude.]
[Figure: Audio FFT test. Panels: FFT source array; FFT output magnitude.]
Conclusion • The FFT algorithm is very useful for computing the frequency domain on a DSP • The FFT is much faster than a regular DFT algorithm • The FFT is more precise because it accumulates fewer round-off errors • The timed coding examples support these claims and demonstrate how to code the algorithm • The Radix-2 FFT isn't the fastest variant, but it uses a less complex addressing and twiddle-factor routine • In this case (unlike in school) F is better than D.