FFT VLSI Implementation

FFT VLSI Implementation. VLSI Signal Processing, Department of Electrical Engineering, National Taiwan University (台灣大學電機系), An-Yeu Wu (吳安宇). Shousheng He and Mats Torkelson, "A new approach to pipeline FFT processor," Proc. IEEE IPPS, pp. 766-770, 1996.

Presentation Transcript


  1. FFT VLSI Implementation. VLSI Signal Processing, Department of Electrical Engineering, National Taiwan University (台灣大學電機系), An-Yeu Wu (吳安宇) • Shousheng He and Mats Torkelson, "A new approach to pipeline FFT processor," Proc. IEEE IPPS, pp. 766-770, 1996. • E. Bidet, D. Castelain, C. Joanblanq, and P. Senn, "A fast single-chip implementation of 8192 complex point FFT," IEEE J. Solid-State Circuits, pp. 300-305, March 1995.

  2. FFT Review
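The review slide's content is not preserved in the transcript. As a reminder of what the later slides build on, here is a minimal sketch of a recursive radix-2 decimation-in-frequency (DIF) FFT next to the direct DFT, assuming the usual convention W_N = exp(-j2π/N). The function names `dft` and `fft_dif` are illustrative only and do not come from the slides.

```python
import cmath

def dft(x):
    """Direct O(N^2) DFT: X[k] = sum_n x[n] * W_N^(n*k), with W_N = exp(-2j*pi/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def fft_dif(x):
    """Recursive radix-2 DIF FFT; len(x) must be a power of 2."""
    N = len(x)
    if N == 1:
        return list(x)
    half = N // 2
    # Butterfly: sums feed the even-indexed outputs,
    # twiddled differences feed the odd-indexed outputs.
    top = [x[n] + x[n + half] for n in range(half)]
    bot = [(x[n] - x[n + half]) * cmath.exp(-2j * cmath.pi * n / N)
           for n in range(half)]
    out = [0j] * N
    out[0::2] = fft_dif(top)
    out[1::2] = fft_dif(bot)
    return out
```

Each recursion level corresponds to one butterfly stage followed by a twiddle-factor multiplication, which is the structure the pipeline architectures on the following slides map to hardware.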

  3. Implementation: Two Extreme Methods [Figure: a spectrum between the two extremes] • Speed: slow to fast • Area: small to large • Control: complicated to simple

  4. Design Considerations • System requirements, e.g., speed, area, power, … • To trade off between the two extreme cases, we need: • more processing elements (PEs) • a better PE utilization rate • a better control scheme

  5. FFT Processor--- Block Diagram

  6. Some Current Schemes • Radix-2 Multi-path Delay Commutator (R2MDC), N = 16 • Radix-2 Single-path Delay Feedback (R2SDF), N = 16

  7. Some Current Schemes (cont.) • Radix-4 Single-path Delay Commutator (R4SDC), N = 256 • Radix-4 Multi-path Delay Commutator (R4MDC), N = 256 • Radix-4 Single-path Delay Feedback (R4SDF), N = 256

  8. Distinctive Merits of the Above • Delay-feedback architectures use memory more efficiently than delay-commutator architectures. • Radix-4 achieves higher multiplier utilization; radix-2, however, has simpler butterflies (BFs), which are better utilized.
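To put numbers on the memory and multiplier claims, the sketch below evaluates closed-form resource counts for the five schemes (R2MDC and R2SDF are the radix-2 multi-path delay commutator and single-path delay feedback; R4SDC, R4MDC and R4SDF are their radix-4 counterparts). The slides do not give these expressions. They are quoted from memory of the comparison table in the He and Torkelson paper, so treat them as assumptions and check them against the paper before relying on them. The function name `resources` is hypothetical.

```python
import math

def resources(n):
    """Complex multiplier count and memory size (in words) for an N-point
    pipeline FFT, N a power of 4.

    ASSUMPTION: the formulas are recalled from the comparison table in
    He and Torkelson (1996); they are not stated in the slides.
    """
    s = round(math.log(n, 4))          # number of radix-4 stages, log4(N)
    if 4 ** s != n:
        raise ValueError("n must be a power of 4")
    return {
        "R2MDC": {"multipliers": 2 * (s - 1), "memory": 3 * n // 2 - 2},
        "R2SDF": {"multipliers": 2 * (s - 1), "memory": n - 1},
        "R4SDF": {"multipliers": s - 1,       "memory": n - 1},
        "R4MDC": {"multipliers": 3 * (s - 1), "memory": 5 * n // 2 - 4},
        "R4SDC": {"multipliers": s - 1,       "memory": 2 * n - 2},
    }

if __name__ == "__main__":
    for name, r in resources(256).items():
        print(f"{name}: {r['multipliers']} multipliers, {r['memory']} words")
```

Under these assumed formulas, both delay-feedback schemes need N - 1 words of memory for any N, while every delay-commutator scheme needs more, which matches the first bullet above.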

  9. Comparison • Radix / speed: low to high • Processing ability per unit: low to high • Control scheme: simple to complex • To combine the advantages: further decompose the high-radix PE

  10. Decomposition Method (1) • Simply "reuse" the repeated micro unit [Figure: a radix-4 PE]

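  11. Decomposition Method (2) • At the algorithm level, apply a three-dimensional index map: $n = \langle \frac{N}{2} n_1 + \frac{N}{4} n_2 + n_3 \rangle_N$, $k = \langle k_1 + 2k_2 + 4k_3 \rangle_N$, where $n_1, n_2 \in \{0, 1\}$ and $n_3 \in \{0, \dots, N/4 - 1\}$ • Summation over $n_1$ [equation]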
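The equation for the sum over $n_1$ did not survive in the transcript. A sketch of that step, assuming the usual DFT definition with $W_N = e^{-j2\pi/N}$ and substituting the index map above (the symbol $B$ is introduced here for illustration):

$$
\begin{aligned}
X(k_1 + 2k_2 + 4k_3)
 &= \sum_{n_3=0}^{N/4-1} \sum_{n_2=0}^{1} \sum_{n_1=0}^{1}
    x\!\left(\tfrac{N}{2}n_1 + \tfrac{N}{4}n_2 + n_3\right)
    W_N^{\left(\frac{N}{2}n_1 + \frac{N}{4}n_2 + n_3\right)(k_1 + 2k_2 + 4k_3)} \\
 &= \sum_{n_3=0}^{N/4-1} \sum_{n_2=0}^{1}
    B\!\left(\tfrac{N}{4}n_2 + n_3\right)
    W_N^{\left(\frac{N}{4}n_2 + n_3\right)(k_1 + 2k_2 + 4k_3)}, \\
B(m) &= x(m) + (-1)^{k_1}\, x\!\left(m + \tfrac{N}{2}\right).
\end{aligned}
$$

The second line uses $W_N^{\frac{N}{2} n_1 (k_1 + 2k_2 + 4k_3)} = (-1)^{n_1 k_1}$, so the sum over $n_1$ reduces to an ordinary radix-2 butterfly.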

  12. Decomposition Method (2), cont. • Summation over $n_2$ [equation] • The resulting factor requires only real-imaginary swapping and sign inversion
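The equation for the sum over $n_2$ is also missing from the transcript. A sketch of the result, assuming $W_N = e^{-j2\pi/N}$ and the index map $n = \frac{N}{2}n_1 + \frac{N}{4}n_2 + n_3$, $k = k_1 + 2k_2 + 4k_3$ (the symbol $H$ is illustrative notation):

$$
\begin{aligned}
X(k_1 + 2k_2 + 4k_3)
 &= \sum_{n_3=0}^{N/4-1}
    \left[ H(k_1, k_2, n_3)\, W_N^{n_3 (k_1 + 2k_2)} \right] W_{N/4}^{n_3 k_3}, \\
H(k_1, k_2, n_3)
 &= \left[ x(n_3) + (-1)^{k_1} x\!\left(n_3 + \tfrac{N}{2}\right) \right]
  + (-j)^{(k_1 + 2k_2)}
    \left[ x\!\left(n_3 + \tfrac{N}{4}\right) + (-1)^{k_1} x\!\left(n_3 + \tfrac{3N}{4}\right) \right].
\end{aligned}
$$

The only multiplier inside $H$ is a power of $-j$, which comes from $W_N^{\frac{N}{4} n_2 (k_1 + 2k_2 + 4k_3)} = (-j)^{n_2 (k_1 + 2k_2)}$. Multiplying $a + jb$ by $-j$ gives $b - ja$, which is a swap of the real and imaginary parts plus one sign inversion, so no hardware multiplier is needed for this step.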

  13. Graphical Explanation (N = 16) [Figure; annotation: trivial multiplication]

  14. Graphical Explanation (cont.) • The equations are equivalent to the operations below [Figure]
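Since the figure is not preserved, the following sketch expresses the same operations in code: two radix-2 butterflies with a trivial rotation by a power of -j between them, then one full twiddle multiplication, then an N/4-point transform over $n_3$. It assumes $W_N = e^{-j2\pi/N}$. The names `fft_radix22` and `naive_dft` are illustrative, and the recursion is a software check of the decomposition, not a model of the pipeline hardware.

```python
import cmath

# Powers of -j: (-j)^0, (-j)^1, (-j)^2, (-j)^3. Multiplying by these
# needs only a real/imaginary swap and sign changes.
MINUS_J_POWERS = (1, -1j, -1, 1j)

def naive_dft(x):
    """Direct DFT, used only as a reference."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def fft_radix22(x):
    """Radix-2^2 DIF FFT sketch; len(x) must be a power of 4."""
    N = len(x)
    if N == 1:
        return list(x)
    if N % 4:
        raise ValueError("length must be a power of 4")
    q = N // 4
    out = [0j] * N
    for k1 in (0, 1):
        sign = -1 if k1 else 1
        for k2 in (0, 1):
            rot = MINUS_J_POWERS[k1 + 2 * k2]
            h = []
            for n3 in range(q):
                # First butterfly (sum over n1), samples N/2 apart.
                b0 = x[n3] + sign * x[n3 + 2 * q]
                b1 = x[n3 + q] + sign * x[n3 + 3 * q]
                # Second butterfly (sum over n2) with the trivial rotation.
                hval = b0 + rot * b1
                # The only non-trivial multiplication of the two stages.
                tw = cmath.exp(-2j * cmath.pi * n3 * (k1 + 2 * k2) / N)
                h.append(hval * tw)
            sub = fft_radix22(h)       # N/4-point transform over n3
            for k3 in range(q):
                out[k1 + 2 * k2 + 4 * k3] = sub[k3]
    return out
```

Comparing `fft_radix22(x)` with `naive_dft(x)` for a 16-point input is a quick way to confirm that the index map and the trivial rotations are consistent.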

  15. Circuit of BF2I [Figure: butterfly circuit. Inputs: Xr(n), Xi(n), Xr(n+N/2), Xi(n+N/2). Outputs: Zr(n), Zi(n), Zr(n+N/2), Zi(n+N/2). Annotations: first N/2 cycles; second N/2 cycles]
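The circuit diagram is lost, but the two phases it labels can be sketched as a cycle-level behavioural model of one delay-feedback butterfly stage. This is a minimal sketch under the assumption that the stage behaves as in the He and Torkelson paper: during the first N/2 cycles the input fills the feedback memory while the previously stored differences drain out, and during the second N/2 cycles the butterfly outputs the sum and writes the difference back. The function name `bf2i_stage` and the use of a `deque` for the feedback registers are illustrative.

```python
from collections import deque

def bf2i_stage(samples, depth):
    """Behavioural model of one single-path delay-feedback butterfly stage.

    samples: input stream, one value per clock cycle.
    depth:   length of the feedback delay line (N/2 for the first stage).
    Returns the output stream, one value per clock cycle.
    """
    fifo = deque([0j] * depth)         # feedback registers, initially cleared
    out = []
    for cycle, x_in in enumerate(samples):
        delayed = fifo.popleft()
        if (cycle // depth) % 2 == 0:
            # First N/2 cycles: store the input, pass the stored
            # differences Z(n + N/2) from the previous phase to the output.
            out.append(delayed)
            fifo.append(x_in)
        else:
            # Second N/2 cycles: butterfly on x(n) (delayed) and x(n + N/2).
            out.append(delayed + x_in)     # Z(n)       = x(n) + x(n + N/2)
            fifo.append(delayed - x_in)    # Z(n + N/2) = x(n) - x(n + N/2)
    return out

if __name__ == "__main__":
    N = 8
    x = list(range(N))
    stream = bf2i_stage(x + [0] * (N // 2), depth=N // 2)  # N/2 flush cycles
    print(stream[N // 2:N])            # sums
    print(stream[N:])                  # differences
```

For the 8-point ramp in the usage example, the sums appear during cycles 4 to 7 and the differences during the four flush cycles that follow, which shows why the stage needs only N/2 words of storage.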

  16. Circuit of BF2II [Figure: butterfly circuit. Inputs: Xr(n), Xi(n), Xr(n+N/2), Xi(n+N/2). Outputs: Zr(n), Zi(n), Zr(n+N/2), Zi(n+N/2). Annotation: swap Re and Im, with sign inversion]
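The "swap Re and Im, with sign inversion" annotation is the hardware form of multiplying by -j. A minimal sketch on separate real and imaginary parts, as a datapath would hold them, is shown here. The names `times_minus_j` and `bf2ii` and the boolean `rotate` control are hypothetical, and the model covers only the arithmetic, not the feedback memory or timing.

```python
def times_minus_j(re, im):
    """(re + j*im) * (-j) = im - j*re: swap the parts, negate the new imaginary part."""
    return im, -re

def bf2ii(a, b, rotate):
    """Second-type butterfly on (re, im) pairs.

    a:      delayed operand from the feedback memory.
    b:      incoming operand.
    rotate: True when the incoming operand must first be multiplied by -j.
    Returns (a + b', a - b') where b' is b or -j*b.
    """
    br, bi = times_minus_j(*b) if rotate else b
    return (a[0] + br, a[1] + bi), (a[0] - br, a[1] - bi)

if __name__ == "__main__":
    print(bf2ii((1, 2), (3, 4), rotate=True))
    print(bf2ii((1, 2), (3, 4), rotate=False))
```

Because the rotation costs only wiring and one negation, integer (fixed-point) inputs stay exact and no multiplier is used.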

  17. Radix-$2^2$ Single-path Delay Feedback FFT Architecture • Architecture using the above technique, for N = 256 [Figure] • Compare with the original architecture, for N = 256 [Figure]

  18. Structural Advantages • Radix-$2^2$ has the same multiplicative complexity as radix-4 but retains the radix-2 butterfly (BF) structure. • Only every second stage has a non-trivial multiplication. • Control is simple: one counter serves as both the synchronization controller and the address counter for the twiddle factors $W^n$.

  19. Conclusions • FFT applications: radar signal processing, fast convolution, spectrum estimation, OFDM-based modulation/demodulation. • Efficient VLSI architectures (parallel processing) are required for real-time processing. • However, most systems still use DSP processors (e.g., TI C3x/C5x) for the computation, running fast algorithms such as DIT and DIF FFTs. • VLIW (Very Long Instruction Word) processors (e.g., TI C6x) need new programming skills to make use of the two parallel MAC units.
