
The Imagine Stream Processor

Ujval J. Kapasi, William J. Dally, Scott Rixner, John D. Owens, and Brucek Khailany. Presenter: Lu Hao.


Presentation Transcript


  1. The Imagine Stream Processor Ujval J. Kapasi, William J. Dally, Scott Rixner, John D. Owens, and Brucek Khailany Presenter: Lu Hao

  2. Contents • Stream processor • Imagine Architecture • Example: FFT application • Experimental result • Conclusion

  3. Motivation of stream processor • Media-processing applications, such as 3-D polygon rendering and MPEG-2 encoding, are becoming an increasingly dominant portion of computing workloads • Properties of media-processing applications • Real-time performance constraints • High arithmetic intensity, which requires parallel solutions • A large amount of inherent data parallelism • Providing large numbers of ALUs to operate on data in parallel is relatively inexpensive • Current programmable solutions cannot scale to support this many ALUs • Both supplying instructions and transferring data at the necessary rates are problematic • For example, a 48-ALU single-chip processor must issue up to 48 instructions/cycle and provide up to 144 words/cycle of data bandwidth to operate at its peak rate
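The 48 instructions/cycle and 144 words/cycle figures on this slide can be reproduced with a short calculation. The sketch below assumes that each ALU operation moves three words (two source operands and one result); the slide does not state this, so `WORDS_PER_OP` and the function name are illustrative only.

```python
# Assumption (not stated on the slide): each ALU operation reads two
# source words and writes one result word, i.e. 3 words per operation.
WORDS_PER_OP = 3


def peak_requirements(num_alus, words_per_op=WORDS_PER_OP):
    """Return (instructions/cycle, words/cycle) needed to keep every ALU busy."""
    instructions_per_cycle = num_alus          # one operation per ALU per cycle
    words_per_cycle = num_alus * words_per_op  # operand plus result traffic
    return instructions_per_cycle, words_per_cycle


ipc, bandwidth = peak_requirements(48)
print(ipc, "instructions/cycle,", bandwidth, "words/cycle")
```

Under that assumption the requirement grows linearly with the ALU count, which is why both instruction issue and data bandwidth become bottlenecks as ALUs are added.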

  4. What is a stream processor? • Usually SIMD • Lets some applications more easily exploit a limited form of parallel processing • Uses the stream programming model to expose parallelism as well as producer-consumer locality • Can use multiple computational units

  5. The Imagine Processor • Imagine is a programmable stream processor and is a hardware implementation of the stream model. • Imagine is designed to be a stream coprocessor for a general purpose processor that acts as the host. • The programming model organizes the computation in an application into a sequence of arithmetic kernels, and organizes the data-flow into a series of data streams. • On a variety of realistic applications, Imagine can sustain up to 50 instructions per cycle, and up to 15 GOPS of arithmetic bandwidth. • Load-store architecture for streams (SRF)
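The programming model on this slide (a sequence of kernels connected by data streams) can be sketched in plain Python. This is only an illustration of the idea, not Imagine's actual programming interface; the kernel names `scale_kernel` and `offset_kernel` and the pipeline are hypothetical.

```python
def scale_kernel(stream, factor):
    """Kernel: consume one input stream, produce one output stream."""
    for element in stream:
        yield element * factor


def offset_kernel(stream, delta):
    """Second kernel, consuming the stream the first kernel produces."""
    for element in stream:
        yield element + delta


def run_pipeline(input_stream):
    # The application is a sequence of kernels; the data flow between
    # them is a series of streams (producer-consumer locality).
    scaled = scale_kernel(input_stream, 2)
    shifted = offset_kernel(scaled, 1)
    return list(shifted)


print(run_pipeline([1, 2, 3]))
```

Each kernel sees only its input and output streams, which is what makes the parallelism and the producer-consumer relationship between kernels explicit.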

  6. Contents • Stream processor • Imagine Architecture • Example: FFT application • Experimental result • Conclusion

  7. Architecture of Imagine • 32 KW stream register file (SRF) • The microcontroller keeps track of the program counter as it broadcasts each VLIW instruction to all eight clusters in a SIMD manner. • Each ALU cluster: six ALUs and 304 registers in several local register files (LRFs).
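The SIMD broadcast described on this slide can be modelled roughly as follows. The sketch assumes that a stream is processed in batches of eight elements, one element per cluster, with every cluster applying the same kernel; the function names are hypothetical and the VLIW instruction stream is reduced to a single Python callable.

```python
NUM_CLUSTERS = 8  # from the slide: instructions are broadcast to eight clusters


def simd_execute(kernel, stream):
    """Apply the same kernel to every element, eight elements per batch."""
    results = []
    for base in range(0, len(stream), NUM_CLUSTERS):
        batch = stream[base:base + NUM_CLUSTERS]
        # One broadcast: every cluster runs the same kernel on its own element.
        results.extend(kernel(element) for element in batch)
    return results


def batches_needed(stream_length):
    """Number of eight-wide batches needed to cover a stream (ceiling division)."""
    return -(-stream_length // NUM_CLUSTERS)


print(simd_execute(lambda x: x * x, list(range(10))))
print(batches_needed(1024))
```

In this model a 1024-element stream needs 128 batches, compared with 1024 steps on a single cluster.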

  8. Architecture of Imagine: The SRF

  9. The SRF • Clusters <---> SRF: data that needs to be passed from kernel to kernel • SRF <---> DRAM: part of truly global data structures • All stream operands originate in the SRF and stream results are stored back to the SRF.
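The load-store role of the SRF can be illustrated with a toy model that counts DRAM traffic. This is a minimal sketch under simplifying assumptions (unbounded SRF capacity, whole streams moved at once); the class `StreamMachine` and its methods are hypothetical and do not correspond to Imagine's real instruction set.

```python
class StreamMachine:
    """Toy model: DRAM <-> SRF <-> kernels, counting DRAM traffic in words."""

    def __init__(self, dram):
        self.dram = {name: list(words) for name, words in dram.items()}
        self.srf = {}
        self.dram_words_moved = 0

    def load(self, name):
        """Copy a stream from DRAM into the SRF."""
        self.srf[name] = list(self.dram[name])
        self.dram_words_moved += len(self.srf[name])

    def store(self, name):
        """Copy a stream from the SRF back to DRAM."""
        self.dram[name] = list(self.srf[name])
        self.dram_words_moved += len(self.srf[name])

    def run_kernel(self, kernel, src, dst):
        """Kernels read operands from the SRF and write results to the SRF."""
        self.srf[dst] = [kernel(x) for x in self.srf[src]]


machine = StreamMachine({"input": [1, 2, 3, 4]})
machine.load("input")
machine.run_kernel(lambda x: x * 2, "input", "scaled")   # intermediate stream
machine.run_kernel(lambda x: x + 1, "scaled", "result")
machine.store("result")
print(machine.dram["result"], machine.dram_words_moved)
```

The intermediate stream `scaled` is passed from kernel to kernel through the SRF and never reaches DRAM, so only the input and the final result contribute to memory traffic.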

  10. Irregular stream locality converted to reuse through memory

  11. Irregular producer-consumer locality captured at the SRF

  12. Data distribution

  13. Data distribution result

  14. Architecture of Imagine: The ALU cluster

  15. The ALU cluster • 256 x 32-bit register file

  16. Contents • Stream processor • Imagine Architecture • Example: FFT application • Experimental result • Conclusion

  17. Example: mapping of a 1024-point radix-2 FFT to the stream model
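The slide's figure is not reproduced in this transcript, so the following is only one plausible way to express a radix-2 FFT in stream terms, not necessarily the mapping the authors used. It assumes an iterative decimation-in-time algorithm in which each of the log2(N) stages is one invocation of a butterfly kernel that consumes a stream and produces a stream; all function names are hypothetical.

```python
import cmath


def bit_reverse_permute(data):
    """Reorder the input so that in-place style butterflies yield natural order."""
    n = len(data)
    bits = n.bit_length() - 1
    out = [0j] * n
    for i, value in enumerate(data):
        reversed_index = int(f"{i:0{bits}b}"[::-1], 2)
        out[reversed_index] = value
    return out


def butterfly_kernel(stream, half):
    """One FFT stage: consume a stream, produce a stream of the same length."""
    out = list(stream)
    span = 2 * half
    for start in range(0, len(stream), span):
        for k in range(half):
            twiddle = cmath.exp(-2j * cmath.pi * k / span)
            a = stream[start + k]
            b = stream[start + k + half] * twiddle
            out[start + k] = a + b
            out[start + k + half] = a - b
    return out


def stream_fft(data):
    """Return (spectrum, number of kernel invocations) for a power-of-two input."""
    n = len(data)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of two and at least 2")
    stream = bit_reverse_permute(data)
    half, stages = 1, 0
    while half < n:
        stream = butterfly_kernel(stream, half)  # one kernel call per stage
        half *= 2
        stages += 1
    return stream, stages


spectrum, stages = stream_fft([1] + [0] * 1023)
print(stages)
```

For a 1024-point input the butterfly kernel runs ten times, and the stream produced by one stage is the input of the next, which is the kind of producer-consumer locality the SRF is meant to capture.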

  18. Contents • Stream processor • Imagine Architecture • Example: FFT application • Experimental result • Conclusion

  19. Experimental Result • Speedup of 8 clusters over 1 cluster

  20. Contents • Stream processor • Imagine Architecture • Example: FFT application • Experimental result • Conclusion

  21. Conclusion • Stream processors are well suited to media-processing applications • Imagine exploits the data-level parallelism (DLP) in streams by executing a kernel on eight successive stream elements in parallel (one on each cluster). • SRF • ALU clusters • Application example: 1024-point FFT

  22. Thanks! • Questions?
