Exploring Design Space for 3D Clustered Architectures

Manu Awasthi, Rajeev Balasubramonian. University of Utah.

Presentation Transcript


  1. Exploring Design Space for 3D Clustered Architectures
     Manu Awasthi, Rajeev Balasubramonian
     University of Utah

  2. 3D Technologies
     • Multiple layers of active devices
     • Vertical interconnects between layers
     [Figure: 2D chip vs. 3D chip. Labels: Device Layer 1, Device Layer 2, Vertical Interconnect, Silicon, "Very Small ~10 µm". Courtesy: K. Bernstein, IBM]

  3. Benefits of 3D
     • Reduction of global interconnect
     • Delay/power reduction
     • Bandwidth
     • Mixed-technology integration
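The interconnect benefit on slide 3 can be illustrated with a back-of-the-envelope estimate. This is only a sketch under assumptions that are not in the slides: a square die of area $A$ is split evenly across $n$ stacked layers, the longest global wire runs corner to corner in Manhattan distance, and the short vertical hops between layers are neglected. The symbols $A$, $n$, $L$ and $t$ are introduced here purely for illustration.

```latex
\begin{align*}
L_{\mathrm{2D}} &\approx 2\sqrt{A}, &
L_{\mathrm{3D}} &\approx 2\sqrt{A/n} = \frac{L_{\mathrm{2D}}}{\sqrt{n}} \\[4pt]
\text{unrepeated RC wire: } t \propto L^{2}
  &\;\Rightarrow\; \frac{t_{\mathrm{3D}}}{t_{\mathrm{2D}}} \approx \frac{1}{n}, &
\text{repeated wire: } t \propto L
  &\;\Rightarrow\; \frac{t_{\mathrm{3D}}}{t_{\mathrm{2D}}} \approx \frac{1}{\sqrt{n}}
\end{align*}
```

For two layers ($n = 2$) this idealized model gives a longest wire about 0.71 times its 2D length. Real floorplans will differ, since block placement, not die outline alone, determines which wires shrink.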

  4. Previous Proposals
     • Previously in 3D…
       • Break and stack (folding) [Puttaswamy et al.]
       • Vertical stacking of active devices
     Reduced intra-block latency
     All are active: HEAT!!!
     [Figure labels: RegFile, Break and Stack]

  5. An alternative approach?
     • Prudent stacking can:
       • Improve performance
       • Result in a better thermal profile
     [Figure labels: 2D Chip, 3D Chip, Die 0, Die 1]

  6. Wire Delays and Performance

  7. Clustered Architectures
     • Centralized front-end
       • I-Cache & D-Cache
       • LSQ, Rename, Decode
       • Branch Predictor
     • Clustered back-end
       • Issue Queue
       • Regfile, FUs
     Higher clock frequency, high ILP!!
     [Figure labels: Front-End, L1 D Cache, Cluster, Crossbar/Router]

  8. Decentralized Cache Banks
     Possibly better performance
     [Figure labels: three L1 D Cache banks]

  9. Decentralized Cache Banks: Replicated Cache Banks
     [Figure labels: three L1 D Cache banks]

  10. Decentralized Cache Banks: Word-Interleaved Cache Banks
      [Figure labels: two L1 D Cache banks, one holding Even Words and one holding Odd Words]
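The even/odd split on slide 10 amounts to choosing a bank from the low bits of the word index. The sketch below is a minimal illustration, not the authors' simulator code: the function name `bank_for_address`, the 8-byte word size, and the two-bank default are all assumptions made here, since the slides give neither a word size nor a bank count beyond the even/odd pair.

```python
# Hypothetical parameters: the slides state neither the word size nor a bank
# count beyond the even/odd pair shown on slide 10.
WORD_BYTES = 8
NUM_BANKS = 2


def bank_for_address(addr: int, word_bytes: int = WORD_BYTES,
                     num_banks: int = NUM_BANKS) -> int:
    """Return the index of the L1 D-cache bank holding the word at byte address addr."""
    if addr < 0:
        raise ValueError("address must be non-negative")
    word_index = addr // word_bytes   # which word the byte falls in
    return word_index % num_banks     # even words -> bank 0, odd words -> bank 1


if __name__ == "__main__":
    for addr in (0x00, 0x08, 0x10, 0x1C):
        print(f"address {addr:#04x} -> bank {bank_for_address(addr)}")
```

With interleaving, each word lives in exactly one bank, so an access may have to travel to a bank far from the requesting cluster. A replicated organization (slide 9) presumably avoids that for loads by keeping a full copy in every bank, at the cost of capacity and of updating every copy on a store.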

  11. Outline
      • Introduction
      • Motivation
      • 3D Architectures
      • Clustered Architectures
      • Proposals
      • Results
      • Conclusions

  12. Architecture 1: Cache-on-cluster
      [Figure labels: Die 0, Die 1, Cluster, Cache Bank, Intra-Die Interconnect, Inter-Die Interconnect]

  13. Architecture 2: Cluster-on-cluster
      [Figure labels: Die 0, Die 1, Cluster, Cache Bank, Intra-Die Interconnect, Inter-Die Interconnect]

  14. Architecture 3: Staggered
      [Figure labels: Die 0, Die 1, Cluster, Cache Bank, Intra-Die Interconnect, Inter-Die Interconnect]

  15. Outline
      • Introduction
      • Motivation
      • 3D Architectures
      • Clustered Architectures
      • Proposals
      • Results
      • Conclusions

  16. Experimental Setup
      • Framework
        • SimpleScalar, Wattch and HotSpot 3.0
        • Wire model: 8x global metal plane
      • Benchmarks
        • SPEC 2K, single-threaded
      • Processor configuration
        • 8 clusters
        • 64 kB L1 I/D caches, 2-way set-associative
        • L1 data cache word-interleaved or replicated
        • 2D centralized cache: base case

  17. Base Case Performances
      [Figure annotation: Best Case 2D Config]

  18. The 3D Effect: 3D Replicated vs 2D Centralized

  19. The 3D Effect: 3D Word-Interleaved (WI) vs 2D Centralized

  20. Comparisons
      12% improvement for best-case 3D vs best-case 2D
      [Figure labels: 2D Case, 3D Replicated, 3D WI; Best Case 2D, Best Case 3D - Rep, Best Case 3D - WI]

  21. Thermal Analysis
      • Wattch for power numbers
      • HotSpot 3.0 for thermal model (grid)
        • 500x500 grid resolution
      • Interconnect power modeling
        • Attributed to functional units
        • 8X plane wires
        • Router + crossbar modeled as a separate entity

  22. Thermal Profiles
      [Figure: peak temperature of the hottest on-chip unit, in Celsius]

  23. Outline
      • Introduction
      • Motivation
      • 3D Architectures
      • Clustered Architectures
      • Proposals
      • Results
      • Conclusions

  24. Conclusions
      • Wire delays are critical to performance
        • Some are more important than others
      • Prudent block stacking
        • Performance improvement of up to 12% over 2D
          • WI banks + Arch 3 (3D)
        • Better thermal profiles compared to folding

  25. Backup Slides

  26. 4 Cluster Arrangements
      (a) Arch-1 (cache-on-cluster)
      (b) Arch-2 (cluster-on-cluster)
      (c) Arch-3 (staggered)
      [Figure legend: Cluster, Cache bank, Intra-die horizontal wire, Inter-die vertical wire, Die 0, Die 1]
