
Non-Uniform Cache Architecture



Presentation Transcript


  1. Non-Uniform Cache Architecture. Prof. Hsien-Hsin S. Lee, School of Electrical and Computer Engineering, Georgia Tech. Guest lecture for ECE4100/6100 for Prof. Yalamanchili.

  2. Non-Uniform Cache Architecture • ASPLOS 2002 proposal from UT-Austin • Facts: large shared on-die L2; wire delay dominates on-die cache access time • L2 access latency trend: 3 cycles for 1 MB (180 nm, 1999), 11 cycles for 4 MB (90 nm, 2004), 24 cycles for 16 MB (50 nm, 2010)

  3. Multi-banked L2 cache • 2 MB @ 130 nm, bank size = 128 KB • Access latency = 11 cycles: bank access time = 3 cycles + interconnect delay = 8 cycles

  4. Multi-banked L2 cache • 16 MB @ 50 nm, bank size = 64 KB • Access latency = 47 cycles: bank access time = 3 cycles + interconnect delay = 44 cycles
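
To make the decomposition on slides 3 and 4 concrete, here is a minimal sketch (Python, purely illustrative; not from the lecture's data) that recomputes the quoted latencies as bank access time plus interconnect delay, with bank counts derived from the cache and bank sizes above.

```python
# Illustrative only: the slides quote total L2 access latency as the sum of
# the bank access time and the interconnect (wire) delay to reach a bank.

def total_latency(bank_access_cycles, interconnect_cycles):
    """Cycles to reach a bank over the interconnect and read it."""
    return bank_access_cycles + interconnect_cycles

configs = [
    # (label, cache size KB, bank size KB, bank access, interconnect delay)
    ("2 MB @ 130 nm", 2 * 1024, 128, 3, 8),
    ("16 MB @ 50 nm", 16 * 1024, 64, 3, 44),
]

for label, cache_kb, bank_kb, access, wire in configs:
    banks = cache_kb // bank_kb
    print(f"{label}: {banks} banks of {bank_kb} KB, "
          f"{access} + {wire} = {total_latency(access, wire)} cycles")
```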

  5. Static NUCA-1 [Figure: bank internals: sub-banks, tag array, predecoder, sense amplifiers, wordline driver and decoder, with per-bank address and data buses] • Use a private per-bank channel • Each bank has its own distinct access latency • Data location is statically determined by its address • Average access latency = 34.2 cycles • Wire overhead = 20.9%, which is an issue
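
The key property of Static NUCA-1 is that the bank holding a line is a fixed function of its address, so the latency of an access is known as soon as the address is. A minimal sketch of that idea, assuming 64-byte lines, a simple modulo bank-selection function, and made-up per-bank channel delays (none of these specifics come from the paper):

```python
# Sketch of static placement: a fixed address -> bank function plus a
# per-bank latency table (3-cycle bank access plus that bank's private
# channel delay). Bit slicing and channel delays are illustrative guesses.

BLOCK_OFFSET_BITS = 6           # 64-byte cache lines (assumption)
NUM_BANKS = 16                  # e.g., the 2 MB / 128 KB-bank design point

# Farther banks have longer private channels, hence higher latency.
BANK_LATENCY = [3 + wire for wire in range(1, NUM_BANKS + 1)]

def static_bank(addr):
    """Statically map a physical address to its one possible bank."""
    return (addr >> BLOCK_OFFSET_BITS) % NUM_BANKS

addr = 0x12345678
bank = static_bank(addr)
print(f"address {addr:#x} -> bank {bank}, {BANK_LATENCY[bank]} cycles")
```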

  6. Static NUCA-2 [Figure: banks connected by switches; each bank with tag array, predecoder, wordline driver and decoder, shared data bus] • Use a 2D switched network to alleviate wire area overhead • Average access latency = 24.2 cycles • Wire overhead = 5.9%
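
The switched network on slide 6 replaces the private per-bank channels with shared links and lightweight switches, so the wire component of an access becomes hop count times per-hop delay. A small sketch under assumed parameters (4x4 bank grid, 1 cycle per hop, controller at one corner); only the 3-cycle bank access time comes from the slides:

```python
# Illustrative 2D mesh latency model: latency to a bank is the 3-cycle bank
# access plus Manhattan-distance hops from the cache controller, at an
# assumed 1 cycle per switch hop.

GRID_COLS, GRID_ROWS = 4, 4     # 16 banks as a 4x4 mesh (assumption)
BANK_ACCESS = 3                 # cycles, from the slides
HOP_DELAY = 1                   # cycles per hop (assumption)

def mesh_latency(bank_id, src_col=0, src_row=0):
    """One-way latency to a bank using X-Y (dimension-ordered) routing."""
    col, row = bank_id % GRID_COLS, bank_id // GRID_COLS
    hops = abs(col - src_col) + abs(row - src_row)
    return BANK_ACCESS + hops * HOP_DELAY

latencies = [mesh_latency(b) for b in range(GRID_COLS * GRID_ROWS)]
print("per-bank latencies:", latencies)
print("average:", sum(latencies) / len(latencies))   # 6.0 for this toy grid
```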

  7. Dynamic NUCA • Data can dynamically migrate • Move frequently used cache lines closer to CPU
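
A minimal sketch of the migration idea, using gradual promotion (on a hit, swap the line with whatever sits one bank closer to the CPU), so frequently used lines drift toward the CPU over time. The data structure and the fill-into-the-farthest-bank policy are simplifications for illustration, not the paper's exact hardware:

```python
# Toy model of one bank set in a Dynamic NUCA: banks[0] is the bank closest
# to the CPU, banks[-1] the farthest. A hit swaps the line one bank closer.

class BankSet:
    def __init__(self, num_ways):
        self.banks = [None] * num_ways          # one line per bank (way)

    def access(self, block):
        for way, line in enumerate(self.banks):
            if line == block:                   # hit in bank `way`
                if way > 0:                     # promote: swap one bank closer
                    self.banks[way - 1], self.banks[way] = (
                        self.banks[way], self.banks[way - 1])
                return True
        self.banks[-1] = block                  # miss: fill the farthest bank
        return False

bs = BankSet(num_ways=4)
bs.access("A")              # miss: "A" lands in the farthest bank
for _ in range(3):
    bs.access("A")          # each hit moves "A" one bank closer to the CPU
print(bs.banks)             # ['A', None, None, None]
```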

  8. Dynamic NUCA • Simple Mapping • All 4 ways of each bank set need to be searched • Farther bank sets → longer access [Figure: bank array organized as 8 bank sets; one set shown with way 0 through way 3]
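
To illustrate the lookup under simple mapping, a sketch assuming 8 bank sets and 4 ways per set as in the figure: the address selects one bank set, and every bank (way) of that set must be probed because a migrating line may sit in any of them. The 64-byte line size and modulo indexing are assumptions:

```python
# Sketch of a simple-mapping lookup: cache[set][way] holds block addresses,
# the bank-set index is a fixed function of the address, and all ways of
# that set are searched.

NUM_BANK_SETS = 8
WAYS_PER_SET = 4
BLOCK_OFFSET_BITS = 6                      # 64-byte lines (assumption)

def bank_set_index(addr):
    return (addr >> BLOCK_OFFSET_BITS) % NUM_BANK_SETS

def lookup(cache, addr):
    s = bank_set_index(addr)
    block = addr >> BLOCK_OFFSET_BITS
    for way in range(WAYS_PER_SET):        # probe every bank in the set
        if cache[s][way] == block:
            return s, way                  # hit: which set, which bank
    return s, None                         # miss in this bank set

cache = [[None] * WAYS_PER_SET for _ in range(NUM_BANK_SETS)]
addr = 0x40000
cache[bank_set_index(addr)][2] = addr >> BLOCK_OFFSET_BITS   # place the line
print(lookup(cache, addr))                 # -> (0, 2): hit in way 2
```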

  9. Dynamic NUCA • Fair Mapping • Average access time across all bank sets is equal [Figure: bank array organized as 8 bank sets; one set shown with way 0 through way 3]
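
Fair mapping is easier to appreciate against the imbalance it removes: under the simple (column-per-bank-set) mapping of slide 8, bank sets far from the cache controller average more cycles than nearby ones. A toy calculation under assumed geometry (8x4 bank grid, controller at one end of the bottom row, 1 cycle per hop; only the 3-cycle bank access time comes from the slides):

```python
# Illustrative only: average access latency per bank set when each column
# of an assumed 8x4 bank grid is one bank set and the controller sits at
# column 0 of the bottom row. Fair mapping regroups banks into sets so
# these averages come out equal.

NUM_BANK_SETS, WAYS = 8, 4
BANK_ACCESS, HOP_DELAY = 3, 1          # 3 cycles from the slides; 1 assumed

def bank_latency(col, row):
    hops = col + row                   # Manhattan distance from (0, 0)
    return BANK_ACCESS + hops * HOP_DELAY

for s in range(NUM_BANK_SETS):         # one column per bank set
    avg = sum(bank_latency(s, w) for w in range(WAYS)) / WAYS
    print(f"bank set {s}: average latency {avg:.1f} cycles")
```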

  10. Dynamic NUCA • Shared Mapping • The closest banks are shared by the farther bank sets [Figure: bank array organized as 8 bank sets with way 0 through way 3]
