
Reducing memory penalty by a programmable prefetch engine for on-chip caches



  1. Reducing memory penalty by a programmable prefetch engine for on-chip caches. Presentation for the Computer Architecture course by Armin van der Togt

  2. Outline: • Problem statement • The prefetch architecture • Results • Conclusions • Related work

  3. Problem statement • The speed gap between memory and CPU keeps growing, hence caches and prefetching • Hardware prefetching is expensive and copes poorly with complex memory structures • Software prefetching adds a lot of execution overhead

  4. Software prefetching [figure not preserved: original code next to the generated code (inner loop only)]
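
The code on this slide did not survive, so the following is only a toy model of the trade-off it illustrates, not the slide's example. It assumes a single-issue, in-order processor, a loop that touches one new cache line per iteration, and a fixed memory latency; `run_loop` and `prefetch_distance` are hypothetical names introduced here.

```python
from typing import Optional, Tuple


def run_loop(n: int, latency: int,
             prefetch_distance: Optional[int] = None) -> Tuple[int, int]:
    """Toy model: return (instructions executed, stall cycles) for n iterations.

    Assumptions (not from the slides): one instruction per cycle, each
    iteration loads one new cache line, and every miss costs `latency` cycles.
    """
    ready_at = {}      # iteration index -> cycle at which its line arrives
    cycle = 0
    instructions = 0
    stalls = 0
    for i in range(n):
        if prefetch_distance is not None and i + prefetch_distance < n:
            # Explicit prefetch instruction inserted in the loop body:
            # it occupies an issue slot, which is the software overhead.
            ready_at.setdefault(i + prefetch_distance, cycle + latency)
            cycle += 1
            instructions += 1
        # The load itself: a line that was never prefetched is a demand miss.
        arrival = ready_at.get(i, cycle + latency)
        if arrival > cycle:
            stalls += arrival - cycle
            cycle = arrival
        cycle += 1
        instructions += 1
    return instructions, stalls


plain = run_loop(100, latency=5)
prefetched = run_loop(100, latency=5, prefetch_distance=5)
print(plain)       # (100, 500)
print(prefetched)  # (195, 25)
```

In this model the prefetching loop removes almost all stalls but nearly doubles the instruction count, which is the execution overhead that the problem statement attributes to software prefetching.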

  5. The prefetch architecture [block diagram not preserved; labels: Processor chip, PC, Hare Prefetch Engine, Run-Ahead Table, Firing, ALU, ORQ, on-chip cache, Memory system]

  6. New instruction for the prefetch engine: fill_run_ahead iaddr, <base, stride>, <count, start> • iaddr: the PC value that triggers the prefetch • <base, stride>: prefetch address and step size • <count, start>: prefetch conditions • count: a prefetch is triggered once every count times that PC = iaddr • start: prefetching may begin only after the count condition has been met start times
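
The firing rule for one `fill_run_ahead` entry can be sketched as a small simulation. This is an interpretation of the slide text, not the engine's actual logic: the class `RunAheadEntry`, its `on_pc` method, and the choice that the address advances by `stride` only when a prefetch is really issued are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunAheadEntry:
    """Hypothetical model of one entry created by fill_run_ahead."""
    iaddr: int    # PC value that triggers the entry
    base: int     # next prefetch address
    stride: int   # step added to the address after each prefetch
    count: int    # trigger once every `count` matches of PC == iaddr
    start: int    # number of triggers to skip before prefetching begins
    hits: int = 0
    triggers: int = 0

    def on_pc(self, pc: int) -> Optional[int]:
        """Return the address to prefetch for this PC, or None."""
        if pc != self.iaddr:
            return None
        self.hits += 1
        if self.hits % self.count != 0:
            return None          # not yet the count-th match
        self.triggers += 1
        if self.triggers <= self.start:
            return None          # start condition not yet satisfied
        addr = self.base
        self.base += self.stride  # assumption: advance only on a real prefetch
        return addr


def trace_prefetches(entry: RunAheadEntry, pcs: List[int]) -> List[int]:
    """Feed a PC trace to the entry and collect the issued prefetch addresses."""
    issued = []
    for pc in pcs:
        addr = entry.on_pc(pc)
        if addr is not None:
            issued.append(addr)
    return issued


entry = RunAheadEntry(iaddr=0x40, base=0x1000, stride=32, count=2, start=1)
issued = trace_prefetches(entry, [0x40] * 8)
print([hex(a) for a in issued])  # ['0x1000', '0x1020', '0x1040']
```

With `count=2` and `start=1`, the second match is the first trigger and is skipped, so prefetches are issued on the fourth, sixth and eighth matches.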

  7. Example

  8. Code with prefetch instructions (memory latency = 5 cycles)

  9. Results

  10. Conclusions • Prefetching can reduce the memory penalty by up to 80% • A programmable prefetch engine lowers the penalty compared with software prefetching • For small caches (1-2k) the programmable prefetch engine is relatively expensive • The compiler must support prefetching

  11. Related work • Fu and Patel: stride directed prefetching in scalar processors (hardware) • Mowry and Gupta: software controlled prefetching • Chiueh: A programmable hardware prefetch architecture for numerical loops (similar to this work)

  12. References • Tien-Fu Chen, Reducing memory penalty by a programmable prefetch engine for on-chip caches, Microprocessors and Microsystems, 21 (1997) 121-130
