Cache

Cache • Cache Design for Performance • Comparison • Intel Pentium 4 • Motorola PowerPC G4 • Trends • Glossary

Memory Hierarchy / Locality

Design for Performance • Size • High performance • All memory the system can cache • The design of chipset • Type • Unified/Separated Data+Instruction cache

Design for Performance • Size • Mapping • Replacement algorithms • Write policy • Block size • Number of caches

Design for Performance • Write Policy • Write back • One extra bit per line (the dirty bit) • Higher performance than write through • Write through • Multiple CPUs can monitor main memory traffic to keep their local caches up to date • Lots of traffic and slower writes (a write buffer is a big help) • Multiprocessors: data consistency across different caches
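The traffic difference between the two write policies can be sketched with a toy model. This is a minimal illustration, not a model of either processor: the direct-mapped layout, the write-allocate behaviour, the function name `memory_writes`, and the access trace are all assumptions made for the example.

```python
def memory_writes(trace, policy, num_lines=4):
    """Count writes that reach main memory in a toy direct-mapped cache.

    trace  -- list of ("r" | "w", block_number) accesses (hypothetical)
    policy -- "through" or "back"
    """
    lines = {}   # line index -> (block number, dirty flag)
    writes = 0
    for op, block in trace:
        index = block % num_lines
        cached = lines.get(index)
        hit = cached is not None and cached[0] == block
        if not hit:
            # Under write back, evicting a dirty line costs one memory write.
            if cached is not None and cached[1]:
                writes += 1
            lines[index] = (block, False)      # allocate on any miss (assumed)
        if op == "w":
            if policy == "through":
                writes += 1                    # every store goes to memory
            else:
                lines[index] = (block, True)   # only set the dirty bit
    # Flush remaining dirty lines so memory ends up current in both cases.
    writes += sum(1 for _, dirty in lines.values() if dirty)
    return writes


# Ten stores to one block, a conflicting read, then two stores elsewhere.
trace = [("w", 0)] * 10 + [("r", 4), ("w", 1), ("w", 1)]
print(memory_writes(trace, "through"))  # 12
print(memory_writes(trace, "back"))     # 2
```

Repeated stores to the same block are where write back saves the most traffic; the price is the dirty bit and the extra work of keeping other caches consistent.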

Design for Performance • Block size • In general, miss rate falls as block size grows (until blocks get so large that too few fit in the cache) • Miss penalty rises with block size • Number of caches • Cost, performance
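The block-size trade-off can be put into rough numbers. The sketch below assumes a purely sequential scan of 4-byte words, a fixed memory latency of 10 cycles, and a bus that moves 8 bytes per cycle; all three figures and all function names are made up for illustration.

```python
def miss_rate_sequential(block_bytes, word_bytes=4):
    """One miss per block when words are read strictly in order."""
    return word_bytes / block_bytes


def miss_penalty_cycles(block_bytes, latency=10, bytes_per_cycle=8):
    """Fixed latency plus the time to transfer the whole block (assumed values)."""
    return latency + block_bytes / bytes_per_cycle


def avg_access_cycles(block_bytes, hit_cycles=1):
    """Average access time = hit time + miss rate * miss penalty."""
    return hit_cycles + miss_rate_sequential(block_bytes) * miss_penalty_cycles(block_bytes)


for size in (16, 32, 64):
    print(size, miss_rate_sequential(size), miss_penalty_cycles(size), avg_access_cycles(size))
```

In this model larger blocks always win, because a sequential scan uses every byte it fetches. With less regular access patterns and a fixed cache size, very large blocks waste capacity and the miss rate climbs again, which this sketch does not capture.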

Pentium 4 L2 cache options • 256 KB L2 integrated cache, 400 MHz system bus • 512 KB L2 integrated cache, 400 or 533 MHz front side bus

P4 L1 Execution Trace Cache • Stores up to 12K decoded micro-ops in the order of program execution • Increases performance by taking the decoder out of the main execution loop • Efficient storage: instructions that are branched over are not stored

P4 IF without Trace Cache

P4 IF with Trace Cache

Previous Pentiums

P4 with L2 cache/L1 Trace

PowerPC G4 • Motorola’s MPC7457 Processor: 1.3 GHz • Dual processor G4s share cache between processors • General market availability: Q4 2003

PowerPC G4: L1 Cache • Harvard architecture: split between data and instruction cache • 32 KB 8-way set associative • Pseudo-LRU replacement algorithm • 32 byte (8 word) L1 cache block • 4 instructions per clock cycle (instruction) & 4 words per clock cycle (data) • Cache way locking allows performance-critical algorithms and code to be locked into the cache • Write back/through programmed per page or per block
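The L1 figures above fix how an address is split into tag, set index, and block offset. The sketch below works that out for a 32 KB, 8-way cache with 32-byte blocks; the function names and the sample address are assumptions chosen for illustration.

```python
def cache_geometry(size_bytes, ways, block_bytes):
    """Return (number of sets, offset bits, index bits) for a set-associative cache."""
    sets = size_bytes // (ways * block_bytes)
    offset_bits = block_bytes.bit_length() - 1   # log2 for powers of two
    index_bits = sets.bit_length() - 1
    return sets, offset_bits, index_bits


def split_address(addr, offset_bits, index_bits):
    """Split an address into (tag, set index, byte offset within the block)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


sets, offset_bits, index_bits = cache_geometry(32 * 1024, ways=8, block_bytes=32)
print(sets, offset_bits, index_bits)                        # 128 5 7
print(split_address(0x12345678, offset_bits, index_bits))   # (74565, 51, 24)
```

With 128 sets, the low 5 bits select a byte within the block, the next 7 bits select the set, and the remaining high bits form the tag that is compared against all 8 ways of that set.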

PowerPC G4: L2 Cache • On-chip, unified instruction and data cache • 512 KB, 8-way set associative. Runs at processor speed. • 32 bytes/clock cycle communication with L1 cache • Write back/through programmed per page or per block • Pseudo-LRU replacement algorithm • Two 32 byte sectors per line • Twice the amount as previous G4s
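Pseudo-LRU approximates least-recently-used replacement with far fewer state bits. One common variant is the binary-tree scheme, which needs 7 bits for an 8-way set. The slides do not say which variant the G4 uses, so the class below is an assumption-laden sketch of the general technique, not the processor's actual logic.

```python
class TreePLRU:
    """Tree-based pseudo-LRU for one cache set (ways must be a power of two)."""

    def __init__(self, ways=8):
        self.ways = ways
        self.bits = [0] * (ways - 1)   # one bit per internal tree node, heap order

    def touch(self, way):
        """Record a use of `way`: point every node on its path away from it."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if way < mid:
                self.bits[node] = 1            # next victim search goes right
                node, hi = 2 * node + 1, mid
            else:
                self.bits[node] = 0            # next victim search goes left
                node, lo = 2 * node + 2, mid

    def victim(self):
        """Follow the node bits from the root to the way to replace."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if self.bits[node] == 0:
                node, hi = 2 * node + 1, mid
            else:
                node, lo = 2 * node + 2, mid
        return lo


plru = TreePLRU(ways=8)
for way in range(8):
    plru.touch(way)
print(plru.victim())   # 0, same as true LRU here
plru.touch(0)
print(plru.victim())   # 4, whereas true LRU would pick way 1
```

The second result shows where the "pseudo" comes from: the tree only remembers which half was used more recently at each level, so it can pick a way that is old but not the oldest.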

PowerPC G4: L3 Cache • 2 MB off-chip L3 cache using DDR SDRAM • Runs at speeds up to 500 MHz • 1, 2, or 4 MB SRAM space • 1 or 2 MB cache space • 1 MB or 2 MB sectored line size • Dedicated bus to processor (4 GB/sec) • Dual processors: each has own L3 cache
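The 4 GB/sec figure follows from the bus width and transfer rate. The short check below assumes the 64-bit L3 bus mentioned on the "Benefits: G4 Plan" slide and treats 500 MHz as the effective transfer rate; the function name is hypothetical.

```python
def peak_bandwidth_gb_per_s(bus_bits, transfers_per_second):
    """Peak bandwidth in decimal GB/s: bytes per transfer times transfers per second."""
    return bus_bits / 8 * transfers_per_second / 1e9


# 64-bit bus, 500 million transfers per second (assumed effective rate)
print(peak_bandwidth_gb_per_s(64, 500e6))   # 4.0
```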

G4 enhancement plan • Pipelining: increase the number of pipelines, keep each pipeline short • Small number of instructions in flight across the stages of execution (up to 16) • Execute instructions in parallel and move them off chip in as few cycles as possible

Drawbacks: G4 plan • more execution units added to handle decoding • Fewer instructions pipelined at any given time

Benefits: G4 Plan • Shorter miss penalties (min. 4 clock cycles) • Introduce Mitigating Factor: • Added L3 cache with 64 bit bus • Increased size of execution units speeds performance “wider and shallower”

P4 Enhancement Plan • Innovative cache implementation • “Narrow, but deep” pipelining: • higher clock speeds • Deep scheduling queues to prevent pipeline “bubbles”

Drawbacks: P4 plan • Larger miss penalties (min. 19 clock cycles) • due to longer pipelining • Longer slower pipelines require complicated scheduling • No L3 cache
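The two minimum penalties quoted in these slides (4 cycles for the G4, 19 for the P4) can be compared in stall cycles per instruction and in wall-clock time. In the sketch below, the rate of penalty events per instruction and the P4 clock speed are hypothetical inputs chosen for illustration; only the penalties and the 1.3 GHz G4 clock come from the slides.

```python
def stall_cycles_per_instruction(events_per_instruction, penalty_cycles):
    """Average extra cycles each instruction pays for penalty events."""
    return events_per_instruction * penalty_cycles


def penalty_ns(penalty_cycles, clock_ghz):
    """Duration of one penalty in nanoseconds at a given clock speed."""
    return penalty_cycles / clock_ghz


EVENT_RATE = 0.01      # hypothetical: one penalty event per 100 instructions
P4_CLOCK_GHZ = 2.6     # hypothetical clock speed, not taken from the slides

print(stall_cycles_per_instruction(EVENT_RATE, 4))    # G4, about 0.04 cycles
print(stall_cycles_per_instruction(EVENT_RATE, 19))   # P4, about 0.19 cycles
print(penalty_ns(4, 1.3))                             # G4, about 3.1 ns
print(penalty_ns(19, P4_CLOCK_GHZ))                   # P4, about 7.3 ns
```

Under these assumptions a higher clock narrows the gap in nanoseconds but does not close it, which is why the deep pipeline depends on keeping such events rare.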

Benefits: P4 Plan • Trace Cache significantly improves throughput to execution cycle • Faster clock speeds overall

Sources • http://www.intel.com • http://www.apple.com/powermac/architecture.html • “MPC7457 RISC Microprocessor Hardware Specs” http://e-www.motorola.com/webapp/sps/library • “Motorola – PowerPC G4” http://www.geek.com • “Cache Latencies of the MPC7450” http://www.motorola.com • “Motorola Delivers….” http://www.us.design-reuse.com/news/news4852.html • “The Fundamentals of Cache” http://www.systemlogic.net/articles/00/10/cache/page2.php
