Lecture 20: Main Memory

Today
• SRAM vs. DRAM
• Technology Trends
• Advanced DRAM organizations
Typical Memory Technologies
• Register File: integrated into CPU; fast, many ports
• Caches (SRAM): on-chip L1, off-chip L2; more bits, fewer ports
• Main Memory (DRAM): very dense; slower to access, one port
[Diagram: register file with address (A), data-in (DI), data-out (DO), and write (W) ports]
SRAM Timing
[Timing diagram: Address presented, CE and OE asserted, then Dout becomes Data Valid]
DRAM Access Time
[Timing diagram: RAS latches the Row address, CAS latches the Column address, then Dout becomes Data Valid]
SRAM vs. DRAM

SRAM
• 4Mbit capacity
• 64-bit data interface
• 15-20ns access time (50-80MHz)
• Storage cells are self-restoring
• Lower power
• 4x cost per bit (vs. DRAM)

DRAM
• 64Mbit capacity
• 16-bit interface
• 60ns read access time (16MHz)
• Reads destroy data; must write data back
• Refresh periodically
• Higher power
• Lower cost per bit
Improving External Memory System Performance • Bandwidth vs. Latency • Bandwidth = #bits transferred per cycle • Latency = time to access DRAM • Bandwidth • Memory bus width (16, 32, 64) • Address interleaving • Independent Memory Banks • Latency • Synchronous DRAM access modes • Faster interface (Rambus)
Memory Bus Width • Depends on microprocessor implementation • 386 = 32 bit external data bus • 386SX = 16 bit external data bus • Today = 64 bit data busses common, 128 bit soon • Also interacts with external DRAM organization
Main Memory Organization (widen data busses)
• 64Mbit DRAM chip (= 8MB)
• What if we want a 64MB memory system?
• Possible organizations: 4MB x 4x4, 2MB x 8x4, 1MB x 16x4
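The chip-count arithmetic above can be checked with a short sketch; the x4/x8/x16 chip data widths used below are common DRAM organizations assumed for illustration, not taken from the slide:

```python
# A 64Mbit DRAM chip holds 64Mbit / 8 = 8MB.
CHIP_BITS = 64 * 2**20          # 64 Mbit
CHIP_BYTES = CHIP_BITS // 8     # 8 MB per chip

SYSTEM_BYTES = 64 * 2**20       # target system size: 64 MB

chips_needed = SYSTEM_BYTES // CHIP_BYTES   # 8 chips total

# With a 64-bit data bus, the number of chips accessed in parallel
# depends on each chip's data width (x4, x8, x16 are common widths):
for width in (4, 8, 16):
    chips_per_access = 64 // width
    print(f"x{width} chips: {chips_per_access} side by side on a 64-bit bus")
```

Wider chips mean fewer devices per access, which is one reason the same capacity can be organized several different ways.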
Multiple Memory Banks (interleaving)
• Distribute the memory address space across memory banks
• Route requests to banks based on low block-address bits
• Allows memory accesses to go in parallel
• Two key issues:
• How are replies matched up with requesters?
• How do we avoid bank conflicts?
[Diagram: two address splits (Word | Bank | Offset vs. Bank | Word | Offset); addresses A0-A3 fanned out to banks, data D0-D3 returned]
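A minimal sketch of low-order bank selection; the bank count and block size below are illustrative choices, not values from the slide:

```python
NUM_BANKS = 4        # illustrative bank count
BLOCK_BYTES = 64     # illustrative block size

def bank_and_index(addr):
    """Split an address: low block-address bits pick the bank,
    the remaining bits index within that bank."""
    block = addr // BLOCK_BYTES
    bank = block % NUM_BANKS        # low block bits select the bank
    index = block // NUM_BANKS      # remaining bits address within the bank
    return bank, index

# Consecutive blocks land in consecutive banks, so they can proceed in parallel:
print([bank_and_index(64 * i)[0] for i in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]
```

This round-robin placement is what lets sequential accesses overlap across banks.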
Interleaved Memory Organization
[Diagram: CPU & cache issue requests through a bank-select latch or queue to multiple memory banks]
Bank Conflicts
• Accesses may not reference banks evenly
• Consider 0,1,2,3,… vs. 0,8,16,24,…
• Often caused by column access to a matrix
• Causes problems for large block sizes too
• Solutions:
• Don't do that (make the number of matrix columns not a power of 2)
• Prime number of banks (with b banks and stride s, the number of active banks is b/gcd(s,b))
• Hash the banks
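The active-bank count for a strided access pattern can be demonstrated directly; the bank counts below are illustrative:

```python
from math import gcd

def active_banks(stride, banks):
    """With stride s on b banks, only b / gcd(s, b) distinct banks are touched."""
    return banks // gcd(stride, banks)

print(active_banks(1, 8))   # unit stride uses all 8 banks
print(active_banks(8, 8))   # stride 8 on 8 banks hits a single bank: worst case
print(active_banks(8, 7))   # a prime bank count keeps all 7 banks active
```

This is why a prime number of banks helps: gcd(s, b) is 1 for every stride that is not a multiple of b, so strided column accesses still spread across all banks.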
Synchronous DRAM • Interface signals are clocked • Clock provided by microprocessor • Why? • Easier to design timed protocols • “Data available 8 cycles after CAS” • Add intelligence to EMI (external memory interface) on CPU
Burst Mode
• Provide one address and a sequence of data comes out
• Perfect for cache line reads and writes
• Burst size is programmable
[Timing diagram: RAS latches Row, CAS latches Column, then Dout bursts D0 D1 D2 D3]
Page Mode Access
• One RAS (gets a whole row)
• Multiple CAS (different parts of the row)
• Exploits spatial locality (the open row acts like a small DRAM cache)
[Timing diagram: one RAS for the Row, repeated CAS for columns B and C; Dout delivers B0, B1, … and C0, C1, …]
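The cache-like behavior of the open row can be captured in a toy latency model; the latency numbers here are made up for illustration and are not from the slide:

```python
T_RAS = 30  # ns to open a row (illustrative)
T_CAS = 30  # ns per column access (illustrative)

class Bank:
    """Tracks the currently open row: CAS-only on a hit, RAS then CAS on a miss."""
    def __init__(self):
        self.open_row = None

    def access(self, row):
        if row == self.open_row:
            return T_CAS              # row already open: just a CAS
        self.open_row = row           # open the new row
        return T_RAS + T_CAS          # row miss: RAS then CAS

bank = Bank()
# Four columns from the same row: one RAS up front, then cheap CAS accesses.
latencies = [bank.access(row=5) for _ in range(4)]
print(latencies)  # [60, 30, 30, 30]
```

Spatially local accesses pay the RAS cost once per row, which is exactly the locality page mode exploits.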
Pipelined Mode Access
• Interleave accesses to multiple internal banks
• Lower latency for back-to-back accesses to different banks
[Timing diagram: RowB/ColB and RowC/ColC overlapped; Dout interleaves B0, C0, B1, C1, …]
New DRAM Interfaces
• Rambus
• 800MHz interface (18 bits gets you 14.4Gb/sec)
• Compare this to 100MHz, 16-bit SDRAM = 1.6Gb/sec
• More complicated electrical interface on DRAM and CPU
• Restrictions on board-level design
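Both bandwidth figures on this slide follow from clock rate times interface width, assuming one transfer per clock:

```python
def bandwidth_gbps(clock_mhz, width_bits):
    """Peak bandwidth in Gbit/s for one transfer per clock cycle."""
    return clock_mhz * 1e6 * width_bits / 1e9

print(bandwidth_gbps(800, 18))   # Rambus: 14.4 Gb/s
print(bandwidth_gbps(100, 16))   # 100MHz, 16-bit SDRAM: 1.6 Gb/s
```

The 9x gap comes mostly from clock rate (8x) plus the slightly wider 18-bit interface.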
Next Time • Virtual Memory • Allow multiple users with protection • Enable relocation of data • Extend memory hierarchy automatically past DRAM