Lecture 20: Main Memory

Today
• SRAM vs. DRAM
• Technology Trends
• Advanced DRAM organizations
Typical Memory Technologies
• Register File: integrated into CPU; fast, many ports
• Caches (SRAM): on-chip L1, off-chip L2; more bits, fewer ports
• Main Memory (DRAM): very dense; slower to access, one port
[Diagram: register file with address (A), data-in (DI), data-out (DO), and write (W) ports]
SRAM Timing
[Timing diagram: Address presented, CE and OE asserted, then Dout becomes Data Valid]
DRAM Access Time
[Timing diagram: RAS latches the Row address, CAS latches the Column address, then Dout becomes Data Valid]
SRAM vs. DRAM

SRAM
• 4Mbit capacity
• 64-bit data interface
• 15-20ns access time (50-80MHz)
• Storage cells are self-restoring
• Lower power
• 4x cost per bit (vs. DRAM)

DRAM
• 64Mbit capacity
• 16-bit interface
• 60ns read access time (16MHz)
• Reads destroy data; must write data back
• Refresh periodically
• Higher power
• Lower cost per bit
Improving External Memory System Performance • Bandwidth vs. Latency • Bandwidth = #bits transferred per cycle • Latency = time to access DRAM • Bandwidth • Memory bus width (16, 32, 64) • Address interleaving • Independent Memory Banks • Latency • Synchronous DRAM access modes • Faster interface (Rambus)
Memory Bus Width • Depends on microprocessor implementation • 386 = 32 bit external data bus • 386SX = 16 bit external data bus • Today = 64 bit data busses common, 128 bit soon • Also interacts with external DRAM organization
Main Memory Organization (widen data busses)
• 64Mbit DRAM chip (= 8MB)
• What if we want a 64MB memory system?
• Possible organizations: 4MB x 4x4, 2MB x 8x4, 1MB x 16x4
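The chip-count arithmetic above can be checked with a short sketch; the x4/x8/x16 chip data widths used below are common DRAM organizations assumed for illustration, not taken from the slide:

```python
# A 64Mbit DRAM chip holds 64Mbit / 8 = 8MB.
CHIP_BITS = 64 * 2**20          # 64 Mbit
CHIP_BYTES = CHIP_BITS // 8     # 8 MB per chip

SYSTEM_BYTES = 64 * 2**20       # target system size: 64 MB

chips_needed = SYSTEM_BYTES // CHIP_BYTES   # 8 chips total

# With a 64-bit data bus, the number of chips accessed in parallel
# depends on each chip's data width (x4, x8, x16 are common widths):
for width in (4, 8, 16):
    chips_per_access = 64 // width
    print(f"x{width} chips: {chips_per_access} side by side on a 64-bit bus")
```

Wider chips mean fewer devices per access, which is one reason the same capacity can be organized several different ways.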
Multiple Memory Banks (interleaving)
• Distribute the memory address space across memory banks
• Route requests to banks based on low block-address bits
• Allows memory accesses to go in parallel
• Two key issues:
• How are replies matched up with requesters?
• How do we avoid bank conflicts?
[Diagram: two address splits (Word | Bank | Offset vs. Bank | Word | Offset); addresses A0-A3 fanned out to banks, data D0-D3 returned]
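A minimal sketch of low-order bank selection; the bank count and block size below are illustrative choices, not values from the slide:

```python
NUM_BANKS = 4        # illustrative bank count
BLOCK_BYTES = 64     # illustrative block size

def bank_and_index(addr):
    """Split an address: low block-address bits pick the bank,
    the remaining bits index within that bank."""
    block = addr // BLOCK_BYTES
    bank = block % NUM_BANKS        # low block bits select the bank
    index = block // NUM_BANKS      # remaining bits address within the bank
    return bank, index

# Consecutive blocks land in consecutive banks, so they can proceed in parallel:
print([bank_and_index(64 * i)[0] for i in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]
```

This round-robin placement is what lets sequential accesses overlap across banks.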
Interleaved Memory Organization
[Diagram: CPU & cache issue requests through a bank-select latch or queue to multiple memory banks]
Bank Conflicts
• Accesses may not reference banks evenly
• Consider 0,1,2,3,… vs. 0,8,16,24,…
• Often caused by column access to a matrix
• Causes problems for large block sizes too
• Solutions:
• Don't do that (make the number of matrix columns not a power of 2)
• Prime number of banks (with b banks and stride s, the number of active banks is b/gcd(s,b))
• Hash the banks
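The active-bank count for a strided access pattern can be demonstrated directly; the bank counts below are illustrative:

```python
from math import gcd

def active_banks(stride, banks):
    """With stride s on b banks, only b / gcd(s, b) distinct banks are touched."""
    return banks // gcd(stride, banks)

print(active_banks(1, 8))   # unit stride uses all 8 banks
print(active_banks(8, 8))   # stride 8 on 8 banks hits a single bank: worst case
print(active_banks(8, 7))   # a prime bank count keeps all 7 banks active
```

This is why a prime number of banks helps: gcd(s, b) is 1 for every stride that is not a multiple of b, so strided column accesses still spread across all banks.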
Synchronous DRAM • Interface signals are clocked • Clock provided by microprocessor • Why? • Easier to design timed protocols • “Data available 8 cycles after CAS” • Add intelligence to EMI (external memory interface) on CPU
Burst Mode
• Provide one address and a sequence of data comes out
• Perfect for cache line reads and writes
• Burst size is programmable
[Timing diagram: RAS latches Row, CAS latches Column, then Dout bursts D0 D1 D2 D3]
Page Mode Access
• One RAS (gets a whole row)
• Multiple CAS (different parts of the row)
• Exploits spatial locality (the open row acts like a small DRAM cache)
[Timing diagram: one RAS for the Row, repeated CAS for columns B and C; Dout delivers B0, B1, … and C0, C1, …]
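The cache-like behavior of the open row can be captured in a toy latency model; the latency numbers here are made up for illustration and are not from the slide:

```python
T_RAS = 30  # ns to open a row (illustrative)
T_CAS = 30  # ns per column access (illustrative)

class Bank:
    """Tracks the currently open row: CAS-only on a hit, RAS then CAS on a miss."""
    def __init__(self):
        self.open_row = None

    def access(self, row):
        if row == self.open_row:
            return T_CAS              # row already open: just a CAS
        self.open_row = row           # open the new row
        return T_RAS + T_CAS          # row miss: RAS then CAS

bank = Bank()
# Four columns from the same row: one RAS up front, then cheap CAS accesses.
latencies = [bank.access(row=5) for _ in range(4)]
print(latencies)  # [60, 30, 30, 30]
```

Spatially local accesses pay the RAS cost once per row, which is exactly the locality page mode exploits.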
Pipelined Mode Access
• Interleave accesses to multiple internal banks
• Lower latency for back-to-back accesses to different banks
[Timing diagram: RowB/ColB and RowC/ColC overlapped; Dout interleaves B0, C0, B1, C1, …]
New DRAM Interfaces
• Rambus
• 800MHz interface (18 bits gets you 14.4Gb/sec)
• Compare this to 100MHz, 16-bit SDRAM = 1.6Gb/sec
• More complicated electrical interface on DRAM and CPU
• Restrictions on board-level design
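Both bandwidth figures on this slide follow from clock rate times interface width, assuming one transfer per clock:

```python
def bandwidth_gbps(clock_mhz, width_bits):
    """Peak bandwidth in Gbit/s for one transfer per clock cycle."""
    return clock_mhz * 1e6 * width_bits / 1e9

print(bandwidth_gbps(800, 18))   # Rambus: 14.4 Gb/s
print(bandwidth_gbps(100, 16))   # 100MHz, 16-bit SDRAM: 1.6 Gb/s
```

The 9x gap comes mostly from clock rate (8x) plus the slightly wider 18-bit interface.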
Next Time • Virtual Memory • Allow multiple users with protection • Enable relocation of data • Extend memory hierarchy automatically past DRAM