Signature Buffer: Bridging Performance Gap between Registers and Caches

Signature Buffer: Bridging Performance Gap between Registers and Caches Lu Peng, Jih-Kwon Peir, Konrad Lai

Introduction • Two types of storage • Registers • Fast and small • Supply data for operations • Memory • Large and slow • Cache for recently used data • Most RISC processors operate only on data from registers • Data communication path • Producer -> store -> load -> consumer

Introduction • Future processors with 35nm technology • 10 GHz clock • 64 KB L1 cache • 3-7 cycles L1 cache access time • IPC degrades by 3.5% per additional cycle on L1 cache access time
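The 3.5% figure above can be turned into a quick back-of-the-envelope estimate. The sketch below assumes the loss accumulates linearly with each additional L1 cycle; the slide does not say how the per-cycle losses combine, so the linear model and the function name `relative_ipc` are illustrative assumptions only.

```python
def relative_ipc(extra_l1_cycles, loss_per_cycle=0.035):
    """Estimate IPC relative to the baseline L1 latency.

    Assumption: each additional L1 access cycle removes a fixed 3.5% of
    baseline IPC (linear model; not stated on the slide).
    """
    return 1.0 - loss_per_cycle * extra_l1_cycles


# Going from a 3-cycle to a 7-cycle L1 adds 4 cycles.
estimate = relative_ipc(7 - 3)
print(f"relative IPC at 7-cycle L1: {estimate:.2f}")
```

Under this assumed model, moving from the low end to the high end of the 3-7 cycle range costs roughly 14% of IPC, which is the gap the signature buffer targets.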

Signature Buffer • Zero-cycle load • “The load and its dependent instructions can be fetched, dispatched and executed at the same time” • Avoid address calculation • Each load and store uses a signature for accessing the storage • The signature buffer can be accessed in early pipeline stages • A signature consists of: • Color of the base register • Displacement value
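A minimal software sketch of signature-based access may help make the idea concrete. It assumes that a "color" is a tag that is replaced whenever the base register is redefined, and that the initial color of each register equals its register number; both are assumptions for illustration, as are the names `Signature`, `SignatureBufferSketch`, and `redefine`. This is a toy model, not the hardware design from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    color: int         # color of the base register
    displacement: int  # displacement encoded in the instruction


class SignatureBufferSketch:
    def __init__(self, num_regs=32):
        # Assumption: each register starts with its own number as its color.
        self.color_of = list(range(num_regs))
        self.next_color = num_regs
        self.data = {}

    def redefine(self, reg):
        # Assumption: writing a new value to a base register gives it a
        # fresh color, so signatures built from the old value stop matching.
        self.color_of[reg] = self.next_color
        self.next_color += 1

    def signature(self, base_reg, displacement):
        # No address calculation: only the color and the displacement bits.
        return Signature(self.color_of[base_reg], displacement)

    def store(self, base_reg, displacement, value):
        self.data[self.signature(base_reg, displacement)] = value

    def load(self, base_reg, displacement):
        # Returns None on a miss (the load would then use the normal path).
        return self.data.get(self.signature(base_reg, displacement))


sb = SignatureBufferSketch()
sb.store(1, 100, 42)           # store to 100(r1)
hit = sb.load(1, 100)          # load from 100(r1): same signature
sb.redefine(1)                 # r1 is overwritten
miss = sb.load(1, 100)         # new color, so the old entry no longer matches
```

Because the lookup key is available as soon as the instruction is decoded, such a structure can be probed before the effective address is known.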

Outline • Motivation • Implementation • Performance evaluation

Motivation –Memory Reference Correlations • Signature correlations • Store-load and load-load can be correlated directly by the signature • Signature reference locality • Nearby memory references often differ by small displacement value with the same base register
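Signature reference locality can be illustrated by grouping signatures into lines. The sketch below assumes 4-byte words and 8 words per line; neither value is given on the slide, and `sb_line_key` is a hypothetical helper name.

```python
WORD_BYTES = 4  # assumed word size
LINE_WORDS = 8  # assumed number of words per line


def sb_line_key(color, displacement):
    """Split a signature into (line key, word offset within the line)."""
    word = displacement // WORD_BYTES
    return (color, word // LINE_WORDS), word % LINE_WORDS


# References 0(rX), 4(rX), 8(rX) with the same base color share one line.
refs = [sb_line_key(5, d) for d in (0, 4, 8)]
same_line = len({line for line, _ in refs}) == 1
```

Under these assumptions, a miss on one displacement brings in the line that also serves nearby displacements from the same base register.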

Example 1 • Signature correlations • Signature reference locality • Figure: Source and assembly code of function copy_disjunct from Parser

Example 2 • Figure: Source and assembly code of function bsW from Bzip

Signature Buffer

Signature Buffer Initial State 0 1 2 3 32

Signature Buffer 0 1 2 -> 32 1 100 3 32 -> 33 1 -- 100

Data Alignment

Data Alignment SB Directory SB Data Array L1 Data Array L1 Tag Array Requests (Signature): A-001 -> A-101 -> B-010 -> X-000(Real Address) : C-100 D-000 D-101 D-000

Data Alignment SB Directory SB Data Array L1 Data Array L1 Tag Array SB MISS! Requests (Signature): A-001 -> A-101 -> B-010 -> X-000(Real Address) : C-100 D-000 D-101 D-000

Data Alignment SB Directory SB Data Array L1 Data Array L1 Tag Array SB MISS! Invalidate high A, low B Requests (Signature): A-001 -> A-101 -> B-010 -> X-000(Real Address) : C-100 D-000 D-101 D-000

Microarchitecture • Bypass I • SB hit or an early store-load forwarding • Bypass II • Normal store-load forwarding

Microarchitecture

Performance Evaluation

Performance Evaluation –IPC • SB – nospec: 13% speedup • SB – perfect: 14% speedup

Performance Evaluation –Load Distribution • Normal S-L forwarding and L1 accesses are reduced to 30%; 70% of loads benefit from the SB • With a perfect memory dependence predictor, the SB obtains 23% zero-cycle loads

Performance Evaluation –SB Hit Ratio Average SB hit rate is about 51%

Performance Evaluation –Comparison with L0 Cache • The performance benefit of the SB goes up with L1 latency and is always above that of having an L0 cache

Performance Evaluation –Comparison with L0 Cache • Larger L0 => higher hit rate • SB is less sensitive to size

Advantages • Non-speculative • Data obtained from the SB without intervening stores is always correct • All loads can access the data from the SB without any restriction on the type of the loads or base registers. • Loads through the SB can bypass the address generation and cache access completely. • Store/Load correlation is established from the instruction encoding bits to simplify hardware requirement. • SB uses line-based granularity to capture spatial locality.

Questions?

Loads – SB Specific • Early S-L forwarding • A load has the same signature as an earlier store in the LSQ, with no intervening store in between (zero-cycle load & SB hit) • Early SB access • SB is accessed after a load is fetched and decoded (zero-cycle load & SB hit) • Delayed SB access • SB is accessed after memory dependence resolution because of intervening stores (SB hit) • Non-Signature Forwarding • Consecutive SB misses to the same SB line get forwarded data from previous misses (SB miss)
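The four load categories above can be read as a decision procedure. The sketch below is one possible ordering of the checks, written as a plain function; the priority order, the single `intervening_store` flag, the fallback category, and the name `classify_load` are all assumptions for illustration and are not taken from the slides.

```python
def classify_load(matching_store_in_lsq, intervening_store,
                  sb_hit, earlier_miss_same_line):
    """Return the SB-specific category for one load (assumed priority order)."""
    if matching_store_in_lsq and not intervening_store:
        # Same signature as an earlier store, nothing in between.
        return "early S-L forwarding"
    if sb_hit and not intervening_store:
        # SB probed right after fetch and decode.
        return "early SB access"
    if sb_hit:
        # SB probed only after memory dependences are resolved.
        return "delayed SB access"
    if earlier_miss_same_line:
        # SB miss, but data arrives with an earlier miss to the same line.
        return "non-signature forwarding"
    # Assumed fallback: the conventional path.
    return "normal S-L forwarding or L1 access"


category = classify_load(matching_store_in_lsq=False, intervening_store=True,
                         sb_hit=True, earlier_miss_same_line=False)
```

In this reading, only the first two categories correspond to zero-cycle loads, matching the annotations on the slide.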
