Analytic Evaluation of Shared-Memory Systems with ILP Processors

This presentation summarizes an analytic evaluation of shared-memory systems with ILP processors. The model describes the system with a set of equations, takes a small set of estimated architecture and application parameters as input, and produces system throughput as output. It is validated against simulation and is used to study application behavior, the impact of architectural parameters, and the impact of programmable coherence controllers.


Presentation Transcript


  1. Analytic Evaluation of Shared-Memory Systems with ILP Processors • D.J. Sorin, V.S. Pai, S.V. Adve, M.K. Vernon, D.A. Wood • Presented by Bogdan Romanescu

  2. Introduction • Motivation: Simulating shared-memory systems with ILP processors is painfully slow • Hypothesis: The system can be described by a set of equations that have simple parameters and capture system details • Method: View the memory system as a network of queues and delay centers • Metric: Processor throughput

  3. System under test • Cache-coherent shared-memory multiprocessor • Mesh interconnection • Processor: multiple issue, out-of-order scheduling, non-blocking loads, speculative execution • L1 and L2 cache state tracking • Miss status holding registers (MSHRs) • Interleaved memory and directory

  4. Model parameters • Architecture parameters: number of nodes; number of MSHRs; network interface (NI), bus, and switch occupancies • Application parameters • ILP parameters: τ, CV, f_synch-write, f_M • Directory coherence parameters: P_read, P_write, P_upgrade, P_wb, P_L|x, P_M|x,y, P_3hop|x&not-memory, H, X

  5. Estimating parameters • Non-ILP-dependent parameters: fast simulators for multiprocessors with single-issue, in-order processors • ILP-dependent parameters: FastILP simulator • Timestamping • Division into “eras” • Trace-driven simulation

  6. Analytical model • Output measure: system throughput (IPC) as a function of the input parameters and the system architecture • Iterates between two submodels • Synchronous blocking (SB) model: processor stalled on a load or read-modify-write • MSHR blocking (MB) model: processor stalled because the MSHRs are full • Mean value analysis (MVA) equations used to compute delays • Synchronization (locks and barriers) accounted for separately
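
For readers unfamiliar with MVA, the following is a minimal sketch of the textbook exact MVA recursion for a closed, single-class network of queueing and delay centers. It is not the customized set of equations used in the paper; the function name `mva`, the two-center example, and all numeric values are assumptions made purely for illustration.

```python
def mva(service_times, is_delay, visits, n_customers):
    """Textbook exact MVA for a closed, single-class queueing network.

    service_times[i]: mean service time per visit at center i
    is_delay[i]:      True for a delay center (no queueing), False for a FIFO queue
    visits[i]:        mean number of visits to center i per round trip
    Returns (throughput, residence_times, mean_queue_lengths).
    """
    k = len(service_times)
    queue = [0.0] * k          # mean queue lengths with 0 customers
    residence = [0.0] * k
    throughput = 0.0
    for n in range(1, n_customers + 1):
        for i in range(k):
            demand = visits[i] * service_times[i]
            if is_delay[i]:
                # Delay center: a customer never waits behind others.
                residence[i] = demand
            else:
                # Arrival theorem: an arriving customer sees the queue
                # length of the same network with one customer fewer.
                residence[i] = demand * (1.0 + queue[i])
        throughput = n / sum(residence)                 # Little's law, whole network
        queue = [throughput * r for r in residence]     # Little's law, per center
    return throughput, residence, queue


# Hypothetical example: a 10-cycle delay center and a 5-cycle queueing center.
x1, _, _ = mva([10.0, 5.0], [True, False], [1.0, 1.0], 1)
x2, r2, q2 = mva([10.0, 5.0], [True, False], [1.0, 1.0], 2)
```

With more customers in the network, throughput rises toward the bound set by the slowest queueing center (here 1/5 per cycle), while the residence time at that center grows.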

  7. Equations • Average round-trip time in the SB model • Total average residence time at the NI out queue • Total mean delay for each type of synchronous transaction at the local NI • Utilization of the local NI queue • Average waiting time at the local NI queue due to traffic from remote nodes

  8. Model validation • Better approximation for the residual life • Accounting for a significant f_synch-write
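
As background on the residual life mentioned above: the slides do not give the approximation used in the paper, so the following is only the standard renewal-theory result. The symbols $S$ (service time at a queue), $\bar{S}$ (its mean), and $CV_S$ (its coefficient of variation) are notation assumed here for illustration.

```latex
% Mean residual life of the service in progress, as seen by a request
% that arrives at a random instant while the server is busy:
\[
  E[R] \;=\; \frac{E[S^2]}{2\,E[S]} \;=\; \frac{\bar{S}}{2}\left(1 + CV_S^2\right)
\]
% Special cases:
%   deterministic service (CV_S = 0):  E[R] = \bar{S}/2
%   exponential service   (CV_S = 1):  E[R] = \bar{S}
```

The two special cases show why the choice matters: assuming exponential service when occupancies are nearly constant doubles the estimated wait behind the request in service.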

  9. Applications • Insights into application behavior • f_M: ability to exploit ILP to overlap read memory requests • CV: degree of burstiness • Evaluation of the impact of the number of MSHRs • Benefits of coupled/decoupled memory and directories • Analysis of the impact of programmable coherence controllers
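
To make the burstiness reading of CV concrete, here is a minimal sketch that computes the mean and coefficient of variation of the gaps between request timestamps. It assumes CV refers to the time between memory requests, and it ignores any exclusion of stalled time; the function name `interrequest_stats` and the sample traces are hypothetical.

```python
from statistics import mean, pstdev


def interrequest_stats(timestamps):
    """Return (mean gap, coefficient of variation) of the gaps between requests.

    timestamps: sorted issue times of memory requests, in cycles (at least two).
    """
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    avg = mean(gaps)
    # CV = standard deviation / mean; 0 means perfectly regular requests,
    # and larger values mean the requests arrive in bursts.
    return avg, pstdev(gaps) / avg


# Hypothetical traces: evenly spaced requests vs. two bursts of three requests.
regular_mean, regular_cv = interrequest_stats([0, 10, 20, 30])
bursty_mean, bursty_cv = interrequest_stats([0, 1, 2, 30, 31, 32])
```

The regular trace has a CV of 0, while the bursty trace has a CV well above 1 even though its mean gap is smaller.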

  10. Questions • Is “mean time” a representative measure? How misleading can it be? • Residual life: is it accurate enough, even with interpolation? • Why do the errors go up even after applying the two accuracy-increasing observations?
