
The GLIMPSES Toolkit Rapid code prototyping for SPEs

The GLIMPSES Toolkit: Rapid code prototyping for SPEs. Jaswanth Sreeram, Santosh Pande. Overview of Toolkit. GLIMPSES Toolkit: GLobal Interprocedural Memory and ParalleliSm Estimator for SPUs. Profile instrumentation support. Profile parsers and interpreters.


Presentation Transcript


  1. The GLIMPSES Toolkit: Rapid code prototyping for SPEs • Jaswanth Sreeram, Santosh Pande

  2. Overview of Toolkit • GLIMPSES Toolkit : GLobal Interprocedural Memory and ParalleliSm Estimator for SPUs • Profile instrumentation support • Profile parsers and interpreters. • Analyzers for memory allocation & access behavior • Visualization Engine

  3. GLIMPSES toolkit • One of two tools available in the public domain (the second one is asmvis) • Rapid Prototyping, Legacy Code Migration and Performance Tuning on Cell SPEs • Released on SourceForge in mid-July: http://glimpses.sourceforge.net • OSI-certified open source license(s) • Has received interest for adoption in academia and industry • Samsung Korea, Codecs and Media Computing Group • Sony Computer Entertainment America (SCEA)

  4. GLIMPSES : Motivation • Prototyping large codebases for porting to SPEs is challenging • Find a partition (set of functions) • Find a set of upward exposed references • DMA transfer them and lay them out – alignment • After execution store the results back • Make sure memory requirements do not exceed capacity

  5. Motivation – contd. • Challenges due to architectural attributes • Limited local store • High branch penalty • Suited for vectorizable code rather than scalar code • SPE/PPE interactions • Provide programmer with tools to • Understand program behavior (esp. memory usage) • Quickly construct candidate partitions for SPE • Evaluate/quantify partitions’ suitability for SPEs

  6. GLIMPSES : Details • Memory Estimation tools enable programmer to: • Estimate static & dynamic memory usage • Code, Stack, Heap • Understand program behavior • Detect program objects affecting dynamic memory behavior • Show the correlation between these program objects and memory usage. • Rank program segments • Criteria: Memory requirements, vectorizability, branching, etc. • Visualize results interactively.

  7. Features overview • Dynamic Call Graph visualization – ability to select a call tree • Memory Requirements • Dynamic • Analytical – ‘what if’ scenario calculator for memory capacity • Memory Access Patterns • Locality (spatial, temporal, neighbor affinity) • Ranking • Criteria based estimates • Alias and safe pre-fetching information • Multiple alias analyses available

  8. Overview (flow diagram, box labels only) • C/C++ program • LLVM compiler flow • Bytecode • Analysis & Instrumentation Passes • Instrumented Bytecode • Runtime • Link • Execute • Analytical Memory Estimator • Partition Estimator • Dyn. Memory Estimator • GraphML Trace • Visualization Engine • Test Inputs • Profile Trace

  9. Visualization • Graph Visualization Area • Results Display Panel

  10. Visualization …contd

  11. Visualization …contd • Zoom view • Shows dynamic call chains for a program run (in this case the program is mpeg2-decode)

  12. Visualization …contd • Function Characteristics • Alias Analysis Algorithm used • Type of Aliases displayed (“Must Alias”, “May Alias”, “No Alias”) • Aliasing information for pairs of variables/memory regions

  13. Analytical Memory Estimation • Correlate dynamic memory usage with program objects • Dynamic memory usage depends on inputs, etc. • Compiler Analysis • From each malloc, do a backward traversal to find instructions that influence the arguments to malloc. • Construct an arithmetic expression for amount of memory allocated, in terms of inputs or other program objects. • Handles control flow constructs (if-then-else, loops etc)

  14. Memory Behavior: Analytical Estimation • Derived expressions: __Malloc_size__1 = Picture_Width*Picture_Height; __Malloc_size__2 = Picture_Width*Picture_Height; __Malloc_size__3 = Picture_Width*Picture_Height; __Malloc_size__4 = Picture_Width*Picture_Height; __Malloc_size__5 = Chroma_Width*Chroma_Height; __Malloc_size__6 = Chroma_Width*Chroma_Height; __Malloc_size__7 = Chroma_Width*Chroma_Height; __Malloc_size__8 = Chroma_Width*Chroma_Height • Source fragment: if (cc==0) size = Picture_Width * Picture_Height; else size = Chroma_Width * Chroma_Height; … for (…) { if (…) malloc(size); if (…) malloc(size); }
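The eight expressions on slide 14 lend themselves to a small "what if" calculation of the kind slide 7 mentions. The following is only a sketch, not GLIMPSES code: the function names, the `bytes_per_sample` parameter, and the example frame sizes are assumptions made for illustration. The 256 KB figure is the size of a Cell SPE local store.

```python
# Hypothetical "what if" calculator based on the slide's eight malloc expressions.
LOCAL_STORE_BYTES = 256 * 1024  # local store of one Cell SPE


def estimated_heap_bytes(picture_width, picture_height,
                         chroma_width, chroma_height,
                         bytes_per_sample=1):
    """Sum the eight allocation sizes listed on the slide.

    __Malloc_size__1..4 are Picture_Width*Picture_Height and
    __Malloc_size__5..8 are Chroma_Width*Chroma_Height.
    bytes_per_sample is an assumption; the slide gives no element size.
    """
    luma = picture_width * picture_height
    chroma = chroma_width * chroma_height
    return (4 * luma + 4 * chroma) * bytes_per_sample


def fits_in_local_store(heap_bytes, code_bytes=0, stack_bytes=0):
    """Check whether code + stack + heap stay within the SPE local store."""
    return heap_bytes + code_bytes + stack_bytes <= LOCAL_STORE_BYTES


if __name__ == "__main__":
    # Illustrative inputs only: a 352x288 frame with 176x144 chroma planes.
    heap = estimated_heap_bytes(352, 288, 176, 144)
    print(heap, fits_in_local_store(heap))
```

Because the estimate is expressed in terms of program inputs, the same check can be repeated for other frame sizes without re-running the profiled program.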

  15. Memory References • Memory reference metrics • Temporal (frequency) • Spatial • Neighbor affinity • Metrics measured per memory line • Per function metrics or per-partition metrics • Visually represented via a color map • Pale Violet (low) -> Bright Red (high)

  16. Memory Ref. Frequency (mpeg2decode) Memory Reference map (per partition) with 1024B memory lines
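A per-line reference frequency map like the one on slide 16 can be approximated by bucketing trace addresses into fixed-size memory lines. This is a minimal sketch under assumptions: the trace is taken to be a plain list of byte addresses, and `line_frequency` is a hypothetical name rather than a GLIMPSES function.

```python
from collections import Counter


def line_frequency(addresses, line_size=1024):
    """Count references per memory line.

    addresses: iterable of byte addresses from a memory access trace
               (the trace format is an assumption).
    line_size: bytes per memory line; 1024 matches the slide's map.
    """
    return Counter(addr // line_size for addr in addresses)


if __name__ == "__main__":
    trace = [0, 8, 1023, 1024, 4096, 4100]  # made-up addresses
    for line, count in sorted(line_frequency(trace).items()):
        print(f"line {line}: {count} references")
```

The counts could then be mapped onto a color scale such as the pale violet to bright red range described on slide 15.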

  17. Mpeg2decode: Load recurrence

  18. Neighbor Affinity • Metric to describe how well memory layout is suited to caching • Consider a slice S of length w of the whole memory access trace and two loads L1, L2 ∈ S • If |addr(L1) – addr(L2)| < line size, then L1 and L2 exhibit neighbor affinity for slice size w
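The definition on slide 18 can be turned into a simple measurement over a load trace. The sketch below is one possible reading, not the toolkit's implementation: it assumes the trace is a list of load addresses, cuts it into consecutive non-overlapping slices of length w, and reports the fraction of load pairs within each slice that satisfy the condition. The slicing scheme, the aggregate fraction, and the function names are assumptions.

```python
from itertools import combinations


def affine_pairs(slice_addrs, line_size):
    """Pairs of loads in one slice whose addresses differ by less than line_size."""
    return [(a, b) for a, b in combinations(slice_addrs, 2)
            if abs(a - b) < line_size]


def neighbor_affinity(trace, w, line_size):
    """Fraction of in-slice load pairs that exhibit neighbor affinity.

    The trace is cut into consecutive slices of length w (an assumption;
    the slide does not say how slices are chosen).
    """
    total_pairs = 0
    hits = 0
    for start in range(0, len(trace), w):
        s = trace[start:start + w]
        total_pairs += len(s) * (len(s) - 1) // 2
        hits += len(affine_pairs(s, line_size))
    return hits / total_pairs if total_pairs else 0.0


if __name__ == "__main__":
    loads = [0, 64, 4096, 4100]  # made-up load addresses
    print(neighbor_affinity(loads, w=2, line_size=128))
    print(neighbor_affinity(loads, w=4, line_size=128))
```

A larger slice size admits more distant pairs in time, so the same trace can score differently for different values of w.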

  19. Load Neighbor Affinity

  20. Alias Analysis for libode • Basic AA (least precise, fastest) • Aggressive local analysis • Non context sensitive • Non-flow sensitive • Total number of queries 119520497 • “No Alias” 35924925 • “May Alias” 83492482 • “Must Alias” 103090

  21. Alias Analysis (contd) • Globals Mod/Ref • context-sensitive mod/ref and alias analysis for internal global variables • Very fast, very precise, limited scope • Total number of queries 119520497 • “No Alias” 35944215 • “May Alias” 83473192 • “Must Alias” 103090

  22. Alias Analysis (contd) • Andersen’s AA algorithm • Subset-based, flow-insensitive, context-insensitive, and field-insensitive alias analysis • Very precise, but slow. • Total number of queries 119520497 • “No Alias” 35361105 + 44000000 = 79361105 • “May Alias” 40057171 • “Must Alias” 102221
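The query counts on slides 20 to 22 are easier to compare as shares of the total. The snippet below only restates the slides' numbers and computes those shares; the dictionary layout and the `shares` helper are illustrative choices.

```python
QUERIES = 119520497  # total alias queries reported for libode

# Counts copied from the three alias-analysis slides.
RESULTS = {
    "Basic AA": {"No Alias": 35924925, "May Alias": 83492482, "Must Alias": 103090},
    "Globals Mod/Ref": {"No Alias": 35944215, "May Alias": 83473192, "Must Alias": 103090},
    "Andersen": {"No Alias": 79361105, "May Alias": 40057171, "Must Alias": 102221},
}


def shares(counts):
    """Convert raw answer counts into fractions of all queries."""
    total = sum(counts.values())
    return {answer: n / total for answer, n in counts.items()}


if __name__ == "__main__":
    for name, counts in RESULTS.items():
        pct = {k: f"{100 * v:.2f}%" for k, v in shares(counts).items()}
        print(name, pct)
```

In these figures the Andersen-style analysis answers "No Alias" for roughly two thirds of the queries, against roughly 30% for the other two analyses.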

  23. Ranking (MPEG2Encode) • Criteria based • Code Size (csize) • Stack Size (ssize) • Heap Size (hsize) • Branch density (br_density) • Autovectorizable loops (av_loops) • Is LS memory limit likely to be hit (ls_limit) • Rank = w1*csize + w2*ssize + w3*hsize + w4*br_density + w5/(1 + av_loops) + w6*ls_limit, where the wi are the weights for each criterion
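The rank formula on slide 23 translates directly into a function. This is a sketch only: the weights and the sample measurements below are invented for illustration, and the slides do not state the units of each criterion.

```python
def rank(csize, ssize, hsize, br_density, av_loops, ls_limit, weights):
    """Rank = w1*csize + w2*ssize + w3*hsize + w4*br_density
              + w5/(1 + av_loops) + w6*ls_limit

    weights: sequence (w1, ..., w6); the values are chosen by the user.
    ls_limit: 1 if the local-store limit is likely to be hit, else 0
              (the 0/1 encoding is an assumption).
    """
    w1, w2, w3, w4, w5, w6 = weights
    return (w1 * csize + w2 * ssize + w3 * hsize
            + w4 * br_density + w5 / (1 + av_loops) + w6 * ls_limit)


if __name__ == "__main__":
    # Hypothetical weights and measurements.
    print(rank(csize=10, ssize=20, hsize=30, br_density=0.5,
               av_loops=1, ls_limit=1, weights=(1, 1, 1, 2, 4, 100)))
```

Note that more autovectorizable loops shrink the w5 term, while larger code, stack, heap, and branch density all raise the rank.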

  24. Partitioning • Preprocessing: Propagate ranks upwards in the call graph: Rank(n) = Rank(n) + ∑ Rank(n→child[i]) • Input: Call graph consisting of nodes annotated with ranks • Output: Graph partitions that are suitable for execution on the SPEs • A partition P is deemed “suitable” if Rank(P→root) < Threshold
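The preprocessing and threshold test on slide 24 can be sketched over a small call tree. Several things here are assumptions rather than statements about GLIMPSES: the call graph is treated as a tree (no recursion or shared callees), child ranks are taken to be the already-propagated ones, and the search returns the top-most subtrees that pass the threshold. All function and node names are hypothetical.

```python
def propagate(children, own_rank, root):
    """Bottom-up pass: Rank(n) = Rank(n) + sum of Rank(child) over n's children.

    children: dict mapping a function name to the list of its callees.
    own_rank: dict mapping a function name to its rank before propagation.
    Returns a dict of propagated ranks.
    """
    total = {}

    def visit(node):
        total[node] = own_rank[node] + sum(visit(c) for c in children.get(node, []))
        return total[node]

    visit(root)
    return total


def suitable_roots(children, total, root, threshold):
    """Top-most nodes whose propagated rank is below the threshold."""
    if total[root] < threshold:
        return [root]
    found = []
    for child in children.get(root, []):
        found.extend(suitable_roots(children, total, child, threshold))
    return found


if __name__ == "__main__":
    # Made-up call tree and ranks.
    calls = {"main": ["decode", "output"], "decode": ["idct", "motion"]}
    ranks = {"main": 5, "decode": 3, "idct": 2, "motion": 4, "output": 1}
    total = propagate(calls, ranks, "main")
    for threshold in (5, 10, 20):
        print(threshold, suitable_roots(calls, total, "main", threshold))
```

Raising the threshold merges smaller partitions into larger ones rooted higher in the call tree, which matches the kind of effect slide 25 plots for mpeg2decode.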

  25. Effect of threshold on partitions mpeg2decode

  26. GLIMPSES status • Beta version available for download at: http://glimpses.sourceforge.net • 300 MB source code package (includes visualizer) • Lines of code (C/C++): 447,000 • Third-party tools integrated: LLVM (compiler), Prefuse (visualization) • Executable size: 422 MB (x86 binaries) • Typical trace size: 900 MB (LIBODE) • Man-hour effort: ~750 • Releases: v.0.8, based on LLVM version 1.8 (July 7th); v.1.0, based on LLVM version 2.0 (undergoing testing) • Tested to work with large codebases: LIBODE (115,000 lines of code), mpeg2 (10,000 lines of code), SPEC INT 2000, etc.

  27. Ongoing and future work • More Validation • Compare partitions produced with those generated by expert programmers • An inter-procedural, flow-sensitive, context-sensitive alias analysis algorithm

  28. Ongoing and future work • Function data dependence graph • Encapsulates data flow between functions • Arguments, aliases, globals • Important factor in partitioning decisions – “affinity between pairs of functions”
