
A Framework for Online Performance Analysis and Visualization of Large-Scale Parallel Applications



  1. A Framework for Online Performance Analysis and Visualization of Large-Scale Parallel Applications
  Kai Li, Allen D. Malony, Robert Bell, Sameer Shende
  {likai,malony,bertie,sameer}@cs.uoregon.edu
  Department of Computer and Information Science
  Computational Science Institute, NeuroInformatics Center
  University of Oregon

  2. Outline
  • Problem description
  • Scaling and performance observation
  • Interest in online performance analysis
  • General online performance system architecture
    • Access models
    • Profiling issues and control issues
  • Framework for online performance analysis
    • TAU performance system
    • SCIRun computational and visualization environment
  • Experiments
  • Conclusions and future work

  3. Problem Description
  • Need for parallel performance observation
    • Instrumentation, measurement, analysis, visualization
  • In general, there is a concern for intrusion
    • Seen as a tradeoff with accuracy of performance diagnosis
  • Scaling complicates observation and analysis
    • Issues of data size, processing time, and presentation
  • Online approaches add capabilities as well as problems
    • Performance interaction, but at what cost?
  • Tools for large-scale performance observation online
    • Supporting performance system architecture
    • Tool integration, effective usage, and portability

  4. Scaling and Performance Observation
  • Consider “traditional” measurement methods
    • Profiling: summary statistics calculated during execution
    • Tracing: time-stamped sequence of execution events
  • More parallelism → more performance data overall
    • Performance specific to each thread of execution
    • Possible increase in the number of interactions between threads
    • Harder to manage the data (memory, transfer, storage, …)
  • More parallelism / performance data → harder analysis
    • More time consuming to analyze
    • More difficult to visualize (meaningful displays)
  • Need techniques to address scaling at all levels
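The profiling/tracing distinction above can be illustrated with a minimal sketch (class and method names are illustrative, not part of TAU): a profiler keeps only fixed-size summary statistics per event, while a tracer appends every time-stamped event, so trace volume grows with the length of the run.

```python
import time

class Profiler:
    """Profiling: keep only summary statistics (call count, total time) per
    event, so data size stays bounded by the number of distinct events."""
    def __init__(self):
        self.stats = {}  # event name -> (calls, total_seconds)

    def record(self, name, seconds):
        calls, total = self.stats.get(name, (0, 0.0))
        self.stats[name] = (calls + 1, total + seconds)

class Tracer:
    """Tracing: keep every time-stamped event, so data volume grows
    with every enter/exit during execution."""
    def __init__(self):
        self.events = []  # (timestamp, event name, "enter" or "exit")

    def record(self, name, kind):
        self.events.append((time.time(), name, kind))
```

This is why scaling hits tracing first: with more threads and more interactions, the trace grows along both the process and time dimensions, while a profile grows only with the number of threads.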

  5. Why Complicate Matters with Online Methods?
  • Adds interactivity to the performance analysis process
  • Opportunity for dynamic performance observation
    • Instrumentation change
    • Measurement change
  • Allows for control of performance data volume
  • Post-mortem analysis may be “too late”
    • View on status of long-running jobs
    • Allow for early termination
  • Computation steering to achieve “better” results
  • Performance steering to achieve “better” performance
  • Online performance observation may be intrusive

  6. General Online Performance Observation System
  [Architecture diagram: performance instrumentation and measurement produce performance data; performance analysis and visualization consume it, with performance control feeding back into the observation stages.]

  7. Models of Performance Data Access (Monitoring)
  • Push model
    • Producer/consumer style of access and transfer
    • Application decides when/what/how much data to send
    • External analysis tools only consume performance data
    • Availability of new data is signaled passively or actively
  • Pull model
    • Client/server style of performance data access and transfer
    • Application is a performance data server
    • Access decisions are made externally by analysis tools
    • Two-way communication is required
  • Push/pull models (a combination of the two)
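The two access models can be contrasted with a minimal in-process sketch (class and method names are illustrative, not part of any real monitoring API): under push, the application emits samples whenever it chooses and the analyzer passively consumes them; under pull, the application serves its latest data and the analyzer decides when to request it.

```python
import queue

class PushMonitor:
    """Push model: producer/consumer. The application decides when to
    send; the external tool only consumes what arrives."""
    def __init__(self):
        self.channel = queue.Queue()

    def application_emits(self, sample):
        # Called inside the application, at a point of its choosing.
        self.channel.put(sample)

    def analyzer_consumes(self):
        # Called by the external tool; it has no say in timing or content.
        return self.channel.get_nowait()

class PullMonitor:
    """Pull model: client/server. The application is a data server;
    access decisions are made externally, requiring two-way traffic."""
    def __init__(self):
        self.latest = {}

    def application_updates(self, name, value):
        self.latest[name] = value

    def analyzer_requests(self, name):
        # Request/response: the analyzer chooses when and what to fetch.
        return self.latest.get(name)
```

A push/pull hybrid would layer both: the application pushes coarse summaries on its own schedule, while the analyzer can still pull detail on demand.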

  8. TAU Performance System Architecture
  [Architecture diagram of the TAU performance system and associated analysis tools, including Paraver, EPILOG, and ParaProf.]

  9. Online Profile Measurement and Analysis in TAU
  • Standard TAU profiling
    • Per node/context/thread
  • Profile “dump” routine
    • Context-level
    • Profile file per each thread in context
    • Appends to profile file
    • Selective event dumping
  • Analysis tools access files through shared file system
  • Application-level profile “access” routine
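The dump mechanism can be sketched as follows (the function name, file-naming scheme, and record layout here are illustrative, not TAU's actual format): each thread's counters are appended to a per-thread profile file in a shared directory, so analysis tools can read a growing sequence of samples through the shared file system.

```python
import os

def dump_profile(profile_dir, node, context, thread, counters):
    """Append a profile snapshot for one thread to its profile file,
    mimicking a TAU-style profile "dump" routine. The naming scheme
    profile.<node>.<context>.<thread> is illustrative only."""
    os.makedirs(profile_dir, exist_ok=True)
    path = os.path.join(profile_dir, f"profile.{node}.{context}.{thread}")
    # Append, so the file accumulates one snapshot per dump call and
    # external analyzers see the whole sample sequence.
    with open(path, "a") as f:
        for event, (calls, excl_time) in counters.items():
            f.write(f"{event} {calls} {excl_time}\n")
```

Selective event dumping would amount to filtering `counters` before the write, which is one lever for controlling data volume online.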

  10. Online Performance Analysis and Visualization
  [Architecture diagram: the application, instrumented with the TAU performance system, writes performance data to the file system; a Performance Data Reader and a Performance Data Integrator accumulate samples and deliver performance data streams to the Performance Analyzer and Performance Visualizer in SCIRun (Univ. of Utah), which can close the loop with performance steering. Issues noted: sample sequencing and reader synchronization.]
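The sample-sequencing and reader-synchronization issues called out on this slide can be sketched with a minimal integrator (names are illustrative): a snapshot is released to the analyzer only once every expected thread has reported the same sample-sequence number, so the visualizer never mixes data from different time steps.

```python
class SampleIntegrator:
    """Accumulate per-thread profile samples and release a snapshot only
    when every expected thread has delivered the same sequence number
    (a sketch of sample sequencing / reader synchronization)."""
    def __init__(self, expected_threads):
        self.expected = set(expected_threads)
        self.pending = {}  # sequence number -> {thread: data}

    def deliver(self, seq, thread, data):
        """Called by a reader when it parses one thread's sample `seq`.
        Returns the complete snapshot once all threads have reported,
        otherwise None."""
        bucket = self.pending.setdefault(seq, {})
        bucket[thread] = data
        if set(bucket) == self.expected:
            return self.pending.pop(seq)
        return None
```

Readers may run far apart in the sample sequence (some threads dump faster than others), which is why the integrator must buffer several pending sequence numbers at once.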

  11. Profile Sample Data Structure in SCIRun
  [Diagram: profile samples organized hierarchically by node, context, and thread.]
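The node/context/thread hierarchy on this slide can be sketched as a nested mapping (the concrete layout and field names are illustrative, not SCIRun's actual structure):

```python
# One profile sample, indexed hierarchically: node -> context -> thread
# -> per-event values. Event names and values are illustrative.
sample = {
    0: {                               # node
        0: {                           # context within the node
            0: {"MPI_Recv": 1.2},      # thread within the context
            1: {"MPI_Recv": 0.9},
        },
    },
}

def thread_value(sample, node, context, thread, event):
    """Look up one event's value for a specific thread of execution."""
    return sample[node][context][thread][event]
```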

  12. Performance Analysis/Visualization in SCIRun
  [Screenshot: SCIRun program network for performance analysis and visualization.]

  13. Uintah Computational Framework (UCF)
  • University of Utah
  • UCF analysis
    • Scheduling
    • MPI library
    • Components
  • 500 processes
  • Use for online and offline visualization
  • Apply SCIRun steering

  14. “Terrain” Performance Visualization
  [Screenshot: terrain-style surface visualization of performance data.]

  15. Scatterplot Displays
  • Each point coordinate determined by three values: MPI_Reduce, MPI_Recv, MPI_Waitsome
  • Min/max value range
  • Effective for cluster analysis
  • Relation between MPI_Recv and MPI_Waitsome

  16. Online Uintah Performance Profiling
  • Demonstration of online profiling capability
    • Colliding elastic disks
    • Test material point method (MPM) code
    • Executed on 512 processors of ASCI Blue Pacific at LLNL
  • Example 1 (terrain visualization)
    • Exclusive execution time across event groups
    • Multiple time steps
  • Example 2 (bargraph visualization)
    • MPI execution time and performance mapping
  • Example 3 (domain visualization)
    • Task time allocation to “patches”

  17. Example 1 (Event Groups)

  18. Example 2 (MPI Performance)

  19. Example 3 (Domain-Specific Visualization)

  20. Possible Improvements
  • Profile merging at the context level to reduce the number of files
    • Merging at the node level may require explicit processing
  • Concurrent trace merging could also reduce files
  • Hierarchical merge tree
    • Will require explicit processing
  • Could consider IPC transfer
    • MPI (e.g., used in mpiP for profile merging)
      • Create own communicators
    • Sockets or PACX between compute server and analyzer
  • Leverage large-scale systems infrastructure
  • Parallel profile analysis
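The hierarchical merge tree mentioned above can be sketched as pairwise merge rounds (function names are illustrative): instead of funneling all N per-thread profiles through one process, profiles are combined two at a time in log2(N) levels, which is how tree-based reductions keep merge cost scalable.

```python
def merge_two(a, b):
    """Merge two profiles, summing call counts and times event by event.
    Profiles are dicts: event name -> (calls, seconds)."""
    out = dict(a)
    for event, (calls, secs) in b.items():
        c0, t0 = out.get(event, (0, 0.0))
        out[event] = (c0 + calls, t0 + secs)
    return out

def tree_merge(profiles):
    """Hierarchical (binary-tree) merge: pairwise rounds, so the merge
    is log2(N) levels deep rather than one N-way funnel."""
    level = list(profiles)
    while len(level) > 1:
        nxt = [merge_two(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # odd profile carries to the next round
            nxt.append(level[-1])
        level = nxt
    return level[0]
```

In an MPI setting each interior node of the tree would be a process merging its children's partial results over a dedicated communicator, which is the "explicit processing" the slide anticipates.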

  21. Concluding Remarks
  • Interest in online performance monitoring, analysis, and visualization for large-scale parallel systems
  • Need to use these capabilities intelligently
    • Benefit from other scalability considerations of the system software and system architecture
    • See it as an extension to the parallel system architecture
    • Avoid solutions that have portability difficulties
  • In part, this is an engineering problem
    • Need to work with the system configuration you have
    • Need to understand if an approach is applicable to the problem
  • Not clear if there is a single solution

  22. Future Work
  • Build online support into the TAU performance system
    • Extend to support pull-model capabilities
    • Develop hierarchical data access solutions
  • Performance studies of the full system
    • Latency analysis
    • Bandwidth analysis
  • Integration with other performance tools
    • System performance monitors
    • ParaProf parallel profile analyzer
  • Development of a 3D visualization library
    • Portability focus
