A Component Infrastructure for Performance and Power Modeling of Parallel Scientific Applications


  1. A Component Infrastructure for Performance and Power Modeling of Parallel Scientific Applications Boyana Norris, Van Bui, Lois Curfman McInnes, Li Li (Argonne National Laboratory); Oscar Hernandez, Barbara Chapman (University of Houston); Kevin Huck (University of Oregon)

  2. Outline • Motivation • Performance/Power Models • Component Infrastructure • Experiments • Conclusions and Future Work • Acknowledgements

  3. Component-Based Software Engineering • Functional units with well-defined interfaces and dependencies • Components interact through ports • Benefits: software reuse, management of complex software, code generation, available “services” • Drawbacks: more restrictive software engineering, need for a runtime framework

  4. Motivation • CBSE is increasingly adopted in HPC • Power is growing in importance • Need for simpler processes for performance/power measurement and analysis • Performance tools can be applied at the component abstraction layer • Opportunities for automation

  5. Power vs. Energy • Power: the rate at which a system performs work: Power = Work / ΔTime • Energy: the total work performed over a period of time: Energy = Power * ΔTime
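The relationship above can be sketched in a few lines: energy is the time integral of power. A minimal example, assuming evenly spaced power samples (the sample values and interval below are made up for illustration):

```python
# Energy = sum(Power * delta-t) over a sampled run.
# Samples and interval are illustrative, not measured values.

def energy_joules(power_watts, interval_s):
    """Approximate energy from evenly spaced power samples."""
    return sum(power_watts) * interval_s

samples = [95.0, 110.0, 120.0, 115.0]  # watts, one sample per second
print(energy_joules(samples, 1.0))     # 440.0 J over a 4-second window
```

This is why slide 17's result is not a contradiction: a run can draw higher power yet consume less total energy if it finishes enough sooner.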

  6. Power Trends Cameron, K. W., Ge, R., and Feng, X. 2005. High-Performance, Power-Aware Distributed Computing for Scientific Applications. Computer 38, 11 (Nov. 2005), 40-47.

  7. Power Reduction Techniques • Circuit and logic level • Low power interconnect • Low power memories and memory hierarchy • Low power processor architecture adaptations • Dynamic voltage scaling • Resource hibernation • Compiler level power management • Application level power management

  8. Goals and Approach • Provide a component based system • Facilitates performance/power measurement and analysis • Computes high level performance metrics • Integrates existing tools into a uniform interface • End Goal: static and dynamic optimizations based on offline/online analyses

  9. System Diagram • [Diagram: a CQoS-enabled component application (Components A, B, C) produces instrumented application runs that feed performance/power databases (persistent & runtime); an analysis infrastructure (interactive analysis, model building, machine learning) informs a control infrastructure (control system for parameter changes and component substitution, substitution set, substitution assertion database)]

  10. Performance Model I • Floating-point (FLP) inefficiency – PD: problem-size-dependent variant • Floating-point (FLP) inefficiency – PI: problem-size-independent variant

  11. Performance Model II • Core logic Stalls = L1D_register_stalls + branch_misprediction + instruction_miss + stack_engine_stalls + floating_point_stalls + pipeline_inter_register_dependency + processor_frontend_flush • Memory Stalls = L1_hits * L1_latency + L2_hits * L2_latency + L3_hits * L3_latency + local_mem_access * local_mem_latency + remote_mem_access * remote_mem_latency + TLB_miss * TLB_miss_penalty
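The memory-stall model above weights each cache/memory hit count by its access latency. A minimal sketch of that computation; the latency values here are illustrative assumptions, not measured figures for any specific machine:

```python
# Assumed per-level access latencies in cycles (illustrative only).
LATENCY_CYCLES = {
    "L1": 1, "L2": 5, "L3": 12,
    "local_mem": 200, "remote_mem": 350, "TLB_miss": 30,
}

def memory_stall_cycles(counters):
    """Slide's model: counters maps hardware-counter name -> event count."""
    return (counters["L1_hits"] * LATENCY_CYCLES["L1"]
            + counters["L2_hits"] * LATENCY_CYCLES["L2"]
            + counters["L3_hits"] * LATENCY_CYCLES["L3"]
            + counters["local_mem_access"] * LATENCY_CYCLES["local_mem"]
            + counters["remote_mem_access"] * LATENCY_CYCLES["remote_mem"]
            + counters["TLB_miss"] * LATENCY_CYCLES["TLB_miss"])

counts = {"L1_hits": 1000, "L2_hits": 100, "L3_hits": 10,
          "local_mem_access": 5, "remote_mem_access": 1, "TLB_miss": 2}
print(memory_stall_cycles(counts))  # 3030 cycles for these made-up counts
```

Note how a handful of remote memory accesses can outweigh thousands of L1 hits, which is what makes this decomposition useful for locating bottlenecks.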

  12. Power Model • Based on on-die components • Leverages performance hardware counters
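Counter-based power models of this kind (e.g., the per-unit approach of Isci and Martonosi, cited on slide 23) are typically linear: an idle baseline plus activity-weighted terms. A sketch under that assumption; the coefficients and rate names below are illustrative, not the model actually fitted in this work:

```python
# Hypothetical linear power model: idle power plus weighted activity rates.
IDLE_POWER_W = 60.0
WEIGHTS = {                       # watts per unit activity (assumed)
    "instructions_per_cycle": 8.0,
    "l2_accesses_per_cycle": 5.0,
    "fp_ops_per_cycle": 6.5,
}

def estimate_power(rates):
    """Estimate watts from counter-derived activity rates."""
    return IDLE_POWER_W + sum(WEIGHTS[k] * rates.get(k, 0.0) for k in WEIGHTS)

print(estimate_power({"instructions_per_cycle": 1.5,
                      "l2_accesses_per_cycle": 0.2,
                      "fp_ops_per_cycle": 0.4}))  # ~75.6 W
```

The same counters driving the performance model can therefore drive the power estimate, which is what lets one instrumentation pass serve both analyses.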

  13. Performance Measurement and Analysis System • Components • TAU: performance measurement • http://www.cs.uoregon.edu/research/tau/home.php • Performance database component(s) • PerfExplorer: performance and power analysis • http://www.cs.uoregon.edu/research/tau/docs/perfexplorer/ • [Diagram: the TAU component instruments the component application and feeds the database components; the PerfExplorer component supports runtime optimization, compiler feedback, and user/tool analysis]

  14. PerfExplorer Component • Loads a python analysis script • Performance and power analysis • Data mining, inference rules, comparing different experimental runs
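To give a flavor of the cross-run comparison such an analysis script performs, here is a plain-Python sketch (deliberately not PerfExplorer's actual scripting API, and the run measurements are hypothetical):

```python
# Relative change in each metric between a baseline run and a trial run;
# negative values mean the trial improved on the baseline.

def relative_change(baseline, trial):
    """baseline, trial: dicts mapping metric name -> measured value."""
    return {m: (trial[m] - baseline[m]) / baseline[m]
            for m in baseline if m in trial}

run_o0 = {"time_s": 120.0, "energy_j": 9000.0}   # hypothetical -O0 run
run_o3 = {"time_s": 80.0,  "energy_j": 7200.0}   # hypothetical -O3 run
print(relative_change(run_o0, run_o3))           # time down ~33%, energy down 20%
```

Inference rules in the real system operate on comparisons like this to flag, for example, a run that got faster but less power-efficient.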

  15. Study I: Performance-Power Trade-offs • Experiment – Effect of compiler optimization levels on performance and power • Experimental Details • Machine: SGI Altix 300 • MPI Processes: 16 • Compiler: OpenUH • Code: GenIDLEST • Optimization levels: -O0, -O1, -O2, -O3 • Performance tools: TAU, PerfExplorer, and PAPI

  16. Linux/ccNUMA

  17. Results • Aggressive optimizations → higher power • IPC correlates with power dissipation • Aggressive optimizations → lower energy • Operation count correlates with energy consumption

  18. Performance/Power Study with PETSc Codes • PETSc: Portable, Extensible Toolkit for Scientific Computation • http://www.mcs.anl.gov/petsc/ • Experimental details • Machine: SGI Altix 3600 • Compiler: GCC • MPI processes: 32 • Application: 2-D simulation of cavity flow • Krylov subspace linear solvers: FGMRES, GMRES, BiCGS • Preconditioner: block Jacobi • Problem size: 16x16 per processor (weak scaling) • Performance tools: TAU, PerfExplorer, PAPI

  19. Inefficiency • Bottlenecks in the methods used to solve the linear system • The preconditioner is also a bottleneck

  20. Results • FGMRES performs well initially • But it is not very power efficient • BCGS is the best choice for both performance and power efficiency

  21. Conclusions • Little or no hardware and software support for detailed power measurement and analysis on modern systems • Need for more integrated toolsets supporting both performance and power measurements, analysis, and optimizations • Combining tools with component based software engineering can benefit efficiency and effectiveness of tuning process

  22. Future Directions • Integration of components into a framework • Dynamic selection of algorithms and parameters based on offline/online analyses • Compiler-based performance/power cost modeling • Continued performance and power analysis of PETSc-based codes • Extension of the performance and power models to more modern architectures

  23. References • Jarp, S. A methodology for using the Itanium-2 performance counters for bottleneck analysis. Tech. rep., HP Labs, August 2002. • Bircher, W. L. and John, L. K. Complete System Power Estimation: A Trickle-Down Approach Based on Performance Events. International Symposium on Performance Analysis of Systems & Software, pp. 158-168, 2007. • Isci, C. and Martonosi, M. Runtime Power Monitoring in High-End Processors: Methodology and Empirical Data. In Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture, December 2003. • Huck, K., Hernandez, O., Bui, V., Chandrasekaran, S., Chapman, B., Malony, A. D., McInnes, L. C., and Norris, B. Capturing Performance Knowledge for Automated Analysis. Supercomputing, 2008. http://www2.cs.uh.edu/~vtbui/sc.pdf

  24. Acknowledgments • Professors/Advisors: Boyana Norris, Lois Curfman McInnes, Barbara Chapman, Allen Malony, Danesh Tafti • Students: Oscar Hernandez, Kevin Huck, Sunita Chandrasekaran, Li Li • SiCortex: Lawrence Stuart and Dan Jackson • MCS Division, Argonne National Laboratory • NSF, DOE, NCSA, NASA
