
Advanced MPI

This presentation explores advanced concepts in MPI, including performance measurements, point-to-point communications, datatypes, communicators, collective operations, and MPI-2 features. It also discusses the portability and tool-friendliness of MPI, as well as tuning MPI programs for peak performance.


Presentation Transcript


  1. Advanced MPI. William D. Gropp, Rusty Lusk, and Rajeev Thakur. Mathematics and Computer Science Division, Argonne National Laboratory

  2. Outline • Introduction and review of MPI concepts • Performance measurements in MPI: methods and pitfalls • MPI Point-to-point communications • MPI Datatypes • Communicators and Libraries • Collective Operations • MPI-2 Features

  3. Outline • Background • The message-passing model • Origins of MPI and current status • Sources of further MPI information • Basics of MPI message passing • Hello, World! • Fundamental concepts • Simple examples in Fortran and C • Extended point-to-point operations • non-blocking communication • modes

  4. Outline (continued) • Advanced MPI topics • Collective operations • More on MPI datatypes • Application topologies • The profiling interface • Toward a portable MPI environment

  5. MPI is Simple • Many parallel programs can be written using just these six functions, only two of which are non-trivial: • MPI_INIT • MPI_FINALIZE • MPI_COMM_SIZE • MPI_COMM_RANK • MPI_SEND • MPI_RECV
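
As a minimal sketch (not taken from the slides), the classic "Hello, World!" can be written with exactly these six calls:

    /* Sketch: "Hello, World!" using only the six basic MPI functions. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank != 0) {
            /* Every non-root process sends its rank to process 0. */
            MPI_Send(&rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        } else {
            int i, msg;
            MPI_Status status;
            printf("Hello, World! I am process 0 of %d\n", size);
            for (i = 1; i < size; i++) {
                MPI_Recv(&msg, 1, MPI_INT, MPI_ANY_SOURCE, 0,
                         MPI_COMM_WORLD, &status);
                printf("Hello from process %d\n", msg);
            }
        }

        MPI_Finalize();
        return 0;
    }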

  6. Alternative set of 6 Functions for Simplified MPI • MPI_INIT • MPI_FINALIZE • MPI_COMM_SIZE • MPI_COMM_RANK • MPI_BCAST • MPI_REDUCE
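
A sketch of how this alternative set suffices: the familiar pi-computation example, with MPI_Bcast distributing the problem size and MPI_Reduce combining the partial sums (the interval count and output format are illustrative choices):

    /* Sketch: compute pi by numerical integration of 4/(1+x^2) on [0,1]. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size, i, n = 100000;
        double h, sum, x, mypi, pi;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Process 0 chooses the number of intervals; everyone learns it. */
        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

        /* Each process handles every size-th interval, offset by rank. */
        h = 1.0 / (double)n;
        sum = 0.0;
        for (i = rank + 1; i <= n; i += size) {
            x = h * ((double)i - 0.5);
            sum += 4.0 / (1.0 + x * x);
        }
        mypi = h * sum;

        /* Combine the partial sums on process 0. */
        MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("pi is approximately %.16f\n", pi);

        MPI_Finalize();
        return 0;
    }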

  7. Toward a Portable MPI Environment • MPICH is a high-performance, portable implementation of MPI-1. • It runs on MPPs, clusters, and heterogeneous networks of workstations. • In a wide variety of environments, one can build, compile, run, and analyze performance with:
       configure
       make
       mpicc -mpitrace myprog.c
       mpirun -np 10 myprog
       upshot myprog.log

  8. End of first canned presentation

  9. MPI is Tool-Friendly • The MPI profiling interface can be used to write portable performance-analysis tools that interact with any MPI implementation. • Upshot is one such tool.
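
A sketch of how the profiling interface makes this possible: every MPI routine can also be called by its shifted PMPI_ name, so a tool supplies its own MPI_Send that records timing and forwards to PMPI_Send. The counters below are illustrative, not part of Upshot:

    /* Sketch of a PMPI-based wrapper: time every MPI_Send, report at exit. */
    #include <stdio.h>
    #include <mpi.h>

    static double send_time = 0.0;
    static int    send_count = 0;

    int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm)
    {
        double t = MPI_Wtime();
        int err = PMPI_Send(buf, count, datatype, dest, tag, comm);
        send_time += MPI_Wtime() - t;
        send_count++;
        return err;
    }

    int MPI_Finalize(void)
    {
        printf("MPI_Send called %d times, %.6f s total\n",
               send_count, send_time);
        return PMPI_Finalize();
    }

Linked ahead of the MPI library, such a wrapper works unchanged with any conforming implementation.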

  10. Still Not Covered • Process topologies • Creating groups and communicators • Attributes • Persistent requests

  11. How Big is MPI? • MPI is large: MPI-1 contains about 125 calls. MPI’s extensive functionality requires many functions, but the number of functions is not necessarily a measure of complexity. • MPI is small: many useful programs can be written with just six of them. • MPI is just right: one can access flexibility when it is required, and one need not master all parts of MPI to use it.

  12. A Final Point • MPI provides an extensive specification for message-passing programs and libraries. • Many issues required for writing portable parallel libraries have been addressed. • Efficient implementations have made it possible for library developers to write efficient, portable code for others to use. • End users may increasingly find that libraries, rather than explicit message-passing code, will be the key to developing applications.

  13. End of third canned

  14. Tuning MPI Programs for Peak Performance. William Gropp and Ewing Lusk, Argonne National Laboratory

  15. Outline • Goals of the Tutorial • Background assumptions • How message passing works (protocols) • How protocols relate to MPI calls • Performance modeling, measurements, and tools • Diagnosing and understanding performance problems • Vendor-specific issues • MPI-2

  16. Assumptions and Background We assume you have some familiarity with • Various MPI send/receive modes • Elementary collective operations • MPI datatypes

  17. Performance Modeling, Measurements and Tools • Basic Model • Needed to evaluate approaches • Must be simple • Synchronization delays • Main components • Latency and Bandwidth • Other effects on performance • Understand deviations from the model
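
A sketch of how the model parameters are usually measured: a two-process ping-pong, where the one-way time T(n) is approximately s + r n, so small messages reveal the latency s and large messages the bandwidth 1/r. The message sizes and repetition count below are arbitrary choices:

    /* Sketch: ping-pong timing to estimate latency and bandwidth. */
    #include <stdio.h>
    #include <mpi.h>

    #define NREPS 1000

    static char buf[1 << 20];               /* up to 1 MB messages */

    int main(int argc, char *argv[])
    {
        int rank, i, n;
        MPI_Status status;
        double t;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (n = 1; n <= (1 << 20); n *= 16) {
            MPI_Barrier(MPI_COMM_WORLD);
            t = MPI_Wtime();
            for (i = 0; i < NREPS; i++) {
                if (rank == 0) {
                    MPI_Send(buf, n, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
                    MPI_Recv(buf, n, MPI_BYTE, 1, 0, MPI_COMM_WORLD, &status);
                } else if (rank == 1) {
                    MPI_Recv(buf, n, MPI_BYTE, 0, 0, MPI_COMM_WORLD, &status);
                    MPI_Send(buf, n, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
                }
            }
            t = (MPI_Wtime() - t) / (2.0 * NREPS);   /* one-way time T(n) */
            if (rank == 0)
                printf("n = %7d bytes: T(n) = %9.2f us\n", n, t * 1e6);
        }

        MPI_Finalize();
        return 0;
    }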

  18. Including Contention • The absence of a contention term is the greatest limitation of the latency/bandwidth model • The hyperbolic model of Stoica, Sultan, and Keyes provides a way to estimate the effects of contention for different communication patterns; see ftp://ftp.icase.edu/pub/techreports/96/96-34.ps.Z

  19. Other Impacts on Performance • Contention • In the network • At the processors • Memory Copies • Packet sizes and stepping

  20. Diagnosing and understanding performance problems • Memory Copies and MPI datatypes • Effect of message packetization • Synchronization delays • Unexpected hot spots and premature synchronization • Polling and Interrupt style MPI implementations • Effect of contention • Choosing between MPI alternatives

  21. Memory copies • Memory copies are the primary source of performance problems • Cost of non-contiguous datatypes • Single-processor memcpy is often much slower than the hardware's peak memory bandwidth (the original slide charted measured memcpy performance)
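
A sketch of the trade-off (the column-of-a-matrix scenario and function names are illustrative): a derived datatype describes non-contiguous data so the implementation can move it directly or pack it once, while manual packing always pays an explicit user-level copy:

    /* Sketch: sending one column of a row-major N x N matrix. */
    #include <mpi.h>

    #define N 512

    void send_column(double a[N][N], int col, int dest, MPI_Comm comm)
    {
        MPI_Datatype column;

        /* N blocks of 1 double, stride N doubles apart: one column. */
        MPI_Type_vector(N, 1, N, MPI_DOUBLE, &column);
        MPI_Type_commit(&column);
        MPI_Send(&a[0][col], 1, column, dest, 0, comm);
        MPI_Type_free(&column);
    }

    void send_column_packed(double a[N][N], int col, int dest, MPI_Comm comm)
    {
        double tmp[N];
        int i;

        /* Explicit copy into a contiguous buffer; on some systems this
           beats the datatype path, on others it adds a redundant memcpy. */
        for (i = 0; i < N; i++)
            tmp[i] = a[i][col];
        MPI_Send(tmp, N, MPI_DOUBLE, dest, 0, comm);
    }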

  22. Example: Performance Impact of Memory Copies • Let s be the per-message latency, r the per-byte transfer time, and c the per-byte copy cost • n bytes sent eagerly (and buffered) cost s + r n + c n • Rendezvous, not buffered, costs s + s + (s + r n) = 3 s + r n • Rendezvous is faster if s < c n / 2 • This assumes no delays in responding to the rendezvous control information
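
A small illustrative evaluation of the two cost expressions; the parameter values are assumptions chosen for the arithmetic, not measurements:

    /* Sketch: compare predicted eager vs. rendezvous costs. */
    #include <stdio.h>

    int main(void)
    {
        double s = 10e-6;    /* latency per message: 10 us (assumed)       */
        double r = 0.01e-6;  /* transfer time per byte: 100 MB/s (assumed) */
        double c = 0.02e-6;  /* copy time per byte: 50 MB/s (assumed)      */
        double n;

        for (n = 256; n <= 1 << 20; n *= 4) {
            double eager      = s + r * n + c * n;  /* eager, buffered     */
            double rendezvous = 3 * s + r * n;      /* rendezvous, no copy */
            printf("n = %8.0f: eager %8.1f us, rendezvous %8.1f us\n",
                   n, eager * 1e6, rendezvous * 1e6);
        }
        /* Rendezvous wins once s < c n / 2, i.e. n > 2 s / c = 1000 bytes
           for these assumed parameters. */
        return 0;
    }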

  23. Summary • Achieving peak performance requires a model of how message-passing systems are implemented • MPI exposes message-passing semantics to give programmers more control • Experimentation is necessary to understand performance • Tools are available • Last word • Defer Synchronization!
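
A sketch of what "Defer Synchronization!" looks like in practice (the exchange pattern and names are illustrative): post receives early, start sends, overlap independent work, and wait only when the data is actually needed:

    /* Sketch: non-blocking neighbor exchange that defers synchronization. */
    #include <mpi.h>

    void exchange(double *sendbuf, double *recvbuf, int n,
                  int left, int right, MPI_Comm comm)
    {
        MPI_Request req[2];

        /* Post the receive before the matching send can arrive, so the
           message need not sit in an unexpected-message queue. */
        MPI_Irecv(recvbuf, n, MPI_DOUBLE, left, 0, comm, &req[0]);
        MPI_Isend(sendbuf, n, MPI_DOUBLE, right, 0, comm, &req[1]);

        /* ... independent local computation proceeds here ... */

        MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
    }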

  24. End of SC97 canned

  25. Logging and Visualization Tools • Upshot and MPE tools • VT • Pablo, Paragraph, and Paradyn • Other vendor tools • Validation by running with coarse-grain logging

  26. Upshot and MPE • Automatic logging • Uses the PMPI interface and a special library • mpicc -mpilog … • mpirun -np 8 a.out … • User-directed logging • MPE_Log_event calls inserted by the user • MPE_Describe_state defines a user state • States may be nested • Works with MPICH and vendor MPIs
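
A sketch of user-directed MPE logging; the event numbers, state name, color, and log-file name are arbitrary illustrative choices, and linking details vary by installation:

    /* Sketch: bracket a computation with MPE events for Upshot viewing. */
    #include <mpi.h>
    #include "mpe.h"

    int main(int argc, char *argv[])
    {
        int ev_start, ev_end, i;

        MPI_Init(&argc, &argv);
        MPE_Init_log();

        /* Define a user state bracketed by two event numbers. */
        ev_start = MPE_Log_get_event_number();
        ev_end   = MPE_Log_get_event_number();
        MPE_Describe_state(ev_start, ev_end, "compute", "red");

        for (i = 0; i < 10; i++) {
            MPE_Log_event(ev_start, i, "begin compute");
            /* ... computation to be timed ... */
            MPE_Log_event(ev_end, i, "end compute");
        }

        MPE_Finish_log("myprog");   /* write the log file for upshot */
        MPI_Finalize();
        return 0;
    }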

  27. Sample Upshot Output

  28. Validating the logging • Logging introduces some timing differences • It can change the behavior of the computation, and not all perturbations can be filtered out (for example, if the presence of logging causes different protocols to be used) • Compare coarser-grain timings to check that the detailed logging did not change the behavior of the program

  29. Deficiency analysis/filtering techniques • Only interested in operations that • take too long (in absolute terms), and • are too slow compared to the model • Filter the logs with thresholds for both, and use the result to direct further instrumentation • An interesting research area • P3T is an example tool

  30. End of performance measurement canned
