
TAU Performance Tool on Ranger and Kraken

TAU Performance Tool on Ranger and Kraken. Mahin Mahmoodi October 22, 2009. Outline. TAU’s features Instructions using TAU on Ranger & Kraken TAU overhead TAU MD code profiling Callpath Outer Loop Selective Phase Tracing References. TAU: Tuning & Analysis Utilities.


Presentation Transcript


  1. TAU Performance Tool on Ranger and Kraken Mahin Mahmoodi October 22, 2009

  2. Outline • TAU’s features • Instructions for using TAU on Ranger & Kraken • TAU overhead • TAU MD code profiling • Callpath • Outer Loop • Selective • Phase • Tracing • References

  3. TAU: Tuning & Analysis Utilities • Supports essentially all computing platforms • Auto and manual instrumentation for profiling, tracing, and sampling • Routine, loop, block, structure, and phase profiling; compiler and binary instrumentation • Support for multiple programming languages and paradigms • Fortran, Java, C/C++, Python, ...; multi-threading, message passing, mixed-mode, hybrid, ... • Performance measurement of I/O & Linux kernel • Tracks heap memory for each routine • Application to OS noise analysis • Tools for performance data management & mining • Sophisticated visualization (2-D & 3-D views) • Translation to multiple trace formats • Developed by Allen Malony et al. at the University of Oregon and ParaTools • TAU Website: http://tau.uoregon.edu

  4. Location of TAU on Ranger & Kraken
     Ranger:
     • % module avail tau
       tau/2.17 (default)
     • The latest TAU version is available from /share/home/00968/tg802155/tau-2.18.2p4
     • Configured with pgi7_2/mvapich/1.0.1 and with intel10_1/mvapich/1.0.1
     Kraken:
     • % module avail tau
       tau-2.18.1, tau-2.18.2
     • Configured with the PGI compiler

  5. General Instructions for TAU
     Step 1: Auto Instrumentation
     • Use a TAU Makefile stub
     • Compile with the TAU scripts (tau_cc.sh, tau_f90.sh, tau_cxx.sh)
     Example:
       setenv TAU_MAKEFILE <TAU_path>/lib/Makefile.tau-papi-pdt-pgi
       setenv TAU_OPTIONS "-optVerbose -optKeepFiles"   (optional)
       tau_cc.sh -o hello hello.c   (hello is the instrumented binary)
     If using a makefile:
       make CC=tau_cc.sh
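
The example on this slide uses csh syntax (setenv). For bash users, a minimal sketch of the same build step might look like the following. `TAU_ROOT` and its default value are hypothetical stand-ins for `<TAU_path>`, and the compile is skipped when `tau_cc.sh` or `hello.c` is not available.

```shell
#!/bin/bash
# Sketch: bash equivalent of the csh build steps on this slide.
# TAU_ROOT is a hypothetical placeholder for <TAU_path>; point it at the real install.
TAU_ROOT="${TAU_ROOT:-$HOME/tau}"

# Select the TAU stub makefile (the PAPI + PDT + PGI configuration named on the slide).
export TAU_MAKEFILE="$TAU_ROOT/lib/Makefile.tau-papi-pdt-pgi"

# Optional: verbose wrapper output, and keep the intermediate instrumented files.
export TAU_OPTIONS="-optVerbose -optKeepFiles"

# Compile only if the TAU compiler wrapper and the source file are both present.
if command -v tau_cc.sh >/dev/null 2>&1 && [ -f hello.c ]; then
    tau_cc.sh -o hello hello.c
else
    echo "tau_cc.sh or hello.c not found; skipping compile"
fi
```

The only change from the slide is `export NAME=value` in place of `setenv NAME value`; the wrapper scripts read the same two variables either way.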

  6. General Instructions for TAU (cont.)
     Step 2: Execution
     • module load papi
     • Set the necessary environment variables and run the instrumented binary as normal (this step generates profile files, one per core)
     • Information on the environment variables: http://www.cs.uoregon.edu/research/tau/docs/newguide/bk03apa.html#d0e15219
     Step 3: Profiling report
     • Add the TAU bin directory to your path:
       set path=( <tau_path>/bin $path )
     • Run pprof (text) or paraprof (GUI) to view the results
     To view on a Windows workstation:
       % paraprof --pack app.ppk   (pack the profiles on the remote machine)
       Click on app.ppk on the workstation (TAU must be installed there first)
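
Steps 2 and 3 can be sketched in bash as follows. This is an illustration only: `TAU_ROOT` and `./hello` are hypothetical placeholders, `module load papi` is assumed to have been done already, and the run is skipped when the binary or `pprof` is missing.

```shell
#!/bin/bash
# Sketch: run an instrumented binary and view the profile (bash syntax).
# TAU_ROOT and ./hello are hypothetical placeholders.
TAU_ROOT="${TAU_ROOT:-$HOME/tau}"

# bash equivalent of the csh line: set path=( <tau_path>/bin $path )
export PATH="$TAU_ROOT/bin:$PATH"

if [ -x ./hello ] && command -v pprof >/dev/null 2>&1; then
    ./hello                    # writes the profile files into the current directory
    pprof                      # text summary of the profiles found here
    paraprof --pack app.ppk    # pack the profiles for viewing on a workstation
else
    echo "instrumented binary or pprof not found; skipping run"
fi
```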

  7. UNRES • The UNRES molecular dynamics (MD) code uses a carefully derived mesoscopic protein force field to study and predict protein folding pathways by means of molecular dynamics simulations. • http://www.chem.cornell.edu/has5

  8. % paraprof – Main Data Window • UNRES on Ranger, 16-way, 32 nodes • [Screenshot: time breakdown by subroutine for each core (Core 0, Core 1, ...); annotation: left-click on a core]

  9. TAU Instrumentation Overhead • TAU uses direct measurement • Deterministic approach • Auto and manual instrumentation • Profilers do not give detailed insight into the timing behavior of an application • Instrumentation introduces overhead • Measured overhead: MD steps without TAU take 51.4 s; MD steps with TAU take 315 s
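
To put the two timings on this slide in perspective, the slowdown factor and the relative overhead can be worked out directly. The short sketch below only does that arithmetic; the variable names are illustrative.

```shell
#!/bin/bash
# Worked arithmetic for the two timings quoted on this slide.
base=51.4    # MD steps without TAU (seconds)
tau=315      # MD steps with TAU instrumentation (seconds)

# Slowdown factor and relative overhead; awk is used for floating-point math.
slowdown=$(awk -v b="$base" -v t="$tau" 'BEGIN { printf "%.2f", t / b }')
overhead=$(awk -v b="$base" -v t="$tau" 'BEGIN { printf "%.1f", (t - b) / b * 100 }')

echo "slowdown: ${slowdown}x, overhead: ${overhead}%"
# prints: slowdown: 6.13x, overhead: 512.8%
```

A roughly sixfold slowdown is why the next step is to throttle or exclude the small, frequently called routines.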

  10. Reducing the TAU Instrumentation Overhead • In the Main Data Window, select File, then Create Selective Instrumentation File • Specify the filtering criteria in the selection window • Save the throttled routines to a file • Include that file in the compilation (more info later)

  11. TAU Commonly Used Features • Callpath profiling • Selective instrumentation • Loop Instrumentation • Phase profiling • Tracing

  12. TAU CALLPATH Profiling
      At run time:
        setenv TAU_CALLPATH 1
        setenv TAU_CALLPATH_DEPTH 30   (default depth is 2)
      To see the call graph: in the Main Data Window, right-click on a core, then select the Thread Statistical Table option
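
For bash users, the same run-time settings might be written as below. This is a sketch: `./hello` is a hypothetical instrumented binary, and nothing is run if it is absent.

```shell
#!/bin/bash
# Sketch: enable callpath profiling for one run (bash syntax).
export TAU_CALLPATH=1          # turn callpath profiling on
export TAU_CALLPATH_DEPTH=30   # record call chains up to 30 levels deep

# ./hello is a hypothetical instrumented binary; run it only if present.
if [ -x ./hello ]; then
    ./hello
else
    echo "no instrumented binary here; environment is set for the next run"
fi
```

No recompilation is needed for this step, since both variables are read when the instrumented program starts.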

  13. Selective Instrumentation
      • Instrument the code normally
      • Generate the select.tau file as shown in slide …
      • Set TAU_OPTIONS and recompile:
        setenv TAU_OPTIONS "-optVerbose -optKeepFiles -optPreProcess -optTauSelectFile=select.tau"
      • Further selective options can be added to select.tau:
        - Files to include/exclude
        - Routines to include/exclude
        - Directives for loop instrumentation
        - Phase definitions

  14. Sample select.tau
      % cat select.tau
      BEGIN_EXCLUDE_LIST
      DDOT
      DAXPY
      VECPR
      DIST
      BETA
      ALPHA
      END_EXCLUDE_LIST
      BEGIN_INSTRUMENT_SECTION
      static phase name="PHASE_MD" file="minimize_p*" line=153 to line=154
      loops file="prim_advance_mod*" routine="PRIM_ADVANCE_MOD::PREQ_ADVANCE_EXP"
      END_INSTRUMENT_SECTION
      BEGIN_INCLUDE_LIST
      EELEC
      EGB
      GINV_MULT
      ESCP
      END_INCLUDE_LIST
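
Because a selective instrumentation file is edited by hand, a quick sanity check before recompiling can save a build cycle. The sketch below writes a shortened version of the sample file and verifies that every BEGIN_ line has a matching END_ line. The check is only an illustration and is not a TAU tool.

```shell
#!/bin/bash
# Sketch: write a small selective instrumentation file and sanity-check it.
# The entries are taken from the slide; the check itself is not part of TAU.
cat > select.tau <<'EOF'
BEGIN_EXCLUDE_LIST
DDOT
DAXPY
END_EXCLUDE_LIST

BEGIN_INSTRUMENT_SECTION
static phase name="PHASE_MD" file="minimize_p*" line=153 to line=154
END_INSTRUMENT_SECTION
EOF

# Every BEGIN_<X> line should be matched by an END_<X> line.
status=ok
for section in $(sed -n 's/^BEGIN_//p' select.tau); do
    grep -q "^END_${section}\$" select.tau || status="missing END_${section}"
done
echo "select.tau check: $status"
```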

  15. Phase Profiling Isolate regions of code execution

  16. Load Imbalance Is Resolved • The choice of the serial algorithm created the load imbalance • [Figure: load imbalance in the original code]

  17. UNRES Start-up Time Is Improved • MPI_Bcast time is reduced by nearly 5x by optimizing the start-up routines (262.50 s in the original code vs 54.01 s in the optimized code)

  18. Tracing • Captures run-time events • Timestamp, process, thread, and event type are recorded • Enter/leave of functions for each process/thread • MPI sender, receiver, length, tag, communicator • Tracing preserves the context • Temporal and spatial relationships • Traces can become very large • May cause perturbation

  19. TAU Tracing and Vampir Visualization
      To generate TAU trace files:
      • Instrument the code with TAU normally
      • setenv TAU_TRACE 1
      • Run normally to generate the *.trc and *.edf files
      • % tau_treemerge.pl   (produces the merged tau.trc and tau.edf files)
      • % tau2otf tau.trc tau.edf app.otf
      Trace visualization and analysis:
      • % vampir app.otf   (or the vng client with a vngd server)
      • Vampir is available on bigred and quarry at IU
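
The trace-generation steps above can be strung together in one bash script. This is a sketch under stated assumptions: `./hello` is a hypothetical instrumented binary, and the whole pipeline is skipped when it or the TAU merge tool is not found.

```shell
#!/bin/bash
# Sketch: generate, merge, and convert TAU traces (bash syntax).
# ./hello is a hypothetical instrumented binary.
export TAU_TRACE=1    # enable event tracing for the next run

if [ -x ./hello ] && command -v tau_treemerge.pl >/dev/null 2>&1; then
    ./hello                            # writes the per-process *.trc and *.edf files
    tau_treemerge.pl                   # merges them into tau.trc and tau.edf
    tau2otf tau.trc tau.edf app.otf    # converts the merged trace to OTF for Vampir
else
    echo "instrumented binary or TAU tools not found; skipping trace run"
fi
```

Since traces can become very large, it is usually worth running this on a short test case first.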

  20. UNRES Trace in Timeline View • Intuitive navigation and zooming help to quickly identify inefficient or faulty parts of a code • [Screenshot labels: MPI messages, thumbnail]

  21. Activities in Node #4 in a specific time interval

  22. EGB calculation in processes 9–31 starts later than in processes 0–8

  23. References
      • TAU
      • http://tau.uoregon.edu
      • http://www.cs.uoregon.edu/research/tau/docs.php
      • http://www.cs.uoregon.edu/research/tau/docs/newguide/re01.html
      • http://www.cs.uoregon.edu/research/tau/docs/newguide/bk03apa.html#d0e15219
      • http://www.cs.uoregon.edu/research/tau/docs/scenario/index.html
      • POINT: Productivity from Open, Integrated Tools
      • http://www.nic.uoregon.edu/point
      • http://www.psc.edu/general/software/packages/tau/TAU-quickref.pdf
      • IU’s Vampir documentation:
      • http://www.pti.iu.edu/hpa/vampir-workshop
