
  1. Toward Global HPC Platforms • Gabriel Mateescu Research Computing Support Group National Research Council Canada Gabriel.Mateescu@nrc.ca www.sao.nrc.ca/~gabriel/hpcs

  2. Agenda • HPC applications for computational grids • Building Legion parallel programs • File staging and job running • MPI Performance • Job scheduling • PBS interoperability • Conclusions

  3. Wide Area Parallel Programs • Some HPC applications benefit from running under Legion • Tolerate high latencies • Consist of many loosely coupled subtasks • Either use or generate very large amounts of data, e.g., terabytes • Examples: • parameter space studies • Monte Carlo simulations • particle physics, astronomy • Avaki HPC is the new brand name of the Legion grid middleware

  4. Build Legion MPI Programs • Sample makefile for building and registering the MPI program called myprog • CC = cc • CFLAGS = -mips4 • myprog: myprog.c • $(CC) $(CFLAGS) -I$(LEGION)/include/MPI -c $?; • legion_link -o $@ $@.o -mpi; • legion_mpi_register $@ ./$@ $(LEGION_ARCH); • One must use Legion's mpi.h • Issue the command make myprog
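
  For reference, a minimal sketch of what myprog.c might contain (hypothetical contents; the slides do not show the source). Only standard MPI calls are used, so the same file builds against Legion's mpi.h or a native MPI header:

    /* myprog.c -- hypothetical minimal MPI program used in the build examples */
    #include <stdio.h>
    #include <mpi.h>   /* with the makefile above, -I$(LEGION)/include/MPI selects Legion's mpi.h */

    int main(int argc, char **argv)
    {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        printf("Hello from process %d of %d\n", rank, size);

        MPI_Finalize();
        return 0;
    }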

  5. Remote Build • Register the makefile • legion_register_program legion_makefile \ • ./makefile sgi • legion_register_program legion_makefile \ • ./makefile linux • Remote make, on a selected architecture • legion_run -a sgi \ • -IN myprog.c \ • -OUT myprog legion_makefile

  6. Native MPI Programs • Need to configure the host for running native MPI • Create the class /class/legion_native_mpi_backend • legion_native_mpi_init $LEGION_ARCH • Configure the host to invoke the native mpirun • legion_native_mpi_config_host hosts/nickel \ $LEGION/bin/${LEGION_ARCH}/legion_native_mpisgi_wrapper

  7. Build Native MPI Programs • Sample makefile • CC = cc • CFLAGS = -mips4 • myprog_native: myprog.c • $(CC) $(CFLAGS) -o $@ $? -lmpi; • legion_native_mpi_register $@ ./$@ $(LEGION_ARCH); • Run the program under Legion • legion_native_mpi_run -n 4 mpi/programs/myprog

  8. File Staging • Collect input files with the -IN option and output files with the -OUT option to legion_run and legion_mpi_run • Need to know the names of the files created by the program • An options file is useful when legion_run must stage multiple files • -IN par.in -OUT res.out • Wild cards are not allowed in the options file • A specification file is useful with legion_run_multi • # keyword(IN,OUT) file_name pattern • IN par.in /mydir/par*.in • OUT res.out /mydir/res*.out

  9. …File Staging • For MPI programs, use -a to get output from all processes • Examples • legion_mpi_run -n 4 -a -OUT file.out mpi/programs/myprog • legion_run -OUT file.out home/gabriel/myprog • legion_run -f opt_file home/gabriel/myprog • legion_run_multi -f spec_file -n 2 \ home/gabriel/myprog
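
  To make the staging options concrete, here is a hedged sketch of an application matching the par.in / res.out names used in the options-file example (hypothetical; the actual program is not shown in the deck). The point is that the program writes its result under a name known in advance, so the file can be listed after -OUT:

    /* stage_demo.c -- hypothetical program illustrating -IN/-OUT file staging   */
    /* (the names par.in and res.out are taken from the options-file example)    */
    #include <stdio.h>

    int main(void)
    {
        FILE *in, *out;
        double p = 0.0;

        /* read the staged-in parameter file */
        in = fopen("par.in", "r");
        if (in == NULL || fscanf(in, "%lf", &p) != 1) {
            fprintf(stderr, "cannot read par.in\n");
            return 1;
        }
        fclose(in);

        /* write the result under a fixed, known name so -OUT can copy it back */
        out = fopen("res.out", "w");
        if (out == NULL)
            return 1;
        fprintf(out, "%f\n", 2.0 * p);
        fclose(out);
        return 0;
    }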

  10. Capturing Standard Output • Method 1: Use a tty object • legion_tty tty1 • legion_mpi_run -n 2 home/gabriel/myprog_run • legion_tty_off • Method 2: Redirect standard output to a file • Option -A -STDOUT to legion_mpi_run • Option -stdout (to redirect) and -OUT (to copy back) to legion_run • legion_mpi_run -n 4 -A -STDOUT out mpi/programs/myprog • legion_run -stdout std.out -OUT std.out myprog_run

  11. Debugging • View MPI program instances • legion_context_list mpi/instances/myprog • Probe MPI jobs with legion_mpi_probe • legion_mpi_probe mpi/instances/myprog • Trace Legion MPI jobs with legion_mpi_debug • legion_mpi_debug -q -c mpi/instances/myprog • Trace execution of commands with the -v (verbose) option • legion_mpi_run -n 8 -v -v -v mpi/programs/myprog

  12. Debugging • Find where an object is located with legion_whereis • ps -ef | grep $LEGION_OPR/Cached-myprog-Binary-version • Probe jobs • legion_run -nonblock -p probe_file myprog • legion_probe_run -p probe_file -statjob • legion_probe_run -p probe_file -kill • Some commands accept the -debug option • Error messages not always helpful • require knowledge of Legion internals

  13. MPI Performance • Platform: SGI Origin 2000 • 4 x R12K 400 MHz CPUs • Instruction and data caches: 32 KB • L2 cache: 8 MB • Main memory: 2 GB • Average latency from processor 0 to the other three processors • Native MPI: ~8 microseconds • Legion MPI: ~1900 microseconds
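
  A sketch of the kind of ping-pong microbenchmark commonly used to obtain such latency numbers (an assumption; the deck does not show how the measurement was made). Rank 0 exchanges a one-byte message with rank 1 many times and reports half the average round-trip time; running it under both native mpirun and legion_mpi_run reproduces the comparison. Increasing the message size in the same loop would give a bandwidth comparison like the one on the next slide.

    /* pingpong.c -- hypothetical latency microbenchmark (not from the original deck) */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, i;
        char byte = 0;
        const int reps = 1000;
        double t0, t1;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (i = 0; i < reps; i++) {
            if (rank == 0) {        /* rank 0 sends, then waits for the echo */
                MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
            } else if (rank == 1) { /* rank 1 echoes the message back */
                MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
                MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();

        if (rank == 0)
            printf("average one-way latency: %.1f microseconds\n",
                   1.0e6 * (t1 - t0) / (2.0 * reps));

        MPI_Finalize();
        return 0;
    }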

  14. MPI Bandwidth • [bandwidth chart not reproduced in the transcript]

  15. Host Types • Interactive host: Legion starts a job on an interactive host as a time-shared job • Batch queue host: Legion submits the job to a batch queuing and scheduling system, such as PBS • Determine the type of a host with the command • legion_list_attributes -c hosts/nickel \ • host_property host_queue_type • The attribute host_property has the value 'interactive' or 'queue'

  16. Interactive Host Scheduling • [diagram: Scheduler, Enactor, Collection, and hosts hostA, hostB, hostC]

  17. Interactive Host Scheduling • Legion can pick a set of hosts for running a job, but it does not appear to include particularly good scheduling algorithms • Legion may split a parallel job across two hosts even when a single SMP has enough resources to run the whole job • The user can create a host file specifying candidate hosts • cat hf_monet • /hosts/monet.CERCA.UMontreal.CA 4 • /hosts/nickel 1 • The option -HF to legion_mpi_run specifies the host file • legion_mpi_run -n 2 -HF hf_monet mpi/programs/hello

  18. Interactive Host Scheduling • A performance description of the architectures is converted to a host file for Legion scheduling • % cat perf • sgi 2.0 • % legion_make_schedule -f perf mpi/programs/myprog \ • > hf_file • The file hf_file can be used along with the option -HF to legion_mpi_run • % legion_mpi_run -n 2 -HF hf_file mpi/programs/myprog

  19. Batch Scheduling • Instead of relying on Legion scheduling, or specifying the set of hosts "by hand", use a batch scheduling system, e.g., PBS • Why use PBS? • Smart scheduling, job monitoring, and restarting • Combine the ubiquitous access provided by Legion with the efficient execution and communication obtained from locality and resource allocation • Legion MPI does not have good performance • Think globally, and act locally • ${LEGION} must be visible or copied to all PBS nodes

  20. Running on a Batch Host • A batch host has the attributes • legion_list_attributes -c hosts/nickel_pbs \ • host_property host_queue_type • host_property('queue') • host_queue_type('PBS') • Make sure that the job to be run on the batch host does not have the attribute desired_host_property with the value 'interactive'; if it does, delete it with • legion_update_attributes \ -c home/gabriel/mpi/programs/myprog \ -d "desired_host_property('interactive')"

  21. Running on a Batch Host • Run an ordinary job • % legion_run -stdout stdout.o -OUT stdout.o \ -h /hosts/wolf-pbs uname_stdout • For legion_mpi_run, one needs to create a batch host context (associate a context with a batch host; done once) • % legion_mkdir /home/gabriel/context_wolf-pbs • % legion_ln /hosts/wolf-pbs \ /home/gabriel/context_wolf-pbs/wolf-pbs • Run the MPI job with the batch host context • % legion_mpi_run -n 16 -A -STDOUT std.out \ -h /home/gabriel/context_wolf-pbs \ • mpi/programs/myprog

  22. Fault Tolerance • Checking the consistency of the Legion collection is tricky • Legion tools tend to core dump when the bootstrap host does not respond • Watch for core dumps in the current directory or under /tmp • Aborting some commands, e.g., legion_login, may leave stale objects around that confuse Legion • After a while, it is good to log out of Legion and log in again to refresh the working context • Apparently, a crashed MPI program is not restarted

  23. Conclusions • Legion provides good capacity but not so good performance • It is not trivial to run parallel jobs under Legion • The user must specify the staging of input and output files • Limited job information and debugging tools • A legion_ps command is needed • How to peek at the output files? • Too much information hiding? • Legion scheduling of parallel jobs seems suboptimal • Integrating Legion with a batch system that provides scheduling and fault tolerance improves performance and reliability

  24. Acknowledgments • Chris Cutter and Mike Herrick, Avaki • John Karpovich, University of Virginia • Janusz Pipin and Marcin Kolbuszewski, C3.ca and NRC • Roger Impey, NRC