
High Performance Computing on Flux EEB 401

High Performance Computing on Flux EEB 401. Charles J Antonelli, Mark Champe, LSAIT ARS, September 2014. Flux is a university-wide shared computational discovery / high-performance computing service, provided by Advanced Research Computing at U-M and operated by CAEN HPC.


Presentation Transcript


  1. High Performance Computing on Flux EEB 401 • Charles J Antonelli • Mark Champe • LSAIT ARS • September 2014

  2. Flux Flux is a university-wide shared computational discovery / high-performance computing service. • Provided by Advanced Research Computing at U-M • Operated by CAEN HPC • Procurement, licensing, billing by U-M ITS • Interdisciplinary since 2010 http://arc.research.umich.edu/resources-services/flux/ cja 2014

  3. The Flux cluster • Login nodes • Compute nodes • Data transfer node • Storage …

  4. A Flux node • 48 or 64 GB RAM • 12 or 16 Intel cores • Local disk • Network

  5. Programming Models • Two basic parallel programming models • Message-passing: The application consists of several processes running on different nodes and communicating with each other over the network • Used when the data are too large to fit on a single node, and simple synchronization is adequate • “Coarse parallelism” • Implemented using MPI (Message Passing Interface) libraries • Multi-threaded: The application consists of a single process containing several parallel threads that communicate with each other using synchronization primitives • Used when the data can fit into a single process, and the communications overhead of the message-passing model is intolerable • “Fine-grained parallelism” or “shared-memory parallelism” • Implemented using OpenMP (Open Multi-Processing) compilers and libraries • Both
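The two models are built and launched differently. A minimal shell sketch, under the assumption that an MPI toolchain and GCC are available (the source files `hello_mpi.c` and `hello_omp.c` are hypothetical examples, not part of the course materials):

```shell
# Message-passing (MPI): one executable, several communicating processes.
mpicc hello_mpi.c -o hello_mpi         # compile and link against the MPI library
mpirun -np 12 ./hello_mpi              # launch 12 processes, possibly across nodes

# Multi-threaded (OpenMP): one process, several threads sharing memory.
gcc -fopenmp hello_omp.c -o hello_omp  # compile with OpenMP support enabled
OMP_NUM_THREADS=12 ./hello_omp         # run with 12 threads on a single node
```

This is a generic sketch, not a Flux-specific recipe; on Flux the compilers and MPI come from the modules you load.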

  6. Command Line Reference William E Shotts, Jr., “The Linux Command Line: A Complete Introduction,” No Starch Press, January 2012. http://linuxcommand.org/tlcl.php . Download a Creative Commons licensed version at http://downloads.sourceforge.net/project/linuxcommand/TLCL/13.07/TLCL-13.07.pdf .

  7. Using Flux • Three basic requirements: a Flux login account, a Flux allocation, and an MToken (or a Software Token) • Logging in to Flux: ssh login@flux-login.engin.umich.edu • Requires campus wired network, MWireless, or VPN • Otherwise, ssh login.itd.umich.edu first

  8. Copying data Three ways to copy data to/from Flux • From Linux or Mac OS X, use scp: • scp localfile login@flux-xfer.engin.umich.edu:remotefile • scp login@flux-login.engin.umich.edu:remotefile localfile • scp -r localdir login@flux-xfer.engin.umich.edu:remotedir • From Windows, use WinSCP • U-M Blue Disc http://www.itcs.umich.edu/bluedisc/ • Use Globus Connect

  9. Globus Online • Features • High-speed data transfer, much faster than scp or WinSCP • Reliable & persistent • Minimal client software: Mac OS X, Linux, Windows • GridFTP Endpoints • Gateways through which data flow • Exist for XSEDE, OSG, … • UMich: umich#flux, umich#nyx • Add your own client endpoint! • Add your own server endpoint: contact flux-support@umich.edu • More information • http://cac.engin.umich.edu/resources/login-nodes/globus-gridftp

  10. Batch workflow • You create a batch script and submit it to PBS (the cluster resource manager & scheduler) • PBS schedules your job, and it enters the flux queue • When its turn arrives, your job will execute the batch script • Your script has access to any applications or data stored on the Flux cluster • When your job completes, anything it sent to standard output and error is saved and returned to you • You can check on the status of your job at any time, or delete it if it’s not doing what you want • A short time after your job completes, it disappears
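Put together, the smallest useful batch script looks something like the sketch below; the job name is arbitrary and `youralloc_flux` is a placeholder for your actual allocation:

```shell
#!/bin/bash
# Minimal sketch of a PBS batch script (allocation name is a placeholder).
#PBS -N hello_flux                          # job name
#PBS -A youralloc_flux                      # your Flux allocation
#PBS -l qos=flux
#PBS -q flux
#PBS -l procs=1,pmem=1gb,walltime=00:05:00  # one core, 1 GB, 5 minutes
#PBS -j oe                                  # merge stdout and stderr into one file

cd $PBS_O_WORKDIR                           # run from the submission directory
echo "Hello from $(hostname)"
```

Submitted with qsub, a script like this would write the compute node's hostname into the job's output file.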

  11. Basic batch commands • Once you have a script, submit it: qsub scriptfile $ qsub singlenode.pbs 6023521.nyx.engin.umich.edu • You can check on the job status: qstat jobid or qstat -u user $ qstat -u cja nyx.engin.umich.edu: Req'd Req'd Elap Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - ----- 6023521.nyx.engi cja flux hpc101i -- 1 1 -- 00:05 Q -- • To delete your job: qdel jobid $ qdel 6023521

  12. Loosely-coupled batch script #PBS -N yourjobname #PBS -V #PBS -A youralloc_flux #PBS -l qos=flux #PBS -q flux #PBS -l procs=12,pmem=1gb,walltime=01:00:00 #PBS -M youremailaddress #PBS -m abe #PBS -j oe # Your Code Goes Below: cd $PBS_O_WORKDIR mpirun ./c_ex01

  13. Tightly-coupled batch script #PBS -N yourjobname #PBS -V #PBS -A youralloc_flux #PBS -l qos=flux #PBS -q flux #PBS -l nodes=1:ppn=12,mem=47gb,walltime=02:00:00 #PBS -M youremailaddress #PBS -m abe #PBS -j oe # Your Code Goes Below: cd $PBS_O_WORKDIR matlab -nodisplay -r script

  14. Flux software • Licensed and open software: • Abacus, BLAST, BWA, bowtie, ANSYS, Java, Mason, Mathematica, Matlab, R, RSEM, STATA SE, … • See http://cac.engin.umich.edu/resources • C, C++, Fortran compilers: • Intel (default), PGI, GNU toolchains • You can choose software using the module command

  15. Modules • The module command allows you to specify what versions of software you want to use • module list -- Show loaded modules • module load name -- Load module name for use • module show name -- Show info for name • module avail -- Show all available modules • module avail name -- Show versions of module name* • module unload name -- Unload module name • module -- List all options • Enter these commands at any time during your session • A configuration file allows default module commands to be executed at login • Put module commands in file ~/privatemodules/default • Don’t put module commands in your .bashrc / .bash_profile
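A hypothetical ~/privatemodules/default might look like the sketch below. On many systems this file is itself a modulefile and begins with the `#%Module` magic line; the specific modules loaded here are examples, not a recommended set:

```shell
#%Module
# ~/privatemodules/default: module commands executed automatically at login,
# instead of placing them in .bashrc / .bash_profile.
module load med
module load ncbi-blast
```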

  16. Flux storage • Lustre filesystem mounted on /scratch on all login, compute, and transfer nodes • 640 TB of short-term storage for batch jobs • Large, fast, short-term • NFS filesystems mounted on /home and /home2 on all nodes • 80 GB of storage per user for development & testing • Small, slow, long-term
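In practice this means staging job data onto /scratch and keeping only source and results in /home. A sketch, with an assumed /scratch directory layout (check the actual path for your allocation):

```shell
# Stage job I/O on the fast Lustre scratch filesystem (path layout assumed).
JOBDIR=/scratch/youralloc_flux/$USER/myjob
mkdir -p "$JOBDIR"
cp ~/input.dat "$JOBDIR"/   # copy input from slow, long-term /home
cd "$JOBDIR"                # run the job against fast, short-term storage
```

Because /scratch is short-term storage, results worth keeping should be copied back to /home when the job finishes.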

  17. Flux environment • The Flux login nodes have the standard GNU/Linux toolkit: • make, perl, python, java, emacs, vi, nano, … • Watch out for source code or data files written on non-Linux systems • Use these tools to analyze and convert source files to Linux format • file • dos2unix
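For example, a file written on Windows carries CRLF line endings that can confuse Linux tools; `file` detects this, and `dos2unix` (or a portable `tr` one-liner, shown here since dos2unix may not be installed everywhere) repairs it:

```shell
# Create a file with DOS (CRLF) line endings, detect it, and convert it.
printf 'line one\r\nline two\r\n' > dos.txt
file dos.txt                     # should report "with CRLF line terminators"
tr -d '\r' < dos.txt > unix.txt  # strip the carriage returns
file unix.txt                    # now plain ASCII text
```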

  18. BLAST • Load modules: • module unload intel-comp openmpi gcc • module load med python/3.2.3 gcc boost/1.54.0-gcc ncbi-blast/2.2.29 • Create file ~/.ncbirc, with contents: [BLAST] BLASTDB=/nfs/med-ref-genomes/blast • Copy sample code to your home directory: • cd • cp ~cja/hpc/eeb401-sample-code.tar.gz . • tar -zxvf eeb401-sample-code.tar.gz • cd ./eeb401-sample-code

  19. BLAST • Examine blast-example.pbs • Edit with your favorite Linux editor • emacs, vi, pico, … • Change email address bjensen@umich.edu to your own

  20. BLAST • Submit your job to Flux: qsub blast-example.pbs • Watch the progress of your job: qstat jobid • When complete, look at the job’s output: less blast-example.ojobid

  21. BWA module load med samtools module load med ncbi-blast module load med bowtie # optional module load med bwa

  22. Bowtie module load med bowtie

  23. RSEM module load R/3.0.1 module load lsa rsem module load med bowtie Note: loading R/3.0.1 unloads gcc/4.7.0 and loads gcc/4.4.6

  24. Perl scripts module load lsa baucom-bioinformatics module show baucom-bioinformatics

  25. Interactive jobs • You can submit jobs interactively: qsub -I -X -V -l procs=2 -l walltime=15:00 -A youralloc_flux -l qos=flux -q flux • This queues a job as usual • Your terminal session will be blocked until the job runs • When your job runs, you'll get an interactive shell on one of your nodes • Invoked commands will have access to all of your nodes • When you exit the shell your job is deleted • Interactive jobs allow you to • Develop and test on cluster node(s) • Execute GUI tools on a cluster node • Utilize a parallel debugger interactively

  26. Interactive BLAST • Load modules: • module unload gcc openmpi • module load med gcc ncbi-blast • Start an interactive PBS session: qsub -I -V -l nodes=1:ppn=2 -l walltime=1:00:00 -A eeb401f14_flux -l qos=flux -q flux • Run BLAST in the interactive shell: • cd $PBS_O_WORKDIR • blastdbcmd -db refseq_rna -entry nm_000249 -out test_query.fa • blastn -query test_query.fa -db refseq_rna -task blastn -dust no -outfmt 7 -num_alignments 2 -num_descriptions 2 -num_threads 2

  27. Gaining insight • There are several commands you can run to get some insight into when your job will start: • freenodes : shows the total number of free nodes and cores currently available on Flux • mdiag -a youralloc_name : shows cores and memory defined for your allocation and who can run against it • showq -w acct=yourallocname : shows cores being used by jobs running against your allocation (running/idle/blocked) • checkjob -v jobid : can show why your job might not be starting • showstart -e all jobid : gives you a coarse estimate of job start time; use the smallest value returned

  28. Some Flux Resources • http://arc.research.umich.edu/resources-services/flux/ • U-M Advanced Research Computing Flux pages • http://cac.engin.umich.edu/ • CAEN HPC Flux pages • http://www.youtube.com/user/UMCoECAC • CAEN HPC YouTube channel • For assistance: hpc-support@umich.edu • Read by a team of people including unit support staff • Cannot help with programming questions, but can help with operational Flux and basic usage questions

  29. Any Questions? • Charles J. Antonelli • LSAIT Advocacy and Research Support • cja@umich.edu • http://www.umich.edu/~cja • 734 763 0607
