

  1. Getting Started With The NPACI Grid & NPACKage • Shannon Whitmore • swhitmor@sdsc.edu • http://npacigrid.npaci.edu • http://npackage.npaci.edu

  2. Overview • Introduction • Getting Started on the NPACI Grid • Tutorial

  3. Defined • NPACI Grid • Hardware, software, network, and data resources at • San Diego Supercomputer Center • Texas Advanced Computing Center • University of Michigan, Ann Arbor • California Institute of Technology – coming soon • NPACKage • An integrated set of grid middleware and advanced NPACI applications

  4. Grid Resources • Blue Horizon (SDSC) • IBM POWER3-based clustered SMP system • 1,152 processors, 576 GB main memory • Longhorn (TACC) • IBM POWER4 system • 224 processors and 512 GB aggregate memory • Hypnos & Morpheus (UMichigan) • AMD-based Linux clusters • Hypnos: 128 nodes; Morpheus: 50 nodes • Each SMP node: two CPUs & one GB memory per processor

  5. Why use the NPACI Grid? • Simplifies job submission • Globus: common scripting language for job submission • Condor-G: launch and monitor jobs from one site • Combines local resources with SC resources • Run small jobs locally, large jobs remotely • Enables portal development • Single point of access for tools • Simplifies complex interfaces for users

  6. Why use the NPACI Grid? (cont’d) • Provides tools for distributed data management and analysis • SRB • Datacutter • Provides single sign-on capabilities

  7. Caveats • Resources are intended for large jobs • Avoid running small jobs on the batch queues • Plan in advance for large runs: request machine allocations • Distributed jobs cannot be co-scheduled to run concurrently across the batch resources

  8. NPACKage Components

  9. Why use NPACKage? • Easier to port applications • Components tested before release • Consulting support available • Consistent packaging • Simplified installation/configuration process • Single web site for all documentation • Install NPACKage on your system today!

  10. Accessing The Grid • http://npacigrid.npaci.edu/user_getting_started.html

  11. Accounts • Need an NPACI account? http://npacigrid.npaci.edu/expedited_account.html • Need an account extension? http://npacigrid.npaci.edu/account_extension_request.html • Username does not start with “ux”? Contact consult@npaci.edu

  12. Login Nodes • SDSC (Blue Horizon) • tf004i.sdsc.edu & tf005i.sdsc.edu (batch) • b80n01.sdsc.edu - b80n13.sdsc.edu (interactive) • TACC • longhorn.tacc.utexas.edu (batch & interactive) • Michigan • hypnos.engin.umich.edu (batch & interactive) • morpheus.engin.umich.edu (batch & interactive)

  13. Set up your environment • Add the following to your shell initialization file on all NPACI Grid hosts • For csh-based shells:

    if ( ! $?NPACI_GRID_CURRENT ) then
        alias . source
        setenv NPACI_GRID_CURRENT /usr/npaci-grid-1.1
        . $NPACI_GRID_CURRENT/setup.csh
    endif

• For Bourne-based shells:

    if [ "x$NPACI_GRID_CURRENT" = "x" ]; then
        export NPACI_GRID_CURRENT=/usr/npaci-grid-1.1
        . $NPACI_GRID_CURRENT/setup.sh
    fi
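
To confirm the setup took effect, a quick check, assuming setup.sh/setup.csh export NPACI_GRID_CURRENT and put the Globus client tools on your PATH, might look like:

    # Should print /usr/npaci-grid-1.1 if the initialization file was sourced
    echo $NPACI_GRID_CURRENT
    # Should resolve to a path under the grid installation
    which grid-proxy-init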

  14. Certificates • Required to use the NPACI Grid • Used for authentication and encryption • Enables single sign-on capabilities • On cert.npaci.edu, run /usr/local/apps/pki_apps/cacl • Generates an X.509 certificate • Creates your Distinguished Name (DN), a globally unique ID identifying you as an individual

  15. Certificates (cont’d) • Copy your .globus directory to all sites • Script: http://npacigrid.npaci.edu/Examples/copycert.sh • Wait for your DN to propagate into each site’s grid-mapfile, which maps local usernames on a given host to a DN (a hand-rolled alternative to the script is sketched below)
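
If the copycert.sh script is not handy, a minimal hand-rolled equivalent is sketched below; the login-node list is illustrative, and it assumes your username is the same at every site:

    #!/bin/sh
    # Copy the certificate directory to each grid login node
    for host in tf004i.sdsc.edu longhorn.tacc.utexas.edu \
                hypnos.engin.umich.edu morpheus.engin.umich.edu; do
        scp -r ~/.globus $USER@$host:
    done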

  16. Verify Grid Access • Connect to any login node: ssh longhorn.tacc.utexas.edu -l <username> • grid-proxy-init • Generates a proxy certificate • Provides single sign-on capability • Proxies are valid for one day • grid-proxy-destroy • Removes the proxy (a sample session follows below)
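
As a sketch, a typical proxy session looks like this; grid-proxy-info is the standard companion command for inspecting an existing proxy:

    grid-proxy-init       # prompts for your certificate pass phrase
    grid-proxy-info       # shows the proxy subject and remaining lifetime
    # ... run grid jobs; the proxy remains valid for one day ...
    grid-proxy-destroy    # remove the proxy when finished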

  17. Verify Grid Access (cont’d) • Authenticate your certificate at each site • globusrun -a -r hypnos.engin.umich.edu • globusrun -a -r morpheus.engin.umich.edu • globusrun -a -r longhorn.tacc.utexas.edu • globusrun -a -r tf004i.sdsc.edu • Problems? Contact us: • http://npacigrid.npaci.edu/contacts.html

  18. Tutorial: Clients and Services • http://npacigrid.npaci.edu/tutorial.html

  19. Overview • Running Jobs • Using Globus • Using Condor-G • Transferring Files • Resource and Monitoring Services • MDS/Ganglia • NWS

  20. Gatekeeper • Handles globus job requests at a remote site • Manages authentication and security • Routes job requests to a jobmanager • Exists on all login nodes

  21. Jobmanager • Manages job launching • Two jobmanagers on each gatekeeper host • Interactive • jobmanager-fork – default • Batch – interface to local schedulers • jobmanager-loadleveler - longhorn & horizon • jobmanager-pbs - hypnos & morpheus
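
For example, appending the jobmanager name to the hostname in a job request selects between interactive and batch execution; this sketch uses /bin/date as a stand-in workload (batch jobmanagers may also require the RSL parameters listed on slide 28):

    # Default jobmanager (fork): runs interactively on the login node
    globus-job-run longhorn.tacc.utexas.edu /bin/date
    # Explicit batch jobmanager: routes the request through LoadLeveler
    globus-job-run longhorn.tacc.utexas.edu/jobmanager-loadleveler /bin/date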

  22. Globus clients • Three commands for remote job submission • globus-job-submit • globus-job-run • globusrun

  23. globus-job-submit • Runs in the background • Returns a contact string • Output from each job is stored locally under $HOME/.globus/.gass_cache/… • Example: globus-job-submit morpheus.engin.umich.edu /bin/date

  24. globus-job-submit (cont’d) • Supporting commands • globus-job-status <contact-string> • globus-job-getoutput <contact-string> • globus-job-clean <contact-string>
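
Putting these together, an end-to-end background submission might look like the sketch below; the contact string shown is illustrative, so copy whatever globus-job-submit actually prints:

    # Submit in the background; the command prints a contact string
    globus-job-submit morpheus.engin.umich.edu /bin/date
    # e.g. https://morpheus.engin.umich.edu:40001/12345/1042000000/

    CONTACT='https://morpheus.engin.umich.edu:40001/12345/1042000000/'
    globus-job-status $CONTACT       # poll until the job reports DONE
    globus-job-getoutput $CONTACT    # retrieve output from the GASS cache
    globus-job-clean $CONTACT        # remove cached job state when finished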

  25. globus-job-run • Runs in foreground • Provides executable staging • Output delivered directly • Example • globus-job-run hypnos.engin.umich.edu/jobmanager-pbs /bin/hostname

  26. globusrun • Main command for submitting globus jobs • Uses the Resource Specification Language for specifying job options • Examples: • globusrun -f b80.rsl • globusrun -r hypnos.engin.umich.edu -f myjob.rsl

  27. Sample RSL File

    + ( &(resourceManagerContact="longhorn.tacc.utexas.edu/jobmanager-loadleveler")
         (max_wall_time=45)
         (queue=normal)
         (max_memory=10)
         (directory=/paci/sdsc/ux444444/JobOutput)
         (executable=/bin/date)
         (stdout=longhorn-output)
         (stderr=longhorn-error)
      )

  28. Required RSL Parameters • LoadLeveler at Texas (longhorn): (queue=normal) (max_wall_time=45) (max_memory=10) • LoadLeveler at SDSC, b80’s: (queue=interactive) (max_wall_time=45) (environment=(MP_EUIDEVICE en0)) • LoadLeveler at SDSC, tf004i/tf005i: (queue=normal) (max_wall_time=45) • PBS at Michigan, hypnos: (queue=route) (max_wall_time=45) (email_address=your@email) • PBS at Michigan, morpheus: (queue=npaci) (max_wall_time=45) (email_address=your@email)

  29. Condor-G • Provides job submission & monitoring from a single site: griddle.sdsc.edu • Handles file transfers & job I/O • Uses Globus to launch jobs • Provides a tool (DAGMan) for handling job dependencies

  30. Condor Submit Description File

    # path to executable on remote host
    executable = /paci/sdsc/ux444444/hello.sh
    # do not stage executable from local to remote host
    transfer_executable = false
    # host and jobmanager where job is to be submitted
    globusscheduler = longhorn.tacc.utexas.edu/jobmanager-fork
    # condor-g always uses the globus universe
    universe = globus
    # local files where output, error, and logs will be placed
    output = hello.out
    error = hello.error
    log = hello.log
    # submit the job
    Queue

  31. Condor-G Commands • condor_submit to launch a job: condor_submit <description_file> • condor_q to monitor jobs • condor_rm to remove jobs: condor_rm <id> for one job, condor_rm -all for all of your jobs (a sample session follows below)
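
As a sketch, a session using the description file from the previous slide, saved here under an assumed name hello.condor, might be:

    condor_submit hello.condor    # prints the cluster id assigned to the job
    condor_q                      # watch the job progress in the queue
    condor_rm 42                  # remove the job by id if needed (42 is illustrative)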

  32. DAGMan • Metascheduler for Condor-G jobs • Directed acyclic graph (DAG) represents jobs & dependencies • Nodes (vertices) are jobs in the graph • Edges (arcs) identify dependencies • Commands • condor_submit_dag <DAGMan Input File> • condor_q -dag

  33. DAGMan Input File • Required • Job names and their corresponding Condor submit description files for each node in the DAG • Dependency description • Optional • Preprocessing and postprocessing before or after job submission • Number of times to retry if a node within the DAG fails

  34. Example DAGMan Input File

    Job A longhorn.condor
    Job B morpheus.condor
    Job C hypnos.condor
    Job D horizon.condor
    PARENT A CHILD B C
    PARENT B C CHILD D
    Retry C 3

  35. Description Files • longhorn.condor:

    universe = globus
    executable = /bin/hostname
    transfer_executable = false
    globusscheduler = longhorn.tacc.utexas.edu/jobmanager-fork
    output = job.$(cluster).out
    error = job.$(cluster).err
    log = job.$(cluster).log
    Queue

• For the hypnos, morpheus, and tf004i files, replace the globusscheduler value appropriately (one variant is sketched below)
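
For instance, a morpheus.condor file would differ only in the globusscheduler line; this variant is a sketch and uses the interactive fork jobmanager from slide 21:

    # morpheus.condor: identical to longhorn.condor except for the target
    universe = globus
    executable = /bin/hostname
    transfer_executable = false
    globusscheduler = morpheus.engin.umich.edu/jobmanager-fork
    output = job.$(cluster).out
    error = job.$(cluster).err
    log = job.$(cluster).log
    Queue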

  36. File Transfer • GridFTP • Defines a protocol • Provides GSI authentication, partial-file and parallel transfers, etc. • Programs • Server: gsiftp – extends FTP with GSI authentication • Client: globus-url-copy

  37. globus-url-copy • globus-url-copy <fromURL> <toURL> • Accepted URLs • For local files: file:<full path> • For remote files: gsiftp://<hostname><path> • Also accepts http://, ftp://, https:// • Example (a third-party variant is sketched below):

    globus-url-copy file:/paci/sdsc/ux444444/myfile \
        gsiftp://longhorn.tacc.utexas.edu/~/newfile
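
Because both URLs may be gsiftp, globus-url-copy can also move data directly between two remote sites (a third-party transfer). A sketch with illustrative paths:

    # Data flows between the two GridFTP servers directly; only control
    # traffic passes through the machine running the command
    globus-url-copy \
        gsiftp://longhorn.tacc.utexas.edu/~/results.dat \
        gsiftp://tf004i.sdsc.edu/~/results.dat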

  38. gsiscp • Not GridFTP-based • Uses GSI authentication • Specify the GSISSH server port for single sign-on capabilities • Example:

    gsiscp -P 1022 setup* \
        ux444444@morpheus.engin.umich.edu:.

  39. Resource & Discovery Services • Publishes system and application data • Components • Globus MDS – Monitoring and Discovery Services • Ganglia – for clusters • NWS – network monitoring • Useful for grid middleware: resource discovery & selection • Useful for grid applications: configuration and real-time adaptation

  40. Graphical MDS Views • On the web: • https://hotpage.npaci.edu/ • https://hotpage.npaci.edu/cgi-bin/grid_view.cgi • http://npackage.cs.ucsb.edu/ldapbrowser/login.php • Download and run your own LDAP browser, e.g. http://www.iit.edu/~gawojar/ldap/ • NPACI Grid MDS Info (a command-line query is sketched below) • LDAP Host: giis.npaci.edu • Port: 2132 • Base DN: Mds-Vo-name=npaci,o=Grid
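
Since MDS is LDAP-based, any LDAP client can query it directly. A sketch using OpenLDAP’s ldapsearch with the connection details above; the wildcard filter simply returns every entry under the base DN:

    # Anonymous query against the NPACI GIIS
    ldapsearch -x -h giis.npaci.edu -p 2132 \
        -b "Mds-Vo-name=npaci,o=Grid" "(objectclass=*)"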

  41. Future Work • New NPACKage components • GridPort (ready for next release) • Netsolve (in progress) • NPACI Alpha Project Integration • MCELL, Telesciences, Geosciences, Protein Folding are all in progress • Scalable Viz, PDB Data Resource, Computational Electromicroscopy coming soon

  42. Grid Consulting • Services • Assist with troubleshooting • Evaluate your application for use on the grid • Assist with porting • We are actively looking for applications! Contact us: http://npacigrid.npaci.edu/contacts.html
