
Genepool Modules: Setting up your environment at NERSC

Douglas Jacobsen, Bioinformatics Computing Consultant


Presentation Transcript


  1. Genepool Modules: Setting up your environment at NERSC. Douglas Jacobsen, Bioinformatics Computing Consultant

  2. Topics • UNIX Environment Basics • Constructing a default environment: dotfiles • Introduction to Modules • Extension to Modules: ModulesReloaded • Using modules interactively • Using modules in a batch job • Constructing basic modules for your software • Constructing pipeline modules

  3. Motivation for this training • The most common tickets at NERSC are issues with environment settings • /jgi/tools is being retired; old settings need to be changed! • The modules system on genepool has been updated to ease the transition and future production work • Example modulefiles are in: • /global/projectb/shared/data/training/modules

  4. The UNIX Environment • What is it? A key/value store for every process • What does the UNIX environment do for you? • Controls which programs you can easily run • PATH • Many Linux systems have a default PATH of: PATH=/usr/local/bin:/usr/bin:/bin • Sets up library search paths so your programs can find their shared libraries at run time • LD_LIBRARY_PATH • Controls how your programs run • MANPATH, PKG_CONFIG_PATH, PS1, OMPI_MCA_ras • Really, the environment is a way for you to communicate with your programs • Useful convenience variables on the command line and in scripts: • SCRATCH, NERSC_HOST, BOOST_ROOT
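
To see the environment as a per-process key/value store, the sketch below starts a child process with an empty environment plus two made-up variables. GREETING and TARGET are illustrative names only. The child then prints everything it received. It assumes the coreutils env program is at /usr/bin/env.

```shell
#!/bin/bash
# Start /usr/bin/env with an empty environment (-i) plus two illustrative
# variables; the child prints exactly the key/value pairs it was given.
env -i GREETING=hello TARGET=world /usr/bin/env
# None of the parent's variables (PATH, HOME, ...) reach the child here.
```

The child should list only the two variables, because its environment was built from scratch instead of being copied from the parent shell.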

  5. The UNIX Environment: The Rules [Figure: example process tree with init, memtime, bash, ls, perl (running $data = `cat $file | sort`), blastx, /bin/sh, cat, and sort] • Each process has its own environment • Each process can manipulate its own environment, but no other process's • A child process inherits its parent's environment • A “login” shell reads special “dotfiles”, which may reset parts of the environment
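
The inheritance rules can be checked directly from bash. In this minimal sketch, DEMO_VAR is an arbitrary illustrative name. A child shell sees the parent's value and can change its own copy, but the change never flows back to the parent.

```shell
#!/bin/bash
export DEMO_VAR="set by parent"

# A child process inherits a copy of the parent's environment...
bash -c 'echo "child sees: $DEMO_VAR"'

# ...and may change its own copy, but that change dies with the child.
bash -c 'export DEMO_VAR="changed by child"; echo "child now has: $DEMO_VAR"'

echo "parent still has: $DEMO_VAR"
```

This rule is also why `module` is a shell function (or a tcsh alias) rather than an ordinary program. The function runs the underlying modulecmd program, which prints shell commands on stdout, and the current shell evaluates them in its own process. A separate program could only ever change its own copy of the environment.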

  6. Looking at the environment
    $ env                  # dump the whole environment
    $ echo $NERSC_HOST     # just see NERSC_HOST
    $ echo $PATH           # view the compound variable PATH
    $ env | grep MODULE    # just variables with 'MODULE'
  We'll be looking at the environment a lot today. env and echo are two easy ways to interrogate the environment from either bash or tcsh. What shell are you using? (Hint: check $SHELL.)
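
A compound variable such as PATH is easier to read with one entry per line. Here is a small sketch; `show_entries` is a hypothetical helper name, not a system command.

```shell
#!/bin/bash
# Print each entry of a colon-separated list (such as PATH) on its own line.
# show_entries is an illustrative helper, not part of the modules system.
show_entries() {
    local -a parts
    IFS=':' read -r -a parts <<< "$1"
    printf '%s\n' "${parts[@]}"
}

echo "Current shell: $SHELL"
show_entries "$PATH"
```

The same helper works for LD_LIBRARY_PATH, MANPATH, or MODULEPATH, for example `show_entries "$MODULEPATH"`.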

  7. Changing the environment
  • bash (default on genepool):
    export MYVAR="test"              # when writing, don't use '$'
    echo $MYVAR                      # when reading, use '$'
    export PATH=$HOME/bin:$PATH      # prepend to your PATH
    export MYVAR="${MYVAR}2"         # append '2' to MYVAR
  • tcsh:
    setenv MYVAR "test"
    echo $MYVAR
    setenv PATH $HOME/bin:$PATH
    setenv MYVAR "${MYVAR}2"
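
The bash commands from this slide can be run as a script, with the effect of each step noted in comments. The braces in `${MYVAR}2` matter, because `$MYVAR2` would read a different variable named MYVAR2.

```shell
#!/bin/bash
# Writing a variable: no '$'. Reading it: '$'.
export MYVAR="test"
echo "$MYVAR"                 # test

# Append to a plain variable; braces mark where the variable name ends.
export MYVAR="${MYVAR}2"
echo "$MYVAR"                 # test2

# Prepend a directory to a compound (colon-separated) variable.
export PATH="$HOME/bin:$PATH"
echo "${PATH%%:*}"            # prints the first PATH entry, i.e. $HOME/bin
```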

  8. NERSC Dotfiles – Your default Environment Pt 1 • When you first log in (or a batch script runs), a login shell is executed • A login shell is started for every job; even if you transmit your environment, the login shell's environment is overlaid on top of the transmitted environment • A login shell sources special files in your home directory: your dotfiles • bash users (files evaluated in this order): • $HOME/.profile (read-only symlink, do not change) • $HOME/.bash_profile.ext (user customizable) • $HOME/.bashrc (read-only symlink, do not change) • $HOME/.bashrc.ext (user customizable) • tcsh users (files evaluated in this order): • $HOME/.tcshrc (read-only symlink, do not change) • $HOME/.tcshrc.ext (user customizable) • $HOME/.login (read-only symlink, do not change) • $HOME/.login.ext (user customizable) • zsh and ksh execute some dotfiles, but NERSC support for them is being phased out • /bin/sh does not properly source the dotfiles (BEWARE!)
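
Only a login shell reads the dotfiles listed above. bash reports this through its built-in `login_shell` shopt option. The sketch below starts one login shell (with `-l`) and one plain non-login shell and asks each which kind it is.

```shell
#!/bin/bash
# shopt -q login_shell succeeds only when bash was started as a login shell.
check='if shopt -q login_shell; then echo login; else echo non-login; fi'

bash -l -c "$check"   # login shell: reads the system profile and your dotfiles
bash -c "$check"      # plain non-interactive, non-login shell: skips them
```

The same `-l` flag on a script's `#!/bin/bash -l` line makes a batch job initialize its login environment.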

  9. Using Software and the UNIX Environment • Providing large-scale installations of software for many different users on an HPC system presents a number of challenges: • Different users need different software and use different shells • Some users need specific versions, including older versions • All users need to access the software quickly and easily from “everywhere” (network-mounted, non-standard paths) • Providing a user interface for accessing that software can be challenging • Example: How would you use software installed in /usr/common/jgi/aligners/blast+/2.2.28? • Answer: add /usr/common/jgi/aligners/blast+/2.2.28/bin to PATH: • csh: setenv PATH /usr/common/jgi/aligners/blast+/2.2.28/bin:$PATH • bash: export PATH=/usr/common/jgi/aligners/blast+/2.2.28/bin:$PATH
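
The manual PATH approach can be tried without touching /usr/common. This sketch builds a throwaway directory that stands in for an install prefix. The `blastn` script inside it is a fake placeholder, not BLAST+. The command becomes runnable only once its bin directory is prepended to PATH.

```shell
#!/bin/bash
# Hypothetical stand-in for an install prefix such as
# /usr/common/jgi/aligners/blast+/2.2.28, created in a temporary directory.
prefix="$(mktemp -d)"
mkdir -p "$prefix/bin"
printf '#!/bin/sh\necho "fake blastn from %s"\n' "$prefix" > "$prefix/bin/blastn"
chmod +x "$prefix/bin/blastn"

# Not on PATH yet, so the shell cannot find it (unless a real blastn exists).
command -v blastn || echo "blastn: not found on PATH"

# Prepend the bin directory, as in the bash answer on this slide.
export PATH="$prefix/bin:$PATH"
command -v blastn            # now resolves to $prefix/bin/blastn
blastn
```

Doing this by hand for every package, version, and shell is what modules automate.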

  10. What are Modules? A “module” is something that can be loaded into or unloaded from the environment dynamically. • Modules have a name • Modules have a version, and a module can have many versions • Modules can have a default version • To refer to the default version of a module, use <name>, e.g. module load gcc • To refer to a specific version of a module, use <name>/<version>, e.g. module load gcc/4.8.1

  11. Modules Interactive Example • Basic commands: • module load <module id> [<module id> …]: load a module • module unload <module id> [<module id> …]: remove a module • module list: list all loaded modules • module show <module id>: see a module's effects • module avail: see all available modules • module purge: remove all modules • Try the following: • Load the default blast+ module • Load the latest version of the hdf5 module (hint: not the default) • Unload the above modules but leave the rest intact • What effects does the jgitools module have? • What versions of RSeQC are available on genepool? (try using grep) • Why didn't grep work for the last step? • module avail | grep RSeQC won't work • module communicates with you on stderr (stdout is used internally)
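
The grep problem comes down to which output stream a command writes to. In this sketch, `fake_avail` is a hypothetical stand-in that imitates `module avail` by writing its listing to stderr. A plain pipe misses it, and redirecting stderr into stdout fixes that.

```shell
#!/bin/bash
# Hypothetical stand-in for "module avail": it writes its listing to
# stderr rather than stdout, as the real command does on genepool.
fake_avail() {
    printf '%s\n' "RSeQC/2.3.2" "RSeQC/2.3.6(default)" "hdf5/1.8.11" >&2
}

# A pipe carries only stdout, so grep receives nothing; the listing goes
# straight to the terminal, unfiltered.
fake_avail | grep RSeQC

# Send stderr into stdout first (2>&1); now grep can filter the listing.
fake_avail 2>&1 | grep RSeQC
```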

  12. More awkward in tcsh, but possible: ( module -t avail ) |& grep RSeQC
    dmj@genepool02:~$ module list
    Currently Loaded Modulefiles:
      1) modules                     7) mysql/5.0.96
      2) nsg/1.2.0                   8) PrgEnv-gnu/4.6
      3) uge/8.0.1                   9) perl/5.16.0
      4) jgitools/1.2.0             10) readline/6.2
      5) oracle_client/11.2.0.3.0   11) python/2.7.4
      6) gcc/4.6.3                  12) usg-default-modules/1.4
    dmj@genepool02:~$ module load blast+
    dmj@genepool02:~$ module load hdf5/1.8.11
    dmj@genepool02:~$ module list
    Currently Loaded Modulefiles:
      1) modules                     8) PrgEnv-gnu/4.6
      2) nsg/1.2.0                   9) perl/5.16.0
      3) uge/8.0.1                  10) readline/6.2
      4) jgitools/1.2.0             11) python/2.7.4
      5) oracle_client/11.2.0.3.0   12) usg-default-modules/1.4
      6) gcc/4.6.3                  13) blast+/2.2.26
      7) mysql/5.0.96               14) hdf5/1.8.11
    dmj@genepool02:~$ module unload blast+ hdf5
    dmj@genepool02:~$ module list
    Currently Loaded Modulefiles:
      1) modules                     7) mysql/5.0.96
      2) nsg/1.2.0                   8) PrgEnv-gnu/4.6
      3) uge/8.0.1                   9) perl/5.16.0
      4) jgitools/1.2.0             10) readline/6.2
      5) oracle_client/11.2.0.3.0   11) python/2.7.4
      6) gcc/4.6.3                  12) usg-default-modules/1.4
    dmj@genepool02:~$ module -t avail 2>&1 | grep RSeQC
    RSeQC/2.3.2
    RSeQC/2.3.6(default)
    dmj@genepool02:~$

  13. Basic Modules Functionality • Modules manipulate the environment • Loading can: • Set an environment variable (possibly replacing an existing value) • Append (or prepend) to a compound environment variable • Unset an environment variable • *can* execute a command (not recommended if the command changes the state of the system) • ‘module unload’ reverses the effects of ‘module load’ • Which effects of a module might be irreversible? • Answer: • setenv won't restore the environment to its original state • multiple modules calling ‘setenv’ or ‘unsetenv’ on the same variable might lead to an inconsistent state (those modules should conflict) • Executing system commands that change system state (e.g. xhost) is not trivially reversible by unloading the module
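
This plain-bash sketch shows why prepend-path can be undone but setenv cannot. The `path_remove` helper and the DEMO_PATH/EDITOR_DEMO names are hypothetical. The modules system does its own bookkeeping, and this only imitates the net effect of a load followed by an unload.

```shell
#!/bin/bash
# Hypothetical helper: remove every occurrence of directory $2 from the
# colon-separated list stored in the variable named $1.
path_remove() {
    local var="$1" dir="$2" out="" entry
    local -a parts
    IFS=':' read -r -a parts <<< "${!var}"
    for entry in "${parts[@]}"; do
        [[ "$entry" == "$dir" ]] || out="${out:+$out:}$entry"
    done
    printf -v "$var" '%s' "$out"
}

# prepend-path is reversible: "unload" just deletes the entry it added.
DEMO_PATH="/usr/bin:/bin"
DEMO_PATH="/opt/tool/bin:$DEMO_PATH"      # "load"
path_remove DEMO_PATH /opt/tool/bin       # "unload"
echo "$DEMO_PATH"                         # /usr/bin:/bin

# setenv is not: the old value is overwritten and never recorded.
EDITOR_DEMO="vim"                         # the user's original setting
EDITOR_DEMO="nano"                        # "load":   setenv EDITOR_DEMO nano
unset EDITOR_DEMO                         # "unload": the variable is removed
echo "${EDITOR_DEMO-<unset>}"             # <unset>, not vim
```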

  14. Modules: conflicting and swapping • Some modules are incompatible • E.g. both wublast and blast+ provide different blastn, blastx, etc. executables • To prevent these modules from being loaded simultaneously, they conflict:
    dmj@genepool02:~$ module load wublast
    dmj@genepool02:~$ module load blast+
    blast+/2.2.26(25):ERROR:150: Module 'blast+/2.2.26' conflicts with the currently loaded module(s) 'wublast/20060510'
  • Most of the time, only a single version of a module should be loaded at a time: • e.g., it doesn't make sense to load more than one version of gcc • Try:
    module purge          ## cleans everything out
    module load gcc
    module load gcc/4.8.1
  • Error? To change from gcc/4.6.3 (the default) to gcc/4.8.1 (the latest), swap!
    module swap gcc gcc/4.8.1
    -or-
    module swap gcc/4.8.1

  15. Setting up your own modules • Modules are described by modulefiles • One version per modulefile, in a directory named for the module • Collections of modules are found in $MODULEPATH • Try looking at $MODULEPATH • Add your own modules directory:
    genepool$ mkdir $HOME/modules
    genepool$ mkdir $HOME/modules/my_first_module
    genepool$ module use $HOME/modules
  • Try looking at $MODULEPATH again:
    genepool$ module avail my_first_module
  • Why doesn't it show up? • No modulefiles are installed yet… next slide.
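
The steps above can be scripted, including the missing piece: a modulefile named after its version. This is a minimal sketch. The version 1.0, the install root under $HOME/software, and the MY_FIRST_MODULE_ROOT variable are all hypothetical. The modulefile uses the simple, non-ModulesReloaded style, which is not meant for production on genepool.

```shell
#!/bin/bash
# Create a personal modules directory holding one modulefile (version 1.0).
# The root path inside the modulefile is a hypothetical install location.
moddir="$HOME/modules"
mkdir -p "$moddir/my_first_module"

# Quoted 'EOF' keeps the shell from expanding the Tcl $variables.
cat > "$moddir/my_first_module/1.0" <<'EOF'
#%Module1.0
## Hypothetical example in the simple (non-ModulesReloaded) style
set name    my_first_module
set version 1.0
set root    $env(HOME)/software/$name/$version
conflict $name
prepend-path PATH $root/bin
setenv MY_FIRST_MODULE_ROOT $root
EOF

# module is a shell function; only call it where it is defined.
if command -v module >/dev/null 2>&1; then
    module use "$moddir"
    module avail my_first_module
else
    echo "module command not available in this shell"
fi
```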

  16. Simple modulefile (TOO SIMPLE) Modulefiles are written in (somewhat overloaded) Tcl.
    #%Module1.0
    ##
    ## Required internal variables
    set name gcc
    set version 4.6.3
    set root /usr/common/usg/languages/$name/$version\_1
    ## List conflicting modules here
    conflict $name
    ## Software-specific settings exported to user environment
    prepend-path PATH $root/bin
    prepend-path LD_LIBRARY_PATH $root/lib
    prepend-path LD_LIBRARY_PATH $root/lib64
    prepend-path PKG_CONFIG_PATH $root/lib/pkgconfig
    setenv GCC_DIR $root
  Callouts: • #%Module1.0 is the module identifier string (required) • ## lines are comments • the set lines define internal variables (the \_ in $version\_1 keeps Tcl from reading the variable name as version_1) • conflict $name: don't load more than one gcc! • the prepend-path and setenv lines are the actual environment adjustments. WARNING: This example is simplified; do not use it in production on genepool. Refer to the later ModulesReloaded examples.

  17. Common Environment Variables in Modules Be VERY careful about manipulating these environment variables!!! • Modules for software packages commonly set: • PATH • LD_LIBRARY_PATH • PYTHONPATH • PERL5LIB • Every usg/jgi module for software also sets an environment variable pointing to the base of the distribution: • E.g. BOOST_ROOT, PERL_DIR, PYTHON_DIR, GIT_PATH • Exercise: • Load the python module first • Use ‘module info’ to investigate the effects of: • graphviz • RSeQC • smrtanalysis • Are there commonalities? Differences?

  18. Modules have dependencies For the python module to function, both the gcc and readline modules need to be loaded. For the perl module to function, the gcc module needs to be loaded. • Python needs some of gcc's libraries • Perl needs some of gcc's libraries • Python also needs readline's libraries

  19. Complexity of module dependencies on genepool • Highly interconnected graph of dependencies • The most highly connected nodes: • gcc • perl • python • oracle-jdk • openmpi • Many modules are disconnected from the network, possibly because they are: • Statically compiled • Relying only on base-system functionality • Not yet modelled (their dependencies haven't been described)

  20. ModulesReloaded • Automatically checks and loads dependencies • Automatically unloads orphaned dependencies • Differentiates between user-loaded modules and auto-loaded modules when manipulating modules • Does more extensive error checking • Modules failing to load return exit status 1 (check with echo $?) • Supports “variant” modules • Single modulefiles for multiple installations of similar software • Enables reporting of upcoming changes to the modules system • Enhances the logging capabilities of the modules system

  21. ModulesReloaded AutoLoad/Unload • Exercise: • Start by unloading all modules. • Load the python module. • Which modules were loaded? • Next, load the perl module. • Which modules are loaded now? • Now, unload the python module • Check module list • Finally, unload the perl module. • Check module list • Look at the details of the perl and python modules.

  22. ModulesReloaded AutoLoad/Unload • Exercise: • Start by unloading all modules. [module purge] • Load the python module. [module load python] • Which modules were loaded? [gcc, readline, python] • Next, load the perl module. [module load perl] • Which modules are loaded now? [gcc, readline, python, perl] • Now, unload the python module [module unload python] • Check module list [gcc, perl] • Finally, unload the perl module. [module unload perl] • Check module list [None!] • Look at the details of the perl and python modules. [module show perl; module show python]

  23. ModulesReloaded AutoLoad/Unload • In the previous exercise, you should have noticed that the perl and python modules each depended on the gcc module (among others). • The gcc module won't get unloaded while another loaded module still depends on it.

  24. ModulesReloaded User’s Choice! • Exercise: • Load the default hmmer module • Load the repeatmasker module • Why did that just happen? • ModulesReloaded tracks which modules the user directly requests (vs. those just loaded as dependencies), and won’t swap or remove them automatically. • Unload hmmer, then try loading repeatmasker.

  25. ModulesReloaded Variants https://www.nersc.gov/users/computational-systems/genepool/programming/ • Programming Environments are integrated sets of modules • Attempt to provide a seamless and coherent build environment – regardless of compiler. • Exercise: • Purge all your modules. • Load ‘PrgEnv-gnu’ • Load ‘boost’ • Examine the BOOST_ROOT environment variable • Swap to ‘PrgEnv-gnu/4.8’ • Examine the BOOST_ROOT environment variable again

  26. ModulesReloaded Variants • The ‘boost’ module is a ‘variant’ module • When loaded, it detects which programming environment (PrgEnv) is loaded • When the PrgEnv is swapped, the variant module is also reloaded • A variant module cannot be loaded without its provider (e.g. boost cannot be loaded without some PrgEnv) • Earlier, we had to load python before we could interrogate RSeQC, because RSeQC is a variant on ‘python’ (instead of ‘PrgEnv’)

  27. ModulesReloaded Variants [Figure: module layers: PrgEnv and compilers; software libraries (and dependencies). Legend: “normal” module, PrgEnv-provider module, PrgEnv-client module, default module, non-default module] Each programming environment provides the ‘PrgEnv’ attribute, which the libraries require. The PrgEnv meta-modules conflict with each other, but the compilers do not.

  28. ModulesReloaded DefaultChange • Changing default module versions may be disruptive to some users • To advertise the change, modules issues a warning • Example: • The default version of blast+ is planned to change on August 6. • Load the default blast+ module • Unload the blast+ module • Load blast+/2.2.26 (which is the default)
    dmj@genepool04:~$ module load blast+
    WARNING: The default version of blast+ will be changing from 2.2.26 to 2.2.28 on 2013/08/06. Please try blast+/2.2.28. Please contact consult@nersc.gov with any questions.
    dmj@genepool04:~$ module unload blast+
    WARNING: The default version of blast+ will be changing from 2.2.26 to 2.2.28 on 2013/08/06. Please try blast+/2.2.28. Please contact consult@nersc.gov with any questions.
    dmj@genepool04:~$ module load blast+/2.2.26
    dmj@genepool04:~$
  • The warning is only sent to users accessing the default without specifying a version.

  29. NERSC Dotfiles – Your default Environment Pt 2 • Default modules are loaded in the .bashrc/.tcshrc files • System files load ‘uge’, ‘nsg’, ‘jgitools’ • uge adds the scheduler • jgitools puts /jgi/tools/bin into your PATH • .bashrc loads ‘usg-default-modules’ • usg-default-modules autoloads: • PrgEnv-gnu • perl • python • oracle_client • mysql • Are any additional modules auto-loaded as prerequisites? • You can add your own ‘module load’ commands to .bashrc.ext / .tcshrc.ext • Do this with care: modules added to the default environment become somewhat infectious (they end up in every shell and batch job you run)

  30. NERSC Dotfiles – Your default Environment Pt 2 • What happens if a user does the following in their .bashrc.ext file?
    module load smrtanalysis
    export PERL5LIB=$HOME/perl
    export LD_LIBRARY_PATH=/house/groupdirs/randd/lib:$LD_LIBRARY_PATH
  • Is something wrong here? • Answer: PERL5LIB shouldn't be replaced; replacing it invalidates the effects of the smrtanalysis module. Instead, use:
    export PERL5LIB=$HOME/perl:$PERL5LIB
  • What about this:
    export PATH=/jgi/tools/bin:$PATH
  • Is there something wrong with this? • Answer: The jgitools module is loaded very early in the environment and already implements this functionality. The many programs in /jgi/tools/bin may then override other settings you want.

  31. NERSC Dotfiles – Your default Environment Pt 2 • Best Practices: • Do put your settings in a “genepool”-only section of .bashrc.ext / .tcshrc.ext:
    if [ "$NERSC_HOST" == "genepool" ]; then
        …
    fi
  • Limit the number of modules you load by default; they can complicate handing off batch scripts later • Do not replicate module functionality • i.e. don't set environment variables with paths into /usr/common directly • Only add to variables like PATH, LD_LIBRARY_PATH, PYTHONPATH, PERL5LIB, as these are commonly set by modules
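
The "only add, never replace" rule can be wrapped in a small helper for .bashrc.ext. In this sketch, `path_prepend` is a hypothetical function, not part of the NERSC dotfiles, and $HOME/bin and $HOME/perl are example directories. The helper skips a directory that is already present. It also avoids leaving a stray ':' when the variable starts out empty, because in PATH an empty entry means the current directory.

```shell
# Sketch of a .bashrc.ext fragment (bash). path_prepend is hypothetical.

# Prepend directory $2 to the colon-separated variable named $1, skipping it
# if it is already present, and avoiding a trailing ':' when $1 is empty.
path_prepend() {
    local var="$1" dir="$2"
    local current="${!var}"
    case ":$current:" in
        *":$dir:"*) return 0 ;;          # already there; leave it alone
    esac
    printf -v "$var" '%s' "$dir${current:+:$current}"
    export "$var"
}

if [ "$NERSC_HOST" == "genepool" ]; then
    # Add to (never replace) variables that modules also manage.
    path_prepend PATH "$HOME/bin"
    path_prepend PERL5LIB "$HOME/perl"
fi
```

Because the helper is idempotent, sourcing the dotfile twice (for example in a nested shell) does not keep growing PATH.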

  32. Using Modules in your Work

  33. Using Modules Interactively Use modules precisely as we have been in the exercises Modules are great for interactive use!

  34. Using Modules in Batch Scripts
    #!/bin/bash -l
    #$ -l ram.c=10G
    #$ -l h_rt=8:00:00
    set -e
    module purge
    module load PrgEnv-gnu/4.6
    module load uge
    module load blast+/2.2.28
    module load python/2.7.4
    # … run your programs here …
  Callouts: • -l on the #! line ensures the login environment is initialized • the #$ lines are UGE options • set -e kills the script if any command gives a non-zero exit status • module purge clears all the modules; the following lines then reload every needed module by version

  35. Using Modules in Batch Scripts • Using this approach: • Your batch script will terminate if something goes wrong (non-zero exit status) • No extraneous modules will be loaded, ensuring exactly the calculation you want to be run is run with no surprises • Using the precise version numbers means your script will work even after new defaults are installed • Purging the modules first will allow your script to work in other users’ hands without requiring anybody to change their dotfiles.
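
The effect of `set -e` can be checked in isolation. In this sketch, `false` stands in for any failing step, such as a `module load` that returns exit status 1 under ModulesReloaded.

```shell
#!/bin/bash
# Without set -e, a failing command is ignored and execution continues.
status=0
bash -c 'false; echo "still running after a failure"' || status=$?
echo "exit status without set -e: $status"    # 0

# With set -e, the first failing command stops the script.
status=0
bash -c 'set -e; false; echo "never printed"' || status=$?
echo "exit status with set -e: $status"       # 1
```

set -e does not fire for commands tested by if, &&, or ||, nor for non-final commands in a pipeline unless `set -o pipefail` is also set. Keep important steps as plain commands if you rely on it.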

  36. Using Modules in Production Pipelines • Consider creating a pipeline module • e.g. jigsaw/5.1 • The pipeline module could be a pure ‘meta-module’ or point to its own relevant scripts (and still be a meta-module) • A meta-module purely loads other modulefiles • E.g., PrgEnv-gnu • A full-featured modulefile could: • Load other modulefiles • Add entries to PATH, PERL5LIB, and other parts of the environment

  37. Writing a meta-modulefile (a pure meta-module)
    #%Module1.0
    ##
    ## Required internal variables
    set name MyPipeline
    set version 1.0
    ## List conflicting modules here
    set mod_conflict [list $name]
    ## List prerequisite modules here
    set mod_prereq_autoload [list blast+/2.2.28 mothur/1.26.0 qiime/1.7.0]
    set mod_prereq [list blast+/2.2.28 mothur/1.26.0 qiime/1.7.0]
    ## Source the common modules code-base
    source /usr/common/usg/Modules/include/usgModInclude.tcl
    ## Software-specific settings exported to user environment
    setenv MYPIPELINE_VER $version
  Callouts: • mod_conflict replaces the conflict keyword, so that a conflict is trapped and the load exits with status 1 • mod_prereq_autoload is the list of modules to autoload • mod_prereq is the list of modules that must be loaded first; this sets up the automatic load/swap protections • usgModInclude.tcl is the ModulesReloaded include code; it should be sourced before any environment manipulations

  38. Writing a meta-modulefile (a full-featured pipeline module)
    #%Module1.0
    ##
    ## Required internal variables
    set name MyPipeline
    set version 1.0
    set root {/path/to/my/group/stuff/$name/$version}
    ## List conflicting modules here
    set mod_conflict [list $name]
    ## List prerequisite modules here
    set mod_prereq_autoload [list blast+/2.2.28 mothur/1.26.0 qiime/1.7.0]
    set mod_prereq [list blast+/2.2.28 mothur/1.26.0 qiime/1.7.0]
    ## Source the common modules code-base
    source /usr/common/usg/Modules/include/usgModInclude.tcl
    ## Software-specific settings exported to user environment
    setenv MYPIPELINE_VER $version
    setenv MYPIPELINE_ROOT $root
    prepend-path PATH $root/bin
  Callouts: • root should evaluate to the filesystem path for your pipeline. The braces instruct Tcl not to evaluate it immediately; the include code performs the evaluation along with additional error checking • Position all your environment manipulations after the include file • Do set an environment variable for the version and root of your pipeline

  39. Using Pipeline Modules in Batch Scripts
    #!/bin/bash -l
    #$ -l ram.c=10G
    #$ -l h_rt=8:00:00
    set -e
    module purge
    module load PrgEnv-gnu/4.6
    module load python/2.7.4
    module use /path/to/my/groups/modulefiles
    module load MyPipeline/1.0
    # … run your programs here …
  Callouts: • -l on the #! line ensures the login environment is initialized • the #$ lines are UGE options • set -e kills the script if any command gives a non-zero exit status • module purge clears all the modules; then load any needed variant-provider modules • module use adds your modulefiles to MODULEPATH • finally, load your pipeline module

  40. Conclusion and Best Practices

  41. Best Practices - Dotfiles • If you make changes to compound environment variables, make sure to only add to them • PATH, LD_LIBRARY_PATH, PERL5LIB, PYTHONPATH (and many more) • Do not replace modules functionality in your dotfiles: • Don't add /jgi/tools/bin to PATH • Don't add any absolute paths in /usr/common to your environment • Limit the number of default modules • Large numbers of default modules complicate giving scripts to others (they need to change their default environment to run your script) • Instead, set up convenience meta-modules or pipeline modules and load them as needed

  42. Best Practices - Modules • Avoid embedding absolute paths in your scripts • Instead, use the environment variables set by your modules • This reduces maintenance work on your scripts and centralizes it in a single place: the modulefile • In production scripts, purge the modules and load them by version • This ensures the script runs reproducibly • Unloading and re-loading modules is sometimes more reliable than swapping • ModulesReloaded, for example, can't unload orphaned dependencies when swapping:
    module swap PrgEnv-gnu PrgEnv-intel
    module swap PrgEnv-intel PrgEnv-gnu
  • The above will leave the intel module loaded, due to a bug in the underlying modules system (to be investigated and fixed in the future).
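
Here is one way to follow "use the environment variables set by your modules" in a script. The sketch builds compiler flags from BOOST_ROOT, which the slides name as a variable the boost module sets. It assumes the usual Boost layout with include/ and lib/ under that root. `boost_flags` and `my_program.cpp` are hypothetical names.

```shell
#!/bin/bash
# Derive compiler flags from BOOST_ROOT (set by the boost module) instead
# of hard-coding an absolute path under /usr/common.
# Assumes the usual Boost layout: include/ and lib/ under BOOST_ROOT.
boost_flags() (
    # ${VAR:?message} stops this subshell with an error if VAR is unset
    # or empty, so a forgotten "module load boost" fails loudly.
    : "${BOOST_ROOT:?BOOST_ROOT is not set; load the boost module first}"
    printf '%s\n' "-I$BOOST_ROOT/include -L$BOOST_ROOT/lib"
)

# Usage (after "module load boost"); my_program.cpp is hypothetical:
#   flags="$(boost_flags)" || exit 1
#   g++ $flags my_program.cpp -o my_program
```

If the boost installation moves, only the modulefile changes and the script keeps working.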

  43. Best Practices - General • Log out (and back in again) • Seriously, environments do not age like a fine wine • With consistent use of modules, however, they should be more stable

  44. More Information • The NERSC website has a great deal of information about this: • Genepool User Environment: • http://www.nersc.gov/users/computational-systems/genepool/user-environment/ • Running CGI Scripts with Modules: • https://www.nersc.gov/users/computational-systems/genepool/user-environment/scriptenv-loading-modules-before-starting-a-script/ • Using modules within Python: • https://www.nersc.gov/users/computational-systems/genepool/user-environment/working-with-modules-within-perl-and-python/ • ModulesReloaded • Coming soon…

  45. EOF

  46. National Energy Research Scientific Computing Center
