The LCG Test Suites: more details
Gilbert Grosdidier
LAL-Orsay/IN2P3/CNRS & LCG
LCG Test Suites @ EGEE CERN - GG
Talk plan, and Credits
Talk plan:
• Suite contents and purposes
• Test S/W design
• Configure step
• Test Loop
• Plug-ins
• Definition files
• Presenter step
• The different kinds of tests
• more on the storms
• The CLI options
  • with an example of submission
• Result examples and other useful links
• The CTB
Credits:
• The current set of Testing Suites was mainly contributed by: Miquel Barcelo, Frédérique Chollet, Gilbert Grosdidier, Andrey Kiryanov, Charles Loomis, Gonzalo Merino, René Météry, Danila Oleynik, Andrea Sciabà, Elena Slabospitskaya
• Many other people contributed to the design and ideas leading to the current suites, among them the INFN Testing Team
Acronyms for the newbies
• CE: Computing Element
  • the gateway to the WNs
• SE: Storage Element
• SRM: Storage Resource Manager
• (E)RM: (EDG) Replica Manager
  • the 3 above are the 3 Devils
• BDII: Information Index
• RB: Resource Broker
• PX: Myproxy Server
• WN: Worker Node(s)
  • i.e. the Batch Worker(s)
• UI: User Interface
  • the passport for Hell … or Heaven?
• LCG: L(HC) Computing Grid
Four levels of testing required
• Installation and Configuration Testing (status: almost dropped)
  • targets each machine/service individually
• Unit Testing (status: only ~50% developed)
  • targets each basic functionality of a given service independently from the rest of the TB
• Functionality Testing (status: fully developed)
  • for the whole TB, exercises full functionalities, from a user point of view, but one by one
• Stress Testing (status: fully developed)
  • same as above, but with sophisticated jobs (several functionalities required at the same time), and with a huge number of jobs
• The most basic testing was dropped because of lack of manpower
  • meaning Install & Config, and Unit Testing
The TSTG suite Dedication
• Site Certification
  • initial check of the install
  • check when changes occur
  • regular re-checking (daily checks) <- this precisely implies Unit Testing
    • FS full, memory exhaustion, DB full, list full, no more inodes, hanging server(s)
• TB Certification
  • Basic Functional Tests of components (services)
  • Basic Grid Functionalities, with individual tests
  • Full Grid Functionalities on a full TB, including remote sites, with global/group tests
  • Beyond (HEPCAL and Exp. Beta Testing) is outside of TSTG scope
• M/W Validation (Functionalities, Robustness, Stress Testing)
  • Basic Functional Tests of components (services)
  • Basic Grid Functionalities, on a well defined/known TB, like the Cert TB
  • Full Grid Functionalities, including stress testing and group testing
  • task dedicated to performance/functionality comparisons with the previous version
    • thru definition tests
    • the major requirement then being automated running (thru cron jobs)
      • no prompting, no passphrases
Test S/W design, & Configure Task
• It is split into 2 main parts, dedicated to 3 tasks
  • the top level driver, written in Perl
  • the plug-ins, one for each specific test
• The top level driver is responsible for these 3 tasks:
  • Configure: build environment (S/W libraries & configurations, TB config)
  • Test Loop: run one or several selected tests
  • Presenter: merge the results and build the Web enabled summary page
• The Configure task
  • sets the environment for test S/W libraries
  • and the testbed configuration itself (i.e. the nodes to be tested)
• 2 sets of options are available
  • the general options for the top level configuration
    • selecting the TB, verbosity, VO, main RB, …
  • the options specific to each plug-in which will be run
The TSTG S/W structure
[Diagram. Recoverable labels:]
• Stages: Config (S/W config, TB config), Test Loop, Presenter (Merge Results, Summary Page)
• Plug-ins shown in the Test Loop: JStorm, CEGate, DStorm, MM_rfio, GfalStorm, RB_val, …
• Plug-in steps: Setup, Run, Evaluate, BuildXML
• Jobs: Job1, Job2, Job3, Job4, Job5, Job6, Job7, …
The Test Loop design
• The Test Loop features:
  • it is built out of a lot of more or less independent plug-ins
    • this offers more robustness in case one crashes
    • execution of the suite can proceed to the next one harmlessly
  • different languages are allowed (flexibility and openness)
    • Perl OO technique for most of them
    • but also a few of them in bash
    • other languages are welcome (Java, C, C++, …)
  • provided the plug-ins share the same input (CLI) and output (XML) design
    • Input requires switches to target all or some nodes only
    • Output includes: exit status, summary results printed to STDOUT, detailed results going to an XML file
  • they should be killable (thru the top level process)
  • they should create no side effect for the other tests
    • no processes still spinning when they are done
    • no left over jobs without being cancelled
    • no files left on the SEs whenever possible
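The input/output contract above (CLI switches in; exit status, STDOUT summary and XML file out) can be illustrated with a minimal sketch. This is not TSTG code: the switch name `--nodes`, the XML element names and the `check_node` placeholder are all assumptions made for the example, since the slides do not give the real switch names or XML schema.

```python
import argparse
import xml.etree.ElementTree as ET


def check_node(node):
    """Placeholder check (hypothetical): a real plug-in would probe the service."""
    return not node.startswith("bad")


def run_plugin(argv, xml_path):
    parser = argparse.ArgumentParser(prog="dummy-plugin")
    # "--nodes" is a hypothetical switch used to target some nodes only.
    parser.add_argument("--nodes", nargs="+", required=True)
    args = parser.parse_args(argv)

    root = ET.Element("test", name="DummyPlugin")
    failures = 0
    for node in args.nodes:
        ok = check_node(node)
        failures += 0 if ok else 1
        ET.SubElement(root, "node", name=node, status="OK" if ok else "FAILED")

    # Detailed results go to the XML file, the summary goes to STDOUT.
    ET.ElementTree(root).write(xml_path, encoding="utf-8", xml_declaration=True)
    print(f"DummyPlugin: {len(args.nodes) - failures}/{len(args.nodes)} nodes OK")
    # Exit status: 0 when every node passed, 1 otherwise.
    return 0 if failures == 0 else 1
```

A driver would typically call `sys.exit(run_plugin(sys.argv[1:], "result.xml"))`, so that the top level process only needs the exit status and the XML file to merge the result.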
More on the Plug-in design
• The Test phase can run several tens of generic tests
  • all using the same environment
    • a specific testbed for example
  • however each one is aimed at a specific kind of service
    • all the CEs of a given TB for example
  • one may select a bunch of them, or run them one at a time
• It is often required to run a whole bunch of tests where one needs to specify different targets or configurations for different tests
• One may want to choose
  • only 3 CEs out of 7 for the RB tests
  • while using all available CEs for the SRM-SE stress test on dCache SEs
• The solution goes thru the so-called « definition files »
  • allowing for very flexible construction of test batches
The « definition file » feature
• Each line in such a file:
  • targets only one of the generic tests
  • with very detailed specifications (thru option parameters)
    • on the target machines
    • on the conditions of the test (input, output, speed, etc.)
• In a definition file, the same generic test can be repeated several times
  • with different conditions or specifications
  • targeting different subsets of nodes
• This feature is used on several occasions
  • where a preformatted bunch of tests has to be run
    • when running a cron job
    • when comparing different results is required
    • for TB certification or M/W validation
  • when building a test with an automated tool or GUI
  • when a single shot is required to launch many tests at the same time
The definition file feature summary
• In the MainScript itself, the Test Loop contains hard coded definitions of the generic form of each possible subtest (plugin)
• The definition file, in turn, contains the actual list of subtests, where modifications and additions to the generic form of the subtest may be found
• This flexibility makes it possible to
  • Easily repeat the same subtest several times, but with different tunings/parameters, inside the same full test
    • More/different machines to target
    • Different stress parameters
    • And so on
  • Avoid recoding the main loop each time a slight modification is made to a given subtest
  • Add/remove subtests to a full test w/o recoding (when the generic form of the subtest already exists :-)
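A minimal sketch of how per-line tunings could be layered over the generic form of a subtest. The slides do not show the actual definition file syntax, so the format used here (one subtest per line, test name followed by `--option=value` items, `#` for comments) is an assumption, and the `GENERIC` table only stands in for the hard coded definitions of the Test Loop.

```python
import shlex

# Hypothetical generic forms, standing in for the hard coded definitions
# that the Test Loop holds for each plug-in.
GENERIC = {
    "JStorm": {"--maxStack": "25", "--pollingPeriod": "30"},
    "DStorm": {"--maxStack": "25"},
}


def parse_definition(text):
    """Return a list of (test name, effective options) pairs, one per line."""
    subtests = []
    for line in text.splitlines():
        tokens = shlex.split(line, comments=True)
        if not tokens:
            continue  # blank or comment-only line
        name, overrides = tokens[0], tokens[1:]
        if name not in GENERIC:
            raise ValueError(f"unknown generic test: {name}")
        options = dict(GENERIC[name])  # start from the generic form
        for item in overrides:
            key, _, value = item.partition("=")
            options[key] = value  # the per-line tuning wins over the default
        subtests.append((name, options))
    return subtests
```

With this layout, repeating `JStorm` on two lines with different `--maxStack` values yields two independent subtests, and no change to the main loop is needed.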
The Presenter level
• After each step in the test loop, information is gathered about the step results:
  • exit status, output files,
  • elapsed time,
  • effective options, actual command line used,
  • nodes effectively tested, …
• When the loop is over, an overall summary Web page is created:
  • to merge all this information in an « at a glance » fashion
  • to allow navigation down to detailed files to track the cause of the failures
  • to give access, for each step, to a documentation page gathering:
    • the description of the step
    • the way to re-issue the command
      • to reproduce the failure (conditions, options, nodes)
      • or to clone it into a different environment, or to re-use some parts of it in a different way to cross-check its (weird ;-) results
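The merge step can be pictured with a short sketch. The record fields follow the list above (exit status, elapsed time, command line, link to detailed output), but the `StepResult` class, the function name and the HTML layout are assumptions for illustration, not the actual Presenter code.

```python
import html
from dataclasses import dataclass


@dataclass
class StepResult:
    name: str
    exit_status: int
    elapsed_s: float
    command: str
    detail_file: str  # relative link to the step's detailed output


def build_summary(steps):
    """Merge per-step records into one 'at a glance' HTML fragment."""
    rows = []
    for s in steps:
        verdict = "OK" if s.exit_status == 0 else "FAILED"
        rows.append(
            "<tr><td>{}</td><td>{}</td><td>{:.0f} s</td>"
            '<td><code>{}</code></td><td><a href="{}">details</a></td></tr>'.format(
                html.escape(s.name), verdict, s.elapsed_s,
                html.escape(s.command), html.escape(s.detail_file, quote=True))
        )
    failed = sum(1 for s in steps if s.exit_status != 0)
    header = f"<h1>{len(steps) - failed}/{len(steps)} steps OK</h1>"
    return header + "<table>" + "".join(rows) + "</table>"
```

Keeping the exact command line in each row is what lets a reader re-issue a failed step, as the slide requires.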
The different kinds of Test [v0.1.12]
• Watch out: test names are far from meaningful
• CEGate: Globus Gatekeeper Unit Testing (CEs)
  • 11 tests performed on each CE node
• CECycle: submit jobs to specific CEs systematically
• UI_ST: UI functionality tests for SiteTesting
• FTP: GridFTP functionality tests (RBs, CEs, SEs)
• DNS: reverse DNS Tests (RBs, CEs, SEs, PX)
• RB_val: Functionality tests for RBs (Unit testing)
  • a suite of 5 small jobs submitted through JDL files
• SEwsCycle: Checkup of SE setup (WP5-SEs) - does not work yet
• RMCycle: Checkup of RM setup (SEs, RM)
• PXRenew: to check Proxy Renewal from inside a WN job
  • very touchy
The different kinds of Test (2)
• The Storms (All components, Global & Stress testing):
  • JStorm: Job Storm
    • Simple jobs with Input/Output sandbox transfers, and check of output contents
    • --batchSleep option also available
  • CStorm: Copy Storm
    • Performs random GridFTP transfers from jobs running on WNs
  • RStorm: Replica Storm
    • Broadcasts files thru RM from the UI,
    • and checks for availability on SEs from the WNs
  • KStorm: Checksum Storm
    • Performs big file sandbox transfers, with checksums at both ends
  • UStorm: User Storm
    • Where the user may provide his own JDL xor Script files
  • CalStorm: the storm engine is a different one
    • this allows sending more jobs in a row, and is more stressful for the submit phase
    • usually, the jobs are 10 sec. sleepers
    • but 50-100 min sleepers aim to check load balance on CEs
Why 2 kinds of Storm ?
[Figure: two plots of number of jobs versus time, one for the CalStorm submission profile and one for the JobStorm profile.]
• CalStorm: jobs are submitted sequentially within several streams (m streams); the number of streams is adjustable
• JobStorm: jobs are spawned independently (w jobs in the stack); the window stack is adjustable
• Other labels on the plots: adjustable delay; complete job subm. (30 sec); n jobs; Submission only; Execution and polling; Output Retrieval; Sliding window: submission, polling + out. retrieval; Av. delay between subs: 3 sec; Delay between subs: <1 sec (adjustable); Jobs are alive on the RB; Jobs: NOT submitted; Jobs done
Similarities and Differences
• Similarities
  • a very simple Hello script is sent for execution on WNs
  • a sleep time option is available
  • the polling period is adjustable (however it does not mean the same thing)
  • resources can be specified
    • ranking, CPU, other classAD requests, CE selection, …
  • RB can be selected
• Differences: CalStorm
  • polling adjustable
  • retrycount adjustable
  • myproxyhost selectable
• Differences: JobStorm
  • timeout adjustable
  • retrycount adjustable
  • possible to resubmit a failed job
  • CE can be selected
  • can work with several RBs in turn
  • possibility to clean all ghost jobs previous to the storm
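The two submission profiles (sequential submission within several streams versus a sliding window of independently spawned jobs) can be compared with a toy timing model. This is only a sketch: the function names, the parameters and the timing model itself (a fixed duration per complete submission, a fixed job lifetime) are assumptions for illustration, not measurements or code from either storm engine.

```python
import heapq


def stream_submit_times(n_jobs, n_streams, submit_s):
    """Streams profile: each stream submits its jobs one after the other."""
    return sorted((k // n_streams) * submit_s for k in range(n_jobs))


def window_submit_times(n_jobs, window, lapse_s, lifetime_s):
    """Sliding-window profile: a new job starts once a slot is free."""
    in_flight = []  # min-heap of finish times of the jobs in the window
    times, now = [], 0.0
    for _ in range(n_jobs):
        if len(in_flight) >= window:
            # Window full: wait for the earliest job to leave the stack.
            now = max(now, heapq.heappop(in_flight))
        times.append(now)
        heapq.heappush(in_flight, now + lifetime_s)
        now += lapse_s  # adjustable delay between two submissions
    return times
```

In this model the streams profile keeps submitting at a steady pace whatever happens to earlier jobs, while the window profile stalls as soon as `window` jobs are still alive, which is why the two engines stress the submission phase differently.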
The different kinds of Test (3)
• More Storms: the Data Storms now
  • DStorm: Data Storm
    • Replica (RM) file transfers from the WNs, and check for contents on UI
    • Currently testing any of the Classic/Castor/dCache SRM-SEs
  • HStorm: David’s Storm
    • same as above, but using file names with metadata contents
    • this makes it possible to spot when a job mistakenly ran several times
    • RB or Condor-G debugging
  • GfalStorm: is a special Data Storm
    • Uses GFAL lib to write/read/stat/unlink/close a file from a WN on a remote SE
    • thru a small C application
    • Currently testing any of the Classic/Castor/dCache SRM-SEs
• The storms, when used for submitting few jobs, are also extremely powerful to spot configuration or M/W failures on many different components
  • they exercise many distributed parts of the M/W, and allow for fine grain debugging
The different kinds of Test (4)
• MM: MatchMaking Test for RB
  • exercising any of the (file)/gridftp/rfio/gsidcap protocols
  • one of the most important and sophisticated ones
• RLS: Basic functionality Test for RMC/LRC/RLS
• SEs: GridFTPUmask checks for SEs (should be merged with ?)
• Deprecated (to be reactivated):
  • ProXyf: Security Test for RB (stealing proxies - better if failing)
  • MDS: Consistency checks for (MDS +) BDII (2 tests in sequence)
The general purpose Options
• MainScript --TList="test1 test2 …"
  • testX = CEGate CECycle FTP DNS RB SEwsCycle RMCycle
  • also: MDS RLS MM UI_ST
  • and storms: JStorm CStorm RStorm KStorm DStorm UStorm GfalStorm …
• MainScript --List
  • Prints the list of available tests.
• MainScript --help
  • Prints this README file, plus the full option list.
• MainScript --MDebug
  • Prints some variable values from inside the MainScript.
• MainScript --TList="test" --fullHelp
  • Forces printing of a detailed help about the selected tests.
• MainScript --TList="test" --showME
  • Forces printing of option values and machine names for the selected tests.
More specific top level Options
• Many other options, meaningful only at individual test level, are available
  • though all of them may NOT be available for some specific tests (use the --showME option)
• MainScript --TList="test" --forcingTB="yourTBname"
  • To force a TB other than "CertTB". This option is mandatory.
• MainScript --TList="test" --addOptionList="--opt1=val1 --opt2=val2 ..."
  • To provide a list of additional options to the tests to be run.
• MainScript --TList="test" --forceMachineList="node1 node2"
  • To provide a list of machines to be used in the tests, overriding the default ones
• MainScript --TList="test" --addMachineList="node1 node2 ..."
  • To provide a list of machines to be used in the tests, adding them to the default ones
• MainScript --TList="test" --forcingRB="fullRBname"
  • To provide an alternate RB to work with, overriding the default one provided on the UI
• MainScript --TList="test" --forcingVO="otherVO"
  • To force a VO other than "dteam". Useful in most of the tests now.
• These are only some of them …
Some individual test Options
• Some are shared by several tests
• For the Storm family:
  --useCEList --circular --singleSubmit --multiRB --reqLapse --maxStack --maxSubs --pollingPeriod --childTimeout --stackDebug --jobCancel --selectNodes --SEtype
• For other tests:
  --nstreams --njobs --polling --retrycount --myproxyhost --resource --rank --requirements --time --protocol --customScript --customJDL --filesize
• Many other service options
The CLI: an Example
• An actual example of a test submission command
  • although the test name is a dummy one :-)
    MainScript --forcingTB=CertTB --verbose --forcingVO=atlas \
      --forcingRB=lxshare0240.cern.ch --TList=DummyStorm \
      --addOptionList="--reqLapse=1 --maxStack=50 --singleSubmit --maxSubs=20 \
      --pollingPeriod=60 --keepTempDir --circular --useCEList --serie=11022 \
      --selectNodes='lxshare0277.cern.ch lxshare0290.cern.ch'" \
      --forceMachineList="lxshare0236.cern.ch lxshare0278.cern.ch"
• In this case, the generic command was:
    MainScript --TList=DummyStorm \
      --addOptionList="--reqLapse=2 --maxStack=25 --maxSubs=2 --pollingPeriod=30"
  • it was submitting jobs to all available CEs
  • it was acting on all SEs of the site, by default.
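The nested quoting in the example above (plug-in options packed into a single --addOptionList argument, with a node list quoted again inside it) is easy to get wrong when the command is generated by a tool. The helper below is a hypothetical sketch of such a generator: `build_command` is not part of TSTG, and it assumes that MainScript splits the inner option list the way a shell would.

```python
import shlex


def build_command(tb, tests, vo=None, plugin_opts=None, machines=None):
    """Build the argv list for a MainScript call (hypothetical helper)."""
    argv = ["MainScript", f"--forcingTB={tb}", "--TList=" + " ".join(tests)]
    if vo:
        argv.append(f"--forcingVO={vo}")
    if plugin_opts:
        # Plug-in options travel as ONE argument; values containing spaces
        # are quoted a second time inside it. A value of None means a flag.
        inner = " ".join(
            key if value is None else f"{key}={shlex.quote(str(value))}"
            for key, value in plugin_opts.items())
        argv.append(f"--addOptionList={inner}")
    if machines:
        argv.append("--forceMachineList=" + " ".join(machines))
    return argv
```

`shlex.join(argv)` then gives a string that can be pasted into a shell or a crontab entry, and `shlex.split` recovers the same argument list.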
Detailed example, and other links
• A detailed example of a recent Result Web page, produced on the CertTB (15/05/04, morning), is available in the LCG/TSTG area:
  • 040515-040505 RTest
  • http://grid-deployment.web.cern.ch/grid-deployment/tstg/validation/040515-040505_RTest
• This presentation is available in:
  • EdinburghTSTG [ppt], [pdf]
  • http://grid-deployment.web.cern.ch/grid-deployment/tstg/docs/EdinburghTSTG.ppt
• Install help for TSTG RPMs and Tarball:
  • Install URL
  • http://grid-deployment.web.cern.ch/grid-deployment/tstg/docs/LCG—Certification-help
Certification & Testing Testbed
[Diagram of the testbed layout. Recoverable labels:]
• UIs: UIa, UIb, UIc
• PX
• RBs: RBa, RBb, RBc
• BDIIs: BDIIa, BDIIb, BDIIc
• RLS
• CEs: five CEPBS, one CECondor, one CELSF
• SEs: SEClassic, SECastor and SEdCache instances
• WNs: WNa to WNg, attached to the CEs
• Still missing: remote clusters/sites
Conclusion: Useful or not ?
• Yes, the TSTG suites are useful and powerful
  • they are used daily to spot misconfigurations on the CTB
    • each time some new piece of S/W is (re-)installed
  • some specific suites are also run as a daily cron job
    • other, longer lasting or more stressful ones are run every week (cron)
  • they most often reveal problems or features
  • they are even used to debug development S/W out of the box
  • however, interpreting the results is not always straightforward
    • but it was not expected to be the other way round :-)
• Documentation and Dissemination must be improved
  • and this talk was part of it
• Not every kind of required test is provided yet
  • but new tests are often easy to derive from existing ones
  • thanks to definition files, it is often easy to cover a need by combining two or more existing pieces
  • new additions will soon be required
    • WP5-SEs, R-GMA, N-MON
Layout of TSTG Framework modules
• The different functional parts at Framework level are:
  • Initialisations (environment, option setting, def file reading, …)
  • Various cross-checks (proxy + myproxy, RB config files, …)
  • Test loop itself
  • Building of the global XML result file
  • Building of the global HTML result file
    • From the partial results provided by each subtest
  • Preparation and delivery of the tarball result file
    • And expansion in the final AFS/Web location if run as an acrontab job
    • Together with sending of the mail when the acrontab job is over
• The code is located in a few Perl scripts only
  • MainScript
  • BaseUtils.pm
• We are now up to version 0.1.18 of TSTG
Layout of Perl OO plugin modules
• This Perl OO layout was initiated by Cal Loomis in 2002.
• The inheritance tree of the Perl plugin family is currently:
  • BaseStep.pm, EDGUtils.pm & TestUtils.pm - common utilities
  • File, Shell - external utilities
  • BaseTest - BRANCH
    • CheckInfo, RM, SEUtils, SEws, TestCycle - deprecated branches
    • EDGLifecycle - BRANCH
      • Stack - BRANCH
• The method calls available in BaseTest are (methods to be overloaded)
  • setContext - to init some global vars, and set up possible options
  • checkArguments - no comment
  • setDefaults - to set default values for unspecified options
  • checkPrerequisites - somewhat deprecated
  • setupTest - to build everything needed by the further test
  • runTest - no comment
  • evaluateTest - no comment
  • parsing - parsing and formatting of output of previous step
  • resultParser - extract/build global return status from plugin printout
  • runXMLParserScript
  • buildXMLFile - builds the XML result file from global result hash table
  • Calls log2html script - to build local HTML file from above XML file
  • cleanup - no comment
Layout of Perl OO plugin modules
• The method calls available in the EDGLifecycle branch are (methods to be overloaded)
  • setContext - to init some global vars, and set up possible options
  • checkArguments - no comment
  • setDefaults - to set default values for unspecified options
  • checkPrerequisites - somewhat deprecated
  • setupTest - to build everything needed by the further test
  • runTest - no comment
  • evaluateTest - no comment
  • parsing - parsing and formatting of output of previous step
  • resultParser - extract/build global return status from plugin printout
  • runXMLParserScript
  • buildXMLFile - builds the XML result file from global result hash table
  • Calls log2html script - to build local HTML file from above XML file
  • cleanup - no comment
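The "methods to be overloaded" design is a template-method pattern: a base class fixes the sequence of hooks and each plug-in overrides only the ones it needs. The sketch below mirrors the idea in Python rather than in the real Perl modules. The class names, the snake_case method names and the call order (taken from the order in which the slide lists the methods, with several hooks left out) are assumptions.

```python
class BaseTest:
    """Hypothetical Python mirror of the BaseTest life cycle."""

    # Assumed call order: the order in which the slide lists the methods.
    STEPS = ("set_context", "check_arguments", "set_defaults",
             "setup_test", "run_test", "evaluate_test", "build_xml_file")

    def __init__(self, **opts):
        self.opts = opts
        self.log = []
        self.status = 0

    def execute(self):
        try:
            for step in self.STEPS:
                self.log.append(step)
                getattr(self, step)()
        except Exception as exc:
            self.status = 1
            self.log.append(f"error: {exc}")
        finally:
            self.log.append("cleanup")
            self.cleanup()  # always runs, so a crash leaves no side effect
        return self.status

    # Default hooks do nothing; subclasses overload what they need.
    def set_context(self): pass
    def check_arguments(self): pass
    def set_defaults(self): pass
    def setup_test(self): pass
    def run_test(self): pass
    def evaluate_test(self): pass
    def build_xml_file(self): pass
    def cleanup(self): pass


class HelloTest(BaseTest):
    """Toy plug-in: overloads three hooks only."""

    def set_defaults(self):
        self.opts.setdefault("njobs", 1)

    def run_test(self):
        self.output = ["Hello"] * self.opts["njobs"]

    def evaluate_test(self):
        if any(line != "Hello" for line in self.output):
            raise RuntimeError("unexpected job output")
```

Running `HelloTest(njobs=3).execute()` walks through every hook and returns 0; a hook that raises turns the status into 1 while `cleanup` is still called.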
Layout of Perl OO plugin modules
• The method calls available in the Stack family branch are (methods to be overloaded)
  • sub setContext - same role
  • sub setupTest - same role
  • sub buildCEList - builds the list of actually available CEs using list-match
  • sub runTest - same role
  • sub submitOver - utility
  • sub createJDL - creates the generic JDL file
  • sub createScript - creates the script to be run on the WNs
  • sub setupEnvironment - env setup more specific to the storm itself
  • sub addArguments - adds specific arguments to each individual JDL file
  • sub evaluateTest - same role
  • sub cleanup - same role
  • sub waitforChildren - monitors the spawned children (cf next)
  • sub submitTimeout - spawns the process which will submit and monitor each individual job
  • sub reaper - retrieves the pid of the completed spawned processes
  • sub fillResHashN - fills the result hash structure
  • sub buildXMLFile - same role
  • sub runXMLParserScript - same role
  • sub cforceload - utility
  • sub checkSem - utility
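The spawn/monitor/reap trio (submitTimeout, waitforChildren, reaper) can be pictured with a small sketch. It uses Python's `subprocess` module instead of the Perl fork and signal handling that the real Stack branch presumably relies on, and the function name, the polling loop and the "timeout" marker are assumptions made for the example.

```python
import subprocess
import sys
import time


def run_children(commands, child_timeout_s, poll_s=0.05):
    """Spawn one child per command, reap them, kill those that overrun."""
    children = {}
    for index, cmd in enumerate(commands):
        children[index] = (subprocess.Popen(cmd), time.monotonic())
    results = {}
    while children:
        for index, (proc, started) in list(children.items()):
            code = proc.poll()  # non-blocking: None while still running
            if code is None and time.monotonic() - started > child_timeout_s:
                proc.kill()
                proc.wait()  # reap the killed child so no zombie is left
                code = "timeout"
            if code is not None:
                results[index] = code
                del children[index]
        time.sleep(poll_s)
    return results
```

Killing and reaping overdue children is what keeps a storm from leaving processes still spinning when it is done, one of the requirements placed on every plug-in.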
Current Branch Contents
• BaseTest - BRANCH
  • GlobusGatekeeper.pm
  • GridFTP.pm
  • ReverseDNS.pm
  • SecurityTest.pm
  • TestLRCAttributes.pm, TestLRCMapping.pm, TestRMCAttributes.pm, TestRMCMapping.pm - deprecated modules
• EDGLifecycle - BRANCH
  • BrokerInfo.pm
  • CheckVOVars.pm
  • Checksum.pm
  • HelloScript.pm
  • HelloWorld.pm
  • ProxyRenewal.pm
  • RMSetupTest.pm
  • Sleep.pm
• Stack - BRANCH
  • CECycle.pm
  • CheckStorm.pm
  • CopyStorm.pm
  • DataStorm.pm
  • DavidStorm.pm
  • GfalStorm.pm
  • JobStorm.pm
  • MatchMaking.pm
  • MultiCStorm.pm
  • MultiDStorm.pm
  • PileStorm.pm
  • UserStorm.pm
Early running and pretesting of a new plugin
• Let’s assume that the developer made a private install of the TSTG stuff using the tarball (cf slide #23). The base dir will then be:
  • opt/edg/bin for running the MainScript
  • opt/edg/lib/perl for running individual plugins
• If one sticks to the current Perl OO layout:
  • It is easy to run/test a new plugin outside of the TSTG framework, before any kind of interaction with it, even if the whole stuff required by the latter is already coded and running, while already using the benefits of the whole OO structure
    • First drop the script into the required location
      • E.g. opt/edg/lib/perl/BaseTest/EDGLifecycle/YourNewTest.pm
    • Run it through this command line:
      • edg-testbed-test BaseTest::EDGLifecycle::YourNewTest --yourOpt1=val1 --yourOpt2=val2 … param1 param2 …
      • edg-testbed-test BaseTest::EDGLifecycle::RMSetupTest --xml --VO=dteam --njobs=15 --lcgcr lxb1767.cern.ch lxb0739.cern.ch lxb1759.cern.ch
  • It is easy as well to test it within the framework, because only 3 actions are required to insert a new plugin into the whole structure
    • Insert the new test code into the trusted list of the MainScript
    • Insert the generic call into the Test Loop of the MainScript
    • Drop the required script on the required branch of the project dir tree (cf above)
    • Check out examples on next slide
An actual example
• If the test code one wants to use for the new test is RMS_All, the generic call sequence in the Test Loop of the MainScript will look like:
    if ( $test eq "RMS_All" ) {
        my @SEs   = "$ENV{SEhostname__}";                        # target SE host(s), taken from the environment
        my $func  = "BaseTest::EDGLifecycle::RMSetupTest";       # plug-in module to run
        my $trail = "--xml --VO=$self->{opts}->{forcingVO}";     # trailing plug-in options
        TestProc(\$self, "$test", "RunTest", $func, $tIndex, 0, "$trail @SEs");
    }
• If one only wants to use the generic call, run it through something like:
    opt/edg/bin/MainScript --forcingTB=CertTB --TList=RMS_All --forcingVO=dteam
• If one wants something more subtle than the generic call, one can use instead:
    opt/edg/bin/MainScript --forcingTB=CertTB --TList=RMS_All --verbose --forcingVO=dteam \
      --addOptionList="--xml --njobs=10" \
      --forceMachineList="lxb1767.cern.ch lxb1759.cern.ch lxb0739.cern.ch"
Permanent insertion of a new test in the suite
• Probably better handled by a designated admin
• Required steps (rather simple IMHO):
  • Insert the new script code into the src dir of the edg-site-certification package of the LCG CVS repo (isscvs.cern.ch:/local/reps/lcgware)
  • Insert the new script name in the required location in the Makefile.am of the same directory
  • Insert both MainScript additions into the MainScript.in file (src dir of the edg-tests package of the same repo)
    • Cf previous slide