
Current Accounting in LCG

John Gordon and David Kant, CCLRC e-Science Centre.


Presentation Transcript


  1. Current Accounting in LCG John Gordon and David Kant CCLRC, e-Science Centre

  2. History • EDG – EU DataGrid 2001-04 • developed DGAS, a full economic scheduling and accounting package from Italy • it wasn’t mature enough to be deployed by the end of EDG • LCG – 2004-…. • wanted resource reporting across the grid • commissioned APEL from RAL • SWEGrid • developed SGAS for Swedish Supercomputing • OSG • their own solution under development (v0.0) • Agreed to collaborate to share code and provide a central LCG report. • I am mainly considering APEL, but many of the issues are common to other systems, and worldwide aggregation is discussed at the end. LCG Management Board, 7 Feb 2006 - 2

  3. APEL Repository • In production since December 2004 • Wider than EGEE • Asia Pacific • FNAL/CMS • Can take accounting records from different sources and combine them to show multi-Grid views of LHC VOs. • The next few slides flash through example displays.

  4. Accounting Home Page • http://goc.grid-support.ac.uk// • 166 sites publishing data (7 Feb 2006) • 6.0 million job records • ~100K records per week (period June – Dec 2005)

  5. Demos of Accounting Aggregation • Global views of resource consumption. • LCG View • http://goc.grid-support.ac.uk/gridsite/accounting/tree/treeview.php • Shows aggregation for each LHC VO • Requirements driven by RRB / Kors Bos • Tier-1 and Country entry points • LHC VOs only • All data normalised in units of 1000 · SI2000 · hour • Tabular summaries per Tier-1 / Country • GridPP View • http://goc.grid-support.ac.uk/gridsite/accounting/tree/gridppview.php • Shows aggregation for an EGEE partner • Prototype for EGEE View
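The normalised unit quoted on slide 5 can be illustrated with a short sketch. It assumes raw CPU time is recorded in seconds and that each cluster has a single SpecInt2000 rating; the function name and the sample numbers are hypothetical and are not taken from APEL.

```python
def normalised_ksi2k_hours(cpu_seconds: float, specint2000: float) -> float:
    """Convert raw CPU time to units of 1000 * SI2000 * hour.

    cpu_seconds  -- raw CPU time consumed by the job(s), in seconds
    specint2000  -- assumed SpecInt2000 rating of the cluster that ran them
    """
    cpu_hours = cpu_seconds / 3600.0
    return cpu_hours * specint2000 / 1000.0


# Hypothetical example: 10 CPU hours on a cluster rated at 1500 SI2000.
print(normalised_ksi2k_hours(36_000, 1500))  # 15.0
```

Scaling by the rating is what lets usage at sites with different hardware be summed in one table.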

  6. LHC View: Data Aggregation For VOs per Tier1, per Country

  7. Aggregation of Data for GridPP

  8. Aggregation of Data for Tier2

  9. Data Aggregation at Site Level • Breakdown of data per VO per month showing Njobs, CPUt, WCT, record history • Total CPU usage per VO • Gantt chart • NB: Gaps across all VOs are consistent with scheduled downtimes in GocDB

  10. Issues • Full Deployment • Validation • Level of detail • Futureproofing • Account other resources • Standards • Interoperability • Global Repository

  11. Full Deployment • Only 80% of sites in EGEE are currently publishing accounting records – Why? • Batch support not 100% • Some haven’t got R-GMA working • Some haven’t deployed APEL or configured it properly • Some don’t want to. • Tier1 or ROC should help and/or persuade them • Does LCG want a single accounting repository, or is this a VO responsibility? • Some feel that there are local legal reasons of data protection and/or personal privacy

  12. Legal/Privacy • APEL does not publish user information (by default) • APEL can filter so that only LCG VOs are published • Sites can publish aggregate data and have full control over what they publish • Country Reps should send me details if this is an issue in their country • or one of their sites thinks it is • Currently the data reside in R-GMA at the job level • Would the issue change if access was restricted?
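The two privacy controls on slide 12 (publish only LCG VOs, and withhold user information by default) can be sketched as a simple filter. This is an illustration only: the record layout, the field names `vo` and `user_dn`, and the function name are assumptions, not APEL's actual schema or configuration.

```python
# Assumed set of VO names for the four LHC experiments (illustrative only).
LCG_VOS = {"alice", "atlas", "cms", "lhcb"}


def prepare_for_publication(records, allowed_vos=LCG_VOS, include_user=False):
    """Return copies of the records that a site would agree to publish.

    Records for VOs outside allowed_vos are dropped, and the hypothetical
    'user_dn' field is removed unless include_user is set.
    """
    published = []
    for rec in records:
        if rec["vo"] not in allowed_vos:
            continue  # non-LCG VO: never leaves the site
        out = dict(rec)  # copy so the local record is left untouched
        if not include_user:
            out.pop("user_dn", None)
        published.append(out)
    return published
```

Keeping the filter at the site, before publication, matches the slide's point that sites retain full control over what they publish.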

  13. Batch Support in APEL • Currently available in LCG 2.7 • OpenPBS, Torque, PBSPro and Vanilla PBS • ~90% of sites in EGEE, similar or higher in OSG • Load Share Facility (Versions 5 and 6) • CERN, Italy • Available RSN • Condor • Canada • Sun Grid Engine in development • Imperial College • DIY • Any site can publish its own information in aggregated records • One record per experiment/day showing total jobs/cputime • Allows a site to use its own accounting system and only publish what it wants
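The DIY option on slide 13 (one record per experiment per day with total jobs and CPU time) amounts to a group-and-sum over a site's own job records. The sketch below assumes each local record carries hypothetical `vo`, `end_date` and `cpu_seconds` fields; the real aggregated-record format is not shown in the slides.

```python
from collections import defaultdict


def daily_summaries(jobs):
    """Collapse per-job records into one summary per (VO, day).

    Each job is a dict with assumed keys 'vo', 'end_date' (e.g. '2006-02-07')
    and 'cpu_seconds'. The result is sorted by VO, then by date.
    """
    totals = defaultdict(lambda: {"njobs": 0, "cpu_seconds": 0})
    for job in jobs:
        key = (job["vo"], job["end_date"])
        totals[key]["njobs"] += 1
        totals[key]["cpu_seconds"] += job["cpu_seconds"]
    return [
        {"vo": vo, "date": date, **summary}
        for (vo, date), summary in sorted(totals.items())
    ]
```

Because only totals are emitted, no per-job or per-user detail leaves the site.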

  14. Validation 1 • Are all records captured? • Portal shows continuity of publishing/experiment • No systematic way of checking at the job level: since only Grid jobs are accounted, missing jobs in a sequence may be local ones • Comparing with the RB doesn’t help, as not all jobs are submitted via RBs and we don’t know the full set of RBs • 2.7.0 has an RB unique identifier which can be used by APEL • Sites should check their APEL logs for errors in publishing and for consistency with any local records or monitoring • Is normalisation correct? • APEL normalises using the average SpecInt2000 value for the cluster published in the Information Service. • Might be wrong • Spanish ROC working on benchmarking jobs to establish a better value and flag up discrepancies with the BDII published value • This should give us reassurance • Is the site meeting its commitments?
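The normalisation check on slide 14 (compare a benchmarked value with the published one and flag discrepancies) could look something like the sketch below. The 10% tolerance and the function name are assumptions chosen for illustration; the slides do not say how the Spanish ROC's benchmarking jobs make this comparison.

```python
def normalisation_discrepancy(published_si2k: float,
                              measured_si2k: float,
                              tolerance: float = 0.10) -> bool:
    """Return True if the published rating differs from the measured one
    by more than `tolerance`, expressed relative to the measured value.

    The default tolerance of 10% is an arbitrary illustrative threshold.
    """
    if measured_si2k <= 0:
        raise ValueError("measured rating must be positive")
    relative_error = abs(published_si2k - measured_si2k) / measured_si2k
    return relative_error > tolerance
```

A site flagged this way would have all of its normalised usage over- or under-stated by the same factor, so the check is worth making before any comparison against commitments.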

  15. Validation 2 • Is the site meeting its commitments • to LCG? • Easy to store MoU figures per site and/or country and build into plots • to the VO? • possible to do the same for LHC but wouldn’t scale to all VOs. • This is where the VO asks for details of users • Scope for sites to manipulate their published results

  16. Level of Detail • The original LCG spec (spring 2004) was for • only VO accounting – didn’t need user info • only Grid jobs accounted – local jobs were the site’s business • I added user DN to the spec • but by default it doesn’t leave the site. • As soon as a VO is shown details of its use, someone asks ‘Who used all that CPU at that site?’ • SWE ROC (Spain and Portugal) has a portal which shows use by users • need a grid certificate to access • If we want user-level accounting we need to address the legal issues • Now that sites are being questioned on their delivery to LCG and experiments, they note that they are also contributing significant resources via local jobs. • Accounting for local jobs as well is a major change of scope • Sites which use their own accounting systems can do it trivially per experiment (but not per user). • Do experiments centrally want locally submitted work to count against their allocation?

  17. Futureproofing • APEL depends on joining information from several sources • Sensitive to changes in log format by • Globus • Batch system (LSF, PBS, …) • Middleware (gLite, …) • gLite WMS implications not fully understood.

  18. Accounting Other Resources • Storage • The GGF Usage Record schema can be extended to cover arbitrary resources. • run a cron job on the SE once per day to measure usage per VO and publish to the APEL repository • Memory • can be stored in job records but how do we aggregate? • probably more relevant to monitor than to account. • Network • SNMP allows monitoring of lines/ports • feasible to show traffic in/out of a site/cluster but doesn’t scale to hosts • no association with jobs, VO, or user possible • current network business model doesn’t require accounting

  19. Standards • Global Grid Forum has a working group which has developed a Usage Record schema. • All known grid accounting systems claim to support this • but not verified, limited if any exchange of records. • The schema is not optimal for our use. • e.g. no site information in schema. • TeraGrid, OSG and EGEE have all commented on the current draft but we should participate directly in the GGF UR WG as real-life experience greatly improves standards.

  20. Interoperability • Discussion with SweGrid but no data exchanged • FNAL/CMS publishing aggregated data monthly • detailed approval mechanism • DGAS publishing data (via R-GMA??) from their repository to APEL • Discussing sensors with OSG • they want LSF, we like their Condor method.

  21. World Wide Accounting Service for LCG • Project involves combining results from all three peer infrastructures and presenting an aggregated view of resource usage for LHC VOs to the RRB • Peer Infrastructures in LCG • Open Science Grid + Others (Ruth Pordes, Philippe Canal, Matteo Melani) • NorduGrid (Per Oster, Thomas Sandholm) • LCG/EGEE (Kors Bos, Dave Kant) • GRID-ACCOUNTING@LISTSERV.RL.AC.UK

  22. Resource Usage Service • [Diagram: Web Service Container, Service Interface, RUS WS Application, ACL, DB] • Based on emerging GGF standards and Web Services • GGF UR, OGSI • An implementation exists in “Market for Computational Science” – UK e-Science project • Use case might be: • A user invokes the query service through a web browser, using SSL for client authentication, to ensure that usage information at user level belongs to the user. Servlet sends query to RUS web service and gets user data.

  23. Possible Roadmap • Stage 1: Let’s try to get some data from each of the Tier-1s: summary records describing VO usage over a finite period of time • Before end 2005 • SweGrid, Fermilab and DGAS are providing data! • Stage 2: Centralised database with a web service interface (RUS) to publish/query accounting data (summary records) • Sometime in 2006 • Stage 3: Distributed databases with a complete RUS implementation including permission model. • Sometime early 2007

  24. Summary • EGEE has had a production accounting infrastructure in place since 2004 • but still has a long way to go • We are developing a central repository • to sit above all the grid infrastructures (EGEE, OSG, NorduGrid) • to meet the requirement for global reporting on LHC Computing • Accounting is a controversial subject • Thank you to everyone who has cooperated

  25. Actions • Countries to give feedback on legal issues with LCG-wide accounting (By April?) • APEL to understand gLite implications • Tier1s to help ‘their’ associated sites to deploy and configure APEL. • or to use their own accounting system to publish to the LCG Repository • Sites to check their APEL logs for errors • give feedback to increase reliability and robustness • Sites to compare with other sources as a reality check • e.g. if Ganglia shows 90% average utilisation over a month then APEL numbers should agree. • Work with OSG to push/pull their LCG accounting data
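The reality check suggested on slide 25 can be sketched as a comparison between the utilisation implied by accounted wall-clock time and the figure reported by a monitoring tool. The function names, the 5-percentage-point tolerance and the sample numbers are assumptions for illustration; note also that if the accounting covers only Grid jobs, local jobs will make the accounted figure lower than the monitored one.

```python
def accounted_utilisation(wall_clock_hours: float,
                          n_cpus: int,
                          period_hours: float) -> float:
    """Fraction of available CPU-hours that appear in accounting records."""
    capacity = n_cpus * period_hours
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return wall_clock_hours / capacity


def consistent(accounted: float, monitored: float,
               tolerance: float = 0.05) -> bool:
    """True if the two utilisation fractions agree within `tolerance`
    (an arbitrary illustrative threshold of 5 percentage points)."""
    return abs(accounted - monitored) <= tolerance


# Hypothetical site: 100 CPUs over a 30-day month (720 hours),
# with 64,800 wall-clock hours in the accounting records.
u = accounted_utilisation(64_800, n_cpus=100, period_hours=720)
print(round(u, 2), consistent(u, monitored=0.90))  # 0.9 True
```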
