
ALICE Tier2/3 @ GSI: User Experience and Infrastructure Overview

Learn about the ALICE Tier2/3 setup at GSI, including the GSI Lustre cluster, GSIAF integration, and user experience with the ALICE Analysis Train.





Presentation Transcript


  1. User experience with the ALICE Tier2/3 @ GSI • A. Andronic, A. Kalweit, A. Manafov, A. Kreshuk, C. Preuss, D. Miskowiec, J. Otwinowski, K. Schwarz, M. Ivanov, A. Marin, M. Zynovyev, P. Braun-Munzinger, P. Malzacher, S. Radomski, S. Masciocchi, T. Roth, V. Penso, W. Schoen (ALICE-GSI)

  2. Outline • Introduction: • about GSI • about ALICE • The GSI Tier2/3 • The GSIAF • The GSI Lustre cluster • The Grid@GSI • Conclusions

  3. GSI: Gesellschaft für Schwerionenforschung (German Institute for Heavy Ion Research) • ~1000 employees • ~1000 guest scientists • Budget: ~95 million Euro

  4. GSI as of today and the planned FAIR facility (site overview)

  5. ALICE: the dedicated heavy-ion experiment at the CERN LHC

  6. ALICE Collaboration: > 1000 members, ~30 countries, ~100 institutes • ALICE@GSI: large participation in TPC and TRD, detector calibration, physics analysis

  7. The ALICE EXPERIMENT

  8. The ALICE Grid map: sites in Europe, Africa, Asia and North America

  9. ALICE Tier 2/3 @ GSI: size / ramp-up plans • The capacity shown is for the Tier 2 (fixed via the WLCG MoU), plus 1/3 for the Tier 3 • http://lcg.web.cern.ch/LCG/C-RRB/MoU/WLCGMoU.pdf

  10. What we want to provide: a mixture of • a Tier 2 • a Tier 3 • a PROOF farm with local storage (GSIAF) • integrated in the standard GSI batch farm (GSI, FAIR) • We want to be able to readjust the relative size of the different parts on request.

  11. Investment plans at GSI: ALICE Tier 2 (chart of GSI and FAIR investment in the ALICE T2 versus time)

  12. GSI – current setup (overview) • 1 Gbps link to CERN and GridKa (Grid 3rd-party copy) • test cluster: 10 TB • ALICE::GSI::SE::xrootd: 80 TB • vobox, Grid CE, LCG RB/CE • GSI batch farm, ALICE cluster: 160 nodes / 1500 cores for batch, 20 nodes for GSIAF, with 81 TB of directly attached disk storage (PROOF/batch) • GSI batch farm: common batch queue • Lustre cluster: 150 TB

  13. Present Status • ALICE::GSI::SE::xrootd • 75 TB of disk on file servers (16 file servers with 4-5 TB each) • 3U chassis with 12 x 500 GB disks in RAID 5 • 6 TB of user space per server • Batch farm / GSIAF • gave up the concept of ALICE::GSI::SE_tactical::xrootd: it is not good to mix local and Grid access, and cryptic Grid file names make non-Grid access difficult • nodes dedicated to ALICE (Grid + local), used by FAIR/Theory if free • ~1500 CPUs: • 15 x 4 = 60 cores, 8 GB RAM, 2 TB disk + system disk (D-Grid) • 25 x 8 = 200 cores, 16 GB RAM, 2 TB disk in RAID 5 (ALICE) • 40 x 8 = 320 cores, 32 GB RAM, 2 TB disk in RAID 5 (D-Grid) • 7 x 16 x 8 = 896 cores, 16 GB RAM, 2 x 128 GB disks in a RAID mirror (blades) (ALICE) • on all machines: Debian Etch, 64 bit
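A quick illustration of the two access modes discussed above, as seen from the ROOT prompt. This sketch is added for orientation and is not part of the slides; the host name and file paths are hypothetical placeholders, and only the generic TFile::Open interface is assumed.

    // Hypothetical example: the same ESD file reached via the xrootd storage
    // element (Grid-style, catalogue-derived name) and via the mounted Lustre
    // cluster (plain local path).
    #include "TFile.h"

    void access_modes()
    {
       // Grid-style access through the xrootd SE (cryptic catalogue names are
       // what makes mixed local/Grid access awkward).
       TFile *seFile = TFile::Open(
          "root://lxfs042.gsi.de//alice/sim/2008/run12345/AliESDs.root");

       // Local-style access through the Lustre mount, human-readable path.
       TFile *lustreFile = TFile::Open(
          "/lustre/alice/sim/2008/run12345/AliESDs.root");

       if (seFile)     seFile->ls();
       if (lustreFile) lustreFile->ls();
    }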

  14. The GSIAF (GSI Analysis Facility)

  15. PROOF – user experience • PROOF cluster: 20 x 8 = 160 workers • Used heavily for code development and debugging, as it provides fast response on large statistics • For example, ~1.4 TB of data are processed in ~20 minutes for a very CPU-intensive analysis • Overall, the users are very happy with it • (Almost) everything is allowed; we can still handle it with 6-8 active users • All machines see an NFS-mounted disk, so users can use their own libraries • Large disk space (Lustre + local disks) • Intermediate results can be studied at many points
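As a rough illustration of the workflow sketched above, a GSIAF-style PROOF session could look like the following. This is a minimal sketch under assumptions: the master host name, the NFS path of the user code, the Lustre data path and the selector name are hypothetical; only standard ROOT/PROOF calls are used.

    // Minimal PROOF session sketch (hypothetical names and paths).
    #include "TChain.h"
    #include "TProof.h"

    void run_on_gsiaf()
    {
       // Connect to the PROOF master (the redirector/master node of the farm).
       TProof *p = TProof::Open("gsiaf-master.gsi.de");        // hypothetical host

       // User code sits in an NFS-visible directory and is compiled on the
       // client and on all workers.
       p->Load("/u/someuser/analysis/MySelector.C+");          // hypothetical path

       // Build a chain of ESD files staged on Lustre and process it on PROOF.
       TChain chain("esdTree");
       chain.Add("/lustre/alice/sim/2008/run*/AliESDs.root");  // hypothetical path
       chain.SetProof();
       chain.Process("MySelector");
    }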

  16. Installation • Shared NFS directory, visible to all nodes • xrootd (version 2.9.0, build 20080621-0000) • ROOT (521-01-alice and 519-04) • AliRoot (head) • All compiled for 64 bit • Reason: fast software changes • Disadvantage: possible stale NFS handles • Started to build Debian packages of the used software in order to install it locally

  17. Configuration • Setup: one standalone, high-end 32 GB machine for the xrd redirector and PROOF master; cluster of xrd data servers and PROOF workers; AliEn SE; Lustre • So far no authentication/authorization • Managed via Cfengine, a platform-independent computer administration system (main functionality: automatic configuration) • It handles xrootd.cf, proof.conf, TkAuthz.Authorization, access control, and Debian-specific init scripts for starting/stopping the daemons (for the latter also Capistrano and LSF methods for fast prototyping) • All configuration files are under version control (Subversion)
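For orientation, the static proof.conf mentioned above typically just lists the master and the worker nodes, one per line. The sketch below is not the actual GSIAF configuration; the host names are hypothetical placeholders.

    # proof.conf -- static PROOF cluster description (hypothetical hosts)
    master lxproof-master.gsi.de
    worker lxproof001.gsi.de
    worker lxproof002.gsi.de
    worker lxproof003.gsi.de
    # one "worker" line per worker process; repeating a host's line starts
    # several workers on that node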

  18. Monitoring via MonALISA: http://lxgrid3.gsi.de:8080

  19. PROOF users at GSIAF

  20. Cluster load

  21. PROOF cluster – issues • There are still some problems: • Transparency for users: “It runs fine locally, but crashes on PROOF, how do I find where the problem is?” • Fault tolerance: much progress in the last year, but still our problem #1 • The worst case is that misbehaviour of one user session can kill the whole cluster • This happens rarely, but needs manual administrator intervention
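One pragmatic answer to the quoted question is to run the identical selector on a small subset of the data in plain local mode first, and only then through PROOF: a crash that appears only in the second step points to the distributed layer (worker environment, streaming, merging) rather than to the user code. The sketch below is illustrative only; the path, host and selector name are hypothetical.

    // Illustrative local-vs-PROOF debugging pattern (hypothetical names).
    #include "TChain.h"
    #include "TProof.h"

    void localize_problem()
    {
       TChain chain("esdTree");
       chain.Add("/lustre/alice/sim/2008/run12345/AliESDs.root"); // hypothetical

       // Step 1: plain local run on a few thousand events; a crash here is
       // already in the user code itself.
       chain.Process("MySelector.C+", "", 5000);

       // Step 2: the same selector through PROOF; problems that only show up
       // here are likely related to the distributed environment.
       TProof::Open("gsiaf-master.gsi.de");                       // hypothetical
       chain.SetProof();
       chain.Process("MySelector.C+", "", 5000);
    }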

  22. The Lustre Cluster

  23. The upgraded (alpha) GSI Lustre cluster • Running Lustre 1.6.4.3 on a Debian 2.6.22 kernel • 27 (up from 17) object storage servers, in "fail-out" mode • Roughly 135 (80) TB volume (RAID 5) • Ethernet connections: 27 (17) x 1 Gbit/s; bonding tested (2 x 1 Gbit/s per OSS), but hardware not available • ~1500 (400) ALICE client CPUs • Other talks: W. Schoen, St. Louis (2007) and CERN (HEPiX, 2008); S. Masciocchi, CERN (2008)

  24. Computing infrastructure

  25. The ALICE Analysis Train • The concept (ROOT, ALICE): • Experimental data have a large volume (200 kB/event) • All data are stored in ROOT format • The data analysis is dominated by input/output latencies • Idea: load the data once and run many analyses on it (a "train") • The ALICE Analysis Framework (A. Morsch, A. Gheata, et al.) • The GSI analysis train: • 12 physics analyses (CPU/total time ~ 0.75) • Reads simulated events from Lustre • Runs as batch jobs on the local farm (a schematic sketch of the train idea follows below)
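The train idea maps directly onto the ALICE analysis framework: one AliAnalysisManager drives the event loop, reads each event once and dispatches it to every registered task ("wagon"). The skeleton below is a schematic sketch only: the standard AliAnalysisTaskPt tutorial task stands in for the twelve GSI analyses, and the file path and container names are placeholders.

    // Schematic analysis-train skeleton (placeholder task, path and names).
    #include "TChain.h"
    #include "TList.h"
    #include "AliAnalysisManager.h"
    #include "AliESDInputHandler.h"
    #include "AliAnalysisTaskPt.h"

    void run_mini_train()
    {
       // One manager owns the event loop: the data are read once per event ...
       AliAnalysisManager *mgr = new AliAnalysisManager("miniTrain");
       mgr->SetInputEventHandler(new AliESDInputHandler());

       // ... and every attached task ("wagon") sees that same event.
       AliAnalysisTaskPt *task = new AliAnalysisTaskPt("ptWagon");
       mgr->AddTask(task);
       mgr->ConnectInput(task, 0, mgr->GetCommonInputContainer());
       mgr->ConnectOutput(task, 1,
          mgr->CreateContainer("ptList", TList::Class(),
                               AliAnalysisManager::kOutputContainer,
                               "ptWagon.root"));

       // Chain of simulated ESDs on Lustre (hypothetical path).
       TChain *chain = new TChain("esdTree");
       chain->Add("/lustre/alice/sim/2008/run*/AliESDs.root");

       if (mgr->InitAnalysis()) {
          mgr->PrintStatus();
          mgr->StartAnalysis("local", chain);  // the GSI train runs these as batch jobs
       }
    }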

  26. Speed results • Because the analysis is I/O-dominated, the CPU/total-time ratio improves when analyses are grouped into the train

  27. Performance (with 17 file servers) • Total number of events/s versus the number of parallel jobs: data on Lustre with nodes filled with MC jobs, and data on one local disk with nodes filled with MC jobs • Saturation due to the network limitation! • At 4000 events/s, analyzing 10^9 events (1 year @ LHC) takes 10^9 / 4000 ≈ 2.5 x 10^5 s ≈ 3 days

  28. Network traffic (1), with 17 file servers • 10 Gbit connection • switch giffwsx41 (the best one) • 20 nodes • No problems on the 10 Gbit links

  29. Network traffic (2), with 17 file servers • file server lxfsd011 • 1 Gbit connection for each of the current 17 file servers • Very close to saturation on the 1 Gbit links!

  30. Network traffic again, with 27 file servers • 10 Gbit connection, switch giffwsx41 • Now the data traffic is better distributed

  31. GSI Lustre Cluster

  32. Next Generation Cluster • Soon available: running Lustre 1.6.5 (moving to version 1.8.x when available) • 35 object storage servers • Initially 160 TB volume, later 600 TB • MDS: 2 servers in a high-availability configuration • Ethernet connections (100 x 1 Gbit/s) • ~1400 ALICE client CPUs • ~4000 GSI client CPUs in total • Quotas will be enabled • Walter Schoen, Thomas Roth

  33. The ALICE-GSI Grid

  34. ALICE Grid jobs computed at GSI: > 50000 (GSI share: ~1%) • Job efficiency at GSI: 80.6%

  35. Conclusions • Coexistence of interactive and batch processes (PROOF analysis on staged data and Grid user/production jobs) on the same machines can be handled! • LSF batch processes are re-niced to give PROOF processes a higher priority (LSF parameter) • The number of jobs per queue can be increased/decreased • Queues can be enabled/disabled • Jobs can be moved from one queue to other queues • Currently at GSI each PROOF worker is an LSF batch node • Optimised I/O: various methods of data access (local disk, file servers via xrd, mounted Lustre cluster) have been investigated systematically; the method of choice is Lustre, and eventually an xrd-based SE; local disks are no longer used for PROOF at GSIAF • PROOF nodes can be added/removed easily • The administrative overhead with local disks is larger than with a file cluster • Extend the GSI T2 and GSIAF according to the promised ramp-up plan

  36. Acknowledgements: The Team • A. Andronic, A. Kalweit, A. Manafov, A. Kreshuk, C. Preuss, D. Miskowiec, J. Otwinowski, K. Schwarz, M. Ivanov, A. Marin, M. Zynovyev, P. Braun-Munzinger, P. Malzacher, S. Radomski, S. Masciocchi, T. Roth, V. Penso, W. Schoen (ALICE-GSI, IT-GSI)

  37. Backup slides

  38. The ALICE computing model (1/2) • pp • Quasi-online data distribution and first reconstruction at T0 • Further reconstructions at T1’s • AA • Calibration, alignment and pilot reconstructions during data taking • Data distribution and first reconstruction at T0 during four months after AA • Further reconstructions at T1’s • One copy of RAW at T0 and one distributed at T1’s

  39. The ALICE computing model (2/2) • T0 • First pass reconstruction, storage of one copy of RAW, calibration data and first-pass ESD’s • T1 • Reconstructions and scheduled analysis, storage of the second collective copy of RAW and one copy of all data to be kept, disk replicas of ESD’s and AOD’s • T2 • Simulation and end-user analysis, disk replicas of ESD’s and AOD’s

  40. The TPC

  41. The Transition Radiation Detector: electron identification • 18 supermodules • 6 radial layers • 5 longitudinal stacks • 540 chambers • 750 m² active area • 28 m³ of gas • Each chamber: ≈ 1.45 x 1.20 m², ≈ 12 cm thick (incl. radiators and electronics) • In total 1.18 million read-out channels

  42. TRD assembly and installation • 4 supermodules (SMs) are installed

  43. GSIAF – GSI Analysis Facility

  44. Present Status • ALICE::GSI::SE::xrootd • 75 TB of disk on file servers (16 file servers with 4-5 TB each) • 3U chassis with 12 x 500 GB disks in RAID 5 • 6 TB of user space per server • Batch farm / GSIAF • gave up the concept of ALICE::GSI::SE_tactical::xrootd: it is not good to mix local and Grid access, and cryptic Grid file names make non-Grid access difficult • nodes dedicated to ALICE (Grid + local), used by FAIR/Theory if free • 1500 CPUs • 160 boxes with 1200 cores (to a large extent funded by D-Grid), each with 2 x 2-core 2.67 GHz Xeons, 8 GB RAM, 2.1 TB of local disk space on 3 disks plus a system disk • Additionally 24 new boxes, each with 2 x 4-core 2.67 GHz Xeons, 16 GB RAM, 2.0 TB of local disk space on 4 disks including system • Up to 2 x 4 cores, 32 GB RAM, and Dell blade centres • On all machines: Debian Etch, 64 bit
