http://presidentofindia.nic.in/scripts/sllatest1.jsp?id=734 H.Göringer IT/EE Palaver GSI
Status of the LHC Project
J. Engelen, February 13, 2006
The Large Hadron Collider
• The Large Hadron Collider: 14 TeV pp collisions at 10^34 cm^-2 s^-1
• New energy domain (x10), new luminosity domain (x100)
• Will have to cross the threshold of electroweak symmetry breaking; unitarity of WW scattering requires M_Higgs < 850 GeV
• Many possibilities: Standard Model Higgs, SUSY (many possibilities...)
• Large Extra Dimensions (quantum gravity)
• And many more results on CP violation, Quark Gluon Plasma, QCD, ..., surprises...
The LHC results will determine the future course of High Energy Physics.
Barrel Toroid installation status
The mechanical installation is complete; electrical and cryogenic connections are being made now, for a first in-situ cool-down and excitation test in spring 2006.
The LHC Computing Grid
LCG (project leader: Les Robertson) is about storing 15 PB (imagine!) of new data per year, processing them and making the information available to thousands of physicists all around the world!
• Model: ‘tiered’ architecture
• 100,000 processors; multi-PB disk and tape capacity
• Leading ‘computing centers’ involved
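As a rough sense of scale, the yearly volume can be converted into the average rate needed to store it. This is a minimal sketch assuming decimal units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes) and a 365-day year; because data taking is not spread evenly over the whole year, this average is only a lower bound on the rates needed while the machine is running.

```python
# Convert a yearly data volume into the average rate needed to store it.
# Assumptions: decimal units (1 PB = 10**15 B, 1 MB = 10**6 B), 365-day year.
BYTES_PER_PB = 10**15
BYTES_PER_MB = 10**6
SECONDS_PER_YEAR = 365 * 24 * 3600


def average_rate_mb_per_s(petabytes_per_year: float) -> float:
    """Mean rate in MB/s needed to store the given volume, spread over a full year."""
    return petabytes_per_year * BYTES_PER_PB / SECONDS_PER_YEAR / BYTES_PER_MB


if __name__ == "__main__":
    # 15 PB/year from the slide works out to roughly 476 MB/s on average.
    print(f"{average_rate_mb_per_s(15):.0f} MB/s")
```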
LCG
Timeline: 2005: today; 2006: cosmics; 2007: first beams, first physics; 2008: full physics run
Building the Service
• SC1 (Nov04-Jan05): data transfer between CERN and three Tier-1s (FNAL, NIKHEF, FZK)
• SC2 (Apr05): data distribution from CERN to 7 Tier-1s; 600 MB/sec sustained for 10 days (one third of final nominal rate)
• SC3 (Sep-Dec05): demonstrate reliable basic service; most Tier-1s, some Tier-2s; push up Tier-1 data rates to 150 MB/sec (60 MB/sec to tape)
• SC4 (May-Aug06): demonstrate full service; all Tier-1s, major Tier-2s; full set of baseline services; data distribution and recording at nominal LHC rate (1.6 GB/sec)
• LHC service in operation (Sep06): over the following six months, ramp up to full operational capacity & performance
• LHC service commissioned (Apr07)
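To put the service-challenge rates in perspective, a sustained rate can be converted into the volume it moves. A minimal sketch assuming decimal units and a perfectly constant rate (real transfers fluctuate): SC2's 600 MB/s over 10 days corresponds to roughly 518 TB, and one day at the 1.6 GB/s nominal rate to roughly 138 TB.

```python
# Volume moved by a sustained transfer, e.g. SC2 (600 MB/s for 10 days)
# and one day at the nominal SC4 rate (1.6 GB/s). Decimal units assumed.
SECONDS_PER_DAY = 24 * 3600


def volume_tb(rate_mb_per_s: float, days: float) -> float:
    """Terabytes (10**12 B) moved at a constant rate in MB/s (10**6 B/s) over the given days."""
    return rate_mb_per_s * 10**6 * days * SECONDS_PER_DAY / 10**12


if __name__ == "__main__":
    print(f"SC2: {volume_tb(600, 10):.1f} TB")                  # 600 MB/s for 10 days
    print(f"SC4 nominal: {volume_tb(1600, 1):.1f} TB per day")  # 1.6 GB/s for one day
```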
Conclusions
• The LHC project (machine, detectors, LCG) is well underway for physics in 2007.
• Detector construction is generally proceeding well, although not without concerns in some cases; an enormous integration/installation effort is ongoing. Schedules are tight but are also taken very seriously.
• LCG (like the machine and detectors, at a technological level that defines the new ‘state of the art’) needs to fully develop the required functionality; a new ‘paradigm’.
• Large potential for exciting physics.
CHEP – Mumbai, February 2006
State of Readiness of LHC Computing Infrastructure
Jamie Shiers, CERN
LHC Commissioning
Expect it to be characterised by:
• Poorly understood detectors, calibration, software, triggers etc.
• Most likely no AOD or TAG from the first pass, but will the ESD be larger?
• The pressure will be on to produce some results as soon as possible!
• There will not be sufficient resources at CERN to handle the load
• We need a fully functional distributed system, aka Grid
• There are many use cases we have not yet clearly identified, nor indeed tested; this remains to be done in the coming 9 months!
Resource Deployment and Usage
Resource Requirements for 2008
State of Readiness of the LHC experiments’ software
P. Sphicas, CERN/UoA
Computing in High Energy Physics, Mumbai, Feb 2006
ROOT activity at CERN is fully integrated in the LCG organization (planning, milestones, reviews, resources, etc.)
• The main change during the last year has been the merger of the SEAL and ROOT projects:
  o single development team
  o adiabatic migration of the software products into a single set of core software libraries
  o 50% of the SEAL functionality has been migrated into ROOT (mathlib, reflection, python scripting, etc.)
• ROOT is now at the “root” of the software for all the LHC experiments
LHC-Era Data Rates in 2004 and 2005: Experiences of the PHENIX Experiment with a PetaByte of Data
Martin L. Purschke, Brookhaven National Laboratory, PHENIX Collaboration
[Image: RHIC from space, Long Island, NY]
Where we are w.r.t. others
[Chart: approximate data rates in MB/s, compared with ALICE, ATLAS, CMS and LHCb; values shown range from ~25 to ~1250 MB/s]
400-600 MB/s are not so Sci-Fi these days.
But is this a good thing?
We had a good amount of discussion about the merit of going to high data rates. Are we drowning in data? Will we be able to analyze the data quickly enough? Are we mostly recording “boring” events? Is it not better to trigger and reject?
• In heavy-ion collisions, the rejection power of Level-2 triggers is limited (high multiplicity, etc.)
• Triggers take a lot of time to study, and developers usually welcome a delay in the onset of the actual rejection mode
• The rejected events are by no means “boring”; there is high-statistics physics in them, too
...good thing? (cont’d)
• Get the data while the getting is good: the detector system is evolving and is hence unique for each run, so better get as much data with it as you can
• Get physics that you simply can’t trigger on
• Don’t be afraid to let data sit on tape unanalyzed: computing power increases, so time is on your side here
In the end we convinced ourselves that we could (and should) do it. The increased data rate helped defer the onset of the LVL2 rejection mode in Runs 4 and 5 (in the end we didn’t run rejection at all).
Saved a lot of headaches… we think.
Ingredients to achieve the high rates
We implemented several main ingredients that made the high rates possible:
• We compress the data before they get onto a disk the first time (cram as much information as possible into each MB)
• We run several local storage servers (“buffer boxes”) in parallel, and dramatically increased the buffer disk space (40 TB currently)
• We improved the overall network connectivity and topology
• We automated most of the file handling so that the data rates in the DAQ become manageable
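The first two ingredients can be sketched as follows. This is a minimal illustration under assumptions: zlib stands in for whatever compressor the DAQ actually uses, round-robin is just one possible way to spread files over the buffer boxes, and all file and host names are hypothetical.

```python
import itertools
import zlib
from typing import Dict, Iterable, List


def compress_buffer(raw: bytes, level: int = 1) -> bytes:
    """Compress a raw data buffer before its first write to disk.

    zlib is a stand-in: the slide does not say which compressor the DAQ uses.
    A low level (1) trades some compression ratio for speed.
    """
    return zlib.compress(raw, level)


def assign_to_buffer_boxes(files: Iterable[str], boxes: List[str]) -> Dict[str, str]:
    """Spread output files round-robin over the buffer boxes so they share the load."""
    box_cycle = itertools.cycle(boxes)
    return {name: next(box_cycle) for name in files}


if __name__ == "__main__":
    raw = b"\x00\x01" * 10_000  # highly repetitive toy buffer, compresses very well
    packed = compress_buffer(raw)
    assert zlib.decompress(packed) == raw  # lossless round trip
    print(f"{len(raw)} -> {len(packed)} bytes")
    print(assign_to_buffer_boxes(["seg0001", "seg0002", "seg0003"], ["bbox1", "bbox2"]))
```

Round-robin keeps every box equally busy when files are of similar size; a real system would also have to skip a box that is full or down.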
How to analyze these amounts of data?
• Priority reconstruction and analysis of “filtered” events (run the Level-2 trigger algorithms offline to filter out the most interesting events)
• Shipped a whole raw data set (Run 5 p-p) to Japan, to a regional PHENIX Computing Center (CCJ)
• Radically changed the analysis paradigm to a “train-based” one
Shifts in the Analysis Paradigm
After a good amount of experimenting, we found that simple concepts work best. PHENIX reduced the level of complexity in the analysis model quite a bit. We found:
• You can overwhelm any storage system by having it run as a free-for-all. Random access to files at those volumes brings any system to its knees.
• People don’t care too much what data they are developing with (ok, it has to be the right kind).
• Every once in a while you want to go through a substantial dataset with your established analysis code.
Shifts in the Analysis Paradigm
• We keep some data of the most-wanted varieties on disk.
• That disk-resident dataset mostly remains the same; we add data to the collection as we get newer runs.
• The stable disk-resident dataset has the advantage that you can immediately compare the effect of a code change while you are developing your analysis module.
• Once you think it’s mature enough to see more data, you register it with a “train”.
Analysis Trains
• After some hard lessons with the more free-for-all model, we established the concept of analysis trains.
• Pulling a lot of data off the tapes is expensive (in terms of time/resources).
• Once you go to the trouble, you want to get as much “return on investment” for that file as possible: do all the analysis you want while it’s on disk. If you don’t do that, the number of file retrievals explodes: I request the file today, the next person requests it tomorrow, a third person next week.
• We also switched to tape (cartridge)-centric retrievals: once a tape is mounted, get all the files off while the getting is good.
• It is hard to put a speed-up factor on this, but we went from impossible to analyse to an “ok” experience. On paper the speed-up is a factor of 30 or so.
• So now the data gets pulled from the tapes, and any number of registered analysis modules run over it, which is very efficient.
• You can still opt out of certain files you don’t want or need.
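The scheduling idea behind the trains (collect requests per tape cartridge, then let every registered module see each file while it sits on disk, with per-module opt-outs) can be sketched as below. Function names and the (file, tape) request format are hypothetical; the slide does not describe the real PHENIX train system at this level, modules here are toy callables that take a file name, and a real system would also read files in their on-tape order.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple


def plan_tape_reads(requests: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (file, tape) requests by cartridge and drop duplicates, so that each
    tape is mounted once and every requested file on it is read in that mount."""
    by_tape: Dict[str, List[str]] = {}
    for file_name, tape in requests:
        files = by_tape.setdefault(tape, [])
        if file_name not in files:  # a file requested twice is still read once
            files.append(file_name)
    return by_tape


def run_train(
    plan: Dict[str, List[str]],
    modules: Dict[str, Callable[[str], object]],
    opt_out: Dict[str, Set[str]],
) -> Dict[str, list]:
    """Pass every staged file through every registered module, skipping opted-out files."""
    results: Dict[str, list] = {name: [] for name in modules}
    for tape in sorted(plan):  # one mount per cartridge
        for file_name in plan[tape]:  # the file is now on disk: let every module use it
            for name, module in modules.items():
                if file_name in opt_out.get(name, set()):
                    continue
                results[name].append(module(file_name))
    return results


if __name__ == "__main__":
    requests = [("f1", "T2"), ("f2", "T1"), ("f1", "T2"), ("f3", "T1")]
    plan = plan_tape_reads(requests)
    print(plan)  # {'T2': ['f1'], 'T1': ['f2', 'f3']}
    print(run_train(plan, {"count": len, "upper": str.upper}, {"upper": {"f2"}}))
    # {'count': [2, 2, 2], 'upper': ['F3', 'F1']}
```

Here four requests turn into three file reads on two tape mounts, which is the "return on investment" argument from the slide in miniature.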
Analysis Train Etiquette
Your analysis module has to obey certain rules:
• be of a certain module kind, to make it manageable by the train conductor
• be a “good citizen”, mostly enforced by inheriting from the module parent class, starting from templates, and review
• the code must not crash, and must have no endless loops and no memory leaks
• pass prescribed Valgrind and Insure tests
This train concept has streamlined the PHENIX analysis in ways that are hard to overestimate. After the train, the typical output is relatively small and fits on disk.
Made a mistake? Or forgot to include something you need? Bad, but not too bad… fix it, test it; the next train is leaving soon. Be on board again.
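The “module kind” and “good citizen” rules are easiest to enforce through a common base class with a fixed set of hooks that the train conductor calls. A minimal sketch in Python; the class and method names are hypothetical (the slide does not give the actual PHENIX interface), and it is no substitute for the Valgrind and Insure checks.

```python
from abc import ABC, abstractmethod
from typing import Iterable, List


class AnalysisModule(ABC):
    """Hypothetical parent class that every train module inherits from.

    The conductor only ever calls these three hooks, which keeps modules manageable.
    """

    def __init__(self, name: str) -> None:
        self.name = name

    def init(self) -> None:
        """Called once before the first event (e.g. book histograms, open outputs)."""

    @abstractmethod
    def process_event(self, event: dict) -> None:
        """Called once per event; must not crash, loop forever or leak memory."""

    def end(self) -> None:
        """Called once after the last event (e.g. write the small output)."""


class TrainConductor:
    """Runs all registered modules over the same stream of events."""

    def __init__(self) -> None:
        self.modules: List[AnalysisModule] = []

    def register(self, module: AnalysisModule) -> None:
        # Enforce the "certain module kind" rule: only AnalysisModule subclasses may board.
        if not isinstance(module, AnalysisModule):
            raise TypeError(f"{module!r} is not an AnalysisModule")
        self.modules.append(module)

    def run(self, events: Iterable[dict]) -> None:
        for module in self.modules:
            module.init()
        for event in events:
            for module in self.modules:
                module.process_event(event)
        for module in self.modules:
            module.end()


class EventCounter(AnalysisModule):
    """Toy example module: counts events at or above a multiplicity cut."""

    def __init__(self, name: str, min_tracks: int) -> None:
        super().__init__(name)
        self.min_tracks = min_tracks
        self.count = 0

    def process_event(self, event: dict) -> None:
        if event.get("ntracks", 0) >= self.min_tracks:
            self.count += 1


if __name__ == "__main__":
    conductor = TrainConductor()
    counter = EventCounter("high_mult", min_tracks=100)
    conductor.register(counter)
    conductor.run([{"ntracks": 50}, {"ntracks": 150}, {"ntracks": 300}])
    print(counter.count)  # 2
```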
Performance and Scalability of xrootd
Andrew Hanushevsky (SLAC), Wilko Kroeger (SLAC), Bill Weeks (SLAC), Fabrizio Furano (INFN/Padova), Gerardo Ganis (CERN), Jean-Yves Nief (IN2P3), Peter Elmer (U Wisconsin), Les Cottrell (SLAC), Yee Ting Li (SLAC)
• Computing in High Energy Physics, 13-17 February 2006
• http://xrootd.slac.stanford.edu
• xrootd is largely funded by the US Department of Energy, Contract DE-AC02-76SF00515 with Stanford University
xrootd
• 2 projects to integrate SRM and xrootd (SLAC, FNAL)
• possible cooperation between gStore and xrootd
xrootd read access
1. Requested file on a file server and in the cache table in the redirector's memory: immediate access
2. Requested file on a file server, but not in the cache table: access after 5 s (promised to make it better); the file lifetime in the cache table is configurable, default 6 hours
3. File on tape: access via the mass storage interface with gStore; only allowed for a few files!
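The three cases can be modelled as a lookup in a time-limited table followed by fallbacks. This is a toy sketch, not the xrootd implementation or API: the class and function names are hypothetical, disk_index stands in for the redirector querying its file servers (the ~5 s delay is not modelled), whether a cache hit renews an entry is an assumption, and only the default 6-hour lifetime is taken from the slide.

```python
from typing import Dict, Optional, Tuple

DEFAULT_LIFETIME_S = 6 * 3600  # from the slide: configurable, default 6 hours


class RedirectorCache:
    """Toy model of the redirector's in-memory table: path -> (file server, time entered)."""

    def __init__(self, lifetime_s: float = DEFAULT_LIFETIME_S) -> None:
        self.lifetime_s = lifetime_s
        self._entries: Dict[str, Tuple[str, float]] = {}

    def add(self, path: str, server: str, now: float) -> None:
        self._entries[path] = (server, now)

    def lookup(self, path: str, now: float) -> Optional[str]:
        entry = self._entries.get(path)
        if entry is None:
            return None
        server, entered = entry
        if now - entered > self.lifetime_s:  # expired; assumption: hits do not renew an entry
            del self._entries[path]
            return None
        return server


def locate(cache: RedirectorCache, path: str, disk_index: Dict[str, str],
           now: float) -> Tuple[int, Optional[str]]:
    """Return (case number from the slide, file server or None).

    disk_index is a stand-in for the redirector asking its file servers
    which one holds the file; the time this takes is not modelled.
    """
    server = cache.lookup(path, now)
    if server is not None:
        return 1, server  # case 1: in cache table, immediate access
    server = disk_index.get(path)
    if server is not None:
        cache.add(path, server, now)  # remembered for later requests
        return 2, server  # case 2: on a file server, found after a query
    return 3, None  # case 3: only on tape, must be staged via gStore


if __name__ == "__main__":
    cache = RedirectorCache()
    index = {"/data/run5/file1.root": "fs1"}
    print(locate(cache, "/data/run5/file1.root", index, now=0))   # (2, 'fs1')
    print(locate(cache, "/data/run5/file1.root", index, now=60))  # (1, 'fs1')
    print(locate(cache, "/data/run5/other.root", index, now=60))  # (3, None)
```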
Prepare file servers for large-scale xrootd read access
1. gStore fills the file servers from tape:
  o only 1 stage command
  o gStore knows the optimal tape read sequence
  o gStore distributes the files optimally across all file servers
  o parallel streams
  o storage quotas for groups can be implemented
2. gStore passes the list of new files to xrootd via the prepare() interface
3. xrootd updates its cache tables in memory
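Steps 1 and 2 can be sketched as a planning function: order the requested files so each tape is read sequentially, spread them over the file servers, and produce the list of paths to hand on. A minimal sketch under assumptions: the tuple layout, the least-loaded balancing rule and all names are hypothetical and are not gStore's actual algorithm.

```python
from typing import Dict, List, Tuple

# (path, tape label, position on tape, size in bytes); layout is hypothetical
FileOnTape = Tuple[str, str, int, int]


def plan_stage(files: List[FileOnTape], servers: List[str]) -> Tuple[List[str], Dict[str, str]]:
    """Plan a single stage request: tape read order plus a target file server per file.

    Read order: by tape, then by position on tape, so each cartridge is read sequentially.
    Placement: each file goes to the server with the fewest bytes assigned so far
    (an assumed balancing rule standing in for gStore's "optimal" distribution).
    """
    read_order = sorted(files, key=lambda f: (f[1], f[2]))
    load = {server: 0 for server in servers}
    placement: Dict[str, str] = {}
    for path, _tape, _position, size in read_order:
        target = min(servers, key=lambda s: load[s])  # ties go to the server listed first
        placement[path] = target
        load[target] += size
    # The ordered path list is what would be handed on to xrootd in step 2.
    return [f[0] for f in read_order], placement


if __name__ == "__main__":
    files = [("/c", "T2", 5, 100), ("/a", "T1", 10, 300), ("/b", "T1", 2, 200)]
    order, placement = plan_stage(files, ["fs1", "fs2"])
    print(order)      # ['/b', '/a', '/c']
    print(placement)  # {'/b': 'fs1', '/a': 'fs2', '/c': 'fs1'}
```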
Actions after xrootd writes to the file servers
• xrootd writes to tape via the mass storage interface with gStore
• Possibly also: asynchronous archiving with gStore:
  o periodic query (~hours) of the file servers and xrootd cache tables (?)
  o archive new files that are no longer in the xrootd cache tables (and not yet archived)
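The selection rule in the last sub-item is a set difference evaluated on each periodic pass. A sketch assuming the three file lists are already available as sets of paths; how they are obtained (querying the file servers and xrootd) and the archive callback are hypothetical placeholders, and the slide itself marks this scheme as tentative.

```python
from typing import Callable, Set


def files_to_archive(on_file_servers: Set[str], in_xrootd_cache: Set[str],
                     archived: Set[str]) -> Set[str]:
    """New files on disk that have left the xrootd cache tables and are not yet archived."""
    return on_file_servers - in_xrootd_cache - archived


def archive_pass(on_file_servers: Set[str], in_xrootd_cache: Set[str],
                 archived: Set[str], archive: Callable[[str], None]) -> Set[str]:
    """One periodic pass (e.g. every few hours): archive the selected files and record them."""
    todo = files_to_archive(on_file_servers, in_xrootd_cache, archived)
    for path in sorted(todo):
        archive(path)  # hypothetical callback standing in for a gStore archive request
        archived.add(path)
    return todo


if __name__ == "__main__":
    archived = {"/c"}
    done = archive_pass({"/a", "/b", "/c", "/d"}, {"/b"}, archived, archive=print)
    print(sorted(done))  # ['/a', '/d']
```

Recording each archived path makes the pass idempotent: running it again on unchanged inputs archives nothing.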
Web Addresses CHEP06
• http://www.tifr.res.in/~chep06/index.php
• http://indico.cern.ch/confAuthorIndex.py?confId=048
• http://presidentofindia.nic.in/scripts/sllatest1.jsp?id=734
Service Challenge Throughput Tests
• Currently focussing on Tier0→Tier1 transfers with modest Tier2→Tier1 upload (simulated data)
• Recently achieved the target of 1 GB/s out of CERN, with rates into Tier1s at or close to nominal rates
• Still much work to do!
  o We still do not have the stability required / desired…
  o The daily average needs to meet / exceed targets
  o We need to handle this without “heroic efforts” at all times of day / night!
  o We need to sustain this over many (100) days
  o We need to test recovery from problems (individual sites, also Tier0)
  o We need these rates to tape at Tier1s (currently disk)
  o Agree on milestones for TierX→TierY transfers & demonstrate readiness
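The stability criteria above (daily averages at or above target, sustained over many consecutive days) can be checked directly from monitoring data. A minimal sketch, assuming rate samples are available as (day index, MB/s) pairs; the sample format and the rule that a day without samples counts as missed are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def daily_averages(samples: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Average the (day index, rate in MB/s) samples for each day."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for day, rate in samples:
        sums[day] += rate
        counts[day] += 1
    return {day: sums[day] / counts[day] for day in sorted(sums)}


def longest_run_meeting_target(averages: Dict[int, float], target: float) -> int:
    """Longest streak of consecutive days whose daily average meets or exceeds the target."""
    if not averages:
        return 0
    best = current = 0
    for day in range(min(averages), max(averages) + 1):
        if averages.get(day, 0.0) >= target:  # assumption: a day without samples counts as missed
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


if __name__ == "__main__":
    samples = [(0, 1000), (0, 1100), (1, 900), (1, 1100), (2, 500), (3, 1200), (5, 1300)]
    averages = daily_averages(samples)
    print(averages)  # {0: 1050.0, 1: 1000.0, 2: 500.0, 3: 1200.0, 5: 1300.0}
    print(longest_run_meeting_target(averages, target=1000))  # 2 (days 0 and 1)
```

With a 1 GB/s target, the goal on the slide corresponds to this function returning about 100.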
Achieved (Nominal) pp data rates
• Meeting or exceeding nominal rate (disk – disk)
• Met target rate for SC3 (disk & tape) re-run
Timeline - 2006
O/S Upgrade? (SLC4) Sometime before April 2007!