
Status of LHCb-INFN Computing

CSN1, Catania, September 18, 2002. Domenico Galli, Bologna.


Presentation Transcript


  1. Status of LHCb-INFN Computing. CSN1, Catania, September 18, 2002. Domenico Galli, Bologna.

  2. LHCb Computing Constraints
     • Urgent need to produce and analyze a large number of MC data sets in a short time.
     • LHCb-light detector design.
     • Trigger design, TDRs.
     • Need to optimize the hardware and software configuration to minimize dead time and system administration effort.

  3. LHCb Farm Architecture (I)
     • Article in press in Computer Physics Communications: “A Beowulf-class computing cluster for the Monte Carlo production of the LHCb experiment”.
     • Disk-less computing nodes, with operating systems centralized on a file server (Operating System Server).
     • Very flexible configuration: nodes can be added to or removed from the system without any local installation.
     • Useful for computing resources shared among different experiments.
     • Extremely stable system: no side effects at all in more than 1 year of work.
     • System administration duties minimized.

  4. LHCb Farm Architecture (II)
     • Security:
       • Use of private IP addresses and Virtual LAN.
       • High level of isolation from the Internet.
       • External access (afs servers, bookkeeping database, CASTOR library at CERN) through Network Address Translation on a Gateway node.
     • Potential system “Single Points of Failure” equipped with redundant disk configuration:
       • RAID-5 (2 NAS).
       • RAID-1 (Gateway and Operating System Server).

  5. LHCb Farm Architecture (III)
     [Diagram of the farm layout. Labels: Gateway (Red Hat 7.2, kernel 2.4.18, mirrored disks in RAID 1; DNS, NAT with IP masquerading); Control Node (disk-less, CERN Red Hat 6.1, kernel 2.2.18; PBS master, MC control server, farm monitoring); Processing Nodes 1 to n (disk-less, CERN Red Hat 6.1, kernel 2.2.18; PBS slave); Master Server (Red Hat 7.2, mirrored disks in RAID 1; OS file systems, home directories; services: PXE remote boot, DHCP, NIS); two NAS units (1 TB, RAID 5 each); Fast Ethernet switch with private and public VLANs and uplink; power distributor with power control over an Ethernet link.]

  6. [Photos of the farm hardware. Labels: Fast Ethernet switch; rack of 1U dual-processor motherboards; NAS, 1 TB; Ethernet-controlled power distributor (32 channels).]

  7. Data Storage
     • Files containing reconstructed events (OODST-ROOT format) are transferred to CERN using bbftp and automatically stored on the CASTOR tape library.
     • Data transfer from CNAF to CERN reaches a maximum throughput of 70 Mb/s (on a 100 Mb/s link).
     • To be compared with ~15 Mb/s using ftp.
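To put the two rates side by side, the short Python sketch below converts them into transfer times for a single file. The 120 MB file size is borrowed from the PVFS test described later in this talk and is only assumed to be typical; reading Mb/s as 10^6 bits per second and 1 MB as 10^6 bytes is also an assumption.

```python
def transfer_time_s(size_mb: float, rate_mbit_s: float) -> float:
    """Seconds needed to move size_mb megabytes at rate_mbit_s megabits per second."""
    return size_mb * 8.0 / rate_mbit_s


FILE_MB = 120.0       # OODST file size quoted for the PVFS test (assumed typical)
BBFTP_MBIT_S = 70.0   # maximum bbftp throughput quoted in the slide
FTP_MBIT_S = 15.0     # approximate ftp throughput quoted in the slide

bbftp_s = transfer_time_s(FILE_MB, BBFTP_MBIT_S)
ftp_s = transfer_time_s(FILE_MB, FTP_MBIT_S)

print(f"bbftp: {bbftp_s:.1f} s per file")
print(f"ftp:   {ftp_s:.1f} s per file")
print(f"ratio: {ftp_s / bbftp_s:.2f}")
```

Under these assumptions bbftp moves a file roughly 4.7 times faster than ftp, which is simply the ratio 70/15.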

  8. 2002 Monte Carlo Production
     • Target:
       • Production of large event statistics for the design of the LHCb-light detector and of the trigger system (trigger TDR).
     • Software:
       • Simulation (FORTRAN) and reconstruction (C++) code to be used in the production was supplied in July.
     • LHCb Data Challenge ongoing (August-September):
       • Participating computing centers: CERN, INFN-CNAF, Liverpool, IN2P3-Lyon, NIKHEF, RAL, Bristol, Cambridge, Oxford, ScotGrid (Glasgow & Edinburgh).

  9. Status of Summer LHCb-Italy Monte Carlo Production (Data Challenge)
     • Events produced in Bologna (August 1 to September 12): 1,053,500.

  10. Distribution of Produced Events Among Production Centers (August 1 to September 12)
     • The other centres mentioned above started the Data Challenge late.

  11. Usage of the CNAF Tier-1 Computing Resources
     • Computing, control and service nodes:
       • 130 PIII CPUs (clock ranges from 866 MHz to 1.4 GHz).
     • Disk storage servers:
       • 1 TB NAS (14 x 80 GB IDE disks + hot spare in RAID 5).
       • 1 TB NAS (7 x 170 GB SCSI disks + hot spare in RAID 5).
     • All the hardware is running at a very high duty cycle.
     [Plot: CPU load.]
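As a quick consistency check on the two NAS configurations, the Python sketch below applies the standard RAID 5 rule that one disk's worth of space goes to parity, so usable capacity is (n - 1) times the disk size. It assumes the hot spare is an extra disk outside the listed 14 and 7, and that the quoted sizes are decimal gigabytes.

```python
def raid5_usable_gb(n_disks: int, disk_gb: float) -> float:
    """Usable capacity of a RAID 5 array: one disk's worth of space holds parity."""
    if n_disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (n_disks - 1) * disk_gb


# Hot spares are assumed to sit outside the arrays and add no capacity.
ide_nas_gb = raid5_usable_gb(14, 80.0)    # 14 x 80 GB IDE disks
scsi_nas_gb = raid5_usable_gb(7, 170.0)   # 7 x 170 GB SCSI disks

print(f"IDE NAS:  {ide_nas_gb:.0f} GB usable")
print(f"SCSI NAS: {scsi_nas_gb:.0f} GB usable")
```

Both results (1040 GB and 1020 GB) agree with the "1 TB" figure quoted for each unit.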

  12. Plan for Analysis Activities
     • The analysis of the data produced during the Data Challenge is foreseen for the autumn.
     • The development environment of the analysis code (DaVinci C++ code) has been completely ported to Bologna and has been in use on a mini-farm for 2 months.
     • The analysis mini-farm needs to be extended to a greater number of nodes to meet the needs of the Italian LHCb collaboration.
     • Data produced in Bologna are kept on Bologna disks; data produced in the other centers need to be transferred to Bologna on user demand with an automatic procedure.
     • Analysis jobs (on ~100 CPUs) need an I/O throughput (~100 MB/s) greater than that supplied by a NAS (~10 MB/s).

  13. High Performance I/O System (I)
     • An I/O parallelization system (through the use of a parallel file system) was successfully tested.
     • PVFS (Parallel Virtual File System).
     • File striping of data among local disks of several I/O servers (ION).
     • Scalable system (throughput ~ 100 Mbit/s x n_ION).
     [Diagram: client nodes (CN 1 to CN m) connected through the network to I/O nodes (ION 1 to ION n) and to a management node (MGR).]
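The idea behind file striping can be illustrated with a minimal Python model of round-robin placement: a file is cut into fixed-size stripes and consecutive stripes go to consecutive I/O nodes, so a large sequential read is served by all nodes at once. This is only a sketch of the concept, not the PVFS API; the function name and the 64 KiB stripe size are assumptions made for illustration.

```python
STRIPE_SIZE = 64 * 1024  # bytes per stripe; assumed value for illustration only


def io_node_for_offset(offset: int, n_ion: int, stripe_size: int = STRIPE_SIZE) -> int:
    """Index (0-based) of the I/O node holding the byte at `offset`
    when stripes are distributed round-robin over n_ion nodes."""
    if offset < 0 or n_ion < 1 or stripe_size < 1:
        raise ValueError("offset must be >= 0, n_ion and stripe_size >= 1")
    return (offset // stripe_size) % n_ion


# Walk through the first 12 stripes of a file spread over 10 I/O nodes.
placement = [io_node_for_offset(i * STRIPE_SIZE, n_ion=10) for i in range(12)]
print(placement)
```

With 10 I/O nodes, stripes 0 to 9 land on nodes 0 to 9 and stripe 10 wraps around to node 0, which is why the aggregate throughput can grow with the number of I/O nodes.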

  14. High Performance I/O System (II)
     • With 10 ION we were able to reach an aggregate I/O of 110 MB/s (30 client nodes reading data).
     • To be compared with:
       • 20-40 MB/s (local disk)
       • 10 MB/s (100Base-T NAS)
       • 50 MB/s (1000Base-T NAS)
     • With a single file hierarchy.
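The measured figure can be compared with the scaling rule quoted for the system (about 100 Mbit/s per I/O node). The Python sketch below does the unit conversion; it assumes 8 bits per byte with no protocol overhead, so the ceiling is a nominal upper bound rather than an achievable rate.

```python
N_ION = 10
PER_ION_MBIT_S = 100.0     # nominal rate per I/O node, from the quoted scaling rule
MEASURED_MB_S = 110.0      # aggregate rate reported with 10 ION
NAS_100BASET_MB_S = 10.0   # rate quoted for a 100Base-T NAS

ceiling_mb_s = N_ION * PER_ION_MBIT_S / 8.0       # nominal aggregate ceiling in MB/s
fraction_of_ceiling = MEASURED_MB_S / ceiling_mb_s
gain_over_nas = MEASURED_MB_S / NAS_100BASET_MB_S

print(f"nominal ceiling:     {ceiling_mb_s:.0f} MB/s")
print(f"fraction of ceiling: {fraction_of_ceiling:.0%}")
print(f"gain over 100Base-T NAS: {gain_over_nas:.0f}x")
```

Under these assumptions the measured 110 MB/s is 88% of the nominal 125 MB/s ceiling and 11 times the rate of a single 100Base-T NAS.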

  15. Test of a PVFS-Based Analysis Facility (I)
     • Test performed using the OO DaVinci algorithm for B0 → π+π− selection.
     • Analyzed 44.5k signal events and 484k bb inclusive events in 25 minutes (to be compared with 2 days on a single PC).
     • Performed entirely on the Bologna farm, parallelizing the analysis algorithm over 106 CPUs (80 x 1.4 GHz PIII CPUs + 26 x 1 GHz PIII CPUs).
     • DaVinci processes read OODST from PVFS.
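The quoted times imply a speedup that can be checked against the number of CPUs. The Python sketch below does so; note that "2 days" is a round figure and the single PC's clock speed is not stated, so the result is only indicative.

```python
SINGLE_PC_MIN = 2 * 24 * 60   # "2 days" on a single PC, taken at face value
FARM_MIN = 25                 # wall-clock time on the farm
N_CPUS = 106                  # 80 x 1.4 GHz + 26 x 1 GHz PIII

speedup = SINGLE_PC_MIN / FARM_MIN
speedup_per_cpu = speedup / N_CPUS

print(f"speedup: {speedup:.1f}x on {N_CPUS} CPUs")
print(f"speedup per CPU: {speedup_per_cpu:.2f}")
```

A speedup of about 115 on 106 CPUs is consistent with roughly linear scaling, within the precision of the "2 days" estimate.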

  16. Test of a PVFS-Based Analysis Facility (II)
     [Diagram: client nodes CN 1 to CN 106 read OODST from PVFS, served by I/O nodes ION 1 to ION 10 with a management node (MGR); n-tuple output; login node.]

  17. Test of a PVFS-Based Analysis Facility (III)
     • 106 DaVinci processes reading from PVFS.
     • 968 files (500 OODST events each) x 120 MB.
     • 116 GB read and processed in 1500 s.
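The numbers on this slide can be combined into average rates. The Python sketch below reproduces the arithmetic, assuming decimal units (1 GB = 1000 MB) and that every file is exactly 120 MB and holds exactly 500 events.

```python
N_FILES = 968
FILE_MB = 120.0
EVENTS_PER_FILE = 500
N_PROCESSES = 106
ELAPSED_S = 1500.0

total_mb = N_FILES * FILE_MB                  # total data volume read
total_events = N_FILES * EVENTS_PER_FILE      # total events processed
aggregate_mb_s = total_mb / ELAPSED_S         # average aggregate read rate
per_process_mb_s = aggregate_mb_s / N_PROCESSES
events_per_s = total_events / ELAPSED_S

print(f"total volume: {total_mb / 1000:.1f} GB, {total_events} events")
print(f"aggregate rate: {aggregate_mb_s:.1f} MB/s")
print(f"per process: {per_process_mb_s:.2f} MB/s")
print(f"event rate: {events_per_s:.0f} events/s")
```

The total comes to about 116 GB and 484,000 events, matching the quoted volume and the size of the bb inclusive sample, for an average of roughly 77 MB/s across the farm.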

  18. B+–: Pion Momentum Resolution p / p for identified pionscoming from B0 |p / p| vs p for identified pionscoming from B0 p / p FWMH  0.01 p / p p [GeV/c] Status of LHCb-INFN Computing, 18 Domenico Galli

  19. B0 Mass Plots
     • Cuts: Pt > 800 MeV/c; d/σd > 1.6; lB0 > 1 mm.
     [Three mass plots, axes in MeV/c2: all pi+ pi- pairs with no cuts; all pi+ pi- pairs with all cuts; all pi+ pi- pairs with all cuts (magnified). Labels: 3425 events; 105 events; FWHM ≈ 66 MeV.]

  20. bb Inclusive Background Mass Plot
     • Total number of events: 484k.
     • Only events with a single interaction are taken into account at the moment: ~240k.
     • 213 events in mass region after all cuts.
     • 32/213 are ghosts.
     [Mass plot, axis in GeV/c2: all pi+ pi- pairs with all cuts.]

  21. Signal Efficiency and Mass Plots for Tighter Cuts
     • Final efficiency (tighter cuts) at zero bb inclusive background (240k events) = 871/22271 ≈ 4%.
     • Rejection against bb inclusive background > 1 − 1/240000 = 99.9996%.
     • Tighter cuts: Pt > 2.8 GeV/c; d/σd > 2.5; lB0 > 0.6 mm.
     [Two mass plots, axes in GeV/c2. Labels: 871 signal events in mass region; 16 BG events from signal sample in mass region (all ghosts).]
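The two headline numbers follow from simple ratios. Since no background event out of the 240k survives the tighter cuts, the surviving fraction is below 1/240000, which gives a lower bound on the rejection. The Python sketch below reproduces both figures; the variable names are illustrative.

```python
SIGNAL_SELECTED = 871        # signal events in the mass region after tighter cuts
SIGNAL_TOTAL = 22271         # denominator quoted for the efficiency
BACKGROUND_SAMPLE = 240_000  # bb inclusive events considered (single interaction)

efficiency = SIGNAL_SELECTED / SIGNAL_TOTAL

# Zero background events survive, so the surviving fraction is below
# 1 / BACKGROUND_SAMPLE and the rejection is at least 1 minus that.
rejection_lower_bound = 1.0 - 1.0 / BACKGROUND_SAMPLE

print(f"signal efficiency: {efficiency:.2%}")
print(f"background rejection > {rejection_lower_bound:.4%}")
```

The efficiency evaluates to about 3.9%, which the slide rounds to 4%.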

  22. Conclusions
     • The MC production farm has been running stably (with increasing resources) for more than 1 year.
     • INFN Tier-1 is the second most active LHCb MC production centre (after CERN).
     • The collaboration with the CNAF staff is excellent.
     • We are not yet using GRID tools in production, but we plan to move to them as soon as the detector design is stable.
     • An analysis mini-farm for interactive work has been running for more than 1 month, and we plan to extend the number of nodes depending on the availability of resources.
     • The architecture of a massive analysis system has already been tested using a parallel file system and 106 CPUs.
     • To supply the analysis facility to the LHCb-Italian collaboration, we need at least to keep the present computing power at CNAF (more resources, to keep production running in parallel with massive analysis activities, would be welcome).
