
Belle Computing / Data Handling





  1. Belle Computing / Data Handling • What is Belle and why do we need large-scale computing? • Current Belle computing system & data handling • Planning for the super-B era • A case study Youngjoon Kwon (Yonsei Univ.) & Jysoo Lee (KISTI)

  2. What is Belle? • KEKB asymmetric-energy collider: e+ (3.5 GeV) on e- (8 GeV) • design luminosity = 10^34 /cm^2/s • E(cm) = 10.58 GeV, on resonance for Υ(4S) production • Belle detector optimized for studying matter-antimatter asymmetry in the Universe

  3. The Belle Experiment • To study matter-antimatter asymmetry in B meson decays • Accumulated 100 million B-meson pairs since turn-on in 1999 • Published 44 journal papers and over 200 conference contributions

  4. Belle’s need for large-scale computing • To achieve ½ of Belle’s physics goals: need ~10^8 events • Time required for “real data” analysis: 40 days / 100 M events / 1 GHz • Need ~10 GHz per analysis to finish one data loop within 1 week • Belle produces ~20 papers/year; a typical paper takes ~2 years to finish analysis => ~40 analyses being done simultaneously • Hence, we need ~400 GHz to sustain the current “real data” analysis activity alone • But we also need a Monte Carlo sample (x4 in size): 10 sec/evt/GHz => ~130 years/GHz • Hence, we need ~200 GHz to provide the MC sample within a year • Need almost 1 THz to sustain physics analysis activities • We need additional CPUs for raw data processing, etc.
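
A back-of-envelope sketch (Python, illustration only) of how the CPU numbers on this slide fit together; all inputs are the figures quoted above.

```python
# Back-of-envelope CPU estimate using the numbers quoted on this slide.
SECONDS_PER_YEAR = 3600 * 24 * 365

N_EVENTS = 1e8            # events needed for ~1/2 of the physics goals
DAYS_PER_LOOP_1GHZ = 40   # 40 days to loop over 100 M events with 1 GHz
TARGET_DAYS = 7           # want one data loop per week
N_ANALYSES = 40           # ~20 papers/year x ~2 years per analysis

ghz_per_analysis = DAYS_PER_LOOP_1GHZ / TARGET_DAYS   # ~6, quoted as ~10 GHz with headroom
ghz_real_data = 10 * N_ANALYSES                       # ~400 GHz for real-data analysis

MC_FACTOR = 4             # MC sample is x4 the real-data sample
SEC_PER_EVT_GHZ = 10      # MC generation cost per event per GHz
mc_cpu_years = MC_FACTOR * N_EVENTS * SEC_PER_EVT_GHZ / SECONDS_PER_YEAR   # ~130 GHz-years
ghz_mc = mc_cpu_years     # GHz needed to deliver the MC sample within one year (quoted ~200 GHz)

print(f"per analysis : ~{ghz_per_analysis:.0f} GHz")
print(f"real data    : ~{ghz_real_data:.0f} GHz")
print(f"MC production: ~{ghz_mc:.0f} GHz")
# With raw-data processing on top, the total approaches the ~1 THz quoted above.
```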

  5. Central Belle computing system

  6. CPUs • Belle’s reference platform: Sparcs running Solaris 2.7 • 9 workgroup servers (500 MHz, 4 CPU) • 38 compute servers (500 MHz, 4 CPU) • LSF batch system / 40 tape drives (2 each on 20 servers) • Fast access to disk servers • 20 user workstations with DAT, DLT, AIT drives • Additional Intel CPUs • Compute servers (@KEK, Linux RH 6.2/7.2) • 4-CPU (Pentium Xeon 500-700 MHz) servers: ~96 units • 2-CPU (Pentium III 0.8~1.26 GHz) servers: ~167 units • User terminals (@KEK, to log onto the group servers) • 106 PCs (~50 Win2000 + X window software, ~60 Linux) • User analysis PCs (@KEK, unmanaged) • Compute/file servers at universities • A few to a few hundred @ each institution • Used in generic MC production as well as physics analyses at each institution • Tau analysis center @ Nagoya U., for example

  7. Disk servers @ KEK • 8 TB NFS file servers • 120 TB HSM (4.5 TB staging disk) • DST skims • User data files • 500 TB tape library (direct access) • 40 tape drives on 20 Sparc servers • DTF2: 200 GB/tape, 24 MB/s I/O speed • Raw, DST files • Generic MC files are stored and read by users (batch jobs) • ~12 TB local data disks on PCs • Not used efficiently at this point

  8. Data storage requirements • Raw data: 1 GB/pb^-1 (100 TB / 100 fb^-1) • DST: 1.5 GB/pb^-1 per copy (150 TB / 100 fb^-1) • Skims for calibration: 1.5 GB/pb^-1 • MDST: 50 GB/fb^-1 (5 TB / 100 fb^-1) • Other physics skims: 30 GB/fb^-1 (3 TB / 100 fb^-1) • Generic MC (MDST): ~20 TB/year • Total: ~450 TB/year
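
A small sketch (Python, for illustration) of how the per-luminosity rates above add up to the ~450 TB/year total, under the assumption of roughly 100 fb^-1 recorded per year.

```python
# Yearly storage budget from the per-luminosity rates on this slide.
# The ~100 fb^-1/year of recorded data is an assumption; it matches the
# "per 100 fb^-1" figures quoted above.
FB_PER_YEAR = 100
PB_PER_FB = 1000          # 1 fb^-1 = 1000 pb^-1

gb_per_pb = {"raw data": 1.0, "DST (one copy)": 1.5, "calibration skims": 1.5}
gb_per_fb = {"MDST": 50, "physics skims": 30}
mc_tb_per_year = 20       # generic MC (MDST)

total_tb = sum(gb_per_pb.values()) * PB_PER_FB * FB_PER_YEAR / 1000 \
         + sum(gb_per_fb.values()) * FB_PER_YEAR / 1000 \
         + mc_tb_per_year

print(f"~{total_tb:.0f} TB/year")   # ~430 TB/year, i.e. the ~450 TB/year quoted above
```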

  9. CPU requirements – DST production • Goal: 3 months to reprocess all data • Often we have to wait for calibration constants • Often we have to restart due to bad constants • 300 GHz (PIII) for 1 fb^-1/day

  10. CPU requirements – MC production • For every real data set, need to generate at least x3 as many MC events • 240 GB/fb^-1 of data in the compressed format • No intermediate info (DC hits, ECL showers) is saved • With every new release of the software library, need to produce a new generic MC sample • 400 GHz (PIII) for 1 fb^-1/day
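
A short cross-check (Python sketch) tying the DST and MC rates on slides 9 and 10 to the other slides; the ~100 fb^-1 dataset size is an assumption, taken from the "per 100 fb^-1" scale used on the storage slide.

```python
# Cross-check of the DST and generic-MC rates against the other slides.
DATASET_FB = 100                 # assumed accumulated real data, fb^-1
DST_RATE_FB_PER_DAY = 1.0        # with 300 GHz of PIII CPU (slide 9)
MC_RATE_FB_PER_DAY = 1.0         # MC matching 1 fb^-1 of data per day, 400 GHz (slide 10)
MC_GB_PER_FB = 240               # compressed generic MC per fb^-1 of data

print(f"full DST reprocessing : ~{DATASET_FB / DST_RATE_FB_PER_DAY:.0f} days (goal: ~3 months)")
print(f"full MC regeneration  : ~{DATASET_FB / MC_RATE_FB_PER_DAY:.0f} days per software release")
print(f"generic MC storage    : ~{DATASET_FB * MC_GB_PER_FB / 1000:.0f} TB "
      f"(compare ~20 TB/year of MC MDST on slide 8)")
```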

  11. Data transfer to remote users • A firewall & login servers make data transfer miserable (100 Mbps max.) • DAT tapes are used for massive data transfer • Compressed hadron skim files • MC events generated by outside institutions • Dedicated GbE links to a few institutions are now being added • A total of 10 Gbps to/from KEK is being added • Slow network to most other collaborators
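
For illustration, a rough estimate (Python) of why the 100 Mbps path is painful and tapes are used: the 5 TB example volume is the per-100-fb^-1 MDST figure from slide 8, and the times ignore protocol overhead.

```python
# Idealised transfer times for a 5 TB MDST set (per-100-fb^-1 figure from
# slide 8) over the links mentioned on this slide; protocol overhead ignored.
VOLUME_TB = 5
bits = VOLUME_TB * 1e12 * 8

links_bps = {
    "100 Mbps (firewall / login servers)": 100e6,
    "1 Gbps dedicated GbE link":           1e9,
    "10 Gbps aggregate to/from KEK":       10e9,
}
for name, bps in links_bps.items():
    print(f"{name:38s}: ~{bits / bps / 3600:.1f} hours")
```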

  12. Compute problem? • Obviously, the existing computing resources are already stretched beyond capacity • The data set is doubling every year with no end in sight • Management of data and CPU is already a major burden • By far the most cost-effective solution is large clusters of commodity PCs running Linux • How to manage these? • GRID!

  13. Prototype GRID-style analysis • Need to run a multi-parameter fitting program for the CP-violation measurement => a multi-CPU CP fitter
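
A minimal sketch of the multi-CPU fitter idea, assuming a plain unbinned maximum-likelihood fit parallelised over event chunks with Python's multiprocessing; the Gaussian per-event PDF and the random data are placeholders, not the actual Belle CP fit.

```python
# Toy multi-CPU fitter: the negative log-likelihood is evaluated in parallel
# over chunks of events. The Gaussian PDF and random "events" are placeholders.
import numpy as np
from multiprocessing import Pool
from scipy.optimize import minimize

N_WORKERS = 4
events = np.random.normal(0.0, 1.0, size=1_000_000)    # placeholder event sample

def chunk_nll(args):
    """-log L for one chunk under a Gaussian(mu, sigma) per-event PDF."""
    chunk, mu, sigma = args
    return np.sum(0.5 * ((chunk - mu) / sigma) ** 2 + np.log(sigma))

def total_nll(params, chunks, pool):
    mu, sigma = params
    sigma = abs(sigma)                                  # keep the PDF well defined
    return sum(pool.map(chunk_nll, [(c, mu, sigma) for c in chunks]))

if __name__ == "__main__":
    chunks = np.array_split(events, N_WORKERS)
    with Pool(processes=N_WORKERS) as pool:             # one worker per CPU
        fit = minimize(total_nll, x0=[0.1, 1.2], args=(chunks, pool),
                       method="Nelder-Mead")
    print(fit.x)                                        # should be close to (0, 1)
```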

  14. Planning for the Super-B era • A x15 increase in luminosity is planned c. 2006 • Data accumulation: ~2 PB/year • Including MC, need 10 PB of storage to start super-B • To re-process 2 years' accumulation (2 ab^-1) of data in 3 months, we need x30 the CPU power • CPU @ KEK alone is not enough • A cluster of Local Data Centers (connected by GRID) is planned! • One unit of LDC: 300 GHz + 60 TB + 3 MB/s to KEK • Cost: $0.3M + $0.2M + $(Network) • Can we afford one?
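
A quick scaling sketch (Python) behind the x30 figure, starting from the current 1 fb^-1/day per 300 GHz DST rate on slide 9; the x30 quoted above leaves some margin over this naive estimate.

```python
# Super-B reprocessing scale: 2 ab^-1 in ~3 months, scaled from the current
# DST rate of 1 fb^-1/day per 300 GHz (slide 9). Naive estimate only.
ACCUMULATED_FB = 2000          # 2 ab^-1
TARGET_DAYS = 90               # ~3 months
CURRENT_RATE = 1.0             # fb^-1/day with 300 GHz today

needed_rate = ACCUMULATED_FB / TARGET_DAYS     # ~22 fb^-1/day
scale = needed_rate / CURRENT_RATE             # ~x22 (quoted as x30, with margin)
print(f"~{needed_rate:.0f} fb^-1/day -> ~x{scale:.0f} CPU, i.e. ~{scale * 0.3:.1f} THz of PIII")
```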

  15. Belle-GRID – a case study Two Australian collaborators in Belle (U. Melbourne & U. Sydney) are working on a GRID prototype for Belle physics analyses

  16. Belle-GRID – a case study Blueprint for Belle-GRID in Australia

  17. Belle-GRID – a case study • Belle analysis using a Grid environment • useful locally » adopted by Belle » wider community • Construction of a Grid node at Melbourne • Certificate Authority to approve security • Globus Toolkit: • GRIS (Grid Resource Information Service) - LDAP with Grid security • Globus Gateway - connected to the local queue (GNU Queue; PBS?) • GSIFTP - data resource providing access to local storage • Replica Catalog - LDAP for a virtual data directory • Replicate this setup in Sydney • Initial test of Belle code with grid node & queue • Data access via the grid (Physical File Names as stored in the Replica Catalog) • Modification of Belle code to access the data on the grid • Test of Belle code with grid node & queue & grid data access • Connect the 2 grid nodes (Melbourne EPP and Sydney EPP) • Test of Belle code running over separated grid clusters • Implement or build a Resource Broker
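
To make the data-access path concrete, a hedged sketch (Python wrapping the classic Globus Toolkit command-line tools globus-url-copy and globus-job-run); the host names, file paths and analysis executable are hypothetical placeholders, not the actual Melbourne/Sydney configuration.

```python
# Sketch of the grid data-access and job-submission steps described above,
# via the Globus Toolkit CLI. All host names, paths and the analysis
# executable are hypothetical placeholders.
import subprocess

GATEKEEPER = "grid.example.edu.au"   # hypothetical Globus gateway host
PFN = "gsiftp://grid.example.edu.au/data/belle/mdst/sample.mdst"   # as listed in the Replica Catalog
LOCAL = "file:///tmp/sample.mdst"

# 1. Stage the input file from the GSIFTP data resource to local disk.
subprocess.run(["globus-url-copy", PFN, LOCAL], check=True)

# 2. Hand the (hypothetical) Belle analysis job to the gateway, which
#    forwards it to the local batch queue.
subprocess.run(["globus-job-run", GATEKEEPER, "/belle/bin/analysis",
                "/tmp/sample.mdst"], check=True)
```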

  18. Belle-GRID – a case study • Belle analysis test case: • Analysis of charmless B meson decays to 2 vector mesons, used to determine 2 angles of the CKM unitarity triangle • Belle analysis code over Grid resources (10 files; 2 GB total) • Data files processed serially: 95 min • Data files processed over Globus: 35 min • Data access (2 secure protocols, GASS/GSIFTP; 100 Mbit network) • NFS access for comparison: 8.5 MB/s • GASS access: 4.8 MB/s • GSIFTP access: 9.1 MB/s • Belle analysis using Grid data access • NFS access for comparison: 0.34 MB/s • GSIFTP data streaming: 0.36 MB/s

  19. Summary • Belle’s computing resources are stretched beyond capacity. • Moreover, we are planning a x15 increase in luminosity (the so-called “super KEKB”) within a few years. • Perhaps Local Data Centers connected by GRID are the only viable option. • Two Australian groups are working on a Belle-GRID analysis prototype. So far it has been working as planned.
