
Calcolo @ CDF





  1. Calcolo @ CDF — A. Sidoti, University of Pisa - INFN Pisa. Workshop sulle Problematiche di Calcolo e Reti nell'INFN, 24-28 Maggio 2004, Sant'Elmo Beach Hotel, Castiadas (CA) • Outline • The CDF experiment • CDF computing model • Interactions INFN – CDF

  2. CDF Experiment@Tevatron • CDF is a high-energy physics experiment at the Tevatron (Fermilab, Batavia, IL): a multipurpose detector. • Broad physics spectrum: • Top • Electroweak • SUSY searches • B sector (and charm) • QCD • The Tevatron is a proton-antiproton collider with √s = 1.96 TeV • CDF has been taking data since March 2001 • ~700 physicists are involved • Italy gives the largest non-US contribution (~10%) • Other countries: • Canada • Japan, Korea, Taiwan • Spain, Germany, Switzerland, UK, Russia, Finland

  3. Tevatron Run II luminosity projections • So far (May 04) ~340 pb-1 on tape, and it will hopefully increase! • Event size ~250 kB (50 kB if compressed) • Dataset sizes (just two datasets shown; many others exist, including control data samples, and each is duplicated for different reconstruction versions): Bhad(SVT): 28(5) TB, 140M events; HighPt Ele: 2 TB, 15M events; yield 700 events/fb-1 • Run 2 reconstruction: 20 MEv/day, ~200 Hz, 2 GHz-sec per event • Tevatron Luminosity Plan (plot)
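A rough cross-check of the reconstruction and storage figures quoted above, as a minimal Python sketch; the inputs are the slide's own numbers, and the derived quantities are only order-of-magnitude estimates:

```python
# Back-of-the-envelope check of the slide's numbers: 20 MEv/day reconstructed
# at ~200 Hz, 2 GHz-sec of CPU per event, ~250 kB per raw event.
EVENT_SIZE_KB = 250           # raw event size (50 kB if compressed)
EVENTS_PER_DAY = 20e6         # Run 2 reconstruction throughput
CPU_GHZ_SEC_PER_EVENT = 2.0   # 2 GHz-seconds per event
SECONDS_PER_DAY = 86400

rate_hz = EVENTS_PER_DAY / SECONDS_PER_DAY
raw_tb_per_day = EVENTS_PER_DAY * EVENT_SIZE_KB * 1e3 / 1e12
cpu_ghz = EVENTS_PER_DAY * CPU_GHZ_SEC_PER_EVENT / SECONDS_PER_DAY

print(f"event rate          : {rate_hz:.0f} Hz (slide quotes ~200 Hz)")
print(f"raw data per day    : {raw_tb_per_day:.0f} TB")
print(f"sustained CPU needed: {cpu_ghz:.0f} GHz, i.e. ~{cpu_ghz/2:.0f} 2 GHz CPUs")
```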

  4. Computing@CDF: History • In Run I it was difficult and inefficient to work far from Fermilab -> wanted to improve for Run II • Computing needs à la LHC, before GRID! • Needed to develop a computing model on our own • Not enough manpower to start from scratch • Strategy: integrate the best available solutions into the CDF framework to build the CDF computing model • Batch system: started with LSF -> FBSNG -> Condor • Data handling (cf. A. Fella's talk): SAM and dCache • ….

  5. CDF Central Analysis Farm (CAF) • Compile/link/debug everywhere • Submit from everywhere • Execute @ FNAL • Submission of N parallel jobs with a single command • Access data from CAF disks • Access tape data via a transparent cache • Get job output everywhere • Store small output on a local scratch area for later analysis • Access to the scratch area from everywhere • Installing GW and WN with ups/upd • IT WORKS NOW • Remote cloning works! • (Diagram: my desktop / my favorite computer <-> FNAL gateway; N jobs run on a pile of PCs; data come from local data servers, dCache and Enstore tape; job output and logs are returned via ftp/rootd/NFS through the scratch server to the user's desktop)
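The "N parallel jobs with a single command" model above could be sketched as follows; the wrapper name, the submission client and its options are purely illustrative stand-ins, not the real CAF interface:

```python
# Illustrative sketch of "submit N parallel jobs with a single command".
# The client name, options and paths are hypothetical; the real CAF
# submission tool is a CDF-specific client not reproduced here.
def caf_submit(script, n_segments, farm="CAF@FNAL", out_dest="scratch"):
    """Fan a user script out into n_segments jobs, one output file each."""
    for segment in range(n_segments):
        cmd = [
            "submit_client",                   # placeholder for the FBSNG/Condor client
            "--farm", farm,                    # the user chooses where to run
            "--output", f"{out_dest}/seg{segment}.tgz",
            script, str(segment),              # each segment analyses its own file list
        ]
        print("would submit:", " ".join(cmd))  # a real wrapper would execute this

caf_submit("run_analysis.sh", n_segments=10)
```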

  6. Batch System • The batch system we are using now implements: • Jobs can run with different time limits (CPU and real time) and are prioritized by time limit • Fair sharing among users • Groups of users may have higher privileges (with fair sharing among them) • This is implemented for the "Italian" portion of the CAF we have at FNAL • FBSNG is our batch queue system (http://www-isd.fnal.gov/fbsng/) • If we need to switch to another batch system (e.g. PBS) we need the same features
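A minimal sketch of the fair-share idea listed above (priority falls with recently used CPU time, and a group weight gives privileged groups a larger share); this only illustrates the concept and is not FBSNG's actual algorithm:

```python
# Toy fair-share priority: users who consumed more CPU recently get lower
# priority; a group weight >1 models the privileged groups mentioned above.
# Not FBSNG's real algorithm, just an illustration of the feature list.
from dataclasses import dataclass

@dataclass
class User:
    name: str
    recent_cpu_hours: float    # CPU used in the current accounting window
    group_weight: float = 1.0  # >1 for privileged groups (e.g. an "Italian" share)

def priority(user: User) -> float:
    # Higher is better: group weight divided by (1 + recent usage).
    return user.group_weight / (1.0 + user.recent_cpu_hours)

queue = [User("alice", 120.0), User("bob", 5.0), User("carla", 5.0, group_weight=2.0)]
for u in sorted(queue, key=priority, reverse=True):
    print(f"{u.name:6s} priority = {priority(u):.3f}")
```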

  7. CDF computing scheme (offline) • Reconstruction Farm: • Located at FNAL • FBSNG batch queue system • Users do not run jobs there • Central Analysis Farm at Fermilab (CAF): • User analysis jobs run there (producing ntuples) • FBSNG batch queue system • Authentication through Kerberos • (Diagram: raw data from Enstore tapes -> Reconstruction Farm -> reprocessed data (raw + physics objects) -> CAF@FNAL, CondorCaf@FNAL and the dCAFs: CAF@CNAF, ASCAF (Taiwan), KorCAF (Korea), UCSDCAF (San Diego) -> ntuples/rootuples -> users' desktops)

  8. Hardware resources in CDF-GRID

  9. Towards CDF-Grid… • Job status on a web page; command-line mode to monitor job execution (top, ls, tail, debug) • Possible to connect a local gdb to a process running on a remote farm! • The user decides where to run (FNAL, CNAF, San Diego, …) • It would have been hard to have physics results for conferences and publications without building the CDF dCAFs • MC production off-site is a reality and a necessity (at least a factor of 10 more MC than data events) • We are running production on UCSDCAF • CDF proposal: do 50% of the analysis work offsite • The plan and the core are ready (dCAFs); working hard on the missing pieces • Our proposal: do 15% of the analysis work in Italy within one year, if there are enough resources

  10. CNAF performance • Data -> CPU: OK • Data import: 1 TB/day (~120 Mbit/sec), OK • Data export (output back to FNAL): 200 Mbit/sec achieved • Data analysis problem: >100 processes reading from the same disk make performance drop to zero • Solution (home made, sketched below): files are copied to the worker-node scratch disk and opened there; a queuing tool limits concurrent copies to 20 -> the file server feeds at 110 MByte/sec (950 Mbit/sec)
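The home-made queuing tool itself is not shown here, but the "at most 20 copies at the same time" idea can be sketched with a counting semaphore; file names and the sleep that stands in for the actual transfer are illustrative:

```python
# Sketch of the CNAF fix: stage input files to the worker-node scratch disk,
# but never run more than 20 copies from the file server at once. The sleep
# stands in for the real copy (e.g. shutil.copy / rcp); names are illustrative.
import threading
import time

MAX_CONCURRENT_COPIES = 20
copy_slots = threading.BoundedSemaphore(MAX_CONCURRENT_COPIES)

def stage_to_scratch(filename):
    """Copy one input file to local scratch, then let the job open it there."""
    with copy_slots:        # blocks while 20 copies are already in flight
        time.sleep(0.1)     # stand-in for the actual file transfer
        print(f"staged {filename} to local scratch")

# >100 concurrent readers was the problem case described above.
threads = [threading.Thread(target=stage_to_scratch, args=(f"file{i}.root",))
           for i in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```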

  11. Security: CDF-Grid • Accessing CDF-Grid from Italy, and vice versa, can become a nightmare because of the different security policies adopted by the different INFN sections: • Security at FNAL is based on Kerberos (5) • Data come from FNAL via kerberized transfers (rcp, bbftp, ftp, …) • Job output must end up on the desktops in the sections (ftp, kerberized rcp, …) • The sections must agree to run a server (everyone wants to run a client, but nobody the server!) of a trusted data-transfer application • At the moment we survive because the sysadmins are friends • But this is clearly not a scalable solution • We would like uniform choices across all INFN sections (at least those hosting a CDF group) • The same problem applies to interactive work

  12. Interactive computing in the sections • Some solutions adopted in the sections: • Pisa (D. Fabiani, E. Mazzoni): • A CPU farm for interactive work • A Storage Area Network shared with the other experiments in the section (Computing Storage Facility) • cf. E. Mazzoni • Padova (M. Menguzzato): • A Mosix cluster built from the group's desktops • Completely transparent to the users • Very good performance • For the moment AFS is not mounted (the CDF code is distributed via AFS) -> a serious problem!

  13. Batch • The Italian contribution to CDF-Grid has been fundamental, and it would have been difficult without the help of the computing groups of the INFN sections. • Bologna (F. Semeria – O. Pinazza): • Testing and installation in Bologna of the FBSNG queue system used on the CAF • Web monitoring system for the dCAFs • CNAF (F. Rosso et al.): the help of the CNAF system administrators was irreplaceable for the hardware/software installation of the CAF@CNAF, plus installation of the data handling (dCache and SAM, cf. A. Fella's talk).

  14. Batch II • Frascati (I. Sfiligoi): • Implementation of icaf: the dCAF scratch area and graphical tools • Installation of Condor replacing FBSNG on the dCAFs (at the moment two dCAFs use Condor: UCSDCAF and CondorCAF) • Implementation of PEAC (Proof Enabled Analysis Cluster) • Demo presented at SC2003 (Phoenix, AZ)

  15. PEAC • A significant part of analysis involves interactive visualization of histograms • Large ntuples (1-100 GB) will be inevitable -> processing time in ROOT can be long • Physicists tend to lose "inspiration" with time • PEAC extends the concept of a batch job to interactive analysis • Borrow CPU from batch processes for brief periods • Use PROOF to parallelize the rootuple access • Demo at SC2003; results (http://hepweb.ucsd.edu/fkw/sc2003/): analysis of B+ -> D0 pi (6 GB ntuples); the plot takes 10 minutes on a P4 2.6 GHz; on the INFN farm with 12 PROOF slaves: 1st pass 39 s, 2nd pass 22 s (ntuples cached)
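A quick check of the speedup implied by these numbers (the single-CPU and farm machines differ, so any gain beyond the 12x expected from the slave count presumably comes from caching and faster hardware, and the figure is only indicative):

```python
# Effective speedup implied by the PEAC demo numbers quoted above.
serial_seconds = 10 * 60   # ~10 minutes on a single P4 2.6 GHz
first_pass = 39            # 12 PROOF slaves, ntuples read from disk
second_pass = 22           # ntuples already cached

print(f"1st pass: {serial_seconds / first_pass:.0f}x faster with 12 slaves")
print(f"2nd pass: {serial_seconds / second_pass:.0f}x faster (cache helps beyond CPU count)")
```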

  16. Conclusions • CDF is building a computing grid for MC and analysis • Three years before LHC! • Fundamental in order to do physics analysis (and to keep up with the data we will, hopefully, collect) • It would be difficult to do analysis and build the dCAFs without the support of the local INFN computing groups.

  17. Acknowledgements • S. Belforte, M. Neubauer, M. Menguzzato, A. Fella, I. Sfiligoi, F. Wurthwein for the material provided

  18. BackUP

  19. CDF rates, or why the data to analyze do not scale with L • 1. Luminosity changes by a factor of 3 in a 16-hour run (from 4.5E31 to 1.5E31) • 2. Triggers at Level 1 are automatically prescaled • 3. Rate to tape stays in [50, 70] Hz at all times • (Plots: Level 1, Level 2 and Level 3 rates)
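The automatic Level 1 prescaling can be illustrated with a toy calculation: as the luminosity falls from 4.5E31 to 1.5E31 during a run, the prescale is relaxed so that the accepted rate stays roughly constant. The trigger cross-section used here is a made-up placeholder, not a real CDF trigger:

```python
# Toy illustration of dynamic prescaling: keep the accepted rate roughly
# constant while the luminosity drops by a factor of 3 over a 16 h run.
# The trigger cross-section below is a made-up placeholder value.
TRIGGER_XSEC_CM2 = 4.0e-30   # hypothetical Level 1 trigger cross-section
TARGET_RATE_HZ = 60.0        # aim for the [50, 70] Hz band quoted above

for lumi in (4.5e31, 3.0e31, 1.5e31):        # cm^-2 s^-1, start -> end of run
    raw_rate = lumi * TRIGGER_XSEC_CM2       # trigger rate before prescaling
    prescale = max(1, round(raw_rate / TARGET_RATE_HZ))
    print(f"L = {lumi:.1e}: raw {raw_rate:5.1f} Hz, "
          f"prescale {prescale}, accepted {raw_rate / prescale:4.1f} Hz")
```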

  20. Mosix at Padova • 1 quad-processor node, 4x Xeon PIII 700 MHz, with 2.5 TB of disk exported via NFS (master, accepts logins) • 2 dual-processor nodes, 2x Xeon P4 2.4 GHz (400 GB each) (slaves, do not accept logins) • Jobs migrate from the master to the slaves: • A factor of 3 faster • Convenient for the user thanks to total transparency • Problems: • gmake works only in non-OpenMosix mode • OpenAFS does not compile (an important and documented problem); it was solved in old OpenMosix releases but has come back • Possible solutions: • Upgrade! • Recompile OpenAFS with NFS enabled and mount AFS on a machine outside the cluster • A (daily) mirror of the AFS tree

  21. The landscape • DAQ data-logging upgrade • More data = more physics • Approved by FNAL's Physics Advisory Committee and Director • Computing needs grow, but the DOE/FNAL-CD budget is flat • CDF proposal: do 50% of the analysis work offsite • CDF-GRID: • We have a plan on how to do it • We have most tools in use already • We are working on the missing ones (ready by the end of the year) • Our proposal: doing 15% of the analysis work in Italy is possible!

  22. Monitoring the CAFs: http://cdfcaf.fnal.gov

  23. Monitoring tools developed outside FNAL are also used: CNAF monitoring, Ganglia

  24. Tevatron: Luminosity • Integrated luminosity is a key ingredient for Tevatron Run II success • The analysis presented here is based on a different integrated-luminosity period (72 pb-1) • Record peak luminosity (05/02/2004): 6.1 x 10^31 cm-2 s-1 • CDF data-taking efficiency is >85% • The silicon detector was integrated in most of the runs • CDF and DØ are collecting 1 pb-1/day
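As a sanity check, the record peak luminosity can be turned into an upper bound on the daily integrated luminosity (toy arithmetic, using only the numbers on this slide):

```python
# Upper bound on daily integrated luminosity at the record peak value.
PEAK_LUMI = 6.1e31        # cm^-2 s^-1, record of 05/02/2004
SECONDS_PER_DAY = 86400
CM2_PER_INV_PB = 1e36     # 1 pb^-1 corresponds to 1e36 cm^-2

max_pb_per_day = PEAK_LUMI * SECONDS_PER_DAY / CM2_PER_INV_PB
print(f"{max_pb_per_day:.1f} pb^-1/day if the peak were sustained all day")
# The ~1 pb^-1/day actually collected reflects the luminosity decay during a
# store, the time between stores, and the >85% data-taking efficiency above.
```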

  25. CDF Tevatron Run II luminosity projections • A proton-antiproton collider means: • A larger number of physics objects • Events are bigger (storage, I/O) • Reconstruction and analysis need more CPU power • So far (May 04) ~240 pb-1 on tape • Event size ~250 kB (50 kB if compressed) • Dataset sizes (just two datasets shown; many others exist, including control data samples, and each is duplicated for different reconstruction versions): Bhad(SVT): 28(5) TB, 140M events; HighPt Ele: 2 TB, 15M events; yield 700 events/fb-1 • Run 2 reconstruction: 20 MEv/day, ~200 Hz, 2 GHz-sec per event • (Plots: typical bbar events; Tevatron Luminosity Plan)

  26. Production Farm • (Plots: events processed per day; total number of events processed)

  27. Analysis Farms: USA vs. Italy • FNAL (total 2004, including test systems, spares etc.): 500 duals, 184 TB of disk, 30% non-FNAL-owned • FNAL (INFN 2003): 162 duals, 34 TB • CNAF-CDF (current): 54 duals, 8.5 TB

  28. Transverse Mass (plot)
