This workshop focuses on the deployment and operations of the ATLAS database, including updates on applications, geometry, COOL, TAG, and muon databases. It also discusses plans for distributed deployment and operations, documentation and user support, and data access and file transfer.
ATLAS Database Deployment and Operations Requirements and Plans WLCG 3D Project Workshop CERN, Geneva, Switzerland September 13-14, 2006 Alexandre Vaniachine (Argonne)
Outline • ATLAS Database Deployment and Operations activity • ATLAS Database Applications Requirements Update • Geometry DB • COOL DB • TAG DB • Muon DB • ATLAS Plans • Conclusions Alexandre Vaniachine
Database Deployment & Operations: Activity • Since the previous 3D Workshop, a high-profile “Database Deployment & Operations” activity has been defined in ATLAS Computing • The activity consists of the development and deployment (in collaboration with the WLCG 3D project) of the tools that allow the worldwide distribution and installation of databases and related datasets, as well as the actual operation of this system on the ATLAS multi-grid infrastructure http://uimon.cern.ch/twiki/bin/view/Atlas/DatabaseOperations Alexandre Vaniachine
Database Deployment & Operations: Domains The activity is organized into four domains: • Distributed Deployment - Grigori Rybkine (U.K.) • LCG Deployment: Mireia Dosil (Spain) and Suijian Zhou (Taiwan) • OSG Deployment: Yuri Smirnov (U.S.) • NorduGrid Deployment: Frederik Orellana (Denmark) • Legacy Sites (batch): John Idarraga (Canada) • Distributed Operations (ATLAS subset of WLCG 3D operations) • Tier0/1 operations: Gancho Dimitrov, Florbela Viegas (CERN) • Tier1/2 operations: Stefan Stonjek (U.K.) • Distributed calibration centers: Manuela Cirilli (U.S.) • Development - Jerome Fulachier (France) • Monitoring, FroNTier/Squid, Dynamic Deployment,… • Documentation & User Support - vacant • Data access, File Transfer (by users), Best Practices, ... • Further information is at http://uimon.cern.ch/twiki/bin/view/Atlas/DatabaseOperations Alexandre Vaniachine
ATLAS Software & Computing Workshop • The first Database Deployment and Operations session was held during the ATLAS Software Workshop this week http://indico.cern.ch/conferenceDisplay.py?confId=a057208#2006-09-12 • The session was quite full • Which is a good sign for the newly defined activity Alexandre Vaniachine
Database Requirements Update • During the session the requirements updates for all four major ATLAS database applications were collected Alexandre Vaniachine
Centralized Database Applications • There are many ATLAS database applications centralized at CERN, such as the production system database or online applications, e.g. • online COOL • PVSS archive and config • histogram archiving • subdetector databases • Since this is a distributed databases workshop, where centralized databases are less relevant, their requirements are not covered in this talk Alexandre Vaniachine
Geometry DB Data size • ATLAS geometry description is improving, more realism is being added – that results in new sets of primary numbers in the Geometry Database • So the Geometry DB size is growing, but NOT very rapidly! • Some numbers for comparison • 11.0.4X 14 ATLAS geometry tags, SQLite replica size ~10 MB • 12.0.3 30 ATLAS geometry tags, SQLite replica size ~18 MB (~4 MB compressed) Alexandre Vaniachine Slide by V. Tsulaia
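Since the SQLite replica is an ordinary file, its size and tag count can be checked locally. The sketch below is illustrative only: the replica file name and the tag table name are assumptions, not the actual Geometry DB schema.

```python
# Minimal sketch: check the size and tag count of a local Geometry DB
# SQLite replica. The file name and the tag table name are assumptions.
import os
import sqlite3

replica = "geomDB_sqlite"  # assumed local replica file name

print(f"Replica size: {os.path.getsize(replica) / 1e6:.1f} MB")

conn = sqlite3.connect(replica)
try:
    # Assumed tag table name; the real Geometry DB schema may differ.
    (n_tags,) = conn.execute("SELECT COUNT(*) FROM HVS_TAG2NODE").fetchone()
    print(f"Tag/node association rows: {n_tags}")
finally:
    conn.close()
```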
Geometry DB Data Access • Access from Athena applications – RDBAccessSvc • Stable interface • The implementation is changing following the development of underlying libraries (CORAL) • Significant modification in DB connection configuration for 12.X.X, due to migration to the CORAL Connection Service Alexandre Vaniachine Slide by V. Tsulaia
Geometry DB Data Replication • The master copy is located at the ATLAS RAC, CERN Oracle Service. The data is replicated using various RDBMS technologies for worldwide usage • Oracle to SQLite. The main replication strategy for the moment. • Independent distribution of SQLite replicas < 12.0.0 (pacman) • SQLite replica is part of the DB Release >= 12.0.0 • Oracle to MySQL. Was used actively up to 12.0.0. Currently performed ‘on demand’ only • Oracle to Oracle provides the possibility of direct replication to Tier-1 Oracle servers Alexandre Vaniachine Slide by V. Tsulaia
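In production the Oracle-to-SQLite replication is done with CORAL-based tools; the sketch below only illustrates the idea of a table-by-table copy, assuming the cx_Oracle client is available. The connection string and table name are placeholders.

```python
# Simplified illustration of an Oracle-to-SQLite table copy.
# The production replication uses CORAL-based tools; the connection
# parameters and the table name here are placeholders.
import sqlite3
import cx_Oracle  # third-party Oracle client, assumed to be installed

TABLE = "HVS_NODE"  # example table name (assumption)

ora = cx_Oracle.connect("reader/password@atlas_rac")  # placeholder DSN
cur = ora.cursor()
cur.execute(f"SELECT * FROM {TABLE}")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()

lite = sqlite3.connect("geometry_replica.db")
col_defs = ", ".join(f'"{c}"' for c in columns)
placeholders = ", ".join("?" for _ in columns)
lite.execute(f"CREATE TABLE IF NOT EXISTS {TABLE} ({col_defs})")
lite.executemany(f"INSERT INTO {TABLE} VALUES ({placeholders})", rows)
lite.commit()
lite.close()
ora.close()
```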
Geometry DB Outlook • We do not expect rapid growth of the Geometry DB size in the near future • One significant new feature - the migration of ID dictionaries to the Geometry DB • Development of selective replication tools (CORAL-based) Alexandre Vaniachine Slide by V. Tsulaia
Online/Offline Conditions Database Servers [Diagram: calibration updates enter the online Oracle DB at the ATLAS pit (ATCN network, online/PVSS/HLT farm) and the offline master CondDB in the computer centre; a gateway connects the ATCN and CERN public networks; a dedicated 10 Gbit link carries the Tier-0 reconstruction replication to the Tier-0 farm; Oracle Streams replication feeds Tier-1 replicas in the outside world] • Database server architecture split into online and offline Oracle servers • Online server on ATCN network (though in computer centre), reserved for online • Automatic replication to offline ‘master’ conditions database server • Calibration updates go to online if needed for online running, to offline otherwise • Further replication to Tier-1 Oracle servers and beyond Alexandre Vaniachine Slide by R. Hawkings
Conditions data sources • From online system (written to online Oracle server) • Configuration data for each run (written at run start) • DCS (detector control system) data from PVSS system • PVSS Oracle archive and dedicated PVSS Oracle to COOL transfer processes • … written ~continuously (bulk updates every few minutes) in and out of data-taking • Monitoring information from TDAQ and detector monitoring systems • Updated calibration constants derived and used online (e.g. dedicated runs) • DCS will likely dominate pure relational data: rough estimate O(100s GB/year) • Large-volume calibration data stored in POOL files (using DDM), referenced in COOL • From prompt calibration processing (written to offline Oracle server) • Calibration constants derived during 24 hours between end of fill and first processing of bulk physics sample • From offline calibration processing (at Tier-0 and external sites, esp Tier-2) • Updated calibration constants for use in subsequent reprocessing (at Tier-1s) • Updated information on data quality (used in lumiblock definition) • Some offline conditions will have to be fed back to online system (via gateway) Alexandre Vaniachine Slide by R. Hawkings
Uncertainties in DCS Data Volumes • The raw PVSS data flow (without ‘zero suppression’) could reach 6 TB of Oracle storage per year • PVSS has enough tools to develop the ‘zero suppression’ • It is the responsibility of ATLAS subsystems to implement the PVSS raw data volume reduction • Also, not all PVSS conditions data have to be replicated to Tier-1s • E.g. the PVSS Oracle archive (which will not go outside of CERN, maybe not even to the offline server) • Only a part of the DCS data goes to the COOL DB • this is the part that has to be replicated to Tier-1 sites • this latter transfer should certainly be significantly less than 6 TB per year Alexandre Vaniachine
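PVSS itself provides the tools for this kind of reduction; the sketch below is only a schematic illustration of the ‘zero suppression’ idea, with an assumed deadband value and data layout.

```python
# Schematic illustration of DCS 'zero suppression': keep a new reading
# only if it differs from the last stored value by more than a deadband.
# The deadband value and the (timestamp, value) layout are assumptions.
def zero_suppress(readings, deadband):
    """readings: iterable of (timestamp, value); returns the reduced list."""
    kept = []
    last_value = None
    for timestamp, value in readings:
        if last_value is None or abs(value - last_value) > deadband:
            kept.append((timestamp, value))
            last_value = value
    return kept

# Example: a slowly fluctuating HV channel sampled every minute.
samples = [(t, 1500.0 + 0.01 * (t % 3)) for t in range(0, 600, 60)]
print(len(samples), "->", len(zero_suppress(samples, deadband=0.5)))
```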
Conditions data replication to Tier-1s • Role of Tier-1s according to the ATLAS computing model • Long term access to and archiving of fraction of RAW data • Reprocessing of RAW data with new reconstruction and calibration constants • Collaboration-wide access to resulting processed data (ESD, AOD, TAG, …) • Managed production of derived datasets for physics groups • Some calibration processing (especially that requiring RAW data) • Host simulated data processed at associated Tier-2s and others • Need enough conditions data to do full reprocessing of RAW data • All calibration data (at least for non-prompt reconstruction passes) • COOL DCS data needed in reconstruction (e.g. module HVs, perhaps temps.) • Not yet clear what fraction of total DCS data really needed in reconstruction • Should not need all online monitoring and configuration data • Tier-1 conditions data using Oracle, replicated from Tier-0 by Oracle Streams • In principle straightforward to ‘replicate everything’ - not COOL-API-specific • How to apply selectivity? E.g. don’t need all COOL folders for Tier-1 replication • Can do Streams replication configuration at table level, but is extra complication worthwhile - are we close to networking/Tier-1 storage limits? • Table/COOL folder creation would have to be managed more carefully Alexandre Vaniachine Slide by R. Hawkings
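One way to express the folder-level selectivity discussed above is a simple inclusion list of COOL folder prefixes; the folder names below are invented for illustration and do not reflect the actual ATLAS folder catalogue or the Streams configuration mechanism.

```python
# Illustration only: decide which COOL folders are included in Tier-1
# replication by folder-path prefix. Folder names are examples.
REPLICATE_PREFIXES = ["/MDT/CALIB", "/DCS/HV", "/GLOBAL/DETSTATUS"]

def replicate_to_tier1(folder_path):
    """Return True if this folder should be replicated to Tier-1s."""
    return any(folder_path.startswith(p) for p in REPLICATE_PREFIXES)

for folder in ["/MDT/CALIB/T0", "/TDAQ/MONITORING/RATES", "/DCS/HV/MDT"]:
    print(folder, "->", replicate_to_tier1(folder))
```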
Conditions data at Tier-2s and beyond • Role of Tier-2s according to the ATLAS computing model • Simulation production (sending data to Tier-1s) • User analysis of AOD and small samples of ESD data • Calibration processing of AOD and ESD data • Different sites will have different roles depending on size and user community • Database replication requirements • Simulation typically done with long release cycles, steady production • Static replication to SQLite files included in DBRelease (as at present) may be enough • User analysis of AOD and ESD, and calibration processing • Likely to need some access to subsets of latest conditions data - dynamic replication • Possible solutions for dynamic replication of COOL data • Replication through COOL-API to MySQL servers, using a ‘synchronisation’ mechanism - some work has been done, not usable before COOL 1.4 • Requires ‘active’ MySQL servers and support at participating Tier-2s • Frontier - web-caching of database query results in a proxy server • Requires only proxy servers at Tier-2s - easier to configure and manage? • Problem of ‘stale data’ - needs careful version control and flushing of conditions data Alexandre Vaniachine Slide by R. Hawkings
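The ‘stale data’ concern with proxy caching can be pictured with a toy result cache that keys entries by both the query and a conditions-version tag and expires them after a time-to-live. This is not the actual Frontier/Squid mechanism, just an illustration of why version control and flushing matter.

```python
# Toy illustration of the 'stale data' issue with result caching: include
# a conditions-version tag in the cache key so that a new calibration
# version bypasses old cached entries. Names and TTL are assumptions.
import time

class CachedQuery:
    def __init__(self, ttl_seconds=600):
        self.ttl = ttl_seconds
        self.cache = {}  # (query, version) -> (expiry_time, result)

    def run(self, query, version, execute):
        key = (query, version)
        now = time.time()
        if key in self.cache and self.cache[key][0] > now:
            return self.cache[key][1]      # served from cache
        result = execute(query)            # go to the database
        self.cache[key] = (now + self.ttl, result)
        return result

cq = CachedQuery()
print(cq.run("SELECT ...", version="COOL-TAG-001", execute=lambda q: "payload-v1"))
```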
Conditions data requirements in calibration/alignment challenge • Testing in the calibration / alignment challenge • Initial phase - simulation, starting now • A few sets of conditions data corresponding to different detector configurations (ideal, as-built) included in the database release • Will be used for large-scale simulation / reconstruction with ‘static’ conditions data • Second phase: real-time calibration exercise • Want to rapidly propagate new conditions data around collaboration • E.g. for re-reconstruction of samples at Tier-1s, using new calibration • Need at least Oracle Streams replicas at Tier-1s (in ‘production’ mode) • Would be useful to also try replication solutions for Tier-2 and beyond • Depends on progress with tools (dynamic COOL replication and Frontier) in the next few months… Alexandre Vaniachine Slide by R. Hawkings
TAG DB Users and Interfaces • Production of TAGs • Tier 0/1 Operations (80%) (POOL) • Physics Groups (15%) (POOL) • Users (5%) (POOL) • Data Lookup via TAGs • Physics Groups (Tag Navigator Tool) • Users (Tag Navigator Tool) • Data Mining via TAGs • Users (variable) For other categories, please come to Data Management II Session Alexandre Vaniachine, ATLAS Software Week Slide by J. Cranshaw
‘Tier 0’ TAG Production Stages • Relates to any official production of TAG files, including those from reprocessing at Tier-1s [Diagram: Tier-0 processing produces DDM datasets and POOL TAG files (T0MS), which go to (distributed) mass storage and are loaded via LoadDB into A: a loading table (partitioned) and then B: an archive table (partitioned)] Alexandre Vaniachine, ATLAS Software Week Slide by J. Cranshaw
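A load-then-archive pattern like the one in the diagram can be sketched in a few lines; the table and column names below are invented for illustration and are not the ATLAS TAG schema.

```python
# Sketch of the load-then-archive pattern: new TAG rows are bulk-loaded
# into a loading table and periodically moved into the archive table.
# Table and column names are illustrative assumptions.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tag_loading (run INTEGER, event INTEGER, pool_ref TEXT)")
db.execute("CREATE TABLE tag_archive (run INTEGER, event INTEGER, pool_ref TEXT)")
db.execute("INSERT INTO tag_loading VALUES (1234, 1, 'ref-0001')")

# Move validated rows for one run from the loading table to the archive.
db.execute("INSERT INTO tag_archive SELECT * FROM tag_loading WHERE run = 1234")
db.execute("DELETE FROM tag_loading WHERE run = 1234")
db.commit()
print(db.execute("SELECT COUNT(*) FROM tag_archive").fetchone())
```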
TAG DB Distribution to Tier1s and Tier2s • Various TAG DB distribution models being considered Alexandre Vaniachine
Sizes and Rates • Nominal DAQ rate is 200 Hz • TAG size is 1 kB/ev. • With the expected duty cycle this gives • 2 TB/yr of TAG data from beam operations • 2 TB/yr of TAG data from reprocessing • (reduced by 1.5 TB/yr if replacement policy implemented?) • 0.5 TB/yr of derived TAG data • ×3 (files, tables, indexes) TOTAL: (2+(2-1.5)+0.5)×3 = 9 TB/yr • In the model where the relational TAG DB is queried during job definition, the non-CERN relational TAG DB is for load balancing/availability. Alexandre Vaniachine, ATLAS Software Week Slide by J. Cranshaw
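The arithmetic behind the 9 TB/yr figure can be reproduced directly; the ~10^7 live seconds of data-taking per year used below is an assumption chosen to match the 2 TB/yr beam-operations number quoted above.

```python
# Reproduce the TAG volume estimate. The ~1e7 live seconds per year is an
# assumption consistent with the 2 TB/yr beam-operations figure above.
daq_rate_hz = 200          # nominal DAQ rate
tag_size_kb = 1.0          # TAG size per event
live_seconds = 1e7         # assumed data-taking seconds per year

beam_tb = daq_rate_hz * live_seconds * tag_size_kb * 1e3 / 1e12  # 2 TB/yr
reprocessing_tb = beam_tb - 1.5      # with the replacement policy applied
derived_tb = 0.5
oracle_overhead = 3                  # files, tables, indexes

total_tb = (beam_tb + reprocessing_tb + derived_tb) * oracle_overhead
print(f"{total_tb:.0f} TB/yr")       # ~9 TB/yr
```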
Query Model • Class 1: Queries which deliver summary information about the selection: count, average, sum, etc. • Essentially simple data mining. • Class 2: Queries which deliver the reference to the data for a selection. • Assume this is the most common. • Class 3: Queries which return all collection information, metadata+reference. • We would like to minimize these. Alexandre Vaniachine, ATLAS Software Week Slide by J. Cranshaw
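One example query per class, run against a toy in-memory TAG table; the table name, columns and selection cut are illustrative assumptions, not the ATLAS TAG schema.

```python
# One example query per class against a toy TAG table (names are assumptions).
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tags (run INTEGER, event INTEGER, nmuon INTEGER, "
           "met REAL, pool_ref TEXT)")
db.execute("INSERT INTO tags VALUES (1234, 1, 2, 55.0, 'ref-0001')")

# Class 1: summary information about a selection (simple data mining).
print(db.execute("SELECT COUNT(*), AVG(met) FROM tags WHERE nmuon >= 2").fetchone())

# Class 2: references to the event data for the selection (most common).
print(db.execute("SELECT pool_ref FROM tags WHERE nmuon >= 2").fetchall())

# Class 3: full collection information, metadata plus reference (minimize these).
print(db.execute("SELECT * FROM tags WHERE nmuon >= 2").fetchall())
```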
General Plan: Muon MDT Calibration • A stream of raw MDT event data is sent from CERN to three calibration centers: Michigan, Rome, Munich • An event block is sent approximately once per hour (approx. 100 GB/day) • Need real-time conditions data within < 1 hour • The centers process geographical regions with some overlap; any center could take over from another • Calibration output goes into the COOL CondDB • Would like a complete calibration once per day: t0 per tube (350,000 tubes), Rt relation per region (10,000 regions) • Monitoring and storage of diagnostic information will be done at the same time as calibration Alexandre Vaniachine Slide by J. Rothberg
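As a rough picture of the per-tube t0 determination, the sketch below takes t0 as the rising edge of a drift-time spectrum; the real MDT calibration uses proper fits and is far more involved, so this is purely illustrative.

```python
# Very simplified illustration of a per-tube t0 estimate: take t0 as the
# first drift-time bin whose content exceeds a fraction of the spectrum
# maximum. The real MDT calibration uses a fit, not this threshold rule.
def estimate_t0(histogram, bin_width_ns=1.0, fraction=0.5):
    """histogram: list of bin contents of the drift-time spectrum."""
    peak = max(histogram)
    for i, content in enumerate(histogram):
        if content > fraction * peak:
            return i * bin_width_ns
    return None

# Toy spectrum: flat noise, then the drift-time distribution turning on.
toy = [2] * 100 + [50, 120, 200, 180, 160] + [100] * 50
print("t0 estimate:", estimate_t0(toy), "ns")
```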
Calibration Process [Diagram, at each calibration center: ~100 calibration jobs process the events and produce statistics, track segments (charge, time, position, flags) and per-tube histograms (time, ADC); outputs are written via CORAL into a local calibration DB and a histograms DB; a local copy of the offline replica CondDB provides T and B field maps and run conditions; the center iterates and generates the Rt table (per region) and tube flags, which are sent to CERN together with CondDB diagnostics] Alexandre Vaniachine Slide by J. Rothberg
Merge and Create CondDB [Diagram: the three calibration centers send their local calibration DBs (~40 MB/day) to CERN, where they are validated and merged via CORAL into a merged calibration DB, from which the COOL CondDB (calibrations, tube flags) is generated for use by ATHENA] • Merge and generate the COOL DB once per day • Monitoring info sent once per hour Alexandre Vaniachine Slide by J. Rothberg
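The daily merge step can be pictured as combining per-region constants from the three centers, preferring the center assigned to each region; the region-ownership map and record layout below are illustrative assumptions.

```python
# Sketch of the daily merge: combine per-region constants from the three
# calibration centres, preferring the centre assigned to each region.
# The ownership map and record layout are illustrative assumptions.
PREFERRED_CENTRE = {"barrel_A": "Michigan", "barrel_C": "Rome", "endcap": "Munich"}

def merge_calibrations(per_centre):
    """per_centre: {centre: {region: constants}} -> merged {region: constants}"""
    merged = {}
    for centre, constants in per_centre.items():
        for region, value in constants.items():
            owner = PREFERRED_CENTRE.get(region)
            # Take the owning centre's result; fall back to any available one.
            if owner == centre or region not in merged:
                merged[region] = value
    return merged

example = {"Michigan": {"barrel_A": {"t0": 512.3}},
           "Rome":     {"barrel_A": {"t0": 513.0}, "barrel_C": {"t0": 498.7}}}
print(merge_calibrations(example))
```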
Recent Progress at the Centers • Michigan (Shawn McKee, with help from Gancho, Florbela, and Eva) set up an Oracle DB and a CERN replica (Streams) • Rome (1,3) installed a demo Oracle 10g and a preliminary set of tables (Domizia Orestano, Paolo Bagnaia, Paola Celio, Elisabetta Vilucchi) • Access to tables using CORAL (Domizia) • Near future: • Calib tables read by CORAL, write COOL DB (Rome, CERN) • Fill tables with test beam data; run validation codes (Rome) • Test replicas and data transfer (UltraLight) (Michigan) Alexandre Vaniachine Slide by J. Rothberg
Plans • ATLAS Milestones relevant to the 3D operations Alexandre Vaniachine
ATLAS Calibration Data Challenge - Details • Requirements from the ATLAS Calibration Data Challenge Coordinator – Richard Hawkings Alexandre Vaniachine
Conclusions Requirements • The TAG DB is the largest of the four ATLAS distributed database applications (Geometry DB, COOL DB, TAG DB and Muon DB) • At the nominal LHC rate it is estimated to require about 10 TB of Oracle storage per year per Tier-1 site • The TAG DB is also very demanding in throughput requirements for Oracle Streams replication • The next largest ATLAS database application is the COOL DB, used to store the calibrations and conditions data • Current estimate – 100s of GB • Main uncertainty – the DCS data “zero suppression” factor • The Muon DB application in the ATLAS Distributed Muon Calibration Data Centers provides the most demanding requirements on the latency and high availability of Oracle Streams replication Plans • The start of 3D operations in October fits well with the ATLAS Calibration Data Challenge requirement for COOL DB replication to Tier-1s Alexandre Vaniachine