
User Forum 2 Data Management System


Presentation Transcript


  1. User Forum 2 Data Management System “Tell me and I forget. Show me and I remember. Involve me and I understand.” Chinese proverb

  2. Still to come…
  • LHCb Computing Model (simplified)
  • Requirements for Data Management System
  • Introduction to DIRAC
  • DIRAC Data Management System (DMS)
  • Core DIRAC DM Components
  • Bulk Transfer Framework
  • Data Driven Automated Transfers
  • Reliable Data Management
  • Overview of EGEE resources used

  3. LHCb Computing Model (Simplified)
  [Diagram: RAW Physics File → Reconstruction Job → Reconstructed RAW File (rDST) → Stripping Job → Stripped File (DST); RAW Replication to the Tier1s; DST Broadcast]

  4. DM Requirements in Numbers
  • 2 GB RAW file every ~30 s
  • Upload to Castor at 40 MB/s on a 1 Gb/s dedicated link
  • Each RAW file replicated from Castor to 1 of LHCb’s 6 Tier1s over shared 10 Gb/s links; aggregated 40 MB/s
  • Each stripped DST produced is replicated to all Tier1s using the dedicated network
  • Each Tier1 (on average) ~11 MB/s in AND out

  5. Introduction to DIRAC
  • DIRAC is LHCb’s Grid Workload and Data Management System
  • Initial incarnation as the LHCb production system
  • Since evolved into a generic Community Grid Solution
  • Either a stand-alone environment or
  • a Community Overlay Grid System (COGS)
  • Architecture based on Services and Agents
  • Implementing a Service Oriented Architecture
  • VO specific utilities can be tailored as required
  • Demonstrated 10k concurrently running jobs
  • Management of O(10M) data files and replicas
  • See Stuart Paterson’s talk for more on the WMS

  6. DIRAC Core Data Management System
  • The main components are:
  • Replica Manager
  • File Catalogues
  • Storage Element and access plug-ins (a minimal code sketch of the wiring follows)
  [Diagram: Core DM Clients (User Interface, WMS, DM Agents) → ReplicaManager → File Catalogues (FileCatalogueA, FileCatalogueB, LCG File Catalogue) and StorageElement → storage plug-ins (StoragePlugInX, RFIOStorage, SRMStorage) → SE Service → Physical Storage]
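To make the component diagram easier to read, here is a minimal Python sketch of how the three pieces fit together. All class and method names are invented for this illustration and are not the real DIRAC interfaces; only the roles (Replica Manager, File Catalogues behind a common interface, Storage Element with a protocol plug-in) come from the slide.

```python
# Illustrative sketch only: names are invented, not the actual DIRAC API.

class FileCatalogue:
    """Common interface that every catalogue client implements."""
    def addReplica(self, lfn, se, pfn):
        raise NotImplementedError

class LcgFileCatalogue(FileCatalogue):
    def addReplica(self, lfn, se, pfn):
        print(f"LFC: register replica of {lfn} at {se} ({pfn})")

class SRMStorage:
    """One access plug-in; others (gridftp, rfio, ...) would look the same."""
    def put(self, localPath, pfn):
        print(f"srm plug-in: upload {localPath} -> {pfn}")
        return True

class StorageElement:
    """Abstraction of a storage facility; delegates to its protocol plug-in."""
    def __init__(self, name, plugin):
        self.name, self.plugin = name, plugin
    def putFile(self, localPath, pfn):
        return self.plugin.put(localPath, pfn)

class ReplicaManager:
    """Provides the logic: store the file, then register it in every catalogue."""
    def __init__(self, catalogues):
        self.catalogues = catalogues
    def putAndRegister(self, lfn, localPath, se):
        pfn = f"srm://{se.name}{lfn}"              # invented PFN convention
        if not se.putFile(localPath, pfn):
            return {"OK": False, "Message": "upload failed"}
        for catalogue in self.catalogues:          # catalogues are interchangeable
            catalogue.addReplica(lfn, se.name, pfn)
        return {"OK": True}

rm = ReplicaManager([LcgFileCatalogue()])
rm.putAndRegister("/lhcb/data/run123/raw.0001", "/tmp/raw.0001",
                  StorageElement("CERN-RAW", SRMStorage()))
```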

  7. DM Core Components
  • Replica Manager provides the logic for DM operations
  • Interaction with the StorageElement
  • File upload/download/removal to/from the Grid; file replication across SEs
  • Interaction with the File Catalogue API
  • File/replica registration/removal; obtain replica information
  • Logging of operations returned to the client
  • StorageElement is an abstraction of a storage facility
  • Access provided by plug-in modules for access protocols
  • Current plug-ins: srm, gridftp, bbftp, sftp, http (sketched after this slide)
  • File Catalogue API
  • All file catalogues offer the same interface
  • Can be used interchangeably
  • LCG File Catalogue (LFC), ProcessingDB… VO specific resources easily integrated
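The access plug-ins are the key abstraction here: a Storage Element is configured with one plug-in per protocol and tries them in a preferred order, so supporting a new protocol only means writing a new plug-in. The sketch below is a hypothetical illustration; only the protocol names come from the slide.

```python
# Hypothetical sketch of protocol plug-ins behind a StorageElement;
# only the protocol names (srm, gridftp, ...) come from the slide.

class SRMPlugin:
    protocol = "srm"
    def getFile(self, pfn, localPath):
        print(f"fetch via SRM: {pfn} -> {localPath}")
        return True

class GridFTPPlugin:
    protocol = "gridftp"
    def getFile(self, pfn, localPath):
        print(f"fetch via GridFTP: {pfn} -> {localPath}")
        return True

class StorageElement:
    """Tries the configured access protocols in order until one succeeds."""
    def __init__(self, plugins):
        self.plugins = plugins
    def getFile(self, pfnByProtocol, localPath):
        for plugin in self.plugins:
            pfn = pfnByProtocol.get(plugin.protocol)
            if pfn and plugin.getFile(pfn, localPath):
                return {"OK": True, "Protocol": plugin.protocol}
        return {"OK": False, "Message": "all access protocols failed"}

se = StorageElement([SRMPlugin(), GridFTPPlugin()])
se.getFile({"srm": "srm://se.example.org/lhcb/raw.0001"}, "/tmp/raw.0001")
```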

  8. Other Key Components
  • Data Management requests stored in the RequestDB
  • XML containing the parameters for the DM operation: operation type, LFN, etc.
  • Requests obtained and placed through the RequestDB Service
  • Transfer Agent polls the RequestDB Service for work (multi-threaded)
  • Contacts the Replica Manager to perform the DM operation
  • Full log of operations returned
  • Retries based on the logging info, until success (see the polling-loop sketch below)
  • Redundancy built-in
  [Diagram: Request Database (ToDo / Failed) ↔ RequestDB Svc ↔ Transfer Agent → Replica Manager]
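The retry behaviour described above can be pictured as a polling loop. The RequestDB client and Replica Manager methods used here are invented names for the sketch; the real agent is multi-threaded and keeps the full operation log with each request.

```python
import time

# Sketch of the Transfer Agent's main loop; method names are illustrative.

class TransferAgent:
    def __init__(self, requestDB, replicaManager, pollInterval=60):
        self.requestDB = requestDB          # client of the RequestDB Service
        self.replicaManager = replicaManager
        self.pollInterval = pollInterval

    def run(self):
        while True:
            request = self.requestDB.getWaitingRequest()   # next 'ToDo' request
            if not request:
                time.sleep(self.pollInterval)
                continue
            result = self.execute(request)
            if result["OK"]:
                self.requestDB.setRequestDone(request["id"])
            else:
                # a failed operation is logged and put back, so it is picked up
                # again on a later pass and retried until it succeeds
                self.requestDB.setRequestFailed(request["id"], result["Message"])

    def execute(self, request):
        if request["operation"] == "replicateAndRegister":
            return self.replicaManager.replicateAndRegister(
                request["lfn"], request["targetSE"])
        return {"OK": False, "Message": f"unknown operation {request['operation']}"}
```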

  9. RAW Upload to Castor
  [Diagram: the LHCb ONLINE SYSTEM (Online Storage, Online Run Database, Data Mover) places requests with the DIRAC instance at the Online Gateway (RequestDB Svc, Request Database, Transfer Agent, Replica Manager with RFIO plug-in, FC API), which copies RAW files to CERN-IT Castor with rfcp and registers them in the LFC and the ADTDB; legend distinguishes file movement from request movement]

  10. Bulk Data Transfers
  • gLite File Transfer Service (FTS)
  • Provides point-to-point reliable bulk transfers
  • Channel architecture: SURLs at SRM X to SURLs at SRM Y
  • Utilizing high throughput dedicated networks
  • Network resources pledged to WLCG: CERN–Tier1s, Tier1–Tier1 matrix
  • DIRAC DM System interfaced to FTS
  • Uses the FTS CLI to submit and monitor jobs (see the sketch after this slide)
  • DIRAC DM System handles the scheduling and placement of transfers, and prepares the source and target SURLs
  [Diagram: Transfer Agent → FTS Svc → SRM/G-U-C transfers across EGEE; Replica Manager]
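Below is a rough sketch of what "use the FTS CLI to submit and monitor jobs" could look like from Python. The service endpoint is a placeholder and the exact glite-transfer-submit/glite-transfer-status options are written from memory of the gLite tools, so treat them as assumptions rather than a reference.

```python
import subprocess
import tempfile

# Sketch of driving the gLite FTS through its command-line tools, as on the
# slide. The endpoint below is a placeholder and the CLI options are assumed;
# check the gLite FTS documentation for the authoritative flags.

FTS_SERVICE = "https://fts.example.org:8443/fts"   # hypothetical FTS endpoint

def submit_fts_job(source_target_pairs):
    """Write 'sourceSURL targetSURL' pairs to a bulk file and submit one job."""
    with tempfile.NamedTemporaryFile("w", suffix=".lst", delete=False) as bulk:
        for source_surl, target_surl in source_target_pairs:
            bulk.write(f"{source_surl} {target_surl}\n")
        bulk_file = bulk.name
    result = subprocess.run(
        ["glite-transfer-submit", "-s", FTS_SERVICE, "-f", bulk_file],
        capture_output=True, text=True, check=True)
    return result.stdout.strip()          # the FTS job identifier

def fts_job_status(job_id):
    result = subprocess.run(
        ["glite-transfer-status", "-s", FTS_SERVICE, job_id],
        capture_output=True, text=True, check=True)
    return result.stdout.strip()          # e.g. Submitted / Active / Done / Failed
```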

  11. Data Driven Production Management
  • DIRAC DM components developed to perform data driven management
  • AutoDataTransferDB (AdtDB) contains a pseudo file catalogue
  • Offers an API to manipulate catalogue entries
  • Based on ‘transformations’ contained in the DB
  • Transformations are defined for each DM operation to be performed
  • Define source and target SEs
  • File mask (based on the LFN namespace)
  • Number of files to be transferred in each job
  • Can select files with given properties and locations
  • Replication Agent works through the AdtDB API (sketched after this slide)
  • Checks active files in the AdtDB
  • Applies the mask based on file type
  • Checks the location of each file
  • Files which pass the mask and match the SourceSE are selected for the transformation
  • Once a threshold number of files is found, FTS jobs are created
  • ReplicationAgent logic generalised to support multiple transformation types
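A transformation can be thought of as a small record plus a selection rule. The sketch below mirrors the logic on the slide (mask on the LFN namespace, source SE match, group files until a threshold, then create FTS jobs); the classes and fields are invented for illustration and are not the AdtDB schema.

```python
import re

# Illustrative sketch of a transformation and the ReplicationAgent grouping
# logic; the data structures are invented, not the real AdtDB API.

class Transformation:
    def __init__(self, sourceSE, targetSEs, lfnMask, filesPerJob):
        self.sourceSE = sourceSE
        self.targetSEs = targetSEs
        self.lfnMask = re.compile(lfnMask)   # mask on the LFN namespace
        self.filesPerJob = filesPerJob       # threshold before a job is created

    def selectFiles(self, activeFiles):
        """activeFiles: (lfn, locationSE) pairs from the pseudo file catalogue."""
        return [lfn for lfn, se in activeFiles
                if se == self.sourceSE and self.lfnMask.search(lfn)]

def createTransferJobs(transformation, activeFiles):
    selected = transformation.selectFiles(activeFiles)
    jobs = []
    while len(selected) >= transformation.filesPerJob:
        batch, selected = (selected[:transformation.filesPerJob],
                           selected[transformation.filesPerJob:])
        for targetSE in transformation.targetSEs:
            jobs.append({"files": batch,
                         "sourceSE": transformation.sourceSE,
                         "targetSE": targetSE})
    return jobs   # each entry would become one FTS job

raw_export = Transformation("CERN-RAW", ["CNAF-RAW"], r"\.raw$", filesPerJob=2)
active = [("/lhcb/data/run1/file1.raw", "CERN-RAW"),
          ("/lhcb/data/run1/file2.raw", "CERN-RAW"),
          ("/lhcb/data/run1/file3.dst", "CERN-RAW")]
print(createTransferJobs(raw_export, active))
```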

  12. RAW Replication
  • When a file is uploaded to Castor it is registered in the AdtDB
  • This is the hook for data driven replication
  [Diagram: same upload chain as slide 9 — LHCb ONLINE SYSTEM and DIRAC @ Online Gateway copying RAW to CERN-IT with rfcp and registering in the LFC and ADTDB; file movement vs. request movement]

  13. RAW Replication II
  • After replication, the files are registered in the LFC and the ProcessingDB
  • The ProcessingDB drives the data driven reconstruction and stripping jobs
  [Diagram: DIRAC DM System (Replication Agent, AdtDB, Request Database, RequestDB Svc, Transfer Agent, Replica Manager) driving transfers on WLCG via the FTS Svc to the Tier1 SRMs (SRM/G-U-C) and registering in the LFC; request movement vs. file movement]

  14. Reliable Data Management
  • LHCb dedicated VO Box provided at the Tier1s
  • DIRAC instance installed: RequestDB service and TransferAgent
  • Provides a failover mechanism for file upload from the WN to the associated SE
  • If the upload fails, an alternative SE is chosen and a ‘move’ request is put to the VO box (sketched after this slide)
  • Also provided the initial mechanism for DST distribution
  • DST uploaded to the associated Tier1 SE
  • ‘Replication’ requests put to the VO boxes
  • Proven capable of 100 MB/s integrated across all Tier1s
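The failover mechanism on this slide can be sketched as follows, assuming a hypothetical putAndRegister call on the Replica Manager and an addRequest call on the VO box RequestDB client; the names are illustrative only.

```python
# Sketch of the upload failover described on the slide; method names are
# illustrative, not the real DIRAC calls.

def uploadWithFailover(replicaManager, voBoxRequestDB, lfn, localPath,
                       primarySE, failoverSEs):
    """Try the associated SE first; on failure, park the file on a failover SE
    and leave a 'move' request for the VO box Transfer Agent to replay."""
    result = replicaManager.putAndRegister(lfn, localPath, primarySE)
    if result["OK"]:
        return result
    for failoverSE in failoverSEs:
        result = replicaManager.putAndRegister(lfn, localPath, failoverSE)
        if result["OK"]:
            # the VO box agent retries this request until the file finally
            # reaches the SE it was originally meant for
            voBoxRequestDB.addRequest({"operation": "moveAndRegister",
                                       "lfn": lfn,
                                       "sourceSE": failoverSE,
                                       "targetSE": primarySE})
            return result
    return {"OK": False, "Message": f"could not store {lfn} on any SE"}
```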

  15. Use of Resources
  • During LHCb’s DC06, DIRAC’s DM System:
  • Stored 3.8M files at CERN + Tier1s
  • 292 TB of tape
  • 262 TB of disk
  • Plus registration in the LCG File Catalogue

  16. Summary
  • DIRAC core DMS is extensible, reliable, and redundant
  • VO specific resources are pluggable
  • 5 years of experience managing LHCb data
  • Data driven operations to meet the LHCb computing model
  • Initial upload of RAW physics files
  • Replication to Tier1s
  • Broadcast of DSTs
  • In the last year the DIRAC DMS handled 3.8M files/replicas
  • 292 TB of tape
  • 262 TB of disk

  17. Questions…?
