
Don Quijote

Don Quijote: Data Management for the ATLAS Automatic Production System. Miguel Branco, CERN ATC, miguel.branco@cern.ch. Overview: Don Quijote; New Focus; Functionalities; POOL; Architecture; Current Status; NorduGrid; US Grid 3(+); LCG-2; Integration with ATLAS prodsys; Future plans.


Presentation Transcript


  1. Don Quijote: Data Management for the ATLAS Automatic Production System. Miguel Branco, CERN ATC, miguel.branco@cern.ch

  2. Overview
     • Don Quijote
     • New Focus
     • Functionalities
     • POOL
     • Architecture
     • Current Status
     • NorduGrid
     • US Grid 3(+)
     • LCG-2
     • Integration with ATLAS prodsys
     • Future plans
     Don Quijote - Status & Plans

  3. Don Quijote
     • Data Management for the ATLAS Automatic Production System
     • Allow transparent registration and movement of replicas between all grid “flavors” used by ATLAS
       • US Grid
       • NorduGrid
       • LCG
       • (support for legacy systems might be introduced soon)
     • Avoid creating yet another catalog
       • which grid middleware wouldn't recognize (e.g. Resource Brokers)
       • use existing catalogs and data management tools
       • find common features between tools and catalogs
       • bridge them and provide a unified interface
     • Accessible as a service
       • lightweight clients

  4. Don Quijote: new focus
     • Provide a single tool to end-users to manage data files
     • Integrates all tools that users would have to know about into a single one. E.g.:
       • FCpublish, FCregister, … (POOL File Catalogs)
       • edg-rm, edg-rmc, edg-lrc, … (EDG)
       • globus-rls-cli, globus-url-copy, … (Globus)
       • ldapsearch, … (querying information system)
       • rfdir, rfcp, … (common use of Castor)
     • Acts as a POOL-aware Replica Manager
     • Eases security requirements for end-users
       • Temporarily!

  5. Functionalities
     • search | fullSearch | searchHosts ( lpn )
     • add[Restricted] ( lpn, url [, guid, fsize, md5sum ] )
     • addTemporary[Restricted] ( lpn, url, nrhours [, guid, fsize, md5sum ] )
     • keepUntil ( url, nrhours )
     • makePermanent ( url )
     • removeReplica ( url )
     • remove ( lpn )
     • rename ( old lpn, new lpn )
     • stageOut ( url )
     • getToDestination ( src SE, lpn, dest )
     • putToSE ( src turl, lpn, dest SE [, guid, md5sum] )
     (The slide groups these calls under two labels: Replica Catalogs Manipulation and File Movement.)
     LPN = Logical Collection Name + Logical File Name (unique)
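To make the catalog-manipulation calls above concrete, here is a minimal in-memory sketch in Python. It is not the DQ server or client: the class name `InMemoryReplicaCatalog`, the method names, and the example URLs are hypothetical, and only `search`, `add`, `removeReplica`, `remove` and `rename` are imitated, without GUIDs, checksums or lifetimes.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryReplicaCatalog:
    """Toy stand-in for a grid replica catalog; not the real DQ server."""

    # Maps each LPN to the list of replica URLs registered under it.
    replicas: dict = field(default_factory=dict)

    def add(self, lpn, url):
        """Register url as a replica of lpn (duplicates are ignored)."""
        urls = self.replicas.setdefault(lpn, [])
        if url not in urls:
            urls.append(url)

    def search(self, lpn):
        """Return a copy of the replica URLs known for lpn."""
        return list(self.replicas.get(lpn, []))

    def remove_replica(self, url):
        """Drop one replica URL; forget the LPN once no replicas remain."""
        for lpn in list(self.replicas):
            if url in self.replicas[lpn]:
                self.replicas[lpn].remove(url)
                if not self.replicas[lpn]:
                    del self.replicas[lpn]

    def remove(self, lpn):
        """Drop an LPN together with all its replicas."""
        self.replicas.pop(lpn, None)

    def rename(self, old_lpn, new_lpn):
        """Move all replicas from old_lpn to new_lpn; LPNs must stay unique."""
        if new_lpn in self.replicas:
            raise ValueError(f"LPN already exists: {new_lpn}")
        self.replicas[new_lpn] = self.replicas.pop(old_lpn)
```

The point of the sketch is the shape of the interface: replica-level calls take a URL, while file-level calls take an LPN, which is why the LPN has to be unique.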

  6. Functionalities - POOL
     • Integrates file movement with POOL XML File Catalogs
     • Uses DQ + POOL FC command line tools
       • Python scripts
     • Use-cases:
       • Get local copy of file and generate or update corresponding PoolFileCatalog.xml
         • (to provide input data and input POOL XML catalog for a job)
       • Copy and register a local copy of a file to a grid flavor given UUID in the local PoolFileCatalog.xml
         • (to register output data from a job)
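The first use-case, generating or updating a local XML catalog after fetching a file, could look roughly like the sketch below. The slides say the real scripts call the POOL FC command line tools; this example instead edits the XML directly with Python's standard library, and the element layout it writes is a simplified assumption rather than a validated POOL schema. The function name `add_file_entry` is hypothetical.

```python
import xml.etree.ElementTree as ET


def add_file_entry(catalog_xml, guid, pfn, lfn):
    """Return catalog XML text with a File entry for guid added or updated.

    The element layout (POOLFILECATALOG/File/physical/pfn and logical/lfn)
    is a simplified assumption, not a validated POOL schema.
    """
    if catalog_xml:
        root = ET.fromstring(catalog_xml)
    else:
        root = ET.Element("POOLFILECATALOG")
    # Replace any existing entry with the same GUID so updates are idempotent.
    for existing in root.findall("File"):
        if existing.get("ID") == guid:
            root.remove(existing)
    entry = ET.SubElement(root, "File", ID=guid)
    ET.SubElement(ET.SubElement(entry, "physical"), "pfn", name=pfn)
    ET.SubElement(ET.SubElement(entry, "logical"), "lfn", name=lfn)
    return ET.tostring(root, encoding="unicode")
```

Passing an empty string creates a new catalog; passing existing catalog text updates it in place of the old entry for the same GUID.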

  7. Architecture
     • Python Client
       • C++ client library
       • Configuration file indicating endpoint of each server
     • Servers
       • Per grid-flavor
       • GSI and insecure
       • Configuration file
     User interface tool written in Python; servers and client library written in C++
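A client-side configuration file that names the endpoint of each per-flavor server might be read as in the following sketch. The file format, section names, keys and URLs are all assumptions made for illustration; the slides do not show the real DQ configuration syntax.

```python
import configparser

# Hypothetical configuration text: one section per grid flavor.
EXAMPLE_CONFIG = """
[lcg]
endpoint = https://dq-lcg.example.org:8443
secure = yes

[nordugrid]
endpoint = http://dq-ng.example.org:8080
secure = no
"""


def load_endpoints(text):
    """Parse the config text into {flavor: (endpoint, uses_gsi)}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        flavor: (parser[flavor]["endpoint"], parser[flavor].getboolean("secure"))
        for flavor in parser.sections()
    }
```

The `secure` flag mirrors the slide's distinction between GSI and insecure servers; a client would pick the endpoint by flavor and decide from the flag whether credentials are needed.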

  8. Changes on Server-side
     • Why was server-side code rewritten?
       • Partly because of CMS experience
         • Persistent connections were necessary
         • Connection pooling mechanism
         • Each request could not instantiate a connection to the grid catalog: too slow!
       • Partly from our initial experience
         • Flexible security mechanism
         • Either provide a single certificate for all, or delegate credentials
     • Initial version:
       • A command line tool for each grid flavor with the same syntax and same “output”
       • Clarens server was forking out a process that executed the request by calling the command line tool
       • This proved to be inefficient and too restrictive, e.g. it could not maintain persistent connections across multiple requests!
     • Therefore,
       • Server code was built by extending the command line tools; each tool is now a daemon
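The connection pooling idea mentioned above can be sketched in a few lines of Python. This is a generic pool, not the DQ server code (which the slides say is C++); the class name `ConnectionPool` and the `connect` callable are hypothetical.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Keeps a fixed set of open connections and lends them to requests."""

    def __init__(self, connect, size):
        # connect is any zero-argument callable returning a connection object.
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        """Borrow a connection, blocking until one is idle, then return it."""
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Hand the connection back even if the request raised.
            self._idle.put(conn)
```

A long-running daemon can hold such a pool for its whole lifetime, whereas a command line tool forked per request has to open a fresh catalog connection every time.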

  9. Current Status
     • Current structure (component diagram): DqCore, DqPoolRls, DqGlobusRls, DqLcgReplicaAccess, DqClassicReplicaAccess, DqLcgInfoService, DqVdtInfoService, DqNgInfoService, DqLcgPoolFileCatalog, DqFakePoolFileCatalog, DqFactory, DqConfigFile, DqInterface, DqMonitor, DqUI
     • Servers: DqServerLcg, DqServerNg, DqServerVdt
     • Clients: C++ Client Module; dms.py Python Module; C++/Python wrapper (user interface)

  10. NorduGrid
     • Globus RLS 2.x
     • Only Classic Storage Elements (GridFTP servers)
     • Information System
       • Connects to LDAP
     • Special attributes in the RLS
     Components: DqCore, DqGlobusRls, DqClassicReplicaAccess, DqNgInfoService, DqFakePoolFileCatalog, DqFactory, DqConfigFile, DqInterface, DqMonitor, DqUI, DqServerNg

  11. LCG-2
     • EDG/LCG RLS (v2.2)
     • GFAL support:
       • SRM/Castor support
       • SRM/dCache support
       • Classic Storage Element support
     • Information System:
       • LDAP-based (MDS)
     • Native POOL Support
       • Using POOL-1.6.5
     Components: DqCore, DqPoolRls, DqLcgReplicaAccess, DqLcgInfoService, DqLcgPoolFileCatalog, DqFactory, DqConfigFile, DqInterface, DqMonitor, DqUI, DqServerLcg

  12. US Grid 3(+)
     • Globus RLS 2.x
     • DQ supports at the moment only Classic Storage Elements (GridFTP servers)
     • No “information system” interface
       • DQ creates a “dummy” information system which consists of a local configuration file
     Components: DqCore, DqGlobusRls, DqClassicReplicaAccess, DqVdtInfoService, DqFakePoolFileCatalog, DqFactory, DqConfigFile, DqInterface, DqMonitor, DqUI, DqServerVdt
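The three flavor slides list which components each server is assembled from. One way to picture that composition is a lookup table like the Python sketch below. The component names are taken from the slides, but the role names (`replica_catalog`, `replica_access`, and so on), the flavor keys and the helper functions are assumptions inferred from the naming, and nothing here reflects how `DqFactory` is actually implemented.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class FlavorComponents:
    """Component names per grid flavor; the role names are assumptions."""

    replica_catalog: str
    replica_access: str
    info_service: str
    pool_file_catalog: str
    server: str


FLAVORS = {
    "nordugrid": FlavorComponents(
        "DqGlobusRls", "DqClassicReplicaAccess", "DqNgInfoService",
        "DqFakePoolFileCatalog", "DqServerNg"),
    "lcg": FlavorComponents(
        "DqPoolRls", "DqLcgReplicaAccess", "DqLcgInfoService",
        "DqLcgPoolFileCatalog", "DqServerLcg"),
    "grid3": FlavorComponents(
        "DqGlobusRls", "DqClassicReplicaAccess", "DqVdtInfoService",
        "DqFakePoolFileCatalog", "DqServerVdt"),
}


def components_for(flavor):
    """Look up the component set for a flavor name (case-insensitive)."""
    try:
        return FLAVORS[flavor.lower()]
    except KeyError:
        raise ValueError(f"unknown grid flavor: {flavor}") from None


def shared_roles(a, b):
    """Return the roles filled by the same component in both flavors."""
    left, right = components_for(a), components_for(b)
    return [f.name for f in fields(FlavorComponents)
            if getattr(left, f.name) == getattr(right, f.name)]
```

Read this way, NorduGrid and US Grid 3 differ only in their information service and server, while LCG-2 brings its own implementation for every role.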

  13. Integration with ATLAS prodsys
     • Executors are using their “native” grid tools to do file registration
       • But are adding the extra metadata attributes required by DQ
       • This allows integration with DQ
     • Windmill is using DQ
       • To locate replicas of files
       • To rename logical files to their final names (after validation)
       • This week: move files across grids so that each executor finds at least one replica of every file required by its jobs
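The cross-grid goal in the last bullet, every executor finding at least one replica of each file its jobs need, amounts to a small planning step. The sketch below is one hypothetical way to compute the missing copies; the function name, the data shapes and the rule for choosing a source grid are assumptions, not Windmill or DQ behaviour.

```python
def plan_transfers(replica_grids, required_by_grid):
    """Return sorted (lpn, source_grid, dest_grid) copies still needed.

    replica_grids maps each LPN to the set of grids already holding a replica;
    required_by_grid maps each grid to the LPNs its jobs need.
    """
    transfers = []
    for dest, lpns in required_by_grid.items():
        for lpn in lpns:
            holders = replica_grids.get(lpn, set())
            if dest in holders:
                continue  # a replica already exists on this grid
            if not holders:
                raise LookupError(f"no replica of {lpn} on any grid")
            # Picking the alphabetically first holder is an arbitrary choice.
            transfers.append((lpn, min(holders), dest))
    return sorted(transfers)
```

For example, with `{"a": {"lcg"}}` as replica locations and `{"grid3": {"a"}}` as requirements, the plan is a single copy of `a` from `lcg` to `grid3`.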

  14. Future plans
     • Better integration with POOL
       • Must come from end-users' experience
     • Better end-user documentation and support
       • For now, focus has been only on the Automatic Production System
     • Get “best” replica (not high priority)
       • within a grid
       • between grids
     • Monitoring
       • Still being discussed…
     • Reliable transfer service
       • Using MySQL database to manage transfers and automatic retries
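The planned reliable transfer service, a database that tracks transfers and retries them automatically, can be illustrated with a small queue. The slide names MySQL; this sketch swaps in Python's built-in `sqlite3` so that it runs without a server. The table layout, function names and retry limit are assumptions, since the slides describe the service only as a plan.

```python
import sqlite3


def make_queue():
    """Create an in-memory transfer queue (sqlite3 stands in for MySQL)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE transfers (id INTEGER PRIMARY KEY, src TEXT, dest TEXT,"
        " attempts INTEGER DEFAULT 0, state TEXT DEFAULT 'pending')"
    )
    return db


def enqueue(db, src, dest):
    """Record a transfer request in the pending state."""
    db.execute("INSERT INTO transfers (src, dest) VALUES (?, ?)", (src, dest))


def run_pending(db, copy, max_attempts=3):
    """Try every pending transfer once; mark it done, pending, or failed."""
    rows = db.execute(
        "SELECT id, src, dest, attempts FROM transfers WHERE state = 'pending'"
    ).fetchall()
    for tid, src, dest, attempts in rows:
        try:
            copy(src, dest)
            state = "done"
        except OSError:
            # Retry on a later pass unless the attempt budget is used up.
            state = "failed" if attempts + 1 >= max_attempts else "pending"
        db.execute(
            "UPDATE transfers SET attempts = attempts + 1, state = ? WHERE id = ?",
            (state, tid),
        )
```

Because the state lives in the database rather than in the process, a restarted service could pick up pending transfers where it left off, which is the main reason to back retries with a database.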

  15. Future plans
     • Release command line tools appropriate for end-users
       • Request has been made to provide such tools for the Combined Test Beam effort
     • Provide servers as Pacman-caches
     • Much to improve
       • Reliability
       • Easy installation of client tool for users outside “grid”
         • Get local copies of files to non-grid machine
         • ? wrap in Pacman the minimal Globus GridFTP libraries
     • As true interoperability comes, Don Quijote goes…
       • Common information schema & similar catalogs
       • Common interface to storage resource “managers”
