Data Management: GridPP and EDG
Gavin McCance
University of Glasgow
May 2, 2002
http://www.gridpp.ac.uk/datamanagement
http://cern.ch/grid-data-management
Who are we?
• GridPP
• Effort based at Glasgow
• Collaboration with European DataGrid
• WP2: Data Management work package
• CERN, Finland, Italy
• Replication collaboration with Globus + PPDG project
What do we* do?
• Replica management
• Replica catalogues
• File access and transfer
• Grid query optimisation (replica optimisation)*
• Secure meta-data catalogues*
• Service Index
Replica Catalogues
• Must maintain replicas of the same files
• Each file has a globally unique Logical File Name (LFN) that maps to multiple physical instances of the file (PFNs).
• Catalogue to keep track of all these mappings!
[Diagram: one LFN for File-1 mapping to copies of File-1 in Paris, Chicago and Glasgow]
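The LFN-to-PFN mapping described on this slide can be pictured as a one-to-many dictionary. The following is a minimal in-memory sketch for illustration only: the class name, method names, and example URLs are assumptions, and it does not represent the LDAP-based catalogue or its API.

```python
from collections import defaultdict


class ReplicaCatalogue:
    """Toy catalogue: one logical file name (LFN) maps to many PFNs."""

    def __init__(self):
        # LFN -> set of physical file names (hypothetical layout)
        self._replicas = defaultdict(set)

    def register(self, lfn, pfn):
        """Record that a physical copy of the logical file exists."""
        self._replicas[lfn].add(pfn)

    def unregister(self, lfn, pfn):
        """Forget one physical copy; drop the LFN when no copies remain."""
        self._replicas[lfn].discard(pfn)
        if not self._replicas[lfn]:
            del self._replicas[lfn]

    def lookup(self, lfn):
        """Return all known physical copies, sorted for stable output."""
        return sorted(self._replicas.get(lfn, ()))


# Illustrative host names only.
catalogue = ReplicaCatalogue()
catalogue.register("lfn:file-1", "gsiftp://paris.example.org/data/file-1")
catalogue.register("lfn:file-1", "gsiftp://chicago.example.org/data/file-1")
catalogue.register("lfn:file-1", "gsiftp://glasgow.example.org/data/file-1")
replicas = catalogue.lookup("lfn:file-1")
```

A set is used per LFN so that registering the same physical copy twice has no effect.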
…catalogues
• Current services use LDAP
• Collaboration with Globus + PPDG on new replica catalogue framework (GIGGLE)
• Prototype Replica Location Service (RLS) under development
• Will use meta-data service (Spitfire)…
• API implemented as a wrapper for the current LDAP-based replica catalogue
…RLS
• Implemented as web service
[Diagram: three RLI nodes above five LRC nodes, each LRC paired with a Storage Element]
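One way to read the diagram is as a two-level lookup: an index (RLI) records which local catalogues (LRCs) know about a logical file, and each LRC holds the physical locations on its own Storage Element. The sketch below assumes that reading; the class and method names are hypothetical and are not the RLS web service interface.

```python
class LocalReplicaCatalogue:
    """Holds LFN -> PFN mappings for a single site (hypothetical model)."""

    def __init__(self, site):
        self.site = site
        self.mappings = {}  # lfn -> list of pfns at this site

    def add(self, lfn, pfn):
        self.mappings.setdefault(lfn, []).append(pfn)


class ReplicaLocationIndex:
    """Records which local catalogues hold entries for each LFN."""

    def __init__(self):
        self.index = {}  # lfn -> list of LocalReplicaCatalogue objects

    def publish(self, lrc):
        """Index every LFN currently known to the given local catalogue."""
        for lfn in lrc.mappings:
            holders = self.index.setdefault(lfn, [])
            if lrc not in holders:
                holders.append(lrc)

    def locate(self, lfn):
        """Ask only the catalogues the index points at, then merge results."""
        pfns = []
        for lrc in self.index.get(lfn, []):
            pfns.extend(lrc.mappings.get(lfn, []))
        return pfns


# Illustrative sites and URLs only.
glasgow = LocalReplicaCatalogue("Glasgow")
glasgow.add("lfn:file-1", "gsiftp://glasgow.example.org/data/file-1")
paris = LocalReplicaCatalogue("Paris")
paris.add("lfn:file-1", "gsiftp://paris.example.org/data/file-1")
paris.add("lfn:file-2", "gsiftp://paris.example.org/data/file-2")

rli = ReplicaLocationIndex()
rli.publish(glasgow)
rli.publish(paris)
found = rli.locate("lfn:file-1")
```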
Transferring files
• What replicates the files?
• Grid Data Mirroring Package (GDMP)
• GDMP 3.0 software just released
• GSI authentication and authorisation
• GridFTP file transfer
• Subscription-based file replication
• Automatic update of replica catalogue
• http://cmsdoc.cern.ch/cms/grid/
Replica Manager
• New web service under development
• GDMP functionality will be absorbed
• Will use replica location service
• Core API has been defined
• replicateFile, copyAndRegisterFile, deleteFile, registerEntry, unregisterEntry
• Iteration with WP5 on accessing data from Storage Elements
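To make the five core operations concrete, here is a toy in-memory model. Only the operation names come from the slide; the argument lists, the behaviour of each call, and the dictionaries standing in for the catalogue and the Storage Elements are all assumptions made for illustration.

```python
class ReplicaManager:
    """Toy model of the five named operations (signatures are assumed)."""

    def __init__(self):
        self.catalogue = {}  # lfn -> set of pfns
        self.storage = {}    # pfn -> file contents (stands in for storage)

    def registerEntry(self, lfn, pfn):
        """Add a catalogue entry without moving any data."""
        self.catalogue.setdefault(lfn, set()).add(pfn)

    def unregisterEntry(self, lfn, pfn):
        """Remove a catalogue entry without touching the stored file."""
        self.catalogue.get(lfn, set()).discard(pfn)

    def copyAndRegisterFile(self, data, lfn, dest_pfn):
        """Store new data at dest_pfn and record it in the catalogue."""
        self.storage[dest_pfn] = data
        self.registerEntry(lfn, dest_pfn)

    def replicateFile(self, lfn, dest_pfn):
        """Copy an existing replica to dest_pfn and register the copy."""
        sources = self.catalogue.get(lfn)
        if not sources:
            raise LookupError(f"no replica registered for {lfn}")
        source_pfn = next(iter(sources))
        self.copyAndRegisterFile(self.storage[source_pfn], lfn, dest_pfn)

    def deleteFile(self, lfn, pfn):
        """Delete one stored replica and remove its catalogue entry."""
        self.storage.pop(pfn, None)
        self.unregisterEntry(lfn, pfn)


# Illustrative names only.
manager = ReplicaManager()
manager.copyAndRegisterFile(b"event data", "lfn:run-42", "se-glasgow:/run-42")
manager.replicateFile("lfn:run-42", "se-cern:/run-42")
manager.deleteFile("lfn:run-42", "se-glasgow:/run-42")
```

In this sketch the two register calls only touch the catalogue, while the other three combine a data operation with a catalogue update, which is one plausible way to split the responsibilities.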
Optimisation
• Negotiation with scheduling for data-intensive jobs
• Minimise job time / maximise grid throughput
• Given the distribution of data a job will use, what is the most appropriate place to run it?
• Once it's running: is it better to remote-open, cache or make a new replica nearby?
…Optimisation
• Dynamic replication decisions based on network stats and file access patterns
• Economic model being tested
• “Greedy” local optimisation leads to a reasonable global optimum…
• Data-centric grid simulation to test these replication algorithms
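The choice between reading a file remotely and making a new replica nearby can be illustrated with a simple cost comparison. This is a minimal sketch and not the economic model mentioned above: the function names, the cost formula (latency plus size over bandwidth), and all figures are illustrative assumptions.

```python
def transfer_time(size_mb, bandwidth_mb_s, latency_s=0.0):
    """Estimated seconds to move a file over one link."""
    return latency_s + size_mb / bandwidth_mb_s


def best_replica(size_mb, links):
    """Pick the site with the lowest estimated transfer time.

    links maps a site name to a (bandwidth MB/s, latency s) pair.
    """
    return min(links, key=lambda site: transfer_time(size_mb, *links[site]))


def should_replicate(size_mb, remote_link, local_link, expected_reads):
    """Replicate when one copy plus local reads beats repeated remote reads."""
    remote_cost = expected_reads * transfer_time(size_mb, *remote_link)
    copy_cost = transfer_time(size_mb, *remote_link)
    local_cost = copy_cost + expected_reads * transfer_time(size_mb, *local_link)
    return local_cost < remote_cost


# Made-up network figures for a 1000 MB file.
links = {"Paris": (10.0, 0.05), "Chicago": (2.0, 0.15)}
source = best_replica(1000, links)
replicate_once = should_replicate(1000, links[source], (100.0, 0.001), 1)
replicate_often = should_replicate(1000, links[source], (100.0, 0.001), 5)
```

Each call makes a purely local decision from the statistics it is given, which is the "greedy" style of optimisation the slide refers to.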
Meta-data
• Need for transparent, secure access to meta-data
• Both for grid-specific (e.g. replica catalogue) and application-specific meta-data.
• Spitfire service available
• Current version 1.1.0
• http://hep-proj-spitfire.web.cern.ch/hep-proj-spitfire
Current Spitfire
• Secure access over HTTPS to retrieve from or publish to any RDBMS
• Can use web-browser as client
Security
• Authentication is provided over SSL via a Globus certificate
• Remote users are mapped onto a database role, so they can only perform the operations on the database that their role is authorised for
Security Mechanism
[Diagram: request flow through the Servlet Container]
• Request arrives over HTTP + SSL with a client certificate
• SSLServletSocketFactory and TrustManager: Is the certificate signed by a trusted CA (Trusted CAs)? Has the certificate been revoked (Revoked Certs repository)?
• Security Servlet and Authorization Module: Does the user specify a role? If not, find the default. Role ok? (Role repository)
• Map role to connection ID (Role Connection mappings, Connection Pool)
• Request and connection ID go to the Translator Servlet, which talks to the RDBMS
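The decision steps in the diagram can be sketched as a single function. This is a simulation with plain Python data, not Spitfire code and not real SSL handling: the data structures, names, certificate fields, and the choice to raise an error at each failed check are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Certificate:
    """Just the fields this sketch needs (hypothetical)."""
    subject: str
    issuer: str
    serial: int


# Stand-ins for the repositories named in the diagram (invented contents).
TRUSTED_CAS = {"Example Grid CA"}
REVOKED_SERIALS = {666}
ROLE_REPOSITORY = {
    "/O=Grid/CN=Alice": {"default": "reader", "allowed": {"reader", "writer"}},
}
ROLE_TO_CONNECTION = {"reader": "conn-readonly", "writer": "conn-readwrite"}


def authorise(cert, requested_role=None):
    """Return a connection ID for the request, or raise PermissionError."""
    # Is the certificate signed by a trusted CA?
    if cert.issuer not in TRUSTED_CAS:
        raise PermissionError("certificate not signed by a trusted CA")
    # Has the certificate been revoked?
    if cert.serial in REVOKED_SERIALS:
        raise PermissionError("certificate has been revoked")
    entry = ROLE_REPOSITORY.get(cert.subject)
    if entry is None:
        raise PermissionError("no roles known for this user")
    # Does the user specify a role? If not, fall back to the default.
    role = requested_role or entry["default"]
    # Role ok?
    if role not in entry["allowed"]:
        raise PermissionError(f"role {role!r} not permitted for this user")
    # Map the role to a connection ID.
    return ROLE_TO_CONNECTION[role]


alice = Certificate("/O=Grid/CN=Alice", "Example Grid CA", 1001)
default_connection = authorise(alice)
writer_connection = authorise(alice, "writer")
```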
Developments to Spitfire
• Web Services API is defined
• Implementation to start immediately
• Access via SOAP, initially over HTTPS
• Higher level services
• Meta-data distribution and replication
• Clean-up services
Service Index
• How do I find a specific grid service?
• E.g. replica location server, image database, information service
• XML Service description
• What, where, attributes, how to contact.
• Scalable architectures for querying this have been developed
• Service index web service
• W. Hoschek’s thesis and paper (WP2@CERN)
• API developed
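As an illustration of looking up a service from XML descriptions, the sketch below filters a small document by service type. The XML layout, element and attribute names, and endpoints are invented for this example and are not the Service Index schema or API.

```python
import xml.etree.ElementTree as ET

# Hypothetical service descriptions: what, where, attributes, how to contact.
SERVICE_INDEX_XML = """
<services>
  <service type="replica-location" name="rls-glasgow">
    <endpoint>https://rls.glasgow.example.org/rls</endpoint>
    <attribute name="protocol">SOAP</attribute>
  </service>
  <service type="image-database" name="images-cern">
    <endpoint>https://images.cern.example.org/db</endpoint>
    <attribute name="protocol">HTTPS</attribute>
  </service>
</services>
"""


def find_services(xml_text, service_type):
    """Return a description dict for each service of the requested type."""
    root = ET.fromstring(xml_text.strip())
    matches = []
    for service in root.findall("service"):
        if service.get("type") != service_type:
            continue
        matches.append({
            "name": service.get("name"),
            "endpoint": service.findtext("endpoint"),
            "attributes": {
                attr.get("name"): attr.text
                for attr in service.findall("attribute")
            },
        })
    return matches


rls_services = find_services(SERVICE_INDEX_XML, "replica-location")
```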
More Info
• More information available at:
http://www.gridpp.ac.uk/datamanagement
http://cern.ch/grid-data-management