EU DataGrid Data Management Workpackage: WP2 Status and Plans Peter Z Kunszt IT/DB http://cern.ch/grid-data-management/ C5 14.6.2002
Outline • WP2 Mandate, Tasks and Structure • Achievements to date • Task status • Replication • Metadata • Optimization • Security • Plans • Issues
DataGrid workpackages • WP1: Workload Management • WP2: Grid Data Management • WP3: Grid Monitoring Services • WP4: Fabric Management • WP5: Mass Storage Management • WP6: Integration Testbed – Production quality International Infrastructure • WP7: Network Services • WP8: High-Energy Physics Applications • WP9: Earth Observation Science Applications • WP10: Biology Science Applications • WP11: Information Dissemination and Exploitation • WP12: Project Management
Mandate: Data Management The goal of this work package is to specify, develop, integrate and test tools and middleware infrastructure to coherently manage and share petabyte-scale information volumes in high-throughput production-quality grid environments. The work package will develop a general-purpose information sharing solution with unprecedented automation, ease of use, scalability, uniformity, transparency and heterogeneity.
Mandate: Data Management It will allow users to securely access massive amounts of data in a universal global name space, to move and replicate data at high speed from one geographical site to another, and to manage synchronisation of remote replicas. Novel software for automated wide-area data caching and distribution will act according to dynamic usage patterns.
Data Management Tasks • Data Transfer: efficient, secure and reliable transfer of data between sites • Data Replication: replicate data consistently across sites • Data Access Optimization: optimize data access using replication and remote open • Data Access Control: authentication, ownership, access rights on data • Metadata Storage: Grid-wide persistent metadata store for all kinds of Grid information
Current Constituency • CERN: Peter Kunszt, Heinz Stockinger, Leanne Guy, Diana Bosio, Akos Frohner, Wolfgang Hoschek, Kurt Stockinger • INFN: Flavia Donno, Andrea Domenici, Livio Salconi, Giuseppe Andronico, Federico Di Carlo, Marco Serra • PPARC: Gavin McCance, William Bell, David Cameron, Paul Millar • Trento Cultural Institute: Ruben Carvajal Schiaffino, Luciano Serafini, Floriano Zini • U. Helsinki/CSC: Mika Silander, Joni Hahkala, Ville Nenonen, Niklas Karlsson • KDC Stockholm: Olle Mulmo, Gian Luca Volpato
Additional Resources, Collaborators • LCG: Erwin Laure, Itzhak Ben-Akiva, James Casey • PPDG/Condor: Andy Hanushevsky, Aleksandr Konstantinov, Alain Roy • Globus/ISI: Ann Chervenak, Bob Schwarzkopf
Outline • WP2 Mandate, Tasks and Structure • Achievements to date • Task status • Replication • Metadata • Optimization • Security • Plans • Issues
Deliverables
Communications • Quarterly reports on WP2 • Regular meetings and workshops • 2 DataGrid conferences per year • 2 WP2-dedicated workshops per quarter • Weekly management meetings • Weekly phone conferences, depending on task • 8 dedicated WP2 mailing lists • Long list of publications in all of our areas – see the web site. 26 in total, 12 in journals/proceedings.
Outline • WP2 Mandate, Tasks and Structure • Achievements to date • Task status • Replication • Metadata • Optimization • Security • Plans • Issues
EDG Architecture (layer diagram, with Apps / Mware / Globus columns):
• Local: Local Application, Local Database, Local Computing
• Grid Application Layer: Grid Job Management, Data Management, Metadata Management, Object to File Mapper
• Collective Services: Information & Monitoring, Replica Manager, Grid Scheduler, Replica Catalog Interface, Replica Optimization
• Underlying Grid Services: Metadata Service, Computing Element Services, Storage Element Services, Replica Catalog, Authorisation, Authentication and Accounting, Service Index
• Grid Fabric Services: Fabric Monitoring and Fault Tolerance, Node Installation & Management, Fabric Storage Management, Resource Management, Configuration Management
File Management (diagram: files replicated between Storage Element A at Site A and Storage Element B at Site B via File Transfer)
• Replica Manager: 'atomic' replication operation, single client interface, orchestrator
• Replica Catalog: map logical to physical files
• Replica Selection: get the 'best' file
• Security
• Pre-/Post-processing: prepare files for transfer, validate files after transfer
• Replication Automation: data source subscription
• Load balancing: replicate based on usage
• Metadata: LFN metadata, transaction information, access patterns
(A catalog and selection sketch follows below.)
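To make the catalog and selection roles above concrete, here is a minimal sketch, assuming invented class and method names (these are not the actual EDG replica catalog or replica manager APIs): a logical file name (LFN) is mapped to its physical replicas (PFNs), and a 'best' replica is picked, here simply by preferring the local site.

```java
import java.util.*;

/** Hypothetical in-memory replica catalog: logical file name -> physical file names. */
class ReplicaCatalog {
    private final Map<String, List<String>> entries = new HashMap<>();

    void register(String lfn, String pfn) {
        entries.computeIfAbsent(lfn, k -> new ArrayList<>()).add(pfn);
    }

    List<String> lookup(String lfn) {
        return entries.getOrDefault(lfn, Collections.emptyList());
    }
}

/** Hypothetical replica selector: prefer a replica at the local site, else take any. */
class ReplicaSelector {
    String best(List<String> pfns, String localSite) {
        for (String pfn : pfns) {
            if (pfn.contains(localSite)) return pfn;     // a local replica avoids a WAN transfer
        }
        return pfns.isEmpty() ? null : pfns.get(0);      // fall back to any known replica
    }
}

public class ReplicaLookupDemo {
    public static void main(String[] args) {
        ReplicaCatalog rc = new ReplicaCatalog();
        rc.register("lfn:higgs/run42.dat", "gsiftp://se.cern.ch/data/run42.dat");
        rc.register("lfn:higgs/run42.dat", "gsiftp://se.infn.it/data/run42.dat");

        String best = new ReplicaSelector().best(rc.lookup("lfn:higgs/run42.dat"), "cern.ch");
        System.out.println("Best replica: " + best);
    }
}
```

In the real services the catalog is a remote, distributed component and selection uses cost information from the optimization task; the sketch only shows the mapping and selection steps themselves.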
Current Components • File Transfer: GridFTP – deployed • Close collaboration with Globus • NetLogger (Brian Tierney and John Bresnahan) • Replication: GDMP – deployed • Wrapper around the Globus Replica Catalog • All functionality in one integrated package • Using Globus 2 • Uses GridFTP for transferring files • Replication: edg-replica-manager – deployed • Replication: Replica Location Service Giggle – in testing • Distributed Replica Catalog • Replication: Replica Manager Reptor – in testing • Optimization: Replica Selection OptorSim – in simulation • Metadata Storage: SQL Database Service Spitfire – deployed • Servlets on HTTP(S) with XML (XSQL) • GSI-enabled access + extensions • GSI interface to CASTOR – delivered
Current Status Replication • GDMP version 3.0.5 • Support for multiple VOs • New security mechanism for server and clients • Uses Globus 2.0 beta 21 on the EDG testbed • Linux Red Hat 6.2 (6.1.1), partly already Red Hat 7.2 (some manual changes required) • GDMP is part of the VDT (Virtual Data Toolkit) • Alain Roy (Condor team) provides support for US sites • Replica Manager • edg-replica-manager: wrapper around the existing Globus replica manager • Core API as defined in the Replica Manager Design Document is implemented • Reptor: Java servlet, alpha release in internal testing
Current Status Replication cont. • Replica Catalog • GUI and command-line tool for the existing RC • Available on the production testbed • Introduced a security patch for the RC • Collaboration with Globus and PPDG on a new, distributed Replica Catalog framework (Giggle) – a lookup sketch follows below • Joint design with Globus; Globus: coding, EDG: testing • Internal alpha release, integrated with Reptor • Optimization • Working prototype of the simulator
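For orientation, the published Giggle design splits the catalog into per-site Local Replica Catalogs (LRCs), which hold the actual LFN-to-PFN mappings, and Replica Location Indices (RLIs), which only record which sites claim to hold a file. The sketch below illustrates that two-level lookup with invented class names; it is not the actual Giggle or EDG interface.

```java
import java.util.*;

/** Hypothetical Local Replica Catalog: LFN -> PFNs held at one site. */
class LocalReplicaCatalog {
    final String site;
    private final Map<String, List<String>> mappings = new HashMap<>();

    LocalReplicaCatalog(String site) { this.site = site; }

    void add(String lfn, String pfn) {
        mappings.computeIfAbsent(lfn, k -> new ArrayList<>()).add(pfn);
    }

    List<String> lookup(String lfn) { return mappings.getOrDefault(lfn, Collections.emptyList()); }

    Set<String> lfns() { return mappings.keySet(); }
}

/** Hypothetical Replica Location Index: LFN -> sites that advertise a copy. */
class ReplicaLocationIndex {
    private final Map<String, Set<String>> index = new HashMap<>();

    /** LRCs periodically publish summaries of their contents (soft state in the real design). */
    void publish(LocalReplicaCatalog lrc) {
        for (String lfn : lrc.lfns()) {
            index.computeIfAbsent(lfn, k -> new HashSet<>()).add(lrc.site);
        }
    }

    Set<String> sitesFor(String lfn) { return index.getOrDefault(lfn, Collections.emptySet()); }
}
```

A query first asks an RLI which sites hold the LFN and then contacts only those sites' LRCs for the concrete PFNs; the periodic, soft-state publishing is what keeps the index scalable and loosely consistent.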
Metadata Management and Security Project Spitfire • 'Simple' Grid Persistency • Grid Metadata • Application Metadata • Unified Grid-enabled front end to relational databases • Metadata Replication and Consistency • Publish information on the metadata service Secure Grid Services • Grid authentication, authorization and access control mechanisms enabled in Spitfire • Modular design, reusable by other Grid Services
Spitfire Architecture (diagram: a global Spitfire layer and a connecting layer talk SOAP to local Spitfire layers – an Oracle layer, DB2 layer, PostgreSQL layer and MySQL layer – in front of Oracle, DB2, PostgreSQL and MySQL back ends) • XSQL Servlet as one access mode for 'simple' web access • Web/Grid Services paradigm • SOAP interfaces • JDBC interface to RDBMS (a minimal JDBC sketch follows below) • Pluggability and extensibility • Atomic RDBMS is always consistent • No local replication of data • Role-based authorization
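At the bottom of that stack, each local Spitfire layer ultimately issues SQL to its back-end database over JDBC. A minimal sketch, assuming an invented connection URL, credentials and table (this is illustration only, not the Spitfire code or schema):

```java
import java.sql.*;

public class MetadataQueryDemo {
    public static void main(String[] args) throws SQLException {
        // Connection URL, credentials and schema below are placeholders.
        String url = "jdbc:mysql://localhost:3306/metadata";
        try (Connection con = DriverManager.getConnection(url, "spitfire", "secret");
             PreparedStatement ps = con.prepareStatement(
                     "SELECT pfn FROM replica_metadata WHERE lfn = ?")) {
            ps.setString(1, "lfn:higgs/run42.dat");
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println("Replica: " + rs.getString("pfn"));
                }
            }
        }
    }
}
```

The point of the layering is that this JDBC step stays the same whichever vendor sits underneath; only the driver and connection URL differ per local layer.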
Security Mechanism (flow diagram): an HTTPS request with a client certificate enters the servlet container through the SSLServletSocketFactory; the TrustManager checks, against the trusted-CA and revoked-certificate repositories, that the certificate is signed by a trusted CA and has not been revoked, rejecting the request otherwise; the Authorization Module then takes the role the user specifies, if the role repository allows it, or falls back to the default role; the role is mapped to a connection ID via the role-connection mappings, and the Translator Servlet forwards the request with that connection ID to the RDBMS through the connection pool. (A sketch of this decision logic follows below.)
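The decision logic of that flow reduces to a few checks before a database connection is handed out. A minimal sketch, assuming invented helper interfaces (the real module sits behind the servlet container's TrustManager and connection pool rather than these names):

```java
import java.security.cert.X509Certificate;

/** Hypothetical authorization module mirroring the flow above. */
class AuthorizationModule {
    interface TrustStore   { boolean signedByTrustedCA(X509Certificate cert); }
    interface RevocationDB { boolean isRevoked(X509Certificate cert); }
    interface RoleRepo     { boolean hasRole(String subject, String role); String defaultRole(String subject); }
    interface RoleMappings { String connectionIdFor(String role); }

    private final TrustStore trust;
    private final RevocationDB revoked;
    private final RoleRepo roles;
    private final RoleMappings mappings;

    AuthorizationModule(TrustStore t, RevocationDB r, RoleRepo ro, RoleMappings m) {
        trust = t; revoked = r; roles = ro; mappings = m;
    }

    /** Returns the connection id to use for this request, or null if it must be rejected. */
    String authorize(X509Certificate cert, String requestedRole) {
        if (!trust.signedByTrustedCA(cert) || revoked.isRevoked(cert)) {
            return null;                                        // untrusted or revoked certificate
        }
        String subject = cert.getSubjectX500Principal().getName();
        String role = (requestedRole != null && roles.hasRole(subject, requestedRole))
                ? requestedRole                                 // user-specified role, if permitted
                : roles.defaultRole(subject);                   // otherwise fall back to the default role
        return (role == null) ? null : mappings.connectionIdFor(role);  // map role -> connection id
    }
}
```

Keeping all the checks in one module matches the 'modular design, reusable by other Grid services' goal above: the translator servlet only ever sees a connection ID, never the certificate handling.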
Status Spitfire & Security • Integrated XSQL-Spitfire including security mechanisms • XSQL version of a Replica MetaData Catalog • Schema is given • Roles are fixed • Queries are predefined • No SOAP, just XML over HTTP • Installation decoupled from Tomcat & MySQL • Work on SOAP interface using Apache Axis (a client-side sketch follows below) • Security mechanisms planned
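Once the Axis-based SOAP interface exists, calling it from a client would look roughly like the standard Axis dynamic invocation below. The endpoint URL, namespace and operation name are placeholders, not the actual Spitfire service definition:

```java
import java.net.URL;
import javax.xml.namespace.QName;
import org.apache.axis.client.Call;
import org.apache.axis.client.Service;

public class SpitfireSoapClientDemo {
    public static void main(String[] args) throws Exception {
        // Endpoint, namespace and operation name are illustrative placeholders.
        String endpoint = "https://metadata.example.org/spitfire/services/MetadataService";

        Service service = new Service();
        Call call = (Call) service.createCall();
        call.setTargetEndpointAddress(new URL(endpoint));
        call.setOperationName(new QName("urn:spitfire", "query"));

        // Invoke the remote operation with a single string argument and print the reply.
        String result = (String) call.invoke(new Object[] { "SELECT pfn FROM replica_metadata" });
        System.out.println(result);
    }
}
```

The security mechanisms described above would still have to be plugged into the transport, which is why they are listed as planned for the SOAP interface.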
Outline • WP2 Mandate, Tasks and Structure • Achievements to date • Task status • Replication • Metadata • Optimization • Security • Plans • Issues
WP2 Replication Services (component diagram): the Replica Manager Client fronts a set of sub-services – Optimization (Optor), Transaction, Consistency, File Transfer, Postprocessing, Preprocessing, Replica Location (Giggle), Subscription (GDMP) and Replica Metadata (RepMeC) – within the Reptor replica manager framework. (An interface sketch of this composition follows below.)
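One way to read that diagram is as a set of interfaces that the replica manager orchestrates behind a single client call. The sketch below uses invented interface names purely to show the composition; it is not the Reptor code, and the order of steps is only the obvious one suggested by the file-management slide earlier:

```java
import java.util.List;

/** Hypothetical decomposition into the sub-services named in the diagram. */
interface FileTransfer    { void copy(String sourcePfn, String destinationPfn); }
interface ReplicaLocation { List<String> lookup(String lfn); void register(String lfn, String pfn); }
interface Optimization    { String bestSource(List<String> pfns, String destinationSite); }
interface Processing      { void preprocess(String pfn); void postprocess(String pfn); }

/** The replica manager hides the sub-services behind one 'atomic' replication call. */
class ReplicaManager {
    private final FileTransfer transfer;
    private final ReplicaLocation location;
    private final Optimization optimizer;
    private final Processing processing;

    ReplicaManager(FileTransfer t, ReplicaLocation l, Optimization o, Processing p) {
        transfer = t; location = l; optimizer = o; processing = p;
    }

    /** Choose a source, prepare it, transfer, validate the copy, then register the new replica. */
    void replicate(String lfn, String destinationPfn, String destinationSite) {
        String source = optimizer.bestSource(location.lookup(lfn), destinationSite);
        processing.preprocess(source);
        transfer.copy(source, destinationPfn);
        processing.postprocess(destinationPfn);
        location.register(lfn, destinationPfn);
    }
}
```

Transaction, consistency and subscription would wrap or trigger this same sequence; they are left out to keep the sketch small.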
Summary of Plans • GDMP v3.0 will be the LAST release of GDMP as we know it. • In the future, GDMP will rely on the Replica Manager and provide the subscription-based mirroring functionality; it will be implemented as a web service. • The Replica Catalog will be replaced with the Giggle framework, jointly developed with Globus. • The Replica Manager will take over most of GDMP's current functionality and will be a web service. We provide client APIs; all user interaction should go through the RM. • Spitfire will have both an XML-over-HTTP access method for static queries and a SOAP interface. • Security infrastructure for all services will be put in place as presented.
Outline • WP2 Mandate, Tasks and Structure • Achievements to date • Task status • Replication • Metadata • Optimization • Security • Plans • Issues
Issues • Getting everybody started, coordination with partners – one very late deliverable • Technology challenges – an evolving field; keeping on top of it requires a lot of training effort from everybody • Coordination within WP2 and with US projects – not enough funds for travel at CERN • Very late arrival of, and changes in, manpower – additional training and supervision • A lot of the young members are working, or have worked, on their PhDs • The user community's different agenda delays development: continuous requests for support and enhancements of components that we want to phase out (GDMP) drain development manpower; documentation suffers most.
Outlook • Very ambitious programme – can we make it? • The current workforce is very well motivated and has a very high level of interaction – see our mailing list archives • A lot of improvement in inter-task coordination has been achieved; further effort is necessary – workshops THANK YOU FOR YOUR ATTENTION