WebDAV in EMI-Data Super B meeting via EVO, 6.12.2011 Patrick Fuhrmann, EMI Data
Content • EMI • EMI in general • EMI release plan • What happens after EMI • EMI data product portfolio • What do we provide • Details on the SE’s as basis for storage • WebDAV in EMI Data • What do we provide • Details on the SE’s as basis for storage • Data access and transfer models With contributions by • Ricardo Rocha • Paul Millar • Zsolt Molnar • Tigran Mkrtchyan • Jon Kerr Nilsen • Alejandro Ayllon • Fabrizio Furano • Alberto Di Meglio (Boss)
EMI factsheets EMI in general
Where we are (stolen from Alberto Di Meglio) [Diagram: Before EMI | 3 years | After EMI. Labels: Applications; Integrators, System Administrators; Standard interfaces; Specialized services, professional support and customization; Standard interfaces; EMI Reference Services; Standards, new technologies (clouds); Users and Infrastructure Requirements]
Release and support policy (stolen from Alberto Di Meglio) [Timeline of major releases: Start, EMI 0, EMI 1 (Kebnekaise, Lapland, Sweden, 2100 m; Sami name Giebnegáisi), EMI 2 (Matterhorn, Switzerland/Italy, 4478 m), EMI 3, each followed by a Support & Maintenance period. Status markers: Done, In Preparation. Dates shown: 01/05/2010, 31/10/2010, 30/04/2012, 28/02/2013]
What happens after May 2013? • Not clear. • The EU reviewers strongly recommended putting more effort into future planning. • A Strategic Director (SD) has been nominated and is now in place. • NA3, together with the SD, has to find a sustainability model for the time beyond EMI. • An organization similar to ‘Apache’ is under discussion, combining the different product teams into an open source initiative (NOT a new EMI EU project). • Hopefully more information after the next collaboration board meeting on 14/15 Dec at CERN.
Important: besides the future of EMI • The three European storage software providers dCache.org, CERN DM and INFN/StoRM have built a ‘pseudo’ collaboration through WLCG. This collaboration will exist beyond EMI and will follow up on interesting projects on its own.
Actual topic And now to EMI – Data Please find all the details in the EMI Data wiki https://twiki.cern.ch/twiki/bin/view/EMI/EmiJra1T3DataDJRA122
EMI Data Product families Professional Storage Solutions Client Libraries dCache DPM File Location and meta data Service (LFC) Reliable File Transport Service
EMI Storage Element Portfolio Disk Storage Layer HSM Interface SRM HSM ENSTORE dCache Any Other TSM GPFS Any FS DPM
EMI Storage element characteristics dCache • 100 PBytes world-wide • Most WLCG Tier I’s • Holds 50% of WLCG data • Supports most known tape systems • Featuring • File replication on hot-spot detection • Draining of pools • Resilient dataset management • Replication on arrival StoRM • In use at the Italian Tier I plus about 40 Tier II’s • Makes use of features of the underlying storage system (GPFS, Lustre…) • Supports tape through GPFS DPM • Easy to install • Little maintenance • More than 200 installations • Majority of WLCG sites
WebDAV WebDAV in EMI Data Servers Clients access and transfer models
WebDAV in EMI Storage Elements • dCache and DPM already support http(s) and WebDAV • StoRM will follow for EMI-3 • Authentication is done through username/password or X.509 certificates
WebDAV clients • Most operating systems support http(s)/WebDAV. • Generally available tools: wget and curl • EMI people are contributing to ROOT to improve its http(s) client (redirects and vector reads). • dCache will soon come with a browser plug-in for easy file management and transfers via WebDAV.
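Because the storage elements speak plain http(s)/WebDAV, a client needs nothing beyond a standard HTTP library. The following is a minimal sketch, assuming Python's standard library, of how a WebDAV directory listing (PROPFIND, RFC 4918) could be requested and its answer parsed. The host `se.example.org`, the paths and the sample reply are made up for illustration; they are not taken from a dCache or DPM instance.

```python
import urllib.request
import xml.etree.ElementTree as ET

DAV_NS = "{DAV:}"  # XML namespace used by WebDAV (RFC 4918)


def build_propfind(url, depth="1"):
    """Build (but do not send) a WebDAV PROPFIND request for `url`."""
    body = (
        '<?xml version="1.0" encoding="utf-8"?>'
        '<D:propfind xmlns:D="DAV:"><D:prop>'
        '<D:getcontentlength/><D:resourcetype/>'
        '</D:prop></D:propfind>'
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PROPFIND",
        headers={"Depth": depth, "Content-Type": "application/xml"},
    )


def parse_multistatus(xml_text):
    """Return {href: size_in_bytes or None} from a 207 Multi-Status body."""
    entries = {}
    root = ET.fromstring(xml_text)
    for response in root.findall(DAV_NS + "response"):
        href = response.findtext(DAV_NS + "href")
        size = response.findtext(
            f"{DAV_NS}propstat/{DAV_NS}prop/{DAV_NS}getcontentlength"
        )
        entries[href] = int(size) if size else None
    return entries


# Hand-written sample reply (hypothetical paths), used instead of a live server.
SAMPLE = """<D:multistatus xmlns:D="DAV:">
  <D:response>
    <D:href>/data/superb/</D:href>
    <D:propstat><D:prop><D:resourcetype><D:collection/></D:resourcetype></D:prop></D:propstat>
  </D:response>
  <D:response>
    <D:href>/data/superb/run001.root</D:href>
    <D:propstat><D:prop><D:getcontentlength>1048576</D:getcontentlength><D:resourcetype/></D:prop></D:propstat>
  </D:response>
</D:multistatus>"""

request = build_propfind("https://se.example.org/data/superb/")
listing = parse_multistatus(SAMPLE)
```

To talk to a real server, the request would be passed to `urllib.request.urlopen`, with either an `Authorization` header for username/password or an `ssl.SSLContext` on which `load_cert_chain` has been called for an X.509 client certificate. Which of these a given site accepts depends on its configuration.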
Data Models Data models with WebDAV
Semi-static catalogue based (WLCG) [Diagram: ROOT, LFC, WebDAV, two storage elements] • Prototype works with LFC / DPM / dCache • No aggregation library; relies on natural http protocol redirection • Needs catalogue synchronization (see later)
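The point about "natural http protocol redirection" can be illustrated with a small sketch: the client asks the catalogue for a logical file name, the catalogue answers with a redirect, and the client follows it to the storage element. The code below simulates the network with a dictionary so it runs offline. All host names and paths are hypothetical, and the exact status codes a real LFC front-end returns are an assumption.

```python
from urllib.parse import urljoin

# Hypothetical endpoints: a catalogue front-end and one storage element.
FAKE_NETWORK = {
    "https://lfc.example.org/superb/run001.root":
        (302, {"Location": "https://dpm.example.org/data/superb/run001.root"}),
    "https://dpm.example.org/data/superb/run001.root":
        (200, {"Content-Length": "1048576"}),
}


def fake_fetch(url):
    """Stand-in for a real HTTP HEAD/GET; unknown URLs answer 404."""
    return FAKE_NETWORK.get(url, (404, {}))


def resolve(url, fetch, max_hops=5):
    """Follow HTTP redirects starting at `url`; return (final_url, hops)."""
    hops = [url]
    for _ in range(max_hops):
        status, headers = fetch(url)
        if status in (301, 302, 303, 307, 308):
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, headers["Location"])
            hops.append(url)
        elif status == 200:
            return url, hops
        else:
            raise RuntimeError(f"{url} answered {status}")
    raise RuntimeError("too many redirects")


final_url, hops = resolve("https://lfc.example.org/superb/run001.root", fake_fetch)
```

With this model no aggregation library is needed on the client: any HTTP client that follows redirects ends up at a storage element, provided the catalogue entry is still valid.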
Semi static catalogue and transfers LFC Update File Transfer Service (FTS2) WebDAV 3rd party • On FTS2 see next slide storage element storage element
FTS3 and WebDAV 3rd party copy • Next generation File Transfer Service, FTS 3 • Redesign based on the experience of the last years • Based on GFAL-2 • Decommissioning of the channel concept • Prototype ready in April ’12 (framework for new approaches) • Many interesting new approaches • Support of http including 3rd party copy (delegation) • Feedback of real resource utilization • Interactively • Automatically (callout to storage elements) • Autonomously (learning)
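For orientation, WebDAV itself defines a COPY method with a `Destination` header (RFC 4918), which is one plausible shape for an http third-party copy: the client asks the source server to copy a resource to another URL instead of moving the bytes itself. The sketch below only builds such a request with Python's standard library. How FTS3 and the storage elements actually handle credential delegation is not described here, and the host names are hypothetical.

```python
import urllib.request


def build_copy_request(source_url, destination_url, overwrite=True):
    """Build (but do not send) a WebDAV COPY request as defined in RFC 4918."""
    return urllib.request.Request(
        source_url,
        method="COPY",
        headers={
            # Where the server should write the copy.
            "Destination": destination_url,
            # "T" allows replacing an existing destination, "F" forbids it.
            "Overwrite": "T" if overwrite else "F",
        },
    )


copy_request = build_copy_request(
    "https://dcache.example.org/data/superb/run001.root",
    "https://dpm.example.org/data/superb/run001.root",
)
```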
Dynamic catalogue based [Diagram: ROOT, Dynamic Catalogue, Query for file, WebDAV, two storage elements]
Dynamic “catalogue failed” correction [Diagram: ROOT, Catalogue, WebDAV, two storage elements]
WebDAV federation models in EMI • These alternatives are currently under investigation. • The design tries to integrate existing infrastructures with the dynamic catalogue idea. • A design paper will be available at the end of January (Fabrizio F.) • First official presentation on that topic at CHEP in NYC.
Content of this presentation Slightly related topics
The EMI-Data Lib
The EMI Data Lib (cont.) • October 2011: Deliver consolidation plan in EMI • Draft exists, main ideas ready • December 2011: Finish prototype implementation • Prototype should be ready for EMI-2 • Merging 2 data libraries in two months is challenging • Initial work already started • 2012: Testing • Many crucial components are affected • Plenty of testing needed to achieve production quality • December 2012: Finish migration to EMI data
SE and catalogue synchronization • Storage element and catalogue synchronization • Event-based synchronization of data location information between SE’s and catalogues. • Supposed to solve: • Dangling references in catalogues (pointers to lost files) • Synchronizing access permission information between SE’s and catalogues? • Doesn’t solve: • Dark data (files in SE’s which are not referenced from catalogues) [Diagram: DPM, StoRM or dCache; LFC or experiment catalogue; Command Line Interface; List of removed files; Generic Adapter (on both sides); Messaging infrastructure]
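The difference between what the event-based synchronization solves and what it does not can be made concrete with a toy model. In the sketch below the catalogue is a plain dictionary and the "list of removed files" is a Python list; the data layout and function names are invented for illustration and do not reflect the EMI message format.

```python
def apply_removals(catalogue, se_name, removed_paths):
    """Drop replica entries for files that `se_name` reports as removed.

    `catalogue` maps logical file name -> set of (se_name, path) replicas.
    Returns the logical names that are left without any replica.
    """
    gone = {(se_name, path) for path in removed_paths}
    orphaned = []
    for lfn, replicas in catalogue.items():
        replicas -= gone  # in-place: removes the dangling references
        if not replicas:
            orphaned.append(lfn)
    return orphaned


def dark_data(catalogue, se_name, files_on_se):
    """Files present on the SE that no catalogue entry references."""
    referenced = {
        path
        for replicas in catalogue.values()
        for (se, path) in replicas
        if se == se_name
    }
    return set(files_on_se) - referenced


catalogue = {
    "/superb/a": {("dpm1", "/d/a")},
    "/superb/b": {("dpm1", "/d/b"), ("dcache1", "/p/b")},
}
lost = apply_removals(catalogue, "dpm1", ["/d/a", "/d/b"])
```

Removal events are enough to clean up dangling references, as `apply_removals` shows. Finding dark data, by contrast, needs a full listing of the storage element (`files_on_se`), which a stream of removal events does not provide.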
thx EMI is partially funded by the European Commission under Grant Agreement INFSO-RI-261611
Backup slides
SE’s in EMI Breaking news : DPM
News from DPM • Ricardo replaced Jean-Philippe as DPM/LFC PI. • DPM 1.8.2 • Improved scalability of all frontend daemons • Especially with many concurrent clients • Faster DPM drain • Better balancing of data among disk nodes • Different weights to each filesystem • Improved validation & testing • Collaboration with ASGC for this purpose (thanks!) • Hammercloud tests running regularly • They started with a 400 core setup, we looked at the issues, now moving to 1000 cores to increase load
Future releases : DPM (provided by Ricardo) 1.8.3 November • Package consolidation: EPEL compliance • Fixes in multi-threaded clients • Replace httpg with https on the SRM • Improve dpm-replicate (dirs and FSs) • GUIDs in DPM • Synchronous GET requests • Reports on usage information • Quotas • Accounting metrics • HOT file replication 1.8.4 January 1.8.5
News from DPM (Administration) • DPM Admin contrib package • Contribution from GridPP • Now packaged and distributed with the DPM components • http://www.gridpp.ac.uk/wiki/DPM-admin-tools • Nagios monitoring plugins for DPM • Available now • https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Admin/Monitoring • Puppet templates • Available now in beta • https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Admin/Puppet
Some news from dCache
Slightly modified release numbers [Timeline from April 2011 to April 2012. dCache releases shown: 1.9.12, 1.9.13, 1.9.14, 2.0, 2.1, 2.2. Markers: EMI-1, EMI-2, LHC Tech. Break]
More on dCache Some dCache lab secrets
Adapting different back-ends pNFS WebDAV gridFTP xRootD dCache Pool Data Access Abstraction File or whatever Hadoop FS Object Store
Pool storage abstraction • The pool data access abstraction layer allows different storage back-ends to be plugged in • We start with Hadoop FS as a proof of concept • Feature set of dCache (pNFS, WebDAV..) plus • Easy maintenance of Hadoop FS • Pools might no longer be multi-purpose, e.g. • Hadoop FS is not very good at random seeks. • Object stores might only support PUT, GET • Allows sites to migrate from BestMan/Hadoop to dCache • Will try object stores later.
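The idea of a pluggable back-end whose capabilities differ can be sketched as a small interface. dCache itself is written in Java and its real pool interface is not shown in these slides, so the class and method names below are purely hypothetical; the sketch only illustrates why a pool on top of a PUT/GET-only object store can no longer serve every access pattern.

```python
from abc import ABC, abstractmethod


class PoolBackend(ABC):
    """Hypothetical storage back-end interface (not dCache's real API)."""

    supports_random_read = True

    @abstractmethod
    def put(self, key, data):
        """Store the bytes `data` under `key`."""

    @abstractmethod
    def get(self, key):
        """Return all bytes stored under `key`."""

    def read_range(self, key, offset, length):
        """Random read; only back-ends that allow seeks offer it."""
        if not self.supports_random_read:
            raise NotImplementedError("back-end supports whole-object PUT/GET only")
        return self.get(key)[offset:offset + length]


class InMemoryBackend(PoolBackend):
    """Stand-in for a file-like back-end; keeps data in a dict."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = bytes(data)

    def get(self, key):
        return self._blobs[key]


class ObjectStoreBackend(InMemoryBackend):
    """Same storage, but advertised as PUT/GET only."""

    supports_random_read = False


file_like = InMemoryBackend()
file_like.put("run001", b"0123456789")
chunk = file_like.read_range("run001", 2, 3)
```

A protocol door that needs random access (for example analysis over pNFS) would check `supports_random_read` before selecting a pool, while a streaming transfer could use either back-end.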
The Three Tier Model
The Three Tier Model (Motivation) Different storage back-ends have different properties • Tape • Single stream • Non shareable • High latency • Cheap reliable • Low power • Spinning disk • Multiple stream • Medium shareable • Medium latency • Reasonable speed • Medium costs • SSD • Multiple stream • Highly shareable • Low latency • Good speed • Super expensive Different protocols/applications have different requirements • Random access / Analysis • Many uncontrollable streams • Very low latency requirements • Chaotic seeks • Transfer speeds not that important • WAN Transfer / Reconstruction • Controlled/Low number of streams • Latency doesn’t matter • High transfer speeds
The Three Tier Model SSD Spinning Disks Tape SRM/gridFTP/WAN Precious Copy Precious Or Cached Copy Cached Copy pNFS Random Access Analysis SRM/gridFTP/http WAN/streaming
More cool stuff dCache will come with its own WebDAV browser client. Stay tuned.
Some conclusions • EMI (DATA) is already significantly contributing to the HEP data grid … • Sustainability is now being worked on. • Industry standards are becoming available within EMI-Data • EMI builds the framework of collaboration even among natural competitors (dCache, StoRM and DPM). Customers benefit. • Go and try out the EMI repository !!! • More info on EMI Data with all details and timelines: https://twiki.cern.ch/twiki/bin/view/EMI/EmiJra1T3DataDJRA122