WP2: Data Management. Tutorial for PM9 Release, RAL, 31st January 2002. Gavin McCance, University of Glasgow. PM9 Release: Grid Data Mirroring Package (GDMP), a basic replica management tool, with how-to; Spitfire, a basic meta-data management prototype.
WP2: Data Management
Tutorial for PM9 Release
RAL, 31st January 2002
Gavin McCance, University of Glasgow
PM9 Release
• Grid Data Mirroring Package (GDMP)*
  • Basic replica management tool
  • How-to…
• Spitfire
  • Basic meta-data management prototype
* Previously called ‘Grid Data Management Pilot’
GDMP
• Useful documentation and reference
• WP2 web page:
  • http://grid-data-management.web.cern.ch/grid-data-management
• GDMP page
  • http://cmsdoc.cern.ch/cms/grid
• GDMP 2.0 manual
  • ‘GDMP User Instructions for the Testbed’
GDMP
• Version 2.0 (not the 2.0alpha)
• Client-server system for replicating files from one grid site to another
• Subscription mechanism allows for automatic replication of files
• Interfaced to the Grid Replica Catalogue (currently Globus MDS Replica Cat)
GDMP
• Any file type can be transferred
• Replication mechanisms assume read-only files, i.e. no update synchronisation
• Particular plug-in for Objectivity
  • Handles update of local database
GDMP: Requirements
• Tested on Linux RH6.1 and RH6.2
• Globus Toolkit 2.0 Alpha 9
  • i.e. the EDG PM9 special release
  • GridFTP (NOT gsi-wuftp!)
• g++ from gcc-2.91.66 or gcc-2.95.2
• RPM v3 or higher
  • Or… usual GNU make collection
THE GDMP EDG PM9 RPM*
• Recommend this for UK testbed
  • DataGrid WP6 site
  • (Or… get original RPM from GDMP site)
• Manual gives RPM, SRPM, and tarball installation instructions
• All paths relative to GDMP_INSTALL_DIR
  • = /opt/edg in testbed release
  • Or the path from ./configure --prefix
* One of these is not an acronym
Configuration
• Full details in manual
• Edit /opt/edg/etc/gdmp.conf. Set:
  • GDMP_INSTALL_DIR
  • GDMP_LOCAL_HOST & PORT
  • GLOBUS_LOCATION
  • If used: OBJECTIVITY settings (binaries, boot file path, root directory)
RepCat Configuration
• http://www.globus.org/datagrid/deliverables/replicaGettingStarted.pdf
• GDMP_REP_CAT_URL
  • =ldap://host2/rc=replica-catalogue,…
• GDMP_REP_CAT_MANAGER_DN
  • =cn=RCManager, dc=host2, dc=cern, dc=ch
• GDMP_REP_CAT_MANAGER_PWD
  • =secret
Inetd Configuration
• As root:
  • configure_gdmp <install-dir> <userid> <port>
  • Updates /etc/services, /etc/inetd
• Request served as ‘gdmp_server’ using:
  • GDMP_INSTALL_DIR/utils/gdmp_server_start
• User manual Section 3.4 and Appendix A.
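To make the effect of configure_gdmp less abstract, here is a minimal Python sketch of what service entries of this kind usually look like in standard /etc/services and inetd syntax. The exact lines that configure_gdmp writes are not given in these slides, so the socket type, protocol and wait flag below are assumptions; the user `gdmp` and port 2000 are simply the example values used elsewhere in this tutorial.

```python
def services_entry(service: str, port: int) -> str:
    """Format a standard /etc/services line: '<name><TAB><port>/tcp'."""
    return f"{service}\t{port}/tcp"


def inetd_entry(service: str, user: str, program: str) -> str:
    """Format a standard inetd line:
    <service> <socket-type> <protocol> <wait-flag> <user> <program> <argv0>

    'stream tcp nowait' is an assumption, not taken from the GDMP manual.
    """
    argv0 = program.rsplit("/", 1)[-1]  # program name without its directory
    return " ".join([service, "stream", "tcp", "nowait", user, program, argv0])


install_dir = "/opt/edg"  # GDMP_INSTALL_DIR in the testbed release
print(services_entry("gdmp_server", 2000))
print(inetd_entry("gdmp_server", "gdmp", f"{install_dir}/utils/gdmp_server_start"))
```

Comparing this with what configure_gdmp actually wrote on your machine is a quick sanity check that the user id and port are the ones you intended.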
Server cert
• GDMP requires a CA-signed server certificate to identify itself
• Default issue is one from CERN
  • Not really secure, since anyone can download GDMP RPMs.
• Get a new one from your CA if being used for production
GDMP client usage
(Diagram: Site A and Site B)
• A) su gdmp (or whatever user)
  • Currently client applications should run as same user as the server (given in /etc/inetd)
• A) grid-proxy-init
• B) Add gdmp server DN cert to mapfile!
• A) setenv GDMP_CONFIG_FILE /opt/edg/etc/gdmp.conf
• A) gdmp_ping hostb.ac.uk:2000
  • “The GDMP server on hostb.ac.uk:2000 is listening”
…GDMP usage
(Diagram: Site A and Site B)
• A,B) Start GDMP services (inetd)
• B) Registers itself with site A
  • gdmp_host_subscribe hosta.ac.uk:2000
• A) New files: register them
  • gdmp_register_local_file -d /pool/files/
  • This updates the local GDMP internal catalogue (on A)
…GDMP usage
(Diagram: Site A and Site B)
• A) Tell the world (well… all subscribed sites)
  • gdmp_publish_catalogue
  • Will update the import catalogue on all subscribed sites, e.g. the import catalogue on site B
  • By default, it will also publish the GDMP internal catalogue on the Globus Replica Catalogue
…GDMP usage
(Diagram: Site A and Site B)
• B) Get the new files from site A (and from any other sites to which B may be subscribed)
  • gdmp_replicate_get
  • Any new files on A will be transferred from site A to site B
  • Put in: GDMP_FLATFILE_ROOT_DIR as specified by gdmp.conf
  • By default, Globus Replica Catalogue is updated
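The subscribe / register / publish / get cycle described on the usage slides can be pictured with a small in-memory model. This is only a toy sketch in Python, not GDMP code: the `Site` class, its fields and the file names are hypothetical, the methods are merely named after the corresponding GDMP commands, and the Globus Replica Catalogue updates are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Toy stand-in for one GDMP site (hypothetical model, not the real tool)."""
    name: str
    storage: set = field(default_factory=set)            # files physically present
    local_catalogue: set = field(default_factory=set)    # files registered at this site
    import_catalogue: dict = field(default_factory=dict)  # pending file -> source site
    subscribers: list = field(default_factory=list)      # sites subscribed to this one

    def host_subscribe(self, producer: "Site") -> None:
        """Like gdmp_host_subscribe: this site registers itself with a producer."""
        producer.subscribers.append(self)

    def register_local_file(self, *files: str) -> None:
        """Like gdmp_register_local_file: update the local internal catalogue."""
        self.storage.update(files)
        self.local_catalogue.update(files)

    def publish_catalogue(self) -> None:
        """Like gdmp_publish_catalogue: update import catalogues of subscribers."""
        for site in self.subscribers:
            for f in self.local_catalogue:
                if f not in site.storage:  # only files the subscriber lacks
                    site.import_catalogue[f] = self.name

    def replicate_get(self) -> list:
        """Like gdmp_replicate_get: fetch everything listed in the import catalogue."""
        fetched = sorted(self.import_catalogue)
        self.storage.update(fetched)
        self.import_catalogue.clear()
        return fetched


a, b = Site("A"), Site("B")
b.host_subscribe(a)                                   # B registers itself with A
a.register_local_file("run1100.dat", "run1101.dat")   # hypothetical file names
a.publish_catalogue()                                 # A tells all subscribed sites
pending = sorted(b.import_catalogue)                  # B now knows what is new
fetched = b.replicate_get()                           # nothing moves until B asks
print(pending, fetched)
```

The point the model tries to capture is that publishing only announces files; the transfer itself happens when the subscribed site runs the get step.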
Staging Support
• Support for staging to and from MSS
• GDMP server at B will be notified if there is some staging to be done at A and will drop connection. When staging is complete, B is notified by A, and can re-request the transfer.
• GDMP: section 7.
Automation
• Transfer waits until site B runs gdmp_replicate_get
• However, when import catalogue is updated on B, a script is called:
  • GDMP_NOTIFICTION_FOR_PUBLISH_CATALOGUE
• An example would be to run gdmp_replicate_get so the transfer happens automatically
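As an illustration of the automation idea, the following Python sketch shows the body of a notification hook that simply runs gdmp_replicate_get. How the hook is wired in through the setting named above, and whether any arguments are passed to it, is not described here, so treat that part as an assumption. The function name `on_publish_notification` and the injectable `run` parameter are hypothetical; the config file path is the one used in the client usage slide.

```python
import os
import subprocess


def on_publish_notification(config_file="/opt/edg/etc/gdmp.conf",
                            run=subprocess.run) -> int:
    """Fetch newly published files as soon as the import catalogue changes.

    `run` is injectable so the hook can be exercised without GDMP installed.
    Returns the exit status of gdmp_replicate_get.
    """
    # GDMP client tools locate their configuration via GDMP_CONFIG_FILE.
    env = dict(os.environ, GDMP_CONFIG_FILE=config_file)
    result = run(["gdmp_replicate_get"], env=env)
    return result.returncode
```

To use this as an executable script, one would call `sys.exit(on_publish_notification())` under an `if __name__ == "__main__":` guard and run it as the same user as the GDMP server.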
RepCat C++ API
• Described in Appendix D.
• WP2 working with Globus on new distributed Replica Catalogue model
  • GIGGLE framework
• Will attempt to keep existing APIs as much as possible!
Meta-data
• Spitfire is a basic prototype
• Purpose is to allow secure access to any SQL database over the grid
• Secure access via HTTP(S)
• Standard access (i.e. don’t need to know what the backend DB is)
Meta-data
• Current implementation is via XSQL templates
  • http://hep-proj-spitfire.cern.ch/hep-proj-spitfire
• Server-side XSQL templates are ‘filled in’ by attributes from an HTTP GET or POST
• Example…
Meta-data
• Template metatrig.xsql on server:
  • “select LFN from FileMetaData where TRIGGER=@trig and RUNNO>=@runmin and RUNNO<=@runmax”
• An HTTP(S) request (e.g. from a browser form)
  • http://meta1.atlas.rl.ac.uk/metatrig.xsql?trig=low1-a25&runmin=1100&runmax=1500
• Will return an XML or HTML encoded list of matching Logical File Names.
• Good if you have a specific problem now!
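Since no client-side API is defined yet, a client just builds the request URL itself. The Python sketch below builds the example URL from its three parameters and then, purely to illustrate how request attributes fill in a server-side template, binds them to the same query against an in-memory SQLite table. SQLite, the `:name` placeholder syntax, the helper names and the three sample rows are all stand-ins invented for this illustration; the real server uses Oracle XSQL.

```python
import sqlite3
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "http://meta1.atlas.rl.ac.uk/metatrig.xsql"  # template URL from the slide


def build_query_url(trig: str, runmin: int, runmax: int) -> str:
    """Client side: encode the template attributes as an HTTP GET query string."""
    return BASE + "?" + urlencode({"trig": trig, "runmin": runmin, "runmax": runmax})


# Stand-in for the server-side template (":name" instead of XSQL's "@name").
TEMPLATE = ('select LFN from FileMetaData where "TRIGGER" = :trig '
            'and RUNNO >= :runmin and RUNNO <= :runmax order by LFN')


def run_template(conn: sqlite3.Connection, url: str) -> list:
    """Server-side stand-in: take attributes from the URL and fill the template."""
    query = parse_qs(urlsplit(url).query)
    params = {
        "trig": query["trig"][0],
        "runmin": int(query["runmin"][0]),
        "runmax": int(query["runmax"][0]),
    }
    return [row[0] for row in conn.execute(TEMPLATE, params)]


# Hypothetical sample data, only so the query has something to match.
conn = sqlite3.connect(":memory:")
conn.execute('create table FileMetaData (LFN text, "TRIGGER" text, RUNNO integer)')
conn.executemany("insert into FileMetaData values (?, ?, ?)", [
    ("lfn-a", "low1-a25", 1200),   # matches trigger and run range
    ("lfn-b", "low1-a25", 1600),   # run number out of range
    ("lfn-c", "high2", 1300),      # different trigger
])

url = build_query_url("low1-a25", 1100, 1500)
matches = run_template(conn, url)
print(url)
print(matches)
```

Binding the values as query parameters, rather than pasting them into the SQL string, is a design choice of this sketch; it says nothing about how the XSQL templates themselves substitute attributes.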
Meta-data
• Must maintain templates
• Dependence on Oracle XSQL code
• No client-side APIs defined yet
• It’s being rewritten for next release
  • Initially for new replica catalogue
  • + Proper authorisation
  • + Meta-data distribution
  • + Client-side API