1 / 28

Data Management System

Data Management System. Jean Salzemann CNRS/IN2P3 ACGRID School, Hanoi (Vietnam) November 6th, 2007 Credits: Giuseppe Misurelli. Outline. Grid Data Management Challenge Storage Elements and SRM LFC File Catalog Data Movement Utils. Grid DM Challenge. Grid Data Management Challenge

rtindall
Download Presentation

Data Management System

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Management System Jean Salzemann CNRS/IN2P3 ACGRID School, Hanoi (Vietnam) November 6th, 2007 Credits: Giuseppe Misurelli

  2. Outline • Grid Data Management Challenge • Storage Elements and SRM • LFC File Catalog • Data Movement Utils

  3. Grid DM Challenge • Grid Data Management Challenge • Storage Elements and SRM • LCG File Catalog • Data Movement Utils

  4. The Grid DM Challenge /1

  5. The Grid DM Challenge /2 • DM works with files, this assumption is due the following reasons: • semantic of file is very good understood by everyone • file is the smallest granularity of data.

  6. Data Management Services • Storage Element – common interface to storage • Storage Resource Manager Castor, dCache, DPM, … • POSIX-I/O gLite-I/O • Native Accessprotocolsrfio, dcap • Transfer protocols gsiftp • Catalogs – keep track where data are stored • File Catalog • Replica Catalog LFC, Metadata Catalog (es. AMGA) • File Authorization Service • Metadata Catalog • File Transfer – schedules reliable file transfer • Data Scheduler • File Transfer Service lcg-utils, gLite FTS

  7. SE and SRM • Grid Data Management Challenge • Storage Elements and SRM • LFC File Catalog • Data Movement Utils

  8. SRM in an example /1 She is running a job which needs: Data for physics event reconstruction Simulated Data Some data analysis files She will write files remotely too They are at CERN In dCache They are at Fermilab In a disk array They are at Nikhef in a classic SE

  9. SRM in an example /2 dCache Own system, own protocols and parameters I talk to them on your behalf I will even allocate space for your files And I will use transfer protocols to send your files there You as a user need to know all the systems!!! gLite DPM Independent system from dCache or Castor SRM Castor No connection with dCache or DPM

  10. Storage Resource Management • Data are stored on disk pool servers or Mass Storage Systems • storage resource management needs to take into account • Transparent access to files (migration to/from disk pool) • File pinning • Space reservation • File status notification • Life time management • The SRM (Storage Resource Manager) takes care of all these details • The SRM is a single interface that takes care of local storage interaction and provides a Grid interface to the outside world

  11. gLite SE types /1 • gLite 3.0 data access protocols: • File Transfer: GSIFTP (GridFTP) • File I/O (Remote File access): gsidcap insecure RFIO secured RFIO (gsirfio) • Classic SE: • GridFTP server • Insecure RFIO daemon (rfiod) – only LAN limited file access • Single disk or disk array • No quota management • Does not support the SRM interface

  12. gLite SE types /2 • Mass Storage Systems (Castor) • Files migrated between front-end disk and back-end tape storage hierarchies • GridFTP server • Insecure RFIO (Castor) • Provide a SRM interface with all the benefits • Disk pool managers (dCache and gLite DPM) • manage distributed storage servers in a centralized way • Physical disks or arrays are combined into a common (virtual) file system • Disks can be dynamically added to the pool • GridFTP server • Secure remote access protocols (gsidcap for dCache, gsirfio for DPM) • SRM interface

  13. File Catalog and DM Tools • Grid Data Management Challenge • Storage Elements and SRM • LFC File Catalog • Data Movement Utils

  14. Files & replicas: Naming Conventions • Logical File Name (LFN) • An alias created by a user to refer to some item of data, e.g. “lfn:cms/20030203/run2/track1” • Globally Unique Identifier (GUID) • A non-human-readable unique identifier for an item of data, e.g. “guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6” • Site URL (SURL) (or Physical File Name (PFN) or Site FN) • The location of an actual piece of data on a storage system, e.g. “srm://pcrd24.cern.ch/flatfiles/cms/output10_1” (SRM) “sfn://lxshare0209.cern.ch/data/alice/ntuples.dat” (Classic SE) • Transport URL (TURL) • Temporary locator of a replica + access protocol: understood by a SE, e.g. “rfio://lxshare0209.cern.ch//data/alice/ntuples.dat”

  15. LFC - Description • Provides • Bulk operations • Cursors for large queries • Timeouts and retries for client operations • Features • User exposed transaction API • Hierarchical namespace and namespace operations • Integrated GSI Authentication and Authorization • Access Control Lists (Unix Permissions and POSIX ACLs) • Checksums Supported database backends: Oracle and MySQL

  16. File Metadata User Metadata Logical File Name (LFN) GUID System Metadata (ACLs, Ownership,etc File Replica Symlinks Storage File Name Link name Storage Host LFC - Architecture • LFC stores both logical and physical mappings for the file in the same database  Speed up of operations • Treats all entities as files in a UNIX-like filesystem. • File API also similar to UNIX (create(), mkdir(), chown()….) • Hierarchical namespace of LFNs mapped to the GUIDs • GUIDs mapped to the physical locations of file replicas in the storage • Systemattributes of files (creation time, file size and checksum…) stored as LFN attributes • One field for user-definedmetadata • Multiple LFNs per GUID allowed as symboliclinks to the primary LFN. User defined Metadata

  17. File Catalog and DM Tools • Grid Data Management Challenge • Storage Elements and SRM • LFC File Catalog • Data Movement Utils

  18. GFAL: Grid File Access Library • Interactions with SE require some components: • → File catalog services to locate replicas • → SRM • → File access mechanism to access files from the SE on the WN • GFAL does all this tasks for you: • → Hides all these operations • → Presents a POSIX interface for the I/O operations • → User can create all commands needed for storage management → It offers as well an interface to SRM • Supported protocols: • → file (local or nfs-like access) • → dcap, gsidcap and kdcap (dCache access) • → rfio (castor access) and gsirfio (dpm)

  19. lcg-utils DM tools • High level interface (CL tools and APIs) to • Upload/download files to/from the Grid (UI,CE and WN <---> SEs) • Replicate data between SEs and locate the best replica available • Interact with the file catalog • Definition: A file is considered to be a Grid File if it is both physically present in a SE and registered in the File Catalog • lfc commands to interact with file catalog features • lcg-utils commands ensure the consistency between files in the Storage Elements and entries in the File Catalog

  20. LFC commands LFC Catalog commands

  21. lfc-ls Listing the entries of a LFC directory lfc-ls [-cdiLlRTu] [--class] [--comment] [--deleted] [--display_side] [--ds] path… where pathspecifies the LFN pathname (mandatory) • Remember that LFC has a directory tree structure • /grid/<VO_name>/<you create it> • All members of a VO have read-write permissions for their own directory • You can set LFC_HOME to use relative path > lfc-ls /grid/gilda/misurelli > export LFC_HOME=/grid/gilda > lfc-ls -l misurelli LFC Namespace Defined by the user

  22. lfc-mkdir Creating directories in the LFC lfc-mkdir [-m mode] [-p] path... • Where pathspecifies the LFC pathname • Remember that while registering a new file (using lcg-cr, for example) the corresponding destination directory must be created in the catalog beforehand: • lfc-mkdir /grid/gilda/misurelli/practise • lfc-ls -l /grid/gilda/misurelli

  23. lcg-utils commands Replica Management File Catalog Interaction

  24. lcg-utils: lcg-cr • Upload a file to a SE and register it into the catalog lcg-cr -d dest_file | dest_host [-g guid] [-l lfn] [-v | --verbose] --vo vo src_file where: • dest_hostis the fully qualified hostname of the destination SE • dest_fileis a valid SURL (both sfn:// or srm:// format are valid) • guidspecifies the Grid Unique IDentifier. If this option is not present, a GUID is generated internally • lfnspecifies the Logical File Name associated with the file • vospecifies the Virtual Organization the user belongs to • src_filespecifies the source file name: the protocol can be file:/// or gsiftp:///

  25. Advanced utilities: gridftp commands Used for low level management of file/directories in SEs edg-gridftp-existsTURL Checks if file/dir exists on a SE edg-gridftp-lsTURL Lists a directory on a SE globus-url-copysrcTURL dstTURL Copies files between SEs edg-gridftp-mkdirTURL Creates a directory on a SE edg-gridftp-renamesrcTURL dstTURL Renames a file on a SE edg-gridftp-rmTURL Removes a file from a SE edg-gridftp-rmdirTURL Removes a directory on a SE

  26. Globus-url-copy • globus-url-copy srcTURL destTURL • low level file transfer • Interaction with RLS components • edg-lrc command (actions on LRC) • edg-rmc command (actions on RMC) • C++ and Java API for all catalog operations • http://edg-wp2.web.cern.ch/edg-wp2/replication/docu/r2.1/edg-lrc-devguide.pdf • http://edg-wp2.web.cern.ch/edg-wp2/replication/docu/r2.1/edg-rmc-devguide.pdf • Using low level CLI and API is STRONGLY discouraged • Risk: loose consistency between SEs and catalogues • REMEMBER: a file is in Grid if it is BOTH: • stored in a Storage Element • registered in the file catalog

  27. References • gLite documentation homepage • http://glite.web.cern.ch/glite/documentation/default.asp • LFC and DPM documentation • https://uimon.cern.ch/twiki/bin/view/LCG/DataManagementDocumentation

  28. Questions…

More Related