
Data Management Systems

Presentation Transcript


  1. Applying Data Grids to Support Distributed Data Management: Storage Resource Broker
     Reagan W. Moore, Ian Fisk, Bing Zhu
     University of California, San Diego
     moore@sdsc.edu
     http://www.npaci.edu/DICE/

  2. Data Management Systems
     Data sharing - data grids
       Federation across administration domains
       Latency management
       Sustained data transfers
     Data publication - digital libraries
       Discovery
       Organization
     Data preservation - persistent archives
       Technology management
       Authenticity

  3. Consistent Data Environments
     The Storage Resource Broker combines the functionality of data grids, digital libraries, and persistent archives within a single data environment.
     SRB provides:
       Metadata consistency
       Latency management functions
       Technology evolution management

  4. Metadata Consistency
     The Storage Resource Broker uses a logical name space to assign global identifiers to digital entities:
       Files, SQL command strings, database tables, URLs
     State information that characterizes the result of operations on the digital entities is mapped onto the logical name space.
     Consistency of state information is managed as update constraints on the mapping:
       Write locks, synchronization flags, schema extension
     SRB state information is managed in the MCAT metadata catalog.
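     The logical-name-space idea above can be sketched in a few lines, assuming an in-memory catalog in place of the MCAT database: a global identifier maps to replica locations plus state information, and updates are constrained by a write lock and a synchronization flag. The class and method names are illustrative, not the SRB API.

        # Illustrative only: in SRB this state lives in the MCAT database.
        import threading

        class LogicalNameSpace:
            def __init__(self):
                self._entries = {}              # logical name -> entry record
                self._lock = threading.Lock()   # guards the catalog itself

            def register(self, logical_name, physical_location):
                """Assign a global identifier to a digital entity."""
                with self._lock:
                    self._entries[logical_name] = {
                        "replicas": [physical_location],
                        "write_locked": False,  # update constraint on the mapping
                        "in_sync": True,        # synchronization flag
                    }

            def update(self, logical_name, apply_update):
                """Updates run under a write lock; replicas are flagged
                out-of-sync until a synchronization pass restores them."""
                entry = self._entries[logical_name]
                if entry["write_locked"]:
                    raise RuntimeError("entity is locked by another writer")
                entry["write_locked"] = True
                try:
                    entry["in_sync"] = False    # replicas now diverge
                    apply_update()              # caller-supplied update action
                finally:
                    entry["write_locked"] = False

        ns = LogicalNameSpace()
        ns.register("/zone/home/demo.dat", "hpss://archive/demo.dat")
        ns.update("/zone/home/demo.dat", lambda: None)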

  5. SRB Latency Management
     [Diagram] Latency-management techniques applied between a source and a destination network:
       Remote proxies and staging
       Data aggregation containers
       Prefetch and caching
       Streaming and replication
       Parallel I/O
       Client-initiated and server-initiated I/O
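     Two of these techniques, caching and prefetch, can be sketched in a few lines: reads are served from a local cache, and the next block is requested ahead of need so a sequential reader rarely waits on the wide-area link. fetch_block() is a placeholder for a remote SRB read, not a real API call.

        BLOCK = 1 << 20                              # 1 MB blocks (illustrative)

        def fetch_block(remote, index):
            """Placeholder for a wide-area read of block `index`."""
            return b"\0" * BLOCK

        class PrefetchingReader:
            def __init__(self, remote):
                self.remote = remote
                self.cache = {}                      # block index -> bytes

            def read_block(self, index):
                if index not in self.cache:          # cache miss: pay the latency once
                    self.cache[index] = fetch_block(self.remote, index)
                if index + 1 not in self.cache:      # prefetch the next block; a real
                    self.cache[index + 1] = fetch_block(self.remote, index + 1)
                return self.cache[index]             # client would prefetch asynchronously

        reader = PrefetchingReader("srb://server/container")
        data = reader.read_block(0)                  # fetches blocks 0 and 1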

  6. SRB 2.0 - Parallel I/O
     Client-directed parallel I/O (client/server):
       Thread-safe client; the client decides the number of threads to use
       Each thread is responsible for a data segment and connects to the server independently
       Utilities: srbpput and srbpget
     Sustains 80% to 90% of available bandwidth using 4 parallel I/O streams and a window size of 800 kBytes
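     The client-directed scheme can be sketched as follows, with send_segment() as a hypothetical stand-in for the per-thread server connection; this is the segment-per-thread idea the slide describes, not the srbpput source.

        import os, threading

        def send_segment(path, offset, length):
            """Placeholder: each thread opens its own connection to the server."""
            with open(path, "rb") as f:              # independent handle per thread
                f.seek(offset)
                data = f.read(length)
                # ... write `data` over this thread's dedicated socket ...

        def parallel_put(path, num_threads=4):       # the client picks the count
            size = os.path.getsize(path)
            segment = (size + num_threads - 1) // num_threads
            threads = []
            for i in range(num_threads):
                offset = i * segment
                length = max(0, min(segment, size - offset))
                t = threading.Thread(target=send_segment, args=(path, offset, length))
                threads.append(t)
                t.start()
            for t in threads:
                t.join()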

  7. SRB 2.0 - Parallel I/O (cont. 1)
     Server-directed parallel I/O (client/server):
       The server plans the transfer and decides the number of threads to use
       Separate "control" and "data transfer" sockets
       The client listens on the "control" socket and spawns threads to handle data transfer
       Always a one-hop data transfer between client and server
       Similar to HPSS; works seamlessly with the HPSS Mover protocol
       Also works for other file systems
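     A sketch of the server-directed flow, with the control messages simulated as a list rather than a live socket: the server plans the streams, and the client spawns one handler thread per announced data connection. The hosts, ports, and message layout are assumptions made for illustration.

        import threading

        # Each control message names one data stream: (host, port, offset, length).
        control_messages = [
            ("srb.example.org", 20001, 0,       4 << 20),
            ("srb.example.org", 20002, 4 << 20, 4 << 20),
        ]

        received = {}

        def handle_stream(host, port, offset, length):
            # A real handler would connect to (host, port) and read `length`
            # bytes into the file at `offset` -- one hop, no relays.
            received[offset] = b"\0" * length        # placeholder for received data

        threads = [threading.Thread(target=handle_stream, args=m)
                   for m in control_messages]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        print("streams completed:", len(received))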

  8. SRB 2.0 - Parallel I/O (cont. 2)
     Parallel I/O (server/server):
       Used for copy, replicate, and staging operations
       Always used in third-party transfer operations
       Server-to-server data transfer; the client is not involved
       Uses up to 4 threads, depending on file size (see the sketch below)
       7-10x improvement for large files across the country
       Up to 39 MB/sec across campus (PC RAID disk, gigabit Ethernet)
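     The slide says up to 4 threads are used depending on file size but does not give the thresholds, so the 32 MB-per-thread cutoff below is an assumption chosen only to illustrate the shape of such a rule.

        def thread_count(file_size, max_threads=4, bytes_per_thread=32 << 20):
            """One stream per ~32 MB of file, capped at 4 (threshold assumed)."""
            return max(1, min(max_threads, file_size // bytes_per_thread))

        assert thread_count(10 << 20) == 1           # small file: one stream
        assert thread_count(1 << 30) == 4            # 1 GB file: capped at four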

  9. Federated SRB Server Model
     [Diagram] Peer-to-peer brokering of a read: an application presents a logical name or attribute condition to an SRB server; the SRB agent consults the MCAT, which performs (1) logical-to-physical mapping, (2) identification of replicas, and (3) access and audit control; additional server(s) are spawned, and parallel data access proceeds against the replicas R1 and R2.
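     The numbered steps in the diagram can be traced in a toy broker, assuming a dictionary in place of the MCAT relational catalog; names such as broker_read() are illustrative, not the SRB interface.

        # A dictionary stands in for the MCAT relational catalog.
        MCAT = {
            "/zone/demo.dat": {
                "replicas": ["R1:/disk/demo.dat", "R2:/tape/demo.dat"],
                "readers": {"moore"},
            }
        }
        AUDIT = []

        def broker_read(logical_name, user):
            entry = MCAT[logical_name]                  # 1. logical-to-physical mapping
            replicas = entry["replicas"]                # 2. identification of replicas
            if user not in entry["readers"]:            # 3. access & audit control
                AUDIT.append((user, logical_name, "denied"))
                raise PermissionError(user)
            AUDIT.append((user, logical_name, "read"))
            return replicas[0]                          # hand back a replica for access

        print(broker_read("/zone/demo.dat", "moore"))   # -> R1:/disk/demo.dat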

  10. SRB 2.0 - Bulk Operations
      Uploading and downloading large numbers of small files:
        Multi-threaded
        Bulk registration - 500 files in one call
        Fill an 8 MB buffer before sending
        Use of containers
        New Sbload and Sbunload utilities
      Over 100 files per second registration; 3-10+ times speedup
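      A sketch of the batching rule this slide describes: pack small files into an 8 MB buffer and register names 500 at a time, so each round trip carries many files. send_buffer() and register_batch() are hypothetical stand-ins, not the Sbload implementation.

        BUF_LIMIT = 8 << 20          # fill an 8 MB buffer before sending
        BATCH_LIMIT = 500            # register 500 files in one call

        def send_buffer(buf):        # placeholder for the network send
            pass

        def register_batch(names):   # placeholder for the MCAT bulk registration
            pass

        def bulk_load(paths):
            buf, names = bytearray(), []
            for path in paths:
                with open(path, "rb") as f:
                    buf += f.read()
                names.append(path)
                if len(buf) >= BUF_LIMIT or len(names) >= BATCH_LIMIT:
                    send_buffer(bytes(buf))
                    register_batch(names)
                    buf, names = bytearray(), []
            if names:                # flush the final partial batch
                send_buffer(bytes(buf))
                register_batch(names)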

  11. Technology Management
      [Architecture diagram] The SDSC Storage Resource Broker & Meta-data Catalog mediates between access interfaces and storage resources.
      Access APIs: application code, C and C++ libraries, Unix shell, Linux I/O, DLL / Python, Java, NT browsers, OAI, WSDL, GridFTP
      Core services: consistency management / authorization-authentication, prime server, logical name space, latency management, data transport, metadata transport, catalog abstraction, storage abstraction
      Catalog databases: DB2, Oracle, Sybase, SQLServer
      Storage systems: file systems (Unix, NT, Mac OSX), archives (HPSS, ADSM, UniTree, DMF), HRM servers, databases (DB2, Oracle, Postgres)

  12. SRB Archival Tape Library System
      An SRB archival storage system, in addition to HPSS, UniTree, and ADSM:
        A distributed pool of disk caches as the front end
        A tape library system as the back end
        STK silo for tape storage and tape mounts; 3590 tape drives
        I/O is always performed on the disk cache; data is always staged to cache
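      The cache-front-end rule above is simple to sketch: every read goes through the disk cache, and tape-resident files are staged in first. stage_from_tape() and the cache directory are placeholders, not the actual silo interface.

        import os, tempfile

        CACHE_DIR = tempfile.mkdtemp()      # stands in for the shared disk cache pool

        def stage_from_tape(name, dest):
            """Placeholder: mount the tape in the silo and copy to disk cache."""
            open(dest, "wb").close()

        def open_for_read(name):
            cached = os.path.join(CACHE_DIR, name)
            if not os.path.exists(cached):  # tape-resident: stage to cache first
                stage_from_tape(name, cached)
            return open(cached, "rb")       # I/O is always performed on the cache

        with open_for_read("demo.dat") as f:
            f.read()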

  13. CMS Experiment
      Ian Fisk - user-level application:
        Installed SRB servers at CERN, Fermilab, and UCSD under a user account
        Remotely invoked data replication: from UCSD, invoked replication from CERN to Fermilab and to UCSD
        Data transfers automatically used four parallel I/O streams with the default window size of 800 kBytes
      Observed:
        Sustained data transfer at 80% to 90% of available bandwidth
        Transferred over 1 TB of data per day using multiple sessions

  14. Future Plans
      SRB 2.1 - grid-oriented features, SRB-G (5/31/03):
        Add a GridFTP driver - access data through a GridFTP server
        Upgrade to GSI 2.2 (the current version uses GSI 1.1)
        Provide an encrypted data transfer facility, using GSI encryption, between servers and between server and client
        Explore network encryption as a digital entity property
        WSDL services interface for SRB, including data movement, replication, access control, metadata ingestion and retrieval, and container support
      SRB 2.2 - federated MCATs (8/30/03):
        Peer-to-peer MCATs
        Mount-point-like interface - /sdsc/…, /caltech/… (see the sketch below)
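      The planned mount-point-like federation suggests simple prefix routing: a path under /sdsc/ or /caltech/ is answered by that zone's MCAT. The routing table and catalog host names below are assumptions about the described design, not the SRB 2.2 implementation.

        MCAT_FOR_PREFIX = {
            "/sdsc/": "mcat.sdsc.example.org",       # hypothetical catalog hosts
            "/caltech/": "mcat.caltech.example.org",
        }

        def resolve(path):
            """Pick the peer MCAT that owns this part of the name space."""
            for prefix, catalog in MCAT_FOR_PREFIX.items():
                if path.startswith(prefix):
                    return catalog                   # forward the request peer-to-peer
            raise LookupError("no federated MCAT mounted at " + path)

        print(resolve("/sdsc/home/demo.dat"))        # -> mcat.sdsc.example.org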

  15. Next CMS Experiments
      Sustained transfer
        Use a 4 MB window size
      Bulk data registration
        In tests with the DOE ASCI project, sustained registration of 400 files per second
      Peer-to-peer federation
        Prototype of the ability to initiate data and metadata exchanges between MCAT catalogs

  16. For More Information
      Reagan W. Moore
      San Diego Supercomputer Center
      moore@sdsc.edu
      http://www.npaci.edu/DICE
      http://www.npaci.edu/DICE/SRB/index.html
      http://www.npaci.edu/dice/srb/mySRB/mySRB.html
