
Distributed File System: Data Storage for Networks Large and Small

Distributed File System: Data Storage for Networks Large and Small. Pei Cao, Cisco Systems, Inc. Review: DFS Design Considerations. Name space construction, AAA, operator batching, client caching, data consistency, locking. Summing it Up: CIFS as an Example. Network transport in CIFS


Presentation Transcript


  1. Distributed File System: Data Storage for Networks Large and Small • Pei Cao, Cisco Systems, Inc.

  2. Review: DFS Design Considerations • Name space construction • AAA • Operator batching • Client caching • Data consistency • Locking

  3. Summing it Up: CIFS as an Example • Network transport in CIFS • Use SMB (Server Message Block) messages over a reliable connection-oriented transport • TCP • NetBIOS over TCP • Use persistent connections called “sessions” • If a session is broken, the client performs the recovery

  4. Design Choices in CIFS • Name space construction: • per-client linkage, multiple methods for server resolution • file://fs.xyz.com/users/alice/stuff.doc • \\cifsserver\users\alice\stuff.doc • E:\stuff.doc • CIFS also offers a “redirection” method • A share can be replicated on multiple servers or moved • Client open → server replies “STATUS_DFS_PATH_NOT_COVERED” → client issues “TRANS2_DFS_GET_REFERRAL” → server replies with new server
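
The redirection exchange on this slide can be sketched as a small client-side retry loop. This is only an illustration under assumptions: `ToyServer`, `open_with_referral`, and the `max_hops` limit are hypothetical names, the servers are in-memory objects rather than real CIFS endpoints, and `get_referral` merely stands in for the TRANS2_DFS_GET_REFERRAL request.

```python
STATUS_OK = "STATUS_OK"
STATUS_DFS_PATH_NOT_COVERED = "STATUS_DFS_PATH_NOT_COVERED"


class ToyServer:
    """Hypothetical in-memory server: serves some shares, refers others."""

    def __init__(self, name, shares, referrals=None):
        self.name = name
        self.shares = set(shares)
        self.referrals = dict(referrals or {})  # share -> name of new server

    def open(self, share, path):
        if share in self.shares:
            return STATUS_OK, f"{self.name}:{share}/{path}"
        return STATUS_DFS_PATH_NOT_COVERED, None

    def get_referral(self, share):
        # Stands in for TRANS2_DFS_GET_REFERRAL (assumption, not the wire format).
        return self.referrals[share]


def open_with_referral(servers, start, share, path, max_hops=4):
    """Try to open; on PATH_NOT_COVERED, ask for a referral and retry there."""
    server = servers[start]
    for _ in range(max_hops):
        status, handle = server.open(share, path)
        if status == STATUS_OK:
            return handle
        server = servers[server.get_referral(share)]
    raise RuntimeError("too many referrals")


servers = {
    "old": ToyServer("old", shares=[], referrals={"users": "new"}),
    "new": ToyServer("new", shares=["users"]),
}
handle = open_with_referral(servers, "old", "users", "alice/stuff.doc")
```

The hop limit is a design choice of the sketch, added so that a referral cycle cannot loop forever.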

  5. Design Choices in CIFS • AAA: Kerberos • Older systems use NTLM • Operator batching: supported • These methods have “AndX” variations: TREE_CONNECT, OPEN, CREATE, READ, WRITE, LOCK • Server implicitly takes results of preceding operations as input for subsequent operations • First command that encounters an error stops all subsequent processing in the batch
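
The two batching rules on this slide (each operation implicitly takes the preceding result as input, and the first error stops the rest of the batch) can be sketched as follows. This is a toy model, not the SMB wire format: `run_batch`, the handler functions, and the in-memory `files` dictionary are hypothetical names introduced for illustration.

```python
files = {"a.txt": b"hello"}  # hypothetical in-memory file store


def handle_open(previous, path):
    if path not in files:
        raise FileNotFoundError(path)
    return path  # the "file id" passed on to the next command


def handle_read(previous, length):
    # 'previous' is the file id produced by the preceding OPEN.
    return files[previous][:length]


HANDLERS = {"OPEN": handle_open, "READ": handle_read}


def run_batch(commands, handlers=HANDLERS):
    """Run commands in order, feeding each result to the next command.

    Processing stops at the first command that raises an error.
    """
    results = []
    previous = None
    for name, args in commands:
        try:
            previous = handlers[name](previous, **args)
        except Exception as exc:
            results.append((name, "ERROR", str(exc)))
            break
        results.append((name, "OK", previous))
    return results


ok = run_batch([("OPEN", {"path": "a.txt"}), ("READ", {"length": 3})])
failed = run_batch([("OPEN", {"path": "missing"}), ("READ", {"length": 3})])
```

In the second batch the READ never runs, because the OPEN before it failed.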

  6. Design Choices in CIFS • Client caching • Cache both file data and file metadata, write-back cache, can read-ahead • Offers strong cache consistency using an invalidation-based approach • Data access consistency • Oplocks: similar to “tokens” in AFS v3 • “level II oplock”: read-only data locks • “exclusive oplock”: exclusive read/write data lock • “batch oplock”: exclusive read/write “open” lock and data lock and metadata lock • Transition among the oplocks • Observation: can have a hierarchy of lock managers
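
The transitions among oplocks can be illustrated with a deliberately simplified model. The rules below are assumptions made for the sketch, not a statement of the protocol: a sole opener receives an exclusive oplock, a second opener causes a break down to level II for everyone, and any write by a non-exclusive holder breaks all level II oplocks. Batch oplocks are left out, and `OplockManager` is a hypothetical name covering a single file with one open per client.

```python
class OplockManager:
    """Toy oplock state for ONE file (simplified rules, see lead-in)."""

    def __init__(self):
        self.holders = {}  # client -> "exclusive" or "level2"
        self.breaks = []   # (client, new_level) break notifications sent

    def open(self, client):
        if not self.holders:
            # Sole opener: may cache reads and writes locally.
            self.holders[client] = "exclusive"
        else:
            # Another client arrives: downgrade any exclusive holder.
            for other, level in list(self.holders.items()):
                if level == "exclusive":
                    self.breaks.append((other, "level2"))
                    self.holders[other] = "level2"
            self.holders[client] = "level2"
        return self.holders[client]

    def write(self, client):
        if self.holders.get(client) == "exclusive":
            return  # exclusive holder writes through its own cache
        # Shared read caches are now stale: invalidate every level II oplock.
        for other in list(self.holders):
            self.breaks.append((other, "none"))
        self.holders.clear()


manager = OplockManager()
first = manager.open("A")
second = manager.open("B")
```

The `breaks` list plays the role of the invalidation messages that keep the client caches consistent.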

  7. Design Choices in CIFS • File and data record locking • Offer “shared” (read-only) and “exclusive” (read/write) locks • Part of the file system; mandatory • Can lock either a whole file or a byte range in the file • Lock request can specify a timeout for waiting • Enables atomic writes by combining “AndX” batching with writes • “Lock/write/unlock” as a batched command sequence • Additional capability: “directory change notification”
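
Shared and exclusive byte-range locks come down to an overlap check plus a compatibility rule. The sketch below shows that check only; `RangeLockTable` is a hypothetical name, the wait timeout mentioned on the slide is omitted (a conflicting request simply fails), and ranges are treated as half-open intervals.

```python
class RangeLockTable:
    """Toy byte-range lock table for one file."""

    def __init__(self):
        self.locks = []  # (owner, start, end, mode), end is exclusive

    def try_lock(self, owner, start, length, mode):
        """mode is "shared" or "exclusive"; returns True if granted."""
        end = start + length
        for other, s, e, held_mode in self.locks:
            overlaps = start < e and s < end
            # Two locks conflict if they overlap, belong to different
            # owners, and at least one of them is exclusive.
            if overlaps and other != owner and "exclusive" in (held_mode, mode):
                return False
        self.locks.append((owner, start, end, mode))
        return True

    def unlock(self, owner, start, length):
        key = (owner, start, start + length)
        self.locks = [lock for lock in self.locks if lock[:3] != key]


table = RangeLockTable()
granted_a = table.try_lock("A", 0, 100, "shared")
granted_b = table.try_lock("B", 50, 10, "shared")
granted_c = table.try_lock("C", 90, 20, "exclusive")
```

Locking a whole file is the special case of a range that covers every byte.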

  8. DFS for Mobile Networks • What properties of DFS are desirable: • Handle frequent connection and disconnection • Enable clients to operate in disconnected state for an extended period of time • Ways to resolve/merge conflicts

  9. Design Issues for DFS in Mobile Networks • What should be kept in the client cache? • How to update the client cache copies with changes made on the server? • How to upload changes made by the client to the server? • How to resolve conflicts when more than one client changes a file while disconnected?

  10. Example System: Coda • Client cache content: • User can specify which directories should always be cached on the client • Also cache recently used files • Cache replacement: walk over the cached items every 10 min to reevaluate their priorities • Updates from server to client: • The server keeps a log of callbacks that couldn’t be delivered and delivers them upon client connection

  11. Coda File System • Upload the changes from client to server • The client has to keep a “replay log” • Contents of the “replay log” • Ways to reduce the “replay log” size • Handling conflicts • Detecting conflicts • Resolving conflicts
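
The slide lists the replay log only in outline, so the following is a sketch under stated assumptions rather than Coda's actual mechanism. It assumes the log records whole-file stores, that a later store of the same file supersedes an earlier one (one way to reduce log size), and that a conflict is detected when the server's version differs from the version the client had cached. `ReplayLog`, `replay`, and the integer version numbers are all hypothetical.

```python
class ReplayLog:
    """Toy log of updates made while disconnected."""

    def __init__(self):
        self.entries = []  # (op, path, data, base_version)

    def record_store(self, path, data, base_version):
        # Log reduction: a newer store of the same file replaces the old one.
        self.entries = [
            e for e in self.entries if not (e[0] == "store" and e[1] == path)
        ]
        self.entries.append(("store", path, data, base_version))


def replay(log, server_files):
    """Apply the log on reconnection.

    server_files maps path -> (version, data). Returns conflicting paths;
    their entries stay in the log for later resolution.
    """
    conflicts = []
    for op, path, data, base_version in log.entries:
        current_version, _ = server_files.get(path, (0, b""))
        if current_version != base_version:
            conflicts.append(path)  # someone else changed the file
            continue
        server_files[path] = (current_version + 1, data)
    log.entries = [e for e in log.entries if e[1] in conflicts]
    return conflicts


server = {"a": (1, b"old"), "b": (2, b"theirs")}
log = ReplayLog()
log.record_store("a", b"v1", base_version=1)
log.record_store("a", b"v2", base_version=1)   # supersedes the first store
log.record_store("b", b"mine", base_version=1)  # server has moved to version 2
entries_before_replay = len(log.entries)
conflicts = replay(log, server)
```

Resolving the detected conflict (for example, by asking the user or running an application-specific merge) is left outside the sketch.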

  12. Performance Issues in File Servers • Components of server load • Network protocol handling • File system implementation • Disk accesses • Read operations • Metadata • Data • Write operations • Metadata • Data • Workload characterization

  13. DFS for High-Speed Networks: DAFS • Proposal from Network Appliance and other companies • Goal: eliminate memory copies and protocol processing • Standard implementation: network buffers → file system buffer cache → user-level application buffers • Designed to take advantage of RDMA (“Remote DMA”) network protocols • Network transport provides direct memory-to-memory transfer • Protocol processing is provided in hardware • Suitable for high-bandwidth, low-error-rate, low-latency networks

  14. DAFS Protocol • Data read from the client: • RDMA request from the server to copy file data directly into the application buffer • Data write from the client: • RDMA request from the server to copy the application buffer into server memory • Implementation: • as a library linked to the user application that interfaces with the RDMA network library directly • Eliminates two data copies • as a new file system implementation in the kernel • Eliminates one data copy • Performance advantage: • Example: 90 usec/op in NFS vs. 25 usec/op in DAFS
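
To put the two figures on this slide in proportion, the arithmetic works out as below. The per-second rates assume the numbers are the full per-operation cost on a single thread with no overlap between operations, which the slide does not state.

```latex
\text{speedup} = \frac{90\ \mu\text{s/op}}{25\ \mu\text{s/op}} = 3.6
\qquad
\text{cost reduction} = \frac{90 - 25}{90} \approx 72\%

\text{NFS: } \frac{1}{90 \times 10^{-6}\ \text{s}} \approx 11{,}100\ \text{ops/s}
\qquad
\text{DAFS: } \frac{1}{25 \times 10^{-6}\ \text{s}} = 40{,}000\ \text{ops/s}
```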

  15. DAFS Features • Session-based • Offer authentication of client machines • Flow control by server • Stateful lock implementation with leases • Offers atomic writes • Offers operator batching

  16. Clustered File Servers • Goal: scalability in file service • Build a high-performance file service using a collection of cheap file servers • Methods for Partitioning the Workload • Each server can support one “subtree” • Advantages • Disadvantages • Each server can support a group of clients • Advantages • Disadvantages • Client requests are sent to server in round-robin or load-balanced fashion • Advantages • Disadvantages
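
The three partitioning methods differ only in what the routing decision is keyed on: the path, the client, or nothing at all. A minimal sketch, assuming a hypothetical three-server cluster; the server names, the subtree table, and the function names are illustrative only.

```python
import zlib
from itertools import cycle

SERVERS = ["fs0", "fs1", "fs2"]  # hypothetical cluster members

# Hypothetical subtree assignment: longest matching prefix wins.
SUBTREES = {"/": "fs0", "/users": "fs1", "/users/alice": "fs2"}


def by_subtree(path, table=SUBTREES):
    """Route by path: each server owns one subtree of the name space."""
    matches = [
        prefix for prefix in table
        if path == prefix or path.startswith(prefix.rstrip("/") + "/")
    ]
    return table[max(matches, key=len)]


def by_client(client_id, servers=SERVERS):
    """Route by client: a stable hash pins each client to one server."""
    return servers[zlib.crc32(client_id.encode()) % len(servers)]


def round_robin(servers=SERVERS):
    """Route by turn: an endless iterator over the servers."""
    return cycle(servers)


rotation = round_robin()
first_four = [next(rotation) for _ in range(4)]
```

Only the first two schemes always send a given file or a given client to the same server; with round-robin any server may receive any request, which is what makes shared storage and consistency management necessary.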

  17. Non-Subtree-Partition Clustered File Servers • Design issues • On which disks should the data be stored? • Management of memory cache in file servers • Data consistency management • Metadata operation consistency • Data operation consistency • Server failure management • Single server failure fault tolerance • Disk failure fault tolerance

  18. Mapping Between Disks and Servers • Direct-attached disks • Network-attached disks • Fibre Channel attached disks • iSCSI attached disks • Managing the network-attached disks: “volume manager”

  19. Functionalities of a Volume Manager • Group multiple disk partitions into a “logical” disk volume • Volume can expand or shrink in size without affecting existing data • Volume can be RAID-0/1/5, tolerating disk failures • Volume can offer “snapshot” functionalities for easy backup • Volumes are “self-evident”
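
The first two functions listed here (grouping partitions into one logical volume, and growing it without disturbing existing data) can be sketched as a simple concatenation map. This is an illustration only: `LogicalVolume` and the partition names are hypothetical, and RAID, snapshots, and shrinking are not modeled.

```python
class LogicalVolume:
    """Toy volume that concatenates partitions into one block address space."""

    def __init__(self):
        self.extents = []  # (partition_name, size_in_blocks), in order

    def extend(self, partition, blocks):
        # Growing appends at the end, so existing block addresses are unchanged.
        self.extents.append((partition, blocks))

    @property
    def size(self):
        return sum(blocks for _, blocks in self.extents)

    def locate(self, logical_block):
        """Translate a logical block number to (partition, block offset)."""
        if logical_block < 0:
            raise IndexError("negative block number")
        remaining = logical_block
        for partition, blocks in self.extents:
            if remaining < blocks:
                return partition, remaining
            remaining -= blocks
        raise IndexError("block beyond end of volume")


volume = LogicalVolume()
volume.extend("sda1", 100)
volume.extend("sdb1", 50)
before_growth = volume.locate(120)
volume.extend("sdc1", 200)
after_growth = volume.locate(120)
```

Because new extents are only appended, the mapping for block 120 is the same before and after the volume grows.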

  20. Implementations of Volume Manager • In-kernel implementation • Example: Linux volume manager, Veritas volume manager, etc. • Disk server implementation • Example: EMC storage systems

  21. Serverless File Systems • Serverless file systems in WAN • Motivation: peer-to-peer storage; never lose the file • Serverless file systems in LAN • Motivation: clients are powerful enough to act as servers; use all clients’ memory to cache file data
