Introduction to DFS
Distributed File Systems
• A file system whose clients, servers and storage devices are dispersed among the machines of a distributed system
• File system operations have to be carried out over the network
• A good DFS should ensure transparency
  • Clients should have the look and feel of a conventional file system
Naming and Transparency
• Naming: the mapping between logical and physical objects
• Location transparency: a file's name does not reveal its physical storage location
• Location independence: a file's name need not be changed if its physical storage location changes
  • Location-independent files are essentially logical data containers
• Location transparency only hides the association between names and physical storage; location independence is the stronger property
Naming Schemes
• Combination of host name and local name
  • Local name is a Unix-like path
  • Neither location transparent nor location independent
• Attaching remote directories to the local directory tree
  • Popularized by Sun's NFS
  • Appears as a coherent directory tree
• Globally unique names
  • Truly transparent
  • A single global naming structure spans all files
  • Difficult to achieve because of special files (e.g. Unix device files)
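The second scheme, attaching remote directories to the local tree, can be sketched as a mount table lookup. This is only an illustration, not how NFS is implemented: the `MOUNTS` table, the server names `fs1` and `fs2`, and the `resolve` function are all hypothetical.

```python
# Hypothetical mount table: local mount point -> (server, exported directory).
MOUNTS = {
    "/home": ("fs1", "/export/home"),
    "/usr/share": ("fs2", "/export/share"),
}

def resolve(path, mounts=MOUNTS):
    """Return (host, path on that host) for a path in the local tree."""
    # Try the longest mount point first so that nested mounts win.
    for mount_point in sorted(mounts, key=len, reverse=True):
        if path == mount_point or path.startswith(mount_point + "/"):
            server, exported = mounts[mount_point]
            return server, exported + path[len(mount_point):]
    # Not under any mount point: an ordinary local file.
    return "localhost", path

print(resolve("/home/alice/notes.txt"))
print(resolve("/etc/hosts"))
```

The user only ever sees `/home/alice/notes.txt`; the host name appears in the mount table, not in the file name. The names are still not location independent, because moving the export to another server means editing the table on every client.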
Implementing Naming Schemes
• Transparent naming requires a mapping between file names and their associated locations
• Files are aggregated into component units for scalability and manageability
• Hierarchical directory trees
• Replication and caching of the mapping
  • The cached view must be kept consistent
• Location-independent file identifiers
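One way to picture location-independent file identifiers is a two-level mapping: names map to an identifier that names a component unit, and a second table maps component units to servers. The sketch below assumes that layout; the table contents, `locate` and `migrate` are hypothetical names chosen for illustration.

```python
# Level 1: textual name -> location-independent identifier (unit, file number).
name_to_id = {"/projects/report.txt": ("unit7", 42)}

# Level 2: component unit -> server that currently stores it.
unit_location = {"unit7": "serverA"}

def locate(name):
    """Resolve a name to (server, unit, file number)."""
    unit, file_no = name_to_id[name]
    return unit_location[unit], unit, file_no

def migrate(unit, new_server):
    """Move a whole component unit; only the second-level table changes."""
    unit_location[unit] = new_server

before = locate("/projects/report.txt")
migrate("unit7", "serverB")
after = locate("/projects/report.txt")
print(before, after)
```

Because only the small unit-to-server table changes on migration, it is the part worth replicating and caching, and the part whose cached copies must be kept consistent.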
Accessing Remote Files
• Needs network data transfer
• Remote service mechanism
  • Remote procedure call
• Caching for improved performance
Caching
• Idea is fetch once, use multiple times
• If requested data is not available, get it from server
• Store fetched data
• Perform access on local data
• Replace data when cache becomes full
• One master copy at the server, several secondary copies at clients
• Granularity: file blocks to entire file
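The fetch, store, access and replace steps above can be sketched as a small client-side block cache. The slide does not name a replacement policy, so least-recently-used is an assumption here, and `BlockCache` and `fake_server_fetch` are hypothetical names.

```python
from collections import OrderedDict

class BlockCache:
    """Client-side cache of file blocks with LRU replacement (assumed policy)."""

    def __init__(self, fetch_from_server, capacity):
        self.fetch = fetch_from_server   # callable(file, block_no) -> data
        self.capacity = capacity         # maximum number of cached blocks
        self.blocks = OrderedDict()      # (file, block_no) -> data, oldest first

    def read(self, file, block_no):
        key = (file, block_no)
        if key in self.blocks:
            # Hit: serve the local copy and mark it most recently used.
            self.blocks.move_to_end(key)
            return self.blocks[key]
        # Miss: get the block from the server and keep a copy.
        data = self.fetch(file, block_no)
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            # Cache is full: drop the least recently used block.
            self.blocks.popitem(last=False)
        return data

fetch_log = []

def fake_server_fetch(file, block_no):
    """Stand-in for a network request; records every fetch."""
    fetch_log.append((file, block_no))
    return f"{file}#{block_no}"

cache = BlockCache(fake_server_fetch, capacity=2)
for block_no in (0, 1, 0, 2, 1):
    cache.read("a", block_no)
print(fetch_log)
```

The third read is served locally; the last one goes back to the server because block 1 was evicted to make room for block 2.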
Cache Location
• Main memory
  • Workstations can be diskless
  • Faster access
  • Technology trend: memory accesses are becoming faster
  • Server caches are in main memory anyway, so the same caching code can be reused on clients
• Local disks
  • Reliability via persistence
• Hybrid schemes
  • Best of both worlds
Cache Update Policy
• Policy regarding when modified data is reflected on the master copy
• Can have a significant impact on performance
• Write-through policy
  • All writes are reflected immediately on the master copy
  • Blocking: each write waits for the server
• Delayed writes
  • Write on flush
  • Periodic writes
  • Write on close
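The difference between write-through and one delayed-write variant (write on close) can be sketched as follows. A plain dict stands in for the server's master copies, and `CachedFile` is a hypothetical class, not part of any real DFS client.

```python
class CachedFile:
    """A client's cached copy of one file, with a selectable update policy."""

    def __init__(self, server, name, write_through):
        self.server = server              # dict standing in for master copies
        self.name = name
        self.write_through = write_through
        self.data = server.get(name, "")  # fetch the current master copy
        self.dirty = False

    def write(self, data):
        self.data = data
        if self.write_through:
            # Write-through: the master copy is updated before write() returns.
            self.server[self.name] = data
        else:
            # Delayed write: only the cached copy changes for now.
            self.dirty = True

    def close(self):
        # Write on close: push pending modifications to the master copy.
        if self.dirty:
            self.server[self.name] = self.data
            self.dirty = False

server = {"f": "v0"}

wt = CachedFile(server, "f", write_through=True)
wt.write("v1")
after_write_through = server["f"]

delayed = CachedFile(server, "f", write_through=False)
delayed.write("v2")
before_close = server["f"]
delayed.close()
after_close = server["f"]
print(after_write_through, before_close, after_close)
```

With delayed writes the master copy lags behind the client until `close()`, which is cheaper per write but loses the update if the client crashes first.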
Ensuring Consistency
• Ensuring that the data being read is consistent with the master copy
• Client-initiated approach
  • Client validates with the server whether its data is up to date
  • Frequency of validation is the main issue
    • Check on first access
    • Check on every access
    • Periodic checking
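A minimal sketch of the client-initiated approach, assuming the server keeps a version number per file (the slide does not say how validity is checked). The `ttl` parameter models the validation frequency: `ttl=0` means check on every access, a larger value means periodic checking. The clock is passed in explicitly so the example is deterministic; `Server` and `Client` are hypothetical classes.

```python
class Server:
    def __init__(self):
        self.files = {}        # name -> (version, data)
        self.validations = 0   # how many validity checks clients have made

    def write(self, name, data):
        version = self.files.get(name, (0, None))[0] + 1
        self.files[name] = (version, data)

    def version(self, name):
        self.validations += 1
        return self.files[name][0]

    def fetch(self, name):
        return self.files[name]

class Client:
    def __init__(self, server, ttl):
        self.server = server
        self.ttl = ttl     # seconds a cached copy is trusted without checking
        self.cache = {}    # name -> (version, data, time of last validation)

    def read(self, name, now):
        entry = self.cache.get(name)
        if entry is not None:
            version, data, checked = entry
            if now - checked < self.ttl:
                return data    # still trusted, no server contact
            if self.server.version(name) == version:
                self.cache[name] = (version, data, now)
                return data    # validated, cached copy is current
        # Not cached, or the cached copy is stale: fetch the master copy.
        version, data = self.server.fetch(name)
        self.cache[name] = (version, data, now)
        return data

server = Server()
server.write("f", "old")
client = Client(server, ttl=10)
r1 = client.read("f", now=0)    # first access: fetched from the server
server.write("f", "new")
r2 = client.read("f", now=5)    # within ttl: stale data is returned
r3 = client.read("f", now=12)   # ttl expired: validation fails, refetch
print(r1, r2, r3)
```

The second read shows the cost of infrequent validation: the client keeps using stale data until its next check.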
Server-Initiated Approaches
• Server records the files each client is accessing
• Detects potential inconsistency and notifies clients
  • Conflicts occur when at least two clients cache the same file and at least one of them is writing
• Invalidation- or update-based mechanisms
• Session semantics
  • Consistency enforced upon file closing
• Unix semantics
  • Consistency enforced upon write
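An invalidation-based mechanism can be sketched like this: the server remembers which clients cache each file and tells the others to drop their copy when one of them writes. Invalidating on every write, as done here, corresponds to the Unix-semantics case; the `Server` and `Client` classes and their methods are hypothetical.

```python
class Client:
    def __init__(self, name):
        self.name = name
        self.cache = {}   # file -> cached data

    def invalidate(self, file):
        # Callback from the server: the cached copy is no longer valid.
        self.cache.pop(file, None)

class Server:
    def __init__(self):
        self.files = {}     # file -> master copy
        self.cachers = {}   # file -> set of clients caching it

    def open(self, client, file):
        # Record that this client now holds a cached copy.
        self.cachers.setdefault(file, set()).add(client)
        client.cache[file] = self.files.get(file, "")
        return client.cache[file]

    def write(self, writer, file, data):
        self.files[file] = data
        writer.cache[file] = data
        # Notify every other client that caches this file.
        for client in self.cachers.get(file, set()):
            if client is not writer:
                client.invalidate(file)
        self.cachers[file] = {writer}

server = Server()
a, b = Client("a"), Client("b")
server.open(a, "f")
server.open(b, "f")
server.write(a, "f", "v2")
b_has_copy = "f" in b.cache        # b's copy was invalidated
b_rereads = server.open(b, "f")    # b refetches the new master copy
print(b_has_copy, b_rereads)
```

Under session semantics the same notification would be sent when the writer closes the file rather than on each write.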
Why or Why Not Caching
• Locality of accesses
• Gains in performance and scalability
• Transferring big chunks of data lowers per-byte overhead
• Disk accesses can be optimized for larger chunks of data
• Consistency maintenance is the cost
• Memory and disk space requirements at clients
Stateful vs. Stateless Servers
• Stateful servers maintain information about the files being accessed by clients
  • Clients are given connection IDs, which act as indices into inode tables
  • Performance gains, e.g. prefetching file blocks
• Stateless servers maintain no state
  • Each request is self-contained
• Reliability is the issue: a stateful server loses its state in a crash, while a stateless server can simply restart
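The reliability difference can be sketched with two toy servers. `FILES` stands in for on-disk storage that survives a crash; `crash()` wipes whatever the server keeps in memory. All names here are hypothetical and the model ignores networking entirely.

```python
FILES = {"/data/log": b"abcdefghij"}   # stand-in for persistent storage

class StatefulServer:
    def __init__(self):
        self.open_files = {}   # connection id -> [path, current offset]
        self.next_id = 0

    def open(self, path):
        self.next_id += 1
        self.open_files[self.next_id] = [path, 0]
        return self.next_id

    def read(self, conn_id, count):
        # The server remembers which file and offset this connection is at.
        entry = self.open_files[conn_id]
        path, offset = entry
        data = FILES[path][offset:offset + count]
        entry[1] = offset + len(data)
        return data

    def crash(self):
        self.open_files.clear()   # in-memory state is lost

class StatelessServer:
    def read(self, path, offset, count):
        # Every request carries everything needed to serve it.
        return FILES[path][offset:offset + count]

    def crash(self):
        pass   # nothing in memory to lose

stateful = StatefulServer()
conn = stateful.open("/data/log")
first = stateful.read(conn, 4)
stateful.crash()
try:
    stateful.read(conn, 4)
    survived = True
except KeyError:
    survived = False   # the connection id means nothing after the crash

stateless = StatelessServer()
part1 = stateless.read("/data/log", 0, 4)
stateless.crash()
part2 = stateless.read("/data/log", 4, 4)
print(first, survived, part1, part2)
```

The stateful client sends shorter requests and the server knows enough to prefetch, but after a crash the client must reopen the file; the stateless client just retries the same request.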