Sensitivity of Cluster File System Access to I/O Server Selection A. Apon, P. Wolinski, and G. Amerson University of Arkansas
Overview • Benchmarking study • Parallel Virtual File System (PVFS) • Network File System (NFS) • Testing parameters include • Pentium-based cluster node hardware • Myrinet interconnect • Varying number and configuration of I/O servers and client request patterns
Outline • File system architectures • Performance study design • Experimental results • Conclusions and future work
NFS Architecture • Client/server system • Single server holds the entire data file • [Diagram: Node 0 runs the NFS server and stores the data file; Nodes 1 through N are clients connected through a network switch; each cluster node has a dual-processor Pentium running Linux, a hard disk, and lots of memory]
PVFS Architecture • Also a client/server system • Many servers for each file • File data is distributed over the I/O servers in fixed-size stripes, round-robin (see the sketch below) • [Diagram: the data file is striped across Node 0 through Node N, connected through a network switch; each cluster node still has a dual-processor Pentium running Linux, a hard disk, and lots of memory]
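To make the round-robin placement concrete, the following is a minimal sketch (not PVFS source code) of how a byte offset maps to an I/O server under fixed-size striping. The stripe size matches the one used later in the study; the server count and offset are illustrative values.

/* Minimal sketch (not PVFS source code): locating the I/O server that
 * holds a given byte offset under fixed-size, round-robin striping.
 * The server count and offset below are illustrative values. */
#include <stdio.h>

int main(void)
{
    long stripe_size = 16 * 1024;   /* 16 KB stripes, as used later in the study */
    int  num_servers = 4;           /* illustrative number of I/O servers */
    long offset      = 100000;      /* illustrative byte offset in the file */

    long stripe_index = offset / stripe_size;        /* which stripe holds the offset */
    int  server       = stripe_index % num_servers;  /* round-robin owner of that stripe */
    long local_offset = (stripe_index / num_servers) * stripe_size
                        + offset % stripe_size;      /* position within that server's data */

    printf("offset %ld -> server %d, local offset %ld\n",
           offset, server, local_offset);
    return 0;
}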
PVFS Architecture • One node is a manager node • Maintains metadata information for files • Configuration and usage options include: • Size of stripe • Number of I/O servers • Which nodes serve as I/O servers • Native PVFS API vs. UNIX/POSIX API
Native PVFS API example

#include <pvfs.h>

int main()
{
    int fd, bytes_read;
    ...
    fd = pvfs_open(fn, O_RDONLY, 0, NULL, NULL);   /* fn: path to the PVFS file */
    ...
    pvfs_lseek(fd, offset, SEEK_SET);              /* seek to the desired offset */
    ...
    bytes_read = pvfs_read(fd, buf_ptr, bytes);    /* read bytes bytes into buf_ptr */
    ...
    pvfs_close(fd);
}
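For comparison with the UNIX/POSIX path, the same access pattern can go through the standard system calls once the PVFS file system is mounted. This is a minimal sketch; the mount point /pvfs and the file name are illustrative, not taken from the study.

/* Minimal sketch of the same access through the UNIX/POSIX API.
 * Assumes the PVFS kernel module is loaded and the file system is
 * mounted; the path below is illustrative. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    size_t req_size = 16 * 1024 * 1024;          /* 16 MB request, as in the study */
    char *buf = malloc(req_size);
    int fd = open("/pvfs/testfile", O_RDONLY);   /* ordinary open() on the mounted path */
    if (fd < 0 || buf == NULL)
        return 1;

    lseek(fd, 0, SEEK_SET);                      /* ordinary lseek() */
    ssize_t n = read(fd, buf, req_size);         /* ordinary read() */
    printf("read %zd bytes\n", n);

    close(fd);
    free(buf);
    return 0;
}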
Performance Study Design • Goals • Investigate the effect on cluster I/O when using the NFS server or the PVFS I/O servers also as clients • Compare PVFS with NFS
Performance Study Design • Experimental cluster • Seven dual-processor Pentium III 1GHz, 1GB memory computers • Dual EIDE disk RAID 0 subsystem in all nodes, measured throughput about 50MBps • Myrinet switches, 250MBps theoretical bandwidth
Performance Study Design • Two extreme client workloads • Local whole file (LWF) • Takes advantage of caching on the server side • One process per node; each process reads the entire file from beginning to end
Performance Study Design • Two extreme client workloads • Global whole file (GWF) • Minimal help from caching on the server side • One process per node; each process reads a different portion of the file, giving a balanced workload (see the sketch below)
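The two workloads differ only in how each client chooses its byte range. The sketch below makes that concrete; the even contiguous split for GWF is an assumption (the slides say only that each process reads a different, balanced portion), and the client count and rank are illustrative.

/* Sketch (assumption, not the authors' code): how a client with a given
 * rank might compute its byte range under the two workloads. */
#include <stdio.h>

int main(void)
{
    long file_size = 1L << 30;     /* 1 GB test file, as in the study */
    int  nclients  = 4;            /* illustrative number of clients */
    int  rank      = 2;            /* illustrative client rank */

    /* LWF: every client reads the whole file */
    long lwf_start = 0, lwf_bytes = file_size;

    /* GWF: each client reads its own contiguous slice (assumed even split) */
    long gwf_bytes = file_size / nclients;
    long gwf_start = (long)rank * gwf_bytes;

    printf("LWF: start %ld, %ld bytes\n", lwf_start, lwf_bytes);
    printf("GWF: start %ld, %ld bytes\n", gwf_start, gwf_bytes);
    return 0;
}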
NFS Parameters • The mount on Node 0 is a local mount, an optimization for NFS • The NFS server may or may not also participate as a client in the workload
PVFS Parameters • A preliminary study was performed to determine the “best” stripe size and request size for the LWF and GWF workloads • Stripe size of 16KB • Request size of 16MB • File size of 1GB • All I/O servers for a given file participate in all requests for that file
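In the native API the striping parameters are supplied when a file is created, through a metadata structure passed to pvfs_open. The sketch below follows the PVFS 1.x documentation; the struct pvfs_filestat field names (base, pcount, ssize), the argument position, and the file path are assumptions, not code from this study.

/* Hedged sketch: creating a PVFS file with a 16 KB stripe size and a
 * chosen number of I/O servers via the native API. The pvfs_filestat
 * fields follow the PVFS 1.x documentation and are assumptions here,
 * not code from this study. */
#include <pvfs.h>
#include <fcntl.h>

int main(void)
{
    struct pvfs_filestat meta = {0};
    meta.base   = 0;            /* first I/O server to use */
    meta.pcount = 4;            /* number of I/O servers (illustrative) */
    meta.ssize  = 16 * 1024;    /* 16 KB stripe size, as in the study */

    int fd = pvfs_open("/pvfs/testfile", O_CREAT | O_WRONLY, 0644, &meta, NULL);
    /* ... write the test file in 16 MB requests ... */
    pvfs_close(fd);
    return 0;
}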
System Software • RedHat Linux version 7.1 • Linux kernel version 2.4.17-rc2 • NFS protocol version 3 • PVFS version 1.5.3 • PVFS kernel version 1.5.3 • Myrinet network drivers gm-1.5-pre3b • MPICH version 1.2.1
Experimental Pseudocode

For all nodes:
    Open the test file
    Barrier synchronize with all clients
    Get start time
    Loop to read/write my portion
    Barrier synchronize with all clients
    Get end time
    Report bytes processed and time
For Node 0:
    Receive bytes processed, report aggregate throughput
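Using MPICH from the software stack listed above, a benchmark of this shape could look roughly like the following. This is a minimal sketch, not the authors' benchmark code; the file path, the even GWF-style split, and the use of the POSIX read path are assumptions.

/* Minimal MPI sketch of the benchmark loop (assumptions noted inline). */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    long req_size  = 16L * 1024 * 1024;          /* 16 MB requests, as in the study */
    long file_size = 1L << 30;                   /* 1 GB test file */
    long my_bytes  = file_size / nprocs;         /* even GWF-style split (assumption) */
    long my_start  = (long)rank * my_bytes;

    char *buf = malloc(req_size);
    int fd = open("/pvfs/testfile", O_RDONLY);   /* illustrative path on a mounted file system */

    MPI_Barrier(MPI_COMM_WORLD);                 /* barrier synchronize with all clients */
    double t0 = MPI_Wtime();                     /* get start time */

    long done = 0;
    lseek(fd, my_start, SEEK_SET);
    while (done < my_bytes) {                    /* loop to read my portion */
        long want = my_bytes - done;
        if (want > req_size) want = req_size;
        ssize_t n = read(fd, buf, want);
        if (n <= 0) break;
        done += n;
    }

    MPI_Barrier(MPI_COMM_WORLD);                 /* barrier synchronize with all clients */
    double t1 = MPI_Wtime();                     /* get end time */

    long total = 0;                              /* Node 0 reports aggregate throughput */
    MPI_Reduce(&done, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("aggregate throughput: %.1f MB/s\n",
               total / (1024.0 * 1024.0) / (t1 - t0));

    close(fd);
    free(buf);
    MPI_Finalize();
    return 0;
}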
Clearcache • Clear NFS client and server-side caches • Unmount NFS directory, shutdown NFS • Restart NFS, remount NFS directories • Clear server-side PVFS cache • Unmount PVFS directories on all nodes • Shutdown PVFS I/O daemons, manager • Unmount pvfs-data directory on slaves • Restart PVFS manager, I/O daemons • Remount PVFS directories, all nodes
Experimental Parameters • Number of participating clients • Number of PVFS I/O servers • PVFS native API vs. UNIX/POSIX API • I/O servers (NFS as well as PVFS) may or may not also participate as clients
Experimental Results • NFS • PVFS native API vs UNIX/POSIX API • GWF, varying server configurations • LWF, varying server configurations
PVFS and NFS, GWF, 1 and 2 clients with/without server participating
PVFS and NFS, LWF, 1, 2, 3 clients with/without servers participating
PVFS, LWF and GWF, separate clients and servers, seven nodes
Conclusions • NFS can take advantage of a local mount • NFS performance is limited by contention at the single server • Aggregate throughput is bounded by the server's disk throughput or its network throughput, whichever experiences the most contention
Conclusions • PVFS performance generally improves (does not decrease) as the number of clients increases • More improvement seen with LWF workload than with the GWF workload • PVFS performance improves when the workload can take advantage of server-side caching
Conclusions • PVFS outperforms NFS for all workloads in which more than one I/O server can be used • Performance through the PVFS UNIX/POSIX API is much lower than through the native PVFS API • This may improve with a new release of the Linux kernel
Conclusions • For a given number of servers, PVFS I/O throughput decreases when the servers also act as clients • For the workloads tested, PVFS system throughput increases to the maximum possible for the cluster when all nodes participate as both clients and servers
Observation • The drivers and libraries were under constant upgrade during these studies. However, our recent experience indicates that they are now stable and interoperate well together.
Future Work • Benchmarking with cluster workloads that include both computation and file access • Expanding the benchmarking to a cluster with a larger number of PVFS clients and PVFS servers