QoS-enabled Tree-based Distributed Mutexes for Clouds James Edmondson, Aniruddha Gokhale, Douglas Schmidt Vanderbilt University October 11, 2011
Table of Contents • Introduction • Example • Proposed Solution • QoS Properties • Performance Profile • Conclusion
Introduction • Mutual exclusion ensures that only one of several competing threads or processes holds a shared resource at a time • In distributed systems, most mutual exclusion techniques focus only on correct execution • No QoS guarantees • Priority inversions are common • Fault tolerance is a side note or rarely considered • Centralized solutions are surprisingly prevalent
Example (GFS) • Google File System, whose design was subsequently reimplemented in the openly available Hadoop File System • All requests go through a master controller • Which then goes through a name node • Which then passes off operations to a datanode • System is optimized for throughput of large files
Problems (GFS) • High throughput but high latency • When master controller goes down, so does the file system • When name nodes go down, so does the file system • Mutual exclusion for writes • No concept of quality-of-service • For clouds, this means you can’t show preference to high paying customers or real-time or mission-critical applications
How to approach a solution? • Look to established distributed mutual exclusion techniques • Try to optimize for high throughput and low latency • Provide quality-of-service mechanisms for real-time or preferred application deployments • Ideally, should accommodate adding or removing nodes whenever necessary
Introduction (other techniques) Lamport technique • Broadcast to all processes • Wait for replies from all • Enter critical section [Figure: processes A through F; D sends a request R(1,D) to each of the other five processes and receives a grant G(1,D) from each]
Introduction (other techniques) Agarwal-El Abbadi tree-based quorum • Each process requests CS from log n processes in its quorum • Each process in the quorum must grant access [Figure: tree of processes A through F; D sends two requests R(1,D) and receives two grants G(1,D)]
Introduction (other techniques) Raymond Spanning Tree • Token is passed along a spanning tree • If a process has a token, it is free to enter its critical section [Figure: spanning tree of processes A through F with request R(1,D) and grant G(1,D) messages along its edges]
Proposed Solution Tree-based scheme • Each process only requests from its parent • Root process grants based on some priority mechanism • Reply percolates down to requester • Release percolates back up to root [Figure: tree of processes A through F; request R(1,D), grant G(1,D), and release L(D) each appear on two edges between D and the root A]
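The message flow of the tree-based scheme above can be sketched in a few lines of Python. This is only an illustrative trace, not the authors' implementation: the `Node` class, the helper names, and the tree shape (D under B under A) are assumptions made for the example, and no contention or priority handling is modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Message = Tuple[str, str, str]  # (kind, sender, receiver); kind is "R", "G", or "L"


@dataclass
class Node:
    name: str
    parent: Optional["Node"] = None  # None marks the root


def path_to_root(node: Node) -> List[Node]:
    """Return the nodes from `node` up to and including the root."""
    path = [node]
    while path[-1].parent is not None:
        path.append(path[-1].parent)
    return path


def critical_section_messages(requester: Node) -> List[Message]:
    """Trace one uncontended acquire/release cycle for `requester`."""
    path = path_to_root(requester)
    hops = list(zip(path, path[1:]))  # (child, parent) pairs, requester first
    log: List[Message] = []
    for child, parent in hops:            # request percolates up
        log.append(("R", child.name, parent.name))
    for child, parent in reversed(hops):  # grant percolates down
        log.append(("G", parent.name, child.name))
    for child, parent in hops:            # release percolates back up
        log.append(("L", child.name, parent.name))
    return log


# Hypothetical tree: A is the root, B and C are its children, D is a child of B.
a = Node("A")
b = Node("B", a)
c = Node("C", a)
d = Node("D", b)

for message in critical_section_messages(d):
    print(message)
```

For a process at depth d the trace contains 3d messages (d requests, d grants, d releases), and the root itself needs none.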
QoS Properties Fine-grained priority control • Higher priority processes can be placed at higher levels • Faster access for higher priority processes • Multiple priority mechanisms available • Weighted Level and Fair are of note [Figure: tree with priority labels A = 4; B, C = 3; D, E, F = 2; messages shown include requests R(2,1,D), R(2,1,E), R(2,1,F) and grant G(1,D)]
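The slide does not define the Weighted Level or Fair mechanisms, so the following is only one plausible reading, offered as a sketch: a root-side queue that either serves the highest level priority first (ties broken by arrival order) or serves strictly in arrival order. The class name `RootQueue`, its methods, and the numeric priorities are all assumptions for illustration.

```python
import heapq
from itertools import count
from typing import List, Tuple


class RootQueue:
    """Hypothetical request queue kept by the root process."""

    def __init__(self, fair: bool = False) -> None:
        self.fair = fair            # True: arrival order only (assumed "Fair")
        self._heap: List[Tuple[Tuple[int, int], str]] = []
        self._arrival = count()     # monotonically increasing arrival stamp

    def request(self, process: str, level_priority: int) -> None:
        order = next(self._arrival)
        # Level-based mode sorts by descending priority, then by arrival.
        # Fair mode ignores the priority and sorts by arrival alone.
        key = (0 if self.fair else -level_priority, order)
        heapq.heappush(self._heap, (key, process))

    def grant_next(self) -> str:
        """Pop the process that should receive the next grant."""
        _, process = heapq.heappop(self._heap)
        return process


level_queue = RootQueue()
fair_queue = RootQueue(fair=True)
for queue in (level_queue, fair_queue):
    queue.request("D", 2)
    queue.request("B", 3)
    queue.request("E", 2)

level_order = [level_queue.grant_next() for _ in range(3)]
fair_order = [fair_queue.grant_next() for _ in range(3)]
print(level_order, fair_order)
```

With these inputs the level-based queue serves B before D and E, while the fair queue serves D, B, E in arrival order.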
QoS Properties • Processes closer to the root get far more critical sections with certain reply and release models • Before forwarding a reply to a child, if we have a request pending, go ahead and enter our CS before forwarding • Before forwarding a release to a parent, if we have a request pending, go ahead and enter CS before forwarding [Figure: tree of processes A through F; messages R(1,D), G(1,D), L(D), and a combined release L(D,B)]
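The two forwarding rules above can be illustrated by listing the order in which processes on the grant path enter their critical sections. This is a simplified sketch under stated assumptions: the function names are hypothetical, the path is given as a plain list from root to requester, and `pending` is the set of processes on that path that have their own outstanding request.

```python
from typing import List, Set


def cs_order_on_reply(path_down: List[str], pending: Set[str]) -> List[str]:
    """Reply model: a forwarder with a pending request enters its CS
    before passing the grant further down toward the requester."""
    *forwarders, requester = path_down
    return [p for p in forwarders if p in pending] + [requester]


def cs_order_on_release(path_down: List[str], pending: Set[str]) -> List[str]:
    """Release model: the requester goes first; each forwarder with a
    pending request enters its CS before passing the release upward."""
    *forwarders, requester = path_down
    return [requester] + [p for p in reversed(forwarders) if p in pending]


# Hypothetical path A -> B -> D, where B also wants the critical section.
print(cs_order_on_reply(["A", "B", "D"], {"B"}))
print(cs_order_on_release(["A", "B", "D"], {"B"}))
```

In both models B gets a critical section without sending any message of its own, which is why processes nearer the root are served more often.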
QoS Properties • Fault tolerance for token timeout • Multicast token request with a timeout • Response bypasses logical tree • Response is only a notification that token is still alive • If timeout, then inform all processes of invalidated old token • View change request (plurality vote) for new token [Figure: two panels labeled "logical" and "real", each showing processes A through F]
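The timeout check and the plurality vote above might look roughly like the following. This is a sketch, not the project's code: the function names, the representation of a missing reply as `None`, and the lexicographic tie-break are assumptions, since the slide does not say how ties are resolved.

```python
from collections import Counter
from typing import Dict, Optional


def token_is_alive(reply_delay: Optional[float], timeout: float) -> bool:
    """True if a liveness notification arrived within the timeout.
    `reply_delay` is None when no notification arrived at all."""
    return reply_delay is not None and reply_delay <= timeout


def plurality_winner(votes: Dict[str, str]) -> str:
    """Pick the candidate named by the most voters.
    Ties go to the lexicographically smallest name (an assumption)."""
    tally = Counter(votes.values())
    best = max(tally.values())
    return min(name for name, n in tally.items() if n == best)


# Hypothetical view change: no liveness notice arrived, so processes vote.
if not token_is_alive(None, timeout=1.0):
    new_holder = plurality_winner({"B": "A", "C": "A", "D": "B"})
    print("old token invalidated, new token goes to", new_holder)
```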
QoS Properties • Fault tolerance for process failure • Presume process will be replaced or restarted on failure • Maintain queue of requests, remove requests on release • Most faults do not require any specialized changes • Introduce sync message to cover special failures • Executed by parent of a failed process [Figure: tree of processes A through F alongside a "Process Cloud" containing B and C; messages R(1,D), S(1,D), G(1,D), L(D)]
QoS Properties • Fault tolerance for process failure • Root process failure is only slightly different • If a process starts and has no parent, send a token request, equivalent to token timeout • If not in a cloud of available processes, vote for a root process and rework routing network
Performance Profile (no faults) • Worst case message complexity • 3d (d = depth of tree = log n in complete b-tree) • Best case message complexity • 0 – root process • 3 – child process of root [Figure: tree of processes A through F; B exchanges R(1,B), G(1,B), L(B) with the root A, while D's R(1,D), G(1,D), L(D) each cross two edges]
Performance Profile (no faults) • Heavy load – weighted/level priority – no root requests • Message complexity – 3d (d = highest depth of tree where recurring requests are occurring) • Synchronization delay – 2 × t × d (t = message time) [Figure: tree of processes A through F with d = 1; messages R(2,1,B), R(2,2,B), G(B), L(B), R(2,1,C), R(2,2,C), G(C), L(C), and R(3,1,D)]
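As a worked example of the 3d and 2 × t × d figures above, the short sketch below computes the depth of a complete tree and plugs it into both formulas. The helper names, the root-at-depth-0 convention, and the default branching factor of 2 are assumptions chosen for illustration.

```python
def tree_depth(n: int, branching: int = 2) -> int:
    """Depth of the deepest node in a complete tree holding n processes,
    with the root at depth 0 (assumed convention)."""
    depth, capacity, level_size = 0, 1, 1
    while capacity < n:
        level_size *= branching   # nodes on the next level
        capacity += level_size    # total nodes if that level is full
        depth += 1
    return depth


def worst_case_messages(d: int) -> int:
    """Request, grant, and release each cross d edges."""
    return 3 * d


def sync_delay(t: int, d: int) -> int:
    """Release travels up d edges and the next grant travels down d edges,
    each hop costing one message time t."""
    return 2 * t * d


d = tree_depth(1000)  # complete binary tree of 1000 processes
print(d, worst_case_messages(d), sync_delay(5, d))
```

Under these assumptions, 1000 processes in a complete binary tree give d = 9, so a leaf needs at most 27 messages per critical section, and with t = 5 ms the synchronization delay at that depth is 90 ms.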
Performance Profile (no faults) • Averages for heavy load – fair priority – with root requests • Message complexity – O(3d) (d = maximum depth of tree) • Synchronization delay – O(t) to O(2 t d), t = message transmit time [Figure: tree of processes A through F; messages R(1,D), G(1,D), L(D), and a combined release L(D,B)]
Conclusion • QoS-enabled tree-based mutexes offer many benefits over traditional distributed mutexes • Reduced priority inversions • Robust fault tolerance • Excellent scalability and throughput • Relatively easy to implement • Trees are a familiar data structure for most programmers • Code and information at project site • http://code.google.com/p/qosmutex • Contact me at jedmondson@gmail.com