Replicated Distributed Programs
Eric C. Cooper, University of California, Berkeley, 1985
Presented by Roh, HyunGul (hgroh@camars.kaist.ac.kr), 4. 21. 2004

Contents
• Introduction
• Overview
• A Model of Replicated Distributed Programs
• Replicated Procedure Calls
• Performance Analysis
• Synchronization
• Transactions
• Binding
• Summary

Introduction
• Goal: highly available distributed programs
  • Keep running despite failures of some of their components
  • Fault-tolerant
  • Nonstop
• Replicated distributed programs
  • Replication: von Neumann (1956)
• How is replication used in this paper?
  • Replication on a per-module basis
  • Flexible, without burdening the programmer
  • Transparent to the programmer
• Fundamental mechanisms
  • Troupes, or replicated modules
  • Replicated procedure calls (many-to-many)

Overview
[Figure: a replicated distributed program; labels: troupe, module, state information, procedures, active agent as a single thread]
• A model of replicated distributed programs
• How can replication be added?
• How is control transferred?
• Which protocols are used?
• What problems arise?
  • The synchronization problem
  • Transactions
  • Binding distributed and replicated programs
  • Reconfiguration and recovery from partial failure

Distributed Modules & Threads
[Figure: a replicated distributed program; labels: troupe, module, state information, procedures, active agent as a single thread]
• Modules
  • Package state information and procedures
  • Separate the interface to that abstraction from its implementation
• Threads
  • “A thread of control is an abstraction intended to capture the notion of an active agent in a computation”
  • A particular thread runs in exactly one module at a given time
  • Multiple threads may run in the same module
  • Threads move among modules
• Implementation
  • Provides location transparency
  • Module: implemented by a server
  • Thread: implemented by using RPC to transfer control from server to server

Adding Replication to Distributed Programs
• Distributed programs suffer partial failures
  • Replication masks these failures
  • Replication transparency (RT)
• Terminology
  • Troupe: a replicated module
  • Troupe members: the replicas
• Assumptions
  • Troupe members execute on fail-stop processors
  • If not, Byzantine agreement is required
• How is replication transparency guaranteed in the troupe model?
  • Deterministic troupe: a set of replicas of a deterministic module (input → unique output)
  • Troupe consistency (TC): all members of the troupe are in the same state
  • In the absence of application-specific knowledge, TC ⇔ RT
  • All troupes are deterministic ⇒ RT is guaranteed
[Figure: a troupe and its troupe members, linked by replicated procedure calls]
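
The link between determinism and troupe consistency can be illustrated with a minimal sketch. The `Counter` module and the `replicated_call` and `is_consistent` helpers are hypothetical names chosen for this example, not part of the paper; the sketch only assumes that every member receives the same calls in the same order.

```python
class Counter:
    """A deterministic module: its state depends only on the calls it has received."""

    def __init__(self):
        self.total = 0

    def add(self, amount):
        self.total += amount
        return self.total


def replicated_call(troupe, method, *args):
    """Invoke the same procedure on every troupe member and collect the results."""
    return [getattr(member, method)(*args) for member in troupe]


def is_consistent(troupe):
    """Troupe consistency: every member is in the same state."""
    states = [vars(member) for member in troupe]
    return all(state == states[0] for state in states)


# Three replicas of the same deterministic module (hypothetical example).
troupe = [Counter() for _ in range(3)]
results = [replicated_call(troupe, "add", n) for n in (5, 7)]
consistent = is_consistent(troupe)
```

Because each member starts in the same state and sees the same inputs, every member returns the same result for each call and the troupe stays consistent.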

Replicated Procedure Calls (1/3)
[Figure: client troupe members each issue "Call P"; server troupe members each implement "P: proc"]
• RPC (remote procedure call): distributed programs can be written like local programs
• Replicated procedure call: the natural generalization of RPC when modules are replaced by troupes
• Server and client?
  • Server troupe: contains the called procedure
  • Client troupe: the caller
• How? The Circus paired message protocol
  • From the author's previous work
  • Characteristics
    • Paired messages (e.g. call and return)
    • Reliably delivered
    • Variable length
    • Call sequence numbers
  • Based on RPC
  • Uses UDP, the DARPA User Datagram Protocol
  • Connectionless, but with retransmission
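
How call sequence numbers pair a return with its call, and make retransmission safe, can be sketched as follows. This is not the Circus wire format: the `Message` fields and the `PairedEndpoint` class are assumptions made for illustration, and the network is replaced by direct method calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    kind: str     # "call" or "return"
    sender: str   # hypothetical sender identifier
    seq: int      # call sequence number; pairs a return with its call
    body: bytes   # variable-length payload


class PairedEndpoint:
    """Receiving side: executes each call once and caches the paired return."""

    def __init__(self, handler):
        self.handler = handler
        self.replies = {}      # (sender, seq) -> cached return message
        self.executions = 0

    def receive(self, call):
        key = (call.sender, call.seq)
        if key not in self.replies:
            # First copy of this call: run the procedure and remember the return.
            self.executions += 1
            result = self.handler(call.body)
            self.replies[key] = Message("return", "server", call.seq, result)
        # A retransmitted call gets the cached return instead of a second execution.
        return self.replies[key]


server = PairedEndpoint(lambda body: body.upper())
call = Message("call", "client-1", 1, b"ping")
first = server.receive(call)
retry = server.receive(call)   # simulated retransmission of the same call
```

Over a connectionless transport such as UDP, the sender would retransmit until the paired return arrives; the cache keeps the procedure from running twice for the same sequence number.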

Replicated Procedure Calls (2/3)
[Figures: one-to-many call; many-to-one call]
• Replicated procedure calls are implemented on top of the paired message layer
• One-to-many calls
  • Each client troupe member sends the call to the entire server troupe
  • The client normally waits for return messages from all members of the server troupe
• Many-to-one calls
  • Each server troupe member handles call messages from the entire client troupe
  • Two problems:
    • Distinguishing unrelated calls
    • Knowing how many other call messages to expect as part of the same replicated call
    • (Author's previous work)
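
The two many-to-one problems can be sketched under stated assumptions. The slide does not say how Circus solves them, so this example simply assumes that each call message carries a client troupe ID and a call sequence number, and that the server knows the size of each client troupe. `ManyToOneCollector` and its fields are hypothetical.

```python
from collections import defaultdict


class ManyToOneCollector:
    """Groups call messages that belong to the same replicated call."""

    def __init__(self, troupe_sizes):
        self.troupe_sizes = troupe_sizes    # troupe_id -> number of members (assumed known)
        self.pending = defaultdict(dict)    # (troupe_id, seq) -> {member: args}

    def on_call(self, troupe_id, seq, member, args):
        """Record one call message; return the full set once every member has called."""
        key = (troupe_id, seq)
        self.pending[key][member] = args
        if len(self.pending[key]) == self.troupe_sizes[troupe_id]:
            return self.pending.pop(key)
        return None


collector = ManyToOneCollector({"C": 2})
first = collector.on_call("C", 7, "c1", ("deposit", 10))
unrelated = collector.on_call("C", 8, "c1", ("withdraw", 5))   # a different call
complete = collector.on_call("C", 7, "c2", ("deposit", 10))
```

Keying on the pair (troupe ID, sequence number) separates unrelated calls, and the troupe size tells the server how many messages make up one replicated call.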

Replicated Procedure Calls (3/3)
[Figure: many-to-many call; client troupe members each issue "Call P", server troupe members each implement "P: proc"]
• Many-to-many calls
  • Client: each member sends call messages to the entire server troupe
  • Server: each member sends return messages to the entire client troupe
• Waiting for messages to arrive
  • Since troupes are assumed to be deterministic, all messages in a set will be identical
  • When should computation proceed?
    • Only after the entire set has arrived
      • Allows error detection and error correction
      • Expensive in execution time
    • First come
      • Execution time is determined by the fastest member of each troupe
      • Once the fastest member's call has been handled, returns are also sent to members whose calls have not yet been received
• Crashes and partitions
  • Crash detection: probing and timeouts
  • Network partition: which members receive a majority of the expected set of messages?
• Collators
  • A function that maps a set of messages into a single result
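
A collator can be sketched as a plain function from a list of messages to one result. The three policies below (unanimous, first come, majority) follow the waiting strategies named on the slide; the function names and signatures are assumptions for illustration only.

```python
from collections import Counter


def unanimous(messages):
    """Wait for the entire set and require every message to agree (error detection)."""
    if not messages or len(set(messages)) != 1:
        raise ValueError("troupe members disagree or no messages arrived")
    return messages[0]


def first_come(messages):
    """Accept the first message to arrive, trusting the troupe to be deterministic."""
    if not messages:
        raise ValueError("no messages arrived")
    return messages[0]


def majority(messages, troupe_size):
    """Accept a value only if more than half of the whole troupe sent it."""
    if not messages:
        raise ValueError("no messages arrived")
    value, count = Counter(messages).most_common(1)[0]
    if count * 2 <= troupe_size:
        raise ValueError("no majority of the expected set")
    return value


agreed = unanimous([b"ok", b"ok", b"ok"])
fastest = first_come([b"a", b"b"])
voted = majority([b"x", b"x", b"y"], troupe_size=3)
```

The majority collator counts against the expected troupe size rather than the number of messages received, so a member cut off in a minority partition cannot proceed on its own.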

Performance Analysis
• Measures the cost of replicated procedure calls as a function of the degree of replication
• Testbed: six VAX-11/750s connected by a single 10 Mb/s Ethernet

The Synchronization Problem for Troupes
• Multiple threads of control
  • What if they want the same resource?
  • Serializability can be achieved by any of a number of concurrency control algorithms
• When the server module is a troupe:
  • Calls are serialized by each server troupe member
  • All members must serialize them in the same order
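
Why every member must serialize in the same order can be seen with two operations that do not commute. The account operations below are a hypothetical example, not taken from the paper.

```python
def deposit_100(balance):
    return balance + 100


def double(balance):
    return balance * 2


def apply_in_order(balance, operations):
    """Apply the operations one after another, as a single member would."""
    for operation in operations:
        balance = operation(balance)
    return balance


# Both members serialize the two calls in the same order: states match.
same_order = [
    apply_in_order(100, [deposit_100, double]),
    apply_in_order(100, [deposit_100, double]),
]

# Each member is serializable on its own, but the orders differ: states diverge.
different_order = [
    apply_in_order(100, [deposit_100, double]),
    apply_in_order(100, [double, deposit_100]),
]
```

Each member in the second case executes a valid serial schedule, yet the troupe is no longer consistent, which is why local serializability alone is not enough.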

Replicated Transactions
• The transaction mechanism
  • Guarantees serializability and atomicity
• Conventional transactions
  • Permanence of committed updates
  • Crash recovery algorithms
• Correctness condition for conventional transactions
  • Serializability
  • Troupe consistency must also be preserved
• Existing concurrency control algorithms for replicated databases
  • Require communication among replicas
  • Cannot be used in the troupe model
  • One well-known multiple-copy concurrency control algorithm: two-phase locking with unanimous update

A Troupe Commit Protocol
• Optimistic
  • “Concurrent transactions are unlikely to conflict”
• Detects unserialized transactions
  • Transforms such an unserialized transaction into a deadlock
• Essential property
  • Two troupe members succeed in committing two transactions iff both troupe members attempt to commit the transactions in the same order
[Figure: client troupes C and C’ run transactions T and T’ against server troupe S (members S1, S2); labels: ready_to_commit, (1), (2), commit or not]
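
The essential property can be illustrated with a deliberately simplified model, which is an assumption of this sketch and not the paper's protocol: each server troupe member signals ready_to_commit only for the next transaction in its own order, and a transaction commits only when every member is ready for it. The `troupe_commit` function is hypothetical.

```python
def troupe_commit(orders):
    """Simulate commits given each member's intended commit order.

    orders: one list per server troupe member.
    Returns (committed, deadlocked).
    """
    queues = [list(order) for order in orders]
    committed = []
    while any(queues):
        if not all(queues):
            # Some member has nothing left to commit while others still wait.
            return committed, True
        heads = {queue[0] for queue in queues}
        if len(heads) != 1:
            # Members are ready for different transactions: nobody can proceed.
            return committed, True
        transaction = heads.pop()
        for queue in queues:
            queue.pop(0)
        committed.append(transaction)
    return committed, False


same_order = troupe_commit([["T", "T'"], ["T", "T'"]])
conflicting = troupe_commit([["T", "T'"], ["T'", "T"]])
```

When S1 and S2 agree on the order, both transactions commit; when they disagree, each waits for a transaction the other is not ready to commit, and the disagreement surfaces as a deadlock.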

Binding Agents for Distributed Programs
• “A binding agent is a mechanism that enables programs to import and export modules by interface name”
• Lookup, registration, and deletion can be provided by a general-purpose name server
• Clients cache the results of lookups
  • The classic cache invalidation problem
• Garbage collection: obsolete registration information

Binding Agents for Replicated Programs
• Import and export troupes rather than single modules
• The binding agent must manipulate sets of module addresses rather than single addresses
• A more complicated cache invalidation problem
  • Troupe ID as a form of incarnation number
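
How a troupe ID can act as an incarnation number for cached bindings can be sketched as follows. The `BindingAgent` class, its method names, and the address strings are all hypothetical; the sketch assumes that every change of membership is registered under a fresh troupe ID.

```python
import itertools


class BindingAgent:
    """Maps an interface name to a troupe ID and a set of member addresses."""

    def __init__(self):
        self._troupes = {}                  # name -> (troupe_id, frozenset of addresses)
        self._next_id = itertools.count(1)

    def export(self, name, addresses):
        """Register (or re-register) a troupe under a new troupe ID."""
        troupe_id = next(self._next_id)
        self._troupes[name] = (troupe_id, frozenset(addresses))
        return troupe_id

    def lookup(self, name):
        return self._troupes[name]

    def is_current(self, name, troupe_id):
        """A cached binding is stale when its troupe ID no longer matches."""
        return self._troupes[name][0] == troupe_id


agent = BindingAgent()
agent.export("counter", {"host-a:1", "host-b:1"})
cached_id, cached_addresses = agent.lookup("counter")   # the client caches this

# The membership changes, so the troupe is re-registered with a new ID.
agent.export("counter", {"host-a:1", "host-c:1"})
stale = not agent.is_current("counter", cached_id)
```

Comparing one ID is cheaper than comparing whole address sets, and it also catches the case where a replaced member happens to reuse an old address.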

Reconfiguration & Recovery from Partial Failure
• Detect crashes by timeout
• Replace the crashed troupe member
  • New troupe member (add_troupe_member)
  • Its state must be consistent with that of the other members
  • It must be registered with the binding agent
  • “get_state” procedure
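
The recovery steps above can be sketched as follows. The names `add_troupe_member` and `get_state` come from the slide, but their signatures, the `Member` class, and the dictionary standing in for the binding agent are assumptions of this example.

```python
class Member:
    """A troupe member whose state can be copied out with get_state."""

    def __init__(self, state=None):
        self.state = dict(state or {})

    def get_state(self):
        # Return a copy so the newcomer does not share the donor's dictionary.
        return dict(self.state)


def add_troupe_member(troupe, registry, name, address):
    """Bring a new member up to date and register the enlarged troupe.

    troupe: dict mapping address -> Member (surviving members only).
    registry: dict standing in for the binding agent (name -> set of addresses).
    """
    donor = next(iter(troupe.values()))      # any surviving member will do
    newcomer = Member(donor.get_state())     # state consistent with the others
    troupe[address] = newcomer
    registry[name] = set(troupe)             # re-register the troupe's addresses
    return newcomer


troupe = {"host-a": Member({"total": 12}), "host-b": Member({"total": 12})}
registry = {"counter": set(troupe)}

del troupe["host-b"]                         # crash detected by timeout
newcomer = add_troupe_member(troupe, registry, "counter", "host-c")
```

A real system would also have to keep the donor from changing state while the copy is in flight; this sketch ignores concurrency.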

Summary
• Details are invisible to the programmer
  • Replication transparency
• Transfer of control: replicated procedure calls
  • Circus paired message protocol
  • Many-to-many
• Serializability for concurrency control
• Binding and reconfiguration