
Computer Science 328 Distributed Systems




  1. Computer Science 328 Distributed Systems Lecture 12 Distributed Transactions

  2. Distributed Transactions • A transaction (flat or nested) that invokes operations in several servers. [Figure: a nested distributed transaction, with top-level transaction T, children T1 and T2, and sub-subtransactions T11, T12, T21, T22 at different servers; and a flat distributed transaction T operating on objects A, B, C, D spread over servers X, Y, Z.]

  3. Coordination in Distributed Transactions • The coordinator resides in one of the servers and provides openTransaction, closeTransaction, and abortTransaction; openTransaction returns the transaction identifier TID. • When a client request a.method(TID, ...) first reaches a new server, that server's participant calls join(TID, ref) to register itself with the coordinator. [Figure: coordinator and participants at servers X, Y, Z; participants A, B, C, D join the coordinator as the transaction spreads.]

  4. Distributed banking transaction • The client runs: T = openTransaction; a.withdraw(4); b.withdraw(3); c.deposit(4); d.deposit(3); closeTransaction. • Account A is at BranchX, B at BranchY, and C and D at BranchZ; each branch's participant joins the transaction when it receives the first request on its objects (e.g., b.withdraw(T, 3) at BranchY). • Note: the coordinator is in one of the servers, e.g. BranchX. [Figure: client, coordinator at BranchX, and participants at BranchY and BranchZ, each joining via join.]

  5. Atomic Commit Protocols • The atomicity principle requires that either all the distributed operations of a transaction complete or all abort. • In a one-phase commit protocol, the coordinator repeatedly communicates commit or abort to all participants until all acknowledge. It does not allow a server to make a unilateral decision to abort a transaction, yet a server may have to abort, for example in the case of deadlock. • In a two-phase protocol, any participant can abort its part of the transaction; the transaction is committed by consensus. • When the client asks the coordinator to commit the transaction, the two-phase commit protocol is executed.

  6. Operations for Two-Phase Commit Protocol • canCommit?(trans)-> Yes / No • Call from coordinator to participant to ask whether it can commit a transaction. Participant replies with its vote. • doCommit(trans) • Call from coordinator to participant to tell participant to commit its part of a transaction. • doAbort(trans) • Call from coordinator to participant to tell participant to abort its part of a transaction. • haveCommitted(trans, participant) • Call from participant to coordinator to confirm that it has committed the transaction. • getDecision(trans) -> Yes / No • Call from participant to coordinator to ask for the decision on a transaction after it has voted Yes but has still had no reply after some delay. Used to recover from server crash or delayed messages.

  7. The two-phase commit protocol • Phase 1 (voting phase): • 1. The coordinator sends a canCommit? request to each of the participants in the transaction. • 2. When a participant receives a canCommit? request, it replies with its vote (Yes or No) to the coordinator. Before voting Yes, it prepares to commit by saving its objects in permanent storage. If the vote is No, the participant aborts immediately. • Phase 2 (completion according to outcome of vote): • 3. The coordinator collects the votes (including its own). • (a) If there are no failures and all the votes are Yes, the coordinator decides to commit the transaction and sends a doCommit request to each of the participants. • (b) Otherwise the coordinator decides to abort the transaction and sends doAbort requests to all participants that voted Yes. • 4. Participants that voted Yes wait for a doCommit or doAbort request from the coordinator. When a participant receives one of these messages it acts accordingly and, in the case of commit, makes a haveCommitted call as confirmation to the coordinator. (Recall that a server may crash at any step.)
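The voting and completion phases above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a real protocol implementation: the class and method names (Coordinator, Participant, can_commit, ...) are illustrative stand-ins for the canCommit?/doCommit/doAbort messages, and crashes, timeouts, and permanent storage are omitted.

```python
class Participant:
    def __init__(self, name, vote_yes=True):
        self.name = name
        self.vote_yes = vote_yes
        self.state = "working"

    def can_commit(self, trans):
        # Phase 1: a real participant saves its objects to permanent
        # storage before voting Yes.
        if self.vote_yes:
            self.state = "prepared"   # uncertain until the decision arrives
            return True
        self.state = "aborted"        # a No voter aborts immediately
        return False

    def do_commit(self, trans):
        self.state = "committed"

    def do_abort(self, trans):
        self.state = "aborted"


class Coordinator:
    def __init__(self, participants):
        self.participants = participants

    def commit_transaction(self, trans):
        # Phase 1 (voting): collect the canCommit? votes.
        votes = [p.can_commit(trans) for p in self.participants]
        # Phase 2 (completion): commit only if every vote was Yes.
        if all(votes):
            for p in self.participants:
                p.do_commit(trans)
            return "committed"
        for p in self.participants:
            if p.state == "prepared":  # only Yes voters need doAbort
                p.do_abort(trans)
        return "aborted"
```

A single No vote (or, in a real system, a timeout) drives the whole transaction to abort, which is exactly the consensus property of slide 5.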

  8. Communication in Two-Phase Commit Protocol

  Coordinator                                     Participant
  step  status                  message           step  status
  1     prepared to commit      canCommit? -->
        (waiting for votes)     <-- Yes           2     prepared to commit (uncertain)
  3     committed               doCommit -->
        done                    <-- haveCommitted 4     committed

• To deal with server crashes: each server saves information relating to the two-phase commit protocol in permanent storage. The information can be retrieved by a new server after a server crash. • To deal with canCommit? loss: the participant may decide to abort unilaterally after a timeout. • To deal with Yes/No loss: the coordinator aborts the transaction after a timeout (pessimistic!). It must announce doAbort to those who sent in their votes. • To deal with doCommit loss: the participant may wait for a timeout, then send a getDecision request.

  9. Two Phase Commit (2PC) Protocol • Coordinator: on CloseTrans(), leave the Execute state, send the canCommit? request to each participant, and wait for replies (a timeout is possible). On a timeout or any NO vote, take the ABORT decision and send ABORT to each participant; if all reply YES, take the COMMIT decision and send COMMIT to each participant. • Participant: on receiving the request, leave Execute; if not ready, send NO to the coordinator and abort; if ready, enter Precommit, send YES to the coordinator, and wait (Uncertain) for the decision. On COMMIT, commit and make the transaction visible; on ABORT, abort.

  10. Locks in Distributed Transactions • Each server is responsible for applying concurrency control to its objects. • Servers are collectively responsible for serial equivalence of operations. • Locks are held locally, and cannot be released until all servers involved in a transaction have committed or aborted. • Locks are retained during 2PC protocol • Since lock managers work independently, deadlocks are very likely.
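The rule that locks are held locally but retained until the distributed commit or abort can be sketched as a tiny per-server lock manager. This is an assumed, simplified design (exclusive locks only, no waiting queue); the names LockManager, acquire, and release_all are illustrative.

```python
class LockManager:
    def __init__(self):
        self.locks = {}        # object id -> transaction id holding the lock

    def acquire(self, obj, trans):
        holder = self.locks.get(obj)
        if holder is None or holder == trans:
            self.locks[obj] = trans
            return True
        return False           # conflicting holder: caller must wait
                               # (this waiting is what makes deadlock possible)

    def release_all(self, trans):
        # Called only once the 2PC outcome (commit or abort) of the whole
        # distributed transaction is known, never during the protocol itself.
        self.locks = {o: t for o, t in self.locks.items() if t != trans}
```

Because each server's lock manager decides independently, two transactions can each hold a lock the other needs on different servers, which motivates the distributed deadlock detection on the next slides.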

  11. Distributed Deadlocks • The wait-for graph in a distributed set of transactions is held partially by each server • To find cycles in a distributed wait-for graph, we can use a central coordinator: • Each server reports updates of its wait-for graph • The coordinator constructs a global graph and checks for cycles • Centralized deadlock detection suffers from usual comm. problems. • In edge chasing servers collectively make the global wait-for graph and detect deadlocks : • Servers forward “probe” messages to servers in the edges of wait-for graph, pushing the graph forward, until cycle is found.

  12. Example: Edge Chasing • Local wait-for graphs, held one per server: X: U → V; Y: T → U; Z: V → T. • Probes are forwarded along the edges of the graph: at Y the probe <T → U> is extended to <T → U → V> and passed on; at Z it becomes <T → U → V → T>, closing the cycle, so deadlock is detected. [Figure: transactions T, U, V and objects A, B, C at servers X, Y, Z, with "held by" and "wait for" edges forming the cycle.]

  13. Edge Chasing • Initiation: When a server S1 notes that a transaction T starts waiting for another transaction U, where U is waiting to access an object at another server S2, it initiates detection by sending the probe <T → U> to S2. • Detection: Servers receive probes and decide whether deadlock has occurred and whether to forward the probes. • Resolution: When a cycle is detected, a transaction in the cycle is aborted to break the deadlock.
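The detection step can be sketched as repeatedly extending a probe along wait-for edges until either the path dead-ends or it revisits a transaction. This is a single-process simplification of what the servers do collectively: in the real protocol each server only appends one edge and forwards the probe; here a plain dict stands in for the distributed wait-for graph.

```python
def forward_probe(probe, wait_for):
    """Extend the probe along wait-for edges; return the deadlock cycle, or None.

    probe    -- list of transactions, e.g. ["T", "U"] for the probe <T -> U>
    wait_for -- dict mapping a transaction to the transaction it waits for
    """
    while True:
        nxt = wait_for.get(probe[-1])
        if nxt is None:
            return None                  # path ends: no deadlock this way
        if nxt in probe:                 # probe closed a cycle
            i = probe.index(nxt)
            return probe[i:] + [nxt]     # e.g. ["T", "U", "V", "T"]
        probe = probe + [nxt]            # forward the extended probe
```

Running it on the slide-12 example (U → V at X, T → U at Y, V → T at Z) reproduces the detected cycle <T → U → V → T>.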

  14. Probes Transmitted to Detect Deadlock • W initiates detection with the probe <W → U>; it is extended to <W → U → V> and then to <W → U → V → W>, at which point the cycle is closed and deadlock is detected. [Figure: transactions W, U, V holding objects A, B, C at servers X, Y, Z, with waits-for edges W → U → V → W.]

  15. Two Probes Initiated • At about the same time, T requests the object held by U and W requests the object held by V, so two independent detections start. [Figure: (a) the initial waits-for situation; (b) the probes produced when detection is initiated at the object requested by T; (c) the probes produced when detection is initiated at the object requested by W. Both detections find the same cycle.]

  16. Transaction Priority • In order to ensure that only one transaction in a cycle is aborted, transactions are given priorities in such a way that all transactions are totally ordered. • When a deadlock cycle is found, the transaction with the lowest priority is aborted. Even if several different servers detect the same cycle, only one transaction aborts.
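Because priorities totally order the transactions, every server that detects the same cycle computes the same victim. A one-line sketch (the names choose_victim and priority are illustrative; lower number = lower priority here, an assumption, so min() picks the victim):

```python
def choose_victim(cycle, priority):
    """Pick the lowest-priority transaction in a detected cycle.

    cycle    -- e.g. ["T", "U", "V", "T"]; the repeated endpoint is harmless
                because we reduce to the set of members first
    priority -- dict giving each transaction's priority (totally ordered)
    """
    return min(set(cycle), key=lambda t: priority[t])
```

Determinism is the point: even if servers X, Y, and Z all detect <T → U → V → T> independently, each aborts the same single transaction.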

  17. 2PC in Nested Transactions • Each (sub)transaction has a coordinator: openSubTransaction(trans-id) → subTrans-id; getStatus(trans-id) → (committed / aborted / provisional) • Each sub-transaction starts after its parent and finishes before it. • When a sub-transaction finishes, it makes a decision to abort or provisionally commit. (Note that a provisional commit is not the same as being prepared – it is just a local decision and is not backed up on permanent storage.) • When the top-level transaction completes, it does a 2PC with its subs to decide to commit or abort.

  18. Example: 2PC in Nested Transactions • The decision is made bottom-up through the transaction tree: each subtransaction reports Abort or Provisional commit to its parent along with a Yes/No vote, and the top-level transaction T runs 2PC over the results. [Figure: the nested distributed transaction T with children T1 (subtransactions T11, T12) and T2 (subtransactions T21, T22), each node annotated Abort / Provisional / Yes / No.]

  19. An Example of a Nested Transaction • T1: provisional commit (at X) • T11: abort (at M) • T12: provisional commit (at N) • T2: aborted (at Y) • T21: provisional commit (at N) • T22: provisional commit (at P)

  20. Information Held by Coordinators of Nested Transactions

  Coordinator of   Child          Participant          Provisional    Abort list
  transaction      transactions                        commit list
  T                T1, T2         yes                  T1, T12        T11, T2
  T1               T11, T12       yes                  T1, T12        T11
  T2               T21, T22       no (aborted)                       T2
  T11                             no (aborted)                       T11
  T12, T21                        T12 but not T21      T21, T12
  T22                             no (parent aborted)                T22

• When each sub-transaction was created, it joined its parent transaction. • The coordinator of each parent transaction has a list of its child sub-transactions. • When a nested transaction provisionally commits, it reports its status and the status of its descendants to its parent. • When a nested transaction aborts, it reports abort without giving any information about its descendants. • The top-level transaction receives a list of all sub-transactions, together with their status.

  21. Hierarchic Two-Phase Commit Protocol for Nested Transactions • canCommit?(trans, subTrans) -> Yes / No • Call from a coordinator to the coordinator of a child sub-transaction to ask whether it can commit subTrans. The first argument trans is the transaction identifier of the top-level transaction. The participant replies with its vote Yes / No. • The coordinator of the top-level transaction sends canCommit? to the coordinators of its immediate child sub-transactions. The latter, in turn, pass them on to the coordinators of their child sub-transactions. • Each participant collects the replies from its descendants before replying to its parent. • In the example, T sends canCommit? messages to T1 (but not T2, which has aborted); T1 sends canCommit? messages to T12 (but not T11).
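The hierarchic scheme maps naturally onto recursion: each coordinator asks only its non-aborted children and collects their answers before answering its parent. A minimal sketch, assuming the tree fits in two plain dicts (children and the aborted set are illustrative stand-ins for the coordinators' state):

```python
def can_commit(trans, children, aborted):
    """Hierarchic canCommit?: vote No if aborted, else ask surviving children.

    children -- dict: transaction -> list of its child sub-transactions
    aborted  -- set of sub-transactions known to have aborted
    """
    if trans in aborted:
        return False
    # Aborted children are skipped, mirroring "T sends canCommit? to T1
    # but not T2" on the slide; a node with no children just votes Yes.
    return all(can_commit(c, children, aborted)
               for c in children.get(trans, []) if c not in aborted)
```

On the slide-19 example (T2 and T11 aborted, the rest provisionally committed), the top-level call returns Yes, since the only paths queried are T → T1 → T12.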

  22. Flat Two-Phase Commit Protocol for Nested Transactions • canCommit?(trans, abortList) -> Yes / No • Call from coordinator to participant to ask whether it can commit a transaction. Participant replies with its vote Yes / No. • The coordinator of the top-level transaction sends canCommit? messages to the coordinators of all sub-transactions in the provisional commit list (e.g., T1 and T12). • If the participant has any provisionally committed transactions that are descendants of the transaction with TID trans: • Check that they do not have any aborted ancestors in the abortList. Then prepare to commit. • Those with aborted ancestors are aborted. • Send a Yes vote to the coordinator. • If the participant does not have a provisionally committed descendant, it must have failed after it performed a provisional commit. Send a No vote to the coordinator.
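The ancestor check at the heart of the flat protocol can be sketched as a walk up the transaction tree. This is an illustrative simplification: parent is an assumed dict encoding the tree, and the participant's reply is reduced to (vote, prepared list).

```python
def has_aborted_ancestor(t, parent, abort_list):
    # Walk t's chain of ancestors; any hit in abortList dooms t.
    while t in parent:
        t = parent[t]
        if t in abort_list:
            return True
    return False


def flat_can_commit(trans, provisional, parent, abort_list):
    """Participant side of canCommit?(trans, abortList).

    provisional -- this participant's provisionally committed descendants
                   of trans; assumed already filtered to trans's subtree
    Returns (vote, list of sub-transactions prepared to commit).
    """
    prepared = [t for t in provisional
                if not has_aborted_ancestor(t, parent, abort_list)]
    # Vote Yes iff at least one descendant survives the ancestor check.
    return (len(prepared) > 0), prepared
```

For the running example, a participant holding T12 and T21 with abortList = {T11, T2} prepares T12 (ancestors T1, T are fine) but drops T21 (its parent T2 aborted), and still votes Yes.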

  23. Transaction Recovery • Recovery is concerned with: • Object (data) durability: saved permanently • Failure Atomicity: effects are atomic even when servers crash • Recovery Manager’s tasks • To save objects (data) on permanent storage for committed transactions. • To restore server’s objects after a crash • To maintain and reorganize a recovery file for an efficient recovery procedure. • To collect freed storage (garbage collection)
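The restore-after-crash task can be sketched as replaying a recovery file: only the intention lists of committed transactions are applied to the recovered objects. The tuple encoding of entries and the function name recover are assumptions standing in for the on-disk format described on the next slides.

```python
def recover(entries):
    """Rebuild object values from an in-memory stand-in for the recovery file.

    entries -- log-ordered (kind, trans, data) tuples, where kind is
               "prepared" (data: {object name: new value}) or "committed"
    """
    objects, intentions, status = {}, {}, {}
    for kind, trans, data in entries:
        if kind == "prepared":
            intentions[trans] = data
            status[trans] = "prepared"
        elif kind == "committed":
            status[trans] = "committed"
    # Apply only committed intention lists. Prepared-but-undecided
    # transactions must ask the coordinator via getDecision; transactions
    # that never prepared or committed are simply never applied.
    for trans, st in status.items():
        if st == "committed":
            objects.update(intentions[trans])
    return objects
```

In the slide-25 example, T1 (prepared then committed) is applied, while T2 (prepared only) is not, so recovery yields a: 80 and b: 220.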

  24. The Recovery File • The recovery file holds two kinds of entries: transaction entries, each recording a transaction status (e.g. T1: committed, T1: prepared, T2: prepared) together with an intention list of object references; and object entries, each recording an object name and its values. [Figure: layout of the recovery file, with transaction entries whose intention lists point into the object entries.]

  25. Example: Recovery File • Initial balances: a: 100, b: 200, c: 300. • Transaction T1: balance = b.getBalance(); b.setBalance(balance*1.1); a.withdraw(balance*0.1) — so b: 220 and a: 80. • Transaction T2: balance = b.getBalance(); b.setBalance(balance*1.1); c.withdraw(balance*0.1) — so b: 242 and c: 278. • The recovery file records the object versions (A 100, B 200, C 300, then B 220, A 80 for T1 and B 242, C 278 for T2) at positions p0–p7, together with the transaction entries T1: Prepared <A, P1> <B, P2>, T1: Commit, and T2: Prepared <B, P4> <C, P6>.

  26. The Recovery File for 2PC • In addition to the transaction entries (status plus intention list) and object entries (names and values) of the basic recovery file, the file for 2PC holds coordination entries: a coordinator entry for each transaction the server coordinates (e.g. Coor'd: T1 and Coor'd: T2, each with its participants list) and a participant entry for each transaction it participates in (e.g. Par'pant: T1 with its coordinator).
