530 likes | 734 Views
Distributed Transaction Processing. Some of the slides have been borrowed from courses taught at Stanford, Berkeley, Washington, and earlier version of CS 223 at UCI . Distributed Transaction. . Transaction T. Action: a 1 ,a 2. Action: a 3. Action: a 4 ,a 5.
E N D
Distributed Transaction Processing Some of the slides have been borrowed from courses taught at Stanford, Berkeley, Washington, and earlier version of CS 223 at UCI
Distributed Transaction . Transaction T Action: a1,a2 Action: a3 Action: a4,a5 Transaction T spans multiple databases Notes 11
Distributed Transactions Processing • Each DBMS performs locking & logging for recovery • How to rollback T? • easy. Each DBMS can rollback independently. • How to commit T? • If DBMS1 commits T, and DBMS 2 experiences a a failure before REDO logs of T persistent, T cannot be committed at DBMS2. • Loss of atomicity. • Atomic Commit Protocols • Protocol guarantees all sites make the same decision – commit, or abort!
Atomic Commit Protocol • Guarantee of Atomic commit protocol: • Protocol resilient to communication and site failures • despite failures, if all failures repaired, then transactions commits or aborts at all sites. • Metrics to compare different protocols: • Overhead: I/O (number of records written to disk), network (number of messages– when no failures, when there are failures). • Latency: number of rounds of messages, amount of forced I/O to disk. • Blocking: unbounded waiting due to failures. • Recovery Complexity-- how complex is the termination protocol • Most common ACP: • Two-phase commit (2PC) • Centralized 2PC, Distributed 2PC, Linear 2PC, … • Three Phase Commit (3PC) Notes 11
Terminology • Resource Managers (RMs) • Usually databases • Participants • RMs that did work on behalf of transaction • Coordinator • Component that runs two-phase commit on behalf of transaction Notes 11
States of the Transaction • At Coordinator: • Initiated (I) -- transaction known to system • Preparing (P) -- prepare message sent to participants • committed (C) -- has committed • Aborted (A) -- has aborted • At participant: • Initiated (I) • Prepared (P) -- prepared to commit, if the coordinator so desires • committed (C) • Aborted (A) Notes 11
REQUEST-TO-PREPARE PREPARED* COMMIT* Participant DONE Local Prepare Work. Write prepare on logs (Forced) Local Prepare Work. (lazy) Write Commit record on logs (Forced) Local commit work. Write completion log record on logs. Ack when durable. (effectively forced, though not immediately) Write Completion log record (lazy) 1 2 Coordinator 3 4 5 Notes 11
Protocol Database • Coordinator maintains a protocol database (in main memory) for each transaction • Protocol database • enables coordinator to execute 2PC • answers inquiries by participants about status of transaction • cohorts may make such inquiries if they fail during recovery • entry for transaction deleted when coordinator is sure that no one will ever inquire about transaction again (when it has been acked by all the participants) Notes 11
Two phase commit -- normal actions commit case (coordinator) • make entry into protocol database for transaction marking its status as initiated when coordinator first learns about transaction • Add participant to the cohort list in protocol database when coordinator learns about the cohorts • Change status of transaction to preparing before sending prepare message. (it is assumed that coordinator will know about all the participants before this step) • On receipt of PREPARE message from cohort, mark cohort as PREPARED in the protocol database. If all cohorts PREPARED, then change status to COMMITTED and send COMMIT message. • must force a commit log record to disk before sending commit message. • on receipt of ACK message from cohort, mark cohort as ACKED. When all cohorts have acked, then delete entry of transaction from protocol database. • Must write a completed log record to disk before deletion from protocol database. No need to force the write though. Notes 11
Two Phase Commit - normal actions commit case (participant) • On receipt of PREPARE message, write PREPARED log record before sending PREPARED message • needs to be forced to disk since coordinator may now commit. • On receipt of COMMIT message, write COMPLETION log record before sending ACK to coordinator • cohort must ensure log forced to disk before sending ack -- but no great urgency for doing so. Notes 11
REQUEST-TO-PREPARE NO ABORT* Participant Local Prepare Work. Write prepare on logs (Forced) if voting yes. Else, local abort work. Write Abort log record (lazy) Local Prepare Work. (lazy) Write Abort log record (forced) Local abort work Write Abort record on logs. Ack when done (forced, though not immediately) Write completion log record (lazy) 1 2 Coordinator 3 4 DONE 5 Notes 11
Timeout actions • At various stages of protocol, transaction waits from messages at both coordinator and participants. • If message not received, on timeout, timeout action is executed: • Coordinator Timeout Actions • waiting for votes of participants: ABORT transaction, send aborts to all, delete from protocol database • waiting for ack from some participant: forward the transaction to recovery process that periodically will periodically send COMMIT to participant. When participant will recover, and all participants send an ACK, coordinator writes a completion log record and deletes entry from protocol database. • Cohort timeout actions: • waiting for prepare: abort the transaction, write abort log record, send abort message to coordinator. Alternatively, it could wait for the coordinator to ask for prepareand then vote NO. • Waiting for decision: forward transaction to recovery process. Recovery process periodically executes status-transaction call to the coordinator. Such a transaction is blocked for recovery of failure. • NOTE: The recovery process could have used a different termination protocol -- e.g., polling other participants to reduce blocking. (cooperative Termination) Notes 11
2PC is blocking Sample scenario: Coord P2 W P1 P3 W P4 W Notes 11
coord P2 w P3 P1 w w P4 Case I: P1“W”; coordinator sent commits P1“C” Case II: P1 NO; P1 A P2, P3, P4 (surviving participants) cannot safely abort or commit transaction Notes 11
Recovery Actions (cohort) • All sites execute REDO-UNDO pass • Detection: A site knows it is a cohort if it finds a prepared log record for a transaction • If the log does not contain a completion log record: • reacquire all locks for the transaction • ask coordinator for the status of transaction • The coordinator, if it has no information about the transaction in its protocol database in memory, it returns ABORT message. If it has information and transaction committed, it sends back a COMMIT. Else, if it has information but transaction not yet committed, it sends back a WAIT. • If log contains a completion log record • do nothing Notes 11
Recovery Action (coordinator) • If protocol database was made fault-tolerant by logging every change, simply reconstruct the protocol database and restart 2PC from the point of failure. • However, since we have only logged the commit and completion transitions and nothing else: • if the log does not contain a commit. Simply abort the transaction. If a cohort asks for status in the future, its status is not in the protocol database and it will be considered as aborted. • If commit log record, but no completion log record, • recreate transactions entry committed in the protocol database and the recovery process will ask all the participants if they are still waiting for a commit message. If no one is waiting, the completion entry will be written. • If commit log record + completion log record • do nothing. Notes 11
2PC analysis • Count number of messages, and log writes and number of forced log writes • Normal Processing overhead • Coordinator: 2 log writes (commit/Abort, complete) 1 forced + 2 messages per cohort • Cohort • 2 log writes both forced (prepared, committed/aborted) • 2 messages to coordinator • Various Optimizations to reduce overheads! Notes 11
Read-only Transaction • A read-only participant need only respond to phase one. It doesn’t care what the decision is. • It responds Prepared-Read-Only to Request-to-Prepare, to tell the coordinator not to send the decision • Limitation - All other participants must be fully terminated, since the read-only participant will release locks after voting. • No more testing of SQL integrity constraints • No more evaluation of SQL triggers
Presumed Abort • After a coordinator decides Abort and sends Abort to participants, it forgets about T immediately. • Participants don’t acknowledge Abort (with Done) Coordinator Participant Log Start2PC Request-to-Prepare Log prepared Prepared Log abort (forget T) Commit Log abort (forget T) • If a participant times out waiting for the decision, it asks the coordinator to retry. • If the coordinator has no info for T, it replies Abort. • Note: Lots of savings when transaction aborts!
Transfer of Coordination If there is one participant, you can save a round of messages 1. Coordinator asks participant to prepare and become the coordinator. 2. The participant (now coordinator) prepares, commits, and tells the former coordinator to commit. 3. The coordinator commits and replies Done. Coordinator Participant Log prepared Request-to-Prepare-and -transfer-coordination Log committed Log commit Commit Done • Supported by Transarc Encina, but not in any standards.
Reducing Blocking of 2PC • 2PC results in blocking when the cohort is in a prepared state • Blocked transactions hold onto resources causing increased contention. • What can we do to reduce blocking?
Heuristic Commit • Suppose a participant recovers, but the termination protocol leaves T blocked. • Operator can guess whether to commit or abort • Must detect wrong guesses when coordinator recovers • Must run compensations for wrong guesses • Heuristic commit • If T is blocked, the local resource manager (actually, transaction manager) guesses • At coordinator recovery, the transaction managers jointly detect wrong guesses.
Cooperative Termination Protocol (CTP) • Assume coordinator includes a list of participants in Request-to-Prepare. • If a participant times-out waiting for the decision, it runs the following protocol. 1. Participant P sends Decision-Req to other participants 2. If participant Q voted No or hasn’t voted or received Abort from the coordinator, it responds Abort 3. If participant Q received Commit from the coordinator, it responds Commit. 4. If participant Q is uncertain, it responds Uncertain(or doesn’t respond at all). • If all participants are uncertain, then P remains blocked.
Cooperative Termination Issues • Participants don’t know when to forget T, since other participants may require CTP • Solution 1 - After receiving Done from all participants, coordinator sends End to all participants • Solution 2 - After receiving a decision, a participant may forget T any time. • To ensure it can run CTP, a participant should include the list of participants in the vote log record.
Is there a non-blocking protocol? Theorem: If communications failure or total site failures (i.e., all sites are down simultaneously) are possible, then every atomic protocol may cause processes to become blocked. Two exceptions: if we ignore communication failures, it is possible to design such a protocol (Skeen et. al. 83) If we impose some restrictions on transactions (I.e., what data they can read/write) such a protocol can also be designed (Mehrotra et. al. 92) Notes 11
Next… • Three-phase commit (3PC) • Nonblocking if reliable network (no communications failure) and no total site failures • Handling communications failures Notes 11
Why 2PC blocks? • Since operational site on timeout in prepare state does not know if the failed site(s) had committed or aborted the transaction. • Polling all operational sites does not work since all the operational sites might be in doubt. Notes 11
Approach to Making ACP Non-blocking • For a given state S of a transaction T in the ACP, let the concurrency set of S be the set of states that other sites could be in. • For example, in 2PC, the concurrency set of PREPARE state is {PREPARE, ABORT, COMMIT} • To develop non-blocking protocol, one needs to ensure that: • concurrency set of a transaction does not contain both a commit and an abort • There exists no non-committable state whose concurrency set contains a commit. A state is committable if occupancy of the state by any site implies everyone has voted to commit the transaction. • Necessity of these conditions illustrated by considering a situation with only 1 site operational. If either of the above violated, there will be blocking. • Let S be a state with concurrency set of both commit and abort. If last node is in state S, it cannot commit or abort since we do not know the state of others. • Let S be a non-committable state whose concurrency set includes commit . If last node is in state S, it cannot ABORT unilaterally (since S’s concurrency set contains commit). It cannot commit either, since not all sites have voted to commit –presumably one may vote NO. • Sufficiency illustrated by designing a termination protocol that will terminate the protocol correctly if the above assumptions hold. Notes 11
REQUEST-PREPARE PRECOMMIT PREPARED ACK COMMIT Coordinator Participant Log start-3PC record (participant list) Log prepared record (state W) Log commit record (state C) Log committed record (state C) Notes 11
REQUEST-PREPARE PRECOMMIT PREPARED ACK COMMIT Coordinator Participant 1. Timeout: abort 1. Timeout: Abort 2. Timeout Termination Protocol 2. Timeout: ignore 3. Timeout Termination Protocol Note: Timeout failure means the corresponding cohort/coordinator failed and not message failures which are assumed to not fail. Notes 11
Three Phase Commit - Termination Protocol • Choose a backup coordinator from the remaining operational sites. • Backup coordinator sends messages to other operational sites to make transition to its local state (or to find out that such a transition is not feasible) and waits for response. • Based on response as well as its local state, it continues to commit or abort the transaction. • It commits, if its concurrency set includes a commit state. Else, it aborts. Notes 11
Decision reached All sites learn decision Coordinator fails Start 3PC Termination Protocol • Only operational processes participate in termination protocol. • Recovered processes wait until decision is reached and • then learn decision Notes 11
PRECOMMIT PREPARED ACK COMMIT Coordinator Participant REQUEST-PREPARE Abortable (A) Uncertain (U) Precommitted (PC) Committed (C) Notes 11
Termination Protocol • Elect new coordinator • Use Election Protocol • New coordinator sends STATE-REQUEST to participants • Makes decision using termination rules • Communicates to participants Notes 11
STATE-REQUEST* ABORT* ABORTABLE Participant Coordinator Notes 11
STATE-REQUEST* COMMIT* COMMITTED Participant Coordinator Notes 11
STATE-REQUEST* ABORT* UNCERTAIN* Participant Coordinator Notes 11
STATE-REQUEST* PRECOMMIT* PRECOMMITTED, NO COMMITTED ACK* COMMIT* Participant Coordinator Notes 11
Termination Protocol Sample scenario: Coord P1 W P2 W P3 W Notes 11
Termination Protocol Sample scenario: Coord P1 W P2 W P3 PC Notes 11
Note: 3PC unsafe with communication failures! P W P W W abort commit Notes 11
Replication • Replicate Data at multiple sites in a distributed system • Why? • Better Availability • Better response time • Better throughput • Key Issues • How to ensure consistency • How to propagate updates
Basic Technique • Treat replicated data just as ordinary data. Ensure 1 copy serializability using 2PL, 2PC combination • Locking Protocol • Reads lock all copies, writes lock all copies • Reads all one copy, writes lock all copies • Quorum based protocols – assign weights to each copy, ensure read and write quorums conflict • Majority voting • Primary Copy • Propogation • Eager – at time of operation • Deferred – at commit time
Weaker Consistency Models • Single Master Replication • Update transactions execute at Master copy • Logs shipped from master to backups and used to recreate the state of the database • Read-only transactions could execute over “older” transaction consistent state of the data. • If primary fails, backups could take over transaction processing. • Multimaster Replication • Update anywhere, read anywhere • Could result in inconsistent state of the database. • Reconciliation in case of inconsistency.