Concurrency Control and Recovery
• In real life: users access the database concurrently, and systems crash.
• Concurrent access to the database also improves performance and yields better utilization of resources.
• BUT: if not careful, concurrent access can lead to incorrect database states. Crashes can also leave the database in incoherent states.
• Basic concurrency/recovery concept: the transaction, executed atomically. All or nothing.
• We cover: transactions in SQL, and the implementation of transactions and recovery.
Flight Reservation

/* get values for :flight, :date, :seat */
EXEC SQL SELECT occupied INTO :occ
         FROM Flights
         WHERE fltNum = :flight AND fltdt = :date AND fltSeat = :seat;
if (!occ) {
    EXEC SQL UPDATE Flights
             SET occupied = 'true'
             WHERE fltNum = :flight AND fltdt = :date AND fltSeat = :seat;
    /* more code missing */
} else {
    /* notify customer that seat is not available */
}
Problem #1
• Customer 1 finds a seat empty.
• Customer 2 finds the same seat empty.
• Customer 1 reserves the seat.
• Customer 2 reserves the same seat.
• Customer 1 will not be happy: this interleaving is not serializable.
Bank Transfers

/* Transfer :amount from :account1 to :account2 */
EXEC SQL SELECT balance INTO :balance1
         FROM Accounts
         WHERE acctNo = :account1;
if (balance1 >= amount) {
    EXEC SQL UPDATE Accounts
             SET balance = balance + :amount
             WHERE acctNo = :account2;
    EXEC SQL UPDATE Accounts
             SET balance = balance - :amount
             WHERE acctNo = :account1;
}

Crash... (what if the system crashes between the two updates?)
Transactions • The user/programmer can group a sequence of commands so that • they are executed atomically and in a serializable fashion: • Transaction commit: all the operations should be done and recorded. • Transaction abort: none of the operations should be done. • In SQL: • EXEC SQL COMMIT; • EXEC SQL ROLLBACK; • Easier said than done...
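For example, the bank-transfer program from the earlier slide could be wrapped up as one transaction roughly as follows. This is a minimal embedded-SQL sketch in the same style as the slides' own examples; host-variable declarations and error handling are omitted.

/* Transfer :amount from :account1 to :account2, atomically */
EXEC SQL SELECT balance INTO :balance1
         FROM Accounts
         WHERE acctNo = :account1;
if (balance1 >= amount) {
    EXEC SQL UPDATE Accounts SET balance = balance + :amount
             WHERE acctNo = :account2;
    EXEC SQL UPDATE Accounts SET balance = balance - :amount
             WHERE acctNo = :account1;
    EXEC SQL COMMIT;        /* both updates become durable together */
} else {
    EXEC SQL ROLLBACK;      /* neither update is applied            */
}

If the system crashes between the two UPDATEs, the transaction never committed, so recovery will undo the first update: exactly the all-or-nothing behavior the previous slides were missing.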
ACID Properties Atomicity: all actions of a transaction happen, or none happen. Consistency: if a transaction is consistent, and the database starts from a consistent state, then it will end in a consistent state. Isolation: the execution of one transaction is isolated from other transactions. Durability: if a transaction commits, its effects persist in the database.
How Do We Assure ACID? Concurrency control: Guarantees consistency and isolation, given atomicity. Logging and Recovery: Guarantees atomicity and durability. If you are going to be in the logging business, one of the things that you’ll have to do is learn about heavy equipment. -- Robert VanNatta Logging History of Columbia County
More on SQL and Transactions • Read only transactions: • if the transaction is only reading, we can allow more operations • in parallel. • EXEC SQL SET TRANSACTION READ ONLY; • The default is: • SET TRANSACTION READ WRITE;
Dirty Data Data that has been written by a transaction that has not committed yet is called dirty data. Do we allow our transaction to read dirty data? It may go away… In SQL: SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED Note: default for READ UNCOMMITTED transactions is that they are READ ONLY.
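For instance, a long-running reporting query that can tolerate dirty data might be declared like this. This is a sketch in embedded SQL: :total is a hypothetical host variable, the SET TRANSACTION statement must come before the transaction's first SQL operation, and the exact syntax for combining an access mode with an isolation level varies a bit across DBMSs.

EXEC SQL SET TRANSACTION READ ONLY, ISOLATION LEVEL READ UNCOMMITTED;
EXEC SQL SELECT SUM(balance) INTO :total FROM Accounts;   /* may see uncommitted updates */
EXEC SQL COMMIT;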
Problems with Dirty Data
• Transfer program (move $N from account 1 to account 2):
• 1. Add $N to account 2.
• 2a. If account 1 has enough for the transfer,
• 2b. subtract $N from account 1, and commit.
• 2c. Otherwise, subtract the $N back out of account 2, and commit.
• Bad scenario: A1: $100, A2: $200, A3: $300
• T1: transfer $150 from A1 to A2
• T2: transfer $250 from A2 to A3.
• Events:
• T2 does step 1 -> A3 has $550
• T1 does step 1 -> A2 has $350
• T2 does step 2a; all is ok, because it reads A2's dirty (uncommitted) balance of $350.
• T1 does step 2a and finds that A1 doesn't have enough funds.
• T2 does step 2b -> A2 now has $100
• T1 does step 2c -> A2 now has -$50.
Concurrency Control Methods • Schedules • Serial schedules • Serializable schedules • Locking • Lock manager • 2 Phase Locking • Deadlocks: • Prevention • Detection
Schedules • A schedule is an interleaving of a set of actions • of different transactions, such that the actions of • any single transaction are in order. • A schedule represents some actual sequence of • database actions. • In a complete schedule, every transaction either • commits or aborts. • Initial state + Schedule -> Final state.
Acceptable Schedules • Serial schedules: • The transactions run one at a time, each from beginning to completion. • Note: there are many possible serial schedules. Each one is OK. The DBMS makes no guarantee about the order in which concurrently submitted transactions are executed. • Serializable schedules: • Final state is what some serial schedule would have produced.
Aborted Transactions • Slight modification to the definition: • A schedule is serializable if it is equivalent to a serial schedule • of committed transactions. • As if the aborted transactions never happened. • Two issues to consider w.r.t. aborted transactions: • how does one undo the effect of a transaction? • What if another transaction sees the effects of an aborted one?
Locks
• Concurrency control is usually done via locking.
• The lock manager maintains a list of entries:
• object identifier (can be page, record, etc.)
• number of transactions holding a lock on the object
• nature of the lock (shared or exclusive)
• pointer to a list of lock requests.
• Lock compatibility table (can a requested lock be granted, given the lock currently held?):

  requested \ held:   none   S     X
  S                   yes    yes   no
  X                   yes    no    no

• If a transaction cannot get a lock, it is suspended on a wait queue.
Handling Lock Requests
• Lock Request (XID, OID, Mode):
• Mode == S: if the object is not currently X-locked and the wait queue is empty, grant the lock; otherwise put the request on the queue.
• Mode == X: if the object is not currently locked at all, grant the lock; otherwise put the request on the queue.
• Granting new S requests only while the queue is empty keeps a stream of readers from starving a waiting X request.
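A minimal C sketch of that decision logic follows; the lock-table entry layout and the lock_request signature are illustrative assumptions, not any particular system's interface.

#include <stdlib.h>

typedef enum { MODE_S, MODE_X } LockMode;

typedef struct LockRequest {
    int xid;                     /* requesting transaction           */
    LockMode mode;               /* S or X                           */
    struct LockRequest *next;    /* next waiter, in FIFO order       */
} LockRequest;

typedef struct LockEntry {
    int oid;                     /* object id: page, record, ...     */
    int holders;                 /* number of transactions holding the lock */
    LockMode held_mode;          /* meaningful only if holders > 0   */
    LockRequest *wait_queue;     /* head of the wait queue           */
} LockEntry;

/* Returns 1 if the lock is granted immediately, 0 if the request was queued. */
int lock_request(LockEntry *e, int xid, LockMode mode) {
    if (mode == MODE_S) {
        /* Grant S only if no X lock is held and nobody is already waiting. */
        if ((e->holders == 0 || e->held_mode == MODE_S) && e->wait_queue == NULL) {
            e->held_mode = MODE_S;
            e->holders++;
            return 1;
        }
    } else { /* MODE_X */
        /* Grant X only if the object is completely unlocked. */
        if (e->holders == 0) {
            e->held_mode = MODE_X;
            e->holders = 1;
            return 1;
        }
    }
    /* Otherwise enqueue the request; the transaction is suspended until woken. */
    LockRequest *r = malloc(sizeof *r);
    r->xid = xid; r->mode = mode; r->next = NULL;
    LockRequest **tail = &e->wait_queue;
    while (*tail) tail = &(*tail)->next;
    *tail = r;
    return 0;
}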
Two-Phase Locking (2PL)
• 2 phase locking:
• if T wants to read an object, it first obtains an S lock.
• If T wants to write an object, it first obtains an X lock.
• If T releases any lock, it can acquire no new locks.
• Recall: all this is done transparently to the user by the DBMS.
• 2PL guarantees serializability! Why??
• (Figure: the number of locks a transaction holds over time grows during the growing phase, peaks at the lock point, then falls during the shrinking phase.)
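As a sketch of where the lock calls would sit in the flight-reservation code, assuming hypothetical lock_shared, lock_exclusive, and release_all_locks wrappers around the lock manager; releasing everything only at commit (or abort) is the strict variant of 2PL that most systems actually use.

/* Hypothetical wrappers around the lock manager, stubbed for illustration. */
void lock_shared(int xid, int oid)    { /* acquire S lock via lock_request  */ }
void lock_exclusive(int xid, int oid) { /* acquire X lock via lock_request  */ }
void release_all_locks(int xid)       { /* drop every lock held by xid      */ }

/* A transaction obeying (strict) 2PL: */
void reserve_seat(int xid, int seat_oid) {
    /* Growing phase: acquire each lock before the corresponding access.    */
    lock_shared(xid, seat_oid);           /* S lock before reading the seat */
    /* ... read: is the seat occupied? ...                                  */
    lock_exclusive(xid, seat_oid);        /* X lock before writing the seat */
    /* ... write: mark the seat occupied ...                                */
    /* Shrinking phase: after the first release, no new locks are allowed.  */
    /* Strict 2PL releases everything only at commit (or abort):            */
    release_all_locks(xid);
}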
Serializability Graphs
• Two actions of different transactions conflict if they access the same data item and at least one of them is a write.
• The precedence graph contains:
• A node for every committed transaction
• An arc from Ti to Tj if an action of Ti precedes and conflicts with an action of Tj.
• Example: T1 transfers $100 from A to B, T2 adds 6% to both A and B:
• R1(A), W1(A), R2(A), W2(A), R2(B), W2(B), R1(B), W1(B)
• (Figure: precedence graph with nodes T1 and T2; here there is an arc T1 -> T2 because of A and an arc T2 -> T1 because of B, i.e. a cycle.)
Conflict Serializability • 2 schedules are conflict equivalent if: • they have the same sets of actions, and • each pair of conflicting actions is ordered in the same way. • A schedule is conflict serializable if it is conflict equivalent to a serial schedule. • Note: Some serializable schedules are not conflict serializable! • Theorem: A schedule is conflict serializable iff its precedence graph is acyclic. • Theorem: 2PL ensures that the precedence graph will be acyclic!
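A small C sketch of the acyclicity test: build the precedence graph from the conflicting pairs of a schedule, then run a depth-first search looking for a back edge. The schedule encoding and the fixed array sizes are illustrative assumptions; the schedule itself is the one from the previous slide.

#include <stdio.h>

#define MAX_TXN 10

/* One action of a schedule: which transaction, read or write, which item. */
typedef struct { int txn; char op; char item; } Action;

int edge[MAX_TXN][MAX_TXN];   /* edge[i][j] = 1 iff Ti must precede Tj          */
int color[MAX_TXN];           /* 0 = unvisited, 1 = in progress, 2 = finished   */

/* DFS that reports a cycle if it revisits a node that is still in progress. */
int has_cycle(int u) {
    color[u] = 1;
    for (int v = 0; v < MAX_TXN; v++)
        if (edge[u][v]) {
            if (color[v] == 1) return 1;
            if (color[v] == 0 && has_cycle(v)) return 1;
        }
    color[u] = 2;
    return 0;
}

int main(void) {
    /* R1(A) W1(A) R2(A) W2(A) R2(B) W2(B) R1(B) W1(B) -- the slide's schedule */
    Action s[] = { {1,'R','A'}, {1,'W','A'}, {2,'R','A'}, {2,'W','A'},
                   {2,'R','B'}, {2,'W','B'}, {1,'R','B'}, {1,'W','B'} };
    int n = sizeof s / sizeof s[0];

    /* Two actions conflict if they touch the same item, come from different
       transactions, and at least one of them is a write.                    */
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (s[i].item == s[j].item && s[i].txn != s[j].txn &&
                (s[i].op == 'W' || s[j].op == 'W'))
                edge[s[i].txn][s[j].txn] = 1;

    int cyclic = 0;
    for (int t = 0; t < MAX_TXN; t++)
        if (color[t] == 0 && has_cycle(t)) cyclic = 1;

    printf(cyclic ? "not conflict serializable\n" : "conflict serializable\n");
    return 0;
}

On the slide's schedule this prints "not conflict serializable": T1 precedes T2 on A, but T2 precedes T1 on B, so the precedence graph has a cycle.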
Deadlocks • Suppose we have the following scenario: • T1 asks for an exclusive lock on A • T2 asks for an exclusive lock on B • T1 asks for a shared lock on B • T2 asks for a shared lock on A • Both T1 and T2 are waiting! We have a DEADLOCK. • Possible solutions: • Prevent deadlocks to start with, or • Detect when they happen and do something about it.
Deadlock Prevention • Give each transaction a timestamp. “Older” transactions have • higher priority. • Assume Ti requests a lock, but Tj holds a conflicting lock. • We can follow two strategies: • Wait-die: if Ti has higher priority, it waits; else Ti aborts. • Wound-wait: if Ti has higher priority, abort Tj; else Ti waits. • Note: after aborting, restart with original timestamp! Both strategies guarantee deadlock-free behavior!
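A compact C sketch of the two policies; the Txn struct, the waits/die stubs, and the convention that a lower timestamp means older (higher priority) are assumptions for illustration.

#include <stdio.h>

typedef struct { int xid; long ts; } Txn;   /* ts assigned when the txn first starts */

/* Hypothetical hooks into the lock manager, stubbed out for illustration. */
static void waits(Txn *ti, Txn *tj) { printf("T%d waits for T%d\n", ti->xid, tj->xid); }
static void die(Txn *t)             { printf("T%d aborts; restarts with its original ts\n", t->xid); }

/* Ti requests a lock that Tj holds in a conflicting mode. */
void wait_die(Txn *ti, Txn *tj)   { if (ti->ts < tj->ts) waits(ti, tj); else die(ti); }     /* older waits, younger dies   */
void wound_wait(Txn *ti, Txn *tj) { if (ti->ts < tj->ts) die(tj);       else waits(ti, tj); } /* older wounds, younger waits */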
An Alternative to Prevention • In theory, deadlock can involve many transactions: • T1 waits-for T2 waits-for T3 ...waits-for T1 • In practice, most “deadlock cycles” involve only 2 • transactions. • Don’t need to prevent deadlock! • What’s the problem with prevention? • Allow it to happen, then notice it and fix it. • Deadlock detection.
Deadlock Detection • Lock Manager maintains a “Waits-for” graph: • Node for each transaction. • Arc from Ti to Tj if Tj holds a lock and Ti • is waiting for it. • Periodically check graph for cycles. • “Shoot” some transaction to break the cycle. • Simpler hack: time-outs. • T1 made no progress for a while? Shoot it.
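A sketch of the time-out hack in C; the threshold, the per-transaction bookkeeping, and the shoot stub are illustrative assumptions. A real waits-for cycle check would reuse the same depth-first search shown earlier for precedence graphs.

#include <time.h>

#define DEADLOCK_TIMEOUT 10            /* seconds without progress before we give up */

typedef struct {
    int    xid;
    int    waiting;                    /* 1 if currently blocked on a lock           */
    time_t last_progress;              /* updated whenever the txn does real work    */
} TxnState;

/* Abort the chosen victim; a real system would release its locks and wake waiters. */
static void shoot(TxnState *t) { t->waiting = 0; /* ... abort and restart t ... */ }

/* Called periodically by the lock manager. Note it may shoot a transaction that
   is merely slow, not actually deadlocked.                                       */
void check_timeouts(TxnState *txns, int n) {
    time_t now = time(NULL);
    for (int i = 0; i < n; i++)
        if (txns[i].waiting && now - txns[i].last_progress > DEADLOCK_TIMEOUT)
            shoot(&txns[i]);
}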
Detection Versus Prevention • Prevention might abort too many transactions. • Detection might allow deadlocks to tie up resources for a while. • Can detect more often, but it’s time-consuming. • The usual answer: • Detection is the winner. • Deadlocks are pretty rare. • If you get a lot of deadlocks, reconsider your schema/workload!
Review: ACID Properties Atomicity: all actions of a transaction happen, or none happen. Consistency: if a transaction is consistent, and the database starts from a consistent state, then it will end in a consistent state. Isolation: the execution of one transaction is isolated from other transactions. Durability: if a transaction commits, its effects persist in the database. The Recovery Manager guarantees Atomicity & Durability.
Motivation for Recovery
• Atomicity:
• Transactions may abort (“Rollback”).
• Durability:
• What if DBMS stops running? (Causes?)
• Desired behavior after system restarts:
• T1, T2 & T3 should be durable.
• T4 & T5 should be aborted (effects not seen).
• (Figure: timelines of T1-T5 against a crash point; T1, T2 and T3 committed before the crash, while T4 and T5 were still running when it happened.)
Handling the Buffer Pool
• Force every write to disk?
• Poor response time.
• But provides durability.
• Steal buffer-pool frames from uncommitted transactions?
• If not, poor throughput.
• If so, how can we ensure atomicity?
• The four combinations:

              No Steal    Steal
  Force       Trivial
  No Force                Desired
Basic Idea: Logging • Record REDO and UNDO information, for every update, in a log. • Sequential writes to log (put it on a separate disk). • Minimal info (difference) written to log, so multiple updates fit in a single log page. • Log: An ordered list of REDO/UNDO actions • Log record contains: <XID, pageID, offset, length, old data, new data> • and additional control information. • The Write-Ahead Logging Protocol: • Must force the log record for an update before the corresponding data page gets to disk. • Must write all log records for a transaction before commit.
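As a rough C sketch of the log record layout and the two WAL rules; the field sizes, the global flushedLSN, and the flush_log_up_to stub are illustrative assumptions, not a real system's interface.

#include <stdint.h>

typedef struct {
    uint64_t lsn;            /* log sequence number, assigned in increasing order */
    int      xid;            /* transaction that made the update                  */
    int      pageID, offset, length;
    char     old_data[64];   /* UNDO information (before image)                   */
    char     new_data[64];   /* REDO information (after image)                    */
} LogRecord;

uint64_t flushedLSN;         /* max LSN known to be safely on the log disk        */

/* Force the log, up to and including lsn, to disk (stubbed for illustration). */
void flush_log_up_to(uint64_t lsn) { if (lsn > flushedLSN) flushedLSN = lsn; }

/* Rule 1: a dirty data page may be written to disk only after the log records
   for its updates are on disk, i.e. only when pageLSN <= flushedLSN.          */
void write_page(int pageID, uint64_t pageLSN) {
    if (pageLSN > flushedLSN)
        flush_log_up_to(pageLSN);
    /* ... now the buffer manager may write page pageID itself ... */
}

/* Rule 2: a transaction is reported committed only after all of its log
   records, ending with its commit record, have reached the log disk.          */
void commit_txn(int xid, uint64_t commit_record_lsn) {
    flush_log_up_to(commit_record_lsn);
    /* ... only now acknowledge xid's commit to the user ... */
}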
WAL & the Log
• Each log record has a unique Log Sequence Number (LSN).
• LSNs always increasing.
• Each data page contains a pageLSN:
• the LSN of the most recent log record for an update to that page.
• System keeps track of flushedLSN:
• the max LSN flushed so far.
• WAL: before a page is written to disk, pageLSN <= flushedLSN must hold.
• (Figure: log records up to flushedLSN are on the log disk; the “log tail” with larger LSNs is still in RAM, and each data page in the DB carries its pageLSN.)
Recovery
• Three steps (à la ARIES), starting from a checkpoint:
• Analysis: figure out which transactions committed since the checkpoint, and which failed.
• REDO all actions in the log.
• UNDO the effects of failed transactions.
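A highly simplified C sketch of the three passes. The record layout, the log-scanning helpers, and the loser bookkeeping are hypothetical (left as declarations), and real ARIES additionally maintains a dirty page table, writes compensation log records during UNDO, and uses pageLSNs to avoid redundant REDO.

#include <stdint.h>

typedef enum { REC_UPDATE, REC_COMMIT } RecType;
typedef struct { uint64_t lsn; int xid; RecType type; /* + page, offset, images */ } Rec;

/* Hypothetical hooks into the stable log and the transaction table. */
int  next_record_after(uint64_t lsn, Rec *out);   /* forward scan of the log      */
int  prev_loser_record(Rec *out);                 /* backward scan, losers only   */
void mark_committed(int xid);
int  is_committed(int xid);
void reapply(const Rec *r);                       /* redo using the after image   */
void undo(const Rec *r);                          /* undo using the before image  */

void recover(uint64_t checkpoint_lsn) {
    Rec r;
    /* 1. Analysis: which transactions committed since the checkpoint, which failed? */
    for (uint64_t lsn = checkpoint_lsn; next_record_after(lsn, &r); lsn = r.lsn)
        if (r.type == REC_COMMIT) mark_committed(r.xid);

    /* 2. REDO: repeat history by reapplying every logged update, winners and losers alike. */
    for (uint64_t lsn = checkpoint_lsn; next_record_after(lsn, &r); lsn = r.lsn)
        if (r.type == REC_UPDATE) reapply(&r);

    /* 3. UNDO: walk backwards and roll back the updates of transactions that never committed. */
    while (prev_loser_record(&r))
        if (!is_committed(r.xid)) undo(&r);
}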
Summary • Users access the database concurrently, and sometimes there are • crashes. • Transactions are sets of operations that are guaranteed to be atomic. • The DBMS guarantees: Atomicity, Consistency, Isolation, • Durability. • Isolation and consistency are guaranteed via locking: 2-phase • (need special care for deadlocks). • Atomicity and durability are guaranteed by: • Logging • Recovery manager (that uses the log). There are MANY MANY more missing details!