390 likes | 534 Views
CS411 Database Systems. 11: Transaction Management. A transaction is a sequence of operations that must be executed as a whole. Every DB action takes place inside a transaction. BUSEY SAVINGS Winslett $ 1000 . BUSEY CHECKING Winslett $0 . Transfer $500. Debit savings Credit checking.
E N D
CS411Database Systems 11: Transaction Management
A transaction is a sequence of operations that must be executed as a whole. Every DB action takes place inside a transaction. BUSEY SAVINGS Winslett $1000 BUSEY CHECKING Winslett $0 Transfer $500 Debit savings Credit checking Either both (1) and (2) happen or neither!
We abstract away most of the application code when thinking about transactions. User’s point of view Transfer $500 Code writer’s point of view Debit savings Credit checking Read Balance1 Write Balance1 Read Balance2 Write Balance2 Concurrency control ‘s & recovery’s point of view
Schedule: The order of execution of operations of two or more transactions. Schedule S1 Transaction1 Transaction2 R(A) R(C) W(A) R(B) W(C) R(B) W(B) W(B) Read data item A (typically a tuple) Time
Why do we need transactions? Transaction 1: Add $100 to account A Transaction 2: Add $200 to account A R(A) W(A) R(A) W(A) Time
What will be the final account balance? Transaction 1: Add $100 to account A Transaction 2: Add $200 to account A R(A) W(A) R(A) W(A) Time The Lost Update Problem
What will be the final account balance? Transaction 1: Add $100 to account A Transaction 2: Add $200 to account A R(A) W(A) F A I L R(A) W(A) Time Dirty reads cause problems.
Abort or roll back are the official words for “fail”. Commit All your writes will definitely absolutely be recorded and will not be undone, and all the values you read are committed too. Abort/rollback Undo all of your writes! We’ll give a more technical definition later on.
The concurrent execution of transactions must be such that each transaction appears to execute in isolation.
The ACID properties: Atomic • A transaction is performed entirely or not at all Consistency Preservation • A transaction’s execution takes the database from one correct state to another Isolation • The updates of a transaction must not be made visible to other transactions until it is committed Durability • If a transaction changes the database and is committed, the changes must never be lost because of subsequent failure
Serial Schedule: the transactions are performed one after the other. Read(A) Write(A) Read(B) Write(B) Read(C)Write(C)Read(B) Write(B) Read(A) Write(A) Read(B) Write(B) Read(C)Write(C)Read(B) Write(B)
A schedule is serializable if it is guaranteed to give the same final result as some serial schedule. Which of these are serializable? Read(A) Read(A) Write(A) Write(A) Read(B) Write(B) Read(B) Write(B) Read(A) Read(A) Write(A) Write(A) Read(B) Write(B) Read(B) Write(B) Read(A) Write(A) Read(A) Write(A) Read(B) Write(B) Read(B) Write(B)
Why is that last schedule ok? Read(A) Write(A) Read(A) Write(A) Read(B) Write(B) Read(B) Write(B) Read(A) Write(A) Read(B) Write(B) Read(A) Write(A) Read(B) Write(B) And this will never cause an interleaving problem Will always give the same result as
Our missing sailor Jim Gray got his Turing Award for figuring out how to tell whether a schedule is serializable.
Certain pairs of operations conflict with each other. Write(A) Write(A) Write(A) Read(A) Write(A) Write(A) Read(A) Write(A) Read(A) Read(A) Write(A) Write(A) Read(B) Write(B) Read(B) Write(B) Read(A) Read(A) Write(A) Write(A) Read(B) Write(B) Read(B) Write(B) Read(A) Write(A) Read(A) Write(A) Read(B) Write(B) Read(B) Write(B)
Let’s clean up by collapsing all the nodes for a transaction into one big node. Read(A) Read(A) Write(A) Write(A) Read(B) Write(B) Read(B) Write(B) Read(A) Read(A) Write(A) Write(A) Read(B) Write(B) Read(B) Write(B) Read(A) Write(A) Read(A) Write(A) Read(B) Write(B) Read(B) Write(B) Blue Black Blue Black Blue Black
Let’s clean up by collapsing all the nodes for a transaction into one big node. A schedule is conflictserializableiff its dependency graph is acyclic. Read(A) Read(A) Write(A) Write(A) Read(B) Write(B) Read(B) Write(B) Read(A) Read(A) Write(A) Write(A) Read(B) Write(B) Read(B) Write(B) Read(A) Write(A) Read(A) Write(A) Read(B) Write(B) Read(B) Write(B) Blue Black Blue Black Blue Black
Is this schedule conflict serializable? T1 T2 T3 T1 T2 T3 R2(C) W2(C) R2(B) W2(B) R1(A) W1(A) R1(C) W1(C) R3(C) W3(C) R1(A) R2(C) W2(C) W1(A) R1(C) W1(C) R3(C) W3(C) R2(B) W2(B) Schedule S2 Schedule S1
What serial schedule is S1 equivalent to, if any? S2 S1 R(A) W(A) R(C) W(C) R)C( W(B) R(A) R(B) W(B) W(B) R(A) W(A) R(A) R(B) R(C) W(B) W(C) R)C(
What serial schedule is S1 equivalent to, if any? S1 R(A) W(A) R(A) R(C) R(C) W(A) W(C) R)A( Nothing
Locks are the basis of most protocols to guarantee serializability. Before you can read (write) item A, you must get a read (write) lock on A from the lock manager, which you eventually give up. Shared lock = read lock: many transactions can hold a shared lock on the same item at the same time. Exclusive lock = write lock: only one transaction can hold an exclusive lock on an item at any given time.
The lock manager makes decisions based on its compatibility matrix S X X N N S N Y
Lock management keeps a hash table of current locks Locking and unlocking must be atomic: ensured by semaphores Entry in the lock table Data item ID # trans who have it locked Lock type(S, X) Queue of requests Trans47 has lock Trans 109 has lock Trans 448 wants X lock Trans 599 wants S lock Queue to lock a particular data item
T3 T2 T1 2-Phase Locking (2PL): no new locks once you’ve given one up. Two phases: getting locks (growing), then releasing locks (shrinking). At least one end of each arrow is a Write R/W(A) R/W(A) R/W(B) R/W(B) R/W(C) R/W(C)
2PL doesn’t solve every potential problem. T1 T2 We should never have let T1 commit. W(B) R(A) W(A) R(A) R(B) W(B) T1 commits Cascading rollback Now T2 aborts!
How do we deal with this? Commit trans T only after all transactions that wrote data that T read have committed Or only let a transaction read an item after the transaction that last wrote this item has committed Strict 2PL: 2PL + a transaction releases its locks only after it has committed. How does Strict 2PL prevent cascading rollback?
Locks can also lead to deadlocks Deadlock = circular wait among transactions waiting for each other to release a lock. • Prevent: every transaction must lock all items it needs in advance (ha!) • Or detect Livelock: a transaction cannot proceed for an indefinite period of time while other transactions in the system continue normally. • Fair waiting schemes (i.e., first-come-first-served) for locks T2 T1
Detect deadlocks by timeouts, or occasionally looking for cycles in a waits-for graph. Edge from Ti to Tjiff Ti waits for Tj to release a lock T1: S(A) S(D)…………..S(B) T2: …………………X(B) ……………………………………X(C) T3: …………………………………. S(D)S(C) ………………...X(A) T4: …………………………………………………… X(B) T1 T2 T3 T4
Even S2PL does not solve the phantom problem. Schedule Non-repeatable read! Would never happen in serial execution. Find the oldest CS undergrad Find the oldest CS undergrad Elaine Add Frank, a 71-year-old CS major; commit. Frank CS 71 B-tree on Major Frank
Solution: lock the index used to reach the tuples. Else must lock the whole table to prevent new records with major = CS being added. Index Put a lock here. CS
How do you lock an index anyway? • E.g., lock a B-tree node • Different locking protocol (not 2PL, S2PL) designed specifically for traversing trees. Why?
Locks aren’t the only way to guarantee serializability Locking is good when conflicts are likely Timestamp-based approaches are possible • Nice if conflicts are rare (e.g., working in parallel as a team on separate data) • Otherwise, lots of transactions will abort • Details in the book
What exactly are we locking, anyway? The most common data lock granularities: • Record • Page • Table If you access a small number of records, the best locking granularity is one record. If you access many records of the same table, best to have coarser granularity locks. Because you need to rearrange it when you add/delete/expand/contract tuples
Real DBMS lock matrix: ~ 20 x 20 S X X N N S N Y
Example extra lock mode: increment Increments and decrements commute with each other. Very useful for banks, but requires atomic read+update. T1: x=R(A), W(A=x+1), z=R(A), W(z=z+1) T2: y=R(A), W(A=y-1) Why bother with this at all???
Intension locks are very common Scan through the table, intending to modify a few tuples. What kind of lock to put on the table? Has to be X (if we only have S or X). But, blocks all other read requests!
Intention locks IS, IX increase concurrency. E.g., the whole relation • Before S locking an item, must IS lock the root. • Before X locking an item, must IX lock the root. • Should make sure: • If T1 S-locks a node, no T2 can X-lock an ancestor. • Achieved if S conflicts with IX. • If T1 X-locks a node, no T2 can S- or X-lock an ancestor. • Achieved if X conflicts with IS and IX. E.g., a tuple
Allowed Sharings IS IX S X -- Ö Ö Ö Ö Ö -- IS Ö Ö Ö Ö IX Ö Ö Ö Ö S Ö Ö Ö X