Controlled concurrency. Now we start looking at what kind of concurrency we should allow. We first look at uncontrolled concurrency and see what happens, using 3 bad examples. We then look at how we can understand whether concurrency is OK or not, and finally at how to control concurrency.
Controlled concurrency • Now we start looking at what kind of concurrency we should allow • We first look at uncontrolled concurrency and see what happens • We look at 3 bad examples • We then look at how we can understand whether concurrency is OK or not. • Then we look at how to control concurrency
FIGURE 21.3 (a) : The lost update problem. This occurs when two transactions that access the same database items have their operations interleaved in a way that makes the value of some database item incorrect. • Eg: X = 20, Y = 15, M = 2, N = 3
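A minimal sketch of the lost update on X, using the example values above. The figure is not reproduced here, so the transaction bodies are an assumption: T1 is taken to subtract N from X and T2 to add M to X (the part of T1 that touches Y is left out).

```python
# Assumed transaction bodies (hypothetical, the figure is not shown here):
#   T1: read X, X = X - N, write X
#   T2: read X, X = X + M, write X

def serial(x, m, n):
    """T1 runs to completion, then T2."""
    x = x - n
    x = x + m
    return x

def interleaved(x, m, n):
    """Both transactions read X before either one writes it."""
    t1_local = x          # r1(X)
    t2_local = x          # r2(X): still sees the original value
    x = t1_local - n      # w1(X)
    x = t2_local + m      # w2(X): overwrites T1's update
    return x

print(serial(20, 2, 3))       # 19
print(interleaved(20, 2, 3))  # 22, T1's subtraction is lost
```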
FIGURE 21.3 (b): The temporary update (dirty read) problem. This occurs when one transaction updates a database item and then fails, and the updated item is accessed by another transaction before it is changed back to its original value. • Eg: X = 20, Y = 15, M = 2, N = 3 • This raises issues of both concurrency and recovery
FIGURE 21.3 (c): The incorrect summary problem. If one transaction is calculating an aggregate summary function on a number of records while another transaction is updating some of these records, the aggregate function may calculate some values before they are updated and others after they are updated. • Eg: A = 2, N = 3, X = 10, Y = 8
Serial Schedules • Serial schedule: A schedule S is serial if, for every transaction T in the schedule, all operations of T are executed consecutively in S • i.e. all of one T has to finish before another T starts • Eg: T2 T1 T3 is serial • Otherwise, the schedule is called nonserial or interleaved schedule • S1 = r1(x), w1(x), r2(x), r2(y): serial: T1 T2 • S2 = r1(x), r2(x), w1(x), r2(y): interleaved
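The definition can be checked mechanically. The sketch below assumes schedules are written in the r1(x) / w1(x) notation used on this slide; `parse` and `is_serial` are illustrative helper names, not part of any library.

```python
import re

def parse(schedule):
    """Turn 'r1(x), w1(x)' into [('r', 1, 'x'), ('w', 1, 'x')]."""
    return [(kind, int(txn), item)
            for kind, txn, item in re.findall(r"([rw])(\d+)\((\w+)\)", schedule)]

def is_serial(schedule):
    """True if no transaction resumes after another one has started."""
    finished = set()
    current = None
    for _, txn, _ in parse(schedule):
        if txn != current:
            if txn in finished:
                return False      # txn was interrupted and is now resuming
            if current is not None:
                finished.add(current)
            current = txn
    return True

print(is_serial("r1(x), w1(x), r2(x), r2(y)"))  # True  (S1: T1 T2)
print(is_serial("r1(x), r2(x), w1(x), r2(y)"))  # False (S2: interleaved)
```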
Concurrency • How to deal with problems of inconsistency of data because of concurrency? • Like in the 3 examples we saw earlier • Only allow serial execution. Problem? • Wasteful: while T1 is doing I/O, T2 is forced to wait • Solution: Allow controlled concurrency • Allow when no conflict • Don’t allow when conflict • Now we see how to do “controlled concurrency”
Concurrency Eg– Figure 21.5 • Which of C, D should be allowed? • Eg: • X= 50 • M = 10 • N = 5
Different serial schedules • Will 2 different serial schedules always give the same results? • No: different serial schedules can give different results. Eg: • T1 = r(x), r(y), x = x + y, w(x) • T2 = r(x), r(y), y = x + y, w(y) • x = 20, y = 30 • Serial schedule T1T2: final values of X, Y? • Serial schedule T2T1: final values of X, Y? • Any serial execution is OK: why? • Otherwise we should not allow concurrency at all. • Eg: Suppose T1T2 OK, but T2T1 not OK: • All of T1 has to happen before all of T2 • Makes no sense to talk about T1 and T2 executing concurrently
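The two serial orders can be worked out by running them. This is a small sketch of the example above; the dictionary `db` and the function names are only illustrative.

```python
def t1(db):
    """T1: r(x), r(y), x = x + y, w(x)."""
    x, y = db["x"], db["y"]
    db["x"] = x + y

def t2(db):
    """T2: r(x), r(y), y = x + y, w(y)."""
    x, y = db["x"], db["y"]
    db["y"] = x + y

def run(order, x=20, y=30):
    """Execute the transactions one after another (a serial schedule)."""
    db = {"x": x, "y": y}
    for txn in order:
        txn(db)
    return db["x"], db["y"]

print(run([t1, t2]))  # (50, 80)
print(run([t2, t1]))  # (70, 50)
```

Both outcomes are accepted as correct, even though they differ.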
Serializability • Implication for concurrent execution? • Want concurrent schedule equivalent to some serial schedule • Serializable: A schedule S is serializable if it is equivalent to some serial schedule. • Intuition behind serializability: since any serial execution is OK • allow interleaved execution as long as result will be same as some serial execution. • Eg: Fig. 21.5 D OK (equivalent to A), C not OK
Serializability: Result Equivalency • We said schedule S is serializable if it is equivalent to some serial schedule. • What does “equivalent” mean? • Check if concurrent schedule produces the same result as a serial schedule. How? • First approach: pick some data values, try. • Result equivalent: Two schedules are result equivalent if they produce the same final state on some data • Is this idea OK? • Saw it with Fig 21.5 Eg
Serializability: Result Equivalency • Problem: could have happened by accident i.e. on the data we happened to look at, get the same result but not generally true • Eg: Look at Fig 21.5 again • Any values of X, M, N which will make C produce same result as A (or B)? • When N = 0: the update that C loses is T1's, and with N = 0 that update changes nothing • But C should not be allowed • Want stronger guarantee. How? • Important ops should be in same order as serial
Conflicting Operations • The order of some pairs of ops is important to consider for concurrency/recovery, others not. • Two operations are in conflict: When? • 1. Belong to different transactions. Why? • Within T1 can’t switch: Eg: w1(y), r1(x) • 2. Access the same data item. Why? • If diff. data, then doesn’t matter: • w1(x), w2(y) same as w2(y), w1(x) • 3. At least one of them is a write op. Why? • r1(x), r2(x) same as r2(x), r1(x): data unchanged
Complete Schedules • Complete Schedule : S of T1, T2, … Tn • Exactly same ops in S and T1, T2, … Tn • Includes abort/commit for each Ti • If op1 before op2 in Ti then same order in S • For any pair of conflicting operations, one must occur before other in S • We can leave out internal operations
Serializability: Conflict Equivalent • Eg: S: r1(x), r2(y), w1(y), w1(x), w2(x) • What are the conflict pairs? • (r1(x), w2(x)) • (w1(x), w2(x)) • (r2(y), w1(y)) • Conflict Equivalent: Two schedules are conflict equivalent if the order of any two conflicting operations is the same in both schedules • i.e. have the same conflict pairs
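The three conflict conditions (different transactions, same item, at least one write) translate directly into a filter over ordered pairs of operations. A minimal sketch, assuming the r1(x) / w1(x) notation of the slides; `conflict_pairs` is an illustrative name.

```python
import re
from itertools import combinations

def conflict_pairs(schedule):
    """List (earlier_op, later_op) for every conflicting pair in the schedule."""
    ops = re.findall(r"([rw])(\d+)\((\w+)\)", schedule)  # (kind, txn, item)
    pairs = []
    for a, b in combinations(ops, 2):          # a occurs before b
        different_txn = a[1] != b[1]
        same_item = a[2] == b[2]
        has_write = "w" in (a[0], b[0])
        if different_txn and same_item and has_write:
            pairs.append((f"{a[0]}{a[1]}({a[2]})", f"{b[0]}{b[1]}({b[2]})"))
    return pairs

for pair in conflict_pairs("r1(x), r2(y), w1(y), w1(x), w2(x)"):
    print(pair)
# ('r1(x)', 'w2(x)')
# ('r2(y)', 'w1(y)')
# ('w1(x)', 'w2(x)')
```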
Serializability: Conflict Equivalent • Eg: T1 = r1(x), w1(y), T2 = r2(y), w2(x) • S1 = r1(x), r2(y), w2(x), w1(y) • S2 = r2(y), w2(x), r1(x), w1(y) • Are S1, S2 conflict equivalent? • Are the conflict pairs the same? • What are the conflict pairs of S1? • (r1(x), w2(x)), (r2(y), w1(y)) • What are the conflict pairs of S2? • (w2(x), r1(x)), (r2(y), w1(y)) • Different pairs: not conflict equivalent
Serializability: Conflict Equivalent • Eg: S3 = r1(x), r2(y), w1(y), w2(x) • S4 = r2(y), r1(x), w1(y), w2(x) • Are S3, S4 conflict equivalent? • Are the conflict pairs the same? • What are the conflict pairs of S3? • (r1(x), w2(x)), (r2(y), w1(y)) • What are the conflict pairs of S4? • (r1(x), w2(x)), (r2(y), w1(y)) • Same pairs: are conflict equivalent
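Comparing conflict pairs is easy to automate. The sketch below checks the two examples (S1 vs S2, S3 vs S4); it assumes each operation appears at most once per schedule, and the function names are illustrative.

```python
import re
from itertools import combinations

OP = r"([rw])(\d+)\((\w+)\)"   # kind, transaction number, data item

def conflict_pairs(schedule):
    """Set of (earlier_op, later_op) tuples for all conflicting pairs."""
    ops = re.findall(OP, schedule)
    return {
        (a, b)
        for a, b in combinations(ops, 2)
        if a[1] != b[1] and a[2] == b[2] and "w" in (a[0], b[0])
    }

def conflict_equivalent(s, t):
    """Same operations, and every conflicting pair ordered the same way."""
    same_ops = sorted(re.findall(OP, s)) == sorted(re.findall(OP, t))
    return same_ops and conflict_pairs(s) == conflict_pairs(t)

s1 = "r1(x), r2(y), w2(x), w1(y)"
s2 = "r2(y), w2(x), r1(x), w1(y)"
s3 = "r1(x), r2(y), w1(y), w2(x)"
s4 = "r2(y), r1(x), w1(y), w2(x)"

print(conflict_equivalent(s1, s2))  # False
print(conflict_equivalent(s3, s4))  # True
```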
Serializability Eg– Figure 21.5 • Which of C, D should be allowed?
Serializability: Conflict Equivalency • S is conflict serializable if it is conflict equivalent to some serial schedule S’ • Figure 21.5: A (T1T2) is serial, so is B (T2T1) • Is D conflict serializable? • Are D’s conflict pairs the same as those of A or B? • Conflict pairs of A, B, D? • A: (r1(x), w2(x)), (w1(x), r2(x)), (w1(x), w2(x)) • B: (r2(x), w1(x)), (w2(x), r1(x)), (w2(x), w1(x)) • D: (r1(x), w2(x)), (w1(x), r2(x)), (w1(x), w2(x)) • D has the same pairs as A, so D is conflict serializable • Is C conflict serializable? Conflict pairs? • C: (r1(x), w2(x)), (w1(x), w2(x)), (r2(x), w1(x)) • C not equivalent to A: r2(x) before w1(x) • C not equivalent to B: w1(x) before w2(x)
Serializability • Serializable not the same as serial. • What is the difference ? • Serial means no interleaving: T1 T2 T3 etc • Serializable allows interleaving, but has to be equivalent to a serial schedule • Serializable schedule : • Will leave the database in a consistent state. • Interleaving is controlled and will result in the same state as if the transactions were serially executed, • Will achieve efficiency due to concurrent execution.
Testing For Conflict Serializability • Algorithm 17.1: • Looks at only read_Item (X) and write_Item (X) operations, not the internal ops • Constructs a precedence graph (serialization graph): a graph with directed edges • An edge is created from Ti to Tj if one of the operations in Ti appears before a conflicting operation in Tj • The schedule is serializable if and only if the precedence graph has no cycles.
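The test can be sketched in a few lines: build the edges from conflicting pairs, then look for a cycle with a depth-first search. This is an illustrative sketch in the spirit of the algorithm described above, not the textbook's own pseudocode; the schedules used are S1 and S2 from the earlier conflict-equivalence example.

```python
import re

def precedence_graph(schedule):
    """Edges (Ti, Tj): some op of Ti precedes a conflicting op of Tj."""
    ops = re.findall(r"([rw])(\d+)\((\w+)\)", schedule)
    edges = set()
    for i, (k1, t1, x1) in enumerate(ops):
        for k2, t2, x2 in ops[i + 1:]:
            if t1 != t2 and x1 == x2 and "w" in (k1, k2):
                edges.add((int(t1), int(t2)))
    return edges

def has_cycle(edges):
    """Depth-first search; a back edge to an 'active' node means a cycle."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    state = {}  # node -> "active" or "done"

    def visit(node):
        state[node] = "active"
        for nxt in graph[node]:
            if state.get(nxt) == "active":
                return True
            if nxt not in state and visit(nxt):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in graph)

def conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))

print(conflict_serializable("r1(x), r2(y), w2(x), w1(y)"))  # False: T1 -> T2 -> T1
print(conflict_serializable("r2(y), w2(x), r1(x), w1(y)"))  # True: only T2 -> T1
```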
FIGURE 21.7: precedence graph for Figure 21.5 • Constructing precedence graphs for schedules from Figure 21.5 to test for conflict serializability. Precedence graphs for (a) serial schedule A. (b) serial schedule B. (c) schedule C (not serializable). (d) schedule D (serializable, equivalent to schedule A). • How do we interpret the cycles?
FIGURE 21.8 (a). • Another example of serializability testing. (a) The READ and WRITE operations of three transactions T1, T2, and T3. • We will look at schedules in next 2 slides • And draw the precedence graphs
FIGURE 21.8 (b). • Schedule E. • Precedence graph ? Serializable ?
FIGURE 21.8 (c). • Schedule F. • Precedence graph ? Serializable ?
Serializability • Issue: OS controls how ops get interleaved : • Resulting schedule may or may not be serializable • Problem ? • If not serializable, then what? • Have to rollback. Problem? • Expensive – not practical! How to solve? • Guarantee serializability. How ? • Locks: • Current approach used in most DBMSs: • Two phase locking: will study
View Serializability • We have seen result equivalent and conflict equivalent. • View equivalent:another condition. [RG] eg: • Schedule S2 is serial • Schedule S1: R1(A), W2(A), W1(A), W3(A). Is this conflict serializable? • No – precedence graph has a cycle. • T1 → T2 → T1 • Do you think S1 should be allowed ? Schedule S1: T1: R(A) W(A) T2: W(A) T3: W(A) Schedule S2: T1: R(A),W(A) T2: W(A) T3: W(A)
View Serializability • S1 is equivalent (in every situation) to serial S2 i.e. T1, T2, T3. Why? • Because the final value of A is written by T3 • This is a blind write, so it does not matter whether T1, T2 were in serial order or interleaved • Stronger than result equivalent, weaker than conflict equivalent • View equivalent: we won’t do the formal definition. • View serializability is good enough • but expensive to test (NP-hard) • so use conflict serializability since it is easier to test
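A small simulation illustrates why the blind writes make S1 harmless. The values each transaction writes are hypothetical (the slides do not give any): T1 is assumed to write the value it read plus 1, while T2 and T3 write constants without reading.

```python
def run(schedule, a0, writes):
    """Execute (op, txn) steps on a single item A; return final A and what was read."""
    a = a0
    seen = {}                      # value each transaction read, if any
    for op, txn in schedule:
        if op == "R":
            seen[txn] = a
        else:                      # "W": compute the new value from what txn read
            a = writes[txn](seen.get(txn))
    return a, seen

# Hypothetical write rules: T1 depends on its read, T2 and T3 are blind writes.
writes = {
    1: lambda read_value: read_value + 1,
    2: lambda _: 200,
    3: lambda _: 300,
}

s1 = [("R", 1), ("W", 2), ("W", 1), ("W", 3)]   # interleaved schedule S1
s2 = [("R", 1), ("W", 1), ("W", 2), ("W", 3)]   # serial schedule S2: T1, T2, T3

print(run(s1, 10, writes))  # (300, {1: 10})
print(run(s2, 10, writes))  # (300, {1: 10})
```

In both schedules T1 reads the initial value and T3 performs the final write, so the outcome does not depend on how T1 and T2 were ordered.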
Other Notions of Serializability Other Types of Equivalence of Schedules • Under special semantic constraints • schedules that are otherwise not conflict serializable may work correctly. • [SKS Eg] in next slide
[SKS] Example • A is checking account • B is savings account • T1 transferring $50 from A to B • T5 transferring $10 from B to A • Is this schedule conflict serializable? • No. Also not view serializable • Though we have not studied the definition. • Should this schedule be allowed? • Yes: Eg: A = 100, B = 30. In general, OK. Why? • D: debit, C: credit. D D C C gives the same result as D C D C, because additions and subtractions commute
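The claim can be checked with the example balances. This sketch assumes each debit or credit is applied as one atomic step, and that the interleaved schedule runs both debits first and then both credits (the "D D C C" order named above); the schedule figure itself is not reproduced here.

```python
def run(steps, a=100, b=30):
    """Apply (account, amount) steps in order and return the final balances."""
    balance = {"A": a, "B": b}
    for account, amount in steps:
        balance[account] += amount
    return balance

# T1 moves $50 from A to B; T5 moves $10 from B to A.
interleaved = [("A", -50), ("B", -10), ("B", +50), ("A", +10)]   # D D C C
serial_t1_t5 = [("A", -50), ("B", +50), ("B", -10), ("A", +10)]  # D C D C

print(run(interleaved))   # {'A': 60, 'B': 70}
print(run(serial_t1_t5))  # {'A': 60, 'B': 70}
```

The total A + B stays at 130 in both cases, which is the consistency constraint that matters for a transfer.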
Recoverability vs Serializability • Both affected by concurrent execution of transactions, but the two are quite different • Recoverability: How to recover if a transaction aborts or the system crashes • Serializability: Even if there are no system crashes and all transactions commit • Have to make sure we get correct results • Equivalent to serial schedule
Serializability Tests • DBMS has to provide a mechanism to ensure that schedules are conflict serializable • We have seen how to test a schedule to see if it is (was) serializable. • How can this be used? • We could run the transactions without attempting to control concurrency. Then what ? • Test to see if the schedule which resulted was serializable. If serializable, then what ? • Everything OK. If not serializable, then what ? • Rollback. Problem ? • Expensive. Alternative ?
Concurrency Control vs. Serializability Tests • Develop concurrency control protocols that only allow the concurrent schedules we want • Serializable • Recoverable, cascadeless • Connection between concurrency control protocols and serializability tests? • Tests for serializability help us understand why a concurrency control protocol is correct • i.e. why the protocol guarantees serializability.