710 likes | 916 Views
L3: DDBS Design (Tailored from the Slides by Prof. Hector Garcia-Molina ). Introduction Fragmentation Horizontal fragmentation Vertical fragmentation Allocation. Distributed Database Design. Top-down approach: - have DB… - how to split and allocate the sites
E N D
L3: DDBS Design (Tailored from the Slides by Prof. Hector Garcia-Molina) • Introduction • Fragmentation • Horizontal fragmentation • Vertical fragmentation • Allocation
Distributed Database Design Top-down approach: - have DB… - how to split and allocate the sites Multi-DBs (or bottom-up): no design issues!
Two issues in DDB design: • Fragmentation • Allocation • Note: issues not independent, • but will cover separately
Example Employee relation E (eno,name,add,sal,…) 40% of queries: 40% of queries: Qa: select * Qb: select * from E from E where add=Sa where add=Sb and… and ...
Example Employee relation E (eno,name,add,sal,…) 40% of queries: 40% of queries: Qa: select * Qb: select * from E from E where add=Sa where add=Sb and… and ... Motivation: Two sites: Sa, Sb Qa Qb Sa Sb
Eno NM Add Sal E 5 Joe Sa 10 7 Sally Sb 25 8 Tom Sa 15 .. .. F Eno NM Add Sal Eno NM Add Sal 5 Joe Sa 10 7 Sally Sb 25 .. 8 Tom Sa 15 .. At Sb At Sa
F = { F1, F2 } F1 = add=SaE F2 = add=SbE
F = { F1, F2 } F1 = loc=SaE F2 = loc=SbE called primary horizontal fragmentation
Fragmentation • Horizontal Primary • depends on local attributes • R Derived • depends on foreign relation • Vertical • R Fragmentation also called Sharding
Three common horizontal partitioning techniques • Round robin • Hash partitioning • Range partitioning
Round robin • R D0 D1 D2 • t1 t1 • t2 t2 • t3 t3 • t4 t4 • ... t5 • Evenly distributes data • Good for scanning full relation • Not good for point or range queries
Hash partitioning • R D0 D1 D2 • t1h(k1)=2 t1 • t2h(k2)=0 t2 • t3h(k3)=0 t3 • t4h(k4)=1 t4 • ... • Good for point queries on key; also for joins • Not good for range queries; point queries not on key • If hash function good, even distribution
Range partitioning • R D0 D1 D2 • t1: A=5 t1 • t2: A=8 t2 • t3: A=2 t3 • t4: A=3 t4 • ... • Good for some range queries on A • Need to select good vector: else unbalance • data skew • execution skew partitioning vector 4 7 V0 V1
Which are good fragmentations? Example: F = { F1, F2 } F1 = sal<10 E F2 = sal>20 E Problem: Some tuples lost!
Which are good fragmentations? Second example: F = { F3, F4 } F3 = sal<10 E F4 = sal>5 E Tuples with 5 < sal < 10 are duplicated...
Prefer to deal with replication explicitly Example: F = { F5, F6, F7 } F5 = sal 5 E F6 = 5< sal <10 EF7 = sal 10 E Then replicate F6 if convenient (part of allocation problem)
Desired properties for horizontal fragmentation R F ={ F1, F2, … } (1) Completeness t R, Fi F such that t Fi
(2) Disjointness t Fi, Fj such that tFj, i j, Fi, Fj F (3) Reconstruction - ignore
How do we get completeness and disjointness? (1) Check it “manually”! e.g., F1 = sal<10 E ; F2 = sal10 E (2) “Automatically” generate fragments with these properties Desired simple predicates Fragments
Example of generation • Say queries use predicates: • A<10, A>5, Add = SA, Add = SB • Next: - generate “minterm” predicates • - eliminate useless ones
Minterm predicates (part I) (1) A<10 A>5 Add=SA Add=SB (2) A<10 A>5 Add=SA ¬(Add=SB) (3) A<10 A>5 ¬(Add=SA) Add=SB (4) A<10 A>5 ¬(Add=SA) ¬(Add=SB) (5) A<10 ¬(A>5) Add=SA Add=SB (6) A<10 ¬(A>5) Add=SA ¬(Add=SB) (7) A<10 ¬(A>5) ¬(Add=SA) Add=SB (8) A<10 ¬(A>5) ¬(Add=SA) ¬(Add=SB)
Minterm predicates (part I) (1) A<10 A>5 Add=SA Add=SB (2) A<10 A>5 Add=SA ¬(Add=SB) (3) A<10 A>5 ¬(Add=SA) Add=SB (4) A<10 A>5 ¬(Add=SA) ¬(Add=SB) (5) A<10 ¬(A>5) Add=SA Add=SB (6) A<10 ¬(A>5) Add=SA ¬(Add=SB) (7) A<10 ¬(A>5) ¬(Add=SA) Add=SB (8) A<10 ¬(A>5) ¬(Add=SA) ¬(Add=SB)
5 < A < 10 A 5 Minterm predicates (part I) (1) A<10 A>5 Add=SA Add=SB (2) A<10 A>5 Add=SA ¬(Add=SB) (3) A<10 A>5 ¬(Add=SA) Add=SB (4) A<10 A>5 ¬(Add=SA) ¬(Add=SB) (5) A<10 ¬(A>5) Add=SA Add=SB (6) A<10 ¬(A>5) Add=SA ¬(Add=SB) (7) A<10 ¬(A>5) ¬(Add=SA) Add=SB (8) A<10 ¬(A>5) ¬(Add=SA) ¬(Add=SB)
Minterm predicates (part II) (9) ¬(A<10) A>5 Add=SA Add=SB (10) ¬(A<10) A>5 Add=SA ¬(Add=SB) (11) ¬(A<10) A>5 ¬(Add=SA) Add=SB (12) ¬(A<10) A>5 ¬(Add=SA) ¬(Add=SB) (13) ¬(A<10) ¬(A>5) Add=SA Add=SB (14) ¬(A<10) ¬(A>5) Add=SA ¬(Add=SB) (15) ¬(A<10) ¬(A>5) ¬(Add=SA) Add=SB (16) ¬(A<10) ¬(A>5) ¬(Add=SA) ¬(Add=SB)
A 10 Minterm predicates (part II) (9) ¬(A<10) A>5 Add=SA Add=SB (10) ¬(A<10) A>5 Add=SA ¬(Add=SB) (11) ¬(A<10) A>5 ¬(Add=SA) Add=SB (12) ¬(A<10) A>5 ¬(Add=SA) ¬(Add=SB) (13) ¬(A<10) ¬(A>5) Add=SA Add=SB (14) ¬(A<10) ¬(A>5) Add=SA ¬(Add=SB) (15) ¬(A<10) ¬(A>5) ¬(Add=SA) Add=SB (16) ¬(A<10) ¬(A>5) ¬(Add=SA) ¬(Add=SB)
Final fragments: F2: 5 < A < 10 Add=SA F3: 5 < A < 10 Add=SB F6: A 5 Add=SA F7: A 5 Add=SB F10: A 10 Add=SA F11: A 10 Add=SB
Note: elimination of useless fragments depends on application semantics: e.g.: if LOC could be SA, SB, we need to add fragments F4: 5 <A <10 Add SA Add SB F8: A 5 Add SA Add SB F12: A 10 Add SA Add SB
Why does this work? Predicates: p1 p2 p3 p4 p1 p2 p3 ¬ p4 ¬ p1 ¬ p2 ¬ p3 ¬ p4 ...
(1) Completeness: Take t R pi(t) must be T or F! Say p1(t) =T p2(t) = T p3(t) =F p4(t) =F Then t is in fragment with predicate p1 p2 ¬ p3 ¬ p4
(2) Disjointness Say t Fragment p1 p2 ¬ p3 ¬ p4 Then: p1(t) = T, p2(t) = T, p3(t) = F, p4(t)= F t cannot be in any other fragment!
Summary • Given simple predicates Pr= { p1, p2,.. pm } • minterm predicates are • M={m | m = pk*, 1 k m } • where pk* is pk or is ¬ pk pkPr • Fragments m R for all m M are • complete and disjoint
Another Desired Fragmentation Property:Match Access Patterns data A try to place in same fragment frequently accessed together data B data C
Return to example: E(Eno, NM, Add, SAL,…) Common queries: Qa: select * Qb: select * from E from E where Add=Sa where Add=Sb and … and ...
Three choices: (1) Pr = { } F1 ={ E } (2) Pr = {Add=Sa, Add=Sb} F2={ add=Sa E, add=Sb E } (3) Pr = {Add=Sa, Add=Sb, Sal<10} F3={ add=Sa sal<10 E, add=Sa sal10 E, add=Sb sal<10E, add=Sb sal10 E }
F2 is good… (not F1 , F3 ) In other words: Qa: Select … add = Sa ... Add=Sa sal < 10 Qb: Select … add = Sb ... Add=Sa sal 10 F3 F1 F2 Add=Sb sal < 10 Add=Sb sal 10
PAY Title, Sal EMP PROJ Jno, Jname, Budget. Loc Eno, Ename, Title ASG Eno, Jno, Resp, Dur HF Example Database Information The global schema Owner and member relations Cardinality of each relation L1 L2 L3
PHF Desired Properties: Completeness • A set of simple predicates Pr is said to be complete if and only if there is an equal probability of access by every application to any tuple belonging to any minterm fragment that is defined according to Pr. • Example: Applications: Q1: Find the projects at each location Q2: Find projects with budget less than $200,000 Predicates: Pr={ LOC=“Montreal”, LOC=“New York”, LOC=“Paris”} Pr={ LOC=“Montreal”, LOC=“New York”, LOC=“Paris”, BUDGET 200000, BUDGET >200000}
PHF Example: PAY PAY Application: Employee records kept at two sites , one site handling records with salary>30,000, another site handling records with salary <=30,000. Simple predicates: Pr = {SAL 30000, SAL > 30000} Minterm predicates: m1: (SAL 30000); m2: (SAL > 30000) PAY1 PAY2
PHF Example: PROJ p1: LOC = “Montreal” p2 : LOC = “New York” p3 : LOC = “Paris” • Applications: • Find the name and budget of projects given their location. • Issued at three sites • Access project information according to budget • One site access 200000 other accesses >200000 p4: BUDGET 200000 p5 : BUDGET > 200000 m1: LOC = “Montreal” BUDGET 200000 m2 : LOC = “Montreal” BUDGET > 200000 m3 : LOC = “New York” BUDGET 200000 m4 : LOC = “New York” BUDGET > 200000 m5 : LOC = “Paris” BUDGET 200000 M6 : LOC = “Paris” BUDGET > 200000
PROJ PHF Example: PROJ - Result PROJ1 PROJ2 PROJ4 m1: LOC = “Montreal” BUDGET 200000 m2 : LOC = “Montreal” BUDGET > 200000 m3 : LOC = “New York” BUDGET 200000 m4 : LOC = “New York” BUDGET > 200000 m5 : LOC = “Paris” BUDGET 200000 M6 : LOC = “Paris” BUDGET > 200000 PROJ6
PHF - Correctness • Completeness • Since Pr is complete and minimal, the selection predicates are complete • Reconstruction • If relation R is fragmented into FR = {R1, R2, …, Rr} R = Ri FRRi • Disjointness • Minterm predicates that form the basis of fragmentation should be mutually exclusive
PAY Title, Sal EMP PROJ Jno, Jname, Budget. Loc Eno, Ename, Title ASG Eno, Jno, Resp, Dur DHF: Derived Horizontal Fragmentation Owner relation DHF: Defined on a member relation according to a selection operation on its owner Each link is an equijoin L1 L2 L3 Member relation
PAY2= SAL> 30000 PAY PAY1= SAL 30000 PAY PAY Title, Sal EMP Eno, Ename, Title L1 EMP2 = EMP PAY2 DHF – Example EMP1 = EMP PAY1 EMP DHF
DHF: Derived Horizontal Fragmentation • Let S be horizontally fragmented and let there be a link L with owner(L) = S, and member(L) = R, the derived horizontal fragments of R are defined as Ri= RSi , 1 iw where Si is the horizontal fragment of S, is the semijoin operator, and w is the maximum number of fragments • Inputs to derived horizontal fragmentation: • partitions of owner relation • member relation • the semijoin condition • The algorithm is straight forward.
DHF: Correctness • Completeness: • primary horizontal fragmentation based on completeness of selection predicates. For derived horizontal fragmentation based on referential integrity • Reconstruction: • union • Disjointedness: • primary horizontal fragmentation based on mutually exclusive simple predicates. For derived horizontal fragmentation, based on whether the link between owner and member is 1:1 or 1:m relationship
DHF: Issues • Multiple owners for a member relation; how should we derived horizontally fragment a member relation. • There can be a chain of derived horizontal fragmentation.
L3.2.2: DDBS Design • Introduction • Fragmentation • Horizontal fragmentation • Vertical fragmentation • Allocation
Vertical fragmentation E Example: E2 E1
R[T] R1[T1] Ti T Rn[Tn] Just like normalization of relations ...
Properties: R[T] Ri[Ti] (1) Completeness U Ti = T all i