620 likes | 636 Views
Learn about relational model, query languages, integrity constraints, functional dependencies, pitfalls of bad design, decomposition, lossless join, dependency preservation, and normal forms in database design and normalization. Understand why good table design is crucial.
E N D
Temple University – CIS Dept.CIS616– Principles of Data Management V. Megalooikonomou Database Design and Normalization (based on notes by Silberchatz,Korth, and Sudarshan and notes by C. Faloutsos at CMU)
Overview • Relational model • formal query languages • commercial query languages (SQL) • Integrity constraints • domain I.C., foreign keys • functional dependencies • Functional Dependencies • DB design and normalization
Overview - detailed • DB design and normalization • pitfalls of bad design • decomposition • normal forms
Goal • Design ‘good’ tables • sub-goal#1: define what ‘good’ means • sub-goal#2: fix ‘bad’ tables • in short: “we want tables where the attributes depend on the primary key, on the whole key, and nothing but the key” • Let’s see why, and how:
Pitfalls takes1 (ssn, c-id, grade, name, address) Ssn c-id Grade Name Address A 123 smith Main cs331
Pitfalls ‘Bad’ - why? because: ssn->address, name Ssn c-id Grade Name Address 123 cs331 A smith Main 123 cs351 B smith Main 123 cs211 A smith Main
Pitfalls • Redundancy • space • (inconsistencies) • insertion/deletion anomalies:
Ssn c-id Grade Name Address 123 cs331 A smith Main … … … … … 234 null null jones Forbes Pitfalls • insertion anomaly: • “jones” registers, but takes no class - no place to store his address!
Ssn c-id Grade Name Address 123 cs331 A smith Main 123 cs351 B smith Main 123 cs211 A smith Main Pitfalls • deletion anomaly: • delete the last record of ‘smith’ (we lose his address!)
Ssn c-id Grade Name Address 123 cs331 A smith Main 123 cs351 B smith Main 123 cs211 A smith Main Solution: decomposition • split offending table in two (or more), e.g.: ? ?
Overview - detailed • DB design and normalization • pitfalls of bad design • decomposition • lossless join • dependency preserving • normal forms
Decompositions • there are ‘bad’ decompositions • we want: • lossless and • dependency preserving
Ssn c-id Grade Name Address 123 cs331 A smith Main 123 cs351 B smith Main 234 cs211 A jones Forbes Decompositions - lossy: R1(ssn, grade, name, address) R2(c-id,grade) c-id Grade cs331 A cs351 B cs211 A ssn->name, address ssn, c-id -> grade
Ssn c-id Grade Name Address 123 cs331 A smith Main 123 cs351 B smith Main 234 cs211 A jones Forbes Decompositions - lossy: can not recover original table with a join! c-id Grade cs331 A cs351 B cs211 A ssn->name, address ssn, c-id -> grade
Decompositions – lossy: Another example • Decomposition of R = (A, B) into R1 = (A), R2 = (B) A B A B 1 2 1 1 2 A(r) B(r) r A B A (r) B (r) 1 2 1 2
Decompositions example of non-dependency preserving S# -> address S# -> status S# -> address, status address -> status
Decompositions is it lossless? S# -> address, status address -> status S# -> address S# -> status
Decompositions - lossless Definition: Consider schema R, with FD ‘F’. R1, R2 is a lossless join decomposition of R if we always have: An easier criterion?
Decomposition - lossless Theorem: lossless join decomposition if the joining attribute is a superkey in at least one of the new tables Formally:
Ssn c-id Grade Name Address 123 cs331 A smith Main 123 cs351 B smith Main 234 cs211 A jones Forbes Decomposition - lossless example: Ssn c-id Grade R2 123 cs331 A R1 123 cs351 B 234 cs211 A ssn->name, address ssn, c-id -> grade ssn->name, address ssn, c-id -> grade
Overview - detailed • DB design and normalization • pitfalls of bad design • decomposition • lossless join decomp. • dependency preserving • normal forms
Decomposition - depend. pres. informally: we don’t want the original FDs to span two tables - counter-example: S# -> address S# -> status S# -> address, status address -> status
Decomposition - depend. pres. dependency preserving decomposition: S# -> address address -> status S# -> address, status address -> status (but: S#->status ?)
Decomposition - depend. pres. informally: we don’t want the original FDs to span two tables more specifically: … the FDs of the canonical cover Let Fibe the set of dependencies F+ that include only attributes in Ri. • Preferably the decomposition should be dependency preserving, that is, (F1 F2 … Fn)+ = F+ • Otherwise, checking updates for violation of functional dependencies may require computing joins expensive
Decomposition - depend. pres. why is dependency preservation good? S# -> address S# -> status S# -> address address -> status (address->status: ‘lost’)
Decomposition - depend. pres. A: eg., record that ‘Philly’ has status ‘A’ S# -> address S# -> address address -> status S# -> status (address->status: ‘lost’)
Decomposition - depend. pres. • To check if a dependency is preserved in a decomposition of R into R1, R2, …, Rn we apply the following test (with attribute closure done w.r.t. F) • result = while (changes to result) dofor eachRiin the decompositiont = (result Ri)+ Riresult = result t • If result contains all attributes in , then functional dependency is preserved • We apply the test on all dependencies in F to check if a decomposition is dependency preserving • The test takes polynomial time • Computing F+and(F1 F2 … Fn)+ needs exponential time
Decomposition - conclusions • decompositions should always be lossless • joining attribute -> superkey • whenever possible, we want them to be dependency preserving (occasionally, impossible - see ‘STJ’ example later…)
Normalization using FD • When decomposing a relation schema R with a set of functional dependencies F into R1, R2,…, Rn we want: • Lossless-join decomposition: otherwise … information loss • No redundancy: relations Ripreferably should be in either Boyce-Codd Normal Form or Third Normal Form • Dependency preservation: Let Fibe the set of dependencies in F+ that include only attributes in Ri. • Preferably the decomposition should be dependency preserving, i.e., (F1 F2 … Fn)+ = F+ • Otherwise, checking updates for violation of functional dependencies may require computing joins expensive
Normalization using FD - Example • R = (A, B, C)F = {A B, B C) • R1 = (A, B), R2 = (B, C) • Lossless-join decomposition: R1 R2 = {B} and B BC • Dependency preserving • R1 = (A, B), R2 = (A, C) • Lossless-join decomposition: R1 R2 = {A} and A AB • Not dependency preserving (cannot check B C without computing R1 R2)
Overview - detailed • DB design and normalization • pitfalls of bad design • decomposition ( how to fix the problem) • normal forms ( how to detect the problem) • BCNF, • 3NF, • (1NF, 2NF)
Normal forms - BCNF We saw how to fix ‘bad’ schemas - but what is a ‘good’ schema? Answer: ‘good’, if it obeys a ‘normal form’, i.e., a set of rules Typically: Boyce-Codd Normal Form (BCNF)
Normal forms - BCNF Defn.: Rel. R is in BCNF w.r.t. F, if • informally: everything depends on the full key, and nothing but the key • semi-formally: every determinant (of the cover) is a candidate key
Ssn c-id Grade Name Address 123 cs331 A smith Main 123 cs351 B smith Main 234 cs211 A jones Forbes Normal forms - BCNF Example and counter-example: ssn->name, address ssn->name, address ssn, c-id -> grade
Normal forms - BCNF Formally: for every FD a->b in F+ • a->b is trivial (a is a superset of b) or • a is a superkey • (or both)
Normal forms - BCNF Theorem: given a schema R and a set of FD ‘F’, we can always decompose it to schemas R1, … Rn, so that • R1, … Rn are in BCNF and • the decomposition is lossless (…but, some decomp. might lose dependencies)
BCNF Decomposition How? ….essentially, break off FDs of the cover eg. TAKES1(ssn, c-id, grade, name, address) ssn -> name, address ssn, c-id -> grade
grade name ssn address c-id Normal forms - BCNF eg. TAKES1(ssn, c-id, grade, name, address) ssn -> name, address ssn, c-id -> grade
Ssn c-id Grade Name Address 123 cs331 A smith Main 123 cs351 B smith Main 234 cs211 A jones Forbes Normal forms - BCNF Ssn c-id Grade 123 cs331 A 123 cs351 B 234 cs211 A ssn->name, address ssn, c-id -> grade ssn->name, address ssn, c-id -> grade
grade name ssn address c-id Normal forms - BCNF pictorially: we want a ‘star’ shape :notin BCNF
G F H B A D C E Normal forms - BCNF pictorially: we want a ‘star’ shape or
name name ssn st# ssn st# address address Normal forms - BCNF or a star-like: (e.g., 2 cand. keys): STUDENT(ssn, st#, name, address) =
G F H B D A D C E Normal forms - BCNF but not: or
BCNF Decomposition result := {R};done := false;compute F+;while (not done) do if (there is a schema Riin result that is not in BCNF)then beginlet be a nontrivial functional dependency that holds on Risuch that Riis not in F+, and = ;result := (result – Ri) (Ri – ) (, );end else done := true; Note: each Riis in BCNF, and decomposition is lossless-join
T S J Normal forms - 3NF consider the ‘classic’ case: STJ( Student, Teacher, subJect) T-> J S,J -> T is it BCNF?
T S J Normal forms - 3NF STJ( Student, Teacher, subJect) T-> J S,J -> T How to decompose it to BCNF?
Normal forms - 3NF STJ( Student, Teacher, subJect) T-> J S,J -> T 1) R1(T,J) R2(S,J) (BCNF? - lossless? - dep. pres.? ) 2) R1(T,J) R2(S,T) (BCNF? - lossless? - dep. pres.? )
Normal forms - 3NF STJ( Student, Teacher, subJect) T-> J S,J -> T 1) R1(T,J) R2(S,J) (BCNF? Y+Y - lossless? N - dep. pres.? N ) 2) R1(T,J) R2(S,T) (BCNF? Y+Y - lossless? Y - dep. pres.? N )
Normal forms - 3NF STJ( Student, Teacher, subJect) T-> J S,J -> T in this case: impossible to have both • BCNF and • dependency preservation Welcome 3NF (…a weaker normal form)!
S T J Normal forms - 3NF STJ( Student, Teacher, subJect) T-> J S,J -> T informally, 3NF ‘forgives’ the red arrow in the can. cover