450 likes | 586 Views
Foundations of Database Systems. Normalization Instructor: Zhijun Wang. Modification Anomalies. Deletion Anomaly Insertion Anomaly Update Anomaly. Modification Anomalies. The EQUIPMENT_REPAIR table before and after an incorrect update operation on AcquisitionCost for Type = Drill Press:.
E N D
Foundations of Database Systems Normalization Instructor: Zhijun Wang DBMS and Applications
Modification Anomalies • Deletion Anomaly • Insertion Anomaly • Update Anomaly DBMS and Applications
Modification Anomalies • The EQUIPMENT_REPAIR table before and after an incorrect update operation on AcquisitionCost for Type = Drill Press: DBMS and Applications
Normalization • Normalization is a process of evaluating and converting a relation to reduce modification anomalies • Essentially, normalization detects and eliminates data redundancy DBMS and Applications
Normal Forms • 1NF– A table that qualifies as a relation is in 1NF • 2NF– A relation is in 2NF if all of its nonkey attributes are dependent on all of the primary key • Boyce-Codd Normal Form (BCNF)– A relation is in BCNF if every determinant is a candidate key • 3NF– A relation is in 3NF if it is in 2NF and has no determinants except the primary key DBMS and Applications
Job Application Example Jbno Jsname JsQual …… Pemp …… DBMS and Applications
First Normal Form (1NF) • To be in First Normal Form (1NF) a relation must have only single-valued attributes -- neither repeating groups nor arrays are permitted • UNF • Job-Seeker(Jbno, Jsname, (JsQual, Subject, level, award), (Pemp, employer, addr, pay)) • 1 NF • Job-Seeker(Jbno, Jsname) • (JsQual, Subject, level, award) • (Pemp, employer, addr, pay) DBMS and Applications
Second Normal Form (2NF) • To be in Second Normal Form (2NF) the relation must be • in 1NF and • each non-key attribute must be dependent on the whole key (not a subset of the key) • 2NF • (Pemp, employer, addr, pay) => • (Pemp, pay), (employer, addr) DBMS and Applications
Boyce-Codd Normal Form (BCNF) • Let R be a relation schema • X be a subset of the attributes of R • A be an attribute of R • R is in BCNF if • for every FD X A, one of the following is true • A X (it is a trivial FD), or X is a super-key. • Theoretically we must check each FD • X A in F+ • It is actually sufficient to check • whether the LHS of each FD in F is a super-key. DBMS and Applications
Testing for BCNF • E.g. R (A, B, C, D), with F = { A B, B C} • Decompose R into R1(A,B) and R2(A,C,D) • What is the key for R2 ? • Neither of the dependencies in F contain only attributes from (A,C,D) • so we might be mislead into thinking R2 satisfies BCNF. • In fact, dependency AC in F + shows R2 is not in BCNF. DBMS and Applications
Testing for BCNF • R2(A,C,D) with functional dependence {AC} • Not trivial. • The key is (A, D) • A is not a superkey. • R2(A,C,D) is not in BCNF DBMS and Applications
Lossless-join decomposition into BCNF • Given schema R • W Schema R • A is a single attribute • W A is a FD that violates BCNF • Decompose R into • R A, and • WA • If either is not in BCNF, decompose recursively • the decomposition is lossless-join DBMS and Applications
Example of BCNF Decomposition • R = (branch-name, branch-city, assets, customer-name, loan-number, amount)F = {branch-name assets branch-city loan-number amount branch-name}Key = {loan-number, customer-name} • Decomposition • R (F1) • R1 = (branch-name, branch-city, assets) • R2 = (branch-name, customer-name, loan-number, amount) • R2 (F2) • R3 = (branch-name, loan-number, amount) • R4 = (customer-name, loan-number) • Final decomposition R1, R3, R4 DBMS and Applications
Example of BCNF Decomposition R = (branch-name, branch-city, assets,customer-name, loan-number, amount) branch-name assets branch-city WA R-A R1 = (branch-name, branch-city, assets) R2 = (branch-name,customer-name, loan-number, amount) DBMS and Applications
Example of BCNF Decomposition R2 = (branch-name,customer-name, loan-number, amount) loan-number amount branch-name WA R-A R3 = (branch-name, loan-number, amount) R4 = (customer-name, loan-number) DBMS and Applications
Multiple Ways to Decompose • If there are multiple dependencies (in a relation) that violate BCNF • There may be different ways to decompose, resulting in different collections of BCNF relations • A designer has to choose one based on the semantics of the applications. DBMS and Applications
Steps to BCNF DBMS and Applications
Putting a Relation into BCNF:EQUIPMENT_REPAIR DBMS and Applications
Putting a Relation into BCNF:EQUIPMENT_REPAIR EQUIPMENT_REPAIR (ItemNumber, Type, AcquisitionCost, RepairNumber, RepairDate, RepairAmount) ItemNumber (Type, AcquisitionCost) RepairNumber (ItemNumber, Type, AcquisitionCost, RepairDate, RepairAmount) ITEM (ItemNumber, Type, AcquisitionCost) REPAIR (ItemNumber, RepairNumber, RepairDate, RepairAmount) Where REPAIR.ItemNumber must exist in ITEM.ItemNumber DBMS and Applications
Putting a Relation into BCNF: New Relations DBMS and Applications
Putting a Relation into BCNF: SKU_DATA DBMS and Applications
Putting a Relation into BCNF: SKU_DATA SKU_DATA (SKU, SKU_Description, Department, Buyer) SKU (SKU_Description, Department, Buyer) SKU_Description (SKU, Department, Buyer) Buyer Department SKU_DATA (SKU, SKU_Description, Buyer) BUYER (Buyer, Department) Where BUYER.Buyer must exist in SKU_DATA.Buyer DBMS and Applications
Putting a Relation into BCNF: New Relations DBMS and Applications
Third Normal Form (3NF) • To be in Third Normal Form (3NF) • the relation must be in 2NF • no transitive dependencies may exist within the relation. • A transitive dependency is when • an attribute is indirectly functionally dependent on the key (that is, the dependency is through another non-key attribute) DBMS and Applications
Third Normal Form • Let • R be a relation schema • X be a subset of the attributes of R • A be an attribute of R • R is in 3NF if • for every FD X A, one of the following is true • A X (it is a trivial FD), or • X is a super-key, or • A is part of some key for R. {not for BCNF} • BCNF is stronger than 3NF • every BCNF relation is also in 3NF DBMS and Applications
Violation of 3NF Transitivity problem Duplication of same data, e.g. (Randolph,fee) DBMS and Applications
FD/Normalization Example • Consider a relations R with 5 attributes ABCDE. You are given the following dependencies: A->B, BC->E, ED->A • List all keys for R • Is R in 3NF? • Is R in BCNF? DBMS and Applications
FD/Normalization Example • Take each FD in turn and find out the keys for R • ED->A • ED->A & A->B => ED->B • => ED->AB => EDC->ABC => EDC->ABCDE • Hence EDC is a key • BC -> E • BC->E => BCD->ED => BCD->A => BCD->ABCDE • Hence BCD is also a key • A->B • A->B => AC->BC • AC->BC & BC->E => AC->E => AC->BCE => ACD->BCDE =>ACD->ABCDE • Hence ACD is also a key DBMS and Applications
FD/Normalization Example • Test the conditions for each FD to see if R is in 3NF • A->B • Not trivial. • A is not a superkey. • B is part of the key BCD. • BC-> E • Not trivial. • BC is not a superkey. • E is part of the key EDC. • ED->A • Not trivial. • ED is not a superkey. • A is part of the key ACD. R is in 3NF DBMS and Applications
FD/Normalization Example • Test the conditions for each FD to see if R is in BCNF • A->B • Not trivial. • A is not a superkey. • BC-> E • Not trivial. • BC is not a superkey. • ED->A • Not trivial. • ED is not a superkey. R is not in BCNF DBMS and Applications
3NF vs BCNF Example Reserves ( sailorID, boatID, day, creditCardID ) • Consider FDsailorIDcreditCardID • i.e., a sailor uses a unique credit card to pay for reservations • sailorIDis not a superkey • creditCardID is not part of a key • this relation is not in 3NF DBMS and Applications
3NF vs BCNF Example • But if we also know FDcreditCardID sailorID • i.e., credit cards uniquely identify the owner • <creditCardID, boatID, day> is also a key • FD1SailorID creditCardID does not violate 3NF • Because creditCardID is part of a key • FD2 creditCardID SailorID (previous slide) does not violate 3NF • Reservesis in 3NF • but in all tuples with same S value, the same <S, C> pair is redundantly stored • Reserves is not in BCNF DBMS and Applications
Bank Example Banker-info-schema = (branch-name, customer-name, banker-name, office-number) • The functional dependencies for this relation schema are:banker-name branch-name office-number customer-name branch-name banker-name • The key is: {customer-name, branch-name} DBMS and Applications
Bank Example (branch-name, customer-name, banker-name, office-number) • Is it in 3NF? • banker-name branch-name office-number • Not trivial • Not superkey • NOT part of a candidate key • customer-name branch-name banker-name • Not trivial • Superkey • NOT Part of a candidate key It is not in 3NF DBMS and Applications
Bank Example (branch-name, customer-name, banker-name, office-number) banker-namebranch-name office-number WA R-A (banker-name, branch-name, office-number) (customer-name, branch-name, banker-name) (customer-name, branch-name, banker-name) DBMS and Applications
Applying 3NF to Banker-info-schema Banker-office-schema = (banker-name, branch-name, office-number) Banker-schema = (customer-name, branch-name, banker-name) DBMS and Applications
Hourly_Emps Example • 2 FDs • I (I,N,L,R,W,H) • R W • Since • R is not a superkey and • W is not part of any key, • it is a violation of 3NF DBMS and Applications
Decompositions • Decompose INLRWH into INLRH & RW • INLRH has {hkid, name, lot, rating, hours_worked} • FD I INLRH • I is a key for INLRH • RW has {rating, hourly_wage} • FD R W • R is a key for RW • Both schemas are in BCNF • This decomposition was guided by the FD (R W) that violated 3NF • Method: remove W from the main schema DBMS and Applications
Decomposition R W (I,N,L,R,W,H) (R,W) (I,N,L,R,H) I INLRH R W BCNF!! BCNF!! DBMS and Applications
Normalization: Advantages and Disadvantages DBMS and Applications
Summary of Normal Forms DBMS and Applications
Denormalization for Performance • De-Normalization (Controlled Redundancy) • May want to use non-normalized schema for performance • E.g. Two tables • Branch= (branch-name, branch-city,assets) • Loan-info = (customer-name, loan-number, branch-name, amount) • displaying branch-city along with loan-number and amount requires join of Branch with Loan-info • Solution: Use denormalized relation containing attributes of Branch as well as Loan-info with all above attributes • faster lookup • Extra space and extra execution time for updates • Extra coding work for programmer and possibility of error in extra code DBMS and Applications
The Referential Integrity Constraint • A referential integrity constraint is a statement that limits the values of the foreign key to those already existing as primary key values in the corresponding relation DBMS and Applications
Foreign Key with a Referential Integrity Constraint NOTE: The primary key of the relation is underlined and any foreign keys are in italics in the relations below: SKU_DATA (SKU, SKU_Description, Department, Buyer) ORDER_ITEM (OrderNumber, SKU, Quantity, Price, ExtendedPrice) Where ORDER_ITEM.SKU must exist in SKU_DATA.SKU DBMS and Applications
Questions? DBMS and Applications