1 / 47

Chapter 3.6-7

Chapter 3.6-7. Normalization of Database Tables. Normalization. Normalization is process for assigning attributes to entities Reduces data redundancies Helps eliminate data anomalies Produces controlled redundancies to link tables Normalization stages 1NF - First normal form

arich
Download Presentation

Chapter 3.6-7

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chapter 3.6-7 Normalization of Database Tables

  2. Normalization • Normalization is process for assigning attributes to entities • Reduces data redundancies • Helps eliminate data anomalies • Produces controlled redundancies to link tables • Normalization stages • 1NF - First normal form • 2NF - Second normal form • 3NF - Third normal form • 4NF - Fourth normal form

  3. title length year address Name Name address Example: Starts-in Stars Movies Owns Studios

  4. Problem Example Movies • Update anomalies: If Harrison Ford’s phone # changes, must change it in each of his tuples. If Length value of Star Wars needs to be changed, must change all occurrences • Deletion anomalies: If we delete Wayne’s World entries from database, we also loose all info about Dana Carvey & Mike Meyers

  5. Conversion to 1NF • Repeating groups must be eliminated • Proper primary key developed • Uniquely identifies each tuple • Dependencies can be identified • undesirable dependencies allowed • Partial • based on part of composite primary key • Transitive • one nonprime attribute depends on another nonprime attribute

  6. Conversion to 1NF Cont. • An attribute that is at least part of a key is known as a prime attribute or key attribute or primary key.

  7. Example • Projects assigned to employees • Each project has a number and a name • Each employee has a number, a name, a job class • Each employee working on a project, need to keep number of hours spent on project, and hourly rate. • Project Assignments Table : ( PROJ_NUM, PROJ_NAME, EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS) • What’s the Key for this relation?

  8. Data Organization: 1NF

  9. Dependency Diagram (1NF) PROJ_ NUM, EMP_NUM --> PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS PROJ_NUM --> PROJ_NAME EMP_NUM-->EMP_NAME, JOB_CLASS, CHG_HOURS JOB_CLASS --> CHG_HOUR

  10. 1NF Summarized • All key attributes defined • Primary Key identified • No repeating groups in table • All attributes dependent on primary key

  11. 2NF Summarized • In 1NF, but • Includes no partial dependencies • Partial dependency: • An attribute is functionally dependent on a portion of the primary key. • Example: • PROJ_NUM  PROJ_NAME • EMP_NUM-->EMP_NAME, JOB_CLASS, CHG_HOURS

  12. Conversion to 2NF • Start with 1NF format: • Write each key component on a separate line • Write dependent attributes after each key component • Write original key on last line • Write any remaining attributes after original key • Each component is new table PROJECT (PROJ_NUM, PROJ_NAME) EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR) ASSIGN (PROJ_NUM, EMP_NUM, HOURS)

  13. 2NF Conversion Results

  14. 2NF Summarized • In 1NF • Includes no partial dependencies • Still possible to exhibit transitive dependency • Attributes may be functionally dependent on non-key attributes

  15. Conversion to 3NF • decompose table(s) to eliminate transitive functional dependencies PROJECT (PROJ_NUM, PROJ_NAME) ASSIGN (PROJ_NUM, EMP_NUM, HOURS) EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS) JOB (JOB_CLASS, CHG_HOUR)

  16. 3NF Summarized • In 2NF • Contains no transitive dependencies

  17. Boyce-Codd Normal Form (BCNF) • Formally, R is in BCNF if for every nontrivial FD for R, say XA, X is a superkey. • “Nontrivial” = right-side attribute not in left side. • Trivial FDs examples • AA • ABA • Informally: the only arrows in the FD diagram are arrows out of superkeys • Note: BCNF violation when relation has more than one superkey that overlap

  18. 3NF Table Not in BCNF What normal form?

  19. Decomposition of Table Structure to Meet BCNF

  20. Decomposition to Reach BCNF Setting: relation R with FDs F. Suppose relation R has BCNF violation XB and X not a superkey.

  21. 1. Compute X+. • Cannot be all attributes – why? 2. Decompose R into X+ and (R–X+) X. 3. Find the FD’s for the decomposed relations. • Project the FD’s from F = calculate all consequents of F that involve only attributes from X+ or only from (RX+) X. R X+ X

  22. Decomposition to Reach BCNF • Identify the violating FD • E.g. : X1X2…Xn B1B2…Bm • Add to the right-hand side of FD as many attributes as are functionally determined by (X1X2…Xn) • Decompose relation R into two relations: • One relation has all attributes Xs & Bs • Second relation has the Xs plus any other remaining attributes from R other than Bs

  23. BCNF--Example Assume R(S, J, T) • S: Student • J: subject • T: Teacher • Student S is taught subject J by teacher T. • Constraints: • For each subject, each student of that subject is taught by only one teacher • Each teacher teaches only one subject (but each subject is taught by several teachers)

  24. BCNF--Example • Functional Dependencies: • S , J  T • T  J

  25. BCNF--Example • Candidate keys: {S, J} and {S, T} • 3NF but not in BCNF • Update anomaly: if we delete the info that Jane is studying Physics we also loose the info that Prof. Brown teaches Physics • Solution: two relations R1{S, T} R2{T, J} S T J

  26. Decomposition Based on BCNF is Necessarily Correct Attributes A, B, C. FD: B  C Relations R1[A,B] R2[B, C] Tuples in R: (a, b, c) Tuples in R1: (a, b) Tuples in R2: (b, c) Natural join of R1 and R2 = (a, b, c)  original relation R can be reconstructed by forming the natural join of R1 and R2.

  27. Decomposition Based on BCNF is Necessarily Correct Attributes A, B, C. FD: B  C Relations R1[A,B] R2[B, C] Tuples in R: (a, b, c) , (d, b, e) Tuples in R1: (a, b), (d, b) Tuples in R2: (b, c), (b, e) Tuples in the natural join of R1 and R2: (a,b,c), (a,b, e) (d, b, c), (d, b, e) Can (a,b,e), (d, b, c) be a bogus tuples?

  28. Decomposition Based on BCNF is Necessarily Correct • Answer: No • Because: B  C i.e. if 2 tuples have same B attribute then they must have the same C attribute.  (b,c) = (b,e)  (a, b,e) = (a, b,c) and (d, b, c) = (d, b, e)

  29. Theorem • Any two-attribute relation is in BCNF.

  30. Decomposition Theorem • Suppose we decompose a relation R(X, Y, Z) into R1(X, Y) and R2(X,Z) and project the R onto R1 and R2. • Then join(R1, R2) is guaranteed to reconstruct R if and only if XY or XZ • Notice that whenever we decompose because of a BNCF violation, one of the above FDs holds.

  31. 3NF One FD structure causes problems in BCNF: • If you decompose, you can’t recover all of the original FD’s. • If you don’t decompose, you violate BCNF. Abstractly: ABC and CB. • Example : street city  zip, and zip  city. BCNF violation: CB has a left side that is not a superkey. • Based on previous algorithm, decompose into BC and AC. • But the FD ABC does not hold in new tables.

  32. Example A = street, B = city, C = zip. zip  city BCNF violation Decompose: It is a bad idea to decompose relation because you loose the ability to check the dependency: street city  zip

  33. Example zip  city Decompose: It is a bad idea to decompose relation because you loose the ability to check the dependency: street city  zip

  34. “Elegant” Workaround Define the problem away. • A relation R is in 3NF iff (if and only if)for every nontrivial FD XA, either: 1. X is a superkey, or 2. A is prime = member of at least one key. • Thus, if we just normalize to the 3NF, the problem goes away.

  35. What 3NF and BCNF Give You • There are two important properties of a decomposition: • Recovery: it should be possible to project the original relations onto the decomposed schema, and then reconstruct the original. • Dependency Preservation : it should be possible to check in the projected relations whether all the given FD’s are satisfied.

  36. 3NF and BCNF, Continued • We can get (1) with a BCNF decomposition. • We can get both (1) and (2) with a 3NF decomposition. • But we can’t always get (1) and (2) with a BCNF decomposition. • street-city-zip is an example.

  37. Mutli-valued Dependencies Fourth Normal Form

  38. Definition of MVD • A multivalued dependency is an assertion that two attributes (sets of attributes) are independent of one another. • Formally: A multivalued dependency (MVD) on R, X ->->Y, says that if two tuples of R agree on all the attributes of X, then their components in Y may be swapped, and the result will be two tuples that are also in the relation.

  39. Example Actors(name,addr,phones,cars) with MVD Namephones. name addr phones cars sue ap1b1 sue ap2b2 it must also have the same tuples with phonescomponents swapped: name addr phones cars sue ap2b1 sue ap1b2 Note: we must check this condition for all pairs of tuples that agree on name, not just one pair.

  40. Example 2 • An actor may have more than one address • Key? What normal form? • Note the redundancies • MVD: name  street city • read: name determines 1 or more street & city independent of all other attributes

  41. MVD Rules 1. Every FD is an MVD. • Because if X Y, then swapping Y’s between tuples that agree on X doesn’t create new tuples. • Example, in Actors: nameaddr. • Note: the opposite is not true i.e. not every MVD is a FD 2. Complementation: if XY, then XZ, where Z is all attributes not in X or Y. • Example: since namephones holds in Actors, the nameaddr cars.

  42. Splitting Doesn’t Hold • namestreet cityholds, but • namestreet does not hold • Name does not determine 1 or more street independent of city. • namecitydoes not hold

  43. Example 2 • An actor may have more than one address • MVD: name  street city • read: name determines 1 or more street & city independent of all other attributes • Also (complement MVD): name  title year

  44. Fourth Normal Form • The redundancy that comes from MVD’s is not removable by putting the database schema in BCNF. • There is a stronger normal form, called 4NF, that (intuitively) treats MVD’s as FD’s when it comes to decomposition, but not when determining keys of the relation.

  45. 4NF Eliminate redundancy due to multiplicative effect of MVD’s. • Roughly: treat MVD’s as FD's for decomposition, but not for finding keys. • Formally: R is in Fourth Normal Form if whenever MVDXY is nontrivial (Y is not a subset of X, and X Y is not all attributes), then X is a superkey. • Remember, X Y implies XY, so 4NF is more stringentthan BCNF. • Decompose R, using4NF violation XY,into XY and X  (R—Y). R Y X

  46. Example Drinkers(name,addr,phones,cars) • FD: nameaddr • Nontrivial MVD’s: namephones namecars. • Only key: {name, phones, cars} • All three dependencies above violate 4NF. Why? • Successive decomposition yields 4NF relations: D1(name, addr) D2(name, phones) D3(name, cars)

  47. Example 2 • namestreet city • Decompose into: R1(name, street, city) R2(name, title, year)

More Related