1 / 81

Normalization: Kroenke Chapters 3 and 4

Normalization: Kroenke Chapters 3 and 4. A relation is categorized by one of several normal forms . An aid to design helps characterize relations that experience anomalies in update operations Higher normal forms TEND to be better design, but not guaranteed .

vivien
Download Presentation

Normalization: Kroenke Chapters 3 and 4

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Normalization: Kroenke Chapters 3 and 4

  2. A relation is categorized by one of several normal forms. • An aid to design • helps characterize relations that experience anomalies in update operations • Higher normal forms TEND to be better design, but not guaranteed.

  3. Remember the one fact-one place theme! • Deletion anomaly: • Deleting 1 fact inadvertently deletes another. • Insertion anomaly: • inserting 1 fact not possible without inserting another seemingly unrelated fact.

  4. First Normal Form – 1NF • A relation is 1NF if each attribute is atomic • That is, attributes are simple types (int, float, string, char, etc)

  5. Second Normal Form – 2NF • How about this as a base table? Primary key

  6. Definition: • R is a relation; • X and Y are attributes of R. • Y is functionally dependent on X iff each X-value in R has precisely one Y-value in R associated with it. • A common notation is X  Y.

  7. Status S# Qty P# City S# Name Example: • In the supplier’s S table, Status, City, and Name are functionally dependent on S#. • In the SP table, Qty is functionally dependent on the combined attributes of S# and P#

  8. City and status are not functionally dependent on each other. • There may be several entries containing ‘London’ but different status values. • There may be several entries containing a status of 50 but have different cities.

  9. QTY is not functionally dependent on either P# or S#. • S1 might have multiple QTY values for different parts • Similar for P1

  10. Def: Y is Fully Functionally Dependent on X if X  Y but Y is not functionally dependent on any proper subset of X. • In S, (S#, Status)  City -- but not fully because S# City • in SP: (S#, P#)  Qty -- fully because neither S# nor P# by itself determines Qty. • The functional dependence requires BOTH S# and P#.

  11. Semantic notation. Must understand meaning of data, NOT a consequence of table data. • For example, suppose that each city in S has the same status. • Is it coincidence or by design?

  12. Why is this important? • What if we combined the relations S and SP into a single relation, Firstas in a few slides previous? First(S#, P#, Status, City, QTY) • Underlined attributes represent the primary key.

  13. Insertion Anomalies: • Cannot enter fact that a supplier is located in a city unless that supplier already supplies some part. • Why?

  14. Deletion Anomalies • Suppose S3 no longer supplies P2. • Delete (S3, P2, 10, Paris, 200) • if that is the ONLY part S# supplied, you lose fact that S3 is in Paris.

  15. Update Anomalies • S1 moves from London to Amsterdam. • May have to update many entries. • Violates the “one fact, one place” guideline • These problems are caused by dependences on a proper subset of the primary key.

  16. See also Kroenke’s example on page 95 and the text on page 96.

  17. Second Normal Form (2NF) • A relation is 2NFiff it is 1NF and every non-key attribute is fully functionally dependent on the primary key. • There are no attributes dependent on a proper subset of the primary key.

  18. Table First is NOT 2NF. Some nonkey attributes are not fully dependent on the primary key (S#, P#). • Some are dependent on S# only • The S and SP tables ARE 2NF. • They are a better design in this case. • Similar example in Fig 3-10 on page 106 of text.

  19. How about this table? • Does it contain redundancy? • Are there update anomalies?

  20. Status S# City Transitive dependencies • Suppose a supplier status is determined by the supplier’s city. • That is, City Status. • Since also S#  City then S# status is a result of these dependencies. • A Transitive dependency exists as shown below.

  21. Fee SID dorm • Similarly, a Housing table that links a student with a dorm and a residence fee would also likely have a transitive dependence.

  22. Insertion anomalies: • Cannot state fact that a supplier in Rome must have a status of 50 unless there is already a supplier there. • Cannot state fact a dorm has a specific cost unless there is already a student there.

  23. Deletion anomalies • Delete (S5, 30, Athens) • If it’s the ONLY Athens, lose fact that status for Athens must be 30. • Delete (100, Randolph, $3200) from the Housing table. • If that’s the only “Randolph” then you lose the connection between dorm and cost. 

  24. Update anomalies • “Change status of London supplier” may mean multiple updates. • Violates the “one fact” – “one place” rule. i.e. that each fact should be stored in one place.

  25. Third Normal Form (3NF) • A relation is 3NFiff it is 2NF and every non-key attribute is nontransitively dependent on the primary key. i.e. non-key attributes are mutually independent. • Again, it’s a consequence of the meaning of the data, not the data itself.

  26. Suppose all London suppliers had a status of 50. • Is that coincidence? • Is it by design?

  27. Question: • Is 3NF better than 2NF? • Maybe. • In the cases presented here, probably so. • An employee table where EmpIDAddressZipCode is not 3NF. • We may not care about AddressZip_Code unless it’s a UPS or Post Office application.

  28. Table Decomposition • Dividing a table into 2 or more tables to achieve a higher normal form. • Previously we divided First into tables S and SP to achieve 2NF.

  29. Now we find that S is not 3NF, so we should decompose S into two tables. • We have options: • SS(S#, Status) and CS(City, Status) • SC(S#, City) and SS(S#, Status) or • SC(S#, City) and CS(City, Status) • Which is best?

  30. Need to ask: • Does the decomposition result in a loss of information? • For example, can we still relate the attributes that have been separated into two tables? • Are the two relations independent of each other?

  31. Option 1: SS(S#, Status) and CS(City, Status) • Cannot get the city of a supplier. • Can you see why?

  32. Option 2: SC(S#, City) and SS(S#, Status) • Relations not independent. • If two suppliers are in the same city, must make sure they have the same status. • Requires monitoring of changes, possibly the use of triggers. Extra work.

  33. CAN get the status of a city but ONLY if there’s a supplier there. Otherwise there’s a loss of information. • Can’t store the status of a city unless there’s a supplier there.

  34. Option 3: SC(S#, City) and CS(City, Status) • Two relations are independent. • No loss of information • Best option

  35. Fee SID dorm Decompose the Housing table into one of • SD(SID, Dorm) and DF(Dorm, Fee) • SD(SID, Dorm) and SF(SID, Fee) • SF(SID, Fee) and DF(Dorm, Fee) Which is better? Construct a similar argument

  36. Best decomposition frequently follows the FD arrows. • This is a guideline, not an absolute rule.

  37. Determinants • Consider SMA(SID, MID, AID) where a student has one advisor for a major and an advisor advises for one major. • This table is 3NF since there is only one non-key attribute. • S2 drops Physics and you may lose the fact that A3 advises for Physics. SID AID MID

  38. Def: If Y is fully functionally dependent on X then X is a determinant. • Def: A tuple is an entry from a relation. The name is rooted in the historical development by E.J. Codd who used mathematical models to describe relations. • Def: An attribute is a candidate key if that attribute uniquely identifies a tuple. A primary key is chosen from a list of candidates keys. • Every candidate key is a determinant.

  39. Boyce-Codd Normal Form (BCNF) • A relation is BCNFif every determinant is a candidate key. • SMA is NOT BCNF since AID is a determinant but not a candidate key.

  40. Possible decompositions: • SA(SID, AID) and AM(AID, MID) • No Loss but relations are not independent. • How do you “Find the major of S1”. • It requires a search of two tables which seems somewhat counterintuitive.

  41. SM(SID, MID) and AM(AID, MID). • Cannot get advisor of a student.

  42. SA(SID, AID) and SM(SID, MID). • Cannot get who advises what. • None of the three possible decompositions seems satisfactory.

  43. Student (S) redundant Advisor (A) Major (M) • Solution: Look at bigger picture (E-R diagram) • Relations: • S, M, A (With a foreign key matching the primary key in M), SM, and SA to implement the many-many relationships

  44. NOTE: With BOTH SM and SA, it is possible for inconsistency to occur. • Could have (S1, M5) in SM; • (S1, A3) in SA; • and have M8 as a foreign key for advisor A3 in the Advisor table. • Would need software or triggers to assure consistency which adds to overhead.

  45. On the other hand, relationship between S and M is derived from relationships between S and A and between A and M. • This provides an argument that the relationship between S and M should not be shown as a separate relationship

  46. Of course, then the fact that “ a student is majoring is something” is NOT explicitly stored. • The design is based on business rules which we assume to be correct. • May not always be the case.

  47. Maybe the business rule that states “a student is majoring in something” is flawed. • Allows a student to choose a major without having an advisor first.

  48. Perhaps a better rule is “a student has an advisor, which determines the major”. • It would be a model that forces student to choose an advisor, which may be a better rule since many students do NOT seek out advisors in timely fashion.

  49. Multivalued Dependencies (MVDs) • Consider SMA(Student, Major, Activity) • A student can have multiple majors and participate in multiple activities.

  50. This relation is BCNF vacuously (There are no determinants) • Can’t store the major of a student unless that student has an activity.

More Related