1 / 62

Normalization

Normalization. Joe Meehean. Redundancies. Repeated data in database Wastes space Can cause modification anomalies unexpected side effect when changing data make building software on top of DB difficult Normalization process of removing redundancies. Modification Anomalies.

zonta
Download Presentation

Normalization

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Normalization Joe Meehean

  2. Redundancies • Repeated data in database • Wastes space • Can cause modification anomalies • unexpected side effect when changing data • make building software on top of DB difficult • Normalization • process of removing redundancies

  3. Modification Anomalies • Insert anomaly • extra data must be known to insert a row into a table • Update anomaly • must change multiple rows to modify a single fact • Deletion anomaly • deleting a row causes other data to be deleted • deletes more data than is necessary or desired

  4. Bad College Database • All data in 1 table

  5. Bad College Database • Insert anomaly • adding Rush Daniels as a student • requires knowing which offerings Rush isenrolled in • cannot add Rush as a student until he enrolls

  6. Bad College Database • Update anomaly • if Emily changes her name to Emma • need to change multiple rows

  7. Bad College Database • Delete anomaly • if Roger drops out of college and we delete him • we also delete that there is an offering of DB in the spring

  8. Functional Dependencies (FDs) • Constraint between 2 or more columns • Represented by → • X determines Y (X→Y) if there exists at most 1 value of Y for each value of X • like a mathematical function f(x) = y • left hand side (or LHS) is called the determinant • e.g., StdNo determines Student first name • StdNo → First Name

  9. Organizing FDs • Make a list • can condense list by listing all dependent columns for a given determinant • e.g., StdNo→First Name, Last Name • Determinants should be minimal • least # of columns required to determine values of other columns • e.g., StdNo,First Name → Last Name

  10. Bad College Database • StdNo→ First Name, Last Name • OfferNo → Term, Year, Course No, Course Descr. • Std No, Offer No → Grade

  11. Identifying FDs • From business narrative • Look for words like unique • e.g., “Each student has a unique student number, a first name, and a last name.” • Look for 1-M relationships • child (M-side) is the determinant (LHS) • e.g., “Faculty teach many offerings.” • e.g., Offer No → Faculty Id

  12. Identifying FDs • From relational tables • FDs where determinant (LHS) is not the PK or a candidate key • recall, a candidate key is column(s) that unique identify a row • e.g., Zip →State • Combined PKs • does 1 column determine values of some other columns? • e.g., StdNo→ First Name, Last Name

  13. Questions?

  14. Normal Forms • Normalization • remove redundancies in tables • removes modification anomalies • makes data easier to modify • Normal form • rules about functional dependencies (FDs) allowed • each successive normal form removes FDs

  15. Normal Forms 1NF 2NF 3NF/BCNF

  16. 1st Normal Form • All relational tables are already in 1NF by definition

  17. 2nd Normal Form • Key columns • columns that are part (or all of) a candidate key • recall a candidate key is a key that uniquely identifies a row • Non-key columns • columns that are not part of a candidate key

  18. 2nd Normal Form • A table is in 2NF if each non-key column • depends on all candidate keys • NOT on any subset of any candidate key • check functional dependencies (FDs) • A 2NF violation • a FD where part of a key determines a non-key column

  19. 2nd Normal Form • 2NF Violations • StdNo → First Name, Last Name • OfferNo → Term, Year, Course No, Course Descr.

  20. 3rd Normal Form • A table is in 3NF • if it is in 2NF • AND each non-key column depends only on candidate keys • NOT other non-key columns • e.g., CourseNr → Course Desc. • 3NF violation • a non-key column on the right-hand side (RHS) • AND anything other than a candidate key on LHS

  21. 3rd Normal Form • 3NF prohibits transitive dependencies • Transitive dependencies • if A → B & B → C, then A → C • e.g., Offer No → Course No • & Course No → Course Desc. • then Offer No → Course Desc.

  22. Combined 2NF & 3NF • A table is in 3NF if each non-key column depends on • all candidate keys • whole candidate keys • and nothing but candidate keys

  23. 3rd Normal Form • 2NF Violations • StdNo → First Name, Last Name • OfferNo → Term, Year, Course No, Course Descr. • 3NF Violations • CourseNo → Course Descr. • OfferNo→ Course Descr.

  24. Boyce-Codd Normal Form (BCNF) • Revised, simpler version of 3NF • Covers additional special cases • A table is in BCNF if every determinant is a candidate key • Violations are easy to detect • determinant (LHS) is not a candidate key • e.g., StdNo → Last Name

  25. Boyce-Codd Normal Form (BCNF) • Excludes 2 redundancies that 3NF does not • part of a key determines part of a key • a non-key determines part of a key

  26. Boyce-Codd Normal Form (BCNF) • BCNF Violations • Email → StdNo

  27. Simple Synthesis (BCNF) • Convert tables into BCNF • Eliminate extraneous columns from LHS of FDs • Remove derived (transitive) FDs • Arrange FDs into groups by determinant • For each FD group make table with determinant as primary key • Merge tables where one table include all columns of other table • choose PK of one of the tables to be PK of new table

  28. Bad College Database (1) • StdNo→ First Name • StdNo → Last Name • OfferNo → Term • OfferNo→ Year • Offer No → Course No • Offer No → Course Descr. • Std No, Offer No → Grade • Course No → Course Descr.

  29. Bad College Database (2) • StdNo→ First Name • StdNo → Last Name • OfferNo → Term • OfferNo→ Year • Offer No → Course No • Offer No → Course Descr. • Std No, Offer No → Grade • Course No → Course Descr.

  30. Bad College Database (3) • StdNo→ First Name, Last Name • OfferNo → Term, Year, Course No • Std No, Offer No → Grade • Course No → Course Descr.

  31. Bad College Database (4)

  32. Bad College Database (5)

  33. Importance of Normal Form Violations • We have the BCNF synthesis process • we can just make BCNF tables • why do we care about detecting NF violations? • DBA has 2 jobs • make new databases • maintain old ones • Making new DBs requires using BCNF synthesis process • Maintaining old DBs requires detecting NF violations • perhaps made by other employees • detecting violations narrows scope of DB redesign

  34. Questions?

  35. 4th Normal Form (4NF) • M-way relationships • associative entity types (weak entities) • multiple associations • primary key made of FKs from 3 or more tables • often represent important documents • glue multiple things together • e.g., invoice • can sometimes contain redundancies

  36. 4th Normal Form (4NF) Student Offering Textbook Enroll StdNo Name OfferNo Location TextNo TextTitle

  37. 4th Normal Form (4NF) Enroll Table

  38. Multivalued Dependencies (MVDs) • Given table R with columns X,Y, and Z • X→→ Y • each X maps to a set of Ys (between 1 and M) • X →→ Z • each X maps to a set of Zs (between 1 and M) • Y & Z are independent • knowing Y doesn’t tell you anything about Z and vice-versa • Y →→ Z & Y → Z • Z →→ Y & Z → Y • also Y,V →→ Z, unless V →→ Z • Every FD is an MVD • not every MVD is an FD

  39. Trivial MVDs • MVD X →→ Y is trivial if • Y is a subset of X • OR X and Y are the only columns in the table • OR X → Y and X → Z • e.g., has-job table • E# →→ P# • e.g. offering table • C#, S# →→ #S

  40. Multivalued Dependences (MVDs) • non-trivial MVDs manifest as redundancies in tables • there exist rows where X and Y are the same but Z is different • e.g., enroll table • O# →→ S# • O# →→ T# • S# independent of T# • if Emily drops 242 it doesn’t change the text books

  41. Multivalued Dependences (MVDs) • non-trivial MVDs manifest as redundancies in tables • there exist rows where X and Y are the same but Z is different • e.g., enroll table • O# →→ S# • O# →→ T# • S# independent of T# • if Emily drops 242 it doesn’t change the text books

  42. 4th Normal Form (4NF) • 4th normal form • table in BCNF • AND all MVDs are trivial • Detecting a violation • are there any MVDs? • are those MVDs non-trivial?

  43. 4th Normal Form (4NF) • Resolving violations • X →→ Y • X →→ Z

  44. More Examples • S →→ O & S →→ G ? • O →→ G & O →→ S ? • G →→ S & G →→ O ?

  45. More Examples • S →→ O & S →→ G ? • O →→ G & O →→ S ? • G →→ S & G →→ O ? • Offering and Grade not independent • Grade and Student not independent • Student and Offering not indepedent

  46. More Examples • B →→ E & B →→ C • Is this a trivial MVD?

  47. More Examples • B →→ E & B →→ C • Is this a trivial MVD? • E is not a subset of B & C is not a subset of B • B and E are not the only columns in the table • B → E & B → C • NO!!!

  48. More Examples

  49. Questions?

  50. QUIZ BREAK!!! • PQty →→ PDesc & PQty →→ Part# ?

More Related