1 / 30

Fundamentals/ICY: Databases 2010/11 WEEK 9

Fundamentals/ICY: Databases 2010/11 WEEK 9. John Barnden Professor of Artificial Intelligence School of Computer Science University of Birmingham, UK. Subsets and Supersets.

bdarnell
Download Presentation

Fundamentals/ICY: Databases 2010/11 WEEK 9

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Fundamentals/ICY: Databases2010/11WEEK 9 John Barnden Professor of Artificial Intelligence School of Computer Science University of Birmingham, UK

  2. Subsets and Supersets • A  Bmeans that A is a “subset” of B (and that B is a “superset” of A). I.e., every member of A is also a member of B. • Carefully distinguish between subset-of and member-of!!! • The symbol means the same as  • does NOTmean that there cannot be equality. • Examples: •  {4,5} • {5}  {4,5,6}, {6,4}  {4,5,6,7}, {6,4,7,5}  {4,5,6,7} • {n | n is an even whole number}  {n | n is a whole number}

  3. Subsets and Supersets •  Afor any set A. • A  Afor any set A. (Reflexivity) • If A BandBA then A = B. (Antisymmetry) • If A BandBC then AC. (Transitivity)

  4. Some Operations on Sets • Union of sets A and B: • AB = the set of things that are in A or B (or both). • NB: no repetitions created. • Intersection of sets A and B: • A  B = the set of things that are in both A and B. • Difference of sets A and B: • A  B = the set of things that are in A but not B. • Note: also notated by a backslash instead of a minus sign.

  5. Some Properties of those Operations • Union and intersection are commutative • (“can switch”): • AB = BA • A  B = B  A • Union and intersection are associative • (“can group differently”): • A (B  C) = (AB)  C • A  (B  C) = (A  B)  C • Because of associativity, we can omit parentheses: • AB  C  D A  B  C  D

  6. Two Other Properties • Union distributes over intersection: • A(B  C) = (AB)  (A  C) • Intersection distributes over union: • A(B  C) = (AB) (A  C)

  7. Same Difference? • Exercises for bath-time: • Is the difference operation commutative or associative? • And does it take part in any distributivity with the other operations?

  8. “Tuples” • A “tuple” is an ordered sequence of items of any sort. We will only deal with finite tuples. Items CAN be duplicated. • Can also be called a “vector” in other CS terminology. • Notation: 6, JAB, 5, “JAB”, 5, , 9> • Singleton and empty tuples: <6>, <> • The concatenation ( ⃘) of two tuples is just the result of putting them end to end to get one tuple. • <6, JAB, 5> ⃘ <5,6> = <6, JAB, 5, 5, 6> • <6, JAB, 5> ⃘ <> = <6, JAB, 5>

  9. Table Rows are “Tuples” • In a table, each attribute has a “domain” – the set of values that the attribute can have. E.g., the set of integers, the set of all character strings of any length, or the set of character strings of a specific format and length. • If the attribute allows NULL values, we include NULL in the value domain as well. • The values in a row form a tuple of values from the respective value domains. Just a list of the values, one for each attribute.

  10. Tuples in a Table People • The tuples are •  ‘9568876A’, ‘Chopples’, 37 > •  ‘2544799Z’, ‘Blurp’, NULL > •  ‘1698674F’, ‘Rumpel’, 88 >

  11. “Cartesian Products” and “Relations” • The set of all possible tuples formed from some sets is called the Cartesian product of the sets. • Notation, e.g.: D  E F  G  H • if D, E, F, G, H are the sets—not necessarily different. • Any subsetat all of that Cartesian product is called a relation on the sets in question (D, E, …) • even the whole of the product (even if infinite) • and even the empty set. • I.e., a relation on D, E, …, H is just some set of tuples that are each of form <d,e, …, h> where d  D, e  E, …, h  H.

  12. Examples • Let A = {3, 8, 2} and B = {‘jjj’, ‘bb’}. • Then A  B = • { <3, ‘jjj’>, <3, ‘bb’>, <8, ‘jjj’>, <8, ‘bb’>, <2, ‘jjj’>, <2, ‘bb’> }. • B  B = { <‘jjj’, ‘jjj’>, <‘jjj’, ‘bb’>, <‘bb’, ‘jjj’>, <‘bb’, ‘bb’>}. • A   =  =   A • A  {TRUE} = { <3, TRUE>, <8, TRUE>, <2, TRUE> } • Some relations on A andB: • {<3, ‘jjj’>, <3, ‘bb’>, <2, ‘jjj’>} • { <2, ‘bb’> } • A  B • 

  13. Rows as forming a Relation • So, for a given table, the set of all possible rows, considered as tuples, forms the Cartesian product of the value domains of the table. • And, provided the table does not have repeated rows: • AT ANY MOMENTthe actual set of rows, considered as tuples, is a relation on the table’s value domains. • NB: crucial here that no row is exactly repeated, because a mathematical set cannot have repeated elements.

  14. Relation from a Table People • The relationat the moment is • ‘9568876A’, ‘Chopples’, 37 > • ‘2544799Z’, ‘Blurp’, NULL > •  ‘1698674F’, ‘Rumpel’, 88 >

  15. A Table as a Relation? • People loosely talk about tables being relations. • This is mathematically inaccurate for several reasons: • The table properly speakingincludes not just the rows but also the attribute names themselves, their domains, specification of primary and foreign keys, etc. • It’s only the rows at any given momentthat form a relation. When a value in the table changes or a row is added or deleted, the mathematical relation is replaced by a different one. • Relations do not cater for tables with repeated rows. • ((But there is a more advanced notion of relation, based on “bags” rather than sets, that does cater for repeated rows.)) • But OK if you know what you (and those people) mean.

  16. ((Aside: “Bags” in Maths)) • A variant of sets called “bags” (or “multisets”) is used in maths (and CS) and allows repeated members. There are union, etc. operations that respect the repetitions. • So bags and their operations are a better fit to DB tables and notably their repetition-respecting operations (e.g. UNION ALL) than sets and their operations are. • But bags are non-standard and they’re not normally covered at an introductory level. • See Garcia-Molina et al 2009 for bags and their use in the DB area.

  17. — Back to Database Design —NORMALIZATION

  18. That Table put into 1NF (assuming there is a PK)

  19. 1NF can have Undesirable Dependencies • 1NF tables can contain “partial,” “transitive” and other generally undesirable functional dependencies of an attribute X on a determinant D. • By “undesirable” I will mean mainly that the determinant D is not a superkey, so that at least one attribute Y in the table is not determined by D, • so Y can have different values in the table for equal D values, • so redundancy (repetition of the association betweenD and X values) can arise.

  20. Partial and Transitive Dependencies

  21. 1NF can have Partial Dependencies • Partial dependency: where the determinant is part but not all of the primary key (and NB: is therefore not a superkey) • The determined attribute X is necessarily outside the whole PK—exercise: why?

  22. Second Normal Form • A table is in second normal form (2NF) if: • It is in 1NF and • It includes nopartialdependencies

  23. Conversion to 2NF • For each determinantD involved in a partial dependency in the original table T, • use D as the PK for a new tableNT(D) • and move out the attributes X determined by Dinto NT(D). • D itself stays in T as well as being copied into NT(D).

  24. Reminder:Partial and Transitive Dependencies

  25. Second Normal Form (2NF) Conversion results on example on previous slide

  26. But 2NF can still haveUndesirable Dependencies • A prime attribute is one that is within some candidate key • (not necessarily the primary key). • A transitive dependency is where thedeterminant Dis at least partially outside the PK and is not a superkey, • and the determined attribute X is non-prime (and therefore in particular is not inside the PK; the reason for this restriction is on a later slide). • E.g.: previous Figure for simple case of a simple (= one-attribute) determinant. • Above definition is partly based on Garcia-Molina, Ullman & Widom 2009 – see later ref. More general than the account in R&C and R,C&C.

  27. Third Normal Form • A table is in third normal form (3NF)if: • It is in 2NF and • It contains no transitive dependencies

  28. Conversion to 3NF • For each determinantD involved in a transitive dependency in the original table T, • use D as the PK for a new tableNT(D) • and move out the attributes X transitively determined by Dinto NT(D). • NB: the determinants themselves stay in T as well.

  29. In 2NF but not in 3NF

  30. Third Normal Form (3NF) Conversion Results on previous example

More Related