Fundamentals/ICY: Databases 2010/11 WEEK 9

Fundamentals/ICY: Databases 2010/11, WEEK 9. John Barnden, Professor of Artificial Intelligence, School of Computer Science, University of Birmingham, UK

Subsets and Supersets • A ⊆ B means that A is a “subset” of B (and that B is a “superset” of A). I.e., every member of A is also a member of B. • Carefully distinguish between subset-of and member-of!!! • The symbol ⊂ means the same as ⊆. • ⊂ does NOT mean that there cannot be equality. • Examples: • ∅ ⊆ {4,5} • {5} ⊆ {4,5,6}, {6,4} ⊆ {4,5,6,7}, {6,4,7,5} ⊆ {4,5,6,7} • {n | n is an even whole number} ⊆ {n | n is a whole number}
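As a quick illustration (a sketch, not part of the original slides), Python's built-in sets can check the examples above on finite sets: `<=` means subset-of with equality allowed, `<` means proper subset-of, and `in` means member-of. The set-builder example is cut down to the numbers 0 to 19, an assumption made only so that Python can hold the sets.

```python
# Finite sets in Python: `<=` is subset-of (equality allowed),
# `<` is proper subset-of, and `in` is member-of.
empty = set()  # note: {} would create an empty dict, not an empty set

examples = [
    (empty, {4, 5}),
    ({5}, {4, 5, 6}),
    ({6, 4}, {4, 5, 6, 7}),
    ({6, 4, 7, 5}, {4, 5, 6, 7}),  # the two sets are equal
]
for a, b in examples:
    print(sorted(a), "subset of", sorted(b), a <= b)

# Equality is allowed by <= but not by <.
print({6, 4, 7, 5} < {4, 5, 6, 7})  # False

# Set-builder notation, truncated to 0..19 so the sets are finite.
evens = {n for n in range(20) if n % 2 == 0}
wholes = set(range(20))
print(evens <= wholes)  # True

# Subset-of versus member-of: the set {5} is a member of A,
# but it is not a subset of A, because 5 itself is not in A.
A = {4, frozenset({5})}  # ordinary sets cannot be members of sets, so use frozenset
print(frozenset({5}) in A)  # True  (member-of)
print(frozenset({5}) <= A)  # False (subset-of)
```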

Subsets and Supersets • ∅ ⊆ A for any set A. • A ⊆ A for any set A. (Reflexivity) • If A ⊆ B and B ⊆ A then A = B. (Antisymmetry) • If A ⊆ B and B ⊆ C then A ⊆ C. (Transitivity)
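These four properties can be sanity-checked by brute force over every subset of a small universe. This is only a check on one finite example, not a proof, and the universe {1, 2, 3} is an arbitrary choice.

```python
from itertools import chain, combinations

def power_set(universe):
    """Return every subset of `universe` as a frozenset."""
    items = list(universe)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(items, r)
                                         for r in range(len(items) + 1))]

subsets = power_set({1, 2, 3})  # 2**3 = 8 subsets, including the empty set
empty = frozenset()

empty_subset = all(empty <= A for A in subsets)
reflexive = all(A <= A for A in subsets)
antisymmetric = all(A == B for A in subsets for B in subsets
                    if A <= B and B <= A)
transitive = all(A <= C for A in subsets for B in subsets for C in subsets
                 if A <= B and B <= C)

print(len(subsets), empty_subset, reflexive, antisymmetric, transitive)
# 8 True True True True
```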

Some Operations on Sets • Union of sets A and B: • A ∪ B = the set of things that are in A or B (or both). • NB: no repetitions created. • Intersection of sets A and B: • A ∩ B = the set of things that are in both A and B. • Difference of sets A and B: • A − B = the set of things that are in A but not B. • Note: also notated by a backslash (A \ B) instead of a minus sign.
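In Python the three operations are the set operators `|`, `&` and `-` (equivalently the methods `union`, `intersection` and `difference`). The sets A and B below are arbitrary small examples.

```python
A = {1, 2, 3}
B = {3, 4}

union = A | B          # {1, 2, 3, 4}: 3 is in both, but appears only once
intersection = A & B   # {3}
difference = A - B     # {1, 2}: in A but not in B

# The method forms give the same results.
assert union == A.union(B)
assert intersection == A.intersection(B)
assert difference == A.difference(B)

print(sorted(union), sorted(intersection), sorted(difference))
# [1, 2, 3, 4] [3] [1, 2]
```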

Some Properties of those Operations • Union and intersection are commutative (“can switch”): • A ∪ B = B ∪ A • A ∩ B = B ∩ A • Union and intersection are associative (“can group differently”): • A ∪ (B ∪ C) = (A ∪ B) ∪ C • A ∩ (B ∩ C) = (A ∩ B) ∩ C • Because of associativity, we can omit parentheses: • A ∪ B ∪ C ∪ D • A ∩ B ∩ C ∩ D

Two Other Properties • Union distributes over intersection: • A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C) • Intersection distributes over union: • A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
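The commutative, associative and distributive laws on the last two slides can likewise be spot-checked by trying every triple of subsets of a small universe (again an arbitrary choice, and a check rather than a proof).

```python
from itertools import chain, combinations, product

def power_set(universe):
    """Return every subset of `universe` as a frozenset."""
    items = list(universe)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(items, r)
                                         for r in range(len(items) + 1))]

laws = {
    "union commutative": lambda A, B, C: (A | B) == (B | A),
    "intersection commutative": lambda A, B, C: (A & B) == (B & A),
    "union associative": lambda A, B, C: (A | (B | C)) == ((A | B) | C),
    "intersection associative": lambda A, B, C: (A & (B & C)) == ((A & B) & C),
    "union over intersection": lambda A, B, C: (A | (B & C)) == ((A | B) & (A | C)),
    "intersection over union": lambda A, B, C: (A & (B | C)) == ((A & B) | (A & C)),
}

subsets = power_set({1, 2, 3})
triples = list(product(subsets, repeat=3))  # 8**3 = 512 triples

for name, law in laws.items():
    print(name, all(law(A, B, C) for A, B, C in triples))  # each prints True
```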

Same Difference? • Exercises for bath-time: • Is the difference operation commutative or associative? • And does it take part in any distributivity with the other operations?

“Tuples” • A “tuple” is an ordered sequence of items of any sort. We will only deal with finite tuples. Items CAN be duplicated. • Can also be called a “vector” in other CS terminology. • Notation: <6, JAB, 5, “JAB”, 5, ∅, 9> • Singleton and empty tuples: <6>, <> • The concatenation (∘) of two tuples is just the result of putting them end to end to get one tuple. • <6, JAB, 5> ∘ <5,6> = <6, JAB, 5, 5, 6> • <6, JAB, 5> ∘ <> = <6, JAB, 5>
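Python's built-in tuples behave like the tuples described here: ordered, finite, duplicates allowed, items of any sort. Concatenation is written `+` for Python tuples. The string "JAB" here is only a stand-in; Python has no way to distinguish the person JAB from the character string “JAB”.

```python
t = (6, "JAB", 5, "JAB", 5, 9)  # items of mixed sorts, with repeats
singleton = (6,)                # the trailing comma makes this a 1-tuple
empty = ()

joined = (6, "JAB", 5) + (5, 6)
print(joined)                   # (6, 'JAB', 5, 5, 6)
print((6, "JAB", 5) + empty)    # (6, 'JAB', 5): the empty tuple changes nothing

# Unlike sets, tuples care about order and keep duplicates.
print((5, 6) == (6, 5), {5, 6} == {6, 5})   # False True
print(len(t), len(set(t)))                  # 6 4
```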

Table Rows are “Tuples” • In a table, each attribute has a “domain” – the set of values that the attribute can have. E.g., the set of integers, the set of all character strings of any length, or the set of character strings of a specific format and length. • If the attribute allows NULL values, we include NULL in the value domain as well. • The values in a row form a tuple of values from the respective value domains. Just a list of the values, one for each attribute.

Tuples in a Table People • The tuples are • <‘9568876A’, ‘Chopples’, 37> • <‘2544799Z’, ‘Blurp’, NULL> • <‘1698674F’, ‘Rumpel’, 88>

“Cartesian Products” and “Relations” • The set of all possible tuples formed by taking one item from each of some sets, in order, is called the Cartesian product of the sets. • Notation, e.g.: D × E × F × G × H • if D, E, F, G, H are the sets (not necessarily different). • Any subset at all of that Cartesian product is called a relation on the sets in question (D, E, …) • even the whole of the product (even if infinite) • and even the empty set. • I.e., a relation on D, E, …, H is just some set of tuples that are each of the form <d, e, …, h> where d ∈ D, e ∈ E, …, h ∈ H.

Examples • Let A = {3, 8, 2} and B = {‘jjj’, ‘bb’}. • Then A × B = { <3, ‘jjj’>, <3, ‘bb’>, <8, ‘jjj’>, <8, ‘bb’>, <2, ‘jjj’>, <2, ‘bb’> }. • B × B = { <‘jjj’, ‘jjj’>, <‘jjj’, ‘bb’>, <‘bb’, ‘jjj’>, <‘bb’, ‘bb’> }. • A × ∅ = ∅ = ∅ × A • A × {TRUE} = { <3, TRUE>, <8, TRUE>, <2, TRUE> } • Some relations on A and B: • { <3, ‘jjj’>, <3, ‘bb’>, <2, ‘jjj’> } • { <2, ‘bb’> } • A × B • ∅
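`itertools.product` builds Cartesian products, so the examples on this slide can be reproduced directly. Python's `True` stands in for TRUE and ordinary strings for ‘jjj’ and ‘bb’; the helper names `cartesian` and `is_relation_on` are invented for this sketch.

```python
from itertools import product

def cartesian(*sets):
    """The Cartesian product of the given sets, as a set of tuples."""
    return set(product(*sets))

def is_relation_on(r, *sets):
    """A relation on some sets is any subset of their Cartesian product."""
    return r <= cartesian(*sets)

A = {3, 8, 2}
B = {"jjj", "bb"}

AxB = cartesian(A, B)
print(len(AxB), len(cartesian(B, B)))                  # 6 4
print(cartesian(A, set()) == set() == cartesian(set(), A))  # True
print(cartesian(A, {True}) == {(3, True), (8, True), (2, True)})  # True

relations = [
    {(3, "jjj"), (3, "bb"), (2, "jjj")},
    {(2, "bb")},
    AxB,      # the whole product is a relation
    set(),    # so is the empty set
]
print(all(is_relation_on(r, A, B) for r in relations))  # True
print(is_relation_on({("bb", 2)}, A, B))                # False: components in the wrong order
```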

Rows as forming a Relation • So, for a given table, the set of all possible rows, considered as tuples, forms the Cartesian product of the value domains of the table. • And, provided the table does not have repeated rows: • AT ANY MOMENT the actual set of rows, considered as tuples, is a relation on the table’s value domains. • NB: crucial here that no row is exactly repeated, because a mathematical set cannot have repeated elements.

Relation from a Table People • The relation at the moment is • <‘9568876A’, ‘Chopples’, 37> • <‘2544799Z’, ‘Blurp’, NULL> • <‘1698674F’, ‘Rumpel’, 88>
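The People rows can be held as Python tuples, with `None` standing in for NULL. This is an assumption for illustration only: SQL's NULL has comparison rules of its own that `None` does not reproduce. Collecting the rows into a set gives the relation, and shows why an exactly repeated row cannot survive in it.

```python
people_rows = [
    ("9568876A", "Chopples", 37),
    ("2544799Z", "Blurp", None),   # None standing in for NULL
    ("1698674F", "Rumpel", 88),
]

relation = set(people_rows)        # the relation at this moment
print(len(people_rows), len(relation))          # 3 3

# A list, like an SQL table, can hold an exact repeat; a set cannot.
with_repeat = people_rows + [("1698674F", "Rumpel", 88)]
print(len(with_repeat), len(set(with_repeat)))  # 4 3
```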

A Table as a Relation? • People loosely talk about tables being relations. • This is mathematically inaccurate for several reasons: • The table, properly speaking, includes not just the rows but also the attribute names themselves, their domains, specification of primary and foreign keys, etc. • It’s only the rows at any given moment that form a relation. When a value in the table changes or a row is added or deleted, the mathematical relation is replaced by a different one. • Relations do not cater for tables with repeated rows. • ((But there is a more advanced notion of relation, based on “bags” rather than sets, that does cater for repeated rows.)) • But the loose usage is OK if you know what you (and those people) mean.

((Aside: “Bags” in Maths)) • A variant of sets called “bags” (or “multisets”) is used in maths (and CS) and allows repeated members. There are union, etc. operations that respect the repetitions. • So bags and their operations are a better fit for DB tables, and notably for their repetition-respecting operations (e.g. UNION ALL), than sets and their operations are. • But bags are non-standard and they’re not normally covered at an introductory level. • See Garcia-Molina et al 2009 for bags and their use in the DB area.
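Python's `collections.Counter` is one readily available way to model a bag: it records how many times each element occurs. Adding two Counters adds the multiplicities, mirroring how UNION ALL keeps every row from both inputs, while a plain set union removes repeats as UNION does. The two single-column results below are hypothetical.

```python
from collections import Counter

# Two hypothetical single-column query results, with repeated values.
r1 = ["Blurp", "Rumpel", "Rumpel"]
r2 = ["Rumpel", "Chopples"]

bag_union = Counter(r1) + Counter(r2)  # multiplicities add, as with UNION ALL
set_union = set(r1) | set(r2)          # repeats removed, as with UNION

print(bag_union["Rumpel"])             # 3
print(sorted(bag_union.elements()))    # ['Blurp', 'Chopples', 'Rumpel', 'Rumpel', 'Rumpel']
print(sorted(set_union))               # ['Blurp', 'Chopples', 'Rumpel']
```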

Back to Database Design: NORMALIZATION

That Table put into 1NF (assuming there is a PK)

1NF can have Undesirable Dependencies • 1NF tables can contain “partial,” “transitive” and other generally undesirable functional dependencies of an attribute X on a determinant D. • By “undesirable” I will mean mainly that the determinant D is not a superkey, so that at least one attribute Y in the table is not determined by D, • so Y can have different values in the table for equal D values, • so redundancy (repetition of the association between D and X values) can arise.
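The chain of reasoning above can be made concrete on a small hypothetical table. The Enrolment table, its columns and the helper names `fd_holds` and `is_superkey` are all invented for illustration. Note that a single table instance can only show that a dependency fails; whether it holds in general is a design decision about all legal instances.

```python
def fd_holds(rows, D, X):
    """True if, in these rows, equal values on attributes D always come
    with equal values on attributes X (D -> X is not violated here)."""
    seen = {}
    for row in rows:
        d_vals = tuple(row[a] for a in D)
        x_vals = tuple(row[a] for a in X)
        if seen.setdefault(d_vals, x_vals) != x_vals:
            return False
    return True

def is_superkey(rows, D):
    """In a table without repeated rows, D determines every attribute
    exactly when no two rows share the same D values."""
    d_vals = [tuple(row[a] for a in D) for row in rows]
    return len(d_vals) == len(set(d_vals))

# Hypothetical table: StudentID -> StudentName holds, but StudentID is not a superkey.
enrolment = [
    {"StudentID": 1, "CourseID": "DB", "StudentName": "Ann", "Grade": 70},
    {"StudentID": 1, "CourseID": "AI", "StudentName": "Ann", "Grade": 65},
    {"StudentID": 2, "CourseID": "DB", "StudentName": "Bob", "Grade": 58},
]

print(fd_holds(enrolment, ["StudentID"], ["StudentName"]))  # True: D -> X
print(is_superkey(enrolment, ["StudentID"]))                # False: D is not a superkey
print(fd_holds(enrolment, ["StudentID"], ["Grade"]))        # False: Grade plays the role of Y
# So the association of StudentID 1 with "Ann" is stored twice (redundancy).
```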

Partial and Transitive Dependencies

1NF can have Partial Dependencies • Partial dependency: where the determinant is part but not all of the primary key (and NB: is therefore not a superkey) • The determined attribute X is necessarily outside the whole PK. Exercise: why?

Second Normal Form • A table is in second normal form (2NF) if: • It is in 1NF and • It includes no partial dependencies
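Given a primary key and a list of functional dependencies (which must come from knowledge of the data, not be read off one instance), finding partial dependencies is a simple subset test. This sketch assumes the table is already in 1NF and that the listed dependencies are non-trivial; the function names and the example dependencies are hypothetical.

```python
def partial_dependencies(pk, fds):
    """FDs (D, X) whose determinant D is part, but not all, of the PK."""
    pk = frozenset(pk)
    return [(d, x) for d, x in fds if d and d < pk]

def in_2nf(pk, fds):
    """Assumes the table is already in 1NF."""
    return not partial_dependencies(pk, fds)

pk = {"StudentID", "CourseID"}
fds = [
    (frozenset({"StudentID", "CourseID"}), frozenset({"Grade"})),
    (frozenset({"StudentID"}), frozenset({"StudentName"})),  # partial
]

print(partial_dependencies(pk, fds))
# [(frozenset({'StudentID'}), frozenset({'StudentName'}))]
print(in_2nf(pk, fds))  # False
```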

Conversion to 2NF • For each determinant D involved in a partial dependency in the original table T, • use D as the PK for a new table NT(D) • and move out the attributes X determined by D into NT(D). • D itself stays in T as well as being copied into NT(D).
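The conversion step can be sketched on a hypothetical Enrolment table with PK (StudentID, CourseID), where StudentID → StudentName is the partial dependency. `project` and `split_out` are invented helper names; `project` drops duplicate result rows so that each result stays a set of rows.

```python
def project(rows, attrs):
    """Keep only `attrs` from each row, dropping duplicate result rows."""
    result, seen = [], set()
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result

def split_out(rows, attrs, D, X):
    """Move the attributes X determined by D into a new table NT(D) with PK D.
    D itself stays in the original table T as well."""
    nt = project(rows, list(D) + list(X))
    t = project(rows, [a for a in attrs if a not in X])
    return t, nt

attrs = ["StudentID", "CourseID", "StudentName", "Grade"]
enrolment = [
    {"StudentID": 1, "CourseID": "DB", "StudentName": "Ann", "Grade": 70},
    {"StudentID": 1, "CourseID": "AI", "StudentName": "Ann", "Grade": 65},
    {"StudentID": 2, "CourseID": "DB", "StudentName": "Bob", "Grade": 58},
]

T, NT = split_out(enrolment, attrs, ["StudentID"], ["StudentName"])
print(T)   # 3 rows, each with StudentID, CourseID and Grade only
print(NT)  # [{'StudentID': 1, 'StudentName': 'Ann'}, {'StudentID': 2, 'StudentName': 'Bob'}]
```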

Reminder: Partial and Transitive Dependencies

Second Normal Form (2NF) Conversion results on example on previous slide

But 2NF can still have Undesirable Dependencies • A prime attribute is one that is within some candidate key • (not necessarily the primary key). • A transitive dependency is where the determinant D is at least partially outside the PK and is not a superkey, • and the determined attribute X is non-prime (and therefore in particular is not inside the PK; the reason for this restriction is on a later slide). • E.g.: see the previous Figure for the simple case of a simple (= one-attribute) determinant. • Above definition is partly based on Garcia-Molina, Ullman & Widom 2009 – see later ref. More general than the account in R&C and R,C&C.

Third Normal Form • A table is in third normal form (3NF) if: • It is in 2NF and • It contains no transitive dependencies
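Detecting transitive dependencies needs a little more machinery, because the definition asks whether the determinant is a superkey and whether X is prime. This sketch computes superkeys with the standard attribute-closure algorithm and takes the candidate keys as given. The Employee table, its dependencies and the function names are hypothetical, and the table is assumed to be in 2NF already.

```python
def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for d, x in fds:
            if d <= result and not x <= result:
                result |= x
                changed = True
    return result

def transitive_dependencies(all_attrs, pk, candidate_keys, fds):
    """FDs (D, X) where D is at least partly outside the PK and is not a
    superkey; only the non-prime attributes of X are reported."""
    prime = set().union(*candidate_keys)
    found = []
    for d, x in fds:
        outside_pk = not d <= pk
        superkey = closure(d, fds) >= all_attrs
        nonprime = x - prime
        if outside_pk and not superkey and nonprime:
            found.append((d, frozenset(nonprime)))
    return found

def in_3nf(all_attrs, pk, candidate_keys, fds):
    """Assumes the table is already known to be in 2NF."""
    return not transitive_dependencies(all_attrs, pk, candidate_keys, fds)

all_attrs = {"EmpID", "DeptID", "DeptName"}
pk = frozenset({"EmpID"})
candidate_keys = [pk]
fds = [
    (frozenset({"EmpID"}), frozenset({"DeptID"})),
    (frozenset({"DeptID"}), frozenset({"DeptName"})),
]

print(transitive_dependencies(all_attrs, pk, candidate_keys, fds))
# [(frozenset({'DeptID'}), frozenset({'DeptName'}))]
print(in_3nf(all_attrs, pk, candidate_keys, fds))  # False
```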

Conversion to 3NF • For each determinant D involved in a transitive dependency in the original table T, • use D as the PK for a new table NT(D) • and move out the attributes X transitively determined by D into NT(D). • NB: the determinants themselves stay in T as well.
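Here the conversion is sketched on relations represented directly as Python sets of tuples, using a hypothetical 2NF Employee table in which DeptID → DeptName is transitive. Joining the two results back on DeptID reproduces the original rows, which is a useful check that the split has lost nothing.

```python
# Hypothetical Employee table, PK EmpID; tuple positions are
# (EmpID, DeptID, DeptName), and DeptID -> DeptName is transitive.
employee = {
    ("E1", "D1", "Sales"),
    ("E2", "D1", "Sales"),
    ("E3", "D2", "Research"),
}

# T keeps the determinant DeptID but no longer holds DeptName.
T = {(emp, dept) for emp, dept, _ in employee}
# NT(DeptID) has PK DeptID and holds the transitively determined DeptName.
NT = {(dept, name) for _, dept, name in employee}

print(sorted(T))   # [('E1', 'D1'), ('E2', 'D1'), ('E3', 'D2')]
print(sorted(NT))  # [('D1', 'Sales'), ('D2', 'Research')]: 'Sales' is now stored once

# Joining T and NT on DeptID gives back exactly the original rows.
rejoined = {(emp, dept, name) for emp, dept in T for d, name in NT if d == dept}
print(rejoined == employee)  # True
```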

In 2NF but not in 3NF

Third Normal Form (3NF) Conversion Results on previous example
