870 likes | 1.03k Views
Data Management: Databases and Organizations Richard Watson. Summary of Chapters 3-6 prepared by Kirk Scott. Data Modeling and SQL. Chapter 3. The Single Entity Chapter 4. The One-to-Many Relationship Chapter 5. The Many-to-Many Relationship
E N D
Data Management: Databases and OrganizationsRichard Watson Summary of Chapters 3-6 prepared by Kirk Scott
Data Modeling and SQL • Chapter 3. The Single Entity • Chapter 4. The One-to-Many Relationship • Chapter 5. The Many-to-Many Relationship • Chapter 6. One-to-One and Recursive Relationships
Introduction • Large parts of these overheads will be somewhat repetitive • They cover in general terms some of the things that were specifically illustrated by concrete SQL examples • However, the repetition shouldn’t be harmful • It should put the examples into a broader context, and add new examples to flesh the ideas out • The ultimate goal is for the basic concepts and diagramming to be clear so that there will be no trouble considering design questions in chapter 7
Chapter 3. The Single Entity • The author starts with the entity relationship diagramming conventions and the concept of a single entity • The author represents an entity with a box containing its name in capital letters inside, at the top • Full field names are given after that in small letters • The primary key field is marked with an asterisk
Different diagramming conventions are perfectly acceptable, as long as you are consistent • The name of the entity may be given above the box representing it • You may choose just to capitalize just the first letter of the name
In theory, you could qualify field names, although this would be redundant, given the entity name at the top • You could also use short names for fields if space is at a premium • Primary keys could be marked with pk or underlined
Chapter 4. The One-to-Many Relationship • The author uses the crow’s foot to mark a one-to-many relationship in a ER diagram • In a simple ER diagram fields may not be listed, just entity names and crow’s feet • In a more complete diagram, fields can be listed
The author does not include the embedded pk/fk in the list of fields in the fk/many table because it is redundant • I do not follow this convention • I believe that in the interests of clarity it is worthwhile to include the fk in the list of fields
Chapter 5. The Many-to-Many Relationship • As is known, the many-to-many relationship is the most “complicated” of the relationships • The book presents some interesting examples that arise in real situations • They illustrate ideas that are not immediately apparent from the examples that have gone before • The first example is based on a bill of sale, shown on the next overhead
The Bill of Sale Example: An Interesting Case of a pk/fk Relationship
The book analyzes this situation as consisting of base entities which are a sale and the items which are sold • There is a many-to-many relationship between these base entities because each sale can consist of many items • Also, each item can be present in many sales • The book’s ER for this analysis is shown on the next overhead
When first introducing many-to-many relationships, I referred to the table in the middle • More formally, the book refers to an associative entity • The associative entity is the table in the middle that captures the relationship between two base entities
In the ER notation for this example the + sign is used • This has not been seen before • For the purposes of understanding the book’s example, it is important to know what this means
The + sign is shown over a crow’s foot • It symbolizes the fact that the embedded fk is part of the pk of the table it’s embedded in • You have seen an example of a table in the middle where the pk is the concatenation of the two embedded fk’s • This example is not the same as that
In this example the saleno is the pk of the Sale table • It is embedded as a fk in the Lineitem table • A saleno value will appear in the Lineitem table as many times as there are separate lines belonging to the sale • These separate lines are identified by lineno’s • The lineno’s are not embedded fk’s based on the unique identifiers, itemno’s, of entries in the Item table
An alternative way of representing the relationship would be to list the fields of the table in the middle this way: • salenopk, fk • linenopk • itemnofk • lineqty • lineprice • Note again that the saleno is both a pk and a fk, while the lineno is purely pk
At first glance it may seem a little strange, but the table in the middle contains every line of every sale, listed separately • It is the saleno and the lineno together which uniquely identify the entries in the Lineitem table • This model actually reflects reality well • It differs, in particular, from the car sale example
In the car sale example, there were individual cars that were sold • In the example database they were only shown as being sold once • In reality, the same car might be sold more than once • This could be modeled by making the salesdate part of the pk of the Carsale table
In the Sale, Lineitem, Item example, the items are not actually individual items • An item is a kind of item, like a screw or a shovel or a microwave oven • The seller may have many of each kind of item in stock and doesn’t distinguish between individual items
Multiple instances of the same (kind of) item may be sold to the same customer • Also, the same (kind of) item can be sold to more than one customer • It’s not incredibly difficult, but it’s worth emphasizing that the itemnodoes appear in the table in the middle as a fk • This tells which item that line of a sale was in reference to • However, the itemno is not part of the pk of the table in the middle
In a perfect world, you might argue that each item should appear on only one line of a sale • If so, then you could dispense with individual line numbers and use the itemno as part of the pk instead • However, reality makes the given solution better
When creating a data model, it should be flexible and accommodate all possibilities • Could a customer, in the middle of making a purchase, decide that more instances of a certain item were desired? • If so, do you allow this, and how do you support it?
From a business point of view, few things are more destructive than a computer system whose model imposes artificial constraints on the user (seller and customer) • Of course, if a customer decides that more instances are desired you want to sell them
Have you ever heard any kinds of things like these: • “I’d like to let you buy more, but the computer won’t allow it.” • The customer wants to scream • “I’d like to let you buy more, but it will be necessary to start a completely new bill of sale.” • The customer wants to scream
“I’d like to let you buy more, but it will be necessary to go back and modify the earlier line of the sale for that item.” • The customer again wants to scream • The best scenario would go like this: • “Oh, you want 20 instead of 10? We’ll just add another line here at the bottom for another 10.” • The customer sighs with satisfaction…
Relational Division, For All, and Not Exists • The book points out that SQL, with operations like AND, OR, NOT, and so on, has qualities of algebra • Similarly, there are set operators like UNION • Although Microsoft Access SQL doesn’t support INTERSECT, some implementations do
The Cartesian product represents a form of multiplication for relations • The results of a join operation are a subset of the results of a product • In an algebraic system, the existence of a multiplication operation implies the existence of a division operation
As pointed out when doing the concrete SQL examples, there is no FOR ALL operator • However, double NOT EXISTS can accomplish the same thing • For those who are theoretically inclined, it may be worth noting that it is FOR ALL/double NOT EXISTS which is analogous to division in a relational system
In any case, on test one, not everyone was clear on the order and role of the tables in a double NOT EXISTS query • This book shows a ER diagram of 3 tables capturing a many-to-many relationship • This diagram is labeled generically, but it is of the same structure as the Lineitem example
It then outlines the double NOT EXISTS query that could be written for it • The fact that this models the Lineitem example is not important • The table in the middle could have a completely concatenated primary key • It could also have its own, separate primary key
The important point is that the base tables are at the ends of the ER diagram • The book refers to these as target and source, respectively • The table in the middle, the associative entity, is labeled Target-Source by the book
If you want to find those rows of the target which are in relation to all of the rows of the source, • Then in the double NOT EXISTS query: • The target appears first, in the outermost query • The source appears second, in the middle, in the first nested subquery • And the table in the middle appears last, in the second nested subquery • The ER diagram and the schematic query are shown on the next overhead
A Design with a Cycle • The next diagram illustrates a design containing a cycle • Such designs will become especially important when considering normalization, the theory of correctness in designs • For the time being simply note that there is nothing preventing designs with cycles
A Concatenated Key with Date • The next example design is one where both of the embedded foreign keys are part of the primary key of a table in the middle • However, it is more complicated than that because a date field is also included in the primary key • This allows the same pair of base values to be paired with each other more than once
A Simple Concatenated Key • The next design is actually somewhat simpler • It also has two embedded pk/fk’s in the table in the middle • The table in the middle isn’t pure key though • There is also a non-key attribute field for the table in the middle
The Music CD Library Example • In the overheads for chapters 3 and 4 some very primitive starting designs were given for a collection of music CD’s • At the end of chapter 5, with the capability to model many-to-many relationships, this model blossoms • On the next overhead an 8 entity design is shown • Note that 4 of the 8 entities can be classified as associative entities