500 likes | 665 Views
Chapter 10. Logical Data Modeling: Normalization Physical Database Design: Overview and Denormalization. Agenda. Data Modeling in the SDLC and Outcomes What is Logical Data Modeling and Normalization Functional Dependency Normal Forms Translating ERDs to Normalized Relations
E N D
Chapter 10 Logical Data Modeling: Normalization Physical Database Design: Overview and Denormalization
Agenda • Data Modeling in the SDLC and Outcomes • What is Logical Data Modeling and Normalization • Functional Dependency • Normal Forms • Translating ERDs to Normalized Relations • View Integration • Overview of Physical Database Design • Denormalization
Learning Objectives • Understand key terminology • Know where logical data modeling fits into systems development • Be able to represent relationships through relations • Be able to translate ERDs into normalized relations • Be able to merge different sets of normalized relations
l Data Modeling within the SDLC
Outcomes of Logical Database Design • Normalized relations (see Examples) Note: Account for every data element on or in • system or process input (e.g., forms) • system or process output (e.g., reports, query screens) • data stores or E-R diagram Each data element must be kept in the system’s database or derived from data in the database
Logical Database Design • Based upon the conceptual data model • Four key steps • Develop a logical data model for each known user interface for the application using normalization principles. • Combine normalized data requirements from all user interfaces into one consolidated logical database model (view integration). • Translate the conceptual E-R data model for the application into normalized data requirements. • Compare the consolidated logical database design with the translated E-R model and produce one final logical database model for the application.
Physical Database Design • Based upon results of logical database design • Key decisions • Choosing storage format for each attribute from the logical database model • Grouping attributes from the logical database model into physical records • Arranging related records in secondary memory (hard disks and magnetic tapes) so that records can be stored, retrieved and updated rapidly • Selecting media and structures for storing data to make access more efficient
Integrated Set of Relations (Fig. 10-3c) Notice that this attribute was in Query view (Fig. 10-3a) but not Report view (Fig. 10-3b) (derived from two previous examples)
Conceptual Data Model and Transformed Relations (Fig. 10-3d) Customer Order Processing Application
Final Set of Normalized Relations (Fig. 10-3e) Notice that these attributes were in ER diagram (Fig. 10-3d) but not in integrated view from Query and Report (Fig. 10-3c) (derived from two previous examples) Notice that this attribute was in integrated view from Query and Report (Fig. 10-3c) but not in ER diagram (Fig. 10-3d)
What Is Logical Data Modeling • Translating conceptual data models into a format consistent with the architecture used by the data management software to be used with the application • Normalization • analysis of functional dependencies between data items to result in a structure of data that is simple, stable, and fundamental
Well-Structured Relations • Avoid Anomalies (Errors, Inconsistencies, Problems) • Insertion • Deletion • Modification
Anomalies: Example of Non-Simple Relation Primary Key • Although Emp_ID and Course make each row unique, the table is not simple • Data in each instance not just related to primary key • It should be possible to add a new employee without supplying course data, but not here (thus, an insertion anomaly) • It should be possible to delete a course without losing employee data, but not here (thus, a deletion anomaly) • Notice redundant data; errors could occur on modifying data, e.g., salary (thus, a modification anomaly) • Table contains data on more than one entity; use separate, simpler tables
Example of Simpler Relations: No Redundancy Primary Key Primary Key • Each instance contains only data related to the primary key • Anomalies less likely • Insertion of employee does not require course data • Deleting course would not affect data in employee table • Less redundancy makes modification errors less likely
Data Normalization • Primarily a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data. • The process of decomposing relations with anomalies to produce smaller, well-structured relations.
Functional Dependency • For a relation (table), attribute A depends on attribute B if for every valid row the value of B determines the value of A B A • E.g. • Student ID Student name • Order No + Product No Quantity ordered
Normal Forms • First normal form • No multi-valued attributes. • Every attribute value is atomic. • Second normal form • 1NF and every non-key attribute is fully functionally dependent on the primary key. • Every non-key attribute must be defined by the entire key, not by only part of the key. • No partial functional dependencies. • Third normal form • 2NF and no transitive dependencies (functional dependency between non-key attributes.)
Relation with transitive dependency (a) SALES relation with simple data
Removing a transitive dependency (a) Decomposing the SALES relation
Transforming E-R Diagrams into Relations 1. Map Regular Entities to Relations. • Composite attributes: Use only their simple, component attributes. • Multi-valued Attribute - Becomes a separate relation with a foreign key taken from the superior entity.
Mapping a composite attribute (a) CUSTOMER entity type with composite attribute
Transforming E-R Diagrams Into Relations 2. Map Weak Entities • Becomes a separate relation with a foreign key taken from the superior entity.
Example of mapping a weak entity (a) Weak entity DEPENDENT
Transforming E-R Diagrams Into Relations 3. Map Binary Relationships • One-to-Many - Primary key on the one side becomes a foreign key on the many side • Many-to-Many - Create a new relation with the primary keys of the two entities as its primary key • One-to-One - Primary key on the mandatory side becomes a foreign key on the optional side
Example of mapping a 1:M relationship (a) Relationship between customers and orders
Example of mapping an M:N relationship (a) Requests relationship (M:N)
Mapping a binary 1:1 relationship (a) Binary 1:1 relationship
Transforming E-R Diagrams Into Relations 4. Map Associative Entities • Identifier Not Assigned • Default primary key for the association relation is composed of the primary keys of the two entities • Identifier Assigned • It is natural and familiar to end-users. • Default identifier may not be unique.
Mapping an associative entity with an identifier (a) Associative entity (SHIPMENT)
Transforming E-R Diagrams Into Relations 5. Map Unary Relationships • One-to-Many - Recursive foreign key in the same relation • Many-to-Many - Bill-of-materials: Two relations: • One for the entity type. • One for an associative relation in which the primary key has two attributes, both taken from the primary key of the entity
Transforming E-R Diagrams Into Relations 6. Map Ternary (and n-ary) Relationships • One relation for each entity and one for the associative entity
Mapping a ternary relationship (a) Ternary relationship with associative entity
View Integration / Merging Relations • Building relations from independent “views” or data models • Issues that arise • Synonyms • Homonyms • Transitive dependencies • ISA relationships (supertypes/subtypes)
Physical File and Database Design • The following information is required: • Normalized relations, including volume estimates • Definitions of each attribute • Descriptions of where and when data are used, entered, retrieved, deleted, and updated (including frequencies) • Expectations or requirements for response time and data integrity • Descriptions of the technologies used for implementing the files and database • Must be collected during Analysis
Designing Fields • Field • Smallest unit of named application data recognized by system software • Attributes from relations will be represented as fields • Data Type • A coding scheme recognized by system software for representing organizational data • Choosing data types • Four objectives • Minimize storage space • Represent all possible values of the field • Improve data integrity of the field • Support all data manipulations desired on the field • Calculated fields • A field that can be derived from other database fields
Methods of Controlling Data Integrity • Default Value • A value a field will assume unless an explicit value is entered for that field • Range Control • Limits range of values that can be entered into field • Referential Integrity • An integrity constraint specifying that the value (or existence) of an attribute in one relation depends on the value (or existence) of the same attribute in another relation • Null Value • A special field value, distinct from 0, blank, or any other value, that indicates that the value for the field is missing or otherwise unknown
Denormalization • Implementing relations in non-normalized form • Why? • Optimize processing efficiency at cost of managing anomalies • combine into one table data needed together • separate a table which has several uses • Partition over several servers
Indexed File Organization A file organization in which rows are stored either sequentially or nonsequentially and an index is created that allows software to locate individual rows Index: A table used to determine the location of rows in a file that satisfy some condition
Guidelines for Choosing Indexes • Specify a unique index for the primary key of each table. • Specify an index for foreign keys. • Specify an index for nonkey fields that are referenced in qualification, sorting and grouping commands for the purpose of retrieving data.
Summary: Four Key Steps in Logical Database Modeling • Develop a logical data model for each known user interface (form and report) using normalization principles • Combine normalized data requirements from all interfaces (view integration) • Translate E-R data model (developed without explicit concern for interfaces) into normalized data requirements • Compare results of 2 and 3 and produce, through view integration, one final model