Database Design & Implementation
• One thing is paramount in military, commercial or industrial applications: never lose the content of an operational database. This requires persistence.
• Hybrid object-relational databases (ORDB’s) are one way to solve the problem of writing object-oriented applications with persistent data content.
• The COOL framework includes GEN, which generates C/C++ code for a hybrid ORDB, and LCP, which supports method delegation between prototype object instances by interpreting a database of function names and/or function pointers.
• Understanding ORDB’s requires more details about database architecture (more slides).
Relational Databases
• An RDB is a set of ‘tuples’; each tuple represents a simple object with scalar attributes. Tuples are stored externally as records in a file and viewed conceptually as rows of a table, or geometrically as points in a multi-typed coordinate space.
• Complex structured data types (and object instances) are decomposed or ‘normalized’ into simple parts, giving First Normal Form (1NF): no structured attributes or repeating groups are allowed.
• For maintenance and reliability reasons, the design is further normalized to Third Normal Form (3NF): there are no redundant or indirectly computable field values, and each property is stored in only one place. [Ref: Sanders Ch. 3 and Appendix A.]
• Other database types include object-oriented OODB’s and object-relational ORDB’s (next slide).
Composite pkeys in RDB’s
• Every tuple must have a unique field (or set of fields) called its primary key (pkey) which uniquely identifies it.
• A composite pkey for a child or component tuple is often built by concatenating multiple key fields from a chain of ancestors. This complicates pkey-to-fkey matching.
• Example: Dept--->Course--->Section (ERD on next slide)
• CS has Dept# = 91 and OOAD has course# = 91.522.
• Almost every course has a section # 201, so 201 is only a unique identifier within the child set of sections of a particular course, just as 522 is only a unique course# within a particular Department.
• (In my syllabus I renamed this 01f522 - Dept 91 is assumed. 01f adds a new ‘term=Fall 2001’ component to this identifier. I teach only one section of CS Dept courses over multiple terms.)
Composite Pkeys (Example)
• Example: CS Dept View of SIS Database ERD:
• The unique pkey which selects my section of OOAD in the Student Information System (SIS) Database is a composite of Dept, Course and Section number: 91.522.201.
[Diagram: three linked boxes]
  Department pkey: 91
  Course pkey: 91+522
  Section pkey: 91+522+201
(This is an ‘instance diagram’, not an ERD. It shows field values in a single table row, whereas an ERD shows only entity types.)
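The composite key above can be sketched in C++ to show why pkey-to-fkey matching gets more complicated: every comparison has to cover all the components. The struct and field names below are illustrative assumptions, not part of the SIS schema.

```cpp
#include <tuple>

// Hypothetical composite primary key for a Section row: Dept + Course + Section.
struct SectionKey {
    int dept;     // e.g. 91
    int course;   // e.g. 522, unique only within its department
    int section;  // e.g. 201, unique only within its course
};

// Two keys match only if every component matches.
inline bool operator==(const SectionKey& a, const SectionKey& b) {
    return std::tie(a.dept, a.course, a.section) ==
           std::tie(b.dept, b.course, b.section);
}

// Lexicographic order, so keys can be sorted or used in an ordered map.
inline bool operator<(const SectionKey& a, const SectionKey& b) {
    return std::tie(a.dept, a.course, a.section) <
           std::tie(b.dept, b.course, b.section);
}
```

With this layout, 91.522.201 and 91.523.201 share a section number but compare as different keys, because the course component differs.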
Surrogate Keys in RDB’s
• The unique pkey which selects this section of OOAD in the Student Information System (SIS) Database is a composite of Dept, Course and Section number: 91.522.201.
• For IBM’s RDB, E. F. Codd advocated a hidden ‘surrogate’ pkey to replace the user-defined composite keys. This improves code quality and performance (by expediting the fundamental RDB operation ‘join’: match pkeys to fkeys).
• Example: entities with old and new key name and value:
  Entity     Alternate (old pkey)       Surrogate (name = value)
  Dept       deptNo = 91                DEid = DE000001
  Course     courseNo = 91+522          COid = CO000220
  Section    sectNo = 91+522+201        SEid = SE002601
RDB with Surrogate pkeys:
• A GEN Example: CS Dept View of SIS Database ERD:
  Entity     Alternate (old pkey)       Surrogate (name = value)
  Dept       deptNo = 91                DEid = DE000001
  Course     courseNo = 91+522          COid = CO000220
  Section    sectNo = 91+522+201        SEid = SE002601
[Diagram: one instance row per table, linked child to parent by fkey]
  Table        Code   pkey        fkey        Number
  Department   DE     DE000001    (none)      91
  Course       CO     CO000220    DE000001    522
  Section      SE     SE002601    CO000220    201
(Note that the fkey only references the immediate ancestor or container of an object or tuple.)
• A Persistence Requirement (WHY?): Each table has a mnemonic abbreviation (DE, CO, SE) encoded into the pkey value of its objects.
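One plausible reading of the "WHY?" above is that a key carrying its table's abbreviation tells a loader which table an fkey points into, from the value alone. The sketch below assumes the layout visible in the examples (a two-letter code followed by six decimal digits); the type and function names are hypothetical and are not GEN's.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Assumed layout, read off the slide's examples: 2-letter table code + 6 digits.
struct SurrogateKey {
    std::string table;  // "DE", "CO" or "SE"
    int serial;         // 1 for DE000001
};

// Build the external text form, zero-padding the serial to six digits.
inline std::string formatKey(const SurrogateKey& k) {
    std::string digits = std::to_string(k.serial);
    if (digits.size() < 6) digits.insert(0, 6 - digits.size(), '0');
    return k.table + digits;
}

// Parse the external text form; returns nullopt if the text does not fit the layout.
inline std::optional<SurrogateKey> parseKey(const std::string& text) {
    if (text.size() != 8) return std::nullopt;
    for (std::size_t i = 0; i < 2; ++i)
        if (!std::isupper(static_cast<unsigned char>(text[i]))) return std::nullopt;
    for (std::size_t i = 2; i < 8; ++i)
        if (!std::isdigit(static_cast<unsigned char>(text[i]))) return std::nullopt;
    return SurrogateKey{text.substr(0, 2), std::stoi(text.substr(2))};
}
```

Parsing the Section row's fkey CO000220 yields table code CO, so the parent can be looked up in the Course table without consulting the schema.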
Surrogate Keys in COOL/GEN
• GEN uses surrogate pkeys and matching fkeys, but does not hide them. (OK for CAD/CASE tools with hi-tech users.)
• Pkeys can never be re-used for new objects, as long as fkeys exist that can reference their former object (in old but still-in-use database versions).
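A simple way to honor the no-reuse rule is to issue serial numbers from a counter that only moves forward. This is a minimal sketch under that assumption; the class is hypothetical, and the slides do not say how GEN actually allocates keys.

```cpp
// Hypothetical allocator: serial numbers increase monotonically and are never recycled.
class KeyAllocator {
public:
    // 'lastIssued' would be restored from persistent storage at start-up.
    explicit KeyAllocator(long lastIssued = 0) : last_(lastIssued) {}

    // Issue the next unused serial number.
    long allocate() { return ++last_; }

    // Deleting an object does NOT return its key to the pool, so a stale
    // fkey in an old database version can never resolve to a new object.
    void release(long /*key*/) {}

    long lastIssued() const { return last_; }

private:
    long last_;
};
```

The price is that the key space is consumed permanently, which is usually acceptable for a six-digit-or-larger serial per table.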
Persistent Object Identifiers
• C++ and Java objects have an object-id (oid), typically represented by the object’s virtual memory address. This oid corresponds at least conceptually to the pkey of an RDB tuple. This type of oid is not visible and not persistent, because it disappears when the program terminates.
• One way to avoid loss of information and achieve persistence is to have the RDBMS take over or duplicate OS memory-mapping functions: moving large segments of virtual memory to/from mass storage in a fail-safe manner.
• Another way to achieve persistence is to convert pkey/fkey relationships to/from object references during import/export data flows. (This is done by COOL/GEN.)
Persistent Databases
• Persistence means that pkeys and fkeys are preserved during export to mass storage or remote sites and re-import by the same or another DataBase Management System (DBMS).
• A relational database (RDB) supports inter-object relationships by foreign key (fkey) fields. These are both user-visible and persistent: they get saved in mass storage if the program terminates.
• The process of mapping RDB pkey-fkey associations to and from C++ pointers is called ‘pointer swizzling’.
[Diagram: Database in Mass Storage --import--> Database in Main Memory --export--> Database in Mass Storage]
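Pointer swizzling can be sketched in a few lines of C++. The record layout and function names are assumptions made for illustration (0 stands in for the null fkey); this is not the code GEN generates.

```cpp
#include <unordered_map>
#include <vector>

// Hypothetical in-memory record: persistent keys plus a transient pointer.
struct Record {
    long pkey = 0;             // persistent surrogate key
    long parentFkey = 0;       // persistent fkey; 0 stands for null here
    Record* parent = nullptr;  // transient, valid only while loaded
};

// Import: replace each fkey lookup by a direct pointer to the parent record.
// Returns the number of non-null fkeys that matched no pkey.
// The vector must not be resized afterwards, or the pointers become invalid.
inline int swizzle(std::vector<Record>& records) {
    std::unordered_map<long, Record*> byKey;
    for (Record& r : records) byKey[r.pkey] = &r;
    int dangling = 0;
    for (Record& r : records) {
        r.parent = nullptr;
        if (r.parentFkey == 0) continue;
        auto it = byKey.find(r.parentFkey);
        if (it != byKey.end()) r.parent = it->second;
        else ++dangling;
    }
    return dangling;
}

// Export: refresh fkeys from the pointers, then drop the pointers.
inline void unswizzle(std::vector<Record>& records) {
    for (Record& r : records) {
        if (r.parent) r.parentFkey = r.parent->pkey;
        r.parent = nullptr;
    }
}
```

After `swizzle`, walking from a Section to its Department is two pointer dereferences instead of two key lookups; after `unswizzle`, only the keys remain, which is what gets written to mass storage.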
Referential Integrity
• The principle of ‘Referential Integrity’: to maintain valid database content, all fkey values must match the unique primary key or object identifier of another tuple, or else have the reserved ‘null’ (unknown or undefined) value.
• N-ary relations (N-way associations) can be implemented by a new associative entity, whose tuples contain exactly N fkeys (plus optional non-key attributes).
• Most relations are binary (N = 2). Note that fkeys may refer to the same or different types.
• Example: see next slide.
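The referential-integrity rule translates directly into a check over a set of tuples. This is a minimal sketch with one fkey per tuple; `Tuple` and `hasReferentialIntegrity` are hypothetical names, and `std::nullopt` models the reserved null value.

```cpp
#include <optional>
#include <unordered_set>
#include <vector>

struct Tuple {
    long pkey;
    std::optional<long> fkey;  // std::nullopt models the reserved 'null' value
};

// True when every non-null fkey equals the pkey of some tuple in the set.
inline bool hasReferentialIntegrity(const std::vector<Tuple>& tuples) {
    std::unordered_set<long> keys;
    for (const Tuple& t : tuples) keys.insert(t.pkey);
    for (const Tuple& t : tuples)
        if (t.fkey && keys.count(*t.fkey) == 0) return false;
    return true;
}
```

A tuple whose fkey is 7 fails the check if no tuple has pkey 7, while a tuple with a null fkey always passes.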
N-ary Relation (ERD Styles)
• N-ary relations are many-to-many associations among N object instances (of the same or different types).
• N-way associations can be implemented by introducing a new associative entity, whose tuples contain exactly N fkeys (plus optional non-key attributes). (Most relations are binary: N = 2.)
• The diamond indicates a ternary relation among types AA, BB and CC. [It is superfluous if N=2, if the relation is one to many, or if an associative entity replaces it.]
[Diagram, example for N=3: in one style a diamond joins AA, BB and CC; in the other a new entity AABBCC (3 fkeys inside, plus optional attributes) is linked to AA, BB and CC.]
New Entity AABBCC gives these attributes a home, and replaces the diamond.
Extended ER Diagrams
• When an RDB implements an Extended ERD (EERD), a tuple’s fkeys or inter-object cross-references can identify either a super-class object or an associated parent or container object (instance of a class).
• Both types of fkeys share the same integer key value range, although they have distinct semantic meaning.
• To improve readability, EERD’s should use different styles for inheritance than for instance-level associations.
In this example, CC both inherits from AA and is a component of the composite entity BB. It contains two fkeys, (say) AAid and BBid.
[Diagram: CC inherits from AA; BB contains 0..* CC.]
Multiple Inheritance on EERD’s
• Multiple inheritance requires an fkey to each superclass object whose properties (attributes or methods) are inherited.
• In a prototype implementation of multiple inheritance, superclass object[s] actually exist apart from their corresponding subclass object[s]. Each sub-object has fkeys to each of its direct ancestor objects.
• For a C or C++ implementation, only one of possibly divergent inheritance hierarchies can be mapped into pre-compiled method inheritance. Avoid divergence if possible!
• For an ORDB, fkeys also support dynamic mapping of method inheritance. The COOL/LCP interpreter implements such a dynamic map (from a concrete object to its generic Active Instance, from object class to generic Active Class).
ORDB via Prototype Delegation
• An Extended ERD (EERD) can be implemented as either a relational RDB, object-oriented OODB, or object-relational ORDB. An OODB is supported by its own class-based data representations.
• An ORDB can be class-based or prototype-based with delegation. (GEN is prototype-based.)
• Prototype delegation does not rely on Class membership for method inheritance - it creates object-level relationships to support method delegation: ANY client object can ‘delegate’ any of its behavior to another server object via the oid equivalent of an fkey.
• To make disciplined use of delegation requires some policy other than anarchy.
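Prototype delegation can be illustrated in C++ without any class hierarchy: each object carries its own method table and a link to the object it delegates to. Everything below (the `ProtoObject` type, the `send` lookup, the string selectors) is an assumption made for illustration; it is not the LCP interpreter.

```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical prototype object: methods are stored per object, not per class.
struct ProtoObject {
    std::string name;
    std::map<std::string, std::function<std::string(const ProtoObject&)>> methods;
    const ProtoObject* delegate = nullptr;  // in-memory stand-in for an fkey

    // Look the selector up locally, then along the delegate chain.
    // The original receiver is passed to the method, as in classic delegation.
    // Assumes the delegate chain has no cycles.
    std::string send(const std::string& selector) const {
        for (const ProtoObject* p = this; p != nullptr; p = p->delegate) {
            auto it = p->methods.find(selector);
            if (it != p->methods.end()) return it->second(*this);
        }
        return "(not understood)";
    }
};
```

If an object `dog` owns a `speak` method and `rex` delegates to `dog`, then `rex.send("speak")` runs dog's method with `rex` as the receiver. The no-cycles assumption is one small example of the kind of policy the last bullet asks for.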
GEN Database: Persistence
Our GEN tool imports an external RDB to a memory-resident object-relational database (ORDB):
• Its external persistent RDB format is a union of records representing tuples of different types.
• During import, fkeys are augmented or replaced by parent and first-child and next-sibling object reference pointers, which follow strict GEN naming conventions.
• During export, pointers are removed but fkeys are preserved or restored for persistent storage in external RDB tuples.
GEN Database: Schema Constraints
• The external RDB schema (or EER Diagram) is first converted to Third Normal Form.
• Other attributes that would normally comprise a user-defined (and typically composite) primary key can be removed during schema or EERD conversion to Third Normal Form.
• This eliminates redundant attributes that functionally depend on some fkey instead of the pkey attribute.
GEN Database: External Format
• Our GEN tool imports an external RDB to a memory-resident object-relational database (ORDB).
• Its external RDB format is a union of records representing tuples of different types:
• Every tuple record has an integral and immutable ‘surrogate’ primary key attribute (and object id).
• Different tuple types have pairwise disjoint pkey ranges.
• All foreign keys (fkeys) use this surrogate pkey value to refer to their parent (container or superclass) record type.
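Pairwise disjoint pkey ranges mean that the tuple type can be recovered from the integer key alone. The sketch below shows that lookup and a check of the disjointness property; the range boundaries and names are invented for the example, since the slides do not give GEN's actual ranges.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical half-open key range [first, last) owned by one tuple type.
struct KeyRange {
    std::string type;
    long first;
    long last;
};

// Return the tuple type that owns 'key', or an empty string if none does.
inline std::string typeOfKey(const std::vector<KeyRange>& ranges, long key) {
    for (const KeyRange& r : ranges)
        if (key >= r.first && key < r.last) return r.type;
    return "";
}

// True when no two ranges share any key value.
inline bool pairwiseDisjoint(const std::vector<KeyRange>& ranges) {
    for (std::size_t i = 0; i < ranges.size(); ++i)
        for (std::size_t j = i + 1; j < ranges.size(); ++j)
            if (ranges[i].first < ranges[j].last && ranges[j].first < ranges[i].last)
                return false;
    return true;
}
```

With ranges DE = [1, 1000), CO = [1000, 2000) and SE = [2000, 3000), key 1220 resolves to CO; adding an overlapping range would make `pairwiseDisjoint` fail.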
GEN Database: Internal Format
• During import, fkeys are augmented or replaced by direct parent object pointers plus first-child and next-sibling object reference pointers. These are constructed from fkey names following strict GEN naming conventions.
• This results in an internal ORDB format which is a set of multiply-threaded linked lists of parent-to-children and super-to-subclass object (tuple instance) reference pointers.
• Parent-pointers support direct access to parent table attributes, replacing pair-wise join queries in an RDB.
• For each 1-to-many parent-child relationship, chgen provides a child_loop macro while gencpp provides a for-each iterator.
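The parent, first-child and next-sibling threading can be sketched as follows. The field and function names are assumptions (GEN derives its names from the fkey names, which the slides do not list), and the sketch keeps a single child chain per parent, whereas the slides describe one chain per child type.

```cpp
#include <string>
#include <vector>

// Hypothetical node layout with the three reference pointers named in the slide.
struct Node {
    std::string label;
    Node* parent = nullptr;
    Node* firstChild = nullptr;
    Node* nextSibling = nullptr;
};

// Link 'child' under 'parent', appending to the end of the sibling chain.
inline void addChild(Node& parent, Node& child) {
    child.parent = &parent;
    child.nextSibling = nullptr;
    if (!parent.firstChild) {
        parent.firstChild = &child;
        return;
    }
    Node* last = parent.firstChild;
    while (last->nextSibling) last = last->nextSibling;
    last->nextSibling = &child;
}

// Visit every child in order: the role played by a child loop or for-each iterator.
inline std::vector<std::string> childLabels(const Node& parent) {
    std::vector<std::string> out;
    for (const Node* c = parent.firstChild; c != nullptr; c = c->nextSibling)
        out.push_back(c->label);
    return out;
}
```

Reading a parent attribute from a child is then `child.parent->label`, a pointer dereference where an RDB would need a join.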
GEN Database: Import/Export
GEN creates two schema-based import/export utilities:
• pr_load parses tuples and imports an external RDB into a memory-resident object-relational database (ORDB);
• pr_dump exports the modified ORDB back to the persistent external RDB.
• During import, fkeys are augmented or replaced by direct parent pointers plus first-child and next-sibling object reference pointers. These are constructed from fkey names following strict GEN naming conventions. Super- and sub-class objects are also connected in the same way.
• This results in an internal ORDB format which is a set of multiply-threaded linked lists from each parent through each of its child-sets, which supports parent-child JOINs.
Importing RDB’s to C++/Java
• If the RDB is imported to an object-relational database implemented in C++ or Java, then during import the fkey fields of RDB tuple types should be converted to corresponding C++/Java object reference types.
• Caveat/pre-condition: All fkeys implied by links on the RDB’s data model or EERD must conform to inheritance and type constraints of the language (C++ or Java).
• Fkeys in an RDB can also support non-exhaustive or overlapping subclasses (going beyond C++ constraints).
• Fkeys and object references can also support dynamic migration (of an object among the subclasses of its class).
• Example: An object may make transitions among OLC states (states become subclasses of the object’s class).
Object-Relational Databases - Prototypes and Delegation
The last few slides were inspired by Shlaer-Mellor-User Group email related to Divergent Inheritance (parallel hierarchies). This motivates the use of prototypes and delegation to explain the static information architecture that is supported by COOL’s chGEN/GENcpp code generator, and illustrates concurrent sub-state machine models for dynamic behavior.
• To: shlaer-mellor-users@projtech.com
• Subject: Re: (SMU) Polymorphic events and other paranormal activity
• Message 10/734 From lschneid@eng.delcoelect.com
• Sep 04, 01 08:45:33 AM
• responding to Fontana: . . .
Divergent Hierarchies
• responding to Fontana:
• > I think Jay was driving at divergent hierarchies, not multiple inheritance, eg:
• > relationship S1 - supertype Dog, subtypes BigDog and SmallDog
• > relationship S2 - supertype Dog, subtypes BlackDog and WhiteDog
[Diagram: DOG CLASS at the root with two partitions, each mutually exclusive (xor) and exhaustive: Relationship S1 (BIG xor SMALL) with subtypes Big Dog and Small Dog; Relationship S2 (BLACK xor WHITE) with subtypes Black Dog and White Dog.]
OLC’s with Concurrent Sub-states
• > Assume each of the 4 subclasses has its own ‘object lifecycle’ (OLC):
• > BigDog: Woofing <--> Sleeping
• > SmallDog: Yipping <--> Skittering
• > BlackDog: Panting <--> Drooling
• > WhiteDog: Shedding <--> Scratching
• > Now create one instance of Dog - let's say it is a big black dog, with a dogId = 13. It must be in one of the BigDog states (Woofing or Sleeping), AND in one of the BlackDog states (Panting or Drooling).
[Diagram: Dog #13 (Big and Black) with two concurrent regions. Big: Woofing <--> Sleeping. Black: Panting <--> Drooling.]
Merging OLC Behaviors of Concurrent Subclasses:
• Each of the 4 subclasses has its own ‘object lifecycle’ (OLC);
• E.g. every Big&Black Dog must be in one of the BigDog states (Woofing or Sleeping), AND in one of the BlackDog states (Panting or Drooling).
• Dog #13 (Big and Black) has the behavior/activity of both BigDogs and BlackDogs:
                      Black Dog OLC:
  Big Dog OLC:        Panting        Drooling
  Woofing             Woof&Pant      Woof&Drool
  Sleeping            Sleep&Pant     Sleep&Drool
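The merged behavior is the product of two independent state machines: the dog holds one state from each OLC, and the combined state is the pair. The sketch below assumes each OLC simply toggles between its two states and starts in Woofing and Panting; the email does not specify events or initial states, so those choices are hypothetical.

```cpp
#include <string>

enum class SizeState { Woofing, Sleeping };   // BigDog OLC
enum class ColorState { Panting, Drooling };  // BlackDog OLC

// A big black dog is always in one state of each OLC at the same time.
struct BigBlackDog {
    int dogId;
    SizeState size = SizeState::Woofing;     // assumed initial state
    ColorState color = ColorState::Panting;  // assumed initial state

    // Each OLC advances independently of the other.
    void toggleSize() {
        size = (size == SizeState::Woofing) ? SizeState::Sleeping : SizeState::Woofing;
    }
    void toggleColor() {
        color = (color == ColorState::Panting) ? ColorState::Drooling : ColorState::Panting;
    }

    // Name of the combined state, one of the four cells in the table above.
    std::string combined() const {
        std::string s = (size == SizeState::Woofing) ? "Woof" : "Sleep";
        s += "&";
        s += (color == ColorState::Panting) ? "Pant" : "Drool";
        return s;
    }
};
```

Two states per OLC give four combined states; the count multiplies with every further concurrent partition, which is why keeping the machines separate is easier than enumerating the product.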
Divergent Hierarchies - revisited (1)
[Diagram: DOG CLASS at the root with two partitions, each mutually exclusive and exhaustive: Partition S1 (BIG xor SMALL) with Big Dog and Small Dog; Partition S2 (BLACK xor WHITE) with Black Dog and White Dog.]
• C++ does not support divergent class hierarchies.
• One alternative is prototype objects with delegation.
• RDB’s can support prototypes and delegation:
• In our example, each dog object belongs to one subclass for color, and simultaneously to another subclass for size.
• That is, a ‘real’ dog object simultaneously belongs to, and inherits from, exactly one of the subclasses in each inheritance tree above.
• The next slide shows (by its messiness) that multiple inheritance is best avoided.
Divergent Hierarchies - revisited (2)
[Diagram: level 1 is DOG CLASS; level 2 holds Partition S1 (BIG xor SMALL): Big Dog, Small Dog, and Partition S2 (BLACK xor WHITE): Black Dog, White Dog, each mutually exclusive and exhaustive; level 3 holds Big Black Dog, Big White Dog, Small Black Dog and Small White Dog, each linked to one subclass in each partition.]
• Level 3 includes concrete ‘leaf’ objects or ‘real’ dogs, which simultaneously belong to a distinct pair of subclasses at level 2 of the inheritance tree (compositional inheritance of properties).
• So there are really 4 leaf classes at level 3, below level 2 above.
• Each leaf class instance at level 3 has exactly two paths up to level 1; both paths must end up at the same root object (Dog instance).
Composition or Implementation Inheritance
• With compositional inheritance, dogs will inherit from two ‘component’ classes: Color and Size.
• This is ‘impure’ multiple inheritance in C++ (impure because the two ancestor classes have nothing in common with animals, which may not behave well as clients of Color or Size ancestor methods).
• Java does not have multiple inheritance - but any class may ‘implement’ the interfaces ‘Color’ and ‘Size’ instead.
• Dogs must then be eligible to inherit (C++) or implement (Java) all the methods of the Color and Size classes - an undesirable compromise. Overriding only hides the mismatch between class Dog and the Color or Size classes.
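The compromise can be made concrete in C++. The first struct below inherits from both component classes, as the slide describes; the second holds them as members and forwards only what it needs, which is one possible alternative in the spirit of delegation. All class and member names are invented for the example.

```cpp
#include <string>
#include <utility>

struct Color {
    std::string name;
    std::string describe() const { return name; }
};

struct Size {
    std::string name;
    std::string describe() const { return name; }
};

// 'Impure' multiple inheritance: the dog "is a" Color and "is a" Size.
// Both bases offer describe(), so the dog must override it to resolve the clash,
// which hides the mismatch rather than removing it.
struct InheritingDog : Color, Size {
    InheritingDog(std::string c, std::string s)
        : Color{std::move(c)}, Size{std::move(s)} {}
    std::string describe() const {
        return Size::describe() + " " + Color::describe() + " dog";
    }
};

// Alternative: the dog "has a" Color and a Size and forwards to them explicitly.
struct ComposedDog {
    Color color;
    Size size;
    std::string describe() const {
        return size.describe() + " " + color.describe() + " dog";
    }
};
```

Both versions produce the same description, but an `InheritingDog` can be passed anywhere a `Color` is expected, which is exactly the kind of accidental substitutability the slide warns about.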
References
• Frank & Ulrich: “Delegation: An Important Concept for the Appropriate Design of Object Models”, JOOP June 2000 (pp13-17, 44)
• Eliens: Principles of OO Software Dev. 2ed., AWL 2000 (Sect. 5.4: Prototypes - delegation vs. inheritance)
• Kilov/Ross: Information Models, PH 1994 (Not about delegation, but covers multiple/concurrent/overlapping subclass membership.)
• Lee & Tepfenhart: UML and C++: A Practical Guide to OO Dev, 2ed, PH 2001 (pp206-210) (Multiple Inheritance examples Fig. 12-4, 12-5)
• Sanders: Data Modeling, Boyd-Fraser/ITP 1995 (Ch. 3 and Appendix)