680 likes | 1.29k Views
Course: Database Design. Basic Information Course Code : COM 411 Instructor: O. Adigun Office: NASB-022 Email: ranmiadigun@yahoo.com ranmiadigun@yabatech.edu.ng GSM no: 08055317571. Textbooks. Peter Rob, Carlos Coronel: Database Systems: Design Implementation and 5th Edition 2001
E N D
Course: Database Design • Basic Information • Course Code : COM 411 • Instructor: O. Adigun • Office: NASB-022 • Email: ranmiadigun@yahoo.com • ranmiadigun@yabatech.edu.ng • GSM no: 08055317571 COM 411 DATABASE DESIGN
Textbooks • Peter Rob, Carlos Coronel: Database Systems: Design Implementation and 5th Edition 2001 • Simson Graham C (2001); Database processing: Fundamentals Design and Implementation. Prentuce-Hall • Jeff Ullman & Jennifer Widom A first Course in Database systems • Abraham Siberschatz, Henry F. Kort and S. Sudarshan: Database System Concepts • Philip J. Pratt and Joseph J. Adamski: The Concepts of Database Management COM 411 DATABASE DESIGN
Requirements • Prerequisites • Programming language • Work and Grading • Attendance 5% • Home work/Assignment: 15% • Test: 10% • Exam 70% • Cheating • Don’t even think about it! • No form of academic dishonesty will be tolerated in this class. COM 411 DATABASE DESIGN
What is a database? • A database is a shared collection of logically related data (and a description of this data), designed to meet the information needs of an organization. For example, a phone book is a database of names, addresses and phone numbers. • address problems such as • How to store data • How to query data effectively • How to update the data securely( by multiple users) • Contrast with using file systems for the same task COM 411 DATABASE DESIGN
Examples of Database Applications • Purchases from the supermarket • Purchases using your credit card • Booking a holiday at the travel agents • Using the local library • Taking out insurance • Using the Internet • Studying at university COM 411 DATABASE DESIGN
Example of a Database • Every entity has a set of records • HINT: In analogy to OOP, an entity is an object class while a record is an object instance • To definea database • Describe the structure of every record • A set of properties called data elements • Each has a name and a data type • Some other restrictions or constraints (e.g. valid major) • To construct a database • Store the actual records in their corresponding tables • To manipulatea database • Query, insert, edit, delete COM 411 DATABASE DESIGN
Early Data Management - Ancient History • Data are Early Data Management - Ancient History not stored on disk • Programmer defines both logical data structure and physical structure (storage structure, access methods, I/O modes, etc) • One data set per program. High data redundancy. • Each program defines and manages its own data. • Collection of application programs that perform services for the end users (e.g. reports). COM 411 DATABASE DESIGN
Problems • There is no persistence. • All data is transient and disappears when the program terminates. • Random access memory (RAM) is expensive and limited • All data may not fit available memory • Programmer productivity low • The programmer has to do a lot of tedious work. COM 411 DATABASE DESIGN
File Processing - Recent (and Current) History • Data are stored in files with interface between programs and files. • Various access methods exist (e.g., sequential, indexed, random) • One file corresponds to one or several programs. COM 411 DATABASE DESIGN
Problems With File Systems • Data are highly redundant • sharing limited and at the file level • Data is unstructured • “flat” files • High maintenance costs • data dependence; accessing data is difficult • ensuring data consistency and controlling access to data • Sharing granularity is very coarse • Difficulties in developing new applications COM 411 DATABASE DESIGN
Advantages of Database systems nSelf-describing nature of a database system: A DBMS catalog stores the description of the database • The description is called meta-data (tables, fields, types, etc…)s • This allows the same DBMS software to work with different databases (general-purpose) • Insulation between programs and data: a.k.a. program-data independence. • Allows flexibility in changing the data, its storage structures and operations without having to change the DBMS access programs • Data Abstraction: A data model is used to hide storage details and present the users with a conceptual view of the database • Users don’t care how things are physically stored • Where record1 ends and record2 starts on stored files • File-based systems need to know this info • Data Abstraction: A data model is used to hide storage details and present the users with a conceptual view of the database • Users don’t care how things are physically stored • Where record1 ends and record2 starts on stored files • File-based systems need to know this info COM 411 DATABASE DESIGN
Advantages of Database systems (Contd.) • Sharing of data and multi-user transaction processing allowing a set of concurrent users to retrieve and to update the database • Concurrency control & recovery within the DBMS guarantee that each transaction is correctly executed (or completely aborted as we shall see later) • OLTP (Online Transaction Processing) is a major part of database applications • In file processing, every file is usually accessed by a single user at a time • Restricting unauthorized access to data • Providing storage structures for efficient query processing • Indexes (similar indexes at the end of books) • Reduces data redundancy • Avoids inconsistency • Security • Maintains Integrity COM 411 DATABASE DESIGN
In order to provide processes • There are requirements of digital databases • Security • Reliability • Integrity • Providing user views • Being independent of the underlying data • Providing support for metadata • High performance • Providing concurrency to users COM 411 DATABASE DESIGN
DBMS Query Processor Integrated Database PROGRAM 1 … PROGRAM 2 Database Approach COM 411 DATABASE DESIGN
The Database Approach • The limitations of the file-based approach can be attributed to two • factors: • The definition of the data (its nature and structure) is embedded in the • programs which are used, rather than being stored separately and independently. • There is no control over the access and manipulation of data beyond that imposed by the programs. • A more effective approach requires two new concepts: • the Database • the Database Management System • A database is a shared collection of logically related data (and a description of this data), designed to meet the information needs of an organisation. COM 411 DATABASE DESIGN
Database • A database is a single, large repository of data that is defined once and used simultaneously by many users (e.g. in different departments.) • Rather than having disconnected files with redundant data, all data is integrated, with minimum duplication • No data is any longer “owned” by a single department; all data is now a shared corporate resource. • A database holds not only an organisation’s operational data, but also a description (more likely a number of separate descriptions) of that data. • The description is known as the database schema or meta-data (the data about data). • This provides a degree of independence between programs and the data since the structure of the data is separated from the programs and stored in the database. COM 411 DATABASE DESIGN
Logically Related Data? • When we analyse the information needs of an organisation we try to identify entities, attributes, and relationships. • An entity is a distinct object (a person, place or thing, etc.) in the organisation that must be represented in the database. • e.g. a Renter, a Property, a Lease Agreement, etc. • An attribute is a property that describes some aspect of the object that we wish to record. • e.g. Rent, Number of rooms, Name, Address, etc. • A relationship is an association among two or more entities. • e.g. Properties have Owners, Renters are associated with a Lease agreement and a Property, etc. • A database combines these aspects: it holds data that is logically related in ways that correspond with the relationships. COM 411 DATABASE DESIGN
What is a response to theserequirements? • A Database Management System (DBMS) • is a Database (collection of Files), and the associated; programs that operate on the files as well as an organizational structure supporting the files and their use • It is a software system that manages a database • Typically provides these functions: • Backup/recovery, Retrieval • Data definition • Storage management • Abstract data model • High level query and data manipulation language • Transaction processing • Recovery from crashes • Interface with programming languages COM 411 DATABASE DESIGN
Database Management System • The structure of the database is determined by the abstractdata model that is used. • A database management system (DBMS) is the generalized tool that facilitates the management of and access to the database. • There are many different types of DBMSs, ranging from small systems that run on personal computer to huge systems that run on mainframe. • The following are examples of database applications: • computerized library systems • automated teller machines • flight reservation systems • computerized parts inventory systems COM 411 DATABASE DESIGN
Database Management System • The Database Management System (DBMS) is the software that enables users to define, create, and maintain the database and which provides controlled access to this database. • The DBMS comes between the users and the database. • First, the DBMS allows users to define the database, usually through a Data Definition Language (DDL). • The DDL allows users to specify the data types and structures of, and the constraints on, the data to be stored in the database. • Second, the DBMS allows users to insert, update, delete, and retrieve data from the database, usually through a Data Manipulation Language (DML) COM 411 DATABASE DESIGN
Database Management System contd. • The fact that the database provides a central storage repository for data allows the DML to provide a general enquiry facility for this data, called a query language. • Query languages help prevent the problems of either a fixed set of operations or a proliferation of programs that occur with file-based systems. • The DBMS provides controlled access to the database. • Prevents access by unauthorised users. • Protects integrity of data by maintaining its consistency. • Controls access to allow sharing. • Facilitates recovery following hardware or software failures. COM 411 DATABASE DESIGN
Components of DBMS Environment COM 411 DATABASE DESIGN
Components of DBMS Environment • Hardware • Can range from a PC to a network of computers. • Software • DBMS, operating system, network software (if necessary) and also the application programs. • Data • Used by the organization and a description of this data called the schema. • Procedures • Instructions and rules that should be applied to the design and use of the database and DBMS. • People COM 411 DATABASE DESIGN
TYPES OF DBMS • Relational – data model based on tables • Network – data model based on graphs with records as nodes and relationships between records as edges • Hierarchical – data model based on trees • Object-Oriented – data model based on the object-oriented programming paradigm • Distributed – composed of several independent DBMSs running at the nodes of a communications network • A flat-file database program, allows the user to create many databases but lets him/her work with only one file at a time. Using a flat -file database program, one can create simple applications such as mailing list databases or personnel files. COM 411 DATABASE DESIGN
History of Database Models • History: • North American Rockwell developed GUAM (Generalized Update Access Method) • Mid 1960s Rockwell partner with IBM to create Information Management System (IMS) • IMS DB/DC lead the mainframe database market in 70’s and early 80’s • Represents well hoe components are decomposed into parts • Network: • CODASYL (Conference on Data Systems Languages) created a group to work on standardization of databases: Database Task Group (DBTG) • Identified 3 database component: • Network schema (database organization) • Subschema (views of database per user) • Data management language COM 411 DATABASE DESIGN
Hierarchical Database Model • Logically represented by an upside down tree • Each parent can have many children • Each child has only one parent COM 411 DATABASE DESIGN
Hierarchical Database Model • Advantages • Conceptual simplicity • Database security and integrity • Data independence • Efficiency • Disadvantages • Complex implementation • Difficult to manage and lack of standards • Lacks structural independence • Applications programming and use complexity • Implementation limitations (no M:N relationship) COM 411 DATABASE DESIGN
Network Database Model • Each record can have multiple parents • Composed of sets - relationships • Each set has owner record and member record • Member may have several owners • A set represents a 1:M relationship between the owner and the member COM 411 DATABASE DESIGN
Network Database Model • Advantages • Conceptual simplicity • Handles more relationship types • Data access flexibility • Promotes database integrity • Data independence • Conformance to standards • Disadvantages • System complexity • Lack of structural independence COM 411 DATABASE DESIGN
Relational Database History • 1970 (Relational Data Model) • Edgar Codd (at IBM Almaden Research, San Jose) proposed a • new data model • Idea is followed by the development of several DBMS (e.g., • System R) and development of theoretical results • First deployed on mainframe computers (DB2), then also personal computers • Oracle, Informix, SQL server, DB2 • Since the early 1980's, the Relational Model has dominated the field • The database management systems (DBMS) field is rapidly changing. • Development of the: • Entity Relationship model (more on this later) and • Object Oriented Database Management System • promise to change the field significantly in the next few years. COM 411 DATABASE DESIGN
Relational database Model • Perceived by user as a collection of tables for data storage • Tables are a series of row/column intersections (a row corresponds to a record, a column to a field) • Tables related by sharing common entity characteristic(s) • Relational Database Software allows the user to work with several database files at the same time and share information across the files. • For example, to implement an accounting database system, one requires relational capabilities to link together information that is stored in the different files. • An example of a relational database software would be MS Access, Oracle, Sybase and Paradox. COM 411 DATABASE DESIGN
Relational Database Model (con’t.) A collection of data items organized as a set of formally-described tables from which data can be accessed or reassembled in many different ways without having to reorganize the database tables. COM 411 DATABASE DESIGN
Objectives of RelationalDesign • to reduce the amount of redundant data in relations that can waste space and causes integrity problems • to provide fast access to data from the relations • Following are some of the advantages of using a database over a traditional file-processing system: • Potential for enforcing standards. • Flexibility. • Reduced application development time. • Availability of up-to-date information to all users. • Economies of scale. COM 411 DATABASE DESIGN
Advantages & Disadvantages • Advantages • Easier database design, implementation, management, and use • Data can be shared. • Data storage and redundancy can be reduced. • Data inconsistency can be avoided. • Data Integrity can be maintained. • Standards can be enforced. • Security restrictions can be applied. • Independence between physical storage and logical data design can be maintained. • Disadvantages • Substantial hardware and system software overhead • Poor design and implementation is made easy • May promote “islands of information” problems COM 411 DATABASE DESIGN
Terminology COM 411 DATABASE DESIGN
A Relation in the Language of RDBMS Relation Tuples Attributes COM 411 DATABASE DESIGN
Some definitions… nPrimary key – • A table column (or columns if the key is composite) that can be used to uniquely identify every row of the table; • no two rows of the table have the same value n Foreign key – • A column, or combination of columns, that contain • values that are found in the primary key of some • table (including, possibly, itself). • A foreign key may be null, and almost always is not unique. COM 411 DATABASE DESIGN
More definitions • Domain – a pool of values from which • specific attributes of specific relations draw their actual values • Relational join • Tables can be linked by a relational join. • A primary key of a table is linked to another foreign key on another table. COM 411 DATABASE DESIGN
Keys Two tables (relations) that are related via a foreign key COM 411 DATABASE DESIGN
Cardinality • Cardinality defines the numeric relationships between occurrences of the entities on either end of the relationship line • 0:M (zero to many) optional relationship indicating that a Color might have zero Adjectives, one Adjective or lots of Adjectives, and that a Adjective might be un-owned, but can only be owned by a maximum of one Color. • This is implemented in a database as a nullable foreign key column in the Adjective table that references the Color table. Adjective Colour COM 411 DATABASE DESIGN
Cardinality (cont.) Can be either Optional (0) or Mandatory (1) • one-to-one (0:0) (0:1) (1:0) (1:1) • one-to-many (0:M) (0:M) (1:M) (1:M) • many-to-one (M:0) (M:1) (M:0) (M:1) • many-to-many (M:M) (M:M) (M:M) (M:M) Note: (_:O,M) is sometimes used to denote an optional many COM 411 DATABASE DESIGN
The Database Environment • A database system aims to provide users with an abstract view of data by hiding certain details of how data is stored and manipulated. • Therefore, the starting point for the design of a database must be an abstract description of the information requirements of the organisation. This will be in terms of entities, attributes and relationships, as we saw before. • And it will not be concerned at all with the mechanisms which might be used to store or retrieve the data. (It is abstract.) COM 411 DATABASE DESIGN
An Abstract View • For example, in the estate agent example we may build an abstract view which contains the following: • Entities: Staff, Property, Owner, and Renter (maybe others, too). • Attributes describing properties or qualities of each entity (e.g. Staff have names, addresses, and salaries). • Relationships between these entities (e.g. Owners own Properties). • Since a database is a shared resource, we may also be concerned to provide different users with different views of the data held in the database. COM 411 DATABASE DESIGN
ANSI-SPARC Architecture • To satisfy these needs, the architecture of most commercial DBMSs is based on the ANSI-SPARC architecture (1975). • American National Standards Institute (ANSI) • Standards Planning And Requirements Committee (SPARC) • Although this never became a formal standard, it is useful to help understand the functionality of a typical DBMS. • The ANSI-SPARC model of a database identifies three distinct levels at which data items can be described. • These levels form a three-level architecture comprising: • External • Conceptual • Logical and • Internal (physical) COM 411 DATABASE DESIGN
Internal view ANSI-SPARC* Architecture ASSIGNMENT(ENO,PNO,ENAME,PNAME) Users External Schema External view External view External view Conceptual view EMP(ENO: string, ENAME: string, TITLE: string) PROJ(PNO: string, PNAME: string, BUDGET: integer) WORKS(ENO: string, PNO: string, RESP: string, DUR: integer) Conceptual Schema Conceptual view Store all the relations as unsorted files. Build indexes on EMP(ENO), PROJ(PNO) and WORKS(ENO,PNO). DBMS Internal Schema DBMS COM 411 DATABASE DESIGN
Three-Level Architecture I • The objective of the three-level architecture is to separate the users’ view(s) of the database from the way that it is physically represented. This is desirable for the following reasons: • 1. It allows independent customized user views. • Each user should be able to access the same data, but have a different customized view of the data. These should be independent: changes to one view should not affect others. • 2. It hides the physical storage details from users. • Users should not have to deal with physical database storage details. They should be allowed to work with the data itself, without concern for how it is physically stored. COM 411 DATABASE DESIGN
Three-Level ArchitectureII • The database administrator should be able to change the database storage structures without affecting the users’ views. • From time to time rationalisations or otherchanges to the structure of an organisation’s data will be required. • 4. The internal structure of the database should be unaffected by changes to the physical aspects of the storage. • For example, a changeover to a new disk. • 5. The database administrator should be able to change the conceptual or global structure of the database without affecting the users. • This should be possible while still maintaining the desired individual users’ views. COM 411 DATABASE DESIGN
The External Level • The external level represents the user’s view of the database. • It consists of a number of different views of the database, potentially one for each user. • It describes the part of the database that is relevant to a particular user. • Views may provide different representations of the same data. • For example, some users might view dates in the form (day/month/year) while others prefer (year/month/day). • Some views might include derived or calculated data. • For example, a person’s age might be calculated from their date of birth since storing their age would require it to be updated each year. • Conceptual Level • Community view of the database. • Describes what data is stored in database and relationships among the data. COM 411 DATABASE DESIGN
The Conceptual Level • The conceptual level describes what data is stored in the database and the relationships among the data. • It is a complete view of the data requirements of the organization that is independent of any storage considerations. • The conceptual level represents: • All entities, their attributes, and their relationships. • The constraints on the data and security and integrity information. • The conceptual level supports each external view, in that any data available to a user must be contained in, or derivable from, the conceptual level. • The description of the conceptual level must not contain any storage dependent details. COM 411 DATABASE DESIGN
The Internal Level • The internal level covers the physical representation of the database on the computer (and may be specified in some programming language). • It describes how the data is stored in the database in terms of particular data structures and file organizations. • The internal level is concerned with: • Allocating storage space for data and indexes. • Describing the forms that records will take when stored. • Record placement. Assembling records into files. • Data compression and encryption techniques. • The internal level interfaces with the OS to place data on the storage devices, build the indexes, retrieve the data, etc. COM 411 DATABASE DESIGN