350 likes | 408 Views
Learn physical database design, performance tuning, security, emerging DB technologies with hands-on projects in this graduate-level course.
E N D
The Teaching Staff • Instructor: Lu Hongjun • Office: 3543 (Lift 25-26), HKUST • E-Mail: luhj@cs.ust.hk • URL: http://www.cs.ust.hk/~luhj • Research Interests: • Data/Knowledge base management with emphasis on query processing and optimization • Data warehousing and data mining • Applied performance evaluation • Database application development • Parallel and distributed database systems • TA: • Name Jiang Haifeng Liu Guimei • Office: 4212 (DB Lab) HKUST • E-Mail: jianghf@cs.ust.hk cslgm@cs.ust.hk • URL: http://ihome.ust.hk/~jianghf http://ihome.ust.hk/~cslgm
References • R.Ramakrishnan & J. Gehrke. Database Management Systems, 3rd Ed. McGraw Hill, 2000 • D. Shasha & P. Bonnet. Database Tuning: Principles, Experiments, and Troubleshooting Techniques, Revised edition , Morgan Kaufmann, 2002 • Related papers
Course Contents • Part I: Issues in database administration • Database design • Principles of database performance tuning • Database security • Part II: Emerging DB-related technology • OLAP and data warehouse • XML data management • Data stream processing • Course Web Page: http://course.cs.ust.hk/comp334/
Grading • Written assignment (20%) • Exams (25%) • Course project (50 %) • Class participation (5%)
Course Project Requirements • Carried in teams of two or four • Database related projects • You propose your own project, and get approve from the instructor • Topic: database related • The amount of work : it accounts for 50% of your final grade • Required documents (double-spaced) • Project proposal (1-2 pages) • due date: 23-24/02 • Status report (4-6 pages) • due date: 28-29/03 • Final report (8-10 pages) • due date: 10-11/05
Summary • It is a graduate level course • Not a DBA course • Not an introductory database course • Not a programming course, but you need to know how to write programs • Hopefully, you will leave with • A good grade • A good understanding of studied topics
Review -- RDBMS • Relational database systems • The basic concepts in database systems • Relational data model • Relational languages • Database design • Previous course: conceptual and logic design • This course: physical database design • Database management systems • The basic components of DBMS • Storage management • Transaction management • Query processing & optimization
What Is Database & DBMS? • Database: a very large,integrated, persistent collection of data. • Models real-world enterprise. • Entities (e.g., students, courses) • Relationships (e.g., James is taking CSIT530) • A Database Management System (DBMS)is a software package designed to store and manage databases.
Data Models • A data model is a collection of concepts for describing • data and related operations, • semantics of data, • relationship among data, and • constraints on data • Two types of data models • Conceptual models: emphasize semantics of data • Entity-Relationship model, Object-Oriented model • Logical models: ways how the data is organized in the logical level • Hierarchical model, Network model, Relational model
Instances and Schemas • A schemais a description of a particular collection of data, using a given data model - the logical structure of the database (e.g., set of customers and accounts and the relationship between them) • Schema Instance - the actual content of the database at a particular point in time • Similar to types and variables in programming languages
Levels of Abstraction • ANSI-SPARC three-level architecture • Many views, single conceptual (logical) schemaand physical schema. • Views describe how users see the data. • Conceptual schema defines logical structure • Physical schema describes the files and indexes used. View View View Conceptual Schema Physical Schema
Data Independence • Applications insulated from how data is structured and stored. • Ability to modify a schema definition in one level without affecting a schema definition in the next higher level. • The interfaces between the various levels and components should be well defined so that changes in some parts do not seriously influence others. • Logical data independence: Protection from changes in logical structure of data. • Physical data independence: Protection from changes in physical structure of data.
Procedures And standards Analysts & Programmers End Users Database Environment Specifies & enforces Database Administrator System administrator Database Designer manages designs Hardware DBMS Application Programs use write Data designs
DBMS Related Languages • Data Definition Language (DDL) • Specification notation for defining the database schema • Data storage and definition language - special type of DDL in which the storage structure and access methods used by the database system are specified • Data Manipulation Language (DML) • Language for accessing and manipulation the data organized by the appropriate data model • Two classes of languages • Procedural - user specifies what data is required and how to get those data. • Nonprocedural - user specifies what data is required without specifying how to get those data
DBMS Related Languages Host Language Programming Language for DBMS Applications DDL Data Sublanguage Procedural DML Non-Procedural Query Language
Evolution of Database Technology • 1960s: Hierarchical (IMS) & network (CODASYL) DBMS. • 1970s: Relational data model, relational DBMS implementation. • 1980: RDBMS rules the earth • 1985-: Advanced data models (extended-relational, OO, deductive, etc.) Application-oriented DBMS (spatial, scientific, engineering, etc.). • 1990s: ORDB, OLAP, Data mining, data warehousing, multimedia databases, and network databases.
What is an RDBMS • A piece of software that manages data based on the relational model • Relational data, SQL queries • Commercial products • Oracle, IBM DB2, IBM Informix, Sybase, Microsoft SQL Server • Each has ~10 million lines of C/C++ code • Smaller packages – MySQL, PostgresSQL
Relational Data Model • Main concept: relation • A table with rows and columns • Every relation has a schema • Description of the columns, or fields • Relational data – rows in a table • No order among the rows in a table • The most widely used data model!
University Database • Conceptual schema: • Students (sid: string, name: string, login: string, age: integer, gpa:real) Cardinality = 3, degree = 5 , all rows distinct • Courses (cid: string, cname:string, credits:integer) • Enrolled (sid:string, cid:string, grade:string)
Relational Languages • Formal languages • Relational algebra • Relational calculus • Commercial language: SQL • DDL (Data Definition Language) • Create Table, Create Index, Create View … • DML (Data Manipulation Language) • Queries • Select • Updates • Insert, Delete, Update
Creating Tables CREATE TABLE Students (sid: CHAR(20), name: CHAR(20), login: CHAR(10), age: INTEGER, gpa: REAL) CREATE TABLE Enrolled (sid: CHAR(20), cid: CHAR(20), grade: CHAR(2))
Primary Key Constraints • A set of fields is a key for a relation if : • 1. Any two distinct tuples differ in some fields of the set, and • 2. This is not true for any subset of the set. • A superkey: Condition 1 true and 2 false. • E.g., sid is a key for Students. {sid, gpa} is a superkey. • One primary key can be set per relation.
Primary and Candidate Keys CREATE TABLE Students (sid: CHAR(20), name: CHAR(20), login: CHAR(10), age: INTEGER, gpa: REAL, PRIMARY KEY (sid), UNIQUE (login)) CREATE TABLE Enrolled (sid CHAR(20) cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid,cid))
Foreign Key Constraints • Foreign key : a set of fields in a relation • Refers to the primary key of another relation • Referential integrity • No dangling references CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2), PRIMARY KEY(sid,cid), FOREIGN KEY(sid) REFERENCESStudents ) Enrolled Students
Integrity Constraints (ICs) • IC: condition that must be true for any db instance • Domain constraints • Primary constraints • Foreign key constraints • ICs are specified when a schema is defined. • ICs are checked when relations are modified. • A legal instance of a relation • Satisfies all specified ICs
Adding and Deleting Tuples INSERT INTO Students (sid, name, login, age, gpa) VALUES (53688, ‘Smith’, ‘smith@ee’, 18, 3.2) DELETE FROM Students S WHERE S.name = ‘Smith’
Queries SELECT * FROM Students S WHERE S.sid = 53688
Querying Multiple Tables SELECT S.name, E.cid FROM Students S, Enrolled E WHERE S.sid=E.sid AND E.grade=“A” Enrolled Students
Functional Components of DBMS User/Application Database Administrator Security Control Transaction Manager DML Stmt. DDL Command Transaction Management Query Processing & Optimization DDL Compiler Concurrency Control Recovery Query Plan Execution Engine Query Processing Lock Table Log Buffer Index/file/record Management Buffer Management Storage Manager Statistics Metadata Indexes User data Storage Management
Query Optimization • A major strength of RDBMS • SQL queries are declarative • Optimizer figures out how to answer them • Re-order operations • Pick among alternatives of one operation • Ensure that the answer is correct!
Transaction • A key concept in databases • An atomic sequence of actions (read/write) • Brings DB from a consistent state to another • ACID • Atomicity • Consistency • Isolation • Durability
Concurrency Control & Recovery • Concurrency Control • Essential for good DBMS performance • Run several user programs concurrently • Interleave actions of different users • Ensure the correctness • Users may think it is a single-user system. • Recovery • Essential for durability of transactions
RDBMS Features • Effective and efficient access • Easier application development • Data independence • Data integrity and security • Concurrent access • Recovery from crashes • Uniform data administration
Summary • DBMS used to maintain, query large datasets. • Benefits include recovery from system crashes, concurrent access, quick application development, data integrity and security. • Levels of abstraction give data independence. • A DBMS typically has a layered architecture. • DBAs hold responsible jobs and are well-paid! • DBMS R&D is one of the broadest, most exciting areas in CS.