1 / 117

CS5423 Principles of Database Systems

CS5423 Principles of Database Systems. DB Fundamentals. What are Database Management Systems. DBMS is a system for providing EFFICIENT , CONVENIENT , and SAFE MULTI-USER storage of and access to MASSIVE amounts of PERSISTENT data. Capabilities of DBMS. Persistent Storage

chloe
Download Presentation

CS5423 Principles of Database Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS5423Principles of Database Systems DB Fundamentals

  2. What are Database Management Systems DBMS is a system for providing EFFICIENT, CONVENIENT, and SAFEMULTI-USER storage of and access to MASSIVE amounts of PERSISTENT data

  3. Capabilities of DBMS • Persistent Storage • Data Structures for efficient access to very large amounts of data • Programming Interface • More than reading and writing of files. Access/search and modify data through a query language • Transaction Management- ACID • a set of properties of  database transactions intended to guarantee validity even in the event of errors, power failures, etc • Atomicity: each transaction is treated as a single "unit • Consistency: any data written to the database must be valid according to all defined rules,  • Isolation:  controls how and when changes are made and if they become visible to each other, users, and systems • Durability: completed transactions  will survive permanently

  4. Example: Banking System • Data • Information on accounts, customers, balances, current interest rates, transaction histories, etc. • MASSIVE • Many gigabytes at a minimum for big banks, more if keep history of all transactions, even more if keep images of checks -> Far too big to fit in main memory • PERSISTENT • Data outlives programs that operate on it

  5. Example: Banking System • SAFE: • from system failures • from malicious users • CONVENIENT: • simple commands to debit account, get balance, write statement, transfer funds, etc. • also unpredicted queries should be easy • EFFICIENT: • don't search all files in order to get balance of one account, get all accounts with low balances, get large transactions, etc. • massive data! -> DBMS's carefully tuned for performance

  6. Multi-user Access • Many people/programs accessing same database, or even same data, simultaneously -> Need careful controls • Alex @ ATM1: withdraw $100 from account #007 get balance from database; if balance >= 100 then balance := balance - 100; dispense cash; put new balance into database; • Bob @ ATM2: withdraw $50 from account #007 get balance from database; if balance >= 50 then balance := balance - 50; dispense cash; put new balance into database; • Initial balance = 120. Final balance = ??

  7. Why File Systems Won’t Work • Storing data: file system is limited • size limit by disk or address space • when system crashes we may lose data • Password/file-based authorization insufficient • Query/update: • need to write a new C++/Java program for every new query • need to worry about performance • Concurrency: limited protection • need to worry about interfering with other users • need to offer different views to different users (e.g. registrar, students, professors) • Professor: All students’ grade • Student-Own grade • Schema change: • entails changing file formats • need to rewrite virtually all applications That’s why the notion of DBMS was motivated!

  8. DBMS Architecture User/Web Forms/Applications/DBA query transaction DDL commands Query Parser Transaction Manager DDL Processor Query Rewriter Concurrency Control Logging & Recovery Query Optimizer Query Executor Records Indexes Lock Tables Buffer: data, indexes, log, etc Buffer Manager Main Memory Storage Manager Storage data, metadata, indexes, log, etc CS411

  9. Data Structuring: Model, Schema, Data • Data model • conceptual structuring of data stored in database • E.g. arrays, objects in C or Java • ex: data is set of records, for a university system, student table • each with student-ID, name, address, courses, photo • ex: airplane system: • data is graph where nodes represent cities, edges represent airline routes • Schema versus data • schema: describes how data is to be structured, defined at set-up time, rarely changes (also called "metadata") • A set of attributed with a name for a relation • Student(Id, Name, Address, Curse,..) • datais actual "instance" of database, rows of relations(tables) changes rapidly • Each row has a component(value) for each attribute of relation defined in schema • vs. types and variables in programming languages

  10. Schema vs. Data • Schema: name, name of each field, the type of each field • Students (Sid:string, Name:string, Age: integer, GPA: real) • A template for describing a student • Data: an example instance of the relation

  11. Data Structuring: Model, Schema, Data • Data definition language (DDL) • commands for setting up schema of database • CREATE, DROP, ALTER • Data Manipulation Language (DML) • Commands to manipulate data in database: • Select, INSERT, DELETE, MODIFY • Also called "query language“ • Does not affect schema

  12. Create another column e.g. address • DDL • Insert another student information • DML

  13. People • DBMS user: queries/modifies data • DBMS application designer • set up schema, loads data, … • DBMS administrator • user management, performance tuning, … • DBMS implementer: builds systems

  14. Key Steps in Building DB Applications • Step 0: pick an application domain • Step 1: conceptual design • Discuss what to model in the application domain • Need a modeling language to express what you want • ER model is the most popular such language • Output: an ER diagram of the application domain • Step 2: pick a type of DBMS’s • Relational DBMS is most popular and is our focus

  15. Key Steps in Building DB Applications • Step 3: translate ER design to a relational schema • Use a set of rules to translate from ER to relational schema • Use a set of schema refinement rules to transform the above relational schema into a goodrelational schema • 1NF, 2NF, 3NF, BCNF, 4NF,…… • At this point • You have a good relational schema on paper

  16. Key Steps in Building DB Applications • Step 4: Implement your relational DBMS using a "database programming language" called SQL • SELECT-FROM-WHERE-GROUPBY-HAVING • Step 5: Ordinary users cannot interact with the database directly and the database also cannot do everything you want, hence write your application program in C++, Java, PHP, etc. to handle the interaction and take care of things that the database cannot do

  17. Questions so far?

  18. ER Model • ER ? • A language to specify • what information a database must hold • what are the relationships among components of that information • What we will cover • basic constructs, and design principles • Proposed by Peter Chen in 1976 • "The Entity-Relationship Model --- Toward a Unified View of Data". in ACM transactions on database systems (TODS) • One of the most cited CS papers

  19. Entities and Attributes • Entities • Real-world objects distinguishable from other objects • Described using a set of attributes • Attributes • each has an atomic domain: string, integers, reals, etc. • Entity set: a collection of similar entities

  20. makes Product Company 1 a b 2 A= c 3 B= d (Binary) Relationship • A mathematical definition: • if A, B are sets, then a relation R is a subset of A x B (Cartesian product) • A={1,2,3}, B={a, b, c, d}, R = {(1,a), (1,c), (3,b)} • makes is a subset of Product x Company:

  21. makes Product Company a b c 1 2 3 1 2 3 1 2 3 a b c d a b c d Multiplicity of E/R Relationships • one-one: • many-one • many-many

  22. did budget name dname ssn lot Departments Employees Manages Works_In Example • An employee can work in many departments; a department can have many employees • In contrast, each department has at most one manager. • So, how can we create ER diagram for this ?

  23. ER Model vs. Relational Model • Both are used to model data • ER model has many concepts • Entities, relationships, attributes, etc. • Well-suited for capturing the app. requirements • Not well-suited for computer implementation • Relational model • Has just a single concept: relation (table) • World is represented with a collection of tables • Well-suited for efficient manipulations on computers

  24. Relations • Schema vs. instance = columns vs. rows • Schema of a relation • Relation name • Attribute names • Attribute types (domains) • Schema of a database • A set of relation schemas • Questions • When do you determine a schema (instance)? • How often do you change your mind?

  25. Relation: An Example Name of Table (Relation) Column (Field, Attribute) Products Domain (Atomic type) Row (Record, Tuple)

  26. Relations • The database maintains a current database state • Updates to the data happen very frequently • add a tuple • delete a tuple • modify an attribute in a tuple • Updates to the schema are relatively rare, and rather painful. Why?

  27. Translating ER Diagram to Rel. Design • Basic cases • entity set E = relation with attributes of E • relationshipR = relation with attributes being keys of related entity sets + attributes of R • Special cases • combining two relations • translating weak entity sets • translating is-a relationships and subclasses

  28. since name dname ssn lot Employees Manages Translating ER Diagrams budget did Departments Translation to relational model? 1-to-1 1-to-Many Many-to-1 Many-to-Many

  29. E2 ER Diagram to Relations Relational schema e.g.account=(bname, acct_no, bal) E/R diagram E = ( a1, …, an ) E a1 ….. an R1 E1 R1= ( a1, b1, c1, …, ck) a1 …. an c1 …. ck b1 …. bm

  30. E2 ER Diagram to Relations R1 E1 c1 …. ck a1 …. an b1 …. bm • Could have : R1= ( a1, b1, c1, …, ck) • Put b1 as the key for R1, it is also the key for E2=(b1, …., bn) • Usual strategy(combination) • ignore R1 • Add a1, c1, …., ck to E2 instead, i.e. • E2=(b1, …., bn, a1, c1, …, ck)

  31. E2 ER Diagram to Relations R1 ? ? E1 c1 …. ck a1 …. an b1 …. bm E2 = ( b1, …, bm ) E1 = ( a1, …, an ) R1 R1 = ( a1, b1, c1 …, ck ) E1 = ( a1, …, an ) R1 E2 = ( b1, …, bm , a1, c1, …, ck) E1 = ( a1, …, an , b1, c1, …, ck) R1 E2 = ( b1, …, bm ) R1 Treat as n:1 or 1:m

  32. Relational Algebra • Querying the database: specify what we want from our database • Find all the people who earn more than $1,000,000 and pay taxes in Oklahoma • Could write in C++/Java, but a bad idea • Instead use high-level query languages: • Theoretical: Relational Algebra, • Practical: SQL • Relational algebra: a basic set of operations on relations that provide the basic principles

  33. What is an “Algebra”? • Mathematical system consisting of: • Operands--- variables or values from which new values can be constructed • Operators--- symbols denoting procedures that construct new values from given values • Examples • Arithmetic algebra, linear algebra, Boolean algebra …… • What are operands? • Arithmetic Algebra: variables like x and constants like 15 • What are operators? • Arithmetic Algebra: addition, subtraction…

  34. What is Relational Algebra? • An algebra • Whose operands are relations or variables that represent relations • Whose operators are designed to do common things that we need to do with relations in a database • relations as input, new relation as output • Can be used as a query languagefor relations

  35. Relational Operators at a Glance • Five basic RA operations: • Basic Set Operations • union, difference (no intersection, no complement) • Selection: (s) eliminates some rows of relations • Projection: (p) eliminates some columns of relations • Cartesian Product: (X) combine the tuples of two relations • When our relations have attribute names: • Renaming: (r) change the relation schema • Derived operations: • Intersection, complement • Joins (natural join, equi-join, theta join, semi-join, ……)

  36. Set Operations • Union: all tuples in R1 or R2, denoted as R1 U R2 • R1, R2 must have the same schema • R1 U R2 has the same schema as R1, R2 • Example: • Active-Employees U Retired-Employees • If any, is duplicate elimination required? • yes • Intersection: all tuples in both R1 and R2 as R1 ∩ R2 • R1, R2 must have the same schema • R1 U R2 has the same schema as R1, R2 • Example: • Active-Employees U Retired-Employees • Difference: all tuples in R1 but not in R2, denoted as R1 – R2 • R1, R2 must have the same schema • R1 - R2 has the same schema as R1, R2 • Example • All-Employees - Retired-Employees

  37. Selection • Applied to a relation R, produce a new relation with a subset of R’s tuples • Returns all tuples (rows) which satisfy a condition that involves the attributes of R, denoted assc(R) • c is a condition: =, <, >,≤, ≥, ≠, AND, OR, NOT • Output schema: same as input schema • Find all employees with salary more than $40,000: • sSalary > 40000(Employee)

  38. Projection • Unary operation: returns certain columns, denoted as P A1,…,An(R) • Eliminates duplicate tuples ! • Input schema R(B1, …, Bm) • Condition: {A1, …, An} {B1, …, Bm} • Output schema S(A1, …, An) • Example: project social-security number and names: • PSSN, Name (Employee)

  39. Selection vs. Projection • Think of relation as a table • How are they similar? • Selection of subsets from table • How are they different? • Horizontal vs. vertical? • selection-projection • Duplicate elimination for both? • No duplicate in the result of selection

  40. Cartesian Product • Each tuple in R1 with each tuple in R2, denoted as R1 x R2 • Input schemas R1(A1,…,An), R2(B1,…,Bm) • Output schema is S(A1, …, An, B1, …, Bm) • Very rare in practice; but joinsare very common • Example: Employee x Dependent

  41. Example Employee Dependent Employee x Dependent

  42. Renaming • Does not change the relational instance, denoted as Notation: rS(B1,…,Bn) (R) • Changes the relational schema only • Input schema: R(A1, …, An) • Output schema: S(B1, …, Bn) • Example: • Soc-sec-num, firstname(Employee)

  43. Set Operations: Intersection • Intersection: all tuples both in R1 and in R2, denoted as R1 R2 • R1, R2 must have the same schema • R1 R2 has the same schema as R1, R2 • Example: UnionizedEmployeesRetiredEmployees • Intersection is derived: • R1 R2 = R1 – (R1 – R2) why ? R1 R2 R1 ∩ R2

  44. Theta Join • A join that involves a predicate q, denoted as R1 qR2 • Input schemas: R1(A1,…,An), R2(B1,…,Bm) • Output schema: S(A1,…,An,B1,…,Bm) • Derived operator: R1 qR2 = sq (R1 x R2) • Take the product R1 x R2 • Then apply SELECTC to the result • As for SELECT, C can be any Boolean-valued condition

  45. Theta Join: Example Product Component ProductInfo:= Product Product.pname=Component.PName Component

  46. Natural Join • Notation: R1 R2 • Input Schema: R1(A1, …, An), R2(B1, …, Bm) • Output Schema: S(C1,…,Cp) • Where{C1, …, Cp} = {A1, …, An} U{B1, …, Bm} • Meaning: combine all pairs of tuples in R1 and R2 that agree on the attributes: • {A1,…,An} {B1,…, Bm}(called the join attributes)

  47. Natural Join: Examples Employee Dependent Employee Dependent = PSSN, Name, Dependent-Name(sEmployee.SSN=Dependent.SSN(Employee x Dependent)

  48. Natural Join: Examples R S R S

  49. Natural Join • Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R S ? • Given R(A, B, C), S(D, E), what is R S? • Given R(A, B), S(A, B), what is R S?

  50. Equi-join • Special case of theta join: condition c contains only conjunction of equalities • Result schema is the same as that of Cartesian product • May have fewer tuples than Cartesian product • Most frequently used in practice: R1 A=BR2 • Natural join is a particular case of equi-join • A lot of research on how to do it efficiently

More Related