1 / 100

Database Management Systems Course 236363

Faculty of Computer Science Technion – Israel Institute of Technology. Database Management Systems Course 236363 . Lecture 3: Relational Algebra. Outline. The Relational Model. A conceptual model for representing data, integrity constraints, and queries

amy
Download Presentation

Database Management Systems Course 236363

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Faculty of Computer Science Technion – Israel Institute of Technology Database Management Systems Course 236363 Lecture 3: Relational Algebra

  2. Outline

  3. The Relational Model • A conceptual model for representing data, integrity constraints, and queries • All based on the notion of a schema • DBMS is responsible for translating specifications into the physical environment at hand • Storage in files, caches, indexes • Queries translated to query plans (high-level imperative programs) • Query plans translated to low-level execution over stored data

  4. Querying: Which Courses Avia Took? S C T Assembly ... mov $1, %rax mov $1, %rdi mov$message, %rsi mov $13, %rdx syscall mov$60, %rax xor %rdi, %rdi ... QL SQL SELECT C.name FROM S,C,T WHERE S.name = ‘Avia’ AND S.ID = T.sID AND T.cNum = C.number Logic (RC) {⟨x⟩|∃y,n,z,l,g [S(y,n,'Avia')∧C(z,x,l)∧T(y,z,g)]} Python Logic Programming (Datalog) for s in S: for c in C: for t in T: if s.sName==‘Avia’ and s.ID==t.sID and t.cNum == c.number: print c.name Q(x)  S(y,n,‘Avia’),C(z,x,l),T(y,z,g) Algebra (RA) πC.name(σS.name=‘Avia’, number=cNum, ID=sID(S⨉C⨉T)))

  5. The Relational Algebra (RA) • Mathematical query language • Introduced by Edgar Codd • [E. F. Codd: A Relational Model of Data for Large Shared Data Banks. Commun. ACM 13(6): 377-387 (1970)] • [E. F. Codd: Relational Completeness of Data Base Sublanguages. In: R. Rustin (ed.): Database Systems: 65-98, Prentice Hall and IBM Research Report RJ 987, San Jose, California (1972)] • Since invention, developed and studied by Codd and many others Edgar F. Codd (1923-2003)

  6. RA Example Names of students who study DB: name(sid=sid1(sid/sid1Student × sid,cid(cid=cid1(topic=‘DB’Course × cid/cid1Studies))))

  7. Why RA? • Understanding the relational algebra is a key understanding central concepts in databases: SQL, query evaluation, query optimization • Tool for building theoretical foundations of various query languages(e.g., SQL) • Tool for developing novel data/query models • See, e.g., [Fagin, Kimelfeld, Reiss, Vansummeren: Document Spanners: A Formal Approach to Information Extraction. J. ACM 62(2): 12 (2015)]

  8. RA vs Other QLs • Some subtle (yet important) differences between RA and other languages • Can tables have duplicate records? • (RA vs. SQL) • Are missing (NULL) values allowed? • (RA vs. SQL) • Is there any order among records? • (RA vs. SQL) • Is the answer dependent on the domain from which values are taken (not just the DB)? • (RA vs. RC)

  9. Relation Schema • A relation schema is a finite sequence of distinct attribute namesatt with a mapping of each to a domain dom of legal values • Notation: (att1:dom1,…,attk:domk) • Example: (sid:int, name:string, year:int)

  10. Tuples • Let s be a relation schema (att1:dom1,…,attk:domk) • A tuple (over s) is a sequence (v1,…,vk) of values vi, where each viis in domi • That is, a tuple is an element of dom1×...×domk

  11. Relations • A relationR is a pair (s,r) • sis a relation schema • Called the header of R • r is a finite set r of tuples over s

  12. Ignoring Domains • In this lecture we ignore the attribute domains, since they play no special role • (Well, almost; they make a difference for query equivalence, but we do not get there…) • For example, we will write (sid, name, year) instead of (sid:int, name:string, year:int)

  13. Notation • Notation 1: • Let R be a relation with the header (att1,…,attk) • Let t=(v1,…,vk)be a tuple in R • We refer to vi by t.atti • Notation 2: • Let a1,…,am be attributes in att1,…,attk • We denote by t[a1,…,am] the tuple (t.a1,…,t.am) t.sid= 861 t.name= Alma t[sid,name] = (861,Alma) t[name,sid,year] = (Alma,861,2) t

  14. Databases • A database schema is finite set of relation names, each mapped into a relation schema • Example: Student(sid,name,year) , Course(cid,topic) , Studies(sid,cid) • A (database) instance over a schema consists of a relation for each relation schema

  15. What is “Algebra”? • An abstract algebra consists of: • A class of elements • A collection of operators • Each operator: • Has an arityd • Has a domain of sequences (e1,…,ed) of elements • Maps every sequence in its domain to an element e • The definition of an operator allows for composition: o1(o2(x),o1(y,o4(x,z))) • Examples: • Ring of integers: (𝕫,{+,·}) • Boolean algebra: ({true,false},{∧,∨,¬}) • Relational algebra

  16. The Relational Algebra • In the relational algebra (RA) the elements are relations • Recall: pairs (s,r) • RA has 6 primitive operators: • Unary: projection, selection, renaming • Binary: union, difference, Cartesian product • Each of the six is essential (independent)—we cannot define it using the others • We will see what exactly this means and how this can be proved • In practice, we allow many more useful operators that can be defined by the primitive ones • For example, intersection via union and difference

  17. Outline

  18. 6 Primitive (Basic) Operators • Projection () • Selection () • Renaming () • Union (∪) • Difference (\) • Cartesian Product (×)

  19. Projection by Example R = sid,name(R) = year(R) = Fewer tuples (why?)

  20. Definition of Projection • Projection is a unary operator of the form A1,…,Akwhere each Ai is an attribute name • A projection is parameterized by attributes, so we actually have infinitely many different projection operators • Legal input: relation R in with attributes A1,…,Ak (and possibly others) • A1,…,Ak(R) is the relation S with: • Header (A1,…,Ak) • Tuple set {t[A1,…,Ak] | t ∊ R} Q: If R has 1000 tuples, how many tuples can A1,…,Ak(R) have?

  21. Selection by Example R = course=‘DB’(R) = year=1 ⋀ grade>84(R) =

  22. Definition of Selection • Selection is a unary operator of the formc where c is a logical condition (selection predicate) on attributes • c consists of comparisons and logical connectors (∧,∨,¬) • item1 = item2 • price ≥ 500 ∧ price ≤ budget • Legal input: A relation with all the attributes mentioned in the selection predicate • The condition is applied to each tuple in the input, and each violating tuple is filtered out • Formally, c(R)is the relation Swith the header of R and the tuple set {t | t ∊ R and t ⊨ c} Q:If R has 1000 tuples, how many tuples can c(R) have?

  23. Variants of Selection • Various variants of RA may allow different languages for specifying selection predicates • e.g., c2>a2+b2; name starts with ‘A’, etc. • Common to all predicate formalisms: a predicate applies to a single tuple • Cannot state cross-tuple conditions, e.g., • “there is another tuple with the same name” • “contains at least 100 tuples”

  24. Renaming by Example R = year/level(R) =

  25. Definition of Renaming • Renaming is a unary operator of the formA/B where A and B are attribute names • Legal input: A relation with a header that contains A and does not contain B • Renaming changes only the header—attribute A becomes B • Formally, A/B(R) is the relation Swith • The header of R with A replaced by B • The tuple set of R Q: If R has 1000 tuples, how many tuples can A/B(R) have?

  26. Union and Difference by Example R = S = R∪S = R \ S =

  27. Definition of Union and Difference • Binary operators, interpreted as operations over the tuple sets • Legal input: a pair of relations R and S with the exact same header • We then say that R and S are union compatible • Formally: • R∪Sis the relation with the header of R (and S) and the union of the tuple sets • R\Sis the relation with the header of R (and S) and the difference between the tuple sets Q: If each of R and S have 1000 tuples, how many tuples can be in R∪S? R\S?

  28. Cartesian Product by Example R = S = R×S =

  29. Definition of Cartesian Product • Binary operator, similar to set product, but each output pair is combined into a single tuple • Legal input: A pair of relations with disjoint sets of attributes • So how to cross-product Mom(ssn) with Dad(ssn)? • Formally, let R and S have the headers (A1,…,Ak) and (B1,…,Bm), respectively; then R×Sis the relation Twith: • Header (A1,…,Ak,B1,…,Bm) • Tuple set {r∘s|r ∊ R and s ∊ S} • ∘ denotes concatenation Q: If each of R and S have 1000 tuples, how many tuples can be in R×S?

  30. Shorthand Notation For Cartesian product of named relations (e.g., R, S), we actually allow common attributes, and implicitly assume their renaming to name.attribute R = S = R×S =

  31. Parentheses Convention • We have defined 3 unary operators and 3 binary operators • It is acceptable to omit the parentheses from o(R) when o is unary • Then, unary operators take precedence over binary ones • Example: (course=‘DB’(Course)) ×(cid/cid1(Studies) ) becomes course=‘DB’Course× cid/cid1Studies

  32. Composition Example Names of students who study DB: name(sid=sid1(sid/sid1Student × sid,cid(cid=cid1(topic=‘DB’Course × cid/cid1Studies))))

  33. name(sid=sid1(sid/sid1Student × sid,cid(cid=cid1(topic=‘DB’Course × cid/cid1Studies))))

  34. name(sid=sid1(sid/sid1Student × sid,cid(cid=cid1(topic=‘DB’Course × cid/cid1Studies))))

  35. name(sid=sid1(sid/sid1Student × sid,cid(cid=cid1(topic=‘DB’Course × cid/cid1Studies))))

  36. name(sid=sid1(sid/sid1Student × sid,cid(cid=cid1(topic=‘DB’Course×cid/cid1Studies))))

  37. name(sid=sid1(sid/sid1Student × sid,cid(cid=cid1(topic=‘DB’Course × cid/cid1Studies))))

  38. name(sid=sid1(sid/sid1Student × sid,cid(cid=cid1(topic=‘DB’Course × cid/cid1Studies))))

  39. name(sid=sid1(sid/sid1Student× sid,cid(cid=cid1(topic=‘DB’Course × cid/cid1Studies))))

  40. name(sid=sid1(sid/sid1Student× sid,cid(cid=cid1(topic=‘DB’Course × cid/cid1Studies))))

  41. name(sid=sid1(sid/sid1Student × sid,cid(cid=cid1(topic=‘DB’Course × cid/cid1Studies))))

  42. name(sid=sid1(sid/sid1Student × sid,cid(cid=cid1(topic=‘DB’Course × cid/cid1Studies))))

  43. Task Phrase a query that finds the names of students who get private lessons (i.e., the student takes a course that no one else takes)

  44. Normal Form for RA Expressions Claim: A RA expression of the form F(E) is equivalent to a RA expression of the form F’(E)with no negation in F’. Proof: Repeatedly applying, get where α is the “complement” of θ.

  45. Normal Form for RA Expressions Claim: A RA expression is equivalent to a RA expression in which all selections are of the form (X θ Y) where X and Y are constants or variables. Proof: Apply the previous claim and eliminate negations, only ˄ and ˅ connectives. Now,

  46. Example: Normal Form for RA Expressions

  47. Outline

  48. Implied Operators • We now discuss relational operators that are: • Not among the 6 basic operators • Can be expressed in RA (implied) • Very common in practice • Enhancing the available operator set with the implied operators is a good idea! • Easier to write queries • Easier to understand/maintain queries • Easier for DBMS to apply specialized optimizations

  49. Outline

  50. Joins • Cartesian product is rarely standalone without selection, and is commonly followed by projection • The combination π×is referred to generally as “join” • There are several common cases that apply specific selections and projections, which we introduce here

More Related