1k likes | 1.43k Views
Faculty of Computer Science Technion – Israel Institute of Technology. Database Management Systems Course 236363 . Lecture 3: Relational Algebra. Outline. The Relational Model. A conceptual model for representing data, integrity constraints, and queries
E N D
Faculty of Computer Science Technion – Israel Institute of Technology Database Management Systems Course 236363 Lecture 3: Relational Algebra
The Relational Model • A conceptual model for representing data, integrity constraints, and queries • All based on the notion of a schema • DBMS is responsible for translating specifications into the physical environment at hand • Storage in files, caches, indexes • Queries translated to query plans (high-level imperative programs) • Query plans translated to low-level execution over stored data
Querying: Which Courses Avia Took? S C T Assembly ... mov $1, %rax mov $1, %rdi mov$message, %rsi mov $13, %rdx syscall mov$60, %rax xor %rdi, %rdi ... QL SQL SELECT C.name FROM S,C,T WHERE S.name = ‘Avia’ AND S.ID = T.sID AND T.cNum = C.number Logic (RC) {⟨x⟩|∃y,n,z,l,g [S(y,n,'Avia')∧C(z,x,l)∧T(y,z,g)]} Python Logic Programming (Datalog) for s in S: for c in C: for t in T: if s.sName==‘Avia’ and s.ID==t.sID and t.cNum == c.number: print c.name Q(x) S(y,n,‘Avia’),C(z,x,l),T(y,z,g) Algebra (RA) πC.name(σS.name=‘Avia’, number=cNum, ID=sID(S⨉C⨉T)))
The Relational Algebra (RA) • Mathematical query language • Introduced by Edgar Codd • [E. F. Codd: A Relational Model of Data for Large Shared Data Banks. Commun. ACM 13(6): 377-387 (1970)] • [E. F. Codd: Relational Completeness of Data Base Sublanguages. In: R. Rustin (ed.): Database Systems: 65-98, Prentice Hall and IBM Research Report RJ 987, San Jose, California (1972)] • Since invention, developed and studied by Codd and many others Edgar F. Codd (1923-2003)
RA Example Names of students who study DB: name(sid=sid1(sid/sid1Student × sid,cid(cid=cid1(topic=‘DB’Course × cid/cid1Studies))))
Why RA? • Understanding the relational algebra is a key understanding central concepts in databases: SQL, query evaluation, query optimization • Tool for building theoretical foundations of various query languages(e.g., SQL) • Tool for developing novel data/query models • See, e.g., [Fagin, Kimelfeld, Reiss, Vansummeren: Document Spanners: A Formal Approach to Information Extraction. J. ACM 62(2): 12 (2015)]
RA vs Other QLs • Some subtle (yet important) differences between RA and other languages • Can tables have duplicate records? • (RA vs. SQL) • Are missing (NULL) values allowed? • (RA vs. SQL) • Is there any order among records? • (RA vs. SQL) • Is the answer dependent on the domain from which values are taken (not just the DB)? • (RA vs. RC)
Relation Schema • A relation schema is a finite sequence of distinct attribute namesatt with a mapping of each to a domain dom of legal values • Notation: (att1:dom1,…,attk:domk) • Example: (sid:int, name:string, year:int)
Tuples • Let s be a relation schema (att1:dom1,…,attk:domk) • A tuple (over s) is a sequence (v1,…,vk) of values vi, where each viis in domi • That is, a tuple is an element of dom1×...×domk
Relations • A relationR is a pair (s,r) • sis a relation schema • Called the header of R • r is a finite set r of tuples over s
Ignoring Domains • In this lecture we ignore the attribute domains, since they play no special role • (Well, almost; they make a difference for query equivalence, but we do not get there…) • For example, we will write (sid, name, year) instead of (sid:int, name:string, year:int)
Notation • Notation 1: • Let R be a relation with the header (att1,…,attk) • Let t=(v1,…,vk)be a tuple in R • We refer to vi by t.atti • Notation 2: • Let a1,…,am be attributes in att1,…,attk • We denote by t[a1,…,am] the tuple (t.a1,…,t.am) t.sid= 861 t.name= Alma t[sid,name] = (861,Alma) t[name,sid,year] = (Alma,861,2) t
Databases • A database schema is finite set of relation names, each mapped into a relation schema • Example: Student(sid,name,year) , Course(cid,topic) , Studies(sid,cid) • A (database) instance over a schema consists of a relation for each relation schema
What is “Algebra”? • An abstract algebra consists of: • A class of elements • A collection of operators • Each operator: • Has an arityd • Has a domain of sequences (e1,…,ed) of elements • Maps every sequence in its domain to an element e • The definition of an operator allows for composition: o1(o2(x),o1(y,o4(x,z))) • Examples: • Ring of integers: (𝕫,{+,·}) • Boolean algebra: ({true,false},{∧,∨,¬}) • Relational algebra
The Relational Algebra • In the relational algebra (RA) the elements are relations • Recall: pairs (s,r) • RA has 6 primitive operators: • Unary: projection, selection, renaming • Binary: union, difference, Cartesian product • Each of the six is essential (independent)—we cannot define it using the others • We will see what exactly this means and how this can be proved • In practice, we allow many more useful operators that can be defined by the primitive ones • For example, intersection via union and difference
6 Primitive (Basic) Operators • Projection () • Selection () • Renaming () • Union (∪) • Difference (\) • Cartesian Product (×)
Projection by Example R = sid,name(R) = year(R) = Fewer tuples (why?)
Definition of Projection • Projection is a unary operator of the form A1,…,Akwhere each Ai is an attribute name • A projection is parameterized by attributes, so we actually have infinitely many different projection operators • Legal input: relation R in with attributes A1,…,Ak (and possibly others) • A1,…,Ak(R) is the relation S with: • Header (A1,…,Ak) • Tuple set {t[A1,…,Ak] | t ∊ R} Q: If R has 1000 tuples, how many tuples can A1,…,Ak(R) have?
Selection by Example R = course=‘DB’(R) = year=1 ⋀ grade>84(R) =
Definition of Selection • Selection is a unary operator of the formc where c is a logical condition (selection predicate) on attributes • c consists of comparisons and logical connectors (∧,∨,¬) • item1 = item2 • price ≥ 500 ∧ price ≤ budget • Legal input: A relation with all the attributes mentioned in the selection predicate • The condition is applied to each tuple in the input, and each violating tuple is filtered out • Formally, c(R)is the relation Swith the header of R and the tuple set {t | t ∊ R and t ⊨ c} Q:If R has 1000 tuples, how many tuples can c(R) have?
Variants of Selection • Various variants of RA may allow different languages for specifying selection predicates • e.g., c2>a2+b2; name starts with ‘A’, etc. • Common to all predicate formalisms: a predicate applies to a single tuple • Cannot state cross-tuple conditions, e.g., • “there is another tuple with the same name” • “contains at least 100 tuples”
Renaming by Example R = year/level(R) =
Definition of Renaming • Renaming is a unary operator of the formA/B where A and B are attribute names • Legal input: A relation with a header that contains A and does not contain B • Renaming changes only the header—attribute A becomes B • Formally, A/B(R) is the relation Swith • The header of R with A replaced by B • The tuple set of R Q: If R has 1000 tuples, how many tuples can A/B(R) have?
Union and Difference by Example R = S = R∪S = R \ S =
Definition of Union and Difference • Binary operators, interpreted as operations over the tuple sets • Legal input: a pair of relations R and S with the exact same header • We then say that R and S are union compatible • Formally: • R∪Sis the relation with the header of R (and S) and the union of the tuple sets • R\Sis the relation with the header of R (and S) and the difference between the tuple sets Q: If each of R and S have 1000 tuples, how many tuples can be in R∪S? R\S?
Cartesian Product by Example R = S = R×S =
Definition of Cartesian Product • Binary operator, similar to set product, but each output pair is combined into a single tuple • Legal input: A pair of relations with disjoint sets of attributes • So how to cross-product Mom(ssn) with Dad(ssn)? • Formally, let R and S have the headers (A1,…,Ak) and (B1,…,Bm), respectively; then R×Sis the relation Twith: • Header (A1,…,Ak,B1,…,Bm) • Tuple set {r∘s|r ∊ R and s ∊ S} • ∘ denotes concatenation Q: If each of R and S have 1000 tuples, how many tuples can be in R×S?
Shorthand Notation For Cartesian product of named relations (e.g., R, S), we actually allow common attributes, and implicitly assume their renaming to name.attribute R = S = R×S =
Parentheses Convention • We have defined 3 unary operators and 3 binary operators • It is acceptable to omit the parentheses from o(R) when o is unary • Then, unary operators take precedence over binary ones • Example: (course=‘DB’(Course)) ×(cid/cid1(Studies) ) becomes course=‘DB’Course× cid/cid1Studies
Composition Example Names of students who study DB: name(sid=sid1(sid/sid1Student × sid,cid(cid=cid1(topic=‘DB’Course × cid/cid1Studies))))
name(sid=sid1(sid/sid1Student × sid,cid(cid=cid1(topic=‘DB’Course × cid/cid1Studies))))
name(sid=sid1(sid/sid1Student × sid,cid(cid=cid1(topic=‘DB’Course × cid/cid1Studies))))
name(sid=sid1(sid/sid1Student × sid,cid(cid=cid1(topic=‘DB’Course × cid/cid1Studies))))
name(sid=sid1(sid/sid1Student × sid,cid(cid=cid1(topic=‘DB’Course×cid/cid1Studies))))
name(sid=sid1(sid/sid1Student × sid,cid(cid=cid1(topic=‘DB’Course × cid/cid1Studies))))
name(sid=sid1(sid/sid1Student × sid,cid(cid=cid1(topic=‘DB’Course × cid/cid1Studies))))
name(sid=sid1(sid/sid1Student× sid,cid(cid=cid1(topic=‘DB’Course × cid/cid1Studies))))
name(sid=sid1(sid/sid1Student× sid,cid(cid=cid1(topic=‘DB’Course × cid/cid1Studies))))
name(sid=sid1(sid/sid1Student × sid,cid(cid=cid1(topic=‘DB’Course × cid/cid1Studies))))
name(sid=sid1(sid/sid1Student × sid,cid(cid=cid1(topic=‘DB’Course × cid/cid1Studies))))
Task Phrase a query that finds the names of students who get private lessons (i.e., the student takes a course that no one else takes)
Normal Form for RA Expressions Claim: A RA expression of the form F(E) is equivalent to a RA expression of the form F’(E)with no negation in F’. Proof: Repeatedly applying, get where α is the “complement” of θ.
Normal Form for RA Expressions Claim: A RA expression is equivalent to a RA expression in which all selections are of the form (X θ Y) where X and Y are constants or variables. Proof: Apply the previous claim and eliminate negations, only ˄ and ˅ connectives. Now,
Implied Operators • We now discuss relational operators that are: • Not among the 6 basic operators • Can be expressed in RA (implied) • Very common in practice • Enhancing the available operator set with the implied operators is a good idea! • Easier to write queries • Easier to understand/maintain queries • Easier for DBMS to apply specialized optimizations
Joins • Cartesian product is rarely standalone without selection, and is commonly followed by projection • The combination π×is referred to generally as “join” • There are several common cases that apply specific selections and projections, which we introduce here